Basic Linux Exploits Of Hackers
Why study exploits? Ethical hackers should study exploits to understand whether vulnerabilities are exploitable. Sometimes security professionals mistakenly believe and will publicly state that a certain vulnerability isn’t exploitable, but black hat hackers know otherwise. One person’s inability to find an exploit for a vulnerability doesn’t mean someone else can’t. It’s a matter of time and skill level. Therefore, ethical hackers must understand how to exploit vulnerabilities and check for themselves. In the process, they might need to produce proof-of-concept code to demonstrate to a vendor that a vulnerability is exploitable and needs to be fixed.
In this article, we discuss the following topics:
- Stack operations and function-calling procedures
- Buffer overflows
- Local buffer overflow exploits
- Exploit development process
Stack Operations and Function-Calling Procedures
The concept of a stack in computer science can best be explained by comparing it to a stack of lunch trays in a school cafeteria. When you put a tray on the stack, the tray that was previously on top is now covered up. When you take a tray from the stack, you take the tray from the top of the stack, which happens to be the last one put there. More formally, in computer science terms, a stack is a data structure that has the quality of a first in, last out (FILO) queue.
The process of putting items on the stack is called a push and is done in assembly language code with the push command. Likewise, the process of taking an item from the stack is called a pop and is accomplished with the pop command in assembly language code.
Every program that runs has its own stack in memory. The stack grows backward from the highest memory address to the lowest. This means that, using our cafeteria tray example, the bottom tray would be the highest memory address, and the top tray would be the lowest. Two important registers deal with the stack: Extended Base Pointer (EBP) and Extended Stack Pointer (ESP). As Figure 1 indicates, the EBP register is the base of the current stack frame of a process (higher address). The ESP register always points to the top of the stack (lower address).
A function is a self-contained module of code that can be called by other functions, including the main() function. When a function is called, it causes a jump in the flow of the program. When a function is called in assembly code, three things take place:
- By convention, the calling program sets up the function call by first placing the function parameters on the stack in reverse order.
- Next, the call instruction saves the Extended Instruction Pointer (EIP) on the stack so the program can continue where it left off when the function returns. This saved value is referred to as the return address.
- Finally, the call instruction places the address of the function in the EIP, and execution continues from there.
In assembly code, the call looks like this:
The called function’s responsibilities are first to save the calling program’s EBP register on the stack, then to save the current ESP register to the EBP register (setting the current stack frame), and then to decrement the ESP register to make room for the function’s local variables. Finally, the function gets an opportunity to execute its statements. This process is called the function prolog.
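In assembly code, the prolog looks like this:
The last thing a called function does before returning is clean up its stack frame: the leave instruction restores ESP and EBP to the caller’s values, and the ret instruction pops the saved return address off the stack into the EIP. This process is called the function epilog. In assembly code, the epilog looks like this: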
Buffer Overflows
Now that you have the basics down, we can get to the good stuff. Buffers are used to store data in memory. We are mostly interested in buffers that hold strings. Buffers themselves have no mechanism to keep you from putting too much data into the reserved space. In fact, if you get sloppy as a programmer, you can quickly outgrow the allocated space. For example, the following declares a string in memory of 10 bytes:
char str1[10];
So what happens if you execute the following?
strcpy( str1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA")
Let’s find out:
Now compile and execute the program as follows:
NOTE In Linux-style operating systems, it’s worth noting the convention for prompts that helps you distinguish between a user shell and a root shell. Typically, a root-level shell will have a # sign as part of the prompt, whereas user shells typically have a $ sign in the prompt. This is a visual cue that shows when you’ve succeeded in escalating your privileges, but you will still want to verify this using a command such as whoami or id.
Why did you get a segmentation fault? Let’s see by firing up gdb (the GNU Debugger):
As you can see, when you run the program in gdb, it crashes when trying to execute the instruction at 0x41414141, which happens to be hex for AAAA (A in hex is 0x41). Next, you can check whether the EIP was corrupted with A’s. Indeed, EIP is full of A’s and the program was doomed to crash. Remember, when the function (in this case, main) attempts to return, the saved EIP value is popped off of the stack and executed next. Because the address 0x41414141 is out of your process segment, you got a segmentation fault.
CAUTION Most modern operating systems use address space layout randomization (ASLR) to randomize the location of the stack and other memory regions, so we will have mixed results for the rest of this article. To disable ASLR, run the following:
Now, let’s look at attacking meet.c.
The meet.c program looks like this:
To overflow the 400-byte buffer in meet.c, you will need another tool, Perl. Perl is an interpreted language, meaning that you do not need to precompile it, which makes it very handy to use at the command line. For now, you only need to understand one Perl command:
`perl -e 'print "A" x 600'`
NOTE Backticks (`) are used to wrap a Perl command and have the shell interpreter execute the command and return the value.
This command will simply print 600 A’s to standard output. Try it! Using this trick, you will start by feeding ten A’s to the meet program (remember, it takes two parameters):
Next, you will feed 600 A’s to the meet.c program as the second parameter, as follows:
As expected, your 400-byte buffer has overflowed; hopefully, so has the EIP. To verify, start gdb again:
NOTE Your values will be different. Keep in mind that it is the concept we are trying to get across here, not the memory values. Depending on the version of gcc you are using and other factors, it may even crash in a different portion of the program.
Not only did you not control the EIP, you have moved far away to another portion of memory. If you take a look at meet.c, you will notice that after the strcpy() function in the greeting function, there is a printf() call, which in turn calls vfprintf() in the libc library. The vfprintf() function then calls strlen. But what could have gone wrong? You have several nested functions and therefore several stack frames, each pushed on the stack. When you caused the overflow, you must have corrupted the arguments passed into the printf() function. Recall from the previous section that the call and prolog of a function leave the stack looking like the following illustration:
If you write past the EIP, you will overwrite the function arguments, starting with temp1. Because the printf() function uses temp1, you will have problems. To check out this theory, let’s check back with gdb. When we run gdb again, we can attempt to get the source listing, like so:
You can see in the preceding bolded line that the arguments to the function, temp1 and temp2, have been corrupted. The pointers now point to 0x41414141 and the values are "" (or null). The problem is that printf() will not take nulls as the only input and therefore chokes. So let’s start with a lower number of A’s, such as 405, and then slowly increase it until we get the effect we need:
As you can see, when a segmentation fault occurs in gdb, the current value of the EIP is shown.
It is important to realize that the numbers (400–412) are not as important as the concept of starting low and slowly increasing until you just overflow the saved EIP and nothing else. This is due to the printf call immediately after the overflow. Sometimes you will have more breathing room and will not need to worry too much about this. For example, if nothing was following the vulnerable strcpy command, there would be no problem overflowing beyond 412 bytes in this case.
NOTE Remember, we are using a very simple piece of flawed code here; in real life, you will encounter many problems like this. Again, it’s the concepts we want you to get, not the numbers required to overflow a particular vulnerable piece of code.
Ramifications of Buffer Overflows
When you’re dealing with buffer overflows, basically three things can happen. The first is denial of service. As you saw previously, it is really easy to get a segmentation fault when dealing with process memory. However, it’s possible that this is the best thing that can happen to a software developer in this situation, because a crashed program will draw attention. The alternatives are silent and much worse.
The second thing that can happen when a buffer overflow occurs is that the EIP can be controlled to execute malicious code at the user level of access. This happens when the vulnerable program is running at the user level of privilege.
The third and absolutely worst thing that can happen when a buffer overflow occurs is that the EIP can be controlled to execute malicious code at the system or root level. In Unix systems, there is only one superuser, called root. The root user can do anything on the system. Some functions on Unix systems should be protected and reserved for the root user. For example, it would generally be a bad idea to give users root privileges to change passwords. Therefore, a concept called Set User ID (SUID) was developed to temporarily elevate a process to allow some files to be executed under their owner’s privilege level. So, for example, the passwd command can be owned by root, and when a user executes it, the process runs as root. The problem here is that when the SUID program is vulnerable, an exploit may gain the privileges of the file’s owner (in the worst case, root). To make a program an SUID program, you would issue the following command:
chmod u+s <filename> or chmod 4755 <filename>
The program will run with the permissions of the owner of the file. To see the full ramifications of this, let’s apply SUID settings to our meet program. Then later, when we exploit this program, we will gain root privileges.
The first field of the preceding line indicates the file permissions. The first position of that field is used to indicate a link, directory, or file (l, d, or –). The next three positions represent the file owner’s permissions in this order: read, write, execute. Normally, an x is used for execute; however, when the SUID condition applies, that position turns to an s, as shown. That means when the file is executed, it will execute with the file owner’s permissions (in this case, root—the third field in the line).
Local Buffer Overflow Exploits
Local exploits are easier to perform than remote exploits because you have access to the system memory space and can debug your exploit more easily.
The basic goal of buffer overflow exploits is to overflow a vulnerable buffer and change the EIP for malicious purposes. Remember, the EIP points to the next instruction to be executed. An attacker could use this to point to malicious code. A copy of the EIP is saved on the stack as part of calling a function in order to be able to continue with the command after the call when the function completes. If you can influence the saved EIP value, when the function returns, the corrupted value of the EIP will be popped off the stack into the register (EIP) and then executed.
Components of the Exploit
To build an effective exploit in a buffer overflow situation, you need to create a larger buffer than the program is expecting by using the following components: a NOP sled, shellcode, and a return address.
In assembly code, the NOP (no operation) command simply means to do nothing but move to the next command. Hackers have learned to use NOP for padding. When placed at the front of an exploit buffer, this padding is called a NOP sled. If the EIP is pointed to a NOP sled, the processor will ride the sled right into the next component. On x86 systems, the 0x90 opcode represents NOP. There are actually many more, but 0x90 is the most commonly used.
Shellcode is the term reserved for machine code that will do the hacker’s bidding. Originally, the term was coined because the purpose of the malicious code was to provide a simple shell to the attacker. Since then, the term has evolved to encompass code that is used to do much more than provide a shell, such as to elevate privileges or to execute a single command on the remote system. The important thing to realize here is that shellcode is actually binary, often represented in hexadecimal form. You can find tons of shellcode libraries online, ready to be used for all platforms.
We will use Aleph1’s shellcode (shown within a test program) as follows:
Let’s check it out by compiling and running the test shellcode.c program:
It worked—we got a root shell prompt.
NOTE We used compile options to disable memory and compiler protections in recent versions of Linux. We did this to aid in your learning of the subject at hand.
Repeating Return Addresses
The most important element of the exploit is the return address, which must be aligned perfectly and repeated until it overflows the saved EIP value on the stack. Although it is possible to point directly to the beginning of the shellcode, it is often much easier to be a little sloppy and point to somewhere in the middle of the NOP sled. To do that, the first thing you need to know is the current ESP value, which points to the top of the stack. The gcc compiler allows you to use assembly code inline and to compile programs as follows:
Remember the ESP value; we will use it soon as our return address, though yours will be different.
At this point, it may be helpful to check whether your system has ASLR turned on. You can check this easily by simply executing the last program several times in a row, as shown here. If the output changes on each execution, then your system is running some sort of stack randomization scheme.
Until you learn later how to work around this situation, go ahead and disable ASLR, as described in the Caution earlier in this article:
Now you can check the stack again (it should stay the same):
Now that we have reliably found the current ESP, we can estimate the top of the vulnerable buffer. If you are still getting random stack addresses, try another one of the echo lines shown previously.
These components are assembled in the order shown here:
As you can see, the addresses overwrite the EIP and point to the NOP sled, which then “slides” to the shellcode.
Exploiting Stack Overflows from the Command Line
Remember that in this case, the ideal size of our attack is 408. Therefore, we will use Perl to craft an exploit of that size from the command line. As a rule of thumb, it is a good idea to fill half of the attack buffer with NOPs; in this case, we will use 200 with the following Perl command:
perl -e 'print "\x90"x200';
A similar Perl command, shown next, will allow you to print your shellcode into a binary file (notice the use of the output redirector, >):
You can calculate the size of the shellcode with the following command:
Next, we need to calculate our return address. We could do this in one of two ways: by using math based on the stack pointer address or by finding exactly where our data sits on the stack with gdb. The gdb method is more accurate, so let’s take a look at how to do that. To begin with, we want our application to crash and for us to be able to easily identify the data. We already know that our buffer length is 412, so let’s build a sample overflow and see if we can find our return address.
Our first step is to load a crash scenario into gdb. To do this, we are going to issue the command:
We have now successfully crashed our program and can see that our EIP overwrite is 0x41414141. Next, let’s take a look at what’s on the stack. To do that, we are going to use the “examine memory” command and ask gdb to give us the output in hex. Because looking at individual chunks isn’t always super helpful, we are going to look in batches of 32 words (4 bytes each) at a time.
We still don’t see our A’s, so to get more data from the stack, we can just press enter again. We’ll keep going until we see something like this:
You can see at the bottom that our A’s (0x41) are visible. We can safely use the stack address 0xbffff3ac as our jump address. (Remember, your address may be different.) This will put us into our NOP sled and is a few words in, so it gives us a little room to be wrong by a byte or two. Now we can use Perl to write this address in little-endian format on the command line:
The number 39 was calculated in our case with some simple division: (412 total bytes - 200 bytes of NOPs - 59 bytes of shellcode) / 4 bytes per address.
If you put this into a calculator, you will notice that the value is 38.25; however, we rounded up. When Perl commands are wrapped in backticks (`), they may be concatenated to make a larger series of characters or numeric values. For example, we can craft a 412-byte attack string and feed it to our vulnerable meet.c program as follows:
This attack string (nominally 412 bytes; 415 bytes once the address count is rounded up to 39) is used for the second argument and creates a buffer overflow, as follows:
- 200 bytes of NOPs ("\x90")
- 59 bytes of shellcode
- 156 bytes of repeated return addresses (remember to reverse the byte order of each address because x86 processors are little-endian)
The segmentation fault showed that the exploit crashed. The likely reason for this lies in the fact that we have a misalignment of the repeating addresses. Namely, they don’t correctly or completely overwrite the saved return address on the stack. To check for this, simply increment the number of NOPs used:
It worked! The important thing to realize here is how the command line allowed us to experiment and tweak the values much more efficiently than by compiling and debugging code.
Exploiting Stack Overflows with Generic Exploit Code
The following code is a variation of many stack overflow exploits found online and in the references. It is generic in the sense that it will work with many exploits under many situations.
The program sets up a global variable called shellcode, which holds the malicious shell-producing machine code in hex notation. Next, a function is defined that will return the current value of the ESP register on the local system. The main function takes up to three arguments, which optionally set the size of the overflowing buffer, the offset of the buffer and ESP, and the manual ESP value for remote exploits. User directions are printed to the screen, followed by the memory locations used. Next, the malicious buffer is built from scratch, filled with addresses, then NOPs, then shellcode. The buffer is terminated with a null character. The buffer is then injected into the vulnerable local program and printed to the screen (useful for remote exploits).
Let’s try our new exploit on meet.c:
It worked! Notice how we compiled the program as root and set it as an SUID program. Next, we switched privileges to a normal user and ran the exploit. We got a root shell, which worked well. Notice that the program did not crash with a buffer size of 500 as it did when we were playing with Perl in the previous section because we called the vulnerable program differently this time, from within the exploit. In general, this is a more tolerant way to call the vulnerable program; however, your results may vary.
Exploiting Small Buffers
What happens when the vulnerable buffer is too small to use an exploit buffer as previously described? Most pieces of shellcode are 21–50 bytes in size. What if the vulnerable buffer you find is only 10 bytes long? For example, let’s look at the following vulnerable code with a small buffer:
Now compile it and set it as SUID:
Now that we have such a program, how would we exploit it? The answer lies in the use of environment variables. You could store your shellcode in an environment variable or somewhere else in memory and then point the return address to that environment variable, as follows:
Why did this work? It turns out that this technique, which was published by a Turkish hacker named Murat Balaban, relies on the fact that all Linux ELF files are mapped into memory with the last relative address as 0xbfffffff. The environment variables and arguments are stored in this area, and just below them is the stack. Let’s look at the upper process memory in detail:
Notice how the end of memory is terminated with null values; next comes the program name, then the environment variables, and finally the arguments. The following line of code from exploit2.c sets the value of the environment for the process as the shellcode:
That places the beginning of the shellcode at the precise location:
Let’s verify this with gdb. First, to assist with the debugging, place a \xcc at the beginning of the shellcode to halt the debugger when the shellcode is executed. Next, recompile the program and load it into the debugger:
When we ran the program with the \xcc byte in place, execution stopped with a slightly different message. In this case, the program stopped with a SIGTRAP because the \xcc we added created a soft breakpoint. When execution encountered the \xcc, the program stopped, indicating that the application successfully made it to our shellcode.
We will cover the exploit development process in our next article. Thank you for reading.