Security-Oriented C Tutorial 0x14 - Format String Vulnerability Part I: Buffer Overflow's Nasty Little Brother
What's up readers? Today I'll be introducing you to a new vulnerability called the Format String vulnerability (in case you missed the title). It won't be much, just a little motivation to keep you guys going. A little teaser, if you will.
Prerequisite information: if you don't know what endianness is, please read up on it first so that you know what's up.
So we should all know about format strings pretty well by now. We've seen them used in functions like printf to output data onto our console. We also know that a function's arguments are pushed onto the stack in reverse order (right to left, on 32-bit x86 with the usual cdecl convention), so the first argument, the format string, always ends up at the top of the stack no matter how many others follow. That's what makes variable arguments workable. What are variable arguments? They let a function accept a variable number of arguments instead of a fixed set. In printf's case, the format string tells the function how many arguments to expect and how each one should be printed. But what happens if we give printf just the format specifier(s)?
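If we call printf with format specifiers but without the corresponding arguments, where would it get the data to print? Writing printf ("%x %x"); directly will actually compile, though most compilers will warn about the missing arguments. More interestingly, the compiler can't check a format string that only arrives at runtime, so nothing stops us from handing user input straight to printf as its format string.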
This code will definitely compile properly but how do we exploit it? Well, considering what I said above, if we just give it the format specifier, it'll look something like this: printf ("%x %x %x %x");. So knowing how function parameters utilize the stack, from where would the data corresponding with the %x format specifiers come? Can you guess?
Look at that! Seems like we've pulled some data off the stack! printf has no way of knowing how many arguments were actually passed, so when it's called like this "without parameters", it simply reads whatever sits in the stack slots where those arguments would have been.
To make things clearer, consider the following diagram.
And the following stack analysis in GDB.
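Note: The values differ from the earlier run because the stack isn't guaranteed to look the same every time: environment variables (GDB adds a few of its own), address space layout randomization and leftover data from earlier code in the same process all affect what ends up there.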
Following the GDB output, if this were translated back into code, it would be something like this:
printf ("%08x %08x %08x %08x %08x", 0xffffd31c, 0x000000c2, 0xf7eaa216, 0x78383025, 0x38302520);
The last two values are actually part of the format string itself, i.e. the contents of buf: read as little-endian words, 0x78383025 is the bytes "%08x" and 0x38302520 is " %08".
Our buf variable is at 0xffffd030 and holds the format string parameter of printf. We can see that the address of buf (printf's first argument) sits right at the top of the stack, followed in order by the words that our %08x specifiers consume. If we keep adding more %08xs, we can pull out a lot of data.
Let's think about the consequences of this for a second. If a program were holding sensitive information in memory and a vulnerability like this were exploited, an attacker could potentially pull out passwords, session and cryptographic keys, and other highly valuable information. This is something we definitely do not want to allow.
As we've seen with buffer overflows, we were able to overwrite data in memory, but only data located after the buffer itself (at higher addresses, such as the saved return address). With this vulnerability, we can write to practically any writable memory address we choose and replace the data there with our own.
Let's modify our first example code and see if we can overwrite our own variable.
Remember that the static keyword places the variable into the data segment (.data, or .bss if it's zero-initialized) instead of on the stack. That puts it out of our reach from the stack, so a simple buffer overflow can't overwrite it. What we will perform instead is an arbitrary memory write using the format string vulnerability. How are we going to approach this? Let's think clearly about it for a second... Going back to the %n conversion specifier: it takes a pointer, just like the arguments to scanf, and writes to the address it's given. So what we need to do is get the variable's address into the spot where printf will look for the %n argument. Now we need to know how to "set up" the string so that everything is in the correct place on the stack when the exploit is executed. Can you figure out how?
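Let's think about what the call to printf would look like. Maybe something like this: printf ("...%n", &var); where ... represents however many characters we need printed before %n. Since the arguments are pushed in reverse, &var would sit on the stack right after the pointer to the format string. We can't push it there ourselves, but buf itself lives on the stack, so the plan is to put the address of var at the very start of buf, followed by the format string.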
To make things simpler, the program prints the address of var for us, but I'm sure you guys are knowledgeable enough to debug it and find where the variable lives. We'll do a test run first and see the output.
So our var is at address 0x0804a028, which means we need our argv to be something along the lines of [0x0804a028]%n. But since we are on a little-endian machine (if you are not, ignore this), we need the bytes of the address in reverse order: \x28\xa0\x04\x08. How do we type raw hex values into the program? We can use command-line Perl, Bash or Python to do this for us. I will be using a little Bash for this example.
Hang on, our var didn't change! What went wrong? Let's try a little memory analysis to figure out the problem.
We can see the contents of buf in the red. The call to printf has pushed the address of our buf onto the stack as its first argument (yellow), but what happens with our %n? Our %n actually corresponds with the address in the blue, because printf grabs whatever it can from the top of the stack, remember? Let's see what's stored at that address.
It stores the value 4. %n writes the number of characters printed so far, and the only thing printed before it was the address of var: 8 hex digits, which is 4 bytes, each printed as one (mostly unprintable) character.
Now, recall the Direct Parameter Access method we learned. This is where it comes in handy, because it lets us pick any argument slot on the stack directly instead of consuming them in order. Our target address is stored at the start of buf, which begins at address 0xffffd04c (the red area above), and that is exactly the 7th argument position after the format string. Let's try it with the %n conversion specifier, i.e. %7$n.
Notice that we need a backslash (\) before the $ symbol. Without it, Bash would treat $n as a shell variable and expand it (most likely to an empty string) before our program ever sees the argument. Aaand we've successfully done our first arbitrary memory write!
Never print a buffer by passing it directly to printf like that. Always, always use the string conversion specifier and pass the buffer as its argument: printf ("%s", buf);. It's that simple!
If any of you find this a bit hard to understand, not to worry. Exploitation isn't an easy topic, especially if you've just started out, so just keep working at it. If anything, what you should take away from this is how it can be prevented.
This tutorial has only shown us what the vulnerability is capable of and there are many more tricks we can do when we open up the executable a bit more.