Before I continue with strings, we first need a basic understanding of how memory works: what it is, what data looks like in memory, and so on. This is crucial when we analyze vulnerabilities and exploitation. I highly suggest reading the following article with a clear, focused mind, because it may prove confusing. Also, if you don't understand something, clear up your doubts before moving on; otherwise you may struggle when we get to buffer overflows. I will do my part and try to explain the content clearly.
Basics of Memory
Here is a generalized representation of what memory looks like.
Low memory is at the top and high memory is at the bottom. Memory is sometimes drawn the other way around, with low memory at the bottom, but we will use this version throughout the course. As you can see, there is data at both ends of memory. Near the low end is something called the "heap", which we will not discuss now. At the high end is the "stack", which holds the data for function calls, such as return addresses and local variables. Notice that each section grows toward the middle, so that both have as much room as possible to expand. The whole thing looks like a huge array, and that's because it kind of is one. How convenient! We've already done arrays! Each box is like an element, but instead of an index, each location is identified by a memory address, just like your house has an address.
The Stack
As stated above, the stack holds program data such as function call information and local variables. We are going to look at how variables are stored on the stack, but first we need some details about the stack itself.
The stack is a "last in, first out" (LIFO) structure. A good comparison is a Pez dispenser: the last candy you load is the first one to come out. On 32-bit x86 CPUs, two registers keep track of the stack: EBP (Base Pointer) points at the base of the current function's stack frame, and ESP (Stack Pointer) points at the top of the stack. We will use ESP as our starting point to see what data in memory looks like.
Are you ready? Let's do this.
Example Code
We have declared two variables: an int and a char array. For the following memory analysis, I will be using GDB, the GNU debugger. Here are the flags I passed to GCC to build the program:
- -m32 - build a 32-bit executable (I'm on a 64-bit system, so I need this)
- -gdwarf-2 - include debugging information in DWARF version 2 format so that GDB can map the program back to its source
Memory Analysis
Okay, wow! That's a lot to take in at once, but don't worry, we will go through all of it. Remember, if anything is unclear, ask someone for help!
Some prior knowledge before starting:
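- In GDB's x command, a "word" (the w unit size) is 32 bits, or 4 bytes. Intel's own manuals call a 32-bit value a "doubleword", so don't be surprised if you see that term elsewhere.
- Memory addresses are conventionally written as hexadecimal values.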
Let's begin with the blue section. As mentioned in the description of the stack, the ESP register points to the top of the stack. Because our data is stacked like plates, if we examine memory starting at ESP and move toward higher addresses, we will eventually reach our variables. Note that the top of the stack is at a lower address than the bottom (0xffffd080 < 0xffffd0c0). Now we can locate our variables.
Moving on to the red section, we can see 0x0000000a, which is hexadecimal for the decimal number 10. To make it clearer, I also printed it as a decimal value one word in size, which is appropriate because an int is 4 bytes here.
Onto the green section. If you look at the stack, you can see "0x54" printed at the front (left) of its block, even though it is actually stored at the end of that block, at its highest address. Confused yet? "0x54" is hexadecimal for the letter "T", but why didn't I include the rest of that block? The other bytes of that word are stored before the string starts, so they aren't part of it. The reason 0x54 still shows up on the left is "endianness": x86 is little-endian, so within each 4-byte word the least significant byte is stored at the lowest address, and GDB's word display prints the bytes of each word in reverse memory order. I will cover this in more detail later. The values in the green boxes are the characters of the string, where every two hex digits represent one byte (one char). Reading them is a bit funny: within each block you start at the right-hand side and read backwards, but across the green area as a whole you move from left to right. Try decoding the hex values and writing them down; you will see what I mean. I will explain hexadecimal in another tutorial.
In the bigger green box, I located the address of the string (p &str) and then printed 17 characters, one byte at a time (x/17cb 0xffffd0b8), so now it's human-readable and easier on your eyes. Notice that the buffer is 17 bytes, including the null terminator (\0). The number on the left is the decimal value of the character on the right. But wait, characters are numbers? Yes, they are! Computers don't understand what letters are... What do you think they are, humans? They only store numbers, and an encoding such as ASCII maps each number to a character. If you want something to clear this up, head over to AsciiTable for a table of different representations of characters.
Conclusion
Okay, I think that's enough content for one tutorial! I hope you haven't decided to give up yet! Everything will become clear when I go over these topics in future tutorials, so just relax. Take a break if you're mentally exhausted from all of this random spewage of nonsense. But hey, at least we can now go over strings and buffers! I know you've been waiting a while! Stay tuned!
dtm.
Comments
When I try to compile the program, I get the error shown in the attached screenshot. I googled it and tried running "apt-get install libc6-dev-i386", but it didn't work. Can someone help me figure out what's wrong? (Any help is greatly appreciated.)
Hey there, Black Heart,
Can you try to find and install a gcc-multilib package?
dtm.
You can use the "-g" flag instead of "-gdwarf-2".
I tried running "apt-get install gcc-multilib", but it says "unable to fetch some archives", so I guess the links are different now?
Can you try this:
sudo apt-get install --reinstall gcc-4.8-multilib
If 4.8 doesn't work, try 4.7.
I tried both 4.7 and 4.8, but I get a 404 error. I'm really confused about what's wrong here.
Okay, I have no idea, it doesn't work on my Kali either. Probably something to do with missing repositories.
Maybe you can find someone who knows something about installations because I'm not familiar with this.
OK.
I googled it and found something saying I need to install the 32-bit libraries. I tried installing them, but no luck yet.
P.S. I posted a thread about this so that anyone else with the same problem can benefit in the future.
Hello there, I am so excited to be learning from this tutorial. Many thanks to DTM for this amazing effort. I just want to share a similar problem I had when building the 32-bit executable.
root@kali:~/Desktop/c_files# gcc -m32 -gdwarf-2 memory.c
In file included from /usr/include/stdio.h:27:0,
from memory.c:1:
/usr/include/features.h:364:25: fatal error: sys/cdefs.h: No such file or directory
# include <sys/cdefs.h>
^
compilation terminated.
The fix in my case was:
sudo apt-get install --reinstall gcc-6-multilib
(gcc-6 because I am using gcc 6.2.0 on Kali, kali-rolling)
root@kali:~/Desktop/c_files# sudo apt-get install --reinstall gcc-6-multilib
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
gdebi-core libcrypto++6 libfftw3-single3 python-pycryptopp
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
gcc-multilib lib32asan3 lib32atomic1 lib32cilkrts5 lib32gcc-6-dev lib32gcc1
lib32gomp1 lib32itm1 lib32mpx2 lib32quadmath0 lib32stdc++6 lib32ubsan0
libc6-dev-i386 libc6-dev-x32 libc6-i386 libc6-x32 libx32asan3 libx32atomic1
libx32cilkrts5 libx32gcc-6-dev libx32gcc1 libx32gomp1 libx32itm1
libx32quadmath0 libx32stdc++6 libx32ubsan0
The following NEW packages will be installed:
gcc-6-multilib gcc-multilib lib32asan3 lib32atomic1 lib32cilkrts5
lib32gcc-6-dev lib32gcc1 lib32gomp1 lib32itm1 lib32mpx2 lib32quadmath0
lib32stdc++6 lib32ubsan0 libc6-dev-i386 libc6-dev-x32 libc6-i386 libc6-x32
libx32asan3 libx32atomic1 libx32cilkrts5 libx32gcc-6-dev libx32gcc1
libx32gomp1 libx32itm1 libx32quadmath0 libx32stdc++6 libx32ubsan0
root@kali:~/Desktop/c_files# gdb -q ./a.out
Reading symbols from ./a.out...done.
(gdb) list
1 #include <stdio.h>
2
3 int main(void)
4 { int num=10;
5 char str[]="this is a string" ;
6
7 return 0;
8 }
(gdb)
Line number 9 out of range; memory.c has 8 lines.
(gdb) break 6
Breakpoint 1 at 0x597: file memory.c, line 6.
(gdb) run
Starting program: /root/Desktop/c_files/a.out
Breakpoint 1, main () at memory.c:7
7 return 0;
(gdb) i r esp
esp 0xffffd3b8 0xffffd3b8
(gdb) x/20xw 0xffffd3b8
0xffffd3b8: 0xffffd47c 0x565555d1 0x74faf3dc 0x20736968
0xffffd3c8: 0x61207369 0x72747320 0x00676e69 0x0000000a
0xffffd3d8: 0x00000000 0xf7e14276 0x00000001 0xffffd474
0xffffd3e8: 0xffffd47c 0x00000000 0x00000000 0x00000000
0xffffd3f8: 0xf7faf000 0xf7ffdc04 0xf7ffd000 0x00000000
(gdb) p &str
$1 = (char (*)[17]) 0xffffd3c3
(gdb) p &num
$2 = (int *) 0xffffd3d4
(gdb)
Well, I just wanted to share that I got the debugger running, so others can have some hope, LOL. Linux can sometimes be trickier than expected for newbies... like me. Thanks so much for your tutorials; I hope I can contribute more in the future.
Blinko
Hello DTM, I was working through your tutorial in the debugger and have a question I couldn't resolve.
When you say:
"How you read these blocks of hexadecimals is a bit funny. The data in each block starts at the right hand side and you read it backwards but for the entire green area, you read it normally from left to right."
I got this:
(gdb) x/20xw 0xffffd3b8
0xffffd3b8: 0xffffd47c 0x565555d1 0x74faf3dc 0x20736968
0xffffd3c8: 0x61207369 0x72747320 0x00676e69 0x0000000a
0xffffd3d8: 0x00000000 0xf7e14276 0x00000001 0xffffd474
0xffffd3e8: 0xffffd47c 0x00000000 0x00000000 0x00000000
0xffffd3f8: 0xf7faf000 0xf7ffdc04 0xf7ffd000 0x00000000
Since my 0x0000000a is in a different block than in your picture, it makes me wonder how data is arranged in memory; maybe it is not that clear-cut. My assumption is that, because the stack is LIFO, it first receives num=10 and stores it in memory, then receives str="this is a string" and stores it too, which would match my memory dump above.
Can you please explain why your 0x0000000a is at a lower address than mine? Many thanks!