Security-Oriented C Tutorial 0x18 - Malloc and the Heap
Hey guys, in this tutorial, we are going to learn about the heap segment and how to use it for storing data in our programs. We will also go into the details of its use in conjunction with the char pointer and struct data types.
We've already discussed the stack in a previous tutorial, only briefly mentioning the heap. I gave a diagram of memory as a visual aid, and in it we can see that the heap sits near the top, at the low-address end. Here it is again.
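In that diagram the heap starts in low memory and grows down the page, that is, toward higher addresses, while the stack grows from high memory toward lower addresses. Having the two grow toward each other maximizes the available memory between them, but we already know this. So what could possibly be stored in the heap if our variables are stored on the stack? There's nothing wrong with the stack, but it only suits data whose size is known in advance. The purpose of the heap is to store dynamically allocated data.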
On the stack, each variable is given a fixed amount of memory to hold its information, so it cannot request more space later. A heap allocation, on the other hand, is sized at run time and can later be expanded or shrunk according to the needs of the program. However, you must know that the heap's arrangement is not quite the same as the stack's, where it's all nice and tidy, everything side by side, arrays' data all snugly placed together, no. It's a rather jumbled battlefield where blocks of different sizes are requested and released throughout the program's lifetime, so everything ends up everywhere. You could have one piece of data in one little area and another many addresses away down memory lane. To keep everything in check and not lose anything, the allocator places special little helpers (bookkeeping metadata) alongside our data to record where each block is and how big it is. I will not get into this because it is a bit too advanced, but by all means, research it if you wish! Just know that the allocator keeps track of every block it hands out.
malloc is a function declared in the stdlib.h header which is responsible for dynamically allocating memory. It returns a pointer to the allocated memory, so we need a pointer variable to hold it. In our code, we use the char pointer, string, to store the location of a 1024-byte block of allocated memory (kind of like an array). You should check the return value of malloc: if the allocation fails it returns NULL, and trying to read or write through that pointer could crash your program with an access violation. I did not include the check here because I did not want to clutter the example code.
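Since the example leaves the error check out, here is a minimal sketch of what it could look like. The helper name alloc_buffer and the BUF_SIZE macro are my own placeholders, not part of the original listing.

```c
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 1024  /* same size as the buffer described above */

/* Hypothetical helper: returns a heap buffer of `size` bytes holding an
 * empty string, or NULL if the allocation could not be made. */
char *alloc_buffer(size_t size)
{
    if (size == 0) {
        return NULL;            /* nothing sensible to allocate */
    }

    char *buf = malloc(size);
    if (buf == NULL) {          /* malloc reports failure by returning NULL */
        perror("malloc");
        return NULL;
    }

    buf[0] = '\0';              /* start out as an empty C string */
    return buf;
}
```

A caller would write something like `char *string = alloc_buffer(BUF_SIZE);` and bail out early if the result is NULL, instead of touching the memory.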
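Using strcpy, we copy the string from argv into string and then print it out. Again, we should check that argc is greater than 1, otherwise argv[1] is NULL and strcpy would throw us a seg fault. Also keep in mind that strcpy does no bounds checking, so an argument longer than 1023 characters would overflow our 1024-byte buffer.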
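To show both checks together, here is a rough sketch. The function copy_first_arg is a hypothetical name of mine, and I have swapped strcpy for snprintf so the copy is bounded; the original example uses strcpy.

```c
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 1024

/* Hypothetical helper: copies argv[1] into a fresh heap buffer.
 * Returns NULL if there is no argument or the allocation fails.
 * The caller owns the result and must free() it. */
char *copy_first_arg(int argc, char *argv[])
{
    if (argc < 2) {
        return NULL;                     /* argv[1] would be NULL */
    }

    char *string = malloc(BUF_SIZE);
    if (string == NULL) {
        return NULL;                     /* allocation failed */
    }

    /* snprintf writes at most BUF_SIZE bytes including the terminator,
     * so an over-long argument is truncated instead of overflowing. */
    snprintf(string, BUF_SIZE, "%s", argv[1]);
    return string;
}
```

From main you would call `copy_first_arg(argc, argv)`, print the result if it is not NULL, and free it afterwards.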
After this, we need to free the memory so that the allocator can reuse it for later requests and so that we don't leak it. As long as we have not "lost" the pointer to the dynamically allocated memory, we can safely release it.
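One habit that pairs nicely with free is resetting the pointer afterwards, so a stale address is not used or freed a second time by accident. The helper below, release, is a hypothetical name of mine and only a sketch of the idea.

```c
#include <stdlib.h>

/* Hypothetical helper: frees the block *ptr points to and then clears
 * the caller's pointer so it no longer holds a stale address. */
void release(char **ptr)
{
    if (ptr == NULL) {
        return;          /* no pointer variable was given */
    }
    free(*ptr);          /* free(NULL) is defined to do nothing */
    *ptr = NULL;         /* a second release() is now harmless */
}
```

With `char *string = malloc(1024);` you would call `release(&string);`, after which string is NULL.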
Output looks good!
We can see that string acts just like a normal char array. But why would we need this? If we don't know the amount of memory we will need beforehand, malloc allows us to start with an initial size which can be expanded later where necessary (with realloc), giving us a "heap" of freedom. Ha...ha... We cannot just reserve a fixed array of a billion bytes on the stack just in case; that's inefficient, a waste of resources, and likely more than the stack can hold.
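The tutorial's code does not show the expanding part, so here is a small sketch of it using realloc from stdlib.h. The function grow_buffer is a hypothetical helper of mine.

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *buf to new_size bytes.
 * Returns 0 on success, -1 on failure. On failure *buf is left
 * untouched and is still valid, so nothing is leaked or lost. */
int grow_buffer(char **buf, size_t new_size)
{
    if (buf == NULL || new_size == 0) {
        return -1;
    }

    /* Keep the result in a temporary: if realloc fails it returns NULL,
     * and overwriting *buf directly would lose our only pointer. */
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL) {
        return -1;
    }

    *buf = tmp;          /* the block may have moved to a new address */
    return 0;
}
```

The existing contents are preserved up to the smaller of the old and new sizes, so a string already stored in the buffer survives the resize.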
Structs can hold a lot of information, since their size is at least the sum of the sizes of all their members. Because of this, it's wasteful of both time and memory to keep copying a whole struct around by passing it to and returning it from functions. What's the solution?
Back to our previous struct code, we have now defined a new type for struct _person, pPerson, which is a pointer to the struct. Also notice that we've changed the return type of newPerson and Moe's type to pPerson, and that we do not need to put a * between the type and the variable name since pPerson is already defined to be a pointer.
In function newPerson, we can see that we've declared a variable p as a pointer to some dynamically allocated memory with the size of a struct _person using the sizeof operator. We then assign each of the members with the new arrow operator (->). The arrow operator essentially translates to the dereferencing of the pointer to the struct and then accessing the member with the usual period (.) operator, i.e. p->firstName would be equivalent to (*p).firstName. When we're finished with initializing, we return it as we would normally.
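Here is a minimal sketch of that pattern. The names struct _person, pPerson, newPerson and firstName come from the text above; the other members (lastName, age), the NAME_LEN size and the parameter list are assumptions of mine for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define NAME_LEN 32                 /* assumed field size */

typedef struct _person {
    char firstName[NAME_LEN];
    char lastName[NAME_LEN];        /* assumed member */
    int  age;                       /* assumed member */
} person, *pPerson;                 /* pPerson is "pointer to struct _person" */

/* Allocates one struct _person on the heap and fills it in.
 * Returns NULL if malloc fails; the caller must free() the result. */
pPerson newPerson(const char *firstName, const char *lastName, int age)
{
    pPerson p = malloc(sizeof(struct _person));
    if (p == NULL) {
        return NULL;
    }

    /* Bounded copies, so over-long names are truncated. */
    snprintf(p->firstName, sizeof(p->firstName), "%s", firstName);

    /* (*p).member is the long-hand spelling of p->member. */
    snprintf((*p).lastName, sizeof((*p).lastName), "%s", lastName);

    p->age = age;
    return p;                       /* only the pointer is returned */
}
```

In main this would be used as `pPerson Moe = newPerson("Moe", "Example", 34);`, followed by a NULL check, the printing, and `free(Moe);`.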
In main, we assign Moe the pointer to the new dynamically allocated struct (we have just passed along the reference to the memory, so we did not "lose" it since Moe now knows where to find it). After printing out the data, we call free to hand the memory back to the allocator so it can be reused.
The difference between this code and our previous code is that instead of copying an entire struct across memory from newPerson into main (however big struct _person is), all we are doing now is passing a reference, or a pointer, which is a measly 4 bytes on a 32-bit system (8 bytes on a typical 64-bit one). Much more efficient.
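If you want to see the size difference on your own machine, a quick sketch like this will print both. The struct layout is the same assumed one as before (only firstName is named in the text), and print_sizes is a hypothetical helper; the exact numbers depend on your platform and compiler.

```c
#include <stdio.h>

#define NAME_LEN 32                 /* assumed field size */

typedef struct _person {
    char firstName[NAME_LEN];
    char lastName[NAME_LEN];        /* assumed member */
    int  age;                       /* assumed member */
} person, *pPerson;

/* Prints how many bytes a whole struct takes versus a pointer to it. */
void print_sizes(void)
{
    printf("sizeof(person)  = %zu bytes\n", sizeof(person));
    printf("sizeof(pPerson) = %zu bytes\n", sizeof(pPerson));
}
```

Returning a person by value copies the first amount; returning a pPerson copies only the second.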
Sweet, works perfectly!
If you want to check whether everything on the heap was freed properly, there is a tool called Valgrind which tracks memory allocations and reports any that were never released. Here it is in action with our program.
Looks like we're in the clear!
Dynamic allocation is a great method for allocating an unknown or a large amount of data. Using malloc is especially handy when trying to move data around memory, because all that gets passed is a pointer-sized reference (4 bytes on a 32-bit system, 8 on a typical 64-bit one) to a single block of memory, without the need to copy the entire chunk in and out of functions, saving us time and resources. Since there is no native garbage collection in C, we are required to free dynamically allocated memory when we are finished with it, otherwise it leaks and stays unavailable for reuse until the program exits. As long as you have a pointer somewhere still pointing to the memory, it is still reachable and can be released.
In the next tutorial on functions, we will discuss the nature of using references for efficient parameter passing. And as we have now covered the pre-requisite material to empower our little virus, head on over to see what's up.