Security-Oriented C Tutorial 0x21 - Linked Lists
Welcome to the final tutorial of the series on standard C. This article will cover the linked list abstract data type (ADT). There will be a lot of abstraction to try to deliver the understanding in the most basic way for easiest interpretation of what they are and how they work, then we will get into the guts of it and learn the technical code underneath. For those who have yet to grasp the concept of pointers, it's advisable that you do that first before approaching this. Having learned this in my beginning stages, it was an incredibly frightful and excruciating experience so if you feel like you are entirely lost and have absolutely no idea about what is going on, then that's acceptable, you'll just need to buckle down and try your best. I will also try to the best of my ability to make this as comprehensible as I can. Here we go, and good luck!
As previously stated, a linked list is an abstract data type, a data structure which resembles something like a conga line, a linearly arranged set of data closely representing the array data type. The linked list consists of objects called nodes, like the elements of an array, and they have two sections: the data and pointer(s). The data portion contains any amount of any type of data such as a string, an int value, whatever. The pointer(s) part holds a pointer value to other nodes if they exists. We will be only using one pointer to point to the next node. Here is what a linked list looks like:
We can see the two halves of the nodes containing the data and the pointer to the next node. We will be using the X to show the end of the list, i.e. the last pointer will point to NULL.
These nodes might seem familiar from the description of the data half, in fact, nodes are structs and with structs, we can group specific data together into a user-defined data type.
Here is the basis of a linked list structure. We have defined the node struct and we have also defined a struct node pointer data type called link, i.e. link is a pointer to a struct node. Item is defined as an integer value and will be the data section of the node. We've also added a different struct _list which will point to the first node.
Before writing any more code, we need to understand another property of linked lists. You may have already noticed when I defined pointers to these structs that they are typically created within the heap. Linked lists are usually dynamically allocated on runtime where the user may require a variable amount of nodes to hold a variable amount of data, declaring a new or destroying nodes as required. This is something that arrays cannot do as they are statically declared on the stack and can only hold as much data as it was given. Using pointers to reference the lists, we can pass them around efficiently without much resources, especially if the lists are long and contain a lot of information.
An application using this method can be seen in scoreboards (like in online games) where when a new player joins the room, a new node can be allocated for them and then appended onto the list. If a player leaves the room, their node can be destroyed and their data can be freed.
Notice that data leakage can be a real issue here and any implementation of linked lists MUST be managed carefully. You must have your priorities straightened out and understand what is going on in your list, otherwise you will be losing pointers to heap allocated data and you will not be able to reach them in order to free them. If you're not someone who plans out code before writing it, I would highly advise you to do so when dealing with this.
Continuing onto our code, similar to a previous tutorial on Malloc and the Heap, we will need to define a function to create a new node to make things easier for us. We will also need a function to help initialize the list with a head node.
We see in both functions that the new struct is declared with malloc with the size of the struct by dereferencing the pointer to the struct. We also need to check if the malloc call was successful which is where the assert call comes in handy. If when a malloc call fails on runtime, the assert function will abort our program and tell us which line the error occurred.
For our head node, we initialize the first pointer to the first node as NULL because we have yet to point it to a node. For our normal node, we need to initialize the item to the item value passed in as an argument and we need to set the next pointer to NULL as we have yet to point it to an existing node.
Let's try and make a single-node list.
We declare a new list with a variable head to point to the beginning of the list and one node called node with a data of 42. We set head's first pointer equal to the address of node which essentially means that head now points to node like so:
We initialized node's pointer to NULL by default when calling newNode and since we only have one node, node points to NULL meaning it's the end of the list.
We also need to make sure that our heap variables are freed using free to release the data for other use.
Since linked lists are heap objects, we should check the status of the heap after our program finishes. We can do this using the Valgrind tool we discovered back in Tutorial 0x18 - Malloc and the Heap.
In the blue box, we can see that head is at address 0x51fc040 and it points to the address 0x51fc090. Our node is at address 0x51fc090, has the item 42 and points to NULL. Our head points to node and node points to NULL so everything went according to plan. We've also freed all of our heap data which is a good sign.
Now that we've covered the absolute basics on linked lists, let's try to define more functions to help us. First, we will do a freeList function to free all the nodes of a list and then the head node. How will we approach this? Let's start a plan and sort our priorities.
We need to be careful in this stage because we do not want to lose any pointers to nodes as if they were the strings on balloons. If we let any of them go, we will lose the node and every other node attached to it.
What we can do is use a temporary pointer to a node to hold onto a node so we can still reach it. This is how it should be done:
As you can see, prioritizing is a must when dealing with linked lists. If something small were to go wrong, we could be losing nodes unexpectedly. Let's see this in code.
We use a temporary pointer variable temp to point to the first node. After setting the head's first pointer to the node after temp's node, we can safely free the first node without losing any nodes since we have secured the second node with the head's first pointer. We can also free temp's node as the same time because we can still reach it with the temp pointer. Using temp as an iterator, we can repeat this process until all the nodes have been freed which then allows us to finally free the head itself.
Let's try another function to help us append a node to the end of the list called appendNode. The function will take two arguments, the head and a node. But how do we append a node onto a list? Well we'll need to access the last node and then set it's next pointer to point to the node we want however, we cannot directly reach the last node so we will have to traverse the list until we can. One last thing, we need to check if the list has a first node. If it does not, we simply set head's first equal to the node.
Now that we have these functions, let's try and make a linked list of items by parsing command line arguments like so:
./list [1st NUMBER] [2nd NUMBER] [3rd NUMBER] ...
Let's see the code with everything put together.
Let's see it all in action.
Everything worked perfectly! All heap memory was freed as well.
If you want to implement more functions, please go ahead and do so! If you need some ideas, try these:
- deleteNode - delete a node in a specified position,
- prependNode - place a node at the first position,
- removeItems - remove all nodes which have a specified item.
If you want to go even further, try implementing a sort function!
You can also customize your list by adding a prev pointer to point to the previous node, or add the number of nodes in the head struct or add a last pointer to point to the last node.
The linked list ADT is a great way to create a dynamic version of an array. We can add or destroy nodes on demand just by freeing and moving pointers around without the need to shift elements like we need in arrays.
Furthermore, the linked list is a form of basic implementation of a tree or graph. If we expand on our struct, we can create these more advanced data structures and do some real neat things. If you're wanting to go further than just lists and create these higher-level ADTs, you'll need to do research on your own from here on because this is it for the standard C series, hope you've all enjoyed and learned something and possibly go out and venture deeper into the language!