An Introduction to Steganography & Its Uses
It has been a long while since I last came here to write an article. Graduate school keeps you busy. After I looked over what I had written previously, I decided that I should introduce another fun topic from cryptography. In this case, the topic is steganography.
In this article, I will discuss what steganography is, why it is important, some theory involving steganography, and what it is used for.
Steganography is the study of embedding and hiding messages in a medium called a covertext. Steganography is related to cryptography and is just about as old. It was used by the Ancient Greeks, who hid a message by tattooing it on a messenger's shaved head and then letting the hair grow back. Simply put, steganography is as old as dirt.
The basic idea behind cryptography is that you can keep a message secret by encrypting it so that no one without the key can read it. If a good cryptographic cipher is used, it is likely that no one, not even a government entity, will be able to read it.
The picture above illustrates this point. However, sometimes merely communicating in secret can trip alarms and make others suspicious. Therein lies the double edge of cryptography. While an encrypted message may very well be unbreakable by all available standards, it is easy to spot and flag.
The simple fact is that an encrypted message resembles nothing but an encrypted message. Once a third party determines that you are communicating in secret, they may feel compelled to force you or the person you are communicating with to tell them what you are hiding.
This is where steganography comes in. Unlike cryptography, which hides the content of a message, the purpose of steganography is to hide the fact that a message exists at all. All steganography requires is a covertext, which is where the data will be hidden; a message, which is the data to hide; an algorithm that decides how to hide the data; and, frequently, a key that is used to randomize the placement of the data and perhaps even to encrypt it.
The above diagram depicts how a steganographic algorithm works during the embedding process. First, the data that is being passed from one person to another is encrypted (this step is optional, but highly recommended). Then the data is embedded into a covertext. This is done according to the embedding algorithm and a secret key that controls where and how the embedding process places the data (the key is also optional, but highly recommended). This process outputs a steganogram that has the information hidden inside.
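The embedding process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cover is treated as a plain run of bytes, the message is assumed to be encrypted already, and the function names (keyed_positions, embed, extract) are made up for this example. The key seeds Python's random module to choose where the bits go, which shows the idea but is not cryptographically strong.

```python
import hashlib
import random


def keyed_positions(key: bytes, cover_len: int, count: int) -> list[int]:
    """Pick `count` distinct cover positions in an order derived from the key."""
    seed = int.from_bytes(hashlib.sha256(key).digest(), "big")
    # random.Random is NOT a secure generator; it only illustrates the idea.
    return random.Random(seed).sample(range(cover_len), count)


def embed(cover: bytes, message: bytes, key: bytes) -> bytes:
    """Hide `message` in the lowest bit of key-selected cover bytes."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("message too large for this cover")
    stego = bytearray(cover)
    for pos, bit in zip(keyed_positions(key, len(cover), len(bits)), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return bytes(stego)


def extract(stego: bytes, length: int, key: bytes) -> bytes:
    """Recover `length` bytes using the same key (and so the same positions)."""
    positions = keyed_positions(key, len(stego), length * 8)
    bits = [stego[pos] & 1 for pos in positions]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


cover = bytes(range(256)) * 4          # stand-in for real cover data
stego = embed(cover, b"hello", b"shared secret")
print(extract(stego, 5, b"shared secret"))
```

Without the key, an observer does not know which bytes carry the message or in what order to read them. The receiver in this sketch also needs to know the message length, which a real scheme would typically embed as a header.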
Returning to the original problem, how do we know if we can communicate in secret? Call the adversary watching the channel Oscar. In this particular problem, what matters is not whether or not Oscar knows what we are saying. What constitutes a break is whether or not Oscar knows we are communicating in secret at all. Once he knows we are communicating in secret, he can take countermeasures such as rubber-hose cryptanalysis (coercing us into revealing the message) or simply destroying the embedded data, thus preventing us from communicating.
The problem ultimately reduces to whether or not Oscar can reliably tell the difference between a normal message and an embedded message. This is important to consider because Oscar does not have the unlimited manpower needed to interrogate everyone, and widespread destruction of data in transit would grind information transfer to a halt.
So ultimately Oscar needs to catch the largest number of offenders (or have the best chance of finding any one offender) while keeping the number of false alarms low.
This means that we can define a secure steganography algorithm. An algorithm is secure if Oscar cannot tell the difference between a normal message and a steganogram (computational indistinguishability). The research on this topic is rather new, with most of the work having been done in the last decade. I would suggest this paper if you wish to learn more about computational indistinguishability.
I am a big fan of practical definitions, so for this article, and any others I may write in the future, I will use this definition for a secure system. A steganographic system is secure if, and only if, the value to Oscar in detecting a given number of steganographic messages is exceeded by the costs associated with investigating them and the false alarms.
This is a harder definition to imagine than my definition for secure cryptography. For one, what is the value of finding a steganographic message? It is easy to put a price on much of the information traded on the internet now, like credit card numbers or intellectual property, but it is harder to judge the value of knowing who is communicating with whom.
Nevertheless, this is still an intuitive definition because Oscar will not put forth the effort to uncover the identities of those who are communicating if it is not worth it.
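To make the practical definition concrete, here is a small worked example. Everything in it is an assumption chosen for illustration: the function names, the idea that every flagged message costs the same to investigate, and all of the numbers.

```python
def oscar_net_gain(n_stego: int, n_clean: int, detect_rate: float,
                   false_alarm_rate: float, value_per_hit: float,
                   cost_per_investigation: float) -> float:
    """Expected value to Oscar minus his expected investigation costs."""
    hits = n_stego * detect_rate                # steganograms correctly flagged
    false_alarms = n_clean * false_alarm_rate   # innocent messages flagged
    value = hits * value_per_hit
    # Assumption: every flagged message, guilty or not, costs the same to check.
    cost = (hits + false_alarms) * cost_per_investigation
    return value - cost


def is_secure(**traffic) -> bool:
    """Secure under the definition above: Oscar's costs exceed his value."""
    return oscar_net_gain(**traffic) < 0


# Hypothetical traffic: 100 steganograms hidden among a million messages.
traffic = dict(n_stego=100, n_clean=999_900, detect_rate=0.9,
               false_alarm_rate=0.01, value_per_hit=1_000.0,
               cost_per_investigation=50.0)
print(is_secure(**traffic))
```

With these made-up numbers, a detector that catches 90% of steganograms still loses money for Oscar, because a 1% false alarm rate across nearly a million innocent messages swamps the value of the hits.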
The most commonly discussed form of steganography is embedding data in images. This is also the form that has been researched the most. While there are many types of algorithms, the three most common are LSB, DCT, and Append types. LSB stands for Least Significant Bit. It embeds data in the photo by replacing the least significant bit of each color value in an uncompressed image such as a BMP.
You can think of the least significant bit as the ones place. Because it has the smallest effect on the amount of a color, replacing this bit with a bit from the hidden data has the smallest possible effect on the picture. The more low-order bits replaced per color value (the bit "depth" used) and the larger the image, the more data can be stored in the photo.
However, the more bits that are replaced, the more obvious the alterations will appear to both a statistical inspection and a visual inspection.
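The trade-off between capacity and visibility is easy to see in code. The following is a minimal sketch, assuming the image has already been decoded into a flat run of color bytes; the function names are hypothetical and no real image library is involved.

```python
def capacity_bytes(n_color_bytes: int, depth: int = 1) -> int:
    """How many payload bytes fit when `depth` low bits of each byte are used."""
    return n_color_bytes * depth // 8


def lsb_embed(pixels: bytes, payload: bytes, depth: int = 1) -> bytes:
    """Replace the lowest `depth` bits of each color byte with payload bits."""
    if not 1 <= depth <= 8:
        raise ValueError("depth must be between 1 and 8")
    bits = [(b >> i) & 1 for b in payload for i in range(7, -1, -1)]
    bits += [0] * (-len(bits) % depth)      # pad to a whole number of groups
    groups = [bits[i:i + depth] for i in range(0, len(bits), depth)]
    if len(groups) > len(pixels):
        raise ValueError("payload too large for this image")
    keep_mask = (0xFF << depth) & 0xFF      # high bits that are left alone
    out = bytearray(pixels)
    for i, group in enumerate(groups):
        value = 0
        for bit in group:
            value = (value << 1) | bit
        out[i] = (out[i] & keep_mask) | value
    return bytes(out)


def lsb_extract(pixels: bytes, n_bytes: int, depth: int = 1) -> bytes:
    """Read back `n_bytes` of payload from the low bits of the color bytes."""
    n_groups = (n_bytes * 8 + depth - 1) // depth
    bits = []
    for byte in pixels[:n_groups]:
        bits.extend((byte >> i) & 1 for i in range(depth - 1, -1, -1))
    bits = bits[:n_bytes * 8]               # drop any padding bits
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


pixels = bytes([200] * 64)                  # stand-in for decoded color values
for depth in (1, 2, 4):
    stego = lsb_embed(pixels, b"secret", depth)
    worst = max(abs(a - b) for a, b in zip(pixels, stego))
    print(depth, capacity_bytes(len(pixels), depth), worst)
```

Each extra bit of depth raises the capacity, but it also raises the largest possible change to a color value (up to 2^depth - 1), which is why deeper embedding is easier to notice.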
DCT stands for Discrete Cosine Transform. This approach works on many image formats, most notably JPEG, which already stores images as DCT coefficients. It works by calculating the "frequencies" of the image and then altering some of them to carry the data. DCT algorithms are more subtle in the way they manipulate photos and so are harder to detect. Note that larger transformations (due to more embedded data) will make the manipulations more obvious.
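As a rough illustration of the frequency-domain idea, the sketch below hides one bit in a single row of eight samples. It is a simplification under several assumptions: real JPEG-based tools work on two-dimensional 8x8 blocks with format-defined quantization tables, while this uses a one-dimensional transform, one hypothetical quantization step Q, and one arbitrarily chosen coefficient. Storing the bit in the parity of a quantized coefficient is one simple choice, not a description of any particular tool.

```python
import math

N = 8    # samples per block
Q = 10   # hypothetical quantization step, chosen only for this example


def _scale(k: int) -> float:
    return math.sqrt((1 if k == 0 else 2) / N)


def dct(samples: list[float]) -> list[float]:
    """Orthonormal 1-D DCT-II of N samples."""
    return [
        _scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                        for n, x in enumerate(samples))
        for k in range(N)
    ]


def idct(coeffs: list[float]) -> list[float]:
    """Inverse of dct() above (an orthonormal DCT-III)."""
    return [
        sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]


def embed_bit(samples: list[float], bit: int, index: int = 3) -> list[int]:
    """Quantize the coefficients and store `bit` in the parity of one of them."""
    quantized = [round(c / Q) for c in dct(samples)]
    quantized[index] = (quantized[index] & ~1) | bit
    return quantized   # these integers are what a file would store


def extract_bit(quantized: list[int], index: int = 3) -> int:
    return quantized[index] & 1


def decode(quantized: list[int]) -> list[int]:
    """What a viewer would reconstruct from the stored coefficients."""
    return [round(v) for v in idct([q * Q for q in quantized])]


row = [52, 55, 61, 66, 70, 61, 64, 73]
stored = embed_bit(row, 1)
print(extract_bit(stored), decode(stored))
```

Because the bit lives in a stored coefficient rather than in a pixel, the change is spread across all eight reconstructed samples instead of landing on one of them.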
Just about the worst of these algorithms is the class of Append algorithms. Rather than hide the data in the photo by manipulating the picture, they instead append the data to the end of the file as padding. In this manner, the data is hidden and never read by any photo displaying program. The good things about these algorithms are that they are simple to program and that they are immune to visual inspection of the picture (the displayed photos are identical).
However, these algorithms will change the size of the file. If the hidden data is large enough, the file size itself can be a dead giveaway. Further, the extra data sitting after the end of the image can be spotted easily by steganalysts looking for steganograms. If an Append algorithm must be used, definitely encrypt the data beforehand. If it is not encrypted first, a simple program that looks for text strings would be sufficient to find the files containing this data.
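Both halves of this, the trivial embedding and the equally trivial detection, fit in a short sketch. It assumes a PNG cover, whose layout (an 8-byte signature followed by chunks, ending with an IEND chunk) makes the true end of the image easy to find. The function names are made up for this example, and the stand-in cover is just a signature plus an IEND chunk rather than a displayable picture.

```python
import re
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
IEND_CHUNK = b"\x00\x00\x00\x00IEND\xaeB`\x82"   # zero-length final chunk


def append_payload(image: bytes, payload: bytes) -> bytes:
    """The whole 'algorithm': put the payload after the end of the image."""
    return image + payload


def png_end_offset(data: bytes) -> int:
    """Walk the PNG chunks and return the offset just past the IEND chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 12 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 12 + length          # length + type + data + CRC
        if chunk_type == b"IEND":
            return pos
    raise ValueError("no IEND chunk found")


def trailing_data(data: bytes) -> bytes:
    """Anything after the image's end is invisible to viewers, but suspicious."""
    return data[png_end_offset(data):]


def printable_strings(data: bytes, min_len: int = 6) -> list[bytes]:
    """Runs of printable ASCII, similar in spirit to the Unix `strings` tool."""
    return re.findall(rb"[ -~]{%d,}" % min_len, data)


cover = PNG_SIGNATURE + IEND_CHUNK   # stand-in cover, not a real picture
stego = append_payload(cover, b"meet at the usual place")
print(len(trailing_data(cover)), trailing_data(stego))
print(printable_strings(stego))
```

An unencrypted payload shows up in both checks. Encrypting it first defeats the string search, but the trailing bytes after the end of the image are still there for anyone who looks.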
Now, while it is true that almost all research on this subject focuses on photos, there are algorithms that can hide data in sound files, executables, zip files, and even network communication. The last one is particularly interesting to those who desire to communicate secretly with "ratted" servers (servers infected with a remote access trojan) or even botnets.
The uses of steganography are as varied as the uses of communication itself. Obviously you can use it to send secret messages to a friend, colleague, or co-conspirator. You can use it to transport sensitive data from point A to point B such that the transfer of the data is unknown.
As this website points out, it can also be used in network topologies. This is particularly useful for covert communication with botnets and other systems under a hacker's control. It could also be used to further obfuscate the origin and endpoint of data, because some procedural packets are simply very common and frequently ignored. It can take a well-trained malware analyst anywhere from hours to weeks to find when and how a system was compromised from a packet dump. A well-designed network steganographic program may be able to go undetected for even longer.
Once again, thank you for reading this post. I look forward to your comments. I plan on another post solely on the topic of detecting steganograms and countermeasures that can be taken to prevent detection.