Introduction to Modern Cryptography
Cryptography is the science of keeping secrets, or more specifically, the science of disguising them. As a point of fact, cryptography has progressed quite a bit farther and now encompasses file and message integrity, sender authentication, and pseudo-random number generators.
Most articles on cryptography skim over these facts and will usually only cover the most basic alphabetic substitution and permutation encryption algorithms. While I will mention those in the beginning, I will do so only so as to make the more modern concepts stick.
In this article, I will discuss cryptography's beginnings. Then, I will cover what it means for an encryption algorithm to be breakable and unbreakable. I will then cover iterated functions with special attention to the Substitution Permutation Network (SPN). Finally, I will mention both what encryption can do, and what it cannot do.
Note: All examples in this article will follow the following formatting.
- unencrypted code will be unbolded and not capitalized
- ENCRYPTED CODE WILL BE BOLD AND CAPITALIZED
All encryption algorithms are based upon the transformation and permutation of elements of information. The classic example of transformation by substitution is the Caesar cipher and its descendant, Vigenere. As an example, we could have the following sentence...
- the quick brown fox jumped over the lazy dog
This sentence can be encrypted with a Caesar cipher with a key of "b"...
- VJKS WKEM DTQY PHQZ LWOR GFQX GTVJ KNCB AFQI
Note that the letters have been incremented by two and the formatting removed. Vigenere works in a similar way, but instead of a one letter or number key, it uses several letters and numbers. For example, the same sentence is now encrypted with the keyword "disney"...
- XQXE ZHGT UFTV ROHL OTQY XRTU IAMV JKEI RRTF
Finally, we can use a transposition cipher. A transposition cipher does not change the letter's distributions. Instead, it simply scrambles their order. A rail fence cipher is one of the most famous of the simple transpositions. First you take the sentence you wish to encode and write them in a staggered formations like this...
- t e u c b o n o j m e o e t e a y o
- h q i k r w f x u p d v r h l z d g
Then you copy each row down...
- TEUC BONO JMEO ETEA YOHQ IKRW FXUP DVRH LZDG
In each case, the sentence is encoded so that you cannot simply read it. Against incurious eyes, this is enough. However, what if someone wishes to read it enough to try and break it?
Ultimately, information is not the letters nor the collections of bits. The information is the patterns these letters and bits make. Cryptanalysis is all about searching for patterns that may give hints to what the key is (or what the key isn't).
In essence, a cryptanalyst is looking for information from a set of data that has been encrypted, or meta data relating to the encrypted data. Much of this is out of the scope of this article, but I will probably cover it later.
Before I talk about the theory behind strength of encryption algorithms, I shall first use an example. Look at my first example using the Caesar cipher. Suppose the attacker (often named Oscar in many cases) knew that the words were encrypted with a Caesar cipher. He might try to brute force the answer by guessing several decryption keys...
- "a" uifr vjdl cspx ogpy kvnq fepw fsui fmba zeph
- "b" theq uick brow nfox jump edov erth elaz ydog
- "c" sgdp thbj aqnv menw itlo dcnu dqsg dkzy xcnf
The Caesar cipher is extremely vulnerable to a brute-force attack because it has a very small key space. A key space is the number of possible keys, usually represented with a capital K, subscript l, to denote number of values a 'piece' of the key may have, and a superscript n to represent the number of pieces per key. Common values for l are 2, 10, and 26. In the case of the Caesar cipher, its key space is K sub 26, which is equal to 26. That number is small enough that the Caesar can be brute forced by hand without a computer.
Other ways to gather information about the key are to look at patterns in the letters by counting the number of occurrences of individual letters and bigrams. Most languages have inherent patterns that are conserved through the encryption process. Indeed there are many ways to find patterns, but for this article, the most important thing to remember is not the how, but the fact that this is true.
Short answer, Yes. The long answer requires some information.
First, we must realize that for all n-grams (n pieces of information in a row), there is a certain probability attached to their appearance. Some are very common.
For example, the letter e is very common in English, the 'jge' operator is common in assembly, certain pitches of sound are more common in music, and so on. An n-gram can represent just about anything from words and sentences, to code and algorithms. Furthermore, these frequencies can be determined ahead of time if the attacker knows what he is looking for.
Second, we generally assume that a key is chosen randomly. That is to say, we assume that given knowledge of our previous keys chosen from our keyspace, the attacker cannot determine the net key that will be used.
Note that this is an assumption, and it is not always true. Indeed, some systems of encryption have been broken because of a bad number generator. Nevertheless, when analyzing the strength of a system, we assume information about the key can only be gleaned from the system itself and not underlying problems with implementation.
Now we can understand what it means to be unbreakable.
Suppose we have a set of n-gram probabilities and a randomly generated key. According to Shannon (the original information theorist), "A system is unbreakable if and only if, from any given ciphertext, the probability of that ciphertext being equal to a particular plaintext given any particular key, is equal to the probability of receiving that plaintext randomly."
What the heck does that crap mean?
Well lets go back to the Caesar cipher for a moment.
Suppose I sent you a letter like "e" and encrypted it with the randomly chosen letter "b". Suppose the attacker receives the text "g". The attacker has to somehow determine what letter I sent you from the letter I have.
He could consult his list of probabilities, but this gives him no other information about my tendency to send you the letter e than he already had. Indeed, had he only been aware of the fact that I had sent you only one letter, he would still be just as well off. It would not help to brute force the letters identity either, as this would simply give him a list of letters but no information on the probability of my having selected it.
The Caesar cipher is unbreakable in this case. In fact, the Caesar cipher is unbreakable for all plaintext lengths equal to 1, and the Vigenere cipher is unbreakable for all plaintext lengths equal to the keylength.
Shannon also had a definition for a strong cipher, but I shall substitute his definition for a simpler one.
A cipher is strong (though not unbreakable) if and only if, for all reasonable amounts of computational power and for all reasonable amounts of time, the probability that an attacker will break a particular code is less than the reciprocal of the value of breaking that code.
To make that clearer, suppose I am trying to determine whether or not my encryption scheme is good enough to protect my $1000 credit limit credit card number. The value an attacker would get from breaking my particular cards encryption is $1000. Let us also suppose the card expires (and loses all value) in two years.
Finally, suppose the attacker has a supercomputer. The encryption scheme I created would be strong, if after two years of the attacker trying to brute force, or analyze my system, the probability he happened to get the right key and hence get my card number is less than 1/1000.
This means that if the attacker knew in advance how long it would take, and that his chance of success is less than 1/1000, then he would know that for all two years of work, he could expect, on average, less than a dollar in value.
Obviously the time frame can change (in war, particular encrypted orders might only be useful for a month at most), the value can change (databases filled with credit cards for example are worth more), and what is considered reasonable in computational power will change depending on who the attacker is (governments have a lot more than the average hacker).
Now that I have covered what a strong cipher is, I shall cover a type of modern cipher that is both popular and strong (depending on which cipher is used).
This is one of the most popular types of symmetric block ciphers in use today (Feistel is another). Just as the name says, this type of cipher uses both substitution and permutation to change the plaintext into ciphertext. To understand how these work, you must first understand that these ciphers first break up the data into chunks.
Each chunk is made up of a certain number of bytes called a word. For an example, I shall make up a SPN that has a substitution section, a permutation section, and a whitening section. Suppose I have the hexadecimal code:
If I defined my word size to be 4 bits (1/2 byte), then I could split it up like this...
- a 2 5 8 f e 1 b
For the substitution phase, I shall add three to each one...
- D 5 8 B 2 1 4 E
Then I shall permute the data by moving each word two to the right and then switching the second word with the third, the fourth with the fifth, and so on until the eighth is switched with the first.
- 1 D E 8 5 2 B 4
Finally, I will take my key (0x01234567) and I will XOR it to the current data. This process is called whitening.
- 1 C C 6 1 7 D 3
This may seem like the end result, but it's not! No. In today's SPNs, we would do this several times before we were done.
Rijndael, aka AES, uses a more complex set of substitutions and permutations, adds in a key generator that generates a key for each cycle and repeats the process twelve times! These processes are sufficiently complex and random that any change in the key or the plaintext will affect each and every bit in the block.
Encryption cannot do everything for you. There are a lot of things it can do though.
For example, encryption can be used to keep you messages safe, obviously. With a sufficiently strong encryption scheme and a sufficiently strong key, even a government cannot break encryption.
Cryptography is used daily, hourly, secondly (my English teacher be damned!) for online transactions. You are aware that your credit card number is transmitted every time you shop at a new website, right? Without public-key cryptography, we could not shop online with any semblance of security.
Cryptography is also used for file integrity assurance. Using hashes, we can be certain that since the last time a file was 'hashed', the file has not been altered. Finally, using hashes and signature schemes, we can verify someone's identity without every meeting them.
However encryption cannot do everything.
As I said, a sufficiently good encryption scheme cannot be broken even by a government. However, if the government in question is really that interested in your information, they can simply force you to give them the password using whatever means are available to them.
This brings me to my second point.
Encryption does not guarantee secrecy of communication, only of the content. In fact, the fact that something is encrypted is so obvious that a powerful entity can use it to tag your communication across the internet. This process is called traffic analysis and is, once again, outside the scope of this article.
If you want to prevent traffic analysis, you need to either obfuscate the endpoint with proxies (like a TOR route), or you need to hide the fact that your even communicating secretly with steganography.
I hope you enjoyed this article. If you liked this article, please reply below. Also, tell me if there is anything in particular anyone wants to know about. Keeping in mind that my cryptanalysis skills are lagging behind everything else.