Friday, October 28, 2011

Encryption and Ciphers - Part 1

This is starting a three or four part series on encryption and ciphers, going from simple ciphers such as the Caesar Cipher to industry strength encryption methods like RSA (maybe even Blowfish if I cover that much stuff). This assumes that you're familiar with atleast a little bit of modular arithmetic (http://en.wikipedia.org/wiki/Modular_arithmetic).

So, lets get started.

Firstly, what is encryption? Let's get a definition down. Encryption is taking "plaintext" (text everyone concerned can read and understand, including people that you would rather wouldn't read your message) and converting it to "ciphertext", which only you and the parties that understand how to go from ciphertext to plaintext (a.k.a the recipient) should understand.

That sounds pretty good, right? 

Now, for some basic knowledge I'll be using throughout the article, so, pay attention. When I say string, I mean a string of characters (which should be self-explanatory for the developers here). When I say convert a string to numbers/integers, I mean that we map "a" to "1", "b" to "2", "c" to "3" and so on, until "z" to "26". It would be a good idea to make a table of this mapping on a piece of paper, because you'll be using it a lot if you follow along.

Despite encryption sounding like something invented in the 20th century (or maybe even more recent than that), encryption originated a long time ago. A really long time ago. Caesar (yeah, that dude from Rome you learned about in high school) needed to send secret messages to his allies that could not read even if the messenger was attacked. His men invented a system called the Caesar Cipher. Its a very simple idea, you convert the message to numbers (and call the resulting set of numbers $P$), and you decide on a shift value, $n$ (which the other party knows of as well), and you do this:

$P_i+n \equiv S_i (mod 26)  ....      (1)$

where $i$ ranges from 1 to the length of $P$ and the set $S$ gives us the numbers for our ciphertext, and, from the numbers we get the actual ciphertext. This might look complicated, but, its quite clear with a little bit of thought. Basically, this, instead of mapping from letters to numbers, we map from letters to other letters. For example, with a shift of 3, we would map "a" to "d", "b" to "e" and so on (and because of the $mod 26$ the z wraps around) all the way to "z" to "c". Similarly, the decryption algorithm would be:

$S_i-n \equiv P_i (mod 26)$ 

This we can get just by manipulating (1) a bit.

Lets do a quick example. Encrypt "attack" with a shift of three. 

Convert to numbers:

$P = \{1,20,20,1,3,11\}$

Apply shift, and mod:

$S = \{1+3,20+3,20+3,1+3,3+3,11+3\}  (mod 26)$
$S = \{4, 23, 23, 4, 6, 14\}$

Result (this is why I told you to make the table): "DWWDFN"

Okay, great, now we know how to use a Caesar Cipher. Of course, it were bulletproof, we wouldn't have people that do PhD's on encryption. The Caesar Cipher is what is known as a substitution cipher, it takes one letter and replaces it with a different one. The problem is, it doesn't change the frequency of the letters,  we can see in the "attack" example that the two T's are now represented as two W's. This is easy to "crack" (going from ciphertext to plaintext without having the shift) by statistical methods. If we have this chart: 


We can try ever shift value (only 26 of them, can't be too hard) and check if the suggested plaintext has a letter distribution matching the one above. The closest match should be the right one!

And, we can also dictionary force it. If we try all the shift values and get a suggested plaintext for each one and check how many dictionary words are formed for each of these, the plaintext with the most matched dictionary words is most likely the one we are looking for.

It may seem a bit cumbersome to go through 26 different shift indices, and it would probably take some time, but, computers can do it before you can say "Caesar was a liar". Literally.

So, how would we solve this problem? 

That's the topic for the next one :)

Code for this article is located here: https://github.com/Poincare/cryptoBlog
Only in Python right now, but, I should have a C version up soon, and I would love ports to other languages, please fork and submit a pull request and I will accept it.

Comments are highly appreciated, and I would love it if you guys gave me any feedback at all (even if it is negative). Redditors, upvotes are very welcome.

I know this one was a bit light, and did not cover a ton of material, but, I aim to remedy that with the next one (I assume starting slow would be better). 

No comments:

Post a Comment