Abstract
This paper surveys classical cryptography, focusing on its ciphers. Starting from the mathematical and
information-theoretic background, we analyze the two major categories of ciphers, substitution and
transposition, covering their encoding and decoding algorithms, their cryptanalysis, and some applications.
Finally, we discuss some of the machines used in the early age of cryptography.
Introduction
Classical cryptography, whose history goes back at least 4000 years as far as we know1, was mainly used
in diplomacy and war over the centuries. However, compared with modern cryptography, which is
mainly used in computer security nowadays, most of the classical ciphers are claimed to be
Our motivation for writing this paper is that classical cryptography was useful in history
and is still useful in some recent applications. Classical ciphers convey the basic ideas of
confusion2 and diffusion3, which are properties of some secure ciphers today. Moreover,
they give clues as to how the theory of cryptography developed throughout history.
1 From http://williamstallings.com/Extras/Security-Notes/lectures/classical.html
2 Confusion: a property of an encryption algorithm that makes the original message unrecognizable.
3 Diffusion: the principle that a change in one part of the plaintext affects many parts of the
ciphertext.
In the following sections, we survey each class of classical ciphers, starting from the
mathematical background and information theory. In our analysis, English (26 letters) is used as
the reference alphabet for most of the ciphers. A similar analysis can be conducted for other languages
with different alphabet sizes, for example German (30 letters in the alphabet).
Background Information
Mathematical Backgrounds
XOR operation
XOR is a binary bitwise operator that takes two operands, each of which can be either 0 or 1:
A B A XOR B
0 0 0
0 1 1
1 0 1
1 1 0
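The truth table above can be reproduced with a few lines of Python. This is only a minimal sketch; the function name `xor_bit` is chosen here for illustration and does not come from the paper.

```python
def xor_bit(a: int, b: int) -> int:
    """Return a XOR b for single bits (each operand is 0 or 1)."""
    return a ^ b

# Print the four rows of the truth table: A, B, A XOR B.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_bit(a, b))
```

A useful property for the ciphers discussed later is that applying XOR with the same bit twice restores the original value, since (a XOR b) XOR b = a.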
Modular Arithmetic
Mod is a binary operator that takes two integers as its operands. The result of a mod operation is
the remainder of the integer division of the left operand by the right operand.
5 mod 3 =2
9 mod 5 =4
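The two examples above can be checked with Python's `%` operator. The sketch below also tests congruence using the standard fact that a and b are congruent modulo n exactly when n divides a - b; the helper name `is_congruent` is an illustrative assumption.

```python
def is_congruent(a: int, b: int, n: int) -> bool:
    """True when a and b leave the same remainder on division by n (n != 0)."""
    return (a - b) % n == 0

print(5 % 3)                    # remainder of 5 divided by 3
print(9 % 5)                    # remainder of 9 divided by 5
print(is_congruent(17, 5, 12))  # 17 and 5 differ by a multiple of 12
print(-3 % 26)                  # Python returns a non-negative result for a positive modulus
```

The last line matters for the shift ciphers below: in Python a negative shift still maps into the range 0 to 25, so no special handling is needed when decrypting.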
Congruence is a mathematical concept closely related to mod. Integers a and b are called
Another equivalent definition is that integers a and b are congruent modulo a non-zero integer n
Index of coincidence
The index of coincidence was introduced by William F. Friedman in his article “The Index of
The index of coincidence of a ciphertext measures the probability that two letters selected at random
from the text are identical. It decreases as the key length grows. The formula
is given by:
\[ IC = \sum_{\lambda = a}^{z} \frac{\mathrm{Freq}(\lambda)\,(\mathrm{Freq}(\lambda) - 1)}{N\,(N - 1)} \]
IC stands for index of coincidence, Freq(x) is the number of occurrences of symbol x in the text,
The value of IC ranges from 0.0384, for a polyalphabetic substitution with a perfectly flat
Table 2: Number of Enciphering Alphabets Versus Index of Coincidence;
No. of alphabets   1   2   3   4   5   10   Large
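The index of coincidence defined above is straightforward to compute. The following is a minimal sketch that assumes N is the number of letters A to Z in the text and that all other characters are ignored; the function name is illustrative.

```python
from collections import Counter

def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn at random from text are identical."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    # Sum Freq(x) * (Freq(x) - 1) over all symbols, divided by N * (N - 1).
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))
```

A text made of one repeated letter gives the maximum value 1.0, while a text in which no letter repeats gives 0.0.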
Unicity Distance
Unicity distance measures the minimal length of ciphertext for which there is only one
possible plaintext decryption. Usually, the larger the distance, the better the cryptosystem is. For
example, the unicity distance for substitution on English text is 27, which means given a 27 letters
\[ U \approx \frac{\log K}{R \log P} \]
(K is the size of the key space, R is the redundancy, P is the size of the alphabet used)
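As a rough numerical check of the formula, the sketch below evaluates it for simple substitution on English (K = 26!, P = 26). The redundancy value R = 0.7 is an assumption made here for illustration; it is not stated in the text and is chosen only because it roughly reproduces the quoted value of 27.

```python
import math

def unicity_distance(key_space: int, redundancy: float, alphabet_size: int) -> float:
    """Evaluate U = log K / (R * log P); the base of the logarithm cancels out."""
    return math.log2(key_space) / (redundancy * math.log2(alphabet_size))

# Simple substitution on English: K = 26!, P = 26, R = 0.7 (assumed redundancy).
u = unicity_distance(math.factorial(26), 0.7, 26)
print(round(u, 1))
```

Other estimates of the redundancy of English would shift the result by a few letters, so the figure should be read as an order of magnitude.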
Frequency Analysis
In most languages, certain letters, words or symbols appear with characteristic frequencies if the text is long
enough. Frequency analysis is based on this idea. For example, in English text, ‘e’ is the most
frequently used letter. The differences between the high-frequency
letters and the low-frequency letters can be used to analyze the ciphertext. The
appendix gives statistical data for the most commonly used letters, digrams and trigrams.
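Counting letter frequencies, as described above, takes only a few lines. This is a minimal sketch assuming that only the letters A to Z are counted; the function name is illustrative.

```python
from collections import Counter

def letter_frequencies(text: str):
    """Return (letter, relative frequency) pairs, most frequent first."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    total = len(letters)
    return [(ch, n / total) for ch, n in Counter(letters).most_common()]

for letter, freq in letter_frequencies("frequency analysis needs a long enough text")[:3]:
    print(letter, round(freq, 3))
```

On a short sample like this one the ranking can differ noticeably from the language-wide statistics, which is why the method needs a sufficiently long text.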
Substitution Ciphers
In substitution ciphers one letter is replaced by another letter. There are many categories of
substitution ciphers. In this section, we are going to discuss monoalphabetic substitution ciphers,
homophonic ciphers, polygraphic ciphers, polyalphabetic substitution ciphers and the one time
pad.
Monoalphabetic Substitution ciphers
The monoalphabetic substitution cipher, also called the simple substitution cipher, is one in
which each character in the plaintext is replaced by a corresponding one from a cipher alphabet.
The cipher alphabet can be reversed, shifted or scrambled. Although the number of possible
keys is very large (e.g., 26!-1 for English), this cipher is not very strong and is considered easily
breakable by frequency analysis. However, an advantage of this cipher is that it can be
performed by direct lookup, so the time to encrypt a message of n characters is proportional to n.
We are going to look at the Caesar cipher in detail and briefly introduce some other ciphers such as
Caesar Cipher
The Caesar cipher is one of the simplest encryption methods: it shifts the alphabet by a fixed number
of positions.
Plain: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher: XYZABCDEFGHIJKLMNOPQRSTUVW
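The Plain/Cipher alphabets above correspond to a shift of 23 positions (equivalently, three positions backwards). The following minimal sketch implements the shift with A=0 … Z=25; the function name `caesar` and the choice to leave non-letters unchanged are assumptions made for illustration.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar(text: str, shift: int) -> str:
    """Shift every letter A-Z by `shift` positions; other characters are kept."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)

print(caesar(ALPHABET, 23))            # reproduces the cipher alphabet shown above
print(caesar(caesar("HELLO", 23), -23))  # a negative shift undoes the encryption
```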
The encryption can also be expressed with modular arithmetic. Letters are first mapped to
numbers: A=0, B=1, C=2, D=3 … Z=25. Encrypting a letter x by a shift of n
positions can be written as E(x) = (x + n) mod 26, while decryption can be written as
D(x) = (x - n) mod 26. Thus, in English, there are 25 different ciphers while a language with an
The Caesar cipher is said to have been invented by Julius Caesar to communicate with his army. He is
considered the first person to have used encryption for secure messages. Although this
It is quite easy to break the Caesar cipher. Taking English as an example, since the key space has only
25 keys, we can break it by hand with at most 25 tries (exhaustive key search). That is, try each shift and
see whether the resulting decoded text is readable according to English syntax and common sense.
With frequency analysis of English letters4, breaking the cipher becomes even easier and
faster.
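The exhaustive key search described above can be sketched as follows. The code simply lists all 25 candidate decryptions and leaves the readability check to the reader; the function names and the sample ciphertext are illustrative assumptions.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar_decrypt(ciphertext: str, shift: int) -> str:
    """Undo a Caesar shift; characters outside A-Z are left unchanged."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - shift) % 26] if ch in ALPHABET else ch
        for ch in ciphertext.upper()
    )

def brute_force(ciphertext: str):
    """Return every (shift, candidate plaintext) pair for shifts 1..25."""
    return [(s, caesar_decrypt(ciphertext, s)) for s in range(1, 26)]

for shift, candidate in brute_force("EBIIL TLOIA"):
    print(shift, candidate)
```

Only one of the 25 printed lines reads as English, which identifies the shift.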
By roughly matching the frequency distribution curve of the ciphertext (with the letters rearranged so
that the curve is increasing) against the normal frequency distribution curve of English, we
may get a readable English text. This method works especially well for long
messages.
Another way to break the cipher is to recognize short, commonly used words. For
example, in English, “the”, “and” and “of” appear regularly5. When the ciphertext includes the
spaces, two- and three-letter groups, the so-called digrams and trigrams, are likely to stand out and
be repeated. By trying the commonly used digrams and trigrams, it is possible to decode the cipher easily.
Besides short words, consecutive repeated letters also give hints for breaking the cipher. In
English, “tt”, “ss” and “ee” are among the letters most commonly doubled.
Application
ROT13
ROT13 is a self-reversing Caesar cipher popularly used on Usenet and other online forums as a
means of masking joke punchlines, movie and story spoilers, and offensive expressions from the
casual glance6.
The name “ROT13” stands for “Rotate by 13 places”. Since there are 26=2*13 letters in English,
4 Discussed in the background section
5 From http://www.all-science-fair-projects.com/science_fair_projects_encyclopedia/Caesar_cipher
6 From http://www.fact-index.com/r/ro/rot13.html
ROT13(ROT13(x)) = ROT26(x) = x for any text x
To apply ROT13 to a piece of text, simply shift every English letter by 13 places leaving numbers,
ROT13 is not intended to be secure. Instead of protecting the message, ROT13 protects
readers from material they may not wish to view in the forums. Thus, the readers of a message
are those who consciously choose to decipher it using the rotate-by-13 scheme.
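A minimal sketch of ROT13 follows. It assumes that upper- and lower-case letters are rotated within their own case and that every other character is left unchanged; the function name is illustrative.

```python
def rot13(text: str) -> str:
    """Rotate each English letter by 13 places; leave everything else as is."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

message = "Hello, World 42!"
print(rot13(message))
print(rot13(rot13(message)))  # applying ROT13 twice returns the original text
```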
Affine Cipher
The encryption function for the cipher is e(x) = (ax + b) mod m, where a and m are relatively prime
The decryption function is d(x) = a^(-1)(x - b) mod m, where a^(-1) is the multiplicative inverse of a
modulo m.
The cipher is weak in that if a cryptanalyst can discover two plaintext-ciphertext
character pairs, then the key can be obtained by solving a system of equations.
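The sketch below illustrates the affine cipher for m = 26 and the key recovery from two known plaintext-ciphertext pairs. The function names and the sample key (a = 5, b = 8) are illustrative assumptions; `pow(a, -1, m)` computes the modular inverse and requires Python 3.8 or later.

```python
from math import gcd

M = 26  # alphabet size, A=0 ... Z=25

def affine_encrypt(text: str, a: int, b: int) -> str:
    if gcd(a, M) != 1:
        raise ValueError("a must be relatively prime to 26")
    return "".join(chr((a * (ord(c) - 65) + b) % M + 65)
                   for c in text.upper() if "A" <= c <= "Z")

def affine_decrypt(text: str, a: int, b: int) -> str:
    a_inv = pow(a, -1, M)  # multiplicative inverse of a modulo 26
    return "".join(chr(a_inv * (ord(c) - 65 - b) % M + 65)
                   for c in text.upper() if "A" <= c <= "Z")

def recover_key(x1: int, y1: int, x2: int, y2: int):
    """Solve y = a*x + b (mod 26) from two known (plaintext, ciphertext) numbers.

    Works only when x1 - x2 is invertible modulo 26; otherwise pow raises ValueError.
    """
    a = (y1 - y2) * pow((x1 - x2) % M, -1, M) % M
    b = (y1 - a * x1) % M
    return a, b

cipher = affine_encrypt("AFFINE CIPHER", 5, 8)
print(cipher, affine_decrypt(cipher, 5, 8))
print(recover_key(0, 8, 5, 7))  # pairs A->I and F->H
```

When the difference of the two known plaintext letters shares a factor with 26, the system has several solutions and a different pair of letters is needed.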
Atbash Cipher
A B G D H V Z Ch T Y K L M N S O P Tz Q R Sh Th
Th Sh R Q Tz P O S N M L K Y T Ch Z V H D G B A
The Atbash cipher is a simple substitution cipher for the Hebrew alphabet. It substitutes the first letter with the
last one, the second letter with the second-to-last one, and so on (as shown in the table).
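The same mirroring idea can be sketched for the 26-letter Latin alphabet. This is an adaptation for illustration only, since the table above uses the Hebrew alphabet; the function name is an assumption.

```python
def atbash(text: str) -> str:
    """Map A<->Z, B<->Y, ... for Latin letters; other characters are kept."""
    return "".join(
        chr(ord("Z") - (ord(c) - ord("A"))) if "A" <= c <= "Z" else c
        for c in text.upper()
    )

print(atbash("ABC"))
print(atbash(atbash("HELLO")))  # the cipher is its own inverse
```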
substitution ciphers. The idea is to disguise the letter frequencies by homophony. Usually in this
cipher, high-frequency letters are given more ciphertext symbols while lower-frequency ones
are given fewer. Thus, it differs from a monoalphabetic cipher in that one letter can be
Book Cipher
The key of a book cipher is the identity of a book. The ciphertext of a plaintext word is the
location of that word in the book. One problem is that a word in the plaintext may not
appear in the book. An alternative is to encode each plaintext letter by the location
of that letter in the book. However, when a large ciphertext is needed, encoding the
message takes a long time.
Straddling checkerboard
The Straddling checkerboard is a device to convert letters into digits.
    0  1  2  3  4  5  6  7  8  9
    E  T     A  O  N     R  I  S
2   B  C  D  F  G  H  J  K  L  M
6   P  Q  /  U  V  W  X  Y  Z  .
  3113212731223655
+ 0452045204520452
= 3565257935743007
3 5 65 25 7 9 3 5 7 4 3 0 0 7
7 From http://en.wikipedia.org/wiki/Straddling_checkerboard
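The conversion and the digit-wise addition above can be sketched as follows. Two assumptions are made for illustration: the plaintext "ATTACK AT DAWN", which is not given in the text but produces exactly the digit string shown, and the symbol "/" in the third cell of row 6, which is needed so that W falls on 65.

```python
TOP = "ET AON RIS"    # digits 0-9; the blanks at 2 and 6 select the lower rows
ROW2 = "BCDFGHJKLM"   # two-digit codes 20..29
ROW6 = "PQ/UVWXYZ."   # two-digit codes 60..69 ("/" in cell 62 is an assumption)

def build_codes():
    codes = {}
    for d, ch in enumerate(TOP):
        if ch != " ":
            codes[ch] = str(d)
    for d, ch in enumerate(ROW2):
        codes[ch] = "2" + str(d)
    for d, ch in enumerate(ROW6):
        codes[ch] = "6" + str(d)
    return codes

def checkerboard_encode(text: str) -> str:
    """Convert letters to digits; characters not on the board are dropped."""
    codes = build_codes()
    return "".join(codes[c] for c in text.upper() if c in codes)

def add_key(digits: str, key: str) -> str:
    """Add the repeating key digit by digit, modulo 10 (no carries)."""
    return "".join(str((int(d) + int(key[i % len(key)])) % 10)
                   for i, d in enumerate(digits))

digits = checkerboard_encode("ATTACK AT DAWN")
print(digits)
print(add_key(digits, "0452"))
```

Because the single-digit codes never start with 2 or 6, the digit stream can be split back into letters without separators.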
larger letter groups. It is more difficult for a cryptanalyst to use frequency analysis to break such a
cipher. However, for a specific language, there are still some frequency patterns for larger letter
groups.
Playfair cipher
The Playfair cipher is the earliest practical polygraphic substitution cipher. The cipher uses a 5
by 5 table and a key. In order to create the 5 by 5 table and use the cipher, one needs to
• If the letters of a pair are both the same (or only one letter is left), add an "X" after the
• If the letters appear on the same row of your table, replace them with the letters to their
  immediate right respectively (wrapping around to the left side of the row if a letter in the
• If the letters appear on the same column of your table, replace them with the letters
  immediately below respectively (wrapping around to the top side of the column if a letter
• If the letters are not on the same row or column, replace them with the letters on the same
  row respectively but at the other pair of corners of the rectangle defined by the original
  pair.
Applying the inverse of these four rules decrypts the message.
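The four rules above can be sketched in code. Since the table construction is not spelled out in the text, the sketch assumes the common convention: the keyword letters come first, followed by the remaining letters of the alphabet, with J merged into I to fit 25 cells. The sample key and message are illustrative choices, not taken from the paper.

```python
ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"  # 25 letters; J is merged into I (assumption)

def clean(text):
    return [("I" if c == "J" else c) for c in text.upper() if "A" <= c <= "Z"]

def build_square(key):
    """Keyword letters first, then the unused letters, stored row by row."""
    square = []
    for ch in clean(key) + list(ALPHABET):
        if ch not in square:
            square.append(ch)
    return square

def make_pairs(text):
    """Split into letter pairs, padding doubled or leftover letters with X."""
    letters, pairs, i = clean(text), [], 0
    while i < len(letters):
        a = letters[i]
        if i + 1 < len(letters) and letters[i + 1] != a:
            pairs.append((a, letters[i + 1]))
            i += 2
        else:
            pairs.append((a, "X"))  # edge case of a doubled X is not handled
            i += 1
    return pairs

def playfair(text, key, decrypt=False):
    square = build_square(key)
    pos = {ch: divmod(i, 5) for i, ch in enumerate(square)}
    step = -1 if decrypt else 1
    out = []
    for a, b in make_pairs(text):
        (ra, ca), (rb, cb) = pos[a], pos[b]
        if ra == rb:                      # same row: move right (left to decrypt)
            ca, cb = (ca + step) % 5, (cb + step) % 5
        elif ca == cb:                    # same column: move down (up to decrypt)
            ra, rb = (ra + step) % 5, (rb + step) % 5
        else:                             # rectangle: swap the two columns
            ca, cb = cb, ca
        out.append(square[ra * 5 + ca] + square[rb * 5 + cb])
    return "".join(out)

cipher = playfair("Hide the gold in the tree stump", "playfair example")
print(cipher)
print(playfair(cipher, "playfair example", decrypt=True))
```

Decryption returns the padded plaintext, so the inserted X between the doubled letters still has to be removed by the reader.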
8 From http://www.fact-index.com/p/pl/playfair_cipher.html
Hill Cipher9
The Hill cipher is a polygraphic substitution which can combine much larger groups of letters
simultaneously, using linear algebra. Each letter is treated as a digit in base 26: A = 0, B =1, and so
matrix, modulo 26. The components of the matrix are the key, and should be random provided that
In order to make substitution ciphers more secure, more than one cipher alphabet can be used to
encode a single letter of the plaintext. Such ciphers are called polyalphabetic substitution
ciphers. This one-to-many correspondence makes frequency analysis much more
difficult to apply.
Leon Battista Alberti invented the first published polyalphabetic cipher around 1467.[1] At the
beginning, a good polyalphabetic substitution cipher was extremely hard to break. But after the
mid-1800s when Friedrich Kasiski published the first procedure for attacking polyalphabetic
10
cipher, especially Vigenere cipher.
Vigenere cipher
This cipher is named after a Frenchman, Blaise de Vigenere. The encoding and decoding
procedures use a tabula recta, called the Vigenere tableau, and a key. A Vigenere tableau is a
square matrix indexed by a pair of English letters, with all 26 letters in each row and each column.
9 From http://en.wikipedia.org/wiki/Polygraphic#Polygraphic
10 In the book “Die Geheimschriften und die Dechiffrierkunst” (“Secret Writing and the Art of
Deciphering” in English), the polyalphabetic cipher was no longer considered secure.
Suppose the key is K=<k(0),k(1),k(2),k(3),…,k(d-1)>, where k(i) is a symbol from the alphabet
used, typically an English letter. The length of the key is d. For example, if the key is “BAD”,
then d=3, k(0)=“B”, k(1)=“A”, k(2)=“D”. The key may repeat as many times as needed because it
Suppose the plaintext is P=<p(0),p(1),p(2),…,p(n-1)>, where n is the length of plaintext P and
Suppose the ciphertext encrypted from plaintext P with key K using the Vigenere cipher is
C=<c(0),c(1),c(2),c(3),…,c(n-1)>, where each c(i) is a letter and n is the length of the ciphertext.
Note that the ciphertext C and plaintext P are of the same length. This is a characteristic of
the Vigenere cipher.
Denote the Vigenere tableau by Vigenere_table. Then the encryption of the Vigenere cipher can be
described as:
c(i) = Vigenere_table[k(i mod d)][p(i)], for i = 0, 1, …, n-1
Example. For the message COMPUTER SECURITY and keyword LUCKY we carry out the
encryption as follows:
Table 5:
L U C K Y L U C K Y L U C K Y L
C O M P U T E R S E C U R I T Y
For each letter of the message, we use the corresponding letter of the keyword to select a row and go across
that row to the column headed by the letter of the message. As the tableau
(Table 6) shows, the first two letters "CO" of the message are encoded as "NI".
Continuing in this way, we obtain the encoded message that appears in Table 7.
Table 7: Encryption of Vigenere Cipher
L U C K Y L U C K Y L U C K Y L
C O M P U T E R S E C U R I T Y
N I O Z S E Y T C C N O T S R J
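The table lookup can be replaced by arithmetic: with A=0 … Z=25 and the standard tableau, looking up row k and column p gives (p + k) mod 26. The following minimal sketch uses that equivalence to reproduce Table 7; the function names are illustrative, and spaces are dropped before encryption.

```python
def letters_only(text: str):
    return [c for c in text.upper() if "A" <= c <= "Z"]

def vigenere_encrypt(plaintext: str, key: str) -> str:
    """c(i) = (p(i) + k(i mod d)) mod 26, with the key repeated as needed."""
    k = letters_only(key)
    return "".join(
        chr((ord(p) - 65 + ord(k[i % len(k)]) - 65) % 26 + 65)
        for i, p in enumerate(letters_only(plaintext))
    )

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """p(i) = (c(i) - k(i mod d)) mod 26."""
    k = letters_only(key)
    return "".join(
        chr((ord(c) - ord(k[i % len(k)])) % 26 + 65)
        for i, c in enumerate(letters_only(ciphertext))
    )

cipher = vigenere_encrypt("COMPUTER SECURITY", "LUCKY")
print(cipher)
print(vigenere_decrypt(cipher, "LUCKY"))
```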
Beaufort cipher
Beaufort cipher is another polyalphabetic cipher which is very similar to the Vigenere Cipher. The
The running key cipher is a type of polyalphabetic substitution cipher in which a text, typically
from a book, is used to provide a very long key stream. Generally speaking, the book has to be
agreed upon ahead of time, while the passage to be used as the key is chosen randomly for
each message. Nobody except the sender knows the key unless it is indicated somewhere
in the message. Like the Vigenere cipher, the running key cipher also employs the Vigenere tableau. But in
the running key cipher the key is not repeated; instead, the cipher uses a key stream that is as long
as the message. We need a predefined secure way to tell the recipient where to find the running
Surprisingly, the running key cipher is not as secure as we might imagine, due to the
low entropy per character of both plaintext and key. The most obvious and easiest way to improve
the security is to use a predefined table of mixed alphabets instead of the tabula recta (Vigenere
tableau).
Autokey cipher
An autokey cipher incorporates the message into the key. It is also called a self-synchronizing stream
cipher. There are two kinds of autokey cipher: key-autokey ciphers, in which the next element of
the key is determined by the previous elements of the key stream, and text-autokey ciphers, in
which the next element of the key is determined by the previous message text.
Vigenère also invented a kind of autokey cipher. His innovation was to append the message to the
keyword to form the actual key, so it is a text-autokey cipher. This text-autokey cipher remained
undeciphered for over 200 years, until Charles Babbage discovered a means of breaking the
cipher.
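The text-autokey idea described above, where the key is the keyword followed by the message itself, can be sketched as follows. The sketch assumes the same letter arithmetic as the Vigenere tableau (A=0 … Z=25, addition modulo 26) and a non-empty keyword; the function names are illustrative.

```python
def letters_only(text: str):
    return [c for c in text.upper() if "A" <= c <= "Z"]

def autokey_encrypt(plaintext: str, keyword: str) -> str:
    p = letters_only(plaintext)
    key = letters_only(keyword) + p  # keyword followed by the message itself
    return "".join(chr((ord(c) - 65 + ord(k) - 65) % 26 + 65)
                   for c, k in zip(p, key))

def autokey_decrypt(ciphertext: str, keyword: str) -> str:
    key = letters_only(keyword)
    out = []
    for i, c in enumerate(letters_only(ciphertext)):
        p = chr((ord(c) - ord(key[i])) % 26 + 65)
        out.append(p)
        key.append(p)  # each recovered letter extends the key stream
    return "".join(out)

cipher = autokey_encrypt("COMPUTER SECURITY", "LUCKY")
print(cipher)
print(autokey_decrypt(cipher, "LUCKY"))
```

The first letters of the ciphertext match the plain Vigenere result for the same keyword; the two diverge as soon as the keyword is used up.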
The method for breaking polyalphabetic ciphers is to determine the number of alphabets used,
break the ciphertext into pieces that were enciphered with the same alphabet, and solve each
piece as a monoalphabetic substitution. There are two tools that can decrypt messages written with
a large number of alphabets: the Kasiski method, which determines when a pattern of
encryption permutations has repeated, and the index of coincidence, which predicts the number of
The Kasiski method, named after its developer Friedrich Kasiski, a Prussian military officer, is
a way of finding the number of alphabets that were used for encryption.
The method relies on the regularity of English. Not only letters but also letter groupings and words
are repeated (e.g. -th, -ing, -ed, -ion, -tion, -ation, im-, in-, un-, re-, -eek-, -oot-, -our-, and words
The Kasiski method follows this rule: if a message is encrypted with n alphabets (e.g., the key length
is n for the Vigenere cipher), and if a particular word or letter group appears k times in the plaintext,
then it should be encrypted approximately k/n times (ceiling of k/n11) with the same alphabet.
This follows from the Pigeonhole Principle12. The distance between the repeated patterns in
the ciphertext should be a multiple of the key length, that is, of the number of alphabets used.
2. For each pattern, calculate the distance between the position of starting point of
3. Determine the greatest common divisor of all distances obtained from step 2
4. If polyalphabetic substitution is used, the key length should be one factor of the
Short repeated patterns, such as two-letter patterns, are often accidental, so it is more trouble to
consider them than to ignore them. Any pattern longer than 3 characters is almost certainly not accidental. (The
likelihood of two four-letter patterns not being from the same plaintext segment is 1/26^4) [security
in computing]. The distance between two repeated patterns should be evenly divisible by the key length.
So if the distance is calculated between two non-successive instances, the number of candidates for
11 Ceiling is a mathematical function that takes a real number as its argument and outputs the least
integer that is greater than or equal to the argument.
12 If you have fewer pigeonholes than pigeons and you put every pigeon in a pigeonhole, then at
least one pigeonhole must contain more than one pigeon.
the key length would become larger.
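The procedure described above (find repeated patterns, measure the distances between successive starting positions, take the greatest common divisor) can be sketched as follows. The pattern length of 3 and the function names are assumptions made for illustration; in practice accidental repeats can spoil the divisor, so the common factors of the distances are usually inspected rather than trusted blindly.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def repeated_pattern_distances(ciphertext: str, length: int = 3):
    """Distances between successive occurrences of every repeated pattern."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    distances = []
    for pos in positions.values():
        distances.extend(b - a for a, b in zip(pos, pos[1:]))
    return distances

def kasiski_gcd(ciphertext: str, length: int = 3) -> int:
    """Greatest common divisor of all distances, or 0 if nothing repeats."""
    distances = repeated_pattern_distances(ciphertext, length)
    return reduce(gcd, distances) if distances else 0

print(kasiski_gcd("ABCDEFABCDEFABC"))
```

The key length is then sought among the factors of the returned value.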
For the details of the index of coincidence method, we can calculate the IC and look for the table
The one-time pad uses a random key to encrypt the message. It is called a one-time pad
because the key is used only once for each segment of the message and never used again. Simple
Example:
Message: COMPUTER
KEY: SECURITY
COMPUTER:   01000011 01001111 01001101 01010000 01010101 01010100 01000101 01010010
SECURITY:   01010011 01000101 01000011 01010101 01010010 01001001 01010100 01011001
______________________________________________________________
CIPHERTEXT: 00010000 00001010 00001110 00000101 00000111 00011101 00010001 00001011
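The bitwise XOR of the ASCII codes of message and key can be computed with a short sketch. The key SECURITY is the one from the example and is not random; a real pad would need truly random key material, for instance from Python's `secrets.token_bytes`. The function name is an illustrative assumption.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each message byte with the key byte at the same position."""
    if len(key) < len(data):
        raise ValueError("the key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

cipher = xor_bytes(b"COMPUTER", b"SECURITY")
print(" ".join(f"{b:08b}" for b in cipher))
print(xor_bytes(cipher, b"SECURITY"))  # XOR with the same key decrypts
```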
Each encryption is independent of any other encryption, so no pattern can be detected. The
unicity distance14 of the one-time pad is infinite because the key length must be equal to or longer
than the text length. Thus, it is the only cipher that has been proven to be perfectly secure in theory.
However, the length of the key is an obvious drawback of the one-time pad (the key
must be at least as long as the plaintext to be encrypted). Moreover, the users must
agree on a key in advance, which raises the problem of transmitting the key securely.
Cryptanalysis
The one-time pad is said to be a problem of key transmission rather than message transmission. For the one-time pad
to be effectively secure, the key should be random enough. As long as the key is random enough
13 Refer to the background section.
14 Refer to the background section for unicity distance.
and can be kept safe, one time pad is perfectly secure.
Transposition Ciphers
Transposition means reordering the elements of the plaintext according to some rule agreed by the sender
The major property of a transposition cipher is that the count of each element in the plaintext is
the same as in the ciphertext, because elements are simply reordered and not substituted. Thus,
the frequency distribution is preserved. However, the frequencies of digrams and trigrams
are probably not equal to those of the original language. From this we may detect
that a ciphertext was encrypted with a transposition cipher. Transposition is not safe because a modern
computer can quickly decode the cipher by trying all the possible arrangements.
plaintext: ILOVECOMPUTERSECURITY
ciphertext: YTIRUCESRETUPMOCEVOLI
If you read carefully, you can find that the plaintext is simply reversed. That is if we reverse the
Most applications apply some bijective function to the plaintext. The procedure to encode
Write the plaintext into a matrix row by row and read the cipher output column by column. The key
Example:
Message: WEAREDOINGCOMPUTERSECURITYASSIGNMENT
Key length: 6
Matrix:
W E A R E D
O I N G C O
M P U T E R
S E C U R I
T Y A S S I
G N M E N T
Cipher: WOMSTGEIPEYNANUCAMRGTUSEECERSNDORIIT
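The row-in, column-out procedure of this example can be sketched as follows. The function names are illustrative, and the decryption helper assumes that the matrix was completely filled, as it is for the 36-letter message above.

```python
def route_encrypt(message: str, width: int) -> str:
    """Write the message row by row, then read it column by column."""
    rows = [message[i:i + width] for i in range(0, len(message), width)]
    return "".join(row[c] for c in range(width) for row in rows if c < len(row))

def route_decrypt(ciphertext: str, width: int) -> str:
    """Inverse operation; assumes the matrix was completely filled."""
    if len(ciphertext) % width:
        raise ValueError("length must be a multiple of the width")
    height = len(ciphertext) // width
    cols = [ciphertext[i:i + height] for i in range(0, len(ciphertext), height)]
    return "".join(col[r] for r in range(height) for col in cols)

message = "WEAREDOINGCOMPUTERSECURITYASSIGNMENT"
cipher = route_encrypt(message, 6)
print(cipher)
print(route_decrypt(cipher, 6))
```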
Columnar transposition
If we want to complicate the route in the Rail Fence Cipher, we can permute the columns to enhance
security. The way to do this is to read the columns in the alphabetical order of the key letters.
Message: WEAREDOINGCOMPUTERSECURITYASSIGNMENT
Key: BIRDAY
Read the columns in the order A->B->D->I->R->Y
Matrix:
W E A R E D
O I N G C O
M P U T E R
S E C U R I
T Y A S S I
G N M E N T
Cipher: ECERSNWOMSTGRGTUSEEIPEYNANUCAMDORIIT
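The keyed variant can be sketched by sorting the column indices according to the key letters (A, B, D, I, R, Y for the key BIRDAY). The function name is illustrative, and breaking ties between equal key letters from left to right is an assumption that the example does not exercise.

```python
def columnar_encrypt(message: str, key: str) -> str:
    """Write row by row under the key, then read columns in key-letter order."""
    width = len(key)
    rows = [message[i:i + width] for i in range(0, len(message), width)]
    # Column indices sorted by key letter; equal letters keep left-to-right order.
    order = sorted(range(width), key=lambda i: (key[i], i))
    return "".join(row[c] for c in order for row in rows if c < len(row))

print(columnar_encrypt("WEAREDOINGCOMPUTERSECURITYASSIGNMENT", "BIRDAY"))
```

Double transposition amounts to calling the function twice, possibly with two different keys.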
Double transposition
Double transposition applies columnar transposition twice to the text to enhance security.
With a single transposition, the adversary could try all possible key lengths and recover the
plaintext, while double transposition complicates the situation. Since one time transposition we
In practice, encryption and decryption can be done by a rotor machine. A rotor machine is a device
To turn a rotor machine into an encipherer, we need the following steps. Firstly, when turning
Secondly, we replace the switches with the keys of a typewriter attached to the switches, and the light
bulbs are labeled with letters as well. For example, when you press key “A”, the light bulb “A”
lights up. But this is not yet encryption; we need to turn it into a mono-alphabetic encipherer.
Thirdly, in order to turn it into an encryption system, we simply change the wiring so that a
different light bulb lights up for each letter pressed on the keyboard. For example, when an
“A” is typed, light bulb “X” lights up. Thus, when we type a message, the lighting of the
light bulbs encrypts the message. This is similar to a single-alphabet (mono-alphabetic)
Since this kind of simple substitution is not safe, how can we make the rotor machine more secure?
The solution is to introduce a poly-alphabetic substitution cipher system by using a rotor in the
machine and rotating it. As the rotor rotates, a new substitution is generated every time
the same letter is pressed. For example, the first time you press “A”, light bulb “X” lights up;
the second time you press “A”, light bulb “S” lights up; the third time you press “A”, some other
letter lights up, and so on. There is a website15 that simulates the “Enigma machine” (an example of a
rotor machine) where you can press the letters on the keyboard and get a view of how the
The algorithm involved here is “use the next alphabet with every key press16”: the rotor
generates the key by rotating, and the key is hidden in the wiring of the disk. The first key you
press is very important since it is used to generate a large key, which is used to encrypt the
following keys. The generation of the large key is done by rotating from the first key you pressed
The number of rotors is also an important factor in the degree of security. If a
machine with a single rotor is considered not secure enough, the security level can be increased by
simply adding more rotors. The reason is that one rotor is a poly-alphabetic substitution system with 26 keys
With more than one rotor, the next rotor advances one position after the first rotor has spun “all the way”.
For example, after the first rotor spins from position “A” to “Z”, the second rotor spins from “A”
to “B”. If you are using 3 rotors, the third rotor will spin one position after the second rotor spins
This is how encryption is done using a rotor machine. In order to turn the rotor machine a
Enigma Machine
15 The website is: http://www.ugrad.cs.jhu.edu/~russell/classes/enigma/
16 From http://www.fact-index.com/r/ro/rotor_machine.html
The Enigma machine is a typical example of a rotor machine. (Examples of rotor machines are: Enigma
machine, Fialka, Hebern rotor machine, HX-63, KL-7, Lucida, NEMA, SIGABA, Typex.)
The Enigma machine is a rotor machine with 3 rotors, a unique feature and a reflector. The
mechanism of the Enigma machine is a complex algorithm. The task of encoding and decoding it
The Enigma machine was used from the early 1920s and during World War II, most famously by Nazi
Germany17.
17 From http://webhome.idirect.com/~jproc/crypto/enigma.html
18
Other machines
There are other machines used for encryption and decryption purposes. The algorithms and
19
(The cylinder is cut into slices, each 5 mm in width, and on the side of each slice 26
randomly arranged letters of equal size are written.)
An important feature of the Jefferson Cylinder is that the person who receives the secret message
must have a cylinder arranged exactly like that of the person who encrypts and sends the
message. In other words, there must be 2 identical cylinders to carry out the encryption and
18 From http://en.wikipedia.org/wiki/Enigma_machine
19 From http://williamstallings.com/Extras/Security-Notes/lectures/classical.html
decryption process.
The encryption process is carried out like this: firstly, you turn the wheels on the cylinder to line up
the letters of the secret message along the side of the cylinder. Then another randomly chosen line of
letters (in which the order of the letters appears to be quite random) is copied. The random
On the receiver side, having received the ciphertext, he arranges the wheels of his
cylinder so that the letters of the ciphertext line up. Since the cylinder used to
decrypt the message is identical to the one used to encrypt it, when he turns the cylinder around, he
will be able to find a line of letters which is meaningful and thus recover the plaintext.
rotating the 2 wheels, the inner wheel will have all letters towards to the letter at the outer wheel.
The encryption generates a poly-alphabetic cipher. The construction of the Wheatstone disk is
similar to a clock. There are 2 hands on the disk, one big and one small, which look like
the hour and minute hands of a clock. These 2 hands are connected by gears. When the large
hand points to a letter, the small hand points to the corresponding ciphertext letter. That is how
encryption is done using the Wheatstone disk. Note that when you rearrange the gears, the
encryption changes, which means the small hand will not point to the same position when
Conclusion
In this paper, we have discussed various kinds of ciphers and some of their applications. We
have seen that most of the ciphers are based on changing characters or streams of characters and
that most of them are symmetric: once you know how to encode, you know how to decode.
From the analysis, we have seen that most classical encryption methods are vulnerable and
can easily be attacked with today's technology. That is why we seldom use them in computer
security nowadays. However, these encryption methods have given us clues as to how cryptography
can be done. These basic theories and concepts of classical cryptography are important to the
References
http://www.edict.com.hk/TextAnalyser/default.htm
[6] Enigma
http://webhome.idirect.com/~jproc/crypto/enigma.html
[12] U-boot Enigma Simulation
http://www.u-boot-greywolf.de/uenigmasimulation.htm
[14] Encryption - Wikipedia
http://en.wikipedia.org/wiki/Cipher
Appendix
Table II : English letters frequency table
Analysis of 45406 Common Words
This table analyzes a pool of words that includes plurals and words with common suffixes
(From http://www.edict.com.hk/TextAnalyser/default.htm)