
Xingjian Yan

MATH547
Cryptography and Linear Algebra

Cryptography is the study of the protocols which allow for secure communication in the
presence of antagonistic third parties. More specifically, cryptography allows for the
encryption of messages into forms that only a designated recipient can decrypt and
read. Since its inception, cryptography has expanded far past its early military and political uses to become widely adopted in a variety of modern innovations, such as ATMs, computer passwords, and electronic commerce. Invented by Lester S. Hill in 1929, the Hill Cipher presents one of the most straightforward applications of linear algebra to the field of cryptography. Hill's algorithm was accompanied by a patent for a machine to quickly perform 6x6 matrix multiplication modulo 26. Though rudimentary and vulnerable by today's standards, Hill's algorithm was a major pedagogical contribution to modern cryptography as the first polygraphic substitution cipher in which it was practical to operate on more than three letters at once. Furthermore, it demonstrated that cipher operations could be done more efficiently with the help of machines, an idea it shared with other mechanical ciphers of the era, such as the more practical and secure Enigma machine used by the Germans in World War II.
In this paper, we will discuss and illustrate the Hill Cipher and its vulnerabilities from a mathematical perspective, before comparing it with the stronger algorithms that are used today.

The Hill Cipher
The Hill Cipher is a polygraphic substitution cipher, that is, a method of encryption in which groups of plaintext letters are replaced by groups of ciphertext letters. The simplest substitution ciphers are monographic, meaning that each letter in plaintext is simply mapped to another letter on a one-to-one basis. However, this method of encryption is susceptible to frequency analysis given sufficient information about the language. The polygraphic feature of the Hill Cipher makes it more secure because it splits plaintext into groups of fixed length before encryption, which masks the frequency distribution of individual letters: the same plaintext letter can encrypt to different ciphertext letters depending on its neighbors. A slightly altered message, such as one with a single inserted letter, can also shift the entire reading frame, resulting in completely different ciphertext. With a sufficiently large key matrix, single-letter frequency analysis becomes useless for decrypting a Hill Cipher.
A message of length $n$ is encrypted as follows. Each letter is represented by a number from 0 to 25. For simplicity, we will use 0=A, 1=B, 2=C, …, 25=Z. These numbers are then divided into $n/m$ groups of size $m$ (padding the message with dummy letters if necessary), and each group is placed as a column of an $m \times n/m$ plaintext matrix $P$. A cipher key is then generated in the form of an invertible $m \times m$ matrix $K$. The entries of $K$ are randomly generated integers modulo 26.

The matrix product $C = KP$ results in an $m \times n/m$ matrix, written modulo 26. Finally, the values of $C$ are converted back to letters using the same correspondence and written in string form, column by column, to give the ciphertext.

Because modular arithmetic preserves products, this process can be reversed to recover the values in $P$, where $K^{-1}$ denotes the inverse of $K$ modulo 26. That is:

$$KP \equiv C \pmod{26} \implies K^{-1}KP \equiv K^{-1}C \pmod{26} \implies P \equiv K^{-1}C \pmod{26}$$
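The encryption step can be sketched in Python as follows. This is a minimal sketch rather than Hill's own formulation: the helper names are hypothetical, and it assumes the message contains only the letters a to z and has already been padded to a multiple of the key size m. Each block of m letters is treated as one column of P, and the matching column of C = KP is reduced modulo 26.

```python
def letters_to_numbers(text: str) -> list[int]:
    """Map a..z to 0..25 (0=A, 1=B, ..., 25=Z); assumes letters only."""
    return [ord(ch) - ord("a") for ch in text.lower()]


def numbers_to_letters(nums: list[int]) -> str:
    """Inverse of letters_to_numbers, reducing each value modulo 26 first."""
    return "".join(chr(n % 26 + ord("a")) for n in nums)


def hill_encrypt(plaintext: str, key: list[list[int]], modulus: int = 26) -> str:
    """Encrypt with C = KP (mod 26), reading C column by column."""
    m = len(key)
    nums = letters_to_numbers(plaintext)
    if len(nums) % m != 0:
        raise ValueError("pad the message to a multiple of the key size first")
    out: list[int] = []
    for start in range(0, len(nums), m):
        column = nums[start:start + m]  # one column of the plaintext matrix P
        for key_row in key:             # one entry of the matching column of C
            out.append(sum(k * p for k, p in zip(key_row, column)) % modulus)
    return numbers_to_letters(out)


K = [[4, 13, 2], [7, 23, 25], [10, 17, 25]]  # key from Example 1
print(hill_encrypt("linearalgebraisfun", K))  # sozylxznzliokkouon
```

With the key and message of Example 1, this reproduces the ciphertext computed by hand there.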
Implementation of decryption requires calculating the inverse of the cipher key modulo 26. Over the rationals, $K^{-1} = \operatorname{adj}(K)/\det(K)$, so each entry of $K^{-1}$ is a quotient $\operatorname{adj}(K)_{ij}/\det(K)$, where $\operatorname{adj}(K)$ is the adjugate of $K$. In modular arithmetic, an equivalent calculation is the product

$$K^{-1} \equiv \det(K)^{-1}\,\operatorname{adj}(K) \pmod{26},$$

where $\det(K)^{-1}$ is the modular multiplicative inverse of $\det(K)$. It is calculated by substituting candidate values into the following congruence:

$$1 \equiv \det(K)^{-1}\det(K) \pmod{26}$$

Because we are dealing with relatively small numbers in this case, $\det(K)^{-1}$ can easily be determined by trial and error. Note that $\det(K)^{-1}$ is not unique as an integer: any integer congruent to it modulo 26 also satisfies the congruence. When dealing with larger numbers in this congruence, the extended Euclidean algorithm can be used to solve for $\det(K)^{-1}$ instead. (It is out of the scope of this paper, so we will not discuss it here.)
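For readers who want to see the extended Euclidean algorithm in action, here is a short sketch (the function names are ours). It returns coefficients x, y with a·x + m·y = gcd(a, m); when the gcd is 1, x reduced modulo m is the multiplicative inverse. Python 3.8 and later also provide the same result through the built-in `pow(a, -1, m)`.

```python
def egcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    # b*x + (a % b)*y == g  and  a % b == a - (a // b)*b
    return g, y, x - (a // b) * y


def mod_inverse(a: int, modulus: int = 26) -> int:
    """Multiplicative inverse of a modulo `modulus`; fails if gcd(a, modulus) != 1."""
    g, x, _ = egcd(a % modulus, modulus)
    if g != 1:
        raise ValueError(f"{a} has no inverse modulo {modulus} (gcd = {g})")
    return x % modulus


print(mod_inverse(1353))  # 1, since 1353 = 52*26 + 1
print(mod_inverse(45))    # 11, since 45*11 = 495 = 19*26 + 1
```

The failing case (for example `mod_inverse(13)`) is exactly the situation described next, where the determinant shares a factor with 26.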
There are two limitations on the key matrix that can be used. First, the determinant of the key matrix cannot equal zero, since the matrix would not be invertible. Second, the determinant of the key matrix must be relatively prime to the length of the alphabet used. This is because the modular multiplicative inverse congruence has a solution if and only if the determinant of the key matrix and the length of the alphabet are relatively prime. This fact follows from Bézout's identity (and is closely related to Euler's theorem), which is also out of the scope of our discussion.
Because of the limitations on the key matrix, it may be helpful to choose an alphabet length that is more likely to be relatively prime to the determinant of the key matrix. Since 26 has the prime factors 2 and 13, it places excessive constraints on what can be used as a key matrix: every candidate whose determinant is even or a multiple of 13 must be rejected. This is not only inconvenient when selecting a key matrix, but also an issue of security. If the alphabet length is known and a brute force method is used to determine the key matrix, a cryptographer need not check any key matrix whose determinant shares a factor with 26, thus greatly decreasing the time needed to crack the code. For this reason, the alphabet is often extended to 29 characters, a prime number: the 26 letters plus three punctuation marks, spaces, or dummy characters.
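To see how much the choice of modulus matters, the sketch below counts by brute force how many 2x2 matrices are usable keys modulo 26 and modulo 29. It is an illustration we added (the function name is hypothetical), restricted to 2x2 keys so that the search over all m^4 matrices stays small.

```python
from itertools import product
from math import gcd


def count_valid_2x2_keys(modulus: int) -> int:
    """Count 2x2 matrices mod `modulus` whose determinant is coprime to it."""
    count = 0
    for a, b, c, d in product(range(modulus), repeat=4):
        det = (a * d - b * c) % modulus
        if gcd(det, modulus) == 1:  # gcd(0, modulus) == modulus, so det 0 is rejected
            count += 1
    return count


for modulus in (26, 29):
    valid = count_valid_2x2_keys(modulus)
    total = modulus ** 4
    print(f"mod {modulus}: {valid} of {total} matrices are usable keys ({valid / total:.1%})")
```

It should report 157248 of 456976 matrices (34.4%) for modulus 26 and 682080 of 707281 (96.4%) for modulus 29, which agree with the standard counts of invertible 2x2 matrices over these moduli. With 26, roughly two thirds of all candidates can be discarded without testing.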
Example 1. Mechanics of Encryption and Decryption
Plaintext Message (length 18): linearalgebraisfun
Numerical Representation: 11, 8, 13, 4, 0, 17, 0, 11, 6, 4, 1, 17, 0, 8, 18, 5, 20, 13

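Rewrite the message as a 3x6 matrix P, filling it column by column.

$$P = \begin{bmatrix} 11 & 4 & 0 & 4 & 0 & 5 \\ 8 & 0 & 11 & 1 & 8 & 20 \\ 13 & 17 & 6 & 17 & 18 & 13 \end{bmatrix}$$

Generate a random 3x3 matrix K modulo 26.

$$K = \begin{bmatrix} 4 & 13 & 2 \\ 7 & 23 & 25 \\ 10 & 17 & 25 \end{bmatrix}$$

Check for invertibility.

$$\det(K) = 1353 \neq 0$$

Check that the determinant is relatively prime to the alphabet length.

$$\gcd(1353, 26) = 1$$

Calculate the matrix product C = KP.

$$C = KP = \begin{bmatrix} 174 & 50 & 155 & 63 & 140 & 306 \\ 586 & 453 & 403 & 476 & 634 & 820 \\ 571 & 465 & 337 & 482 & 586 & 715 \end{bmatrix}$$

Calculate the values of C modulo 26.

$$C \equiv \begin{bmatrix} 18 & 24 & 25 & 11 & 10 & 20 \\ 14 & 11 & 13 & 8 & 10 & 14 \\ 25 & 23 & 25 & 14 & 14 & 13 \end{bmatrix} \pmod{26}$$

Convert C into letters, reading column by column, and write them in string form to give the ciphertext.
Ciphertext: sozylxznzliokkouon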

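Each of the values in matrix $K^{-1}$ is a quotient $\operatorname{adj}(K)_{ij}/1353$. In modular arithmetic, this is equivalent to the product of $\operatorname{adj}(K)_{ij}$ and the multiplicative inverse of 1353 modulo 26. Since $1353 = 52 \cdot 26 + 1$,

$$1 \equiv 1353^{-1} \cdot 1353 \pmod{26} \implies 1353^{-1} \equiv 1 \pmod{26}$$

Therefore

$$K^{-1} = \frac{1}{1353}\begin{bmatrix} 150 & -291 & 279 \\ 75 & 80 & -86 \\ -111 & 62 & 1 \end{bmatrix} \equiv 1 \cdot \begin{bmatrix} 150 & -291 & 279 \\ 75 & 80 & -86 \\ -111 & 62 & 1 \end{bmatrix} \equiv \begin{bmatrix} 20 & 21 & 19 \\ 23 & 2 & 18 \\ 19 & 10 & 1 \end{bmatrix} \pmod{26}$$

Checking for invertibility:

$$KK^{-1} = \begin{bmatrix} 4 & 13 & 2 \\ 7 & 23 & 25 \\ 10 & 17 & 25 \end{bmatrix}\begin{bmatrix} 20 & 21 & 19 \\ 23 & 2 & 18 \\ 19 & 10 & 1 \end{bmatrix} = \begin{bmatrix} 417 & 130 & 312 \\ 1144 & 443 & 572 \\ 1066 & 494 & 521 \end{bmatrix} \equiv \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \pmod{26}$$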
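The adjugate calculation above can be reproduced with a short Python sketch. The function names are our own; determinants are computed by cofactor expansion (fine for a small key like this 3x3 one, but slow for large matrices), and Python 3.8+'s built-in `pow(d, -1, 26)` supplies the inverse of the determinant.

```python
def minor(M: list[list[int]], i: int, j: int) -> list[list[int]]:
    """M with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def det(M: list[list[int]]) -> int:
    """Integer determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))


def adjugate(M: list[list[int]]) -> list[list[int]]:
    """adj(M)[i][j] is the (j, i) cofactor of M."""
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)] for i in range(n)]


def inverse_mod(M: list[list[int]], modulus: int = 26) -> list[list[int]]:
    """K^{-1} = det(K)^{-1} * adj(K) (mod 26); pow raises ValueError if not invertible."""
    d_inv = pow(det(M), -1, modulus)
    return [[d_inv * x % modulus for x in row] for row in adjugate(M)]


def matmul_mod(A, B, modulus: int = 26):
    return [[sum(a * b for a, b in zip(row, col)) % modulus for col in zip(*B)] for row in A]


K = [[4, 13, 2], [7, 23, 25], [10, 17, 25]]
K_inv = inverse_mod(K)
print(det(K), K_inv)         # 1353 [[20, 21, 19], [23, 2, 18], [19, 10, 1]]
print(matmul_mod(K, K_inv))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```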

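Decrypting the ciphertext

$$C \equiv \begin{bmatrix} 18 & 24 & 25 & 11 & 10 & 20 \\ 14 & 11 & 13 & 8 & 10 & 14 \\ 25 & 23 & 25 & 14 & 14 & 13 \end{bmatrix} \pmod{26}$$

$$K^{-1}C \equiv \begin{bmatrix} 20 & 21 & 19 \\ 23 & 2 & 18 \\ 19 & 10 & 1 \end{bmatrix}\begin{bmatrix} 18 & 24 & 25 & 11 & 10 & 20 \\ 14 & 11 & 13 & 8 & 10 & 14 \\ 25 & 23 & 25 & 14 & 14 & 13 \end{bmatrix} \equiv \begin{bmatrix} 11 & 4 & 0 & 4 & 0 & 5 \\ 8 & 0 & 11 & 1 & 8 & 20 \\ 13 & 17 & 6 & 17 & 18 & 13 \end{bmatrix} = P \pmod{26}$$

This returns the matrix P, which contains the desired plaintext message.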
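Decryption is the same block-by-block matrix multiplication as encryption, only with $K^{-1}$ in place of $K$. A minimal sketch, assuming the inverse key computed above and the 0=A, …, 25=Z numbering of this example (the function name is hypothetical):

```python
K_INV = [[20, 21, 19], [23, 2, 18], [19, 10, 1]]  # K^{-1} mod 26 from Example 1


def hill_decrypt(ciphertext: str, key_inverse: list[list[int]], modulus: int = 26) -> str:
    """Recover P = K^{-1} C (mod 26), one ciphertext column at a time."""
    m = len(key_inverse)
    nums = [ord(ch) - ord("a") for ch in ciphertext.lower()]
    out: list[int] = []
    for start in range(0, len(nums), m):
        column = nums[start:start + m]  # one column of C
        out += [sum(k * c for k, c in zip(row, column)) % modulus for row in key_inverse]
    return "".join(chr(n + ord("a")) for n in out)


print(hill_decrypt("sozylxznzliokkouon", K_INV))  # linearalgebraisfun
```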

Cracking the Hill Cipher: Known Plaintext Attack
One of the major weaknesses of the Hill Cipher is its susceptibility to cracking. In more exact terms, a cipher has been cracked when the inverse key matrix can be determined without being given the key. The known plaintext attack is useful when some plaintext can be matched to ciphertext. For example, messages may start with stereotypical phrases such as "Dear" or "To" and end with "Stop". Given this information, a segment of ciphertext can be matched to plaintext, since the Hill cipher does not change the position of text. The inverse key matrix can then be determined in a similar way as above, using sets of linear equations. For an $m \times m$ key matrix, cracking the Hill cipher requires only $m$ known plaintext blocks (whose ciphertext blocks form a matrix invertible modulo 26) and solving $m$ sets of $m$ linear equations in $m$ variables each. This simple calculation can usually be done by hand.
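The attack can be sketched for a 2x2 key. This is a hedged illustration with hypothetical function names, and it takes a slightly different route from Example 2 below: rather than solving for $K$ and then inverting it, it solves directly for $K^{-1}$ from $P \equiv K^{-1}C \pmod{26}$, so $K^{-1} \equiv P\,C^{-1}$ whenever the known ciphertext blocks form a matrix invertible modulo 26. It uses the A=1, …, Z=26 numbering of Example 2 and Python 3.8+'s `pow(x, -1, 26)`.

```python
def to_nums(text: str) -> list[int]:
    """A=1, ..., Z=26, reduced mod 26 (so Z -> 0), as in Example 2."""
    return [(ord(ch) - ord("A") + 1) % 26 for ch in text.upper()]


def inverse_2x2_mod(M: list[list[int]], modulus: int = 26) -> list[list[int]]:
    (a, b), (c, d) = M
    det_inv = pow((a * d - b * c) % modulus, -1, modulus)  # ValueError if not invertible
    return [[d * det_inv % modulus, -b * det_inv % modulus],
            [-c * det_inv % modulus, a * det_inv % modulus]]


def matmul_2x2_mod(A, B, modulus: int = 26) -> list[list[int]]:
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % modulus for j in range(2)]
            for i in range(2)]


def recover_inverse_key(known_plain: str, known_cipher: str) -> list[list[int]]:
    """Solve P = K^{-1} C for K^{-1} from two aligned plaintext/ciphertext digraphs."""
    p, c = to_nums(known_plain), to_nums(known_cipher)
    P = [[p[0], p[2]], [p[1], p[3]]]  # each digraph becomes one column
    C = [[c[0], c[2]], [c[1], c[3]]]
    return matmul_2x2_mod(P, inverse_2x2_mod(C))


print(recover_inverse_key("ACRI", "FAGQ"))  # [[3, 9], [4, 5]]
```

Fed the crib of Example 2 ("AC" and "RI" under "FA" and "GQ"), it returns the same inverse key obtained there by hand.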
This is an especially crippling weakness. Even if a cryptographer does not have enough information to determine every entry of the inverse matrix, the partial information can help them better understand the relationships between the plaintext and the ciphertext. Combined with knowledge of the language, the type of message, or the recipient, this can subsequently reveal more information about the inverse matrix.

Example 2. Known Plaintext Attack
Ciphertext: FAGQQILABQVLJCYQULAUSTYTOJSDJJPODFSZNLUHKMOW
Assume that the plaintext message begins with the characters "A CRIB", and that the encryption key is a 2x2 matrix. Note that in this example we have A=1, B=2, …, Z=26, so Z ≡ 0 (mod 26).
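In the first two digraphs, we have

FA -> AC
$$\begin{bmatrix} 6 \\ 1 \end{bmatrix} \equiv \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} 1 \\ 3 \end{bmatrix} \pmod{26}$$

GQ -> RI
$$\begin{bmatrix} 7 \\ 17 \end{bmatrix} \equiv \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} 18 \\ 9 \end{bmatrix} \pmod{26}$$

where $K = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is the encryption key, so that $C \equiv KP \pmod{26}$ as before.

This gives rise to a system of equations:

$$a + 3b = 6, \qquad c + 3d = 1, \qquad 18a + 9b = 7, \qquad 18c + 9d = 17$$

Solving over the rationals gives

$$K = \begin{bmatrix} -11/15 & 101/45 \\ 14/15 & 1/45 \end{bmatrix} = \frac{1}{45}\begin{bmatrix} -33 & 101 \\ 42 & 1 \end{bmatrix}$$

Using the method of multiplicative inverses as outlined in Example 1, we then rewrite the matrix modulo 26:

$$1 \equiv 45^{-1} \cdot 45 \pmod{26} \implies 45^{-1} \equiv 11 \pmod{26}$$

$$K \equiv 11\begin{bmatrix} -33 & 101 \\ 42 & 1 \end{bmatrix} \equiv \begin{bmatrix} 1 & 19 \\ 20 & 11 \end{bmatrix} \pmod{26}$$

Inverting this key in the same way, with $\det(K) \equiv 1 \cdot 11 - 19 \cdot 20 = -369 \equiv 21$ and $21^{-1} \equiv 5 \pmod{26}$, gives the inverse encryption key:

$$K^{-1} \equiv 5\begin{bmatrix} 11 & -19 \\ -20 & 1 \end{bmatrix} \equiv \begin{bmatrix} 3 & 9 \\ 4 & 5 \end{bmatrix} \pmod{26}$$

With the inverse key matrix, the remainder of the message can now be decrypted. The calculation was performed in Excel.

$$P \equiv K^{-1}C \pmod{26}$$

$$P = \left[\begin{array}{*{22}{c}} 1 & 18 & 2 & 19 & 3 & 18 & 5 & 20 & 15 & 10 & 3 & 21 & 5 & 15 & 16 & 1 & 14 & 5 & 20 & 5 & 20 & 18 \\ 3 & 9 & 9 & 1 & 15 & 18 & 3 & 3 & 14 & 5 & 20 & 18 & 6 & 18 & 12 & 9 & 20 & 24 & 12 & 20 & 5 & 19 \end{array}\right]$$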

Plaintext: ACRIBISACORRECTCONJECTUREFORPLAINTEXTLETTERS (A crib is a
correct conjecture for plaintext letters)
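The spreadsheet step can also be sketched in Python. This is an illustrative reconstruction, not the author's Excel sheet: it assumes the inverse key found above and the A=1, …, Z=26 numbering, under which a value of 0 modulo 26 decodes to Z.

```python
K_INV = [[3, 9], [4, 5]]  # inverse key recovered in Example 2 (mod 26)
CIPHERTEXT = "FAGQQILABQVLJCYQULAUSTYTOJSDJJPODFSZNLUHKMOW"
ALPHABET = "ZABCDEFGHIJKLMNOPQRSTUVWXY"  # index n mod 26 -> letter when A=1, ..., Z=26


def decrypt(ciphertext: str, key_inverse: list[list[int]], modulus: int = 26) -> str:
    """Apply P = K^{-1} C (mod 26) to each ciphertext digraph in turn."""
    nums = [(ord(ch) - ord("A") + 1) % modulus for ch in ciphertext]
    plain: list[str] = []
    for i in range(0, len(nums), 2):
        c0, c1 = nums[i], nums[i + 1]  # one ciphertext column
        for row in key_inverse:
            plain.append(ALPHABET[(row[0] * c0 + row[1] * c1) % modulus])
    return "".join(plain)


print(decrypt(CIPHERTEXT, K_INV))  # ACRIBISACORRECTCONJECTUREFORPLAINTEXTLETTERS
```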

Cracking the Hill Cipher: Ciphertext Attack
With modern computational power, the Hill Cipher can easily be cracked even without plaintext-ciphertext pairs. The ciphertext attack (also called a ciphertext-only attack) is a computationally intensive cracking strategy that attempts to match common letter sequences in the ciphertext to the most common letter sequences in the plaintext language. With a sufficiently long sample of ciphertext, a cryptographer can rank ciphertext letter sequences by frequency and compare them to a list of common sequences, such as Table 1, to make statistical predictions about the cipher matrix. This is simplest when the cipher matrix is small. Essentially, this is the same cracking method as the known plaintext attack, but numerous hypotheses for the inverse matrix must be tested through trial and error before a feasible inverse matrix is found.

Table 1. Common English letter sequences listed in order of frequency

Digraphs (two-letter sequences): TH HE AN IN ER ON RE ED ND HA AT EN ES OF NT EA TI TO IO LE IS OU AR AS DE RT VE SE OR AL TE CO
Trigraphs (three-letter sequences): THE AND THA ENT ION TIO FOR NDE HAS NCE TIS OFT MEN ING EDT STH
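The first step of the ciphertext attack, ranking ciphertext digraphs by frequency, can be sketched as follows (the function name is hypothetical). Only aligned, non-overlapping digraphs are counted, on the assumption of a 2x2 key: the Hill cipher encrypts fixed blocks, so only those pairs correspond to plaintext digraphs. The top-ranked ciphertext digraphs would then be paired with entries of Table 1 such as TH and HE, and each pairing tested with the known plaintext method.

```python
from collections import Counter


def rank_digraphs(ciphertext: str, top: int = 5) -> list[tuple[str, int]]:
    """Count aligned (non-overlapping) digraphs, most frequent first."""
    text = "".join(ch for ch in ciphertext.upper() if ch.isalpha())
    pairs = [text[i:i + 2] for i in range(0, len(text) - 1, 2)]
    return Counter(pairs).most_common(top)


print(rank_digraphs("ththhe"))  # [('TH', 2), ('HE', 1)]
```

Note that a message as short as the ciphertext of Example 2 is far too short for this approach: each of its 22 aligned digraphs occurs exactly once.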





A Better Cipher
Two properties of a cipher are considered when discussing its strength: diffusion and
confusion. Diffusion refers to the complexity of the relationship between the plaintext
and the ciphertext. Confusion is the complexity of the relationship between the
ciphertext and the cipher key. The best ciphers have high diffusion and confusion.
In the Hill Cipher, diffusion is adequate, since each ciphertext character depends on every plaintext character in its block. However, a change in plaintext affects only a single column of the ciphertext matrix. For maximum diffusion, the entire ciphertext would change with each change in plaintext. This is known as the avalanche effect, where a slight change in input causes a drastic change in output. Confusion is poor for this cipher, since the key enters each ciphertext character only linearly, so partial information about the cipher key can help one decrypt the ciphertext. Ideally, every ciphertext letter would depend on the entire key in a complex way, so that the key is useless unless it is known completely.
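The limited diffusion of the Hill Cipher is easy to check numerically. This sketch (with hypothetical helper names) reuses the key of Example 1 and changes a single plaintext letter, "f" to "g"; only the three ciphertext letters of the affected column change.

```python
K = [[4, 13, 2], [7, 23, 25], [10, 17, 25]]  # key from Example 1


def hill_encrypt(plaintext: str, key: list[list[int]], modulus: int = 26) -> str:
    """C = KP (mod 26) with 0=A, ..., 25=Z, read column by column."""
    m = len(key)
    nums = [ord(ch) - ord("a") for ch in plaintext]
    out: list[int] = []
    for start in range(0, len(nums), m):
        column = nums[start:start + m]
        out += [sum(k * p for k, p in zip(row, column)) % modulus for row in key]
    return "".join(chr(n + ord("a")) for n in out)


def changed_positions(a: str, b: str) -> list[int]:
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


original = hill_encrypt("linearalgebraisfun", K)
altered = hill_encrypt("linearalgebraisgun", K)  # one letter changed: f -> g
print(original, altered, changed_positions(original, altered))
# sozylxznzliokkouon sozylxznzliokkoyvx [15, 16, 17]
```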
A substitution-permutation network offers many benefits with respect to diffusion and confusion. In this cipher algorithm, several rounds of substitution and permutation are applied to the bits of the plaintext. In each round, the bits are first combined with a round key derived from the cipher key, then split into small blocks, each of which passes through a substitution box (S-box) that replaces it in a highly nonlinear way. The result is then passed to a permutation box (P-box), where the bits are rearranged and sent to the next round of S-boxes. A single changed input bit to an S-box typically changes several of its output bits, and the P-box spreads those bits across several S-boxes in the next round, so after a few rounds a single changed plaintext bit affects the entire output block. This confers diffusion on the algorithm. Because every round mixes in key material, each ciphertext bit also depends on many key bits in a complicated way, resulting in cryptographic confusion as well.
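A toy substitution-permutation network makes the round structure concrete. This is a teaching sketch only, not DES or AES: the 16-bit block size, the round count, and the round keys are hypothetical choices, and there is no real key schedule. The 4-bit S-box is borrowed from the first row of DES S-box S1, and the P-box sends bit i to bit 4·(i mod 4) + ⌊i/4⌋.

```python
# 4-bit S-box: the first row of DES S-box S1, reused here purely as an example.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

# Hypothetical round keys, chosen arbitrarily for illustration (no key schedule).
ROUND_KEYS = [0x3A94, 0xA94D, 0x94D6, 0x4D63, 0xD63F]


def substitute(state: int) -> int:
    """S-box layer: replace each of the four 4-bit nibbles."""
    out = 0
    for shift in (0, 4, 8, 12):
        out |= SBOX[(state >> shift) & 0xF] << shift
    return out


def permute(state: int) -> int:
    """P-box layer: bit i moves to bit 4*(i % 4) + i // 4 (a 4x4 bit transpose)."""
    out = 0
    for i in range(16):
        if (state >> i) & 1:
            out |= 1 << (4 * (i % 4) + i // 4)
    return out


def toy_spn_encrypt(block: int, round_keys: list[int]) -> int:
    """Each round: mix in a round key, substitute, permute; then a final key mix."""
    state = block
    for key in round_keys[:-1]:
        state = permute(substitute(state ^ key))
    return state ^ round_keys[-1]


def bits_changed(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


plain = 0x26B7
flipped_plain = plain ^ 0x0001  # flip one plaintext bit (diffusion)
flipped_keys = list(ROUND_KEYS)
flipped_keys[2] ^= 0x0100       # flip one bit of a middle round key (confusion)

c = toy_spn_encrypt(plain, ROUND_KEYS)
print("plaintext bit flipped:", bits_changed(c, toy_spn_encrypt(flipped_plain, ROUND_KEYS)), "of 16 bits differ")
print("key bit flipped:      ", bits_changed(c, toy_spn_encrypt(plain, flipped_keys)), "of 16 bits differ")
```

Because both layers are bijections, distinct plaintexts always give distinct ciphertexts. How many output bits change depends on the arbitrary keys above, so no particular count should be read into it.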
The Data Encryption Standard (DES), developed by IBM in the 1970s with input from the NSA, was the first widely used standardized block cipher built from rounds of S-boxes and permutations, although its rounds are arranged as a Feistel network rather than a pure substitution-permutation network. In its heyday, it was considered secure enough to be used for the US government's security needs. However, with the technology of the late 20th century, DES, with its 56-bit key, could be cracked by brute force in a matter of hours. DES has since been superseded as the gold standard by the Advanced Encryption Standard (AES), a substitution-permutation network that uses 128-, 192-, or 256-bit keys. For now, this standard is resilient enough to withstand brute force attacks: a brute force attack on AES-256 would take all of the current computing power in the world longer than the age of the universe.
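The gap between these key sizes is a matter of simple arithmetic: each extra key bit doubles the search. The sketch below assumes a hypothetical aggregate rate of 10^18 keys per second, which is a placeholder rather than a measured figure for the world's computing power, and an age of the universe of roughly 13.8 billion years.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
AGE_OF_UNIVERSE_YEARS = 13.8e9  # roughly 13.8 billion years
KEYS_PER_SECOND = 1e18          # hypothetical aggregate guessing rate (an assumption)


def years_to_search(key_bits: int, keys_per_second: float = KEYS_PER_SECOND) -> float:
    """Years needed to try every key of the given length at a fixed rate."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR


for name, bits in [("DES", 56), ("AES-128", 128), ("AES-256", 256)]:
    years = years_to_search(bits)
    print(f"{name:8} {years:10.3e} years  ({years / AGE_OF_UNIVERSE_YEARS:.1e} x age of universe)")
```

Even if the assumed rate were off by many orders of magnitude, exhausting a 256-bit key space would still take vastly longer than the age of the universe, while a 56-bit space falls almost instantly at this rate.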



