
Arkadii Slinko

Algebra for

Applications

Cryptography, Secret Sharing,

Error-Correcting, Fingerprinting,

Compression

Springer Undergraduate Mathematics Series

Advisory Board

M.A.J. Chaplain, University of Dundee, Dundee, Scotland, UK

K. Erdmann, University of Oxford, Oxford, England, UK

A. MacIntyre, Queen Mary, University of London, London, England, UK

E. Süli, University of Oxford, Oxford, England, UK

M.R. Tehranchi, University of Cambridge, Cambridge, England, UK

J.F. Toland, University of Cambridge, Cambridge, England, UK

More information about this series at http://www.springer.com/series/3423

Arkadii Slinko

Algebra for Applications

Cryptography, Secret Sharing, Error-Correcting, Fingerprinting, Compression

Arkadii Slinko

Department of Mathematics

The University of Auckland

Auckland

New Zealand

Springer Undergraduate Mathematics Series

ISBN 978-3-319-21950-9 ISBN 978-3-319-21951-6 (eBook)

DOI 10.1007/978-3-319-21951-6

Mathematics Subject Classiﬁcation: 11A05–11A51, 11C08, 11C20, 11T06, 11T71, 11Y05, 11Y11,

11Y16, 20A05, 20B30, 12E20, 14H52, 14G50, 68P25, 68P30, 94A60, 94A62

© Springer International Publishing Switzerland 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part

of the material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations,

recitation, broadcasting, reproduction on microﬁlms or in any other physical way, and transmission

or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar

methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this

publication does not imply, even in the absence of a speciﬁc statement, that such names are exempt from

the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this

book are believed to be true and accurate at the date of publication. Neither the publisher nor the

authors or the editors give a warranty, express or implied, with respect to the material contained herein or

for any errors or omissions that may have been made.

(www.springer.com)

The aim of a Lecturer should be,

not to gratify his vanity by a shew

of originality; but to explain,

to arrange, and to digest with clearness,

what is already known in the science…

To my parents Michael and Zinaida,

my wife Lilia,

my children Irina and Michael, and

my grandchildren Erik and Yuri.

Preface

This book originated from my lecture notes for the one-semester course which I

have given many times at The University of Auckland since 1998. The goal of that

course and this book is to show the incredible power of algebra and number theory

in the real world. It does not advance far in theoretical algebra, theoretical number

theory or combinatorics. Instead, we concentrate on concrete objects like groups of

points on elliptic curves, polynomial rings and ﬁnite ﬁelds, study their elementary

properties and show their exceptional applicability to various problems in infor-

mation handling. Among the applications are cryptography, secret sharing,

error-correcting, ﬁngerprinting and compression of information.

Some chapters of this book—especially the number-theoretic and cryptographic

ones—use GAP to illustrate the main ideas. GAP is a system for computational

discrete algebra, which provides a programming language, a library of thousands of

functions implementing algebraic algorithms, written in the GAP language, as well

as large data libraries of algebraic objects.

If you are using this book for self-study, then, when studying a particular topic, first familiarise yourself with the corresponding section of Appendix A, where you will find detailed instructions on how to use GAP for that topic. As GAP will be useful for most topics, it is not a good idea to skip it completely.

I owe a lot to Robin Christian who in 2006 helped me to introduce GAP to my

course and proofread the lecture notes. The introduction of GAP has been the

biggest single improvement to this course. The initial version of the GAP Notes,

which have now been developed into Appendix A, were written by Robin. Stefan

Kohl, with the assistance of Eamonn O’Brien, has kindly provided us with two

programs for GAP that allowed us to calculate in groups of points on elliptic curves.

I am grateful to Paul Hafner, Primož Potočnik, Jamie Sneddon and especially to

Steven Galbraith who in various years were members of the teaching team for this

course and suggested valuable improvements or contributed exercises.

Many thanks go to Shaun White who did a very thorough job proofreading part

of the text in 2008 and to Steven Galbraith who improved the section on cryp-

tography in 2009 and commented on the section on compression. However, I bear


the sole responsibility for all mistakes and misprints in this book. I would be most

obliged if you report any noticed mistakes and misprints to me.

I hope you will enjoy this book as much as I enjoyed writing it.

March 2015

Contents

1 Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Natural Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 Basic Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.2 Divisibility and Primes . . . . . . . . . . . . . . . . . . . . . . 4

1.1.3 Factoring Integers. The Sieve of Eratosthenes. . . . . . . 9

1.2 The Euclidean Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.2.1 Greatest Common Divisor and Least Common

Multiple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.2.2 Extended Euclidean Algorithm. Chinese

Remainder Theorem . . . . . . . . . . . . . . . . . . . . . . . . 17

1.3 Fermat’s Little Theorem and Its Generalisations . . . . . . . . . . 22

1.3.1 Euler’s φ-Function . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.3.2 Congruences. Euler’s Theorem . . . . . . . . . . . . . . . . . 24

1.4 The Ring of Integers Modulo n. The Field Zp . . . . . . . . . . . . 27

1.5 Representation of Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 32

2 Cryptology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.1 Classical Secret-Key Cryptology . . . . . . . . . . . . . . . . . . . . . . 38

2.1.1 The One-Time Pad . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.1.2 An Affine Cryptosystem . . . . . . . . . . . . . . . . . . . . . 41

2.1.3 Hill’s Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . 43

2.2 Modern Public-Key Cryptology . . . . . . . . . . . . . . . . . . . . . . 47

2.2.1 One-Way Functions and Trapdoor Functions . . . . . . . 47

2.3 Computational Complexity. . . . . . . . . . . . . . . . . . . . . . . . . . 49

2.3.1 Orders of Magnitude . . . . . . . . . . . . . . . . . . . . . . . . 50

2.3.2 The Time Complexity of Several Number-Theoretic

Algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2.4 The RSA Public-Key Cryptosystem . . . . . . . . . . . . . . . . . . . 58

2.4.1 How Does the RSA System Work?. . . . . . . . . . . . . . 58

2.4.2 Why Does the RSA System Work?. . . . . . . . . . . . . . 61

2.4.3 Pseudoprimality Tests . . . . . . . . . . . . . . . . . . . . . . . 64


References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3 Groups. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 73

3.1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 73

3.1.1 Composition of Mappings. The Group

of Permutations of Degree n. . . . . . . . . . . . . . . . . . . 73

3.1.2 Block Permutation Cipher . . . . . . . . . . . . . . . . . . . . 78

3.1.3 Cycles and Cycle Decomposition . . . . . . . . . . . . . . . 79

3.1.4 Orders of Permutations . . . . . . . . . . . . . . . . . . . . . . 81

3.1.5 Analysis of Repeated Actions . . . . . . . . . . . . . . . . . . 84

3.1.6 Transpositions. Even and Odd . . . . . . . . . . . . . . . . . 86

3.1.7 Puzzle 15. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

3.2 General Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

3.2.1 Definition of a Group. Examples . . . . . . . . . . . . . . . 93

3.2.2 Powers, Multiples and Orders. Cyclic Groups. . . . . . . 95

3.2.3 Isomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

3.2.4 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

3.3 The Abelian Group of an Elliptic Curve . . . . . . . . . . . . . . . . 103

3.3.1 Elliptic Curves. The Group of Points of an Elliptic

Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

3.3.2 Quadratic Residues and Hasse’s Theorem . . . . . . . . . 109

3.3.3 Calculating Large Multiples Efficiently . . . . . . . . . . . 112

3.4 Applications to Cryptography . . . . . . . . . . . . . . . . . . . . . . . . 114

3.4.1 Encoding Plaintext . . . . . . . . . . . . . . . . . . . . . . . . . 114

3.4.2 Additive Diffie–Hellman Key Exchange

and the Elgamal Cryptosystem . . . . . . . . . . . . . . . .. 115

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 116

4 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

4.1 Introduction to Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

4.1.1 Examples and Elementary Properties of Fields . . . . . . 117

4.1.2 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

4.1.3 The Cardinality of a Finite Field. . . . . . . . . . . . . . . . 124

4.2 The Multiplicative Group of a Finite Field Is Cyclic . . . . . . . . 125

4.2.1 Lemmas on Orders of Elements . . . . . . . . . . . . . . . . 126

4.2.2 Proof of the Main Theorem . . . . . . . . . . . . . . . . . . . 128

4.2.3 Discrete Logarithms . . . . . . . . . . . . . . . . . . . . . . . . 129

4.3 The Elgamal Cryptosystem Revisited . . . . . . . . . . . . . . . . . . 130

5 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.1 The Ring of Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.1.1 Introduction to Polynomials . . . . . . . . . . . . . . . . . . . 133

5.1.2 Lagrange Interpolation. . . . . . . . . . . . . . . . . . . . . . . 138


5.1.4 Greatest Common Divisor and Least Common

Multiple. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

5.2 Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

5.2.1 Polynomials Modulo m(x) . . . . . . . . . . . . . . . . . . . 145

5.2.2 Minimal Annihilating Polynomials . . . . . . . . . . . . . . 148

6.1 Introduction to Secret Sharing . . . . . . . . . . . . . . . . . . . . . . . 154

6.1.1 Access Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

6.1.2 Shamir’s Threshold Access Scheme . . . . . . . . . . . . . 155

6.2 A General Theory of Secret Sharing Schemes . . . . . . . . . . . . 158

6.2.1 General Properties of Secret Sharing Schemes . . . . . . 158

6.2.2 Linear Secret Sharing Schemes . . . . . . . . . . . . . . . . . 163

6.2.3 Ideal and Non-ideal Secret Sharing Schemes . . . . . . . 167

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

7.1 Binary Error-Correcting Codes . . . . . . . . . . . . . . . . . . . . . . . 172

7.1.1 The Hamming Weight and the Hamming Distance . . . 172

7.1.2 Encoding and Decoding. Simple Examples . . . . . . . . 175

7.1.3 Minimum Distance, Minimum Weight.

Linear Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

7.1.4 Matrix Encoding Technique . . . . . . . . . . . . . . . . . . . 182

7.1.5 Parity Check Matrix . . . . . . . . . . . . . . . . . . . . . . . . 187

7.1.6 The Hamming Codes . . . . . . . . . . . . . . . . . . . . . . . 190

7.1.7 Polynomial Codes . . . . . . . . . . . . . . . . . . . . . . . . . . 193

7.1.8 Bose–Chaudhuri–Hocquenghem (BCH) Codes . . . . . . 196

7.2 Non-binary Error-Correcting Codes . . . . . . . . . . . . . . . . . . . . 199

7.2.1 The Basics of Non-binary Codes . . . . . . . . . . . . . . . 199

7.2.2 Reed–Solomon (RS) Codes . . . . . . . . . . . . . . . . . . . 201

7.3 Fingerprinting Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

7.3.1 The Basics of Fingerprinting . . . . . . . . . . . . . . . . . . 204

7.3.2 Frameproof Codes. . . . . . . . . . . . . . . . . . . . . . . . . . 207

7.3.3 Codes with the Identifiable Parent Property . . . . . . . . 208

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

8.1 Prefix Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

8.1.1 Information and Information Relative to a Partition . . . 214

8.1.2 Non-uniform Encoding. Prefix Codes . . . . . . . . . . . . 217

8.2 Fitingof’s Compression Code . . . . . . . . . . . . . . . . . . . . . 221

8.2.1 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

8.2.2 Fast Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . 224


References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

9.1 Computing with GAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

9.1.1 Starting with GAP . . . . . . . . . . . . . . . . . . . . . . . . . 229

9.1.2 The GAP Interface . . . . . . . . . . . . . . . . . . . . . . . . . 229

9.1.3 Programming in GAP: Variables, Lists, Sets

and Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

9.2 Number Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

9.2.1 Basic Number-Theoretic Algorithms . . . . . . . . . . . . . 232

9.2.2 Arithmetic Modulo m . . . . . . . . . . . . . . . . . . . . . . . 234

9.2.3 Digitising Messages . . . . . . . . . . . . . . . . . . . . . . . . 235

9.3 Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

9.4 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

9.4.1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

9.4.2 Elliptic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

9.4.3 Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

9.4.4 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

10.1 Linear Dependency Relationship Algorithm . . . . . . . . . . . . . . 249

10.2 The Vandermonde Determinant . . . . . . . . . . . . . . . . . . . . . . 250

11.1 Solutions to Exercises of Chap. 1 . . . . . . . . . . . . . . . . . . . . 253

11.2 Solutions to Exercises of Chap. 2 . . . . . . . . . . . . . . . . . . . . 266

11.3 Solutions to Exercises of Chap. 3 . . . . . . . . . . . . . . . . . . . . 283

11.4 Solutions to Exercises of Chap. 4 . . . . . . . . . . . . . . . . . . . . 296

11.5 Solutions to Exercises of Chap. 5 . . . . . . . . . . . . . . . . . . . . 301

11.6 Solutions to Exercises of Chap. 6 . . . . . . . . . . . . . . . . . . . . 309

11.7 Solutions to Exercises of Chap. 7 . . . . . . . . . . . . . . . . . . . . 315

11.8 Solutions to Exercises of Chap. 8 . . . . . . . . . . . . . . . . . . . . 327

Chapter 1

Integers

Rhinoceros. Eugène Ionesco (1909–1994)

The formula ‘Two and two make ﬁve’ is not without its

attractions.

Notes from Underground. Fyodor Dostoevsky (1821–1881)

The theory of numbers is the oldest and the most fundamental mathematical disci-

pline. Despite its old age, it is one of the most active research areas of mathematics

due to two main reasons. Firstly, the advent of fast computers has changed Number

Theory profoundly and made it in some ways almost an experimental discipline.

Secondly, new important areas of applications such as cryptography have emerged.

Some of the applications of Number Theory will be considered in this course.

1.1 Natural Numbers

1.1.1 Basic Principles

We denote by N the set of all positive integers, also called the natural numbers. The most important properties of N are formulated in the following three principles.

The Least Integer Principle. Every non-empty set S ⊆ N of positive integers

contains a smallest (least) element.

The Principle of Mathematical Induction. Let S ⊆ N be a set of positive

integers which contains 1 and contains n + 1 whenever it contains n. Then S = N.

The Principle of Strong Mathematical Induction. Let S ⊆ N be a set of positive

integers which contains 1 and contains n + 1 whenever it contains 1, 2, . . . , n. Then

S = N.

These three principles are equivalent to each other: if you accept one of them

you can prove the remaining two as theorems. Normally one of them, most often

the Principle of Mathematical Induction, is taken as an axiom of Arithmetic but in


proofs we use all of them since one may be much more convenient to use than the

others.

Example 1.1.1 On planet Tralfamadore there are only 3 cent and 5 cent coins in

circulation. Prove that an arbitrary sum of n ≥ 8 cents can be paid (provided one has

a sufﬁcient supply of coins).

Solution: Suppose that this statement is not true and there are positive integers

m ≥ 8 for which the sum of m cents cannot be paid by a combination of 3 cent and

5 cent coins. By the Least Integer Principle there is a smallest such positive integer

s (the minimal counterexample). It is clear that s is not 8, 9 or 10 as 8 = 3 + 5,

9 = 3 + 3 + 3, 10 = 5 + 5. Thus s − 3 ≥ 8 and, since s was minimal, the sum of

s − 3 cents can be paid as required. Adding to s − 3 cents one more 3 cent coin we

obtain that the sum of s cents can be also paid, which is a contradiction.
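The claim can also be checked empirically. The following Python sketch (the book's own computations use GAP; Python is used here purely for illustration) verifies by brute force that every amount from 8 cents upward can be paid:

```python
# Check the Tralfamadore claim: every amount n >= 8 should be payable
# with 3 cent and 5 cent coins, i.e. n = 3a + 5b for some a, b >= 0.
def payable(n):
    # try every feasible number b of 5 cent coins
    return any((n - 5 * b) % 3 == 0 for b in range(n // 5 + 1))

assert all(payable(n) for n in range(8, 1001))
assert not payable(7)  # 7 cents cannot be paid
```

Note that the check also confirms that 8 is the right threshold, since 7 cents is indeed unpayable.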

Example 1.1.2 Prove that, for every positive integer n,

1/1^2 + 1/2^2 + · · · + 1/n^2 < 2.

Solution: Denote the left-hand side of the inequality by F(n). We have a sequence

of statements A1 , A2 , . . . , An , . . . to be proved, where An is F(n) < 2, and we are

going to use the Principle of Mathematical Induction to prove all of them.

The statement A1 reduces to

1/1^2 < 2,

which is true. Now we have to derive the validity of An+1 from the validity of An ,

that is, to prove that

F(n) < 2 implies F(n) + 1/(n + 1)^2 < 2.

Oops! It is not possible because, while we do know that F(n) < 2, we do not have

the slightest idea how close F(n) is to 2, and we therefore cannot be sure that there

will be room for 1/(n + 1)^2. What shall we do?

Surprisingly, the stronger inequality

1/1^2 + 1/2^2 + · · · + 1/n^2 ≤ 2 − 1/n

can be proved by induction. Indeed, the base statement A1 now becomes the equality


1/1^2 = 2 − 1/1,

and

F(n) ≤ 2 − 1/n implies F(n) + 1/(n + 1)^2 ≤ 2 − 1/(n + 1) (1.1)

is now true. Due to the induction hypothesis, which is F(n) ≤ 2 − 1/n, to show (1.1)

it would be sufﬁcient to show that

2 − 1/n + 1/(n + 1)^2 ≤ 2 − 1/(n + 1).

This is equivalent to

1/(n + 1)^2 ≤ 1/n − 1/(n + 1) = 1/(n(n + 1)),

which is true.
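The strengthened inequality can also be confirmed numerically. Here is a quick Python check (an illustration only, not part of the proof; the book itself uses GAP for computations):

```python
# Numerical sanity check of the strengthened induction hypothesis
# F(n) = 1/1^2 + 1/2^2 + ... + 1/n^2 <= 2 - 1/n, which implies F(n) < 2.
def F(n):
    return sum(1 / k ** 2 for k in range(1, n + 1))

for n in range(1, 500):
    assert F(n) <= 2 - 1 / n

assert F(1) == 2 - 1 / 1  # equality holds in the base case
```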

This example shows that we shouldn’t expect that someone has already prepared

the problem for us so that the Principle of Mathematical Induction can be applied

directly.

The reader needs to be familiar with the induction principles. The exercises below

concentrate on the use of the Least Integer Principle.

Exercises

1. Prove or disprove the following statements:

(a) Every nonempty set of integers (we do not require the integers in the set to be positive) contains a smallest element.

(b) Every nonempty set of positive rational numbers contains a smallest element.

2. Prove that, for any integer n ≥ 1, the integer 4^n + 15n − 1 is divisible by 9.

3. Prove that 11^(n+2) + 12^(2n+1) is divisible by 133 for all n ≥ 0.

4. Let Fn = 2^(2^n) + 1 be the nth Fermat number. Show that F0 F1 . . . Fn = Fn+1 − 2.

5. Prove that 2^n + 1 is divisible by n for all numbers of the form n = 3^k.

6. Prove that an arbitrary positive integer N can be represented as a sum of distinct powers chosen from 1, 2, 2^2, . . . , 2^n, . . .

7. Use the Least Integer Principle to prove that the representation of N as a sum of

distinct powers of 2 is unique.

8. Several discs of equal diameter lie on a table so that some of them touch each

other but no two of them overlap. Prove that these discs can be painted with four

colours so that no two discs of the same colour touch.

1.1.2 Divisibility and Primes

The set of all integers

. . . , −3, −2, −1, 0, 1, 2, 3, . . .

is denoted by Z.

Theorem 1.1.1 (Division with Remainder) Given any integers a, b, with a > 0,

there exist unique integers q, r such that

b = qa + r, and 0 ≤ r < a.

In this case we also say that q and r are, respectively, the quotient and the remainder

of b when it is divided by a. It is often said that q and r are the quotient and the

remainder of dividing a into b. The notation r = b mod a is often used. You can

ﬁnd q and r by using long division, a technique which most students learn at school.

If you want to ﬁnd q and r using a calculator, use it to divide b by a. This will give

you a number with decimals. Discard all the digits to the right of the decimal point

to obtain q. Then find r as b − aq.

Example 1.1.3 (a) 35 = 3 · 11 + 2, (b) −51 = (−8) · 7 + 5; so that 2 = 35 mod 11

and 5 = −51 mod 7.
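Both parts of the example can be reproduced in Python, whose floored integer division happens to match the theorem's convention 0 ≤ r < a even for negative b (a hypothetical illustration; the book's sessions use GAP):

```python
# Division with remainder: for a > 0, divmod(b, a) returns q, r with
# b == q*a + r and 0 <= r < a. Python floors the quotient, so this
# holds even when b is negative.
q, r = divmod(35, 11)
assert (q, r) == (3, 2)

q, r = divmod(-51, 7)
assert (q, r) == (-8, 5)
assert -51 == q * 7 + r
```

Note that a calculator which truncates −51/7 = −7.28… towards zero would suggest q = −7, giving a negative remainder; flooring gives the correct q = −8, r = 5.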

Definition 1.1.1 An integer b is divisible by an integer a ≠ 0 if there exists an integer c such that b = ac, that is, we have b mod a = 0. We also say that a is a divisor of b and write a | b.

Let n be a positive integer. Let us denote by d(n) the number of positive divisors

of n. It is clear that 1 and n are always divisors of any number n which is greater

than 1. Thus we have d(1) = 1 and d(n) ≥ 2 for n > 1.

Deﬁnition 1.1.2 A positive integer n is called a prime if d(n) = 2. An integer n > 1

which is not prime is called a composite number.

Example 1.1.4 (a) 2, 3, 5, 7, 11, 13 are primes; (b) 1, 4, 6, 8, 9, 10 are not primes;

(c) 4, 6, 8, 9, 10 are composite numbers.

A composite positive integer n can always be represented as a product of two

other positive integers different from 1 and n. Indeed, since d(n) > 2, there is a

divisor n1 such that 1 < n1 < n. But then n2 = n/n1 also satisfies 1 < n2 < n and n = n1 n2. We are ready to prove

Theorem 1.1.2 (The Fundamental Theorem of Arithmetic) Every positive integer

n > 1 can be expressed as a product of primes (with perhaps only one factor), that is,

n = p1 p2 . . . pr ,

where p1, p2, . . . , pr are primes. This factorisation is unique apart from the order of the prime factors.


Proof Let us prove ﬁrst that any number n > 1 can be decomposed into a product

of primes. We will use the Principle of Strong Mathematical Induction. If n = 2, the

decomposition is trivial and we have only one factor, which is 2 itself. Let us assume

that for all positive integers which are less than n, a decomposition into a product

of primes exists. If n is a prime, then n = n is the decomposition required. If n is

composite, then n = n1 n2, where n > n1 > 1 and n > n2 > 1, and by the induction hypothesis there are prime decompositions n1 = p1 . . . pr and n2 = q1 . . . qs for n1 and n2. Then we may combine them:

n = n1 n2 = p1 . . . pr q1 . . . qs .

To prove that the decomposition is unique, we shall assume the existence of

an integer capable of two essentially different prime decompositions, and from this

assumption derive a contradiction. This will show that the hypothesis that there exists

an integer with two essentially different prime decompositions cannot be true, and

hence the prime decomposition of every integer is unique. We will use the Least

Integer Principle.

Suppose that there exists a positive integer with two essentially different prime

decompositions, then there will be a smallest such integer

n = p1 p2 . . . pr = q1 q2 . . . qs , (1.2)

where pi and qj are primes. By rearranging the order of the p’s and the q’s, if

necessary, we may assume that

p1 ≤ p2 ≤ . . . ≤ pr , q1 ≤ q2 ≤ . . . ≤ qs .

It is impossible that p1 = q1 , for, if it were the case, we would cancel the ﬁrst factor

from each side of Eq. (1.2) and obtain two essentially different prime decompositions

for the number n/ p1 , which is smaller than n, contradicting the choice of n. Hence

either p1 < q1 or q1 < p1 . Without loss of generality we suppose that p1 < q1 .

We now form the integer

n′ = n − p1 q2 q3 . . . qs . (1.3)

Since p1 < q1, this number is positive. It is obviously smaller than n. The two distinct decompositions of n give the following two decompositions of n′:

n′ = ( p1 p2 . . . pr ) − ( p1 q2 . . . qs ) = p1 ( p2 . . . pr − q2 . . . qs ), (1.4)

n′ = (q1 q2 . . . qs ) − ( p1 q2 . . . qs ) = (q1 − p1 ) q2 . . . qs . (1.5)


Since n′ is a positive integer which is smaller than n and greater than 1, the prime decomposition for n′ must be unique, apart from the order of the factors. This means that if we complete prime factorisations (1.4) and (1.5) the results will be identical. From (1.4) we learn that p1 is a factor of n′ and must appear as a factor in decomposition (1.5). Since p1 < q1 ≤ qi, we see that p1 ≠ qi for i = 2, 3, . . . , s. Hence, it is a factor of q1 − p1, i.e., q1 − p1 = p1 m or q1 = p1 (m + 1), which is impossible as q1 is prime and m + 1 ≥ 2. This contradiction completes the proof of the Fundamental Theorem of Arithmetic. □

Prime factorisations can be computed in GAP with the function FactorsInt. If, for example, we factorise 396 and 17, the corresponding output of GAP will look as follows:

gap> FactorsInt(396);

[ 2, 2, 3, 3, 11 ]

gap> FactorsInt(17);

[ 17 ]
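For comparison, here is a rough Python sketch of the trial-division idea behind such a factorisation (GAP's actual FactorsInt uses far more sophisticated methods; this is an illustration only):

```python
# Factor n into primes by trial division, returning the prime factors
# in ascending order, e.g. 396 -> [2, 2, 3, 3, 11].
def prime_factors(n):
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:      # divide out each prime factor completely
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                  # whatever remains is itself prime
        factors.append(n)
    return factors

assert prime_factors(396) == [2, 2, 3, 3, 11]
assert prime_factors(17) == [17]
```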

GAP conveniently remembers all 168 primes not exceeding 1000. They are stored

in the array Primes (in Sect. 9.1.3 all the primes in this array are listed). GAP can

also check if a particular number is prime or not.

gap> IsPrime(2ˆ(2ˆ4)-1);

false

gap> IsPrime(2ˆ(2ˆ4)+1);

true

What GAP cannot answer is whether or not there are inﬁnitely many primes. This is

something that can only be proved.

Theorem 1.1.3 (Euclid1) There are infinitely many primes.

Proof Suppose there were only finitely many primes p1, p2, . . . , pr . Then form the

integer

n = 1 + p1 p2 . . . pr .

1 Euclid of Alexandria (about 325 BC–265 BC) is one of the most prominent educators of all

time. He is best known for his treatise on mathematics The Elements which is divided into 13

books: the ﬁrst six on geometry, three on number theory, one is devoted to Eudoxus’s theory of

irrational numbers and the last three to solid geometry. Euclid is not known to have made any

original discoveries and The Elements are based on the work of the people before him such as

Eudoxus, Thales, Hippocrates and Pythagoras. Over a thousand editions of this work have been

published since the ﬁrst printed version appeared in 1482. Very little, however, is known about his

life. The enormity of the work attributed to Euclid even led some researchers to suggest that The

Elements was written by a team of mathematicians at Alexandria who took the name Euclid from

the historical character who lived 100 years earlier.


Since n > pi for all i, it must be composite. Let q be the smallest prime factor of n.

As p1, p2, . . . , pr represent all existing primes, q is one of them, say q = p1,

and n = p1 m. Now we can write

1 = n − p1 p2 . . . pr = p1 m − p1 p2 . . . pr = p1 (m − p2 . . . pr ).

Hence p1 divides 1, which is impossible. Thus the assumption that there were only finitely many primes must be false. □

In the past many mathematicians looked for a formula that always evaluates to

a prime number. Euler2 noticed that all values of the quadratic polynomial P(n) =

n^2 − n + 41 are prime for n = 0, 1, 2, . . . , 40. However, P(41) = 41^2 is not prime.

For the same reason Fermat introduced the numbers Fm = 2^(2^m) + 1, m ≥ 0, which are

now called Fermat numbers. He checked that F0 = 3, F1 = 5, F2 = 17, F3 = 257

and F4 = 65537 are primes. He believed that all such numbers are primes, however

he could not prove that F5 = 4294967297 is prime. Euler in 1732 showed that F5

was composite by presenting its prime factorisation F5 = 641 · 6700417. We can

now easily check this with GAP:

gap> F5:=2ˆ(2ˆ5)+1;

4294967297

gap> IsPrime(F5);

false

gap> FactorsInt(F5);

[ 641, 6700417 ]

Since then it has been shown that all numbers F5 , F6 , . . . , F32 are composite. The

status of F33 remains unknown (December, 2014). It is also unknown whether there

are inﬁnitely many prime Fermat numbers.

Many early scholars felt that the numbers of the form 2^n − 1 were prime for all prime values of n, but in 1536 Hudalricus Regius showed that 2^11 − 1 = 2047 =

23 · 89 was not prime. The French monk Marin Mersenne (1588–1648) gave in

the preface to his Cogitata Physica-Mathematica (1644) a list of positive integers

n < 257 for which the numbers 2^n − 1 were prime. Several numbers in that list were

incorrect. By 1947 Mersenne’s range, n < 257, had been completely checked and it

was determined that the correct list was:

n = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107 and 127.

As of February 2014 there are 48 known Mersenne primes. The last one was

discovered in January 2013 by the Great Internet Mersenne Prime Search (GIMPS)

2 Leonhard Euler (1707–1783) was a Swiss mathematician who made enormous contributions in

ﬁelds as diverse as inﬁnitesimal calculus and graph theory. He introduced much of the modern

mathematical terminology and notation. [3] He is also renowned for his work in mechanics, ﬂuid

dynamics, optics, astronomy, and music theory.


project led by Dr. Curtis Cooper.3 The new prime number is 2^57885161 − 1; it has

17,425,170 digits. This is the largest known prime to date. We can check with GAP

if the number of digits of this prime was reported correctly:

gap> n:=57885161;;

gap> 2ˆn-1;

<integer 581...951 (17425170 digits)>

Exercises

1. Write a GAP program that calculates the 2007th prime p2007 . Calculate p2007 .

2. Write a GAP program that finds the smallest k for which

n = p1 p2 . . . pk + 1

is not prime, where pi denotes the ith prime.

3. Find all integers a ≠ 3 for which a − 3 is a divisor of a^3 − 17.

4. Prove that the set P of all primes that are greater than 2 is split into two disjoint

classes: primes of the form 4k + 1 and primes of the form 4k + 3. Similarly, P is

split into two other disjoint classes: primes of the form 6k + 1 and primes of the

form 6k + 5.

5. Prove that any prime of the form 3k + 1 is also of the form 6k + 1 (but for a

different k, of course).

6. GAP remembers all 168 primes not exceeding 1000. The command Primes[i];

gives you the ith prime. Using GAP:

(a) Create two lists of prime numbers, called Primes1 and Primes3, include in

the ﬁrst list all the primes p ≤ 1000 for which p = 4k + 1 and include in

the second list all the primes p ≤ 1000 for which p = 4k + 3.

(b) Output the number of primes in each list.

(c) Output the 32nd prime from the ﬁrst list and the 53rd prime from the second

list.

(d) Output the positions of 601 and 607 in their respective lists.

7. (a) Use GAP to list all primes up to 1000 representable in the form 6k + 5.

(b) Prove that there are inﬁnitely many primes representable in the form 6k + 5.

8. Give an alternative proof that the number of primes is inﬁnite along the following

lines:

3 See http://www.mersenne.org/various/57885161.htm.


• Given n, and assuming that p1, . . . , pr are the only primes, find an upper bound f (n) for the number of values not exceeding n that products of powers of p1, . . . , pr might assume.

• Show that f (n) grows more slowly than n for n sufﬁciently large.

1.1.3 Factoring Integers. The Sieve of Eratosthenes

None of the ideas we have learned up to now will help us to find the prime factorisation

of a particular integer n. Finding prime factorisations is not an easy task, and there

are no simple ways to do so. The theorem that we will prove in this section is of

some help since it tells us where to look for the smallest prime divisor of n.

Firstly, we have to deﬁne the following useful function.

Deﬁnition 1.1.3 Let x be a real number. By ⌊x⌋ we denote the largest integer n such that n ≤ x. The integer ⌊x⌋ is called the integer part of x or the ﬂoor of x.

Example 1.1.6 ⌊π⌋ = 3, ⌊√19⌋ = 4, ⌊−2.1⌋ = −3.

Theorem 1.1.4 The smallest prime divisor of a composite number n is less than or equal to ⌊√n⌋.

Proof We prove ﬁrst that n has a divisor which is greater than 1 but not greater than √n. As n is composite, we have n = d1·d2, where d1 > 1 and d2 > 1. If d1 > √n and d2 > √n, then

n = d1·d2 > (√n)^2 = n,

which is impossible. Suppose, then, that d1 ≤ √n. Any of the prime divisors of d1 will be less than or equal to √n. But every divisor of d1 is also a divisor of n, thus the smallest prime divisor p of n will satisfy the inequality p ≤ √n. Since p is an integer, p ≤ ⌊√n⌋. □

Now we may demonstrate a beautiful and efﬁcient method of listing all primes

up to x, called the Sieve of Eratosthenes.

Algorithm (The Sieve of Eratosthenes): To ﬁnd all the primes up to x begin by writing down all the integers from 2 to x in ascending order. The ﬁrst number on the list is 2. Leave it there and cross out all other multiples of 2. Then use the following iterative procedure. Let d be the next smallest number on the list that is not eliminated. Leave d on the list and, if d ≤ √x, cross out all other multiples of it. If d > √x, then stop. The prime numbers up to x are those which have not been crossed out.


For example, if we write the integers from 2 to 100 in a ten-by-ten square table, then at the end of the process our table will look like:

2 3 5 7

11 13 17 19

23 29

31 37

41 43 47

53 59

61 67

71 73 79

83 89

97

The numbers in this table are all the primes not exceeding 100. Please note that we had to cross out only multiples of the primes from the ﬁrst row, since √100 = 10.
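The crossing-out procedure can be tried out in a few lines of code. The following sketch is in Python rather than GAP (which this book uses for its computations); the function name `eratosthenes` is mine.

```python
def eratosthenes(x):
    """List all primes up to x with the Sieve of Eratosthenes."""
    crossed = [False] * (x + 1)           # crossed[k] is True once k is crossed out
    primes = []
    for d in range(2, x + 1):
        if not crossed[d]:                # d survived all crossings, so d is prime
            primes.append(d)
            for multiple in range(d * d, x + 1, d):
                crossed[multiple] = True  # cross out all other multiples of d
    return primes

print(len(eratosthenes(100)))             # 25 primes survive, as in the table above
```

Starting the inner loop at d·d implements the stopping rule of the algorithm: for d > √x the range is empty, so nothing more is crossed out.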

The simplest algorithm for factoring integers is Trial Division.

Algorithm (Trial Division): Suppose a sufﬁciently long list of primes is available. Given a positive integer n, divide it with remainder by all primes on the list which do not exceed √n, starting from 2. The ﬁrst prime which divides n (call this prime p1) will be the smallest prime divisor of n. In this case n is composite. Calculate n1 = n/p1 and repeat the procedure. If none of the primes which do not exceed √n divide n, then n is prime, and its prime factorisation is trivial.

Using the list of primes stored by GAP in the array Primes we can apply the Trial Division algorithm to factorise numbers not exceeding one million. In practice, it is virtually impossible to completely factor a large number of about 100 decimal digits with Trial Division alone unless it has small prime divisors. Trial Division is, however, very fast for ﬁnding small factors (up to about 10^6) of n.
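A Python sketch of Trial Division (the book works in GAP; here every integer, not only a stored list of primes, is used as a trial divisor, which does not change the result because the smallest divisor found is always prime):

```python
def trial_division(n):
    """Return the prime factorisation of n as a list of primes with repetition."""
    factors = []
    p = 2
    while p * p <= n:                 # trial divisors up to sqrt(n) suffice
        while n % p == 0:             # p is the smallest divisor of n, hence prime
            factors.append(p)
            n //= p                   # pass to n1 = n / p and repeat
        p += 1
    if n > 1:
        factors.append(n)             # whatever remains is prime
    return factors

print(trial_division(999313))         # [7, 142759], the number of Example 1.1.7
```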

It is important to know how many operations will be needed to factorise n. If we do not know how many operations are needed, it is impossible to estimate the time it would take to use the Trial Division algorithm in the worst possible case: the case in which small factors are absent.

Let π(x) denote the number of primes which do not exceed x. Because of the

irregular occurrence of the primes, we cannot expect a simple formula for π(x). The

following simple program calculates this number for x = 1000.

gap> n:=1000;;
gap> piofx:=0;;
gap> p:=2;;
gap> while p < n+1 do
> p:=NextPrimeInt(p);
> piofx:=piofx+1;
> od;
gap> piofx;
168

As we see there are 168 primes not exceeding 1000. GAP stores them in an array

Primes. For example, the command

gap> Primes[100];
541

shows that the 100th prime is 541.

One of the most impressive results in advanced number theory gives an asymptotic

approximation for π(x).

Theorem 1.1.5 (The Prime Number Theorem)

lim_{x→∞} π(x)·ln x / x = 1, (1.6)

where ln x is the natural logarithm, to base e.

The proof is beyond the scope of this book. The ﬁrst serious attempt towards proving this theorem (which was long conjectured to be true) was made by Chebyshev4

who proved (1848–1850) that if the limit exists at all, then it is necessarily equal to

one. The existence of the limit (1.6) was proved independently by Hadamard5 and

de la Vallée-Poussin6 with both papers appearing almost simultaneously in 1896.

Corollary 1.1.1 For a large positive integer n there exist approximately n/ln n primes among the numbers 1, 2, . . . , n. This can be expressed as

π(n) ∼ n / ln n, (1.7)

where ∼ means approximately equal for large n. (In Sect. 2.3 we will give it a precise meaning.)
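The quality of the approximation (1.7) can be checked numerically. A small Python sketch (the naive prime-counting helper `pi` is mine):

```python
import math

def pi(x):
    """The number of primes not exceeding x, counted naively by trial division."""
    count = 0
    for n in range(2, x + 1):
        if all(n % d != 0 for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count

for x in (100, 1000, 10000):
    print(x, pi(x), round(x / math.log(x)))
```

For x = 1000 this prints 168 against the estimate 145, the two values quoted in Example 1.1.7 below.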

4 Pafnutii Lvovich Chebyshev (1821–1894) was a Russian mathematician who is largely remem-

bered for his investigations in number theory. Chebyshev is also famous for the orthogonal polyno-

mials he invented. He had a strong interest in mechanics as well.

5 Jacques Salomon Hadamard (1865–1963) was a French mathematician whose most important

result is the prime number theorem which he proved in 1896. He worked on entire functions and

zeta functions and became famous for introducing Hadamard matrices and Hadamard transforms.

6 Charles Jean Gustave Nicolas Baron de la Vallée-Poussin (1866–1962) is best known for his

proof of the prime number theorem and his major work Cours d’Analyse. He was additionally known

for his writings about the zeta function, Lebesgue and Stieltjes integrals, conformal representation,

and algebraic and trigonometric series.


Example 1.1.7 Suppose n = 999313. Then ⌊√n⌋ = 999. Using (1.7) we approximate π(999) as 999/6.9 ≈ 145. The real value of π(999), as we know, is 168. The number 999 is too small for the approximation in (1.7) to be good.

So, if we try to ﬁnd a minimal prime divisor of n using Trial Division, then, in the worst case scenario, we might need to perform 168 divisions. However n = 7 · 142759, where the latter number is prime. So 7 will be discovered after four divisions only and factored out, but we will need to perform 74 additional divisions to prove that 142759 is prime by dividing 142759 by all primes smaller than or equal to ⌊√142759⌋ = 377.

The following two facts are also related to the distribution of primes. Both facts

are useful to know and easy to remember.

Theorem 1.1.6 (Bertrand's Postulate) For each positive integer n > 3 there is a prime p such that n < p < 2n − 2.

In 1845 Bertrand7 conjectured that there is at least one prime between n and 2n − 2 for every n > 3 and checked it for numbers up to at least 2 · 10^6. This conjecture, similar to one stated by Euler one hundred years earlier, was proved by Chebyshev in 1850.

Theorem 1.1.7 There are arbitrarily large gaps between consecutive primes.

Proof This follows from the fact that, for any positive integer n, all the numbers

n! + 2, n! + 3, . . . , n! + n

are composite: for each k with 2 ≤ k ≤ n the number n! + k is divisible by k, since

n! + k = k (n!/k + 1). □
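The construction in this proof is easy to check in code. A Python sketch (the helper names are mine):

```python
from math import factorial

def gap_block(n):
    """The n - 1 consecutive numbers n! + 2, ..., n! + n from the proof."""
    return [factorial(n) + k for k in range(2, n + 1)]

def is_composite(m):
    """True if m has a nontrivial divisor."""
    return any(m % d == 0 for d in range(2, int(m ** 0.5) + 1))

# Each n! + k is divisible by k for 2 <= k <= n, so all of them are composite.
assert all(is_composite(m) for m in gap_block(6))
print(gap_block(6))                   # [722, 723, 724, 725, 726]
```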

Exercises

1. (a) Use the Sieve of Eratosthenes to ﬁnd the prime numbers up to 210. Hence

calculate π(210) exactly.

(b) Calculate the estimate that the Prime Number Theorem gives for π(210) and

compare your result with the exact value of π(210) obtained in (a).

2. Convince yourself that the following program implements the Sieve of

Eratosthenes

7 Joseph Louis François Bertrand (1822–1900), born and died in Paris, was a professor at the

École Polytechnique and Collège de France. He was a member of the Paris Academy of Sciences

and was its permanent secretary for twenty-six years. Bertrand made a major contribution to group

theory and published many works on differential geometry and on probability theory.


n:=2*10^3;;

set:=Set([2..n]);;

p:=2;;

while p<RootInt(n)+1 do

k:=2;;

while k*p<n+1 do

RemoveSet(set,k*p);

k:=k+1;

od;

p:=NextPrimeInt(p);

od;

3. Professor Woodhead has compiled a list of all primes that are less than 10,000

and is very proud of himself. He checks that the number n = 123123137 does

not have any prime divisors in his list by dividing n by all of the primes that he

found.

(a) Can he claim that n is prime?

(b) Estimate the number of additional divisions that Professor Woodhead must

perform in order to be able to claim that n is prime.

4. A composite number n does not have prime divisors which are less than or equal to ∛n. Prove that it is a product of two primes.

5. Use Bertrand’s postulate to show that any integer greater than 6 is the sum of two

relatively prime integers each of which is greater than 1.

6. What would be the output for the following GAP program?

n:=10^6;

set:=Set([1..n]);

p:=3;

while p<n+1 do;

k:=1;

while k*p<n+1 do;

RemoveSet(set,k*p);

k:=k+1;

od;

p:=NextPrimeInt(p);

od;

set;

1.2 The Euclidean Algorithm

Suppose that the positive integer n has the prime factorisation

n = p1^α1 p2^α2 . . . pr^αr , (1.8)

where the pi are distinct primes and the αi are positive integers. How can we ﬁnd all divisors of n? Let d be a divisor of n. Then n = dm, for some m.

Since the prime factorisation of n is unique, d cannot have in its prime factorisation

a prime which is not among the primes p1 , p2 , . . . , pr . Also, a prime pi in the prime

factorisation of d cannot have an exponent greater than αi . Therefore

d = p1^β1 p2^β2 . . . pr^βr , 0 ≤ βi ≤ αi , i = 1, 2, . . . , r. (1.9)

Conversely, every number of this form is a divisor of n, and each exponent βi can independently take any of the αi + 1 values 0, 1, 2, . . . , αi . Thus the total number of divisors will be exactly the product

d(n) = (α1 + 1)(α2 + 1) . . . (αr + 1). (1.10) □

It is important to note that Eq. (1.10) does not give us a complete algorithm for

the calculation of d(n) as we need to run the factorisation algorithm ﬁrst. No direct

method of calculation is known.
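Formula (1.10) is easy to apply once a factorisation is available. A Python sketch (the book computes with GAP; the helper names here are mine):

```python
def prime_exponents(n):
    """Prime factorisation of n as a dictionary {p_i: alpha_i}."""
    exps, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps

def d(n):
    """Number of divisors of n via (1.10): the product of the (alpha_i + 1)."""
    result = 1
    for alpha in prime_exponents(n).values():
        result *= alpha + 1
    return result

print(d(60))    # 60 = 2^2 * 3 * 5, so d(60) = (2+1)(1+1)(1+1) = 12
```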

Deﬁnition 1.2.1 The numbers i·n, where i = 0, ±1, ±2, . . . , are called multiples of n.

Every multiple m of n has the form

m = k p1^γ1 p2^γ2 . . . pr^γr , γi ≥ αi , i = 1, 2, . . . , r,

where k has none of the primes p1 , p2 , . . . , pr in its prime factorisation. The number of multiples of n is inﬁnite.

Let a and b be two positive integers. If d is a divisor of a and also a divisor of b,

then we say that d is a common divisor of a and b. As there are only a ﬁnite number

of common divisors, there is the greatest common divisor, denoted by gcd(a, b). The

number m is said to be a common multiple of a and b if m is a multiple of a and also


a multiple of b. Among all common multiples there is a minimal one (Least Integer

Principle!). It is called the least common multiple and it is denoted by lcm(a, b).

In the decomposition (1.8), all exponents were positive. However, sometimes it

is convenient to allow some exponents to be 0 as in the formulation of the following

theorem.

Theorem 1.2.2 Let

a = p1^α1 p2^α2 . . . pr^αr , b = p1^β1 p2^β2 . . . pr^βr

be two arbitrary positive integers, where αi ≥ 0 and βi ≥ 0. (We could assume that a and b are expressed using the same primes p1 , p2 , . . . , pr because we allowed some exponents to be 0.) Then

gcd(a, b) = p1^min(α1,β1) p2^min(α2,β2) . . . pr^min(αr,βr) , (1.11)

and

lcm(a, b) = p1^max(α1,β1) p2^max(α2,β2) . . . pr^max(αr,βr) . (1.12)

Moreover,

gcd(a, b) · lcm(a, b) = a · b. (1.13)

Proof Formulas (1.11) and (1.12) follow from our description of common divi-

sors and common multiples. To prove (1.13) we have to notice that min(αi , βi ) +

max(αi , βi ) = αi + βi . �
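Formulas (1.11)–(1.13) translate directly into code. The following Python sketch (helper names mine) computes the gcd and lcm from the min and max of the exponents and checks the identity (1.13):

```python
def exponents(n):
    """Prime factorisation of n as a dictionary {p: exponent}."""
    e, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            e[p] = e.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        e[n] = e.get(n, 0) + 1
    return e

def gcd_lcm(a, b):
    """gcd and lcm via (1.11) and (1.12): min and max of the exponents."""
    ea, eb = exponents(a), exponents(b)
    g = l = 1
    for p in set(ea) | set(eb):
        g *= p ** min(ea.get(p, 0), eb.get(p, 0))
        l *= p ** max(ea.get(p, 0), eb.get(p, 0))
    return g, l

g, l = gcd_lcm(200, 567)
assert g == 1                # 200 = 2^3 * 5^2 and 567 = 3^4 * 7 share no prime
assert g * l == 200 * 567    # the identity (1.13)
```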

b = 84474819 = 3^5 · 11^2 · 13^2 · 17 = 2^0 · 3^5 · 11^2 · 13^2 · 17^1 .

Theorem 1.2.2 gives us an algorithm for calculating the greatest common divisor.

However, it depends on the factorisation algorithm, which is computationally difﬁcult

using existing methods. It is suspected but has not yet been proved that no easy

algorithms for prime factorisation exist. So it is desirable in any number theoretic

algorithm to avoid factorisation of the numbers involved. The algorithm given above

for ﬁnding the greatest common divisor cannot be used unless prime factorisation has

already been done. Fortunately, the greatest common divisor gcd(a, b) of numbers

a and b can be found without knowing the prime factorisations of a and b. Such

an algorithm will be presented below. It was known to Euclid; he could even have

been the ﬁrst to have discovered it. The algorithm is based on the following simple

observation.


Suppose that a = qb + r for some integers q and r. Then

gcd(a, b) = gcd(b, r).

Proof Suppose that d is a common divisor of a and b, so that a = a′d and b = b′d. Then r = a − qb = a′d − qb′d = (a′ − qb′)d and d is also a common divisor of b and r. Also, if d is a common divisor of b and r, then b = b′d, r = r′d and a = qb + r = qb′d + r′d = (qb′ + r′)d, whence d is a common divisor of a and b. □

Now to the algorithm. The idea of it is clear: start with the pair (a, b) for which

the greatest common divisor is sought, and replace it with a “smaller” pair with the

same greatest common divisor. Repeat the process (if necessary) until the greatest

common divisor is easily seen.

Theorem 1.2.3 (The Euclidean Algorithm) Let a and b be positive integers. We use the division algorithm several times to ﬁnd:

a = q1 b + r1 , 0 < r1 < b,
b = q2 r1 + r2 , 0 < r2 < r1 ,
r1 = q3 r2 + r3 , 0 < r3 < r2 ,
...
rs−2 = qs rs−1 + rs , 0 < rs < rs−1 ,
rs−1 = qs+1 rs .

Then the last nonzero remainder rs is the greatest common divisor of a and b:

gcd(a, b) = gcd(b, r1) = · · · = gcd(rs−1 , rs ) = rs . □

Example 1.2.2 Let a = 321, b = 843. Find the greatest common divisor gcd(a, b).

The Euclidean algorithm yields

843 = 2 · 321 + 201

321 = 1 · 201 + 120

201 = 1 · 120 + 81

120 = 1 · 81 + 39

81 = 2 · 39 + 3

39 = 13 · 3 + 0,

and therefore gcd(321, 843) = 3 and lcm(321, 843) = (321 · 843)/3 = 107 · 843 = 90201.
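The chain of divisions in Example 1.2.2 is exactly what the following Python loop performs (the book's own computations are in GAP):

```python
def euclid(a, b):
    """Greatest common divisor by the Euclidean algorithm."""
    while b != 0:
        a, b = b, a % b       # replace the pair (a, b) by (b, r)
    return a

g = euclid(321, 843)
print(g, 321 * 843 // g)      # prints: 3 90201
```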

Deﬁnition 1.2.2 If gcd(a, b) = 1, then the numbers a and b are said to be relatively prime (or coprime).

For example, the numbers 200 = 2^3 · 5^2 and 567 = 3^4 · 7 are coprime.

Exercises

1. How many divisors does the number 2^2 · 3^3 · 4^4 · 5^5 have? (No GAP, please.)

2. How many divisors does the number 123456789 have?

3. Find all common divisors of 10650 and 6750.

4. (a) Find the greatest common divisor and the least common multiple of m = 2^4 · 3^2 · 5^7 · 11^2 and n = 2^2 · 5^4 · 7^2 · 11^3 .

(b) Use GAP to check the identity lcm(m, n) · gcd(m, n) = m · n.

5. Find all positive integers n ≤ 10000 with exactly 33 distinct positive divisors.

6. Calculate d(d(246^246)), where d(n) is the number of divisors of n.

7. Show that gcd(a, b) = gcd(a, a − b).

8. Show that the fraction (8n + 13)/(13n + 21) is in lowest possible terms for every n ≥ 1.

9. Suppose two positive integers a and b are relatively prime.

(a) Prove that gcd(a^2, a + b) = 1.

(b) Suppose a + b and a^2 + b^2 are not relatively prime. Find the greatest common divisor of this pair and give an example of two such integers.

10. Show that any two distinct Fermat numbers are coprime. (Use Exercise 4 of Sect. 1.1.1.)

11. Use Fermat numbers to give an alternative proof that the number of primes is inﬁnite.


Given two integers a and b we can consider all their possible linear combinations

k1 a + k2 b, where k1 , k2 ∈ Z. Let us denote this set by <a, b>. We note that a and b

belong to this set since a = 1 · a + 0 · b and b = 0 · a + 1 · b. We also note that when

we add two numbers from <a, b>, even with some coefﬁcients, we always remain

in <a, b>. Indeed, suppose we have linear combinations k1 a + k2 b and k1′ a + k2′ b. Then

u(k1 a + k2 b) + v(k1′ a + k2′ b) = (uk1 + vk1′ )a + (uk2 + vk2′ )b,

which again belongs to <a, b>.

Analysing the chain of divisions with remainder in the formulation of Theo-

rem 1.2.3, we come to the conclusion that all remainders ri , i = 1, 2, . . . , s belong

to <a, b>. In particular, gcd(a, b) belongs to <a, b>. This is an important fact so

we formulate it as a theorem for further reference.


Theorem 1.2.4 Let a and b be positive integers. Then there exist integers m and n

such that

gcd(a, b) = ma + nb. (1.14)

The numbers m and n in (1.14) are not unique, moreover there exist inﬁnitely

many such pairs. However, sometimes, knowing even one pair of such numbers is

more important than knowing the greatest common divisor itself. One pair of numbers

m and n satisfying (1.14) can be easily obtained from the Euclidean algorithm by back

substitution. The following theorem provides us with a convenient way of calculating

them. It also gives an alternative proof of the existence of m and n based on Linear

Algebra.

Theorem 1.2.5 (The Extended Euclidean Algorithm) Let us write the following matrix with two rows R1 , R2 , and three columns C1 , C2 , C3 :

               ⎡ R1 ⎤   ⎡ a 1 0 ⎤
[C1 C2 C3 ] =  ⎣ R2 ⎦ = ⎣ b 0 1 ⎦ .

Then, using the quotients q1 , q2 , . . . produced by the Euclidean algorithm, perform the row operations R3 := R1 − q1 R2 , R4 := R2 − q2 R3 , . . . , each time creating a new row, so as to obtain:

                  ⎡ a    1    0         ⎤
                  ⎢ b    0    1         ⎥
                  ⎢ r1   1    −q1       ⎥
[C1′ C2′ C3′ ] =  ⎢ r2   −q2  1 + q1 q2 ⎥ .
                  ⎢ ...                 ⎥
                  ⎣ rs   m    n         ⎦

Then rs = gcd(a, b) and gcd(a, b) = ma + nb.

Proof Note that C1 = aC2 + bC3 . In Linear Algebra you have learned that elementary row operations do not change linear relationships between columns. Since the new rows were obtained by means of elementary row operations on the existing rows, the relationships between the columns C1′ , C2′ , C3′ of the resulting matrix must be exactly the same as those between the columns C1 , C2 , C3 (see Sect. 10.1 of the Appendix for justiﬁcation). Thus we conclude that C1′ = aC2′ + bC3′ . In particular, rs = ma + nb. □

Example 1.2.3 Let a = 321, b = 843. Find a linear presentation of the greatest

common divisor in the form gcd(a, b) = ma + nb.

The Euclidean algorithm on these numbers was performed in Example 1.2.2 and

we know that gcd(321, 843) = 3 and all the quotients obtained at each division. The

Extended Euclidean algorithm yields


321    1    0 |
843    0    1 | 0
321    1    0 | 2
201   −2    1 | 1
120    3   −1 | 1
 81   −5    2 | 1
 39    8   −3 | 2
  3  −21    8 | 13

where for convenience of performing row operations the quotients are placed on

the right of the bar. Thus we obtain the linear presentation gcd(321, 843) = 3 =

(−21) · 321 + 8 · 843. So m = −21 and n = 8.
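The row bookkeeping of Theorem 1.2.5 can be carried out programmatically. In the following Python sketch each row (r, m, n) keeps the invariant r = m·a + n·b, exactly like the rows of the matrix above:

```python
def extended_euclid(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b."""
    r0, m0, n0 = a, 1, 0              # the row (a, 1, 0)
    r1, m1, n1 = b, 0, 1              # the row (b, 0, 1)
    while r1 != 0:
        q = r0 // r1                  # next quotient of the Euclidean algorithm
        r0, m0, n0, r1, m1, n1 = (r1, m1, n1,
                                  r0 - q * r1, m0 - q * m1, n0 - q * n1)
    return r0, m0, n0

print(extended_euclid(321, 843))      # (3, -21, 8), as in Example 1.2.3
```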

The properties of relatively prime numbers gathered in the following are often

used.

Lemma 1.2.1 Let a and b be relatively prime positive integers. Then:

(a) a and b do not have common primes in their prime factorisations;

(b) if c is a multiple of a and c is also a multiple of b, then c is a multiple of ab;

(c) if ac is a multiple of b, then c is a multiple of b;

(d) there exist integers m, n such that ma + nb = 1.

Proof Let

a = p1^α1 p2^α2 . . . pr^αr , b = p1^β1 p2^β2 . . . pr^βr

be prime factorisations of a and b in which all primes dividing either number appear (so some exponents may be 0). Then by Theorem 1.2.2

gcd(a, b) = p1^min(α1,β1) p2^min(α2,β2) . . . pr^min(αr,βr) = 1,

which implies min(αi , βi ) = 0 for all i = 1, 2, . . . , r . This means that either the

prime pi does not enter the prime factorisation of a or it does not enter the prime

factorisation of b. Thus a and b do not have primes in common. This proves (a).

Let us prove (b). As we know from (a) the numbers a and b do not have primes

in common in their prime factorisations. Hence

a = p1^α1 p2^α2 . . . pr^αr , b = q1^β1 q2^β2 . . . qs^βs ,

where no pi coincides with any qj . Since c is a multiple of both a and b, we can write

c = p1^α1 p2^α2 . . . pr^αr k = q1^β1 q2^β2 . . . qs^βs m.

Since the prime factorisation is unique, k must be divisible by q1^β1 q2^β2 . . . qs^βs , which is b, and m must be divisible by p1^α1 p2^α2 . . . pr^αr , which is a. As a result, c is divisible by ab, which proves (b).


Let us prove (c). Suppose that ac is a multiple of b, say ac = bd. Then

ac = p1^α1 p2^α2 . . . pr^αr c = bd = q1^β1 q2^β2 . . . qs^βs d.

Due to the uniqueness of the prime factorisation of ac, the number c must be divisible by q1^β1 q2^β2 . . . qs^βs , which is b.

Now (d) follows from Theorem 1.2.5. □

The following result is extremely important. Its author is not known exactly but

it could be Sun Tzu (or Sun Zi)8 in whose book it was ﬁrst mentioned.

Theorem 1.2.6 (The Chinese Remainder Theorem) Let a and b be two relatively

prime numbers, 0 ≤ r < a and 0 ≤ s < b. Then there exists a unique number N such that 0 ≤ N < ab and

r = N mod a, s = N mod b. (1.15)

Proof Let us prove, ﬁrst, that there exists at most one integer N with the conditions required. Assume, on the contrary, that for two integers N1 and N2 we have 0 ≤ N1 < ab, 0 ≤ N2 < ab and

r = N1 mod a = N2 mod a, s = N1 mod b = N2 mod b.

Without loss of generality let us assume that N1 ≥ N2 . Then the number M = N1 − N2 satisﬁes 0 ≤ M < ab and is a multiple of both a and b. Since a and b are relatively prime, M is a multiple of ab. As 0 ≤ M < ab, this gives M = 0 and N1 = N2 .

Now we will ﬁnd an integer N such that r = N mod a and s = N mod b,

ignoring the condition 0 ≤ N < ab. By Theorem 1.2.4 there are integers m, n such

that gcd(a, b) = 1 = ma + nb. Multiplying this equation by r − s we get the equation

r − s = (r − s)ma + (r − s)nb = m′a + n′b,

where m′ = (r − s)m and n′ = (r − s)n. Now put

N = r − m′a = s + n′b.

8 Sun Tzu (3rd–5th century AD) (or Sun Zi) was a Chinese mathematician and astronomer. He

investigated Diophantine equations. He authored “Sun Tzu’s Calculation Classic”, which contained,

among other things, the Chinese remainder theorem.


It clearly satisﬁes condition (1.15). If N does not satisfy 0 ≤ N < ab, we divide

N by ab with remainder. Let N = q · ab + N1 , where N1 is the remainder. Then

0 ≤ N1 < ab and N1 satisﬁes (1.15) since N1 has the same remainder as N on

division by a and also by b. The theorem is proved. �

Example 1.2.4 Let us ﬁnd the number N such that 0 ≤ N < 991 · 441 and 5 = N mod 991, 8 = N mod 441. (The moduli 991 and 441 are relatively prime.) The Extended Euclidean Algorithm yields

991    1     0 |
441    0     1 | 2
109    1    −2 | 4
  5   −4     9 | 21
  4   85  −191 | 1
  1  −89   200 | 4

thus yielding 1 = (−89) · 991 + 200 · 441. We may write 8 − 5 = 3 =

(−267) · 991 + 600 · 441 and obtain the number N = −264592 = 8 − 600 · 441 =

5 + (−267) · 991, which satisﬁes all the requirements apart from being between

0 and 437031 = 991 · 441. We divide N by 437031 with remainder. We have

−264592 = (−1) · 437031 + 172439, and the remainder N1 = 172439 will be the

number required.
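The whole computation of this example follows the proof of Theorem 1.2.6 mechanically, so it can be scripted. A Python sketch (the function names are mine):

```python
def extended_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b."""
    r0, m0, n0, r1, m1, n1 = a, 1, 0, b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, m0, n0, r1, m1, n1 = r1, m1, n1, r0 - q * r1, m0 - q * m1, n0 - q * n1
    return r0, m0, n0

def crt(r, a, s, b):
    """The unique 0 <= N < a*b with N mod a = r and N mod b = s (gcd(a,b) = 1)."""
    g, m, n = extended_gcd(a, b)      # 1 = m*a + n*b
    N = r - (r - s) * m * a           # then N mod a = r and N mod b = s
    return N % (a * b)                # reduce into the range 0 <= N < ab

print(crt(5, 991, 8, 441))            # 172439, the number found above
```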

Exercises

1. Use the Extended Euclidean Algorithm to ﬁnd the greatest common divisor d of

3773 and 3596 and ﬁnd any integers x and y such that d = 3773x + 3596y.

2. Using the Extended Euclidean Algorithm, ﬁnd at least one pair of integers (x, y)

satisfying 1840x +1995y = 5, and at least three pairs of integers (z, w) satisfying

1840z + 1995w = −10.

3. Let a, b, c and d be non-negative integers with c > 1 and d > 1. Suppose that

there exists an integer N such that

4. (a) Find any integer y such that

(b) Find the unique integer x such that 0 ≤ x < 3550 and


(b) Let a and b be non-zero integers. Describe the set of integers c for which

there exist integers x and y satisfying the equation ax + by = c.

1.3 Fermat's Little Theorem and Its Generalisations

Deﬁnition 1.3.1 Let n be a positive integer. The number of positive integers not exceeding n and relatively prime to n is denoted by φ(n). This function is called Euler's φ-function or Euler's totient function.

Let us denote by Zn the set {0, 1, 2, . . . , n−1} and by Z∗n the set of those positive

numbers from Zn that are relatively prime to n. Then φ(n) is the number of elements

of Z∗n , i.e., φ(n) = |Z∗n |.

Example 1.3.1 Let n = 20. Then Z∗20 = {1, 3, 7, 9, 11, 13, 17, 19} and φ(20) = 8.

Lemma 1.3.1 If n = p^k, where p is prime, then φ(n) = p^k − p^(k−1) = p^k (1 − 1/p).

Proof It is easy to list all numbers in Zn that are not relatively prime to p^k: they are 0 together with the multiples 1·p, 2·p, 3·p, . . . , (p^(k−1) − 1)·p of p. The nonzero ones number exactly p^(k−1) − 1. To obtain Z∗n we have to remove from Zn all these p^(k−1) − 1 numbers and also 0. Therefore Z∗n will contain p^k − (p^(k−1) − 1) − 1 = p^k − p^(k−1) numbers. □

φ(n) is multiplicative in the following sense.

Theorem 1.3.1 Let m and n be any two relatively prime positive integers. Then

φ(mn) = φ(m)φ(n).

Proof Let Z∗m = {r1 , r2 , . . . , rφ(m) } and Z∗n = {s1 , s2 , . . . , sφ(n) }. Let us consider

an arbitrary pair (ri , s j ) of numbers, one from each of these sets. By the Chinese

Remainder Theorem there exists a unique positive integer Ni j such that 0 ≤ Ni j <

mn and

ri = Nij mod m, sj = Nij mod n,

that is,

Nij = am + ri , Nij = bn + sj (1.17)

for some integers a and b.


What is important is that Nij is relatively prime to m and also relatively prime to n: by (1.17), gcd(Nij , m) = gcd(m, ri ) = 1 and gcd(Nij , n) = gcd(n, sj ) = 1. Since m and n are relatively prime, Nij is relatively prime to mn, i.e., Nij ∈ Z∗mn . Clearly, different pairs (i, j) ≠ (k, l) yield different numbers, that is Nij ≠ Nkl for (i, j) ≠ (k, l).

We note that there are φ(m)φ(n) of the numbers Ni j , exactly as many as there are

pairs of the form (ri , s j ). This shows that φ(m)φ(n) ≤ φ(mn).

Suppose now that a number N ∈ Zmn is different from Nij for all i and j. Consider

r = N mod m, s = N mod n,

where either r does not belong to Z∗m or s does not belong to Z∗n (otherwise N would coincide with one of the Nij ). Assuming the

former, we get gcd(r, m) > 1. But then gcd(N , m) = gcd(m, r ) > 1, in which case

gcd(N , mn) > 1 too. Thus N does not belong to Z∗mn . This shows that the numbers

Ni j —and only these numbers—form Z∗mn . Therefore φ(mn) = φ(m)φ(n). �

This yields a general formula for Euler's function: if n = p1^α1 p2^α2 . . . pr^αr is the prime factorisation of n, then

φ(n) = n (1 − 1/p1)(1 − 1/p2) . . . (1 − 1/pr).

Proof We use Lemma 1.3.1 and Theorem 1.3.1 to compute φ(n). Repeatedly applying Theorem 1.3.1 we get

φ(n) = φ(p1^α1) φ(p2^α2) . . . φ(pr^αr).

Applying Lemma 1.3.1 to each factor,

φ(n) = p1^α1 (1 − 1/p1) · p2^α2 (1 − 1/p2) . . . pr^αr (1 − 1/pr)
     = p1^α1 p2^α2 . . . pr^αr · (1 − 1/p1)(1 − 1/p2) . . . (1 − 1/pr)
     = n (1 − 1/p1)(1 − 1/p2) . . . (1 − 1/pr),

as required. □

Example 1.3.2 φ(264) = φ(2^3 · 3 · 11) = 264 · (1/2) · (2/3) · (10/11) = 80. We also have φ(269) = 268 as 269 is prime.
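Euler's function is straightforward to compute from the product formula once n is factorised. A Python sketch (not the book's GAP; the function name `phi` is mine):

```python
def phi(n):
    """Euler's totient, via phi(n) = n * product of (1 - 1/p) over primes p | n."""
    result, p, m = n, 2, n
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p     # multiply result by (1 - 1/p)
        p += 1
    if m > 1:                         # a single prime factor above sqrt remains
        result -= result // m
    return result

print(phi(264), phi(269))             # 80 268, matching Example 1.3.2
```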


Corollary 1.3.1 If n = pq, where p and q are primes, then φ(n) = ( p−1)(q −1) =

pq − p − q + 1.

There are no known methods for computing φ(n) in situations where the prime

factorisation of n is not known. If n is so big that modern computers cannot factorise

it, you can publish n and keep φ(n) secret.

Exercises

1. Compute φ(125), φ(180) and φ(1001).

2. Factor n = 4386607, which is a product of two primes, given φ(n) = 4382136.

3. Find m = p^2 q^2 , given that p and q are primes and φ(m) = 11424.

4. Find the remainder of 2^(2^2013) on division by 5.

5. Using Fermat's Little Theorem ﬁnd the remainder on dividing by 7 the number 333^555 + 555^333 .

the result consistent with the hypothesis that n is prime?

7. Let p > 2 be a prime. Prove that all prime divisors of 2^p − 1 have the form 2kp + 1.

Deﬁnition 1.3.2 Let a and b be integers and m be a positive integer. We say that

a is congruent to b modulo m and write a ≡ b mod m if a and b have the same

remainder on dividing by m, that is a mod m = b mod m.

For example, 41 ≡ 80 mod 13 since the numbers 41 and 80 both have remainder

2 when divided by 13. Also, 41 ≡ −37 mod 13. When a and b are not congruent we write a ≢ b mod m. For example, 41 ≢ 7 mod 13 because 41 has remainder 2 when divided by 13, and 7 has remainder 7.

Lemma 1.3.2 (Criterion) Let a and b be two integers and m be a positive integer.

Then a ≡ b mod m, if and only if a − b is divisible by m.

Proof Let a = q1 m + r1 and b = q2 m + r2 , where 0 ≤ r1 , r2 < m. Then a − b = (q1 − q2 )m + (r1 − r2 ) with −m < r1 − r2 < m, so a − b is divisible by m if and only if r1 − r2 is divisible by m. But this can happen if and only if r1 − r2 = 0, or r1 = r2 , which is the same as a ≡ b mod m. □


Lemma 1.3.3 Let a and b be two integers and m be a positive integer. Then

(a) if a ≡ b mod m and c ≡ d mod m, then a + c ≡ b + d mod m;

(b) if a ≡ b mod m and c ≡ d mod m, then ac ≡ bd mod m;

(c) if a ≡ b mod m and n is a positive integer, then a n ≡ bn mod m;

(d) if ac ≡ bc mod m and c is relatively prime to m, then a ≡ b mod m.

Proof (b) If a ≡ b mod m and c ≡ d mod m, then m | (a − b) and m | (c − d), i.e.,

a − b = im and c − d = jm for some integers i, j. Then

ac −bd = (ac −bc)+(bc −bd) = (a −b)c +b(c −d) = icm + jbm = (ic + jb)m,

whence ac ≡ bd mod m.

(c) Follows immediately from (b).

(d) Suppose that ac ≡ bc mod m and gcd(c, m) = 1. Then, by the criterion,

(a − b)c = ac − bc is a multiple of m. As gcd(c, m) = 1, by Lemma 1.2.1(c) a − b

is a multiple of m, and by the criterion a ≡ b mod m. �

Theorem (Fermat's Little Theorem) Let p be a prime. If an integer a is not divisible by p, then a^(p−1) ≡ 1 mod p. Also a^p ≡ a mod p for all a.

Proof Consider the numbers a, 2a, 3a, . . . , (p − 1)a. They all have different remainders on dividing by p. Indeed, suppose that for some

1 ≤ i < j ≤ p−1 we have ia ≡ ja mod p. Then by Lemma 1.3.3(d) a can be

canceled and i ≡ j mod p, which is impossible. Therefore these remainders are

1, 2, . . . , p − 1 and by repeated application of Lemma 1.3.3(b) we have

a · 2a · · · (p − 1)a ≡ 1 · 2 · · · (p − 1) mod p,

which is

(p − 1)! · a^(p−1) ≡ (p − 1)! mod p.

Since (p − 1)! is relatively prime to p, it can be canceled by Lemma 1.3.3(d), and we get a^(p−1) ≡ 1 mod p. When a is relatively prime to p, the last statement follows from the ﬁrst one. If a is a multiple of p the last statement is also clear. □

Example 1.3.3 Let us ﬁnd 328^2013 mod 7. Firstly we note that 6 = 328 mod 7 and using Lemma 1.3.3(c) we ﬁnd that 328^2013 mod 7 = 6^2013 mod 7. Now we have to reduce 2013. We can do this using

Fermat’s Little Theorem. Since for all a relatively prime to 7 we have a 6 ≡ 1 mod 7

we can replace 2013 with its remainder on division by 6. Since 3 = 2013 mod 6 we

obtain

3282013 mod 7 = 62013 mod 7 = 63 mod 7 = 6.
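Python's built-in three-argument pow performs modular exponentiation, so the computation above can be verified directly:

```python
# 328 = 46*7 + 6, so 328 is congruent to 6 (mod 7); and 2013 = 335*6 + 3, so by
# Fermat's Little Theorem 328^2013 is congruent to 6^2013 and to 6^3 = 216 (mod 7).
assert 328 % 7 == 6 and 2013 % 6 == 3
assert pow(6, 3, 7) == 6
assert pow(328, 2013, 7) == 6
print("all checks pass")
```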


Fermat’s Little Theorem is a powerful (but not perfect) tool for checking primality.

Let

p := 2074722246773485207821695222107608587480996474721117292752992589912196684750549658310084416732550077.

Then the GAP session

gap> PowerMod(3,p-1,p);

1

gap> q:=p^2;;

gap> PowerMod(3,q-1,q)=1;

false

shows that 3^(p−1) ≡ 1 mod p, but for q = p^2 we have 3^(q−1) ≢ 1 mod q, thus revealing the compositeness of q. We will discuss primality checking thoroughly in Sect. 2.4.3.

Despite its usefulness, Fermat’s Little Theorem has limited applicability since the

modulus p must be a prime. The following theorem generalises it to an arbitrary

positive integer n. It will be very important in cryptographic applications.

Theorem (Euler's Theorem) Let n be a positive integer and let a be relatively prime to n. Then

a^φ(n) ≡ 1 mod n.

Proof Let Z∗n = {z1 , z2 , . . . , zφ(n) }. Both zi and a are relatively prime to n, therefore zi a is also relatively prime to

n. Suppose that ri = z i a mod n, i.e., ri is the remainder on dividing z i a by n.

Since gcd(z i a, n) = gcd(ri , n), one has ri ∈ Z∗n . These remainders are all different.

Indeed, suppose that ri = rj for some 1 ≤ i < j ≤ φ(n). Then zi a ≡ zj a mod n. By

Lemma 1.3.3(d) a can be canceled and we get z i ≡ z j mod n, which is impossible.

Therefore the remainders r1 , r2 , . . . , rφ(n) coincide with z1 , z2 , . . . , zφ(n) , apart from the order in which they are listed. Thus

(z1 a)(z2 a) · · · (zφ(n) a) ≡ z1 z2 · · · zφ(n) mod n,

which is

Z · a^φ(n) ≡ Z mod n,

where Z = z1 z2 · · · zφ(n) . The number Z is relatively prime to n, so it can be canceled by Lemma 1.3.3(d), and we get a^φ(n) ≡ 1 mod n. □


Example 1.3.4 Using Euler's Theorem compute the last decimal digit (units digit) of the number 3^2007.

Since the last decimal digit of 3^2007 is equal to 3^2007 mod 10, we have to calculate this remainder. As gcd(3, 10) = 1 and φ(10) = 4 we have 3^4 ≡ 1 mod 10. As 3 = 2007 mod 4 we obtain

3^2007 mod 10 = 3^3 mod 10 = 7.
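Again this is immediate to confirm with modular exponentiation in Python:

```python
# phi(10) = 4 and gcd(3, 10) = 1, so the exponent 2007 may be reduced mod 4.
assert 2007 % 4 == 3
assert pow(3, 3, 10) == 7
assert pow(3, 2007, 10) == 7    # the last decimal digit of 3^2007 is 7
print("last digit:", pow(3, 2007, 10))
```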

Exercises

1. Show that:

(a) Both sides of the congruence and its modulus can be simultaneously divided

by a common positive divisor.

(b) If a congruence holds modulo m, then it also holds modulo d, where d is an

arbitrary divisor of m.

(c) If a congruence holds for moduli m 1 and m 2 , then it also holds modulo

lcm(m 1 , m 2 ).

2. Without using mathematical induction show that 722n+2 − 472n + 282n−1 is

divisible by 25 for any n ≥ 1.

3. Find all positive integer solutions x, y to the equation φ(3x 5 y ) = 600, where φ

is the Euler totient function.

4. List all positive integers a such that 0 ≤ a ≤ 242 for which the congruence

x 162 ≡ a mod 243 has a solution.

5. Without resorting to FactorsInt command, factorise n if it is known that it is

a product of two primes and that φ(n) = 3308580.

1.4 The Ring of Integers Modulo n. The Field Zp

Birds of eternity sing there.
And you too find a ring in your heart.

We shall now turn Zn into an algebraic structure by deﬁning two algebraic operations on it. First, given a, b ∈ Zn, we deﬁne a new addition a ⊕ b by

a ⊕ b := a + b mod n. (1.18)

The result of this operation is a remainder on division by n, therefore it is always in Zn.


Theorem 1.4.1 The new addition modulo n satisﬁes the following properties:

1. It is commutative, a ⊕ b = b ⊕ a, for all a, b ∈ Zn .

2. It is associative, a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c, for all a, b, c ∈ Zn .

3. Element 0 (zero) is the unique element such that a ⊕ 0 = 0 ⊕ a = a, for every

a ∈ Zn .

4. For each a ∈ Zn there exists a unique element (−a) := n − a ∈ Zn such that

a ⊕ (−a) = (−a) ⊕ a = 0.

Proof Only the second property is not completely obvious. We prove it by noting

that a ⊕ b ≡ a + b mod n. Then by Lemma 1.3.3(a)

(a ⊕ b) ⊕ c ≡ (a ⊕ b) + c ≡ (a + b) + c mod n

and

a ⊕ (b ⊕ c) ≡ a ⊕ (b + c) ≡ a + (b + c) mod n,

so (a ⊕ b) ⊕ c ≡ a ⊕ (b ⊕ c) mod n. Both of these numbers belong to Zn , hence the difference between them is less than n in absolute value. Therefore (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c). □

Deﬁnition 1.4.1 An algebraic system < G, + > which consists of a set G together

with an algebraic operation + deﬁned on it is said to be a commutative group if the

following axioms are satisﬁed:

CG1 The operation is commutative, a + b = b + a, for all a, b ∈ G.

CG2 The operation is associative, a + (b + c) = (a + b) + c, for all a, b, c ∈ G.

CG3 There exists a unique element 0 such that a + 0 = 0 + a = a, for all a ∈ G.

CG4 For every element a ∈ G there exists a unique element −a such that a +

(−a) = (−a) + a = 0, for all a ∈ G.

Thus we can reformulate Theorem 1.4.1 by saying that < Zn , ⊕ > is a commu-

tative group.

In particular, in Zn the equation a ⊕ x = b always has a unique solution, namely

x = (−a) ⊕ b.

Proof Suppose that a ⊕ x = b, where x is a solution. Add (−a) to both sides of the

equation. We get

(−a) ⊕ (a ⊕ x) = (−a) ⊕ b,

from where, by using properties 1–4, we can ﬁnd that (−a)⊕(a ⊕ x) = ((−a)⊕a)⊕

x = 0 ⊕ x = x, hence x = (−a) ⊕ b. Similar computations show that x = (−a) ⊕ b

is indeed a solution. �

1.4 The Ring of Integers Modulo n. The Field Z p 29

Example 1.4.2 In Z26 we have 8 ⊕ 13 = 21.

Now, given a, b ∈ Zn , we define a new multiplication a ⊙ b by

a ⊙ b =df ab mod n. (1.19)

Again, the result is a remainder of division by n, so it is always in Zn .

Example 1.4.3 In Z12 the following identities hold: 5 ⊙ 5 = 1, 2 ⊙ 4 = 8, 4 ⊙ 6 = 0.

Theorem 1.4.2 The new multiplication modulo n satisfies the following properties:

5. It is commutative, a ⊙ b = b ⊙ a, for all a, b ∈ Zn .

6. It is associative, a ⊙ (b ⊙ c) = (a ⊙ b) ⊙ c, for all a, b, c ∈ Zn .

7. It is distributive relative to the addition, a ⊙ (b ⊕ c) = (a ⊙ b) ⊕ (a ⊙ c) and (a ⊕ b) ⊙ c = (a ⊙ c) ⊕ (b ⊙ c), for all a, b, c ∈ Zn .

8. There is a unique element 1 in Zn such that a ⊙ 1 = 1 ⊙ a = a, for every a ∈ Zn .

Proof Statements 5 and 8 are clear. The other two can be proved as in Theorem 1.4.1. □

Properties 1–8 in algebraic terms mean that Zn together with the operations ⊕ and ⊙ is a commutative ring with a unity element 1, a structure which is defined by the following set of axioms.

Deﬁnition 1.4.2 An algebraic system < R, +, · > which consists of a set R together

with two algebraic operations + and · deﬁned on it is said to be a commutative ring

if the following axioms are satisﬁed:

CR1 < R, + > is a commutative group.

CR2 The operation · is commutative, a · b = b · a, for all a, b ∈ R.

CR3 The operation · is associative, a · (b · c) = (a · b) · c, for all a, b, c ∈ R.

CR4 There exists a unique element 1 ∈ R such that a · 1 = 1 · a = a, for all a ∈ R.

CR5 The distributive law holds, that is, a · (b + c) = a · b + a · c, for all a, b, c ∈ R.

Example 1.4.4 Other commutative rings include the ring of polynomials Z[x] with

integer coefﬁcients or else with rational or real coefﬁcients. The set of all n × n

matrices over the integers Zn×n is also a ring but not commutative since axiom CR2

is not satisﬁed.

Deﬁnition 1.4.3 An element a of a ring R is called invertible if there exists an

element b in R such that a · b = b · a = 1. An element b in this case is called a

multiplicative inverse of a.

Lemma 1.4.1 If a ∈ Zn possesses a multiplicative inverse, then this inverse is

unique.


Proof Suppose that b and c are both multiplicative inverses of a, that is, a ⊙ b = b ⊙ a = 1 and a ⊙ c = c ⊙ a = 1. Then

b ⊙ (a ⊙ c) = b ⊙ 1 = b,

and

(b ⊙ a) ⊙ c = 1 ⊙ c = c,

and by associativity the two left-hand sides coincide, hence b = c. □

If a multiplicative inverse of a exists, it is denoted a−1 .

Theorem 1.4.3 All elements from Z∗n are invertible in Zn .

Proof Let a ∈ Z∗n . Then gcd(a, n) = 1, and we can write a linear presentation of this greatest common divisor 1 = ua + vn. Let us divide u by n with remainder w. We have u = qn + w, where 0 ≤ w < n, and we substitute qn + w in place of u:

1 = (qn + w)a + vn = wa + (qa + v)n.

Hence wa ≡ 1 (mod n), so w ⊙ a = a ⊙ w = 1 in Zn and w is the multiplicative inverse of a. □

Example 1.4.5 Find 11−1 in Z26 and solve 11 ⊙ x ⊕ 5 = 3.

Solution. We use the Extended Euclidean algorithm:

    26    1    0
    11    0    1    2
     4    1   −2    2
     3   −2    5    1
     1    3   −7    3

Each row contains rk , uk , vk with rk = uk · 26 + vk · 11; the last column holds the quotient used at that step. The last row gives 1 = 3 · 26 − 7 · 11, hence 11−1 = −7 mod 26 = 19. Now 11 ⊙ x = 3 ⊕ (−5) = 3 ⊕ 21 = 24. Finally, x = 11−1 ⊙ 24 = 19 ⊙ 24 = 14. □
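The computation in Example 1.4.5 can be automated. The following Python sketch (ours; function names are not from the book) implements the extended Euclidean algorithm and the resulting modular inverse.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and g = u*a + v*b."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    # gcd(a, b) = gcd(b, a mod b); back-substitute the coefficients.
    return g, v, u - (a // b) * v

def inverse_mod(a, n):
    """Multiplicative inverse of a in Z_n; exists iff gcd(a, n) = 1."""
    g, u, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError(f"{a} is not invertible modulo {n}")
    return u % n
```

Here `extended_gcd(26, 11)` returns `(1, 3, -7)`, matching the linear presentation 1 = 3 · 26 − 7 · 11, and `inverse_mod(11, 26)` returns 19, so the solution of 11 ⊙ x ⊕ 5 = 3 is `(19 * 24) % 26 == 14`.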

Deﬁnition 1.4.4 A nonzero element a ∈ Zn is called a zero divisor if there exists

another nonzero element b ∈ Zn such that a b = 0.

Example 1.4.6 4 ⊙ 5 = 0 in Z10 .

Lemma 1.4.2 A divisor of zero in Zn is never invertible.

Proof Suppose that a ⊙ b = 0, a ≠ 0, b ≠ 0 and a is invertible, that is, a−1 exists. Then we have a−1 ⊙ (a ⊙ b) = a−1 ⊙ 0 = 0. The left-hand side is equal to a−1 ⊙ (a ⊙ b) = (a−1 ⊙ a) ⊙ b = 1 ⊙ b = b, hence b = 0, a contradiction. □


Theorem 1.4.4 Every nonzero element of Zn which does not belong to Z∗n is a zero divisor in Zn . All elements from Z∗n are invertible in Zn and all other elements are not invertible.

Proof Let a be a nonzero element with d = gcd(a, n) > 1 and set m = n/d. Then m is nonzero and am = (a/d)n is divisible by n. Thus a ⊙ m = 0 and a is a zero divisor. Thus, in Zn , aside from Z∗n , we have the zero element and the zero divisors, and by Lemma 1.4.2 the zero divisors are not invertible. On the other hand, by Theorem 1.4.3 all elements of Z∗n are invertible. □

Hence, depending on n, the following property may or may not be true for Zn :

9. For every nonzero a ∈ Zn there is a unique element a−1 ∈ Zn such that a ⊙ a−1 = a−1 ⊙ a = 1.
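This three-way split of Zn into zero, invertible elements and zero divisors can be observed directly. A small Python sketch (ours, not from the book):

```python
from math import gcd

def classify(n):
    """Split the nonzero elements of Z_n into units and zero divisors."""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    zero_divisors = [a for a in range(1, n)
                     if any((a * b) % n == 0 for b in range(1, n))]
    return units, zero_divisors
```

For n = 12 this returns units `[1, 5, 7, 11]` and zero divisors `[2, 3, 4, 6, 8, 9, 10]`; together with 0 they exhaust Z12 , as Theorem 1.4.4 predicts.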

Deﬁnition 1.4.5 A commutative ring < R, +, · > is called a ﬁeld if the following

axiom is satisﬁed

F1 For every nonzero a ∈ R there is a unique element a −1 ∈ R such that a · a −1 =

a −1 · a = 1.

Zn is a field if and only if n is prime. Indeed, if n = p is prime, then Z∗p = Z p \ {0}, and by Theorem 1.4.3 all elements of Z∗p are invertible. Hence Z p is a field. Conversely, suppose Zn is a field. Since all non-zero elements of any field are invertible, by Theorem 1.4.4 Z∗n = Zn \ {0}, that is, all positive integers smaller than n are relatively prime to n. This is possible only when n is prime. □

Exercises

1. Prove that in any commutative ring R a divisor of zero is not invertible. (Hint:

prove ﬁrst that for any a ∈ R we have a · 0 = 0. Then follow the proof of

Lemma 1.4.2.)

2. (a) List all invertible elements of Z16 and for each invertible element a give its

inverse a −1 .

(b) List all zero divisors of Z15 and for each zero divisor a give all non-zero elements b such that a ⊙ b = 0.

3. (a) Which one of the two elements 74 and 77 is invertible in Z111 and which one is a zero divisor? For the invertible element a, give the inverse a−1 and for the zero divisor b give the element c ∈ Z111 such that b ⊙ c = c ⊙ b = 0.

(b) Solve the equations 77 ⊙ x ⊕ 21 = 10 and 74 ⊙ x ⊕ 11 = 0 in Z111 .

4. Let a and b be two elements of the ring Z21 and let f : Z21 → Z21 be a linear

function defined by f (x) = a ⊙ x ⊕ b (where the operations are computed in

Z21 ).


(a) Describe the set of all pairs (a, b) for which the function f is one-to-one.

(b) Find the range of the function f for the case a = 7, b = 3.

(c) Suppose a = 4 and b = 15. Find the inverse function f −1 (x) = c ⊙ x ⊕ d

which satisﬁes f −1 ( f (x)) = x for each x ∈ Z21 .

5. How many solutions in Z11 does the equation x^102 = 4 have? List them all.

6. Given an odd number m > 1, ﬁnd the remainder when 2φ(m)−1 is divided by m.

This remainder should be expressed in terms of m.

7. (Wilson’s Theorem) Let p be an integer greater than one. Prove that p is prime if

and only if ( p −1)! = −1 in Z p . (Hint: 1 and −1 = p −1 are the only self-inverse

elements of Z∗p .)

8. Prove that any commutative ﬁnite ring R (unity is not assumed) without zero

divisors is a ﬁeld.

In the decimal system the zero and the first nine positive integers are denoted by symbols 0, 1, 2, . . . , 9, respectively. These symbols are called digits. The same symbols are used to represent all the integers. The tenth integer is denoted as 10 and an arbitrary integer N can now be represented in the form

N = d0 + d1 · 10 + d2 · 10^2 + · · · + dn · 10^n , (1.20)

where the digits d0 , d1 , . . . , dn take values 0, 1, 2, . . . , 9. For example,

1998 = 8 + 9 · 10 + 9 · 10^2 + 1 · 10^3 .

In this notation the meaning of a digit depends on its position. Thus two digit symbols

“9” are situated in the tens and the hundreds places and their meaning is different.

In general, for the number N given by (1.20) we write

N = dn dn−1 . . . d1 d0 ,

rather than the longer expression (1.20), to emphasise the exceptional role of 10.

invention has been attributed to the Sumerians or the Babylonians. It was further

developed by Hindus, and proved to be of enormous signiﬁcance to civilisation. In

Roman symbolism, for example, one wrote MCMXCVIII for 1998.

1.5 Representation of Numbers 33

It is clear that more and more new symbols such as I, V, X, C, M are needed as the

numbers get larger while with the Hindu positional system, now in use, we need

only ten “Arabic numerals” 0, 1, 2, . . . , 9, no matter how large the number is. The

positional system was introduced into medieval Europe by merchants, who learned

it from the Arabs. It is exactly this system which is to blame for the fact that the

ancient art of computation, once conﬁned to a few adepts, has become a routine

algorithmic skill that can be done automatically by a machine, and is now taught in

primary school.

Mathematically, there is nothing special about the decimal system. The use of ten

as the base goes back to the dawn of civilisation, and is attributed to the fact that

we have ten ﬁngers on which to count. Other numbers could be used as the base,

and undoubtedly some of them were used. The number words in many languages

show remnants of other bases, mainly twelve, ﬁfteen and twenty. For example, in

English the words for 11 and 12 and in Spanish the words for 11, 12, 13, 14 and 15

are not constructed on the decimal principle. In French the word for 20—vingt—

suggests that that number had a special role at some time in the past. The Babylonian

astronomers had a system of notation with base 60. This is believed to be the reason

for the customary division of the hour and the angular degree into 60 minutes. In the

theorem that follows we show that an arbitrary positive integer b > 1 can be used as

a base.

Theorem 1.5.1 Let b > 1 be a positive integer. Then every positive integer N can

be uniquely represented in the form

N = d0 + d1 · b + d2 · b^2 + · · · + dn · b^n , (1.21)

where the digits di satisfy 0 ≤ di < b.

Proof We use induction on N . For N = 1 the representation 1 = 1 is unique. Suppose, inductively, that every integer 1, 2, . . . , N − 1 is uniquely representable. Now consider the integer N . Let

d0 = N mod b. Then N − d0 is divisible by b and let N1 = (N − d0 )/b. Since

N1 < N , by the induction hypothesis N1 is uniquely representable in the form

N1 = (N − d0 )/b = d1 + d2 · b + d3 · b^2 + · · · + dn · b^(n−1) .

Then clearly

N = d0 + N1 b = d0 + d1 b + d2 b2 + · · · + dn bn

Finally, suppose that N has some other representation in this form, i.e.,

N = d0 + d1 b + d2 b2 + · · · + dn bn = e0 + e1 b + e2 b2 + · · · + en bn .


Comparing the remainders on division by b we get d0 = e0 = r , where r = N mod b. Then the number

N1 = (N − r )/b = d1 + d2 · b + d3 · b^2 + · · · + dn · b^(n−1) = e1 + e2 · b + e3 · b^2 + · · · + en · b^(n−1)

has two different representations which contradicts the inductive assumption, since

we have assumed the truth of the result for all N1 < N . �

We write N = (dn dn−1 . . . d1 d0 )(b) to express (1.21). The digits di can be found by the repeated application of the division

algorithm as follows:

N = q1 b + d0 , (0 ≤ d0 < b)

q1 = q2 b + d1 , (0 ≤ d1 < b)

..

.

qn = 0 · b + dn (0 ≤ dn < b)

For example, the positional system with base 5 employs the digits 0, 1, 2, 3, 4 and

we can write

1998(10) = 3 · 54 + 0 · 53 + 4 · 52 + 4 · 5 + 3 = 30443(5) .
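The repeated-division scheme above takes only a few lines of Python (our sketch, not from the book): the digits of N in base b are the successive remainders, read in reverse.

```python
def to_base(N, b):
    """Digits of N in base b, most significant digit first."""
    digits = []
    while N > 0:
        N, d = divmod(N, b)   # one division step: N = q*b + d, 0 <= d < b
        digits.append(d)
    return digits[::-1] or [0]
```

Here `to_base(1998, 5)` returns `[3, 0, 4, 4, 3]`, i.e. 30443(5) , and `to_base(150, 2)` returns the binary digits appearing in (1.23).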

But in the era of computers it is the binary (or dyadic) system (base 2) that has

emerged as the most important. This system has only two digits, 0 and 1, and a very

simple multiplication table for them. But under the binary system, representations

of numbers get longer quickly. For example,

150(10) = 1 · 2^7 + 0 · 2^6 + 0 · 2^5 + 1 · 2^4 + 0 · 2^3 + 1 · 2^2 + 1 · 2 + 0 = 10010110(2) . (1.23)

Leibniz9 was one of the ardent proponents of the binary system. According to

Laplace: “Leibniz saw in his binary arithmetic the image of creation. He imag-

ined that Unity represented God, and zero the void; that the Supreme Being drew

all beings from the void, just as unity and zero express all numbers in his system of

numeration.”

9 Gottfried Wilhelm von Leibniz (1646–1716) was a German mathematician and philosopher who

developed inﬁnitesimal calculus independently of Isaac Newton, and Leibniz’s mathematical nota-

tion has been widely used ever since it was published. He invented an early mechanical calculating

machine.


Let us look at the binary representation of a number from the information point

of view. Information is measured in bits. One bit is a unit of information expressed

as a choice between two possibilities 0 and 1. The number of binary digits in the

binary representation of a number N is therefore the number of bits we need to

transmit N through an information channel (or input into a computer). For example,

the Eq. (1.23) shows that we need 8 bits to transmit or convey the number 150.

Thus, to transmit a positive integer N we need ⌊log2 N ⌋ + 1 bits of information.

Proof Suppose that N has n binary digits in its binary representation. That is, 2^(n−1) ≤ N < 2^n . Taking logarithms to base 2 we get n − 1 ≤ log2 N < n, so ⌊log2 N ⌋ = n − 1. Hence n = ⌊log2 N ⌋ + 1. □

Example 1.5.2 The input is the number 15011. Convert it to binary. What is the

length of this input?

Solution. Let 15011 = (an an−1 . . . a1 a0 )(2) be the binary representation of 15011.

We can ﬁnd the binary digits of 15011 recursively by a series of divisions with

remainder:

15011 = 2 · 7505 + 1 −→ a0 = 1,

7505 = 2 · 3752 + 1 −→ a1 = 1,

3752 = 2 · 1876 + 0 −→ a2 = 0,

1876 = 2 · 938 + 0 −→ a3 = 0,

938 = 2 · 469 + 0 −→ a4 = 0,

469 = 2 · 234 + 1 −→ a5 = 1,

234 = 2 · 117 + 0 −→ a6 = 0,

117 = 2 · 58 + 1 −→ a7 = 1,

58 = 2 · 29 + 0 −→ a8 = 0,

29 = 2 · 14 + 1 −→ a9 = 1,

14 = 2 · 7 + 0 −→ a10 = 0,

7 = 2·3+1 −→ a11 = 1,

3 = 2·1+1 −→ a12 = 1,

1 = 2·0+1 −→ a13 = 1,

From these divisions we see that 15011 = 11101010100011(2) , reading the binary digits from the column

of remainders from bottom to top. Hence the length of the input is 14 bits. �

Example 1.5.3 To estimate from above and from below the number of bits required to input an integer N which has 100 digits in its decimal representation we may use the GAP command LogInt(N,2) to calculate ⌊log2 N ⌋. A 100-digit integer is between 10^99 and 10^100 , so we have


gap> LogInt(10ˆ100,2)+1;

333

gap> LogInt(10ˆ99,2)+1;

329

So the number in this range will need between 329 and 333 bits.
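In Python (our illustration, not the book's) the same bounds come from `int.bit_length()`, which returns exactly ⌊log2 N⌋ + 1 for N ≥ 1:

```python
# int.bit_length() gives the number of binary digits of a positive integer,
# i.e. floor(log2 N) + 1, reproducing the GAP computation above.
upper = (10**100).bit_length()   # enough bits for any 100-digit integer
lower = (10**99).bit_length()    # bits needed for the smallest 100-digit integer
```

This gives `upper == 333` and `lower == 329`, in agreement with the GAP output.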

The negative powers of 10 are used to express those real numbers which are not

integers. This also works in other bases. For example,

1/8 = 0.125(10) = 1/10 + 2/10^2 + 5/10^3 = 0/2 + 0/2^2 + 1/2^3 = 0.001(2)

and

1/7 = 0.142857142857 . . .(10) = 0.(142857)(10) = 0.001001 . . .(2) = 0.(001)(2) .
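The doubling procedure behind such expansions can be sketched in Python (ours, not from the book): at every step the next binary digit of p/q is the integer part of twice the current fractional part.

```python
def binary_fraction(p, q, ndigits):
    """First ndigits binary digits of the fraction p/q (0 < p < q)."""
    digits = []
    for _ in range(ndigits):
        p *= 2
        digits.append(p // q)   # next binary digit: integer part of 2*(p/q)
        p %= q                  # keep only the fractional part and continue
    return digits
```

For example, `binary_fraction(1, 8, 3)` gives `[0, 0, 1]`, i.e. 0.001(2) , and `binary_fraction(1, 7, 6)` gives `[0, 0, 1, 0, 0, 1]`, the beginning of the period (001).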

The binary expansions of irrational numbers, such as

√5 = 10.001111000110111 . . .(2) ,

may look like a random sequence of bits, and one could be tempted to use them as a source of random bits. But this method is considered to be insecure. The number √5 in the example above can be guessed from the initial segment, which will reveal the whole sequence.

Exercises

1. Find the binary representation of the number 2002(10) and the decimal represen-

tation of the number 1100101(2) .

2. (a) Find the binary representation of the number whose decimal representation

is 2011.

(b) Find the decimal representation of the number whose binary representation

is 101001000.

3. Use Euler’s Theorem to ﬁnd the last three digits in the binary representation of

751015 .

4. How many non-zero digits are there in the binary representation of the integer

. . . 001 (2) ?

n m

digits of n. Prove that n is divisible by 6 if and only if the sum a + b + c + d of

its digits is divisible by 6.

6. The symbols A, B, C, D, E and F are used to denote the digits 10, 11, 12, 13, 14

and 15, respectively, in the hexadecimal representation (i.e., to base 16).

(a) Find the decimal representation of 2A4F(16) .

(b) Find the hexadecimal representation of 1000(10) .

Chapter 2

Cryptology

Nikolai Roerich (1874–1947)

Cryptography is about communication in the presence of adversaries. In medieval times diplomats had to communicate with their superiors using

a messenger. Messengers could be killed, and letters could be captured and read

by adversaries. During times of war, orders from military headquarters needed to

be sent to the line ofﬁcers without being intercepted and understood by the enemy.

The case of a war is an extreme example where the adversary is clearly deﬁned. But

there are also situations where the existence of an ‘adversary’ is less obvious. For

example, corporate deals and all negotiations must remain secret until completed.

Sometimes two parties want to communicate privately even if they do not have any

adversaries. For example, they wish to exchange love letters, and conﬁdentiality of

messages for them remains a very high priority. Thus, a classical goal of cryptography

is privacy. Authentication is another goal of cryptography which is any process by

which you verify that someone is indeed who they claim they are. We use passwords

to ensure that only certain people have access to certain resources (for example, if

you do your banking on the Internet you do not want other people to know your

ﬁnancial situation or to tamper with your accounts). Digital signatures are a special

technique for achieving authentication. Apart from signing your encrypted emails,

digital signatures are used for other applications, for example to ensure that auto-

matic software updates originate from the company they are supposed to, rather than

being viruses. Digital signatures are to electronic communication what handwritten

signatures are to paper-based communication. Nowadays cryptography has matured

and it is addressing an ever increasing number of other goals.

In his article “Cryptography”1 Ronald Rivest writes: “The invention of radio

gave a tremendous impetus to cryptography, since an adversary can eavesdrop easily

1 Chapter 13. Handbook of Theoretical Computer Science. J. Van Leeuwen (ed.) (Elsevier, 1990)

pp. 717–755.


38 2 Cryptology

over great distance. The course of World War II was signiﬁcantly affected by use,

misuse, and breaking of cryptographic systems used for radio traffic. It is intriguing that the computational engines designed and built by the British to crack the German

Enigma cipher are deemed by some to be the ﬁrst real “computers”; one could argue

that cryptography is the mother (or at least the midwife) of computer science.” (This

chapter can be downloaded from Ron Rivest’s web page.)

Here, Rivest mentions the famous “Colossus” computers. Until recently all infor-

mation about them was classiﬁed. The Colossus computers were built by a dedicated

team of British mathematicians and engineers led by Alan Turing and Tommy Flowers. They were extensively used in the cryptanalysis of high-level German communications. It is believed that this heroic effort shortened the Second World War by many

months. Recently Colossus was recreated and outperformed a modern computer (in

deciphering messages which had been encrypted using the Lorenz SZ 40/42 cipher

machine).2 Due to the secrecy that surrounded everything related to Colossus, there

arose a myth that the ENIAC was the ﬁrst large-scale electronic digital calculator in

the world. It was not.

One of the oldest ciphers known is Atbash. It even appears in the Hebrew Scriptures

of the Bible. Any occurrence of the ﬁrst letter of the alphabet is replaced by the last

letter, occurrences of the second letter are replaced by the second to last etc. Atbash

is a speciﬁc example of a general technique called inversion.

Caesar is also a very old cipher used by Gaius Julius Caesar (100 BC–44 BC).

Letters are simply replaced by letters three steps further down the alphabet. This way

‘a’ becomes ‘d’, ‘b’ becomes ‘e’ etc. In fact, any cipher using a displacement of any

size is now known as a Caesar. Caesar is a speciﬁc example of a general technique

called displacement.
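Both historical substitutions are easy to model. A Python sketch (ours, not from the book; it assumes messages consist of capital letters only):

```python
from string import ascii_uppercase as ABC

def atbash(text):
    """Inversion: the i-th letter of the alphabet maps to the (25 - i)-th."""
    return "".join(ABC[25 - ABC.index(ch)] for ch in text)

def caesar(text, shift=3):
    """Displacement: each letter moves `shift` steps down the alphabet."""
    return "".join(ABC[(ABC.index(ch) + shift) % 26] for ch in text)
```

For instance, `caesar("ABC")` returns `"DEF"`, and applying `atbash` twice returns the original text, since inversion is its own inverse.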

These two ciphers are examples of the so-called substitution methods which use

a mapping of an alphabet onto itself that replace a character with the one it maps

onto. If the mapping does not change within the message, the scheme is known as a

mono-alphabet scheme. Such cryptosystems were not very secure but were sufficient when literacy was not widespread.

For both of these cryptosystems it is essential to keep the method of encryption

secret, because even publicising the idea on which it is based might give away an

essential part of the security of the system, especially if the adversary managed to

intercept sufﬁciently many encrypted messages.

2 For more about this exciting project, and for further historical information about Colossus, see

http://www.codesandcyphers.org.uk/lorenz/rebuild.htm.

2.1 Classical Secret-Key Cryptology 39

By the end of the 19th century it became clear that security must be introduced differently. In 1883 Auguste Kerckhoffs3 [9] wrote two journal articles titled La Cryptographie Militaire, in which he stated six design principles for military ciphers. His main idea, now called Kerckhoffs' Principle, was that security must come not from keeping the encryption mechanism itself secret, but from keeping secret a changeable part of the mechanism, called the secret key. Depending on the secret key the encryption mechanism should encrypt messages differently. So, even if the adversary knows the encryption method but does not know the key, they will not know how to decrypt messages.

Thus, until recently, a standard cryptographic solution to the privacy problem was

a secret-key cryptosystem, which consisted of the following:

• A message space M: a set of strings (plaintext messages) over some alphabet (e.g.,

binary alphabet, English, Cyrillic or Arabic alphabets);

• A ciphertext space C: a set of strings (ciphertext messages) over some alphabet

(e.g., the alphabet of the dancing men in one of the Arthur Conan Doyle’s stories

of Sherlock Holmes);

• A key space K: a set of strings (keys) over some alphabet;

• An encryption algorithm E : M × K → C, which to every pair m ∈ M and k ∈ K

puts in correspondence a ciphertext E(m, k);

• A decryption algorithm D : C ×K → M with the property that D(E(m, k), k) = m

for all m ∈ M and k ∈ K.

The meaning of the last condition is that if a message is encrypted with a key k,

then the same key, when used in the decryption algorithm, will decrypt this message

from the ciphertext.

To use a secret-key cryptosystem the parties wishing to communicate privately

agree on a key k ∈ K, which they must keep secret. They communicate a message

m ∈ M by sending the ciphertext c = E(m, k). The recipient can decrypt the

ciphertext to obtain the message m by means of the key k and the decryption algorithm

D since m = D(c, k). The cryptosystem is considered to be secure if it is infeasible

in practice for an eavesdropper, who has discovered E(m, k) but does not know k, to

deduce m.

Below we present three examples.

The one-time pad is a nearly perfect solution to the privacy problem. It was invented

in 1917 by Gilbert Vernam (D. Kahn, The Codebreakers, Macmillan, New York,

1967) for use in telegraphy. In this secret-key cryptosystem the key is as long as the

message being encrypted. The key, once used, is discarded and never reused.

3 Auguste Kerckhoffs (1835–1903) was a Dutch linguist and cryptographer who was professor of

languages at the Ecole des Hautes Etudes Commerciales in Paris.


Suppose that the parties managed to generate a very long string k of randomly chosen

0’s and 1’s. Suppose that they also managed to secretly deliver k to all parties involved

with the intention to use it as a key. If a party A wishes to send a telegraphic message

m to other parties, then it writes the message as a string of zeros and ones m =

m1 m2 . . . mn , takes the ﬁrst n numbers from k, that is k = k1 k2 . . . kn and adds these

two strings component-wise mod 2 to get the encrypted message

c = m ⊕ k = c1 c2 . . . cn , where ci = mi ⊕ ki .

Then A destroys the ﬁrst n numbers of the key. On the receiving end all other parties

decrypt the message c by computing m = c ⊕ k and also destroy the ﬁrst n numbers

of the key. When another message is to be sent, another part of the key will be

used—hence the name “one-time pad.” This system is unconditionally secure in

the following sense. If c = c1 c2 . . . cn is the ciphertext, then an arbitrary message

m = m1 m2 . . . mn could be sent. Indeed, if the key were m ⊕ c, then m ⊕ (m ⊕ c) = c

and the ciphertext is c.

For written communication this system can be modiﬁed as follows. Each letter of

the alphabet is given a number in Z26 :

A B C D E F G H I J K L M

0 1 2 3 4 5 6 7 8 9 10 11 12

N O P Q R S T U V W X Y Z

13 14 15 16 17 18 19 20 21 22 23 24 25

You then agree to use a book, little-known to the general public (considered as a

very long string of letters), as the secret key. For example, “The Complete Poems of

Emily Dickinson” would be a good choice.4 Then you do the same as we did with

telegraphic messages except that we add messages mod 26. Suppose we need to send

a message

BUY TELECOM SHARES

and that the key (the text of the book) starts with

Best Witchcraft is Geometry
To the magician's mind –
His ordinary acts are feats
To thinking of mankind.

We write the message and the first 16 letters of the key (ignoring spaces and punctuation) one under the other, convert both to numbers, and add them mod 26:

  B  U  Y  T  E  L  E  C  O  M  S  H  A  R  E  S
  1 20 24 19  4 11  4  2 14 12 18  7  0 17  4 18
  B  E  S  T  W  I  T  C  H  C  R  A  F  T  I  S
  1  4 18 19 22  8 19  2  7  2 17  0  5 19  8 18
  2 24 16 12  0 19 23  4 21 14  9  7  5 10 12 10
  C  Y  Q  M  A  T  X  E  V  O  J  H  F  K  M  K

The ciphertext sent is CYQMATXEVOJHFKMK.

This version of the one-time pad is much less secure as it is vulnerable to frequency

analysis.
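The letter-by-letter computation above can be sketched in Python (ours, not from the book; capital letters only, with a key at least as long as the message):

```python
from string import ascii_uppercase as ABC

def otp(text, key, decrypt=False):
    """Add (encrypt) or subtract (decrypt) the key letterwise mod 26."""
    sign = -1 if decrypt else 1
    return "".join(ABC[(ABC.index(m) + sign * ABC.index(k)) % 26]
                   for m, k in zip(text, key))

message = "BUYTELECOMSHARES"
key = "BESTWITCHCRAFTIS"        # the first 16 letters of the poem
cipher = otp(message, key)       # gives CYQMATXEVOJHFKMK, as above
```

Decryption is the same operation with the key subtracted: `otp(cipher, key, decrypt=True)` restores the message.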

Exercises

1. Use Khlebnikov's poem

Today I will go once again
Into life, into haggling, into the flea market,
And lead the army of my songs
To duel against the market tide.5

as the key of a one-time pad cryptosystem to decrypt the ciphertext

WOAPDYWCAUERKYWHZRGSXQJW.

2. Use the GAP command Random([0..25]); to generate a sequence of 20 random

letters of the alphabet.

3. Using the sequence obtained in the previous exercise as a key for a one-time pad

cryptosystem, encrypt and then decrypt back the sentence by Emily Dickinson

“I HAVE NO TIME TO HATE”. The GAP programs LettertoNumber and

NumbertoLetter found in Sect. 9.2.3 can help you to convert messages into

the digital format and back.

The affine cryptosystem is a substitution cipher which is also based on modular arithmetic. The key to this cryptosystem is a pair k = (a, b) of numbers a ∈ Z∗26 , b ∈ Z26 . Under this

5 Velemir Khlebnikov (1885–1922) was one of the key poets in the Russian Futurist movement but

his work and inﬂuence stretch far beyond it. He was educated as a mathematician and his poetry is

very abstract and mathematical. He experimented with the Russian language, drawing deeply upon

its roots.


system a number in Z26 is assigned to every letter of the alphabet as in the previous section. Each letter is encoded into the corresponding number x it is assigned to and then into the letter to which the number a ⊙ x ⊕ b is assigned. For instance, if a = 3 and b = 5, then the letter "H" has the numerical encoding "7". Then 3 ⊙ 7 ⊕ 5 = 0 is computed, and we note that "0" is the numerical encoding for "A", which shows that "H" is encrypted into "A". Using the key k = (3, 5), the message BUY TELECOM SHARES is encrypted as

INZKRMRLVPHAFERH

The requirement that a ∈ Z∗26 , i.e., gcd(a, 26) = 1, is needed to ensure that the encryption function

E(x) = a ⊙ x ⊕ b

is one-to-one. Indeed, if a were a zero divisor, we would have a ⊙ d = 0 for some nonzero d ∈ Z26 , so that E(x) = E(x ⊕ d) and E is not one-to-one. In particular, E(0) = E(d) and unambiguous decryption is impossible. On the other hand, if a is invertible, then a ⊙ x ⊕ b = y implies x = a−1 ⊙ (y ⊕ (−b)) and the decryption function exists:

D(y) = a−1 ⊙ (y ⊕ (−b)).

Since the key is very short, this system is not secure: one can simply use all keys one

by one and see which key gives a meaningful result. However it can be meaningfully

used in combination with other cryptosystems. For example, if we use this encryption

ﬁrst and then use a one-time pad (or the other way around), the frequency analysis

will be very much hampered.
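A Python sketch of this cryptosystem (ours, not from the book; it assumes Python 3.8+ for the modular inverse `pow(a, -1, 26)` and capital-letter messages):

```python
from string import ascii_uppercase as ABC

def affine_encrypt(text, a, b):
    """E(x) = (a*x + b) mod 26, applied letterwise."""
    return "".join(ABC[(a * ABC.index(ch) + b) % 26] for ch in text)

def affine_decrypt(text, a, b):
    """D(y) = a^(-1) * (y - b) mod 26; requires gcd(a, 26) = 1."""
    a_inv = pow(a, -1, 26)
    return "".join(ABC[(a_inv * (ABC.index(ch) - b)) % 26] for ch in text)
```

With the key k = (3, 5) of the example, `affine_encrypt("H", 3, 5)` returns `"A"`, and decryption inverts the map.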

Exercises

2. Using the afﬁne cryptosystem with the key k = (11, 13) encrypt the message

CRYPTO and decrypt the message DRDOFP.

3. In an afﬁne cryptosystem with an unknown key Eve guessed that the letter F was

encrypted as N and the letter K was encrypted as O. Help Eve to calculate the

key.

4. A plaintext (in English) has been encrypted using an afﬁne cryptosystem. The

obtained ciphertext is:

ljpcc puxya nip ljc cbhcx quxya wxrcp ljc aqo achcx nip ljc rskpn bipra ux ljcup

jkbba in alixc xuxc nip miplkb mcx riimcr li ruc ixc nip ljc rkpq bipr ix jua rkpq

ljpixc ux ljc bkxr in miprip sjcpc ljc ajkrisa buc ixc puxy li pwbc ljcm kbb ixc puxy

li nuxr ljcm ixc puxy li vpuxy ljcm kbb kxr ux ljc rkpqxcaa vuxr ljcm ux ljc bkxr

in miprip sjcpc ljc ajkrisa buc


Find the original plaintext. The following estimates of the relative frequencies of

the 26 letters in English texts may be of some help. You are also encouraged to

use any computer assistance you need.

a 0.082 j 0.002 s 0.063

b 0.015 k 0.008 t 0.091

c 0.028 l 0.040 u 0.028

d 0.043 m 0.024 v 0.010

e 0.127 n 0.067 w 0.023

f 0.022 o 0.075 x 0.001

g 0.020 p 0.019 y 0.020

h 0.061 q 0.001 z 0.001

i 0.070 r 0.060

A more sophisticated substitution cipher is the so-called Hill Cipher, which was invented in 1929 by Lester S. Hill. Instead of substituting letters it substitutes blocks of letters of fixed length m. The whole message is divided into such m-tuples and each m-tuple is encrypted separately as follows.

The key for this cryptosystem is an invertible m × m matrix over Z26 . Both matrix

operations, addition and multiplication, are deﬁned by means of addition and multi-

plication modulo 26. Since we will not be using any other operations, it is no longer

appropriate to write the symbols ⊕ and ⊙ for modular operations. To simplify things

we will use ordinary notation. We will consider the case m = 2 and therefore pairs

of letters and 2 × 2 matrices. Let

K = [ a  b ]
    [ c  d ]

be the key. A pair of letters (P1 , P2 ) is encrypted according to the scheme

(P1 , P2 ) → [ x1 ]  →  K [ x1 ] = [ y1 ]  →  (C1 , C2 ),
             [ x2 ]       [ x2 ]   [ y2 ]

where x1 , x2 are the numerical codes for P1 , P2 and y1 , y2 are the numerical codes for C1 , C2 . The invertibility of K is needed for the unambiguous recovery of x1 , x2 from y1 , y2 .


Suppose, for example, that the key is

K = [ 3  3 ]
    [ 2  5 ]

and suppose the plaintext message is HELP. Then this plaintext is represented by two pairs

HELP → [ H ] , [ L ]  →  [ 7 ] , [ 11 ] .
       [ E ]   [ P ]     [ 4 ]   [ 15 ]

Then we compute

[ 3  3 ] [ 7 ] = [ 7 ] ,      [ 3  3 ] [ 11 ] = [ 0 ]
[ 2  5 ] [ 4 ]   [ 8 ]        [ 2  5 ] [ 15 ]   [ 19 ]

(all operations mod 26) and convert back into letters:

[ 7 ] , [ 0 ]  →  [ H ] , [ A ]  →  HIAT.
[ 8 ]   [ 19 ]    [ I ]   [ T ]

The matrix K is invertible, hence an inverse K−1 exists such that K K−1 = K−1 K = I2 , where I2 is the identity matrix of order 2. It follows that

K−1 (K [ x1 ]) = I2 [ x1 ] = [ x1 ] ,
       [ x2 ]       [ x2 ]   [ x2 ]

so decryption amounts to multiplication by K−1 . In our case det(K) = 3 · 5 − 3 · 2 = 9 and 9−1 = 3 in Z26 , so

K−1 = 9−1 [ 5  23 ] = 3 [ 5  23 ] = [ 15  17 ] .
          [ 24  3 ]     [ 24  3 ]   [ 20   9 ]

To decrypt the ciphertext HIAT we convert it into pairs of numbers

HIAT → [ H ] , [ A ]  →  [ 7 ] , [ 0 ] .
       [ I ]   [ T ]     [ 8 ]   [ 19 ]

Then we compute

[ 15  17 ] [ 7 ] = [ 7 ] ,      [ 15  17 ] [ 0 ]  = [ 11 ]
[ 20   9 ] [ 8 ]   [ 4 ]        [ 20   9 ] [ 19 ]   [ 15 ]

and convert back into letters:

[ 7 ] , [ 11 ]  →  [ H ] , [ L ]  →  HELP.
[ 4 ]   [ 15 ]     [ E ]   [ P ]
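The encryption and decryption just performed can be sketched in Python (ours, not from the book; even-length messages of capital letters):

```python
from string import ascii_uppercase as ABC

def hill2(text, K):
    """Apply the 2x2 key matrix K over Z_26 to successive letter pairs."""
    out = []
    for i in range(0, len(text), 2):
        x1, x2 = ABC.index(text[i]), ABC.index(text[i + 1])
        out.append(ABC[(K[0][0] * x1 + K[0][1] * x2) % 26])
        out.append(ABC[(K[1][0] * x1 + K[1][1] * x2) % 26])
    return "".join(out)

K = [[3, 3], [2, 5]]
K_inv = [[15, 17], [20, 9]]      # the inverse computed in the example
```

As expected, `hill2("HELP", K)` returns `"HIAT"`, and `hill2("HIAT", K_inv)` returns `"HELP"`.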

Not every matrix over Z26 is invertible, however. Recall that the criterion of invertibility for matrices over R is a nonzero determinant. Since Z26 has zero divisors, we have to slightly modify the standard criterion.

Theorem 2.1.1 An n × n matrix K over Z26 is invertible if and only if det(K) is an

invertible element in Z26 .

Proof We will prove this theorem only for n = 2. Let us consider a 2 × 2 matrix

K = [ a  b ]
    [ c  d ]

whose determinant is Δ = det(K) = ad − bc. Let us compute

[ a  b ] [ d  −b ] = [ Δ  0 ] .     (2.1)
[ c  d ] [ −c  a ]   [ 0  Δ ]

Suppose Δ is not invertible. Then Δ is zero or a zero divisor, so there is a nonzero m ∈ Z26 with Δ · m = 0, and multiplying (2.1) by m we obtain

[ a  b ] [ d  −b ] · m = [ 0  0 ] .
[ c  d ] [ −c  a ]       [ 0  0 ]

Consider the matrix

L = [ d  −b ] · m .
    [ −c  a ]

If L is not the zero matrix, then K L = 0, and were K invertible we would get L = K−1 (K L) = K−1 0 = 0, a contradiction; thus we have shown that K cannot be invertible. If, however,

[ d  −b ] · m = [ 0  0 ] ,
[ −c  a ]       [ 0  0 ]

then am = bm = cm = dm = 0, so that also

[ a  b ] · m = [ 0  0 ] ,
[ c  d ]       [ 0  0 ]

and the same argument applied to the nonzero matrix m · I2 shows that K cannot be invertible in this case either. On the other hand, Eq. (2.1) shows that if Δ is invertible, then

[ a  b ]−1 = Δ−1 [ d  −b ]
[ c  d ]         [ −c  a ]

is the inverse. □
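The closing formula of the proof gives a direct recipe for inverting a 2 × 2 key. A Python sketch (ours, not from the book; Python 3.8+ for the three-argument `pow`):

```python
def inverse_2x2_mod26(K):
    """K^(-1) = det(K)^(-1) * adj(K) over Z_26, per the theorem's formula."""
    (a, b), (c, d) = K
    det = (a * d - b * c) % 26
    det_inv = pow(det, -1, 26)   # raises ValueError if det is not invertible
    return [[(det_inv * d) % 26, (det_inv * -b) % 26],
            [(det_inv * -c) % 26, (det_inv * a) % 26]]
```

For the key of the earlier example, `inverse_2x2_mod26([[3, 3], [2, 5]])` returns `[[15, 17], [20, 9]]`, while a matrix with non-invertible determinant raises an error.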


Hill's cryptosystem is vulnerable to the so-called known plaintext attack. Indeed, if a k × k matrix K is a key, then it is normally enough to know that the message fragments m1 , m2 , . . . , mk are encrypted

as c1 , c2 , . . . , ck . Indeed, if the ith column of a matrix X represents the numerical

encodings of the plain text fragment mi and the ith column of a matrix Y represents

the numerical encodings of the cipher text fragment ci , then Y = KX, from which the

key can be found as K = YX −1 . In rare cases the matrix X may appear degenerate, in

which case we will not be able to ﬁnd K exactly but will still have much information

about it.
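The attack K = Y X−1 is a one-liner once matrices over Z26 are available. A Python sketch (ours, not from the book) recovering the key of the earlier example from the known pair HELP → HIAT:

```python
def mat_mul_mod26(A, B):
    """Product of two 2x2 matrices over Z_26."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % 26
             for j in range(2)] for i in range(2)]

def inv2(M):
    """Inverse of a 2x2 matrix over Z_26 via the adjugate formula."""
    (a, b), (c, d) = M
    det_inv = pow((a * d - b * c) % 26, -1, 26)
    return [[(det_inv * d) % 26, (-det_inv * b) % 26],
            [(-det_inv * c) % 26, (det_inv * a) % 26]]

# Columns of X are the plaintext pairs HE, LP; columns of Y the pairs HI, AT.
X = [[7, 11], [4, 15]]
Y = [[7, 0], [8, 19]]
K = mat_mul_mod26(Y, inv2(X))    # recovers the key [[3, 3], [2, 5]]
```

Here X happens to be invertible (its determinant is 9); when X is degenerate, only partial information about K can be extracted, as the text notes.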

In cryptanalysis, which is the art of breaking ciphers, the so-called method of

‘cribs’ is widely used. This term was introduced by cryptographers in Bletchley Park

and it means a suspected plaintext. For example, an English language text contains

the word ‘that’ with high probability and a letter often starts with the word ‘dear’.

Exercises

1. (a) Which one of the two matrices, considered as matrices over Z26 ,

[ 1  12 ]      [ 1  6 ]
[ 12  1 ] ,    [ 6  1 ]

is invertible and which is not? Find the inverse for the invertible matrix.

(b) Let M be the invertible matrix found in part (a). Use it as a key in Hill’s

cryptosystem to encrypt YEAR and to decrypt ROLK.

2. In Hill’s cryptosystem with the key

K = [ 11  12 ]
    [ 12  11 ]

ﬁnd all pairs of letters XY which do not change after being encoded twice, i.e.,

if we encode XY we get a pair ZT which is being encoded as XY.

3. You have captured the ciphertext

NWOLBOTEPEHKICNSHR.

You know it has been encrypted using the Hill cipher with a 2 × 2 matrix and you

know the ﬁrst 4 letters of the message are the word “DEAR”. Find the secret key

and decrypt the message.

4. The key for Hill’s cryptosystem is the following matrix over Z26

K = [ 1   2   3   4  5 ]
    [ 9  11  18  12  4 ]
    [ 1   2   8  23  3 ]
    [ 7  14  21   5  1 ]
    [ 5  20   6   5  0 ]

2.1 Classical Secret-Key Cryptology 47

WGVUUTGEPVRIMFTXMXMHCYTNGYMJJE

EZKEWHLQQISDJYJCTYEUBYKFBWPBBE

5. (advanced linear algebra required) Prove that a square n × n matrix A over Z26 is

invertible if and only if its determinant det(A) is an invertible element of Z26 .

2.2 Modern Public-Key Cryptology

Traditional secret-key cryptology assumes that both the sender and the receiver must

be in possession of the same secret key which they use both for encryption and

decryption. This secret key must be delivered all around the world, to all the corre-

spondents. This is a major weakness of this system.

Modern public-key cryptology breaks the symmetry between the sender and the

receiver. It requires two separate keys for each user, one of which is secret (or private)

and one of which is public. The public key is used to encrypt plaintext, whereas the

private key is used to decrypt ciphertext. Since the public key is no longer secret it

can be widely distributed without compromising security. The term “asymmetric”

stems from the use of different keys to perform the encryption and the decryption,

each operation being the inverse of the other—while the conventional (“symmetric”)

cryptography uses the same key to perform both.

The computational complexity is the main reason why the system works. The

adversary will know how to decrypt messages but will still be unable to do it due to

the extremely high complexity of the task.

A function f is called a one-way function if the computation of f (n), given n, is computationally easy while the computation of n, given f (n), is intractable.

Example 2.2.1 Given the availability of ordinary telephone books, the function f : name → phone number can be computed in seconds, as it is easy to find the name in the book since names are listed in alphabetical order; but the inverse function f^{−1} : phone number → name can hardly be computed at all, since in the worst case you need to read the whole book in order to find the name corresponding to a given number. You might need a month to do that.

This is not, of course, a completely rigorous definition. It contains references to ‘easy’ and ‘intractable’ tasks which may be dependent on the computing resources available.

A publicly available one-way function f has a number of useful applications. In

time-shared computer systems, instead of storing a table of login passwords, one

can store, for each password w, the value f (w). Passwords can easily be checked

for correctness at login, but even the system administrator cannot deduce any user’s

password by examining the stored table.
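A modern instantiation of this idea might look as follows; SHA-256 stands in for the one-way function f (the text does not prescribe a particular f, and real systems would also add per-user salts and deliberately slow hashes):

```python
# Storing f(w) instead of the password w itself.
# SHA-256 is a hypothetical choice for the one-way function f.
import hashlib

def f(password: str) -> str:
    """One-way function: easy to compute, infeasible to invert."""
    return hashlib.sha256(password.encode()).hexdigest()

# The login table stores only f(w); the administrator never sees w.
stored = {"alice": f("correct horse battery staple")}

def check_login(user: str, attempt: str) -> bool:
    # Compare f(attempt) with the stored value; w itself is never stored.
    return stored.get(user) == f(attempt)

assert check_login("alice", "correct horse battery staple")
assert not check_login("alice", "12345")
```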

Suppose now that a one-way function f has an additional parameter t (it can be a number, a graph, a function, anything) such that it is computationally easy to compute n, given f (n) and t. Then t is called a trapdoor and f is called a trapdoor function.

Example 2.2.2 Imagine that you have taken the time to enter the telephone directory

into your computer, sorted all phone numbers in increasing order, and printed them.

Suppose that it took one month of your time. Then you possess a trapdoor to the one-

way function f described in Example 2.2.1. For you it is equally easy to compute f or

f −1 and you are the only person (at least for the next month) who can compute f −1 .

Imagine that Alice possesses a trapdoor function

f (TEXT) = CIPHERTEXT

with a secret trapdoor t. Then she puts this function f in the public domain, where it

is accessible to everyone, and asks everybody to send her f (TEXT) each time when

the necessity arises to send a message TEXT conﬁdentially. Knowing the trapdoor t,

it is an easy job for her to compute the TEXT from f (TEXT) while it is infeasible to

compute it for anybody else. The function f (or a certain parameter which determines

f uniquely) is called Alice’s public key and the trapdoor t is called her private key.

Example 2.2.3 Let us see how we can use the trapdoor function of Example 2.2.1

to construct a public-key cryptosystem. Take the University telephone directory and

announce the method of encryption as follows. Your correspondent must take a letter

of your message, ﬁnd a name in the directory starting with this letter, and assign

to this letter the phone number of the person with the chosen name. She must do

it with all letters of your message. Then all these phone numbers combined will

form a ciphertext. For example, the message SELL BRIERLY, sent to you, will be

encrypted as follows:


S SCOTT 8751

E EVANS 8057

L LEE 8749

L LEE 5999

B BANDYOPADHYAY 7439

R ROSENBERG 5114

I ITO 7518

E ESCOBAR 6121

R RAWIRI 7938

L LEE 6346

Y YU 5125

87518057874959997439511475186121793863465125

For decryption you must use your private key, which is the inverse telephone directory.
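The toy cryptosystem can be sketched in code. The miniature directory below reuses some names from the table above (plus one invented L-name, LOVELL, since a dictionary needs distinct keys); encryption is randomised because many names share an initial letter, while decryption with the inverse directory is deterministic:

```python
import random

# A miniature 'telephone directory': name -> phone number.
# Most entries are taken from the table above; LOVELL is invented.
directory = {
    "SCOTT": "8751", "EVANS": "8057", "LEE": "8749",
    "LOVELL": "5999", "BANDYOPADHYAY": "7439", "ROSENBERG": "5114",
    "ITO": "7518", "ESCOBAR": "6121", "RAWIRI": "7938", "YU": "5125",
}
# The private key is the inverse directory: number -> first letter.
inverse = {number: name[0] for name, number in directory.items()}

def encrypt(message: str) -> str:
    out = []
    for letter in message:
        # pick any name starting with this letter (the homophonic step)
        name = random.choice([n for n in directory if n[0] == letter])
        out.append(directory[name])
    return "".join(out)

def decrypt(ciphertext: str) -> str:
    # all numbers in this toy directory are 4 digits long
    chunks = [ciphertext[i:i + 4] for i in range(0, len(ciphertext), 4)]
    return "".join(inverse[c] for c in chunks)

assert decrypt(encrypt("SELL")) == "SELL"
```

Without the inverse directory an eavesdropper faces exactly the "read the whole book" search of Example 2.2.1.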

2.3 Computational Complexity

In this section we will develop several rigorous concepts necessary for implementing

the idea of the previous section. To measure the running time of an algorithm we need

ﬁrst to choose a unit of work, say one multiplication, or one division with remainder,

etc.; we will often call the chosen units of work steps.

It is often the case that not all instances of a problem under consideration are

equally hard even if the two inputs are of the same length. For example, if we feed

an algorithm two different—but equally long—inputs (and we feed them in one at

a time, not both at once) then the algorithm might require an astronomical number

of operations to deal with the ﬁrst input, but only a handful of operations to deal

with the second input. The (worst case) time complexity of an algorithm is a function

that for each length of the input shows the maximal number of units of work that

may be required. We say that an algorithm is of time complexity f (n) if for all n and

for all inputs of n bits, the execution of the algorithm requires at most f (n) steps.

The worst-case complexity takes into consideration only the hardest instances of the

problem. It is relevant when people are pessimistic, and think that it is very likely

that a really hard instance of the problem will crop up.

Average-case complexity, on the other hand, estimates how difﬁcult the relevant

problem is ‘on average’. An optimist, thinking that hard instances of the problem

are rare, will be more interested in the average-case than the worst-case complexity.

At present, much less is known about the average-case complexity than about the

worst-case one, so we concentrate on the latter.

We need a language to compare the time complexity functions of different

algorithms.


Definition 2.3.1 Let f (x) and g(x) be two real-valued functions. We say that f (n) ∼ g(n) (read “f is asymptotically equal to g”) if

$$\lim_{n\to\infty} \frac{f(n)}{g(n)} = 1.$$

Example 2.3.1 Every polynomial f (n) = a_0 n^d + a_1 n^{d−1} + · · · + a_d of degree d with a_0 > 0 satisfies f (n) ∼ a_0 n^d. Indeed, when n → ∞,

$$\frac{f(n)}{a_0 n^d} = \frac{a_0 n^d + a_1 n^{d-1} + \cdots + a_d}{a_0 n^d} = 1 + \frac{a_1}{a_0}\cdot\frac{1}{n} + \cdots + \frac{a_d}{a_0}\cdot\frac{1}{n^d} \to 1.$$

Example 2.3.2 A famous asymptotic equality is Stirling’s formula:

$$n! \sim \sqrt{2\pi n}\cdot n^n e^{-n}. \qquad (2.2)$$

For comparing the growth of functions we use the “little-oh,” “big-Oh” and “big-

Theta” notation.

Definition 2.3.2 We say that f (n) = o(g(n)) (read “f is little-oh of g”) if

$$\lim_{n\to\infty} \frac{f(n)}{g(n)} = 0.$$

Informally, this means that f grows more slowly than g when n gets large.

Example 2.3.3 1000n^{2.9} = o(n^3). This is almost obvious since

$$\frac{1000\, n^{2.9}}{n^3} = \frac{1000}{n^{0.1}} \to 0.$$

However not all comparisons can be done so easily. To compare the rate of growth

of two functions one often needs L’Hospital’s rule. We formulate it in the form that

suits our applications.

Theorem 2.3.1 (L’Hospital’s rule) Let f (x) and g(x) be two differentiable functions such that lim_{x→∞} f (x) = ∞ and lim_{x→∞} g(x) = ∞. Suppose that

$$\lim_{x\to\infty} \frac{f'(x)}{g'(x)}$$

exists. Then

$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{x\to\infty} \frac{f'(x)}{g'(x)}.$$

Example 2.3.4 ln n = o(√n). Let us justify this using L’Hospital’s rule. Indeed,

$$\lim_{x\to\infty} \frac{\ln x}{\sqrt{x}} = \lim_{x\to\infty} \frac{(\ln x)'}{(\sqrt{x})'} = \lim_{x\to\infty} \frac{1/x}{1/(2\sqrt{x})} = \lim_{x\to\infty} \frac{2}{\sqrt{x}} = 0.$$

Example 2.3.5 Let c > 1. Then (a) n^d = o(c^n) for every fixed d, and (b) c^n = o(n!). Here (a) again follows from L’Hospital’s rule and we leave it as an exercise. (b) follows from Stirling’s formula (2.2). Indeed,

$$\lim_{n\to\infty} \frac{c^n}{n!} = \lim_{n\to\infty} \frac{c^n}{\sqrt{2\pi n}\, n^n e^{-n}} = \lim_{n\to\infty} \frac{1}{\sqrt{2\pi n}}\cdot\frac{(ec)^n}{n^n} = \lim_{n\to\infty} \frac{1}{\sqrt{2\pi n}}\cdot\Big(\frac{ec}{n}\Big)^{\!n} = 0.$$

Definition 2.3.3 We say that f (n) = O(g(n)) (read “f is big-Oh of g”) if there exists a number C > 0 and an integer n_0 such that for n > n_0

$$|f(n)| \le C\,|g(n)|.$$

Informally, this means that f doesn’t grow at a faster rate than g when n gets large.

Example 2.3.6 (a) sin n = O(1); (b) 1000n^3 + √n = O(n^3). In the first case |sin n| ≤ 1, so we can take C = 1. In the second, we note that √n ≤ n^3, hence

$$1000\, n^3 + \sqrt{n} \le 1001\, n^3,$$

so we can take C = 1001.

Proposition 2.3.1 Let f (x) = \sum_{k=0}^{d} a_k x^k be a polynomial of degree d. Then f (n) = O(n^d).

Proof Let C = |a_0| + |a_1| + · · · + |a_d|. Then n^k ≤ n^d for all 0 ≤ k ≤ d and n ≥ 1, hence |f (n)| ≤ C n^d and f (n) = O(n^d). �


Definition 2.3.4 We say that f (n) = Θ(g(n)) (read “f is big-Theta of g”) if there exist two numbers c, C > 0 and an integer n_0 such that for n > n_0

$$c\,|g(n)| \le |f(n)| \le C\,|g(n)|.$$

Informally, this means that f grows as fast as g does when n gets large.

Example 2.3.7 πn + sin(n) = Θ(n) since 2n < |πn + sin(n)| < 5n, so we can choose

c = 2 and C = 5.

It is convenient to single out several standard functions of typical growth rates and measure the growth of other functions by comparing them against the standard ones:

O(1) at most constant Θ(1) constant

O(log n) at most logarithmic Θ(log n) logarithmic

O(n) at most linear Θ(n) linear

O(n2 ) at most quadratic Θ(n2 ) quadratic

O(n3 ) at most cubic Θ(n3 ) cubic

O(nd ) at most polynomial Θ(nd ) polynomial

O(cn ) at most exponential Θ(cn ) exponential

O(n!) at most factorial Θ(n!) factorial

These functions are listed in increasing order of the rapidity of growth. Of course

there are some intermediate cases like O(log log n) and O(n log n). The table below

provides estimates of the running times of algorithms for certain orders of complexity.

Here we have problems with input strings of 2, 16 and 64 bits.

size n | log2 n | n | n log2 n | n^2 | 2^n | n!
2 | 1 | 2 | 2 | 4 | 4 | 2
16 | 4 | 16 | 64 | 256 | 6.5 × 10^4 | 2.1 × 10^13
64 | 6 | 64 | 384 | 4096 | 1.8 × 10^19 | >10^89

If we assume that one operation (unit of labour) requires 1 µs (= 10^{−6} s), then it is worth noting that on an input of 64 bits a problem of exponential complexity 2^n will require about 1.8 × 10^{19} µs, i.e., more than half a million years.

Problems which can only be solved by algorithms whose time complexity is expo-

nential quickly become intractable when the size of the input grows. That is why

mathematicians and computer scientists consider polynomial growth as the upper limit of tractability; anything that grows faster than polynomially is considered to be intractable (though there are some interesting intermediate cases, such as the subexponential time complexity algorithms for factorisation of integers).

Exercises

1. Prove that (log n)^2 = o(√n).

2. Use L’Hospital rule to compare the growth of the two functions:

√

f (n) = n2007 , g(n) = 2 n

.

4. It has been experimentally established that the function ψ(x) = \int_2^x \frac{dt}{\ln t} approximates the function π(x) introduced in Sect. 1.1.3 even better than x/ln x. Using L’Hospital’s rule, prove that

$$\psi(x) \sim \frac{x}{\ln x}.$$

5. List the following functions in increasing order of magnitude, when n → ∞:

(a) f (n) = (ln n)^{1000}, g(n) = n^{10}, h(n) = \sqrt[3]{e^n};
(b) f (n) = e^{sin n}, g(n) = n^2, h(n) = ln n!

6. We say that n is a perfect power if there are positive integers m > 1 and k > 1 such that n = m^k. Suppose that the unit of work is execution of one GAP command

RootInt(x,y) and that multiplication is costless. Write a GAP program that has

polynomial complexity and determines if the given integer n is a perfect power

or not. Find out if the following number n is a perfect power

32165915990795960806201156497692131799189453658831821777511700748913568729

08523398835627858363307507667451980912979425575549941566762328495958107942

76742746387660103832022754020518414200488508306904576286091630047326061732

13147723760062022617223850536734439419187423527298618434826797850608981800

75920878659088367693192622340064634811419535028889335540064440165586139725

67864525460233092587652156920261205787558242189274149331895101172683052822

80727849358699658455141506222721476847645629705008614991371536420103263486

34959615993459063845793313984237722143683892937148998975391746809877568851

72762336013543700624574174575024244791527281937.

7. Use Stirling’s formula to establish the character of growth of the following binomial coefficients:

(a) \binom{n}{k}, where k is fixed;
(b) \binom{n}{k}, where k ∼ αn, and α is a fixed real number with 0 < α < 1.


Complexity of Number-Theoretic Algorithms

In a number theoretic algorithm the input is often a number (or several numbers).

So what is the length of the input in bits if it is an integer N? In other words, we are

asking how many zeros and ones one needs to express N. This question was solved

in Sect. 1.5, where we learned how to represent numbers in binary. By Theorem 1.5.2

to express N in binary we need n = ⌊log_2 N⌋ + 1 bits of information. For most calculations it would be sufficient to use the following approximations: N ≈ 2^n and n ≈ log_2 N.
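In Python, for instance, this length is available directly as the bit length of N; the snippet below is just a sanity check of these approximations, not part of the text:

```python
import math

N = 1234567
n = N.bit_length()                   # number of bits, floor(log2 N) + 1
assert n == math.floor(math.log2(N)) + 1
assert 2 ** (n - 1) <= N < 2 ** n    # hence N ≈ 2^n and n ≈ log2 N
```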

Now we will consider two algorithms for calculating cN mod m, where c and m

are ﬁxed numbers. Here N is the input and cN mod m is an output. The running

time of the algorithm will be measured by the number of modular multiplications

required. In ordinary arithmetic this measure might not be satisfactory since the

numbers grow in size and some multiplications are much more labour intensive than

the others. However, in modular arithmetic all numbers are of approximately equal

size and our assumption is realistic.

Algorithm 1 is given by the formula c^N = (· · · (((c · c) · c) · c) · · · ) · c. That is, we calculate powers of c recursively by setting c_1 = c and c_{i+1} = c_i · c. To calculate c^N by this method we require N − 1 multiplications. Hence the complexity function f (n) for this algorithm is f (n) = N − 1 ≈ 2^n − 1, where n = ⌊log_2 N⌋ + 1 is the length of the input. Since \frac{1}{2}\,2^n < f (n) < 2^n, we have f (n) = Θ(2^n). This algorithm has exponential complexity.

We have been too straightforward in calculating cN mod m and the result was

appalling. We can be much more clever and do much better.

Algorithm 2 (Square and Multiply): Let us represent N in binary, N = i_0 + 2 i_1 + 2^2 i_2 + · · · + 2^s i_s with binary digits i_j ∈ {0, 1} and i_s = 1, so that s = ⌊log_2 N⌋. We first compute

$$c^2 = c\cdot c \bmod m,\quad c^{2^2} = (c^2)^2 \bmod m,\quad c^{2^3} = (c^{2^2})^2 \bmod m,\quad \dots,\quad c^{2^s} = (c^{2^{s-1}})^2 \bmod m$$

by successive squaring, which takes s multiplications. After that, at most s further multiplications may be required to calculate

$$c^N = c^{\,i_0 + 2 i_1 + \cdots + 2^s i_s} = c^{\,i_0}\cdot (c^2)^{\,i_1}\cdots (c^{2^s})^{\,i_s} \bmod m.$$


So n − 1 ≤ f (n) ≤ 2n − 1. This means that f (n) = Θ(n) and the algorithm has

linear complexity. We have now proven the following theorem.

Theorem 2.3.2 Let c and m be positive integers. Then for every positive integer N we

can calculate cN mod m using at most 2 log N multiplications modulo m. Algorithm

2 (Square and Multiply) has linear complexity.

Example 2.3.8 How many multiplications are needed to calculate c^29 using Algorithms 1 and 2?

The binary representation for 29 is as follows:

29 = 16 + 8 + 4 + 1 = 11101(2) .

Thus we need 4 multiplications to compute c^2, c^4, c^8, c^16 by successive squaring, and then we will need 3 more to calculate c^29 = c^16 · c^8 · c^4 · c. Thus Algorithm 2 would use 7 multiplications in total. Algorithm 1 would use 28 multiplications.
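Both algorithms can be sketched with an explicit multiplication counter (the function names are mine); the counts reproduce the 7 versus 28 multiplications of this example:

```python
def naive_power(c, N, m):
    """Algorithm 1: repeated multiplication, N - 1 steps."""
    result, count = c % m, 0
    for _ in range(N - 1):
        result = (result * c) % m
        count += 1
    return result, count

def square_and_multiply(c, N, m):
    """Algorithm 2: at most 2*log2(N) multiplications."""
    base, result, count = c % m, None, 0
    while N:
        if N & 1:                        # this binary digit of N is 1
            if result is None:
                result = base            # first factor costs nothing
            else:
                result = (result * base) % m
                count += 1
        N >>= 1
        if N:                            # square for the next binary digit
            base = (base * base) % m
            count += 1
    return result, count

r1, c1 = naive_power(3, 29, 1000)
r2, c2 = square_and_multiply(3, 29, 1000)
assert r1 == r2 == pow(3, 29, 1000)
assert (c1, c2) == (28, 7)
```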

The complexity of the Euclidean algorithm will also be important for us. So we

prove:

Theorem 2.3.3 For any two positive integers a and b the Euclidean algorithm will

ﬁnd their greatest common divisor after at most 2 log2 N + 1 integer divisions with

remainder, where N = max(a, b).

Proof Let us make one observation first. Suppose a = qb + r is a division with remainder, and let a′ = a/gcd(a, b), b′ = b/gcd(a, b), and r′ = r/gcd(a, b). Then a′ = qb′ + r′ is also a division with remainder. Hence the number of steps that the Euclidean algorithm (Theorem 1.2.3) requires is the same for the pair (a, b) as for the pair (a′, b′). This allows us to assume that gcd(a, b) = 1. Let us also assume that a is not smaller than b.

We will ﬁrst prove that if a ≥ b (as we just assumed) then on dividing a by b with

remainder

a = qb + r, (0 ≤ r < b),

we get r < a/2. Indeed, if q ≥ 2, then r < b < a/q ≤ a/2, and when q = 1, then

b > a/2, hence r = a − b < a/2.

Let us perform the Euclidean algorithm on a and b

a = q1 b + r1 , 0 < r1 < b,

b = q2 r1 + r2 , 0 < r2 < r1 ,

r1 = q3 r2 + r3 , 0 < r 3 < r2 ,

..

.

rs−2 = qs rs−1 + rs , 0 < rs < rs−1 ,

rs−1 = qs+1 rs .


Then r_s = gcd(a, b) = 1. Due to the observation at the beginning of the proof we can conclude that

$$r_3 < r_1/2 < a/4, \qquad r_5 < r_3/2 < a/8,$$

and by induction r_{2k+1} < \frac{a}{2^{k+1}} and r_{2k} < \frac{b}{2^k}. Suppose the algorithm stops at step s, i.e., after calculating that r_s = 1. If s = 2k + 1, then 2^{k+1} < a and k < log_2 a, whence s = 2k + 1 < 2 log_2 a + 1, so s ≤ 2 log_2 a = 2 log_2 N. If s = 2k, then 2^k < b, whence k < log_2 b ≤ log_2 N, and s = 2k < 2 log_2 N.

If a is smaller than b then we will need an additional step, and the number of steps

will be no greater than 2 log2 N + 1. �

Now we can draw conclusions about the time complexity of the Euclidean algo-

rithm. For one unit of work we will adopt the execution of a single a mod b operation,

that is division of a by b with remainder.

Corollary 2.3.1 The Euclidean algorithm has at most linear time complexity: f (n) = O(n).

Proof The upper bound in Theorem 2.3.3 can be interpreted as follows. The number

log2 N, where N = max(a, b), is almost exactly the number of bits, say k, in the

binary representation of N. So the length of the input, n, (numbers a and b) is at least

k and at most 2k while the number of units of work is at most 2k. So for the time

complexity function f (n) we have f (n) ≤ 2n. Thus f (n) = O(n). �
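The bound of Theorem 2.3.3 is easy to verify empirically; the step counter below is a sketch of the plain (non-extended) Euclidean algorithm:

```python
import math

def gcd_steps(a, b):
    """Euclidean algorithm, counting divisions with remainder."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return a, steps

# Theorem 2.3.3: at most 2*log2(N) + 1 divisions, where N = max(a, b).
for a, b in [(1071, 462), (89, 55), (2, 1000003), (123456, 654321)]:
    g, steps = gcd_steps(a, b)
    assert g == math.gcd(a, b)
    assert steps <= 2 * math.log2(max(a, b)) + 1
```

Consecutive Fibonacci numbers such as (89, 55) are the classical worst case here, which is the content of Lamé's theorem in the exercises below.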

In Sect. 1.1.3 we saw that the Trial Division algorithm for factoring an integer n

(which, we recall, could involve performing as many divisions as there are primes between 2 and √n), was computationally difficult. Now we can state this precisely.

It has exponential time complexity!

Theorem 2.3.4 (A worst-case time complexity for factoring) The Trial Division

algorithm for factoring integers has exponential complexity.

Proof Let the unit of work be one division. Let us assume that we have an inﬁnite

memory and that all primes are stored there: p1 , p2 , . . . , pm , . . .. Given a positive

integer N we have to try to divide it by all primes which do not exceed M = √N. According to the Prime Number Theorem there are approximately

$$\frac{M}{\ln M} \approx \frac{2\sqrt{N}}{\ln N}$$

such primes. This means that in the worst-case scenario we have to try all of them and thus perform about 2√N/ln N divisions. Since N ≈ 2^n, where n is the number of input bits, the worst-case complexity function takes the form

$$f(n) \approx \frac{2\,\sqrt{2}^{\,n}}{n \ln 2} = \frac{2}{\ln 2}\cdot\frac{1}{n}\cdot \sqrt{2}^{\,n}.$$


Let √2 = αβ, where α > 1 and β > 1. Then, for sufficiently large n,

$$\frac{1}{n}\cdot \sqrt{2}^{\,n} = \frac{\alpha^n}{n}\cdot \beta^n > \beta^n,$$

so f (n) grows at least as fast as the exponential function β^n. �

In the case of calculating Nth powers we know one efﬁcient and one inefﬁcient

algorithm. For factoring integers we know only one and it is inefﬁcient. All attempts

of researchers in number theory and computer science to come up with a more efﬁ-

cient algorithm have resulted in only very modest improvements. Several algorithms

are known that are subexponential, with time complexity function, for example, f (n) = e^{c\,n^{1/3} (\ln n)^{2/3}} (see [1]). This growth is still very fast. At the moment of writing it is not feasible to factor a 200-digit integer unless it has many small divisors.
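Trial Division itself is only a few lines. The sketch below tries all divisors 2, 3, 4, …, not just primes, which can only increase the number of divisions but keeps the code self-contained:

```python
def trial_division(N):
    """Factor N by trying divisors up to sqrt(N).

    The number of divisions grows exponentially in the bit length of N,
    as shown in Theorem 2.3.4.
    """
    factors = []
    d = 2
    while d * d <= N:
        while N % d == 0:
            factors.append(d)
            N //= d
        d += 1
    if N > 1:
        factors.append(N)   # whatever remains is prime
    return factors

assert trial_division(11413) == [101, 113]   # the modulus of Example 2.4.1
assert trial_division(97) == [97]
```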

Exercises

1. (a) Estimate the number of bits required to input an integer N which has 100

digits in its decimal representation.

(b) Represent n = 1234567 in binary and decide how many multiplications

mod m the Square and Multiply algorithm would require to calculate

cn mod m.

2. The Bubble Sort Algorithm takes a ﬁnite list of numbers and arranges them in

the increasing order. Given a list of numbers, it compares each item in the list

with the item next to it, and swaps them if the former is larger than the latter. The

algorithm repeats this process until it makes a pass all the way through the list

without swapping any items (in other words, all items are in the correct order).

This causes larger values to “bubble” to the end of the list while smaller values

“sink” towards the beginning of the list.

Assume that one needs 100 bits to input any number on the list (so the length of

the input is 100n). Take one swap as one unit of work. Determine the worst case

complexity of the Bubble Sort Algorithm. Use the appropriate notation (big-oh,

little oh, etc.) to express the character of the growth.

3. The input of the following algorithm is a positive integer N. The algorithm tries to

divide N by the ﬁrst (log2 N)3 primes and, if one of them divides N, it declares N

composite. If none of those primes divide N, the algorithm declares N interesting.

What is the worst-case complexity of this algorithm?

4. Let (fn ) be the sequence of Fibonacci numbers given by f0 = f1 = 1 and fn+2 =

fn+1 + fn .

(a) Prove that

fn < 2fn−1 and fn+5 > 10fn .

(b) Using part (a), prove Lamé’s theorem that the number of divisions with

remainder required by the Euclidean algorithm for ﬁnding gcd(a, b) is at

most ﬁve times the number of decimal digits in the smaller of a or b.


2.4 The RSA Public-Key Cryptosystem

Alice wishes to receive conﬁdential messages from her correspondents. For this

purpose she may use the public-key RSA cryptosystem, named after Rivest, Shamir

and Adleman [2], who invented it in 1977. It is widely used now. It is based on the

fact that the mapping

f : x → x^e mod n

for a specially selected very large number n and exponent e is a one-way function.

To set up the cryptosystem, Alice does the following:

1. she generates two large primes p ≠ q of roughly the same size;

2. calculates n = pq and φ = (p − 1)(q − 1), where φ is the value of the Euler

φ-function, φ(n);

3. using trial and error method, selects a random integer e with 1 < e < φ and

gcd(e, φ) = 1;

4. computes d such that ed ≡ 1 mod φ and 1 < d < φ.

We will later discuss how Alice can generate two large primes. She can then do steps

2–4 because the complexity of the Extended Euclidean Algorithm is so low that it

easily works for very large numbers. Note that ﬁnding d is also done by the Extended

Euclidean algorithm.

Alice uses a certain public domain which is accessible for all her correspondents,

for example, her home page, to publish her public key (n, e), keeping everything

else secret; in particular, d which is Alice’s private key (which will be used for

decryption). It must be clear for everybody that (n, e) is indeed Alice’s public key

and nobody but Alice could publish it.

She then instructs how to use her public key to convert text into ciphertext. In the

ﬁrst instance all messages must be transformed into numbers by some convention

speciﬁed by Alice, e.g., we may use “01” instead of “a”, “02” instead of “b”, etc.;

for simplicity, let us not distinguish between upper and lower case, and denote a

space by “27”. Thus a message for us is a non-negative integer. The public key (n, e)

stipulates that Alice may receive messages, which are non-negative integers m which

are smaller than n. (If the message is longer it should be split into several shorter

messages.) The message m must be encrypted applying the following function to the

message:

f (m) = m^e mod n.

This is a one-way function to everybody but Alice, who has the trapdoor d (we will see later how it can


be used for decryption). For example, when Bob wishes to send a private message

to Alice, he obtains Alice’s public key (n, e) from the public domain and uses it as

follows:

• turns the message text into a nonnegative integer m < n (or several of them if

breaking the text into blocks of smaller size is necessary);

• computes c = m^e mod n;

• sends the ciphertext c to Alice.

Alice then recovers the plaintext m using her private key d (which is the trapdoor

for f ) by calculating

m = c^d mod n.

This may seem to be a miracle at this stage but it can (and, below, will) be explained.

This system can work only because of the clever choice of the primes p and q.

Indeed, p and q should be chosen so that their product n = pq is infeasible to factorise.

This ensures that p and q are known only to Alice, while at the same time n and her

public exponent e are known to everybody. This implies that Alice’s private exponent

d is also known only to her. Indeed, to calculate d from publicly known parameters,

one needs to calculate φ(n) ﬁrst. But the only known method of calculating φ(n)

requires calculation of the prime factorisation of n. Since it is infeasible, we can

publish n but keep φ(n), and hence d, secret.

Example 2.4.1 This is of course a very small example (too small for practical pur-

poses), just to illustrate the algorithms involved. Suppose Alice’s arrangements were

as follows:

1. p = 101, q = 113;

2. n = pq = 11413, φ = (p − 1)(q − 1) = 11200;

3. e = 4203 (picked at random from the interval (1, φ), making sure that

gcd(e, φ) = 1);

4. d = 3267 (the inverse of e in Zφ );

5. the public key is therefore (11413, 4203), the private key is 3267.

If Bob wants to send the message “Hello Alice” he transforms it into a number as

described. The message is then represented by the integer

0805121215270112090305.

This is too large (≥ 11413), so we break the message text into chunks of 2 letters at a time.

A. The first chunk of the message is m = 0805;
B. Bob computes c = m^e = 805^{4203} ≡ 6134 mod 11413;
C. Alice decrypts this message fragment by calculating c^d = 6134^{3267} ≡ 805 mod 11413.
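The whole of Example 2.4.1 can be replayed in a few lines of Python; note that `pow` with exponent −1 computes the modular inverse via the Extended Euclidean algorithm, and three-argument `pow` is exactly Square and Multiply:

```python
# Alice's parameters from Example 2.4.1.
p, q = 101, 113
n, phi = p * q, (p - 1) * (q - 1)     # n = 11413, phi = 11200
e = 4203
d = pow(e, -1, phi)                   # Extended Euclidean algorithm
assert d == 3267 and (e * d) % phi == 1

# Bob encrypts the chunk m = 805 ('he'); Alice decrypts with d.
m = 805
c = pow(m, e, n)                      # square-and-multiply under the hood
assert pow(c, d, n) == m              # decryption recovers the plaintext
```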


If Bob wants to receive an encrypted answer from Alice he has to set up a similar

scheme. In practice people do not set up cryptosystems individually but use a trusted

provider of such services. Such a company would create a public domain and place

there all public keys attributed to participating individuals. Such a company creates

an infrastructure that makes encrypted communication possible. The infrastructure

that is needed for such cryptosystem to work is called a public-key infrastructure

(PKI) and the company that certiﬁes that a particular public key belongs to a certain

person or organisation is called a certiﬁcation authority (CA). The best-known such companies are Symantec (which bought VeriSign), Comodo, GlobalSign, Go Daddy,

etc. Furthermore, we will show in Sect. 2.5 that the PKI also allows Alice and Bob

to sign their letters with digital signatures.

Exercises

1. With the primes given in Example 2.4.1 decide which one of the two numbers

e1 = 2145 and e2 = 3861 can be used as a public key and calculate the matching

private key for it.

2. Alice and Bob agreed to use the RSA cryptosystem to communicate in secret.

Each message consists of a single letter which is encoded as

Bob’s public key is (n, e) = (143, 113) and Alice sent him the message 97. Which

letter did Alice send to Bob in this message?

3. Alice’s public exponent in RSA is e = 41 and the modulus is n = 13337. How

many multiplications mod n does Bob need to perform to encrypt his message

m = 2619? (Do not do the actual encryption, just count.)

4. Set up your own RSA cryptosystem. Demonstrate how a message addressed to

you can be encrypted and how you can decrypt it using your private key.

5. Alice and Bob have the public RSA keys (20687, 17179) and (20687, 4913),

respectively. Bob sent an encrypted message to Alice, Eve found out that the

encrypted message was 353. Help Eve to decrypt the message, suspecting that

the modulus 20687 might be a product of two three-digit primes. Try to do it with

an ordinary calculator ﬁrst, then check your answer with GAP.

6. Alice and Bob encrypt their messages using the RSA method. Bob’s public key

is (n, e) = (24613, 1003).

(a) Alice would like to send Bob the plaintext m = 183. What ciphertext should

she send?

(b) Bob knows that φ(n) = 24300 but has forgotten his private key d. Help Bob

to calculate d.

(c) Bob has received the ciphertext 16935 from Casey addressed to him. Show

how he ﬁnds the original plaintext.


Several questions still need to be answered:

1. Why is m = (m^e)^d mod n?
2. Can m^e mod n and c^d mod n be calculated efficiently?
3. To what extent can the RSA system be considered ‘secure’ as a cryptosystem?
4. How can the encryption and decryption exponents e and d be found?
5. How can large primes p and q be found?

Let us address these issues one by one.

1. First we consider the question of why the text recovered by Alice via her

private decryption key is actually the original plaintext. This means we must consider

(m^e)^d mod n. We note that since ed ≡ 1 mod φ and φ = φ(n) = (p − 1)(q − 1), we have ed = 1 + φ(n)k for some integer k. Suppose first that m and n are coprime. Then by Euler’s theorem m^{φ(n)} ≡ 1 mod n and

$$(m^e)^d = m^{ed} = m^{1+\varphi(n)k} = m\cdot\big(m^{\varphi(n)}\big)^k \equiv m \bmod n.$$

There is a very small probability that m will be divisible by p or q, but even in this unlikely case we still have m = (m^e)^d mod n. To prove this we have to consider (m^e)^d mod p and (m^e)^d mod q separately. Indeed,

$$(m^e)^d = m^{ed} = m^{1+(p-1)(q-1)x} = m \cdot m^{(p-1)(q-1)x} \equiv \begin{cases} m \bmod p & \text{if } \gcd(m, p) = 1, \\ 0 \bmod p & \text{if } p \mid m, \end{cases}$$

since in the first case by Fermat’s Little Theorem m^{p−1} ≡ 1 mod p. In both cases we see that m ≡ (m^e)^d mod p.

Similarly we find (m^e)^d ≡ m mod q. Then the statement follows from the Chinese Remainder Theorem (Theorem 1.2.6). According to this theorem, there is a unique integer N in the interval [0, pq) such that N ≡ m mod p and N ≡ m mod q. We have two numbers with this property, namely m and (m^e)^d mod n. Hence they coincide and m = (m^e)^d mod n.

We have established that the decrypted message is identical to the message that

was encrypted. This resolves the ﬁrst issue.

2. To resolve the second issue we considered the computational problem of raising

a number to a power. The complexity of this operation is very low, in fact it is linear

(see Theorem 2.3.2). Hence me mod n and cd mod n can be calculated efﬁciently.

3. It is evident that if the prime factorisation of the number n in the public key is

known then anybody can compute φ and thus d. In this case encrypted messages are

not secure. But for large values of n the task of factorisation is too difﬁcult and time

consuming to be feasible. So the encryption function (raise to power e mod n) is a

one-way function, with d as a trapdoor.


To illustrate how secure the system is, Rivest, Shamir and Adleman encrypted a sentence in English. This sentence was converted into a number as we did before (the only difference was that they denoted a space as “00”). Then they encrypted it using e = 9007 and

n = 11438162575788886766932577997614661201021829672124236256256184293

5706935245733897830597123563958705058989075147599290026879543541.

These two numbers were published, and it was made known that n = pq, where

p and q are primes which contain 64 and 65 digits in their decimal representations,

respectively. Also published was the message

f (m) = 9686961375462206147714092225435588290575999112457431987469512093

0816298225145708356931476622883989628013391990551829945157815154.

An award of $100 was offered for decrypting it. This award was only paid 17 years

later, in 1994, when Atkins et al. [3] reported that they had decrypted the sentence.

This sentence—“The magic words are squeamish ossifrage,”—was placed in the title

of their paper. For decrypting, they factored n and found p and q which were

p = 3490529510847650949147849619903898133417764638493387843990820577

and

q = 32769132993266709549961988190834461413177642967992942539798288533.

In this work 600 volunteers participated. They worked 220 h on 1600 computers to

achieve this result! Recently, in 2009, another effort involving several researchers

factored a 232-digit number (RSA-768) utilising hundreds of machines over a span

of two years. Of course, doable does not mean practical, but for very sensitive

information one would now want to choose primes as large as containing 150 digits

and even more.

It can be shown that ﬁnding d is just as hard as factoring n, and it is believed that

finding any trapdoor is as hard as factoring n, although this has not been proven. More than 30 years have passed since RSA was invented and so far all attacks on RSA have been unsuccessful.

4. To ﬁnd e and d we need only the Euclidean and the Extended Euclidean algo-

rithms. Indeed, ﬁrst we try different numbers between 1 and φ(n) at random until

we ﬁnd one which is relatively prime to φ(n) (the fact that it can be done quickly we

leave here without proof). This will be taken as e. Since d is the inverse of e modulo

φ(n), we ﬁnd d using the Extended Euclidean algorithm. This can be done because

the Euclidean algorithm is very fast (Corollary 2.3.1).
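The recipe of remark 4 can be sketched in code. The book's own computations use GAP; the Python below, with the toy primes 61 and 53, is our illustrative stand-in. Python's `pow(e, -1, phi)` computes the modular inverse by exactly the Extended Euclidean computation described above.

```python
import math
import random

def make_keys(p, q):
    """Choose e relatively prime to phi(n), then compute d = e^(-1) mod phi(n)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    while True:                          # try random candidates for e ...
        e = random.randrange(2, phi)
        if math.gcd(e, phi) == 1:        # ... testing each with the Euclidean algorithm
            break
    d = pow(e, -1, phi)                  # the Extended Euclidean algorithm (Python 3.8+)
    return n, e, d

n, e, d = make_keys(61, 53)              # toy primes, far smaller than real RSA keys
c = pow(42, e, n)                        # encrypt m = 42
assert pow(c, d, n) == 42                # decryption recovers m
```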

2.4 The RSA Public-Key Cryptosystem 63

5. One may ask: if we cannot factor positive integers efficiently, then surely we will not be able to tell whether a number is prime or not? If so, our wonderful system is in danger, because two big primes could not be found efficiently. However, this is not the case: it is easier to establish whether a number is prime than it is to factorise it. We devote the next section to checking primality.

In the case of RSA it is preferable to use the following encodings for letters:

A B C D E F G H I J K L M

11 12 13 14 15 16 17 18 19 20 21 22 23

N O P Q R S T U V W X Y Z

24 25 26 27 28 29 30 31 32 33 34 35 36

The advantage of this encoding is that every letter has a two-digit code, which resolves some ambiguities. We will use it from now on and, in particular, in the exercises.
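This encoding is easy to implement; the following Python sketch (the helper names `encode` and `decode` are ours, and capital letters plus spaces are assumed) may be handy for the exercises.

```python
def encode(text):
    """Space -> 00, A -> 11, B -> 12, ..., Z -> 36 (capital letters only)."""
    return "".join("00" if ch == " " else str(ord(ch) - ord("A") + 11) for ch in text)

def decode(digits):
    """Invert encode(): read the digit string two characters at a time."""
    pairs = (digits[i:i + 2] for i in range(0, len(digits), 2))
    return "".join(" " if p == "00" else chr(int(p) - 11 + ord("A")) for p in pairs)

assert encode("ABC") == "111213"
assert decode(encode("SQUEAMISH OSSIFRAGE")) == "SQUEAMISH OSSIFRAGE"
```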

Exercises

1. In RSA Bob has been using a product of two large primes n and a single public

exponent e. In order to increase security, he now chooses two public exponents

e1 and e2 which are both relatively prime to φ(n). He asks Alice to encrypt her

messages twice: once using the first exponent and then using the other one. That is, Alice is supposed to calculate c1 = m^(e1) (mod n), then c2 = c1^(e2) (mod n), and send c2 to Bob. He has also prepared two decryption exponents d1 and d2

for decrypting her messages. Does this double encryption increase security over

single encryption?

2. Eve intercepted the following message from Bob to Alice:

In the public domain Eve learns that this message was sent using the encryption

modulus n = pq = 30796045883. She also observes that Alice’s public key

is e = 48611. Decode the message which was encoded using the encodings

A = 11, B = 12, . . . , Z = 36.

3. Eve has intercepted the following message from Bob to Alice

[ 427849968240759007228494978639775081809,

498308250136673589542748543030806629941,

925288105342943743271024837479707225255,

95024328800414254907217356783906225740 ]

She knows Bob used the RSA cryptosystem with the modulus


n = 956331992007843552652604425031376690367

and that Alice’s public exponent is e = 12398737. She also knows that, to convert

their messages into numbers, Bob and Alice usually use the encodings: space =

00, A = 11, B = 12, . . . , Z = 36. Help Eve to break the code and decrypt the

message.

In this section we will discuss four probabilistic tests that might be used for testing the

compositeness of integers. Their sophistication and quality will gradually increase,

and only the last one will be practical.

By a pseudoprimality test we mean a test that is applied to a pair of integers (b, n),

where 2 ≤ b ≤ n − 1, and that has the following characteristics:

(a) The possible outcomes of the test are: “n is composite” or “inconclusive”.

(b) If the test reports “n is composite” then n is composite.

(c) The test runs in a time that is polynomial in log n (i.e., in the number of bits

necessary to input n).

If n is prime, then the outcome of the test will be “inconclusive” for every b. If the

test result is “inconclusive” for one particular b, then we say that n is a pseudoprime

to the base b (which means that n is so far acting like a prime number).

The outcome of the test for the primality of n depends on the base b that is chosen.

In a good pseudoprimality test there will be many bases b that will reveal that n is

composite in case it is composite. More precisely, a good pseudoprimality test will,

with high probability (i.e., for a large number of choices of the base b) declare that

a composite number n is composite. More formally, we define:

A pseudoprimality test is good if there is a fixed positive real number t, with 0 < t ≤ 1, such that every composite integer n is declared to be composite for at least t(n − 2) choices of the base b in the interval [2, n − 1].

For a good test the probability that a randomly chosen base reveals the compositeness of n is thus at least t and, most importantly, this number t does not depend on n. This is, in fact,

sufﬁcient for practical purposes since we can increase this probability by running this

test several times for several different bases. Indeed, if the probability of missing the

compositeness of n is p, then the probability of missing the compositeness running

it for two different bases will be p^2 and for k different bases p^k. For k → ∞ this

value quickly tends to 0, hence we can make our test as reliable as we want it to be.

Of course, given an integer n, it is silly to say that “there is a high probability that n is prime”. Either n is prime or it is not, and we should not blame our ignorance on n itself. Nonetheless, the abuse of language is sufficiently appealing, and it is often said that a given integer n is very probably prime if it has been subjected to a good pseudoprimality test, with a large number of different bases b, and found to be a pseudoprime to all of those bases.

Here are four examples of pseudoprimality tests, only one of which is good.

Test 1. Given b, n. Output “n is composite” if b divides n, else “inconclusive.”

If n is composite, the probability that it will be so declared is the probability that

we happen to have found an integer b that divides n. The probability of this event, if

b is chosen at random uniformly from [2, n − 1], is

p(n) = (d(n) − 2)/(n − 2),

where d(n) is the number of divisors of n. Certainly p(n) is not bounded from below by a positive constant t, if n is composite. Indeed, if ni = pi^2, where pi is the ith prime, then d(ni) = 3, and

p(ni) = 1/(ni − 2) → 0.

For example, for n = 44 we have d(n) = 6, so that

p(n) = 4/42 = 2/21.

Test 2. Given b, n, where 2 ≤ b ≤ n − 1. Output “n is composite” if gcd(b, n) ≠ 1, else output “inconclusive.”

This test runs in linear time and it is a little better than Test 1, but not yet good.

If n is composite, the number of bases b for which Test 2 will produce the result

“composite” is n − φ(n) − 1, where φ is the Euler totient function. Indeed, we have

φ(n) numbers b that are relatively prime to n; for those numbers b and only for those

we have gcd(b, n) = 1. We also have to exclude b = n which is outside of the range.

Hence the probability of declaring a composite n composite will be

p(n) = (n − φ(n) − 1)/(n − 2).

For this test the number of useful bases will be large if n has some small prime

factors, but in that case it is easy to ﬁnd out that n is composite by other methods.

If n has only a few large prime factors, say if n = p^2, then the proportion of useful bases is very small, and we have the same kind of inefficiency as in Test 1. Indeed, if ni = pi^2, then φ(ni) = pi(pi − 1) and

p(ni) = (ni − φ(ni) − 1)/(ni − 2) = (pi^2 − pi(pi − 1) − 1)/(pi^2 − 2) = (pi − 1)/(pi^2 − 2) ∼ 1/pi → 0

as pi → ∞.

Example 2.4.3 Suppose n = 44 = 2^2 · 11. Then φ(n) = 44(1 − 1/2)(1 − 1/11) = 20, and

p(n) = (44 − 20 − 1)/42 = 23/42.

Test 3. Given b, n. If b and n are not relatively prime or if b^(n−1) ≢ 1 mod n, then output “n is composite”, else output “inconclusive”.

This test rests on Fermat’s Little Theorem. Indeed, if gcd(b, n) > 1, or gcd(b, n) = 1 and b^(n−1) ≢ 1 mod n, then n cannot be prime since, if n were prime, by Fermat’s Little Theorem in the latter case we would have b^(n−1) ≡ 1 mod n. It also runs in linear time if we use the Square and Multiply algorithm to calculate b^(n−1), and it works much better than the previous two tests.

Example 2.4.4 To see how this test works let us calculate 2^32 mod 33. We obtain 2^32 = (2^5)^6 · 2^2 ≡ (−1)^6 · 4 = 4 mod 33, so 2^32 ≢ 1 mod 33 and Test 3 declares 33 composite.

Unfortunately, this test is still not good. It works well for most but not for all num-

bers. The weak point of it is that there exist composite numbers n, called Carmichael

numbers, with the property that the pair (b, n) produces the output “inconclusive”

for every integer b in [2, n − 1] that is relatively prime to n. An example of such

a Carmichael number is n = 561, which is composite (561 = 3 · 11 · 17), but for

which Test 3 gives the result “inconclusive” for every integer b < 561 that is rel-

atively prime to 561 (i.e., that is not divisible by 3 or 11 or 17). For Carmichael

numbers Test 3 behaves exactly like Test 2, which we know is unsatisfactory. More-

over, it was proved recently that there are inﬁnitely many Carmichael numbers [4],

which means that the drawback is serious. The first ten Carmichael numbers are:

561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341, . . .
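Exercise 4 below asks for a GAP program that checks whether n is a Carmichael number; here is the same brute-force check sketched in Python (a direct translation of the definition, not an efficient criterion):

```python
import math

def is_carmichael(n):
    """Composite n with b^(n-1) ≡ 1 (mod n) for every b in [2, n-1] coprime to n."""
    if n < 3 or all(n % p for p in range(2, int(n**0.5) + 1)):
        return False                    # exclude primes (and 1, 2)
    return all(pow(b, n - 1, n) == 1
               for b in range(2, n) if math.gcd(b, n) == 1)

first_ten = [n for n in range(3, 30000) if is_carmichael(n)][:10]
assert first_ten == [561, 1105, 1729, 2465, 2821, 6601, 8911,
                     10585, 15841, 29341]
```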

Despite such occasional misbehaviour, the test usually seems to perform quite

well. When n = 169 (a difﬁcult integer for tests 1 and 2) it turns out that there

are 158 different b’s in [2, 168] that produce the “composite” outcome from Test 3,

namely every such b except for 19, 22, 23, 70, 80, 89, 99, 146, 147, 150, 168.

Finally, we will describe a good pseudoprimality test. The idea was suggested in

1976 by Miller (see the details in [5]).

Test 4 (the Miller–Rabin test). Given b, n, we first compute gcd(b, n). If gcd(b, n) > 1 then we output “composite”. If gcd(b, n) = 1, let us represent n − 1 as n − 1 = 2^s·t, where t is an odd integer. If

(a) b^t ≢ 1 mod n, and
(b) for every integer i in [0, s − 1]

b^(2^i·t) ≢ −1 mod n,

then we output “n is composite”, else “inconclusive”.

Let us convince ourselves that Test 4 works. For this we need the identity

(a − 1)(a + 1)(a^2 + 1) · · · · · (a^(2^(s−1)) + 1) = a^(2^s) − 1,    (2.3)

which is easily checked by expanding the product.

Suppose that conditions (a) and (b) are satisﬁed but n is prime. Then gcd(b, n) = 1.

Substituting a = b^t into the identity (2.3) and using Fermat’s Little Theorem, we

will obtain

(b^t − 1)(b^t + 1)(b^(2t) + 1) · · · · · (b^(2^(s−1)·t) + 1) = b^(2^s·t) − 1 = b^(n−1) − 1 ≡ 0 mod n.

However, by (a) and (b) every bracket is non-zero modulo n. Hence there are zero

divisors in Zn which contradicts the primality of n. This means that if the test outputs

“composite”, the number n is composite.

What is the computational complexity of this test? By Theorem 2.3.3, part (a) of

the test can be done in O(log n) divisions with remainder, and the complexity of this

is at most linear. Similarly, in part (b) of the test there are O(log n) possible values

of i to check, and for each of them we do a single multiplication of two integers, calculating b^(2^i·t) = b^(2^(i−1)·t) · b^(2^(i−1)·t), each of which has O(log n) bits. Hence the overall complexity is still linear.

It can be proved that if n is an odd composite number then, for at least 3/4 of the bases b such that 2 ≤ b ≤ n − 1, Test 4 gives the result “n is composite”. This means that Test 4 is a good pseudoprimality test and, if we choose b at random to prove the compositeness of n, then we will find the required b with probability greater than 3/4. Hence we can set t = 3/4. The proof of this result is beyond the scope of this book.

Example 2.4.5 If n = 169, then it turns out that for 157 of the possible 167 bases b

in [2, 168] Test 4 will output “169 is composite”. The only bases b that 169 can fool

are 19, 22, 23, 70, 80, 89, 99, 146, 147, 150, 168. In this case the performance of

Test 4 and of Test 3 are identical. However, there are no analogues of the Carmichael

numbers for Test 4.

How can this pseudoprimality test be used to ﬁnd large primes? Suppose that you

want to generate an n-digit prime. You generate an arbitrary n-digit number r and

subject it to a good pseudoprimality test (for example, Rabin–Miller Test) repeating

the test several times. Suppose that we have done k runs of Test 4 with different


random b’s and each time got the answer ‘inconclusive’. If r is composite, then the

probability that we get the answer “inconclusive” once is less than 1/4. If we run

this test k times, the probability that we get the answer “inconclusive” every time is less than 1/4^k. For k = 5 this probability is less than 10^(−3). For k = 10 it is less than 10^(−6), which is a very small number already. Since Test 4 performs very quickly, we may run this test 100 times. If we get the answer “inconclusive” all 100 times, the probability that r is composite is negligible.

In 2002 Agrawal et al. [6] came up with a polynomial deterministic algorithm

(AKS algorithm) for primality testing. It is based on the following variation of Fer-

mat’s Little Theorem for polynomials:

Theorem 2.4.2 Let gcd(a, n) = 1 and n > 1. Then n is prime if and only if

(x − a)^n ≡ (x^n − a) mod n.
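The congruence of Theorem 2.4.2 can be checked directly for small n by expanding (x − a)^n coefficient by coefficient. Note that this naive check takes time exponential in log n — the whole point of the AKS algorithm is to avoid it — so the Python sketch below only illustrates the theorem, it is not the AKS algorithm:

```python
from math import gcd

def expand_power(a, n):
    """Coefficients of (x - a)^n reduced mod n, constant term first."""
    coeffs = [1]                            # the polynomial 1
    for _ in range(n):                      # repeatedly multiply by (x - a)
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - a * c) % n
            nxt[i + 1] = (nxt[i + 1] + c) % n
        coeffs = nxt
    return coeffs

def congruence_holds(a, n):
    """Check (x - a)^n ≡ x^n - a (mod n), the criterion of Theorem 2.4.2."""
    assert n > 1 and gcd(a, n) == 1
    rhs = [(-a) % n] + [0] * (n - 1) + [1]  # the polynomial x^n - a
    return expand_power(a, n) == rhs

assert congruence_holds(1, 7)        # 7 is prime
assert not congruence_holds(1, 6)    # 6 is composite
```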

The authors received the 2006 Gödel Prize and the 2006 Fulkerson Prize for this work.

Originally the AKS algorithm had complexity O((log n)^12), where n is the number to be tested, but in 2005 C. Pomerance and H.W. Lenstra, Jr. demonstrated a variant of the AKS algorithm that runs in O((log n)^6) operations, a marked improvement over

the bound in the original algorithm. Despite all the efforts it is still not yet practical,

but a number of researchers are actively working on improving this algorithm. See

[7] for more information on the algorithm and a proof of Theorem 2.4.2.

Exercises

1. We implement the ﬁrst and the second pseudoprimality tests by choosing at ran-

dom b in the interval 1 < b < n and applying it to the pair (b, n).

(a) What is the probability that the ﬁrst pseudoprimality test ﬁnds that 91 is

composite?

(b) What is the probability that the second pseudoprimality test ﬁnds that 91 is

composite?

2. Show that the third pseudoprimality test ﬁnds that 91 is composite for the pair

(5, 91).

3. Prove that any number F_n = 2^(2^n) + 1 is either a prime or a pseudoprime to the base 2. (Use Exercise 4 in Sect. 1.1.1.)

4. Write a GAP program that checks if a number n is a Carmichael number. Use it

to ﬁnd out if the number 15841 is a Carmichael number.

5. Prove without using GAP that 561 is a Carmichael number, i.e., a560 ≡ 1 mod 561

for all a relatively prime to 561.

6. Show that 561 is a pseudoprime to the base 7 (i.e., n = 561 passes the Third

Pseudoprimality Test with b = 7) but not a pseudoprime to the base 7 relative to

the Miller–Rabin test.

7. Show that the Miller–Rabin test with b = 2 proves that n = 294409 is composite

(despite 294409 being a Carmichael number).

8. Show that a power of a prime is never a Carmichael number.

2.5 Applications of Cryptology 69

1. Key exchange. The Diffie–Hellman key exchange protocol was proposed in 1976 by Diffie and Hellman [8], and it triggered the development of public-key

cryptography. Two parties A and B openly agree on two parameters: a positive integer

n and g ∈ Zn. They secretly choose two exponents a and b, respectively. Then A sends g^a to B and B sends g^b to A. After that, B takes the received g^a to the exponent b to get g^(ab), and A takes g^b to the exponent a and also gets g^(ab). Then they use g^(ab) as their secret key. An eavesdropper has to compute g^(ab) from g, g^a and g^b, which for n sufficiently large is intractable. The Elgamal cryptosystem, which we will study later, develops this idea further.
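One run of the exchange can be sketched as follows; the modulus below is a toy value chosen for illustration, not a cryptographically sound choice of n and g:

```python
import random

def diffie_hellman(n, g):
    """One run of the Diffie-Hellman exchange; returns the shared key g^(ab) mod n."""
    a = random.randrange(2, n - 1)        # A's secret exponent
    b = random.randrange(2, n - 1)        # B's secret exponent
    ga, gb = pow(g, a, n), pow(g, b, n)   # the two openly exchanged values
    key_a = pow(gb, a, n)                 # A computes (g^b)^a
    key_b = pow(ga, b, n)                 # B computes (g^a)^b
    assert key_a == key_b                 # both arrive at g^(ab)
    return key_a

key = diffie_hellman(2147483647, 7)       # toy modulus: the Mersenne prime 2^31 - 1
```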

2. Digital signatures. The notion of a digital signature may prove to be one of

the most fundamental and useful inventions of modern cryptography. A signature

scheme provides a way for each user to sign messages so that the signatures can be

veriﬁed by anyone. More speciﬁcally, each user can create a matched pair of private

and public keys so that only they can create a signature for a message (using their

private key) but anyone can verify the signature for the message (using the signer’s

public key). The veriﬁer can convince himself that the message content has not been

altered since the message was signed. Also, the signer cannot later repudiate having

signed the message, since no one but the signer possesses the signer’s private key.

For example, when your computer receives a software update, say from Adobe,

it checks the digital signature to make sure that this is a genuine update from Adobe

and not a virus or trojan.

At this stage the only public-key cryptosystem that we know is the RSA but as

we will see the idea can also be used for other cryptosystems. If in RSA n = pq

is the product of two large primes p and q, then the message space M is the set

{0, 1, 2, . . . , n − 1}. We have functions EU and DU (encryption and decryption) given by

EU(m) = m^(eU) mod nU,   DU(m) = m^(dU) mod nU,

where eU and dU are the public exponent and the private exponent of user U, respectively. One can turn this around to obtain a digital signature. If m is a document which

is to be signed by the user U, then she computes her signature as s = DU (m). The

user sends m together with the signature s. Anyone can now verify the signature by

testing whether EU(s) ≡ m mod nU or not.
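The sign-and-verify round trip can be sketched with toy parameters (the primes below are far too small for real use):

```python
# Toy RSA signature: p and q are illustrative primes, not realistic key sizes.
p, q = 61, 53
n = p * q
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, relatively prime to phi(n)
d = pow(e, -1, phi)          # private exponent (Python 3.8+; Extended Euclid)

def sign(m):
    """The signature s = D(m) = m^d mod n, computable only with the private key."""
    return pow(m, d, n)

def verify(m, s):
    """Anyone can check E(s) ≡ m (mod n) using only the public key."""
    return pow(s, e, n) == m

s = sign(1234)
assert verify(1234, s)
assert not verify(1235, s)   # a tampered message no longer matches the signature
```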

This idea was ﬁrst proposed by Difﬁe and Hellman [8]. The point is that if the

message m was changed then the old signature would be no longer valid, and the

only person who can create a new signature, matching the new message, should be

someone who knows the private key DU, and we assume that only user U possesses DU.

By analogy with the paper world, where Alice might sign a letter and seal it in an

envelope addressed to Bob, Alice can sign her electronic letter m to Bob by appending

her digital signature DA (m) to m, and then seal it in an “electronic envelope” with

Bob’s address by encrypting her signed message with Bob’s public key, sending


the resulting message EB (m|DA (m)) to Bob. Only Bob can open this “electronic

envelope” by applying his private key to it to obtain DB (EB (m|DA (m))) = m|DA (m).

After that he will apply Alice’s public key to the signature obtaining EA (DA (m)). On

seeing that EA (DA (m)) = m, Bob can be really sure that the message m came from

Alice and its content was not altered by a third party.

These applications of public-key technology to electronic mail are likely to

become widespread in the near future. For simplicity, we assumed here that the

message m was short enough to be transmitted in one piece. If the message is long

there are methods to keep the signature short. We will not dwell on this here.

3. Pay-per-view movies. It is common these days that cable TV operators with all-

digital systems encrypt their services. This lets cable operators activate and deactivate

a cable service without sending a technician to your home. The set-up involves each

subscriber having a set-top box, which is a device connected to a television set at the

subscribers’ premises and which allows a subscriber to view encrypted channels of

his choice on payment. The set-top box contains a set of private keys of the user. A

‘header’ broadcast in advance of the movie contains keys sufﬁcient to download the

actual movie. This header is in turn encrypted with the relevant user public keys.

4. Friend-or-foe identiﬁcation. Suppose A and B share a secret key K. Later, A

is communicating with someone and he wishes to verify that he is communicating

with B. A simple challenge-response protocol to achieve this identiﬁcation is as

follows:

• A generates a random value r and transmits r to the other party.

• The other party (assuming that it is B) encrypts r using their shared secret key K

and transmits the result back to A.

• A compares the received ciphertext with the result he obtains by encrypting r

himself using the secret key K. If the result agrees with the response from B, A

knows that the other party is B; otherwise he assumes that the other party is an

impostor.
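The protocol above can be sketched as follows. Here an HMAC stands in for the shared-key encryption mentioned in the text — a substitution on our part, not the book's construction — and all names are our own:

```python
import hmac, hashlib, secrets

def respond(key, challenge):
    """B's reply: a keyed function of the challenge (HMAC in place of encryption)."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

def identify(key, other_party):
    """A's side of the challenge-response protocol."""
    r = secrets.token_bytes(16)          # A generates a random value r ...
    reply = other_party(r)               # ... and sends it as a challenge
    expected = respond(key, r)           # A computes the expected reply himself
    return hmac.compare_digest(reply, expected)

K = b"shared secret key"
assert identify(K, lambda r: respond(K, r))             # genuine B
assert not identify(K, lambda r: respond(b"guess", r))  # impostor with a wrong key
```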

This protocol is generally more useful than the transmission of an unencrypted shared

password from B to A, since the eavesdropper could learn the password and then pre-

tend to be B later. With the challenge-response protocol an eavesdropper presumably

learns nothing about K by hearing many values of r encrypted with K as key.

An interesting exercise is to consider whether the following variant of the above

idea is secure: A sends the encryption of a random r, B decrypts it and sends the

value r to A, and A veriﬁes that the response is correct.

Exercises

1. Alice and Bob agreed to use Difﬁe–Hellman secret key exchange to come up with

a secret key for their secret key cryptosystem. They openly agreed on the prime

p = 100140889442062814140434711571


Alice chose her secret exponent a = 123456789. She also got a message g^b = 92639204398732276532642490482 from Bob. Which message should she send to Bob and how should she calculate the shared secret key?

2. Alice and Bob have the following RSA parameters:

nA = 171024704183616109700818066925197841516671277, eA = 1571,

nB = 839073542734369359260871355939062622747633109, eB = 87697.

Bob also knows the factorisation nB = pB·qB of his modulus, where

pB = 8495789457893457345793, qB = 98763457697834568934613.

Alice signs her message m by computing the signature s = m^(dA) (mod nA), and then encrypts the pair (m, s) using Bob’s public key by calculating (m1, s1), where m1 = m^(eB) (mod nB) and s1 = s^(eB) (mod nB). She obtains

m1 = 119570441441889749705031896557386843883475475,

s1 = 443682430493102486978079719507596795657729083

and sends the pair (m1 , s1 ) to Bob. Show how Bob can ﬁnd the message m and

verify that it came from Alice. (Do not try to convert digits of m into letters, the

message is meaningless.)

References

1. Lenstra, A.K., Lenstra, H.W., Manasse, M.S., Pollard, J.M.: The number field sieve. In: Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, Baltimore, pp. 564–572, 14–16 May 1990

2. Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public key

cryptosystems. Commun. ACM 21(2), 120–126 (1978)

3. Atkins, D., Graff, M., Lenstra, A.K., Leyland, P.C.: The magic words are squeamish ossifrage.

In: ASIACRYPT-94, Lecture Notes in Computer Science. vol. 917. Springer, New York (1995)

4. Alford, W.R., Granville, A., Pomerance, C.: There are inﬁnitely many Carmichael numbers.

Ann. Math. 140, 703–722 (1994)

5. Williams, H.C.: Primality testing on a computer. Ars Combinatoria 5, 127–185 (1978)

6. Agrawal, M., Kayal, N., Saxena, N.: PRIMES is in P. Department of Computer Science and

Engineering, Indian Institute of Technology, Kanpur, India, 6 August 2002

7. Song, Y.Y.: Primality Testing and Integer Factorization in Public-key Cryptography. Kluwer,

The Netherlands (2004)

8. Diffie, W., Hellman, M.: New directions in cryptography. IEEE Trans. Inf. Theory IT-22, 644–654 (1976)

9. Kerckhoffs, A.: La cryptographie militaire. Journal des sciences militaires. 9, 5–83 (1883)

Chapter 3

Groups

activity, that communal bath where the hairy and slippery mix in

a multiplication of mediocrity.

Vladimir Nabokov (1899–1977)

It may seem pretty obvious what a group is, but it’s worth giving

it some thought anyway.

(from business management literature)

The concept of a group unifies many mathematical structures which at first sight might appear unrelated. In this chapter we will start by

looking at groups of permutations from which groups take their origin. We will then

give a general deﬁnition of a group, and move on to studying the multiplicative group

of Zn and the group of points of an elliptic curve. The latter two groups have recently

gained cryptographic signiﬁcance. Group theory plays a central role in cryptography;

as a matter of fact, any large ﬁnite group can potentially be a basis of a cryptographic

system.

3.1 Permutations

of Permutations of Degree n

Let f : A → B and g : B → C be two mappings. For any element a ∈ A we can find its image f (a) ∈ B under f and for that element

of B we can ﬁnd its image g(f (a)) ∈ C under g. We have now implicitly deﬁned

a third mapping which maps a ∈ A onto g(f (a)). We denote this mapping by f ◦ g

and call it the composition of mappings f and g. As a formula, it can be written as

(f ◦ g)(a) = g(f (a)).

Important Note: the convention we use runs contrary to that used in Calculus,

where f ◦ g(x) = f (g(x)) (i.e., ﬁrst compute g(x), then apply the function f to

the result). This may cause some minor problems to students used to a different


convention. The great advantage of writing the composition in this way is that it is

the same convention as the one used in GAP.
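In code the left-to-right convention reads as follows; a two-line Python sketch (GAP itself composes permutations left-to-right in the same way):

```python
def compose(f, g):
    """The book's convention: (f ∘ g)(a) = g(f(a)) — apply f first, then g."""
    return lambda a: g(f(a))

f = lambda x: x + 1
g = lambda x: 2 * x
assert compose(f, g)(3) == 8   # g(f(3)) = 2 * 4
assert compose(g, f)(3) == 7   # f(g(3)) = 6 + 1
```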

One of the properties of composition of major importance is its compliance with

the associative law.

Proposition 3.1.1 Composition of mappings is associative, that is, given sets A, B,

C, D and mappings f : A → B, g : B → C and h : C → D, we have

(f ◦ g) ◦ h = f ◦ (g ◦ h).

Proof Two mappings from A to D are equal when they assign exactly the same

images in D to every element in A. Let us calculate the image of a ∈ A ﬁrst under

the mapping (f ◦ g) ◦ h and then under f ◦ (g ◦ h):

((f ◦ g) ◦ h)(a) = h((f ◦ g)(a)) = h(g(f (a))),
(f ◦ (g ◦ h))(a) = (g ◦ h)(f (a)) = h(g(f (a))).

The image of a under both mappings is the same. Since a ∈ A was arbitrary, the two

mappings are equal. �

If a mapping f : A → A is one-to-one and onto then f is invertible, i.e., there exists a function g : A → A such

that

g ◦ f = f ◦ g = id, (3.1)

where id is the identity mapping on A. In this case f and g are called mutual inverses

and we use the notation g = f⁻¹ and f = g⁻¹ to express that. Equation (3.1) means

that g maps f (a) to a while f maps g(a) to a, i.e., g undoes the work of f , and f

undoes the work of g.

Example 3.1.1 Let R+ be the set of positive real numbers. Let f : R+ → R and

g : R → R+ be given as f (x) = ln x and g(x) = ex . These are mutual inverses and

hence both functions are invertible.

In what follows we assume that the set A is ﬁnite and consider mappings from A

into itself. If A has n elements, for convenience, we assume that the elements of A

are the numbers 1, 2, . . . , n (the elements of any ﬁnite set can be labeled with the

ﬁrst few integers, so this does not restrict generality).


Since a function is speciﬁed if we indicate what the image of each element is, we

can specify a permutation π by listing each element together with its image, like so:

π = ( 1    2    3    · · ·  n − 1    n    )
    ( π(1) π(2) π(3) · · ·  π(n − 1) π(n) ).

Given that π is one-to-one, no number is repeated in the second row of the array.

Given that π is onto, each number from 1 to n appears somewhere in the second row.

In other words, the second row is just a rearrangement of the ﬁrst.1

Example 3.1.2 The permutation

π = ( 1 2 3 4 5 6 7 )
    ( 2 5 3 1 7 6 4 )

is the permutation of degree 7 which maps 1 to 2, 2 to 5, 3 to 3, 4 to 1, 5 to 7, 6 to 6, and 7 to 4.

Example 3.1.3 The mapping σ : {1, 2, . . . , 6} → {1, 2, . . . , 6} given by σ(i) = 3i mod 7 is a permutation of degree 6. Indeed,

3 · 1 ≡ 3,  3 · 2 ≡ 6,  3 · 3 ≡ 2,  3 · 4 ≡ 5,  3 · 5 ≡ 1,  3 · 6 ≡ 4  (mod 7),

and thus

σ = ( 1 2 3 4 5 6 )
    ( 3 6 2 5 1 4 ).

Proposition The number of distinct permutations of degree n is n!.

Proof Let us consider a permutation of degree n. It is completely determined by its

bottom row. There are n ways to ﬁll the ﬁrst position of this row, n − 1 ways to ﬁll

the second position (since we must not repeat the ﬁrst entry), etc., leading to a total

of n · (n − 1) · · · · · 2 · 1 = n! different possibilities. �

The composition of two permutations of degree n is again a permutation of degree

n. Most of the time we will omit the symbol ◦ for the composition, and speak of the

product πσ of two permutations π and σ, meaning the composition π ◦ σ.

Example 3.1.4 Let

σ = ( 1 2 3 4 5 6 7 8 )      π = ( 1 2 3 4 5 6 7 8 )
    ( 2 4 5 6 1 8 3 7 ),         ( 4 6 1 3 8 5 7 2 ).

Then

σπ = ( 1 2 3 4 5 6 7 8 ) ( 1 2 3 4 5 6 7 8 ) = ( 1 2 3 4 5 6 7 8 )
     ( 2 4 5 6 1 8 3 7 ) ( 4 6 1 3 8 5 7 2 )   ( 6 3 8 5 4 2 1 7 ),

1 Clearly, in this case of finite sets, one-to-one implies onto and vice versa, but this will no longer be the case for infinite sets.


and

πσ = ( 1 2 3 4 5 6 7 8 ) ( 1 2 3 4 5 6 7 8 ) = ( 1 2 3 4 5 6 7 8 )
     ( 4 6 1 3 8 5 7 2 ) ( 2 4 5 6 1 8 3 7 )   ( 6 8 2 5 7 1 3 4 ).

Explanation: to calculate σπ we find:

• the image of 1 when we apply first σ, then π (1 → 2 → 6, so we write the 6 under the 1),
• the image of 2 when we apply first σ, then π (2 → 4 → 3, so we write the 3 under the 2),
• etc.

All this is easily done at a glance and can be written down immediately; BUT be careful to start with the left hand factor!

Similarly, to calculate πσ we find:

• the image of 1 when we apply first π, then σ (1 → 4 → 6, so we write the 6 under the 1),
• the image of 2 when we apply first π, then σ (2 → 6 → 8, so we write the 8 under the 2),
• etc.

All this is easily done at a glance and can be written down immediately; BUT be careful to start with the left hand factor again!

Important Note: the example shows clearly that πσ ≠ σπ, that is, the commutative law for permutations does not hold; so we have to be very careful about the order

of the factors in a product of permutations. But the good news is that the composition

of permutations is associative. This follows from Proposition 3.1.1.
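The computations of Example 3.1.4 can be checked mechanically. In the Python sketch below (our own notation, not the book's GAP code) a permutation is stored as a dictionary mapping each point to its image, and `compose` follows the book's left-to-right convention:

```python
def compose(sigma, pi):
    """Product σπ under the book's convention: (σπ)(a) = π(σ(a))."""
    return {a: pi[sigma[a]] for a in sigma}

def inverse(pi):
    """Read the array 'from the bottom up': π(a) = b means π⁻¹(b) = a."""
    return {pi[a]: a for a in pi}

# σ and π from Example 3.1.4
sigma = dict(zip(range(1, 9), [2, 4, 5, 6, 1, 8, 3, 7]))
pi    = dict(zip(range(1, 9), [4, 6, 1, 3, 8, 5, 7, 2]))

assert list(compose(sigma, pi).values()) == [6, 3, 8, 5, 4, 2, 1, 7]  # σπ
assert list(compose(pi, sigma).values()) == [6, 8, 2, 5, 7, 1, 3, 4]  # πσ
assert compose(sigma, pi) != compose(pi, sigma)                       # not commutative
assert compose(pi, inverse(pi)) == {a: a for a in pi}                 # ππ⁻¹ = id
```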

We can also calculate the inverse of a permutation; for example, using the same

π as above, we ﬁnd

π⁻¹ = ( 1 2 3 4 5 6 7 8 )
      ( 3 8 4 1 6 2 7 5 ).

Explanation: just read the array for π from the bottom up: since π(1) = 4, we must

have π −1 (4) = 1, hence write 1 under the 4 in the array for π −1 , since π(2) = 6, we

must have π −1 (6) = 2, hence write 2 under the 6 in the array for π −1 , etc. In this

case we will indeed have ππ −1 = id = π −1 π.

Similarly, we calculate

σ⁻¹ = ( 1 2 3 4 5 6 7 8 )
      ( 5 1 7 2 3 4 8 6 ).


Simple algebra shows that the inverse of a product can be calculated from the product of the inverses (but note how the order is reversed!):

(πσ)⁻¹ = σ⁻¹π⁻¹.    (3.2)

To justify this, we need only to check that the product of πσ and σ⁻¹π⁻¹ equals the identity, and this is pure algebra: it follows from the associative law that

(πσ)(σ⁻¹π⁻¹) = π(σσ⁻¹)π⁻¹ = π id π⁻¹ = ππ⁻¹ = id.

Deﬁnition 3.1.2 The set of all permutations of degree n with the operation of com-

position is called the symmetric group of degree n, and is denoted by Sn .

Composition indeed makes Sn a group:

1. Sn is associative, i.e., (πσ)τ = π(στ) for all π, σ, τ ∈ Sn;
2. Sn has an identity element id, i.e., π id = id π = π for all π ∈ Sn;
3. every element π ∈ Sn has an inverse π⁻¹, i.e., ππ⁻¹ = id = π⁻¹π.

In Sect. 1.4 we deﬁned a commutative group. This group is not commutative as πσ is

not necessarily equal to σπ. The concept of a group was introduced into mathematics

by Évariste Galois.2

Exercises

1. In the following two cases calculate f ◦ g and g ◦ f . Note that they are different

and even their natural domains are different.

(a) f (x) = sin x and g(x) = 1/x;
(b) f (x) = e^x and g(x) = √x.

2. Let Rθ be an anticlockwise rotation of the plane about the origin through an

angle θ. Show that Rθ is invertible with the inverse R2π−θ .

3. Show that any reﬂection H of the plane in any line is invertible and the inverse

of H is H itself.

4. Determine how many permutations of degree n act identically on a ﬁxed set of

k elements of {1, 2, . . . , n}.

5. Show that the mapping σ : {1, 2, . . . , 8} → {1, 2, . . . , 8} given by σ(i) =

5i mod 9 is a permutation by writing it down in the form of a table.

6. Let the mapping π : {1, 2, . . . , 12} → {1, 2, . . . , 12} be deﬁned by π(k) =

3k mod 13. Show that π is a permutation of S12 .

2 Évariste Galois (1811–1832), a French mathematician who was the ﬁrst to use the word “group”

(French: groupe) as a technical term in mathematics to represent a group of permutations. While

still in his teens, he was able to determine a necessary and sufﬁcient condition for a polynomial to

be solvable by radicals, thereby solving a long-standing problem. His work laid the foundations for

Galois theory, a major branch of abstract algebra.


mod 13. Show that τ is not a permutation of S12 by showing that both one-

to-one and onto properties are violated.

8. Calculate the inverses and all distinct powers of the permutations:

ρ = ( 1 2 3 4 5 6 )      τ = ( 1 2 3 4 5 6 )
    ( 3 4 5 6 1 2 ),         ( 4 6 5 1 3 2 ).

9. Let

σ = ( 1 2 3 4 5 6 7 8 9 )      γ = ( 1 2 3 4 5 6 7 8 9 )
    ( 2 4 5 6 1 9 8 3 7 ),         ( 6 2 7 9 3 8 1 4 5 ).

10. Prove rigorously that the composition of two permutations of degree n is a

permutation of degree n.

A cryptosystem in which the symbols of each block of text are rearranged by a fixed permutation π is called a permutation cipher. In this cryptosystem a plaintext and ciphertext are both over the same alphabet. Let m = a1 a2 . . . an be a message of fixed length n over an alphabet A. Then the corresponding cryptotext is defined as

c = aπ(1) aπ(2) . . . aπ(n) ,

which means the symbols of the message are permuted in accord with the permutation π. If the message is longer than n we split it into smaller segments of length n. (It is always possible to add some junk letters to make the total length of the message divisible by n.)

For example, let n = 16 and

         1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
    π =
         2  12 3  16 4  10 9  15 7  8  6  5  14 1  13 11

To encrypt the message

ALL ALL ARE GONE THE OLD FAMILIAR FACES

we remove the spaces and split it into two blocks of length 16:

ALLALLAREGONETHE OLDFAMILIARFACES

3.1 Permutations 79

Applying π to each block we obtain

LNLEAGEHARLLTAEO LFDSFAIEILMACOAR

and joining the blocks together gives the cryptotext

LNLEAGEHARLLTAEOLFDSFAIEILMACOAR
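The scheme just described is easy to prototype. Below is a minimal Python sketch (not from the book; the key name PI and the helper name encrypt are mine) that reproduces the example above:

```python
# A minimal sketch of the permutation cipher described above (names are mine).
PI = [2, 12, 3, 16, 4, 10, 9, 15, 7, 8, 6, 5, 14, 1, 13, 11]  # the key: pi(1), ..., pi(16)

def encrypt(message, pi):
    n = len(pi)
    message += "X" * (-len(message) % n)      # pad with junk letters if needed
    # within each block the i-th ciphertext symbol is the pi(i)-th plaintext symbol
    return "".join(
        "".join(message[start + p - 1] for p in pi)
        for start in range(0, len(message), n)
    )

print(encrypt("ALLALLAREGONETHEOLDFAMILIARFACES", PI))
# LNLEAGEHARLLTAEOLFDSFAIEILMACOAR
```

Decryption applies the inverse permutation to each block in the same way.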

Such a cipher is not easy to break by brute force only. Indeed, the length of the blocks is unknown, and even if it is known, the space of secret keys is very large: it consists of n! possible permutations, and n! grows very fast. Even for a reasonably small n like n = 128 the number of possible keys is astronomical.

However, if one can guess even a fragment of the plaintext, it may become easy.

To make guessing the plaintext difﬁcult a substitution cipher can be applied ﬁrst.

The combination of substitutions and permutations is called a product cipher. Product ciphers are not normally used on their own, but they are an indispensable part of modern cryptography. For example, the DES (Data Encryption Standard), adopted on 23 November 1976, involved 16 rounds of substitutions and permutations.

The main steps of the DES algorithm are as follows:

• Partitioning of the text into 64-bit blocks;

• Initial permutation within each block;

• Breakdown of the blocks into two parts: left and right, named L and R;

• Permutation and substitution steps repeated 16 times (called rounds) on each part;

• Re-joining of the left and right parts, then application of the inverse of the initial permutation.
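The rounds on the two halves follow a so-called Feistel arrangement, which is invertible no matter what mixing function a round uses. A toy Python sketch of that idea (not the real DES: the round function F and the round keys below are made up for illustration; DES uses S-boxes and 48-bit round keys):

```python
# Toy Feistel rounds (illustrative only -- this is not the real DES round function).
def feistel_encrypt(left, right, round_keys, F):
    for k in round_keys:
        left, right = right, left ^ F(right, k)   # swap halves, mix one in
    return left, right

def feistel_decrypt(left, right, round_keys, F):
    for k in reversed(round_keys):                # undo the rounds in reverse order
        left, right = right ^ F(left, k), left
    return left, right

F = lambda half, key: (half * 2654435761 + key) & 0xFFFFFFFF  # made-up mixing function
keys = [7, 129, 3021, 55]                                     # made-up round keys

block = (0x12345678, 0x9ABCDEF0)
cipher = feistel_encrypt(*block, keys, F)
print(feistel_decrypt(*cipher, keys, F) == block)   # True
```

Note that decryption works even though F itself need not be invertible; only the order of the round keys is reversed.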

DES is now considered to be insecure for many applications. In 1997, a call was

launched for projects to develop an encryption algorithm in order to replace the

DES. After an international competition, in 2001, a new block cipher Rijndael3 was

selected as a replacement for DES. It is now referred to as the Advanced Encryption

Standard (AES).

A permutation that cyclically permutes some of the elements (and leaves all others fixed) is called a cycle.

For example, the permutation

         1 2 3 4 5 6 7
    π =
         1 5 3 7 4 6 2

is a cycle, because we have 5 → 4 → 7 → 2 → 5, and each of the other elements of {1, 2, 3, 4, 5, 6, 7}, namely 1, 3, 6, stays unchanged. To see this, we must of course chase elements around; the nice cyclic structure is not immediately evident from our notation. We write π = (5 4 7 2), meaning that all numbers not on the list are mapped to themselves,

3 J.

Daemen and V. Rijmen. The block cipher Rijndael, Smart Card Research and Applications,

LNCS 1820, Springer–Verlag, pp. 288–296.


whilst the ones in the bracket are mapped to the one listed to the right, except the

rightmost one, which is mapped to the leftmost on the list.

Note: cycle notation is not unique, since there is no beginning or end to a circle.

We can write π = (5 4 7 2) and π = (2 5 4 7), as well as π = (4 7 2 5) and

π = (7 2 5 4)—they all denote one and the same cycle.

We say that a permutation is a cycle of length k (or a k-cycle) if it moves k numbers.

For example, (3 6 4 9 2) is a 5-cycle, (3 6) is a 2-cycle, (1 3 2) is a 3-cycle. We note

also that the inverse of a cycle is again a cycle. For example (1 2 3)−1 = (1 3 2) (or

(3 2 1) if you prefer). Similarly, (1 2 3 4 5)−1 = (1 5 4 3 2). To ﬁnd the inverse of

a cycle one has to reverse the arrows. This leads us to the following

Theorem 3.1.2 (i1 i2 i3 . . . ik )−1 = (ik ik−1 . . . i2 i1 ).

Not all permutations are cycles; for example, the permutation

         1 2 3 4  5 6 7 8 9 10 11 12
    σ =                                        (3.3)
         4 3 2 11 8 9 5 6 7 10 1  12

is not a cycle: we have 1 → 4 → 11 → 1, but the other elements are not all fixed (2 goes to 3, for example). Let us chase the other elements. We find: 2 → 3 → 2 and 5 → 8 → 6 → 9 → 7 → 5. So in the permutation σ three cycles coexist peacefully.

Two cycles (i1 i2 i3 . . . ik ) and (j1 j2 j3 . . . jm ) are said to be disjoint, if the sets

{i1 , i2 , . . . , ik } and {j1 , j2 , . . . , jm } have empty intersection. For instance, we may say

that

(1 5 8) and (2 4 3 6 9)

are disjoint. Any two disjoint cycles σ and τ commute, i.e., στ = τ σ (see Exercise 1).

For example,

(1 2 3 4)(5 6 7) = (5 6 7)(1 2 3 4).

However, if we multiply any cycles which are not disjoint, we have to watch their

order; for example: (1 2)(1 3) = (1 2 3), whilst (1 3)(1 2) = (1 3 2), and

(1 3 2) ≠ (1 2 3).

The relationship between a cycle and the permutation group it belongs to is much

like that between a prime and the natural numbers.

Theorem 3.1.3 Every permutation is a product of disjoint cycles. Moreover, any such representation is unique up to the order of the factors.

Proof We take an arbitrary element i1 and start a cycle: σ(i1 ) = i2 , σ(i2 ) = i3 , etc. Suppose that i1 , i2 , . . . , ik were all different and σ(ik ) ∈ {i1 , i2 , . . . , ik } (this has to happen sooner or later since the set {1, 2, . . . , n} is finite). If σ(ik ) = i1 , we have a cycle. No other possibility can exist: if σ(ik ) = iℓ for 2 ≤ ℓ ≤ k, then σ(iℓ−1 ) = iℓ = σ(ik ), which contradicts σ being one-to-one. We observe then that σ = (i1 i2 i3 . . . ik )σ′ , where σ′ does not move


any element of the set {i1 , i2 , . . . , ik } and acts as σ on the complement of this set. So σ′ fixes strictly more elements than σ does. This operation can now be applied to σ′ and so on. It will terminate at some stage and at that moment σ will be represented as a product of disjoint cycles. �
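The cycle-chasing procedure from this proof is easy to carry out mechanically. A Python sketch (not from the book; the function name is mine):

```python
# Sketch (not from the book) of the cycle-chasing procedure used in the proof above.
def disjoint_cycles(perm):
    """perm maps i -> perm[i-1] on {1, ..., n}; returns the cycles of length > 1."""
    seen, cycles = set(), []
    for start in range(1, len(perm) + 1):
        cycle, i = [], start
        while i not in seen:          # follow start -> sigma(start) -> ... back to start
            seen.add(i)
            cycle.append(i)
            i = perm[i - 1]
        if len(cycle) > 1:            # fixed points are left out of the notation
            cycles.append(cycle)
    return cycles

sigma = [4, 3, 2, 11, 8, 9, 5, 6, 7, 10, 1, 12]   # the permutation sigma of (3.3)
print(disjoint_cycles(sigma))   # [[1, 4, 11], [2, 3], [5, 8, 6, 9, 7]]
```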

Exercises

1. Explain why any two disjoint cycles commute.

2. Let the mapping π : {1, 2, . . . , 12} → {1, 2, . . . , 12} be deﬁned by π(k) =

3k mod 13. This is a permutation (you need not prove this). Find the decomposition

of π into disjoint cycles.

3. Calculate the following product of permutations in S5

4. Let

         1 2 3 4 5 6 7 8 9              1 2 3 4 5 6 7 8 9
    σ =                     ,      τ =                     .
         9 8 7 6 5 3 1 4 2              6 2 1 4 7 5 9 3 8

Calculate (στ )−1 and represent the result as a product of disjoint cycles.

important for cryptography. Now we will deﬁne the order of a permutation, and show

how the decomposition of this permutation into a product of disjoint cycles allows

us to calculate its order.

It is clear that if a permutation τ is a cycle of length k, then τ k = id, i.e., if this

permutation is repeated k times, we will have the identity permutation as a result of

this repeated action. Moreover, for no positive integer s smaller than k do we have τ s = id. Also it is clear that if τ m = id for some positive integer m, then k is a divisor

of m. This observation motivates our next deﬁnition.

Deﬁnition 3.1.3 Let π be a permutation. The smallest positive integer i such that

π i = id is called the order of π.

It is not immediately obvious that any permutation has an order. We will see later

that this is indeed the case.


Example 3.1.7 Let us calculate the order of the permutation π = (1 2)(3 4 5). We

have:

π = (1 2)(3 4 5),

π 2 = (3 5 4),

π 3 = (1 2),

π 4 = (3 4 5),

π 5 = (1 2)(3 5 4),

π 6 = id.

So the order of π is 2 · 3 = 6 (note that π has been given as a product of two disjoint

cycles with relatively prime lengths).

Note that π k = id implies both (1 2)k = id and (3 5 4)k = id, and the other way around. The order of π is the product 2 · 3 because the orders of (1 2) and (3 4 5) are relatively prime.

Example 3.1.8 The order of the permutation ρ = (1 2)(3 4 5 6) is four. To see this let

us calculate

ρ = (1 2)(3 4 5 6),

ρ2 = (3 5)(4 6),

ρ3 = (1 2)(3 6 5 4),

ρ4 = id.

So the order of ρ is 4 (note that ρ has been given as a product of disjoint cycles but

their lengths were not coprime).

More generally, this suggests that the order of a product of disjoint cycles equals

the least common multiple of the lengths of those cycles. We will upgrade this

suggestion into a theorem.

Theorem 3.1.4 Let σ = τ1 τ2 . . . τr be the decomposition of σ into a product of disjoint cycles. Let k be the order of σ and k1 , k2 , . . . , kr be the orders (lengths) of τ1 , τ2 , . . . , τr , respectively. Then

k = lcm (k1 , k2 , . . . , kr ).      (3.4)

Proof Since the cycles τi are disjoint, we know that they commute, and hence for any positive integer m

σ m = τ1m τ2m . . . τrm .

For m = lcm (k1 , k2 , . . . , kr ) each ki divides m, so every τim = id and σ m = id. Thus the order of σ is not greater than lcm (k1 , k2 , . . . , kr ).

Suppose now that σ m = id. The powers of cycles τ1m , τ2m , . . . , τrm act on disjoint sets of indices and, since σ m = id, it must be that τ1m = τ2m = · · · = τrm = id. For if not, and τsm (i) = j with i ≠ j, then the product τ1m τ2m . . . τrm cannot be equal to id because all the other permutations τ1m , . . . , τs−1m , τs+1m , . . . , τrm leave i and j invariant. Thus the order of σ is a multiple of each of the k1 , k2 , . . . , kr and hence a multiple of the least common multiple of them. Thus the order of σ is not smaller than lcm (k1 , k2 , . . . , kr ). This proves the theorem. �

For example, the order of a product of disjoint cycles of lengths 4, 3, 2, 3 and 5 is lcm(4, 3, 2, 3, 5) = 60. Before applying the formula (3.4) we must carefully check that the cycles are disjoint.

To find the order of an arbitrary permutation, we first represent it as a product of disjoint cycles. For example, to determine the order of

         1 2 3 4  5 6 7 8 9 10 11 12
    σ =
         4 3 2 11 8 9 5 6 7 10 1  12

we represent it as

σ = (1 4 11)(2 3)(5 8 6 9 7),

and conclude that the order of σ is lcm(3, 2, 5) = 30.
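This calculation can be automated directly from Theorem 3.1.4: chase the cycles and take the lcm of their lengths. A Python sketch (not from the book; math.lcm needs Python 3.9+):

```python
import math

# Sketch (not from the book): the order of a permutation as the lcm of its
# disjoint cycle lengths, as in Theorem 3.1.4.
def order(perm):
    """perm maps i -> perm[i-1] on {1, ..., n}."""
    seen, lengths = set(), []
    for start in range(1, len(perm) + 1):
        length, i = 0, start
        while i not in seen:          # chase the cycle containing start
            seen.add(i)
            length += 1
            i = perm[i - 1]
        if length > 0:
            lengths.append(length)
    return math.lcm(*lengths)

sigma = [4, 3, 2, 11, 8, 9, 5, 6, 7, 10, 1, 12]   # = (1 4 11)(2 3)(5 8 6 9 7)
print(order(sigma))   # 30
```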

Exercises

1. Find the orders of the following permutations:

             1 2 3 4 5 6 7 8 9
    (a) σ =                     ,
             5 3 6 7 1 2 8 9 4

    (b) τ = (1 2)(2 3 4)(4 5 6 7)(7 8 9 10 11).

2. There is an amusing legend about Flavius Josephus, a famous historian and math-

ematician who lived in the ﬁrst century A.D. The story says that in the Jewish

revolt against Rome, Josephus and 40 of his comrades were holding out against

the Romans in a cave. With defeat imminent, they resolved that, like the rebels

at Masada, they would rather die than be slaves to the Romans. They decided

to arrange themselves in a circle. One man was designated as number one, and

they proceeded clockwise around the circle of 41 men killing every third man.


At ﬁrst it is obvious whose turn it was to be killed. Initially, the men in positions

3, 6, 9, 12, . . . , 39 were killed. The next man to be killed was in position 1 and

then in position 5 (since the man in position 3 was slaughtered earlier), and so on.

Josephus (according to the story) instantly ﬁgured out where he ought to stand

in order to be the last man to go. When the time came, instead of killing him-

self, he surrendered to the Romans and lived to write his famous histories: “The

Antiquities” and “The Jewish War”.

(a) Find the permutation σ (called the Josephus permutation) for which σ(i) is

the number of the man who was ith to be killed.

(b) In which position did Josephus stand around the circle?

(c) Find the cyclic structure of the Josephus permutation.

(d) What is the order of the Josephus permutation?

(e) Calculate σ 2 and σ 3 .

3. The mapping π(i) = 13i mod 23 is a permutation of S22 (do not prove this). Find

the decomposition of π into a product of disjoint cycles and determine the order

of this permutation.

Often in mathematics (and often in cryptography) a certain action is performed repeatedly and we are interested in the outcome that results after a number of repetitions.

As one particularly instructive example we will analyse the so-called interlacing

shufﬂe that card players often do with a deck of cards. Suppose that we have a deck

of 2n cards (normally 52) and suppose that our cards were numbered from 1 to 2n

and the original order of cards in the deck was

a1 a2 a3 . . . a2n−1 a2n .

We split the deck into two halves which contain the cards a1 , a2 , . . . , an and an+1 , an+2 , . . . , a2n , respectively. Then we interlace them as follows. We put the ﬁrst

card of the second pile ﬁrst, then the ﬁrst card of the ﬁrst pile, then the second card of

the second pile, then the second card of the ﬁrst pile etc. This is called the interlacing

shufﬂe. After this operation the order of cards will be:

an+1 a1 an+2 a2 . . . a2n an .

Let us put the permutation

          1 2 3 . . . n   n + 1  n + 2  . . .  2n
    σn =
          2 4 6 . . . 2n    1      3    . . .  2n − 1

in correspondence to this shufﬂe. All it says is that the ﬁrst card goes to the second

position, the second card is moved to the fourth position, etc. We see that we can

deﬁne this permutation by the formula:

σn (i) = 2i mod (2n + 1)

and σn (i) is the position of the ith card after the shufﬂe. What will happen after

2, 3, 4, . . . shufﬂes? The resulting change will be characterized by the permutations

σn2 , σn3 , σn4 , . . . , respectively.

For example,

          1 2 3 4 5 6 7 8
    σ4 =                    = (1 2 4 8 7 5)(3 6).
          2 4 6 8 1 3 5 7

The order of σ4 is 6.

Similarly,

          1 2 3 4 5  6 7 8 9 10
    σ5 =                          = (1 2 4 8 5 10 9 7 3 6).
          2 4 6 8 10 1 3 5 7 9

Also σ5^10 = id and 10 is the order of σ5 . Hence all cards will be back to their initial

positions after 10 shufﬂes but not before.
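These small cases can also be checked by brute force, composing σn with itself until the identity appears. A Python sketch (not from the book; the function names are mine):

```python
# Sketch (not from the book): brute-force order of the interlacing shuffle
# sigma_n(i) = 2i mod (2n+1).
def sigma(n):
    return [2 * i % (2 * n + 1) for i in range(1, 2 * n + 1)]

def order(perm):
    identity = list(range(1, len(perm) + 1))
    power, k = perm[:], 1
    while power != identity:
        power = [perm[x - 1] for x in power]   # one more application of the shuffle
        k += 1
    return k

print(order(sigma(4)), order(sigma(5)))   # 6 10
```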

Let us deal with the real thing, that is, a deck of 52 cards. We know that the interlacing shufﬂe is deﬁned by the equation σ26 (i) = 2i mod 53. GAP helps us

to investigate. We have:

gap> lastrow:=[1..52];;

gap> for i in [1..52] do

> lastrow[i]:=2*i mod 53;

> od;

gap> lastrow;

[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40,

42, 44, 46, 48, 50, 52, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27,

29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51 ]

gap> PermList(lastrow);

(1,2,4,8,16,32,11,22,44,35,17,34,15,30,7,14,28,3,6,12,24,48,43,33,13,26,52,51,

49,45,37,21,42,31,9,18,36,19,38,23,46,39,25,50,47,41,29,5,10,20,40,27)

gap> Order(last);

52

Thus the interlacing shufﬂe σ26 is a cycle of length 52 and has order 52.
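The GAP computation can be cross-checked outside GAP as well; the Python sketch below (not from the book) literally shuffles a list of 52 cards until it returns to the starting order:

```python
# Cross-check of the GAP session above (a sketch, not from the book): apply the
# interlacing shuffle sigma_26(i) = 2i mod 53 until the deck comes back.
def shuffle(deck):
    out = [None] * 52
    for pos, card in enumerate(deck, start=1):
        out[2 * pos % 53 - 1] = card   # the card in position pos moves to 2*pos mod 53
    return out

start = list(range(1, 53))
deck, count = shuffle(start), 1
while deck != start:
    deck, count = shuffle(deck), count + 1
print(count)   # 52
```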

Exercises

1. A shufﬂe of a deck of 15 cards is made as follows. The top card is put at the bottom,

the deck is cut into three equal decks, the bottom third is switched with the middle


third, and then the resulting bottom card is placed on the top. How many times

must this shufﬂe be repeated to get the cards back in the initial order? Write

down the permutation corresponding to this shufﬂe and ﬁnd its decomposition

into disjoint cycles.

2. Use GAP to determine the decomposition into disjoint cycles and the order of the

interlacing shufﬂe σ52 for the deck of 104 cards which consists of two copies of

ordinary decks with 52 cards in each.

3. On a circle there are n beetles. At a certain moment they start to move all at once

and with the same speed (but maybe in different directions). When two beetles

meet, both of them reverse their directions and continue to move with the same

speed. Prove that there will be a moment when all beetles again occupy their

initial positions. (Hint: Suppose one beetle makes the full circle in time t. Think

about what will happen after time t when all beetles move.)

Cycles of length 2 are the simplest permutations, as they move only two elements. We deﬁne a transposition to be a cycle of length 2. Every permutation can be written as a product of transpositions (this corresponds to the fact that every arrangement of n objects can be obtained from a given starting position by making a sequence of swaps). We will observe, ﬁrst, that a cycle of arbitrary length

can be expressed as a product of transpositions. Then using Theorem 3.1.3 we will be

able to express any permutation as a product of transpositions. Here are some examples:

Example 3.1.13 (1 2 3 4 5) = (1 2)(1 3)(1 4)(1 5) (just check that the left-hand

side equals the right-hand side!).

More generally, any cycle can be written as a product of transpositions in the following way:

(i1 i2 . . . ir ) = (i1 i2 )(i1 i3 ) . . . (i1 ir ).      (3.5)

To write an arbitrary permutation σ as a product of transpositions, we ﬁrst decompose σ into a product of disjoint cycles, then write the cycles as products of transpositions as in formula (3.5). For example,

    1 2 3 4  5 6 7 8 9 10 11
                               = (1 4 11)(2 3)(5 8 6 9 7) = (1 4)(1 11)(2 3)(5 8)(5 6)(5 9)(5 7).
    4 3 2 11 8 9 5 6 7 10 1


Example 3.1.15 Note that there are many different ways to write a permutation as a product of transpositions; for example, (1 2 3 4 5) can be written in any of the following forms:

(1 2 3 4 5) = (1 2)(1 3)(1 4)(1 5)
            = (3 4)(3 5)(2 3)(1 3)(2 3)(2 1)(3 1)(3 2).

(Don’t ask how these products were found! The point is to check that all these

products are equal, and to note that there is nothing unique about how one can write

a permutation as a product of transpositions.)

There are many ways to decompose a given permutation into a product of transpositions. As we will see, the number of such transpositions will be either always even or always odd.

A permutation is called even if it can be written as a product of an even number of transpositions. A permutation is called odd if it can be written as a product of an odd number of transpositions.

We have to show that there is no permutation which is at the same time even and odd—this justiﬁes the use of the terminology. We will establish this by looking at the polynomial

f (x1 , x2 , . . . , xn ) = ∏i<j (xi − xj ).

We claim that for any transposition π = (i j)

f (xπ(1) , xπ(2) , . . . , xπ(n) ) = −f (x1 , x2 , . . . , xn ).      (3.6)

This is clear for a transposition of neighbouring elements π = (i i + 1), since all brackets will remain except (xi − xi+1 ), which will become (xi+1 − xi ) = −(xi − xi+1 ), so we will have one change of sign.

Arguing by induction we suppose that (3.6) is true for all transpositions π = (i j) for which |j − i| < ℓ. Suppose now that |j − i| = ℓ. Since

(i j) = (i j − 1)(j − 1 j)(i j − 1),

and each of the three transpositions on the right changes the sign of f , the transposition (i j) changes the sign of f as well. Hence (3.6) holds for any product of an odd number of transpositions. It is now also clear that

f (xπ(1) , xπ(2) , . . . , xπ(n) ) = +f (x1 , x2 , . . . , xn )      (3.7)

for any permutation π which is a product of an even number of transpositions. Since (3.6) and (3.7) cannot hold simultaneously, there is no permutation which is both even and odd.

Example 3.1.16 (1 2 3 4) is an odd permutation, because (1 2 3 4) = (1 2)(1 3)

(1 4). On the other hand, the permutation (1 2 3 4 5) is even, because (1 2 3 4 5) =

(1 2)(1 3)(1 4)(1 5).

Example 3.1.17 Since id = (1 2)(1 2), the identity permutation is even.

Example 3.1.18 Let

         1 2 3 4 5 6 7 8 9
    π =                     .
         4 3 2 5 1 6 9 8 7

Is π even or odd? First we decompose π into a product of cycles, then use the result above:

π = (1 4 5)(2 3)(7 9) = (1 4)(1 5)(2 3)(7 9),

a product of four transpositions, so π is even.
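The same test is easy to mechanise: by formula (3.5) a k-cycle is a product of k − 1 transpositions, so the parity can be read off the cycle decomposition. A Python sketch (not from the book):

```python
# Sketch (not from the book): parity from the cycle decomposition -- by formula
# (3.5) a k-cycle is a product of k - 1 transpositions.
def is_even(perm):
    """perm maps i -> perm[i-1] on {1, ..., n}."""
    seen, transpositions = set(), 0
    for start in range(1, len(perm) + 1):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            length += 1
            i = perm[i - 1]
        transpositions += max(length - 1, 0)
    return transpositions % 2 == 0

pi = [4, 3, 2, 5, 1, 6, 9, 8, 7]   # the permutation of Example 3.1.18
print(is_even(pi))   # True
```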

Theorem 3.1.5 A k-cycle is even if k is odd and odd if k is even.

Proof Immediately follows from (3.5). �

Deﬁnition 3.1.6 We say that two permutations have the same parity if they are both

odd or both even and different parity if one of them is odd and another is even.

Theorem 3.1.6 In any symmetric group Sn

(i) The product of two even permutations is even.

(ii) The product of two odd permutations is even.

(iii) The product of an even permutation and an odd one is odd.

(iv) A permutation and its inverse have the same parities.

Proof Only statement (iv) needs a comment. It follows from (iii). Indeed, for any

permutation π we have ππ −1 = id, and, since the identity permutation is even, by

(iii), π and π −1 cannot have different parities. �

Theorem 3.1.7 Exactly half of the elements of Sn are even and half of them are odd.

Proof Denote by E the set of even permutations in Sn , and by O the set of odd

permutations in Sn . If τ is any ﬁxed transposition from Sn , we can establish a one-

to-one correspondence between E and O as follows: for π in E we know that τ π

belongs to O. Therefore we have a mapping f : E → O deﬁned by f (π) = τ π. The

function f is one-to-one since τ π = τ σ implies that π = σ; f is onto, because if κ

is an odd permutation then τ κ is even, and f (τ κ) = τ τ κ = κ. �

Corollary 3.1.1 The number of even permutations in Sn is n!/2. The number of odd permutations in Sn is also n!/2.

Corollary 3.1.2 The set An of all even permutations of degree n is a group relative

to the operation of composition called the alternating group of degree n.


Example 3.1.19 We can have a look at the elements of S4 , listing all of them, checking which of them are even and which of them are odd. We have

S4 = {id, (1 2 3), (1 3 2), (1 2 4), (1 4 2), (2 3 4), (2 4 3),
      (1 3 4), (1 4 3), (1 2)(3 4), (1 3)(2 4), (1 4)(2 3),
      (1 2), (1 3), (1 4), (2 3), (2 4), (3 4), (1 2 3 4), (1 4 3 2),
      (1 3 2 4), (1 4 2 3), (1 2 4 3), (1 3 4 2)}.

The elements in the ﬁrst two lines are even permutations, and the remaining elements are odd. We have

A4 = {id, (1 2 3), (1 3 2), (1 2 4), (1 4 2), (2 3 4), (2 4 3),
      (1 3 4), (1 4 3), (1 2)(3 4), (1 3)(2 4), (1 4)(2 3)}.

Exercises

as products of transpositions.

2. What would be the parity of the product of 11 odd permutations?

3. Let π, ρ ∈ Sn be two permutations. Prove that π and ρ−1 πρ have the same parity.

4. Let π, ρ ∈ Sn be two permutations. Prove that π −1 ρ−1 πρ is an even permutation.

5. Determine the parity of the permutation σ of degree n such that σ(i) = n + 1 − i.

3.1.7 Puzzle 15

We close this section with a few words about a game played with a simple toy. This

game seems to have been invented in the 1870s by the famous puzzle-maker Sam

Loyd. It caught on and became all the rage in the United States in the 1870s, and

ﬁnally led to a discussion by W. Johnson in the scholarly journal, the American

Journal of Mathematics, in 1879. It is often called the 15-puzzle.

Consider a toy made up of 16 squares, numbered from 1 to 15 inclusive and with

the lower right-hand corner blank.

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15


The toy is constructed so that the squares can be slid vertically and horizontally,

such moves being possible because of the presence of the blank square. Start with

the position shown above and perform a sequence of slides in such a way that, at

the end, the lower right-hand square is again blank. Call the new position realisable.

The natural question is: How can we determine whether or not the given position is

realisable?

What do we have here? After a sequence of slides we have shufﬂed the numbers

from 1 to 15; that is, we have effected a permutation of the numbers from 1 to 15. To

ask which positions are realisable is merely to ask which permutations can be carried

out. This is a permutation of S16 since the blank square also moves in the process. In

other words, in S16 , the symmetric group of degree 16, which permutations can be

reached via the toy? For instance, can the following position

13 4 12 15

1 14 9 6

8 3 2 7

10 5 11

be realised?

We will denote the empty square by the number 16. The position will then be

a1 a2 a3 a4
a5 a6 a7 a8
a9 a10 a11 a12
a13 a14 a15 a16

and it corresponds to the permutation

    1  2  . . . 16
                    .
    a1 a2 . . . a16

Example 3.1.20 For example, the position

1  3  5  7
9  11 13 15
2  4     6
8  10 12 14

corresponds to the permutation

         1 2 3 4 5 6  7  8  9 10 11 12 13 14 15 16
    σ =                                              .
         1 3 5 7 9 11 13 15 2 4  16 6  8  10 12 14

If we make a move pulling down the square 13, then the new position will be

1  3  5  7
9  11    15
2  4  13 6
8  10 12 14

and the corresponding permutation is

    1 2 3 4 5 6  7  8  9 10 11 12 13 14 15 16
                                                = σ (13 16).
    1 3 5 7 9 11 16 15 2 4  13 6  8  10 12 14

We observe the rule by which the permutation changes: when we swap the square with the number i on it with the neighbouring empty square, the permutation is multiplied on the right by the transposition (i 16).

Theorem 3.1.8 Suppose that a position characterised by the permutation σ can be transformed by legal moves to the initial position. Then there exist transpositions τ1 , τ2 , . . . , τm such that

id = σ τ1 τ2 . . . τm .      (3.8)

If the empty square was initially in the right bottom corner, then m is even and σ is even.

Proof Suppose that a position characterised by σ is transformed by legal moves to the initial position. As we noted in Example 3.1.20, every legal move is equivalent to a multiplication by a transposition (i 16) for some i ∈ {1, 2, . . . , 15}. Since the initial position is characterised by the identity permutation, we see that (3.8) follows. This implies

σ = τm τm−1 . . . τ2 τ1

from which we see that the parity of σ is the same as the parity of m.

Let us colour the board in the chessboard pattern.


Every move changes the colour of the empty square. Thus if at the beginning and

at the end the empty square was black, then there was an even number of moves made.

Therefore, if initially the right bottom corner was empty and we could transform this

position to the initial position, then an even number of moves was made, m is even,

and σ is also even. �

It can be shown that every position with an even permutation σ can be transformed to the initial position, but no easy proof is known.
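The parity test itself is easy to run mechanically. The sketch below (not from the book) computes parity through the cycle decomposition and applies it to Sam Loyd's famous "14-15" position, the solved board with only the squares 14 and 15 exchanged:

```python
# Sketch (not from the book): parity check of 15-puzzle positions, with the
# blank encoded as 16.  A k-cycle equals k - 1 transpositions.
def is_even(perm):
    seen, swaps = set(), 0
    for start in range(1, len(perm) + 1):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            length += 1
            i = perm[i - 1]
        swaps += max(length - 1, 0)
    return swaps % 2 == 0

solved = list(range(1, 17))               # blank (16) in the right bottom corner
loyd = solved[:]
loyd[13], loyd[14] = loyd[14], loyd[13]   # exchange the squares numbered 14 and 15

print(is_even(solved), is_even(loyd))   # True False
```

Since the Loyd position is odd, by the theorem above it cannot be transformed to the initial position.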

Exercises

1. For the following two arrangements of the 15-puzzle

   14 10 13 12        10 14 13 12
    6 11  9  8         6 11  9  8
    7  3  5  1         7  3  5  1
    4 15  2            4 15  2

   show that one of them is realisable and one is not without writing down the corresponding permutations and determining their parities.

2. For each of the following arrangements of the 15-puzzle determine the parity of the corresponding permutation.

    1  3  2  4        13  5  3
    6  5  7  8         9  2  7 10
    9 13 15 11         1 15 14  8
   14 10 12           12 11  6  4

3.2 General Groups 93

Surprisingly many objects in mathematics satisfy the same properties as the symmet-

ric groups deﬁned in Deﬁnition 3.1.2. There is good reason to study all such objects

simultaneously. For this purpose we introduce the concept of a general group.

Deﬁnition 3.2.1 A nonempty set G together with a binary operation ∗ on it is called a group if it satisﬁes the following three properties:

1. the operation ∗ is associative; i.e.,

(a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ G.

2. G possesses an identity element; i.e., there is an element e ∈ G such that

e ∗ g = g ∗ e = g for all g ∈ G.

(This element is often also denoted by 1, or, if the group operation is written as addition, it is usually denoted by 0.)

3. Every element of G possesses an inverse; i.e., given g ∈ G there exists a unique element h in G such that

g ∗ h = h ∗ g = e.

The element h is called the inverse of g, and denoted by g −1 (when the operation

is written as addition, the inverse is usually denoted by −g).

We denote this group (G, ∗), or simply G, when this invites no confusion. A group

G in which the commutative law holds (a ∗ b = b ∗ a for all a, b ∈ G) is called a

commutative group or an abelian group.

In any group (G, ∗) we have the familiar formula for the inverse of the product:

(a ∗ b)−1 = b−1 ∗ a−1 .

Example 3.2.1 We established in the previous sections that Sn is a group, the opera-

tion being multiplication of permutations (i.e., composition of functions). This group

is not abelian.

Example 3.2.2 Here is an example where the group operation is written as addi-

tion: Zn is an abelian group under addition ⊕ modulo n. This was established in

Theorem 1.4.1.


Example 3.2.3 Z∗n (the set of invertible elements in the ring Zn ) is a group under

multiplication modulo n. In particular, Z∗8 = {1, 3, 5, 7} with 3−1 = 3, 5−1 = 5,

7−1 = 7.

When we talk about a group it is important to be clear about the group operation;

either it must be explicitly speciﬁed, or the group operation must be clear from

the context and tacitly understood. The following are cases where there is a clear

understanding of the operation, so it will often not be made explicit. Most important

are:

• When we talk about the group Zn we mean the set of integers modulo n under

addition modulo n.

• When we talk about the group Z∗n we mean the set of invertible elements of the

ring Zn under multiplication modulo n.

Normally, when making general statements about groups, we write the statements

in multiplicative notation; but it is important to be able to apply them also in situations

where the group operation is written as addition (some obvious modiﬁcations must

be made).

Deﬁnition 3.2.2 Let G be a group and e be its identity element. The number of

elements of G is called the order of G and is denoted |G|.

• Sn is a group of order n!.

• Zn is a group of order n.

• Z∗n is a group of order φ(n), where φ is Euler’s totient function; for example,

|Z∗12 | = 4.

• Z is an inﬁnite group.

• The positive reals R+ with the usual operation of multiplication is also an inﬁnite group.
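The order of Z∗n can be found by direct enumeration of the residues coprime to n; a quick Python sketch (not from the book) confirming |Z∗12| = φ(12) = 4:

```python
import math

# Sketch (not from the book): Z*_n consists of the residues coprime to n,
# so its order is Euler's phi(n); here n = 12 gives phi(12) = 4.
n = 12
units = [a for a in range(1, n) if math.gcd(a, n) == 1]
print(units, len(units))   # [1, 5, 7, 11] 4
```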

Exercises

1. Show that division a ◦ b = a/b is a binary operation on R\{0}. Show that it is not

associative.

2. Show that a ◦ b = a^b is a binary operation on the set R+ of positive real numbers.

Show that it does not have a neutral element.

3. Let Cn be the set of all complex numbers satisfying the equation zn = 1. Prove

that this is an abelian group of order n.

4. Prove that the set GLn (R) of all invertible n × n matrices is a non-abelian group.

5. Prove that for four arbitrary elements g1 , g2 , g3 , g4 of a group G (where the operation is written as multiplication)

g1 (g2 (g3 g4 )) = ((g1 g2 )g3 )g4 .

List all possible arrangements of brackets on the product g1 g2 g3 g4 and show that

the result will be always the same so that we can write

g1 g2 g3 g4

for all of them without confusion. Finally you may try to prove that a product

g1 g2 . . . gn

does not depend on the way in which its factors are combined and associated. Hint: You need to use a clever induction here.

Deﬁnition 3.2.3 Let G be a group written multiplicatively, g an element of G, e the identity element of G, and n ∈ Z. We deﬁne

g n = g g · · · g (n times) if n > 0,
g n = e if n = 0,
g n = g −1 g −1 · · · g −1 (|n| times) if n < 0.

Since we know that the product g1 g2 . . . gn is independent of the way in which these

elements are associated, it becomes clear that the usual law of exponents g i g j = g i+j

holds (totally obvious in the case where both i and j are positive, and still trivial in

all other cases). The set of all powers of g ∈ G we denote by < g >.

Deﬁnition 3.2.4 Let G be a group written additively, g an element of G, 0 the identity element of G, and n ∈ Z. We deﬁne

ng = g + g + · · · + g (n times) if n > 0,
ng = 0 if n = 0,
ng = (−g) + (−g) + · · · + (−g) (|n| times) if n < 0.

The usual law of multiples mg + ng = (m + n)g also holds. The set of all

multiples of g ∈ G we also denote by < g >.

Deﬁnition 3.2.5 A group G consisting of all powers (in additive notation, all multiples) of a single element g is called cyclic. This fact can be written as G = < g >. The element g in this case is called the generator of G.


Example 3.2.5 Several examples:

• Sn is NOT a cyclic group since it is not abelian.

• Zn = < 1 > and is cyclic.

• Z∗5 = < 2 > and is cyclic. Check this by calculating all powers of 2.

• Z = < 1 > is an inﬁnite cyclic group.

Later (see, e.g., Exercise 4) we will see that abelian groups do not have to be

cyclic.

Deﬁnition 3.2.6 Let G be a group, g an element of G, and e the identity element of G. Then the order of g in G is the least positive integer i such that g i = e, if such an integer exists; otherwise we say that the order of g is inﬁnite. It is denoted ord (g).

We note that this deﬁnition is consistent with the deﬁnition of the order of a

permutation given earlier.

In an additively written group G the order of g ∈ G is the least positive integer m

such that mg = 0, if such an integer exists; if no such integer exists, we say that the

order of g is inﬁnite.

Example 3.2.6 Conﬁrm for yourself that:

• Each of the non-identity elements of Z∗12 has order 2;

• In Z12 the element 10 has order 6;

• Element 6 in the group Z has inﬁnite order.

As we will see later, in a ﬁnite group G, the orders of its elements and the order

of the group |G| are closely related.

We start to establish this link with the following

Lemma 3.2.1 If ord (g) = n, then < g > = {e, g, g 2 , . . . , g n−1 }, and all n powers

of g in this set are distinct, i.e., | < g > | = n. Conversely, if | < g > | = n, then g

is an element of order n.

Proof Suppose ord (g) = n. Then g n = e and all powers of g belong to the set

{e, g, g 2 , . . . , g n−1 }. Indeed, for any k ∈ Z we may divide k by n with remainder

k = qn + r, where 0 ≤ r < n. Then g k = g qn+r = g qn g r = (g n )q g r = g r ,

which belongs to {e, g, g 2 , . . . , g n−1 }. Hence < g > = {e, g, g 2 , . . . , g n−1 }. On the other hand, if any two powers in this set are equal, say g i = g j with i < j, then g i g j−i = g j = g i , whence g j−i = e. This is a contradiction since j − i < n and n is

the order of g. Therefore, if the order of g is ﬁnite, the order of g is the same as the

cardinality of the set < g >.

Suppose now that the cardinality of < g > is n. Then we have only n distinct

powers of g and there will exist two distinct integers k and m such that g k = g m .

If we assume that k > m then we will ﬁnd that g k−m = e, and g will have ﬁnite

order. We have already proved that in this case the order of g and the size of < g >

coincide. Hence ord (g) = n. �


In the following corollary we ﬁnd a link between the two concepts of ‘order’. It

is often useful since we can decide whether a group is cyclic or not by looking at the

orders of its elements.

Corollary 3.2.1 A ﬁnite group G of order n is cyclic if and only if it has an element of order n.

Example 3.2.7 Z∗8 is NOT a cyclic group, because |Z∗8 | = φ(8) = 4 and there is no

element of order 4 in this group (indeed, check that they all have order 2).
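That check is quick to do by machine (a Python sketch, not from the book):

```python
# Sketch (not from the book): the order of each element of Z8* = {1, 3, 5, 7}.
# No element has order 4 = |Z8*|, so the group is not cyclic.
def order_mod(a, n):
    k, power = 1, a % n
    while power != 1:          # smallest k with a^k ≡ 1 (mod n)
        power = power * a % n
        k += 1
    return k

print({a: order_mod(a, 8) for a in (1, 3, 5, 7)})   # {1: 1, 3: 2, 5: 2, 7: 2}
```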

The following theorem allows us to calculate the orders of all elements in Zn .

Theorem 3.2.1 The order of i ∈ Zn is ord (i) = n/gcd(i, n).

Proof To see this, note that the group is written additively, so the order of i is the

smallest positive integer k such that ki ≡ 0 mod n. That is, ki is the smallest positive

number which is a multiple of i as well as of n. This means that ki = lcm(i, n). Now

solve this equation for k using (1.13):

k = lcm(i, n)/i = in/(i gcd(i, n)) = n/gcd(i, n). �

For example, in Z121 we have

ord (110) = 121/gcd(121, 110) = 121/11 = 11.
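The formula of Theorem 3.2.1 can be checked against a brute-force count of multiples; a Python sketch (not from the book):

```python
import math

# Brute-force check of Theorem 3.2.1, ord(i) = n / gcd(i, n) in Zn
# (a sketch, not from the book).
def order_in_Zn(i, n):
    k = 1
    while k * i % n != 0:      # smallest positive k with k*i ≡ 0 (mod n)
        k += 1
    return k

assert all(order_in_Zn(i, 12) == 12 // math.gcd(i, 12) for i in range(1, 12))
print(order_in_Zn(110, 121))   # 11
```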

Exercises

2. Find all elements of order 7 in Z84 .

3. Find the order of i = 41670852902912 in the abelian group Zn , where n =

563744998038700032.

4. Show that Z∗12 is an abelian group which is not cyclic.

5. Show that the order of the interlacing shufﬂe σn (deﬁned in Sect. 3.1.5) is equal

to the order of 2 in Z∗2n+1 .

3.2.3 Isomorphism

A single group may have several very different presentations. To deal with this

problem mathematics introduces the concept of isomorphism.


Deﬁnition 3.2.7 Let G and H be two groups with operations ∗ and ◦, respectively. An onto and one-to-one mapping σ : G → H is called an isomorphism if

σ(g1 ∗ g2 ) = σ(g1 ) ◦ σ(g2 )      (3.9)

for all g1 , g2 ∈ G.

What it says is that if we rename the elements of H appropriately and change the

name for the operation in H, we will obtain the group G. If two groups G and H are

isomorphic, we write G ∼= H. The equation (3.9), written as

g1 ∗ g2 = σ −1 (σ(g1 ) ◦ σ(g2 )),

shows that instead of computing the product directly, for any two elements g1 and g2 in the group G one can compute σ(g1 ) ◦ σ(g2 ) for the images σ(g1 ) and σ(g2 ) of these elements in H and take the preimage of the result.

A historically important example of an isomorphism is that of the group R, which is the reals with the operation of addition, and the group R+, which is the positive reals with the operation of multiplication. The isomorphism σ : R → R+ between these two groups is given by σ(x) = e^x. Indeed, the condition (3.9) is satisfied since

σ(x + y) = e^{x+y} = e^x · e^y = σ(x)σ(y).

The famous slide rule—a commonly used calculation tool in science and engineering before electronic calculators became available—was based on this isomorphism.

Consider now the groups Z4 and Z∗5; we claim that Z4 ∼= Z∗5. Let us look at their addition and multiplication tables, respectively.

        Z4                      Z∗5

 ⊕ | 0 1 2 3             · | 1 2 3 4
---+---------           ---+---------
 0 | 0 1 2 3             1 | 1 2 3 4
 1 | 1 2 3 0             2 | 2 4 1 3
 2 | 2 3 0 1             3 | 3 1 4 2
 3 | 3 0 1 2             4 | 4 3 2 1

We may observe that the ﬁrst table can be converted into the second one if we

make the following substitution:

0 → 1, 1 → 2, 2 → 4, 3 → 3

(check it right now). Therefore this mapping, let us call it σ, from Z4 to Z∗5 is an

isomorphism. The mystery behind this mapping is clariﬁed if we notice that we

actually map


0 → 2^0, 1 → 2^1, 2 → 2^2, 3 → 2^3.

Then the isomorphism property (3.9) follows from the formula 2^i · 2^j = 2^{i⊕j}.
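This renaming can be verified mechanically. A small Python check (an illustration, not part of the text's GAP sessions) confirms that σ(i) = 2^i mod 5 is one-to-one, onto, and respects the two operations:

```python
# sigma maps (Z4, ⊕) to (Z5*, ·) by i -> 2^i mod 5.
sigma = {i: pow(2, i, 5) for i in range(4)}

# One-to-one and onto {1, 2, 3, 4}.
assert sorted(sigma.values()) == [1, 2, 3, 4]

# The isomorphism property (3.9): sigma(i ⊕ j) = sigma(i) * sigma(j) mod 5.
assert all(sigma[(i + j) % 4] == sigma[i] * sigma[j] % 5
           for i in range(4) for j in range(4))
```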

Before continuing with the study of isomorphisms we make a useful observation: in any group G the only element that satisfies g^2 = g is the identity element. Indeed, multiplying this equation by g^{−1} we get g = e.

Proposition 3.2.1 Let (G, ∗) and (H, ◦) be two groups and e be the identity element

of G. Let σ : G → H be an isomorphism of these groups. Then σ(e) is the identity

of H.

Proof Denote f = σ(e), where e is the identity element of G. Let us prove that f is the identity element of H. We note that

f^2 = σ(e)^2 = σ(e^2) = σ(e) = f.

Any element in a group with this property must be the identity, so f is the identity of H. �

Theorem 3.2.2 Let (G, ∗) and (H, ◦) be two groups and σ : G → H be an isomor-

phism. Then σ −1 : H → G is also an isomorphism.

Proof We have to check that σ^{−1}(h1 ◦ h2) = σ^{−1}(h1) ∗ σ^{−1}(h2) for all h1, h2 ∈ H. To this end we apply σ to both sides of this equation. As σσ^{−1} = idH and σ^{−1}σ = idG, and due to (3.9),

σ(σ^{−1}(h1 ◦ h2)) = h1 ◦ h2,

σ(σ^{−1}(h1) ∗ σ^{−1}(h2)) = σ(σ^{−1}(h1)) ◦ σ(σ^{−1}(h2)) = h1 ◦ h2.

Since σ is one-to-one, this implies σ^{−1}(h1 ◦ h2) = σ^{−1}(h1) ∗ σ^{−1}(h2). �

Theorem 3.2.3 Let σ : G → H be an isomorphism and let g ∈ G be an element of finite order. Then ord (g) = ord (σ(g)).

Proof Denote f = σ(e), where e is the identity element of G; by Proposition 3.2.1, f is the identity of H. Suppose now ord (g) = n. Then g^n = e. Let us now apply σ to both sides of this equation. We obtain σ(g)^n = σ(g^n) = σ(e) = f, from which we see that ord (σ(g)) ≤ n, i.e., ord (σ(g)) ≤ ord (g). Since σ^{−1} is also an isomorphism, which takes σ(g) to g, we obtain ord (g) ≤ ord (σ(g)). This proves the theorem. �

We now move on to one of the main theorems of this section. The theorem will,

in particular, give us a tool for calculating orders of elements of cyclic groups which

are also written multiplicatively.

Theorem 3.2.4 Every cyclic group G of order n is isomorphic to Zn.

Proof Since G = < g > has cardinality n, by Lemma 3.2.1 we have ord (g) = n and G = {g^0, g^1, g^2, . . . , g^{n−1}}. We define σ : Zn → G by setting σ(i) = g^i. Then

σ(i ⊕ j) = g^{i⊕j} = g^i g^j = σ(i)σ(j),

where ⊕ is addition modulo n. This checks (3.9) and proves that the mapping σ is indeed an isomorphism. �

Now we can reap the benefits of Theorem 3.2.4.

Corollary 3.2.2 Let G be a multiplicative cyclic group and G = < g >, where g is an element of order n. Then

ord (g^i) = n/gcd(i, n). (3.11)

Proof This now follows from the theorem we have just proved and Theorems 3.2.1 and 3.2.3. Indeed, the order of g^i in G must be the same as the order of i in Zn. �
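Corollary 3.2.2 can be tested numerically. The sketch below (Python rather than the book's GAP; the group Z∗13 is cyclic with generator 2, as shown later in this section) computes multiplicative orders by brute force:

```python
from math import gcd

def mult_order(a, p):
    """Multiplicative order of a modulo p (assumes gcd(a, p) = 1)."""
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k

g, p, n = 2, 13, 12          # 2 generates the cyclic group Z13* of order 12
assert mult_order(g, p) == n
# Corollary 3.2.2: ord(g^i) = n / gcd(i, n).
assert all(mult_order(pow(g, i, p), p) == n // gcd(i, n) for i in range(1, n))
```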

Exercises

1. Let σ : G → H be an isomorphism and g be an element of G. Prove that σ(g −1 ) =

σ(g)−1 .

2. Let Cn be the group of all complex numbers satisfying the equation z^n = 1. Prove that Cn ∼= Zn.

3. Prove that the multiplicative group of complex numbers C∗ is isomorphic to the

group of matrices

G = { ( a  −b
        b   a ) | a, b ∈ R, (a, b) ≠ (0, 0) }

with the operation of matrix multiplication.

4. Both groups G 1 = Z∗191 and G 2 = Z∗193 are cyclic (do not try to prove this).

Which of these groups contains elements of order 19? How many?

5. Knowing that 2 is a generating element for the cyclic group Z∗211, determine the order of 2^150 in Z∗211.

6. 264 is a generator of Z∗271, i.e., the (multiplicative) order of 264 in Z∗271 is 270, as is shown by the following calculation:

gap> OrderMod(264,271);

270

3.2.4 Subgroups

Definition 3.2.8 A subset H of a group G is called a subgroup of G (we write H ≤ G) if it satisfies the following properties:

1. The identity element e of G belongs to H.


2. H is closed under the group operation; i.e., if a and b belong to H, then ab also

belongs to H.

3. H is closed under inverses; i.e., if a belongs to H then a−1 also belongs to H.

Every group G has two obvious subgroups: G ≤ G and {e} ≤ G. These are trivial examples. Let us consider a non-trivial one.

Firstly, we would like to introduce a construction which, given an element g ∈ G, will always give us a subgroup containing this element. Moreover, this subgroup will be the smallest subgroup with this property. This is the familiar < g > = {g^i | i ∈ Z}, which is the set of all powers of g.

Proposition 3.2.2 The set < g > is the smallest subgroup of G that contains g.

Proof To decide whether or not < g > is a subgroup, we must answer three questions:

• Does the identity e of G belong to < g >? The answer is YES, because g^0 = e and < g > consists of all powers of g.
• If x, y ∈ < g >, does xy also belong to < g >? x ∈ < g > means that x = g^i for some integer i; similarly, y = g^j for some integer j. Then xy = g^i g^j = g^{i+j}, which shows that xy is a power of g and therefore belongs to < g >.
• If x ∈ < g >, does x^{−1} also belong to < g >? x ∈ < g > means that x = g^i for some integer i; then x^{−1} = g^{−i}, i.e., x^{−1} is also a power of g and therefore belongs to < g >.

So < g > is indeed a subgroup. It is the smallest subgroup containing g ∈ G since

any subgroup that contains g must also contain all powers of g. �
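The three subgroup properties of < g > can be checked directly for a concrete group. A Python sketch, taking g = 2 inside Z∗17 (an illustrative choice, not an example from the text):

```python
n = 17                           # we work inside the group Z17*

def cyclic_subgroup(g):
    """The set <g> of all powers of g modulo n."""
    powers, x = set(), 1
    while x not in powers:
        powers.add(x)
        x = x * g % n
    return powers

H = cyclic_subgroup(2)
assert 1 in H                                        # contains the identity
assert all(a * b % n in H for a in H for b in H)     # closed under products
assert all(pow(a, -1, n) in H for a in H)            # closed under inverses
assert len(H) == 8                                   # since 2^8 ≡ 1 (mod 17)
```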

Another example gives us a subgroup of a non-commutative group. Consider the subset V = {e, a, b, c} of S4, where e is the identity permutation and

a = (1 2)(3 4), b = (1 3)(2 4), c = (1 4)(2 3).

Let us check the three subgroup properties:

1. The identity e belongs to V. This is obvious.
2. The product of two elements of V also belongs to V. We check:

ab = ba = c, bc = cb = a, ac = ca = b, a^2 = b^2 = c^2 = e.

3. V is closed under taking inverses. This is also true since a^{−1} = a, b^{−1} = b, c^{−1} = c.


We see that V is indeed a subgroup of S4 . This group is known as the Klein four-group.

Additional information about orders may be extracted using Lagrange’s Theorem.

We will state and prove this theorem below, but ﬁrst we need to introduce the cosets

of a subgroup. Let G be a group, H a subgroup of G, and g ∈ G. The set gH = {gh |

h ∈ H} is called a left coset of H and the set Hg = {hg | h ∈ H} is called a right

coset of H.

Example 3.2.12 Let us consider G = S4 and H = V , the Klein four-group which

is a subgroup of S4 . Let g = (12). Then the corresponding left coset consists of the

permutations

(12)V = {(12), (34), (1 4 2 3), (1 3 2 4)}.

Proposition 3.2.3 If H is ﬁnite, then |gH| = |Hg| = |H| for any g ∈ G.

Proof We need to prove that all elements gh are different, i.e., if gh1 = gh2 , then

h1 = h2 . This is obvious since we can multiply both sides of the equation gh1 = gh2

by g −1 on the left. This proves |gH| = |H|. The proof of |Hg| = |H| is similar. �

We are now ready to state and prove Lagrange’s Theorem.

Theorem 3.2.5 (Lagrange’s Theorem) Let G be a ﬁnite group, H a subgroup of G.

Then the order of H is a divisor of the order of G.

Proof Our proof relies on the decomposition of G into a disjoint union of left cosets

of H, all of which have the same number of elements, namely |H|. Let us prove that such a decomposition exists. All we need to show is that any two cosets are either disjoint or coincide.

Suppose the two cosets aH and bH have a nonempty intersection, i.e., ah1 = bh2 for some h1, h2 ∈ H. Then b^{−1}a = h2h1^{−1} ∈ H. In this case any element ah ∈ aH can be expressed as b(b^{−1}a)h, where (b^{−1}a)h belongs to H. This proves aH ⊆ bH

and hence aH = bH as both sets have the same cardinality. Hence these cosets must

coincide. We obtain a partition of G into a number of disjoint cosets each of which

has cardinality |H|. If k is the number of cosets in the partition, then in total G has

k|H| elements. This proves the theorem. �
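The coset decomposition in the proof is easy to see in a small example. Below, Python is used to partition the additive group Z12 into cosets of the subgroup H = {0, 4, 8} (a choice made here purely for illustration):

```python
n, H = 12, {0, 4, 8}             # H is a subgroup of the additive group Z12
cosets = {frozenset((g + h) % n for h in H) for g in range(n)}

assert all(len(c) == len(H) for c in cosets)   # every coset has |H| elements
assert set().union(*cosets) == set(range(n))   # together they cover the group
assert len(cosets) * len(H) == n               # so |H| divides |G| = 12
```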

Corollary 3.2.3 The order of an element g of a finite group G is a divisor of the order of G. In particular, g^{|G|} = 1.

Proof Just note that by Lemma 3.2.1 the order of an element g ∈ G equals the order

of the subgroup < g > of G. Then Lagrange’s theorem implies that the order of g

is a divisor of |G|. Let ord (g) = m, |G| = n and n = mk for some integer k. Then g^n = g^{mk} = (g^m)^k = 1^k = 1. �

Example 3.2.13 Find the order of the element 2 ∈ Z∗17 .

A naive approach is to calculate all powers of 2, until one such power is found

to be the identity. We have a more economical way to ﬁnd the order: since Z∗17 has

16 elements, it is sufficient to calculate all the powers 2^i where i is a divisor of 16 until the result equals 1. We know that 2^16 mod 17 = 1 and we need to calculate


only 2^2 mod 17, 2^4 mod 17, and 2^8 mod 17. Our calculations will terminate when we find that 2^8 ≡ 1 mod 17; the order of 2 in Z∗17 is therefore 8.
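The same shortcut is immediate to program. A Python version of the calculation above (checking only the divisors of |Z∗17| = 16):

```python
p = 17
# By Lagrange's Theorem the order of 2 divides |Z17*| = 16, so only
# the divisors of 16 need to be tested.
order = next(d for d in [1, 2, 4, 8, 16] if pow(2, d, p) == 1)
assert order == 8
```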

Consider now a similar question: is the order of 2 in Z∗13 equal to 12? Now we see that it is not necessary to calculate each of the powers of 2, but only those powers 2^i where i is a divisor of 12 (which is φ(13), as the order of the (multiplicative) group Z∗n is φ(n)). So we calculate 2^2 mod 13, 2^3 mod 13, 2^4 mod 13, and 2^6 mod 13. If none of them turn out to be 1, then we can be sure that the order of 2 in Z∗13 is 12, and that 2 is a generator of the group Z∗13 (which is therefore cyclic). It turns out that 2^k mod 13 ≠ 1 for k = 2, 3, 4, 6 and therefore 2 is indeed a generator of Z∗13.

Exercises

1. Let SLn (R) be the set of all real matrices with determinant 1. Prove that this is a

subgroup of GLn (R).

2. Let m, n be positive integers and let m be a divisor of n. Prove that Cm is a subgroup

of Cn .

3. Prove that a cyclic group G of order n has exactly φ(n) generators, i.e., elements

g ∈ G such that G =< g >.

4. Let G be a ﬁnite group with |G| even. Prove that it contains an element of order

2.

5. Prove that any finite subgroup of the multiplicative group C∗ of the field C of complex numbers is cyclic.

3.3 The Abelian Group of an Elliptic Curve

During the last 20 years, the theory of elliptic curves over finite fields has been

found to be of great value to cryptography. As methods of factorisation of integers

are getting better and computers are getting more powerful, to maintain the same

level of security the prime numbers p and q in RSA have to be chosen bigger and

bigger, which slows calculations down. The idea of using elliptic curves over ﬁnite

ﬁelds belongs to Neal Koblitz [1] and Victor Miller [2] who in 1985 independently

proposed cryptosystems based on groups of points of elliptic curves. By now their

security has been thoroughly tested and in 2009 the National Security Agency of

the USA stated that “Elliptic Curve Cryptography provides greater security and

more efﬁcient performance than the ﬁrst generation public key techniques (RSA and

Difﬁe–Hellman) now in use”. Some researchers also see elliptic curves as the source

of cryptosystems for the next generation. Certicom www.certicom.com was the ﬁrst

company to market security products using elliptic curve cryptography.


Elliptic curves are not ellipses and do not look like them. They received their name due

to their similarities with denominators of elliptic integrals that arise in calculations

of the arc length of ellipses.

Definition 3.3.1 Let F be a field, and a, b be scalars in F such that the cubic X^3 + aX + b has no multiple roots. An elliptic curve E over the field F is the set of solutions (X, Y) ∈ F^2 to the equation

Y^2 = X^3 + aX + b, (3.12)

When F is the field of real numbers the condition on the cubic can be expressed in terms of a and b. Let r1, r2, r3 be the roots (maybe complex) of X^3 + aX + b, taken together with their multiplicities, i.e., such that

X^3 + aX + b = (X − r1)(X − r2)(X − r3).

It can be calculated that

(r1 − r2)^2 (r1 − r3)^2 (r2 − r3)^2 = −(4a^3 + 27b^2).

This real number is called the discriminant of the cubic, and the cubic has no multiple roots if and only if this discriminant is nonzero, i.e.,

4a^3 + 27b^2 ≠ 0.

This condition also guarantees the absence of multiple roots over an arbitrary field F.

Example 3.3.1 Consider the elliptic curve over Z7 given by the equation

Y^2 = X^3 + 3X + 4.

The point (5, 2) ∈ Z7^2 belongs to this curve since 2^2 ≡ 5^3 + 3 · 5 + 4 mod 7, with both sides being equal to 4.

When F = R is the field of real numbers, the graph of an elliptic curve can have two different

forms depending on whether the cubic on the right-hand side of (3.12) has one or

three real roots (see Fig. 3.1).

3.3 The Abelian Group of an Elliptic Curve 105

Jacobi4 (1835) was the ﬁrst to suggest using the group law on a cubic curve. In

this section we will introduce the addition law for points of the elliptic curve (3.12),

so that it will become an abelian group. We will do this ﬁrst for elliptic curves over

the familiar ﬁeld of real numbers. These curves have the advantage that they can be

represented graphically.

Definition 3.3.2 For a point P = (x, y) ∈ E we define −P as the point (x, −y), which is symmetric to P about the x-axis. It is clear that (x, −y) ∈ E whenever (x, y) ∈ E. For two points P, Q ∈ E the sum P + Q is defined as follows:

(a) Suppose that P ≠ Q and that the line PQ is not parallel to the y-axis. Then PQ intersects E at a third point R. Then we define P + Q = −R (see Fig. 3.2).

(b) Suppose that P = Q and the tangent line to the curve at P is not parallel to the

y-axis. Further, suppose that the tangent line to the curve at P intersects E at

the third point R. Then we deﬁne 2P = P + P = −R. If the tangent line has a

“double tangency” at P, i.e., P is a point of inﬂection, then R is taken to be P.

(c) Suppose that P ≠ Q and PQ is parallel to the y-axis. Then we define P + Q = ∞.

(d) Suppose that P = Q and the tangent line to the curve at P is parallel to the y-axis.

Then we deﬁne 2P = P + P = ∞.

(e) For every P ∈ E (including P = ∞) we deﬁne P + ∞ = P.

4 Carl Gustav Jacob Jacobi (1804–1851) was a German mathematician who made fundamental

contributions to elliptic functions, dynamics, differential equations, and number theory.


Fig. 3.2 Addition of two points on an elliptic curve

Theorem 3.3.1 The elliptic curve E over R relative to this addition is an (inﬁnite)

abelian group. If P = (x1 , y1 ) and Q = (x2 , y2 ) are two points of E, then P + Q =

(x3 , y3 ), where

1. in case (a)

x3 = ((y2 − y1)/(x2 − x1))^2 − x1 − x2, (3.16)
y3 = −y1 + ((y2 − y1)/(x2 − x1))(x1 − x3); (3.17)

2. in case (b)

x3 = ((3x1^2 + a)/(2y1))^2 − 2x1, (3.18)
y3 = −y1 + ((3x1^2 + a)/(2y1))(x1 − x3). (3.19)

Proof First, we have to prove that the addition is defined for every pair of (not necessarily distinct) points of E. Suppose we are in case (a), which means x1 ≠ x2. Then we have to show that the third point R on the line PQ exists. The equation of this line is y = mx + c, where m = (y2 − y1)/(x2 − x1) and c = y1 − mx1. A point (x, mx + c) of this line lies on E if and only if (mx + c)^2 = x^3 + ax + b, that is, if and only if x is a root of the cubic polynomial

x^3 − m^2x^2 + (a − 2mc)x + (b − c^2). (3.20)


Since we already have two real roots x1 and x2 of this polynomial, we will have the third one as well. Dividing the left-hand side of (3.20) by (x − x1)(x − x2) will give the factorisation

x^3 − m^2x^2 + (a − 2mc)x + (b − c^2) = (x − x1)(x − x2)(x − x3),

where x3 is this third root. Knowing x1 and x2, the easiest way to find x3 is to notice that x1 + x2 + x3 = m^2, and express the third root as x3 = m^2 − x1 − x2. Since m = (y2 − y1)/(x2 − x1), this is exactly (3.16). Now we can also calculate y3 as follows:

y3 = −(mx3 + c) = −(mx3 + y1 − mx1) = −y1 + m(x1 − x3)

(remember (x3, y3) represents −R, hence the minus). This will give us (3.17).

Case (b) is similar, except that m can now be calculated as the derivative dy/dx at P. Implicit differentiation of (3.12) gives us

2y (dy/dx) = 3x^2 + a,

or dy/dx = (3x^2 + a)/2y. Hence m = (3x1^2 + a)/2y1. (We note that y1 ≠ 0 in this case.) This implies (3.18) and (3.19). �

It helps to visualise the point at inﬁnity ∞ as located far up the y-axis. Then it

becomes the third point of intersection of any vertical line with the curve. Then (c),

(d), and (e) of Deﬁnition 3.3.2 will implement the same set of rules as (a) and (b),

for the case when the point at inﬁnity is involved.

We deduced formulae (3.16)–(3.19) for the real ﬁeld R but they make sense for

any ﬁeld. Of course we have to remove references to parallel lines and interpret the

addition rule in terms of coordinates only.

Deﬁnition 3.3.4 Let F be a ﬁeld and let E be the set of pairs (x, y) ∈ F 2 satisfying

(3.12) plus a special symbol ∞. Then for any (x1 , y1 ), (x2 , y2 ) ∈ E we deﬁne:

(a) If x1 ≠ x2, then (x1, y1) + (x2, y2) = (x3, y3), where x3, y3 are defined by formulae (3.16) and (3.17).

(b) If y1 ≠ 0, then (x1, y1) + (x1, y1) = (x3, y3), where x3, y3 are defined by formulae (3.18) and (3.19).

(c) (x, y) + (x, −y) = ∞ for all (x, y) ∈ E (including the case y = 0).

(d) (x, y) + ∞ = ∞ + (x, y) = (x, y) for all (x, y) ∈ E.

(e) ∞ + ∞ = ∞.

Theorem 3.3.2 For any ﬁeld F and for any elliptic curve

Y^2 = X^3 + aX + b, a, b ∈ F,

the set E with the operation of addition deﬁned in Deﬁnition 3.3.4 is an abelian group.


Proof It is easy to check that the identity element is ∞ and the inverse of P = (x, y)

is −P = (x, −y). So two axioms of a group are obviously satisﬁed. It is not easy

to prove that the addition, so deﬁned, is associative. We omit this proof since it is a

tedious calculation. �

Example 3.3.2 Suppose F = Z11 and the curve is given by the equation Y^2 = X^3 + 7.

Then P = (5, 0) and Q = (3, 10) belong to the curve. We have

P + Q = (6, 5), 2Q = (3, 1), 2P = ∞.

Indeed, if P + Q = (x3, y3), then m = (y2 − y1)/(x2 − x1) = 10/(−2) = −5 = 6 and

x3 = m^2 − x1 − x2 = 3 − 5 − 3 = 6,
y3 = −y1 + m(x1 − x3) = 0 + 6 · (−1) = 5,

so P + Q = (6, 5). Calculating 2Q = (x4, y4), we get m = (3 · 3^2 + 0)/(2 · 10) = 27/20 = 3 and

x4 = m^2 − 2x1 = 9 − 2 · 3 = 3,
y4 = −y1 + m(x1 − x4) = −10 + 3 · 0 = 1,

so 2Q = (3, 1). The last equation 2P = ∞ follows straight from the deﬁnition (part

(c) of Deﬁnition 3.3.4).
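Definition 3.3.4 translates almost line by line into code. The Python sketch below (an illustration alongside the GAP session that follows) implements the addition law over Zp and reproduces the computations of Example 3.3.2; INF stands for the point at infinity:

```python
INF = None   # the point at infinity

def ec_add(P, Q, a, p):
    """Addition on Y^2 = X^3 + aX + b over Zp, following Definition 3.3.4."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF                                      # case (c)
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # case (b), slope of tangent
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p         # case (a), slope of chord
    x3 = (m * m - x1 - x2) % p
    y3 = (-y1 + m * (x1 - x3)) % p
    return (x3, y3)

# Example 3.3.2: a = 0, b = 7, p = 11.
P, Q = (5, 0), (3, 10)
assert ec_add(P, Q, 0, 11) == (6, 5)
assert ec_add(Q, Q, 0, 11) == (3, 1)
assert ec_add(P, P, 0, 11) is INF
```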

The calculations in the last example can be done with GAP. The program has to read the files elliptic.gd and elliptic.gi first (given in the appendix). Then the command EllipticCurveGroup(a,b,p); calculates the points of the elliptic curve Y^2 = X^3 + aX + b mod p. As you see below, GAP uses the multiplicative notation for operations on elliptic curves:

Read("/.../elliptic.gd");

Read("/.../elliptic.gi");

gap> G:=EllipticCurveGroup(0,7,11);

EllipticCurveGroup(0,7,11)

gap> points:=AsList(G);

[ ( 2, 2 ), ( 2, 9 ), ( 3, 1 ), ( 3, 10 ), ( 4, 4 ), ( 4, 7 ), ( 5, 0 ),

( 6, 5 ), ( 6, 6 ), ( 7, 3 ), ( 7, 8 ), infinity ]

gap> P:=points[7];

( 5, 0 )

gap> Q:=points[4];

( 3, 10 )

gap> P*Q;

( 6, 5 )

gap> Qˆ2;

( 3, 1 )

gap> Pˆ2;

infinity


Exercises

Y^2 = X^3 + 4X + 11, Y^2 = X^3 + 6X + 11?

r1 + r2 + r3 = 0, r1 r2 + r1 r3 + r2 r3 = a, r1 r2 r3 = −b. (3.21)

4. Consider the elliptic curve E given by Y^2 = X^3 + X − 1 mod 7.

(a) Check that (1, 1), (2, 3), (3, 1), (4, 2), (6, 2) are points on E;

(b) Find another six points on this curve;

(c) Calculate −(2, 3), 2(4, 2), (1, 1) + (3, 1);

(d) Use GAP to show that E has 11 points in total.

5. Let F = Z13 and let the elliptic curve E be given by the equation Y^2 = X^3 + 5X + 1.

(a) Using GAP list all the elements of the abelian group E of this elliptic curve.

Hence ﬁnd the order of the abelian group E.

(b) Find (manually) the order of P = (0, 1) in E. Is E cyclic?

6. Using GAP generate the elliptic curve Y^2 = X^3 + 7X + 11 in Z46301. Determine

its order and check that it is cyclic.

Definition 3.3.5 Let F be a finite field. An element h ∈ F∗ is called a quadratic residue if there exists an element g ∈ F such that g^2 = h. Otherwise, it is called a quadratic non-residue.

Theorem 3.3.3 If F = Zp for a prime p > 2, then exactly half of all nonzero elements of the field Zp are quadratic residues.

Proof Since p is odd, p − 1 is even. Then all nonzero elements of Zp can be split

into pairs,

Zp \ {0} = {±1, ±2, . . . , ±(p − 1)/2}.

Since i^2 = (−i)^2, each pair gives us only one quadratic residue, hence we cannot have more than (p − 1)/2 quadratic residues. On the other hand, if we have x^2 = y^2, then x^2 − y^2 = (x − y)(x + y) = 0. Due to the absence of zero divisors, we then have x = ±y. Therefore we have exactly (p − 1)/2 nonzero quadratic residues. �

Example 3.3.3 In Z7 the set of nonzero quadratic residues is {1, 2, 4}.


The question of whether a given element a ∈ Z∗p is a quadratic residue or non-residue is of great importance for applications of elliptic curves. Even more important are the algorithms for finding a square root of a, if it exists. The first question can be efficiently solved by using the following criterion.

Theorem 3.3.4 (Euler's criterion) Let p be an odd prime and a ∈ Z∗p. Then

a^{(p−1)/2} = 1 if a is a quadratic residue,
a^{(p−1)/2} = −1 if a is a quadratic non-residue.

Proof If a is a quadratic residue, then a = b^2 for some b ∈ Z∗p, and by Fermat's Little Theorem

a^{(p−1)/2} = b^{p−1} = 1.

For the converse see Exercise 5. �

The importance of this criterion is that we can use the Square and Multiply algorithm to raise a to the power of (p − 1)/2 and thus check if a is a quadratic residue or not.

By Theorem 2.3.2 the Square and Multiply algorithm has linear complexity, hence

this is an easy problem to solve. It is somewhat more difﬁcult to ﬁnd a square root of

an element of Zp , given that it is a quadratic residue. Reasonably fast polynomial time

algorithms exist—most notably the Tonelli–Shanks algorithm, however it is not fully

deterministic as it requires ﬁnding at least one quadratic non-residue. This necessary

quadratic non-residue is easy to ﬁnd using the trial and error method with the aver-

age expected number of trials being only 2. No fully deterministic polynomial-time

algorithm is known.

GAP uses the Tonelli–Shanks algorithm to extract square roots in ﬁnite ﬁelds. For

example, the following calculation shows that 12 is a quadratic non-residue in Z103

and 13 is a quadratic residue in this ﬁeld:

gap> RootMod(12,103);

fail

gap> RootMod(13,103);

61
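Euler's criterion gives a one-line Python analogue of this test. Extracting the root is also elementary in the well-known special case p ≡ 3 (mod 4), where a^{(p+1)/4} is a square root of any quadratic residue a; general p needs Tonelli–Shanks, as in GAP's RootMod. A sketch (Python here, mirroring the GAP session above):

```python
def is_qr(a, p):
    """Euler's criterion: a^((p-1)/2) ≡ 1 (mod p) iff a is a quadratic residue."""
    return pow(a, (p - 1) // 2, p) == 1

p = 103                          # note 103 ≡ 3 (mod 4)
assert not is_qr(12, p)          # matches RootMod(12,103) returning fail
assert is_qr(13, p)              # matches RootMod(13,103) succeeding
r = pow(13, (p + 1) // 4, p)     # a square root of 13, namely 61 or 103 - 61
assert r * r % p == 13
```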

Let p be a large prime. Let us try to estimate the number of points on the elliptic curve Y^2 = f (X) over Zp, where f (X) is a cubic. For a solution with the first

coordinate X to exist it is necessary and sufﬁcient that f (X) is a quadratic residue. It

is plausible to suggest that f (X) will be a quadratic residue for approximately half of

all points X ∈ Zp . On the other hand, if f (X) is a nonzero quadratic residue, then the

equation Y 2 = f (X) will have two solutions with X as the ﬁrst coordinate. Hence it

is reasonable to expect that the number of points on the curve will be approximately

2 · (p/2) + 1 = p + 1 (p plus the point at infinity). Hasse5 (1930) gave the exact bound,

which we give here without a proof:

5 Helmut Hasse (1898–1979) was a German mathematician working in algebraic number theory,

known for many fundamental contributions. The period when Hasse’s most important discoveries

were made was a very difﬁcult time for German mathematics. When the Nazis came to power in

1933 a great number of mathematicians with Jewish ancestry were forced to resign and many of

them left the country. Hasse did not compromise his mathematics for political reasons, he struggled


Theorem 3.3.5 (Hasse’s Theorem) Suppose E is an elliptic curve over Zp and let

N be the number of points on E. Then

p + 1 − 2√p ≤ N ≤ p + 1 + 2√p. (3.22)

It was also shown that for any p and N satisfying (3.22) there exists a curve over Zp

having exactly N points.
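Hasse's bound is easy to confirm for small primes by brute-force point counting. A Python sketch (an illustration, using the curve of Example 3.3.2, whose 12 points were listed by GAP above):

```python
def count_points(a, b, p):
    """Number of points on Y^2 = X^3 + aX + b over Zp, including infinity."""
    N = 1                                            # the point at infinity
    for x in range(p):
        fx = (x * x * x + a * x + b) % p
        N += sum(1 for y in range(p) if y * y % p == fx)
    return N

N, p = count_points(0, 7, 11), 11
assert N == 12                        # 11 affine points plus infinity
assert (N - (p + 1)) ** 2 <= 4 * p    # Hasse's bound (3.22)
```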

As we have already seen, cryptography works with large objects with which it is

difﬁcult to calculate. Large elliptic curves are of great interest to it. Hasse’s theorem

says that to have a large curve we need a large ﬁeld. This can be achieved in two

ways. The ﬁrst is to have a large prime p. The second is to keep p small but to try

to build a new large ﬁeld F, as an extension of Zp . As we will see later, for every n

there is a ﬁeld containing exactly q = pn elements. There is a more general version

of Theorem 3.3.5 which also often goes by the name of “Hasse’s Theorem”.

Theorem 3.3.6 Suppose E is an elliptic curve over a ﬁeld F containing q elements

and let N be the number of points on E. Then

q + 1 − 2√q ≤ N ≤ q + 1 + 2√q. (3.23)

For cryptographic purposes, it is not uncommon to use elliptic curves over fields of 2^150 or more elements. It is worth noting that for n ≥ 20 it is infeasible to list all points on the elliptic curve over a field of 2^n elements.

Despite the fact that each curve has quite a few points there does not exist a

deterministic algorithm which will produce, in less than exponential time, a point

on a given curve Y^2 = f (X). In particular, it is difficult to find X such that f (X) is a

quadratic residue. In practice, fast probabilistic methods are used.

Example 3.3.4 Let F = Z5. Consider the curve Y^2 = X^3 + 2. Let us list all the points on this curve and calculate the addition table for the corresponding abelian

group E. The quadratic residues of Z5 are 1 = 1^2 = 4^2 and 4 = 2^2 = 3^2. We shall list all possibilities for x and in each case see what y can be:

x = 0 =⇒ y^2 = 2, no solution
x = 1 =⇒ y^2 = 3, no solution
x = 2 =⇒ y^2 = 0 =⇒ y = 0
x = 3 =⇒ y^2 = 4 =⇒ y = 2, 3
x = 4 =⇒ y^2 = 1 =⇒ y = 1, 4

Hence we can list all the points of E. We have E = {∞, (2, 0), (3, 2), (3, 3), (4, 1),

(4, 4)}. Let us calculate the addition table.

(Footnote 5 continued)

against Nazi functionaries who tried (sometimes successfully) to subvert mathematics to political

doctrine. On the other hand, he made no secret of his strong nationalistic views and his approval of

many of Hitler’s policies.


  +   |   ∞   (2,0) (3,2) (3,3) (4,1) (4,4)
------+------------------------------------
  ∞   |   ∞   (2,0) (3,2) (3,3) (4,1) (4,4)
(2,0) | (2,0)   ∞   (4,1) (4,4)
(3,2) | (3,2) (4,1) (3,3)   ∞
(3,3) | (3,3) (4,4)   ∞   (3,2)
(4,1) | (4,1)                           ∞
(4,4) | (4,4)                     ∞   (3,2)

We see that 2 · (2, 0) = ∞, hence ord ((2, 0)) = 2. Also 3 · (3, 2) = 3 · (3, 3) = ∞, while 2 · (3, 2) ≠ ∞ and 2 · (3, 3) ≠ ∞, hence ord ((3, 2)) = ord ((3, 3)) = 3.
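The point enumeration of Example 3.3.4 can be double-checked with a direct search over all pairs (x, y); a short Python sketch:

```python
p, a, b = 5, 0, 2                # the curve Y^2 = X^3 + 2 over Z5
points = sorted((x, y) for x in range(p) for y in range(p)
                if (y * y - (x ** 3 + a * x + b)) % p == 0)
assert points == [(2, 0), (3, 2), (3, 3), (4, 1), (4, 4)]
assert len(points) + 1 == 6      # five affine points plus the point at infinity
```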

Exercises

1. Fill the remaining empty slots of the table above and ﬁnd the orders of (4, 1) and

(4, 4).

2. Find all quadratic residues of the ﬁeld Z17 .

3. Use Hasse’s theorem to estimate the number of points on an elliptic curve over

Z2011 .

4. Prove that:

(a) the product of two quadratic residues and the inverse of a quadratic residue

are quadratic residues;

(b) the product of a quadratic residue and a quadratic non-residue is a quadratic

non-residue;

(c) the product of two quadratic non-residues is a quadratic residue.

5. Prove that if a is a quadratic non-residue, then a^{(p−1)/2} = −1. (Use Wilson's theorem, which is Exercise 7 in Sect. 1.4.)

6. Use the trial and error method to ﬁnd a quadratic non-residue in Zp , where

p = 359334085968622831041960188598043661065388726959079837.

For calculating multiples efﬁciently the same rules apply as to calculating powers.

Below we give a complete analogue of the Square and Multiply algorithm.

Theorem 3.3.7 Given P ∈ E, for any positive integer N it is possible to calculate N · P using no more than 2⌊log2 N⌋ additions.

Proof We first write N in binary (which takes no more than ⌊log2 N⌋ divisions to convert N into binary representation):

N = 2^{m0} + 2^{m1} + · · · + 2^{ms},

where m0 = ⌊log2 N⌋ and m0 > m1 > · · · > ms. We can find all multiples 2^{mi} · P, i = 1, 2, . . . , s by successive doubling in m0 additions:

2^1 · P = P + P,
2^2 · P = 2^1 · P + 2^1 · P,
...
2^{m0} · P = 2^{m0−1} · P + 2^{m0−1} · P.

Now to calculate

N · P = 2^{m0} · P + 2^{m1} · P + · · · + 2^{ms} · P

we need no more than m0 extra additions. In total no more than 2m0 = 2⌊log2 N⌋ additions are needed. Since n = log2 N is the length of the input, the complexity function f (n) is at most linear in n, i.e., f (n) = O(n). �

The algorithm presented here can be called the Double and Add algorithm. It

has linear complexity. Up to isomorphism, this is the same algorithm as Square and

Multiply.
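Since only the group operation is used, Double and Add can be written once and applied to any abelian group. The sketch below (Python; an illustration) instantiates it with ordinary integer addition so the answer can be checked against multiplication, and counts the additions performed:

```python
def double_and_add(N, P, add, zero):
    """Compute N*P using only the group operation `add`, counting additions."""
    count, result, power = 0, zero, P
    while N:
        if N & 1:                      # this binary digit of N is 1
            result = add(result, power)
            count += 1
        power = add(power, power)      # successive doubling: P, 2P, 4P, ...
        count += 1
        N >>= 1
    return result, count

# Sanity check in the additive group of integers (any abelian group would do).
value, count = double_and_add(1729, 7, lambda x, y: x + y, 0)
assert value == 1729 * 7
assert count <= 2 * (1729).bit_length()   # roughly 2*log2(N) additions at most
```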

We see that it is an easy task to calculate multiples of any point P on an elliptic curve. That is, it is easy to calculate N · P given an integer N and a point P on the curve. However, there is no easy way to calculate N given N · P and P. So the function N → N · P is a one-way function, and it has been recognised by now that it has a great significance for cryptography. This branch of cryptography is called Elliptic Curve Cryptography (ECC). It was proposed in 1985 by Victor Miller and Neal Koblitz as a mechanism for implementing public-key cryptography alternative to RSA. We will show one of the cryptosystems of ECC in the next section.

Exercises

Let G be the abelian group of the elliptic curve Y^2 = X^3 + 1234X + 17 over Z346111.

(GAP will take a few seconds to generate this group, be patient.)

2. Calculate 123 · P.

3. If GAP uses the Double and Add algorithm to compute large multiples, how many

additions will GAP perform when calculating 1729 · P?

4. What is the order of P in G? (Use the command Order(P);.)

5. Calculate the order of G and decide whether P is a generator of G. (Use the

command Size(G);.)

3.4 Applications to Cryptography

It is not a trivial task to represent a message by a point on an elliptic curve. To illustrate the difficulties we may face here it is enough to say that there is no known polynomial time algorithm for finding a single point on the curve.

This problem has not yet been fully resolved. However, there are fast probabilistic

methods which work for most messages, but for a small proportion of them these

methods fail to produce a point. The probability of such an unwanted event can be

managed and made arbitrarily small.

The following method was suggested by Koblitz [1]. Suppose that we have an

elliptic curve over Zp given by the equation Y^2 = X^3 + aX + b. We may assume that

our message is already represented by a number m. We will try to embed this number

in the X-coordinate of the point P = (X, Y ) ∈ E. Of course, we would like to make

X = m but this is not always possible since f (m) = m^3 + am + b is a quadratic residue only in about 50 % of cases. A failure rate of 1/2 is, of course, unacceptable.

Suppose that a failure rate of 2^{−k} is acceptable for some sufficiently large positive integer k. Then, for each of the numbers mi = km + i, where 0 ≤ i < k, we check if f (mi) is a quadratic residue. If f (mi) is a quadratic residue, then we can find a point P = (X, Y) ∈ E, for which X = mi (using, for example, the GAP command RootMod(f(mi),p);). This will be the plaintext. The message m can always be recovered as m = ⌊X/k⌋. We should choose p sufficiently large so that (m + 1)k < p for any message m. Since we now have k numbers mi that represent the message, the probability that for none of them f (mi) is a quadratic residue will be less than 2^{−k}.

If k = 10, then this means that we can add another junk digit to m (it will be

placed in the rightmost position) in order to get a point on the curve. This junk digit

will be discarded at the receiving end. If k = 100, then we can add two junk digits.

This is already sufficient for practical purposes as 2^{−100} is very small.
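The whole embedding procedure fits in a few lines. A Python sketch with k = 10 and the curve and prime used in the example below; the block value 181126 (the letters "HAP" under the encoding A = 11, . . . , Z = 36) is an assumption made here purely for illustration:

```python
p, a, b, k = 17487707, 123, 456, 10   # curve and prime from the example below

def embed(m):
    """Koblitz embedding: append a junk digit i and keep the first X = 10m + i
    for which f(X) is a quadratic residue mod p (tested by Euler's criterion)."""
    for i in range(k):
        x = k * m + i
        if pow((x ** 3 + a * x + b) % p, (p - 1) // 2, p) == 1:
            return x                  # X-coordinate of a point on the curve
    return None                       # fails with probability about 2^(-k)

x = embed(181126)                     # hypothetical block "HAP" encoded as 181126
assert x is not None and x // k == 181126   # the message is recovered as ⌊X/k⌋
```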

Suppose we have chosen the prime number p = 17487707 and we would like to

represent the message “HAPPY NEW YEAR” using points of the elliptic curve

Y^2 = X^3 + 123X + 456 mod p. Let us encode letters as follows: A = 11, B = 12, . . ., Z = 36 and suppose we view the failure rate 2^{−10} as acceptable. Since our chosen prime has 8 digits we can make messages 6 digits long and still have a possibility to add one junk digit. This means we have to split our message into blocks with three letters in each:

3.4 Applications to Cryptography 115

Calculating this, we initially added a junk digit zero to every xi and tried to ﬁnd a

matching yi . If we failed, we would change the last digit to 1, and, in the case of

another failure to 2, etc. We see that x3 gave us a quadratic residue straightaway, x1

and x2 needed the second attempt with the last digit 1 and x4 needed three attempts

with the last digits 0,1,2.

Exercises

1. Use the trial and error method to ﬁnd a quadratic residue r and a quadratic non-

residue n in Zp , where

p = 359334085968622831041960188598043661065388726959079837.

2. Represent the message “CHRISTMAS” using the points of the elliptic curve

Y^2 = X^3 + 123X + 456 mod 17487707. (Note that you do not have to generate

the whole group of points for this curve, which would be time consuming.)

Cryptosystem

The exponential Difﬁe–Hellman key exchange can easily be adapted for elliptic

curves. Suppose that E is a publicly known elliptic curve over Zp . Alice and Bob,

through an open channel, agree upon a point Q ∈ E. Alice chooses a secret positive

integer kA (her private multiplier) and sends kA · Q to Bob. Bob chooses a secret pos-

itive integer kB (his private multiplier) and sends kB · Q to Alice. Bob then calculates

P = kB · (kA · Q) = kA kB · Q, and Alice calculates P = kA · (kB · Q) = kA kB · Q.

They now both know the point P which they can use as the key for a conventional

secret key cryptosystem. An eavesdropper wanting to spy on Alice and Bob would

face the following task called the Difﬁe–Hellman problem for elliptic curves:

The Difﬁe–Hellman Problem: Given Q, kA · Q, kB · Q (but not kA or kB ), ﬁnd

P = kA kB · Q. No polynomial time algorithms are known for this problem.
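The key exchange can be tried out on a toy curve. The sketch below uses the curve y^2 = x^3 + 2x + 2 over Z17 with base point Q = (5, 1) and small illustrative multipliers; none of these numbers come from the text.

```python
# Toy Diffie-Hellman key exchange on the curve y^2 = x^3 + 2x + 2 over Z_17.
# The base point Q = (5, 1) and the private multipliers are illustrative.
p, a = 17, 2                      # only the coefficient a enters the formulas

def ec_add(P, Q):
    """Add two points; None represents the point at infinity O."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None               # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, p - 2, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p

def ec_mul(k, P):
    """Compute k*P by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

Q = (5, 1)
kA, kB = 3, 7                     # Alice's and Bob's private multipliers
shared_A = ec_mul(kA, ec_mul(kB, Q))
shared_B = ec_mul(kB, ec_mul(kA, Q))
assert shared_A == shared_B       # both arrive at the same point P
```

An eavesdropper sees Q, kA·Q and kB·Q but, for a large curve, cannot feasibly recover kA·kB·Q.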

Elgamal (1985) modified the Diffie–Hellman idea to adapt it for message transmission (see [3], p. 287). It starts as above with Alice and Bob publicly announcing

Q and exchanging kB · Q and kA · Q, which play the role of their public keys. Alter-

natively you may think that there is a public domain run by a trusted authority where

Q is stored and that any new entrant, say Cathy, chooses her private multiplier kC

and publishes her public key kC · Q there.

Elgamal published a paper titled “A Public Key Cryptosystem and a Signature Scheme Based on

Discrete Logarithms” in which he proposed the design of the Elgamal discrete logarithm cryptosys-

tem and of the Elgamal signature scheme. He is also recognized as the “father of SSL”, which is

a protocol for transmitting private documents via the Internet that is now the industry standard for

Internet security and ecommerce.

116 3 Groups

Suppose now that messages have been encoded as points of E in some agreed upon way, and that Bob wants to send Alice a message M ∈ E. He chooses

a secret random integer s (for each message a distinct random number should be

generated), reads Alice’s public key kA · Q from the public domain and sends Alice

the pair of points C1 = s · Q and C2 = M + s · (kA · Q). On the receiving end, Alice,

using her private multiplier kA , can calculate the plaintext as M = C2 − kA · C1 .

Nobody else can do this without knowing Alice’s private multiplier kA .
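The encryption and decryption steps can likewise be sketched on a toy curve; the curve, base point, keys and message point below are illustrative choices, not the book's.

```python
# Toy Elgamal encryption on the curve y^2 = x^3 + 2x + 2 over Z_17.
import random

p, a = 17, 2

def ec_add(P, Q):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                      # point at infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, p - 2, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p

def ec_mul(k, P):
    R = None
    while k:
        if k & 1: R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def ec_neg(P):
    return None if P is None else (P[0], (-P[1]) % p)

Q = (5, 1)                               # public base point
kA = 5                                   # Alice's private multiplier
QA = ec_mul(kA, Q)                       # Alice's public key

def encrypt(M, s):                       # Bob: C1 = s*Q, C2 = M + s*QA
    return ec_mul(s, Q), ec_add(M, ec_mul(s, QA))

def decrypt(C1, C2):                     # Alice: M = C2 - kA*C1
    return ec_add(C2, ec_neg(ec_mul(kA, C1)))

M = ec_mul(3, Q)                         # a message encoded as a curve point
C1, C2 = encrypt(M, s=random.randrange(1, 19))
assert decrypt(C1, C2) == M
```

Note that a fresh random s must be chosen for every message, exactly as the text requires.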

Exercises

1. Alice and Bob are setting up the Elgamal elliptic curve cryptosystem for private

communication. They’ve chosen a point Q = (88134, 77186) on the elliptic curve

E given by Y^2 = X^3 + 12345 over Z95701. They’ve chosen their private multipliers

kA = 373 and kB = 5191 and published the points QA = (27015, 92968) and

QB = (55035, 17248), respectively. They agreed to cut the messages into two-

letter segments and encode the letters as A = 11, B = 12, . . . , Z = 36, space =

41, ’ = 42, . = 43, , = 44, ? = 45. They also agreed that, for each point (x, y),

only the ﬁrst four digits of x are meaningful (so that they can add additional junk

digits to their messages, if needed, to obtain a point on the curve).

(a) Alice got the following message. Decrypt it:

[ [ ( 50702, 2643 ), ( 33440, 56603 ) ],

[ ( 93385, 52237 ), ( 38536, 21346 ) ], [ ( 63482, 12110 ), ( 70599, 87781 ) ],

[ ( 16312, 46508 ), ( 62735, 69061 ) ], [ ( 64937, 58445 ), ( 41541, 36985 ) ],

[ ( 40290, 45534 ), ( 11077, 77207 ) ], [ ( 64001, 62429 ), ( 32755, 18973 ) ],

[ ( 81332, 47042 ), ( 35413, 9688 ) ], [ ( 5345, 68939 ), ( 475, 53184 ) ] ]

(b) She suspects that the sender of the message was Bob. Show how Alice may

reply to this message and how Bob will decrypt it.

References

1. Koblitz, N.: Elliptic curve cryptosystems. Math. Comput. 48, 203–209 (1987)

2. Miller, V.: Uses of Elliptic Curves in Cryptography. Advances in Cryptology—Crypto ’85, pp.

417–426 (1986)

3. Koblitz, N.: Algebraic Aspects of Cryptography. Springer, Berlin (1998)

4. Shanks, D.: Five number theoretic algorithms. In: Proceedings of the Second Manitoba Confer-

ence on Numerical Mathematics, pp. 51–70 (1973)

Chapter 4

Fields

Who sank on you with glory here?

Ruslan and Liudmila. Alexander Pushkin (1799–1837)

In Sect. 1.4 we deﬁned a ﬁeld and proved that, for any prime p, the set of integers

Z p = {0, 1, 2, . . . , p − 1} with the operations:

a ⊕ b = a + b mod p,
a ⊙ b = ab mod p

is a ﬁeld. This ﬁeld has cardinality p. So far, these are the only ﬁnite ﬁelds we have

learned. In this chapter we prove that a ﬁnite ﬁeld must have cardinality pn for some

prime p and positive integer n, i.e., its cardinality may only be a power of a prime.

Such ﬁelds exist and we lay the grounds for the construction of such ﬁelds in Chap. 5.
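The operations ⊕ and ⊙ on Zp are easy to experiment with. In the sketch below the inverses required by axiom F9 are computed via Fermat's little theorem (a^(p−2) ≡ a^−1 mod p), an aside not used in the text:

```python
# The operations a ⊕ b = a + b mod p and a ⊙ b = ab mod p on Z_p.
# Inverses are computed via Fermat's little theorem: a^(p-1) = 1 for a != 0,
# so a^(p-2) is the multiplicative inverse of a (an aside, not from the text).
p = 7

def add(a, b): return (a + b) % p
def mul(a, b): return (a * b) % p
def inv(a):    return pow(a, p - 2, p)

# every nonzero element has a multiplicative inverse (field axiom F9)
for x in range(1, p):
    assert mul(x, inv(x)) == 1

# linear equations ax = b with a nonzero are uniquely solvable: x = a^(-1) b
assert mul(inv(3), 5) == 4 and mul(3, 4) == 5
```

The same check works verbatim for any prime p; for composite moduli the inner assertion fails, reflecting the fact that Zn is not a field.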

In this chapter we also prove a very important result that the multiplicative group

of any ﬁnite ﬁeld is cyclic. This makes it possible to deﬁne “discrete logarithms”—

special functions on ﬁnite ﬁelds that are difﬁcult to compute, and widely used in

cryptography. We show that the Elgamal cryptosystem can also be based on the

multiplicative group of a large ﬁnite ﬁeld.

We recap that an algebraic system consisting of a set F equipped with two operations,

addition + and multiplication ·, is called a ﬁeld if the following nine axioms are

satisﬁed:

A. Slinko, Algebra for Applications, Springer Undergraduate Mathematics Series,

DOI 10.1007/978-3-319-21951-6_4

F1. Addition is commutative, a + b = b + a, for all a, b ∈ F.
F2. Addition is associative, a + (b + c) = (a + b) + c, for all a, b, c ∈ F.

F3. There exists a unique element 0 such that a + 0 = 0 + a = a, for all a ∈ F.

F4. For every element a ∈ F there exists a unique element −a such that a + (−a) =

(−a) + a = 0, for all a ∈ F.

F5. Multiplication is commutative, a · b = b · a, for all a, b ∈ F.

F6. Multiplication is associative, a · (b · c) = (a · b) · c, for all a, b, c ∈ F.

F7. There exists a unique element 1 ∈ F such that a · 1 = 1 · a = a, for all nonzero

a ∈ F.

F8. The distributive law holds, that is, a · (b + c) = a · b + a · c, for all a, b, c ∈ F.

F9. For every nonzero a ∈ F there is a unique element a^−1 ∈ F such that a · a^−1 = a^−1 · a = 1.

Here and later, for any ﬁeld F the set of its non-zero elements will be denoted by

F ∗ . We note that axioms F1–F4 mean that F is an abelian group relative to the

addition and axioms F5–F7 mean that F ∗ is also an abelian group but relative to the

multiplication. Axioms F1–F8 mean that F is a commutative ring relative to the two

operations. Only the last axiom is speciﬁc for ﬁelds.

The examples of inﬁnite ﬁelds are numerous. The most important are the ﬁelds

of rational numbers Q, real numbers R, and complex numbers C.

A subset G of a field F may itself be a field relative to the same operations of addition and multiplication as in F. If so, we say that G is a subfield of F.

Three basic properties of ﬁelds are stated in the following theorem. The second

one is called absence of divisors of zero and the third solvability of linear equations.

We saw these properties hold for Z p but now we would like to prove them for arbitrary

ﬁelds.

Theorem 4.1.1 Let F be a field. Then:

(i) a0 = 0 for all a ∈ F;
(ii) ab = 0 if and only if a = 0 or b = 0 (or both);
(iii) if a ≠ 0, the equation ax = b has a unique solution x = a^−1 b in F.

Proof Let us prove (i). Using F3 and F8,

0 · a = (0 + 0) · a = 0 · a + 0 · a.

Adding −(0 · a) to both sides and using F2, F4 and F3, we get

0 = −(0 · a) + (0 · a + 0 · a) = (−(0 · a) + 0 · a) + 0 · a = 0 + 0 · a = 0 · a.

4.1 Introduction to Fields 119

Let us prove (ii). If a = 0 or b = 0, then ab = 0 by (i). Conversely, suppose that ab = 0 and that a ≠ 0 or b ≠ 0; we assume the former. Then by F9 we know that a^−1 exists. Using (i), F6, F9 and F7, we now have

0 = a^−1 · 0 = a^−1 (ab) = (a^−1 a)b = 1 · b = b.

Let us prove (iii). Since a ≠ 0, we know a^−1 exists. Suppose that the equation

ax = b has a solution. Then multiplying both sides by a^−1 we get a^−1(ax) = a^−1 b. As in the proof of (i) we calculate that the left-hand side of this equation is x. So x = a^−1 b. It is also easy to check that x = a^−1 b is indeed a solution of ax = b. □

A very important technique is enlarging a given ﬁeld to obtain a larger ﬁeld with

some given property. After learning a few basic facts about polynomials we discuss

how to make such extensions.

Exercises

1. Prove that the set of all non-negative rational numbers Q+ is NOT a ﬁeld.

2. Prove that the set of all integers Z is NOT a field.
3. Prove that the set of all real numbers Q(√2) of the form x + y√2, where x and y are in Q, is a field.
4. Consider Q(√3), which is defined similarly to the field from the previous exercise. Find the inverse element of 2 − √3 and solve the equation

(2 − √3)x = 1 + √3.

5. Solve the following system of linear equations

3x + y + 4z = 1

x + 2y + z = 2

4x + y + 4z = 4

with coefficients in Z5.
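For the last exercise, the field Z5 is small enough that the system can be checked by exhaustive search:

```python
# Brute-force search for solutions of the linear system over Z_5.
p = 5
solutions = [
    (x, y, z)
    for x in range(p) for y in range(p) for z in range(p)
    if (3 * x + y + 4 * z) % p == 1
    and (x + 2 * y + z) % p == 2
    and (4 * x + y + 4 * z) % p == 4
]
# the coefficient matrix is invertible mod 5, so the solution is unique
assert solutions == [(3, 2, 0)]
```

Of course, Gaussian elimination works over Z5 exactly as over R, precisely because Z5 is a field; exhaustive search is only viable here because the field is tiny.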

The reader familiar with Linear Algebra may well skip this section.

A vector space consists of the following data:

VS1. a field F of scalars;

VS2. a set V of objects, called vectors;

VS3. a rule (or operation) called vector addition, which associates with each pair

of vectors u, v in V a vector u + v in V , called the sum of u and v, in such a

way that

(a) addition is commutative, u + v = v + u;

(b) addition is associative, u + (v + w) = (u + v) + w;

(c) there exists a unique vector 0 in V , called the zero vector, such that

u + 0 = u for all u in V ;

(d) for each vector u in V there is a unique vector −u in V such that u +

(−u) = 0;

VS4. a rule (or operation) called scalar multiplication, which associates with each

scalar a in F and vector u in V a vector au in V , called the product of a and

u, in such a way that

(a) 1u = u for all u in V ;

(b) a1 (a2 u) = (a1 a2 )u;

(c) a(u + v) = au + av;

(d) (a1 + a2 )u = a1 u + a2 u.

Then we call V a vector space over the ﬁeld F.

Example 4.1.2 Where F is a ﬁeld, F n is the set of n-tuples whose entries are scalars

from F. It is a vector space over F relative to the following addition and scalar

multiplication:

(a1, a2, . . . , an) + (b1, b2, . . . , bn) = (a1 + b1, a2 + b2, . . . , an + bn),

k (a1, a2, . . . , an) = (ka1, ka2, . . . , kan).

In particular, Rn , Cn and Znp are vector spaces over the ﬁelds R, C and Z p , respec-

tively.

Example 4.1.3 Let Fm×n be the set of m ×n matrices whose entries are scalars from a

ﬁeld F. It is a vector space over F relative to matrix addition and scalar multiplication.

The sets of all m × n matrices Rm×n , Cm×n and (Z p )m×n with entries from R, C

and Z p are vector spaces over the ﬁelds R, C and Z p , respectively.

Example 4.1.4 Let F be a ﬁeld, and Fn [x] be the set of all polynomials of degree

at most n whose coefﬁcients are scalars from F. It is a vector space over F relative

to the addition of polynomials and scalar multiplication. The sets of all polynomials

Rn [x], Cn [x] and (Z p )n [x] of degree at most n with coefﬁcients from R, C and Z p

are vector spaces over the ﬁelds R, C and Z p , respectively.

Example 4.1.5 Let F be a ﬁeld, F[x] be the set of all polynomials (without restric-

tion on their degrees), whose coefﬁcients are scalars from F. It is a vector space

over F relative to addition of polynomials and scalar multiplication. The sets of all

polynomials R[x], C[x] and Z p [x] with coefﬁcients from R, C and Z p are vector

spaces over the ﬁelds R, C and Z p , respectively.

relative to the following operations. The addition of elements of G is the operation

of addition in the ﬁeld G. The scalar multiplication of elements of G by elements of

F is performed as multiplication in the ﬁeld G.

Proof This is an exercise. Check that the vector space axioms for G all follow from

the ﬁeld axioms. �

Example 4.1.6 The ﬁeld of complex numbers C is a vector space over the reals R

which is a subﬁeld of C. Both C and R are vector spaces over the rationals Q.

The axioms of a vector space have many useful consequences. The two most

important ones are as follows:

0 · v = 0, (−1) · v = −v.

Proof We will prove only the first one; the second is an exercise. We will use VS4 (d) for this. We have

0 · v = (0 + 0) · v = 0 · v + 0 · v.

Adding −(0 · v) to both sides we get 0 = 0 · v. □

Deﬁnition 4.1.3 Let V be a vector space over the ﬁeld F and v1 , . . . , vk be arbitrary

vectors in V. Then the set of all possible linear combinations a1 v1 + a2 v2 + · · · +

ak vk with coefﬁcients a1 , . . . , ak in F is called the span of v1 , . . . , vk and denoted

span{v1 , . . . , vk }.

Deﬁnition 4.1.4 Let V be a vector space over the ﬁeld F. The space V is said to

be ﬁnite-dimensional if there exists a ﬁnite number of vectors v1 , v2 , . . . , vk which

span V , that is V = span{v1 , v2 , . . . , vk }.

Example 4.1.7 The space Fn[x] is finite-dimensional, since the set of monomials {1, x, x^2, . . . , x^n} spans it. The space of polynomials F[x] is infinite-dimensional.

Proof We will concentrate only on the second part of this example (for the ﬁrst see

the exercise below). Suppose F[x] is ﬁnite-dimensional and there exist polynomials

f 1 , f 2 , . . . , f n such that

F[x] = span{ f 1 , f 2 , . . . , f n }.

Let us choose a positive integer N such that N > deg ( f i ) for all i = 1, . . . , n.

As { f 1 , f 2 , . . . , f n } spans F[x] we can ﬁnd scalars a1 , a2 , . . . , an such that x N =

a1 f 1 + a2 f 2 + · · · + an f n . Then

G(x) cannot have more than N roots. (When F = R, this result is well-known. For

an arbitrary ﬁeld this will be proved in Proposition 5.1.3.) �

Deﬁnition 4.1.5 Let V be a vector space over the ﬁeld F. A subset {v1 , v2 , . . . , vk }

of V is said to be linearly dependent if there exist scalars a1 , a2 , . . . , ak in F, not all

of which are 0, such that

a1 v1 + a2 v2 + · · · + ak vk = 0.

Example 4.1.8 Let Fm×n be the space of all m × n matrices with entries from F. Let Eij be the matrix whose (i, j)-entry is 1 and all other entries are 0. Such a matrix is called a matrix unit. The set of all mn matrix units is linearly independent.

Example 4.1.9 The set of monomials {1, x, x^2, . . . , x^n} is linearly independent in Fn[x].

A basis of a vector space V is a linearly independent set of vectors which spans V. Every finite spanning subset of V can be reduced to a basis.

Proof Suppose that a spanning set {v1, v2, . . . , vk} of V is linearly dependent. Then

a1 v1 + a2 v2 + · · · + ak vk = 0

and at least one coefficient is nonzero. Without loss of generality we may assume that ak ≠ 0. Then

vk = −(ak^−1 a1) v1 − · · · − (ak^−1 ak−1) vk−1,

so vk is a linear combination of v1, v2, . . . , vk−1. Hence v1, v2, . . . , vk−1 still span V and vk may be removed from the spanning set. We continue this process until the remaining system of vectors is linearly independent. Then we will have arrived at a basis for V. □

Theorem Let {v1, v2, . . . , vn} be a basis of a vector space V over a field F and v ∈ V. Then there exists a unique n-tuple (a1, a2, . . . , an) of elements of F such that

v = a1 v1 + a2 v2 + · · · + an vn . (4.1)

Proof The fact that there is at least one such n-tuple follows from the fact that

{v1 , v2 , . . . , vn } spans V . Suppose there were two different ones:

v = a1 v1 + a2 v2 + · · · + an vn = b1 v1 + b2 v2 + · · · + bn vn .

Then

(a1 − b1)v1 + · · · + (an − bn)vn = 0, and since v1, v2, . . . , vn are linearly independent, ai = bi for all i. □

Lemma 4.1.1 Let F be a finite field of q elements and let {v1, v2, . . . , vn} be a basis for a vector space V over F. Then V contains q^n elements.

Proof Every element v ∈ V can be written uniquely as a linear combination (4.1). Each coefficient ai appearing in this linear combination may take any one of q values. The total number of such linear combinations will therefore be q^n. This is how many elements V has. □

In the case when F is ﬁnite, it is now clear that all bases are equinumerous, i.e.,

contain the same number of vectors. This is also true in general.

Definition The dimension of V is the number of vectors in any basis of V. It is denoted dimF V.

Exercises

1. Check that F n satisﬁes all axioms of a vector space, observing how these axioms

follow from the axioms of a ﬁeld.

2. Justify the statement in Example 4.1.8.

3. Justify the statement in Example 4.1.9.

vector space over F. Find its dimension over F.

5. Let V be the set of positive real numbers with the addition

u ⊕ v := uv,

i.e., the new addition is the former multiplication. Also for any real number a ∈ R and any u ∈ V we define the scalar multiplication

a ⊙ u := u^a.

Prove that V is a vector space over the field R.

Theorem 4.1.4 Any ﬁnite ﬁeld F contains one of the ﬁelds Z p for a certain prime

p. In this case F is a vector space over Z p and it contains p n elements, where

n = dimZ p F.

Proof For a positive integer m, let m · 1 denote the element of F obtained by adding m ones, that is m · 1 = 1 + · · · + 1 (m times). When m = 1, 2, . . ., we

obtain the sequence

1, 2 · 1, 3 · 1, . . . , m · 1, . . .

The following is clear from the ring axioms: for any positive integers a, b

a · 1 + b · 1 = (a + b) · 1, (4.2)

(a · 1) · (b · 1) = (ab) · 1. (4.3)

Since F is finite, this sequence must contain repetitions: m1 · 1 = m2 · 1 for some m1 < m2, and then (m2 − m1) · 1 = 0. Let p be the minimal positive integer for which p · 1 = 0. Then

p is prime. If not, and p = ab for a < p and b < p, then a · 1 ≠ 0 and b · 1 ≠ 0

but (a · 1) · (b · 1) = (ab) · 1 = p · 1 = 0. This is a contradiction since F, being a

ﬁeld, by Theorem 4.1.1, contains no zero divisors.

Now, since p · 1 = 0, the Eqs. (4.2) and (4.3) become

a · 1 + b · 1 = (a ⊕ b) · 1,

(a · 1) · (b · 1) = (a ⊙ b) · 1,

where ⊕ and ⊙ are addition and multiplication modulo p. We can now recognise

that the set {0, 1, 2 · 1, . . . , ( p − 1) · 1} together with the operations of addition and

multiplication in F is in fact Z p . By Theorem 4.1.2 F is a vector space over Z p .

Setting n = dimZp F, by Lemma 4.1.1, there are exactly p^n elements of F. □

The theorem we have proved states that the cardinality of any ﬁnite ﬁeld is a power

of a prime. The converse is also true.

Deﬁnition 4.1.8 If p · 1 = 0 in a ﬁeld F for some prime p, then this prime p is said

to be the characteristic of F. If such a prime does not exist, the ﬁeld F is said to have

characteristic 0.

Theorem 4.1.5 For any prime p and any positive integer n there exists a ﬁeld of

cardinality p n . This ﬁeld is unique up to isomorphism.

Proof We will show how to construct the fields of cardinality p^n in the next chapter.

The uniqueness, however, is beyond the scope of this book. �

The unique field of cardinality p^n is denoted GF(p^n) and is called the Galois field of p^n elements.¹

Exercises

1. Let n 1 = 449873499879757801 and n 2 = 449873475733618561. Find out if

there are ﬁelds GF(n 1 ) and GF(n 2 ). In case GF(n i ) exists for i = 1 or i = 2,

identify the prime number p such that Z p ⊆ GF(n i ) and determine the dimension

of GF(n i ) over Z p .

2. Let F be a ﬁnite ﬁeld of q elements. Prove that all its elements are roots of the

equation x q − x = 0. (Hint. Consider the multiplicative group (F ∗ , ·) of this ﬁeld

and use Corollary 3.2.3.)

In any field F the set F∗ of all nonzero elements plays a very important role. Axiom

F9 states that all elements of F ∗ are invertible. Moreover, this axiom, together with

axioms F5–F7 imply that F ∗ relative to the operation of multiplication is a commu-

tative group. This group is called the multiplicative group of F. Our goal for the rest

of this chapter is to prove that in any ﬁnite ﬁeld F the multiplicative group of F is

cyclic.

We will concentrate our attention on orders of elements in F ∗ . Eventually, we will

ﬁnd that there is always an element in F ∗ whose order is exactly the cardinality of

this group, thus proving that F ∗ is cyclic.

We now look at the ﬁeld Z7 to get an intuition of what is to come. In this case

Z∗7 = {1, 2, 3, 4, 5, 6}. Let us calculate the powers of each element:

1 See Sect. 3.1.3 for a brief historical note about Évariste Galois.

Powers of 1: 1, 1^2 = 1.
Powers of 2: 2, 2^2 = 4, 2^3 = 1; so there are 3 elements in Z7 which are powers of 2.
Powers of 3: 3, 3^2 = 2, 3^3 = 6, 3^4 = 4, 3^5 = 5, 3^6 = 1; so all nonzero elements are powers of 3.
Powers of 4: 4, 4^2 = 2, 4^3 = 1; so there are three distinct powers of 4.
Powers of 5: 5, 5^2 = 4, 5^3 = 6, 5^4 = 2, 5^5 = 3, 5^6 = 1; so all nonzero elements are powers of 5.
Powers of 6: 6, 6^2 = 1; so there are two powers.

We summarise our experience: the element 1 has order 1, the elements 2 and 4

have order 3, the elements 3 and 5 have order 6, and the element 6 has order 2. Hence

Z∗7 = <3> = <5>; it is cyclic and has two generators, 3 and 5.
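The computation of the orders above can be automated; the same loop finds all generators of Z∗p for any small prime p:

```python
# Orders of elements of the multiplicative group of Z_7.
p = 7

def order(a):
    """Smallest n >= 1 with a^n = 1 mod p."""
    n, x = 1, a % p
    while x != 1:
        x = x * a % p
        n += 1
    return n

orders = {a: order(a) for a in range(1, p)}
assert orders == {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}

# the generators (primitive elements) are exactly the elements of order p - 1
assert [a for a in range(1, p) if orders[a] == p - 1] == [3, 5]
```

Note that every order in the table divides 6 = |Z∗7|, as Lagrange's theorem predicts.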

Lemma 4.2.1 For any element g of a group G, ord(g) = ord(g^−1).

Proof If g^n = 1 for a positive integer n, then (g^−1)^n = (g^n)^−1 = 1, and conversely. Therefore the orders of g and of g^−1 are the same. □

Lemma 4.2.2 Every element of a ﬁnite group has a ﬁnite order. Moreover, in a ﬁnite

group the order of any element is a divisor of the total number of elements in the

group.

Proof Let G be a ﬁnite group containing g. Then by Proposition 3.2.1 ord (g) =

|<g>|, which is a divisor of |G| by Lagrange’s theorem. �

Lemma 4.2.3 Suppose an element g of a group G satisfies g^n = 1 for a positive integer n. Then ord(g)|n, i.e., ord(g) is a divisor of n.

Proof Let ord(g) = m. Suppose n = qm + r, where 0 ≤ r < m, and suppose that r ≠ 0. Then 1 = g^n = g^(qm+r) = (g^m)^q · g^r = g^r, which contradicts the minimality of m. □

Equation 3.11 will play a crucial role in the proof of our next lemma. To recap,

Eq. 3.11 says that for any element g ∈ G and positive integer i

ord(g^i) = ord(g) / gcd(i, ord(g)).    (4.4)

Lemma 4.2.4 If g is an element of a group G and ord(g) = ki, where k and i are positive integers, then ord(g^i) = k.

4.2 The Multiplicative Group of a Finite Field Is Cyclic 127

Proof By (4.4), ord(g^i) = ord(g) / gcd(i, ord(g)) = ki / i = k. □

Lemma 4.2.5 Let G be a commutative group, and a and b be two elements of G that

have orders m and n, respectively. Suppose that gcd(m, n) = 1. Then ord (ab) = mn.

Proof Since (ab)^(mn) = a^(mn) b^(mn) = 1, we know by Lemma 4.2.3 that ord(ab)|mn. Suppose that for some k the equality (ab)^k = 1 holds. Then (ab)^k = a^k b^k = 1 and a^k = (b^k)^−1. Let c = a^k = (b^k)^−1. Then c^m = (a^k)^m = (a^m)^k = 1 and c^n = ((b^k)^−1)^n = ((b^n)^k)^−1 = 1. As 1 = gcd(m, n) = um + vn for some integers u and v, we may write c = c^(um+vn) = c^(um) · c^(vn) = (c^m)^u · (c^n)^v = 1. Thus a^k = b^k = 1 and by Lemma 4.2.3 we have m|k and n|k. This implies mn|k, because m and n are relatively prime. If k = ord(ab), we get mn|ord(ab) and together with ord(ab)|mn we get ord(ab) = mn. □

Corollary 4.2.1 Let G be a commutative group and a1, a2, . . . , ak be its elements of finite order such that ord(ai) = pi^αi, where the primes p1, p2, . . . , pk are all different. Then

ord(a1 a2 . . . ak) = p1^α1 p2^α2 . . . pk^αk.

Example Suppose a, b, c are elements of a commutative group G of orders 5^3 · 17, 5 · 7^2 and 7 · 17^2, respectively. Let us show how to use these elements to construct an element g ∈ G such that ord(g) = m and a^m = b^m = c^m = 1.

We claim that m can be taken as lcm(ord(a), ord(b), ord(c)) = 5^3 · 7^2 · 17^2 and g = a^17 b^5 c^7. Indeed, by Lemma 4.2.4 we have ord(a^17) = 5^3, ord(b^5) = 7^2 and ord(c^7) = 17^2, so by Corollary 4.2.1, ord(g) = 5^3 · 7^2 · 17^2 = m. Moreover, ord(a)|m, ord(b)|m, ord(c)|m, which implies a^m = b^m = c^m = 1.

Exercises

Prove that it is cyclic.

2. Let g, h, k be elements of a ﬁnite abelian group G of orders 183618, 131726,

127308, respectively. Use g, h, k to construct an element of G of order 1018264646281, i.e., express an element of this order using g, h, k.

Theorem 4.2.1 Let G be a ﬁnite commutative group. Then there exists an element

g ∈ G such that ord(g) = m ≤ |G| and x^m = 1 for all x ∈ G.

Proof Let us consider the set of integers I = {ord (g) | g ∈ G} and let p1 , p2 , . . . , pn

be the set of all primes that occur in the prime factorizations of integers from I . For

each such prime pi let us choose the element gi such that ord(gi) = pi^αi · qi, where gcd(pi, qi) = 1 and the integer αi is maximal among all elements of G. (Note that the same element might correspond to several primes, i.e., among g1, g2, . . . , gn not all elements may be distinct.) Then by Lemma 4.2.4 for the element hi = gi^qi we have ord(hi) = pi^αi. Set g = h1 h2 . . . hn. Then, by Corollary 4.2.1,

ord(g) = p1^α1 p2^α2 . . . pn^αn = m. It is also clear that the order of every element in G divides m, thus x^m = 1 for all x ∈ G. Moreover, m ≤ |G| by Lemma 4.2.2. □

Theorem 4.2.2 Let F be a ﬁnite ﬁeld consisting of q elements. Then there exists an

element g ∈ F ∗ such that ord (g) = |F ∗ | = q − 1, i.e., F ∗ = <g>.

Proof The group F∗ contains q − 1 elements, and by Theorem 4.2.1 there exists an element of order m ≤ q − 1 such that x^m = 1 for all

x ∈ F ∗ . In the next chapter we will prove that a polynomial of degree n over any

ﬁeld has no more than n roots in that ﬁeld. The polynomial x m − 1 can be considered

as a polynomial from F[x]; it has degree m and q − 1 roots in F. Since q − 1 ≥ m,

this is possible only if m = q − 1. The theorem is proved. �

Definition An element g ∈ F∗ of order q − 1 is called a primitive element of F.

Corollary 4.2.3 Let F be a ﬁnite ﬁeld consisting of q elements. Then ord (a) divides

q − 1 for every element a ∈ F ∗ .

Proof Let g be a primitive element of F. Then ord(g) = q − 1 and a = g^i for some 1 ≤ i ≤ q − 1. Then by (4.4), ord(a) = ord(g^i) = (q − 1)/gcd(i, q − 1), which is a divisor of q − 1. □

Theorem 4.2.3 For each prime p and positive integer n there is a unique, up to

isomorphism, finite field GF(p^n) that consists of p^n elements. Its elements are the roots of the polynomial f(x) = x^(p^n) − x.

Proof We cannot prove the ﬁrst part of the statement, i.e., the existence of F =

GF(p^n) but we can prove the second. Suppose F exists and g is a primitive element. Then every nonzero element a of F lies in F∗, which is a cyclic group of order p^n − 1 with generator g. By Corollary 4.2.3 ord(a) is a divisor of p^n − 1, hence a^(p^n − 1) = 1. It follows that a^(p^n) = a for all a ∈ F, including 0, which proves the second part of the theorem. □

The idea behind the proof of the existence of GF(p^n) is as follows. Firstly we construct an extension Zp ⊂ K such that every polynomial with coefficients in Zp has a root in K. Then the polynomial f(x) = x^(p^n) − x will have p^n roots in K and we have to check that f(x) does not have multiple roots. These p^n distinct roots will then be a field GF(p^n).

From our considerations it follows that, if m|n, then GF(p^m) is a subfield of GF(p^n). Indeed, any root of the equation x^(p^m) = x will also be a root of the equation x^(p^n) = x (see Exercise 3 that follows).

Exercises

in Z∗p of orders 11561 and 58380?

2. Let p be a prime and m, n be positive integers. Prove that p^m − 1 divides p^n − 1 if and only if m divides n.

3. Prove that GF(p^m) is a subfield of GF(p^n) if and only if m|n.

Let F be a finite field of q elements, let h ∈ F∗, and let g be a primitive element of F. Then the equation g^x = h has a unique solution modulo q − 1, which is called the discrete logarithm of h to base g, denoted logg(h).

Thus 3 is a primitive element of Z7 and log3 (3) = 1, log3 (2) = 2, log3 (6) = 3,

log3 (4) = 4, log3 (5) = 5, log3 (1) = 6.

Example 4.2.3 The element g = 3 is a primitive element of Z19, as seen from the table featuring the powers of 3:

n     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
3^n   3  9  8  5 15  7  2  6 18 16 10 11 14  4 12 17 13  1

Inverting this table, we obtain the table of discrete logarithms to base 3:

n        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
log3(n) 18  7  1 14  4  8  6  3  2 11 12 15 17 13  5 10 16  9
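Both tables are easy to reproduce by machine; inverting the dictionary of powers yields the discrete logarithms:

```python
# Table of powers of 3 modulo 19 and the corresponding discrete logarithms.
p, g = 19, 3

powers = {n: pow(g, n, p) for n in range(1, p)}    # n -> g^n mod p
logs = {value: n for n, value in powers.items()}   # g^n mod p -> n

assert powers[6] == 7 and powers[10] == 16         # matches the table
assert logs[2] == 7 and logs[1] == 18              # log_3(2) = 7, log_3(1) = 18
```

For a prime of cryptographic size this table-building approach is of course infeasible, which is exactly the point of the problem stated next.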

Computing discrete logarithms, by contrast, appears to be computationally difficult. So we can now add the following problem to our list of apparently hard

problems in Number Theory.

Discrete Logarithm Problem: Given a prime p, a generator g of Z∗p and an

element h ∈ Z∗p , ﬁnd the integer x such that g x = h and 0 ≤ x ≤ p − 2.

We recap that an element h of a ﬁnite ﬁeld F is a quadratic residue if there exists

another element g ∈ F such that g 2 = h.

Proposition Let g be a primitive element of a finite field F with an odd number q of elements. Then an element h ∈ F∗ is a quadratic residue if and only if logg(h) is even.

Proof If logg(h) = 2k is even, then h = g^(2k) = (g^k)^2 is a quadratic residue. The reverse is clearly also true. Indeed, if h is a quadratic residue, then h = (h1)^2 for some h1 ∈ F∗. Since h1 = g^k for some k, we get h = (g^k)^2 = g^(2k) and logg(h) = 2k is even. □

Exercises

1. How many primitive elements are there in the ﬁeld Z1237 ?

2. Let F = Z17 .

(a) Decide whether 2 or 3 is a primitive element of F. Denote the one which is

primitive by g.

(b) Compute the table of powers of g in F and the table of discrete logs to base g.

3. Let g be a primitive element in a ﬁnite ﬁeld F consisting of q elements. Prove

that

logg(ab) ≡ logg(a) + logg(b) (mod q − 1).

Earlier we met cryptosystems whose security rests on the computational complexity of factoring integers. Here we present a cryptosystem whose security is

based on the complexity of calculating discrete logarithms. It is based on the Difﬁe–

Hellman key exchange agreement. It was invented by Taher Elgamal in 1985. The

4.3 The Elgamal Cryptosystem Revisited 131

Elgamal algorithm is used in the free GNU Privacy Guard software, recent versions

of PGP, and other cryptosystems.

In a public domain, a large prime p and a primitive element α of Z p are displayed.

Each participant of the group, who wants to send or receive encrypted messages,

creates their private and public keys. Alice, for example, selects a secret integer k A

and calculates α^kA, which she places in the public domain as her public key. Bob selects a secret integer kB and calculates α^kB, which he places in the public domain as

his public key. Now they can exchange messages.

Suppose, for example, that Bob wants to send a message m to Alice. We’ll assume

that m is an integer such that 0 ≤ m < p. (If m is larger, he breaks it into several

blocks as usual.) He chooses a secret random integer s and computes c1 = α^s in Zp. He also takes Alice’s public key α^kA from the public domain and calculates c2 = m · (α^kA)^s. He sends this pair (c1, c2) of elements of Zp to Alice; this is the cyphertext. On the receiving end Alice uses her private key kA to calculate m as follows: m = c2 · ((c1)^kA)^−1. For the evil eavesdropper Eve to figure out kA she

must solve a Discrete Logarithm Problem, which is difﬁcult.
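The whole scheme fits in a few lines. The sketch below uses the small prime p = 53 with primitive element g = 2, matching the first exercise below; the private key, random exponent and message are illustrative choices:

```python
# Toy multiplicative Elgamal over Z_53 with primitive element g = 2.
# The private key, random exponent and message are illustrative choices.
import random

p, g = 53, 2
kA = 11                       # Alice's private key
A = pow(g, kA, p)             # Alice's public key, placed in the public domain

def encrypt(m, s):
    """Bob sends (c1, c2) = (g^s, m * A^s)."""
    return pow(g, s, p), m * pow(A, s, p) % p

def decrypt(c1, c2):
    """Alice recovers m = c2 * (c1^kA)^(-1), inverting via Fermat."""
    return c2 * pow(pow(c1, kA, p), p - 2, p) % p

m = 30                        # the letter T under the encoding A = 11, ..., Z = 36
c1, c2 = encrypt(m, s=random.randrange(1, p - 1))
assert decrypt(c1, c2) == m
```

Note that decryption works because (c1)^kA = g^(s·kA) = (g^kA)^s = A^s, the very quantity by which the plaintext was multiplied.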

Exercises

1. Alice and Bob agreed to use the Elgamal cryptosystem based on the multiplicative

group of the ﬁeld Z p for p = 53. They also agreed to use 2 as the primitive element

of Zp. Since p is small their messages consist of a single letter which is encoded as before (A = 11, B = 12, . . . , Z = 36). Bob’s public key is 32 and Alice sent him the message (30, 42). Which letter did

Alice send to Bob in this message?

2. Alice and Bob have set up the multiplicative Elgamal cryptosystem for private

communication. They’ve chosen an element g = 123456789 in the multiplicative

group of the ﬁeld Z p , where p = 123456789987654353003. They’ve chosen

their private exponents k A = 373 and k B = 5191 and published the elements

g A = 52808579942366933355 and g B = 39318628345168608817, respectively.

They agreed to cut the messages into ten-letter segments and encode the letters

as A = 11, B = 12, . . ., Z = 36, space = 41, ’ = 42, . = 43, , = 44, ? = 45.

Bob got the following message from Alice:

[ [ 83025882561049910713, 66740266984208729661 ],

[ 117087132399404660932, 44242256035307267278 ],

[ 67508282043396028407, 77559274822593376192 ],

[ 60938739831689454113, 14528504156719159785 ],

[ 5059840044561914427, 59498668430421643612 ],

[ 92232942954165956522, 105988641027327945219 ],

[ 97102226574752360229, 46166643538418294423 ] ]

Chapter 5

Polynomials

A polynomial walks into a bar and asks for a drink. The barman

declines: “We don’t cater for functions.”

An old math joke.

This chapter is about polynomials and their use. After learning the basics we discuss

Lagrange’s interpolation needed for Shamir’s secret sharing scheme that we discuss

in Chap. 6. Then, after proving some further results on polynomials, we give a con-

struction of a finite field of cardinality p^n for any prime p and positive integer n. This field

is constructed as polynomials modulo an irreducible polynomial of degree n. The

ﬁeld constructed will be an extension of Zp and in this context we discuss minimal

annihilating polynomials which we will need in Chap. 7 for the construction of good

error-correcting coding.

A formal expression of the form

f(x) = a0 + a1 x + · · · + ak x^k,  ai ∈ F,    (5.1)

where k is an arbitrary positive integer, is called a polynomial over F. The set of all

polynomials over F is denoted by F[x]. For k = 0 there is no distinction between the

scalar a0 and the polynomial f (x) = a0 . Thus we assume that F ⊂ F[x]. The zero

polynomial 0 is a very special one. Any other polynomial f(x) ≠ 0 we can write in the form (5.1) with ak ≠ 0 and define its degree as follows.

A. Slinko, Algebra for Applications, Springer Undergraduate Mathematics Series,

DOI 10.1007/978-3-319-21951-6_5

Definition 5.1.1 Given a nonzero polynomial f(x) = a0 + a1 x + · · · + ak x^k with ak ≠ 0, the number k is said to be the degree of f(x) and will be denoted deg(f). Note that deg(f) is undefined if f = 0. Colloquially speaking, the degree of f(x) is the highest power of x which appears.

Definition 5.1.2 Given two polynomials

f(x) = a0 + a1 x + · · · + ak x^k,  g(x) = b0 + b1 x + · · · + bm x^m,

we say that these two polynomials are equal, and write f(x) = g(x), if k = m and ai = bi for all i = 0, 1, 2, . . . , k.

The addition and multiplication in the field induce the corresponding operations

over polynomials. Let

f(x) = a0 + a1 x + · · · + ak x^k,  g(x) = b0 + b1 x + · · · + bm x^m

be two polynomials and assume that deg (f ) = k ≥ m = deg (g). Then we deﬁne

k

f (x) + g(x) := (ai + bi )x i ,

i=0

where for i > deg(g) we assume that b_i = 0. Multiplication is defined in such a way that x^i · x^j = x^{i+j} is true. The only way to do this is to set

f(x)g(x) := ∑_{i=0}^{k+m} ( ∑_{j=0}^{i} a_j b_{i−j} ) x^i.

The same convention also works here: a_p = 0 when p > deg(f), and b_q = 0 when q > deg(g).

By deﬁning these two operations we obtain an algebraic object which is called

the polynomial ring over F; it is also denoted by F[x].
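These coefficient-wise rules are easy to turn into code. The following Python sketch is our own illustration (not from the book; the names `poly_add` and `poly_mul` are ours): a polynomial over Zp is represented by its list of coefficients [a0, a1, . . . , ak].

```python
# A polynomial a0 + a1*x + ... + ak*x^k over Z_p is stored as [a0, a1, ..., ak].

def poly_add(f, g, p):
    """Coefficient-wise addition, padding the shorter list with zeros."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(a + b) % p for a, b in zip(f, g)]

def poly_mul(f, g, p):
    """The coefficient of x^i in f*g is the convolution sum_j a_j * b_{i-j}."""
    result = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            result[i + j] = (result[i + j] + a * b) % p
    return result

# In Z_2[x]: (1 + x + x^2)(1 + x) = 1 + x^3, since the middle terms cancel.
print(poly_mul([1, 1, 1], [1, 1], 2))  # [1, 0, 0, 1]
```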


5.1 The Ring of Polynomials

We observe that

Proposition 5.1.1 For any two nonzero polynomials f, g ∈ F[x]:

1. deg(f + g) ≤ max(deg(f), deg(g));
2. deg(fg) = deg(f) + deg(g) and, in particular, F[x] has no zero divisors.

Division with remainder is also possible.

Theorem 5.1.1 (Division Algorithm) Given polynomials f(x) and g(x) in F[x] with g(x) ≠ 0, there exist a “quotient” q(x) ∈ F[x] and a “remainder” r(x) ∈ F[x] such

that

f (x) = g(x)q(x) + r(x)

and either r(x) = 0 or deg (r) < deg (g). Moreover, the quotient and the remainder

are uniquely deﬁned.

Proof Let

f(x) = ∑_{i=0}^{k} a_i x^i,   g(x) = ∑_{i=0}^{m} b_i x^i

be two polynomials with deg(f) = k and deg(g) = m. Then there are two cases to consider:

consider:

Case 1. If k < m, then we can set q(x) = 0 and r(x) = f (x).

Case 2. If k ≥ m, we can define

f1(x) = f(x) − b_m^{-1} a_k x^{k−m} g(x) = f(x) − g(x)q1(x),

where q1(x) = b_m^{-1} a_k x^{k−m}. This polynomial f1(x) will be of smaller degree than f, since f(x) and q1(x)g(x) have the same degree k and the same leading coefficient a_k. By the induction hypothesis f1(x) = g(x)q2(x) + r(x), where r(x) = 0 or deg(r) < deg(g), and then f(x) = g(x)(q1(x) + q2(x)) + r(x), as required.

To prove uniqueness, suppose that f(x) = g(x)q1(x) + r1(x) = g(x)q2(x) + r2(x), where each remainder is either zero or of degree smaller than deg(g). Then

g(x)(q1(x) − q2(x)) = r2(x) − r1(x).


This cannot happen for r1(x) ≠ r2(x), since then the degree of the right-hand side would be smaller than the degree of the left-hand side. Thus r2(x) − r1(x) = 0. This can happen only when q1(x) − q2(x) = 0, since F[x] has no zero divisors. □

The quotient and the remainder can be computed by the following “polynomial long division” process, commonly taught in high school. For example, let us consider the polynomials f(x) = x^4 + x^3 + x^2 + x + 1 and g(x) = x^2 + 1 from Z2[x]. Then

              x^2 + x
x^2 + 1 ) x^4 + x^3 + x^2 + x + 1
          x^4       + x^2
          -----------------------
                x^3       + x + 1
                x^3       + x
                -----------------
                                1

encodes a division with remainder of the polynomial f(x) by g(x). It shows that the quotient q(x) and the remainder r(x) are

q(x) = x^2 + x,   r(x) = 1,

that is

x^4 + x^3 + x^2 + x + 1 = (x^2 + x)(x^2 + 1) + 1.

We say that a polynomial f(x) is divisible by g(x) if f(x) = q(x)g(x), i.e., when the remainder is zero.
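The long division process can be mechanised. The sketch below is our own illustrative Python (the helper name `poly_divmod` is ours); it reproduces the division of f(x) = x^4 + x^3 + x^2 + x + 1 by g(x) = x^2 + 1 over Z2.

```python
def poly_divmod(f, g, p):
    """Division with remainder in Z_p[x]; polynomials are lists [a0, a1, ...].
    Assumes g is non-zero with a non-zero leading coefficient g[-1]."""
    f = f[:]                        # working copy of the dividend
    q = [0] * max(len(f) - len(g) + 1, 1)
    inv_lead = pow(g[-1], -1, p)    # inverse of the leading coefficient of g
    while len(f) >= len(g) and any(f):
        shift = len(f) - len(g)
        coeff = (f[-1] * inv_lead) % p
        q[shift] = coeff
        for i, b in enumerate(g):   # subtract coeff * x^shift * g(x)
            f[shift + i] = (f[shift + i] - coeff * b) % p
        while f and f[-1] == 0:     # drop leading zeros
            f.pop()
    return q, f                     # quotient, remainder

# x^4 + x^3 + x^2 + x + 1 divided by x^2 + 1 over Z_2:
q, r = poly_divmod([1, 1, 1, 1, 1], [1, 0, 1], 2)
print(q, r)  # [0, 1, 1] (i.e. x^2 + x) and [1] (remainder 1)
```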

A polynomial (5.1) defines a function f : F → F with

f(α) = ∑_{i=0}^{k} a_i α^i.

In Analysis this function is always identified with the polynomial itself. However, working over a finite field we cannot do this. Indeed, 1^2 + 1 = 0 and 0^2 + 0 = 0 in Z2. So the polynomial f(x) = x^2 + x over Z2 is non-zero but the function associated with it is the zero function.
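This phenomenon is easy to observe computationally. The following short Python sketch (our illustration) evaluates the non-zero polynomial x^2 + x at every element of Z2 and of Z5.

```python
def eval_poly(coeffs, x, p):
    """Evaluate a polynomial given by [a0, a1, ...] at x, working in Z_p."""
    return sum(a * pow(x, i, p) for i, a in enumerate(coeffs)) % p

f = [0, 1, 1]   # x^2 + x, a non-zero polynomial

# Over Z_2 it induces the zero function...
print([eval_poly(f, x, 2) for x in range(2)])   # [0, 0]
# ...but over Z_5 it does not.
print([eval_poly(f, x, 5) for x in range(5)])   # [0, 2, 1, 2, 0]
```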

Deﬁnition 5.1.3 An element α ∈ F is called a root1 of f (x) if f (α) = 0.

Proposition 5.1.2 An element α ∈ F is a root of a polynomial f(x) if and only if f(x) = g(x)(x − α) for some g(x) ∈ F[x], i.e., f(x) is divisible by x − α.

1A purist would talk about a zero of the polynomial f (x) but a root of the equation f (x) = 0. We

are not making this distinction.


Proof Let us divide f(x) by x − α with remainder:

f(x) = q(x)(x − α) + r,

where r ∈ F is the remainder and q(x) is the quotient. Substituting α in this equation

we get 0 = 0 + r, whence r = 0 and f (x) is divisible by (x − α) and q(x) can be

taken as g(x). Conversely, if f(x) = g(x)(x − α), then f(α) = g(α) · 0 = 0. □

Proposition 5.1.3 A nonzero polynomial

f(x) = ∑_{i=0}^{k} a_i x^i,   a_i ∈ F,    (5.2)

of degree k has at most k roots in F.

Proof Suppose that α_1, . . . , α_{k+1} ∈ F are distinct roots of f(x). By Proposition 5.1.2,

f(x) = (x − α_1)g_1(x),   deg(g_1) = k − 1.    (5.3)

Substituting α_2 we get 0 = f(α_2) = (α_2 − α_1)g_1(α_2). Since in any field there are no divisors of zero we conclude that g_1(α_2) = 0 and by Proposition 5.1.2

g_1(x) = (x − α_2)g_2(x),   deg(g_2) = k − 2.    (5.4)

Continuing in this way, we arrive at f(x) = c(x − α_1)(x − α_2) . . . (x − α_k) for some constant c; comparing the leading coefficients shows that c = a_k and

f(x) = a_k(x − α_1)(x − α_2) . . . (x − α_k).    (5.5)

Substituting α_{k+1} now gives f(α_{k+1}) = a_k(α_{k+1} − α_1) . . . (α_{k+1} − α_k) ≠ 0, since no factor is zero; this contradicts α_{k+1} being a root. Hence f(x) has at most k roots. □


Exercises

1. Find the quotient and the remainder when f(x) = 5x^4 + x^2 + 3x + 4 is divided by g(x) = 3x^2 + 2x + 1.

2. Find the roots of f(x) = x^4 + 2x^3 + 2x^2 + 2x + 1 ∈ Z5[x] in Z5. Hence find a factorisation of f(x) into linear factors.


Proposition 5.1.4 Let α_0, α_1, . . . , α_k be distinct elements of F and β_0, β_1, . . . , β_k be arbitrary elements of F. Then there exists no more than one polynomial f(x) of degree at most k such that f(α_i) = β_i for i = 0, 1, . . . , k.

Proof Suppose that two distinct polynomials f(x) = ∑_{i=0}^{k} a_i x^i and g(x) = ∑_{i=0}^{k} b_i x^i satisfy f(α_i) = β_i and g(α_i) = β_i for all i = 0, 1, 2, . . . , k. Then the polynomial h(x) = f(x) − g(x) is not zero, and its degree is not greater than k. Also h(α_i) = f(α_i) − g(α_i) = 0 for every i, and h(x) has at least k + 1 distinct roots α_0, α_1, . . . , α_k. However, by Proposition 5.1.3 this is impossible. □

Theorem 5.1.2 (Lagrange's Interpolation) Let α_0, α_1, . . . , α_k be distinct elements of F and β_0, β_1, . . . , β_k be arbitrary elements of F. Then there exists a unique polynomial

f(x) = ∑_{i=0}^{k} β_i · [(x − α_0) . . . (x − α_{i−1})(x − α_{i+1}) . . . (x − α_k)] / [(α_i − α_0) . . . (α_i − α_{i−1})(α_i − α_{i+1}) . . . (α_i − α_k)]    (5.6)

of degree at most k such that f(α_i) = β_i for i = 0, 1, . . . , k.

Proof The polynomial (5.6) was constructed as follows. We first constructed polynomials g_i(x) of degree k such that g_i(α_i) = 1 and g_i(α_j) = 0 for j ≠ i. These polynomials are:

g_i(x) = [(x − α_0) . . . (x − α_{i−1})(x − α_{i+1}) . . . (x − α_k)] / [(α_i − α_0) . . . (α_i − α_{i−1})(α_i − α_{i+1}) . . . (α_i − α_k)].


Then the desired polynomial was constructed as f(x) = ∑_{i=0}^{k} β_i g_i(x). We immediately see that f(α_i) = β_i, as required. This polynomial is unique because of Proposition 5.1.4. □

Example Let us find the polynomial f(x) of degree at most 2 over F = Z5 with the properties: f(1) = 2, f(2) = 4, f(3) = 4. We apply Theorem 5.1.2 to the case F = Z5, k = 2, α_0 = 1, α_1 = 2, α_2 = 3, β_0 = 2, β_1 = 4, β_2 = 4. The formula tells us that

f(x) = 2 · [(x − 2)(x − 3)] / [(1 − 2)(1 − 3)] + 4 · [(x − 1)(x − 3)] / [(2 − 1)(2 − 3)] + 4 · [(x − 1)(x − 2)] / [(3 − 1)(3 − 2)].

To simplify this expression we have to expand all the expressions, bearing in mind that all the arithmetic is in Z5:

f(x) = 2 · (x^2 + 1)/(4 · 3) + 4 · (x^2 + x + 3)/(1 · 4) + 4 · (x^2 + 2x + 2)/(2 · 1)
     = (x^2 + 1) + (x^2 + x + 3) + 2(x^2 + 2x + 2) = 4x^2 + 3.

(You can easily check that indeed f(1) = 2, f(2) = 4, f(3) = 4. Do it!)

Note: a simple alternative to using the formula is to calculate the coefficients of the desired polynomial as the unique solution of a system of linear equations: if f(x) = ax^2 + bx + c and f(1) = 2, f(2) = 4, f(3) = 4, we have the system

a + b + c = 2,
4a + 2b + c = 4,
4a + 3b + c = 4.

Solving this system in Z5 gives a = 4, b = 0, c = 3, confirming the result obtained by the previous method. Another way to solve this system of linear equations is of course to calculate the inverse of the matrix of this system and multiply it by the column on the right-hand side.
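Both methods are easy to automate. The Python sketch below is our own illustration (the name `lagrange_interpolate` is ours): it implements formula (5.6) over Zp and recovers 4x^2 + 3 from the three conditions of the example.

```python
def lagrange_interpolate(points, p):
    """Coefficients [a0, ..., ak] of the unique polynomial of degree <= k
    through the k+1 points (alpha_i, beta_i); all arithmetic is in Z_p."""
    k = len(points) - 1
    coeffs = [0] * (k + 1)
    for i, (xi, yi) in enumerate(points):
        num = [1]      # numerator polynomial prod_{j != i} (x - alpha_j)
        denom = 1      # scalar denominator prod_{j != i} (alpha_i - alpha_j)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            num = [0] + num                          # multiply num by x ...
            for t in range(len(num) - 1):            # ... then subtract xj * num
                num[t] = (num[t] - xj * num[t + 1]) % p
            denom = (denom * (xi - xj)) % p
        scale = (yi * pow(denom, -1, p)) % p
        for t, c in enumerate(num):
            coeffs[t] = (coeffs[t] + scale * c) % p
    return coeffs

# The example over Z_5: f(1) = 2, f(2) = 4, f(3) = 4.
print(lagrange_interpolate([(1, 2), (2, 4), (3, 4)], 5))  # [3, 0, 4], i.e. 4x^2 + 3
```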

Corollary 5.1.1 Fix a_0 ∈ F and consider the class of polynomials

f(x) = ∑_{i=0}^{k} a_i x^i,   a_i ∈ F,

with this fixed constant term a_0. Let α_1, . . . , α_k be distinct nonzero elements of F and β_1, . . . , β_k be arbitrary elements of F. Then there exists a unique polynomial f(x) of degree at most k in this class such that f(α_i) = β_i for i = 1, 2, . . . , k.

Proof This follows from Theorem 5.1.2 applied to the k + 1 points α_0 = 0, α_1, . . . , α_k with β_0 = a_0. □


Exercises

1. Use Lagrange interpolation to find f(x) = ∑_{i=0}^{2} a_i x^i ∈ Z7[x] with f(1) = f(2) = 1 and f(3) = 2.

2. Find the constant term of the polynomial f (x) of degree no greater than 2 with

coefﬁcients in Z7 such that f (1) = 3, f (3) = 2, f (4) = 1.

3. Find the constant term of the polynomial f (x) of degree at most 3 in Z7 such that

4. Use GAP to ﬁnd a polynomial f (x) ∈ Z13 [x] of degree at most 3 such that

In Chap. 6 we will see an application of Lagrange interpolation to cryptography, namely to secret sharing.

A polynomial

f(x) = ∑_{i=0}^{k} a_i x^i,   a_i ∈ F,    (5.7)

with leading coefficient a_k = 1 is called a monic polynomial of degree k over F. The polynomial g(x) = x^2 + 2x^5 − 1 has degree 5 but is not monic.

Definition 5.1.5 A polynomial f(x) from F[x] is said to be reducible over F if there exist two polynomials f1(x) and f2(x) from F[x], each of degree greater than or equal to 1, such that f(x) = f1(x)f2(x). Otherwise f(x) is said to be irreducible over F.

For example, the polynomial f(x) = x^2 + 1 is irreducible over R but reducible over C since f(x) = (x − i)(x + i). The polynomial g(x) = x^2 − 2 is irreducible over Q and reducible over R. The polynomial h1(x) = x^2 + 2 ∈ Z5[x] is irreducible over Z5, and h2(x) = x^2 + 2 = (x + 3)(x + 8) ∈ Z11[x] is reducible over Z11.

Thus reducibility of a polynomial depends heavily on the field under consideration. We will be especially interested in irreducible polynomials over Z2. Of course, both linear polynomials x and x + 1 are irreducible. Since x^2, (x + 1)^2 = x^2 + 1 and x(x + 1) = x^2 + x are reducible, the only irreducible polynomial of degree 2 is x^2 + x + 1. There are eight polynomials of degree 3:


f1(x) = x^3,
f2(x) = x^3 + 1,
f3(x) = x^3 + x + 1,
f4(x) = x^3 + x,
f5(x) = x^3 + x^2,
f6(x) = x^3 + x^2 + 1,
f7(x) = x^3 + x^2 + x,
f8(x) = x^3 + x^2 + x + 1.

A polynomial f(x) ∈ F[x] of degree 2 or 3 is irreducible over F if and only if it has no roots in F.

Proof If f(x) is irreducible then clearly it has no linear factors, and so by Proposition 5.1.2 it has no roots in F. Conversely, suppose that f(x) has no roots in F. If it were reducible, then f(x) = g(x)h(x), where either g(x) or h(x) has degree 1, and a polynomial of degree 1 always has a root in F. By Proposition 5.1.2 this root would also be a root of f(x), a contradiction. □

Returning to our list, we know that any reducible polynomial f(x) of degree 3 has a root in Z2, i.e., either f(0) = 0 or f(1) = 0. Six out of the eight polynomials in the table have roots in Z2 and only f3(x) = x^3 + x + 1 and f6(x) = x^3 + x^2 + 1 do not have roots, hence they are the only two irreducible polynomials.

If a polynomial f(x) of degree n is not divisible by any irreducible polynomial over F of degree not greater than n/2, then it is irreducible over F.

Proof If f(x) is reducible over F, then f(x) = g(x)h(x), where g(x), h(x) ∈ F[x] both have degrees at least one. Then at least one of them will have degree not greater than n/2. Any of its irreducible factors will have degree not greater than n/2. Hence, if there are no irreducible polynomials over F of degree not greater than n/2 that divide f(x), it must be irreducible over F. □

Example Let us check whether f(x) = x^5 + x^4 + 1 is irreducible over Z2. We check that f(0) = f(1) = 1, that is, f(x) has no roots in Z2. But does this imply its irreducibility? Not at all. The absence of roots means the absence of linear factors. However it is possible that a polynomial of degree five has no linear factors but is reducible, having one quadratic irreducible factor and another one of degree three. We now have to check that there are no quadratic irreducible factors. The only possible irreducible quadratic factor is x^2 + x + 1, so we have to divide f(x) by x^2 + x + 1 and calculate the remainder. We find that f(x) = (x^2 + x + 1)(x^3 + x + 1). Hence f(x) is reducible.
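This trial-division test is easy to program. In the Python sketch below (our illustration; the function names are ours) a polynomial over Z2 is encoded as a bitmask, bit i being the coefficient of x^i, and a polynomial of degree n is declared irreducible when no polynomial of degree between 1 and n/2 divides it.

```python
def gf2_mod(f, g):
    """Remainder of f modulo g; both are bitmask-encoded polynomials over Z_2."""
    dg = g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        f ^= g << (f.bit_length() - 1 - dg)
    return f

def is_irreducible_gf2(f):
    """Trial division: f (of degree >= 1) is reducible iff some polynomial
    of degree between 1 and deg(f)/2 divides it."""
    deg = f.bit_length() - 1
    for g in range(2, 1 << (deg // 2 + 1)):
        if gf2_mod(f, g) == 0:
            return False
    return True

# x^5 + x^4 + 1 has no roots in Z_2 yet is reducible:
print(is_irreducible_gf2(0b110001))   # False
print(gf2_mod(0b110001, 0b111))       # 0: x^2 + x + 1 divides it
print(is_irreducible_gf2(0b1011))     # True: x^3 + x + 1 is irreducible
```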


Irreducible polynomials play a similar role to that played by prime numbers. The

following theorem can be proved using the same ideas as for integers.

Theorem 5.1.4 Any polynomial f(x) from F[x] of degree no less than 1 can be uniquely represented as a product

f(x) = c · p1(x)^{α1} p2(x)^{α2} · · · pk(x)^{αk},

where p1(x), p2(x), . . . , pk(x) ∈ F[x] are monic irreducible (over F) polynomials, c is a non-zero constant, and α1, α2, . . . , αk are positive integers. This representation is unique apart from the order of p1(x), p2(x), . . . , pk(x).

Exercises

1. Let f(x) ∈ F[x] be a polynomial of degree at least 2. Prove or disprove:

(a) If f(x) has a root in F then f(x) is reducible in F[x].

(b) If f (x) is reducible in F[x] then f (x) has a root in F.

2. Find all irreducible quadratic polynomials in Z3 [x].

3. Explain why checking irreducibility is much easier for cubic (degree 3) polynomials than for quartic (degree 4) polynomials.
4. Which of the following polynomials are irreducible in Z3[x]:

(i) f(x) = x^3 + 2x + 2,
(ii) g(x) = x^4 + 2x^3 + 2x + 1,
(iii) h(x) = x^4 + x^3 + x^2 + x + 1?

5. Represent f(x) = x^5 + x + 1 ∈ Z2[x] as a product of irreducible polynomials.
6. Show that f(x) = x^5 + x^2 + 1 ∈ Z2[x] is an irreducible polynomial over Z2.

Deﬁnition 5.1.6 Let F be a ﬁeld and f (x), g(x) be two polynomials from F[x]. A

monic polynomial d(x) ∈ F[x] is called the greatest common divisor of f (x) and

g(x) iff:

(a) d(x) divides both f (x) and g(x), and

(b) d(x) is of maximal degree with the above property.

The greatest common divisor of f(x) and g(x) is denoted gcd(f(x), g(x)) or gcd(f, g)(x). Its uniqueness follows from the following theorem.

Theorem 5.1.5 (The Euclidean Algorithm) Let f and g be two polynomials. We use

the division algorithm several times to ﬁnd:


f = q1 g + r1,   deg(r1) < deg(g),
g = q2 r1 + r2,   deg(r2) < deg(r1),

r1 = q3 r2 + r3 , deg (r3 ) < deg (r2 ),

..

.

rs−2 = qs rs−1 + rs , deg (rs ) < deg (rs−1 ),

rs−1 = qs+1 rs .

Then all common divisors of f and g are also divisors of rs . Moreover, rs divides

both f and g. Thus rs = gcd(f , g).

Theorem 5.1.6 (The Extended Euclidean Algorithm) Let f and g be two polynomials. Let us form the following matrix with two rows R1, R2, and three columns C1, C2, C3:

(C1 C2 C3) = [ f  1  0
               g  0  1 ].

Using the quotients q1, q2, . . . from the Euclidean algorithm, we perform the row operations R3 := R1 − q1 R2, R4 := R2 − q2 R3, . . ., each time creating a new row, so as to obtain:

             ⎡ f    1     0        ⎤
             ⎢ g    0     1        ⎥
(C1 C2 C3) = ⎢ r1   1     −q1      ⎥
             ⎢ r2   −q2   1 + q1q2 ⎥
             ⎢ ⋮                   ⎥
             ⎣ rs   m     n        ⎦

The last row gives rs = gcd(f, g) together with polynomials m and n such that rs = f m + g n.

Example Let us find the greatest common divisor of f(x) = x^4 + x^3 + x^2 + 1 and g(x) = x^4 + x^2 + x + 1 in Z2[x]. We write:

x^4 + x^3 + x^2 + 1 = (x^4 + x^2 + x + 1) · 1 + (x^3 + x),
x^4 + x^2 + x + 1 = (x^3 + x) · x + (x + 1),
x^3 + x = (x + 1) · (x^2 + x).

So gcd(f, g)(x) = x + 1.


The extended version of the algorithm gives:

x^4 + x^3 + x^2 + 1    1    0
x^4 + x^2 + x + 1      0    1
x^3 + x                1    1
x + 1                  x    x + 1

so that x + 1 = f(x) · x + g(x) · (x + 1).

Deﬁnition 5.1.7 Two polynomials f (x), g(x) ∈ F[x] are said to be coprime (rela-

tively prime) if gcd(f , g)(x) = 1.

Corollary 5.1.2 Two polynomials f (x), g(x) ∈ F[x] are coprime if and only if there

exist polynomials m(x), n(x) ∈ F[x] such that

1 = f (x)m(x) + g(x)n(x).
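The Extended Euclidean Algorithm of Theorem 5.1.6 can be sketched in a few lines. The Python below is our illustration, using the same bitmask encoding of Z2[x] as before (bit i is the coefficient of x^i); it recovers gcd(f, g) = x + 1 and a Bézout pair for the worked example.

```python
def gf2_mul(a, b):
    """Carry-less product of two bitmask-encoded polynomials over Z_2."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_divmod(f, g):
    """Quotient and remainder of f divided by g in Z_2[x]."""
    q, dg = 0, g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        shift = f.bit_length() - 1 - dg
        q ^= 1 << shift
        f ^= g << shift
    return q, f

def gf2_ext_gcd(f, g):
    """Return (d, m, n) with d = gcd(f, g) = m*f + n*g over Z_2[x]."""
    r0, r1 = f, g
    m0, m1, n0, n1 = 1, 0, 0, 1
    while r1:
        q, r = gf2_divmod(r0, r1)
        r0, r1 = r1, r
        m0, m1 = m1, m0 ^ gf2_mul(q, m1)
        n0, n1 = n1, n0 ^ gf2_mul(q, n1)
    return r0, m0, n0

f, g = 0b11101, 0b10111       # x^4 + x^3 + x^2 + 1 and x^4 + x^2 + x + 1
d, m, n = gf2_ext_gcd(f, g)
print(bin(d), bin(m), bin(n))  # 0b11 0b10 0b11: x + 1 = f * x + g * (x + 1)
```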

Deﬁnition 5.1.8 Let F be a ﬁeld and f (x), g(x) be two polynomials from F[x]. A

monic polynomial m(x) ∈ F[x] is called the least common multiple of f (x) and g(x)

if:

(a) m(x) is a multiple of both f (x) and g(x);

(b) m(x) is of minimal degree with the above property.

It is denoted lcm(f (x), g(x)) or lcm(f , g)(x).

All the usual properties of the least common multiple are satisﬁed. For example,

as for the integers, we can prove:

Theorem 5.1.7 Let f(x) and g(x) be two monic polynomials in F[x]. Then

gcd(f, g)(x) · lcm(f, g)(x) = f(x)g(x).

Example Let f(x) = x^4 + x^3 + x^2 + 1 and g(x) = x^4 + x^2 + x + 1 be the polynomials in Z2[x] considered above. We know that gcd(f, g)(x) = x + 1. Hence

lcm(f, g)(x) = f(x)g(x)/(x + 1) = (x^8 + x^7 + x + 1)/(x + 1) = x^7 + 1.

Exercises

1. Find the greatest common divisor d(x) of the polynomials f(x) = x^7 + 1 and g(x) = x^3 + x^2 + x + 1 in Z2[x] and represent it in the form d(x) = f(x)m(x) + g(x)n(x).

5.2 Finite Fields

Let F be a field and let m(x) ∈ F[x] be a fixed polynomial of degree n ≥ 1. We denote by F[x]/(m(x)) the set of all polynomials of degree lower than n. This is exactly the set of all possible remainders on division by m(x). Clearly F[x]/(m(x)) is an n-dimensional vector space over F spanned by the monomials 1, x, . . . , x^{n−1}.

Let f (x) be a polynomial from F[x] and r(x) be its remainder on division by m(x).

We deﬁne

r(x) = f (x) mod m(x).

We will also write f (x) ≡ g(x) mod m(x) if f (x) mod m(x) = g(x) mod m(x).

Note that f (x) mod m(x) belongs to F[x]/(m(x)) for all f (x) ∈ F[x].

Let us now convert F[x]/(m(x)) into a ring^2 by introducing the following addition and multiplication:

f(x) ⊕ g(x) := (f + g)(x) mod m(x),    (5.8)
f(x) ⊗ g(x) := (f g)(x) mod m(x).    (5.9)

Note that the ‘new’ addition is not really new as it coincides with the old one.

But we do indeed get a new multiplication. All properties of a commutative ring for

F[x]/(m(x)) can be easily veriﬁed.

Example 5.2.1 Let us consider the ring R[x]/(x^2 + 1). Since deg(x^2 + 1) = 2, this is a 2-dimensional space over the reals with basis {1, x}. The addition is

(a · 1 + bx) ⊕ (c · 1 + dx) = (a + c) · 1 + (b + d)x,

and the multiplication is

(a · 1 + bx) ⊗ (c · 1 + dx) = ac + (ad + bc)x + bd x^2 ≡ (ac − bd) · 1 + (ad + bc)x.

One will recognise the complex numbers (with x playing the role of i). In mathematical language the ring R[x]/(x^2 + 1) is said to be isomorphic to C.

As in the case of the integers, and by using the same approach, we can prove

2 Those familiar with the basics of abstract algebra will recognise the quotient-ring of F[x] by the

principal ideal generated by m(x).
Theorem 5.2.1 The ring F[x]/(m(x)) is a field if and only if the polynomial m(x) is irreducible over F.


Proof Suppose m(x) is of degree n and is irreducible over F. Then we need to show

that every non-zero polynomial f (x) ∈ F[x]/(m(x)) is invertible. We know that

deg (f ) < n. Since m(x) is irreducible we have gcd(f , m) = 1 and by the Extended

Euclidean Algorithm we can ﬁnd a(x), b(x) ∈ F[x] such that a(x)f (x) + b(x)m(x) =

1. Let us divide a(x) by m(x) with remainder: a(x) = q(x)m(x) + r(x) and substitute

into the previous equation. We will obtain

r(x)f(x) + (q(x)f(x) + b(x))m(x) = 1.

This means that r(x) ⊗ f(x) = 1 in F[x]/(m(x)), thus f(x) is invertible and r(x) is its inverse.

On the other hand, if m(x) is not irreducible, we can write m(x) = n(x)k(x) with both factors of degree smaller than deg(m), which will lead to n(x) ⊗ k(x) = 0 in F[x]/(m(x)). Then, having divisors of zero, by Lemma 1.4.2 F[x]/(m(x)) cannot be a field. □

From now on, we will not use the special symbols ⊕ and ⊗ to denote the operations in F[x]/(m(x)); this will invite no confusion.

Example 5.2.2 Prove that K = Z2[x]/(x^4 + x + 1) is a field, and determine how many elements it has. Then find (x^3 + x^2)^{−1}.

Solution To prove that K is a field we must prove that m(x) = x^4 + x + 1 is irreducible. If it were reducible, then it would have a factor of degree 1 or 2. Since m(0) = m(1) = 1, it does not have linear factors. So, if it is reducible, the only possibility left is that it is the square of the only irreducible polynomial of degree 2, that is (x^2 + x + 1)^2 = x^4 + x^2 + 1. This does not coincide with m(x), hence m(x) is irreducible. Hence K is a field. Since dim_{Z2} K = deg(m(x)) = 4, K has 2^4 = 16 elements.

By using the Extended Euclidean algorithm we get

x^4 + x + 1      1              0
x^3 + x^2        0              1
x^2 + x + 1      1              x + 1
x                x              x^2 + x + 1
1                x^2 + x + 1    x^3 + x

Thus (x^3 + x^2)^{−1} = x^3 + x. □

Example 5.2.3 Let us continue to investigate K = Z2[x]/(x^4 + x + 1) for a while. We know that, as a finite field, K must have a primitive element, in fact φ(15) = 8 of them. The polynomial x^4 + x + 1 is very convenient since x is one of the primitive elements of K. Let us compute powers of x and place all elements of K in the table below.


Note that x^15 = 1, so logs are manipulated mod 15. We now have two different representations of elements of K: as tuples (or polynomials) and as powers. The first representation is best for calculating additions and the second for calculating multiplications and inverses.

tuple   polynomial           power   log
0000    0                    —       —
1000    1                    1       0
0100    x                    x       1
0010    x^2                  x^2     2
0001    x^3                  x^3     3
1100    1 + x                x^4     4
0110    x + x^2              x^5     5
0011    x^2 + x^3            x^6     6
1101    1 + x + x^3          x^7     7
1010    1 + x^2              x^8     8
0101    x + x^3              x^9     9
1110    1 + x + x^2          x^10    10
0111    x + x^2 + x^3        x^11    11
1111    1 + x + x^2 + x^3    x^12    12
1011    1 + x^2 + x^3        x^13    13
1001    1 + x^3              x^14    14

Using the table we can, for example, compute:

1. (1 + x^2)^{−1} = (x^8)^{−1} = x^{−8} = x^{15−8} = x^7 = 1 + x + x^3.
2. log(x + x^2 + x^3) = 11 and log(1 + x + x^2 + x^3) = 12. Thus log((x + x^2 + x^3)(1 + x + x^2 + x^3)) = (11 + 12) mod 15 = 8, hence (x + x^2 + x^3)(1 + x + x^2 + x^3) = x^8 = 1 + x^2.
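Such a table can be generated by a short program. In the Python sketch below (our illustration) elements of K are encoded as 4-bit masks, bit i being the coefficient of x^i; repeated multiplication by x builds the power and log tables and reproduces computation 1.

```python
M = 0b10011              # m(x) = x^4 + x + 1

def gf16_mul(a, b):
    """Multiplication in Z_2[x]/(m(x)) on 4-bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:  # degree reached 4: replace x^4 by x + 1
            a ^= M
    return r

power, log = {}, {}      # exponent -> element, element -> exponent
e = 1
for k in range(15):
    power[k], log[e] = e, k
    e = gf16_mul(e, 0b0010)          # multiply by the primitive element x

print(power[4] == 0b0011)            # True: x^4 = 1 + x, as in the table
# Computation 1: (1 + x^2)^{-1} = x^{15 - log(1 + x^2)} = x^7 = 1 + x + x^3
print(bin(power[(15 - log[0b0101]) % 15]))   # 0b1011
```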

Theorem 5.2.1 allows us to construct a field of cardinality p^n for any prime p and any positive integer n. All we need to do is to take Zp and an irreducible polynomial m(x) of degree n. Then Zp[x]/(m(x)) is the desired field. In this book we will not prove that for any p and any positive integer n such a polynomial indeed exists (although it does!). Moreover, for any prime p and positive integer n the field of p^n elements is unique up to an isomorphism. This is why it is denoted GF(p^n) and called the Galois^3 field of cardinality p^n. Again, proving its uniqueness is beyond the scope of this book.

Theorem 5.2.2 For any prime p and any positive integer n there exists a unique, up to isomorphism, field GF(p^n) consisting of p^n elements.

In the Advanced Encryption Standard (AES) algorithm, adopted in 2001, the field GF(2^8) is used for calculations. This field is constructed with the use of the irreducible polynomial m(x) = x^8 + x^4 + x^3 + x + 1.
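A multiplication routine for this field fits in a few lines. The Python sketch below is an illustration (not AES reference code); it uses the same shift-and-reduce idea as above and checks the worked example {57} · {83} = {C1} given in the AES specification (FIPS-197).

```python
AES_MOD = 0b100011011   # x^8 + x^4 + x^3 + x + 1

def gf256_mul(a, b):
    """Multiply two elements of GF(2^8), each encoded as a byte
    (bit i = coefficient of x^i)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:          # degree reached 8: subtract the modulus
            a ^= AES_MOD
    return r

def gf256_inv(a):
    """Inverse of a non-zero byte: a^(2^8 - 2) = a^254, by repeated multiplication."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result

print(hex(gf256_mul(0x57, 0x83)))              # 0xc1
print(gf256_mul(0x57, gf256_inv(0x57)) == 1)   # True
```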

3 See Sect. 3.1.3 for a brief historical note about this mathematician.


Exercises

1 + x + x^2 + x^3 in F = Z2[x]/(x^5 + x^3 + 1).

2. Let F = Z3[x]/(x^2 + 2x + 2).

(a) Prove that F is a ﬁeld.

(b) List all elements of F.

(c) Show that 2x + 1 is a primitive element in F by calculating all powers of

2x + 1 and constructing the ‘logarithm table’ as in Example 5.2.3.

(d) Using the ‘logarithm table’ which you created in part (c), calculate

(e) How many primitive elements are there in the ﬁeld F? List them all.

3. (advanced) Let f(x) = a0 + a1 x + · · · + an x^n be a polynomial from F[x], where F is any field. We define the derivative of f(x) by the formula:

f′(x) = a1 + 2a2 x + · · · + n an x^{n−1}.

(a) Check that the product rule holds for such a derivative.
(b) Prove that any multiple root of f(x) is also a root of gcd(f(x), f′(x)).
(c) Let p be a prime. Prove that the polynomial f(x) = x^{p^n} − x does not have multiple roots in any field F of characteristic p.

Let F and K be two ﬁelds such that F ⊆ K. We say that F is a subﬁeld of K and that

K is an extension of F if the addition and multiplication in K, being restricted to F,

coincide with the operations in F of the same name.

For example, the elements 0 and 1 of Z2 are also the elements 0 and 1 of K = Z2[x]/(x^4 + x + 1). So Z2 is a subfield of K = Z2[x]/(x^4 + x + 1).

Let K be an extension of a field F and let a ∈ K. A polynomial f(t) ∈ F[t] is an annihilating polynomial of a if a is a root of f(t), i.e.,

f (a) = 0. (Please note that the coefﬁcients of f (t) lie in F while a is an element

of K.) A polynomial f (t) ∈ F[t] is called the minimal annihilating polynomial of a

over F if it is an annihilating polynomial which is monic and of minimal possible

degree.

Example 5.2.5 In the extension R ⊆ C, check that the polynomial f(t) = t^2 − 2t + 2 is the minimal annihilating polynomial for a = 1 + i over R.


Solution We check that f(a) = (1 + i)^2 − 2(1 + i) + 2 = 2i − 2 − 2i + 2 = 0, so f(t) is annihilating for a. At the same time there can be no linear annihilating polynomial. Such a polynomial would have real coefficients and hence would be of the form g(t) = t − r, where r ∈ R. Substituting a will give (1 + i) − r = 0, which is not possible. □

Note that every complex number has an annihilating polynomial over R which is at most quadratic.

In the extension Z2 ⊆ K = Z2[x]/(x^4 + x + 1), the polynomial f(t) = t^4 + t + 1 is the minimal annihilating polynomial for x.

Indeed, substituting x for t gives f(x) = x^4 + x + 1 = 0 in K, so f(t) is an annihilating polynomial for x. On the other hand, if it were possible to find an

annihilating polynomial of degree 3 or smaller, say g(t) = αt^3 + βt^2 + γt + δ·1 with at least one coefficient non-zero, then

αx^3 + βx^2 + γx + δ·1 = 0,

which means that 1, x, x^2, x^3 are linearly dependent over Z2. But this was a basis of Z2[x]/(x^4 + x + 1), so we have arrived at a contradiction. □

Let K be an extension of F such that dim_F K = n, and let a ∈ K. Then the minimal annihilating polynomial for a has degree at most n.

Proof Consider the n + 1 elements 1, a, a^2, . . . , a^n of K. Since the dimension of K over F is n, these n + 1 vectors must be linearly dependent over F. Thus there exist c0, c1, . . . , cn ∈ F, not all zero, such that

c0 · 1 + c1 a + c2 a^2 + · · · + cn a^n = 0.

This is the same as saying that f(a) = 0 for f(t) = c0 + c1 t + · · · + cn t^n from F[t], so we have found an annihilating polynomial of degree at most n. □

Let a be an element of an extension K of a field F. Then:

(i) The minimal annihilating polynomial of a is irreducible over F.
(ii) Every annihilating polynomial of a is a multiple of the minimal annihilating polynomial of a.

Proof (i) Suppose that f(t) is the minimal annihilating polynomial of a and that it is reducible, i.e., f(t) = g(t)h(t), where g(t) and h(t) can be considered monic and each of degree strictly less than deg(f). Then 0 = f(a) = g(a)h(a), whence (there are no zero divisors in K) either g(a) = 0 or h(a) = 0, which contradicts the minimality of f(t).


(ii) Suppose that f(t) is the minimal annihilating polynomial of a and g(t) is any other annihilating polynomial of a. Let us divide g(t) by f(t) with remainder: g(t) = q(t)f(t) + r(t). If r(t) were non-zero, then substituting a we would get 0 = g(a) = q(a)f(a) + r(a) = 0 + r(a), from which r(a) = 0; but the degree of r(t) is strictly smaller than that of f(t), and thus we have arrived at a contradiction. Hence r(t) = 0 and g(t) is a multiple of f(t). □

To calculate the minimal annihilating polynomial we use the Linear Dependency

Relationship Algorithm (see Appendix B for this algorithm). Suppose we need to

ﬁnd the minimal annihilating polynomial of an element a ∈ K over a subﬁeld F of

K. Suppose n = dimF K. We choose any basis B of K over F. Then every element

x ∈ K can be represented by its coordinate column [x]B relative to the basis B.

For an element a ∈ K we consider the matrix A = ([1]_B [a]_B [a^2]_B . . . [a^n]_B). Its columns are linearly dependent (as are any n + 1 vectors in an n-dimensional vector space). By row reducing A to its reduced row echelon form we find the first k such that {[1]_B, [a]_B, [a^2]_B, . . . , [a^k]_B} is linearly dependent. This reduced row echelon form will also give us coefficients c0, c1, c2, . . . , c_{k−1} such that c0[1]_B + c1[a]_B + c2[a^2]_B + · · · + c_{k−1}[a^{k−1}]_B + [a^k]_B = 0. Then f(t) = t^k + c_{k−1}t^{k−1} + · · · + c1 t + c0 is the minimal annihilating polynomial of a over F.

Example 5.2.7 In the extension Z2 ⊆ Z2[x]/(x^4 + x + 1), find the minimal annihilating polynomial of a = 1 + x + x^3.

Solution We calculate the coordinate tuples of the following powers of a:

a^0 = (1 + x + x^3)^0 = 1                  → 1000
a^1 = (1 + x + x^3)^1 = 1 + x + x^3        → 1101
a^2 = (1 + x + x^3)^2 = 1 + x^3            → 1001
a^3 = (1 + x + x^3)^3 = x^2 + x^3          → 0011
a^4 = (1 + x + x^3)^4 = 1 + x^2 + x^3      → 1011

These ﬁve are already linearly dependent, so we don’t have to compute any further

powers. Now we use the Linear Dependency Relationship Algorithm to find a linear

dependency between these tuples. We place them as columns in a matrix and take it

to the row reduced echelon form:

⎡ 1 1 1 0 1 ⎤          ⎡ 1 0 0 0 1 ⎤
⎢ 0 1 0 0 0 ⎥   rref   ⎢ 0 1 0 0 0 ⎥
⎢ 0 0 0 1 1 ⎥   −→     ⎢ 0 0 1 0 0 ⎥
⎣ 0 1 1 1 1 ⎦          ⎣ 0 0 0 1 1 ⎦

from which we see that 1, a, a^2, a^3 are linearly independent (so a is not annihilated by any non-zero polynomial of degree ≤ 3) and that a^4 = 1 + a^3, whence the minimal annihilating polynomial of a will be f(t) = t^4 + t^3 + 1.
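The search can also be done by brute force over the finitely many monic candidates. The Python sketch below (our illustration; 4-bit masks encode elements of Z2[x]/(x^4 + x + 1)) confirms the result of Example 5.2.7.

```python
M = 0b10011   # x^4 + x + 1; bit i is the coefficient of x^i

def mul(a, b):
    """Multiplication in Z_2[x]/(M) on 4-bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= M
    return r

def evaluate(coeffs, a):
    """Evaluate sum coeffs[i] * a^i in the field; each coeffs[i] is 0 or 1."""
    value, power = 0, 1
    for c in coeffs:
        if c:
            value ^= power
        power = mul(power, a)
    return value

def minimal_polynomial(a):
    """Lowest-degree monic polynomial over Z_2 annihilating a."""
    for deg in range(1, 5):
        for mask in range(1 << deg):          # choose c_0, ..., c_{deg-1}
            coeffs = [(mask >> i) & 1 for i in range(deg)] + [1]   # monic
            if evaluate(coeffs, a) == 0:
                return coeffs
    return None

print(minimal_polynomial(0b1011))   # [1, 0, 0, 1, 1], i.e. t^4 + t^3 + 1
```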


Exercises

1. What is the dimension of the field F = GF(2^4) over its subfield F1 = GF(2^2)?
2. Let K = Z2[x]/(1 + x + x^4) as introduced in Example 5.2.3. Find the minimal annihilating polynomial over Z2 for:

(a) α = 1 + x + x^2;
(b) α = 1 + x.

3. Let K be the field K = Z2[x]/(x^4 + x^3 + 1). Then K is an extension of Z2.

(a) Create a table for K as in Example 5.2.3. Check that x is a primitive element of this field.
(b) Find the minimal annihilating polynomials for x, x^3 and x^5 over Z2.
(c) Calculate (x^100 + x + 1)(x^3 + x^2 + x + 1)^{15} + x^3 + x + 1 in the most efficient way and represent it as a power of x and as a polynomial in x of degree at most 3.

4. Generate a ﬁeld consisting of 16 elements using GAP. It will give you:

gap> F:=GaloisField(2ˆ4);

GF(2ˆ4)

gap> AsList(F);

[ 0*Z(2), Z(2)ˆ0, Z(2ˆ2), Z(2ˆ2)ˆ2, Z(2ˆ4), Z(2ˆ4)ˆ2, Z(2ˆ4)ˆ3, Z(2ˆ4)ˆ4,

Z(2ˆ4)ˆ6, Z(2ˆ4)ˆ7, Z(2ˆ4)ˆ8, Z(2ˆ4)ˆ9, Z(2ˆ4)ˆ11, Z(2ˆ4)ˆ12, Z(2ˆ4)ˆ13,

Z(2ˆ4)ˆ14 ]

(a) Explain why Z(2ˆ4)ˆ5 and Z(2ˆ4)ˆ10 are not listed among the elements.
(b) Using GAP find the polynomial in Z2[x] of smallest degree of which Z(2ˆ4)ˆ7 is a root.

Chapter 6

Secret Sharing

The very word “secrecy” is repugnant in a free and open society; and we are as a people inherently and historically opposed to secret societies, to secret oaths, and to secret proceedings.

John F. Kennedy (1917–1963)

Secrecy is the ﬁrst essential in affairs of state.

Cardinal Richelieu (1585–1642)

Certain cryptographic keys, such as missile launch codes, numbered bank accounts

and the secret decoding exponent in an RSA public key cryptosystem, are so impor-

tant that they present a dilemma. If too many copies are distributed, one may be leaked.

If too few, they might all be lost or accidentally destroyed. Secret sharing schemes

invented by Shamir [1] and Blakley [2] address this problem, and allow arbitrarily

high levels of conﬁdentiality and reliability to be achieved. A secret sharing scheme

‘divides’ the secret s into ‘shares’—one for every user—in such a way that s can be easily reconstructed by any authorised subset of users, but an unauthorised subset

of users can extract absolutely no information about s. A secret sharing scheme, for

example, can secure a secret over multiple servers and remain recoverable despite

multiple server failures.

Secret sharing also provides a mechanism to facilitate a cooperation—in both

human and artiﬁcial societies—when cooperating agents have different status with

respect to the activity and certain actions are only allowed to coalitions that satisfy

certain criteria, e.g., to sufﬁciently large coalitions or coalitions with players of

sufﬁcient seniority or to coalitions that satisfy a combination of both criteria. The

banking system where the employees are arranged into a hierarchy according to their

ranks or designations provides many examples. Simmons,1 for example, describes

the situation of a money transfer from one bank to another. If the sum to be transferred

is sufﬁciently large this transaction must be authorised by three senior tellers or two

vice-presidents. However, two senior tellers and a vice-president can also authorise

the transaction. Tassa2 provides another banking scenario. The shares of the vault

1 Simmons, G. (1990). How to (really) share a secret. In: Proceedings of the 8th annual international

2 Tassa,T. (2007). Hierarchical threshold secret sharing. Journal of Cryptology, 20, 237–264.



key may be distributed among bank employees, some of whom are tellers and some

are department managers. The bank policy could require the presence of, say, three

employees in opening the vault, but at least one of them must be a departmental

manager.

More formally, we assume that the set of users is U = {1, 2, . . . , n} and D is the

dealer who facilitates secret sharing.3 It is always assumed that the dealer knows the

secret.

Definition 6.1.1 Let 2^U be the power set^4 of the set of all users U. The set Γ ⊆ 2^U of all authorised coalitions is called the access structure of the secret-sharing scheme.

An access structure may be any subset of 2^U such that

C ∈ Γ and C ⊆ C′ together imply C′ ∈ Γ.    (6.1)

This is the monotone property; it reflects the natural requirement that if a smaller coalition knows the secret, then the larger one will know it too. The access structure is public knowledge and all users know it.

Let Γ ⊆ 2^U be an access structure. A coalition C ⊆ U is called a minimal authorised coalition if it is authorised and any proper subset of C is not authorised. Due to the monotone property (6.1) the access structure is completely defined by the set Γ_min of its minimal authorised coalitions.

We assume that every user participates in at least one minimal authorised coalition.

If not, such a user never brings useful information to any coalition of users and is

redundant.

Example 6.1.1 The threshold access structure “k-out-of-n” consists of all subsets

of 2U consisting of k or more users.

According to Time Magazine, May 4, 1992, a typical threshold access structure

was realized in USSR. The three top state ofﬁcials, the President, the Prime Minister,

and the Minister of Defense, each had the so-called “nuclear suitcase” and any two

of them could authorise a launch of a nuclear warhead. No one of them could do it

alone. So it was a two-out-of-three threshold scheme.

In a two-out-of-three scheme U = {1, 2, 3} and Γ_min = {{1, 2}, {1, 3}, {2, 3}}.

We see that all users are equally important. If, however, U = {1, 2, 3} and

4 The set of all subsets of U .


Γ_min = {{1, 2}, {1, 3}}, then user 1 is much more important than the two other

users. Without user 1 the secret cannot be accessed. But user 1 is not almighty. To

access the secret she needs to join forces with at least one other user.

Here are a couple of real life examples.

Example 6.1.2 Consider the situation of a money transfer from one bank to another.

If the sum to be transferred is sufﬁciently large this transaction must be authorised

by three senior tellers or two vice-presidents. However, two senior tellers and a

vice-president can also authorise the transaction.

Example 6.1.3 The United Nations Security Council consists of ﬁve permanent

members and 10 non-permanent members. The passage of a resolution requires

that all ﬁve permanent members vote for it, and also at least nine members in total.

We will deal with threshold access structures ﬁrst. A very elegant construction by

Shamir realising the threshold access structure is based on Lagrange’s interpolation

polynomial and will be presented in the next section.

Exercises

1. Let U = {1, 2, 3, 4} and Γ_min = {{1, 2, 3}, {3, 4}}. List all authorised coalitions.

2. Write down the minimal authorised coalitions for the access structure in

Example 6.1.2. Assume that the vice-presidents are users 1 and 2 and the senior

tellers are users 3, 4, 5.

3. Find the number of minimal authorised coalitions in Example 6.1.3.

4. Let U1 and U2 be disjoint sets of users and let Γ1 and Γ2 be access structures over
U1 and U2, respectively. Let U = U1 ∪ U2. Then
(a) The sum of Γ1 and Γ2 is Γ1 + Γ2 = {X ⊆ U | X ∩ U1 ∈ Γ1 or X ∩ U2 ∈ Γ2}.
Prove that Γ1 + Γ2 is an access structure.
(b) The product of Γ1 and Γ2 is Γ1 × Γ2 = {X ⊆ U | X ∩ U1 ∈ Γ1 and
X ∩ U2 ∈ Γ2}. Prove that Γ1 × Γ2 is an access structure.

5. Let Γ be an access structure over a set of users U and let us define the dual
structure of Γ as the set of complements of all unauthorised coalitions, i.e.,

Γ* = {X ⊆ U | X^c ∉ Γ}.

Prove that Γ* is also an access structure.

We now turn to an application of Lagrange's interpolation polynomial to cryptography, namely to secret sharing.

Suppose that the secret is a string of zeros and ones. We may assume that it is

the binary representation of a positive integer s. We choose a prime p which is


sufﬁciently large. Then the ﬁeld Z p is large and we may assume that s ∈ Z p without

any danger that it can be easily guessed. Thus our secret will always be an element

of a ﬁnite ﬁeld.

Suppose n users wish to share this secret by dividing it into ‘pieces’ in such a way

that any k people, where k is a ﬁxed positive integer not exceeding n, can learn the

secret from their pieces, but no subset of less than k people can do so. Here the word

“dividing” must not be understood literally. Shamir proposed the following elegant

solution to this problem. The secret can be “divided into pieces” as follows. The

centre:

1. generates k random coefficients t0, t1, . . . , tk−1 ∈ Zp and sets the secret s to be t0;
2. forms the polynomial p(x) = t0 + t1 x + · · · + tk−1 x^{k−1} ∈ Zp[x];
3. gives user i the “piece” p(i), for i = 1, . . . , n. Practically it can be an electronic
card where a pair of numbers (i, p(i)) is stored.

Now, given any k values for p(x), one can use Theorem 5.1.2 to interpolate and to ﬁnd

all coefﬁcients of p(x) including the secret t0 = s. However, due to Corollary 5.1.1,

a subset of k−1 values for p(x) provides absolutely no information about s, since for

any possible s there is a polynomial of degree k−1 consistent with the given values

and the possible value of s.
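The sharing and recovery procedure just described can be sketched in code. This is an illustrative sketch only; the function names and the choice of prime are ours, not the book's:

```python
# Sketch of Shamir's k-out-of-n scheme over Z_p (illustrative names).
import random

def make_shares(secret, k, n, p):
    """Dealer: random polynomial of degree < k with constant term = secret."""
    coeffs = [secret] + [random.randrange(p) for _ in range(k - 1)]
    poly = lambda x: sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
    return [(i, poly(i)) for i in range(1, n + 1)]

def reconstruct(shares, p):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

p = 2**31 - 1                      # a Mersenne prime, large enough here
shares = make_shares(12345, 3, 5, p)
print(reconstruct(shares[:3], p))  # any 3 of the 5 shares suffice: prints 12345
```

Any subset of k shares reconstructs the same constant term, while fewer shares leave every secret equally possible, exactly as Corollary 5.1.1 asserts.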

Example 6.1.4 The company Dodgy Dealings Inc. has four directors. According to

a clause in the company’s constitution any three of them are allowed to get access

to the company’s secret offshore account. The company set up a Shamir’s threshold

access secret sharing scheme for facilitating this clause with the secret password

being an element of Z7 . According to this scheme the system administrator issued

magnetic cards to the directors as required.

Suppose that three directors with the following magnetic cards

director  1  2  4
card      3  0  6

gathered to make a withdrawal from their offshore account. Show how the secret

password can be calculated.

Solution. A quadratic polynomial p(x) = t0 + t1 x + t2 x^2 ∈ Z7 [x] satisfies
p(1) = 3, p(2) = 0 and p(4) = 6. By Lagrange's interpolation formula,

p(x) = 3 · (x − 2)(x − 4) / ((1 − 2)(1 − 4)) + 6 · (x − 1)(x − 2) / ((4 − 1)(4 − 2))
     = (x^2 + x + 1) + (x^2 + 4x + 2) = 2x^2 + 5x + 3,

so the secret password is t0 = p(0) = 3.
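The interpolation in this example is easy to check by machine. A small sketch, where the helper name interpolate_at is our own:

```python
# Check of Example 6.1.4 over Z7: the polynomial through (1,3), (2,0), (4,6)
# is 2x^2 + 5x + 3, so the secret is its value at 0.
p = 7
cards = [(1, 3), (2, 0), (4, 6)]

def interpolate_at(x, points, p):
    """Evaluate the Lagrange interpolation polynomial at x, mod p."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

assert [interpolate_at(x, cards, p) for x in (1, 2, 4)] == [3, 0, 6]
print(interpolate_at(0, cards, p))   # the secret password: prints 3
```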


If in Shamir’s scheme the enumeration of users is publicly known, then only the

value p(i) must be given to the ith user. In this case the secret s and each share p(i)

are both an element of the same ﬁeld and need the same number of binary digits to

encode them. As we will see one cannot do any better.

Exercises

1. According to the 3-out-of-4 Shamir’s threshold secret sharing scheme the admin-

istrator issued electronic cards to the users:

user  1  2  3  4
card  4  4  x  0

(b) Find x and determine the card of user 3.

2. Shamir’s secret sharing scheme is set up so that the secret is an element of Z31

and the threshold is 3 which means that any three users are authorised. Show how

the secret can be reconstructed from the shares

user   1   5   7
share  16  7   22

3. The league club Crawlers United has six senior board members. Each year the

club holds an anniversary day, and on this day the senior board members have a

duty to open the club vault, take out the club’s meager collection of trophies, and

put them on display. According to a clause in the club’s constitution any four of

them are allowed to open the vault. The club set up a Shamir’s threshold access

secret sharing scheme for facilitating this clause with the secret password being

an element of Z97 . According to this scheme the administrator issued electronic

cards to the senior board members as required.

Suppose that four senior board members are gathered to open the vault with the

following cards:

member  1   2   4   6
card    56  40  22  34

(b) Guess which cards were given to the two remaining senior board members.


Let us see now how we can deﬁne a secret sharing scheme formally.

Let S0 , S1 , . . . , Sn be ﬁnite sets where S0 will be interpreted as a set of all possible

secrets and Si will be interpreted as a set of all possible shares that can be given to

user i. Suppose |Si | = m i . We may think of a very large table, consisting of up to

M = m 0 m 1 · · · m n rows, where each row contains a tuple

(s0 , s1 , . . . , sn ), (6.2)

where si comes from Si (and all rows are distinct). Mathematically, the set of all such

(n + 1)-tuples is denoted by the Cartesian product S0 × S1 × . . . × Sn . Any subset

T ⊆ S0 × S1 × . . . × Sn

will be called a distribution table; it can be viewed as a table with rows like the one shown in (6.2). If a secret s0 ∈ S0 is to be distributed among users,

then one (n + 1)-tuple

(s0 , s1 , . . . , sn ) ∈ T

is chosen by the dealer from T at random uniformly among those tuples whose ﬁrst

coordinate is s0 . Then user i gets the share si ∈ Si .

There is one more, but essential, component of a secret sharing scheme that we
have not introduced yet. We must ensure that every authorised coalition is
able to recover the secret. Thus we need to have, for every authorised coalition

X = {i1, i2, . . . , ik} ∈ Γ, a secret recovery function (algorithm)

fX : Si1 × Si2 × · · · × Sik → S0

with the property that fX(si1, si2, . . . , sik) = s0 for every (s0, s1, s2, . . . , sn) ∈ T.

In particular, the distribution table cannot contain tuples (s, . . . , si1, . . . , si2, . . . , sik, . . .) with s ≠ s0.


Example 6.2.1 Consider a secret sharing scheme realising the access structure with
Γmin = {{1, 2}, {1, 3}}, with Si = Z3 for i = 0, 1, 2, 3 and the distribution table

        D 1 2 3
      ⎡ 0 0 0 0 ⎤
      ⎢ 1 1 1 2 ⎥
      ⎢ 0 1 2 1 ⎥
      ⎢ 1 2 0 0 ⎥
  T = ⎢ 2 2 2 1 ⎥ .        (6.3)
      ⎢ 0 2 1 2 ⎥
      ⎢ 2 1 0 0 ⎥
      ⎢ 1 0 2 1 ⎥
      ⎣ 2 0 1 2 ⎦

The two secret recovery functions s0 = f{1,2}(s1, s2) and s0 = f{1,3}(s1, s3) can be
given by the tables

  s1 s2 s0        s1 s3 s0
  0  0  0         0  0  0
  1  0  2         1  0  2
  0  1  2         0  1  1
  1  1  1         1  1  0
  0  2  1         0  2  2
  2  0  1         2  0  1
  1  2  0         1  2  1
  2  1  0         2  1  2
  2  2  2         2  2  0

respectively. Note that the function f {2,3} does not exist. Indeed, when (s2 , s3 ) =

(0, 0) the secret s0 can take values 0, 1, 2 so f {2,3} (0, 0) is not deﬁned.
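The existence or non-existence of a recovery function can be tested mechanically by scanning the distribution table. A sketch, with the table of this example entered as Python tuples (the helper name recovery_function is ours):

```python
# Distribution table of Example 6.2.1, rows as (s0, s1, s2, s3). A recovery
# function for a coalition X exists iff the X-columns determine column 0.
T = [(0,0,0,0), (1,1,1,2), (0,1,2,1), (1,2,0,0), (2,2,2,1),
     (0,2,1,2), (2,1,0,0), (1,0,2,1), (2,0,1,2)]

def recovery_function(T, coalition):
    """Return the map {shares -> secret} if it is well defined, else None."""
    f = {}
    for row in T:
        key = tuple(row[i] for i in coalition)
        if f.setdefault(key, row[0]) != row[0]:
            return None          # same shares appear with different secrets
    return f

print(recovery_function(T, (1, 2)) is not None)   # {1,2} is authorised: True
print(recovery_function(T, (2, 3)))               # {2,3} cannot recover: None
```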

Example 6.2.2 (n-out-of-n scheme) Let us design a secret sharing scheme with n

users such that the only authorised coalition is the grand coalition, that is the set

U = {1, 2, . . . , n}. We need a sufﬁciently large ﬁeld F and set S0 = F so that it is

infeasible to try all secrets one by one. We will also have Si = F for all i = 1, . . . , n.

To share a secret s ∈ F the dealer generates n−1 random elements s1 , s2 , . . . , sn−1

∈ F and calculates sn = s − (s1 + · · · + sn−1 ). Then he gives share si to user i.

The distribution table T will consist of all (n + 1)-tuples (s0, s1, s2, . . . , sn) such that
s1 + s2 + · · · + sn = s0, and the secret recovery function (in this case the only one) will be

fU(s1, s2, . . . , sn) = s1 + s2 + · · · + sn.
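This additive scheme is short enough to sketch directly (the function names and the modulus are our illustrative choices):

```python
# Sketch of the n-out-of-n scheme of Example 6.2.2 over Z_p: the first n-1
# shares are random, the last is chosen so that all n shares sum to the secret.
import random

def share_all_or_nothing(secret, n, p):
    shares = [random.randrange(p) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % p)
    return shares

def recover(shares, p):
    return sum(shares) % p

p = 101
shares = share_all_or_nothing(57, 4, p)
print(recover(shares, p))        # prints 57
```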

The distribution table is convenient for deﬁning the secret sharing scheme, how-

ever, in practical applications it is usually huge, so schemes are normally deﬁned

differently.


Definition 6.2.1 A secret sharing scheme is called perfect if for every non-authorised
coalition of users { j1, j2, . . . , jm} ⊂ U, for every sequence of shares s j1, s j2, . . . , s jm
with s jr ∈ S jr, and for every two possible secrets s, s′ ∈ S0 the distribution table T
contains as many tuples (s, . . . , s j1, s j2, . . . , s jm, . . .) as tuples (s′, . . . , s j1, s j2, . . . , s jm, . . .).

In other words, if the scheme is perfect a non-authorised coalition

X = { j1, j2, . . . , jm} with shares s j1, s j2, . . . , s jm will have no reason to believe
that the secret s was more likely to be chosen than any other secret s′. For example,

in Example 6.2.1 if users 2 and 3 have shares 2 and 1, respectively, they will observe

the following rows of T

D 1 2 3

0 1 2 1

2 2 2 1

1 0 2 1

and will be unable to determine which row was chosen by the dealer. So the scheme

in that example is perfect.

The scheme from Example 6.2.2 is obviously perfect. Let us have another look at

the perfect secret sharing scheme invented by Shamir and specify the secret recovery

functions.

Example 6.2.3 ([1]) Suppose that we have n users and the access structure is now

Γ = {X ⊆ U | |X| ≥ k}, i.e., a coalition is authorised if it contains at least k users.

Let F be a large ﬁnite ﬁeld and put Si = F for i = 0, 1, . . . , n. Let a1 , a2 , . . . , an

be distinct ﬁxed publicly known nonzero elements of F (in the earlier example we

took ai = i).

Suppose s ∈ F is the secret to share. The dealer randomly generates t1, . . . , tk−1 ∈ F,
sets t0 = s, and forms the polynomial

p(x) = t0 + t1 x + · · · + tk−1 x^{k−1}.

Then she gives the share si = p(ai) to user i. Note that s = p(0).

Suppose now X = {i 1 , i 2 , . . . , i k } is a minimal authorised coalition. Then the

secret recovery function is

f_X(si1, si2, . . . , sik) = Σ_{r=1}^{k} sir · [(−ai1) · · · ^(−air) · · · (−aik)] / [(air − ai1) · · · ^(air − air) · · · (air − aik)],

where the hat over a term indicates its omission. This is the value at zero of
Lagrange's interpolation polynomial

Σ_{r=1}^{k} p(air) · [(x − ai1) · · · ^(x − air) · · · (x − aik)] / [(air − ai1) · · · ^(air − air) · · · (air − aik)].


We now may use the idea in Example 6.2.2 to construct a perfect secret sharing

scheme for an arbitrary access structure Γ. We will illustrate this method in the
following example.

Example 6.2.4 Let U = {1, 2, 3, 4} and Γmin = {{1, 2}, {2, 3}, {3, 4}}. Let s ∈ Zp

be a secret. Firstly we consider three coalitions of users {1, 2}, {2, 3} and {3, 4}

separately and build 2-out-of-2 schemes on each of these sets of users. Under the

ﬁrst scheme users 1 and 2 will get shares a and s − a, under the second scheme users

2 and 3 get shares b and s − b and under the third scheme users 3 and 4 get shares c

and s − c. Thus altogether users will get the following shares:

1 ← a,

2 ← (s − a, b),

3 ← (s − b, c),

4 ← s − c.

Let us show that this scheme is perfect. For this we have to consider every maximal

non-authorised coalition and show that it has no clue about the secret. It is easy

to see that every coalition of three or more players is authorised. So the maximal

non-authorised coalitions will be {1, 3}, {1, 4}, {2, 4}. The coalition {1, 3} will know

values a, s − b and c. Since a, b, c were chosen randomly and independently, a, s − b

and c are also three random independent values which contain no information about

s. Similarly for {1, 4} and {2, 4}. Note that under this scheme users 2 and 3 will have

to hold as their shares two elements of Z p each. Their shares will be twice as long

as the secret (in binary representation).
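The dealing procedure of this example can be sketched as follows (the function name deal and the modulus are our illustrative choices):

```python
# Shares for Example 6.2.4 (minimal coalitions {1,2}, {2,3}, {3,4}): three
# independent 2-out-of-2 schemes glued together; users 2 and 3 hold two
# field elements each, so their shares are twice as long as the secret.
import random

def deal(secret, p):
    a, b, c = (random.randrange(p) for _ in range(3))
    return {1: (a,),
            2: ((secret - a) % p, b),
            3: ((secret - b) % p, c),
            4: ((secret - c) % p,)}

p, s = 101, 88
shares = deal(s, p)
# each minimal coalition recovers the secret from one 2-out-of-2 pair:
print((shares[1][0] + shares[2][0]) % p)   # {1,2}: a + (s-a) = 88
print((shares[2][1] + shares[3][0]) % p)   # {2,3}: b + (s-b) = 88
print((shares[3][1] + shares[4][0]) % p)   # {3,4}: c + (s-c) = 88
```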

Theorem 6.2.1 For any access structure Γ there exists a perfect secret sharing
scheme which realises it.

Sketch of the proof. Let us consider the set Γmin of all minimal authorised coalitions.
Suppose a user i belongs to q minimal authorised coalitions W1, W2, . . . , Wq whose
cardinalities are m1, m2, . . . , mq. We then consider q separate smaller access structures,
where the jth one is defined on the set of users Wj and is an mj-out-of-mj
access structure. Let sj be the share received by user i in the jth reduced access structure.
So, in total, user i receives the vector of shares (s1, s2, . . . , sq). As the access
structure is public knowledge, user i will use the share sj only when an authorised
coalition with his participation contains Wj. If a coalition is not authorised, then

it does not contain any of the W1 , W2 , . . . , Wq and it is possible to show that its

participants cannot get any information about the secret. �

Under this method if a user belongs to k minimal authorised coalitions, then she

will receive k elements of the ﬁeld to hold as her share.

Suppose 2^{d−1} ≤ |S0| < 2^d or, equivalently, ⌈log2 |S0|⌉ = d. Then we can encode elements of
S0 (secrets) using binary strings of length d. In this case we say that the length of


the secret is d. Similarly we can talk about the length of the share that user i has
received. We say that the information ratio of the secret sharing scheme S is

i(S) = max_{1 ≤ i ≤ n} log2 |Si| / log2 |S0|.

This number is the maximal ratio of the amount of information that must be conveyed

to a participating user to the amount of information that is contained in the secret.

In the secret sharing literature it is also common to use the term information rate,

which is the inverse of the information ratio. The information ratio of the scheme

constructed in Theorem 6.2.1 is terrible. For example, for the (n/2 + 1)-out-of-n
scheme (assume that n is even) every user belongs to C(n, n/2) authorised coalitions,
which by Stirling's formula grows approximately as 2^n/√n. More precisely, we will have

i(S) ∼ √(2/π) · 2^n/√n,

i.e., the information ratio of such a scheme grows exponentially with n. We know we

can do much better: the information ratio of Shamir’s scheme is 1. However, for

some access structures the information ratio can be large. It is not known exactly

how large it can be.
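The exponential growth claimed above is easy to observe numerically; a small sketch comparing the binomial coefficient C(n, n/2) with its Stirling estimate:

```python
# The binomial coefficient C(n, n/2) against its Stirling approximation
# sqrt(2/pi) * 2**n / sqrt(n), illustrating the exponential growth of the
# information ratio of the generic construction.
from math import comb, sqrt, pi

for n in (10, 20, 30):
    exact = comb(n, n // 2)
    estimate = sqrt(2 / pi) * 2**n / sqrt(n)
    print(n, exact, round(estimate))
```

The relative error of the estimate shrinks as n grows, while both quantities explode exponentially.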

Exercises

1. Consider the secret sharing scheme with the following distribution table.

s0 s1 s2 s3 s4 s5 s6

0 0 0 1 1 2 2

0 0 0 2 2 1 1

0 1 1 2 2 0 0

0 1 1 0 0 2 2

0 2 2 0 0 1 1

0 2 2 1 1 0 0

1 0 1 1 2 2 0

1 0 2 2 1 1 0

1 1 2 2 0 0 1

1 1 0 0 2 2 1

1 2 0 0 1 1 2

1 2 1 1 0 0 2

(a) What is the domain of the secrets? What are the domains of the shares?

(b) Show that the coalition of users {1, 2} is authorised but {1, 3, 5} is not.

(c) Give the table for the secret recovery function for the coalition {1, 2}.


Let us look at Shamir’s scheme from a different perspective. We can observe that the

vector of the shares (where we think that the secret is the share of the dealer) can be

obtained by the following matrix multiplication as

⎡ 1  0    0     . . .  0        ⎤ ⎡  t0  ⎤   ⎡ p(0)  ⎤   ⎡ s0 ⎤
⎢ 1  a1   a1^2  . . .  a1^{k−1} ⎥ ⎢  t1  ⎥   ⎢ p(a1) ⎥   ⎢ s1 ⎥
⎢ 1  a2   a2^2  . . .  a2^{k−1} ⎥ ⎢  ..  ⎥ = ⎢  ..   ⎥ = ⎢ .. ⎥ ,   (6.5)
⎢ ..      ..    . . .  ..       ⎥ ⎣ tk−1 ⎦   ⎣ p(an) ⎦   ⎣ sn ⎦
⎣ 1  an   an^2  . . .  an^{k−1} ⎦

Since the elements a1, a2, . . . , an are all different and nonzero, any k rows of the matrix in (6.5) are linearly independent

since the determinant of the matrix formed by these rows is the well-known Van-

dermonde determinant (10.3). This is why any k users can learn all the coefﬁcients

t0 , t1 , t2 , . . . , tk−1 of p(x), including its constant term t0 (which is the secret).

Let us write (6.5) in the matrix form as H t = s, where

     ⎡ 1  0   0    . . .  0        ⎤        ⎡ t0   ⎤        ⎡ s0 ⎤
     ⎢ 1  a1  a1^2 . . .  a1^{k−1} ⎥        ⎢ t1   ⎥        ⎢ s1 ⎥
H =  ⎢ 1  a2  a2^2 . . .  a2^{k−1} ⎥ ,  t = ⎢ ..   ⎥ ,  s = ⎢ .. ⎥ ,   (6.6)
     ⎢ ..     ..   . . .  ..       ⎥        ⎣ tk−1 ⎦        ⎣ sn ⎦
     ⎣ 1  an  an^2 . . .  an^{k−1} ⎦

and denote the rows of H as h0 , h1 , h2 , . . . , hn . Then the following is true: the span

of a group of distinct rows {hi1 , hi2 , . . . , hir }, none of which is h0 , contains h0 if and

only if r ≥ k. We may now define the k-out-of-n access structure as follows:

ΓH = {X ⊆ U | h0 ∈ span{hi | i ∈ X}}.   (6.7)

This can be generalised by considering matrices H other than the one in (6.6).

Theorem 6.2.2 Let H be an (n + 1) × k matrix with coefficients in a finite field F and let h0, h1, . . . , hn be the rows of H.

Let us deﬁne a secret sharing scheme on the set of users U = {1, 2, . . . , n} as

follows. Choose the coefﬁcients of vector t = (t0 , t1 , . . . , tk−1 ) randomly, calculate

the vector s = (s0 , s1 , . . . , sn ) from the equation H t = s, declare s0 to be the secret

and s1 , s2 , . . . , sn the shares of users 1, 2, . . . , n, respectively. Then this is a perfect

secret sharing scheme realising the access structure ΓH defined as in (6.7).


Proof Suppose first that h0 is in the span of {hi1, hi2, . . . , hir}, so that h0 =
λ1hi1 + λ2hi2 + · · · + λrhir for some λ1, . . . , λr ∈ F. Multiplying both sides of this equation by t we obtain s0 = λ1si1 +
λ2si2 + · · · + λrsir, hence the secret s0 can be calculated from the shares of users
i1, i2, . . . , ir.

Suppose now that h0 is not in the span of {hi1 , hi2 , . . . , hir }. Without loss of gener-

ality we may assume that i 1 = 1, . . . , ir = r , i.e., that there are users 1, 2, . . . , r with

their shares s1 , s2 , . . . , sr and that h0 is not a linear combination of the h1 , h2 , . . . , hr .

Let Hr be the matrix with rows h1, h2, . . . , hr and H̄r be the matrix with rows
h0, h1, . . . , hr. By the assumption we have rank(H̄r) = rank(Hr) + 1.

Let sr be the column vector with entries s1 , s2 , . . . , sr and s̄r be the column vector

with entries s0 , s1 , s2 , . . . , sr . Since the system

Hr t = sr (6.8)

is consistent (the dealer's vector t is one of its solutions), we have rank(Hr | sr) = rank(Hr), where (Hr | sr) is the augmented
matrix of the system (6.8). As the matrix (H̄r | s̄r) is obtained by adding just one
row to (Hr | sr), its rank is either the same or larger by 1. On the other hand, it is
not smaller than the rank of H̄r. Since rank(H̄r) = rank(Hr) + 1, it will be true
that rank(H̄r | s̄r) = rank(H̄r) and the system H̄r t = s̄r is consistent for every s0. Since

the dimension, and hence the cardinality (remember F is finite), of the solution set is
determined by the rank of H̄r only, we will have the same number of solutions to the
equation H̄r t = s̄r no matter what s0 was. So members of the coalition {i1, i2, . . . , ir}

will be unable to identify s0 , hence this coalition is not authorised. �
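The recipe of the theorem can be sketched with a small Vandermonde-style matrix; this concrete H, the prime, and the recovery coefficients are our illustrative choices:

```python
# Sketch of the recipe of Theorem 6.2.2 with a 4 x 2 matrix over Z_p whose
# rows are h0 = (1, 0) and hi = (1, i); it realises the 2-out-of-3
# threshold structure.
import random

p = 101
H = [(1, 0), (1, 1), (1, 2), (1, 3)]

t = [random.randrange(p) for _ in range(2)]          # dealer's random vector
s = [(h[0] * t[0] + h[1] * t[1]) % p for h in H]     # s = H t
secret, s1, s2, s3 = s

# Since h0 = 2*h1 - h2, users 1 and 2 recover the secret as 2*s1 - s2 (mod p):
print((2 * s1 - s2) % p == secret)   # prints True
```

The recovery coefficients are exactly the λj of the proof: any expression of h0 as a linear combination of the users' rows gives the same combination of their shares.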

Example 6.2.5 Let U = {1, 2, 3} and Γmin = {{1, 2}, {1, 3}}. We can realise this
access structure by a linear scheme. Consider the matrix

    ⎡ 1  0 ⎤
    ⎢ 1  1 ⎥
H = ⎢ 1 −1 ⎥ .
    ⎣ 2 −2 ⎦

The dealer may choose two random elements t0 , t1 from a ﬁeld Z p for some large

prime p and calculate

⎡ s0 ⎤
⎢ s1 ⎥     ⎡ t0 ⎤
⎢ s2 ⎥ = H ⎣ t1 ⎦ ,
⎣ s3 ⎦

where s0 is taken as the secret and s1 , s2 and s3 are given as shares to users 1, 2 and

3, respectively. (Note that s0 = t0 .) If users 1 and 2 come together they can ﬁnd t0

and t1 from the system of linear equations

⎡ 1  1 ⎤ ⎡ t0 ⎤   ⎡ s1 ⎤
⎣ 1 −1 ⎦ ⎣ t1 ⎦ = ⎣ s2 ⎦


because the determinant of this system is nonzero. Similarly, 1 and 3 can also do

this. But, if 2 and 3 come together, they will face the system

⎡ 1 −1 ⎤ ⎡ t0 ⎤   ⎡ s2 ⎤
⎣ 2 −2 ⎦ ⎣ t1 ⎦ = ⎣ s3 ⎦ ,

which has exactly p solutions. Their shares therefore provide them with no informa-

tion about t0 and hence s0 .
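The last claim can be confirmed by brute force for a small prime; p = 11 and the dealer's vector below are our illustrative choices:

```python
# Brute-force check over Z_11: the system seen by users 2 and 3 of
# Example 6.2.5, with rows h2 = (1, -1) and h3 = (2, -2), has exactly p
# solutions (t0, t1), one for each possible value of the secret t0.
p = 11
t = (4, 9)                          # the dealer's hidden vector (our choice)
s2 = (t[0] - t[1]) % p              # share of user 2
s3 = (2 * t[0] - 2 * t[1]) % p      # share of user 3

solutions = [(a, b) for a in range(p) for b in range(p)
             if (a - b) % p == s2 and (2 * a - 2 * b) % p == s3]

print(len(solutions))               # prints 11: one solution per candidate secret
print(sorted({a for a, b in solutions}) == list(range(p)))  # every t0 occurs: True
```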

Exercises

1. Determine the minimal authorised coalitions for the access structure realised by

the linear secret sharing scheme with the matrix

    ⎡ 1  0 ⎤
    ⎢ 1  1 ⎥
H = ⎢ 2 −2 ⎥
    ⎢ 3  3 ⎥
    ⎣ 4 −4 ⎦

over Z11 .

2. Let F be a sufﬁciently large ﬁeld Z p . Find the access structure which is realised

by the linear secret sharing scheme with the matrix

    ⎡ 1 0 0 ⎤
    ⎢ 1 1 1 ⎥
    ⎢ 1 2 4 ⎥
H = ⎢ 1 3 9 ⎥ .
    ⎢ 0 0 1 ⎥
    ⎢ 0 0 2 ⎥
    ⎣ 0 0 3 ⎦

3. Let F be a sufﬁciently large ﬁeld. Find the access structure which is realised by

the linear secret sharing scheme with the matrix

    ⎡ 1 0 0   ⎤
    ⎢ 1 1 0   ⎥
    ⎢ 1 2 0   ⎥
H = ⎢ 1 3 3^2 ⎥ .
    ⎢ 1 4 4^2 ⎥
    ⎣ 1 5 5^2 ⎦

4. Let F be a sufﬁciently large ﬁeld. Find the access structure which is realised by

the linear secret sharing scheme with the matrix


    ⎡ 1 0  0    ⎤
    ⎢ 1 a1 0    ⎥
    ⎢ 1 a2 0    ⎥
H = ⎢ 1 a3 a3^2 ⎥ ,
    ⎢ 1 a4 a4^2 ⎥
    ⎣ 1 a5 a5^2 ⎦

where a1, . . . , a5 are distinct nonzero elements of F.

5. A linear secret sharing scheme for the group of users U = {1, 2, 3, 4, 5} is

deﬁned by the matrix over Z31 :

    ⎡ h0 ⎤   ⎡  1 0 0 0 ⎤
    ⎢ h1 ⎥   ⎢  1 2 3 0 ⎥
    ⎢ h2 ⎥   ⎢  1 3 3 0 ⎥
H = ⎢ h3 ⎥ = ⎢ 11 5 2 0 ⎥ .
    ⎢ h4 ⎥   ⎢  0 1 1 2 ⎥
    ⎣ h5 ⎦   ⎣  0 6 1 1 ⎦

These users got shares 2, 27, 20, 10, 16, respectively, which are also elements

of Z31 . Let A = {1, 2, 3} and B = {1, 4, 5} be two coalitions.

(a) Show that one of the coalitions is authorised and the other is not.

(b) Show how the authorised coalition can determine the secret.

6. Let H be an (n + 1) × k matrix over a field F and ΓH be the access structure
defined by the formula (6.7). Let us represent the ith row hi of this matrix as
hi = (ci, h̄i), where ci ∈ F is the first coordinate of hi and h̄i is the (k − 1)-
dimensional row vector of the remaining coordinates. Prove that if the coalition
{i1, i2, . . . , ir} is not authorised in ΓH, then

λ1 h̄i1 + λ2 h̄i2 + · · · + λr h̄ir = 0  =⇒  λ1 ci1 + λ2 ci2 + · · · + λr cir = 0

for all λ1, λ2, . . . , λr ∈ F.

7. Let U and V be disjoint sets of k and m users, respectively. Let M and N be two
matrices realising linear secret sharing schemes with access structures ΓM and
ΓN. Find the matrix realising the access structures
(a) ΓM + ΓN,
(b) ΓM × ΓN
on the set of users U ∪ V.

8. Prove that the access structure with Γmin = {{1, 2}, {2, 3}, {3, 4}} on the set of users
U = {1, 2, 3, 4} cannot be realised by a linear secret sharing scheme.

9. Let n > 2. The access structure with the set of minimal authorised coalitions


scheme.

Given a secret sharing scheme with access structure Γ, a user is called a dummy if
she does not belong to any minimal authorised coalition in Γmin. A dummy user can

be removed from any authorised coalition without making it non-authorised.

Theorem 6.2.3 Let S0 be the set of possible secrets and Si be the set of possible

shares that can be given to user i in a secret sharing scheme S. If this scheme is

perfect and has no dummy users, then |Si| ≥ |S0| for all i = 1, . . . , n, i.e., i(S) ≥ 1.

Proof Let i be an arbitrary user. Since no dummies exist, i belongs to one of the

minimal authorised coalitions, say X = {i 1 , i 2 , . . . , i k }, and with no loss of generality

we may assume that i = i k . Suppose that there is a tuple (s0 , s1 , . . . , sn ) ∈ T in the

distribution table where s0 is the secret shared and si1 , si2 , . . . , sik−1 are the shares

given to users i 1 , i 2 , . . . , i k−1 . Since the scheme is perfect the distribution table

contains tuples (s, . . . , si1 , . . . , si2 , . . . , sik−1 , . . .) for every s ∈ S0 . However, if we

add user i = i k we get the coalition X which is authorised and can recover the secret.

Thus, when the shares si1 , si2 , . . . , sik−1 of users i 1 , i 2 , . . . , i k−1 are ﬁxed the secret

depends on the share of the user i only. Hence for every possible secret s there is a
share t(s) which, if given to the user i, leads to the recovery of s as the secret by coalition
X; it can be calculated using the secret recovery function fX of coalition X, that is,
fX(si1, . . . , sik−1, t(s)) = s. For distinct secrets s the shares t(s) must be distinct, so
the map s → t(s) is an injection from S0 into Si and therefore |Si| ≥ |S0|. �

Deﬁnition 6.2.2 A secret sharing scheme S is called ideal if it is perfect and

i(S) = 1.

Ideal schemes are the most informationally efﬁcient having their information rate

equal to 1. By Theorem 6.2.3 this is the best possible rate for a perfect scheme. An

equivalent statement would be that |Si | = |S0 | for all i = 1, . . . , n. Normally in such

cases both secret and shares belong to the same ﬁnite ﬁeld. In particular, this is true

for Shamir’s secret sharing scheme given in Example 6.2.3. Indeed, if the elements

a1 , a2 , . . . , an are publicly known, the secret is p(0) and the share of the ith user is

p(ai ) for the polynomial p there deﬁned. More generally,


Theorem 6.2.4 Every linear secret sharing scheme, as constructed in Theorem 6.2.2, is ideal.

Proof We need to recap how the shares in this scheme are defined. We have a (normally
large) field F and an (n + 1) × k matrix H over this field. Then we define

a k-dimensional vector t over F at random and calculate the (n + 1)-dimensional

vector H t = s = (s0 , s1 , . . . , sn )T . Here s0 is the secret and si is the share of user i.

Both are elements of F. �

However there exist very simple access structures for which there are no ideal

secret sharing schemes (see [3] and [4]). Theorem 6.2.4 tells us that we have to look

for such examples among non-linear schemes.

Example 6.2.6 For the access structure Γ of Example 6.2.4 with Γmin = {{1, 2}, {2, 3}, {3, 4}} there is no ideal secret sharing scheme.

Proof Suppose on the contrary there is an ideal secret sharing scheme S with the

distribution table T realising Γ. Then for some positive integer q we have |Si| = q

for i = 0, 1, 2, 3, 4. For any subset I ⊆ {0, 1, 2, 3, 4} let T I be the restriction of T

to columns indexed by numbers from I and let #T I stand for the number of distinct

rows in TI. Let us firstly note that #T{1,2} = q². Indeed, fix an arbitrary s1 ∈ S1.
Since the coalition {1, 2} is authorised, arguing as in the proof of
Theorem 6.2.3 we conclude that for any secret s0 there will be exactly one value

s2 ∈ S2 such that (s0 , s1 , s2 , . . .) is a row in T . Hence there will be exactly q distinct

rows in T{1,2} with s1 in column 1. As |S1 | = q there are exactly q 2 distinct rows in

T{1,2} .

Let us now ﬁx arbitrary elements s0 ∈ S0 and s2 ∈ S2 . Since both {1, 2} and {2, 3}

are authorised, there will be unique s1 and s3 such that (s0 , s1 , s2 , s3 . . .) is a row in

T . In other words s1 uniquely determines s3 in any row of the distribution table. This

leads to the coalition {1, 4} being authorised, a contradiction. Indeed, since the table T is public
knowledge, users 1 and 4 can figure out the share given to user 3 and then can figure

out the secret since {3, 4} is authorised. �

The construction of the previous theorem leads us to a deﬁnition of a generalised

linear secret sharing scheme which may not be ideal.

Example 6.2.7 A family L of subspaces {L 0 , L 1 , . . . , L n } is said to satisfy property

“all or nothing” if for every subset X ⊂ {1, 2, . . . , n} the span span{L i | i ∈ X }

either contains L 0 or has zero intersection with it. Any such family deﬁnes a certain

access structure, namely

ΓL = {X ⊆ U | span{Li | i ∈ X} ⊇ L0}.


Now the secret and the shares will be ﬁnite-dimensional vectors over F. Let

{L 0 , L 1 , . . . , L n } be subspaces of F k satisfying the property all-or-nothing. Let Hi

be the matrix whose rows form a basis of L i . Then we generate random vectors ti of

the same dimension as dim L i and calculate the secret and the shares as si = Hi ti ,

i = 0, 1, . . . , n. As in Theorem 6.2.2, this leads to a perfect secret sharing scheme
realising ΓL; however, it may not be ideal, as the following example shows.

Example 6.2.8 Let L0, L1, . . . , L4 be the subspaces of F^6 spanned by the rows of
the matrices H0, H1, . . . , H4, whose transposes are the following matrices:

       ⎡ 1 0 ⎤        ⎡ 0 0 ⎤        ⎡ 1 0 0 ⎤        ⎡ 0 0 1 ⎤        ⎡ 0 0 ⎤
       ⎢ 0 1 ⎥        ⎢ 0 0 ⎥        ⎢ 0 1 0 ⎥        ⎢ 0 0 0 ⎥        ⎢ 0 1 ⎥
H0^T = ⎢ 0 0 ⎥, H1^T = ⎢ 1 0 ⎥, H2^T = ⎢ 1 0 0 ⎥, H3^T = ⎢ 0 0 0 ⎥, H4^T = ⎢ 0 0 ⎥ .
       ⎢ 0 0 ⎥        ⎢ 0 1 ⎥        ⎢ 0 1 0 ⎥        ⎢ 0 1 0 ⎥        ⎢ 0 0 ⎥
       ⎢ 0 0 ⎥        ⎢ 0 0 ⎥        ⎢ 0 0 1 ⎥        ⎢ 0 0 1 ⎥        ⎢ 1 0 ⎥
       ⎣ 0 0 ⎦        ⎣ 0 0 ⎦        ⎣ 0 0 0 ⎦        ⎣ 1 0 0 ⎦        ⎣ 0 1 ⎦

This family satisfies the property all-or-nothing. The access structure associated with
it can be given by the set of minimal authorised coalitions as

Γmin = {{1, 2}, {2, 3}, {3, 4}}.

Since the secret is 2-dimensional and some shares are 3-dimensional, the information
ratio of such a scheme will be 3/2. As 3/2 < 2, this is a more efficient secret sharing
scheme realising Γ than the one in Example 6.2.4. In fact, it can be proved that the
scheme for this example is optimal for Γ in the sense that it gives the best possible
information ratio.

Exercises

1. Let T be the distribution table of a perfect ideal secret sharing scheme with the

set of users, U = {1, 2, . . . , n}, the dealer 0 and the cardinality of the domain of

secrets q. Prove that

(i) If a coalition C is authorised and C′ = C ∪ {0}, then #TC′ = #TC;
(ii) If a coalition C is not authorised and C′ = C ∪ {0}, then #TC′ = q · #TC.

2. Prove all the missing details in Example 6.2.8.

3. In this exercise we consider the case when, for the access structure Γ of a secret
sharing scheme with distribution table T, all minimal authorised coalitions have
size 2. In this case Γmin can be interpreted as the set of edges of a graph G(Γ) defined on
U = {1, 2, . . . , n}. We assume that this graph is connected. Let the cardinality of
the domain of secrets be q.

(i) Show that, if {i, j} ∈ Γmin, then #T{i, j} = q².
(ii) Prove that #TU∪{0} = q².
(iii) Prove that if {i, j} ∉ Γmin, then #T{i, j} = q.


(iv) Prove that if {i, j} and { j, k} are both not authorised, then {i, k} is not autho-

rised too.

(v) Prove the following theorem proved in [3].

Theorem 6.2.5 Let Γ be an ideal access structure such that all minimal authorised
coalitions have size 2 and G(Γ) is connected. Then the complementary graph of
G(Γ) is a disjoint union of cliques.

References

1. Shamir, A.: How to share a secret. Commun. ACM 22, 612–613 (1979)

2. Blakley, G.R.: Safeguarding cryptographic keys. In: Proceedings of the National Computer

Conference, vol. 48, pp. 313–317 (1979)

3. Brickell, E.F., Davenport, D.M.: On the classiﬁcation of ideal secret sharing schemes. J. Cryptol.

4, 123–134 (1991)

4. Stinson, D.R.: An explication of secret sharing schemes. Des. Codes Cryptogr. 2, 357–390

(1992)

Chapter 7

Error-Correcting Codes

You would be surprised to know the number of doctors who claim they are
treating pregnant men.

Isaac Asimov (1920–1992)

This chapter deals with the problem of reliable transmission of digitally encoded

information through an unreliable channel. When we transmit information from a

satellite, or an automatic station orbiting the moon, or from a probe on Mars, then

for many reasons (e.g., sun-bursts) our message can be distorted. Even the best

telecommunication systems connecting numerous information centres in various

countries have some non-zero error rate. These are examples of transmission in

space. When we save a ﬁle on a hard disc and then try to read it one month later, we

may find that this file has been distorted (due, for example, to microscopic defects

on the disc’s surface). This is an example of transmission in time. The channels

of transmission in both cases are different but they have one important feature in

common: they are not 100 % reliable. In some cases even a single mistake in the

transmission of a message can have serious consequences. We will show how algebra

can help to address this important problem.

We think of a message as a string of symbols of a certain alphabet. The most

common is the alphabet consisting of two symbols 0 and 1. It is called the binary

alphabet and we can interpret these symbols as elements of the ﬁnite ﬁeld Z2 . Some

non-binary alphabets are also used, for example, we can use the symbols of any ﬁnite

ﬁeld F. But we will initially concentrate on the binary case.

The symbols of the message are transmitted through the channel one by one. Let

us see what can happen to them. Since mistakes in the channel do occur, we assume

that, when we transmit 0, with probability p > 1/2 we receive 0 and with probability

1 − p we receive 1 as a result of a mistake in the channel. Similarly, we assume that

transmitting 1 we get 1 with probability p and 0 with probability 1 − p. Thus we

assume that the probability of a mistake does not depend on the transmitted symbol.

In this case the channel is called symmetric. In our case we are talking about a binary

symmetric channel. It can be illustrated as follows:

A. Slinko, Algebra for Applications, Springer Undergraduate Mathematics Series,

DOI 10.1007/978-3-319-21951-6_7


[Figure: the binary symmetric channel. The transmitted symbol x arrives as x with probability p and as x ⊕ 1 with probability 1 − p.]

Here the error is modeled by means of addition modulo 2. Let x be the symbol to

be transmitted. If transmission is perfect, then x will also be the symbol received, but

if a mistake occurs, then the message received will be x ⊕ 1, where the addition is

in the ﬁeld Z2 . Indeed, 0 ⊕ 1 = 1 and 1 ⊕ 1 = 0. Thus the mistake can be modeled

algebraically as the addition of 1 to the transmitted symbol.

In practical situations p is very close to 1, however, even when p = 0.98, among

any 100 symbols transmitted, on average two will be transmitted with an error. Such a

channel may not be satisfactory to transfer some sensitive data and an error-correction

technique must be implemented.
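The behaviour of a binary symmetric channel is easy to simulate; the function name bsc and the parameters below are our illustrative choices:

```python
# Simulation of a binary symmetric channel: each transmitted bit is flipped
# (addition of 1 modulo 2) independently with probability 1 - p.
import random

def bsc(bits, p, rng=random):
    """Transmit a list of bits through a binary symmetric channel."""
    return [b if rng.random() < p else b ^ 1 for b in bits]

rng = random.Random(1)
message = [rng.randrange(2) for _ in range(100_000)]
received = bsc(message, 0.98, rng)
errors = sum(m != r for m, r in zip(message, received))
print(errors / len(message))   # close to 0.02
```

With p = 0.98, roughly two out of every hundred symbols arrive damaged, as the text observes.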

Binary error-correcting codes are used when messages are strings of zeros and ones,

i.e., the alphabet is Z2 = {0, 1}.

If we transmit symbols of our message one by one, then there is no way that we can

detect an error. That is why we will try to split the message into blocks of symbols

of ﬁxed length m. Any block of m symbols

a1 a2 . . . am , ai ∈ Z2 ,

can be viewed as an element of Zm2 . Note that

according to a long-established tradition in coding theory the messages are written

as row vectors—this is different from the convention used in most undergraduate

Linear Algebra courses where elements of Rm are viewed as column vectors. Since

we split all messages into blocks of length m we may consider all messages to have

a ﬁxed length m and view them as elements of the m-dimensional vector space Zm 2

over Z2 . When considering vectors as messages we will often omit commas, e.g.,

(1 1 1 0) is the vector (1, 1, 1, 0) treated as a message.


Suppose that a message a = (a1 , a2 , . . . , am ) ∈ Zm2 was transmitted and b = (b1 , b2 , . . . , bm ) ∈ Zm2 was received with one mistake in position i. Then

b = a + ei ,

where ei ∈ Zm2 is the vector with 1 in position i and zeros elsewhere. If, more generally, the positions i1 , i2 , . . . , ik were damaged, then

b = a + e,

where e = ei1 + · · · + eik is a vector with k ones and m − k zeros. In this case e is called the error vector.

Deﬁnition 7.1.1 The Hamming weight of a vector x ∈ Zm2 is the number of nonzero coordinates in x. It is denoted by wt(x).

Proposition 7.1.1 If a ∈ Zm2 was transmitted and b = (b1 , b2 , . . . , bm ) ∈ Zm2 was received with k mistakes during the transmission, then b = a + e with wt(e) = k. For example, if mistakes occurred in exactly two positions, then b = a + e with wt(e) = 2.

Deﬁnition 7.1.2 The Hamming distance between two vectors x, y ∈ Zm2 is the number of coordinates in which these two vectors differ. It is denoted by d(x, y).

Lemma 7.1.1 For any x, y ∈ Zm2 we have d(x, y) = wt(x + y).

Proof For each i we have xi + yi = 0 if xi = yi , and xi + yi = 1 if xi ≠ yi . Hence every i for which xi ≠ yi increases the weight of x + y by one. This proves the lemma. □

Proposition 7.1.2 Suppose that a = (a1 , a2 , . . . , am ) ∈ Zm2 was transmitted and b = (b1 , b2 , . . . , bm ) ∈ Zm2 was received. Then the fact that k mistakes occurred during the transmission is equivalent to d(a, b) = k.
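Both the weight and the distance are one-liners in code. The small Python sketch below (ours, for illustration) also confirms the relation d(x, y) = wt(x + y) on a pair of sample vectors.

```python
def wt(x):
    # Hamming weight: the number of nonzero coordinates.
    return sum(x)

def d(x, y):
    # Hamming distance: the number of positions where x and y differ.
    return sum(xi != yi for xi, yi in zip(x, y))

x = [1, 1, 0, 1, 1, 1, 0]
y = [1, 0, 0, 0, 1, 1, 1]
s = [xi ^ yi for xi, yi in zip(x, y)]   # the sum x + y in Z_2^m
print(d(x, y), wt(s))                    # both equal 3, as Lemma 7.1.1 asserts
```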

The Hamming distance is a metric on Zm2 , which means that:

1. d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;


2. d(x, y) = d(y, x) for all x, y;

3. d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z.

Proof The ﬁrst two properties are obvious. Let us prove the third one. Suppose that xi ≠ zi , so that the position i contributes 1 to d(x, z). Then either xi ≠ yi or yi ≠ zi , hence the ith position will also contribute at least 1 to the sum d(x, y) + d(y, z). Suppose now that xi = zi , so that the position i contributes 0 to d(x, z). Then either xi = yi = zi and the ith position also contributes 0 to the sum d(x, y) + d(y, z), or xi ≠ yi and yi ≠ zi and the ith position contributes 2 to the sum d(x, y) + d(y, z). Hence the right-hand side is not smaller than the left-hand side. □

The following sets play a special role in coding theory. For any x ∈ Zm2 we deﬁne Bk (x) = {y ∈ Zm2 | d(x, y) ≤ k}, and we call it the ball of radius k with centre x.

Theorem 7.1.2 For any x ∈ Zm2 ,

|Bk (x)| = \binom{m}{0} + \binom{m}{1} + · · · + \binom{m}{k}. (7.1)

Proof Let y ∈ Bk (x). We may consider the “error vector” e such that y = x + e. Then y ∈ Bk (x) if and only if wt(e) ≤ k. It is enough to prove that, for each i = 1, . . . , k, there are exactly \binom{m}{i} vectors e ∈ Zm2 such that wt(e) = i. Indeed, we must choose i positions out of m in the zero vector and change the coordinates there to ones. Hence every vector e with wt(e) = i corresponds to an i-element subset of {1, 2, . . . , m}.

We know that there are exactly

\binom{m}{i} = \frac{m!}{i!(m − i)!}

such subsets (see, for example, [1], p. 271). Now it is clear that the formula (7.1) counts all “error vectors” of weight at most k, and hence all vectors y which are at Hamming distance k or less from x. □

In particular,

|B2 (x)| = \binom{m}{0} + \binom{m}{1} + \binom{m}{2} = 1 + m + \frac{m(m − 1)}{2}.
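Formula (7.1) is easy to confirm by brute force. In the Python sketch below (ours, for illustration) the parameters m = 6 and k = 2 are chosen arbitrarily; the count does not depend on the centre x.

```python
from itertools import product
from math import comb

def ball_size(m, k):
    # |B_k(x)| = C(m,0) + C(m,1) + ... + C(m,k); independent of the centre x.
    return sum(comb(m, i) for i in range(k + 1))

def d(x, y):
    # Hamming distance between two words of the same length.
    return sum(a != b for a, b in zip(x, y))

m, k = 6, 2                      # sample parameters, chosen for illustration
x = (0,) * m
brute = sum(1 for y in product((0, 1), repeat=m) if d(x, y) <= k)
print(ball_size(m, k), brute)    # 22 22
```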


Exercises

1. (a) Find the Hamming distance between the vectors

u = (1 1 0 1 1 1 0), v = (1 0 0 0 1 1 1).

(b) The vector x = (0 1 1 1 0 1 0 1 1 0) was sent through a binary channel and

y = (0 1 0 1 0 1 1 1 1 0) was received. How many mistakes have occurred?

Write down the error vector.

2. List all vectors of B2 (x) ⊂ Z42 , where x = (1 0 1 0).

3. Let x be a word in Z72 . How many elements are there in the ball B3 (x) of radius

3?

4. Explain why the cardinality of Bk (x) does not depend on x.

By now we have already understood the convenience of having all messages of equal

length, say m. Longer messages can be split into several shorter ones. The idea of

error-correction is to increase the length m of a transmitted message and to add to

each message several auxiliary symbols, the so-called check symbols, which will not

bear any information but will help to correct errors. Hence we increase the length of

every message from m to n, where m < n.

Thus a code is a pair C = (E, D) consisting of an encoding function E : Zm2 → Zn2 and a decoding function D : Zn2 → Zm2 ∪ {error} which satisﬁes D(E(x)) = x for all x ∈ Zm2 . Such a code is called a (binary) (m, n)-code.

Note that E is necessarily one-to-one: if E(x1 ) = E(x2 ), then x1 = D(E(x1 )) = D(E(x2 )) = x2 . The vectors of E(Zm2 ) are called codewords (or codevectors).

Example 7.1.4 (Parity check code) This code increases the length of a message by 1, adding only one check symbol, which is the sum modulo 2 of all other symbols.

That is

E(x1 , x2 , . . . , xm ) = (x1 , x2 , . . . , xm+1 ),

where xm+1 = x1 + · · · + xm . Note that the sum of all coordinates of any codevector is equal to 0:

x1 + · · · + xm + xm+1 = 2(x1 + · · · + xm ) = 0.


Let us see now what happens if one mistake occurs. In this case for the received

vector y = (y1 , y2 , . . . , ym+1 ) we will get

y1 + · · · + ym+1 = x1 + · · · + xm+1 + 1 = 0 + 1 = 1.

D(y1 , y2 , . . . , ym+1 ) = (y1 , y2 , . . . , ym ),  if y1 + y2 + · · · + ym+1 = 0,
D(y1 , y2 , . . . , ym+1 ) = error,                   if y1 + y2 + · · · + ym+1 = 1.
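The parity check code and its decoder fit in a few lines. The Python sketch below is an illustration added here; the function names are ours.

```python
def encode(x):
    # Append one check symbol: the sum of all symbols mod 2.
    return x + [sum(x) % 2]

def decode(y):
    # A received word is accepted iff its coordinates sum to 0 mod 2.
    return y[:-1] if sum(y) % 2 == 0 else "error"

msg = [1, 0, 1, 1]
cw = encode(msg)                        # [1, 0, 1, 1, 1]
assert decode(cw) == msg
corrupted = cw[:]
corrupted[2] ^= 1                       # one flipped bit
print(decode(corrupted))                # error
```

A single mistake always changes the parity, so it is detected; two mistakes cancel out and pass unnoticed.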

Example 7.1.5 (Triple repetition code) This code increases the length of a message

threefold by repeating every symbol three times:

E(x1 , x2 , . . . , xm ) = (x1 , x2 , . . . , xm , x1 , x2 , . . . , xm , x1 , x2 , . . . , xm ).

Decoding may be organised as follows. To decide on the ﬁrst symbol the algorithm

inspects y1 , ym+1 , and y2m+1 . If the majority (two or three) of these symbols are 0’s,

then the decoding algorithm decides that a 0 was transmitted, while if the majority of

symbols are 1’s, then the algorithm decides that a 1 was sent. This code will correct

any single error but will fail to correct some double ones.
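The majority-vote decoding just described can be sketched as follows (an illustration, with our own function names, assuming the message length m divides the received length).

```python
def encode(x):
    # Repeat the whole block three times.
    return x + x + x

def decode(y):
    m = len(y) // 3
    # Majority vote over the three copies of each symbol.
    return [1 if y[i] + y[m + i] + y[2 * m + i] >= 2 else 0 for i in range(m)]

msg = [1, 0, 1, 1]
y = encode(msg)
y[1] ^= 1
y[6] ^= 1                       # two errors hitting different symbols
print(decode(y) == msg)          # corrected, since no symbol lost its majority
```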

The error-detecting and error-correcting capabilities of a code are determined by the geometric properties of the set of codewords E(Zm2 ) ⊂ Zn2 and the properties of the decoding function D.

Deﬁnition 7.1.5 Suppose that for every y ∈ Zn2 the vector x = D(y) is such that E(x) is a closest (with respect to the Hamming distance) codeword to y (any one of them if several are at the same distance). Then we say that the decoding function D performs maximum likelihood decoding.

This terminology is justiﬁed by the fact that, under the assumption that mistakes are random and independent, in the symmetric channel the probability of k mistakes during the transmission is greater than the probability of j mistakes if and only if j > k. Therefore, if our assumption on the distribution of mistakes is true, maximum likelihood decoding minimises the probability of the decoder making a mistake, and it will always be assumed.

Theorem 7.1.3 For a code C the following statements are equivalent:

(a) C detects all combinations of k or fewer errors.

(b) For any codeword x the ball Bk (x) does not contain codewords different from x.

(c) The minimum distance between any two codewords is at least k + 1.

Proof We will prove that (c) ⇒ (b) ⇒ (a) ⇒ (c). Suppose that the minimum

distance between any two codewords is at least k + 1. Then, for any codeword x,


the ball Bk (x) does not contain any other codeword, hence (c) ⇒ (b). Further, if a

combination of k or fewer errors occurs, by Proposition 7.1.2 the received vector y

will be in Bk (x). As there are no codevectors in Bk (x), other than x, the error will be

detected, hence (b) ⇒ (a). Finally, for a maximum likelihood decoder to be able to

detect all combinations of k or fewer errors, for any codeword x all vectors in Bk (x)

must not be codewords. Hence the distance between any two codewords is at least

k + 1, thus (a) ⇒ (c). □

Theorem 7.1.4 For a code C the following statements are equivalent:

(a) C corrects all combinations of k or fewer errors.

(b) For any two codewords x and y of C the balls Bk (x) and Bk (y) do not intersect.

(c) The minimum distance between any two codewords of C is at least 2k + 1.

Proof We will prove that (c) ⇒ (b) ⇒ (a) ⇒ (c). Suppose that the minimum

distance between any two codewords is at least 2k + 1. Then, for any two codewords

x and y the balls Bk (x) and Bk (y) do not intersect. Indeed, if they did, then for a

certain z ∈ Bk (x) ∩ Bk (y)

d(x, z) ≤ k, d(y, z) ≤ k,

and hence, by the triangle inequality, d(x, y) ≤ d(x, z) + d(z, y) ≤ 2k < 2k + 1, which is a contradiction, hence (c) ⇒ (b). Further, if no more than k mistakes happen

during the transmission of a vector x, the received vector y will be in the ball Bk (x)

and will not be in the ball of radius k for any other codeword. Hence y is closer to x

than to any other codevector. Since the decoding is a maximum likelihood decoding

y will be decoded to x and all mistakes will be corrected. Thus (b) ⇒ (a).

On the other hand, it is easy to see that if the distance d between two codewords

x and y does not exceed 2k, then certain combinations of k or fewer errors will not

be corrected. To show this let us change d coordinates of x, one by one, and convert

it into y:

x = x0 → x1 → · · · → xk → · · · → xd = y.

Then xk will be no further from y than from x. Hence if k mistakes take place and

the received vector is xk , then it may be decoded as y (even if d = 2k). This shows

that (a) ⇒ (c). □

Exercises

1. Consider the triple repetition (4, 12)-code. Find a necessary and sufﬁcient condition on the error vector e = (e1 , e2 , . . . , e12 ) for the message to be decoded correctly. Give an example of an error vector e of Hamming weight 4 which the code corrects.


2. Let m = m1 m2 . We arrange the symbols of each message into an m1 × m2 array and write our messages into this array (in any ﬁxed way). To every message a = (a1 , a2 , . . . , am ) we add m1 + m2 additional symbols e1 , e2 , . . . , em1 and f1 , f2 , . . . , fm2 , where ei is the sum (modulo 2) of all symbols in row i and fj is the sum of all symbols in column j. Thus we have an (m, n)-code, where

n = m + m1 + m2 . Show that this code can correct all single errors and detect all

triple errors.

Recall that a code is a pair C = (E, D), where E : Zm2 → Zn2 is the encoding function, D : Zn2 → Zm2 is the (maximum likelihood) decoding function, and D ◦ E = id, i.e., D(E(x)) = x holds for all x ∈ Zm2 . We observed that the set of codewords E(Zm2 ) is

an important object. It is so important that it is often identiﬁed with the code itself and

also denoted C. We will also do this when it invites no confusion and the encoding

function is clear from the context. We saw that it is extremely important to spread

C = E(Zm2 ) in Zn2 uniformly, and that the most important characteristic of C is the minimum distance between any two codewords of C:

dmin (C) = min_{a ≠ b ∈ C} d(a, b).

Theorem 7.1.5 A code C detects all combinations of k or fewer errors if and only

if dmin (C) ≥ k + 1 and corrects all combinations of k or fewer errors if and only if

dmin (C) ≥ 2k + 1.

The following table shows the error-detecting and error-correcting capabilities of codes depending on their minimum distance.

dmin              1  2  3  4  5  6  7  8  9
Errors detected   0  1  2  3  4  5  6  7  8
Errors corrected  0  0  1  1  2  2  3  3  4
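The table amounts to two one-line formulas following from Theorem 7.1.5: a code with minimum distance dmin detects dmin − 1 errors and corrects ⌊(dmin − 1)/2⌋ errors. A tiny Python check (ours, for illustration):

```python
def detects(d_min):
    # A code with minimum distance d_min detects up to d_min - 1 errors.
    return d_min - 1

def corrects(d_min):
    # ... and corrects up to floor((d_min - 1) / 2) errors.
    return (d_min - 1) // 2

# Reproduce the table for d_min = 1, ..., 9.
for d_min in range(1, 10):
    print(d_min, detects(d_min), corrects(d_min))
```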

Example 7.1.6 Let

H1 = [ 1  1 ]
     [ 1 −1 ]

and let us deﬁne inductively:

Hk+1 = [ Hk   Hk ]
       [ Hk  −Hk ] .

Then Hn is a matrix of order 2^n × 2^n. For example,


H2 = [ 1  1  1  1 ]
     [ 1 −1  1 −1 ]
     [ 1  1 −1 −1 ]
     [ 1 −1 −1  1 ] ,

H3 = [ 1  1  1  1  1  1  1  1 ]
     [ 1 −1  1 −1  1 −1  1 −1 ]
     [ 1  1 −1 −1  1  1 −1 −1 ]
     [ 1 −1 −1  1  1 −1 −1  1 ]
     [ 1  1  1  1 −1 −1 −1 −1 ]
     [ 1 −1  1 −1 −1  1 −1  1 ]
     [ 1  1 −1 −1 −1 −1  1  1 ]
     [ 1 −1 −1  1 −1  1  1 −1 ] .

It can be proved by induction that any two distinct rows of Hn are orthogonal (see Exercise 2). This, in turn, is equivalent to the matrix equation

H H T = n In , (7.2)

which for H = Hk holds with n = 2^k .

Deﬁnition 7.1.6 An n × n matrix H with entries from {+1, −1} satisfying (7.2) is

called a Hadamard matrix.

The orthogonality of the rows of Hn means that any two distinct rows of Hn coincide in 2^{n−1} positions and differ in 2^{n−1} positions. Hence if we replace each −1 with a 0, we will have a set of vectors with minimum distance 2^{n−1}. For example, if we do this with the rows of H3 shown above we will get eight vectors with minimum distance 4. We can use these vectors for the construction of a code. For example,

( 0 0 0 ) → ( 1 1 1 1 1 1 1 1 ),

( 1 0 0 ) → ( 1 0 1 0 1 0 1 0 ),

( 0 1 0 ) → ( 1 1 0 0 1 1 0 0 ),

( 0 0 1 ) → ( 1 0 0 1 1 0 0 1 ),

( 1 1 0 ) → ( 1 1 1 1 0 0 0 0 ),

( 1 0 1 ) → ( 1 0 1 0 0 1 0 1 ),

( 0 1 1 ) → ( 1 1 0 0 0 0 1 1 ),

( 1 1 1 ) → ( 1 0 0 1 0 1 1 0 ).
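The recursive construction of Hk and the 0/1 conversion are easy to replay in code. The sketch below (ours, for illustration) rebuilds H3 and verifies that the resulting eight binary rows have minimum distance 4 = 2^{3−1}.

```python
def hadamard(k):
    # H_1 = [[1, 1], [1, -1]];  H_{k+1} = [[H_k, H_k], [H_k, -H_k]].
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

H3 = hadamard(3)
rows = [[(1 + v) // 2 for v in row] for row in H3]   # replace each -1 by a 0
dmin = min(sum(a != b for a, b in zip(r, s))
           for i, r in enumerate(rows) for s in rows[i + 1:])
print(len(rows), dmin)   # 8 4
```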

In fact, we can do even better as the following exercise shows.

Exercise 7.1.1 We may consider the 16 × 8 matrix obtained by stacking H3 on top of −H3 and replace in this matrix each −1 by a 0. Then we will obtain 16 vectors which may be used to construct a (4, 8)-code with minimum distance 4.

When, in 1969, the Mariner spacecraft sent pictures to Earth, the matrix H5 was

used to construct 64 codewords of length 32 with minimum distance 16. Each pixel

had a darkness given by a 6-bit number. Each of them was changed to one of the


64 codewords and transmitted. This code could correct any combination of 7 errors.

Since the signals from Mariner were fairly weak such an error-correcting capability

was really needed.

We may also deﬁne the minimum weight of the code by

wtmin (C) = min_{0 ≠ a ∈ C} wt(a).

This concept will also be quite important, especially for linear codes.

We remind the reader of the deﬁnition of a subspace. Let F be a ﬁeld and V be a

vector space over F. A subset W ⊆ V is a subspace if for any two vectors u, v ∈ W

and any two scalars α, β ∈ F the linear combination αu + βv is also an element of

W . In this case W becomes a vector space in its own right.

Exercise 7.1.2 Let W be the set of all vectors from Zn2 whose sum of all coordinates

is equal to zero. Show that W is a subspace of Zn2 .

Deﬁnition 7.1.7 A code C = (E, D) is called linear if the encoding function E : Zm2 → Zn2 is a linear transformation from Zm2 into Zn2 . For a binary ﬁeld, where the only scalars are 0 and 1, this simply means that

E(x + y) = E(x) + E(y)

for all x, y ∈ Zm2 .

Proposition 7.1.3 For any linear code the set of codewords C is a subspace of Zn2 .

In particular, the zero vector 0 is a codeword.

Proof To show that C is a subspace of Zn2 it sufﬁces to check that the sum of any two codewords is again a codeword (as our coefﬁcients come from Z2 , linear combinations reduce to sums). Let b, c be two codewords. Then b = E(x) and c = E(y)

and

b + c = E(x) + E(y) = E(x + y) ∈ C.

In particular, 0 = b + b ∈ C. □


Theorem 7.1.6 For any linear code C we have dmin (C) = wtmin (C).

Proof Suppose dmin (C) = d(a, b). Then, as we know from Lemma 7.1.1, d(a, b) = wt(a + b), and since a + b ∈ C we get

wtmin (C) ≤ wt(a + b) = dmin (C).

On the other hand, if wtmin (C) = wt(a), then, again by Lemma 7.1.1, wt(a) = d(0, a), and hence

dmin (C) ≤ wtmin (C). □

Theorem 7.1.6 is very useful. There are M = 2^m codewords in any (m, n)-code. To ﬁnd the minimum distance we need to perform M(M − 1)/2 calculations of distance, while to ﬁnd the minimum weight we need only M such calculations.

Example 7.1.7 For the following (3, 6)-code C

0 = (0 0 0) → (0 0 0 0 0 0) = 0

a1 = (1 0 0) → (1 0 0 1 0 0) = c1

a2 = (0 1 0) → (0 1 0 1 1 1) = c2

a3 = (0 0 1) → (0 0 1 0 1 1) = c3

a1 + a2 = (1 1 0) → (1 1 0 0 1 1) = c1 + c2

a1 + a3 = (1 0 1) → (1 0 1 1 1 1) = c1 + c3

a2 + a3 = (0 1 1) → (0 1 1 1 0 0) = c2 + c3

a1 + a2 + a3 = (1 1 1) → (1 1 1 0 0 0) = c1 + c2 + c3 ,

it is easy to see that it is linear. We see that C = Span{c1 , c2 , c3 }, and dmin (C) =

wtmin (C) = wt(c1 ) = 2.
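Theorem 7.1.6 can be spot-checked on this example. The Python sketch below (ours, for illustration) enumerates the eight codewords of the (3, 6)-code of Example 7.1.7 and compares minimum weight with minimum distance.

```python
from itertools import product

def encode(a):
    # Encoding by the generator rows c1, c2, c3 of Example 7.1.7.
    G = [(1, 0, 0, 1, 0, 0), (0, 1, 0, 1, 1, 1), (0, 0, 1, 0, 1, 1)]
    return tuple(sum(ai * gi for ai, gi in zip(a, col)) % 2
                 for col in zip(*G))

C = {encode(a) for a in product((0, 1), repeat=3)}
wts = [sum(c) for c in C if any(c)]                      # nonzero weights
dists = [sum(x != y for x, y in zip(c1, c2))
         for c1 in C for c2 in C if c1 != c2]
print(min(wts), min(dists))   # 2 2
```

The minimum weight computation touches 7 vectors, the minimum distance computation 28 pairs; for large codes this saving matters.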

Exercises

1. Prove that the matrices Hk deﬁned in Example 7.1.6 are Hadamard matrices.

2. Let H be an n × n Hadamard matrix. Let us construct the 2n × n matrix obtained by stacking H on top of −H, and then replace each −1 by a 0. Prove that in the resulting matrix every two distinct rows have Hamming distance of at least n/2 between them.

3. Let Ei : Z32 → Z72 , i = 1, 2 be the encoding mappings of the codes C1 and C2 ,

respectively, given by

(a) E1 (a) = (a1 , a2 , a3 , a1 + a2 , a2 + a3 , a1 + a3 , 0),

(b) E2 (a) = (a1 , a2 , a3 , a1 + a2 , a2 , a1 + a2 + a3 , 1).

Which code is linear and which is not?

4. Show that in a binary linear code, either all codewords have even Hamming weight

or exactly half of the codewords have even Hamming weight.


Let e1 , e2 , . . . , em be the standard basis of Zm2 , where ei is the vector which has only one nonzero element, 1, in the ith position. Let us consider the vectors

E(e1 ) = g1 , . . . , E(em ) = gm ,

which are important since in a linear code they fully determine the encoding function. Indeed, for an arbitrary message vector a = (a1 , a2 , . . . , am ) we have

E(a) = E(a1 e1 + a2 e2 + · · · + am em ) = a1 g1 + a2 g2 + · · · + am gm .

Hence we can represent the encoding function by means of matrix multiplication:

E(a) = aG, (7.3)

where

G = [ g1 ]
    [ g2 ]
    [ ⋮  ]
    [ gm ]

is the matrix with rows g1 , g2 , . . . , gm . Equation (7.3) shows that the code is the row

space of the matrix G, i.e., C = Row(G).

Deﬁnition 7.1.8 Let C = (E, D) be a linear (m, n)-code. Then the matrix G such that

E(a) = aG

for all a ∈ Zm2 is called the generator matrix of C.

For example, if

E(a) = (a1 , a2 , a2 , a1 + a2 ),

then

E(a) = a1 (1, 0, 0, 1) + a2 (0, 1, 1, 1) = (a1 , a2 ) [ 1 0 0 1 ]
                                                     [ 0 1 1 1 ] .
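Encoding by a generator matrix is a single matrix multiplication mod 2. A minimal Python sketch (the function name is ours), applied to the generator matrix just shown:

```python
def mat_vec_encode(a, G):
    # E(a) = aG over Z_2: message row vector times generator matrix.
    return [sum(ai * gij for ai, gij in zip(a, col)) % 2 for col in zip(*G)]

G = [[1, 0, 0, 1],
     [0, 1, 1, 1]]
print(mat_vec_encode([1, 1], G))   # [1, 1, 1, 0]
```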


Proposition 7.1.4 Let C = (E, D) be a linear (m, n)-code with generator matrix G. Then the rows of G are linearly independent. Moreover, rank(G) = m and dim C = m.

Proof It is enough to prove linear independence of the rows g1 , g2 , . . . , gm . The two remaining statements will then follow. Suppose on the contrary that a1 g1 + a2 g2 + · · · + am gm = 0 with not all ai ’s being zero. Then, since E is linear,

0 = a1 g1 + · · · + am gm = a1 E(e1 ) + · · · + am E(em ) = E(a1 e1 + · · · + am em ).

Hence E(a) = 0 for the nonzero vector a = a1 e1 + · · · + am em . This contradicts the fact that E is one-to-one, since we have E(0) = 0 and E(a) = 0. □

Example 7.1.9 (Parity check code revisited) The parity check (m, m + 1)-code is

linear. Indeed, if the sum of coordinates for both x and y is zero, then the same is

true for x + y. We have

E(e1 ) = (1 0 . . . 0 1),

E(e2 ) = (0 1 . . . 0 1),

...

E(em ) = (0 0 . . . 1 1).

Hence

G = [ 1 0 . . . 0 1 ]
    [ 0 1 . . . 0 1 ]
    [     ⋮         ]
    [ 0 0 . . . 1 1 ] = [Im 1m ],

where 1m denotes the column of m ones.

Example 7.1.10 (Triple repetition code) The triple repetition code (m, 3m)-code is

also linear. We have

E(e1 ) = (1 0 . . . 0 1 0 . . . 0 1 0 . . . 0),

E(e2 ) = (0 1 . . . 0 0 1 . . . 0 0 1 . . . 0),

...

E(em ) = (0 0 . . . 1 0 0 . . . 1 0 0 . . . 1).

Hence

G = [ 1 0 . . . 0 1 0 . . . 0 1 0 . . . 0 ]
    [ 0 1 . . . 0 0 1 . . . 0 0 1 . . . 0 ]
    [                 ⋮                   ]
    [ 0 0 . . . 1 0 0 . . . 1 0 0 . . . 1 ] = [Im Im Im ].


Example 7.1.11 Let us deﬁne a linear (3, 5)-code by its generator matrix

G = [ 1 0 0 0 1 ]
    [ 0 1 0 1 0 ]
    [ 0 0 1 1 1 ] .

Then

E(a1 , a2 , a3 ) = (a1 , a2 , a3 ) G = (a1 , a2 , a3 , a2 + a3 , a1 + a3 ).

We see that the codeword E(a), which encodes a, consists of the vector a itself

embedded into the ﬁrst three coordinates and two additional symbols.

Deﬁnition 7.1.9 A linear (m, n)-code C = (E, D) is called systematic if, for any

a ∈ Zm2 , the ﬁrst m symbols of the codeword E(a) are the symbols of the word a,

i.e.,

E(a1 , a2 , . . . , am ) = (a1 , a2 , . . . , am , b1 , b2 , . . . , bn−m ).

The symbols of a in E(a) are called the information symbols and the remaining

symbols are called the check symbols. These are the auxiliary symbols which we

mentioned earlier.

For a linear code to be systematic it is necessary and sufﬁcient that its generator matrix has the form G = [Im A], where A is an m × (n − m) matrix.

Hence

G = [ g1 ]   [ 1 0 . . . 0 a11 . . . a1n−m ]
    [ g2 ] = [ 0 1 . . . 0 a21 . . . a2n−m ]
    [ ⋮  ]   [            ⋮               ]
    [ gm ]   [ 0 0 . . . 1 am1 . . . amn−m ] = [Im A].

Deﬁnition 7.1.10 Two (m, n)-codes C1 = (E1 , D1 ) and C2 = (E2 , D2 ) are called equivalent if, for every a ∈ Zm2 , their respective codewords E1 (a) and E2 (a) differ only in the order of symbols; moreover, the permutation that is required to obtain E1 (a) from E2 (a) does not depend on a.


For example, the two codes

(0 0) → (0 0 0 0) (0 0) → (0 0 0 0)

(0 1) → (0 1 0 1) (0 1) → (0 1 0 1)

(1 0) → (1 0 0 1) (1 0) → (0 1 1 0)

(1 1) → (1 1 0 0) (1 1) → (0 0 1 1)

are equivalent. The permutation that must be applied to the symbols of the ﬁrst code

to obtain the second is (1 3)(2 4).

It is clear that two equivalent codes have the same minimum distance.

Theorem 7.1.7 Let C be a linear (m, n)-code with minimum distance d. Then there

is a systematic linear (m, n)-code with the same minimum distance d.

Proof Let C be a linear (m, n)-code with generator matrix G. When we perform

elementary row operations over the rows of G we do not change Row(G) and hence

the set of codewords (it will change the encoding function, however).

We may, therefore, assume that our matrix G is already in its reduced row echelon

form. Since G has full rank (its rows are linearly independent), we must have m pivot

columns which are the m columns of the identity matrix Im . Let the positions of these

columns be i1 , i2 , . . . , im . Then in a codeword E(a) we will ﬁnd our information

symbols a1 , a2 , . . . , am in positions i1 , i2 , . . . , im . Moving these columns (and hence

the respective coordinates) to the ﬁrst m positions, we will obtain a systematic code

which is equivalent to the given one. □

For example, let C be the linear (3, 6)-code with the generator matrix

G = [ 1 0 1 0 1 1 ]
    [ 0 1 1 1 1 0 ]
    [ 0 0 0 1 1 1 ] .

Adding the third row to the second,

G = [ 1 0 1 0 1 1 ]     [ 1 0 1 0 1 1 ]
    [ 0 1 1 1 1 0 ]  →  [ 0 1 1 0 0 1 ]
    [ 0 0 0 1 1 1 ]     [ 0 0 0 1 1 1 ]

gives us a generator matrix of a new code with the same minimum distance. It is equivalent to the systematic code with the generator matrix

G′′ = [ 1 0 0 1 1 1 ]
      [ 0 1 0 1 0 1 ]
      [ 0 0 1 0 1 1 ] .


The matrix

G = [ 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 1 0 ]
    [ 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1 ]
    [ 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 1 0 0 0 ]
    [ 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 ]
    [ 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 ]
    [ 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 1 1 1 ]
    [ 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 1 1 0 1 1 ]
    [ 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 1 0 1 1 0 1 ]
    [ 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 1 0 1 1 0 ]
    [ 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0 0 0 1 0 1 1 ]
    [ 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1 0 1 ]
    [ 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 ]

is the generator matrix of the famous Golay code. This is a (12, 24)-code and its

minimum distance is 8. It was used by the Voyager I and Voyager II space-crafts

during 1979–1981 to provide error correction when the Voyagers transmitted to

Earth colour pictures of Jupiter and Saturn.

Exercises

1. Find a generator matrix of the linear (4, 7)-code with the encoding function

E(a) = (a1 , a2 , a3 , a1 + a2 + a4 , a2 + a3 , a1 + a3 + a4 , a4 ).

2. Check by inspection that the Golay code is systematic.

3. Show that elementary row operations performed on a generator matrix G do not

change the set of codewords and, in particular, the minimum distance of the code.

4. Let C 1 be the linear (3, 6)-code with the following generator matrix over Z2

G = [ 1 0 1 0 1 0 ]
    [ 1 1 0 0 1 1 ]
    [ 1 1 1 0 0 0 ] .

(b) Find the generator matrix of another linear (3, 6)-code C 2 which is systematic

and equivalent to C 1 .

(c) List all codewords of C 2 and determine its minimum distance.


The generator matrix of a code is a great tool for the sender since with its help

the encoding can be done by means of matrix multiplication. All she needs is to

store the generator matrix which contains all the information about the encoding

function. However, the generator matrix is not very useful at the receiving end. On

the receiving end we need another matrix—the parity check matrix, which we will

introduce below.

Deﬁnition 7.1.11 Let C be a linear (m, n)-code. An (n − m) × n matrix H is called a parity check matrix of C if x ∈ C if and only if HxT = 0. Recalling that the null space of H is Null(H) = {x ∈ Zn2 | HxT = 0}, we can reformulate the above deﬁnition as follows: an (n − m) × n matrix H is a parity check matrix of C if and only if C = Null(H).

Having this matrix at the receiving end, we may quickly check whether the received vector y is a codevector by calculating its syndrome S(y) = HyT . Then y ∈ C if and only if S(y) = 0. If the syndrome is the zero vector, the decoder assumes that no mistakes happened. Later we will learn how the syndrome S(y), if nonzero, can help to correct the mistakes that occurred.

But, ﬁrstly, we have to learn how to construct such a matrix given the generator

matrix G. We will assume that our code is systematic and G has the form G = (Im A),

where A is an arbitrary m × (n − m) matrix. In other words,

G = [ g1 ]   [ 1 0 . . . 0 a11 . . . a1n−m ]
    [ g2 ] = [ 0 1 . . . 0 a21 . . . a2n−m ]
    [ ⋮  ]   [            ⋮               ]
    [ gm ]   [ 0 0 . . . 1 am1 . . . amn−m ] .

Since gi ∈ C, for any i = 1, 2, . . . , m, we must have HgiT = 0 and hence HGT = 0.

We also have GHT = (HGT )T = 0. This means that all columns of H T must be

solutions to the system of linear equations GxT = 0. Since G is already in its reduced

row echelon form, we separate variables to obtain

x1 = −a11 xm+1 − · · · − a1n−m xn ,
x2 = −a21 xm+1 − · · · − a2n−m xn ,
. . .
xm = −am1 xm+1 − · · · − amn−m xn

(of course in Z2 we have −aij = aij however we would like to leave the possibility

of a non-binary alphabet). Setting, as usual, the values of the free variables to be


(xm+1 , xm+2 , . . . , xn ) = (1, 0, . . . , 0), (0, 1, . . . , 0), . . . , (0, 0, . . . , 1),

we obtain a basis {f1 , f2 , . . . , fn−m } for the solution space of the system GxT = 0 by calculating

f1 = (−a11 , −a21 , . . . , −am1 , 1, 0, . . . , 0),
f2 = (−a12 , −a22 , . . . , −am2 , 0, 1, . . . , 0),
. . .
fn−m = (−a1n−m , −a2n−m , . . . , −amn−m , 0, 0, . . . , 1).

We will show that the matrix H with rows {f1 , f2 , . . . , fn−m } is a parity check matrix

for this code. Indeed, HgiT = 0, hence for any codeword c ∈ C we have c = a1 g1 + a2 g2 + · · · + am gm and

HcT = a1 Hg1T + · · · + am HgmT = 0.

Moreover, rank(H) = n − m, so dim Null(H) = n − (n − m) = m = dim C. Hence Null(H) = C and H is indeed a parity check matrix for C. We see that H has the form

H = (−AT | In−m ).

We have proved:

Theorem 7.1.8 Let C be a linear (m, n)-code. If G = (Im | A) is a generator matrix

of C, then H = (−AT | In−m ) is a parity check matrix of C.

This works in the other direction too: given an (n−m)×n matrix H = (A | In−m ),

where A is an (n − m) × m matrix, we can construct a linear (m, n)-code C with the

generator matrix G = (Im | −AT ) and it will have H as its parity check matrix.
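Theorem 7.1.8 translates directly into code. The Python sketch below (an illustration, with our own function name) builds H = (−AT | In−m) from a systematic G = (Im | A) over Z2, where −AT = AT, and checks HgiT = 0 on the generator matrix of Example 7.1.11.

```python
def parity_check_from_systematic(G):
    # G = (I_m | A)  ->  H = (-A^T | I_{n-m}); over Z_2 we have -A^T = A^T.
    m, n = len(G), len(G[0])
    A = [row[m:] for row in G]
    AT = [list(col) for col in zip(*A)]
    return [AT[i] + [1 if j == i else 0 for j in range(n - m)]
            for i in range(n - m)]

G = [[1, 0, 0, 0, 1],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 1, 1]]
H = parity_check_from_systematic(G)
# Check that H g_i^T = 0 mod 2 for every row g_i of G.
ok = all(sum(h * g for h, g in zip(hrow, grow)) % 2 == 0
         for hrow in H for grow in G)
print(H, ok)
```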

Example 7.1.15 Suppose that the generator matrix for a binary (4, 7)-code is

G = [ 1 0 0 0 1 0 1 ]
    [ 0 1 0 0 0 1 1 ]
    [ 0 0 1 0 1 1 0 ]
    [ 0 0 0 1 0 1 0 ] = (I4 | A).


Then

H = [ 1 0 1 0 1 0 0 ]
    [ 0 1 1 1 0 1 0 ]
    [ 1 1 0 0 0 0 1 ] = (AT | I3 ).

To encode the message a = (1 0 1 0) we calculate

b = aG = (1 0 1 0) [ 1 0 0 0 1 0 1 ]
                   [ 0 1 0 0 0 1 1 ]
                   [ 0 0 1 0 1 1 0 ]
                   [ 0 0 0 1 0 1 0 ] = (1 0 1 0 0 1 1).

For the codeword b we have S(b) = HbT = (0 0 0)T . If, however, a vector c with mistakes were received, the syndrome would typically be nonzero; for instance, a mistake in the second position of b would give S(c) = HcT = h2 = (0 1 1)T ≠ (0 0 0)T . If such a c was received, this would show that one or more mistakes happened.

Let hi be the ith column of the parity check matrix, that is, H = (h1 h2 . . . hn ).

We know that a vector b ∈ Zn2 is a codevector if and only if S(b) = 0.

Let a be a codevector and suppose b = a + e. We may treat b as the codevector a

with an error. Our goal is to determine how the syndrome S(b) of the vector b ∈ Zn2

depends on the codevector a and on the error vector e. We will ﬁnd that it does not

depend on a at all! This will allow us to develop a method of error correction.

Lemma 7.1.2 Let a be a codevector and suppose b = a + e, where the error vector e has Hamming weight s and ones in positions i1 , i2 , . . . , is , which corresponds to s mistakes in those positions. Then

S(b) = hi1 + hi2 + · · · + his . (7.4)

Proof By Proposition 7.1.1, e = ei1 + ei2 + · · · + eis , where ej is the jth vector of the standard basis of Zn2 . Then

S(b) = HbT = H(a + e)T = 0 + HeT = H(ei1 + ei2 + · · · + eis )T = hi1 + hi2 + · · · + his . □

We see that, indeed, the syndrome of the received vector depends only on the error

vector and not on the codevector.

Theorem 7.1.9 Let H = (h1 , h2 , . . . , hn ) be an (n − m) × n matrix with entries from Z2 such that the columns of H are nonzero and no two of them coincide. Then any binary linear (m, n)-code

C with H as its parity check matrix corrects all single errors. If a single error occurs

in ith position, then the syndrome of the received vector is equal to the ith column of

H, i.e., hi .

Proof Suppose that a codevector a was sent and the vector b = a + ei was received

(which means that a mistake occurred in the ith position). Then due to (7.4)


S(b) = HbT = hi .

We now know where the mistake happened and can correct it. □
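Theorem 7.1.9 can be turned into a small decoder. In the Python sketch below (ours, for illustration) the test matrix has as columns all seven nonzero binary vectors of length 3, so its columns are distinct and nonzero.

```python
def syndrome(H, y):
    # S(y) = H y^T over Z_2, written as a tuple.
    return tuple(sum(h * v for h, v in zip(row, y)) % 2 for row in H)

def correct_single(H, y):
    # If S(y) is nonzero and equals column i of H, flip position i.
    s = syndrome(H, y)
    if any(s):
        cols = list(zip(*H))
        y = y[:]
        y[cols.index(s)] ^= 1
    return y

H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]
b = [1, 1, 1, 0, 0, 0, 0]            # a codeword: syndrome (0, 0, 0)
c = b[:]
c[1] ^= 1                            # one error, in position 2
print(correct_single(H, c) == b)     # True
```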

Exercises

1. Let

A = [ 1 2 1 2 1 ]
    [ 1 2 1 0 2 ]
    [ 2 1 0 1 0 ]

be a matrix over Z3 .

(a) Find a basis for the null space Null(A) of this matrix.

(b) List all vectors of the Null(A).

(c) Find among the nonzero vectors of Null(A) the vector whose weight is min-

imal.

2. Let us consider a binary code C given by its parity check matrix

H = [ 0 0 1 1 1 0 1 ]
    [ 0 1 0 1 0 1 1 ]
    [ 1 0 0 0 1 1 1 ]
    [ 1 1 1 1 1 1 0 ] .

(a) Compute the generator matrix for C. What is the number of information

symbols for this code?

(b) Will the code C correct any single mistake?

(c) Will the code C correct any two mistakes?

(d) Will the code C detect any two mistakes?

(e) Encode the message vector whose coordinates are all equal to 1;

(f) Decode y1 = (1 1 0 1 0 0 1) and y2 = (1 1 0 1 1 0 0);

(g) Show that a single mistake could not result in receiving the vector z =

(0 1 0 1 1 1 1). Show that two mistakes could result in receiving z.

The pioneering 1950 paper of Richard Hamming1 started a new subject within information theory. Hamming codes, Hamming distance

1 Richard Wesley Hamming (1915–1998). He participated in the Manhattan Project that produced the ﬁrst atomic bombs during World War II. There he was responsible for running the IBM computers in the Los Alamos laboratory, which played a vital role in the project. Later he worked for Bell Labs, after which he became increasingly interested in teaching and taught in a number of leading universities in the USA. Hamming is best known for his work on error-detecting and error-correcting codes. His fundamental paper on this topic, “Error detecting and error correcting codes”, appeared in April 1950 in the Bell System Technical Journal.


and Hamming metric are standard terms used today in coding theory but they are

also used in many other areas of mathematics.

We start with the Hamming (4, 7)-code. Let us consider the binary 3 × 7 matrix

H = (h1 h2 h3 h4 h5 h6 h7 ) = [ 0 0 0 1 1 1 1 ]
                              [ 0 1 1 0 0 1 1 ]
                              [ 1 0 1 0 1 0 1 ] , (7.5)

whose ith column hi is the binary representation of the integer i, for i = 1 to i = 7. Theorem 7.1.9 gives us reason to believe that the (4, 7)-code with this

parity check matrix will be good since by that theorem such a code will correct all

single errors. We also note that all nonzero three-dimensional columns are used in

the construction of H, and every binary 3 × 8 matrix must have either a zero column or at least two equal columns. This says to us that the code with parity check matrix H must be in some

way the optimal (4, 7)-code.

Let us ﬁnd a generator matrix G that will match the parity check matrix H. We

know that by row reducing H we do not change the null-space of H, hence the set

of codewords stays the same. We will therefore be trying to obtain a matrix with

the identity matrix I3 in the last three columns in order to apply Theorem 7.1.8. The

technique is the same as for ﬁnding the reduced row echelon form. We obtain:

H = [ 0 0 0 1 1 1 1 ]     [ 0 0 0 1 1 1 1 ]     [ 0 0 0 1 1 1 1 ]
    [ 0 1 1 0 0 1 1 ]  →  [ 0 1 1 0 0 1 1 ]  →  [ 0 1 1 0 0 1 1 ]
    [ 1 0 1 0 1 0 1 ]     [ 1 0 1 1 0 1 0 ]     [ 1 1 0 1 0 0 1 ]

    [ 0 1 1 1 1 0 0 ]     [ 0 1 1 1 1 0 0 ]
 →  [ 0 1 1 0 0 1 1 ]  →  [ 1 0 1 1 0 1 0 ] = (C | I3 ).
    [ 1 1 0 1 0 0 1 ]     [ 1 1 0 1 0 0 1 ]

By Theorem 7.1.8 the corresponding generator matrix is

G = (I4 | C T ) = [ 1 0 0 0 0 1 1 ]
                  [ 0 1 0 0 1 0 1 ]
                  [ 0 0 1 0 1 1 0 ]
                  [ 0 0 0 1 1 1 1 ] .

To encode the message a = (1 1 1 0) we calculate

b = aG = (1 1 1 0) [ 1 0 0 0 0 1 1 ]
                   [ 0 1 0 0 1 0 1 ]
                   [ 0 0 1 0 1 1 0 ]
                   [ 0 0 0 1 1 1 1 ] = (1 1 1 0 0 0 0).


Suppose now that the vector c = (1 0 1 0 0 0 0) was received, with a mistake in the second position. Its syndrome is

S(c) = HcT = H (1 0 1 0 0 0 0)T = (0 1 0)T = h2 .

Assuming that only one mistake happened, we know that this mistake occurred in the

second position. Hence the vector b = (1 1 1 0 0 0 0) was sent and a = (1 1 1 0)

was the original message.
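Because the columns of H in (7.5) are the binary representations of the positions 1 to 7, the syndrome itself, read as a binary number, names the corrupted position. A decoding sketch under these assumptions (function names are illustrative):

```python
# Syndrome decoding for the (4, 7) Hamming code with the parity check
# matrix H of (7.5).  A nonzero syndrome, read as a binary number, is the
# position of the single corrupted bit.

H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def decode(c):
    """Correct at most one error in c and return the 4 information bits."""
    s = [sum(H[r][j] * c[j] for j in range(7)) % 2 for r in range(3)]
    pos = 4 * s[0] + 2 * s[1] + s[2]   # syndrome as a binary number
    c = list(c)
    if pos:                            # nonzero syndrome: flip that bit
        c[pos - 1] ^= 1
    return c[:4]                       # G = (I4 | C^T), so the first 4 bits

print(decode([1, 0, 1, 0, 0, 0, 0]))  # recovers the message [1, 1, 1, 0]
```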

This code is very interesting. It has 2^4 = 16 codewords and, since it corrects any single error, it has minimum distance of at least 3. So, if we take a ball B1(x) of radius one centred at a codeword x, it will not intersect any of the similar balls of radius one around other codewords. By Theorem 7.1.2, every such ball contains exactly eight vectors of Z_2^7. In total, these balls contain 16 · 8 = 128 = 2^7 vectors, that is, all vectors of Z_2^7. The whole space is the union of these unit balls! This means that the Hamming (4, 7)-code corrects all single mistakes but not a single double mistake, since any double mistake takes the received word into the unit ball of another codeword. Lemma 7.1.2 provides an alternative explanation of why no double mistake will be corrected. Indeed, the syndrome of a double mistake is the sum of the corresponding two columns of H. However, since all nonzero three-dimensional vectors are used as columns of H, the sum of any two columns is a third column of H. This means that any double mistake will be treated as a single mistake and will not be corrected.
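The counting argument above can be confirmed computationally. This illustrative check (not from the text) generates all 16 codewords and verifies that their radius-1 balls are pairwise disjoint and cover all of Z_2^7:

```python
# Check that the radius-1 balls around the 16 codewords of the (4, 7)
# Hamming code partition the whole space of 2^7 = 128 binary vectors.
from itertools import product

G = [[1, 0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1, 1]]

codewords = set()
for a in product([0, 1], repeat=4):
    codewords.add(tuple(sum(a[i] * G[i][j] for i in range(4)) % 2
                        for j in range(7)))

covered = set()
for c in codewords:
    ball = {c} | {c[:j] + (1 - c[j],) + c[j + 1:] for j in range(7)}  # B1(c)
    assert not (ball & covered)   # balls are pairwise disjoint
    covered |= ball

print(len(codewords), len(covered))  # 16 codewords cover 128 vectors
```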

Suppose, for example, that the vector a = (1 1 1 0), encoded as b = (1 1 1 0 0 0 0), was sent, and the vector c = (0 1 1 0 0 0 1) was received, with mistakes in the first and the seventh positions. Its syndrome is

S(c) = HcT = H (0 1 1 0 0 0 1)T = h1 + h7 = (1 1 0)T = h6 .

Hence this double mistake will not be corrected, as it mimics a single mistake in position 6.

The (4, 7) binary Hamming code is the ‘smallest’ code from the inﬁnite family

of Hamming codes.


In general, for every k ≥ 2, the Hamming code of length n = 2^k − 1 is defined by the parity check matrix H, whose rth column contains the binary representation of the integer r, for r = 1, 2, . . . , 2^k − 1.

Example 7.1.16 The Hamming (11, 15)-code is given by its parity check matrix

    ⎡0 0 0 0 0 0 0 1 1 1 1 1 1 1 1⎤
H = ⎢0 0 0 1 1 1 1 0 0 0 0 1 1 1 1⎥ .
    ⎢0 1 1 0 0 1 1 0 0 1 1 0 0 1 1⎥
    ⎣1 0 1 0 1 0 1 0 1 0 1 0 1 0 1⎦

This code corrects all single mistakes.
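The general construction can be sketched in a few lines. The function name below is illustrative; it simply writes the binary representation of r into the rth column:

```python
# Parity check matrix of the Hamming code of length n = 2^k − 1: the rth
# column holds the binary representation of r, most significant bit on top.

def hamming_parity_check(k):
    n = 2 ** k - 1
    return [[(r >> (k - 1 - i)) & 1 for r in range(1, n + 1)]
            for i in range(k)]

H = hamming_parity_check(3)
for row in H:
    print(row)   # reproduces the three rows of the matrix (7.5)
```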

Exercises

1. We have defined the Hamming (4, 7)-code by means of the parity check matrix H and we computed the generator matrix G, where

    ⎡1 0 1 0 1 0 1⎤        ⎡1 0 0 0 0 1 1⎤
H = ⎢0 1 1 0 0 1 1⎥ ,  G = ⎢0 1 0 0 1 0 1⎥ .
    ⎣0 0 0 1 1 1 1⎦        ⎢0 0 1 0 1 1 0⎥
                           ⎣0 0 0 1 1 1 1⎦

(b) Decode the vector v = (1 0 1 1 0 1 1);

(c) Find all strings of length 7 which are decoded to w = (1 0 1 1).

2. A code that, for some k, corrects all combinations of k or fewer mistakes and does not correct all combinations of ℓ mistakes for any ℓ > k, is called perfect. Prove that all codes of the family of Hamming codes are perfect.

There is one particular class of linear codes the construction of which uses some

advanced algebra, and because of that these codes are very effective. In this section

we will consider (m, n)-codes obtained in this way. We will identify our messages

(strings of symbols of length m or vectors from Z_2^m) with polynomials of degree at most m − 1. More precisely, this identification is given by the formula

a = (a0 , a1 , . . . , am−1 ) → a(x) = a0 + a1 x + · · · + am−1 x^{m−1} ;

the message a can be easily recovered from the polynomial a(x). Suppose now that g(x) = g0 + g1 x + · · · + gk x^k is a fixed polynomial of degree k = n − m over Z2. Then we can define an (m, n)-code C as follows. For every a = (a0 , a1 , . . . , am−1 ) ∈ Z_2^m we define

E : a → a(x) → a(x)g(x) = b0 + b1 x + · · · + bn−1 x^{n−1} → b,

where b = (b0 , b1 , . . . , bn−1 ) ∈ Zn2 . Such a code is called a polynomial code and

the polynomial g(x) is called the generator polynomial of this code.
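Polynomial encoding is just coefficient convolution modulo 2. A sketch (names are illustrative; the generator 1 + x² + x³ is the one used in the text's (4, 7) example below):

```python
# Polynomial encoding sketched directly: the message bits are the
# coefficients of a(x), and the codeword consists of the coefficients of
# the product a(x)g(x) over Z2 (coefficient lists, low degree first).

def poly_encode(a, g, n):
    """Coefficients of a(x)g(x) mod 2, padded to length n."""
    b = [0] * n
    for i, ai in enumerate(a):
        for j, gj in enumerate(g):
            b[i + j] = (b[i + j] + ai * gj) % 2
    return b

g = [1, 0, 1, 1]                        # g(x) = 1 + x^2 + x^3
print(poly_encode([1, 0, 0, 0], g, 7))  # e1 -> [1, 0, 1, 1, 0, 0, 0]
```

Note that encoding the basis vectors e1, e2, . . . reproduces the shifted rows of the generator matrix (7.6).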

Theorem 7.1.10 The polynomial code C is linear with the following m × n generator matrix

    ⎡g0 g1 . . . gk                      ⎤
    ⎢   g0 g1 . . . gk                   ⎥
G = ⎢      g0 g1 . . . gk                ⎥ ,        (7.6)
    ⎢            . . .                   ⎥
    ⎣               g0 g1 . . . gk       ⎦

where the blank entries are zeros.

Proof The linearity of the encoding function follows from the distributive law for

polynomials. Suppose that E(a1 ) = b1 and E(a2 ) = b2 with a1 (x), b1 (x), a2 (x),

b2 (x) being the corresponding polynomials. We need to show that E(a1 + a2 ) =

b1 + b2 . Indeed, we have

(a1 + a2 )(x)g(x) = a1 (x)g(x) + a2 (x)g(x) = b1 (x) + b2 (x) → b1 + b2 ,

as required.

To determine the generator matrix we need to calculate E(e1 ), . . . , E(em ). We

have

ei → x^{i−1} → x^{i−1} g(x) = g0 x^{i−1} + g1 x^i + · · · + g_{n−m} x^{n−m+i−1} .

This must be the ith row of the generator matrix G. This gives us (7.6). �


Although for a polynomial code the generator matrix (7.6) is easy to obtain, it is

sometimes more convenient (and gives more insight) to multiply polynomials and

not matrices.

Consider, for example, the generator polynomial g(x) = 1 + x^2 + x^3, which defines an (m, m + 3)-code for all m. Let us choose m = 4. Then we obtain a (4, 7)-code whose generator matrix will be

    ⎡1 0 1 1 0 0 0⎤
G = ⎢0 1 0 1 1 0 0⎥ .
    ⎢0 0 1 0 1 1 0⎥
    ⎣0 0 0 1 0 1 1⎦

By row reducing G (this changes the encoding function but not the set of codewords), we get

    ⎡1 0 1 1 0 0 0⎤      ⎡1 0 1 0 0 1 1⎤      ⎡1 0 0 0 1 0 1⎤
G = ⎢0 1 0 1 1 0 0⎥  −→  ⎢0 1 0 0 1 1 1⎥  −→  ⎢0 1 0 0 1 1 1⎥ .
    ⎢0 0 1 0 1 1 0⎥      ⎢0 0 1 0 1 1 0⎥      ⎢0 0 1 0 1 1 0⎥
    ⎣0 0 0 1 0 1 1⎦      ⎣0 0 0 1 0 1 1⎦      ⎣0 0 0 1 0 1 1⎦

Since it is now in the form (I4 | A), by Theorem 7.1.8 we may obtain its parity check

matrix H as (AT | I3 ), that is,

                 ⎡1 1 1 0 1 0 0⎤
H = (AT | I3 ) = ⎢0 1 1 1 0 1 0⎥ .
                 ⎣1 1 0 1 0 0 1⎦

From this we observe that the code we obtained is equivalent to the Hamming (4, 7)-code, since H = (h5 , h7 , h6 , h3 , h4 , h2 , h1 ), where h1 , h2 , . . . , h7 are the columns of the parity check matrix (7.5) of the Hamming code.

Exercises

Let g(x) = 1 + x + x^3. Consider the polynomial (5, 8)-code C with g(x) as generator polynomial. For this code

1. Encode a = (1 0 1 0 1).

2. Find the generator matrix G of the code C.

3. Find a systematic linear code C′ (in terms of its parity check matrix) which is equivalent to C.


This is one particularly good class of polynomial codes which was discovered inde-

pendently around 1960 by Bose, Chaudhuri and Hocquenghem. They enable us to

correct multiple errors. Since the construction of the generator polynomial for these

codes is based on a finite field of a certain cardinality, we have to construct one first, say F, and then find a primitive element α of F.

In Chap. 5 we discussed a method of constructing a field which consists of p^n elements. It is unique up to isomorphism and is denoted by GF(p^n). To construct it we need to take Zp, find an irreducible polynomial m(x) over Zp of degree n and form

F = Zp [x]/(m(x)). There are very good tables of irreducible polynomials over Zp

of virtually any degree (see, for example, [2]).
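Arithmetic in such a field is easy to sketch in code. Below is an illustrative model of GF(2^4) = Z2[x]/(x^4 + x + 1), the field used in Example 7.1.19: elements are 4-bit integers, and multiplying by x is a shift followed by reduction by the modulus whenever x^4 appears:

```python
# A minimal sketch of GF(2^4) = Z2[x]/(x^4 + x + 1): elements are 4-bit
# numbers; times_x shifts and reduces by the modulus.  Names illustrative.

MOD = 0b10011        # x^4 + x + 1

def times_x(a):
    a <<= 1
    return a ^ MOD if a & 0b10000 else a

# The powers of x run through all 15 nonzero elements, so x is primitive.
powers, a = [], 1
for _ in range(15):
    powers.append(a)
    a = times_x(a)
print(len(set(powers)), times_x(powers[-1]) == 1)  # 15 True
```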

BCH codes work equally well for binary and for non-binary alphabets but in this

section we put the main emphasis on the binary case. Therefore we will consider a

ﬁeld F = GF(2r ) for some r which is an extension of Z2 . The general case is not

much different with only minor changes needed.

As usual we will consider (m, n)-codes, where m denotes the number of informa-

tion symbols and n the length of codewords. The minimum distance of the code we

will denote by d. For BCH codes we, ﬁrst, have to decide on the length of the code-

words n and on the minimum distance d, then m will depend on these two parameters

but this dependence is not straightforward.

This restriction on the length (to n = 2^r − 1, as explained below) is not important in applications because it is not the length of codewords that is practically important (we may divide our messages

into segments of any length) but the speed of transmission, which is characterised

by the ratio m/n, and the error-correcting capabilities of the code, i.e., the minimum

distance d.

We use the extension Z2 ⊆ F for the construction. The length of the word n will

be taken to be the number of elements in the multiplicative group of the ﬁeld F. As

we consider the binary situation, this number can only be n = 2^r − 1, where r is an arbitrary positive integer, since a field F of characteristic 2 may have only 2^r elements for some r.

Let α be a primitive element of F. Then it has multiplicative order n and the powers 1 = α^0, α, α^2, . . . , α^{n−1} are all different. To construct g(x) we need to know the minimal annihilating polynomials of α, α^2, . . . , α^{d−1}. Let mi(x) be the minimal annihilating polynomial of α^i.

Theorem 7.1.11 The polynomial code of length n with the generator polynomial

g(x) = lcm(m1 (x), m2 (x), . . . , md−1 (x))

has minimum distance at least d.

Proof Since this code is linear, the minimum distance is the same as the minimum

weight. Hence it is enough to prove that there are no codewords of weight d − 1 or


less. Since the code is polynomial, all vectors from Z_2^n are identified with polynomials

of degree smaller than n and the codewords are identiﬁed with polynomials which

are divisible by g(x). Hence, we have to show that there are no polynomials of degree

smaller than n which are multiples of g(x) and have less than d nonzero coefﬁcients.

Suppose on the contrary that a codeword is represented by a polynomial c(x) = c1 x^{i1} + c2 x^{i2} + · · · + cd−1 x^{id−1} with fewer than d nonzero coefficients, where i1 < i2 < · · · < id−1 < n. Since c(x) is divisible by g(x), it is annihilated by α, α^2, . . . , α^{d−1}, i.e.,

c(α) = c(α^2 ) = · · · = c(α^{d−1} ) = 0.

Written out explicitly, these equations are

c1 α^{i1} + c2 α^{i2} + · · · + cd−1 α^{id−1} = 0
c1 α^{2 i1} + c2 α^{2 i2} + · · · + cd−1 α^{2 id−1} = 0
. . .
c1 α^{(d−1)i1} + c2 α^{(d−1)i2} + · · · + cd−1 α^{(d−1)id−1} = 0.

Setting βj = α^{ij}, this means that the homogeneous system of linear equations

β1 x1 + β2 x2 + · · · + βd−1 xd−1 = 0
β1^2 x1 + β2^2 x2 + · · · + βd−1^2 xd−1 = 0
. . .
β1^{d−1} x1 + β2^{d−1} x2 + · · · + βd−1^{d−1} xd−1 = 0

has a nontrivial solution (c1 , c2 , . . . , cd−1 ). This can happen only if the determinant

of this system vanishes. This, however, contradicts the classical result of the theory

of determinants that, for any k > 1, the Vandermonde determinant

| β1      β2      . . .  βk      |
| β1^2    β2^2    . . .  βk^2    |
| . . .   . . .   . . .  . . .   |
| β1^k    β2^k    . . .  βk^k    |

is zero if and only if βs = βt for some s ≠ t such that s ≤ k and t ≤ k (see the Appendix for the proof). Indeed, in our case k = d − 1 and βs = α^{is} ≠ α^{it} = βt for s ≠ t, because is ≠ it, both are at most n − 1, and α has multiplicative order n. This contradiction proves the theorem. �

Lemma Let m(t) = t^k + a1 t^{k−1} + · · · + ak be the minimal annihilating polynomial of α over Z2. Then m(t) is also the minimal annihilating polynomial of α^2.


Proof We note, first, that ai^2 = ai, as 0^2 = 0 and 1^2 = 1 for ai ∈ {0, 1}. We also note that, since 2x = 0 for all x ∈ F, we have (x + y)^2 = x^2 + y^2 for all x, y ∈ F and, by induction, (x1 + · · · + xs)^2 = x1^2 + · · · + xs^2. Hence

0 = m(α)^2 = (α^k + a1 α^{k−1} + · · · + ak )^2 = (α^2 )^k + a1 (α^2 )^{k−1} + · · · + ak · 1^2 = m(α^2 ).

Hence m(t) is also an annihilating polynomial for α^2. Therefore the minimal irreducible polynomial of α^2 must divide m(t). Since m(t) is irreducible, this is possible only if it coincides with m(t). �

Example 7.1.19 Suppose that we need a code which corrects any two errors and has

length 15. Hence d = 5, and we need a field containing 16 elements. Such a field F = Z2[x]/(x^4 + x + 1) was constructed in Example 5.2.3. We also saw that the multiplicative order of x was 15, hence x is a primitive element of F. Let α = x.

For correcting any two mistakes we need a code with minimum distance d = 5.

Theorem 7.1.11 tells us that we need to take the generator polynomial g(t) = lcm(m1(t), m2(t), m3(t), m4(t)). By the lemma above, m1(t) = m2(t) = m4(t). Hence g(t) = m1(t)m3(t), and we have to calculate m3(t), which is the minimal annihilating polynomial for β = x^3. Using the table in Example 5.2.3, we calculate that β^2 = x^6 = x^2 + x^3, β^3 = x^9 = x + x^3, β^4 = x^{12} = 1 + x + x^2 + x^3. The elements 1, β, β^2, β^3, β^4 must be linearly dependent in the 4-dimensional vector space F, and we can find the linear dependency between them using the Linear Dependency Relationship Algorithm. By row reducing the following matrix to its row reduced

echelon form

⎡1 0 0 0 1⎤        ⎡1 0 0 0 1⎤
⎢0 0 0 1 1⎥  rref  ⎢0 1 0 0 1⎥
⎢0 0 1 0 1⎥  −→   ⎢0 0 1 0 1⎥
⎣0 1 1 1 1⎦        ⎣0 0 0 1 1⎦

we find that β^4 = 1 + β + β^2 + β^3, so m3(t) = t^4 + t^3 + t^2 + t + 1. Since m1(t) = t^4 + t + 1, we obtain

g(t) = (t^4 + t + 1)(t^4 + t^3 + t^2 + t + 1) = t^8 + t^7 + t^6 + t^4 + 1.
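The product of the two minimal polynomials can be checked mechanically; this illustrative helper multiplies Z2 polynomials given as coefficient lists (low degree first):

```python
# Verifying g(t) = m1(t) * m3(t) over Z2 by coefficient convolution.

def poly_mul(p, q):
    """Multiply two Z2 polynomials given as coefficient lists (low first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] ^= pi & qj
    return r

m1 = [1, 1, 0, 0, 1]     # t^4 + t + 1
m3 = [1, 1, 1, 1, 1]     # t^4 + t^3 + t^2 + t + 1
print(poly_mul(m1, m3))  # coefficients of 1 + t^4 + t^6 + t^7 + t^8
```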


Now we may say that m = n − deg (g) = 15 − 8 = 7 and our code C will be a

(7, 15)-code. It will correct any two errors.

A more practical example is a code widely used in European data communication

systems. It is a binary (231, 255)-code with a guaranteed minimum distance of 7.

The field consisting of 2^8 = 256 elements is used and the encoding polynomial has degree 24.

Exercises

1. Construct a binary (m, n)-code with the length of codewords n = 15, which

corrects all triple errors, in following steps:

(a) Using the field K = Z2[x]/(x^4 + x^3 + 1), compute the generating polynomial

g(t) of a binary BCH code with the length of the codewords n = 15 and with

a minimum distance 7.

(b) What is the number m of information symbols?

(c) Write down the generating matrix G of this BCH code.

(d) Encode the message which is represented by the string of m ones.

2. In European data communication systems a binary BCH (231, 255)-code is used

with guaranteed minimum distance 7. Using GAP ﬁnd the generator polynomial

of this code.

Non-binary codes have many different uses. Any ﬁnite ﬁeld Zp can be used as an

alphabet of a code if the channel allows us to distinguish p different symbols. Non-

binary codes can be used as an intermediate step in the construction of good binary

codes, and they can also be used in the construction of ﬁngerprinting codes, which

we will discuss in the next section.

We will again consider (m, n)-codes. The encoding function of such a code will be a

mapping (normally linear) E : F m → F n for a certain ﬁnite ﬁeld F which serves as

the alphabet. The Hamming weight and the Hamming distance are deﬁned exactly

as for binary codes.

Then wt(u) = 4, wt(v) = 3 and

d(u, v) = wt(u − v) = wt (0 2 0 0 0 1 0) = 2.

If u was sent and v was received, then the error vector is e = v −u = (0 1 0 0 0 2 0).


With non-binary codes we don’t have the luxury that −a = a anymore. With

ternary codes we have −a = 2a instead! But the following theorem is still true:

Theorem 7.2.1 A code C detects all combinations of k or fewer errors if and only

if dmin (C) ≥ k + 1 and corrects all combinations of k or fewer errors if and only if

dmin (C) ≥ 2k + 1.

The error correction capabilities of any code will again be dependent on the

minimum distance of the code, and the minimum distance for a linear code will be

equal to the minimum weight.

The concepts of generator matrix G and parity check matrix H are the same.

A little reﬁnement must be made for ﬁnding G from H and the other way around.

Namely, if G = (Im | A), then H = (−AT | In−m ). Theorem 7.1.9 must also be slightly generalised to allow the design of non-binary error-correcting codes capable

of correcting all single mistakes.

Theorem 7.2.3 A linear (non-binary) code with parity check matrix H corrects all

single mistakes if and only if no one column of H is a multiple of another column.

Proof A single mistake in position i corresponds to an error vector e with a single nonzero coordinate a in position i, for some i and some 0 ≠ a ∈ Zp, i.e.,

e = (0 . . . 0 a 0 . . . 0) = a(0 . . . 0 1 0 . . . 0).

The syndrome of the received vector is then

HeT = ahi ,

where hi is the ith column of H. If no column of H is a multiple of another, then, given the syndrome ahi, we can find both i and a. If some column were a multiple of another, the identification of the mistake would be impossible. �

For example, we can construct a ternary code correcting all single mistakes by defining it by its parity check matrix

    ⎡0 0 0 0 1 1 1 1 1 1 1 1 1⎤
H = ⎢0 1 1 1 0 0 0 1 1 1 2 2 2⎥ .
    ⎣1 0 1 2 0 1 2 0 1 2 0 1 2⎦

7.2 Non-binary Error-Correcting Codes 201

The secret behind this matrix is that every nonzero column vector from Z_3^3 is either a column of H or a multiple of such a column. By Theorem 7.2.3, this code is a (10, 13)-code that corrects any single mistake. For example, the syndrome

HyT = (2 0 1)T = 2h7 ,

for y ∈ Z_3^13 shows that a mistake happened in the 7th position, and it should be corrected by subtracting 2 from (or, equivalently, adding 1 to) the coordinate y7.
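The decoding rule just described can be sketched as a search for the unique column multiple matching the syndrome. Names below are illustrative:

```python
# Single-error correction with the ternary (10, 13)-code above: the
# syndrome Hy^T equals a*h_i for exactly one column h_i of H and one
# a in {1, 2}, which reveals the position i and the size a of the error.

H = [
    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 2, 2, 2],
    [1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
]

def correct(y):
    s = [sum(H[r][j] * y[j] for j in range(13)) % 3 for r in range(3)]
    if s == [0, 0, 0]:
        return list(y)                      # no error
    for i in range(13):
        for a in (1, 2):
            if all((a * H[r][i]) % 3 == s[r] for r in range(3)):
                y = list(y)
                y[i] = (y[i] - a) % 3       # subtract the error
                return y

y = [0] * 13
y[6] = 2                  # a single error of size 2 in position 7
print(correct(y))         # the zero codeword is restored
```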

Exercises

In the exercises below, all matrices and codes are ternary, i.e., over Z3 .

1. Suppose the matrix

     ⎡1 2 1 2 1 1⎤
H1 = ⎢1 2 1 0 2 1⎥
     ⎣2 1 0 1 0 2⎦

is taken as a parity check matrix of a ternary error correcting code C1 . Will this

code correct all single errors?

2. Find the generator matrix for the code C2 with the following parity check matrix

     ⎡1 2 1 2 1 1⎤
H2 = ⎢1 2 1 0 2 2⎥ .
     ⎣2 1 0 1 0 1⎦

3. Suppose that the code C2 was used. Decode the vector y = (0 2 2 2 2 2).

No changes at all should be made for polynomial codes and BCH codes. Among

non-binary BCH codes Reed–Solomon codes are of special practical importance.

They are also widely used to build other good codes, including good binary codes.

Let F be a finite field with q elements and let α be one of its primitive elements. Let d > 1 be a positive integer such that |F| > d − 1. A Reed–Solomon (or RS) code over F is a polynomial (q − d, q − 1)-code with the generator polynomial

g(x) = (x − α)(x − α 2 ) . . . (x − α d−1 ). (7.9)

The Reed–Solomon code with the generator polynomial (7.9) has a minimum distance of at least d.


Proof We consider the trivial extension of ﬁelds F ⊆ F. Let mi (x) be the minimal

irreducible polynomial of α i over F. Then mi (x) = x − α i and we see that the RS

code is a BCH code. By Theorem 7.1.11 its guaranteed minimum distance is d. �

Example 7.2.3 Let F = Z2[t]/(t^2 + t + 1). Then F = {0, 1, α, β}, where α = t and β = t + 1. We note that β = α^2, so α is a primitive element of F. The RS (2, 3)-code over F with generator polynomial g(x) = x + α (which is the same as x − α) will have minimum distance 2. It will have 4^2 = 16 codevectors. Let us encode the message

(α β). We have

a(x)g(x) = (α + βx)(x + α) = α^2 + (α + αβ)x + βx^2 = β + βx + βx^2 → (β β β).

The full list of codevectors is:

(0 0 0) (α 1 0) (0 α 1) (α β 1)
(β α 0) (0 β α) (β 1 α) (1 1 1)
(1 β 0) (0 1 β) (1 α β) (α α α)
(β 0 1) (α 0 β) (1 0 α) (β β β)

Now let F = Z5 and α = 2, which is a primitive element of Z5. The RS (2, 4)-code over F with generator polynomial g(x) = (x − α)(x − α^2) = (x − 2)(x − 4) = x^2 + 4x + 3 will have minimum distance 3. It will have 5^2 = 25 codevectors.
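This small RS code over Z5 can be enumerated and checked computationally. The sketch below (illustrative, not from the text) encodes all 25 messages with g(x) = x² + 4x + 3 and verifies that the minimum distance is exactly 3 = n − m + 1:

```python
# Enumerate the RS (2, 4)-code over Z5 with g(x) = 3 + 4x + x^2 and
# verify its minimum distance.
from itertools import product

g = [3, 4, 1]                                  # 3 + 4x + x^2, low first

def encode(a):                                 # coefficients of a(x)g(x) mod 5
    b = [0] * 4
    for i, ai in enumerate(a):
        for j, gj in enumerate(g):
            b[i + j] = (b[i + j] + ai * gj) % 5
    return tuple(b)

words = [encode(a) for a in product(range(5), repeat=2)]
dmin = min(sum(u != v for u, v in zip(w1, w2))
           for w1 in words for w2 in words if w1 != w2)
print(len(set(words)), dmin)                   # 25 codewords, distance 3
```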

The Reed–Solomon codes are among the best known. To substantiate this claim

let us prove the following

Theorem 7.2.5 (The Singleton bound) Let C be a linear (m, n)-code. Then

dmin (C) ≤ n − m + 1.

Proof Let us consider the codeword E(e1 ) = g1 . It has only one nonzero information

symbol. It has n − m check symbols which may also be nonzero. In total, wt(g1 ) ≤

n − m + 1. But

dmin (C) = wtmin (C) ≤ wt(g1 ) ≤ n − m + 1.

Now we can show that any Reed–Solomon code achieves the Singleton bound.


Proof Let us consider the Reed–Solomon code C of length n with the generator

polynomial

g(x) = (x − α)(x − α 2 ) . . . (x − α d−1 ).

Let m be the number of information symbols. We know that dmin (C) ≥ d since d

is the guaranteed minimum distance of this code. Since the degree of the generator

polynomial is d − 1, this will be the number of check symbols of this polynomial

code, i.e., d − 1 = n − m. Hence dmin (C) ≥ d = n − m + 1. By the previous theorem

we obtain dmin (C) = n − m + 1 and C achieves the Singleton bound. �

As we mentioned, good binary codes can be obtained from RS codes. Let F be a field of 2^r elements, n = 2^r − 1. We know that F is an r-dimensional vector space over

Z2 and any element of F can be represented as a binary r-tuple. First we construct

an RS (m, n)-code over F and then, in each codeword we replace every element of

F with the corresponding binary tuple. We obtain an (rm, rn)-code which is binary.

Such codes are very good in correcting bursts of errors (several errors occurring in

close proximity) because such multiple errors affect not too many elements of F in

codewords of the RS-code and can be therefore corrected. Such codes are used in

CD-players because any microscopic defect on a disc results in a burst of errors.

We see that our choice of a code might be a result of the selected model for mistakes: when they are random and independent we use one type of code; when they are highly dependent (and come in bursts) we use another type of code.

Example 7.2.5 In Example 7.2.3, using the basis {1, α} for F, we may represent the

elements of F as follows:

0 → (0 0), 1 → (1 0), α → (0 1), β → (1 1).

Then we will obtain a binary (4, 6)-code with the following codevectors:

(0 0 0 0 0 0) (0 1 1 0 0 0) (0 0 0 1 1 0) (0 1 1 1 1 0)
(1 1 0 1 0 0) (0 0 1 1 0 1) (1 1 1 0 0 1) (1 0 1 0 1 0)
(1 0 1 1 0 0) (0 0 1 0 1 1) (1 0 0 1 1 1) (0 1 0 1 0 1)
(1 1 0 0 1 0) (0 1 0 0 1 1) (1 0 0 0 0 1) (1 1 1 1 1 1).
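The expansion step is mechanical: each field symbol is replaced by its coordinate pair in the basis {1, α}. A sketch, using the characters 'a' and 'b' to stand for α and β (the mapping table is consistent with the codevector list above):

```python
# Turning a codeword over GF(4) into a binary word by replacing every
# symbol with its coordinate pair in the basis {1, α}.

bits = {'0': (0, 0), '1': (1, 0), 'a': (0, 1), 'b': (1, 1)}

def expand(word):
    return tuple(bit for sym in word for bit in bits[sym])

print(expand(('a', '1', '0')))   # the codeword (α 1 0) -> (0 1 1 0 0 0)
```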

Exercises

In a series of exercises below we construct a ternary BCH-code of length n = 8 with

minimum distance 4 using the field F = Z3[x]/(x^2 + 2x + 2).

1. Show that α = x is a primitive element of F. Build a ‘table of powers’ of α.

2. Show that the minimal annihilating polynomials of α, α^2 and α^3 are t^2 + 2t + 2, t^2 + 1 and t^2 + 2t + 2, respectively.
3. Find the generator polynomial g(t) of the code C.


4. How many information symbols does this code have?

5. Find the generator matrix G of C.

The rapid growth of the digital economy, facilitated by spread of broadband avail-

ability, and rapid increases in computing power and storage capacity, has created

a global market for content and rights holders of intellectual property. But it also creates a threat that, without adequate means of protection, piracy will prevent this

market from functioning properly.

Managing intellectual property in electronic environments is not an easy task. On

the one hand owners of the content would like to sell it for proﬁt to paying customers

but at the same time to protect it from any further illegal distribution. There are many

ways to do so. One avenue has been opened by the recent development of fingerprinting

codes that provide combinatorial and algebraic methods of tracing illegally ‘pirated’

data. The idea is that a codeword might be embedded in the content (software, music,

movie) in such a way that any illegally produced copies will reveal the distributor.

For example, such a situation emerges in the context of pay TV, where only

paying customers should be able to view certain programs. The broadcasted signal

is normally encrypted and the decryption keys are sent to the paying subscribers. If

an illegal decoder is found, the source of its decryption keys must be identiﬁed.

Fingerprinting techniques have been used for quite some time; ﬁngerprints have

been embedded in digital video, documents and computer programs. However, only

recently has it become possible to give protection against colluding malicious users.

This is what ﬁngerprinting codes are about. This section is largely based on the

groundbreaking paper of Boneh and Shaw [4] and also on the paper by Staddon

et al. [5].

There are numerous ways to embed a codeword identifying the user in the content

which is normally represented as a ﬁle. A copy of the ﬁle sold to the user can

therefore be characterised by a vector x = (x1 , x2 , . . . , xn ) ∈ Z_q^n specific to this particular copy. This is a fingerprint of this copy. Any subset C ⊂ Z_q^n may be used

as the set of ﬁngerprints and will be called a ﬁngerprinting (watermarking) code.

A malicious coalition of users may try to create a pirate copy of the product

by trying to identify the embedded ﬁngerprint and to change it. To achieve this,

they might compare their ﬁles—for example, using the diff command—and ﬁnd

7.3 Fingerprinting Codes 205

positions in which their ﬁles differ. These will certainly belong to the code so the

coalition may discover some but not all symbols of the ﬁngerprint. They might change

the symbols in the identiﬁed positions with the goal of producing another legitimate

copy of the product that was sold to another user (or has not yet been sold). This way

they might ‘frame’ an innocent user.

The owner of the property rights for the content would like to design a scheme

that enables the identiﬁcation of at least one member of the coalition that produced a

pirated copy. As a bottom line, the scheme should make it infeasible for a malicious

coalition to frame an innocent user by producing their ﬁngerprint. Of course, we

have to make the assumption that the malicious coalition is not too large (here there is a clear analogy with error-correcting codes, which likewise are effective only if not too many mistakes occurred during the transmission).

Let us now proceed to formal deﬁnitions.

Definition 7.3.1 Let X ⊆ Z_q^n. For any coordinate i we define the projection

Pi (X) = {xi | x ∈ X}.

In other words Pi (X) is the set of all ith coordinates of the words from X.

Example 7.3.1 Let X = {x, y, z}, where

x = (0 1 2 3),
y = (0 0 2 2),
z = (0 1 3 1).

Then P1 (X) = {0}, P2 (X) = {0, 1}, P3 (X) = {2, 3}, P4 (X) = {1, 2, 3}.
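The projection and descendant notions are easy to model computationally. An illustrative sketch (names are not from the text): a word y is a descendant of X exactly when every coordinate of y lies in the corresponding projection.

```python
# Projections and descendant membership for a set X of equal-length words.

def projections(X):
    return [set(col) for col in zip(*X)]

def is_descendant(y, X):
    return all(yi in Pi for yi, Pi in zip(y, projections(X)))

X = [(0, 1, 2, 3), (0, 0, 2, 2), (0, 1, 3, 1)]   # x, y, z of Example 7.3.1
print(projections(X))                  # {0}, {0, 1}, {2, 3}, {1, 2, 3}
print(is_descendant((0, 0, 3, 3), X))  # True
print(is_descendant((0, 2, 2, 2), X))  # False: 2 is not in P2(X)
```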

Definition 7.3.2 We also define the envelope of X as

desc(X) = P1 (X) × P2 (X) × · · · × Pn (X).

Elements of the envelope are called descendants of X and elements from X are called

their parents. It is clear that X ⊆ desc(X).

A descendant of X does not have to coincide with any of the vectors in X but may take, say, its 1st coordinate from x5, the second from x2 and all

the rest from x3 . For example, in Example 7.3.1 vector (0 0 3 3) is a descendant of

X but vector (0 2 2 2) is not.

Deﬁnition 7.3.3 For any positive integer w, we will also deﬁne a restricted envelope

descw (X), which consists of all descendants of subsets of X of cardinality w.

We illustrate the difference between desc(X) and descw (X) in the following

example.


Example 7.3.2 Let X = {x, y, z}, where

x = (1 0 0),
y = (0 1 0),
z = (0 0 1).

Then the vector (1 1 1) belongs to desc3 (X) = desc(X), but (1 1 1) ∉ desc2 (X).

Example 7.3.3 Let C ⊂ Z_4^4 be the fingerprinting code consisting of the vectors

u = (0 1 2 3),

v = (1 2 3 0),

w = (2 3 0 1),

x = (3 0 1 2),

y = (0 0 0 0),

z = (1 1 1 1).

Consider s = (0 2 1 1). We see that s ∈ desc3 (C) but s ∉ desc2 (C). To prove the last statement we note that for s to be a descendant of a pair of vectors from C, one of them must be either u or y (otherwise we cannot get the first coordinate 0). Neither of these two vectors has 2 as its second coordinate, hence the second vector in the pair must be v. But P4 ({u, v, y}) does not contain 1. Hence s ∉ desc2 (C).

Exercises

1. Let X = {x1 , x2 , x3 } ⊂ Z_3^9, where

x1 = (1 1 1 0 0 0 2 2 2),

x2 = (1 1 2 2 0 0 1 1 2),

x3 = (1 2 2 0 2 0 1 2 0).

(b) Find the number of elements in the envelope desc(X).

(c) Write down a vector y which belongs to desc2 (X) but for which no parent

can be identiﬁed.

2. Give an example of a set of vectors X such that |X| > 1 and desc(X) = X.

3. Suppose X = {x1 , x2 , . . . , xk } ⊆ Z_q^n and |Pi (X)| = mi for i = 1, . . . , n. Prove that |desc(X)| = m1 · . . . · mn.


One goal that immediately comes to our mind is to secure that a coalition of malicious

users cannot frame an innocent user. Of course, such protection can be put in place

only against reasonably small malicious coalitions in a direct analogy with error-

correcting codes where the decoder is capable of correcting only a limited number

of mistakes.

Deﬁnition 7.3.4 A code C is called w-frameproof (w-FP code) if for every subset

X ⊂ C such that |X| ≤ w we have

desc(X) ∩ C = X.

This means that no coalition of at most w users can frame another user, who is not in the coalition, by producing the fingerprint of that user.

Example 7.3.4 The code C consisting of the n elements of the standard basis of Z_q^n

e1 = (1 0 0 . . . 0),
e2 = (0 1 0 . . . 0),
. . .
en = (0 0 0 . . . 1)

is w-frameproof for every w.

Example 7.3.5 The code in Example 7.3.3 is 3-frameproof. Indeed, the ﬁrst four

users cannot be framed by any coalition to which they do not belong because each

of them contains 3 in the position where all other users have symbols different from

3. It is also easy to see that the two last users cannot be framed by any coalition of

three or fewer users.

The following function will be useful in our proofs. For any two words u, v of

length n we deﬁne I(u, v) = n − d(u, v). In other words, I(u, v) is the number of

coordinates where u and v agree.

As in the theory of error-correcting codes, the minimum distance dmin (C) between

any two distinct codewords is an important parameter.

Theorem 7.3.1 Suppose that a code C of length n has minimum distance

dmin (C) > n (1 − 1/w).

Then C is w-frameproof.

Proof Suppose, on the contrary, that for some X = {x1 , x2 , . . . , xw } ⊆ C there is a codeword y ∈ C \ X such that y ∈ desc(X). Since y, xi ∈ C, for every i = 1, 2, . . . , w we


have d(y, xi ) > n (1 − 1/w) and hence we obtain I(y, xi ) = n − d(y, xi ) < n − (n − n/w) = n/w. This means that y and xi coincide in fewer than n/w positions and, hence, fewer than n/w positions of y could come from xi. Since we have exactly w elements in X, it follows that fewer than w · n/w = n coordinates in y can come from vectors of X. Hence at least one coordinate of y, say yj, does not coincide with the jth coordinate of any of the vectors x1 , x2 , . . . , xw, and therefore yj ∉ Pj (X). This contradicts the assumption that y is a descendant of X. �

Exercises

The code C ⊂ {1, 2, 3}^6 consists of six codewords:

c4 = (1 2 3 1 2 3), c5 = (2 3 1 2 3 1), c6 = (3 1 2 3 1 2).

2. Prove that it is 2-frameproof.

Definition 7.3.5 We say that a code C has the identifiable parent property of order w (w-IPP code) if for any x ∈ descw (C) the family of subsets

{X ⊆ C : |X| ≤ w and x ∈ desc(X)}        (7.10)

has a nonempty intersection.

What this says is that, for any w-IPP code and for any x ∈ descw (C) this vector

cannot be produced without the participation of a certain user: the one who is in the

intersection of the family of subsets (7.10). Therefore this user can be identiﬁed. The

w-IPP property is stronger than w-frameproofness.

Proposition 7.3.1 Any code C with the identiﬁable parent property of order w is

w-frameproof.

Proof Suppose that the w-IPP property holds but a certain coalition X with no more

than w users can frame an innocent user c ∈ C \ X. Then c ∈ desc(X) and c ∈

desc({c}). Since {c} ∩ X = ∅, this contradicts the w-IPP property. �

Let us now give a non-trivial example of a w-IPP code.

Example 7.3.6 The following code has the identiﬁable parent property of order 2

and was constructed with the help of a Reed–Solomon code:

c1 = (1 1 1 1 1),


c2 = (1 2 2 2 2),

c3 = (1 3 3 3 3),

c4 = (1 4 4 4 4),

c5 = (2 1 2 3 4),

c6 = (2 2 1 4 3),

c7 = (2 3 1 4 2),

c8 = (2 4 3 2 1),

c9 = (3 1 4 2 3),

c10 = (3 2 3 1 4),

c11 = (3 3 2 4 1),

c12 = (3 4 1 3 2),

c13 = (3 4 1 3 2),

c14 = (4 2 4 3 1),

c15 = (4 4 2 1 3).

It is really hard to check directly that this code is indeed 2-IPP, but it is relatively easy to check that dmin (C) = 4. As we will see later, Theorem 7.3.3 implies the 2-IPP property for this code.

Codes with the identiﬁable parent property normally require a large alphabet. The

binary alphabet is the worst one.

Proposition 7.3.2 There does not exist a binary 2-IPP code C with |C| ≥ 3.

Proof Suppose that C contains three distinct codewords x, y and z. From them we construct a descendant u in the following way. For each i, we consider the coordinates xi, yi,

zi ; among them there will be a majority of zeros or a majority of ones. We deﬁne ui

to coincide with the majority. Then u belongs to each of the desc(x, y), desc(x, z),

and desc(y, z). However, {x, y} ∩ {x, z} ∩ {y, z} = ∅. �

We see from the Example 7.3.6 that it is not too easy to check that the code in the

above example satisﬁes the identiﬁable parent property of order 2. But there exists

one slightly stronger property that is much easier to check.

Deﬁnition 7.3.6 A code C is called w-traceable (w-TA code) if for any y ∈ descw (C)

the inclusion y ∈ desc(X), for some subset X ⊆ C with |X| = w, implies the existence

of at least one codeword x ∈ X such that d(y, x) < d(y, z) for any z ∈ C \ X.

If a code is a w-TA code, we can always trace at least one parent of y ∈ descw (C)

using a process similar to maximum likelihood decoding for error correcting codes.

Indeed, the following proposition is true.


Proposition 7.3.3 Suppose that a code C is w-traceable, and y ∈ desc(X) for some

subset X ⊆ C with |X| = w. Let x1 , x2 , . . . , xk be the set of vectors from C such

that d = d(y, x1 ) = · · · = d(y, xk ) and no vector z ∈ C satisﬁes d(y, z) < d. Then

{x1 , x2 , . . . , xk } ⊆ X.

Proof Suppose xi ∉ X for some i. Then by the traceability property there must be a vector x ∈ X such that d(y, x) < d(y, xi ) = d, which contradicts the minimality of d. �
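Proposition 7.3.3 justifies a decoding-style tracing procedure: accuse any codeword at minimum Hamming distance from the pirate word. A toy sketch under an assumed traceable setting (code and names are illustrative; the example code is a repetition code over {1, 2, 3} with dmin = 5 > 5(1 − 1/2²), so it is 2-TA by Theorem 7.3.3):

```python
# Tracing for a w-TA code: every codeword at minimum Hamming distance
# from the pirate word y is guaranteed to belong to the guilty coalition.

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def trace(y, code):
    d = min(hamming(y, c) for c in code)
    return [c for c in code if hamming(y, c) == d]   # all closest codewords

code = [(1, 1, 1, 1, 1), (2, 2, 2, 2, 2), (3, 3, 3, 3, 3)]
y = (1, 1, 2, 2, 1)       # forged by the coalition {code[0], code[1]}
print(trace(y, code))     # accuses (1, 1, 1, 1, 1), a real parent
```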

Let us now state one obvious fact.

Lemma 7.3.1 Let X = {x1 , x2 , . . . , xw } and y ∈ desc(X). Then there exists i ∈

{1, 2, . . . , w} such that I(xi , y) ≥ n/w.

Proof Suppose on the contrary that I(xi , y) < n/w for all i ∈ {1, 2, . . . , w}. Then y inherited fewer than n/w coordinates from each xi. In total it inherited fewer than w · n/w = n coordinates from the vectors of X and so cannot be a descendant of X. �

Theorem 7.3.2 Any w-TA code C is also a w-IPP code.

Proof Suppose that the code C is w-traceable. Let x ∈ desc_w(C). Let us consider the family of subsets (7.10). Suppose y ∈ C is the closest, or one of the closest, vectors of C to x, i.e., the distance d(x, y) is the smallest possible. Because C is w-traceable, y must belong to every subset of the family (7.10); hence the intersection of this family is nonempty and the w-IPP property holds. □

Theorem 7.3.3 Suppose that a code C of length n has a minimum distance

d_min(C) > n(1 − 1/w²).

Then C is a w-traceable code and hence has the identifiable parent property of order w.

Proof Let X ⊆ C with |X| = w. Suppose X = {x₁, x₂, . . . , x_w}. Let us consider any z ∈ C \ X. Then, for any i, I(z, xᵢ) = n − d(z, xᵢ) < n − n(1 − 1/w²) = n/w², i.e., the number of coordinates where z and xᵢ agree is less than n/w². We now define

I(z, X) = |{j | z_j ∈ P_j(X)}|.

We obtain

I(z, X) ≤ I(z, x₁) + · · · + I(z, x_w) < w · n/w² = n/w.  (7.11)

On the other hand, by Lemma 7.3.1, for every y ∈ desc(X) we can find some xᵢ such that I(xᵢ, y) ≥ n/w. Thus we obtain d(xᵢ, y) ≤ n − n/w = n(1 − 1/w), while for any z ∈ C \ X we will have I(z, y) ≤ I(z, X) < n/w and hence d(z, y) > n − n/w = n(1 − 1/w), proving w-traceability. □

This theorem works only for a reasonably large alphabet.
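Proposition 7.3.3 suggests a tracing procedure similar to maximum likelihood decoding: compute the distance from the suspect word y to every codeword and keep the nearest ones. A minimal Python sketch (the tiny 3-codeword code over the alphabet {0, 1, 2} is our own toy illustration, not from the text):

```python
def hamming(u, v):
    """Hamming distance between two equal-length words."""
    return sum(a != b for a, b in zip(u, v))

def trace(code, y):
    """Return all codewords nearest to y.  For a w-TA code and a
    descendant y of a coalition X of size w, Proposition 7.3.3
    guarantees that every returned codeword lies in X."""
    d = min(hamming(y, c) for c in code)
    return [c for c in code if hamming(y, c) == d]

# Toy code with d_min = 4 > n(1 - 1/w^2) = 3 for n = 4, w = 2,
# so it is 2-traceable by Theorem 7.3.3.
code = [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2)]
y = (0, 1, 0, 1)          # a descendant of the first two codewords
print(trace(code, y))     # [(0, 0, 0, 0), (1, 1, 1, 1)]
```

Both parents are at distance 2 from y, while the non-parent is at distance 4, so the nearest-codeword search returns exactly the coalition.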

7.3 Fingerprinting Codes 211

Exercises

1. Let the size of the alphabet be q. Show that there does not exist a w-IPP code C with |C| > w ≥ q.

2. Using the Reed–Solomon code C over Z₁₇ of length 16 with minimum distance 13, show that there exists a fingerprinting code with the identifiable parent property of order 2 containing 83521 codewords.


Chapter 8

Compression

Baltasar Gracián y Morales (1601–1658)

Computer memory is a limited resource, so if our files can be stored in a more economical fashion, this has to be done. Some files, like pictures, contain a lot of redundancy and can be compressed significantly even without loss of picture quality. There are numerous ways to do so.

There are three major approaches to measuring the quantity of information in

a message of a certain alphabet: probabilistic, combinatorial, and algorithmic. The

probabilistic view is that information is anything that resolves uncertainty. The more

uncertain an event that may or may not take place in the future, the more information

is required to resolve the uncertainty. This works well with messages generated by

random sources but cannot help answering questions like: “What is the quantity of information in Leo Tolstoy's War and Peace?” or “How much information is needed for the reproduction of a particular form of cockroach?”

The combinatorial approach tries to reduce complex events to some basic ones.

Suppose you would like to know if there will be rain tomorrow. You look at the

weather forecast and get the answer. This is a simple ‘yes’ or ‘no’ situation and

it is easy to resolve. Suppose a 1 means ‘no rain’ and a 0 means rain, then one

binary digit carries all the information you need. One bit is a unit of information

expressed as a choice between two possibilities 0 and 1. Asking whether there will

be rain tomorrow you ask for one bit of information. Information for more complex

events can also be measured in bits. Given a set of possible events we ask how many bits of information are required to individualise each particular event. Suppose

n binary digits are sufﬁcient to give a distinctive label to every event and you cannot

do this with n − 1 binary digits. Then we say that every event in the set of events

carries n bits of information.

A. Slinko, Algebra for Applications, Springer Undergraduate Mathematics Series,

DOI 10.1007/978-3-319-21951-6_8


The algorithmic approach measures information by complexity. Roughly speaking, the longer the program that we have to write for a computer to output the given message, the less redundancy this message has and the less compressible it is.

Here we give a glimpse of the combinatorial approach describing Fitingof’s

compression code. These types of codes are universal as they can be used when

we do not know how the data was generated. Boris Fitingof [2] developed the ﬁrst

such code, and the construction is quite elegant. His paper was inspired by a paper of

Kolmogorov [1]. However, it is fair to consider Fitingof the founder of universal encoding.

Let Ω be a finite set and |Ω| be the number of elements in it. Suppose that we want to give an individual label to each element of Ω and each label must be a sequence of zeros and ones. How long must our sequences be so that we have enough labels for all elements of Ω? Since we have exactly 2ⁿ sequences of length n, this number should be taken so that 2ⁿ ≥ |Ω|. If we aim at sequences of the shortest possible length, we should choose n so that

2ⁿ ≥ |Ω| > 2ⁿ⁻¹.  (8.1)

For example, the labeling can be done in the following way. Let |Ω| = 2ⁿ (or n = log₂ |Ω|), and ω₀, ω₁, . . . , ω_{|Ω|−1} be the elements of Ω listed in some order. Then we can think of the correspondence

ω_k → k → k(2),

where k(2) is the standard binary representation of k, with the convention that if k in binary has fewer than n binary digits, then zeros are added in front of the standard binary representation of k to make it of length exactly n. In other words, the information contained in ω_k is the binary representation of k. Then, under this arrangement, every element of Ω carries exactly n bits of information.

Example 8.1.1 Let |Ω| = 16, n = 4. Then ω₅ can be put in correspondence to 5 and to 5(2) = 0101. Thus, every element of Ω carries 4 bits of information.
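The fixed-length labeling just described is easy to express in code; a small Python sketch (the helper name `label` is ours):

```python
def label(k, n):
    """Binary label of length n for the element ω_k: the standard
    binary representation of k with zeros padded in front."""
    return format(k, "b").zfill(n)

# |Ω| = 16, n = 4: ω_5 receives the label 0101, as in Example 8.1.1.
print(label(5, 4))    # 0101
```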

Definition 8.1.1 The information of an element ω ∈ Ω is by definition

I(ω) = ⌈log₂ |Ω|⌉.  (8.2)

8.1 Preﬁx Codes 215

Here and further in this section all logarithms will be taken to base 2. Let ⌈x⌉ be the nearest integer which is greater than or equal to x. Then (8.1) implies n ≥ log |Ω| > n − 1; hence, for an element ω ∈ Ω, the integer I(ω) is the minimal number of binary symbols necessary for individualising ω among the other elements of Ω.

Let now

Ω = Ω₁ ∪ Ω₂ ∪ · · · ∪ Ω_n  (8.3)

be a partition of Ω into n disjoint classes. Let π(ω) denote the class which contains ω.

Definition 8.1.2 The information of an element ω ∈ Ω relative to the given partition is defined as

I(ω) = log |π(ω)|.  (8.4)

It can be interpreted as follows. In a partitioned set, when the information about the partition is public knowledge, every element ω ∈ Ω carries information only about its class π(ω). In the extreme case, when there is only one class in the partition, i.e., the set Ω itself, we get the same concept as in Definition 8.1.1.

Example 8.1.2 Let Ω = Z₂⁴ be the four-dimensional vector space over Z₂. Let

Ω = Ω₀ ∪ Ω₁ ∪ Ω₂ ∪ Ω₃ ∪ Ω₄,

where Ωᵢ is the set of vectors of Hamming weight i, be a partition of Ω. Let u = 1111, v = 0010, w = 0101.¹ Then, writing C(n, k) for the binomial coefficient,

I(u) = log |Ω₄| = log C(4, 4) = log 1 = 0 bits,

I(v) = log |Ω₁| = log C(4, 1) = log 4 = 2 bits,

I(w) = log |Ω₂| = log C(4, 2) = log 6 ≈ 2.6 bits.

Example 8.1.3 Let Ω = Z₂ⁿ and

Ω = Ω₀ ∪ Ω₁ ∪ · · · ∪ Ω_n

be its partition into the classes Ωᵢ of vectors of Hamming weight i. Let z ∈ Z₂ⁿ have weight d. Since |Ω_d| = C(n, d),

I(z) = log |Ω_d| = log C(n, d).

¹ In this chapter we will identify vectors from Z₂ⁿ and words of length n in the binary alphabet.


If d is small, then

I(z) = log C(n, d) = log [n(n − 1) · · · (n − d + 1)/d!] < log nᵈ = d log n,

which is much smaller than n. If d is close to n, the information will be small too. It will be maximal for d = n/2, in which case, due to the asymptotic formula

C(n, n/2) ∼ √(2/(πn)) · 2ⁿ,  (8.5)

which can easily be obtained from Stirling's formula (2.2), we have

I(z) = log C(n, n/2) ∼ n − (1/2) log n + (1/2)(1 − log π) ∼ n.
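The asymptotics (8.5) is easy to check numerically; a quick Python comparison (assuming n is even, so that n/2 is an integer):

```python
from math import comb, log2, pi

n = 1000
exact = log2(comb(n, n // 2))                        # I(z) for weight n/2
approx = n - 0.5 * log2(n) + 0.5 * (1 - log2(pi))    # from (8.5)
print(exact, approx)    # both are about 994.7, so I(z)/n is close to 1
```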

Proposition 8.1.1 For the partition (8.3),

Σ_{ω∈Ω} 2^{−(I(ω)+log n)} = 1.  (8.6)

Proof If ω ∈ Ωᵢ, then I(ω) = log |Ωᵢ|. Thus

Σ_{ω∈Ω} 2^{−(I(ω)+log n)} = Σ_{i=1}^n |Ωᵢ| 2^{−log |Ωᵢ|−log n} = Σ_{i=1}^n |Ωᵢ|/(|Ωᵢ| n) = Σ_{i=1}^n 1/n = 1. □

We shall see soon what equation (8.6) means.
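Identity (8.6) is easy to verify numerically for the weight partition of Z₂⁴ from Example 8.1.2 (note that the n of (8.6) is the number of classes, here 5):

```python
from math import comb, log2

# Each ω in a class of size s contributes 2^{-(log s + log 5)} = 1/(5s),
# so the 5 weight classes of Z_2^4 contribute 1/5 each.
classes = [comb(4, i) for i in range(5)]          # sizes 1, 4, 6, 4, 1
total = sum(s * 2 ** (-(log2(s) + log2(len(classes)))) for s in classes)
print(total)    # 1.0 up to floating-point rounding
```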

Exercises

1. How many bits of information does one need to specify one letter of the English

alphabet?

2. In a magic trick, there are three participants: the magician, an assistant, and a

volunteer. The assistant, who claims to have paranormal abilities, is in a sound-

proof room. The magician gives the volunteer six blank cards, ﬁve white and one

blue. The volunteer writes a different integer from 1 to 100 on each card, as the

magician is watching. The volunteer keeps the blue card. The magician arranges

the ﬁve white cards in some order and passes them to the assistant. The assistant

then announces the number on the blue card. How does the trick work?


Let X be a ﬁnite set (alphabet) and X n be the set of all possible words of length n in

this alphabet. We stress that in Xⁿ we collect all words regardless of whether they are meaningful or not. For example, if X is the English alphabet, then yyyxza is also considered as a word belonging to X⁶. Let also W(X) be the set of all words in this

alphabet, i.e.,

W(X) = X¹ ∪ X² ∪ · · · ∪ Xⁿ ∪ · · ·

Definition 8.1.3 By a non-uniform (compression) code we understand a mapping

ψ : Xⁿ → W(Z₂).  (8.7)

This means that every word w from X n is encoded into a binary codeword ψ(w).

Note that the length of w is strictly n while the length of ψ(w) can be arbitrary. The

code of a message M, which is a word from W (X ), will be obtained as follows. We

divide M into segments of length n and the tail which is of length at most n (but

by agreement it can also be viewed as of length n; for example, for English words

we may add as many letters ‘z’ at the end of the message as is needed). Then M is

represented as M = w₁w₂ . . . w_s . . ., where wᵢ ∈ Xⁿ, and we define

ψ(w₁)ψ(w₂) . . . ψ(w_s) . . .  (8.8)

to be the encoding for M. What we should take care of is that the message (8.8) can

be uniquely decoded and that this decoding is as easy as possible. This is non-trivial

since the words ψ(w1 ), . . . ψ(ws ) . . . may have different lengths and we may not

know, for example, where ψ(w1 ) ends and where ψ(w2 ) starts. We will now introduce

a class of codes for which such decoding is possible.

Deﬁnition 8.1.4 A non-uniform code (8.7) is said to be a preﬁx code if for every two

words w1 , w2 ∈ X n neither of the two codewords ψ(w1 ), ψ(w2 ) is the beginning of

the other.

If our code is a preﬁx one, then we can decode (8.8) uniquely. Indeed, there will be

only one codeword which is the beginning of (8.8) and that will be ψ(w1 ). Similarly

we decode the rest of the message.

Example 8.1.4 Let X = {a, b, c} and ψ(a) = 1, ψ(b) = 01, ψ(c) = 00. This is a prefix code and the message 0001101100 can be uniquely decoded as ψ(c)ψ(b)ψ(a)ψ(b)ψ(a)ψ(c), i.e., as cbabac.


Example 8.1.5 Every binary rooted tree gives us a preﬁx code. We assign a 1 to each

edge from a parent to its left child and a 0 to each edge from a parent to its right

child. Then the set of all terminal vertices can be identiﬁed with the set of codewords

of a preﬁx code. Indeed, for any terminal vertex, there is a unique directed path from

the root to it. This path gives a string of 0’s and 1’s which we assign to the terminal

vertex. Since we always ﬁnish at a terminal vertex, no path is a beginning of the other

and therefore no codeword will be a beginning of the other. For example, the tree

below will give us the code {0, 11, 101, 100}.

(The figure shows a binary rooted tree whose edges are labelled 1 (left) and 0 (right); reading the labels along the paths from the root to the four terminal vertices gives the codewords 0, 11, 101 and 100.)

Theorem 8.1.1 (Kraft's inequality) A prefix code ψ : Xⁿ → W(Z₂) with the lengths of codewords m₁, m₂, . . . , m_q exists if and only if

Σ_{i=1}^q 2^{−mᵢ} ≤ 1.  (8.9)

Proof We will assume that m = max(m 1 , . . . , m q ), which means that the longest

codeword has length m. Suppose that a preﬁx code possesses a codeword u of length

i. Then the 21 = 2 words u0 and u1 cannot be codewords. The 22 = 4 words u00,

u01, u10 and u11 also cannot be codewords. In general all 2k−i words of length k

obtained by extending u to the right cannot be codewords. If v is another codeword

of length j then it excludes another 2k− j words of length k from being codewords.

The codewords u and v cannot exclude the same word, otherwise one of them will

be the beginning of the other.

Let us denote by S_j the number of codewords of length j. Then, as we just noticed,

S₁ · 2^{k−1} + S₂ · 2^{k−2} + · · · + S_{k−1} · 2

words of length k cannot be codewords. This number plus S_k, which is the number of codewords of length k, should be less than or equal to 2ᵏ, which is the total number of words of length k. The existence of a prefix code with the given lengths of codewords implies that the following inequality holds for any k = 1, . . . , m:

S₁ · 2^{k−1} + S₂ · 2^{k−2} + · · · + S_{k−1} · 2 + S_k ≤ 2ᵏ.  (8.10)


Thus, all these inequalities are necessary conditions for the existence of such a preﬁx

code. But the inequality for k = m is the strongest because it implies all the rest.

Indeed, dropping the term S_k from (8.10) and dividing by 2, we get

S₁ · 2^{k−2} + S₂ · 2^{k−3} + · · · + S_{k−1} ≤ 2^{k−1},

i.e., the same inequality for k − 1. Thus, indeed, the inequality for k = m implies all other inequalities.

Taking this strongest inequality (8.10) with k = m and dividing it by 2ᵐ, we get

Σ_{j=1}^m S_j · 2^{−j} ≤ 1.  (8.11)

Note that

Σ_{j=1}^m S_j · 2^{−j} = Σ_{i=1}^q 2^{−mᵢ}.

Hence the inequality (8.9) is a necessary condition for the existence of a preﬁx code

with lengths of codewords m 1 , m 2 , . . . , m q .

Let us show that it is also sufﬁcient. Let S j be the number of codewords of length

j and m be the maximal length of codewords. We will again use (8.9) in its equivalent

form (8.11) which implies (8.10) for all k = 1, . . . , m.

Firstly, we take S1 arbitrary words of length 1. Since (8.10) for k = 1 gives

2 − S1 ≥ 0, we have S1 ≤ 2 and we can do this step. Suppose that we have done

k − 1 steps already and have chosen Sᵢ words of length i for i = 1, . . . , k − 1 so that no word is the beginning of another. Then the chosen words will prohibit us

from choosing

S₁ · 2^{k−1} + S₂ · 2^{k−2} + · · · + S_{k−1} · 2

words of length k. By (8.10),

2ᵏ − (S₁ · 2^{k−1} + S₂ · 2^{k−2} + · · · + S_{k−1} · 2) ≥ S_k,

hence we can ﬁnd Sk words of length k which are compatible with the words previ-

ously chosen. This argument shows that the construction of the code can be completed

to the end. �


Note that

1/2 + 1/2² + 1/2³ + 1/2³ = 1.  (8.12)

If X = {a, b, c, d}, then according to Theorem 8.1.1 there exists a preﬁx code

ψ : X → W (Z2 ) with the lengths of the codewords 1, 2, 3, 3. Let us choose the

codeword ψ(a) = 0 of length 1, then we cannot use the words 00 and 01 for the

choice of the codeword for b of length 2 and we choose ψ(b) = 10. For the choice

of codewords for c and d we cannot choose the words 000, 001, 010, 011 (because

of the choice of ψ(a)) and the words 100, 101 (because of the choice of ψ(b)), thus

we choose the two remaining words of length 3, i.e., ψ(c) = 110 and ψ(d) = 111.
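The sufficiency part of the proof of Kraft's inequality is constructive, and the greedy procedure just carried out by hand can be sketched in Python (the function name `kraft_code` is ours):

```python
from fractions import Fraction

def kraft_code(lengths):
    """Given lengths satisfying Kraft's inequality (8.9), build a prefix
    code greedily: choose codewords in order of increasing length,
    skipping any word that extends an already chosen codeword."""
    assert sum(Fraction(1, 2 ** m) for m in lengths) <= 1, "Kraft fails"
    chosen = []
    for m in sorted(lengths):
        k = 0
        while True:
            w = format(k, "b").zfill(m)
            if not any(w.startswith(c) for c in chosen):
                chosen.append(w)
                break
            k += 1
    return chosen

print(kraft_code([1, 2, 3, 3]))    # ['0', '10', '110', '111']
```

For the lengths 1, 2, 3, 3 it reproduces the codewords 0, 10, 110, 111 constructed above.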

Suppose now that X = {a, b}. Then |X²| = 4 and we can use (8.12) again for this situation to define a code ψ : X² → W(Z₂) as follows:

ψ(ab) = 0, ψ(ba) = 10, ψ(aa) = 110, ψ(bb) = 111.

The words abba and baabab will be encoded as 010 and 1000, respectively. The

word 11111001000 can be represented as

11111001000 = ψ(bb)ψ(aa)ψ(ab)ψ(ba)ψ(ab)ψ(ab),

i.e., decoded as bbaaabbaabab.

Theorem 8.1.2 Let

Ω = Ω₁ ∪ Ω₂ ∪ · · · ∪ Ω_n

be a partition of a finite set Ω into n disjoint classes. Then there exists a prefix code ψ : Ω → W(Z₂) such that for any ω ∈ Ω the length of the codeword ψ(ω) is l(ω) = ⌈I(ω) + log n⌉, where I(ω) is the information of ω relative to the given partition.

Proof By Proposition 8.1.1,

Σ_{ω∈Ω} 2^{−⌈I(ω)+log n⌉} ≤ Σ_{ω∈Ω} 2^{−(I(ω)+log n)} = 1,

so the required prefix code exists by Theorem 8.1.1. □

The existence of the code is not everything. Another important issue is its fast

decodability.

Exercises

1. Check that the set {11, 10, 00, 011, 010} is a set of codewords of a preﬁx code

and construct the corresponding tree.


2. Given that

1/2 + 1/2³ + 1/2³ + 1/2⁴ + 1/2⁴ + 1/2⁴ + 1/2⁴ = 1,

the existence of which prefix code can we deduce from Kraft's inequality?

3. Let X be an alphabet consisting of 9 elements. Construct a preﬁx binary code

ψ : X → W(Z₂) with the lengths of the codewords 2, 3, 3, 3, 3, 3, 4, 5, 5 in the following steps:

(a) Use Kraft’s inequality to prove that such a code exists.

(b) Construct any tree that corresponds to such a code.

(c) List the codewords corresponding to this tree.

8.2 Fitingof's Compression Code

8.2.1 Encoding

We need to compress ﬁles when we are short of memory and want to use it effectively.

Since computer ﬁles are already written as strings of binary digits, in this section we

will consider the code ψ : Zn2 → W (Z2 ) which encodes binary sequences of ﬁxed

length n into binary sequences of variable length. The idea of Fittingof’s compression

is expressed in Example 8.1.3, where it was shown that the information of a vector

from Zn2 of small (or large) Hamming weight is relatively small compared to n.

Therefore if we encode words in such a way that the length of a codeword ψ(x)

will be approximately equal to the information of x, then words of small and large

Hamming weights will be signiﬁcantly compressed. This, for example, often works

well with photographs.

In this section we will order all binary words of the same length using lexico-

graphic order. This order depends on an order on our binary symbols and we will

assume that zero precedes one (denoted 0 ≺ 1).

Definition 8.2.1 Let y = y₁y₂ . . . y_n and z = z₁z₂ . . . z_n be two binary words of the same length. We say that y is lexicographically earlier than z, and write y ≺ z, if for some k ≥ 0

y₁ = z₁, . . . , y_k = z_k and y_{k+1} ≺ z_{k+1}

(that is, the two words first differ in position k + 1, where y has a 0 and z has a 1).

This order is called lexicographic since it is used in dictionaries to list words.

For example, in the Oxford English Dictionary the word “abash” precedes the word

“abate” because the ﬁrst three letters of these words coincide but the fourth letter “s”

of “abash” precedes, in the English alphabet, the fourth letter “t” of “abate”.


For example, all 15 binary words of length 6 and weight 4 will be listed in lexicographic order as shown:

001111 ≺ 010111 ≺ 011011 ≺ 011101 ≺ 011110 ≺ 100111 ≺ 101011 ≺ 101101 ≺ 101110 ≺ 110011 ≺ 110101 ≺ 110110 ≺ 111001 ≺ 111010 ≺ 111100.  (8.13)

We can refer to these words by just quoting their ordinal numbers. We adopt the

agreement that the first word has ordinal number zero. Thus the ordinal number of a word x is the number of words that are earlier than x. In particular, the ordinal number of

101011 is 6.

Let x be a binary word of length n and Hamming weight d, and let X_d be the set of all such words. If X_d is ordered lexicographically, then the ordinal number N(x) of x in X_d can be calculated as

N(x) = C(n − n_d, 1) + · · · + C(n − n₂, d − 1) + C(n − n₁, d),  (8.14)

where the 1's in x occupy the positions n₁ < n₂ < · · · < n_d (counting from the left).

Proof Firstly, we count all the words of weight d whose n₁ − 1 leftmost symbols coincide with those of x, i.e., are all zeros, and whose position n₁ is also occupied by a zero (this condition secures that all such words are lexicographically earlier than x). Since we have to distribute d ones among the n − n₁ remaining positions, there will be C(n − n₁, d) such words. Secondly, we have to count all the words whose first n₂ − 1 symbols coincide with those of x and which have a zero in the position n₂. There are C(n − n₂, d − 1) such words, as we have to distribute d − 1 ones among n − n₂ places. Finally, we have to count all words whose first n_d − 1 symbols coincide with those of x and which have a zero in the position n_d. There will be C(n − n_d, 1) such words. All the words that are lexicographically earlier than x are now counted. As the ordinal number of x is equal to the number of words which lexicographically precede x, this proves (8.14). □

For the word x = 101011 we have n₁ = 1, n₂ = 3, n₃ = 5, n₄ = 6 and d = 4. So

N(x) = C(0, 1) + C(1, 2) + C(3, 3) + C(5, 4) = 0 + 0 + 1 + 5 = 6,

in accordance with the position of 101011 in the list (8.13).
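Formula (8.14) translates directly into code; a small Python sketch (the function name `ordinal` is ours):

```python
from math import comb

def ordinal(x):
    """Ordinal number (8.14) of the binary word x in the
    lexicographically ordered set X_d of words of its length and weight."""
    n = len(x)
    ones = [i + 1 for i, b in enumerate(x) if b == "1"]   # n_1 < ... < n_d
    d = len(ones)
    # terms C(n - n_d, 1), C(n - n_{d-1}, 2), ..., C(n - n_1, d)
    return sum(comb(n - ones[d - 1 - j], j + 1) for j in range(d))

print(ordinal("101011"))    # 6
```

For 101011 it returns 6, matching the computation above.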

The idea of this code is to characterise any word x from X = Z₂ⁿ by two parameters, namely, its Hamming weight d and the ordinal number N(x) of x in X_d. We partition X = Z₂ⁿ into n + 1 disjoint classes

X = X₀ ∪ X₁ ∪ · · · ∪ X_n,

where X_d consists of all words of weight d.

8.2 Fitingof’s Compression Code 223

The codeword ψ(x) for x ∈ X_d (i.e., for a word x of weight d) will consist of two parts: ψ(x) = μ(x)ν(x), where μ(x) is the prefix of fixed length ⌈log(n + 1)⌉, which is the binary code for d, and ν(x) is the binary code of the ordinal number N(x) of x in the class X_d, consisting of ⌈log |X_d|⌉ = ⌈log C(n, d)⌉ binary symbols. Both parameters together characterise x uniquely. In total the length of the codeword ψ(x) = μ(x)ν(x) will be

l(ψ(x)) = ⌈log(n + 1)⌉ + ⌈log C(n, d)⌉.

Since ⌈log(n + 1)⌉ = O(log n), the length of ψ(x) is

l(ψ(x)) = I(x) + o(n), where o(n)/n → 0,

i.e., asymptotically equal to its information relative to the given partition.

We now state the main theorem of this chapter.

Theorem 8.2.1 (Fitingof) There exists a preﬁx code ψ : Zn2 → W (Z2 ) for which the

length of the codeword ψ(x) is asymptotically equal to the information of the word

x and for which there exists a decoding procedure of polynomial complexity.

Proof We have shown already that the length of the codeword ψ(x) is asymptotically

equal to the information of the word x. Let us prove that Fitingof’s code is a preﬁx

one. Suppose ψ(x1 ) = μ(x1 )ν(x1 ) is a beginning of ψ(x2 ) = μ(x2 )ν(x2 ). We know

that the length of μ(x1 ) is the same as the length of μ(x2 ), hence μ(x1 ) = μ(x2 ) and

hence x1 and x2 has the same weight. But then the length of ν(x1 ) is the same as the

length of ν(x2 ) and hence ψ(x1 ) and ψ(x2 ) have the same length. However, in such

a case one cannot be a beginning of another without being equal.

The proof will be continued in the next section devoted to the decoding algorithm.

�

As an example, let ψ : Z₂³¹ → W(Z₂) be Fitingof's code. For the vector

x = 0000000100000101000100000000000

we will have μ(x) = 00100 because wt(x) = 4 = 100(2) and the prefix must be of length 5 to accommodate all possible weights in the range from 0 to 31. The length of the suffix ν(x) will be ⌈log C(31, 4)⌉ = 15. Further, we will have n₁ = 8, n₂ = 14, n₃ = 16, n₄ = 20 and

N(x) = C(11, 1) + C(15, 2) + C(17, 3) + C(23, 4) = 9651 = 10010110110011(2),


so that ν(x) = 010010110110011, and the codeword ψ(x) = μ(x)ν(x) = 00100010010110110011 has length 20.
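The whole encoding ψ(x) = μ(x)ν(x) can be assembled the same way; a Python sketch reproducing the 31-bit computation above (the function name `fitingof_encode` is ours, and the guard for weights 0 and n is our own edge-case handling):

```python
from math import ceil, comb, log2

def fitingof_encode(x):
    """Fitingof codeword ψ(x) = μ(x)ν(x) for a binary word x of length n:
    μ(x) gives the weight d on ceil(log2(n + 1)) bits and ν(x) gives the
    ordinal number of x in X_d on ceil(log2(C(n, d))) bits."""
    n, d = len(x), x.count("1")
    ones = [i + 1 for i, b in enumerate(x) if b == "1"]          # n_1 < ... < n_d
    N = sum(comb(n - ones[d - 1 - j], j + 1) for j in range(d))  # formula (8.14)
    mu = format(d, "b").zfill(ceil(log2(n + 1)))
    nu_len = ceil(log2(comb(n, d))) if 0 < d < n else 0          # |X_d| = 1 needs 0 bits
    nu = format(N, "b").zfill(nu_len) if nu_len > 0 else ""
    return mu + nu

x = "0000000100000101000100000000000"        # the 31-bit word above
print(fitingof_encode(x))                    # 00100010010110110011
```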

Exercises

1. Put the following three words of Z₂⁷ in increasing lexicographic order:

2. How many vectors of Hamming weight at least 4 and at most 5 are there in Z₂¹⁰?

3. Calculate the ordinal number of the word w = 0011011 in X₄ ⊂ Z₂⁷.

4. Let ψ : Z₂¹⁵ → W(Z₂) be Fitingof's code.

(a) How long is the preﬁx which shows the Hamming weight of the word?

(b) Given x = 000010100000100, how long must be the sufﬁx of the codeword

ψ(x)?

(c) Encode x, i.e., ﬁnd ψ(x).

8.2.2 Decoding

To decode a message we have to decode the codewords one by one starting from the first. Suppose the first codeword is ψ(x). First, we separate its prefix μ(x) (because it is of fixed known length ⌈log(n + 1)⌉) and reconstruct d = wt(x). Then, knowing d, we calculate the length of ν(x), which is ⌈log C(n, d)⌉. Then, looking at ν(x) and knowing that it represents the ordinal number N(x) of x in X_d, we reconstruct N = N(x).

Then we are left with the equation

C(x_d, 1) + · · · + C(x₂, d − 1) + C(x₁, d) = N  (8.15)

to solve for xd < · · · < x2 < x1 , where xi = n − n i . This can be done in a fast and

elegant way using the properties of Pascal’s triangle, part of which is shown below:

1

1 1

1 2 1

1 3 3 1

1 4 6 4 1

1 5 10 10 5 1

The nth row of this triangle contains the binomial coefficients C(n, m), m = 0, 1, . . . , n, where m increases from left to right. These binomial coefficients are defined inductively by the formula

C(n, j) = C(n − 1, j) + C(n − 1, j − 1)  (8.16)


and the boundary conditions C(0, 0) = 1 and C(0, m) = 0 for all 0 ≠ m ∈ Z. We also know the explicit formula

C(n, m) = n!/(m!(n − m)!),

The solution of (8.15) will be based on the formula

C(n − d, 0) + C(n − d + 1, 1) + · · · + C(n − 1, d − 1) + C(n, d) = C(n + 1, d).  (8.17)

We prove it by induction on d. For d = 1 it becomes

1 + C(n, 1) = C(n + 1, 1), or 1 + n = n + 1,

which is true. Let us assume that (8.17) is true for d = k − 1. Then by the induction

hypothesis, applied to the ﬁrst k − 1 summands of the left-hand side of (8.17), and

using (8.16), we get

C(n − k, 0) + C(n − k + 1, 1) + · · · + C(n − 1, k − 1) + C(n, k)
= [C((n−1) − (k−1), 0) + C((n−1) − (k−1) + 1, 1) + · · · + C(n − 1, k − 1)] + C(n, k)
= C(n, k − 1) + C(n, k) = C(n + 1, k),

proving (8.17).

Proposition 8.2.1 Suppose the Eq. (8.15) is satisfied for some x₁, . . . , x_d such that x_d < x_{d−1} < · · · < x₁. Then x₁ can be found as the largest integer satisfying the inequality

C(x₁, d) ≤ N.  (8.18)

Proof Let m be the largest integer satisfying C(m, d) ≤ N, and suppose that x₁ < m. Then, since x_d < x_{d−1} < · · · < x₁, by (8.17)

C(x_d, 1) + · · · + C(x₂, d − 1) + C(x₁, d)
≤ C(x₁ − d + 1, 1) + · · · + C(x₁ − 1, d − 1) + C(x₁, d) = C(x₁ + 1, d) − 1 < C(m, d) ≤ N,

which contradicts (8.15). On the other hand, C(x₁, d) ≤ N by (8.15), so x₁ ≤ m. Hence x₁ = m. □

This gives a fast decoding algorithm. Indeed, we find x₁ directly applying Proposition 8.2.1. Then we move the term C(x₁, d) to the right-hand side, obtaining

C(x_d, 1) + · · · + C(x₂, d − 1) = N − C(x₁, d),

and find x₂ applying Proposition 8.2.1 to this equation, and so on.

For example, solving

C(x₄, 1) + C(x₃, 2) + C(x₂, 3) + C(x₁, 4) = 30,

we find successively: C(x₁, 4) = 15 and x₁ = 6, C(x₂, 3) = 10 and x₂ = 5, C(x₃, 2) = 3 and x₃ = 3, C(x₄, 1) = 2 and x₄ = 2.

If we needed, for example, to find the word x which has ordinal number 30 in X₄ ⊂ Z₂⁷, then according to the equation (8.14)

C(7 − n₄, 1) + C(7 − n₃, 2) + C(7 − n₂, 3) + C(7 − n₁, 4) = 30,

and the solution above gives n₁ = 1, n₂ = 2, n₃ = 4, n₄ = 5, whence x = 1101100.
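The decoding steps above — repeatedly finding the largest x with C(x, j) ≤ N, as in Proposition 8.2.1 — can be sketched as follows (the function name `word_from_ordinal` is ours):

```python
from math import comb

def word_from_ordinal(N, n, d):
    """Recover the word of length n and weight d whose ordinal number
    in X_d is N, by solving (8.15) greedily."""
    xs = []
    for j in range(d, 0, -1):       # find x_1 (j = d), then x_2, ...
        x = j - 1                   # C(j - 1, j) = 0, so always <= N
        while comb(x + 1, j) <= N:
            x += 1                  # x = largest value with C(x, j) <= N
        xs.append(x)
        N -= comb(x, j)
    ones = {n - x for x in xs}      # positions n_i = n - x_i of the 1's
    return "".join("1" if i in ones else "0" for i in range(1, n + 1))

print(word_from_ordinal(30, 7, 4))   # 1101100
```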

Exercise

1. Let ψ : Z₂¹⁵ → W(Z₂) be Fitingof's compression code. Decode ψ(y) = 00100011110, i.e., find y.

8.3 Information and Uncertainty

The classical approach to measuring information in a word is based on the assumption that this word was generated by a random source. This is, of course, not always a realistic assumption, so our approach here is more general. We will show, however, that in the case of a random source the two approaches are asymptotically equivalent, i.e., when n gets large. Let us consider a random source which sends signal “1” with probability p and signal “0” with probability 1 − p.

Then the measure of uncertainty about what the next signal will be is given by the binary entropy function

H(p) = −p log p − (1 − p) log(1 − p)

8.3 Information and Uncertainty 227

(logarithms are to the base 2 and it is assumed that 0 · log 0 = 0). The uncertainty

is minimal when p = 0 or p = 1, in which case we essentially don't have any uncertainty and the entropy of such a source is zero. If p = 1/2, then the uncertainty is maximal and the entropy of such a source is equal to 1. We say that one symbol sent

from such a random source contains H ( p) bits of information. Thus we have 1 bit

of information from a symbol from a random source only in the case of probability

1/2. A word of length n contains n H ( p) bits of information.

Given a binary word x of length n consisting of m₁ ones and m₂ zeros, we define

H(x) = −(m₁/n) log(m₁/n) − (m₂/n) log(m₂/n).

Of course, if this word was generated from a random source with probability p, then

m 1 /n → p, when n gets large, and H (x) → H ( p). The following theorem then

shows that the two approaches are equivalent.

Theorem 8.3.1 For a binary word x of length n,

I(x) = n(H(x) + o(1)),

where as usual o(1) → 0 when n → ∞. Moreover, o(1) ∼ (log n)/n.

Proof We will need Stirling's formula (2.2) again. We use it to calculate

I(x) = log C(n, m₁) = log [n!/(m₁! m₂!)]
∼ log [√(2πn) nⁿ e⁻ⁿ / (√(2πm₁) m₁^{m₁} e^{−m₁} · √(2πm₂) m₂^{m₂} e^{−m₂})]
= (1/2) log (n/(2πm₁m₂)) + log [1/((m₁/n)^{m₁} (m₂/n)^{m₂})]
= (1/2) log (n/(2πm₁m₂)) − m₁ log (m₁/n) − m₂ log (m₂/n).

Therefore

I(x)/n = (1/2n) log (n/(2πm₁m₂)) − (m₁/n) log (m₁/n) − (m₂/n) log (m₂/n) = o(1) + H(x),

which proves the theorem. □
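The theorem just proved can be checked numerically; a quick Python comparison of I(x)/n with the empirical entropy H(x) (the helper names are ours):

```python
from math import comb, log2

def H(x):
    """Empirical binary entropy of a binary word x."""
    n = len(x)
    return sum(-(m / n) * log2(m / n)
               for m in (x.count("1"), x.count("0")) if m > 0)

def info(x):
    """I(x): information of x relative to the partition by weight."""
    return log2(comb(len(x), x.count("1")))

x = "0011" * 250                      # n = 1000, five hundred 1's
print(info(x) / len(x), H(x))         # about 0.9947 and 1.0
```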


References

1. Kolmogorov, A.N.: Three approaches to the deﬁnition of the concept “the quantity of informa-

tion”. Probl. Inf. Transm. 1(1), 3–11 (1965)

2. Fitingof, B.M.: Optimal encoding under an unknown or changing statistics. Probl. Inf. Transm.

2(2), 3–11 (1966)

Chapter 9

Appendix A: GAP

GAP is a system for computational algebra. GAP has been, and continues to be, developed by an international cooperation of many people, including user contributions. This package

is free and you can install it onto your computer using the instructions from the

website www.gap-system.org. A reference manual and tutorial can be found

there. There is plenty of information about GAP available online too.

Once you have started GAP, you can start working straight away. If you type a simple

command (for example, ‘quit’) followed by a semi-colon, GAP will evaluate your

command immediately. If you press enter without entering a semi-colon, GAP will

simply give you a new line to continue entering more input. This is useful if you want

to write a more complicated command, perhaps a simple program. If you wanted your

simple command to be evaluated, then simply enter a semi-colon on the new line and press enter again; since GAP ignores whitespace, this will work just the same as if you had entered the semi-colon in the first place. A double semi-colon executes the command but suppresses the output. A semi-colon will not always cause GAP to evaluate straight away: GAP is able to work out whether you have finished

a complete set of instructions or are part of the way through entering a program.

Another way to interact with GAP, which is particularly useful for things you

want to do more than once, is to prepare a collection of commands and programs in a

text file. Then you can type the command Read("MyGAPprog.txt"); and GAP will

evaluate all of the instructions in your text ﬁle. If your ﬁle is not in the same place

that GAP was launched from, you will have to provide its relative path (for example,

“../../GAPprogs/Example1.txt”).



You can declare a variable in GAP using the ‘:=’ operator. For example, if you

want a variable n to equal 2000, you would enter n := 2000;, or if you want n

to be the product of p and q you would enter n := p*q;. You can also declare

lists using the ‘:=’ operator, for example, zeros := [0,0,0];. The command

list:=[m..n]; deﬁnes the list of integers m, m + 1, m + 2, . . . , n. A list may

have several identical numbers in it. Lists have a length given by the command

Length(listName);, and their entries can be referenced individually by typing

listName[index]; (indices start from 1!). In GAP a list of primes ≤ 1000 is

stored. It is called ‘Primes’. This is very useful.

gap> Primes;

[ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,

79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167,

173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263,

269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347,349, 353, 359, 367,

373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463,

467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587,

593, 599, 601, 607, 613, 617, 619, 631, 641,643, 647, 653, 659, 661, 673, 677, 683,

691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811,

821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929,

937, 941, 947, 953, 967, 971, 977, 983, 991, 997 ]

The command

gap> Length(Primes);

168

gives us the number of primes in this list. We can ﬁnd the prime in 100th position

and the position of 953 in this list as follows:

gap> Primes[100];

541

gap> Position(Primes,953);

162

Sets cannot contain multiple occurrences of elements and the order of elements

does not matter. Basically GAP views sets as ordered lists without repetitions. The

command Set(list); converts a list into a set.

gap> list:=[2,5,8,3,5];

[ 2, 5, 8, 3, 5 ]

gap> Add(list,2);

gap> list;

[ 2, 5, 8, 3, 5, 2 ]

gap> set:=Set(list);

[ 2, 3, 5, 8 ]

gap> RemoveSet(set,2);

gap> set;

[ 3, 5, 8 ]

For loops and while loops exist in GAP. Both have the same format:

for (while) [condition] do [statements] od;

9.1 Computing with GAP 231

For example, the following for loop squares all of the entries in the list ‘boringList’,

and places them in the same position in the list ‘squaredList’:

gap> boringList:=[2..13];

[ 2 .. 13 ]

gap> squaredList:=[1..Length(boringList)];

[ 1 .. 12 ]

gap> for i in [1..Length(boringList)] do

> squaredList[i]:=boringList[i]ˆ2;

> od;

gap> squaredList;

[ 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169 ]

Here is an example of using a while loop. We want to square the ﬁrst ﬁve numbers

of the boringList.

gap> boringList:=[2..13];;

gap> i:=1;;

gap> while i<6 do

> boringList[i]:=boringList[i]ˆ2;

> i:=i+1;

> od;

gap> boringList;

[ 4, 9, 16, 25, 36, 7, 8, 9, 10, 11, 12, 13 ]

Lists may contain other lists. Analyse the following program that lists all pairs of

twin primes not exceeding 1000. It also illustrates the use of the ‘if-then’ command.

if [condition] then [statements] fi;

Here it is:

gap> twinpairs:=[];

[ ]

gap> numbers:=[1..Length(Primes)-1];

[ 1 .. 167 ]

gap> for i in numbers do

> if Primes[i]=Primes[i+1]-2 then

> Add(twinpairs,[Primes[i],Primes[i+1]]);

> fi;

> od;

gap> twinpairs;

[ [ 3, 5 ], [ 5, 7 ], [ 11, 13 ], [ 17, 19 ], [ 29, 31 ], [ 41, 43 ],

[ 59, 61 ], [ 71, 73 ], [ 101, 103 ], [ 107, 109 ], [ 137, 139 ],

[ 149, 151 ], [ 179, 181 ], [ 191, 193 ], [ 197, 199 ], [ 227, 229 ],

[ 239, 241 ], [ 269, 271 ], [ 281, 283 ], [ 311, 313 ], [ 347, 349 ],

[ 419, 421 ], [ 431, 433 ], [ 461, 463 ], [ 521, 523 ], [ 569, 571 ],

[ 599, 601 ], [ 617, 619 ], [ 641, 643 ], [ 659, 661 ], [ 809, 811 ],

[ 821, 823 ], [ 827, 829 ], [ 857, 859 ], [ 881, 883 ] ]

we have already encountered in the previous section.


The command FactorsInt(n); outputs the prime factorisation of n or, more precisely, the primes that enter this prime

factorisation with their multiplicity. The command PrintFactorsInt(n);

gives a nicer view of this prime factorisation but you cannot use the output

as a list, which you can do with the output of FactorsInt. The command

DivisorsInt(n); can be used to ﬁnd all of the divisors of n. The com-

mand PrimeDivisors(n); finds the set of distinct prime divisors of n. For

example,

gap> FactorsInt(571428568);

[ 2, 2, 2, 71428571 ]

gap> PrintFactorsInt(571428568);

2ˆ3*71428571

gap> DivisorsInt(571428568);

[ 1, 2, 4, 8, 71428571, 142857142, 285714284, 571428568 ]

gap> PrimeDivisors(571428568);

[ 2, 71428571 ]

NextPrimeInt(n); gives the smallest prime number that is strictly greater than

n. The action of the command PrevPrimeInt(n); is similar. For example,

gap> IsPrime(571428568);

false

gap> NextPrimeInt(571428568);

571428569

gap> PrevPrimeInt(571428568);

571428527

The list of primes ‘Primes’ contains only the 168 primes that are smaller than 1000.

Using the commands that we have just introduced we can, for example, create a list

of the ﬁrst 5000 primes:

gap> biggerPrimes := [];

[ ]

gap> counter := 1;

1

gap> currentPrime := 2;

2

gap> while counter < 5000 do;

> biggerPrimes[counter] := currentPrime;

> counter := counter + 1;

> currentPrime := NextPrimeInt(currentPrime);

> od;

The remainder and quotient of n divided by m are given by the commands RemInt

(n,m); and QuoInt(n,m);, respectively. For example,

gap> RemInt(9786354,383);

321

gap> QuoInt(9786354,383);

25551

gap> 9786354 mod 383;

321

9.2 Number Theory 233

gap> GcdInt(123456789,987654321);

9

To ﬁnd m, n such that ma + nb = gcd(a, b), use the GAP command Gcdex(a,b);.

For example,

Gcdex(108,801);

returns

rec( gcd := 9, coeff1 := -37, coeff2 := 5, coeff3 := 89, coeff4 := -12 )

where m = coeff1 and n = coeff2 (coeff3 and coeff4 give the complementary relation coeff3 · a + coeff4 · b = 0). Another

example,

gap> Gcdex(123456789,987654321);

rec( gcd := 9, coeff1 := -8, coeff2 := 1, coeff3 := 109739369,

coeff4 := -13717421 )
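Gcdex is the extended Euclidean algorithm. For readers who want to see the mechanics outside GAP, here is a Python sketch (the name gcdex and its tuple return value are our own choices, not GAP's record):

```python
# Extended Euclidean algorithm: returns (g, m, n) with m*a + n*b == g == gcd(a, b),
# i.e. GAP's gcd, coeff1 and coeff2 from Gcdex.
def gcdex(a, b):
    old_r, r = a, b
    old_s, s = 1, 0   # Bezout coefficient for a
    old_t, t = 0, 1   # Bezout coefficient for b
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t
```

For 108 and 801 this reproduces gcd 9 with coefficients −37 and 5, as in the GAP record above.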

To ﬁnd the least common multiple of m and n, use the GAP command LcmInt(m,

n);. For example,

gap> LcmInt(123456789,987654321);

13548070123626141

The Euler totient function φ(n) is given by the command Phi(n);. For example,

gap> Phi(2ˆ15-1); Phi(2ˆ17-1);

27000

131070

The Chinese remainder theorem states the existence of the minimal solution N ≥ 0 of N ≡ a_1 (mod n_1), N ≡ a_2 (mod n_2), . . . , N ≡ a_k (mod n_k). The command for finding this solution is ChineseRem([n1, n2, ..., nk], [a1, a2, ..., ak]);. For example:

gap> ChineseRem([5,7],[1,2]);

16
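As a cross-check, the defining property of ChineseRem can be tested by brute force in Python (a naive search, fine only for tiny moduli; the function name is ours):

```python
from math import prod

# Smallest N >= 0 with N congruent to a_i mod n_i for every i;
# moduli assumed pairwise coprime, so a solution exists below prod(mods).
def chinese_rem(mods, rems):
    M = prod(mods)
    for N in range(M):
        if all(N % n == a % n for n, a in zip(mods, rems)):
            return N
    return None
```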

GAP does not provide automatic conversion between bases. One way of doing base

conversion is to use the p-adic numbers package, feel free to investigate this on

your own. Another way is to write simple programs. For example, 120789 can be

converted to binary as follows:

gap> n := 120789;

120789

gap> base := 2;

2

gap> rems := [];

[ ]

gap> pos := 1;

1

gap> while n > 0 do;

> rems[pos] := RemInt(n,base);

> n := QuoInt(n,base);

> pos := pos + 1;

> od;

gap> n;

0

gap> rems;

[ 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1 ]


That is, 120789 is 11101011111010101 in binary. If you are not sure why the list

rems is read in the reverse order, you need to study the base conversion algorithm

in Chap. 1. As for converting from another base into decimal, you should now be

able to do it yourself. Write a simple program to convert 100011100001111100000

from binary to decimal.
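Both directions of base conversion can be sketched in Python along the same lines as the GAP loop above (to_base collects remainders and reverses them; from_base is Horner's scheme, the direction the exercise asks for):

```python
# Convert n to a list of digits in the given base, most significant digit first.
def to_base(n, base):
    digits = []
    while n > 0:
        digits.append(n % base)   # remainders come out least significant first
        n //= base
    return digits[::-1]

# The converse direction: fold the digits back into an integer (Horner's scheme).
def from_base(digits, base):
    n = 0
    for d in digits:
        n = n * base + d
    return n
```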

The commands RootInt(n,k); and LogInt(n,b); can be used to determine, respectively, the integer parts of the kth (positive real) root of n and of the logarithm of n to the base b, that is, ⌊n^(1/k)⌋ and ⌊log_b(n)⌋. These should be used instead of computing roots and logarithms as GAP does not support real numbers.

Despite not supporting real numbers GAP can display a complicated fraction as

a ﬂoating-point real number e.g.,

gap> Float(254638754321/387498765398);

0.657134

The command OrderMod(a,m); returns the multiplicative order of a modulo m. For example,

gap> OrderMod(10,77);

6

The command SmallestRootInt(n); determines the smallest root of the integer n, which is the integer r of smallest absolute value for which a positive integer k exists such that n = r^k. For example, 13^5 = 371293 and this command gives

gap> SmallestRootInt(371293);

13

The command PowerMod(a,k,m); efficiently computes a^k mod m. For example,

gap> PowerMod(987654321,123456789,987654321123456823);

171767037218848697

Computing the same value naively as

987654321ˆ123456789 mod 987654321123456823;

would be a mistake: the latter may take centuries (guess why). The command

QuotientMod(r,s,m); returns the quotient r · s^(−1) of the elements r and s modulo m. In particular, using QuotientMod(1,s,m); to invert s is preferable to computing s^(−1) mod m directly. For example,

gap> QuotientMod(1,123456789,987654321123456823);

743084182864240163

gap> 123456789ˆ-1 mod 987654321123456823;

743084182864240163
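Why is PowerMod so much faster than first computing the power and then reducing? It squares and multiplies modulo m at every step, so intermediate numbers never grow. A Python sketch of the square-and-multiply idea (Python's built-in three-argument pow does the same thing natively):

```python
# Square-and-multiply modular exponentiation: a^k mod m without huge intermediates.
def power_mod(a, k, m):
    result = 1
    a %= m
    while k > 0:
        if k & 1:                  # current binary digit of k is 1
            result = result * a % m
        a = a * a % m              # square for the next binary digit
        k >>= 1
    return result
```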


The command PrimitiveRootMod(m); returns the smallest primitive root modulo m, and the discrete log of a to the base b modulo m is given by LogMod(a,b,m). For example,

gap> PrimitiveRootMod(23);

5

gap> LogMod(11,5,97);

86
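A brute-force version of LogMod in Python shows what is being computed (feasible only for tiny moduli, which is exactly why discrete logarithms are useful in cryptography; the function name is ours):

```python
# Smallest i >= 0 with b^i congruent to a mod m, or None if a is not a power of b.
def log_mod(a, b, m):
    x = 1
    for i in range(m):
        if x == a % m:
            return i
        x = x * b % m
    return None
```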

The command RootMod(m,p); will be needed for elliptic curves; it determines whether or not m is a quadratic residue in Z_p and, if it is, outputs k such that m = k^2 mod p.

gap> q:=[0,0,0,0,0,0,0,0,0,0,0,0];

[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]

gap> for i in [1..12] do

> q[i]:=RootMod(i,13);

> od;

gap> q;

[ 1, fail, 9, 11, fail, fail, fail, fail, 3, 7, fail, 8 ]

In the crypto section we needed to convert messages into numbers. Two small pro-

grams LettertoNumber and NumbertoLetter do the trick. They are not part

of GAP so you have to execute them before converting.

LtoN (an acronym for “Letter to Number”) takes any capital letter, which must

be put between apostrophes, e.g., ‘A’, and returns the corresponding number in the

range [0..25]. Any other argument would return −1, and print out an error message.

LtoN:=function(itamar)

local amith;

if itamar < 'A' or 'Z' < itamar then

Print("Out of range\n");

return -1;

else

amith:=INT_CHAR(itamar)-65;

return amith;

fi;

end;;

NtoL (an acronym of “Number to Letter”) takes any number, positive or negative,

and ﬁnds the corresponding letter. The argument must be an integer.

NtoL:=function(itamar)

local amith;

amith:=CHAR_INT(itamar mod 26+65);

return amith;

end;;

gap> Read("LettertoNumber");

gap> Read("NumbertoLetter");

gap> letters:="ABRACADABRA";


"ABRACADABRA"

gap> numbers:=[1..Length(letters)];

[ 1 .. 11 ]

gap> for i in [1..Length(letters)] do

> numbers[i]:=LtoN(letters[i]);

> od;

gap> numbers;

[ 0, 1, 17, 0, 2, 0, 3, 0, 1, 17, 0 ]

gap> letters2:="ZZZZZZZZZZZ";

"ZZZZZZZZZZZ"

gap> for i in [1..Length(numbers)] do

> letters2[i]:=NtoL(numbers[i]);

> od;

gap> letters2;

"ABRACADABRA"

Encoded with LtoN, different letters get codes with different numbers of digits (and a leading 'A' contributes a leading zero), so the numbers encoding two messages of the same length need not be of the same length. In such situations the following two programs can be used instead.

LtoN1 takes any capital letter, which must be put between apostrophes, e.g., ‘A’,

and returns the corresponding number in the range [11..36]. Any other argument

would return −1, and print out an error message.

LtoN1:=function(itamar)

local amith;

if itamar < 'A' or 'Z' < itamar then

Print("Out of range\n");

return -1;

else

amith:=INT_CHAR(itamar)-65+11;

return amith;

fi;

end;;

NtoL1 takes any two-digit number, positive or negative, and ﬁnds the corre-

sponding letter. The argument must be an integer.

NtoL1:=function(itamar)

local amith;

amith:=CHAR_INT((itamar-11) mod 26+65);

return amith;

end;;

The following program CNtoL1 written by Joel Laity is very convenient for

decryption of messages in RSA. It converts a number with any number of digits into

a message. For example,

gap> n:=1112131415161718192021222324252627282930313233343536;

1112131415161718192021222324252627282930313233343536

gap> CNtoL1(n);

"A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"

# CNtoL1 converts a number with an even number of digits to a sequence of characters.

# The digits are converted to characters two at a time, starting from the last two,

# using the function NtoL1, until the entire number is exhausted. The output is a string of the

# characters with spaces in between.

CNtoL1:=function(joel)

local n, string, temp, i;


if IsInt(joel) then

string:=[];

while joel > 0 do

n:=joel mod 100;

joel:= (joel-n)/100;

Add(string,NtoL1(n));

Add(string,' ');

od;

#reverses the order of the list

for i in [1..QuoInt(Length(string),2)] do

temp:=string[i];

string[i]:=string[Length(string)+1-i];

string[Length(string)+1-i]:=temp;

od;

#removes extra space

string:=string{[2..Length(string)]};

return string;

else Print("Input must be an integer!");

fi;

end;;
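The logic of CNtoL1 — peel off two digits at a time from the right, translate 11..36 to 'A'..'Z', then reverse — can be mirrored in Python (the function name is ours):

```python
# Decode a number built from two-digit letter codes (11 = 'A', ..., 36 = 'Z')
# into a space-separated string, as CNtoL1 does.
def cn_to_l1(n):
    letters = []
    while n > 0:
        n, pair = divmod(n, 100)             # last two digits first
        letters.append(chr((pair - 11) % 26 + 65))
    return " ".join(reversed(letters))
```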

gap> v:=[1,2,3];

[ 1, 2, 3 ]

gap> IsRowVector(v);

true

gap> 2*[1,1,1] + [1,2,3];

[ 3, 4, 5 ]

The matrix

    [ 1 2 3 ]
A = [ 4 5 6 ]
    [ 7 8 9 ]

will be presented as

gap> A:=[[1, 2, 3],[4, 5, 6],[7, 8, 9]];

[ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]

gap> IsMatrix(A);

true

gap> u:=[1,1,1];

[ 1, 1, 1 ]

gap> u*A;

[ 12, 15, 18 ]


One has to note that if we multiply the matrix A by a row vector u (which would not

be normally defined) it will actually calculate Au^T, e.g.,

gap> A*u;

[ 6, 15, 24 ]

gap> Determinant(A);

0

gap> B:=[[1,1,1],[0,2,1],[0,0,13]];

[ [ 1, 1, 1 ], [ 0, 2, 1 ], [ 0, 0, 13 ] ]

gap> Bˆ-1;

[ [ 1, -1/2, -1/26 ], [ 0, 1/2, -1/26 ], [ 0, 0, 1/13 ] ]

gap> Inverse(B);

[ [ 1, -1/2, -1/26 ], [ 0, 1/2, -1/26], [ 0, 0, 1/13 ] ]

Matrices with entries in Z_26 can be added, multiplied and inverted by appending mod 26 at the end of the command, e.g.,

gap> C:=[[1,1,1],[0,3,1],[0,0,5]];

[ [ 1, 1, 1 ], [ 0, 3, 1 ], [ 0, 0, 5 ] ]

gap> Cˆ-1 mod 26;

[ [ 1, 17, 12 ], [ 0, 9, 19 ], [ 0, 0, 21 ] ]
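That the displayed matrix really is the inverse of C modulo 26 is easy to confirm: multiplying the two matrices mod 26 must give the identity. A small Python check:

```python
# Multiply two n x n integer matrices and reduce every entry modulo m.
def mat_mul_mod(A, B, m):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % m
             for j in range(n)] for i in range(n)]
```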

9.4 Algebra

9.4.1 Permutations

GAP writes permutations as products of disjoint cycles. For example, the permutation

    ( 1 2 3 4 )
π = ( 2 1 4 3 )

will be represented as (1, 2)(3, 4). The identity permutation is represented as ( ). For

example:

gap> pi:=(1,2)(3,4);

(1,2)(3,4)

gap> piˆ2;

()

A permutation can also be defined by its last row using the command PermList.

For example, the permutation π can be deﬁned as

gap> pi:=PermList([2,1,4,3]);

(1,2)(3,4)

Given a permutation written as a product of disjoint cycles, we may recover its last

row using the command ListPerm:


gap> tau:=(1,3,4)(2,5,6,7);

(1,3,4)(2,5,6,7)

gap> ListPerm(tau);

[ 3, 5, 4, 1, 6, 7, 2 ]

gap> c:=PermList([2,3,4,1]);

(1,2,3,4)

gap> 2ˆc;

3

gap> Order(PermList([2,4,5,1,3]));

6
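The order of a permutation is the least common multiple of its cycle lengths; in Python, for a permutation given by its last row (as for PermList; function name ours):

```python
from math import gcd

# Order of a permutation given in one-line ("last row") form; GAP lists are 1-based.
def perm_order(last_row):
    seen = [False] * len(last_row)
    order = 1
    for start in range(len(last_row)):
        if not seen[start]:
            length, i = 0, start
            while not seen[i]:
                seen[i] = True
                i = last_row[i] - 1              # follow the cycle
                length += 1
            order = order * length // gcd(order, length)   # running lcm
    return order
```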

To generate a random permutation of degree n, first create the symmetric group with the command SymmetricGroup(n);. Then you can ask GAP for a random element. Say,

gap> G:=SymmetricGroup(9);

Sym( [ 1 .. 9 ] )

gap> Random(G);

(1,9,2,8,4,6,3,7)

For working in elliptic curves, ﬁrst of all we have to read the two ﬁles elliptic.gd and

elliptic.gi, given at the end of this section:

gap> Read("elliptic.gd");

gap> Read("elliptic.gi");

gap> G:=EllipticCurveGroup(a,b,p);

If we try to input parameters for which the discriminant of the cubic d = −(4a^3 + 27b^2) is zero it will return an error. If the discriminant is nonzero, it will generate

the group G. To list it we may use the command AsList(G);

gap> G:=EllipticCurveGroup(3,2,5);

EllipticCurveGroup(3,2,5)

gap> AsList(G);

[ ( 1, 1 ), ( 1, 4 ), ( 2, 1 ), ( 2, 4 ), infinity ]

gap> H:=EllipticCurveGroup(17,19,97);

EllipticCurveGroup(17,19,97)

gap> ptsList := AsList(H);

[ ( 2, 35 ), ( 2, 62 ), ( 3, 0 ), ( 4, 32), ( 4, 65 ), ( 5, 36 ), ( 5, 61 ),

( 7, 44 ), ( 7, 53 ), ( 8, 45 ), ( 8, 52 ), ( 10, 5 ), ( 10, 92 ),

( 12, 37 ), ( 12, 60 ), ( 13, 20 ), ( 13, 77 ), ( 14, 24 ), ( 14, 73 ),

( 16, 33 ), ( 16, 64 ), ( 23, 8 ), ( 23, 89 ), ( 24, 34 ), ( 24, 63 ),

( 25, 8 ), ( 25, 89 ), ( 31, 48 ), ( 31, 49 ), ( 35, 18 ), ( 35, 79 ),

( 36, 40 ), ( 36, 57 ), ( 37, 45 ), ( 37, 52 ), ( 38, 21 ), ( 38, 76 ),

( 40, 0 ), ( 41, 31 ), ( 41, 66 ), ( 44, 3 ), ( 44, 94 ), ( 45, 27 ),

( 45, 70 ), ( 46, 19 ), ( 46, 78 ), ( 47, 47 ), ( 47, 50 ), ( 49, 8 ),

( 49, 89 ), ( 51, 29 ), ( 51, 68 ), ( 52, 45 ), ( 52, 52 ), ( 54, 0 ),


( 63, 2 ), ( 63, 95 ), ( 65, 47 ), ( 65, 50 ), ( 66, 16 ), ( 66, 81 ),

( 68, 39 ), ( 68, 58 ), ( 69, 17 ), ( 69, 80 ), ( 70, 21 ), ( 70, 76 ),

( 71, 25 ), ( 71, 72 ), ( 76, 2 ), ( 76, 95 ), ( 79, 34 ), ( 79, 63 ),

( 81, 4 ), ( 81, 93 ), ( 82, 47 ), ( 82, 50 ), ( 83, 23 ), ( 83, 74 ),

( 85, 30 ), ( 85, 67 ), ( 86, 21 ), ( 86, 76 ), ( 89, 27 ), ( 89, 70 ),

( 91, 34 ), ( 91, 63 ), ( 92, 10 ), ( 92, 87 ), ( 93, 9 ), ( 93, 88 ),

( 96, 1 ), ( 96, 96 ), infinity ]

gap> Size(H);

100

GAP uses multiplicative notation for this group, so we multiply points instead of adding them and calculate P^(−1) instead of −P:

gap> point1:=ptsList[2];

( 2, 62 )

gap> point2:=ptsList[21];

( 16, 64 )

gap> point1 * point2;

( 81, 93 )

gap> point1ˆ-1;

( 2, 35 )

gap> g := Random(G);

( 92, 87 )

gap> gˆ5;

( 69, 80 )
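The group law that this multiplication implements is the usual chord-and-tangent addition. A Python sketch for affine points only (distinct points or doubling; the point at infinity and opposite points are not handled here):

```python
# Add points P and Q on y^2 = x^3 + a*x + b over Z_p (both affine, P != -Q).
def ec_add(P, Q, a, p):
    (x1, y1), (x2, y2) = P, Q
    if (x1, y1) != (x2, y2):
        lam = (y1 - y2) * pow(x1 - x2, -1, p) % p         # slope of the chord
    else:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # slope of the tangent
    x3 = (lam * lam - x1 - x2) % p
    y3 = (-(lam * (x3 - x1) + y1)) % p
    return (x3, y3)
```

With a = 17, p = 97 this reproduces the session above: (2, 62) · (16, 64) = (81, 93).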

You can determine orders of all elements of the group simultaneously using the

command

gap> List(ptsList,Order);

[ 50, 50, 2, 25, 25, 25, 25, 25, 25, 50, 50, 50, 50, 50, 50, 10, 10, 50, 50,

50, 50, 50, 50, 10, 10, 50, 50, 50, 50, 50, 50, 5, 5, 50, 50, 25, 25, 2,

50, 50, 50, 50, 50, 50, 25, 25, 50, 50, 50, 50, 25, 25, 5, 5, 2, 50, 50,

25, 25, 50, 50, 50, 50, 25, 25, 50, 50, 50, 50, 10, 10, 50, 50, 50, 50, 25,

25, 10, 10, 50, 50, 50, 50, 50, 50, 10, 10, 50, 50, 25, 25, 10, 10, 50, 50,

50, 50, 50, 50, 1 ]

The group of an elliptic curve for a 10-digit prime is already too big for GAP; it will

not be able to keep the whole group in the memory. For example, the two commands

gap> p:=123456791;;

gap> G:=EllipticCurveGroup(123,17,p);

will return an error. If one wants to calculate in larger groups, special techniques

must be applied.

We can ﬁnd out if the group is cyclic or not.

gap> n:=NextPrimeInt(12345);

12347

gap> G:=EllipticCurveGroup(123,17,n);

EllipticCurveGroup(123,17,12347)

gap> Size(G);

12371

gap> Random(G);

( 11802, 5830 )

gap> P:=Random(G);

gap> Order(P);

12371

gap> IsCyclic(G);

true


There is no known polynomial time algorithm which ﬁnds a point on the given curve,

although the following randomised algorithm gives us a point with probability close

to 1/2. This algorithm chooses x at random and tries to ﬁnd a matching y such that

(x, y) is on the curve. For example,

gap> p:=NextPrimeInt(99921);

99923

gap> G:=EllipticCurveGroup(123,17,p);

EllipticCurveGroup(123,17,99923)

gap> Size(G);

100260

gap> IsCyclic(G);

true

gap> x:=12345;

12345

gap> fx:=(xˆ3+123*x+17) mod p;

51321

gap> y:=RootMod(fx,p);

fail

gap> x:=1521;

gap> fx:=(xˆ3+123*x+17) mod p;

42493

gap> y:=RootMod(fx,p);

72372

It is not so easy to input a point of a given elliptic curve. Suppose we want to

input a point M = (2425, 89535) of the curve y^2 = x^3 + 12345 over Z_95701.

We must generate the curve but we also have to explain to GAP that M is a point

of the curve we have deﬁned. For this we present GAP with an already known point

of the target curve (for example, we can generate a point P on this curve at random)

and say that we will input a point of the same curve. We see how this can be done in

the example below:

gap> G:=EllipticCurveGroup(0,12345,95701);;

gap> P:=Random(G);

(91478, 65942 )

gap> M:=EllipticCurvePoint(FamilyObj(P),[2425,89535]);

( 2425, 89535 )

Finally, below are the ﬁles that have to be read before any calculations with elliptic

curves are possible.

#############################################################################

##

#W elliptic.gd Stefan Kohl

##

## This file contains declarations of functions etc. for computing with

## elliptic curve

##

DeclareCategoryCollections( "IsPointOnEllipticCurve" );


["x","y"] );

DeclareGlobalFunction( "EllipticCurvePoint" );

DeclareGlobalFunction( "EllipticCurveGroup" );

#############################################################################

##

#E elliptic.gd . . . . . . . . . . . . . . . . . . . . . . . . . ends here

##

#############################################################################

##

#W elliptic.gi Stefan Kohl

##

## This file contains implementations of methods and functions for

## computing in the elliptic curve point groups E( a , b )/p

## (only in affine Weierstrass form) ##

InstallGlobalFunction( EllipticCurvePoint,

function ( Fam, P )

local X, Y;

X := P[ 1 ]; Y := P[ 2 ];

if X <> infinity

and (Yˆ2) mod Fam!.p <> (Xˆ3 + Fam!.a*X + Fam!.b) mod Fam!.p

then Error( "The given point must be on the specified curve" ); fi;

if X = infinity

then Y := infinity;

else X := X mod Fam!.p; Y := Y mod Fam!.p;

fi;

return Objectify( NewType( Fam, IsPointOnEllipticCurve

and IsAffineWeierstrassRep ),

rec( x := X, y := Y ) );

end );

InstallGlobalFunction( EllipticCurveGroup,

function ( a, b, p )

local F, G, X, Y, FamName, ready, Point;

if not ( IsInt(a) and IsInt(b)

and IsPosInt(p) and IsPrimeInt(p) and p >= 5 )

then Error( "E(a,b)/p : <a> and <b> have to be integers, ",

" and p has to be a prime >= 5" ); fi;

if (4*aˆ3 + 27*bˆ2) mod p = 0

then Error( "<a> and <b> must satisfy 4*a3 + 27*b2 <> 0 (mod <p>)" );

fi;

String( p ) );

SetName( F, FamName );

F!.a := a;

F!.b := b;

F!.p := p;

X := 0; ready := false;


repeat

if Legendre( Xˆ3 + a*X + b, p ) = 1

then Y := RootMod( Xˆ3 + a*X + b, p );

Point := EllipticCurvePoint( F, [ X, Y ] );

if not IsBound( G )

then G := GroupByGenerators( [ Point ] );

else G := ClosureGroup( G, Point );

fi;

if p > 31 and Size( G ) > p - 2 * RootInt( p )

then ready := true; fi;

fi;

X := X + 1;

until X = p or ready;

SetIsWholeFamily( G, true );

SetName( G, Concatenation( "EllipticCurveGroup(", String( a ),

",", String( b ), ",", String( p ), ")" ) );

return G;

end );

InstallMethod( PrintObj,

"for element in E(a,b)/p, (AffineWeierstrassRep)",

true, [ IsPointOnEllipticCurve and IsAffineWeierstrassRep ], 0,

function( p )

Print( "EllipticCurvePoint( ", FamilyObj( p ),

", [ ",p!.x,", ", p!.y, " ] )" );

end );

InstallMethod( ViewObj,

"for element in E(a,b)/p, AffineWeierstrassRep",

true, [ IsPointOnEllipticCurve and IsAffineWeierstrassRep ], 0,

function( p )

if p!.x <> infinity

then Print( "( ",p!.x,", ", p!.y, " )" );

else Print( "infinity" );

fi; end );

InstallMethod( \=,

"for two elements in E(a,b)/p, AffineWeierstrassRep",

IsIdenticalObj,

[ IsPointOnEllipticCurve and IsAffineWeierstrassRep,

IsPointOnEllipticCurve and IsAffineWeierstrassRep ],

0,

function( x, y )

return x!.x = y!.x and x!.y = y!.y;

end );

InstallMethod( \<,

"for two elements in E(a,b)/p, AffineWeierstrassRep",

IsIdenticalObj,

[ IsPointOnEllipticCurve and IsAffineWeierstrassRep,

IsPointOnEllipticCurve and IsAffineWeierstrassRep ],

0,

function( x, y )

return [x!.x, x!.y] < [y!.x, y!.y];

end );

InstallMethod( \*,

"for two elements in E(a,b)/p, AffineWeierstrassRep",

IsIdenticalObj,


[ IsPointOnEllipticCurve and IsAffineWeierstrassRep,

IsPointOnEllipticCurve and IsAffineWeierstrassRep ],

0,

function( p1, p2 )

local lambda, p3, p, h;

p := FamilyObj( p1 )!.p;

if (p1!.x <> infinity) and (p2!.x <> infinity)

then

if p1!.x = p2!.x and p1!.y = (- p2!.y) mod FamilyObj( p2 )!.p

then p3 := rec( x := infinity, y := infinity );

else

if p1!.x <> p2!.x

then h := QuotientMod( 1, p1!.x - p2!.x, FamilyObj( p1 )!.p );

if h = fail then return Gcd( p1!.x - p2!.x, p ); fi;

lambda := (p1!.y - p2!.y) * h;

else h := QuotientMod( 1, 2 * p1!.y, FamilyObj( p1 )!.p );

if h = fail then return Gcd( 2 * p1!.y, p ); fi;

lambda := (3 * p1!.xˆ2 + FamilyObj( p1 )!.a) * h;

fi;

p3 := rec();

p3.x := lambdaˆ2 - p1!.x - p2!.x;

p3.y := - (lambda * (p3.x - p1!.x) + p1!.y);

fi;

else

if p1!.x = infinity then p3 := rec( x := p2!.x, y := p2!.y );

else p3 := rec( x := p1!.x, y := p1!.y ); fi;

fi;

return EllipticCurvePoint( FamilyObj( p1 ), [ p3.x, p3.y ] );

end );

InstallMethod( OneOp,

"for an element in E(a,b)/p, AffineWeierstrassRep",

true,

[ IsPointOnEllipticCurve ], 0,

x -> EllipticCurvePoint( FamilyObj( x ),

[ infinity, infinity ] )

);

InstallMethod( InverseOp,

"for an element in E(a,b)/p, AffineWeierstrassRep",

true,

[ IsPointOnEllipticCurve and IsAffineWeierstrassRep ], 0,

function ( p )

if p!.x = infinity

then return EllipticCurvePoint( FamilyObj( p ), [ infinity, infinity ] );

else return EllipticCurvePoint( FamilyObj( p ),

[ p!.x, (- p!.y) mod FamilyObj( p )!.p ] );

fi;

end );

InstallMethod( Random,

"for group E(a,b)/p",

true,

[ CategoryCollections( IsPointOnEllipticCurve )

and IsWholeFamily ], 0,

function ( G )

local X, Y, a, b, p;

a := ElementsFamily( FamilyObj( G ) )!.a;

b := ElementsFamily( FamilyObj( G ) )!.b;

p := ElementsFamily( FamilyObj( G ) )!.p;


repeat

X := Random( [0 .. p - 1] );

until Legendre( Xˆ3 + a*X + b, p ) = 1;

Y := RootMod( Xˆ3 + a*X + b, p );

return EllipticCurvePoint( ElementsFamily( FamilyObj( G ) ), [ X, Y ] );

end );

#############################################################################

##

#E elliptic.gi . . . . . . . . . . . . . . . . . . . . . . . . . ends here

##

GAP knows about all the finite fields. To create the finite field Z_p, type GF(p);. For

example,

gap> F:=GF(5);;

gap> List:=Elements(F);

[ 0*Z(5), Z(5)ˆ0, Z(5), Z(5)ˆ2, Z(5)ˆ3 ]

The ﬁrst element is 0 (GAP makes it clear that this is the zero of Z5 and not, say,

of Z3 ). The remaining elements are powers of a primitive element of Z5 , and, in

particular, the second element is 1. Type Int(Z(5)); to determine the value of

Z(5) (as an integer mod 5).

gap> Int(Z(5));

2

gap> value:=[0,0,0,0,0];;

gap> for i in [1..5] do

> value[i]:=Int(List[i]);

> od;

gap> value;

[ 0, 1, 2, 4, 3 ]

gap> F:=GF(7);

GF(7)

gap> Elements(F);

[ 0*Z(7), Z(7)ˆ0, Z(7), Z(7)ˆ2, Z(7)ˆ3, Z(7)ˆ4, Z(7)ˆ5 ]

gap> # Here 0*Z(7)=0, Z(7)ˆ0=1, Z(7)=3, Z(7)ˆ2=2, Z(7)ˆ3=6, Z(7)ˆ4=4, Z(7)ˆ5=5.

gap> # Z(7) is not 2 since 2 is not a primitive element.

In GAP the generator Z(p) is chosen as the smallest primitive root mod p, as is obtained from the PrimitiveRootMod function. Here's how to verify this for p = 7 and p = 123456791:

gap> PrimitiveRootMod(7);

3

gap> p:=123456791;;

gap> PrimitiveRootMod(p);

17


To create the finite field GF(p^k) with p^k elements, type GF(pˆk);. For example,

gap> GF4:=GF(4);

GF(2ˆ2)

gap> F:=Elements(GF4);

[ 0*Z(2), Z(2)ˆ0, Z(2ˆ2), Z(2ˆ2)ˆ2 ]

Since F* is a cyclic group, GAP uses a generator of this cyclic group, denoted Z(pˆk),

to list all elements (except zero) as its powers.

gap> GF4:=GF(4);

GF(2ˆ2)

gap> gf4:=Elements(GF4);

[ 0*Z(2), Z(2)ˆ0, Z(2ˆ2), Z(2ˆ2)ˆ2 ]

gap> # Note that GAP lists elements of Z_2 first.

gap> GF8:=GF(8);

GF(2ˆ3)

gap> gf8:=Elements(GF8);

[ 0*Z(2), Z(2)ˆ0, Z(2ˆ3), Z(2ˆ3)ˆ2, Z(2ˆ3)ˆ3, Z(2ˆ3)ˆ4, Z(2ˆ3)ˆ5, Z(2ˆ3)ˆ6 ]

Note that GF(8) contains GF(2) but not GF(4). It is a general fact that GF(p^m) contains GF(p^k) as a subfield if and only if k | m.

gap> GF9:=GF(9);

GF(3ˆ2)

gap> gf9:=Elements(GF9);

[ 0*Z(3), Z(3)ˆ0, Z(3), Z(3ˆ2), Z(3ˆ2)ˆ2, Z(3ˆ2)ˆ3, Z(3ˆ2)ˆ5, Z(3ˆ2)ˆ6,

Z(3ˆ2)ˆ7 ]

Note that GAP lists elements of Z_3 first. Next, let's try adding, subtracting, and multiplying field elements in GAP. For example, in GF(9):

[ 0*Z(3), Z(3)ˆ0, Z(3), Z(3ˆ2), Z(3ˆ2)ˆ2, Z(3ˆ2)ˆ3, Z(3ˆ2)ˆ5, Z(3ˆ2)ˆ6, Z(3ˆ2)ˆ7 ]

gap> gf9[5]+gf9[6]; gf9[5]-gf9[7];

Z(3)

Z(3ˆ2)ˆ3

gap> gf9[5]ˆ2;

Z(3)

For a nonzero element z of the field and a root r, the discrete logarithm of z with respect to r is the smallest nonnegative integer i such that r^i = z. The command LogFFE(z, r) returns this value. (Note that r need not be a primitive element of the field for this command to work.) An error is signalled if z is zero, or if z is not a power of r.

gap> LogFFE( Z(409)ˆ116, Z(409) ); LogFFE( Z(409)ˆ116, Z(409)ˆ2 );

116

58

9.4.4 Polynomials

It is not too hard to explain to GAP that we now want x to be a polynomial. We can

deﬁne the polynomial ring F[x] ﬁrst. For example, we deﬁne the polynomial ring in

one variable x over Z2 as follows:

gap> R:=PolynomialRing(GF(2),["x"]);

PolynomialRing(..., [ x ])

gap> x:=IndeterminatesOfPolynomialRing(R)[1];

x


Now GAP will understand the following commands in which we deﬁne a polynomial

1 + x + x^3 ∈ Z_2[x] and substitute the primitive element of GF(8) in it. All calculations will therefore be conducted in the field GF(8):

gap> p:=Z(2)+x+xˆ3;

xˆ3+x+Z(2)ˆ0

gap> Value(p,Z(2ˆ3));

0*Z(2)

This tells us that the generator Z(2ˆ3) of GF(8) is a root of the polynomial p(x) = x^3 + x + 1 over Z_2.

We can factorise polynomials as follows:

gap> Factors(xˆ16+x+1);

[ xˆ8+xˆ6+xˆ5+xˆ3+Z(2)ˆ0, xˆ8+xˆ6+xˆ5+xˆ4+xˆ3+x+Z(2)ˆ0 ]

gap> g:=xˆ3+1;

xˆ3+Z(2)ˆ0

gap> h:=xˆ4+xˆ2+1;

xˆ4+xˆ2+Z(2)ˆ0

gap> Gcd(g,h);

xˆ2+x+Z(2)ˆ0

gap> GcdRepresentation(g,h);

[ x, Z(2)ˆ0 ]

gap> x*g+Z(2)ˆ0*h;

xˆ2+x+Z(2)ˆ0

We can also factorise x^12 − 1 as a polynomial from Q[x]:

gap> x := Indeterminate(Rationals);

x_1

gap> Factors(xˆ12-1);

[x_1-1, x_1+1, x_1ˆ2-x_1+1, x_1ˆ2+1, x_1ˆ2+x_1+1, x_1ˆ4-x_1ˆ2+1 ]

When you type x, GAP understands what you want to say but still gives the answer

in terms of x1 . Another useful command:

gap> QuotientRemainder( (x+1)*(x+2)+5, x+1 );

[ x_1+2, 5 ]

The command InterpolatedPolynomial(R,x,y); returns, for given lists x and y of elements in a ring R of the same length, say, n, the unique polynomial of degree less than n which has value y[i] at x[i], for all i = 1, 2, . . . , n. Note that the elements in x must be distinct. For example,

gap> InterpolatedPolynomial( Integers, [ 1, 2, 3 ], [ 5, 7, 0 ] );

-9/2*xˆ2+31/2*x-6
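What InterpolatedPolynomial computes can be reproduced with Lagrange's interpolation formula; here is a Python sketch that evaluates the interpolating polynomial at a point, using exact rational arithmetic (function name ours):

```python
from fractions import Fraction

# Evaluate at t the unique polynomial of degree < n through the points (xs[i], ys[i]).
def lagrange_eval(xs, ys, t):
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(t - xj, xi - xj)   # Lagrange basis factor
        total += term
    return total
```

At t = 0 this gives −6, the constant term of −9/2·x^2 + 31/2·x − 6 found by GAP.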

gap> F:=GF(2ˆ6);

GF(2ˆ6)

gap> elts:=Elements(F);

[ 0*Z(2), Z(2)ˆ0, Z(2ˆ2), Z(2ˆ2)ˆ2, Z(2ˆ3), Z(2ˆ3)ˆ2, Z(2ˆ3)ˆ3, Z(2ˆ3)ˆ4,

Z(2ˆ3)ˆ5, Z(2ˆ3)ˆ6, Z(2ˆ6), Z(2ˆ6)ˆ2, Z(2ˆ6)ˆ3, Z(2ˆ6)ˆ4, Z(2ˆ6)ˆ5,

Z(2ˆ6)ˆ6, Z(2ˆ6)ˆ7, Z(2ˆ6)ˆ8, Z(2ˆ6)ˆ10, Z(2ˆ6)ˆ11, Z(2ˆ6)ˆ12, Z(2ˆ6)ˆ13,

Z(2ˆ6)ˆ14, Z(2ˆ6)ˆ15, Z(2ˆ6)ˆ16, Z(2ˆ6)ˆ17, Z(2ˆ6)ˆ19, Z(2ˆ6)ˆ20,

Z(2ˆ6)ˆ22, Z(2ˆ6)ˆ23, Z(2ˆ6)ˆ24, Z(2ˆ6)ˆ25, Z(2ˆ6)ˆ26, Z(2ˆ6)ˆ28,

Z(2ˆ6)ˆ29, Z(2ˆ6)ˆ30, Z(2ˆ6)ˆ31, Z(2ˆ6)ˆ32, Z(2ˆ6)ˆ33, Z(2ˆ6)ˆ34,


Z(2ˆ6)ˆ43, Z(2ˆ6)ˆ44, Z(2ˆ6)ˆ46, Z(2ˆ6)ˆ47, Z(2ˆ6)ˆ48, Z(2ˆ6)ˆ49,

Z(2ˆ6)ˆ50, Z(2ˆ6)ˆ51, Z(2ˆ6)ˆ52, Z(2ˆ6)ˆ53, Z(2ˆ6)ˆ55, Z(2ˆ6)ˆ56,

Z(2ˆ6)ˆ57, Z(2ˆ6)ˆ58, Z(2ˆ6)ˆ59, Z(2ˆ6)ˆ60, Z(2ˆ6)ˆ61, Z(2ˆ6)ˆ62 ]

gap> a:=elts[11];

Z(2ˆ6)

gap> MinimalPolynomial(GF(2),a);

x_1ˆ6+x_1ˆ4+x_1ˆ3+x_1+Z(2)ˆ0

gap> MinimalPolynomial(GF(2ˆ3),a);

x_1ˆ2+Z(2ˆ3)*x_1+Z(2ˆ3)

Thus the minimal polynomials of a = Z(2ˆ6) over GF(2) and over GF(2ˆ3) are

m(t) = t^6 + t^4 + t^3 + t + 1 and m_1(t) = t^2 + αt + α,

where α = Z(2ˆ3).

Chapter 10

Appendix B: Miscellanies

Lemma 10.1.1 Let A = [ a1 , a2 , . . . , an ] and B = [ b1 , b2 , . . . , bn ] be two m × n

matrices given by their columns a1 , a2 , . . . , an and b1 , b2 , . . . , bn . Suppose that A

is row reducible to B. Then a system of columns {a_i1, a_i2, . . . , a_ik} of A is linearly independent if and only if the system {b_i1, b_i2, . . . , b_ik} is linearly independent.

Proof Let x = (x1 , x2 , . . . , xn )T . Then

Ax = x1 a1 + x2 a2 + · · · + xn an and Bx = x1 b1 + x2 b2 + · · · + xn bn .

Since elementary row operations do not change the solution set of systems of linear

equations, we know that

Ax = 0 if and only if Bx = 0.

The algorithm is used when we are given a set of vectors v1 , v2 , . . . , vn ∈ Rn and

we need to identify a basis of span{v1 , v2 , . . . , vn } and express all other vectors as

linear combinations of that basis. We form a matrix (v1 · · · vn ) whose columns are the

given vectors and reduce it to the reduced row echelon form where all relationships

are transparent.

Example 10.1.1 The matrix A = [a1 , a2 , . . . , a5 ] with columns a1 , a2 , . . . , a5

is brought to its reduced row echelon form R = [r1 , r2 , . . . , r5 ] with columns

© Springer International Publishing Switzerland 2015

A. Slinko, Algebra for Applications, Springer Undergraduate Mathematics Series,

DOI 10.1007/978-3-319-21951-6_10


r1 , r2 , . . . , r5 as follows:

    [ 1 −1 0 1 −4 ]        [ 1 0 1 0  2 ]
A = [ 0  2 2 2  0 ]  rref  [ 0 1 1 0  3 ]
    [ 2  1 3 1  4 ]  −→    [ 0 0 0 1 −3 ] .
    [ 3  2 5 4  0 ]        [ 0 0 0 0  0 ]

The relationships between columns of R are much more transparent than those of

A. For example, we see that {r1 , r2 , r4 } is linearly independent (as a part of the

standard basis of R4 ) and that r1 + r2 − r3 = 0 and r5 = 2r1 + 3r2 − 3r4 .

Hence we can conclude that {a1 , a2 , a4 } is linearly independent, hence a basis of

span{a1 , a2 , . . . , a5 } and that a3 = a1 + a2 and a5 = 2a1 + 3a2 − 3a4 .

The Vandermonde determinant

                             | 1          1          · · ·  1          |
                             | x_1        x_2        · · ·  x_n        |
V_n(x_1, x_2, . . . , x_n) = | x_1^2      x_2^2      · · ·  x_n^2      |     (10.2)
                             | . . .      . . .      . . .  . . .      |
                             | x_1^(n−1)  x_2^(n−1)  · · ·  x_n^(n−1)  |

plays a signiﬁcant role in algebra and applications. It can be deﬁned over any ﬁeld,

has a beautiful structure and can be calculated directly for any order.

More precisely, the following theorem is true.

Theorem 10.2.1 Let a_1, a_2, . . . , a_n be elements of a field F. Then the value of the Vandermonde determinant of order n ≥ 2 is

V_n(a_1, a_2, . . . , a_n) = ∏_(1≤i<j≤n) (a_j − a_i).     (10.3)

Proof Since V2 = a2 − a1 we get a basis for induction. Suppose the theorem is true

for order n − 1. Consider the determinant

       | 1         1          · · ·  1          |
       | x         a_2        · · ·  a_n        |
f(x) = | x^2       a_2^2      · · ·  a_n^2      |
       | . . .     . . .      . . .  . . .      |
       | x^(n−1)   a_2^(n−1)  · · ·  a_n^(n−1)  |

10.2 The Vandermonde Determinant 251

If we expand it using cofactors of the ﬁrst column we will see that it has degree n − 1.

Also it is easy to see that f (a2 ) = · · · = f (an ) = 0 since, if we replace x with any

of the ai for i > 1, we will have a determinant with two equal columns. Hence

f (x) = C(x − a2 ) . . . (x − an ).

From the expansion of f(x) by cofactors of the first column we see that C = (−1)^(n+1) V_(n−1)(a_2, . . . , a_n). Hence we have

V_n(a_1, a_2, . . . , a_n) = f(a_1) = (−1)^(n+1) V_(n−1)(a_2, . . . , a_n) (a_1 − a_2) · · · (a_1 − a_n)

= (a_2 − a_1) · · · (a_n − a_1) ∏_(2≤i<j≤n) (a_j − a_i) = ∏_(1≤i<j≤n) (a_j − a_i).
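The product formula is easy to check numerically for small n: a Leibniz-formula determinant of the Vandermonde matrix must equal the product of the differences a_j − a_i over i < j. A Python sketch (function names are ours):

```python
from itertools import permutations

# Determinant via the Leibniz formula (exponential time, fine for tiny matrices).
def det(M):
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):                  # sign = parity of the permutation
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = 1
        for i in range(n):
            prod *= M[i][perm[i]]
        total += sign * prod
    return total

# Rows of the Vandermonde matrix: the 0th, 1st, ..., (n-1)st powers of the points.
def vandermonde(a):
    return [[x ** i for x in a] for i in range(len(a))]
```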

In particular, if a_1, a_2, . . . , a_n are distinct elements of F, then the Vandermonde determinant V_n(a_1, a_2, . . . , a_n) is nonzero.

The determinant

                              | x_1    x_2    · · ·  x_n    |
                              | x_1^2  x_2^2  · · ·  x_n^2  |
V′_n(x_1, x_2, . . . , x_n) = | . . .  . . .  . . .  . . .  |     (10.4)
                              | x_1^n  x_2^n  · · ·  x_n^n  |

can be reduced to the original Vandermonde determinant as the following theorem states.

Theorem 10.2.2 Let a_1, a_2, . . . , a_n be elements of the field F. Then

V′_n(a_1, a_2, . . . , a_n) = ( a_1 a_2 · · · a_n ) V_n(a_1, a_2, . . . , a_n).     (10.5)
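For n = 3 the theorem can be verified directly, since each column of the new determinant is a_i times the corresponding Vandermonde column. A quick Python check using the cofactor formula for 3 × 3 determinants (function name ours):

```python
# 3 x 3 determinant by cofactor expansion along the first row.
def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
```

With a = (2, 5, 7) the ordinary Vandermonde determinant is 30 and the modified one is 2 · 5 · 7 · 30 = 2100.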

Chapter 11

Solutions to Exercises

1. (a) The whole set of integers itself does not contain a smallest element.

(b) The set {1/2, 1/3, . . . , 1/n, . . .} does not contain a smallest element.

2. Here we just need the Principle of Mathematical Induction. For n = 1, the integer 4^n + 15n − 1 = 18 is divisible by 9. This is a basis for the induction. Suppose that 4^n + 15n − 1 is divisible by 9 for some n > 1. Let us consider 4^(n+1) + 15(n + 1) − 1 and represent it as 4 · 4^n + 15n + 14 = 4(4^n + 15n − 1) − 45n + 18. This is now obviously divisible by 9 since both 4(4^n + 15n − 1) and 45n + 18 are (the former by induction hypothesis). Thus 4^(n+1) + 15(n + 1) − 1 is divisible by 9 and the induction step has been proven.

3. For n = 0 we have 11^2 + 12^1 = 133 which is, of course, divisible by 133. This gives us a basis for the induction.

We need the Principle of Mathematical Induction again. Suppose now 133 | 11^(n+2) + 12^(2n+1) (induction hypothesis) and let us consider

11^(n+3) + 12^(2n+3) = 144(11^(n+2) + 12^(2n+1)) − 133 · 11^(n+2).

The right-hand side is divisible by 133. Indeed, the first summand is divisible by 133 by induction hypothesis and the second is simply a multiple of 133. Thus 11^(n+3) + 12^(2n+3) is divisible by 133, which completes the induction step and the proof.

4. We have F0 = 3 and F1 = 5. We see that F0 = F1 − 2 and this is a basis for our

induction. The induction step



(F_(n+1) − 2) F_(n+1) = (2^(2^(n+1)) − 1)(2^(2^(n+1)) + 1) = 2^(2^(n+2)) − 1 = F_(n+2) − 2.
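The identity being proved here, F_0 F_1 · · · F_n = F_(n+1) − 2 for the Fermat numbers F_n = 2^(2^n) + 1, is easy to confirm numerically for small n:

```python
# The nth Fermat number.
def fermat(n):
    return 2 ** (2 ** n) + 1
```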

5. For k = 1 we have 3^k = 3, which is a divisor of 2^3 + 1 = 9. This gives us a basis
for the induction.
Suppose now 3^k | 2^{3^k} + 1 (induction hypothesis). Then there exists an integer m
such that m · 3^k = 2^{3^k} + 1, and let us consider

2^{3^{k+1}} = (2^{3^k})^3 = (m · 3^k − 1)^3 = m^3 · 3^{3k} − m^2 · 3^{2k+1} + m · 3^{k+1} − 1 = t · 3^{k+1} − 1,

where t = m^3 · 3^{2k−1} − m^2 · 3^k + m is an integer. Thus 3^{k+1} | 2^{3^{k+1}} + 1, which
proves the induction step.
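Since 2^(3^k) is astronomically large already for modest k, a numerical check is best done with modular exponentiation; a Python sketch:

```python
# Verify 3**k divides 2**(3**k) + 1 using three-argument pow,
# which never forms the huge number 2**(3**k) itself.
def holds(k: int) -> bool:
    m = 3**k
    return (pow(2, 3**k, m) + 1) % m == 0

ok = all(holds(k) for k in range(1, 12))
```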

6. Let M be the minimal number that cannot be represented as required. Then M is
between two powers from the list, say 2^k < M < 2^{k+1}. Since M is minimal, the
number M − 2^k can be represented as

M − 2^k = 2^{i_1} + · · · + 2^{i_s},

where i_1 < · · · < i_s. Since M − 2^k < 2^k, it is clear that 2^k > 2^{i_s}. Therefore

M = 2^{i_1} + · · · + 2^{i_s} + 2^k

is a representation of M of the required form, contrary to what was assumed. This contradiction proves the statement.

7. Let M be the minimal positive integer which can be represented as a sum of
distinct powers of 2 in two different ways:

M = 2^{i_1} + · · · + 2^{i_s} = 2^{j_1} + · · · + 2^{j_t}.

The exponents i_1 and j_1 cannot both be positive: if they were, both sides would be
even, so we could divide both sides by 2 and get two different representations for M/2,
which contradicts the minimality of M. If i_1 = j_1 = 0, then 2^{i_1} = 2^{j_1} = 1 and
subtracting 1 on both sides we would get two different representations for M − 1,
which again contradicts the minimality of M.
Hence we may assume that i_1 = 0 and j_1 > 0, so that

1 + 2^{i_2} + · · · + 2^{i_s} = 2^{j_1} + · · · + 2^{j_t}.

But then the left-hand side is odd and the right-hand side is even.
This contradiction shows that such a minimal counterexample M does not exist
and all integers can be uniquely represented.

8. Consider a minimal counter example, i.e., any conﬁguration of discs which cannot

be painted as required and which consists of the least possible number of discs.

Consider the centers of all discs and consider the convex hull of them. This hull

11.1 Solutions to Exercises of Chap. 1 255

is a convex polygon and each angle of it is less than 180◦. If a disc with the centre
O is touched by two other discs with centres P and Q, then |OP| = |OQ| = 2r
while |PQ| ≥ 2r (where r is the common radius of the discs), so in the triangle POQ
the side PQ is at least as long as the other two sides,
whence ∠POQ ≥ 60◦. Thus every disc with centre at a vertex of the convex hull

cannot be touched by more than three other discs. Remove any of the discs whose

center is at the vertex of the convex hull. Then the rest of the discs can already

be painted because the counterexample was minimal. But then the removed disc

can be painted as well, since it was touched by at most three other discs and we

can choose the fourth colour to paint it. This contradiction proves the statement.

1. The 2007th prime will not be stored in Primes so we have to use the command

NextPrimeInt to ﬁnd it:

gap> p:=1;;

gap> n:=2007;;

gap> for i in [1..n] do

> p:=NextPrimeInt(p);

> od;

gap> p;

17449
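The same computation can be reproduced without GAP; a Python sketch using a simple sieve of Eratosthenes (the bound 20000 is our assumption, chosen because it comfortably exceeds the 2007th prime):

```python
def nth_prime(n: int, limit: int = 20000) -> int:
    # Sieve of Eratosthenes up to `limit`, then pick the n-th prime.
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            for j in range(i*i, limit + 1, i):
                sieve[j] = False
    primes = [i for i, is_p in enumerate(sieve) if is_p]
    return primes[n - 1]

p2007 = nth_prime(2007)
```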

2. The following GAP program

gap> k:=1;;

gap> N:=Primes[1];;

gap> while IsPrime(N+1)=true do

> k:=k+1;

> N:=N*Primes[k];

> od;

gap> k;

6

gap> N:=N+1;

30031

gap> FactorsInt(N);

[ 59, 509 ]

Then N6 = 30031 = 59 · 509. Both 59 and 509 are greater than p6 = 13.
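The arithmetic behind this Euclid-style example is easy to confirm; a Python check:

```python
import math

# N6 = 2 * 3 * 5 * 7 * 11 * 13 + 1 is not prime, but both of its
# prime factors exceed p6 = 13.
N6 = math.prod([2, 3, 5, 7, 11, 13]) + 1
factors_ok = (N6 == 59 * 509)
```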


a^3 − 27. Hence a − 3 divides a^3 − 17 if and only if it divides the difference
(a^3 − 17) − (a^3 − 27) = 10. This happens if and only if a − 3 ∈ {±1, ±2, ±5, ±10}.

4. We will prove the second statement. Let p > 2 be a prime. Let us divide it by

6 with remainder: p = 6k + r , where r = 0, 1, 2, 3, 4, 5. When r takes values

0, 2, 3, 4 the right-hand side is divisible by 2 or 3, hence in this case p cannot be

a prime. Only two possibilities are left: p = 6k + 1 and p = 6k + 5. Examples

of primes of these two sorts are 7 and 11.

5. Let p = 3k +1 be a prime. Then p > 2 and hence it is odd. But then 3k = p −1 is

even and 3k = 2m. Due to uniqueness of prime factorisation, k must be divisible

by 2, i.e., k = 2k′. Therefore p = 3k + 1 = 6k′ + 1.

6. Here is the program:

Primes1:=[];;

Primes3:=[];;

numbers:=[1..168];;

for i in numbers do

if RemInt(Primes[i],4)=1 then

Add(Primes1,Primes[i]);

fi;

if RemInt(Primes[i],4)= 3 then

Add(Primes3,Primes[i]);

fi;

od;

Length(Primes1);

Length(Primes3);

Primes1[32];

Primes3[53];

Position(Primes1,601);

Position(Primes3,607);

7. (a) Here is the program and the calculation:

gap> NicePrimes:=[];

[ ]

gap> for i in [1..Length(Primes)] do

> if RemInt(Primes[i],6)=5 then

> Add(NicePrimes,Primes[i]);

> fi;

> od;

gap> NicePrimes;

[ 5, 11, 17, 23, 29, 41, 47, 53, 59, 71, 83, 89, 101, 107,

113, 131, 137, 149, 167, 173, 179, 191, 197, 227, 233, 239,

251, 257, 263, 269, 281, 293, 311, 317, 347, 353, 359, 383,

389, 401, 419, 431, 443, 449, 461, 467, 479, 491, 503, 509,

521, 557, 563, 569, 587, 593, 599, 617, 641, 647, 653, 659,

677, 683, 701, 719, 743, 761, 773, 797, 809, 821, 827, 839,

857, 863, 881, 887, 911, 929, 941, 947, 953, 971, 977, 983 ]

(b) We know (see Exercise 4) that all primes p > 3 fall into two categories: those

for which p = 6k + 1 and those for which p = 6k + 5.


First note that if n_1 = 6k_1 + 1 and n_2 = 6k_2 + 1, then their product n_1 n_2 = 6(6k_1 k_2 + k_1 + k_2) + 1 is again of the form 6k + 1.

Now we assume that there are only ﬁnitely many primes p such that p =

6k + 5. Then there is the largest such prime. Let p1 , p2 , . . . , pn , . . . be the

sequence of all primes in increasing order with pn being the largest prime

that gives remainder 5 on division by 6. Consider the number

N = p1 p2 . . . pn − 1.

This number has remainder 5 on division by 6 (as 6 divides the product p_1 p_2 . . . p_n) and hence belongs to the second category.

Let q be any prime that divides N . Obviously it is different from all of the

p1 , p2 , . . . , pn . Since q > pn it must be of the type q = 6k + 1. Thus every

prime that divides N has remainder 1 on division by 6, then, as we noted

above, the same must be true for N , which contradicts the fact that N has

remainder 5 on division by 6.

8. There are many alternative proofs of the fact that the number of primes is infinite.
Here is one of those. Assume on the contrary that there are only k primes
p_1, p_2, . . . , p_k. Then every positive integer not exceeding n can be written as a
product p_1^{α_1} p_2^{α_2} · · · p_k^{α_k}. Given n, let us find an upper bound f(n) for
the number of such products by bounding the values which the exponents
α_1, α_2, . . . , α_k might assume. Since n ≥ p_i^{α_i} ≥ 2^{α_i} we obtain α_i ≤ log_2 n.
Then the number of products which do not exceed n will be at most

f(n) = (log_2 n + 1)^k.

example, we may use L'Hôpital's rule to show that

f (n)

lim = 0.

n→∞ n

This will be an absurdity since for large n there will not be enough prime factori-

sations for all positive integers between 1 and n.


1. (a) Notice that since √210 < 17, every composite number below 210 is divisible by one of 2, 3, 5, 7, 11
or 13. The primes to be found are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127,
131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199.

Hence π(210) = 46.

(b) We have

210 / ln 210 ≈ 39,

which is somewhat lower than 46. This shows that the approximation given

by the Prime Number Theorem is not very good for small values of n.

2. Straightforward.
3. (a) No, because √n may exceed 10000: n can be, for example, a square of a prime p
such that 10000 < p ≤ 11093.
(b) The number of possible prime divisors is approximately x/ln(x) where
x = √123123137, so approximately 1193 divisions are needed. The professor
has already done 10000/ln(10000) ≈ 1085 divisions; he needs to do another
108.

4. Since n is composite, n = p_1 p_2 . . . p_m, where p_i is prime for all i = 1, 2, . . . , m,
and we do not assume that all of them are different. We are given that p_i > ⌊∛n⌋
and m ≥ 2. Then we also have p_i > ∛n because p_i is an integer. Suppose that
m ≥ 3. Then

n = p_1 p_2 . . . p_m ≥ p_1 p_2 p_3 > (∛n)^3 = n,

a contradiction. Hence m = 2 and n is a product of exactly two primes.

5. Let n > 6 be an integer. If n is odd, then 2 and n − 2 are relatively prime (since
n − 2 is odd) and n = 2 + (n − 2) is a valid solution. More generally, if there
is a prime p which is smaller than n − 1 and does not divide n, we can write
n = p + (n − p) and gcd(p, n − p) = 1.
Thus, since n > 6, we may assume that 2|n, 3|n, 5|n and hence 30|n. In particular, n is
composite. Let q be the largest prime divisor of n. Then n ≥ 6q so 5 ≤ q ≤ n/6.
By Bertrand's postulate there is a prime p such that q < p < 2q ≤ n/3 < n.
Now gcd(p, n) = 1 and so n = p + (n − p) is the solution (note that
n − p ≥ n − n/3 > 1).

6. The program does some kind of sieving but the result is very different from the

result of the Sieve of Eratosthenes. It outputs all powers of 2 between 1 and 106 .

There are 20 such numbers in total.

1. We have 2^2 · 3^3 · 4^4 · 5^5 = 2^10 · 3^3 · 5^5


and the number of divisors will be (10 + 1)(3 + 1)(5 + 1) = 264. Note that we

cannot use the formula straight as 4 is not prime.

2. We factor this number with GAP:

gap> FactorsInt(123456789);

[ 3, 3, 3607, 3803 ]

The number of divisors then will be (2 + 1)(1 + 1)(1 + 1) = 12.

3. The common divisors of 10650 and 6750 are the divisors of gcd(10650, 6750).

So, let us calculate this number using the Euclidean algorithm. We will find:

10650 = 1 · 6750 + 3900
6750 = 1 · 3900 + 2850
3900 = 1 · 2850 + 1050
2850 = 2 · 1050 + 750
1050 = 1 · 750 + 300
750 = 2 · 300 + 150
300 = 2 · 150

Hence gcd(10650, 6750) = 150.

Therefore the common divisors of 10650 and 6750 are the factors of 150, which

are 1, 2, 3, 5, 6, 10, 15, 25, 30, 50, 75, 150.
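The division chain above is exactly what a small implementation of the Euclidean algorithm produces; a Python sketch:

```python
def gcd(a: int, b: int) -> int:
    # Euclidean algorithm: repeatedly replace (a, b) by (b, a mod b).
    while b:
        a, b = b, a % b
    return a

g = gcd(10650, 6750)
```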

4. (a) We have gcd(m, n) = 2^2 · 5^4 · 11^2; lcm(m, n) = 2^4 · 3^2 · 5^7 · 7^2 · 11^3.

(b) Using GAP we calculate

Also

5. The prime factorisation of 33 is 33 = 3 · 11. This number of divisors can occur
when the number is equal to p^32, where p is prime, or when the number is p^10 q^2,
where p, q are primes. As 2^32 > 10000, the first possibility cannot occur. In the
second, since 3^10 > 10000, the number can only be of the form n = 2^10 q^2. The
smallest unused prime is q = 3. This gives us the number n = 2^10 · 3^2 = 9216.
No other prime q works since 2^10 · 5^2 > 10000. So the only such number
is 9216.


6. Since the prime factorisation of 246 is 246 = 2 · 3 · 41, the prime factorisation
of 246^246 will be 2^246 · 3^246 · 41^246. Hence d(246^246) = 247^3 = (13 · 19)^3 = 13^3 · 19^3 and

d(d(246^246)) = 4 · 4 = 16.

7. If d is a divisor of a and b, then a = a′d and b = b′d, and a − b = (a′ − b′)d,
whence d is a common divisor of a and a − b. If d is a divisor of a and a − b,
then a = a′d and a − b = cd. Then b = a − (a − b) = (a′ − c)d, that is, d is
also a common divisor of a and b.

8. We use the previous exercise repeatedly. We have gcd(13n + 21, 8n + 13) =

gcd(8n + 13, 5n + 8) = gcd(5n + 8, 3n + 5) = gcd(3n + 5, 2n + 3) =

gcd(2n + 3, n + 2) = gcd(n + 2, n + 1) = gcd(n + 1, 1) = 1.

9. (a) Suppose a^2 and a + b have a common prime divisor p. Then it is also a
divisor of a and hence of b = (a + b) − a, contradiction.
(b) As in Exercise 7, we notice that gcd(a, b) = gcd(a, a + b). Then, since
a^2 − b^2 = (a − b)(a + b) is divisible by a + b, we have

gcd(a^2 + b^2, a + b) = gcd((a^2 + b^2) + (a^2 − b^2), a + b) = gcd(2a^2, a + b),

which divides 2 because gcd(a^2, a + b) = 1 by part (a). Hence, if a + b and
a^2 + b^2 are not relatively prime, their greatest common divisor can only be 2.
This can be realised by taking two arbitrary odd relatively prime a and b,
say a = 25 and b = 49.

10. Let Fi and F j be two Fermat numbers with i < j. Then by Exercise 4 of

Sect. 1.1.1 F0 F1 . . . F j−1 = F j − 2. Since the left-hand side is divisible by Fi ,

the only common divisor of Fi and F j could be 2. However, these numbers are

odd, hence coprime.

11. If there were only a ﬁnite number k of primes, then among any k + 1 Fermat

numbers there will be two with a common prime factor. However this is not

possible due to the previous exercise. Hence the number of primes is inﬁnite.

1. Performing the Extended Euclidean Algorithm on 3773 and 3596 we obtain the following table:

r        x       y      q
3773     1       0
3596     0       1      1
177      1      −1     20
56     −20      21      3
9       61     −64      6
2     −386     405      4
1     1605   −1684      2

Hence gcd(3773, 3596) = 1 = 1605 · 3773 + (−1684) · 3596, that is, x = 1605 and
y = −1684.
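The same coefficients come out of a standard recursive implementation of the Extended Euclidean Algorithm; a Python sketch:

```python
def extended_gcd(a: int, b: int):
    # Returns (g, x, y) with a*x + b*y = g = gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

g, x, y = extended_gcd(3773, 3596)
```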

2. Performing the Extended Euclidean Algorithm on 1995 and 1840 gives

gcd(1995, 1840) = 5 and hence x = 95 and y = −103 may be taken that

will satisfy 1995x + 1840y = 5.

Multiply now 1995x + 1840y = 5 by (−2) to see that z 0 = −2x = −190

and w0 = −2y = 206 satisfy 1995z 0 + 1840w0 = −10. Next, observe that

1995(−k · 1840) + 1840(k · 1995) = 0, for any integer k. Sum the last two equa-

tions to obtain 1995(z 0 −1840k)+1840(w0 +1995k) = −10, for any integer k. It

is now easy to ﬁnd two additional solutions, for example z 1 = z 0 + 1840 = 1650

and w1 = w0 − 1995 = −1789, or z 2 = z 0 − 1840 = −2030 and w2 =

w0 + 1995 = 2201.

3. We are given that N = kc + a and N = td + b for some integers k and t.

Subtracting the two equalities yields 0 = kc + a − td − b. Therefore

a − b = kc − td.

Since the right-hand side is divisible by gcd(c, d), we see that a − b is divisible

by gcd(c, d) as well.

4. (a) The Extended Euclidean algorithm applied to 68 and 26 gives 2 =

gcd(68, 26) = 5 · 68 + (−13) · 26. Multiplying both sides by (35 − 9)/

2 = 13, we see that 35 − 9 = 13 · 5 · 68 − 13 · 13 · 26. Hence, the number

x = 35 + 13 · 13 · 26 = 9 + 13 · 5 · 68 = 4429 satisﬁes our congru-

ences. (There are many other solutions, all of them are congruent modulo

884 = lcm(26, 68); i.e., all these solutions are given by 4429 + 884 · n,

n ∈ Z.)

(b) The Extended Euclidean algorithm applied to 71 and 50 gives 1 = 27 · 50 +

(−19) · 71. Now, 15 = 19 − 4 and the number x = 4 + 15 · 27 · 50 =

19 + 15 · 19 · 71 = 20254 satisﬁes our congruences but is greater than 3550.

But x′ = x mod 3550 = 2504 is the unique solution of the two congruences
which lies in the interval [0, 3550).
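The construction used in (b) — combining two congruences with coprime moduli via the Extended Euclidean Algorithm — can be written as a small Python routine (a sketch of the Chinese-remainder computation, not the book's code; the helper name crt is ours):

```python
def crt(a1: int, m1: int, a2: int, m2: int) -> int:
    # Solve x ≡ a1 (mod m1), x ≡ a2 (mod m2) for coprime m1, m2.
    def xgcd(a, b):
        if b == 0:
            return a, 1, 0
        g, u, v = xgcd(b, a % b)
        return g, v, u - (a // b) * v
    g, u, v = xgcd(m1, m2)
    assert g == 1  # moduli must be coprime
    # u*m1 + v*m2 = 1, so v*m2 ≡ 1 (mod m1) and u*m1 ≡ 1 (mod m2).
    return (a1 * v * m2 + a2 * u * m1) % (m1 * m2)

x = crt(4, 50, 19, 71)
```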

5. (a) We know from Exercise 2 that gcd(1995, 1840) = 5. If there were integers x

and y satisfying 1840x + 1995y = 3, then 3 = 5(368x + 399y) and 3 would

be divisible by 5, a contradiction.

(b) Let C be the set of integers c for which there exist integers x and y satisfying

the equation ax + by = c, and let d = gcd(a, b). By the Extended Euclidean

Algorithm we know that there are some integers x0 , y0 , such that ax0 + by0 =

d. Let k be an arbitrary integer. Then a(kx0 ) + b(ky0 ) = kd, showing that

kd ∈ C, so C contains all multiples of gcd(a, b). Let us prove that C contains
nothing else. Write a = da′ and b = db′, for some integers a′ and b′,
and take an arbitrary c ∈ C. Then, for some integers x and y, we have
c = ax + by = da′x + db′y = d(a′x + b′y), so c is a multiple of d. Therefore C
is indeed the set of all multiples of gcd(a, b).

Solutions to Exercises of Sect. 1.3.1

1. Using the prime factorization of these numbers and the formula for φ(n) we
compute:

φ(180) = φ(2^2 · 3^2 · 5) = 180 · (1/2) · (2/3) · (4/5) = 48,

2. We are given that n = pq = 4386607 and φ(n) = 4382136 = (p − 1)(q − 1). Thus n − φ(n) = 4471 = p + q − 1, whence
p + q = 4472. Solving the system of equations

p + q = 4472
pq = 4386607

we find p = 1453 and q = 3019.

3. We have φ(m) = pq(p − 1)(q − 1) = 11424 = 2^5 · 3 · 7 · 17. Hence p and q

can only be among the primes 2, 3, 7, 17. By the trial and error method we ﬁnd

p = 7, q = 17 and m = 14161.

4. By Fermat’s Little Theorem 24 ≡ 1 mod 5 so we need to ﬁnd the remainder of

2013

22013 on division by 4. This remainder is obviously 0 so the remainder of 22

on division by 5 is 1.
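Python's built-in modular exponentiation makes this easy to confirm, even though the exponent 2^2013 is a number with over 600 digits:

```python
# pow(base, exp, mod) reduces modulo 5 at every step of the
# exponentiation, so the enormous exponent is handled quickly.
r = pow(2, 2**2013, 5)
```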

5. By Fermat’s Little Theorem we have a 6 ≡ 1 mod 7 for all a ∈ Z, which are not

divisible by 7. As 333 = 47 · 7 + 4 and 555 = 92 · 6 + 3,

333555 ≡ 43 ≡ 64 ≡ 1 mod 7.

555333 ≡ 23 ≡ 1 mod 7.
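Both congruences can be confirmed with modular exponentiation; a one-line Python check for each:

```python
# Residues of 333^555 and 555^333 modulo 7.
r1 = pow(333, 555, 7)
r2 = pow(555, 333, 7)
```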

6. We compute a n−1 mod n as follows:

gap> n:=1234567890987654321;

1234567890987654321

gap> a:=111111111;

111111111

gap> PowerMod(a,n-1,n);

385560404775530811


The result is not equal to 1 and this shows, by Fermat's Little Theorem, that n is
not prime. Indeed, we see that n has five different prime factors:

gap> Factors(n);

[ 3, 3, 7, 19, 928163, 1111211111 ]

(b) Since a ≡ b mod m means m | (a − b), we see that for any divisor d | m we

have d | (a − b) which is the same as a ≡ b mod d.

(c) Indeed, a ≡ b mod m i , is equivalent to m i | (a − b). This implies

lcm(m 1 , m 2 , . . . , m k ) | (a − b),

which means the equivalence holds also for the least common multiple of the

m i ’s.

2. We have 72 ≡ −3 mod 25, 47 ≡ −3 mod 25 and 28 ≡ −3 mod 25. Thus

3. We have φ(3^x 5^y) = 3^{x−1} 5^{y−1} · 2 · 4 = 3^{x−1} 5^{y−1} · 2^3 and 600 = 2^3 · 3 · 5^2. By

uniqueness of prime factorisation, we have x − 1 = 1 and y − 1 = 2. Hence

x = 2 and y = 3.

4. Let S be the set of integers a for which the congruence x^162 ≡ a mod 243 has a solution.
We see that 243 = 3^5 and φ(243) = 2 · 3^4 = 162. By Euler's theorem:

If gcd(x, n) = 1, then x^{φ(n)} ≡ 1 mod n.

Hence if gcd(x, 243) = 1, then x^162 ≡ 1 mod 243. Hence 1 ∈ S. If gcd(x, 243) >
1, then x = 3y and x^162 ≡ 0 mod 243. Thus S = {0, 1}.

5. We are given that n = pq where p and q are primes. Moreover, we know that

φ(n) = φ( p)φ(q) = ( p − 1)(q − 1) = pq − p − q + 1 = 3308580, and therefore

p + q = n − 3308580 + 1. We now determine p and q from the equations:

pq = 3312913,

p + q = 4334.

This shows that p and q are the roots of the quadratic equation x^2 − 4334x +
3312913 = 0, which are 3343 and 991. The result is n = pq = 3343 · 991.
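Recovering p and q from n and φ(n) amounts to solving exactly this quadratic; a Python sketch of the computation (the helper name recover_pq is ours, not the book's):

```python
from math import isqrt

def recover_pq(n: int, phi: int):
    # p + q = n - phi + 1, and p, q are the roots of x^2 - s*x + n.
    s = n - phi + 1
    d = isqrt(s * s - 4 * n)
    assert d * d == s * s - 4 * n  # discriminant must be a perfect square
    return (s - d) // 2, (s + d) // 2

p, q = recover_pq(3312913, 3308580)
```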


1. By the distributive law (CR5) we have a · 0 + a · 0 = a · (0 + 0) = a · 0. Now

subtracting a·0 on both sides we get a·0 = 0. We further argue as in Lemma 1.4.2.

2. (a) The invertible elements of Z16 are those elements that are relatively prime to
16 = 2^4 (i.e., those which are odd). We have

1^2 = 7^2 = 9^2 = 15^2 = 1, 3 · 11 = 1, 5 · 13 = 1,

so 1, 7, 9 and 15 are their own inverses, while 3^{−1} = 11, 11^{−1} = 3, 5^{−1} = 13 and
13^{−1} = 5.

(b) The zero-divisors of Z15 are those (non-zero) elements that are not relatively

prime to 15 = 3 · 5 (i.e., multiples of 3 or 5). We have

a multiple of 5.

3. (a) Using the Euclidean algorithm, we find that gcd(111, 74) = 37 and that
gcd(111, 77) = 1, so 77 is invertible and 74 is a zero divisor. Since 111 =
3 · 37, we have 74 ⊙ c = 0 for any c that is a multiple of 3. From the Extended
Euclidean algorithm 1 = 34 · 111 − 49 · 77, hence 77^{−1} = −49 = 62.

(b) We have 77 ⊙ x ⊕ 11 = 0, hence 77 ⊙ x = −11 = 100 and

x = (77^{−1}) ⊙ 100 = 62 ⊙ 100 = 95.

For the second equation, 74 ⊙ x ⊕ 11 = 0 ⇒ 74 ⊙ x = −11 = 100,

and there are no solutions because {74 ⊙ x | x ∈ Z111} = {0, 37, 74}.

4. Since we will have only operations in Zn for various n but not in Z we will write
+ and · instead of ⊕ and ⊙. Recall that a function from a set A to A itself is

one-to-one if no two (different) elements of A are mapped to the same element

of A. For a ﬁnite set this is also equivalent to f being onto which can be also

restated as the range of f being all of Z21 .

(a) If a is a zero-divisor in Z21 , that is, if there is an element d = 0 in Z21 , such

that ad = 0 mod 21, then f (d) = ad + b = b = f (0), and f is not one-to-

one. On the other hand, if a is not a zero divisor, then gcd(a, 21) = 1, and

there exists (a unique) element c ∈ Z21 satisfying ac = 1 mod 21. But then

f (x1 ) = f (x2 ) implies cf(x1 ) = cf(x2 ), or c(ax1 + b) = c(ax2 + b), which

reduces to x1 + cb = x2 + cb and ﬁnally implies that x1 = x2 , proving that


f is one-to-one in this case. The set of pairs (a, b), for which the function

f is one-to-one is therefore {(a, b) | a, b ∈ Z21 and gcd(a, 21) = 1}.

(b) Since 7 is not relatively prime with 21 the function f is not one-to-one, and

so the image of f is a proper subset of Z21 . The expression 7x, for x ∈ Z21 ,

takes only three values in Z21 , namely 0 if x is a multiple of 3, 7 if x is

congruent to 1 modulo 3, and 14 if x is congruent to 2 modulo 3. The image

of f is therefore {3, 10, 17}.

(c) The condition f −1 ( f (x)) = x, for all x ∈ Z21 , is equivalent to c(ax +

b) + d = x, or (ac)x + (cb + d) = x. It is sufﬁcient to take ac = 1 and

cb + d = 0. We can ﬁnd c by solving the equation 4c + 21y = 1 using

the Extended Euclidean Algorithm, which gives us c = −5, y = 1, or

better, c = 16, y = −3. Now, d = −cb = −16 · 15 = 12 mod 21. So,

f −1 (x) = 16x + 12.

5. Fermat’s Little Theorem says that if p is prime and a is not divisible by p, then

a p−1 ≡ 1 mod p. Hence x 10 = 1 in Z11 . So x 102 = x 2 in Z11 . The equation

x 2 = 4 has in Z11 two solutions: x1 = 2 and x2 = −2 = 9.

6. Since m is odd, gcd(m, 2) = 1, whence 2^{φ(m)} ≡ 1 mod m. Thus 2^{φ(m)−1} ≡
2^{−1} mod m, which is the inverse of 2 in Zm. Since m is odd, m + 1 is an even
number and (m + 1)/2 is an integer. This number is the inverse of 2 in Zm since
2 · (m + 1)/2 = m + 1 ≡ 1 mod m. Therefore 2^{φ(m)−1} ≡ (m + 1)/2 mod m.

7. If (p − 1)! ≡ −1 mod p, then gcd(j, p) = 1 for all j ∈ {1, 2, . . . , p − 1}. Hence p is prime. If
p is prime, then the equation x^2 = 1 in Z_p is equivalent to (x − 1)(x + 1) = 0,
hence has only two solutions x = ±1, that is, either x = 1 or x = p − 1. Then for
every j ∈ {2, . . . , p − 2} we have j^{−1} ≠ j. This means 2 · 3 · . . . · (p − 2) = 1.
Hence (p − 1)! ≡ p − 1 ≡ −1 mod p.

1. 2002(10) = 11111010010(2); and 1100101(2) = 2^6 + 2^5 + 2^2 + 1 = 101(10).

2. (a) 2011(10) = 11111011011(2) ;

(b) 101001000(2) = 2^8 + 2^6 + 2^3 = 256 + 64 + 8 = 328(10).
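Python can serve as a quick cross-check for such conversions:

```python
a = int('11111010010', 2)  # binary string to decimal
b = bin(2002)              # decimal to binary string
c = int('1100101', 2)
```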

3. Observe ﬁrst that the last three digits in the binary representation depend only on

the remainder on division by 8. Namely, if a = a_n 2^n + · · · + a_3 2^3 + a_2 2^2 + a_1 2 + a_0
is the binary representation of a, then a ≡ a_2 2^2 + a_1 2 + a_0 mod 8. Clearly
75^1015 ≡ 3^1015 mod 8. By Euler's Theorem, 3^{φ(8)} = 3^4 ≡ 1 mod 8. Therefore,
75^1015 ≡ 3^{253·4+3} ≡ 3^3 ≡ 3 mod 8. Since 3 = 11(2), we see that the last three
digits in the binary representation of 75^1015 are 011.

4. We calculate as follows:

. . 01 (2) · 10 .

10 .

n m

and 2 non-zero digits if m = n = 2.


means that n ≡ a + b + c + d mod 6. Therefore n ≡ 0 mod 6 if and only if

a + b + c + d ≡ 0 mod 6.

6. (a) 2A4F(16) = 2 · 16^3 + 10 · 16^2 + 4 · 16 + 15 = 10831,
(b) 1000 = 16 · 62 + 8, and 62 = 16 · 3 + 14, so 1000 = 3E8(16).

1. Let p_i, k_i and c_i denote the numerical encodings of the i-th positions of the plain text, the key and the cypher text,
respectively.

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Plaintext B U Y M O R E P R O P E R T Y

pi 1 20 24 12 14 17 4 16 17 14 16 4 17 19 24

Key T O D A Y I W I L L G O O N C

ki 19 14 3 0 24 8 22 8 11 11 6 14 14 13 2

pi + ki = ci 20 8 1 12 12 25 0 24 2 25 22 18 5 6 0

Cyphertext U I B M M Z A Y C Z W S F G A

Conversely, to decrypt we add (−ki ) to each side of the above to get pi =

ci + (−ki ).

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Cyphertext R C X R N W O A P D Y W C A U

ci 17 2 23 17 13 22 14 0 15 3 24 22 2 0 20

Key T O D A Y I W I L L G O O N C

−ki 7 12 23 0 2 18 4 18 15 15 20 12 12 13 24

ci + (−ki ) = pi 24 14 20 17 15 14 18 18 4 18 18 8 14 13 18

Plaintext Y O U R P O S S E S S I O N S

i 16 17 18 19 20 21 22 23 24 25 26 27 28 29

Cyphertext E R K Y W H Z R G S X Q J W

ci 4 17 10 24 22 7 25 17 6 18 23 16 9 22

Key E A G A I N I N T O L I F E

−ki 22 0 20 0 18 13 18 13 7 12 15 18 21 22

ci + (−ki ) = pi 0 17 4 24 14 20 17 4 13 4 12 8 4 18

Plaintext A R E Y O U R E N E M I E S
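The encryption and decryption carried out in these tables follow one rule: add (or subtract) the key letters position by position modulo 26, with A = 0, …, Z = 25. A generic Python sketch of that rule (a round-trip check, not tied to the specific ciphertexts in the tables):

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    # Add key letters for encryption, subtract them for decryption.
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        k = ord(key[i % len(key)]) - ord('A')
        out.append(chr((ord(ch) - ord('A') + sign * k) % 26 + ord('A')))
    return ''.join(out)

ct = vigenere('BUYMOREPROPERTY', 'TODAYIWILLGOONC')
pt = vigenere(ct, 'TODAYIWILLGOONC', decrypt=True)
```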

11.2 Solutions to Exercises of Chap. 2 267

2. We will place the result in an array called random:

gap> random:=[1..20];;

gap> for i in [1..20] do

> random[i]:=Random([0..25]);

> od;

gap> random;

[ 24, 19, 16, 9, 1, 9, 24, 24, 15, 3, 12, 3, 10, 11, 21, 23, 19, 6, 19, 24 ]

3. The message as a numerical string will be: [8, 7, 0, 21, 4, 13, 14, 19, 8, 12, 4, 19,

14, 7, 0, 19, 4].

gap>#Entering the key:

gap> k:=random;;

gap>#Entering the message:

gap> p:=[ 8, 7, 0, 21, 4, 13, 14, 19, 8, 12, 4, 19, 14, 7, 0, 19, 4 ];;

gap> c:=[1..Length(p)];

[ 1 .. 17 ]

gap> for i in [1..Length(p)] do

> c[i]:=(p[i]+k[i]) mod 26;

> od;

gap> c;

[ 6, 0, 16, 4, 5, 22, 12, 17, 23, 15, 16, 22, 24, 18, 21, 16, 23 ]

gap># which in letters will be GAQEFWMRXPQWYSVQX

gap> # Decoding back:

gap> q:=[1..Length(p)];;

gap> for i in [1..Length(p)] do

> q[i]:=(c[i]-k[i]) mod 26;

> od;

gap> p=q;

true

1. (13, 11) cannot be used as a key since 13 is not invertible in Z26 and the mapping

x → 13x + 11 (mod 26) would not be one-to-one.

2. The cyphertext for CRYPTO will be JSRWOL. The inverse function for

decrypting is

cyphertext letter −→ x −→ y = 19x + 13 mod 26 −→ plaintext letter

and the plaintext for DRDOFP is SYSTEM. We can calculate the latter using

subprograms LtoN and NtoL:

gap> str := "DRDOFP"; ;

gap> outstr := "A";

gap> for i in [1..Length(str)] do

> outstr[1] := NtoL( (19*LtoN( str[i] ) + 13) mod 26);

> Print( outstr );

> od;

SYSTEM

3. Since the letter F was encrypted as N and the letter K was encrypted as O,
for the encryption function f (x) = ax + b mod 26 we will have f (5) = 13 and
f (10) = 14. Solving the system of equations in Z26


5a + b = 13,

10a + b = 14

we ﬁnd a = 21 and b = 12, hence the key is the pair (21, 12).

With GAP this would be

gap> M:=[[5,1],[10,1]];

[ [ 5, 1 ], [ 10, 1 ] ]

gap> rhs:=[13,14];

[ 13, 14 ]

gap> [a,b]:=M^-1*rhs mod 26;

[ 21, 12 ]

4. The frequencies of the letters in the cyphertext are as given in the table below.

a 0.049 j 0.076 s 0.017

b 0.052 k 0.045 t 0.000

c 0.135 l 0.076 u 0.062

d 0.000 m 0.031 v 0.007

e 0.000 n 0.031 w 0.007

f 0.000 o 0.035 x 0.101

g 0.000 p 0.093 y 0.021

h 0.007 q 0.017 z 0.000

i 0.101 r 0.066

Since the most frequent letter in the cyphertext is c and in English texts this

role is usually played by e, our guess is that the encryption function f (x) =

ax + b mod 26 maps the integer value of e, which is 4, to the integer value of c,

which is 2. This gives the first equation:

4a + b = 2 mod 26. (11.1)

The second most frequent letter in English is t, while in our cyphertext the second

place is shared by x and i. Suppose first that the letter t was encrypted to x. Then

19a + b = 23 mod 26, (11.2)

which, together with (11.1), gives a = 17 and b = 12 mod 26. If the encryption function is f (x) = ax + b mod 26, then the

decryption function is g(x) = cx − cb mod 26, where ca = 1 mod 26. In the

case a = 17, b = 12, we get c = 23 and so g(x) = 23x + 10 mod 26. If we

decrypt the cyphertext with this function we get

djree rctqk xmr ...


which is obviously not an English text. Our guess that t was encrypted to x must

therefore be wrong. We get similar nonsense if we assume that t is encrypted to i.

We can either proceed in this fashion until we get something meaningful, or

observe that in our cyphertext the group of three letters ljc is very frequent. Since

our guess is that c is in fact encrypted to e, it is very plausible that the group ljc

represents the word the. If this is the case, then t is encoded to l, which gives the

equation

19a + b = 11 mod 26, (11.3)

This, together with (11.1), implies that a = 11 and b = 10. The decrypting

function is then g(x) = 19x + 18. Decrypting the cyphertext with g gives the

following plaintext:

three rings for the eleven kings under the sky seven for the dwarf lords in their halls of stone

nine for mortal men doomed to die one for the dark lord on his dark throne in the land of

mordor where the shadows lie one ring to rule them all one ring to ﬁnd them one ring to

bring them all and in the darkness bind them in the land of mordor where the shadows lie
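The decrypting function g(x) = 19x + 18 can be applied mechanically; a small Python sketch, tried on the frequent group ljc from the cyphertext:

```python
def affine_decrypt(cipher: str, c: int, d: int) -> str:
    # g(x) = c*x + d (mod 26), with letters a = 0, ..., z = 25.
    return ''.join(chr((c * (ord(ch) - ord('a')) + d) % 26 + ord('a'))
                   for ch in cipher)

word = affine_decrypt('ljc', 19, 18)
```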

1. (a) Computing the determinants of these matrices we get

$$\det\begin{pmatrix} 1 & 12 \\ 12 & 1 \end{pmatrix} = 13, \qquad \det\begin{pmatrix} 1 & 6 \\ 6 & 1 \end{pmatrix} = 17.$$

Since 13 is not relatively prime to 26 the first matrix is not invertible because
its determinant is not invertible. Since 17^{−1} = 23 exists the second matrix is
invertible with

$$\begin{pmatrix} 1 & 6 \\ 6 & 1 \end{pmatrix}^{-1} = 23\begin{pmatrix} 1 & 20 \\ 20 & 1 \end{pmatrix} = \begin{pmatrix} 23 & 18 \\ 18 & 23 \end{pmatrix}.$$

$$M = \begin{pmatrix} 1 & 6 \\ 6 & 1 \end{pmatrix}$$