
Arkadii Slinko

Algebra for

Applications

Cryptography, Secret Sharing,

Error-Correcting, Fingerprinting,

Compression

Springer Undergraduate Mathematics Series

Advisory Board

M.A.J. Chaplain, University of Dundee, Dundee, Scotland, UK

K. Erdmann, University of Oxford, Oxford, England, UK

A. MacIntyre, Queen Mary, University of London, London, England, UK

E. Süli, University of Oxford, Oxford, England, UK

M.R. Tehranchi, University of Cambridge, Cambridge, England, UK

J.F. Toland, University of Cambridge, Cambridge, England, UK

More information about this series at http://www.springer.com/series/3423

Arkadii Slinko

Algebra for Applications

Cryptography, Secret Sharing, Error-Correcting, Fingerprinting, Compression

Arkadii Slinko

Department of Mathematics

The University of Auckland

Auckland

New Zealand

Springer Undergraduate Mathematics Series

ISBN 978-3-319-21950-9 ISBN 978-3-319-21951-6 (eBook)

DOI 10.1007/978-3-319-21951-6

Mathematics Subject Classiﬁcation: 11A05–11A51, 11C08, 11C20, 11T06, 11T71, 11Y05, 11Y11,

11Y16, 20A05, 20B30, 12E20, 14H52, 14G50, 68P25, 68P30, 94A60, 94A62

© Springer International Publishing Switzerland 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part

of the material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations,

recitation, broadcasting, reproduction on microﬁlms or in any other physical way, and transmission

or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar

methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this

publication does not imply, even in the absence of a speciﬁc statement, that such names are exempt from

the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this

book are believed to be true and accurate at the date of publication. Neither the publisher nor the

authors or the editors give a warranty, express or implied, with respect to the material contained herein or

for any errors or omissions that may have been made.

(www.springer.com)

The aim of a Lecturer should be,

not to gratify his vanity by a shew

of originality; but to explain,

to arrange, and to digest with clearness,

what is already known in the science…

To my parents Michael and Zinaida,

my wife Lilia,

my children Irina and Michael, and

my grandchildren Erik and Yuri.

Preface

This book originated from my lecture notes for the one-semester course which I

have given many times at The University of Auckland since 1998. The goal of that

course and this book is to show the incredible power of algebra and number theory

in the real world. It does not advance far in theoretical algebra, theoretical number

theory or combinatorics. Instead, we concentrate on concrete objects like groups of

points on elliptic curves, polynomial rings and ﬁnite ﬁelds, study their elementary

properties and show their exceptional applicability to various problems in infor-

mation handling. Among the applications are cryptography, secret sharing,

error-correcting, ﬁngerprinting and compression of information.

Some chapters of this book—especially the number-theoretic and cryptographic

ones—use GAP to illustrate the main ideas. GAP is a system for computational

discrete algebra, which provides a programming language, a library of thousands of

functions implementing algebraic algorithms, written in the GAP language, as well

as large data libraries of algebraic objects.

If you are using this book for self-study, then, when studying a particular topic, first familiarise yourself with the corresponding section of Appendix A, where you will find detailed instructions on how to use GAP for that topic. As GAP will be useful for most topics, it is not a good idea to skip it completely.

I owe a lot to Robin Christian who in 2006 helped me to introduce GAP to my

course and proofread the lecture notes. The introduction of GAP has been the

biggest single improvement to this course. The initial version of the GAP Notes,

which have now been developed into Appendix A, were written by Robin. Stefan

Kohl, with the assistance of Eamonn O’Brien, has kindly provided us with two

programs for GAP that allowed us to calculate in groups of points on elliptic curves.

I am grateful to Paul Hafner, Primož Potočnik, Jamie Sneddon and especially to

Steven Galbraith who in various years were members of the teaching team for this

course and suggested valuable improvements or contributed exercises.

Many thanks go to Shaun White who did a very thorough job proofreading part

of the text in 2008 and to Steven Galbraith who improved the section on cryp-

tography in 2009 and commented on the section on compression. However, I bear


the sole responsibility for all mistakes and misprints in this book. I would be most

obliged if you report any noticed mistakes and misprints to me.

I hope you will enjoy this book as much as I enjoyed writing it.

March 2015

Contents

1 Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Natural Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 Basic Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.2 Divisibility and Primes . . . . . . . . . . . . . . . . . . . . . . 4

1.1.3 Factoring Integers. The Sieve of Eratosthenes. . . . . . . 9

1.2 The Euclidean Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.2.1 Greatest Common Divisor and Least Common

Multiple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.2.2 Extended Euclidean Algorithm. Chinese

Remainder Theorem . . . . . . . . . . . . . . . . . . . . . . . . 17

1.3 Fermat’s Little Theorem and Its Generalisations . . . . . . . . . . 22

1.3.1 Euler’s φ-Function . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.3.2 Congruences. Euler’s Theorem . . . . . . . . . . . . . . . . . 24

1.4 The Ring of Integers Modulo n. The Field Zp . . . . . . . . . . . . 27

1.5 Representation of Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 32

2 Cryptology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.1 Classical Secret-Key Cryptology . . . . . . . . . . . . . . . . . . . . . . 38

2.1.1 The One-Time Pad . . . . . . . . . . . . . . . . . . . . . . . . . 39

2.1.2 An Affine Cryptosystem . . . . . . . . . . . . . . . . . . . . . 41

2.1.3 Hill’s Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . 43

2.2 Modern Public-Key Cryptology . . . . . . . . . . . . . . . . . . . . . . 47

2.2.1 One-Way Functions and Trapdoor Functions . . . . . . . 47

2.3 Computational Complexity. . . . . . . . . . . . . . . . . . . . . . . . . . 49

2.3.1 Orders of Magnitude . . . . . . . . . . . . . . . . . . . . . . . . 50

2.3.2 The Time Complexity of Several Number-Theoretic

Algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

2.4 The RSA Public-Key Cryptosystem . . . . . . . . . . . . . . . . . . . 58

2.4.1 How Does the RSA System Work?. . . . . . . . . . . . . . 58

2.4.2 Why Does the RSA System Work?. . . . . . . . . . . . . . 61

2.4.3 Pseudoprimality Tests . . . . . . . . . . . . . . . . . . . . . . . 64


References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3 Groups. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 73

3.1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 73

3.1.1 Composition of Mappings. The Group

of Permutations of Degree n. . . . . . . . . . . . . . . . . . . 73

3.1.2 Block Permutation Cipher . . . . . . . . . . . . . . . . . . . . 78

3.1.3 Cycles and Cycle Decomposition . . . . . . . . . . . . . . . 79

3.1.4 Orders of Permutations . . . . . . . . . . . . . . . . . . . . . . 81

3.1.5 Analysis of Repeated Actions . . . . . . . . . . . . . . . . . . 84

3.1.6 Transpositions. Even and Odd . . . . . . . . . . . . . . . . . 86

3.1.7 Puzzle 15. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

3.2 General Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

3.2.1 Definition of a Group. Examples . . . . . . . . . . . . . . . 93

3.2.2 Powers, Multiples and Orders. Cyclic Groups. . . . . . . 95

3.2.3 Isomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

3.2.4 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

3.3 The Abelian Group of an Elliptic Curve . . . . . . . . . . . . . . . . 103

3.3.1 Elliptic Curves. The Group of Points of an Elliptic

Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

3.3.2 Quadratic Residues and Hasse’s Theorem . . . . . . . . . 109

3.3.3 Calculating Large Multiples Efficiently . . . . . . . . . . . 112

3.4 Applications to Cryptography . . . . . . . . . . . . . . . . . . . . . . . . 114

3.4.1 Encoding Plaintext . . . . . . . . . . . . . . . . . . . . . . . . . 114

3.4.2 Additive Diffie–Hellman Key Exchange

and the Elgamal Cryptosystem . . . . . . . . . . . . . . . .. 115

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 116

4 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

4.1 Introduction to Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

4.1.1 Examples and Elementary Properties of Fields . . . . . . 117

4.1.2 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

4.1.3 The Cardinality of a Finite Field. . . . . . . . . . . . . . . . 124

4.2 The Multiplicative Group of a Finite Field Is Cyclic . . . . . . . . 125

4.2.1 Lemmas on Orders of Elements . . . . . . . . . . . . . . . . 126

4.2.2 Proof of the Main Theorem . . . . . . . . . . . . . . . . . . . 128

4.2.3 Discrete Logarithms . . . . . . . . . . . . . . . . . . . . . . . . 129

4.3 The Elgamal Cryptosystem Revisited . . . . . . . . . . . . . . . . . . 130

5 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.1 The Ring of Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.1.1 Introduction to Polynomials . . . . . . . . . . . . . . . . . . . 133

5.1.2 Lagrange Interpolation. . . . . . . . . . . . . . . . . . . . . . . 138


5.1.4 Greatest Common Divisor and Least Common

Multiple. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

5.2 Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

5.2.1 Polynomials Modulo m(x) . . . . . . . . . . . . . . . . . . . 145

5.2.2 Minimal Annihilating Polynomials . . . . . . . . . . . . . . 148

6.1 Introduction to Secret Sharing . . . . . . . . . . . . . . . . . . . . . . . 154

6.1.1 Access Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

6.1.2 Shamir’s Threshold Access Scheme . . . . . . . . . . . . . 155

6.2 A General Theory of Secret Sharing Schemes . . . . . . . . . . . . 158

6.2.1 General Properties of Secret Sharing Schemes . . . . . . 158

6.2.2 Linear Secret Sharing Schemes . . . . . . . . . . . . . . . . . 163

6.2.3 Ideal and Non-ideal Secret Sharing Schemes . . . . . . . 167

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

7.1 Binary Error-Correcting Codes . . . . . . . . . . . . . . . . . . . . . . . 172

7.1.1 The Hamming Weight and the Hamming Distance . . . 172

7.1.2 Encoding and Decoding. Simple Examples . . . . . . . . 175

7.1.3 Minimum Distance, Minimum Weight.

Linear Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

7.1.4 Matrix Encoding Technique . . . . . . . . . . . . . . . . . . . 182

7.1.5 Parity Check Matrix . . . . . . . . . . . . . . . . . . . . . . . . 187

7.1.6 The Hamming Codes . . . . . . . . . . . . . . . . . . . . . . . 190

7.1.7 Polynomial Codes . . . . . . . . . . . . . . . . . . . . . . . . . . 193

7.1.8 Bose–Chaudhuri–Hocquenghem (BCH) Codes . . . . . . 196

7.2 Non-binary Error-Correcting Codes . . . . . . . . . . . . . . . . . . . . 199

7.2.1 The Basics of Non-binary Codes . . . . . . . . . . . . . . . 199

7.2.2 Reed–Solomon (RS) Codes . . . . . . . . . . . . . . . . . . . 201

7.3 Fingerprinting Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

7.3.1 The Basics of Fingerprinting . . . . . . . . . . . . . . . . . . 204

7.3.2 Frameproof Codes. . . . . . . . . . . . . . . . . . . . . . . . . . 207

7.3.3 Codes with the Identifiable Parent Property . . . . . . . . 208

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

8.1 Prefix Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

8.1.1 Information and Information Relative to a Partition . . . 214

8.1.2 Non-uniform Encoding. Prefix Codes . . . . . . . . . . . . 217

8.2 Fitingof’s Compression Code . . . . . . . . . . . . . . . . . . . . . 221

8.2.1 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

8.2.2 Fast Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . 224


References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

9.1 Computing with GAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

9.1.1 Starting with GAP . . . . . . . . . . . . . . . . . . . . . . . . . 229

9.1.2 The GAP Interface . . . . . . . . . . . . . . . . . . . . . . . . . 229

9.1.3 Programming in GAP: Variables, Lists, Sets

and Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

9.2 Number Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

9.2.1 Basic Number-Theoretic Algorithms . . . . . . . . . . . . . 232

9.2.2 Arithmetic Modulo m . . . . . . . . . . . . . . . . . . . . . . . 234

9.2.3 Digitising Messages . . . . . . . . . . . . . . . . . . . . . . . . 235

9.3 Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

9.4 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

9.4.1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

9.4.2 Elliptic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

9.4.3 Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

9.4.4 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

10.1 Linear Dependency Relationship Algorithm . . . . . . . . . . . . . . 249

10.2 The Vandermonde Determinant . . . . . . . . . . . . . . . . . . . . . . 250

11.1 Solutions to Exercises of Chap. 1 . . . . . . . . . . . . . . . . . . . . 253

11.2 Solutions to Exercises of Chap. 2 . . . . . . . . . . . . . . . . . . . . 266

11.3 Solutions to Exercises of Chap. 3 . . . . . . . . . . . . . . . . . . . . 283

11.4 Solutions to Exercises of Chap. 4 . . . . . . . . . . . . . . . . . . . . 296

11.5 Solutions to Exercises of Chap. 5 . . . . . . . . . . . . . . . . . . . . 301

11.6 Solutions to Exercises of Chap. 6 . . . . . . . . . . . . . . . . . . . . 309

11.7 Solutions to Exercises of Chap. 7 . . . . . . . . . . . . . . . . . . . . 315

11.8 Solutions to Exercises of Chap. 8 . . . . . . . . . . . . . . . . . . . . 327

Chapter 1

Integers

Rhinoceros. Eugène Ionesco (1909–1994)

The formula ‘Two and two make ﬁve’ is not without its

attractions.

Notes from Underground. Fyodor Dostoevsky (1821–1881)

The theory of numbers is the oldest and the most fundamental mathematical disci-

pline. Despite its old age, it is one of the most active research areas of mathematics

due to two main reasons. Firstly, the advent of fast computers has changed Number

Theory profoundly and made it in some ways almost an experimental discipline.

Secondly, new important areas of applications such as cryptography have emerged.

Some of the applications of Number Theory will be considered in this course.

1.1 Natural Numbers

1.1.1 Basic Principles

We denote by N the set of all positive integers, also called the natural numbers. The most important properties of N are formulated in the following three principles.

The Least Integer Principle. Every non-empty set S ⊆ N of positive integers

contains a smallest (least) element.

The Principle of Mathematical Induction. Let S ⊆ N be a set of positive

integers which contains 1 and contains n + 1 whenever it contains n. Then S = N.

The Principle of Strong Mathematical Induction. Let S ⊆ N be a set of positive

integers which contains 1 and contains n + 1 whenever it contains 1, 2, . . . , n. Then

S = N.

These three principles are equivalent to each other: if you accept one of them

you can prove the remaining two as theorems. Normally one of them, most often

the Principle of Mathematical Induction, is taken as an axiom of Arithmetic but in


proofs we use all of them since one may be much more convenient to use than the

others.

Example 1.1.1 On planet Tralfamadore there are only 3 cent and 5 cent coins in

circulation. Prove that an arbitrary sum of n ≥ 8 cents can be paid (provided one has

a sufﬁcient supply of coins).

Solution: Suppose that this statement is not true and there are positive integers

m ≥ 8 for which the sum of m cents cannot be paid by a combination of 3 cent and

5 cent coins. By the Least Integer Principle there is a smallest such positive integer

s (the minimal counterexample). It is clear that s is not 8, 9 or 10 as 8 = 3 + 5,

9 = 3 + 3 + 3, 10 = 5 + 5. Thus s − 3 ≥ 8 and, since s was minimal, the sum of

s − 3 cents can be paid as required. Adding to s − 3 cents one more 3 cent coin we

obtain that the sum of s cents can be also paid, which is a contradiction.
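The claim can also be checked empirically. The following Python sketch (the book's own computations use GAP; Python is used here purely for illustration) verifies by brute force that every amount from 8 cents upward can be paid:

```python
# Check the Tralfamadore claim: every amount n >= 8 should be payable
# with 3 cent and 5 cent coins, i.e. n = 3a + 5b for some a, b >= 0.
def payable(n):
    # try every feasible number b of 5 cent coins
    return any((n - 5 * b) % 3 == 0 for b in range(n // 5 + 1))

assert all(payable(n) for n in range(8, 1001))
assert not payable(7)  # 7 cents cannot be paid
```

Note that the check also confirms that 8 is the right threshold, since 7 cents is indeed unpayable.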

Example 1.1.2 Prove that, for every positive integer n,

1/1^2 + 1/2^2 + · · · + 1/n^2 < 2.

Solution: Denote the left-hand side of the inequality by F(n). We have a sequence

of statements A1 , A2 , . . . , An , . . . to be proved, where An is F(n) < 2, and we are

going to use the Principle of Mathematical Induction to prove all of them.

The statement A1 reduces to

1/1^2 < 2,

which is true. Now we have to derive the validity of An+1 from the validity of An ,

that is, to prove that

F(n) < 2 implies F(n) + 1/(n + 1)^2 < 2.

Oops! It is not possible because, while we do know that F(n) < 2, we do not have

the slightest idea how close F(n) is to 2, and we therefore cannot be sure that there

will be room for 1/(n + 1)^2. What shall we do?

Surprisingly, the stronger inequality

1/1^2 + 1/2^2 + · · · + 1/n^2 ≤ 2 − 1/n

can be proved by induction. Indeed, the base statement A1 now becomes the equality


1/1^2 = 2 − 1/1,

and

F(n) ≤ 2 − 1/n implies F(n) + 1/(n + 1)^2 ≤ 2 − 1/(n + 1) (1.1)

is now true. Due to the induction hypothesis, which is F(n) ≤ 2 − 1/n, to show (1.1)

it would be sufﬁcient to show that

2 − 1/n + 1/(n + 1)^2 ≤ 2 − 1/(n + 1).

This is equivalent to

1/(n + 1)^2 ≤ 1/n − 1/(n + 1) = 1/(n(n + 1)),

which is true.
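The strengthened inequality can also be confirmed numerically. Here is a quick Python check (an illustration only, not part of the proof; the book itself uses GAP for computations):

```python
# Numerical sanity check of the strengthened induction hypothesis
# F(n) = 1/1^2 + 1/2^2 + ... + 1/n^2 <= 2 - 1/n, which implies F(n) < 2.
def F(n):
    return sum(1 / k ** 2 for k in range(1, n + 1))

for n in range(1, 500):
    assert F(n) <= 2 - 1 / n

assert F(1) == 2 - 1 / 1  # equality holds in the base case
```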

This example shows that we shouldn’t expect that someone has already prepared

the problem for us so that the Principle of Mathematical Induction can be applied

directly.

The reader needs to be familiar with the induction principles. The exercises below

concentrate on the use of the Least Integer Principle.

Exercises

1. Prove or disprove the following statements:

(a) Every nonempty set of integers (we do not require the integers in the set to be positive) contains a smallest element.

(b) Every nonempty set of positive rational numbers contains a smallest element.

2. Prove that, for any integer n ≥ 1, the integer 4^n + 15n − 1 is divisible by 9.

3. Prove that 11^(n+2) + 12^(2n+1) is divisible by 133 for all n ≥ 0.

4. Let Fn = 2^(2^n) + 1 be the nth Fermat number. Show that F0 F1 . . . Fn = Fn+1 − 2.

5. Prove that 2^n + 1 is divisible by n for all numbers of the form n = 3^k.

6. Prove that an arbitrary positive integer N can be represented as a sum of distinct powers chosen from 1, 2, 2^2, . . . , 2^n, . . .

7. Use the Least Integer Principle to prove that the representation of N as a sum of

distinct powers of 2 is unique.

8. Several discs of equal diameter lie on a table so that some of them touch each

other but no two of them overlap. Prove that these discs can be painted with four

colours so that no two discs of the same colour touch.

1.1.2 Divisibility and Primes

The set of all integers

. . . , −3, −2, −1, 0, 1, 2, 3, . . .

is denoted by Z.

Theorem 1.1.1 (Division with Remainder) Given any integers a, b, with a > 0,

there exist unique integers q, r such that

b = qa + r, and 0 ≤ r < a.

In this case we also say that q and r are, respectively, the quotient and the remainder

of b when it is divided by a. It is often said that q and r are the quotient and the

remainder of dividing a into b. The notation r = b mod a is often used. You can

ﬁnd q and r by using long division, a technique which most students learn at school.

If you want to ﬁnd q and r using a calculator, use it to divide b by a. This will give

you a number with decimals. Discard all the digits to the right of the decimal point

to obtain q. Then find r as b − aq.

Example 1.1.3 (a) 35 = 3 · 11 + 2, (b) −51 = (−8) · 7 + 5; so that 2 = 35 mod 11

and 5 = −51 mod 7.
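Both parts of the example can be reproduced in Python, whose floored integer division happens to match the theorem's convention 0 ≤ r < a even for negative b (a hypothetical illustration; the book's sessions use GAP):

```python
# Division with remainder: for a > 0, divmod(b, a) returns q, r with
# b == q*a + r and 0 <= r < a. Python floors the quotient, so this
# holds even when b is negative.
q, r = divmod(35, 11)
assert (q, r) == (3, 2)

q, r = divmod(-51, 7)
assert (q, r) == (-8, 5)
assert -51 == q * 7 + r
```

Note that a calculator which truncates −51/7 = −7.28… towards zero would suggest q = −7, giving a negative remainder; flooring gives the correct q = −8, r = 5.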

Definition 1.1.1 An integer b is divisible by an integer a ≠ 0 if there exists an integer c such that b = ac, that is, we have b mod a = 0. We also say that a is a divisor of b and write a | b.

Let n be a positive integer. Let us denote by d(n) the number of positive divisors

of n. It is clear that 1 and n are always divisors of any number n which is greater

than 1. Thus we have d(1) = 1 and d(n) ≥ 2 for n > 1.

Deﬁnition 1.1.2 A positive integer n is called a prime if d(n) = 2. An integer n > 1

which is not prime is called a composite number.

Example 1.1.4 (a) 2, 3, 5, 7, 11, 13 are primes; (b) 1, 4, 6, 8, 9, 10 are not primes;

(c) 4, 6, 8, 9, 10 are composite numbers.

A composite positive integer n can always be represented as a product of two

other positive integers different from 1 and n. Indeed, since d(n) > 2, there is a

divisor n1 such that 1 < n1 < n. But then n2 = n/n1 also satisfies 1 < n2 < n and n = n1 n2. We are ready to prove

Theorem 1.1.2 (The Fundamental Theorem of Arithmetic) Every positive integer

n > 1 can be expressed as a product of primes (with perhaps only one factor), that is,

n = p1 p2 . . . pr ,

where p1, p2, . . . , pr are primes. This factorisation is unique apart from the order of the prime factors.


Proof Let us prove ﬁrst that any number n > 1 can be decomposed into a product

of primes. We will use the Principle of Strong Mathematical Induction. If n = 2, the

decomposition is trivial and we have only one factor, which is 2 itself. Let us assume

that for all positive integers which are less than n, a decomposition into a product

of primes exists. If n is a prime, then n = n is the decomposition required. If n is

composite, then n = n1 n2, where n > n1 > 1 and n > n2 > 1, and by the induction hypothesis there are prime decompositions n1 = p1 . . . pr and n2 = q1 . . . qs for n1 and n2. Then we may combine them:

n = n1 n2 = p1 . . . pr q1 . . . qs .

To prove that the decomposition is unique, we shall assume the existence of

an integer capable of two essentially different prime decompositions, and from this

assumption derive a contradiction. This will show that the hypothesis that there exists

an integer with two essentially different prime decompositions cannot be true, and

hence the prime decomposition of every integer is unique. We will use the Least

Integer Principle.

Suppose that there exists a positive integer with two essentially different prime

decompositions, then there will be a smallest such integer

n = p1 p2 . . . pr = q1 q2 . . . qs , (1.2)

where pi and qj are primes. By rearranging the order of the p’s and the q’s, if

necessary, we may assume that

p1 ≤ p2 ≤ . . . ≤ pr , q1 ≤ q2 ≤ . . . ≤ qs .

It is impossible that p1 = q1 , for, if it were the case, we would cancel the ﬁrst factor

from each side of Eq. (1.2) and obtain two essentially different prime decompositions

for the number n/ p1 , which is smaller than n, contradicting the choice of n. Hence

either p1 < q1 or q1 < p1 . Without loss of generality we suppose that p1 < q1 .

We now form the integer

n′ = n − p1 q2 q3 . . . qs . (1.3)

Since p1 < q1, this number is positive. It is obviously smaller than n. The two distinct decompositions of n give the following two decompositions of n′:

n′ = ( p1 p2 . . . pr ) − ( p1 q2 . . . qs ) = p1 ( p2 . . . pr − q2 . . . qs ), (1.4)

n′ = (q1 q2 . . . qs ) − ( p1 q2 . . . qs ) = (q1 − p1 ) q2 . . . qs . (1.5)


Since n′ is a positive integer which is smaller than n and greater than 1, the prime decomposition for n′ must be unique, apart from the order of the factors. This means that if we complete prime factorisations (1.4) and (1.5) the results will be identical. From (1.4) we learn that p1 is a factor of n′ and must appear as a factor in decomposition (1.5). Since p1 < q1 ≤ qi, we see that p1 ≠ qi for i = 2, 3, . . . , s. Hence, it is a factor of q1 − p1, i.e., q1 − p1 = p1 m or q1 = p1 (m + 1), which is impossible as q1 is prime and m + 1 ≥ 2. This contradiction completes the proof of the Fundamental Theorem of Arithmetic. □

Prime factorisations can be computed in GAP with the function FactorsInt. If, for example, we factorise 396 and 17, the corresponding output of GAP will look as follows:

gap> FactorsInt(396);

[ 2, 2, 3, 3, 11 ]

gap> FactorsInt(17);

[ 17 ]
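For comparison, here is a rough Python sketch of the trial-division idea behind such a factorisation (GAP's actual FactorsInt uses far more sophisticated methods; this is an illustration only):

```python
# Factor n into primes by trial division, returning the prime factors
# in ascending order, e.g. 396 -> [2, 2, 3, 3, 11].
def prime_factors(n):
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:      # divide out each prime factor completely
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                  # whatever remains is itself prime
        factors.append(n)
    return factors

assert prime_factors(396) == [2, 2, 3, 3, 11]
assert prime_factors(17) == [17]
```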

GAP conveniently remembers all 168 primes not exceeding 1000. They are stored

in the array Primes (in Sect. 9.1.3 all the primes in this array are listed). GAP can

also check if a particular number is prime or not.

gap> IsPrime(2ˆ(2ˆ4)-1);

false

gap> IsPrime(2ˆ(2ˆ4)+1);

true

What GAP cannot answer is whether or not there are inﬁnitely many primes. This is

something that can only be proved.

Theorem 1.1.3 (Euclid1) There are infinitely many primes.

Proof Suppose there were only finitely many primes p1, p2, . . . , pr . Then form the

integer

n = 1 + p1 p2 . . . pr .

1 Euclid of Alexandria (about 325 BC–265 BC) is one of the most prominent educators of all

time. He is best known for his treatise on mathematics The Elements which is divided into 13

books: the ﬁrst six on geometry, three on number theory, one is devoted to Eudoxus’s theory of

irrational numbers and the last three to solid geometry. Euclid is not known to have made any

original discoveries and The Elements are based on the work of the people before him such as

Eudoxus, Thales, Hippocrates and Pythagoras. Over a thousand editions of this work have been

published since the ﬁrst printed version appeared in 1482. Very little, however, is known about his

life. The enormity of the work attributed to Euclid even led some researchers to suggest that The

Elements was written by a team of mathematicians at Alexandria who took the name Euclid from

the historical character who lived 100 years earlier.


Since n > pi for all i, it must be composite. Let q be the smallest prime factor of n.

As p1, p2, . . . , pr represent all existing primes, q is one of them, say q = p1,

and n = p1 m. Now we can write

1 = n − p1 p2 . . . pr = p1 m − p1 p2 . . . pr = p1 (m − p2 . . . pr ).

Hence p1 divides 1, which is impossible. Thus the assumption that there were only finitely many primes must be false. □

In the past many mathematicians looked for a formula that always evaluates to

a prime number. Euler2 noticed that all values of the quadratic polynomial P(n) =

n^2 − n + 41 are prime for n = 0, 1, 2, . . . , 40. However, P(41) = 41^2 is not prime.

For the same reason Fermat introduced the numbers Fm = 2^(2^m) + 1, m ≥ 0, which are

now called Fermat numbers. He checked that F0 = 3, F1 = 5, F2 = 17, F3 = 257

and F4 = 65537 are primes. He believed that all such numbers are primes, however

he could not prove that F5 = 4294967297 is prime. Euler in 1732 showed that F5

was composite by presenting its prime factorisation F5 = 641 · 6700417. We can

now easily check this with GAP:

gap> F5:=2ˆ(2ˆ5)+1;

4294967297

gap> IsPrime(F5);

false

gap> FactorsInt(F5);

[ 641, 6700417 ]

Since then it has been shown that all numbers F5 , F6 , . . . , F32 are composite. The

status of F33 remains unknown (December, 2014). It is also unknown whether there

are inﬁnitely many prime Fermat numbers.

Many early scholars felt that the numbers of the form 2^n − 1 were prime for all prime values of n, but in 1536 Hudalricus Regius showed that 2^11 − 1 = 2047 =

23 · 89 was not prime. The French monk Marin Mersenne (1588–1648) gave in

the preface to his Cogitata Physica-Mathematica (1644) a list of positive integers

n < 257 for which the numbers 2^n − 1 were prime. Several numbers in that list were

incorrect. By 1947 Mersenne’s range, n < 257, had been completely checked and it

was determined that the correct list was:

n = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107 and 127.

As of February 2014 there are 48 known Mersenne primes. The last one was

discovered in January 2013 by the Great Internet Mersenne Prime Search (GIMPS)

2 Leonhard Euler (1707–1783) was a Swiss mathematician who made enormous contributions in

ﬁelds as diverse as inﬁnitesimal calculus and graph theory. He introduced much of the modern

mathematical terminology and notation. [3] He is also renowned for his work in mechanics, ﬂuid

dynamics, optics, astronomy, and music theory.


project led by Dr. Curtis Cooper.3 The new prime number is 2^57885161 − 1; it has

17,425,170 digits. This is the largest known prime to date. We can check with GAP

if the number of digits of this prime was reported correctly:

gap> n:=57885161;;

gap> 2ˆn-1;

<integer 581...951 (17425170 digits)>

Exercises

1. Write a GAP program that calculates the 2007th prime p2007 . Calculate p2007 .

2. Write a GAP program that finds the smallest k for which

n = p1 p2 . . . pk + 1

is not prime, where pi denotes the ith prime.

3. Find all integers a ≠ 3 for which a − 3 is a divisor of a^3 − 17.

4. Prove that the set P of all primes that are greater than 2 is split into two disjoint

classes: primes of the form 4k + 1 and primes of the form 4k + 3. Similarly, P is

split into two other disjoint classes: primes of the form 6k + 1 and primes of the

form 6k + 5.

5. Prove that any prime of the form 3k + 1 is also of the form 6k + 1 (but for a

different k, of course).

6. GAP remembers all 168 primes not exceeding 1000. The command Primes[i];

gives you the ith prime. Using GAP:

(a) Create two lists of prime numbers, called Primes1 and Primes3, include in

the ﬁrst list all the primes p ≤ 1000 for which p = 4k + 1 and include in

the second list all the primes p ≤ 1000 for which p = 4k + 3.

(b) Output the number of primes in each list.

(c) Output the 32nd prime from the ﬁrst list and the 53rd prime from the second

list.

(d) Output the positions of 601 and 607 in their respective lists.

7. (a) Use GAP to list all primes up to 1000 representable in the form 6k + 5.

(b) Prove that there are inﬁnitely many primes representable in the form 6k + 5.

8. Give an alternative proof that the number of primes is inﬁnite along the following

lines:

3 See http://www.mersenne.org/various/57885161.htm.


• Given n, and assuming that p1, . . . , pr are the only primes, find an upper bound f (n) for the number of values not exceeding n that products of powers of p1, . . . , pr might assume.

• Show that f (n) grows more slowly than n for n sufﬁciently large.

1.1.3 Factoring Integers. The Sieve of Eratosthenes

None of the ideas we have learned up to now will help us to find the prime factorisation

of a particular integer n. Finding prime factorisations is not an easy task, and there

are no simple ways to do so. The theorem that we will prove in this section is of

some help since it tells us where to look for the smallest prime divisor of n.

Firstly, we have to deﬁne the following useful function.

Deﬁnition 1.1.3 Let x be a real number. By ⌊x⌋ we denote the largest integer n such that n ≤ x. The integer ⌊x⌋ is called the integer part of x or the ﬂoor of x.

Example 1.1.6 ⌊π⌋ = 3, ⌊√19⌋ = 4, ⌊−2.1⌋ = −3.

Theorem 1.1.4 The smallest prime divisor of a composite number n is less than or equal to ⌊√n⌋.

Proof We prove ﬁrst that n has a divisor which is greater than 1 but not greater than √n. As n is composite, we have n = d1·d2, where d1 > 1 and d2 > 1. If d1 > √n and d2 > √n, then

n = d1·d2 > (√n)^2 = n,

which is impossible. Suppose, then, that d1 ≤ √n. Any of the prime divisors of d1 will be less than or equal to √n. But every divisor of d1 is also a divisor of n, thus the smallest prime divisor p of n will satisfy the inequality p ≤ √n. Since p is an integer, p ≤ ⌊√n⌋. □

Now we may demonstrate a beautiful and efﬁcient method of listing all primes

up to x, called the Sieve of Eratosthenes.

Algorithm (The Sieve of Eratosthenes): To ﬁnd all the primes up to x begin by writing down all the integers from 2 to x in ascending order. The ﬁrst number on the list is 2. Leave it there and cross out all other multiples of 2. Then use the following iterative procedure. Let d be the next smallest number on the list that is not eliminated. Leave d on the list and, if d ≤ √x, cross out all other multiples of it. If d > √x, then stop. The prime numbers up to x are those which have not been crossed out.


For example, if we write the integers from 2 to 100 in a ten-by-ten square table, then at the end of the process our table will look like:

2 3 5 7

11 13 17 19

23 29

31 37

41 43 47

53 59

61 67

71 73 79

83 89

97

The numbers in this table are all the primes not exceeding 100. Please note that we had to cross out only multiples of the primes from the ﬁrst row, since √100 = 10.
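The crossing-out procedure can be tried out in a few lines of code. The following sketch is in Python rather than GAP (which this book uses for its computations); the function name `eratosthenes` is mine.

```python
def eratosthenes(x):
    """List all primes up to x with the Sieve of Eratosthenes."""
    crossed = [False] * (x + 1)           # crossed[k] is True once k is crossed out
    primes = []
    for d in range(2, x + 1):
        if not crossed[d]:                # d survived all crossings, so d is prime
            primes.append(d)
            for multiple in range(d * d, x + 1, d):
                crossed[multiple] = True  # cross out all other multiples of d
    return primes

print(len(eratosthenes(100)))             # 25 primes survive, as in the table above
```

Starting the inner loop at d·d implements the stopping rule of the algorithm: for d > √x the range is empty, so nothing more is crossed out.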

The simplest algorithm for factoring integers is Trial Division.

Algorithm (Trial Division): Suppose a sufﬁciently long list of primes is available. Given a positive integer n, divide it with remainder by all primes on the list which do not exceed √n, starting from 2. The ﬁrst prime which divides n (call this prime p1) will be the smallest prime divisor of n. In this case n is composite. Calculate n1 = n/p1 and repeat the procedure. If none of the primes which do not exceed √n divide n, then n is prime, and its prime factorisation is trivial.

Using the list of primes stored by GAP in the array Primes we can apply the Trial Division algorithm to factorise numbers not exceeding one million. In practice, it is virtually impossible to completely factor a large number of about 100 decimal digits with Trial Division alone unless it has small prime divisors. Trial Division is, however, very fast for ﬁnding small factors (up to about 10^6) of n.
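A Python sketch of Trial Division (the book works in GAP; here every integer, not only a stored list of primes, is used as a trial divisor, which does not change the result because the smallest divisor found is always prime):

```python
def trial_division(n):
    """Return the prime factorisation of n as a list of primes with repetition."""
    factors = []
    p = 2
    while p * p <= n:                 # trial divisors up to sqrt(n) suffice
        while n % p == 0:             # p is the smallest divisor of n, hence prime
            factors.append(p)
            n //= p                   # pass to n1 = n / p and repeat
        p += 1
    if n > 1:
        factors.append(n)             # whatever remains is prime
    return factors

print(trial_division(999313))         # [7, 142759], the number of Example 1.1.7
```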

It is important to know how many operations will be needed to factorise n. If we do not know how many operations are needed, it is impossible to estimate the time it would take to use the Trial Division algorithm in the worst possible case: the case in which small factors are absent.

Let π(x) denote the number of primes which do not exceed x. Because of the

irregular occurrence of the primes, we cannot expect a simple formula for π(x). The

following simple program calculates this number for x = 1000.

gap> n:=1000;;
gap> piofx:=0;;
gap> p:=2;;
gap> while p < n+1 do
> p:=NextPrimeInt(p);
> piofx:=piofx+1;
> od;
gap> piofx;
168

As we see there are 168 primes not exceeding 1000. GAP stores them in an array

Primes. For example, the command

gap> Primes[100];
541

shows that the 100th prime is 541.

One of the most impressive results in advanced number theory gives an asymptotic

approximation for π(x).

Theorem 1.1.5 (The Prime Number Theorem)

lim_{x→∞} π(x)·ln x / x = 1, (1.6)

where ln x is the natural logarithm, to base e.

The proof is beyond the scope of this book. The ﬁrst serious attempt towards proving this theorem (which was long conjectured to be true) was made by Chebyshev4

who proved (1848–1850) that if the limit exists at all, then it is necessarily equal to

one. The existence of the limit (1.6) was proved independently by Hadamard5 and

de la Vallée-Poussin6 with both papers appearing almost simultaneously in 1896.

Corollary 1.1.1 For a large positive integer n there exist approximately n/ln n primes among the numbers 1, 2, . . . , n. This can be expressed as

π(n) ∼ n / ln n, (1.7)

where ∼ means approximately equal for large n. (In Sect. 2.3 we will give it a precise meaning.)
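The quality of the approximation (1.7) can be checked numerically. A small Python sketch (the naive prime-counting helper `pi` is mine):

```python
import math

def pi(x):
    """The number of primes not exceeding x, counted naively by trial division."""
    count = 0
    for n in range(2, x + 1):
        if all(n % d != 0 for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count

for x in (100, 1000, 10000):
    print(x, pi(x), round(x / math.log(x)))
```

For x = 1000 this prints 168 against the estimate 145, the two values quoted in Example 1.1.7 below.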

4 Pafnutii Lvovich Chebyshev (1821–1894) was a Russian mathematician who is largely remem-

bered for his investigations in number theory. Chebyshev is also famous for the orthogonal polyno-

mials he invented. He had a strong interest in mechanics as well.

5 Jacques Salomon Hadamard (1865–1963) was a French mathematician whose most important

result is the prime number theorem which he proved in 1896. He worked on entire functions and

zeta functions and became famous for introducing Hadamard matrices and Hadamard transforms.

6 Charles Jean Gustave Nicolas Baron de la Vallée-Poussin (1866–1962) is best known for his

proof of the prime number theorem and his major work Cours d’Analyse. He was additionally known

for his writings about the zeta function, Lebesgue and Stieltjes integrals, conformal representation,

and algebraic and trigonometric series.


Example 1.1.7 Suppose n = 999313. Then ⌊√n⌋ = 999. Using (1.7) we approximate π(999) as 999/6.9 ≈ 145. The real value of π(999), as we know, is 168. The number 999 is too small for the approximation in (1.7) to be good.

So, if we try to ﬁnd a minimal prime divisor of n using Trial Division, then, in the worst case scenario, we might need to perform 168 divisions. However n = 7 · 142759, where the latter number is prime. So 7 will be discovered after four divisions only and factored out, but we will need to perform 74 additional divisions to prove that 142759 is prime by dividing 142759 by all primes smaller than or equal to ⌊√142759⌋ = 377.

The following two facts are also related to the distribution of primes. Both facts

are useful to know and easy to remember.

Theorem 1.1.6 (Bertrand's Postulate) For each positive integer n > 3 there is a prime p such that n < p < 2n − 2.

In 1845 Bertrand7 conjectured that there is at least one prime between n and 2n − 2 for every n > 3 and checked it for numbers up to at least 2 · 10^6. This conjecture, similar to one stated by Euler one hundred years earlier, was proved by Chebyshev in 1850.

Theorem 1.1.7 There are arbitrarily large gaps between consecutive primes.

Proof This follows from the fact that, for any positive integer n, all the numbers

n! + 2, n! + 3, . . . , n! + n

are composite: for each k with 2 ≤ k ≤ n the number n! + k is divisible by k, since

n! + k = k (n!/k + 1). □
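The construction in this proof is easy to check in code. A Python sketch (the helper names are mine):

```python
from math import factorial

def gap_block(n):
    """The n - 1 consecutive numbers n! + 2, ..., n! + n from the proof."""
    return [factorial(n) + k for k in range(2, n + 1)]

def is_composite(m):
    """True if m has a nontrivial divisor."""
    return any(m % d == 0 for d in range(2, int(m ** 0.5) + 1))

# Each n! + k is divisible by k for 2 <= k <= n, so all of them are composite.
assert all(is_composite(m) for m in gap_block(6))
print(gap_block(6))                   # [722, 723, 724, 725, 726]
```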

Exercises

1. (a) Use the Sieve of Eratosthenes to ﬁnd the prime numbers up to 210. Hence

calculate π(210) exactly.

(b) Calculate the estimate that the Prime Number Theorem gives for π(210) and

compare your result with the exact value of π(210) obtained in (a).

2. Convince yourself that the following program implements the Sieve of

Eratosthenes

7 Joseph Louis François Bertrand (1822–1900), born and died in Paris, was a professor at the

École Polytechnique and Collège de France. He was a member of the Paris Academy of Sciences

and was its permanent secretary for twenty-six years. Bertrand made a major contribution to group

theory and published many works on differential geometry and on probability theory.


n:=2*10^3;;

set:=Set([2..n]);;

p:=2;;

while p<RootInt(n)+1 do

k:=2;;

while k*p<n+1 do

RemoveSet(set,k*p);

k:=k+1;

od;

p:=NextPrimeInt(p);

od;

3. Professor Woodhead has compiled a list of all primes that are less than 10,000

and is very proud of himself. He checks that the number n = 123123137 does

not have any prime divisors in his list by dividing n by all of the primes that he

found.

(a) Can he claim that n is prime?

(b) Estimate the number of additional divisions that Professor Woodhead must

perform in order to be able to claim that n is prime.

4. A composite number n does not have prime divisors which are less than or equal to ∛n. Prove that it is a product of two primes.

5. Use Bertrand’s postulate to show that any integer greater than 6 is the sum of two

relatively prime integers each of which is greater than 1.

6. What would be the output for the following GAP program?

n:=10^6;

set:=Set([1..n]);

p:=3;

while p<n+1 do;

k:=1;

while k*p<n+1 do;

RemoveSet(set,k*p);

k:=k+1;

od;

p:=NextPrimeInt(p);

od;

set;

1.2 The Euclidean Algorithm

Suppose that the positive integer n has the prime factorisation

n = p1^α1 p2^α2 . . . pr^αr , (1.8)

where the pi are distinct primes and the αi are positive integers. How can we ﬁnd all divisors of n? Let d be a divisor of n. Then n = dm, for some m.

Since the prime factorisation of n is unique, d cannot have in its prime factorisation

a prime which is not among the primes p1 , p2 , . . . , pr . Also, a prime pi in the prime

factorisation of d cannot have an exponent greater than αi . Therefore

d = p1^β1 p2^β2 . . . pr^βr , 0 ≤ βi ≤ αi , i = 1, 2, . . . , r. (1.9)

Conversely, every number of this form is a divisor of n, and each exponent βi can independently take any of the αi + 1 values 0, 1, 2, . . . , αi . Thus the total number of divisors will be exactly the product

d(n) = (α1 + 1)(α2 + 1) . . . (αr + 1). (1.10) □

It is important to note that Eq. (1.10) does not give us a complete algorithm for

the calculation of d(n) as we need to run the factorisation algorithm ﬁrst. No direct

method of calculation is known.
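Formula (1.10) is easy to apply once a factorisation is available. A Python sketch (the book computes with GAP; the helper names here are mine):

```python
def prime_exponents(n):
    """Prime factorisation of n as a dictionary {p_i: alpha_i}."""
    exps, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps

def d(n):
    """Number of divisors of n via (1.10): the product of the (alpha_i + 1)."""
    result = 1
    for alpha in prime_exponents(n).values():
        result *= alpha + 1
    return result

print(d(60))    # 60 = 2^2 * 3 * 5, so d(60) = (2+1)(1+1)(1+1) = 12
```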

Deﬁnition 1.2.1 The numbers i·n, where i = 0, ±1, ±2, . . . , are called multiples of n.

Every multiple m of n has the form

m = k p1^γ1 p2^γ2 . . . pr^γr , γi ≥ αi , i = 1, 2, . . . , r,

where k has none of the primes p1 , p2 , . . . , pr in its prime factorisation. The number of multiples of n is inﬁnite.

Let a and b be two positive integers. If d is a divisor of a and also a divisor of b,

then we say that d is a common divisor of a and b. As there are only a ﬁnite number

of common divisors, there is the greatest common divisor, denoted by gcd(a, b). The

number m is said to be a common multiple of a and b if m is a multiple of a and also


a multiple of b. Among all common multiples there is a minimal one (Least Integer

Principle!). It is called the least common multiple and it is denoted by lcm(a, b).

In the decomposition (1.8), all exponents were positive. However, sometimes it

is convenient to allow some exponents to be 0 as in the formulation of the following

theorem.

Theorem 1.2.2 Let

a = p1^α1 p2^α2 . . . pr^αr , b = p1^β1 p2^β2 . . . pr^βr

be two arbitrary positive integers, where αi ≥ 0 and βi ≥ 0. (We could assume that a and b are expressed using the same primes p1 , p2 , . . . , pr because we allowed some exponents to be 0.) Then

gcd(a, b) = p1^min(α1,β1) p2^min(α2,β2) . . . pr^min(αr,βr) , (1.11)

and

lcm(a, b) = p1^max(α1,β1) p2^max(α2,β2) . . . pr^max(αr,βr) . (1.12)

Moreover,

gcd(a, b) · lcm(a, b) = a · b. (1.13)

Proof Formulas (1.11) and (1.12) follow from our description of common divi-

sors and common multiples. To prove (1.13) we have to notice that min(αi , βi ) +

max(αi , βi ) = αi + βi . �
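Formulas (1.11)–(1.13) translate directly into code. The following Python sketch (helper names mine) computes the gcd and lcm from the min and max of the exponents and checks the identity (1.13):

```python
def exponents(n):
    """Prime factorisation of n as a dictionary {p: exponent}."""
    e, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            e[p] = e.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        e[n] = e.get(n, 0) + 1
    return e

def gcd_lcm(a, b):
    """gcd and lcm via (1.11) and (1.12): min and max of the exponents."""
    ea, eb = exponents(a), exponents(b)
    g = l = 1
    for p in set(ea) | set(eb):
        g *= p ** min(ea.get(p, 0), eb.get(p, 0))
        l *= p ** max(ea.get(p, 0), eb.get(p, 0))
    return g, l

g, l = gcd_lcm(200, 567)
assert g == 1                # 200 = 2^3 * 5^2 and 567 = 3^4 * 7 share no prime
assert g * l == 200 * 567    # the identity (1.13)
```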

b = 84474819 = 3^5 · 11^2 · 13^2 · 17 = 2^0 · 3^5 · 11^2 · 13^2 · 17^1 .

Theorem 1.2.2 gives us an algorithm for calculating the greatest common divisor.

However, it depends on the factorisation algorithm, which is computationally difﬁcult

using existing methods. It is suspected but has not yet been proved that no easy

algorithms for prime factorisation exist. So it is desirable in any number theoretic

algorithm to avoid factorisation of the numbers involved. The algorithm given above

for ﬁnding the greatest common divisor cannot be used unless prime factorisation has

already been done. Fortunately, the greatest common divisor gcd(a, b) of numbers

a and b can be found without knowing the prime factorisations of a and b. Such

an algorithm will be presented below. It was known to Euclid; he could even have

been the ﬁrst to have discovered it. The algorithm is based on the following simple

observation.


Suppose that a = qb + r for some integers q and r. Then

gcd(a, b) = gcd(b, r).

Proof Suppose that d is a common divisor of a and b, so that a = a′d and b = b′d. Then r = a − qb = a′d − qb′d = (a′ − qb′)d and d is also a common divisor of b and r. Also, if d is a common divisor of b and r, then b = b′d, r = r′d and a = qb + r = qb′d + r′d = (qb′ + r′)d, whence d is a common divisor of a and b. □

Now to the algorithm. The idea of it is clear: start with the pair (a, b) for which

the greatest common divisor is sought, and replace it with a “smaller” pair with the

same greatest common divisor. Repeat the process (if necessary) until the greatest

common divisor is easily seen.

Theorem 1.2.3 (The Euclidean Algorithm) Let a and b be positive integers. We use the division algorithm several times to ﬁnd:

a = q1 b + r1 , 0 < r1 < b,
b = q2 r1 + r2 , 0 < r2 < r1 ,
r1 = q3 r2 + r3 , 0 < r3 < r2 ,
...
rs−2 = qs rs−1 + rs , 0 < rs < rs−1 ,
rs−1 = qs+1 rs .

Then the last nonzero remainder rs is the greatest common divisor of a and b:

gcd(a, b) = gcd(b, r1) = · · · = gcd(rs−1 , rs ) = rs . □

Example 1.2.2 Let a = 321, b = 843. Find the greatest common divisor gcd(a, b).

The Euclidean algorithm yields

843 = 2 · 321 + 201

321 = 1 · 201 + 120

201 = 1 · 120 + 81

120 = 1 · 81 + 39

81 = 2 · 39 + 3

39 = 13 · 3 + 0,

and therefore gcd(321, 843) = 3 and lcm(321, 843) = (321 · 843)/3 = 107 · 843 = 90201.
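The chain of divisions in Example 1.2.2 is exactly what the following Python loop performs (the book's own computations are in GAP):

```python
def euclid(a, b):
    """Greatest common divisor by the Euclidean algorithm."""
    while b != 0:
        a, b = b, a % b       # replace the pair (a, b) by (b, r)
    return a

g = euclid(321, 843)
print(g, 321 * 843 // g)      # prints: 3 90201
```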

Deﬁnition 1.2.2 If gcd(a, b) = 1, then the numbers a and b are said to be relatively prime (or coprime).

For example, the numbers 200 = 2^3 · 5^2 and 567 = 3^4 · 7 are coprime.

Exercises

1. How many divisors does the number 2^2 · 3^3 · 4^4 · 5^5 have? (No GAP, please.)

2. How many divisors does the number 123456789 have?

3. Find all common divisors of 10650 and 6750.

4. (a) Find the greatest common divisor and the least common multiple of m = 2^4 · 3^2 · 5^7 · 11^2 and n = 2^2 · 5^4 · 7^2 · 11^3 .

(b) Use GAP to check the identity lcm(m, n) · gcd(m, n) = m · n.

5. Find all positive integers n ≤ 10000 with exactly 33 distinct positive divisors.

6. Calculate d(d(246^246)), where d(n) is the number of divisors of n.

7. Show that gcd(a, b) = gcd(a, a − b).

8. Show that the fraction (8n + 13)/(13n + 21) is in lowest possible terms for every n ≥ 1.

9. Suppose two positive integers a and b are relatively prime.

(a) Prove that gcd(a^2, a + b) = 1.

(b) Suppose a + b and a^2 + b^2 are not relatively prime. Find the greatest common divisor of this pair and give an example of two such integers.

10. Show that any two distinct Fermat numbers are coprime. (Use Exercise 4 of Sect. 1.1.1.)

11. Use Fermat numbers to give an alternative proof that the number of primes is inﬁnite.


Given two integers a and b we can consider all their possible linear combinations

k1 a + k2 b, where k1 , k2 ∈ Z. Let us denote this set by <a, b>. We note that a and b

belong to this set since a = 1 · a + 0 · b and b = 0 · a + 1 · b. We also note that when

we add two numbers from <a, b>, even with some coefﬁcients, we always remain

in <a, b>. Indeed, suppose we have linear combinations k1 a + k2 b and k1′ a + k2′ b. Then

u(k1 a + k2 b) + v(k1′ a + k2′ b) = (uk1 + vk1′ )a + (uk2 + vk2′ )b,

which again belongs to <a, b>.

Analysing the chain of divisions with remainder in the formulation of Theo-

rem 1.2.3, we come to the conclusion that all remainders ri , i = 1, 2, . . . , s belong

to <a, b>. In particular, gcd(a, b) belongs to <a, b>. This is an important fact so

we formulate it as a theorem for further reference.


Theorem 1.2.4 Let a and b be positive integers. Then there exist integers m and n

such that

gcd(a, b) = ma + nb. (1.14)

The numbers m and n in (1.14) are not unique, moreover there exist inﬁnitely

many such pairs. However, sometimes, knowing even one pair of such numbers is

more important than knowing the greatest common divisor itself. One pair of numbers

m and n satisfying (1.14) can be easily obtained from the Euclidean algorithm by back

substitution. The following theorem provides us with a convenient way of calculating

them. It also gives an alternative proof of the existence of m and n based on Linear

Algebra.

Theorem 1.2.5 (The Extended Euclidean Algorithm) Let us write the following matrix with two rows R1 , R2 , and three columns C1 , C2 , C3 :

               ⎡ R1 ⎤   ⎡ a 1 0 ⎤
[C1 C2 C3 ] =  ⎣ R2 ⎦ = ⎣ b 0 1 ⎦ .

Then, using the quotients q1 , q2 , . . . produced by the Euclidean algorithm, perform the row operations R3 := R1 − q1 R2 , R4 := R2 − q2 R3 , . . . , each time creating a new row, so as to obtain:

                  ⎡ a    1    0         ⎤
                  ⎢ b    0    1         ⎥
                  ⎢ r1   1    −q1       ⎥
[C1′ C2′ C3′ ] =  ⎢ r2   −q2  1 + q1 q2 ⎥ .
                  ⎢ ...                 ⎥
                  ⎣ rs   m    n         ⎦

Then rs = gcd(a, b) and gcd(a, b) = ma + nb.

Proof Note that C1 = aC2 + bC3 . In Linear Algebra you have learned that elementary row operations do not change linear relationships between columns. Since the new rows were obtained by means of elementary row operations on the existing rows, the relationships between the columns C1′ , C2′ , C3′ of the resulting matrix must be exactly the same as those between the columns C1 , C2 , C3 (see Sect. 10.1 of the Appendix for justiﬁcation). Thus we conclude that C1′ = aC2′ + bC3′ . In particular, rs = ma + nb. □

Example 1.2.3 Let a = 321, b = 843. Find a linear presentation of the greatest

common divisor in the form gcd(a, b) = ma + nb.

The Euclidean algorithm on these numbers was performed in Example 1.2.2 and

we know that gcd(321, 843) = 3 and all the quotients obtained at each division. The

Extended Euclidean algorithm yields


321    1    0 |
843    0    1 | 0
321    1    0 | 2
201   −2    1 | 1
120    3   −1 | 1
 81   −5    2 | 1
 39    8   −3 | 2
  3  −21    8 | 13

where for convenience of performing row operations the quotients are placed on

the right of the bar. Thus we obtain the linear presentation gcd(321, 843) = 3 =

(−21) · 321 + 8 · 843. So m = −21 and n = 8.
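The row bookkeeping of Theorem 1.2.5 can be carried out programmatically. In the following Python sketch each row (r, m, n) keeps the invariant r = m·a + n·b, exactly like the rows of the matrix above:

```python
def extended_euclid(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b."""
    r0, m0, n0 = a, 1, 0              # the row (a, 1, 0)
    r1, m1, n1 = b, 0, 1              # the row (b, 0, 1)
    while r1 != 0:
        q = r0 // r1                  # next quotient of the Euclidean algorithm
        r0, m0, n0, r1, m1, n1 = (r1, m1, n1,
                                  r0 - q * r1, m0 - q * m1, n0 - q * n1)
    return r0, m0, n0

print(extended_euclid(321, 843))      # (3, -21, 8), as in Example 1.2.3
```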

The properties of relatively prime numbers gathered in the following are often

used.

Lemma 1.2.1 Let a and b be relatively prime positive integers. Then:

(a) a and b do not have common primes in their prime factorisations;

(b) if c is a multiple of a and c is also a multiple of b, then c is a multiple of ab;

(c) if ac is a multiple of b, then c is a multiple of b;

(d) there exist integers m, n such that ma + nb = 1.

Proof Let

a = p1^α1 p2^α2 . . . pr^αr , b = p1^β1 p2^β2 . . . pr^βr

be prime factorisations of a and b in which all primes dividing either number appear (so some exponents may be 0). Then by Theorem 1.2.2

gcd(a, b) = p1^min(α1,β1) p2^min(α2,β2) . . . pr^min(αr,βr) = 1,

which implies min(αi , βi ) = 0 for all i = 1, 2, . . . , r . This means that either the

prime pi does not enter the prime factorisation of a or it does not enter the prime

factorisation of b. Thus a and b do not have primes in common. This proves (a).

Let us prove (b). As we know from (a) the numbers a and b do not have primes

in common in their prime factorisations. Hence

a = p1^α1 p2^α2 . . . pr^αr , b = q1^β1 q2^β2 . . . qs^βs ,

where no pi coincides with any qj . Since c is a multiple of both a and b, we can write

c = p1^α1 p2^α2 . . . pr^αr k = q1^β1 q2^β2 . . . qs^βs m.

Since the prime factorisation is unique, k must be divisible by q1^β1 q2^β2 . . . qs^βs , which is b, and m must be divisible by p1^α1 p2^α2 . . . pr^αr , which is a. As a result, c is divisible by ab, which proves (b).


Let us prove (c). Suppose that ac is a multiple of b, say ac = bd. Then

ac = p1^α1 p2^α2 . . . pr^αr c = bd = q1^β1 q2^β2 . . . qs^βs d.

Due to the uniqueness of the prime factorisation of ac, the number c must be divisible by q1^β1 q2^β2 . . . qs^βs , which is b.

Now (d) follows from Theorem 1.2.5. □

The following result is extremely important. Its author is not known exactly but

it could be Sun Tzu (or Sun Zi)8 in whose book it was ﬁrst mentioned.

Theorem 1.2.6 (The Chinese Remainder Theorem) Let a and b be two relatively

prime numbers, 0 ≤ r < a and 0 ≤ s < b. Then there exists a unique number N such that 0 ≤ N < ab and

r = N mod a, s = N mod b. (1.15)

Proof Let us prove, ﬁrst, that there exists at most one integer N with the conditions required. Assume, on the contrary, that for two integers N1 and N2 we have 0 ≤ N1 < ab, 0 ≤ N2 < ab and

r = N1 mod a = N2 mod a, s = N1 mod b = N2 mod b.

Without loss of generality let us assume that N1 ≥ N2 . Then the number M = N1 − N2 satisﬁes 0 ≤ M < ab and is a multiple of both a and b. Since a and b are relatively prime, M is a multiple of ab. As 0 ≤ M < ab, this gives M = 0 and N1 = N2 .

Now we will ﬁnd an integer N such that r = N mod a and s = N mod b,

ignoring the condition 0 ≤ N < ab. By Theorem 1.2.4 there are integers m, n such

that gcd(a, b) = 1 = ma + nb. Multiplying this equation by r − s we get the equation

r − s = (r − s)ma + (r − s)nb = m′a + n′b,

where m′ = (r − s)m and n′ = (r − s)n. Now put

N = r − m′a = s + n′b.

8 Sun Tzu (3rd–5th century AD) (or Sun Zi) was a Chinese mathematician and astronomer. He

investigated Diophantine equations. He authored “Sun Tzu’s Calculation Classic”, which contained,

among other things, the Chinese remainder theorem.


It clearly satisﬁes condition (1.15). If N does not satisfy 0 ≤ N < ab, we divide

N by ab with remainder. Let N = q · ab + N1 , where N1 is the remainder. Then

0 ≤ N1 < ab and N1 satisﬁes (1.15) since N1 has the same remainder as N on

division by a and also by b. The theorem is proved. �

Example 1.2.4 Let us ﬁnd the number N such that 0 ≤ N < 991 · 441 and 5 = N mod 991, 8 = N mod 441. (The moduli 991 and 441 are relatively prime.) The Extended Euclidean Algorithm yields

991    1     0 |
441    0     1 | 2
109    1    −2 | 4
  5   −4     9 | 21
  4   85  −191 | 1
  1  −89   200 | 4

thus yielding 1 = (−89) · 991 + 200 · 441. We may write 8 − 5 = 3 =

(−267) · 991 + 600 · 441 and obtain the number N = −264592 = 8 − 600 · 441 =

5 + (−267) · 991, which satisﬁes all the requirements apart from being between

0 and 437031 = 991 · 441. We divide N by 437031 with remainder. We have

−264592 = (−1) · 437031 + 172439, and the remainder N1 = 172439 will be the

number required.
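The whole computation of this example follows the proof of Theorem 1.2.6 mechanically, so it can be scripted. A Python sketch (the function names are mine):

```python
def extended_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b."""
    r0, m0, n0, r1, m1, n1 = a, 1, 0, b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, m0, n0, r1, m1, n1 = r1, m1, n1, r0 - q * r1, m0 - q * m1, n0 - q * n1
    return r0, m0, n0

def crt(r, a, s, b):
    """The unique 0 <= N < a*b with N mod a = r and N mod b = s (gcd(a,b) = 1)."""
    g, m, n = extended_gcd(a, b)      # 1 = m*a + n*b
    N = r - (r - s) * m * a           # then N mod a = r and N mod b = s
    return N % (a * b)                # reduce into the range 0 <= N < ab

print(crt(5, 991, 8, 441))            # 172439, the number found above
```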

Exercises

1. Use the Extended Euclidean Algorithm to ﬁnd the greatest common divisor d of

3773 and 3596 and ﬁnd any integers x and y such that d = 3773x + 3596y.

2. Using the Extended Euclidean Algorithm, ﬁnd at least one pair of integers (x, y)

satisfying 1840x +1995y = 5, and at least three pairs of integers (z, w) satisfying

1840z + 1995w = −10.

3. Let a, b, c and d be non-negative integers with c > 1 and d > 1. Suppose that

there exists an integer N such that

4. (a) Find any integer y such that

(b) Find the unique integer x such that 0 ≤ x < 3550 and


(b) Let a and b be non-zero integers. Describe the set of integers c for which

there exist integers x and y satisfying the equation ax + by = c.

1.3 Fermat's Little Theorem and Its Generalisations

Deﬁnition 1.3.1 Let n be a positive integer. The number of positive integers not exceeding n and relatively prime to n is denoted by φ(n). This function is called Euler's φ-function or Euler's totient function.

Let us denote by Zn the set {0, 1, 2, . . . , n−1} and by Z∗n the set of those positive

numbers from Zn that are relatively prime to n. Then φ(n) is the number of elements

of Z∗n , i.e., φ(n) = |Z∗n |.

Example 1.3.1 Let n = 20. Then Z∗20 = {1, 3, 7, 9, 11, 13, 17, 19} and φ(20) = 8.

Lemma 1.3.1 If n = p^k, where p is prime, then φ(n) = p^k − p^(k−1) = p^k (1 − 1/p).

Proof It is easy to list all numbers in Zn that are not relatively prime to p^k: they are 0 together with the multiples 1·p, 2·p, 3·p, . . . , (p^(k−1) − 1)·p of p. The nonzero ones number exactly p^(k−1) − 1. To obtain Z∗n we have to remove from Zn all these p^(k−1) − 1 numbers and also 0. Therefore Z∗n will contain p^k − (p^(k−1) − 1) − 1 = p^k − p^(k−1) numbers. □

φ(n) is multiplicative in the following sense.

Theorem 1.3.1 Let m and n be any two relatively prime positive integers. Then

φ(mn) = φ(m)φ(n).

Proof Let Z∗m = {r1 , r2 , . . . , rφ(m) } and Z∗n = {s1 , s2 , . . . , sφ(n) }. Let us consider

an arbitrary pair (ri , s j ) of numbers, one from each of these sets. By the Chinese

Remainder Theorem there exists a unique positive integer Ni j such that 0 ≤ Ni j <

mn and

ri = Nij mod m, sj = Nij mod n,

that is,

Nij = am + ri , Nij = bn + sj (1.17)

for some integers a and b.


What is important is that Nij is relatively prime to m and also relatively prime to n: by (1.17), gcd(Nij , m) = gcd(m, ri ) = 1 and gcd(Nij , n) = gcd(n, sj ) = 1. Since m and n are relatively prime, Nij is relatively prime to mn, i.e., Nij ∈ Z∗mn . Clearly, different pairs (i, j) ≠ (k, l) yield different numbers, that is Nij ≠ Nkl for (i, j) ≠ (k, l).

We note that there are φ(m)φ(n) of the numbers Ni j , exactly as many as there are

pairs of the form (ri , s j ). This shows that φ(m)φ(n) ≤ φ(mn).

Suppose now that a number N ∈ Zmn is different from Nij for all i and j. Consider

r = N mod m, s = N mod n,

where either r does not belong to Z∗m or s does not belong to Z∗n (otherwise N would coincide with one of the Nij ). Assuming the

former, we get gcd(r, m) > 1. But then gcd(N , m) = gcd(m, r ) > 1, in which case

gcd(N , mn) > 1 too. Thus N does not belong to Z∗mn . This shows that the numbers

Ni j —and only these numbers—form Z∗mn . Therefore φ(mn) = φ(m)φ(n). �

This yields a general formula for Euler's function: if n = p1^α1 p2^α2 . . . pr^αr is the prime factorisation of n, then

φ(n) = n (1 − 1/p1)(1 − 1/p2) . . . (1 − 1/pr).

Proof We use Lemma 1.3.1 and Theorem 1.3.1 to compute φ(n). Repeatedly applying Theorem 1.3.1 we get

φ(n) = φ(p1^α1) φ(p2^α2) . . . φ(pr^αr).

Applying Lemma 1.3.1 to each factor,

φ(n) = p1^α1 (1 − 1/p1) · p2^α2 (1 − 1/p2) . . . pr^αr (1 − 1/pr)
     = p1^α1 p2^α2 . . . pr^αr · (1 − 1/p1)(1 − 1/p2) . . . (1 − 1/pr)
     = n (1 − 1/p1)(1 − 1/p2) . . . (1 − 1/pr),

as required. □

Example 1.3.2 φ(264) = φ(2^3 · 3 · 11) = 264 · (1/2) · (2/3) · (10/11) = 80. We also have φ(269) = 268 as 269 is prime.
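Euler's function is straightforward to compute from the product formula once n is factorised. A Python sketch (not the book's GAP; the function name `phi` is mine):

```python
def phi(n):
    """Euler's totient, via phi(n) = n * product of (1 - 1/p) over primes p | n."""
    result, p, m = n, 2, n
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p     # multiply result by (1 - 1/p)
        p += 1
    if m > 1:                         # a single prime factor above sqrt remains
        result -= result // m
    return result

print(phi(264), phi(269))             # 80 268, matching Example 1.3.2
```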


Corollary 1.3.1 If n = pq, where p and q are primes, then φ(n) = ( p−1)(q −1) =

pq − p − q + 1.

There are no known methods for computing φ(n) in situations where the prime

factorisation of n is not known. If n is so big that modern computers cannot factorise

it, you can publish n and keep φ(n) secret.

Exercises

1. Compute φ(125), φ(180) and φ(1001).

2. Factor n = 4386607, which is a product of two primes, given φ(n) = 4382136.

3. Find m = p^2 q^2 , given that p and q are primes and φ(m) = 11424.

4. Find the remainder of 2^(2^2013) on division by 5.

5. Using Fermat's Little Theorem ﬁnd the remainder on dividing by 7 the number 333^555 + 555^333 .

the result consistent with the hypothesis that n is prime?

7. Let p > 2 be a prime. Prove that all prime divisors of 2^p − 1 have the form 2kp + 1.

Deﬁnition 1.3.2 Let a and b be integers and m be a positive integer. We say that

a is congruent to b modulo m and write a ≡ b mod m if a and b have the same

remainder on dividing by m, that is a mod m = b mod m.

For example, 41 ≡ 80 mod 13 since the numbers 41 and 80 both have remainder

2 when divided by 13. Also, 41 ≡ −37 mod 13. When a and b are not congruent we write a ≢ b mod m. For example, 41 ≢ 7 mod 13 because 41 has remainder 2 when divided by 13, and 7 has remainder 7.

Lemma 1.3.2 (Criterion) Let a and b be two integers and m be a positive integer.

Then a ≡ b mod m, if and only if a − b is divisible by m.

Proof Let a = q1 m + r1 and b = q2 m + r2 , where 0 ≤ r1 , r2 < m. Then a − b = (q1 − q2 )m + (r1 − r2 ) with −m < r1 − r2 < m, so a − b is divisible by m if and only if r1 − r2 is divisible by m. But this can happen if and only if r1 − r2 = 0, or r1 = r2 , which is the same as a ≡ b mod m. □


Lemma 1.3.3 Let a and b be two integers and m be a positive integer. Then

(a) if a ≡ b mod m and c ≡ d mod m, then a + c ≡ b + d mod m;

(b) if a ≡ b mod m and c ≡ d mod m, then ac ≡ bd mod m;

(c) if a ≡ b mod m and n is a positive integer, then a n ≡ bn mod m;

(d) if ac ≡ bc mod m and c is relatively prime to m, then a ≡ b mod m.

Proof (b) If a ≡ b mod m and c ≡ d mod m, then m | (a − b) and m | (c − d), i.e.,

a − b = im and c − d = jm for some integers i, j. Then

ac −bd = (ac −bc)+(bc −bd) = (a −b)c +b(c −d) = icm + jbm = (ic + jb)m,

whence ac ≡ bd mod m.

(c) Follows immediately from (b).

(d) Suppose that ac ≡ bc mod m and gcd(c, m) = 1. Then, by the criterion,

(a − b)c = ac − bc is a multiple of m. As gcd(c, m) = 1, by Lemma 1.2.1(c) a − b

is a multiple of m, and by the criterion a ≡ b mod m. �

Theorem (Fermat's Little Theorem) Let p be a prime. If an integer a is not divisible by p, then a^(p−1) ≡ 1 mod p. Also a^p ≡ a mod p for all a.

Proof Consider the numbers a, 2a, 3a, . . . , (p − 1)a. They all have different remainders on dividing by p. Indeed, suppose that for some

1 ≤ i < j ≤ p−1 we have ia ≡ ja mod p. Then by Lemma 1.3.3(d) a can be

canceled and i ≡ j mod p, which is impossible. Therefore these remainders are

1, 2, . . . , p − 1 and by repeated application of Lemma 1.3.3(b) we have

a · 2a · · · (p − 1)a ≡ 1 · 2 · · · (p − 1) mod p,

which is

(p − 1)! · a^(p−1) ≡ (p − 1)! mod p.

Since (p − 1)! is relatively prime to p, it can be canceled by Lemma 1.3.3(d), and we get a^(p−1) ≡ 1 mod p. When a is relatively prime to p, the last statement follows from the ﬁrst one. If a is a multiple of p the last statement is also clear. □

Example 1.3.3 Let us ﬁnd 328^2013 mod 7. Firstly we note that 6 = 328 mod 7 and using Lemma 1.3.3(c) we ﬁnd that 328^2013 mod 7 = 6^2013 mod 7. Now we have to reduce 2013. We can do this using

Fermat’s Little Theorem. Since for all a relatively prime to 7 we have a 6 ≡ 1 mod 7

we can replace 2013 with its remainder on division by 6. Since 3 = 2013 mod 6 we

obtain

3282013 mod 7 = 62013 mod 7 = 63 mod 7 = 6.
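Python's built-in three-argument pow performs modular exponentiation, so the computation above can be verified directly:

```python
# 328 = 46*7 + 6, so 328 is congruent to 6 (mod 7); and 2013 = 335*6 + 3, so by
# Fermat's Little Theorem 328^2013 is congruent to 6^2013 and to 6^3 = 216 (mod 7).
assert 328 % 7 == 6 and 2013 % 6 == 3
assert pow(6, 3, 7) == 6
assert pow(328, 2013, 7) == 6
print("all checks pass")
```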


Fermat’s Little Theorem is a powerful (but not perfect) tool for checking primality.

Let

p := 2074722246773485207821695222107608587480996474721117292752992589912196684750549658310084416732550077.

Then the GAP session

gap> PowerMod(3,p-1,p);

1

gap> q:=p^2;;

gap> PowerMod(3,q-1,q)=1;

false

shows that 3^(p−1) ≡ 1 mod p, but for q = p^2 we have 3^(q−1) ≢ 1 mod q, thus revealing the compositeness of q. We will discuss primality checking thoroughly in Sect. 2.4.3.

Despite its usefulness, Fermat’s Little Theorem has limited applicability since the

modulus p must be a prime. The following theorem generalises it to an arbitrary

positive integer n. It will be very important in cryptographic applications.

Theorem (Euler's Theorem) Let n be a positive integer and let a be relatively prime to n. Then

a^φ(n) ≡ 1 mod n.

Proof Let Z∗n = {z1 , z2 , . . . , zφ(n) }. Both zi and a are relatively prime to n, therefore zi a is also relatively prime to

n. Suppose that ri = z i a mod n, i.e., ri is the remainder on dividing z i a by n.

Since gcd(z i a, n) = gcd(ri , n), one has ri ∈ Z∗n . These remainders are all different.

Indeed, suppose that ri = rj for some 1 ≤ i < j ≤ φ(n). Then zi a ≡ zj a mod n. By

Lemma 1.3.3(d) a can be canceled and we get z i ≡ z j mod n, which is impossible.

Therefore the remainders r1 , r2 , . . . , rφ(n) coincide with z1 , z2 , . . . , zφ(n) , apart from the order in which they are listed. Thus

(z1 a)(z2 a) · · · (zφ(n) a) ≡ z1 z2 · · · zφ(n) mod n,

which is

Z · a^φ(n) ≡ Z mod n,

where Z = z1 z2 · · · zφ(n) . The number Z is relatively prime to n, so it can be canceled by Lemma 1.3.3(d), and we get a^φ(n) ≡ 1 mod n. □


Example 1.3.4 Using Euler's Theorem compute the last decimal digit (units digit) of the number 3^2007.

Since the last decimal digit of 3^2007 is equal to 3^2007 mod 10, we have to calculate this remainder. As gcd(3, 10) = 1 and φ(10) = 4 we have 3^4 ≡ 1 mod 10. As 3 = 2007 mod 4 we obtain

3^2007 mod 10 = 3^3 mod 10 = 7.
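Again this is immediate to confirm with modular exponentiation in Python:

```python
# phi(10) = 4 and gcd(3, 10) = 1, so the exponent 2007 may be reduced mod 4.
assert 2007 % 4 == 3
assert pow(3, 3, 10) == 7
assert pow(3, 2007, 10) == 7    # the last decimal digit of 3^2007 is 7
print("last digit:", pow(3, 2007, 10))
```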

Exercises

1. Show that:

(a) Both sides of the congruence and its modulus can be simultaneously divided

by a common positive divisor.

(b) If a congruence holds modulo m, then it also holds modulo d, where d is an

arbitrary divisor of m.

(c) If a congruence holds for moduli m 1 and m 2 , then it also holds modulo

lcm(m 1 , m 2 ).

2. Without using mathematical induction show that 722n+2 − 472n + 282n−1 is

divisible by 25 for any n ≥ 1.

3. Find all positive integer solutions x, y to the equation φ(3x 5 y ) = 600, where φ

is the Euler totient function.

4. List all positive integers a such that 0 ≤ a ≤ 242 for which the congruence

x 162 ≡ a mod 243 has a solution.

5. Without resorting to FactorsInt command, factorise n if it is known that it is

a product of two primes and that φ(n) = 3308580.

1.4 The Ring of Integers Modulo n. The Field Zp

Birds of eternity sing there.
And you too find a ring in your heart.

We shall now turn Zn into an algebraic structure by deﬁning two algebraic operations on it. First, given a, b ∈ Zn, we deﬁne a new addition a ⊕ b by

a ⊕ b := a + b mod n. (1.18)

The result of this operation is a remainder on division by n, therefore it is always in Zn.


Theorem 1.4.1 The new addition modulo n satisﬁes the following properties:

1. It is commutative, a ⊕ b = b ⊕ a, for all a, b ∈ Zn .

2. It is associative, a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c, for all a, b, c ∈ Zn .

3. Element 0 (zero) is the unique element such that a ⊕ 0 = 0 ⊕ a = a, for every

a ∈ Zn .

4. For each a ∈ Zn there exists a unique element (−a) := n − a ∈ Zn such that

a ⊕ (−a) = (−a) ⊕ a = 0.

Proof Only the second property is not completely obvious. We prove it by noting

that a ⊕ b ≡ a + b mod n. Then by Lemma 1.3.3(a)

(a ⊕ b) ⊕ c ≡ (a ⊕ b) + c ≡ (a + b) + c mod n

and

a ⊕ (b ⊕ c) ≡ a ⊕ (b + c) ≡ a + (b + c) mod n,

so (a ⊕ b) ⊕ c ≡ a ⊕ (b ⊕ c) mod n. Both of these numbers belong to Zn , hence the difference between them is less than n in absolute value. Therefore (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c). □

Deﬁnition 1.4.1 An algebraic system < G, + > which consists of a set G together

with an algebraic operation + deﬁned on it is said to be a commutative group if the

following axioms are satisﬁed:

CG1 The operation is commutative, a + b = b + a, for all a, b ∈ G.

CG2 The operation is associative, a + (b + c) = (a + b) + c, for all a, b, c ∈ G.

CG3 There exists a unique element 0 such that a + 0 = 0 + a = a, for all a ∈ G.

CG4 For every element a ∈ G there exists a unique element −a such that a +

(−a) = (−a) + a = 0, for all a ∈ G.

Thus we can reformulate Theorem 1.4.1 by saying that < Zn , ⊕ > is a commu-

tative group.

In particular, in Zn the equation a ⊕ x = b always has a unique solution, namely

x = (−a) ⊕ b.

Proof Suppose that a ⊕ x = b, where x is a solution. Add (−a) to both sides of the

equation. We get

(−a) ⊕ (a ⊕ x) = (−a) ⊕ b,

from where, by using properties 1–4, we can ﬁnd that (−a)⊕(a ⊕ x) = ((−a)⊕a)⊕

x = 0 ⊕ x = x, hence x = (−a) ⊕ b. Similar computations show that x = (−a) ⊕ b

is indeed a solution. �

1.4 The Ring of Integers Modulo n. The Field Z p 29

Example 1.4.2 In Z26 we have 8 ⊕ 13 = 21.

Now, given a, b ∈ Zn , we define a new multiplication a ⊙ b by

a ⊙ b =df ab mod n. (1.19)

Again, the result is a remainder of division by n, so it is always in Zn .

Example 1.4.3 In Z12 the following identities hold: 5 ⊙ 5 = 1, 2 ⊙ 4 = 8, 4 ⊙ 6 = 0.

Theorem 1.4.2 The new multiplication modulo n satisfies the following properties:

5. It is commutative, a ⊙ b = b ⊙ a, for all a, b ∈ Zn .

6. It is associative, a ⊙ (b ⊙ c) = (a ⊙ b) ⊙ c, for all a, b, c ∈ Zn .

7. It is distributive relative to the addition, a ⊙ (b ⊕ c) = (a ⊙ b) ⊕ (a ⊙ c) and (a ⊕ b) ⊙ c = (a ⊙ c) ⊕ (b ⊙ c), for all a, b, c ∈ Zn .

8. There is a unique element 1 in Zn such that a ⊙ 1 = 1 ⊙ a = a, for every a ∈ Zn .

Proof Statements 5 and 8 are clear. The other two can be proved as in Theorem 1.4.1. □

Properties 1–8 in algebraic terms mean that Zn together with the operations ⊕ and ⊙ is a commutative ring with a unity element 1, a structure which is defined by the following set of axioms.

Deﬁnition 1.4.2 An algebraic system < R, +, · > which consists of a set R together

with two algebraic operations + and · deﬁned on it is said to be a commutative ring

if the following axioms are satisﬁed:

CR1 < R, + > is a commutative group.

CR2 The operation · is commutative, a · b = b · a, for all a, b ∈ R.

CR3 The operation · is associative, a · (b · c) = (a · b) · c, for all a, b, c ∈ R.

CR4 There exists a unique element 1 ∈ R such that a · 1 = 1 · a = a, for all a ∈ R.

CR5 The distributive law holds, that is, a · (b + c) = a · b + a · c, for all a, b, c ∈ R.

Example 1.4.4 Other commutative rings include the ring of polynomials Z[x] with

integer coefﬁcients or else with rational or real coefﬁcients. The set of all n × n

matrices over the integers Zn×n is also a ring but not commutative since axiom CR2

is not satisﬁed.

Deﬁnition 1.4.3 An element a of a ring R is called invertible if there exists an

element b in R such that a · b = b · a = 1. An element b in this case is called a

multiplicative inverse of a.

Lemma 1.4.1 If a ∈ Zn possesses a multiplicative inverse, then this inverse is

unique.


Proof Suppose that b and c are both multiplicative inverses of a, that is, a ⊙ b = b ⊙ a = 1 and a ⊙ c = c ⊙ a = 1. Then

b ⊙ (a ⊙ c) = b ⊙ 1 = b,

and

(b ⊙ a) ⊙ c = 1 ⊙ c = c,

and by associativity the two left-hand sides coincide, hence b = c. □

If a multiplicative inverse of a exists, it is denoted a−1 .

Theorem 1.4.3 All elements from Z∗n are invertible in Zn .

Proof Let a ∈ Z∗n . Then gcd(a, n) = 1, and we can write a linear presentation of this greatest common divisor 1 = ua + vn. Let us divide u by n with remainder w. We have u = qn + w, where 0 ≤ w < n, and we substitute qn + w in place of u:

1 = (qn + w)a + vn = wa + (qa + v)n.

Hence wa ≡ 1 (mod n), so w ⊙ a = a ⊙ w = 1 in Zn and w is the multiplicative inverse of a. □

Example 1.4.5 Find 11−1 in Z26 and solve 11 ⊙ x ⊕ 5 = 3.

Solution. We use the Extended Euclidean algorithm:

    26    1    0
    11    0    1    2
     4    1   −2    2
     3   −2    5    1
     1    3   −7    3

Each row contains rk , uk , vk with rk = uk · 26 + vk · 11; the last column holds the quotient used at that step. The last row gives 1 = 3 · 26 − 7 · 11, hence 11−1 = −7 mod 26 = 19. Now 11 ⊙ x = 3 ⊕ (−5) = 3 ⊕ 21 = 24. Finally, x = 11−1 ⊙ 24 = 19 ⊙ 24 = 14. □
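The computation in Example 1.4.5 can be automated. The following Python sketch (ours; function names are not from the book) implements the extended Euclidean algorithm and the resulting modular inverse.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and g = u*a + v*b."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    # gcd(a, b) = gcd(b, a mod b); back-substitute the coefficients.
    return g, v, u - (a // b) * v

def inverse_mod(a, n):
    """Multiplicative inverse of a in Z_n; exists iff gcd(a, n) = 1."""
    g, u, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError(f"{a} is not invertible modulo {n}")
    return u % n
```

Here `extended_gcd(26, 11)` returns `(1, 3, -7)`, matching the linear presentation 1 = 3 · 26 − 7 · 11, and `inverse_mod(11, 26)` returns 19, so the solution of 11 ⊙ x ⊕ 5 = 3 is `(19 * 24) % 26 == 14`.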

Deﬁnition 1.4.4 A nonzero element a ∈ Zn is called a zero divisor if there exists

another nonzero element b ∈ Zn such that a b = 0.

Example 1.4.6 4 ⊙ 5 = 0 in Z10 .

Lemma 1.4.2 A divisor of zero in Zn is never invertible.

Proof Suppose that a ⊙ b = 0, a ≠ 0, b ≠ 0 and a is invertible, that is, a−1 exists. Then we have a−1 ⊙ (a ⊙ b) = a−1 ⊙ 0 = 0. The left-hand side is equal to a−1 ⊙ (a ⊙ b) = (a−1 ⊙ a) ⊙ b = 1 ⊙ b = b, hence b = 0, a contradiction. □


Theorem 1.4.4 Every nonzero element of Zn which does not belong to Z∗n is a zero divisor in Zn . All elements from Z∗n are invertible in Zn and all other elements are not invertible.

Proof Let a be a nonzero element with d = gcd(a, n) > 1 and set m = n/d. Then m is nonzero and am = (a/d)n is divisible by n. Thus a ⊙ m = 0 and a is a zero divisor. Thus, in Zn , aside from Z∗n , we have the zero element and the zero divisors, and by Lemma 1.4.2 the zero divisors are not invertible. On the other hand, by Theorem 1.4.3 all elements of Z∗n are invertible. □

Hence, depending on n, the following property may or may not be true for Zn :

9. For every nonzero a ∈ Zn there is a unique element a−1 ∈ Zn such that a ⊙ a−1 = a−1 ⊙ a = 1.
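This three-way split of Zn into zero, invertible elements and zero divisors can be observed directly. A small Python sketch (ours, not from the book):

```python
from math import gcd

def classify(n):
    """Split the nonzero elements of Z_n into units and zero divisors."""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    zero_divisors = [a for a in range(1, n)
                     if any((a * b) % n == 0 for b in range(1, n))]
    return units, zero_divisors
```

For n = 12 this returns units `[1, 5, 7, 11]` and zero divisors `[2, 3, 4, 6, 8, 9, 10]`; together with 0 they exhaust Z12 , as Theorem 1.4.4 predicts.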

Deﬁnition 1.4.5 A commutative ring < R, +, · > is called a ﬁeld if the following

axiom is satisﬁed

F1 For every nonzero a ∈ R there is a unique element a −1 ∈ R such that a · a −1 =

a −1 · a = 1.

Zn is a field if and only if n is prime. Indeed, if n = p is prime, then Z∗p = Z p \ {0}, and by Theorem 1.4.3 all elements of Z∗p are invertible. Hence Z p is a field. Conversely, suppose Zn is a field. Since all non-zero elements of any field are invertible, by Theorem 1.4.4 Z∗n = Zn \ {0}, that is, all positive integers smaller than n are relatively prime to n. This is possible only when n is prime. □

Exercises

1. Prove that in any commutative ring R a divisor of zero is not invertible. (Hint:

prove ﬁrst that for any a ∈ R we have a · 0 = 0. Then follow the proof of

Lemma 1.4.2.)

2. (a) List all invertible elements of Z16 and for each invertible element a give its

inverse a −1 .

(b) List all zero divisors of Z15 and for each zero divisor a give all non-zero elements b such that a ⊙ b = 0.

3. (a) Which one of the two elements 74 and 77 is invertible in Z111 and which one is a zero divisor? For the invertible element a, give the inverse a−1 and for the zero divisor b give the element c ∈ Z111 such that b ⊙ c = c ⊙ b = 0.

(b) Solve the equations 77 ⊙ x ⊕ 21 = 10 and 74 ⊙ x ⊕ 11 = 0 in Z111 .

4. Let a and b be two elements of the ring Z21 and let f : Z21 → Z21 be a linear

function defined by f (x) = a ⊙ x ⊕ b (where the operations are computed in

Z21 ).


(a) Describe the set of all pairs (a, b) for which the function f is one-to-one.

(b) Find the range of the function f for the case a = 7, b = 3.

(c) Suppose a = 4 and b = 15. Find the inverse function f −1 (x) = c ⊙ x ⊕ d

which satisﬁes f −1 ( f (x)) = x for each x ∈ Z21 .

5. How many solutions in Z11 does the equation x^102 = 4 have? List them all.

6. Given an odd number m > 1, ﬁnd the remainder when 2φ(m)−1 is divided by m.

This remainder should be expressed in terms of m.

7. (Wilson’s Theorem) Let p be an integer greater than one. Prove that p is prime if

and only if ( p −1)! = −1 in Z p . (Hint: 1 and −1 = p −1 are the only self-inverse

elements of Z∗p .)

8. Prove that any commutative ﬁnite ring R (unity is not assumed) without zero

divisors is a ﬁeld.

In the decimal system the zero and the first nine positive integers are denoted by symbols 0, 1, 2, . . . , 9, respectively. These symbols are called digits. The same symbols are used to represent all the integers. The tenth integer is denoted as 10 and an arbitrary integer N can now be represented in the form

N = d0 + d1 · 10 + d2 · 10^2 + · · · + dn · 10^n , (1.20)

where the digits d0 , d1 , . . . , dn take values 0, 1, 2, . . . , 9. For example,

1998 = 8 + 9 · 10 + 9 · 10^2 + 1 · 10^3 .

In this notation the meaning of a digit depends on its position. Thus two digit symbols

“9” are situated in the tens and the hundreds places and their meaning is different.

In general, for the number N given by (1.20) we write

N = dn dn−1 . . . d1 d0 ,

rather than the longer expression (1.20), to emphasise the exceptional role of 10.

invention has been attributed to the Sumerians or the Babylonians. It was further

developed by Hindus, and proved to be of enormous signiﬁcance to civilisation. In

Roman symbolism, for example, one wrote MCMXCVIII for 1998.

1.5 Representation of Numbers 33

It is clear that more and more new symbols such as I, V, X, C, M are needed as the

numbers get larger while with the Hindu positional system, now in use, we need

only ten “Arabic numerals” 0, 1, 2, . . . , 9, no matter how large the number is. The

positional system was introduced into medieval Europe by merchants, who learned

it from the Arabs. It is exactly this system which is to blame for the fact that the

ancient art of computation, once conﬁned to a few adepts, has become a routine

algorithmic skill that can be done automatically by a machine, and is now taught in

primary school.

Mathematically, there is nothing special about the decimal system. The use of ten

as the base goes back to the dawn of civilisation, and is attributed to the fact that

we have ten ﬁngers on which to count. Other numbers could be used as the base,

and undoubtedly some of them were used. The number words in many languages

show remnants of other bases, mainly twelve, ﬁfteen and twenty. For example, in

English the words for 11 and 12 and in Spanish the words for 11, 12, 13, 14 and 15

are not constructed on the decimal principle. In French the word for 20—vingt—

suggests that that number had a special role at some time in the past. The Babylonian

astronomers had a system of notation with base 60. This is believed to be the reason

for the customary division of the hour and the angular degree into 60 minutes. In the

theorem that follows we show that an arbitrary positive integer b > 1 can be used as

a base.

Theorem 1.5.1 Let b > 1 be a positive integer. Then every positive integer N can

be uniquely represented in the form

N = d0 + d1 · b + d2 · b^2 + · · · + dn · b^n , (1.21)

where the digits di satisfy 0 ≤ di < b.

Proof We use induction on N . For N = 1 the representation 1 = 1 is unique. Suppose, inductively, that every integer 1, 2, . . . , N − 1 is uniquely representable. Now consider the integer N . Let

d0 = N mod b. Then N − d0 is divisible by b and let N1 = (N − d0 )/b. Since

N1 < N , by the induction hypothesis N1 is uniquely representable in the form

N1 = (N − d0 )/b = d1 + d2 · b + d3 · b^2 + · · · + dn · b^(n−1) .

Then clearly

N = d0 + N1 b = d0 + d1 b + d2 b2 + · · · + dn bn

Finally, suppose that N has some other representation in this form, i.e.,

N = d0 + d1 b + d2 b2 + · · · + dn bn = e0 + e1 b + e2 b2 + · · · + en bn .


Comparing the remainders on division by b we get d0 = e0 = r , where r = N mod b. Then the number

N1 = (N − r )/b = d1 + d2 · b + d3 · b^2 + · · · + dn · b^(n−1) = e1 + e2 · b + e3 · b^2 + · · · + en · b^(n−1)

has two different representations which contradicts the inductive assumption, since

we have assumed the truth of the result for all N1 < N . �

We write N = (dn dn−1 . . . d1 d0 )(b) to express (1.21). The digits di can be found by the repeated application of the division

algorithm as follows:

N = q1 b + d0 , (0 ≤ d0 < b)

q1 = q2 b + d1 , (0 ≤ d1 < b)

..

.

qn = 0 · b + dn (0 ≤ dn < b)

For example, the positional system with base 5 employs the digits 0, 1, 2, 3, 4 and

we can write

1998(10) = 3 · 54 + 0 · 53 + 4 · 52 + 4 · 5 + 3 = 30443(5) .
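The repeated-division scheme above takes only a few lines of Python (our sketch, not from the book): the digits of N in base b are the successive remainders, read in reverse.

```python
def to_base(N, b):
    """Digits of N in base b, most significant digit first."""
    digits = []
    while N > 0:
        N, d = divmod(N, b)   # one division step: N = q*b + d, 0 <= d < b
        digits.append(d)
    return digits[::-1] or [0]
```

Here `to_base(1998, 5)` returns `[3, 0, 4, 4, 3]`, i.e. 30443(5) , and `to_base(150, 2)` returns the binary digits appearing in (1.23).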

But in the era of computers it is the binary (or dyadic) system (base 2) that has

emerged as the most important. This system has only two digits, 0 and 1, and a very

simple multiplication table for them. But under the binary system, representations

of numbers get longer quickly. For example,

150(10) = 1 · 2^7 + 0 · 2^6 + 0 · 2^5 + 1 · 2^4 + 0 · 2^3 + 1 · 2^2 + 1 · 2 + 0 = 10010110(2) . (1.23)

Leibniz9 was one of the ardent proponents of the binary system. According to

Laplace: “Leibniz saw in his binary arithmetic the image of creation. He imag-

ined that Unity represented God, and zero the void; that the Supreme Being drew

all beings from the void, just as unity and zero express all numbers in his system of

numeration.”

9 Gottfried Wilhelm von Leibniz (1646–1716) was a German mathematician and philosopher who

developed inﬁnitesimal calculus independently of Isaac Newton, and Leibniz’s mathematical nota-

tion has been widely used ever since it was published. He invented an early mechanical calculating

machine.


Let us look at the binary representation of a number from the information point

of view. Information is measured in bits. One bit is a unit of information expressed

as a choice between two possibilities 0 and 1. The number of binary digits in the

binary representation of a number N is therefore the number of bits we need to

transmit N through an information channel (or input into a computer). For example,

the Eq. (1.23) shows that we need 8 bits to transmit or convey the number 150.

Thus, to transmit a positive integer N we need ⌊log2 N ⌋ + 1 bits of information.

Proof Suppose that N has n binary digits in its binary representation. That is, 2^(n−1) ≤ N < 2^n . Taking logarithms to base 2 we get n − 1 ≤ log2 N < n, so ⌊log2 N ⌋ = n − 1. Hence n = ⌊log2 N ⌋ + 1. □

Example 1.5.2 The input is the number 15011. Convert it to binary. What is the

length of this input?

Solution. Let 15011 = (an an−1 . . . a1 a0 )(2) be the binary representation of 15011.

We can ﬁnd the binary digits of 15011 recursively by a series of divisions with

remainder:

15011 = 2 · 7505 + 1 −→ a0 = 1,

7505 = 2 · 3752 + 1 −→ a1 = 1,

3752 = 2 · 1876 + 0 −→ a2 = 0,

1876 = 2 · 938 + 0 −→ a3 = 0,

938 = 2 · 469 + 0 −→ a4 = 0,

469 = 2 · 234 + 1 −→ a5 = 1,

234 = 2 · 117 + 0 −→ a6 = 0,

117 = 2 · 58 + 1 −→ a7 = 1,

58 = 2 · 29 + 0 −→ a8 = 0,

29 = 2 · 14 + 1 −→ a9 = 1,

14 = 2 · 7 + 0 −→ a10 = 0,

7 = 2·3+1 −→ a11 = 1,

3 = 2·1+1 −→ a12 = 1,

1 = 2·0+1 −→ a13 = 1,

From these divisions we see that 15011 = 11101010100011(2) , reading the binary digits from the column

of remainders from bottom to top. Hence the length of the input is 14 bits. �

Example 1.5.3 To estimate from above and from below the number of bits required to input an integer N which has 100 digits in its decimal representation we may use the GAP command LogInt(N,2) to calculate ⌊log2 N ⌋. A 100-digit integer is between 10^99 and 10^100 , so we have


gap> LogInt(10ˆ100,2)+1;

333

gap> LogInt(10ˆ99,2)+1;

329

So the number in this range will need between 329 and 333 bits.
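In Python (our illustration, not the book's) the same bounds come from `int.bit_length()`, which returns exactly ⌊log2 N⌋ + 1 for N ≥ 1:

```python
# int.bit_length() gives the number of binary digits of a positive integer,
# i.e. floor(log2 N) + 1, reproducing the GAP computation above.
upper = (10**100).bit_length()   # enough bits for any 100-digit integer
lower = (10**99).bit_length()    # bits needed for the smallest 100-digit integer
```

This gives `upper == 333` and `lower == 329`, in agreement with the GAP output.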

The negative powers of 10 are used to express those real numbers which are not

integers. This also works in other bases. For example,

1/8 = 0.125(10) = 1/10 + 2/10^2 + 5/10^3 = 0/2 + 0/2^2 + 1/2^3 = 0.001(2)

and

1/7 = 0.142857142857 . . .(10) = 0.(142857)(10) = 0.001001 . . .(2) = 0.(001)(2) .
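The doubling procedure behind such expansions can be sketched in Python (ours, not from the book): at every step the next binary digit of p/q is the integer part of twice the current fractional part.

```python
def binary_fraction(p, q, ndigits):
    """First ndigits binary digits of the fraction p/q (0 < p < q)."""
    digits = []
    for _ in range(ndigits):
        p *= 2
        digits.append(p // q)   # next binary digit: integer part of 2*(p/q)
        p %= q                  # keep only the fractional part and continue
    return digits
```

For example, `binary_fraction(1, 8, 3)` gives `[0, 0, 1]`, i.e. 0.001(2) , and `binary_fraction(1, 7, 6)` gives `[0, 0, 1, 0, 0, 1]`, the beginning of the period (001).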

The binary expansions of irrational numbers, such as

√5 = 10.001111000110111 . . .(2) ,

may look like a random sequence of bits, and one could be tempted to use them as a source of random bits. But this method is considered to be insecure. The number √5 in the example above can be guessed from the initial segment, which will reveal the whole sequence.

Exercises

1. Find the binary representation of the number 2002(10) and the decimal represen-

tation of the number 1100101(2) .

2. (a) Find the binary representation of the number whose decimal representation

is 2011.

(b) Find the decimal representation of the number whose binary representation

is 101001000.

3. Use Euler’s Theorem to ﬁnd the last three digits in the binary representation of

751015 .

4. How many non-zero digits are there in the binary representation of the integer

. . . 001 (2) ?

n m

digits of n. Prove that n is divisible by 6 if and only if the sum a + b + c + d of

its digits is divisible by 6.

6. The symbols A, B, C, D, E and F are used to denote the digits 10, 11, 12, 13, 14

and 15, respectively, in the hexadecimal representation (i.e., to base 16).

(a) Find the decimal representation of 2A4F(16) .

(b) Find the hexadecimal representation of 1000(10) .

Chapter 2

Cryptology

Nikolai Roerich (1874–1947)

Cryptography is about communication in the presence of adversaries. In medieval times diplomats had to communicate with their superiors using

a messenger. Messengers could be killed, and letters could be captured and read

by adversaries. During times of war, orders from military headquarters needed to

be sent to the line ofﬁcers without being intercepted and understood by the enemy.

The case of a war is an extreme example where the adversary is clearly deﬁned. But

there are also situations where the existence of an ‘adversary’ is less obvious. For

example, corporate deals and all negotiations must remain secret until completed.

Sometimes two parties want to communicate privately even if they do not have any

adversaries. For example, they wish to exchange love letters, and conﬁdentiality of

messages for them remains a very high priority. Thus, a classical goal of cryptography

is privacy. Authentication is another goal of cryptography which is any process by

which you verify that someone is indeed who they claim they are. We use passwords

to ensure that only certain people have access to certain resources (for example, if

you do your banking on the Internet you do not want other people to know your

ﬁnancial situation or to tamper with your accounts). Digital signatures are a special

technique for achieving authentication. Apart from signing your encrypted emails,

digital signatures are used for other applications, for example to ensure that auto-

matic software updates originate from the company they are supposed to, rather than

being viruses. Digital signatures are to electronic communication what handwritten

signatures are to paper-based communication. Nowadays cryptography has matured

and it is addressing an ever increasing number of other goals.

In his article “Cryptography”1 Ronald Rivest writes: “The invention of radio

gave a tremendous impetus to cryptography, since an adversary can eavesdrop easily

1 Chapter 13. Handbook of Theoretical Computer Science. J. Van Leeuwen (ed.) (Elsevier, 1990)

pp. 717–755.


38 2 Cryptology

over great distance. The course of World War II was signiﬁcantly affected by use,

misuse, and breaking of cryptographic systems used for radio traffic. It is intriguing that the computational engines designed and built by the British to crack the German

Enigma cipher are deemed by some to be the ﬁrst real “computers”; one could argue

that cryptography is the mother (or at least the midwife) of computer science.” (This

chapter can be downloaded from Ron Rivest’s web page.)

Here, Rivest mentions the famous “Colossus” computers. Until recently all infor-

mation about them was classiﬁed. The Colossus computers were built by a dedicated

team of British mathematicians and engineers led by Alan Turing and Tommy Flowers. They were extensively used in the cryptanalysis of high-level German communications. It is believed that this heroic effort shortened the Second World War by many

months. Recently Colossus was recreated and outperformed a modern computer (in

deciphering messages which had been encrypted using the Lorenz SZ 40/42 cipher

machine).2 Due to the secrecy that surrounded everything related to Colossus, there

arose a myth that the ENIAC was the ﬁrst large-scale electronic digital calculator in

the world. It was not.

One of the oldest ciphers known is Atbash. It even appears in the Hebrew Scriptures

of the Bible. Any occurrence of the ﬁrst letter of the alphabet is replaced by the last

letter, occurrences of the second letter are replaced by the second to last etc. Atbash

is a speciﬁc example of a general technique called inversion.

Caesar is also a very old cipher used by Gaius Julius Caesar (100 BC–44 BC).

Letters are simply replaced by letters three steps further down the alphabet. This way

‘a’ becomes ‘d’, ‘b’ becomes ‘e’ etc. In fact, any cipher using a displacement of any

size is now known as a Caesar. Caesar is a speciﬁc example of a general technique

called displacement.
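Both historical substitutions are easy to model. A Python sketch (ours, not from the book; it assumes messages consist of capital letters only):

```python
from string import ascii_uppercase as ABC

def atbash(text):
    """Inversion: the i-th letter of the alphabet maps to the (25 - i)-th."""
    return "".join(ABC[25 - ABC.index(ch)] for ch in text)

def caesar(text, shift=3):
    """Displacement: each letter moves `shift` steps down the alphabet."""
    return "".join(ABC[(ABC.index(ch) + shift) % 26] for ch in text)
```

For instance, `caesar("ABC")` returns `"DEF"`, and applying `atbash` twice returns the original text, since inversion is its own inverse.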

These two ciphers are examples of the so-called substitution methods which use

a mapping of an alphabet onto itself that replace a character with the one it maps

onto. If the mapping does not change within the message, the scheme is known as a

mono-alphabet scheme. Such cryptosystems were not very secure but were sufficient when literacy was not widespread.

For both of these cryptosystems it is essential to keep the method of encryption

secret, because even publicising the idea on which it is based might give away an

essential part of the security of the system, especially if the adversary managed to

intercept sufﬁciently many encrypted messages.

2 For more about this exciting project, and for further historical information about Colossus, see

http://www.codesandcyphers.org.uk/lorenz/rebuild.htm.

2.1 Classical Secret-Key Cryptology 39

By the end of the 19th century it became clear that security must be introduced differently. In 1883 Auguste Kerckhoffs3 [9] wrote two journal articles titled La Cryptographie Militaire, in which he stated six design principles for military ciphers. His main idea, now called Kerckhoffs' Principle, was that security must come not from keeping the encryption mechanism itself secret, but from keeping secret a changeable part of the mechanism, called the secret key. Depending on the secret key the encryption mechanism should encrypt messages differently. So, even if the adversary knows the encryption method but does not know the key, they will not know how to decrypt messages.

Thus, until recently, a standard cryptographic solution to the privacy problem was

a secret-key cryptosystem, which consisted of the following:

• A message space M: a set of strings (plaintext messages) over some alphabet (e.g.,

binary alphabet, English, Cyrillic or Arabic alphabets);

• A ciphertext space C: a set of strings (ciphertext messages) over some alphabet

(e.g., the alphabet of the dancing men in one of the Arthur Conan Doyle’s stories

of Sherlock Holmes);

• A key space K: a set of strings (keys) over some alphabet;

• An encryption algorithm E : M × K → C, which to every pair m ∈ M and k ∈ K

puts in correspondence a ciphertext E(m, k);

• A decryption algorithm D : C ×K → M with the property that D(E(m, k), k) = m

for all m ∈ M and k ∈ K.

The meaning of the last condition is that if a message is encrypted with a key k,

then the same key, when used in the decryption algorithm, will decrypt this message

from the ciphertext.

To use a secret-key cryptosystem the parties wishing to communicate privately

agree on a key k ∈ K, which they must keep secret. They communicate a message

m ∈ M by sending the ciphertext c = E(m, k). The recipient can decrypt the

ciphertext to obtain the message m by means of the key k and the decryption algorithm

D since m = D(c, k). The cryptosystem is considered to be secure if it is infeasible

in practice for an eavesdropper, who has discovered E(m, k) but does not know k, to

deduce m.

Below we present three examples.

The one-time pad is a nearly perfect solution to the privacy problem. It was invented

in 1917 by Gilbert Vernam (D. Kahn, The Codebreakers, Macmillan, New York,

1967) for use in telegraphy. In this secret-key cryptosystem the key is as long as the

message being encrypted. The key, once used, is discarded and never reused.

3 Auguste Kerckhoffs (1835–1903) was a Dutch linguist and cryptographer who was professor of

languages at the Ecole des Hautes Etudes Commerciales in Paris.


Suppose that the parties managed to generate a very long string k of randomly chosen

0’s and 1’s. Suppose that they also managed to secretly deliver k to all parties involved

with the intention to use it as a key. If a party A wishes to send a telegraphic message

m to other parties, then it writes the message as a string of zeros and ones m =

m1 m2 . . . mn , takes the ﬁrst n numbers from k, that is k = k1 k2 . . . kn and adds these

two strings component-wise mod 2 to get the encrypted message

c = m ⊕ k = c1 c2 . . . cn , where ci = mi ⊕ ki .

Then A destroys the ﬁrst n numbers of the key. On the receiving end all other parties

decrypt the message c by computing m = c ⊕ k and also destroy the ﬁrst n numbers

of the key. When another message is to be sent, another part of the key will be

used—hence the name “one-time pad.” This system is unconditionally secure in

the following sense. If c = c1 c2 . . . cn is the ciphertext, then an arbitrary message

m = m1 m2 . . . mn could be sent. Indeed, if the key were m ⊕ c, then m ⊕ (m ⊕ c) = c

and the ciphertext is c.

For written communication this system can be modiﬁed as follows. Each letter of

the alphabet is given a number in Z26 :

A B C D E F G H I J K L M

0 1 2 3 4 5 6 7 8 9 10 11 12

N O P Q R S T U V W X Y Z

13 14 15 16 17 18 19 20 21 22 23 24 25

You then agree to use a book, little-known to the general public (considered as a

very long string of letters), as the secret key. For example, “The Complete Poems of

Emily Dickinson” would be a good choice.4 Then you do the same as we did with

telegraphic messages except that we add messages mod 26. Suppose we need to send

a message

BUY TELECOM SHARES

and that the key (the text of the book) starts with

Best Witchcraft is Geometry
To the magician's mind –
His ordinary acts are feats
To thinking of mankind.

We write the message and the first 16 letters of the key (ignoring spaces and punctuation) one under the other, convert both to numbers, and add them mod 26:

  B  U  Y  T  E  L  E  C  O  M  S  H  A  R  E  S
  1 20 24 19  4 11  4  2 14 12 18  7  0 17  4 18
  B  E  S  T  W  I  T  C  H  C  R  A  F  T  I  S
  1  4 18 19 22  8 19  2  7  2 17  0  5 19  8 18
  2 24 16 12  0 19 23  4 21 14  9  7  5 10 12 10
  C  Y  Q  M  A  T  X  E  V  O  J  H  F  K  M  K

The ciphertext sent is CYQMATXEVOJHFKMK.

This version of the one-time pad is much less secure as it is vulnerable to frequency

analysis.
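The letter-by-letter computation above can be sketched in Python (ours, not from the book; capital letters only, with a key at least as long as the message):

```python
from string import ascii_uppercase as ABC

def otp(text, key, decrypt=False):
    """Add (encrypt) or subtract (decrypt) the key letterwise mod 26."""
    sign = -1 if decrypt else 1
    return "".join(ABC[(ABC.index(m) + sign * ABC.index(k)) % 26]
                   for m, k in zip(text, key))

message = "BUYTELECOMSHARES"
key = "BESTWITCHCRAFTIS"        # the first 16 letters of the poem
cipher = otp(message, key)       # gives CYQMATXEVOJHFKMK, as above
```

Decryption is the same operation with the key subtracted: `otp(cipher, key, decrypt=True)` restores the message.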

Exercises

1. Use Khlebnikov's poem

Today I will go once again
Into life, into haggling, into the flea market,
And lead the army of my songs
To duel against the market tide.5

as the key of a one-time pad cryptosystem to decrypt the ciphertext

WOAPDYWCAUERKYWHZRGSXQJW.

2. Use the GAP command Random([0..25]); to generate a sequence of 20 random

letters of the alphabet.

3. Using the sequence obtained in the previous exercise as a key for a one-time pad

cryptosystem, encrypt and then decrypt back the sentence by Emily Dickinson

“I HAVE NO TIME TO HATE”. The GAP programs LettertoNumber and

NumbertoLetter found in Sect. 9.2.3 can help you to convert messages into

the digital format and back.

The affine cryptosystem is a substitution cipher which is also based on modular arithmetic. The key to this cryptosystem is a pair k = (a, b) of numbers a ∈ Z∗26 , b ∈ Z26 . Under this

5 Velemir Khlebnikov (1885–1922) was one of the key poets in the Russian Futurist movement but

his work and inﬂuence stretch far beyond it. He was educated as a mathematician and his poetry is

very abstract and mathematical. He experimented with the Russian language, drawing deeply upon

its roots.


system a number in Z26 is assigned to every letter of the alphabet as in the previous section. Each letter is encoded into the corresponding number x it is assigned to and then into the letter to which the number a ⊙ x ⊕ b is assigned. For instance, if a = 3 and b = 5, then the letter "H" has the numerical encoding "7". Then 3 ⊙ 7 ⊕ 5 = 0 is computed, and we note that "0" is the numerical encoding for "A", which shows that "H" is encrypted into "A". Using the key k = (3, 5), the message BUY TELECOM SHARES is encrypted as

INZKRMRLVPHAFERH

The requirement that a ∈ Z∗26 , i.e., gcd(a, 26) = 1, is needed to ensure that the encryption function

E(x) = a ⊙ x ⊕ b

is one-to-one. Indeed, if a were a zero divisor, we would have a ⊙ d = 0 for some nonzero d ∈ Z26 , so that E(x) = E(x ⊕ d) and E is not one-to-one. In particular, E(0) = E(d) and unambiguous decryption is impossible. On the other hand, if a is invertible, then a ⊙ x ⊕ b = y implies x = a−1 ⊙ (y ⊕ (−b)) and the decryption function exists:

D(y) = a−1 ⊙ (y ⊕ (−b)).

Since the key is very short, this system is not secure: one can simply use all keys one

by one and see which key gives a meaningful result. However it can be meaningfully

used in combination with other cryptosystems. For example, if we use this encryption

ﬁrst and then use a one-time pad (or the other way around), the frequency analysis

will be very much hampered.
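A Python sketch of this cryptosystem (ours, not from the book; it assumes Python 3.8+ for the modular inverse `pow(a, -1, 26)` and capital-letter messages):

```python
from string import ascii_uppercase as ABC

def affine_encrypt(text, a, b):
    """E(x) = (a*x + b) mod 26, applied letterwise."""
    return "".join(ABC[(a * ABC.index(ch) + b) % 26] for ch in text)

def affine_decrypt(text, a, b):
    """D(y) = a^(-1) * (y - b) mod 26; requires gcd(a, 26) = 1."""
    a_inv = pow(a, -1, 26)
    return "".join(ABC[(a_inv * (ABC.index(ch) - b)) % 26] for ch in text)
```

With the key k = (3, 5) of the example, `affine_encrypt("H", 3, 5)` returns `"A"`, and decryption inverts the map.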

Exercises

2. Using the afﬁne cryptosystem with the key k = (11, 13) encrypt the message

CRYPTO and decrypt the message DRDOFP.

3. In an afﬁne cryptosystem with an unknown key Eve guessed that the letter F was

encrypted as N and the letter K was encrypted as O. Help Eve to calculate the

key.

4. A plaintext (in English) has been encrypted using an afﬁne cryptosystem. The

obtained ciphertext is:

ljpcc puxya nip ljc cbhcx quxya wxrcp ljc aqo achcx nip ljc rskpn bipra ux ljcup

jkbba in alixc xuxc nip miplkb mcx riimcr li ruc ixc nip ljc rkpq bipr ix jua rkpq

ljpixc ux ljc bkxr in miprip sjcpc ljc ajkrisa buc ixc puxy li pwbc ljcm kbb ixc puxy

li nuxr ljcm ixc puxy li vpuxy ljcm kbb kxr ux ljc rkpqxcaa vuxr ljcm ux ljc bkxr

in miprip sjcpc ljc ajkrisa buc


Find the original plaintext. The following estimates of the relative frequencies of

the 26 letters in English texts may be of some help. You are also encouraged to

use any computer assistance you need.

a 0.082 j 0.002 s 0.063

b 0.015 k 0.008 t 0.091

c 0.028 l 0.040 u 0.028

d 0.043 m 0.024 v 0.010

e 0.127 n 0.067 w 0.023

f 0.022 o 0.075 x 0.001

g 0.020 p 0.019 y 0.020

h 0.061 q 0.001 z 0.001

i 0.070 r 0.060

A more sophisticated substitution cipher is the so-called Hill Cipher, which was invented in 1929 by Lester S. Hill. Instead of substituting letters it substitutes blocks of letters of fixed length m. The whole message is divided into such m-tuples and each m-tuple is encrypted separately as follows.

The key for this cryptosystem is an invertible m × m matrix over Z26 . Both matrix

operations, addition and multiplication, are deﬁned by means of addition and multi-

plication modulo 26. Since we will not be using any other operations, it is no longer

appropriate to write the symbols ⊕ and ⊙ for modular operations. To simplify things

we will use ordinary notation. We will consider the case m = 2 and therefore pairs

of letters and 2 × 2 matrices. Let

K = [ a  b ]
    [ c  d ]

be the key. A pair of letters (P1 , P2 ) is encrypted according to the scheme

(P1 , P2 ) → [ x1 ]  →  K [ x1 ] = [ y1 ]  →  (C1 , C2 ),
             [ x2 ]       [ x2 ]   [ y2 ]

where x1 , x2 are the numerical codes for P1 , P2 and y1 , y2 are the numerical codes for C1 , C2 . The invertibility of K is needed for the unambiguous recovery of x1 , x2 from y1 , y2 .


Suppose, for example, that the key is

K = [ 3  3 ]
    [ 2  5 ]

and suppose the plaintext message is HELP. Then this plaintext is represented by two pairs

HELP → [ H ] , [ L ]  →  [ 7 ] , [ 11 ] .
       [ E ]   [ P ]     [ 4 ]   [ 15 ]

Then we compute

[ 3  3 ] [ 7 ] = [ 7 ] ,      [ 3  3 ] [ 11 ] = [ 0 ]
[ 2  5 ] [ 4 ]   [ 8 ]        [ 2  5 ] [ 15 ]   [ 19 ]

(all operations mod 26) and convert back into letters:

[ 7 ] , [ 0 ]  →  [ H ] , [ A ]  →  HIAT.
[ 8 ]   [ 19 ]    [ I ]   [ T ]

The matrix K is invertible, hence an inverse K−1 exists such that K K−1 = K−1 K = I2 , where I2 is the identity matrix of order 2. It follows that

K−1 (K [ x1 ]) = I2 [ x1 ] = [ x1 ] ,
       [ x2 ]       [ x2 ]   [ x2 ]

so decryption amounts to multiplication by K−1 . In our case det(K) = 3 · 5 − 3 · 2 = 9 and 9−1 = 3 in Z26 , so

K−1 = 9−1 [ 5  23 ] = 3 [ 5  23 ] = [ 15  17 ] .
          [ 24  3 ]     [ 24  3 ]   [ 20   9 ]

To decrypt the ciphertext HIAT we convert it into pairs of numbers

HIAT → [ H ] , [ A ]  →  [ 7 ] , [ 0 ] .
       [ I ]   [ T ]     [ 8 ]   [ 19 ]

Then we compute

[ 15  17 ] [ 7 ] = [ 7 ] ,      [ 15  17 ] [ 0 ]  = [ 11 ]
[ 20   9 ] [ 8 ]   [ 4 ]        [ 20   9 ] [ 19 ]   [ 15 ]

and convert back into letters:

[ 7 ] , [ 11 ]  →  [ H ] , [ L ]  →  HELP.
[ 4 ]   [ 15 ]     [ E ]   [ P ]
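The encryption and decryption just performed can be sketched in Python (ours, not from the book; even-length messages of capital letters):

```python
from string import ascii_uppercase as ABC

def hill2(text, K):
    """Apply the 2x2 key matrix K over Z_26 to successive letter pairs."""
    out = []
    for i in range(0, len(text), 2):
        x1, x2 = ABC.index(text[i]), ABC.index(text[i + 1])
        out.append(ABC[(K[0][0] * x1 + K[0][1] * x2) % 26])
        out.append(ABC[(K[1][0] * x1 + K[1][1] * x2) % 26])
    return "".join(out)

K = [[3, 3], [2, 5]]
K_inv = [[15, 17], [20, 9]]      # the inverse computed in the example
```

As expected, `hill2("HELP", K)` returns `"HIAT"`, and `hill2("HIAT", K_inv)` returns `"HELP"`.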

Not every matrix over Z26 is invertible, however. Recall that the criterion of invertibility for matrices over R is a nonzero determinant. Since Z26 has zero divisors, we have to slightly modify the standard criterion.

Theorem 2.1.1 An n × n matrix K over Z26 is invertible if and only if det(K) is an

invertible element in Z26 .

Proof We will prove this theorem only for n = 2. Let us consider a 2 × 2 matrix

K = [ a  b ]
    [ c  d ]

whose determinant is Δ = det(K) = ad − bc. Let us compute

[ a  b ] [ d  −b ] = [ Δ  0 ] .     (2.1)
[ c  d ] [ −c  a ]   [ 0  Δ ]

Suppose Δ is not invertible. Then Δ is zero or a zero divisor, so there is a nonzero m ∈ Z26 with Δ · m = 0, and multiplying (2.1) by m we obtain

[ a  b ] [ d  −b ] · m = [ 0  0 ] .
[ c  d ] [ −c  a ]       [ 0  0 ]

Consider the matrix

L = [ d  −b ] · m .
    [ −c  a ]

If L is not the zero matrix, then K L = 0, and were K invertible we would get L = K−1 (K L) = K−1 0 = 0, a contradiction; thus we have shown that K cannot be invertible. If, however,

[ d  −b ] · m = [ 0  0 ] ,
[ −c  a ]       [ 0  0 ]

then am = bm = cm = dm = 0, so that also

[ a  b ] · m = [ 0  0 ] ,
[ c  d ]       [ 0  0 ]

and the same argument applied to the nonzero matrix m · I2 shows that K cannot be invertible in this case either. On the other hand, Eq. (2.1) shows that if Δ is invertible, then

[ a  b ]−1 = Δ−1 [ d  −b ]
[ c  d ]         [ −c  a ]

is the inverse. □
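The closing formula of the proof gives a direct recipe for inverting a 2 × 2 key. A Python sketch (ours, not from the book; Python 3.8+ for the three-argument `pow`):

```python
def inverse_2x2_mod26(K):
    """K^(-1) = det(K)^(-1) * adj(K) over Z_26, per the theorem's formula."""
    (a, b), (c, d) = K
    det = (a * d - b * c) % 26
    det_inv = pow(det, -1, 26)   # raises ValueError if det is not invertible
    return [[(det_inv * d) % 26, (det_inv * -b) % 26],
            [(det_inv * -c) % 26, (det_inv * a) % 26]]
```

For the key of the earlier example, `inverse_2x2_mod26([[3, 3], [2, 5]])` returns `[[15, 17], [20, 9]]`, while a matrix with non-invertible determinant raises an error.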


Hill's cryptosystem is vulnerable to the so-called known plaintext attack. Indeed, if a k × k matrix K is a key, then it is normally enough to know that the message fragments m1 , m2 , . . . , mk are encrypted

as c1 , c2 , . . . , ck . Indeed, if the ith column of a matrix X represents the numerical

encodings of the plain text fragment mi and the ith column of a matrix Y represents

the numerical encodings of the cipher text fragment ci , then Y = KX, from which the

key can be found as K = YX −1 . In rare cases the matrix X may appear degenerate, in

which case we will not be able to ﬁnd K exactly but will still have much information

about it.
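The attack K = Y X−1 is a one-liner once matrices over Z26 are available. A Python sketch (ours, not from the book) recovering the key of the earlier example from the known pair HELP → HIAT:

```python
def mat_mul_mod26(A, B):
    """Product of two 2x2 matrices over Z_26."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % 26
             for j in range(2)] for i in range(2)]

def inv2(M):
    """Inverse of a 2x2 matrix over Z_26 via the adjugate formula."""
    (a, b), (c, d) = M
    det_inv = pow((a * d - b * c) % 26, -1, 26)
    return [[(det_inv * d) % 26, (-det_inv * b) % 26],
            [(-det_inv * c) % 26, (det_inv * a) % 26]]

# Columns of X are the plaintext pairs HE, LP; columns of Y the pairs HI, AT.
X = [[7, 11], [4, 15]]
Y = [[7, 0], [8, 19]]
K = mat_mul_mod26(Y, inv2(X))    # recovers the key [[3, 3], [2, 5]]
```

Here X happens to be invertible (its determinant is 9); when X is degenerate, only partial information about K can be extracted, as the text notes.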

In cryptanalysis, which is the art of breaking ciphers, the so-called method of

‘cribs’ is widely used. This term was introduced by cryptographers in Bletchley Park

and it means a suspected plaintext. For example, an English language text contains

the word ‘that’ with high probability and a letter often starts with the word ‘dear’.

Exercises

1. (a) Which one of the two matrices, considered as matrices over Z26 ,

[ 1  12 ]      [ 1  6 ]
[ 12  1 ] ,    [ 6  1 ]

is invertible and which is not? Find the inverse for the invertible matrix.

(b) Let M be the invertible matrix found in part (a). Use it as a key in Hill’s

cryptosystem to encrypt YEAR and to decrypt ROLK.

2. In Hill’s cryptosystem with the key

K = [ 11  12 ]
    [ 12  11 ]

ﬁnd all pairs of letters XY which do not change after being encoded twice, i.e.,

if we encode XY we get a pair ZT which is being encoded as XY.

3. You have captured the ciphertext

NWOLBOTEPEHKICNSHR.

You know it has been encrypted using the Hill cipher with a 2 × 2 matrix and you

know the ﬁrst 4 letters of the message are the word “DEAR”. Find the secret key

and decrypt the message.

4. The key for Hill’s cryptosystem is the following matrix over Z26

K = [ 1   2   3   4  5 ]
    [ 9  11  18  12  4 ]
    [ 1   2   8  23  3 ]
    [ 7  14  21   5  1 ]
    [ 5  20   6   5  0 ]

2.1 Classical Secret-Key Cryptology 47

WGVUUTGEPVRIMFTXMXMHCYTNGYMJJE

EZKEWHLQQISDJYJCTYEUBYKFBWPBBE

5. (advanced linear algebra required) Prove that a square n × n matrix A over Z26 is

invertible if and only if its determinant det(A) is an invertible element of Z26 .

2.2 Modern Public-Key Cryptology

Traditional secret-key cryptology assumes that both the sender and the receiver must

be in possession of the same secret key which they use both for encryption and

decryption. This secret key must be delivered all around the world, to all the corre-

spondents. This is a major weakness of this system.

Modern public-key cryptology breaks the symmetry between the sender and the

receiver. It requires two separate keys for each user, one of which is secret (or private)

and one of which is public. The public key is used to encrypt plaintext, whereas the

private key is used to decrypt ciphertext. Since the public key is no longer secret it

can be widely distributed without compromising security. The term “asymmetric”

stems from the use of different keys to perform the encryption and the decryption,

each operation being the inverse of the other—while the conventional (“symmetric”)

cryptography uses the same key to perform both.

The computational complexity is the main reason why the system works. The

adversary will know how to decrypt messages but will still be unable to do it due to

the extremely high complexity of the task.

A function f is called a one-way function if the computation of f (n), given n, is computationally easy while the computation of n, given f (n), is intractable.

Example 2.2.1 Given the availability of ordinary telephone books, the function f : name → phone number can be computed in seconds, as it is easy to find the name in the book since names are listed in alphabetical order; but the inverse function f^{−1} : phone number → name can hardly be computed at all, since in the worst case you need to read the whole book in order to find the name corresponding to a given number. You might need a month to do that.

This is not, of course, a completely rigorous definition. It contains references to ‘easy’ and ‘intractable’ tasks which may be dependent on the computing resources available.

A publicly available one-way function f has a number of useful applications. In

time-shared computer systems, instead of storing a table of login passwords, one

can store, for each password w, the value f (w). Passwords can easily be checked

for correctness at login, but even the system administrator cannot deduce any user’s

password by examining the stored table.
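A modern instantiation of this idea might look as follows; SHA-256 stands in for the one-way function f (the text does not prescribe a particular f, and real systems would also add per-user salts and deliberately slow hashes):

```python
# Storing f(w) instead of the password w itself.
# SHA-256 is a hypothetical choice for the one-way function f.
import hashlib

def f(password: str) -> str:
    """One-way function: easy to compute, infeasible to invert."""
    return hashlib.sha256(password.encode()).hexdigest()

# The login table stores only f(w); the administrator never sees w.
stored = {"alice": f("correct horse battery staple")}

def check_login(user: str, attempt: str) -> bool:
    # Compare f(attempt) with the stored value; w itself is never stored.
    return stored.get(user) == f(attempt)

assert check_login("alice", "correct horse battery staple")
assert not check_login("alice", "12345")
```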

Suppose now that a one-way function f has an additional parameter t (it can be a number, a graph, a function, anything) such that it is computationally easy to compute n, given f (n) and t. Then t is called a trapdoor and f is called a trapdoor function.

Example 2.2.2 Imagine that you have taken the time to enter the telephone directory

into your computer, sorted all phone numbers in increasing order, and printed them.

Suppose that it took one month of your time. Then you possess a trapdoor to the one-

way function f described in Example 2.2.1. For you it is equally easy to compute f or

f −1 and you are the only person (at least for the next month) who can compute f −1 .

Imagine that Alice possesses a trapdoor function

f (TEXT) = CIPHERTEXT

with a secret trapdoor t. Then she puts this function f in the public domain, where it

is accessible to everyone, and asks everybody to send her f (TEXT) each time when

the necessity arises to send a message TEXT conﬁdentially. Knowing the trapdoor t,

it is an easy job for her to compute the TEXT from f (TEXT) while it is infeasible to

compute it for anybody else. The function f (or a certain parameter which determines

f uniquely) is called Alice’s public key and the trapdoor t is called her private key.

Example 2.2.3 Let us see how we can use the trapdoor function of Example 2.2.1

to construct a public-key cryptosystem. Take the University telephone directory and

announce the method of encryption as follows. Your correspondent must take a letter

of your message, ﬁnd a name in the directory starting with this letter, and assign

to this letter the phone number of the person with the chosen name. She must do

it with all letters of your message. Then all these phone numbers combined will

form a ciphertext. For example, the message SELL BRIERLY, sent to you, will be

encrypted as follows:


S SCOTT 8751

E EVANS 8057

L LEE 8749

L LEE 5999

B BANDYOPADHYAY 7439

R ROSENBERG 5114

I ITO 7518

E ESCOBAR 6121

R RAWIRI 7938

L LEE 6346

Y YU 5125

87518057874959997439511475186121793863465125

For decryption you must use your private key, which is the inverse telephone directory.
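The toy cryptosystem can be sketched in code. The miniature directory below reuses some names from the table above (plus one invented L-name, LOVELL, since a dictionary needs distinct keys); encryption is randomised because many names share an initial letter, while decryption with the inverse directory is deterministic:

```python
import random

# A miniature 'telephone directory': name -> phone number.
# Most entries are taken from the table above; LOVELL is invented.
directory = {
    "SCOTT": "8751", "EVANS": "8057", "LEE": "8749",
    "LOVELL": "5999", "BANDYOPADHYAY": "7439", "ROSENBERG": "5114",
    "ITO": "7518", "ESCOBAR": "6121", "RAWIRI": "7938", "YU": "5125",
}
# The private key is the inverse directory: number -> first letter.
inverse = {number: name[0] for name, number in directory.items()}

def encrypt(message: str) -> str:
    out = []
    for letter in message:
        # pick any name starting with this letter (the homophonic step)
        name = random.choice([n for n in directory if n[0] == letter])
        out.append(directory[name])
    return "".join(out)

def decrypt(ciphertext: str) -> str:
    # all numbers in this toy directory are 4 digits long
    chunks = [ciphertext[i:i + 4] for i in range(0, len(ciphertext), 4)]
    return "".join(inverse[c] for c in chunks)

assert decrypt(encrypt("SELL")) == "SELL"
```

Without the inverse directory an eavesdropper faces exactly the "read the whole book" search of Example 2.2.1.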

2.3 Computational Complexity

In this section we will develop several rigorous concepts necessary for implementing

the idea of the previous section. To measure the running time of an algorithm we need

ﬁrst to choose a unit of work, say one multiplication, or one division with remainder,

etc.; we will often call the chosen units of work steps.

It is often the case that not all instances of a problem under consideration are

equally hard even if the two inputs are of the same length. For example, if we feed

an algorithm two different—but equally long—inputs (and we feed them in one at

a time, not both at once) then the algorithm might require an astronomical number

of operations to deal with the ﬁrst input, but only a handful of operations to deal

with the second input. The (worst case) time complexity of an algorithm is a function

that for each length of the input shows the maximal number of units of work that

may be required. We say that an algorithm is of time complexity f (n) if for all n and

for all inputs of n bits, the execution of the algorithm requires at most f (n) steps.

The worst-case complexity takes into consideration only the hardest instances of the

problem. It is relevant when people are pessimistic, and think that it is very likely

that a really hard instance of the problem will crop up.

Average-case complexity, on the other hand, estimates how difﬁcult the relevant

problem is ‘on average’. An optimist, thinking that hard instances of the problem

are rare, will be more interested in the average-case than the worst-case complexity.

At present, much less is known about the average-case complexity than about the

worst-case one, so we concentrate on the latter.

We need a language to compare the time complexity functions of different

algorithms.


Definition 2.3.1 Let f (x) and g(x) be two real-valued functions. We say that f (n) ∼ g(n) (read “f is asymptotically equal to g”) if

$$\lim_{n\to\infty} \frac{f(n)}{g(n)} = 1.$$

Example 2.3.1 Every polynomial f (n) = a_0 n^d + a_1 n^{d−1} + · · · + a_d of degree d with a_0 > 0 satisfies f (n) ∼ a_0 n^d. Indeed, when n → ∞,

$$\frac{f(n)}{a_0 n^d} = \frac{a_0 n^d + a_1 n^{d-1} + \cdots + a_d}{a_0 n^d} = 1 + \frac{a_1}{a_0}\cdot\frac{1}{n} + \cdots + \frac{a_d}{a_0}\cdot\frac{1}{n^d} \to 1.$$

Example 2.3.2 A famous asymptotic equality is Stirling’s formula:

$$n! \sim \sqrt{2\pi n}\cdot n^n e^{-n}. \qquad (2.2)$$

For comparing the growth of functions we use the “little-oh,” “big-Oh” and “big-

Theta” notation.

Definition 2.3.2 We say that f (n) = o(g(n)) (read “f is little-oh of g”) if

$$\lim_{n\to\infty} \frac{f(n)}{g(n)} = 0.$$

Informally, this means that f grows more slowly than g when n gets large.

Example 2.3.3 1000n^{2.9} = o(n^3). This is almost obvious since

$$\frac{1000\, n^{2.9}}{n^3} = \frac{1000}{n^{0.1}} \to 0.$$

However not all comparisons can be done so easily. To compare the rate of growth

of two functions one often needs L’Hospital’s rule. We formulate it in the form that

suits our applications.

Theorem 2.3.1 (L’Hospital’s rule) Let f (x) and g(x) be two differentiable functions such that lim_{x→∞} f (x) = ∞ and lim_{x→∞} g(x) = ∞. Suppose that

$$\lim_{x\to\infty} \frac{f'(x)}{g'(x)}$$

exists. Then

$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{x\to\infty} \frac{f'(x)}{g'(x)}.$$

Example 2.3.4 ln n = o(√n). Let us justify this using L’Hospital’s rule. Indeed,

$$\lim_{x\to\infty} \frac{\ln x}{\sqrt{x}} = \lim_{x\to\infty} \frac{(\ln x)'}{(\sqrt{x})'} = \lim_{x\to\infty} \frac{1/x}{1/(2\sqrt{x})} = \lim_{x\to\infty} \frac{2}{\sqrt{x}} = 0.$$

Example 2.3.5 Let c > 1. Then (a) n^d = o(c^n) for every fixed d, and (b) c^n = o(n!). Here (a) again follows from L’Hospital’s rule and we leave it as an exercise. (b) follows from Stirling’s formula (2.2). Indeed,

$$\lim_{n\to\infty} \frac{c^n}{n!} = \lim_{n\to\infty} \frac{c^n}{\sqrt{2\pi n}\, n^n e^{-n}} = \lim_{n\to\infty} \frac{1}{\sqrt{2\pi n}}\cdot\frac{(ec)^n}{n^n} = \lim_{n\to\infty} \frac{1}{\sqrt{2\pi n}}\cdot\Big(\frac{ec}{n}\Big)^{\!n} = 0.$$

Definition 2.3.3 We say that f (n) = O(g(n)) (read “f is big-Oh of g”) if there exists a number C > 0 and an integer n_0 such that for n > n_0

$$|f(n)| \le C\,|g(n)|.$$

Informally, this means that f doesn’t grow at a faster rate than g when n gets large.

Example 2.3.6 (a) sin n = O(1); (b) 1000n^3 + √n = O(n^3). In the first case |sin n| ≤ 1, so we can take C = 1. In the second, we note that √n ≤ n^3, hence

$$1000\, n^3 + \sqrt{n} \le 1001\, n^3,$$

so we can take C = 1001.

Proposition 2.3.1 Let f (x) = \sum_{k=0}^{d} a_k x^k be a polynomial of degree d. Then f (n) = O(n^d).

Proof Let C = |a_0| + |a_1| + · · · + |a_d|. Then n^k ≤ n^d for all 0 ≤ k ≤ d and n ≥ 1, hence |f (n)| ≤ C n^d and f (n) = O(n^d). �


Definition 2.3.4 We say that f (n) = Θ(g(n)) (read “f is big-Theta of g”) if there exist two numbers c, C > 0 and an integer n_0 such that for n > n_0

$$c\,|g(n)| \le |f(n)| \le C\,|g(n)|.$$

Informally, this means that f grows as fast as g does when n gets large.

Example 2.3.7 πn + sin(n) = Θ(n) since 2n < |πn + sin(n)| < 5n, so we can choose

c = 2 and C = 5.

It is convenient to single out several standard functions of typical growth rates and measure the growth of other functions by comparing them against the standard ones:

O(1) at most constant Θ(1) constant

O(log n) at most logarithmic Θ(log n) logarithmic

O(n) at most linear Θ(n) linear

O(n2 ) at most quadratic Θ(n2 ) quadratic

O(n3 ) at most cubic Θ(n3 ) cubic

O(nd ) at most polynomial Θ(nd ) polynomial

O(cn ) at most exponential Θ(cn ) exponential

O(n!) at most factorial Θ(n!) factorial

These functions are listed in increasing order of the rapidity of growth. Of course

there are some intermediate cases like O(log log n) and O(n log n). The table below

provides estimates of the running times of algorithms for certain orders of complexity.

Here we have problems with input strings of 2, 16 and 64 bits.

size n | log2 n | n | n log2 n | n^2 | 2^n | n!
2 | 1 | 2 | 2 | 4 | 4 | 2
16 | 4 | 16 | 64 | 256 | 6.5 × 10^4 | 2.1 × 10^13
64 | 6 | 64 | 384 | 4096 | 1.8 × 10^19 | >10^89

If we assume that one operation (unit of labour) requires 1 µs (= 10^{−6} s), then it is worth noting that on an input of 64 bits a problem of exponential complexity 2^n will require about 1.8 × 10^{19} µs, i.e., more than half a million years.

Problems which can only be solved by algorithms whose time complexity is expo-

nential quickly become intractable when the size of the input grows. That is why

mathematicians and computer scientists consider polynomial growth as the upper limit of tractability; anything that grows faster than polynomially is considered to be intractable (though there are some interesting intermediate cases, such as the subexponential time complexity algorithms for factorisation of integers).

Exercises

1. Prove that (log n)^2 = o(√n).

2. Use L’Hospital rule to compare the growth of the two functions:

√

f (n) = n2007 , g(n) = 2 n

.

4. It has been experimentally established that the function ψ(x) = \int_2^x \frac{dt}{\ln t} approximates the function π(x) introduced in Sect. 1.1.3 even better than x/ln x. Using L’Hospital’s rule, prove that

$$\psi(x) \sim \frac{x}{\ln x}.$$

5. List the following functions in increasing order of magnitude, when n → ∞:

(a) f (n) = (ln n)^{1000}, g(n) = n^{10}, h(n) = \sqrt[3]{e^n};
(b) f (n) = e^{sin n}, g(n) = n^2, h(n) = ln n!

6. We say that n is a perfect power if there are positive integers m > 1 and k > 1 such that n = m^k. Suppose that the unit of work is execution of one GAP command

RootInt(x,y) and that multiplication is costless. Write a GAP program that has

polynomial complexity and determines if the given integer n is a perfect power

or not. Find out if the following number n is a perfect power

32165915990795960806201156497692131799189453658831821777511700748913568729

08523398835627858363307507667451980912979425575549941566762328495958107942

76742746387660103832022754020518414200488508306904576286091630047326061732

13147723760062022617223850536734439419187423527298618434826797850608981800

75920878659088367693192622340064634811419535028889335540064440165586139725

67864525460233092587652156920261205787558242189274149331895101172683052822

80727849358699658455141506222721476847645629705008614991371536420103263486

34959615993459063845793313984237722143683892937148998975391746809877568851

72762336013543700624574174575024244791527281937.

7. Use Stirling’s formula to establish the character of growth of the following binomial coefficients:

(a) \binom{n}{k}, where k is fixed;
(b) \binom{n}{k}, where k ∼ αn, and α is a fixed real number with 0 < α < 1.


Complexity of Number-Theoretic Algorithms

In a number theoretic algorithm the input is often a number (or several numbers).

So what is the length of the input in bits if it is an integer N? In other words, we are

asking how many zeros and ones one needs to express N. This question was solved

in Sect. 1.5, where we learned how to represent numbers in binary. By Theorem 1.5.2

to express N in binary we need n = ⌊log_2 N⌋ + 1 bits of information. For most calculations it would be sufficient to use the following approximations: N ≈ 2^n and n ≈ log_2 N.
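In Python, for instance, this length is available directly as the bit length of N; the snippet below is just a sanity check of these approximations, not part of the text:

```python
import math

N = 1234567
n = N.bit_length()                   # number of bits, floor(log2 N) + 1
assert n == math.floor(math.log2(N)) + 1
assert 2 ** (n - 1) <= N < 2 ** n    # hence N ≈ 2^n and n ≈ log2 N
```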

Now we will consider two algorithms for calculating cN mod m, where c and m

are ﬁxed numbers. Here N is the input and cN mod m is an output. The running

time of the algorithm will be measured by the number of modular multiplications

required. In ordinary arithmetic this measure might not be satisfactory since the

numbers grow in size and some multiplications are much more labour intensive than

the others. However, in modular arithmetic all numbers are of approximately equal

size and our assumption is realistic.

Algorithm 1 is given by the formula c^N = (· · · (((c · c) · c) · c) · · · ) · c. That is, we calculate powers of c recursively by setting c_1 = c and c_{i+1} = c_i · c. To calculate c^N by this method we require N − 1 multiplications. Hence the complexity function f (n) for this algorithm is f (n) = N − 1 ≈ 2^n − 1, where n = ⌊log_2 N⌋ + 1 is the length of the input. Since \frac{1}{2}\,2^n < f (n) < 2^n, we have f (n) = Θ(2^n). This algorithm has exponential complexity.

We have been too straightforward in calculating cN mod m and the result was

appalling. We can be much more clever and do much better.

Algorithm 2 (Square and Multiply): Let us represent N in binary, N = i_0 + 2 i_1 + 2^2 i_2 + · · · + 2^s i_s with binary digits i_j ∈ {0, 1} and i_s = 1, so that s = ⌊log_2 N⌋. We first compute

$$c^2 = c\cdot c \bmod m,\quad c^{2^2} = (c^2)^2 \bmod m,\quad c^{2^3} = (c^{2^2})^2 \bmod m,\quad \dots,\quad c^{2^s} = (c^{2^{s-1}})^2 \bmod m$$

by successive squaring, which takes s multiplications. After that, at most s further multiplications may be required to calculate

$$c^N = c^{\,i_0 + 2 i_1 + \cdots + 2^s i_s} = c^{\,i_0}\cdot (c^2)^{\,i_1}\cdots (c^{2^s})^{\,i_s} \bmod m.$$


So n − 1 ≤ f (n) ≤ 2n − 1. This means that f (n) = Θ(n) and the algorithm has

linear complexity. We have now proven the following theorem.

Theorem 2.3.2 Let c and m be positive integers. Then for every positive integer N we

can calculate cN mod m using at most 2 log N multiplications modulo m. Algorithm

2 (Square and Multiply) has linear complexity.

Example 2.3.8 How many multiplications are needed to calculate c^29 using Algorithms 1 and 2?

The binary representation for 29 is as follows:

29 = 16 + 8 + 4 + 1 = 11101(2) .

Thus we need 4 multiplications to compute c^2, c^4, c^8, c^16 by successive squaring, and then we will need 3 more to calculate c^29 = c^16 · c^8 · c^4 · c. Thus Algorithm 2 would use 7 multiplications in total. Algorithm 1 would use 28 multiplications.
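Both algorithms can be sketched with an explicit multiplication counter (the function names are mine); the counts reproduce the 7 versus 28 multiplications of this example:

```python
def naive_power(c, N, m):
    """Algorithm 1: repeated multiplication, N - 1 steps."""
    result, count = c % m, 0
    for _ in range(N - 1):
        result = (result * c) % m
        count += 1
    return result, count

def square_and_multiply(c, N, m):
    """Algorithm 2: at most 2*log2(N) multiplications."""
    base, result, count = c % m, None, 0
    while N:
        if N & 1:                        # this binary digit of N is 1
            if result is None:
                result = base            # first factor costs nothing
            else:
                result = (result * base) % m
                count += 1
        N >>= 1
        if N:                            # square for the next binary digit
            base = (base * base) % m
            count += 1
    return result, count

r1, c1 = naive_power(3, 29, 1000)
r2, c2 = square_and_multiply(3, 29, 1000)
assert r1 == r2 == pow(3, 29, 1000)
assert (c1, c2) == (28, 7)
```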

The complexity of the Euclidean algorithm will also be important for us. So we

prove:

Theorem 2.3.3 For any two positive integers a and b the Euclidean algorithm will

ﬁnd their greatest common divisor after at most 2 log2 N + 1 integer divisions with

remainder, where N = max(a, b).

Proof Let us make one observation first. Suppose a = qb + r is a division with remainder, and let a′ = a/gcd(a, b), b′ = b/gcd(a, b), and r′ = r/gcd(a, b). Then a′ = qb′ + r′ is also a division with remainder. Hence the number of steps that the Euclidean algorithm (Theorem 1.2.3) requires is the same for the pair (a, b) as for the pair (a′, b′). This allows us to assume that gcd(a, b) = 1. Let us also assume that a is not smaller than b.

We will ﬁrst prove that if a ≥ b (as we just assumed) then on dividing a by b with

remainder

a = qb + r, (0 ≤ r < b),

we get r < a/2. Indeed, if q ≥ 2, then r < b < a/q ≤ a/2, and when q = 1, then

b > a/2, hence r = a − b < a/2.

Let us perform the Euclidean algorithm on a and b

a = q1 b + r1 , 0 < r1 < b,

b = q2 r1 + r2 , 0 < r2 < r1 ,

r1 = q3 r2 + r3 , 0 < r 3 < r2 ,

..

.

rs−2 = qs rs−1 + rs , 0 < rs < rs−1 ,

rs−1 = qs+1 rs .


Then r_s = gcd(a, b) = 1. Due to the observation at the beginning of the proof we can conclude that

$$r_3 < r_1/2 < a/4, \qquad r_5 < r_3/2 < a/8,$$

and by induction r_{2k+1} < \frac{a}{2^{k+1}} and r_{2k} < \frac{b}{2^k}. Suppose the algorithm stops at step s, i.e., after calculating that r_s = 1. If s = 2k + 1, then 2^{k+1} < a and k < log_2 a, whence s = 2k + 1 < 2 log_2 a + 1, so s ≤ 2 log_2 a = 2 log_2 N. If s = 2k, then 2^k < b, whence k < log_2 b ≤ log_2 N, and s = 2k < 2 log_2 N.

If a is smaller than b then we will need an additional step, and the number of steps

will be no greater than 2 log2 N + 1. �

Now we can draw conclusions about the time complexity of the Euclidean algo-

rithm. For one unit of work we will adopt the execution of a single a mod b operation,

that is division of a by b with remainder.

Corollary 2.3.1 The Euclidean algorithm has at most linear time complexity: f (n) = O(n).

Proof The upper bound in Theorem 2.3.3 can be interpreted as follows. The number

log2 N, where N = max(a, b), is almost exactly the number of bits, say k, in the

binary representation of N. So the length of the input, n, (numbers a and b) is at least

k and at most 2k while the number of units of work is at most 2k. So for the time

complexity function f (n) we have f (n) ≤ 2n. Thus f (n) = O(n). �
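The bound of Theorem 2.3.3 is easy to verify empirically; the step counter below is a sketch of the plain (non-extended) Euclidean algorithm:

```python
import math

def gcd_steps(a, b):
    """Euclidean algorithm, counting divisions with remainder."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return a, steps

# Theorem 2.3.3: at most 2*log2(N) + 1 divisions, where N = max(a, b).
for a, b in [(1071, 462), (89, 55), (2, 1000003), (123456, 654321)]:
    g, steps = gcd_steps(a, b)
    assert g == math.gcd(a, b)
    assert steps <= 2 * math.log2(max(a, b)) + 1
```

Consecutive Fibonacci numbers such as (89, 55) are the classical worst case here, which is the content of Lamé's theorem in the exercises below.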

In Sect. 1.1.3 we saw that the Trial Division algorithm for factoring an integer n

(which, we recall, could involve performing as many divisions as there are primes between 2 and √n), was computationally difficult. Now we can state this precisely.

It has exponential time complexity!

Theorem 2.3.4 (A worst-case time complexity for factoring) The Trial Division

algorithm for factoring integers has exponential complexity.

Proof Let the unit of work be one division. Let us assume that we have an inﬁnite

memory and that all primes are stored there: p1 , p2 , . . . , pm , . . .. Given a positive

integer N we have to try to divide it by all primes which do not exceed M = √N. According to the Prime Number Theorem there are approximately

$$\frac{M}{\ln M} \approx \frac{2\sqrt{N}}{\ln N}$$

such primes. This means that in the worst-case scenario we have to try all of them and thus perform about 2√N/ln N divisions. Since N ≈ 2^n, where n is the number of input bits, the worst-case complexity function takes the form

$$f(n) \approx \frac{2\,\sqrt{2}^{\,n}}{n \ln 2} = \frac{2}{\ln 2}\cdot\frac{1}{n}\cdot \sqrt{2}^{\,n}.$$


Let √2 = αβ, where α > 1 and β > 1. Then, for sufficiently large n,

$$\frac{1}{n}\cdot \sqrt{2}^{\,n} = \frac{\alpha^n}{n}\cdot \beta^n > \beta^n,$$

so f (n) grows at least as fast as the exponential function β^n. �

In the case of calculating Nth powers we know one efﬁcient and one inefﬁcient

algorithm. For factoring integers we know only one and it is inefﬁcient. All attempts

of researchers in number theory and computer science to come up with a more efﬁ-

cient algorithm have resulted in only very modest improvements. Several algorithms

are known that are subexponential, with time complexity function, for example, f (n) = e^{c\,n^{1/3} (\ln n)^{2/3}} (see [1]). This growth is still very fast. At the moment of writing it is not feasible to factor a 200-digit integer unless it has many small divisors.
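Trial Division itself is only a few lines. The sketch below tries all divisors 2, 3, 4, …, not just primes, which can only increase the number of divisions but keeps the code self-contained:

```python
def trial_division(N):
    """Factor N by trying divisors up to sqrt(N).

    The number of divisions grows exponentially in the bit length of N,
    as shown in Theorem 2.3.4.
    """
    factors = []
    d = 2
    while d * d <= N:
        while N % d == 0:
            factors.append(d)
            N //= d
        d += 1
    if N > 1:
        factors.append(N)   # whatever remains is prime
    return factors

assert trial_division(11413) == [101, 113]   # the modulus of Example 2.4.1
assert trial_division(97) == [97]
```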

Exercises

1. (a) Estimate the number of bits required to input an integer N which has 100

digits in its decimal representation.

(b) Represent n = 1234567 in binary and decide how many multiplications

mod m the Square and Multiply algorithm would require to calculate

cn mod m.

2. The Bubble Sort Algorithm takes a ﬁnite list of numbers and arranges them in

the increasing order. Given a list of numbers, it compares each item in the list

with the item next to it, and swaps them if the former is larger than the latter. The

algorithm repeats this process until it makes a pass all the way through the list

without swapping any items (in other words, all items are in the correct order).

This causes larger values to “bubble” to the end of the list while smaller values

“sink” towards the beginning of the list.

Assume that one needs 100 bits to input any number on the list (so the length of

the input is 100n). Take one swap as one unit of work. Determine the worst case

complexity of the Bubble Sort Algorithm. Use the appropriate notation (big-oh,

little oh, etc.) to express the character of the growth.

3. The input of the following algorithm is a positive integer N. The algorithm tries to

divide N by the ﬁrst (log2 N)3 primes and, if one of them divides N, it declares N

composite. If none of those primes divide N, the algorithm declares N interesting.

What is the worst-case complexity of this algorithm?

4. Let (fn ) be the sequence of Fibonacci numbers given by f0 = f1 = 1 and fn+2 =

fn+1 + fn .

(a) Prove that

fn < 2fn−1 and fn+5 > 10fn .

(b) Using part (a), prove Lamé’s theorem that the number of divisions with

remainder required by the Euclidean algorithm for ﬁnding gcd(a, b) is at

most ﬁve times the number of decimal digits in the smaller of a or b.


2.4 The RSA Public-Key Cryptosystem

Alice wishes to receive conﬁdential messages from her correspondents. For this

purpose she may use the public-key RSA cryptosystem, named after Rivest, Shamir

and Adleman [2], who invented it in 1977. It is widely used now. It is based on the

fact that the mapping

f : x → x^e mod n

for a specially selected very large number n and exponent e is a one-way function.

To set up the cryptosystem, Alice does the following:

1. she generates two large primes p ≠ q of roughly the same size;

2. calculates n = pq and φ = (p − 1)(q − 1), where φ is the value of the Euler

φ-function, φ(n);

3. using trial and error method, selects a random integer e with 1 < e < φ and

gcd(e, φ) = 1;

4. computes d such that ed ≡ 1 mod φ and 1 < d < φ.

We will later discuss how Alice can generate two large primes. She can then do steps

2–4 because the complexity of the Extended Euclidean Algorithm is so low that it

easily works for very large numbers. Note that ﬁnding d is also done by the Extended

Euclidean algorithm.

Alice uses a certain public domain which is accessible for all her correspondents,

for example, her home page, to publish her public key (n, e), keeping everything

else secret; in particular, d which is Alice’s private key (which will be used for

decryption). It must be clear for everybody that (n, e) is indeed Alice’s public key

and nobody but Alice could publish it.

She then instructs how to use her public key to convert text into ciphertext. In the

ﬁrst instance all messages must be transformed into numbers by some convention

speciﬁed by Alice, e.g., we may use “01” instead of “a”, “02” instead of “b”, etc.;

for simplicity, let us not distinguish between upper and lower case, and denote a

space by “27”. Thus a message for us is a non-negative integer. The public key (n, e)

stipulates that Alice may receive messages, which are non-negative integers m which

are smaller than n. (If the message is longer it should be split into several shorter

messages.) The message m must be encrypted applying the following function to the

message:

f (m) = m^e mod n.

This is a one-way function to everybody but Alice, who has the trapdoor d (we will see later how it can


be used for decryption). For example, when Bob wishes to send a private message

to Alice, he obtains Alice’s public key (n, e) from the public domain and uses it as

follows:

• turns the message text into a nonnegative integer m < n (or several of them if

breaking the text into blocks of smaller size is necessary);

• computes c = m^e mod n;

• sends the ciphertext c to Alice.

Alice then recovers the plaintext m using her private key d (which is the trapdoor

for f ) by calculating

m = c^d mod n.

This may seem to be a miracle at this stage but it can (and, below, will) be explained.

This system can work only because of the clever choice of the primes p and q.

Indeed, p and q should be chosen so that their product n = pq is infeasible to factorise.

This ensures that p and q are known only to Alice, while at the same time n and her

public exponent e are known to everybody. This implies that Alice’s private exponent

d is also known only to her. Indeed, to calculate d from publicly known parameters,

one needs to calculate φ(n) ﬁrst. But the only known method of calculating φ(n)

requires calculation of the prime factorisation of n. Since it is infeasible, we can

publish n but keep φ(n), and hence d, secret.

Example 2.4.1 This is of course a very small example (too small for practical pur-

poses), just to illustrate the algorithms involved. Suppose Alice’s arrangements were

as follows:

1. p = 101, q = 113;

2. n = pq = 11413, φ = (p − 1)(q − 1) = 11200;

3. e = 4203 (picked at random from the interval (1, φ), making sure that

gcd(e, φ) = 1);

4. d = 3267 (the inverse of e in Zφ );

5. the public key is therefore (11413, 4203), the private key is 3267.

If Bob wants to send the message “Hello Alice” he transforms it into a number as

described. The message is then represented by the integer

0805121215270112090305.

This is too large (≥ 11413), so we break the message text into chunks of 2 letters at a time.

A. The first chunk of the message is m = 0805;
B. Bob computes c = m^e = 805^{4203} ≡ 6134 mod 11413;
C. Alice decrypts this message fragment by calculating c^d = 6134^{3267} ≡ 805 mod 11413.
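The whole of Example 2.4.1 can be replayed in a few lines of Python; note that `pow` with exponent −1 computes the modular inverse via the Extended Euclidean algorithm, and three-argument `pow` is exactly Square and Multiply:

```python
# Alice's parameters from Example 2.4.1.
p, q = 101, 113
n, phi = p * q, (p - 1) * (q - 1)     # n = 11413, phi = 11200
e = 4203
d = pow(e, -1, phi)                   # Extended Euclidean algorithm
assert d == 3267 and (e * d) % phi == 1

# Bob encrypts the chunk m = 805 ('he'); Alice decrypts with d.
m = 805
c = pow(m, e, n)                      # square-and-multiply under the hood
assert pow(c, d, n) == m              # decryption recovers the plaintext
```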


If Bob wants to receive an encrypted answer from Alice he has to set up a similar

scheme. In practice people do not set up cryptosystems individually but use a trusted

provider of such services. Such a company would create a public domain and place

there all public keys attributed to participating individuals. Such a company creates

an infrastructure that makes encrypted communication possible. The infrastructure

that is needed for such cryptosystem to work is called a public-key infrastructure

(PKI) and the company that certiﬁes that a particular public key belongs to a certain

person or organisation is called a certiﬁcation authority (CA). The best-known such companies are Symantec (which bought VeriSign), Comodo, GlobalSign, Go Daddy,

etc. Furthermore, we will show in Sect. 2.5 that the PKI also allows Alice and Bob

to sign their letters with digital signatures.

Exercises

1. With the primes given in Example 2.4.1 decide which one of the two numbers

e1 = 2145 and e2 = 3861 can be used as a public key and calculate the matching

private key for it.

2. Alice and Bob agreed to use the RSA cryptosystem to communicate in secret.

Each message consists of a single letter which is encoded as

Bob’s public key is (n, e) = (143, 113) and Alice sent him the message 97. Which

letter did Alice send to Bob in this message?

3. Alice’s public exponent in RSA is e = 41 and the modulus is n = 13337. How

many multiplications mod n does Bob need to perform to encrypt his message

m = 2619? (Do not do the actual encryption, just count.)

4. Set up your own RSA cryptosystem. Demonstrate how a message addressed to

you can be encrypted and how you can decrypt it using your private key.

5. Alice and Bob have the public RSA keys (20687, 17179) and (20687, 4913),

respectively. Bob sent an encrypted message to Alice, Eve found out that the

encrypted message was 353. Help Eve to decrypt the message, suspecting that

the modulus 20687 might be a product of two three-digit primes. Try to do it with

an ordinary calculator ﬁrst, then check your answer with GAP.

6. Alice and Bob encrypt their messages using the RSA method. Bob’s public key

is (n, e) = (24613, 1003).

(a) Alice would like to send Bob the plaintext m = 183. What ciphertext should

she send?

(b) Bob knows that φ(n) = 24300 but has forgotten his private key d. Help Bob

to calculate d.

(c) Bob has received the ciphertext 16935 from Casey addressed to him. Show

how he ﬁnds the original plaintext.


Several questions still need to be answered:

1. Why is m = (m^e)^d mod n?
2. Can m^e mod n and c^d mod n be calculated efficiently?
3. To what extent can the RSA system be considered ‘secure’ as a cryptosystem?
4. How can the encryption and decryption exponents e and d be found?
5. How can large primes p and q be found?

Let us address these issues one by one.

1. First we consider the question of why the text recovered by Alice via her

private decryption key is actually the original plaintext. This means we must consider

(m^e)^d mod n. We note that since ed ≡ 1 mod φ and φ = φ(n) = (p − 1)(q − 1), we have ed = 1 + φ(n)k for some integer k. Suppose first that m and n are coprime. Then by Euler’s theorem m^{φ(n)} ≡ 1 mod n and

$$(m^e)^d = m^{ed} = m^{1+\varphi(n)k} = m\cdot\big(m^{\varphi(n)}\big)^k \equiv m \bmod n.$$

There is a very small probability that m will be divisible by p or q, but even in this unlikely case we still have m = (m^e)^d mod n. To prove this we have to consider (m^e)^d mod p and (m^e)^d mod q separately. Indeed,

$$(m^e)^d = m^{ed} = m^{1+(p-1)(q-1)x} = m \cdot m^{(p-1)(q-1)x} \equiv \begin{cases} m \bmod p & \text{if } \gcd(m, p) = 1, \\ 0 \bmod p & \text{if } p \mid m, \end{cases}$$

since in the first case by Fermat’s Little Theorem m^{p−1} ≡ 1 mod p. In both cases we see that m ≡ (m^e)^d mod p.

Similarly we find (m^e)^d ≡ m mod q. Then the statement follows from the Chinese Remainder Theorem (Theorem 1.2.6). According to this theorem, there is a unique integer N in the interval [0, pq) such that N ≡ m mod p and N ≡ m mod q. We have two numbers with this property, namely m and (m^e)^d mod n. Hence they coincide and m = (m^e)^d mod n.

We have established that the decrypted message is identical to the message that

was encrypted. This resolves the ﬁrst issue.

2. To resolve the second issue we considered the computational problem of raising

a number to a power. The complexity of this operation is very low, in fact it is linear

(see Theorem 2.3.2). Hence me mod n and cd mod n can be calculated efﬁciently.

3. It is evident that if the prime factorisation of the number n in the public key is

known then anybody can compute φ and thus d. In this case encrypted messages are

not secure. But for large values of n the task of factorisation is too difﬁcult and time

consuming to be feasible. So the encryption function (raise to power e mod n) is a

one-way function, with d as a trapdoor.


To illustrate how secure the system is, Rivest, Shamir and Adleman encrypted a sentence in English. This sentence was converted into a number as we did before (the only difference was that they denoted a space as “00”). Then they encrypted it using e = 9007 and

n = 11438162575788886766932577997614661201021829672124236256256184293

5706935245733897830597123563958705058989075147599290026879543541.

These two numbers were published, and it was made known that n = pq, where

p and q are primes which contain 64 and 65 digits in their decimal representations,

respectively. Also published was the message

f (m) = 9686961375462206147714092225435588290575999112457431987469512093

0816298225145708356931476622883989628013391990551829945157815154.

An award of $100 was offered for decrypting it. This award was only paid 17 years

later, in 1994, when Atkins et al. [3] reported that they had decrypted the sentence.

This sentence—“The magic words are squeamish ossifrage,”—was placed in the title

of their paper. For decrypting, they factored n and found p and q which were

p = 3490529510847650949147849619903898133417764638493387843990820577

and

q = 32769132993266709549961988190834461413177642967992942539798288533.

In this work 600 volunteers participated. They worked 220 h on 1600 computers to

achieve this result! Recently, in 2009, another effort involving several researchers

factored a 232-digit number (RSA-768) utilising hundreds of machines over a span

of two years. Of course, doable does not mean practical, but for very sensitive

information one would now want to choose primes as large as containing 150 digits

and even more.

It can be shown that ﬁnding d is just as hard as factoring n, and it is believed that

finding any trapdoor is as hard as factoring n, although this has not been proven. More than 30 years have passed since RSA was invented and so far all attacks on RSA have been unsuccessful.

4. To ﬁnd e and d we need only the Euclidean and the Extended Euclidean algo-

rithms. Indeed, ﬁrst we try different numbers between 1 and φ(n) at random until

we ﬁnd one which is relatively prime to φ(n) (the fact that it can be done quickly we

leave here without proof). This will be taken as e. Since d is the inverse of e modulo

φ(n), we ﬁnd d using the Extended Euclidean algorithm. This can be done because

the Euclidean algorithm is very fast (Corollary 2.3.1).
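The recipe of remark 4 can be sketched in code. The book's own computations use GAP; the Python below, with the toy primes 61 and 53, is our illustrative stand-in. Python's `pow(e, -1, phi)` computes the modular inverse by exactly the Extended Euclidean computation described above.

```python
import math
import random

def make_keys(p, q):
    """Choose e relatively prime to phi(n), then compute d = e^(-1) mod phi(n)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    while True:                          # try random candidates for e ...
        e = random.randrange(2, phi)
        if math.gcd(e, phi) == 1:        # ... testing each with the Euclidean algorithm
            break
    d = pow(e, -1, phi)                  # the Extended Euclidean algorithm (Python 3.8+)
    return n, e, d

n, e, d = make_keys(61, 53)              # toy primes, far smaller than real RSA keys
c = pow(42, e, n)                        # encrypt m = 42
assert pow(c, d, n) == 42                # decryption recovers m
```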

2.4 The RSA Public-Key Cryptosystem 63

5. One may ask: if we cannot factor positive integers efficiently, then surely we will not be able to tell whether a number is prime or not? If so, our wonderful system is in danger, because two big primes could not be found efficiently. However, this is not the case: it is easier to establish whether a number is prime than it is to factorise it. We devote the next section to checking primality.

In the case of RSA it is preferable to use the following encodings for letters:

A B C D E F G H I J K L M

11 12 13 14 15 16 17 18 19 20 21 22 23

N O P Q R S T U V W X Y Z

24 25 26 27 28 29 30 31 32 33 34 35 36

The advantage of this encoding is that every letter has a two-digit code, which resolves some ambiguities. We will use it from now on and, in particular, in the exercises.
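This encoding is easy to implement; the following Python sketch (the helper names `encode` and `decode` are ours, and capital letters plus spaces are assumed) may be handy for the exercises.

```python
def encode(text):
    """Space -> 00, A -> 11, B -> 12, ..., Z -> 36 (capital letters only)."""
    return "".join("00" if ch == " " else str(ord(ch) - ord("A") + 11) for ch in text)

def decode(digits):
    """Invert encode(): read the digit string two characters at a time."""
    pairs = (digits[i:i + 2] for i in range(0, len(digits), 2))
    return "".join(" " if p == "00" else chr(int(p) - 11 + ord("A")) for p in pairs)

assert encode("ABC") == "111213"
assert decode(encode("SQUEAMISH OSSIFRAGE")) == "SQUEAMISH OSSIFRAGE"
```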

Exercises

1. In RSA Bob has been using a product of two large primes n and a single public

exponent e. In order to increase security, he now chooses two public exponents

e1 and e2 which are both relatively prime to φ(n). He asks Alice to encrypt her

messages twice: once using the first exponent and then using the other one. That is, Alice is supposed to calculate c1 = m^(e1) (mod n), then c2 = c1^(e2) (mod n), and send c2 to Bob. He has also prepared two decryption exponents d1 and d2

for decrypting her messages. Does this double encryption increase security over

single encryption?

2. Eve intercepted the following message from Bob to Alice:

In the public domain Eve learns that this message was sent using the encryption

modulus n = pq = 30796045883. She also observes that Alice’s public key

is e = 48611. Decode the message which was encoded using the encodings

A = 11, B = 12, . . . , Z = 36.

3. Eve has intercepted the following message from Bob to Alice

[ 427849968240759007228494978639775081809,

498308250136673589542748543030806629941,

925288105342943743271024837479707225255,

95024328800414254907217356783906225740 ]

She knows Bob used the RSA cryptosystem with the modulus


n = 956331992007843552652604425031376690367

and that Alice’s public exponent is e = 12398737. She also knows that, to convert

their messages into numbers, Bob and Alice usually use the encodings: space =

00, A = 11, B = 12, . . . , Z = 36. Help Eve to break the code and decrypt the

message.

In this section we will discuss four probabilistic tests that might be used for testing the

compositeness of integers. Their sophistication and quality will gradually increase,

and only the last one will be practical.

By a pseudoprimality test we mean a test that is applied to a pair of integers (b, n),

where 2 ≤ b ≤ n − 1, and that has the following characteristics:

(a) The possible outcomes of the test are: “n is composite” or “inconclusive”.

(b) If the test reports “n is composite” then n is composite.

(c) The test runs in a time that is polynomial in log n (i.e., in the number of bits

necessary to input n).

If n is prime, then the outcome of the test will be “inconclusive” for every b. If the

test result is “inconclusive” for one particular b, then we say that n is a pseudoprime

to the base b (which means that n is so far acting like a prime number).

The outcome of the test for the primality of n depends on the base b that is chosen.

In a good pseudoprimality test there will be many bases b that will reveal that n is

composite in case it is composite. More precisely, a good pseudoprimality test will,

with high probability (i.e., for a large number of choices of the base b) declare that

a composite number n is composite. More formally, we define:

A pseudoprimality test is good if there is a fixed positive real number t, with 0 < t ≤ 1, such that every composite integer n is declared to be composite for at least t(n − 2) choices of the base b in the interval [2, n − 1].

For a good test the probability that a randomly chosen base reveals the compositeness of n is thus at least t and, most importantly, this number t does not depend on n. This is, in fact,

sufﬁcient for practical purposes since we can increase this probability by running this

test several times for several different bases. Indeed, if the probability of missing the

compositeness of n is p, then the probability of missing the compositeness running

it for two different bases will be p^2 and for k different bases p^k. For k → ∞ this

value quickly tends to 0, hence we can make our test as reliable as we want it to be.

Of course, given an integer n, it is silly to say that “there is a high probability that n is prime”. Either n is prime or it is not, and we should not blame our ignorance on n itself. Nonetheless, the abuse of language is sufficiently appealing, and it is often said that a given integer n is very probably prime if it has been subjected to a good pseudoprimality test, with a large number of different bases b, and found to be a pseudoprime to all of those bases.

Here are four examples of pseudoprimality tests, only one of which is good.

Test 1. Given b, n. Output “n is composite” if b divides n, else “inconclusive.”

If n is composite, the probability that it will be so declared is the probability that

we happen to have found an integer b that divides n. The probability of this event, if

b is chosen at random uniformly from [2, n − 1], is

p(n) = (d(n) − 2)/(n − 2),

where d(n) is the number of divisors of n. Certainly p(n) is not bounded from below by a positive constant t, if n is composite. Indeed, if ni = pi^2, where pi is the ith prime, then d(ni) = 3, and

p(ni) = 1/(ni − 2) → 0.

For example, for n = 44 we have d(n) = 6, so that

p(n) = 4/42 = 2/21.

Test 2. Given b, n, where 2 ≤ b ≤ n − 1. Output “n is composite” if gcd(b, n) ≠ 1, else output “inconclusive.”

This test runs in linear time and it is a little better than Test 1, but not yet good.

If n is composite, the number of bases b for which Test 2 will produce the result

“composite” is n − φ(n) − 1, where φ is the Euler totient function. Indeed, we have

φ(n) numbers b that are relatively prime to n; for those numbers b and only for those

we have gcd(b, n) = 1. We also have to exclude b = n which is outside of the range.

Hence the probability of declaring a composite n composite will be

p(n) = (n − φ(n) − 1)/(n − 2).

For this test the number of useful bases will be large if n has some small prime

factors, but in that case it is easy to ﬁnd out that n is composite by other methods.

If n has only a few large prime factors, say if n = p^2, then the proportion of useful bases is very small, and we have the same kind of inefficiency as in Test 1. Indeed, if ni = pi^2, then φ(ni) = pi(pi − 1) and

p(ni) = (ni − φ(ni) − 1)/(ni − 2) = (pi^2 − pi(pi − 1) − 1)/(pi^2 − 2) = (pi − 1)/(pi^2 − 2) ∼ 1/pi → 0

as pi → ∞.

Example 2.4.3 Suppose n = 44 = 2^2 · 11. Then φ(n) = 44(1 − 1/2)(1 − 1/11) = 20, and

p(n) = (44 − 20 − 1)/42 = 23/42.

Test 3. Given b, n. If b and n are not relatively prime or if b^(n−1) ≢ 1 mod n, then output “n is composite”, else output “inconclusive”.

This test rests on Fermat’s Little Theorem. Indeed, if gcd(b, n) > 1, or gcd(b, n) = 1 and b^(n−1) ≢ 1 mod n, then n cannot be prime since, if n were prime, by Fermat’s Little Theorem in the latter case we would have b^(n−1) ≡ 1 mod n. It also runs in linear time if we use the Square and Multiply algorithm to calculate b^(n−1), and it works much better than the previous two tests.

Example 2.4.4 To see how this test works let us calculate 2^32 mod 33. We obtain 2^32 = (2^5)^6 · 2^2 ≡ (−1)^6 · 4 = 4 mod 33, so 2^32 ≢ 1 mod 33 and Test 3 declares 33 composite.

Unfortunately, this test is still not good. It works well for most but not for all num-

bers. The weak point of it is that there exist composite numbers n, called Carmichael

numbers, with the property that the pair (b, n) produces the output “inconclusive”

for every integer b in [2, n − 1] that is relatively prime to n. An example of such

a Carmichael number is n = 561, which is composite (561 = 3 · 11 · 17), but for

which Test 3 gives the result “inconclusive” for every integer b < 561 that is rel-

atively prime to 561 (i.e., that is not divisible by 3 or 11 or 17). For Carmichael

numbers Test 3 behaves exactly like Test 2, which we know is unsatisfactory. More-

over, it was proved recently that there are inﬁnitely many Carmichael numbers [4],

which means that the drawback is serious. The first ten Carmichael numbers are:

561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341, . . .
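Exercise 4 below asks for a GAP program that checks whether n is a Carmichael number; here is the same brute-force check sketched in Python (a direct translation of the definition, not an efficient criterion):

```python
import math

def is_carmichael(n):
    """Composite n with b^(n-1) ≡ 1 (mod n) for every b in [2, n-1] coprime to n."""
    if n < 3 or all(n % p for p in range(2, int(n**0.5) + 1)):
        return False                    # exclude primes (and 1, 2)
    return all(pow(b, n - 1, n) == 1
               for b in range(2, n) if math.gcd(b, n) == 1)

first_ten = [n for n in range(3, 30000) if is_carmichael(n)][:10]
assert first_ten == [561, 1105, 1729, 2465, 2821, 6601, 8911,
                     10585, 15841, 29341]
```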

Despite such occasional misbehaviour, the test usually seems to perform quite

well. When n = 169 (a difﬁcult integer for tests 1 and 2) it turns out that there

are 158 different b’s in [2, 168] that produce the “composite” outcome from Test 3,

namely every such b except for 19, 22, 23, 70, 80, 89, 99, 146, 147, 150, 168.

Finally, we will describe a good pseudoprimality test. The idea was suggested in

1976 by Miller (see the details in [5]).

Test 4 (the Miller–Rabin test). Given b, n, we first compute gcd(b, n). If gcd(b, n) > 1 then we output “composite”. If gcd(b, n) = 1, let us represent n − 1 as n − 1 = 2^s·t, where t is an odd integer. If

(a) b^t ≢ 1 mod n, and
(b) for every integer i in [0, s − 1]

b^(2^i·t) ≢ −1 mod n,

then we output “n is composite”, else “inconclusive”.

Let us convince ourselves that Test 4 works. For this we need the identity

(a − 1)(a + 1)(a^2 + 1) · · · · · (a^(2^(s−1)) + 1) = a^(2^s) − 1,    (2.3)

which is easily checked by expanding the product.

Suppose that conditions (a) and (b) are satisﬁed but n is prime. Then gcd(b, n) = 1.

Substituting a = b^t into the identity (2.3) and using Fermat’s Little Theorem, we

will obtain

(b^t − 1)(b^t + 1)(b^(2t) + 1) · · · · · (b^(2^(s−1)·t) + 1) = b^(2^s·t) − 1 = b^(n−1) − 1 ≡ 0 mod n.

However, by (a) and (b) every bracket is non-zero modulo n. Hence there are zero

divisors in Zn which contradicts the primality of n. This means that if the test outputs

“composite”, the number n is composite.

What is the computational complexity of this test? By Theorem 2.3.3, part (a) of

the test can be done in O(log n) divisions with remainder, and the complexity of this

is at most linear. Similarly, in part (b) of the test there are O(log n) possible values

of i to check, and for each of them we do a single multiplication of two integers, calculating b^(2^i·t) = b^(2^(i−1)·t) · b^(2^(i−1)·t), each of which has O(log n) bits. Hence the overall complexity is still linear.

It can be proved that if n is an odd composite number then, for at least 3/4 of the bases b such that 2 ≤ b ≤ n − 1, Test 4 gives the result “n is composite”. This means that Test 4 is a good pseudoprimality test and, if we choose b at random to prove the compositeness of n, then we will find the required b with probability greater than 3/4. Hence we can set t = 3/4. The proof of this result is beyond the scope of this book.

Example 2.4.5 If n = 169, then it turns out that for 157 of the possible 167 bases b

in [2, 168] Test 4 will output “169 is composite”. The only bases b that 169 can fool

are 19, 22, 23, 70, 80, 89, 99, 146, 147, 150, 168. In this case the performance of

Test 4 and of Test 3 are identical. However, there are no analogues of the Carmichael

numbers for Test 4.

How can this pseudoprimality test be used to ﬁnd large primes? Suppose that you

want to generate an n-digit prime. You generate an arbitrary n-digit number r and

subject it to a good pseudoprimality test (for example, Rabin–Miller Test) repeating

the test several times. Suppose that we have done k runs of Test 4 with different


random b’s and each time got the answer ‘inconclusive’. If r is composite, then the

probability that we get the answer “inconclusive” once is less than 1/4. If we run

this test k times, the probability that we get the answer “inconclusive” every time is less than 1/4^k. For k = 5 this probability is less than 10^(−3). For k = 10 it is less than 10^(−6), which is a very small number already. Since Test 4 performs very quickly, we may run this test 100 times. If we get the answer “inconclusive” all 100 times, the probability that r is composite is negligible.

In 2002 Agrawal et al. [6] came up with a polynomial deterministic algorithm

(AKS algorithm) for primality testing. It is based on the following variation of Fer-

mat’s Little Theorem for polynomials:

Theorem 2.4.2 Let gcd(a, n) = 1 and n > 1. Then n is prime if and only if

(x − a)^n ≡ (x^n − a) mod n.
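The congruence of Theorem 2.4.2 can be checked directly for small n by expanding (x − a)^n coefficient by coefficient. Note that this naive check takes time exponential in log n — the whole point of the AKS algorithm is to avoid it — so the Python sketch below only illustrates the theorem, it is not the AKS algorithm:

```python
from math import gcd

def expand_power(a, n):
    """Coefficients of (x - a)^n reduced mod n, constant term first."""
    coeffs = [1]                            # the polynomial 1
    for _ in range(n):                      # repeatedly multiply by (x - a)
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - a * c) % n
            nxt[i + 1] = (nxt[i + 1] + c) % n
        coeffs = nxt
    return coeffs

def congruence_holds(a, n):
    """Check (x - a)^n ≡ x^n - a (mod n), the criterion of Theorem 2.4.2."""
    assert n > 1 and gcd(a, n) == 1
    rhs = [(-a) % n] + [0] * (n - 1) + [1]  # the polynomial x^n - a
    return expand_power(a, n) == rhs

assert congruence_holds(1, 7)        # 7 is prime
assert not congruence_holds(1, 6)    # 6 is composite
```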

The authors received the 2006 Gödel Prize and the 2006 Fulkerson Prize for this work.

Originally the AKS algorithm had complexity O((log n)^12), where n is the number to be tested, but in 2005 C. Pomerance and H.W. Lenstra, Jr. demonstrated a variant of the AKS algorithm that runs in O((log n)^6) operations, a marked improvement over

the bound in the original algorithm. Despite all the efforts it is still not yet practical,

but a number of researchers are actively working on improving this algorithm. See

[7] for more information on the algorithm and a proof of Theorem 2.4.2.

Exercises

1. We implement the ﬁrst and the second pseudoprimality tests by choosing at ran-

dom b in the interval 1 < b < n and applying it to the pair (b, n).

(a) What is the probability that the ﬁrst pseudoprimality test ﬁnds that 91 is

composite?

(b) What is the probability that the second pseudoprimality test ﬁnds that 91 is

composite?

2. Show that the third pseudoprimality test ﬁnds that 91 is composite for the pair

(5, 91).

3. Prove that any number F_n = 2^(2^n) + 1 is either a prime or a pseudoprime to the base 2. (Use Exercise 4 in Sect. 1.1.1.)

4. Write a GAP program that checks if a number n is a Carmichael number. Use it

to ﬁnd out if the number 15841 is a Carmichael number.

5. Prove without using GAP that 561 is a Carmichael number, i.e., a560 ≡ 1 mod 561

for all a relatively prime to 561.

6. Show that 561 is a pseudoprime to the base 7 (i.e., n = 561 passes the Third

Pseudoprimality Test with b = 7) but not a pseudoprime to the base 7 relative to

the Miller–Rabin test.

7. Show that the Miller–Rabin test with b = 2 proves that n = 294409 is composite

(despite 294409 being a Carmichael number).

8. Show that a power of a prime is never a Carmichael number.

2.5 Applications of Cryptology 69

1. Key exchange. The Diffie–Hellman key exchange protocol was proposed in 1976 by Diffie and Hellman [8], and it triggered the development of public-key

cryptography. Two parties A and B openly agree on two parameters: a positive integer

n and g ∈ Zn. They secretly choose two exponents a and b, respectively. Then A sends g^a to B and B sends g^b to A. After that, B takes the received g^a to the exponent b to get g^(ab), and A takes g^b to the exponent a and also gets g^(ab). Then they use g^(ab) as their secret key. An eavesdropper has to compute g^(ab) from g, g^a and g^b, which for n sufficiently large is intractable. The Elgamal cryptosystem, which we will study later, develops this idea further.
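One run of the exchange can be sketched as follows; the modulus below is a toy value chosen for illustration, not a cryptographically sound choice of n and g:

```python
import random

def diffie_hellman(n, g):
    """One run of the Diffie-Hellman exchange; returns the shared key g^(ab) mod n."""
    a = random.randrange(2, n - 1)        # A's secret exponent
    b = random.randrange(2, n - 1)        # B's secret exponent
    ga, gb = pow(g, a, n), pow(g, b, n)   # the two openly exchanged values
    key_a = pow(gb, a, n)                 # A computes (g^b)^a
    key_b = pow(ga, b, n)                 # B computes (g^a)^b
    assert key_a == key_b                 # both arrive at g^(ab)
    return key_a

key = diffie_hellman(2147483647, 7)       # toy modulus: the Mersenne prime 2^31 - 1
```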

2. Digital signatures. The notion of a digital signature may prove to be one of

the most fundamental and useful inventions of modern cryptography. A signature

scheme provides a way for each user to sign messages so that the signatures can be

veriﬁed by anyone. More speciﬁcally, each user can create a matched pair of private

and public keys so that only they can create a signature for a message (using their

private key) but anyone can verify the signature for the message (using the signer’s

public key). The veriﬁer can convince himself that the message content has not been

altered since the message was signed. Also, the signer cannot later repudiate having

signed the message, since no one but the signer possesses the signer’s private key.

For example, when your computer receives a software update, say from Adobe,

it checks the digital signature to make sure that this is a genuine update from Adobe

and not a virus or trojan.

At this stage the only public-key cryptosystem that we know is the RSA but as

we will see the idea can also be used for other cryptosystems. If in RSA n = pq

is the product of two large primes p and q, then the message space M is the set

{0, 1, 2, . . . , n − 1}. We have functions EU and DU (encryption and decryption) given by

EU(m) = m^(eU) mod nU,   DU(m) = m^(dU) mod nU,

where eU and dU are the public exponent and the private exponent of user U, respectively. One can turn this around to obtain a digital signature. If m is a document which

is to be signed by the user U, then she computes her signature as s = DU (m). The

user sends m together with the signature s. Anyone can now verify the signature by

testing whether EU(s) ≡ m mod nU or not.
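The sign-and-verify round trip can be sketched with toy parameters (the primes below are far too small for real use):

```python
# Toy RSA signature: p and q are illustrative primes, not realistic key sizes.
p, q = 61, 53
n = p * q
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, relatively prime to phi(n)
d = pow(e, -1, phi)          # private exponent (Python 3.8+; Extended Euclid)

def sign(m):
    """The signature s = D(m) = m^d mod n, computable only with the private key."""
    return pow(m, d, n)

def verify(m, s):
    """Anyone can check E(s) ≡ m (mod n) using only the public key."""
    return pow(s, e, n) == m

s = sign(1234)
assert verify(1234, s)
assert not verify(1235, s)   # a tampered message no longer matches the signature
```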

This idea was ﬁrst proposed by Difﬁe and Hellman [8]. The point is that if the

message m was changed then the old signature would be no longer valid, and the

only person who can create a new signature, matching the new message, should be

someone who knows the private key DU, and we assume that only user U possesses DU.

By analogy with the paper world, where Alice might sign a letter and seal it in an

envelope addressed to Bob, Alice can sign her electronic letter m to Bob by appending

her digital signature DA (m) to m, and then seal it in an “electronic envelope” with

Bob’s address by encrypting her signed message with Bob’s public key, sending


the resulting message EB (m|DA (m)) to Bob. Only Bob can open this “electronic

envelope” by applying his private key to it to obtain DB (EB (m|DA (m))) = m|DA (m).

After that he will apply Alice’s public key to the signature obtaining EA (DA (m)). On

seeing that EA (DA (m)) = m, Bob can be really sure that the message m came from

Alice and its content was not altered by a third party.

These applications of public-key technology to electronic mail are likely to

become widespread in the near future. For simplicity, we assumed here that the

message m was short enough to be transmitted in one piece. If the message is long

there are methods to keep the signature short. We will not dwell on this here.

3. Pay-per-view movies. It is common these days that cable TV operators with all-

digital systems encrypt their services. This lets cable operators activate and deactivate

a cable service without sending a technician to your home. The set-up involves each

subscriber having a set-top box, which is a device connected to a television set at the

subscribers’ premises and which allows a subscriber to view encrypted channels of

his choice on payment. The set-top box contains a set of private keys of the user. A

‘header’ broadcast in advance of the movie contains keys sufﬁcient to download the

actual movie. This header is in turn encrypted with the relevant user public keys.

4. Friend-or-foe identiﬁcation. Suppose A and B share a secret key K. Later, A

is communicating with someone and he wishes to verify that he is communicating

with B. A simple challenge-response protocol to achieve this identiﬁcation is as

follows:

• A generates a random value r and transmits r to the other party.

• The other party (assuming that it is B) encrypts r using their shared secret key K

and transmits the result back to A.

• A compares the received ciphertext with the result he obtains by encrypting r

himself using the secret key K. If the result agrees with the response from B, A

knows that the other party is B; otherwise he assumes that the other party is an

impostor.
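The protocol above can be sketched as follows. Here an HMAC stands in for the shared-key encryption mentioned in the text — a substitution on our part, not the book's construction — and all names are our own:

```python
import hmac, hashlib, secrets

def respond(key, challenge):
    """B's reply: a keyed function of the challenge (HMAC in place of encryption)."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

def identify(key, other_party):
    """A's side of the challenge-response protocol."""
    r = secrets.token_bytes(16)          # A generates a random value r ...
    reply = other_party(r)               # ... and sends it as a challenge
    expected = respond(key, r)           # A computes the expected reply himself
    return hmac.compare_digest(reply, expected)

K = b"shared secret key"
assert identify(K, lambda r: respond(K, r))             # genuine B
assert not identify(K, lambda r: respond(b"guess", r))  # impostor with a wrong key
```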

This protocol is generally more useful than the transmission of an unencrypted shared

password from B to A, since the eavesdropper could learn the password and then pre-

tend to be B later. With the challenge-response protocol an eavesdropper presumably

learns nothing about K by hearing many values of r encrypted with K as key.

An interesting exercise is to consider whether the following variant of the above

idea is secure: A sends the encryption of a random r, B decrypts it and sends the

value r to A, and A veriﬁes that the response is correct.

Exercises

1. Alice and Bob agreed to use Difﬁe–Hellman secret key exchange to come up with

a secret key for their secret key cryptosystem. They openly agreed on the prime

p = 100140889442062814140434711571


Alice chose her secret exponent a = 123456789. She also got a message g^b = 92639204398732276532642490482 from Bob. Which message should she send to Bob and how should she calculate the shared secret key?

2. Alice and Bob have the following RSA parameters:

nA = 171024704183616109700818066925197841516671277, eA = 1571,

nB = 839073542734369359260871355939062622747633109, eB = 87697.

Bob also knows the factorisation nB = pB·qB of his modulus, where

pB = 8495789457893457345793, qB = 98763457697834568934613.

Alice signs her message m by computing the signature s = m^(dA) (mod nA), and then encrypts the pair (m, s) using Bob’s public key by calculating (m1, s1), where m1 = m^(eB) (mod nB) and s1 = s^(eB) (mod nB). She obtains

m1 = 119570441441889749705031896557386843883475475,

s1 = 443682430493102486978079719507596795657729083

and sends the pair (m1 , s1 ) to Bob. Show how Bob can ﬁnd the message m and

verify that it came from Alice. (Do not try to convert digits of m into letters, the

message is meaningless.)

References

1. Lenstra, A.K., Lenstra, H.W., Manasse, M.S., Pollard, J.M.: The number field sieve. In: Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, Baltimore, pp. 564–572, 14–16 May 1990

2. Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public key

cryptosystems. Commun. ACM 21(2), 120–126 (1978)

3. Atkins, D., Graff, M., Lenstra, A.K., Leyland, P.C.: The magic words are squeamish ossifrage.

In: ASIACRYPT-94, Lecture Notes in Computer Science. vol. 917. Springer, New York (1995)

4. Alford, W.R., Granville, A., Pomerance, C.: There are inﬁnitely many Carmichael numbers.

Ann. Math. 140, 703–722 (1994)

5. Williams, H.C.: Primality testing on a computer. Ars Combinatoria 5, 127–185 (1978)

6. Agrawal, M., Kayal, N., Saxena, N.: PRIMES is in P. Department of Computer Science and

Engineering, Indian Institute of Technology, Kanpur, India, 6 August 2002

7. Song, Y.Y.: Primality Testing and Integer Factorization in Public-key Cryptography. Kluwer,

The Netherlands (2004)

8. Diffie, W., Hellman, M.: New directions in cryptography. IEEE Trans. Inf. Theory IT-22, 644–654 (1976)

9. Kerckhoffs, A.: La cryptographie militaire. Journal des sciences militaires. 9, 5–83 (1883)

Chapter 3

Groups

activity, that communal bath where the hairy and slippery mix in

a multiplication of mediocrity.

Vladimir Nabokov (1899–1977)

It may seem pretty obvious what a group is, but it’s worth giving

it some thought anyway.

(from business management literature)

The concept of a group unifies many mathematical structures which at first sight might appear unrelated. In this chapter we will start by

looking at groups of permutations from which groups take their origin. We will then

give a general deﬁnition of a group, and move on to studying the multiplicative group

of Zn and the group of points of an elliptic curve. The latter two groups have recently

gained cryptographic signiﬁcance. Group theory plays a central role in cryptography;

as a matter of fact, any large ﬁnite group can potentially be a basis of a cryptographic

system.

3.1 Permutations

of Permutations of Degree n

Let f : A → B and g : B → C be two mappings. For any element a ∈ A we can find its image f (a) ∈ B under f and for that element

of B we can ﬁnd its image g(f (a)) ∈ C under g. We have now implicitly deﬁned

a third mapping which maps a ∈ A onto g(f (a)). We denote this mapping by f ◦ g

and call it the composition of mappings f and g. As a formula, it can be written as

(f ◦ g)(a) = g(f (a)).

Important Note: the convention we use runs contrary to that used in Calculus,

where f ◦ g(x) = f (g(x)) (i.e., ﬁrst compute g(x), then apply the function f to

the result). This may cause some minor problems to students used to a different


convention. The great advantage of writing the composition in this way is that it is

the same convention as the one used in GAP.
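In code the left-to-right convention reads as follows; a two-line Python sketch (GAP itself composes permutations left-to-right in the same way):

```python
def compose(f, g):
    """The book's convention: (f ∘ g)(a) = g(f(a)) — apply f first, then g."""
    return lambda a: g(f(a))

f = lambda x: x + 1
g = lambda x: 2 * x
assert compose(f, g)(3) == 8   # g(f(3)) = 2 * 4
assert compose(g, f)(3) == 7   # f(g(3)) = 6 + 1
```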

One of the properties of composition of major importance is its compliance with

the associative law.

Proposition 3.1.1 Composition of mappings is associative, that is, given sets A, B,

C, D and mappings f : A → B, g : B → C and h : C → D, we have

(f ◦ g) ◦ h = f ◦ (g ◦ h).

Proof Two mappings from A to D are equal when they assign exactly the same

images in D to every element in A. Let us calculate the image of a ∈ A ﬁrst under

the mapping (f ◦ g) ◦ h and then under f ◦ (g ◦ h):

((f ◦ g) ◦ h)(a) = h((f ◦ g)(a)) = h(g(f (a))),
(f ◦ (g ◦ h))(a) = (g ◦ h)(f (a)) = h(g(f (a))).

The image of a under both mappings is the same. Since a ∈ A was arbitrary, the two

mappings are equal. �

If a mapping f : A → A is one-to-one and onto then f is invertible, i.e., there exists a function g : A → A such

that

g ◦ f = f ◦ g = id, (3.1)

where id is the identity mapping on A. In this case f and g are called mutual inverses

and we use the notation g = f⁻¹ and f = g⁻¹ to express that. Equation (3.1) means

that g maps f (a) to a while f maps g(a) to a, i.e., g undoes the work of f , and f

undoes the work of g.

Example 3.1.1 Let R+ be the set of positive real numbers. Let f : R+ → R and

g : R → R+ be given as f (x) = ln x and g(x) = ex . These are mutual inverses and

hence both functions are invertible.

In what follows we assume that the set A is ﬁnite and consider mappings from A

into itself. If A has n elements, for convenience, we assume that the elements of A

are the numbers 1, 2, . . . , n (the elements of any ﬁnite set can be labeled with the

ﬁrst few integers, so this does not restrict generality).


Since a function is speciﬁed if we indicate what the image of each element is, we

can specify a permutation π by listing each element together with its image, like so:

π = ( 1    2    3    · · ·  n − 1    n    )
    ( π(1) π(2) π(3) · · ·  π(n − 1) π(n) ).

Given that π is one-to-one, no number is repeated in the second row of the array.

Given that π is onto, each number from 1 to n appears somewhere in the second row.

In other words, the second row is just a rearrangement of the ﬁrst.1

Example 3.1.2 The permutation

π = ( 1 2 3 4 5 6 7 )
    ( 2 5 3 1 7 6 4 )

is the permutation of degree 7 which maps 1 to 2, 2 to 5, 3 to 3, 4 to 1, 5 to 7, 6 to 6, and 7 to 4.

Example 3.1.3 The mapping σ : {1, 2, . . . , 6} → {1, 2, . . . , 6} given by σ(i) = 3i mod 7 is a permutation of degree 6. Indeed,

3 · 1 ≡ 3,  3 · 2 ≡ 6,  3 · 3 ≡ 2,  3 · 4 ≡ 5,  3 · 5 ≡ 1,  3 · 6 ≡ 4  (mod 7),

and thus

σ = ( 1 2 3 4 5 6 )
    ( 3 6 2 5 1 4 ).

Proposition The number of distinct permutations of degree n is n!.

Proof Let us consider a permutation of degree n. It is completely determined by its

bottom row. There are n ways to ﬁll the ﬁrst position of this row, n − 1 ways to ﬁll

the second position (since we must not repeat the ﬁrst entry), etc., leading to a total

of n · (n − 1) · · · · · 2 · 1 = n! different possibilities. �

The composition of two permutations of degree n is again a permutation of degree

n. Most of the time we will omit the symbol ◦ for the composition, and speak of the

product πσ of two permutations π and σ, meaning the composition π ◦ σ.

Example 3.1.4 Let

σ = ( 1 2 3 4 5 6 7 8 )      π = ( 1 2 3 4 5 6 7 8 )
    ( 2 4 5 6 1 8 3 7 ),         ( 4 6 1 3 8 5 7 2 ).

Then

σπ = ( 1 2 3 4 5 6 7 8 ) ( 1 2 3 4 5 6 7 8 ) = ( 1 2 3 4 5 6 7 8 )
     ( 2 4 5 6 1 8 3 7 ) ( 4 6 1 3 8 5 7 2 )   ( 6 3 8 5 4 2 1 7 ),

1 Clearly, in this case of finite sets, one-to-one implies onto and vice versa, but this will no longer be the case for infinite sets.


and

πσ = ( 1 2 3 4 5 6 7 8 ) ( 1 2 3 4 5 6 7 8 ) = ( 1 2 3 4 5 6 7 8 )
     ( 4 6 1 3 8 5 7 2 ) ( 2 4 5 6 1 8 3 7 )   ( 6 8 2 5 7 1 3 4 ).

Explanation: to calculate σπ we find:

• the image of 1 when we apply first σ, then π (1 → 2 → 6, so we write the 6 under the 1),
• the image of 2 when we apply first σ, then π (2 → 4 → 3, so we write the 3 under the 2),
• etc.

All this is easily done at a glance and can be written down immediately; BUT be careful to start with the left hand factor!

Similarly, to calculate πσ we find:

• the image of 1 when we apply first π, then σ (1 → 4 → 6, so we write the 6 under the 1),
• the image of 2 when we apply first π, then σ (2 → 6 → 8, so we write the 8 under the 2),
• etc.

All this is easily done at a glance and can be written down immediately; BUT be careful to start with the left hand factor again!

Important Note: the example shows clearly that πσ ≠ σπ, that is, the commutative law for permutations does not hold; so we have to be very careful about the order

of the factors in a product of permutations. But the good news is that the composition

of permutations is associative. This follows from Proposition 3.1.1.
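The computations of Example 3.1.4 can be checked mechanically. In the Python sketch below (our own notation, not the book's GAP code) a permutation is stored as a dictionary mapping each point to its image, and `compose` follows the book's left-to-right convention:

```python
def compose(sigma, pi):
    """Product σπ under the book's convention: (σπ)(a) = π(σ(a))."""
    return {a: pi[sigma[a]] for a in sigma}

def inverse(pi):
    """Read the array 'from the bottom up': π(a) = b means π⁻¹(b) = a."""
    return {pi[a]: a for a in pi}

# σ and π from Example 3.1.4
sigma = dict(zip(range(1, 9), [2, 4, 5, 6, 1, 8, 3, 7]))
pi    = dict(zip(range(1, 9), [4, 6, 1, 3, 8, 5, 7, 2]))

assert list(compose(sigma, pi).values()) == [6, 3, 8, 5, 4, 2, 1, 7]  # σπ
assert list(compose(pi, sigma).values()) == [6, 8, 2, 5, 7, 1, 3, 4]  # πσ
assert compose(sigma, pi) != compose(pi, sigma)                       # not commutative
assert compose(pi, inverse(pi)) == {a: a for a in pi}                 # ππ⁻¹ = id
```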

We can also calculate the inverse of a permutation; for example, using the same

π as above, we ﬁnd

π⁻¹ = ( 1 2 3 4 5 6 7 8 )
      ( 3 8 4 1 6 2 7 5 ).

Explanation: just read the array for π from the bottom up: since π(1) = 4, we must

have π −1 (4) = 1, hence write 1 under the 4 in the array for π −1 , since π(2) = 6, we

must have π −1 (6) = 2, hence write 2 under the 6 in the array for π −1 , etc. In this

case we will indeed have ππ −1 = id = π −1 π.

Similarly, we calculate

σ⁻¹ = ( 1 2 3 4 5 6 7 8 )
      ( 5 1 7 2 3 4 8 6 ).


Simple algebra shows that the inverse of a product can be calculated from the product of the inverses (but note how the order is reversed!):

(πσ)⁻¹ = σ⁻¹π⁻¹.    (3.2)

To justify this, we need only to check that the product of πσ and σ⁻¹π⁻¹ equals the identity, and this is pure algebra: it follows from the associative law that

(πσ)(σ⁻¹π⁻¹) = π(σσ⁻¹)π⁻¹ = π id π⁻¹ = ππ⁻¹ = id.

Deﬁnition 3.1.2 The set of all permutations of degree n with the operation of com-

position is called the symmetric group of degree n, and is denoted by Sn .

Composition indeed makes Sn a group:

1. Sn is associative, i.e., (πσ)τ = π(στ) for all π, σ, τ ∈ Sn;
2. Sn has an identity element id, i.e., π id = id π = π for all π ∈ Sn;
3. every element π ∈ Sn has an inverse π⁻¹, i.e., ππ⁻¹ = id = π⁻¹π.

In Sect. 1.4 we deﬁned a commutative group. This group is not commutative as πσ is

not necessarily equal to σπ. The concept of a group was introduced into mathematics

by Évariste Galois.2

Exercises

1. In the following two cases calculate f ◦ g and g ◦ f . Note that they are different

and even their natural domains are different.

(a) f (x) = sin x and g(x) = 1/x;
(b) f (x) = e^x and g(x) = √x.

2. Let Rθ be an anticlockwise rotation of the plane about the origin through an

angle θ. Show that Rθ is invertible with the inverse R2π−θ .

3. Show that any reﬂection H of the plane in any line is invertible and the inverse

of H is H itself.

4. Determine how many permutations of degree n act identically on a ﬁxed set of

k elements of {1, 2, . . . , n}.

5. Show that the mapping σ : {1, 2, . . . , 8} → {1, 2, . . . , 8} given by σ(i) =

5i mod 9 is a permutation by writing it down in the form of a table.

6. Let the mapping π : {1, 2, . . . , 12} → {1, 2, . . . , 12} be deﬁned by π(k) =

3k mod 13. Show that π is a permutation of S12 .

2 Évariste Galois (1811–1832), a French mathematician who was the ﬁrst to use the word “group”

(French: groupe) as a technical term in mathematics to represent a group of permutations. While

still in his teens, he was able to determine a necessary and sufﬁcient condition for a polynomial to

be solvable by radicals, thereby solving a long-standing problem. His work laid the foundations for

Galois theory, a major branch of abstract algebra.


mod 13. Show that τ is not a permutation of S12 by showing that both one-

to-one and onto properties are violated.

8. Calculate the inverses and all distinct powers of the permutations:

ρ = ( 1 2 3 4 5 6 )      τ = ( 1 2 3 4 5 6 )
    ( 3 4 5 6 1 2 ),         ( 4 6 5 1 3 2 ).

9. Let

σ = ( 1 2 3 4 5 6 7 8 9 )      γ = ( 1 2 3 4 5 6 7 8 9 )
    ( 2 4 5 6 1 9 8 3 7 ),         ( 6 2 7 9 3 8 1 4 5 ).

10. Prove rigorously that the composition of two permutations of degree n is a

permutation of degree n.

A cryptosystem in which the symbols of each block of text are rearranged by a fixed permutation π is called a permutation cipher. In this cryptosystem a plaintext and ciphertext are both over the same alphabet. Let m = a1 a2 . . . an be a message of fixed length n over an alphabet A. Then the corresponding cryptotext is defined as

c = aπ(1) aπ(2) . . . aπ(n) ,

which means the symbols of the message are permuted in accord with the permutation π. If the message is longer than n we split it into smaller segments of length n. (It is always possible to add some junk letters to make the total length of the message divisible by n.)

For example, let n = 16 and

         1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
    π =
         2  12 3  16 4  10 9  15 7  8  6  5  14 1  13 11

To encrypt the message

ALL ALL ARE GONE THE OLD FAMILIAR FACES

we remove the spaces and split it into two blocks of length 16:

ALLALLAREGONETHE OLDFAMILIARFACES

3.1 Permutations 79

Applying π to each block we obtain

LNLEAGEHARLLTAEO LFDSFAIEILMACOAR

and joining the blocks together gives the cryptotext

LNLEAGEHARLLTAEOLFDSFAIEILMACOAR
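The scheme just described is easy to prototype. Below is a minimal Python sketch (not from the book; the key name PI and the helper name encrypt are mine) that reproduces the example above:

```python
# A minimal sketch of the permutation cipher described above (names are mine).
PI = [2, 12, 3, 16, 4, 10, 9, 15, 7, 8, 6, 5, 14, 1, 13, 11]  # the key: pi(1), ..., pi(16)

def encrypt(message, pi):
    n = len(pi)
    message += "X" * (-len(message) % n)      # pad with junk letters if needed
    # within each block the i-th ciphertext symbol is the pi(i)-th plaintext symbol
    return "".join(
        "".join(message[start + p - 1] for p in pi)
        for start in range(0, len(message), n)
    )

print(encrypt("ALLALLAREGONETHEOLDFAMILIARFACES", PI))
# LNLEAGEHARLLTAEOLFDSFAIEILMACOAR
```

Decryption applies the inverse permutation to each block in the same way.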

Such a cipher is not easy to break by brute force only. Indeed, the length of the blocks is unknown, and even if it is known, the space of secret keys is very large: it consists of n! possible permutations, and n! grows very fast. Even for a reasonably small n like n = 128 the number of possible keys is astronomical.

However, if one can guess even a fragment of the plaintext, it may become easy.

To make guessing the plaintext difﬁcult a substitution cipher can be applied ﬁrst.

The combination of substitutions and permutations is called a product cipher. Product ciphers are not normally used on their own, but they are an indispensable part of modern cryptography. For example, the DES (Data Encryption Standard), adopted on 23 November 1976, involved 16 rounds of substitutions and permutations.

The main steps of the DES algorithm are as follows:

• Partitioning of the text into 64-bit blocks;

• Initial permutation within each block;

• Breakdown of the blocks into two parts: left and right, named L and R;

• Permutation and substitution steps repeated 16 times (called rounds) on each part;

• Re-joining of the left and right parts, then application of the inverse of the initial permutation.
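The rounds on the two halves follow a so-called Feistel arrangement, which is invertible no matter what mixing function a round uses. A toy Python sketch of that idea (not the real DES: the round function F and the round keys below are made up for illustration; DES uses S-boxes and 48-bit round keys):

```python
# Toy Feistel rounds (illustrative only -- this is not the real DES round function).
def feistel_encrypt(left, right, round_keys, F):
    for k in round_keys:
        left, right = right, left ^ F(right, k)   # swap halves, mix one in
    return left, right

def feistel_decrypt(left, right, round_keys, F):
    for k in reversed(round_keys):                # undo the rounds in reverse order
        left, right = right ^ F(left, k), left
    return left, right

F = lambda half, key: (half * 2654435761 + key) & 0xFFFFFFFF  # made-up mixing function
keys = [7, 129, 3021, 55]                                     # made-up round keys

block = (0x12345678, 0x9ABCDEF0)
cipher = feistel_encrypt(*block, keys, F)
print(feistel_decrypt(*cipher, keys, F) == block)   # True
```

Note that decryption works even though F itself need not be invertible; only the order of the round keys is reversed.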

DES is now considered to be insecure for many applications. In 1997, a call was

launched for projects to develop an encryption algorithm in order to replace the

DES. After an international competition, in 2001, a new block cipher Rijndael3 was

selected as a replacement for DES. It is now referred to as the Advanced Encryption

Standard (AES).

A permutation that cyclically permutes some of the elements (and leaves all others fixed) is called a cycle.

For example, the permutation

         1 2 3 4 5 6 7
    π =
         1 5 3 7 4 6 2

is a cycle, because we have 5 → 4 → 7 → 2 → 5, and each of the other elements of {1, 2, 3, 4, 5, 6, 7}, namely 1, 3, 6, stays unchanged. To see this, we must of course chase elements around; the nice cyclic structure is not immediately evident from our notation. We write π = (5 4 7 2), meaning that all numbers not on the list are mapped to themselves,

3 J.

Daemen and V. Rijmen. The block cipher Rijndael, Smart Card Research and Applications,

LNCS 1820, Springer–Verlag, pp. 288–296.


whilst the ones in the bracket are mapped to the one listed to the right, except the

rightmost one, which is mapped to the leftmost on the list.

Note: cycle notation is not unique, since there is no beginning or end to a circle.

We can write π = (5 4 7 2) and π = (2 5 4 7), as well as π = (4 7 2 5) and

π = (7 2 5 4)—they all denote one and the same cycle.

We say that a permutation is a cycle of length k (or a k-cycle) if it moves k numbers.

For example, (3 6 4 9 2) is a 5-cycle, (3 6) is a 2-cycle, (1 3 2) is a 3-cycle. We note

also that the inverse of a cycle is again a cycle. For example (1 2 3)−1 = (1 3 2) (or

(3 2 1) if you prefer). Similarly, (1 2 3 4 5)−1 = (1 5 4 3 2). To ﬁnd the inverse of

a cycle one has to reverse the arrows. This leads us to the following

Theorem 3.1.2 (i1 i2 i3 . . . ik )−1 = (ik ik−1 . . . i2 i1 ).

Not all permutations are cycles; for example, the permutation

         1 2 3 4  5 6 7 8 9 10 11 12
    σ =                                        (3.3)
         4 3 2 11 8 9 5 6 7 10 1  12

is not a cycle: we have 1 → 4 → 11 → 1, but the other elements are not all fixed (2 goes to 3, for example). Let us chase the other elements. We find: 2 → 3 → 2 and 5 → 8 → 6 → 9 → 7 → 5. So in the permutation σ three cycles coexist peacefully.

Two cycles (i1 i2 i3 . . . ik ) and (j1 j2 j3 . . . jm ) are said to be disjoint, if the sets

{i1 , i2 , . . . , ik } and {j1 , j2 , . . . , jm } have empty intersection. For instance, we may say

that

(1 5 8) and (2 4 3 6 9)

are disjoint. Any two disjoint cycles σ and τ commute, i.e., στ = τ σ (see Exercise 1).

For example,

(1 2 3 4)(5 6 7) = (5 6 7)(1 2 3 4).

However, if we multiply any cycles which are not disjoint, we have to watch their

order; for example: (1 2)(1 3) = (1 2 3), whilst (1 3)(1 2) = (1 3 2), and

(1 3 2) ≠ (1 2 3).

The relationship between a cycle and the permutation group it belongs to is much

like that between a prime and the natural numbers.

Theorem 3.1.3 Every permutation is a product of disjoint cycles. Moreover, any such representation is unique up to the order of the factors.

Proof We take an arbitrary element i1 and start a cycle: σ(i1 ) = i2 , σ(i2 ) = i3 , etc. Suppose that i1 , i2 , . . . , ik were all different and σ(ik ) ∈ {i1 , i2 , . . . , ik } (this has to happen sooner or later since the set {1, 2, . . . , n} is finite). If σ(ik ) = i1 , we have a cycle. No other possibility can exist: if σ(ik ) = iℓ for 2 ≤ ℓ ≤ k, then σ(iℓ−1 ) = iℓ = σ(ik ), which contradicts σ being one-to-one. We observe then that σ = (i1 i2 i3 . . . ik )σ′ , where σ′ does not move


any element of the set {i1 , i2 , . . . , ik } and acts as σ on the complement of this set. So σ′ fixes strictly more elements than σ does. This operation can now be applied to σ′ and so on. It will terminate at some stage and at that moment σ will be represented as a product of disjoint cycles. �
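The cycle-chasing procedure from this proof is easy to carry out mechanically. A Python sketch (not from the book; the function name is mine):

```python
# Sketch (not from the book) of the cycle-chasing procedure used in the proof above.
def disjoint_cycles(perm):
    """perm maps i -> perm[i-1] on {1, ..., n}; returns the cycles of length > 1."""
    seen, cycles = set(), []
    for start in range(1, len(perm) + 1):
        cycle, i = [], start
        while i not in seen:          # follow start -> sigma(start) -> ... back to start
            seen.add(i)
            cycle.append(i)
            i = perm[i - 1]
        if len(cycle) > 1:            # fixed points are left out of the notation
            cycles.append(cycle)
    return cycles

sigma = [4, 3, 2, 11, 8, 9, 5, 6, 7, 10, 1, 12]   # the permutation sigma of (3.3)
print(disjoint_cycles(sigma))   # [[1, 4, 11], [2, 3], [5, 8, 6, 9, 7]]
```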

Exercises

1. Explain why any two disjoint cycles commute.

2. Let the mapping π : {1, 2, . . . , 12} → {1, 2, . . . , 12} be deﬁned by π(k) =

3k mod 13. This is a permutation (you need not prove this). Find the decomposition

of π into disjoint cycles.

3. Calculate the following product of permutations in S5

4. Let

         1 2 3 4 5 6 7 8 9              1 2 3 4 5 6 7 8 9
    σ =                     ,      τ =                     .
         9 8 7 6 5 3 1 4 2              6 2 1 4 7 5 9 3 8

Calculate (στ )−1 and represent the result as a product of disjoint cycles.

important for cryptography. Now we will deﬁne the order of a permutation, and show

how the decomposition of this permutation into a product of disjoint cycles allows

us to calculate its order.

It is clear that if a permutation τ is a cycle of length k, then τ k = id, i.e., if this

permutation is repeated k times, we will have the identity permutation as a result of

this repeated action. Moreover, for no positive integer s smaller than k do we have τ s = id. Also it is clear that if τ m = id for some positive integer m, then k is a divisor

of m. This observation motivates our next deﬁnition.

Deﬁnition 3.1.3 Let π be a permutation. The smallest positive integer i such that

π i = id is called the order of π.

It is not immediately obvious that any permutation has an order. We will see later

that this is indeed the case.


Example 3.1.7 Let us calculate the order of the permutation π = (1 2)(3 4 5). We

have:

π = (1 2)(3 4 5),

π 2 = (3 5 4),

π 3 = (1 2),

π 4 = (3 4 5),

π 5 = (1 2)(3 5 4),

π 6 = id.

So the order of π is 2 · 3 = 6 (note that π has been given as a product of two disjoint

cycles with relatively prime lengths).

Note that π k = id implies both (1 2)k = id and (3 5 4)k = id, and the other way around. The order of π is the product 2 · 3 because the orders of (1 2) and (3 4 5) are relatively prime.

Example 3.1.8 The order of the permutation ρ = (1 2)(3 4 5 6) is four. To see this let

us calculate

ρ = (1 2)(3 4 5 6),

ρ2 = (3 5)(4 6),

ρ3 = (1 2)(3 6 5 4),

ρ4 = id.

So the order of ρ is 4 (note that ρ has been given as a product of disjoint cycles but

their lengths were not coprime).

More generally, this suggests that the order of a product of disjoint cycles equals

the least common multiple of the lengths of those cycles. We will upgrade this

suggestion into a theorem.

Theorem 3.1.4 Let σ = τ1 τ2 . . . τr be the decomposition of σ into a product of disjoint cycles. Let k be the order of σ and k1 , k2 , . . . , kr be the orders (lengths) of τ1 , τ2 , . . . , τr , respectively. Then

k = lcm (k1 , k2 , . . . , kr ).      (3.4)

Proof Since the cycles τi are disjoint, we know that they commute, and hence for any positive integer m

σ m = τ1m τ2m . . . τrm .

For m = lcm (k1 , k2 , . . . , kr ) each ki divides m, so every τim = id and σ m = id. Thus the order of σ is not greater than lcm (k1 , k2 , . . . , kr ).

Suppose now that σ m = id. The powers of cycles τ1m , τ2m , . . . , τrm act on disjoint sets of indices and, since σ m = id, it must be that τ1m = τ2m = · · · = τrm = id. For if not, and τsm (i) = j with i ≠ j, then the product τ1m τ2m . . . τrm cannot be equal to id because all the other permutations τ1m , . . . , τs−1m , τs+1m , . . . , τrm leave i and j invariant. Thus the order of σ is a multiple of each of the k1 , k2 , . . . , kr and hence a multiple of the least common multiple of them. Thus the order of σ is not smaller than lcm (k1 , k2 , . . . , kr ). This proves the theorem. �

For example, the order of a product of disjoint cycles of lengths 4, 3, 2, 3 and 5 is lcm(4, 3, 2, 3, 5) = 60. Before applying the formula (3.4) we must carefully check that the cycles are disjoint.

To find the order of an arbitrary permutation, we first represent it as a product of disjoint cycles. For example, to determine the order of

         1 2 3 4  5 6 7 8 9 10 11 12
    σ =
         4 3 2 11 8 9 5 6 7 10 1  12

we represent it as

σ = (1 4 11)(2 3)(5 8 6 9 7),

and conclude that the order of σ is lcm(3, 2, 5) = 30.
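This calculation can be automated directly from Theorem 3.1.4: chase the cycles and take the lcm of their lengths. A Python sketch (not from the book; math.lcm needs Python 3.9+):

```python
import math

# Sketch (not from the book): the order of a permutation as the lcm of its
# disjoint cycle lengths, as in Theorem 3.1.4.
def order(perm):
    """perm maps i -> perm[i-1] on {1, ..., n}."""
    seen, lengths = set(), []
    for start in range(1, len(perm) + 1):
        length, i = 0, start
        while i not in seen:          # chase the cycle containing start
            seen.add(i)
            length += 1
            i = perm[i - 1]
        if length > 0:
            lengths.append(length)
    return math.lcm(*lengths)

sigma = [4, 3, 2, 11, 8, 9, 5, 6, 7, 10, 1, 12]   # = (1 4 11)(2 3)(5 8 6 9 7)
print(order(sigma))   # 30
```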

Exercises

1. Find the orders of the following permutations:

             1 2 3 4 5 6 7 8 9
    (a) σ =                     ,
             5 3 6 7 1 2 8 9 4

    (b) τ = (1 2)(2 3 4)(4 5 6 7)(7 8 9 10 11).

2. There is an amusing legend about Flavius Josephus, a famous historian and math-

ematician who lived in the ﬁrst century A.D. The story says that in the Jewish

revolt against Rome, Josephus and 40 of his comrades were holding out against

the Romans in a cave. With defeat imminent, they resolved that, like the rebels

at Masada, they would rather die than be slaves to the Romans. They decided

to arrange themselves in a circle. One man was designated as number one, and

they proceeded clockwise around the circle of 41 men killing every third man.


At ﬁrst it is obvious whose turn it was to be killed. Initially, the men in positions

3, 6, 9, 12, . . . , 39 were killed. The next man to be killed was in position 1 and

then in position 5 (since the man in position 3 was slaughtered earlier), and so on.

Josephus (according to the story) instantly ﬁgured out where he ought to stand

in order to be the last man to go. When the time came, instead of killing him-

self, he surrendered to the Romans and lived to write his famous histories: “The

Antiquities” and “The Jewish War”.

(a) Find the permutation σ (called the Josephus permutation) for which σ(i) is

the number of the man who was ith to be killed.

(b) In which position did Josephus stand around the circle?

(c) Find the cyclic structure of the Josephus permutation.

(d) What is the order of the Josephus permutation?

(e) Calculate σ 2 and σ 3 .

3. The mapping π(i) = 13i mod 23 is a permutation of S22 (do not prove this). Find

the decomposition of π into a product of disjoint cycles and determine the order

of this permutation.

Often in mathematics (and often in cryptography) a certain action is performed repeatedly and we are interested in the outcome that results after a number of repetitions.

As one particularly instructive example we will analyse the so-called interlacing

shufﬂe that card players often do with a deck of cards. Suppose that we have a deck

of 2n cards (normally 52) and suppose that our cards were numbered from 1 to 2n

and the original order of cards in the deck was

a1 a2 a3 . . . a2n−1 a2n .

We split the deck into two halves which contain the cards a1 , a2 , . . . , an and an+1 , an+2 , . . . , a2n , respectively. Then we interlace them as follows. We put the ﬁrst

card of the second pile ﬁrst, then the ﬁrst card of the ﬁrst pile, then the second card of

the second pile, then the second card of the ﬁrst pile etc. This is called the interlacing

shufﬂe. After this operation the order of cards will be:

an+1 a1 an+2 a2 . . . a2n an .

Let us put the permutation

          1 2 3 . . . n   n + 1  n + 2  . . .  2n
    σn =
          2 4 6 . . . 2n    1      3    . . .  2n − 1

in correspondence to this shufﬂe. All it says is that the ﬁrst card goes to the second

position, the second card is moved to the fourth position, etc. We see that we can

deﬁne this permutation by the formula:

σn (i) = 2i mod (2n + 1)

and σn (i) is the position of the ith card after the shufﬂe. What will happen after

2, 3, 4, . . . shufﬂes? The resulting change will be characterized by the permutations

σn2 , σn3 , σn4 , . . . , respectively.

For example,

          1 2 3 4 5 6 7 8
    σ4 =                    = (1 2 4 8 7 5)(3 6).
          2 4 6 8 1 3 5 7

The order of σ4 is 6.

Similarly,

          1 2 3 4 5  6 7 8 9 10
    σ5 =                          = (1 2 4 8 5 10 9 7 3 6).
          2 4 6 8 10 1 3 5 7 9

Also σ5^10 = id and 10 is the order of σ5 . Hence all cards will be back to their initial

positions after 10 shufﬂes but not before.
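These small cases can also be checked by brute force, composing σn with itself until the identity appears. A Python sketch (not from the book; the function names are mine):

```python
# Sketch (not from the book): brute-force order of the interlacing shuffle
# sigma_n(i) = 2i mod (2n+1).
def sigma(n):
    return [2 * i % (2 * n + 1) for i in range(1, 2 * n + 1)]

def order(perm):
    identity = list(range(1, len(perm) + 1))
    power, k = perm[:], 1
    while power != identity:
        power = [perm[x - 1] for x in power]   # one more application of the shuffle
        k += 1
    return k

print(order(sigma(4)), order(sigma(5)))   # 6 10
```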

Let us deal with the real thing, that is, a deck of 52 cards. We know that the interlacing shufﬂe is deﬁned by the equation σ26 (i) = 2i mod 53. GAP helps us

to investigate. We have:

gap> lastrow:=[1..52];;

gap> for i in [1..52] do

> lastrow[i]:=2*i mod 53;

> od;

gap> lastrow;

[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40,

42, 44, 46, 48, 50, 52, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27,

29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51 ]

gap> PermList(lastrow);

(1,2,4,8,16,32,11,22,44,35,17,34,15,30,7,14,28,3,6,12,24,48,43,33,13,26,52,51,

49,45,37,21,42,31,9,18,36,19,38,23,46,39,25,50,47,41,29,5,10,20,40,27)

gap> Order(last);

52

Thus the interlacing shufﬂe σ26 is a cycle of length 52 and has order 52.
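The GAP computation can be cross-checked outside GAP as well; the Python sketch below (not from the book) literally shuffles a list of 52 cards until it returns to the starting order:

```python
# Cross-check of the GAP session above (a sketch, not from the book): apply the
# interlacing shuffle sigma_26(i) = 2i mod 53 until the deck comes back.
def shuffle(deck):
    out = [None] * 52
    for pos, card in enumerate(deck, start=1):
        out[2 * pos % 53 - 1] = card   # the card in position pos moves to 2*pos mod 53
    return out

start = list(range(1, 53))
deck, count = shuffle(start), 1
while deck != start:
    deck, count = shuffle(deck), count + 1
print(count)   # 52
```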

Exercises

1. A shufﬂe of a deck of 15 cards is made as follows. The top card is put at the bottom,

the deck is cut into three equal decks, the bottom third is switched with the middle


third, and then the resulting bottom card is placed on the top. How many times

must this shufﬂe be repeated to get the cards back in the initial order? Write

down the permutation corresponding to this shufﬂe and ﬁnd its decomposition

into disjoint cycles.

2. Use GAP to determine the decomposition into disjoint cycles and the order of the

interlacing shufﬂe σ52 for the deck of 104 cards which consists of two copies of

ordinary decks with 52 cards in each.

3. On a circle there are n beetles. At a certain moment they start to move all at once

and with the same speed (but maybe in different directions). When two beetles

meet, both of them reverse their directions and continue to move with the same

speed. Prove that there will be a moment when all beetles again occupy their

initial positions. (Hint: Suppose one beetle makes the full circle in time t. Think

about what will happen after time t when all beetles move.)

Cycles of length 2 are the simplest permutations, as they move only two elements. We deﬁne a transposition to be a cycle of length 2. Every permutation can be written as a product of transpositions (this corresponds to the fact that every arrangement of n objects can be obtained from a given starting position by making a sequence of swaps). We will observe, ﬁrst, that a cycle of arbitrary length

can be expressed as a product of transpositions. Then using Theorem 3.1.3 we will be

able to express any permutation as a product of transpositions. Here are some examples:

Example 3.1.13 (1 2 3 4 5) = (1 2)(1 3)(1 4)(1 5) (just check that the left-hand

side equals the right-hand side!).

More generally, any cycle can be written as a product of transpositions in the following way:

(i1 i2 . . . ir ) = (i1 i2 )(i1 i3 ) . . . (i1 ir ).      (3.5)

To write an arbitrary permutation σ as a product of transpositions, we ﬁrst decompose σ into a product of disjoint cycles, then write the cycles as products of transpositions as in formula (3.5). For example,

    1 2 3 4  5 6 7 8 9 10 11
                               = (1 4 11)(2 3)(5 8 6 9 7) = (1 4)(1 11)(2 3)(5 8)(5 6)(5 9)(5 7).
    4 3 2 11 8 9 5 6 7 10 1


Example 3.1.15 Note that there are many different ways to write a permutation as a product of transpositions; for example, (1 2 3 4 5) can be written in any of the following forms:

(1 2 3 4 5) = (1 2)(1 3)(1 4)(1 5)
            = (3 4)(3 5)(2 3)(1 3)(2 3)(2 1)(3 1)(3 2).

(Don’t ask how these products were found! The point is to check that all these

products are equal, and to note that there is nothing unique about how one can write

a permutation as a product of transpositions.)

There are many ways to decompose a given permutation into a product of transpositions. As we will see, the number of such transpositions will be either always even or always odd.

A permutation is called even if it can be written as a product of an even number of transpositions. A permutation is called odd if it can be written as a product of an odd number of transpositions.

We have to show that there is no permutation which is at the same time even and odd—this justiﬁes the use of the terminology. We will establish this by looking at the polynomial

f (x1 , x2 , . . . , xn ) = ∏i<j (xi − xj ).

We claim that for any transposition π = (i j)

f (xπ(1) , xπ(2) , . . . , xπ(n) ) = −f (x1 , x2 , . . . , xn ).      (3.6)

This is clear for a transposition of neighbouring elements π = (i i + 1), since all brackets will remain except (xi − xi+1 ), which will become (xi+1 − xi ) = −(xi − xi+1 ), so we will have one change of sign.

Arguing by induction we suppose that (3.6) is true for all transpositions π = (i j) for which |j − i| < ℓ. Suppose now that |j − i| = ℓ. Since

(i j) = (i j − 1)(j − 1 j)(i j − 1),

and each of the three transpositions on the right changes the sign of f , the transposition (i j) changes the sign of f as well. Hence (3.6) holds for any product of an odd number of transpositions. It is now also clear that

f (xπ(1) , xπ(2) , . . . , xπ(n) ) = +f (x1 , x2 , . . . , xn )      (3.7)

for any permutation π which is a product of an even number of transpositions. Since (3.6) and (3.7) cannot hold simultaneously, there is no permutation which is both even and odd.

Example 3.1.16 (1 2 3 4) is an odd permutation, because (1 2 3 4) = (1 2)(1 3)

(1 4). On the other hand, the permutation (1 2 3 4 5) is even, because (1 2 3 4 5) =

(1 2)(1 3)(1 4)(1 5).

Example 3.1.17 Since id = (1 2)(1 2), the identity permutation is even.

Example 3.1.18 Let

         1 2 3 4 5 6 7 8 9
    π =                     .
         4 3 2 5 1 6 9 8 7

Is π even or odd? First we decompose π into a product of cycles, then use the result above:

π = (1 4 5)(2 3)(7 9) = (1 4)(1 5)(2 3)(7 9),

a product of four transpositions, so π is even.
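The same test is easy to mechanise: by formula (3.5) a k-cycle is a product of k − 1 transpositions, so the parity can be read off the cycle decomposition. A Python sketch (not from the book):

```python
# Sketch (not from the book): parity from the cycle decomposition -- by formula
# (3.5) a k-cycle is a product of k - 1 transpositions.
def is_even(perm):
    """perm maps i -> perm[i-1] on {1, ..., n}."""
    seen, transpositions = set(), 0
    for start in range(1, len(perm) + 1):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            length += 1
            i = perm[i - 1]
        transpositions += max(length - 1, 0)
    return transpositions % 2 == 0

pi = [4, 3, 2, 5, 1, 6, 9, 8, 7]   # the permutation of Example 3.1.18
print(is_even(pi))   # True
```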

Theorem 3.1.5 A k-cycle is even if k is odd and odd if k is even.

Proof Immediately follows from (3.5). �

Deﬁnition 3.1.6 We say that two permutations have the same parity if they are both

odd or both even and different parity if one of them is odd and another is even.

Theorem 3.1.6 In any symmetric group Sn

(i) The product of two even permutations is even.

(ii) The product of two odd permutations is even.

(iii) The product of an even permutation and an odd one is odd.

(iv) A permutation and its inverse have the same parities.

Proof Only statement (iv) needs a comment. It follows from (iii). Indeed, for any

permutation π we have ππ −1 = id, and, since the identity permutation is even, by

(iii), π and π −1 cannot have different parities. �

Theorem 3.1.7 Exactly half of the elements of Sn are even and half of them are odd.

Proof Denote by E the set of even permutations in Sn , and by O the set of odd

permutations in Sn . If τ is any ﬁxed transposition from Sn , we can establish a one-

to-one correspondence between E and O as follows: for π in E we know that τ π

belongs to O. Therefore we have a mapping f : E → O deﬁned by f (π) = τ π. The

function f is one-to-one since τ π = τ σ implies that π = σ; f is onto, because if κ

is an odd permutation then τ κ is even, and f (τ κ) = τ τ κ = κ. �

Corollary 3.1.1 The number of even permutations in Sn is n!/2. The number of odd permutations in Sn is also n!/2.

Corollary 3.1.2 The set An of all even permutations of degree n is a group relative

to the operation of composition called the alternating group of degree n.


Example 3.1.19 We can have a look at the elements of S4 , listing all of them, checking which of them are even and which of them are odd. We have

S4 = {id, (1 2 3), (1 3 2), (1 2 4), (1 4 2), (2 3 4), (2 4 3),
      (1 3 4), (1 4 3), (1 2)(3 4), (1 3)(2 4), (1 4)(2 3),
      (1 2), (1 3), (1 4), (2 3), (2 4), (3 4), (1 2 3 4), (1 4 3 2),
      (1 3 2 4), (1 4 2 3), (1 2 4 3), (1 3 4 2)}.

The elements in the ﬁrst two lines are even permutations, and the remaining elements are odd. We have

A4 = {id, (1 2 3), (1 3 2), (1 2 4), (1 4 2), (2 3 4), (2 4 3),
      (1 3 4), (1 4 3), (1 2)(3 4), (1 3)(2 4), (1 4)(2 3)}.

Exercises

as products of transpositions.

2. What would be the parity of the product of 11 odd permutations?

3. Let π, ρ ∈ Sn be two permutations. Prove that π and ρ−1 πρ have the same parity.

4. Let π, ρ ∈ Sn be two permutations. Prove that π −1 ρ−1 πρ is an even permutation.

5. Determine the parity of the permutation σ of degree n such that σ(i) = n + 1 − i.

3.1.7 Puzzle 15

We close this section with a few words about a game played with a simple toy. This

game seems to have been invented in the 1870s by the famous puzzle-maker Sam

Loyd. It caught on and became all the rage in the United States in the 1870s, and

ﬁnally led to a discussion by W. Johnson in the scholarly journal, the American

Journal of Mathematics, in 1879. It is often called the 15-puzzle.

Consider a toy made up of 16 squares, numbered from 1 to 15 inclusive and with

the lower right-hand corner blank.

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15


The toy is constructed so that the squares can be slid vertically and horizontally,

such moves being possible because of the presence of the blank square. Start with

the position shown above and perform a sequence of slides in such a way that, at

the end, the lower right-hand square is again blank. Call the new position realisable.

The natural question is: How can we determine whether or not the given position is

realisable?

What do we have here? After a sequence of slides we have shufﬂed the numbers

from 1 to 15; that is, we have effected a permutation of the numbers from 1 to 15. To

ask which positions are realisable is merely to ask which permutations can be carried

out. This is a permutation of S16 since the blank square also moves in the process. In

other words, in S16 , the symmetric group of degree 16, which permutations can be

reached via the toy? For instance, can the following position

13 4 12 15

1 14 9 6

8 3 2 7

10 5 11

be realised?

We will denote the empty square by the number 16. The position will then be

a1 a2 a3 a4
a5 a6 a7 a8
a9 a10 a11 a12
a13 a14 a15 a16

and it corresponds to the permutation

    1  2  . . . 16
                    .
    a1 a2 . . . a16

Example 3.1.20 For example, the position

1  3  5  7
9  11 13 15
2  4     6
8  10 12 14

corresponds to the permutation

         1 2 3 4 5 6  7  8  9 10 11 12 13 14 15 16
    σ =                                              .
         1 3 5 7 9 11 13 15 2 4  16 6  8  10 12 14

If we make a move pulling down the square 13, then the new position will be

1  3  5  7
9  11    15
2  4  13 6
8  10 12 14

and the corresponding permutation is

    1 2 3 4 5 6  7  8  9 10 11 12 13 14 15 16
                                                = σ (13 16).
    1 3 5 7 9 11 16 15 2 4  13 6  8  10 12 14

We observe the rule by which the permutation changes: when we swap the square with the number i on it with the neighbouring empty square, the permutation is multiplied on the right by the transposition (i 16).

Theorem 3.1.8 Suppose that a position characterised by the permutation σ can be transformed by legal moves to the initial position. Then there exist transpositions τ1 , τ2 , . . . , τm such that

id = σ τ1 τ2 . . . τm .      (3.8)

If the empty square was initially in the right bottom corner, then m is even and σ is even.

Proof Suppose that a position characterised by σ is transformed by legal moves to the initial position. As we noted in Example 3.1.20, every legal move is equivalent to a multiplication by a transposition (i 16) for some i ∈ {1, 2, . . . , 15}. Since the initial position is characterised by the identity permutation, we see that (3.8) follows. This implies

σ = τm τm−1 . . . τ2 τ1

from which we see that the parity of σ is the same as the parity of m.

Let us colour the board in the chessboard pattern.


Every move changes the colour of the empty square. Thus if at the beginning and

at the end the empty square was black, then there was an even number of moves made.

Therefore, if initially the right bottom corner was empty and we could transform this

position to the initial position, then an even number of moves was made, m is even,

and σ is also even. �

It can be shown that every position with an even permutation σ can be transformed to the initial position, but no easy proof is known.
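The parity test itself is easy to run mechanically. The sketch below (not from the book) computes parity through the cycle decomposition and applies it to Sam Loyd's famous "14-15" position, the solved board with only the squares 14 and 15 exchanged:

```python
# Sketch (not from the book): parity check of 15-puzzle positions, with the
# blank encoded as 16.  A k-cycle equals k - 1 transpositions.
def is_even(perm):
    seen, swaps = set(), 0
    for start in range(1, len(perm) + 1):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            length += 1
            i = perm[i - 1]
        swaps += max(length - 1, 0)
    return swaps % 2 == 0

solved = list(range(1, 17))               # blank (16) in the right bottom corner
loyd = solved[:]
loyd[13], loyd[14] = loyd[14], loyd[13]   # exchange the squares numbered 14 and 15

print(is_even(solved), is_even(loyd))   # True False
```

Since the Loyd position is odd, by the theorem above it cannot be transformed to the initial position.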

Exercises

1. For the following two arrangements of the 15-puzzle

   14 10 13 12        10 14 13 12
    6 11  9  8         6 11  9  8
    7  3  5  1         7  3  5  1
    4 15  2            4 15  2

   show that one of them is realisable and one is not without writing down the corresponding permutations and determining their parities.

2. For each of the following arrangements of the 15-puzzle determine the parity of the corresponding permutation.

    1  3  2  4        13  5  3
    6  5  7  8         9  2  7 10
    9 13 15 11         1 15 14  8
   14 10 12           12 11  6  4

3.2 General Groups 93

Surprisingly many objects in mathematics satisfy the same properties as the symmet-

ric groups deﬁned in Deﬁnition 3.1.2. There is good reason to study all such objects

simultaneously. For this purpose we introduce the concept of a general group.

Deﬁnition 3.2.1 A nonempty set G together with a binary operation ∗ on it is called a group if it satisﬁes the following three properties:

1. the operation ∗ is associative; i.e.,

(a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ G.

2. G possesses an identity element; i.e., there is an element e ∈ G such that

e ∗ g = g ∗ e = g for all g ∈ G.

(This element is often also denoted by 1, or, if the group operation is written as addition, it is usually denoted by 0.)

3. Every element of G possesses an inverse; i.e., given g ∈ G there exists a unique element h in G such that

g ∗ h = h ∗ g = e.

The element h is called the inverse of g, and denoted by g −1 (when the operation

is written as addition, the inverse is usually denoted by −g).

We denote this group (G, ∗), or simply G, when this invites no confusion. A group

G in which the commutative law holds (a ∗ b = b ∗ a for all a, b ∈ G) is called a

commutative group or an abelian group.

In any group (G, ∗) we have the familiar formula for the inverse of the product:

(a ∗ b)−1 = b−1 ∗ a−1 .

Example 3.2.1 We established in the previous sections that Sn is a group, the opera-

tion being multiplication of permutations (i.e., composition of functions). This group

is not abelian.

Example 3.2.2 Here is an example where the group operation is written as addi-

tion: Zn is an abelian group under addition ⊕ modulo n. This was established in

Theorem 1.4.1.


Example 3.2.3 Z∗n (the set of invertible elements in the ring Zn ) is a group under

multiplication modulo n. In particular, Z∗8 = {1, 3, 5, 7} with 3−1 = 3, 5−1 = 5,

7−1 = 7.

When we talk about a group it is important to be clear about the group operation;

either it must be explicitly speciﬁed, or the group operation must be clear from

the context and tacitly understood. The following are cases where there is a clear

understanding of the operation, so it will often not be made explicit. Most important

are:

• When we talk about the group Zn we mean the set of integers modulo n under

addition modulo n.

• When we talk about the group Z∗n we mean the set of invertible elements of the

ring Zn under multiplication modulo n.

Normally, when making general statements about groups, we write the statements

in multiplicative notation; but it is important to be able to apply them also in situations

where the group operation is written as addition (some obvious modiﬁcations must

be made).

Deﬁnition 3.2.2 Let G be a group and e be its identity element. The number of

elements of G is called the order of G and is denoted |G|.

• Sn is a group of order n!.

• Zn is a group of order n.

• Z∗n is a group of order φ(n), where φ is Euler’s totient function; for example,

|Z∗12 | = 4.

• Z is an inﬁnite group.

• The positive reals R+ with the usual operation of multiplication is also an inﬁnite group.
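The order of Z∗n can be found by direct enumeration of the residues coprime to n; a quick Python sketch (not from the book) confirming |Z∗12| = φ(12) = 4:

```python
import math

# Sketch (not from the book): Z*_n consists of the residues coprime to n,
# so its order is Euler's phi(n); here n = 12 gives phi(12) = 4.
n = 12
units = [a for a in range(1, n) if math.gcd(a, n) == 1]
print(units, len(units))   # [1, 5, 7, 11] 4
```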

Exercises

1. Show that division a ◦ b = a/b is a binary operation on R\{0}. Show that it is not

associative.

2. Show that a ◦ b = a^b is a binary operation on the set R+ of positive real numbers.

Show that it does not have a neutral element.

3. Let Cn be the set of all complex numbers satisfying the equation zn = 1. Prove

that this is an abelian group of order n.

4. Prove that the set GLn (R) of all invertible n × n matrices is a non-abelian group.

5. Prove that for four arbitrary elements g1 , g2 , g3 , g4 of a group G (where the operation is written as multiplication)

g1 (g2 (g3 g4 )) = ((g1 g2 )g3 )g4 .

List all possible arrangements of brackets on the product g1 g2 g3 g4 and show that

the result will be always the same so that we can write

g1 g2 g3 g4

for all of them without confusion. Finally you may try to prove that a product

g1 g2 . . . gn

does not depend on the way in which its factors are combined and associated. Hint: You need to use a clever induction here.

Deﬁnition 3.2.3 Let G be a group written multiplicatively, g an element of G, e the identity element of G, and n ∈ Z. We deﬁne

g n = g g · · · g (n times) if n > 0,
g n = e if n = 0,
g n = g −1 g −1 · · · g −1 (|n| times) if n < 0.

Since we know that the product g1 g2 . . . gn is independent of the way in which these

elements are associated, it becomes clear that the usual law of exponents g i g j = g i+j

holds (totally obvious in the case where both i and j are positive, and still trivial in

all other cases). The set of all powers of g ∈ G we denote by < g >.

Deﬁnition 3.2.4 Let G be a group written additively, g an element of G, 0 the identity element of G, and n ∈ Z. We deﬁne

ng = g + g + · · · + g (n times) if n > 0,
ng = 0 if n = 0,
ng = (−g) + (−g) + · · · + (−g) (|n| times) if n < 0.

The usual law of multiples mg + ng = (m + n)g also holds. The set of all

multiples of g ∈ G we also denote by < g >.

Deﬁnition 3.2.5 A group G consisting of all powers (in additive notation, all multiples) of a single element g is called cyclic. This fact can be written as G = < g >. The element g in this case is called the generator of G.


Example 3.2.5 Several examples:

• Sn is NOT a cyclic group since it is not abelian.

• Zn = < 1 > and is cyclic.

• Z∗5 = < 2 > and is cyclic. Check this by calculating all powers of 2.

• Z = < 1 > is an inﬁnite cyclic group.

Later (see, e.g., Exercise 4) we will see that abelian groups do not have to be

cyclic.

Deﬁnition 3.2.6 Let G be a group, g an element of G, and e the identity element of G. Then the order of g in G is the least positive integer i such that g i = e, if such an integer exists; otherwise we say that the order of g is inﬁnite. It is denoted ord (g).

We note that this deﬁnition is consistent with the deﬁnition of the order of a

permutation given earlier.

In an additively written group G the order of g ∈ G is the least positive integer m

such that mg = 0, if such an integer exists; if no such integer exists, we say that the

order of g is inﬁnite.

Example 3.2.6 Conﬁrm for yourself that:

• Each of the non-identity elements of Z∗12 has order 2;

• In Z12 the element 10 has order 6;

• Element 6 in the group Z has inﬁnite order.

As we will see later, in a ﬁnite group G, the orders of its elements and the order

of the group |G| are closely related.

We start to establish this link with the following

Lemma 3.2.1 If ord (g) = n, then < g > = {e, g, g 2 , . . . , g n−1 }, and all n powers

of g in this set are distinct, i.e., | < g > | = n. Conversely, if | < g > | = n, then g

is an element of order n.

Proof Suppose ord (g) = n. Then g n = e and all powers of g belong to the set

{e, g, g 2 , . . . , g n−1 }. Indeed, for any k ∈ Z we may divide k by n with remainder

k = qn + r, where 0 ≤ r < n. Then g k = g qn+r = g qn g r = (g n )q g r = g r ,

which belongs to {e, g, g 2 , . . . , g n−1 }. Hence < g > = {e, g, g 2 , . . . , g n−1 }. On the other hand, if any two powers in this set are equal, say g i = g j with i < j, then g i g j−i = g j = g i , whence g j−i = e. This is a contradiction since j − i < n and n is

the order of g. Therefore, if the order of g is ﬁnite, the order of g is the same as the

cardinality of the set < g >.

Suppose now that the cardinality of < g > is n. Then we have only n distinct

powers of g and there will exist two distinct integers k and m such that g k = g m .

If we assume that k > m then we will ﬁnd that g k−m = e, and g will have ﬁnite

order. We have already proved that in this case the order of g and the size of < g >

coincide. Hence ord (g) = n. �


In the following corollary we ﬁnd a link between the two concepts of ‘order’. It

is often useful since we can decide whether a group is cyclic or not by looking at the

orders of its elements.

Corollary 3.2.1 A ﬁnite group G of order n is cyclic if and only if it has an element of order n.

Example 3.2.7 Z∗8 is NOT a cyclic group, because |Z∗8 | = φ(8) = 4 and there is no

element of order 4 in this group (indeed, check that they all have order 2).
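That check is quick to do by machine (a Python sketch, not from the book):

```python
# Sketch (not from the book): the order of each element of Z8* = {1, 3, 5, 7}.
# No element has order 4 = |Z8*|, so the group is not cyclic.
def order_mod(a, n):
    k, power = 1, a % n
    while power != 1:          # smallest k with a^k ≡ 1 (mod n)
        power = power * a % n
        k += 1
    return k

print({a: order_mod(a, 8) for a in (1, 3, 5, 7)})   # {1: 1, 3: 2, 5: 2, 7: 2}
```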

The following theorem allows us to calculate the orders of all elements in Zn .

Theorem 3.2.1 The order of i ∈ Zn is ord (i) = n/gcd(i, n).

Proof To see this, note that the group is written additively, so the order of i is the

smallest positive integer k such that ki ≡ 0 mod n. That is, ki is the smallest positive

number which is a multiple of i as well as of n. This means that ki = lcm(i, n). Now

solve this equation for k using (1.13):

k = lcm(i, n)/i = in/(i gcd(i, n)) = n/gcd(i, n). �

For example, in Z121 we have

ord (110) = 121/gcd(121, 110) = 121/11 = 11.
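The formula of Theorem 3.2.1 can be checked against a brute-force count of multiples; a Python sketch (not from the book):

```python
import math

# Brute-force check of Theorem 3.2.1, ord(i) = n / gcd(i, n) in Zn
# (a sketch, not from the book).
def order_in_Zn(i, n):
    k = 1
    while k * i % n != 0:      # smallest positive k with k*i ≡ 0 (mod n)
        k += 1
    return k

assert all(order_in_Zn(i, 12) == 12 // math.gcd(i, 12) for i in range(1, 12))
print(order_in_Zn(110, 121))   # 11
```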

Exercises

2. Find all elements of order 7 in Z84 .

3. Find the order of i = 41670852902912 in the abelian group Zn , where n =

563744998038700032.

4. Show that Z∗12 is an abelian group which is not cyclic.

5. Show that the order of the interlacing shufﬂe σn (deﬁned in Sect. 3.1.5) is equal

to the order of 2 in Z∗2n+1 .

3.2.3 Isomorphism

A single group may have several very different presentations. To deal with this

problem mathematics introduces the concept of isomorphism.


Deﬁnition 3.2.7 Let G and H be two groups with operations ∗ and ◦, respectively. An onto and one-to-one mapping σ : G → H is called an isomorphism if

σ(g1 ∗ g2 ) = σ(g1 ) ◦ σ(g2 )      (3.9)

for all g1 , g2 ∈ G.

What it says is that if we rename the elements of H appropriately and change the

name for the operation in H, we will obtain the group G. If two groups G and H are

isomorphic, we write G ∼= H. The equation (3.9), written as

g1 ∗ g2 = σ −1 (σ(g1 ) ◦ σ(g2 )),

shows that instead of computing the product directly, for any two elements g1 and g2 in the group G one can compute σ(g1 ) ◦ σ(g2 ) for the images σ(g1 ) and σ(g2 ) of these elements in H and take the preimage of the result.

A historically important example of an isomorphism is that of the group R, which is the reals with the operation of addition, and the group R+, which is the positive reals with the operation of multiplication. The isomorphism σ : R → R+ between these two groups is given by σ(x) = e^x. Indeed, the condition (3.9) is satisfied since

σ(x + y) = e^{x+y} = e^x · e^y = σ(x)σ(y).

The famous slide rule—a commonly used calculation tool in science and engineering before electronic calculators became available—was based on this isomorphism.

Consider now the groups Z4 and Z∗5; we claim that Z4 ∼= Z∗5. Let us look at their addition and multiplication tables, respectively.

        Z4                      Z∗5

 ⊕ | 0 1 2 3             · | 1 2 3 4
---+---------           ---+---------
 0 | 0 1 2 3             1 | 1 2 3 4
 1 | 1 2 3 0             2 | 2 4 1 3
 2 | 2 3 0 1             3 | 3 1 4 2
 3 | 3 0 1 2             4 | 4 3 2 1

We may observe that the ﬁrst table can be converted into the second one if we

make the following substitution:

0 → 1, 1 → 2, 2 → 4, 3 → 3

(check it right now). Therefore this mapping, let us call it σ, from Z4 to Z∗5 is an

isomorphism. The mystery behind this mapping is clariﬁed if we notice that we

actually map


0 → 2^0, 1 → 2^1, 2 → 2^2, 3 → 2^3.

Then the isomorphism property (3.9) follows from the formula 2^i · 2^j = 2^{i⊕j}.
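This renaming can be verified mechanically. A small Python check (an illustration, not part of the text's GAP sessions) confirms that σ(i) = 2^i mod 5 is one-to-one, onto, and respects the two operations:

```python
# sigma maps (Z4, ⊕) to (Z5*, ·) by i -> 2^i mod 5.
sigma = {i: pow(2, i, 5) for i in range(4)}

# One-to-one and onto {1, 2, 3, 4}.
assert sorted(sigma.values()) == [1, 2, 3, 4]

# The isomorphism property (3.9): sigma(i ⊕ j) = sigma(i) * sigma(j) mod 5.
assert all(sigma[(i + j) % 4] == sigma[i] * sigma[j] % 5
           for i in range(4) for j in range(4))
```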

Before continuing with the study of isomorphisms we make a useful observation: in any group G the only element that satisfies g^2 = g is the identity element. Indeed, multiplying this equation by g^{−1} we get g = e.

Proposition 3.2.1 Let (G, ∗) and (H, ◦) be two groups and e be the identity element

of G. Let σ : G → H be an isomorphism of these groups. Then σ(e) is the identity

of H.

Proof Denote f = σ(e), where e is the identity element of G. Let us prove that f is the identity element of H. We note that

f^2 = σ(e)^2 = σ(e^2) = σ(e) = f.

Any element in a group with this property must be the identity, so f is the identity of H. �

Theorem 3.2.2 Let (G, ∗) and (H, ◦) be two groups and σ : G → H be an isomor-

phism. Then σ −1 : H → G is also an isomorphism.

Proof We have to check that σ^{−1}(h1 ◦ h2) = σ^{−1}(h1) ∗ σ^{−1}(h2) for all h1, h2 ∈ H. To this end we apply σ to both sides of this equation. As σσ^{−1} = idH and σ^{−1}σ = idG, and due to (3.9),

σ(σ^{−1}(h1 ◦ h2)) = h1 ◦ h2,

σ(σ^{−1}(h1) ∗ σ^{−1}(h2)) = σ(σ^{−1}(h1)) ◦ σ(σ^{−1}(h2)) = h1 ◦ h2.

Since σ is one-to-one, this implies σ^{−1}(h1 ◦ h2) = σ^{−1}(h1) ∗ σ^{−1}(h2). �

Theorem 3.2.3 Let σ : G → H be an isomorphism and let g ∈ G be an element of finite order. Then ord (g) = ord (σ(g)).

Proof Denote f = σ(e), where e is the identity element of G; by Proposition 3.2.1, f is the identity of H. Suppose now ord (g) = n. Then g^n = e. Let us now apply σ to both sides of this equation. We obtain σ(g)^n = σ(g^n) = σ(e) = f, from which we see that ord (σ(g)) ≤ n, i.e., ord (σ(g)) ≤ ord (g). Since σ^{−1} is also an isomorphism, which takes σ(g) to g, we obtain ord (g) ≤ ord (σ(g)). This proves the theorem. �

We now move on to one of the main theorems of this section. The theorem will,

in particular, give us a tool for calculating orders of elements of cyclic groups which

are also written multiplicatively.

Theorem 3.2.4 Every cyclic group G of order n is isomorphic to Zn.

Proof Since G = < g > has cardinality n, by Lemma 3.2.1 we have ord (g) = n and G = {g^0, g^1, g^2, . . . , g^{n−1}}. We define σ : Zn → G by setting σ(i) = g^i. Then

σ(i ⊕ j) = g^{i⊕j} = g^i g^j = σ(i)σ(j),

where ⊕ is addition modulo n. This checks (3.9) and proves that the mapping σ is indeed an isomorphism. �

Now we can reap the benefits of Theorem 3.2.4.

Corollary 3.2.2 Let G be a multiplicative cyclic group and G = < g >, where g is an element of order n. Then

ord (g^i) = n/gcd(i, n). (3.11)

Proof This now follows from the theorem we have just proved and Theorems 3.2.1 and 3.2.3. Indeed, the order of g^i in G must be the same as the order of i in Zn. �
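Corollary 3.2.2 can be tested numerically. The sketch below (Python rather than the book's GAP; the group Z∗13 is cyclic with generator 2, as shown later in this section) computes multiplicative orders by brute force:

```python
from math import gcd

def mult_order(a, p):
    """Multiplicative order of a modulo p (assumes gcd(a, p) = 1)."""
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k

g, p, n = 2, 13, 12          # 2 generates the cyclic group Z13* of order 12
assert mult_order(g, p) == n
# Corollary 3.2.2: ord(g^i) = n / gcd(i, n).
assert all(mult_order(pow(g, i, p), p) == n // gcd(i, n) for i in range(1, n))
```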

Exercises

1. Let σ : G → H be an isomorphism and g be an element of G. Prove that σ(g −1 ) =

σ(g)−1 .

2. Let Cn be the group of all complex numbers satisfying the equation z^n = 1. Prove that Cn ∼= Zn.

3. Prove that the multiplicative group of complex numbers C∗ is isomorphic to the

group of matrices

G = { ( a  −b
        b   a ) | a, b ∈ R, (a, b) ≠ (0, 0) }

with the operation of matrix multiplication.

4. Both groups G 1 = Z∗191 and G 2 = Z∗193 are cyclic (do not try to prove this).

Which of these groups contains elements of order 19? How many?

5. Knowing that 2 is a generating element for the cyclic group Z∗211, determine the order of 2^150 in Z∗211.

6. 264 is a generator of Z∗271, i.e., the (multiplicative) order of 264 in Z∗271 is 270, as is shown by the following calculation:

gap> OrderMod(264,271);

270

3.2.4 Subgroups

Definition 3.2.8 A subset H of a group G is called a subgroup of G (we write H ≤ G) if it satisfies the following properties:

1. The identity element e of G belongs to H.


2. H is closed under the group operation; i.e., if a and b belong to H, then ab also

belongs to H.

3. H is closed under inverses; i.e., if a belongs to H then a−1 also belongs to H.

Every group G has two obvious subgroups: G ≤ G and {e} ≤ G. These are trivial examples. Let us consider a non-trivial one.

Firstly, we would like to introduce a construction which, given an element g ∈ G, will always give us a subgroup containing this element. Moreover, this subgroup will be the smallest subgroup with this property. This is the familiar < g > = {g^i | i ∈ Z}, which is the set of all powers of g.

Proposition 3.2.2 The set < g > is the smallest subgroup of G that contains g.

Proof To decide whether or not < g > is a subgroup, we must answer three questions:

• Does the identity e of G belong to < g >? The answer is YES, because g^0 = e and < g > consists of all powers of g.
• If x, y ∈ < g >, does xy also belong to < g >? x ∈ < g > means that x = g^i for some integer i; similarly, y = g^j for some integer j. Then xy = g^i g^j = g^{i+j}, which shows that xy is a power of g and therefore belongs to < g >.
• If x ∈ < g >, does x^{−1} also belong to < g >? x ∈ < g > means that x = g^i for some integer i; then x^{−1} = g^{−i}, i.e., x^{−1} is also a power of g and therefore belongs to < g >.

So < g > is indeed a subgroup. It is the smallest subgroup containing g ∈ G since

any subgroup that contains g must also contain all powers of g. �
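The three subgroup properties of < g > can be checked directly for a concrete group. A Python sketch, taking g = 2 inside Z∗17 (an illustrative choice, not an example from the text):

```python
n = 17                           # we work inside the group Z17*

def cyclic_subgroup(g):
    """The set <g> of all powers of g modulo n."""
    powers, x = set(), 1
    while x not in powers:
        powers.add(x)
        x = x * g % n
    return powers

H = cyclic_subgroup(2)
assert 1 in H                                        # contains the identity
assert all(a * b % n in H for a in H for b in H)     # closed under products
assert all(pow(a, -1, n) in H for a in H)            # closed under inverses
assert len(H) == 8                                   # since 2^8 ≡ 1 (mod 17)
```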

Another example gives us a subgroup of a non-commutative group. Consider the subset V = {e, a, b, c} of S4, where e is the identity permutation and

a = (1 2)(3 4), b = (1 3)(2 4), c = (1 4)(2 3).

Let us check the three subgroup properties:

1. The identity e belongs to V. This is obvious.
2. The product of two elements of V also belongs to V. We check:

ab = ba = c, bc = cb = a, ac = ca = b, a^2 = b^2 = c^2 = e.

3. V is closed under taking inverses. This is also true since a^{−1} = a, b^{−1} = b, c^{−1} = c.


We see that V is indeed a subgroup of S4 . This group is known as the Klein four-group.

Additional information about orders may be extracted using Lagrange’s Theorem.

We will state and prove this theorem below, but ﬁrst we need to introduce the cosets

of a subgroup. Let G be a group, H a subgroup of G, and g ∈ G. The set gH = {gh |

h ∈ H} is called a left coset of H and the set Hg = {hg | h ∈ H} is called a right

coset of H.

Example 3.2.12 Let us consider G = S4 and H = V , the Klein four-group which

is a subgroup of S4 . Let g = (12). Then the corresponding left coset consists of the

permutations

(12)V = {(12), (34), (1 4 2 3), (1 3 2 4)}.

Proposition 3.2.3 If H is ﬁnite, then |gH| = |Hg| = |H| for any g ∈ G.

Proof We need to prove that all elements gh are different, i.e., if gh1 = gh2 , then

h1 = h2 . This is obvious since we can multiply both sides of the equation gh1 = gh2

by g −1 on the left. This proves |gH| = |H|. The proof of |Hg| = |H| is similar. �

We are now ready to state and prove Lagrange’s Theorem.

Theorem 3.2.5 (Lagrange’s Theorem) Let G be a ﬁnite group, H a subgroup of G.

Then the order of H is a divisor of the order of G.

Proof Our proof relies on the decomposition of G into a disjoint union of left cosets

of H, all of which have the same number of elements, namely |H|. Let us prove that such a decomposition exists. All we need to show is that any two cosets are either disjoint or coincide.

Suppose the two cosets aH and bH have a nonempty intersection, i.e., ah1 = bh2 for some h1, h2 ∈ H. Then b^{−1}a = h2h1^{−1} ∈ H. In this case any element ah ∈ aH can be expressed as b(b^{−1}a)h, where (b^{−1}a)h belongs to H. This proves aH ⊆ bH

and hence aH = bH as both sets have the same cardinality. Hence these cosets must

coincide. We obtain a partition of G into a number of disjoint cosets each of which

has cardinality |H|. If k is the number of cosets in the partition, then in total G has

k|H| elements. This proves the theorem. �
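The coset decomposition in the proof is easy to see in a small example. Below, Python is used to partition the additive group Z12 into cosets of the subgroup H = {0, 4, 8} (a choice made here purely for illustration):

```python
n, H = 12, {0, 4, 8}             # H is a subgroup of the additive group Z12
cosets = {frozenset((g + h) % n for h in H) for g in range(n)}

assert all(len(c) == len(H) for c in cosets)   # every coset has |H| elements
assert set().union(*cosets) == set(range(n))   # together they cover the group
assert len(cosets) * len(H) == n               # so |H| divides |G| = 12
```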

Corollary 3.2.3 The order of an element g of a finite group G is a divisor of the order of G. In particular, g^{|G|} = 1.

Proof Just note that by Lemma 3.2.1 the order of an element g ∈ G equals the order

of the subgroup < g > of G. Then Lagrange’s theorem implies that the order of g

is a divisor of |G|. Let ord (g) = m, |G| = n and n = mk for some integer k. Then g^n = g^{mk} = (g^m)^k = 1^k = 1. �

Example 3.2.13 Find the order of the element 2 ∈ Z∗17 .

A naive approach is to calculate all powers of 2, until one such power is found

to be the identity. We have a more economical way to ﬁnd the order: since Z∗17 has

16 elements, it is sufficient to calculate all the powers 2^i where i is a divisor of 16 until the result equals 1. We know that 2^16 mod 17 = 1 and we need to calculate


only 2^2 mod 17, 2^4 mod 17, and 2^8 mod 17. Our calculations will terminate when we find that 2^8 ≡ 1 mod 17; the order of 2 in Z∗17 is therefore 8.
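The same shortcut is immediate to program. A Python version of the calculation above (checking only the divisors of |Z∗17| = 16):

```python
p = 17
# By Lagrange's Theorem the order of 2 divides |Z17*| = 16, so only
# the divisors of 16 need to be tested.
order = next(d for d in [1, 2, 4, 8, 16] if pow(2, d, p) == 1)
assert order == 8
```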

Consider now a similar question: is the order of 2 in Z∗13 equal to 12? Now we see that it is not necessary to calculate each of the powers of 2, but only those powers 2^i where i is a divisor of 12 (which is φ(13), as the order of the (multiplicative) group Z∗n is φ(n)). So we calculate 2^2 mod 13, 2^3 mod 13, 2^4 mod 13, and 2^6 mod 13. If none of them turn out to be 1, then we can be sure that the order of 2 in Z∗13 is 12, and that 2 is a generator of the group Z∗13 (which is therefore cyclic). It turns out that 2^k mod 13 ≠ 1 for k = 2, 3, 4, 6 and therefore 2 is indeed a generator of Z∗13.

Exercises

1. Let SLn (R) be the set of all real matrices with determinant 1. Prove that this is a

subgroup of GLn (R).

2. Let m, n be positive integers and let m be a divisor of n. Prove that Cm is a subgroup

of Cn .

3. Prove that a cyclic group G of order n has exactly φ(n) generators, i.e., elements

g ∈ G such that G =< g >.

4. Let G be a ﬁnite group with |G| even. Prove that it contains an element of order

2.

5. Prove that any finite subgroup of the multiplicative group C∗ of the field C of complex numbers is cyclic.

3.3 The Abelian Group of an Elliptic Curve

During the last 20 years, the theory of elliptic curves over finite fields has been

found to be of great value to cryptography. As methods of factorisation of integers

are getting better and computers are getting more powerful, to maintain the same

level of security the prime numbers p and q in RSA have to be chosen bigger and

bigger, which slows calculations down. The idea of using elliptic curves over ﬁnite

ﬁelds belongs to Neal Koblitz [1] and Victor Miller [2] who in 1985 independently

proposed cryptosystems based on groups of points of elliptic curves. By now their

security has been thoroughly tested and in 2009 the National Security Agency of

the USA stated that “Elliptic Curve Cryptography provides greater security and

more efﬁcient performance than the ﬁrst generation public key techniques (RSA and

Difﬁe–Hellman) now in use”. Some researchers also see elliptic curves as the source

of cryptosystems for the next generation. Certicom www.certicom.com was the ﬁrst

company to market security products using elliptic curve cryptography.


Elliptic curves are not ellipses and do not look like them. They received their name due

to their similarities with denominators of elliptic integrals that arise in calculations

of the arc length of ellipses.

Definition 3.3.1 Let F be a field, and a, b be scalars in F such that the cubic X^3 + aX + b has no multiple roots. An elliptic curve E over the field F is the set of solutions (X, Y) ∈ F^2 to the equation

Y^2 = X^3 + aX + b, (3.12)

When F is the field of real numbers the condition on the cubic can be expressed in terms of a and b. Let r1, r2, r3 be the roots (maybe complex) of X^3 + aX + b, taken together with their multiplicities, i.e., such that

X^3 + aX + b = (X − r1)(X − r2)(X − r3).

It can be calculated that

(r1 − r2)^2 (r1 − r3)^2 (r2 − r3)^2 = −(4a^3 + 27b^2).

This real number is called the discriminant of the cubic, and the cubic has no multiple roots if and only if this discriminant is nonzero, i.e.,

4a^3 + 27b^2 ≠ 0.

This condition also guarantees the absence of multiple roots over an arbitrary field F.

Example 3.3.1 Consider the elliptic curve over Z7 given by the equation

Y^2 = X^3 + 3X + 4.

The point (5, 2) ∈ Z7^2 belongs to this curve since 2^2 ≡ 5^3 + 3 · 5 + 4 mod 7, with both sides being equal to 4.

When F = R is the field of real numbers, the graph of an elliptic curve can have two different

forms depending on whether the cubic on the right-hand side of (3.12) has one or

three real roots (see Fig. 3.1).

3.3 The Abelian Group of an Elliptic Curve 105

Jacobi4 (1835) was the ﬁrst to suggest using the group law on a cubic curve. In

this section we will introduce the addition law for points of the elliptic curve (3.12),

so that it will become an abelian group. We will do this ﬁrst for elliptic curves over

the familiar ﬁeld of real numbers. These curves have the advantage that they can be

represented graphically.

Definition 3.3.2 For a point P = (x, y) ∈ E we define −P as the point (x, −y), which is symmetric to P about the x-axis. It is clear that (x, −y) ∈ E whenever (x, y) ∈ E. For two points P, Q ∈ E the sum P + Q is defined as follows:

(a) Suppose that P ≠ Q and that the line PQ is not parallel to the y-axis. Then PQ intersects E at a third point R. Then we define P + Q = −R (see Fig. 3.2).

(b) Suppose that P = Q and the tangent line to the curve at P is not parallel to the

y-axis. Further, suppose that the tangent line to the curve at P intersects E at

the third point R. Then we deﬁne 2P = P + P = −R. If the tangent line has a

“double tangency” at P, i.e., P is a point of inﬂection, then R is taken to be P.

(c) Suppose that P ≠ Q and PQ is parallel to the y-axis. Then we define P + Q = ∞.

(d) Suppose that P = Q and the tangent line to the curve at P is parallel to the y-axis.

Then we deﬁne 2P = P + P = ∞.

(e) For every P ∈ E (including P = ∞) we deﬁne P + ∞ = P.

4 Carl Gustav Jacob Jacobi (1804–1851) was a German mathematician who made fundamental

contributions to elliptic functions, dynamics, differential equations, and number theory.


Fig. 3.2 Addition of two points on an elliptic curve

Theorem 3.3.1 The elliptic curve E over R relative to this addition is an (inﬁnite)

abelian group. If P = (x1 , y1 ) and Q = (x2 , y2 ) are two points of E, then P + Q =

(x3 , y3 ), where

1. in case (a)

x3 = ((y2 − y1)/(x2 − x1))^2 − x1 − x2, (3.16)
y3 = −y1 + ((y2 − y1)/(x2 − x1))(x1 − x3); (3.17)

2. in case (b)

x3 = ((3x1^2 + a)/(2y1))^2 − 2x1, (3.18)
y3 = −y1 + ((3x1^2 + a)/(2y1))(x1 − x3). (3.19)

Proof First, we have to prove that the addition is defined for every pair of (not necessarily distinct) points of E. Suppose we are in case (a), which means x1 ≠ x2. Then we have to show that the third point R on the line PQ exists. The equation of this line is y = mx + c, where m = (y2 − y1)/(x2 − x1) and c = y1 − mx1. A point (x, mx + c) of this line lies on E if and only if (mx + c)^2 = x^3 + ax + b, that is, if and only if x is a root of the cubic polynomial

x^3 − m^2x^2 + (a − 2mc)x + (b − c^2). (3.20)


Since we already have two real roots x1 and x2 of this polynomial, we will have the third one as well. Dividing the left-hand side of (3.20) by (x − x1)(x − x2) will give the factorisation

x^3 − m^2x^2 + (a − 2mc)x + (b − c^2) = (x − x1)(x − x2)(x − x3),

where x3 is this third root. Knowing x1 and x2, the easiest way to find x3 is to notice that x1 + x2 + x3 = m^2, and express the third root as x3 = m^2 − x1 − x2. Since m = (y2 − y1)/(x2 − x1), this is exactly (3.16). Now we can also calculate y3 as follows:

y3 = −(mx3 + c) = −(mx3 + y1 − mx1) = −y1 + m(x1 − x3)

(remember (x3, y3) represents −R, hence the minus). This will give us (3.17).

Case (b) is similar, except that m can now be calculated as the derivative dy/dx at P. Implicit differentiation of (3.12) gives us

2y (dy/dx) = 3x^2 + a,

or dy/dx = (3x^2 + a)/2y. Hence m = (3x1^2 + a)/2y1. (We note that y1 ≠ 0 in this case.) This implies (3.18) and (3.19). �

It helps to visualise the point at inﬁnity ∞ as located far up the y-axis. Then it

becomes the third point of intersection of any vertical line with the curve. Then (c),

(d), and (e) of Deﬁnition 3.3.2 will implement the same set of rules as (a) and (b),

for the case when the point at inﬁnity is involved.

We deduced formulae (3.16)–(3.19) for the real ﬁeld R but they make sense for

any ﬁeld. Of course we have to remove references to parallel lines and interpret the

addition rule in terms of coordinates only.

Deﬁnition 3.3.4 Let F be a ﬁeld and let E be the set of pairs (x, y) ∈ F 2 satisfying

(3.12) plus a special symbol ∞. Then for any (x1 , y1 ), (x2 , y2 ) ∈ E we deﬁne:

(a) If x1 ≠ x2, then (x1, y1) + (x2, y2) = (x3, y3), where x3, y3 are defined by formulae (3.16) and (3.17).

(b) If y1 ≠ 0, then (x1, y1) + (x1, y1) = (x3, y3), where x3, y3 are defined by formulae (3.18) and (3.19).

(c) (x, y) + (x, −y) = ∞ for all (x, y) ∈ E (including the case y = 0).

(d) (x, y) + ∞ = ∞ + (x, y) = (x, y) for all (x, y) ∈ E.

(e) ∞ + ∞ = ∞.

Theorem 3.3.2 For any ﬁeld F and for any elliptic curve

Y^2 = X^3 + aX + b, a, b ∈ F,

the set E with the operation of addition deﬁned in Deﬁnition 3.3.4 is an abelian group.


Proof It is easy to check that the identity element is ∞ and the inverse of P = (x, y)

is −P = (x, −y). So two axioms of a group are obviously satisﬁed. It is not easy

to prove that the addition, so deﬁned, is associative. We omit this proof since it is a

tedious calculation. �

Example 3.3.2 Suppose F = Z11 and the curve is given by the equation Y^2 = X^3 + 7.

Then P = (5, 0) and Q = (3, 10) belong to the curve. We have

P + Q = (6, 5), 2Q = (3, 1), 2P = ∞.

Indeed, if P + Q = (x3, y3), then m = (y2 − y1)/(x2 − x1) = 10/(−2) = −5 = 6 and

x3 = m^2 − x1 − x2 = 3 − 5 − 3 = 6,
y3 = −y1 + m(x1 − x3) = 0 + 6 · (−1) = 5,

so P + Q = (6, 5). Calculating 2Q = (x4, y4), we get m = (3 · 3^2 + 0)/(2 · 10) = 27/20 = 3 and

x4 = m^2 − 2x1 = 9 − 2 · 3 = 3,
y4 = −y1 + m(x1 − x4) = −10 + 3 · 0 = 1,

so 2Q = (3, 1). The last equation 2P = ∞ follows straight from the deﬁnition (part

(c) of Deﬁnition 3.3.4).
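Definition 3.3.4 translates almost line by line into code. The Python sketch below (an illustration alongside the GAP session that follows) implements the addition law over Zp and reproduces the computations of Example 3.3.2; INF stands for the point at infinity:

```python
INF = None   # the point at infinity

def ec_add(P, Q, a, p):
    """Addition on Y^2 = X^3 + aX + b over Zp, following Definition 3.3.4."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF                                      # case (c)
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # case (b), slope of tangent
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p         # case (a), slope of chord
    x3 = (m * m - x1 - x2) % p
    y3 = (-y1 + m * (x1 - x3)) % p
    return (x3, y3)

# Example 3.3.2: a = 0, b = 7, p = 11.
P, Q = (5, 0), (3, 10)
assert ec_add(P, Q, 0, 11) == (6, 5)
assert ec_add(Q, Q, 0, 11) == (3, 1)
assert ec_add(P, P, 0, 11) is INF
```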

The calculations in the last example can be done with GAP. The program has to read the files elliptic.gd and elliptic.gi first (given in the appendix). Then the command EllipticCurveGroup(a,b,p); calculates the points of the elliptic curve Y^2 = X^3 + aX + b mod p. As you see below, GAP uses the multiplicative notation for operations on elliptic curves:

Read("/.../elliptic.gd");

Read("/.../elliptic.gi");

gap> G:=EllipticCurveGroup(0,7,11);

EllipticCurveGroup(0,7,11)

gap> points:=AsList(G);

[ ( 2, 2 ), ( 2, 9 ), ( 3, 1 ), ( 3, 10 ), ( 4, 4 ), ( 4, 7 ), ( 5, 0 ),

( 6, 5 ), ( 6, 6 ), ( 7, 3 ), ( 7, 8 ), infinity ]

gap> P:=points[7];

( 5, 0 )

gap> Q:=points[4];

( 3, 10 )

gap> P*Q;

( 6, 5 )

gap> Qˆ2;

( 3, 1 )

gap> Pˆ2;

infinity


Exercises

Y^2 = X^3 + 4X + 11, Y^2 = X^3 + 6X + 11?

r1 + r2 + r3 = 0, r1 r2 + r1 r3 + r2 r3 = a, r1 r2 r3 = −b. (3.21)

4. Consider the elliptic curve E given by Y^2 = X^3 + X − 1 mod 7.

(a) Check that (1, 1), (2, 3), (3, 1), (4, 2), (6, 2) are points on E;

(b) Find another six points on this curve;

(c) Calculate −(2, 3), 2(4, 2), (1, 1) + (3, 1);

(d) Use GAP to show that E has 11 points in total.

5. Let F = Z13 and let the elliptic curve E be given by the equation Y^2 = X^3 + 5X + 1.

(a) Using GAP list all the elements of the abelian group E of this elliptic curve.

Hence ﬁnd the order of the abelian group E.

(b) Find (manually) the order of P = (0, 1) in E. Is E cyclic?

6. Using GAP generate the elliptic curve Y^2 = X^3 + 7X + 11 in Z46301. Determine

its order and check that it is cyclic.

Definition 3.3.5 Let F be a finite field. An element h ∈ F∗ is called a quadratic residue if there exists an element g ∈ F such that g^2 = h. Otherwise, it is called a quadratic non-residue.

Theorem 3.3.3 If F = Zp for a prime p > 2, then exactly half of all nonzero elements of the field Zp are quadratic residues.

Proof Since p is odd, p − 1 is even. Then all nonzero elements of Zp can be split

into pairs,

Zp \ {0} = {±1, ±2, . . . , ±(p − 1)/2}.

Since i^2 = (−i)^2, each pair gives us only one quadratic residue, hence we cannot have more than (p − 1)/2 quadratic residues. On the other hand, if we have x^2 = y^2, then x^2 − y^2 = (x − y)(x + y) = 0. Due to the absence of zero divisors, we then have x = ±y. Therefore we have exactly (p − 1)/2 nonzero quadratic residues. �

Example 3.3.3 In Z7 the set of nonzero quadratic residues is {1, 2, 4}.


The question of whether a given element a ∈ Z∗p is a quadratic residue or non-residue is of great importance for applications of elliptic curves. Even more important are the algorithms for finding a square root of a, if it exists. The first question can be efficiently solved by using the following criterion.

Theorem 3.3.4 (Euler's criterion) Let p be an odd prime and a ∈ Z∗p. Then

a^{(p−1)/2} = 1 if a is a quadratic residue,
a^{(p−1)/2} = −1 if a is a quadratic non-residue.

Proof If a is a quadratic residue, then a = b^2 for some b ∈ Z∗p, and by Fermat's Little Theorem

a^{(p−1)/2} = b^{p−1} = 1.

For the converse see Exercise 5. �

The importance of this criterion is that we can use the Square and Multiply algorithm to raise a to the power of (p − 1)/2 and thus check if a is a quadratic residue or not.

By Theorem 2.3.2 the Square and Multiply algorithm has linear complexity, hence

this is an easy problem to solve. It is somewhat more difﬁcult to ﬁnd a square root of

an element of Zp , given that it is a quadratic residue. Reasonably fast polynomial time

algorithms exist—most notably the Tonelli–Shanks algorithm, however it is not fully

deterministic as it requires ﬁnding at least one quadratic non-residue. This necessary

quadratic non-residue is easy to ﬁnd using the trial and error method with the aver-

age expected number of trials being only 2. No fully deterministic polynomial-time

algorithm is known.

GAP uses the Tonelli–Shanks algorithm to extract square roots in ﬁnite ﬁelds. For

example, the following calculation shows that 12 is a quadratic non-residue in Z103

and 13 is a quadratic residue in this ﬁeld:

gap> RootMod(12,103);

fail

gap> RootMod(13,103);

61
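Euler's criterion gives a one-line Python analogue of this test. Extracting the root is also elementary in the well-known special case p ≡ 3 (mod 4), where a^{(p+1)/4} is a square root of any quadratic residue a; general p needs Tonelli–Shanks, as in GAP's RootMod. A sketch (Python here, mirroring the GAP session above):

```python
def is_qr(a, p):
    """Euler's criterion: a^((p-1)/2) ≡ 1 (mod p) iff a is a quadratic residue."""
    return pow(a, (p - 1) // 2, p) == 1

p = 103                          # note 103 ≡ 3 (mod 4)
assert not is_qr(12, p)          # matches RootMod(12,103) returning fail
assert is_qr(13, p)              # matches RootMod(13,103) succeeding
r = pow(13, (p + 1) // 4, p)     # a square root of 13, namely 61 or 103 - 61
assert r * r % p == 13
```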

Let p be a large prime. Let us try to estimate the number of points on the elliptic curve Y^2 = f (X) over Zp, where f (X) is a cubic. For a solution with the first

coordinate X to exist it is necessary and sufﬁcient that f (X) is a quadratic residue. It

is plausible to suggest that f (X) will be a quadratic residue for approximately half of

all points X ∈ Zp . On the other hand, if f (X) is a nonzero quadratic residue, then the

equation Y 2 = f (X) will have two solutions with X as the ﬁrst coordinate. Hence it

is reasonable to expect that the number of points on the curve will be approximately

2 · (p/2) + 1 = p + 1 (p plus the point at infinity). Hasse5 (1930) gave the exact bound,

which we give here without a proof:

5 Helmut Hasse (1898–1979) was a German mathematician working in algebraic number theory,

known for many fundamental contributions. The period when Hasse’s most important discoveries

were made was a very difﬁcult time for German mathematics. When the Nazis came to power in

1933 a great number of mathematicians with Jewish ancestry were forced to resign and many of

them left the country. Hasse did not compromise his mathematics for political reasons, he struggled


Theorem 3.3.5 (Hasse’s Theorem) Suppose E is an elliptic curve over Zp and let

N be the number of points on E. Then

p + 1 − 2√p ≤ N ≤ p + 1 + 2√p. (3.22)

It was also shown that for any p and N satisfying (3.22) there exists a curve over Zp

having exactly N points.
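Hasse's bound is easy to confirm for small primes by brute-force point counting. A Python sketch (an illustration, using the curve of Example 3.3.2, whose 12 points were listed by GAP above):

```python
def count_points(a, b, p):
    """Number of points on Y^2 = X^3 + aX + b over Zp, including infinity."""
    N = 1                                            # the point at infinity
    for x in range(p):
        fx = (x * x * x + a * x + b) % p
        N += sum(1 for y in range(p) if y * y % p == fx)
    return N

N, p = count_points(0, 7, 11), 11
assert N == 12                        # 11 affine points plus infinity
assert (N - (p + 1)) ** 2 <= 4 * p    # Hasse's bound (3.22)
```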

As we have already seen, cryptography works with large objects with which it is

difﬁcult to calculate. Large elliptic curves are of great interest to it. Hasse’s theorem

says that to have a large curve we need a large ﬁeld. This can be achieved in two

ways. The ﬁrst is to have a large prime p. The second is to keep p small but to try

to build a new large ﬁeld F, as an extension of Zp . As we will see later, for every n

there is a ﬁeld containing exactly q = pn elements. There is a more general version

of Theorem 3.3.5 which also often goes by the name of “Hasse’s Theorem”.

Theorem 3.3.6 Suppose E is an elliptic curve over a ﬁeld F containing q elements

and let N be the number of points on E. Then

q + 1 − 2√q ≤ N ≤ q + 1 + 2√q. (3.23)

For cryptographic purposes, it is not uncommon to use elliptic curves over fields of 2^150 or more elements. It is worth noting that for n ≥ 20 it is infeasible to list all points on the elliptic curve over a field of 2^n elements.

Despite the fact that each curve has quite a few points there does not exist a

deterministic algorithm which will produce, in less than exponential time, a point

on a given curve Y^2 = f (X). In particular, it is difficult to find X such that f (X) is a

quadratic residue. In practice, fast probabilistic methods are used.

Example 3.3.4 Let F = Z5. Consider the curve Y^2 = X^3 + 2. Let us list all the points on this curve and calculate the addition table for the corresponding abelian

group E. The quadratic residues of Z5 are 1 = 1^2 = 4^2 and 4 = 2^2 = 3^2. We shall list all possibilities for x and in each case see what y can be:

x = 0 =⇒ y^2 = 2, no solution
x = 1 =⇒ y^2 = 3, no solution
x = 2 =⇒ y^2 = 0 =⇒ y = 0
x = 3 =⇒ y^2 = 4 =⇒ y = 2, 3
x = 4 =⇒ y^2 = 1 =⇒ y = 1, 4

Hence we can list all the points of E. We have E = {∞, (2, 0), (3, 2), (3, 3), (4, 1),

(4, 4)}. Let us calculate the addition table.

(Footnote 5 continued)

against Nazi functionaries who tried (sometimes successfully) to subvert mathematics to political

doctrine. On the other hand, he made no secret of his strong nationalistic views and his approval of

many of Hitler’s policies.


  +   |   ∞   (2,0) (3,2) (3,3) (4,1) (4,4)
------+------------------------------------
  ∞   |   ∞   (2,0) (3,2) (3,3) (4,1) (4,4)
(2,0) | (2,0)   ∞   (4,1) (4,4)
(3,2) | (3,2) (4,1) (3,3)   ∞
(3,3) | (3,3) (4,4)   ∞   (3,2)
(4,1) | (4,1)                           ∞
(4,4) | (4,4)                     ∞   (3,2)

We see that 2 · (2, 0) = ∞, hence ord ((2, 0)) = 2. Also 3 · (3, 2) = 3 · (3, 3) = ∞, while 2 · (3, 2) ≠ ∞ and 2 · (3, 3) ≠ ∞, hence ord ((3, 2)) = ord ((3, 3)) = 3.
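The point enumeration of Example 3.3.4 can be double-checked with a direct search over all pairs (x, y); a short Python sketch:

```python
p, a, b = 5, 0, 2                # the curve Y^2 = X^3 + 2 over Z5
points = sorted((x, y) for x in range(p) for y in range(p)
                if (y * y - (x ** 3 + a * x + b)) % p == 0)
assert points == [(2, 0), (3, 2), (3, 3), (4, 1), (4, 4)]
assert len(points) + 1 == 6      # five affine points plus the point at infinity
```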

Exercises

1. Fill the remaining empty slots of the table above and ﬁnd the orders of (4, 1) and

(4, 4).

2. Find all quadratic residues of the ﬁeld Z17 .

3. Use Hasse’s theorem to estimate the number of points on an elliptic curve over

Z2011 .

4. Prove that:

(a) the product of two quadratic residues and the inverse of a quadratic residue

are quadratic residues;

(b) the product of a quadratic residue and a quadratic non-residue is a quadratic

non-residue;

(c) the product of two quadratic non-residues is a quadratic residue.

5. Prove that if a is a quadratic non-residue, then a^{(p−1)/2} = −1. (Use Wilson's theorem, which is Exercise 7 in Sect. 1.4.)

6. Use the trial and error method to ﬁnd a quadratic non-residue in Zp , where

p = 359334085968622831041960188598043661065388726959079837.

For calculating multiples efﬁciently the same rules apply as to calculating powers.

Below we give a complete analogue of the Square and Multiply algorithm.

Theorem 3.3.7 Given P ∈ E, for any positive integer N it is possible to calculate N · P using no more than 2⌊log2 N⌋ additions.

Proof We first write N in binary (which takes no more than ⌊log2 N⌋ divisions to convert N into binary representation):

N = 2^{m0} + 2^{m1} + · · · + 2^{ms},

where m0 = ⌊log2 N⌋ and m0 > m1 > · · · > ms. We can find all multiples 2^{mi} · P, i = 1, 2, . . . , s by successive doubling in m0 additions:

2^1 · P = P + P,
2^2 · P = 2^1 · P + 2^1 · P,
...
2^{m0} · P = 2^{m0−1} · P + 2^{m0−1} · P.

Now to calculate

N · P = 2^{m0} · P + 2^{m1} · P + · · · + 2^{ms} · P

we need no more than m0 extra additions. In total no more than 2m0 = 2⌊log2 N⌋ additions are needed. Since n = log2 N is the length of the input, the complexity function f (n) is at most linear in n, i.e., f (n) = O(n). �

The algorithm presented here can be called the Double and Add algorithm. It

has linear complexity. Up to isomorphism, this is the same algorithm as Square and

Multiply.
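Since only the group operation is used, Double and Add can be written once and applied to any abelian group. The sketch below (Python; an illustration) instantiates it with ordinary integer addition so the answer can be checked against multiplication, and counts the additions performed:

```python
def double_and_add(N, P, add, zero):
    """Compute N*P using only the group operation `add`, counting additions."""
    count, result, power = 0, zero, P
    while N:
        if N & 1:                      # this binary digit of N is 1
            result = add(result, power)
            count += 1
        power = add(power, power)      # successive doubling: P, 2P, 4P, ...
        count += 1
        N >>= 1
    return result, count

# Sanity check in the additive group of integers (any abelian group would do).
value, count = double_and_add(1729, 7, lambda x, y: x + y, 0)
assert value == 1729 * 7
assert count <= 2 * (1729).bit_length()   # roughly 2*log2(N) additions at most
```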

We see that it is an easy task to calculate multiples of any point P on an elliptic curve. That is, it is easy to calculate N · P given an integer N and a point P on the curve. However, there is no easy way to calculate N given N · P and P. So the function N → N · P is a one-way function, and it has been recognised by now that it has a great significance for cryptography. This branch of cryptography is called Elliptic Curve Cryptography (ECC). It was proposed in 1985 by Victor Miller and Neal Koblitz as a mechanism for implementing public-key cryptography alternative to RSA. We will show one of the cryptosystems of ECC in the next section.

Exercises

Let G be the abelian group of the elliptic curve Y^2 = X^3 + 1234X + 17 over Z346111.

(GAP will take a few seconds to generate this group, be patient.)

2. Calculate 123 · P.

3. If GAP uses the Double and Add algorithm to compute large multiples, how many

additions will GAP perform when calculating 1729 · P?

4. What is the order of P in G? (Use the command Order(P);.)

5. Calculate the order of G and decide whether P is a generator of G. (Use the

command Size(G);.)

3.4 Applications to Cryptography

It is not a trivial task to represent a message by a point on an elliptic curve. To illustrate the difficulties we may face here it is enough to say that there is no known polynomial time algorithm for finding a single point on the curve.

This problem has not yet been fully resolved. However, there are fast probabilistic

methods which work for most messages, but for a small proportion of them these

methods fail to produce a point. The probability of such an unwanted event can be

managed and made arbitrarily small.

The following method was suggested by Koblitz [1]. Suppose that we have an

elliptic curve over Zp given by the equation Y^2 = X^3 + aX + b. We may assume that

our message is already represented by a number m. We will try to embed this number

in the X-coordinate of the point P = (X, Y ) ∈ E. Of course, we would like to make

X = m but this is not always possible since f (m) = m^3 + am + b is a quadratic residue only in about 50 % of cases. A failure rate of 1/2 is, of course, unacceptable.

Suppose that a failure rate of 2^{−k} is acceptable for some sufficiently large positive integer k. Then, for each of the numbers mi = km + i, where 0 ≤ i < k, we check if f (mi) is a quadratic residue. If f (mi) is a quadratic residue, then we can find a point P = (X, Y) ∈ E, for which X = mi (using, for example, the GAP command RootMod(f(mi),p);). This will be the plaintext. The message m can always be recovered as m = ⌊X/k⌋. We should choose p sufficiently large so that (m + 1)k < p for any message m. Since we now have k numbers mi that represent the message, the probability that for none of them f (mi) is a quadratic residue will be less than 2^{−k}.

If k = 10, then this means that we can add another junk digit to m (it will be

placed in the rightmost position) in order to get a point on the curve. This junk digit

will be discarded at the receiving end. If k = 100, then we can add two junk digits.

This is already sufficient for practical purposes as 2^{−100} is very small.
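The whole embedding procedure fits in a few lines. A Python sketch with k = 10 and the curve and prime used in the example below; the block value 181126 (the letters "HAP" under the encoding A = 11, . . . , Z = 36) is an assumption made here purely for illustration:

```python
p, a, b, k = 17487707, 123, 456, 10   # curve and prime from the example below

def embed(m):
    """Koblitz embedding: append a junk digit i and keep the first X = 10m + i
    for which f(X) is a quadratic residue mod p (tested by Euler's criterion)."""
    for i in range(k):
        x = k * m + i
        if pow((x ** 3 + a * x + b) % p, (p - 1) // 2, p) == 1:
            return x                  # X-coordinate of a point on the curve
    return None                       # fails with probability about 2^(-k)

x = embed(181126)                     # hypothetical block "HAP" encoded as 181126
assert x is not None and x // k == 181126   # the message is recovered as ⌊X/k⌋
```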

Suppose we have chosen the prime number p = 17487707 and we would like to

represent the message “HAPPY NEW YEAR” using points of the elliptic curve

Y^2 = X^3 + 123X + 456 mod p. Let us encode letters as follows: A = 11, B = 12, . . ., Z = 36 and suppose we view the failure rate 2^{−10} as acceptable. Since our chosen prime has 8 digits we can make messages 6 digits long and still have a possibility to add one junk digit. This means we have to split our message into blocks with three letters in each:

3.4 Applications to Cryptography 115

Calculating this, we initially added a junk digit zero to every xi and tried to ﬁnd a

matching yi . If we failed, we would change the last digit to 1, and, in the case of

another failure to 2, etc. We see that x3 gave us a quadratic residue straightaway, x1

and x2 needed the second attempt with the last digit 1 and x4 needed three attempts

with the last digits 0,1,2.

Exercises

1. Use the trial and error method to ﬁnd a quadratic residue r and a quadratic non-

residue n in Zp , where

p = 359334085968622831041960188598043661065388726959079837.

2. Represent the message “CHRISTMAS” using the points of the elliptic curve

Y^2 = X^3 + 123X + 456 mod 17487707. (Note that you do not have to generate

the whole group of points for this curve, which would be time consuming.)

Cryptosystem

The exponential Difﬁe–Hellman key exchange can easily be adapted for elliptic

curves. Suppose that E is a publicly known elliptic curve over Zp . Alice and Bob,

through an open channel, agree upon a point Q ∈ E. Alice chooses a secret positive

integer kA (her private multiplier) and sends kA · Q to Bob. Bob chooses a secret pos-

itive integer kB (his private multiplier) and sends kB · Q to Alice. Bob then calculates

P = kB · (kA · Q) = kA kB · Q, and Alice calculates P = kA · (kB · Q) = kA kB · Q.

They now both know the point P which they can use as the key for a conventional

secret key cryptosystem. An eavesdropper wanting to spy on Alice and Bob would

face the following task called the Difﬁe–Hellman problem for elliptic curves:

The Difﬁe–Hellman Problem: Given Q, kA · Q, kB · Q (but not kA or kB ), ﬁnd

P = kA kB · Q. No polynomial time algorithms are known for this problem.
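The key exchange can be tried out on a toy curve. The sketch below uses the curve y^2 = x^3 + 2x + 2 over Z17 with base point Q = (5, 1) and small illustrative multipliers; none of these numbers come from the text.

```python
# Toy Diffie-Hellman key exchange on the curve y^2 = x^3 + 2x + 2 over Z_17.
# The base point Q = (5, 1) and the private multipliers are illustrative.
p, a = 17, 2                      # only the coefficient a enters the formulas

def ec_add(P, Q):
    """Add two points; None represents the point at infinity O."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None               # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, p - 2, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p

def ec_mul(k, P):
    """Compute k*P by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

Q = (5, 1)
kA, kB = 3, 7                     # Alice's and Bob's private multipliers
shared_A = ec_mul(kA, ec_mul(kB, Q))
shared_B = ec_mul(kB, ec_mul(kA, Q))
assert shared_A == shared_B       # both arrive at the same point P
```

An eavesdropper sees Q, kA·Q and kB·Q but, for a large curve, cannot feasibly recover kA·kB·Q.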

Elgamal (1985) modified the Diffie–Hellman idea to adapt it for message transmission (see [3], p. 287). It starts as above with Alice and Bob publicly announcing

Q and exchanging kB · Q and kA · Q, which play the role of their public keys. Alter-

natively you may think that there is a public domain run by a trusted authority where

Q is stored and that any new entrant, say Cathy, chooses her private multiplier kC

and publishes her public key kC · Q there.

Elgamal published a paper titled “A Public Key Cryptosystem and a Signature Scheme Based on

Discrete Logarithms” in which he proposed the design of the Elgamal discrete logarithm cryptosys-

tem and of the Elgamal signature scheme. He is also recognized as the “father of SSL”, which is

a protocol for transmitting private documents via the Internet that is now the industry standard for

Internet security and ecommerce.

116 3 Groups

Suppose now that messages have been encoded as points of E in some agreed upon way, and that Bob wants to send Alice a message M ∈ E. He chooses

a secret random integer s (for each message a distinct random number should be

generated), reads Alice’s public key kA · Q from the public domain and sends Alice

the pair of points C1 = s · Q and C2 = M + s · (kA · Q). On the receiving end, Alice,

using her private multiplier kA , can calculate the plaintext as M = C2 − kA · C1 .

Nobody else can do this without knowing Alice’s private multiplier kA .
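The encryption and decryption steps can likewise be sketched on a toy curve; the curve, base point, keys and message point below are illustrative choices, not the book's.

```python
# Toy Elgamal encryption on the curve y^2 = x^3 + 2x + 2 over Z_17.
import random

p, a = 17, 2

def ec_add(P, Q):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                      # point at infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, p - 2, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p

def ec_mul(k, P):
    R = None
    while k:
        if k & 1: R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def ec_neg(P):
    return None if P is None else (P[0], (-P[1]) % p)

Q = (5, 1)                               # public base point
kA = 5                                   # Alice's private multiplier
QA = ec_mul(kA, Q)                       # Alice's public key

def encrypt(M, s):                       # Bob: C1 = s*Q, C2 = M + s*QA
    return ec_mul(s, Q), ec_add(M, ec_mul(s, QA))

def decrypt(C1, C2):                     # Alice: M = C2 - kA*C1
    return ec_add(C2, ec_neg(ec_mul(kA, C1)))

M = ec_mul(3, Q)                         # a message encoded as a curve point
C1, C2 = encrypt(M, s=random.randrange(1, 19))
assert decrypt(C1, C2) == M
```

Note that a fresh random s must be chosen for every message, exactly as the text requires.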

Exercises

1. Alice and Bob are setting up the Elgamal elliptic curve cryptosystem for private

communication. They’ve chosen a point Q = (88134, 77186) on the elliptic curve

E given by Y^2 = X^3 + 12345 over Z95701. They’ve chosen their private multipliers

kA = 373 and kB = 5191 and published the points QA = (27015, 92968) and

QB = (55035, 17248), respectively. They agreed to cut the messages into two-

letter segments and encode the letters as A = 11, B = 12, . . . , Z = 36, space =

41, ’ = 42, . = 43, , = 44, ? = 45. They also agreed that, for each point (x, y),

only the ﬁrst four digits of x are meaningful (so that they can add additional junk

digits to their messages, if needed, to obtain a point on the curve).

(a) Alice got the following message. Decrypt it:

[ [ ( 50702, 2643 ), ( 33440, 56603 ) ],

[ ( 93385, 52237 ), ( 38536, 21346 ) ], [ ( 63482, 12110 ), ( 70599, 87781 ) ],

[ ( 16312, 46508 ), ( 62735, 69061 ) ], [ ( 64937, 58445 ), ( 41541, 36985 ) ],

[ ( 40290, 45534 ), ( 11077, 77207 ) ], [ ( 64001, 62429 ), ( 32755, 18973 ) ],

[ ( 81332, 47042 ), ( 35413, 9688 ) ], [ ( 5345, 68939 ), ( 475, 53184 ) ] ]

(b) She suspects that the sender of the message was Bob. Show how Alice may

reply to this message and how Bob will decrypt it.

References

1. Koblitz, N.: Elliptic curve cryptosystems. Math. Comput. 48, 203–209 (1987)

2. Miller, V.: Uses of Elliptic Curves in Cryptography. Advances in Cryptology—Crypto ’85, pp.

417–426 (1986)

3. Koblitz, N.: Algebraic Aspects of Cryptography. Springer, Berlin (1998)

4. Shanks, D.: Five number theoretic algorithms. In: Proceedings of the Second Manitoba Confer-

ence on Numerical Mathematics, pp. 51–70 (1973)

Chapter 4

Fields

Who sank on you with glory here?

Ruslan and Liudmila. Alexander Pushkin (1799–1837)

In Sect. 1.4 we deﬁned a ﬁeld and proved that, for any prime p, the set of integers

Z p = {0, 1, 2, . . . , p − 1} with the operations:

a ⊕ b = a + b mod p,
a ⊙ b = ab mod p

is a ﬁeld. This ﬁeld has cardinality p. So far, these are the only ﬁnite ﬁelds we have

learned. In this chapter we prove that a ﬁnite ﬁeld must have cardinality pn for some

prime p and positive integer n, i.e., its cardinality may only be a power of a prime.

Such ﬁelds exist and we lay the grounds for the construction of such ﬁelds in Chap. 5.
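The operations ⊕ and ⊙ on Zp are easy to experiment with. In the sketch below the inverses required by axiom F9 are computed via Fermat's little theorem (a^(p−2) ≡ a^−1 mod p), an aside not used in the text:

```python
# The operations a ⊕ b = a + b mod p and a ⊙ b = ab mod p on Z_p.
# Inverses are computed via Fermat's little theorem: a^(p-1) = 1 for a != 0,
# so a^(p-2) is the multiplicative inverse of a (an aside, not from the text).
p = 7

def add(a, b): return (a + b) % p
def mul(a, b): return (a * b) % p
def inv(a):    return pow(a, p - 2, p)

# every nonzero element has a multiplicative inverse (field axiom F9)
for x in range(1, p):
    assert mul(x, inv(x)) == 1

# linear equations ax = b with a nonzero are uniquely solvable: x = a^(-1) b
assert mul(inv(3), 5) == 4 and mul(3, 4) == 5
```

The same check works verbatim for any prime p; for composite moduli the inner assertion fails, reflecting the fact that Zn is not a field.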

In this chapter we also prove a very important result that the multiplicative group

of any ﬁnite ﬁeld is cyclic. This makes it possible to deﬁne “discrete logarithms”—

special functions on ﬁnite ﬁelds that are difﬁcult to compute, and widely used in

cryptography. We show that the Elgamal cryptosystem can also be based on the

multiplicative group of a large ﬁnite ﬁeld.

We recap that an algebraic system consisting of a set F equipped with two operations,

addition + and multiplication ·, is called a ﬁeld if the following nine axioms are

satisﬁed:

A. Slinko, Algebra for Applications, Springer Undergraduate Mathematics Series,

DOI 10.1007/978-3-319-21951-6_4

F1. Addition is commutative, a + b = b + a, for all a, b ∈ F.
F2. Addition is associative, a + (b + c) = (a + b) + c, for all a, b, c ∈ F.

F3. There exists a unique element 0 such that a + 0 = 0 + a = a, for all a ∈ F.

F4. For every element a ∈ F there exists a unique element −a such that a + (−a) =

(−a) + a = 0, for all a ∈ F.

F5. Multiplication is commutative, a · b = b · a, for all a, b ∈ F.

F6. Multiplication is associative, a · (b · c) = (a · b) · c, for all a, b, c ∈ F.

F7. There exists a unique element 1 ∈ F such that a · 1 = 1 · a = a, for all nonzero

a ∈ F.

F8. The distributive law holds, that is, a · (b + c) = a · b + a · c, for all a, b, c ∈ F.

F9. For every nonzero a ∈ F there is a unique element a^−1 ∈ F such that a · a^−1 = a^−1 · a = 1.

Here and later, for any ﬁeld F the set of its non-zero elements will be denoted by

F ∗ . We note that axioms F1–F4 mean that F is an abelian group relative to the

addition and axioms F5–F7 mean that F ∗ is also an abelian group but relative to the

multiplication. Axioms F1–F8 mean that F is a commutative ring relative to the two

operations. Only the last axiom is speciﬁc for ﬁelds.

The examples of inﬁnite ﬁelds are numerous. The most important are the ﬁelds

of rational numbers Q, real numbers R, and complex numbers C.

A subset G of a field F may itself be a field relative to the same operations of addition and multiplication as in F. If so, we say that G is a subfield of F.

Three basic properties of ﬁelds are stated in the following theorem. The second

one is called absence of divisors of zero and the third solvability of linear equations.

We saw these properties hold for Z p but now we would like to prove them for arbitrary

ﬁelds.

Theorem 4.1.1 Let F be a field. Then:

(i) a0 = 0 for all a ∈ F;
(ii) ab = 0 if and only if a = 0 or b = 0 (or both);
(iii) if a ≠ 0, the equation ax = b has a unique solution x = a^−1 b in F.

Proof Let us prove (i). Using F3 and F8,

0 · a = (0 + 0) · a = 0 · a + 0 · a.

Adding −(0 · a) to both sides and using F2, F4 and F3, we get

0 = −(0 · a) + (0 · a + 0 · a) = (−(0 · a) + 0 · a) + 0 · a = 0 + 0 · a = 0 · a.

4.1 Introduction to Fields 119

Let us prove (ii). If a = 0 or b = 0, then ab = 0 by (i). Conversely, suppose that ab = 0 and that a ≠ 0 or b ≠ 0; we assume the former. Then by F9 we know that a^−1 exists. Using (i), F6, F9 and F7, we now have

0 = a^−1 · 0 = a^−1 (ab) = (a^−1 a)b = 1 · b = b.

Let us prove (iii). Since a ≠ 0, we know a^−1 exists. Suppose that the equation

ax = b has a solution. Then multiplying both sides by a^−1 we get a^−1(ax) = a^−1 b. As in the proof of (i) we calculate that the left-hand side of this equation is x. So x = a^−1 b. It is also easy to check that x = a^−1 b is indeed a solution of ax = b. □

A very important technique is enlarging a given ﬁeld to obtain a larger ﬁeld with

some given property. After learning a few basic facts about polynomials we discuss

how to make such extensions.

Exercises

1. Prove that the set of all non-negative rational numbers Q+ is NOT a ﬁeld.

2. Prove that the set of all integers Z is NOT a field.
3. Prove that the set of all real numbers Q(√2) of the form x + y√2, where x and y are in Q, is a field.
4. Consider Q(√3), which is defined similarly to the field from the previous exercise. Find the inverse element of 2 − √3 and solve the equation

(2 − √3)x = 1 + √3.

5. Solve the following system of linear equations

3x + y + 4z = 1

x + 2y + z = 2

4x + y + 4z = 4

with coefficients in Z5.
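For the last exercise, the field Z5 is small enough that the system can be checked by exhaustive search:

```python
# Brute-force search for solutions of the linear system over Z_5.
p = 5
solutions = [
    (x, y, z)
    for x in range(p) for y in range(p) for z in range(p)
    if (3 * x + y + 4 * z) % p == 1
    and (x + 2 * y + z) % p == 2
    and (4 * x + y + 4 * z) % p == 4
]
# the coefficient matrix is invertible mod 5, so the solution is unique
assert solutions == [(3, 2, 0)]
```

Of course, Gaussian elimination works over Z5 exactly as over R, precisely because Z5 is a field; exhaustive search is only viable here because the field is tiny.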

The reader familiar with Linear Algebra may well skip this section.

A vector space consists of the following data:

VS1. a field F of scalars;

VS2. a set V of objects, called vectors;

VS3. a rule (or operation) called vector addition, which associates with each pair

of vectors u, v in V a vector u + v in V , called the sum of u and v, in such a

way that

(a) addition is commutative, u + v = v + u;

(b) addition is associative, u + (v + w) = (u + v) + w;

(c) there exists a unique vector 0 in V , called the zero vector, such that

u + 0 = u for all u in V ;

(d) for each vector u in V there is a unique vector −u in V such that u +

(−u) = 0;

VS4. a rule (or operation) called scalar multiplication, which associates with each

scalar a in F and vector u in V a vector au in V , called the product of a and

u, in such a way that

(a) 1u = u for all u in V ;

(b) a1 (a2 u) = (a1 a2 )u;

(c) a(u + v) = au + av;

(d) (a1 + a2 )u = a1 u + a2 u.

Then we call V a vector space over the ﬁeld F.

Example 4.1.2 Where F is a ﬁeld, F n is the set of n-tuples whose entries are scalars

from F. It is a vector space over F relative to the following addition and scalar

multiplication:

(a1, a2, . . . , an) + (b1, b2, . . . , bn) = (a1 + b1, a2 + b2, . . . , an + bn),

k (a1, a2, . . . , an) = (ka1, ka2, . . . , kan).

In particular, Rn , Cn and Znp are vector spaces over the ﬁelds R, C and Z p , respec-

tively.

Example 4.1.3 Let Fm×n be the set of m ×n matrices whose entries are scalars from a

ﬁeld F. It is a vector space over F relative to matrix addition and scalar multiplication.

The sets of all m × n matrices Rm×n , Cm×n and (Z p )m×n with entries from R, C

and Z p are vector spaces over the ﬁelds R, C and Z p , respectively.

Example 4.1.4 Let F be a ﬁeld, and Fn [x] be the set of all polynomials of degree

at most n whose coefﬁcients are scalars from F. It is a vector space over F relative

to the addition of polynomials and scalar multiplication. The sets of all polynomials

Rn [x], Cn [x] and (Z p )n [x] of degree at most n with coefﬁcients from R, C and Z p

are vector spaces over the ﬁelds R, C and Z p , respectively.

Example 4.1.5 Let F be a ﬁeld, F[x] be the set of all polynomials (without restric-

tion on their degrees), whose coefﬁcients are scalars from F. It is a vector space

over F relative to addition of polynomials and scalar multiplication. The sets of all

polynomials R[x], C[x] and Z p [x] with coefﬁcients from R, C and Z p are vector

spaces over the ﬁelds R, C and Z p , respectively.

relative to the following operations. The addition of elements of G is the operation

of addition in the ﬁeld G. The scalar multiplication of elements of G by elements of

F is performed as multiplication in the ﬁeld G.

Proof This is an exercise. Check that the vector space axioms for G all follow from

the ﬁeld axioms. �

Example 4.1.6 The ﬁeld of complex numbers C is a vector space over the reals R

which is a subﬁeld of C. Both C and R are vector spaces over the rationals Q.

The axioms of a vector space have many useful consequences. The two most

important ones are as follows:

0 · v = 0, (−1) · v = −v.

Proof We will prove only the first one; the second is an exercise. We will use VS4 (d) for this. We have

0 · v = (0 + 0) · v = 0 · v + 0 · v.

Adding −(0 · v) to both sides we get 0 = 0 · v. □

Deﬁnition 4.1.3 Let V be a vector space over the ﬁeld F and v1 , . . . , vk be arbitrary

vectors in V. Then the set of all possible linear combinations a1 v1 + a2 v2 + · · · +

ak vk with coefﬁcients a1 , . . . , ak in F is called the span of v1 , . . . , vk and denoted

span{v1 , . . . , vk }.

Deﬁnition 4.1.4 Let V be a vector space over the ﬁeld F. The space V is said to

be ﬁnite-dimensional if there exists a ﬁnite number of vectors v1 , v2 , . . . , vk which

span V , that is V = span{v1 , v2 , . . . , vk }.

Example 4.1.7 The space Fn[x] is finite-dimensional, since the set of monomials {1, x, x^2, . . . , x^n} spans it. The space of polynomials F[x] is infinite-dimensional.

Proof We will concentrate only on the second part of this example (for the ﬁrst see

the exercise below). Suppose F[x] is ﬁnite-dimensional and there exist polynomials

f 1 , f 2 , . . . , f n such that

F[x] = span{ f 1 , f 2 , . . . , f n }.

Let us choose a positive integer N such that N > deg ( f i ) for all i = 1, . . . , n.

As { f 1 , f 2 , . . . , f n } spans F[x] we can ﬁnd scalars a1 , a2 , . . . , an such that x N =

a1 f 1 + a2 f 2 + · · · + an f n . Then

G(x) cannot have more than N roots. (When F = R, this result is well-known. For

an arbitrary ﬁeld this will be proved in Proposition 5.1.3.) �

Deﬁnition 4.1.5 Let V be a vector space over the ﬁeld F. A subset {v1 , v2 , . . . , vk }

of V is said to be linearly dependent if there exist scalars a1 , a2 , . . . , ak in F, not all

of which are 0, such that

a1 v1 + a2 v2 + · · · + ak vk = 0.

Example 4.1.8 Let Fm×n be the space of all m × n matrices with entries from F. Let Eij be the matrix whose (i, j)-entry is 1 and all other entries are 0. Such a matrix is called a matrix unit. The set of all mn matrix units is linearly independent.

Example 4.1.9 The set of monomials {1, x, x^2, . . . , x^n} is linearly independent in Fn[x].

A basis of a vector space V is a linearly independent set of vectors which spans V. Every finite spanning subset of V can be reduced to a basis.

Proof Suppose that a spanning set {v1, v2, . . . , vk} of V is linearly dependent. Then

a1 v1 + a2 v2 + · · · + ak vk = 0

and at least one coefficient is nonzero. Without loss of generality we may assume that ak ≠ 0. Then

vk = −(ak^−1 a1) v1 − · · · − (ak^−1 ak−1) vk−1,

so vk is a linear combination of v1, v2, . . . , vk−1. Hence v1, v2, . . . , vk−1 still span V and vk may be removed from the spanning set. We continue this process until the remaining system of vectors is linearly independent. Then we will have arrived at a basis for V. □

Theorem Let {v1, v2, . . . , vn} be a basis of a vector space V over a field F and v ∈ V. Then there exists a unique n-tuple (a1, a2, . . . , an) of elements of F such that

v = a1 v1 + a2 v2 + · · · + an vn . (4.1)

Proof The fact that there is at least one such n-tuple follows from the fact that

{v1 , v2 , . . . , vn } spans V . Suppose there were two different ones:

v = a1 v1 + a2 v2 + · · · + an vn = b1 v1 + b2 v2 + · · · + bn vn .

Then

(a1 − b1)v1 + · · · + (an − bn)vn = 0, and since v1, v2, . . . , vn are linearly independent, ai = bi for all i. □

Lemma 4.1.1 Let F be a finite field of q elements and let {v1, v2, . . . , vn} be a basis for a vector space V over F. Then V contains q^n elements.

Proof Every element v ∈ V can be written uniquely as a linear combination (4.1). Each coefficient ai appearing in this linear combination may take any one of q values. The total number of such linear combinations will therefore be q^n. This is how many elements V has. □

In the case when F is ﬁnite, it is now clear that all bases are equinumerous, i.e.,

contain the same number of vectors. This is also true in general.

Definition The dimension of V is the number of vectors in any basis of V. It is denoted dimF V.

Exercises

1. Check that F n satisﬁes all axioms of a vector space, observing how these axioms

follow from the axioms of a ﬁeld.

2. Justify the statement in Example 4.1.8.

3. Justify the statement in Example 4.1.9.

vector space over F. Find its dimension over F.

5. Let V be the set of positive real numbers with the addition

u ⊕ v := uv,

i.e., the new addition is the former multiplication. Also for any real number a ∈ R and any u ∈ V we define the scalar multiplication

a ⊙ u := u^a.

Prove that V is a vector space over the field R.

Theorem 4.1.4 Any ﬁnite ﬁeld F contains one of the ﬁelds Z p for a certain prime

p. In this case F is a vector space over Z p and it contains p n elements, where

n = dimZ p F.

Proof For a positive integer m, let m · 1 denote the element of F obtained by adding m ones, that is m · 1 = 1 + · · · + 1 (m times). When m = 1, 2, . . ., we

obtain the sequence

1, 2 · 1, 3 · 1, . . . , m · 1, . . .

The following is clear from the ring axioms: for any positive integers a, b

a · 1 + b · 1 = (a + b) · 1, (4.2)

(a · 1) · (b · 1) = (ab) · 1. (4.3)

Since F is finite, this sequence must contain repetitions: m1 · 1 = m2 · 1 for some m1 < m2, and then (m2 − m1) · 1 = 0. Let p be the minimal positive integer for which p · 1 = 0. Then

p is prime. If not, and p = ab for a < p and b < p, then a · 1 ≠ 0 and b · 1 ≠ 0

but (a · 1) · (b · 1) = (ab) · 1 = p · 1 = 0. This is a contradiction since F, being a

ﬁeld, by Theorem 4.1.1, contains no zero divisors.

Now, since p · 1 = 0, the Eqs. (4.2) and (4.3) become

a · 1 + b · 1 = (a ⊕ b) · 1,

(a · 1) · (b · 1) = (a ⊙ b) · 1,

where ⊕ and ⊙ are addition and multiplication modulo p. We can now recognise

that the set {0, 1, 2 · 1, . . . , ( p − 1) · 1} together with the operations of addition and

multiplication in F is in fact Z p . By Theorem 4.1.2 F is a vector space over Z p .

Setting n = dimZp F, by Lemma 4.1.1, there are exactly p^n elements of F. □

The theorem we have proved states that the cardinality of any ﬁnite ﬁeld is a power

of a prime. The converse is also true.

Deﬁnition 4.1.8 If p · 1 = 0 in a ﬁeld F for some prime p, then this prime p is said

to be the characteristic of F. If such a prime does not exist, the ﬁeld F is said to have

characteristic 0.

Theorem 4.1.5 For any prime p and any positive integer n there exists a ﬁeld of

cardinality p n . This ﬁeld is unique up to isomorphism.

Proof We will show how to construct the fields of cardinality p^n in the next chapter.

The uniqueness, however, is beyond the scope of this book. �

The unique field of cardinality p^n is denoted GF(p^n) and is called the Galois field of p^n elements.¹

Exercises

1. Let n 1 = 449873499879757801 and n 2 = 449873475733618561. Find out if

there are ﬁelds GF(n 1 ) and GF(n 2 ). In case GF(n i ) exists for i = 1 or i = 2,

identify the prime number p such that Z p ⊆ GF(n i ) and determine the dimension

of GF(n i ) over Z p .

2. Let F be a ﬁnite ﬁeld of q elements. Prove that all its elements are roots of the

equation x q − x = 0. (Hint. Consider the multiplicative group (F ∗ , ·) of this ﬁeld

and use Corollary 3.2.3.)

In any field F the set F∗ of all nonzero elements plays a very important role. Axiom

F9 states that all elements of F ∗ are invertible. Moreover, this axiom, together with

axioms F5–F7 imply that F ∗ relative to the operation of multiplication is a commu-

tative group. This group is called the multiplicative group of F. Our goal for the rest

of this chapter is to prove that in any ﬁnite ﬁeld F the multiplicative group of F is

cyclic.

We will concentrate our attention on orders of elements in F ∗ . Eventually, we will

ﬁnd that there is always an element in F ∗ whose order is exactly the cardinality of

this group, thus proving that F ∗ is cyclic.

We now look at the ﬁeld Z7 to get an intuition of what is to come. In this case

Z∗7 = {1, 2, 3, 4, 5, 6}. Let us calculate the powers of each element:

1 See Sect. 3.1.3 for a brief historical note about Évariste Galois.

Powers of 1: 1, 1^2 = 1.
Powers of 2: 2, 2^2 = 4, 2^3 = 1; so there are 3 elements in Z7 which are powers of 2.
Powers of 3: 3, 3^2 = 2, 3^3 = 6, 3^4 = 4, 3^5 = 5, 3^6 = 1; so all nonzero elements are powers of 3.
Powers of 4: 4, 4^2 = 2, 4^3 = 1; so there are three distinct powers of 4.
Powers of 5: 5, 5^2 = 4, 5^3 = 6, 5^4 = 2, 5^5 = 3, 5^6 = 1; so all nonzero elements are powers of 5.
Powers of 6: 6, 6^2 = 1; so there are two powers.

We summarise our experience: the element 1 has order 1, the elements 2 and 4

have order 3, the elements 3 and 5 have order 6, and the element 6 has order 2. Hence

Z∗7 = <3> = <5>; it is cyclic and has two generators, 3 and 5.
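The computation of the orders above can be automated; the same loop finds all generators of Z∗p for any small prime p:

```python
# Orders of elements of the multiplicative group of Z_7.
p = 7

def order(a):
    """Smallest n >= 1 with a^n = 1 mod p."""
    n, x = 1, a % p
    while x != 1:
        x = x * a % p
        n += 1
    return n

orders = {a: order(a) for a in range(1, p)}
assert orders == {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}

# the generators (primitive elements) are exactly the elements of order p - 1
assert [a for a in range(1, p) if orders[a] == p - 1] == [3, 5]
```

Note that every order in the table divides 6 = |Z∗7|, as Lagrange's theorem predicts.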

Lemma 4.2.1 For any element g of a group G, ord(g) = ord(g^−1).

Proof If g^n = 1 for a positive integer n, then (g^−1)^n = (g^n)^−1 = 1, and conversely. Therefore the orders of g and of g^−1 are the same. □

Lemma 4.2.2 Every element of a ﬁnite group has a ﬁnite order. Moreover, in a ﬁnite

group the order of any element is a divisor of the total number of elements in the

group.

Proof Let G be a ﬁnite group containing g. Then by Proposition 3.2.1 ord (g) =

|<g>|, which is a divisor of |G| by Lagrange’s theorem. �

Lemma 4.2.3 Suppose an element g of a group G satisfies g^n = 1 for a positive integer n. Then ord(g)|n, i.e., ord(g) is a divisor of n.

Proof Let ord(g) = m. Suppose n = qm + r, where 0 ≤ r < m, and suppose that r ≠ 0. Then 1 = g^n = g^(qm+r) = (g^m)^q · g^r = g^r, which contradicts the minimality of m. □

Equation 3.11 will play a crucial role in the proof of our next lemma. To recap,

Eq. 3.11 says that for any element g ∈ G and positive integer i

ord(g^i) = ord(g) / gcd(i, ord(g)).    (4.4)

Lemma 4.2.4 If g is an element of a group G and ord(g) = ki, where k and i are positive integers, then ord(g^i) = k.

4.2 The Multiplicative Group of a Finite Field Is Cyclic 127

Proof By (4.4), ord(g^i) = ord(g) / gcd(i, ord(g)) = ki / i = k. □

Lemma 4.2.5 Let G be a commutative group, and a and b be two elements of G that

have orders m and n, respectively. Suppose that gcd(m, n) = 1. Then ord (ab) = mn.

Proof Since (ab)^(mn) = a^(mn) b^(mn) = 1, we know by Lemma 4.2.3 that ord(ab)|mn. Suppose that for some k the equality (ab)^k = 1 holds. Then (ab)^k = a^k b^k = 1 and a^k = (b^k)^−1. Let c = a^k = (b^k)^−1. Then c^m = (a^k)^m = (a^m)^k = 1 and c^n = ((b^k)^−1)^n = ((b^n)^k)^−1 = 1. As 1 = gcd(m, n) = um + vn for some integers u and v, we may write c = c^(um+vn) = c^(um) · c^(vn) = (c^m)^u · (c^n)^v = 1. Thus a^k = b^k = 1 and by Lemma 4.2.3 we have m|k and n|k. This implies mn|k, because m and n are relatively prime. If k = ord(ab), we get mn|ord(ab) and together with ord(ab)|mn we get ord(ab) = mn. □

Corollary 4.2.1 Let G be a commutative group and a1, a2, . . . , ak be its elements of finite order such that ord(ai) = pi^αi, where the primes p1, p2, . . . , pk are all different. Then

ord(a1 a2 . . . ak) = p1^α1 p2^α2 . . . pk^αk.

Example Suppose a, b, c are elements of a commutative group G of orders 5^3 · 17, 5 · 7^2 and 7 · 17^2, respectively. Let us show how to use these elements to construct an element g ∈ G such that ord(g) = m and a^m = b^m = c^m = 1.

We claim that m can be taken as lcm(ord(a), ord(b), ord(c)) = 5^3 · 7^2 · 17^2 and g = a^17 b^5 c^7. Indeed, by Lemma 4.2.4 we have ord(a^17) = 5^3, ord(b^5) = 7^2 and ord(c^7) = 17^2, so by Corollary 4.2.1, ord(g) = 5^3 · 7^2 · 17^2 = m. Moreover, ord(a)|m, ord(b)|m, ord(c)|m, which implies a^m = b^m = c^m = 1.

Exercises

Prove that it is cyclic.

2. Let g, h, k be elements of a ﬁnite abelian group G of orders 183618, 131726,

127308, respectively. Use g, h, k to construct an element of G of order 1018264646281, i.e., express an element of this order using g, h, k.

Theorem 4.2.1 Let G be a ﬁnite commutative group. Then there exists an element

g ∈ G such that ord(g) = m ≤ |G| and x^m = 1 for all x ∈ G.

Proof Let us consider the set of integers I = {ord (g) | g ∈ G} and let p1 , p2 , . . . , pn

be the set of all primes that occur in the prime factorizations of integers from I . For

each such prime pi let us choose the element gi such that ord(gi) = pi^αi · qi, where gcd(pi, qi) = 1 and the integer αi is maximal among all elements of G. (Note that the same element might correspond to several primes, i.e., among g1, g2, . . . , gn not all elements may be distinct.) Then by Lemma 4.2.4 for the element hi = gi^qi we have ord(hi) = pi^αi. Set g = h1 h2 . . . hn. Then, by Corollary 4.2.1,

ord(g) = p1^α1 p2^α2 . . . pn^αn = m. It is also clear that the order of every element in G divides m, thus x^m = 1 for all x ∈ G. Moreover, m ≤ |G| by Lemma 4.2.2. □

Theorem 4.2.2 Let F be a ﬁnite ﬁeld consisting of q elements. Then there exists an

element g ∈ F ∗ such that ord (g) = |F ∗ | = q − 1, i.e., F ∗ = <g>.

Proof The group F∗ contains q − 1 elements, and by Theorem 4.2.1 there exists an element of order m ≤ q − 1 such that x^m = 1 for all

x ∈ F ∗ . In the next chapter we will prove that a polynomial of degree n over any

ﬁeld has no more than n roots in that ﬁeld. The polynomial x m − 1 can be considered

as a polynomial from F[x]; it has degree m and q − 1 roots in F. Since q − 1 ≥ m,

this is possible only if m = q − 1. The theorem is proved. �

Definition An element g ∈ F∗ of order q − 1 is called a primitive element of F.

Corollary 4.2.3 Let F be a ﬁnite ﬁeld consisting of q elements. Then ord (a) divides

q − 1 for every element a ∈ F ∗ .

Proof Let g be a primitive element of F. Then ord(g) = q − 1 and a = g^i for some 1 ≤ i ≤ q − 1. Then by (4.4), ord(a) = ord(g^i) = (q − 1)/gcd(i, q − 1), which is a divisor of q − 1. □

Theorem 4.2.3 For each prime p and positive integer n there is a unique, up to

isomorphism, finite field GF(p^n) that consists of p^n elements. Its elements are the roots of the polynomial f(x) = x^(p^n) − x.

Proof We cannot prove the ﬁrst part of the statement, i.e., the existence of F =

GF(p^n) but we can prove the second. Suppose F exists and g is a primitive element. Then every nonzero element a of F lies in F∗, which is a cyclic group of order p^n − 1 with generator g. By Corollary 4.2.3 ord(a) is a divisor of p^n − 1, hence a^(p^n − 1) = 1. It follows that a^(p^n) = a for all a ∈ F, including 0, which proves the second part of the theorem. □

The idea behind the proof of the existence of GF(p^n) is as follows. Firstly we construct an extension Zp ⊂ K such that every polynomial with coefficients in Zp has a root in K. Then the polynomial f(x) = x^(p^n) − x will have p^n roots in K and we have to check that f(x) does not have multiple roots. These p^n distinct roots will then be a field GF(p^n).

From our considerations it follows that, if m|n, then GF(p^m) is a subfield of GF(p^n). Indeed, any root of the equation x^(p^m) = x will also be a root of the equation x^(p^n) = x (see Exercise 3 that follows).

Exercises

in Z∗p of orders 11561 and 58380?

2. Let p be a prime and m, n be positive integers. Prove that p^m − 1 divides p^n − 1 if and only if m divides n.

3. Prove that GF(p^m) is a subfield of GF(p^n) if and only if m|n.

Let F be a finite field of q elements, let h ∈ F∗, and let g be a primitive element of F. Then the equation g^x = h has a unique solution modulo q − 1, which is called the discrete logarithm of h to base g, denoted logg(h).

Thus 3 is a primitive element of Z7 and log3 (3) = 1, log3 (2) = 2, log3 (6) = 3,

log3 (4) = 4, log3 (5) = 5, log3 (1) = 6.

Example 4.2.3 The element g = 3 is a primitive element of Z19, as seen from the table featuring the powers of 3:

n     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
3^n   3  9  8  5 15  7  2  6 18 16 10 11 14  4 12 17 13  1

Inverting this table, we obtain the table of discrete logarithms to base 3:

n        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
log3(n) 18  7  1 14  4  8  6  3  2 11 12 15 17 13  5 10 16  9
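Both tables are easy to reproduce by machine; inverting the dictionary of powers yields the discrete logarithms:

```python
# Table of powers of 3 modulo 19 and the corresponding discrete logarithms.
p, g = 19, 3

powers = {n: pow(g, n, p) for n in range(1, p)}    # n -> g^n mod p
logs = {value: n for n, value in powers.items()}   # g^n mod p -> n

assert powers[6] == 7 and powers[10] == 16         # matches the table
assert logs[2] == 7 and logs[1] == 18              # log_3(2) = 7, log_3(1) = 18
```

For a prime of cryptographic size this table-building approach is of course infeasible, which is exactly the point of the problem stated next.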

Computing discrete logarithms, by contrast, appears to be computationally difficult. So we can now add the following problem to our list of apparently hard

problems in Number Theory.

Discrete Logarithm Problem: Given a prime p, a generator g of Z∗p and an

element h ∈ Z∗p , ﬁnd the integer x such that g x = h and 0 ≤ x ≤ p − 2.

We recap that an element h of a ﬁnite ﬁeld F is a quadratic residue if there exists

another element g ∈ F such that g 2 = h.

Proposition Let g be a primitive element of a finite field F with an odd number q of elements. Then an element h ∈ F∗ is a quadratic residue if and only if logg(h) is even.

Proof If logg(h) = 2k is even, then h = g^(2k) = (g^k)^2 is a quadratic residue. The reverse is clearly also true. Indeed, if h is a quadratic residue, then h = (h1)^2 for some h1 ∈ F∗. Since h1 = g^k for some k, we get h = (g^k)^2 = g^(2k) and logg(h) = 2k is even. □

Exercises

1. How many primitive elements are there in the ﬁeld Z1237 ?

2. Let F = Z17 .

(a) Decide whether 2 or 3 is a primitive element of F. Denote the one which is

primitive by g.

(b) Compute the table of powers of g in F and the table of discrete logs to base g.

3. Let g be a primitive element in a ﬁnite ﬁeld F consisting of q elements. Prove

that

logg(ab) ≡ logg(a) + logg(b) (mod q − 1).

Earlier we met cryptosystems whose security rests on the computational complexity of factoring integers. Here we present a cryptosystem whose security is

based on the complexity of calculating discrete logarithms. It is based on the Difﬁe–

Hellman key exchange agreement. It was invented by Taher Elgamal in 1985. The

4.3 The Elgamal Cryptosystem Revisited 131

Elgamal algorithm is used in the free GNU Privacy Guard software, recent versions

of PGP, and other cryptosystems.

In a public domain, a large prime p and a primitive element α of Z p are displayed.

Each participant of the group, who wants to send or receive encrypted messages,

creates their private and public keys. Alice, for example, selects a secret integer k A

and calculates α^kA, which she places in the public domain as her public key. Bob selects a secret integer kB and calculates α^kB, which he places in the public domain as

his public key. Now they can exchange messages.

Suppose, for example, that Bob wants to send a message m to Alice. We’ll assume

that m is an integer such that 0 ≤ m < p. (If m is larger, he breaks it into several

blocks as usual.) He chooses a secret random integer s and computes c1 = α^s in Zp. He also takes Alice’s public key α^kA from the public domain and calculates c2 = m · (α^kA)^s. He sends this pair (c1, c2) of elements of Zp to Alice; this is the cyphertext. On the receiving end Alice uses her private key kA to calculate m as follows: m = c2 · ((c1)^kA)^−1. For the evil eavesdropper Eve to figure out kA she

must solve a Discrete Logarithm Problem, which is difﬁcult.
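The whole scheme fits in a few lines. The sketch below uses the small prime p = 53 with primitive element g = 2, matching the first exercise below; the private key, random exponent and message are illustrative choices:

```python
# Toy multiplicative Elgamal over Z_53 with primitive element g = 2.
# The private key, random exponent and message are illustrative choices.
import random

p, g = 53, 2
kA = 11                       # Alice's private key
A = pow(g, kA, p)             # Alice's public key, placed in the public domain

def encrypt(m, s):
    """Bob sends (c1, c2) = (g^s, m * A^s)."""
    return pow(g, s, p), m * pow(A, s, p) % p

def decrypt(c1, c2):
    """Alice recovers m = c2 * (c1^kA)^(-1), inverting via Fermat."""
    return c2 * pow(pow(c1, kA, p), p - 2, p) % p

m = 30                        # the letter T under the encoding A = 11, ..., Z = 36
c1, c2 = encrypt(m, s=random.randrange(1, p - 1))
assert decrypt(c1, c2) == m
```

Note that decryption works because (c1)^kA = g^(s·kA) = (g^kA)^s = A^s, the very quantity by which the plaintext was multiplied.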

Exercises

1. Alice and Bob agreed to use the Elgamal cryptosystem based on the multiplicative

group of the ﬁeld Z p for p = 53. They also agreed to use 2 as the primitive element

of Zp. Since p is small their messages consist of a single letter which is encoded as before (A = 11, B = 12, . . . , Z = 36). Bob’s public key is 32 and Alice sent him the message (30, 42). Which letter did

Alice send to Bob in this message?

2. Alice and Bob have set up the multiplicative Elgamal cryptosystem for private

communication. They’ve chosen an element g = 123456789 in the multiplicative

group of the ﬁeld Z p , where p = 123456789987654353003. They’ve chosen

their private exponents k A = 373 and k B = 5191 and published the elements

g A = 52808579942366933355 and g B = 39318628345168608817, respectively.

They agreed to cut the messages into ten-letter segments and encode the letters

as A = 11, B = 12, . . ., Z = 36, space = 41, ’ = 42, . = 43, , = 44, ? = 45.

Bob got the following message from Alice:

[ [ 83025882561049910713, 66740266984208729661 ],

[ 117087132399404660932, 44242256035307267278 ],

[ 67508282043396028407, 77559274822593376192 ],

[ 60938739831689454113, 14528504156719159785 ],

[ 5059840044561914427, 59498668430421643612 ],

[ 92232942954165956522, 105988641027327945219 ],

[ 97102226574752360229, 46166643538418294423 ] ]

Chapter 5

Polynomials

A polynomial walks into a bar and asks for a drink. The barman

declines: “We don’t cater for functions.”

An old math joke.

This chapter is about polynomials and their use. After learning the basics we discuss

Lagrange’s interpolation needed for Shamir’s secret sharing scheme that we discuss

in Chap. 6. Then, after proving some further results on polynomials, we give a con-

struction of a finite field of cardinality p^n for any prime p and positive integer n. This field

is constructed as polynomials modulo an irreducible polynomial of degree n. The

ﬁeld constructed will be an extension of Zp and in this context we discuss minimal

annihilating polynomials which we will need in Chap. 7 for the construction of good

error-correcting coding.

A formal expression of the form

f(x) = a0 + a1 x + · · · + ak x^k,  ai ∈ F,    (5.1)

where k is an arbitrary positive integer, is called a polynomial over F. The set of all

polynomials over F is denoted by F[x]. For k = 0 there is no distinction between the

scalar a0 and the polynomial f (x) = a0 . Thus we assume that F ⊂ F[x]. The zero

polynomial 0 is a very special one. Any other polynomial f(x) ≠ 0 we can write in the form (5.1) with ak ≠ 0 and define its degree as follows.

A. Slinko, Algebra for Applications, Springer Undergraduate Mathematics Series,

DOI 10.1007/978-3-319-21951-6_5

Definition 5.1.1 Given a nonzero polynomial f(x) = a0 + a1 x + · · · + ak x^k with ak ≠ 0, the number k is said to be the degree of f(x) and will be denoted deg(f). Note that deg(f) is undefined if f = 0. Colloquially speaking, the degree of f(x) is the highest power of x which appears.

Definition 5.1.2 Given two polynomials

f(x) = a0 + a1 x + · · · + ak x^k,  g(x) = b0 + b1 x + · · · + bm x^m,

we say that these two polynomials are equal, and write f(x) = g(x), if k = m and ai = bi for all i = 0, 1, 2, . . . , k.

The addition and multiplication in the field induce the corresponding operations

over polynomials. Let

f(x) = a0 + a1 x + · · · + ak x^k,  g(x) = b0 + b1 x + · · · + bm x^m

be two polynomials and assume that deg (f ) = k ≥ m = deg (g). Then we deﬁne

k

f (x) + g(x) := (ai + bi )x i ,

i=0

where for i > deg(g) we assume that b_i = 0. Multiplication is defined in such a way that x^i · x^j = x^{i+j} is true. The only way to do this is to set

f(x)g(x) := ∑_{i=0}^{k+m} ( ∑_{j=0}^{i} a_j b_{i−j} ) x^i.

The same convention also works here: a_p = 0 when p > deg(f), and b_q = 0 when q > deg(g).

By deﬁning these two operations we obtain an algebraic object which is called

the polynomial ring over F; it is also denoted by F[x].
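These coefficient-wise rules are easy to turn into code. The following Python sketch is our own illustration (not from the book; the names `poly_add` and `poly_mul` are ours): a polynomial over Zp is represented by its list of coefficients [a0, a1, . . . , ak].

```python
# A polynomial a0 + a1*x + ... + ak*x^k over Z_p is stored as [a0, a1, ..., ak].

def poly_add(f, g, p):
    """Coefficient-wise addition, padding the shorter list with zeros."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(a + b) % p for a, b in zip(f, g)]

def poly_mul(f, g, p):
    """The coefficient of x^i in f*g is the convolution sum_j a_j * b_{i-j}."""
    result = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            result[i + j] = (result[i + j] + a * b) % p
    return result

# In Z_2[x]: (1 + x + x^2)(1 + x) = 1 + x^3, since the middle terms cancel.
print(poly_mul([1, 1, 1], [1, 1], 2))  # [1, 0, 0, 1]
```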


5.1 The Ring of Polynomials

We observe that

Proposition 5.1.1 For any two nonzero polynomials f, g ∈ F[x]:

1. deg(f + g) ≤ max(deg(f), deg(g));
2. deg(fg) = deg(f) + deg(g) and, in particular, F[x] has no zero divisors.

Division with remainder is also possible.

Theorem 5.1.1 (Division Algorithm) Given polynomials f(x) and g(x) in F[x] with g(x) ≠ 0, there exist a “quotient” q(x) ∈ F[x] and a “remainder” r(x) ∈ F[x] such

that

f (x) = g(x)q(x) + r(x)

and either r(x) = 0 or deg (r) < deg (g). Moreover, the quotient and the remainder

are uniquely deﬁned.

Proof Let

f(x) = ∑_{i=0}^{k} a_i x^i,   g(x) = ∑_{i=0}^{m} b_i x^i

be two polynomials with deg(f) = k and deg(g) = m. Then there are two cases to consider:

consider:

Case 1. If k < m, then we can set q(x) = 0 and r(x) = f (x).

Case 2. If k ≥ m, we can define

f1(x) = f(x) − b_m^{-1} a_k x^{k−m} g(x) = f(x) − g(x)q1(x),

where q1(x) = b_m^{-1} a_k x^{k−m}. This polynomial f1(x) will be of smaller degree than f, since f(x) and q1(x)g(x) have the same degree k and the same leading coefficient a_k. By the induction hypothesis f1(x) = g(x)q2(x) + r(x), where r(x) = 0 or deg(r) < deg(g), and then f(x) = g(x)(q1(x) + q2(x)) + r(x), as required.

To prove uniqueness, suppose that f(x) = g(x)q1(x) + r1(x) = g(x)q2(x) + r2(x), where each remainder is either zero or of degree smaller than deg(g). Then

g(x)(q1(x) − q2(x)) = r2(x) − r1(x).


This cannot happen for r1(x) ≠ r2(x), since then the degree of the right-hand side would be smaller than the degree of the left-hand side. Thus r2(x) − r1(x) = 0. This can happen only when q1(x) − q2(x) = 0, since F[x] has no zero divisors. □

The quotient and the remainder can be computed by the following “polynomial long division” process, commonly taught in high school. For example, let us consider the polynomials f(x) = x^4 + x^3 + x^2 + x + 1 and g(x) = x^2 + 1 from Z2[x]. Then

              x^2 + x
x^2 + 1 ) x^4 + x^3 + x^2 + x + 1
          x^4       + x^2
          -----------------------
                x^3       + x + 1
                x^3       + x
                -----------------
                                1

encodes a division with remainder of the polynomial f(x) by g(x). It shows that the quotient q(x) and the remainder r(x) are

q(x) = x^2 + x,   r(x) = 1,

that is

x^4 + x^3 + x^2 + x + 1 = (x^2 + x)(x^2 + 1) + 1.

We say that a polynomial f(x) is divisible by g(x) if f(x) = q(x)g(x), i.e., when the remainder is zero.
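The long division process can be mechanised. The sketch below is our own illustrative Python (the helper name `poly_divmod` is ours); it reproduces the division of f(x) = x^4 + x^3 + x^2 + x + 1 by g(x) = x^2 + 1 over Z2.

```python
def poly_divmod(f, g, p):
    """Division with remainder in Z_p[x]; polynomials are lists [a0, a1, ...].
    Assumes g is non-zero with a non-zero leading coefficient g[-1]."""
    f = f[:]                        # working copy of the dividend
    q = [0] * max(len(f) - len(g) + 1, 1)
    inv_lead = pow(g[-1], -1, p)    # inverse of the leading coefficient of g
    while len(f) >= len(g) and any(f):
        shift = len(f) - len(g)
        coeff = (f[-1] * inv_lead) % p
        q[shift] = coeff
        for i, b in enumerate(g):   # subtract coeff * x^shift * g(x)
            f[shift + i] = (f[shift + i] - coeff * b) % p
        while f and f[-1] == 0:     # drop leading zeros
            f.pop()
    return q, f                     # quotient, remainder

# x^4 + x^3 + x^2 + x + 1 divided by x^2 + 1 over Z_2:
q, r = poly_divmod([1, 1, 1, 1, 1], [1, 0, 1], 2)
print(q, r)  # [0, 1, 1] (i.e. x^2 + x) and [1] (remainder 1)
```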

A polynomial (5.1) defines a function f : F → F with

f(α) = ∑_{i=0}^{k} a_i α^i.

In Analysis this function is always identified with the polynomial itself. However, working over a finite field we cannot do this. Indeed, 1^2 + 1 = 0 and 0^2 + 0 = 0 in Z2. So the polynomial f(x) = x^2 + x over Z2 is non-zero but the function associated with it is the zero function.
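This phenomenon is easy to observe computationally. The following short Python sketch (our illustration) evaluates the non-zero polynomial x^2 + x at every element of Z2 and of Z5.

```python
def eval_poly(coeffs, x, p):
    """Evaluate a polynomial given by [a0, a1, ...] at x, working in Z_p."""
    return sum(a * pow(x, i, p) for i, a in enumerate(coeffs)) % p

f = [0, 1, 1]   # x^2 + x, a non-zero polynomial

# Over Z_2 it induces the zero function...
print([eval_poly(f, x, 2) for x in range(2)])   # [0, 0]
# ...but over Z_5 it does not.
print([eval_poly(f, x, 5) for x in range(5)])   # [0, 2, 1, 2, 0]
```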

Deﬁnition 5.1.3 An element α ∈ F is called a root1 of f (x) if f (α) = 0.

Proposition 5.1.2 An element α ∈ F is a root of a polynomial f(x) if and only if f(x) = g(x)(x − α) for some g(x) ∈ F[x], i.e., f(x) is divisible by x − α.

1A purist would talk about a zero of the polynomial f (x) but a root of the equation f (x) = 0. We

are not making this distinction.


Proof Let us divide f(x) by x − α with remainder:

f(x) = q(x)(x − α) + r,

where r ∈ F is the remainder and q(x) is the quotient. Substituting α in this equation

we get 0 = 0 + r, whence r = 0 and f (x) is divisible by (x − α) and q(x) can be

taken as g(x). Conversely, if f(x) = g(x)(x − α), then f(α) = g(α) · 0 = 0. □

Proposition 5.1.3 A nonzero polynomial

f(x) = ∑_{i=0}^{k} a_i x^i,   a_i ∈ F,    (5.2)

of degree k has at most k roots in F.

Proof Suppose that α_1, . . . , α_{k+1} ∈ F are distinct roots of f(x). By Proposition 5.1.2,

f(x) = (x − α_1)g_1(x),   deg(g_1) = k − 1.    (5.3)

Substituting α_2 we get 0 = f(α_2) = (α_2 − α_1)g_1(α_2). Since in any field there are no divisors of zero we conclude that g_1(α_2) = 0 and by Proposition 5.1.2

g_1(x) = (x − α_2)g_2(x),   deg(g_2) = k − 2.    (5.4)

Continuing in this way, we arrive at f(x) = c(x − α_1)(x − α_2) . . . (x − α_k) for some constant c; comparing the leading coefficients shows that c = a_k and

f(x) = a_k(x − α_1)(x − α_2) . . . (x − α_k).    (5.5)

Substituting α_{k+1} now gives f(α_{k+1}) = a_k(α_{k+1} − α_1) . . . (α_{k+1} − α_k) ≠ 0, since no factor is zero; this contradicts α_{k+1} being a root. Hence f(x) has at most k roots. □


Exercises

1. Find the quotient and the remainder when f(x) = 5x^4 + x^2 + 3x + 4 is divided by g(x) = 3x^2 + 2x + 1.

2. Find the roots of f(x) = x^4 + 2x^3 + 2x^2 + 2x + 1 ∈ Z5[x] in Z5. Hence find a factorisation of f(x) into linear factors.


Proposition 5.1.4 Let α_0, α_1, . . . , α_k be distinct elements of F and β_0, β_1, . . . , β_k be arbitrary elements of F. Then there exists no more than one polynomial f(x) of degree at most k such that f(α_i) = β_i for i = 0, 1, . . . , k.

Proof Suppose that two distinct polynomials f(x) = ∑_{i=0}^{k} a_i x^i and g(x) = ∑_{i=0}^{k} b_i x^i satisfy f(α_i) = β_i and g(α_i) = β_i for all i = 0, 1, 2, . . . , k. Then the polynomial h(x) = f(x) − g(x) is not zero, and its degree is not greater than k. Also h(α_i) = f(α_i) − g(α_i) = 0 for every i, and h(x) has at least k + 1 distinct roots α_0, α_1, . . . , α_k. However, by Proposition 5.1.3 this is impossible. □

Theorem 5.1.2 (Lagrange's Interpolation) Let α_0, α_1, . . . , α_k be distinct elements of F and β_0, β_1, . . . , β_k be arbitrary elements of F. Then there exists a unique polynomial

f(x) = ∑_{i=0}^{k} β_i · [(x − α_0) . . . (x − α_{i−1})(x − α_{i+1}) . . . (x − α_k)] / [(α_i − α_0) . . . (α_i − α_{i−1})(α_i − α_{i+1}) . . . (α_i − α_k)]    (5.6)

of degree at most k such that f(α_i) = β_i for i = 0, 1, . . . , k.

Proof The polynomial (5.6) was constructed as follows. We first constructed polynomials g_i(x) of degree k such that g_i(α_i) = 1 and g_i(α_j) = 0 for j ≠ i. These polynomials are:

g_i(x) = [(x − α_0) . . . (x − α_{i−1})(x − α_{i+1}) . . . (x − α_k)] / [(α_i − α_0) . . . (α_i − α_{i−1})(α_i − α_{i+1}) . . . (α_i − α_k)].


Then the desired polynomial was constructed as f(x) = ∑_{i=0}^{k} β_i g_i(x). We immediately see that f(α_i) = β_i, as required. This polynomial is unique because of Proposition 5.1.4. □

Example Let us find the polynomial f(x) of degree at most 2 over F = Z5 with the properties: f(1) = 2, f(2) = 4, f(3) = 4. We apply Theorem 5.1.2 to the case F = Z5, k = 2, α_0 = 1, α_1 = 2, α_2 = 3, β_0 = 2, β_1 = 4, β_2 = 4. The formula tells us that

f(x) = 2 · [(x − 2)(x − 3)] / [(1 − 2)(1 − 3)] + 4 · [(x − 1)(x − 3)] / [(2 − 1)(2 − 3)] + 4 · [(x − 1)(x − 2)] / [(3 − 1)(3 − 2)].

To simplify this expression we have to expand all the expressions, bearing in mind that all the arithmetic is in Z5:

f(x) = 2 · (x^2 + 1)/(4 · 3) + 4 · (x^2 + x + 3)/(1 · 4) + 4 · (x^2 + 2x + 2)/(2 · 1)
     = (x^2 + 1) + (x^2 + x + 3) + 2(x^2 + 2x + 2) = 4x^2 + 3.

(You can easily check that indeed f(1) = 2, f(2) = 4, f(3) = 4. Do it!)

Note: a simple alternative to using the formula is to calculate the coefficients of the desired polynomial as the unique solution of a system of linear equations: if f(x) = ax^2 + bx + c and f(1) = 2, f(2) = 4, f(3) = 4, we have the system

a + b + c = 2,
4a + 2b + c = 4,
4a + 3b + c = 4.

Solving this system in Z5 gives a = 4, b = 0, c = 3, confirming the result obtained by the previous method. Another way to solve this system of linear equations is of course to calculate the inverse of the matrix of this system and multiply it by the column on the right-hand side.
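Both methods are easy to automate. The Python sketch below is our own illustration (the name `lagrange_interpolate` is ours): it implements formula (5.6) over Zp and recovers 4x^2 + 3 from the three conditions of the example.

```python
def lagrange_interpolate(points, p):
    """Coefficients [a0, ..., ak] of the unique polynomial of degree <= k
    through the k+1 points (alpha_i, beta_i); all arithmetic is in Z_p."""
    k = len(points) - 1
    coeffs = [0] * (k + 1)
    for i, (xi, yi) in enumerate(points):
        num = [1]      # numerator polynomial prod_{j != i} (x - alpha_j)
        denom = 1      # scalar denominator prod_{j != i} (alpha_i - alpha_j)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            num = [0] + num                          # multiply num by x ...
            for t in range(len(num) - 1):            # ... then subtract xj * num
                num[t] = (num[t] - xj * num[t + 1]) % p
            denom = (denom * (xi - xj)) % p
        scale = (yi * pow(denom, -1, p)) % p
        for t, c in enumerate(num):
            coeffs[t] = (coeffs[t] + scale * c) % p
    return coeffs

# The example over Z_5: f(1) = 2, f(2) = 4, f(3) = 4.
print(lagrange_interpolate([(1, 2), (2, 4), (3, 4)], 5))  # [3, 0, 4], i.e. 4x^2 + 3
```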

Corollary 5.1.1 Fix a_0 ∈ F and consider the class of polynomials

f(x) = ∑_{i=0}^{k} a_i x^i,   a_i ∈ F,

with this fixed constant term a_0. Let α_1, . . . , α_k be distinct nonzero elements of F and β_1, . . . , β_k be arbitrary elements of F. Then there exists a unique polynomial f(x) of degree at most k in this class such that f(α_i) = β_i for i = 1, 2, . . . , k.

Proof This follows from Theorem 5.1.2 applied to the k + 1 points α_0 = 0, α_1, . . . , α_k with β_0 = a_0. □


Exercises

1. Use Lagrange interpolation to find f(x) = ∑_{i=0}^{2} a_i x^i ∈ Z7[x] with f(1) = f(2) = 1 and f(3) = 2.

2. Find the constant term of the polynomial f (x) of degree no greater than 2 with

coefﬁcients in Z7 such that f (1) = 3, f (3) = 2, f (4) = 1.

3. Find the constant term of the polynomial f (x) of degree at most 3 in Z7 such that

4. Use GAP to ﬁnd a polynomial f (x) ∈ Z13 [x] of degree at most 3 such that

In Chap. 6 we will see an application of Lagrange interpolation to cryptography, namely to secret sharing.

A polynomial

f(x) = ∑_{i=0}^{k} a_i x^i,   a_i ∈ F,    (5.7)

with leading coefficient a_k = 1 is called a monic polynomial of degree k over F. The polynomial g(x) = x^2 + 2x^5 − 1 has degree 5 but is not monic.

Definition 5.1.5 A polynomial f(x) from F[x] is said to be reducible over F if there exist two polynomials f1(x) and f2(x) from F[x], each of degree greater than or equal to 1, such that f(x) = f1(x)f2(x). Otherwise f(x) is said to be irreducible over F.

For example, the polynomial f(x) = x^2 + 1 is irreducible over R but reducible over C since f(x) = (x − i)(x + i). The polynomial g(x) = x^2 − 2 is irreducible over Q and reducible over R. The polynomial h1(x) = x^2 + 2 ∈ Z5[x] is irreducible over Z5, and h2(x) = x^2 + 2 = (x + 3)(x + 8) ∈ Z11[x] is reducible over Z11.

Thus reducibility of a polynomial depends heavily on the field under consideration. We will be especially interested in irreducible polynomials over Z2. Of course, both linear polynomials x and x + 1 are irreducible. Since x^2, (x + 1)^2 = x^2 + 1 and x(x + 1) = x^2 + x are reducible, the only irreducible polynomial of degree 2 is x^2 + x + 1. There are eight polynomials of degree 3:


f1(x) = x^3,
f2(x) = x^3 + 1,
f3(x) = x^3 + x + 1,
f4(x) = x^3 + x,
f5(x) = x^3 + x^2,
f6(x) = x^3 + x^2 + 1,
f7(x) = x^3 + x^2 + x,
f8(x) = x^3 + x^2 + x + 1.

A polynomial f(x) ∈ F[x] of degree 2 or 3 is irreducible over F if and only if it has no roots in F.

Proof If f(x) is irreducible then clearly it has no linear factors, and so by Proposition 5.1.2 it has no roots in F. Conversely, suppose that f(x) has no roots in F. If it were reducible, then f(x) = g(x)h(x), where either g(x) or h(x) has degree 1, and a polynomial of degree 1 always has a root in F. By Proposition 5.1.2 this root would also be a root of f(x), a contradiction. □

Returning to our list, we know that any reducible polynomial f(x) of degree 3 has a root in Z2, i.e., either f(0) = 0 or f(1) = 0. Six out of the eight polynomials in the table have roots in Z2 and only f3(x) = x^3 + x + 1 and f6(x) = x^3 + x^2 + 1 do not have roots, hence they are the only two irreducible polynomials.

If a polynomial f(x) of degree n is not divisible by any irreducible polynomial over F of degree not greater than n/2, then it is irreducible over F.

Proof If f(x) is reducible over F, then f(x) = g(x)h(x), where g(x), h(x) ∈ F[x] both have degrees at least one. Then at least one of them will have degree not greater than n/2. Any of its irreducible factors will have degree not greater than n/2. Hence, if there are no irreducible polynomials over F of degree not greater than n/2 that divide f(x), it must be irreducible over F. □

Example Let us check whether f(x) = x^5 + x^4 + 1 is irreducible over Z2. We check that f(0) = f(1) = 1, that is, f(x) has no roots in Z2. But does this imply its irreducibility? Not at all. The absence of roots means the absence of linear factors. However it is possible that a polynomial of degree five has no linear factors but is reducible, having one quadratic irreducible factor and another one of degree three. We now have to check that there are no quadratic irreducible factors. The only possible irreducible quadratic factor is x^2 + x + 1, so we have to divide f(x) by x^2 + x + 1 and calculate the remainder. We find that f(x) = (x^2 + x + 1)(x^3 + x + 1). Hence f(x) is reducible.
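This trial-division test is easy to program. In the Python sketch below (our illustration; the function names are ours) a polynomial over Z2 is encoded as a bitmask, bit i being the coefficient of x^i, and a polynomial of degree n is declared irreducible when no polynomial of degree between 1 and n/2 divides it.

```python
def gf2_mod(f, g):
    """Remainder of f modulo g; both are bitmask-encoded polynomials over Z_2."""
    dg = g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        f ^= g << (f.bit_length() - 1 - dg)
    return f

def is_irreducible_gf2(f):
    """Trial division: f (of degree >= 1) is reducible iff some polynomial
    of degree between 1 and deg(f)/2 divides it."""
    deg = f.bit_length() - 1
    for g in range(2, 1 << (deg // 2 + 1)):
        if gf2_mod(f, g) == 0:
            return False
    return True

# x^5 + x^4 + 1 has no roots in Z_2 yet is reducible:
print(is_irreducible_gf2(0b110001))   # False
print(gf2_mod(0b110001, 0b111))       # 0: x^2 + x + 1 divides it
print(is_irreducible_gf2(0b1011))     # True: x^3 + x + 1 is irreducible
```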


Irreducible polynomials play a similar role to that played by prime numbers. The

following theorem can be proved using the same ideas as for integers.

Theorem 5.1.4 Any polynomial f(x) from F[x] of degree no less than 1 can be uniquely represented as a product

f(x) = c · p1(x)^{α1} p2(x)^{α2} · · · pk(x)^{αk},

where p1(x), p2(x), . . . , pk(x) ∈ F[x] are monic irreducible (over F) polynomials, c is a non-zero constant, and α1, α2, . . . , αk are positive integers. This representation is unique apart from the order of p1(x), p2(x), . . . , pk(x).

Exercises

1. Let f(x) ∈ F[x] be a polynomial of degree at least 2. Prove or disprove:

(a) If f(x) has a root in F then f(x) is reducible in F[x].

(b) If f (x) is reducible in F[x] then f (x) has a root in F.

2. Find all irreducible quadratic polynomials in Z3 [x].

3. Explain why checking irreducibility is much easier for cubic (degree 3) polynomials than for quartic (degree 4) polynomials.
4. Which of the following polynomials are irreducible in Z3[x]:

(i) f(x) = x^3 + 2x + 2,
(ii) g(x) = x^4 + 2x^3 + 2x + 1,
(iii) h(x) = x^4 + x^3 + x^2 + x + 1?

5. Represent f(x) = x^5 + x + 1 ∈ Z2[x] as a product of irreducible polynomials.
6. Show that f(x) = x^5 + x^2 + 1 ∈ Z2[x] is an irreducible polynomial over Z2.

Deﬁnition 5.1.6 Let F be a ﬁeld and f (x), g(x) be two polynomials from F[x]. A

monic polynomial d(x) ∈ F[x] is called the greatest common divisor of f (x) and

g(x) iff:

(a) d(x) divides both f (x) and g(x), and

(b) d(x) is of maximal degree with the above property.

The greatest common divisor of f(x) and g(x) is denoted gcd(f(x), g(x)) or gcd(f, g)(x). Its uniqueness follows from the following theorem.

Theorem 5.1.5 (The Euclidean Algorithm) Let f and g be two polynomials. We use

the division algorithm several times to ﬁnd:


f = q1 g + r1,   deg(r1) < deg(g),
g = q2 r1 + r2,   deg(r2) < deg(r1),

r1 = q3 r2 + r3 , deg (r3 ) < deg (r2 ),

..

.

rs−2 = qs rs−1 + rs , deg (rs ) < deg (rs−1 ),

rs−1 = qs+1 rs .

Then all common divisors of f and g are also divisors of rs . Moreover, rs divides

both f and g. Thus rs = gcd(f , g).

Theorem 5.1.6 (The Extended Euclidean Algorithm) Let f and g be two polynomials. Let us form the following matrix with two rows R1, R2, and three columns C1, C2, C3:

(C1 C2 C3) = [ f  1  0
               g  0  1 ].

Using the quotients q1, q2, . . . from the Euclidean algorithm, we perform the row operations R3 := R1 − q1 R2, R4 := R2 − q2 R3, . . ., each time creating a new row, so as to obtain:

             ⎡ f    1     0        ⎤
             ⎢ g    0     1        ⎥
(C1 C2 C3) = ⎢ r1   1     −q1      ⎥
             ⎢ r2   −q2   1 + q1q2 ⎥
             ⎢ ⋮                   ⎥
             ⎣ rs   m     n        ⎦

The last row gives rs = gcd(f, g) together with polynomials m and n such that rs = f m + g n.

Example Let us find the greatest common divisor of f(x) = x^4 + x^3 + x^2 + 1 and g(x) = x^4 + x^2 + x + 1 in Z2[x]. We write:

x^4 + x^3 + x^2 + 1 = (x^4 + x^2 + x + 1) · 1 + (x^3 + x),
x^4 + x^2 + x + 1 = (x^3 + x) · x + (x + 1),
x^3 + x = (x + 1) · (x^2 + x).

So gcd(f, g)(x) = x + 1.


The extended version of the algorithm gives:

x^4 + x^3 + x^2 + 1    1    0
x^4 + x^2 + x + 1      0    1
x^3 + x                1    1
x + 1                  x    x + 1

so that x + 1 = f(x) · x + g(x) · (x + 1).

Deﬁnition 5.1.7 Two polynomials f (x), g(x) ∈ F[x] are said to be coprime (rela-

tively prime) if gcd(f , g)(x) = 1.

Corollary 5.1.2 Two polynomials f (x), g(x) ∈ F[x] are coprime if and only if there

exist polynomials m(x), n(x) ∈ F[x] such that

1 = f (x)m(x) + g(x)n(x).
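The Extended Euclidean Algorithm of Theorem 5.1.6 can be sketched in a few lines. The Python below is our illustration, using the same bitmask encoding of Z2[x] as before (bit i is the coefficient of x^i); it recovers gcd(f, g) = x + 1 and a Bézout pair for the worked example.

```python
def gf2_mul(a, b):
    """Carry-less product of two bitmask-encoded polynomials over Z_2."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_divmod(f, g):
    """Quotient and remainder of f divided by g in Z_2[x]."""
    q, dg = 0, g.bit_length() - 1
    while f and f.bit_length() - 1 >= dg:
        shift = f.bit_length() - 1 - dg
        q ^= 1 << shift
        f ^= g << shift
    return q, f

def gf2_ext_gcd(f, g):
    """Return (d, m, n) with d = gcd(f, g) = m*f + n*g over Z_2[x]."""
    r0, r1 = f, g
    m0, m1, n0, n1 = 1, 0, 0, 1
    while r1:
        q, r = gf2_divmod(r0, r1)
        r0, r1 = r1, r
        m0, m1 = m1, m0 ^ gf2_mul(q, m1)
        n0, n1 = n1, n0 ^ gf2_mul(q, n1)
    return r0, m0, n0

f, g = 0b11101, 0b10111       # x^4 + x^3 + x^2 + 1 and x^4 + x^2 + x + 1
d, m, n = gf2_ext_gcd(f, g)
print(bin(d), bin(m), bin(n))  # 0b11 0b10 0b11: x + 1 = f * x + g * (x + 1)
```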

Deﬁnition 5.1.8 Let F be a ﬁeld and f (x), g(x) be two polynomials from F[x]. A

monic polynomial m(x) ∈ F[x] is called the least common multiple of f (x) and g(x)

if:

(a) m(x) is a multiple of both f (x) and g(x);

(b) m(x) is of minimal degree with the above property.

It is denoted lcm(f (x), g(x)) or lcm(f , g)(x).

All the usual properties of the least common multiple are satisﬁed. For example,

as for the integers, we can prove:

Theorem 5.1.7 Let f(x) and g(x) be two monic polynomials in F[x]. Then

gcd(f, g)(x) · lcm(f, g)(x) = f(x)g(x).

Example Let f(x) = x^4 + x^3 + x^2 + 1 and g(x) = x^4 + x^2 + x + 1 be the polynomials in Z2[x] considered above. We know that gcd(f, g)(x) = x + 1. Hence

lcm(f, g)(x) = f(x)g(x)/(x + 1) = (x^8 + x^7 + x + 1)/(x + 1) = x^7 + 1.

Exercises

1. Find the greatest common divisor d(x) of the polynomials f(x) = x^7 + 1 and g(x) = x^3 + x^2 + x + 1 in Z2[x] and represent it in the form d(x) = f(x)m(x) + g(x)n(x).

5.2 Finite Fields

Let F be a field and let m(x) ∈ F[x] be a fixed polynomial of degree n ≥ 1. We denote by F[x]/(m(x)) the set of all polynomials of degree lower than n. This is exactly the set of all possible remainders on division by m(x). Clearly F[x]/(m(x)) is an n-dimensional vector space over F spanned by the monomials 1, x, . . . , x^{n−1}.

Let f (x) be a polynomial from F[x] and r(x) be its remainder on division by m(x).

We deﬁne

r(x) = f (x) mod m(x).

We will also write f (x) ≡ g(x) mod m(x) if f (x) mod m(x) = g(x) mod m(x).

Note that f (x) mod m(x) belongs to F[x]/(m(x)) for all f (x) ∈ F[x].

Let us now convert F[x]/(m(x)) into a ring^2 by introducing the following addition and multiplication:

f(x) ⊕ g(x) := (f + g)(x) mod m(x),    (5.8)
f(x) ⊗ g(x) := (f g)(x) mod m(x).    (5.9)

Note that the ‘new’ addition is not really new as it coincides with the old one.

But we do indeed get a new multiplication. All properties of a commutative ring for

F[x]/(m(x)) can be easily veriﬁed.

Example 5.2.1 Let us consider the ring R[x]/(x^2 + 1). Since deg(x^2 + 1) = 2, this is a 2-dimensional space over the reals with basis {1, x}. The addition is

(a · 1 + bx) ⊕ (c · 1 + dx) = (a + c) · 1 + (b + d)x,

and the multiplication is

(a · 1 + bx) ⊗ (c · 1 + dx) = ac + (ad + bc)x + bd x^2 ≡ (ac − bd) · 1 + (ad + bc)x.

One will recognise the complex numbers (with x playing the role of i). In mathematical language the ring R[x]/(x^2 + 1) is said to be isomorphic to C.

As in the case of the integers, and by using the same approach, we can prove

2 Those familiar with the basics of abstract algebra will recognise the quotient-ring of F[x] by the

principal ideal generated by m(x).
Theorem 5.2.1 The ring F[x]/(m(x)) is a field if and only if the polynomial m(x) is irreducible over F.


Proof Suppose m(x) is of degree n and is irreducible over F. Then we need to show

that every non-zero polynomial f (x) ∈ F[x]/(m(x)) is invertible. We know that

deg (f ) < n. Since m(x) is irreducible we have gcd(f , m) = 1 and by the Extended

Euclidean Algorithm we can ﬁnd a(x), b(x) ∈ F[x] such that a(x)f (x) + b(x)m(x) =

1. Let us divide a(x) by m(x) with remainder: a(x) = q(x)m(x) + r(x) and substitute

into the previous equation. We will obtain

r(x)f(x) + (q(x)f(x) + b(x))m(x) = 1.

This means that r(x) ⊗ f(x) = 1 in F[x]/(m(x)), thus f(x) is invertible and r(x) is its inverse.

On the other hand, if m(x) is not irreducible, we can write m(x) = n(x)k(x) with both factors of degree smaller than deg(m), which will lead to n(x) ⊗ k(x) = 0 in F[x]/(m(x)). Then, having divisors of zero, by Lemma 1.4.2 F[x]/(m(x)) cannot be a field. □

From now on, we will not use the special symbols ⊕ and ⊗ to denote the operations in F[x]/(m(x)); this will invite no confusion.

Example 5.2.2 Prove that K = Z2[x]/(x^4 + x + 1) is a field, and determine how many elements it has. Then find (x^3 + x^2)^{−1}.

Solution To prove that K is a field we must prove that m(x) = x^4 + x + 1 is irreducible. If it were reducible, then it would have a factor of degree 1 or 2. Since m(0) = m(1) = 1, it does not have linear factors. So, if it is reducible, the only possibility left is that it is the square of the only irreducible polynomial of degree 2, that is (x^2 + x + 1)^2 = x^4 + x^2 + 1. This does not coincide with m(x), hence m(x) is irreducible. Hence K is a field. Since dim_{Z2} K = deg(m(x)) = 4, K has 2^4 = 16 elements.

By using the Extended Euclidean algorithm we get

x^4 + x + 1      1              0
x^3 + x^2        0              1
x^2 + x + 1      1              x + 1
x                x              x^2 + x + 1
1                x^2 + x + 1    x^3 + x

Thus (x^3 + x^2)^{−1} = x^3 + x. □

Example 5.2.3 Let us continue to investigate K = Z2[x]/(x^4 + x + 1) for a while. We know that, as a finite field, K must have a primitive element, in fact φ(15) = 8 of them. The polynomial x^4 + x + 1 is very convenient since x is one of the primitive elements of K. Let us compute powers of x and place all elements of K in the table below.


Note that x^15 = 1, so logs are manipulated mod 15. We now have two different representations of elements of K: as tuples (or polynomials) and as powers. The first representation is best for calculating additions and the second for calculating multiplications and inverses.

tuple   polynomial           power   log
0000    0                    —       —
1000    1                    1       0
0100    x                    x       1
0010    x^2                  x^2     2
0001    x^3                  x^3     3
1100    1 + x                x^4     4
0110    x + x^2              x^5     5
0011    x^2 + x^3            x^6     6
1101    1 + x + x^3          x^7     7
1010    1 + x^2              x^8     8
0101    x + x^3              x^9     9
1110    1 + x + x^2          x^10    10
0111    x + x^2 + x^3        x^11    11
1111    1 + x + x^2 + x^3    x^12    12
1011    1 + x^2 + x^3        x^13    13
1001    1 + x^3              x^14    14

Using the table we can, for example, compute:

1. (1 + x^2)^{−1} = (x^8)^{−1} = x^{−8} = x^{15−8} = x^7 = 1 + x + x^3.
2. log(x + x^2 + x^3) = 11 and log(1 + x + x^2 + x^3) = 12. Thus log((x + x^2 + x^3)(1 + x + x^2 + x^3)) = (11 + 12) mod 15 = 8, hence (x + x^2 + x^3)(1 + x + x^2 + x^3) = x^8 = 1 + x^2.
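Such a table can be generated by a short program. In the Python sketch below (our illustration) elements of K are encoded as 4-bit masks, bit i being the coefficient of x^i; repeated multiplication by x builds the power and log tables and reproduces computation 1.

```python
M = 0b10011              # m(x) = x^4 + x + 1

def gf16_mul(a, b):
    """Multiplication in Z_2[x]/(m(x)) on 4-bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:  # degree reached 4: replace x^4 by x + 1
            a ^= M
    return r

power, log = {}, {}      # exponent -> element, element -> exponent
e = 1
for k in range(15):
    power[k], log[e] = e, k
    e = gf16_mul(e, 0b0010)          # multiply by the primitive element x

print(power[4] == 0b0011)            # True: x^4 = 1 + x, as in the table
# Computation 1: (1 + x^2)^{-1} = x^{15 - log(1 + x^2)} = x^7 = 1 + x + x^3
print(bin(power[(15 - log[0b0101]) % 15]))   # 0b1011
```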

Theorem 5.2.1 allows us to construct a field of cardinality p^n for any prime p and any positive integer n. All we need to do is to take Zp and an irreducible polynomial m(x) of degree n. Then Zp[x]/(m(x)) is the desired field. In this book we will not prove that for any p and any positive integer n such a polynomial indeed exists (although it does!). Moreover, for any prime p and positive integer n the field of p^n elements is unique up to an isomorphism. This is why it is denoted GF(p^n) and called the Galois^3 field of cardinality p^n. Again, proving its uniqueness is beyond the scope of this book.

Theorem 5.2.2 For any prime p and any positive integer n there exists a unique, up to isomorphism, field GF(p^n) consisting of p^n elements.

In the Advanced Encryption Standard (AES) algorithm, adopted in 2001, the field GF(2^8) is used for calculations. This field is constructed with the use of the irreducible polynomial m(x) = x^8 + x^4 + x^3 + x + 1.
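A multiplication routine for this field fits in a few lines. The Python sketch below is an illustration (not AES reference code); it uses the same shift-and-reduce idea as above and checks the worked example {57} · {83} = {C1} given in the AES specification (FIPS-197).

```python
AES_MOD = 0b100011011   # x^8 + x^4 + x^3 + x + 1

def gf256_mul(a, b):
    """Multiply two elements of GF(2^8), each encoded as a byte
    (bit i = coefficient of x^i)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:          # degree reached 8: subtract the modulus
            a ^= AES_MOD
    return r

def gf256_inv(a):
    """Inverse of a non-zero byte: a^(2^8 - 2) = a^254, by repeated multiplication."""
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result

print(hex(gf256_mul(0x57, 0x83)))              # 0xc1
print(gf256_mul(0x57, gf256_inv(0x57)) == 1)   # True
```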

3 See Sect. 3.1.3 for a brief historical note about this mathematician.


Exercises

1 + x + x^2 + x^3 in F = Z2[x]/(x^5 + x^3 + 1).

2. Let F = Z3[x]/(x^2 + 2x + 2).

(a) Prove that F is a ﬁeld.

(b) List all elements of F.

(c) Show that 2x + 1 is a primitive element in F by calculating all powers of

2x + 1 and constructing the ‘logarithm table’ as in Example 5.2.3.

(d) Using the ‘logarithm table’ which you created in part (c), calculate

(e) How many primitive elements are there in the ﬁeld F? List them all.

3. (advanced) Let f(x) = a0 + a1 x + · · · + an x^n be a polynomial from F[x], where F is any field. We define the derivative of f(x) by the formula:

f′(x) = a1 + 2a2 x + · · · + n an x^{n−1}.

(a) Check that the product rule holds for such a derivative.
(b) Prove that any multiple root of f(x) is also a root of gcd(f(x), f′(x)).
(c) Let p be a prime. Prove that the polynomial f(x) = x^{p^n} − x does not have multiple roots in any field F of characteristic p.

Let F and K be two ﬁelds such that F ⊆ K. We say that F is a subﬁeld of K and that

K is an extension of F if the addition and multiplication in K, being restricted to F,

coincide with the operations in F of the same name.

For example, the elements 0 and 1 of Z2 are also the elements 0 and 1 of K = Z2[x]/(x^4 + x + 1). So Z2 is a subfield of K = Z2[x]/(x^4 + x + 1).

Let K be an extension of a field F and let a ∈ K. A polynomial f(t) ∈ F[t] is an annihilating polynomial of a if a is a root of f(t), i.e.,

f (a) = 0. (Please note that the coefﬁcients of f (t) lie in F while a is an element

of K.) A polynomial f (t) ∈ F[t] is called the minimal annihilating polynomial of a

over F if it is an annihilating polynomial which is monic and of minimal possible

degree.

Example 5.2.5 In the extension R ⊆ C, check that the polynomial f(t) = t^2 − 2t + 2 is the minimal annihilating polynomial for a = 1 + i over R.


Solution We check that f(a) = (1 + i)^2 − 2(1 + i) + 2 = 2i − 2 − 2i + 2 = 0, so f(t) is annihilating for a. At the same time there can be no linear annihilating polynomial. Such a polynomial would have real coefficients and hence would be of the form g(t) = t − r, where r ∈ R. Substituting a will give (1 + i) − r = 0, which is not possible. □

Note that every complex number has an annihilating polynomial over R which is at most quadratic.

In the extension Z2 ⊆ K = Z2[x]/(x^4 + x + 1), the polynomial f(t) = t^4 + t + 1 is the minimal annihilating polynomial for x.

Indeed, substituting x for t gives f(x) = x^4 + x + 1 = 0 in K, so f(t) is an annihilating polynomial for x. On the other hand, if it were possible to find an

annihilating polynomial of degree 3 or smaller, say g(t) = αt^3 + βt^2 + γt + δ·1 with at least one coefficient non-zero, then

αx^3 + βx^2 + γx + δ·1 = 0,

which means that 1, x, x^2, x^3 are linearly dependent over Z2. But this was a basis of Z2[x]/(x^4 + x + 1), so we have arrived at a contradiction. □

Let K be an extension of F such that dim_F K = n, and let a ∈ K. Then the minimal annihilating polynomial for a has degree at most n.

Proof Consider the n + 1 elements 1, a, a^2, . . . , a^n of K. Since the dimension of K over F is n, these n + 1 vectors must be linearly dependent over F. Thus there exist c0, c1, . . . , cn ∈ F, not all zero, such that

c0 · 1 + c1 a + c2 a^2 + · · · + cn a^n = 0.

This is the same as saying that f(a) = 0 for f(t) = c0 + c1 t + · · · + cn t^n from F[t], so we have found an annihilating polynomial of degree at most n. □

Let a be an element of an extension K of a field F. Then:

(i) The minimal annihilating polynomial of a is irreducible over F.
(ii) Every annihilating polynomial of a is a multiple of the minimal annihilating polynomial of a.

Proof (i) Suppose that f(t) is the minimal annihilating polynomial of a and that it is reducible, i.e., f(t) = g(t)h(t), where g(t) and h(t) can be considered monic and each of degree strictly less than deg(f). Then 0 = f(a) = g(a)h(a), whence (there are no zero divisors in K) either g(a) = 0 or h(a) = 0, which contradicts the minimality of f(t).


(ii) Suppose that f(t) is the minimal annihilating polynomial of a and g(t) is any other annihilating polynomial of a. Let us divide g(t) by f(t) with remainder: g(t) = q(t)f(t) + r(t). If r(t) were non-zero, then substituting a we would get 0 = g(a) = q(a)f(a) + r(a) = 0 + r(a), from which r(a) = 0; but the degree of r(t) is strictly smaller than that of f(t), and thus we have arrived at a contradiction. Hence r(t) = 0 and g(t) is a multiple of f(t). □

To calculate the minimal annihilating polynomial we use the Linear Dependency

Relationship Algorithm (see Appendix B for this algorithm). Suppose we need to

ﬁnd the minimal annihilating polynomial of an element a ∈ K over a subﬁeld F of

K. Suppose n = dimF K. We choose any basis B of K over F. Then every element

x ∈ K can be represented by its coordinate column [x]B relative to the basis B.

For an element a ∈ K we consider the matrix A = ([1]_B [a]_B [a^2]_B . . . [a^n]_B). Its columns are linearly dependent (as are any n + 1 vectors in an n-dimensional vector space). By row reducing A to its reduced row echelon form we find the first k such that {[1]_B, [a]_B, [a^2]_B, . . . , [a^k]_B} is linearly dependent. This reduced row echelon form will also give us coefficients c0, c1, c2, . . . , c_{k−1} such that c0[1]_B + c1[a]_B + c2[a^2]_B + · · · + c_{k−1}[a^{k−1}]_B + [a^k]_B = 0. Then f(t) = t^k + c_{k−1}t^{k−1} + · · · + c1 t + c0 is the minimal annihilating polynomial of a over F.

Example 5.2.7 In the extension Z2 ⊆ Z2[x]/(x^4 + x + 1), find the minimal annihilating polynomial of a = 1 + x + x^3.

Solution We calculate the coordinate tuples of the following powers of a:

a^0 = (1 + x + x^3)^0 = 1                  → 1000
a^1 = (1 + x + x^3)^1 = 1 + x + x^3        → 1101
a^2 = (1 + x + x^3)^2 = 1 + x^3            → 1001
a^3 = (1 + x + x^3)^3 = x^2 + x^3          → 0011
a^4 = (1 + x + x^3)^4 = 1 + x^2 + x^3      → 1011

These ﬁve are already linearly dependent, so we don’t have to compute any further

powers. Now we use the Linear Dependency Relationship Algorithm to find a linear

dependency between these tuples. We place them as columns in a matrix and take it

to the row reduced echelon form:

⎡ 1 1 1 0 1 ⎤          ⎡ 1 0 0 0 1 ⎤
⎢ 0 1 0 0 0 ⎥   rref   ⎢ 0 1 0 0 0 ⎥
⎢ 0 0 0 1 1 ⎥   −→     ⎢ 0 0 1 0 0 ⎥
⎣ 0 1 1 1 1 ⎦          ⎣ 0 0 0 1 1 ⎦

from which we see that 1, a, a^2, a^3 are linearly independent (so a is not annihilated by any non-zero polynomial of degree ≤ 3) and that a^4 = 1 + a^3, whence the minimal annihilating polynomial of a will be f(t) = t^4 + t^3 + 1.
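The search can also be done by brute force over the finitely many monic candidates. The Python sketch below (our illustration; 4-bit masks encode elements of Z2[x]/(x^4 + x + 1)) confirms the result of Example 5.2.7.

```python
M = 0b10011   # x^4 + x + 1; bit i is the coefficient of x^i

def mul(a, b):
    """Multiplication in Z_2[x]/(M) on 4-bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= M
    return r

def evaluate(coeffs, a):
    """Evaluate sum coeffs[i] * a^i in the field; each coeffs[i] is 0 or 1."""
    value, power = 0, 1
    for c in coeffs:
        if c:
            value ^= power
        power = mul(power, a)
    return value

def minimal_polynomial(a):
    """Lowest-degree monic polynomial over Z_2 annihilating a."""
    for deg in range(1, 5):
        for mask in range(1 << deg):          # choose c_0, ..., c_{deg-1}
            coeffs = [(mask >> i) & 1 for i in range(deg)] + [1]   # monic
            if evaluate(coeffs, a) == 0:
                return coeffs
    return None

print(minimal_polynomial(0b1011))   # [1, 0, 0, 1, 1], i.e. t^4 + t^3 + 1
```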


Exercises

1. What is the dimension of the field F = GF(2^4) over its subfield F1 = GF(2^2)?
2. Let K = Z2[x]/(1 + x + x^4) as introduced in Example 5.2.3. Find the minimal annihilating polynomial over Z2 for:

(a) α = 1 + x + x^2;
(b) α = 1 + x.

3. Let K be the field K = Z2[x]/(x^4 + x^3 + 1). Then K is an extension of Z2.

(a) Create a table for K as in Example 5.2.3. Check that x is a primitive element of this field.
(b) Find the minimal annihilating polynomials for x, x^3 and x^5 over Z2.
(c) Calculate (x^100 + x + 1)(x^3 + x^2 + x + 1)^{15} + x^3 + x + 1 in the most efficient way and represent it as a power of x and as a polynomial in x of degree at most 3.

4. Generate a ﬁeld consisting of 16 elements using GAP. It will give you:

gap> F:=GaloisField(2ˆ4);

GF(2ˆ4)

gap> AsList(F);

[ 0*Z(2), Z(2)ˆ0, Z(2ˆ2), Z(2ˆ2)ˆ2, Z(2ˆ4), Z(2ˆ4)ˆ2, Z(2ˆ4)ˆ3, Z(2ˆ4)ˆ4,

Z(2ˆ4)ˆ6, Z(2ˆ4)ˆ7, Z(2ˆ4)ˆ8, Z(2ˆ4)ˆ9, Z(2ˆ4)ˆ11, Z(2ˆ4)ˆ12, Z(2ˆ4)ˆ13,

Z(2ˆ4)ˆ14 ]

(a) Explain why Z(2ˆ4)ˆ5 and Z(2ˆ4)ˆ10 are not listed among the elements.
(b) Using GAP find the polynomial in Z2[x] of smallest degree of which Z(2ˆ4)ˆ7 is a root.

Chapter 6

Secret Sharing

The very word “secrecy” is repugnant in a free and open society; and we are as a people inherently and historically opposed to secret societies, to secret oaths, and to secret proceedings.

John F. Kennedy (1917–1963)

Secrecy is the ﬁrst essential in affairs of state.

Cardinal Richelieu (1585–1642)

Certain cryptographic keys, such as missile launch codes, numbered bank accounts

and the secret decoding exponent in an RSA public key cryptosystem, are so impor-

tant that they present a dilemma. If too many copies are distributed, one may be leaked.

If too few, they might all be lost or accidentally destroyed. Secret sharing schemes

invented by Shamir [1] and Blakley [2] address this problem, and allow arbitrarily

high levels of conﬁdentiality and reliability to be achieved. A secret sharing scheme

‘divides’ the secret s into ‘shares’—one for every user—in such a way that s can be easily reconstructed by any authorised subset of users, but an unauthorised subset

of users can extract absolutely no information about s. A secret sharing scheme, for

example, can secure a secret over multiple servers and remain recoverable despite

multiple server failures.

Secret sharing also provides a mechanism to facilitate a cooperation—in both

human and artiﬁcial societies—when cooperating agents have different status with

respect to the activity and certain actions are only allowed to coalitions that satisfy

certain criteria, e.g., to sufﬁciently large coalitions or coalitions with players of

sufﬁcient seniority or to coalitions that satisfy a combination of both criteria. The

banking system where the employees are arranged into a hierarchy according to their

ranks or designations provides many examples. Simmons,1 for example, describes

the situation of a money transfer from one bank to another. If the sum to be transferred

is sufﬁciently large this transaction must be authorised by three senior tellers or two

vice-presidents. However, two senior tellers and a vice-president can also authorise

the transaction. Tassa2 provides another banking scenario. The shares of the vault

1 Simmons, G. (1990). How to (really) share a secret. In: Proceedings of the 8th annual international

2 Tassa,T. (2007). Hierarchical threshold secret sharing. Journal of Cryptology, 20, 237–264.



key may be distributed among bank employees, some of whom are tellers and some

are department managers. The bank policy could require the presence of, say, three

employees in opening the vault, but at least one of them must be a departmental

manager.

More formally, we assume that the set of users is U = {1, 2, . . . , n} and D is the

dealer who facilitates secret sharing.3 It is always assumed that the dealer knows the

secret.

Definition 6.1.1 Let 2^U be the power set^4 of the set of all users U. The set Γ ⊆ 2^U of all authorised coalitions is called the access structure of the secret-sharing scheme.

An access structure may be any subset of 2^U such that

C ∈ Γ and C ⊆ C′ together imply C′ ∈ Γ.    (6.1)

This is the monotone property; it reflects the natural requirement that if a smaller coalition knows the secret, then the larger one will know it too. The access structure is public knowledge and all users know it.

Let Γ ⊆ 2^U be an access structure. A coalition C ⊆ U is called a minimal authorised coalition if it is authorised and any proper subset of C is not authorised. Due to the monotone property (6.1) the access structure is completely defined by the set Γ_min of its minimal authorised coalitions.

We assume that every user participates in at least one minimal authorised coalition.

If not, such a user never brings useful information to any coalition of users and is

redundant.

Example 6.1.1 The threshold access structure “k-out-of-n” consists of all subsets

of 2U consisting of k or more users.

According to Time Magazine, May 4, 1992, a typical threshold access structure

was realized in USSR. The three top state ofﬁcials, the President, the Prime Minister,

and the Minister of Defense, each had the so-called “nuclear suitcase” and any two

of them could authorise a launch of a nuclear warhead. No one of them could do it

alone. So it was a two-out-of-three threshold scheme.

In a two-out-of-three scheme U = {1, 2, 3} and Γ_min = {{1, 2}, {1, 3}, {2, 3}}.

We see that all users are equally important. If, however, U = {1, 2, 3} and

4 The set of all subsets of U .


Γ_min = {{1, 2}, {1, 3}}, then user 1 is much more important than the two other

users. Without user 1 the secret cannot be accessed. But user 1 is not almighty. To

access the secret she needs to join forces with at least one other user.

Here are a couple of real life examples.

Example 6.1.2 Consider the situation of a money transfer from one bank to another.

If the sum to be transferred is sufﬁciently large this transaction must be authorised

by three senior tellers or two vice-presidents. However, two senior tellers and a

vice-president can also authorise the transaction.

Example 6.1.3 The United Nations Security Council consists of ﬁve permanent

members and 10 non-permanent members. The passage of a resolution requires

that all ﬁve permanent members vote for it, and also at least nine members in total.

We will deal with threshold access structures ﬁrst. A very elegant construction by

Shamir realising the threshold access structure is based on Lagrange’s interpolation

polynomial and will be presented in the next section.

Exercises

1. Let U = {1, 2, 3, 4} and Γ_min = {{1, 2, 3}, {3, 4}}. List all authorised coalitions.

2. Write down the minimal authorised coalitions for the access structure in

Example 6.1.2. Assume that the vice-presidents are users 1 and 2 and the senior

tellers are users 3, 4, 5.

3. Find the number of minimal authorised coalitions in Example 6.1.3.

4. Let U1 and U2 be disjoint sets of users and let Γ1 and Γ2 be access structures over
U1 and U2, respectively. Let U = U1 ∪ U2. Then
(a) The sum of Γ1 and Γ2 is Γ1 + Γ2 = {X ⊆ U | X ∩ U1 ∈ Γ1 or X ∩ U2 ∈ Γ2}.
Prove that Γ1 + Γ2 is an access structure.
(b) The product of Γ1 and Γ2 is Γ1 × Γ2 = {X ⊆ U | X ∩ U1 ∈ Γ1 and
X ∩ U2 ∈ Γ2}. Prove that Γ1 × Γ2 is an access structure.

5. Let Γ be an access structure over a set of users U and let us define the dual
structure of Γ as the set of complements of all unauthorised coalitions, i.e.,

Γ* = {X ⊆ U | X^c ∉ Γ}.

Prove that Γ* is also an access structure.

We now turn to an application of Lagrange's interpolation polynomial to cryptography, namely to secret sharing.

Suppose that the secret is a string of zeros and ones. We may assume that it is

the binary representation of a positive integer s. We choose a prime p which is


sufﬁciently large. Then the ﬁeld Z p is large and we may assume that s ∈ Z p without

any danger that it can be easily guessed. Thus our secret will always be an element

of a ﬁnite ﬁeld.

Suppose n users wish to share this secret by dividing it into ‘pieces’ in such a way

that any k people, where k is a ﬁxed positive integer not exceeding n, can learn the

secret from their pieces, but no subset of less than k people can do so. Here the word

“dividing” must not be understood literally. Shamir proposed the following elegant

solution to this problem. The secret can be “divided into pieces” as follows. The

centre:

1. generates k random coefficients t0, t1, . . . , tk−1 ∈ Zp and sets the secret s to be t0;
2. forms the polynomial p(x) = t0 + t1 x + · · · + tk−1 x^{k−1} ∈ Zp[x];
3. gives user i the “piece” p(i), for i = 1, . . . , n. Practically it can be an electronic
card where a pair of numbers (i, p(i)) is stored.

Now, given any k values for p(x), one can use Theorem 5.1.2 to interpolate and to ﬁnd

all coefﬁcients of p(x) including the secret t0 = s. However, due to Corollary 5.1.1,

a subset of k−1 values for p(x) provides absolutely no information about s, since for

any possible s there is a polynomial of degree k−1 consistent with the given values

and the possible value of s.
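The sharing and recovery procedure just described can be sketched in code. This is an illustrative sketch only; the function names and the choice of prime are ours, not the book's:

```python
# Sketch of Shamir's k-out-of-n scheme over Z_p (illustrative names).
import random

def make_shares(secret, k, n, p):
    """Dealer: random polynomial of degree < k with constant term = secret."""
    coeffs = [secret] + [random.randrange(p) for _ in range(k - 1)]
    poly = lambda x: sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
    return [(i, poly(i)) for i in range(1, n + 1)]

def reconstruct(shares, p):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

p = 2**31 - 1                      # a Mersenne prime, large enough here
shares = make_shares(12345, 3, 5, p)
print(reconstruct(shares[:3], p))  # any 3 of the 5 shares suffice: prints 12345
```

Any subset of k shares reconstructs the same constant term, while fewer shares leave every secret equally possible, exactly as Corollary 5.1.1 asserts.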

Example 6.1.4 The company Dodgy Dealings Inc. has four directors. According to

a clause in the company’s constitution any three of them are allowed to get access

to the company’s secret offshore account. The company set up a Shamir’s threshold

access secret sharing scheme for facilitating this clause with the secret password

being an element of Z7 . According to this scheme the system administrator issued

magnetic cards to the directors as required.

Suppose that three directors with the following magnetic cards

director  1  2  4
card      3  0  6

gathered to make a withdrawal from their offshore account. Show how the secret

password can be calculated.

Solution. A quadratic polynomial p(x) = t0 + t1 x + t2 x^2 ∈ Z7 [x] satisfies
p(1) = 3, p(2) = 0 and p(4) = 6. By Lagrange's interpolation formula,

p(x) = 3 · (x − 2)(x − 4) / ((1 − 2)(1 − 4)) + 6 · (x − 1)(x − 2) / ((4 − 1)(4 − 2))
     = (x^2 + x + 1) + (x^2 + 4x + 2) = 2x^2 + 5x + 3,

so the secret password is t0 = p(0) = 3.
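The interpolation in this example is easy to check by machine. A small sketch, where the helper name interpolate_at is our own:

```python
# Check of Example 6.1.4 over Z7: the polynomial through (1,3), (2,0), (4,6)
# is 2x^2 + 5x + 3, so the secret is its value at 0.
p = 7
cards = [(1, 3), (2, 0), (4, 6)]

def interpolate_at(x, points, p):
    """Evaluate the Lagrange interpolation polynomial at x, mod p."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

assert [interpolate_at(x, cards, p) for x in (1, 2, 4)] == [3, 0, 6]
print(interpolate_at(0, cards, p))   # the secret password: prints 3
```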


If in Shamir’s scheme the enumeration of users is publicly known, then only the

value p(i) must be given to the ith user. In this case the secret s and each share p(i)

are both an element of the same ﬁeld and need the same number of binary digits to

encode them. As we will see one cannot do any better.

Exercises

1. According to the 3-out-of-4 Shamir’s threshold secret sharing scheme the admin-

istrator issued electronic cards to the users:

user  1  2  3  4
card  4  4  x  0

(b) Find x and determine the card of user 3.

2. Shamir’s secret sharing scheme is set up so that the secret is an element of Z31

and the threshold is 3 which means that any three users are authorised. Show how

the secret can be reconstructed from the shares

user   1   5   7
share  16  7   22

3. The league club Crawlers United has six senior board members. Each year the

club holds an anniversary day, and on this day the senior board members have a

duty to open the club vault, take out the club’s meager collection of trophies, and

put them on display. According to a clause in the club’s constitution any four of

them are allowed to open the vault. The club set up a Shamir’s threshold access

secret sharing scheme for facilitating this clause with the secret password being

an element of Z97 . According to this scheme the administrator issued electronic

cards to the senior board members as required.

Suppose that four senior board members are gathered to open the vault with the

following cards:

member  1   2   4   6
card    56  40  22  34

(b) Guess which cards were given to the two remaining senior board members.


Let us see now how we can deﬁne a secret sharing scheme formally.

Let S0 , S1 , . . . , Sn be ﬁnite sets where S0 will be interpreted as a set of all possible

secrets and Si will be interpreted as a set of all possible shares that can be given to

user i. Suppose |Si | = m i . We may think of a very large table, consisting of up to

M = m 0 m 1 · · · m n rows, where each row contains a tuple

(s0 , s1 , . . . , sn ), (6.2)

where si comes from Si (and all rows are distinct). Mathematically, the set of all such

(n + 1)-tuples is denoted by the Cartesian product S0 × S1 × . . . × Sn . Any subset

T ⊆ S0 × S1 × . . . × Sn

will be called a distribution table; it can be viewed as a table with rows like the one shown in (6.2). If a secret s0 ∈ S0 is to be distributed among users,

then one (n + 1)-tuple

(s0 , s1 , . . . , sn ) ∈ T

is chosen by the dealer from T at random uniformly among those tuples whose ﬁrst

coordinate is s0 . Then user i gets the share si ∈ Si .

There is one more, but essential, component of a secret sharing scheme that we
have not introduced yet. We must ensure that every authorised coalition is
able to recover the secret. Thus we need to have, for every authorised coalition

X = {i1, i2, . . . , ik} ∈ Γ, a secret recovery function (algorithm)

fX : Si1 × Si2 × · · · × Sik → S0

with the property that fX(si1, si2, . . . , sik) = s0 for every (s0, s1, s2, . . . , sn) ∈ T.

In particular, the distribution table cannot contain tuples (s, . . . , si1, . . . , si2, . . . , sik, . . .) with s ≠ s0.


Example 6.2.1 Consider a secret sharing scheme realising the access structure with
Γmin = {{1, 2}, {1, 3}}, with Si = Z3 for i = 0, 1, 2, 3 and the distribution table

        D 1 2 3
      ⎡ 0 0 0 0 ⎤
      ⎢ 1 1 1 2 ⎥
      ⎢ 0 1 2 1 ⎥
      ⎢ 1 2 0 0 ⎥
  T = ⎢ 2 2 2 1 ⎥ .        (6.3)
      ⎢ 0 2 1 2 ⎥
      ⎢ 2 1 0 0 ⎥
      ⎢ 1 0 2 1 ⎥
      ⎣ 2 0 1 2 ⎦

The two secret recovery functions s0 = f{1,2}(s1, s2) and s0 = f{1,3}(s1, s3) can be
given by the tables

  s1 s2 s0        s1 s3 s0
  0  0  0         0  0  0
  1  0  2         1  0  2
  0  1  2         0  1  1
  1  1  1         1  1  0
  0  2  1         0  2  2
  2  0  1         2  0  1
  1  2  0         1  2  1
  2  1  0         2  1  2
  2  2  2         2  2  0

respectively. Note that the function f {2,3} does not exist. Indeed, when (s2 , s3 ) =

(0, 0) the secret s0 can take values 0, 1, 2 so f {2,3} (0, 0) is not deﬁned.
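The existence or non-existence of a recovery function can be tested mechanically by scanning the distribution table. A sketch, with the table of this example entered as Python tuples (the helper name recovery_function is ours):

```python
# Distribution table of Example 6.2.1, rows as (s0, s1, s2, s3). A recovery
# function for a coalition X exists iff the X-columns determine column 0.
T = [(0,0,0,0), (1,1,1,2), (0,1,2,1), (1,2,0,0), (2,2,2,1),
     (0,2,1,2), (2,1,0,0), (1,0,2,1), (2,0,1,2)]

def recovery_function(T, coalition):
    """Return the map {shares -> secret} if it is well defined, else None."""
    f = {}
    for row in T:
        key = tuple(row[i] for i in coalition)
        if f.setdefault(key, row[0]) != row[0]:
            return None          # same shares appear with different secrets
    return f

print(recovery_function(T, (1, 2)) is not None)   # {1,2} is authorised: True
print(recovery_function(T, (2, 3)))               # {2,3} cannot recover: None
```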

Example 6.2.2 (n-out-of-n scheme) Let us design a secret sharing scheme with n

users such that the only authorised coalition is the grand coalition, that is the set

U = {1, 2, . . . , n}. We need a sufﬁciently large ﬁeld F and set S0 = F so that it is

infeasible to try all secrets one by one. We will also have Si = F for all i = 1, . . . , n.

To share a secret s ∈ F the dealer generates n−1 random elements s1 , s2 , . . . , sn−1

∈ F and calculates sn = s − (s1 + · · · + sn−1 ). Then he gives share si to user i.

The distribution table T will consist of all (n + 1)-tuples (s0, s1, s2, . . . , sn) such that
s1 + s2 + · · · + sn = s0, and the secret recovery function (in this case the only one) will be

fU(s1, s2, . . . , sn) = s1 + s2 + · · · + sn.
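This additive scheme is short enough to sketch directly (the function names and the modulus are our illustrative choices):

```python
# Sketch of the n-out-of-n scheme of Example 6.2.2 over Z_p: the first n-1
# shares are random, the last is chosen so that all n shares sum to the secret.
import random

def share_all_or_nothing(secret, n, p):
    shares = [random.randrange(p) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % p)
    return shares

def recover(shares, p):
    return sum(shares) % p

p = 101
shares = share_all_or_nothing(57, 4, p)
print(recover(shares, p))        # prints 57
```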

The distribution table is convenient for deﬁning the secret sharing scheme, how-

ever, in practical applications it is usually huge, so schemes are normally deﬁned

differently.


Definition 6.2.1 A secret sharing scheme is called perfect if for every non-authorised
coalition of users { j1, j2, . . . , jm} ⊂ U, for every sequence of shares s j1, s j2, . . . , s jm
with s jr ∈ S jr, and for every two possible secrets s, s′ ∈ S0 the distribution table T
contains as many tuples (s, . . . , s j1, s j2, . . . , s jm, . . .) as tuples (s′, . . . , s j1, s j2, . . . , s jm, . . .).

In other words, if the scheme is perfect a non-authorised coalition

X = { j1, j2, . . . , jm} with shares s j1, s j2, . . . , s jm will have no reason to believe
that the secret s was more likely to be chosen than any other secret s′. For example,

in Example 6.2.1 if users 2 and 3 have shares 2 and 1, respectively, they will observe

the following rows of T

D 1 2 3

0 1 2 1

2 2 2 1

1 0 2 1

and will be unable to determine which row was chosen by the dealer. So the scheme

in that example is perfect.

The scheme from Example 6.2.2 is obviously perfect. Let us have another look at

the perfect secret sharing scheme invented by Shamir and specify the secret recovery

functions.

Example 6.2.3 ([1]) Suppose that we have n users and the access structure is now

Γ = {X ⊆ U | |X| ≥ k}, i.e., a coalition is authorised if it contains at least k users.

Let F be a large ﬁnite ﬁeld and put Si = F for i = 0, 1, . . . , n. Let a1 , a2 , . . . , an

be distinct ﬁxed publicly known nonzero elements of F (in the earlier example we

took ai = i).

Suppose s ∈ F is the secret to share. The dealer randomly generates t1, . . . , tk−1 ∈ F,
sets t0 = s, and forms the polynomial

p(x) = t0 + t1 x + · · · + tk−1 x^{k−1}.

Then she gives the share si = p(ai) to user i. Note that s = p(0).

Suppose now X = {i 1 , i 2 , . . . , i k } is a minimal authorised coalition. Then the

secret recovery function is

f_X(si1, si2, . . . , sik) = Σ_{r=1}^{k} sir · [(−ai1) · · · ^(−air) · · · (−aik)] / [(air − ai1) · · · ^(air − air) · · · (air − aik)],

where the hat over a term indicates its omission. This is the value at zero of
Lagrange's interpolation polynomial

Σ_{r=1}^{k} p(air) · [(x − ai1) · · · ^(x − air) · · · (x − aik)] / [(air − ai1) · · · ^(air − air) · · · (air − aik)].


We now may use the idea in Example 6.2.2 to construct a perfect secret sharing

scheme for an arbitrary access structure Γ. We will illustrate this method in the
following example.

Example 6.2.4 Let U = {1, 2, 3, 4} and Γmin = {{1, 2}, {2, 3}, {3, 4}}. Let s ∈ Zp

be a secret. Firstly we consider three coalitions of users {1, 2}, {2, 3} and {3, 4}

separately and build 2-out-of-2 schemes on each of these sets of users. Under the

ﬁrst scheme users 1 and 2 will get shares a and s − a, under the second scheme users

2 and 3 get shares b and s − b and under the third scheme users 3 and 4 get shares c

and s − c. Thus altogether users will get the following shares:

1 ← a,

2 ← (s − a, b),

3 ← (s − b, c),

4 ← s − c.

Let us show that this scheme is perfect. For this we have to consider every maximal

non-authorised coalition and show that it has no clue about the secret. It is easy

to see that every coalition of three or more players is authorised. So the maximal

non-authorised coalitions will be {1, 3}, {1, 4}, {2, 4}. The coalition {1, 3} will know

values a, s − b and c. Since a, b, c were chosen randomly and independently, a, s − b

and c are also three random independent values which contain no information about

s. Similarly for {1, 4} and {2, 4}. Note that under this scheme users 2 and 3 will have

to hold as their shares two elements of Z p each. Their shares will be twice as long

as the secret (in binary representation).
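The dealing procedure of this example can be sketched as follows (the function name deal and the modulus are our illustrative choices):

```python
# Shares for Example 6.2.4 (minimal coalitions {1,2}, {2,3}, {3,4}): three
# independent 2-out-of-2 schemes glued together; users 2 and 3 hold two
# field elements each, so their shares are twice as long as the secret.
import random

def deal(secret, p):
    a, b, c = (random.randrange(p) for _ in range(3))
    return {1: (a,),
            2: ((secret - a) % p, b),
            3: ((secret - b) % p, c),
            4: ((secret - c) % p,)}

p, s = 101, 88
shares = deal(s, p)
# each minimal coalition recovers the secret from one 2-out-of-2 pair:
print((shares[1][0] + shares[2][0]) % p)   # {1,2}: a + (s-a) = 88
print((shares[2][1] + shares[3][0]) % p)   # {2,3}: b + (s-b) = 88
print((shares[3][1] + shares[4][0]) % p)   # {3,4}: c + (s-c) = 88
```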

Theorem 6.2.1 For any access structure Γ there exists a perfect secret sharing
scheme which realises it.

Sketch of the proof. Let us consider the set Γmin of all minimal authorised coalitions.
Suppose a user i belongs to q minimal authorised coalitions W1, W2, . . . , Wq whose
cardinalities are m1, m2, . . . , mq. We then consider q separate smaller access structures,
where the jth one is defined on the set of users Wj and is an mj-out-of-mj
access structure. Let sj be the share received by user i in the jth reduced access structure.
So, in total, user i receives the vector of shares (s1, s2, . . . , sq). As the access
structure is public knowledge, user i will use the share sj only when an authorised
coalition with his participation contains Wj. If a coalition is not authorised, then

it does not contain any of the W1 , W2 , . . . , Wq and it is possible to show that its

participants cannot get any information about the secret. �

Under this method if a user belongs to k minimal authorised coalitions, then she

will receive k elements of the ﬁeld to hold as her share.

Suppose 2^{d−1} ≤ |S0| < 2^d or, equivalently, ⌈log2 |S0|⌉ = d. Then we can encode elements of
S0 (secrets) using binary strings of length d. In this case we say that the length of


the secret is d. Similarly we can talk about the length of the share that user i has
received. We say that the information ratio of the secret sharing scheme S is

i(S) = max_{1 ≤ i ≤ n} log2 |Si| / log2 |S0|.

This number is the maximal ratio of the amount of information that must be conveyed

to a participating user to the amount of information that is contained in the secret.

In the secret sharing literature it is also common to use the term information rate,

which is the inverse of the information ratio. The information ratio of the scheme

constructed in Theorem 6.2.1 is terrible. For example, for the (n/2 + 1)-out-of-n
scheme (assume that n is even) every user belongs to C(n, n/2) authorised coalitions,
which by Stirling's formula grows approximately as 2^n/√n. More precisely, we will have

i(S) ∼ √(2/π) · 2^n/√n,

i.e., the information ratio of such a scheme grows exponentially with n. We know we

can do much better: the information ratio of Shamir’s scheme is 1. However, for

some access structures the information ratio can be large. It is not known exactly

how large it can be.
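The exponential growth claimed above is easy to observe numerically; a small sketch comparing the binomial coefficient C(n, n/2) with its Stirling estimate:

```python
# The binomial coefficient C(n, n/2) against its Stirling approximation
# sqrt(2/pi) * 2**n / sqrt(n), illustrating the exponential growth of the
# information ratio of the generic construction.
from math import comb, sqrt, pi

for n in (10, 20, 30):
    exact = comb(n, n // 2)
    estimate = sqrt(2 / pi) * 2**n / sqrt(n)
    print(n, exact, round(estimate))
```

The relative error of the estimate shrinks as n grows, while both quantities explode exponentially.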

Exercises

1. Consider the secret sharing scheme with the following distribution table.

s0 s1 s2 s3 s4 s5 s6

0 0 0 1 1 2 2

0 0 0 2 2 1 1

0 1 1 2 2 0 0

0 1 1 0 0 2 2

0 2 2 0 0 1 1

0 2 2 1 1 0 0

1 0 1 1 2 2 0

1 0 2 2 1 1 0

1 1 2 2 0 0 1

1 1 0 0 2 2 1

1 2 0 0 1 1 2

1 2 1 1 0 0 2

(a) What is the domain of the secrets? What are the domains of the shares?

(b) Show that the coalition of users {1, 2} is authorised but {1, 3, 5} is not.

(c) Give the table for the secret recovery function for the coalition {1, 2}.


Let us look at Shamir’s scheme from a different perspective. We can observe that the

vector of the shares (where we think that the secret is the share of the dealer) can be

obtained by the following matrix multiplication as

⎡ 1  0    0     . . .  0        ⎤ ⎡  t0  ⎤   ⎡ p(0)  ⎤   ⎡ s0 ⎤
⎢ 1  a1   a1^2  . . .  a1^{k−1} ⎥ ⎢  t1  ⎥   ⎢ p(a1) ⎥   ⎢ s1 ⎥
⎢ 1  a2   a2^2  . . .  a2^{k−1} ⎥ ⎢  ..  ⎥ = ⎢  ..   ⎥ = ⎢ .. ⎥ ,   (6.5)
⎢ ..      ..    . . .  ..       ⎥ ⎣ tk−1 ⎦   ⎣ p(an) ⎦   ⎣ sn ⎦
⎣ 1  an   an^2  . . .  an^{k−1} ⎦

Since the elements a1, a2, . . . , an are all different and nonzero, any k rows of the matrix in (6.5) are linearly independent

since the determinant of the matrix formed by these rows is the well-known Van-

dermonde determinant (10.3). This is why any k users can learn all the coefﬁcients

t0 , t1 , t2 , . . . , tk−1 of p(x), including its constant term t0 (which is the secret).

Let us write (6.5) in the matrix form as H t = s, where

     ⎡ 1  0   0    . . .  0        ⎤        ⎡ t0   ⎤        ⎡ s0 ⎤
     ⎢ 1  a1  a1^2 . . .  a1^{k−1} ⎥        ⎢ t1   ⎥        ⎢ s1 ⎥
H =  ⎢ 1  a2  a2^2 . . .  a2^{k−1} ⎥ ,  t = ⎢ ..   ⎥ ,  s = ⎢ .. ⎥ ,   (6.6)
     ⎢ ..     ..   . . .  ..       ⎥        ⎣ tk−1 ⎦        ⎣ sn ⎦
     ⎣ 1  an  an^2 . . .  an^{k−1} ⎦

and denote the rows of H as h0 , h1 , h2 , . . . , hn . Then the following is true: the span

of a group of distinct rows {hi1 , hi2 , . . . , hir }, none of which is h0 , contains h0 if and

only if r ≥ k. We may now define the k-out-of-n access structure as follows:

ΓH = {X ⊆ U | h0 ∈ span{hi | i ∈ X}}.   (6.7)

This can be generalised by considering matrices H other than the one in (6.6).

Theorem 6.2.2 Let H be an (n + 1) × k matrix with coefficients in a finite field F and let h0, h1, . . . , hn be the rows of H.

Let us deﬁne a secret sharing scheme on the set of users U = {1, 2, . . . , n} as

follows. Choose the coefﬁcients of vector t = (t0 , t1 , . . . , tk−1 ) randomly, calculate

the vector s = (s0 , s1 , . . . , sn ) from the equation H t = s, declare s0 to be the secret

and s1 , s2 , . . . , sn the shares of users 1, 2, . . . , n, respectively. Then this is a perfect

secret sharing scheme realising the access structure ΓH defined as in (6.7).


Proof Suppose first that h0 is in the span of {hi1, hi2, . . . , hir}, so that h0 =
λ1hi1 + λ2hi2 + · · · + λrhir for some λ1, . . . , λr ∈ F. Multiplying both sides of this equation by t we obtain s0 = λ1si1 +
λ2si2 + · · · + λrsir, hence the secret s0 can be calculated from the shares of users
i1, i2, . . . , ir.

Suppose now that h0 is not in the span of {hi1 , hi2 , . . . , hir }. Without loss of gener-

ality we may assume that i 1 = 1, . . . , ir = r , i.e., that there are users 1, 2, . . . , r with

their shares s1 , s2 , . . . , sr and that h0 is not a linear combination of the h1 , h2 , . . . , hr .

Let Hr be the matrix with rows h1, h2, . . . , hr and H̄r be the matrix with rows
h0, h1, . . . , hr. By the assumption we have rank(H̄r) = rank(Hr) + 1.

Let sr be the column vector with entries s1 , s2 , . . . , sr and s̄r be the column vector

with entries s0 , s1 , s2 , . . . , sr . Since the system

Hr t = sr (6.8)

is consistent (the dealer's vector t is one of its solutions), we have rank(Hr | sr) = rank(Hr), where (Hr | sr) is the augmented
matrix of the system (6.8). As the matrix (H̄r | s̄r) is obtained by adding just one
row to (Hr | sr), its rank is either the same or larger by 1. On the other hand, it is
not smaller than the rank of H̄r. Since rank(H̄r) = rank(Hr) + 1, it will be true
that rank(H̄r | s̄r) = rank(H̄r) and the system H̄r t = s̄r is consistent for every s0. Since

the dimension, and hence the cardinality (remember F is finite), of the solution set is
determined by the rank of H̄r only, we will have the same number of solutions to the
equation H̄r t = s̄r no matter what s0 was. So members of the coalition {i1, i2, . . . , ir}

will be unable to identify s0 , hence this coalition is not authorised. �
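The recipe of the theorem can be sketched with a small Vandermonde-style matrix; this concrete H, the prime, and the recovery coefficients are our illustrative choices:

```python
# Sketch of the recipe of Theorem 6.2.2 with a 4 x 2 matrix over Z_p whose
# rows are h0 = (1, 0) and hi = (1, i); it realises the 2-out-of-3
# threshold structure.
import random

p = 101
H = [(1, 0), (1, 1), (1, 2), (1, 3)]

t = [random.randrange(p) for _ in range(2)]          # dealer's random vector
s = [(h[0] * t[0] + h[1] * t[1]) % p for h in H]     # s = H t
secret, s1, s2, s3 = s

# Since h0 = 2*h1 - h2, users 1 and 2 recover the secret as 2*s1 - s2 (mod p):
print((2 * s1 - s2) % p == secret)   # prints True
```

The recovery coefficients are exactly the λj of the proof: any expression of h0 as a linear combination of the users' rows gives the same combination of their shares.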

Example 6.2.5 Let U = {1, 2, 3} and Γmin = {{1, 2}, {1, 3}}. We can realise this
access structure by a linear scheme. Consider the matrix

    ⎡ 1  0 ⎤
    ⎢ 1  1 ⎥
H = ⎢ 1 −1 ⎥ .
    ⎣ 2 −2 ⎦

The dealer may choose two random elements t0 , t1 from a ﬁeld Z p for some large

prime p and calculate

⎡ s0 ⎤
⎢ s1 ⎥     ⎡ t0 ⎤
⎢ s2 ⎥ = H ⎣ t1 ⎦ ,
⎣ s3 ⎦

where s0 is taken as the secret and s1 , s2 and s3 are given as shares to users 1, 2 and

3, respectively. (Note that s0 = t0 .) If users 1 and 2 come together they can ﬁnd t0

and t1 from the system of linear equations

⎡ 1  1 ⎤ ⎡ t0 ⎤   ⎡ s1 ⎤
⎣ 1 −1 ⎦ ⎣ t1 ⎦ = ⎣ s2 ⎦


because the determinant of this system is nonzero. Similarly, 1 and 3 can also do

this. But, if 2 and 3 come together, they will face the system

⎡ 1 −1 ⎤ ⎡ t0 ⎤   ⎡ s2 ⎤
⎣ 2 −2 ⎦ ⎣ t1 ⎦ = ⎣ s3 ⎦ ,

which has exactly p solutions. Their shares therefore provide them with no informa-

tion about t0 and hence s0 .
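The last claim can be confirmed by brute force for a small prime; p = 11 and the dealer's vector below are our illustrative choices:

```python
# Brute-force check over Z_11: the system seen by users 2 and 3 of
# Example 6.2.5, with rows h2 = (1, -1) and h3 = (2, -2), has exactly p
# solutions (t0, t1), one for each possible value of the secret t0.
p = 11
t = (4, 9)                          # the dealer's hidden vector (our choice)
s2 = (t[0] - t[1]) % p              # share of user 2
s3 = (2 * t[0] - 2 * t[1]) % p      # share of user 3

solutions = [(a, b) for a in range(p) for b in range(p)
             if (a - b) % p == s2 and (2 * a - 2 * b) % p == s3]

print(len(solutions))               # prints 11: one solution per candidate secret
print(sorted({a for a, b in solutions}) == list(range(p)))  # every t0 occurs: True
```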

Exercises

1. Determine the minimal authorised coalitions for the access structure realised by

the linear secret sharing scheme with the matrix

    ⎡ 1  0 ⎤
    ⎢ 1  1 ⎥
H = ⎢ 2 −2 ⎥
    ⎢ 3  3 ⎥
    ⎣ 4 −4 ⎦

over Z11 .

2. Let F be a sufﬁciently large ﬁeld Z p . Find the access structure which is realised

by the linear secret sharing scheme with the matrix

    ⎡ 1 0 0 ⎤
    ⎢ 1 1 1 ⎥
    ⎢ 1 2 4 ⎥
H = ⎢ 1 3 9 ⎥ .
    ⎢ 0 0 1 ⎥
    ⎢ 0 0 2 ⎥
    ⎣ 0 0 3 ⎦

3. Let F be a sufﬁciently large ﬁeld. Find the access structure which is realised by

the linear secret sharing scheme with the matrix

    ⎡ 1 0 0   ⎤
    ⎢ 1 1 0   ⎥
    ⎢ 1 2 0   ⎥
H = ⎢ 1 3 3^2 ⎥ .
    ⎢ 1 4 4^2 ⎥
    ⎣ 1 5 5^2 ⎦

4. Let F be a sufﬁciently large ﬁeld. Find the access structure which is realised by

the linear secret sharing scheme with the matrix


    ⎡ 1 0  0    ⎤
    ⎢ 1 a1 0    ⎥
    ⎢ 1 a2 0    ⎥
H = ⎢ 1 a3 a3^2 ⎥ ,
    ⎢ 1 a4 a4^2 ⎥
    ⎣ 1 a5 a5^2 ⎦

where a1, . . . , a5 are distinct nonzero elements of F.

5. A linear secret sharing scheme for the group of users U = {1, 2, 3, 4, 5} is

deﬁned by the matrix over Z31 :

    ⎡ h0 ⎤   ⎡  1 0 0 0 ⎤
    ⎢ h1 ⎥   ⎢  1 2 3 0 ⎥
    ⎢ h2 ⎥   ⎢  1 3 3 0 ⎥
H = ⎢ h3 ⎥ = ⎢ 11 5 2 0 ⎥ .
    ⎢ h4 ⎥   ⎢  0 1 1 2 ⎥
    ⎣ h5 ⎦   ⎣  0 6 1 1 ⎦

These users got shares 2, 27, 20, 10, 16, respectively, which are also elements

of Z31 . Let A = {1, 2, 3} and B = {1, 4, 5} be two coalitions.

(a) Show that one of the coalitions is authorised and the other is not.

(b) Show how the authorised coalition can determine the secret.

6. Let H be an (n + 1) × k matrix over a field F and ΓH be the access structure
defined by the formula (6.7). Let us represent the ith row hi of this matrix as
hi = (ci, h̄i), where ci ∈ F is the first coordinate of hi and h̄i is the (k − 1)-
dimensional row vector of the remaining coordinates. Prove that if the coalition
{i1, i2, . . . , ir} is not authorised in ΓH, then

λ1 h̄i1 + λ2 h̄i2 + · · · + λr h̄ir = 0  =⇒  λ1 ci1 + λ2 ci2 + · · · + λr cir = 0

for all λ1, λ2, . . . , λr ∈ F.

7. Let U and V be disjoint sets of k and m users, respectively. Let M and N be two
matrices realising linear secret sharing schemes with access structures ΓM and
ΓN. Find the matrix realising the access structures
(a) ΓM + ΓN,
(b) ΓM × ΓN
on the set of users U ∪ V.

8. Prove that the access structure with Γmin = {{1, 2}, {2, 3}, {3, 4}} on the set of users
U = {1, 2, 3, 4} cannot be realised by a linear secret sharing scheme.

9. Let n > 2. The access structure with the set of minimal authorised coalitions


scheme.

Given a secret sharing scheme with access structure Γ, a user is called a dummy if
she does not belong to any minimal authorised coalition in Γmin. A dummy user can

be removed from any authorised coalition without making it non-authorised.

Theorem 6.2.3 Let S0 be the set of possible secrets and Si be the set of possible

shares that can be given to user i in a secret sharing scheme S. If this scheme is

perfect and has no dummy users, then |Si| ≥ |S0| for all i = 1, . . . , n, i.e., i(S) ≥ 1.

Proof Let i be an arbitrary user. Since no dummies exist, i belongs to one of the

minimal authorised coalitions, say X = {i 1 , i 2 , . . . , i k }, and with no loss of generality

we may assume that i = i k . Suppose that there is a tuple (s0 , s1 , . . . , sn ) ∈ T in the

distribution table where s0 is the secret shared and si1 , si2 , . . . , sik−1 are the shares

given to users i 1 , i 2 , . . . , i k−1 . Since the scheme is perfect the distribution table

contains tuples (s, . . . , si1 , . . . , si2 , . . . , sik−1 , . . .) for every s ∈ S0 . However, if we

add user i = i k we get the coalition X which is authorised and can recover the secret.

Thus, when the shares si1 , si2 , . . . , sik−1 of users i 1 , i 2 , . . . , i k−1 are ﬁxed the secret

depends on the share of the user i only. Hence for every possible secret s there is a
share t(s) which, if given to the user i, leads to the recovery of s as the secret by coalition
X; it can be calculated using the secret recovery function fX of coalition X, that is,
fX(si1, . . . , sik−1, t(s)) = s. For distinct secrets s the shares t(s) must be distinct, so
the map s → t(s) is an injection from S0 into Si and therefore |Si| ≥ |S0|. �

Deﬁnition 6.2.2 A secret sharing scheme S is called ideal if it is perfect and

i(S) = 1.

Ideal schemes are the most informationally efﬁcient having their information rate

equal to 1. By Theorem 6.2.3 this is the best possible rate for a perfect scheme. An

equivalent statement would be that |Si | = |S0 | for all i = 1, . . . , n. Normally in such

cases both secret and shares belong to the same ﬁnite ﬁeld. In particular, this is true

for Shamir’s secret sharing scheme given in Example 6.2.3. Indeed, if the elements

a1 , a2 , . . . , an are publicly known, the secret is p(0) and the share of the ith user is

p(ai ) for the polynomial p there deﬁned. More generally,


Theorem 6.2.4 Every linear secret sharing scheme, as constructed in Theorem 6.2.2, is ideal.

Proof We need to recap how the shares in this scheme are defined. We have a (normally
large) field F and an (n + 1) × k matrix H over this field. Then we define

a k-dimensional vector t over F at random and calculate the (n + 1)-dimensional

vector H t = s = (s0 , s1 , . . . , sn )T . Here s0 is the secret and si is the share of user i.

Both are elements of F. �

However there exist very simple access structures for which there are no ideal

secret sharing schemes (see [3] and [4]). Theorem 6.2.4 tells us that we have to look

for such examples among non-linear schemes.

Example 6.2.6 For the access structure Γ of Example 6.2.4 with Γmin = {{1, 2}, {2, 3}, {3, 4}} there is no ideal secret sharing scheme.

Proof Suppose on the contrary there is an ideal secret sharing scheme S with the

distribution table T realising Γ. Then for some positive integer q we have |Si| = q

for i = 0, 1, 2, 3, 4. For any subset I ⊆ {0, 1, 2, 3, 4} let T I be the restriction of T

to columns indexed by numbers from I and let #T I stand for the number of distinct

rows in TI. Let us firstly note that #T{1,2} = q². Indeed, fix an arbitrary s1 ∈ S1.
Since the coalition {1, 2} is authorised, arguing as in the proof of
Theorem 6.2.3 we conclude that for any secret s0 there will be exactly one value

s2 ∈ S2 such that (s0 , s1 , s2 , . . .) is a row in T . Hence there will be exactly q distinct

rows in T{1,2} with s1 in column 1. As |S1 | = q there are exactly q 2 distinct rows in

T{1,2} .

Let us now ﬁx arbitrary elements s0 ∈ S0 and s2 ∈ S2 . Since both {1, 2} and {2, 3}

are authorised, there will be unique s1 and s3 such that (s0 , s1 , s2 , s3 . . .) is a row in

T . In other words s1 uniquely determines s3 in any row of the distribution table. This

leads to the coalition {1, 4} being authorised, a contradiction. Indeed, since the table T is public
knowledge, users 1 and 4 can figure out the share given to user 3 and then can figure

out the secret since {3, 4} is authorised. �

The construction of the previous theorem leads us to a deﬁnition of a generalised

linear secret sharing scheme which may not be ideal.

Example 6.2.7 A family L of subspaces {L 0 , L 1 , . . . , L n } is said to satisfy property

“all or nothing” if for every subset X ⊂ {1, 2, . . . , n} the span span{L i | i ∈ X }

either contains L 0 or has zero intersection with it. Any such family deﬁnes a certain

access structure, namely

ΓL = {X ⊆ U | span{Li | i ∈ X} ⊇ L0}.


Now the secret and the shares will be ﬁnite-dimensional vectors over F. Let

{L 0 , L 1 , . . . , L n } be subspaces of F k satisfying the property all-or-nothing. Let Hi

be the matrix whose rows form a basis of L i . Then we generate random vectors ti of

the same dimension as dim L i and calculate the secret and the shares as si = Hi ti ,

i = 0, 1, . . . , n. As in Theorem 6.2.2, this leads to a perfect secret sharing scheme
realising ΓL; however, it may not be ideal, as the following example shows.

Example 6.2.8 Let L0, L1, . . . , L4 be the subspaces of F^6 spanned by the rows of
the matrices H0, H1, . . . , H4, whose transposes are the following matrices:

       ⎡ 1 0 ⎤        ⎡ 0 0 ⎤        ⎡ 1 0 0 ⎤        ⎡ 0 0 1 ⎤        ⎡ 0 0 ⎤
       ⎢ 0 1 ⎥        ⎢ 0 0 ⎥        ⎢ 0 1 0 ⎥        ⎢ 0 0 0 ⎥        ⎢ 0 1 ⎥
H0^T = ⎢ 0 0 ⎥, H1^T = ⎢ 1 0 ⎥, H2^T = ⎢ 1 0 0 ⎥, H3^T = ⎢ 0 0 0 ⎥, H4^T = ⎢ 0 0 ⎥ .
       ⎢ 0 0 ⎥        ⎢ 0 1 ⎥        ⎢ 0 1 0 ⎥        ⎢ 0 1 0 ⎥        ⎢ 0 0 ⎥
       ⎢ 0 0 ⎥        ⎢ 0 0 ⎥        ⎢ 0 0 1 ⎥        ⎢ 0 0 1 ⎥        ⎢ 1 0 ⎥
       ⎣ 0 0 ⎦        ⎣ 0 0 ⎦        ⎣ 0 0 0 ⎦        ⎣ 1 0 0 ⎦        ⎣ 0 1 ⎦

This family satisfies the property all-or-nothing. The access structure associated with
it can be given by the set of minimal authorised coalitions as

Γmin = {{1, 2}, {2, 3}, {3, 4}}.

Since the secret is 2-dimensional and some shares are 3-dimensional, the information
ratio of such a scheme will be 3/2. As 3/2 < 2, this is a more efficient secret sharing
scheme realising Γ than the one in Example 6.2.4. In fact, it can be proved that the
scheme for this example is optimal for Γ in the sense that it gives the best possible
information ratio.

Exercises

1. Let T be the distribution table of a perfect ideal secret sharing scheme with the

set of users, U = {1, 2, . . . , n}, the dealer 0 and the cardinality of the domain of

secrets q. Prove that

(i) If a coalition C is authorised and C′ = C ∪ {0}, then #TC′ = #TC;
(ii) If a coalition C is not authorised and C′ = C ∪ {0}, then #TC′ = q · #TC.

2. Prove all the missing details in Example 6.2.8.

3. In this exercise we consider the case when, for the access structure Γ of a secret
sharing scheme with distribution table T, all minimal authorised coalitions have
size 2. In this case Γmin can be interpreted as the set of edges of a graph G(Γ) defined on
U = {1, 2, . . . , n}. We assume that this graph is connected. Let the cardinality of
the domain of secrets be q.

(i) Show that, if {i, j} ∈ Γmin, then #T{i, j} = q².
(ii) Prove that #TU∪{0} = q².
(iii) Prove that if {i, j} ∉ Γmin, then #T{i, j} = q.


(iv) Prove that if {i, j} and { j, k} are both not authorised, then {i, k} is not autho-

rised too.

(v) Prove the following theorem proved in [3].

Theorem 6.2.5 Let Γ be an ideal access structure such that all minimal authorised
coalitions have size 2 and G(Γ) is connected. Then the complementary graph of
G(Γ) is a disjoint union of cliques.

References

1. Shamir, A.: How to share a secret. Commun. ACM 22, 612–613 (1979)

2. Blakley, G.R.: Safeguarding cryptographic keys. In: Proceedings of the National Computer

Conference, vol. 48, pp. 313–317 (1979)

3. Brickell, E.F., Davenport, D.M.: On the classiﬁcation of ideal secret sharing schemes. J. Cryptol.

4, 123–134 (1991)

4. Stinson, D.R.: An explication of secret sharing schemes. Des. Codes Cryptogr. 2, 357–390

(1992)

Chapter 7

Error-Correcting Codes

You would be surprised to know the number of doctors who claim they are
treating pregnant men.

Isaac Asimov (1920–1992)

This chapter deals with the problem of reliable transmission of digitally encoded

information through an unreliable channel. When we transmit information from a

satellite, or an automatic station orbiting the moon, or from a probe on Mars, then

for many reasons (e.g., sun-bursts) our message can be distorted. Even the best

telecommunication systems connecting numerous information centres in various

countries have some non-zero error rate. These are examples of transmission in

space. When we save a ﬁle on a hard disc and then try to read it one month later, we

may find that this file has been distorted (due, for example, to microscopic defects

on the disc’s surface). This is an example of transmission in time. The channels

of transmission in both cases are different but they have one important feature in

common: they are not 100 % reliable. In some cases even a single mistake in the

transmission of a message can have serious consequences. We will show how algebra

can help to address this important problem.

We think of a message as a string of symbols of a certain alphabet. The most

common is the alphabet consisting of two symbols 0 and 1. It is called the binary

alphabet and we can interpret these symbols as elements of the ﬁnite ﬁeld Z2 . Some

non-binary alphabets are also used, for example, we can use the symbols of any ﬁnite

ﬁeld F. But we will initially concentrate on the binary case.

The symbols of the message are transmitted through the channel one by one. Let

us see what can happen to them. Since mistakes in the channel do occur, we assume

that, when we transmit 0, with probability p > 1/2 we receive 0 and with probability

1 − p we receive 1 as a result of a mistake in the channel. Similarly, we assume that

transmitting 1 we get 1 with probability p and 0 with probability 1 − p. Thus we

assume that the probability of a mistake does not depend on the transmitted symbol.

In this case the channel is called symmetric. In our case we are talking about a binary

symmetric channel. It can be illustrated as follows:

A. Slinko, Algebra for Applications, Springer Undergraduate Mathematics Series,

DOI 10.1007/978-3-319-21951-6_7


[Figure: the binary symmetric channel. The transmitted symbol x arrives as x with probability p and as x ⊕ 1 with probability 1 − p.]

Here the error is modeled by means of addition modulo 2. Let x be the symbol to

be transmitted. If transmission is perfect, then x will also be the symbol received, but

if a mistake occurs, then the message received will be x ⊕ 1, where the addition is

in the ﬁeld Z2 . Indeed, 0 ⊕ 1 = 1 and 1 ⊕ 1 = 0. Thus the mistake can be modeled

algebraically as the addition of 1 to the transmitted symbol.

In practical situations p is very close to 1, however, even when p = 0.98, among

any 100 symbols transmitted, on average two will be transmitted with an error. Such a

channel may not be satisfactory to transfer some sensitive data and an error-correction

technique must be implemented.
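The behaviour of a binary symmetric channel is easy to simulate; the function name bsc and the parameters below are our illustrative choices:

```python
# Simulation of a binary symmetric channel: each transmitted bit is flipped
# (addition of 1 modulo 2) independently with probability 1 - p.
import random

def bsc(bits, p, rng=random):
    """Transmit a list of bits through a binary symmetric channel."""
    return [b if rng.random() < p else b ^ 1 for b in bits]

rng = random.Random(1)
message = [rng.randrange(2) for _ in range(100_000)]
received = bsc(message, 0.98, rng)
errors = sum(m != r for m, r in zip(message, received))
print(errors / len(message))   # close to 0.02
```

With p = 0.98, roughly two out of every hundred symbols arrive damaged, as the text observes.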

Binary error-correcting codes are used when messages are strings of zeros and ones,

i.e., the alphabet is Z2 = {0, 1}.

If we transmit symbols of our message one by one, then there is no way that we can

detect an error. That is why we will try to split the message into blocks of symbols

of ﬁxed length m. Any block of m symbols

a1 a2 . . . am , ai ∈ Z2 ,

can be viewed as an element of Zm2 . Note that

according to a long-established tradition in coding theory the messages are written

as row vectors—this is different from the convention used in most undergraduate

Linear Algebra courses where elements of Rm are viewed as column vectors. Since

we split all messages into blocks of length m we may consider all messages to have

a ﬁxed length m and view them as elements of the m-dimensional vector space Zm 2

over Z2 . When considering vectors as messages we will often omit commas, e.g.,

(1 1 1 0) is the vector (1, 1, 1, 0) treated as a message.


Suppose that a message a = (a1 , a2 , . . . , am ) ∈ Zm2 was transmitted and b = (b1 , b2 , . . . , bm ) ∈ Zm2 was received with one mistake in position i. Then

b = a + ei ,

where ei ∈ Zm2 is the vector with 1 in position i and zeros elsewhere. If, more generally, the positions i1 , i2 , . . . , ik were damaged, then

b = a + e,

where e = ei1 + · · · + eik is a vector with k ones and m − k zeros. In this case e is called the error vector.

Deﬁnition 7.1.1 The Hamming weight of a vector x ∈ Zm2 is the number of nonzero coordinates in x. It is denoted by wt(x).

Proposition 7.1.1 If a ∈ Zm2 was transmitted and b = (b1 , b2 , . . . , bm ) ∈ Zm2 was received with k mistakes during the transmission, then b = a + e with wt(e) = k. For example, if mistakes occurred in exactly two positions, then b = a + e with wt(e) = 2.

Deﬁnition 7.1.2 The Hamming distance between two vectors x, y ∈ Zm2 is the number of coordinates in which these two vectors differ. It is denoted by d(x, y).

Lemma 7.1.1 For any x, y ∈ Zm2 we have d(x, y) = wt(x + y).

Proof For each i we have xi + yi = 0 if xi = yi , and xi + yi = 1 if xi ≠ yi . Hence every i for which xi ≠ yi increases the weight of x + y by one. This proves the lemma. □

Proposition 7.1.2 Suppose that a = (a1 , a2 , . . . , am ) ∈ Zm2 was transmitted and b = (b1 , b2 , . . . , bm ) ∈ Zm2 was received. Then the fact that k mistakes occurred during the transmission is equivalent to d(a, b) = k.
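Both the weight and the distance are one-liners in code. The small Python sketch below (ours, for illustration) also confirms the relation d(x, y) = wt(x + y) on a pair of sample vectors.

```python
def wt(x):
    # Hamming weight: the number of nonzero coordinates.
    return sum(x)

def d(x, y):
    # Hamming distance: the number of positions where x and y differ.
    return sum(xi != yi for xi, yi in zip(x, y))

x = [1, 1, 0, 1, 1, 1, 0]
y = [1, 0, 0, 0, 1, 1, 1]
s = [xi ^ yi for xi, yi in zip(x, y)]   # the sum x + y in Z_2^m
print(d(x, y), wt(s))                    # both equal 3, as Lemma 7.1.1 asserts
```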

The Hamming distance is a metric on Zm2 , which means that:

1. d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y;


2. d(x, y) = d(y, x) for all x, y;

3. d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z.

Proof The ﬁrst two properties are obvious. Let us prove the third one. Suppose that xi ≠ zi , so that the position i contributes 1 to d(x, z). Then either xi ≠ yi or yi ≠ zi , hence the ith position will also contribute at least 1 to the sum d(x, y) + d(y, z). Suppose now that xi = zi , so that the position i contributes 0 to d(x, z). Then either xi = yi = zi and the ith position also contributes 0 to the sum d(x, y) + d(y, z), or xi ≠ yi and yi ≠ zi and the ith position contributes 2 to the sum d(x, y) + d(y, z). Hence the right-hand side is not smaller than the left-hand side. □

The following sets play a special role in coding theory. For any x ∈ Zm2 we deﬁne Bk (x) = {y ∈ Zm2 | d(x, y) ≤ k}, and we call it the ball of radius k with centre x.

Theorem 7.1.2 For any x ∈ Zm2 ,

|Bk (x)| = \binom{m}{0} + \binom{m}{1} + · · · + \binom{m}{k}. (7.1)

Proof Let y ∈ Bk (x). We may consider the “error vector” e such that y = x + e. Then y ∈ Bk (x) if and only if wt(e) ≤ k. It is enough to prove that, for each i = 1, . . . , k, there are exactly \binom{m}{i} vectors e ∈ Zm2 such that wt(e) = i. Indeed, we must choose i positions out of m in the zero vector and change the coordinates there to ones. Hence every vector e with wt(e) = i corresponds to an i-element subset of {1, 2, . . . , m}.

We know that there are exactly

\binom{m}{i} = \frac{m!}{i!(m − i)!}

such subsets (see, for example, [1], p. 271). Now it is clear that the formula (7.1) counts all “error vectors” of weight at most k, and hence all vectors y which are at Hamming distance k or less from x. □

In particular,

|B2 (x)| = \binom{m}{0} + \binom{m}{1} + \binom{m}{2} = 1 + m + \frac{m(m − 1)}{2}.
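Formula (7.1) is easy to confirm by brute force. In the Python sketch below (ours, for illustration) the parameters m = 6 and k = 2 are chosen arbitrarily; the count does not depend on the centre x.

```python
from itertools import product
from math import comb

def ball_size(m, k):
    # |B_k(x)| = C(m,0) + C(m,1) + ... + C(m,k); independent of the centre x.
    return sum(comb(m, i) for i in range(k + 1))

def d(x, y):
    # Hamming distance between two words of the same length.
    return sum(a != b for a, b in zip(x, y))

m, k = 6, 2                      # sample parameters, chosen for illustration
x = (0,) * m
brute = sum(1 for y in product((0, 1), repeat=m) if d(x, y) <= k)
print(ball_size(m, k), brute)    # 22 22
```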


Exercises

1. (a) Find the Hamming distance between the vectors

u = (1 1 0 1 1 1 0), v = (1 0 0 0 1 1 1).

(b) The vector x = (0 1 1 1 0 1 0 1 1 0) was sent through a binary channel and

y = (0 1 0 1 0 1 1 1 1 0) was received. How many mistakes have occurred?

Write down the error vector.

2. List all vectors of B2 (x) ⊂ Z42 , where x = (1 0 1 0).

3. Let x be a word in Z72 . How many elements are there in the ball B3 (x) of radius

3?

4. Explain why the cardinality of Bk (x) does not depend on x.

By now we have already understood the convenience of having all messages of equal

length, say m. Longer messages can be split into several shorter ones. The idea of

error-correction is to increase the length m of a transmitted message and to add to

each message several auxiliary symbols, the so-called check symbols, which will not

bear any information but will help to correct errors. Hence we increase the length of

every message from m to n, where m < n.

Thus a code is a pair C = (E, D) consisting of an encoding function E : Zm2 → Zn2 and a decoding function D : Zn2 → Zm2 ∪ {error} which satisﬁes D(E(x)) = x for all x ∈ Zm2 . Such a code is called a (binary) (m, n)-code.

Note that E is necessarily one-to-one: if E(x1 ) = E(x2 ), then x1 = D(E(x1 )) = D(E(x2 )) = x2 . The vectors of E(Zm2 ) are called codewords (or codevectors).

Example 7.1.4 (Parity check code) This code increases the length of a message by 1, adding only one check symbol, which is the sum modulo 2 of all other symbols.

That is

E(x1 , x2 , . . . , xm ) = (x1 , x2 , . . . , xm+1 ),

where xm+1 = x1 + · · · + xm . Note that the sum of all coordinates of any codevector is equal to 0:

x1 + · · · + xm + xm+1 = 2(x1 + · · · + xm ) = 0.


Let us see now what happens if one mistake occurs. In this case for the received

vector y = (y1 , y2 , . . . , ym+1 ) we will get

y1 + · · · + ym+1 = x1 + · · · + xm+1 + 1 = 0 + 1 = 1.

D(y1 , y2 , . . . , ym+1 ) = (y1 , y2 , . . . , ym ),  if y1 + y2 + · · · + ym+1 = 0,
D(y1 , y2 , . . . , ym+1 ) = error,                   if y1 + y2 + · · · + ym+1 = 1.
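The parity check code and its decoder fit in a few lines. The Python sketch below is an illustration added here; the function names are ours.

```python
def encode(x):
    # Append one check symbol: the sum of all symbols mod 2.
    return x + [sum(x) % 2]

def decode(y):
    # A received word is accepted iff its coordinates sum to 0 mod 2.
    return y[:-1] if sum(y) % 2 == 0 else "error"

msg = [1, 0, 1, 1]
cw = encode(msg)                        # [1, 0, 1, 1, 1]
assert decode(cw) == msg
corrupted = cw[:]
corrupted[2] ^= 1                       # one flipped bit
print(decode(corrupted))                # error
```

A single mistake always changes the parity, so it is detected; two mistakes cancel out and pass unnoticed.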

Example 7.1.5 (Triple repetition code) This code increases the length of a message

threefold by repeating every symbol three times:

E(x1 , x2 , . . . , xm ) = (x1 , x2 , . . . , xm , x1 , x2 , . . . , xm , x1 , x2 , . . . , xm ).

Decoding may be organised as follows. To decide on the ﬁrst symbol the algorithm

inspects y1 , ym+1 , and y2m+1 . If the majority (two or three) of these symbols are 0’s,

then the decoding algorithm decides that a 0 was transmitted, while if the majority of

symbols are 1’s, then the algorithm decides that a 1 was sent. This code will correct

any single error but will fail to correct some double ones.
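The majority-vote decoding just described can be sketched as follows (an illustration, with our own function names, assuming the message length m divides the received length).

```python
def encode(x):
    # Repeat the whole block three times.
    return x + x + x

def decode(y):
    m = len(y) // 3
    # Majority vote over the three copies of each symbol.
    return [1 if y[i] + y[m + i] + y[2 * m + i] >= 2 else 0 for i in range(m)]

msg = [1, 0, 1, 1]
y = encode(msg)
y[1] ^= 1
y[6] ^= 1                       # two errors hitting different symbols
print(decode(y) == msg)          # corrected, since no symbol lost its majority
```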

The error-detecting and error-correcting capabilities of a code are determined by the geometric properties of the set of codewords E(Zm2 ) ⊂ Zn2 and the properties of the decoding function D.

Deﬁnition 7.1.5 Suppose that for every y ∈ Zn2 the vector x = D(y) is such that E(x) is a closest (with respect to the Hamming distance) codeword to y (any one of them if several are at the same distance). Then we say that the decoding function D performs maximum likelihood decoding.

This terminology is justiﬁed by the fact that, under the assumption that mistakes are random and independent, in the symmetric channel the probability of k mistakes during the transmission is greater than the probability of j mistakes if and only if j > k. Therefore, if our assumption on the distribution of mistakes is true, maximum likelihood decoding minimises the probability of the decoder making a mistake, and it will always be assumed.

Theorem 7.1.3 For a code C the following statements are equivalent:

(a) C detects all combinations of k or fewer errors.

(b) For any codeword x the ball Bk (x) does not contain codewords different from x.

(c) The minimum distance between any two codewords is at least k + 1.

Proof We will prove that (c) ⇒ (b) ⇒ (a) ⇒ (c). Suppose that the minimum

distance between any two codewords is at least k + 1. Then, for any codeword x,


the ball Bk (x) does not contain any other codeword, hence (c) ⇒ (b). Further, if a

combination of k or fewer errors occurs, by Proposition 7.1.2 the received vector y

will be in Bk (x). As there are no codevectors in Bk (x), other than x, the error will be

detected, hence (b) ⇒ (a). Finally, for a maximum likelihood decoder to be able to

detect all combinations of k or fewer errors, for any codeword x all vectors in Bk (x)

must not be codewords. Hence the distance between any two codewords is at least

k + 1, thus (a) ⇒ (c). □

Theorem 7.1.4 For a code C the following statements are equivalent:

(a) C corrects all combinations of k or fewer errors.

(b) For any two codewords x and y of C the balls Bk (x) and Bk (y) do not intersect.

(c) The minimum distance between any two codewords of C is at least 2k + 1.

Proof We will prove that (c) ⇒ (b) ⇒ (a) ⇒ (c). Suppose that the minimum

distance between any two codewords is at least 2k + 1. Then, for any two codewords

x and y the balls Bk (x) and Bk (y) do not intersect. Indeed, if they did, then for a

certain z ∈ Bk (x) ∩ Bk (y)

d(x, z) ≤ k, d(y, z) ≤ k,

and hence, by the triangle inequality, d(x, y) ≤ d(x, z) + d(z, y) ≤ 2k < 2k + 1, which is a contradiction, hence (c) ⇒ (b). Further, if no more than k mistakes happen

during the transmission of a vector x, the received vector y will be in the ball Bk (x)

and will not be in the ball of radius k for any other codeword. Hence y is closer to x

than to any other codevector. Since the decoding is a maximum likelihood decoding

y will be decoded to x and all mistakes will be corrected. Thus (b) ⇒ (a).

On the other hand, it is easy to see that if the distance d between two codewords

x and y does not exceed 2k, then certain combinations of k or fewer errors will not

be corrected. To show this let us change d coordinates of x, one by one, and convert

it into y:

x = x0 → x1 → · · · → xk → · · · → xd = y.

Then xk will be no further from y than from x. Hence if k mistakes take place and

the received vector is xk , then it may be decoded as y (even if d = 2k). This shows

that (a) ⇒ (c). □

Exercises

1. Consider the triple repetition (4, 12)-code. Find a necessary and sufﬁcient condition on the error vector e = (e1 , e2 , . . . , e12 ) for the message to be decoded correctly. Give an example of an error vector e of Hamming weight 4 which the code corrects.


2. Let m = m1 m2 . We arrange the symbols of each message into an m1 × m2 array and write our messages into this array (in any ﬁxed way). To every message a = (a1 , a2 , . . . , am ) we add m1 + m2 additional symbols e1 , e2 , . . . , em1 and f1 , f2 , . . . , fm2 , where ei is the sum (modulo 2) of all symbols in row i and fj is the sum of all symbols in column j. Thus we have an (m, n)-code, where

n = m + m1 + m2 . Show that this code can correct all single errors and detect all

triple errors.

Recall that a code is a pair C = (E, D), where E : Zm2 → Zn2 is the encoding function, D : Zn2 → Zm2 is the (maximum likelihood) decoding function, and D ◦ E = id, i.e., D(E(x)) = x holds for all x ∈ Zm2 . We observed that the set of codewords E(Zm2 ) is

an important object. It is so important that it is often identiﬁed with the code itself and

also denoted C. We will also do this when it invites no confusion and the encoding

function is clear from the context. We saw that it is extremely important to spread

C = E(Zm2 ) in Zn2 uniformly, and that the most important characteristic of C is the minimum distance between any two codewords of C:

dmin (C) = min_{a ≠ b ∈ C} d(a, b).

Theorem 7.1.5 A code C detects all combinations of k or fewer errors if and only

if dmin (C) ≥ k + 1 and corrects all combinations of k or fewer errors if and only if

dmin (C) ≥ 2k + 1.

The following table shows the error-detecting and error-correcting capabilities of codes depending on their minimum distance.

dmin              1  2  3  4  5  6  7  8  9
Errors detected   0  1  2  3  4  5  6  7  8
Errors corrected  0  0  1  1  2  2  3  3  4
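The table amounts to two one-line formulas following from Theorem 7.1.5: a code with minimum distance dmin detects dmin − 1 errors and corrects ⌊(dmin − 1)/2⌋ errors. A tiny Python check (ours, for illustration):

```python
def detects(d_min):
    # A code with minimum distance d_min detects up to d_min - 1 errors.
    return d_min - 1

def corrects(d_min):
    # ... and corrects up to floor((d_min - 1) / 2) errors.
    return (d_min - 1) // 2

# Reproduce the table for d_min = 1, ..., 9.
for d_min in range(1, 10):
    print(d_min, detects(d_min), corrects(d_min))
```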

Example 7.1.6 Let

H1 = [ 1  1 ]
     [ 1 −1 ]

and let us deﬁne inductively:

Hk+1 = [ Hk   Hk ]
       [ Hk  −Hk ] .

Then Hn is a matrix of order 2^n × 2^n. For example,


H2 = [ 1  1  1  1 ]
     [ 1 −1  1 −1 ]
     [ 1  1 −1 −1 ]
     [ 1 −1 −1  1 ] ,

H3 = [ 1  1  1  1  1  1  1  1 ]
     [ 1 −1  1 −1  1 −1  1 −1 ]
     [ 1  1 −1 −1  1  1 −1 −1 ]
     [ 1 −1 −1  1  1 −1 −1  1 ]
     [ 1  1  1  1 −1 −1 −1 −1 ]
     [ 1 −1  1 −1 −1  1 −1  1 ]
     [ 1  1 −1 −1 −1 −1  1  1 ]
     [ 1 −1 −1  1 −1  1  1 −1 ] .

It can be proved by induction that any two distinct rows of Hn are orthogonal (see Exercise 2). This, in turn, is equivalent to the matrix equation

H H T = n In , (7.2)

which for H = Hk holds with n = 2^k .

Deﬁnition 7.1.6 An n × n matrix H with entries from {+1, −1} satisfying (7.2) is

called a Hadamard matrix.

The orthogonality of the rows of Hn means that any two distinct rows of Hn coincide in 2^{n−1} positions and differ in 2^{n−1} positions. Hence if we replace each −1 with a 0, we will have a set of vectors with minimum distance 2^{n−1}. For example, if we do this with the rows of H3 shown above we will get eight vectors with minimum distance 4. We can use these vectors for the construction of a code. For example,

( 0 0 0 ) → ( 1 1 1 1 1 1 1 1 ),

( 1 0 0 ) → ( 1 0 1 0 1 0 1 0 ),

( 0 1 0 ) → ( 1 1 0 0 1 1 0 0 ),

( 0 0 1 ) → ( 1 0 0 1 1 0 0 1 ),

( 1 1 0 ) → ( 1 1 1 1 0 0 0 0 ),

( 1 0 1 ) → ( 1 0 1 0 0 1 0 1 ),

( 0 1 1 ) → ( 1 1 0 0 0 0 1 1 ),

( 1 1 1 ) → ( 1 0 0 1 0 1 1 0 ).
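The recursive construction of Hk and the 0/1 conversion are easy to replay in code. The sketch below (ours, for illustration) rebuilds H3 and verifies that the resulting eight binary rows have minimum distance 4 = 2^{3−1}.

```python
def hadamard(k):
    # H_1 = [[1, 1], [1, -1]];  H_{k+1} = [[H_k, H_k], [H_k, -H_k]].
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

H3 = hadamard(3)
rows = [[(1 + v) // 2 for v in row] for row in H3]   # replace each -1 by a 0
dmin = min(sum(a != b for a, b in zip(r, s))
           for i, r in enumerate(rows) for s in rows[i + 1:])
print(len(rows), dmin)   # 8 4
```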

In fact, we can do even better as the following exercise shows.

Exercise 7.1.1 We may consider the 16 × 8 matrix obtained by stacking H3 on top of −H3 and replace in this matrix each −1 by a 0. Then we will obtain 16 vectors which may be used to construct a (4, 8)-code with minimum distance 4.

When, in 1969, the Mariner spacecraft sent pictures to Earth, the matrix H5 was

used to construct 64 codewords of length 32 with minimum distance 16. Each pixel

had a darkness given by a 6-bit number. Each of them was changed to one of the


64 codewords and transmitted. This code could correct any combination of 7 errors.

Since the signals from Mariner were fairly weak such an error-correcting capability

was really needed.

We may also deﬁne the minimum weight of the code by

wtmin (C) = min_{0 ≠ a ∈ C} wt(a).

This concept will also be quite important, especially for linear codes.

We remind the reader of the deﬁnition of a subspace. Let F be a ﬁeld and V be a

vector space over F. A subset W ⊆ V is a subspace if for any two vectors u, v ∈ W

and any two scalars α, β ∈ F the linear combination αu + βv is also an element of

W . In this case W becomes a vector space in its own right.

Exercise 7.1.2 Let W be the set of all vectors from Zn2 whose sum of all coordinates

is equal to zero. Show that W is a subspace of Zn2 .

Deﬁnition 7.1.7 A code C = (E, D) is called linear if the encoding function E : Zm2 → Zn2 is a linear transformation from Zm2 into Zn2 . For a binary ﬁeld, where the only scalars are 0 and 1, this simply means that

E(x + y) = E(x) + E(y)

for all x, y ∈ Zm2 .

Proposition 7.1.3 For any linear code the set of codewords C is a subspace of Zn2 .

In particular, the zero vector 0 is a codeword.

Proof To show that C is a subspace of Zn2 it sufﬁces to check that the sum of any two codewords is again a codeword (as our coefﬁcients come from Z2 , linear combinations reduce to sums). Let b, c be two codewords. Then b = E(x) and c = E(y)

and

b + c = E(x) + E(y) = E(x + y) ∈ C.

In particular, 0 = b + b ∈ C. □


Theorem 7.1.6 For any linear code C we have dmin (C) = wtmin (C).

Proof Suppose dmin (C) = d(a, b). Then, as we know from Lemma 7.1.1, d(a, b) = wt(a + b), and since a + b ∈ C we get

wtmin (C) ≤ wt(a + b) = dmin (C).

On the other hand, if wtmin (C) = wt(a), then, again by Lemma 7.1.1, wt(a) = d(0, a), and hence

dmin (C) ≤ wtmin (C). □

Theorem 7.1.6 is very useful. There are M = 2^m codewords in any (m, n)-code. To ﬁnd the minimum distance we need to perform M(M − 1)/2 calculations of distance, while to ﬁnd the minimum weight we need only M such calculations.

Example 7.1.7 For the following (3, 6)-code C

0 = (0 0 0) → (0 0 0 0 0 0) = 0

a1 = (1 0 0) → (1 0 0 1 0 0) = c1

a2 = (0 1 0) → (0 1 0 1 1 1) = c2

a3 = (0 0 1) → (0 0 1 0 1 1) = c3

a1 + a2 = (1 1 0) → (1 1 0 0 1 1) = c1 + c2

a1 + a3 = (1 0 1) → (1 0 1 1 1 1) = c1 + c3

a2 + a3 = (0 1 1) → (0 1 1 1 0 0) = c2 + c3

a1 + a2 + a3 = (1 1 1) → (1 1 1 0 0 0) = c1 + c2 + c3 ,

it is easy to see that it is linear. We see that C = Span{c1 , c2 , c3 }, and dmin (C) =

wtmin (C) = wt(c1 ) = 2.
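Theorem 7.1.6 can be spot-checked on this example. The Python sketch below (ours, for illustration) enumerates the eight codewords of the (3, 6)-code of Example 7.1.7 and compares minimum weight with minimum distance.

```python
from itertools import product

def encode(a):
    # Encoding by the generator rows c1, c2, c3 of Example 7.1.7.
    G = [(1, 0, 0, 1, 0, 0), (0, 1, 0, 1, 1, 1), (0, 0, 1, 0, 1, 1)]
    return tuple(sum(ai * gi for ai, gi in zip(a, col)) % 2
                 for col in zip(*G))

C = {encode(a) for a in product((0, 1), repeat=3)}
wts = [sum(c) for c in C if any(c)]                      # nonzero weights
dists = [sum(x != y for x, y in zip(c1, c2))
         for c1 in C for c2 in C if c1 != c2]
print(min(wts), min(dists))   # 2 2
```

The minimum weight computation touches 7 vectors, the minimum distance computation 28 pairs; for large codes this saving matters.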

Exercises

1. Prove that the matrices Hk deﬁned in Example 7.1.6 are Hadamard matrices.

2. Let H be an n × n Hadamard matrix. Let us construct the 2n × n matrix obtained by stacking H on top of −H, and then replace each −1 by a 0. Prove that in the resulting matrix every two distinct rows have Hamming distance of at least n/2 between them.

3. Let Ei : Z32 → Z72 , i = 1, 2 be the encoding mappings of the codes C1 and C2 ,

respectively, given by

(a) E1 (a) = (a1 , a2 , a3 , a1 + a2 , a2 + a3 , a1 + a3 , 0),

(b) E2 (a) = (a1 , a2 , a3 , a1 + a2 , a2 , a1 + a2 + a3 , 1).

Which code is linear and which is not?

4. Show that in a binary linear code, either all codewords have even Hamming weight

or exactly half of the codewords have even Hamming weight.


Let e1 , e2 , . . . , em be the standard basis of Zm2 , where ei is the vector which has only one nonzero element, 1, in the ith position. Let us consider the vectors

E(e1 ) = g1 , . . . , E(em ) = gm ,

which are important since in a linear code they fully determine the encoding function. Indeed, for an arbitrary message vector a = (a1 , a2 , . . . , am ) we have

E(a) = E(a1 e1 + a2 e2 + · · · + am em ) = a1 g1 + a2 g2 + · · · + am gm .

Hence we can represent the encoding function by means of matrix multiplication:

E(a) = aG, (7.3)

where

G = [ g1 ]
    [ g2 ]
    [ ⋮  ]
    [ gm ]

is the matrix with rows g1 , g2 , . . . , gm . Equation (7.3) shows that the code is the row

space of the matrix G, i.e., C = Row(G).

Deﬁnition 7.1.8 Let C = (E, D) be a linear (m, n)-code. Then the matrix G such that

E(a) = aG

for all a ∈ Zm2 is called the generator matrix of C.

For example, if

E(a) = (a1 , a2 , a2 , a1 + a2 ),

then

E(a) = a1 (1, 0, 0, 1) + a2 (0, 1, 1, 1) = (a1 , a2 ) [ 1 0 0 1 ]
                                                     [ 0 1 1 1 ] .
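Encoding by a generator matrix is a single matrix multiplication mod 2. A minimal Python sketch (the function name is ours), applied to the generator matrix just shown:

```python
def mat_vec_encode(a, G):
    # E(a) = aG over Z_2: message row vector times generator matrix.
    return [sum(ai * gij for ai, gij in zip(a, col)) % 2 for col in zip(*G)]

G = [[1, 0, 0, 1],
     [0, 1, 1, 1]]
print(mat_vec_encode([1, 1], G))   # [1, 1, 1, 0]
```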


Proposition 7.1.4 Let C = (E, D) be a linear (m, n)-code with generator matrix G. Then the rows of G are linearly independent. Moreover, rank(G) = m and dim C = m.

Proof It is enough to prove linear independence of the rows g1 , g2 , . . . , gm . The two remaining statements will then follow. Suppose on the contrary that a1 g1 + a2 g2 + · · · + am gm = 0 with not all ai ’s being zero. Then, since E is linear,

0 = a1 g1 + · · · + am gm = a1 E(e1 ) + · · · + am E(em ) = E(a1 e1 + · · · + am em ).

Hence E(a) = 0 for the nonzero vector a = a1 e1 + · · · + am em . This contradicts the fact that E is one-to-one, since we have E(0) = 0 and E(a) = 0. □

Example 7.1.9 (Parity check code revisited) The parity check (m, m + 1)-code is

linear. Indeed, if the sum of coordinates for both x and y is zero, then the same is

true for x + y. We have

E(e1 ) = (1 0 . . . 0 1),

E(e2 ) = (0 1 . . . 0 1),

...

E(em ) = (0 0 . . . 1 1).

Hence

G = [ 1 0 . . . 0 1 ]
    [ 0 1 . . . 0 1 ]
    [     ⋮         ]
    [ 0 0 . . . 1 1 ] = [Im 1m ],

where 1m denotes the column of m ones.

Example 7.1.10 (Triple repetition code) The triple repetition code (m, 3m)-code is

also linear. We have

E(e1 ) = (1 0 . . . 0 1 0 . . . 0 1 0 . . . 0),

E(e2 ) = (0 1 . . . 0 0 1 . . . 0 0 1 . . . 0),

...

E(em ) = (0 0 . . . 1 0 0 . . . 1 0 0 . . . 1).

Hence

G = [ 1 0 . . . 0 1 0 . . . 0 1 0 . . . 0 ]
    [ 0 1 . . . 0 0 1 . . . 0 0 1 . . . 0 ]
    [                 ⋮                   ]
    [ 0 0 . . . 1 0 0 . . . 1 0 0 . . . 1 ] = [Im Im Im ].


Example 7.1.11 Let us deﬁne a linear (3, 5)-code by its generator matrix

G = [ 1 0 0 0 1 ]
    [ 0 1 0 1 0 ]
    [ 0 0 1 1 1 ] .

Then

E(a1 , a2 , a3 ) = (a1 , a2 , a3 ) G = (a1 , a2 , a3 , a2 + a3 , a1 + a3 ).

We see that the codeword E(a), which encodes a, consists of the vector a itself

embedded into the ﬁrst three coordinates and two additional symbols.

Deﬁnition 7.1.9 A linear (m, n)-code C = (E, D) is called systematic if, for any

a ∈ Zm2 , the ﬁrst m symbols of the codeword E(a) are the symbols of the word a,

i.e.,

E(a1 , a2 , . . . , am ) = (a1 , a2 , . . . , am , b1 , b2 , . . . , bn−m ).

The symbols of a in E(a) are called the information symbols and the remaining

symbols are called the check symbols. These are the auxiliary symbols which we

mentioned earlier.

For a linear code to be systematic it is necessary and sufﬁcient that its generator matrix has the form G = [Im A], where A is an m × (n − m) matrix.

Hence

G = [ g1 ]   [ 1 0 . . . 0 a11 . . . a1n−m ]
    [ g2 ] = [ 0 1 . . . 0 a21 . . . a2n−m ]
    [ ⋮  ]   [            ⋮               ]
    [ gm ]   [ 0 0 . . . 1 am1 . . . amn−m ] = [Im A].

Deﬁnition 7.1.10 Two (m, n)-codes C1 = (E1 , D1 ) and C2 = (E2 , D2 ) are called equivalent if, for every a ∈ Zm2 , their respective codewords E1 (a) and E2 (a) differ only in the order of symbols; moreover, the permutation that is required to obtain E1 (a) from E2 (a) does not depend on a.


For example, the two codes

(0 0) → (0 0 0 0) (0 0) → (0 0 0 0)

(0 1) → (0 1 0 1) (0 1) → (0 1 0 1)

(1 0) → (1 0 0 1) (1 0) → (0 1 1 0)

(1 1) → (1 1 0 0) (1 1) → (0 0 1 1)

are equivalent. The permutation that must be applied to the symbols of the ﬁrst code

to obtain the second is (1 3)(2 4).

It is clear that two equivalent codes have the same minimum distance.

Theorem 7.1.7 Let C be a linear (m, n)-code with minimum distance d. Then there

is a systematic linear (m, n)-code with the same minimum distance d.

Proof Let C be a linear (m, n)-code with generator matrix G. When we perform

elementary row operations over the rows of G we do not change Row(G) and hence

the set of codewords (it will change the encoding function, however).

We may, therefore, assume that our matrix G is already in its reduced row echelon

form. Since G has full rank (its rows are linearly independent), we must have m pivot

columns which are the m columns of the identity matrix Im . Let the positions of these

columns be i1 , i2 , . . . , im . Then in a codeword E(a) we will ﬁnd our information

symbols a1 , a2 , . . . , am in positions i1 , i2 , . . . , im . Moving these columns (and hence

the respective coordinates) to the ﬁrst m positions, we will obtain a systematic code

which is equivalent to the given one. □

For example, let C be the linear (3, 6)-code with the generator matrix

G = [ 1 0 1 0 1 1 ]
    [ 0 1 1 1 1 0 ]
    [ 0 0 0 1 1 1 ] .

Adding the third row to the second,

G = [ 1 0 1 0 1 1 ]     [ 1 0 1 0 1 1 ]
    [ 0 1 1 1 1 0 ]  →  [ 0 1 1 0 0 1 ]
    [ 0 0 0 1 1 1 ]     [ 0 0 0 1 1 1 ]

gives us a generator matrix of a new code with the same minimum distance. It is equivalent to the systematic code with the generator matrix

G′′ = [ 1 0 0 1 1 1 ]
      [ 0 1 0 1 0 1 ]
      [ 0 0 1 0 1 1 ] .


The matrix

G = [ 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 1 0 ]
    [ 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1 ]
    [ 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 1 0 0 0 ]
    [ 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 ]
    [ 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 ]
    [ 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 1 1 1 ]
    [ 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 1 1 0 1 1 ]
    [ 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 1 0 1 1 0 1 ]
    [ 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 1 0 1 1 0 ]
    [ 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0 0 0 1 0 1 1 ]
    [ 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1 0 1 ]
    [ 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 ]

is the generator matrix of the famous Golay code. This is a (12, 24)-code and its

minimum distance is 8. It was used by the Voyager I and Voyager II space-crafts

during 1979–1981 to provide error correction when the Voyagers transmitted to

Earth colour pictures of Jupiter and Saturn.

Exercises

1. Find a generator matrix of the linear (4, 7)-code with the encoding function

E(a) = (a1 , a2 , a3 , a1 + a2 + a4 , a2 + a3 , a1 + a3 + a4 , a4 ).

2. Check by inspection that the Golay code is systematic.

3. Show that elementary row operations performed on a generator matrix G do not

change the set of codewords and, in particular, the minimum distance of the code.

4. Let C 1 be the linear (3, 6)-code with the following generator matrix over Z2

G = [ 1 0 1 0 1 0 ]
    [ 1 1 0 0 1 1 ]
    [ 1 1 1 0 0 0 ] .

(b) Find the generator matrix of another linear (3, 6)-code C 2 which is systematic

and equivalent to C 1 .

(c) List all codewords of C 2 and determine its minimum distance.


The generator matrix of a code is a great tool for the sender since with its help

the encoding can be done by means of matrix multiplication. All she needs is to

store the generator matrix which contains all the information about the encoding

function. However, the generator matrix is not very useful at the receiving end. On

the receiving end we need another matrix—the parity check matrix, which we will

introduce below.

Deﬁnition 7.1.11 Let C be a linear (m, n)-code. An (n − m) × n matrix H is called a parity check matrix of C if x ∈ C if and only if HxT = 0. Recalling that the null space of H is Null(H) = {x ∈ Zn2 | HxT = 0}, we can reformulate the above deﬁnition as follows: an (n − m) × n matrix H is a parity check matrix of C if and only if C = Null(H).

Having this matrix at the receiving end, we may quickly check whether the received vector y is a codevector by calculating its syndrome S(y) = HyT . Then y ∈ C if and only if S(y) = 0. If the syndrome is the zero vector, the decoder assumes that no mistakes happened. Later we will learn how the syndrome S(y), if nonzero, can help to correct the mistakes that occurred.

But, ﬁrstly, we have to learn how to construct such a matrix given the generator

matrix G. We will assume that our code is systematic and G has the form G = (Im A),

where A is an arbitrary m × (n − m) matrix. In other words,

G = [ g1 ]   [ 1 0 . . . 0 a11 . . . a1n−m ]
    [ g2 ] = [ 0 1 . . . 0 a21 . . . a2n−m ]
    [ ⋮  ]   [            ⋮               ]
    [ gm ]   [ 0 0 . . . 1 am1 . . . amn−m ] .

Since gi ∈ C, for any i = 1, 2, . . . , m, we must have HgiT = 0 and hence HGT = 0.

We also have GHT = (HGT )T = 0. This means that all columns of H T must be

solutions to the system of linear equations GxT = 0. Since G is already in its reduced

row echelon form, we separate variables to obtain

x1 = −a11 xm+1 − · · · − a1n−m xn ,
x2 = −a21 xm+1 − · · · − a2n−m xn ,
. . .
xm = −am1 xm+1 − · · · − amn−m xn

(of course in Z2 we have −aij = aij however we would like to leave the possibility

of a non-binary alphabet). Setting, as usual, the values of the free variables to be


(xm+1 , xm+2 , . . . , xn ) = (1, 0, . . . , 0), (0, 1, . . . , 0), . . . , (0, 0, . . . , 1),

we obtain a basis {f1 , f2 , . . . , fn−m } for the solution space of the system GxT = 0 by calculating

f1 = (−a11 , −a21 , . . . , −am1 , 1, 0, . . . , 0),
f2 = (−a12 , −a22 , . . . , −am2 , 0, 1, . . . , 0),
. . .
fn−m = (−a1n−m , −a2n−m , . . . , −amn−m , 0, 0, . . . , 1).

We will show that the matrix H with rows {f1 , f2 , . . . , fn−m } is a parity check matrix

for this code. Indeed, HgiT = 0, hence for any codeword c ∈ C we have c = a1 g1 + a2 g2 + · · · + am gm and

HcT = a1 Hg1T + · · · + am HgmT = 0.

Moreover, rank(H) = n − m, so dim Null(H) = n − (n − m) = m = dim C. Hence Null(H) = C and H is indeed a parity check matrix for C. We see that H has the form

H = (−AT | In−m ).

We have proved:

Theorem 7.1.8 Let C be a linear (m, n)-code. If G = (Im | A) is a generator matrix

of C, then H = (−AT | In−m ) is a parity check matrix of C.

This works in the other direction too: given an (n−m)×n matrix H = (A | In−m ),

where A is an (n − m) × m matrix, we can construct a linear (m, n)-code C with the

generator matrix G = (Im | −AT ) and it will have H as its parity check matrix.
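Theorem 7.1.8 translates directly into code. The Python sketch below (an illustration, with our own function name) builds H = (−AT | In−m) from a systematic G = (Im | A) over Z2, where −AT = AT, and checks HgiT = 0 on the generator matrix of Example 7.1.11.

```python
def parity_check_from_systematic(G):
    # G = (I_m | A)  ->  H = (-A^T | I_{n-m}); over Z_2 we have -A^T = A^T.
    m, n = len(G), len(G[0])
    A = [row[m:] for row in G]
    AT = [list(col) for col in zip(*A)]
    return [AT[i] + [1 if j == i else 0 for j in range(n - m)]
            for i in range(n - m)]

G = [[1, 0, 0, 0, 1],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 1, 1]]
H = parity_check_from_systematic(G)
# Check that H g_i^T = 0 mod 2 for every row g_i of G.
ok = all(sum(h * g for h, g in zip(hrow, grow)) % 2 == 0
         for hrow in H for grow in G)
print(H, ok)
```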

Example 7.1.15 Suppose that the generator matrix for a binary (4, 7)-code is

G = [ 1 0 0 0 1 0 1 ]
    [ 0 1 0 0 0 1 1 ]
    [ 0 0 1 0 1 1 0 ]
    [ 0 0 0 1 0 1 0 ] = (I4 | A).


Then

H = [ 1 0 1 0 1 0 0 ]
    [ 0 1 1 1 0 1 0 ]
    [ 1 1 0 0 0 0 1 ] = (AT | I3 ).

To encode the message a = (1 0 1 0) we calculate

b = aG = (1 0 1 0) [ 1 0 0 0 1 0 1 ]
                   [ 0 1 0 0 0 1 1 ]
                   [ 0 0 1 0 1 1 0 ]
                   [ 0 0 0 1 0 1 0 ] = (1 0 1 0 0 1 1).

For the codeword b we have S(b) = HbT = (0 0 0)T . If, however, a vector c with mistakes were received, the syndrome would typically be nonzero; for instance, a mistake in the second position of b would give S(c) = HcT = h2 = (0 1 1)T ≠ (0 0 0)T . If such a c was received, this would show that one or more mistakes happened.

Let hi be the ith column of the parity check matrix, that is, H = (h1 h2 . . . hn ).

We know that a vector b ∈ Zn2 is a codevector if and only if S(b) = 0.

Let a be a codevector and suppose b = a + e. We may treat b as the codevector a

with an error. Our goal is to determine how the syndrome S(b) of the vector b ∈ Zn2

depends on the codevector a and on the error vector e. We will ﬁnd that it does not

depend on a at all! This will allow us to develop a method of error correction.

Lemma 7.1.2 Let a be a codevector and suppose b = a + e, where the error vector e has Hamming weight s and ones in positions i1 , i2 , . . . , is , which corresponds to s mistakes in those positions. Then

S(b) = hi1 + hi2 + · · · + his . (7.4)

Proof By Proposition 7.1.1, e = ei1 + ei2 + · · · + eis , where ej is the jth vector of the standard basis of Zn2 . Then

S(b) = HbT = H(a + e)T = 0 + HeT = H(ei1 + ei2 + · · · + eis )T = hi1 + hi2 + · · · + his . □

We see that, indeed, the syndrome of the received vector depends only on the error

vector and not on the codevector.

Theorem 7.1.9 Let H = (h1 , h2 , . . . , hn ) be an (n − m) × n matrix with entries from Z2 such that the columns of H are nonzero and no two of them coincide. Then any binary linear (m, n)-code

C with H as its parity check matrix corrects all single errors. If a single error occurs

in ith position, then the syndrome of the received vector is equal to the ith column of

H, i.e., hi .

Proof Suppose that a codevector a was sent and the vector b = a + ei was received

(which means that a mistake occurred in the ith position). Then due to (7.4)


S(b) = HbT = hi .

We now know where the mistake happened and can correct it. □
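Theorem 7.1.9 can be turned into a small decoder. In the Python sketch below (ours, for illustration) the test matrix has as columns all seven nonzero binary vectors of length 3, so its columns are distinct and nonzero.

```python
def syndrome(H, y):
    # S(y) = H y^T over Z_2, written as a tuple.
    return tuple(sum(h * v for h, v in zip(row, y)) % 2 for row in H)

def correct_single(H, y):
    # If S(y) is nonzero and equals column i of H, flip position i.
    s = syndrome(H, y)
    if any(s):
        cols = list(zip(*H))
        y = y[:]
        y[cols.index(s)] ^= 1
    return y

H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]
b = [1, 1, 1, 0, 0, 0, 0]            # a codeword: syndrome (0, 0, 0)
c = b[:]
c[1] ^= 1                            # one error, in position 2
print(correct_single(H, c) == b)     # True
```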

Exercises

1. Let

A = [ 1 2 1 2 1 ]
    [ 1 2 1 0 2 ]
    [ 2 1 0 1 0 ]

be a matrix over Z3 .

(a) Find a basis for the null space Null(A) of this matrix.

(b) List all vectors of the Null(A).

(c) Find among the nonzero vectors of Null(A) the vector whose weight is min-

imal.

2. Let us consider a binary code C given by its parity check matrix

H = [ 0 0 1 1 1 0 1 ]
    [ 0 1 0 1 0 1 1 ]
    [ 1 0 0 0 1 1 1 ]
    [ 1 1 1 1 1 1 0 ] .

(a) Compute the generator matrix for C. What is the number of information

symbols for this code?

(b) Will the code C correct any single mistake?

(c) Will the code C correct any two mistakes?

(d) Will the code C detect any two mistakes?

(e) Encode the message vector whose coordinates are all equal to 1;

(f) Decode y1 = (1 1 0 1 0 0 1) and y2 = (1 1 0 1 1 0 0);

(g) Show that a single mistake could not result in receiving the vector z =

(0 1 0 1 1 1 1). Show that two mistakes could result in receiving z.

The pioneering 1950 paper of Richard Hamming1 started a new subject within information theory. Hamming codes, Hamming distance

1 Richard Wesley Hamming (1915–1998). He participated in the Manhattan Project that produced the ﬁrst atomic bombs during World War II. There he was responsible for running the IBM computers in the Los Alamos laboratory, which played a vital role in the project. Later he worked for Bell Labs, after which he became increasingly interested in teaching and taught in a number of leading universities in the USA. Hamming is best known for his work on error-detecting and error-correcting codes. His fundamental paper on this topic, “Error detecting and error correcting codes”, appeared in April 1950 in the Bell System Technical Journal.


and Hamming metric are standard terms used today in coding theory but they are

also used in many other areas of mathematics.

We start with the Hamming (4, 7)-code. Let us consider the binary 3 × 7 matrix

H = (h1 h2 h3 h4 h5 h6 h7 ) = [ 0 0 0 1 1 1 1 ]
                              [ 0 1 1 0 0 1 1 ]
                              [ 1 0 1 0 1 0 1 ] , (7.5)

whose ith column hi is the binary representation of the integer i, for i = 1 to i = 7. Theorem 7.1.9 gives us reason to believe that the (4, 7)-code with this

parity check matrix will be good since by that theorem such a code will correct all

single errors. We also note that all nonzero three-dimensional columns are used in

the construction of H, and every binary 3 × 8 matrix must have either a zero column or at least two equal columns. This says to us that the code with parity check matrix H must be in some

way the optimal (4, 7)-code.

Let us ﬁnd a generator matrix G that will match the parity check matrix H. We

know that by row reducing H we do not change the null-space of H, hence the set

of codewords stays the same. We will therefore be trying to obtain a matrix with

the identity matrix I3 in the last three columns in order to apply Theorem 7.1.8. The

technique is the same as for ﬁnding the reduced row echelon form. We obtain:

H = [ 0 0 0 1 1 1 1 ]     [ 0 0 0 1 1 1 1 ]     [ 0 0 0 1 1 1 1 ]
    [ 0 1 1 0 0 1 1 ]  →  [ 0 1 1 0 0 1 1 ]  →  [ 0 1 1 0 0 1 1 ]
    [ 1 0 1 0 1 0 1 ]     [ 1 0 1 1 0 1 0 ]     [ 1 1 0 1 0 0 1 ]

    [ 0 1 1 1 1 0 0 ]     [ 0 1 1 1 1 0 0 ]
 →  [ 0 1 1 0 0 1 1 ]  →  [ 1 0 1 1 0 1 0 ] = (C | I3 ).
    [ 1 1 0 1 0 0 1 ]     [ 1 1 0 1 0 0 1 ]

By Theorem 7.1.8 the corresponding generator matrix is

G = (I4 | C T ) = [ 1 0 0 0 0 1 1 ]
                  [ 0 1 0 0 1 0 1 ]
                  [ 0 0 1 0 1 1 0 ]
                  [ 0 0 0 1 1 1 1 ] .

To encode the message a = (1 1 1 0) we calculate

b = aG = (1 1 1 0) [ 1 0 0 0 0 1 1 ]
                   [ 0 1 0 0 1 0 1 ]
                   [ 0 0 1 0 1 1 0 ]
                   [ 0 0 0 1 1 1 1 ] = (1 1 1 0 0 0 0).


Suppose now that the vector c = (1 0 1 0 0 0 0) was received, with a mistake in the second position. Its syndrome is

S(c) = HcT = H (1 0 1 0 0 0 0)T = (0 1 0)T = h2 .

Assuming that only one mistake happened, we know that this mistake occurred in the

second position. Hence the vector b = (1 1 1 0 0 0 0) was sent and a = (1 1 1 0)

was the original message.
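Because the columns of H in (7.5) are the binary representations of the positions 1 to 7, the syndrome itself, read as a binary number, names the corrupted position. A decoding sketch under these assumptions (function names are illustrative):

```python
# Syndrome decoding for the (4, 7) Hamming code with the parity check
# matrix H of (7.5).  A nonzero syndrome, read as a binary number, is the
# position of the single corrupted bit.

H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def decode(c):
    """Correct at most one error in c and return the 4 information bits."""
    s = [sum(H[r][j] * c[j] for j in range(7)) % 2 for r in range(3)]
    pos = 4 * s[0] + 2 * s[1] + s[2]   # syndrome as a binary number
    c = list(c)
    if pos:                            # nonzero syndrome: flip that bit
        c[pos - 1] ^= 1
    return c[:4]                       # G = (I4 | C^T), so the first 4 bits

print(decode([1, 0, 1, 0, 0, 0, 0]))  # recovers the message [1, 1, 1, 0]
```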

This code is very interesting. It has 2^4 = 16 codewords and, since it corrects any single error, it has minimum distance of at least 3. So, if we take a ball B1(x) of radius one centred at a codeword x, it will not intersect any of the similar balls of radius one around other codewords. By Theorem 7.1.2, every such ball contains exactly eight vectors of Z_2^7. In total, these balls contain 16 · 8 = 128 = 2^7 vectors, that is, all vectors of Z_2^7. The whole space is the union of these unit balls! This means that the Hamming (4, 7)-code corrects all single mistakes but not a single double mistake, since any double mistake takes the received word into the unit ball of another codeword. Lemma 7.1.2 provides an alternative explanation of why no double mistake will be corrected. Indeed, the syndrome of a double mistake is the sum of the corresponding two columns of H. However, since all nonzero three-dimensional vectors are used as columns of H, the sum of any two columns is a third column of H. This means that any double mistake will be treated as a single mistake and will not be corrected.
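The counting argument above can be confirmed computationally. This illustrative check (not from the text) generates all 16 codewords and verifies that their radius-1 balls are pairwise disjoint and cover all of Z_2^7:

```python
# Check that the radius-1 balls around the 16 codewords of the (4, 7)
# Hamming code partition the whole space of 2^7 = 128 binary vectors.
from itertools import product

G = [[1, 0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1, 1]]

codewords = set()
for a in product([0, 1], repeat=4):
    codewords.add(tuple(sum(a[i] * G[i][j] for i in range(4)) % 2
                        for j in range(7)))

covered = set()
for c in codewords:
    ball = {c} | {c[:j] + (1 - c[j],) + c[j + 1:] for j in range(7)}  # B1(c)
    assert not (ball & covered)   # balls are pairwise disjoint
    covered |= ball

print(len(codewords), len(covered))  # 16 codewords cover 128 vectors
```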

Suppose, for example, that the vector a = (1 1 1 0), encoded as b = (1 1 1 0 0 0 0), was sent, and the vector c = (0 1 1 0 0 0 1) was received, with mistakes in the first and the seventh positions. Its syndrome is

S(c) = HcT = H (0 1 1 0 0 0 1)T = h1 + h7 = (1 1 0)T = h6 .

Hence this double mistake will not be corrected, as it mimics a single mistake in position 6.

The (4, 7) binary Hamming code is the ‘smallest’ code from the inﬁnite family

of Hamming codes.


In general, for every k ≥ 2, the Hamming code of length n = 2^k − 1 is defined by the parity check matrix H, whose rth column contains the binary representation of the integer r, for r = 1, 2, . . . , 2^k − 1.

Example 7.1.16 The Hamming (11, 15)-code is given by its parity check matrix

    ⎡0 0 0 0 0 0 0 1 1 1 1 1 1 1 1⎤
H = ⎢0 0 0 1 1 1 1 0 0 0 0 1 1 1 1⎥ .
    ⎢0 1 1 0 0 1 1 0 0 1 1 0 0 1 1⎥
    ⎣1 0 1 0 1 0 1 0 1 0 1 0 1 0 1⎦

This code corrects all single mistakes.
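The general construction can be sketched in a few lines. The function name below is illustrative; it simply writes the binary representation of r into the rth column:

```python
# Parity check matrix of the Hamming code of length n = 2^k − 1: the rth
# column holds the binary representation of r, most significant bit on top.

def hamming_parity_check(k):
    n = 2 ** k - 1
    return [[(r >> (k - 1 - i)) & 1 for r in range(1, n + 1)]
            for i in range(k)]

H = hamming_parity_check(3)
for row in H:
    print(row)   # reproduces the three rows of the matrix (7.5)
```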

Exercises

1. We have defined the Hamming (4, 7)-code by means of the parity check matrix H and we computed the generator matrix G, where

    ⎡1 0 1 0 1 0 1⎤        ⎡1 0 0 0 0 1 1⎤
H = ⎢0 1 1 0 0 1 1⎥ ,  G = ⎢0 1 0 0 1 0 1⎥ .
    ⎣0 0 0 1 1 1 1⎦        ⎢0 0 1 0 1 1 0⎥
                           ⎣0 0 0 1 1 1 1⎦

(b) Decode the vector v = (1 0 1 1 0 1 1);

(c) Find all strings of length 7 which are decoded to w = (1 0 1 1).

2. A code that, for some k, corrects all combinations of k or fewer mistakes and does not correct all combinations of ℓ mistakes for any ℓ > k, is called perfect. Prove that all codes of the family of Hamming codes are perfect.

There is one particular class of linear codes the construction of which uses some

advanced algebra, and because of that these codes are very effective. In this section

we will consider (m, n)-codes obtained in this way. We will identify our messages

(strings of symbols of length m or vectors from Z_2^m) with polynomials of degree at most m − 1. More precisely, this identification is given by the formula

a = (a0 , a1 , . . . , am−1 ) → a(x) = a0 + a1 x + · · · + am−1 x^{m−1} ;

the message a can be easily recovered from the polynomial a(x). Suppose now that g(x) = g0 + g1 x + · · · + gk x^k is a fixed polynomial of degree k = n − m over Z2. Then we can define an (m, n)-code C as follows. For every a = (a0 , a1 , . . . , am−1 ) ∈ Z_2^m we define

E : a → a(x) → a(x)g(x) = b0 + b1 x + · · · + bn−1 x^{n−1} → b,

where b = (b0 , b1 , . . . , bn−1 ) ∈ Zn2 . Such a code is called a polynomial code and

the polynomial g(x) is called the generator polynomial of this code.
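Polynomial encoding is just coefficient convolution modulo 2. A sketch (names are illustrative; the generator 1 + x² + x³ is the one used in the text's (4, 7) example below):

```python
# Polynomial encoding sketched directly: the message bits are the
# coefficients of a(x), and the codeword consists of the coefficients of
# the product a(x)g(x) over Z2 (coefficient lists, low degree first).

def poly_encode(a, g, n):
    """Coefficients of a(x)g(x) mod 2, padded to length n."""
    b = [0] * n
    for i, ai in enumerate(a):
        for j, gj in enumerate(g):
            b[i + j] = (b[i + j] + ai * gj) % 2
    return b

g = [1, 0, 1, 1]                        # g(x) = 1 + x^2 + x^3
print(poly_encode([1, 0, 0, 0], g, 7))  # e1 -> [1, 0, 1, 1, 0, 0, 0]
```

Note that encoding the basis vectors e1, e2, . . . reproduces the shifted rows of the generator matrix (7.6).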

Theorem 7.1.10 The polynomial code C is linear with the following m × n generator matrix

    ⎡g0 g1 . . . gk                      ⎤
    ⎢   g0 g1 . . . gk                   ⎥
G = ⎢      g0 g1 . . . gk                ⎥ ,        (7.6)
    ⎢            . . .                   ⎥
    ⎣               g0 g1 . . . gk       ⎦

where the blank entries are zeros.

Proof The linearity of the encoding function follows from the distributive law for

polynomials. Suppose that E(a1 ) = b1 and E(a2 ) = b2 with a1 (x), b1 (x), a2 (x),

b2 (x) being the corresponding polynomials. We need to show that E(a1 + a2 ) =

b1 + b2 . Indeed, we have

(a1 + a2 )(x)g(x) = a1 (x)g(x) + a2 (x)g(x) = b1 (x) + b2 (x) → b1 + b2 ,

as required.

To determine the generator matrix we need to calculate E(e1 ), . . . , E(em ). We

have

ei → x^{i−1} → x^{i−1} g(x) = g0 x^{i−1} + g1 x^i + · · · + g_{n−m} x^{n−m+i−1} .

This must be the ith row of the generator matrix G. This gives us (7.6). �


Although for a polynomial code the generator matrix (7.6) is easy to obtain, it is

sometimes more convenient (and gives more insight) to multiply polynomials and

not matrices.

Consider, for example, the generator polynomial g(x) = 1 + x^2 + x^3, which defines an (m, m + 3)-code for all m. Let us choose m = 4. Then we obtain a (4, 7)-code whose generator matrix will be

    ⎡1 0 1 1 0 0 0⎤
G = ⎢0 1 0 1 1 0 0⎥ .
    ⎢0 0 1 0 1 1 0⎥
    ⎣0 0 0 1 0 1 1⎦

By row reducing G (this changes the encoding function but not the set of codewords), we get

    ⎡1 0 1 1 0 0 0⎤      ⎡1 0 1 0 0 1 1⎤      ⎡1 0 0 0 1 0 1⎤
G = ⎢0 1 0 1 1 0 0⎥  −→  ⎢0 1 0 0 1 1 1⎥  −→  ⎢0 1 0 0 1 1 1⎥ .
    ⎢0 0 1 0 1 1 0⎥      ⎢0 0 1 0 1 1 0⎥      ⎢0 0 1 0 1 1 0⎥
    ⎣0 0 0 1 0 1 1⎦      ⎣0 0 0 1 0 1 1⎦      ⎣0 0 0 1 0 1 1⎦

Since it is now in the form (I4 | A), by Theorem 7.1.8 we may obtain its parity check

matrix H as (AT | I3 ), that is,

                 ⎡1 1 1 0 1 0 0⎤
H = (AT | I3 ) = ⎢0 1 1 1 0 1 0⎥ .
                 ⎣1 1 0 1 0 0 1⎦

From this we observe that the code we obtained is equivalent to the Hamming (4, 7)-code, since H = (h5 , h7 , h6 , h3 , h4 , h2 , h1 ), where h1 , h2 , . . . , h7 are the columns of the parity check matrix (7.5) of the Hamming code.

Exercises

Let g(x) = 1 + x + x^3. Consider the polynomial (5, 8)-code C with g(x) as generator polynomial. For this code

1. Encode a = (1 0 1 0 1).

2. Find the generator matrix G of the code C.

3. Find a systematic linear code C′ (in terms of its parity check matrix) which is equivalent to C.


This is one particularly good class of polynomial codes which was discovered inde-

pendently around 1960 by Bose, Chaudhuri and Hocquenghem. They enable us to

correct multiple errors. Since the construction of the generator polynomial for these

codes is based on a finite field of a certain cardinality, we have to construct one first, say F, and then find a primitive element α of F.

In Chap. 5 we discussed a method of constructing a field which consists of p^n elements. It is unique up to isomorphism and is denoted by GF(p^n). To construct it we need to take Zp, find an irreducible polynomial m(x) over Zp of degree n and form

F = Zp [x]/(m(x)). There are very good tables of irreducible polynomials over Zp

of virtually any degree (see, for example, [2]).
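Arithmetic in such a field is easy to sketch in code. Below is an illustrative model of GF(2^4) = Z2[x]/(x^4 + x + 1), the field used in Example 7.1.19: elements are 4-bit integers, and multiplying by x is a shift followed by reduction by the modulus whenever x^4 appears:

```python
# A minimal sketch of GF(2^4) = Z2[x]/(x^4 + x + 1): elements are 4-bit
# numbers; times_x shifts and reduces by the modulus.  Names illustrative.

MOD = 0b10011        # x^4 + x + 1

def times_x(a):
    a <<= 1
    return a ^ MOD if a & 0b10000 else a

# The powers of x run through all 15 nonzero elements, so x is primitive.
powers, a = [], 1
for _ in range(15):
    powers.append(a)
    a = times_x(a)
print(len(set(powers)), times_x(powers[-1]) == 1)  # 15 True
```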

BCH codes work equally well for binary and for non-binary alphabets but in this

section we put the main emphasis on the binary case. Therefore we will consider a

ﬁeld F = GF(2r ) for some r which is an extension of Z2 . The general case is not

much different with only minor changes needed.

As usual we will consider (m, n)-codes, where m denotes the number of informa-

tion symbols and n the length of codewords. The minimum distance of the code we

will denote by d. For BCH codes we, ﬁrst, have to decide on the length of the code-

words n and on the minimum distance d, then m will depend on these two parameters

but this dependence is not straightforward.

This restriction on the length (to n = 2^r − 1, as explained below) is not important in applications because it is not the length of codewords that is practically important (we may divide our messages

into segments of any length) but the speed of transmission, which is characterised

by the ratio m/n, and the error-correcting capabilities of the code, i.e., the minimum

distance d.

We use the extension Z2 ⊆ F for the construction. The length of the word n will

be taken to be the number of elements in the multiplicative group of the ﬁeld F. As

we consider the binary situation, this number can only be n = 2^r − 1, where r is an arbitrary positive integer, since a field F of characteristic 2 may have only 2^r elements for some r.

Let α be a primitive element of F. Then it has multiplicative order n and the powers 1 = α^0, α, α^2, . . . , α^{n−1} are all different. To construct g(x) we need to know the minimal annihilating polynomials of α, α^2, . . . , α^{d−1}. Let mi(x) be the minimal annihilating polynomial of α^i.

Theorem 7.1.11 The polynomial code of length n with the generator polynomial

g(x) = lcm(m1 (x), m2 (x), . . . , md−1 (x))

has minimum distance at least d.

Proof Since this code is linear, the minimum distance is the same as the minimum

weight. Hence it is enough to prove that there are no codewords of weight d − 1 or


less. Since the code is polynomial, all vectors from Z_2^n are identified with polynomials

of degree smaller than n and the codewords are identiﬁed with polynomials which

are divisible by g(x). Hence, we have to show that there are no polynomials of degree

smaller than n which are multiples of g(x) and have less than d nonzero coefﬁcients.

Suppose on the contrary that a codeword is represented by a polynomial c(x) = c1 x^{i1} + c2 x^{i2} + · · · + cd−1 x^{id−1} with fewer than d nonzero coefficients, where i1 < i2 < · · · < id−1 < n. Since c(x) is divisible by g(x), it is annihilated by α, α^2, . . . , α^{d−1}, i.e.,

c(α) = c(α^2 ) = · · · = c(α^{d−1} ) = 0.

Written out explicitly, these equations are

c1 α^{i1} + c2 α^{i2} + · · · + cd−1 α^{id−1} = 0
c1 α^{2 i1} + c2 α^{2 i2} + · · · + cd−1 α^{2 id−1} = 0
. . .
c1 α^{(d−1)i1} + c2 α^{(d−1)i2} + · · · + cd−1 α^{(d−1)id−1} = 0.

Setting βj = α^{ij}, this means that the homogeneous system of linear equations

β1 x1 + β2 x2 + · · · + βd−1 xd−1 = 0
β1^2 x1 + β2^2 x2 + · · · + βd−1^2 xd−1 = 0
. . .
β1^{d−1} x1 + β2^{d−1} x2 + · · · + βd−1^{d−1} xd−1 = 0

has a nontrivial solution (c1 , c2 , . . . , cd−1 ). This can happen only if the determinant

of this system vanishes. This, however, contradicts the classical result of the theory

of determinants that, for any k > 1, the Vandermonde determinant

| β1      β2      . . .  βk      |
| β1^2    β2^2    . . .  βk^2    |
| . . .   . . .   . . .  . . .   |
| β1^k    β2^k    . . .  βk^k    |

is zero if and only if βs = βt for some s ≠ t such that s ≤ k and t ≤ k (see the Appendix for the proof). Indeed, in our case k = d − 1 and βs = α^{is} ≠ α^{it} = βt for s ≠ t, because is ≠ it, both are at most n − 1, and α has multiplicative order n. This contradiction proves the theorem. �

Lemma Let m(t) = t^k + a1 t^{k−1} + · · · + ak be the minimal annihilating polynomial of α over Z2. Then m(t) is also the minimal annihilating polynomial of α^2.


Proof We note, first, that ai^2 = ai, as 0^2 = 0 and 1^2 = 1 for ai ∈ {0, 1}. We also note that, since 2x = 0 for all x ∈ F, we have (x + y)^2 = x^2 + y^2 for all x, y ∈ F and, by induction, (x1 + · · · + xs)^2 = x1^2 + · · · + xs^2. Hence

0 = m(α)^2 = (α^k + a1 α^{k−1} + · · · + ak )^2 = (α^2 )^k + a1 (α^2 )^{k−1} + · · · + ak · 1^2 = m(α^2 ).

Hence m(t) is also an annihilating polynomial for α^2. Therefore the minimal irreducible polynomial of α^2 must divide m(t). Since m(t) is irreducible, this is possible only if it coincides with m(t). �

Example 7.1.19 Suppose that we need a code which corrects any two errors and has

length 15. Hence d = 5, and we need a field containing 16 elements. Such a field F = Z2[x]/(x^4 + x + 1) was constructed in Example 5.2.3. We also saw that the multiplicative order of x was 15, hence x is a primitive element of F. Let α = x.

For correcting any two mistakes we need a code with minimum distance d = 5.

Theorem 7.1.11 tells us that we need to take the generator polynomial g(t) = lcm(m1(t), m2(t), m3(t), m4(t)). By the lemma above, m1(t) = m2(t) = m4(t). Hence g(t) = m1(t)m3(t), and we have to calculate m3(t), which is the minimal annihilating polynomial for β = x^3. Using the table in Example 5.2.3, we calculate that β^2 = x^6 = x^2 + x^3, β^3 = x^9 = x + x^3, β^4 = x^{12} = 1 + x + x^2 + x^3. The elements 1, β, β^2, β^3, β^4 must be linearly dependent in the 4-dimensional vector space F, and we can find the linear dependency between them using the Linear Dependency Relationship Algorithm. By row reducing the following matrix to its row reduced

echelon form

⎡1 0 0 0 1⎤        ⎡1 0 0 0 1⎤
⎢0 0 0 1 1⎥  rref  ⎢0 1 0 0 1⎥
⎢0 0 1 0 1⎥  −→   ⎢0 0 1 0 1⎥
⎣0 1 1 1 1⎦        ⎣0 0 0 1 1⎦

we find that β^4 = 1 + β + β^2 + β^3, so m3(t) = t^4 + t^3 + t^2 + t + 1. Since m1(t) = t^4 + t + 1, we obtain

g(t) = (t^4 + t + 1)(t^4 + t^3 + t^2 + t + 1) = t^8 + t^7 + t^6 + t^4 + 1.
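The product of the two minimal polynomials can be checked mechanically; this illustrative helper multiplies Z2 polynomials given as coefficient lists (low degree first):

```python
# Verifying g(t) = m1(t) * m3(t) over Z2 by coefficient convolution.

def poly_mul(p, q):
    """Multiply two Z2 polynomials given as coefficient lists (low first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] ^= pi & qj
    return r

m1 = [1, 1, 0, 0, 1]     # t^4 + t + 1
m3 = [1, 1, 1, 1, 1]     # t^4 + t^3 + t^2 + t + 1
print(poly_mul(m1, m3))  # coefficients of 1 + t^4 + t^6 + t^7 + t^8
```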


Now we may say that m = n − deg (g) = 15 − 8 = 7 and our code C will be a

(7, 15)-code. It will correct any two errors.

A more practical example is a code widely used in European data communication

systems. It is a binary (231, 255)-code with a guaranteed minimum distance of 7.

The field consisting of 2^8 = 256 elements is used and the encoding polynomial has degree 24.

Exercises

1. Construct a binary (m, n)-code with the length of codewords n = 15, which

corrects all triple errors, in following steps:

(a) Using the field K = Z2[x]/(x^4 + x^3 + 1), compute the generating polynomial

g(t) of a binary BCH code with the length of the codewords n = 15 and with

a minimum distance 7.

(b) What is the number m of information symbols?

(c) Write down the generating matrix G of this BCH code.

(d) Encode the message which is represented by the string of m ones.

2. In European data communication systems a binary BCH (231, 255)-code is used

with guaranteed minimum distance 7. Using GAP ﬁnd the generator polynomial

of this code.

Non-binary codes have many different uses. Any ﬁnite ﬁeld Zp can be used as an

alphabet of a code if the channel allows us to distinguish p different symbols. Non-

binary codes can be used as an intermediate step in the construction of good binary

codes, and they can also be used in the construction of ﬁngerprinting codes, which

we will discuss in the next section.

We will again consider (m, n)-codes. The encoding function of such a code will be a

mapping (normally linear) E : F m → F n for a certain ﬁnite ﬁeld F which serves as

the alphabet. The Hamming weight and the Hamming distance are deﬁned exactly

as for binary codes.

Then wt(u) = 4, wt(v) = 3 and

d(u, v) = wt(u − v) = wt (0 2 0 0 0 1 0) = 2.

If u was sent and v was received, then the error vector is e = v −u = (0 1 0 0 0 2 0).


With non-binary codes we don’t have the luxury that −a = a anymore. With

ternary codes we have −a = 2a instead! But the following theorem is still true:

Theorem 7.2.1 A code C detects all combinations of k or fewer errors if and only

if dmin (C) ≥ k + 1 and corrects all combinations of k or fewer errors if and only if

dmin (C) ≥ 2k + 1.

The error correction capabilities of any code will again be dependent on the

minimum distance of the code, and the minimum distance for a linear code will be

equal to the minimum weight.

The concepts of generator matrix G and parity check matrix H are the same.

A little reﬁnement must be made for ﬁnding G from H and the other way around.

Namely, if G = (Im | A), then H = (−AT | In−m ). Theorem 7.1.9 must also be slightly generalised to allow the design of non-binary error-correcting codes capable

of correcting all single mistakes.

Theorem 7.2.3 A linear (non-binary) code with parity check matrix H corrects all

single mistakes if and only if no one column of H is a multiple of another column.

Proof A single mistake in position i corresponds to an error vector e with a single nonzero coordinate a in position i, for some i and some 0 ≠ a ∈ Zp, i.e.,

e = (0 . . . 0 a 0 . . . 0) = a(0 . . . 0 1 0 . . . 0).

The syndrome of the received vector is then

HeT = ahi ,

where hi is the ith column of H. If no column of H is a multiple of another, then, given the syndrome ahi, we can find both i and a. If some column were a multiple of another, the identification of the mistake would be impossible. �

For example, we can construct a ternary code correcting all single mistakes by defining it by its parity check matrix

    ⎡0 0 0 0 1 1 1 1 1 1 1 1 1⎤
H = ⎢0 1 1 1 0 0 0 1 1 1 2 2 2⎥ .
    ⎣1 0 1 2 0 1 2 0 1 2 0 1 2⎦

7.2 Non-binary Error-Correcting Codes 201

The secret behind this matrix is that every nonzero column vector from Z_3^3 is either a column of H or a multiple of such a column. By Theorem 7.2.3, this code is a (10, 13)-code that corrects any single mistake. For example, the syndrome

HyT = (2 0 1)T = 2h7 ,

for y ∈ Z_3^13 shows that a mistake happened in the 7th position, and it should be corrected by subtracting 2 from (or, equivalently, adding 1 to) the coordinate y7.
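The decoding rule just described can be sketched as a search for the unique column multiple matching the syndrome. Names below are illustrative:

```python
# Single-error correction with the ternary (10, 13)-code above: the
# syndrome Hy^T equals a*h_i for exactly one column h_i of H and one
# a in {1, 2}, which reveals the position i and the size a of the error.

H = [
    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 2, 2, 2],
    [1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
]

def correct(y):
    s = [sum(H[r][j] * y[j] for j in range(13)) % 3 for r in range(3)]
    if s == [0, 0, 0]:
        return list(y)                      # no error
    for i in range(13):
        for a in (1, 2):
            if all((a * H[r][i]) % 3 == s[r] for r in range(3)):
                y = list(y)
                y[i] = (y[i] - a) % 3       # subtract the error
                return y

y = [0] * 13
y[6] = 2                  # a single error of size 2 in position 7
print(correct(y))         # the zero codeword is restored
```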

Exercises

In the exercises below, all matrices and codes are ternary, i.e., over Z3 .

1. Suppose the matrix

     ⎡1 2 1 2 1 1⎤
H1 = ⎢1 2 1 0 2 1⎥
     ⎣2 1 0 1 0 2⎦

is taken as a parity check matrix of a ternary error correcting code C1 . Will this

code correct all single errors?

2. Find the generator matrix for the code C2 with the following parity check matrix

     ⎡1 2 1 2 1 1⎤
H2 = ⎢1 2 1 0 2 2⎥ .
     ⎣2 1 0 1 0 1⎦

3. Suppose that the code C2 was used. Decode the vector y = (0 2 2 2 2 2).

No changes at all should be made for polynomial codes and BCH codes. Among

non-binary BCH codes Reed–Solomon codes are of special practical importance.

They are also widely used to build other good codes, including good binary codes.

Let F be a finite field with q elements and let α be one of its primitive elements. Let d > 1 be a positive integer such that |F| > d − 1. A Reed–Solomon (or RS) code over F is a polynomial (q − d, q − 1)-code with the generator polynomial

g(x) = (x − α)(x − α 2 ) . . . (x − α d−1 ). (7.9)

The Reed–Solomon code with the generator polynomial (7.9) has a minimum distance of at least d.


Proof We consider the trivial extension of ﬁelds F ⊆ F. Let mi (x) be the minimal

irreducible polynomial of α i over F. Then mi (x) = x − α i and we see that the RS

code is a BCH code. By Theorem 7.1.11 its guaranteed minimum distance is d. �

Example 7.2.3 Let F = Z2[t]/(t^2 + t + 1). Then F = {0, 1, α, β}, where α = t and β = t + 1. We note that β = α^2, so α is a primitive element of F. The RS (2, 3)-code over F with generator polynomial g(x) = x + α (which is the same as x − α) will have minimum distance 2. It will have 4^2 = 16 codevectors. Let us encode the message

(α β). We have

a(x)g(x) = (α + βx)(x + α) = α^2 + (α + αβ)x + βx^2 = β + βx + βx^2 → (β β β).

The full list of codevectors is:

(0 0 0) (α 1 0) (0 α 1) (α β 1)
(β α 0) (0 β α) (β 1 α) (1 1 1)
(1 β 0) (0 1 β) (1 α β) (α α α)
(β 0 1) (α 0 β) (1 0 α) (β β β)

Now let F = Z5 and α = 2, which is a primitive element of Z5. The RS (2, 4)-code over F with generator polynomial g(x) = (x − α)(x − α^2) = (x − 2)(x − 4) = x^2 + 4x + 3 will have minimum distance 3. It will have 5^2 = 25 codevectors.
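This small RS code over Z5 can be enumerated and checked computationally. The sketch below (illustrative, not from the text) encodes all 25 messages with g(x) = x² + 4x + 3 and verifies that the minimum distance is exactly 3 = n − m + 1:

```python
# Enumerate the RS (2, 4)-code over Z5 with g(x) = 3 + 4x + x^2 and
# verify its minimum distance.
from itertools import product

g = [3, 4, 1]                                  # 3 + 4x + x^2, low first

def encode(a):                                 # coefficients of a(x)g(x) mod 5
    b = [0] * 4
    for i, ai in enumerate(a):
        for j, gj in enumerate(g):
            b[i + j] = (b[i + j] + ai * gj) % 5
    return tuple(b)

words = [encode(a) for a in product(range(5), repeat=2)]
dmin = min(sum(u != v for u, v in zip(w1, w2))
           for w1 in words for w2 in words if w1 != w2)
print(len(set(words)), dmin)                   # 25 codewords, distance 3
```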

The Reed–Solomon codes are among the best known. To substantiate this claim

let us prove the following

Theorem 7.2.5 (The Singleton bound) Let C be a linear (m, n)-code. Then

dmin (C) ≤ n − m + 1.

Proof Let us consider the codeword E(e1 ) = g1 . It has only one nonzero information

symbol. It has n − m check symbols which may also be nonzero. In total, wt(g1 ) ≤

n − m + 1. But

dmin (C) = wtmin (C) ≤ wt(g1 ) ≤ n − m + 1.

Now we can show that any Reed–Solomon code achieves the Singleton bound.


Proof Let us consider the Reed–Solomon code C of length n with the generator

polynomial

g(x) = (x − α)(x − α 2 ) . . . (x − α d−1 ).

Let m be the number of information symbols. We know that dmin (C) ≥ d since d

is the guaranteed minimum distance of this code. Since the degree of the generator

polynomial is d − 1, this will be the number of check symbols of this polynomial

code, i.e., d − 1 = n − m. Hence dmin (C) ≥ d = n − m + 1. By the previous theorem

we obtain dmin (C) = n − m + 1 and C achieves the Singleton bound. �

As we mentioned, good binary codes can be obtained from RS codes. Let F be a field of 2^r elements, n = 2^r − 1. We know that F is an r-dimensional vector space over

Z2 and any element of F can be represented as a binary r-tuple. First we construct

an RS (m, n)-code over F and then, in each codeword we replace every element of

F with the corresponding binary tuple. We obtain an (rm, rn)-code which is binary.

Such codes are very good in correcting bursts of errors (several errors occurring in

close proximity) because such multiple errors affect not too many elements of F in

codewords of the RS-code and can be therefore corrected. Such codes are used in

CD-players because any microscopic defect on a disc results in a burst of errors.

We see that our choice of a code might be a result of the selected model for mistakes: when they are random and independent we use one type of code; when they are highly dependent (and come in bursts) we use another type of code.

Example 7.2.5 In Example 7.2.3, using the basis {1, α} for F, we may represent the

elements of F as follows:

0 → (0 0), 1 → (1 0), α → (0 1), β → (1 1).

Then we will obtain a binary (4, 6)-code with the following codevectors:

(0 0 0 0 0 0) (0 1 1 0 0 0) (0 0 0 1 1 0) (0 1 1 1 1 0)
(1 1 0 1 0 0) (0 0 1 1 0 1) (1 1 1 0 0 1) (1 0 1 0 1 0)
(1 0 1 1 0 0) (0 0 1 0 1 1) (1 0 0 1 1 1) (0 1 0 1 0 1)
(1 1 0 0 1 0) (0 1 0 0 1 1) (1 0 0 0 0 1) (1 1 1 1 1 1).
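The expansion step is mechanical: each field symbol is replaced by its coordinate pair in the basis {1, α}. A sketch, using the characters 'a' and 'b' to stand for α and β (the mapping table is consistent with the codevector list above):

```python
# Turning a codeword over GF(4) into a binary word by replacing every
# symbol with its coordinate pair in the basis {1, α}.

bits = {'0': (0, 0), '1': (1, 0), 'a': (0, 1), 'b': (1, 1)}

def expand(word):
    return tuple(bit for sym in word for bit in bits[sym])

print(expand(('a', '1', '0')))   # the codeword (α 1 0) -> (0 1 1 0 0 0)
```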

Exercises

In a series of exercises below we construct a ternary BCH-code of length n = 8 with

minimum distance 4 using the field F = Z3[x]/(x^2 + 2x + 2).

1. Show that α = x is a primitive element of F. Build a ‘table of powers’ of α.

2. Show that the minimal annihilating polynomials of α, α^2 and α^3 are t^2 + 2t + 2, t^2 + 1 and t^2 + 2t + 2, respectively.
3. Find the generator polynomial g(t) of the code C.


4. How many information symbols does this code have?

5. Find the generator matrix G of C.

The rapid growth of the digital economy, facilitated by spread of broadband avail-

ability, and rapid increases in computing power and storage capacity, has created

a global market for content and rights holders of intellectual property. But it also creates a threat that, without adequate means of protection, piracy will prevent this

market from functioning properly.

Managing intellectual property in electronic environments is not an easy task. On

the one hand owners of the content would like to sell it for proﬁt to paying customers

but at the same time to protect it from any further illegal distribution. There are many

ways to do so. One avenue has been opened by the recent development of fingerprinting

codes that provide combinatorial and algebraic methods of tracing illegally ‘pirated’

data. The idea is that a codeword might be embedded in the content (software, music,

movie) in such a way that any illegally produced copies will reveal the distributor.

For example, such a situation emerges in the context of pay TV, where only

paying customers should be able to view certain programs. The broadcasted signal

is normally encrypted and the decryption keys are sent to the paying subscribers. If

an illegal decoder is found, the source of its decryption keys must be identiﬁed.

Fingerprinting techniques have been used for quite some time; ﬁngerprints have

been embedded in digital video, documents and computer programs. However, only

recently has it become possible to give protection against colluding malicious users.

This is what ﬁngerprinting codes are about. This section is largely based on the

groundbreaking paper of Boneh and Shaw [4] and also on the paper by Staddon

et al. [5].

There are numerous ways to embed a codeword identifying the user in the content

which is normally represented as a ﬁle. A copy of the ﬁle sold to the user can

therefore be characterised by a vector x = (x1 , x2 , . . . , xn ) ∈ Z_q^n specific to this particular copy. This is a fingerprint of this copy. Any subset C ⊂ Z_q^n may be used

as the set of ﬁngerprints and will be called a ﬁngerprinting (watermarking) code.

A malicious coalition of users may try to create a pirate copy of the product

by trying to identify the embedded ﬁngerprint and to change it. To achieve this,

they might compare their ﬁles—for example, using the diff command—and ﬁnd

7.3 Fingerprinting Codes 205

positions in which their ﬁles differ. These will certainly belong to the code so the

coalition may discover some but not all symbols of the ﬁngerprint. They might change

the symbols in the identiﬁed positions with the goal of producing another legitimate

copy of the product that was sold to another user (or has not yet been sold). This way

they might ‘frame’ an innocent user.

The owner of the property rights for the content would like to design a scheme

that enables the identiﬁcation of at least one member of the coalition that produced a

pirated copy. As a bottom line, the scheme should make it infeasible for a malicious

coalition to frame an innocent user by producing their ﬁngerprint. Of course, we

have to make the assumption that the malicious coalition is not too large (here there is a clear analogy with error-correcting codes, which likewise are effective only if not too many mistakes occurred during the transmission).

Let us now proceed to formal deﬁnitions.

Definition 7.3.1 Let X ⊆ Z_q^n. For any coordinate i we define the projection

Pi (X) = {xi | x ∈ X}.

In other words Pi (X) is the set of all ith coordinates of the words from X.

Example 7.3.1 Let X = {x, y, z}, where

x = (0 1 2 3),
y = (0 0 2 2),
z = (0 1 3 1).

Then P1 (X) = {0}, P2 (X) = {0, 1}, P3 (X) = {2, 3}, P4 (X) = {1, 2, 3}.
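The projection and descendant notions are easy to model computationally. An illustrative sketch (names are not from the text): a word y is a descendant of X exactly when every coordinate of y lies in the corresponding projection.

```python
# Projections and descendant membership for a set X of equal-length words.

def projections(X):
    return [set(col) for col in zip(*X)]

def is_descendant(y, X):
    return all(yi in Pi for yi, Pi in zip(y, projections(X)))

X = [(0, 1, 2, 3), (0, 0, 2, 2), (0, 1, 3, 1)]   # x, y, z of Example 7.3.1
print(projections(X))                  # {0}, {0, 1}, {2, 3}, {1, 2, 3}
print(is_descendant((0, 0, 3, 3), X))  # True
print(is_descendant((0, 2, 2, 2), X))  # False: 2 is not in P2(X)
```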

Definition 7.3.2 We also define the envelope of X as

desc(X) = P1 (X) × P2 (X) × · · · × Pn (X).

Elements of the envelope are called descendants of X and elements from X are called

their parents. It is clear that X ⊆ desc(X).

A descendant of X does not have to coincide with any of the vectors in X but may take, say, its 1st coordinate from x5, the second from x2 and all

the rest from x3 . For example, in Example 7.3.1 vector (0 0 3 3) is a descendant of

X but vector (0 2 2 2) is not.

Deﬁnition 7.3.3 For any positive integer w, we will also deﬁne a restricted envelope

descw (X), which consists of all descendants of subsets of X of cardinality w.

We illustrate the difference between desc(X) and descw (X) in the following

example.


Example 7.3.2 Let X = {x, y, z}, where

x = (1 0 0),
y = (0 1 0),
z = (0 0 1).

Then the vector (1 1 1) belongs to desc3 (X) = desc(X), but (1 1 1) ∉ desc2 (X).

Example 7.3.3 Let C ⊂ Z_4^4 be the fingerprinting code consisting of the vectors

u = (0 1 2 3),

v = (1 2 3 0),

w = (2 3 0 1),

x = (3 0 1 2),

y = (0 0 0 0),

z = (1 1 1 1).

Consider s = (0 2 1 1). We see that s ∈ desc3 (C) but s ∉ desc2 (C). To prove the last statement we note that for s to be a descendant of a pair of vectors from C, one of them must be either u or y (otherwise we cannot get the first coordinate 0). Neither of these two vectors has 2 as its second coordinate, hence the second vector in the pair must be v. But P4 ({u, v, y}) does not contain 1. Hence s ∉ desc2 (C).

Exercises

1. Let X = {x1 , x2 , x3 } ⊂ Z_3^9, where

x1 = (1 1 1 0 0 0 2 2 2),

x2 = (1 1 2 2 0 0 1 1 2),

x3 = (1 2 2 0 2 0 1 2 0).

(b) Find the number of elements in the envelope desc(X).

(c) Write down a vector y which belongs to desc2 (X) but for which no parent

can be identiﬁed.

2. Give an example of a set of vectors X such that |X| > 1 and desc(X) = X.

3. Suppose X = {x1 , x2 , . . . , xk } ⊆ Z_q^n and |Pi (X)| = mi for i = 1, . . . , n. Prove that |desc(X)| = m1 · . . . · mn.


One goal that immediately comes to our mind is to secure that a coalition of malicious

users cannot frame an innocent user. Of course, such protection can be put in place

only against reasonably small malicious coalitions in a direct analogy with error-

correcting codes where the decoder is capable of correcting only a limited number

of mistakes.

Deﬁnition 7.3.4 A code C is called w-frameproof (w-FP code) if for every subset

X ⊂ C such that |X| ≤ w we have

desc(X) ∩ C = X.

This means that no coalition of at most w users can frame another user, who is not in the coalition, by producing the fingerprint of that user.

Example 7.3.4 The code C consisting of the n elements of the standard basis of Z_q^n

e1 = (1 0 0 . . . 0),
e2 = (0 1 0 . . . 0),
. . .
en = (0 0 0 . . . 1)

is w-frameproof for every w.

Example 7.3.5 The code in Example 7.3.3 is 3-frameproof. Indeed, the ﬁrst four

users cannot be framed by any coalition to which they do not belong because each

of them contains 3 in the position where all other users have symbols different from

3. It is also easy to see that the two last users cannot be framed by any coalition of

three or fewer users.

The following function will be useful in our proofs. For any two words u, v of

length n we deﬁne I(u, v) = n − d(u, v). In other words, I(u, v) is the number of

coordinates where u and v agree.

As in the theory of error-correcting codes, the minimum distance dmin (C) between

any two distinct codewords is an important parameter.

Theorem 7.3.1 Suppose that a code C of length n has minimum distance

dmin (C) > n (1 − 1/w).

Then C is w-frameproof.

Proof Suppose, on the contrary, that for some X = {x1 , x2 , . . . , xw } ⊆ C there is a codeword y ∈ C \ X such that y ∈ desc(X). Since y, xi ∈ C, for every i = 1, 2, . . . , w we


have d(y, xi ) > n (1 − 1/w) and hence we obtain I(y, xi ) = n − d(y, xi ) < n − (n − n/w) = n/w. This means that y and xi coincide in fewer than n/w positions and, hence, fewer than n/w positions of y could come from xi. Since we have exactly w elements in X, it follows that fewer than w · n/w = n coordinates in y can come from vectors of X. Hence at least one coordinate of y, say yj, does not coincide with the jth coordinate of any of the vectors x1 , x2 , . . . , xw, and therefore yj ∉ Pj (X). This contradicts the assumption that y is a descendant of X. �

Exercises

The code C ⊂ {1, 2, 3}^6 consists of six codewords:

c4 = (1 2 3 1 2 3), c5 = (2 3 1 2 3 1), c6 = (3 1 2 3 1 2).

2. Prove that it is 2-frameproof.

Definition 7.3.5 We say that a code C has the identifiable parent property of order w (w-IPP code) if for any x ∈ descw (C) the family of subsets

{X ⊆ C : |X| ≤ w and x ∈ desc(X)}        (7.10)

has a nonempty intersection.

What this says is that, for any w-IPP code and for any x ∈ descw (C) this vector

cannot be produced without the participation of a certain user: the one who is in the

intersection of the family of subsets (7.10). Therefore this user can be identiﬁed. The

w-IPP property is stronger than w-frameproofness.

Proposition 7.3.1 Any code C with the identiﬁable parent property of order w is

w-frameproof.

Proof Suppose that the w-IPP property holds but a certain coalition X with no more

than w users can frame an innocent user c ∈ C \ X. Then c ∈ desc(X) and c ∈

desc({c}). Since {c} ∩ X = ∅, this contradicts the w-IPP property. �

Let us now give a non-trivial example of a w-IPP code.

Example 7.3.6 The following code has the identiﬁable parent property of order 2

and was constructed with the help of a Reed–Solomon code:

c1 = (1 1 1 1 1),


c2 = (1 2 2 2 2),

c3 = (1 3 3 3 3),

c4 = (1 4 4 4 4),

c5 = (2 1 2 3 4),

c6 = (2 2 1 4 3),

c7 = (2 3 1 4 2),

c8 = (2 4 3 2 1),

c9 = (3 1 4 2 3),

c10 = (3 2 3 1 4),

c11 = (3 3 2 4 1),

c12 = (3 4 1 3 2),

c13 = (3 4 1 3 2),

c14 = (4 2 4 3 1),

c15 = (4 4 2 1 3).

It is really hard to check directly that this code is indeed 2-IPP, but it is relatively easy to check that dmin (C) = 4. As we will see later, Theorem 7.3.3 implies the 2-IPP property for this code.

Codes with the identiﬁable parent property normally require a large alphabet. The

binary alphabet is the worst one.

Proposition 7.3.2 There does not exist a binary 2-IPP code C with |C| ≥ 3.

Proof Suppose that C contains three distinct codewords x, y and z. From them we construct a descendant u in the following way. For each i, we consider the coordinates xi, yi,

zi ; among them there will be a majority of zeros or a majority of ones. We deﬁne ui

to coincide with the majority. Then u belongs to each of the desc(x, y), desc(x, z),

and desc(y, z). However, {x, y} ∩ {x, z} ∩ {y, z} = ∅. �

We see from the Example 7.3.6 that it is not too easy to check that the code in the

above example satisﬁes the identiﬁable parent property of order 2. But there exists

one slightly stronger property that is much easier to check.

Deﬁnition 7.3.6 A code C is called w-traceable (w-TA code) if for any y ∈ descw (C)

the inclusion y ∈ desc(X), for some subset X ⊆ C with |X| = w, implies the existence

of at least one codeword x ∈ X such that d(y, x) < d(y, z) for any z ∈ C \ X.

If a code is a w-TA code, we can always trace at least one parent of y ∈ descw (C)

using a process similar to maximum likelihood decoding for error correcting codes.

Indeed, the following proposition is true.


Proposition 7.3.3 Suppose that a code C is w-traceable, and y ∈ desc(X) for some

subset X ⊆ C with |X| = w. Let x1 , x2 , . . . , xk be the set of vectors from C such

that d = d(y, x1 ) = · · · = d(y, xk ) and no vector z ∈ C satisﬁes d(y, z) < d. Then

{x1 , x2 , . . . , xk } ⊆ X.

Proof Suppose xi ∉ X for some i. Then by the traceability property there must be a vector x ∈ X such that d(y, x) < d(y, xi ) = d, which contradicts the minimality of d. �
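Proposition 7.3.3 justifies a decoding-style tracing procedure: accuse any codeword at minimum Hamming distance from the pirate word. A toy sketch under an assumed traceable setting (code and names are illustrative; the example code is a repetition code over {1, 2, 3} with dmin = 5 > 5(1 − 1/2²), so it is 2-TA by Theorem 7.3.3):

```python
# Tracing for a w-TA code: every codeword at minimum Hamming distance
# from the pirate word y is guaranteed to belong to the guilty coalition.

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def trace(y, code):
    d = min(hamming(y, c) for c in code)
    return [c for c in code if hamming(y, c) == d]   # all closest codewords

code = [(1, 1, 1, 1, 1), (2, 2, 2, 2, 2), (3, 3, 3, 3, 3)]
y = (1, 1, 2, 2, 1)       # forged by the coalition {code[0], code[1]}
print(trace(y, code))     # accuses (1, 1, 1, 1, 1), a real parent
```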

Let us now state one obvious fact.

Lemma 7.3.1 Let X = {x1 , x2 , . . . , xw } and y ∈ desc(X). Then there exists i ∈

{1, 2, . . . , w} such that I(xi , y) ≥ n/w.

Proof Suppose on the contrary that I(xi , y) < n/w for all i ∈ {1, 2, . . . , w}. Then y inherited fewer than n/w coordinates from each xi. In total it inherited fewer than w · n/w = n coordinates from the vectors of X and so cannot be a descendant of X. �

Theorem 7.3.2 Any w-TA code C is also a w-IPP code.

Proof Suppose that the code C is w-traceable. Let x ∈ desc_w(C). Let us consider the family of subsets (7.10). Suppose y ∈ C is the closest, or one of the closest, vectors of C to x, i.e., the distance d(x, y) is the smallest possible. Because C is w-traceable, y must belong to every subset of the family (7.10); hence the intersection of this family is nonempty and the w-IPP property holds. □

Theorem 7.3.3 Suppose that a code C of length n has a minimum distance

d_min(C) > n(1 − 1/w²).

Then C is a w-traceable code and hence has the identifiable parent property of order w.

Proof Let X ⊆ C with |X| = w. Suppose X = {x₁, x₂, . . . , x_w}. Let us consider any z ∈ C \ X. Then, for any i, I(z, xᵢ) = n − d(z, xᵢ) < n − n(1 − 1/w²) = n/w², i.e., the number of coordinates where z and xᵢ agree is less than n/w². We now define

I(z, X) = |{j | z_j ∈ P_j(X)}|.

We obtain

I(z, X) ≤ I(z, x₁) + · · · + I(z, x_w) < w · n/w² = n/w.  (7.11)

On the other hand, by Lemma 7.3.1, for every y ∈ desc(X) we can find some xᵢ such that I(xᵢ, y) ≥ n/w. Thus we obtain d(xᵢ, y) ≤ n − n/w = n(1 − 1/w), while for any z ∈ C \ X we will have I(z, y) ≤ I(z, X) < n/w and hence d(z, y) > n − n/w = n(1 − 1/w), proving w-traceability. □

This theorem works only for a reasonably large alphabet.
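Proposition 7.3.3 suggests a tracing procedure similar to maximum likelihood decoding: compute the distance from the suspect word y to every codeword and keep the nearest ones. A minimal Python sketch (the tiny 3-codeword code over the alphabet {0, 1, 2} is our own toy illustration, not from the text):

```python
def hamming(u, v):
    """Hamming distance between two equal-length words."""
    return sum(a != b for a, b in zip(u, v))

def trace(code, y):
    """Return all codewords nearest to y.  For a w-TA code and a
    descendant y of a coalition X of size w, Proposition 7.3.3
    guarantees that every returned codeword lies in X."""
    d = min(hamming(y, c) for c in code)
    return [c for c in code if hamming(y, c) == d]

# Toy code with d_min = 4 > n(1 - 1/w^2) = 3 for n = 4, w = 2,
# so it is 2-traceable by Theorem 7.3.3.
code = [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2)]
y = (0, 1, 0, 1)          # a descendant of the first two codewords
print(trace(code, y))     # [(0, 0, 0, 0), (1, 1, 1, 1)]
```

Both parents are at distance 2 from y, while the non-parent is at distance 4, so the nearest-codeword search returns exactly the coalition.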

7.3 Fingerprinting Codes 211

Exercises

1. Let the size of the alphabet be q. Show that there does not exist a w-IPP code C with |C| > w ≥ q.

2. Using the Reed–Solomon code C over Z₁₇ of length 16 with minimum distance 13, show that there exists a fingerprinting code with the identifiable parent property of order 2 containing 83521 codewords.


Chapter 8

Compression

Baltasar Gracián y Morales (1601–1658)

Computer memory is a limited resource, so if our files can be stored in a more economical fashion, this has to be done. Some files, like pictures, contain a lot of redundancy and can be compressed significantly even without loss of picture quality. There are numerous ways to do so.

There are three major approaches to measuring the quantity of information in

a message of a certain alphabet: probabilistic, combinatorial, and algorithmic. The

probabilistic view is that information is anything that resolves uncertainty. The more

uncertain an event that may or may not take place in the future, the more information

is required to resolve the uncertainty. This works well with messages generated by

random sources but cannot help answering questions like: “What is the quantity of information in Leo Tolstoy's War and Peace?” or “How much information is needed for the reproduction of a particular form of cockroach?”

The combinatorial approach tries to reduce complex events to some basic ones.

Suppose you would like to know if there will be rain tomorrow. You look at the

weather forecast and get the answer. This is a simple ‘yes’ or ‘no’ situation and

it is easy to resolve. Suppose a 1 means ‘no rain’ and a 0 means rain, then one

binary digit carries all the information you need. One bit is a unit of information

expressed as a choice between two possibilities 0 and 1. Asking whether there will

be rain tomorrow you ask for one bit of information. Information for more complex

events can also be measured in bits. Given a set of possible events we ask how many bits of information are required to individualise each particular event. Suppose

n binary digits are sufﬁcient to give a distinctive label to every event and you cannot

do this with n − 1 binary digits. Then we say that every event in the set of events

carries n bits of information.

A. Slinko, Algebra for Applications, Springer Undergraduate Mathematics Series,

DOI 10.1007/978-3-319-21951-6_8


The algorithmic approach measures information by complexity. Roughly speaking, the longer the program that we have to write for a computer to output the given message, the less redundancy this message has and the less compressible it is.

Here we give a glimpse of the combinatorial approach describing Fitingof’s

compression code. These types of codes are universal as they can be used when

we do not know how the data was generated. Boris Fitingof [2] developed the ﬁrst

such code, and the construction is quite elegant. His paper was inspired by a paper of

Kolmogorov [1]. However, it is fair to consider Fitingof the founder of universal encoding.

Let Ω be a finite set and |Ω| be the number of elements in it. Suppose that we want to give an individual label to each element of Ω and each label must be a sequence of zeros and ones. How long must our sequences be so that we have enough labels for all elements of Ω? Since we have exactly 2ⁿ sequences of length n, this number should be taken so that 2ⁿ ≥ |Ω|. If we aim at sequences of the shortest possible length, we should choose n so that

2ⁿ ≥ |Ω| > 2ⁿ⁻¹.  (8.1)

For example, the labeling can be done in the following way. Let |Ω| = 2ⁿ (or n = log₂ |Ω|), and ω₀, ω₁, . . . , ω_{|Ω|−1} be the elements of Ω listed in some order. Then we can think of the correspondence

ω_k → k → k(2),

where k(2) is the standard binary representation of k, with the convention that if k in binary has fewer than n binary digits, then zeros are added in front of the standard binary representation of k to make it of length exactly n. In other words, the information contained in ω_k is the binary representation of k. Then, under this arrangement, every element of Ω carries exactly n bits of information.

Example 8.1.1 Let |Ω| = 16, n = 4. Then ω₅ can be put in correspondence to 5 and to 5(2) = 0101. Thus, every element of Ω carries 4 bits of information.
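The fixed-length labeling just described is easy to express in code; a small Python sketch (the helper name `label` is ours):

```python
def label(k, n):
    """Binary label of length n for the element ω_k: the standard
    binary representation of k with zeros padded in front."""
    return format(k, "b").zfill(n)

# |Ω| = 16, n = 4: ω_5 receives the label 0101, as in Example 8.1.1.
print(label(5, 4))    # 0101
```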

Definition 8.1.1 The information of an element ω ∈ Ω is by definition

I(ω) = ⌈log₂ |Ω|⌉.  (8.2)

8.1 Preﬁx Codes 215

Here and further in this section all logarithms will be taken to base 2. Let ⌈x⌉ be the nearest integer which is greater than or equal to x. Then (8.1) implies n ≥ log |Ω| > n − 1; hence, for an element ω ∈ Ω, the integer I(ω) is the minimal number of binary symbols necessary for individualising ω among the other elements of Ω.

Let now

Ω = Ω₁ ∪ Ω₂ ∪ · · · ∪ Ω_n  (8.3)

be a partition of Ω into n disjoint classes. Let π(ω) denote the class which contains ω.

Definition 8.1.2 The information of an element ω ∈ Ω relative to the given partition is defined as

I(ω) = log |π(ω)|.  (8.4)

It can be interpreted as follows. In a partitioned set, when the information about the partition is public knowledge, every element ω ∈ Ω carries information only about its class π(ω). In the extreme case, when there is only one class in the partition, i.e., the set Ω itself, we get the same concept as in Definition 8.1.1.

Example 8.1.2 Let Ω = Z₂⁴ be the four-dimensional vector space over Z₂. Let

Ω = Ω₀ ∪ Ω₁ ∪ Ω₂ ∪ Ω₃ ∪ Ω₄,

where Ωᵢ is the set of vectors of Hamming weight i, be a partition of Ω. Let u = 1111, v = 0010, w = 0101.¹ Then, writing C(n, k) for the binomial coefficient,

I(u) = log |Ω₄| = log C(4, 4) = log 1 = 0 bits,

I(v) = log |Ω₁| = log C(4, 1) = log 4 = 2 bits,

I(w) = log |Ω₂| = log C(4, 2) = log 6 ≈ 2.6 bits.

Example 8.1.3 Let Ω = Z₂ⁿ and

Ω = Ω₀ ∪ Ω₁ ∪ · · · ∪ Ω_n

be its partition into the classes Ωᵢ of vectors of Hamming weight i. Let z ∈ Z₂ⁿ have weight d. Since |Ω_d| = C(n, d),

I(z) = log |Ω_d| = log C(n, d).

¹ In this chapter we will identify vectors from Z₂ⁿ and words of length n in the binary alphabet.


If d is small, then

I(z) = log C(n, d) = log [n(n − 1) · · · (n − d + 1)/d!] < log nᵈ = d log n,

which is much smaller than n. If d is close to n, the information will be small too. It will be maximal for d = n/2, in which case, due to the asymptotic formula

C(n, n/2) ∼ √(2/(πn)) · 2ⁿ,  (8.5)

which can easily be obtained from Stirling's formula (2.2), we have

I(z) = log C(n, n/2) ∼ n − (1/2) log n + (1/2)(1 − log π) ∼ n.
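The asymptotics (8.5) is easy to check numerically; a quick Python comparison (assuming n is even, so that n/2 is an integer):

```python
from math import comb, log2, pi

n = 1000
exact = log2(comb(n, n // 2))                        # I(z) for weight n/2
approx = n - 0.5 * log2(n) + 0.5 * (1 - log2(pi))    # from (8.5)
print(exact, approx)    # both are about 994.7, so I(z)/n is close to 1
```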

Proposition 8.1.1 For the partition (8.3),

Σ_{ω∈Ω} 2^{−(I(ω)+log n)} = 1.  (8.6)

Proof If ω ∈ Ωᵢ, then I(ω) = log |Ωᵢ|. Thus

Σ_{ω∈Ω} 2^{−(I(ω)+log n)} = Σ_{i=1}^n |Ωᵢ| 2^{−log |Ωᵢ|−log n} = Σ_{i=1}^n |Ωᵢ|/(|Ωᵢ| n) = Σ_{i=1}^n 1/n = 1. □

We shall see soon what equation (8.6) means.
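Identity (8.6) is easy to verify numerically for the weight partition of Z₂⁴ from Example 8.1.2 (note that the n of (8.6) is the number of classes, here 5):

```python
from math import comb, log2

# Each ω in a class of size s contributes 2^{-(log s + log 5)} = 1/(5s),
# so the 5 weight classes of Z_2^4 contribute 1/5 each.
classes = [comb(4, i) for i in range(5)]          # sizes 1, 4, 6, 4, 1
total = sum(s * 2 ** (-(log2(s) + log2(len(classes)))) for s in classes)
print(total)    # 1.0 up to floating-point rounding
```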

Exercises

1. How many bits of information does one need to specify one letter of the English

alphabet?

2. In a magic trick, there are three participants: the magician, an assistant, and a

volunteer. The assistant, who claims to have paranormal abilities, is in a sound-

proof room. The magician gives the volunteer six blank cards, ﬁve white and one

blue. The volunteer writes a different integer from 1 to 100 on each card, as the

magician is watching. The volunteer keeps the blue card. The magician arranges

the ﬁve white cards in some order and passes them to the assistant. The assistant

then announces the number on the blue card. How does the trick work?


Let X be a ﬁnite set (alphabet) and X n be the set of all possible words of length n in

this alphabet. We stress that in Xⁿ we collect all words regardless of whether they are meaningful or not. For example, if X is the English alphabet, then yyyxza is also considered as a word belonging to X⁶. Let also W(X) be the set of all words in this

alphabet, i.e.,

W(X) = X¹ ∪ X² ∪ · · · ∪ Xⁿ ∪ · · ·

Definition 8.1.3 By a non-uniform (compression) code we understand a mapping

ψ : Xⁿ → W(Z₂).  (8.7)

This means that every word w from X n is encoded into a binary codeword ψ(w).

Note that the length of w is strictly n while the length of ψ(w) can be arbitrary. The

code of a message M, which is a word from W (X ), will be obtained as follows. We

divide M into segments of length n and the tail which is of length at most n (but

by agreement it can also be viewed as of length n; for example, for English words

we may add as many letters ‘z’ at the end of the message as is needed). Then M is

represented as M = w₁w₂ . . . w_s . . ., where wᵢ ∈ Xⁿ, and we define

ψ(w₁)ψ(w₂) . . . ψ(w_s) . . .  (8.8)

to be the encoding for M. What we should take care of is that the message (8.8) can

be uniquely decoded and that this decoding is as easy as possible. This is non-trivial

since the words ψ(w1 ), . . . ψ(ws ) . . . may have different lengths and we may not

know, for example, where ψ(w1 ) ends and where ψ(w2 ) starts. We will now introduce

a class of codes for which such decoding is possible.

Deﬁnition 8.1.4 A non-uniform code (8.7) is said to be a preﬁx code if for every two

words w1 , w2 ∈ X n neither of the two codewords ψ(w1 ), ψ(w2 ) is the beginning of

the other.

If our code is a preﬁx one, then we can decode (8.8) uniquely. Indeed, there will be

only one codeword which is the beginning of (8.8) and that will be ψ(w1 ). Similarly

we decode the rest of the message.

Example 8.1.4 Let X = {a, b, c} and ψ(a) = 1, ψ(b) = 01, ψ(c) = 00. This is a prefix code and the message 0001101100 can be uniquely decoded as ψ(c)ψ(b)ψ(a)ψ(b)ψ(a)ψ(c), i.e., as cbabac.


Example 8.1.5 Every binary rooted tree gives us a preﬁx code. We assign a 1 to each

edge from a parent to its left child and a 0 to each edge from a parent to its right

child. Then the set of all terminal vertices can be identiﬁed with the set of codewords

of a preﬁx code. Indeed, for any terminal vertex, there is a unique directed path from

the root to it. This path gives a string of 0’s and 1’s which we assign to the terminal

vertex. Since we always ﬁnish at a terminal vertex, no path is a beginning of the other

and therefore no codeword will be a beginning of the other. For example, the tree

below will give us the code {0, 11, 101, 100}.

(The figure shows a binary rooted tree whose edges are labelled 1 (left) and 0 (right); reading the labels along the paths from the root to the four terminal vertices gives the codewords 0, 11, 101 and 100.)

Theorem 8.1.1 (Kraft's inequality) A prefix code ψ : Xⁿ → W(Z₂) with the lengths of codewords m₁, m₂, . . . , m_q exists if and only if

Σ_{i=1}^q 2^{−mᵢ} ≤ 1.  (8.9)

Proof We will assume that m = max(m 1 , . . . , m q ), which means that the longest

codeword has length m. Suppose that a preﬁx code possesses a codeword u of length

i. Then the 21 = 2 words u0 and u1 cannot be codewords. The 22 = 4 words u00,

u01, u10 and u11 also cannot be codewords. In general all 2k−i words of length k

obtained by extending u to the right cannot be codewords. If v is another codeword

of length j then it excludes another 2k− j words of length k from being codewords.

The codewords u and v cannot exclude the same word, otherwise one of them will

be the beginning of the other.

Let us denote by S_j the number of codewords of length j. Then, as we just noticed,

S₁ · 2^{k−1} + S₂ · 2^{k−2} + · · · + S_{k−1} · 2

words of length k cannot be codewords. This number plus S_k, which is the number of codewords of length k, should be less than or equal to 2ᵏ, which is the total number of words of length k. The existence of a prefix code with the given lengths of codewords implies that the following inequality holds for any k = 1, . . . , m:

S₁ · 2^{k−1} + S₂ · 2^{k−2} + · · · + S_{k−1} · 2 + S_k ≤ 2ᵏ.  (8.10)


Thus, all these inequalities are necessary conditions for the existence of such a preﬁx

code. But the inequality for k = m is the strongest because it implies all the rest.

Indeed, dropping the term S_k from (8.10) and dividing by 2, we get

S₁ · 2^{k−2} + S₂ · 2^{k−3} + · · · + S_{k−1} ≤ 2^{k−1},

i.e., the same inequality for k − 1. Thus, indeed, the inequality for k = m implies all other inequalities.

Taking this strongest inequality (8.10) with k = m and dividing it by 2ᵐ, we get

Σ_{j=1}^m S_j · 2^{−j} ≤ 1.  (8.11)

Note that

Σ_{j=1}^m S_j · 2^{−j} = Σ_{i=1}^q 2^{−mᵢ}.

Hence the inequality (8.9) is a necessary condition for the existence of a preﬁx code

with lengths of codewords m 1 , m 2 , . . . , m q .

Let us show that it is also sufﬁcient. Let S j be the number of codewords of length

j and m be the maximal length of codewords. We will again use (8.9) in its equivalent

form (8.11) which implies (8.10) for all k = 1, . . . , m.

Firstly, we take S1 arbitrary words of length 1. Since (8.10) for k = 1 gives

2 − S1 ≥ 0, we have S1 ≤ 2 and we can do this step. Suppose that we have done

k − 1 steps already and have chosen Sᵢ words of length i for i = 1, . . . , k − 1 so that no word is the beginning of another. Then the chosen words will prohibit us

from choosing

S₁ · 2^{k−1} + S₂ · 2^{k−2} + · · · + S_{k−1} · 2

words of length k. By (8.10),

2ᵏ − (S₁ · 2^{k−1} + S₂ · 2^{k−2} + · · · + S_{k−1} · 2) ≥ S_k,

hence we can ﬁnd Sk words of length k which are compatible with the words previ-

ously chosen. This argument shows that the construction of the code can be completed

to the end. �


Note that

1/2 + 1/2² + 1/2³ + 1/2³ = 1.  (8.12)

If X = {a, b, c, d}, then according to Theorem 8.1.1 there exists a preﬁx code

ψ : X → W (Z2 ) with the lengths of the codewords 1, 2, 3, 3. Let us choose the

codeword ψ(a) = 0 of length 1, then we cannot use the words 00 and 01 for the

choice of the codeword for b of length 2 and we choose ψ(b) = 10. For the choice

of codewords for c and d we cannot choose the words 000, 001, 010, 011 (because

of the choice of ψ(a)) and the words 100, 101 (because of the choice of ψ(b)), thus

we choose the two remaining words of length 3, i.e., ψ(c) = 110 and ψ(d) = 111.
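The sufficiency part of the proof of Kraft's inequality is constructive, and the greedy procedure just carried out by hand can be sketched in Python (the function name `kraft_code` is ours):

```python
from fractions import Fraction

def kraft_code(lengths):
    """Given lengths satisfying Kraft's inequality (8.9), build a prefix
    code greedily: choose codewords in order of increasing length,
    skipping any word that extends an already chosen codeword."""
    assert sum(Fraction(1, 2 ** m) for m in lengths) <= 1, "Kraft fails"
    chosen = []
    for m in sorted(lengths):
        k = 0
        while True:
            w = format(k, "b").zfill(m)
            if not any(w.startswith(c) for c in chosen):
                chosen.append(w)
                break
            k += 1
    return chosen

print(kraft_code([1, 2, 3, 3]))    # ['0', '10', '110', '111']
```

For the lengths 1, 2, 3, 3 it reproduces the codewords 0, 10, 110, 111 constructed above.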

Suppose now that X = {a, b}. Then |X²| = 4 and we can use (8.12) again for this situation to define a code ψ : X² → W(Z₂) as follows:

ψ(ab) = 0, ψ(ba) = 10, ψ(aa) = 110, ψ(bb) = 111.

The words abba and baabab will be encoded as 010 and 1000, respectively. The

word 11111001000 can be represented as

11111001000 = ψ(bb)ψ(aa)ψ(ab)ψ(ba)ψ(ab)ψ(ab),

i.e., decoded as bbaaabbaabab.

Theorem 8.1.2 Let

Ω = Ω₁ ∪ Ω₂ ∪ · · · ∪ Ω_n

be a partition of a finite set Ω into n disjoint classes. Then there exists a prefix code ψ : Ω → W(Z₂) such that for any ω ∈ Ω the length of the codeword ψ(ω) is l(ω) = ⌈I(ω) + log n⌉, where I(ω) is the information of ω relative to the given partition.

Proof By Proposition 8.1.1,

Σ_{ω∈Ω} 2^{−⌈I(ω)+log n⌉} ≤ Σ_{ω∈Ω} 2^{−(I(ω)+log n)} = 1,

so the required prefix code exists by Theorem 8.1.1. □

The existence of the code is not everything. Another important issue is its fast

decodability.

Exercises

1. Check that the set {11, 10, 00, 011, 010} is a set of codewords of a preﬁx code

and construct the corresponding tree.


2. Given that

1/2 + 1/2³ + 1/2³ + 1/2⁴ + 1/2⁴ + 1/2⁴ + 1/2⁴ = 1,

the existence of which prefix code can we deduce from Kraft's inequality?

3. Let X be an alphabet consisting of 9 elements. Construct a preﬁx binary code

ψ : X → W(Z₂) with the lengths of the codewords 2, 3, 3, 3, 3, 3, 4, 5, 5 in the following steps:

(a) Use Kraft’s inequality to prove that such a code exists.

(b) Construct any tree that corresponds to such a code.

(c) List the codewords corresponding to this tree.

8.2 Fitingof's Compression Code

8.2.1 Encoding

We need to compress ﬁles when we are short of memory and want to use it effectively.

Since computer ﬁles are already written as strings of binary digits, in this section we

will consider the code ψ : Zn2 → W (Z2 ) which encodes binary sequences of ﬁxed

length n into binary sequences of variable length. The idea of Fittingof’s compression

is expressed in Example 8.1.3, where it was shown that the information of a vector

from Zn2 of small (or large) Hamming weight is relatively small compared to n.

Therefore if we encode words in such a way that the length of a codeword ψ(x)

will be approximately equal to the information of x, then words of small and large

Hamming weights will be signiﬁcantly compressed. This, for example, often works

well with photographs.

In this section we will order all binary words of the same length using lexico-

graphic order. This order depends on an order on our binary symbols and we will

assume that zero precedes one (denoted 0 ≺ 1).

Definition 8.2.1 Let y = y₁y₂ . . . y_n and z = z₁z₂ . . . z_n be two binary words of the same length. We say that y is lexicographically earlier than z, and write y ≺ z, if for some k ≥ 0

y₁ = z₁, . . . , y_k = z_k and y_{k+1} ≺ z_{k+1}

(that is, the two words first differ in position k + 1, where y has a 0 and z has a 1).

This order is called lexicographic since it is used in dictionaries to list words.

For example, in the Oxford English Dictionary the word “abash” precedes the word

“abate” because the ﬁrst three letters of these words coincide but the fourth letter “s”

of “abash” precedes, in the English alphabet, the fourth letter “t” of “abate”.


For example, all 15 binary words of length 6 and weight 4 will be listed in lexicographic order as shown:

001111 ≺ 010111 ≺ 011011 ≺ 011101 ≺ 011110 ≺ 100111 ≺ 101011 ≺ 101101 ≺ 101110 ≺ 110011 ≺ 110101 ≺ 110110 ≺ 111001 ≺ 111010 ≺ 111100.  (8.13)

We can refer to these words by just quoting their ordinal numbers. We adopt the

agreement that the first word has ordinal number zero. Thus the ordinal number of a word x is the number of words that are earlier than x. In particular, the ordinal number of

101011 is 6.

Let x be a binary word of length n and Hamming weight d, and let X_d be the set of all such words. If X_d is ordered lexicographically, then the ordinal number N(x) of x in X_d can be calculated as

N(x) = C(n − n_d, 1) + · · · + C(n − n₂, d − 1) + C(n − n₁, d),  (8.14)

where the 1's in x occupy the positions n₁ < n₂ < · · · < n_d (counting from the left).

Proof Firstly, we count all the words of weight d whose n₁ − 1 leftmost symbols coincide with those of x, i.e., are all zeros, and whose position n₁ is also occupied by a zero (this condition secures that all such words are lexicographically earlier than x). Since we have to distribute d ones among the n − n₁ remaining positions, there will be C(n − n₁, d) such words. Secondly, we have to count all the words whose first n₂ − 1 symbols coincide with those of x and which have a zero in the position n₂. There are C(n − n₂, d − 1) such words, as we have to distribute d − 1 ones among n − n₂ places. Finally, we have to count all words whose first n_d − 1 symbols coincide with those of x and which have a zero in the position n_d. There will be C(n − n_d, 1) such words. All the words that are lexicographically earlier than x are now counted. As the ordinal number of x is equal to the number of words which lexicographically precede x, this proves (8.14). □

For the word x = 101011 we have n₁ = 1, n₂ = 3, n₃ = 5, n₄ = 6 and d = 4. So

N(x) = C(0, 1) + C(1, 2) + C(3, 3) + C(5, 4) = 0 + 0 + 1 + 5 = 6,

in accordance with the position of 101011 in the list (8.13).
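Formula (8.14) translates directly into code; a small Python sketch (the function name `ordinal` is ours):

```python
from math import comb

def ordinal(x):
    """Ordinal number (8.14) of the binary word x in the
    lexicographically ordered set X_d of words of its length and weight."""
    n = len(x)
    ones = [i + 1 for i, b in enumerate(x) if b == "1"]   # n_1 < ... < n_d
    d = len(ones)
    # terms C(n - n_d, 1), C(n - n_{d-1}, 2), ..., C(n - n_1, d)
    return sum(comb(n - ones[d - 1 - j], j + 1) for j in range(d))

print(ordinal("101011"))    # 6
```

For 101011 it returns 6, matching the computation above.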

The idea of this code is to characterise any word x from X = Z₂ⁿ by two parameters, namely, its Hamming weight d and the ordinal number N(x) of x in X_d. We partition X = Z₂ⁿ into n + 1 disjoint classes

X = X₀ ∪ X₁ ∪ · · · ∪ X_n,

where X_d consists of all words of weight d.

8.2 Fitingof’s Compression Code 223

The codeword ψ(x) for x ∈ X_d (i.e., for a word x of weight d) will consist of two parts: ψ(x) = μ(x)ν(x), where μ(x) is the prefix of fixed length ⌈log(n + 1)⌉, which is the binary code for d, and ν(x) is the binary code of the ordinal number N(x) of x in the class X_d, consisting of ⌈log |X_d|⌉ = ⌈log C(n, d)⌉ binary symbols. Both parameters together characterise x uniquely. In total the length of the codeword ψ(x) = μ(x)ν(x) will be

l(ψ(x)) = ⌈log(n + 1)⌉ + ⌈log C(n, d)⌉.

Since ⌈log(n + 1)⌉ = O(log n), the length of ψ(x) is

l(ψ(x)) = I(x) + o(n), where o(n)/n → 0,

i.e., asymptotically equal to its information relative to the given partition.

We now state the main theorem of this chapter.

Theorem 8.2.1 (Fitingof) There exists a preﬁx code ψ : Zn2 → W (Z2 ) for which the

length of the codeword ψ(x) is asymptotically equal to the information of the word

x and for which there exists a decoding procedure of polynomial complexity.

Proof We have shown already that the length of the codeword ψ(x) is asymptotically

equal to the information of the word x. Let us prove that Fitingof’s code is a preﬁx

one. Suppose ψ(x1 ) = μ(x1 )ν(x1 ) is a beginning of ψ(x2 ) = μ(x2 )ν(x2 ). We know

that the length of μ(x1 ) is the same as the length of μ(x2 ), hence μ(x1 ) = μ(x2 ) and

hence x1 and x2 has the same weight. But then the length of ν(x1 ) is the same as the

length of ν(x2 ) and hence ψ(x1 ) and ψ(x2 ) have the same length. However, in such

a case one cannot be a beginning of another without being equal.

The proof will be continued in the next section devoted to the decoding algorithm.

�

As an example, let ψ : Z₂³¹ → W(Z₂) be Fitingof's code. For the vector

x = 0000000100000101000100000000000

we will have μ(x) = 00100 because wt(x) = 4 = 100(2) and the prefix must be of length 5 to accommodate all possible weights in the range from 0 to 31. The length of the suffix ν(x) will be ⌈log C(31, 4)⌉ = 15. Further, we will have n₁ = 8, n₂ = 14, n₃ = 16, n₄ = 20 and

N(x) = C(11, 1) + C(15, 2) + C(17, 3) + C(23, 4) = 9651 = 10010110110011(2),


so that ν(x) = 010010110110011, and the codeword ψ(x) = μ(x)ν(x) = 00100010010110110011 has length 20.
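The whole encoding ψ(x) = μ(x)ν(x) can be assembled the same way; a Python sketch reproducing the 31-bit computation above (the function name `fitingof_encode` is ours, and the guard for weights 0 and n is our own edge-case handling):

```python
from math import ceil, comb, log2

def fitingof_encode(x):
    """Fitingof codeword ψ(x) = μ(x)ν(x) for a binary word x of length n:
    μ(x) gives the weight d on ceil(log2(n + 1)) bits and ν(x) gives the
    ordinal number of x in X_d on ceil(log2(C(n, d))) bits."""
    n, d = len(x), x.count("1")
    ones = [i + 1 for i, b in enumerate(x) if b == "1"]          # n_1 < ... < n_d
    N = sum(comb(n - ones[d - 1 - j], j + 1) for j in range(d))  # formula (8.14)
    mu = format(d, "b").zfill(ceil(log2(n + 1)))
    nu_len = ceil(log2(comb(n, d))) if 0 < d < n else 0          # |X_d| = 1 needs 0 bits
    nu = format(N, "b").zfill(nu_len) if nu_len > 0 else ""
    return mu + nu

x = "0000000100000101000100000000000"        # the 31-bit word above
print(fitingof_encode(x))                    # 00100010010110110011
```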

Exercises

1. Put the following three words of Z₂⁷ in increasing lexicographic order:

2. How many vectors of Hamming weight at least 4 and at most 5 are there in Z₂¹⁰?

3. Calculate the ordinal number of the word w = 0011011 in X₄ ⊂ Z₂⁷.

4. Let ψ : Z₂¹⁵ → W(Z₂) be Fitingof's code.

(a) How long is the preﬁx which shows the Hamming weight of the word?

(b) Given x = 000010100000100, how long must be the sufﬁx of the codeword

ψ(x)?

(c) Encode x, i.e., ﬁnd ψ(x).

8.2.2 Decoding

To decode a message we have to decode the codewords one by one starting from the first. Suppose the first codeword is ψ(x). First, we separate its prefix μ(x) (because it is of fixed known length ⌈log(n + 1)⌉) and reconstruct d = wt(x). Then, knowing d, we calculate the length of ν(x), which is ⌈log C(n, d)⌉. Then, looking at ν(x) and knowing that it represents the ordinal number N(x) of x in X_d, we reconstruct N = N(x).

Then we are left with the equation

C(x_d, 1) + · · · + C(x₂, d − 1) + C(x₁, d) = N  (8.15)

to solve for xd < · · · < x2 < x1 , where xi = n − n i . This can be done in a fast and

elegant way using the properties of Pascal’s triangle, part of which is shown below:

1

1 1

1 2 1

1 3 3 1

1 4 6 4 1

1 5 10 10 5 1

The nth row of this triangle contains the binomial coefficients C(n, m), m = 0, 1, . . . , n, where m increases from left to right. These binomial coefficients are defined inductively by the formula

C(n, j) = C(n − 1, j) + C(n − 1, j − 1)  (8.16)


and the boundary conditions C(0, 0) = 1 and C(0, m) = 0 for all 0 ≠ m ∈ Z. We also know the explicit formula

C(n, m) = n!/(m!(n − m)!),

The solution of (8.15) will be based on the formula

C(n − d, 0) + C(n − d + 1, 1) + · · · + C(n − 1, d − 1) + C(n, d) = C(n + 1, d).  (8.17)

We prove it by induction on d. For d = 1 it becomes

1 + C(n, 1) = C(n + 1, 1), or 1 + n = n + 1,

which is true. Let us assume that (8.17) is true for d = k − 1. Then by the induction

hypothesis, applied to the ﬁrst k − 1 summands of the left-hand side of (8.17), and

using (8.16), we get

C(n − k, 0) + C(n − k + 1, 1) + · · · + C(n − 1, k − 1) + C(n, k)
= [C((n−1) − (k−1), 0) + C((n−1) − (k−1) + 1, 1) + · · · + C(n − 1, k − 1)] + C(n, k)
= C(n, k − 1) + C(n, k) = C(n + 1, k),

proving (8.17).

Proposition 8.2.1 Suppose the Eq. (8.15) is satisfied for some x₁, . . . , x_d such that x_d < x_{d−1} < · · · < x₁. Then x₁ can be found as the largest integer satisfying the inequality

C(x₁, d) ≤ N.  (8.18)

Proof Let m be the largest integer satisfying C(m, d) ≤ N, and suppose that x₁ < m. Then, since x_d < x_{d−1} < · · · < x₁, by (8.17)

C(x_d, 1) + · · · + C(x₂, d − 1) + C(x₁, d)
≤ C(x₁ − d + 1, 1) + · · · + C(x₁ − 1, d − 1) + C(x₁, d) = C(x₁ + 1, d) − 1 < C(m, d) ≤ N,

which contradicts (8.15). On the other hand, C(x₁, d) ≤ N by (8.15), so x₁ ≤ m. Hence x₁ = m. □

This gives a fast decoding algorithm. Indeed, we find x₁ directly applying Proposition 8.2.1. Then we move the term C(x₁, d) to the right-hand side, obtaining

C(x_d, 1) + · · · + C(x₂, d − 1) = N − C(x₁, d),

and find x₂ applying Proposition 8.2.1 to this equation, and so on.

For example, solving

C(x₄, 1) + C(x₃, 2) + C(x₂, 3) + C(x₁, 4) = 30,

we find successively: C(x₁, 4) = 15 and x₁ = 6, C(x₂, 3) = 10 and x₂ = 5, C(x₃, 2) = 3 and x₃ = 3, C(x₄, 1) = 2 and x₄ = 2.

If we needed, for example, to find the word x which has ordinal number 30 in X₄ ⊂ Z₂⁷, then according to the equation (8.14)

C(7 − n₄, 1) + C(7 − n₃, 2) + C(7 − n₂, 3) + C(7 − n₁, 4) = 30,

and the solution above gives n₁ = 1, n₂ = 2, n₃ = 4, n₄ = 5, whence x = 1101100.
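The decoding steps above — repeatedly finding the largest x with C(x, j) ≤ N, as in Proposition 8.2.1 — can be sketched as follows (the function name `word_from_ordinal` is ours):

```python
from math import comb

def word_from_ordinal(N, n, d):
    """Recover the word of length n and weight d whose ordinal number
    in X_d is N, by solving (8.15) greedily."""
    xs = []
    for j in range(d, 0, -1):       # find x_1 (j = d), then x_2, ...
        x = j - 1                   # C(j - 1, j) = 0, so always <= N
        while comb(x + 1, j) <= N:
            x += 1                  # x = largest value with C(x, j) <= N
        xs.append(x)
        N -= comb(x, j)
    ones = {n - x for x in xs}      # positions n_i = n - x_i of the 1's
    return "".join("1" if i in ones else "0" for i in range(1, n + 1))

print(word_from_ordinal(30, 7, 4))   # 1101100
```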

Exercise

1. Let ψ : Z₂¹⁵ → W(Z₂) be Fitingof's compression code. Decode ψ(y) = 00100011110, i.e., find y.

8.3 Information and Uncertainty

The classical approach to measuring information in a word is based on the assumption that this word was generated by a random source. This is, of course, not always a realistic assumption, so our approach here is more general. We will show, however, that in the case of a random source the two approaches are asymptotically equivalent, i.e., when n gets large. Let us consider a random source which sends signal “1” with probability p and signal “0” with probability 1 − p.

Then the measure of uncertainty about what the next signal will be is given by the binary entropy function

H(p) = −p log p − (1 − p) log(1 − p)

8.3 Information and Uncertainty 227

(logarithms are to the base 2 and it is assumed that 0 · log 0 = 0). The uncertainty

is minimal when p = 0 or p = 1, in which case we essentially don't have any uncertainty and the entropy of such a source is zero. If p = 1/2, then the uncertainty is maximal and the entropy of such a source is equal to 1. We say that one symbol sent

from such a random source contains H ( p) bits of information. Thus we have 1 bit

of information from a symbol from a random source only in the case of probability

1/2. A word of length n contains n H ( p) bits of information.

Given a binary word x of length n consisting of m₁ ones and m₂ zeros, we define

H(x) = −(m₁/n) log(m₁/n) − (m₂/n) log(m₂/n).

Of course, if this word was generated from a random source with probability p, then

m 1 /n → p, when n gets large, and H (x) → H ( p). The following theorem then

shows that the two approaches are equivalent.

Theorem 8.3.1 For a binary word x of length n,

I(x) = n(H(x) + o(1)),

where as usual o(1) → 0 when n → ∞. Moreover, o(1) ∼ (log n)/n.

Proof We will need Stirling's formula (2.2) again. We use it to calculate

I(x) = log C(n, m₁) = log [n!/(m₁! m₂!)]
∼ log [√(2πn) nⁿ e⁻ⁿ / (√(2πm₁) m₁^{m₁} e^{−m₁} · √(2πm₂) m₂^{m₂} e^{−m₂})]
= (1/2) log (n/(2πm₁m₂)) + log [1/((m₁/n)^{m₁} (m₂/n)^{m₂})]
= (1/2) log (n/(2πm₁m₂)) − m₁ log (m₁/n) − m₂ log (m₂/n).

Therefore

I(x)/n = (1/2n) log (n/(2πm₁m₂)) − (m₁/n) log (m₁/n) − (m₂/n) log (m₂/n) = o(1) + H(x),

which proves the theorem. □
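The theorem just proved can be checked numerically; a quick Python comparison of I(x)/n with the empirical entropy H(x) (the helper names are ours):

```python
from math import comb, log2

def H(x):
    """Empirical binary entropy of a binary word x."""
    n = len(x)
    return sum(-(m / n) * log2(m / n)
               for m in (x.count("1"), x.count("0")) if m > 0)

def info(x):
    """I(x): information of x relative to the partition by weight."""
    return log2(comb(len(x), x.count("1")))

x = "0011" * 250                      # n = 1000, five hundred 1's
print(info(x) / len(x), H(x))         # about 0.9947 and 1.0
```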


References

1. Kolmogorov, A.N.: Three approaches to the deﬁnition of the concept “the quantity of informa-

tion”. Probl. Inf. Transm. 1(1), 3–11 (1965)

2. Fitingof, B.M.: Optimal encoding under an unknown or changing statistics. Probl. Inf. Transm.

2(2), 3–11 (1966)

Chapter 9

Appendix A: GAP

GAP is a system for computational algebra. GAP has been, and continues to be, developed by an international cooperation of many people, including user contributions. This package

is free and you can install it onto your computer using the instructions from the

website www.gap-system.org. A reference manual and tutorial can be found

there. There is plenty of information about GAP available online too.

Once you have started GAP, you can start working straight away. If you type a simple

command (for example, ‘quit’) followed by a semi-colon, GAP will evaluate your

command immediately. If you press enter without entering a semi-colon, GAP will

simply give you a new line to continue entering more input. This is useful if you want

to write a more complicated command, perhaps a simple program. If you wanted your

simple command to be evaluated, then simply enter a semi-colon on the new line and press enter again; since GAP ignores whitespace, this will work just the same as if you had entered the semi-colon in the first place. A double semi-colon executes the command but suppresses the output. A semi-colon will not always cause GAP to evaluate straight away: GAP is able to work out whether you have finished

a complete set of instructions or are part of the way through entering a program.

Another way to interact with GAP, which is particularly useful for things you

want to do more than once, is to prepare a collection of commands and programs in a

text file. Then you can type the command Read("MyGAPprog.txt"); and GAP will

evaluate all of the instructions in your text ﬁle. If your ﬁle is not in the same place

that GAP was launched from, you will have to provide its relative path (for example,

“../../GAPprogs/Example1.txt”).



You can declare a variable in GAP using the ‘:=’ operator. For example, if you

want a variable n to equal 2000, you would enter n := 2000;, or if you want n

to be the product of p and q you would enter n := p*q;. You can also declare

lists using the ‘:=’ operator, for example, zeros := [0,0,0];. The command

list:=[m..n]; deﬁnes the list of integers m, m + 1, m + 2, . . . , n. A list may

have several identical numbers in it. Lists have a length given by the command

Length(listName);, and their entries can be referenced individually by typing

listName[index]; (indices start from 1!). In GAP a list of primes ≤ 1000 is

stored. It is called ‘Primes’. This is very useful.

gap> Primes;

[ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,

79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167,

173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263,

269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347,349, 353, 359, 367,

373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463,

467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587,

593, 599, 601, 607, 613, 617, 619, 631, 641,643, 647, 653, 659, 661, 673, 677, 683,

691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811,

821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929,

937, 941, 947, 953, 967, 971, 977, 983, 991, 997 ]

The command

gap> Length(Primes);

168

gives us the number of primes in this list. We can ﬁnd the prime in 100th position

and the position of 953 in this list as follows:

gap> Primes[100];

541

gap> Position(Primes,953);

162

Sets cannot contain multiple occurrences of elements and the order of elements

does not matter. Basically GAP views sets as ordered lists without repetitions. The

command Set(list); converts a list into a set.

gap> list:=[2,5,8,3,5];

[ 2, 5, 8, 3, 5 ]

gap> Add(list,2);

gap> list;

[ 2, 5, 8, 3, 5, 2 ]

gap> set:=Set(list);

[ 2, 3, 5, 8 ]

gap> RemoveSet(set,2);

gap> set;

[ 3, 5, 8 ]

For loops and while loops exist in GAP. Both have the same format:

for (while) [condition] do [statements] od;

9.1 Computing with GAP 231

For example, the following for loop squares all of the entries in the list ‘boringList’,

and places them in the same position in the list ‘squaredList’:

gap> boringList:=[2..13];

[ 2 .. 13 ]

gap> squaredList:=[1..Length(boringList)];

[ 1 .. 12 ]

gap> for i in [1..Length(boringList)] do

> squaredList[i]:=boringList[i]ˆ2;

> od;

gap> squaredList;

[ 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169 ]

Here is an example of using a while loop. We want to square the ﬁrst ﬁve numbers

of the boringList.

gap> boringList:=[2..13];;

gap> i:=1;;

gap> while i<6 do

> boringList[i]:=boringList[i]ˆ2;

> i:=i+1;

> od;

gap> boringList;

[ 4, 9, 16, 25, 36, 7, 8, 9, 10, 11, 12, 13 ]

Lists may contain other lists. Analyse the following program that lists all pairs of

twin primes not exceeding 1000. It also illustrates the use of the ‘if-then’ command.

if [condition] then [statements] fi;

Here it is:

gap> twinpairs:=[];

[ ]

gap> numbers:=[1..Length(Primes)-1];

[ 1 .. 167 ]

gap> for i in numbers do

> if Primes[i]=Primes[i+1]-2 then

> Add(twinpairs,[Primes[i],Primes[i+1]]);

> fi;

> od;

gap> twinpairs;

[ [ 3, 5 ], [ 5, 7 ], [ 11, 13 ], [ 17, 19 ], [ 29, 31 ], [ 41, 43 ],

[ 59, 61 ], [ 71, 73 ], [ 101, 103 ], [ 107, 109 ], [ 137, 139 ],

[ 149, 151 ], [ 179, 181 ], [ 191, 193 ], [ 197, 199 ], [ 227, 229 ],

[ 239, 241 ], [ 269, 271 ], [ 281, 283 ], [ 311, 313 ], [ 347, 349 ],

[ 419, 421 ], [ 431, 433 ], [ 461, 463 ], [ 521, 523 ], [ 569, 571 ],

[ 599, 601 ], [ 617, 619 ], [ 641, 643 ], [ 659, 661 ], [ 809, 811 ],

[ 821, 823 ], [ 827, 829 ], [ 857, 859 ], [ 881, 883 ] ]

we have already encountered in the previous section.


The command FactorsInt(n); outputs the prime factorisation of n or, more precisely, the primes that enter this prime

factorisation with their multiplicity. The command PrintFactorsInt(n);

gives a nicer view of this prime factorisation but you cannot use the output

as a list, which you can do with the output of FactorsInt. The command

DivisorsInt(n); can be used to ﬁnd all of the divisors of n. The com-

mand PrimeDivisors(n); finds the set of distinct prime divisors of n. For

example,

gap> FactorsInt(571428568);

[ 2, 2, 2, 71428571 ]

gap> PrintFactorsInt(571428568);

2ˆ3*71428571

gap> DivisorsInt(571428568);

[ 1, 2, 4, 8, 71428571, 142857142, 285714284, 571428568 ]

gap> PrimeDivisors(571428568);

[ 2, 71428571 ]

NextPrimeInt(n); gives the smallest prime number that is strictly greater than

n. The action of the command PrevPrimeInt(n); is similar. For example,

gap> IsPrime(571428568);

false

gap> NextPrimeInt(571428568);

571428569

gap> PrevPrimeInt(571428568);

571428527

The list of primes ‘Primes’ contains only the 168 primes that are smaller than 1000.

Using the commands that we have just introduced we can, for example, create a list

of the ﬁrst 5000 primes:

gap> biggerPrimes := [];

[ ]

gap> counter := 1;

1

gap> currentPrime := 2;

2

gap> while counter < 5000 do;

> biggerPrimes[counter] := currentPrime;

> counter := counter + 1;

> currentPrime := NextPrimeInt(currentPrime);

> od;

The remainder and quotient of n divided by m are given by the commands RemInt

(n,m); and QuoInt(n,m);, respectively. For example,

gap> RemInt(9786354,383);

321

gap> QuoInt(9786354,383);

25551

gap> 9786354 mod 383;

321

9.2 Number Theory 233

gap> GcdInt(123456789,987654321);

9

To ﬁnd m, n such that ma + nb = gcd(a, b), use the GAP command Gcdex(a,b);.

For example,

Gcdex(108,801);

returns

rec( gcd := 9, coeff1 := -37, coeff2 := 5, coeff3 := 89, coeff4 := -12 )

where m = coeff1 and n = coeff2 (coeff3 and coeff4 give the complementary relation coeff3 · a + coeff4 · b = 0). Another

example,

gap> Gcdex(123456789,987654321);

rec( gcd := 9, coeff1 := -8, coeff2 := 1, coeff3 := 109739369,

coeff4 := -13717421 )
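Gcdex is the extended Euclidean algorithm. For readers who want to see the mechanics outside GAP, here is a Python sketch (the name gcdex and its tuple return value are our own choices, not GAP's record):

```python
# Extended Euclidean algorithm: returns (g, m, n) with m*a + n*b == g == gcd(a, b),
# i.e. GAP's gcd, coeff1 and coeff2 from Gcdex.
def gcdex(a, b):
    old_r, r = a, b
    old_s, s = 1, 0   # Bezout coefficient for a
    old_t, t = 0, 1   # Bezout coefficient for b
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t
```

For 108 and 801 this reproduces gcd 9 with coefficients −37 and 5, as in the GAP record above.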

To ﬁnd the least common multiple of m and n, use the GAP command LcmInt(m,

n);. For example,

gap> LcmInt(123456789,987654321);

13548070123626141

The Euler totient function φ(n) is given by the command Phi(n);. For example,

gap> Phi(2ˆ15-1); Phi(2ˆ17-1);

27000

131070

The Chinese remainder theorem states the existence of the minimal solution N ≥ 0 of N ≡ a_1 (mod n_1), N ≡ a_2 (mod n_2), . . . , N ≡ a_k (mod n_k). The command for finding this solution is ChineseRem([n1, n2, ..., nk], [a1, a2, ..., ak]);. For example:

gap> ChineseRem([5,7],[1,2]);

16
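As a cross-check, the defining property of ChineseRem can be tested by brute force in Python (a naive search, fine only for tiny moduli; the function name is ours):

```python
from math import prod

# Smallest N >= 0 with N congruent to a_i mod n_i for every i;
# moduli assumed pairwise coprime, so a solution exists below prod(mods).
def chinese_rem(mods, rems):
    M = prod(mods)
    for N in range(M):
        if all(N % n == a % n for n, a in zip(mods, rems)):
            return N
    return None
```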

GAP does not provide automatic conversion between bases. One way of doing base

conversion is to use the p-adic numbers package, feel free to investigate this on

your own. Another way is to write simple programs. For example, 120789 can be

converted to binary as follows:

gap> n := 120789;

120789

gap> base := 2;

2

gap> rems := [];

[ ]

gap> pos := 1;

1

gap> while n > 0 do;

> rems[pos] := RemInt(n,base);

> n := QuoInt(n,base);

> pos := pos + 1;

> od;

gap> n;

0

gap> rems;

[ 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1 ]


That is, 120789 is 11101011111010101 in binary. If you are not sure why the list

rems is read in the reverse order, you need to study the base conversion algorithm

in Chap. 1. As for converting from another base into decimal, you should now be

able to do it yourself. Write a simple program to convert 100011100001111100000

from binary to decimal.
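Both directions of base conversion can be sketched in Python along the same lines as the GAP loop above (to_base collects remainders and reverses them; from_base is Horner's scheme, the direction the exercise asks for):

```python
# Convert n to a list of digits in the given base, most significant digit first.
def to_base(n, base):
    digits = []
    while n > 0:
        digits.append(n % base)   # remainders come out least significant first
        n //= base
    return digits[::-1]

# The converse direction: fold the digits back into an integer (Horner's scheme).
def from_base(digits, base):
    n = 0
    for d in digits:
        n = n * base + d
    return n
```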

The commands RootInt(n,k); and LogInt(n,b); can be used to determine, respectively, the integer parts of the kth (positive real) root of n and of the logarithm of n to the base b, that is, ⌊n^(1/k)⌋ and ⌊log_b(n)⌋. These should be used instead of computing roots and logarithms as GAP does not support real numbers.

Despite not supporting real numbers GAP can display a complicated fraction as

a ﬂoating-point real number e.g.,

gap> Float(254638754321/387498765398);

0.657134

The command OrderMod(a,m); returns the multiplicative order of a modulo m. For example,

gap> OrderMod(10,77);

6

The command SmallestRootInt(n); determines the smallest root of the integer n, which is the integer r of smallest absolute value for which a positive integer k exists such that n = r^k. For example, 13^5 = 371293 and this command gives

gap> SmallestRootInt(371293);

13

The command PowerMod(a,k,m); efficiently computes a^k mod m. For example,

gap> PowerMod(987654321,123456789,987654321123456823);

171767037218848697

Computing the same value naively as

987654321ˆ123456789 mod 987654321123456823;

would be a mistake: the latter may take centuries (guess why). The command

QuotientMod(r,s,m); returns the quotient r · s^(−1) of the elements r and s modulo m. In particular, using QuotientMod(1,s,m); to invert s is preferable to computing s^(−1) mod m directly. For example,

gap> QuotientMod(1,123456789,987654321123456823);

743084182864240163

gap> 123456789ˆ-1 mod 987654321123456823;

743084182864240163
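Why is PowerMod so much faster than first computing the power and then reducing? It squares and multiplies modulo m at every step, so intermediate numbers never grow. A Python sketch of the square-and-multiply idea (Python's built-in three-argument pow does the same thing natively):

```python
# Square-and-multiply modular exponentiation: a^k mod m without huge intermediates.
def power_mod(a, k, m):
    result = 1
    a %= m
    while k > 0:
        if k & 1:                  # current binary digit of k is 1
            result = result * a % m
        a = a * a % m              # square for the next binary digit
        k >>= 1
    return result
```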


The command PrimitiveRootMod(m); returns the smallest primitive root modulo m, and the discrete log of a to the base b modulo m is given by LogMod(a,b,m). For example,

gap> PrimitiveRootMod(23);

5

gap> LogMod(11,5,97);

86
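A brute-force version of LogMod in Python shows what is being computed (feasible only for tiny moduli, which is exactly why discrete logarithms are useful in cryptography; the function name is ours):

```python
# Smallest i >= 0 with b^i congruent to a mod m, or None if a is not a power of b.
def log_mod(a, b, m):
    x = 1
    for i in range(m):
        if x == a % m:
            return i
        x = x * b % m
    return None
```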

The command RootMod(m,p); will be needed for elliptic curves; it determines whether or not m is a quadratic residue in Z_p and, if it is, outputs k such that m = k^2 mod p.

gap> q:=[0,0,0,0,0,0,0,0,0,0,0,0];

[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]

gap> for i in [1..12] do

> q[i]:=RootMod(i,13);

> od;

gap> q;

[ 1, fail, 9, 11, fail, fail, fail, fail, 3, 7, fail, 8 ]

In the crypto section we needed to convert messages into numbers. Two small pro-

grams LettertoNumber and NumbertoLetter do the trick. They are not part

of GAP so you have to execute them before converting.

LtoN (an acronym for “Letter to Number”) takes any capital letter, which must

be put between apostrophes, e.g., ‘A’, and returns the corresponding number in the

range [0..25]. Any other argument would return −1, and print out an error message.

LtoN:=function(itamar)

local amith;

if itamar < 'A' or 'Z' < itamar then

Print("Out of range\n");

return -1;

else

amith:=INT_CHAR(itamar)-65;

return amith;

fi;

end;;

NtoL (an acronym of “Number to Letter”) takes any number, positive or negative,

and ﬁnds the corresponding letter. The argument must be an integer.

NtoL:=function(itamar)

local amith;

amith:=CHAR_INT(itamar mod 26+65);

return amith;

end;;

gap> Read("LettertoNumber");

gap> Read("NumbertoLetter");

gap> letters:="ABRACADABRA";


"ABRACADABRA"

gap> numbers:=[1..Length(letters)];

[ 1 .. 11 ]

gap> for i in [1..Length(letters)] do

> numbers[i]:=LtoN(letters[i]);

> od;

gap> numbers;

[ 0, 1, 17, 0, 2, 0, 3, 0, 1, 17, 0 ]

gap> letters2:="ZZZZZZZZZZZ";

"ZZZZZZZZZZZ"

gap> for i in [1..Length(numbers)] do

> letters2[i]:=NtoL(numbers[i]);

> od;

gap> letters2;

"ABRACADABRA"

Encoded with LtoN, different letters get codes with different numbers of digits (and a leading 'A' contributes a leading zero), so the numbers encoding two messages of the same length need not be of the same length. In such situations the following two programs can be used instead.

LtoN1 takes any capital letter, which must be put between apostrophes, e.g., ‘A’,

and returns the corresponding number in the range [11..36]. Any other argument

would return −1, and print out an error message.

LtoN1:=function(itamar)

local amith;

if itamar < 'A' or 'Z' < itamar then

Print("Out of range\n");

return -1;

else

amith:=INT_CHAR(itamar)-65+11;

return amith;

fi;

end;;

NtoL1 takes any two-digit number, positive or negative, and ﬁnds the corre-

sponding letter. The argument must be an integer.

NtoL1:=function(itamar)

local amith;

amith:=CHAR_INT((itamar-11) mod 26+65);

return amith;

end;;

The following program CNtoL1 written by Joel Laity is very convenient for

decryption of messages in RSA. It converts a number with any number of digits into

a message. For example,

gap> n:=1112131415161718192021222324252627282930313233343536;

1112131415161718192021222324252627282930313233343536

gap> CNtoL1(n);

"A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"

# CNtoL1 converts a number with an even number of digits to a sequence of characters.

# The digits are converted to characters two at a time, starting from the last two,

# using the function NtoL1, until the entire number is exhausted. The output is a string of the

# characters with spaces in between.

CNtoL1:=function(joel)

local n, string, temp, i;


if IsInt(joel) then

string:=[];

while joel > 0 do

n:=joel mod 100;

joel:= (joel-n)/100;

Add(string,NtoL1(n));

Add(string,' ');

od;

#reverses the order of the list

for i in [1..QuoInt(Length(string),2)] do

temp:=string[i];

string[i]:=string[Length(string)+1-i];

string[Length(string)+1-i]:=temp;

od;

#removes extra space

string:=string{[2..Length(string)]};

return string;

else Print("Input must be an integer!");

fi;

end;;
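The logic of CNtoL1 — peel off two digits at a time from the right, translate 11..36 to 'A'..'Z', then reverse — can be mirrored in Python (the function name is ours):

```python
# Decode a number built from two-digit letter codes (11 = 'A', ..., 36 = 'Z')
# into a space-separated string, as CNtoL1 does.
def cn_to_l1(n):
    letters = []
    while n > 0:
        n, pair = divmod(n, 100)             # last two digits first
        letters.append(chr((pair - 11) % 26 + 65))
    return " ".join(reversed(letters))
```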

gap> v:=[1,2,3];

[ 1, 2, 3 ]

gap> IsRowVector(v);

true

gap> 2*[1,1,1] + [1,2,3];

[ 3, 4, 5 ]

The matrix

    [ 1 2 3 ]
A = [ 4 5 6 ]
    [ 7 8 9 ]

will be presented as

gap> A:=[[1, 2, 3],[4, 5, 6],[7, 8, 9]];

[ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]

gap> IsMatrix(A);

true

gap> u:=[1,1,1];

[ 1, 1, 1 ]

gap> u*A;

[ 12, 15, 18 ]


One has to note that if we multiply the matrix A by a row vector u (which would not

be normally defined) it will actually calculate Au^T, e.g.,

gap> A*u;

[ 6, 15, 24 ]

gap> Determinant(A);

0

gap> B:=[[1,1,1],[0,2,1],[0,0,13]];

[ [ 1, 1, 1 ], [ 0, 2, 1 ], [ 0, 0, 13 ] ]

gap> Bˆ-1;

[ [ 1, -1/2, -1/26 ], [ 0, 1/2, -1/26 ], [ 0, 0, 1/13 ] ]

gap> Inverse(B);

[ [ 1, -1/2, -1/26 ], [ 0, 1/2, -1/26], [ 0, 0, 1/13 ] ]

Matrices with entries in Z_26 can be added, multiplied and inverted by appending mod 26 at the end of the command, e.g.,

gap> C:=[[1,1,1],[0,3,1],[0,0,5]];

[ [ 1, 1, 1 ], [ 0, 3, 1 ], [ 0, 0, 5 ] ]

gap> Cˆ-1 mod 26;

[ [ 1, 17, 12 ], [ 0, 9, 19 ], [ 0, 0, 21 ] ]
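That the displayed matrix really is the inverse of C modulo 26 is easy to confirm: multiplying the two matrices mod 26 must give the identity. A small Python check:

```python
# Multiply two n x n integer matrices and reduce every entry modulo m.
def mat_mul_mod(A, B, m):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % m
             for j in range(n)] for i in range(n)]
```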

9.4 Algebra

9.4.1 Permutations

GAP writes permutations as products of disjoint cycles. For example, the permutation

    ( 1 2 3 4 )
π = ( 2 1 4 3 )

will be represented as (1, 2)(3, 4). The identity permutation is represented as ( ). For

example:

gap> pi:=(1,2)(3,4);

(1,2)(3,4)

gap> piˆ2;

()

A permutation can also be defined by its last row using the command PermList.

For example, the permutation π can be deﬁned as

gap> pi:=PermList([2,1,4,3]);

(1,2)(3,4)

Given a permutation written as a product of disjoint cycles, we may recover its last

row using the command ListPerm:


gap> tau:=(1,3,4)(2,5,6,7);

(1,3,4)(2,5,6,7)

gap> ListPerm(tau);

[ 3, 5, 4, 1, 6, 7, 2 ]

gap> c:=PermList([2,3,4,1]);

(1,2,3,4)

gap> 2ˆc;

3

gap> Order(PermList([2,4,5,1,3]));

6
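The order of a permutation is the least common multiple of its cycle lengths; in Python, for a permutation given by its last row (as for PermList; function name ours):

```python
from math import gcd

# Order of a permutation given in one-line ("last row") form; GAP lists are 1-based.
def perm_order(last_row):
    seen = [False] * len(last_row)
    order = 1
    for start in range(len(last_row)):
        if not seen[start]:
            length, i = 0, start
            while not seen[i]:
                seen[i] = True
                i = last_row[i] - 1              # follow the cycle
                length += 1
            order = order * length // gcd(order, length)   # running lcm
    return order
```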

To generate a random permutation of degree n, first create the symmetric group with the command SymmetricGroup(n);. Then you can ask GAP for a random element. Say,

gap> G:=SymmetricGroup(9);

Sym( [ 1 .. 9 ] )

gap> Random(G);

(1,9,2,8,4,6,3,7)

For working in elliptic curves, ﬁrst of all we have to read the two ﬁles elliptic.gd and

elliptic.gi, given at the end of this section:

gap> Read("elliptic.gd");

gap> Read("elliptic.gi");

gap> G:=EllipticCurveGroup(a,b,p);

If we try to input parameters for which the discriminant of the cubic d = −(4a^3 + 27b^2) is zero it will return an error. If the discriminant is nonzero, it will generate

the group G. To list it we may use the command AsList(G);

gap> G:=EllipticCurveGroup(3,2,5);

EllipticCurveGroup(3,2,5)

gap> AsList(G);

[ ( 1, 1 ), ( 1, 4 ), ( 2, 1 ), ( 2, 4 ), infinity ]

gap> H:=EllipticCurveGroup(17,19,97);

EllipticCurveGroup(17,19,97)

gap> ptsList := AsList(H);

[ ( 2, 35 ), ( 2, 62 ), ( 3, 0 ), ( 4, 32), ( 4, 65 ), ( 5, 36 ), ( 5, 61 ),

( 7, 44 ), ( 7, 53 ), ( 8, 45 ), ( 8, 52 ), ( 10, 5 ), ( 10, 92 ),

( 12, 37 ), ( 12, 60 ), ( 13, 20 ), ( 13, 77 ), ( 14, 24 ), ( 14, 73 ),

( 16, 33 ), ( 16, 64 ), ( 23, 8 ), ( 23, 89 ), ( 24, 34 ), ( 24, 63 ),

( 25, 8 ), ( 25, 89 ), ( 31, 48 ), ( 31, 49 ), ( 35, 18 ), ( 35, 79 ),

( 36, 40 ), ( 36, 57 ), ( 37, 45 ), ( 37, 52 ), ( 38, 21 ), ( 38, 76 ),

( 40, 0 ), ( 41, 31 ), ( 41, 66 ), ( 44, 3 ), ( 44, 94 ), ( 45, 27 ),

( 45, 70 ), ( 46, 19 ), ( 46, 78 ), ( 47, 47 ), ( 47, 50 ), ( 49, 8 ),

( 49, 89 ), ( 51, 29 ), ( 51, 68 ), ( 52, 45 ), ( 52, 52 ), ( 54, 0 ),


( 63, 2 ), ( 63, 95 ), ( 65, 47 ), ( 65, 50 ), ( 66, 16 ), ( 66, 81 ),

( 68, 39 ), ( 68, 58 ), ( 69, 17 ), ( 69, 80 ), ( 70, 21 ), ( 70, 76 ),

( 71, 25 ), ( 71, 72 ), ( 76, 2 ), ( 76, 95 ), ( 79, 34 ), ( 79, 63 ),

( 81, 4 ), ( 81, 93 ), ( 82, 47 ), ( 82, 50 ), ( 83, 23 ), ( 83, 74 ),

( 85, 30 ), ( 85, 67 ), ( 86, 21 ), ( 86, 76 ), ( 89, 27 ), ( 89, 70 ),

( 91, 34 ), ( 91, 63 ), ( 92, 10 ), ( 92, 87 ), ( 93, 9 ), ( 93, 88 ),

( 96, 1 ), ( 96, 96 ), infinity ]

gap> Size(H);

100

GAP uses multiplicative notation for this group, so we multiply points instead of adding them and calculate P^(−1) instead of −P:

gap> point1:=ptsList[2];

( 2, 62 )

gap> point2:=ptsList[21];

( 16, 64 )

gap> point1 * point2;

( 81, 93 )

gap> point1ˆ-1;

( 2, 35 )

gap> g := Random(G);

( 92, 87 )

gap> gˆ5;

( 69, 80 )
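The group law that this multiplication implements is the usual chord-and-tangent addition. A Python sketch for affine points only (distinct points or doubling; the point at infinity and opposite points are not handled here):

```python
# Add points P and Q on y^2 = x^3 + a*x + b over Z_p (both affine, P != -Q).
def ec_add(P, Q, a, p):
    (x1, y1), (x2, y2) = P, Q
    if (x1, y1) != (x2, y2):
        lam = (y1 - y2) * pow(x1 - x2, -1, p) % p         # slope of the chord
    else:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # slope of the tangent
    x3 = (lam * lam - x1 - x2) % p
    y3 = (-(lam * (x3 - x1) + y1)) % p
    return (x3, y3)
```

With a = 17, p = 97 this reproduces the session above: (2, 62) · (16, 64) = (81, 93).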

You can determine orders of all elements of the group simultaneously using the

command

gap> List(ptsList,Order);

[ 50, 50, 2, 25, 25, 25, 25, 25, 25, 50, 50, 50, 50, 50, 50, 10, 10, 50, 50,

50, 50, 50, 50, 10, 10, 50, 50, 50, 50, 50, 50, 5, 5, 50, 50, 25, 25, 2,

50, 50, 50, 50, 50, 50, 25, 25, 50, 50, 50, 50, 25, 25, 5, 5, 2, 50, 50,

25, 25, 50, 50, 50, 50, 25, 25, 50, 50, 50, 50, 10, 10, 50, 50, 50, 50, 25,

25, 10, 10, 50, 50, 50, 50, 50, 50, 10, 10, 50, 50, 25, 25, 10, 10, 50, 50,

50, 50, 50, 50, 1 ]

The group of an elliptic curve for a 10-digit prime is already too big for GAP; it will

not be able to keep the whole group in the memory. For example, the two commands

gap> p:=123456791;;

gap> G:=EllipticCurveGroup(123,17,p);

will return an error. If one wants to calculate in larger groups, special techniques

must be applied.

We can ﬁnd out if the group is cyclic or not.

gap> n:=NextPrimeInt(12345);

12347

gap> G:=EllipticCurveGroup(123,17,n);

EllipticCurveGroup(123,17,12347)

gap> Size(G);

12371

gap> Random(G);

( 11802, 5830 )

gap> P:=Random(G);

gap> Order(P);

12371

gap> IsCyclic(G);

true


There is no known polynomial time algorithm which ﬁnds a point on the given curve,

although the following randomised algorithm gives us a point with probability close

to 1/2. This algorithm chooses x at random and tries to ﬁnd a matching y such that

(x, y) is on the curve. For example,

gap> p:=NextPrimeInt(99921);

99923

gap> G:=EllipticCurveGroup(123,17,p);

EllipticCurveGroup(123,17,99923)

gap> Size(G);

100260

gap> IsCyclic(G);

true

gap> x:=12345;

12345

gap> fx:=(xˆ3+123*x+17) mod p;

51321

gap> y:=RootMod(fx,p);

fail

gap> x:=1521;

gap> fx:=(xˆ3+123*x+17) mod p;

42493

gap> y:=RootMod(fx,p);

72372

It is not so easy to input a point of a given elliptic curve. Suppose we want to

input a point M = (2425, 89535) of the curve y^2 = x^3 + 12345 over Z_95701.

We must generate the curve but we also have to explain to GAP that M is a point

of the curve we have deﬁned. For this we present GAP with an already known point

of the target curve (for example, we can generate a point P on this curve at random)

and say that we will input a point of the same curve. We see how this can be done in

the example below:

gap> G:=EllipticCurveGroup(0,12345,95701);;

gap> P:=Random(G);

(91478, 65942 )

gap> M:=EllipticCurvePoint(FamilyObj(P),[2425,89535]);

( 2425, 89535 )

Finally, below are the ﬁles that have to be read before any calculations with elliptic

curves are possible.

#############################################################################

##

#W elliptic.gd Stefan Kohl

##

## This file contains declarations of functions etc. for computing with

## elliptic curve

##

DeclareCategoryCollections( "IsPointOnEllipticCurve" );


["x","y"] );

DeclareGlobalFunction( "EllipticCurvePoint" );

DeclareGlobalFunction( "EllipticCurveGroup" );

#############################################################################

##

#E elliptic.gd . . . . . . . . . . . . . . . . . . . . . . . . . ends here

##

#############################################################################

##

#W elliptic.gi Stefan Kohl

##

## This file contains implementations of methods and functions for

## computing in the elliptic curve point groups E( a , b )/p

## (only in affine Weierstrass form) ##

InstallGlobalFunction( EllipticCurvePoint,

function ( Fam, P )

local X, Y;

X := P[ 1 ]; Y := P[ 2 ];

if X <> infinity

and (Yˆ2) mod Fam!.p <> (Xˆ3 + Fam!.a*X + Fam!.b) mod Fam!.p

then Error( "The given point must be on the specified curve" ); fi;

if X = infinity

then Y := infinity;

else X := X mod Fam!.p; Y := Y mod Fam!.p;

fi;

return Objectify( NewType( Fam, IsPointOnEllipticCurve

and IsAffineWeierstrassRep ),

rec( x := X, y := Y ) );

end );

InstallGlobalFunction( EllipticCurveGroup,

function ( a, b, p )

local F, G, X, Y, FamName, ready, Point;

if not ( IsInt(a) and IsInt(b)

and IsPosInt(p) and IsPrimeInt(p) and p >= 5 )

then Error( "E(a,b)/p : <a> and <b> have to be integers, ",

" and p has to be a prime >= 5" ); fi;

if (4*aˆ3 + 27*bˆ2) mod p = 0

then Error( "<a> and <b> must satisfy 4*a3 + 27*b2 <> 0 (mod <p>)" );

fi;

String( p ) );

SetName( F, FamName );

F!.a := a;

F!.b := b;

F!.p := p;

X := 0; ready := false;


repeat

if Legendre( Xˆ3 + a*X + b, p ) = 1

then Y := RootMod( Xˆ3 + a*X + b, p );

Point := EllipticCurvePoint( F, [ X, Y ] );

if not IsBound( G )

then G := GroupByGenerators( [ Point ] );

else G := ClosureGroup( G, Point );

fi;

if p > 31 and Size( G ) > p - 2 * RootInt( p )

then ready := true; fi;

fi;

X := X + 1;

until X = p or ready;

SetIsWholeFamily( G, true );

SetName( G, Concatenation( "EllipticCurveGroup(", String( a ),

",", String( b ), ",", String( p ), ")" ) );

return G;

end );

InstallMethod( PrintObj,

"for element in E(a,b)/p, (AffineWeierstrassRep)",

true, [ IsPointOnEllipticCurve and IsAffineWeierstrassRep ], 0,

function( p )

Print( "EllipticCurvePoint( ", FamilyObj( p ),

", [ ",p!.x,", ", p!.y, " ] )" );

end );

InstallMethod( ViewObj,

"for element in E(a,b)/p, AffineWeierstrassRep",

true, [ IsPointOnEllipticCurve and IsAffineWeierstrassRep ], 0,

function( p )

if p!.x <> infinity

then Print( "( ",p!.x,", ", p!.y, " )" );

else Print( "infinity" );

fi; end );

InstallMethod( \=,

"for two elements in E(a,b)/p, AffineWeierstrassRep",

IsIdenticalObj,

[ IsPointOnEllipticCurve and IsAffineWeierstrassRep,

IsPointOnEllipticCurve and IsAffineWeierstrassRep ],

0,

function( x, y )

return x!.x = y!.x and x!.y = y!.y;

end );

InstallMethod( \<,

"for two elements in E(a,b)/p, AffineWeierstrassRep",

IsIdenticalObj,

[ IsPointOnEllipticCurve and IsAffineWeierstrassRep,

IsPointOnEllipticCurve and IsAffineWeierstrassRep ],

0,

function( x, y )

return [x!.x, x!.y] < [y!.x, y!.y];

end );

InstallMethod( \*,

"for two elements in E(a,b)/p, AffineWeierstrassRep",

IsIdenticalObj,


[ IsPointOnEllipticCurve and IsAffineWeierstrassRep,

IsPointOnEllipticCurve and IsAffineWeierstrassRep ],

0,

function( p1, p2 )

local lambda, p3, p, h;

p := FamilyObj( p1 )!.p;

if (p1!.x <> infinity) and (p2!.x <> infinity)

then

if p1!.x = p2!.x and p1!.y = (- p2!.y) mod FamilyObj( p2 )!.p

then p3 := rec( x := infinity, y := infinity );

else

if p1!.x <> p2!.x

then h := QuotientMod( 1, p1!.x - p2!.x, FamilyObj( p1 )!.p );

if h = fail then return Gcd( p1!.x - p2!.x, p ); fi;

lambda := (p1!.y - p2!.y) * h;

else h := QuotientMod( 1, 2 * p1!.y, FamilyObj( p1 )!.p );

if h = fail then return Gcd( 2 * p1!.y, p ); fi;

lambda := (3 * p1!.xˆ2 + FamilyObj( p1 )!.a) * h;

fi;

p3 := rec();

p3.x := lambdaˆ2 - p1!.x - p2!.x;

p3.y := - (lambda * (p3.x - p1!.x) + p1!.y);

fi;

else

if p1!.x = infinity then p3 := rec( x := p2!.x, y := p2!.y );

else p3 := rec( x := p1!.x, y := p1!.y ); fi;

fi;

return EllipticCurvePoint( FamilyObj( p1 ), [ p3.x, p3.y ] );

end );

InstallMethod( OneOp,

"for an element in E(a,b)/p, AffineWeierstrassRep",

true,

[ IsPointOnEllipticCurve ], 0,

x -> EllipticCurvePoint( FamilyObj( x ),

[ infinity, infinity ] )

);

InstallMethod( InverseOp,

"for an element in E(a,b)/p, AffineWeierstrassRep",

true,

[ IsPointOnEllipticCurve and IsAffineWeierstrassRep ], 0,

function ( p )

if p!.x = infinity

then return EllipticCurvePoint( FamilyObj( p ), [ infinity, infinity ] );

else return EllipticCurvePoint( FamilyObj( p ),

[ p!.x, (- p!.y) mod FamilyObj( p )!.p ] );

fi;

end );

InstallMethod( Random,

"for group E(a,b)/p",

true,

[ CategoryCollections( IsPointOnEllipticCurve )

and IsWholeFamily ], 0,

function ( G )

local X, Y, a, b, p;

a := ElementsFamily( FamilyObj( G ) )!.a;

b := ElementsFamily( FamilyObj( G ) )!.b;

p := ElementsFamily( FamilyObj( G ) )!.p;


repeat

X := Random( [0 .. p - 1] );

until Legendre( Xˆ3 + a*X + b, p ) = 1;

Y := RootMod( Xˆ3 + a*X + b, p );

return EllipticCurvePoint( ElementsFamily( FamilyObj( G ) ), [ X, Y ] );

end );

#############################################################################

##

#E elliptic.gi . . . . . . . . . . . . . . . . . . . . . . . . . ends here

##

GAP knows about all the finite fields. To create the finite field Z_p, type GF(p);. For

example,

gap> F:=GF(5);;

gap> List:=Elements(F);

[ 0*Z(5), Z(5)ˆ0, Z(5), Z(5)ˆ2, Z(5)ˆ3 ]

The ﬁrst element is 0 (GAP makes it clear that this is the zero of Z5 and not, say,

of Z3 ). The remaining elements are powers of a primitive element of Z5 , and, in

particular, the second element is 1. Type Int(Z(5)); to determine the value of

Z(5) (as an integer mod 5).

gap> Int(Z(5));

2

gap> value:=[0,0,0,0,0];;

gap> for i in [1..5] do

> value[i]:=Int(List[i]);

> od;

gap> value;

[ 0, 1, 2, 4, 3 ]

gap> F:=GF(7);

GF(7)

gap> Elements(F);

[ 0*Z(7), Z(7)ˆ0, Z(7), Z(7)ˆ2, Z(7)ˆ3, Z(7)ˆ4, Z(7)ˆ5 ]

gap> # Here 0*Z(7)=0, Z(7)ˆ0=1, Z(7)=3, Z(7)ˆ2=2, Z(7)ˆ3=6, Z(7)ˆ4=4, Z(7)ˆ5=5.

gap> # Z(7) is not 2 since 2 is not a primitive element.

In GAP the generator Z(p) is chosen as the smallest primitive root mod p, as is obtained from the PrimitiveRootMod function. Here's how to verify this for p = 7 and p = 123456791:

gap> PrimitiveRootMod(7);

3

gap> p:=123456791;;

gap> PrimitiveRootMod(p);

17


To create the finite field GF(p^k) with p^k elements, type GF(pˆk);. For example,

gap> GF4:=GF(4);

GF(2ˆ2)

gap> F:=Elements(GF4);

[ 0*Z(2), Z(2)ˆ0, Z(2ˆ2), Z(2ˆ2)ˆ2 ]

Since F* is a cyclic group, GAP uses a generator of this cyclic group, denoted Z(pˆk),

to list all elements (except zero) as its powers.

gap> GF4:=GF(4);

GF(2ˆ2)

gap> gf4:=Elements(GF4);

[ 0*Z(2), Z(2)ˆ0, Z(2ˆ2), Z(2ˆ2)ˆ2 ]

gap> # Note that GAP lists elements of Z_2 first.

gap> GF8:=GF(8);

GF(2ˆ3)

gap> gf8:=Elements(GF8);

[ 0*Z(2), Z(2)ˆ0, Z(2ˆ3), Z(2ˆ3)ˆ2, Z(2ˆ3)ˆ3, Z(2ˆ3)ˆ4, Z(2ˆ3)ˆ5, Z(2ˆ3)ˆ6 ]

Note that GF(8) contains GF(2) but not GF(4). It is a general fact that GF(p^m) contains GF(p^k) as a subfield if and only if k | m.

gap> GF9:=GF(9);

GF(3ˆ2)

gap> gf9:=Elements(GF9);

[ 0*Z(3), Z(3)ˆ0, Z(3), Z(3ˆ2), Z(3ˆ2)ˆ2, Z(3ˆ2)ˆ3, Z(3ˆ2)ˆ5, Z(3ˆ2)ˆ6,

Z(3ˆ2)ˆ7 ]

Note that GAP lists elements of Z_3 first. Next, let's try adding, subtracting, and multiplying field elements in GAP. For example, in GF(9):

[ 0*Z(3), Z(3)ˆ0, Z(3), Z(3ˆ2), Z(3ˆ2)ˆ2, Z(3ˆ2)ˆ3, Z(3ˆ2)ˆ5, Z(3ˆ2)ˆ6, Z(3ˆ2)ˆ7 ]

gap> gf9[5]+gf9[6]; gf9[5]-gf9[7];

Z(3)

Z(3ˆ2)ˆ3

gap> gf9[5]ˆ2;

Z(3)

For a nonzero element z of the field and a root r, the discrete logarithm of z with respect to r is the smallest nonnegative integer i such that r^i = z. The command LogFFE(z, r) returns this value. (Note that r need not be a primitive element of the field for this command to work.) An error is signalled if z is zero, or if z is not a power of r.

gap> LogFFE( Z(409)ˆ116, Z(409) ); LogFFE( Z(409)ˆ116, Z(409)ˆ2 );

116

58

9.4.4 Polynomials

It is not too hard to explain to GAP that we now want x to be a polynomial. We can

deﬁne the polynomial ring F[x] ﬁrst. For example, we deﬁne the polynomial ring in

one variable x over Z2 as follows:

gap> R:=PolynomialRing(GF(2),["x"]);

PolynomialRing(..., [ x ])

gap> x:=IndeterminatesOfPolynomialRing(R)[1];

x


Now GAP will understand the following commands in which we deﬁne a polynomial

1 + x + x^3 ∈ Z_2[x] and substitute the primitive element of GF(8) in it. All calculations will therefore be conducted in the field GF(8):

gap> p:=Z(2)+x+xˆ3;

xˆ3+x+Z(2)ˆ0

gap> Value(p,Z(2ˆ3));

0*Z(2)

This tells us that the generator Z(2ˆ3) of GF(8) is a root of the polynomial p(x) = x^3 + x + 1 over Z_2.

We can factorise polynomials as follows:

gap> Factors(xˆ16+x+1);

[ xˆ8+xˆ6+xˆ5+xˆ3+Z(2)ˆ0, xˆ8+xˆ6+xˆ5+xˆ4+xˆ3+x+Z(2)ˆ0 ]

gap> g:=xˆ3+1;

xˆ3+Z(2)ˆ0

gap> h:=xˆ4+xˆ2+1;

xˆ4+xˆ2+Z(2)ˆ0

gap> Gcd(g,h);

xˆ2+x+Z(2)ˆ0

gap> GcdRepresentation(g,h);

[ x, Z(2)ˆ0 ]

gap> x*g+Z(2)ˆ0*h;

xˆ2+x+Z(2)ˆ0

We can also factorise x^12 − 1 as a polynomial from Q[x]:

gap> x := Indeterminate(Rationals);

x_1

gap> Factors(xˆ12-1);

[x_1-1, x_1+1, x_1ˆ2-x_1+1, x_1ˆ2+1, x_1ˆ2+x_1+1, x_1ˆ4-x_1ˆ2+1 ]

When you type x, GAP understands what you want to say but still gives the answer

in terms of x1 . Another useful command:

gap> QuotientRemainder( (x+1)*(x+2)+5, x+1 );

[ x_1+2, 5 ]

The command InterpolatedPolynomial(R,x,y); returns, for given lists x and y of elements in a ring R of the same length, say, n, the unique polynomial of degree less than n which has value y[i] at x[i], for all i = 1, 2, . . . , n. Note that the elements in x must be distinct. For example,

gap> InterpolatedPolynomial( Integers, [ 1, 2, 3 ], [ 5, 7, 0 ] );

-9/2*xˆ2+31/2*x-6
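What InterpolatedPolynomial computes can be reproduced with Lagrange's interpolation formula; here is a Python sketch that evaluates the interpolating polynomial at a point, using exact rational arithmetic (function name ours):

```python
from fractions import Fraction

# Evaluate at t the unique polynomial of degree < n through the points (xs[i], ys[i]).
def lagrange_eval(xs, ys, t):
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(t - xj, xi - xj)   # Lagrange basis factor
        total += term
    return total
```

At t = 0 this gives −6, the constant term of −9/2·x^2 + 31/2·x − 6 found by GAP.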

gap> F:=GF(2ˆ6);

GF(2ˆ6)

gap> elts:=Elements(F);

[ 0*Z(2), Z(2)ˆ0, Z(2ˆ2), Z(2ˆ2)ˆ2, Z(2ˆ3), Z(2ˆ3)ˆ2, Z(2ˆ3)ˆ3, Z(2ˆ3)ˆ4,

Z(2ˆ3)ˆ5, Z(2ˆ3)ˆ6, Z(2ˆ6), Z(2ˆ6)ˆ2, Z(2ˆ6)ˆ3, Z(2ˆ6)ˆ4, Z(2ˆ6)ˆ5,

Z(2ˆ6)ˆ6, Z(2ˆ6)ˆ7, Z(2ˆ6)ˆ8, Z(2ˆ6)ˆ10, Z(2ˆ6)ˆ11, Z(2ˆ6)ˆ12, Z(2ˆ6)ˆ13,

Z(2ˆ6)ˆ14, Z(2ˆ6)ˆ15, Z(2ˆ6)ˆ16, Z(2ˆ6)ˆ17, Z(2ˆ6)ˆ19, Z(2ˆ6)ˆ20,

Z(2ˆ6)ˆ22, Z(2ˆ6)ˆ23, Z(2ˆ6)ˆ24, Z(2ˆ6)ˆ25, Z(2ˆ6)ˆ26, Z(2ˆ6)ˆ28,

Z(2ˆ6)ˆ29, Z(2ˆ6)ˆ30, Z(2ˆ6)ˆ31, Z(2ˆ6)ˆ32, Z(2ˆ6)ˆ33, Z(2ˆ6)ˆ34,


Z(2ˆ6)ˆ43, Z(2ˆ6)ˆ44, Z(2ˆ6)ˆ46, Z(2ˆ6)ˆ47, Z(2ˆ6)ˆ48, Z(2ˆ6)ˆ49,

Z(2ˆ6)ˆ50, Z(2ˆ6)ˆ51, Z(2ˆ6)ˆ52, Z(2ˆ6)ˆ53, Z(2ˆ6)ˆ55, Z(2ˆ6)ˆ56,

Z(2ˆ6)ˆ57, Z(2ˆ6)ˆ58, Z(2ˆ6)ˆ59, Z(2ˆ6)ˆ60, Z(2ˆ6)ˆ61, Z(2ˆ6)ˆ62 ]

gap> a:=elts[11];

Z(2ˆ6)

gap> MinimalPolynomial(GF(2),a);

x_1ˆ6+x_1ˆ4+x_1ˆ3+x_1+Z(2)ˆ0

gap> MinimalPolynomial(GF(2ˆ3),a);

x_1ˆ2+Z(2ˆ3)*x_1+Z(2ˆ3)

Thus the minimal polynomials of a = Z(2ˆ6) over GF(2) and over GF(2ˆ3) are

m(t) = t^6 + t^4 + t^3 + t + 1 and m_1(t) = t^2 + αt + α,

where α = Z(2ˆ3).

Chapter 10

Appendix B: Miscellanies

Lemma 10.1.1 Let A = [ a1 , a2 , . . . , an ] and B = [ b1 , b2 , . . . , bn ] be two m × n

matrices given by their columns a1 , a2 , . . . , an and b1 , b2 , . . . , bn . Suppose that A

is row reducible to B. Then a system of columns {a_i1, a_i2, . . . , a_ik} of A is linearly independent if and only if the system {b_i1, b_i2, . . . , b_ik} is linearly independent.

Proof Let x = (x1 , x2 , . . . , xn )T . Then

Ax = x1 a1 + x2 a2 + · · · + xn an and Bx = x1 b1 + x2 b2 + · · · + xn bn .

Since elementary row operations do not change the solution set of systems of linear

equations, we know that

Ax = 0 if and only if Bx = 0.

The algorithm is used when we are given a set of vectors v1 , v2 , . . . , vn ∈ Rn and

we need to identify a basis of span{v1 , v2 , . . . , vn } and express all other vectors as

linear combinations of that basis. We form a matrix (v1 · · · vn ) whose columns are the

given vectors and reduce it to the reduced row echelon form where all relationships

are transparent.

Example 10.1.1 The matrix A = [a1 , a2 , . . . , a5 ] with columns a1 , a2 , . . . , a5

is brought to its reduced row echelon form R = [r1 , r2 , . . . , r5 ] with columns

© Springer International Publishing Switzerland 2015

A. Slinko, Algebra for Applications, Springer Undergraduate Mathematics Series,

DOI 10.1007/978-3-319-21951-6_10


r1 , r2 , . . . , r5 as follows:

    [ 1 −1 0 1 −4 ]        [ 1 0 1 0  2 ]
A = [ 0  2 2 2  0 ]  rref  [ 0 1 1 0  3 ]
    [ 2  1 3 1  4 ]  −→    [ 0 0 0 1 −3 ] .
    [ 3  2 5 4  0 ]        [ 0 0 0 0  0 ]

The relationships between columns of R are much more transparent than those of

A. For example, we see that {r1 , r2 , r4 } is linearly independent (as a part of the

standard basis of R4 ) and that r1 + r2 − r3 = 0 and r5 = 2r1 + 3r2 − 3r4 .

Hence we can conclude that {a1 , a2 , a4 } is linearly independent, hence a basis of

span{a1 , a2 , . . . , a5 } and that a3 = a1 + a2 and a5 = 2a1 + 3a2 − 3a4 .

The Vandermonde determinant

                             | 1          1          · · ·  1          |
                             | x_1        x_2        · · ·  x_n        |
V_n(x_1, x_2, . . . , x_n) = | x_1^2      x_2^2      · · ·  x_n^2      |     (10.2)
                             | . . .      . . .      . . .  . . .      |
                             | x_1^(n−1)  x_2^(n−1)  · · ·  x_n^(n−1)  |

plays a signiﬁcant role in algebra and applications. It can be deﬁned over any ﬁeld,

has a beautiful structure and can be calculated directly for any order.

More precisely, the following theorem is true.

Theorem 10.2.1 Let a_1, a_2, . . . , a_n be elements of a field F. Then the value of the Vandermonde determinant of order n ≥ 2 is

V_n(a_1, a_2, . . . , a_n) = ∏_(1≤i<j≤n) (a_j − a_i).     (10.3)

Proof Since V2 = a2 − a1 we get a basis for induction. Suppose the theorem is true

for order n − 1. Consider the determinant

       | 1         1          · · ·  1          |
       | x         a_2        · · ·  a_n        |
f(x) = | x^2       a_2^2      · · ·  a_n^2      |
       | . . .     . . .      . . .  . . .      |
       | x^(n−1)   a_2^(n−1)  · · ·  a_n^(n−1)  |

10.2 The Vandermonde Determinant 251

If we expand it using cofactors of the ﬁrst column we will see that it has degree n − 1.

Also it is easy to see that f (a2 ) = · · · = f (an ) = 0 since, if we replace x with any

of the ai for i > 1, we will have a determinant with two equal columns. Hence

f (x) = C(x − a2 ) . . . (x − an ).

From the expansion of f(x) by cofactors of the first column we see that C = (−1)^(n+1) V_(n−1)(a_2, . . . , a_n). Hence we have

V_n(a_1, a_2, . . . , a_n) = f(a_1) = (−1)^(n+1) V_(n−1)(a_2, . . . , a_n) (a_1 − a_2) · · · (a_1 − a_n)

= (a_2 − a_1) · · · (a_n − a_1) ∏_(2≤i<j≤n) (a_j − a_i) = ∏_(1≤i<j≤n) (a_j − a_i).
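The product formula is easy to check numerically for small n: a Leibniz-formula determinant of the Vandermonde matrix must equal the product of the differences a_j − a_i over i < j. A Python sketch (function names are ours):

```python
from itertools import permutations

# Determinant via the Leibniz formula (exponential time, fine for tiny matrices).
def det(M):
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):                  # sign = parity of the permutation
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = 1
        for i in range(n):
            prod *= M[i][perm[i]]
        total += sign * prod
    return total

# Rows of the Vandermonde matrix: the 0th, 1st, ..., (n-1)st powers of the points.
def vandermonde(a):
    return [[x ** i for x in a] for i in range(len(a))]
```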

In particular, if a_1, a_2, . . . , a_n are distinct elements of F, then the Vandermonde determinant V_n(a_1, a_2, . . . , a_n) is nonzero.

The determinant

                              | x_1    x_2    · · ·  x_n    |
                              | x_1^2  x_2^2  · · ·  x_n^2  |
V′_n(x_1, x_2, . . . , x_n) = | . . .  . . .  . . .  . . .  |     (10.4)
                              | x_1^n  x_2^n  · · ·  x_n^n  |

can be reduced to the original Vandermonde determinant as the following theorem states.

Theorem 10.2.2 Let a_1, a_2, . . . , a_n be elements of the field F. Then

V′_n(a_1, a_2, . . . , a_n) = ( a_1 a_2 · · · a_n ) V_n(a_1, a_2, . . . , a_n).     (10.5)
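For n = 3 the theorem can be verified directly, since each column of the new determinant is a_i times the corresponding Vandermonde column. A quick Python check using the cofactor formula for 3 × 3 determinants (function name ours):

```python
# 3 x 3 determinant by cofactor expansion along the first row.
def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
```

With a = (2, 5, 7) the ordinary Vandermonde determinant is 30 and the modified one is 2 · 5 · 7 · 30 = 2100.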

Chapter 11

Solutions to Exercises

1. (a) The whole set of integers itself does not contain a smallest element.

(b) The set {1/2, 1/3, . . . , 1/n, . . .} does not contain a smallest element.

2. Here we just need the Principle of Mathematical Induction. For n = 1, the integer 4^n + 15n − 1 = 18 is divisible by 9. This is a basis for the induction. Suppose that 4^n + 15n − 1 is divisible by 9 for some n > 1. Let us consider 4^(n+1) + 15(n + 1) − 1 and represent it as 4 · 4^n + 15n + 14 = 4(4^n + 15n − 1) − 45n + 18. This is now obviously divisible by 9 since both 4(4^n + 15n − 1) and 45n + 18 are (the former by induction hypothesis). Thus 4^(n+1) + 15(n + 1) − 1 is divisible by 9 and the induction step has been proven.

3. For n = 0 we have 11^2 + 12^1 = 133 which is, of course, divisible by 133. This gives us a basis for the induction.

We need the Principle of Mathematical Induction again. Suppose now 133 | 11^(n+2) + 12^(2n+1) (induction hypothesis) and let us consider

11^(n+3) + 12^(2n+3) = 144(11^(n+2) + 12^(2n+1)) − 133 · 11^(n+2).

The right-hand side is divisible by 133. Indeed, the first summand is divisible by 133 by induction hypothesis and the second is simply a multiple of 133. Thus 11^(n+3) + 12^(2n+3) is divisible by 133, which completes the induction step and the proof.

4. We have F0 = 3 and F1 = 5. We see that F0 = F1 − 2 and this is a basis for our

induction. The induction step



(F_(n+1) − 2) F_(n+1) = (2^(2^(n+1)) − 1)(2^(2^(n+1)) + 1) = 2^(2^(n+2)) − 1 = F_(n+2) − 2.
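The identity being proved here, F_0 F_1 · · · F_n = F_(n+1) − 2 for the Fermat numbers F_n = 2^(2^n) + 1, is easy to confirm numerically for small n:

```python
# The nth Fermat number.
def fermat(n):
    return 2 ** (2 ** n) + 1
```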

5. For k = 1 we have 3^k = 3, which is a divisor of 2^3 + 1 = 9. This gives us a basis
for the induction.
Suppose now 3^k | 2^{3^k} + 1 (induction hypothesis). Then there exists an integer m
such that m · 3^k = 2^{3^k} + 1, and let us consider

2^{3^{k+1}} = (2^{3^k})^3 = (m · 3^k − 1)^3 = m^3 · 3^{3k} − m^2 · 3^{2k+1} + m · 3^{k+1} − 1 = t · 3^{k+1} − 1,

where t = m^3 · 3^{2k−1} − m^2 · 3^k + m is an integer. Thus 3^{k+1} | 2^{3^{k+1}} + 1, which
proves the induction step.
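Since 2^(3^k) is astronomically large already for modest k, a numerical check is best done with modular exponentiation; a Python sketch:

```python
# Verify 3**k divides 2**(3**k) + 1 using three-argument pow,
# which never forms the huge number 2**(3**k) itself.
def holds(k: int) -> bool:
    m = 3**k
    return (pow(2, 3**k, m) + 1) % m == 0

ok = all(holds(k) for k in range(1, 12))
```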

6. Let M be the minimal number that cannot be represented as required. Then M is
between two powers from the list, say 2^k < M < 2^{k+1}. Since M is minimal, the
number M − 2^k can be represented as

M − 2^k = 2^{i_1} + · · · + 2^{i_s},

where i_1 < · · · < i_s. Since M − 2^k < 2^k, it is clear that 2^k > 2^{i_s}. Therefore

M = 2^{i_1} + · · · + 2^{i_s} + 2^k

is a representation of M of the required form, contrary to what was assumed. This contradiction proves the statement.

7. Let M be the minimal positive integer which can be represented as a sum of
distinct powers of 2 in two different ways:

M = 2^{i_1} + · · · + 2^{i_s} = 2^{j_1} + · · · + 2^{j_t}.

The exponents i_1 and j_1 cannot both be positive: if they were, both sides would be
even, so we could divide both sides by 2 and get two different representations for M/2,
which contradicts the minimality of M. If i_1 = j_1 = 0, then 2^{i_1} = 2^{j_1} = 1 and
subtracting 1 on both sides we would get two different representations for M − 1,
which again contradicts the minimality of M.
Hence we may assume that i_1 = 0 and j_1 > 0, so that

1 + 2^{i_2} + · · · + 2^{i_s} = 2^{j_1} + · · · + 2^{j_t}.

But then the left-hand side is odd and the right-hand side is even.
This contradiction shows that such a minimal counterexample M does not exist
and all integers can be uniquely represented.

8. Consider a minimal counter example, i.e., any conﬁguration of discs which cannot

be painted as required and which consists of the least possible number of discs.

Consider the centers of all discs and consider the convex hull of them. This hull

11.1 Solutions to Exercises of Chap. 1 255

is a convex polygon and each angle of it is less than 180◦. If a disc with the centre
O is touched by two other discs with centres P and Q, then |OP| = |OQ| = 2r
while |PQ| ≥ 2r (where r is the common radius of the discs), so in the triangle POQ
the side PQ is at least as long as the other two sides,
whence ∠POQ ≥ 60◦. Thus every disc with centre at a vertex of the convex hull

cannot be touched by more than three other discs. Remove any of the discs whose

center is at the vertex of the convex hull. Then the rest of the discs can already

be painted because the counterexample was minimal. But then the removed disc

can be painted as well, since it was touched by at most three other discs and we

can choose the fourth colour to paint it. This contradiction proves the statement.

1. The 2007th prime will not be stored in Primes so we have to use the command

NextPrimeInt to ﬁnd it:

gap> p:=1;;

gap> n:=2007;;

gap> for i in [1..n] do

> p:=NextPrimeInt(p);

> od;

gap> p;

17449
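The same computation can be reproduced without GAP; a Python sketch using a simple sieve of Eratosthenes (the bound 20000 is our assumption, chosen because it comfortably exceeds the 2007th prime):

```python
def nth_prime(n: int, limit: int = 20000) -> int:
    # Sieve of Eratosthenes up to `limit`, then pick the n-th prime.
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            for j in range(i*i, limit + 1, i):
                sieve[j] = False
    primes = [i for i, is_p in enumerate(sieve) if is_p]
    return primes[n - 1]

p2007 = nth_prime(2007)
```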

2. The following GAP program

gap> k:=1;;

gap> N:=Primes[1];;

gap> while IsPrime(N+1)=true do

> k:=k+1;

> N:=N*Primes[k];

> od;

gap> k;

6

gap> N:=N+1;

30031

gap> FactorsInt(N);

[ 59, 509 ]

Then N6 = 30031 = 59 · 509. Both 59 and 509 are greater than p6 = 13.
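The arithmetic behind this Euclid-style example is easy to confirm; a Python check:

```python
import math

# N6 = 2 * 3 * 5 * 7 * 11 * 13 + 1 is not prime, but both of its
# prime factors exceed p6 = 13.
N6 = math.prod([2, 3, 5, 7, 11, 13]) + 1
factors_ok = (N6 == 59 * 509)
```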


a^3 − 27. Hence a − 3 divides a^3 − 17 if and only if it divides the difference
(a^3 − 17) − (a^3 − 27) = 10. This happens if and only if a − 3 ∈ {±1, ±2, ±5, ±10}.

4. We will prove the second statement. Let p > 2 be a prime. Let us divide it by

6 with remainder: p = 6k + r , where r = 0, 1, 2, 3, 4, 5. When r takes values

0, 2, 3, 4 the right-hand side is divisible by 2 or 3, hence in this case p cannot be

a prime. Only two possibilities are left: p = 6k + 1 and p = 6k + 5. Examples

of primes of these two sorts are 7 and 11.

5. Let p = 3k +1 be a prime. Then p > 2 and hence it is odd. But then 3k = p −1 is

even and 3k = 2m. Due to uniqueness of prime factorisation, k must be divisible

by 2, i.e., k = 2k′. Therefore p = 3k + 1 = 6k′ + 1.

6. Here is the program:

Primes1:=[];;

Primes3:=[];;

numbers:=[1..168];;

for i in numbers do

if RemInt(Primes[i],4)=1 then

Add(Primes1,Primes[i]);

fi;

if RemInt(Primes[i],4)= 3 then

Add(Primes3,Primes[i]);

fi;

od;

Length(Primes1);

Length(Primes3);

Primes1[32];

Primes3[53];

Position(Primes1,601);

Position(Primes3,607);

7. (a) Here is the program and the calculation:

gap> NicePrimes:=[];

[ ]

gap> for i in [1..Length(Primes)] do

> if RemInt(Primes[i],6)=5 then

> Add(NicePrimes,Primes[i]);

> fi;

> od;

gap> NicePrimes;

[ 5, 11, 17, 23, 29, 41, 47, 53, 59, 71, 83, 89, 101, 107,

113, 131, 137, 149, 167, 173, 179, 191, 197, 227, 233, 239,

251, 257, 263, 269, 281, 293, 311, 317, 347, 353, 359, 383,

389, 401, 419, 431, 443, 449, 461, 467, 479, 491, 503, 509,

521, 557, 563, 569, 587, 593, 599, 617, 641, 647, 653, 659,

677, 683, 701, 719, 743, 761, 773, 797, 809, 821, 827, 839,

857, 863, 881, 887, 911, 929, 941, 947, 953, 971, 977, 983 ]

(b) We know (see Exercise 4) that all primes p > 3 fall into two categories: those

for which p = 6k + 1 and those for which p = 6k + 5.


First note that if n_1 = 6k_1 + 1 and n_2 = 6k_2 + 1, then their product n_1 n_2 = 6(6k_1 k_2 + k_1 + k_2) + 1 is again of the form 6k + 1.

Now we assume that there are only ﬁnitely many primes p such that p =

6k + 5. Then there is the largest such prime. Let p1 , p2 , . . . , pn , . . . be the

sequence of all primes in increasing order with pn being the largest prime

that gives remainder 5 on division by 6. Consider the number

N = p1 p2 . . . pn − 1.

This number has remainder 5 on division by 6 (as 6 divides the product p_1 p_2 . . . p_n) and hence belongs to the second category.

Let q be any prime that divides N . Obviously it is different from all of the

p1 , p2 , . . . , pn . Since q > pn it must be of the type q = 6k + 1. Thus every

prime that divides N has remainder 1 on division by 6, then, as we noted

above, the same must be true for N , which contradicts the fact that N has

remainder 5 on division by 6.

8. There are many alternative proofs of the fact that the number of primes is infinite.
Here is one of those. Assume on the contrary that there are only k primes
p_1, p_2, . . . , p_k. Then every positive integer not exceeding n can be written as a
product p_1^{α_1} p_2^{α_2} · · · p_k^{α_k}. Given n, let us find an upper bound f(n) for
the number of such products by bounding the values which the exponents
α_1, α_2, . . . , α_k might assume. Since n ≥ p_i^{α_i} ≥ 2^{α_i} we obtain α_i ≤ log_2 n.
Then the number of products which do not exceed n will be at most

f(n) = (log_2 n + 1)^k.

example, we may use L'Hôpital's rule to show that

f (n)

lim = 0.

n→∞ n

This will be an absurdity since for large n there will not be enough prime factori-

sations for all positive integers between 1 and n.


1. (a) Notice that since √210 < 17, every composite number below 210 is divisible by one of 2, 3, 5, 7, 11
or 13. The primes to be found are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127,
131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199.

Hence π(210) = 46.

(b) We have

210 / ln 210 ≈ 39,

which is somewhat lower than 46. This shows that the approximation given

by the Prime Number Theorem is not very good for small values of n.

2. Straightforward.
3. (a) No, because √n may exceed 10000: n can be, for example, a square of a prime p
such that 10000 < p ≤ 11093.
(b) The number of possible prime divisors is approximately x/ln(x) where
x = √123123137, so approximately 1193 divisions are needed. The professor
has already done 10000/ln(10000) ≈ 1085 divisions; he needs to do another
108.

4. Since n is composite, n = p_1 p_2 . . . p_m, where p_i is prime for all i = 1, 2, . . . , m,
and we do not assume that all of them are different. We are given that p_i > ⌊∛n⌋
and m ≥ 2. Then we also have p_i > ∛n because p_i is an integer. Suppose that
m ≥ 3. Then

n = p_1 p_2 . . . p_m ≥ p_1 p_2 p_3 > (∛n)^3 = n,

a contradiction. Hence m = 2 and n is a product of exactly two primes.

5. Let n > 6 be an integer. If n is odd, then 2 and n − 2 are relatively prime (since
n − 2 is odd) and n = 2 + (n − 2) is a valid solution. More generally, if there
is a prime p which is smaller than n − 1 and does not divide n, we can write
n = p + (n − p) and gcd(p, n − p) = 1.
Thus, since n > 6, we may assume that 2|n, 3|n, 5|n and hence 30|n. In particular, n is
composite. Let q be the largest prime divisor of n. Then n ≥ 6q so 5 ≤ q ≤ n/6.
By Bertrand's postulate there is a prime p such that q < p < 2q ≤ n/3 < n.
Now gcd(p, n) = 1 and so n = p + (n − p) is the solution (note that
n − p ≥ n − n/3 > 1).

6. The program does some kind of sieving but the result is very different from the

result of the Sieve of Eratosthenes. It outputs all powers of 2 between 1 and 106 .

There are 20 such numbers in total.

1. We have 2^2 · 3^3 · 4^4 · 5^5 = 2^10 · 3^3 · 5^5


and the number of divisors will be (10 + 1)(3 + 1)(5 + 1) = 264. Note that we

cannot use the formula straight as 4 is not prime.

2. We factor this number with GAP:

gap> FactorsInt(123456789);

[ 3, 3, 3607, 3803 ]

The number of divisors then will be (2 + 1)(1 + 1)(1 + 1) = 12.

3. The common divisors of 10650 and 6750 are the divisors of gcd(10650, 6750).

So, let us calculate this number using the Euclidean algorithm. We will find:

10650 = 1 · 6750 + 3900
6750 = 1 · 3900 + 2850
3900 = 1 · 2850 + 1050
2850 = 2 · 1050 + 750
1050 = 1 · 750 + 300
750 = 2 · 300 + 150
300 = 2 · 150

Hence gcd(10650, 6750) = 150.

Therefore the common divisors of 10650 and 6750 are the factors of 150, which

are 1, 2, 3, 5, 6, 10, 15, 25, 30, 50, 75, 150.
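The division chain above is exactly what a small implementation of the Euclidean algorithm produces; a Python sketch:

```python
def gcd(a: int, b: int) -> int:
    # Euclidean algorithm: repeatedly replace (a, b) by (b, a mod b).
    while b:
        a, b = b, a % b
    return a

g = gcd(10650, 6750)
```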

4. (a) We have gcd(m, n) = 2^2 · 5^4 · 11^2; lcm(m, n) = 2^4 · 3^2 · 5^7 · 7^2 · 11^3.

(b) Using GAP we calculate

Also

5. The prime factorisation of 33 is 33 = 3 · 11. This number of divisors can occur
when the number is equal to p^32, where p is prime, or when the number is p^10 q^2,
where p, q are primes. As 2^32 > 10000, the first possibility cannot occur. In the
second, since 3^10 > 10000, the number can only be of the form n = 2^10 q^2. The
smallest unused prime is q = 3. This gives us the number n = 2^10 · 3^2 = 9216.
No other prime q works since 2^10 · 5^2 > 10000. So the only such number
is 9216.


6. Since the prime factorisation of 246 is 246 = 2 · 3 · 41, the prime factorisation
of 246^246 will be 2^246 · 3^246 · 41^246. Hence d(246^246) = 247^3 = (13 · 19)^3 = 13^3 · 19^3 and

d(d(246^246)) = 4 · 4 = 16.

7. If d is a divisor of a and b, then a = a′d and b = b′d, and a − b = (a′ − b′)d,
whence d is a common divisor of a and a − b. If d is a divisor of a and a − b,
then a = a′d and a − b = cd. Then b = a − (a − b) = (a′ − c)d, that is, d is
also a common divisor of a and b.

8. We use the previous exercise repeatedly. We have gcd(13n + 21, 8n + 13) =

gcd(8n + 13, 5n + 8) = gcd(5n + 8, 3n + 5) = gcd(3n + 5, 2n + 3) =

gcd(2n + 3, n + 2) = gcd(n + 2, n + 1) = gcd(n + 1, 1) = 1.

9. (a) Suppose a^2 and a + b have a common prime divisor p. Then it is also a
divisor of a and hence of b = (a + b) − a, contradiction.
(b) As in Exercise 7, we notice that gcd(a, b) = gcd(a, a + b). Then, since
a^2 − b^2 = (a − b)(a + b) is divisible by a + b, we have

gcd(a^2 + b^2, a + b) = gcd((a^2 + b^2) + (a^2 − b^2), a + b) = gcd(2a^2, a + b),

which divides 2 because gcd(a^2, a + b) = 1 by part (a). Hence, if a + b and
a^2 + b^2 are not relatively prime, their greatest common divisor can only be 2.
This can be realised by taking two arbitrary odd relatively prime a and b,
say a = 25 and b = 49.

10. Let Fi and F j be two Fermat numbers with i < j. Then by Exercise 4 of

Sect. 1.1.1 F0 F1 . . . F j−1 = F j − 2. Since the left-hand side is divisible by Fi ,

the only common divisor of Fi and F j could be 2. However, these numbers are

odd, hence coprime.

11. If there were only a ﬁnite number k of primes, then among any k + 1 Fermat

numbers there will be two with a common prime factor. However this is not

possible due to the previous exercise. Hence the number of primes is inﬁnite.

1. Performing the Extended Euclidean Algorithm on 3773 and 3596 we obtain the following table:

r        x       y      q
3773     1       0
3596     0       1      1
177      1      −1     20
56     −20      21      3
9       61     −64      6
2     −386     405      4
1     1605   −1684      2

Hence gcd(3773, 3596) = 1 = 1605 · 3773 + (−1684) · 3596, that is, x = 1605 and
y = −1684.
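The same coefficients come out of a standard recursive implementation of the Extended Euclidean Algorithm; a Python sketch:

```python
def extended_gcd(a: int, b: int):
    # Returns (g, x, y) with a*x + b*y = g = gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

g, x, y = extended_gcd(3773, 3596)
```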

2. Performing the Extended Euclidean Algorithm on 1995 and 1840 gives

gcd(1995, 1840) = 5 and hence x = 95 and y = −103 may be taken that

will satisfy 1995x + 1840y = 5.

Multiply now 1995x + 1840y = 5 by (−2) to see that z 0 = −2x = −190

and w0 = −2y = 206 satisfy 1995z 0 + 1840w0 = −10. Next, observe that

1995(−k · 1840) + 1840(k · 1995) = 0, for any integer k. Sum the last two equa-

tions to obtain 1995(z 0 −1840k)+1840(w0 +1995k) = −10, for any integer k. It

is now easy to ﬁnd two additional solutions, for example z 1 = z 0 + 1840 = 1650

and w1 = w0 − 1995 = −1789, or z 2 = z 0 − 1840 = −2030 and w2 =

w0 + 1995 = 2201.

3. We are given that N = kc + a and N = td + b for some integers k and t.

Subtracting the two equalities yields 0 = kc + a − td − b. Therefore

a − b = kc − td.

Since the right-hand side is divisible by gcd(c, d), we see that a − b is divisible

by gcd(c, d) as well.

4. (a) The Extended Euclidean algorithm applied to 68 and 26 gives 2 =

gcd(68, 26) = 5 · 68 + (−13) · 26. Multiplying both sides by (35 − 9)/

2 = 13, we see that 35 − 9 = 13 · 5 · 68 − 13 · 13 · 26. Hence, the number

x = 35 + 13 · 13 · 26 = 9 + 13 · 5 · 68 = 4429 satisﬁes our congru-

ences. (There are many other solutions, all of them are congruent modulo

884 = lcm(26, 68); i.e., all these solutions are given by 4429 + 884 · n,

n ∈ Z.)

(b) The Extended Euclidean algorithm applied to 71 and 50 gives 1 = 27 · 50 +

(−19) · 71. Now, 15 = 19 − 4 and the number x = 4 + 15 · 27 · 50 =

19 + 15 · 19 · 71 = 20254 satisﬁes our congruences but is greater than 3550.

But x′ = x mod 3550 = 2504 is the unique solution of the two congruences
which lies in the interval [0, 3550).
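The construction used in (b) — combining two congruences with coprime moduli via the Extended Euclidean Algorithm — can be written as a small Python routine (a sketch of the Chinese-remainder computation, not the book's code; the helper name crt is ours):

```python
def crt(a1: int, m1: int, a2: int, m2: int) -> int:
    # Solve x ≡ a1 (mod m1), x ≡ a2 (mod m2) for coprime m1, m2.
    def xgcd(a, b):
        if b == 0:
            return a, 1, 0
        g, u, v = xgcd(b, a % b)
        return g, v, u - (a // b) * v
    g, u, v = xgcd(m1, m2)
    assert g == 1  # moduli must be coprime
    # u*m1 + v*m2 = 1, so v*m2 ≡ 1 (mod m1) and u*m1 ≡ 1 (mod m2).
    return (a1 * v * m2 + a2 * u * m1) % (m1 * m2)

x = crt(4, 50, 19, 71)
```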

5. (a) We know from Exercise 2 that gcd(1995, 1840) = 5. If there were integers x

and y satisfying 1840x + 1995y = 3, then 3 = 5(368x + 399y) and 3 would

be divisible by 5, a contradiction.

(b) Let C be the set of integers c for which there exist integers x and y satisfying

the equation ax + by = c, and let d = gcd(a, b). By the Extended Euclidean

Algorithm we know that there are some integers x0 , y0 , such that ax0 + by0 =

d. Let k be an arbitrary integer. Then a(kx0 ) + b(ky0 ) = kd, showing that

kd ∈ C, so C contains all multiples of gcd(a, b). Let us prove that C contains
nothing else. Write a = da′ and b = db′, for some integers a′ and b′,
and take an arbitrary c ∈ C. Then, for some integers x and y, we have
c = ax + by = da′x + db′y = d(a′x + b′y), so c is a multiple of d. Therefore C
is indeed the set of all multiples of gcd(a, b).

Solutions to Exercises of Sect. 1.3.1

1. Using the prime factorization of these numbers and the formula for φ(n) we
compute:

φ(180) = φ(2^2 · 3^2 · 5) = 180 · (1/2) · (2/3) · (4/5) = 48,

2. We are given that n = pq = 4386607 and φ(n) = 4382136 = (p − 1)(q − 1). Thus n − φ(n) = 4471 = p + q − 1, whence
p + q = 4472. Solving the system of equations

p + q = 4472
pq = 4386607

we find p = 1453 and q = 3019.

3. We have φ(m) = pq(p − 1)(q − 1) = 11424 = 2^5 · 3 · 7 · 17. Hence p and q

can only be among the primes 2, 3, 7, 17. By the trial and error method we ﬁnd

p = 7, q = 17 and m = 14161.

4. By Fermat’s Little Theorem 24 ≡ 1 mod 5 so we need to ﬁnd the remainder of

2013

22013 on division by 4. This remainder is obviously 0 so the remainder of 22

on division by 5 is 1.
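Python's built-in modular exponentiation makes this easy to confirm, even though the exponent 2^2013 is a number with over 600 digits:

```python
# pow(base, exp, mod) reduces modulo 5 at every step of the
# exponentiation, so the enormous exponent is handled quickly.
r = pow(2, 2**2013, 5)
```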

5. By Fermat’s Little Theorem we have a 6 ≡ 1 mod 7 for all a ∈ Z, which are not

divisible by 7. As 333 = 47 · 7 + 4 and 555 = 92 · 6 + 3,

333555 ≡ 43 ≡ 64 ≡ 1 mod 7.

555333 ≡ 23 ≡ 1 mod 7.
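Both congruences can be confirmed with modular exponentiation; a one-line Python check for each:

```python
# Residues of 333^555 and 555^333 modulo 7.
r1 = pow(333, 555, 7)
r2 = pow(555, 333, 7)
```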

6. We compute a n−1 mod n as follows:

gap> n:=1234567890987654321;

1234567890987654321

gap> a:=111111111;

111111111

gap> PowerMod(a,n-1,n);

385560404775530811


The result is not equal to 1 and this shows, by Fermat's Little Theorem, that n is
not prime. Indeed, we see that n has five different prime factors:

gap> Factors(n);

[ 3, 3, 7, 19, 928163, 1111211111 ]

(b) Since a ≡ b mod m means m | (a − b), we see that for any divisor d | m we

have d | (a − b) which is the same as a ≡ b mod d.

(c) Indeed, a ≡ b mod m i , is equivalent to m i | (a − b). This implies

lcm(m 1 , m 2 , . . . , m k ) | (a − b),

which means the equivalence holds also for the least common multiple of the

m i ’s.

2. We have 72 ≡ −3 mod 25, 47 ≡ −3 mod 25 and 28 ≡ −3 mod 25. Thus

3. We have φ(3^x 5^y) = 3^{x−1} 5^{y−1} · 2 · 4 = 3^{x−1} 5^{y−1} · 2^3 and 600 = 2^3 · 3 · 5^2. By

uniqueness of prime factorisation, we have x − 1 = 1 and y − 1 = 2. Hence

x = 2 and y = 3.

4. Let S be the set of integers a for which the congruence x^162 ≡ a mod 243 has a solution.
We see that 243 = 3^5 and φ(243) = 2 · 3^4 = 162. By Euler's theorem:

If gcd(x, n) = 1, then x^{φ(n)} ≡ 1 mod n.

Hence if gcd(x, 243) = 1, then x^162 ≡ 1 mod 243. Hence 1 ∈ S. If gcd(x, 243) >
1, then x = 3y and x^162 ≡ 0 mod 243. Thus S = {0, 1}.

5. We are given that n = pq where p and q are primes. Moreover, we know that

φ(n) = φ( p)φ(q) = ( p − 1)(q − 1) = pq − p − q + 1 = 3308580, and therefore

p + q = n − 3308580 + 1. We now determine p and q from the equations:

pq = 3312913,

p + q = 4334.

This shows that p and q are the roots of the quadratic equation x^2 − 4334x +
3312913 = 0, which are 3343 and 991. The result is n = pq = 3343 · 991.
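Recovering p and q from n and φ(n) amounts to solving exactly this quadratic; a Python sketch of the computation (the helper name recover_pq is ours, not the book's):

```python
from math import isqrt

def recover_pq(n: int, phi: int):
    # p + q = n - phi + 1, and p, q are the roots of x^2 - s*x + n.
    s = n - phi + 1
    d = isqrt(s * s - 4 * n)
    assert d * d == s * s - 4 * n  # discriminant must be a perfect square
    return (s - d) // 2, (s + d) // 2

p, q = recover_pq(3312913, 3308580)
```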


1. By the distributive law (CR5) we have a · 0 + a · 0 = a · (0 + 0) = a · 0. Now

subtracting a·0 on both sides we get a·0 = 0. We further argue as in Lemma 1.4.2.

2. (a) The invertible elements of Z16 are those elements that are relatively prime to
16 = 2^4 (i.e., those which are odd). We have

1^2 = 7^2 = 9^2 = 15^2 = 1, 3 · 11 = 1, 5 · 13 = 1,

so 1, 7, 9 and 15 are their own inverses, while 3^{−1} = 11, 11^{−1} = 3, 5^{−1} = 13 and
13^{−1} = 5.

(b) The zero-divisors of Z15 are those (non-zero) elements that are not relatively

prime to 15 = 3 · 5 (i.e., multiples of 3 or 5). We have

a multiple of 5.

3. (a) Using the Euclidean algorithm, we find that gcd(111, 74) = 37 and that
gcd(111, 77) = 1, so 77 is invertible and 74 is a zero divisor. Since 111 =
3 · 37, we have 74 ⊙ c = 0 for any c that is a multiple of 3. From the Extended
Euclidean algorithm 1 = 34 · 111 − 49 · 77, hence 77^{−1} = −49 = 62.

(b) We have 77 ⊙ x ⊕ 11 = 0, hence 77 ⊙ x = −11 = 100 and

x = (77^{−1}) ⊙ 100 = 62 ⊙ 100 = 95.

For the second equation, 74 ⊙ x ⊕ 11 = 0 ⇒ 74 ⊙ x = −11 = 100,

and there are no solutions because {74 ⊙ x | x ∈ Z111} = {0, 37, 74}.

4. Since we will have only operations in Zn for various n but not in Z we will write
+ and · instead of ⊕ and ⊙. Recall that a function from a set A to A itself is

one-to-one if no two (different) elements of A are mapped to the same element

of A. For a ﬁnite set this is also equivalent to f being onto which can be also

restated as the range of f being all of Z21 .

(a) If a is a zero-divisor in Z21 , that is, if there is an element d = 0 in Z21 , such

that ad = 0 mod 21, then f (d) = ad + b = b = f (0), and f is not one-to-

one. On the other hand, if a is not a zero divisor, then gcd(a, 21) = 1, and

there exists (a unique) element c ∈ Z21 satisfying ac = 1 mod 21. But then

f (x1 ) = f (x2 ) implies cf(x1 ) = cf(x2 ), or c(ax1 + b) = c(ax2 + b), which

reduces to x1 + cb = x2 + cb and ﬁnally implies that x1 = x2 , proving that


f is one-to-one in this case. The set of pairs (a, b), for which the function

f is one-to-one is therefore {(a, b) | a, b ∈ Z21 and gcd(a, 21) = 1}.

(b) Since 7 is not relatively prime with 21 the function f is not one-to-one, and

so the image of f is a proper subset of Z21 . The expression 7x, for x ∈ Z21 ,

takes only three values in Z21 , namely 0 if x is a multiple of 3, 7 if x is

congruent to 1 modulo 3, and 14 if x is congruent to 2 modulo 3. The image

of f is therefore {3, 10, 17}.

(c) The condition f −1 ( f (x)) = x, for all x ∈ Z21 , is equivalent to c(ax +

b) + d = x, or (ac)x + (cb + d) = x. It is sufﬁcient to take ac = 1 and

cb + d = 0. We can ﬁnd c by solving the equation 4c + 21y = 1 using

the Extended Euclidean Algorithm, which gives us c = −5, y = 1, or

better, c = 16, y = −3. Now, d = −cb = −16 · 15 = 12 mod 21. So,

f −1 (x) = 16x + 12.

5. Fermat’s Little Theorem says that if p is prime and a is not divisible by p, then

a p−1 ≡ 1 mod p. Hence x 10 = 1 in Z11 . So x 102 = x 2 in Z11 . The equation

x 2 = 4 has in Z11 two solutions: x1 = 2 and x2 = −2 = 9.

6. Since m is odd, gcd(m, 2) = 1, whence 2^{φ(m)} ≡ 1 mod m. Thus 2^{φ(m)−1} ≡
2^{−1} mod m, which is the inverse of 2 in Zm. Since m is odd, m + 1 is an even
number and (m + 1)/2 is an integer. This number is the inverse of 2 in Zm since
2 · (m + 1)/2 = m + 1 ≡ 1 mod m. Therefore 2^{φ(m)−1} ≡ (m + 1)/2 mod m.

7. If (p − 1)! ≡ −1 mod p, then gcd(j, p) = 1 for all j ∈ {1, 2, . . . , p − 1}. Hence p is prime. If
p is prime, then the equation x^2 = 1 in Z_p is equivalent to (x − 1)(x + 1) = 0,
hence has only two solutions x = ±1, that is, either x = 1 or x = p − 1. Then for
every j ∈ {2, . . . , p − 2} we have j^{−1} ≠ j. This means 2 · 3 · . . . · (p − 2) = 1.
Hence (p − 1)! ≡ p − 1 ≡ −1 mod p.

1. 2002(10) = 11111010010(2); and 1100101(2) = 2^6 + 2^5 + 2^2 + 1 = 101(10).

2. (a) 2011(10) = 11111011011(2) ;

(b) 101001000(2) = 2^8 + 2^6 + 2^3 = 256 + 64 + 8 = 328(10).
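Python can serve as a quick cross-check for such conversions:

```python
a = int('11111010010', 2)  # binary string to decimal
b = bin(2002)              # decimal to binary string
c = int('1100101', 2)
```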

3. Observe ﬁrst that the last three digits in the binary representation depend only on

the remainder on division by 8. Namely, if a = a_n 2^n + · · · + a_3 2^3 + a_2 2^2 + a_1 2 + a_0
is the binary representation of a, then a ≡ a_2 2^2 + a_1 2 + a_0 mod 8. Clearly
75^1015 ≡ 3^1015 mod 8. By Euler's Theorem, 3^{φ(8)} = 3^4 ≡ 1 mod 8. Therefore,
75^1015 ≡ 3^{253·4+3} ≡ 3^3 ≡ 3 mod 8. Since 3 = 11(2), we see that the last three
digits in the binary representation of 75^1015 are 011.

4. We calculate as follows:

. . 01 (2) · 10 .

10 .

n m

and 2 non-zero digits if m = n = 2.


means that n ≡ a + b + c + d mod 6. Therefore n ≡ 0 mod 6 if and only if

a + b + c + d ≡ 0 mod 6.

6. (a) 2A4F(16) = 2 · 16^3 + 10 · 16^2 + 4 · 16 + 15 = 10831,
(b) 1000 = 16 · 62 + 8, and 62 = 16 · 3 + 14, so 1000 = 3E8(16).

1. Let p_i, k_i and c_i denote the numerical encodings of the i-th positions of the plain text, the key and the cypher text,
respectively.

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Plaintext B U Y M O R E P R O P E R T Y

pi 1 20 24 12 14 17 4 16 17 14 16 4 17 19 24

Key T O D A Y I W I L L G O O N C

ki 19 14 3 0 24 8 22 8 11 11 6 14 14 13 2

pi + ki = ci 20 8 1 12 12 25 0 24 2 25 22 18 5 6 0

Cyphertext U I B M M Z A Y C Z W S F G A

Conversely, to decrypt we add (−ki ) to each side of the above to get pi =

ci + (−ki ).

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Cyphertext R C X R N W O A P D Y W C A U

ci 17 2 23 17 13 22 14 0 15 3 24 22 2 0 20

Key T O D A Y I W I L L G O O N C

−ki 7 12 23 0 2 18 4 18 15 15 20 12 12 13 24

ci + (−ki ) = pi 24 14 20 17 15 14 18 18 4 18 18 8 14 13 18

Plaintext Y O U R P O S S E S S I O N S

i 16 17 18 19 20 21 22 23 24 25 26 27 28 29

Cyphertext E R K Y W H Z R G S X Q J W

ci 4 17 10 24 22 7 25 17 6 18 23 16 9 22

Key E A G A I N I N T O L I F E

−ki 22 0 20 0 18 13 18 13 7 12 15 18 21 22

ci + (−ki ) = pi 0 17 4 24 14 20 17 4 13 4 12 8 4 18

Plaintext A R E Y O U R E N E M I E S
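The encryption and decryption carried out in these tables follow one rule: add (or subtract) the key letters position by position modulo 26, with A = 0, …, Z = 25. A generic Python sketch of that rule (a round-trip check, not tied to the specific ciphertexts in the tables):

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    # Add key letters for encryption, subtract them for decryption.
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        k = ord(key[i % len(key)]) - ord('A')
        out.append(chr((ord(ch) - ord('A') + sign * k) % 26 + ord('A')))
    return ''.join(out)

ct = vigenere('BUYMOREPROPERTY', 'TODAYIWILLGOONC')
pt = vigenere(ct, 'TODAYIWILLGOONC', decrypt=True)
```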

11.2 Solutions to Exercises of Chap. 2 267

2. We will place the result in an array called random:

gap> random:=[1..20];;

gap> for i in [1..20] do

> random[i]:=Random([0..25]);

> od;

gap> random;

[ 24, 19, 16, 9, 1, 9, 24, 24, 15, 3, 12, 3, 10, 11, 21, 23, 19, 6, 19, 24 ]

3. The message as a numerical string will be: [8, 7, 0, 21, 4, 13, 14, 19, 8, 12, 4, 19,

14, 7, 0, 19, 4].

gap>#Entering the key:

gap> k:=random;;

gap>#Entering the message:

gap> p:=[ 8, 7, 0, 21, 4, 13, 14, 19, 8, 12, 4, 19, 14, 7, 0, 19, 4 ];;

gap> c:=[1..Length(p)];

[ 1 .. 17 ]

gap> for i in [1..Length(p)] do

> c[i]:=(p[i]+k[i]) mod 26;

> od;

gap> c;

[ 6, 0, 16, 4, 5, 22, 12, 17, 23, 15, 16, 22, 24, 18, 21, 16, 23 ]

gap># which in letters will be GAQEFWMRXPQWYSVQX

gap> # Decoding back:

gap> q:=[1..Length(p)];;

gap> for i in [1..Length(p)] do

> q[i]:=(c[i]-k[i]) mod 26;

> od;

gap> p=q;

true

1. (13, 11) cannot be used as a key since 13 is not invertible in Z26 and the mapping

x → 13x + 11 (mod 26) would not be one-to-one.

2. The cyphertext for CRYPTO will be JSRWOL. The inverse function for

decrypting is

cyphertext letter −→ x −→ y = 19x + 13 mod 26 −→ plaintext letter

and the plaintext for DRDOFP is SYSTEM. We can calculate the latter using

subprograms LtoN and NtoL:

gap> str := "DRDOFP"; ;

gap> outstr := "A";

gap> for i in [1..Length(str)] do

> outstr[1] := NtoL( (19*LtoN( str[i] ) + 13) mod 26);

> Print( outstr );

> od;

SYSTEM

3. Since the letter F was encrypted as N and the letter K was encrypted as O,
for the encryption function f (x) = ax + b mod 26 we will have f (5) = 13 and
f (10) = 14. Solving the system of equations in Z26


5a + b = 13,

10a + b = 14

we ﬁnd a = 21 and b = 12, hence the key is the pair (21, 12).

With GAP this would be

gap> M:=[[5,1],[10,1]];

[ [ 5, 1 ], [ 10, 1 ] ]

gap> rhs:=[13,14];

[ 13, 14 ]

gap> [a,b]:=M^-1*rhs mod 26;

[ 21, 12 ]

4. The frequencies of the letters in the cyphertext are as given in the table below.

a 0.049 j 0.076 s 0.017

b 0.052 k 0.045 t 0.000

c 0.135 l 0.076 u 0.062

d 0.000 m 0.031 v 0.007

e 0.000 n 0.031 w 0.007

f 0.000 o 0.035 x 0.101

g 0.000 p 0.093 y 0.021

h 0.007 q 0.017 z 0.000

i 0.101 r 0.066

Since the most frequent letter in the cyphertext is c and in English texts this

role is usually played by e, our guess is that the encryption function f (x) =

ax + b mod 26 maps the integer value of e, which is 4, to the integer value of c,

which is 2. This gives the first equation:

4a + b = 2 mod 26. (11.1)

The second most frequent letter in English is t, while in our cyphertext the second

place is shared by x and i. Suppose first that the letter t was encrypted to x. Then

19a + b = 23 mod 26, (11.2)

which, together with (11.1), gives a = 17 and b = 12 mod 26. If the encryption function is f (x) = ax + b mod 26, then the

decryption function is g(x) = cx − cb mod 26, where ca = 1 mod 26. In the

case a = 17, b = 12, we get c = 23 and so g(x) = 23x + 10 mod 26. If we

decrypt the cyphertext with this function we get

djree rctqk xmr ...


which is obviously not an English text. Our guess that t was encrypted to x must

therefore be wrong. We get similar nonsense if we assume that t is encrypted to i.

We can either proceed in this fashion until we get something meaningful, or

observe that in our cyphertext the group of three letters ljc is very frequent. Since

our guess is that c is in fact encrypted to e, it is very plausible that the group ljc

represents the word the. If this is the case, then t is encoded to l, which gives the

equation

19a + b = 11 mod 26, (11.3)

This, together with (11.1), implies that a = 11 and b = 10. The decrypting

function is then g(x) = 19x + 18. Decrypting the cyphertext with g gives the

following plaintext:

three rings for the eleven kings under the sky seven for the dwarf lords in their halls of stone

nine for mortal men doomed to die one for the dark lord on his dark throne in the land of

mordor where the shadows lie one ring to rule them all one ring to ﬁnd them one ring to

bring them all and in the darkness bind them in the land of mordor where the shadows lie
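The decrypting function g(x) = 19x + 18 can be applied mechanically; a small Python sketch, tried on the frequent group ljc from the cyphertext:

```python
def affine_decrypt(cipher: str, c: int, d: int) -> str:
    # g(x) = c*x + d (mod 26), with letters a = 0, ..., z = 25.
    return ''.join(chr((c * (ord(ch) - ord('a')) + d) % 26 + ord('a'))
                   for ch in cipher)

word = affine_decrypt('ljc', 19, 18)
```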

1. (a) Computing the determinants of these matrices we get

$$\det\begin{pmatrix} 1 & 12 \\ 12 & 1 \end{pmatrix} = 13, \qquad \det\begin{pmatrix} 1 & 6 \\ 6 & 1 \end{pmatrix} = 17.$$

Since 13 is not relatively prime to 26 the first matrix is not invertible because
its determinant is not invertible. Since 17^{−1} = 23 exists the second matrix is
invertible with

$$\begin{pmatrix} 1 & 6 \\ 6 & 1 \end{pmatrix}^{-1} = 23\begin{pmatrix} 1 & 20 \\ 20 & 1 \end{pmatrix} = \begin{pmatrix} 23 & 18 \\ 18 & 23 \end{pmatrix}.$$

$$M = \begin{pmatrix} 1 & 6 \\ 6 & 1 \end{pmatrix}$$