
Summer 2016 coding theory tutorial

Geoffrey Smith, notes by Lynnelle Ye


July 27, 2016

1 June 20.
Resources: Geoff’s webpage, van Lint’s Introduction to Coding Theory, Nathan Kaplan’s
notes.
Coding theory studies ways of adding redundant information to data such that errors
can be detected or corrected. A basic example is the checksum. In binary, this would mean
adding a single bit at the end so that you are sending an even number of 1s.

Definition 1.1. The Hamming distance between two words x and y is the number of
letters in x distinct from those in y, denoted d(x, y). For example, d(1000, 1010) = 1.
The minimum distance of a code is the minimal Hamming distance between two dis-
tinct valid code words.

For example, the minimum distance of the trivial code is 1. The minimum distance of
the binary checksum code is 2, that is, it can detect a single error.
Another simple example is repetition: given x0 , . . . , xn , you send

x0 , . . . , xn , x0 , . . . , xn , . . . ,

repeated k times. The minimum distance of this code is k.
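These definitions are easy to check by hand, but a few lines of code make them concrete. The following is a minimal sketch (Python, not part of the lecture; the helper names are mine) computing Hamming distance and the minimum distances of the length-3 checksum and 3-fold repetition codes:

```python
def hamming_distance(x, y):
    """Number of positions where the words x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def min_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    words = list(code)
    return min(hamming_distance(x, y)
               for i, x in enumerate(words) for y in words[i + 1:])

# Checksum code of length 3: all binary words with an even number of 1s.
checksum = ["000", "011", "101", "110"]
# Repetition code: each 1-bit data word repeated k = 3 times.
repetition = [w * 3 for w in ["0", "1"]]

print(hamming_distance("1000", "1010"))  # 1
print(min_distance(checksum))            # 2
print(min_distance(repetition))          # 3
```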


The basic idea of coding theory (to a myopic mathematician) is to find codes of length n
over some alphabet Q that maximize the number of valid codewords (M ) and the minimum
distance of the code (d). Given an alphabet Q, we say “an (n, M, d)-code over Q” for a
code with n-letter words, M valid codewords, and minimum distance d. For example, binary
checksum is an (n, 2^{n−1}, 2)-code over {0, 1}, and repetition is an (nk, 2^n, k)-code over {0, 1}.
Binary checksum and repetition are examples of linear codes.

Definition 1.2. Suppose we’re working with the alphabet Z/pZ, the field of integers modulo
the prime p. A linear code C is a sub-vector space of (Z/pZ)^n. That is, we want our set
of codewords to be closed under addition.

This is true for binary checksum, because if x_1, . . . , x_{n−1}, y and x'_1, . . . , x'_{n−1}, y' are valid, then

x_1 + x'_1 + · · · + x_{n−1} + x'_{n−1} + y + y' ≡ 0 (mod 2),

so x_1 + x'_1, . . . , x_{n−1} + x'_{n−1}, y + y' is a valid codeword.

Definition 1.3. An [n, k, d]p -code is a linear code C over Z/pZ with words of length n such
that dim C = k and C has minimum distance d.
A generating matrix for an [n, k, d]-code is a k × n matrix whose rows are a basis for
C.

For example, binary checksum with n = 4 is a [4, 3, 2]-code with generating matrix

1 0 0 1
0 1 0 1
0 0 1 1 .
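To illustrate (a sketch of mine, not from the notes): encoding with a generating matrix is just taking the row combination given by the message, and for this G every resulting codeword has an even number of 1s.

```python
# Generating matrix of the length-4 binary checksum code.
G = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]

def encode(msg, G, p=2):
    """Codeword = msg * G (mod p)."""
    n = len(G[0])
    return [sum(m * row[j] for m, row in zip(msg, G)) % p for j in range(n)]

print(encode([1, 1, 0], G))  # [1, 1, 0, 0] -- an even number of 1s
```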

The Hamming code is the following thing: the [7, 4, 3]-code version, also called the Ham-
ming (7, 4)-code, encodes the data x1 , x2 , x3 , x4 by writing

x1 x2 x3 x4

(placing them in positions 3, 5, 6, 7 of the codeword), and then putting "parity bits" in the
1, 2, 4 slots such that for each of the three subsets of positions whose indices have a 1 in the
1s, 2s, or 4s place respectively, there is an even number of 1s in that subset. The generator matrix is

1 1 1 0 0 0 0
1 0 0 1 1 0 0
0 1 0 1 0 1 0
1 1 0 1 0 0 1

(constructed by filling in positions 1, 2, 4 appropriately for x_1 x_2 x_3 x_4 = 1000, 0100, 0010, 0001). Another important matrix:

Definition 1.4. The parity check matrix of a code C is an (n − k) × n matrix M of rank n − k such that M x = 0 for all valid codewords x.

The parity check matrix for the Hamming code is

M =
1 0 1 0 1 0 1
0 1 1 0 0 1 1
0 0 0 1 1 1 1

(constructed by setting the columns to vary from 1 to 7 written in binary). This matrix has
7 pairwise linearly independent columns, the most possible for 3-row matrices over Z/2Z
(i.e. no two columns are scalar multiples of each other, or 0).
Definition 1.5. The [(p^n − 1)/(p − 1), (p^n − 1)/(p − 1) − n, 3] Hamming code is the code over Z/pZ whose parity check matrix has as columns a maximal set of pairwise linearly independent vectors over Z/pZ.

For example, when n = 2, p = 3, we get the [4, 2, 3]_3 -code over Z/3Z with parity check matrix

1 0 1 1
0 1 1 2 .

The Hamming (7, 4) code can correct a single error: if you receive a message vector v,
calculate M v. You get a 3-term vector that is zero if and only if v is a valid codeword. If
M v is nonzero, it corresponds to some number i in binary. We claim that if you flip the bit
in position i, you get the codeword that was meant.
Proof by example: to encode 1101, we see that there are three 1s in positions with a 1
in the 1s place of their index, and so on, so we get 1010101. Say we receive 1010001. Then the
parity checks for the 1s and 4s places fail, whereas the check for the 2s place passes; since
1 + 4 = 5, the 5th bit is incorrect.
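This decoding rule can be sketched in a few lines of Python (an illustration of mine, not part of the lecture; the helper names are hypothetical):

```python
# Parity check matrix M of the Hamming (7,4) code: column i is i written in
# binary, least significant bit in the first row.
M = [[(i >> r) & 1 for i in range(1, 8)] for r in range(3)]

def syndrome(v):
    """M v mod 2, read as the binary position of the error (0 = no error)."""
    bits = [sum(m * x for m, x in zip(row, v)) % 2 for row in M]
    return bits[0] + 2 * bits[1] + 4 * bits[2]

def correct(v):
    """Flip the bit named by the syndrome, if any."""
    v = list(v)
    i = syndrome(v)
    if i:
        v[i - 1] ^= 1
    return v

received = [1, 0, 1, 0, 0, 0, 1]   # 1010101 with bit 5 flipped
print(syndrome(received))          # 5
print(correct(received))           # [1, 0, 1, 0, 1, 0, 1]
```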

2 June 22.
Recall that we are looking at linear codes, i.e. vector subspaces C ⊆ (F_p)^n (we will write F_p
for Z/pZ from now on, and similarly F_q for the field with q elements if q = p^t ). These have
two associated matrices:
1. the generating matrix G, a k × n matrix with rowspace C, and
2. the parity matrix H, an (n − k) × n matrix with nullspace C.
These are not unique! [CLASS CANCELLED BY FIRE ALARM]

3 June 24.
Recall that a linear [n, k]_q code is a k-dimensional vector subspace C ⊆ F_q^n, with associated generating matrix G and parity matrix H as defined last time. They are not unique, because we can do row operations on either without changing C.
Definition 3.1. We say that two codes C, C' ⊆ F_q^n are equivalent if there is some permutation π of {1, 2, 3, . . . , n} that sends any codeword x = x_1 · · · x_n of C to a codeword x_{π(1)} · · · x_{π(n)} of C'.
Remark 1. Any two binary Hamming codes are equivalent.
We claim that given any C ⊆ F_q^n, there is a code C' equivalent to C having [I_k | A] as a generating matrix, where A is some matrix. Proof: take any generating matrix, put it in reduced row echelon form, and permute the columns to get a generating matrix of the form [I_k | A], whose corresponding code is equivalent to C.
Proposition 3.1. Given a code C with G = [I_k | A], a parity check matrix is given by H = [−A^T | I_{n−k}].

Proof. We need to show that the rowspace of G equals the nullspace of H; since both have dimension k, it suffices to show HG^T = 0. But, writing G^T as I_k stacked on A^T,

HG^T = [−A^T | I_{n−k}] G^T = −A^T I_k + I_{n−k} A^T = −A^T + A^T = 0.

Recall that the distance d(x, y) between x, y ∈ F_q^n is the number of places where x, y differ, or the number of nonzero letters in x − y. The minimum distance of C ⊆ F_q^n is min{d(x, y) | x, y ∈ C, x ≠ y}.

Definition 3.2. The weight of x is d(x, 0).

Proposition 3.2. The minimum weight of a linear code C is equal to its minimum distance.

Proof. If C has minimum distance d, then d(x, y) = d for some x, y ∈ C, so x − y ∈ C has weight d. Conversely, any nonzero x ∈ C has weight d(x, 0) ≥ d, since 0 ∈ C.

Proposition 3.3. The minimum distance of a linear code C ⊆ F_q^n is the minimum number of columns of a parity check matrix H that are linearly dependent.

For example, binary checksum has parity check matrix [1, 1, . . . , 1]. No one of these
columns is linearly dependent (a single column being linearly dependent means it is zero).
Recall that the Hamming [(q^n − 1)/(q − 1), (q^n − 1)/(q − 1) − n]_q code is any code whose parity check matrix has the maximal number of pairwise linearly independent columns.
By this proposition, the Hamming (7, 4) code has minimum distance 3: by definition no 2 columns of H are linearly dependent, so the minimum distance is at least 3; conversely, the sum of any two columns must be a column of the matrix up to scaling, because otherwise the Hamming code would not have maximally many columns, and this gives 3 linearly dependent columns.
The proof of this proposition is in van Lint 3.2. (To say it quickly, multiplying H by a codeword of minimal weight gives 0, hence gives a minimal linear dependence relation between the columns.)
Remark 2. A code of minimum distance d can detect d − 1 errors. A code with minimum
distance 2e + 1 can correct e errors, because given a message with at most e errors, there is
at most one codeword within distance e.

Definition 3.3. A perfect code C ⊆ F_q^n is a code of minimum distance 2e + 1 such that every x ∈ F_q^n is at distance at most e from some y ∈ C.

For example, the [2e + 1, 1, 2e + 1] repetition code with G = [1, 1, . . . , 1] (2e + 1 ones) is
a perfect e-error-correcting code.
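Perfection can be verified by brute force for small e. The following sketch (mine, not from the notes) checks that every word of F_2^3 is within distance e = 1 of exactly one codeword of the [3, 1, 3] repetition code:

```python
from itertools import product

def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))

# The [2e+1, 1] binary repetition code with e = 1: codewords 000 and 111.
e = 1
n = 2 * e + 1
code = [(0,) * n, (1,) * n]

# Perfect: every word of F_2^n is within distance e of exactly one codeword.
counts = [sum(hamming_distance(v, c) <= e for c in code)
          for v in product((0, 1), repeat=n)]
print(all(c == 1 for c in counts))  # True
```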

Theorem 3.4 (Lloyd’s theorem). The only perfect linear binary codes capable of correcting
e ≥ 2 errors are the repetition code and the Golay code G23 .

The Golay code G23 is a unique [23, 12, 7] code (up to equivalence) over F2 .

Definition 3.4. The dual code of an [n, k]-code C with generating matrix G is the [n, n − k] code C ⊥ with parity matrix G.

Warning: dual codes are in no sense the “opposite” of your starting code. C ∩ C ⊥ need
not be zero; in many interesting cases, C = C ⊥ (for example the extended Golay code G24 ).

Example 3.1. The [n, n − 1] checksum code has as dual the [n, 1] repetition code.
The dual of the Hamming code is called a Hadamard code. It is a [(q^n − 1)/(q − 1), n]-code (note that this has a very low rate, but it's useful because it can correct a lot of errors).

Remark 3. The rate of an [n, k] code is k/n.


Question: how do you find the minimal distance of C ⊥ , using stuff you know about C?

Definition 3.5. Given C ⊆ F_q^n, let A_i be the number of codewords x ∈ C of weight i. The weight polynomial A(z) is defined by A(z) = Σ_{0≤i≤n} A_i z^i.

Theorem 3.5 (MacWilliams's Theorem). Let C be an [n, k]_q code and A(z), B(z) the weight polynomials of C, C ⊥ respectively. Then we have

B(z) = q^{−k} (1 + (q − 1)z)^n A((1 − z)/(1 + (q − 1)z)).

Note that if you plug in z = 0, the LHS B(0) is B_0 = 1, the number of codewords of C ⊥ of weight 0, and the RHS is q^{−k} A(1) = q^{−k} Σ_i A_i = q^{−k} q^k = 1.
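The identity can be checked by brute force for the length-4 checksum code and its dual, the repetition code. The following sketch (mine, not part of the lecture) computes the right-hand side as a polynomial with integer coefficients:

```python
from itertools import product

def pmul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] += x * y
    return r

def ppow(a, k):
    r = [1]
    for _ in range(k):
        r = pmul(r, a)
    return r

def padd(a, b):
    m = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(m)]

n, q, k = 4, 2, 3
# C: the [4, 3] checksum code (even-weight binary words of length 4).
C = [v for v in product(range(q), repeat=n) if sum(v) % q == 0]
A = [0] * (n + 1)
for c in C:
    A[sum(1 for x in c if x)] += 1   # A = [1, 0, 6, 0, 1]

# RHS of MacWilliams: q^{-k} * sum_i A_i (1-z)^i (1+(q-1)z)^{n-i}
rhs = [0]
for i, Ai in enumerate(A):
    term = pmul(ppow([1, -1], i), ppow([1, q - 1], n - i))
    rhs = padd(rhs, [Ai * t for t in term])
B = [c // q ** k for c in rhs]
print(B)  # [1, 0, 0, 0, 1] -- the weight polynomial of the [4,1] repetition code
```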
We will now discuss combinatorial designs.
Definition 3.6. A t-(v, k, λ) design is a pair (X, B) where X is a set of v objects and B a collection of k-element subsets of X such that given any t objects in X, there are precisely λ subsets in B containing all of them.
Example 3.2. The projective plane over F_2 (Fano plane): X is the set of 7 points, B is the set of 7 lines through them (picture omitted).

This is a 2-(7, 3, 1)-design. Notice that any pair of points in X lie on a unique line.
More generally, a finite projective plane is a set of points X and lines B such that
—any pair of lines intersect in a unique point, and
—any pair of points has a unique line through them.
Question: are there finite projective planes other than projective planes over finite fields? In particular, it is a fact that finite projective planes have precisely n^2 + n + 1 points for some n; can n be anything other than p^k for some prime p?

Old theorem: n can’t be 6. Large computer calculation: n can’t be 10. Can n be 12?
Dunno!
Problem: are there any t − (v, k, λ)-designs with t ≥ 6? Answer: yes, but we don’t know
any examples...
Consider the incidence matrix of the Fano plane, where each column corresponds to a point and each row to a line, and there is a 1 if the point is on the line and 0 otherwise:

1 1 1 0 0 0 0
1 0 0 1 1 0 0
1 0 0 0 0 1 1
0 1 0 1 0 1 0
0 1 0 0 1 0 1
0 0 1 1 0 0 1
0 0 1 0 1 1 0 .

The rows of this matrix are exactly the elements of the [7, 4]_2 Hamming code of weight 3.
In general, a t − (v, k, λ)-design has an incidence matrix whose rows are the weight-k
elements of some code. For example, there is a 5 − (24, 8, 1) combinatorial design. The rows
of its incidence matrix are the weight 8 elements of the Golay code G24 .
Question: when do the weight i elements of a code C form a combinatorial design?
Answer: this is nontrivial.
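A quick brute-force check (a sketch of mine, not part of the notes) that the seven incidence rows above are weight-3 codewords of the Hamming code:

```python
fano_rows = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
]
# Hamming parity check matrix: column i is i written in binary.
M = [[(i >> r) & 1 for i in range(1, 8)] for r in range(3)]

def in_hamming(v):
    return all(sum(m * x for m, x in zip(row, v)) % 2 == 0 for row in M)

print(all(in_hamming(r) and sum(r) == 3 for r in fano_rows))  # True
```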

4 June 27.
Recall that a subset C ⊆ Q^n with |C| = M and minimum distance d is called an (n, M, d)-code (similar to the [n, k, d] notation for linear codes but not the same). We work over an alphabet Q with |Q| = q, and let θ = (q − 1)/q.
Definition 4.1. The function Aq (n, d) is given by

Aq (n, d) = max{M | an (n, M, d)-code over Q exists}.

We will just write A(n, d) sometimes.


The fundamental question is: what is Aq (n, d)?
Theorem 4.1 (Singleton bound). Aq (n, d) ≤ q^{n−d+1}.

Proof. Suppose there's an (n, M, d)-code C. Delete the last d − 1 letters of every codeword. This produces an (n − d + 1, M, 1)-code, because any two distinct words x, y ∈ C differ in at least one place not deleted. Thus we have a code C' ⊆ Q^{n−d+1} with no repeated words, which means M = |C'| ≤ q^{n−d+1}.
Definition 4.2. A code achieving the Singleton bound is called maximum distance separable, or an MDS code.
Remark 4. Given any (n, M, d)-code C ⊆ Q^n with d ≥ 2, removing an arbitrary letter from every codeword produces an (n − 1, M, d − 1)-code. This is called puncturing the code.

Proposition 4.2. A(n, d) ≤ A(n − 1, d − 1) for n, d ≥ 1.

Proof. Given any (n, M, d)-code, puncturing once gives an (n − 1, M, d − 1)-code.

Definition 4.3. The function Vq (n, d) is given by

Vq (n, d) = #{elements of Q^n within distance d of 0} = Σ_{i=0}^{d} (n choose i) (q − 1)^i.
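The ball size Vq (n, d) is easy to compute; a minimal sketch (mine, not from the notes), also showing that the Hamming (7, 4) code exactly fills out the sphere-packing count discussed below:

```python
from math import comb

def V(q, n, d):
    """Number of words in Q^n within Hamming distance d of a fixed word."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(d + 1))

print(V(2, 7, 1))             # 8
print(2 ** 7 // V(2, 7, 1))   # 16 = 2^4: the Hamming (7,4) code is perfect
```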

Theorem 4.3 (Gilbert's bound). If n, k, d are chosen such that Vq (n, d − 1) < q^{n−k+1}, then an [n, k, d]_q -code exists (hence Aq (n, d) ≥ q^k).

Proof. When k = 0, there's nothing to prove. Suppose we have an [n, k − 1, d]-code C_{k−1} and know Vq (n, d − 1) < q^{n−k+1}. Consider the set of all words x ∈ F_q^n within distance d − 1 of some element of C_{k−1}. This set has at most q^{k−1} Vq (n, d − 1) elements in it. But we know Vq (n, d − 1) q^{k−1} < q^n, so there is some x ∈ F_q^n having distance at least d to every word in C_{k−1}.
We claim that C_k = span{C_{k−1}, x} is an [n, k, d]-code. Proof: suppose y ∈ C_k is nonzero of minimal weight, and write y = c + ax where c ∈ C_{k−1}, a ∈ F_q. If a ≠ 0, then wt(y) = d(−c, ax) = d(−a^{−1} c, x) ≥ d, since x has distance at least d to −a^{−1} c ∈ C_{k−1}; if a = 0, then wt(y) = wt(c) ≥ d. Thus C_k is an [n, k, d]-code.
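The proof is effective: greedily adding any vector at distance at least d to the current code and closing under addition builds such a code. A sketch of mine (binary case only; not from the notes), run with n = 7, d = 3, where Vq (7, 2) = 29 < 2^5 guarantees dimension at least 3:

```python
from itertools import product

def dist(x, y):
    return sum(a != b for a, b in zip(x, y))

def gilbert_greedy(n, d):
    """Greedily grow a binary linear code as in the proof: add any vector at
    distance >= d from the current code, then close under addition
    (over F_2, span{C, v} = C union (C + v))."""
    code = {(0,) * n}
    k = 0
    for v in product((0, 1), repeat=n):
        if all(dist(v, c) >= d for c in code):
            code |= {tuple((a + b) % 2 for a, b in zip(c, v)) for c in set(code)}
            k += 1
    return k, code

k, code = gilbert_greedy(7, 3)
min_wt = min(sum(c) for c in code if any(c))
print(k >= 3, min_wt >= 3)  # True True
```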

Definition 4.4. Given 0 ≤ δ ≤ 1, define

α(δ) = lim sup_{n→∞} log_q (A(n, δn)) / n.

This is between 0 and 1.

The Singleton bound says that α(δ) ≤ 1 − δ. The Gilbert bound says that α(δ) ≥
1 − Hq (δ), where Hq is the entropy

Hq (δ) = δ logq (q − 1) − δ logq δ − (1 − δ) logq (1 − δ).

Theorem 4.4 (Tsfasman-Vladut-Zink). If q ≥ 40 and q is an even power of a prime, then

α(δ) ≥ 1 − δ − 1/(√q − 1)

for 0 ≤ δ ≤ θ.

The proof is by doing algebraic geometry on certain modular curves over Fq and getting
close to the Weil bound or something.

Theorem 4.5 (Sphere-packing/Hamming bound).

Aq (n, 2d + 1) ≤ q^n / Vq (n, d).

Proof. If C is an (n, M, 2d + 1)-code, then every x ∈ Q^n is within distance d of at most one codeword, as otherwise the triangle inequality would give two codewords that are distance at most 2d apart. Thus q^n ≥ M Vq (n, d), and M ≤ q^n / Vq (n, d).

Asymptotically, this gives α(δ) ≤ 1 − H(δ/2).


Theorem 4.6 (Plotkin bound). If d > θn, then A(n, d) ≤ d/(d − θn).

Proof. Suppose C ⊆ Q^n is an (n, M, d)-code. Write a matrix whose rows are all M codewords. Pick a column. Let m_j be the number of times the symbol j appears in this column. Two codewords differ in this column precisely when one of them has some symbol j' there and the other does not. In particular, of the M (M − 1) ordered pairs of codewords, Σ_{0≤j≤q−1} m_j (M − m_j) pairs differ in this column.
We now find

Σ_{0≤j≤q−1} m_j (M − m_j) = M Σ_j m_j − Σ_j m_j^2 = M^2 − Σ_j m_j^2 ≤ M^2 − (1/q) M^2 = ((q − 1)/q) M^2 = θM^2

by Cauchy-Schwarz. Summing over every column, there are at most θM^2 n places where pairs of codewords differ. There are M (M − 1) ordered pairs of codewords, each differing in at least d places. Thus

dM (M − 1) ≤ θM^2 n
d(M − 1) ≤ θM n
(d − θn)M ≤ d
M ≤ d/(d − θn).

Unfortunately, Plotkin only applies in the narrow range θ ≤ δ ≤ 1. However, sometimes we can manipulate other situations into this range.

Theorem 4.7 (Asymptotic Plotkin bound). α(δ) = 0 for θ ≤ δ ≤ 1, and α(δ) ≤ 1 − δ/θ for 0 ≤ δ ≤ θ.

Proof. If θ ≤ δ ≤ 1, then

α(δ) = lim sup_{n→∞} n^{−1} log_q A(n, δn) ≤ lim sup_{n→∞} n^{−1} log_q (δn/(δn − θn)) = 0.

Now suppose we have an (n, M, d)-code with d ≤ θn. Set n_0 = ⌊(d − 1)/θ⌋. Choose n − n_0 positions, and consider only the codewords with particular letters in each of these positions. Call this subcode C'. WLOG, we can assume |C'| ≥ M/q^{n−n_0}. It still has minimum distance d. Puncture C' at these n − n_0 positions to get an (n_0, |C'|, d)-code. Note that d − n_0 θ > 0

because of our choice of n_0, so by Plotkin, |C'| ≤ d/(d − θn_0) ≤ d. From |C'| ≥ M/q^{n−n_0}, we find M ≤ q^{n−n_0} d, so

log_q M ≤ n − n_0 + log_q d.

Since n_0 = ⌊(d − 1)/θ⌋ ≥ d/θ − (1 + θ)/θ = δn/θ − (1 + θ)/θ, this gives

(log_q M)/n ≤ 1 − δ/θ + (log_q d + (1 + θ)/θ)/n,

and taking the limit as n → ∞, we find α(δ) ≤ 1 − δ/θ.
Graph of all the asymptotic bounds: [figure omitted].

5 June 29.
Today: cyclic codes.
Recall that a field is a set K with two commutative, associative operations + and · such that a · (b + c) = a · b + a · c, with additive and multiplicative identities 0 and 1, additive inverses, and multiplicative inverses for all nonzero elements. Today we work only with finite fields.
Proposition 5.1. If q ∈ Z^+ is a prime power, there is a unique field of order q. If q is not a prime power, there is no field of order q.

Let F_q be the finite field with q = p^r elements. It may be constructed by setting F_p = Z/pZ and F_q = F_p[x]/(f), where f ∈ F_p[x] is an irreducible polynomial of degree r. Here F_p[x] is the ring of polynomials in x with coefficients in F_p, (f) is the set of all multiples of f, and F_p[x]/(f) is the ring where f is identified with 0.

Theorem 5.2. The set of nonzero elements of F_q, denoted F_q^×, forms a cyclic group under multiplication.

Proof. Suppose otherwise. Then F_q^× can be written in the form F_q^× = G_1 × G_2, where G_1, G_2 are abelian groups with gcd(|G_1|, |G_2|) = n > 1. Then G_1 has at least n elements of order dividing n, as does G_2. Then F_q^× has at least n^2 elements of order dividing n, but the elements α ∈ F_q^× of order dividing n are precisely the roots of x^n − 1, which has at most n roots. Since n^2 > n, this is a contradiction.
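Concretely, a generator of F_p^× can be found by checking that its powers hit every nonzero residue; a small sketch (mine, not from the notes) for p = 7:

```python
def is_generator(g, p):
    """Check that powers of g hit every nonzero residue mod the prime p."""
    seen, x = set(), 1
    for _ in range(p - 1):
        x = x * g % p
        seen.add(x)
    return len(seen) == p - 1

print([g for g in range(2, 7) if is_generator(g, 7)])  # [3, 5]
```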

Definition 5.1. Fix q = p^r and k ≤ q − 1. The Reed-Solomon [q − 1, k]-code has as codewords the words (f(1), f(α), . . . , f(α^{q−2})), where f ∈ F_q[x] is a polynomial of degree at most k − 1 and α ∈ F_q is a generator of F_q^×. That is, given the string (a_0, . . . , a_{k−1}), we construct the polynomial a_0 + a_1 x + · · · + a_{k−1} x^{k−1} ∈ F_q[x], and obtain the Reed-Solomon codeword by evaluating it at every nonzero element of F_q.

Proposition 5.3. The RS [q − 1, k]-code has minimum distance q − k.

Proof. Let f be a nonzero polynomial of degree at most k − 1. Then f has at most k − 1 roots over F_q. So the codeword you get by evaluating f at every point of F_q^× has at most k − 1 zeroes, hence weight at least (q − 1) − (k − 1) = q − k, which means RS has minimum distance at least q − k. By the Singleton bound, it also has minimum distance at most q − k.
So Reed-Solomon codes are MDS codes—pretty much the only nontrivial ones. However,
they are very hard to decode.
RS codes satisfy the following property: if a_0, . . . , a_{n−1} ∈ C, then a_1, . . . , a_{n−1}, a_0 ∈ C. Codes with this property are called cyclic codes. Proof: suppose (a_0, . . . , a_{q−2}) is a codeword in the RS code C. This means a_i = f(α^i) for some polynomial f of degree at most k − 1 and some generator α ∈ F_q^×. Then g(x) = f(αx) is also a polynomial of degree at most k − 1, and g(α^i) = f(α^{i+1}) = a_{i+1}. Thus g produces the codeword a_1, a_2, . . . , a_{q−2}, a_0.
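Both Proposition 5.3 and the cyclic property can be confirmed by brute force for a small example; a sketch of mine (not from the notes) for the RS [6, 2]-code over F_7 with generator α = 3:

```python
from itertools import product

q, k, alpha = 7, 2, 3   # RS [q-1, k] code over F_7; 3 generates F_7^x
points = [pow(alpha, i, q) for i in range(q - 1)]   # 1, a, a^2, ..., a^{q-2}

def rs_codeword(coeffs):
    """Evaluate the polynomial with the given coefficients at the points."""
    return tuple(sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q
                 for x in points)

code = {rs_codeword(c) for c in product(range(q), repeat=k)}

# Minimum distance equals q - k (here 5), and the code is cyclic.
print(min(sum(x != 0 for x in w) for w in code if any(w)))   # 5
print(all(w[1:] + w[:1] in code for w in code))              # True
```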

Definition 5.2. A cyclic code C in F_q^n is one where, if a_0, . . . , a_{n−1} is in C, then a_1, . . . , a_{n−1}, a_0 is in C.

Given any word a_0, . . . , a_{n−1} ∈ F_q^n, identify it with the polynomial a_0 + a_1 x + · · · + a_{n−1} x^{n−1} ∈ F_q[x]/(x^n − 1). This identification is an isomorphism of vector spaces.

Proposition 5.4. C is a cyclic code if and only if its image under this map is an ideal in
Fq [x]/(xn − 1).

Definition 5.3. Given a ring R, an ideal I ⊂ R is a subset satisfying: x, y ∈ I implies x + y ∈ I, and x ∈ I, r ∈ R implies rx ∈ I.

Proof. Given c = (a_0, . . . , a_{n−1}) ∈ C, its image is p = a_0 + · · · + a_{n−1} x^{n−1} ∈ F_q[x]/(x^n − 1). Multiplying by x produces the polynomial xp = a_0 x + a_1 x^2 + · · · + a_{n−1} x^n. But x^n = 1, so this can be rewritten xp = a_{n−1} + a_0 x + · · · + a_{n−2} x^{n−1}, which is the image of a_{n−1}, a_0, . . . , a_{n−2} ∈ C. So multiplication by x preserves the image of C in F_q[x]/(x^n − 1). Because C is linear, its image is also closed under addition and scalar multiplication, hence under multiplication by arbitrary polynomials.

Proposition 5.5. Given an ideal C ⊆ F_q[x]/(x^n − 1), there is a polynomial g dividing x^n − 1 such that C = (g), that is, every element of C can be written as rg for some r ∈ F_q[x]/(x^n − 1).

This polynomial is called the generator polynomial of the cyclic code.

Fact: if C has generator polynomial of degree d, then the corresponding cyclic code has dimension n − d. For example, if g = x − 1, then the code contains as codewords exactly the words a_0, . . . , a_{n−1} with a_0 + · · · + a_{n−1} = 0, so over F_2 it's binary checksum. This has dimension n − 1.
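To see the Fact in a small case (an illustration of mine, not from the notes): over F_2 with n = 4 and g = 1 + x, the multiples of g in F_2[x]/(x^4 − 1) form exactly the dimension-3 checksum code.

```python
from itertools import product

N = 4   # length of the cyclic code

def polymul_mod(a, b):
    """Multiply coefficient lists a, b over F_2, reduce mod x^N - 1."""
    r = [0] * N
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x and y:
                r[(i + j) % N] ^= 1
    return tuple(r)

g = [1, 1]   # g = 1 + x
code = {polymul_mod(list(r), g) for r in product((0, 1), repeat=N)}
print(len(code))                           # 8 = 2^(N - deg g)
print(all(sum(w) % 2 == 0 for w in code))  # True: the checksum code
```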

6 July 1.
Recall: we have been discussing linear codes over F_q, and we defined a cyclic code C ⊆ F_q^n to be one in which x = x_0, x_1, . . . , x_{n−1} ∈ C implies x_{n−1}, x_0, x_1, . . . , x_{n−2} ∈ C. We identify F_q^n with F_q[x]/(x^n − 1) via the map a_0, . . . , a_{n−1} ↦ a_0 + a_1 x + · · · + a_{n−1} x^{n−1}. Thus we can think of codes as vector subspaces of F_q[x]/(x^n − 1), and in fact the image of a cyclic code C ⊆ F_q^n is an ideal of F_q[x]/(x^n − 1).

Proposition 6.1. An ideal C ⊆ F_q[x]/(x^n − 1) can be written as C = (g), where g is a polynomial dividing x^n − 1 in F_q[x].

For example, in F_2[x]/(x^3 − 1), x^3 − 1 factors as (x + 1)(x^2 + x + 1), giving the possible generators. The cyclic codes of length 3 are thus checksum, repetition, the zero code, and the trivial code.
One can show that if g = g_0 + g_1 x + g_2 x^2 + · · · + g_d x^d is a generator polynomial for C, then

g_0 g_1 · · · g_d 0 0 · · · 0
0 g_0 · · · g_{d−1} g_d 0 · · · 0
. . .
0 0 · · · 0 g_0 g_1 · · · g_d

(each row a cyclic shift of the previous) is a generating matrix G for C. In particular, C is an [n, n − d]-code.
Since g divides x^n − 1, let h = (x^n − 1)/g. We claim that a polynomial f is in C if and only if f h = 0. Proof: if f ∈ C, then f = ug for some u. Then f h = ugh = u(x^n − 1) = 0. Conversely, if f h = 0, then f h is a multiple of x^n − 1 = gh in F_q[x], so gh divides f h, hence g divides f and f ∈ C.
h is called a check polynomial for C. The corresponding parity check matrix is

0 · · · 0 0 h_k h_{k−1} · · · h_0
0 · · · 0 h_k h_{k−1} · · · h_0 0
. . .
h_k · · · h_1 h_0 0 · · · 0 0

where h = h_0 + h_1 x + · · · + h_k x^k.
We now introduce the discrete Fourier transform. Fix q, n with gcd(q, n) = 1. Then fix a primitive nth root of unity β over F_q. Define a function Φ : F_q[x]/(x^n − 1) → F_q[x]/(x^n − 1) by

(Φ(a))(x) = Σ_{1≤j≤n} a(β^j) x^{n−j}.

In the coding theory community this is known as a Mattson-Solomon polynomial.


Proposition 6.2. Φ is invertible and (Φ^{−1}(A))(x) = n^{−1} (Φ(A))(x^{−1}).

Proof. Suppose A(x) = (Φ(a))(x). Then

A(β^k) = Σ_{1≤j≤n} a(β^j) β^{k(n−j)}.

Let a = a_0 + a_1 x + · · · + a_{n−1} x^{n−1}. Then

A(β^k) = Σ_{1≤j≤n} Σ_{0≤i≤n−1} a_i β^{ij} β^{k(n−j)} = β^{kn} Σ_{0≤i≤n−1} a_i Σ_{1≤j≤n} β^{j(i−k)}.

Now β^{kn} = 1, while Σ_{1≤j≤n} β^{j(i−k)} is n if i = k and 0 if i ≠ k. So this simplifies to A(β^k) = n a_k.
Thus

n^{−1} (Φ(A))(x^{−1}) = n^{−1} Σ_{1≤j≤n} A(β^j)(x^{−1})^{n−j} = n^{−1} Σ_{1≤j≤n} n a_j x^{j−n} = Σ_j a_j x^j = a

since x^n = 1.
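The inversion formula can be checked numerically; a sketch of mine (not from the notes) over F_7 with n = 6 and β = 3, which has order 6 in F_7^×:

```python
q, n, beta = 7, 6, 3   # beta = 3 is a primitive 6th root of unity in F_7

def ev(poly, x):
    """Evaluate a coefficient list at x over F_q."""
    return sum(c * pow(x, i, q) for i, c in enumerate(poly)) % q

def phi(a):
    """(Phi(a))(x) = sum_{j=1}^{n} a(beta^j) x^{n-j} in F_q[x]/(x^n - 1)."""
    out = [0] * n
    for j in range(1, n + 1):
        out[(n - j) % n] = ev(a, pow(beta, j, q))
    return out

def phi_inv(A):
    """n^{-1} (Phi(A))(x^{-1}); note x^{-i} = x^{n-i} since x^n = 1."""
    ninv = pow(n, q - 2, q)   # inverse of n mod the prime q
    B = phi(A)
    out = [0] * n
    for i in range(n):
        out[(-i) % n] = B[i] * ninv % q
    return out

a = [1, 2, 0, 3, 0, 5]
print(phi_inv(phi(a)) == a)  # True
```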
In Reed-Solomon, we define C ⊆ F_q^{q−1} as containing all codewords (a(1), a(α), a(α^2), . . . , a(α^{q−2})) with a a polynomial of degree at most k − 1. Via F_q^{q−1} ≅ F_q[x]/(x^{q−1} − 1), we can identify C with the set of Mattson-Solomon polynomials obtained from polynomials of degree at most k.
Corollary 6.3. The dimension k RS code C in F_q[x]/(x^{q−1} − 1) consists of all polynomials with roots at α, α^2, . . . , α^{n−k} (where n = q − 1).
In particular, if the minimal polynomial of α^i is denoted m_i, the generator polynomial g of C is lcm(m_1, . . . , m_{n−k}).
Now we can describe the BCH (Bose-Ray-Chaudhuri-Hocquenghem) code. Fix q, n with gcd(q, n) = 1. Let β be a primitive nth root of unity. The BCH code of designed distance d and length n is the ideal C ⊆ F_q[x]/(x^n − 1) generated by the least common multiple of the minimum polynomials of β, β^2, · · · , β^{d−1}.
For example, when d = 1, this is an empty product, so C = (1) and we get the trivial code.
Theorem 6.4. The minimal distance of C as presented above is indeed at least d.

Proof. For any f ∈ C, we have f(β) = f(β^2) = · · · = f(β^{d−1}) = 0. Writing f in vector form as (f_0, . . . , f_{n−1}), we have

1 β β^2 · · · β^{n−1}
1 β^2 β^4 · · · β^{2(n−1)}
. . .
1 β^{d−1} β^{2(d−1)} · · · β^{(d−1)(n−1)}

times f^T equals 0. We must show that any d − 1 columns of this matrix are linearly independent. Pick any d − 1 columns i_1, . . . , i_{d−1} with 0 ≤ i_1 < i_2 < · · · < i_{d−1} ≤ n − 1. The corresponding submatrix is

β^{i_1} β^{i_2} · · · β^{i_{d−1}}
. . .
β^{i_1 (d−1)} β^{i_2 (d−1)} · · · β^{i_{d−1} (d−1)}

and the determinant of this is the Vandermonde-type determinant

β^{i_1 +···+i_{d−1}} Π_{1≤j<k≤d−1} (β^{i_k} − β^{i_j}) ≠ 0.

For example, consider F_2[x]/(x^{15} − 1). We have

x^{15} − 1 = (x + 1)(x^2 + x + 1)(x^4 + x + 1)(x^4 + x^3 + 1)(x^4 + x^3 + x^2 + x + 1).

Let β be a root of x^4 + x + 1. Then β^2, β^4 are also roots of x^4 + x + 1, and β^3 is a root of x^4 + x^3 + x^2 + x + 1. So the BCH code with d = 3 is generated by x^4 + x + 1. That is a [15, 11, 3]-code, in fact a Hamming code. If we take d = 5, we get g = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1), which produces a new [15, 7, 5]-code.
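The [15, 7, 5] claim can be verified by brute force over the 2^7 messages; a sketch of mine (not from the notes):

```python
from itertools import product

def gf2_polymul(a, b):
    """Multiply two coefficient lists over F_2."""
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] ^= x & y
    return r

N = 15
g = gf2_polymul([1, 1, 0, 0, 1], [1, 1, 1, 1, 1])  # (x^4+x+1)(x^4+x^3+x^2+x+1)
k = N - (len(g) - 1)                                # 15 - 8 = 7

def encode(msg):
    w = gf2_polymul(list(msg), g)
    w += [0] * (N - len(w))
    return tuple(w[:N])

weights = {sum(encode(m)) for m in product((0, 1), repeat=k) if any(m)}
print(k, min(weights))  # 7 5
```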

7 July 6.
New resource: Stichtenoth, Algebraic function fields and codes. Try exercises 1.6, 1.8, 2.5,
2.9.
Recall that a Reed-Solomon code is a code over F_q, with parameter k, whose codewords are of the form (f(1), f(α), . . . , f(α^{q−2})) where f is a polynomial of degree at most k − 1 and α ∈ F_q^× is a generator of the multiplicative group.

Definition 7.1. Given a field k, the field of rational functions k(x) is given by ratios f(x)/g(x), where f, g ∈ k[x] are polynomials with g ≠ 0, modulo the equivalence f/g = f'/g' if and only if f g' = g f'.

Definition 7.2. An algebraic function field over k is a finite extension of k(x). (We
say that a field F is a finite extension of k(x) if F contains k(x) and is a finite-dimensional
vector space over k(x).)

An example is k(x)[y]/(y^2 − x^3 − ax − b). A non-example is k(x, y)/k.
We will assume that an algebraic function field F/k contains no finite extension k' of k. For example, we exclude Q(x)[√2]/Q.
Definition 7.3. A valuation v on the algebraic function field F/k is a function v : F → Z ∪ {∞} satisfying:
—v(x) = ∞ if and only if x = 0,
—v(xy) = v(x) + v(y),
—v(x + y) ≥ min(v(x), v(y)),
—v(a) = 0 for a ∈ k \ {0},
—v(x) = 1 for some x ∈ F.
Example 7.1. Over k(x), we can define a valuation vx : k(x) → Z ∪ {∞} by setting vx (f /g)
to be the highest power of x dividing f minus the highest power of x dividing g.
Similarly, for any irreducible polynomial p ∈ k[x], we can define a valuation vp by setting
vp (f /g) to be the highest power of p dividing f minus the highest power of p dividing g.
For k = Fq , there is one more valuation on Fq (x): v∞ (f /g) = deg(g) − deg(f ).
Definition 7.4. A place P of a function field F/k is a subset P ⊂ F of the form

P = {f ∈ F | v(f) ≥ 1}

for some valuation v. This is a vector subspace of F, and is a maximal ideal of the ring O_P = {f ∈ F | v_P(f) ≥ 0}.
Key point: a place P determines a valuation v_P, and vice versa.
Proposition 7.1. Every f ∈ F \ k satisfies v_P(f) > 0 for some P ("f has a zero at P") and v_{P'}(f) < 0 for some P' ("f has a pole at P'").
For example, in F_q(x), f = x^2 + 1 has a zero at the place P_{x^2+1} and a pole of order 2 at ∞.
Definition 7.5. The degree deg P of a place P is given by deg P = dim_k O_P/P.
For example, deg P_∞ = 1, and deg P_{p(x)} = deg p(x).
Definition 7.6. A divisor D = Σ_{1≤i≤k} a_i P_i is a finite formal linear combination of places P_i with coefficients in Z. We call the group of divisors div(F).
Definition 7.7. Let D = Σ a_i P_i be a divisor. The degree of D is given by deg D = Σ_{1≤i≤k} a_i deg P_i. D is called effective if a_i ≥ 0 for all i. We say D ≥ D' if D − D' is effective.
Definition 7.8. The complete linear system associated to D, denoted |D|, is given by |D| = {f ∈ F | v_{P_i}(f) ≥ −a_i for P_i appearing in D, v_P(f) ≥ 0 for other P}.
Let K = F_q, F = F_q(x). Consider the complete linear series |dP_∞|, consisting of all rational functions with a pole of order at most d at P_∞ and no other poles, that is, polynomials of degree at most d.
The principal divisor associated to f ∈ F is the divisor (f) = Σ_{places P} v_P(f) P. (This sum can be shown to be finite.)

Proposition 7.2. deg((f )) = 0.

Construction: start with an algebraic function field F/F_q. Let P_1, . . . , P_n be n distinct places of degree 1, and D = Σ P_i. Let G be a divisor containing no P_i as a summand. The algebro-geometric code C(D, G) is given by the codewords

{(f(P_1), . . . , f(P_n)) | f ∈ |G|}.

Here, f(P_i) means the image of f ∈ O_{P_i} under the map O_{P_i} → O_{P_i}/P_i ≅ F_q. For example, if F = F_q(x), P_i = P_{(x−a_i)}, then f(P_i) = f (mod x − a_i) = f(a_i).
We can get the original Reed-Solomon code this way: work in F_q(x)/F_q, and take P_i = P_{x−α^i} (1 ≤ i ≤ q − 1, α ∈ F_q^× primitive), and G = (k − 1)P_∞.

Theorem 7.3. The dimension of C(D, G) is dim |G| − dim |G − D|. The minimum distance of C(D, G) is at least n − deg G.

Proof. The dimension of C(D, G) is the dimension of the image of the map ϕ : |G| → F_q^n, f ↦ (f(P_1), . . . , f(P_n)). This is dim |G| − dim ker ϕ. But f ∈ |G| is in the kernel of ϕ if and only if f ∈ P_i for all 1 ≤ i ≤ n, that is, v_{P_i}(f) ≥ 1 for all i, that is, f ∈ |G − D|. So dim ker ϕ = dim |G − D|.
Now suppose there is f ∈ |G| with nonzero codeword of weight at most n − deg G − 1. Then f vanishes at at least deg G + 1 of the places P_i. Say f(P_{i_1}) = f(P_{i_2}) = · · · = f(P_{i_{deg G+1}}) = 0. Then let D' = P_{i_1} + · · · + P_{i_{deg G+1}}, so that f ∈ |G − D'|. But deg(G − D') = −1, so |G − D'| = 0, contradiction.

Call n − deg G the designed distance of C(D, G). Note that C(D, G) is an [n, dim |G| − dim |G − D|, ≥ n − deg G]-code.
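For F = F_q(x) and G = (k − 1)P_∞, where |G| is the polynomials of degree at most k − 1, the theorem can be checked directly for small parameters. A sketch of mine (not from the notes; q = 5 and the evaluation points 1, 2, 3, 4 are arbitrary choices):

```python
from itertools import product

q, k = 5, 3
points = [1, 2, 3, 4]   # degree-1 places x - a_i of F_5(x)
n = len(points)         # G = (k-1) P_infty, so deg G = k - 1 = 2

code = {tuple(sum(c * pow(a, i, q) for i, c in enumerate(f)) % q
              for a in points)
        for f in product(range(q), repeat=k)}

print(len(code) == q ** k)                                   # dimension is k
print(min(sum(x != 0 for x in w) for w in code if any(w)))   # 2 = n - deg G
```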

Theorem 7.4 (Riemann-Roch). Fix F/Fq an algebraic function field. There is an integer
g ≥ 0 and a divisor K such that

dim |D| − dim |K − D| = deg D − g + 1

for all divisors D on F .

The number g is called the genus of F/Fq , and deg K = 2g − 2.

Corollary 7.5. If F/F_q is a genus g extension and n, d are chosen with n > d > 2g − 2, then any algebro-geometric code with deg D = n, deg G = d is an [n, d − g + 1, ≥ n − d]-code.

So strong codes come from function fields of low genus with large numbers of degree-1
points.

8 July 8.
Recall: let F/F_q be an algebraic function field, for example F_q(x)/F_q. Take a collection P_1, . . . , P_n of distinct degree 1 places and a divisor G containing none of the P_i. Then we have a code of length n with codewords (f(P_1), f(P_2), . . . , f(P_n)), f ∈ |G|. Here, the complete linear series |G| is the set of f such that v_P(f) ≥ −v_P(G) for all places P. For example, if P_∞ is the place on F_q(x) corresponding to the valuation v_∞(f/g) = deg g − deg f, then |dP_∞| is the space of polynomials of degree at most d.
We showed that C(P_1 + · · · + P_n, G) is an [n, dim |G| − dim |G − D|, ≥ n − deg G] code, and so now we want a way of computing dim |G| and dim |G − D|.
Theorem 8.1 (Riemann-Roch). Given any algebraic function field F/k, there is an integer
g ≥ 0 and a divisor W such that

dim |D| − dim |W − D| = deg D − g + 1.

We will now sketch a proof of this theorem. The below propositions are proved in
Stichtenoth.
Proposition 8.2. For any divisor D, we have dim |D| ≤ deg D + 1.
Proposition 8.3. Given F/k an algebraic function field, there is a constant C such that
deg D − dim |D| ≤ C for all divisors D.
We write l(D) = dim |D| to clean up notation.
Definition 8.1. The genus g of F/k is given by the formula

g = max{deg D − l(D) + 1 | D a divisor}.

By the previous two propositions, g is a well-defined nonnegative integer.


Example 8.1. k(x)/k has genus 0: for any place P, either P = P_∞ or P = P_{p(x)}, with l(P_∞) = 2 (|P_∞| being spanned by 1, x as a k-vector space) and l(P_{p(x)}) = deg p(x) + 1 (since |P_{p(x)}| consists of f(x)/p(x) where f(x) is any polynomial of degree at most deg p(x)). So deg D − l(D) + 1 = 0 for D = P. But also we have l(D_1 + D_2) ≥ l(D_1) + l(D_2) − 1 for D_1, D_2 effective (and deg(D_1 + D_2) = deg D_1 + deg D_2), so this is true for all D.
In general it is hard to compute the genus of an algebraic function field.
Theorem 8.4 (Riemann’s theorem). For any divisor D on F/k, we have l(D) ≥ deg D −
g + 1. For any divisor D of sufficiently high degree, l(D) = deg D − g + 1.
Proof. The first statement follows from the definition of g.
For the second, let D0 be a divisor on F/k such that deg D0 − l(D0 ) + 1 = g. Let
C = deg D0 + g and let D be any divisor of degree at least C. Then deg(D − D0 ) ≥ g, so
l(D − D0 ) ≥ g − g + 1 = 1. Thus D + (f ) ≥ D0 for some f ∈ F/k, so

deg D − l(D) = deg(D + (f )) − l(D + (f )) ≥ deg D0 − l(D0 ) = g − 1

(actually we won’t prove this inequality but it’s true) so

g − 1 ≥ deg D − l(D) ≥ g − 1.

Define the “index of specialization” i(D) of a divisor D to be i(D) = l(D)−deg D+
g − 1. In particular, Riemann’s theorem states that i(D) ≥ 0 for all D and i(D) = 0 for all
D of sufficiently large degree.
Definition 8.2. Denote by PF the set of places of F/k. An adele α of F is a map α : PF →
F , P 7→ αP , such that vP (αP ) ≥ 0 for all but finitely many places P . We denote by AF the
vector space of adeles.
Definition 8.3. A principal adele is one of the form αf : P 7→ f , f ∈ F . This is an adele
because every f ∈ F has finitely many poles. This gives us an inclusion F ,→ AF .
Definition 8.4. Given a divisor D, let AF (D) be the space
AF (D) = {α ∈ AF | vP (αP ) ≥ vP (−D) for all places P }.
Example 8.2. AF (0) consists of the adeles such that vP (αP ) ≥ 0 for all places P .
Theorem 8.5. For D ∈ div(F ), we have
i(D) = dimk (AF /(AF (D) + F )).
Proof. Step 1: for any divisor D1 ≤ D2 , dim(AF (D2 )/AF (D1 )) = deg D2 − deg D1 . Proof:
assume D2 = D1 + P . Then the vector space AF (D2 )/AF (D1 ) only depends on the P -
component of an adele. In particular, letting t ∈ F be such that vP (t) = vP (D1 ) + 1, the
morphism
AF (D2 )/AF (D1 ) → OP /P
α 7→ tαP
is an isomorphism. (By definition, if D = Σ_i ai Pi , then vPi (D) = ai .) So dim AF (D2 )/AF (D1 ) =
dim OP /P = deg P . The claim follows by induction.
Step 2: given D2 ≥ D1 ,
dim(AF (D2 ) + F )/(AF (D1 ) + F ) = i(D1 ) − i(D2 ).
Proof: use an exact sequence
0 → |D2 |/|D1 | → AF (D2 )/AF (D1 ) → (AF (D2 ) + F )/(AF (D1 ) + F ) → 0.
Step 3: AF = AF (D) + F for all divisors D with i(D) = 0. Proof: given any adele α ∈ AF ,
α ∈ AF (D0 ) for some D0 . By enlarging D0 , we can assume both i(D0 ) = 0 and D0 ≥ D.
Then in particular we have
dim(AF (D0 ) + F )/(AF (D) + F ) = 0.
Since α ∈ AF (D0 ) + F , α ∈ AF (D) + F , and so AF = AF (D) + F .
Proof of the theorem: let D be arbitrary, D0 ≥ D a divisor with i(D0 ) = 0. Then
dim AF /(AF (D) + F ) = dim AF /(AF (D0 ) + F ) + dim(AF (D0 ) + F )/(AF (D) + F )
= 0 + i(D) = i(D).

Corollary 8.6 (Bad Riemann-Roch). For any D,

l(D) − dim AF /(AF (D) + F ) = deg D − g + 1.

Definition 8.5. A Weil differential is a linear map ω : AF → k such that ω is zero on
some subspace of the form AF (D) + F .

Given a Weil differential ω, let (ω) be the divisor in div(F ) satisfying


1. ω vanishes on AF ((ω)) + F
2. If ω vanishes on AF (D) + F for some D, then D ≤ (ω).
We call (ω) a canonical divisor.

Theorem 8.7. For any D and canonical divisor (ω), we have

dim AF /(AF (D) + F ) = l((ω) − D).

This gives us Riemann-Roch.

9 July 11.
Last time: let F/k be an algebraic function field of genus g. We defined an adele to be a
map α : PF → F satisfying vP (αP ) ≥ 0 for all but finitely many P . We proved the
rough Riemann-Roch theorem
l(D) − dim AF /(AF (D) + F ) = deg D − g + 1.

We defined a Weil differential to be a function ω : AF → k vanishing on some subspace


AF (D) + F , and we claimed that if (ω) is the divisor associated to ω, then

l((ω) − D) = dim AF /(AF (D) + F ).

This claim is the only thing separating us from full Riemann-Roch. We will now sketch a
proof of this claim.
Recall that AF (D) = {α ∈ AF | vP (αP ) ≥ vP (−D) for all places P }. Also recall the
following definition:

Definition 9.1. A canonical divisor associated to a Weil differential ω is a divisor (ω) satisfying
1. ω vanishes on AF ((ω)) + F ,
2. If ω vanishes on AF (D) + F , then D ≤ (ω).

Lemma 9.1. A nonzero Weil differential has an associated canonical divisor.

See Stichtenoth 1.5 for the proof of this lemma.

Definition 9.2. Let ΩF denote the space of Weil differentials, and ΩF (D) the space of those
that vanish on AF (D) + F .

Lemma 9.2. dim ΩF (D) = dimk AF /(AF (D) + F ).

Proof. ω ∈ ΩF (D) is a linear function ω : AF → k with kernel containing AF (D) + F . That
is, it's a map ω : AF /(AF (D) + F ) → k. So ΩF (D) is the dual of AF /(AF (D) + F ) and thus
has the same dimension.

Corollary 9.3. Every algebraic function field has a nonzero Weil differential.

Proof. Take any divisor D of degree ≤ −2. Then dim ΩF (D) = i(D) ≥ 1, i.e. ΩF (D)
contains nonzero elements.
Given f ∈ F , ω ∈ ΩF , we define the composition f ω : AF → k by (f ω)(α) = ω(f α).
(Multiplication by f indeed sends ΩF to itself, because it sends ω ∈ ΩF (D) to f ω ∈ ΩF (D +
(f )).) So we can regard ΩF as a nonzero F -vector space.

Theorem 9.4. ΩF is a one-dimensional F -vector space.

We will not prove this.


Let ω be a Weil differential and W = (ω) the associated canonical divisor.

Theorem 9.5 (duality theorem). Fix a divisor D. The map µ : |W −D| → ΩF (D), f 7→ ωf ,
is an isomorphism.

Proof. We need to check:


1. This map makes sense. Given f ∈ |W − D| and ω ∈ ΩF (W ), we have f ω ∈ ΩF (W +
(f )). Since (f ) ≥ D − W by definition, we have f ω ∈ ΩF (W + D − W ) = ΩF (D), as we
needed.
2. It’s linear. This is trivial: (af + g) 7→ (af + g)ω = af ω + gω.
3. It's injective. This is because if µ(f ) = 0, then f ω = 0; any nonzero f ∈ F is invertible,
which would force ω = 0, so f = 0.
4. It’s surjective. Suppose ω 0 ∈ ΩF (D). Because ΩF is a one-dimensional vector space
over F , we have ω 0 = f ω for some f ∈ F . Then f ω ∈ ΩF (D), so ω ∈ ΩF (D −(f )), so by part
2 of the definition of a canonical divisor, D − (f ) ≤ W , so (f ) ≥ D − W , so f ∈ |W − D|.

Corollary 9.6. l(W − D) = dim ΩF (D) = i(D).

Corollary 9.7. l(D) − l(W − D) = deg D − g + 1.

This completes our proof of Riemann-Roch.

Corollary 9.8. Any canonical divisor W has degree 2g − 2 and l(W ) = g.

Proof. Riemann-Roch with D = 0 gives

l(0) − l(W ) = 0 − g + 1

1 − l(W ) = 1 − g
l(W ) = g.

Riemann-Roch with D = W gives

l(W ) − l(0) = deg W − g + 1

g − 1 = deg W − g + 1
deg W = 2g − 2.

Okay, now back to codes. Let F/Fq be an algebraic function field. Let P1 , . . . , Pn be
distinct degree-1 places, D = P1 + · · · + Pn . Let G be a divisor on F with support disjoint
from D. Recall that the code C(D, G) has codewords

{(f (P1 ), . . . , f (Pn )) | f ∈ |G|}

and that C(D, G) is an [n, k, d]-code with k = l(G) − l(G − D) and d ≥ n − deg G. So if
n > deg G > 2g −2, Riemann-Roch tells us that C(D, G) is an [n, deg G−g +1, ≥ n−deg G]-
code.
Recall that the Singleton bound says that [n, k, d]-codes satisfy k + d ≤ n + 1. So for
algebro-geometric codes of this form, we have n − g + 1 ≤ k + d ≤ n + 1.
If f1 , . . . , fk are generators of |G|, the generating matrix for C(D, G) is
 
f1 (P1 ) · · · f1 (Pn )
   ⋮       ⋱      ⋮
fk (P1 ) · · · fk (Pn )

Definition 9.3. Given F/Fq and D, G as above, the code CΩ (D, G) is given by the codewords

{(ωP1 (1), . . . , ωPn (1)) | ω ∈ ΩF (G − D)}

where ωP : F → Fq is the function sending f ∈ F to ω(α), where α is the adele with αP = f
and αP ′ = 0 for P ′ ≠ P .
Proposition 9.9. CΩ (D, G) is an [n, k, d]-code with k = i(G − D) − i(G) and d ≥ deg G −
(2g − 2).
Proof. We have a map Ω(G − D) → CΩ (D, G) sending ω to the corresponding codeword.
Ω(G−D) has dimension i(G−D). The kernel of this map consists of the ω ∈ Ω(G−D) that
vanish on all the AF (Pi ); such ω lie in ΩF (G), which has dimension i(G).
The minimum distance result is proven similarly.
Theorem 9.10. CΩ (D, G) = C(D, G)⊥ .
Proof. First, for CΩ to be the dual, it must have the right dimension. Say C(D, G) is an
[n, k] code; then we need CΩ to be an [n, n − k]-code. But we have

dim C(D, G) = l(G) − l(G − D)

dim CΩ (D, G) = i(G − D) − i(G)

so their sum is

(l(G) − i(G)) − (l(G − D) − i(G − D)) = (deg G − g + 1) − (deg G − n − g + 1) = n

by Riemann-Roch.
Now we need to show that if ω ∈ Ω(G − D) and f ∈ |G|, we have
Σ_{1≤i≤n} f (Pi ) ωPi (1) = 0.

This sum evaluates to

Σ_{1≤i≤n} ωPi (f ) = ω(α)

where α is given by αPi = f for 1 ≤ i ≤ n and αP = 0 otherwise. Then one can show that
α ∈ AF (G − D) + F , so ω vanishes at it.
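In the Reed-Solomon special case, this duality can be checked by brute force. The sketch below (our own restatement, not from the lecture; q prime is assumed) verifies the underlying orthogonality: the evaluations at F×q of any f with deg f ≤ k − 1 are orthogonal to the evaluations of x^i for 1 ≤ i ≤ q − k − 1, because Σ_{x∈F×q} x^m = 0 whenever (q − 1) does not divide m.

```python
# Numerical check (ours, assumption: q prime) of the orthogonality behind
# Theorem 9.10 for Reed-Solomon codes: evaluations of f with deg f <= k-1 at
# F_q^* are orthogonal to evaluations of x^i for 1 <= i <= q-k-1, because
# sum over x in F_q^* of x^m is 0 whenever (q-1) does not divide m.
from itertools import product

q, k = 7, 3
pts = list(range(1, q))                    # the n = q-1 nonzero elements

def ev(coeffs, x):
    return sum(c * x**j for j, c in enumerate(coeffs)) % q

ok = all(
    sum(ev(f, x) * pow(x, i, q) for x in pts) % q == 0
    for f in product(range(q), repeat=k)   # all f with deg f <= k-1
    for i in range(1, q - k)               # the monomials x^1, ..., x^(q-k-1)
)
print(ok)  # True
```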

Theorem 9.11. Given G, D as above,

CΩ (D, G) = C(D, (ω) − G + D)

for some canonical divisor (ω).

Recall that a central problem of coding theory is computing the numbers Aq (n, d) =
max{k | an [n, k, d]q -code exists}, or, asymptotically, the numbers

α(δ) = lim sup_{n→∞} Aq (n, ⌊δn⌋)/n.

Algebro-geometric codes are [n, n − d − g + 1, d]-codes, so we can bound α(δ) by maximizing

(n − d − g + 1)/n = 1 − δ − (g − 1)/n.

So we want to minimize (g − 1)/n.

Theorem 9.12. If q is a square, there is a sequence of algebraic function fields Fi over Fq


such that
#{degree-1 places of Fi }/(genus of Fi ) → √q − 1.
Moreover, this is the best possible bound.

This gives the Tsfasman-Vladut-Zink lower bound


α(δ) ≥ 1 − δ − 1/(√q − 1).

10 July 13.
Up to now, we have mostly been looking for good codes, that is, vector subspaces C ⊆ Fnq
that are [n, k, d]-codes with k, d large. In real life, we want other features—for example, easy
error correction. So let’s talk about decoding algebro-geometric codes.
Let F/Fq be an algebraic function field of genus g, D = P1 + · · · + Pn a sum of distinct
degree 1 places, and G a divisor with disjoint support from D. How do we decode CΩ =
CΩ (D, G)? First we note that it is the dual code to C(D, G), so if C ∈ Fnq is a codeword,
then we have

Σ_{1≤i≤n} Ci f (Pi ) = 0

for all f ∈ |G|. Suppose we send the codeword C, but you receive the codeword R = C + E.
Define

[R, f ] = Σ_{1≤i≤n} Ri f (Pi ).

Since C is a codeword, [C, f ] = 0 for all f ∈ |G|, so [R, f ] = [E, f ]; this expression is called
a “syndrome”. Moreover, assuming wt(E) < d, we have [R, f ] = 0 for all f ∈ |G| if and only
if E = 0. Our problem is thus reduced to finding E of low weight such that [E, f ] = [R, f ]
for all f ∈ |G|.
Now fix an integer t ≥ 0 and a divisor G1 such that
supp(G1 ) ∩ supp(D) = ∅
deg G1 < deg G − (2g − 2) − t
l(G1 ) > t.

This data will allow us to correct t errors in CΩ whenever t ≤ (d∗ − 1)/2, where d∗ =
deg G − (2g − 2). (It's always possible to find t, G1 satisfying t ≥ (d∗ − 1 − g)/2.)
Suppose E is supported at i1 , . . . , iv , v ≤ t (so it’s zero everywhere else). We want to
find f ∈ |G1 | that vanishes at Pi1 , . . . , Piv . This is possible because v ≤ t < l(G1 ).
Now let f1 , . . . , fl be a basis for |G1 |, g1 , . . . , gk a basis for |G−G1 | (note k ≥ t), h1 , . . . , hm
a basis for |G|. We want f ∈ |G1 | that vanishes at Pi1 , . . . , Piv , but such an f would satisfy
[R, f gρ ] = 0 for all 1 ≤ ρ ≤ k. Writing f = Σ_{1≤λ≤l} aλ fλ , this becomes

Σ_{1≤λ≤l} [R, fλ gρ ] aλ = 0

for all 1 ≤ ρ ≤ k. Solving these linear equations for aλ , we get our function f . Then evaluate
f at all Pi to find the places Pi1 , . . . , Piv where f vanishes, so that E is supported at these
places. Now we have

[R, hµ ] = Σ_{1≤k≤v} Eik hµ (Pik ).

Solving this set of linear equations gives a unique solution set Eik , giving us E. Finally, we
set C = R − E.
We have a more explicit algorithm for decoding Reed-Solomon codes. Fix a field Fq
and α ∈ F× q a primitive root of unity. Then the Reed-Solomon [q − 1, k, q − k]-code has
codewords f (1), f (α), . . . , f (αq−2 ) where f is a polynomial of degree at most k − 1. Let
f = f0 + f1 x + f2 x2 + · · · + fk−1 xk−1 .

Proposition 10.1. Given any RS codeword C0 , C1 , . . . , Cq−2 , write it in polynomial form:
C(x) = C0 + C1 x + · · · + Cq−2 xq−2 . We have
1. C(αi ) = 0, 1 ≤ i ≤ q − k − 1;
2. −C(α−i ) = fi .

Now suppose we receive a word R = R0 , . . . , Rq−2 of the form R = C + E where C is a


codeword and E is an error of weight at most (q − k)/2. How do we determine E?
To begin, write R, E as polynomials R(x), E(x) and set

Si = R(αi ) = E(αi ), 1 ≤ i ≤ q − k − 1.

Now suppose E(x) = Σ_{1≤j≤v} ej xij . Create the error position polynomial

Λ(x) = Π_{1≤j≤v} (1 − αij x) = 1 + Λ1 x + · · · + Λv xv

so that Λ(α−i ) = 0 if and only if E(x) has a nonzero xi term. Multiplying Λ(α−ij ) = 0 by
ej αkij for k ∈ Z gives

ej αkij + Λ1 ej α(k−1)ij + · · · + Λv ej α(k−v)ij = 0.

Sum this over all 1 ≤ j ≤ v. Assume k > v. Then we get

Sk + Λ1 Sk−1 + · · · + Λv Sk−v = 0

for all v < k ≤ q − k − 1. In particular, if we know there are v errors, then Λ1 , . . . , Λv satisfy

[ S1    S2    · · ·  Sv    ] [ Λv ]   [ −Sv+1 ]
[ S2    S3    · · ·  Sv+1  ] [ ⋮  ] = [   ⋮   ]
[ ⋮     ⋮     ⋱      ⋮     ] [    ]   [       ]
[ Sv    Sv+1  · · ·  S2v−1 ] [ Λ1 ]   [ −S2v  ]

So inverting this matrix, we can solve for Λ.


Now apply a naive trick: start by assuming that we have a maximal number of errors.
Create the matrix of syndromes. If this matrix is invertible, then there are actually that
many errors. Otherwise, subtract 1 from v and try until we get something invertible. Once
the matrix Mv is invertible, compute

(Λv , . . . , Λ1 )^T = Mv^(−1) (−Sv+1 , . . . , −S2v )^T .

From this, we recover

Λ = Π_{1≤j≤v} (1 − αij x).

Using a Chien search, determine the constants αij ; take the discrete log (by table lookup)
of each to extract ij , the incorrect indices.

Finally, to recover E = Σ_{1≤j≤v} ej xij , take the equations

Sk = Σ_{1≤j≤v} ej αkij

and solve the linear system to get the ej .
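The whole procedure just described can be sketched in code. The following Python implementation is ours, not from the lecture; it fixes a small example (q = 7, α = 3, a generator of F×7, and k = 2, giving a [6, 2, 5]-code correcting t = 2 errors) and assumes q is prime so that field arithmetic is just arithmetic mod q. It follows the steps literally: compute syndromes, shrink v until the syndrome matrix is invertible, solve for Λ, find its roots by a brute-force Chien-style search, and solve for the error magnitudes.

```python
# A sketch (ours) of the decoder described above, for a Reed-Solomon code over
# a prime field: q = 7, alpha = 3 (a generator of F_7^x), k = 2, so we have a
# [6, 2, 5]-code correcting t = 2 errors. All helper names are our own.
q, alpha, k = 7, 3, 2
n = q - 1
t = (q - k) // 2                      # number of correctable errors

def poly_eval(p, x):
    return sum(c * pow(x, i, q) for i, c in enumerate(p)) % q

def solve(M, b):
    # Gaussian elimination over F_q; returns None if M is singular.
    m = [row[:] + [bi] for row, bi in zip(M, b)]
    size = len(m)
    for col in range(size):
        piv = next((r for r in range(col, size) if m[r][col] % q), None)
        if piv is None:
            return None
        m[col], m[piv] = m[piv], m[col]
        inv = pow(m[col][col], q - 2, q)
        m[col] = [x * inv % q for x in m[col]]
        for r in range(size):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * c) % q for a, c in zip(m[r], m[col])]
    return [row[-1] for row in m]

def decode(R):
    S = [poly_eval(R, pow(alpha, i, q)) for i in range(1, 2 * t + 1)]
    if not any(S):
        return R[:]                   # all syndromes zero: no errors
    lam = None
    for v in range(t, 0, -1):         # assume v errors, shrink if singular
        M = [[S[r + c] for c in range(v)] for r in range(v)]
        rhs = [(-S[v + r]) % q for r in range(v)]
        lam = solve(M, rhs)           # [Lambda_v, ..., Lambda_1]
        if lam is not None:
            break
    if lam is None:
        return R[:]                   # give up: too many errors
    Lam = [1] + lam[::-1]             # Lambda(x) = 1 + L1 x + ... + Lv x^v
    pos = [j for j in range(n)
           if poly_eval(Lam, pow(alpha, (-j) % (q - 1), q)) == 0]
    # Solve S_i = sum_j e_j alpha^(i * i_j) for the error magnitudes.
    V = [[pow(alpha, (i + 1) * p, q) for p in pos] for i in range(len(pos))]
    e = solve(V, S[:len(pos)])
    C = R[:]
    for p, ej in zip(pos, e):
        C[p] = (C[p] - ej) % q
    return C

# Encode f(x) = 1 + 2x, then corrupt two positions.
f = [1, 2]
C = [poly_eval(f, pow(alpha, j, q)) for j in range(n)]
R = C[:]
R[0] = (R[0] + 3) % q
R[4] = (R[4] + 5) % q
print(decode(R) == C)   # True
```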


This algorithm has some problems, the most glaring of which is the necessity of testing a
bunch of large matrices for invertibility. The best way to improve this is by the Berlekamp-
Massey algorithm.

11 July 15.
Today: weight enumerators. We start with an [n, k]-code C ⊆ Fnq . We define the integers Ai
by the formula
Ai = #{c ∈ C | wt(c) = i}
so A0 = 1, Ai = 0 for 0 < i < d, etc.
Definition 11.1. The weight enumerator WC (z) of a code C is the polynomial
WC (z) = Σ_{0≤i≤n} Ai z^i .

Weight enumerators are nontrivial to calculate.


Example 11.1. Let’s try the trivial [n, n, 1]q code over Fq . To pick c ∈ Fnq of weight i, we
must pick i integers
1 ≤ j1 < · · · < ji ≤ n
and pick one of q − 1 nonzero values for each cjk . So

Ai = (n choose i)(q − 1)^i

WC (z) = Σ_{0≤i≤n} (n choose i)(q − 1)^i z^i = (1 + (q − 1)z)^n .
Example 11.2. Start with the Hamming [(q^k − 1)/(q − 1), (q^k − 1)/(q − 1) − k, 3]-code Hq,k .
Take its dual Sq,k = H⊥q,k , called the simplex code. The parity check matrix of Hq,k has as
columns a maximal set of pairwise linearly independent vectors v1 , . . . , vn in Fkq , so to generate
a codeword in Sq,k , you take x ∈ Fkq and take the dot product with all the vi .
Question: given x ∈ Fkq , how many of the vi · x are equal to zero?
Answer: If x = 0, all of them.
If x ≠ 0, (q^(k−1) − 1)/(q − 1) of the vi · x are 0.
So every nonzero codeword in Sq,k has weight precisely

(q^k − 1)/(q − 1) − (q^(k−1) − 1)/(q − 1) = q^(k−1) .

So Sq,k is a [(q^k − 1)/(q − 1), k, q^(k−1)]-code and

WSq,k (z) = 1 + (q^k − 1) z^(q^(k−1)) .
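A quick brute-force check of this weight computation (our own, for the binary case q = 2, k = 3, where the simplex code is a [7, 3, 4]-code):

```python
# Check (ours, not from the notes) that every nonzero codeword of the binary
# simplex code S_{2,3} has weight exactly 2^(k-1) = 4. Its generator matrix
# has as columns all nonzero vectors of F_2^3.
from itertools import product

k = 3
cols = [v for v in product([0, 1], repeat=k) if any(v)]   # n = 2^k - 1 = 7

weights = set()
for x in product([0, 1], repeat=k):
    if any(x):
        word = [sum(a * b for a, b in zip(x, v)) % 2 for v in cols]
        weights.add(sum(word))

print(weights)  # {4}
```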

Recall MacWilliams’s theorem:

Theorem 11.1. If C ⊆ Fnq is an [n, k]-code with dual C ⊥ , then the weight enumerators of
C, C ⊥ satisfy

WC ⊥ (z) = q^(−k) (1 + (q − 1)z)^n WC ((1 − z)/(1 + (q − 1)z)).
Corollary 11.2.

WHq,k (z) = q^(−k) (1 + (q − 1)z)^((q^k − 1)/(q − 1)) (1 + (q^k − 1) ((1 − z)/(1 + (q − 1)z))^(q^(k−1)) ).
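MacWilliams's identity can be verified numerically. The sketch below (ours, not from the lecture) brute-forces the weight enumerators of the binary [7, 4] Hamming code and its dual, then compares the dual's enumerator against the right-hand side of the theorem, expanded as a polynomial:

```python
# A brute-force check (ours) of MacWilliams's theorem for the binary [7, 4]
# Hamming code: compute the weight enumerators of C and its dual directly,
# then compare with q^-k * sum_i A_i (1-z)^i (1+(q-1)z)^(n-i) for q = 2.
from itertools import product

# Parity check matrix of H_{2,3}: columns are 1..7 in binary.
H = [[(c >> b) & 1 for c in range(1, 8)] for b in range(3)]

def weight_enum(code, n):
    A = [0] * (n + 1)
    for w in code:
        A[sum(w)] += 1
    return A

C = [w for w in product([0, 1], repeat=7)
     if all(sum(h * x for h, x in zip(row, w)) % 2 == 0 for row in H)]
Cdual = [w for w in product([0, 1], repeat=7)
         if all(sum(a * b for a, b in zip(w, c)) % 2 == 0 for c in C)]

A, B = weight_enum(C, 7), weight_enum(Cdual, 7)

def pmul(p, r):
    out = [0] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(r):
            out[i + j] += a * b
    return out

def ppow(p, e):
    out = [1]
    for _ in range(e):
        out = pmul(out, p)
    return out

# Right-hand side: 2^-4 * sum_i A_i (1 - z)^i (1 + z)^(7 - i), coefficientwise.
rhs = [0] * 8
for i, Ai in enumerate(A):
    term = pmul(ppow([1, -1], i), ppow([1, 1], 7 - i))
    rhs = [x + Ai * y for x, y in zip(rhs, term)]
rhs = [x // 16 for x in rhs]

print(len(C), len(Cdual))   # 16 8
print(rhs == B)             # True
```

The dual here is the simplex code S2,3, so B recovers the weight distribution 1 + 7z^4 from the previous example.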

Trivia: simplex codes are useful for extremely noisy channels. The S2,5 -code was used in
communications with the Mariner 9 Mars probe.
Remark 5. The Hadamard code of length n has minimum distance at least n(q − 1)/q, which
means it falls in the regime governed by the Plotkin bound.
To prove MacWilliams’s theorem, we need to review character theory.

Definition 11.2. A character ψ of a finite abelian group G is a homomorphism ψ : G → C× .


For example, if G = Z/nZ, ψ(k) = e^(2πik/n) is a character.
The character given by ψ(g) = 1 for all g is called the trivial character.

Proposition 11.3 (Most Important Proposition (MIP)). Given any finite abelian group G
and ψ : G → C× , we have
Sψ = Σ_{g∈G} ψ(g) = |G| if ψ is trivial, and Sψ = 0 if ψ is not trivial.
Proof. If ψ is trivial, Sψ = Σ_{g∈G} 1 = |G|.
Now suppose ψ is nontrivial. There is some g′ ∈ G such that ψ(g′ ) = ω ≠ 1. Note that

ωSψ = ψ(g′ ) Σ_{g∈G} ψ(g) = Σ_{g∈G} ψ(g′ )ψ(g) = Σ_{g∈G} ψ(g′ g) = Σ_{g∈G} ψ(g) = Sψ .

So we conclude ωSψ = Sψ , so (1 − ω)Sψ = 0, so Sψ = 0.
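For G = Z/nZ, the MIP is just the vanishing of geometric sums of roots of unity, which is easy to check numerically (our illustration, not from the lecture):

```python
# Numerical illustration (ours) of the MIP for G = Z/12Z: each character is
# psi_a(k) = e^(2 pi i a k / 12); the sum is |G| for a = 0 and 0 otherwise.
import cmath

n = 12
sums = [sum(cmath.exp(2j * cmath.pi * a * k / n) for k in range(n))
        for a in range(n)]
print(abs(sums[0] - n) < 1e-9)               # trivial character: sum is |G|
print(all(abs(s) < 1e-9 for s in sums[1:]))  # nontrivial characters: sum is 0
```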


Proof of MacWilliams’s theorem. Let ψ : Fq → C× be a nontrivial character of (Fq , +).
Define a function

g : Fnq → C[z], u ↦ Σ_{v∈Fnq} ψ(u · v) z^w(v) .

Sum this over all u ∈ C to get

Σ_{u∈C} g(u) = Σ_{u∈C} Σ_{v∈Fnq} ψ(u · v) z^w(v) = Σ_{v∈Fnq} z^w(v) Σ_{u∈C} ψ(u · v).

We have two possibilities:
1. If v ∈ C ⊥ , ψ(u · v) = 1 for all u ∈ C, and Σ_{u∈C} ψ(u · v) = |C|.
2. If v ∉ C ⊥ , u · v takes all values in Fq an equal number of times as u ranges over C,
so by the MIP, Σ_{u∈C} ψ(u · v) = 0.
So

Σ_{u∈C} g(u) = Σ_{v∈C ⊥} z^w(v) |C| = |C| Σ_{v∈C ⊥} z^w(v) = |C| WC ⊥ (z).

Now let w also denote the function on Fq with w(0) = 0 and w(a) = 1 for a ≠ 0. We have

g(u) = Σ_{v∈Fnq} ψ(u · v) z^w(v)

which, if u = (u1 , . . . , un ), is

Σ_{v1 ,...,vn ∈Fq} ψ(Σ_{1≤i≤n} ui vi ) z^(w(v1 )+···+w(vn ))
= Σ_{v1 ,...,vn ∈Fq} Π_{1≤i≤n} ψ(ui vi ) z^w(vi )
= Π_{1≤i≤n} Σ_{v∈Fq} ψ(ui v) z^w(v) .

We have two possibilities:
1. If ui = 0, the inner sum is Σ_{v∈Fq} z^w(v) = 1 + (q − 1)z.
2. If ui ≠ 0, we have Σ_{v∈Fq} ψ(ui v) = 0 by the MIP, so

Σ_{v∈Fq} ψ(ui v) z^w(v) = z · 0 + (1 − z) = 1 − z.

We conclude that

g(u) = (1 + (q − 1)z)^(n−w(u)) (1 − z)^(w(u)) = (1 + (q − 1)z)^n ((1 − z)/(1 + (q − 1)z))^(w(u))

and so

Σ_{u∈C} g(u) = (1 + (q − 1)z)^n WC ((1 − z)/(1 + (q − 1)z)),

which, combined with

Σ_{u∈C} g(u) = |C| WC ⊥ (z)

from earlier, gives the desired result.


Definition 11.3. A linear [2k, k]-code C is called self-dual if C ⊥ = C.
Example 11.3. I2 ⊆ F2^2 , the binary repetition code of length 2, is self-dual: G = H = [1, 1]
and WI2 (z) = 1 + z 2 .
The extended Hamming code H8 of length 8, whose generating matrix is
 
1 0 1 0 1 0 1 0
0 1 1 0 0 1 1 0
0 0 0 1 1 1 1 0   = G = H,
1 1 1 1 1 1 1 1

is self-dual, and WH8 (z) = 1 + 14z 4 + z 8 .
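Both claims about H8 are small enough to brute-force (our check, not from the lecture):

```python
# Brute-force check (ours) that the extended Hamming code H8 generated by the
# matrix above is self-dual and has weight enumerator 1 + 14 z^4 + z^8.
from itertools import product

G = [[1, 0, 1, 0, 1, 0, 1, 0],
     [0, 1, 1, 0, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1, 0],
     [1, 1, 1, 1, 1, 1, 1, 1]]

code = set()
for coeffs in product([0, 1], repeat=4):
    word = tuple(sum(c * g for c, g in zip(coeffs, col)) % 2
                 for col in zip(*G))
    code.add(word)

# Self-dual: |C| = 2^(8/2) and every pair of codewords is orthogonal.
self_dual = len(code) == 16 and all(
    sum(a * b for a, b in zip(u, v)) % 2 == 0 for u in code for v in code)

A = [0] * 9
for w in code:
    A[sum(w)] += 1

print(self_dual)   # True
print(A)           # [1, 0, 0, 0, 14, 0, 0, 0, 1]
```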


Corollary 11.4 (Corollary of MacWilliams’s theorem, called Gleason’s theorem). A self-
dual binary code C has weight enumerator W (z) lying in C[1 + z 2 , 1 + 14z 4 + z 8 ].
Proof. Apply MacWilliams, observing that WC ⊥ (z) = WC (z). One gets an additional
condition from the fact that all c ∈ C have even weight.
Example 11.4. The Golay code G24 is a [24, 12, 8]-code over F2 , so its weight enumerator
is subject to Gleason’s theorem. But there is no connection to I2 or H8 .
Open question: what are the self-dual codes?

12 July 27.
Today: channels of communication and Shannon’s Theorem. So far, our basic question has
been: what codes of length n correcting d changes can we come up with? That is to say,
how can we communicate through a channel where messages consist of words of length n and
received words have at most d errors? Today, we will discuss three slightly more realistic
models of channels: the binary symmetric channel, the erasure channel, and the deletion
channel.
The binary symmetric channel works as follows: we use binary codes. Any bit in a sent
message has a probability 1 − p of being sent correctly, and a probability p of being flipped.
The binary erasure channel works as follows: we use binary codes. Every sent bit has
a probability 1 − p of being received correctly and a probability p of being replaced by the
“erased” bit e.
Question: how many codewords can there be in a channel with words of length n such
that decoding is “probably” possible?

Example 12.1. For the deterministic channel, we have a maximum rate of at least 1 − Hq (δ),
where δ is the proportion of errors and

Hq (δ) = δ logq (q − 1) − δ logq δ − (1 − δ) logq (1 − δ)
(Gilbert bound).
Claim: in the binary erasure channel (BEC) with probability p of error, there can be at
most 2^(n(1−p)) codewords for decoding to work, i.e. the rate R of a code in BEC is at most
1 − p.
Proof: suppose a genie tells the messenger which bits will be erased. On average, np will
be erased, so the messenger must fit the message into a length-n(1 − p) string, and there are
at most 2^(n(1−p)) ways of doing this.
Fact: if R < 1 − p, as n → ∞, it is possible to choose 2^(nR) codewords in {0, 1}^n such that
the probability a message will fail to be decoded goes to 0.
Proof: choose the M = 2^⌊Rn⌋ codewords independently at random. Suppose we send the
codeword C. By Chebyshev's inequality, we have

P (at least (p + ε)n bits are erased) ≤ p(1 − p)/(ε^2 n).

In particular, with ε > 0 fixed, this probability approaches 0 as n → ∞. Choose ε =
(1 − p − R)/2. Now, given any codeword C′ ≠ C, the probability that C′ agrees with C
in the (1 − p − ε)n bits not erased is equal to 2^(−(1−p−ε)n) . Then the expected number of
codewords C′ ≠ C that agree with C in these places is at most

2^(Rn) · 2^(−(1−p−ε)n) = 2^((R−(1−p−ε))n) = 2^(−((1−p−R)/2)n)

which goes to 0 as n → ∞. So the probability of an error in transmission also approaches 0.


We say that “the channel capacity for BEC is 1 − p”.
Define the entropy of p ∈ [0, 1] to be
H(p) = −(p log2 p + (1 − p) log2 (1 − p))
(all logs are base 2 for the rest of this class).
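A tiny numerical sketch (ours) of these quantities; note that H(0.11) ≈ 0.5, so a BSC that flips about 11% of its bits still has capacity roughly 1/2:

```python
# Sketch (ours) of the binary entropy function and the BSC capacity 1 - H(p).
from math import log2

def H(p):
    if p in (0, 1):            # H(0) = H(1) = 0 by convention
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

for p in (0.0, 0.11, 0.5):
    print(p, H(p), 1 - H(p))   # e.g. H(0.5) = 1, so capacity 0 at p = 1/2
```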
In the binary symmetric channel (BSC) with error probability p, choose 0 < R < 1−H(p).
Assuming the code length is n, choose M = 2^⌊Rn⌋ codewords uniformly at random. To
decode, find the nearest neighbor of the received message and decode to that.
Theorem 12.1 (Shannon’s theorem). With the setup as above, the probability a particular
message will be incorrectly decoded goes to 0 as n → ∞.
Proof sketch. Set b = ⌊√(np(1 − p)/(ε/2))⌋ for some ε > 0, and ρ = ⌊np + b⌋. Chebyshev's
inequality gives

P (received message has more than ρ errors) ≤ ε/2.
Show that it is unlikely that any other codeword is within distance ρ of the received codeword,
using that the probability that a fixed codeword C′ ≠ C is within ρ of the received message
Cr is

vol Bρ (Cr )/2^n .
Aggregating all these probabilities gives an error probability of

2^(Rn) · vol Bρ (Cr )/2^n → 0
as n → ∞.
Note that this bound on the channel capacity of the BSC (being at least 1 − H(p))
matches the Gilbert bound that codes of rate R ≥ 1 − H(δ) exist. Shannon's theorem
is more constructive, since Gilbert’s bound just proves the existence of such codes, while
Shannon’s theorem gives a way to (probably) construct them.
(What’s an upper bound? Sphere packing gives 1 − H(p/2). Geoff is unsure if a tight
bound is written down or known, but suspects 1 − H(p) is the correct answer.)
Now let’s consider the binary deletion channel, where the messages are in binary, and a
sent bit is received with probability 1 − p and destroyed without record with probability p.
The best known bound is that the channel capacity is at least (1 − p)/9. The actual channel
capacity is completely unknown.
Problem: if we choose codewords uniformly at random, it is extremely likely that a codeword
C has on the order of (n choose (1 − p)n) distinct subwords of length (1 − p)n. In particular,
once p ≥ 0.5, we expect that each codeword will contain almost all words of length (1 − p)n.
For example, any binary string of length 5 can be extracted from 1010101010.
Modification: pick γ > 0.5. Create codewords by choosing a first bit randomly, then for
the remainder, make each bit the same as its predecessor with probability γ. A probabilistic
analysis produces the bound (1 − p)/9 given above.
The best way to decode here is the “maximum likelihood decoding”: if you receive C 0 ,
decode to a codeword C that maximizes

P (C 0 received|C sent).

Suppose we have a code with words 1100 and 1010, and receive 10. If 1100 was sent, there
are four ways to make deletions producing 10; with 1010, there are only three ways.
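The counts in this example are counts of occurrences of the received word as a subsequence of each codeword, which a standard dynamic program computes (our sketch, not from the lecture):

```python
# Counting the deletion patterns in the example above (ours): the number of
# ways to delete bits from `sent` so that `received` remains equals the number
# of occurrences of `received` as a subsequence of `sent`.
def subsequence_count(sent, received):
    # ways[j] = number of ways to produce received[:j] from the prefix read so far
    ways = [1] + [0] * len(received)
    for ch in sent:
        for j in range(len(received), 0, -1):   # descending avoids double-count
            if received[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[-1]

print(subsequence_count("1100", "10"))  # 4
print(subsequence_count("1010", "10"))  # 3
```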
In extreme situations (i.e. p > 1/2), maximum likelihood and minimum distance will
give opposite conclusions: for example, in the binary symmetric channel with p = 1, if the
message is 1001, we always get 0110, which is not minimum distance!
