
Information Theory and Coding 314

Dr. Roberto Togneri

Dept. of E&E Engineering


The University of Western Australia

Section 4
Error-Control Codes
(Channel Coding)

DISCLAIMER
Whilst every attempt is made to maintain the accuracy and correctness of these notes,
the author, Dr. Roberto Togneri, makes no warranty or guarantee or promise express
or implied concerning the content

4. ERROR-CONTROL CODES (CHANNEL CODING)

4.1 Introduction

In the previous chapter we considered coding of information sources for transmission through a
noiseless channel. In that case efficient compact codes were our primary concern. In general,
though, the channel is noisy and errors will be introduced at the output. In this chapter we will be
concerned with codes which can detect or correct such errors. We restrict our attention to the class
of binary codes and BSC channels. Our model for a noiseless channel was:

[Diagram: noiseless channel model — message → Source Encoder → Modulator → Digital Channel → Demodulator → Source Decoder → message, with no errors.]

In the presence of noise we include an additional channel decoder/encoder. This allows us to


separate source representation (using compact codes) from error protection (using error-control
codes).

Properties of Channel Coding

1. The channel encoder takes the binary stream output of the source encoder, and maps each fixed-
block message or information word of length L bits, to a fixed-block codeword of length N. Thus
channel coding uses block codes with N > L. The difference, N-L, represents the check-bits used
for channel coding.

2. We restrict ourselves to binary channels. Hence the source encoding uses a binary code and the
data entering and leaving the channel encoder, channel decoder and channel is binary.

3. With L bits of information we need M = 2^L codewords, each N bits in length. Thus we will
receive either one of the M = 2^L codewords or any one of the remaining 2^N − 2^L words which
represent codewords with detectable bit errors. It is this mapping operation and the “extra” N−L
check bits which can be exploited for error-control.

Our model for a noisy channel is:


[Diagram: noisy channel model — L-bit message → Channel Encoder → N-bit codeword → Modulator → Digital Channel (NOISE introduces bit errors) → Demodulator → N-bit word → Channel Decoder → L-bit message, with no errors; a feedback channel from the channel decoder back to the channel encoder carries the "error detected?" indication.]


NOTE

1. With error correction the error is detected and corrected at the receiver. A communication system
which implements this form of channel coding is known as a Forward-Error-Correction (FEC)
error-control system.

2. With error detection the error is detected and the transmitter is required to retransmit the message.
This requires a reliable return feedback channel from receiver to transmitter. The simplest
implementation is a stop-and-wait strategy whereby the receiver acknowledges (ACK) each
correct reception and negatively acknowledges (NACK) each incorrect reception. The transmitter
then retransmits any message that was incorrectly received. A communication system which
implements this form of channel coding is known as an Automatic-Repeat-Request (ARQ) error-
control system.

3. In packet-switched data networks the (possibly source-coded) messages are transmitted in N-bit
codeword blocks called frames, packets or segments (depending on which network layer handles
the data) which incorporate not only the message but also a control header and/or trailer. ARQ error
control is used, based on systematic error-detecting channel codes with the check-bits carried in
the appropriate checksum field of the header or trailer.

4.2 Information Rate (fixed-rate source coding)

It should be obvious that since N > L the rate of data entering the channel is greater than the rate of
data entering the channel encoder. Let:
    ns = 1/τs  ⇒  fixed source coding bit rate (bps)
    nc = 1/τc  ⇒  channel bit rate (bps)

It should be evident that:  N = ⌊L·τs/τc⌋ = ⌊L·nc/ns⌋   (the integer part of L·nc/ns)

Equation 4.1
NOTE: If the source coder and channel bit rates are such that L·nc/ns is not an integer, then a
“dummy bit” will occasionally have to be transmitted over the channel.

Channel Coding Information Rate

    R = L/N = τc/τs = ns/nc
Equation 4.2

Channel Code Redundancy:   Cred = 1/R = N/L
Equation 4.3

Excess Redundancy:   Ered = 1/R − 1 = (N−L)/L
Equation 4.4


Example 4.1
(a)
The source coder rate is 2 bps and the channel bit rate is 4 bps. Say L = 3, then
N = L·(nc/ns) = 3·(4/2) = 6 and the channel coder will generate N−L = 3 check bits for every L = 3-
bit message block and hence produce an N = 6-bit codeword. The check bits can be used for error-
control (e.g. even parity-check).

(b)
The source coder rate is 3 bps and the channel bit rate is 4 bps. Now N = (4/3)·L.
If L = 3 → N = 4 (1 check bit for every 3-bit message block to produce a 4-bit codeword)
If L = 5 → N = 6⅔ ≡ 6 (1 check bit for every 5-bit message block to produce a 6-bit codeword)
BUT there is now a mismatch between the source code and channel data rates. Express the fractional
part as the ratio of the two smallest integers d and n: 2/3 = d/n, thus n = 3 and d = 2. To synchronise
the data rates we need to transmit d = 2 dummy bits for every n = 3 codewords / message blocks:

    L = 5, n = 3  →  XXXXX XXXXX XXXXX          = 15 bits
    N = 6, d = 2  →  ++++++ ++++++ ++++++ **    = 20 bits

Thus the average rate will be the correct 20/15 = 4/3. Both the check-bits (+) and the dummy bits (*)
can be used for error-control.
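As a quick sketch (the function name and interface here are my own, not from the notes), the rate-matching arithmetic of Example 4.1 can be checked with Python's exact fractions:

from math import floor
from fractions import Fraction

def rate_match(L, ns, nc):
    # Exact number of channel bits available per L-bit message block (Equation 4.1).
    exact = Fraction(L * nc, ns)          # e.g. 5*4/3 = 20/3
    N = floor(exact)                      # codeword length (integer part)
    leftover = exact - N                  # spare channel capacity per message block
    d, n = leftover.numerator, leftover.denominator
    return N, d, n                        # d dummy bits for every n codewords

print(rate_match(3, 2, 4))   # (6, 0, 1): 3 check bits per block, no dummy bits needed
print(rate_match(5, 3, 4))   # (6, 2, 3): 2 dummy bits for every 3 codewords, as in (b)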

4.3 Decision Rules and Channel Decoding

With a noiseless channel we know with probability 1 which input was transmitted upon reception of
the output. However, when the channel is noisy there is usually more than one possible input
symbol that may have been transmitted given the observed output symbol. In the presence of noise
we must use a decision rule.

Decision Rule

Consider a channel with an r-symbol input alphabet:


A = {ai}, i = 1,2 ..., r
and an s-symbol output alphabet:
B = {bj}, j =1,2 ..., s.
The decision rule d(bj) is any function specifying a unique input symbol for each output symbol, bj.
The decision rule d(b) is any function specifying a unique input word for each output word, b.


Example 4.2
Consider the channel matrix (rows are inputs ai, columns are outputs bj):

               b1    b2    b3
       a1     0.5   0.3   0.2
       a2     0.2   0.3   0.5
       a3     0.3   0.3   0.4

Possible decision rules are:

    Rule 1:  d(b1) = a1, d(b2) = a2, d(b3) = a3
    Rule 2:  d(b1) = a1, d(b2) = a2, d(b3) = a2

With the second decision rule if b = b1b1b3b2 then d(b) = d(b1b1b3b2) = a1a1a2a2

The aim of a good decision rule is to minimise the error probability of making a wrong decision.

4.3.1 Minimum-Error Probability Decoding

We choose
    d(bj) = a*
where a* is given by:
    P(a*/bj) ≥ P(ai/bj)   ∀i
Equation 4.1
Now from Bayes' law we have:
    P(bj/a*)P(a*)/P(bj) ≥ P(bj/ai)P(ai)/P(bj)
Equation 4.2
Since P(bj) is independent of ai in Equation 4.2 we have:
    P(bj/a*)P(a*) ≥ P(bj/ai)P(ai)
Equation 4.3

Thus the minimum-error probability decoding requires not only the channel matrix probabilities but
also the a priori probabilities P(ai). In some cases we do not have knowledge of P(ai).


4.3.2 Maximum-Likelihood Decoding

We choose
    d(bj) = a*
where a* is given by
    P(bj/a*) ≥ P(bj/ai)   ∀i
Equation 4.4

Thus maximum-likelihood decoding depends only on the channel. This is an advantage in situations
where the P(ai) are not known. Although not minimum-error in general, maximum-likelihood decoding
does reduce to minimum-error decoding when the inputs are equiprobable.

4.3.3 Error probability

If we choose the input according to the decision rule d(bj) then the probability of getting it right is:
    P(a* = d(bj) / bj)
and the probability of error:
    P(E/bj) = 1 − P(a*/bj)

P(E/bj) is the conditional error probability given the output bj.

Maximising P(a*/bj) will minimise the error, hence the name minimum-error probability decoding!

The overall probability PE is the average of P(E/bj) over all the bj:
    PE = Σ_B P(E/bj) P(bj)

We can equivalently define:
    PE = Σ_A P(E/ai) P(ai)
Equation 4.5
and
    P(E/ai) = Σ_{b∈Bc} P(b/ai)
Equation 4.6
where Σ_{b∈Bc} is the summation over those members of the B alphabet for which d(bj) ≠ ai.


Example 4.3
Refer to same channel of Example 4.2. Also assume we are given that P(a1) = 0.3, P(a2) = 0.3 and
P(a3) = 0.4.

Maximum Likelihood Decoding


This requires finding the maximum P(bj/a*) for the given bj, that is the maximum value along the
column of the channel matrix P(bj/ai) corresponding to bj:

               b1    b2    b3
       a1     0.5   0.3   0.2          d(b1) = a1
       a2     0.2   0.3   0.5    →     d(b2) = a1
       a3     0.3   0.3   0.4          d(b3) = a2

Note that d(b2) = a2 or d(b2) = a3 would also be correct choices (the column values for b2 are all
equal). We calculate the error probability using:
    PE = Σ_A P(E/ai) P(ai) = Σ_A P(ai) Σ_{b∈Bc} P(b/ai)

To derive P(E/ai) = Σ_{b∈Bc} P(b/ai) we sum across the ai row excluding the bj columns for which
d(bj) = ai:

       a1     0.5   0.3   0.2    ⇒  0.2 × 0.3 = 0.06
       a2     0.2   0.3   0.5    ⇒  0.5 × 0.3 = 0.15
       a3     0.3   0.3   0.4    ⇒  1.0 × 0.4 = 0.40
                                     P(E/ai) × P(ai),   PE = 0.61

Thus PE = 0.61. Note that P(E/a3) = 1.0 is an obvious result since the decision rule never decides a3.

Minimum Error Decoding


This requires finding the maximum P(bj/a*)P(a*) for the given bj, that is we need to know the input
probabilities. We form the augmented channel matrix, P̂ ≡ P(bj/ai)P(ai), by pre-multiplying each row
by P(ai), and then find the maximum value along the column of the augmented channel matrix
corresponding to bj:

               b1     b2     b3
       a1     0.15   0.09   0.06          d(b1) = a1
       a2     0.06   0.09   0.15    →     d(b2) = a3
       a3     0.12   0.12   0.16          d(b3) = a3

And we calculate the error probability as before (summing across each ai row of the original channel
matrix, excluding the bj columns for which d(bj) = ai):

       a1     0.5   0.3   0.2    ⇒  0.5 × 0.3 = 0.15
       a2     0.2   0.3   0.5    ⇒  1.0 × 0.3 = 0.30
       a3     0.3   0.3   0.4    ⇒  0.3 × 0.4 = 0.12
                                     P(E/ai) × P(ai),   PE = 0.57

Thus PE = 0.57 which is lower than maximum likelihood decoding.
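The two decision rules of Example 4.3 are easy to verify numerically. The following sketch (variable and function names are mine, not part of the notes) builds each rule from the channel matrix and evaluates PE using Equations 4.5 and 4.6:

import numpy as np

# Channel matrix P(bj/ai): rows are inputs a1..a3, columns are outputs b1..b3
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.3, 0.5],
              [0.3, 0.3, 0.4]])
prior = np.array([0.3, 0.3, 0.4])                   # P(ai)

def error_prob(P, prior, rule):
    # PE = sum_i P(ai) * sum over outputs bj with d(bj) != ai of P(bj/ai)
    return sum(prior[i] * sum(P[i, j] for j in range(P.shape[1]) if rule[j] != i)
               for i in range(P.shape[0]))

ml_rule = np.argmax(P, axis=0)                      # maximum likelihood: max P(bj/ai) per column
me_rule = np.argmax(P * prior[:, None], axis=0)     # minimum error: max P(bj/ai)P(ai) per column

print(ml_rule, error_prob(P, prior, ml_rule))       # rule [0 0 1] with PE = 0.61
print(me_rule, error_prob(P, prior, me_rule))       # rule [0 2 2] with PE = 0.57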


Decision Rule for modern BSCs and typical source coders

The following will typically be the case:


1. q << p (modern channels have very low bit error rates)
2. The P(ai) are close to equiprobable (due to optimal source coding)

It can be shown that maximum likelihood decoding will yield the same decision rule (and PE) as
minimum error decoding. Specifically for modern digital communication systems the decision rule
on a per symbol basis is simply d(b = 0) = 0 and d(b = 1) = 1 with PE = q

However the decision rule will be based not on each symbol ai and bj but on which N-bit codeword
a was sent upon receiving the corresponding N-bit word b. We describe the problem as follows:
Channel Decoding Problem

Let a represent the codeword that is transmitted through the channel and let b be the received word.
With an L-bit message block and N-bit codeword this means that we have M = 2^L N-bit valid
codewords for a. However there will be a larger set of all possible received words, b. Due to
channel noise which can cause bit errors in transmission, any of the 2^N possible N-bit words can be
received as b. Since N > L the received word b can either be one of the 2^L N-bit codewords (no
errors or sufficient bit errors to convert one codeword to a different codeword) or one of the 2^N − 2^L
N-bit non-codewords (bit errors, but not enough to convert one codeword to a different codeword).
The task of the channel decoder is to map b to the most likely a that was sent.

4.4 Hamming Distance

Hamming Distance

Given two words a = a1a2 … an and b = b1b2 … bn their Hamming distance is defined as the
number of positions in which a and b differ. We denote it by d(a,b):

d(a,b) = number of bit positions i = 1,2, … n where ai ≠ bi

For each alphabet and each number n, the Hamming distance is a metric on the set of all words of
length n. That is, for arbitrary words a, b, c, the Hamming distance obeys the following conditions
1. d(a,a) = 0 and d(a,b) > 0 whenever a ≠ b
2. d(a,b) = d(b,a)
3. d(a,b) + d(b,c) ≥ d(a,c) (triangle inequality)

Example 4.4
Let n = 8 and consider:
a = 1 1 0 1 0 0 0 1        d(a,b) = 4
b = 0 0 0 1 0 0 1 0   →    d(b,c) = 2 ,  and  d(a,b) + d(b,c) ≥ d(a,c)
c = 0 1 0 1 0 0 1 1        d(a,c) = 2
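A one-line Python function makes the definition concrete (a sketch; the name hamming_distance is mine):

def hamming_distance(a, b):
    # number of positions in which the equal-length words a and b differ
    return sum(x != y for x, y in zip(a, b))

a, b, c = "11010001", "00010010", "01010011"
print(hamming_distance(a, b), hamming_distance(b, c), hamming_distance(a, c))      # 4 2 2
assert hamming_distance(a, b) + hamming_distance(b, c) >= hamming_distance(a, c)   # triangle inequality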


4.4.1 Decoding Rule using the Hamming Distance

As stated, for modern communications systems the maximum likelihood decoding rule will yield
the same result as the minimum error decoding rule. When considering the decision rule for
deciding which N-bit codeword a was sent given the received N-bit word b, then a is chosen to
maximise P(b/a). Say we transmit the N-bit codeword a and observe the N-bit word b at the output
(of the BSC). Assume d(a, b) = D, then this means that a and b differ in exactly D places, thus D
bits are in error (with probability q) and N-D bits are OK (with probability p = 1 - q). Thus:
    P(b/a) = q^D · p^(N−D)

Since q < 1/2, P(b/a) is maximised when we choose a such that D = d(a,b) is smallest. This
procedure can be stated more precisely as follows:
Hamming distance decoding rule

Let the binary code alphabet A = {0, 1} of block length N be used. Let ai : i ∈ [1..M] represent one
of the M codewords which is sent (through the channel). We receive the word b:

(a) if b = ai for a particular i then the codeword ai was sent.

(b) if b ≠ ai for any i, we find the codeword, a* which is closest to b:


d(a*,b) ≤ d(ai,b) for all i ∈ [1..M]
if there is only one candidate a* → error corrected and a* was sent, else
if there is more than one candidate a* → error detected but not corrected

Example 4.5
Consider the following channel code
Message Codeword
(L=2) (N=3)
00 000
01 001
10 011
11 111
Note that R = L/N = 2/3,  Cred = 3/2  and  Ered = (N−L)/L = 1/2
There are M = 2^L = 2^2 = 4 messages and hence 4 valid codewords a, however with N = 3 there are
2^N = 8 possible received words; 4 of these will be codewords and 4 of these will be non-codewords.
If the received word b = 000, 001, 011 or 111 then the Hamming distance decoding rule would
mean that we decode directly to the codeword a = 000, 001, 011 or 111 (i.e. a is the same as b and
there is no error). If we receive any of the non-codewords then:
b=b1b2b3 Closest codeword Action
010 000 (b2 in error), 011 (b3 in error) error detected
100 000 (b1 in error) single-bit error corrected to 000
101 001 (b1 in error), 111 (b2 in error) error detected
110 111 (b3 in error) single-bit error corrected to 111
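The Hamming distance decoding rule of Example 4.5 can be sketched directly (function names are mine; the 'corrected' / 'error detected' labels mirror the Action column above):

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def hd_decode(b, codewords):
    # correct to the unique closest codeword, or only detect an error
    # if two or more codewords are equally close
    dmin = min(hamming_distance(b, c) for c in codewords)
    closest = [c for c in codewords if hamming_distance(b, c) == dmin]
    return (closest[0], "corrected") if len(closest) == 1 else (None, "error detected")

code = ["000", "001", "011", "111"]      # the code of Example 4.5
for b in ["010", "100", "101", "110"]:
    print(b, hd_decode(b, code))
# 010: error detected; 100: corrected to 000; 101: error detected; 110: corrected to 111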


4.4.2 Error Detection/Correction using the Hamming Distance

Minimum distance

The minimum distance d(K) of a nontrivial code K is the smallest Hamming distance over all pairs
of distinct code words:
d(K) = min {d(a,b) | a,b are words in K and a ≠ b}

Error Detection
A block code K is said to detect all combinations of t errors provided that for each code word a and
each received word b obtained by corrupting t bits in a, b is not a code word (and hence can be
detected). This property is important in situations where the communication system uses ARQ
error-control coding.

Error Detecting Property

A code detects all t errors if and only if its minimum distance is greater than t:
d(K) > t
Property 4.1

Proof
Let a* be the codeword that is sent and b the received word. Assume there are t bit errors in
transmission then d(a*,b) = t. To detect that b is in error then it is sufficient to ensure that b does not
correspond to any of the ai codewords, i.e. d(b, ai) > 0 for all i. Using the triangle inequality we have
for any other codeword ai:
    d(a*,b) + d(b,ai) ≥ d(a*,ai)  ⇒  d(b,ai) ≥ d(a*,ai) − d(a*,b)
To ensure d(b,ai) > 0 we must have:  d(a*,ai) − d(a*,b) > 0  ⇒  d(a*,ai) > d(a*,b),  and since
d(a*,b) = t we get the final result that:
    d(a*,ai) > t  →  d(K) > t,  and this is the condition for detecting up to all t-bit errors.

Error Correction
A block code K is said to correct t errors provided that for each code word a and each word b
obtained by corrupting t bits in a, the maximum likelihood decoding leads uniquely to a. This
property is important in situations where the communication system uses FEC error-control coding.

Error Correcting Property

A code corrects all t errors if and only if its minimum distance is greater than 2t:
d(K) > 2t
Property 4.2


Proof
Let a* be the codeword that is sent and b the received word. Assume there are t bit errors in
transmission then d(a*,b) = t. To detect that b is in error and ensure maximum likelihood decoding
uniquely yields a* (error can be corrected) then it is sufficient that d(a*,b) < d(b, ai) for all i.
Using the triangle inequality we have for any other codeword ai:
    d(a*,b) + d(b,ai) ≥ d(a*,ai)  ⇒  d(b,ai) ≥ d(a*,ai) − d(a*,b)
To ensure d(b,ai) > d(a*,b) we must have:
    d(a*,ai) − d(a*,b) > d(a*,b)  ⇒  d(a*,ai) > 2·d(a*,b)
and since d(a*,b) = t we get the final result that:
    d(a*,ai) > 2t  →  d(K) > 2t,  and this is the condition for correcting up to all t-bit errors.

Example 4.6
Even-parity check Code
The even-parity check code has N = L + 1 by adding a parity-check bit to the L-bit message. For the
case L = 2 we have:
Message Codeword
00 000
01 011
10 101
11 110

By comparing the Hamming distance between different codewords we see that d(K) = 2 (i.e. no two
codewords are less than 2 distance from each other) and by Property 4.1 d(K) = 2 > t =1 and as
expected this code will be able to detect all single-bit errors.

Repetition Code
The repetition code is defined only for L = 1 and performs error correction by repeating the message
bit (N-1) times (for N odd) and then using a majority vote decision rule (which is the same as the
Hamming distance decoding rule). Consider the N=3 repetition code:
Message Codeword
0 000
1 111

We see that d(K) = 3. From Property 4.1 d(K) = 3 > t = 2 and this code can detect all 2-bit errors.
Furthermore from Property 4.2 d(K) = 3 > 2(t = 1) and this code can correct all 1-bit errors. The
Hamming decoding rule will perform 1-bit error correction. To see this and the majority voting
operation:
Non-codeword   Closest codeword   Majority bit   Action
001            000                0              message 0
010            000                0              message 0
011            111                1              message 1
100            000                0              message 0
101            111                1              message 1
110            111                1              message 1
Note: Using majority voting of the received bits is a much simpler implementation for decoding the
final message than using the Hamming distance decoding rule.


Code from Example 4.5


Message Codeword
00 000
01 001
10 011
11 111
We see that d(K) = 1 (e.g. the distance between codewords 000 and 001) which would imply this code
cannot even do single-bit error detection, let alone error correction. In Example 4.5 the
Hamming distance decoding rule was nevertheless either performing 1-bit error detection or 1-bit error
correction. The explanation is that this code can detect most but not all 1-bit errors; specifically, if
codeword 000 is sent and there is a bit error in b3 we receive 001 and decode this to codeword 001
(this is known as an undetected error as there is no way to detect such errors).

Code K6
Code K6 will be used in a later example and is defined as follows
Message Codeword
(L = 3) (N = 6)
000 000000
001 001110
010 010101
011 011011
100 100011
101 101101
110 110110
111 111000

By comparing codewords we have that d(K) = 3 and hence this code can correct all 1-bit errors. For
example, if a = 010101 is sent and b = 010111 is received then the closest codeword is 010101 and
the error in b5 is corrected. This will always be the case no matter which codeword was sent and
which bit is in error. It should be noted that there are only M = 2^L = 2^3 = 8 valid codewords
compared to 2^N = 2^6 = 64 possible received words.

4.4.3 Upper bound B(N,d(K)) on number of codewords

The d(K) for a particular block code K specifies the code’s error correction/detection properties. Say
we want to design a code of length N and minimum distance d(K). What is the bound, B(N,d(K)), on
M (and hence L) for any given (N, d(K)), i.e. find B(N,d(K)) such that M ≤ B(N,d(K))?

This is sometimes known as the main coding theory problem (of finding good codes).
Some results for B(N,d(K)) are:
    B(N, 1) = 2^N
    B(N, 2) = 2^(N−1)
    B(N, 3) = 2^N / (N + 1)
    B(N, 4) = 2^(N−1) / N


In general, we have:
    B(N, 2t+1) = B(N+1, 2t+2)   for t = 0, 1, 2, …
Equation 4.7
thus if we know the bound for even d(K) we can calculate the bound for odd d(K).
For even d(K) one bound (of the apparently many) is the Plotkin bound:
    B(N, d(K)) = 2d(K) / (2d(K) − N)   provided d(K) is even and 2d(K) > N ≥ d(K)
Equation 4.8
Equation 4.8
Example 4.7
Our channel coder encodes messages of L = 2 bits (M = 4). What is the minimum block length, N,
and a possible code for the following requirements?

1-bit error detection

We need d(K) > t = 1 → d(K) = 2, so M = 4 ≤ B(N,2) = 2^(N−1) → N = 3 (smallest N).
We need to design a code with L = 2, N = 3 and ensure d(K) = 2, so here it is:
Info   Code
00     000
01     011
10     101
11     110
The even-parity check code!

1-bit error correction

We need d(K) > 2(t = 1) → d(K) = 3, so M = 4 ≤ B(N,3) = 2^N / (N + 1) → N = 5 (smallest N).
We need to design a code with L = 2, N = 5 and ensure d(K) = 3, so here it is:
Info   Code
00     00000
01     01011
10     10111
11     11100

2-bit error detection

We need d(K) > (t = 2) → d(K) = 3 → same code as 1-bit error correction except that the
decoding rule is simply to indicate an error if b ≠ ai for all i.

2-bit error correction

We need d(K) > 2(t = 2) = 4 → d(K) = 5, so B(N,5) = B(N+1,6) = 2d(K)/(2d(K) − (N+1)) = 12/(11 − N)
using the Plotkin bound, and M = 4 ≤ 12/(11 − N) → N = 8. We need to design a code with L = 2, N =
8 and ensure d(K) = 5. Fun!
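The bound calculations of Example 4.7 can be automated with a small sketch (the helper names are mine; only the bounds quoted above are implemented):

def bound(N, d):
    # Upper bound B(N, d(K)) on the number of codewords, using the results above:
    # the simple bounds for d = 1..4, Equation 4.7 for odd d, and the Plotkin
    # bound (Equation 4.8) for even d with 2d > N >= d.
    if d == 1: return 2 ** N
    if d == 2: return 2 ** (N - 1)
    if d == 3: return 2 ** N / (N + 1)
    if d == 4: return 2 ** (N - 1) / N
    if d % 2 == 1: return bound(N + 1, d + 1)
    return 2 * d / (2 * d - N) if 2 * d > N >= d else None

def smallest_N(M, d):
    # smallest block length N whose bound still allows M codewords of minimum distance d
    N = d
    while bound(N, d) is None or M > bound(N, d):
        N += 1
    return N

print([smallest_N(4, d) for d in (2, 3, 5)])   # [3, 5, 8], as found in Example 4.7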


4.4.4 Issues

The problem with using the Hamming distance for encoding and decoding is that with L-bit
information words we generate 2^L codewords which we must store, and decoding requires received-
word to codeword comparisons, which is an expensive operation. This represents an exponential
increase in memory and computational requirements with L. Only in the specific case of the repetition
code is an efficient implementation (using majority voting) possible. Later we will develop a way of
systematically encoding and decoding codes in an efficient manner by considering the important
class of binary linear codes.

4.5 Block Error Probability and Information Rate

Let:
Kn = block code of length n (code with n ≡ N-bit codewords)
Pe(Kn) = block error probability for code Kn
Peb(Kn) = bit error probability for code Kn
R(Kn) = information rate for code Kn

The bit error probability Peb(Kn) indicates the probability of bit errors between the transmitted
message and decoded message. This measures the true errors in using Kn. The block error
probability Pe(Kn) indicates the probability of decoding to the incorrect codeword block. It should
be noted that Pe(Kn) ≥ Peb(Kn) since even with an incorrect codeword block not all of the bits will
be in error.

Example 4.8
Consider a channel code with L = 3 and N = 6. The 3-bit information sequence 010 is encoded as
010111 and sent through the channel. However, due to bit errors the received word is 011101 and
the channel decoder decodes this as codeword 011111 and message 011. Although the block
decoding is 100% in error (i.e. Pe(Kn) = 1.0), the first 2 bits are OK so only 1 of the 3 message bits is
incorrect (i.e. Peb(Kn) = 0.33).

When comparing different codes it is normal to use Pe(Kn) rather than Peb(Kn), since Pe(Kn) is easier
to calculate than Peb(Kn) and Pe(Kn) represents the worst-case performance.

What is the relationship between Pe(Kn) and R(Kn)?


Example 4.9
Consider a BSC with q = 0.001 then Peb(Kn) = 0.001. How do we improve this and what is the price
that we pay?

(n = 3) repetition code
Info Code
0 000
1 111
We note that since L = 1 then Peb(Kn) = Pe(Kn). How do we calculate Pe(Kn)? The majority vote and
Hamming distance decoding rule will fail to yield the correct codeword if:
• all n = 3 bits are in error:  q³
• 2 out of n = 3 bits are in error:  C(3,2) q²p = 3q²p
Hence:
Pe(Kn) = q³ + 3q²p ≈ 3 × 10⁻⁶, much better than q = 1 × 10⁻³, but there is a cost since R(Kn) = 1/3

(n = 5) repetition code
Info  Code
0     00000
1     11111
The majority vote and Hamming distance decoding rule will fail to yield the correct codeword if:
• all n = 5 bits are in error:  q⁵
• 4 out of n = 5 bits are in error:  C(5,4) q⁴p = 5q⁴p
• 3 out of n = 5 bits are in error:  C(5,3) q³p² = 10q³p²
Hence: Pe(Kn) = q⁵ + 5q⁴p + 10q³p² ≈ 10⁻⁸, but now R(Kn) = 1/5 !

With repetition codes R(Kn) = 1/n and there is an exchange of message rate for message reliability.
But is there a way of reducing Pe(Kn) without a corresponding reduction in R(Kn)? Yes!
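These repetition-code calculations generalise to any odd n. The following sketch (function name mine) sums the tail of the binomial distribution, i.e. the probability that more than half of the n transmitted bits are in error:

from math import comb

def repetition_Pe(n, q):
    # majority voting fails when more than n//2 of the n bits are in error
    p = 1 - q
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(n // 2 + 1, n + 1))

print(repetition_Pe(3, 0.001))   # ~3.0e-06
print(repetition_Pe(5, 0.001))   # ~1.0e-08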

Example 4.10
Consider a BSC with q = 0.001 and the restriction that R(Kn) = 0.5. We design two codes K4 and K6
in an attempt to lower Pe(Kn) and still keep R(Kn) = 0.5

Code K4 (N = 4, L = 2)
We transmit a1 once and transmit a2 three times to produce the following code:
Info Code
00 0000
01 0111
10 1000
11 1111
We perform conditional error correction by assuming b1 is correct (a1 = b1) and then do a majority
vote on the remaining b2b3b4 bits to obtain a2. This decoding will be correct when:
• all n = 4 bits are correct:  p⁴
• b1 is correct and there is one error in the remaining 3 bits:  p · C(3,1) p²q = 3p³q
Hence: 1 − Pe(Kn) = p⁴ + 3p³q  →  Pe(Kn) = 1 − p⁴ − 3p³q ≈ 0.001

But this is indeed better if we consider Peb(Kn). In the following, let “c” mean correct and “i” mean
incorrect. We can derive:
Pci = Pr(a1 = “c”, a2 = “i”) = p(q³ + 3pq²)   → second bit only is incorrect (only 1 of the 2 bits is wrong)
Pic = Pr(a1 = “i”, a2 = “c”) = q(3p²q + p³)   → first bit only is incorrect (only 1 of the 2 bits is wrong)
Pii = Pr(a1 = “i”, a2 = “i”) = q(q³ + 3pq²)   → both bits are incorrect
Pcc = Pr(a1 = “c”, a2 = “c”) = p(3p²q + p³)   → both bits are correct (decoding is correct)


And over a message with L = 2 bits we can show that:

    Peb(Kn) = [2 × Pii + 1 × (Pci + Pic) + 0 × Pcc] / 2 = (q + p/2)(q³ + 3pq²) + (q/2)(3p²q + p³)

Hence for q = 0.001: Peb(Kn) ≈ 0.0005 < 0.001

Code K6 from Example 4.6 (N = 6, L = 3)

Code K6 is able to correct a 1-bit error in any of the n = 6 bit positions of the codeword. The decoding
will be correct when:
• all n = 6 bits are correct:  p⁶
• any one bit is in error:  C(6,1) p⁵q = 6p⁵q
and it can be shown that the decoding will always be incorrect for more than 1 bit error (Show This!).
Hence: Pe(Kn) = 1 − p⁶ − 6p⁵q ≈ 0.000015 << 0.001

This represents a reduction in the block error probability over K4 by 2 orders of magnitude without
any reduction in R(Kn)!
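The K6 figure can be checked by brute force: for each codeword, apply every one of the 2^6 error patterns, decode with the Hamming distance rule, and accumulate the probability of the patterns that are not decoded back to the transmitted codeword. A sketch (helper names mine), assuming equiprobable codewords:

from itertools import product

K6 = ["000000", "001110", "010101", "011011",
      "100011", "101101", "110110", "111000"]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def nearest(r):
    # unique nearest codeword, or None on a tie (error detected only)
    ranked = sorted(K6, key=lambda c: hamming(r, c))
    return ranked[0] if hamming(r, ranked[0]) < hamming(r, ranked[1]) else None

q, p, Pe = 0.001, 0.999, 0.0
for c in K6:
    for e in product("01", repeat=6):                        # all 2^6 error patterns
        r = "".join(str(int(cb) ^ int(eb)) for cb, eb in zip(c, e))
        if nearest(r) != c:                                   # decoding fails (or only detects)
            w = e.count("1")
            Pe += (q ** w) * (p ** (6 - w)) / len(K6)
print(Pe)   # ~1.5e-05, agreeing with 1 - p**6 - 6*q*p**5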

4.5.1 Issues

The obvious observation is that by using larger values of n our error probability decreases with the
same information rate (compare this to the equivalent statement for source coding). We are tempted
to ask:

1. Can an encoding be found (for n large enough) so that Pe(Kn) → ε (arbitrarily small) and R(Kn) = 1/2?

2. Can an arbitrary reliability (i.e. make Pe(Kn) as small as we want) be achieved if R(Kn) = 0.9?

3. Can an arbitrary reliability be achieved if R(Kn) = 0.99?


Shannon's Fundamental Theorem will provide us with some of the answers.

4.6 Shannon's Fundamental Theorem

For a noiseless BSC:
    C = log 2 = 1 bit
which means that a bit of information can be transmitted through the channel. For a noisy BSC we
may have:
    C = 0.5 bits
which means that if N bits are sent through the channel the presence of errors will allow only 0.5N
bits of information to be extracted (assuming the BSC is used at channel capacity).

Now consider:
    R = C = 0.5 bits
    ∴ L/N = 1/2, i.e. N = 2L
Thus we will extract 0.5N = 0.5(2L) = L bits of information, which is exactly the number of bits
needed to recover the message (which is of length L bits)! We have overcome the errors of the
channel by introducing redundancies. Shannon's Fundamental Theorem will tell us that for large
enough N, if R ≤ C we can encode for error-free transmission:
Shannon's Fundamental Theorem

Every binary symmetric channel of capacity C > 0 can be encoded with an arbitrary reliability and
with information rate, R(Kn) ≤ C, arbitrarily close to C. That is, there exist codes K1, K2, K3, …
such that Pe(Kn) tends to zero and R(Kn) tends to C with increasing n:
    lim(n→∞) Pe(Kn) = 0,   lim(n→∞) R(Kn) = C,   i.e.  R(Kn) = C − ε1, ε1 > 0 and lim(n→∞) ε1 = 0

That is, if R(Kn) ≤ C, then Pe(Kn) → 0 for large enough n, and as a bonus R(Kn) → C!

4.6.1 Engineer’s Proof

The formal proof of Shannon's Theorem is quite lengthy and involved. We only present the salient
features of the proof here (which is why this is an engineer’s proof).
Assume an arbitrarily small number ε1 > 0 and that we have to design a code Kn of length n such
that R(Kn) = C − ε1. To do this we need an information length of L = n(C − ε1)
and hence:
    R(Kn) = n(C − ε1)/n = C − ε1
This means we need M = 2^(n(C − ε1)) codewords.

Shannon's proof makes use of random codes. Given any number n we can pick M out of the 2^n
binary words in a random way and we obtain a random code Kn. If M = 2^(n(C − ε1)) then we know
that R(Kn) = C − ε1 but Pe(Kn) is a random variable.
Denote:
    P̃e = E(Pe(Kn))
as the expected value of Pe(Kn) for a fixed value of n, but a completely random choice of M
codewords. The main, and difficult, part of the proof (which we conveniently omit) is to show that:
    P̃e → 0 as n → ∞
Thus given arbitrarily small numbers ε1 > 0 and ε2 > 0 we can find n such that P̃e < ε2. This means
there must be at least one random code Kn with R(Kn) = C − ε1 and Pe(Kn) < ε2.
NOTE

1. The surprising part of Shannon's proof of the theorem is that we can achieve small Pe(Kn) with a
random choice of Kn. However, this feature of the proof flies in the face of common sense: how
can you choose a good code at random?

2. No practical coding scheme realizing the parameters promised by Shannon's Theorem has ever
been presented. Thus, coding theory tends to ignore the theorem and concentrate on techniques
which permit designing codes capable of correcting lots of errors while still retaining a reasonable
information rate.


Converse of Shannon's Theorem

In every binary symmetric channel of capacity C, whenever codes Kn of length n have information
rates at least C + ε1 (ε1 > 0) then the codes tend to be totally unreliable:
    R(Kn) ≥ C + ε1  →  lim(n→∞) Pe(Kn) = 1

That is, if R(Kn) > C, then Pe(Kn) → 1 for large enough n.

Example 4.11
Consider the BSC from Example 4.10. With q = 0.001 we know that:
    C = 1 + p log p + q log q = 0.9886
We can now answer the questions raised in 4.5.1 Issues:
1. Since R = 0.5 < C = 0.9886 we can find a code (for n large enough) such that Pe(Kn) → ε
2. Since R = 0.9 < C = 0.9886 we can find a code (for n large enough) such that Pe(Kn) → ε
3. Since R = 0.99 > C = 0.9886 we cannot find a code such that Pe(Kn) → ε
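The capacity figure used here is just 1 − H(q) for the BSC; a two-line sketch (function name mine):

from math import log2

def bsc_capacity(q):
    # C = 1 + p*log2(p) + q*log2(q) for 0 < q < 1
    p = 1 - q
    return 1 + p * log2(p) + q * log2(q)

print(round(bsc_capacity(0.001), 4))   # 0.9886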

4.7 Binary Linear Codes

4.7.1 Binary (mod 2) arithmetic

From now on we define the operations + (mod-2 addition) and · (mod-2 multiplication) as follows:

    +  | 0  1            ·  | 0  1
    ---+------           ---+------
    0  | 0  1            0  | 0  0
    1  | 1  0            1  | 0  1
    (Exclusive OR)       (AND)

Since 1 + 1 = 0 ⇒ 1 = −1.
Thus binary mod-2 subtraction coincides with binary mod-2 addition.

Example 4.12
Compare binary mod-2 subtraction with mod-2 addition:
1-0=1 1+0=1
1-1=0 1+1=0
0-1=1 0+1=1
0-0=0 0+0=0 They are the same!

Consider binary mod-2 algebra:


x1 = -x1 ⇒ x1 + x1 = 0
x1 + x2 = x3 ⇒ x1 + x2 + x3 = 0

NOTE Since binary mod-2 subtraction is the same as binary mod-2 addition then all binary
mod-2 algebraic equations will be (re)written in standard form using mod-2 binary addition.


4.7.2 Binary Linear Codes (Take 1)

Change in notation

n = N ≡ length of block code (entering the channel)


k = L ≡ number of information / message bits in each block (entering channel encoder).

An important class of binary codes are the binary linear codes which can be described by systems of
linear equations as follows:
Binary linear codes (one definition)

Denote by xi the ith bit of the codeword. Assume we have a message or information word of k bits
which is encoded into a codeword of n bits. In a binary linear code all of the 2^k codewords satisfy n−k
linear equations in the xi for i = 1, 2, …, n (i.e. n−k equations in n unknowns).

Furthermore, we can rearrange the equations to be homogeneous (i.e. the RHS is zero).
Binary linear codes (another definition)

Every homogeneous system of linear equations defines a linear code. We can also show that every
linear code can be described by a homogeneous system of linear equations.

NOTE

Since there are n−k equations in n unknowns we have n − (n−k) = k independent variables and n−k
dependent variables. Since the variables are binary this means we have 2^k solutions, which is as
expected.

Example 4.13
n-length repetition code (n odd)

Info   Code: x1 x2 x3 … xn
0      0 0 0 … 0
1      1 1 1 … 1

With k = 1 we need n−k = n−1 equations in the n unknowns (x1, x2, x3, …, xn) which fully describe
the repetition code:
    x2 = x1   →   x1 + x2 = 0
    x3 = x1   →   x1 + x3 = 0
      ⋮                ⋮
    xn = x1   →   x1 + xn = 0


n-length even-parity check code

Consider the case of n = 3 (i.e. k = 2):
Info   Code: x1 x2 x3
00     000
01     011
10     101
11     110
In general n = k + 1 and we expect n−k = 1 equation which will fully specify the even-parity check
code, and of course it is the equation used to set the check-bit, xn, itself:
    xn = x1 + x2 + ⋯ + xn−1   →   x1 + x2 + ⋯ + xn = 0

4.7.3 Parity Check Matrix

Given a linear code K described by a system of (n−k) homogeneous equations in (n) unknowns we
construct the (n−k) × (n) matrix, H, from the coefficients of the equations. That is, the ith row of H
expresses the ith equation. H is known as the parity-check matrix.

Parity Check Matrix and Parity Check equations

The binary matrix H is called a parity-check matrix of a binary linear code K of length (n) provided
that the codewords of K are precisely those binary words xT = [x1 x2 … xn] which fulfil:

    Hx = 0,   where x = [x1 x2 … xn]T and H has (n−k) rows and (n) columns

The system of (n−k) homogeneous equations is known as the parity-check equations.

NOTE: The system Hx = 0 will have 2^k solutions.

Example 4.14
H for Repetition Code

The n−1 parity-check equations x1 + x2 = 0, x1 + x3 = 0, …, x1 + xn = 0 give Hx = 0 with:

        [ 1 1 0 0 … 0 0 ]
        [ 1 0 1 0 … 0 0 ]
    H = [ 1 0 0 1 … 0 0 ]
        [ ⋮ ⋮ ⋮ ⋮    ⋮ ⋮ ]
        [ 1 0 0 0 … 0 1 ]

Hence, H is a (n−1) × (n) matrix.

H for Even-parity check code

The single parity-check equation x1 + x2 + ⋯ + xn = 0 gives Hx = 0 with:

    H = [ 1 1 1 … 1 1 ]

Hence, H is a (1) × (n) matrix.
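Both parity-check matrices are easy to build and test numerically; a sketch using numpy (all names mine), where a word x is a codeword exactly when Hx = 0 (mod 2):

import numpy as np

def H_repetition(n):
    # (n-1) x n parity-check matrix of the length-n repetition code
    H = np.zeros((n - 1, n), dtype=int)
    H[:, 0] = 1                                    # every equation involves x1
    H[np.arange(n - 1), np.arange(1, n)] = 1       # ... and one of x2..xn
    return H

def is_codeword(H, x):
    return not np.any(H @ np.asarray(x) % 2)

print(is_codeword(H_repetition(5), [1, 1, 1, 1, 1]))          # True
print(is_codeword(H_repetition(5), [1, 1, 0, 1, 1]))          # False
print(is_codeword(np.ones((1, 4), dtype=int), [1, 0, 1, 0]))  # True: even-parity check code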

4.7.4 Binary Linear Codes (Take 2)

If the word x is a codeword then Hx = 0, and if y is also a codeword then Hy = 0. But if z = x + y
then Hz = Hx + Hy = 0, so z is also a codeword. This gives us the “proper” definition of a binary
linear code.
Binary Linear Code (proper definition)

A binary block code is said to be linear provided that the sum of any arbitrary two code words is
also a code word.

Example 4.15
Repetition Code
Info Code:
x1 x2 x3 … xn
0 000…0
1 111…1

There are only two codewords and 000…0 + 111…1 = 111…1 which is of course a codeword, so
the repetition code is a binary linear code.

Even-parity check code


The even-parity check code is a binary linear code.

Code K4 from Example 4.10


Info Code
00 0000
01 0111
10 1000
11 1111
By adding any two codewords it can be shown that K4 is a binary linear code:
0111 + 1000 = 1111 √
1111 + 1000 = 0111 √

Code K6 from Example 4.10


Show that K6 is a binary linear code


Code from Example 4.5


Info Code
00 000
01 001
10 011
11 111
Is this code linear? Consider:
    001 + 011 = 010 ✗  →  010 is not a codeword, so this is not a binary linear code
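Checking the “proper” definition is a one-liner over all pairs of codewords; a sketch (names mine):

from itertools import combinations

def xor_words(a, b):
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

def is_linear(code):
    # linear iff the mod-2 sum of every pair of codewords is again a codeword
    return all(xor_words(a, b) in code for a, b in combinations(code, 2))

K6 = {"000000", "001110", "010101", "011011", "100011", "101101", "110110", "111000"}
print(is_linear(K6))                              # True
print(is_linear({"000", "001", "011", "111"}))    # False: 001 + 011 = 010 is not a codeword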

4.7.5 Rectangular Code

As we shall see, describing binary linear codes by the parity-check matrix will facilitate both the
analysis and design of codes. However in some cases special codes can be constructed which can be
analysed in a different and more intuitive way. One such class of code is the r × s rectangular code
which is used when transmitting row-by-row or two-dimensional data blocks. This is a binary linear
code of length n = rs, whose codewords are considered as r × s matrices. Each row and column has
s−1 and r−1 information bits respectively.

3 × 4 rectangular code
r = 3 and s = 4 → n = 3 × 4 = 12 and k = (r−1) × (s−1) = (3−1) × (4−1) = 6. To analyse this code we
represent the codeword (x1, x2, …, x12) in the following matrix form:

        [ x1  x2  x3  x4  ]      The information bits are:
  r=3   [ x5  x6  x7  x8  ]      x1 x2 x3 x5 x6 x7
        [ x9  x10 x11 x12 ]      The check bits are:
                                 x4 x8 x9 x10 x11 x12
               s=4

Each check-bit is an even-parity check over the corresponding row or column. If one includes the
check-bits themselves then there are r + s rows and columns to check over and hence r + s = 7
parity-check equations:

    x1 + x2 + x3 + x4 = 0
    x5 + x6 + x7 + x8 = 0          (row parity-checks)
    x9 + x10 + x11 + x12 = 0

    x1 + x5 + x9 = 0
    x2 + x6 + x10 = 0
    x3 + x7 + x11 = 0              (column parity-checks)
    x4 + x8 + x12 = 0

We expect n-k = 6 parity-check equations so one of the equations is in fact redundant. However all 7
equations are used to derive the parity-check matrix description:


1 1 1 1 0 0 0 0 0 0 0 0  x1 
row 0 0 0 0 1 1 1 1 0 0 0 0  x 2 
parity-checks  
0 0 0 0 0 0 0 0 1 1 1 1  ) 
  
Hx = 1 0 0 0 1 0 0 0 1 0 0 0  x 6  = 0
column 0 1 0 0 0 1 0 0 0 1 0 0  ) 
parity-checks 0  
 0 1 0 0 0 1 0 0 0 1 0  x11 
0 0 0 1 0 0 0 1 0 0 0 1  x12 

4.7.6 Systematic Binary Linear Codes

Systematic (Separable) Binary Linear Codes

Block codes in which the message or information bits are transmitted in unaltered form are called
systematic codes. Specifically for binary linear codes, consider the (k)-length message mT = [m1, m2,
…, mk] (mi is the ith bit of the message). The (n)-length codeword is represented by:

    xT = [mT | bT] = m1, m2, …, mk, b1, b2, …, bn−k    where bi is the ith check bit

That is, the codeword is formed by appending the (n−k) check bits to the (k) information bits.

NOTE: Unless otherwise stated, all codes we develop will be systematic codes

“Communication Systems” by S. Haykin uses a reversed vector representation


The text by S. Haykin adopts the notation: b0, b1, …, bn−k−1, m0, m1, …, mk−1
which should be compared to our notation: m1, m2, …, mk, b1, b2, …, bn−k
The notation adopted by S. Haykin assumes bits run to the “right” with mk−1 being the LSB (least
significant bit) of the message. The notation adopted here assumes bits run to the “left” with m1
being the LSB of the message. Thus the bit sense is reversed and the subscript notation is different.
Comparing our notation to that of S. Haykin:
    (LSB) m1 ↔ mk−1, m2 ↔ mk−2, …, mk ↔ m0, b1 ↔ bn−k−1, …, bn−k ↔ b0 (MSB)

Example 4.16
The repetition code is obviously systematic

n-length even-parity check code

Consider the case of n = 3 (i.e. k = 2 and n−k = 1):
Info     Code
m1m2     x1x2 x3
00       00  0
01       01  1
10       10  1
11       11  0

This is a systematic code since x1x2 = m1m2 (the message appears unaltered) and x3 = b1 is the check-bit.


Code K4 from Example 4.10


We have that n = 4, k = 2 and n−k = 2:
Info     Code
m1m2     x1x2 x3x4
00       00  00
01       01  11
10       10  00
11       11  11

This is a systematic code since x1x2 = m1m2 (the message appears unaltered) and x3x4 = b1b2 are the check-bits.

Code K6 from Example 4.10


Show that K6 is a systematic code

By making m1, m2, …, mk the independent variables and the b1, b2, …, bn-k the dependent variables
the b1, b2, …, bn-k can be expressed as explicit functions of the m1, m2, …, mk variables only. Hence:

Result 4.1

In encoding, the (n-k) check-bits are appended to the (k) length message to form the resultant (n)-
length codeword.
In decoding, after the correct codeword has been chosen (i.e. errors corrected), the (n-k) bits
appended to the (n)-length codeword are stripped to form the (k)-length message.

4.7.7 Systematic form for H

Define:
    b  as the (n−k) column vector of check bits
    m  as the (k) column vector of information bits
    c  as the (n) column codeword vector = [m; b] ≡ x   (m stacked on top of b)
    P  as the (n−k) × (k) coefficient matrix
    G  as the (n) × (k) generator matrix
    H  as the (n−k) × (n) parity-check matrix

Since the check-bits are generated from the information bits we represent this operation by using P:
    b = Pm
Equation 4.9
and G = [Ik; P], where Ik is the (k) × (k) identity matrix, generates the codeword:
    c = Gm = [Ik; P]m = [m; Pm] = [m; b]
Equation 4.10


and if H = [ P | In−k ], where In−k is the (n−k) × (n−k) identity matrix, then:
    Hc = [ P | In−k ][m; b] = Pm + b = b + b = 0
Exercise: Show that HG = 0

Systematic Form for H

H = [ P | In-k ]
Equation 4.11

NOTE

1. The systematic form for H makes it a trivial operation to extract P and hence generate the check
bits from the information bits.
2. If H is in non-systematic form, the resulting parity-check equations will need to be manipulated,
or H can be directly manipulated by standard row operations into systematic form.
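A sketch of these definitions in numpy, using the P that will appear in Example 4.19 below (the variable names are mine):

import numpy as np

P = np.array([[1, 1],
              [1, 0],
              [0, 1]])                          # (n-k) x k coefficient matrix
n_k, k = P.shape
H = np.hstack([P, np.eye(n_k, dtype=int)])      # H = [P | I_{n-k}]
G = np.vstack([np.eye(k, dtype=int), P])        # G = [I_k ; P]

print((H @ G) % 2)                              # all zeros: HG = 0 (mod 2)

m = np.array([1, 1])                            # a 2-bit message
c = (G @ m) % 2                                 # c = Gm = [m ; Pm]
print(c, (H @ c) % 2)                           # codeword [1 1 0 1 1], syndrome 0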

“Communication Systems” by S. Haykin uses a reversed matrix representation

The parity-check matrix systematic structure adopted by S. Haykin is different from that described
here: both the rows and the columns of H are reversed. Thus a parity-check matrix from S. Haykin
with entries a11 a12 … a1n in its first row down to am1 am2 … amn in its last row is equivalent to our H
with entries amn … am2 am1 in its first row down to a1n … a12 a11 in its last row.

Example 4.17
Repetition code

With m = [m1] and check bits b = [b1 b2 … bn−1]T the codeword is c = [m1 b1 b2 … bn−1]T, and Hc = 0
with:

        [ 1 1 0 0 … 0 0 ]
        [ 1 0 1 0 … 0 0 ]
    H = [ 1 0 0 1 … 0 0 ]  = [ P | In−1 ],   where P = [1 1 … 1]T is an (n−1) × 1 column of ones.
        [ ⋮ ⋮ ⋮ ⋮    ⋮ ⋮ ]
        [ 1 0 0 0 … 0 1 ]

Hence:
    b = [b1 b2 … bn−1]T = P[m1] = Pm
    c = [m1 b1 … bn−1]T = [I1; P][m1] = Gm

Even-parity check code

With m = [m1 m2 … mn−1]T and the single check bit b = [b1] the codeword is c = [m1 m2 … mn−1 b1]T,
and Hc = 0 with:

    H = [ 1 1 1 … 1 1 ] = [ P | I1 ],   where P = [1 1 … 1] is a 1 × (n−1) row of ones.

Hence:
    b = [b1] = [1 1 … 1][m1 m2 … mn−1]T = Pm
    c = [m1 … mn−1 b1]T = [In−1; P]m = Gm

4.7.8 Issues

Given the 2^k solutions to the system Hx = 0 which we use as the codewords, and the corresponding
2^k information words or messages, we can generate a table of codewords with 2^k entries. Encoding
will then involve a table lookup, and decoding will involve searching the same table for the closest
matching codeword (Hamming distance decoding rule). Both operations involve an exponential
increase of complexity with k. However there is a better way:
• systematic binary linear codes make encoding a simple logic operation on the k-bit message.
• syndrome decoding involves a simple logic operation on the n-bit received word for error
detection. Error correction will involve a lookup operation on a 2^(n−k) table, which becomes a
simple logic operation on the n-bit received word for the case of single-bit error correction.
It should be noted that lookup of a 2^(n−k) table is much less expensive than finding the closest match
in a 2^k table. Not only is a table lookup operation more efficient to implement (e.g. as an indexing
operation) but (n−k) << k < n.

4.8 Check-bit Generation Encoding and Syndrome Decoding

4.8.1 Error pattern

Error pattern or error vector

Let c represent a codeword and let r represent the codeword received with bit errors. The vector, e,
defined by:
    e = c + r
is termed the error vector (error pattern) since it indicates the bit positions in which c and r differ
(i.e. there has been a bit error):
    ei = 0 if ri = ci,   ei = 1 if ri ≠ ci

Example 4.18
Consider the (n,k) = (5,2) code with the following parity-check matrix:
        [ 1 0 0 1 0 ]
    H = [ 0 1 0 0 1 ]
        [ 0 1 1 1 0 ]
The code table will be shown to be:
Info Code
00 00000
01 01101
10 10110
11 11011

Say cT = [01101] was transmitted and rT = [01111] was received, that is bit r4 is in error, and thus
eT = [00010] and we note that r = c + e and c = r + e.


4.8.2 Syndrome

Let r be the received word:


r = ci + e
where ci is the ith codeword that was transmitted and e is the error pattern of the transmission error.
Error correction involves identifying the error pattern, e, given only r. Once we know e we obtain
the correct codeword by: ci = r + e. But how do we get e from r? Consider:
Hr = Hci + He = 0 + He = s
The (n-k) column vector, s, is called the syndrome.

Now let:
    H = [h1 | h2 | h3 | … | hn−1 | hn]
where hi corresponds to the ith column of H, and let e = [e1 e2 … en]T, where ei is the ith bit of e.
Consider:
    He = [h1 | h2 | … | hn] e = e1·h1 + e2·h2 + ⋯ + en·hn = s

Consider a bit error in bits j, k and l of the codeword; then ej = ek = el = 1 and all other ei = 0. Hence it
follows that:
    s = hj + hk + hl
That is, the syndrome is the sum of the columns in H which correspond to the bits in error. Thus we
get the following result:
Result 4.2

The syndrome, s, only depends on the error pattern, e, and the syndrome is the sum of the columns
of H which correspond to the bits in error, that is s = Σ hi over all i such that ei = 1.
Notice that although s is calculated from r it is related to e. But does knowing s allow us to get e?
Unfortunately there are many error combinations which can yield the same syndrome (i.e. we cannot
simply compute e = H⁻¹s since H is not invertible). We make the following observations:

1. Consider the two error patterns, ei and ej, which yield the same syndrome, then
Hei + Hej = s + s = 0, hence ci = (ei + ej) is by definition a codeword. Equivalently, if ei yields s
then ei + c also yields the same s for any and all codewords c.
Result 4.3
All error patterns that have the same syndrome differ by a codeword and a non-zero codeword
added to an error pattern will result in a different error pattern with the same syndrome.


2. The collection of all the error patterns that give the same syndrome is called a coset of the code.
Since there are 2^(n−k) different syndromes there will be 2^(n−k) different cosets for the code. From
Result 4.3, since there are 2^k codewords there will be 2^k distinct error patterns in each coset.
[Note that with 2^(n−k) cosets and 2^k error patterns per coset we will have 2^(n−k) × 2^k = 2^n error
patterns and hence all possible error pattern combinations will have been accounted for.] Thus
for a given syndrome there will be 2^k distinct error patterns in the coset. Which error pattern is
the right one?

3. Based on the Hamming distance decoding rule in the particular coset associated with the
syndrome calculated, it is reasonable to select the error pattern with the minimum number of bit
errors (i.e. minimum number of bits that are 1), as this is the most likely error pattern to occur.
The error pattern with the minimum number of bits being 1 in a particular coset is known as the
coset leader. Hence we choose the coset leader and add that to r to yield the most likely
codeword that could have been sent.

4. If there is more than one candidate coset leader this implies that for that particular syndrome the
error cannot be corrected since there is more than one error pattern which is equally likely.

4.8.3 Encoding of Systematic Codes

If H is in systematic form (see Equation 4.11), then the coefficient matrix, P, is trivially extracted.
If H is not in systematic form, it will need to be converted to systematic form by the appropriate row
operations. Alternatively, the system of equations Hx = 0 can be converted directly to the form
b = Pm by the appropriate algebraic operations.

Encoding the (k) bit information word, m, to the (n)-bit codeword, c, proceeds as follows:

1. Generate the check-bit vector, b = Pm

2. Form c = [m; b]  (the check bits appended to the message bits)

Example 4.19
Consider the encoding process for the code from Example 4.18 with parity-check matrix:
1 0 0 1 0
H = 0 1 0 0 1 
 
0 1 1 1 0
Since H is not in systematic form we perform the following row operations in the order stated
• swap rows 2 and 3
• swap rows 1 and 2
• add row 2 to row 1
and hence:
              [ 1 1 1 0 0 ]              [ 1 1 ]
    H_sys =   [ 1 0 0 1 0 ]    ⇒    P =  [ 1 0 ]
              [ 0 1 0 0 1 ]              [ 0 1 ]
Thus we get the check-bit generation equations:


 b1  1 1 b1 = m1 + m2 x3 = x1 + x 2
     m1 
b = Pm ⇒ b2 = 1 0   ⇒ b2 = m1 ⇒ x 4 = x1
     m2 
b3  0 1 b3 = m2 x5 = x 2
The code table is now derived by generating the check-bits from the message and appending the
check-bits to the message to form the codeword:
Info Code
x1x2 x1x2x3x4x5
00 00000
01 01101
10 10110
11 11011
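The whole code table follows from b = Pm; a short sketch of the encoder (names mine):

import numpy as np
from itertools import product

P = np.array([[1, 1],
              [1, 0],
              [0, 1]])                 # extracted from H_sys above

def encode(m):
    # systematic encoding: message bits followed by the check bits b = Pm (mod 2)
    m = np.asarray(m)
    return np.concatenate([m, (P @ m) % 2])

for m in product((0, 1), repeat=2):
    print(m, encode(m))
# (0,0)->00000, (0,1)->01101, (1,0)->10110, (1,1)->11011, matching the code table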

4.8.4 Syndrome Decoding

Decoding the (n)-bit received word, r, to retrieve the (k)-bit information word, m, proceeds as
follows:

1. Calculate the syndrome s = Hr


2. For error detection:
• if s ≠ 0 then there is an error and an indication has to be sent to retransmit the message
• if s = 0 then there is no error and c = r, hence go to step 4.
3. For error correction:
• Determine the unique coset leader, ec, of the coset corresponding to s and then correct the
most likely error pattern by c = r + ec
• If there is more than one candidate coset leader then only error detection is possible and an
indication has to be sent to retransmit the message.
4. Extract the message, m, by stripping off the (n-k) check bits appended to c.

But how do we determine the coset leader? From Result 4.2 and Result 4.3 at least two approaches
suggest themselves:
1. Find the minimum number of columns in H that when added together yield the syndrome. The
locations of the columns specify the bit positions of the coset leader, ec.
2. Find one combination, any combination, of columns in H that when added together yields the
syndrome. Define the error pattern, e, such that ei = 1 if column i was used in the combination,
then form the coset by adding each of the 2^k codewords to e and locate the coset leader as the
error pattern with the minimum number of bits equal to 1.

Since the above operations are expensive to perform for each received word, the usual practice is to
prime the decoder with the pre-determined coset leader for each of the possible 2^(n−k) syndromes. The
coset leader is then found by using s to index (lookup) the 2^(n−k) table of coset leaders.

In the case of single-bit errors, from Result 4.2 if s matches the ith column of H then bit i is in error
(the coset leader has ei = 1, and all other bits are 0) and the codeword, c, is obtained by inverting the
ith bit of r. Thus there is no need to explicitly determine the coset leader.


Example 4.20
Consider the decoding process for the code from Example 4.18, using the systematic parity-check
matrix derived in Example 4.19.
Say rT = [01111] is received. The syndrome s is calculated:

              [ 1 1 1 0 0 ]
    s = Hr =  [ 1 0 0 1 0 ] [0 1 1 1 1]T = [0 1 0]T
              [ 0 1 0 0 1 ]

s = [0 1 0]T matches only the 4th column of H → bit 4 is in error (i.e. ecT = [00010]) and hence
cT = [01101] which is a valid codeword. The message sent is then mT = [01].

Say rT = [00111] is received. The syndrome s = Hr = [1 1 1]T does not match any column of H.
Looking for the minimum number of columns which when added yield s we see that:
    h2 + h4 = [1 0 1]T + [0 1 0]T = [1 1 1]T = s    and    h1 + h5 = [1 1 0]T + [0 0 1]T = [1 1 1]T = s
Hence there are two candidate coset leaders representing double-bit errors:
    ec1T = [01010]  and  ec2T = [10001]
and thus the error cannot be corrected. Indeed we see that
c1T = (ec1 + r)T = [01101] and c2T = (ec2 + r)T = [10110] are equally distant from rT = [00111].
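A brute-force sketch of syndrome decoding for this small code (names mine): it computes s = Hr, finds the minimum-weight error pattern(s) with that syndrome, and corrects only when the coset leader is unique.

import numpy as np
from itertools import product

H = np.array([[1, 1, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 1, 0, 0, 1]])        # systematic H of Example 4.19

def syndrome_decode(r):
    r = np.asarray(r)
    s = tuple(H @ r % 2)
    if not any(s):
        return r, "no error"
    coset = [np.array(e) for e in product((0, 1), repeat=H.shape[1])
             if tuple(H @ np.array(e) % 2) == s]                 # all e with He = s
    wmin = min(e.sum() for e in coset)
    leaders = [e for e in coset if e.sum() == wmin]
    if len(leaders) == 1:
        return (r + leaders[0]) % 2, "corrected"
    return None, "error detected (no unique coset leader)"

print(syndrome_decode([0, 1, 1, 1, 1]))   # corrected to 01101 -> message 01
print(syndrome_decode([0, 0, 1, 1, 1]))   # error detected only, as in the example above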

4.8.5 Encoding and Decoding Implementations

Encoding
Generation of the (n−k) check-bits from the k message bits, b = Pm, can easily be implemented in
hardware using an XOR logic gate array, or by making use of XOR assembly language instructions for
implementation in software or embedded systems.

[Diagram: the k message bits m1…mk pass straight through to form codeword bits c1…ck and also feed an XOR logic gate array which produces the check bits b1…bn−k, forming codeword bits ck+1…cn.]

Decoding
Assuming an FEC system, for each syndrome, s, the corresponding unique coset leader, ec, is
derived and stored. Thus syndrome decoding becomes a table lookup operation using s as the index
into a table of size 2^(n−k). The generation of the syndrome can be implemented using XOR logic gates
or special XOR assembly instructions. The table lookup operation can be implemented using an
(n−k)-to-2^(n−k) decoder, and inverters can be used to correct the required bits in error.

[Diagram: the received bits r1…rn feed an XOR logic gate array producing the syndrome s1…sn−k; the syndrome drives an (n−k)-to-2^(n−k) decoder whose outputs enable inverter (or buffer) logic on each received bit, and OR gates combine these to output the corrected codeword bits c1…cn.]

4.9 Hamming Weight

Hamming weight / Minimum weight

The Hamming weight of a word, w(x), is defined as the number of bits distinct from 0. For each
non-trivial code K, the smallest Hamming weight of a code word distinct from 0T = [000...0] is
called the minimum weight of K, w(K).

Example 4.21
Hamming weight examples:
w(xT = [111]) ≡ w(111) = 3
w(101000) = 2
w(110110) = 4

The minimum weight of the codes considered so far are:


w(K) = n for the n-length repetition code
w(K) = 2 for the n-length even-parity check code
w(K4) = 1 for code K4 from Example 4.10
w(K6) = 3 for code K6 from Example 4.10

For the r x s rectangular code, K, it is not necessary to list the codewords in order to find the
minimum weight. All that is needed is to use the row and column parity-check equations and
attempt to find the non-zero codeword solution which uses the least number of bits equal to 1. It can
thus be shown that w(K) = 4 for any r x s rectangular code.
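Minimum weight (and, for a linear code, minimum distance) can be read off the code table with a few lines (names mine):

from itertools import combinations

def minimum_weight(code):
    return min(c.count("1") for c in code if "1" in c)

def minimum_distance(code):
    return min(sum(x != y for x, y in zip(a, b)) for a, b in combinations(code, 2))

K6 = ["000000", "001110", "010101", "011011", "100011", "101101", "110110", "111000"]
print(minimum_weight(K6), minimum_distance(K6))   # 3 3: w(K) = d(K) here (see Property 4.3 below)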


Relationship between Hamming distance and Hamming weight for binary linear codes

For each non-trivial binary linear code the minimum distance, d(K), is equal to the minimum
weight, w(K). That is d(K) = w(K)
Property 4.3

From Property 4.1 and Property 4.2 we can now say:


Error Detection / Correction for binary linear codes

A binary linear code corrects (or detects) all t errors if and only if its minimum weight is larger than
2t (or larger than t, respectively)
Property 4.4

Example 4.22
w(K) = n for the n-length repetition code, hence:
⇒ detects all (n-1)-bit errors and corrects all ⌊(n-1)/2⌋-bit errors

w(K) = 2 for the n-length even-parity check code
⇒ detects all 1-bit errors

w(K4) = 1 for code K4 from Example 4.10
⇒ no error protection at all (code K4 corrects single-bit errors conditional on x1 being correct, but
if x1 is corrupted this is not detected, hence code K4 has no unconditional error protection)

w(K6) = 3 for code K6 from Example 4.10
⇒ corrects all 1-bit errors and detects all 2-bit errors

w(K) = 4 for the r x s rectangular code
⇒ corrects all 1-bit errors and detects all 3-bit errors

4.10 Designing the Parity-Check matrix

Let x be the codeword with a Hamming weight equal to the minimum weight, w(K), of the code K.
If H is a parity-check matrix for K, then we know Hx = 0. Now by definition x has exactly w(K) bits
which are 1, the remaining bits being 0. Thus the operation Hx = 0 effectively sums w(K) columns
of H to 0. This gives us the following results:

Result 4.4

For a linear code K with parity-check matrix, H, if the minimum number of columns of H that sum
to zero is n, then w(K) = n.

Result 4.5

For a linear code K with parity-check matrix, H, if no combination of n or less columns of H sum to
zero, then w(K) > n.
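For small codes these two results give a direct brute-force way of finding w(K) from H: search for
the smallest number of columns that sums to zero. A sketch (Python/NumPy, using the (5,2) H from
Example 4.20):

    import numpy as np
    from itertools import combinations

    H = np.array([[1, 1, 1, 0, 0],
                  [1, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1]])

    def min_weight(H):
        """Smallest number of columns of H summing to zero (mod 2); equals w(K) by Result 4.4."""
        n = H.shape[1]
        for w in range(1, n + 1):
            for cols in combinations(range(n), w):
                if (H[:, list(cols)].sum(axis=1) % 2 == 0).all():
                    return w

    print(min_weight(H))    # -> 3, so this code corrects single-bit errors (see Property 4.4 below)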


4.10.1 Structure of H for single-error detection

For single-bit error detection we need w(K) ≥ 2, i.e. w(K) > 1. From Result 4.5 this means no single
column of H may sum to zero, i.e. no column of H may be the zero column. Then w(K) is at least 2
and the code can detect all single-bit errors.
Property of H for single-error detection

A binary linear code K detects single-bit errors if and only if every parity-check matrix of K has
non-zero columns (i.e. no column is the all-zeros column).

4.10.2 Structure of H for single error correction

For single-bit error correction we need w(K) ≥ 3, i.e. w(K) > 2. From Result 4.5 this means no single
column (n = 1) of H may sum to zero and no two columns (n = 2) of H may sum to zero. The first
condition implies non-zero columns and the second condition implies no two columns of H are
identical (two columns sum to zero if and only if they are identical), that is, the columns are
pairwise distinct.
Property of H for single-error correction

A binary linear code K corrects single-bit errors if and only if every parity-check matrix of K has
non-zero, pairwise distinct columns (i.e. no column is the all-zeros column and no two columns are
the same).

If a binary matrix with r rows must have nonzero, pairwise distinct columns, then the number of
columns satisfies c < 2^r (there are only 2^r - 1 distinct non-zero columns available). For a
parity-check matrix with (n-k) rows and n columns this means:
Property of K for single-error correction

A binary linear code K which corrects single-bit errors will satisfy:

    n < 2^(n-k)

However this is a necessary but not sufficient condition

Example 4.23
(a)
It is required to design the most efficient code for single-bit error correction with k = 2. This implies
a code with maximum information rate: max R = k/n = 2/n, i.e. minimum n. From the property
n < 2^(n-2) the smallest permissible length is n = 5, hence n-k = 3 check bits are needed and H is a
3x5 matrix.

For single-bit error correction H must have nonzero, pairwise distinct columns. Furthermore we
want H to be in systematic form, so one possible solution is:

    H = | 1 0 1 0 0 |        x3 = x1
        | 1 1 0 1 0 |   ⇒    x4 = x1 + x2
        | 0 1 0 0 1 |        x5 = x2


Since the columns are non-zero and pairwise distinct, w(K) ≥ 3. The code table is:


Info Code
x1x2 x1x2x3x4x5
00 00000
01 01011
10 10110
11 11101
The minimum weight of the code is, as expected, w(K) = 3. Hence this is a single-bit error
correcting code. Note that the code from Example 4.18 is also another solution.

(b)
It is now required to design the most efficient code for 2-bit error correction with k = 2. This
requires a code with at least w(K) = 5. To determine the smallest n that permits this, the upper
bound expression B(n,5) derived in Example 4.7 for the same case of M = 4, d(K) = 5 and L = 2
gives N ≡ n = 8, so n-k = 6 check bits are needed and H is a 6x8 matrix.

To design the matrix we note that the systematic form predefines the 6x6 identity matrix, I6, in the
last six columns of H. How do we choose the remaining 2 columns (which define P)? Since
w(K) = 5 we must ensure that no combination of 4 or fewer columns adds to zero. Here is one
solution:

        | 1 1 | 1 0 0 0 0 0 |        x3 = x1 + x2
        | 0 1 | 0 1 0 0 0 0 |        x4 = x2
    H = | 1 1 | 0 0 1 0 0 0 |   ⇒    x5 = x1 + x2
        | 0 1 | 0 0 0 1 0 0 |        x6 = x2
        | 1 0 | 0 0 0 0 1 0 |        x7 = x1
        | 1 0 | 0 0 0 0 0 1 |        x8 = x1
           P         I6

Info Code
x1x2 x1x2x3x4x5x6x7x8
00 00000000
01 01111100
10 10101011
11 11010111
From which we see that the minimum weight of the code is indeed w(K) = 5. Also R = 2/8 = 1/4,
which is not that good. We may need to increase k to achieve better efficiency.


4.11 Perfect Codes for Error Correction

A binary linear (n,k)-code requires 2^k codewords of length n. Thus there are 2^n possible words of
length n: 2^k of these are codewords and 2^n - 2^k of these are non-codewords. If the code corrects
up to t errors, the non-codewords are either:
Type 1: non codewords of distance t or less from precisely one of the codewords, that is, the t or
less bits in error can be corrected.
Type 2: non codewords of distance greater than t from two or more codewords, that is, the errors
can only be detected.
If all the non codewords are of Type 1 then the code is a perfect code.

Properties of Perfect Codes


1. Perfect codes will always be able to correct the most likely errors for any word received, there is
no need to handle detected errors (there will always be a unique coset leader).
2. Perfect codes guarantee maximum information rate for t-bit error correction (this is intuitively
obvious since the “unnecessary” Type 2 non-codewords don’t exist and fewer words mean
smaller n).
3. A perfect code correcting up to t errors will also have the smallest allowable minimum distance
of 2t+1.

Conceptual definition of a perfect code

A linear code of length n is called perfect for t errors provided that for every word r of length n,
there exists precisely one code word of Hamming distance t or less from r.

Operational definition of a perfect code

With only Type 1 non-codewords, r, then s = Hr must be able to correct up to t-bit errors. Thus the
2^(n-k) possible values of s must be the same as the total number of different combinations of
0, 1, 2, ..., t-1, t bit errors of an n-bit word. Thus a linear (n,k) code is perfect for t errors if:

    2^(n-k) = Σ_{i=0}^{t} (n choose i)

Equation 4.12

Example 4.24
Consider the code from Example 4.23(a):

    Info    Code
    00      00000        Consider the 5-bit word 00111:
    01      01011        - It is not of distance (t=1) or less from precisely one codeword.
    10      10110        - It is, in fact, of distance 2 from two codewords: 01011 and 10110.
    11      11101        ⇒ NOT a perfect code

Furthermore:

    2^(n-k) = 2^(5-2) = 2^3 = 8  ≠  Σ_{i=0}^{t=1} (5 choose i) = (5 choose 0) + (5 choose 1) = 1 + 5 = 6


4.12 Hamming Codes - Perfect Codes for Single Error Correction

The Hamming codes we now consider are an important class of binary linear codes. Hamming
codes represent the family of perfect codes for single-error correction.

For single-error correction t = 1, and since the code is perfect, from Equation 4.12:

    2^(n-k) = Σ_{i=0}^{t=1} (n choose i) = n + 1  ⇒  n = 2^m - 1, where m = n-k and k = n - m = 2^m - m - 1

Properties of a (n,k) Hamming Code


Block length: n = 2^m - 1, m ≥ 3
Information length: k = n - m = 2^m - m - 1
Min. distance: d(K) = 3
Error: single-bit correction.
NOTE

1. For single-error correction, H must have non-zero, pairwise distinct columns. For m rows we
   can have, at most, 2^m - 1 non-zero and distinct columns for H (i.e. n ≤ 2^m - 1).

2. Since R = k/n = (n-m)/n, the more columns we have (larger n) per fixed m the better our
   information rate. Thus the condition for perfect codes, n = 2^m - 1, implies maximum information
   rate.

Hamming Code (one possible definition)

A binary linear code is called a Hamming code provided that it has, for some number m, a parity-
check matrix H of m rows and 2^m - 1 columns such that each non-zero binary word of length m is a
column of H.

4.12.1 Syndrome Decoding for Hamming Codes

Since the Hamming codes are perfect codes for single-error correction, all 2^n possible received
words, r, are either a codeword or a codeword corrupted by 1 bit. Thus for each possible syndrome,
s, there must be only one coset leader. From the syndrome decoding condition for single-error
correcting codes we have the following decoding procedure for Hamming codes:
1. Calculate the syndrome: s = Hr
2. If s is zero there is no error.
3. If s is non-zero, let i be the column of H which matches s, then correct the ith bit of the received
word to form the codeword, c.
4. Decode to the information word, m, by stripping off the (n-k) check bits appended to c.


Example 4.25
Consider the following (7,4) Hamming code parity-check matrix:

    H = | 0 0 0 1 1 1 1 |       n = 7 and k = 4
        | 0 1 1 0 0 1 1 |   ⇒   m = n - k = 3
        | 1 0 1 0 1 0 1 |

Design the encoding logic, list the codewords and perform syndrome decoding on:
0110010    0110101    1101100    0001111

Code Table
We need to convert H to systematic form:

    Hsys = | 0 1 1 1 1 0 0 |        x5 = x2 + x3 + x4
           | 1 0 1 1 0 1 0 |   ⇒    x6 = x1 + x3 + x4
           | 1 1 0 1 0 0 1 |        x7 = x1 + x2 + x4

message codeword message codeword


x1x2x3x4 x1x2x3x4x5x6x7 x1x2x3x4 x1x2x3x4x5x6x7
0000 0000000 1000 1000011
0001 0001111 1001 1001100
0010 0010110 1010 1010101
0011 0011001 1011 1011010
0100 0100101 1100 1100110
0101 0101010 1101 1101001
0110 0110011 1110 1110000
0111 0111100 1111 1111111

Encoding Logic

[Figure: encoding logic - a serial shift-register circuit; the four message bits x4 x3 x2 x1 are shifted
out first (switch UP for the first 4 bits) and the three check bits x7 x6 x5, generated from the message
bits by XOR gates implementing the check equations above, are shifted out last (switch DOWN for
the last 3 bits), producing the serial codeword x7x6x5x4x3x2x1.]

N.B. Bits are serially shifted to the right with the LSB first (right-hand side) and MSB last (left-
hand side), thus the message and codeword bits are “reversed”.


Syndrome Decoding
We calculate s = Hr (using the original non-systematic form of H):

    r^T        s^T    i    c^T        m^T
    0110010    111    7    0110011    0110
    0110101    011    3    0100101    0100
    1101100    010    2    1001100    1001
    0001111    000    -    0001111    0001
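A short sketch that reproduces this decoding table (Python/NumPy; H is the non-systematic matrix
above, whose ith column is the binary representation of i, so the syndrome read as a binary number
gives the error position directly, and the message is the first k = 4 bits of the corrected codeword):

    import numpy as np

    H = np.array([[0, 0, 0, 1, 1, 1, 1],     # column i = binary representation of i
                  [0, 1, 1, 0, 0, 1, 1],
                  [1, 0, 1, 0, 1, 0, 1]])

    def hamming_decode(word):
        r = np.array([int(b) for b in word])
        s = H @ r % 2
        pos = int("".join(map(str, s)), 2)    # syndrome as a binary number = error position
        if pos:
            r[pos - 1] ^= 1                   # invert the erroneous bit
        return "".join(map(str, r))

    for word in ["0110010", "0110101", "1101100", "0001111"]:
        c = hamming_decode(word)
        print(word, "->", c, " message:", c[:4])
    # reproduces the table: 0110011/0110, 0100101/0100, 1001100/1001, 0001111/0001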

NOTE

1. The first three Hamming codes are:


    m    n     k     R
    3    7     4     4/7 = 0.57
    4    15    11    11/15 = 0.73
    5    31    26    26/31 = 0.84

2. The information rate for the Hamming codes goes as R = (n-m)/n = 1 - m/(2^m - 1), but although
   R tends to 1 rapidly for larger m, we correct only single errors for increasingly large blocks
   ⇒ less error protection.

Block Error Probability for Hamming Codes

    Pe(m) = 1 - (2^m - 1 choose 0) p^(2^m - 1) - (2^m - 1 choose 1) q p^(2^m - 2)

4.13 Types of Errors - Probability of Undetected Error

Assume the codeword ci was transmitted and r is received. The following types of errors can be
present when decoding a word:
1. No error (s = 0 and r = ci) ⇒ ideal
2. Detected error (s ≠ 0 and r ≠ any c):
   a) correctly corrected (coset leader ⊕ r = ci ⇒ non-fatal)
   b) cannot correct (two or more coset leaders ⇒ non-fatal)
   c) incorrectly corrected (coset leader ⊕ r = cj ≠ ci ⇒ fatal)
3. Undetected error (s = 0 and r = cj ≠ ci) ⇒ fatal

In FEC error-control both 2(c) and 3 are fatal (and also 2(b) if there is no ARQ capability). For
ARQ error-control only 3 is fatal since the case of detected errors in 2 are dealt with in the same
way (by re-transmission). Since most communication systems use ARQ, it is the undetected errors
that are a universal problem.


4.13.1 Probability of undetected error

An undetected error occurs if the received word, r, is a different codeword, (r = cj), than was
transmitted, ci. Thus:
    e = ci + r = ci + cj    (e is the sum of two codewords ⇒ e is a codeword itself)

Condition for an undetected error


An undetected error occurs if and only if e is a nonzero code word.

The probability of undetected error, Pund(K), is the sum of the probabilities of occurrence of the
error patterns which yield an undetected error. Define ei as an error pattern with i bits in error; the
probability of occurrence of ei is q^i p^(n-i). Since the ei of interest are those which are the same as
the non-zero codewords, we let Ai = number of codewords with Hamming weight i, and hence:

    Pund(K) = Σ_{i=1}^{n} Ai q^i p^(n-i)

If the minimum weight of the code is d(K) then A1 = A2 = ... = A_{d(K)-1} = 0, so we have:

Probability of undetected error

    Pund(K) = Σ_{i=d(K)}^{n} Ai q^i p^(n-i)

Equation 4.13
where Ai = the number of codewords with Hamming weight i

Example 4.26
Consider the (7,4) Hamming code table of Example 4.25 from which we see that:
d(K) = 3 ! A3 = 7, A4 = 7, A5 = 0, A6 = 0, A7 = 1
Hence:
    Pund(K) = 7 q^3 p^4 + 7 q^4 p^3 + q^7
If q = 0.01, then:
    Pund(K) ≈ 7 x 10^-6  ⇒  about 7 codewords in a million will have undetected errors


4.14 Other Error Detecting/Correcting codes

4.14.1 Types of Errors

When examining different codes it is important to consider:


• Type of Code: Is the code systematic or non-systematic?
• Information Rate: How many check-bits are used per message bit?
• Error protection: Number of bit errors that can be detected / corrected.
• Diagnostic resolution: Range over which bit errors are detected / corrected
  (i.e. larger n ⇒ lower resolution)
• Error latency: How much data needs to be received before the error is detected / corrected
  (i.e. larger n ⇒ longer latencies)
• Coverage: Which bit/word positions will be protected and in what way.
• Types of errors: Is the code best suited to random or burst errors?

The discussion so far has assumed that bit errors occur randomly (i.e. the channel is memoryless or
independent). In practice, however, the condition that causes the bit error may last long enough to
affect a sequence of bits (i.e. the channel has memory). If a sequence of ℓ consecutive bits is suddenly
subject to interference which causes a large number of them to be in error, then we have a condition
called a burst error of length ℓ. A specific type of burst error is the multiple unidirectional error
which forces groups of bits to all go 0 (low) or 1 (high). This can be caused by a common mode of
failure in the memory block or bus causing a group of local cells or bus lines to go low (i.e. shorted)
or high.

4.14.2 Bit interleaving

Although codes can be designed to specifically handle burst errors of length ℓ (e.g. the CRC codes
discussed in Section 4.15), a larger class of codes (e.g. the Hamming codes) exists which is
applicable only to random (or independent channel) errors. One way to convert burst errors to
random errors is to interleave bits entering the channel in a deterministic manner. If a sequence of
bits experiences a burst error, the corresponding deinterleaving operation will spread these errors out
and hence "randomise" them.

[Figure: Input Data -> Encoder -> Interleaver -> Burst-error Channel -> Deinterleaver -> Decoder ->
Output Data; the path from encoder to decoder, including the interleaver and deinterleaver, behaves
as an independent-error channel.]

Example 4.27
Consider the following 16-bit block of data: b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15 b16
1:4 Interleaver: b1 b5 b9 b13 b2 b6 b10 b14 b3 b7 b11 b15 b4 b8 b12 b16
A burst-error of length 4 occurs: b1 b5 b9 b13 b2 b6 b10 b14 b3 b7 b11 b15 b4 b8 b12 b16
1:4 Deinterleaver: b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15 b16
And the burst error of length 4 has been converted to a 1-bit random error in each n=4 code block.
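A sketch of the 1:4 block interleaver used here (Python; the depth of 4 and the 16-bit block are
simply the values of Example 4.27):

    def interleave(bits, depth=4):
        # transmit every depth-th bit first: b1 b5 b9 b13 b2 b6 b10 b14 ...
        return [b for start in range(depth) for b in bits[start::depth]]

    def deinterleave(bits, depth=4):
        cols = len(bits) // depth
        out = [None] * len(bits)
        for start in range(depth):
            for k in range(cols):
                out[k * depth + start] = bits[start * cols + k]
        return out

    data = list(range(1, 17))          # b1 ... b16
    tx = interleave(data)
    rx = tx[:]
    for i in range(4, 8):              # a burst error of length 4 hits 4 consecutive sent bits
        rx[i] = 'X'
    print(deinterleave(rx))            # the 4 corrupted bits land one per 4-bit code block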


4.14.3 m-out-of-n Codes

The m-out-of-n codes are formed as the set of n-bit vectors, where a vector is a codeword if and
only if exactly m of its n bits are 1. The 2-out-of-5 code is the most common example of such a
code:
Table 4.1: 2-out-of-5 code
message number codeword
1 00011
2 00101
3 01001
4 10001
5 00110
6 01010
7 10010
8 01100
9 10100
10 11000
Properties
• Non-systematic code (need to do a table lookup)
• Single error detection
• Multiple unidirectional error detection
• Used in encoding control signals

Since the code is non-systematic its use for encoding and decoding communication messages is
limited. However control signals which do not need to be decoded (just interpreted) can be
configured to exist as m-out-of-n codes, and any multiple unidirectional error on the control bus can
be detected. In such a case it is obvious that the number of 1’s will change and the error can be
detected.

Example 4.28
Consider a 2-out-of-5 code and assume message 7 (encoded as 10010) is sent.

Suppose a unidirectional error that causes the first 3 bits to go low occurs (most common error
when adjacent memory cells or bus lines experience a common failure like a short-circuit), then the
received message will be 00010 and this error is detected (since only one bit is a 1).
Suppose that now the first three bits are all tied high, then the received message will be 11110 and
this error is also detected (since four bits are now 1).
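Both detections in this example come down to counting the number of 1s in the received word (a
minimal sketch in Python):

    def valid_2_of_5(word):
        """A word is a valid 2-out-of-5 codeword iff exactly two of its five bits are 1."""
        return len(word) == 5 and word.count('1') == 2

    print(valid_2_of_5('10010'))   # True  (message 7)
    print(valid_2_of_5('00010'))   # False - first three bits forced low, error detected
    print(valid_2_of_5('11110'))   # False - first three bits forced high, error detected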

4.14.4 Checksum Codes

Checksum codes are generated by appending an n-bit checksum word to a block of s n-bit data
words, formed as the sum of the s n-bit words using modulo-2^n addition.
Encoding: use an n-bit arithmetic adder to add the s n-bit data words, with any carry beyond the nth
bit being discarded. The sum is then appended to the data block as the n-bit checksum.
Decoding: use the same n-bit arithmetic adder to add the s n-bit data words and XOR the sum with
the appended checksum; if the result is zero then there is no error.


Properties
• Simple and inexpensive, hence large number of uses
• Single error detection at least
• Multiple error detection dependent on s and columns in error: error coverage is highest for the
least significant bits
• Error coverage can be controlled by weighting the bits or using extended checksums
• Long error latency (must wait for end of data block)
• Low diagnostic resolution (error can be anywhere in the s n-bit words)
• Uses: sequential storage devices, block-transfer peripherals (disks, tapes), read-only memory,
serial numbering, etc.

Example 4.29
Let n=8 and s=8, then we append an 8-bit checksum, Ck, after every block of 8 x 8-bit data words
(bytes), wi, calculated as follows:

    Ck = ( Σ_{i=1}^{8} wi ) mod 2^8

Consider the following data block (in hex):
    01 FA 02 32 62 A1 22 E3

The sum (in hex) = 337 and Ck = 337(hex) mod 256 = 37(hex)
(i.e., we simply retain the low-order 8 bits of the sum and discard the rest)
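A sketch of this encode/verify procedure in Python (modulo-2^n addition with n = 8, applied to the
data block above):

    def checksum(words, n_bits=8):
        """Modulo 2**n_bits sum of the data words (the appended checksum)."""
        return sum(words) % (1 << n_bits)

    block = [0x01, 0xFA, 0x02, 0x32, 0x62, 0xA1, 0x22, 0xE3]
    ck = checksum(block)
    print(hex(ck))                                   # 0x37, as computed above

    # Decoding: re-add the data words and XOR with the received checksum; zero means no error.
    received = block + [ck]
    print(checksum(received[:-1]) ^ received[-1])    # 0 -> no error detected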

The multiple error coverage can be improved by using an extended checksum of m bits for n-bit
data, where m > n and the sum is modulo-2^m addition (i.e. we retain the low-order m bits of the
sum). As an example, with n=8 and m=16 a 2-byte checksum is formed from the sum of the s data
bytes in the block.

Real-world Checksum examples

International Standard Book Number (ISBN)


    ISBN  0  13  283796  X
          |  |   |       +-- Check symbol (X stands for 10)
          |  |   +---------- Book ID
          |  +-------------- Publisher (Prentice Hall)
          +----------------- Country (USA)

The check symbol (a10 = X = 10) is derived such that Σ_{i=1}^{10} i·a_{11-i} is divisible by 11, where
ak is the kth digit from the left in the ISBN code. For the above example we see that:
    Sum = 10*0 + 9*1 + 8*3 + 7*2 + 6*8 + 5*3 + 4*7 + 3*9 + 2*6 + 1*10 = 187, which is divisible by 11
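The same weighted-sum check is easily expressed in code (a sketch; 'X' stands for the value 10):

    def isbn10_ok(isbn):
        digits = [10 if ch == 'X' else int(ch) for ch in isbn if ch not in ' -']
        # weight the k-th digit from the left by 11-k (i.e. 10, 9, ..., 1), then test divisibility by 11
        total = sum((10 - k) * d for k, d in enumerate(digits))
        return total % 11 == 0

    print(isbn10_ok('0 13 283796 X'))   # True: the weighted sum is 187 = 11 x 17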

UWA Student Number

The last digit of the 7 digit student number, s7, is the check digit derived such that:
Sum = (8*s1 + 4*s2 + 6*s3 + 3*s4 + 5*s5 + 2*s6 + s7) is divisible by 11


4.15 Cyclic Codes

4.15.1 Introduction

The problem with checksum codes is that simply summing bits to form a checksum (or check bits)
does not produce a sufficiently complex result. By complex we mean that the input bits should
maximally affect the checksum, so that a wide range of bit errors or group of bit errors will affect
the final checksum and be detected. A more complex "checksumming" operation can be formed by
performing division rather than summation. Furthermore the remainder, not the quotient, is used, as
the remainder "gets kicked about quite a lot during the calculation, and if more bytes were added to
the message (dividend) its value could change radically again very quickly".

In the previous sections the idea of the parity-check matrix, check-bit generation and syndrome
decoding as a means for efficient design and implementation of linear codes was shown from the
point of view of binary matrix linear algebra. The cyclic codes are a subclass of linear codes which
satisfy the following two properties:

Cyclic codes

Linearity property: the sum of any two codewords is also a codeword


Cyclic property: any cyclic shift of a codeword is also a codeword

Example 4.30

Consider the word: 01001100


One cyclic shift to the right: 00100110
Another: 00010011
Yet another: 10001001

If the code is a cyclic code all of the above would be codewords.

In a systematic cyclic code the check-bits are generated as the remainder of a division operation and
as indicated this provides a more powerful form of “checksumming”. The cyclic codes are analysed
not by representing words as binary vectors and using matrix binary linear algebra but by
representing words as binary polynomials and using polynomial arithmetic modulo 2. Like linear
codes the cyclic codes also permit efficient implementations of the encoding and decoding
operations, especially fast logic implementations for serial data transmission. The properties of
cyclic codes also allow greater scope for designing codes for specialised uses, especially in handling
burst errors. Not surprisingly all modern channel coding techniques use cyclic codes, especially the
important class of Cyclic Redundancy Check (CRC) codes for error detection.


4.15.2 Code polynomials and the cyclic property

Let the k-bit information or message word be represented as:


m0 m1 … mk-2 mk-1
where m0 is the MSB (most significant bit) and mk-1 is the LSB (least significant bit).

Let the n-bit code word be represented as:


c0 c1 … cn-2 cn-1
where c0 is the MSB and cn-1 is the LSB.

And let the (n-k)-bit check-bits be represented as:


b0 b1 … bn-k-2 bn-k-1
We assume the code is systematic hence:
c0 c1 … cn-k-2 cn-k-1 cn-k cn-k+1 … cn-2 cn-1 ⇒ b0 b1 … bn-k-2 bn-k-1 m0 m1 … mk-2 mk-1
Same reversed notation as used by “Communication Systems” by S. Haykin

We are adopting the same notation that the textbook by S. Haykin uses. Compared to the previous
definition in Section 4.7, there are two major differences:
1. The indexing sense has been reversed, and indexing starts at 0 not 1.
2. The codeword representation is now reversed. The bits are now running to the “right”.
Previously the 7-bit codeword 1101110 implied a message of 1101 followed by the check-bit
sequence 110, all running to the "left". Now the same 7-bit codeword is bit-reversed to
0111011, which implies a message of 1011 together with the check-bit sequence 011, all
running to the "right". In both cases the message and check-bits are actually identical.
We denote the column vectors using this reversed notation with a ^, that is:
    r^T = [1101110]  ⇒  r̂^T = [0111011]
For a codeword of length n we define the code polynomial, C(x), as:

    C(x) = Σ_{i=0}^{n-1} ci x^i

Similarly the message polynomial, M(x), is:

    M(x) = Σ_{i=0}^{k-1} mi x^i

and the check-bit polynomial is:

    B(x) = Σ_{i=0}^{n-k-1} bi x^i

Then for a systematic code we see that:


    C(x) = B(x) + x^(n-k) M(x)
Equation 4.14

Let C^(j)(x) represent j cyclic shifts to the right of C(x); it can be shown that:
    x^j C(x) = Q(x)(x^n + 1) + C^(j)(x)
Equation 4.15
That is:
    C^(j)(x) = x^j C(x) mod (x^n + 1)
Equation 4.16
That is, the remainder when dividing x^j C(x) by (x^n + 1) represents the polynomial of the codeword
cyclically shifted to the right j times.
NOTE:
Remember we are using mod-2 binary arithmetic, so: x^i - x^j ⇒ x^i + x^j ; x^i + x^i = 0 and 1 + 1 = 0

Example 4.31
Consider the codeword 010 01100 with n = 8 and k = 5.
The left-most bits 010 are the check-bits and hence:
    B(x) = 0 + 1·x + 0·x^2 = x
The right-most bits 01100 are the message bits and hence:
    M(x) = 0 + 1·x + 1·x^2 + 0·x^3 + 0·x^4 = x + x^2
The codeword polynomial is then:
    C(x) = 0 + 1·x + 0·x^2 + 0·x^3 + 1·x^4 + 1·x^5 + 0·x^6 + 0·x^7 = x + x^4 + x^5
    C(x) = B(x) + x^3 M(x) = x + x^3(x + x^2) = x + x^4 + x^5
Say C(x) ≡ 01001100 is now cyclically right shifted by 3 bits, then we get C^(3)(x) ≡ 10001001, and
hence:
    C(x) = x + x^4 + x^5  ⇒  C^(3)(x) = 1 + x^4 + x^7
Now:
    x^3 C(x) = x^4 + x^7 + x^8
And dividing x^3 C(x) by (x^8 + 1):
    x^8 + x^7 + x^4 = 1·(x^8 + 1) + (x^7 + x^4 + 1)
so the quotient is 1 and the remainder is x^7 + x^4 + 1 = C^(3)(x).
Let:
    x^3 C(x) mod (x^8 + 1) = R(x)
where R(x) is the remainder; then:
    x^3 C(x) = (x^8 + 1) Q(x) + R(x)
where Q(x) is the quotient. From the above results we have that:
    Q(x) = 1 and R(x) = x^7 + x^4 + 1
    ∴ x^3 C(x) = x^8 + x^7 + x^4 = (x^8 + 1)·1 + (x^7 + x^4 + 1)
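This cyclic-shift property is easy to check in software by representing a binary polynomial as an
integer whose bit i is the coefficient of x^i (a Python sketch using the codeword of this example):

    def poly_mod(a, b):
        """Remainder of GF(2) polynomial division; polynomials are ints, bit i = coeff of x^i."""
        while a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        return a

    C = 0b110010                               # C(x) = x + x^4 + x^5
    n, j = 8, 3
    shifted = poly_mod(C << j, (1 << n) | 1)   # x^j C(x) mod (x^n + 1)
    print(bin(shifted))                        # 0b10010001 = 1 + x^4 + x^7 = C^(3)(x), as above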


Now consider the (n-k+1)-bit word represented by:

    g0 g1 ... g(n-k-1) g(n-k)

where, by definition, g0 = 1 and g(n-k) = 1. The corresponding polynomial is known as the:
Generator Polynomial

    G(x) = 1 + Σ_{i=1}^{n-k-1} gi x^i + x^(n-k)

Equation 4.17

We choose G(x) to be a factor of x^n + 1 (i.e. x^n + 1 = G(x)H(x), where H(x) is known as the parity-
check polynomial) and form all words of the form C(x) = A(x)G(x) as code polynomials belonging
to the code defined by G(x). The code so defined is obviously linear, but is it cyclic? That is, will a
cyclic shift of C(x) yield a polynomial which can be defined by, say, A2(x)G(x) (i.e. C^(j)(x) =
A2(x)G(x) for some A2(x)) and thus belong to the code? Using Equation 4.15 with the appropriate
substitutions (C(x) = A(x)G(x) and (x^n + 1) = G(x)H(x)), we have:

    x^j A(x)G(x) = Q(x)G(x)H(x) + C^(j)(x)

Dividing throughout by G(x) we get:

    x^j A(x) = Q(x)H(x) + C^(j)(x)/G(x)
    ∴ C^(j)(x)/G(x) = x^j A(x) + Q(x)H(x) = A2(x)

and hence C^(j)(x) = A2(x)G(x). Thus:
Result 4.6

If the degree n-k polynomial G(x) as defined by Equation 4.17 is a factor of (x^n + 1), then all code
polynomials (i.e. codewords) formed as the product A(x)G(x), where A(x) is any degree k-1
polynomial, belong to the (n,k) cyclic code defined by G(x).

NOTE

1. The degrees of the polynomials encountered so far are:


C(x) is a polynomial of degree n-1
M(x) is a polynomial of degree k-1
G(x) is a polynomial of degree n-k
B(x) is a polynomial of degree n-k-1
2. The product of a polynomial of degree a with a polynomial of degree b is a polynomial of
   degree a + b. Hence A(x) is a polynomial of degree k-1 since n-1 = (k-1) + (n-k).
3. Since A(x) is a polynomial of degree k-1, there are 2^k possible A(x) and hence 2^k possible code
   polynomials.
4. G(x) is governed only by its degree (n-k) and the fact that it must be a factor of (x^n + 1); thus for
   the same G(x) we may be able to design different codes for different appropriate values of n.

In the same way that the parity-check matrix, H, was used to define a linear code the generator
polynomial, G(x), is used to define a cyclic code.


4.15.3 Encoding/Decoding cyclic codes (algebraic method)

By making A(x) = M(x) we simply derive C(x) = M(x)G(x), but this will not give us a systematic
(i.e. separable) code. For a systematic code we must have:

    C(x) = A(x)G(x) = B(x) + x^(n-k) M(x)

Hence dividing the above through by G(x):

    x^(n-k) M(x)/G(x) = A(x) + B(x)/G(x)

Thus B(x) is the remainder left when dividing x^(n-k) M(x) by G(x).
Encoding Systematic Cyclic Codes

1. Express the message word in polynomial form, M(x).
2. Form x^(n-k) M(x).
3. Divide x^(n-k) M(x) by G(x) to find the remainder B(x).
4. Form the codeword polynomial, C(x) = B(x) + x^(n-k) M(x).
5. Convert C(x) to binary form to form the codeword.
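A sketch of these five steps in Python, using the same integer-as-polynomial representation as in the
earlier sketch (the generator G(x) = 1 + x + x^3 is the one used in Example 4.32 below):

    def poly_mod(a, b):
        while a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        return a

    def cyclic_encode(msg_bits, G, n, k):
        """Systematic cyclic encoding: C(x) = B(x) + x^(n-k) M(x), B(x) = x^(n-k) M(x) mod G(x)."""
        M = sum(bit << i for i, bit in enumerate(msg_bits))   # M(x) from m0 m1 ... m(k-1)
        B = poly_mod(M << (n - k), G)
        C = B | (M << (n - k))
        return [(C >> i) & 1 for i in range(n)]               # c0 c1 ... c(n-1)

    G = 0b1011                                 # G(x) = 1 + x + x^3
    print(cyclic_encode([1, 0, 1, 0], G, n=7, k=4))
    # -> [0, 0, 1, 1, 0, 1, 0], i.e. codeword 0011010, matching Example 4.32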

Example 4.32
Consider G(x) = 1 + x + x^3 with n=7 and k=4 (we will later see that this G(x) produces a (7,4)
Hamming code). Thus C(x) = x^3 M(x) + B(x), where B(x) is the remainder left when dividing
x^3 M(x) by G(x). The following are examples of encoding systematic cyclic codes algebraically for
the messages 1000, 1010 and 1011:

    Message     M(x)          x^3 M(x)        B(x)    C(x)              Codeword
    m0m1m2m3                                                            b0b1b2m0m1m2m3
    1 0 0 0     1             x^3             1+x     1+x+x^3           1 1 0 1 0 0 0
    1 0 1 0     1+x^2         x^3+x^5         x^2     x^2+x^3+x^5       0 0 1 1 0 1 0
    1 0 1 1     1+x^2+x^3     x^3+x^5+x^6     1       1+x^3+x^5+x^6     1 0 0 1 0 1 1

B(x) for Message 1000
Dividing x^3 M(x) = x^3 by G(x) = x^3 + x + 1:
    x^3 = 1·(x^3 + x + 1) + (x + 1)   ⇒   Q(x) = 1, B(x) = x + 1

B(x) for Message 1010
Dividing x^3 M(x) = x^5 + x^3 by G(x) = x^3 + x + 1:
    x^5 + x^3 = x^2·(x^3 + x + 1) + x^2   ⇒   Q(x) = x^2, B(x) = x^2
B(x) for Message 1011


Dividing x^3 M(x) = x^6 + x^5 + x^3 by G(x) = x^3 + x + 1:
    x^6 + x^5 + x^3
      subtract x^3·(x^3 + x + 1) = x^6 + x^4 + x^3    leaves  x^5 + x^4
      subtract x^2·(x^3 + x + 1) = x^5 + x^3 + x^2    leaves  x^4 + x^3 + x^2
      subtract   x·(x^3 + x + 1) = x^4 + x^2 + x      leaves  x^3 + x
      subtract   1·(x^3 + x + 1) = x^3 + x + 1        leaves  1
    ⇒   Q(x) = x^3 + x^2 + x + 1, B(x) = 1

For decoding consider R(x) as the received word polynomial of degree n-1. Let S(x) denote the
remainder of dividing R(x) by G(x). Since C(x) = A(x)G(x) then if there are no errors we expect R(x)
= C(x) and hence no remainder (i.e. S(x) = 0). If S(x) ≠ 0 then we know there is an error. S(x) is
called the syndrome polynomial.

How is S(x) related to E(x), the error polynomial?

We have:
R(x) = C(x) + E(x)
Now:
Q(x)G(x) + S(x) = A(x)G(x) + E(x)
where Q(x) is the quotient and S(x) is the remainder when R(x) is divided by G(x). Then:
E(x) = (Q(x) + A(x))G(x) + S(x)
Hence the syndrome can be obtained as the remainder of dividing the error polynomial, E(x), by
G(x) (i.e. S(x) = E(x) mod G(x)). In the same way the syndrome was dependent only on the error
pattern for linear codes the syndrome polynomial is only dependent on the error polynomial for
cyclic codes. Thus the syndrome polynomial can be used to find the most likely error polynomial.

Decoding Cyclic codes

1. Express the received word in polynomial form, R(x).


2. Divide R(x) by G(x) to find the remainder, S(x)
3. If S(x) = 0, there is no error.
4. If S(x) ≠ 0 then an error has occurred and the “coset leader” error polynomial for S(x) is used to
correct the error.


Example 4.33
Consider the cyclic code from Example 4.32 with G(x) = 1 + x + x^3. Say the codeword
ĉ^T = [0011010] is transmitted, but r̂^T = [0011110] is received (an error in bit r4). To decode we
form:
    R(x) = x^2 + x^3 + x^4 + x^5
and divide R(x) by G(x) to yield the remainder S(x):
    x^5 + x^4 + x^3 + x^2
      subtract x^2·(x^3 + x + 1) = x^5 + x^3 + x^2    leaves  x^4
      subtract   x·(x^3 + x + 1) = x^4 + x^2 + x      leaves  x^2 + x
    ⇒   Q(x) = x^2 + x, S(x) = x^2 + x
Thus S(x) = x^2 + x ⇒ ŝ^T = [011], and since S(x) ≠ 0 an error is detected and syndrome decoding
can be used to attempt to correct the error.

NOTE

1. The generator polynomial, G(x), can be shown to be related to the generator matrix, G, and the
   parity-check polynomial, H(x), can be shown to be related to the parity-check matrix, H.

2. The Hamming codes are cyclic codes! Specifically a (7,4) Hamming code is designed with
   G(x) = 1 + x + x^3. Since G(x)H(x) = x^7 + 1, the corresponding H(x) = 1 + x + x^2 + x^4 can be
   used to construct the (7,4) Hamming code parity-check matrix.

3. S(x) is a polynomial of degree n-k-1; E(x) is a polynomial of degree n-1.

Exercise: Verify that the codewords generated with G(x) as defined above indeed produce a (7,4)
Hamming code

4.15.4 Deriving the Generator matrix and Parity-Check matrix from G(x)

As stated previously we can design the cyclic code codewords by C(x) = M(x)G(x). Although this
does not give us a systematic code it will allow direct derivation of the generator matrix, G. For
compatibility with the G and H matrices discussed in Section 4.7 we need to "undo" the bit reversal
implied by the cyclic code polynomial notation. For example, consider the code polynomial C(x)
coefficient vector:

    ĉ = [c0 c1 ... c(n-1)]^T   ⇒   c = [c(n-1) c(n-2) ... c0]^T
We want G such that:

    c = Gm   ⇒   [c(n-1) ... c1 c0]^T = G [m(k-1) ... m1 m0]^T
Now we have:
    C(x) = M(x)G(x)
         = (m0 + m1 x + m2 x^2 + ... + m(k-1) x^(k-1)) G(x)
         = m0 G(x) + m1 x G(x) + m2 x^2 G(x) + ... + m(k-1) x^(k-1) G(x)

and reversing the order this becomes:

    c(n-1) x^(n-1) + ... + c1 x + c0 = m(k-1) x^(k-1) G(x) + ... + m2 x^2 G(x) + m1 x G(x) + m0 G(x)

Now (noting g0 = g(n-k) = 1):

    G(x) = g(n-k) x^(n-k) + ... + g2 x^2 + g1 x + 1   ⇒   x^i G(x) = g(n-k) x^(n-k+i) + ... + g2 x^(2+i) + g1 x^(1+i) + g0 x^i
In matrix form, each term m_j x^j G(x) contributes the reversed coefficient vector of G(x),
[g(n-k) ... g2 g1 g0]^T, shifted down by (k-1-j) positions and zero-padded, so that c = Gm with G the
n x k matrix whose columns are successive downward shifts of that vector:

        | g(n-k)   0      ...    0       0      |
        |  ...    g(n-k)  ...   ...      0      |
        |  g2      ...    ...    0      ...     |
        |  g1      g2     ...   g(n-k)   0      |
    G = |  g0      g1     ...    ...    g(n-k)  |
        |  0       g0     ...    g2     ...     |
        |  0       0      ...    g1      g2     |
        |  ...     ...    ...    g0      g1     |
        |  0       0      ...    0       g0     |

And this gives G! Now by definition HG = 0 which means row operations on H correspond to
column operations on G. So to convert G to systematic form we perform column operations on G.
Once G is systematic we obtain the systematic form for H trivially. In summary:


1. Take the n-element column vector formed from the reversed and zero-padded coefficients of G(x)
   and form G as the concatenation of the k downward shifts of this column vector as shown above.
2. Perform column operations to convert G to systematic form.
3. Derive the systematic form of H directly from the systematic form of G.

Example 4.34
Find G and hence H corresponding to G(x) = 1 + x + x^3 and n = 7, and hence verify that this is
indeed a (7,4) Hamming code.

Now G(x) = 1 + x + x^3  ⇒  ĝ^T = [1101]  ⇒  g = [g3 g2 g1 g0]^T = [1 0 1 1]^T
With n=7 and k=4 we form:

        | 1 0 0 0 |
        | 0 1 0 0 |
        | 1 0 1 0 |
    G = | 1 1 0 1 |    and we need the systematic form,  Gsys = | Ik |
        | 0 1 1 0 |                                             | P  |
        | 0 0 1 1 |
        | 0 0 0 1 |
By:
• adding columns 3 and 4 to column 1
• adding column 4 to column 2

we get:
1 0 0 0
0 1 0 0 
 
0 0 1 0  1 1 1 0 1 0 0
 
G sys = 0 0 0 1 ⇒ H = [P | I n − k ] = 0 1 1 1 0 1 0
 
1 1 1 0 1 1 0 1 0 0 1
 
0 1 1 1 
1 1 0 1
And H is the parity-check matrix of a (7,4) Hamming code

We can now use H to perform syndrome decoding. Consider ĉ^T = [0011010], r̂^T = [0011110] and
ŝ^T = [011] from Example 4.33. Now s = [1 1 0]^T matches the 3rd column of H, hence the 3rd bit of
r^T = [0111100] is corrected to form c^T = [0101100] ⇒ ĉ^T = [0011010].
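A sketch of this three-step construction in Python/NumPy for G(x) = 1 + x + x^3 (the two column
operations hard-coded below are exactly the ones listed in this example; a general implementation
would perform a column reduction to systematic form instead):

    import numpy as np

    g = np.array([1, 0, 1, 1])        # reversed coefficients [g3 g2 g1 g0] of G(x) = 1 + x + x^3
    n, k = 7, 4

    # Step 1: G is the n x k matrix whose columns are successive downward shifts of g.
    G = np.zeros((n, k), dtype=int)
    for j in range(k):
        G[j:j + len(g), j] = g

    # Step 2: the column operations of Example 4.34 bring G to systematic form [Ik ; P].
    G[:, 0] = (G[:, 0] + G[:, 2] + G[:, 3]) % 2
    G[:, 1] = (G[:, 1] + G[:, 3]) % 2

    # Step 3: read off P and form H = [P | I(n-k)].
    P = G[k:, :]
    H = np.hstack([P, np.eye(n - k, dtype=int)])
    print(H)     # the (7,4) Hamming parity-check matrix derived above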


4.15.5 Linear Shift Register Method for Encoding/Decoding cyclic codes

The generator polynomial coefficients, gi, can be used to design a linear shift register circuit for
performing the division operation of x^(n-k) M(x) by G(x). One form of the register circuit is:

[Figure: a feedback shift register with cells b0 b1 ... b(n-k-1); an XOR gate is placed between cells
b(i-1) and bi wherever gi = 1. DECODE IN feeds the first cell, ENCODE IN feeds the feedback path
through a gate controlled by CONTROL, the register contents are shifted out via ENCODE OUT,
and CLEAR resets the register.]

Figure 4.1 Encoder / Decoder circuit for cyclic codes


Layout
• If gi = 1 then an XOR gate is included between register cells bi-1 and bi as shown.
• If gi = 0 then register cells bi-1 and bi are directly connected (there is no XOR gate).
• Each bit register, bi, is a clocked synchronous memory register, (e.g. a D-type flip-flop).

Encoding is accomplished by simply clearing the registers, setting CONTROL = 1 and DECODE
IN = 0 and shifting the message word through the shift register via the ENCODE IN. After all
message bits have been shifted into the register, the check bits (i.e. remainder B(x)) can then be
found in the shift register contents b0b1 … bn-k-1 and these can be appended to the message bits by
setting CONTROL = 0 and shifting the contents out of the register via ENCODE OUT.

Decoding is accomplished by clearing the shift register, setting CONTROL = 1 and ENCODE IN =
0 and shifting the received word through the register via DECODE IN. The final contents of the
shift register will only be zero if there has been no error (i.e. the received word is a code word),
otherwise it contains the syndrome bits which can be used to perform error correction by syndrome
decoding.

Example 4.35
For G(x) = 1 + x + x^3 we have register cells b0, b1, b2 with g1 = 1 and g2 = 0. Hence the coding
circuit is Figure 4.1 with an XOR gate between b0 and b1 (since g1 = 1) and a direct connection
between b1 and b2 (since g2 = 0).

22/01/01 Dr. Roberto Togneri, E&E Eng, UWA Page 4-53


ITC314 Information Theory and Coding 314 623.314

Encode message 1010


Clear the register cells, set CONTROL = 1 and DECODE IN = 0, and serially right-shift 1010 in
through ENCODE IN (note that this implies LSB (m3) first and MSB (m0) last):

    ENCODE IN    BEFORE (b0b1b2)    AFTER (b0b1b2)
    0            000                000
    1            000                110
    0            110                011
    1            011                001
The register contents are 0 0 1. Now set CONTROL = 0 and shift the contents of the shift register to
ENCODE OUT. The codeword is then 0011010, with the check bits 001 taken from ENCODE OUT
and the message bits 1010 from ENCODE IN.
And the result is confirmed from Example 4.32

Decode received word 0011110


Clear the register cells, set CONTROL = 1 and ENCODE IN = 0, and serially right-shift 0011110 in
through DECODE IN (as before this implies LSB (r6) first and MSB (r0) last):
    DECODE IN    BEFORE (b0b1b2)    AFTER (b0b1b2)
    0            000                000
    1            000                100
    1            100                110
    1            110                111
    1            111                001
    0            001                110
    0            110                011
And the non-zero syndrome indicates that there is an error. The syndrome value of 011 is confirmed
from Example 4.33.
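A software simulation of this register (a sketch in Python; the update rule mirrors Figure 4.1 with
XOR taps at the gi = 1 positions, and it reproduces both tables of this example):

    def shift_register(bits_in, g=(1, 1, 0, 1), encode=True):
        """Simulate the Figure 4.1 register for G(x) with coefficients g = (g0, g1, ..., g(n-k)).
        encode=True feeds ENCODE IN (divides x^(n-k)M(x)); False feeds DECODE IN (divides R(x))."""
        nk = len(g) - 1                   # register cells b0 ... b(nk-1)
        b = [0] * nk
        for x in bits_in:                 # bits arrive LSB (highest index) first
            fb = (x ^ b[-1]) if encode else b[-1]
            first = fb if encode else (x ^ fb)
            b = [first] + [b[i] ^ (fb & g[i + 1]) for i in range(nk - 1)]
        return b

    # Encode message m0m1m2m3 = 1010: shift in m3, m2, m1, m0
    print(shift_register([0, 1, 0, 1], encode=True))             # -> [0, 0, 1] = check bits b0b1b2
    # Decode received word r0...r6 = 0011110: shift in r6, ..., r0
    print(shift_register([0, 1, 1, 1, 1, 0, 0], encode=False))   # -> [0, 1, 1] = syndrome 011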

NOTE:

1. The cyclic code circuitry is in fact simpler than the more cumbersome combinational logic
required for check-bit generation and syndrome generation of general linear codes, especially
when data has to be processed serially.

2. An encoder only circuit will have neither a DECODE IN nor top left XOR. Similarly, a decoder
only circuit will have neither the ENCODE IN, CONTROL, top AND gate, nor top right XOR.


4.15.6 Cyclic Redundancy Check (CRC) Codes

The CRC codes are codes specifically designed for error detection. There are several reasons why
CRC codes are the best codes for any form of practical error detection:
1. The encoder and decoder implementation is fast and practical, especially for communication data
which is processed in serial fashion.
2. The CRC codes handle many combinations of errors, especially combinations of burst errors, as
well as random errors.
Since CRC codes only detect errors there is no syndrome decoding as such. The syndrome is
calculated and checked to see if it is zero or not (the true non-zero value of the syndrome is not
important). Thus the ENCODE IN in Figure 4.1 can be used rather than DECODE IN for decoding.
The decoder circuit is identical to the encoder circuit.

[Figure: the CRC encoder/decoder register - the same feedback shift register as Figure 4.1 but with a
single ENCODE/DECODE IN input (gated by CONTROL), cells b0 b1 ... b(n-k-1), XOR taps at the
gi = 1 positions, ENCODE OUT and CLEAR.]

Figure 4.2 Encoder / Decoder circuit for CRC codes

Why does this work? Using ENCODE IN means we are dividing x^(n-k) R(x), rather than R(x),
by G(x). In both cases, if R(x) = C(x) the remainder will be zero.

A complete encoder/decoder CRC circuit can be designed using simple combinatorial logic as
shown in Figure 4.3.

[Figure: complete CRC encoder/decoder - the Figure 4.2 register with additional gating: SERIAL IN
feeds the register through an ENABLE FEEDBACK gate, SERIAL OUT carries first the message
bits and then the register contents (check bits) under CONTROL, CLEAR resets the register, and an
OR of the register cells b0 b1 ... b(n-k-1) drives an ERROR FLAG output.]

Figure 4.3 Complete CRC Encoder / Decoder circuit


CRC Encoding
1. CLEAR register.
2. Set ENABLE FEEDBACK = 1 and shift the k message bits via SERIAL IN
(same bits appear at SERIAL OUT as the message part of the code word).
3. Set ENABLE FEEDBACK = 0 and shift the n-k register contents
via SERIAL OUT (check bits immediately follow to complete the code word).

CRC Decoding
1. CLEAR register.
2. Set ENABLE FEEDBACK = 1 and shift the n received word bits via SERIAL IN.
3. If ERROR FLAG = 1 an error has been detected.

4.15.7 Generator Polynomials and Applications of CRC codes

We have:
R(x) = C(x) + E(x)
So:
R(x) mod G(x) = C(x) mod G(x) + E(x) mod G(x) = 0 + E(x) mod G(x)

Types of errors detected by G(x)

Thus errors will go undetected if E(x) mod G(x) = 0. That is, if E(x) is a multiple of G(x) errors
are undetected. So we choose G(x) such that its multiples look as little like the kind of noise we
expect (in terms of the error patterns) as possible. That is we want G(x) to not be a factor of E(x) if
we want to detect the types of errors E(x) represents.

NOTE: Since the CRC logic divides x^(n-k) R(x) by G(x), errors will go undetected if x^(n-k) E(x) mod G(x)
= 0, and the same observation still applies: if E(x) is a multiple of G(x) errors are undetected.

Important results for choosing a generator polynomial, G(x):


• For single-bit errors E(x) = x^i; if G(x) has 2 or more terms then it will never be a factor of
  E(x) = x^i (i.e. E(x) mod G(x) ≠ 0) and all single-bit errors will be detected. For example,
  G(x) = 1 + x.
• For double-bit errors E(x) = x^i + x^j = x^i (1 + x^(j-i)); if G(x) has 2 or more terms and it is not a
  factor of 1 + x^r for r = 1, 2, ..., n then it will never divide E(x) = x^i (1 + x^(j-i)) and all double-bit
  errors will be detected. For example, G(x) = 1 + x^14 + x^15 will not divide (is not a factor of)
  1 + x^r for any value of r below 32,768!
• It can be shown that a polynomial has (1 + x) as a factor if and only if it has an even number of
  terms. Thus for odd-bit errors E(x) has an odd number of terms and does not have (1 + x) as a
  factor. If G(x) has an even number of terms it has (1 + x) as a factor and thus G(x) will never
  divide E(x), and all odd-bit errors will be detected. For example, G(x) = 1 + x.
• If there are burst errors of length r or less then E(x) = x^i (1 + x + x^2 + ... + x^(r-1)), where i
  determines how far from the right the burst is located. If G(x) is a polynomial of degree r then a
  burst error of length ≤ r implies that E(x) has x^i and a polynomial of degree < r as factors. Since a
  lower degree polynomial (i.e. E(x)) can never be divided by a higher degree polynomial (i.e.
  G(x)), and since G(x) has 2 or more terms (so it will never divide x^i), all burst errors of length ≤ r
  bits can be detected.


Table 4.2 Standard CRC codes

    Name         (n,k)       G(x)
    CRC-12       (12+k,k)    1 + x + x^2 + x^3 + x^11 + x^12
    CRC-16       (16+k,k)    1 + x^2 + x^15 + x^16
    CRC-32       (32+k,k)    1 + x + x^2 + x^4 + x^5 + x^7 + x^8 + x^10 + x^11 + x^12 + x^16 + x^22 + x^23 + x^26 + x^32
    CRC-CCITT    (16+k,k)    1 + x^5 + x^12 + x^16
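A sketch of the CRC check as defined in these notes, in Python, using the CRC-16 generator from
Table 4.2 (this is only the mathematical remainder computation B(x) = x^(n-k) M(x) mod G(x);
practical CRC-16 implementations add conventions such as initial register values and bit reflection
that are not covered here):

    def poly_mod(a, b):
        """Remainder of GF(2) polynomial division (ints, bit i = coefficient of x^i)."""
        while a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        return a

    CRC16 = (1 << 16) | (1 << 15) | (1 << 2) | 1     # G(x) = 1 + x^2 + x^15 + x^16

    def crc_append(message, G=CRC16, r=16):
        """Append the r check bits B(x) = x^r M(x) mod G(x); message is an int of message bits."""
        return (message << r) | poly_mod(message << r, G)

    def crc_check(received, G=CRC16):
        return poly_mod(received, G) == 0            # zero remainder -> no error detected

    codeword = crc_append(0b1101011011)
    print(crc_check(codeword))                       # True
    print(crc_check(codeword ^ (0b1111 << 5)))       # False: a burst error of length 4 is detected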

Think! Textbooks on CRC codes fail to mention that G(x) must be a factor of x^n + 1. In fact the
implication seems to be that n can take on any value, and thus CRC codes are not strictly cyclic
codes.

4.15.8 BCH codes

This is one of the most important and powerful classes of linear block codes. The most common
binary BCH codes, known as primitive BCH codes, are characterised as follows:

BCH Codes:
Block length: n = 2^m - 1 for m ≥ 3
Message length: k ≥ n - mt
Min. distance: d(K) ≥ 2t + 1
Error: t-bit correction, where t < (2^m - 1)/2

Thus BCH codes are t-error-correcting codes. The Hamming single-error-correcting codes can be
described as BCH codes with t = 1.

4.15.9 Reed-Solomon Codes

The Reed-Solomon codes, also called RS codes, are an important class of non-binary BCH codes.
The encoding process maps k symbols to n symbols where each symbol is an m-bit word. Obviously
m=8 is a popular choice since this means the code operates on byte word blocks. The RS codes are
characterised as follows:

RS Codes:
Block length: n = 2^m - 1 symbols
Message length: k symbols
Parity-check length: n - k = 2t symbols
Min. distance: d(K) ≥ 2t + 1 symbols
Error: t-symbol correction


4.15.10 Convolutional Codes

In block coding, a k-bit word is mapped to an n-bit word and processed on a block-by-block basis.
This requires the encoder to shift in a k-bit block and then wait at least (n-k) cycles for the complete
n-bit code block to be generated before the next k-bit block can be shifted in. In most applications,
and in data communications in particular, the data message bits arrive continuously. Block coding
would thus require some form of buffering of the data, and a coding scheme that codes the data
"on-the-fly", in a continuous fashion, is preferable. Such codes are called convolutional codes.

Properties
• Coding process operates on data continuously rather than waiting for or buffering a block of data
• Non-systematic coding schemes are used, which makes decoding more complicated
