
Expander Graph Arguments for Message Passing


Algorithms

David Burshtein and Gadi Miller


Dept. of Electrical Engineering Systems
Tel-Aviv University
Tel-Aviv 69978, Israel

Abstract

We show how expander based arguments may be used to prove that message passing
algorithms can correct a linear number of erroneous messages. The implication of this result is
that when the block length is sufficiently large, once a message passing algorithm has corrected
a sufficiently large fraction of the errors, it will eventually correct all errors. This result is
then combined with known results on the ability of message passing algorithms to reduce the
number of errors to an arbitrarily small fraction for relatively high transmission rates. The
results hold for various message passing algorithms, including Gallager’s hard decision and
soft decision (with clipping) decoding algorithms. Our results assume low density parity check
codes based on an irregular bipartite graph.

Index Terms - Low density parity check codes, expander graph, belief propagation, iterative
decoding.

I Introduction

Low density parity check (LDPC) codes were introduced by Gallager in 1963 [1]. Gallager explored
the properties of these codes under the assumption of optimal (maximum likelihood) decoding.
He also suggested several practical iterative decoding algorithms. Recently, following the intro-
duction of turbo codes by Berrou et al. [11], LDPC codes have attracted significant academic and
commercial interest.

IEEE Transactions on Information Theory, volume 47, pp. 782-790, February 2001

LDPC codes have been shown to possess some very desirable properties [1], [3], [4]. Under the
assumption of optimal (Maximum Likelihood) decoding, it can be shown that for properly chosen
ensemble parameters these codes have an error exponent arbitrarily close to the random coding
error exponent [4].
Maximum Likelihood decoding of LDPC codes is in general not feasible. Instead, it has been
suggested to use some iterative decoding procedure [1], [2], [7], [10]. Gallager [1] proposed two
iterative decoding algorithms. The first is a hard decision decoding algorithm. The second is a soft
decision decoding algorithm, which is more complicated but at the same time more powerful. Both
algorithms are “message passing algorithms” (a precise definition will be given later). Richardson
and Urbanke [7] proposed other message passing algorithms which may be viewed as a compromise
between Gallager’s hard and soft decoding algorithms.
Sipser and Spielman [10] suggested an alternative hard decoding algorithm, which is not
a message passing algorithm. Using properties of expander graphs, Sipser and Spielman [10]
proved that their algorithm can correct a linear number of errors in the received bits. On the
other hand, for various channels and LDPC codes it was shown [2], [7], [8] that the fraction of
errors, when using message passing algorithms, can be made arbitrarily small for relatively high
transmission rates. In particular, when using the soft decoding algorithm for properly chosen
irregular LDPC codes, it was demonstrated that this property holds for transmission rates close to
channel capacity. Hence in [2], [7] it was suggested to combine these two types of algorithms, that
is to start iterating with a message passing algorithm, and then switch to Sipser and Spielman’s
algorithm (an alternative suggestion was to conclude the decoding by using some other, high rate
LDPC code at the final stage). In practice, however, it was noted that this algorithmic switch is
unnecessary (e.g. [2]).
In this paper we show that expander based arguments may be applied for message passing
algorithms as well. Hence the algorithmic switch is indeed unnecessary. The paper is organized
as follows. In Section II we provide some background information on irregular LDPC codes that
are based on random irregular bipartite graphs. We also briefly describe both Gallager’s message
passing decoding algorithms and Sipser and Spielman’s decoding algorithm. In Section III we show
how expander based arguments may be applied to the hard decoding message passing algorithm.
In Section IV we generalize our results. In particular we present a similar result for the soft
decoding algorithm (with appropriate clipping). Section V concludes the paper.

II Background

Throughout the paper we assume a binary-input, {0, 1}, memoryless channel. The following
method is used to construct an ensemble of irregular low density parity check (LDPC) codes. The
method is based on an ensemble of irregular bipartite graphs. We first specify two sequences

$$\lambda = (\lambda_{l_{\min}}, \ldots, \lambda_{l_{\max}}), \qquad \rho = (\rho_{r_{\min}}, \ldots, \rho_{r_{\max}})$$

λl is the fraction of left nodes with degree l. ρr is the fraction of right nodes with degree r.
Without loss of generality we assume that λlmin > 0, λlmax > 0, ρrmin > 0, ρrmax > 0. Hence,
lmin , lmax are the minimal and maximal left degrees. rmin , rmax are the minimal and maximal
right degrees (note: in [10], [2], [9], [8] the notation is different: λl (ρr , respectively) denotes the
fraction of edges with left (right) degree l (r)).
Now, there are N left nodes such that λl N left nodes have degree l (l = lmin , . . . , lmax ).
Similarly, there are M right nodes, such that ρr M nodes have degree r (r = rmin , . . . , rmax ). By
counting the total number of edges, E, we have,
$$E = \sum_{l=l_{\min}}^{l_{\max}} \lambda_l\, l\, N = \sum_{r=r_{\min}}^{r_{\max}} \rho_r\, r\, M$$

The E edges originating from left nodes are labeled from 1 to E. The same procedure is applied for
the E edges originating from right nodes. The ensemble of bipartite graphs is obtained by choosing
a permutation π with uniform probability from the space of all permutations of {1, 2, . . . , E}. For
each i, the edge labeled i on the left side is associated with the edge labeled πi on the right side.
This is shown in Figure 1. Note that in this way multiple edges may link a pair of nodes.
The nodes on the left side are associated with the codeword bits (variable nodes) and the
nodes on the right are associated with the parity-check equations (constraints or check nodes).
The mapping from the bipartite graph space to the parity-check matrix space is such that an
element Ai,j in the matrix, corresponding to the i’th node on the right and j’th node on the left,
is set to ‘1’ if there is an odd number of edges between the two nodes, and to ‘0’ otherwise.
The rate R of each code in the ensemble satisfies R ≥ 1 − M/N (the inequality is due to a
possible redundancy in the M parity check equations).
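To make the construction concrete, the following Python sketch (our own illustration; the function name and the dictionary format for λ and ρ are not from the paper, and the degree fractions are assumed to produce integer node counts) samples a member of the ensemble and forms the corresponding parity-check matrix:

```python
import numpy as np

def sample_irregular_graph(lam, rho, N, M, seed=None):
    """Sample a bipartite graph from the ensemble of Section II.

    lam[l] is the fraction of the N left (variable) nodes of degree l,
    rho[r] the fraction of the M right (check) nodes of degree r.
    Edge sockets on the left are matched to sockets on the right by a
    uniformly random permutation pi.  Returns the edge list and the
    parity-check matrix A, with A[j, i] = 1 iff an odd number of edges
    joins check node j and variable node i.
    """
    rng = np.random.default_rng(seed)
    left_deg = sum(([l] * int(round(f * N)) for l, f in lam.items()), [])
    right_deg = sum(([r] * int(round(f * M)) for r, f in rho.items()), [])
    assert sum(left_deg) == sum(right_deg), "left and right edge counts must match"
    left_of = np.repeat(np.arange(len(left_deg)), left_deg)     # socket -> variable node
    right_of = np.repeat(np.arange(len(right_deg)), right_deg)  # socket -> check node
    pi = rng.permutation(left_of.size)                          # the random permutation pi
    edges = list(zip(left_of.tolist(), right_of[pi].tolist()))
    A = np.zeros((len(right_deg), len(left_deg)), dtype=np.uint8)
    for i, j in edges:
        A[j, i] ^= 1    # odd number of parallel edges -> '1', even -> '0'
    return edges, A

# Example: one member of the (3,6)-regular ensemble with N = 20, M = 10.
edges, A = sample_irregular_graph({3: 1.0}, {6: 1.0}, N=20, M=10, seed=0)
```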
A special case of the irregular code ensemble that was described above is obtained by setting
lmin = lmax = l and rmin = rmax = r. We then have an l − r regular code ensemble. It was
shown [8] that by setting λ and ρ appropriately, the performance of irregular codes can be made
superior to the performance of both regular codes and turbo codes.

Gallager proposed two practical iterative decoding algorithms for LDPC codes. We first
present the hard decoding algorithm. In the following, li and rj denote the number of edges of
the i-th variable node and j-th check node, respectively. si ∈ {0, 1}, i = 1, . . . , N, is the received
value (channel output) corresponding to the i-th variable (if the channel is an arbitrary binary-
input channel, we assume that si is a binary quantized version of the channel output). Leftbound messages are
messages transmitted from a check node to a variable node. Rightbound messages are messages
transmitted from a variable node to a check node.

Algorithm 1 Gallager’s hard decoding algorithm [1]. Do the following two steps, alter-
nately (the number of iterations will be discussed later):

1. Rightbound messages. For all edges e = (i, j) do the following in parallel: If this is the
zeroth round, then set gi,j to si . Otherwise compute gi,j as follows:

(a) If more than β(li − 1) of the incoming leftbound messages (excluding the message along
edge e = (i, j)) sent the same value to i in the previous round, then set gi,j to this
value.

(b) Otherwise set gi,j to si .

In either case, i sends gi,j to j.

2. Leftbound messages. For all edges (i, j) do the following in parallel: The check node j
sends to i the parity (exclusive-or) of the values it received in this round from its adjacent
variable nodes excluding i.

Notes:

1. The value of β = βl is in general a function of ρ, λ and l.

2. The hard decoding algorithm is also called Gallager’s decoding algorithm B. Gallager’s
decoding algorithm A is a special case of this algorithm with β < 1 arbitrarily close to 1.
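The following Python sketch (our own illustration, not taken from [1]; it assumes no parallel edges and omits the final bit decision) implements the two alternating steps of Algorithm 1:

```python
def gallager_hard_decision(edges, s, beta, n_iter):
    """A sketch of Algorithm 1 (Gallager's hard decision decoding, algorithm B).

    edges : list of (i, j) pairs, i a variable node, j a check node
            (for simplicity, parallel edges are assumed absent)
    s     : received hard decisions, s[i] in {0, 1}
    beta  : threshold parameter of step 1(a)
    Returns the rightbound messages g[(i, j)] after n_iter iterations.
    """
    var_edges, chk_edges = {}, {}
    for (i, j) in edges:
        var_edges.setdefault(i, []).append(j)
        chk_edges.setdefault(j, []).append(i)
    g = {(i, j): s[i] for (i, j) in edges}                  # zeroth round: send s_i
    for _ in range(n_iter):
        # Step 2: leftbound messages -- parity of the rightbound values received
        # in this round, excluding the target variable node.
        h = {}
        for j, vs in chk_edges.items():
            total = sum(g[(i, j)] for i in vs) % 2
            for i in vs:
                h[(i, j)] = (total + g[(i, j)]) % 2         # exclude edge (i, j)
        # Step 1: rightbound messages -- majority with threshold beta*(l_i - 1)
        # (for beta >= 1/2 at most one of the two conditions below can hold).
        new_g = {}
        for i, js in var_edges.items():
            li = len(js)
            for j in js:
                ones = sum(h[(i, k)] for k in js if k != j)
                zeros = (li - 1) - ones
                if ones > beta * (li - 1):
                    new_g[(i, j)] = 1
                elif zeros > beta * (li - 1):
                    new_g[(i, j)] = 0
                else:
                    new_g[(i, j)] = s[i]
        g = new_g
    return g
```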

The second algorithm that Gallager proposed is a soft decoding algorithm, which is more
complicated and time consuming. However, it possesses improved performance compared to the
hard decoding algorithm. The soft decoding algorithm may be described as follows. Let si be the
channel output for the i-th input bit, and let mi be the following log-likelihood ratio,

$$m_i = \log\frac{p(0 \mid s_i)}{p(1 \mid s_i)} = \log\frac{p(s_i \mid 0)}{p(s_i \mid 1)}$$

where p(·|si ) denotes the a-posteriori probability of the transmitted bit given si (we have assumed
that the code is such that the a-priori probabilities of an input bit are equal).

Algorithm 2 Gallager’s soft decoding algorithm [1]. Do the following two steps, alternately
(the number of iterations will be discussed later):

1. Rightbound messages. For all edges e = (i, j) do the following in parallel: If this is the
zeroth round, then set gi,j to mi . Otherwise
$$g_{i,j} = m_i + \sum_{k \in N(i)\setminus j} h_{i,k} \qquad (1)$$

where N (i) denotes the set of neighbor nodes of i. In either case, the variable node i sends
gi,j to j.

2. Leftbound messages. For all edges (i, j) do the following in parallel: The check node j
sends i the following message, hi,j :
$$h_{i,j} = f\!\left(\prod_{k \in N(j)\setminus i} \frac{e^{g_{k,j}} - 1}{e^{g_{k,j}} + 1}\right), \qquad f(u) = \log\frac{1+u}{1-u} \qquad (2)$$

The soft decoding algorithm is also called belief propagation decoding [6]. The reasoning
behind Algorithm 2 is that in iteration n, gi,j is the log-likelihood ratio that corresponds to the
probability of the variable i given the values of the variables in the graph spanned from e = (i, j)
in 2n layers, under the assumption that this graph is tree like. This assumption will be discussed
later. The tree assumption is illustrated in Figure 2.
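The following Python sketch (our own illustration; the tanh form of (2) and the small numerical guard inside arctanh are standard reformulations rather than part of the paper, and parallel edges are assumed absent) implements the updates (1) and (2), with optional clipping of the rightbound messages as used later in Section IV:

```python
import numpy as np

def belief_propagation(edges, m, n_iter, clip=None):
    """A sketch of Algorithm 2 (Gallager's soft decoding / belief propagation).

    m[i] is the channel log-likelihood ratio of variable node i.
    Rightbound messages follow (1); leftbound messages follow (2), using
    (e^x - 1)/(e^x + 1) = tanh(x/2) and f(u) = log((1+u)/(1-u)) = 2*artanh(u).
    If `clip` is given, rightbound messages are limited to [-clip, clip].
    """
    var_edges, chk_edges = {}, {}
    for (i, j) in edges:
        var_edges.setdefault(i, []).append(j)
        chk_edges.setdefault(j, []).append(i)
    g = {(i, j): float(m[i]) for (i, j) in edges}           # zeroth round: g = m_i
    h = {e: 0.0 for e in g}
    for _ in range(n_iter):
        # Leftbound messages, Eq. (2).
        for j, vs in chk_edges.items():
            for i in vs:
                u = np.prod([np.tanh(g[(k, j)] / 2.0) for k in vs if k != i])
                u = np.clip(u, -1 + 1e-12, 1 - 1e-12)       # numerical guard only
                h[(i, j)] = float(2.0 * np.arctanh(u))
        # Rightbound messages, Eq. (1), with optional clipping.
        for i, js in var_edges.items():
            for j in js:
                val = m[i] + sum(h[(i, k)] for k in js if k != j)
                g[(i, j)] = float(np.clip(val, -clip, clip)) if clip else float(val)
    # A-posteriori LLR of each bit (used for the final hard decision).
    llr = {i: m[i] + sum(h[(i, j)] for j in js) for i, js in var_edges.items()}
    return g, llr
```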
Both algorithms 1 and 2 are “message passing algorithms” in the sense that information is
transmitted back and forth between variable and check nodes along the edges. The transmitted
message along an edge is a function of all received messages at the node except for the said edge.
This property of the algorithm ensures that the incoming messages are independent for a tree like
graph.
Sipser and Spielman [10] suggested a slightly different hard decoding algorithm, which is not
a message passing algorithm. The proof of the convergence properties of the algorithm is based
on the notion of an expander graph.
We use the following notation. For any set V of variable nodes, we denote by N (V ), the set
of neighbors (check nodes) of V . E(V ) denotes the set of edges that connect V with N (V ). In
addition, | · | denotes set cardinality, so that |N (V )| and |E(V )| are the number of neighbors and
edges of the set V , respectively.

Definition 1 A (γ, δ, ρ, λ) expander is a (ρ, λ) bipartite graph, such that for any set X of variable
nodes with |E(X)| ≤ γN , |N (X)| > δ|E(X)|.

In [10], [2] an expander is defined under the condition that |X| is small enough (|X| ≤ αN
instead of |E(X)| ≤ γN ). Note that when the underlying bipartite graph is regular, the two
definitions coincide. However, we found our definition to be more convenient for formulating our
results for irregular bipartite graphs.
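Definition 1 can be checked directly on small graphs. The following Python sketch (our own brute-force illustration; it enumerates all variable-node subsets and is therefore only feasible for toy graphs) returns whether a given graph satisfies the expansion condition:

```python
from itertools import combinations

def is_expander(edges, n_left, gamma, delta):
    """Brute-force check of Definition 1 (only feasible for very small graphs).

    Returns True iff every set X of variable nodes with |E(X)| <= gamma * n_left
    satisfies |N(X)| > delta * |E(X)|.
    """
    nbrs = {i: [] for i in range(n_left)}
    for (i, j) in edges:
        nbrs[i].append(j)
    for size in range(1, n_left + 1):
        for X in combinations(range(n_left), size):
            e_X = sum(len(nbrs[i]) for i in X)              # |E(X)|, counting multiplicity
            if e_X > gamma * n_left:
                continue
            n_X = len({j for i in X for j in nbrs[i]})      # |N(X)|
            if n_X <= delta * e_X:
                return False
    return True
```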
The parallel version of the algorithm that was proposed in [10] reads,

Algorithm 3 Sipser and Spielman, parallel decoding algorithm [10]. Iterate the following
(the number of iterations will be discussed later): For each variable node, count the number of
unsatisfied check nodes among its neighbors. Flip, in parallel, each variable node for which the
fraction of unsatisfied neighbors is larger than β.

Note: In [10], β = 1/2 is used in Algorithm 3.
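A minimal Python sketch of this iteration (our own illustration, again assuming no parallel edges) is the following:

```python
def sipser_spielman_flip(edges, x, beta, n_iter):
    """A sketch of Algorithm 3 (parallel bit-flipping decoding).

    x is the current estimate of the codeword (list of bits indexed by variable
    node).  In every round each variable node whose fraction of unsatisfied
    neighboring checks exceeds beta is flipped, all flips performed in parallel.
    """
    var_edges, chk_edges = {}, {}
    for (i, j) in edges:
        var_edges.setdefault(i, []).append(j)
        chk_edges.setdefault(j, []).append(i)
    x = list(x)
    for _ in range(n_iter):
        # A check node is unsatisfied if the parity of its neighbors is odd.
        unsat = {j: sum(x[i] for i in vs) % 2 == 1 for j, vs in chk_edges.items()}
        flips = [i for i, js in var_edges.items()
                 if sum(unsat[j] for j in js) > beta * len(js)]
        if not flips:
            break
        for i in flips:
            x[i] ^= 1
    return x
```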


Sipser and Spielman [10] proved that their algorithm can correct a linear number of errors in
the received bits. As was noted in Section I, this led [2], [7] to the suggestion to combine message
passing algorithms with Algorithm 3. That is, to start iterating with a message passing algorithm,
and then switch to Algorithm 3. The resulting combined algorithm can be used for decoding at
relatively high rates. In the next two sections we show that as long as N is large enough, this
switch is unnecessary (for lmin > 5).

III Expander arguments for Gallager’s hard decoding algorithm

Theorem 1 Consider a (γ, δ, ρ, λ) expander. Let β̃ = β + (1 − β)/lmin and suppose that β ≥ 1/2,
β̃ ≤ 2δ − 1. Let N0 be some integer such that

$$N_0 \le \frac{\delta - \tilde\beta}{(1-\tilde\beta)\, l_{\max}}\,\gamma N - \frac{\delta}{1-\tilde\beta} \qquad (3)$$
Then Gallager’s hard decoding algorithm will correct any N0 errors in the received values, after
at most $\lceil \log_{(1-\tilde\beta)/[2(1-\delta)]} N_0 \rceil + 1$ decoding rounds ($\lceil x \rceil$ denotes the smallest integer greater than
or equal to x).

Proof: We say that a variable node is good if all the rightbound messages it transmits are correct.
Otherwise, the variable node is bad. At the beginning of the iterations, the set of bad nodes is the
set of variable nodes with erroneous received values. Consider some variable node with l edges.
Note that by definition of the decoding algorithm, if this variable node has more than β(l − 1) + 1

correct incoming messages then all its outgoing messages must be correct, i.e. the variable node
is good. To put it differently, if this variable node is bad then it has at most β(l − 1) + 1 incoming
messages which are correct.
We prove the theorem by showing that the number of rightbound messages transmitted by
bad variable nodes (number of edges that connect the set of bad variable nodes) is monotonically
decreasing under the conditions of the theorem.
Let Xn denote the set of bad variable nodes at the beginning of the n-th iteration of the
algorithm. The set of bad variable nodes after the n-th iteration, Xn+1 , is comprised of two
disjoint sets:

1. Variable nodes Xn+1 \ Xn that were good before the iteration, but turned to be bad after
the iteration.

2. Variable nodes Xn+1 ∩ Xn that were bad both before and after the iteration.

First note that for all i

β(li − 1) + 1 = βli + 1 − β ≤ βli + (1 − β)li /lmin = β̃li

Let Q denote the set of neighbors of Xn+1 \ Xn that are not neighbors of Xn . Then |Q| ≤
β̃|E(Xn+1 \ Xn )| (Otherwise, at least one variable node, i ∈ Xn+1 \ Xn , is connected by more than
β̃li edges to Q. Therefore this variable node receives more than β(li − 1) + 1 correct messages.
Hence this node cannot be bad at the end of the iteration). Hence,

|N (Xn+1 ∪ Xn )| ≤ |N (Xn )| + β̃|E(Xn+1 \ Xn )| (4)

Further, suppose that |E(Xn+1 ∪ Xn )| ≤ γN . Then by the expansion property,

|N (Xn+1 ∪ Xn )| > δ · |E(Xn+1 ∪ Xn )| (5)

Combining (4) and (5), and noting that |E(Xn+1 ∪ Xn )| = |E(Xn+1 \ Xn )| + |E(Xn )| and β̃ ≤
2δ − 1 < δ yields
$$|E(X_{n+1} \setminus X_n)| < \frac{|N(X_n)| - \delta\,|E(X_n)|}{\delta - \tilde\beta} \qquad (6)$$
We thus conclude that (6) holds provided that |E(Xn+1 ∪ Xn )| ≤ γN . We now claim that
|E(Xn+1 ∪ Xn )| ≤ γN . Otherwise, there exists some A ⊂ Xn+1 \ Xn such that γN − lmax <
|E(A ∪ Xn )| ≤ γN (this follows from the fact that |E(Xn )| ≤ N0 lmax < γN , where the second
inequality is due to (3) and the fact that δ < 1). Now, applying (4) and (5) with A in place of
Xn+1 \ Xn (so that A ∪ Xn replaces Xn+1 ∪ Xn ), we have

δ(γN − lmax ) < |E(Xn )| + β̃|E(A)| ≤ |E(Xn )| + β̃(γN − |E(Xn )|)

Hence,
$$|E(X_n)| > \frac{\delta - \tilde\beta}{1 - \tilde\beta}\,\gamma N - \frac{\delta\, l_{\max}}{1 - \tilde\beta}$$
The last inequality contradicts (3) ((3) holds for the first iteration. For subsequent iterations, we
show that the number of rightbound messages transmitted by bad variable nodes is monotonically
decreasing. Hence |E(Xn )| ≤ γN (δ − β̃)/(1 − β̃) − δlmax /(1 − β̃) for all n ≥ 1). Hence (6) holds
under the conditions of the theorem.
Now, let element i ∈ Xn+1 ∩ Xn have li edges, such that at step 2 of the iteration ai edges
carry correct leftbound messages, and bi edges carry erroneous leftbound messages. We know that
ai + bi = li and ai ≤ β(li − 1) + 1 ≤ β̃li . Let T denote the set of check nodes that are neighbors of
Xn+1 ∩Xn , but are not neighbors of Xn \Xn+1 . Each element in T that is connected to Xn+1 ∩Xn
by some edge that carries an erroneous leftbound message must be connected to Xn+1 ∩ Xn by at
least one more edge. To see that, note that all incorrect rightbound messages come from nodes in
Xn . Since Xn \ Xn+1 is not connected to T , all incorrect messages into T come from Xn+1 ∩ Xn .
Therefore, if a leftbound message from some check node in T to Xn+1 ∩ Xn is incorrect then this
check node in T must be connected to Xn+1 ∩ Xn by at least one additional edge. Thus,
$$|T| \le \sum_i a_i + \frac{1}{2}\sum_i b_i = \frac{1}{2}\sum_i l_i + \frac{1}{2}\sum_i a_i \le \frac{1+\tilde\beta}{2}\,|E(X_{n+1}\cap X_n)|$$

The first inequality follows from the fact that for every incorrect leftbound message from T to
Xn+1 ∩ Xn we can find another edge from T to Xn+1 ∩ Xn . This pairing up yields the factor 1/2.
We thus conclude that,

$$|N(X_n)| \le |E(X_n \setminus X_{n+1})| + \frac{1+\tilde\beta}{2}\,|E(X_{n+1}\cap X_n)|$$

Using |E(Xn \ Xn+1 )| = |E(Xn )| − |E(Xn+1 ∩ Xn )| in the last equation yields

$$|E(X_{n+1}\cap X_n)| \le \frac{2}{1-\tilde\beta}\left(|E(X_n)| - |N(X_n)|\right) \qquad (7)$$
Combining (6) and (7) yields
$$|E(X_{n+1})| < \left(\frac{2}{1-\tilde\beta} - \frac{\delta}{\delta-\tilde\beta}\right)|E(X_n)| + \left(\frac{1}{\delta-\tilde\beta} - \frac{2}{1-\tilde\beta}\right)|N(X_n)| \qquad (8)$$

Now since β̃ ≤ 2δ−1, the term that multiplies |N (Xn )| in the right hand side of (8) is non-positive.
Hence, using the expansion property, |N (Xn )| > δ|E(Xn )|, in (8) yields

$$|E(X_{n+1})| < \frac{2(1-\delta)}{1-\tilde\beta}\,|E(X_n)| \le |E(X_n)|$$
This proves the theorem. □

Comparing our proof to the proof of Theorem 11 in [10], we see that the generalization to
message passing algorithms is achieved by using the definition of bad variable nodes, instead of
corrupt (erroneous) variables. The generalization to irregular graphs is achieved by showing the
decrease of the number of edges that connect the set of bad nodes, instead of considering the
number of corrupt nodes as in [10], [2]. Note that our proof does not utilize case (b) in stage 1 of
Algorithm 1. In fact when the condition in case (a) is not satisfied, we assume that gi,j is set to
the wrong value.
Theorem 1 asserts that if β̃ ≤ 2δ − 1 and β ≥ 1/2 then Algorithm 1 will correct a linear
number of errors, provided that the graph is a (γ, δ, ρ, λ) expander for some γ > 0. On the other
hand, in Appendix A we prove the following Lemma:

Lemma 1 Consider an irregular ρ, λ bipartite graph B that is chosen at random as was described
above. We assume that all right nodes have degree at most rmax (independent of N ), and all left
nodes have degree at least lmin . Then with probability 1 − o(N )/N , for any δ < 1 − 1/lmin and
γ > 0 small enough, B is a (γ, δ, ρ, λ) expander.

In particular, we see that if the left degree of all nodes is at least 5, then B is a (γ, δ, ρ, λ) expander
with δ > 3/4. The last claim is Lemma 2 in [2]. Lemma 1 is more powerful than Theorem 26
in [10], which is based on Azuma’s inequality.
Combining Theorem 1 and Lemma 1 we see that a sufficient condition for Algorithm 1 to
correct a linear number of errors with probability 1 − o(N )/N is

$$\frac{1}{2} \le \beta < 1 - \frac{2}{l_{\min} - 1}$$

In particular, choosing β = 1/2 ensures that this property will be satisfied for any lmin > 5. We
also see from Theorem 1 that β = 1/2 yields the best bound for the convergence rate. This agrees
with Gallager’s observation [1][p. 50-51] that for sufficiently advanced iterations, the best value
of β is 1/2.
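As a worked numeric illustration of these conditions (ours, not from the paper): take lmin = 6 and β = 1/2. Then
$$\tilde\beta = \frac{1}{2} + \frac{1 - 1/2}{6} = \frac{7}{12}, \qquad \frac{1+\tilde\beta}{2} = \frac{19}{24} \approx 0.79,$$
so the requirement β̃ ≤ 2δ − 1 amounts to δ ≥ 19/24, which is compatible with Lemma 1 since 19/24 < 1 − 1/6 = 5/6. Choosing, say, δ = 0.8 gives a contraction factor of $2(1-\delta)/(1-\tilde\beta) = 0.96 < 1$ in the proof of Theorem 1, so the number of erroneous rightbound messages shrinks geometrically (if slowly) from one iteration to the next.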
The result of Theorem 1 is also applicable for the case that instead of initial left-node values, we
are given rightbound messages, a sufficiently large portion of which are correct. Indeed, suppose
that at some stage of the algorithm, the number of rightbound messages that are in error is smaller
than φN , where φ is small enough. Then the number of bad variable nodes is upper bounded by
φN . Hence, by the proof of Theorem 1, after a sufficient number of iterations all messages will
be correct. This result may be combined with the following. It has been shown [1],[9] that for
various graph parameters and values of β, a threshold probability p∗1 > 0 exists such that for any

channel crossover probability p < p∗1 , as long as the tree assumption holds the probability of error
of the root message approaches 0 as the tree-depth grows to infinity.
Combining these two results, we may now prove a stronger claim, that for any crossover prob-
ability p < p∗1 Gallager’s hard decoding algorithm recovers all errors with probability 1 − o(N )/N .
This can be shown as follows:
Denote by Ai the event that a specific edge ei carries an erroneous rightbound message after
n iterations (ei is the edge labeled i on the left. Recall that this edge is matched with the one
labeled πi on the right. Thus, Pr(Ai ) is independent of i). Further denote by Bi the event that
the graph which spans from ei is tree-like in the first 2n layers. By the definition of p∗1 , for any
crossover probability p < p∗1 , Pr(Ai | Bi ) → 0 as n → ∞. It is also clear [7], that for a given n,
Pr(Bi ) → 1 as N → ∞. Now, since
$$\Pr(A_i) = \Pr(A_i, B_i) + \Pr(A_i, B_i^c) \le \Pr(A_i \mid B_i) + \Pr(B_i^c),$$
it follows that Pr(Ai ) is arbitrarily small for n and N large enough. Let X be a random variable
indicating the number of erroneous rightbound messages after n iterations. Then
$$EX = E\sum_{i=1}^{N} I_{A_i} = \sum_{i=1}^{N} E\, I_{A_i} = \sum_{i=1}^{N} \Pr(A_i) = N\,\Pr(A_1)$$

Thus, by the discussion above, EX/N can be made arbitrarily small by choosing n and N large
enough (i.e. limn→∞ limN →∞ ). Hence, by Markov’s inequality, for any φ > 0 we have Pr(X <
φN ) → 1 as n, N → ∞. However, as was mentioned above, when X < φN and φ is sufficiently
small, the decoding algorithm will always terminate successfully. Thus, a complete successful
decoding is achievable with probability 1 − o(N )/N .
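Explicitly, the Markov-inequality step reads
$$\Pr(X \ge \phi N) \le \frac{EX}{\phi N} = \frac{1}{\phi}\cdot\frac{EX}{N},$$
which, for any fixed φ > 0, is made arbitrarily small by the choice of n and N above.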
On the other hand, denoting p∗2 = γ(δ − β̃)/[lmax (1 − β̃)] and employing the law of large
numbers, Theorem 1 implies that for crossover probability p < p∗2 , Gallager’s hard decoding
algorithm successfully decodes all message bits. We have thus proved:

Theorem 2 Consider a ρ, λ irregular bipartite graph chosen at random as was described in Sec-
tion II. Suppose that 1/2 ≤ β < 1 − 2/(lmin − 1). Let p∗1 be the threshold probability such that
for any crossover probability p < p∗1 , the probability of error for the rightbound message of specific
edge ei approaches zero as the graph depth tends to infinity, under the assumption that the graph
spanned by ei to this depth is tree-like. Furthermore, let p∗2 be defined as above.
Then for any crossover probability p < max{p∗1 , p∗2 } the probability that Gallager’s hard decod-
ing algorithm fails to correctly decode all bits approaches 0 as N → ∞.
Note: The probability space is comprised both of the channel and the code structure.

Typically p∗1 > p∗2 so that the proof of Theorem 2 utilizes both the tree based and the expander
based arguments.
Sipser and Spielman proved the convergence of Algorithm 3 for regular bipartite graphs, by
showing that the number of corrupt variables (variable nodes with wrong values) is monotonically
decreasing. In order to prove the convergence of the algorithm for the general case of irregular
bipartite graphs, we need to consider the number of edges that connect the set of corrupt variables,
and show that this number is monotonically decreasing. Following the proof of Theorem 1 (with
corrupt variables instead of bad variable nodes) it can be shown that for 1 − 2δ/3 < β ≤ 2δ − 1,
Algorithm 3 will correct any N0 errors after at most $\lceil \log_{(1-\beta)/[2(1-\delta)]} N_0 \rceil + 1$ decoding rounds,
where
$$N_0 \le \frac{\delta + \beta - 1}{\beta\, l_{\max}}\,\gamma N - \frac{\delta}{\beta}$$
Combining this result with Lemma 1 we see that a sufficient condition for Algorithm 3 to correct
a linear number of errors with probability 1 − o(N )/N for the case β = 1/2 is lmin ≥ 5.

IV Expander arguments for Gallager’s soft decoding algorithm

In this section we show that the techniques used in the previous section may be used for the
soft decoding (belief propagation) algorithm as well. To this end, we formulate the results of
section III for a wider class of message passing algorithms.

Definition 2 A message passing algorithm is an algorithm on an irregular bipartite graph,
proceeding as follows:

1. Initially, values gi,j are associated with each edge (i, j). These values are interpreted as
rightbound messages.

2. Then, the following two steps are performed alternately.

(a) In parallel, the new value hi,j of each edge (i, j) is set to a function of all rightbound
messages of edges (k, j), k 6= i. These values are interpreted as leftbound messages.

(b) In parallel, the new value gi,j of each edge (i, j) is set to a function of i and all leftbound
messages of edges (i, k), k 6= j. These values are interpreted as rightbound messages.

Definition 3 Let Ri,j : R → {good, bad} and let Li,j : R → {good, bad} be two functions associ-
ating the rightbound and leftbound message of each edge with a ‘goodness’ property, respectively.

A message passing algorithm is said to be good if functions Ri,j and Li,j exist such that the
following two properties hold:

1. At step (2a), for each (i, j), if all rightbound messages (k, j), k 6= i, are good (Rk,j (gk,j ) =
‘good’), then the resulting leftbound message (i, j) is good (Li,j (hi,j ) = ‘good’) .

2. At step (2b), for each left-node i, if more than β̃ li of the incoming leftbound messages (i, j)
are good (Li,j (hi,j ) = ‘good’), then all the resulting rightbound messages (i, k) are good
(Ri,k (gi,k ) = ‘good’ for all k).

Retracing the proof of Theorem 1, the following generalization can be made:

Theorem 3 Consider a (γ, δ, ρ, λ) expander. Suppose that β̃ ≤ 2δ − 1. Let N0 be some integer
satisfying (3). If at some stage of a good message passing algorithm, the number of bad rightbound
messages is at most N0 , then after at most dlog(1−β̃)/[2(1−δ)] N0 e+1 rounds, all rightbound messages
will be good.

To prove the Theorem we first say that a variable node is good if all the rightbound messages
it transmits are good. Otherwise, that variable node is bad. We then show that the number of
edges that connect the set of bad nodes is monotonically decreasing, by following the arguments
of Theorem 1.
To complete our argument, we now show that the soft decoding algorithm (Algorithm 2) is indeed a good
message passing algorithm. In practice, Algorithm 2 needs some modification, since the messages
tend to increase to infinity as the iteration index increases (the probabilities that the algorithm
assigns tend to approach either 0 or 1). Hence, it is common practice to clip the messages. We
assume that after some number, n, of iterations, we clip the absolute value of the rightbound
messages to some sufficiently large value, K. In fact, it is this modification that makes it possible
to extend our results to the soft decoding algorithm. For convenience we also assume that the
uncoded log-likelihood ratios, mi are bounded by some constant (independent of N ). In practice
the validity of this assumption follows from the fact that the channel output si is typically bounded
(e.g. a quantized channel).
As a consequence of the clipping assumption we have the following. First, from Equation (2)
we see that:
$$\mathrm{sign}(h_{i,j}) = \prod_{k\in N(j)\setminus i} \mathrm{sign}(g_{k,j}), \qquad |h_{i,j}| = f\!\left(\prod_{k\in N(j)\setminus i} \frac{e^{|g_{k,j}|}-1}{e^{|g_{k,j}|}+1}\right)$$

Hence if |gk,j | = K for all k ∈ N (j) \ i then |hi,j | = K̃j where
$$\tilde K_j = f\!\left(\left[\frac{e^K - 1}{e^K + 1}\right]^{r_j - 1}\right)$$
Furthermore,
$$\lim_{K\to\infty} \frac{\tilde K_j}{K} = 1$$
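The limit can be seen from a rough asymptotic expansion (our own sketch, not spelled out in the paper; assume rj ≥ 2): since $(e^K - 1)/(e^K + 1) = 1 - 2e^{-K} + O(e^{-2K})$, raising it to the power $r_j - 1$ gives $1 - 2(r_j - 1)e^{-K} + O(e^{-2K})$, and since $f(u) = \log\frac{2}{1-u} + O(1-u)$ as $u \to 1$, we obtain
$$\tilde K_j = K - \log(r_j - 1) + o(1) \qquad \text{as } K \to \infty,$$
so indeed $\tilde K_j / K \to 1$.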

In addition to that, suppose that |gk,j | ≤ K for k ∈ N (j) \ i. Then, since both (e^x − 1)/(e^x + 1)
and f (·) are monotonically increasing,
$$|h_{i,j}| \le \tilde K_j$$

We now define the functions Li,j (·) and Ri,j (·) as follows. We say that a rightbound (leftbound)
message is good if it has the correct sign and absolute value K (respectively K̃j , where j is the
index of the check node that transmits the message) after the clipping. Otherwise the message is
bad.
From (1) and the discussion above we see that for K large enough, if more than (li − 1)/2 + 1
of the incoming leftbound messages of some variable node i are good, then all the rightbound
messages gi,k ∀k will be good. Hence, property (2) in Definition 3 holds for β̃ = 1/2 + 1/(2lmin ).
In addition to that, it is easy to see that property (1) in Definition 3 holds as well. Thus, by
Theorem 3, if after n iterations there are φN bad rightbound messages, where φ is sufficiently
small, then eventually all messages will be good and hence correct (provided that we have the
required graph expansion). We summarize our results by the following Theorem.

Theorem 4 Consider a ρ, λ irregular bipartite graph chosen at random as was described in
Section II. Suppose that lmin > 5. Further consider Gallager’s soft decoding algorithm with
clipping of rightbound messages to a sufficiently large value K after a sufficiently large num-
ber of iterations n. Let e = (i, j) be some edge, and suppose that the graph spanned by e to
depth 2n is tree like. Let gi,j be the rightbound message transmitted on e after n iterations.
The algorithm is operated on a binary-input memoryless channel such that for all K′ > 0,
Pr{|gi,j | > K′ and sign gi,j is correct} → 1, as n → ∞, assuming a tree like graph up to depth 2n.
Then the probability that the algorithm fails to correctly decode all bits approaches 0 as N → ∞.
Note: The probability space is comprised both of the channel and the code structure.

In fact the condition in Theorem 4 can be weakened by using the following lemma.

Lemma 2 The following three conditions are equivalent for Algorithm 2 under the tree like graph
assumption:

(c1) For all K′ > 0, Pr{|gi,j | > K′ and sign gi,j is correct} → 1 as n → ∞.

(c2) For all K′ > 0, Pr{|gi,j | > K′} → 1 as n → ∞.

(c3) Pr{sign gi,j is correct} → 1 as n → ∞.

Proof: It is obvious that condition (c1) implies (c2) and (c3). Also, (c2) and (c3) together
imply (c1). To complete the proof we show that (c2) implies (c3) and that (c3) implies (c2). For
brevity, we denote gi,j by g. Let E denote a decoding error event, i.e. E = {sign g is erroneous}.
Using these notations, (c3) states that Pr(E) → 0 as n → ∞.
Since we assume a tree like graph, g is the log-likelihood ratio so that

$$\log\frac{\Pr(0 \mid g = x)}{\Pr(1 \mid g = x)} = x$$

Now, Pr(E | g = x) = min {Pr(0 | g = x), Pr(1 | g = x)}. Hence

$$\log\frac{1 - \Pr(E \mid g = x)}{\Pr(E \mid g = x)} = |x| \qquad \forall x \in \mathbb{R} \qquad (9)$$

(c2)⇒(c3): Using (9) we have:

$$\Pr(E \mid g = x) \le \frac{\Pr(E \mid g = x)}{1 - \Pr(E \mid g = x)} = e^{-|x|} \qquad (10)$$

Let K′ > 0 be any positive number. Since the function e^{−|x|} takes its maximum in the set
{|x| ≥ K′} at x = ±K′, we have from (10): Pr(E | |g| ≥ K′) ≤ e^{−K′}. Hence,
$$\Pr(E) \le \Pr(E \mid |g| \ge K') + \Pr(|g| \le K') \le e^{-K'} + \Pr(|g| \le K')$$

(c3) now follows from (c2) by setting K′ large enough.


(c3)⇒(c2): Since Pr(E | g = x) ≤ 1/2 for all x, (9) implies

$$\Pr(E \mid g = x) \ge \frac{1}{2}\cdot\frac{\Pr(E \mid g = x)}{1 - \Pr(E \mid g = x)} = \frac{1}{2}\,e^{-|x|} \qquad (11)$$

Let K′ > 0 be any positive number. Since the function e^{−|x|} takes its minimum in the interval
[−K′, K′] at the boundaries, (11) implies Pr(E | |g| ≤ K′) ≥ e^{−K′}/2. Hence, we have:
$$\Pr(E) \ge \Pr(|g| \le K')\,\Pr(E \mid |g| \le K') \ge \frac{1}{2}\,\Pr(|g| \le K')\,e^{-K'}$$
Thus, Pr(|g| ≤ K′) ≤ 2e^{K′} Pr(E). Hence, Pr(E) → 0 implies Pr(|g| ≤ K′) → 0 for any given
K′ > 0. □
As was noted previously for the hard decoding algorithm, a threshold probability p∗1 > 0 exists
such that for any channel crossover probability p < p∗1 , as long as the tree assumption holds, the

probability of error of the root message approaches 0 as the tree depth grows to infinity. Now,
under the tree assumption, the soft decoding algorithm yields the optimal (Maximum Likelihood)
estimate of the root message given the values of the variables in that tree. Hence the corresponding
probability of error should be smaller than the probability of error of any other algorithm (e.g.
the hard decoding algorithm) that considers only that tree’s variables.
Suppose that some binary quantized version of the channel is such that the probability of
uncoded error is less than p∗1 . From the discussion above, the error probability of sign gi,j
approaches 0 as the tree depth goes to infinity. Hence condition (c3) in Lemma 2 is satisfied, so
that the condition in Theorem 4 is satisfied. Thus we see that Algorithm 2 with appropriate
clipping is at least as good as Algorithm 1 for any binary quantization of the channel output.
In order to examine whether the condition in Theorem 4 is satisfied for some given LDPC
ensemble and channel, we need to evaluate the evolution of the message distribution as in [7].
In fact, this evaluation can be made for Algorithm 2, when clipping to some value K is applied
from the beginning. We note that even though in this case the messages are not the log-likelihood
ratios (even under the tree assumption), Theorem 4 still holds.

V Conclusions

We showed how expander based arguments may be used to prove that message passing algorithms
can correct a linear number of erroneous messages. The implication of this result is that once a
message passing algorithm has corrected most of the errors, it will eventually correct all errors.
In the previous sections we considered two message passing algorithms. We note however, that
our results apply to other message passing algorithms, such as the algorithms proposed in [7].
These algorithms may be viewed as a compromise between Gallager’s hard and soft message
passing algorithms. To this end, we define bad and good leftbound and rightbound messages,
such that a necessary condition for goodness is correctness, and then apply Theorem 3.
In order to apply our results to the soft decoding algorithm, a clipping assumption was im-
posed. It remains an open question whether this modification in the algorithm is really necessary.

Appendix

A Proof of Lemma 1

To prove Lemma 1 we first state and prove an auxiliary Lemma.

Lemma 3 Let X1 , . . . , XN be binary {0, 1} random variables, such that for all 1 ≤ i ≤ N,
$$\Pr\left\{X_i = 1 \mid X_1^{i-1} = x_1^{i-1}\right\} \le \epsilon \qquad \forall i,\, x_1^{i-1} \qquad (12)$$
where X_1^{i−1} denotes the vector (X1 , X2 , . . . , Xi−1 ). Then for any δ > ε,
$$\Pr\left\{\sum_{i=1}^{N} X_i \ge \delta N\right\} \le \exp\left\{\left(\delta \log\frac{\epsilon}{\delta} + (1-\delta)\log\frac{1-\epsilon}{1-\delta}\right) N\right\} \qquad (13)$$
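Note that the exponent in (13) equals $-D(\delta\,\|\,\epsilon)\,N$, where
$$D(\delta\,\|\,\epsilon) = \delta\log\frac{\delta}{\epsilon} + (1-\delta)\log\frac{1-\delta}{1-\epsilon}$$
is the binary Kullback-Leibler divergence; Lemma 3 is thus a Chernoff-type bound that remains valid for dependent variables, as long as each conditional probability of a ‘1’ is at most ε.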

Proof: For any s > 0,
$$\Pr\left\{\sum_{i=1}^{N} X_i \ge \delta N\right\} = \Pr\left\{e^{s\sum_i X_i} \ge e^{s\delta N}\right\} \le e^{-s\delta N}\, E\, e^{s\sum_i X_i} = e^{-s\delta N}\, E\left(\prod_{i=1}^{N} e^{sX_i}\right) \qquad (14)$$

where the second transition is due to the Markov inequality. Now, for 1 ≤ n ≤ N we have, since
e^{sx} is positive and using (12),
$$E\left(\prod_{i=1}^{n} e^{sX_i}\right) = \sum_{x_1^n} \Pr(X_1^{n-1} = x_1^{n-1})\,\Pr(X_n = x_n \mid X_1^{n-1} = x_1^{n-1}) \prod_{i=1}^{n} e^{sx_i}$$
$$= \sum_{x_1^{n-1}} \Pr(X_1^{n-1} = x_1^{n-1}) \left\{\prod_{i=1}^{n-1} e^{sx_i}\right\} \sum_{x_n} \Pr(X_n = x_n \mid X_1^{n-1} = x_1^{n-1})\, e^{sx_n}$$
$$\le E\left(\prod_{i=1}^{n-1} e^{sX_i}\right)(\epsilon e^s + 1 - \epsilon)$$

Hence by using the last inequality repeatedly for n = N, N − 1, . . . , 1 we obtain
$$E\left(\prod_{i=1}^{N} e^{sX_i}\right) \le (\epsilon e^s + 1 - \epsilon)^N \qquad (15)$$

Combining (14) and (15) we have
$$\Pr\left\{\sum_{i=1}^{N} X_i \ge \delta N\right\} \le e^{f(s)N}$$
where
$$f(s) = -s\delta + \log(\epsilon e^s + 1 - \epsilon) \qquad (16)$$

This bound holds for all s > 0. In order to obtain the tightest bound we seek the s that minimizes
f (s). By straightforward differentiation it is easy to verify that the minimizing s is
$$s = \log\frac{\delta(1-\epsilon)}{\epsilon(1-\delta)}$$
(note that s > 0, since δ > ε). The corresponding f (s) is
$$f(s) = \delta \log\frac{\epsilon}{\delta} + (1-\delta)\log\frac{1-\epsilon}{1-\delta}$$
This yields (13). □
We are now ready to prove Lemma 1.
Proof of Lemma 1. We use the following notation. $p^A_{\gamma,\delta}$ denotes the probability that some specific
set, A, of left nodes in the graph with γN edges has at most δγN neighboring right nodes. $\Delta_\gamma$
is the total number of ways to fix sets A of left nodes with γN edges. To express $\Delta_\gamma$, note that
λl N is the total number of left nodes with degree l in the graph, and let λ̃l N be the number of
left nodes with degree l in A. We also denote $\lambda_{\min} \stackrel{\Delta}{=} \lambda_{l_{\min}}$, $\lambda_{\max} \stackrel{\Delta}{=} \lambda_{l_{\max}}$. Then,
$$\Delta_\gamma = \sum_{(\tilde\lambda_{\min}N, \ldots, \tilde\lambda_{\max}N)\ \text{s.t.}\ \sum_l l\tilde\lambda_l = \gamma} \binom{\lambda_{\min}N}{\tilde\lambda_{\min}N} \cdots \binom{\lambda_{\max}N}{\tilde\lambda_{\max}N}$$
where in the summation above the $\tilde\lambda_l N$ are all integers. Recall that $\binom{a}{b} \le e^{a h(b/a)}$ where h(x) is the
entropy function,
$$h(x) = -x\log x - (1-x)\log(1-x)$$
Hence,
$$\Delta_\gamma \le e^{\theta N}\left|\left\{(\tilde\lambda_{\min}N, \ldots, \tilde\lambda_{\max}N)\ \text{s.t.}\ \sum_l l\tilde\lambda_l = \gamma\right\}\right|$$
where
$$\theta = \max_{(\tilde\lambda_{\min}, \ldots, \tilde\lambda_{\max})\ \text{s.t.}\ \sum_l l\tilde\lambda_l = \gamma}\ \sum_l \lambda_l\, h(\tilde\lambda_l/\lambda_l) \qquad (17)$$
Now, since $\sum_l l\tilde\lambda_l = \gamma$, we have $\tilde\lambda_l N \le \gamma N$ for $l = l_{\min}, \ldots, l_{\max}$. Hence,
$$\Delta_\gamma \le e^{\theta N}\,(\gamma N)^{(l_{\max} - l_{\min} + 1)} \qquad (18)$$

To solve the constrained optimization (17) we define the Lagrange function $\theta_L(\tilde\lambda_{\min}, \ldots, \tilde\lambda_{\max})$ as
follows,
$$\theta_L(\tilde\lambda_{\min}, \ldots, \tilde\lambda_{\max}) = \sum_l \lambda_l\, h(\tilde\lambda_l/\lambda_l) + \phi \sum_l l\tilde\lambda_l$$
where φ is a Lagrange multiplier. Differentiating $\theta_L(\tilde\lambda_{\min}, \ldots, \tilde\lambda_{\max})$ with respect to $\tilde\lambda_l$ and equating
to 0 yields
$$\tilde\lambda_l = \frac{\lambda_l}{1 + e^{-l\phi}}$$
Hence the solution of the constrained optimization (17) is given by
$$\theta = \sum_{l=l_{\min}}^{l_{\max}} \lambda_l\, h\!\left(\frac{1}{e^{-\phi l}+1}\right) \qquad \text{where} \qquad \gamma = \sum_{l=l_{\min}}^{l_{\max}} \frac{l\lambda_l}{e^{-\phi l}+1}$$
(The solution is indeed a global maximum due to the concavity of the objective function in
$(\tilde\lambda_{\min}, \ldots, \tilde\lambda_{\max})$.) Now, by setting γ > 0 small enough, φ is made an arbitrarily large negative
number (φ → −∞). Hence it is easy to see that
$$\theta \to \frac{\gamma}{l_{\min}} \log\frac{l_{\min}\lambda_{\min}}{\gamma} \qquad \text{as} \qquad \gamma \to 0 \qquad (19)$$

In order to evaluate $p^A_{\gamma,\delta}$, we assume, without loss of generality, that the specific set of left
nodes are the first vertices on the left. We order the edges according to their position on the left
side of the graph, i.e. the first edge is the first edge of the first left node, etc. Let $\tilde X_i$ be a random
variable such that $\tilde X_i = 1$ if the $i$-th edge is connected to a right node which is not connected to
any previous edge (i.e., an edge $j$ with $j < i$). Set $X_i = 1 - \tilde X_i$. Let $E = \sum_{l=l_{\min}}^{l_{\max}} \lambda_l l N \stackrel{\Delta}{=} \bar{l} N$ be the
total number of edges in the graph; $l_{\min} \le \bar{l} \le l_{\max}$ is the average degree of a left node. Now, for
any binary vector $x_1^{i-1}$,
$$\Pr\left\{\tilde X_i = 1 \mid X_1^{i-1} = x_1^{i-1}\right\} \ge \frac{E - r_{\max}(i-1)}{E - (i-1)} \qquad (20)$$
(Let $Y_i$ denote the right node to which the $i$-th edge is connected. Then for any $y_1^{i-1}$,
$\Pr\{\tilde X_i = 1 \mid Y_1^{i-1} = y_1^{i-1}\} \ge [E - r_{\max}(i-1)]/[E - (i-1)]$. The minimal conditional probability
is obtained when $y_1^{i-1}$ is a set of $i-1$ different check node indices, such that all have degree $r_{\max}$.
(20) follows immediately.) Hence, for $1 \le i \le \gamma N$,
$$\Pr\left\{X_i = 1 \mid X_1^{i-1} = x_1^{i-1}\right\} \le \frac{r_{\max} - 1}{\bar{l}/\gamma - 1}$$

Employing Lemma 3 we now have,
$$p^A_{\gamma,\delta} = \Pr\left\{\sum_{i=1}^{\gamma N} \tilde X_i \le \delta\gamma N\right\} = \Pr\left\{\sum_{i=1}^{\gamma N} X_i \ge (1-\delta)\gamma N\right\}$$
$$\le \exp\left\{\left[(1-\delta)\log\frac{r_{\max}-1}{(1-\delta)[\bar{l}/\gamma - 1]} + \delta\log\frac{\bar{l}/\gamma - r_{\max}}{\delta(\bar{l}/\gamma - 1)}\right]\gamma N\right\} \qquad (21)$$
provided that $1 - \delta > (r_{\max}-1)/(\bar{l}/\gamma - 1)$. Define
$$g(\gamma) \stackrel{\Delta}{=} \theta/\gamma + (1-\delta)\log\frac{r_{\max}-1}{(1-\delta)[\bar{l}/\gamma - 1]} + \delta\log\frac{\bar{l}/\gamma - r_{\max}}{\delta(\bar{l}/\gamma - 1)} \qquad (22)$$

Using (19) it is easy to verify that if δ < 1 − 1/lmin then for γ sufficiently small, g(γ) < 0 and
g(x) is monotonically increasing in (0, γ).
Denote by p̃γ,δ the probability that all sets of left nodes with i ≤ γN edges have more than
δi neighbors. Then, from (18) and (21) we have, by employing the union bound,
$$\tilde p_{\gamma,\delta} \ge 1 - \sum_{i=1}^{\gamma N} \Delta_{i/N}\, p^A_{i/N,\delta} \ge 1 - \sum_{i=1}^{\gamma N} e^{i g(i/N)}\, i^{l_{\max}} \ge 1 - N e^{N g(\gamma)}\, N^{l_{\max}}$$
(the second inequality is due to the fact that $\Delta_{i/N} \le e^{\theta N} i^{l_{\max}}$, which follows from (18)), which
tends to 1 as N → ∞, since g(γ) < 0. □

Acknowledgment
The authors would like to thank R. Urbanke for pointing out an inaccuracy in a previous version
of the paper, and for some comments that helped to improve the presentation of the paper.

References

[1] R. G. Gallager, Low Density Parity Check Codes, M.I.T. Press, Cambridge, Massachusetts,
1963.

[2] M. Luby, M. Mitzenmacher, A. Shokrollahi and D. Spielman, “Analysis of low density codes
and improved designs using irregular graphs”, Proceedings of the 30th Annual ACM Sympo-
sium on Theory of Computing (STOC-98), pp. 249-258, ACM Press, 1998.

[3] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices”, IEEE Trans.
Inform. Theory, vol. 45, pp. 399-431, March 1999.

[4] G. Miller and D. Burshtein, “Bounds on the maximum likelihood decoding error probability
of low density parity check codes”, submitted for publication, IEEE Trans. Inform. Theory.

[5] R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.

[6] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference,


Morgan Kaufmann Publishers, 1988.

[7] T. Richardson and R. Urbanke, “The capacity of low-density parity check codes under
message-passing decoding”, submitted for publication, IEEE Trans. Inform. Theory, avail-
able at http://cm.bell-labs.com/cm/ms/who/tjr/pub.html.

[8] T. Richardson, A. Shokrollahi and R. Urbanke, “Design of provably good low-density par-
ity check codes”, submitted for publication, IEEE Trans. Inform. Theory, available at
http://cm.bell-labs.com/cm/ms/who/tjr/pub.html.

[9] L. Bazzi, T. Richardson and R. Urbanke, “Exact thresholds and optimal codes for the binary
symmetric channel and Gallager’s decoding algorithm A”, submitted for publication, IEEE
Trans. Inform. Theory, available at http://cm.bell-labs.com/cm/ms/who/tjr/pub.html.

[10] M. Sipser and D. Spielman, “Expander Codes”, IEEE Trans. Inform. Theory, vol. 42, Novem-
ber 1996.

[11] C. Berrou, A. Glavieux and P. Thitimajshima, “Near Shannon limit error correcting coding
and decoding: turbo codes”, Proceedings 1993 IEEE International Conference on Commu-
nications, Geneva, Switzerland, pp. 1064-1070, 1993.

List of Figures

1 Constructing a random irregular graph ($\lambda_{\min} \stackrel{\Delta}{=} \lambda_{l_{\min}}$, $\lambda_{\max} \stackrel{\Delta}{=} \lambda_{l_{\max}}$, $\rho_{\min} \stackrel{\Delta}{=} \rho_{r_{\min}}$ and $\rho_{\max} \stackrel{\Delta}{=} \rho_{r_{\max}}$).
2 A depth two tree spanned from e = (i, j) after one iteration. Note that the tree is actually spanned backwards, that is, opposite to the direction in which information is transferred.

Figure 1: Constructing a random irregular graph ($\lambda_{\min} \stackrel{\Delta}{=} \lambda_{l_{\min}}$, $\lambda_{\max} \stackrel{\Delta}{=} \lambda_{l_{\max}}$, $\rho_{\min} \stackrel{\Delta}{=} \rho_{r_{\min}}$ and $\rho_{\max} \stackrel{\Delta}{=} \rho_{r_{\max}}$).

Figure 2: A depth two tree spanned from e = (i, j) after one iteration. Note that the tree is actually spanned backwards, that is, opposite to the direction in which information is transferred.

