where $(y_1^N, u_1^{i-1})$ denotes the output of $W_N^{(i)}$ and $u_i$ its input. To gain an intuitive understanding of the channels $W_N^{(i)}$, consider a genie-aided successive cancellation decoder in which the $i$th decision element estimates $u_i$ after observing $y_1^N$ and the past channel inputs $u_1^{i-1}$ (supplied correctly by the genie regardless of any decision errors at earlier stages). If $u_1^N$ is a priori uniform on $\mathcal{X}^N$, then $W_N^{(i)}$ is the effective channel seen by the $i$th decision element in this scenario.

3) Channel Polarization:

Theorem 1: For any B-DMC $W$, the channels $\{W_N^{(i)}\}$ polarize in the sense that, for any fixed $\delta \in (0,1)$, as $N$ goes to infinity through powers of two, the fraction of indices $i \in \{1, \ldots, N\}$ for which $I(W_N^{(i)}) \in (1-\delta, 1]$ goes to $I(W)$ and the fraction for which $I(W_N^{(i)}) \in [0, \delta)$ goes to $1 - I(W)$.

This theorem is proved in Section IV.

The polarization effect is illustrated in Fig. 4 for the case where $W$ is a BEC with erasure probability $\epsilon = 0.5$. The numbers $\{I(W_N^{(i)})\}$ have been computed using the recursive relations

$I(W_N^{(2i-1)}) = I(W_{N/2}^{(i)})^2, \quad I(W_N^{(2i)}) = 2\, I(W_{N/2}^{(i)}) - I(W_{N/2}^{(i)})^2$   (6)

with $I(W_1^{(1)}) = 1 - \epsilon$.

We take advantage of the polarization effect to construct codes that achieve the symmetric channel capacity $I(W)$ by a method we call polar coding. The basic idea of polar coding is to create a coding system where one can access each coordinate channel $W_N^{(i)}$ individually and send data only through those for which $Z(W_N^{(i)})$ is near $0$.

1) $G_N$-Coset Codes: We first describe a class of block codes that contain polar codes (the codes of main interest) as a special case. The block lengths $N$ for this class are restricted to powers of two, $N = 2^n$ for some $n \ge 0$. For a given $N$, each code in the class is encoded in the same manner, namely

$x_1^N = u_1^N G_N$   (8)

where $G_N$ is the generator matrix of order $N$, defined above. For an arbitrary subset $A$ of $\{1, \ldots, N\}$, we may write (8) as

$x_1^N = u_A G_N(A) \oplus u_{A^c} G_N(A^c)$   (9)

where $G_N(A)$ denotes the submatrix of $G_N$ formed by the rows with indices in $A$.

If we now fix $A$ and $u_{A^c}$, but leave $u_A$ as a free variable, we obtain a mapping from source blocks $u_A$ to codeword blocks $x_1^N$. This mapping is a coset code: it is a coset of the linear block
3054 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009
code with generator matrix $G_N(A)$, with the coset determined by the fixed vector $u_{A^c} G_N(A^c)$. We will refer to this class of codes collectively as $G_N$-coset codes. Individual $G_N$-coset codes will be identified by a parameter vector $(N, K, A, u_{A^c})$, where $K$ is the code dimension and specifies the size of $A$.¹ The ratio $K/N$ is called the code rate. We will refer to $A$ as the information set and to $u_{A^c} \in \mathcal{X}^{N-K}$ as frozen bits or vector.

For example, the $(4, 2, \{2,4\}, (1,0))$ code has the encoder mapping

$x_1^4 = u_1^4 G_4 = (u_2, u_4) \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} \oplus (1, 0) \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{bmatrix}$   (10)

For a source block $(u_2, u_4) = (1, 1)$, the coded block is $x_1^4 = (1, 1, 0, 1)$.

Polar codes will be specified shortly by giving a particular rule for the selection of the information set $A$.

2) A Successive Cancellation Decoder: Consider a $G_N$-coset code with parameter $(N, K, A, u_{A^c})$. Let $u_1^N$ be encoded into a codeword $x_1^N$, let $x_1^N$ be sent over the channel $W^N$, and let a channel output $y_1^N$ be received. The decoder's task is to generate an estimate $\hat{u}_1^N$ of $u_1^N$, given knowledge of $A$, $u_{A^c}$, and $y_1^N$. Since the decoder can avoid errors in the frozen part by setting $\hat{u}_{A^c} = u_{A^c}$, the real decoding task is to generate an estimate $\hat{u}_A$ of $u_A$.

The coding results in this paper will be given with respect to a specific successive cancellation (SC) decoder, unless some other decoder is mentioned. Given any $(N, K, A, u_{A^c})$ $G_N$-coset code, we will use an SC decoder that generates its decision $\hat{u}_1^N$ by computing

$\hat{u}_i = \begin{cases} u_i, & \text{if } i \in A^c \\ h_i(y_1^N, \hat{u}_1^{i-1}), & \text{if } i \in A \end{cases}$   (11)

in the order $i$ from $1$ to $N$, where $h_i : \mathcal{Y}^N \times \mathcal{X}^{i-1} \to \mathcal{X}$, $i \in A$, are decision functions defined as

$h_i(y_1^N, \hat{u}_1^{i-1}) = \begin{cases} 0, & \text{if } \frac{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 0)}{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 1)} \ge 1 \\ 1, & \text{otherwise} \end{cases}$   (12)

for all $y_1^N \in \mathcal{Y}^N$, $\hat{u}_1^{i-1} \in \mathcal{X}^{i-1}$. We will say that a decoder block error occurred if $\hat{u}_1^N \ne u_1^N$ or, equivalently, if $\hat{u}_A \ne u_A$.

The decision functions $h_i$ defined above resemble ML decision functions but are not exactly so, because they treat the future frozen bits $(u_j : j > i,\ j \in A^c)$ as RVs, rather than as known bits. In exchange for this suboptimality, the $h_i$ can be computed efficiently using recursive formulas, as we will show in Section II. Apart from algorithmic efficiency, the recursive structure of the decision functions is important because it renders the performance analysis of the decoder tractable. Fortunately, the loss in performance due to not using true ML decision functions happens to be negligible: $I(W)$ is still achievable.

3) Code Performance: The notation $P_e(N, K, A, u_{A^c})$ will denote the probability of block error for an $(N, K, A, u_{A^c})$ code, assuming that each data vector $u_A \in \mathcal{X}^K$ is sent with probability $2^{-K}$ and decoding is done by the above SC decoder. More precisely

$P_e(N, K, A, u_{A^c}) = \sum_{u_A \in \mathcal{X}^K} 2^{-K} \sum_{y_1^N :\, \hat{u}_A(y_1^N) \ne u_A} W_N(y_1^N \mid u_1^N)$

The average of $P_e(N, K, A, u_{A^c})$ over all choices for $u_{A^c}$ will be denoted by $P_e(N, K, A)$, i.e.,

$P_e(N, K, A) = \sum_{u_{A^c} \in \mathcal{X}^{N-K}} 2^{-(N-K)} P_e(N, K, A, u_{A^c})$

¹We include the redundant parameter $K$ in the parameter set because often we consider an ensemble of codes with $K$ fixed and $A$ free.

A key bound on block error probability under SC decoding is the following.
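As an illustration of the encoding rule (8)-(10), the following sketch (ours, not from the paper) builds the generator matrix $G_N = B_N F^{\otimes n}$ described earlier and reproduces the small coset-code example numerically:

```python
def kron(A, B):
    # Kronecker product of two integer matrices
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

F = [[1, 0], [1, 1]]

def kron_power(n):
    G = [[1]]
    for _ in range(n):
        G = kron(G, F)
    return G

def bitrev(i, n):
    # bit-reversal of an n-bit index
    return int(format(i, '0{}b'.format(n))[::-1], 2)

def G_N(n):
    # G_N = B_N F^{(x)n}: row i of G_N is row bitrev(i) of F^{(x)n}
    Fn = kron_power(n)
    return [Fn[bitrev(i, n)] for i in range(2 ** n)]

def encode(u, n):
    G = G_N(n)
    N = 2 ** n
    return [sum(u[k] * G[k][j] for k in range(N)) % 2 for j in range(N)]

# (4, 2, {2,4}, (1,0)) coset code: u1 = 1, u3 = 0 frozen; data (u2, u4) = (1, 1)
u = [1, 1, 0, 1]
print(encode(u, 2))  # -> [1, 1, 0, 1]
```

Here the bit-reversal permutation and the kernel $F$ are taken from the paper's definitions; the frozen/data split is the $(4,2)$ example above.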
ARIKAN: A METHOD FOR CONSTRUCTING CAPACITY-ACHIEVING CODES 3055
Proposition 2: For any B-DMC $W$ and any choice of the parameters $(N, K, A)$

$P_e(N, K, A) \le \sum_{i \in A} Z(W_N^{(i)})$   (13)

Hence, for each $(N, K, A)$, there exists a frozen vector $u_{A^c}$ such that

$P_e(N, K, A, u_{A^c}) \le \sum_{i \in A} Z(W_N^{(i)})$   (14)

This is proved in Section V-B. This result suggests choosing $A$ from among all $K$-subsets of $\{1, \ldots, N\}$ so as to minimize the right-hand side (RHS) of (13). This idea leads to the definition of polar codes.

4) Polar Codes: Given a B-DMC $W$, a $G_N$-coset code with parameter $(N, K, A, u_{A^c})$ will be called a polar code for $W$ if the information set $A$ is chosen as a $K$-element subset of $\{1, \ldots, N\}$ such that $Z(W_N^{(i)}) \le Z(W_N^{(j)})$ for all $i \in A$, $j \in A^c$.

Polar codes are channel-specific designs: a polar code for one channel may not be a polar code for another. The main result of this paper will be to show that polar coding achieves the symmetric capacity $I(W)$ of any given B-DMC $W$.

An alternative rule for polar code definition would be to specify $A$ as a $K$-element subset of $\{1, \ldots, N\}$ such that $I(W_N^{(i)}) \ge I(W_N^{(j)})$ for all $i \in A$, $j \in A^c$. This alternative rule would also achieve $I(W)$. However, the rule based on the Bhattacharyya parameters has the advantage of being connected with an explicit bound on block error probability.

The polar code definition does not specify how the frozen vector $u_{A^c}$ is to be chosen; it may be chosen at will. This degree of freedom in the choice of $u_{A^c}$ simplifies the performance analysis of polar codes by allowing averaging over an ensemble. However, it is not for analytical convenience alone that we do not specify a precise rule for selecting $u_{A^c}$, but also because it appears that the code performance is relatively insensitive to that choice. In fact, we prove in Section VI-B that, for symmetric channels, any choice for $u_{A^c}$ is as good as any other.

5) Coding Theorems: Fix a B-DMC $W$ and a number $R \ge 0$. Let $P_e(N, R)$ be defined as $P_e(N, K, A)$ with $K = \lfloor NR \rfloor$ and $A$ selected in accordance with the polar coding rule for $W$. Thus, $P_e(N, R)$ is the probability of block error under SC decoding for polar coding over $W$ with block length $N$ and rate $R$, averaged over all choices for the frozen bits $u_{A^c}$. The main coding result of this paper is the following.

Theorem 3: For any given B-DMC $W$ and fixed $R < I(W)$, block error probability for polar coding under successive cancellation decoding satisfies

$P_e(N, R) = O(N^{-1/4})$   (15)

Theorem 4: For any symmetric B-DMC $W$ and any fixed $R < I(W)$, consider any sequence of $G_N$-coset codes $(N, K, A, u_{A^c})$ with $K = \lfloor NR \rfloor$, $A$ chosen in accordance with the polar coding rule for $W$, and $u_{A^c}$ fixed arbitrarily. The block error probability under successive cancellation decoding satisfies

$P_e(N, K, A, u_{A^c}) = O(N^{-1/4})$   (16)

This is proved in Section VI-B. Note that for symmetric channels $I(W)$ equals the Shannon capacity of $W$.

6) Complexity: An important issue about polar coding is the complexity of encoding, decoding, and code construction. The recursive structure of the channel polarization construction leads to low-complexity encoding and decoding algorithms for the class of $G_N$-coset codes, and in particular, for polar codes.

Theorem 5: For the class of $G_N$-coset codes, the complexity of encoding and the complexity of successive cancellation decoding are both $O(N \log N)$ as functions of code block length $N$.

This theorem is proved in Sections VII and VIII. Notice that the complexity bounds in Theorem 5 are independent of the code rate and the way the frozen vector is chosen. The bounds hold even at rates above $I(W)$, but clearly this has no practical significance.

As for code construction, we have found no low-complexity algorithms for constructing polar codes. One exception is the case of a BEC for which we have a polar code construction algorithm with complexity $O(N)$. We discuss the code construction problem further in Section IX and suggest a low-complexity statistical algorithm for approximating the exact polar code construction.

D. Relations To Previous Work

This paper is an extension of work begun in [2], where channel combining and splitting were used to show that improvements can be obtained in the sum cutoff rate for some specific DMCs. However, no recursive method was suggested to reach the ultimate limit of such improvements.

As the present work progressed, it became clear that polar coding had much in common with Reed–Muller (RM) coding [3], [4]. Indeed, recursive code construction and SC decoding, which are two essential ingredients of polar coding, appear to have been introduced into coding theory by RM codes.

According to one construction of RM codes, for any $N = 2^n$, $n \ge 0$, and $0 \le K \le N$, an RM code with block length $N$ and dimension $K$, denoted $\mathrm{RM}(N, K)$, is defined as a linear code whose generator matrix $G_{\mathrm{RM}}(N, K)$ is obtained by deleting $(N - K)$ of the rows of $F^{\otimes n}$ so that none of the deleted rows has a larger Hamming weight (number of $1$s in that row) than any of the remaining rows. For instance

$G_{\mathrm{RM}}(4, 4) = F^{\otimes 2}$ and $G_{\mathrm{RM}}(4, 2) = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}$
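The RM row-selection rule just described can be sketched as follows (our illustration, not from the paper; ties among equal-weight rows are broken here by deleting lower-indexed rows first, which is one valid choice under the rule):

```python
def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def rm_generator(n, K):
    """Generator of RM(2^n, K): delete the N-K lowest-weight rows of F^{(x)n},
    preferring lower-indexed rows on weight ties."""
    G = [[1]]
    for _ in range(n):
        G = kron(G, [[1, 0], [1, 1]])
    order = sorted(range(len(G)), key=lambda i: (sum(G[i]), i))
    keep = sorted(order[len(G) - K:])   # indices of the K surviving rows
    return [G[i] for i in keep]

print(rm_generator(2, 2))  # -> [[1, 0, 1, 0], [1, 1, 1, 1]]
```

This reproduces the $G_{\mathrm{RM}}(4,2)$ instance: the rows of $F^{\otimes 2}$ have weights $1, 2, 2, 4$, and deleting the two lightest leaves the two rows shown above.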
For example, $\mathrm{RM}(4, 2)$ is the $G_4$-coset code with parameter $(4, 2, \{2, 4\}, (0, 0))$. So, RM coding and polar coding may be regarded as two alternative rules for selecting the information set $A$ of a $G_N$-coset code of a given size $K$. Unlike polar coding, RM coding selects the information set in a channel-independent manner; it is not as fine-tuned to the channel polarization phenomenon as polar coding is. We will show in Section X that, at least for the class of BECs, the RM rule for information set selection leads to asymptotically unreliable codes under SC decoding. So, polar coding goes beyond RM coding in a nontrivial manner by paying closer attention to channel polarization.

Another connection to existing work can be established by noting that polar codes are multilevel $|u|u+v|$ codes, which are a class of codes originating from Plotkin's method for code combining [5]. This connection is not surprising in view of the fact that RM codes are also multilevel $|u|u+v|$ codes [6, pp. 114-125]. However, unlike typical multilevel code constructions, where one begins with specific small codes to build larger ones, in polar coding the multilevel code is obtained by expurgating rows of a full-order generator matrix $G_N$, with respect to a channel-specific criterion. The special structure of $G_N$ ensures that, no matter how expurgation is done, the resulting code is a multilevel $|u|u+v|$ code. In essence, polar coding enjoys the freedom to pick a multilevel code from an ensemble of such codes so as to suit the channel at hand, while conventional approaches to multilevel coding do not have this degree of flexibility.

Finally, we wish to mention a spectral interpretation of polar codes which is similar to Blahut's treatment of Bose–Chaudhuri–Hocquenghem (BCH) codes [7, Ch. 9]; this type of similarity has already been pointed out by Forney [8, Ch. 11] in connection with RM codes. From the spectral viewpoint, the encoding operation (8) is regarded as a transform of a frequency-domain information vector $u_1^N$ to a time-domain codeword vector $x_1^N$. The transform is invertible with $G_N^{-1} = G_N$. The decoding operation is regarded as a spectral estimation problem in which one is given a time-domain observation $y_1^N$, which is a noisy version of $x_1^N$, and asked to estimate $u_1^N$. To aid the estimation task, one is allowed to freeze a certain number of spectral components of $u_1^N$. This spectral interpretation of polar coding suggests that it may be possible to treat polar codes and BCH codes in a unified framework. The spectral interpretation also opens the door to the use of various signal processing techniques in polar coding; indeed, in Section VII, we exploit some fast transform techniques in designing encoders for polar codes.

E. Paper Outline

The rest of the paper is organized as follows. Section II explores the recursive properties of the channel splitting operation. In Section III, we focus on how $I(W)$ and $Z(W)$ get transformed through a single step of channel combining and splitting. We extend this to an asymptotic analysis in Section IV and complete the proofs of Theorems 1 and 2. This completes the part of the paper on channel polarization; the rest of the paper is mainly about polar coding. Section V develops an upper bound on the block error probability of polar coding under SC decoding and proves Theorem 3. Section VI considers polar coding for symmetric B-DMCs and proves Theorem 4. Section VII gives an analysis of the encoder mapping $u_1^N \mapsto x_1^N$, which results in efficient encoder implementations. In Section VIII, we give an implementation of SC decoding with complexity $O(N \log N)$. In Section IX, we discuss the code construction complexity and propose a low-complexity statistical algorithm for approximate code construction. In Section X, we explain why RM codes have a poor asymptotic performance under SC decoding. In Section XI, we point out some generalizations of the present work, give some complementary remarks, and state some open problems.

II. RECURSIVE CHANNEL TRANSFORMATIONS

We have defined a blockwise channel combining and splitting operation by (4) and (5) which transformed $N$ independent copies of $W$ into $W_N^{(1)}, \ldots, W_N^{(N)}$. The goal in this section is to show that this blockwise channel transformation can be broken recursively into single-step channel transformations.

We say that a pair of binary-input channels $W'$ and $W''$ are obtained by a single-step transformation of two independent copies of a binary-input channel $W$, and write

$(W, W) \mapsto (W', W'')$

iff there exists a one-to-one mapping $f$ such that

$W'(f(y_1, y_2) \mid u_1) = \sum_{u_2'} \tfrac{1}{2} W(y_1 \mid u_1 \oplus u_2')\, W(y_2 \mid u_2')$   (17)

$W''(f(y_1, y_2), u_1 \mid u_2) = \tfrac{1}{2} W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2)$   (18)

for all $u_1, u_2 \in \mathcal{X}$, $y_1, y_2 \in \mathcal{Y}$.

According to this, we can write $(W, W) \mapsto (W_2^{(1)}, W_2^{(2)})$ for any given B-DMC $W$ because

$W_2^{(1)}(y_1^2 \mid u_1) = \sum_{u_2} \tfrac{1}{2} W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2)$   (19)

$W_2^{(2)}(y_1^2, u_1 \mid u_2) = \tfrac{1}{2} W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2)$   (20)

which are in the form of (17) and (18) by taking $f$ as the identity mapping.

It turns out we can write more generally

$(W_{N/2}^{(i)}, W_{N/2}^{(i)}) \mapsto (W_N^{(2i-1)}, W_N^{(2i)})$   (21)

This follows as a corollary to the following.

Proposition 3: For any $n \ge 0$, $N = 2^n$, $1 \le i \le N/2$

$W_N^{(2i-1)}(y_1^N, u_1^{2i-2} \mid u_{2i-1}) = \sum_{u_{2i}} \tfrac{1}{2}\, W_{N/2}^{(i)}(y_1^{N/2}, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2} \mid u_{2i-1} \oplus u_{2i})\, W_{N/2}^{(i)}(y_{N/2+1}^N, u_{1,e}^{2i-2} \mid u_{2i})$   (22)

$W_N^{(2i)}(y_1^N, u_1^{2i-1} \mid u_{2i}) = \tfrac{1}{2}\, W_{N/2}^{(i)}(y_1^{N/2}, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2} \mid u_{2i-1} \oplus u_{2i})\, W_{N/2}^{(i)}(y_{N/2+1}^N, u_{1,e}^{2i-2} \mid u_{2i})$   (23)
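A numerical sketch of the single-step transformation (17)-(18) follows (ours, not from the paper; channels are represented as dictionaries of transition probabilities and $f$ is taken as the identity, as in (19)-(20)). It also checks two properties stated in Section III: rate is conserved, $I(W') + I(W'') = 2I(W)$, and $Z(W'') = Z(W)^2$:

```python
from itertools import product
from math import sqrt, log2

def transform(W):
    """One polarization step W -> (W', W'') per (19)-(20).
    W maps each input x in {0,1} to a dict {output: probability}."""
    Y = sorted(set(W[0]) | set(W[1]))
    Wm = {0: {}, 1: {}}   # W'(y1, y2 | u1), eq. (19)
    Wp = {0: {}, 1: {}}   # W''(y1, y2, u1 | u2), eq. (20)
    for u1, u2, y1, y2 in product((0, 1), (0, 1), Y, Y):
        p = 0.5 * W[u1 ^ u2].get(y1, 0.0) * W[u2].get(y2, 0.0)
        Wm[u1][(y1, y2)] = Wm[u1].get((y1, y2), 0.0) + p
        Wp[u2][(y1, y2, u1)] = p
    return Wm, Wp

def bhattacharyya(W):
    ys = set(W[0]) | set(W[1])
    return sum(sqrt(W[0].get(y, 0.0) * W[1].get(y, 0.0)) for y in ys)

def capacity(W):
    ys = set(W[0]) | set(W[1])
    I = 0.0
    for y in ys:
        q = 0.5 * (W[0].get(y, 0.0) + W[1].get(y, 0.0))
        for x in (0, 1):
            p = W[x].get(y, 0.0)
            if p > 0:
                I += 0.5 * p * log2(p / q)
    return I

# BSC with crossover probability 0.1 as a test channel
W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
Wm, Wp = transform(W)
print(capacity(Wm) + capacity(Wp), 2 * capacity(W))   # equal: rate preserved
print(bhattacharyya(Wp), bhattacharyya(W) ** 2)       # equal: Z'' = Z^2
```

The output-alphabet bookkeeping here is naive (alphabets square at each step), which is exactly why the paper treats exact construction as hard outside the BEC case.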
Thus, we have shown that the blockwise channel transformation from $W^N$ to $(W_N^{(1)}, \ldots, W_N^{(N)})$ breaks at a local level into single-step channel transformations of the form (21). The full set of such transformations form a fabric as shown in Fig. 5 for $N = 8$. Reading from right to left, the figure starts with four copies of the transformation $(W, W) \mapsto (W_2^{(1)}, W_2^{(2)})$ and continues in butterfly patterns, each representing a channel transformation of the form $(W_{N/2}^{(i)}, W_{N/2}^{(i)}) \mapsto (W_N^{(2i-1)}, W_N^{(2i)})$. The two channels at the right endpoints of the butterflies are always identical and independent. At the rightmost level there are eight independent copies of $W$; at the next level to the left, there are four independent copies of $W_2^{(1)}$ and $W_2^{(2)}$ each; and so on. Each step to the left doubles the number of channel types, but halves the number of independent copies.

III. TRANSFORMATION OF RATE AND RELIABILITY

We now investigate how the rate and reliability parameters, $I(W)$ and $Z(W)$, change through a local (single-step) transformation (17)-(18).

Proposition 4: Suppose $(W, W) \mapsto (W', W'')$ for some set of binary-input channels. Then

$I(W') + I(W'') = 2 I(W)$   (24)

$I(W') \le I(W'')$   (25)

with equality in (25) iff $I(W)$ equals $0$ or $1$.

Proposition 5: Suppose $(W, W) \mapsto (W', W'')$ for some set of binary-input channels. Then $Z(W'') = Z(W)^2$, $Z(W') \le 2 Z(W) - Z(W)^2$, $Z(W') \ge Z(W) \ge Z(W'')$, and

$Z(W') + Z(W'') \le 2 Z(W)$   (29)

with equality iff $W$ is a BEC.

Since the BEC plays a special role with respect to (w.r.t.) extremal behavior of reliability, it deserves special attention.

Proposition 6: Consider the channel transformation $(W, W) \mapsto (W', W'')$. If $W$ is a BEC with some erasure probability $\epsilon$, then the channels $W'$ and $W''$ are BECs with erasure probabilities $2\epsilon - \epsilon^2$ and $\epsilon^2$, respectively. Conversely, if $W'$ or $W''$ is a BEC, then $W$ is a BEC.

B. Rate and Reliability for $W_N^{(i)}$

We now return to the context at the end of Section II.

Proposition 7: For any B-DMC $W$, $N = 2^n$, $1 \le i \le N/2$, the transformation $(W_{N/2}^{(i)}, W_{N/2}^{(i)}) \mapsto (W_N^{(2i-1)}, W_N^{(2i)})$ is rate-preserving and reliability-improving in the sense that

$I(W_N^{(2i-1)}) + I(W_N^{(2i)}) = 2 I(W_{N/2}^{(i)})$   (30)

$Z(W_N^{(2i-1)}) + Z(W_N^{(2i)}) \le 2 Z(W_{N/2}^{(i)})$   (31)

Channel splitting moves the rate and reliability away from the center in the sense that

$I(W_N^{(2i-1)}) \le I(W_{N/2}^{(i)}) \le I(W_N^{(2i)})$   (32)

$Z(W_N^{(2i-1)}) \ge Z(W_{N/2}^{(i)}) \ge Z(W_N^{(2i)})$   (33)

The reliability terms further satisfy

$Z(W_N^{(2i-1)}) \le 2 Z(W_{N/2}^{(i)}) - Z(W_{N/2}^{(i)})^2$   (34)

$Z(W_N^{(2i)}) = Z(W_{N/2}^{(i)})^2$   (35)

The cumulative rate and reliability satisfy

$\sum_{i=1}^{N} I(W_N^{(i)}) = N I(W)$   (36)
$\sum_{i=1}^{N} Z(W_N^{(i)}) \le N Z(W)$   (37)

with equality in (37) iff $W$ is a BEC.

This result follows from Propositions 4 and 5 as a special case and no separate proof is needed. The cumulative relations (36) and (37) follow by repeated application of (30) and (31), respectively. The conditions for equality in Proposition 4 are stated in terms of $W$ rather than $W_{N/2}^{(i)}$; this is possible because i) by Proposition 4, $I(W_{N/2}^{(i)}) \in \{0, 1\}$ iff $I(W) \in \{0, 1\}$; and ii) $W_{N/2}^{(i)}$ is a BEC iff $W$ is a BEC, which follows from Proposition 6 by induction.

For the special case that $W$ is a BEC with an erasure probability $\epsilon$, it follows from Propositions 4 and 6 that the parameters $\{Z(W_N^{(i)})\}$ can be computed through the recursion

$Z(W_N^{(2i-1)}) = 2 Z(W_{N/2}^{(i)}) - Z(W_{N/2}^{(i)})^2, \quad Z(W_N^{(2i)}) = Z(W_{N/2}^{(i)})^2$   (38)

with $Z(W_1^{(1)}) = \epsilon$. The parameter $Z(W_N^{(i)})$ equals the erasure probability of the channel $W_N^{(i)}$. The recursive relations (6) follow from (38) by the fact that $I(W_N^{(i)}) = 1 - Z(W_N^{(i)})$ for $W_N^{(i)}$ a BEC.

IV. CHANNEL POLARIZATION

We prove the main results on channel polarization in this section. The analysis is based on the recursive relationships depicted in Fig. 5; however, it will be more convenient to re-sketch Fig. 5 as a binary tree as shown in Fig. 6. The root node of the tree is associated with the channel $W$. The root $W$ gives birth to an upper channel $W_2^{(1)}$ and a lower channel $W_2^{(2)}$, which are associated with the two nodes at level $1$. The channel $W_2^{(1)}$ in turn gives birth to channels $W_4^{(1)}$ and $W_4^{(2)}$, and so on. The channel $W_N^{(i)}$ is located at level $n = \log_2 N$ of the tree at node number $i$ counting from the top.

There is a natural indexing of nodes of the tree in Fig. 6 by bit sequences. The root node is indexed with the null sequence. The upper node at level $1$ is indexed with $0$ and the lower node with $1$. Given a node at level $n$ with index $b_1 b_2 \cdots b_n$, the upper node emanating from it has the label $b_1 b_2 \cdots b_n 0$ and the lower node $b_1 b_2 \cdots b_n 1$. According to this labeling, the channel $W_N^{(i)}$ is situated at the node $b_1 b_2 \cdots b_n$ with $i = 1 + \sum_{j=1}^{n} b_j 2^{n-j}$. We denote the channel located at node $b_1 b_2 \cdots b_n$ alternatively as $W_{b_1 \cdots b_n}$.

We define a random tree process, denoted $\{K_n;\, n \ge 0\}$, in connection with Fig. 6. The process begins at the root of the tree with $K_0 = W$. For any $n \ge 0$, given that $K_n = W_{b_1 \cdots b_n}$, $K_{n+1}$ equals $W_{b_1 \cdots b_n 0}$ or $W_{b_1 \cdots b_n 1}$ with probability $1/2$ each. Thus, the path taken by $\{K_n\}$ through the channel tree may be thought of as being driven by a sequence of independent and identically distributed (i.i.d.) Bernoulli RVs $\{B_n;\, n = 1, 2, \ldots\}$ where $B_n$ equals $0$ or $1$ with equal probability. Given that $B_1, \ldots, B_n$ has taken on a sample value $b_1, \ldots, b_n$, the random channel process takes the value $K_n = W_{b_1 \cdots b_n}$. In order to keep track of the rate and reliability parameters of the random sequence of channels $\{K_n\}$, we define the random processes $I_n = I(K_n)$ and $Z_n = Z(K_n)$.

For a more precise formulation of the problem, we consider the probability space $(\Omega, \mathcal{F}, P)$ where $\Omega$ is the space of all binary sequences $(b_1, b_2, \ldots) \in \{0, 1\}^\infty$, $\mathcal{F}$ is the Borel field (BF) generated by the cylinder sets $S(b_1, \ldots, b_n) \triangleq \{\omega \in \Omega : \omega_1 = b_1, \ldots, \omega_n = b_n\}$, and $P$ is the probability measure defined on $\mathcal{F}$ such that $P(S(b_1, \ldots, b_n)) = 1/2^n$. For each $n \ge 1$, we define $\mathcal{F}_n$ as the BF generated by the cylinder sets $S(b_1, \ldots, b_i)$, $1 \le i \le n$. We define $\mathcal{F}_0$ as the trivial BF consisting of the null set and $\Omega$ only. Clearly, $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}$.

The random processes described above can now be formally defined as follows. For $\omega = (\omega_1, \omega_2, \ldots) \in \Omega$ and $n \ge 1$, define $K_n(\omega) = W_{\omega_1 \cdots \omega_n}$, $I_n(\omega) = I(K_n(\omega))$, and $Z_n(\omega) = Z(K_n(\omega))$. For $n = 0$, define $K_0 = W$, $I_0 = I(W)$, $Z_0 = Z(W)$.
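For the BEC, recursion (38) makes every channel in the tree computable in closed form. The following sketch (ours, not from the paper) computes all $Z(W_N^{(i)})$ for a BEC with $\epsilon = 1/2$ at $N = 2^{10}$ and checks two facts from the text: for a BEC the mean of $Z$ is preserved exactly at every level (equality holds in (31)), and the values spread toward the endpoints $0$ and $1$:

```python
def bec_z(n, eps):
    """Z(W_N^{(i)}) for all i in order, N = 2^n, via recursion (38).
    For a BEC, Z equals the erasure probability of the bit channel."""
    z = [eps]
    for _ in range(n):
        # odd index: 2Z - Z^2 (worse channel); even index: Z^2 (better channel)
        z = [w for v in z for w in (2 * v - v * v, v * v)]
    return z

z = bec_z(10, 0.5)
print(len(z))                  # 1024 bit-channel parameters
print(sum(z) / len(z))         # mean erasure rate stays at 0.5 (BEC equality case)
print(min(z), max(z))          # extremes approach 0 and 1
frac = sum(1 for v in z if v < 0.1 or v > 0.9) / len(z)
print(frac)                    # fraction of channels already near an endpoint
```

This is the computational counterpart of the random tree process above: each step replaces one $Z$ value by its two children under (38).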
Proposition 8: The sequence of random variables and Borel fields $\{I_n, \mathcal{F}_n;\, n \ge 0\}$ is a martingale, i.e.,

$\mathcal{F}_n \subset \mathcal{F}_{n+1}$ and $I_n$ is $\mathcal{F}_n$-measurable   (39)

$E[|I_n|] < \infty$   (40)

$I_n = E[I_{n+1} \mid \mathcal{F}_n]$   (41)

Furthermore, the sequence $\{I_n;\, n \ge 0\}$ converges almost everywhere (a.e.) to a random variable $I_\infty$ such that $E[I_\infty] = I_0$.

Proof: Condition (39) is true by construction and (40) by the fact that $0 \le I_n \le 1$. To prove (41), consider a cylinder set $S(b_1, \ldots, b_n) \in \mathcal{F}_n$ and use Proposition 7 to write

$E[I_{n+1} \mid S(b_1, \ldots, b_n)] = \tfrac{1}{2} I(W_{b_1 \cdots b_n 0}) + \tfrac{1}{2} I(W_{b_1 \cdots b_n 1}) = I(W_{b_1 \cdots b_n})$

Since $I(W_{b_1 \cdots b_n})$ is the value of $I_n$ on $S(b_1, \ldots, b_n)$, (41) follows. This completes the proof that $\{I_n, \mathcal{F}_n\}$ is a martingale. Since $\{I_n, \mathcal{F}_n\}$ is a uniformly integrable martingale, by general convergence results about such martingales (see, e.g., [9, Theorem 9.4.6]), the claim about $I_\infty$ follows.

It should not be surprising that the limit RV $I_\infty$ takes values a.e. in $\{0, 1\}$, which is the set of fixed points of $I(W)$ under the transformation $(W, W) \mapsto (W', W'')$, as determined by the condition for equality in (25). For a rigorous proof of this statement, we take an indirect approach and bring the process $\{Z_n\}$ also into the picture.

Proposition 9: The sequence of random variables and Borel fields $\{Z_n, \mathcal{F}_n;\, n \ge 0\}$ is a supermartingale, i.e.,

$\mathcal{F}_n \subset \mathcal{F}_{n+1}$ and $Z_n$ is $\mathcal{F}_n$-measurable   (43)

$E[|Z_n|] < \infty$   (44)

$Z_n \ge E[Z_{n+1} \mid \mathcal{F}_n]$   (45)

Furthermore, the sequence $\{Z_n;\, n \ge 0\}$ converges a.e. to a random variable $Z_\infty$ which takes values a.e. in $\{0, 1\}$.

Proof: Conditions (43) and (44) are clearly satisfied. To verify (45), consider a cylinder set $S(b_1, \ldots, b_n) \in \mathcal{F}_n$ and use Proposition 7 to write

$E[Z_{n+1} \mid S(b_1, \ldots, b_n)] = \tfrac{1}{2} Z(W_{b_1 \cdots b_n 0}) + \tfrac{1}{2} Z(W_{b_1 \cdots b_n 1}) \le Z(W_{b_1 \cdots b_n})$

Since $Z(W_{b_1 \cdots b_n})$ is the value of $Z_n$ on $S(b_1, \ldots, b_n)$, (45) follows. This completes the proof that $\{Z_n, \mathcal{F}_n\}$ is a supermartingale. For the second claim, observe that the supermartingale is uniformly integrable; hence, it converges a.e. and in $L^1$ to an RV $Z_\infty$ such that $E[|Z_n - Z_\infty|] \to 0$ (see, e.g., [9, Theorem 9.4.5]). It follows that $E[|Z_{n+1} - Z_n|] \to 0$. But, by Proposition 7, $Z_{n+1} = Z_n^2$ with probability $1/2$;

It is interesting that the above discussion gives a new interpretation to $I(W)$ as the probability that the random process $\{Z_n\}$ converges to zero. We may use this to strengthen the lower bound in (1). (This stronger form is given as a side result and will not be used in the sequel.)

Proposition 11: For any B-DMC $W$, we have

$I(W) + Z(W) \ge 1$   (42)

with equality iff $W$ is a BEC.

This result can be interpreted as saying that, among all B-DMCs $W$, the BEC presents the most favorable rate-reliability tradeoff: it minimizes $Z(W)$ (maximizes reliability) among all channels with a given symmetric capacity $I(W)$; equivalently, it minimizes $I(W)$ required to achieve a given level of reliability $Z(W)$.

Proof of Proposition 11: Consider two channels $W$ and $W'$ with $Z(W) = Z(W')$. Suppose that $W'$ is a BEC. Then, $W'$ has erasure probability $Z(W')$ and $I(W') = 1 - Z(W')$. Consider the random processes $\{Z_n\}$ and $\{Z_n'\}$ corresponding to $W$ and $W'$, respectively. By the condition for equality in (34), the process $\{Z_n\}$ is stochastically dominated by $\{Z_n'\}$ in the sense that $P(Z_n \le z) \ge P(Z_n' \le z)$ for all $z \in [0, 1]$. Thus, the probability of $\{Z_n\}$ converging to zero is lower-bounded by the probability that $\{Z_n'\}$ converges to zero, i.e., $I(W) \ge 1 - Z(W')$. This implies $I(W) + Z(W) \ge 1$.

B. Proof of Theorem 2

We will now prove Theorem 2, which strengthens the above polarization results by specifying a rate of polarization. Consider the probability space $(\Omega, \mathcal{F}, P)$. For $\omega \in \Omega$, by Proposition 7, we have $Z_{n+1}(\omega) = Z_n(\omega)^2$ if $B_{n+1} = 1$ and $Z_{n+1}(\omega) \le 2 Z_n(\omega)$ if $B_{n+1} = 0$. For ... and ..., define ... for all .... For ... and ..., we have ... if $B_{n+1} = 1$ and ... if $B_{n+1} = 0$, which implies
(49)

Finally, we tie the above analysis to the claim of Theorem 2. Define ... and ... for .... ... and note that .... So, ... for .... On the other hand, ... where ... with .... We conclude that ... for .... This completes the proof of Theorem 2.

Given Theorem 2, it is an easy exercise to show that polar coding can achieve rates approaching $I(W)$, as we will show in the next section. It is clear from the above proof that Theorem 2 gives only an ad hoc result on the asymptotic rate of channel polarization; this result is sufficient for proving a capacity theorem for polar coding; however, finding the exact asymptotic rate of polarization remains an important goal for future research.²

²A recent result in this direction is discussed in Section XI-A.

A realization $u_1^N$ for the input random vector $U_1^N$ corresponds to sending the data vector $u_A$ together with the frozen vector $u_{A^c}$. As random vectors, the data part $U_A$ and the frozen part $U_{A^c}$ are uniformly distributed over their respective ranges and statistically independent. By treating $U_{A^c}$ as a random vector over $\mathcal{X}^{N-K}$, we obtain a convenient method for analyzing code performance averaged over all codes in the ensemble $(N, K, A)$.

The main event of interest in the following analysis is the block error event under SC decoding, defined as

$\mathcal{E} \triangleq \{(u_1^N, y_1^N) : \hat{u}_A(y_1^N) \ne u_A\}$   (50)

Since the decoder never makes an error on the frozen part of $U_1^N$, i.e., $\hat{U}_{A^c}$ equals $U_{A^c}$ with probability one, that part has been excluded from the definition of the block error event.

The probability of error terms $P_e(N, K, A)$ and $P_e(N, K, A, u_{A^c})$ that were defined in Section I-C.3 can be expressed in this probability space as

$P_e(N, K, A) = P(\mathcal{E}), \quad P_e(N, K, A, u_{A^c}) = P(\mathcal{E} \mid \{U_{A^c} = u_{A^c}\})$   (51)

where $\{U_{A^c} = u_{A^c}\}$ denotes the event ....
Fig. 7. Rate versus reliability for polar coding and SC decoding at block lengths $2^{10}$, $2^{15}$, and $2^{20}$ on a BEC with erasure probability $1/2$.
(55)

(53)

Thus, we have

...

For an upper bound on ..., note that

(54)

D. A Numerical Example

Although we have established that polar codes achieve the symmetric capacity, the proofs have been of an asymptotic nature and the exact asymptotic rate of polarization has not been found. It is of interest to understand how quickly the polarization effect takes hold and what performance can be expected of polar codes under SC decoding in the nonasymptotic regime. To investigate these, we give here a numerical study.

Let $W$ be a BEC with erasure probability $1/2$. Fig. 7 shows the rate versus reliability tradeoff for $W$ using polar codes with block lengths $N \in \{2^{10}, 2^{15}, 2^{20}\}$. This figure is obtained by using codes whose information sets are of the form $A(\eta) \triangleq \{i \in \{1, \ldots, N\} : Z(W_N^{(i)}) < \eta\}$, where $0 \le \eta \le 1$ is a variable threshold parameter. There are two sets of three curves in the plot. The solid lines are plots of $R(\eta) \triangleq |A(\eta)|/N$ versus $B(\eta) \triangleq \sum_{i \in A(\eta)} Z(W_N^{(i)})$. The dashed lines are plots of ...
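The threshold-based information-set construction behind Fig. 7 can be sketched as follows (ours, not from the paper; it reuses the exact BEC recursion (38) for the $Z$ values and reports, for each threshold, the code rate and the union bound on block error probability):

```python
def bec_z(n, eps):
    # Z(W_N^{(i)}) for a BEC via recursion (38)
    z = [eps]
    for _ in range(n):
        z = [w for v in z for w in (2 * v - v * v, v * v)]
    return z

def rate_vs_bound(n, eps, etas):
    """For each threshold eta: rate R = |A(eta)|/N and the union bound
    B = sum of Z over the information set A(eta)."""
    z = bec_z(n, eps)
    N = len(z)
    out = []
    for eta in etas:
        A = [v for v in z if v < eta]
        out.append((len(A) / N, sum(A)))
    return out

for R, B in rate_vs_bound(10, 0.5, [1e-8, 1e-4, 1e-2]):
    print("R = %.3f   block-error bound B = %.3e" % (R, B))
```

Sweeping the threshold trades rate against reliability in the same way as the solid curves in Fig. 7.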
(57)

for all .... The proof is immediate and omitted.

Proposition 13: If a B-DMC $W$ is symmetric, then the channels $W_N$ and $W_N^{(i)}$ are also symmetric in the sense that

(58)

(59)

for all ....

Proof: Let $x_1^N = u_1^N G_N$ and observe that .... Now, let ..., and use the same reasoning to see that

(61)

(62)

Equality follows in (61) from (58) and (60) by taking ..., and in (62) from the fact that ... for any fixed .... The rest of the proof is immediate.

Now, by (54), we have, for all ...

(63)

and, since ..., we obtain

(64)
This implies that, for every symmetric B-DMC $W$ and every $(N, K, A, u_{A^c})$ code

(65)

This bound on $P_e(N, K, A, u_{A^c})$ is independent of the frozen vector $u_{A^c}$. Theorem 4 is now obtained by combining Theorem 2 with Proposition 2, as in the proof of Theorem 3.

Note that although we have given a bound on $P_e(N, K, A, u_{A^c})$ that is independent of $u_{A^c}$, we stopped short of claiming that the error event is independent of $u_{A^c}$ because our decision functions (12) break ties always in favor of $\hat{u}_i = 0$. If this bias were removed by randomization, then the error event would become independent of $u_{A^c}$.

Proposition 15: For any symmetric B-DMC $W$, the parameters $\{Z(W_N^{(i)})\}$ given by (7) can be calculated by the simplified formula

...

We omit the proof of this result.

For the important example of a BSC, this formula becomes

...

This sum for $Z(W_N^{(i)})$ has ... terms, as compared to ... terms in (7).
Proposition 17: For any $n \ge 0$, $N = 2^n$, and $b_1 b_2 \cdots b_n \in \{0, 1\}^n$, the rows of $G_N$ and $F^{\otimes n}$ with index $i = 1 + \sum_{j=1}^{n} b_j 2^{n-j}$ have the same Hamming weight given by

$2^{wt(b_1 b_2 \cdots b_n)}$   (74)

where $wt(b_1 b_2 \cdots b_n) \triangleq \sum_{j=1}^{n} b_j$ is the Hamming weight of $b_1 b_2 \cdots b_n$.

Proof: For fixed $b_1, \ldots, b_n$, the sum of the terms ... (as integers) over all ... gives the Hamming weight of the row of $F^{\otimes n}$ with index $i$. From the preceding formula for ..., this sum is easily seen to be $2^{wt(b_1 \cdots b_n)}$. The proof for $G_N$ is similar.

C. Encoding Complexity

For complexity estimation, our computational model will be a single-processor machine with a random-access memory. The complexities expressed will be time complexities. The discussion will be given for an arbitrary $G_N$-coset code with parameters $(N, K, A, u_{A^c})$.

Let $\chi_E(N)$ denote the worst case encoding complexity over all $(N, K, A, u_{A^c})$ codes with a given block length $N$. If we take the complexity of a scalar mod-$2$ addition as one unit and the complexity of the reverse shuffle operation as $N$ units, we see from Fig. 3 that $\chi_E(N) \le N/2 + N + 2\chi_E(N/2)$. Starting with an initial value $\chi_E(2) = 3$ (a generous figure), we obtain by induction that $\chi_E(N) \le \tfrac{3}{2} N \log N$ for all $N = 2^n$, $n \ge 1$. Thus, the encoding complexity is $O(N \log N)$.

Fig. 9. A circuit for implementing the transformation $F^{\otimes 3}$. Signals flow from left to right. Each edge carries a signal $0$ or $1$. Each node adds (mod-$2$) the signals on all incoming edges from the left and sends the result out on all edges to the right. (Edges carrying the signals $u_i$ and $x_i$ are not shown.)

A specific implementation of the encoder using the form $G_N = B_N F^{\otimes n}$ is shown in Fig. 9 for $N = 8$. The input to the circuit is the bit-reversed version of $u_1^8$, i.e., $\tilde{u}_1^8 = u_1^8 B_8$. The output is given by $x_1^8 = \tilde{u}_1^8 F^{\otimes 3} = u_1^8 G_8$. In general, the complexity of this implementation is $O(N \log N)$ with $O(N)$ for $B_N$ and $O(N \log N)$ for $F^{\otimes n}$.

An alternative implementation of the encoder would be to apply $u_1^8$ in natural index order at the input of the circuit in Fig. 9. Then, we would obtain $\tilde{x}_1^8 = u_1^8 F^{\otimes 3}$ at the output. Encoding could be completed by a post bit-reversal operation: $x_1^8 = \tilde{x}_1^8 B_8$.

VIII. DECODING

In this section, we consider the computational complexity of the SC decoding algorithm. As in the previous section, our computational model will be a single-processor machine with a random-access memory and the complexities expressed will be time complexities. Let $\chi_D(N)$ denote the worst case complexity of SC decoding over all $G_N$-coset codes with a given block length $N$. We will show that $\chi_D(N) = O(N \log N)$.

A. A First Decoding Algorithm

Consider SC decoding for an arbitrary $G_N$-coset code with parameter $(N, K, A, u_{A^c})$. Recall that the source vector $u_1^N$ consists of a random part $u_A$ and a frozen part $u_{A^c}$. This vector is transmitted across $W_N$ and a channel output $y_1^N$ is obtained with probability $W_N(y_1^N \mid u_1^N)$. The SC decoder observes $(y_1^N, u_{A^c})$ and generates an estimate $\hat{u}_1^N$ of $u_1^N$. We may visualize the decoder as consisting of $N$ decision elements (DEs), one for each source element $u_i$; the DEs are activated in the order $1$ to $N$. If $i \in A^c$, the element $u_i$ is known; so, the $i$th DE, when its turn comes, simply sets $\hat{u}_i = u_i$ and sends this result to all succeeding DEs. If $i \in A$, the $i$th DE waits until it has received the previous decisions $\hat{u}_1^{i-1}$, and upon receiving them, computes the likelihood ratio (LR)

$L_N^{(i)}(y_1^N, \hat{u}_1^{i-1}) \triangleq \frac{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 0)}{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 1)}$

and generates its decision as

$\hat{u}_i = 0$ if $L_N^{(i)}(y_1^N, \hat{u}_1^{i-1}) \ge 1$; $\hat{u}_i = 1$ otherwise

which is then sent to all succeeding DEs. This is a single-pass algorithm, with no revision of estimates. The complexity of this algorithm is determined essentially by the complexity of computing the LRs.

A straightforward calculation using the recursive formulas (22) and (23) gives

$L_N^{(2i-1)}(y_1^N, \hat{u}_1^{2i-2}) = \frac{L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2})\, L_{N/2}^{(i)}(y_{N/2+1}^N, \hat{u}_{1,e}^{2i-2}) + 1}{L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2}) + L_{N/2}^{(i)}(y_{N/2+1}^N, \hat{u}_{1,e}^{2i-2})}$   (75)
and

$L_N^{(2i)}(y_1^N, \hat{u}_1^{2i-1}) = \left[ L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2}) \right]^{1 - 2\hat{u}_{2i-1}} L_{N/2}^{(i)}(y_{N/2+1}^N, \hat{u}_{1,e}^{2i-2})$   (76)

(77)
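The LR recursion (75)-(76) translates directly into code. The sketch below (ours, not from the paper) implements it for a BSC and cross-checks the recursive LRs against $W_N^{(i)}$ computed by brute force from its definition for $N = 4$, using $G_4 = B_4 F^{\otimes 2}$:

```python
from itertools import product

p = 0.1                                    # BSC crossover probability
def Wc(y, x):                              # channel transition W(y|x)
    return 1 - p if y == x else p

def lr(y, u_hat):
    """L_N^{(i)} via (75)-(76); y has length N, u_hat holds the i-1 past decisions."""
    if len(y) == 1:
        return Wc(y[0], 0) / Wc(y[0], 1)
    N = len(y)
    if len(u_hat) % 2 == 0:                # odd phase 2i-1: formula (75)
        uo, ue = u_hat[0::2], u_hat[1::2]
        la = lr(y[:N // 2], [a ^ b for a, b in zip(uo, ue)])
        lb = lr(y[N // 2:], ue)
        return (la * lb + 1) / (la + lb)
    else:                                  # even phase 2i: formula (76)
        prev = u_hat[-1]
        uo, ue = u_hat[:-1][0::2], u_hat[:-1][1::2]
        la = lr(y[:N // 2], [a ^ b for a, b in zip(uo, ue)])
        lb = lr(y[N // 2:], ue)
        return (la if prev == 0 else 1 / la) * lb

# Brute-force L_4^{(i)} from the definition of W_4^{(i)}
G4 = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
def W4(y, u):
    x = [sum(u[k] * G4[k][j] for k in range(4)) % 2 for j in range(4)]
    out = 1.0
    for yi, xi in zip(y, x):
        out *= Wc(yi, xi)
    return out

def lr_bruteforce(y, u_hat):
    i = len(u_hat)
    num = den = 0.0
    for tail in product((0, 1), repeat=3 - i):
        num += W4(y, list(u_hat) + [0] + list(tail))
        den += W4(y, list(u_hat) + [1] + list(tail))
    return num / den

y = [0, 1, 1, 0]
for u_hat in ([], [1], [1, 0], [1, 0, 1]):
    assert abs(lr(y, u_hat) - lr_bruteforce(y, u_hat)) < 1e-9
print("recursive LRs match the definition of W_4^{(i)}")
```

The recursion halves the block at each level, which is the source of the $O(N \log N)$ decoding complexity claimed in Theorem 5 (with memoization of shared subcalls; the naive recursion above recomputes them).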
be calculated at this node is $L_8^{(1)}(y_1^8)$ while the second label indicates that this node will be the 26th node to be activated. The numeric labels, 1 through 32, will be used as quick identifiers in referring to nodes in the graph.

The decoder is visualized as consisting of $N$ DEs situated at the leftmost side of the decoder graph. The node with label ... is associated with the $i$th DE, $1 \le i \le N$. The positioning of the DEs in the leftmost column follows the bit-reversed index order, as in Fig. 9.

Decoding begins with DE 1 activating node 1 for the calculation of $L_8^{(1)}(y_1^8)$. Node 1 in turn activates node 2 for $L_4^{(1)}(y_1^4)$. At this point, program control passes to node 2, and node 1 will wait until node 2 delivers the requested LR. The process continues. Node 2 activates node 3, which activates node 4. Node 4 is a node at the channel level; so it computes $L_1^{(1)}(y_1)$ and passes it to nodes 3 and 23, its left-side neighbors. In general, a node will send its computational result to all its left-side neighbors (although this will not be stated explicitly below). Program control will be passed back to the left neighbor from which it was received.

Node 3 still needs data from the right side and activates node 5, which delivers $L_1^{(1)}(y_2)$. Node 3 assembles $L_2^{(1)}(y_1^2)$ from the messages it has received from nodes 4 and 5 and sends it to node 2. Next, node 2 activates node 6, which activates nodes 7 and 8, and returns its result to node 2. Node 2 compiles its response $L_4^{(1)}(y_1^4)$ and sends it to node 1. Node 1 activates node 9 which calculates $L_4^{(1)}(y_5^8)$ in the same manner as node 2 calculated $L_4^{(1)}(y_1^4)$, and returns the result to node 1. Node 1 now assembles $L_8^{(1)}(y_1^8)$ and sends it to DE 1. Since $u_1$ is a frozen bit, DE 1 ignores the received LR, declares $\hat{u}_1 = u_1$, and passes control to DE 2, located next to node 16.

DE 2 activates node 16 for $L_8^{(2)}(y_1^8, \hat{u}_1)$. Node 16 assembles $L_8^{(2)}(y_1^8, \hat{u}_1)$ from the already-received LRs $L_4^{(1)}(y_1^4)$ and $L_4^{(1)}(y_5^8)$, and returns its response without activating any node. DE 2 ignores the returned LR since $u_2$ is frozen, announces $\hat{u}_2 = u_2$, and passes control to DE 3.

DE 3 activates node 17 for $L_8^{(3)}(y_1^8, \hat{u}_1^2)$. This triggers LR requests at nodes 18 and 19, but no further. The bit $u_3$ is not frozen; so, the decision $\hat{u}_3$ is made in accordance with $L_8^{(3)}(y_1^8, \hat{u}_1^2)$, and control is passed to DE 4. DE 4 activates node 20 for $L_8^{(4)}(y_1^8, \hat{u}_1^3)$, which is readily assembled and returned. The algorithm continues in this manner until finally DE 8 receives $L_8^{(8)}(y_1^8, \hat{u}_1^7)$ and decides $\hat{u}_8$.

There are a number of observations that can be made by looking at this example that should provide further insight into the general decoding algorithm. First, notice that the computation of $L_8^{(1)}(y_1^8)$ is carried out in a subtree rooted at node 1, consisting of paths going from left to right, and spanning all nodes at the channel level. This subtree splits into two disjoint subtrees, namely, the subtree rooted at node 2 for the calculation of $L_4^{(1)}(y_1^4)$ and the subtree rooted at node 9 for the calculation of $L_4^{(1)}(y_5^8)$. Since the two subtrees are disjoint, the corresponding

Second, we notice that the decoder graph consists of butterflies ($2$-by-$2$ complete bipartite graphs) that tie together adjacent levels of the graph. For example, nodes 9, 19, 10, and 13 form a butterfly. The computational subtrees rooted at nodes 9 and 19 split into a single pair of computational subtrees, one rooted at node 10, the other at node 13. Also note that among the four nodes of a butterfly, the upper-left node is always the first node to be activated by the above depth-first algorithm and the lower-left node always the last one. The upper-right and lower-right nodes are activated by the upper-left node and they may be activated in any order or even in parallel. The algorithm we specified always activated the upper-right node first, but this choice was arbitrary. When the lower-left node is activated, it finds the LRs from its right neighbors ready for assembly. The upper-left node assembles the LRs it receives from the right side as in formula (75), the lower-left node as in (76). These formulas show that the butterfly patterns impose a constraint on the completion time of LR calculations: in any given butterfly, the lower-left node needs to wait for the result of the upper-left node which in turn needs to wait for the results of the right-side nodes.

Variants of the decoder are possible in which the nodal computations are scheduled differently. In the "left-to-right" implementation given above, nodes waited to be activated. However, it is possible to have a "right-to-left" implementation in which each node starts its computation autonomously as soon as its right-side neighbors finish their calculations; this allows exploiting parallelism in computations to the maximum possible extent.

For example, in such a fully parallel implementation for the case $N = 8$ in Fig. 10, all eight nodes at the channel level start calculating their respective LRs in the first time slot following the availability of the channel output vector $y_1^8$. In the second time slot, nodes 3, 6, 10, and 13 do their LR calculations in parallel. Note that this is the maximum degree of parallelism possible in the second time slot. Node 23, for example, cannot calculate its LR in this slot, because ... is not yet available; it has to wait until decisions ... are announced by the corresponding DEs. In the third time slot, nodes 2 and 9 do their calculations. In time slot 4, the first decision $\hat{u}_1$ is made at node 1 and broadcast to all nodes across the graph (or at least to those that need it). In slot 5, node 16 calculates its LR and broadcasts it. In slot 6, nodes 18 and 19 do their calculations. This process continues until time slot 15 when node 32 decides $\hat{u}_8$. It can be shown that, in general, this fully parallel decoder implementation has a latency of $2N - 1$ time slots for a code of block-length $N$.

IX. CODE CONSTRUCTION

The input to a polar code construction algorithm is a triple $(W, N, K)$ where $W$ is the B-DMC on which the code will be used, $N$ is the code block length, and $K$ is the dimensionality of the code. The output of the algorithm is an information set
calculations can be carried out independently (even in parallel of size such that is as small
if there are multiple processors). This splitting of computational as possible. We exclude the search for a good frozen vector
subtrees into disjoint subtrees holds for all nodes in the graph from the code construction problem because the problem is al-
(except those at the channel level), making it possible to imple- ready difficult enough. Recall that, for symmetric channels, the
ment the decoder with a high degree of parallelism. code performance is not affected by the choice of .
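For the BEC, the construction problem just stated can be carried out exactly: by the recursive relations (6), one polarization step maps a parameter Z to the pair (2Z − Z², Z²). The following is a minimal sketch of this computation (function names are illustrative, not from the paper):

```python
def bec_bhattacharyya(N, eps):
    """Return [Z(W_N^(1)), ..., Z(W_N^(N))] for a BEC with erasure
    probability eps; N must be a power of two.  Position j (0-indexed)
    corresponds to the channel whose bit-index is the binary expansion
    of j, i.e., j = i - 1 in the paper's 1-indexed notation."""
    z = [eps]
    while len(z) < N:
        nxt = []
        for val in z:
            nxt.append(2 * val - val * val)  # bit 0: degraded ("minus") channel
            nxt.append(val * val)            # bit 1: upgraded ("plus") channel
        z = nxt
    return z

def construct(N, K, eps):
    """Information set (0-indexed): the K indices with the smallest Z."""
    z = bec_bhattacharyya(N, eps)
    return sorted(sorted(range(N), key=lambda j: z[j])[:K])
```

The loop computes 2N − 2 values in total, i.e., O(N) work. As a sanity check, the two descendants of a channel with parameter Z have parameters summing to 2Z, so the N parameters always average to eps; for N = 8, K = 4, eps = 0.5 the selected set is {3, 5, 6, 7}.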
In principle, the code construction problem can be solved by computing all the parameters {Z(W_N^{(i)}) : 1 ≤ i ≤ N} and sorting them; unfortunately, we do not have an efficient algorithm for doing this. For symmetric channels, some computational shortcuts are available, as we showed by Proposition 15, but these shortcuts have not yielded an efficient algorithm, either. One exception to all this is the BEC, for which the parameters can all be calculated in time O(N) thanks to the recursive formulas (38).

Since exact code construction appears too complex, it makes sense to look for approximate constructions based on estimates of the parameters {Z(W_N^{(i)})}. To that end, it is preferable to pose the exact code construction problem as a decision problem: Given a threshold γ ∈ [0, 1] and an index i ∈ {1, …, N}, decide whether i ∈ A(γ), where A(γ) = {i : Z(W_N^{(i)}) < γ}.

Any algorithm for solving this decision problem can be used to solve the code construction problem. We can simply run the algorithm with various settings for γ until we obtain an information set of the desired size K.

Approximate code construction algorithms can be proposed based on statistically reliable and efficient methods for estimating whether i ∈ A(γ) for any given pair (i, γ). The estimation problem can be approached by noting that, as we have implicitly shown in (54), the parameter Z(W_N^{(i)}) is the expectation of the RV

√( W_N^{(i)}(Y_1^N, U_1^{i−1} | U_i ⊕ 1) / W_N^{(i)}(Y_1^N, U_1^{i−1} | U_i) )   (80)

where (U_1^N, Y_1^N) is sampled from the joint probability assignment P(u_1^N, y_1^N) = 2^{−N} W_N(y_1^N | u_1^N). A Monte Carlo approach can be taken, where samples of (U_1^N, Y_1^N) are generated from the given distribution and the empirical means are calculated. Given a sample (u_1^N, y_1^N) of (U_1^N, Y_1^N), the sample values of the RVs (80) can all be computed in complexity O(N log N). An SC decoder may be used for this computation since the sample values of (80) are just the square roots of the decision statistics that the DEs in an SC decoder ordinarily compute. (In applying an SC decoder for this task, the information set should be taken as the null set.)

Statistical algorithms are helped by the polarization phenomenon: for any fixed γ and as N grows, it becomes easier to resolve whether i ∈ A(γ), because an ever-growing fraction of the parameters {Z(W_N^{(i)})} tend to cluster around 0 or 1.

It is conceivable that, in an operational system, the estimation of the parameters {Z(W_N^{(i)})} is made part of an SC decoding procedure, with continual update of the information set as more reliable estimates become available.

X. A NOTE ON THE RM RULE

In this part, we return to the claim made in Section I-D that the RM rule for information set selection leads to asymptotically unreliable codes under SC decoding.

Recall that, for a given (N, K), the RM rule constructs a G_N-coset code with parameter (N, K, A, u_{A^c}) by prioritizing each index i ∈ {1, …, N} for inclusion in the information set A w.r.t. the Hamming weight of the ith row of G_N. The RM rule sets the frozen bits u_{A^c} to zero. In light of Proposition 17, the RM rule can be restated in bit-indexed terminology as follows.

RM Rule: For a given (N, K), with N = 2^n, choose A as follows: i) Determine the integer r such that

|{i : w_H(b(i)) > r}| ≤ K ≤ |{i : w_H(b(i)) ≥ r}|   (81)

where b(i) = b_1 ⋯ b_n denotes the binary expansion of i − 1 and w_H(·) the Hamming weight. ii) Put each index i with w_H(b(i)) > r into A. iii) Put sufficiently many additional indices i with w_H(b(i)) = r into A to complete its size to K.

We observe that this rule will select the index with bit-index (0, …, 0, 1, …, 1), consisting of n − r zeros followed by r ones, for inclusion in A. This index turns out to be a particularly poor choice, at least for the class of BECs, as we show in the remaining part of this section.

Let us assume that the code constructed by the RM rule is used on a BEC W with some erasure probability ε > 0. We will show that the symmetric capacity of the channel seen by this index converges to zero for any fixed positive coding rate as the block length is increased. For this, we recall the relations (6), which, in the bit-indexed channel notation of Section IV, can be written as follows. For any b_1, …, b_k

I(W_{b_1 ⋯ b_k 0}) = I(W_{b_1 ⋯ b_k})²
I(W_{b_1 ⋯ b_k 1}) = 2 I(W_{b_1 ⋯ b_k}) − I(W_{b_1 ⋯ b_k})²

with initial values I(W_0) = I(W)² and I(W_1) = 2 I(W) − I(W)², where I(W) = 1 − ε. These give the bound

I(W_{0 ⋯ 0 1 ⋯ 1}) ≤ 2^r (1 − ε)^{2^{n−r}}   (82)

for the channel whose bit-index consists of n − r zeros followed by r ones.

Now, consider a sequence of RM codes with a fixed rate R > 0, block length N increasing to infinity, and K = NR. Let r_N denote the parameter r in (81) for the code with block length N in this sequence. Let n = log₂ N. A simple asymptotic analysis shows that the ratio r_N/n must go to 1/2 as N is increased. This in turn implies by (82) that the symmetric capacity of the channel with bit-index (0, …, 0, 1, …, 1) must go to zero.

Suppose that this sequence of RM codes is decoded using an SC decoder as in Section I-C.2 where the decision metric ignores knowledge of frozen bits and instead uses randomization over all possible choices. Then, as N goes to infinity, the SC decoder decision element with index 2^{r_N} sees a channel whose capacity goes to zero, while the corresponding element of the input vector is assigned 1 bit of information by the RM rule. This means that the RM code sequence is asymptotically unreliable under this type of SC decoding.
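In this bit-indexed form, the RM rule amounts to ranking indices by the Hamming weight of their binary expansions. A minimal sketch (0-indexed, so j here stands for i − 1 in the text; the function name is illustrative):

```python
def rm_information_set(n, K):
    """RM rule sketch: choose the K indices of {0, ..., 2^n - 1} whose
    n-bit expansions have the largest Hamming weight; ties at the
    threshold weight r are broken here simply by index order."""
    N = 1 << n
    ranked = sorted(range(N), key=lambda j: (-bin(j).count("1"), j))
    return sorted(ranked[:K])
```

For n = 3, K = 4 this yields {3, 5, 6, 7}: the single weight-3 index plus the three weight-2 indices. Any such tie-breaking at weight r may admit indices of the poor kind discussed above.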
ARIKAN: A METHOD FOR CONSTRUCTING CAPACITY-ACHIEVING CODES 3069
We should emphasize that the above result does not say that
RM codes are asymptotically bad under any SC decoder, nor
does it make a claim about the performance of RM codes under
other decoding algorithms. (It is interesting that the possibility
of RM codes being capacity-achieving codes under ML de-
coding seems to have received no attention in the literature.)
A. Rate of Polarization
A major open problem suggested by this paper is to determine
how fast a channel polarizes as a function of the block-length
parameter N. In recent work [12], the following result has been
obtained in this direction.
Proposition 18: Let W be a B-DMC. For any fixed rate R < I(W) and constant β < 1/2, there exists a sequence of sets {A_N} with A_N ⊂ {1, …, N} and |A_N| ≥ NR such that

Σ_{i ∈ A_N} Z(W_N^{(i)}) = o(2^{−N^β}).   (83)
Conversely, if R > 0 and β > 1/2, then for any sequence of sets {A_N} with A_N ⊂ {1, …, N} and |A_N| ≥ NR, we have

Σ_{i ∈ A_N} Z(W_N^{(i)}) = ω(2^{−N^β}).   (84)

As a corollary, Theorem 3 is strengthened as follows.

Proposition 19: For polar coding on a B-DMC W at any fixed rate R < I(W), and any fixed β < 1/2

P_e(N, R) = o(2^{−N^β}).   (85)

This is a vast improvement over the O(N^{−1/4}) bound proved in this paper. Note that the bound still does not depend on the rate R as long as R < I(W). A problem of theoretical interest is to obtain sharper bounds on P_e(N, R) that show a more explicit dependence on R.

Another problem of interest related to polarization is robustness against channel parameter variations. A finding in this regard is the following result [13]: If a polar code is designed for a B-DMC W but used on some other B-DMC W′, then the code will perform at least as well as it would perform on W provided W is a degraded version of W′ in the sense of Shannon [14]. This result gives reason to expect a graceful degradation of polar-coding performance due to errors in channel modeling.

B. Generalizations

The polarization scheme considered in this paper can be generalized as shown in Fig. 11. In this general form, the channel input alphabet is assumed q-ary, {0, 1, …, q − 1}, for some q ≥ 2. The construction begins by combining ℓ independent copies of a DMC W to obtain W_ℓ, where ℓ ≥ 2 is a fixed parameter of the construction. The general step combines ℓ independent copies of the channel from the previous step to obtain the next one. In general, the size of the construction is N = ℓ^n after n steps. The construction is characterized by a kernel in which some finite set is included in the mapping for randomization. The reason for introducing randomization will be discussed shortly.

The vectors u_1^N and y_1^N in Fig. 11 denote the input and output vectors of W_N. The input vector is first transformed by breaking it into N/ℓ consecutive subblocks of length ℓ, namely, u_1^ℓ, u_{ℓ+1}^{2ℓ}, …, u_{N−ℓ+1}^N, and passing each subblock through the kernel transform. Then, a permutation sorts the components of the resulting vector w.r.t. mod-ℓ residue classes of their indices. The sorter ensures that, for any 1 ≤ i ≤ ℓ, the ith copy of W_{N/ℓ}, counting from the top of the figure, gets as input those components whose indices are congruent to i mod-ℓ. For example, the first copy receives the components with indices 1, ℓ + 1, 2ℓ + 1, and so on; the general assignment follows the same mod-ℓ rule for all 1 ≤ i ≤ ℓ.

We regard the randomization parameters as being chosen at random at the time of code construction, but fixed throughout the operation of the system; the decoder operates with full knowledge of them. For the binary case considered in this paper, we did not employ any randomization. Here, randomization has been introduced as part of the general construction because preliminary studies show that it greatly simplifies the analysis of generalized polarization schemes. This subject will be explored further in future work.

Certain additional constraints need to be placed on the kernel to ensure that a polar code can be defined that is suitable for SC decoding in the natural order u_1 to u_N. To that end, it is sufficient to restrict the kernel to unidirectional functions, namely, invertible functions of the form
inequality …. Using this in the expression for …, we obtain the claim of the lemma … for all …. Since …, (87) can be written as

(88)

Substituting … and … into (88) and simplifying, we obtain …, which for all four possible values of … is equivalent to …. Thus, either there exists no … such that …, in which case …, or for all … we have …, which implies ….

D. Proof of Proposition 5

Proof of (26) is straightforward. … in (86) and use (5) again to obtain (22). For the proof of (23), we write … where the inequality follows from the identity …. Next, we note that …. Likewise, each term obtained by expanding … gives … when summed over …. Also, … summed over … equals …. Combining these, we obtain the claim (27). Equality holds in (27) iff, for any choice of …, one of the following is true: … or … or …. This is satisfied if W is a BEC. Conversely, if we take …, we see that for equality in (27), we must have, for any choice of …, either …; this is equivalent to saying that W is a BEC.

To prove (28), we need the following result, which states that the parameter Z is a convex function of the channel transition probabilities.

Lemma 4: Given any collection of B-DMCs {W_j} and a probability distribution Q on the collection, define W as the channel W(y|x) = Σ_j Q(j) W_j(y|x). Then

(89)

where