Sunteți pe pagina 1din 23

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO.

7, JULY 2009 3051

Channel Polarization: A Method for Constructing


Capacity-Achieving Codes for Symmetric
Binary-Input Memoryless Channels
Erdal Arkan, Senior Member, IEEE

AbstractA method is proposed, called channel polarization, A. Preliminaries


to construct code sequences that achieve the symmetric capacity
I (W ) of any given binary-input discrete memoryless channel
We write to denote a generic B-DMC with
(B-DMC) W . The symmetric capacity is the highest rate achiev- input alphabet , output alphabet , and transition probabilities
able subject to using the input letters of the channel with equal . The input alphabet will always be
probability. Channel polarization refers to the fact that it is pos- , the output alphabet and the transition probabilities may
sible to synthesize, out of N independent copies of a given B-DMC be arbitrary. We write to denote the channel corresponding
(i)
W , a second set of N binary-input channels fWN : 1  i  N g
to uses of ; thus, with
such that, as N becomes large, the fraction of indices i for which
(i)
I (WN ) is near 1 approaches I (W ) and the fraction for which .
(i)
I (WN ) is near 0 approaches 1 0 I (W ). The polarized channels Given a B-DMC , there are two channel parameters of pri-
(i)
fWN g are well-conditioned for channel coding: one need only mary interest in this paper: the symmetric capacity
send data at rate 1 through those with capacity near 1 and at rate 0
through the remaining. Codes constructed on the basis of this idea
are called polar codes. The paper proves that, given any B-DMC
W with I (W ) > 0 and any target rate R < I (W ), there exists a
sequence of polar codes f n ; n  1g such that n has block-length and the Bhattacharyya parameter
N = 2n , rate  R, and probability of block error under suc-
cessive cancellation decoding bounded as Pe (N; R)  O(N 0 )
independently of the code rate. This performance is achievable by
encoders and decoders with complexity O(N log N ) for each.
Index TermsCapacity-achieving codes, channel capacity,
These parameters are used as measures of rate and reliability,
channel polarization, Plotkin construction, polar codes, Reed respectively. is the highest rate at which reliable commu-
Muller (RM) codes, successive cancellation decoding. nication is possible across using the inputs of with equal
frequency. is an upper bound on the probability of max-
imum-likelihood (ML) decision error when is used only once
I. INTRODUCTION AND OVERVIEW to transmit a or .
FASCINATING aspect of Shannons proof of the noisy
A channel coding theorem is the random-coding method
that he used to show the existence of capacity-achieving code
It is easy to see that
we will use base- logarithms; hence,
values in
takes values in . Throughout,
will also take
. The unit for code rates and channel capacities
sequences without exhibiting any specific such sequence [1]. will be bits.
Explicit construction of provably capacity-achieving code Intuitively, one would expect that iff ,
sequences with low encoding and decoding complexities has and iff . The following bounds, proved in
since then been an elusive goal. This paper is an attempt to the Appendix, make this precise.
meet this goal for the class of binary-input discrete memoryless
channels (B-DMCs). Proposition 1: For any B-DMC , we have
We will give a description of the main ideas and results of the
(1)
paper in this section. First, we give some definitions and state
some basic facts that are used throughout the paper. (2)
Manuscript received October 14, 2007; revised August 13, 2008. Current ver-
sion published June 24, 2009. This work was supported in part by The Scien-
The symmetric capacity equals the Shannon capacity
tific and Technological Research Council of Turkey (TBITAK) under Project
107E216 and in part by the European Commission FP7 Network of Excellence when is a symmetric channel, i.e., a channel for which there
NEWCOM++ under Contract 216715. The material in this paper was presented exists a permutation of the output alphabet such that i)
in part at the IEEE International Symposium on Information Theory (ISIT),
and ii) for all . The bi-
Toronto, ON, Canada, July 2008.
The author is with the Department of Electrical-Electronics Engineering, nary symmetric channel (BSC) and the binary erasure channel
Bilkent University, Ankara, 06800, Turkey (e-mail: arikan@ee.bilkent.edu.tr). (BEC) are examples of symmetric channels. A BSC is a B-DMC
Communicated by Y. Steinberg, Associate Editor for Shannon Theory. with and
Color versions of Figures 4 and 7 in this paper are available online at http://
ieeexplore.ieee.org. . A B-DMC is called a BEC if for each , either
Digital Object Identifier 10.1109/TIT.2009.2021379 or . In the latter case,
0018-9448/$25.00 2009 IEEE
3052 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009

Fig. 1. The channel W.

is said to be an erasure symbol. The sum of over all


erasure symbols is called the erasure probability of the BEC.
We denote random variables (RVs) by upper case letters, such
as , and their realizations (sample values) by the corre-
sponding lower case letters, such as . For an RV,
denotes the probability assignment on . For a joint ensemble
Fig. 2. The channel W and its relation to W and W.
of RVs denotes the joint probability assignment.
We use the standard notation to denote the channel , where can be any power of two,
mutual information and its conditional form, respectively. . The recursion begins at the th level
We use the notation as shorthand for denoting a row vector with only one copy of and we set . The first level
. Given such a vector , we write , of the recursion combines two independent copies of
, to denote the subvector ; if is regarded as shown in Fig. 1 and obtains the channel
as void. Given and , we write to denote with the transition probabilities
the subvector . We write to denote the subvector
(3)
with odd indices odd . We write to de-
note the subvector with even indices even . The next level of the recursion is shown in Fig. 2 where two
For example, for , we have independent copies of are combined to create the channel
. The notation is used to denote with transition probabilities
the all-zero vector. .
Code constructions in this paper will be carried out in vector In Fig. 2, is the permutation operation that maps an input
spaces over the binary field GF . Unless specified otherwise, to . The mapping
all vectors, matrices, and operations on them will be over from the input of to the input of can be written as
GF . In particular, for vectors over GF we write with
to denote their componentwise mod- sum. The
Kronecker product of an -by- matrix and an
-by- matrix is defined as

.. .. .. Thus, we have the relation be-


. . .
tween the transition probabilities of and those of .
The general form of the recursion is shown in Fig. 3 where
which is an -by- matrix. The Kronecker power is two independent copies of are combined to produce the
defined as for all . We will follow the channel . The input vector to is first transformed
convention that . into so that and for
We write to denote the number of elements in a set . We . The operator in the figure is a permutation, known as
write to denote the indicator function of a set ; thus, the reverse shuffle operation, and acts on its input to produce
equals if and otherwise. , which becomes the
We use the standard Landau notation to input to the two copies of as shown in the figure.
denote the asymptotic behavior of functions. We observe that the mapping is linear over GF .
It follows by induction that the overall mapping ,
B. Channel Polarization from the input of the synthesized channel to the input of
Channel polarization is an operation by which one manufac- the underlying raw channels , is also linear and may be
tures out of independent copies of a given B-DMC a represented by a matrix so that . We call
second set of channels that show a the generator matrix of size . The transition probabilities of
polarization effect in the sense that, as becomes large, the the two channels and are related by
symmetric capacity terms tend towards or for all (4)
but a vanishing fraction of indices . This operation consists of for all . We will show in Section VII that
a channel combining phase and a channel splitting phase. equals for any , where is a
1) Channel Combining: This phase combines copies of a
permutation matrix known as bit-reversal and .
given B-DMC in a recursive manner to produce a vector
ARIKAN: A METHOD FOR CONSTRUCTING CAPACITY-ACHIEVING CODES 3053

with . This recursion is valid only for BECs


and it is proved in Section III. No efficient algorithm is known
for calculation of for a general B-DMC .
Fig. 4 shows that tends to be near for small
and near for large . However, shows an erratic be-
havior for an intermediate range of . For general B-DMCs, de-
termining the subset of indices for which is above a
given threshold is an important computational problem that will
be addressed in Section IX.
4) Rate of Polarization: For proving coding theorems, the
speed with which the polarization effect takes hold as a function
of is important. Our main result in this regard is given in terms
of the parameters

Fig. 3. Recursive construction of W from two copies of W . (7)

Theorem 2: For any B-DMC with , and


Note that the channel combining operation is fully specified by any fixed , there exists a sequence of sets
the matrix . Also note that and have the same set of , such that
rows, but in a different (bit-reversed) order; we will discuss this and for all .
topic more fully in Section VII.
2) Channel Splitting: Having synthesized the vector channel This theorem is proved in Section IV-B.
out of , the next step of channel polarization is to We stated the polarization result in Theorem 2 in terms
split back into a set of binary-input coordinate channels rather than because this form is better
, , defined by the transition suited to the coding results that we will develop. A rate of
probabilities polarization result in terms of can be obtained from
Theorem 2 with the help of Proposition 1.
(5)
C. Polar Coding

where denotes the output of and its input. We take advantage of the polarization effect to construct
To gain an intuitive understanding of the channels , codes that achieve the symmetric channel capacity by a
consider a genie-aided successive cancellation decoder in which method we call polar coding. The basic idea of polar coding is
the th decision element estimates after observing and to create a coding system where one can access each coordinate
the past channel inputs (supplied correctly by the genie channel individually and send data only through those for
regardless of any decision errors at earlier stages). If is a which is near .
priori uniform on , then is the effective channel seen 1) -Coset Codes: We first describe a class of block codes
by the th decision element in this scenario. that contain polar codesthe codes of main interestas a spe-
3) Channel Polarization: cial case. The block lengths for this class are restricted to
powers of two, for some . For a given , each
Theorem 1: For any B-DMC , the channels po- code in the class is encoded in the same manner, namely
larize in the sense that, for any fixed , as goes
to infinity through powers of two, the fraction of indices (8)
for which goes to and
the fraction for which goes to . where is the generator matrix of order , defined above.
For an arbitrary subset of , we may write (8) as
This theorem is proved in Section IV.
The polarization effect is illustrated in Fig. 4 for the case
(9)
is a BEC with erasure probability . The numbers
have been computed using the recursive relations
where denotes the submatrix of formed by the rows
with indices in .
If we now fix and , but leave as a free variable, we
obtain a mapping from source blocks to codeword blocks
(6) . This mapping is a coset code: it is a coset of the linear block
3054 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009

Fig. 4. Plot of I (W ) versus i = 1; . . . ; N = 2 for a BEC with  = 0:5.

code with generator matrix , with the coset determined in the order from to , where ,
by the fixed vector . We will refer to this class are decision functions defined as
of codes collectively as -coset codes. Individual -coset
codes will be identified by a parameter vector , if (12)
where is the code dimension and specifies the size of .1 The
otherwise
ratio is called the code rate. We will refer to as the in-
formation set and to as frozen bits or vector. for all . We will say that a decoder
For example, the code has the encoder block error occurred if or equivalently if .
mapping The decision functions defined above resemble ML de-
cision functions but are not exactly so, because they treat the
future frozen bits as RVs, rather than
(10) as known bits. In exchange for this suboptimality, can be
computed efficiently using recursive formulas, as we will show
in Section II. Apart from algorithmic efficiency, the recursive
For a source block , the coded block is
structure of the decision functions is important because it ren-
.
ders the performance analysis of the decoder tractable. Fortu-
Polar codes will be specified shortly by giving a particular
nately, the loss in performance due to not using true ML decision
rule for the selection of the information set .
functions happens to be negligible: is still achievable.
2) A Successive Cancellation Decoder: Consider a
3) Code Performance: The notation will
-coset code with parameter . Let be
denote the probability of block error for an
encoded into a codeword , let be sent over the channel
code, assuming that each data vector is sent with
, and let a channel output be received. The decoders
probability and decoding is done by the above SC decoder.
task is to generate an estimate of , given knowledge
More precisely
of and . Since the decoder can avoid errors in the
frozen part by setting , the real decoding task is to
generate an estimate of .
The coding results in this paper will be given with respect
to a specific successive cancellation (SC) decoder, unless
some other decoder is mentioned. Given any
-coset code, we will use an SC decoder that generates its The average of over all choices for will
decision by computing be denoted by , i.e.,

if
(11)
if
1We include the redundant parameter K in the parameter set because often A key bound on block error probability under SC decoding is
we consider an ensemble of codes with K fixed and free.A the following.
ARIKAN: A METHOD FOR CONSTRUCTING CAPACITY-ACHIEVING CODES 3055

Proposition 2: For any B-DMC and any choice of the chosen in accordance with the polar coding rule for , and
parameters fixed arbitrarily. The block error probability under successive
cancellation decoding satisfies
(13)
(16)

Hence, for each , there exists a frozen vector such This is proved in Section VI-B. Note that for symmetric chan-
that nels equals the Shannon capacity of .
6) Complexity: An important issue about polar coding is
(14) the complexity of encoding, decoding, and code construction.
The recursive structure of the channel polarization construction
leads to low-complexity encoding and decoding algorithms for
This is proved in Section V-B. This result suggests choosing the class of -coset codes, and in particular, for polar codes.
from among all -subsets of so as to minimize
the right-hand side (RHS) of (13). This idea leads to the defini- Theorem 5: For the class of -coset codes, the complexity
tion of polar codes. of encoding and the complexity of successive cancellation
4) Polar Codes: Given a B-DMC , a -coset code with decoding are both as functions of code block
parameter will be called a polar code for length .
if the information set is chosen as a -element subset of This theorem is proved in Sections VII and VIII. Notice that
such that for all the complexity bounds in Theorem 5 are independent of the
. code rate and the way the frozen vector is chosen. The bounds
Polar codes are channel-specific designs: a polar code for one hold even at rates above , but clearly this has no practical
channel may not be a polar code for another. The main result of significance.
this paper will be to show that polar coding achieves the sym- As for code construction, we have found no low-complexity
metric capacity of any given B-DMC . algorithms for constructing polar codes. One exception is the
An alternative rule for polar code definition would be to case of a BEC for which we have a polar code construction al-
specify as a -element subset of such that gorithm with complexity . We discuss the code construc-
for all . This alternative tion problem further in Section IX and suggest a low-complexity
rule would also achieve . However, the rule based on the statistical algorithm for approximating the exact polar code con-
Bhattacharyya parameters has the advantage of being connected struction.
with an explicit bound on block error probability.
D. Relations To Previous Work
The polar code definition does not specify how the frozen
vector is to be chosen; it may be chosen at will. This de- This paper is an extension of work begun in [2], where
gree of freedom in the choice of simplifies the performance channel combining and splitting were used to show that im-
analysis of polar codes by allowing averaging over an ensemble. provements can be obtained in the sum cutoff rate for some
However, it is not for analytical convenience alone that we do specific DMCs. However, no recursive method was suggested
not specify a precise rule for selecting , but also because there to reach the ultimate limit of such improvements.
it appears that the code performance is relatively insensitive to As the present work progressed, it became clear that polar
that choice. In fact, we prove in Section VI-B that, for symmetric coding had much in common with ReedMuller (RM) coding
channels, any choice for is as good as any other. [3], [4]. Indeed, recursive code construction and SC decoding,
5) Coding Theorems: Fix a B-DMC and a number . which are two essential ingredients of polar coding, appear to
Let be defined as with selected in have been introduced into coding theory by RM codes.
accordance with the polar coding rule for . Thus, According to one construction of RM codes, for any
is the probability of block error under SC decoding for polar and , an RM code with block length and
coding over with block length and rate , averaged over dimension , denoted , is defined as a linear code
all choices for the frozen bits . The main coding result of whose generator matrix is obtained by deleting
this paper is the following. of the rows of so that none of the deleted rows
has a larger Hamming weight (number of s in that row) than
Theorem 3: For any given B-DMC and fixed , any of the remaining rows. For instance
block error probability for polar coding under successive can-
cellation decoding satisfies
(15) and

This theorem follows as an easy corollary to Theorem 2 and


the bound (13), as we show in Section V-B. For symmetric chan-
nels, we have the following stronger version of Theorem 3.
This construction brings out the similarities between RM
Theorem 4: For any symmetric B-DMC and any fixed codes and polar codes. Since and have the same
, consider any sequence of -coset codes set of rows (only in a different order) for any , it is
with increasing to infinity, clear that RM codes belong to the class of -coset codes.
3056 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009

For example, is the -coset code with parameter decoding and proves Theorem 3. Section VI considers polar
. So, RM coding and polar coding may coding for symmetric B-DMCs and proves Theorem 4. Sec-
be regarded as two alternative rules for selecting the infor- tion VII gives an analysis of the encoder mapping , which
mation set of a -coset code of a given size . results in efficient encoder implementations. In Section VIII,
Unlike polar coding, RM coding selects the information set we give an implementation of SC decoding with complexity
in a channel-independent manner; it is not as fine-tuned to . In Section IX, we discuss the code construction
the channel polarization phenomenon as polar coding is. We complexity and propose an statistical algorithm for
will show in Section X that, at least for the class of BECs, the approximate code construction. In Section X, we explain why
RM rule for information set selection leads to asymptotically RM codes have a poor asymptotic performance under SC de-
unreliable codes under SC decoding. So, polar coding goes coding. In Section XI, we point out some generalizations of
beyond RM coding in a nontrivial manner by paying closer the present work, give some complementary remarks, and state
attention to channel polarization. some open problems.
Another connection to existing work can be established by
noting that polar codes are multilevel codes, which II. RECURSIVE CHANNEL TRANSFORMATIONS
are a class of codes originating from Plotkins method for code
combining [5]. This connection is not surprising in view of the We have defined a blockwise channel combining and split-
fact that RM codes are also multilevel codes [6, pp. ting operation by (4) and (5) which transformed independent
114125]. However, unlike typical multilevel code construc- copies of into . The goal in this section is to
tions, where one begins with specific small codes to build larger show that this blockwise channel transformation can be broken
ones, in polar coding the multilevel code is obtained by expur- recursively into single-step channel transformations.
gating rows of a full-order generator matrix , with respect We say that a pair of binary-input channels and
to a channel-specific criterion. The special structure of en- are obtained by a single-step transformation
sures that, no matter how expurgation is done, the resulting code of two independent copies of a binary-input channel
is a multilevel code. In essence, polar coding enjoys and write
the freedom to pick a multilevel code from an ensemble of such
codes so as to suit the channel at hand, while conventional ap- iff there exists a one-to-one mapping such that
proaches to multilevel coding do not have this degree of flexi-
bility. (17)
Finally, we wish to mention a spectral interpretation
of polar codes which is similar to Blahuts treatment of
BoseChaudhuriHocquenghem (BCH) codes [7, Ch. 9]; this (18)
type of similarity has already been pointed out by Forney
[8, Ch. 11] in connection with RM codes. From the spectral for all .
viewpoint, the encoding operation (8) is regarded as a transform According to this, we can write for
of a frequency domain information vector to a time any given B-DMC because
domain codeword vector . The transform is invertible with
. The decoding operation is regarded as a spec-
tral estimation problem in which one is given a time domain
observation , which is a noisy version of , and asked to (19)
estimate . To aid the estimation task, one is allowed to freeze
a certain number of spectral components of . This spectral
interpretation of polar coding suggests that it may be possible
to treat polar codes and BCH codes in a unified framework.
(20)
The spectral interpretation also opens the door to the use of
various signal processing techniques in polar coding; indeed, which are in the form of (17) and (18) by taking as the identity
in Section VII, we exploit some fast transform techniques in mapping.
designing encoders for polar codes. It turns out we can write more generally
E. Paper Outline (21)
The rest of the paper is organized as follows. Section II ex-
plores the recursive properties of the channel splitting operation. This follows as a corollary to the following.
In Section III, we focus on how and get trans- Proposition 3: For any ,
formed through a single step of channel combining and split-
ting. We extend this to an asymptotic analysis in Section IV
and complete the proofs of Theorems 1 and 2. This completes
the part of the paper on channel polarization; the rest of the
paper is mainly about polar coding. Section V develops an upper
bound on the block error probability of polar coding under SC (22)
ARIKAN: A METHOD FOR CONSTRUCTING CAPACITY-ACHIEVING CODES 3057

transformation (21). By understanding the local behavior, we


will be able to reach conclusions about the overall transforma-
tion from to . Proofs of the results in
this section are given in the Appendix.

A. Local Transformation of Rate and Reliability


Proposition 4: Suppose for some set
of binary-input channels. Then

(24)
(25)

with equality iff equals or .


The equality (24) indicates that the single-step channel trans-
form preserves the symmetric capacity. The inequality (25) to-
gether with (24) implies that the symmetric capacity remains
unchanged under a single-step transform,
Fig. 5. The channel transformation process with N = 8 channels. , iff is either a perfect channel or a completely noisy
one. If is neither perfect nor completely noisy, the single-step
transform moves the symmetric capacity away from the center
and in the sense that , thus helping polar-
ization.
Proposition 5: Suppose for some set
of binary-input channels. Then
(23)
(26)
This proposition is proved in the Appendix. The transform (27)
relationship (21) can now be justified by noting that (22) and (28)
(23) are identical in form to (17) and (18), respectively, after the
following substitutions: Equality holds in (27) iff is a BEC. We have
iff equals or , or equivalently, iff equals
or .
This result shows that reliability can only improve under a
single-step channel transform in the sense that

(29)
Thus, we have shown that the blockwise channel transforma- with equality iff is a BEC.
tion from to breaks at a local level into Since the BEC plays a special role with respect to (w.r.t.)
single-step channel transformations of the form (21). The full extremal behavior of reliability, it deserves special attention.
set of such transformations form a fabric as shown in Fig. 5 for
. Reading from right to left, the figure starts with four Proposition 6: Consider the channel transformation
copies of the transformation and con- . If is a BEC with some erasure
tinues in butterfly patterns, each representing a channel trans- probability , then the channels and are BECs with
formation of the form . The erasure probabilities and , respectively. Conversely,
two channels at the right endpoints of the butterflies are always if or is a BEC, then is BEC.
identical and independent. At the rightmost level there are eight
independent copies of ; at the next level to the left, there are B. Rate and Reliability for
four independent copies of and each; and so on. We now return to the context at the end of Section II.
Each step to the left doubles the number of channel types, but
halves the number of independent copies. Proposition 7: For any B-DMC
the transformation
is rate-preserving and reliability-improving in the sense that
III. TRANSFORMATION OF RATE AND RELIABILITY
(30)
We now investigate how the rate and reliability parameters,
and , change through a local (single-step) (31)
3058 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009

with equality in (31) iff is a BEC. Channel splitting moves


the rate and reliability away from the center in the sense that

(32)

(33)

with equality in (32) and (33) iff equals or . The reli-


ability terms further satisfy

(34)

(35)

with equality in (34) iff is a BEC. The cumulative rate and


reliability satisfy
Fig. 6. The tree process for the recursive channel construction.

(36)
is located at level of the tree at node number counting
from the top.
(37) There is a natural indexing of nodes of the tree in Fig. 6 by
bit sequences. The root node is indexed with the null sequence.
The upper node at level is indexed with and the lower node
with equality in (37) iff is a BEC. with . Given a node at level with index , the upper
This result follows from Propositions 4 and 5 as a special node emanating from it has the label and the lower
case and no separate proof is needed. The cumulative relations node . According to this labeling, the channel
(36) and (37) follow by repeated application of (30) and (31), is situated at the node with . We
respectively. The conditions for equality in Proposition 4 are denote the channel located at node alternatively
stated in terms of rather than ; this is possible because as .
i) by Proposition 4, iff ; and ii) We define a random tree process, denoted , in
is a BEC iff is a BEC, which follows from Proposition 6 connection with Fig. 6. The process begins at the root of the tree
by induction. with . For any , given that
For the special case that is a BEC with an erasure proba- equals or with probability each. Thus,
bility , it follows from Propositions 4 and 6 that the parameters the path taken by through the channel tree may be thought
can be computed through the recursion of as being driven by a sequence of independent and identically
distributed (i.i.d.) Bernoulli RVs where
equals or with equal probability. Given that has
taken on a sample value , the random channel process
takes the value . In order to keep track of the rate
(38) and reliability parameters of the random sequence of channels
, we define the random processes and
with . The parameter equals the era- .
sure probability of the channel . The recursive relations (6) For a more precise formulation of the problem, we consider
the probability space where is the space of all binary
follow from (38) by the fact that for
sequences is the Borel field (BF)
a BEC.
generated by the cylinder sets
IV. CHANNEL POLARIZATION , and is the
probability measure defined on such that
We prove the main results on channel polarization in this sec- . For each , we define as the BF generated by the
tion. The analysis is based on the recursive relationships de- cylinder sets . We
picted in Fig. 5; however, it will be more convenient to re-sketch define as the trivial BF consisting of the null set and only.
Fig. 5 as a binary tree as shown in Fig. 6. The root node of the Clearly, .
tree is associated with the channel . The root gives birth to The random processes described above can now be formally
an upper channel and a lower channel , which are as- defined as follows. For and , define
sociated with the two nodes at level . The channel in turn and
gives birth to channels and , and so on. The channel . For , define
ARIKAN: A METHOD FOR CONSTRUCTING CAPACITY-ACHIEVING CODES 3059

. It is clear that, for any fixed , the RVs hence, . Thus,


and are measurable with respect to the BF . , which implies .
This, in turn, means that equals or a.e.
A. Proof of Theorem 1
Proposition 10: The limit RV takes values a.e. in the set
We will prove Theorem 1 by considering the stochastic con- : and .
vergence properties of the random sequences and . Proof: The fact that equals or a.e., combined with
Proposition 8: The sequence of random variables and Borel Proposition 1, implies that a.e. Since ,
fields is a martingale, i.e., the rest of the claim follows.
As a corollary to Proposition 10, we can conclude that, as
and is -measurable (39)
tends to infinity, the symmetric capacity terms
(40) cluster around and , except for a vanishing fraction.
(41) This completes the proof of Theorem 1.

Furthermore, the sequence converges almost ev- It is interesting that the above discussion gives a new interpre-
erywhere (a.e.) to a random variable such that . tation to as the probability that the random process
Proof: Condition (39) is true by construction and (40) by converges to zero. We may use this to strengthen
the fact that . To prove (41), consider a cylinder set the lower bound in (1). (This stronger form is given as a side
and use Proposition 7 to write result and will not be used in the sequel.)
Proposition 11: For any B-DMC , we have
with equality iff is a BEC.
(42)
This result can be interpreted as saying that, among all
Since is the value of on , (41) fol- B-DMCs , the BEC presents the most favorable ratereli-
lows. This completes the proof that is a martingale. ability tradeoff: it minimizes (maximizes reliability)
Since is a uniformly integrable martingale, by general among all channels with a given symmetric capacity ;
convergence results about such martingales (see, e.g., [9, The- equivalently, it minimizes required to achieve a given
orem 9.4.6]), the claim about follows. level of reliability .

It should not be surprising that the limit RV takes values Proof of Proposition 11: Consider two channels and
a.e. in , which is the set of fixed points of under with . Suppose that is a BEC. Then,
the transformation , as determined by has erasure probability and . Consider the
the condition for equality in (25). For a rigorous proof of this random processes and corresponding to and ,
statement, we take an indirect approach and bring the process respectively. By the condition for equality in (34), the process
also into the picture. is stochastically dominated by in the sense that
for all . Thus, the
Proposition 9: The sequence of random variables and Borel probability of converging to zero is lower-bounded by the
fields is a supermartingale, i.e., probability that converges to zero, i.e., .
This implies .
and is -measurable (43)
(44)
(45) B. Proof of Theorem 2
We will now prove Theorem 2, which strengthens the above
Furthermore, the sequence converges a.e. to a
polarization results by specifying a rate of polarization. Con-
random variable which takes values a.e. in .
Proof: Conditions (43) and (44) are clearly satisfied. To sider the probability space . For , by
verify (45), consider a cylinder set and use Proposition 7, we have if and
if . For
Proposition 7 to write
and , define
for all
For and , we have
Since is the value of on , (45) if
follows. This completes the proof that is a super- if
martingale. For the second claim, observe that the supermartin-
gale is uniformly integrable; hence, it converges a.e. which implies
and in to an RV such that (see,
e.g., [9, Theorem 9.4.5]). It follows that
. But, by Proposition 7, with probability ;
3060 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009

For and , define V. PERFORMANCE OF POLAR CODING


We show in this section that polar coding can achieve the
symmetric capacity of any B-DMC . The main tech-
nical task will be to prove Proposition 2. We will carry out the
Then, we have analysis over the class of -coset codes before specializing
the discussion to polar codes. Recall that individual -coset
codes are identified by a parameter vector . In
the analysis, we will fix the parameters while keeping
from which, by putting and , we obtain free to take any value over . In other words, the anal-
ysis will be over the ensemble of -coset codes with a
(46) fixed . The decoder in the system will be the SC de-
coder described in Section I-C.2.
Now, we show that (46) occurs with sufficiently high proba-
bility. First, we use the following result, which is proved in the
Appendix. A. A Probabilistic Setting for the Analysis
Lemma 1: For any fixed , there exists a finite Let be a probability space with the probability
integer such that assignment
(48)
for all . On this probability space, we
Second, we use Chernoffs bound [10, p. 531] to write define an ensemble of random vectors that
represent, respectively, the input to the synthetic channel ,
(47) the input to the productform channel , the output of
where is the binary entropy function. Define as (and also of ), and the decisions by the decoder. For each
the smallest such that the RHS of (47) is greater than or equal sample point , the first three vectors take
to ; it is clear that is finite for any on the values
and , while the decoder output takes on the
and . Now, with
value whose coordinates are defined recursively
and , we obtain the desired bound as

(49)
Finally, we tie the above analysis to the claim of Theorem 2.
Define and for .
A realization for the input random vector
corresponds to sending the data vector together with the
frozen vector . As random vectors, the data part and
and note that the frozen part are uniformly distributed over their respec-
tive ranges and statistically independent. By treating as a
random vector over , we obtain a convenient method for
So, for . On the other hand analyzing code performance averaged over all codes in the en-
semble .
The main event of interest in the following analysis is the
block error event under SC decoding, defined as

(50)

where with Since the decoder never makes an error on the frozen part of
. We conclude that for . , i.e., equals with probability one, that part has
This completes the proof of Theorem 2. been excluded from the definition of the block error event.
The probability of error terms and
Given Theorem 2, it is an easy exercise to show that polar
that were defined in Section I-C.3 can be
coding can achieve rates approaching , as we will show in
expressed in this probability space as
the next section. It is clear from the above proof that Theorem 2
gives only an ad hoc result on the asymptotic rate of channel po-
larization; this result is sufficient for proving a capacity theorem
for polar coding; however, finding the exact asymptotic rate of (51)
polarization remains an important goal for future research.2
where denotes the event
2A recent result in this direction is discussed in Section XI-A. .
ARIKAN: A METHOD FOR CONSTRUCTING CAPACITY-ACHIEVING CODES 3061

Fig. 7. Rate versus reliability for polar coding and SC decoding at block lengths 2 ; 2 , and 2 on a BEC with erasure probability 1=2.

B. Proof of Proposition 2 We conclude that


We may express the block error event as where

which is equivalent to (13). This completes the proof of Propo-


(52) sition 2. The main coding theorem of the paper now follows
readily.
is the event that the first decision error in SC decoding occurs at
stage . We notice that C. Proof of Theorem 3
By Theorem 2, for any given rate , there exists a
sequence of information sets with size such
that

(55)

In particular, the bound (55) holds if is chosen in accor-


dance with the polar coding rule because by definition this rule
minimizes the sum in (55). Combining this fact about the polar
where coding rule with Proposition 2, Theorem 3 follows.

D. A Numerical Example
(53) Although we have established that polar codes achieve the
symmetric capacity, the proofs have been of an asymptotic na-
Thus, we have ture and the exact asymptotic rate of polarization has not been
found. It is of interest to understand how quickly the polariza-
tion effect takes hold and what performance can be expected of
polar codes under SC decoding in the nonasymptotic regime. To
investigate these, we give here a numerical study.
For an upper bound on , note that
Let be a BEC with erasure probability . Fig. 7 shows
the rate versus reliability tradeoff for using polar codes with
block lengths . This figure is obtained by
using codes whose information sets are of the form
, where is a
variable threshold parameter. There are two sets of three curves
in the plot. The solid lines are plots of
(54) versus . The dashed lines are plots
3062 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009

of versus . The parameter


is varied over a subset of to obtain the curves. . This proves the first claim. To prove the
The parameter corresponds to the code rate. The sig- second claim, we use the first result
nificance of is also clear: it is an upper bound on ,
the probability of block error for polar coding at rate under
SC decoding. The parameter is intended to serve as a lower
bound to .
This example provides empirical evidence that polar coding
achieves channel capacity as the block length is increaseda
fact already established theoretically. More significantly, the ex-
ample also shows that the rate of polarization is too slow to make
near-capacity polar coding under SC decoding feasible in prac- where we used the fact that the sum over can be
tice. replaced with a sum over for any fixed since
.
VI. SYMMETRIC CHANNELS
B. Proof of Theorem 4
The main goal of this section is to prove Theorem 4, which is We return to the analysis in Section V and consider a code en-
a strengthened version of Theorem 3 for symmetric channels. semble under SC decoding, only this time assuming
that is a symmetric channel. We first show that the error
A. Symmetry Under Channel Combining and Splitting
events defined by (53) have a symmetry property.
Let be a symmetric B-DMC with
Proposition 14: For a symmetric B-DMC , the event
and arbitrary. By definition, there exists a permutation on
has the property that
such that i) and ii) for
all . Let be the identity permutation on . Clearly, iff (60)
the permutations form an Abelian group under func-
tion composition. For a compact notation, we will write to for each .
denote , for . Proof: This follows directly from the definition of by
Observe that for all using the symmetry property (59) of the channel .
. This can be verified by exhaustive study of possible cases
or by noting that Now, consider the transmission of a particular source vector
. Also, observe that and a frozen vector , jointly forming an input vector
as is a commutative operation on . for the channel . This event is denoted below as
For , let instead of the more formal .
Corollary 1: For a symmetric B-DMC , for each
(56) and , the events and are indepen-
dent; hence, .
This associates to each element of a permutation on .
Proof: For and , we
Proposition 12: If a B-DMC is symmetric, then is have
also symmetric in the sense that

(57)

for all .
(61)
The proof is immediate and omitted.
Proposition 13: If a B-DMC is symmetric, then the chan- (62)
nels and are also symmetric in the sense that
Equality follows in (61) from (58) and (60) by taking ,
and in (62) from the fact that for
(58) any fixed . The rest of the proof is immediate.
Now, by (54), we have, for all
(59) (63)
for all . and, since , we obtain
Proof: Let and observe that
. (64)
Now, let , and use the same reasoning to see that
ARIKAN: A METHOD FOR CONSTRUCTING CAPACITY-ACHIEVING CODES 3063

This implies that, for every symmetric B-DMC and every Proposition 15: For any symmetric B-DMC , the parame-
code ters given by (7) can be calculated by the simplified
formula

(65)

This bound on is independent of the frozen We omit the proof of this result.
vector . Theorem 4 is now obtained by combining The- For the important example of a BSC, this formula becomes
orem 2 with Proposition 2, as in the proof of Theorem 3.
Note that although we have given a bound on
that is independent of , we stopped short of claiming
that the error event is independent of because our deci-
sion functions break ties always in favor of . If this
bias were removed by randomization, then would become in- This sum for has terms, as compared to
dependent of . terms in (7).

C. Further Symmetries of the Channel VII. ENCODING


We may use the degrees of freedom in the choice of in
In this section, we will consider the encoding of polar codes
(59) to explore the symmetries inherent in the channel . and prove the part of Theorem 5 about encoding complexity.
For a given , we may select with to obtain We begin by giving explicit algebraic expressions for , the
generator matrix for polar coding, which so far has been de-
(66)
fined only in a schematic form by Fig. 3. The algebraic forms of
So, if we were to prepare a lookup table for the transition naturally point at efficient implementations of the encoding
probabilities , operation . In analyzing the encoding operation
it would suffice to store only the subset of probabilities , we exploit its relation to fast transform methods in signal
. processing; in particular, we use the bit-indexing idea of [11] to
The size of the lookup table can be reduced further by using interpret the various permutation operations that are part of .
the remaining degrees of freedom in the choice of . Let
A. Formulas for
. Then, for
any and , we have In the following, assume for some . Let
denote the -dimensional identity matrix for any . We
(67) begin by translating the recursive definition of as given by
Fig. 3 into an algebraic form
which follows from (66) by taking on the left hand
side.
To explore this symmetry further, let with .
. The set is the orbit of under Either by verifying algebraically that
the action group . The orbits over variation of or by observing that channel combining op-
partition the space into equivalence classes. Let eration in Fig. 3 can be redrawn equivalently as in Fig. 8, we
be a set formed by taking one representative from each equiv- obtain a second recursive formula
alence class. The output alphabet of the channel can be
represented effectively by the set .
For example, suppose is a BSC with . Each (68)
orbit has elements and there are orbits. In
valid for . This form appears more suitable to derive a
particular, the channel has effectively two outputs, and
recursive relationship. We substitute
being symmetric, it has to be a BSC. This is a great simplifi-
back into (68) to obtain
cation since has an apparent output alphabet size of .
Likewise, while has an apparent output alphabet size of
, due to symmetry, the size shrinks to .
(69)
Further output alphabet size reductions may be possible by
exploiting other properties specific to certain B-DMCs. For ex- where (69) is obtained by using the identity
ample, if is a BEC, the channels are known to be with
BECs, each with an effective output alphabet size of three. . Repeating this, we obtain
The symmetry properties of help simplify the com-
putation of the channel parameters. (70)
3064 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009

Second, the reverse shuffle operator acts on a row vector


to replace the element in bit-indexed position with
the element in position ; that is, if , then
for all . In other words,
cyclically rotates the bit-indexes of the elements of a left
operand to the right by one place.
Third, the matrix in (70) can be interpreted as the
bit-reversal operator: if , then
for all . This statement can be proved by
induction using the recursive formula (71). We give the idea
of such a proof by an example. Let us assume that is a
bit-reversal operator and show that the same is true for .
Let be any vector over GF . Using bit-indexing, it can
be written as .
Since , let us first consider the action
of on . The reverse shuffle rearranges the elements
of with respect to oddeven parity of their indices, so
equals .
This has two halves, and
, corresponding to oddeven
Fig. 8. An alternative realization of the recursive construction for W .
index classes. Notice that
for all
and
. This is to be expected since the reverse
where . It shuffle rearranges the indices in increasing order within each
can seen by simple manipulations that oddeven index class. Next, consider the action of
on . The result is . By assumption, is
(71) a bit-reversal operation, so , which
in turn equals . Likewise, the result
We can see that is a permutation matrix by the following of equals . Hence, the overall
induction argument. Assume that is a permutation matrix operation is a bit-reversal operation.
for some ; this is true for since . Then, Given the bit-reversal interpretation of , it is clear that
is a permutation matrix because it is the product of two is a symmetric matrix, so . Since is a permuta-
permutation matrices, and . tion, it follows from symmetry that .
In the following, we will say more about the nature of as It is now easy to see that, for any -by- matrix ,
a permutation. the product has elements
B. Analysis by Bit-Indexing . It follows that if is invariant under bit-re-
versal, i.e., if for every
To analyze the encoding operation further, it will be conve-
, then . Since
nient to index vectors and matrices with bit sequences. Given
, this is equivalent to . Thus,
a vector with length for some , we de-
bit-reversal-invariant matrices commute with the bit-reversal
note its th element, , alternatively as
operator.
where is the binary expansion of the integer
in the sense that . Likewise, the ele- Proposition 16: For any the generator
ment of an -by- matrix is denoted alternatively as matrix is given by and
where and are the binary rep- where is the bit-reversal permutation. is a bit-reversal
resentations of and , respectively. Using this conven- invariant matrix with
tion, it can be readily verified that the product of
a -by- matrix and a -by- matrix has elements
. (73)
We now consider the encoding operation under bit-indexing.
First, we observe that the elements of in bit-indexed form are
Proof: commutes with because it is invariant
given by for all . Thus,
under bit-reversal, which is immediate from (72). The statement
has elements
was established before; by proving that
commutes with , we have established the other statement:
. The bit-indexed form (73) follows by applying
bit-reversal to (72).
(72)
Finally, we give a fact that will be useful in Section X.
ARIKAN: A METHOD FOR CONSTRUCTING CAPACITY-ACHIEVING CODES 3065

The encoding circuit of Fig. 9 suggests many parallel imple-


mentation alternatives for : for example, with proces-
sors, one may do a column-by-column implementation, and
reduce the total latency to . Various other tradeoffs are
possible between latency and hardware complexity.
In an actual implementation of polar codes, it may be prefer-
able to use in place of as the encoder mapping in
order to simplify the implementation. In that case, the SC de-
coder should compensate for this by decoding the elements of
the source vector in bit-reversed index order. We have in-
cluded as part of the encoder in this paper in order to have
an SC decoder that decodes in the natural index order, which
simplified the notation.

VIII. DECODING
Fig. 9. A circuit for implementing the transformation F . Signals flow from In this section, we consider the computational complexity
left to right. Each edge carries a signal 0 or 1. Each node adds (mod-2) the of the SC decoding algorithm. As in the previous section, our
signals on all incoming edges from the left and sends the result out on all edges
to the right. (Edges carrying the signals u and x are not shown.)
computational model will be a single processor machine with
a random-access memory and the complexities expressed will
be time complexities. Let denote the worst case com-
Proposition 17: For any
plexity of SC decoding over all -coset codes with a given
, the rows of and with index have the
block length . We will show that .
same Hamming weight given by , where
(74)
A. A First Decoding Algorithm
is the Hamming weight of . Consider SC decoding for an arbitrary -coset code with
Proof: For fixed , the sum of the terms parameter . Recall that the source vector
(as integers) over all consists of a random part and a frozen part . This
gives the Hamming weight of the row of with index vector is transmitted across and a channel output
. From the preceding formula for , is obtained with probability . The SC decoder
this sum is easily seen to be . The proof for observes and generates an estimate of . We
is similar may visualize the decoder as consisting of decision elements
(DEs), one for each source element ; the DEs are activated in
C. Encoding Complexity the order to . If , the element is known; so, the th
For complexity estimation, our computational model will be DE, when its turn comes, simply sets and sends this
a single-processor machine with a random-access memory. The result to all succeeding DEs. If , the th DE waits until it
complexities expressed will be time complexities. The discus- has received the previous decisions , and upon receiving
sion will be given for an arbitrary -coset code with parame- them, computes the likelihood ratio (LR)
ters .
Let denote the worst case encoding complexity over
all codes with a given block length . If we
take the complexity of a scalar mod- addition as one unit and
the complexity of the reverse shuffle operation as units, and generates its decision as
we see from Fig. 3 that . if
Starting with an initial value (a generous figure), we otherwise
obtain by induction that for all
. Thus, the encoding complexity is . which is then sent to all succeeding DEs. This is a single-pass
A specific implementation of the encoder using the form algorithm, with no revision of estimates. The complexity of this
is shown in Fig. 9 for . The input to algorithm is determined essentially by the complexity of com-
the circuit is the bit-reversed version of , i.e., . puting the LRs.
The output is given by . In general, the A straightforward calculation using the recursive formulas
complexity of this implementation is with (22) and (23) gives
for and for .
An alternative implementation of the encoder would be to
apply in natural index order at the input of the circuit in
Fig. 9. Then, we would obtain at the output.
Encoding could be completed by a post bit-reversal operation:
. (75)
3066 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009

and

(76)

Thus, the calculation of an LR at length is reduced to the


calculation of two LRs at length . This recursion can be
continued down to block length , at which point the LRs have
the form and can be computed
directly.
To estimate the complexity of LR calculations, let
denote the worst case complexity
of computing over and
. From the recursive LR formulas, we have the com-
plexity bound

(77)

where is the worst case complexity of assembling two LRs at


length into an LR at length . Taking as one unit, we
obtain the bound
Fig. 10. An implementation of the successive cancellation decoder for polar
coding at block-length N = 8.
(78)
Let us suppose that we carry out the calculations in each class
The overall decoder complexity can now be bounded as
independently, without trying to exploit any further savings
. This complexity
that may come from the sharing of LR values between the two
corresponds to a decoder whose DEs do their LR calculations
classes. Then, we have two problems of the same type as the
privately, without sharing any partial results with each other.
original but at half the size. Each class in (79) generates a set
It turns out, if the DEs pool their scratch-pad results, a more
of LR calculation requests at length , for a total of
efficient decoder implementation is possible with overall com-
plexity , as we will show next. requests. For example, if we let , the
requests arising from the first class are
B. Refinement of the Decoding Algorithm
We now consider a decoder that computes the full set of LRs,
. The previous decoder could
skip the calculation of for ; but now we
do not allow this. The decisions are made Using this reasoning inductively across the set of all lengths
in exactly the same manner as before; in particular, if , , we conclude that the total number of LRs that
the decision is set to the known frozen value , regardless need to be calculated is .
of . So far, we have not paid attention to the exact order in which
To see where the computational savings will come from, we the LR calculations at various block lengths are carried out. Al-
inspect (75) and (76) and note that each LR value in the pair though this gave us an accurate count of the total number of LR
calculations, for a full description of the algorithm, we need to
specify an order. There are many possibilities for such an order,
but to be specific we will use a depth-first algorithm, which is
is assembled from the same pair of LRs easily described by a small example.
We consider a decoder for a code with parameter
chosen as . The
computation for the decoder is laid out in a graph as shown in
Thus, the calculation of all LRs at length requires exactly Fig. 10. There are nodes in the graph, each
LR calculations at length .3 Let us split the LRs at responsible for computing an LR request that arises during the
length into two classes, namely course of the algorithm. Starting from the left side, the first
column of nodes correspond to LR requests at length (deci-
sion level), the second column of nodes to requests at length ,
the third at length , and the fourth at length (channel level).
(79)
Each node in the graph carries two labels. For example, the
N
3Actually, some LR calculations at length =2 may be avoided if, by chance, third node from the bottom in the third column has the labels
some duplications occur, but we will disregard this. and ; the first label indicates that the LR value to
ARIKAN: A METHOD FOR CONSTRUCTING CAPACITY-ACHIEVING CODES 3067

The decoder is visualized as consisting of $N$ DEs situated at the leftmost side of the decoder graph. The decision-level node computing $L_8^{(i)}$ is associated with the $i$th DE, $1 \le i \le 8$. The positioning of the DEs in the leftmost column follows the bit-reversed index order, as in Fig. 9.

Decoding begins with DE 1 activating node 1 for the calculation of $L_8^{(1)}(y_1^8)$. Node 1 in turn activates node 2 for $L_4^{(1)}(y_1^4)$. At this point, program control passes to node 2, and node 1 will wait until node 2 delivers the requested LR. The process continues. Node 2 activates node 3, which activates node 4. Node 4 is a node at the channel level; so it computes $L_1^{(1)}(y_1)$ and passes it to nodes 3 and 23, its left-side neighbors. In general, a node will send its computational result to all its left-side neighbors (although this will not be stated explicitly below). Program control will be passed back to the left neighbor from which it was received.

Node 3 still needs data from the right side and activates node 5, which delivers $L_1^{(1)}(y_2)$. Node 3 assembles $L_2^{(1)}(y_1^2)$ from the messages it has received from nodes 4 and 5 and sends it to node 2. Next, node 2 activates node 6, which activates nodes 7 and 8, and returns its result to node 2. Node 2 compiles its response $L_4^{(1)}(y_1^4)$ and sends it to node 1. Node 1 activates node 9 which calculates $L_4^{(1)}(y_5^8)$ in the same manner as node 2 calculated $L_4^{(1)}(y_1^4)$, and returns the result to node 1. Node 1 now assembles $L_8^{(1)}(y_1^8)$ and sends it to DE 1. Since $u_1$ is a frozen bit, DE 1 ignores the received LR, declares $\hat{u}_1 = u_1$, and passes control to DE 2, located next to node 16.

DE 2 activates node 16 for $L_8^{(2)}(y_1^8, \hat{u}_1)$. Node 16 assembles this LR from the already-received LRs $L_4^{(1)}(y_1^4)$ and $L_4^{(1)}(y_5^8)$, and returns its response without activating any node. DE 2 ignores the returned LR since $u_2$ is frozen, announces $\hat{u}_2 = u_2$, and passes control to DE 3.

DE 3 activates node 17 for $L_8^{(3)}(y_1^8, \hat{u}_1^2)$. This triggers LR requests at nodes 18 and 19, but no further. The bit $u_3$ is not frozen; so, the decision $\hat{u}_3$ is made in accordance with the received LR, and control is passed to DE 4. DE 4 activates node 20 for $L_8^{(4)}(y_1^8, \hat{u}_1^3)$, which is readily assembled and returned. The algorithm continues in this manner until finally DE 8 receives $L_8^{(8)}(y_1^8, \hat{u}_1^7)$ and decides $\hat{u}_8$.

There are a number of observations that can be made by looking at this example that should provide further insight into the general decoding algorithm. First, notice that the computation of $L_8^{(1)}$ is carried out in a subtree rooted at node 1, consisting of paths going from left to right, and spanning all nodes at the channel level. This subtree splits into two disjoint subtrees, namely, the subtree rooted at node 2 for the calculation of $L_4^{(1)}(y_1^4)$ and the subtree rooted at node 9 for the calculation of $L_4^{(1)}(y_5^8)$. Since the two subtrees are disjoint, the corresponding calculations can be carried out independently (even in parallel if there are multiple processors). This splitting of computational subtrees into disjoint subtrees holds for all nodes in the graph (except those at the channel level), making it possible to implement the decoder with a high degree of parallelism.

Second, we notice that the decoder graph consists of butterflies ($2$-by-$2$ complete bipartite graphs) that tie together adjacent levels of the graph. For example, nodes 9, 19, 10, and 13 form a butterfly. The computational subtrees rooted at nodes 9 and 19 split into a single pair of computational subtrees, one rooted at node 10, the other at node 13. Also note that among the four nodes of a butterfly, the upper-left node is always the first node to be activated by the above depth-first algorithm and the lower-left node always the last one. The upper-right and lower-right nodes are activated by the upper-left node and they may be activated in any order or even in parallel. The algorithm we specified always activated the upper-right node first, but this choice was arbitrary. When the lower-left node is activated, it finds the LRs from its right neighbors ready for assembly. The upper-left node assembles the LRs it receives from the right side as in formula (75), the lower-left node as in (76). These formulas show that the butterfly patterns impose a constraint on the completion time of LR calculations: in any given butterfly, the lower-left node needs to wait for the result of the upper-left node which in turn needs to wait for the results of the right-side nodes.

Variants of the decoder are possible in which the nodal computations are scheduled differently. In the "left-to-right" implementation given above, nodes waited to be activated. However, it is possible to have a "right-to-left" implementation in which each node starts its computation autonomously as soon as its right-side neighbors finish their calculations; this allows exploiting parallelism in computations to the maximum possible extent.

For example, in such a fully parallel implementation for the case in Fig. 10, all eight nodes at the channel level start calculating their respective LRs in the first time slot following the availability of the channel output vector $y_1^8$. In the second time slot, nodes 3, 6, 10, and 13 do their LR calculations in parallel. Note that this is the maximum degree of parallelism possible in the second time slot. Node 23, for example, cannot calculate its LR in this slot, because the partial-decision argument it requires is not yet available; it has to wait until the corresponding decisions are announced by the DEs. In the third time slot, nodes 2 and 9 do their calculations. In time slot 4, the first decision $\hat{u}_1$ is made at node 1 and broadcast to all nodes across the graph (or at least to those that need it). In slot 5, node 16 calculates its LR and the decision $\hat{u}_2$ is broadcast. In slot 6, nodes 18 and 19 do their calculations. This process continues until time slot 15 when node 32 decides $\hat{u}_8$. It can be shown that, in general, this fully parallel decoder implementation has a latency of $2N - 1$ time slots for a code of block-length $N$.
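Before moving on, a compact recursive SC decoder in the LR domain may help make the pooled computation concrete. This is our own sketch, not the paper's program: it uses the non-bit-reversed convention $G = F^{\otimes n}$, so synthesized-channel indices differ from the paper's bit-reversed numbering by a permutation, and a real implementation would keep the scratch-pad LRs in shared arrays rather than in recursion-local lists.

```python
def sc_decode(lr, frozen):
    """Successive cancellation decoding over likelihood ratios.

    lr:     channel-level LRs W(y_i|0)/W(y_i|1), length N = 2^n
    frozen: length-N list; frozen[i] is the frozen bit at position i,
            or None if position i carries an information bit
    Returns (u_hat, x_hat), where x_hat is the re-encoding of u_hat
    that a parent call uses as decision feedback.
    """
    N = len(lr)
    if N == 1:
        u = frozen[0] if frozen[0] is not None else (0 if lr[0] >= 1.0 else 1)
        return [u], [u]
    h = N // 2
    a, b = lr[:h], lr[h:]
    # First half: "minus" channels, LRs assembled as in (75).
    u1, x1 = sc_decode([(x * y + 1.0) / (x + y) for x, y in zip(a, b)],
                       frozen[:h])
    # Second half: "plus" channels, assembled as in (76) with feedback x1.
    u2, x2 = sc_decode([y * (x if s == 0 else 1.0 / x)
                        for x, y, s in zip(a, b, x1)],
                       frozen[h:])
    # Re-encode for the parent: x = (x1 xor x2, x2).
    return u1 + u2, [p ^ q for p, q in zip(x1, x2)] + x2
```

Each channel LR enters $O(\log N)$ assembly operations in this recursion, giving the $O(N \log N)$ total claimed above.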
IX. CODE CONSTRUCTION

The input to a polar code construction algorithm is a triple $(W, N, K)$ where $W$ is the B-DMC on which the code will be used, $N$ is the code block length, and $K$ is the dimensionality of the code. The output of the algorithm is an information set $\mathcal{A} \subset \{1, \ldots, N\}$ of size $K$ such that $\sum_{i \in \mathcal{A}} Z(W_N^{(i)})$ is as small as possible. We exclude the search for a good frozen vector $u_{\mathcal{A}^c}$ from the code construction problem because the problem is already difficult enough. Recall that, for symmetric channels, the code performance is not affected by the choice of $u_{\mathcal{A}^c}$.
In principle, the code construction problem can be solved by computing all the parameters $\{Z(W_N^{(i)}) : 1 \le i \le N\}$ and sorting them; unfortunately, we do not have an efficient algorithm for doing this. For symmetric channels, some computational shortcuts are available, as we showed by Proposition 15, but these shortcuts have not yielded an efficient algorithm, either. One exception to all this is the BEC for which the parameters $\{Z(W_N^{(i)})\}$ can all be calculated in time $O(N)$ thanks to the recursive formulas (38).
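For the BEC, this yields a complete construction in a few lines. The sketch below is ours; it uses the BEC form of the recursions, under which a channel with Bhattacharyya parameter $Z$ splits into channels with parameters $2Z - Z^2$ and $Z^2$, and it numbers the synthesized channels in the non-bit-reversed order of the earlier decoder sketch.

```python
def bec_information_set(eps, n, K):
    """Polar code construction for a BEC with erasure probability eps.

    For the BEC, Z(W_N^(i)) is exactly the erasure probability of the
    i-th synthesized channel, so we evolve the parameters through the
    n splitting steps and keep the K indices with the smallest values.
    Total work is 2 + 4 + ... + 2^n = O(N).
    """
    z = [eps]
    for _ in range(n):
        z = [2 * a - a * a for a in z] + [a * a for a in z]
    best_first = sorted(range(2 ** n), key=lambda i: z[i])
    return sorted(best_first[:K])

# e.g., bec_information_set(0.5, 3, 5) picks the 5 most reliable of 8 channels
```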
Since exact code construction appears too complex, it makes sense to look for approximate constructions based on estimates of the parameters $\{Z(W_N^{(i)})\}$. To that end, it is preferable to pose the exact code construction problem as a decision problem: Given a threshold $\gamma$ and an index $i$, decide whether $i \in \mathcal{A}(\gamma)$ where

$$\mathcal{A}(\gamma) \triangleq \bigl\{ j : Z(W_N^{(j)}) \le \gamma \bigr\}. \qquad (81')$$

Any algorithm for solving this decision problem can be used to solve the code construction problem. We can simply run the algorithm with various settings for $\gamma$ until we obtain an information set of the desired size $K$.

Approximate code construction algorithms can be proposed based on statistically reliable and efficient methods for estimating whether $Z(W_N^{(i)}) \le \gamma$ for any given pair $(i, \gamma)$. The estimation problem can be approached by noting that, as we have implicitly shown in (54), the parameter $Z(W_N^{(i)})$ is the expectation of the RV

$$\sqrt{\frac{W_N^{(i)}\bigl(Y_1^N, U_1^{i-1} \mid U_i \oplus 1\bigr)}{W_N^{(i)}\bigl(Y_1^N, U_1^{i-1} \mid U_i\bigr)}} \qquad (80)$$

where $(U_1^N, Y_1^N)$ is sampled from the joint probability assignment $P_{U_1^N, Y_1^N}(u_1^N, y_1^N) = 2^{-N}\,W_N(y_1^N \mid u_1^N)$. A Monte Carlo approach can be taken, where samples of $(U_1^N, Y_1^N)$ are generated from the given distribution and the empirical means are calculated. Given a sample $(u_1^N, y_1^N)$ of $(U_1^N, Y_1^N)$, the sample values of the RVs (80) can all be computed in complexity $O(N \log N)$. An SC decoder may be used for this computation since the sample values of (80) are just the square roots of the decision statistics that the DEs in an SC decoder ordinarily compute. (In applying an SC decoder for this task, the information set should be taken as the null set.)
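A minimal Monte Carlo sketch of this estimator, reusing the conventions and recursions of our earlier decoder sketch (our function names, our index ordering): run the SC recursion with every decision forced to the true transmitted bit, record the decision-level LR at each index, and average the square-rooted ratios over many samples. The BSC is used purely as an example channel.

```python
import math, random

def encode(u):
    # x = u F^{(x)n} in the non-bit-reversed convention used earlier
    if len(u) == 1:
        return u[:]
    h = len(u) // 2
    x1, x2 = encode(u[:h]), encode(u[h:])
    return [p ^ q for p, q in zip(x1, x2)] + x2

def z_samples(lr, u, out, base=0):
    """One sample of the RVs (80) for all indices at once, given channel
    LRs for a block whose true source bits are u (all bits 'frozen')."""
    if len(lr) == 1:
        l = lr[0]
        out[base] = math.sqrt(1.0 / l if u[0] == 0 else l)
        return [u[0]]                       # feed the true bit back upward
    h = len(lr) // 2
    a, b = lr[:h], lr[h:]
    x1 = z_samples([(x * y + 1.0) / (x + y) for x, y in zip(a, b)],
                   u[:h], out, base)
    x2 = z_samples([y * (x if s == 0 else 1.0 / x)
                    for x, y, s in zip(a, b, x1)], u[h:], out, base + h)
    return [p ^ q for p, q in zip(x1, x2)] + x2

def estimate_Z(n, p, trials=1000):
    """Empirical means of (80) over a BSC(p), 0 < p < 1, one per index."""
    N = 2 ** n
    acc = [0.0] * N
    for _ in range(trials):
        u = [random.randint(0, 1) for _ in range(N)]
        y = [xi ^ (random.random() < p) for xi in encode(u)]
        lr = [(1 - p) / p if yi == 0 else p / (1 - p) for yi in y]
        out = [0.0] * N
        z_samples(lr, u, out)
        acc = [s + v for s, v in zip(acc, out)]
    return [s / trials for s in acc]
```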
Statistical algorithms are helped by the polarization phenomenon: for any fixed $\gamma$ and as $N$ grows, it becomes easier to resolve whether $Z(W_N^{(i)}) \le \gamma$, because an ever-growing fraction of the parameters $\{Z(W_N^{(i)})\}$ tend to cluster around $0$ or $1$.

It is conceivable that, in an operational system, the estimation of the parameters $\{Z(W_N^{(i)})\}$ is made part of an SC decoding procedure, with continual update of the information set as more reliable estimates become available.
X. A NOTE ON THE RM RULE

In this part, we return to the claim made in Section I-D that the RM rule for information set selection leads to asymptotically unreliable codes under SC decoding.

Recall that, for a given $(N, K)$, the RM rule constructs a $G_N$-coset code with parameter $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ by prioritizing each index $i \in \{1, \ldots, N\}$ for inclusion in the information set $\mathcal{A}$ w.r.t. the Hamming weight of the $i$th row of $G_N$. The RM rule sets the frozen bits to zero. In light of Proposition 17, the RM rule can be restated in bit-indexed terminology as follows.

RM Rule: For a given $(N, K)$, with $N = 2^n$, choose $\mathcal{A}$ as follows: i) Determine the integer $r$ such that

$$\sum_{k=r}^{n} \binom{n}{k} \ \ge\ K\ >\ \sum_{k=r+1}^{n} \binom{n}{k}. \qquad (81)$$

ii) Put each index whose bit-index $b_1 \cdots b_n$ has Hamming weight greater than $r$ into $\mathcal{A}$. iii) Put sufficiently many additional indices with bit-index weight equal to $r$ into $\mathcal{A}$ to complete its size to $K$.
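In code, the bit-indexed RM rule reads as follows (our sketch, with 0-based indices; ties among weight-$r$ indices are broken by index order here, an arbitrary choice):

```python
from math import comb

def rm_information_set(n, K):
    """Bit-indexed RM rule: i) find r as in (81), ii) take all indices of
    bit-index weight > r, iii) fill up with weight-r indices to size K."""
    N = 2 ** n
    r = n
    while sum(comb(n, k) for k in range(r, n + 1)) < K:
        r -= 1
    weight = lambda i: bin(i).count("1")
    A = [i for i in range(N) if weight(i) > r]
    A += [i for i in range(N) if weight(i) == r][:K - len(A)]
    return sorted(A)
```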
We observe that this rule will select, for inclusion in $\mathcal{A}$, the index whose bit-index consists of $n - r$ zeros followed by $r$ ones. This index turns out to be a particularly poor choice, at least for the class of BECs, as we show in the remaining part of this section.

Let us assume that the code constructed by the RM rule is used on a BEC $W$ with some erasure probability $\epsilon > 0$. We will show that the symmetric capacity of the channel seen by this index converges to zero for any fixed positive coding rate as the block length is increased. For this, we recall the relations (6), which, in the bit-indexed channel notation of Section IV, can be written as follows. For a BEC and any $b_1, \ldots, b_j \in \{0, 1\}$

$$I(W_{b_1 \cdots b_j 0}) = I(W_{b_1 \cdots b_j})^2, \qquad I(W_{b_1 \cdots b_j 1}) = 2\,I(W_{b_1 \cdots b_j}) - I(W_{b_1 \cdots b_j})^2$$

with initial values $I(W_0) = I(W)^2$ and $I(W_1) = 2\,I(W) - I(W)^2$. These give, for the index selected above, the bound

$$I\bigl(W_{0^{\,n-r}\,1^{\,r}}\bigr) \ \le\ 2^{\,r}\, I(W)^{2^{\,n-r}}. \qquad (82)$$

Now, consider a sequence of RM codes with a fixed rate $0 < R < 1$, block length $N$ increasing to infinity, and $K = NR$. Let $r_N$ denote the parameter $r$ in (81) for the code with block length $N$ in this sequence, and let $n = \log_2 N$. A simple asymptotic analysis shows that the ratio $r_N / n$ must go to $1/2$ as $N$ is increased. This in turn implies by (82) that $I\bigl(W_{0^{\,n - r_N}\,1^{\,r_N}}\bigr)$ must go to zero.

Suppose that this sequence of RM codes is decoded using an SC decoder as in Section I-C.2 where the decision metric ignores knowledge of frozen bits and instead uses randomization over all possible choices. Then, as $N$ goes to infinity, the SC decoder decision element with this index sees a channel whose capacity goes to zero, while the corresponding element of the input vector is assigned 1 bit of information by the RM rule. This means that the RM code sequence is asymptotically unreliable under this type of SC decoding.
We should emphasize that the above result does not say that
RM codes are asymptotically bad under any SC decoder, nor
does it make a claim about the performance of RM codes under
other decoding algorithms. (It is interesting that the possibility
of RM codes being capacity-achieving codes under ML de-
coding seems to have received no attention in the literature.)

XI. CONCLUDING REMARKS


In this section, we go through the paper to discuss some re-
sults further, point out some generalizations, and state some
open problems.

A. Rate of Polarization

A major open problem suggested by this paper is to determine how fast a channel polarizes as a function of the block-length parameter $N$. In recent work [12], the following result has been obtained in this direction.

Proposition 18: Let $W$ be a B-DMC. For any fixed rate $R < I(W)$ and constant $\beta < \frac{1}{2}$, there exists a sequence of sets $\mathcal{A}_N \subset \{1, \ldots, N\}$ such that $|\mathcal{A}_N| \ge NR$ and

$$Z\bigl(W_N^{(i)}\bigr) \le 2^{-N^{\beta}} \quad \text{for all } i \in \mathcal{A}_N. \qquad (83)$$

Conversely, if $R > 0$ and $\beta > \frac{1}{2}$, then for any sequence of sets $\mathcal{A}_N \subset \{1, \ldots, N\}$ with $|\mathcal{A}_N| \ge NR$, we have

$$\max\bigl\{ Z\bigl(W_N^{(i)}\bigr) : i \in \mathcal{A}_N \bigr\} \ \ge\ 2^{-N^{\beta}} \quad \text{for all sufficiently large } N. \qquad (84)$$

As a corollary, Theorem 3 is strengthened as follows.

Proposition 19: For polar coding on a B-DMC $W$ at any fixed rate $R < I(W)$, and any fixed $\beta < \frac{1}{2}$

$$P_e(N, R) = O\bigl(2^{-N^{\beta}}\bigr). \qquad (85)$$

This is a vast improvement over the $O(N^{-1/4})$ bound proved in this paper. Note that the bound (85) still does not depend on the rate as long as $R < I(W)$. A problem of theoretical interest is to obtain sharper bounds on $P_e(N, R)$ that show a more explicit dependence on $R$.

Another problem of interest related to polarization is robustness against channel parameter variations. A finding in this regard is the following result [13]: If a polar code is designed for a B-DMC $W$ but used on some other B-DMC $W'$, then the code will perform at least as well as it would perform on $W$ provided $W$ is a degraded version of $W'$ in the sense of Shannon [14]. This result gives reason to expect a graceful degradation of polar-coding performance due to errors in channel modeling.

Fig. 11. General form of channel combining.

B. Generalizations

The polarization scheme considered in this paper can be generalized as shown in Fig. 11. In this general form, the channel input alphabet is assumed $q$-ary, $\mathcal{X} = \{0, 1, \ldots, q-1\}$, for some $q \ge 2$. The construction begins by combining $m$ independent copies of a DMC $W$ to obtain $W_m$, where $m \ge 2$ is a fixed parameter of the construction. The general step combines $m$ independent copies of the channel from the previous step to obtain the next, so that the size of the construction is $N = m^n$ after $n$ steps. The construction is characterized by a kernel $f$, one of whose arguments ranges over some finite set $\mathcal{S}$ included in the mapping for randomization. The reason for introducing randomization will be discussed shortly.

The vectors $u_1^N$ and $y_1^N$ in Fig. 11 denote the input and output vectors of $W_N$. The input vector $u_1^N$ is first transformed into an intermediate vector by breaking it into $N/m$ consecutive subblocks of length $m$, namely, $u_1^m, u_{m+1}^{2m}, \ldots, u_{N-m+1}^N$, and passing each subblock through the transform $f$. Then, a permutation sorts the components of the resulting vector w.r.t. mod-$m$ residue classes of their indices. The sorter ensures that, for any $1 \le k \le m$, the $k$th copy of $W_{N/m}$, counting from the top of the figure, gets as input those components whose indices are congruent to $k$ mod-$m$. For example, the first copy receives the components with indices $1, m+1, 2m+1$, and so on. The general formula is that component $j$ is routed to copy $k$ iff $j \equiv k \pmod{m}$, for all $1 \le k \le m$.
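As a toy rendering of the sorter (our reconstruction of the indexing; the paper's own formula did not survive extraction), the routing can be written as:

```python
def mod_m_sort(v, m):
    """Route the components of v so that copy k (0-based) of W_{N/m}
    receives those components whose 0-based index j satisfies
    j % m == k.  For m = 2 this is the familiar odd/even shuffle."""
    return [v[j] for k in range(m) for j in range(k, len(v), m)]

# mod_m_sort(list(range(1, 9)), 2) == [1, 3, 5, 7, 2, 4, 6, 8]
```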
We regard the randomization parameters as being chosen at random at the time of code construction, but fixed throughout the operation of the system; the decoder operates with full knowledge of them. For the binary case considered in this paper, we did not employ any randomization. Here, randomization has been introduced as part of the general construction because preliminary studies show that it greatly simplifies the analysis of generalized polarization schemes. This subject will be explored further in future work.

Certain additional constraints need to be placed on the kernel $f$ to ensure that a polar code can be defined that is suitable for SC decoding in the natural order $u_1$ to $u_N$. To that end, it is sufficient to restrict $f$ to unidirectional functions, namely, invertible functions whose outputs are computed coordinate-by-coordinate, in a one-directional manner, by a given set of coordinate functions $f_1, \ldots, f_m$. For a unidirectional $f$, the combined channel $W_N$ can be split into channels $\{W_N^{(i)}\}$ in much the same way as in this paper. The encoding and SC decoding complexities of such a code are both $O(N \log N)$.

Polar coding can be generalized further in order to overcome the restriction of the block length to powers of a given number by using a sequence of kernels $f^{(1)}, f^{(2)}, \ldots$ in the code construction. Kernel $f^{(1)}$ combines $m_1$ copies of a given DMC $W$ to create a channel $W_{m_1}$. Kernel $f^{(2)}$ combines $m_2$ copies of $W_{m_1}$ to create a channel $W_{m_1 m_2}$, etc., for an overall block-length of $N = m_1 m_2 \cdots m_n$. If all kernels are unidirectional, the combined channel $W_N$ can still be split into $N$ channels whose transition probabilities can be expressed by recursive formulas, and the $O(N \log N)$ encoding and decoding complexities are maintained.

So far we have considered only combining copies of one DMC $W$. Another direction for generalization of the method is to combine copies of two or more distinct DMCs. For example, the kernel considered in this paper can be used to combine copies of any two B-DMCs. The investigation of coding advantages that may result from such variations on the basic code construction method is an area for further research.

It is easy to propose variants and generalizations of the basic channel polarization scheme, as we did above; however, it is not clear if we obtain channel polarization under each such variant. We conjecture that channel polarization is a common phenomenon, which is almost impossible to avoid as long as channels are combined with a sufficient density and mix of connections, whether chosen recursively or at random, provided the coordinate-wise splitting of the synthesized vector channel is done according to a suitable SC decoding order. The study of channel polarization in such generality is an interesting theoretical problem.
C. Iterative Decoding of Polar Codes


We have seen that polar coding under SC decoding can
achieve symmetric channel capacity; however, one needs to
use codes with impractically large block lengths. A question
of interest is whether polar coding performance can improve
significantly under more powerful decoding algorithms. The
sparseness of the graph representation of $F^{\otimes n}$ makes Gallager's
belief propagation (BP) decoding algorithm [15] applicable
to polar codes. A highly relevant work in this connection
is [16] which proposes BP decoding for RM codes using a
factor graph of $F^{\otimes n}$, as shown in Fig. 12 for $N = 8$. We carried
out experimental studies to assess the performance of polar
codes under BP decoding, using RM codes under BP decoding
as a benchmark [17]. The results showed significantly better
performance for polar codes. Also, the performance of polar
codes under BP decoding was significantly better than their
performance under SC decoding. However, more work needs
to be done to assess the potential of polar coding for practical
applications.

Fig. 12. The factor graph representation for the transformation $F^{\otimes 3}$.


APPENDIX

A. Proof of Proposition 1

The RHS of (1) equals the channel parameter $E_0(1, Q)$ as defined in Gallager [10, Sec. 5.6] with $Q$ taken as the uniform input distribution. (This is the symmetric cutoff rate of the channel.) It is well known (and shown in the same section of [10]) that $I(W) \ge E_0(1, Q)$. This proves (1).

To prove (2), for any B-DMC $W$, define

$$d(W) \triangleq \frac{1}{2} \sum_{y \in \mathcal{Y}} \bigl| W(y \mid 0) - W(y \mid 1) \bigr|.$$

This is the variational distance between the two distributions $W(y \mid 0)$ and $W(y \mid 1)$ over $y \in \mathcal{Y}$.

Lemma 2: For any B-DMC $W$, $I(W) \le d(W)$.

Proof: Let $W$ be an arbitrary B-DMC with output alphabet $\mathcal{Y} = \{1, \ldots, n\}$ and put $P_i = W(i \mid 0)$, $Q_i = W(i \mid 1)$, $1 \le i \le n$. By definition, $I(W)$ is a sum of $n$ bracketed terms, one for each output letter, the $i$th of which is a function of $P_i$ and $Q_i$ alone; write $\delta_i = |P_i - Q_i|/2$. We now consider maximizing the $i$th term. We compute its variation and recognize that $\sqrt{P_i Q_i}$ and $(P_i + Q_i)/2$ are, respectively, the geometric and arithmetic means of the numbers $P_i$ and $Q_i$. So, $\sqrt{P_i Q_i} \le (P_i + Q_i)/2$, and the $i$th term is maximized at a value no larger than $\delta_i$, giving the inequality that each bracketed term is at most $\delta_i$. Using this in the expression for $I(W)$, we obtain the claim of the lemma, since $\sum_i \delta_i = d(W)$.

Lemma 3: For any B-DMC $W$, $d(W) \le \sqrt{1 - Z(W)^2}$.

Proof: Let $W$ be an arbitrary B-DMC with output alphabet $\mathcal{Y} = \{1, \ldots, n\}$ and put $P_i = W(i \mid 0)$, $Q_i = W(i \mid 1)$, $1 \le i \le n$. Let $\delta_i = \frac{1}{2}|P_i - Q_i|$, $z_i = \sqrt{P_i Q_i}$, and $\sigma_i = (P_i + Q_i)/2$. Then, we have $d(W) = \sum_i \delta_i$, $Z(W) = \sum_i z_i$, and $\delta_i = \sqrt{\sigma_i^2 - z_i^2}$. Clearly, $d(W)$ is upper-bounded by the maximum of $\sum_i \delta_i$ subject to the constraints that $\sum_i z_i = Z(W)$ and $\sum_i \sigma_i = 1$. To carry out this maximization, we compute the partial derivatives of $\delta_i$ with respect to $z_i$ and observe that $\delta_i$ is a decreasing, concave function of $z_i$ for each $i$, within the range $0 \le z_i \le \sigma_i$. The maximum occurs at the solution of the set of equations $\partial \delta_i / \partial z_i = \lambda$, all $i$, where $\lambda$ is a constant, i.e., at $z_i$ proportional to $\sigma_i$. Using the constraint $\sum_i z_i = Z(W)$ and the fact that $\sum_i \sigma_i = 1$, we find $z_i = Z(W)\,\sigma_i$. So, the maximum occurs at this point and has the value $\sum_i \sigma_i \sqrt{1 - Z(W)^2} = \sqrt{1 - Z(W)^2}$. We have thus shown that $d(W) \le \sqrt{1 - Z(W)^2}$, which is equivalent to the claim.

From the above two lemmas, the proof of (2) is immediate.
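The two bounds of Proposition 1 are easy to spot-check numerically. The following sanity check (ours, not part of the paper) draws random B-DMCs with a small output alphabet and verifies $\log_2 \frac{2}{1+Z(W)} \le I(W) \le \sqrt{1 - Z(W)^2}$.

```python
import math, random

def check_proposition1(trials=10000):
    for _ in range(trials):
        n = random.randint(2, 6)
        P = [random.random() + 1e-12 for _ in range(n)]
        Q = [random.random() + 1e-12 for _ in range(n)]
        sP, sQ = sum(P), sum(Q)
        P, Q = [p / sP for p in P], [q / sQ for q in Q]
        Z = sum(math.sqrt(p * q) for p, q in zip(P, Q))
        # I(W) with uniform inputs, in bits
        I = sum(0.5 * p * math.log2(2 * p / (p + q)) +
                0.5 * q * math.log2(2 * q / (p + q))
                for p, q in zip(P, Q))
        assert math.log2(2.0 / (1.0 + Z)) <= I + 1e-9          # bound (1)
        assert I <= math.sqrt(max(0.0, 1.0 - Z * Z)) + 1e-9    # bound (2)

check_proposition1()
```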
B. Proof of Proposition 3

To prove (22), we write

(86)

By definition (5), the inner sum in (86) for any fixed value of the outer variables equals one and the same term, because, as the inner variable ranges over its set, the argument of the summand ranges also over the same set. We now factor this term out of the middle sum in (86) and use (5) again to obtain (22). For the proof of (23), we write an analogous expansion. By carrying out the inner and outer sums in the same manner as in the proof of (22), we obtain (23).

C. Proof of Proposition 4

Let us specify the channels as in (17) and (18): by hypothesis there is a one-to-one function mapping the input pair to the transmitted pair such that (17) and (18) are satisfied. For the proof it is helpful to define an ensemble of RVs $(U_1, U_2, X_1, X_2, Y_1, Y_2)$ so that the pair $(U_1, U_2)$ is uniformly distributed over $\mathcal{X}^2$, $(X_1, X_2)$ is obtained from $(U_1, U_2)$ through the one-to-one transform, and $(Y_1, Y_2)$ is the output of two independent uses of $W$ on inputs $(X_1, X_2)$. From these definitions and the fact that the transform is invertible, we get $I(U_1 U_2; Y_1 Y_2) = I(X_1 X_2; Y_1 Y_2)$. Since $X_1$ and $X_2$ are independent, $I(X_1 X_2; Y_1 Y_2)$ equals $I(X_1; Y_1) + I(X_2; Y_2) = 2\,I(W)$. So, by the chain rule, we have

$$I(W') + I(W'') = I(U_1; Y_1 Y_2) + I(U_2; Y_1 Y_2 U_1) = I(U_1 U_2; Y_1 Y_2) = 2\,I(W)$$

where the second equality is due to the one-to-one relationship between $(U_1, U_2)$ and $(X_1, X_2)$. This completes the proof of (24).

To prove (25), we begin by noting that

$$I(W'') = I(U_2; Y_1 Y_2 U_1) \ \ge\ I(U_2; Y_2) = I(W).$$

This shows that $I(W'') \ge I(W)$. This and (24) give (25). The above proof shows that equality holds in (25) iff the discarded mutual information term vanishes, which is equivalent to having

(87)
for all $(u_1, u_2, y_1, y_2)$. Since the transform is one-to-one, (87) can be written as

(88)

Substituting $x_1 = u_1 \oplus u_2$ and $x_2 = u_2$ into (88) and simplifying, we obtain a condition which, for all four possible values of $(u_1, u_2)$, requires for each output $y$ that either $W(y \mid 0)\,W(y \mid 1) = 0$ or $W(y \mid 0) = W(y \mid 1)$. Thus, either there exists no $y$ such that $W(y \mid 0)\,W(y \mid 1) > 0$, in which case $I(W) = 1$, or for all $y$ we have $W(y \mid 0) = W(y \mid 1)$, which implies $I(W) = 0$.

D. Proof of Proposition 5

Proof of (26) is straightforward.

To prove (27), we use shorthand notation for the transition probabilities $W(y \mid 0)$ and $W(y \mid 1)$ and write out the Bhattacharyya sum over output pairs, where the inequality follows from an elementary identity for square roots of sums of products. Next, we note that each term obtained by expanding the resulting products gives $Z(W)$ when summed over the outputs, and likewise for the companion expansion; the remaining term summed over the outputs equals $Z(W)^2$. Combining these, we obtain the claim (27). Equality holds in (27) iff, for any choice of the outputs, one of the degeneracy conditions on the transition probabilities is true. This is satisfied if $W$ is a BEC. Conversely, if we take the two outputs equal, we see that for equality in (27), we must have, for any choice of the output, either $W(y \mid 0)\,W(y \mid 1) = 0$ or $W(y \mid 0) = W(y \mid 1)$; this is equivalent to saying that $W$ is a BEC.

To prove (28), we need the following result which states that the parameter $Z(W)$ is a convex function of the channel transition probabilities.

Lemma 4: Given any collection of B-DMCs $\{W_j : j \in \mathcal{J}\}$ and a probability distribution $Q$ on $\mathcal{J}$, define $W$ as the channel $W(y \mid x) = \sum_{j \in \mathcal{J}} Q(j)\,W_j(y \mid x)$. Then

$$Z(W) \ \ge\ \sum_{j \in \mathcal{J}} Q(j)\, Z(W_j). \qquad (89)$$

Proof: This follows by first rewriting $Z(W)$ in a different form and then applying Minkowski's inequality [10, p. 524, inequality (h)].

We now write $W'$ as the mixture

$$W'(y_1 y_2 \mid u_1) = \tfrac{1}{2}\, W_0(y_1 y_2 \mid u_1) + \tfrac{1}{2}\, W_1(y_1 y_2 \mid u_1)$$

where

$$W_{u_2}(y_1 y_2 \mid u_1) = W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2)$$

and apply Lemma 4 to obtain the claimed inequality.
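Lemma 4 is also easy to spot-check numerically; the snippet below (ours) draws random pairs of B-DMCs and random mixing weights and verifies (89).

```python
import math, random

def rand_bdmc(n):
    rows = []
    for _ in range(2):
        w = [random.random() + 1e-12 for _ in range(n)]
        s = sum(w)
        rows.append([v / s for v in w])
    return rows  # rows[x][y] = W(y|x)

def Z(W):
    return sum(math.sqrt(a * b) for a, b in zip(W[0], W[1]))

for _ in range(10000):
    n = random.randint(2, 6)
    W1, W2 = rand_bdmc(n), rand_bdmc(n)
    q = random.random()
    mix = [[q * a + (1 - q) * b for a, b in zip(W1[x], W2[x])]
           for x in range(2)]
    assert Z(mix) >= q * Z(W1) + (1 - q) * Z(W2) - 1e-9  # inequality (89)
```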


Since $Z(W') \ge Z(W)$ and $Z(W'') = Z(W)^2$, we have $Z(W'') \le Z(W) \le Z(W')$, with equality iff $Z(W)$ equals $0$ or $1$. This also shows that $Z(W') = Z(W'')$ iff $Z(W)$ equals $0$ or $1$. So, by Proposition 1, $Z(W') = Z(W'')$ iff $I(W)$ equals $0$ or $1$.

E. Proof of Proposition 6

From (17), we have the identities

(90)

(91)

Suppose $W$ is a BEC, but $W'$ is not. Then, there exists an output pair $(y_1, y_2)$ such that the left sides of (90) and (91) are both different from zero. From (91), we infer that neither $y_1$ nor $y_2$ is an erasure symbol for $W$. But then the RHS of (90) must be zero, which is a contradiction. Thus, $W'$ must be a BEC. From (91), we conclude that $(y_1, y_2)$ is an erasure symbol for $W'$ iff either $y_1$ or $y_2$ is an erasure symbol for $W$. This shows that the erasure probability for $W'$ is $2\epsilon - \epsilon^2$, where $\epsilon$ is the erasure probability of $W$.

Conversely, suppose $W'$ is a BEC but $W$ is not. Then, there exists an output for which the relevant transition probabilities are both positive and unequal. By taking a suitable output pair, we see that the RHSs of (90) and (91) can both be made nonzero, which contradicts the assumption that $W'$ is a BEC.

The other claims follow from the identities for $W''$. The arguments are similar to the ones already given and we omit the details, other than noting that an output is an erasure symbol for $W''$ iff both of its channel-output coordinates are erasure symbols for $W$.

F. Proof of Lemma 1

The proof follows that of a similar result from Chung [9, Theorem 4.1.1]. Fix $\epsilon > 0$ and let $\Omega_0$ denote the set of sample points on which the sequence in question converges. By Proposition 10, $P(\Omega_0) = 1$. Fix $\omega \in \Omega_0$. Convergence at $\omega$ implies that there exists a finite $n_0$ such that the sequence stays within $\epsilon$ of its limit for all $n \ge n_0$; thus, $\omega$ belongs to the event associated with some finite $n_0$, and $\Omega_0$ is contained in the union of these events. Since the events increase with $n_0$, by the monotone convergence property of a measure, their probabilities converge to that of the union, which is $1$. It follows that, for any $\delta > 0$ there exists a finite $n_0$ such that, for all $n \ge n_0$, the corresponding event has probability at least $1 - \delta$. This completes the proof.
REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379–423, 623–656, Jul.–Oct. 1948.
[2] E. Arıkan, "Channel combining and splitting for cutoff rate improvement," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 628–639, Feb. 2006.
[3] D. E. Muller, "Application of Boolean algebra to switching circuit design and to error correction," IRE Trans. Electron. Comput., vol. EC-3, no. 9, pp. 6–12, Sep. 1954.
[4] I. Reed, "A class of multiple-error-correcting codes and the decoding scheme," IRE Trans. Inf. Theory, vol. IT-4, no. 3, pp. 39–44, Sep. 1954.
[5] M. Plotkin, "Binary codes with specified minimum distance," IRE Trans. Inf. Theory, vol. IT-6, no. 3, pp. 445–450, Sep. 1960.
[6] S. Lin and D. J. Costello, Jr., Error Control Coding, 2nd ed. Upper Saddle River, NJ: Pearson, 2004.
[7] R. E. Blahut, Theory and Practice of Error Control Codes. Reading, MA: Addison-Wesley, 1983.
[8] G. D. Forney, Jr., MIT 6.451 Lecture Notes, Spring 2005, unpublished.
[9] K. L. Chung, A Course in Probability Theory, 2nd ed. New York: Academic, 1974.
[10] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[11] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, no. 90, pp. 297–301, 1965.
[12] E. Arıkan and E. Telatar, "On the rate of channel polarization," Aug. 2008, arXiv:0807.3806v2 [cs.IT].
[13] A. Sahai, P. Glover, and E. Telatar, private communication, Oct. 2008.
[14] C. E. Shannon, "A note on partial ordering for communication channels," Inf. Contr., vol. 1, pp. 390–397, 1958.
[15] R. G. Gallager, "Low-density parity-check codes," IRE Trans. Inf. Theory, vol. IT-8, no. 1, pp. 21–28, Jan. 1962.
[16] G. D. Forney, Jr., "Codes on graphs: Normal realizations," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 520–548, Feb. 2001.
[17] E. Arıkan, "A performance comparison of polar codes and Reed–Muller codes," IEEE Commun. Lett., vol. 12, no. 6, pp. 447–449, Jun. 2008.

Erdal Arıkan (S'84-M'79-SM'94) was born in Ankara, Turkey, in 1958. He received the B.S. degree from the California Institute of Technology, Pasadena, in 1981 and the S.M. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, in 1982 and 1985, respectively, all in electrical engineering.
Since 1987 he has been with the Electrical-Electronics Engineering Department of Bilkent University, Ankara, Turkey, where he is presently a Professor.