




Low-density parity-check (LDPC) codes are among the most widely used error-correcting
codes (ECC). Because of their popularity they appear in several applications, such as the
digital satellite broadcasting system (DVB-S2), wireless local area networks (IEEE
802.11n) and metropolitan area networks (IEEE 802.16e). These codes are used to
transmit messages over noisy transmission channels. LDPC codes are constructed from
sparse parity-check matrices and admit very fast encoding and decoding algorithms. In
this paper, a low-density parity-check decoder is implemented in Verilog. A partially
parallel decoder is designed using the belief propagation (BP) algorithm. ModelSim is
used for simulation, and Mentor Graphics Leonardo Spectrum with Virtex-IV technology
is used for synthesis.




Low-Density Parity-Check (LDPC) [1] codes have become one of the most attractive error
correction codes due to their excellent performance [2] and suitability for high data rate
applications such as WiMAX and DVB-S2 [3]. The inherent structure of the LDPC code
allows the decoder to achieve a high degree of parallelism in practical implementations
[4]. LDPC decoding algorithms are primarily iterative and are based on the belief
propagation message-passing algorithm. The complexity of the decoding algorithm is
highly critical for the overall performance of the LDPC decoder. Various algorithms have
been proposed in the past to trade off complexity against performance [5, 6]. The Sum-
Product Algorithm (SPA) [7], a soft-decision message-passing algorithm, achieves the
best performance but has high decoding complexity. Bit-flipping, in contrast, is a hard-
decision algorithm with the least decoding complexity but suffers from poor performance
[6]. The Min-Sum Algorithm (MSA) is a simplified version of the SPA with reduced
implementation complexity and only a slight degradation in performance [7]. The MSA
performs simple arithmetic and logical operations, which makes it suitable for hardware
implementation. However, the performance of the algorithm is significantly affected by
the quantization of the soft input messages [8]. Reducing the number of quantization bits
per message is important for reducing the implementation complexity and hardware
resources of the decoder, but this advantage comes with a degradation in decoding
performance. Performance issues and hardware implementation of such low-complexity
algorithms, especially the 2-bit MSA, have received limited attention in the literature.
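As a concrete illustration of message quantization, the sketch below (a hypothetical helper, not code from the paper) uniformly quantizes a soft log-likelihood ratio (LLR) to a 2-bit two's-complement value, so that each exchanged message carries only a sign and one magnitude bit:

```python
def quantize_llr(llr, bits=2, step=1.0):
    """Uniformly quantize a soft LLR to a signed fixed-point value.

    With bits=2 the representable levels are {-2, -1, 0, 1} (two's
    complement), so each message keeps only its sign and one
    magnitude bit; strong beliefs saturate at the extreme levels.
    """
    levels = 2 ** (bits - 1)
    q = round(llr / step)
    return max(-levels, min(levels - 1, q))

# quantize_llr(5.7) saturates to 1; quantize_llr(-5.7) saturates to -2.
```

The step size and saturation behavior are design parameters of a fixed-point decoder; coarser quantization shrinks the message memory and wiring at the cost of decoding performance, which is exactly the trade-off discussed above.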


The aim is to design efficient and reliable data transmission methods. This typically
involves the removal of redundancy and the correction (or detection) of errors in the
transmitted data.


This paper discusses the performance and hardware implementation complexity
associated with the 2-bit MSA. Modifications are proposed to improve the overall
performance of the algorithm to a level comparable to that of the 3-bit MSA. With
comparable performance, an FPGA implementation of the proposed MMS2 can save up
to 18% of slices and yields a 23% improvement in the maximum operating frequency of
the LDPC decoder.

The rest of the paper is organized as follows. NS-FAIDs are introduced in Section II, which
also discusses their expected implementation benefits and the DE analysis. The
optimization of regular and irregular NS-FAIDs is presented in Section III. The proposed
hardware architectures, with both MS and NS-FAID decoding kernels, are discussed in
Section IV. Numerical results are provided in Section V, and Section VI concludes the
paper.




Coding theory is the study of the properties of codes and their fitness for a specific
application. Codes are used for data compression, cryptography, error-correction and
more recently also for network coding. Codes are studied by various scientific
disciplines—such as information theory, electrical engineering, mathematics, and
computer science—for the purpose of designing efficient and reliable data transmission
methods. This typically involves the removal of redundancy and the correction (or
detection) of errors in the transmitted data.

There are four types of coding:[1]

1. Data compression (or, source coding)

2. Error correction (or channel coding)

3. Cryptographic coding

4. Line coding

Data compression and error correction may be studied in combination.

Source encoding attempts to compress the data from a source in order to transmit it more
efficiently. This practice is found every day on the Internet where the common Zip data
compression is used to reduce the network load and make files smaller.

The second, channel encoding, adds extra data bits to make the transmission of data
more robust to disturbances present on the transmission channel. The ordinary user may
not be aware of many applications using channel coding. A typical music CD uses the
Reed-Solomon code to correct for scratches and dust. In this application the transmission
channel is the CD itself. Cell phones also use coding techniques to correct for the fading
and noise of high frequency radio transmission. Data modems, telephone transmissions,
and NASA all employ channel coding techniques to get the bits through, for example the
turbo code and LDPC codes.

History of coding theory

In 1948, Claude Shannon published "A Mathematical Theory of Communication", an
article in two parts in the July and October issues of the Bell System Technical Journal.
This work focuses on the problem of how best to encode the information a sender wants
to transmit. In this fundamental work he used tools in probability theory, developed by
Norbert Wiener, which were in their nascent stages of being applied to communication
theory at that time. Shannon developed information entropy as a measure for the
uncertainty in a message while essentially inventing the field of information theory.

The binary Golay code was developed in 1949. More specifically, it is an error-correcting
code capable of correcting up to three errors in each 24-bit word, and detecting a fourth.

[Figure: a two-dimensional visualisation of the Hamming distance]

Richard Hamming won the Turing Award in 1968 for his work at Bell Labs in numerical
methods, automatic coding systems, and error-detecting and error-correcting codes. He
invented the concepts known as Hamming codes, Hamming windows, Hamming
numbers, and Hamming distance.

Source coding

Main article: Data compression

The aim of source coding is to take the source data and make it smaller.


Data can be seen as a random variable X : Ω → 𝒳, where each x ∈ 𝒳 appears with
probability P[X = x].

Data are encoded by strings (words) over an alphabet Σ.

A code is a function C : 𝒳 → Σ* (or C : 𝒳 → Σ+, if the empty string is not part of the
alphabet).

C(x) is the code word associated with x.

The length of the code word C(x) is written as l(C(x)).

The expected length of a code is E[l(C(X))] = Σ_{x ∈ 𝒳} P[x] · l(C(x)).

The concatenation of code words is C(x1, x2, …, xk) = C(x1) C(x2) … C(xk).

The code word of the empty string is the empty string itself: C(ε) = ε.


1. C is non-singular if it is injective.

2. C is uniquely decodable if its extension to sequences of source symbols is injective.

3. C is instantaneous (prefix-free) if C(x1) is not a prefix of C(x2) (and vice versa) for
all x1 ≠ x2.
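These properties can be checked mechanically. A small sketch for binary codes given as strings (the helper names are mine; the Kraft-sum check is added as the standard companion test for unique decodability):

```python
def is_prefix_free(codewords):
    """Instantaneous (prefix-free) property: no codeword is a prefix
    of another. After lexicographic sorting, a prefix would appear
    immediately before some word it prefixes, so checking adjacent
    pairs suffices."""
    words = sorted(codewords)
    return all(not words[i + 1].startswith(words[i])
               for i in range(len(words) - 1))

def kraft_sum(codewords, alphabet_size=2):
    """Kraft inequality sum; a value <= 1 is necessary for the code
    to be uniquely decodable."""
    return sum(alphabet_size ** -len(w) for w in codewords)

# {0, 10, 110, 111} is prefix-free and meets the Kraft bound exactly.
```

A code such as {0, 01} fails the prefix-free test: on seeing "0" a decoder cannot decide instantly whether the codeword has ended.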


Entropy of a source is the measure of information. Basically, source codes try to reduce
the redundancy present in the source, and represent the source with fewer bits that carry
more information.

Data compression which explicitly tries to minimize the average length of messages
according to a particular assumed probability model is called entropy encoding.
Various techniques used by source coding schemes try to approach the entropy limit of
the source: C(x) ≥ H(x), where H(x) is the entropy of the source (bit rate) and C(x) is the
bit rate after compression. In particular, no source coding scheme can do better than the
entropy of the source.
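The bound can be checked directly on a toy source. A minimal sketch (assuming a three-symbol source with the probabilities shown; the prefix code {0, 10, 11} is one code that happens to meet the bound with equality):

```python
import math

def entropy(probs):
    """Shannon entropy H(X) in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Source with symbol probabilities 1/2, 1/4, 1/4 and the prefix code
# {0, 10, 11}: the expected code length equals the entropy, 1.5 bits.
probs = [0.5, 0.25, 0.25]
lengths = [1, 2, 2]
expected_length = sum(p * l for p, l in zip(probs, lengths))
```

For less convenient probability distributions the expected length of any uniquely decodable code stays strictly above the entropy, approaching it only in the limit (e.g. by coding blocks of symbols).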


Facsimile transmission uses a simple run length code. Source coding removes all data
superfluous to the need of the transmitter, decreasing the bandwidth required for
transmission.
Channel coding

Main article: Forward error correction

The purpose of channel coding theory is to find codes which transmit quickly, contain
many valid code words and can correct or at least detect many errors. While not mutually
exclusive, performance in these areas is a trade off. So, different codes are optimal for
different applications. The needed properties of this code mainly depend on the
probability of errors happening during transmission. In a typical CD, the impairment is
mainly dust or scratches. Thus codes are used in an interleaved manner. The
data is spread out over the disk.

Although not a very good code, a simple repeat code can serve as an understandable
example. Suppose we take a block of data bits (representing sound) and send it three
times. At the receiver we will examine the three repetitions bit by bit and take a majority
vote. The twist on this is that we don't merely send the bits in order. We interleave them.
The block of data bits is first divided into 4 smaller blocks. Then we cycle through the
block and send one bit from the first, then the second, etc. This is done three times to
spread the data out over the surface of the disk. In the context of the simple repeat code,
this may not appear effective. However, there are more powerful codes known which are
very effective at correcting the "burst" error of a scratch or a dust spot when this
interleaving technique is used.
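The repeat-and-interleave scheme above can be sketched in a few lines (toy sizes and hypothetical helper names):

```python
def encode(bits):
    # (3,1) repetition with interleaving: send copy 1 of every bit,
    # then copy 2, then copy 3, so the three copies of each bit are
    # spread far apart on the "disk".
    return bits * 3

def decode(rx, n):
    # Majority vote over the three spread-out copies of each bit.
    return [1 if rx[i] + rx[i + n] + rx[i + 2 * n] >= 2 else 0
            for i in range(n)]

data = [1, 0, 1, 1]
tx = encode(data)                     # 12 transmitted bits
tx[4] ^= 1; tx[5] ^= 1; tx[6] ^= 1    # a 3-bit burst error ("scratch")
recovered = decode(tx, len(data))     # each triplet lost at most one copy
```

Without interleaving (sending each bit's three copies back-to-back), the same 3-bit burst could destroy all three copies of one bit and the majority vote would fail.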
Other codes are more appropriate for different applications. Deep space communications
are limited by the thermal noise of the receiver which is more of a continuous nature than
a bursty nature. Likewise, narrowband modems are limited by the noise present in the
telephone network, which is also modeled better as a continuous disturbance. Cell
phones are subject to rapid fading. The high frequencies used can cause rapid fading of
the signal even if the receiver is moved a few inches. Again, there is a class of channel
codes designed to combat fading.

Linear codes

Main article: Linear code

The term algebraic coding theory denotes the sub-field of coding theory where the
properties of codes are expressed in algebraic terms and then further researched.

Algebraic coding theory is basically divided into two major types of codes:

1. Linear block codes

2. Convolutional codes.

It analyzes the following three properties of a code, mainly:

 code word length

 total number of valid code words

 the minimum distance between two valid code words, using mainly the Hamming
distance, sometimes also other distances like the Lee distance.

Linear block codes

Main article: Block code

Linear block codes have the property of linearity, i.e. the sum of any two codewords is
also a code word, and they are applied to the source bits in blocks, hence the name linear
block codes. There are block codes that are not linear, but it is difficult to prove that a
code is a good one without this property.[2]

Linear block codes are summarized by their symbol alphabets (e.g., binary or ternary)
and parameters (n,m,dmin)[3] where

1. n is the length of the codeword, in symbols,

2. m is the number of source symbols that will be used for encoding at once,

3. dmin is the minimum Hamming distance for the code.
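Both the linearity property and dmin can be verified exhaustively for a small code. A sketch using the (7,4) Hamming code in systematic form, i.e. n = 7, m = 4, dmin = 3:

```python
from itertools import product

# Generator matrix of the (7,4) Hamming code in systematic form [I | P].
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def encode(msg):
    # Codeword = msg * G over GF(2).
    return tuple(sum(m * g for m, g in zip(msg, col)) % 2
                 for col in zip(*G))

codewords = {encode(m) for m in product([0, 1], repeat=4)}

# Linearity: the sum (XOR) of any two codewords is again a codeword.
assert all(tuple(x ^ y for x, y in zip(a, b)) in codewords
           for a in codewords for b in codewords)

# For a linear code, dmin equals the minimum weight (number of ones)
# of a nonzero codeword: 3 for this code.
dmin = min(sum(c) for c in codewords if any(c))
```

The weight-based shortcut for dmin is itself a consequence of linearity: the distance between two codewords equals the weight of their (also valid) sum.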

There are many types of linear block codes, such as

1. Cyclic codes (e.g., Hamming codes)

2. Repetition codes

3. Parity codes

4. Polynomial codes (e.g., BCH codes)

5. Reed–Solomon codes

6. Algebraic geometric codes

7. Reed–Muller codes

8. Perfect codes.

Block codes are tied to the sphere packing problem, which has received some attention
over the years. In two dimensions, it is easy to visualize. Take a bunch of pennies flat on
the table and push them together. The result is a hexagon pattern like a bee's nest. But
block codes rely on more dimensions which cannot easily be visualized. The powerful
(24,12) Golay code used in deep space communications uses 24 dimensions. If used as
a binary code (which it usually is) the dimensions refer to the length of the codeword as
defined above.
The theory of coding uses the N-dimensional sphere model. For example, how many
pennies can be packed into a circle on a tabletop, or in 3 dimensions, how many marbles
can be packed into a globe. Other considerations enter the choice of a code. For example,
hexagon packing into the constraint of a rectangular box will leave empty space at the
corners. As the dimensions get larger, the percentage of empty space grows smaller. But
at certain dimensions, the packing uses all the space and these codes are the so-called
"perfect" codes. The only nontrivial and useful perfect codes are the distance-3 Hamming
codes with parameters satisfying (2^r − 1, 2^r − 1 − r, 3), and the [23,12,7] binary and
[11,6,5] ternary Golay codes.[2][3]

Another code property is the number of neighbors that a single codeword may have. [4]
Again, consider pennies as an example. First we pack the pennies in a rectangular grid.
Each penny will have 4 near neighbors (and 4 at the corners which are farther away). In
a hexagon, each penny will have 6 near neighbors. When we increase the dimensions,
the number of near neighbors increases very rapidly. The result is the number of ways
for noise to make the receiver choose a neighbor (hence an error) grows as well. This is
a fundamental limitation of block codes, and indeed all codes. It may be harder to cause
an error to a single neighbor, but the number of neighbors can be large enough so the
total error probability actually suffers.[4]

Properties of linear block codes are used in many applications. For example, the
syndrome-coset uniqueness property of linear block codes is used in trellis shaping,[5]
one of the best known shaping codes. This same property is used in sensor networks for
distributed source coding.

Convolutional codes

Main article: Convolutional code

The idea behind a convolutional code is to make every codeword symbol a weighted
sum of various input message symbols. This is like the convolution used in LTI systems
to find the output of a system when the input and impulse response are known. In
general, the output of a convolutional encoder is the convolution of the input bits with
the states of the encoder's shift registers.
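This weighted-sum view can be made concrete with a small encoder. The sketch below uses the classic rate-1/2, constraint-length-3 generators 7 and 5 (octal), a common textbook choice rather than anything specified in this text:

```python
def conv_encode(bits, gens=(0b111, 0b101)):
    """Rate-1/2 convolutional encoder, constraint length 3.

    Each output bit is the XOR (weighted sum mod 2) of the current
    input bit and the two previous bits held in a 3-bit shift
    register, with the generator polynomial selecting which taps
    contribute.
    """
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & 0b111   # shift the new bit in
        for g in gens:
            # parity of the tapped register bits = one output bit
            out.append(bin(state & g).count("1") % 2)
    return out

# conv_encode([1, 0, 1]) -> [1, 1, 1, 0, 0, 0]
```

Note how each input bit influences three consecutive output pairs, which is exactly the "memory" that block codes lack.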

Fundamentally, convolutional codes do not offer more protection against noise than an
equivalent block code. In many cases, they generally offer greater simplicity of
implementation over a block code of equal power. The encoder is usually a simple circuit
which has state memory and some feedback logic, normally XOR gates. The decoder
can be implemented in software or firmware.

The Viterbi algorithm is the optimum algorithm used to decode convolutional codes. There
are simplifications to reduce the computational load. They rely on searching only the most
likely paths. Although not optimum, they have generally been found to give good results
in the lower noise environments.

Convolutional codes are used in voiceband modems (V.32, V.17, V.34) and in GSM
mobile phones, as well as satellite and military communication devices.

Cryptographical coding

Main article: Cryptography

Cryptography or cryptographic coding is the practice and study of techniques for secure
communication in the presence of third parties (called adversaries).[6] More generally, it
is about constructing and analyzing protocols that block adversaries;[7] various aspects in
information security such as data confidentiality, data integrity, authentication, and non-
repudiation[8] are central to modern cryptography. Modern cryptography exists at the
intersection of the disciplines of mathematics, computer science, and electrical
engineering. Applications of cryptography include ATM cards, computer passwords, and
electronic commerce.

Cryptography prior to the modern age was effectively synonymous with encryption, the
conversion of information from a readable state to apparent nonsense. The originator of
an encrypted message shared the decoding technique needed to recover the original
information only with intended recipients, thereby precluding unwanted persons from
doing the same. Since World War I and the advent of the computer, the methods used to
carry out cryptology have become increasingly complex and its application more
widespread.

Modern cryptography is heavily based on mathematical theory and computer science
practice; cryptographic algorithms are designed around computational hardness
assumptions, making such algorithms hard to break in practice by any adversary. It is
theoretically possible to break such a system, but it is infeasible to do so by any known
practical means. These schemes are therefore termed computationally secure;
theoretical advances, e.g., improvements in integer factorization algorithms, and faster
computing technology require these solutions to be continually adapted. There exist
information-theoretically secure schemes that provably cannot be broken even with
unlimited computing power—an example is the one-time pad—but these schemes are
more difficult to implement than the best theoretically breakable but computationally
secure mechanisms.

Line coding

Main article: Line code

A line code (also called digital baseband modulation or digital baseband transmission
method) is a code chosen for use within a communications system for baseband
transmission purposes. Line coding is often used for digital data transport.

Line coding consists of representing the digital signal to be transported by an amplitude-
and time-discrete signal that is optimally tuned for the specific properties of the physical
channel (and of the receiving equipment). The waveform pattern of voltage or current
used to represent the 1s and 0s of digital data on a transmission link is called line
encoding. The common types of line encoding are unipolar, polar, bipolar, and
Manchester encoding.

Other applications of coding theory

Another concern of coding theory is designing codes that help synchronization. A code
may be designed so that a phase shift can be easily detected and corrected and that
multiple signals can be sent on the same channel.
Another application of codes, used in some mobile phone systems, is code-division
multiple access (CDMA). Each phone is assigned a code sequence that is approximately
uncorrelated with the codes of other phones. When transmitting, the code word
is used to modulate the data bits representing the voice message. At the receiver, a
demodulation process is performed to recover the data. The properties of this class of
codes allow many users (with different codes) to use the same radio channel at the same
time. To the receiver, the signals of other users will appear to the demodulator only as a
low-level noise.

Another general class of codes are the automatic repeat-request (ARQ) codes. In these
codes the sender adds redundancy to each message for error checking, usually by adding
check bits. If the check bits are not consistent with the rest of the message when it arrives,
the receiver will ask the sender to retransmit the message. All but the simplest wide area
network protocols use ARQ. Common protocols include SDLC (IBM), TCP (Internet), X.25
(International) and many others. There is an extensive field of research on this topic
because of the problem of matching a rejected packet against a new packet. Is it a new
one or is it a retransmission? Typically numbering schemes are used, as in TCP
(RFC 793).

Group testing

Group testing uses codes in a different way. Consider a large group of items in which a
very few are different in a particular way (e.g., defective products or infected test
subjects). The idea of group testing is to determine which items are "different" by using
as few tests as possible. The origin of the problem has its roots in the Second World War
when the United States Army Air Forces needed to test its soldiers for syphilis. It
originated from a ground-breaking paper by Robert Dorfman.

Analog coding

Information is encoded analogously in the neural networks of brains, in analog signal
processing, and in analog electronics. Aspects of analog coding include analog error
correction,[9] analog data compression,[10] and analog encryption.[11]

Neural coding

Neural coding is a neuroscience-related field concerned with how sensory and other
information is represented in the brain by networks of neurons. The main goal of studying
neural coding is to characterize the relationship between the stimulus and the individual
or ensemble neuronal responses and the relationship among electrical activity of the
neurons in the ensemble.[12] It is thought that neurons can encode both digital and analog
information,[13] and that neurons follow the principles of information theory and compress
information,[14] and detect and correct[15] errors in the signals that are sent throughout the
brain and wider nervous system.


In telecommunication, information theory, and coding theory, forward error correction
(FEC) or channel coding[1] is a technique used for controlling errors in data transmission
over unreliable or noisy communication channels. The central idea is the sender encodes
the message in a redundant way by using an error-correcting code (ECC). The
American mathematician Richard Hamming pioneered this field in the 1940s and invented
the first error-correcting code in 1950: the Hamming (7,4) code.[2]
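That (7,4) code can be sketched in a few lines. This illustration uses the common bit ordering with parity bits at positions 1, 2 and 4, so that the recomputed parity checks (the syndrome), read as a binary number, directly spell out the position of a single flipped bit:

```python
def hamming74_encode(d):
    # Hamming (7,4): positions 1..7, parity bits at positions 1, 2, 4.
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    # Recompute the three checks; read as a binary number they give
    # the 1-based position of a single flipped bit (0 = no error seen).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1
    return c

cw = hamming74_encode([1, 0, 1, 1])
rx = list(cw)
rx[2] ^= 1                       # one bit flipped in transit
fixed = hamming74_correct(rx)    # syndrome 3 points at position 3
```

Each parity bit covers the positions whose binary index contains that bit, which is why the syndrome doubles as an error address.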

The redundancy allows the receiver to detect a limited number of errors that may occur
anywhere in the message, and often to correct these errors without retransmission. FEC
gives the receiver the ability to correct errors without needing a reverse channel to request
retransmission of data, but at the cost of a fixed, higher forward channel bandwidth. FEC
is therefore applied in situations where retransmissions are costly or impossible, such as
one-way communication links and when transmitting to multiple receivers in multicast.
FEC information is usually added to mass storage devices to enable recovery of corrupted
data, and is widely used in modems.

FEC processing in a receiver may be applied to a digital bit stream or in the demodulation
of a digitally modulated carrier. For the latter, FEC is an integral part of the initial analog-
to-digital conversion in the receiver. The Viterbi decoder implements a soft-decision
algorithm to demodulate digital data from an analog signal corrupted by noise. Many FEC
coders can also generate a bit-error rate (BER) signal which can be used as feedback to
fine-tune the analog receiving electronics.

The noisy-channel coding theorem establishes bounds on the theoretical maximum
information transfer rate of a channel with some given noise level. Some advanced FEC
systems come very close to the theoretical maximum.
systems come very close to the theoretical maximum.

The maximum fraction of errors or missing bits that can be corrected is determined by
the design of the FEC code, so different forward error correcting codes are suitable for
different conditions.

How it works

FEC is accomplished by adding redundancy to the transmitted information using an
algorithm. A redundant bit may be a complex function of many original information bits.
The original information may or may not appear literally in the encoded output; codes that
include the unmodified input in the output are systematic, while those that do not are
non-systematic.

A simplistic example of FEC is to transmit each data bit 3 times, which is known as a (3,1)
repetition code. Through a noisy channel, a receiver might see any of 8 versions of each
transmitted triplet, as shown in the table below.

Triplet received    Interpreted as

000                 0 (error free)

001                 0

010                 0

100                 0

111                 1 (error free)

110                 1

101                 1

011                 1

This allows an error in any one of the three samples to be corrected by "majority vote" or
"democratic voting". The correcting ability of this FEC is:

 Up to 1 bit of triplet in error, or

 up to 2 bits of triplet omitted (cases not shown in table).

Though simple to implement and widely used, this triple modular redundancy is a
relatively inefficient FEC. Better FEC codes typically examine the last several dozen, or
even the last several hundred, previously received bits to determine how to decode the
current small handful of bits (typically in groups of 2 to 8 bits).

Averaging noise to reduce errors

FEC could be said to work by "averaging noise"; since each data bit affects many
transmitted symbols, the corruption of some symbols by noise usually allows the original
user data to be extracted from the other, uncorrupted received symbols that also depend
on the same user data.

 Because of this "risk-pooling" effect, digital communication systems that use FEC
tend to work well above a certain minimum signal-to-noise ratio and not at all below
it.
 This all-or-nothing tendency (the cliff effect) becomes more pronounced as
stronger codes are used that more closely approach the theoretical Shannon limit.

 Interleaving FEC coded data can reduce the all or nothing properties of transmitted
FEC codes when the channel errors tend to occur in bursts. However, this method
has limits; it is best used on narrowband data.
Most telecommunication systems use a fixed channel code designed to tolerate the
expected worst-case bit error rate, and then fail to work at all if the bit error rate is ever
worse. However, some systems adapt to the given channel error conditions: some
instances of hybrid automatic repeat-request use a fixed FEC method as long as the FEC
can handle the error rate, then switch to ARQ when the error rate gets too high; adaptive
modulation and coding uses a variety of FEC rates, adding more error-correction bits per
packet when there are higher error rates in the channel, or taking them out when they are
not needed.

Types of FEC

Main articles: Block code and Convolutional code

The two main categories of FEC codes are block codes and convolutional codes.

 Block codes work on fixed-size blocks (packets) of bits or symbols of
predetermined size. Practical block codes can generally be hard-decoded in time
polynomial in their block length.

 Convolutional codes work on bit or symbol streams of arbitrary length. They are
most often soft decoded with the Viterbi algorithm, though other algorithms are
sometimes used. Viterbi decoding allows asymptotically optimal decoding
efficiency with increasing constraint length of the convolutional code, but at the
expense of exponentially increasing complexity. A convolutional code that is
terminated is also a 'block code' in that it encodes a block of input data, but the
block size of a convolutional code is generally arbitrary, while block codes have a
fixed size dictated by their algebraic characteristics. Types of termination for
convolutional codes include "tail-biting" and "bit-flushing".

There are many types of block codes, but among the classical ones the most notable is
Reed-Solomon coding because of its widespread use on the Compact disc, the DVD, and
in hard disk drives. Other examples of classical block codes include Golay, BCH,
Multidimensional parity, and Hamming codes.
Hamming ECC is commonly used to correct NAND flash memory errors.[3] This provides
single-bit error correction and 2-bit error detection. Hamming codes are only suitable for
more reliable single level cell (SLC) NAND. Denser multi level cell (MLC) NAND requires
stronger multi-bit correcting ECC such as BCH or Reed–Solomon.[4] NOR
Flash typically does not use any error correction.[4]

Classical block codes are usually decoded using hard-decision algorithms,[5] which
means that for every input and output signal a hard decision is made whether it
corresponds to a one or a zero bit. In contrast, convolutional codes are typically decoded
using soft-decision algorithms like the Viterbi, MAP or BCJR algorithms, which process
(discretized) analog signals, and which allow for much higher error-correction
performance than hard-decision decoding.
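A minimal numeric illustration of why soft decisions help, using a single bit sent three times (the sample values are made up for illustration):

```python
# Received soft samples for one data bit sent three times (+1 was
# transmitted). Samples below 0 look like -1 to a hard slicer.
rx = [-0.1, -0.2, 0.9]

# Hard decision first, majority vote after: two weakly negative
# samples outvote one confident positive sample, so the bit is
# decoded incorrectly as -1.
hard = [1 if r > 0 else -1 for r in rx]
hard_decision = 1 if sum(hard) > 0 else -1

# Soft decision: add the analog values before deciding, so each
# sample is weighed by its confidence and the strong sample wins.
soft_decision = 1 if sum(rx) > 0 else -1
</antml_block>```

Discretizing each sample to a hard bit throws away the reliability information that soft-decision decoders such as Viterbi, MAP and BCJR exploit.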

Nearly all classical block codes apply the algebraic properties of finite fields. Hence
classical block codes are often referred to as algebraic codes.

In contrast to classical block codes that often specify an error-detecting or error-correcting
ability, many modern block codes such as LDPC codes lack such guarantees. Instead,
modern codes are evaluated in terms of their bit error rates.

Most forward error correction codes correct only bit-flips, but not bit-insertions or bit-deletions.
In this setting, the Hamming distance is the appropriate way to measure the bit error rate.
A few forward error correction codes are designed to correct bit-insertions and bit-
deletions, such as Marker Codes and Watermark Codes. The Levenshtein distance is a
more appropriate way to measure the bit error rate when using such codes.[6]
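The difference between the two distances is easy to see on a stream that loses one bit (illustrative helper implementations):

```python
def hamming_distance(a, b):
    # Defined for equal-length strings: count of differing positions.
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    # Edit distance: minimum insertions, deletions and substitutions,
    # computed with the standard two-row dynamic program.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

# Losing the first bit shifts everything after it: the Hamming
# distance explodes while the edit distance stays small.
sent = "101010"
received = "010100"   # first bit deleted, a 0 shifted in at the end
```

Here `hamming_distance(sent, received)` is 5 even though only one bit was lost, while `levenshtein(sent, received)` is 2 (one deletion plus one insertion), matching the physical event much more closely.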

Concatenated FEC codes for improved performance

Main article: Concatenated error correction codes

Classical (algebraic) block codes and convolutional codes are frequently combined in
concatenated coding schemes in which a short constraint-length Viterbi-decoded
convolutional code does most of the work and a block code (usually Reed-Solomon) with
larger symbol size and block length "mops up" any errors made by the convolutional
decoder. Single pass decoding with this family of error correction codes can yield very
low error rates, but for long range transmission conditions (like deep space) iterative
decoding is recommended.

Concatenated codes have been standard practice in satellite and deep space
communications since Voyager 2 first used the technique in its 1986 encounter with
Uranus. The Galileo craft used iterative concatenated codes to compensate for the very
high error rate conditions caused by having a failed antenna.

Low-density parity-check (LDPC)

Main article: Low-density parity-check code

Low-density parity-check (LDPC) codes are a class of recently re-discovered highly
efficient linear block codes made from many single parity check (SPC) codes. They can
provide performance very close to the channel capacity (the theoretical maximum) using
an iterated soft-decision decoding approach, at linear time complexity in terms of their
block length. Practical implementations rely heavily on decoding the constituent SPC
codes in parallel.
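The parallel SPC structure can be illustrated with a toy hard-decision (bit-flipping) decoder. This is a sketch only: the hand-made 4×6 parity-check matrix is far too small to be "low-density" in any real sense, and production decoders use the soft iterative approach described above, but each row acting as an independent parity check is the same idea:

```python
# A tiny regular parity-check matrix: every bit is in exactly two
# checks and every check covers three bits.
H = [[1, 1, 1, 0, 0, 0],
     [1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]

def bit_flip_decode(word, max_iters=10):
    """Gallager-style bit flipping: each row of H is one single parity
    check (SPC); repeatedly flip the bit touched by the most failed
    checks until every check is satisfied."""
    w = list(word)
    for _ in range(max_iters):
        # All SPC checks can be evaluated independently (in parallel).
        syndrome = [sum(h * b for h, b in zip(row, w)) % 2 for row in H]
        if not any(syndrome):
            return w                      # every SPC is satisfied
        # Count, for each bit, how many failed checks touch it.
        votes = [sum(s * row[j] for s, row in zip(syndrome, H))
                 for j in range(len(w))]
        w[votes.index(max(votes))] ^= 1   # flip the worst offender
    return w
```

With this matrix, any single flipped bit fails exactly the two checks that share it, so the decoder repairs any one-bit error; soft belief-propagation decoding generalizes the same check/bit message exchange to probabilities.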

LDPC codes were first introduced by Robert G. Gallager in his PhD thesis in 1960, but
due to the computational effort in implementing encoder and decoder and the introduction
of Reed–Solomon codes, they were mostly ignored until recently.

LDPC codes are now used in many recent high-speed communication standards, such
as DVB-S2 (Digital video broadcasting), WiMAX (IEEE 802.16e standard for microwave
communications), High-Speed Wireless LAN (IEEE 802.11n), 10GBase-T
Ethernet (802.3an) and (ITU-T Standard for networking over power lines,
phone lines and coaxial cable). Other LDPC codes are standardized for wireless
communication standards within 3GPP MBMS (see fountain codes).

Turbo codes

Main article: Turbo code

Turbo coding is an iterated soft-decoding scheme that combines two or more relatively
simple convolutional codes and an interleaver to produce a block code that can perform
to within a fraction of a decibel of the Shannon limit. Predating LDPC codes in terms of
practical application, they now provide similar performance.

One of the earliest commercial applications of turbo coding was the CDMA2000 1x (TIA
IS-2000) digital cellular technology developed by Qualcomm and sold by Verizon
Wireless, Sprint, and other carriers. It is also used for the evolution of CDMA2000 1x
specifically for Internet access, 1xEV-DO (TIA IS-856). Like 1x, EV-DO was developed
by Qualcomm, and is sold by Verizon Wireless, Sprint, and other carriers (Verizon's
marketing name for 1xEV-DO is Broadband Access, Sprint's consumer and business
marketing names for 1xEV-DO are Power Vision and Mobile Broadband, respectively).

Local decoding and testing of codes


Sometimes it is only necessary to decode single bits of the message, or to check whether
a given signal is a codeword, and do so without looking at the entire signal. This can make
sense in a streaming setting, where codewords are too large to be classically decoded
fast enough and where only a few bits of the message are of interest for now. Also such
codes have become an important tool in computational complexity theory, e.g., for the
design of probabilistically checkable proofs.

Locally decodable codes are error-correcting codes for which single bits of the message
can be probabilistically recovered by only looking at a small (say constant) number of
positions of a codeword, even after the codeword has been corrupted at some constant
fraction of positions. Locally testable codes are error-correcting codes for which it can be
checked probabilistically whether a signal is close to a codeword by only looking at a
small number of positions of the signal.


Interleaving is frequently used in digital communication and storage systems to improve
the performance of forward error correcting codes. Many communication channels are
not memoryless: errors typically occur in bursts rather than independently. If the number
of errors within a code word exceeds the error-correcting code's capability, it fails to
recover the original code word. Interleaving ameliorates this problem by shuffling source
symbols across several code words, thereby creating a more uniform distribution of
errors.[7] Therefore, interleaving is widely used for burst error-correction.

The analysis of modern iterated codes, like turbo codes and LDPC codes, typically
assumes an independent distribution of errors.[8] Systems using LDPC codes therefore
typically employ additional interleaving across the symbols within a code word.[9]

For turbo codes, an interleaver is an integral component and its proper design is crucial
for good performance.[7][10] The iterative decoding algorithm works best when there are
not short cycles in the factor graph that represents the decoder; the interleaver is chosen
to avoid short cycles.

Interleaver designs include:

 rectangular (or uniform) interleavers (similar to the method using skip factors
described above)

 convolutional interleavers

 random interleavers (where the interleaver is a known random permutation)

 S-random interleavers (where the interleaver is a known random permutation with
the constraint that no input symbols within distance S appear within a distance of
S in the output)[11]

 another possible construction is a contention-free quadratic permutation
polynomial (QPP).[12] It is used for example in the 3GPP Long Term Evolution
mobile telecommunication standard.[13]
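The QPP construction in the last item can be sketched concretely. This is an illustrative Python sketch, not from the original text: the helper names are mine, and the coefficient pair (f1, f2) = (3, 10) for block size K = 40 is assumed from the 3GPP LTE interleaver table.

```python
# Quadratic permutation polynomial (QPP) interleaver sketch.
# pi(i) = (f1*i + f2*i*i) mod K is a permutation of 0..K-1 when
# gcd(f1, K) == 1 and f2 is divisible by every prime factor of K
# (here K = 40 = 2^3 * 5, so f2 must be a multiple of 10).

def qpp_permutation(k, f1, f2):
    """Output position of input symbol i, for i in 0..k-1."""
    return [(f1 * i + f2 * i * i) % k for i in range(k)]

def interleave(symbols, perm):
    """Scatter symbols according to the permutation."""
    out = [None] * len(perm)
    for i, p in enumerate(perm):
        out[p] = symbols[i]
    return out

perm = qpp_permutation(40, 3, 10)
assert sorted(perm) == list(range(40))  # a genuine permutation
```

Because the permutation is given by a closed-form polynomial, each output position can be computed independently, which is what makes the construction contention-free for parallel decoders.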

In multi-carrier communication systems, interleaving across carriers may be employed to
provide frequency diversity, e.g., to mitigate frequency-selective fading or narrowband
interference.

Transmission without interleaving:

Error-free message: aaaabbbbccccddddeeeeffffgggg

Transmission with a burst error: aaaabbbbccc____deeeeffffgggg

Here, each group of the same letter represents a 4-bit codeword from a one-bit error-
correcting code. The codeword cccc is altered in one bit and can be corrected, but the
codeword dddd is altered in three bits, so either it cannot be decoded at all or it might be
decoded incorrectly.

With interleaving:

Error-free code words: aaaabbbbccccddddeeeeffffgggg

Interleaved: abcdefgabcdefgabcdefgabcdefg

Transmission with a burst error: abcdefgabcd____bcdefgabcdefg

Received code words after deinterleaving: aa_abbbbccccdddde_eef_ffg_gg

In each of the codewords aaaa, eeee, ffff, gggg, only one bit is altered, so one-bit error-
correcting code will decode everything correctly.
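The letter example above can be reproduced by a rectangular interleaver that writes code words in as rows and reads symbols out column by column. A minimal Python sketch (the helper names are mine):

```python
# Rectangular interleaver: write the code words in as rows, read the
# symbols out column by column. Deinterleaving is the same operation
# with the dimensions swapped.

def interleave(data, n_words, word_len):
    rows = [data[i * word_len:(i + 1) * word_len] for i in range(n_words)]
    return ''.join(''.join(col) for col in zip(*rows))

codewords = "aaaabbbbccccddddeeeeffffgggg"   # 7 code words of 4 symbols
sent = interleave(codewords, 7, 4)            # "abcdefgabcdefg..."

# A burst wipes out 4 consecutive symbols on the channel ...
received = sent[:11] + "____" + sent[15:]

# ... but after deinterleaving the erasures are spread out, at most one
# per code word, so a one-bit error-correcting code recovers them all.
recovered = interleave(received, 4, 7)        # "aa_abbbbcccc..."
```

Each 4-symbol burst lands in four different code words after deinterleaving, matching the received words shown in the text.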

Transmission without interleaving:

Original transmitted sentence: ThisIsAnExampleOfInterleaving

Received sentence with a burst error: ThisIs______pleOfInterleaving

The term "AnExample" ends up mostly unintelligible and difficult to correct.

With interleaving:

Transmitted sentence: ThisIsAnExampleOfInterleaving...

Error-free transmission: TIEpfeaghsxlIrv.iAaenli.snmOten.

Received sentence with a burst error: TIEpfe______Irv.iAaenli.snmOten.

Received sentence after deinterleaving: T_isI_AnE_amp_eOfInterle_vin_...

No word is completely lost and the missing letters can be recovered with minimal
guesswork.
Disadvantages of interleaving

Use of interleaving techniques increases latency. This is because the entire interleaved
block must be received before the packets can be decoded.[15] Also interleavers hide the
structure of errors; without an interleaver, more advanced decoding algorithms can take
advantage of the error structure and achieve more reliable communication than a simpler
decoder combined with an interleaver.

List of error-correcting codes

Distance Code

2 (single-error detecting) Parity

3 (single-error correcting) Triple modular redundancy

3 (single-error correcting) perfect Hamming such as Hamming(7,4)

4 (SECDED) Extended Hamming

5 (double-error correcting)

6 (double-error correct-/triple error detect)

7 (three-error correcting) perfect binary Golay code

8 (TECFED) extended binary Golay code
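The distance column above maps to capability through the minimum Hamming distance d: a code of distance d can correct ⌊(d−1)/2⌋ errors or detect d−1 errors, and mixed modes such as SECDED and TECFED split the budget between the two. A small Python sketch of this rule (my own helper, not from the text):

```python
# Capability implied by minimum Hamming distance d:
# correct up to t = (d - 1) // 2 errors, or detect up to d - 1 errors.

def capability(d):
    return {"correct": (d - 1) // 2, "detect": d - 1}

assert capability(2) == {"correct": 0, "detect": 1}   # parity
assert capability(3)["correct"] == 1                  # Hamming(7,4), TMR
assert capability(7)["correct"] == 3                  # perfect binary Golay
assert capability(8)["correct"] == 3                  # extended Golay (TECFED)
```

SECDED at d = 4, for example, spends the budget as "correct 1 and simultaneously detect 2" rather than pure detection of 3.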

 AN codes

 BCH code, which can be designed to correct an arbitrary number of errors per
code block.

 Berger code
 Constant-weight code

 Convolutional code

 Expander codes

 Group codes

 Golay codes, of which the Binary Golay code is of practical interest

 Goppa code, used in the McEliece cryptosystem

 Hadamard code

 Hagelbarger code

 Hamming code

 Latin square based code for non-white noise (prevalent for example in broadband
over powerlines)

 Lexicographic code

 Long code

 Low-density parity-check code, also known as Gallager code, as the archetype for
sparse graph codes

 LT code, which is a near-optimal rateless erasure correcting code (Fountain code)

 m of n codes

 Online code, a near-optimal rateless erasure correcting code

 Polar code (coding theory)

 Raptor code, a near-optimal rateless erasure correcting code

 Reed–Solomon error correction

 Reed–Muller code
 Repeat-accumulate code

 Repetition codes, such as Triple modular redundancy

 Spinal code, a rateless, nonlinear code based on pseudo-random hash functions


 Tornado code, a near-optimal erasure correcting code, and the precursor to

Fountain codes

 Turbo code

 Walsh–Hadamard code




ASIC design is based on a flow that uses HDL as the design entry level, which applies
to both Verilog and VHDL. The following describes the flow from specification of the
design up to tapeout, the form sent to the silicon foundry for fabrication.

The steps of the flow are:

1. Specification: This is the first and most important step in designing a chip, as the
features and functionalities of the chip are defined here. The design is considered
at both the macro and micro level, derived from the required features and
functionalities. Speed, size and power consumption are among the considerations
for which accepted ranges of values are specified. Other performance criteria are
also set at this point and their viability deliberated; some form of simulation might
be possible to check this.

2. RTL Coding: The microarchitecture at specification level is then transformed into
RTL code, which marks the beginning of the real design phase towards realising a
chip. As a real chip is expected, the code has to be synthesizable RTL code.

3. Simulation and Testbench: The RTL code and testbench are simulated using HDL
simulators to check the functionality of the design. If Verilog is the language used,
a Verilog simulator is required, while a VHDL simulator is needed for VHDL code.
Some of the tools available at CEDEC include Cadence's Verilog-XL, Synopsys's
VCS, and Mentor Graphics' ModelSim. If the simulation results do not agree with
the intended function, either the testbench file or the RTL code could be the cause.
The design has to be debugged if the RTL code is the source of error. The
simulation is repeated once either one of the two causes, or both, have been
corrected. This process may loop until the RTL code correctly describes the
required logical behaviour of the design.

4. Synthesis: This process is conducted on the RTL code, converting it into logic
gates. The gate-level netlist produced is the functional equivalent of the RTL code
as intended in the design. The synthesis process requires two further input files:
firstly, the "standard cell technology files" and secondly the "constraints file". A
synthesised database of the design is created in the system.

5. Pre-Layout Timing Analysis: When synthesis is completed, the synthesised
database, along with timing information from the synthesis process, is used to
perform Static Timing Analysis (STA). Tweaking (making small changes) has to
be done to correct any timing issues.

6. APR: This is the Automatic Place and Route process, in which the layout is
produced. The synthesised database, together with timing information from
synthesis, is used to place the logic gates. Most designs have critical paths whose
timing requires them to be routed first. The placement and routing process
normally has some degree of flexibility.

7. Back Annotation: This is the process in which RC parasitics are extracted from
the layout and the path delays are calculated from them. Long routing lines can
significantly increase the interconnect delay of a path, and for sub-micron designs
parasitics cause a significant increase in delay. Back annotation is the step that
bridges synthesis and physical layout.
8. Post-Layout Timing Analysis: This step in the ASIC flow allows real timing
violations, such as hold and setup, to be detected. The net interconnect delay
information is fed into the timing analysis; any setup violation is fixed by optimizing
the failing paths, while hold violations are fixed by introducing buffers into the path
to increase the delay. The APR, back annotation and post-layout timing analysis
steps iterate until the design is cleared of all violations. Then it is ready for logic
verification.

9. Logic Verification: This step acts as the final check that the design is functionally
correct with the additional timing information from layout. Changes have to be
made to the RTL code or the post-layout synthesis to correct any logic verification
failures.

10. Tapeout: When the design passes the logic verification check, it is ready for
fabrication. The tapeout design is in the form of a GDSII file, which is accepted by
the foundry.


Logic blocks


Simplified example illustration of a logic cell

The most common FPGA architecture[1] consists of an array of logic blocks (called
configurable logic block, CLB, or logic array block, LAB, depending on vendor), I/O pads,
and routing channels. Generally, all the routing channels have the same width (number
of wires). Multiple I/O pads may fit into the height of one row or the width of one column
in the array.
An application circuit must be mapped into an FPGA with adequate resources. While the
number of CLBs/LABs and I/Os required is easily determined from the design, the number
of routing tracks needed may vary considerably even among designs with the same
amount of logic. For example, a crossbar switch requires much more routing than
a systolic array with the same gate count. Since unused routing tracks increase the cost
(and decrease the performance) of the part without providing any benefit, FPGA
manufacturers try to provide just enough tracks so that most designs that will fit in terms
of lookup tables (LUTs) and I/Os can be routed. This is determined by estimates such as
those derived from Rent's rule or by experiments with existing designs.

In general, a logic block (CLB or LAB) consists of a few logical cells (called ALM, LE, slice
etc.). A typical cell consists of a 4-input LUT, a full adder (FA) and a D-type flip-flop. In
the cell illustrated, the LUT is split into two 3-input LUTs. In normal mode those are
combined into a 4-input LUT through the left mux. In arithmetic mode, their outputs are
fed to the FA. The selection of mode is programmed into the middle multiplexer. The
output can be either synchronous or asynchronous, depending on the programming of
the mux to the right. In practice, the entire FA or parts of it are put as functions into the
LUTs in order to save space.[33][34][35]
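The described cell can be sketched behaviorally (in Python rather than HDL; the function names are mine): a 4-input LUT is equivalent to two 3-input LUTs whose outputs pass through a 2:1 mux controlled by the fourth input.

```python
# Behavioral sketch of the logic-cell structure described above. A LUT
# simply stores its truth table as bits and indexes it with the inputs.

def lut(bits, inputs):
    """Generic LUT: inputs[0] is the least-significant address bit."""
    index = 0
    for bit in reversed(inputs):
        index = (index << 1) | bit
    return (bits >> index) & 1

def lut4_from_two_lut3(truth, a, b, c, d):
    """A 4-input LUT built from two 3-input LUTs plus a mux on d."""
    lo = truth & 0xFF           # truth-table entries where d == 0
    hi = (truth >> 8) & 0xFF    # truth-table entries where d == 1
    return lut(hi, [a, b, c]) if d else lut(lo, [a, b, c])
```

Splitting the table this way is exactly the "normal mode" combination through the left mux: the fourth input selects which half of the 16-entry truth table is consulted.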

Hard blocks

Modern FPGA families expand upon the above capabilities to include higher level
functionality fixed into the silicon. Having these common functions embedded into the
silicon reduces the area required and gives those functions increased speed compared
to building them from primitives. Examples of these include multipliers, generic DSP
blocks, embedded processors, high speed I/O logic and embedded memories.

Higher-end FPGAs can contain high speed multi-gigabit transceivers and hard IP
cores such as processor cores, Ethernet MACs, PCI/PCI Express controllers, and
external memory controllers. These cores exist alongside the programmable fabric, but
they are built out of transistors instead of LUTs so they have ASIC level performance and
power consumption while not consuming a significant amount of fabric resources, leaving
more of the fabric free for the application-specific logic. The multi-gigabit transceivers also
contain high performance analog input and output circuitry along with high-speed
serializers and deserializers, components which cannot be built out of LUTs. Higher-level
PHY layer functionality such as line coding may or may not be implemented alongside
the serializers and deserializers in hard logic, depending on the FPGA.


Most of the circuitry built inside of an FPGA is synchronous circuitry that requires a clock
signal. FPGAs contain dedicated global and regional routing networks for clock and reset
so they can be delivered with minimal skew. Also, FPGAs generally contain
analog PLL and/or DLL components to synthesize new clock frequencies as well as
attenuate jitter. Complex designs can use multiple clocks with different frequency and
phase relationships, each forming separate clock domains. These clock signals can be
generated locally by an oscillator or they can be recovered from a high speed serial data
stream. Care must be taken when building clock domain crossing circuitry to avoid
metastability. FPGAs generally contain block RAMs that are capable of working as dual-
port RAMs with different clocks, aiding the construction of FIFOs and dual-port buffers
that connect differing clock domains.

3D architectures

To shrink the size and power consumption of FPGAs, vendors such as Tabula and Xilinx
have introduced new 3D or stacked architectures.[36][37] Following the introduction of its
28 nm 7-series FPGAs, Xilinx revealed that several of the highest-density parts in those
FPGA product lines will be constructed using multiple dies in one package, employing
technology developed for 3D construction and stacked-die assemblies.
Xilinx's approach stacks several (three or four) active FPGA die side-by-side on a
silicon interposer – a single piece of silicon that carries passive interconnect. [37][38] The
multi-die construction also allows different parts of the FPGA to be created with different
process technologies, as the process requirements are different between the FPGA fabric
itself and the very high speed 28 Gbit/s serial transceivers. An FPGA built in this way is
called a heterogeneous FPGA.[39]
Altera's heterogeneous approach involves using a single monolithic FPGA die and
connecting other die/technologies to the FPGA using Intel's embedded multi-die
interconnect bridge (EMIB) technology.


To define the behavior of the FPGA, the user provides a design in a hardware description
language (HDL) or as a schematic design. The HDL form is more suited to work with large
structures because it's possible to just specify them numerically rather than having to
draw every piece by hand. However, schematic entry can allow for easier visualisation of
a design.

Then, using an electronic design automation tool, a technology-mapped netlist is
generated. The netlist can then be fit to the actual FPGA architecture using a process
called place-and-route, usually performed by the FPGA company's proprietary place-and-
route software. The user will validate the map, place and route results via timing
analysis, simulation, and other verification methodologies. Once the design and
validation process is complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA. This file is transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device such as an
EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to reduce the
complexity of designing in HDLs, which have been compared to the equivalent
of assembly languages, there are moves to raise the abstraction level through the
introduction of alternative languages. National Instruments' LabVIEW graphical
programming language (sometimes referred to as "G") has an FPGA add-in module
available to target and program FPGA hardware.

To simplify the design of complex systems in FPGAs, there exist libraries of predefined
complex functions and circuits that have been tested and optimized to speed up the
design process. These predefined circuits are commonly called IP cores, and are
available from FPGA vendors and third-party IP suppliers (rarely free, and typically
released under proprietary licenses). Other predefined circuits are available from
developer communities such as OpenCores (typically released under free and open
source licenses such as the GPL, BSD or similar license), and other sources.

In a typical design flow, an FPGA application developer will simulate the design at multiple
stages throughout the design process. Initially the RTL description in VHDL or Verilog is
simulated by creating test benches to simulate the system and observe results. Then,
after the synthesis engine has mapped the design to a netlist, the netlist is translated to
a gate level description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA at which point propagation delays
can be added and the simulation run again with these values back-annotated onto the

More recently, OpenCL is being used by programmers to take advantage of the
performance and power efficiencies that FPGAs provide. OpenCL allows programmers
to develop code in the C programming language and target FPGA functions as OpenCL
kernels using OpenCL constructs.



Verilog, standardized as IEEE 1364, is a hardware description language (HDL) used to
model electronic systems. It is most commonly used in the design and verification of
digital circuits at the register-transfer level of abstraction. It is also used in the verification
of analog circuits and mixed-signal circuits, as well as in the design of genetic circuits.


Hardware description languages such as Verilog are similar to software programming
languages because they include ways of describing the propagation time and signal
strengths (sensitivity). There are two types of assignment operators; a blocking
assignment (=), and a non-blocking (<=) assignment. The non-blocking assignment
allows designers to describe a state-machine update without needing to declare and
use temporary storage variables. Since these concepts are part of Verilog's language
semantics, designers could quickly write descriptions of large circuits in a relatively
compact and concise form. At the time of Verilog's introduction (1984), Verilog
represented a tremendous productivity improvement for circuit designers who were
already using graphical schematic capture software and specially written software
programs to document and simulate electronic circuits.

The designers of Verilog wanted a language with syntax similar to the C programming
language, which was already widely used in engineering software development. Like C,
Verilog is case-sensitive and has a basic preprocessor (though less sophisticated than
that of ANSI C/C++). Its control flow keywords (if/else, for, while, case, etc.) are
equivalent, and its operator precedence is compatible with C. Syntactic differences
include: required bit-widths for variable declarations, demarcation of procedural blocks
(Verilog uses begin/end instead of curly braces {}), and many other minor differences.
Verilog requires that variables be given a definite size. In C these sizes are assumed from
the 'type' of the variable (for instance an integer type may be 8 bits).

A Verilog design consists of a hierarchy of modules. Modules encapsulate design
hierarchy, and communicate with other modules through a set of declared input, output,
and bidirectional ports. Internally, a module can contain any combination of the following:
net/variable declarations (wire, reg, integer, etc.), concurrent and sequential statement
blocks, and instances of other modules (sub-hierarchies). Sequential statements are
placed inside a begin/end block and executed in sequential order within the block.
However, the blocks themselves are executed concurrently, making Verilog a dataflow
language.
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,
undefined") and signal strengths (strong, weak, etc.). This system allows abstract
modeling of shared signal lines, where multiple sources drive a common net. When a wire
has multiple drivers, the wire's (readable) value is resolved by a function of the source
drivers and their strengths.

A subset of statements in the Verilog language are synthesizable. Verilog modules that
conform to a synthesizable coding style, known as RTL (register-transfer level), can be
physically realized by synthesis software. Synthesis software algorithmically transforms
the (abstract) Verilog source into a netlist, a logically equivalent description consisting
only of elementary logic primitives (AND, OR, NOT, flip-flops, etc.) that are available in a
specific FPGA or VLSI technology. Further manipulations to the netlist ultimately lead to
a circuit fabrication blueprint (such as a photo mask set for an ASIC or a bitstream file for
an FPGA).



Verilog was one of the first popular hardware description languages to be invented. It
was created by Prabhu Goel, Phil Moorby, Chi-Lai Huang and Douglas Warmke between
late 1983 and early 1984.[2] Chi-Lai Huang had earlier worked on a hardware description
language, LALSD, developed by Professor S.Y.H. Su, for his PhD work.[3] The rights
holder for this language was "Automated Integrated Design Systems" (later renamed to
Gateway Design Automation in 1985), which marketed it as a hardware modeling
language. Gateway Design Automation was purchased by Cadence Design Systems in
1990. Cadence now has full proprietary rights to Gateway's Verilog and to Verilog-XL,
the HDL simulator that would become the de facto standard (of Verilog logic simulators)
for the next decade. Originally, Verilog was intended only to describe and allow
simulation; the automated synthesis of subsets of the language to physically realizable
structures (gates etc.) was developed after the language had achieved widespread
usage.
Verilog is a portmanteau of the words "verification" and "logic".[4]


With the increasing success of VHDL at the time, Cadence decided to make the language
available for open standardization. Cadence transferred Verilog into the public domain
under the Open Verilog International (OVI) (now known as Accellera) organization.
Verilog was later submitted to IEEE and became IEEE Standard 1364-1995, commonly
referred to as Verilog-95.
In the same time frame Cadence initiated the creation of Verilog-A to put standards
support behind its analog simulator Spectre. Verilog-A was never intended to be a
standalone language and is a subset of Verilog-AMS which encompassed Verilog-95.

Verilog 2001

Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that users
had found in the original Verilog standard. These extensions became IEEE Standard
1364-2001 known as Verilog-2001.

Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for (2's
complement) signed nets and variables. Previously, code authors had to perform signed
operations using awkward bit-level manipulations (for example, the carry-out bit of a
simple 8-bit addition required an explicit description of the Boolean algebra to determine
its correct value). The same function under Verilog-2001 can be more succinctly
described by one of the built-in operators: +, -, /, *, >>>. A generate/endgenerate construct
(similar to VHDL's generate/endgenerate) allows Verilog-2001 to control instance and
statement instantiation through normal decision operators (case/if/else). Using
generate/endgenerate, Verilog-2001 can instantiate an array of instances, with control
over the connectivity of the individual instances. File I/O has been improved by several
new system tasks. And finally, a few syntax additions were introduced to improve code
readability (e.g. always @*, named parameter override, C-style function/task/module
header declaration).

Verilog-2001 is the version of Verilog supported by the majority of
commercial EDA software packages.

Verilog 2005

Not to be confused with SystemVerilog, Verilog 2005 (IEEE Standard 1364-2005)
consists of minor corrections, spec clarifications, and a few new language features (such
as the uwire keyword).

A separate part of the Verilog standard, Verilog-AMS, attempts to integrate analog and
mixed signal modeling with traditional Verilog.

SystemVerilog

The advent of hardware verification languages such as OpenVera and Verisity's e
language encouraged the development of Superlog by Co-Design Automation Inc
(acquired by Synopsys). The foundations of Superlog and Vera were donated
to Accellera, and later became the IEEE standard P1800-2005: SystemVerilog.

SystemVerilog is a superset of Verilog-2005, with many new features and capabilities to
aid design verification and design modeling. As of 2009, the SystemVerilog and Verilog
language standards were merged into SystemVerilog 2009 (IEEE Standard 1800-2009).
The current version is IEEE standard 1800-2012.


A simple example of two flip-flops follows:

module toplevel(clock,reset);

input clock;

input reset;

reg flop1;

reg flop2;

always @ (posedge reset or posedge clock)
  if (reset)
    begin
      flop1 <= 0;
      flop2 <= 1;
    end
  else
    begin
      flop1 <= flop2;
      flop2 <= flop1;
    end

endmodule



The "<=" operator in Verilog is another aspect of its being a hardware description
language as opposed to a normal procedural language. This is known as a "non-blocking"
assignment. Its action doesn't register until after the always block has executed. This
means that the order of the assignments is irrelevant and will produce the same result:
flop1 and flop2 will swap values every clock.

The other assignment operator, "=", is referred to as a blocking assignment. When "="
assignment is used, for the purposes of logic, the target variable is updated immediately.
In the above example, had the statements used the "=" blocking operator instead of "<=",
flop1 and flop2 would not have been swapped. Instead, as in traditional programming, the
compiler would understand to simply set flop1 equal to flop2 (and subsequently ignore
the redundant logic to set flop2 equal to flop1).
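The two behaviours can also be modelled outside HDL. In this Python sketch (my own helpers, not part of Verilog), the non-blocking case samples every right-hand side before committing any update, while the blocking case updates immediately:

```python
# Non-blocking (<=): all RHS values are sampled first, then all
# targets are updated together at the end of the time step.
def clock_nonblocking(flop1, flop2):
    next1, next2 = flop2, flop1   # RHS sampled before any update
    return next1, next2

# Blocking (=): each assignment takes effect immediately, so the
# second statement already sees the new value of flop1.
def clock_blocking(flop1, flop2):
    flop1 = flop2
    flop2 = flop1                 # reads the value just written
    return flop1, flop2
```

Starting from (0, 1), the non-blocking model swaps to (1, 0) on every "clock", while the blocking model collapses both flops to the same value (1, 1), matching the behaviour described above.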

An example counter circuit follows:

module Div20x (rst, clk, cet, cep, count, tc);

// TITLE 'Divide-by-20 Counter with enables'

// enable CEP is a clock enable only

// enable CET is a clock enable and

// enables the TC output

// a counter using the Verilog language

parameter size = 5;

parameter length = 20;

input rst; // These inputs/outputs represent

input clk; // connections to the module.

input cet;

input cep;

output [size-1:0] count;

output tc;

reg [size-1:0] count; // Signals assigned

// within an always

// (or initial)block

// must be of type reg

wire tc; // Other signals are of type wire

// The always statement below is a parallel

// execution statement that

// executes any time the signals

// rst or clk transition from low to high

always @ (posedge clk or posedge rst)
  if (rst) // This causes reset of the cntr
    count <= {size{1'b0}};
  else
    if (cet && cep) // Enables both true
      begin
        if (count == length-1)
          count <= {size{1'b0}};
        else
          count <= count + 1'b1;
      end

// the value of tc is continuously assigned
// the value of the expression
assign tc = (cet && (count == length-1));

endmodule
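A behavioural sketch of the same counter in Python (my own model, not part of the design) makes the wrap and tc conditions easy to check:

```python
# Behavioral model of the Div20x counter: count advances only when
# both enables are true, wraps from length-1 back to 0, and tc is
# high when cet is asserted in the final state.

LENGTH = 20

def step(count, rst, cet, cep):
    """One clock edge; returns the new (count, tc) pair."""
    if rst:
        count = 0
    elif cet and cep:
        count = 0 if count == LENGTH - 1 else count + 1
    tc = bool(cet) and count == LENGTH - 1
    return count, tc
```

Note that in the Verilog, tc is a continuous assignment on the current count; here it is reported after the edge, i.e. it reflects the post-clock state.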


An example of delays:


reg a, b, c, d;
wire e;

always @(b or e)
  begin
    a = b & e;
    b = a | b;
    #5 c = b;
    d = #6 c ^ e;
  end


The always clause above illustrates the other type of method of use, i.e. it executes
whenever any of the entities in the list (the b or e) changes. When one of these
changes, a is immediately assigned a new value, and due to the blocking
assignment, b is assigned a new value afterward (taking into account the new value of a).
After a delay of 5 time units, c is assigned the value of b and the value of c ^ e is tucked
away in an invisible store. Then after 6 more time units, d is assigned the value that was
tucked away.

Signals that are driven from within a process (an initial or always block) must be of
type reg. Signals that are driven from outside a process must be of type wire. The
keyword reg does not necessarily imply a hardware register.

Definition of constants

The definition of constants in Verilog supports the addition of a width parameter. The
basic syntax is:

<Width in bits>'<base letter><number>


 12'h123 — Hexadecimal 123 (using 12 bits)

 20'd44 — Decimal 44 (using 20 bits — 0 extension is automatic)

 4'b1010 — Binary 1010 (using 4 bits)

 6'o77 — Octal 77 (using 6 bits)
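The syntax can be illustrated with a small parser sketch in Python (my own helper, not a Verilog facility) that maps the four examples above to (width, value) pairs:

```python
# Parse a Verilog sized constant of the form <width>'<base letter><number>
# into (width_in_bits, integer_value). Only the four base letters from the
# text are handled; extensions like underscores or x/z digits are omitted.

BASES = {'h': 16, 'd': 10, 'b': 2, 'o': 8}

def parse_const(text):
    width, rest = text.split("'")
    return int(width), int(rest[1:], BASES[rest[0]])

assert parse_const("12'h123") == (12, 0x123)   # hexadecimal 123, 12 bits
assert parse_const("20'd44")  == (20, 44)      # decimal 44, 20 bits
assert parse_const("4'b1010") == (4, 10)       # binary 1010, 4 bits
assert parse_const("6'o77")   == (6, 0o77)     # octal 77, 6 bits
```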

Synthesizable constructs

There are several statements in Verilog that have no analog in real hardware, e.g.
$display. Consequently, much of the language cannot be used to describe hardware.
The examples presented here are the classic subset of the language that has a direct
mapping to real gates.

// Mux examples — Three ways to do the same thing.

// The first example uses continuous assignment

wire out;

assign out = sel ? a : b;

// the second example uses a procedure

// to accomplish the same thing.

reg out;
always @(a or b or sel)
  case (sel)
    1'b0: out = b;
    1'b1: out = a;
  endcase

// Finally — you can use if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
  if (sel)
    out = a;
  else
    out = b;

The next interesting structure is a transparent latch; it will pass the input to the output
when the gate signal is set for "pass-through", and captures the input and stores it upon
transition of the gate signal to "hold". The output will remain stable regardless of the input
signal while the gate is set to "hold". In the example below the "pass-through" level of the
gate would be when the value of the if clause is true, i.e. gate = 1. This is read "if gate is
true, the din is fed to latch_out continuously." Once the if clause is false, the last value at
latch_out will remain and is independent of the value of din.

// Transparent latch example

reg latch_out;
always @(gate or din)
  if (gate)
    latch_out = din; // Pass through state

// Note that the else isn't required here. The variable
// latch_out will follow the value of din while gate is
// high. When gate goes low, latch_out will remain constant.

The flip-flop is the next significant template; in Verilog, the D-flop is the simplest, and it
can be modeled as:

reg q;

always @(posedge clk)

q <= d;

The significant thing to notice in the example is the use of the non-blocking assignment.
A basic rule of thumb is to use <= when there is a posedge or negedge statement within
the always clause.
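As a short sketch of why this rule matters (the register names here are illustrative), non-blocking assignments let every flop in a clocked block update from the value its source held before the clock edge, which is what a two-stage pipeline requires:

```verilog
// Two-stage pipeline: with '<=', both registers update from
// pre-edge values, so s2 lags s1 by exactly one clock.
reg s1, s2;
always @(posedge clk)
begin
  s1 <= d;   // with blocking '=' here, s2 would see the NEW s1
  s2 <= s1;  // and the two flops would collapse into one stage
end
```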

A variant of the D-flop is one with an asynchronous reset; there is a convention that the
reset state will be the first if clause within the statement.

reg q;

always @(posedge clk or posedge reset)
  if (reset)
    q <= 0;
  else
    q <= d;

The next variant is including both an asynchronous reset and asynchronous set condition;
again the convention comes into play, i.e. the reset term is followed by the set term.

reg q;

always @(posedge clk or posedge reset or posedge set)
  if (reset)
    q <= 0;
  else if (set)
    q <= 1;
  else
    q <= d;

Note: If this model is used to model a Set/Reset flip flop then simulation errors can result.
Consider the following test sequence of events. 1) reset goes high 2) clk goes high 3) set
goes high 4) clk goes high again 5) reset goes low followed by 6) set going low. Assume
no setup and hold violations.

In this example the always @ statement would first execute when the rising edge of reset
occurs which would place q to a value of 0. The next time the always block executes
would be the rising edge of clk which again would keep q at a value of 0. The always
block then executes when set goes high which because reset is high forces q to remain
at 0. This condition may or may not be correct depending on the actual flip flop. However,
this is not the main problem with this model. Notice that when reset goes low, that set is
still high. In a real flip flop this will cause the output to go to a 1. However, in this model it
will not occur because the always block is triggered by rising edges of set and reset —
not levels. A different approach may be necessary for set/reset flip flops.

The final basic variant is one that implements a D-flop with a mux feeding its input. The
mux has a d-input and feedback from the flop itself. This allows a gated load function.

// Basic structure with an EXPLICIT feedback path

always @(posedge clk)
  if (gate)
    q <= d;
  else
    q <= q; // explicit feedback path

// The more common structure ASSUMES the feedback is present

// This is a safe assumption since this is how the

// hardware compiler will interpret it. This structure

// looks much like a latch. The differences are the

// '''@(posedge clk)''' and the non-blocking '''<='''


always @(posedge clk)
  if (gate)
    q <= d; // the "else" mux is "implied"

Note that there are no "initial" blocks mentioned in this description. There is a split
between FPGA and ASIC synthesis tools on this structure. FPGA tools allow initial blocks
where reg values are established instead of using a "reset" signal. ASIC synthesis tools
don't support such a statement. The reason is that an FPGA's initial state is something
that is downloaded into the memory tables of the FPGA. An ASIC is an actual hardware
implementation.

Initial and always

There are two separate ways of declaring a Verilog process. These are the always and
the initial keywords. The always keyword indicates a free-running process.
The initial keyword indicates a process executes exactly once. Both constructs begin
execution at simulator time 0, and both execute until the end of the block. Once
an always block has reached its end, it is rescheduled (again). It is a common
misconception to believe that an initial block will execute before an always block. In fact,
it is better to think of the initial-block as a special-case of the always-block, one which
terminates after it completes for the first time.



initial
begin
  a = 1; // Assign a value to reg a at time 0
  #1;    // Wait 1 time unit
  b = a; // Assign the value of reg a to reg b
end


always @(a or b) // Any time a or b CHANGE, run the process
begin
  if (a)
    c = b;
  else
    d = ~b;
end // Done with this block, now return to the top (i.e. the @ event-control)

always @(posedge a)// Run whenever reg a has a low to high change

a <= b;

These are the classic uses for these two keywords, but there are two significant additional
uses. The most common of these is an always keyword without the @(...) sensitivity list.
It is possible to use always as shown below:


always
begin // Always begins executing at time 0 and NEVER stops

clk = 0; // Set clk to 0

#1; // Wait for 1 time unit

clk = 1; // Set clk to 1

#1; // Wait 1 time unit

end // Keeps executing — so continue back at the top of the begin

The always keyword acts similarly to the C language construct while(1) {..} in the sense
that it will execute forever.

The other interesting exception is the use of the initial keyword with the addition of
the forever keyword.

The example below is functionally identical to the always example above.

initial forever // Start at time 0 and repeat the begin/end forever
begin
  clk = 0; // Set clk to 0
  #1;      // Wait for 1 time unit
  clk = 1; // Set clk to 1
  #1;      // Wait 1 time unit
end


The fork/join pair are used by Verilog to create parallel processes. All statements (or
blocks) between a fork/join pair begin execution simultaneously upon execution flow
hitting the fork. Execution continues after the join upon completion of the longest running
statement or block between the fork and join.



initial
  fork
    $write("A"); // Print Char A
    $write("B"); // Print Char B
    begin
      #1;          // Wait 1 time unit
      $write("C"); // Print Char C
    end
  join



The way the above is written, it is possible to have either the sequences "ABC" or "BAC"
print out. The order of simulation between the first $write and the second $write depends
on the simulator implementation, and may purposefully be randomized by the simulator.
This allows the simulation to contain both accidental race conditions as well as intentional
non-deterministic behavior.

Notice that VHDL cannot dynamically spawn multiple processes like Verilog.[5]

Race conditions

The order of execution isn't always guaranteed within Verilog. This can best be illustrated
by a classic example. Consider the code snippet below:


initial
  a = 0;

initial
  b = a;

initial
begin
  #1; // wait for the assignment blocks above to complete
  $display("Value a=%d Value of b=%d", a, b);
end


What will be printed out for the values of a and b? Depending on the order of execution
of the initial blocks, it could be zero and zero, or alternately zero and some other arbitrary
uninitialized value. The $display statement will always execute after both assignment
blocks have completed, due to the #1 delay.
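One common way to remove such a race (shown here as a hedged sketch, not as the only fix) is to keep order-dependent assignments in a single initial block, so that their execution order is defined by the language rather than by the simulator's scheduling:

```verilog
// Order within one sequential begin/end block is guaranteed.
initial
begin
  a = 0;
  b = a;  // b is guaranteed to see the updated value of a
  #1;
  $display("Value a=%d Value of b=%d", a, b);
end
```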


Note: These operators are not shown in order of precedence.

Operator type   Operator symbols   Operation performed

Bitwise         ~                  Bitwise NOT (1's complement)
                &                  Bitwise AND
                |                  Bitwise OR
                ^                  Bitwise XOR
                ~^ or ^~           Bitwise XNOR

Logical         &&                 Logical AND
                ||                 Logical OR

Reduction       &                  Reduction AND
                ~&                 Reduction NAND
                |                  Reduction OR
                ~|                 Reduction NOR
                ^                  Reduction XOR
                ~^ or ^~           Reduction XNOR

Arithmetic      +                  Addition
                -                  Subtraction
                -                  2's complement (unary)
                *                  Multiplication
                /                  Division
                **                 Exponentiation (*Verilog-2001)

Relational      >                  Greater than
                <                  Less than
                >=                 Greater than or equal to
                <=                 Less than or equal to
                ==                 Logical equality (bit-value 1'bX is removed from comparison)
                !=                 Logical inequality (bit-value 1'bX is removed from comparison)
                ===                4-state logical equality (bit-value 1'bX is taken as literal)
                !==                4-state logical inequality (bit-value 1'bX is taken as literal)

Shift           >>                 Logical right shift
                <<                 Logical left shift
                >>>                Arithmetic right shift (*Verilog-2001)
                <<<                Arithmetic left shift (*Verilog-2001)

Concatenation   {, }               Concatenation
Replication     {n{m}}             Replicate value m for n times
Conditional     ?:                 Conditional
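A few of these operator families can be seen together in a small sketch (the wire names are illustrative):

```verilog
wire [3:0] v = 4'b1010;
wire all_ones = &v;            // reduction AND  -> 1'b0 (not all bits set)
wire any_one  = |v;            // reduction OR   -> 1'b1 (some bit set)
wire parity   = ^v;            // reduction XOR  -> 1'b0 (even number of 1s)
wire [7:0] cat = {v, 4'b0011}; // concatenation  -> 8'b10100011
wire [7:0] rep = {2{v}};       // replication    -> 8'b10101010
wire [3:0] sel_out = parity ? v : ~v;  // conditional operator
```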



Belief propagation, also known as the sum-product algorithm, is a message-passing
algorithm for performing inference on graphical models, such as Bayesian networks and
Markov random fields. It calculates the marginal distribution for each unobserved node,
conditional on any observed nodes. Belief propagation is used in artificial intelligence and
information theory and has demonstrated empirical success in numerous applications,
including low-density parity-check codes and turbo codes.

(a). Belief Propagation Algorithm
We assume BPSK modulation, which maps a codeword c = (c1, c2, ..., cN) into a
transmitted sequence s = (s1, s2, ..., sN). Then s is transmitted over a channel corrupted
by additive white Gaussian noise (AWGN) [5]. The sum-product algorithm takes as inputs
the a priori log-likelihood ratios (LLRs) of the received bits, the parity check matrix H and
the maximum number of allowed iterations Imax, and consists of the following steps.

Initialise: set Qij = λj; this initialises the check nodes with the a priori message
probabilities.

Update check messages: for each check node i, and for every bit node j associated with
it, compute the check-to-bit message.

Update bit messages: for each bit node j, and for every check node i associated with it,
compute the bit-to-check message.

Test for a valid codeword: make a tentative decision on the codeword. If the number of
iterations has reached Imax or a valid codeword has been found, then finish.

Modified sum product algorithm: the decoder is implemented using the modified sum-
product algorithm, an approximation of the normal SPA that is easier to implement in
hardware [6]. When the encoder output is transmitted through the AWGN channel, the
channel output is given to the decoder's variable nodes. Let M(n) denote the set of check
nodes connected to symbol node n and N(m) the set of symbol nodes participating in the
m-th parity-check equation.

Step 1 Initialization: assuming the AWGN channel with noise variance σ², the reliability
value is Lc = 2/σ².


The initialization is done in every position (m, n) of the parity check matrix H where
Hm,n = 1.

Step 2 Iterative process: update the check-node LLR, for each m and for each n ∈ N(m).
Note that both the tanh and tanh⁻¹ functions are increasing and have odd symmetry;
thus, the sign and the magnitude of the incoming messages can be processed separately
in a simplified version of the update.

Step 3 Variable node update: update the variable-node LLR, for each n and for each
m ∈ M(n).

Step 4 Decision process: if λn(un) ≥ 0, decide un = 0; otherwise decide un = 1. Then
compute the syndrome: if uHᵀ = 0, the codeword u is the final codeword; otherwise the
iterations continue until a valid codeword is obtained or the iteration limit is reached.
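The simplified sign/magnitude form of the check-node update (the min-sum approximation of Step 2) maps naturally onto hardware: the output sign is the XOR of the incoming signs and the output magnitude is the minimum of the incoming magnitudes. A hedged sketch for a degree-3 check node follows; the module name, port names, and 8-bit LLR width are illustrative assumptions, not the paper's actual implementation:

```verilog
// Degree-3 check node under the min-sum approximation:
// the message to one neighbour is computed from the other two.
module checknode_minsum (
    input  signed [7:0] llr_a,   // incoming message from neighbour a
    input  signed [7:0] llr_b,   // incoming message from neighbour b
    output signed [7:0] llr_out  // outgoing message to the third neighbour
);
  // Magnitudes (two's-complement negate; widths illustrative only)
  wire [7:0] mag_a   = llr_a[7] ? -llr_a : llr_a;
  wire [7:0] mag_b   = llr_b[7] ? -llr_b : llr_b;
  wire [7:0] mag_min = (mag_a < mag_b) ? mag_a : mag_b;
  // Sign of the product of the incoming messages
  wire       sgn     = llr_a[7] ^ llr_b[7];
  assign llr_out = sgn ? -mag_min : mag_min;
endmodule
```

For higher-degree check nodes the same pattern extends by XOR-reducing all incoming signs and taking the minimum over all incoming magnitudes, excluding the message from the target node itself.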




Low density parity check (LDPC) codes are error correcting codes used on noisy
communication channels to reduce the probability of loss of information. LDPC codes are
capacity-approaching codes, which means that practical constructions exist that allow the
noise threshold to be set very close to the theoretical maximum (the Shannon limit) for a
symmetric memoryless channel. The noise threshold defines an upper bound on the
channel noise up to which the probability of lost information can be reduced. LDPC codes
are also known as Gallager codes because they were proposed by R. G. Gallager in
1962 [1]. With the increased capacity of computers and the development of relevant
theories such as the belief propagation algorithm, LDPC codes were rediscovered by
MacKay and Neal in 1996 [2]. LDPC codes are linear block codes defined by a sparse
M × N parity check matrix H, where N is the LDPC code length [3]. The Tanner graph has
been introduced to represent LDPC codes [4]. Tanner graphs are bipartite graphs with
two types of nodes: variable nodes (v-nodes) and check nodes (c-nodes). The n
coordinates of the codewords are associated with the n message nodes. The codewords
are those vectors such that, for all check nodes, the sum of the neighbouring positions
among the message nodes is zero.
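The "sum of neighbouring positions is zero" condition is simply an XOR per check node, so a codeword test can be written combinationally. This sketch assumes a small hypothetical H (3 checks over 6 bits) chosen only for illustration; it is not the (10, 5) code used in the paper:

```verilog
// Syndrome check for an assumed 3x6 parity-check matrix H:
// each wire XORs the codeword bits in one row of H.
module syndrome_check (
    input  [5:0] c,     // candidate codeword bits
    output       valid  // high when H * c^T = 0 (all checks satisfied)
);
  wire s0 = c[0] ^ c[1] ^ c[3];  // row 1 of the assumed H
  wire s1 = c[1] ^ c[2] ^ c[4];  // row 2
  wire s2 = c[0] ^ c[2] ^ c[5];  // row 3
  assign valid = ~(s0 | s1 | s2);
endmodule
```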

Here we consider H to be the parity check matrix of an irregular (10, 5) LDPC code; its
Tanner graph is shown in Fig. 1, with the path (c1 → v8 → c3 → v10 → p1) marked with
black bold lines. In recent studies the decoding is done by various algorithms, and
different types of decoders have been designed, such as partially parallel decoders and
memory-efficient decoders. Among these decoding schemes, belief propagation leads to
a good approximate decoder; belief propagation decoding of LDPC codes on a
memoryless channel is the best practical decoding algorithm.

Encoder Block Diagram:

During the encoding of a frame, the input data bits (D) are repeated and distributed to a
set of constituent encoders. The constituent encoders are typically accumulators, and
each accumulator is used to generate a parity symbol. A single copy of the original data
(S0..SK-1) is transmitted with the parity bits (P) to make up the code symbols. The S bits
from each constituent encoder are discarded.

The parity bit may be used within another constituent code.

In an example using the DVB-S2 rate 2/3 code the encoded block size is 64800 symbols
(N=64800) with 43200 data bits (K=43200) and 21600 parity bits (M=21600). Each
constituent code (check node) encodes 16 data bits except for the first parity bit which
encodes 8 data bits. The first 4680 data bits are repeated 13 times (used in 13 parity
codes), while the remaining data bits are used in 3 parity codes (irregular LDPC code).

For comparison, classic turbo codes typically use two constituent codes configured in
parallel, each of which encodes the entire input block (K) of data bits. These constituent
encoders are recursive systematic convolutional (RSC) codes of moderate depth (8 or 16
states) that are separated by a code interleaver which interleaves one copy of the frame.

The LDPC code, in contrast, uses many low depth constituent codes (accumulators) in
parallel, each of which encode only a small portion of the input frame. The many
constituent codes can be viewed as many low depth (2 state) 'convolutional codes' that
are connected via the repeat and distribute operations. The repeat and distribute
operations perform the function of the interleaver in the turbo code.

The ability to more precisely manage the connections of the various constituent codes
and the level of redundancy for each input bit give more flexibility in the design of LDPC
codes, which can lead to better performance than turbo codes in some instances. Turbo
codes still seem to perform better than LDPCs at low code rates, or at least the design of
well performing low rate codes is easier for Turbo Codes.

As a practical matter, the hardware that forms the accumulators is reused during the
encoding process. That is, once a first set of parity bits are generated and the parity bits
stored, the same accumulator hardware is used to generate a next set of parity bits.
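Each constituent accumulator described above reduces to a single XOR-accumulating flip-flop, which is why the hardware is so easy to reuse across parity sets. This is a hedged sketch; the control signal names (clear, d_valid) are assumptions for illustration:

```verilog
// One constituent accumulator: XOR-accumulates the data bits
// distributed to it; the final register value is the parity bit P.
reg parity;
always @(posedge clk)
  if (clear)         // start of a new parity computation (reused hardware)
    parity <= 1'b0;
  else if (d_valid)  // accumulate only when a distributed bit arrives
    parity <= parity ^ d;
```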


As with other codes, the maximum likelihood decoding of an LDPC code on the binary
symmetric channel is an NP-complete problem. Performing optimal decoding for an
NP-complete code of any useful size is not practical.

However, sub-optimal techniques based on iterative belief propagation decoding give

excellent results and can be practically implemented. The sub-optimal decoding
techniques view each parity check that makes up the LDPC as an independent single
parity check (SPC) code. Each SPC code is decoded separately using soft-in-soft-out
(SISO) techniques such as SOVA, BCJR, MAP, and other derivatives thereof. The soft
decision information from each SISO decoding is cross-checked and updated with other
redundant SPC decodings of the same information bit. Each SPC code is then decoded
again using the updated soft decision information. This process is iterated until a valid
code word is achieved or decoding is exhausted. This type of decoding is often referred
to as sum-product decoding.

The decoding of the SPC codes is often referred to as the "check node" processing, and
the cross-checking of the variables is often referred to as the "variable-node" processing.

In a practical LDPC decoder implementation, sets of SPC codes are decoded in parallel
to increase throughput.

In contrast, belief propagation on the binary erasure channel is particularly simple where
it consists of iterative constraint satisfaction.

For example, consider that the valid codeword, 101011, from the example above, is
transmitted across a binary erasure channel and received with the first and fourth bit
erased to yield ?01?11. Since the transmitted message must have satisfied the code
constraints, the message can be represented by writing the received message on the top
of the factor graph.

In this example, the first bit cannot yet be recovered, because all of the constraints
connected to it have more than one unknown bit. In order to proceed with decoding the
message, constraints connecting to only one of the erased bits must be identified. In this
example, only the second constraint suffices. Examining the second constraint, the fourth
bit must have been zero, since only a zero in that position would satisfy the constraint.

This procedure is then iterated. The new value for the fourth bit can now be used in
conjunction with the first constraint to recover the first bit: the first bit must be a one to
satisfy the leftmost constraint.
Thus, the message can be decoded iteratively. For other channel models, the messages
passed between the variable nodes and check nodes are real numbers, which express
probabilities and likelihoods of belief.

This result can be validated by multiplying the corrected code word r by the parity-check
matrix H:

Code construction

For large block sizes, LDPC codes are commonly constructed by first studying the
behaviour of decoders. As the block size tends to infinity, LDPC decoders can be shown
to have a noise threshold below which decoding is reliably achieved, and above which
decoding is not achieved,[17] colloquially referred to as the cliff effect. This threshold
can be optimised by finding the best proportion of arcs from check nodes and arcs from
variable nodes. An approximate graphical approach to visualising this threshold is an
EXIT chart. The construction of a specific LDPC code after this optimization falls into
two main types of techniques:

 Pseudorandom approaches
 Combinatorial approaches

Construction by a pseudo-random approach builds on theoretical results that, for large

block size, a random construction gives good decoding performance.[7] In general,
pseudorandom codes have complex encoders, but pseudorandom codes with the best
decoders can have simple encoders.[18] Various constraints are often applied to help
ensure that the desired properties expected at the theoretical limit of infinite block size
occur at a finite block size.


(a). Simulation results: Simulation is done to check the correctness of the code. In this
paper the Verilog coding technique is used, and the code is checked for errors such as
syntax errors, logical errors, etc. Simulation is done using ModelSim. Decoding is done
using Verilog in Xilinx, and the simulation results are given below.


(b). Synthesis results: Synthesis is done using Mentor Graphics Leonardo Spectrum, and
the technology used is Virtex IV. Synthesis reports results such as chip area and delay:
chip area is measured by the number of look-up tables (LUTs), and the delay figure gives
the propagation delay. The table shows the results obtained during synthesis of the code
implementing the LDPC decoder.


The decoder for the LDPC codes is implemented using a bipartite graph. The code is
implemented in Verilog using Xilinx, and simulation is done using ModelSim. The modified
sum-product algorithm was found to be effective for decoding. We observed that high-
throughput LDPC decoding architectures should exploit the benefit of parallel decoding
algorithms.

[1]. Robert G. Gallager, “Low Density Parity Check Codes”, IRE Trans. Inf. Theory, Vol.
IT-8, No.1, pp. 21–28, Jan 1962.

[2]. MacKay, D.J.C., “Good error-correcting codes based on very sparse matrices”,
IEEE Trans. Inform. Theory, Vol. 45, No. 3, pp. 399–431, March 1999.

[3]. Lei Yang, Hui Liu, and C.-J. Richard Shi, “Code Construction and FPGA
Implementation of a Low-Error-Floor Multi-Rate Low-Density Parity-Check Code
Decoder”, Department of Electrical Engineering, University of Washington, Seattle, WA.

[4]. D. J. C. Mackay, S. T. Wilson and M. C. Davey, “Comparison of construction of

irregular Gallager codes”, IEEE Transactions on Communications, Vol. 47, pp. 1449-
1454, Oct. 1999.

[5]. Jinghu Chen, and Marc P. C. Fossorier, Senior Member, “Near Optimum Universal
Belief Propagation Based Decoding of LowDensity Parity Check Codes”, IEEE
Transactions on Communications, Vol. 50, No. 3, March 2002.
[6]. S. Papaharalabos, P. Sweeney, B.G. Evans, P.T. Mathiopoulos, G. Albertazzi, A.
Vanelli-Coralli and G.E. Corazza, “Modified sum-product algorithms for decoding low-
density parity-check codes”, IET Communications, 2007, Vol. 1, No. 3, pp. 294–300.
[7]. Guido Masera, Federico Quaglio, and Fabrizio Vacca, “Implementation of a Flexible
LDPC Decoder”, IEEE Transactions on circuits and systems, Vol. 54, No. 6, June 2007.

[8]. T. Richardson, “Error floors of LDPC codes”, in Proc. Annual Allerton Conference on
Communication, Control, and Computing, Monticello, IL, pp. 1426-1435, 2003

[9]. Tuan Ta, “A Tutorial on Low-Density Parity-Check Codes”, The University of Texas
at Austin.

[10]. Edward Liao, Engling Yeo, Borivoje Nikolic, “Low-Density Parity-Check Code
Constructions for Hardware Implementation”, IEEE Communications Society, 2004.

[11]. Jin Sha, Minglun Gao, Zhongjin Zhang, Li Li, Zhongfeng Wang, “High-Throughput
and Memory Efficient LDPC Decoder Architecture”, Proceedings of the 5th WSEAS Int.
Conf. on Instrumentation, Measurement, Circuits and Systems, Hangzhou, China, pp.
224-229, April 2006.