Documente Academic
Documente Profesional
Documente Cultură
Abstract
This paper presents a Viterbi decoder (VD) architecture for a programmable data transmission system, implemented using a Field
Programmable Gate Array (FPGA) device. This VD has been conceived as a building block of a software defined radio (SDR) mobile
transceiver, reconfigurable on request and capable to provide agility in choosing between different standards. UMTS and GPRS Viterbi
decoding is achieved by choosing different coding rates and constraint lengths, and the possibility to switch, at run time, between them
guarantees a high degree of programmability. The architecture has been tested and verified with a Xilinx XC2V2000 FPGA for providing
a generalized co-simulation/co-design testbed. The results show that this decoder can sustain an uncoded data rate of about 2 Mbps, with
an area occupancy of 46%, due to the efficient resources reuse.
r 2007 Elsevier B.V. All rights reserved.
Keywords: Viterbi decoder; Software defined radio; Field Programmable Gate Array (FPGA); Configurable and programmable architecture
0167-9260/$ - see front matter r 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.vlsi.2007.04.001
ARTICLE IN PRESS
162 L. Bissi et al. / INTEGRATION, the VLSI journal 41 (2008) 161–170
can be a viable solution for fast switching between different operating frequency required by the UMTS standard [11]
coding rates when Channel Status Information is available, is of primary concern.
and a simpler code can be used in place of a stronger code. Moreover, three-bit soft decision has been adopted as
Adaptive channel coding systems have recently been quantized input: this choice represents a good trade-off
introduced as means of efficiently allocating power and between complexity and accuracy [12].
coding rate [6]. The architecture of the decoder itself, thus, The architecture of a Viterbi decoder usually consists of
must accommodate for such programmability/reprogram- three units, mainly:
mability, either at the algorithmic or hardware logic level.
In general, both requirements can easily be satisfied when 1. Branch Metric Unit (BMU), which generates the branch
an Field Programmable Gate Array (FPGA) device is used metrics (BMs) and measures the distance between the
for this purpose [5–8]. received symbol and the symbol associated with the
Moreover, reprogrammability also represents a key transition among trellis states;
aspect of a software defined radio (SDR) terminal [9]. 2. Add Compare Select Unit (ACSU), which finds the
Such a terminal must have an interface with several survivor path for each state; the BM of each transition is
communication standards, and its decoding capabilities added to its partial path metric (PM) and the path with
are in large part accomplished by means of software repro- the smallest PM is selected as the survivor path. The
grammable devices [10]. The channel decoding of convolu- ACSU also provides a decision bit, which indicates that
tional codes can be carried out by a programmable the survivor path is either the lower or the upper path to
architecture VD, which is matched to a particular class of the current state;
standards and specifications. 3. Survivor Memory Unit (SMU), which stores the
In this paper, we present an architecture for a standard- survivor paths.
agile, programmable VD that is able to switch between
UMTS and GPRS decoding. In our approach, the The proposed VD is based on the register-exchange (RE)
reconfiguration is not a critical point because we are using method [13], which uses a register for each one of the 2(K1)
the device to implement the decoder only. But in a ‘‘real’’ states. The register records the decoded output sequence
scenario, where the FPGA houses the whole transceiver associated with the path leading from the initial to the final
digital processing sub-system (such as in software radio state. The RE approach does not require a trace-back and,
applications), the possibility to reuse the building blocks therefore, it allows for a fast decoding, in comparison with
for different standards becomes very important in order to the trace-back (TB) approach [2,4,5,14].
minimize efforts during the design process. Therefore, we It is well known that choosing a survivor path length of
propose an architecture that can be shared between two five times the code constraint length results in a negligible
different standards. On the other hand, the programmabi- degradation from the ideal decoder performance [12]. Since
lity of the system can be useful for a particular application, in our decoder the convolutional code has K ¼ 9, a survivor
in which the coding rate is modified at run time during the path length equal or larger than 45 is required. Nevertheless,
transmission. The proposed VD can be programmed to we have chosen a length equal to 52 in order to exploit all
deal with convolutional codes of constraint length 5 and 9 the available memory, without wasting resources.
and code rate 1/2 and 1/3. In addition, thanks to the The core of the proposed VD architecture for constraint
reprogrammability of the FPGA, it is possible to imple- length equal to 9 is shown in Fig. 1. The block diagram
ment new features in the VD (e.g. new functionalities or includes the system that provides control signals for the
different standards). We begin with presenting the adopted synchronization of the main blocks.
algorithmic architecture of the VD with a constraint length For this architecture, the detailed interconnection among
of 9 and a code rate programmable between 1/3 and 1/2, main blocks is depicted in Fig. 2.
then we describe the modifications needed to manage a A large part of the whole system logic is included in the
constraint length of 5. After that, we show some features of Node block, which is used to evaluate the survivor path
the adopted convolutional codes and the obtained results, metric of the states. The outputs of this block are: the state
in terms of implementation complexity of the decoder. PM and the decoded output sequence. There are N ¼ 16
Eventually, we compare the BER performance of our Node blocks (Node_0, y, Node_15), consecutively num-
decoder with that of a behavioral model implemented in bered and following the order of the states of the trellis, i.e.
Matlab. the generic Node_k block (k ¼ 0, y, 15) analyzes the states
labeled with 16*k+i (i ¼ 0, y, 15); each block can manage
2. Architecture of the programmable VD 16 different states for every input symbol. The Node block
(Fig. 3) is composed of the following sub-blocks:
The complexity of the VD exponentially increases with
the constraint length K. For increasing values of K, larger a ROM,
hardware resources are required, both in terms of proces- two BMUs,
sing power and memory. In this paper, the reuse of an ACSU,
resources for reducing the area occupancy and the a Control Unit.
ARTICLE IN PRESS
L. Bissi et al. / INTEGRATION, the VLSI journal 41 (2008) 161–170 163
The ROM contains the branch output values of the The decoded output sequence for each state is the bit
convolutional code. ROM addressing is provided by sequence of the path, converging to that state, with the
the Counter block, which counts the number of times that lowest PM, which has been left-shifted in order to append
the Node block processes the input symbol. A single-bit the bit associated to the path itself. The Control Unit of
input (code_rate) allows for choosing between 1/3 and 1/2 the Node block manages the correct scanning of the trellis,
code rates, and it is the most significant bit of the ROM and provides the enable signals (e_u_BMU, e_l_BMU,
address. The branch outputs and the input symbols are e_ACSU) to the selected block, for power-saving purposes.
routed to the BMUs that compute the BMs of the upper Moreover, in order to set up the number of trellis levels, the
and lower input branches (o_u_b, o_l_b), for each state. length of the encoded sequence is required as input.
Since the code rate can be as low as 1/3, the decoder can Fig. 4 shows in detail the connection between Node and
process three input symbols at the same time. Each BMU RAM blocks. The outputs of the 16 Node blocks (seq_out,
consists of three sub-blocks, used to calculate the distance metric_out) are stored in 8 RAM blocks.
between each input symbol and the relevant output branch The Node_2j and Node_(2j+1) blocks (j ¼ 0, y, 7)
bits. BMUs output is generated by a two-step process in outputs are stored in the RAM_2j_(2j+1) block. The two
order to increase the computational speed, and a double RAM outputs are forwarded to the Node_j and No-
clock frequency is required. de_(j+8) blocks, which analyze the states labeled with
The ACSU inputs are represented by the BMs of the j*16+i and j*16+2(92)+i (i ¼ 0, y, 15). This architec-
branches entering the state (BM_u_b, BM_l_b), the PMs, ture follows the butterfly structure of the trellis reported in
and the decoded output sequences from the previous states Fig. 5, in order to minimize the number of connections in
(upper_node, lower_node). This block computes the PM of between. By adopting this strategy, the number of
the two paths entering the state, and the survivor path with connections among the Node and RAM blocks is reduced.
lower PM is eventually chosen. The ACSU outputs Fig. 6 shows an example where the RAM_0_1 block
(node_out bus) represent the node PM (metric_out) and stores the seq_out and the metric_out outputs of the trellis
the decoded output sequence (seq_out and data_out). states, and the RAM_0_1 outputs are fed back to the
ARTICLE IN PRESS
164 L. Bissi et al. / INTEGRATION, the VLSI journal 41 (2008) 161–170
Fig. 2. Details of the interconnections among the main blocks of the proposed Viterbi decoder.
Fig. 6. Equivalence between the trellis diagram and the Node and RAM
blocks connection.
Table 1
Resources used in the FPGA device (percentile figures are referred to the
whole device area)
Logic utilization
Number slice registers 3490 (16%) 3501 (16%)
No. of 4 input LUTs 7269 (33%) 8170 (38%)
Logic distribution
No. of SLICEs 4732 (44%) 4966 (46%)
No. of 4 input LUTs 8014 (37%) 8881 (41%)
No. of bonded IOBs 38 (8%) 39 (8%)
No. of DCMs 1 (12%) 1 (12%)
No. of block RAMs 33 (59%) 33 (59%)
Total equivalent gate count 2,253,595 2,256,522
Maximum clock frequency (MHz) 32.256 32.256
Maximum decoded bit rate (Mbps) 2.016 2.016
5. Conclusion
Table 2
Summary of the performance of VD architecures (Label ‘‘–’’ means the information is not available)
Guo et al. [2,4] Tessier et al. [5] Chadha and Han et al. [18] Proposed architecture
Cavallaro [7]
Area occupancy 2793/5120 slices 1469/12288 slices 89407/888439 gates 8090/10752 slices 4966/10752 slices
(54%) (12%) (10%) (75%) (46%)
Throughput 78 kbps 240.3 kbps 19.7 Mbps – 2.016 Mbps
Programmability/overhead No No Yes/2.9% No Yes/2%
Constraint length (K) and K¼9 K¼9 K ¼ 3–7 K¼7 K ¼ 5–9
code rate (R) R ¼ 1/2 — R ¼ 1/2–1/3 R ¼ 1/2 R ¼ 1/2–1/3
Operating clock (MHz) 40 7.6 – 13.3 32.3
Coding gain at a BER of 5 3.1–3.7 – – 3.5
105 (dB)
References
[1] G.D. Forney, The Viterbi algorithm, Proc. IEEE 61 (3) (1973)
268–278.
[2] M. Guo, M. Omair Ahmad, M. Swamy, C. Wang, A low-power
systolic array-based adaptive Viterbi decoder and its FPGA
implementation, in: Proceedings of the ISCAS’03, vol. 2, 25–28
May 2003, pp. 276–279.
[3] X. Qin, M. Zhu, Z. Wei, D. Chao, An adaptive Viterbi decoder based
on FPGA dynamic reconfiguration technology, in: Proceedings of
ICFPT’04, 2004, pp. 315–318.
[4] M. Guo, M. Ahmad, M. Swamy, C. Wang, FPGA design and
implementation of a low-power systolic array-based adaptive Viterbi
decoder, IEEE Trans. Circuits Syst. I 52 (2005) 350–365.
[5] R. Tessier, S. Swaminathan, R. Ramaswamy, D. Goeckel, W.
Burleson, A reconfigurable, power-efficient adaptive Viterbi decoder,
IEEE Trans. VLSI Syst. 13 (2005) 484–488.
[6] E.-A. Choi, J.-W. Jung, N.-S. Kim, FPGA realization of adaptive
coding rate trellis-coded 8PSK system, in: Proceedings of
PIMRC2003 1, vol. 1, 7–10 September 2003, pp. 702–706.
Fig. 12. Bit error rate (BER) vs. Eb/N0 for the UMTS (G0 ¼ 557OCT, [7] K. Chadha, J. Cavallaro, A reconfigurable Viterbi decoder archi-
G1 ¼ 663OCT, G2 ¼ 771OCT) standard. tecture, in: Proceedings of 35th Asilomar Conference on Signals,
Systems and Computers, vol. 1, 4–7 November 2001, pp. 66–71.
[8] Z. Cheng, S. Khawam, T. Arslan, Domain specific reconfigurable
fabric targeting Viterbi algorithm, in: Proceedings of ICFPT’04, 2004,
pp. 363–366.
[9] J. Mitola, Technical challenges in the globalization of software radio,
IEEE Personal Commun. 6 (1999) 84–89.
[10] M. Cummings, S. Haruyama, FPGA in the software radio, IEEE
Commun. Mag. 37 (1999) 108–112.
[11] 3GPP TS 25.212, Technical Specification Group Radio Access
Network; Multiplexing and channel coding (FDD), /http://www.
3gpp.orgS.
[12] J.G. Proakis, Digital Communications, third ed, McGraw-Hill,
New York, 1995, pp. 506–511.
[13] D.A.F. El-Dib, M.I. Elmasry, Low-power register-exchange Viterbi
decoder for high-speed wireless communications, in: Proceedings of
the 2002 IEEE International Symposium on Circuits and Systems
(ISCAS 2002), May 2002, pp. 737–740.
[14] T.K. Truong, M.-T. Shih, I.S. Reed, E.H. Satorius, A VLSI design
for a trace-back Viterbi decoder, IEEE Trans. Commun. 40 (3) (1992)
616–624.
[15] 3GPP TS 03.64 V8.12.0 (2004-04) 3rd Generation Partnership
Project; Technical Specification Group GSM/EDGE Radio Access
Network; General Packet Radio Service (GPRS); Overall description
Fig. 13. Bit error rate (BER) vs. Eb/N0 for the GPRS (G0 ¼ 23OCT, of the GPRS radio interface; Stage 2 (Release 1999), /http://www.
G1 ¼ 33OCT) standard. 3gpp.org/ftp/Specs/html-info/0364.htmS.
ARTICLE IN PRESS
170 L. Bissi et al. / INTEGRATION, the VLSI journal 41 (2008) 161–170
[16] Xilinx, VirtexTM-II Platform FPGAs: Detailed Description, DS031-2 Giuseppe Baruffa was born in Perugia, Italy, in
(v3.1) Product Specification, October 2003, /http://www.xilinx.comS. 1970. He received his Laurea degree in Electronic
[17] XtremeDSP Development Kit User Guide NT107-0132, Issue 9, Engineering from University of Perugia, Italy, in
/http://www.nallatech.comS. 1996, and his Ph.D. degree in Telecommunica-
[18] J.-S. Han, T.-J. Kim, C. Lee, High performance Viterbi decoder using tions also from the University of Perugia in 2001.
modified register exchange methods, in: Proceedings of the 2004 Since 2005, he has been an Assistant Professor in
International Symposium on Circuits and Systems (ISCAS’04), vol. 3, Electronic Engineering at the Department of
23–26 May 2004, pp. III—553–556. Electronic and Information Engineering (DIEI),
[19] M.A. Bickerstaff, D. Garrett, T. Prokop, C. Thomas, B. Widdup, G. University of Perugia. His main research interests
Zhou, L.M. Davis, G. Woodward, C. Nicol, R.-H. Yan, A unified are in the field of digital broadcasting techniques
turbo/Viterbi channel decoder for 3GPP mobile wireless in 0.18-mm (DVB), frequency assignment algorithms in nonlinear environments, joint
CMOS, IEEE J. Solid-State Circuits 37 (11) (2002) 1555–1564. source/channel coding for video over wireless networks. In 2005, he spent
6 months at the Signal Processing Institute, Swiss Federal Polytechnic of
Lausanne (EPFL), working on the transmission of JPEG2000 video over
Lucia Bissi graduated Cum Laude in Electronic
Engineering at the University of Perugia (Italy) in wireless networks.
2004. Currently, she is a Ph.D. student at the
Department of Electronic and Information En-
gineering (DIEI) of the University of Perugia, Andrea Scorzoni earned his doctoral degree in
Italy. Her research interests are in digital circuit Electronics in 1989. From 1983 to 1998 he
design on Field Programmable Gate Arrays worked as a grant holder and then as a research
(FPGAs) and in the interfacing of sensors with fellow at the CNR-Institute of Microelectronics
microcontrollers, with programmable systems on and Microsystems (IMM, formerly LAMEL,
chip and with computer networks, with particular Bologna, Italy). From 1989 to 1998 he was local
attention to the standardization of ‘Smart sensors’. scientific responsible of several European scien-
tific programs (SPECTRE, ADEQUAT, PRO-
Pisana Placidi from November 1996 to October PHECY). Since 1998 he is Professor of
1999 was a selected student of the ‘Doctoral Electronics at the Department of Electronic and
Student Programme in Engineering and Applied Information Engineering (DIEI), University of Perugia, Italy. From 1999
Science’ and she joined the MIC team working at he has been local responsible of a number of national projects (CNR-
CERN (Geneve, Switzerland). She is presently MADESS II on metal reliability, three PRIN projects on a high-frequency
Research Associate in the Department of Elec- front-end and on DNA sensors). His areas of interest include measure-
tronic and Information Engineering (DIEI), ments and modeling of physical parameters of solid-state electron devices
University of Perugia. Her current research on silicon and silicon carbide, modeling of sensors and in particular solid-
interests are in the following fields: design and state radiation sensors, design of microsystems, electromigration phenom-
fabrication of the electronics for active pixel ena, and design of embedded systems. Prof. Scorzoni published about 70
detectors integrated in standard CMOS submicron technology; interfacing technical papers related to the above fields in international journals. He
of sensors with microcontrollers, with programmable system on chip and also authored or cohauthored more than 70 contributed and invited
with computer networks. papers at international conferences.