Preface
I have never thought of myself as a book writer. Over the course of my career, I have
written volumes of technical documentation, published several articles in technical
magazines, and done a lot of technical blogging. At some point, I realized that I had
accumulated a wealth of experience and knowledge in the area of FPGA design, and
thought it was a good time to share it with a broader audience.
Writing a book takes time, commitment, and discipline. It also requires a very different
skill set. Unfortunately, many engineers, including myself, are trained to use
programming languages better than natural languages. Despite all that, writing a book
is definitely an intellectually rewarding experience.
I would like to express my gratitude to all the people who have provided valuable ideas,
reviewed the technical content, and edited the manuscript: my colleagues from SerialTek,
former colleagues from Xilinx, technical bloggers, and many others.
Table of Contents
1. Introduction
2. FPGA Landscape
3. FPGA Applications
4. FPGA Architecture
5. FPGA Project Tasks
6. Overview of FPGA design tools
Introduction
Tips 1:5
Tips 6:12
Tips 13:19
Tips 20:37
FPGA selection
Tips 38:44
Tips 45:55
Tips 56:64
Tips 65:77
Design Optimizations
Tips 78:81
Tips 82:89
Tips 90:96
Tips 97:99
Resources
Tip 100
1. Introduction
Target audience
FPGA logic design has grown from being one of many hardware engineering skills a
decade ago to a highly specialized field. Nowadays, FPGA logic design is a full-time job.
It requires a broad range of skills, such as deep knowledge of FPGA design tools, an
understanding of FPGA architecture, and sound digital logic design practices. It can
take years of training and experience to master those skills well enough to design
complex FPGA projects.
This book is intended for electrical engineers and students who want to improve their
FPGA design skills. Both novice and seasoned logic and hardware engineers can find
bits of useful information in this book. It is intended to augment, not replace, existing
FPGA documentation, such as user manuals, datasheets, and user guides. It provides
useful and practical design tips and tricks, and little known facts that are hard to find
elsewhere.
The book is intended to be very practical, with a lot of illustrations, code examples, and
scripts. Rather than having a generic discussion applicable to all FPGA vendors, this
edition of the book focuses on Xilinx FPGAs. Code examples are written in Verilog
HDL. This allows more concrete examples and in-depth discussions. Most of the
examples are simple enough to be easily ported to other FPGA vendors and
families, and to VHDL.
The book provides an extensive collection of useful online references.
It is assumed that the reader has some digital design background, and working
knowledge of ASIC or FPGA logic design using Verilog HDL.
This book does not attempt in-depth coverage of a wide
range of topics in a limited space. Instead, it covers the important points, and provides
references for further exploration of that topic. Some of the material in this book has
appeared previously as more complete articles in technical magazines.
Software
The FPGA synthesis and simulation software used in this book is the free Web edition of
the Xilinx ISE package.
4. FPGA Architecture
The key to successful design is a good understanding of the underlying FPGA
architecture, its capabilities, available resources, and, just as important, its limitations.
This Tip uses the Xilinx Virtex-6 family as an example to provide a brief overview of the
architecture of a modern FPGA.
The main architectural components, as illustrated in the following figure, are logic and
IO blocks, interconnect matrices, clocking resources, embedded memories, routing, and
configuration logic.
Many high-end FPGAs also include complex functional modules such as memory
controllers, high speed serializer/deserializer transceivers, integrated PCI Express
interface, and Ethernet MAC blocks.
The combination of FPGA logic and routing resources is frequently called FPGA fabric.
The term derives its name from its topological representation. As the routing between
logic blocks and other resources is drawn out, the lines cross so densely that it resembles
a fabric.
Logic blocks
Logic block is a generic term for a circuit that implements various logic functions. A
logic block in Xilinx FPGAs is called a Slice. A Slice in a Virtex-6 FPGA contains four
look-up tables (LUTs), eight registers, a carry chain, and multiplexers. The following figure
shows the main components of a Virtex-6 FPGA Slice.
The connectivity between LUTs, registers, multiplexers, and a carry chain can be
configured to form different logic circuits.
There are two different Slice types: SLICEM and SLICEL. A SLICEM has a multi-purpose
LUT, which can also be configured as a Shift Register LUT (SRL), or as a 64- or 32-bit
read-only or random access memory.
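The shift-register use of a SLICEM LUT can be illustrated with a minimal Verilog sketch. The module name srl_delay, its port names, and the DEPTH parameter are hypothetical and chosen only for illustration. Synthesis tools typically map a reset-free shift register like this one onto SRL-configured LUTs, but the actual mapping depends on the tool and its settings.

```verilog
// Sketch only: a fixed-length, single-bit delay line.
// All names here are illustrative assumptions, not taken from the book.
module srl_delay #(
    parameter DEPTH = 16              // delay in clock cycles, must be >= 2
) (
    input  wire clk,
    input  wire ce,                   // shift enable
    input  wire din,
    output wire dout
);
    // No reset is used on purpose: an SRL primitive has no reset input,
    // so a reset would push the logic into discrete Slice registers.
    reg [DEPTH-1:0] shift = {DEPTH{1'b0}};

    always @(posedge clk)
        if (ce)
            shift <= {shift[DEPTH-2:0], din};   // shift towards the MSB

    assign dout = shift[DEPTH-1];     // oldest bit leaves the delay line
endmodule
```

Checking the synthesis report for the number of LUTs used as shift registers is a quick way to confirm whether the tool made this mapping.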
Each Slice register can be configured as a latch.
Clocking resources
Each Virtex-6 FPGA provides several highly configurable mixed-mode clock managers
(MMCMs), which are used for frequency synthesis and phase shifting.
Clocks are distributed to synchronous elements across the FPGA using dedicated
low-skew, low-delay clock routing resources. Clock lines can be driven by global
clock buffers, which support glitchless clock multiplexing and clock enable.
A more detailed discussion of Xilinx FPGA clocking resources is provided in Tip #20.
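As a rough illustration of glitchless clock multiplexing, the sketch below wraps a global clock multiplexer buffer. It assumes the BUFGMUX primitive from the Xilinx UNISIM library, with ports O, I0, I1, and S, so it compiles only when that library is available. The wrapper name clk_select and its port names are hypothetical.

```verilog
// Sketch only: selects one of two clocks using a global clock buffer.
// Assumes the Xilinx UNISIM primitive BUFGMUX; wrapper names are illustrative.
module clk_select (
    input  wire clk_a,      // selected when sel = 0
    input  wire clk_b,      // selected when sel = 1
    input  wire sel,
    output wire clk_out     // drives the global clock network
);
    BUFGMUX bufgmux_i (
        .O  (clk_out),
        .I0 (clk_a),
        .I1 (clk_b),
        .S  (sel)
    );
endmodule
```

Using the dedicated buffer instead of a LUT-based multiplexer keeps the selected clock on the low-skew clock routing and avoids runt pulses when the select input changes.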
Copyright 2011 Evgeni Stavinov
Embedded memory
Xilinx FPGAs have two types of embedded memory: a dedicated Block RAM (BRAM)
primitive, and a LUT configured as Distributed RAM.
A Virtex-6 BRAM can store 36K bits, and can be configured as a single- or dual-ported
RAM. Other configuration options include a data width of up to 36 bits, a memory depth
of up to 32K entries, and error detection and correction.
Tip #34 describes different use cases of FPGA-embedded memory.
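A memory of this kind is usually inferred from HDL rather than instantiated. The following is a minimal sketch of a simple dual-port RAM, sized at 1K entries of 36 bits to match the 36K-bit capacity quoted above. The module name sdp_ram, the port names, and the parameter defaults are assumptions made for illustration. The registered read is what normally lets a synthesis tool choose a BRAM, although the final choice depends on the tool and its settings.

```verilog
// Sketch only: one write port, one read port, common clock.
// Names and default sizes are illustrative assumptions.
module sdp_ram #(
    parameter DATA_WIDTH = 36,
    parameter ADDR_WIDTH = 10                 // 2^10 x 36 = 36K bits
) (
    input  wire                  clk,
    input  wire                  we,          // write enable
    input  wire [ADDR_WIDTH-1:0] waddr,
    input  wire [DATA_WIDTH-1:0] wdata,
    input  wire [ADDR_WIDTH-1:0] raddr,
    output reg  [DATA_WIDTH-1:0] rdata
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= wdata;
        rdata <= mem[raddr];                  // synchronous (registered) read
    end
endmodule
```

Replacing the registered read with a continuous assignment from mem[raddr] gives an asynchronous read, which tools generally implement in LUT-based Distributed RAM instead.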
DSP
Virtex-6 FPGAs provide dedicated Digital Signal Processing (DSP) primitives to
implement various functions used in DSP applications, such as multipliers,
accumulators, and signed arithmetic operations. The main advantage of using DSP
primitives instead of general-purpose LUTs and registers is high performance.
Tip #28 describes different use cases of the DSP primitive.
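The multiplier and accumulator functions mentioned above can be sketched as a signed multiply-accumulate unit. The module name mac, the port names, and the default widths (25 x 18 operands and a 48-bit accumulator, chosen to fit the Virtex-6 DSP48E1 primitive) are assumptions made for illustration. Whether the logic lands in a DSP primitive or in LUTs is decided by the synthesis tool.

```verilog
// Sketch only: signed multiply-accumulate with one pipeline stage.
// Names and widths are illustrative assumptions.
module mac #(
    parameter A_WIDTH = 25,
    parameter B_WIDTH = 18,
    parameter P_WIDTH = 48
) (
    input  wire                      clk,
    input  wire                      clr,   // synchronous clear of the accumulator
    input  wire signed [A_WIDTH-1:0] a,
    input  wire signed [B_WIDTH-1:0] b,
    output reg  signed [P_WIDTH-1:0] acc
);
    reg signed [A_WIDTH+B_WIDTH-1:0] prod;  // registered product

    always @(posedge clk) begin
        prod <= a * b;                      // stage 1: multiply
        if (clr)
            acc <= {P_WIDTH{1'b0}};
        else
            acc <= acc + prod;              // stage 2: add last cycle's product
    end
endmodule
```

The accumulator has no initial value, so clr should be asserted for one cycle before the first operands are applied.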
Input/Output
An Input/Output (IO) block supports different IO pin configurations: IO standard,
single-ended or differential signaling, slew rate and output drive strength, pull-up or
pull-down resistor, and digitally controlled impedance (DCI). An IO in Virtex-6 can be
delayed by up to 32 increments of 78 ps each by using an IODELAY primitive.
Serializer/Deserializer
Most Virtex-6 FPGAs include dedicated transceiver blocks that implement
Serializer/Deserializer (SerDes) circuits. Transceivers can operate at a data rate between
155 Mb/s and 11.18 Gb/s, depending on the configuration.
Routing resources
FPGA routing resources provide programmable connectivity between logic blocks, IOs,
embedded memory, DSP, and other modules. Routing resources are arranged in a
horizontal and vertical grid. A special interconnect module serves as a configurable
switch box to connect logic blocks, IOs, DSP, and other modules to horizontal and
vertical routing. Unfortunately, Xilinx doesn't provide much documentation on the
performance characteristics, implementation details, and quantity of the routing
resources. Some routing performance characteristics can be obtained by analyzing
timing reports, and the FPGA Editor tool can be used to glean information about the
routing quantity and structure.
FPGA configuration
The majority of modern FPGAs are SRAM-based, including Xilinx Spartan and Virtex
families. On each FPGA power-up, or during a subsequent FPGA reconfiguration, a
bitstream is read from the external non-volatile memory (NVM), processed by the
configuration controller, and loaded to the internal configuration SRAM. Tips #35-37
describe the process of FPGA configuration and bitstream structure in more detail.
The following table summarizes this Tip by showing key features of the smallest,
mid-range, and largest Xilinx Virtex-6 FPGAs.
Table 1: Xilinx Virtex-6 FPGA key features

Device        Logic cells   Embedded memory (Kbyte)   DSP modules   User IOs
XC6VLX75T     74,496        832                       288           240
XC6VLX240T    241,152       2,328                     768           600
XC6VLX760     758,784       4,275                     864           1,200
The design implements a PCI Express to Ethernet adapter and is shown to illustrate the
potential complexity of a clocking scheme. It has a 16-lane PCI Express interface, a
tri-mode Ethernet MAC, a DDR3 memory controller, and the bridge logic. Sixteen
Serializer/Deserializer (SerDes) modules embedded in the FPGA are used to receive PCI
Express data, one for each lane. Each SerDes outputs a recovered clock synchronized to
the data. A shared clock is used for all PCI Express transmit lanes. The tri-mode Ethernet
MAC requires 2.5 MHz, 25 MHz, and 125 MHz clocks to operate at 10 Mb/s, 100 Mb/s,
and 1 Gb/s, respectively. The memory controller uses a 333 MHz clock, and the bridge
logic uses a 200 MHz clock. In total, there are 23 clocks in the design. Each clock domain
crossing (PCI Express to bridge, bridge to Ethernet, and bridge to memory controller)
requires a different technique to ensure reliable operation of the design.
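The three Ethernet clock rates quoted above follow from the width of the MAC data interface. The short calculation below assumes the standard interface widths, which are not stated in the text: 8 bits per clock at 1 Gb/s (GMII) and 4 bits per clock at the two lower speeds (MII).

```latex
\begin{aligned}
125\ \text{MHz} \times 8\ \text{bits} &= 1\ \text{Gb/s} \\
25\ \text{MHz} \times 4\ \text{bits}  &= 100\ \text{Mb/s} \\
2.5\ \text{MHz} \times 4\ \text{bits} &= 10\ \text{Mb/s}
\end{aligned}
```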
Metastability
Metastability is the main design problem to consider when implementing data
transfer between different clock domains.
Metastability is defined as a transitory state of a register which is neither logic 0 nor
logic 1. A register might enter a metastable state if its setup and hold timing
requirements are not met. In a metastable state the register output sits at an
intermediate voltage level. Small voltage and
temperature perturbations can return the register to a valid state. The transition time
and the resulting logic level are indeterminate. In some cases, the register output can
oscillate between the two valid states. Metastability conditions arise in designs with
multiple clocks, or asynchronous inputs, and result in data corruption.
The following are some of the circuit examples that can cause metastability.
Example 1
A state machine may enter an illegal state if some of the inputs to the next state logic are
driven by a register in a different clock domain. This is illustrated in the following
figure.
The exact problem that may occur due to metastability depends on the state machine
implementation. If the state machine is implemented as one-hot (that is, there is exactly
one register for each state), then the state machine may transition to a valid, but
incorrect, state.
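The situation in Example 1 can be made concrete with a small Verilog sketch. The module name req_fsm, the signal names, and the two-state machine are hypothetical. The two-register synchronizer shown in front of the state machine is a widely used mitigation that the text above does not describe, so it is an assumption of this sketch rather than the author's recommendation. Connecting req_a directly to the next-state logic instead of req_sync would reproduce the problem circuit.

```verilog
// Sketch only: one-hot state machine with an input from another clock domain.
// All names are illustrative assumptions.
module req_fsm (
    input  wire clk_b,      // clock of the state machine
    input  wire rst_b,      // synchronous reset, clk_b domain
    input  wire req_a,      // driven by a register in a different clock domain
    input  wire done_b,     // clk_b domain
    output wire busy_b
);
    // Two-register synchronizer: req_meta may go metastable,
    // req_sync gives it one clk_b period to settle before it is used.
    reg req_meta = 1'b0;
    reg req_sync = 1'b0;

    always @(posedge clk_b) begin
        req_meta <= req_a;
        req_sync <= req_meta;
    end

    localparam IDLE = 2'b01,
               BUSY = 2'b10;

    reg [1:0] state = IDLE;

    always @(posedge clk_b) begin
        if (rst_b)
            state <= IDLE;
        else
            case (state)
                IDLE:    if (req_sync) state <= BUSY;
                BUSY:    if (done_b)   state <= IDLE;
                default: state <= IDLE;   // recover from an unexpected encoding
            endcase
    end

    assign busy_b = state[1];
endmodule
```

Without the synchronizer, each state register samples req_a through its own logic path, so the registers can disagree about its value in the same clock cycle.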
Example 2
The input data to a Xilinx BRAM primitive and the BRAM itself are in different clock
domains.
If the input data violates the setup or hold requirements of the BRAM, the result may be
data corruption. The same applies to other BRAM inputs, such as address and write
enable.
Example 3
The output of a register in one clock domain is used as a synchronous reset to a register
in another clock domain. The data output of the right register, shown in the figure
below, can be corrupted.
Example 4
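A data coherency problem may occur when a data bus is sampled by registers in different
clock domains. This case is illustrated in the following figure.
There is no guarantee that all the data outputs will be valid in the same clock cycle. It
might take several clock cycles for all the bits to settle. For example, when a 4-bit bus
changes from 0111 to 1000, the receiving registers can capture a mixture of old and new
bits, such as 1111 or 0000.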
There are several practical methods for measuring the metastability capture window
described in the literature. The one applicable to Xilinx FPGA is Xilinx Application Note
XAPP094 [1].
However, the mean time between failures (MTBF) can only be determined using
statistical methods. A commonly used
MTBF equation is:
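The form most often quoted in the metastability literature is reproduced below as a sketch. The symbol names are assumptions and are not necessarily the author's notation.

```latex
\mathrm{MTBF} = \frac{e^{\,t_r/\tau}}{T_0 \cdot f_{clk} \cdot f_{data}}
```

Here t_r is the time available for a metastable output to resolve, tau is the resolution time constant of the register, T_0 is a device constant related to the metastability capture window, f_clk is the frequency of the sampling clock, and f_data is the toggle rate of the asynchronous input. Because t_r appears in the exponent, even a small increase in the available resolution time improves the MTBF substantially.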
Resources
[1] Metastable Recovery in Virtex-II FPGAs, Xilinx Application Note XAPP094
http://www.xilinx.com/support/documentation/application_notes/xapp094.pdf