
An introduction to FPGA

Christoph Heer
December 2002

Abstract

This document aims to give an overview of the technology of FPGAs (Field-Programmable Gate
Arrays). It focuses on aspects of the architecture and gives insights into the design flow.
FPGA devices are compliant with standard CMOS technology, with the exception of those FPGAs
which use flash or fuse technology. With processes below 0.2 µm, macros of a reasonable capacity
of some 10,000 gate equivalents can be embedded on-chip, constituting Configurable Systems-on-
Chip (CSoC). Today, chip platforms integrate a standard microprocessor core, SRAM and an
FPGA. Future systems will contain specialized cores.



Contents

Chapter 1 - Introduction

References

Chapter 2 - FPGA Architecture


2.1 Basic Structure
2.2 The Configurable Logical Cell
2.2.1. Simple Transistor/Multiplexer/Gate-Based Cells
2.2.2. LUT-Based Cells
2.2.3. PAL/PLA-Based Cells
2.2.4. ALU-Based Cells
2.3 Routing Structures
2.4 FPGA Configuration
2.5 Distributed SRAM
2.6 Input / Output Cells
References

Chapter 3 - FPGA Design Flow

Appendix A - PAL / PLA Architecture

Appendix B - List of Relevant Acronyms



1 Introduction

Digital integrated circuits may be broadly classified into three categories:

1. Programmable logic
2. Application-specific logic
3. Programmable standard architectures

Programmable logic is typically a means of storing amounts of data for quick access (see footnote 1),
in either a volatile or non-volatile manner with respect to the power supply. The programmability of
such devices could be one-time only, repeatable, or even dynamic (in the case of RAMs).
Application-specific logic is typically highly optimised in terms of functionality, performance,
power and cost. The highest degree of optimisation is obtained with full-custom implementation,
while semi-custom devices offer quicker design processes. Programmable standard architectures are
highly flexible, generic devices, the functionality of which is determined by loaded software
(program code). However, the processing time of a function is long because the code is executed
sequentially; such devices are therefore typically used in applications which tolerate these longer
response times.

Gate arrays provide a highly standardised means to implement digital integrated circuit designs.
They are manufactured as regular arrays of patterned blocks of transistors which can be
interconnected to form logic elements such as gates, flip-flops and multiplexers. The advantage is
that the manufacturer can pre-produce gate array wafers without interconnections in high volume.
These are then configured in an additional process step in the factory. Once a customer provides a
definition of the logic block interconnections, one or more layers of metal are added to form these
connections. Sea-of-gates structures differ slightly: whereas regular gate arrays provide blank
routing space at regular intervals in the transistor array, in a sea-of-gates device the added metal
interconnects have to be placed over particular transistors, rendering them unusable. The advantage
is better area utilisation. These two types of devices are collectively known as MPGAs
(Mask-Programmable Gate Arrays). As process technologies advance and feature sizes get smaller, it
is becoming increasingly expensive to configure such devices.

FPGAs (Field-Programmable Gate Arrays) and CPLDs (Complex Programmable Logic Devices; see footnote 2)
are digital devices based on configurable logical cells and configurable interconnect structures.
They are manufactured using the latest technologies and offer very high capacity in equivalent ASIC
gates. The Altera APEX 20KC, for example, reaches capacities of 1.5 million gates using 0.15 µm
technology [1]. Unlike MPGAs, the configuration step does not involve a technological process but
is done electrically. Re-configuration is therefore an option, during system boot-up and possibly
dynamically during run-time, though one-time programmable FPGAs also exist. FPGA devices
provide a very high degree of flexibility based on a standard architecture producible in large
quantities. They support the implementation of a wide range of circuit types and offer a lot of
potential for parallel processing. In this respect they appear superior to DSP architectures. The fact
that there is no need to generate a mask to configure an FPGA architecture means that the hardware
implementation of logic circuits is faster and that small quantities may be produced at a reasonable
cost. FPGAs can be used for fast functional verification during the development phase, avoiding the
long waiting times associated with simulation. The cost of prototyping and time-to-market of new
designs is therefore reduced, as is the cost for small-volume production of particular designs.

Footnote 1: A PLA / PAL may also be considered as a memory device, if the input vector to the array
is viewed as an address vector and the output of the array as the contents of the memory location
uniquely determined by that input / address.

Footnote 2: As is explained in Section 2.2.3, CPLDs may be considered to be a type of FPGA, and
throughout this document, unless otherwise specified, the term FPGA will be used to refer to both
FPGAs and CPLDs. The nature and complexity of the two types of devices are similar, even though
they differ very much in architecture and possibly in the type of application too.

Most FPGAs are re-configurable even after the chip has been deployed in its application. In
particular, FPGA macros which are embedded together with standardised cores on the same die
(embedded FPGAs, or eFPGAs) allow further flexibility. Thus, for example, if one such embedded FPGA
macro is used in a communications transceiver, changes in the communications protocol may be taken
care of simply by re-configuring the eFPGA, rather than re-designing the whole transceiver.

All these advantages however come at the price of increased signal delay and power
consumption, and worse utilisation of chip area when compared to equivalent logic circuits
implemented in full-custom or semi-custom.

To summarise, systems implemented using FPGAs offer the following advantages and
disadvantages over semi-custom and full-custom devices:

Advantages:
• Fast and cheap procedure for implementing hardware
• Fast functional verification
• Low cost of low-volume production
• Improved time-to-market
• Re-configurability in the field

Disadvantages:
• Non-optimal utilisation of silicon area
• Signal delay and power consumption are higher
• Routing problems could limit flexibility
• Potential clock-skew problems

Despite these disadvantages, the market of stand-alone FPGA devices has in recent years exploded
into a billion-dollar business and further growth is expected as process technologies improve. The
main benefit of flexibility without the costs of mask generation will then be even more significant.
Since FPGAs are compatible with standard CMOS processes, the embedding of FPGA macros into
larger designs will be a common technique in the imminent future. The following market models
are foreseeable:

1. Programmable once:
• derivatives of standard devices
• low cost of customisation even in low quantities
• protection of intellectual property as read-outs of programmed gate arrays are harder to obtain
than those of full-custom designs
2. Re-programmable:
• prototyping and functional development on standard platforms
• in-field customisation and updating
• multiple-application hardware

In conclusion, although FPGAs are sub-optimal in terms of physical implementation, they offer
great potential for producing standard cores which are individually customisable at low cost.

References

[1] Altera, Data Sheet, APEX 20KC Programmable Logic Device, ver. 1.1, April 2000.



2 FPGA Architecture

2.1 Basic Structure

Figure 2.1 - Basic FPGA architecture [1].

The basic architecture of an FPGA (Figure 2.1) is an array of identical, configurable logical cells.
The periphery of the device consists of a number of configurable input/output cells. The array is
interwoven with configurable interconnect resources and switches, which provide connection routes
between all these elements. Additionally FPGAs may have small RAM blocks distributed in the
array; these may also be configured to provide one logically lumped memory unit.

The array of configurable logical cells may be structured in several ways, as shown in Figure 2.2.

a) Symmetric matrix
b) Rows of cells
c) Sea of cells: this term refers to the fact that no dedicated routing resources exist between the
logical cells; signals are instead routed through the cells themselves.
d) Hierarchical structure



Figure 2.2 - a) Symmetric matrix architecture b) Rows c) Sea of cells d) Hierarchy [2a].

An FPGA device is generally designed to allow the implementation of practically any logic circuit.
This however requires an area trade-off between a sufficient number of flexible configurable logical
cells and enough interconnect resources to allow all connections between these cells. As the
majority of circuits will only utilise a small portion of the routing and logic resources, this
results in a loss in speed (incurred by signals passing through redundant routing elements) and in
logic density when compared to the same circuit implemented in dedicated logic. An interesting concept is the
grouping of different FPGA devices with related architecture into a family [3]. Each member in a
family would be physically tailored to a certain class of application architecture, by for example
replacing the switches in certain routes by hard shorts, or hard-wiring the logical cells internally in a
certain manner. This member may now implement certain circuits more efficiently, but its reduced
flexibility means that some circuits may not fit at all onto the device. Implementation of a circuit is
now a question of choosing the right device from the FPGA family.

The IEEE Std. 1149.1 Joint Test Action Group (JTAG) standard describes boundary-scan test
circuitry which facilitates functional verification and debugging of FPGA cores by allowing the
observation of logic nodes without the need to bring these nodes out to an I/O pin. Dynamic
configuration of the FPGA may also be done through the JTAG interface.



2.2 The Configurable Logical Cell

The CLC (Configurable Logical Cell) is used to implement a number of logic functions (generally
one or two) of a larger number of inputs. A cell may consist of various combinations of the
following elements:

• Transistors
• Basic gates (NAND, XOR, ... )
• Flip-flops
• Multiplexers
• Look-up tables (LUTs)
• AND-OR arrays (sum-of-products)

The term granularity refers to a quantification of the complexity of the CLC and can depend on the
following:

• Number of logical functions which may be implemented by each CLC
• Number of equivalent NAND2 gates of each CLC
• Total number of transistors that physically constitute the CLC

In this document, an FPGA device of higher granularity consists of a larger number of less complex
CLCs and therefore requires more complex interconnections. FPGAs can be classified according to the
granularity of their array structures. Arrays of gates or transistors represent the highest extreme of
the granularity scale, while arrays of microprocessors or ALUs are at the other end, since the CLCs
in this case are of very high complexity and require simpler interconnect resources.

2.2.1 Simple Transistor / Multiplexer / Gate-Based Cells

Figure 2.4 - Cell of transistor chains [2a].



The most basic type of configurable logical cell consists of simple groupings of transistors.
Programmable devices based on such cells are conceptually very similar to gate arrays and require
complex routing to implement large logic circuits. Figure 2.4 shows a logical cell formed of
transistor chains.

As a second example of a device with high granularity, Figure 2.5 shows a simple CLC based on
multiplexers and a standard OR gate. This is used in the Actel 40MX family. The 8-input, 1-output
cell can implement basic logic gates (NAND, AND, OR, NOR) with 2, 3 or 4 inputs. Efficient use of
interconnecting resources allows the implementation of any logic function, including flip-flops, by
wiring a number of gates together.

Figure 2.5 - Actel 40MX CLC [4].



2.2.2 LUT-Based Cells

Most FPGAs use logical cells which are based on Look-up Tables (LUTs), the largest
exception being CPLDs. An LUT is realised as a number of memory locations (e.g.
SRAM) which are set during the configuration phase. During operation, the vector of
input signals selects one memory location, the content of which is switched to the
output of the LUT. This selection is implemented by means of pass transistors.

In the example LUT shown in Figure 2.6, depending on the inputs A, B and C, a path
is switched through a decision tree of depth three. The contents of the memory cell (in
this case 1 bit) corresponding to that path then appear at the output. Using this
architecture any combinational function of the three inputs may be implemented. An
LUT with more inputs can implement more logic, thereby reducing the number of
logical cells needed and with it the chip area needed to provide the routing between
the cells (Figure 2.3). However, LUT complexity grows exponentially with the
number of inputs. Previous research [5] has shown that a 4-input LUT is the most
efficient in terms of area and most commercial FPGA vendors in fact use LUTs of
this size.
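
The behaviour of an LUT described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any vendor's circuit: the class name `LUT`, its methods and the majority function used as an example are assumptions made here. It shows the two phases (cells written at configuration, one cell selected by the input vector during operation) and the exponential growth in the number of cells.

```python
from itertools import product


class LUT:
    """Illustrative k-input LUT: 2**k one-bit memory cells addressed by the inputs."""

    def __init__(self, k, func):
        self.k = k
        # Configuration phase: evaluate func once per input vector and store the bit.
        # product() enumerates vectors with the first input as the most significant bit.
        self.cells = [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]

    def read(self, *inputs):
        # Operation: the input vector forms an address that selects one memory cell.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.cells[address]


# Example function chosen for illustration: 3-input majority.
majority = LUT(3, lambda a, b, c: (a & b) | (a & c) | (b & c))

# Number of memory cells needed for LUTs of 2 to 6 inputs.
lut_sizes = {k: 2 ** k for k in range(2, 7)}
```

Reading `majority.read(1, 1, 0)` selects the cell at address 6. `lut_sizes` makes the exponential cost visible: each extra input doubles the number of cells.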

It is also common practice to use two LUTs in parallel. The two outputs could either
be dynamically selected using a multiplexer or propagated as two output ports of the
logical cell. In the first case a logical cell of 4 inputs, for instance, could be
implemented using two 3-input LUTs and one multiplexer which is switched by the
fourth cell input. The benefit of splitting the LUT is increased flexibility in
configuring the logical cell.
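
The first case, a 4-input cell built from two 3-input LUTs and a multiplexer, can be checked with a small sketch. The helper names and the function `target` are hypothetical and chosen only for illustration; the point is that one LUT holds the function for the fourth input at 0, the other holds it for the fourth input at 1, and the multiplexer picks between them.

```python
from itertools import product


def make_lut(func, k):
    """Return the 2**k configuration bits of a k-input LUT (first input is the MSB)."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def read_lut(cells, *inputs):
    """Select one configuration bit using the input vector as an address."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return cells[address]


def target(a, b, c, d):
    # Arbitrary 4-input function, chosen only for illustration.
    return (a ^ b) if d else ((b & c) | a)


# One 3-input LUT per value of the fourth input D.
lut_d0 = make_lut(lambda a, b, c: target(a, b, c, 0), 3)
lut_d1 = make_lut(lambda a, b, c: target(a, b, c, 1), 3)


def cell(a, b, c, d):
    # The fourth cell input drives the multiplexer that picks one LUT output.
    return read_lut(lut_d1 if d else lut_d0, a, b, c)


# Compare the split cell against the target for all 16 input vectors.
matches = all(cell(*v) == int(bool(target(*v))) for v in product((0, 1), repeat=4))
```

The same two LUTs could instead drive two separate cell outputs, which is the second case mentioned above.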

Figure 2.6 - LUT architecture [2b].


Whilst the LUT implements combinational logic circuits, logical cells must also
contain flip-flops to be able to implement sequential logic. Figure 2.7 shows a
simplified CLC for a typical FPGA.

Figure 2.7 - Basic CLC architecture [6].

Figure 2.8 - CLC configurations [7].

This simple cell can now be configured in several modes to implement various basic
types of digital circuit (Figure 2.8). The most common configurations are:

• Synthesis mode: Any logic function of up to 4 variables in its registered or direct
form.
• Arithmetic mode: The LUT is split to provide any two logic functions of the same
3 variables. In the arithmetic mode, the inputs A, B, C are the addends and the
Carry-in, whilst the output functions are the Sum and the Carry-out.
• Multiplier mode: This mode also implements an adder, with the addends this time
being partial products and Carry-in from the previous bit position. The partial
product of A and B may be implemented with an AND gate. In the case of the
Atmel AT40K device [7], from which these configurations were sourced, an AND
gate is included in the architecture of the CLC for this purpose, avoiding the
wasteful reservation of an LUT input to implement such a simple function.
• Counter mode: The LUT provides two logic functions (counter Output and Carry-
out) of the same 2 variables, which are a Carry-in and the previous Output. The
feedback loop to use this output as an input is normally provided for within the
CLC; this could also be implemented externally by connecting appropriate routes.
• Multiplexer (2:1) mode: The LUT is configured to provide a logic function of 3
variables, where one selects one of the other two inputs. As an example, the case
where C is the select line for A and B will be considered. In this case the 1-bit
memory cells in the LUT are configured to implement the following truth table:

A  B  C  D  O/p
0  0  0  x  0
0  1  0  x  0
1  0  0  x  1
1  1  0  x  1
0  0  1  x  0
0  1  1  x  1
1  0  1  x  0
1  1  1  x  1
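
As a cross-check of the multiplexer truth table, and to illustrate what "two functions of the same 3 variables" means in arithmetic mode, the following sketch computes the bits that would be stored in the LUT. The function names are assumptions made for this example; they do not come from the AT40K documentation.

```python
def mux_mode(a, b, c):
    # C is the select line: C = 0 passes A, C = 1 passes B.
    return b if c else a


def arithmetic_mode(a, b, carry_in):
    # Two functions of the same three variables: Sum and Carry-out.
    total = a + b + carry_in
    return total & 1, total >> 1


# Rows in the same (A, B, C) order as the truth table.
table_rows = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0),
              (0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
mux_outputs = [mux_mode(a, b, c) for a, b, c in table_rows]

# (A, B, Carry-in) -> (Sum, Carry-out) for every input vector.
adder_bits = {(a, b, c): arithmetic_mode(a, b, c)
              for a in (0, 1) for b in (0, 1) for c in (0, 1)}
```

`mux_outputs` reproduces the O/p column row by row, and `adder_bits` lists the Sum and Carry-out bits that the two halves of the split LUT would hold.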

Note that some configurations, namely Arithmetic, Counter and Multiplier modes,
require 2 distinct functions of 3 inputs. Both Atmel and Actel in fact provide an
architecture with two separate 3-input LUTs. This is equivalent to a 4-input LUT in
terms of the number of gates required to implement the LUT. In other words,
extending the 3-input LUT in Figure 2.6 to a 4-input LUT involves inserting a fourth
input line D and increasing the depth of the tree to four, which requires an additional
16 pass transistors. Since each 3-input LUT contains 14 transistors, having two 3-
input LUTs and using D to select which of the outputs will be registered by the flip-
flop also results in an additional 16 transistors. Figure 2.9 shows one such example in
the Actel Varicore CLC [8] (ignore the Carry In and Carry Out lines at this point).
Other devices, like Altera APEX 20K [9], have 4-input LUTs with a second output
specifically providing a Carry-out line.
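
The transistor counts quoted above can be reproduced with a short calculation. The sketch assumes one pass transistor per branch of the binary decision tree and a 2:1 multiplexer built from two pass transistors; both are modelling assumptions made here, and configuration memory cells are not counted.

```python
def lut_pass_transistors(k):
    """Pass transistors in a binary decision tree of depth k: 2 + 4 + ... + 2**k."""
    return sum(2 ** level for level in range(1, k + 1))


MUX2_TRANSISTORS = 2  # assumption: 2:1 multiplexer made of two pass transistors

one_lut3 = lut_pass_transistors(3)
one_lut4 = lut_pass_transistors(4)
two_lut3_plus_mux = 2 * one_lut3 + MUX2_TRANSISTORS

# Extra transistors relative to a single 3-input LUT, for both options.
extra_for_lut4 = one_lut4 - one_lut3
extra_for_split = two_lut3_plus_mux - one_lut3
```

Under these assumptions both options cost 30 pass transistors, that is 16 more than a single 3-input LUT, which matches the equivalence claimed in the text.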

Additionally, the CLCs contain interfacing logic to the routing resources and in some
cases specialised functionality such as fast carry and cascade chains to speed up
arithmetic operations. The internal connectivity of each cell is determined by a
number of multiplexers which can be used to configure all possible interconnections
between LUT, flip-flop and local routing lines.

Figure 2.9 - Actel Varicore CLC architecture [8].

An example of an LUT-based CLC of higher complexity (5 inputs / 2 outputs) is the
Xilinx XC3000 CLC [10], which uses a 5-input LUT and 2 flip-flops to implement
more complex functions with fewer cells. The obvious penalty is less
efficient CLC utilisation.

Figure 2.10 - Xilinx XC3000 CLC [10].

2.2.3 PAL/PLA-Based Cells

Complex Programmable Logic Devices (CPLDs) are also devices with high cell complexity.
The CLC of a CPLD is, not surprisingly, called a Simple Programmable
Logic Device (SPLD) and is based on sum-of-products (also called AND-OR) logic.
Each SPLD is made up of a PAL or PLA (see footnote 3), macrocells and input / output structures.
The PLA / PAL produces a number of product terms which are functions of the inputs
to the SPLD. The number of macrocells per PLA / PAL determines how many
different logic functions may be obtained from a selection of the same set of product
terms. Whether the OR logic is lumped in the PLA / PAL cell or the macrocell block
is simply a question of labelling and is manufacturer-dependent.

Footnote 3: The difference between a PAL and a PLA is explained in Appendix A.

Figure 2.11 - Generic CPLD architecture [11].

As with other FPGAs, all the logical cells can be interconnected using routing
resources, though in the case of the CPLD, these tend to be simpler and based on
signal lines running through the whole device, a characteristic of a low-granularity
device. This also means that delays between cells are predictable. Figure 2.12 shows
the CLC of Altera CPLDs, consisting of AND gates with high fan-in (gates with more
than 20 inputs) which converge in OR gates of 3 to 8 inputs. This structure allows the
implementation of complex logic functions using a minimal number of CLCs,
reducing the required number of interconnections. In practice though it is very
difficult to use the array to its maximum complexity, so density is wasted.

Figure 2.12 - Altera CLC architecture [2c].

2.2.4 ALU-Based Cells

FPGAs based on arrays of ALUs have recently appeared on the market as very low-
granularity programmable devices. Companies offering such solutions, or in the
process of developing them, include Adaptive Silicon, LSI (architecture licensed from
Adaptive Silicon), PACT corporation and Elixent. Arrays of statically programmed
ALUs can be configured into synchronous DSP pipelines yielding powerful
instruction level parallelism.

Figure 2.13 - Array of 4-bit ALUs [12].

2.3 Routing Structures

Four types of routing networks are needed in an FPGA device:

• Power feeding network
• Reset and multiple clock networks (local / global)
• Signal network interconnecting all cells
• Configuration lines

A strategy adopted by most manufacturers to different extents is the structuring of the
device into some sort of hierarchy, by segmenting the array into groups of CLCs.
Routing lines interconnecting the cells could then be broadly classified into three
different types:

• Local routing lines directly interconnecting neighbours
• Interconnects to route signals within a cluster of cells
• Global interconnects to transmit signals throughout the whole array

Local routing lines are of low fan-out and limited length. The switching in this case
is done from within the CLC, to create fast point-to-point interconnections useful for
fast arithmetic operations for instance. These connections allow the most efficient
implementation of standard structures (such as multiplier elements, shift registers, etc.)
in terms of utilisation and speed.

Figure 2.14 - Example of a routing resource using programmable switches [2a].

Routing within a cluster of cells is done by means of a matrix of interconnection
lines, which may be configured to realise connections between any two CLCs or
between one CLC and an I/O cell. Different routes are made using routing resources
which consist of configurable pass transistor switches. An example of such a resource
is shown in Figure 2.14. Efficient CAD tools are essential here: they must make good
use of the CLCs and place connected cells as close together as possible. Each of the
switches in such programmable routing resources is equivalent
to an RC element, meaning that it introduces a propagation delay to the signal. Figure
2.15 shows how the route between two CLCs, passing through a switching matrix and
two programmable interconnection points (PIPs in Xilinx terminology) which connect
the cell to a line, may be represented by an equivalent RC model. With FPGA devices
of high granularity, the routing resources are more complex, meaning that there are a
large number of very different routes between two cells, each of which has a very
different associated delay. For this reason, low-granularity devices have more easily
predictable delays between cells.
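
To make the RC view of a route concrete, the sketch below estimates the delay of a chain of switches with the Elmore approximation. The Elmore delay is a common first-order estimate that is not named in the source, and the resistance and capacitance values are hypothetical placeholders, not data for any real device.

```python
def elmore_delay(stages):
    """Elmore delay of an RC ladder.

    stages is a list of (resistance_ohm, capacitance_farad) pairs ordered from
    driver to load. Each capacitor is charged through all resistance upstream of it.
    """
    delay = 0.0
    upstream_r = 0.0
    for r, c in stages:
        upstream_r += r
        delay += upstream_r * c
    return delay


# Hypothetical element values, for illustration only.
PIP = (1_000.0, 10e-15)            # programmable interconnection point
SWITCH_MATRIX = (1_500.0, 20e-15)  # one pass through a switching matrix
WIRE = (100.0, 50e-15)             # one wire segment

short_route = [PIP, WIRE, PIP]
long_route = [PIP, WIRE, SWITCH_MATRIX, WIRE, SWITCH_MATRIX, WIRE, PIP]

short_delay = elmore_delay(short_route)
long_delay = elmore_delay(long_route)
```

Because every added switch contributes both its own resistance and extra capacitance that must be charged through all upstream resistance, delay grows faster than linearly with the number of switches. This is why routes of different lengths differ so widely in delay.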

Global interconnects require strong signal driving and do not use the above
mentioned routing matrices. They enable the transmission of global signals to all
CLCs with minimal delay and attenuation of logic levels. Because of large distances,
there could be the need for signal refresh using tri-state buffers.

Figure 2.15 - Breakdown of route into equivalent electrical model [2a].

The level of connectivity between cells in the FPGA has a direct effect on the total
area of the circuit. Recent advances in semiconductor process technology have
increased the number of metal layers available for interconnection (from 2 to 7
layers), albeit at a cost. Extra layers can be used to reduce the amount of area required
for more complex interconnectivity and allow the allocation of specific layers to
particular functions such as power supply and clock signals.

Different FPGA manufacturers have adopted very different solutions to the complex
question of routing between cells in an FPGA device. Therefore the routing
architectures of the different devices will be addressed in more detail in the chapters
concerning the particular devices.

2.4 FPGA Configuration

FPGA devices allow the configuration of all CLCs, I/O cells and interconnect
resources. The gate of each configurable transistor is controlled by the contents of a 1-
bit memory cell, with a logic '0' or logic '1' determining whether the gate is off or on.

To reduce the wiring required for configuration, the memory cells can be connected in
a chain and the configuration is then loaded using a shift operation. Depending on the
physical configuration mechanism, it is possible to classify FPGAs into three classes:

• One-time configurable devices
• Non-volatile re-configurable devices
• Volatile re-configurable devices
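
The chained loading of configuration memory mentioned above can be sketched as a simple shift register. The class `ConfigChain`, the chain length and the bit pattern are hypothetical, and real devices add framing and error checking that is not modelled here.

```python
class ConfigChain:
    """Configuration memory cells connected as one long shift register."""

    def __init__(self, length):
        self.cells = [0] * length

    def shift_in(self, bit):
        # One clock: every cell takes its predecessor's value; the new bit enters cell 0.
        self.cells = [bit] + self.cells[:-1]

    def load(self, bitstream):
        # Serial load: one bit per clock over a single data line.
        for bit in bitstream:
            self.shift_in(bit)


chain = ConfigChain(8)
desired = [1, 0, 1, 1, 0, 0, 1, 0]  # desired[i] is the target value of cell i
chain.load(reversed(desired))       # the bit for the last cell is shifted in first
```

A single data line and a clock are enough to reach every cell, at the price of a load time proportional to the number of configuration bits.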

One-time configurable devices store their configuration using fuses or anti-fuses. The
former are normally closed structures, while the latter are normally open. A device
based on fuse technology is programmed by physically breaking the connections
between appropriate structures. On the other hand, a device based on anti-fuses is
programmed by melting interconnections between particular cells to generate
contacts. The Actel eX [13], MX [4] and SX [14] families are based on anti-fuse
structures.

In the case of re-programmable devices, activation or deactivation of interconnects is
implemented by means of pass transistors or tri-state buffers (Figure 2.16). Memory
units also store the configuration of LUTs and static multiplexers in the CLC. If the
type of memory used is EEPROM, the device is non-volatile, but the difficult
mechanism of re-configuration imposes limitations on the application of the system.
SRAM memory, on the other hand, loses the configuration once power is removed
from the device (volatile), but it is simple and quick to configure. The use of SRAM
allows for dynamic re-configuration of the device even during real-time operation.
Small local SRAM blocks may also be used to store several configuration bits. In this
case, unlike in the application of SRAM blocks for ordinary data storage, there is no
need to select among the read lines.

Figure 2.16 - Configuration of FPGA devices [15].

In commercial applications, a separate PROM device is used to store the
configuration, which is then loaded into the FPGA SRAM at system start-up via a
special configuration interface which usually allows both serial and parallel
configuration modes. In systems which combine eFPGA cores with microprocessor
cores, the processor could load new configurations into the FPGA. To facilitate
system testing and debugging, many devices support read-out of the configuration. The
IEEE Std. 1149.1 JTAG standard describes boundary-scan circuitry which allows the
observation and configuration of individual elements for such purposes.

2.5 Distributed SRAM

Several applications require the use of local memory units. For this purpose, many
FPGAs include small SRAM blocks, which are distributed in an array-like structure
throughout the device. This is known as distributed RAM and could be configured as
one logical RAM unit. This type of RAM offers faster access by the FPGA and more
flexibility of configuration of the memory as well as of the communication between
different processes and memory blocks, when compared to a lumped memory block
external to the FPGA core. In most cases these distributed memory blocks can be
configured as multiple independent synchronous / asynchronous, single-port / dual-
port RAM blocks, often offering a compromise between the width of the address and
data busses. For example, Altera's FLEX 10K [16] allows the following
configurations: 256x8, 512x4, 1024x2, 2048x1.
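
The trade-off between address width and data width can be made explicit with a short calculation. The 2048-bit capacity is inferred from the configurations listed above; the function name and the returned tuple layout are assumptions made for this sketch.

```python
BLOCK_BITS = 2048  # capacity implied by the configurations listed above


def aspect_ratios(total_bits, widths=(8, 4, 2, 1)):
    """Return (depth, data_width, address_bits) for each data width."""
    configs = []
    for width in widths:
        depth = total_bits // width
        # Address bits needed to reach locations 0 .. depth - 1.
        address_bits = (depth - 1).bit_length()
        configs.append((depth, width, address_bits))
    return configs


configs = aspect_ratios(BLOCK_BITS)
```

Each halving of the data width adds one address bit, so the sum of address bits and log2 of the data width stays constant for a fixed block capacity.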

The LUT in an LUT-based CLC could be looked at as a small memory unit with the
flip-flop used to latch the output. Some FPGA devices, like the Xilinx XC4000 Series
[17], also allow the configuration of several CLCs into distributed RAM, though of
course this implies a loss in logic resources. In the survey carried out on commercially
available FPGAs, the only type of distributed RAM described was that implemented
as SRAM blocks distributed throughout the device.

2.6 Input / Output Cells

An important aspect of flexibility on an architectural level is the interface between an
IC and external circuitry. There may be the need to support different bus standards
with the same core logic, or to allow different IC pin-outs as required by different
board layouts. The input / output cells on an FPGA device are programmable blocks
situated on the periphery of the circuit. As an example, the basic structure of the IO
cell of the Xilinx XC4000 Series [17] will be examined, as shown in the simplified
block diagram in Figure 2.17. In general, it may be assumed that other manufacturers
use similar architecture in the IO cells of their devices; if however there are large
differences, then these are explained in the respective sections.

Figure 2.17 - Simplified block diagram of XC4000E Series IOC [17].

The structure incorporates the following features:

• D flip-flops which could be used to provide sequential buffering of the input or
output line.
• The tri-state output buffer may be put in a state of high impedance by means of an
activate signal, implementing tri-state outputs or bi-directional I/O.
• The output slew rate may be controlled at the configuration stage.
• The output pull-up device may be configured as either an n-channel transistor,
which pulls up to one threshold voltage below Vcc, or a p-channel transistor, which
pulls up to Vcc.
• The input thresholds can be configured for either TTL or CMOS logic levels.
• Programmable pull-up and pull-down resistors are used to tie floating pins to Vcc
or ground respectively.

References

[1] J. Carrabina, F. Lisa and A. J. Velasco, Implementación con FPGAs, Chapter 11
from the book Sistemas Digitales, 2000.

[2] S.A. Bota Ferragut, FPGAs, Internal Communication, Universitat de Barcelona
[2a] Chapter 1. Introducción
[2b] Chapter 3. Arquitectura Logic Cell Array (LCA) de Xilinx
[2c] Chapter 5. Arquitectura Multiple Array Matrix (Max-plus) de Altera

[3] V. Betz and J. Rose, Using Architectural “Families” to Increase FPGA Speed
and Density, University of Toronto.

[4] Actel, Data Sheet, 40MX and 42MX FPGA Families, ver. 5.0, February 2001.

[5] J. Rose, R. J. Francis, D. Lewis and P. Chow, Architecture of Programmable
Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency, IEEE
Journal of Solid State Circuits, Oct. 1990, pp. 1217 - 1225.

[6] V. Betz and J. Rose, How Much Logic Should Go in an FPGA Logic Block?,
University of Toronto.

[7] Atmel, Data Sheet, AT40K FPGAs, January 1999.

[8] Actel, Data Sheet, VariCore EPGA Family, rel. 1.0, February 2001.

[9] Altera, Data Sheet, APEX 20K PLD Family, ver. 4.0, August 2001.

[10] Xilinx, Data Sheet, XC3000 Series FPGAs, ver. 3.1, November 1998.

[11] A. Dhir, Introducing Xilinx and Programmable Logic Solutions for Home
Networking, ver. 1.0, March 2001.

[12] www.elixent.com

[13] Actel, Data Sheet, eX Family FPGAs, ver. 0.3, March 2001.

[14] Actel, Data Sheet, 54SX Family FPGAs, ver. 3.0.1, May 2000.

[15] V. Betz and J. Rose, FPGA Routing Architecture: Segmentation and Buffering to
Optimise Speed and Density, University of Toronto.

[16] Altera, Data Sheet, FLEX 10K Embedded PLD Family, ver. 4.1, March 2001.

[17] Xilinx, Data Sheet, XC4000E and XC4000X Series FPGAs, ver. 3.1, ver 1.6,
May 1999.

3 FPGA Design Flow

The process of circuit design on FPGA devices is highly automated and involves the
use of flexible and powerful CAD tools. The efficiency of the tools used has a direct
impact on the overall design time and the efficiency of the FPGA implementation:

• Design Entry. This is the starting point of the design process and involves
capturing the design using a high-level description language like Verilog or
VHDL. Alternatively a schematic editor is used to enter the design at basic logic
level, or by making use of generic blocks which in turn are described by high-
level languages. Other possibilities include entry of the design using state
diagrams. The CAD software provided by FPGA manufacturers includes libraries
of standard circuits or macro-functions to quickly implement common circuits of
varying complexity. The schematic or HDL description is then translated into a
netlist describing the circuit in terms of logic gates and sequential elements.
• Logic Synthesis. This step optimises the circuit by regrouping logic functions
and/or removing redundancies. Such optimisation is carried out according to
design constraints or rules, which could be minimising area or maximising
speed. Once the optimised netlist is obtained, it has to be mapped onto the
logical cell of the FPGA (LUT / flip-flop, PLA ... ). The aim of this is to minimise
the total number of CLCs to be used.
• Floorplanning. The circuit to be designed is now divided into partitions, each of
which is adjusted to be implemented in a particular area on an FPGA device. A
partition usually corresponds to a large section of the circuit which has a particular
functionality, e.g. a multiplier, filter bank etc. In this step, the total number of
FPGA devices required is also determined.
• Place and Route. A logic partition is now mapped onto an FPGA device by
means of the placement tool, which assigns a physical place in the array of CLCs
to each function (LUT / flip-flop, PLA ... ). Typical placement algorithms aim to
minimise the total length of the interconnections in the final design, with the
objective of maximising the speed of the device. Routing algorithms configure the
routing elements to provide the required connections between logic elements. The
primary aim of any routing algorithm is to ensure that 100% of the required routes
can be realised. Other goals of routing algorithms include finding the shortest
paths possible between elements. Because interconnection resources are limited,
this step is the most constrained.
• Layout Verification. This step involves extracting the physical layout of the
design, simulating it using commercial simulators to obtain timing data, and
checking design rules (DRC). If the delays associated with the interconnections
within the prototype fulfil the delay constraints imposed by the design
specifications, then the device may be programmed; otherwise the placement and
routing steps have to be repeated until a satisfactory configuration is found.
• Macro Integration. This involves the provision of all the necessary files and data
formats for integrating the macro in the design flow of the whole chip.

Once the circuit has been verified, the design configuration is output in a
format which is readable as an input to the FPGA device which is to be programmed.
Programming the device may take only a matter of minutes.
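
The placement objective described in the Place and Route step, minimising total interconnection length, can be illustrated with a toy example. The sketch below uses half-perimeter wirelength as the length estimate and greedy pairwise swaps as the optimiser. Both are simplifications chosen here for brevity and are not the algorithms of any particular tool, and the 3x3 array and netlist are hypothetical.

```python
import random


def wirelength(placement, nets):
    """Total half-perimeter wirelength; placement maps cell name -> (x, y)."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def improve(placement, nets, iterations=2000, seed=0):
    """Greedy pairwise swaps: keep a swap only if it does not lengthen the wiring."""
    rng = random.Random(seed)
    placement = dict(placement)  # work on a copy
    names = sorted(placement)
    best = wirelength(placement, nets)
    for _ in range(iterations):
        a, b = rng.sample(names, 2)
        placement[a], placement[b] = placement[b], placement[a]
        cost = wirelength(placement, nets)
        if cost <= best:
            best = cost
        else:
            # Undo a swap that made the wiring longer.
            placement[a], placement[b] = placement[b], placement[a]
    return placement, best


# Hypothetical 3x3 array of CLC sites and a small netlist.
cells = list("ABCDEFGHI")
sites = [(x, y) for y in range(3) for x in range(3)]
initial = dict(zip(cells, sites))
nets = [("A", "I"), ("A", "H", "I"), ("C", "G"), ("B", "E"), ("D", "F")]

start_cost = wirelength(initial, nets)
final, final_cost = improve(initial, nets)
```

Production placers typically use more robust methods, since a purely greedy search can stall in a local minimum, and they weigh timing-critical nets more heavily than this uniform cost does.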

Appendix A - PAL / PLA Structure

Figure A.1 - PAL / PLA structure [1].

A PLA (Programmable Logic Array) provides a structured form of implementing
combinational functions which are in the form of sum-of-products of a number of
input lines to the device. As shown in Figure A.1, PLAs are built of two
distributed-gate arrays. These two arrays are programmed by forming a connection between the
array input lines and the logic gate (AND, OR) inputs. The first array provides the
products (and is therefore known as the AND plane) and the second provides the
desired sum of these products (and is known as the OR plane). A PAL device is a
variation in which the OR plane is fixed.
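
The two-plane structure can be sketched as follows. The data layout (dictionaries for product terms, index lists for the OR plane) and the full-adder programming are assumptions made for illustration and do not correspond to any device's programming format.

```python
def pla(and_plane, or_plane, inputs):
    """Evaluate a PLA.

    and_plane: list of product terms; each term maps input index -> required bit
               (inputs not mentioned in a term are not connected to that AND gate).
    or_plane:  list of outputs; each output lists the product-term indices it sums.
    """
    products = [all(inputs[i] == bit for i, bit in term.items()) for term in and_plane]
    return [int(any(products[t] for t in terms)) for terms in or_plane]


# Hypothetical programming: a full adder on inputs (a, b, cin).
AND_PLANE = [
    {0: 1, 1: 1},        # a AND b
    {0: 1, 2: 1},        # a AND cin
    {1: 1, 2: 1},        # b AND cin
    {0: 1, 1: 0, 2: 0},  # a AND NOT b AND NOT cin
    {0: 0, 1: 1, 2: 0},  # NOT a AND b AND NOT cin
    {0: 0, 1: 0, 2: 1},  # NOT a AND NOT b AND cin
    {0: 1, 1: 1, 2: 1},  # a AND b AND cin
]
OR_PLANE = [
    [3, 4, 5, 6],  # sum
    [0, 1, 2],     # carry-out
]

sum_bit, carry_bit = pla(AND_PLANE, OR_PLANE, (1, 1, 0))
```

In this model a PAL would correspond to keeping `OR_PLANE` fixed and allowing only `AND_PLANE` to be programmed.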

References

[1] Xilinx, Data Sheet. CoolRunner XPLA3 CPLD, ver. 1.4, April 2001.

Appendix B - List of Relevant Acronyms

ASIC Application-Specific Integrated Circuit
CLC Configurable Logical Cell
CPLD Complex PLD
CSoC Configurable SoC
DRC Design Rule Check
eFPGA embedded FPGA
FPGA Field-Programmable Gate Array
IOC Input / Output Cell
JTAG Joint Test Action Group
LUT Look-Up Table
MPGA Mask-Programmable Gate Array
PAL Programmable Array Logic
PLA Programmable Logic Array
PLD Programmable Logic Device
SoC System-on-Chip
SPLD Simple PLD
