
FPGA ARCHITECTURE AND ITS DESIGN METHODOLOGY

COMPARE AN FPGA AND DSP

Filter

DIGITAL LOGIC
Logic Gates

MOORE'S LAW

Transistor Switches

< 40 nm ! $$$

DIGITAL LOGIC
Digital Logic Function

Product AND (&)


Sum OR (|)

3 Inputs

Black Box

Truth Table
(Look Up Table LUT)

SUM of PRODUCTS

WHAT ARE PROGRAMMABLE CHIPS?

Unlike hard-wired chips, programmable chips can be customized to the user's needs by programming.

This convenience, coupled with the option of re-programming when problems are found, makes programmable chips very attractive.

Other benefits include instant turnaround, low starting cost and low risk.

Compared with programmable chips, an ASIC (Application Specific Integrated Circuit) has a longer design cycle and a costlier ECO (Engineering Change Order).

Still, ASICs have their own market because they offer faster performance and lower cost when produced in high volume.

Programmable chips are good for low- to medium-volume products. If you need more than 10,000 chips, go for an ASIC or hard copy.

WHAT IS AVAILABLE?
A PLA (Programmable Logic Array) is a simple field-programmable chip that has an AND plane followed by an OR plane. It relies on the fact that any logic function can be written in SOP (Sum of Products) form, so any function can be implemented by AND gates that generate the products and feed an OR gate that sums them.

A CPLD (Complex Programmable Logic Device) consists of multiple PLA blocks that are interconnected to realize larger digital systems.

An FPGA (Field Programmable Gate Array) has narrower logic choices and more memory elements. A LUT (Lookup Table) may replace actual logic gates.
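The AND-plane/OR-plane idea can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular device: the function name `pla`, the data layout and the XOR example are assumptions made for the sketch.

```python
from itertools import product

def pla(and_plane, or_plane, inputs):
    """Evaluate a PLA: the AND plane forms product terms, the OR plane sums them.

    and_plane: list of product terms; each term maps input index -> required bit
               (inputs not listed are "don't care" for that term).
    or_plane:  list of outputs; each output is a list of product-term indices.
    """
    products = [all(inputs[i] == bit for i, bit in term.items())
                for term in and_plane]
    return [int(any(products[p] for p in terms)) for terms in or_plane]

# Hypothetical example: f = a'b + ab' (XOR written in sum-of-products form)
AND_PLANE = [{0: 0, 1: 1},   # product term a'b
             {0: 1, 1: 0}]   # product term ab'
OR_PLANE = [[0, 1]]          # f = term 0 OR term 1

for a, b in product((0, 1), repeat=2):
    print(a, b, pla(AND_PLANE, OR_PLANE, (a, b)))
```

Programming the device corresponds to choosing the contents of `AND_PLANE` and `OR_PLANE`; the evaluation logic itself never changes.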

COMPARE PAL, PLA, PROM

PROGRAMMABLE LOGIC DEVICES (PLDS)

[Figure: un-programmed PLD. Inputs feed a plane of ANDs and a plane of ORs; the outputs are sum-of-products logic functions. The (re-)programmable links make the device reconfigurable; a typical use is glue logic.]

[Figure: programmed PLD. Links marked x select the product terms, and the sums form the output logic functions.]

COMPLEX PLDS

CPLDs
Programmable PLD Blocks
Programmable Interconnects
Electrically Erasable links

Feedback Outputs

CPLD Architecture

PROGRAMMABILITY: WHERE DO FPGAS FIT?

[Figure: devices arranged along two opposing axes, flexibility and programming abstraction versus performance, area and power efficiency. Devices shown: multi-core Intel CPU, TI DSP, GPU, many-core, ASSP, FPGA, ASIC.]

CPU:
Market-agnostic
Accessible to many programmers (C++)
Flexible, portable

FPGA:
Somewhat restricted market
Harder to program (Verilog)
More efficient than software
More expensive than ASIC

ASIC:
Market-specific
Fewer programmers
Rigid, less programmable
Hard to build (physical)

Which Way to Go?

ASICs:
High performance
Low power
Low cost in high volumes

FPGAs:
Off-the-shelf
Low development cost
Short time to market
Re-configurability

ASIC Design Example: Factoring circuit/GMU

[Figure: ASIC layout showing global memory and local memory.]

ASIC 130 nm vs. Virtex II 6000 (Factoring/GMU)

Area of Xilinx Virtex II 6000 FPGA: 19.68 mm x 19.80 mm (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004)
Area of an ASIC with equivalent functionality: 2.7 mm x 2.82 mm
Area ratio: 51x
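The 51x figure follows directly from the die dimensions quoted above. A quick check, with variable names chosen only for this sketch:

```python
# Die dimensions quoted in the comparison, in millimetres.
fpga_area = 19.68 * 19.80   # Virtex II 6000 estimate, mm^2
asic_area = 2.7 * 2.82      # 130 nm ASIC with equivalent functionality, mm^2

ratio = fpga_area / asic_area
print(f"{fpga_area:.1f} mm^2 / {asic_area:.2f} mm^2 = {ratio:.1f}x")
```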

WHAT IS AN FPGA

Field Programmable Gate Array
Simple programmable logic blocks
Massive fabric of programmable interconnects
Standard CMOS integrated-circuit fabrication process, as for memory chips (Moore's Law)

An FPGA is a device that contains a matrix of reconfigurable gate-array logic circuitry.

FPGAs are truly parallel in nature, i.e. the performance of one part of the application is not affected when additional processing is added.

FPGAs use dedicated hardware for processing logic and do not have an operating system.

An FPGA can be partially reconfigured while the rest of the chip is running.

WHAT IS INSIDE AN FPGA

Logic blocks: implement combinational and sequential logic
Interconnect: wires that connect inputs and outputs to logic blocks
I/O blocks: special logic blocks at the periphery of the device for external connections

Major FPGA vendors

SRAM-based FPGAs
Xilinx Inc. www.xilinx.com
Altera Corp. www.altera.com
Atmel Corp. www.atmel.com
Lattice Semiconductor Corp. www.latticesemi.com

Antifuse and flash-based FPGAs
Actel Corp. www.actel.com
QuickLogic Corp. www.quicklogic.com

FPGA families

Vendor | Low cost | High performance
Xilinx | Spartan 3, 3L, 3E | Virtex 4 LX/SX/FX, Virtex 5 LX
Altera | Cyclone II, III | Stratix II, II GX

How to make logic blocks programmable?

When an FPGA is configured, the internal circuitry is connected in a way that creates a hardware implementation of the software application.

The wires are connected by the user, so an electronic device is needed to connect them.

Cheap/fast fuse connections:
small area (can fit lots of them)
low-resistance wires (fast even if in multiple segments)
very high resistance when not connected
small capacitance (wires can be longer)

PROGRAMMING TECHNOLOGIES

Fuse and anti-fuse (one-time programming)
  fuse makes or breaks the link between two wires
  typical connections are 50-300 ohm
  one-time programmable

Flash/EPROM based (multiple-time programming)
  high density
  process issues

RAM based: pass transistors controlled by an SRAM cell (multiple-time programming)
  memory bit controls a switch that connects/disconnects two wires
  typical connections are 0.5-1 kohm
  can be programmed and re-programmed easily (tested at factory)

SRAM:

In SRAM programming technology, an SRAM cell stores the data that specifies whether a connection is to be made. The SRAM cell drives the gate of a pass transistor (or transmission gate) on the chip, turning it on to make a connection or off to break a connection.

The advantage of SRAM technology is that designers can reuse chips during prototyping and a system can be manufactured using in-system programming (ISP). The other advantage is that a chip can be reprogrammed by downloading a new configuration file.
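The role of the SRAM cell can be illustrated with a toy model in which one stored bit per possible connection decides whether the pass transistor between two wires is on. The class `SramSwitchBox`, its methods and the wire names are all hypothetical and exist only for this sketch.

```python
class SramSwitchBox:
    """Toy model: one SRAM bit per possible connection between two wires."""

    def __init__(self, wires):
        self.wires = list(wires)
        self.sram = {}                      # (wire_a, wire_b) -> 0 or 1

    def configure(self, bitstream):
        """Load a new configuration; the previous contents are overwritten."""
        self.sram = {tuple(sorted(pair)): bit for pair, bit in bitstream.items()}

    def connected(self, a, b):
        """True if the pass transistor between a and b is turned on."""
        return self.sram.get(tuple(sorted((a, b))), 0) == 1

box = SramSwitchBox(["north", "south", "east", "west"])

# First configuration: connect north to east.
box.configure({("north", "east"): 1})
print(box.connected("east", "north"))

# Downloading a new configuration reuses the same "chip" with different wiring.
box.configure({("north", "south"): 1})
print(box.connected("east", "north"), box.connected("north", "south"))
```

Because the connection state lives only in `sram`, reprogramming is just a matter of loading different bits, which is the property that makes SRAM-based devices convenient for prototyping.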

SRAM-TYPE FPGA INTERCONNECT ARCHITECTURE (CONTD)

[Figure: Cell Connection Matrix (CCM) and PSM.]

ANTIFUSE:

Antifuses are originally open circuits and take on low resistance only when programmed. When unprogrammed, the insulator isolates the top and bottom layers, but when programmed the insulator changes to become a low-resistance link.

EPROM
The EEPROM/FLASH cell in FPGAs can be used in two ways: as a control device, as in an SRAM cell, or as a directly programmable switch. When used as a switch, these cells can be very efficient as interconnect and reprogrammable at the same time. They are also non-volatile, so they do not require an extra PROM for loading. They do, however, have drawbacks: the EEPROM process is complicated and therefore lags SRAM technology.

COMPONENTS OF MODERN FPGAS

Xilinx CLB

CONFIGURABLE LOGIC BLOCKS (CLBS)

In a Xilinx logic block, a look-up table (LUT) is used to implement many different functions.

.. CONTD
The input lines go into the input and enable of the lookup table. The output of the lookup table gives the result of the logic function that it implements. The lookup table is implemented using SRAM.

LOOKUP TABLE
A LUT (lookup table) is a one-bit-wide memory array.
A 4-input AND gate is replaced by a LUT that has four address inputs and a single one-bit output, with 16 one-bit locations.
Location 15 would have a logic value 1 stored; all others would be zero.
LUTs can be programmed and reprogrammed to change the logic function implemented.
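The 4-input AND example can be written out as a small memory read. This is a minimal sketch; the helper `lut_read` and the convention that the first input is the most significant address bit are assumptions made here.

```python
def lut_read(contents, inputs):
    """Return the stored bit addressed by the inputs (first input is the MSB)."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return contents[address]

# 4-input AND: 16 one-bit locations, only location 15 (inputs 1111) holds 1.
and4 = [0] * 16
and4[15] = 1

# Reprogramming the same LUT as a 4-input OR: every location except 0 holds 1.
or4 = [0] + [1] * 15

print(lut_read(and4, (1, 1, 1, 1)), lut_read(and4, (1, 0, 1, 1)))
print(lut_read(or4, (0, 0, 0, 0)), lut_read(or4, (0, 0, 1, 0)))
```

Only the stored contents differ between the AND and the OR; the read logic is identical, which is why changing the function needs no change to the hardware structure.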

LOOK UP TABLES (LUTS)

A LUT contains memory cells to implement small logic functions
Each cell holds 0 or 1
Programmed with the outputs of the truth table
Inputs select the content of one of the cells as output
3-input LUT -> 8 memory cells
Configured by re-programmable SRAM (static random access memory) cells
3 to 6 inputs

[Figure: SRAM cells feeding a multiplexer (MUX).]

CONFIGURING LUT

A LUT is a RAM with a data width of 1 bit.
The contents are programmed at power-up.
Logic functions are implemented in the look-up table.
Multiplexers select 1 of N inputs.

[Figure: truth table of the required function and the programmed LUT.]
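A rough model of the two steps, filling the memory cells from the truth table of the required function and then selecting one cell with multiplexers, might look like this for a 3-input LUT. The function names and the majority-vote example are hypothetical.

```python
from itertools import product

def mux2(sel, d0, d1):
    """2-to-1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def program_lut3(function):
    """Fill the 8 memory cells with the truth-table outputs of `function`."""
    return [function(a, b, c) for a, b, c in product((0, 1), repeat=3)]

def read_lut3(cells, a, b, c):
    """Select one of the 8 cells with a three-level tree of 2-to-1 multiplexers."""
    level1 = [mux2(c, cells[i], cells[i + 1]) for i in range(0, 8, 2)]
    level2 = [mux2(b, level1[i], level1[i + 1]) for i in range(0, 4, 2)]
    return mux2(a, level2[0], level2[1])

# Hypothetical required function: majority of three inputs.
def majority(a, b, c):
    return int(a + b + c >= 2)

cells = program_lut3(majority)
print(cells)
print(read_lut3(cells, 1, 0, 1))
```

The multiplexer tree is fixed; only `cells` changes when a different function is required.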

LUT (LOOK-UP TABLE) FUNCTIONALITY

[Figure: two LUTs with inputs x1, x2, x3, x4 and output y, each shown next to its truth table.]

x1 x2 x3 x4 | y (first LUT) | y (second LUT)
 0  0  0  0 |       1       |       0
 0  0  0  1 |       1       |       1
 0  0  1  0 |       1       |       0
 0  0  1  1 |       1       |       0
 0  1  0  0 |       1       |       0
 0  1  0  1 |       1       |       1
 0  1  1  0 |       1       |       0
 0  1  1  1 |       1       |       1
 1  0  0  0 |       1       |       0
 1  0  0  1 |       1       |       1
 1  0  1  0 |       1       |       0
 1  0  1  1 |       1       |       0
 1  1  0  0 |       0       |       1
 1  1  0  1 |       0       |       1
 1  1  1  0 |       0       |       0
 1  1  1  1 |       0       |       0

In the first LUT, y depends only on x1 and x2 (y = NOT(x1 AND x2)).

Look-up tables are the primary elements for logic implementation.
Each LUT can implement any function of 4 inputs.

DISTRIBUTED RAM

CLB LUT configurable as distributed RAM
A LUT equals a 16x1 RAM
Implements single and dual ports
Cascade LUTs to increase RAM size
Synchronous write
Synchronous/asynchronous read
Accompanying flip-flops used for synchronous read

[Figure: LUT-based RAM primitives and their pins.]
RAM16X1S: D, WE, WCLK, A0-A3
RAM32X1S: D, WE, WCLK, A0-A4
RAM16X2S: D0, D1, WE, WCLK, A0-A3, O0, O1
RAM16X1D: D, WE, WCLK, A0-A3, DPRA0-DPRA3, SPO, DPO
SHIFT REGISTER

Each LUT can be configured as a shift register
  Serial in, serial out
Dynamically addressable delay of up to 16 cycles
For programmable pipelines
Cascade for greater cycle delays
Use CLB flip-flops to add depth

[Figure: LUT drawn as a chain of D flip-flops with clock enable (CE); inputs IN, CE, CLK and DEPTH[3:0], output OUT.]
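A behavioural sketch of the LUT-as-shift-register idea: a 16-bit serial-in chain whose output tap is chosen at run time, giving a programmable delay. The class name and the convention that depth 0 means a one-cycle delay are assumptions of this model.

```python
class ShiftRegisterLut:
    """Toy 16-bit shift register with a dynamically addressable tap."""

    def __init__(self):
        self.bits = [0] * 16          # bits[0] is the most recent sample

    def clock(self, data_in, ce=1):
        """Rising clock edge: shift in one bit when clock enable is high."""
        if ce:
            self.bits = [data_in & 1] + self.bits[:-1]

    def out(self, depth):
        """Tap selected by DEPTH[3:0]: depth 0 is a 1-cycle delay, 15 is 16 cycles."""
        return self.bits[depth & 0xF]

srl = ShiftRegisterLut()
srl.clock(1)                 # a single 1 enters the register
for _ in range(3):
    srl.clock(0)             # three more clocks push it along
print(srl.out(3), srl.out(2))
```

Changing `depth` between reads changes the delay without reloading the register, which is what makes the structure useful for programmable pipelines.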

CARRY & CONTROL LOGIC

[Figure: slice containing two look-up tables (inputs G1-G4 and F1-F4), each followed by carry and control logic and a storage element; other signals shown include CIN, COUT, CLK, CE, SR, BY, F5IN, X, XB, Y and YB.]

FAST CARRY LOGIC

Each CLB contains separate logic and routing for the fast generation of sum and carry signals
  Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
Carry logic is independent of normal logic and routing resources

[Figure: carry logic routing running from the LSB to the MSB.]

CLB SLICE STRUCTURE

Each slice contains two sets of the following:
Four-input LUT
  Any 4-input logic function,
  or 16-bit x 1 sync RAM,
  or 16-bit shift register
Carry & control
  Fast arithmetic logic
  Multiplier logic
  Multiplexer logic
Storage element
  Latch or flip-flop
  Set and reset
  True or inverted inputs
  Sync. or async. control

BLOCK RAM

Most efficient memory implementation
  Dedicated blocks of memory
Ideal for most memory requirements
  4 to 104 memory blocks
  18 kbits = 18,432 bits per block (16 kbits without parity bits)
  Use multiple blocks for larger memories
Builds both single and true dual-port RAMs

[Figure: Spartan-3 dual-port block RAM with Port A and Port B.]

18 X 18 MULTIPLIER

Embedded 18-bit x 18-bit multiplier
  2's complement signed operation
Multipliers are organized in columns
Fast arithmetic functions
  Optimized to implement multiply/accumulate modules

[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits).]
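The arithmetic performed by such a block, a signed two's-complement multiply of two 18-bit operands into a 36-bit result, can be modelled as below. The helper names are hypothetical, and the 48-bit accumulator width in `mac` is an assumption chosen for the example rather than something stated above.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mult18x18(data_a, data_b):
    """Signed 18-bit x 18-bit multiply, returned as a 36-bit pattern."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)

def mac(acc, data_a, data_b, acc_bits=48):
    """One multiply-accumulate step; acc_bits is an assumed accumulator width."""
    total = to_signed(acc, acc_bits) + to_signed(mult18x18(data_a, data_b), 36)
    return total & ((1 << acc_bits) - 1)

minus_one = 0x3FFFF                       # -1 as an 18-bit pattern
print(to_signed(mult18x18(minus_one, 3), 36))
print(mac(mac(0, 2, 5), minus_one, 4))
```

A 36-bit output is wide enough for every operand pair: the largest magnitude, (-2^17) x (-2^17) = 2^34, still fits in a signed 36-bit value.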

IOB FUNCTIONALITY
The IOB provides the interface between the package pins and the CLBs
Each IOB can work as uni- or bi-directional I/O
Outputs can be forced into high impedance
Inputs and outputs can be registered
  advised for high-performance I/O
Inputs can be delayed

FPGA DESIGN FLOW

TRANSLATING A DESIGN TO AN FPGA

[Figure: RTL (e.g. C = A+B) translated to a circuit with inputs A and B, then to the array.]

CAD tools that translate a circuit from a text description to a physical implementation are well understood.
Most current FPGA designers use a register-transfer level specification (allocation and scheduling).
Same basic steps as ASIC design.

DESIGN PROCESS

Specification (spec)
RTL (e.g. C = A+B)
VHDL/Verilog description (source files)

  module RC5( clock, reset, encr_decr,
              data_input, data_output,
              out_full, key_input,
              key_read
            );
  ..

Functional simulation
Synthesis (produces the netlist)
Post-synthesis simulation

[Figure: circuit with inputs A and B mapped onto the array.]

DESIGN PROCESS (2)

Implementation (mapping, placing & routing)
Timing simulation
Configuration
On-chip testing

DESIGN PROCESS IN DETAIL

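Logic Synthesis
Verilog description

  // Port list inferred from the signals used in the body; 1-bit widths assumed.
  module MLU(
      input  wire A, B,
      input  wire NEG_A, NEG_B, NEG_Y,
      input  wire L1, L0,
      output wire Y
  );
      // Optionally invert each operand.
      wire A1 = (NEG_A == 0) ? A : ~A;
      wire B1 = (NEG_B == 0) ? B : ~B;

      // Candidate results; MUX_3 is assumed to be XNOR (the original repeated XOR).
      wire MUX_0 = A1 & B1;
      wire MUX_1 = A1 | B1;
      wire MUX_2 = A1 ^ B1;
      wire MUX_3 = ~(A1 ^ B1);

      // {L1, L0} selects the operation.
      reg Y1;
      always @* begin
          case ({L1, L0})
              2'd0: Y1 = MUX_0;
              2'd1: Y1 = MUX_1;
              2'd2: Y1 = MUX_2;
              2'd3: Y1 = MUX_3;
          endcase
      end

      // Optionally invert the result.
      assign Y = (NEG_Y == 0) ? Y1 : ~Y1;
  endmodule

Circuit netlist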

FEATURES OF SYNTHESIS TOOLS


Interpret RTL code
Produce synthesized circuit netlist in a standard EDIF format
Give preliminary performance estimates
Some can display circuit schematics corresponding to EDIF netlist

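Mapping
[Figure: netlist mapped onto look-up tables LUT0 to LUT5 and flip-flops FF1 and FF2.]

Placing
[Figure: mapped elements assigned to CLB slices in the FPGA.]

Routing
[Figure: programmable connections made between the placed elements in the FPGA.]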

CONFIGURATION OF SRAM BASED FPGAS

Once a design is implemented, you must create a file that the FPGA can understand (.bit extension).

The BIT file can be downloaded directly to the FPGA, or converted into a PROM file that stores the programming information.

Millions of SRAM cells hold the LUT contents and the interconnect routing.
SRAM is volatile memory: the configuration is lost when board power is turned off.
Keep the bit pattern describing the SRAM cells in non-volatile memory, e.g. a PROM or a digital camera card.

Configuration takes on the order of seconds.

[Figure: JTAG port used for programming the bit file and for JTAG testing.]

USER CONSTRAINTS

The user constraints file (UCF) contains various constraints for Xilinx:
Clock period
Circuit locations
Pin locations

Every port in the top-level unit needs a pin assignment in the UCF.

CONSTRAINTS
NET "CLOCK" LOC = "V10" | IOSTANDARD = "LVCMOS33";
NET "SEG<0>" LOC = "T17" | IOSTANDARD = "LVCMOS33";
NET "SEG<1>" LOC = "T18" | IOSTANDARD = "LVCMOS33";
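Since every top-level port needs a pin assignment, a small script can generate the constraint lines and flag missing ports before the tools are run. This is a hypothetical helper written for illustration, not part of any Xilinx tool; the port `SEG<2>` is invented to show a missing assignment.

```python
def ucf_line(net, pin, iostandard="LVCMOS33"):
    """Format one pin-location constraint in the style shown above."""
    return f'NET "{net}" LOC = "{pin}" | IOSTANDARD = "{iostandard}";'

def missing_pins(top_level_ports, pins):
    """Return the top-level ports that have no pin assignment yet."""
    return sorted(set(top_level_ports) - set(pins))

# Pin assignments taken from the constraints above.
PINS = {"CLOCK": "V10", "SEG<0>": "T17", "SEG<1>": "T18"}

for net, pin in PINS.items():
    print(ucf_line(net, pin))

# "SEG<2>" is a made-up port used only to demonstrate the check.
print(missing_pins(["CLOCK", "SEG<0>", "SEG<1>", "SEG<2>"], PINS))
```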

VIRTEX 5 CLB ARCHITECTURE

LUT BASED FULL ADDER DESIGN

1 bit Full Adder

4 bit Full Adder
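One way to picture a LUT-based full adder is as two 3-input look-up tables, one holding the sum column of the truth table and one holding the carry column, with four such stages chained into a 4-bit adder. The sketch below uses that arrangement as an assumption; it is not taken from the figures.

```python
from itertools import product

# Truth tables indexed by address = 4*a + 2*b + cin (a 3-input LUT has 8 cells).
SUM_LUT = [a ^ b ^ cin for a, b, cin in product((0, 1), repeat=3)]
CARRY_LUT = [int(a + b + cin >= 2) for a, b, cin in product((0, 1), repeat=3)]

def full_adder(a, b, cin):
    """1-bit full adder: both outputs are plain look-ups at the same address."""
    address = (a << 2) | (b << 1) | cin
    return SUM_LUT[address], CARRY_LUT[address]

def add4(x, y, cin=0):
    """4-bit ripple-carry adder; returns (4-bit sum, carry out)."""
    total, carry = 0, cin
    for i in range(4):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry

print(SUM_LUT, CARRY_LUT)
print(add4(9, 8))
```

In a real slice the carry would normally travel on the dedicated carry chain rather than through a second LUT; the model ignores that distinction.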

PIPELINING IDEA

PIPELINE SOLUTION

Each flip-flop stage can operate at a faster rate than before, but the result becomes valid after TWO clocks.
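The trade-off can be put into numbers with a small calculation. The stage delays below are hypothetical, and register setup and clock-to-output overheads are ignored to keep the sketch short.

```python
def max_clock_mhz(stage_delays_ns):
    """The clock rate is limited by the slowest stage between two registers."""
    return 1000.0 / max(stage_delays_ns)

# Hypothetical delays: one 10 ns block of logic split into 6 ns + 4 ns stages.
unpipelined = [10.0]
pipelined = [6.0, 4.0]

for name, stages in (("unpipelined", unpipelined), ("pipelined", pipelined)):
    f = max_clock_mhz(stages)
    latency_ns = len(stages) * 1000.0 / f
    print(f"{name}: {f:.1f} MHz, result valid after "
          f"{len(stages)} clock(s) = {latency_ns:.1f} ns")
```

With these numbers the pipelined version clocks faster and accepts a new input every cycle, while each individual result takes two clocks and slightly longer in absolute time.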
