
Digital Systems Design Using VHDL

Designing with FPGAs


Basic Programmable Components in FPGAs
▪ Programmable logic blocks
▪ Programmable interconnect
▪ Programmable I/O blocks
Programmable Logic Block Architecture
▪ Different Components
  ➢ Look-Up Table Based (4- or 5-variable LUTs)
  ➢ Programmable logic device (PLD) Based
  ➢ Multiplexer Based
  ➢ NAND gate based
  ➢ Transistor based
▪ Different granularity/different amount of control
▪ Different names
  ➢ CLBs/LEs/LABs/Tiles/VersaTiles
Granularity
▪ Granularity: the logic block architecture correlates with the granularity of a device, which in turn relates to the effort required to complete the wiring between blocks (routing channels)
▪ Classification of FPGAs based on granularity
  (I) Low granularity
  (II) Medium granularity
  (III) High granularity
LUT Based Programmable Logic Block Example
[Figure: LUT-based programmable logic block. M: mode selection, direct or buffered output]
Implementing a Boolean Function with LUTs
F1 = A’B’C + AB

LUT contents:
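The table itself did not survive on this slide. As a minimal sketch, assuming a 3-input LUT addressed by ABC with A as the most significant bit (the entity name `lut3_f1` and the constant `LUT` are illustrative, not from the slides), the contents for F1 could be modeled like this:

```vhdl
library IEEE;
use IEEE.numeric_bit.all;

entity lut3_f1 is
  port(A, B, C: in bit;
       F1: out bit);
end lut3_f1;

architecture table of lut3_f1 is
  -- Assumed addressing: index = ABC read as a binary number (A is the MSB).
  -- F1 = A'B'C + AB is 1 for ABC = 001 (A'B'C) and for ABC = 110, 111 (AB).
  constant LUT: bit_vector(0 to 7) := "01000011";
begin
  -- The inputs only select one of the eight stored bits.
  F1 <= LUT(to_integer(unsigned'(A & B & C)));
end table;
```

Changing the function only changes the eight stored bits; the lookup structure stays the same.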
IMPLEMENT 4-TO-1 MUX USING FPGA
▪ M = S1’S0’I0 + S1’S0I1 + S1S0’I2 + S1S0I3
IMPLEMENT 4-TO-1 MUX USING FPGA
▪ X = M1 = S0’I0 + S0I1
▪ Y = M2 = S0’I2 + S0I3
▪ M = S1’M1 + S1M2
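The three equations above can be sketched directly in VHDL. This is only an illustration: the entity name `mux4_lut` is made up here, and a synthesis tool is free to map the logic differently from one function generator per signal assignment.

```vhdl
entity mux4_lut is
  port(I0, I1, I2, I3, S0, S1: in bit;
       M: out bit);
end mux4_lut;

architecture three_functions of mux4_lut is
  signal M1, M2: bit;
begin
  -- Each assignment is a function of three variables,
  -- so each one fits in a single 3- or 4-input LUT.
  M1 <= (not S0 and I0) or (S0 and I1);
  M2 <= (not S0 and I2) or (S0 and I3);
  M  <= (not S1 and M1) or (S1 and M2);
end three_functions;
```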
IMPLEMENT 4-TO-1 MUX USING FPGA
Inputs                                 Output
X4    X3 (S0)    X2 (I1)    X1 (I0)    X
x     0          0          0          0
x     0          0          1          1
x     0          1          0          0
x     0          1          1          1
x     1          0          0          0
x     1          0          1          0
x     1          1          0          1
x     1          1          1          1
IMPLEMENT 4-TO-1 MUX USING FPGA (3-input & 4-input LUTs)
IMPLEMENT 4-TO-1 MUX USING FPGA
▪ Expensive to Create 4-to-1 MUX using LUTs
  ➢ 48 SRAM Cells Required in 1st Design
    • Three 4-input LUTs
    • 16 SRAM Cells per LUT
  ➢ 40 SRAM Cells Required in 2nd Design
    • 16 Cells for X
    • 16 Cells for Y
    • 8 Cells for Z
SHIFT REGISTER USING FPGA
3-TO-8 DECODER USING FPGA
▪ How Many Programmable Logic Blocks to Implement 3-to-8 Decoder?
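One way to reason about the question: each of the eight outputs is a separate function of the three inputs, so eight function generators are needed, and the number of logic blocks then depends on how many LUTs one block contains. A behavioral sketch of the decoder follows; the entity and port names are illustrative.

```vhdl
library IEEE;
use IEEE.numeric_bit.all;

entity decoder3to8 is
  port(A: in unsigned(2 downto 0);
       Y: out bit_vector(0 to 7));
end decoder3to8;

architecture behav of decoder3to8 is
begin
  process(A)
  begin
    -- Default all outputs to 0, then raise the one selected by A.
    -- Every Y(i) is its own 3-variable function of A.
    Y <= (others => '0');
    Y(to_integer(A)) <= '1';
  end process;
end behav;
```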
Shannon decomposition
▪ What is the use of Shannon decomposition?
SHANNON DECOMPOSITION
▪ Z(a,b,c,d,e,f) = a’Z(0,b,c,d,e,f) + aZ(1,b,c,d,e,f) = a’Z0 + aZ1
Shannon’s Expansion theorem
▪ Decomposes functions of a large number of variables into functions of fewer variables
▪ Offers a general decomposition technique for any function
▪ For realizing any 6-variable function
Z(a,b,c,d,e,f) = a’Z(0,b,c,d,e,f) + aZ(1,b,c,d,e,f) = a’Z0 + aZ1
SHANNON DECOMPOSITION
▪ Z(a,b,c,d,e,f) = abcd’ef’ + a’b’c’def’ + b’cde’f
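Applying the expansion about a to this function gives two 5-variable cofactors and a 2-to-1 MUX. The sketch below works that out in VHDL; the entity name `shannon_z` and the signal names Z0 and Z1 follow the notation of the slides but are otherwise illustrative.

```vhdl
entity shannon_z is
  port(a, b, c, d, e, f: in bit;
       Z: out bit);
end shannon_z;

architecture decomposed of shannon_z is
  signal Z0, Z1: bit;
begin
  -- Z0 = Z(0,b,c,d,e,f): the term abcd'ef' vanishes when a = 0.
  Z0 <= (not b and not c and d and e and not f) or
        (not b and c and d and not e and f);
  -- Z1 = Z(1,b,c,d,e,f): the term a'b'c'def' vanishes when a = 1.
  Z1 <= (b and c and not d and e and not f) or
        (not b and c and d and not e and f);
  -- 2-to-1 MUX controlled by a: Z = a'Z0 + aZ1.
  Z <= (not a and Z0) or (a and Z1);
end decomposed;
```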
SHANNON DECOMPOSITION
▪ If Only 4-Variable Function Generators Available
  ➢ Can Recursively Apply Shannon Decomposition
    • Z(a,b,c,d,e,f) = a’b’Z(0,0,c,d,e,f) + a’bZ(0,1,c,d,e,f) + ab’Z(1,0,c,d,e,f) + abZ(1,1,c,d,e,f)
    • Z(a,b,c,d,e,f) = a’b’Z00 + a’bZ01 + ab’Z10 + abZ11
SHANNON DECOMPOSITION
▪ Z = abcd’ef’ + a’b’c’def’ + b’cde’f
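As a worked example, the four 4-variable cofactors of this Z can be found by substituting each combination of a and b into the three product terms (a term drops out whenever a or b contradicts its literal):

```latex
\begin{align*}
Z_{00} &= Z(0,0,c,d,e,f) = c'def' + cde'f \\
Z_{01} &= Z(0,1,c,d,e,f) = 0 \\
Z_{10} &= Z(1,0,c,d,e,f) = cde'f \\
Z_{11} &= Z(1,1,c,d,e,f) = cd'ef'
\end{align*}
```

Each cofactor depends only on c, d, e and f, so each fits in one 4-variable function generator; the remaining logic selects among them using a and b.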
SHANNON DECOMPOSITION
▪ Any 7-Variable Function Can Be Realized with 6 or Fewer LUT5s
  ➢ Z(a,b,c,d,e,f,g) = a’b’Z00 + a’bZ01 + ab’Z10 + abZ11
  ➢ Z = c’de’fg + bcd’e’fg’ + a’c’def’g + a’b’d’ef’g’ + ab’defg’
SHANNON DECOMPOSITION
▪ Shannon Decomposition
  ➢ Decomposes an n-Variable Function into
    • Two (n-1)-Variable Functions
    • A 2-to-1 MUX
▪ Expensive to Implement MUXes with LUTs
▪ Some FPGAs Provide MUXes in Addition to LUT4s
SHANNON DECOMPOSITION
▪ 7-Variable Function Generator
  ➢ Two 6-Variable Generators + 2-to-1 MUX
  ➢ Four 5-Variable Generators + Three 2-to-1 MUXes
  ➢ Eight 4-Variable Generators + Seven 2-to-1 MUXes
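The pattern in these counts generalizes. As a sketch, if an n-variable function is expanded repeatedly until only k-variable generators remain (the symbols n and k are introduced here for illustration), each expansion step doubles the number of generators and adds one MUX per pair:

```latex
\begin{align*}
\text{generators} &= 2^{\,n-k} \\
\text{2-to-1 MUXes} &= 2^{\,n-k} - 1 \\[4pt]
n = 7,\ k = 4:&\quad 2^{3} = 8 \text{ generators},\quad 2^{3} - 1 = 7 \text{ MUXes}
\end{align*}
```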
7-VARIABLE FUNCTION USING LUT4
XILINX SPARTAN
▪ Xilinx Spartan
  ➢ Provides MUXes in Addition to LUT4
  ➢ Logic Unit Called “Slice”
▪ 7-Variable Function
  ➢ Requires 4 Slices
PARITY FUNCTION
F = A ⊕ B ⊕ C ⊕ D ⊕ E
▪ How Many LUT4s?

▪ How Many Slices?
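One possible answer, offered as a sketch rather than the slide's own solution: split the 5-input XOR into a 4-variable function and a 2-variable function, which needs two LUT4s. Assuming a slice holds two LUT4s (consistent with the 7-variable function needing 8 LUT4s in 4 slices), that would fit in one slice. The entity name `parity5` and signal `P` are illustrative.

```vhdl
entity parity5 is
  port(A, B, C, D, E: in bit;
       F: out bit);
end parity5;

architecture two_lut of parity5 is
  signal P: bit;
begin
  -- First function generator: 4-variable parity.
  P <= A xor B xor C xor D;
  -- Second function generator: 2-variable function of P and E.
  F <= P xor E;
end two_lut;
```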


CARRY CHAINS
▪ Adder Circuit
  ➢ One LUT4 Needed for Each SUM and CARRY Bit
    • N-Bit Adder Requires 2N LUT4s
▪ Very Commonly Used
▪ Some FPGAs Provide Built-In Carry Chain
  • N-Bit Adder Requires N LUT4s
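A behavioral adder is the usual way to give a synthesis tool the chance to use a carry chain. The sketch below is a 4-bit example with illustrative names; whether the dedicated carry logic is actually used depends on the tool and the target device, which the slides do not specify.

```vhdl
library IEEE;
use IEEE.numeric_bit.all;

entity adder4 is
  port(A, B: in unsigned(3 downto 0);
       Sum: out unsigned(3 downto 0);
       Cout: out bit);
end adder4;

architecture behav of adder4 is
  signal S5: unsigned(4 downto 0);
begin
  -- Extend both operands by one bit so the carry out is kept.
  S5   <= ('0' & A) + ('0' & B);
  Sum  <= S5(3 downto 0);
  Cout <= S5(4);
end behav;
```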
CASCADE CHAINS
EXAMPLES OF FPGA LOGIC BLOCKS
▪ Examples of Logic Blocks in Commercial FPGAs
  ➢ Xilinx Spartan/Virtex Slice
    • 4-Variable LUTs
  ➢ Altera Logic Element
    • 4-Variable LUTs
  ➢ Actel Fusion VersaTile
    • MUXes
XILINX VIRTEX SLICE
ALTERA LOGIC ELEMENT
ACTEL FUSION VERSATILE
MEMORY IN FPGAs
▪ Early FPGAs
  ➢ No Dedicated Memory
  ➢ Only LUT-Based Memory
  ➢ Typically Interfaced to External Memory Chips

▪ Modern FPGAs
  ➢ Include Dedicated Memory
    • Xilinx Calls It “Block RAM”
    • Altera Calls It “TriMatrix”
  ➢ Some Have ECC Bits
    • Some Manufacturers Count the ECC Bits in the Memory Size
SIZE OF DEDICATED RAM
FPGA Family          Dedicated RAM size (Kbits)    Organization
Xilinx Virtex 5      1152 - 10368                  64 - 576 18Kb blocks
Xilinx Virtex 4      864 - 9936                    48 - 552 18Kb blocks
Xilinx Virtex-II     72 - 3024                     4 - 168 18Kb blocks
Xilinx Spartan 3E    72 - 648                      4 - 36 18Kb blocks
Altera Stratix II    409 - 9163                    104 - 930 512b blocks
                                                   78 - 768 4Kb blocks
                                                   0 - 9 512Kb blocks
Altera Cyclone II    117 - 1125                    26 - 250 4Kb blocks
Lattice SC           1054 - 7987                   56 - 424 18Kb blocks
Actel Fusion         27 - 270                      6 - 60 4Kb blocks
LUT-BASED RAM
▪ Memory in LUTs (aka “Distributed Memory”)
  ➢ Can Be Used for Small Amounts of Memory
  ➢ 4-Variable LUT Contains 16 Bits of RAM
  ➢ Two LUT4s
    • Create 32 x 1 Memory
      – 5 Address Lines
      – MUX Select on LUTs (MSB Address Line)
    • Create 16 x 2 Memory
      – 4 Address Lines
      – 2 Data Lines
LUT-BASED RAM
▪ Asynchronous – Data Ready After Access Time
▪ Synchronous – Combine with Logic Block Flip-Flops
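A small memory of this kind is usually described behaviorally. The following is a sketch of a 32 x 1 memory with a clocked write and an asynchronous read; all names are illustrative, and whether a tool maps it onto LUT-based RAM rather than flip-flops or block RAM depends on the tool and device.

```vhdl
library IEEE;
use IEEE.numeric_bit.all;

entity lut_ram32x1 is
  port(CLK, WE, Din: in bit;
       Addr: in unsigned(4 downto 0);
       Dout: out bit);
end lut_ram32x1;

architecture behav of lut_ram32x1 is
  -- 32 bits of storage: the capacity of two LUT4s.
  signal mem: bit_vector(0 to 31);
begin
  -- Write on the rising clock edge when WE is high.
  process(CLK)
  begin
    if CLK'event and CLK = '1' then
      if WE = '1' then
        mem(to_integer(Addr)) <= Din;
      end if;
    end if;
  end process;

  -- Asynchronous read: data follows the address after the access time.
  Dout <= mem(to_integer(Addr));
end behav;
```

Registering Dout in a clocked process would give the synchronous variant, using the flip-flops of the logic block.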
LUT-BASED RAM
FPGA Family          LUT Based RAM (Kb)    No. of LUTs
Xilinx Virtex 5      320 - 3420            19200 - 207360
Xilinx Virtex 4      96 - 987              12288 - 126336
Xilinx Virtex-II     8 - 1456              512 - 93184
Xilinx Spartan 3E    15 - 231              1920 - 29504
Altera Stratix II    195 - 2242            12480 - 143520
Altera Cyclone II    72 - 1069             4608 - 68416
Lattice SC           245 - 1884            15200 - 115200
Lattice ECP2         12 - 136              6000 - 68000
DEDICATED MULTIPLIERS IN FPGAs
▪ Multiplication
  ➢ Common Operation
  ➢ Uses Lots of Logic Blocks
  ➢ Slow with Reconfigurable Logic
▪ FPGAs Commonly Include Dedicated Multipliers
  ➢ Xilinx Virtex/Spartan and Altera Stratix/Cyclone
    • Have 18x18 Multipliers Producing 36-Bit Product
DEDICATED MULTIPLIERS IN FPGAs
▪ For Multiplication of Numbers Larger than 18 Bits
  ➢ Several Dedicated Multipliers Can Be Put Together
  ➢ If A and B are 32-Bit Numbers
    • A = C × 2^16 + D
    • B = E × 2^16 + F
    • AB = CE × 2^32 + (DE + CF) × 2^16 + DF
    • Use Four Multipliers to Generate CE, DE, CF, DF
    • Use Adders to Combine
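The decomposition above can be written out explicitly. This is a sketch only: the entity name `mult32_split` and the signal names are illustrative, and it shows the arithmetic rather than how any particular tool arranges its dedicated multipliers.

```vhdl
library IEEE;
use IEEE.numeric_bit.all;

entity mult32_split is
  port(A, B: in unsigned(31 downto 0);
       P: out unsigned(63 downto 0));
end mult32_split;

architecture partial of mult32_split is
  signal C, D, E, F: unsigned(15 downto 0);
  signal CE, CF, DE, DF: unsigned(31 downto 0);
begin
  -- Split each operand into 16-bit halves: A = C*2^16 + D, B = E*2^16 + F.
  C <= A(31 downto 16);  D <= A(15 downto 0);
  E <= B(31 downto 16);  F <= B(15 downto 0);

  -- Four 16 x 16 partial products, one per multiplier.
  CE <= C * E;  CF <= C * F;
  DE <= D * E;  DF <= D * F;

  -- Widen to 64 bits before shifting and adding so no carries are lost.
  P <= shift_left(resize(CE, 64), 32)
     + shift_left(resize(CF, 64), 16)
     + shift_left(resize(DE, 64), 16)
     + resize(DF, 64);
end partial;
```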
VHDL INFERRING DEDICATED MULTIPLIERS
▪ Synthesis Results in Using 4 Dedicated Multipliers
  ➢ 64 Inputs and 64 Outputs
library IEEE;
use IEEE.numeric_bit.all;

-- 32 x 32 unsigned multiplier with a 64-bit product
entity multiplier is
  port(A, B: in unsigned(31 downto 0);
       C: out unsigned(63 downto 0));
end multiplier;

architecture mult of multiplier is
begin
  -- The "*" operator lets synthesis infer the dedicated multipliers
  C <= A * B;
end mult;
COST OF PROGRAMMABILITY
Vendor    Device Family    Device       # of Config Bits    # of Logic Blocks    # of LUTs    Usable I/O Pins
Xilinx    Virtex-5         XC5VLX30     8.4M                4,800                19,200       400
                           XC5VLX330    79.7M               51,840               207,360      1200
Xilinx    Virtex-II        XC2V40       0.3M                256                  512          88
                           XC2V8000     26.2M               46,592               93,184       1108
Xilinx    Spartan 3E       XC3S100E     0.6M                960                  1,920        108
                           XC3S1600E    6.0M                14,752               29,504       376
Altera    Stratix II       EP2S15       4.7M                6,240                12,480       366
                           EP2S180      49.8M               71,760               143,520      1170
Altera    Stratix          EP1S10       3.5M                10,570               10,570       426
                           EP1S80       23.8M               79,040               79,040       1238
Altera    Cyclone II       EP2C5        1.3M                4,608                4,608        158
                           EP2C70       14.3M               68,416               68,416       622
ONE-HOT STATE ASSIGNMENT
▪ Minimizing FPGA Resources
  ➢ Reducing LUTs Often More Important Than Reducing Flip-flops
  ➢ One-Hot State Assignment Simplifies Logic
    • T0 = 1000, T1 = 0100, T2 = 0010, T3 = 0001
ONE-HOT STATE ASSIGNMENT
▪ Q3+ = X1Q0Q1’Q2’Q3’ + X2Q0’Q1Q2’Q3’ + X3Q0’Q1’Q2Q3’ + X4Q0’Q1’Q2’Q3
▪ Since One-Hot, Not Necessary to Include Q’ Terms
  ➢ Q3+ = X1Q0 + X2Q1 + X3Q2 + X4Q3
▪ Each Product Term in Next State and Output Logic
  • Contains Exactly One State Variable
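The simplification relies on exactly one state bit being 1 at any time. As a sketch, the two forms of the Q3+ equation can be placed side by side and compared in simulation over the four one-hot states. The entity name `onehot_q3` and the output names are illustrative; the other next-state equations are not given on the slides and are not shown.

```vhdl
entity onehot_q3 is
  port(X1, X2, X3, X4: in bit;
       Q: in bit_vector(0 to 3);
       Q3_full, Q3_simple: out bit);
end onehot_q3;

architecture compare of onehot_q3 is
begin
  -- Full form: every product term decodes the complete state.
  Q3_full <= (X1 and Q(0) and not Q(1) and not Q(2) and not Q(3)) or
             (X2 and not Q(0) and Q(1) and not Q(2) and not Q(3)) or
             (X3 and not Q(0) and not Q(1) and Q(2) and not Q(3)) or
             (X4 and not Q(0) and not Q(1) and not Q(2) and Q(3));

  -- Simplified form: one state variable per product term.
  -- Equal to Q3_full only while Q is a valid one-hot code.
  Q3_simple <= (X1 and Q(0)) or (X2 and Q(1)) or
               (X3 and Q(2)) or (X4 and Q(3));
end compare;
```

The two outputs can differ for codes that are not one-hot, which is why the reset behavior needs care.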
ONE-HOT STATE ASSIGNMENT
▪ Complications for Reset
  ➢ Reset Requires Making All Flip-Flops 0 Except One
  ➢ Issue for FPGAs Without Flip-Flop Preset Capability
    • E.g. Xilinx 3000 Series

▪ Solutions
  ➢ Replace 1000 with 0000
    • Replace Q0 with Q0’ in Equations
  ➢ Reset to 0000 and Immediate Transition to 1000
    • Add Term Q0’Q1’Q2’Q3’ in Q0+
FPGA CAPACITY
▪ Determining Gate Count for FPGAs
  ➢ Not Straightforward
    • Some FPGAs Use LUTs Rather Than Gates
    • Highly Dependent on Circuit Being Implemented

▪ Equivalent Gate Count
  ➢ Size of ASIC Design that Can Fit in FPGA
    • Depends on Circuit Type and Interconnections
FPGA CAPACITY
▪ Approximate Equivalent Gate Count
  ➢ Determine Size of Circuits One Logic Block Can Implement
    • Multiply By Number of Logic Blocks
  ➢ Will Be Higher Than for Practical Circuits
▪ Better Gate Count Estimate
  ➢ Use PREP Benchmark Circuits
  ➢ If Benchmark Circuit Requires 2000 Gates in ASIC
    • See How Many Fit in FPGA and Multiply By 2000
      – e.g., If 20 Copies Fit, Then Gate Count Is 40K
  ➢ Will Still Be Higher Than for Practical Circuits
    • Since No Interconnections Between Copies
FPGA CAPACITY
▪ Typical Gate Count
  ➢ Some Vendors Multiply Maximum Gate Count by Some Weighting Factor

▪ System Gates
  ➢ Typically Designs Will Use Some LUTs as Memory
  ➢ Vendors Often Compute System Gates By Assuming Some Fraction of Logic Blocks Used as RAMs (e.g., 20-30%)
