R.B.Ghongade
Key Terms
Field-Programmable Device (FPD): a general term that refers to any type of integrated circuit used for implementing digital hardware, where the chip can be configured by the end user to realize different designs. Programming of such a device often involves placing the chip into a special programming unit, but some chips can also be configured in-system. Another name for FPDs is programmable logic devices (PLDs); although PLDs encompass the same types of chips as FPDs, we prefer the term FPD because historically the word PLD has referred to relatively simple types of devices.
PLA: a Programmable Logic Array (PLA) is a relatively small FPD that contains two levels of logic, an AND-plane and an OR-plane, where both levels are programmable.
Key Terms
PAL: a Programmable Array Logic (PAL) is a relatively small FPD that has a programmable AND-plane followed by a fixed OR-plane.
SPLD: refers to any type of Simple PLD, usually either a PLA or PAL.
CPLD: a more Complex PLD that consists of an arrangement of multiple SPLD-like blocks on a single chip. Alternative names sometimes adopted for this style of chip are Enhanced PLD (EPLD), Super PAL, Mega PAL, and others.
FPGA: a Field-Programmable Gate Array is an FPD featuring a general structure that allows very high logic capacity. Whereas CPLDs feature logic resources with a wide number of inputs (AND planes), FPGAs offer narrower logic resources. FPGAs also offer a higher ratio of flip-flops to logic resources than do CPLDs.
Key Terms
HCPLDs: high-capacity PLDs, a single acronym that refers to both CPLDs and FPGAs. This term has been coined in trade literature to provide an easy way to refer to both types of devices.
Interconnect: the wiring resources in an FPD.
Programmable Switch: a user-programmable switch that can connect a logic element to an interconnect wire, or one interconnect wire to another.
Logic Block: a relatively small circuit block that is replicated in an array in an FPD. When a circuit is implemented in an FPD, it is first decomposed into smaller sub-circuits that can each be mapped into a logic block. The term logic block is mostly used in the context of FPGAs, but it could also refer to a block of circuitry in a CPLD.
Key Terms
Logic Capacity: the amount of digital logic that can be mapped into a single FPD. This is usually measured in units of equivalent number of gates in a traditional gate array. In other words, the capacity of an FPD is measured by the size of gate array that it is comparable to. In simpler terms, logic capacity can be thought of as a number of 2-input NAND gates.
Logic Density: the amount of logic per unit area in an FPD.
Speed-Performance: measures the maximum operable speed of a circuit when implemented in an FPD. For combinational circuits, it is set by the longest delay through any path, and for sequential circuits it is the maximum clock frequency for which the circuit functions properly.
[Figure: classification of devices. Generic usage: FPD and ASIC. Typical usage, in order of increasing complexity: SPLD, CPLD, FPGA (FPD) and gate array, standard cell, full custom (ASIC). Device types shown: PLA, PAL, GAL, PROM, EPLD, E2PLD.]
[Figure: programmable logic device with inputs (logic variables) and outputs (logic functions)]
A programmable logic device consists of a set of inputs (the logic variables) and a set of outputs (the logic functions). The job of the designer is simply to program the switches and hence configure the logic gates to perform the desired function.
Y = AB + AC + BC
Thus, given a logic function in SOP form, it can be implemented by using AND and OR arrays. This forms the basic working principle of programmable logic devices.
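As a minimal sketch of this principle, the Python below models the AND array as a list of product terms and the OR array as a reduction over them. The names (AND_ARRAY, and_gate, sop, truth_table) are illustrative assumptions, and the function used is the three-term SOP Y = AB + AC + BC.

```python
from itertools import product

# Each product term maps an input name to the literal it needs:
# True = the input as-is, False = its complement.
# Inputs missing from a term are not connected to that AND gate.
AND_ARRAY = [
    {"A": True, "B": True},   # A.B
    {"A": True, "C": True},   # A.C
    {"B": True, "C": True},   # B.C
]

def and_gate(term, inputs):
    """One row of the AND array: true when every connected literal is satisfied."""
    return all(inputs[name] == wanted for name, wanted in term.items())

def sop(and_array, inputs):
    """OR array: the output is the OR of all product terms."""
    return any(and_gate(term, inputs) for term in and_array)

def truth_table(and_array, names):
    """Evaluate the SOP for every input combination, in binary counting order."""
    rows = []
    for values in product([False, True], repeat=len(names)):
        inputs = dict(zip(names, values))
        rows.append(int(sop(and_array, inputs)))
    return rows

TABLE = truth_table(AND_ARRAY, ["A", "B", "C"])
```

Changing the function only means changing the entries of AND_ARRAY, which is the software analogue of programming the links.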
Programming technologies
The type of link gives rise to two different technologies: fusible link and antifuse.
Fusible link technologies
[Figure: un-programmed device and programmed device (with blown fuses) implementing y=a.b']
Antifuse technologies
[Figure: un-programmed device (with antifuse links) and programmed device implementing y=a.b']
Simplified Antifuse
An antifuse is a microscopic column of amorphous (noncrystalline) silicon linking two metal tracks. In its unprogrammed state, the amorphous silicon acts as an insulator with a very high resistance, in excess of one billion ohms.
[Figure: antifuse in the un-programmed and programmed states]
Poly-diffusion Anti-fuse
An Oxide-Nitride-Oxide (ONO) dielectric normally prevents current from flowing between the diffusion and poly-silicon layers. When a programming pulse is applied, the dielectric melts and a circuit is formed between the diffusion and the poly-silicon.
Metal-Metal Anti-fuse
The link is an alloy of tungsten, titanium and silicon. The conductive link usually forms at the corner of the via, where the electric field is highest during programming.
Programming!
The act of programming this particular element effectively grows a link, known as a via, by converting the insulating amorphous silicon into conducting polysilicon. Devices based on antifuse technologies are one-time programmable (OTP), because once an antifuse has been grown, it cannot be removed. This is a severe limitation of the technology, but antifuse devices have found their way into space applications because of their high reliability.
PLD Notation
[Figure: PLD notation on inputs a, b, c, d, showing the symbols for a link, no link, and a non-programmable link (connection), with example product terms a.b'.d and a'.c']
PLA
[Figure: PLA with inputs a, b, c, d, product terms a.b.c'.d', a'.b'.c' and b'.c, and a programmable OR array producing outputs f1, f2, f3]
PLA QP82S100
m = 16 (inputs), n = 8 (outputs), p = 48 (product terms)
Consists of 16 dedicated inputs and 8 dedicated outputs.
Each output is capable of being actively controlled by any or all of the 48 product terms.
The True, Complement, or Don't Care condition of each of the 16 inputs can be ANDed together to comprise one product term.
All 48 product terms can be selectively ORed to each output.
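The organisation described above can be sketched as a small Python model. This is only an illustration under stated assumptions: the PLA class, its method names, and the '1'/'0'/'-' pattern encoding are hypothetical and are not the device's programming format. The defaults use the 16 inputs, 8 outputs and 48 product terms quoted above; the small instance uses the three product terms a.b.c'.d', a'.b'.c' and b'.c with an arbitrary, assumed assignment to outputs.

```python
TRUE, COMPLEMENT, DONT_CARE = "1", "0", "-"

class PLA:
    """Hypothetical PLA model: programmable AND plane plus programmable OR plane."""

    def __init__(self, n_inputs=16, n_outputs=8, n_terms=48):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        self.n_terms = n_terms
        self.and_plane = []   # one pattern string per product term
        self.or_plane = []    # one set of driven output indices per product term

    def add_term(self, pattern, outputs):
        """Program one product term and OR it into the given outputs."""
        if len(self.and_plane) >= self.n_terms:
            raise ValueError("all product terms are already in use")
        if len(pattern) != self.n_inputs or set(pattern) - {TRUE, COMPLEMENT, DONT_CARE}:
            raise ValueError("pattern needs one of '1', '0', '-' per input")
        if any(not 0 <= out < self.n_outputs for out in outputs):
            raise ValueError("output index out of range")
        self.and_plane.append(pattern)
        self.or_plane.append(set(outputs))

    def evaluate(self, inputs):
        """inputs is a sequence of 0/1 values; returns a list of 0/1 outputs."""
        result = [0] * self.n_outputs
        for pattern, outputs in zip(self.and_plane, self.or_plane):
            if all(p == DONT_CARE or int(p) == bit for p, bit in zip(pattern, inputs)):
                for out in outputs:
                    result[out] = 1
        return result

# Small 4-input example (inputs ordered a, b, c, d); output assignment is assumed.
small = PLA(n_inputs=4, n_outputs=3, n_terms=3)
small.add_term("1100", [0])      # a.b.c'.d'
small.add_term("000-", [0, 1])   # a'.b'.c'
small.add_term("-01-", [1, 2])   # b'.c
```

Because the OR plane is also programmable, one product term (here a'.b'.c') can be shared by several outputs without being duplicated.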
PLA QP82S100
PAL
[Figure: PAL with inputs a, b, c, d and a fixed OR array producing outputs f1, f2, f3]
Additional Features
Tri-state outputs
Give programmable bi-directional pins and save pin count.
Registered outputs
Enable the use of the PAL in finite state machines.
Increase the versatility of the device.
Macrocell
PAL16L8A
Specifications
Part Number = PAL16L8A
Description = Programmable array logic device
Fuse type = titanium-tungsten
Manufacturer = Texas Instruments
Number of Inputs = Up to 16
Prod. Terms Max. = 64
No. of Outputs = Up to 8
Nom. Supp (V) = 5.0
Package = DIP, leadless ceramic chip carrier (FK)
Pins = 20
Technology = Advanced Low-Power Schottky
Bi-directional pins = 6
Reprogrammable PLDs
The basic (and most severe) limitation of fusible link and antifuse technologies is that the device cannot be re-programmed. This may be a severe shortcoming, especially during the development phases of a system.
EPROM
An EPROM transistor has the same basic structure as a standard MOS transistor, but with the addition of a second polysilicon floating gate isolated by layers of oxide.
[Figure: cross-sections of a MOS transistor (source, gate and drain terminals; SiO2 over Si) and an EPROM transistor (source, control gate and drain terminals, with a floating gate)]
EPROM
In its un-programmed state, the floating gate is uncharged and doesn't affect the normal operation of the control gate. To program the transistor, a relatively high voltage on the order of 12 V is applied between the control gate and drain terminals. This causes the transistor to be turned hard on, and excited electrons push through the oxide into the floating gate in a process known as hot (high energy) electron injection. When the programming signal is removed, a negative charge remains on the floating gate. This charge is very stable and will not dissipate for more than a decade under normal operating conditions. The stored charge on the floating gate inhibits the normal operation of the control gate, and thus distinguishes those cells that have been programmed from those that have not.
EPROM
E2PROM
An E2PROM cell is approximately 2.5 times larger than an EPROM cell because it contains two transistors. One of the transistors is similar to that of an EPROM transistor in that it contains a floating gate, but the insulating oxide layers surrounding the floating gate are very much thinner. The second transistor can be used to erase the cell electrically, and E2PROM devices can typically be erased and reprogrammed on a word-by-word basis.
E2PROM
FLASH
The name FLASH was originally coined to reflect the technology's rapid erasure times compared to EPROM. Some FLASH devices can be electrically erased, but only by erasing the whole device or a large portion of it. Other architectures have a two-transistor cell which is very similar to that of an E2PROM cell, allowing them to be erased and reprogrammed on a word-by-word basis.
FLASH
SRAM
It consists of two cross-coupled inverters and two access transistors. The SRAM cell drives the gates of other transistors on the chip, either ON to make a connection or OFF to break the connection. The access transistors are connected to the READ/WRITE line at their respective gate terminals, and to the DATA lines at their source/drain terminals. The READ/WRITE line is used to select the cell, while the DATA lines are used to perform read or write operations on the cell. Internally, the cell holds the stored value on one side and its complement on the other side. To store data, READ/WRITE is set to 1 (5 V), and the NMOS access transistor passes the data from its left-hand side to its right-hand side. After the data stabilizes around the two NOT gates, READ/WRITE is set to 0, and the data is held in the loop for as long as power is applied. Note that the lower NOT is labeled WEAK, meaning it has weaker transistors. This lets the STRONG NOT override the WEAK one when new data is written and the logic level has to change.
SRAM Cell
SRAM
SRAM cells are used for the following:
1. They can store a logic value of 0 or 1.
2. They can store a value of an LUT.
3. They configure the interconnection switches of the FPGA.
Type | Re-programmable | Volatile | Technology | Used in
Fusible link | No | No | Bipolar |
Anti-fuse | No | No | CMOS | FPGA
EPROM | Yes | No | UVCMOS |
E2PROM | Yes | No | EECMOS |
SRAM | Yes | Yes | CMOS | FPGA
FLASH | Yes | No | CMOS | FPGA
Technology: Antifuse
Advantages:
Highest density: a mere cross point, 10X the density of SRAM
Lowest switch resistance: 25 ohms
Very low capacitance: 1 fF per node, approaching the metal line capacitance
Non-volatile
Nearly impossible to reverse engineer
Radiation hard
Live within 1 millisecond of the power supply reaching spec voltage
Easy for software to place and route
Limitations:
Requires a programmer
Requires a socket, a problem for devices with more than 200 pins (solved with BGA)
Those who design by test will throw out a lot of parts
Requires one to two transistors per wire for programming: about 10 mA for metal antifuses; ONO antifuses require less, only 5 mA, so they can be programmed from the edge
Some antifuse defects are not testable until programming, hence only 98% to 99% programming yield, but 100% functional
Requires high voltages
About the same speed as SRAM
Radiation hardness is expected to behave similarly to EPROM; it has not been tested yet
Technology: FLASH
Advantages:
Re-programmable in the board
No socket
Non-volatile
One transistor instead of 6 for routing control, i.e. denser parts
Passes full Vcc without a pump
Live at power up
Difficult to reverse engineer
CPLD
Key Terms
CPLD: Complex programmable logic device. A programmable logic device consisting of several interconnected programmable blocks.
Logic Array Block (LAB): A group of macrocells that share common resources in a CPLD.
Programmable Interconnect Array (PIA): An internal bus with programmable connections that link together the Logic Array Blocks of a CPLD.
Buried logic: Logic circuitry in a PLD that has no connection to the input or output pins of the PLD, but is used solely as internal logic.
I/O Control Block: A circuit in a CPLD that controls the type of tri-state switching used in a macrocell output.
Key Terms
Parallel logic expanders: Product terms that are borrowed from neighbouring macrocells in the same LAB.
Shared logic expanders: Product terms that are inverted and fed back into the programmable AND matrix of an LAB for use by any other macrocell in the LAB.
Specifications: There are several performance specifications for complex programmable logic devices.
Internal frequency is the speed at which CPLDs can perform operations or transfer data internally.
The propagation delay is the time interval between the application of an input signal and the occurrence of the corresponding output in a logic circuit.
Speed grade indicates the delay in nanoseconds (ns) through a macrocell in the CPLD. For example, a CPLD with a speed grade of 10 has a delay of 10 ns through a macrocell. CPLDs with low speed-grade numbers run faster than devices with high speed-grade numbers.
CPLD
The term complex PLD (CPLD) is generally taken to refer to a class of devices that contain a number of simple PLA or PAL functions (generically referred to as simple PLDs, or SPLDs) that share a common programmable interconnection matrix.
Thus CPLDs consist of multiple SPLD-like blocks on a single chip. However, CPLD products are much more sophisticated than SPLDs, even at the level of their basic SPLD-like blocks. While each manufacturer has a different variation, in general they are all similar in that they consist of function blocks, input/output blocks, and an interconnect matrix. The devices are programmed using programmable elements that, depending on the technology of the manufacturer, can be EPROM cells, EEPROM cells, or Flash EPROM cells.
Some tricks!
Using XOR gate as programmable NOT gate
[Figure: XOR gate on the output of a logic circuit, used as a programmable NOT gate]
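The trick relies on the identity x XOR 0 = x and x XOR 1 = NOT x. A minimal Python sketch, where the function name and the idea of a single stored program bit are illustrative assumptions:

```python
def programmable_inverter(signal, program_bit):
    """XOR gate: passes `signal` unchanged when program_bit is 0, inverts it when 1."""
    return signal ^ program_bit

# Truth table as (signal, program_bit, output) rows.
XOR_TABLE = [(s, p, programmable_inverter(s, p)) for p in (0, 1) for s in (0, 1)]
```

With the program bit stored in a programmable cell, the same logic circuit can drive an active-HIGH or an active-LOW output without any change to the circuit itself.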
Some tricks!
Using MUX as programmable switch
4:1 MUX
Programmable Cells
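A rough Python sketch of the idea: the select lines of a 4:1 MUX are tied to two programmable cells, so the cell contents decide once, at configuration time, which source reaches the output. The function names and the (s1, s0) ordering are assumptions made for illustration.

```python
def mux4(data, select):
    """4:1 multiplexer: `data` holds four bits, `select` is the pair (s1, s0)."""
    s1, s0 = select
    return data[(s1 << 1) | s0]

def programmable_switch(sources, cells):
    """Route one of four sources to the output; `cells` are the two programmed bits."""
    return mux4(sources, cells)

SOURCES = [0, 1, 1, 0]                           # four candidate signals
ROUTED = programmable_switch(SOURCES, (1, 0))    # cells = 10 selects source 2
```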
Packages
Device number
Example: EPM7128SLC84, made up of the fields EPM7, 128, S (in-system programmable) and LC84
Architecture
The MAX 7000 architecture includes the following elements:
Logic array blocks (LAB)
Macrocells
Expander product terms (shareable and parallel)
Programmable interconnect array
I/O control blocks
Global Clock
Architecture
Macrocell
Macrocell
The macrocell is similar to that of a GAL or Universal PAL in that it provides a sum-of-products function with active-HIGH or active-LOW options and the choice of registered or combinational output. Registered outputs can be clocked with one of two global clocks or by a product term from the AND matrix. The register can be cleared globally or by a product term, and preset with a product term. The macrocell has five dedicated product terms, which is fewer than found in the PAL and GAL. This is generally sufficient to implement most logic functions. If more terms are required, they can be supplied by a set of shared logic expanders or parallel logic expanders.
Shareable Expanders
Shareable Expanders
Shared logic expanders do not add more product terms to a given macrocell. They do make the programming of the entire LAB more efficient by allowing a product term to be programmed once and used in several macrocells of the same LAB. One product term per macrocell is inverted and fed back into the shared expander pool of product terms. Since there are 16 macrocells per LAB, the shared logic expander pool has up to 16 product terms
Parallel Expanders
Parallel Expanders
Parallel logic expanders allow a macrocell to borrow up to 15 product terms from its three lower-numbered neighbours (5 product terms per neighbouring macrocell). For example, macrocell 4 can borrow up to 5 terms each from macrocells 3, 2, and 1. By using its 5 dedicated product terms and the maximum number of parallel expanders, a macrocell can have up to 20 product terms at its disposal. These borrowed terms are not usable by the macrocell from which they were borrowed. The parallel expanders are set up so that a lower-numbered cell lends product terms to a higher-numbered cell, so the number of available terms depends on how close to the end of a chain a macrocell is.
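The arithmetic above can be sketched in Python. This is a simplified model under an assumption not spelled out in the text: macrocells are numbered from 1 along a single lending chain, so a cell at position n has n - 1 lower-numbered neighbours, of which at most three can lend. The names are illustrative.

```python
DEDICATED_TERMS = 5   # product terms owned by each macrocell
MAX_LENDERS = 3       # lower-numbered neighbours that can lend their terms

def max_product_terms(position):
    """Upper bound on product terms for the macrocell at a 1-based chain position."""
    if position < 1:
        raise ValueError("position is 1-based")
    lenders = min(MAX_LENDERS, position - 1)
    return DEDICATED_TERMS + DEDICATED_TERMS * lenders

TERMS_BY_POSITION = [max_product_terms(n) for n in range(1, 6)]
```

The bound ignores competition: terms lent to one macrocell are no longer available to their owner, so not every macrocell in a chain can reach its maximum at the same time.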
PIA
Logic is routed between LABs via the programmable interconnect array (PIA). This global bus is a programmable path that connects any signal source to any destination on the device. All MAX 7000 dedicated inputs, I/O pins, and macrocell outputs feed the PIA, which makes the signals available throughout the entire device. Only the signals required by each LAB are actually routed from the PIA into the LAB. An EEPROM cell controls one input to a 2-input AND gate, which selects a PIA signal to drive into the LAB. While the routing delays of channel-based routing schemes in masked gate arrays or FPGAs are cumulative, variable, and path-dependent, the MAX 7000 PIA has a fixed delay. The PIA thus eliminates skew between signals and makes timing performance easy to predict.
I/O Block
I/O Block
The I/O control block allows each I/O pin to be individually configured for input, output, or bidirectional operation. All I/O pins have a tri-state buffer that is individually controlled by one of the global output enable signals or directly connected to ground or VCC. The I/O control block of EPM7032, EPM7064, and EPM7096 devices has two global output enable signals that are driven by two dedicated active-low output enable pins (OE1 and OE2). The I/O control block of MAX 7000E and MAX 7000S devices has six global output enable signals that are driven by the true or complement of two output enable signals, a subset of the I/O pins, or a subset of the I/O macrocells
I/O Control
I/O Block
When the tri-state buffer control is connected to ground, the output is tri-stated (high impedance) and the I/O pin can be used as a dedicated input. When the tri-state buffer control is connected to VCC, the output is enabled. The MAX 7000 architecture provides dual I/O feedback, in which macrocell and pin feedbacks are independent. When an I/O pin is configured as an input, the associated macrocell can be used for buried logic
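A small behavioural sketch of the tri-state control in Python. The control names ('GND', 'VCC', 'GLOBAL_OE'), the use of None for high impedance, and the assumption that a global output enable value of 1 enables the buffer are all illustrative choices, not taken from the device documentation.

```python
HIGH_Z = None   # stands in for a high-impedance (undriven) pin

def tri_state_buffer(data, enable):
    """Drive `data` onto the pin when enabled, otherwise leave the pin undriven."""
    return data if enable else HIGH_Z

def io_pin_output(macrocell_out, control, global_oe=0):
    """Model the three ways the buffer control can be connected."""
    if control == "GND":          # output tri-stated, pin usable as a dedicated input
        enable = 0
    elif control == "VCC":        # output always enabled
        enable = 1
    elif control == "GLOBAL_OE":  # output follows a global output enable signal
        enable = global_oe
    else:
        raise ValueError("unknown control source")
    return tri_state_buffer(macrocell_out, enable)
```

When the function returns HIGH_Z the macrocell output is cut off from the pin, which is why the macrocell can then be used for buried logic.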
Output Configuration
MultiVolt I/O Interface
MAX 7000 device outputs can be programmed to meet a variety of system-level requirements. MAX 7000 devices, except 44-pin devices, support the MultiVolt I/O interface feature, which allows them to interface with systems that have differing supply voltages. The 5.0-V devices in all packages can be set for 3.3-V or 5.0-V I/O pin operation. These devices have one set of VCC pins for internal operation and input buffers (VCCINT), and another set for I/O output drivers (VCCIO).
Output Configuration
Open-Drain Output Option (MAX 7000S Devices Only)
This open-drain output enables the device to provide system-level control signals (e.g., interrupt and write enable signals) that can be asserted by any of several devices. It can also provide an additional wired-OR plane
Output Configuration
Slew-Rate Control
The output buffer for each MAX 7000E and MAX 7000S I/O pin has an adjustable output slew rate that can be configured for low-noise or high-speed performance. A faster slew rate provides high-speed transitions for high-performance systems. However, these fast transitions may introduce noise transients into the system. A slow slew rate reduces system noise, but adds a nominal delay of 4 to 5 ns.
More packages
VQFP: Very Fine Pitch Quad Flat Pack/ Very Thin Quad Flat Package
Device marking
Features
High performance: 5 ns pin-to-pin logic delays on all pins, fCNT to 125 MHz
Large density range: 36 to 288 macrocells with 800 to 6,400 usable gates
5 V in-system programmable, with an endurance of 10,000 program/erase cycles
Enhanced pin-locking architecture
Flexible 36V18 Function Block: 90 product terms drive any or all of 18 macrocells within the Function Block; global and product-term clocks, output enables, set and reset signals
Extensive IEEE Std 1149.1 boundary-scan (JTAG) support
Slew rate control on individual outputs
User-programmable ground pin capability
Extended pattern security features for design protection
High-drive 24 mA outputs
3.3 V or 5 V I/O capability
Advanced CMOS 5 V FLASH technology
Supports parallel programming of multiple XC9500 devices
XC9500 Architecture
Function Blocks
Function Blocks
The AND plane still exists as shown by the crossing wires. The AND plane can accept inputs from the I/O blocks, other function blocks, or feedback from the same function block. The terms are then ORed together using a fixed number of OR gates, and terms are selected via a large multiplexer. The outputs of the mux can then be sent straight out of the block, or through a clocked flip-flop. This particular block includes additional logic such as a selectable exclusive OR and a master reset signal, in addition to being able to program the polarity at different stages
Function Blocks
Each Function Block comprises 18 independent macrocells, each capable of implementing a combinatorial or registered function. The FB also receives global clock, output enable, and set/reset signals. The FB generates 18 outputs that drive the Fast CONNECT switch matrix. These 18 outputs and their corresponding output enable signals also drive the IOB. Logic within the FB is implemented using a sum-of-products representation. Thirty-six inputs provide 72 true and complement signals into the programmable AND-array to form 90 product terms. Any number of these product terms, up to the 90 available, can be allocated to each macrocell by the product term allocator.
XC9500 macrocell
[Figure: XC9500 macrocell between the switch matrix and the IOB, annotated with: set control; programmable inversion or XOR product term; up to 5 product terms; global clock or product-term clock; reset control; OE control]
I/O blocks
Output Banking: The output pins are grouped in large banks which allow easy interfacing to 3.3V, 2.5V, 1.8V, and 1.5V in a single part. Thus these CPLDs can be widely used as voltage interface translators
DataGate
DataGate
DataGATE is used for power reduction. Each I/O pin has a series switch that can block the arrival of free-running signals that are not of interest. Signals that serve no use may increase power consumption, and can be disabled. Users are free to do their design, then choose sections to participate in the DataGATE function. DataGATE is a logic function that drives an assertion rail threaded through the medium and high-density CoolRunner-II CPLD parts. Designers can select inputs to be blocked under the control of the DataGATE function, effectively blocking controlled switching signals so they do not drive internal chip capacitances. Output signals that do not switch are held by the bus hold feature. Any set of input pins can be chosen to participate in the DataGATE function.
Choice of CPLD
When considering a CPLD for use in a design, the following issues should be taken into account :
1. The programming technology
EPROM, EEPROM, or Flash EPROM? This will determine the equipment needed to program the devices and whether they can be programmed only once or many times.
2. The function block capability
How many function blocks are there in the device?
How many product and sum terms can be used?
What are the minimum and maximum delays through the logic?
What additional logic resources are there, such as XNORs, ALUs, etc.?
What kind of register controls are available (e.g., clock enable, reset, preset, polarity control)?
How many inputs are local to the function block and how many are global, chip-wide inputs?
What kind of clock drivers are in the device, and what is the worst-case skew of the clock signal on the chip? This will help determine the maximum frequency at which the device can run.
3. The I/O capability
How many I/O are independent, usable for any function, and how many are dedicated to clock input, master reset, etc.?
What is the output drive capability in terms of voltage levels and current?
What kind of logic is included in an I/O block that can be used to increase the functionality of the design?
FPGA
Key terms
Look-up table (LUT): A circuit that implements a combinational logic function by storing a list of output values that correspond to all possible input combinations.
CLB: Configurable Logic Block is the name for the programmable logic block in an FPGA.
Logic element (LE): A circuit internal to an FPGA used to implement a logic function as a look-up table.
Cascade chain: A circuit in an FPGA that allows the input width of a Boolean function to expand beyond the width of one logic element.
Carry chain: A circuit in an FPGA that is optimized for efficient operation of carry functions between logic elements.
DCM: Digital clock manager is a very important circuit that offers various clock management functions in an FPGA.
Clock trees: Distribution of clock signal lines along the FPGA architecture.
FPGA architecture
Programmable logic block
Programmable interconnect
The FPGA is often described in terms of its fabric, meaning the underlying structure of the device.
Programming
FPGAs can use any one of the following programming technologies:
SRAM
Antifuse
FLASH
Hybrid FLASH-SRAM
FPGA fabric
Types of architectures
Fine grained
Each programmable logic block can be used to implement only a very simple function. For example, it might be possible to configure the block to act as any 3-input function, such as a primitive logic gate (AND, OR, NAND, etc.) or a storage element (D-type flip-flop, D-type latch, etc.). Fine-grained architectures are said to be particularly efficient when executing systolic algorithms (functions that benefit from massively parallel implementations). Fine-grained implementations require a relatively large number of connections into and out of each block compared to the amount of functionality that can be supported by those blocks.
Types of architectures
Coarse grained
In the case of a coarse-grained architecture, each logic block contains a relatively large amount of logic compared to its fine-grained counterparts. For example, a logic block might contain four 4-input LUTs, four multiplexers, four D-type flip-flops, and some fast carry logic. As the granularity of the blocks increases to medium-grained and higher, the number of connections into the blocks decreases compared to the amount of functionality they can support.
MUX-based
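This is based on Shannon's decomposition theorem, which states: let f be a switching function of n variables a1, ..., an. Then f can be factored about any variable ai as
f(a) = ai'.f1 + ai.f2
where f1 is f with ai fixed at 0 and f2 is f with ai fixed at 1.
For example, take y = a.b + c:
a b c | y
0 0 0 | 0
0 0 1 | 1
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1
y1 = c (the rows with a = 0)
y2 = b + c (the rows with a = 1)
y = a'.(c) + a.(b + c)
Example
The same step can be applied again to y2 = b + c, this time about b:
b c | y2
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1
y2 = b'.y3 + b.y4 = b'.(c) + b.(1), with y3 = c and y4 = 1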
MUX implementation
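As a hedged illustration of how the decomposition maps onto multiplexers, the Python sketch below computes the cofactors of the example function y = a.b + c and rebuilds it from two 2:1 MUXes. The helper names (cofactor, mux2, y_mux) are assumptions for this sketch; the MUX convention used is that select = 0 passes the first data input.

```python
from itertools import product

def cofactor(f, index, value):
    """Return f with its argument number `index` fixed to `value`."""
    def g(*rest):
        args = list(rest)
        args.insert(index, value)
        return f(*args)
    return g

def mux2(select, in0, in1):
    """2:1 multiplexer: in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0

def y(a, b, c):
    return (a & b) | c

y1 = cofactor(y, 0, 0)   # a = 0, reduces to c
y2 = cofactor(y, 0, 1)   # a = 1, reduces to b + c

def y_mux(a, b, c):
    """y built only from MUXes: the outer MUX selects on a, the inner one on b."""
    return mux2(a, c, mux2(b, c, 1))

# Check the MUX network against the original function for every input combination.
MATCHES = all(y(a, b, c) == y_mux(a, b, c) for a, b, c in product((0, 1), repeat=3))
```

The inner MUX implements y2 = b'.(c) + b.(1) and the outer MUX implements y = a'.(y1) + a.(y2), so each level of decomposition costs one 2:1 MUX.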
LUT-based
An n-input LUT can implement any possible n-input combinational function. The underlying concept behind a LUT is relatively simple. A group of input signals is used as an index (pointer) to a lookup table. The contents of this table are arranged such that the cell pointed to by each input combination contains the desired value.
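The indexing idea can be sketched in a few lines of Python. The function names and the choice of treating the first input as the most significant index bit are assumptions of this sketch, not a description of any particular device.

```python
def make_lut(function, n):
    """Fill a 2**n-entry table by evaluating `function` on every input combination."""
    table = []
    for index in range(2 ** n):
        # Unpack the index into n bits, first input = most significant bit.
        bits = [(index >> (n - 1 - i)) & 1 for i in range(n)]
        table.append(function(*bits) & 1)
    return table

def lut_read(table, *inputs):
    """Use the input bits as an index (pointer) into the table."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]

# A 3-input LUT programmed with y = a.b + c.
LUT3 = make_lut(lambda a, b, c: (a & b) | c, 3)
```

Implementing a different function changes only the table contents; the read path stays the same, which is what makes the LUT a general-purpose logic element.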
LUT implementation
# of LUTs?
It has been found statistically that a 4-input LUT is the best choice for FPGA devices. One additional advantage of a LUT-based programmable block is that the SRAM cells forming the LUT can be used as a small block of RAM (the 16 cells forming a 4-input LUT, for example, could be used as a 16 x 1 RAM). This is referred to as distributed RAM. In addition, all the SRAM cells are effectively connected in a chain in order to facilitate programming, and this offers the further possibility of using the chain as a shift register. Because of all these advantages, the majority of today's FPGA architectures are LUT based.
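The three uses of the same 16 cells can be sketched as one small Python class. The class and method names are hypothetical, and the shift direction (new bit enters at cell 0, old bit leaves from cell 15) is an assumption made for illustration.

```python
class Lut4Cells:
    """The 16 storage cells behind a 4-input LUT (illustrative model)."""

    def __init__(self):
        self.cells = [0] * 16

    def read(self, address):
        """LUT lookup or 16 x 1 RAM read: the 4 inputs form the address."""
        return self.cells[address]

    def write(self, address, bit):
        """Distributed-RAM write into one cell."""
        self.cells[address] = bit & 1

    def shift_in(self, bit):
        """Shift-register mode: move every cell one place along the chain.

        Returns the bit that falls off the far end of the chain.
        """
        out = self.cells[-1]
        self.cells = [bit & 1] + self.cells[:-1]
        return out

ram = Lut4Cells()
ram.write(0b1010, 1)       # store a 1 at address 10

sr = Lut4Cells()
for b in (1, 0, 1, 1):     # shift four bits into the chain
    sr.shift_in(b)
```

A bit shifted in reappears at the output after 16 further shifts, so in this mode the LUT behaves as a 16-stage delay line.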
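Xilinx devices
[Figure: timeline of Xilinx device families with time-axis marks at 1985, 1987, 1991, 1995 and 2006 and process labels 0.35 µm, 0.3 µm, 0.25 µm, 0.22 µm, 0.18 µm, 0.13 µm, 90 nm and 65 nm; the high-performance families panel carries the label "LX: Logic"]
Family | Clock | Capacity
Virtex-5 | 550 MHz | 24M gates*
Virtex-II Pro | 450 MHz | 8M gates*
Virtex-II | 450 MHz | 8M gates
Virtex-4 | 500 MHz | 16M gates*
Virtex-E | 240 MHz | 4M gates
Virtex | 200 MHz | 1M gates
Spartan | 80 MHz | 40K gates
Spartan-II | 200 MHz | 200K gates
XC4000 | 100 MHz | 250K gates
XC3000 | 85 MHz | 7.5K gates
XC5200 | 50 MHz | 23K gates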
[Figure: logic cell made up of a 4-input LUT, a MUX, and a flip-flop driven by a clock, with output y]
The core building block in a modern FPGA from Xilinx is called a logic cell
Logic Cell
The register can be configured to act as a flip-flop or as a latch. The polarity of the clock (rising-edge triggered or falling-edge triggered) can be configured, as can the polarity of the clock enable and set/reset signals (active-high or active-low). In addition to the LUT, MUX, and register, the LC also contains other elements, including some special fast carry logic for use in arithmetic operations.
The Slice
A slice contains two LCs. Each logic cell's LUT, MUX, and register have their own data inputs and outputs; the slice has one set of clock, clock enable, and set/reset signals common to both logic cells.
Embedded RAM
Embedded RAM
A lot of applications require the use of memory, so FPGAs may include relatively large chunks of embedded RAM called block RAM. Depending on the architecture of the component, these blocks might be positioned around the periphery of the device, scattered across the face of the chip in relative isolation, or organized in columns. Each block of RAM can be used independently, or multiple blocks can be combined together to implement larger blocks. These blocks can be used for a variety of purposes, such as implementing standard single- or dual-port RAMs, first-in first-out (FIFO) functions and state machines
MAC
Clock trees
All of the synchronous elements inside an FPGA (for example, the registers configured to act as flip-flops inside the programmable logic blocks) need to be driven by a clock signal. Such a clock signal typically originates in the outside world, comes into the FPGA via a special clock input pin, and is then routed through the device and connected to the appropriate registers.
Clock trees
Clock managers
Some FPGA clock managers are based on phase-locked loops (PLLs), while others are based on digital delay-locked loops
Jitter removal
Skew correction
General-purpose I/O
I/O
Each bank can be configured individually to support a particular I/O standard. This allows the FPGA to work with devices using multiple I/O standards. The FPGA can also be used to interface between different I/O standards (and to translate between different protocols that may be based on particular electrical standards).
Core voltages
Gigabit transceivers
The traditional way to move large amounts of data between devices is to use a bus, a collection of signals that carry similar data and perform a common function. Buses grew to 16 bits in width, then 32 bits, then 64 bits, and so forth. The problem is that this requires a lot of pins on the device and a lot of tracks connecting the devices together. Routing these tracks so that they all have the same length and impedance becomes increasingly difficult as boards grow in complexity. Furthermore, it becomes increasingly difficult to manage signal integrity issues (such as susceptibility to noise) when we are dealing with large numbers of bus-based tracks.
Today's high-end FPGAs include special hard-wired gigabit transceiver blocks. These blocks use one pair of differential signals (which means a pair of signals that always carry opposite logical values) to transmit (TX) data and another pair to receive (RX) data.
[Figure: grid of CLBs interleaved with Programmable Switch Matrix (PSM) blocks]
The Switch
The actual switching matrix employs a structure of six pass transistors per cross point. Thus connectivity can be established by controlling the transistors
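One way to see where the figure of six comes from, sketched in Python under an assumption the slide does not state explicitly: each cross point joins four wire segments (one per side), and there is one pass transistor for each pair of segments. The side names and the helper function are illustrative.

```python
from itertools import combinations

SIDES = ("north", "east", "south", "west")

# One pass transistor per pair of wire segments: C(4, 2) = 6.
PASS_TRANSISTORS = list(combinations(SIDES, 2))

def connected_groups(closed):
    """Given the transistors that are turned on, return the groups of joined wires."""
    groups = [{side} for side in SIDES]
    for a, b in closed:
        ga = next(g for g in groups if a in g)
        gb = next(g for g in groups if b in g)
        if ga is not gb:
            groups.remove(gb)   # merge the two groups into one
            ga |= gb
    return sorted(sorted(g) for g in groups)

STRAIGHT_THROUGH = connected_groups([("north", "south")])
```

Turning on a single transistor gives a straight-through or corner connection; turning on several joins three or four segments into one net.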
FPGA
R.B.Ghongade
Key terms
Look-up table (LUT): A circuit that implements a combinational logic function by storing a list of output values that correspond to all possible input combinations. CLB: Configurable Logic Block is the name for programmable logic block in a FPGA. Logic element (LE): A circuit internal to a FPGA used to implement a logic function as a look-up table. Cascade chain: A circuit in a FPGA that allows the input width of a Boolean function to expand beyond the width of one logic element. Carry chain: A circuit in a FPGA that is optimized for efficient operation of carry functions between logic elements. DCM: Digital clock manager is a very important circuit that offers various clock management functions in a FPGA. Clock trees: Distribution of clock signal lines along the FPGA architecture.
FPGA architecture
Programmable logic block
Programmable interconnect
Many times the FPGA is described in terms of the fabric which means the underlying structure of the device
Programming
FPGAs can use any one of the following programming technologies:
SRAM
Antifuse
FLASH
Hybrid FLASH-SRAM
FPGA fabric
Types of architectures
Fine grained
Each programmable logic block can be used to implement only a very simple function. For example, it might be possible to configure the block to act as any 3-input function, such as a primitive logic gate (AND, OR, NAND, etc.) or a storage element (D-type flip-flop, D-type latch, etc.). Fine-grained architectures are said to be particularly efficient when executing systolic algorithms (functions that benefit from massively parallel implementations). Fine-grained implementations require a relatively large number of connections into and out of each block compared to the amount of functionality that can be supported by those blocks.
Types of architectures
Coarse grained
In a coarse-grained architecture, each logic block contains a relatively large amount of logic compared to its fine-grained counterparts. For example, a logic block might contain four 4-input LUTs, four multiplexers, four D-type flip-flops, and some fast carry logic. As the granularity of the blocks increases to medium-grained and higher, the number of connections into the blocks decreases compared to the amount of functionality they can support.
MUX-based
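This is based on Shannon's decomposition theorem, which states: let f(a) be a switching function of n variables. Then f(a) can be factored about any variable a_i as

$$f(a) = \bar{a}_i\, f_1 + a_i\, f_2$$

where f_1 and f_2 are the functions obtained from f by setting a_i = 0 and a_i = 1, respectively.

$$y = ab + c$$

a  b  c  |  y
0  0  0  |  0
0  0  1  |  1
0  1  0  |  0
0  1  1  |  1
1  0  0  |  0
1  0  1  |  1
1  1  0  |  1
1  1  1  |  1

$$y_1 = c$$
$$y_2 = b + c$$
$$y = \bar{a}\,(c) + a\,(b + c)$$

Example

b  c  |  y
0  0  |  0
0  1  |  1
1  0  |  1
1  1  |  1

$$y_2 = \bar{b}\, y_3 + b\, y_4, \qquad y_3 = c, \quad y_4 = 1$$
$$y_2 = \bar{b}\,(c) + b\,(1)$$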
MUX implementation
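The decomposition of y = ab + c maps directly onto a tree of 2:1 multiplexers: one MUX selected by a chooses between the two cofactors, and the a = 1 cofactor (b + c) is itself a MUX selected by b. The following Python sketch checks this against the direct expression; the function names `mux`, `y_direct`, and `y_mux` are illustrative only.

```python
from itertools import product


def mux(sel, d0, d1):
    """2:1 multiplexer: returns d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0


def y_direct(a, b, c):
    """Reference implementation: y = ab + c."""
    return (a & b) | c


def y_mux(a, b, c):
    """Same function built only from 2:1 multiplexers."""
    y1 = c              # cofactor for a = 0
    y2 = mux(b, c, 1)   # cofactor for a = 1: b + c, decomposed about b
    return mux(a, y1, y2)


# Exhaustive comparison over all eight input combinations.
matches = all(y_mux(a, b, c) == y_direct(a, b, c)
              for a, b, c in product((0, 1), repeat=3))
```

Because any switching function can be decomposed this way one variable at a time, a block built from multiplexers can realize any function of its inputs, which is the idea behind MUX-based logic blocks.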
LUT-based
An n-input LUT can implement any possible n-input combinational function. The underlying concept behind a LUT is relatively simple: a group of input signals is used as an index (pointer) into a lookup table. The contents of this table are arranged such that the cell pointed to by each input combination contains the desired output value.
LUT implementation
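A minimal Python sketch of the lookup idea follows. The helpers `make_lut` and `lut_read` are hypothetical names, and the bit ordering (input 0 as the least significant index bit) is an assumption made for the example.

```python
def make_lut(func, n):
    """Precompute the 2**n table cells for an n-input Boolean function."""
    table = []
    for index in range(2 ** n):
        # bits[0] is input 0 (least significant bit of the index).
        bits = [(index >> i) & 1 for i in range(n)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *inputs):
    """Use the input signals as an index into the table."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return table[index]


# Configure a 3-input LUT as y = ab + c (inputs in the order a, b, c).
table = make_lut(lambda a, b, c: (a & b) | c, 3)
```

Evaluating the function is then a single indexed read, for instance `lut_read(table, 1, 1, 0)`. Changing the function means rewriting the table contents, not the circuit.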
How many LUT inputs?
Empirical studies have concluded that a 4-input LUT is generally the best choice for FPGA devices. One additional advantage of a LUT-based programmable block is that the SRAM cells forming the LUT can be used as a small block of RAM (the 16 cells forming a 4-input LUT, for example, could be used as a 16 x 1 RAM). This is referred to as distributed RAM. In addition, all the SRAM cells are effectively connected in a chain in order to facilitate programming, which offers the further possibility of using this chain as a shift register. Because of all these advantages, the majority of today's FPGA architectures are LUT-based.
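The three uses of the same 16 cells (logic, 16 x 1 RAM, shift register) can be sketched as follows. The class `Lut4Cells` and its methods are hypothetical; in particular, the write port and the direction of the shift chain are modelling assumptions, not a description of a specific device.

```python
class Lut4Cells:
    """The 16 SRAM cells behind one 4-input LUT."""

    def __init__(self, init=None):
        self.cells = list(init) if init is not None else [0] * 16
        if len(self.cells) != 16:
            raise ValueError("a 4-input LUT has exactly 16 cells")

    def read(self, addr):
        """LUT lookup or RAM read: the 4 inputs form the address."""
        return self.cells[addr]

    def write(self, addr, bit):
        """Distributed RAM mode: overwrite one cell at run time."""
        self.cells[addr] = bit & 1

    def shift_in(self, bit):
        """Shift-register mode: new bit enters cell 0, the oldest bit drops out."""
        self.cells = [bit & 1] + self.cells[:-1]
```

In shift-register mode, reading address n returns the bit that entered n + 1 shifts ago, so the same cells act as a delay line with a selectable tap.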
Xilinx devices
[Figure: Xilinx device families plotted by year (1985, 1987, 1991, 1995, 2006) and process technology (0.35, 0.3, 0.25, 0.22, 0.18, 0.13 µm, 90 nm, 65 nm); labels: High-performance families, LX, Logic]

Family           Clock      Gates
XC3000           85 MHz     7.5K
XC5200           50 MHz     23K
XC4000           100 MHz    250K
Spartan          80 MHz     40K
Spartan-II       200 MHz    200K
Virtex           200 MHz    1M
Virtex-E         240 MHz    4M
Virtex-II        450 MHz    8M
Virtex-II Pro    450 MHz    8M*
Virtex-4         500 MHz    16M*
Virtex-5         550 MHz    24M*
[Figure: logic cell comprising a 4-input LUT with output y, a MUX, and a flip-flop driven by the clock]
The core building block in a modern FPGA from Xilinx is called a logic cell (LC).
Logic Cell
The register can be configured to act as a flip-flop or as a latch. The polarity of the clock (rising-edge triggered or falling-edge triggered) can be configured, as can the polarity of the clock enable and set/reset signals (active-high or active-low). In addition to the LUT, MUX, and register, the LC also contains other elements, including some special fast carry logic for use in arithmetic operations.
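The configurable polarities can be illustrated with a small behavioural model. This is only a sketch: the class `ConfigurableFlipFlop`, its parameter names, and the omission of set/reset and latch mode are simplifying assumptions.

```python
class ConfigurableFlipFlop:
    """D flip-flop with configurable clock edge and clock-enable polarity."""

    def __init__(self, rising_edge=True, ce_active_high=True):
        self.rising_edge = rising_edge
        self.ce_active_high = ce_active_high
        self.q = 0
        self._prev_clk = 0

    def tick(self, clk, d, ce=1):
        """Apply new clock, data, and clock-enable levels; return the output q."""
        active_edge = (0, 1) if self.rising_edge else (1, 0)
        edge = (self._prev_clk, clk) == active_edge
        enabled = bool(ce) == self.ce_active_high
        if edge and enabled:
            self.q = d & 1
        self._prev_clk = clk
        return self.q
```

A rising-edge instance captures `d` when `clk` goes from 0 to 1, while an instance built with `rising_edge=False` ignores that transition and captures on the following 1 to 0 transition instead.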
The Slice
A slice contains two LCs. Each logic cell's LUT, MUX, and register have their own data inputs and outputs; the slice has one set of clock, clock enable, and set/reset signals common to both logic cells.
Embedded RAM
Many applications require memory, so FPGAs may include relatively large chunks of embedded RAM called block RAM. Depending on the architecture of the component, these blocks might be positioned around the periphery of the device, scattered across the face of the chip in relative isolation, or organized in columns. Each block of RAM can be used independently, or multiple blocks can be combined to implement larger blocks. These blocks can be used for a variety of purposes, such as implementing standard single- or dual-port RAMs, first-in first-out (FIFO) functions, and state machines.
MAC
Clock trees
All of the synchronous elements inside an FPGA (for example, the registers configured to act as flip-flops inside the programmable logic blocks) need to be driven by a clock signal. Such a clock signal typically originates in the outside world, comes into the FPGA via a special clock input pin, and is then routed through the device and connected to the appropriate registers.
Clock trees
Clock managers
Some FPGA clock managers are based on phase-locked loops (PLLs), while others are based on digital delay-locked loops (DLLs).
Jitter removal
Skew correction
General-purpose I/O
I/O
Each bank can be configured individually to support a particular I/O standard. This allows the FPGA to work with devices using multiple I/O standards; the FPGA can even be used to interface between different I/O standards (and to translate between different protocols that may be based on particular electrical standards).
Core voltages
Gigabit transceivers
FPGA II
R.B.Ghongade
Versatile I/O and packaging
  Low-cost packages available in all densities
  Family footprint compatibility in common packages
  16 high-performance interface standards
  Hot swap Compact PCI friendly
  Zero hold time simplifies system timing
Spartan II family

Device     Logic Cells   System Gates (Logic and RAM)   CLB Array (R x C)   Total CLBs   Maximum Available User I/O   Total Block RAM Bits
XC2S15     432           15,000                         8 x 12              96           86                           16K
XC2S30     972           30,000                         12 x 18             216          92                           24K
XC2S50     1,728         50,000                         16 x 24             384          176                          32K
XC2S100    2,700         100,000                        20 x 30             600          176                          40K
XC2S150    3,888         150,000                        24 x 36             864          260                          48K
XC2S200    5,292         200,000                        28 x 42             1,176        284                          56K
Available packages
Available User I/O According to Package Type

Device     Maximum    VQ100     TQ144     CS144     PQ208     FG256     FG456
           User I/O   VQG100    TQG144    CSG144    PQG208    FGG256    FGG456
XC2S15     86         60        86
XC2S30     92         60        92        92
XC2S50     176        -         92        -         140       176
XC2S100    176                  92                  140       176
XC2S150    260                                      140       176       260
XC2S200    284                                      140       176       284
Slice
BUFT
Each Spartan-II CLB contains two 3-state drivers (BUFTs) that can drive on-chip busses. Each Spartan-II BUFT has an independent 3-state control pin and an independent input pin.
Block RAM
Spartan-II FPGAs incorporate several large block RAM memories. These complement the distributed RAM Look-Up Tables (LUTs) that provide shallow memory structures implemented in CLBs. Block RAM memory blocks are organized in columns. All Spartan-II devices contain two such columns, one along each vertical edge. These columns extend the full height of the chip. Each memory block is four CLBs high, and consequently, a Spartan-II device eight CLBs high will contain two memory blocks per column, and a total of four blocks.
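The block count follows directly from the CLB array height. The short Python sketch below works through the arithmetic; the 4,096-bit block size is an assumption inferred from the family table (16K bits for the eight-row device with four blocks), and the function names are illustrative.

```python
BLOCK_RAM_COLUMNS = 2    # one column along each vertical edge
CLBS_PER_BLOCK = 4       # each memory block is four CLBs high
BITS_PER_BLOCK = 4096    # assumption: 16K bits / 4 blocks for the 8-row device


def block_ram_blocks(clb_rows):
    """Number of block RAMs in a device whose CLB array has clb_rows rows."""
    return BLOCK_RAM_COLUMNS * (clb_rows // CLBS_PER_BLOCK)


def total_block_ram_bits(clb_rows):
    """Total block RAM capacity in bits."""
    return block_ram_blocks(clb_rows) * BITS_PER_BLOCK
```

With these figures, an array 28 CLBs high gives 14 blocks and 57,344 bits, which is the 56K listed for the largest device in the family.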
Local Routing
Local routing resources provide the following three types of connections:
  Interconnections among the LUTs, flip-flops, and General Routing Matrix (GRM)
  Internal CLB feedback paths that provide high-speed connections to LUTs within the same CLB, chaining them together with minimal routing delay
  Direct paths that provide high-speed connections between horizontally adjacent CLBs, eliminating the delay of the GRM
Local Routing
I/O Routing
Spartan-II devices have additional routing resources around their periphery that form an interface between the CLB array and the IOBs. This additional routing, called the VersaRing, facilitates pin-swapping and pin-locking, such that logic redesigns can adapt to existing PCB layouts. Time-to-market is reduced, since PCBs and other system components can be manufactured while the logic design is still in progress.
Dedicated Routing
Some classes of signal require dedicated routing resources to maximize performance. In the Spartan-II architecture, dedicated routing resources are provided for two classes of signal.
  Horizontal routing resources are provided for on-chip 3-state busses. Four partitionable bus lines are provided per CLB row, permitting multiple busses within a row.
  Two dedicated nets per CLB propagate carry signals vertically to the adjacent CLB.
Global Routing
Global routing resources distribute clocks and other signals with very high fanout throughout the device. Spartan-II devices include two tiers of global routing resources, referred to as primary and secondary global routing resources. The primary global routing resources are four dedicated global nets with dedicated input pins that are designed to distribute high-fanout clock signals with minimal skew. Each global clock net can drive all CLB, IOB, and block RAM clock pins. The primary global nets may only be driven by global buffers. There are four global buffers, one for each global net. The secondary global routing resources consist of 24 backbone lines, 12 across the top of the chip and 12 across the bottom. From these lines, up to 12 unique signals per column can be distributed via the 12 longlines in the column. These secondary resources are more flexible than the primary resources since they are not restricted to routing only to clock pins.
Input/Output Block
I/O Banking
Boundary Scan
Spartan-II devices support all the mandatory boundary-scan instructions specified in IEEE Standard 1149.1. A Test Access Port (TAP) and registers are provided that implement the EXTEST, SAMPLE/PRELOAD, and BYPASS instructions.
Virtex IV family
Contain the same basic resources
Slices (grouped into CLBs)
Contain combinatorial logic and register resources
IOBs
Interface between the FPGA and the outside world
Overview of Virtex IV
The Virtex-4 family is a new generation of FPGA from Xilinx. The innovative Advanced Silicon Modular Block (ASMBL) column-based architecture is unique in the programmable logic industry. Virtex-4 FPGAs contain three families (platforms): LX, FX, and SX. A wide array of hard-IP core blocks complete the system solution. These cores include the PowerPC processors, Tri-Mode Ethernet MACs, 622 Mb/s to 10+ Gb/s serial transceivers, dedicated DSP slices, high-speed clock management circuitry, and source-synchronous interface blocks. The basic Virtex-4 building blocks are an enhancement of those found in the popular Virtex-based product families (Virtex, Virtex-E, Virtex-II, Virtex-II Pro, and Virtex-II Pro X), allowing upward compatibility of existing designs. Virtex-4 devices are produced on a 90-nm copper process, using 300 mm (12 inch) wafer technology.
Virtex Architecture
[Figure: Virtex architecture with block SelectRAM resources and I/O blocks (IOBs); Slice 0 detail with LUTs, carry logic, and flip-flops (pins D, CE, PRE, CLR, Q)]
Slices
LUTs
Flip-Flops
MULT_ANDs
[Figure: MULT_AND and carry logic for A x B, with labels LUT, MULT_AND, CY_MUX, CY_XOR, S, DI, CI, CO, and 64 bits]
A new feature introduced in the Virtex family is the MULT_AND gate for efficient multiply-and-add implementation. Earlier FPGA architectures required two LUTs per bit to perform the multiplication and addition. The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit.
Connecting LUTs
MUXF5 combines LUTs in each slice
MUXF6 combines slices S0 and S1
MUXF6 combines slices S2 and S3
MUXF7 combines the two MUXF6 outputs
MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
[Figure: slices S0 and S1 with their F5 and F6 multiplexers, feeding the F8 multiplexer]
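Each multiplexer level adds one input to the function that can be implemented: two 4-input LUTs plus a MUXF5 give any 5-input function, and so on up to 8 inputs with MUXF8. The Python sketch below demonstrates the first level and counts the LUTs for the others. The helper names are hypothetical, and using the fifth input as the MUXF5 select is an assumption made for the example.

```python
from itertools import product


def make_lut4(func):
    """Fill the 16 cells of a 4-input LUT from a Python function."""
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1) & 1
            for i in range(16)]


def lut4(table, a0, a1, a2, a3):
    """Read a 4-input LUT."""
    return table[a0 | (a1 << 1) | (a2 << 2) | (a3 << 3)]


def build_f5(func5):
    """Implement a 5-input function with two LUT4s and one 2:1 MUX."""
    lo = make_lut4(lambda a0, a1, a2, a3: func5(a0, a1, a2, a3, 0))
    hi = make_lut4(lambda a0, a1, a2, a3: func5(a0, a1, a2, a3, 1))

    def f5(a0, a1, a2, a3, a4):
        # The MUX selects between the two LUT outputs using the fifth input.
        return lut4(hi, a0, a1, a2, a3) if a4 else lut4(lo, a0, a1, a2, a3)

    return f5


def luts_needed(n_inputs):
    """LUT4 count for an arbitrary function of 4 to 8 inputs."""
    if not 4 <= n_inputs <= 8:
        raise ValueError("this MUX hierarchy covers 4 to 8 inputs")
    return 2 ** (n_inputs - 4)


parity5 = lambda *bits: sum(bits) & 1
f5 = build_f5(parity5)
f5_ok = all(f5(*bits) == parity5(*bits) for bits in product((0, 1), repeat=5))
```

An 8-input function therefore needs 16 LUTs, which is why the MUXF8 level has to combine MUXF7 outputs from two CLBs.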
Simple, fast, and complete arithmetic logic
  Dedicated XOR gate for single-level sum completion
  Uses dedicated routing resources
  All synthesis tools can infer carry logic
[Figure: carry chain running from CIN to COUT through slices S1 and S2 of a CLB; COUT connects to the CIN of S2 of the next CLB]
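A bit-level model shows how the dedicated carry logic builds an adder: the LUT computes the propagate signal, a carry multiplexer (CY_MUX) either passes the incoming carry or generates a new one, and the dedicated XOR (CY_XOR) completes the sum. This is a behavioural sketch in Python; wiring the multiplexer's data input to operand bit `a` is an assumption of the example.

```python
def carry_cell(a, b, ci):
    """One bit of the carry chain: returns (sum_bit, carry_out)."""
    s = a ^ b                # LUT output: propagate
    co = ci if s else a      # CY_MUX: pass CI when propagating, else data input (a)
    sum_bit = s ^ ci         # CY_XOR: dedicated XOR completes the sum
    return sum_bit, co


def ripple_add(x, y, width):
    """Add two unsigned width-bit numbers by chaining carry cells."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = carry_cell((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry
```

For example, `ripple_add(5, 3, 4)` yields the 4-bit sum 8 with no carry out, and `ripple_add(15, 1, 4)` wraps to 0 with a carry out of 1.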
IOB Element
Input path
  Two DDR registers
Output path
  Two DDR registers
  Two 3-state enable DDR registers
Separate clocks and clock enables for I and O
Set and reset signals are shared
[Figure: IOB with input, output, and 3-state register pairs (Reg; pins D, Q, CE) clocked by ICK1/ICK2 and OCK1/OCK2 and connected to the PAD]
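The pair of DDR registers on the input path can be pictured as two captures per clock period, one on each edge. The Python sketch below models this with a single clock whose rising edge drives one register and whose falling edge drives the other; treating the second clock (ICK2) as the inverse of the first (ICK1) is an assumption, as is the function name `ddr_capture`.

```python
def ddr_capture(samples):
    """Split a stream of (clk, data) samples into rising- and falling-edge captures.

    samples is a time-ordered list of (clk, data) pairs; the clock is assumed
    to start low.
    """
    rising, falling = [], []
    prev_clk = 0
    for clk, data in samples:
        if prev_clk == 0 and clk == 1:
            rising.append(data)      # register clocked on the rising edge
        elif prev_clk == 1 and clk == 0:
            falling.append(data)     # register clocked on the falling edge
        prev_clk = clk
    return rising, falling
```

Two clock periods therefore deliver four data bits, two to each register, which is how double data rate doubles throughput without raising the clock frequency.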
SelectIO Standard
Allows direct connections to external signals of varied voltages and thresholds
  Optimizes the speed/noise tradeoff
  Saves having to place interface components onto your board
Differential signaling standards
  LVDS, BLVDS, ULVDS
  LDT
  LVPECL
Single-ended I/O standards
  LVTTL, LVCMOS (3.3V, 2.5V, 1.8V, and 1.5V)
  PCI-X at 133 MHz, PCI (3.3V at 33 MHz and 66 MHz)
  GTL, GTLP
[Figure: distributed RAM primitive RAM32X1S built from slice LUTs (pins D, WE, WCLK, A0 to A4); block RAM outputs DOA, DOPA, DOB, DOPB]
Clock Regions
I/O Tile
SelectIO Resources
All Virtex-4 FPGAs have configurable high-performance SelectIO drivers and receivers, supporting a wide variety of standard interfaces. The robust feature set includes programmable control of output strength and slew rate, and on-chip termination using Digitally Controlled Impedance (DCI). All banks can support 3.3V I/O. Each IOB contains input, output, and 3-state SelectIO drivers. These drivers can be configured to various I/O standards. Differential I/O uses the two IOBs grouped together in one tile.
  Single-ended I/O standards (LVCMOS, LVTTL, HSTL, SSTL, GTL, PCI)
  Differential I/O standards (LVDS, LDT, LVPECL, BLVDS, CSE Differential HSTL and SSTL)
I/O Block
I/O Banking
Virtex 5 features
Four platforms
  Virtex-5 LX: High-performance general logic applications
  Future platforms will be optimized for advanced serial connectivity, signal processing applications, and embedded systems
Most advanced, high-performance, optimal-utilization FPGA fabric
  True 6-input look-up table (LUT) technology
  Dual 5-LUT option
  Improved reduced-hop routing
  64-bit distributed RAM option
Powerful clock management tile (CMT) clocking
  Digital Clock Manager (DCM) blocks for zero delay buffering, frequency synthesis, and clock phase shifting
  PLL blocks for input jitter filtering, zero delay buffering, frequency synthesis, and phase-matched clock division
Advanced DSP48E slices
  25 x 18, two's complement, signed multiplication
  Optional adder/accumulator
  Optional pipelining
  Optional bitwise logical functionality
  Dedicated cascade connections
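To make the "25 x 18, two's complement, signed multiplication" item concrete, the Python sketch below interprets raw bit patterns as signed 25-bit and 18-bit operands and multiplies them. It models only the arithmetic, not the DSP48E slice itself, and the function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):          # sign bit set
        return value - (1 << bits)
    return value


def mult_25x18(a, b):
    """Signed product of a 25-bit and an 18-bit two's complement operand."""
    return to_signed(a, 25) * to_signed(b, 18)
```

The extreme case is the product of the two most negative operands, (-2^24) x (-2^17) = 2^41, so every result fits in a 43-bit signed value.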
Virtex 5 features
36-Kbit block RAM/FIFOs
  True dual-port RAM blocks
  Enhanced optional programmable FIFO logic
  Programmable
    True dual-port widths up to x36
    Simple dual-port widths up to x72
  Built-in optional error-correction circuitry with scrubbing
  Optionally program each block as two independent 18-Kbit blocks
High-performance parallel SelectIO technology
  1.2 to 3.3V I/O operation
  Source-synchronous interfacing using ChipSync technology
  Digitally-controlled impedance (DCI) active termination
  Flexible fine-grained I/O banking
  High-speed memory interface support
Flexible configuration options
  SPI-4 Parallel FLASH interface
  Multi-bitstream support with dedicated fallback reconfiguration logic
  Auto bus width detection capability
65-nm copper CMOS process technology
1.0V core voltage