
An Introduction to Reconfigurable Computing

Mitch Sukalski and Craig Ulmer
Dean R&D Seminar
11 December 2003

Reconfigurable Computing
is computation on a platform with reconfigurable (i.e., modifiable at run-time) hardware capable of implementing application-specific algorithms and functionality on demand.

Computing Spectrum
[Figure: a CPU pipeline (Fetch, Decode, Registers, Execute, Memory, Writeback) beside a fixed data-flow circuit of +, x, xor, and z-1 blocks with inputs A, B, C, D and one result]

Software: General-Purpose CPU
Easily reprogrammed
Low cost
Fundamental bottlenecks

Soft-Hardware: Field Programmable Gate Arrays (FPGAs)
Reconfigurable hardware
Medium cost
Speedup potential

Hardware: Application-Specific Integrated Circuit (ASIC)
Not modifiable
High cost
Extremely fast

History
1945: Eckert, Mauchly, von Neumann: ENIAC
1945: von Neumann architecture
1960: Estrin: Fixed+Variable Structure Computer
1970s: Simple PLDs
1985: Xilinx introduces first FPGA
1990s: Custom Computing Machines (CCMs)
1999: FPGAs exceed million logic gates
2002: FPGAs include complex cores

[Figure captions: ENIAC. Fixed+Variable: users can attach computational circuits to a fixed ALU. CCM: the Teramac Multi-Chip Module of FPGAs. Xilinx Virtex FPGA: connecting computational blocks for an algorithm. Xilinx Virtex II Pro (image courtesy of rapidio.org).]

Reconfigurable Computing in Modern HPC


Stand-alone platforms
OctigaBay 12K
SRC-6
Starbridge Hypercomputer

Accelerator cards
TimeLogic's DeCypher
Nallatech's BenNUEY
Annapolis Micro Systems' WILDSTAR II

Example: Computational Fluid Dynamics


William Smith & Austars Schnore at GE Global Research

From: Towards an RCC-based Accelerator for Computational Fluid Dynamics, ERSA 2003

And now for some details


Field Programmable Gate Arrays (FPGAs)
Common RC design techniques
Reported examples

Field-Programmable Gate Arrays (FPGAs)


FPGAs emulate digital logic circuitry
Large array of configurable logic blocks
Internal routing through programmable interconnection network

FPGAs hold hardware configuration in SRAM


Change the digital circuitry by loading new configuration

Design approach:
User designs in hardware description language
Synthesis tools translate to logic gates
Mapping tools target specific FPGA

Simplified Logic Block


Emulates logic function
LUT Register

Thousands per chip

Lookup Table (LUT)


Holds truth table
Inputs produce outputs

Register

1-bit registers
Hold data between cycles

LUT

Note: Greatly simplified

LUT Example: 1-bit Adder


Truth Table
A  B  Cin | Cout  Sum
0  0  0   | 0     0
0  0  1   | 0     1
0  1  0   | 0     1
0  1  1   | 1     0
1  0  0   | 0     1
1  0  1   | 1     0
1  1  0   | 1     0
1  1  1   | 1     1
[Figure: two logic blocks, each a LUT with inputs A, B, C, and 0 feeding a register; one block produces Cout, the other produces Sum]
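The adder slide can be mimicked in software. The sketch below is only an illustration: the `LUT` class and the `full_adder` helper are hypothetical names, not part of any FPGA toolchain, and a real logic block would typically use a 4-input LUT with the unused input tied to 0. The point is that the stored truth table is the configuration, and "computing" is just indexing into it.

```python
from itertools import product


class LUT:
    """A k-input lookup table: the stored truth table is the configuration."""

    def __init__(self, num_inputs, func):
        self.num_inputs = num_inputs
        # One output bit per input combination, first input most significant.
        self.table = [func(*bits) & 1
                      for bits in product((0, 1), repeat=num_inputs)]

    def __call__(self, *bits):
        # Pack the input bits into an index into the truth table.
        index = 0
        for b in bits:
            index = (index << 1) | (b & 1)
        return self.table[index]


# Two LUTs configured from Boolean functions, one per output of the adder.
sum_lut = LUT(3, lambda a, b, cin: a ^ b ^ cin)
cout_lut = LUT(3, lambda a, b, cin: (a & b) | (a & cin) | (b & cin))


def full_adder(a, b, cin):
    """Return (cout, sum) by table lookup only."""
    return cout_lut(a, b, cin), sum_lut(a, b, cin)


# Print the full truth table: A B Cin Cout Sum.
for a, b, cin in product((0, 1), repeat=3):
    cout, s = full_adder(a, b, cin)
    print(a, b, cin, cout, s)
```

Loading a different table into the same `LUT` object changes the function it computes without changing its structure, which mirrors how an FPGA is reconfigured.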

Routing Data between Logic Blocks


[Figure: grid of logic blocks (LB) interleaved with switchboxes (X)]

Need to connect logic blocks
Wires and Switchboxes


LBs connect to local wires
Switchboxes route long connections

Routing set at compile time


Performed by tools

Reconfiguration
Modern FPGAs are SRAM based
Can be loaded with new circuitry

Full reconfiguration
Few megabytes of configuration
Takes milliseconds

Partial reconfiguration
Reprogram only a portion of the chip
Reduces configuration time
Non-trivial, poorly supported

[Figure: an FPGA loaded with a Full Configuration Image or a Partial Configuration Image]

Design Techniques
Digital logic design techniques for exploiting FPGAs

FPGAs as Computational Accelerators


Use FPGAs as soft-hardware
Port algorithm to hardware
Run inside FPGA
Reuse hardware

Techniques
Concurrency, memory, partial evaluation

1. Concurrency
Load FPGA with multiple computational circuits
Hardware state machines are like threads, but all tasks are always running

Raw parallelism
Units run in parallel
Example: Key breaking

Pipelining
Chain units together in series
Example: Streaming computations, data-flow
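A small software model can make the pipelining idea concrete. The sketch below is an assumption-laden illustration, not taken from the slides: `run_pipeline` is a hypothetical helper, each stage is taken to cost one clock cycle, and `None` stands for an empty pipeline slot. On every simulated cycle all stages fire at once on whatever the previous stage's register holds.

```python
def run_pipeline(stages, inputs):
    """Clock a linear pipeline; return (outputs, cycles).

    regs[i] models the output register of stage i. None marks a bubble,
    so the input values themselves must not be None.
    """
    stream = list(inputs)
    feed = iter(stream)
    regs = [None] * len(stages)
    outputs = []
    cycles = 0
    while len(outputs) < len(stream):
        # Every stage reads the register in front of it, all in the same cycle.
        sources = [next(feed, None)] + regs[:-1]
        regs = [None if x is None else stage(x)
                for stage, x in zip(stages, sources)]
        cycles += 1
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs, cycles


# Three chained units computing (x + 1) * 2 - 3 on a stream of values.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs, cycles = run_pipeline(stages, range(1, 6))
print(outputs, cycles)
```

With k stages and n inputs, this model finishes in n + k - 1 cycles, whereas a single unit that performs the k steps one after another would need n * k cycles. Once the pipeline is full, one result emerges per cycle.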

2. Custom Memory Interactions


Most FPGA cards have multiple memory banks
Fetch/store multiple data values at same time
Predictable performance (as opposed to caches)
Hide address generation

[Figure: an FPGA connected to five independent memories, SRAM Bank 0 through SRAM Bank 4]
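The benefit of multiple banks can be estimated with simple counting. The sketch below is a toy model under stated assumptions: each bank serves one access per cycle, a loop iteration touches three arrays (two reads and one write), and the function names and bank assignments are hypothetical.

```python
def cycles_per_iteration(placement):
    """placement maps each array name to a bank id.

    Accesses to different banks proceed in the same cycle; accesses that
    share a bank are serialized (one access per bank per cycle, assumed).
    """
    accesses = {}
    for bank in placement.values():
        accesses[bank] = accesses.get(bank, 0) + 1
    return max(accesses.values())


def total_cycles(iterations, placement):
    """Memory cycles for a loop such as c[i] = a[i] + b[i]."""
    return iterations * cycles_per_iteration(placement)


one_bank = {"a": 0, "b": 0, "c": 0}      # everything in SRAM Bank 0
three_banks = {"a": 0, "b": 1, "c": 2}   # one array per bank

print(total_cycles(1000, one_bank))
print(total_cycles(1000, three_banks))
```

Because the designer chooses the placement, the cycle count is known at design time. That is the sense in which performance is predictable, in contrast to a cache whose hit rate depends on run-time access patterns.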

3. Partial Evaluation
Some data constants are known at design time
Fold them into the circuit to reduce hardware
Synthesis tools do this automatically
Example: 4-bit Ripple-Carry Adder

Note: FPGAs are unique because we can easily generate new, optimized hardware configurations for each set of constants.
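The ripple-carry example can be sketched as constant folding over a gate-level description. Everything here is illustrative: the `Circuit` class, its gate-counting rule (every surviving XOR, AND, OR, or inverter counts as one gate), and the constant `K = 5` are assumptions, and a real synthesis tool would map to LUTs rather than count gates. The sketch builds a 4-bit adder twice, once with two variable operands and once with one operand fixed to `K`, and compares how much logic survives.

```python
class Circuit:
    """Counts gates that survive constant folding.

    A signal is either an int (a known constant 0 or 1) or a tuple
    describing an input or a gate.
    """

    def __init__(self):
        self.gates = 0

    def _gate(self, op, *args):
        self.gates += 1
        return (op, *args)

    def _binary(self, op, x, y, fold_const, fold_mixed):
        if isinstance(x, int):
            x, y = y, x                    # move any constant into y
        if isinstance(x, int):
            return fold_const(x, y)        # both operands are constants
        if isinstance(y, int):
            return fold_mixed(x, y)        # one constant: simplify
        return self._gate(op, x, y)

    def xor(self, x, y):
        return self._binary("xor", x, y, lambda a, b: a ^ b,
                            lambda s, k: s if k == 0 else self._gate("not", s))

    def and_(self, x, y):
        return self._binary("and", x, y, lambda a, b: a & b,
                            lambda s, k: s if k == 1 else 0)

    def or_(self, x, y):
        return self._binary("or", x, y, lambda a, b: a | b,
                            lambda s, k: s if k == 0 else 1)


def ripple_adder(c, a_bits, b_bits):
    """Build a ripple-carry adder (LSB first, carry-in fixed to 0)."""
    carry, sums = 0, []
    for a, b in zip(a_bits, b_bits):
        p = c.xor(a, b)
        sums.append(c.xor(p, carry))
        carry = c.or_(c.and_(a, b), c.and_(p, carry))
    return sums, carry


def evaluate(sig, env):
    """Evaluate a signal for the input values in env."""
    if isinstance(sig, int):
        return sig
    op, *args = sig
    if op == "in":
        return env[args[0]]
    vals = [evaluate(a, env) for a in args]
    if op == "not":
        return vals[0] ^ 1
    if op == "xor":
        return vals[0] ^ vals[1]
    if op == "and":
        return vals[0] & vals[1]
    return vals[0] | vals[1]


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


WIDTH = 4
K = 5  # hypothetical design-time constant
a_inputs = [("in", f"a{i}") for i in range(WIDTH)]
b_inputs = [("in", f"b{i}") for i in range(WIDTH)]

generic = Circuit()
ripple_adder(generic, a_inputs, b_inputs)

specialized = Circuit()
sums, carry = ripple_adder(specialized, a_inputs, to_bits(K, WIDTH))


def run_specialized(a):
    """Compute a + K using only the specialized circuit."""
    env = {f"a{i}": bit for i, bit in enumerate(to_bits(a, WIDTH))}
    bits = [evaluate(s, env) for s in sums] + [evaluate(carry, env)]
    return sum(bit << i for i, bit in enumerate(bits))


print("generic gates:", generic.gates)
print("specialized gates:", specialized.gates)
```

The specialized circuit still computes `a + K` for every 4-bit `a`, but with fewer gates. Changing `K` produces a different, equally small circuit.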

RC Performance Examples
CFD: 23 GFLOPS sustained
Towards an RCC-based Accelerator for Computational Fluid Dynamics, Smith & Schnore, 2003

Adaptive beamforming: 20 GFLOPS


Parallel systolic array architecture
20 GFLOPS QR processor on a Xilinx Virtex-E FPGA, Walke et al., 2000

Real-time holographic video display at 30fps


Using field programmable gate arrays to scale up the speed of holographic video computation, Nwodoh

In Summary
Reconfigurable computing uses FPGAs to emulate application-specific hardware
Achieve performance gains with dedicated hardware

It is possible to implement just about any kind of digital hardware in the FPGA.
Limited by capacity and effort
Resurrect application-specific hardware architectures: SIMD, MIMD, Systolic Processor Arrays, Data-Flow
