
Accelerating DSP Designs with the Total 28-nm DSP Portfolio

WP-01136-1.1 White Paper

Implementing digital signal processing (DSP) datapaths with different performance, precision, intellectual property (IP), and development flows is challenging and labor-intensive. As more and more high-performance DSP datapaths are implemented on FPGAs, Altera has developed a complete DSP solutions portfolio at 28 nm to address these challenges and speed up the design cycle for FPGA-based applications. This white paper discusses the different components of this portfolio and how they come together to accelerate the implementation of a DSP design.

Introduction
Although signal processing is usually associated with digital signal processors, it is becoming increasingly evident that FPGAs are taking over as the platform of choice for implementing high-performance, high-precision signal processing. Accordingly, FPGA vendors are beginning to include hard multipliers and DSP blocks within their core silicon architecture. IP cores are also provided for traditional functions such as finite impulse response (FIR) filters and fast Fourier transforms (FFTs). As a result, a wide range of applications now rely on FPGAs as the key signal processing platform. These applications, shown in Figure 1, share one thing in common: their performance requirements exceed the capabilities of a traditional programmable digital signal processor.
Figure 1. Different Applications Need Different Performance, Precision, IP, and Tools
[Figure 1 arranges the applications along a spectrum: video surveillance, broadcast systems, wireless basestations, medical imaging, military radar, and high-performance computing. One end is labeled 9-bit precision, 100 GMACs, HDL, and video IP; the other end is labeled floating-point precision, TeraFLOPs, MATLAB/Simulink, and floating-point functions. FIR, FFT, and NCO IP appear in between.]

101 Innovation Drive San Jose, CA 95134 www.altera.com April 2011

Copyright 2011 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, the stylized Altera logo, and specific device designations are trademarks and/or service marks of Altera Corporation in the U.S. and other countries. All other words and logos identified as trademarks and/or service marks are the property of Altera Corporation or their respective owners. Altera products are protected under numerous U.S. and foreign patents and pending applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.


These systems not only have different performance and precision requirements, but also different design and development flows. For example, video processing requires 9- to 10-bit precision, with some high-end designs needing a 16-bit color depth. These designs are generally created in an HDL design flow, with video- and image-processing IP functions increasingly used to speed up development. At the other end of the spectrum, military radar designs require the highest DSP performance and floating-point precision to get the highest dynamic range. Many of these designs are modeled in the popular MATLAB and Simulink tools, along with floating-point functions that are optimized for the FPGA architecture.

Altera's Total 28-nm DSP Portfolio


The biggest challenge faced by FPGA vendors is in providing a complete DSP solution portfolio: one that not only includes a DSP silicon architecture that is configurable, but also a range of tools, IP, and building blocks that can help designers to quickly and efficiently complete the implementation of their algorithms. To support the 28-nm Stratix V, Arria V, and Cyclone V FPGAs, Altera offers a total DSP portfolio, which, as illustrated in Figure 2, comprises a variable-precision DSP architecture, the DSP Builder Advanced Blockset, a video design framework, and a comprehensive suite of floating-point IP.
Figure 2. Industry's First Total DSP Portfolio

[Figure 2 shows four components grouped around "Total DSP Solutions": DSP block architecture, DSP Builder timing-driven Simulink synthesis, video design framework, and comprehensive floating-point IP.]

Variable-Precision DSP Architecture


The basic principle behind Altera's DSP solutions portfolio is the recognition that one size does not fit all, and that it is necessary to understand the diverse needs and preferences of customers in the design and development environment. Signal processing applications have different precision requirements, and different precision levels at different stages of the signal processing datapath. For example, video broadcast applications can efficiently use multipliers ranging from 9x9 to 18x18. Other applications, such as wireless and medical systems, that develop complex, multi-channel filters require a higher precision, as there is a need to maintain data precision after each stage of the filter. Apart from these, there are also applications in the military, test, and high-performance computing industries that demand both performance and precision, sometimes requiring single- or even double-precision floating-point to implement complex matrix operations and FFTs.

To address the precision requirements of DSP applications across the entire spectrum, Altera architected the industry's first variable-precision DSP block. It is the first DSP block in the market to have two native precision modes, 18-bit precision mode and high-precision mode, illustrated in Figure 3 and Figure 4. This unique feature provides backward compatibility with previous 40-nm DSP blocks, as well as efficient support for emerging signal processing applications of higher precisions. In addition, the variable-precision blocks for Stratix V, Arria V, and Cyclone V devices are optimized for various applications.
Figure 3. Arria V and Cyclone V 18-Bit Precision and High-Precision Modes

[Figure 3 shows the Arria V and Cyclone V block in two configurations. In 18-bit precision mode, 18x19 multipliers are fed from input registers and an 18-bit coefficient bank, and drive intermediate, feedback, and output multiplexers with 64-bit outputs. In high-precision mode, a 27x27 multiplier is fed from input registers and a 27-bit coefficient bank, with a 64-bit output.]

Figure 4. Stratix V 18-bit Precision and High-Precision Modes

[Figure 4 shows the Stratix V block in two configurations. In 18-bit precision mode, 18x18 multipliers are fed from input registers and 18-bit coefficient banks, followed by an intermediate multiplexer, an accumulator, and an output multiplexer and register, with 64-bit outputs. In high-precision mode, a 27x27 multiplier is fed from input registers and a 27-bit coefficient bank, followed by an accumulator with a 64-bit output.]


A single variable-precision DSP block at 28 nm can support precisions ranging from 9x9 to 27x27. In addition, the precision of each block within a device can be independently configured to support bit growth in designs such as FIR filters and FFTs. The block is called variable because its precision is configurable by the customer on a block-by-block basis. This is a powerful new concept because FPGAs traditionally force the designer to adapt the algorithm to the block architecture, which results in either a suboptimal implementation or the need to modify the algorithm.

Legacy fixed-precision DSP architectures can support only one precision. As such, the designer either wastes resources when the precision requirement of the algorithm is lower, or settles for lower performance by cascading multiple blocks when the precision requirement is higher. In such a situation, only a DSP block with configurable precision is able to provide system performance within stringent cost and power budgets.

The increasing need for higher precision and complex multiplication operators in high-performance datapaths is also taken into consideration in the design of the variable-precision DSP block. To enable the cascading of multiple DSP blocks, the variable-precision block was designed with the industry's only 64-bit cascade bus and adder. This design allows the implementation of large complex multipliers and floating-point signal processing functions with 50 percent fewer resources than the competing 18x25 architecture.
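The bit growth mentioned above can be estimated with simple arithmetic. The following Python sketch is illustrative only: the function names and the example tap counts are assumptions, not figures from this paper. It uses the standard worst-case rule that a product of an a-bit and a b-bit operand fits in a + b bits, and that summing N such products adds ceil(log2(N)) bits.

```python
def product_bits(a_bits: int, b_bits: int) -> int:
    """Bits sufficient to hold the full product of two signed operands."""
    return a_bits + b_bits


def accumulator_bits(a_bits: int, b_bits: int, taps: int) -> int:
    """Worst-case width after summing `taps` products.

    (taps - 1).bit_length() equals ceil(log2(taps)) for taps >= 1.
    """
    return product_bits(a_bits, b_bits) + (taps - 1).bit_length()


# Hypothetical filter sizes, chosen only to exercise the three multiplier widths.
for data_bits, coeff_bits, taps in [(9, 9, 64), (18, 18, 128), (27, 27, 256)]:
    width = accumulator_bits(data_bits, coeff_bits, taps)
    fits = width <= 64  # compare against the 64-bit cascade bus and adder
    print(f"{data_bits}x{coeff_bits}, {taps} taps: {width} bits, fits in 64: {fits}")
```

Under these assumptions, even a 256-tap sum of 27x27 products stays within 64 bits, which is one way to read the motivation for a wide cascade path.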

DSP Builder Advanced Blockset


Altera's DSP Builder tool provides support for high-level, Simulink-based synthesis, timing-driven netlist optimizations, and a complete floating-point design flow for FPGAs. Netlist optimization is a unique feature of the DSP Builder tool that allows the designer to specify the desired fMAX (clock frequency) and latency of the system and leave the rest of the work to the tool. The DSP Builder tool inserts the registers needed to raise the fMAX of critical paths while meeting latency constraints. As a result, time-consuming hand-tweaking of the HDL code is no longer necessary, as changes can be made with the push of a button. The resulting productivity gain is best illustrated with an example radar design that meets timing at 350 MHz using the DSP Builder tool. Figure 5 shows a portion of a radar design jointly developed by Altera and The MathWorks to be implemented in a Stratix V FPGA with a target fMAX of 350 MHz.


Figure 5. Large DSP Design for a Radar Front-End Application

[Figure 5 shows a portion of a high-end radar front-end design: four instances of a chain comprising an 8-channel polyphase FIR filter, a complex mixer and adder, and a 1024-point, radix-4 complex FFT.]

Typically, this fMAX constraint can only be met by hand-tweaking the HDL code to add the necessary registers and resources. However, with the DSP Builder tool, designers now have an automated way of meeting the performance goal. The compilation report in Figure 6 shows the large design, comprising about 60K logic elements (LEs), achieving a system fMAX greater than 350 MHz without the need for manual hand-tweaking of the HDL code.
Figure 6. Automatic Generation of a Large Design that Closes Timing at >350 MHz

This system fMAX was achieved by following these steps:

1. The datapath was built in Simulink using building blocks from the DSP Builder library and simulated to make certain it conformed to the algorithm.
2. The fMAX of the total system was set to 350 MHz in the parameters (.params) file in Simulink, signaling the DSP Builder tool to optimize the implementation for the specified performance. The system implementation constraints were added at a higher level of abstraction within the high-level Simulink design description.
3. After clicking DSP Builder, the Simulink design description was analyzed, and both HDL code and a bitstream were generated for the Stratix V FPGA. The timing constraints (in this case, fMAX) were incorporated. Pipeline registers and the correct amount of time-division multiplexing were automatically added to meet or even exceed the specified fMAX.
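The two automatic transformations named in step 3, pipelining and time-division multiplexing, can be illustrated with back-of-the-envelope arithmetic. The Python sketch below is not how the DSP Builder tool works internally; it is a simplified model under stated assumptions: register setup and clock-to-output overheads are ignored, and the path delay and sample rate in the example are hypothetical values, not figures from the radar design.

```python
import math


def pipeline_stages(path_delay_ns: float, target_fmax_mhz: float) -> int:
    """Minimum number of stages so that each stage fits in one clock period.

    Simplifying assumption: the combinational delay splits evenly and
    register overhead is zero.
    """
    period_ns = 1000.0 / target_fmax_mhz
    return math.ceil(path_delay_ns / period_ns)


def tdm_factor(clock_mhz: float, sample_rate_msps: float) -> int:
    """Number of channels that can share one multiplier when the clock
    runs faster than the per-channel sample rate."""
    return int(clock_mhz // sample_rate_msps)


def multipliers_needed(channels: int, clock_mhz: float, sample_rate_msps: float) -> int:
    """Physical multipliers per filter tap after time-division multiplexing."""
    return math.ceil(channels / tdm_factor(clock_mhz, sample_rate_msps))


# Hypothetical numbers: a 10 ns unpipelined path and 8 channels at 43.75 MSPS each.
print(pipeline_stages(10.0, 350.0))        # stages needed to reach 350 MHz
print(tdm_factor(350.0, 43.75))            # channels per shared multiplier
print(multipliers_needed(8, 350.0, 43.75)) # multipliers per tap
```

In this toy model, raising the target fMAX increases the number of pipeline stages (and therefore latency) but also increases how many channels can share each multiplier, which is the trade-off the tool resolves automatically.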


The designer can also efficiently run multiple what-if scenarios with the DSP Builder tool. To do so, all that is necessary is to change the fMAX settings, latency settings, target device architecture, and even design parameters such as the number of channels, by editing the top-level parameter file in MATLAB and Simulink. Once satisfied with the performance, latency, and device utilization, the designer can either choose to use that HDL code for the datapath, or to further tweak the code to meet additional system goals. In either case, the implementation design cycle is reduced tremendously.

Video Design Framework


As the world of video makes a transition to 1080p high-definition (HD) resolutions, FPGAs are ideal platforms for video processing. Altera anticipated this transition nearly four years ago and invested in a video design framework, shown in Figure 7, that edged out Xilinx's design tools to win the prestigious 2009 EDN Innovation Award.
Figure 7. FPGA Industry's Only Video Design Framework

Altera's video design framework is currently the only one in the market that includes 18 video functions, a streaming video interface standard, six hardware-verified reference designs, and a range of video development kits. To date, over 100 active customers are using this video design framework in their systems. Figure 8 shows an example customer design using the Altera video design framework. The end system is a video wall that incorporates multiple video sources, also known as a composite video. Such video walls are not only common in outdoor advertising monitors, but also in medical, military, and broadcast applications. As the individual videos come from different sources, they must be processed differently: some video sources need to be de-interlaced and scaled, others are progressive to begin with and need only be scaled, while some others may need to be custom processed. All the sources are then stitched together to form a composite image that is within the user's control.


Figure 8. Using Altera's Video Design Framework to Develop a Custom Video Wall

[Figure 8 shows four video sources feeding a mixer that produces the composite video wall image. Video 1: color space conversion, motion-adaptive deinterlacing, clipping, scaling. Video 2: CRS and color space conversion, clipping, scaling, color space conversion + CRS. Video 3: CRS and color space conversion, motion-adaptive deinterlacing, clipping, color space conversion + CRS. Video 4: proprietary video processing. A test pattern generation block is also shown. The legend marks which blocks are Altera video framework functions.]

Note:
(1) Image: Apantac LLC

To build the rather complex video signal chain, the building blocks and the open streaming interface of Altera's video design framework were used. Key MegaCore functions from Altera's Video and Image Processing Suite can be linked together to create one video path, while the other path can be fully customized. Both video streams can then be alpha blended to create the composite video stream.
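Alpha blending combines two pixels as a weighted average, out = alpha x foreground + (1 - alpha) x background. The Python sketch below shows the integer form commonly used for 8-bit video; it is a generic illustration and does not represent the interface or implementation of any Altera MegaCore function. The function name and the rounding choice are assumptions.

```python
def alpha_blend(fg, bg, alpha):
    """Blend two 8-bit pixels given as tuples of channel values.

    `alpha` is the foreground opacity in the range 0..255.
    Adding 127 before the integer division rounds to nearest.
    """
    return tuple((alpha * f + (255 - alpha) * b + 127) // 255
                 for f, b in zip(fg, bg))


red = (255, 0, 0)
blue = (0, 0, 255)
print(alpha_blend(red, blue, 255))  # fully opaque foreground
print(alpha_blend(red, blue, 0))    # fully transparent foreground
print(alpha_blend(red, blue, 128))  # roughly an even mix
```

Each output channel needs two multiplications and one addition, so a blender like this maps naturally onto the 9x9 multiplier precision discussed earlier for video datapaths.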

Comprehensive Floating-Point IP
In the high-performance DSP domain, floating-point signal processing is increasingly seen as a way to increase dynamic range. Altera's internal research shows that almost half of high-performance DSP designs using FPGAs, such as advanced military space-time adaptive processing (STAP) radar, MIMO equalization for LTE channel cards, and high-performance computing boxes, require higher than 18-bit precision. Floating-point processing generally involves mantissa multiplication, mantissa normalization and de-normalization, and exponent addition. While exponent addition and subtraction operations are straightforward, mantissa multiplication and normalization require multipliers of at least 24-bit precision. To perform these operations, traditional FPGA architectures that are limited to 18x25 precision must cascade blocks to implement a single-precision mantissa multiplication. Altera's new variable-precision architecture can implement single-precision floating-point mantissa multiplications in a single block, thus allowing for a very high-performance design.
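The operations listed above can be made concrete with a small software model of IEEE 754 single-precision multiplication. The Python sketch below is a teaching aid, not a description of Altera's floating-point IP: it handles only normal, nonzero inputs, truncates instead of rounding to nearest, and does not check for exponent overflow or underflow. All function names are hypothetical.

```python
import struct


def float_to_fields(x: float):
    """Split a single-precision value into sign, biased exponent, and 23-bit fraction."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fields_to_float(sign: int, exponent: int, fraction: int) -> float:
    """Reassemble the three fields into a single-precision value."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp32_multiply(a: float, b: float) -> float:
    """Multiply two normal single-precision numbers (simplified model)."""
    sa, ea, fa = float_to_fields(a)
    sb, eb, fb = float_to_fields(b)

    # Restore the hidden leading one: each mantissa is 24 bits wide.
    ma = (1 << 23) | fa
    mb = (1 << 23) | fb

    # Mantissa multiplication: a 24x24 product of up to 48 bits.
    product = ma * mb

    # Exponent addition, removing one copy of the bias (127).
    exponent = ea + eb - 127

    # Normalization: the product lies in [2^46, 2^48); shift so that the
    # leading one lands on bit 23, bumping the exponent if needed.
    if product & (1 << 47):
        product >>= 24
        exponent += 1
    else:
        product >>= 23

    return fields_to_float(sa ^ sb, exponent, product & 0x7FFFFF)


print(fp32_multiply(1.5, 2.5))
print(fp32_multiply(-2.0, 0.5))
```

The 24x24 mantissa product is the step that fits within a single 27x27 multiplier but exceeds an 18x25 one, which is why the narrower architecture has to cascade blocks.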


In addition, with DSP Builder v10.1 and later, Altera has integrated a tool flow to build floating-point datapaths. This fused-datapath tool flow builds floating-point datapaths while taking into account the hardware implementation issues inherent in FPGAs. The tool allows designers to create high-performance, floating-point implementations of large FPGA designs, as illustrated in Figure 9.
Figure 9. Floating-Point Design Entry Example

[Figure 9 shows a Simulink floating-point design entry example built from blocks such as Square, Mag, and CmpGE, operating on single- and double-precision complex signals, with boolean comparison outputs and ports including exit, point, qPoint, count, and qCount.]

The combination of Stratix V FPGAs and the fused-datapath tool flow can now support 1-teraFLOPS processing rates. No competing FPGA vendor can benchmark this level of performance. The fused-datapath tool flow also works well on other Altera FPGA families, such as Stratix II, Stratix III, and Stratix IV FPGAs, and Arria, Arria II, Arria V, and Cyclone V FPGAs. Altera has been using this tool flow internally to build floating-point IP and reference designs for several years. In addition, floating-point IP for Stratix IV FPGAs is already available to designers. Finally, Altera's portfolio of floating-point functions, illustrated in Figure 10, is the largest portfolio of floating-point IP cores within the FPGA industry, and ranges from simple operators, such as addition, subtraction, and inversion, to complex matrix multiplication, matrix inversion, and FFTs.


Figure 10. Altera's Floating-Point Portfolio

It is possible to achieve very high fMAX and low latency for these functions as they are optimized for Altera device architectures. For large matrix multiplication functions such as 64x64, an fMAX as high as 380 MHz can be obtained.

Summary
The "DSP in FPGAs" concept spans different industries and applications with different performance, precision, IP, and tool flow requirements. Because today's FPGA vendors are expected to meet the customer's need for a complete DSP portfolio that includes IP, tools, building block functions, and configurable DSP blocks to enable rapid design implementation and debug, Altera has developed a unique and differentiated total DSP solutions portfolio, in conjunction with the 28-nm Stratix V, Arria V, and Cyclone V FPGAs, to enable high-performance DSP designs for a wide range of markets and applications.

Further Information

Stratix V FPGAs: Built for Bandwidth: www.altera.com/products/devices/stratix-fpgas/stratix-v/stxv-index.jsp
Literature: Arria V FPGAs: http://www.altera.com/products/devices/arria-fpgas/arria-v/arrv-index.jsp
Literature: Stratix V Devices: www.altera.com/products/devices/stratix-fpgas/stratix-v/literature/stv-literature.jsp
DSP Solutions: www.altera.com/technology/dsp/dsp-index.jsp
Altera's Total 28-nm DSP Portfolio: Fastest Path to Highest Performance Signal Processing: www.altera.com/b/28-nm-dsp-portfolio.html
Webcast: Accelerate your FPGA-Based DSP Designs: www.altera.com/education/webcasts/all/wc-2010-accelerate-fpga-dsp-designs.html
Video and Image Processing (VIP) Suite MegaCore Functions: www.altera.com/products/ip/dsp/image_video_processing/m-alt-vipsuite.html

Acknowledgements

Suhel Dhanani, Sr. Manager, Embedded Marketing, Altera Corporation
Jordon Inkeles, Senior Manager, Software and DSP Marketing, Altera Corporation

Document Revision History


Table 1 shows the revision history for this document.
Table 1. Document Revision History

Date          Version   Changes
April 2011    1.1       Added details for Arria V, Cyclone V, and variable-precision DSP blocks.
July 2010     1.0       Initial release.
