Vivado Design Suite™
UltraFast Design Methodology

Guidelines For Predictable Success

Copyright 2013 Xilinx



Xilinx Delivers an ASIC-Class Advantage Through Silicon, Tools, and Methodology

UltraScale: ASIC-Class Architecture
ASIC-class capabilities
Enables massive data flow
Removes interconnect bottlenecks

Vivado: ASIC-Strength Design Suite
Accelerate system integration: 15x faster C++ verification, interface-level connections
Accelerate implementation: 4x faster analytical P&R

UltraFast: Design Methodology
Best practices for PCB planning, HDL design, closure
Predictable success in weeks, not months

Agenda
UltraFast Methodology Introduction
Write HDL code that best fits the hardware
Timing constraints creation and validation
Clock planning, Pin planning, Floorplanning


UltraFast™ Methodology
Benefits
Fast Compile Times and Predictable Results
Require good methodology

Project Schedules Drive Time To Market
Manage risk effectively
Minimize iterations, especially late-stage changes
Explore options early with estimation and progressive analysis

Proven Recommendations from Successful Customers
Best Practices with Checklists and Links to Documentation
Verification Tools and Reports
Linting and DRC

UltraFast User Guide: UG949

PCB planning: Avoid board re-spins
Use XPE (Xilinx Power Estimator) to validate power against budget
Use Vivado I/O planning & DRC on a top-level design that includes all interfaces

Design Creation: Coding style for best QoR
Use HDL language templates in Vivado
New linting capability: methodology DRC rule deck

Implementation: Rapid convergence & signoff timing
Rapid convergence technique: closure with the simplest constraints
Signoff convergence: closure with pristine constraints
Use XDC language templates & timing DRC rule deck

Overall Strategy for Accelerated Design Cycle

Earlier Iterations
Start closure at the front-end of the design flow
Engage UltraFast early
Faster iterations than in the back-end
Greater impact on Quality of Results (QoR)

[Figure: Impact on QoR (100x, 10x, 1.2x, 1.1x) across the design stages PCB Planning / Device/IP Selection; IP Integration, RTL Design, Verification; Implementation; Closure; Config., Bring-up, Debug. Caption: Reduce Design Cycle Time & Cost]

UltraFast Design Methodology Guide: UG949
Project Planning & Kickoff
Board Planning & Schematic Creation
Design Creation & IP Integration
Implementation & Design Closure
Configuration Programming & Hardware Debug

Design Methodology Checklist in DocNav


Sample section


Checklist
Spreadsheet-based checklist for the designer and FAE to review key portions of the board schematic for an FPGA/SoC
Power Distribution System, Configuration, Transceivers, XADC, I/O Interfaces


Vivado Enables Design Methodology

Key Technology: Shared, Scalable Data Model
Progressive estimation accuracy across the entire flow: IP Integration, RTL Design, Synthesis, Place & Route
Reduce iterations late in the cycle

Shared, Scalable Data Model
Shares design information between implementation steps
Ensures fast convergence and timing closure
Enables use of the same commands & reports to analyze the design at every step
Enables cross-probing
Scalable for next decade of designs
Highly efficient memory utilization

[Figure: views of the shared data model (schematics, RTL code, timing paths, timing report, placement) with code changes, tool settings, and placement edits as inputs]

Technique for Rapid Timing Closure

Baselining
Prioritize and close one step at a time
Converge first at synthesis (faster, higher impact), then in the back-end
Start with the simplest (baseline) constraints:
Internal Fmax (flop-to-flop constraints), which is the problem 9 times out of 10
Define proper clock dependencies
Make sure the design & constraints are reasonable
Analyze, get to the root cause, then decide how to fix it
Clock path vs. data path vs. interconnect delay vs. logic delay
Add I/O constraints (with Vivado XDC templates) and redo

Do not confuse baseline constraints with signoff constraints
You still want complete constraints for signoff

View the QuickTake video "UltraFast Design Methodology for Timing Closure"

Progressive Approach to Design Closure

Each pass runs Synthesis > Analysis > Place > Analysis > Route > Analysis
Pass 1: Baseline constraints (Baseline XDC): optimize internal paths for Fmax
Pass 2: Add I/O constraints (Complete XDC): optimize the entire chip for Fmax
Pass 3: Add timing exceptions (Final XDC): fine-tune Fmax
If needed: place and/or floorplan

Critical Path Could Be a Moving Target

Example from a real design
Post-synthesis estimates (the real problem)
Worst path: 13 levels of logic, 4.3 ns
Post-place
Worst path: 7 levels of logic, 4.2 ns
Paths with 7-13 levels got placed locally
Post-route (the side-effect of the real problem)
Worst path: 4 levels of logic, 4.1 ns
Paths with 5-13 levels got preferred routing

Analyze & fix timing issues at early stages for faster timing convergence

Vivado Design Suite
Writing HDL Code that Best Fits the Hardware

Impact of HDL Coding Style

Block inference
Follow recommended templates for RAM, DSP, LUTRAM, and SRL inference
Pipeline your design to reduce levels of logic

Think about reset
Resets tax routing and are not always needed: Xilinx devices boot in a known state
Dedicated shifters (SRLs) and RAM memory arrays don't use resets
Synchronous resets are preferred
Allow packing of registers into dedicated RAM and DSP blocks
Tools have the option to implement the reset in the datapath (LUT)

Give more freedom to synthesis
Revisit attributes needed by other synthesis engines or older releases
Avoid KEEP, dont_touch, syn_preserve, and max_fanout attributes

Review the Design Creation chapter in UG949
Review the Design Creation tab in the Design Methodology Checklist

Using HDL Language Templates

Accessing templates in the IDE: Windows > Language Templates
Synthesis templates
BRAM, LUTRAM, ROM, SRL
Counter, MULT
FSM, Decoder, Encoder

Coding to Match the Hardware

DSP48 Blocks and BRAM Blocks
Leverage DSP block cascading capabilities
Pipelined adder chain (cascaded DSP48 blocks) delivers optimal performance
Adder tree becomes a performance bottleneck

Avoid Block RAM collision avoidance logic(*)
Synthesis assumes a collision and adds a comparator on rdaddr / wraddr around the RAMB
Inference with collision check disabled maps directly to the RAMB
(*): logic added by default by Synplify (attribute syn_no_rw_check removes the logic)

The Impact of Resets

Increase performance with the right reset choice
Think local, not global, with resets
No reset at all (if possible) is best
Synchronous rather than asynchronous reset
Active-HIGH rather than active-low reset
Default register value can be controlled via the INIT property or at signal declaration in RTL

From: UG949, Chapter 4: Design Creation, Control Signals and Control Sets

Reset Routing
Resets compete for the same routing resources as the rest of the active signals of the design, including the critical datapaths

Designs that minimize or eliminate resets have:
About 18% fewer timing paths on average
About 15% less runtime on average
10% fewer registers and 7% fewer LUTs
20% lower timing scores
Lower memory usage

Be selective with where you code resets
Initialize all registers in the VHDL / Verilog code

More on Resets
Many designs need some resets
Very few designs require resets on all registers
Most ASICs require an explicit reset on every register for testability
But the FPGA has a built-in Global Set/Reset (GSR)

Guideline: Be selective with where you code resets
Only place resets where they have an impact on functionality
I/O, state machines, critical control logic, etc.
Omit resets that do not

Initialize all registers in the VHDL / Verilog code
This should be done whether using a reset or not
VHDL:
signal my_register : std_logic_vector(7 downto 0) := "01010101";
Verilog:
reg [7:0] my_register = 8'h55;

Gauging Other Design Metrics

report_high_fanout_nets
To reduce fanout on a net, use:
max_fanout (Vivado synthesis and XST)
syn_maxfan (Synplify)
Use phys_opt_design for timing-driven replication

From: Design Methodology Checklist, Design Creation tab
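The replication that max_fanout or syn_maxfan asks for can be estimated with simple arithmetic. The Python sketch below is an idealized model, assuming loads are split into nearly equal groups and placement is ignored; the helper names `replicas_needed` and `split_loads` are hypothetical, and real synthesis or phys_opt_design results will differ.

```python
import math


def replicas_needed(fanout: int, max_fanout: int) -> int:
    """Idealized count of driver copies so that no copy drives more
    than max_fanout loads (placement and timing are ignored)."""
    if max_fanout < 1:
        raise ValueError("max_fanout must be >= 1")
    return max(1, math.ceil(fanout / max_fanout))


def split_loads(loads: list, max_fanout: int) -> list:
    """Assign loads to replicas in contiguous, nearly equal chunks.
    A placement-unaware stand-in for what the tools would decide."""
    if not loads:
        return [[]]
    n = replicas_needed(len(loads), max_fanout)
    size = math.ceil(len(loads) / n)
    return [loads[i:i + size] for i in range(0, len(loads), size)]


# Example: a 10,000-load enable net limited to 500 loads per driver
print(replicas_needed(10_000, 500))  # 20
```

Real tools pick replica groups by placement so each copy drives nearby loads; the contiguous chunks here only show how the count scales.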

Gauging Other Design Metrics

report_control_sets
Indicator of possible packing fragmentation and fitting issues
Use the -verbose option to generate a full list
Use Synplify's syn_reduce_controlset_size attribute for control
Default is 2; set it to 8 to eliminate most of the lowest-fanout control sets

From: Design Methodology Checklist, Design Creation tab
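To see what report_control_sets is counting, take a control set to be the unique combination of clock, clock enable, and set/reset signals on a register. The Python sketch below counts those combinations and flags small ones. The `Register` type, the example signal names, and the threshold of 8 (borrowed from the syn_reduce_controlset_size value above) are illustrative assumptions; the sketch only reports and does not remap any logic.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Register(NamedTuple):
    """Control signals of one flip-flop (None = pin unused)."""
    name: str
    clock: str
    clock_enable: Optional[str]
    set_reset: Optional[str]


def control_set_histogram(regs):
    """Number of registers per unique (clock, CE, set/reset) triple."""
    return Counter((r.clock, r.clock_enable, r.set_reset) for r in regs)


def low_fanout_sets(regs, threshold=8):
    """Control sets that drive fewer than `threshold` registers."""
    hist = control_set_histogram(regs)
    # key=str avoids comparing None with strings while sorting
    return sorted((cs for cs, n in hist.items() if n < threshold), key=str)


regs = (
    [Register(f"data{i}", "clk", None, None) for i in range(16)]
    + [Register(f"ctl{i}", "clk", "en_a", "rst") for i in range(2)]
    + [Register("flag", "clk2", None, "rst")]
)
print(len(control_set_histogram(regs)))  # 3
print(low_fanout_sets(regs))
```

Many small sets are the pattern to look for: registers with different control signals cannot share a slice, so each tiny set can strand unused flip-flops.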

Methodology DRCs
Two new rule decks in 2013.3:
methodology_checks
timing_checks

Usage:
report_drc -ruledeck methodology_checks
report_drc -ruledeck timing_checks
Specific methodology_checks are available only for the elaborated design

In the IDE: Tools > Report > Report DRCs

Review and Resolve Critical Warnings

Vivado does not stop for Critical Warnings
Enables fixing many issues at once
Bitstream generation will error out on unresolved critical warnings

From: UG949, Chapter 5: Implementation, Moving Past Synthesis

Review and Resolve Critical Warnings


Critical warnings are serious design issues
Invalid constraints or XDC syntax errors
Path segmentation
Netlist or target objects not found or invalid

Address these warnings before moving forward


Results of design analysis may be inaccurate
Critical Warnings may prevent design success


Vivado Design Suite
Timing Constraints Creation and Validation

Timing Constraints Need to Be "Clean"

When constraints (clock, I/O) are missing
The corresponding paths are timed optimistically
No violation will be reported, but the design may not work on hardware

When paths are incorrectly constrained
Runtime and optimization effort will be spent on the wrong paths
Reported timing violations may not result in any issues on hardware

When constraints create incorrect HOLD violations
May result in long runtime and SETUP violations
P&R fixes HOLD violations as the #1 priority, because:
Designs with HOLD violations won't work on hardware
Designs with SETUP violations will work, but at a lower clock frequency

Review the Creating Constraints section of the Design Creation chapter in UG949 & the checklist

Include IP Constraints
Many cores have their own constraints / exceptions
PCIe, MIG, RAM-based asynchronous FIFOs

Non-native IP: Be careful!
It is very easy to drop the IP constraints, especially if the IP is provided as .ngc files

Native IP: Constraints included
Sources window in IDE: Compile Order > Constraints
Use report_compile_order -constraints to identify constraint file sources

Method to Create Good Constraints


Create clocks and define clock interactions
Four-step guideline

Set input and output delays


Beware of creating incorrect HOLD violations

Set timing exceptions


Less is more!
Beware of creating incorrect HOLD violations

Use report commands to validate each step


Clock Ground Rules

For SDC-based timers, clocks only exist if you create them
Use create_clock for primary clocks

Clocks propagate automatically through clocking modules
MMCM and PLL output clocks are automatically generated
Gigabit transceiver output clocks are not generated automatically: create them manually
[Figure: clocking diagram annotated "create_clock here" and "don't create_clock here"]

Use create_generated_clock for internal clocks (if needed)
All inter-clock paths are evaluated by default
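Because all inter-clock paths are timed by default, the timer derives a setup requirement from the closest pair of launch and capture edges of two clocks over their common period. The Python sketch below reproduces that kind of search, assuming ideal clocks whose first rising edge is at time 0, rising-edge-to-rising-edge paths, and periods given as integer picoseconds; the function names are hypothetical. It shows why two unrelated clocks can produce a tiny, unrealistic requirement that should instead be declared asynchronous.

```python
from math import gcd


def common_period(a_ps: int, b_ps: int) -> int:
    """Least common multiple of two clock periods."""
    return a_ps * b_ps // gcd(a_ps, b_ps)


def worst_setup_requirement(launch_ps: int, capture_ps: int) -> int:
    """Smallest positive gap from a launch rising edge to the next capture
    rising edge, searched over one common period of both clocks."""
    span = common_period(launch_ps, capture_ps)
    worst = capture_ps  # the gap can never exceed one capture period
    for launch in range(0, span, launch_ps):
        # First capture edge strictly after this launch edge
        capture = (launch // capture_ps + 1) * capture_ps
        worst = min(worst, capture - launch)
    return worst


print(worst_setup_requirement(10_000, 2_500))  # 2500
print(worst_setup_requirement(10_000, 3_300))  # 100
```

A 10 ns clock and a 2.5 ns clock are related, so the 2.5 ns requirement is real. A 10 ns clock and a 3.3 ns clock from separate oscillators yield a 0.1 ns requirement that no design can meet, which is the situation the set_clock_groups guidance later in this section addresses.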

Four Steps for Creating Clocks

Run report_timing_summary before starting constraint capture
View the report_clocks section to see all signals driving clock pins

Step 1
Use create_clock for all primary clocks on top-level ports
Run the design (synthesis) or open the netlist design

Step 2
Run report_clocks
Study the report to verify period, phase and propagation
Apply corrections to your constraints (if needed)

Output of report_clocks (excerpt)
Attributes: P = Propagated, G = Generated

Clock           Period  Waveform        Attributes  Sources
sys_clk         10.000  {0.000 5.000}   P           {sys_clk}
pll0/clkfbout   10.000  {0.000 5.000}   P,G         {pll0/plle2_adv_inst/CLKFBOUT}
pll0/clkout0     2.500  {0.000 1.250}   P,G         {pll0/plle2_adv_inst/CLKOUT0}
pll0/clkout1    10.000  {0.000 5.000}   P,G         {pll0/plle2_adv_inst/CLKOUT1}
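Step 2 is easier to repeat when the expected clock frequencies are written down. The Python sketch below parses the four rows of the report_clocks excerpt, assuming they have been copied into a simplified whitespace-separated form (not the exact report_clocks layout), and flags any clock that is missing or not at an expected frequency. The expected values belong to the designer and are chosen here to match the excerpt; the helper names are hypothetical.

```python
import re

# Rows transcribed from the report_clocks excerpt (simplified layout)
REPORT = """\
sys_clk        10.000 {0.000 5.000} P
pll0/clkfbout  10.000 {0.000 5.000} P,G
pll0/clkout0    2.500 {0.000 1.250} P,G
pll0/clkout1   10.000 {0.000 5.000} P,G
"""

ROW = re.compile(r"^(\S+)\s+([\d.]+)\s+\{([\d.]+)\s+([\d.]+)\}\s+(\S+)$")


def parse_clocks(text):
    """Map clock name -> period, frequency, duty cycle, generated flag."""
    clocks = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        name, period, rise, fall, attrs = m.groups()
        p = float(period)
        clocks[name] = {
            "period_ns": p,
            "freq_mhz": round(1000.0 / p, 3),
            "duty": (float(fall) - float(rise)) / p,
            "generated": "G" in attrs.split(","),
        }
    return clocks


def mismatches(clocks, expected_mhz):
    """Names of clocks that are missing or not at the expected frequency."""
    return [name for name, f in expected_mhz.items()
            if name not in clocks or abs(clocks[name]["freq_mhz"] - f) > 1e-3]


clocks = parse_clocks(REPORT)
print(mismatches(clocks, {"sys_clk": 100.0, "pll0/clkout0": 400.0}))  # []
```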

Four Steps for Creating Clocks (continued)

Step 3
Evaluate the clock interaction using report_clock_interaction
BEWARE: All inter-clock paths are constrained by default!
Mark inter-clock paths (clock domain crossings) as asynchronous
Make sure you designed proper CDC synchronizers
Use set_clock_groups (preferred over set_false_path)
BEWARE: This overrides any set_max_delay constraints!
Do you have unconstrained objects?
Find out with check_timing

Step 4
Run report_clock_networks
You want the design to have clean clock lines without logic
Tip: Use the clock gating option in synthesis to remove LUTs on the clock line

Defining & Validating Clock Interactions


Constraining Cross Clock Domains

Use appropriate synchronizing techniques
2 or more register stages for single bits
FIFO for buses

Maximize MTBF
Use ASYNC_REG to place synchronizing flops in the same slice for the best Mean Time Between Failures (MTBF)

set_property ASYNC_REG TRUE \
    [get_cells [list sync0_reg sync1_reg]]
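The reason to keep synchronizer flops close together shows up in the common first-order metastability model, MTBF = e^(t_settle/τ) / (T_w · f_clk · f_data): every bit of routing delay removed between the stages adds settling time, and MTBF grows exponentially with it. The Python sketch below evaluates that textbook formula. The τ, T_w, and frequency values are made-up illustrative numbers, not characterization data for any Xilinx device.

```python
import math


def synchronizer_mtbf_s(settle_ns, tau_ns, tw_ns, f_clk_hz, f_data_hz):
    """First-order model: MTBF = exp(t_settle / tau) / (T_w * f_clk * f_data).
    Times in ns, frequencies in Hz, result in seconds."""
    tw_s = tw_ns * 1e-9
    return math.exp(settle_ns / tau_ns) / (tw_s * f_clk_hz * f_data_hz)


# Hypothetical parameters, for illustration only
TAU_NS, TW_NS = 0.05, 0.1
F_CLK, F_DATA = 250e6, 10e6

far = synchronizer_mtbf_s(1.5, TAU_NS, TW_NS, F_CLK, F_DATA)    # long route between stages
close = synchronizer_mtbf_s(2.0, TAU_NS, TW_NS, F_CLK, F_DATA)  # flops packed together
print(f"MTBF improves by a factor of {close / far:.0f}")
```

With these assumed numbers, half a nanosecond of recovered settling time multiplies MTBF by e^10, roughly 22,000, which is why placement of the second stage matters more than almost any other synchronizer detail.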

Constraints for Asynchronous CDC

Ignoring timing paths between individual clocks
set_clock_groups -asynchronous -group {clk1} -group {clk2}
This is equivalent to:
set_false_path -from [get_clocks clk1] -to [get_clocks clk2]
set_false_path -from [get_clocks clk2] -to [get_clocks clk1]
BEWARE: This overrides any set_max_delay constraints!

Ignoring timing paths between groups of clocks
# SDC create_clock for the two primary clocks
create_clock -name clk_oxo -period 10 [get_ports clk_oxo]
create_clock -name clk_core -period 10 [get_ports clk_core]
# Set asynchronous clock groups
set_clock_groups -asynchronous \
    -group [get_clocks -include_generated_clocks clk_oxo] \
    -group [get_clocks -include_generated_clocks clk_core]
BEWARE: This overrides any set_max_delay constraints!
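The equivalence stated above generalizes to any number of groups: every clock in one group is cut from every clock in each other group, in both directions, while clocks inside the same group stay timed against each other. The Python sketch below expands a group list into the corresponding set_false_path lines to show how many clock pairs a single set_clock_groups command covers. The generated clock name clk_oxo_div2 is hypothetical, and the output is an illustration, not a recommended replacement for set_clock_groups.

```python
from itertools import permutations


def clock_groups_as_false_paths(groups):
    """Expand set_clock_groups -asynchronous semantics into explicit
    set_false_path commands between every pair of distinct groups."""
    cmds = []
    for src_group, dst_group in permutations(groups, 2):
        for src in src_group:
            for dst in dst_group:
                cmds.append(
                    f"set_false_path -from [get_clocks {src}] -to [get_clocks {dst}]"
                )
    return cmds


for cmd in clock_groups_as_false_paths([["clk1"], ["clk2"]]):
    print(cmd)

# Two groups: clk_oxo plus a hypothetical generated clock, and clk_core
cmds = clock_groups_as_false_paths([["clk_oxo", "clk_oxo_div2"], ["clk_core"]])
print(len(cmds))  # 4
```

The pair count grows with the product of the group sizes, which is one reason the single set_clock_groups command is easier to maintain.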

Setting Input / Output Delays

Start with no I/O constraints
Focus on finding and fixing core timing issues
Vivado does not time paths from or to I/Os without I/O constraints
No need for set_false_path from or to get_ports to ignore I/O timing

Specify realistic I/O delays once core timing is reasonable
Use set_input_delay and set_output_delay
A wrong delay value (e.g. < 0 ns) can cause invalid analysis

The delay value specified is the external delay
Default in UCF: internal delay
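Because XDC delays describe the world outside the FPGA, the time left for the internal part of an I/O path is roughly the clock period minus the external delay. The Python sketch below does that bookkeeping for a setup check, assuming ideal clocks with no skew, jitter, or uncertainty. It also adds two sanity checks inspired by the warning about wrong delay values; a negative delay is not always wrong, so the sketch only asks for a review. The function name and messages are hypothetical.

```python
def io_setup_budget(period_ns, ext_max_ns, ext_min_ns):
    """Return (internal budget in ns, list of warnings) for one I/O path.
    ext_max_ns / ext_min_ns are the -max / -min external delays."""
    warnings = []
    if ext_min_ns > ext_max_ns:
        warnings.append("min delay exceeds max delay")
    if ext_max_ns < 0 or ext_min_ns < 0:
        warnings.append("negative external delay: review the value")
    budget = period_ns - ext_max_ns  # what remains inside the FPGA
    if budget <= 0:
        warnings.append("no internal setup budget left")
    return budget, warnings


print(io_setup_budget(10.0, 3.0, 1.0))   # (7.0, [])
print(io_setup_budget(10.0, 12.0, 1.0))  # (-2.0, ['no internal setup budget left'])
```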

Multicycle Paths
set_multicycle_path N implies a HOLD check at N-1
E.g.: a multicycle_path of 10 implies a HOLD requirement of 9 cycles!
Whenever the setup check is changed, the hold check is also changed

Guidelines for proper multicycle path constraints
Should always be pairs of set_multicycle_path constraints
One for setup and one for hold
Bring the HOLD requirement back to 0 (reduce by N-1) to avoid incorrect HOLD violations

[Figure: regA drives regB (both with CE) over a multicycle path of 3T; waveforms of CLK, regA/CLK, regB/CLK, and REGB/D with the SETUP and HOLD edges marked]

set_multicycle_path -from [get_cells regA] -to [get_cells regB] 3 -setup
set_multicycle_path -from [get_cells regA] -to [get_cells regB] 2 -hold
Hold checked at edge 3 - 1 - 2 = 0
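The edge arithmetic on this slide fits in a few lines of code. In the Python sketch below, edge 0 is the launch edge of a same-clock path: the setup check moves to edge N, the hold check defaults to edge N-1, and a -hold multiplier H moves it back to edge N-1-H. The helper names are hypothetical, and the model ignores clocks with different periods or phase shifts.

```python
def multicycle_edges(setup_mult, hold_mult=0):
    """(setup edge, hold edge) for a same-clock path; edge 0 = launch."""
    if setup_mult < 1 or hold_mult < 0:
        raise ValueError("setup multiplier must be >= 1, hold >= 0")
    return setup_mult, setup_mult - 1 - hold_mult


def paired_multicycle(n, src, dst):
    """The recommended setup/hold pair that brings hold back to edge 0."""
    return [
        f"set_multicycle_path -from {src} -to {dst} {n} -setup",
        f"set_multicycle_path -from {src} -to {dst} {n - 1} -hold",
    ]


print(multicycle_edges(3))     # (3, 2)
print(multicycle_edges(3, 2))  # (3, 0)
for line in paired_multicycle(3, "[get_cells regA]", "[get_cells regB]"):
    print(line)
```

The first call shows the trap: constraining only setup leaves the hold check at edge 2, a two-cycle hold requirement the router will try hard to meet.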

Using Vivado Language Templates

XDC Template
Accessing templates in the IDE: Windows > Language Templates
SDR & DDR templates
Inputs and outputs
Source / system synchronous
Center / edge aligned

Reading the Reports

Reading the report_timing_summary
Intra-clock report
Inter-clock report

Use report_timing for interactivity and advanced options
You would typically use it in the Tcl window
report_timing -through [get_nets {cpu_top/crit_net_name}]
report_timing -setup -max_paths 10 ;# 10 worst setup paths
report_timing -hold -to [get_cells {top/item}] ;# hold timing to item

Use the filters from your XDC files to check each expression
set_multicycle_path <N> -setup -from [get_pins regA/C] -to [get_pins regB/D]
report_timing -from [get_pins regA/C] -to [get_pins regB/D]

Timing Command Summary

Obtain full timing summary of the design
report_timing_summary: summary subsections for all timing checks

Create and validate clocks
check_timing: for missing clocks and I/O constraints
report_clocks: check frequency and phase
report_clock_networks: possible clock roots

Validate clock groups
report_clock_interaction

Validate I/O delays
report_timing -from [input_port] -setup/-hold
report_timing -to [output_port] -setup/-hold

Add exceptions if necessary
Validate using report_timing

Managing Constraint Files

Using a single XDC file
XDC applies to both synthesis & implementation

Using multiple XDC files
Main XDC with top-level constraints
Primary clocks and I/O delays
Exceptions on clocks and RTL objects
Implementation-specific XDC
Physical constraints
Exceptions based on the physical netlist
[Figure: main.xdc and impl.xdc shown against the Elaboration, Synthesis, and Implementation steps]

The order of constraint files matters!
To report the order of XDC files: report_compile_order -constraints

Managing IP Constraint Files

Some IP come with their own XDC constraints
Example: the clocking wizard
By default, the clocking wizard XDC is read before the user XDC
(so user constraints can override IP-defined clocks)

The order of constraint files matters!
To report the order of XDC files: report_compile_order -constraints
Always verify the clocks using report_clocks (step 2 of the 4-step process)
To change the default processing order:
set_property PROCESSING_ORDER EARLY|LATE [get_files <IP_XDC_file>]
If necessary, IP XDC files can be enabled/disabled
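The effect of processing order can be pictured with a toy model. The Python sketch below assumes that XDC files are read grouped by processing order (EARLY, then NORMAL, then LATE, keeping compile order within a group) and that a later clock definition on the same source replaces an earlier one. It is not Vivado's actual constraint engine; the file names, the clock source clk_in1, and the EARLY setting for the IP file are hypothetical. Always confirm the real result with report_clocks.

```python
from dataclasses import dataclass, field

RANK = {"EARLY": 0, "NORMAL": 1, "LATE": 2}


@dataclass
class XdcFile:
    name: str
    processing_order: str = "NORMAL"  # EARLY, NORMAL or LATE
    clocks: dict = field(default_factory=dict)  # clock source -> period (ns)


def effective_clocks(files):
    """Apply files in processing order; a later definition on the same
    clock source overrides an earlier one (toy model)."""
    result = {}
    # sorted() is stable, so compile order is kept inside each group
    for f in sorted(files, key=lambda x: RANK[x.processing_order]):
        for source, period in f.clocks.items():
            result[source] = (period, f.name)
    return result


ip = XdcFile("clk_wiz_0.xdc", "EARLY", {"clk_in1": 10.0})
user = XdcFile("main.xdc", "NORMAL", {"clk_in1": 8.0})
print(effective_clocks([user, ip]))  # user definition wins
ip.processing_order = "LATE"
print(effective_clocks([user, ip]))  # IP definition now wins
```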

Vivado Design Suite
Clock Planning, Pin Planning and Floorplanning

Clock and Pin Planning

Pin and clock planning often happens early in the project
Decisions here can have far-reaching effects throughout the design:
Excessive clock skew
Poor I/O timing
Timing-hazardous clock domain crossings
Less flexible logic placement
Fewer clocking resource choices
Excessive routing delays
Reduced device utilization

Pin and clock planning should be considered together
Choices made for clock pins affect clocking timing and resource choices
Choices made for data pins affect clock pin placement decisions

Review the Board & Device Planning chapter in UG949
Review the Board and FPGA Planning tab in the Design Methodology Checklist

Clock and Pin Planning

Considerations for clock pin planning
Generate all I/O interface and clocking IP prior to pin assignment
Consolidate clocking and MMCMs where possible
Fewer clocks and MMCMs mean fewer clock resources and crossings
Consider all CDCs when assigning clocking resources and pins

Considerations for data pin planning
Group related data pins in the same bank, or in adjacent banks if a single bank is not possible
Place the associated I/O clock in the same bank when possible
Consider associated control signal placement along with data paths
Consider data flow when planning the pinout
Choose a pinout that has clean passage through the device
Place high-fanout signals towards the middle of the chip
Consider CCIO pins with BUFG resources for very high-fanout signals
Evaluate all pin attributes (I/O standard, slew, etc.) during placement

Clock and Pin Planning

Use Vivado pin planning capabilities
Import pin & clocking assignments from generated IP
Visualization of I/O resource placement on the package and in the device
DRC, SSN, and other checks available to validate choices
Configuration pin assignments & possible device migration considerations

Re-evaluate any subsequent pin changes in Vivado
Understand how PCB pin swaps affect timing & resources

Vivado I/O & Clock Planning Tutorial: UG935
Available in DocNav and Vivado

Additional Considerations for SSI Devices

Clocking
High-fanout clocks should be placed in center SLRs
Place regional clocks in the center clock region within an SLR
Place clock pins / MMCMs in the same SLR as timing-critical I/O interfaces
(avoid driving timing-critical I/O interfaces from a different SLR)
Clock pin choices should be balanced across upper & lower SLRs:
2 upper SLR clock domains have 8 BUFG x 2
4 lower SLR clock domains have 4 BUFG x 4

Pinout
High-fanout signals feeding all SLRs should be placed in center SLRs
I/O interfaces should not span across SLRs
Pay attention to data flow across SLRs
Avoid the need for multiple SLR crossings due to pinout decisions

For more details, consult UG872: Large FPGA Methodology Guide

Improving Placement Through Floorplanning

First improve HDL, synthesis & constraints
Avoiding floorplanning, when possible, is easier and more repeatable

Start the design without any floorplanning
See what the P&R algorithms can do without restrictions

Using the Vivado IDE
Highlight placement per module as a guideline
Visualize placement of critical timing paths
Understand data flow in & out of Pblocks
Understand the effects of a Pblock inside & out
Resources around placement can affect data flow
Create Pblocks minding resource utilization

Be careful not to over-floorplan: less is best
Only floorplan the critical areas of the design
Do not create Pblocks with very high utilization
Can create routing congestion or new timing problems
Avoid overlapping Pblocks
Creates more complex placement and clock scenarios

[Figure: Baseline run with highlighted regions]

Vivado Design Suite
Summary

UltraFast™ Methodology Review

For optimal results, adapt your HDL style to the FPGA
Be mindful of BRAM, LUTRAM, DSP, and SRL inference needs
Avoid asynchronous resets, and wired resets in general
Minimize control signals
For large FPGAs, design with the dataflow and floorplanning in mind

Baseline your constraints to converge rapidly
Provide clean timing constraints
Bad constraints result in bad runtime, performance, and hardware failures
Learn the essentials of timing constraint creation & validation methods

Follow pin/clock planning guidelines
The pinout must follow the dataflow
Place high-fanout clocks and pins in the center of SSIT devices

Follow Xilinx
facebook.com/XilinxInc
twitter.com/XilinxInc
youtube.com/XilinxInc

Vivado Design Suite
Thank You