Documente Academic
Documente Profesional
Documente Cultură
Agenda
Part 1: Clock tree synthesis: Fundamentals
Classical clock tree synthesis methods Advanced clock tree synthesis
Summaries
Structure balance
Taping point
Advantages
Eliminate port-CTS tuning process Interconnect aware skew balance Efficient (local clock tree skew optimization)
logic cloud
Principle of minimizing P-variation effect Minimize non-common part of clock tree between launch and capture clock nodes
Bad
Good
Bad
Good
Limitations:
Optimization is focused on expanded clock buffer trees. Quality drops when clock structure becomes complex and CTS constraints are not provided.
Results:
Complex clock structure with control logic Challenges in clock distribution and balancing
CTB
FlipFlop
CTB
Clock divider
CTB CTB
CTB CTB
FlipFlop
Large load cause large slew at non-CTS cell outputs Cascaded non-CTS cells amplify slew To fix slew violation, CTS tools insert back-to-back buffer chain => buffer chain delay is added to clock tree insertion delay !
A practical solution
Find non-CTS output nets Apply heavy net-weight to the nets in cell placement
After expansion
Flop1
Flop2
CTB
Flop2
CTB
Top-level delay
Though not require to be balanced in timing, the local flops are still balanced on loading => increase clock phase delay
CTB Flop3
Top-level delay
After expansion
CTB
Flop1
Flop2
CTB
Flop2
CTB
CTB
CTB
CTB
CTB
Flop3
Flop3
CTB
Top-level delay
Insert and place an isolation buffer to minimize load on clock source => eliminate local flops load on clock tree
PLLs
MPU Core
CLK&RST Mgr.
DMA Controller
DPLL x 2
Challenges of rigid pin constraints and fixed size hard IP core. CLOCK Equalize Steiner distance from clock source to leaf cells. Region clock management block in a small area close to PLL. No hard macros in std_cell placement area.
In general case, skew degrades design speed and causes malfunction due to hold time violations. clock skew need to be minimized
CTS
Clock select
CTS tools usually balance through one dividing path High fan-in mux has large delay variations from inputs
Clock-out
Div/bypass select
Delay line
Paths from Clock-in through the flop and the delay line are balanced.
1/8
CTS
bypass
Pros. Easy to implement Cons. FSM flops need to be skewed earlier than bypass clock to avoid a cycle shift of the equalizer output clock
Clock-in
CTS
Clock select
Pros.
Save one flop Divided clocks and bypass clock are always in phase
GATING
func_enable1
DPLL1
CTS
CTS
GATING
test_enable
test_enable
GATING
CTS
test_enable
1
GATING
DPLL2
CTS
GATING
0
CTS
test_enable
block2_func_enable func_enable2
GATING
CTS
bsr_clock
DOMAIN2
Four 1. 2. 3. 4.
levels of Gating Chip level clock gating Functional domain level clock gating Block level clock gating Transaction level clock gating
1/x Divider
CTS
Asynchronous module
test_enable
Defined constraints for local paths outside of the common clock distribution Balance the common clock distribution and local paths with defined constraints => Clock tree is balanced in various modes
GATING
func_enable1
DPLL1
CTS
CTS
GATING
test_enable
test_enable
GATING
CTS
test_enable
1
GATING
DPLL2
CTS
GATING
0
CTS
test_enable
block2_func_enable func_enable2
GATING
CTS
bsr_clock
DOMAIN2
1/x Divider
CTS
Asynchronous module
test_enable
Clock_in
Clock_out
0
Test_mode
FlipFlop
FlipFlop
Delay line Td
Pros. Single insertion point and easy to implement Implemented before clock tree expansion Local delay tuning has little disturbance in layout
Delay line Td
Clock_Gating_ Cell
CTB
Flop
Clock_Gating_ Cell
CTB
Flop
CTB
CTB CTB
CTB
Flop
CTB
Flop
Clock_Gating_ Cell
Clock_Gating_ Cell
Clock divider
CTS
Crosstalk induced violations Solution: Identify critical clock routes, reroute the nets with wide spacing or shielding
Summary
Clock tree synthesis: Fundamentals
Classical clock tree synthesis methods Advanced clock tree synthesis
Q&A