
Advanced Digital IC-Design
Overview

Clocking & Timing
Synchronous
Asynchronous
Self-Timed Design

Synchronous Circuit vs. Asynchronous Circuit

[Figure: synchronous circuit IN -> REG -> Logic -> REG -> ... -> OUT, all registers driven by one CLK; asynchronous circuit IN -> REG -> Logic -> REG -> ... -> OUT, each REG controlled by a handshake block that exchanges Req/Ack with its neighbours and Go/Done with its logic]

Synchronous: global synchronization
Clock period > max delay ($t_{logic} + t_R$)

Asynchronous: local synchronization (handshaking)
Request / Acknowledge
Globally Asynchronous Locally Synchronous (GALS)

[Figure: a clocked domain (IN -> REG -> Logic -> REG -> OUT, driven by a local clock) connected through Req/Ack interfaces to an asynchronous environment]

Synchronous Design

The purpose of the clock is to:
Synchronize the registers on the chip with each other
Synchronize the registers on the chip with the external world

Clock skew is a large problem.

Sequential Logic

[Figure: sequential logic as combinational logic blocks between state registers (State -> Comb. Logic -> State -> ...); registers are implemented with latches or flip-flops]

Latch versus Register

A latch stores data when the clock is low (high).
A flip-flop (or register) stores data when the clock rises (falls).

Latch: level sensitive
Register: edge triggered

[Figure: Clk, D and Q waveforms; the latch output follows the data while the clock is high ("Q on data"), the register output changes only at the clock edge ("Q on clock edge")]
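The latch/register difference can be reproduced with two small behavioral models driven by the same stimulus. This is a sketch, assuming a positive latch (transparent while Clk is high) and a rising-edge flip-flop; the class names and the sampled-waveform representation are illustrative only.

```python
class Latch:
    """Positive level-sensitive D latch: transparent while clk == 1, holds while clk == 0."""

    def __init__(self):
        self.q = 0

    def step(self, clk, d):
        if clk == 1:
            self.q = d  # transparent: Q follows D
        return self.q


class FlipFlop:
    """Positive edge-triggered D flip-flop: samples D only on a 0 -> 1 clock transition."""

    def __init__(self):
        self.q = 0
        self.prev_clk = 0

    def step(self, clk, d):
        if self.prev_clk == 0 and clk == 1:
            self.q = d  # capture on the rising edge only
        self.prev_clk = clk
        return self.q


# Same stimulus for both; D changes while the clock is high.
clk = [0, 1, 1, 1, 0, 0, 1, 1]
d = [1, 1, 0, 1, 1, 0, 0, 1]

latch, ff = Latch(), FlipFlop()
latch_q = [latch.step(c, x) for c, x in zip(clk, d)]
ff_q = [ff.step(c, x) for c, x in zip(clk, d)]
print("latch:", latch_q)
print("flop: ", ff_q)
```

The latch output follows every change of D during the high phase, while the flip-flop only updates at the two rising edges.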
Clock Non-Idealities

Clock skew: spatial variation in temporally equivalent clock edges
Clock jitter: temporal variations in consecutive edges of the clock signal

Both skew and jitter affect the cycle time.
Skew might lead to races through the registers.

[Figure: the same clock observed at two different locations on the chip, showing t_skew between the locations and t_jitter between consecutive edges]

Clock Non-Idealities - Feedthrough Example

[Figure: dynamic gate with nodes A, B, C and VDD (always on); node voltage (V) versus time (0 to 1 ns) showing clock feedthrough]

Coupling in dynamic devices (clock feedthrough) can lift the output.

Clock System

[Figure: global clock -> phase-locked loop, $f_{SYS} = \frac{N}{M} f_{CLK}$ -> system clock gated by Enable 1 to 3 -> local clock signals for Modules 1 to 3, with data and de-skew between the modules]

On-chip clock generation -> clock gating -> clocked modules
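Synchronous Pipelined Datapath

[Figure: In -> R1 -> Logic Block #1 -> R2 -> Logic Block #2 -> R3 -> Logic Block #3 -> R4; register delay t_pd,reg and logic delays t_pd1, t_pd2, t_pd3; CLK reaches the registers through delay elements]

The delay in the clock line gives clock skew.

Clock Skew

Absolute skew
Relative skew

Example: 15 mm wire on a 10x10 mm chip
L = 15 mm, R = 4 kΩ, C = 300 fF

$t_{pHL} = 0.69RC = 0.69 \cdot 4\,\mathrm{k\Omega} \cdot 300\,\mathrm{fF} \approx 0.83\,\mathrm{ns}$

Max frequency:
$f_{max} = \frac{1}{2 t_{pHL}} \approx \frac{1}{2 \cdot 0.83\,\mathrm{ns}} \approx 600\,\mathrm{MHz}$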
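The wire example can be checked numerically. A minimal sketch using only the slide's values and its lumped 0.69RC delay model; it shows that 600 MHz is the rounded value of $1/(2 t_{pHL})$ with $t_{pHL} \approx 0.83$ ns.

```python
R = 4e3        # wire resistance in ohms (from the example)
C = 300e-15    # wire capacitance in farads (from the example)

t_phl = 0.69 * R * C          # lumped-RC 50% propagation delay, seconds
f_max = 1.0 / (2.0 * t_phl)   # the slide's rule: one period covers two t_pHL

print(f"t_pHL = {t_phl * 1e9:.3f} ns")
print(f"f_max = {f_max / 1e6:.0f} MHz")
```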

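Clock Skew

[Figure: clock and data routing; a data bus runs In -> REG -> REG -> REG -> Out while the clock line runs alongside; clock routed in the same direction as the data gives positive skew, clock routed against the data gives negative skew]

Setup- and Hold-times

[Figure: setup and hold windows around the clock edge, each widened by t_jitter: t_jitter + t_setup before the edge and t_jitter + t_hold after it]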
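A minimal sketch of the setup and hold checks behind the figure, assuming the common textbook form in which skew > 0 means the capturing register's clock arrives later and each of the two clock edges may move by t_jitter in the worst direction. The `Path` class and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Register-to-register path timing in ns (all values hypothetical)."""
    t_cq: float         # clock-to-Q delay of the launching register (max)
    t_cq_min: float     # clock-to-Q contamination delay (min)
    t_logic: float      # longest combinational delay
    t_logic_min: float  # shortest combinational delay
    t_setup: float      # setup time of the capturing register
    t_hold: float       # hold time of the capturing register


def setup_ok(p, period, skew, jitter):
    # T + skew - 2*jitter >= t_cq + t_logic + t_setup
    return period + skew - 2 * jitter >= p.t_cq + p.t_logic + p.t_setup


def hold_ok(p, skew, jitter):
    # New data must not enter the capturing register's hold window:
    # t_cq_min + t_logic_min >= t_hold + skew + 2*jitter
    return p.t_cq_min + p.t_logic_min >= p.t_hold + skew + 2 * jitter


path = Path(t_cq=0.5, t_cq_min=0.3, t_logic=3.0, t_logic_min=0.5,
            t_setup=0.2, t_hold=0.1)
for skew in (-0.2, 0.0, 0.6):
    print(f"skew {skew:+.1f} ns: setup {setup_ok(path, 4.0, skew, 0.1)}, "
          f"hold {hold_ok(path, skew, 0.1)}")
```

Positive skew relaxes the setup check but tightens the hold check, which is the race-through risk mentioned under clock non-idealities.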
Clock Skew

[Figure: external clock (Ext. CLK) distributed to clock phases 1 and 2]

Has a large relative skew
A large skew requires a large non-overlap between the clock phases

General Clock Distribution Tree

[Figure: clock source -> root -> trunk -> branches -> leaves]

Balanced Clock Net

[Figure: clock driven through distributed buffers]

All wires and buffers are carefully balanced
Small relative skew
Absolute skew of less importance

Clock Distribution: H-Tree

[Figure: clock distributed through an H-tree]
Clock Distribution: H-Tree

[Figure: a balanced H-tree structure next to a realistic H-tree]

IBM G4 Processor

Achieves a skew control of 25 ps

Symmetric Clock Distribution Networks

[Figure: H-tree and X-tree]

Distributed Buffers

[Figure: clock driven through distributed buffers]

Small relative skew
Absolute skew of less importance
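A minimal geometric sketch of why a balanced H-tree gives a small relative skew: every leaf is reached through the same total wire length, so with identical wire segments and loads (an idealization that a realistic H-tree only approximates) the RC delay from the root is the same everywhere. The function `h_tree` and its coordinate convention are illustrative.

```python
def h_tree(x, y, size, levels, dist=0.0):
    """Return (x, y, wire_length) for every leaf of an idealized H-tree.

    (x, y): centre of the current 'H'; its four tips are at (x +/- size, y +/- size).
    dist:   wire length already travelled from the clock root to (x, y).
    Each level halves `size`, as in a recursive H-tree layout.
    """
    if levels == 0:
        return [(x, y, dist)]
    leaves = []
    for dx in (-size, size):        # horizontal bar of the H
        for dy in (-size, size):    # vertical bar at each end of it
            leaves += h_tree(x + dx, y + dy, size / 2, levels - 1,
                             dist + abs(dx) + abs(dy))
    return leaves


leaves = h_tree(0.0, 0.0, 4.0, levels=3)
lengths = {d for _, _, d in leaves}
print(len(leaves), "leaves; root-to-leaf wire lengths:", lengths)
```

All 64 leaves end up on a regular grid, and the set of root-to-leaf lengths has a single element, i.e. zero relative skew in the ideal case.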
Clock Grid

[Figure: clock driving a grid of low-impedance interconnect]

Low-impedance interconnect
Power hungry

Clock Deskewing

[Figure: clock distributed through delay lines; a phase detector (Phase Det.) compares the delayed clocks and the deskew control adjusts the delay lines]

Clock Ring

[Figure: clock ring with averaging nodes (AVG) and a clock grid; TSPC]

Example: Alpha 21164 (0.55 µm)

Clock frequency: 300 MHz
Transistors: 10 million
Total clock load: 3.75 nF
Clock power: 20 W (out of 50 W)
Clock levels: 2
Driver size: 58 cm

[Figure: local clocks]
Example: Alpha 21164

[Figure: clock drivers]

600 MHz Alpha: hybrid clock distribution
Four clock grids under a balanced clock net
Relative skew: 72 ps
Skew Analysis - Example

[Figure: registers R1, R2, R3 on a common clk with positive "clock skew" between adjacent registers; logic blocks L and a multiplexer (MUX) between R1 and R2, including a feedback path from R2 back to its own input through the MUX; three logic blocks L in series between R2 and R3]

a. Determine the minimum clock period if clock skew is disregarded.
b. Determine the minimum clock period if there is 1 ns positive clock skew between adjacent registers.
c. Determine the minimum clock period if there is 3 ns positive clock skew between adjacent registers.
d. Calculate the maximum clock skew for the datapath, both positive and negative, if the clock signal has a period of 16 ns.

Register R setup time   t_S   0.5 ns
Register R delay time   t_R   0.5 ns
Logic L delay time      t_L   3.0 ns
Mux delay time          t_M   1.0 ns

a. No clock skew:
R1 to R2: $t_R + 2t_L + t_M + t_S = 0.5 + 2 \cdot 3.0 + 1.0 + 0.5 = 8$ ns
R2 to R3: $t_R + 3t_L + t_S = 0.5 + 3 \cdot 3.0 + 0.5 = 10$ ns
R2 to R2: $t_R + 2t_L + t_M + t_S = 0.5 + 2 \cdot 3.0 + 1.0 + 0.5 = 8$ ns
Answer: The minimum clock period is 10 ns.

Skew Analysis - Example

b. Minimum clock period with 1 ns positive clock skew between adjacent registers:
R2 to R3: $t_R + 3t_L + t_S - t_{SKEW} = 0.5 + 3 \cdot 3.0 + 0.5 - 1 = 9$ ns
Answer: The minimum clock period is 9 ns (no skew in the feedback path R2 to R2, which needs 8 ns).

c. Minimum clock period with 3 ns positive clock skew between adjacent registers:
R2 to R2: $t_R + 2t_L + t_M + t_S = 0.5 + 2 \cdot 3.0 + 1.0 + 0.5 = 8$ ns
Answer: The minimum clock period is 8 ns. The skewed path R2 to R3 now needs only 10 - 3 = 7 ns, so the unskewed feedback loop R2 to R2 sets the limit.
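Parts a to c and the negative-skew half of d can be recomputed with a few lines. This sketch uses the table values and the slide's convention that positive skew (capturing register clocked later) is subtracted on paths between adjacent registers, while the R2 to R2 feedback loop sees no skew. The names `PATHS`, `min_period` and `max_negative_skew` are illustrative.

```python
# Delays from the example table, in ns
T_S = 0.5   # register setup time
T_R = 0.5   # register delay time
T_L = 3.0   # logic block delay
T_M = 1.0   # multiplexer delay

# Longest combinational delay of each register-to-register path, and whether
# the path goes between adjacent registers (and therefore sees the skew).
PATHS = {
    "R1->R2": (2 * T_L + T_M, True),
    "R2->R3": (3 * T_L, True),
    "R2->R2": (2 * T_L + T_M, False),  # feedback loop: same register, no skew
}


def min_period(skew=0.0):
    """Minimum clock period for a given positive skew between adjacent registers."""
    return max(T_R + d + T_S - (skew if skewed else 0.0)
               for d, skewed in PATHS.values())


def max_negative_skew(period):
    """Largest negative skew (capturing register clocked earlier) that still
    meets setup on every skewed path at the given period."""
    return min(period - (T_R + d + T_S)
               for d, skewed in PATHS.values() if skewed)


for label, skew in (("a", 0.0), ("b", 1.0), ("c", 3.0)):
    print(f"{label}: T_min = {min_period(skew)} ns")
print(f"d: max negative skew at 16 ns = {max_negative_skew(16.0)} ns")
```

The positive-skew bound in d is a race (hold-type) constraint on the shortest R1 to R2 path and is not modelled here.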
Skew Analysis - Example

d. Maximum clock skew for the datapath, both positive and negative, with a clock period of 16 ns:

Negative skew, R2 to R3: $16 - (t_R + 3t_L + t_S) = 16 - 0.5 - 3 \cdot 3.0 - 0.5 = 6$ ns
(6 ns for the clock to R2 plus 10 ns for the signal through the logic)
Positive skew, R1 to R2: $t_R + t_L + t_M + t_S = 0.5 + 3.0 + 1.0 + 0.5 = 5$ ns
(R2 must close before the new signal from R1 arrives)

Synchronizing Signals (Metastability)

Signals coming from asynchronous domains, or from synchronous domains with different clock periods.

[Figure: asynchronous system -> synchronization -> synchronous system]

Synchronizing Signals (Metastability)

Metastable state: a possible output from a flip-flop.
Can occur if the setup time t_SU, hold time t_H, or clock pulse width t_PW of a flip-flop is not met.

[Figure: DATA IN -> D Q -> Q1, clocked by CLK; waveforms of DATA IN, CLK and Q1 showing the aperture window t_W around the clock edge, t_SU, t_CO and the resolution time t_res before Q1 settles to 1 or 0]

t_res is important for the MTBF.
Many designers are not aware of metastability.

t_W = time window where an input transition may cause a metastable condition
t_SU = actual clock setup time for the flip-flop
t_CO = actual flip-flop propagation delay
t_res = metastability resolution time
Metastability

$\mathrm{MTBF} = \frac{e^{K_2 t_{res}}}{K_1 f_{CLK} f_{DATA}}$

Mean Time Between Failure (MTBF) is exponential in t_res.
t_res is the slack time available for settling.
K1 and K2 are constants that are characteristics of the flip-flop.
f_CLK and f_DATA are the frequencies of the synchronizing clock and the asynchronous data.

MTBF variations due to the metastability resolution time t_res

[Figure: MTBF (seconds, log scale from 10^1 to 10^11, with reference levels 1 hour, 1 day, 1 month, 1 year and 1000 years) versus t_res = available slack time (2 to 10 ns), for ACTEL ACT 1 devices with f_DATA = 1 MHz and f_CLOCK = 10 MHz]
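A minimal sketch that evaluates the MTBF formula. K1 and K2 below are hypothetical placeholders, not the ACT 1 constants behind the plot; whatever their values, each extra 1/K2 of resolution time multiplies the MTBF by e, which is why the curve is a straight line on a log scale.

```python
import math


def mtbf(t_res, f_clk, f_data, k1, k2):
    """MTBF in seconds: exp(K2 * t_res) / (K1 * f_clk * f_data).

    t_res:        resolution (slack) time in seconds
    f_clk/f_data: synchronizing-clock and data frequencies in Hz
    k1 (seconds) and k2 (1/seconds): flip-flop constants
    """
    return math.exp(k2 * t_res) / (k1 * f_clk * f_data)


# Hypothetical flip-flop constants, NOT taken from a data sheet.
K1 = 1e-10   # s
K2 = 5e9     # 1/s, i.e. a settling time constant 1/K2 = 0.2 ns

for t_ns in (2, 4, 6, 8, 10):
    m = mtbf(t_ns * 1e-9, f_clk=10e6, f_data=1e6, k1=K1, k2=K2)
    print(f"t_res = {t_ns:2d} ns -> MTBF = {m:.3g} s")
```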

Synchronizer

[Figure: asynchronous input Da -> FF1 (D Q) -> Q1 -> FF2 (D Q) -> Q2, the synchronized signal Ds; both flip-flops are clocked by a global low-skew clock CLK]

If D changes within the aperture time (setup + hold) of the flip-flop, Q1 is uncertain.
However, FF2 may still register proper data: Q1 is correct in the next register if it has become stable before the next clock edge.
There is a much higher probability of a stable Q2 than of a stable Q1.

[Timing diagram: CLK, D, Q1 and Q2; a timing violation on D leads to metastability on Q1, while Q2 is sampled after Q1 has had a clock period to settle]
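The benefit of the second flip-flop can be estimated by feeding the extra settling time into the MTBF formula. This is a rough sketch under a simplifying assumption: every stage contributes about one clock period minus t_CO and the next stage's setup time to the total resolution time, and these slacks add up. The function name and all numeric values are hypothetical.

```python
import math


def synchronizer_mtbf(stages, t_clk, t_co, t_su, f_data, k1, k2):
    """Rough MTBF (seconds) of a chain of `stages` flip-flops on one clock.

    Assumption: each stage gives a metastable value about one clock period of
    settling time, minus the clock-to-output delay t_co and the setup time t_su
    of the next flip-flop, and the slacks of all stages simply add up.
    """
    f_clk = 1.0 / t_clk
    t_res = stages * (t_clk - t_co - t_su)
    return math.exp(k2 * t_res) / (k1 * f_clk * f_data)


# Hypothetical values: 100 MHz clock, 1 MHz data, made-up flip-flop constants.
T_CLK, T_CO, T_SU = 10e-9, 1e-9, 0.5e-9
K1, K2 = 1e-10, 5e9
F_DATA = 1e6

one = synchronizer_mtbf(1, T_CLK, T_CO, T_SU, F_DATA, K1, K2)
two = synchronizer_mtbf(2, T_CLK, T_CO, T_SU, F_DATA, K1, K2)
print(f"1 stage : MTBF = {one:.3g} s")
print(f"2 stages: MTBF = {two:.3g} s (x{two / one:.3g})")
```

Under this model the second stage multiplies the MTBF by $e^{K_2 (T_{CLK} - t_{CO} - t_{SU})}$, which is the quantitative reason a stable Q2 is much more likely than a stable Q1.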
Synchronous - Asynchronous

Synchronous
Clock skew
Worst-case delay sets the speed

Asynchronous
Non-trivial design task due to races

Solution
Self-timed design?

Why Asynchronous Circuits?

Common arguments:
Low power - Maybe
High speed - Sometimes
Low emission - Yes
Low sensitivity to process, voltage, and temperature variations - Yes
No clock distribution and timing problems - Yes
No clock skew problems - Yes
Less interference to the analog domain - Yes

Drawbacks - Asynchronous Design

Increased complexity and design time
Poor support from design tools
Circuit overhead compared to synchronous designs; 100% is not unusual
Metastability, deadlock, and race hazards

Motivation - Asynchronous Design

[Figure: supply current in two designs, synchronous and asynchronous]

Asynchronous designs are more noise robust.
Noise in Supply Plane

[Figure: supply-plane noise of a synchronous DSP compared with an asynchronous DSP]
Source: James Awad, Octasic Semiconductor

Asynchronous Modules

[Figure: data passes through logic blocks; each logic block is started by "go" and signals "done", and neighbouring handshake blocks exchange req/ack]

The Most Basic Protocol

1. The sender issues a request.
2. The receiver replies with an acknowledge.
3. Then the sender sends the data.

[Figure: Module 1 and Module 2 exchanging 1. Req, 2. Ack, 3. Data (n bits)]

If the sender initiates the data transfer, the transfer channel is a push channel.
If the receiver initiates the transfer, the channel is a pull channel.

The Two-Phase Protocol

Events on both rising and falling edges (no return-to-zero transitions):
1. The sender establishes stable data.
2. The sender produces a request.
3. The receiver absorbs the data and produces an acknowledge.

[Figure: Data, Req and Ack waveforms over Cycle 1 and Cycle 2]
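The two-phase steps can be mimicked with a small behavioral model in which a request or acknowledge is any transition of the wire, rising or falling. This is a sequential sketch of a push channel, not a circuit; the class name and the `transitions` counter are illustrative.

```python
class TwoPhaseChannel:
    """Two-phase (transition-signalling) bundled-data push channel.

    A request is any transition on Req and an acknowledge is any transition
    on Ack; rising and falling edges are equivalent (no return-to-zero).
    """

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None
        self.transitions = 0

    def send(self, value):
        # The channel is free when Req and Ack are at the same level.
        assert self.req == self.ack, "previous transfer not acknowledged"
        self.data = value   # 1. sender establishes stable data
        self.req ^= 1       # 2. sender produces a request (toggle Req)
        self.transitions += 1

    def receive(self):
        # A pending request is signalled by Req != Ack.
        assert self.req != self.ack, "no pending request"
        value = self.data   # 3. receiver absorbs the data ...
        self.ack ^= 1       #    ... and produces an acknowledge (toggle Ack)
        self.transitions += 1
        return value


ch = TwoPhaseChannel()
received = []
for v in (5, 7, 9):
    ch.send(v)
    received.append(ch.receive())
print(received, ch.req, ch.ack, ch.transitions)
```

Each transfer costs exactly two wire transitions, and the wires do not return to a fixed level between transfers.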
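The Four-Phase Protocol

Return-to-zero transitions:
1. The sender issues data and sets Req high.
2. The receiver absorbs the data and sets Ack high.
3. The sender responds by setting Req low.
4. The receiver acknowledges by setting Ack low.

[Figure: Data, Req and Ack waveforms over Cycle 1 and Cycle 2, with the four events numbered 1 to 4]

The Muller-C Element

A B | Q
0 0 | 0
0 1 | Q (previous value held)
1 0 | Q (previous value held)
1 1 | 1

[Figure: static implementation (gates and an SR latch) and dynamic implementation (transistor stack between VDD and ground) of the C element with inputs A, B and output Q]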
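The truth table translates directly into a state-holding model: the output copies the inputs when they agree and keeps its old value when they differ. A minimal behavioral sketch; the class name `MullerC` is illustrative.

```python
class MullerC:
    """Two-input Muller C-element: when A == B the output takes that value,
    otherwise it holds its previous value."""

    def __init__(self, q=0):
        self.q = q

    def update(self, a, b):
        if a == b:
            self.q = a
        return self.q


c = MullerC()
trace = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)
```

The output only rises after both inputs are 1 and only falls after both are 0, which is what lets C elements wait for two handshake events.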

Two-Phase Handshake Protocol - Implementation using Muller-C elements

[Figure: sender logic and receiver logic connected by an n-bit data bus; "Data ready" and "Data accepted" are combined with Req and Ack through C elements]

Four-Phase Handshake Protocol - Implementation using Muller-C elements

[Figure: sender logic and receiver logic connected by the data bus; handshake logic built from C elements and an SR latch generates Req and Ack]
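For comparison with the two-phase case, the four-phase protocol can be traced as an ordered list of level changes. This sequential sketch records the events of each transfer; it models the protocol order only, not the C-element circuit, and the class name is illustrative.

```python
class FourPhaseChannel:
    """Four-phase (return-to-zero) bundled-data channel.

    Every transfer drives Req and Ack high and then back low: four transitions.
    """

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None
        self.events = []  # (signal, new level) in the order they occur

    def _set(self, name, level):
        setattr(self, name, level)
        self.events.append((name, level))

    def transfer(self, value):
        assert self.req == 0 and self.ack == 0, "channel not idle"
        self.data = value       # bundled data must be stable before Req rises
        self._set("req", 1)     # 1. sender issues data and sets Req high
        received = self.data    # 2. receiver absorbs the data ...
        self._set("ack", 1)     #    ... and sets Ack high
        self._set("req", 0)     # 3. sender sets Req low
        self._set("ack", 0)     # 4. receiver sets Ack low: back to idle
        return received


ch4 = FourPhaseChannel()
got = [ch4.transfer(v) for v in (3, 8)]
print(got, ch4.events)
```

Two transfers take eight transitions here versus four in the two-phase model, the price paid for the simpler level-based logic.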
Advanced Digital IC-Design
Clocking & Timing (cont.)

Student Lectures

Send your slides to me at the latest the night before your presentation.
Preferred format: .ppt
You will be evaluated by your friends.
Please look at the template: http://www.eit.lth.se/course/eti135 -> Presentations

Home Exercises

Solutions to 4 hand-in assignments are required, see http://www.eit.lth.se/course/eti135 -> Home Exercises
Deadline: March 8

Advanced Digital IC Design - Invited Lecture

Static timing analysis, 11/02, 15.15-17.00
Design for test is canceled.
Circuit Implementation Styles

Four-phase bundled-data: most closely resembles the design of synchronous circuits and normally leads to the most efficient circuits, due to the extensive use of timing assumptions (example: Amulet 2 processor).

Two-phase bundled-data: also known as micropipelines, introduced by Ivan Sutherland in his 1988 Turing Award lecture (example: Amulet 1 processor).

Four-phase dual-rail: the classic approach, introduced by Muller's pioneering work in the 1950s.

Two-phase dual-rail: for example the Level-Encoded two-phase Dual-Rail scheme (LEDR).

2-Phase Protocol

[Figure: 2-phase protocol]

Example

[Figure: examples, from [Horowitz]]
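The two-phase dual-rail (LEDR) style listed above can be illustrated with a small encoder. This sketch follows the usual description of LEDR: a value wire V carries the data and a repeat wire R is set so that the phase V XOR R alternates with every token, so exactly one of the two wires changes per token. The reset state (0, 0) and the function names are assumptions.

```python
INITIAL = (0, 0)  # assumed reset state of (V, R); its phase V ^ R is 0


def ledr_encode(bits):
    """Encode a sequence of bits as (V, R) pairs with alternating phase."""
    wires = []
    for i, bit in enumerate(bits):
        phase = (i + 1) % 2              # phases alternate 1, 0, 1, ... after reset
        wires.append((bit, bit ^ phase))  # V = data, R chosen so V ^ R == phase
    return wires


def ledr_decode(wires):
    return [v for v, _ in wires]          # the value wire carries the data directly


def phase(pair):
    v, r = pair
    return v ^ r


tokens = [1, 1, 0, 0, 1]
wires = ledr_encode(tokens)
seq = [INITIAL] + wires
# Count wire transitions between consecutive states (from reset onwards).
toggles = [(a[0] != b[0]) + (a[1] != b[1]) for a, b in zip(seq, seq[1:])]
print(wires, toggles)
```

When the data repeats, R toggles; when the data changes, V toggles, so the receiver detects each new token from the change of phase.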
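Example

[Figure: examples]

Completion Signal Generation

[Figure: dual-rail dynamic gate; Start controls precharge and evaluation of two pull-down networks (PDN), producing the dual-rail outputs B0 and B1; B signals completion]

Phase        B0  B1  B  Comment
Precharge    0   0   0  Not done
Evaluation   0   1   1  Done
Evaluation   1   0   1  Done
Illegal      1   1   -  Illegal

Used in self-timed modules.

Self-Timed Pipelining

[Figure: In -> R1 -> F1 -> R2 -> F2 -> R3 -> F3 -> Out; each register has a handshake block exchanging Req/Ack with its neighbours and Start/Done with its function block; stage delays tp1, tp2, tp3]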
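The completion table reduces to B = B0 OR B1 with (1, 1) rejected. A minimal sketch; the assignment "B1 high means logic 1, B0 high means logic 0" is an assumed convention, and the function names are illustrative.

```python
SPACER = (0, 0)  # precharge phase: no valid data yet


def dual_rail_encode(bit):
    """Map a logic value to (B0, B1). Assumed convention: B1 high = 1, B0 high = 0."""
    return (0, 1) if bit else (1, 0)


def completion(b0, b1):
    """Completion signal B from the table: B = B0 OR B1; (1, 1) is illegal."""
    if b0 and b1:
        raise ValueError("illegal dual-rail code (B0, B1) = (1, 1)")
    return b0 | b1


def dual_rail_decode(b0, b1):
    if not completion(b0, b1):
        raise ValueError("no valid data yet (precharge)")
    return b1


for pair in (SPACER, dual_rail_encode(0), dual_rail_encode(1)):
    print(pair, "B =", completion(*pair))
```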
Delay Model

[Figure: In -> R1 -> F1 -> R2 -> F2 -> Out, where each handshake block's Start -> Done path goes through a delay model (e.g. the critical path) instead of completion detection]

Delay Matched Completion Detection

Delay replicas matched to critical paths
Worst-case delay
Sensitive to process variations
Small circuit overhead

Combined Methods

[Figure: In -> R1 -> R2 -> R3 -> Out with self-timed, delay-model and self-timed stages (delays tp1, tp2, tp3), all linked by Req/Ack handshakes]

Completion Detection

[Figure: dual-rail logic outputs combined through C elements into a single Done signal]

Waits for all parts to be ready.
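Completion detection for a multi-bit dual-rail word can be sketched by OR-ing the two rails of each bit and feeding the results into a C element. Here the tree of C elements is modelled as a single n-input C element, which behaves the same once its inputs have settled (an assumption of this sketch); the names are illustrative.

```python
class CElement:
    """n-input Muller C-element: output rises when all inputs are 1, falls when
    all inputs are 0, and otherwise holds its previous value."""

    def __init__(self):
        self.q = 0

    def update(self, inputs):
        if all(inputs):
            self.q = 1
        elif not any(inputs):
            self.q = 0
        return self.q


def word_done(detector, word):
    """word: list of dual-rail (b0, b1) pairs; a bit is valid when either rail is high."""
    valid = [b0 | b1 for b0, b1 in word]
    return detector.update(valid)


det = CElement()
steps = [
    [(0, 0), (0, 0), (0, 0)],  # spacer (precharge): nothing valid
    [(0, 1), (0, 0), (1, 0)],  # partially evaluated: Done stays low
    [(0, 1), (1, 0), (1, 0)],  # all bits valid: Done rises
    [(0, 0), (1, 0), (1, 0)],  # partially reset: Done holds
    [(0, 0), (0, 0), (0, 0)],  # all back to spacer: Done falls
]
done_trace = [word_done(det, w) for w in steps]
print(done_trace)
```

Done rises only when every bit is valid and falls only when every bit has returned to the spacer, i.e. the detector waits for all parts to be ready.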
Other Asynchronous Modules

Linear pipelines (only one input and output)
Non-linear pipelines

[Figure: function blocks F connected as a fork, a join, a conditional join and a conditional split]

Synchronous - Asynchronous

[Figure: design styles placed along a global/local and a synchronous/asynchronous axis: traditional, practiced, and Globally Asynchronous Locally Synchronous]

Divide into smaller synchronous blocks

Globally Asynchronous Locally Synchronous, to avoid skew.
Clocking becomes less troublesome for small clock domains.

Synchronous - Asynchronous

Local synchronous clock generation

[Figure: input reference, input delay state, up/down delay control counter, digitally controlled oscillator, cycle counter and multiplication factor, producing the output clock]
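The figure only names the blocks of the local clock generator, so the following is one hypothetical way such a loop could behave: a digitally controlled oscillator whose period grows with a delay code, a cycle counter that counts output cycles per reference period, and an up/down counter that adjusts the code until the count equals the multiplication factor. All names, the linear period model and the integer-picosecond values are assumptions.

```python
def lock_dco(t_ref, mult, t0, step, code=0, max_iters=100):
    """Hypothetical digital frequency-locking loop (all times in integer ps).

    DCO period = t0 + code * step. Once per reference period the output cycles
    are counted; too many cycles (DCO too fast) -> increase the delay code,
    too few -> decrease it. Returns the final code and the cycle-count history.
    """
    counts = []
    for _ in range(max_iters):
        period = t0 + code * step
        cycles = t_ref // period           # output cycles in one reference period
        counts.append(cycles)
        if cycles > mult:
            code += 1                      # too fast: add delay
        elif cycles < mult:
            code = max(code - 1, 0)        # too slow: remove delay
        else:
            break                          # count matches the multiplication factor
    return code, counts


code, counts = lock_dco(t_ref=100_000, mult=8, t0=5_000, step=500)
print("delay code:", code, "cycle counts:", counts)
```

This loop locks on the cycle count only, so the output period ends up anywhere in the range that gives exactly `mult` cycles per reference period; a finer phase comparison would be needed for a tighter lock.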
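Clock Generation (PLL)

Higher on-chip frequency

[Figure: off-chip clock -> phase detector -> loop filter -> voltage-controlled oscillator -> on-chip clock, with a divider in the feedback path from the oscillator back to the phase detector]

PLL (AXIS)

[Figure: PLL (AXIS)]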

S-ar putea să vă placă și