KV Pipeline Delay Pres PDF

2/24/2020
Pipelining
Kuruvilla Varghese
DESE
Indian Institute of Science
Pipelining - Genesis
• Manufacturing assembly line
• e.g. Automobile engine assembly line
– One worker assembling a whole engine.
– Ten workers assembling the engine in parts, sequentially.
• Latency: time taken to complete one engine
• Throughput: number of engines completed per unit time
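Idea: Large Datapath
[Figure: a single combinational block of delay tcomb between two registers clocked by clk]
$t_{clk} > [t_{co} + t_{comb} + t_s]_{\text{max path}}$
Latency: 1 Clock Cycle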
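Breaking Datapath
[Figure: the combinational block split into three parts of delay tcomb/3 each, with registers between them]
$t_{clk} > [t_{co} + t_{comb}/3 + t_s]_{\text{max path}}$
Latency: 3 Clock Cycles

But tclk is only about 1/3 of the earlier case (slightly more, since the register overhead tco + ts is paid in every stage).
So throughput is approximately 3 times that of the earlier case.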
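The arithmetic behind the two cases can be sketched as below. The delay values (tco = ts = 1 ns, tcomb = 30 ns) are illustrative assumptions, not numbers from the slides. Because the register overhead is charged once per stage, the throughput gain comes out a little below 3.

```python
def pipeline_timing(t_co, t_s, t_comb, stages):
    """Split a combinational delay t_comb evenly into `stages` parts.
    Returns (clock period in ns, latency in ns, throughput in items/ns)."""
    t_clk = t_co + t_comb / stages + t_s   # every stage pays tco + ts
    latency = stages * t_clk               # one clock cycle per stage
    throughput = 1 / t_clk                 # one result per clock once the pipe is full
    return t_clk, latency, throughput

# Illustrative (assumed) delays in ns
single = pipeline_timing(t_co=1, t_s=1, t_comb=30, stages=1)
triple = pipeline_timing(t_co=1, t_s=1, t_comb=30, stages=3)
print(single)                    # clock 32 ns, latency 32 ns
print(triple)                    # clock 12 ns, latency 36 ns
print(triple[2] / single[2])     # throughput gain, about 2.67
```

Latency in time grows slightly (36 ns vs 32 ns), while throughput rises by roughly the number of stages.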
Pipelining Data path
• Breaking a large datapath into smaller ones.
• Shorter clock cycles.
• Multiple data computations are done concurrently (at different stages of completion).
• e.g. In the previous example, when the ith data item is in the 3rd stage of computation, the (i+1)th item is in the 2nd stage and the (i+2)th item is in the 1st stage.
• Latency increases, but so does throughput.
• Useful only if the data flow is continuous and the length (size) of the data stream is much larger than the number of pipeline stages.
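A small cycle-by-cycle model makes the overlap concrete: item i enters stage 1 in cycle i, so stage s holds item (cycle - s + 1). The speedup function assumes ideal, perfectly balanced stages with no register overhead (an assumption), and shows why a long data stream is needed: N items through S stages take N + S - 1 short cycles instead of N long ones.

```python
def pipeline_schedule(num_items, num_stages):
    """Cycle-by-cycle occupancy of an ideal pipeline.
    Row c lists, for each stage (0-based), the data item in it (None if empty)."""
    schedule = []
    for cycle in range(num_items + num_stages - 1):
        # item i enters stage 0 in cycle i, so stage s holds item (cycle - s)
        row = [cycle - s if 0 <= cycle - s < num_items else None
               for s in range(num_stages)]
        schedule.append(row)
    return schedule

def ideal_speedup(num_items, num_stages):
    """Unpipelined: num_items long cycles, each num_stages short cycles long.
    Pipelined: num_items + num_stages - 1 short cycles (fill + drain)."""
    return num_items * num_stages / (num_items + num_stages - 1)

for cycle, row in enumerate(pipeline_schedule(4, 3)):
    print(cycle, row)            # e.g. cycle 2 prints: 2 [2, 1, 0]
print(ideal_speedup(3, 3))       # 1.8
print(ideal_speedup(1000, 3))    # close to 3
```

In cycle 2, items 2, 1 and 0 occupy stages 1, 2 and 3, matching the description above; with only 3 items the fill and drain cycles keep the speedup well below 3.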
Practical scenario
[Figure: three combinational blocks comb1, comb2 and comb3 in series]
• Three combinational blocks of roughly similar delay.
• Balance the stage delays to be approximately equal by suitably grouping the blocks.
• Or they could be different stages of a regular structure (e.g. an array multiplier, a ripple-carry adder, etc.).
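Balancing by grouping can be phrased as splitting a chain of block delays into a fixed number of contiguous stages so that the slowest stage is as fast as possible. The brute-force search below is a minimal sketch (adequate for a handful of blocks); the block delays are made-up values, and `best_grouping` is a hypothetical helper name.

```python
from itertools import combinations

def best_grouping(delays, num_stages):
    """Split a chain of block delays into num_stages contiguous groups,
    minimising the largest group delay (which sets the clock period).
    Brute force over all cut positions."""
    n = len(delays)
    best = None
    for cuts in combinations(range(1, n), num_stages - 1):
        bounds = (0,) + cuts + (n,)
        groups = [delays[lo:hi] for lo, hi in zip(bounds, bounds[1:])]
        worst = max(sum(g) for g in groups)
        if best is None or worst < best[0]:
            best = (worst, groups)
    return best

# Hypothetical block delays in ns, grouped into 3 pipeline stages
print(best_grouping([4, 3, 5, 2, 6, 4], 3))   # (10, [[4], [3, 5, 2], [6, 4]])
```

The ideal would be 24/3 = 8 ns per stage, but the block boundaries only allow 10 ns; larger designs would use a smarter search (e.g. binary search on the period), which is beyond this sketch.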
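Pipelining
[Figure: comb1, comb2 and comb3 with delays tcomb1, tcomb2 and tcomb3, separated by pipeline registers]
$t_{clk} > [t_{co} + \max(t_{comb1}, t_{comb2}, t_{comb3}) + t_s]_{\text{max path}}$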
Pipelining – Balancing Delays
[Figure: two pipeline stages with delays of 5 ns and 10 ns]
• The natural partitioning of the logic into blocks may not balance the delays across blocks, forcing the largest block delay to set the clock period.
• One remedy is to pipeline the large-delay blocks further to balance the delays.
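Applying the clock-period formula to the 5 ns / 10 ns example shows why the slow block dominates. The register overhead (tco = ts = 1 ns) is an assumed value, and the 3-stage case assumes the 10 ns block can be split into two 5 ns halves.

```python
def min_clock_period(t_co, t_s, stage_delays):
    """Lower bound on the clock period: register overhead plus the slowest stage."""
    return t_co + max(stage_delays) + t_s

# Assumed register overhead of 1 ns each for tco and ts
print(min_clock_period(1, 1, [5, 10]))      # 12: the 10 ns stage sets the clock
print(min_clock_period(1, 1, [5, 5, 5]))    # 7: after splitting the 10 ns stage
```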
Pipeline within a block
[Figure: the 5 ns stage followed by the 10 ns block, which is split into two 5 ns stages by a pipeline register]
• Suppose the 10 ns block contains an 8-bit ripple-carry adder.
• It can be split into two 4-bit ripple-carry adders with pipeline registers introduced in between, to achieve a 5 ns clock period (plus register overhead).
• Such intra-stage pipelining is sometimes called micro-pipelining.
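A behavioural sketch of the split adder: stage 1 adds the low nibbles, and the pipeline register carries the low sum, the carry and the not-yet-used high nibbles to stage 2. Each 4-bit ripple-carry adder is modelled arithmetically rather than gate by gate (a simplification), and the function names are hypothetical.

```python
MASK4 = 0xF

def add4(a, b, carry_in):
    """Stands in for a 4-bit ripple-carry adder. Returns (4-bit sum, carry out)."""
    total = (a & MASK4) + (b & MASK4) + carry_in
    return total & MASK4, total >> 4

def pipelined_add8(pairs):
    """Two-stage pipelined 8-bit adder (results modulo 256).
    Each result emerges one clock after its operands enter."""
    reg = None                          # pipeline register between the two 4-bit adders
    results = []
    for item in list(pairs) + [None]:   # one extra cycle drains the pipeline
        # Stage 2 uses what the register captured on the previous clock edge
        if reg is not None:
            lo_sum, carry, a_hi, b_hi = reg
            hi_sum, _ = add4(a_hi, b_hi, carry)
            results.append((hi_sum << 4) | lo_sum)
        # Stage 1 works on this cycle's operands; the clock edge then loads reg
        if item is not None:
            a, b = item
            lo_sum, carry = add4(a, b, 0)
            reg = (lo_sum, carry, a >> 4, b >> 4)
        else:
            reg = None
    return results

print(pipelined_add8([(200, 100), (15, 1), (255, 255)]))  # [44, 16, 254]
```

Note that the high nibbles must also pass through the pipeline register, so that they meet their own low-half carry in the next cycle.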
Pipeline within a block
[Figure: the 10 ns block split into two 5 ns stages]
• This situation can arise when you integrate a standard IP block from a vendor: the delay of the IP may not match the delays of your own blocks.
• IP vendors sometimes provide IP with a customizable number of internal pipeline stages.
• The designer can then choose the number of pipeline stages inside the IP to match the rest of the design.
Resource manipulation
[Figure: a 10 ns stage followed by a 5 ns stage]
• Consider the scenario above, and suppose the 10 ns block cannot be split further. How can the delays be balanced?
• Duplicate the 10 ns block, assuming input data is available for both copies, and share the 5 ns block between the two outputs.
Resource Replication and Sharing
[Figure: two copies of the 10 ns block, both feeding the shared 5 ns block; clock clk]
Resource Replication and Sharing
• The 10 ns block is duplicated, assuming data is available at the inputs of both copies.
• The 5 ns block is shared by both outputs.
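The schedule below sketches why replication works with a 5 ns clock. It assumes each copy of the 10 ns block is treated as a two-cycle (multicycle) path whose inputs are held stable for two cycles, and that the copies are fed round-robin, with their outputs selected alternately into the shared block (e.g. by a multiplexer). These are modelling assumptions; `replicated_schedule` is a hypothetical helper.

```python
def replicated_schedule(num_items, copies=2):
    """A slow block needs `copies` clock cycles (10 ns with a 5 ns clock when
    copies=2) and is replicated `copies` times, fed round-robin.
    Returns {cycle: item} showing when each item enters the shared fast block."""
    free_at = [0] * copies            # cycle at which each copy can accept new data
    shared = {}                       # cycle -> item entering the shared 5 ns block
    for i in range(num_items):
        copy = i % copies             # alternate between the copies
        start = i                     # one new item accepted per clock cycle
        assert free_at[copy] <= start, "copy still busy"
        free_at[copy] = start + copies
        done = start + copies         # slow result ready after `copies` cycles
        assert done not in shared, "two results compete for the shared block"
        shared[done] = i
    return shared

print(replicated_schedule(5))   # {2: 0, 3: 1, 4: 2, 5: 3, 6: 4}
```

No assertion fires: each copy accepts a new item every second cycle, and the shared 5 ns block receives exactly one result per cycle.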
Retiming
• Introduce registers so that all inputs to a particular block arrive in the same clock cycle.
• Introduce registers so that all outputs arrive in the same clock cycle.
Retiming
[Figure: retiming example]
Trick
• Extend all inputs so that they start from the leftmost side.
• Extend all outputs to the rightmost (output) side.
• Insert registers along vertical lines, placing one register in every path that crosses each line.
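The bookkeeping behind this trick can be sketched by levelizing the block graph: each block's stage is one more than its latest input, and any edge (or output) that spans more than one stage boundary needs extra registers. The sketch assumes every block is followed by one pipeline register and that nodes with no predecessors are primary inputs; the netlist and the helper name `balance_registers` are hypothetical.

```python
from graphlib import TopologicalSorter

def balance_registers(preds, outputs):
    """preds maps each block to the list of blocks/inputs feeding it.
    Returns (stage, edge_regs, out_regs):
      stage[n]          clock cycle whose register captures node n's result
      edge_regs[(p, n)] extra registers on edge p -> n so all inputs of n line up
      out_regs[o]       extra registers on output o so all outputs line up"""
    stage = {}
    for node in TopologicalSorter(preds).static_order():
        feeders = preds.get(node, [])
        stage[node] = 1 + max(stage[p] for p in feeders) if feeders else 0
    edge_regs = {(p, n): stage[n] - stage[p] - 1
                 for n, feeders in preds.items() for p in feeders}
    last = max(stage[o] for o in outputs)
    out_regs = {o: last - stage[o] for o in outputs}
    return stage, edge_regs, out_regs

# Hypothetical netlist: X = f(A, B), Y = g(X, C), Z = h(C)
stage, edge_regs, out_regs = balance_registers(
    {"X": ["A", "B"], "Y": ["X", "C"], "Z": ["C"]}, outputs=["Y", "Z"])
print(edge_regs)   # C -> Y needs 1 extra register
print(out_regs)    # Z needs 1 extra register to line up with Y
```

Each stage number corresponds to one vertical cut line, and the extra-register counts are the registers a cut places on paths that skip past a block.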
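Logic Delays
[Figure: combinational block with input A and output Y; timing diagram of A and Y marking tcd and tpd]
tcd: Logic contamination delay (minimum delay from an input change until the output starts to change)
– Glitches due to unbalanced path delays
– Can also be called tpd(min)
tpd: Logic propagation delay (maximum delay from an input change until the output settles)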
Logic Delays
[Figure: multiplexer with data inputs A, B, C, D, select sel, internal paths Y0, Y1, Y2 and output Y; truth table and timing diagram of sel and Y marking tcd and tpd]
tY1 > tY0 > tY2

tcd: Logic contamination delay (tpd(min))
tpd: Logic propagation delay
Logic Delays / Glitches / Hazard
• Hazards were discussed in the introductory classes.
• This is a real-life example: a multiplexer is (conceptually) nothing but AND gates followed by an OR gate, and the difference in path delays creates a static-1 hazard.
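A unit-delay gate model of a 2:1 multiplexer, Y = (A AND NOT S) OR (B AND S), reproduces the static-1 hazard. With A = B = 1 the output should stay at 1, but when S falls the path through the inverter is one gate longer, so both AND outputs are briefly 0. The unit gate delays and the helper names are assumptions for illustration.

```python
def delayed(sig, t, d):
    """Value of signal list `sig` at time t - d (signals are stable before t = 0)."""
    return sig[max(t - d, 0)]

def simulate_mux(a, b, s_wave, t_not=1, t_and=1, t_or=1):
    """Gate-level 2:1 mux Y = (A AND NOT S) OR (B AND S) in unit time steps.
    A and B are held constant; s_wave[t] is the select value at time t."""
    n = len(s_wave)
    s_n, p0, p1, y = [0] * n, [0] * n, [0] * n, [0] * n
    for t in range(n):
        s_n[t] = 1 - delayed(s_wave, t, t_not)               # inverter
        p0[t] = a & delayed(s_n, t, t_and)                   # AND gate selecting A
        p1[t] = b & delayed(s_wave, t, t_and)                # AND gate selecting B
        y[t] = delayed(p0, t, t_or) | delayed(p1, t, t_or)   # OR gate
    return y

select = [1, 1, 1, 0, 0, 0, 0, 0]           # S falls at t = 3; A = B = 1
print(simulate_mux(1, 1, select))           # [1, 1, 1, 1, 1, 0, 1, 1]: glitch at t = 5
print(simulate_mux(1, 1, select, t_not=0))  # all 1s: balanced paths, no glitch
```

Setting the inverter delay to zero equalizes the two paths and the glitch disappears, which is the "unbalanced path delays" explanation in executable form.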
Register Delays
• In the examples above, glitches occur due to differences in path delays.
• The same thing can happen at the output of a register, due to differences in tcq between its individual flip-flops.
• The same is true of latches, but there are two delays to consider: tcq (clock-to-Q delay) and tdq (D-to-Q delay, in transparent mode).
Flip-Flop Delays
[Figure: D flip-flop with inputs D, CLK and output Q; timing diagram of CLK, D and Q marking ts, th, tccq and tpcq]
ts: Setup time
th: Hold time
tccq: Clock-to-Q contamination delay (tcq(min))
tpcq: Clock-to-Q propagation delay
Latch Delays
[Figure: D latch with inputs D, CLK and output Q; timing diagram of CLK, D and Q marking ts, th, tccq, tpcq, tcdq and tpdq]
tccq: Clock-to-Q contamination delay
tpcq: Clock-to-Q propagation delay
tcdq: D-to-Q contamination delay
tpdq: D-to-Q propagation delay
In a flip-flop, tcq is measured from the triggering clock edge (the positive edge for a positive-edge-triggered FF).
In a latch that is transparent when the clock is high, tcq is defined at the positive edge, when the latch opens, not at the latching negative edge.
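Pipelining: Flip Flops
[Figure: flip-flop (D1, Q1), combinational block, flip-flop (D2, Q2), both clocked by CLK; timing diagram of CLK, D1, Q1 and D2 marking tclk, tccq, tpcq, tpd and ts]
$t_{clk} > [t_{pcq} + t_{pd} + t_s]_{\text{max path}}$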
Pipelining: Flip Flops
• In pipelining using flip-flops, the clock period must be greater than the delay of the slowest stage, even if all other stages have smaller delays.
• In pipelining using latches, however, a stage can borrow time from the following stage, because each latch is transparent while its clock input is high.
• In other words, the total delay across multiple cycles can be accommodated even if individual stage delays vary.
• Pipelining with latches uses twice as many stages as pipelining with flip-flops, each with about half the delay.
• Numbering stages from 1, odd-numbered latches are clocked by the original clock signal and even-numbered latches by the inverted clock signal.
Pipelining: Latches (Time Borrowing)
[Figure: latch L1 (clocked by CLK), comb1, latch L2 (clocked by CLK/), comb2, latch L3 (clocked by CLK); timing diagram of CLK, CLK/, D1, Q1, D2, Q2, D3 and Q3]
Clocking
• In both cases the clock frequency is the same. With flip-flops, one pipeline stage is used per clock period.
• With latches, two pipeline stages (one per clock phase) are used per clock period.
• To analyze this, consider four combinational blocks with path delays of 12 ns, 12 ns, 8 ns and 8 ns, with a target clock period of 20 ns using 2-stage pipelining.
• For pipelining with flip-flops, analyze a 2-stage pipeline with stage delays of 24 ns and 16 ns.
• For pipelining with latches, analyze a 4-stage pipeline with stage delays of 12 ns, 12 ns, 8 ns and 8 ns.
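The analysis can be sketched by following one data item through each pipeline. The model ignores register/latch delays and setup times (an assumption) and treats latch k as transparent during the k-th half clock period, alternating between CLK and CLK/; the function names are hypothetical.

```python
def flipflop_ok(stage_delays, t_clk):
    """Flip-flop pipeline: every stage must fit in one clock period
    (register delays and setup time ignored)."""
    return all(d <= t_clk for d in stage_delays)

def latch_departures(stage_delays, t_clk):
    """Two-phase latch pipeline, following one data item.
    Latch k (k = 0, 1, ...) is transparent during [k*t_clk/2, (k+1)*t_clk/2).
    Stage k's logic lies between latch k and latch k+1.
    Returns the time the data leaves each latch, or None if it reaches a latch
    after that latch has closed."""
    half = t_clk / 2
    depart = [0.0]                               # leaves latch 0 as it opens at t = 0
    for k, delay in enumerate(stage_delays):
        arrive = depart[-1] + delay
        opens, closes = (k + 1) * half, (k + 2) * half
        if arrive > closes:
            return None                          # missed the transparent window
        depart.append(max(arrive, opens))        # a transparent latch passes data straight on
    return depart

print(flipflop_ok([24, 16], 20))                 # False: the 24 ns stage does not fit
print(latch_departures([12, 12, 8, 8], 20))      # [0.0, 12.0, 24.0, 32.0, 40.0]
```

With latches, the first two stages each borrow 2 ns, which the 8 ns stages pay back, so the data completes in 40 ns (two 20 ns clock periods). Borrowing is bounded by the transparent window: stage delays of 16, 16, 4 and 4 ns would arrive at the third latch at 32 ns, after it closes at 30 ns.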
Latches: Time Borrowing
• When CLK is low, odd-numbered latches (L1, L3, …) are in the latched (opaque) state and even-numbered latches (L2, L4, …) are transparent, allowing odd-numbered stages to borrow time from even-numbered stages.
• When CLK is high, even-numbered latches (L2, L4, …) are in the latched state and odd-numbered latches (L1, L3, …) are transparent, allowing even-numbered stages to borrow time from odd-numbered stages.
Latches: Time Borrowing
• Owing to the transparent nature of latches, unequal delays can be accommodated across stages.
• This borrowing can accumulate across multiple cycles.
• Where there is feedback, the overall loop delay must be less than the total duration of the clock periods the loop spans.
• e.g. In a single-cycle loop, the total path delay must be less than one clock period.
Control of pipeline: Data independent
• In custom pipelines (e.g. a multiplier), control consists mainly of clocking the registers, as the datapath operations are not data dependent.
• i.e. All streaming data undergo the same computation at each stage.
Control of pipeline: Data dependent
• But, as in a CPU, the computation can be data dependent.
• In such cases, instructions specify the operations along with the data.
• In an early stage, the instruction is decoded, and the control signals for each pipeline stage are passed down the pipeline alongside the data so that they arrive at the right time.
• i.e. There is a datapath pipeline and a control-path pipeline.
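A minimal sketch of a control-path pipeline running alongside the datapath, assuming a hypothetical 3-stage pipeline (decode, execute, write-back) and a hypothetical instruction format (opcode, a, b). Control bits are produced once at decode and then travel in the same pipeline registers as the operands, so each stage uses the control that belongs to its own data.

```python
from dataclasses import dataclass

@dataclass
class ExecuteSlot:
    """Decode -> execute pipeline register: operands plus control for later stages."""
    a: int
    b: int
    alu_op: str            # control consumed by the execute stage
    write_enable: bool     # control carried on to the write-back stage

@dataclass
class WritebackSlot:
    """Execute -> write-back pipeline register."""
    result: int
    write_enable: bool

ALU = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "nop": lambda a, b: 0}

def decode(instr):
    """Hypothetical instruction format (opcode, a, b) -> per-stage control signals."""
    op, a, b = instr
    return ExecuteSlot(a, b, alu_op=op, write_enable=(op != "nop"))

def run(program):
    dec_ex = None          # decode -> execute register
    ex_wb = None           # execute -> write-back register
    written = []
    for instr in list(program) + [None, None]:   # two extra cycles drain the pipeline
        # Stages are evaluated back to front so each reads last cycle's register values.
        if ex_wb is not None and ex_wb.write_enable:        # write-back stage
            written.append(ex_wb.result)
        if dec_ex is not None:                               # execute stage
            ex_wb = WritebackSlot(ALU[dec_ex.alu_op](dec_ex.a, dec_ex.b),
                                  dec_ex.write_enable)
        else:
            ex_wb = None
        dec_ex = decode(instr) if instr is not None else None  # decode stage
    return written

print(run([("add", 2, 3), ("nop", 0, 0), ("sub", 9, 2)]))   # [5, 7]
```

The `nop` still flows through every stage, but its `write_enable` bit, decoded two cycles earlier, stops the write-back stage from recording a result.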
Control of pipeline: Data dependent
[Figure: datapath pipeline with a parallel control-path pipeline supplying control signals to each stage]
Thank You