
2/24/2020

Pipelining

Kuruvilla Varghese
DESE
Indian Institute of Science


Pipelining - Genesis

• Manufacturing assembly line
• e.g. Automobile engine assembly line
– 1 worker assembling a whole engine.
– 10 workers assembling the engine in parts, sequentially.
• Latency: time for one engine to pass through the whole line
• Throughput: engines completed per unit time


Idea: Large Datapath

[Figure: one combinational block with delay tcomb between two registers driven by clk]

tclk > [tco + tcomb + ts] on the maximum-delay path

Latency: 1 Clock Cycle


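Breaking Datapath

[Figure: the datapath split into three stages, each with delay tcomb/3, separated by registers]

tclk > [tco + tcomb/3 + ts] on the maximum-delay path

Latency: 3 Clock Cycles

But tclk is ~1/3 of the earlier case.
Throughput is approximately 3 times the earlier case (slightly less,
because tco and ts are not divided by 3).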
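
The comparison between the single-stage and three-stage datapath can be checked numerically. This is a minimal sketch: the values chosen for tco, ts and tcomb are illustrative assumptions, not figures from the slides.

```python
def min_clock_period(tco, tcomb, ts):
    """Lower bound on tclk for one register-to-register path."""
    return tco + tcomb + ts

# Illustrative values in ns (assumed, not from the slides)
TCO, TS, TCOMB = 1.0, 1.0, 28.0

t_single = min_clock_period(TCO, TCOMB, TS)      # unpipelined datapath
t_piped = min_clock_period(TCO, TCOMB / 3, TS)   # three equal stages

latency_single = 1 * t_single   # one clock cycle
latency_piped = 3 * t_piped     # three (shorter) clock cycles
speedup = t_single / t_piped    # throughput ratio = ratio of clock rates

print(f"tclk: {t_single:.2f} ns -> {t_piped:.2f} ns")
print(f"latency: {latency_single:.2f} ns -> {latency_piped:.2f} ns")
print(f"throughput gain: {speedup:.2f}x")
```

With these assumed numbers the throughput gain stays below 3 and the latency grows, because the register overhead (tco + ts) is paid in every stage.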


Pipelining Datapath

• Breaking a large datapath into smaller ones.
• Shorter clock cycles.
• Multiple data computations are done concurrently (at different stages
of completion).
• i.e. in the previous example, if the ith data item is in the 3rd stage
of computation, the (i+1)th item is in the 2nd stage and the (i+2)th
item is in the 1st stage.
• Latency increases, but so does throughput.
• Useful only if the data flow is continuous and the number of data items
is much larger than the number of pipeline stages.
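
The overlap of data items across stages, and the reason a long stream is needed, can be seen in a small cycle-by-cycle model. This is a sketch under simple assumptions (one new item per clock, no stalls); `run_pipeline` is a hypothetical helper name.

```python
def run_pipeline(items, num_stages):
    """Shift items through the stages; return the stage contents per cycle."""
    stages = [None] * num_stages   # stages[0] is the first stage
    pending = list(items)
    log = []
    completed = 0
    while completed < len(items):
        # Clock edge: every item moves one stage on, a new item enters.
        new_item = pending.pop(0) if pending else None
        stages = [new_item] + stages[:-1]
        log.append(tuple(stages))
        if stages[-1] is not None:
            completed += 1         # item in the last stage finishes this cycle
    return log

log = run_pipeline(range(4), num_stages=3)
for cycle, contents in enumerate(log, start=1):
    print(cycle, contents)

def utilization(num_items, num_stages):
    """Fraction of cycles that deliver a result: N / (N + k - 1)."""
    return num_items / (num_items + num_stages - 1)
```

In the third cycle the contents are (2, 1, 0): item 0 is in the 3rd stage while items 1 and 2 are in the 2nd and 1st. N items take N + k - 1 cycles through k stages, so the fill and drain cycles only become negligible when N is much larger than k.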

Practical scenario

[Figure: three combinational blocks comb1, comb2, comb3 in series]

• Three combinational blocks of somewhat similar delay.
• Balance the stage delays approximately by grouping blocks
appropriately.
• Or the stages could be slices of a regular structure (e.g. array
multiplier, ripple adder, etc.)


Pipelining

[Figure: comb1, comb2, comb3 with delays tcomb1, tcomb2, tcomb3, separated by pipeline registers]

tclk > [tco + max(tcomb1, tcomb2, tcomb3) + ts] on the maximum-delay path
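
The effect of the max() term can be made concrete with a short calculation. The stage delays, tco and ts below are assumed example values, and `stage_slack` is a hypothetical helper that reports how much of each cycle a stage leaves unused.

```python
def pipeline_clock_period(stage_delays, tco, ts):
    """Lower bound on tclk: register overhead plus the slowest stage."""
    return tco + max(stage_delays) + ts

def stage_slack(stage_delays, tco, ts):
    """Idle time per stage when all stages share the common clock period."""
    tclk = pipeline_clock_period(stage_delays, tco, ts)
    return [tclk - (tco + d + ts) for d in stage_delays]

# Assumed example delays in ns (not from the slides)
delays = [4.0, 6.0, 5.0]
tclk = pipeline_clock_period(delays, tco=0.5, ts=0.5)
slack = stage_slack(delays, tco=0.5, ts=0.5)

print("tclk lower bound:", tclk, "ns")
print("slack per stage:", slack)
```

Only the slowest stage has zero slack; the faster stages idle for part of every cycle, which is the motivation for balancing stage delays.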


Pipelining – Balancing Delays

[Figure: two pipeline stages with delays 5 ns and 10 ns]

• The natural block boundaries chosen for pipelining may not balance
the delays across blocks, forcing the clock period to be set by the
largest block delay.
• One remedy is to pipeline the large-delay blocks further to balance
the delays.


Pipeline within a block

[Figure: a 5 ns stage and a 10 ns stage; the 10 ns stage is split into two 5 ns stages by an added pipeline register]

• Suppose the 10 ns block contains an 8-bit ripple adder.
• It can be split into two 4-bit ripple adders, with pipeline registers
introduced in between, to achieve a 5 ns clock period.
• Such intra-stage pipelining is sometimes called micro-pipelining.
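
The split of an 8-bit ripple adder into two 4-bit halves can be modelled behaviourally. This is a sketch, not a hardware description: `add4` models a 4-bit adder arithmetically, and the tuple `reg` stands in for the pipeline register, which must carry the carry, the low sum and the still-unused high nibbles.

```python
def add4(a, b, cin):
    """4-bit add modelled arithmetically: returns (sum4, carry_out)."""
    total = a + b + cin
    return total & 0xF, total >> 4

def pipelined_add8(pairs):
    """Stream (a, b) byte pairs through a 2-stage adder; returns 9-bit sums."""
    reg = None       # pipeline register between the two 4-bit adders
    results = []
    # One extra iteration (None) flushes the last item out of stage 2.
    for item in list(pairs) + [None]:
        # Stage 2 uses what stage 1 registered on the previous clock edge.
        if reg is not None:
            low_sum, carry, a_hi, b_hi = reg
            high_sum, cout = add4(a_hi, b_hi, carry)
            results.append((cout << 8) | (high_sum << 4) | low_sum)
        # Stage 1 adds the low nibbles and registers what stage 2 needs.
        if item is not None:
            a, b = item
            low_sum, carry = add4(a & 0xF, b & 0xF, 0)
            reg = (low_sum, carry, a >> 4, b >> 4)
        else:
            reg = None
    return results

sums = pipelined_add8([(0x0F, 0x01), (0xFF, 0xFF), (0x12, 0x34)])
```

Each sum appears one cycle later than in an unpipelined adder, but every stage only has to ripple through 4 bits.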


Pipeline within a block

• This situation can arise when you integrate a standard IP block from a
vendor: the delay of the IP may not match the delay of your own blocks.
• IP vendors sometimes provide IP with a customizable internal pipeline.
• The designer can then configure the number of pipeline stages inside
the IP.

Resource manipulation

[Figure: a 10 ns stage followed by a 5 ns stage]

• Consider the scenario above. Suppose we cannot split the 10 ns block
further; what can we do to balance the delays?
• Duplicate the 10 ns block, assuming input data is available for both
copies, and let both outputs share the 5 ns block.


Resource Replication and Sharing

[Figure: two 10 ns blocks in parallel feeding one shared 5 ns block, with clk]


Resource Replication and Sharing

• The 10 ns block is duplicated, assuming data is available at the
inputs of both copies.
• The 5 ns block is shared by both outputs.
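
A simple schedule shows why duplication restores a 5 ns result rate. This is a sketch assuming one new input every 5 ns, inputs alternating between the copies, and no register overhead; `schedule` is a hypothetical helper.

```python
def schedule(num_items, copies, slow_delay=10, fast_delay=5):
    """Return the finish time (ns) of each item at the shared fast block."""
    slow_free = [0] * copies   # time at which each slow copy becomes free
    fast_free = 0              # time at which the shared fast block is free
    finish_times = []
    for i in range(num_items):
        issue = i * fast_delay                 # a new input every 5 ns
        unit = i % copies                      # alternate between the copies
        start_slow = max(issue, slow_free[unit])
        slow_free[unit] = start_slow + slow_delay
        start_fast = max(slow_free[unit], fast_free)
        fast_free = start_fast + fast_delay
        finish_times.append(fast_free)
    return finish_times

shared = schedule(4, copies=2)   # duplicated 10 ns block
single = schedule(4, copies=1)   # original arrangement
print("two copies:", shared)
print("one copy:  ", single)
```

With two copies a result leaves the shared block every 5 ns; with a single 10 ns block results are 10 ns apart.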


Retiming

• Introduce registers so that all inputs to a particular block arrive
in the same clock cycle.
• Introduce registers so that all outputs arrive in the same clock
cycle.
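
One way to read these two rules is as register counting on a block diagram. The sketch below assumes every block has a register at its output (one pipeline stage per block) and that primary inputs are at stage 0; the netlist and the names `balance`, `f1`, `f2` are hypothetical.

```python
def balance(edges, outputs):
    """Return extra registers needed per edge and per output."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, []).append(src)

    stage = {}

    def stage_of(node):
        # Primary inputs sit at stage 0; a block is one stage after
        # its latest-arriving input.
        if node not in stage:
            if node in preds:
                stage[node] = 1 + max(stage_of(p) for p in preds[node])
            else:
                stage[node] = 0
        return stage[node]

    # Delay each early input so that it arrives with the latest one.
    edge_regs = {(u, v): stage_of(v) - stage_of(u) - 1 for u, v in edges}
    # Delay each early output so that all outputs leave in the same cycle.
    last = max(stage_of(o) for o in outputs)
    output_regs = {o: last - stage_of(o) for o in outputs}
    return edge_regs, output_regs

# Hypothetical netlist: f1 = g(a, b), f2 = h(f1, c); outputs f1 and f2
edges = [("a", "f1"), ("b", "f1"), ("f1", "f2"), ("c", "f2")]
edge_regs, output_regs = balance(edges, outputs=["f1", "f2"])
```

Here input c needs one extra register to meet f1 at the input of f2, and output f1 needs one extra register to line up with f2.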


Trick

• Extend all inputs so that they start from the leftmost side.
• Extend all outputs to the rightmost side.
• Introduce registers along vertical lines that cut across all paths.


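Logic Delays

[Figure: combinational block Comb with input A and output Y; timing diagram of A and Y marking tcd and tpd]

tcd: Logic contamination delay, the minimum time after an input change
before the output starts to change
– Glitches due to unbalanced path delays
– Can also be called tpd(min)
tpd: Logic propagation delay, the maximum time after an input change for
the output to settle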


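Logic Delays

[Figure: 4-to-1 multiplexer with inputs A, B, C, D (positions 0 to 3), select sel and 3-bit output Y (bits Y2, Y1, Y0)]

Output bits over time as sel changes from 0 to 1 (time runs left to right):

Y2   1   0   0   0
Y1   0   0   0   1
Y0   1   1   0   0
Y    5   1   0   2

tY1 > tY0 > tY2, so Y passes through 1 and 0 before settling at 2.

tcd: Logic contamination delay (tpd(min))
tpd: Logic propagation delay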

Logic Delays / Glitches / Hazard

• We discussed hazards in the introductory classes.
• This is a real-life example: a multiplexer is conceptually AND gates
followed by an OR gate, and the difference in path delays creates a
static-1 hazard.
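
The static-1 hazard can be reproduced with a unit-delay simulation. This is a sketch assuming a 2-to-1 multiplexer built as Y = (A AND NOT sel) OR (B AND sel), with one time step of delay per gate; the gate structure and the delays are illustrative assumptions.

```python
def simulate_mux(sel_wave, a=1, b=1):
    """Return Y at each time step for the given sel waveform."""
    # Start from the steady state for the first sel value.
    nsel = 1 - sel_wave[0]
    and_a = a & nsel
    and_b = b & sel_wave[0]
    y = and_a | and_b
    ys = []
    for sel in sel_wave:
        ys.append(y)
        # Every gate updates from the values of the previous step,
        # i.e. each gate has one unit of delay.
        nsel, and_a, and_b, y = 1 - sel, a & nsel, b & sel, and_a | and_b
    return ys

# A = B = 1, so Y should ideally stay at 1 while sel falls from 1 to 0.
ys = simulate_mux([1, 1, 0, 0, 0, 0, 0])
print(ys)
```

The path through the inverter is one gate longer than the direct path, so both AND outputs are briefly 0 and Y dips to 0 for one step before returning to 1.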


Register Delays

• Here, the glitches occur due to differences in path delays.
• The same thing can happen at the output of registers, due to
differences in tcq of the individual flip-flops.
• The same holds for latches, but there are two delays to consider:
tcq (C-to-Q delay) and tdq (D-to-Q delay, in transparent mode).


Flip-Flop Delays

[Figure: D flip-flop (D, CLK, Q) and timing diagram of CLK and D marking ts, th, tccq and tpcq]

ts: Setup time
th: Hold time
tccq: Clock-to-Q contamination delay (tcq(min))
tpcq: Clock-to-Q propagation delay


Latch Delays

[Figure: D latch (D, CLK, Q) and timing diagram of CLK and D marking ts, th, tccq, tpcq, tcdq and tpdq]

tccq: Clock-to-Q contamination delay
tpcq: Clock-to-Q propagation delay
tcdq: D-to-Q contamination delay
tpdq: D-to-Q propagation delay

In a positive-edge-triggered flip-flop, tcq appears at the positive
clock edge.
In a latch that is transparent when the clock is high, tcq is defined at
the positive edge, not at the latching negative edge.

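Pipelining: Flip Flops

[Figure: flip-flop (D1, Q1), combinational block Comb, flip-flop (D2, Q2) on a common CLK; timing diagram of CLK, D1, Q1 and D2 marking tclk, tccq, tpcq, tpd and ts]

tclk > [tpcq + tpd + ts] on the maximum-delay path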


Pipelining: Flip Flops

• When pipelining with flip-flops, the clock period must exceed the delay
of the slowest stage, even if all other stages have smaller delays.
• When pipelining with latches, a stage can borrow time from the
following stage, because a latch is transparent while its clock input is
high.
• In other words, the total delay can be spread across multiple cycles,
even if individual stage delays vary.
• Compared with flip-flop pipelining, latch pipelining has twice the
number of stages, each with half the delay.
• Numbering from 1, odd-numbered latches are clocked by the original
clock signal and even-numbered latches by the inverted clock signal.


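Pipelining: Latches (Time Borrowing)

[Figure: latch L1 (D1, Q1, clocked by CLK), comb1, latch L2 (D2, Q2, clocked by CLK/), comb2, latch L3 (D3, Q3, clocked by CLK); timing diagram of CLK, CLK/, D1, Q1, D2, Q2, D3 and Q3]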

Clocking

• In both cases the clock frequency is the same. With flip-flops, each
clock cycle covers a single pipeline stage.
• With latches, each clock cycle covers two (half-cycle) pipeline stages.
• To analyze, consider four combinational blocks with path delays 12 ns,
12 ns, 8 ns and 8 ns. The target is a 20 ns clock period with 2-stage
pipelining.
• For pipelining with flip-flops, analyze a 2-stage pipeline with 24 ns
and 16 ns.
• For pipelining with latches, analyze a 4-stage pipeline with 12 ns,
12 ns, 8 ns and 8 ns.
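
The two analyses can be sketched as follows, assuming ideal flip-flops and latches (zero tcq, tdq and setup time) and a 50% duty cycle. Latch 1 is taken to open at t = 0 and each following latch half a period later; `ff_pipeline_ok` and `latch_time_borrowing` are hypothetical helper names.

```python
def ff_pipeline_ok(stage_delays, tclk):
    """Flip-flops: every stage must fit within one clock period."""
    return max(stage_delays) <= tclk

def latch_time_borrowing(stage_delays, tclk):
    """Latches: return the time borrowed by each stage, or None on failure."""
    half = tclk / 2
    depart = 0.0                   # data leaves latch 1 when it opens
    borrowed = []
    for i, delay in enumerate(stage_delays, start=1):
        arrive = depart + delay    # arrival at latch i + 1
        opens = i * half           # latch i + 1 becomes transparent
        closes = opens + half      # latch i + 1 latches its input
        if arrive > closes:
            return None            # data misses the transparent window
        borrowed.append(max(0.0, arrive - opens))
        depart = max(arrive, opens)
    return borrowed

ff_ok = ff_pipeline_ok([24, 16], tclk=20)
borrowed = latch_time_borrowing([12, 12, 8, 8], tclk=20)
print("flip-flop pipeline meets 20 ns:", ff_ok)
print("time borrowed per latch stage (ns):", borrowed)
```

Under these assumptions the flip-flop pipeline misses the 20 ns target because of its 24 ns stage, while the latch pipeline meets it: the two 12 ns stages borrow time that the two 8 ns stages give back.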


Latches: Time Borrowing

• When CLK is low, odd-numbered latches (L1, L3, …) are in the latched
state and even-numbered latches (L2, L4, …) are transparent, allowing
odd-numbered stages to borrow time from even-numbered stages.
• When CLK is high, even-numbered latches (L2, L4, …) are in the latched
state and odd-numbered latches (L1, L3, …) are transparent, allowing
even-numbered stages to borrow time from odd-numbered stages.


Latches: Time Borrowing

• Owing to the transparent nature of latches, unequal delays can be
accommodated across stages.
• The borrowing can accumulate across multiple cycles.
• Where there is feedback, the overall loop delay must be less than the
total duration of the clock periods the loop spans.
• e.g. in a single-cycle loop, the total path delay must be less than
one clock period.


Control of pipeline: Data independent

• In custom pipelining (e.g. a multiplier), control mainly consists of
clocking the registers, as the datapath operations are not data
dependent.
• i.e. all streaming data undergo the same computation at each stage.


Control of pipeline: Data dependent

• In a CPU, by contrast, the computation can be data dependent.
• In such cases, instructions specify the operations along with the
data.
• In an early stage the instruction is decoded, and the control signals
for each pipeline stage are sent down the pipeline so that they arrive
with the right timing.
• i.e. there is a datapath pipeline and a control-path pipeline.
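
The idea of control signals travelling alongside the data can be sketched with a toy three-stage model (decode, execute, output). The instruction format, the `subtract` control bit and the function name are hypothetical, chosen only for illustration.

```python
def run_cpu_like_pipeline(instructions):
    """instructions: list of (op, a, b) with op in {"add", "sub"}."""
    decode_reg = None    # register after decode: (control, a, b)
    execute_reg = None   # register after execute: result
    results = []
    # Two trailing bubbles (None) drain the pipeline.
    for instr in list(instructions) + [None, None]:
        # Stages are evaluated last to first, so each one reads the
        # value registered on the previous clock edge.
        if execute_reg is not None:
            results.append(execute_reg)
        if decode_reg is not None:
            control, a, b = decode_reg
            # The control word arrives in step with its own operands.
            execute_reg = a - b if control["subtract"] else a + b
        else:
            execute_reg = None
        if instr is not None:
            op, a, b = instr
            decode_reg = ({"subtract": op == "sub"}, a, b)
        else:
            decode_reg = None
    return results

results = run_cpu_like_pipeline([("add", 1, 2), ("sub", 5, 3), ("sub", 4, 4)])
```

Because the decoded control word is registered together with the operands, the execute stage always applies the operation that belongs to the data it currently holds, even though a different instruction is being decoded in the same cycle.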

Control of pipeline: Data dependent

[Figure: datapath pipeline with a parallel control-path pipeline carrying the control signals]


Thank You
