Shantanu Dutt
Department of Electrical and Computer Engineering, University of Illinois, Chicago. Phone: (312) 355-1314; e-mail: dutt@eecs.uic.edu; URL: http://www.eecs.uic.edu/dutt
Pipelining Introduction
1. Pipelining is the processing concept in which the entire processing flow is broken up into multiple stages. A stage can begin processing a new data/instruction as soon as it is done with the current data/instruction, which then goes on to the next stage for further processing.
2. In non-pipelined processing, by contrast, the next data/instruction is processed only after the entire processing of the previous data/instruction is complete.
[Figure: (a) Non-pipelined processing: a single processing path of delay D; a new data/instr. is input every D secs, and a data/instr. is completed every D secs. (b) Pipelined processing: k stages, each of delay D/k; a new data/instr. is input every D/k secs and, after the initial "fill", a data/instr. is completed every D/k secs.]
3. The processing of a data/instruction in a pipeline is complete when it exits the last stage. If there are k stages, each of equal delay, then the processing throughput (# of data/instructions processed per second) increases roughly by a factor of k.
4. The real world has pipelining! E.g., in a water/oil pipeline, the "processing" is sending all the fluid from source to destination; in TV/radio signal communication, the "processing" is sending all the signals from source to destination.
[Figure: panels (a) and (b), annotated with E = R + Ex + W.]
2. Each stage now requires its own C.U., as control signals need to be generated simultaneously for the processing in each stage. E.g., while the read and other control signals are asserted in the Fetch stage for instruction $i$, the Decode stage is figuring out which next stage to forward instruction $i-1$ to, the Read stage is asserting the control signals for reading the operands of instruction $i-2$, the Execute stage is asserting the control signals for the right ALU operation for instruction $i-3$, and the Write stage is asserting the register write signals for instruction $i-4$.
3. Input registers are required for each stage to store intermediate values and instructions that are passed between the stages. E.g., the input register of the Decode stage will store instruction $i-1$, the input register of the Read stage will store instruction $i-2$, the input register of the Execute stage will store the opcode and write register fields of instruction $i-3$ as well as the two input operands generated by the Read stage in its previous processing of instruction $i-3$, while the Write stage's input register will store the ALU output and the write register field of instruction $i-4$.
4. Further, the C.U. of stage $j$ can only store a new data/instruction in the input register of stage $j+1$ after it receives a signal ($done_{j+1}$) from stage $j+1$ informing it that stage $j+1$ is done with its current processing (much like wait is used to communicate between the memory unit and the processor).
5. Also, the C.U. of stage $j$ can only start processing a new data/instruction in its input register when it receives a signal $new_{j-1}$ from stage $j-1$ informing it that a new data/instruction has been written into its input register (which will be done, of course, only after stage $j$ asserts signal $done_j$ to stage $j-1$).
6. Pipelining adds some extra delay to the total delay/latency for processing a single instruction, because the stage registers must be written and read. We will, however, ignore this extra delay in our discussions.
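The done/new handshaking of items 4 and 5 can be sketched as a small timing simulation. This is only an illustrative model, not the control logic of the notes: the function name `completion_times` and the stage delays in the usage line are hypothetical, stage-register delays are ignored (as in item 6), and a stage is assumed to count as "done" only once it has passed its result on to the next stage.

```python
def completion_times(delays, n):
    """Simulate n instructions through unbuffered stages with done/new handshaking.

    delays[j] is the processing delay of stage j. Returns the times at which
    instructions 0..n-1 exit the last stage.
    """
    k = len(delays)
    # prev_handoff[j]: time at which the previous instruction left stage j,
    # i.e., the time from which stage j can accept a new instruction.
    prev_handoff = [0] * k
    done_times = []
    for _ in range(n):
        handoff = [0] * k
        for j in range(k):
            # "new" signal: stage 0 takes a new instruction as soon as it is
            # free; stage j > 0 starts when stage j-1 writes its input register.
            start = prev_handoff[0] if j == 0 else handoff[j - 1]
            finish = start + delays[j]
            if j + 1 < k:
                # "done" signal: the result can be written into stage j+1's
                # input register only after stage j+1 released its instruction.
                handoff[j] = max(finish, prev_handoff[j + 1])
            else:
                # Last stage: the instruction simply exits the pipeline.
                handoff[j] = finish
        done_times.append(handoff[k - 1])
        prev_handoff = handoff
    return done_times


# Hypothetical 3-stage pipeline with delays of 3, 1 and 2 time units.
print(completion_times([3, 1, 2], 4))  # [6, 9, 12, 15]
```

In this sketch the first instruction exits after the sum of the stage delays, and each later instruction exits one slowest-stage delay after its predecessor, because the faster stages spend part of their time waiting on the handshake.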
Pipeline Throughput: Non-branching model
1. Let $t_i$ be the delay of stage $i$. Then the non-pipelined time $T_{\text{no-pipe}}(n)$ for processing $n$ instructions is
$T_{\text{no-pipe}}(n) = n \sum_{i=1}^{k} t_i$
2. The pipelined time $T_{\text{pipe}}(n)$ for processing $n$ instructions is
$T_{\text{pipe}}(n) = \text{Fill Time} + (n-1) \max_{1 \le i \le k} t_i$
where the Fill Time is the time taken to fill the pipeline, i.e., the time taken for the first instruction to emerge from the pipeline, by which time all pipeline stages are filled, i.e., busy processing some instruction; before the Fill Time, some stages may be idle. The Fill Time is given by
$\text{Fill Time} = \sum_{i=1}^{k} t_i$
[Figure: timing diagram of instructions 1 to 4 flowing through stages F, D and E (F1 D1 E1, F2 D2 E2, F3 D3 E3, F4 D4 E4), with time in ccs and the completion ("Done") of each instruction marked.]
3. Pipelined throughput is given by $n / T_{\text{pipe}}(n)$, which approaches $1 / \max_{1 \le i \le k} t_i$ for a large $n$, and is in units of instructions/sec. Non-pipelined throughput is given by $n / T_{\text{no-pipe}}(n) = 1 / \sum_{i=1}^{k} t_i$ and is also in units of instructions/sec.
4. Example 1: If all $t_i$ are equal and that value is $t$, then the non-pipelined throughput is $1/(kt)$ instr/sec, while the pipelined throughput is $1/t$ instr/sec, which is $k$ times more than the non-pipelined version.
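5. Example 2: Consider a pipeline with 3 stages F, D, E with delays of 3 ccs, 1 cc and 2 ccs, respectively. Then $T_{\text{no-pipe}}(n) = n \cdot 6$ ccs and $T_{\text{pipe}}(n) = 6 + (n-1) \cdot 3$ ccs, so for a large $n$ the pipelined throughput is approximately $1/3$ instr/cc.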
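The timing model of this section can be checked numerically with a short sketch. The function names (`t_no_pipe`, `t_pipe`, `throughput`) are hypothetical helpers introduced only for illustration; they assume the non-branching model, with non-pipelined time n times the sum of the stage delays, and pipelined time equal to the Fill Time (the sum of the stage delays) plus (n - 1) times the largest stage delay.

```python
from fractions import Fraction


def t_no_pipe(delays, n):
    """Non-pipelined time: each instruction takes the sum of all stage delays."""
    return n * sum(delays)


def t_pipe(delays, n):
    """Pipelined time: Fill Time plus (n - 1) times the slowest stage delay."""
    fill_time = sum(delays)
    return fill_time + (n - 1) * max(delays)


def throughput(total_time, n):
    """Instructions completed per unit time, kept as an exact fraction."""
    return Fraction(n, total_time)


# Stage delays of Example 2: F = 3 ccs, D = 1 cc, E = 2 ccs.
delays = [3, 1, 2]
for n in (1, 4, 1000):
    tn, tp = t_no_pipe(delays, n), t_pipe(delays, n)
    print(n, tn, tp, throughput(tn, n), throughput(tp, n))
```

For n = 1000 the pipelined throughput evaluates to 1000/3003 instr/cc, already close to the limiting value of 1/3 instr/cc, while the non-pipelined throughput stays at 1/6 instr/cc for every n.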
[Figure: pipelined CPU datapath. Recoverable labels: Fetch stage with Fetch CU, MAR, MDR, IR0 and IR1; Register File with Read CU and Read Bus A; Opcode register; ALU CU (cout, m7, v, cin); Write CU. Control signals shown include read, mar_sel, ir1_sel, dr_sel[0..1], r0write, ri_sel, rj_sel, rk_sel, a_sel[0..2] and b_sel[0..2].]