
An Introductory Analysis of Pipelines

Consider a 5-stage instruction pipeline as shown below:

IF -> ID -> EX -> M -> WB


A time-space diagram is used to describe the progress of instructions through the pipeline.

WB    |    |    |    |    | I1 | I2 | I3 | I4 | I5 | I6
M     |    |    |    | I1 | I2 | I3 | I4 | I5 | I6 | I7
EX    |    |    | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8
ID    |    | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 | I9
IF    | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 | I9 | I10
Cycle | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
(Pipelined Execution: stages vs. clock cycles)

We've assumed that every stage takes one clock cycle and that there are no hazards in the instruction stream.
Instruction Latency (the time it takes to complete an instruction) = 5 cycles
Instruction Throughput = 6/10 IPC = 0.6 IPC (six instructions complete within the first 10 cycles)
To gain a better appreciation of pipelined execution, we draw the time-space diagram for non-pipelined execution, as shown below:

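WB    |    |    |    |    | I1 |    |    |    |    | I2
M     |    |    |    | I1 |    |    |    |    | I2 |
EX    |    |    | I1 |    |    |    |    | I2 |    |
ID    |    | I1 |    |    |    |    | I2 |    |    |
IF    | I1 |    |    |    |    | I2 |    |    |    |
Cycle | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
(Non-Pipelined Execution: stages vs. clock cycles)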

Instruction Latency = 5 cycles
Instruction Throughput = 2/10 IPC = 0.2 IPC (instructions per cycle)
Thus pipelined execution improves instruction throughput. However, it doesn't improve instruction latency. In practice, pipelining slightly increases instruction latency because of the delay added by the pipeline registers.
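The two time-space diagrams can be reproduced with a few lines of Python. This is only a sketch under the same assumptions as the text (one cycle per stage, no hazards); the function names are illustrative and not part of the original notes.

```python
STAGES = ["IF", "ID", "EX", "M", "WB"]

def completion_cycle(i, pipelined, k=len(STAGES)):
    """Cycle in which instruction i (1-based) leaves the last stage."""
    if pipelined:
        # A new instruction enters every cycle, so I_i finishes at cycle k - 1 + i.
        return k - 1 + i
    # Without pipelining, I_i starts only after I_(i-1) has used all k stages.
    return k * i

def completed_within(cycles, pipelined, k=len(STAGES)):
    """Number of instructions that have completed after `cycles` clock cycles."""
    count = 0
    while completion_cycle(count + 1, pipelined, k) <= cycles:
        count += 1
    return count

if __name__ == "__main__":
    for pipelined in (True, False):
        done = completed_within(10, pipelined)
        label = "pipelined" if pipelined else "non-pipelined"
        print(label, done, "instructions,", done / 10, "IPC")
```

For a 10-cycle window this gives 6 completed instructions with pipelining and 2 without, matching the throughput figures above.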
Speedup
Suppose that a k-stage instruction pipeline executes a program containing n instructions. Let \( \tau \) be the cycle time.
Execution time on a non-pipelined computer is given as

\[ t_{np} = n k \tau \qquad (1) \]

Execution time on a pipelined computer is given as

\[ t_{p} = (k - 1 + n)\,\tau \qquad (2) \]

where (k - 1) cycles are required to fill up the pipeline (also called the pipeline setup time). By definition, the speedup S of pipelined execution over non-pipelined execution is given as

\[ S = \frac{\text{time before enhancement}}{\text{time after enhancement}} = \frac{t_{np}}{t_{p}} = \frac{n k \tau}{(k - 1 + n)\,\tau} = \frac{n k}{k - 1 + n} \qquad (3) \]

Clearly, for a given pipeline, greater speedup is achieved as more and more instructions are executed. We can compute the upper bound on speedup as follows:

\[ S_{ideal} = \lim_{n \to \infty} S = \lim_{n \to \infty} \frac{k}{\frac{k - 1}{n} + 1} = k \]

We regard it as ideal speedup because its derivation is based on the assumption of no pipeline hazards. As
can be seen, even ideal speedup cannot go beyond pipeline depth (i.e. number of pipeline stages).
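A quick numerical check makes the bound concrete. The sketch below simply evaluates S = nk / (k - 1 + n) for a 5-stage pipeline; the function name and the chosen values of n are illustrative assumptions.

```python
def speedup(k, n):
    """Speedup S = nk / (k - 1 + n) of a k-stage pipeline running n instructions."""
    return (n * k) / (k - 1 + n)

if __name__ == "__main__":
    # S grows with n but never reaches the pipeline depth k = 5.
    for n in (1, 10, 100, 10_000):
        print(n, round(speedup(5, n), 3))
```

With a single instruction there is no speedup at all (S = 1), and as n grows the value approaches, but stays below, k.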
Instruction Throughput
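Instruction throughput \( \varepsilon \) is defined as the number of instructions executed per unit time. With \( \tau \) denoting the cycle time, it is calculated as:

\[ \varepsilon = \frac{n}{(k - 1 + n)\,\tau} \qquad (4) \]

Multiplying the numerator and denominator of (4) by k, we can express \( \varepsilon \) in terms of the speedup S as:

\[ \varepsilon = \frac{n k}{(k - 1 + n)\,k\,\tau} = \frac{S}{k \tau} \]

The upper bound on \( \varepsilon \) is found similarly:

\[ \varepsilon_{ideal} = \lim_{n \to \infty} \frac{1}{\left( \frac{k - 1}{n} + 1 \right) \tau} = \frac{1}{\tau} \]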
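The throughput relation can be checked numerically. The following sketch evaluates n / ((k - 1 + n) * tau) and confirms that it equals S / (k * tau); the function names are illustrative, and tau is taken as 1 time unit per cycle purely for convenience.

```python
def speedup(k, n):
    """Speedup S = nk / (k - 1 + n)."""
    return (n * k) / (k - 1 + n)

def throughput(k, n, tau):
    """Instructions per unit time: n / ((k - 1 + n) * tau)."""
    return n / ((k - 1 + n) * tau)

if __name__ == "__main__":
    k, n, tau = 5, 6, 1.0  # the 5-stage, 6-instruction, 10-cycle example
    print(throughput(k, n, tau))       # instructions per cycle when tau = 1
    print(speedup(k, n) / (k * tau))   # same quantity via S / (k * tau)
```

With k = 5 and n = 6 this reproduces the 0.6 IPC read off the pipelined time-space diagram.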


CPI
Cycles per instruction (CPI) of pipelined execution can be found as:

\[ \mathrm{CPI} = \frac{k - 1 + n}{n} = 1 + \frac{k - 1}{n} \]

The lower bound on CPI is

\[ \lim_{n \to \infty} \left( 1 + \frac{k - 1}{n} \right) = 1 \]
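As a small illustration, the CPI formula can be evaluated for a few program lengths. This is a sketch only; the function name is an assumption made for the example.

```python
def cpi(k, n):
    """Cycles per instruction of a k-stage pipeline running n instructions."""
    return (k - 1 + n) / n

if __name__ == "__main__":
    # The fill-up cost of (k - 1) cycles is amortised over more instructions as n grows.
    for n in (6, 100, 10_000):
        print(n, cpi(5, n))
```

The fixed cost of filling the pipeline matters less and less as the program gets longer, so CPI tends to 1 from above.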


Overview of MIPS ISA

MIPS is an acronym for Microprocessor without Interlocked Pipelined Stages. MIPS is a very popular microprocessor in embedded devices. Salient features of its ISA are described below:
• All instructions are 32 bits wide (fixed-length instructions).
   o Fixed-length instructions are easy to decode (simple decoding logic and hence fast decoding), as opposed to variable-length instructions.
   o With fixed-length instructions, it's easy to generate the address of the next instruction to be fetched (address of next instruction = PC + instruction length).
   o The downside of fixed-length instructions is poor storage economy, as opposed to variable-length instructions, which use only as much storage as required.
• MIPS is a byte-addressable machine, which means that every byte has a unique address.
• Addresses are 32 bits wide.
• There are 32 general-purpose registers, each 32 bits wide.
These features are examples of a design principle: simplicity favors regularity. Simple designs are usually fast and easy to debug and improve.

Instruction Formats
An instruction format is a breakup of an instruction into different fields, each field being reserved for a specific purpose. MIPS has three instruction formats (the fewer the instruction formats, the simpler the ISA will be).
1. R (Register) Format
This divides the instruction into six fields as follows:

Field | op | rs | rt | rd | shamt | funct
Bits  | 6  | 5  | 5  | 5  | 5     | 6

Where,
op = opcode
rs = identifier of first source register
rt = identifier of second source register
rd = identifier of destination register
shamt = shift amount, indicating by how many bit positions a register must be shifted left or right (only used in shift instructions)
funct = distinguishes among R-type instructions, as all R-type instructions have op = 0.


Example:
add $1, $2, $3
The machine encoding of this instruction will be as follows:

op | rs | rt | rd | shamt | funct
0  | 2  | 3  | 1  | 0     | 32
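To see how the six fields combine into one 32-bit word, the packing can be sketched in Python. The helper name encode_r_type is hypothetical and introduced only for this illustration; the field widths follow the R-format layout above.

```python
def encode_r_type(rs, rt, rd, shamt, funct, op=0):
    """Pack the six R-format fields (6/5/5/5/5/6 bits) into a 32-bit word."""
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        # Shift the word left to make room, then append the next field.
        word = (word << width) | value
    return word

if __name__ == "__main__":
    word = encode_r_type(rs=2, rt=3, rd=1, shamt=0, funct=32)  # add $1, $2, $3
    print(f"{word:032b}")
    print(f"0x{word:08X}")
```

For add $1, $2, $3 the packed word is 0x00430820.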
All arithmetic/logic instructions in MIPS are 3-address instructions, i.e. they need to specify three operands. Hence, MIPS is a 3-address machine. No arithmetic/logic instruction is allowed to have a memory location as one of its operands: only registers or, in some instructions, one immediate operand are allowed. Hence, MIPS is a register-register architecture.
2. I (Immediate) Format
This divides the instruction into four fields as follows:

Field | op | rs | rt | immediate/offset
Bits  | 6  | 5  | 5  | 16

This instruction format is used by the following instruction types:
Example 1:
addi $1, $2, 25

This instruction adds 25 to the contents of register $2 and stores the result in $1. The machine encoding of this instruction will be as follows:

op | rs | rt | immediate
8  | 2  | 1  | 25

Example 2: Data Transfer (Memory Reference Instructions)
lw $1, 40($2)
This instruction loads a word (1 word = 32 bits, i.e. 4 bytes) from memory at the address given by $2 + 40 into register $1. Register $2 contains the base address and is thus regarded as the base register. In register transfer language (RTL), the working of the above instruction can be described as follows:
$1 ← Mem[$2 + 40]
The machine encoding of this instruction will be as follows:

op | rs | rt | offset
35 | 2  | 1  | 40

Please note that here rt is interpreted as the destination register rather than a source register.
sw $1, 40($2)
This instruction does the reverse of lw. Specifically, it stores a word from a CPU register ($1 in this example) into memory at the address given by $2 + 40. The RTL description of the instruction follows:
Mem[$2 + 40] ← $1

The machine encoding of this instruction will be as follows:

op | rs | rt | offset
43 | 2  | 1  | 40
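The I-format encodings shown above can be reproduced with a short packing routine. This is a sketch: the helper name encode_i_type is hypothetical, and the 16-bit field is treated as a signed two's-complement value.

```python
def encode_i_type(op, rs, rt, imm):
    """Pack the four I-format fields (6/5/5/16 bits) into a 32-bit word."""
    for value, width in ((op, 6), (rs, 5), (rt, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError("immediate does not fit in 16 signed bits")
    # Masking keeps the low 16 bits, i.e. the two's-complement form of imm.
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

if __name__ == "__main__":
    print(f"addi $1, $2, 25 -> 0x{encode_i_type(8, 2, 1, 25):08X}")
    print(f"lw   $1, 40($2) -> 0x{encode_i_type(35, 2, 1, 40):08X}")
    print(f"sw   $1, 40($2) -> 0x{encode_i_type(43, 2, 1, 40):08X}")
```

Note that for lw and sw the base register goes into rs and the data register into rt, exactly as in the field tables above.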

In MIPS, only load/store instructions are allowed to access memory. No other instruction can access memory. Such an architecture is called a load-store architecture. Note that a load-store architecture is the same thing as a register-register architecture.
All RISC (Reduced Instruction Set Computer) machines use a load-store architecture. What's reduced in a RISC? Instruction formats, addressing modes, number of instructions: virtually everything except the set of general-purpose CPU registers, which is large. RISC is based on the philosophy that less is more. This is in contrast to the design philosophy of CISC (Complex Instruction Set Computer) machines. MIPS is an example of a RISC machine.
Example 3: Branch Instruction
There are two types of branch instructions in MIPS:
1. beq $1, $2, 25
This compares the contents of $1 and $2 and, if they are equal, transfers control to (i.e. jumps to) an instruction (called the target instruction) located at the following address:
BTA (branch target address) = (PC + 4) + offset x 4
where PC contains the address of the branch instruction. In this example, the offset is 25. The offset is signed (2's complement notation) and expressed in words rather than bytes to increase the branching distance.
The machine encoding of this instruction will be as follows:

op | rs | rt | offset
4  | 1  | 2  | 25

Forward branching distance = \( 2^{16-1} - 1 = (2^{15} - 1) \) words

Backward branching distance = \( 2^{16-1} = 2^{15} \) words

2. bne $1, $2, 25
Operates like beq, except that it branches when the two registers are not equal.
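The branch target computation can be sketched as follows. The function names and the example PC value 0x1000 are assumptions made for illustration; the formula is the BTA expression given above, with the 16-bit offset sign-extended first.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, offset_field):
    """BTA = (PC + 4) + offset * 4, wrapped to a 32-bit address."""
    return (pc + 4 + sign_extend_16(offset_field) * 4) & 0xFFFFFFFF

if __name__ == "__main__":
    pc = 0x1000  # assumed address of the branch instruction
    print(hex(branch_target(pc, 25)))      # forward branch by 25 words
    print(hex(branch_target(pc, 0xFFFF)))  # offset -1: branches back to itself
    print(sign_extend_16(0x7FFF), sign_extend_16(0x8000))  # extreme offsets in words
```

The extreme offsets printed at the end correspond to the forward and backward branching distances stated above.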


******





MIPS Pipeline
1. Instruction Fetch (IF) Stage
a. Instruction Fetch
The instruction's address in the PC is applied to the instruction memory, which makes the addressed instruction available on the output lines of the instruction memory.
b. Updating PC
The address in the PC is incremented by 4, but what is written into the PC is determined by the control signal PCSrc. Depending on the value of PCSrc, the PC is loaded with either the branch target address (BTA) or the sequential address (PC + 4).
2. Instruction Decode (ID) Stage
a. The instruction is decoded by the control unit, which takes the 6-bit opcode and generates the control signals.
b. The control signals are buffered in the pipeline registers until they are used in the relevant stage by the corresponding instruction.
c. Registers are also read in this stage. Note that the first source register's identifier in every instruction is at bit positions [25:21] and the second source register's identifier (if any) is at bit positions [20:16].
d. The destination register's identifier is either at bit positions [15:11] (for R-type) or at [20:16] (for lw and addi). The correct destination register's identifier is selected via a multiplexer controlled by the control signal RegDst. However, this multiplexer is placed in the EX stage because instruction decoding is not finished until the second stage is complete. The identifier is buffered until the WB stage because an instruction writes its register in the WB stage.
3. Execution (EX) Stage
a. This stage is marked by the use of the ALU, which performs the desired operation on registers (R-type), calculates the address (memory reference instructions), or compares registers (branch).
b. An ALU control unit accepts the 6-bit funct field and the 2-bit control signal ALUOp to generate the required control signal for the ALU.
c. The BTA is also calculated in the EX stage by a separate adder.
4. Memory (M) Stage
a. Data memory is read (lw) or written (sw) using the address calculated by the ALU in the EX stage.
b. The ZERO output of the ALU and the BRANCH signal generated by the control unit are ANDed to determine the fate of the branch (taken or not taken).




5. Write Back (WB) Stage
a. The result produced by the ALU in the EX stage (R-type) or the data read from data memory in the M stage (lw) is written into the destination register. The data to be written is selected via a multiplexer controlled by the control signal MemToReg.
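The field extraction and the two selection steps described in the stages above can be modelled in a few lines of Python. This is a behavioural sketch, not a description of the actual datapath: the function names are hypothetical, the RegDst polarity (1 selects rd, 0 selects rt) is an assumption, and PCSrc is assumed to be BRANCH AND ZERO, which models beq only.

```python
def decode_fields(word):
    """Split a 32-bit instruction word at the bit positions used in the ID stage."""
    return {
        "op": (word >> 26) & 0x3F,     # bits [31:26]
        "rs": (word >> 21) & 0x1F,     # bits [25:21], first source register
        "rt": (word >> 16) & 0x1F,     # bits [20:16], second source or destination
        "rd": (word >> 11) & 0x1F,     # bits [15:11], R-type destination
        "funct": word & 0x3F,          # bits [5:0]
    }

def destination_register(fields, reg_dst):
    """RegDst multiplexer; assumed polarity: 1 selects rd, 0 selects rt."""
    return fields["rd"] if reg_dst else fields["rt"]

def next_pc(pc, branch, zero, bta):
    """PCSrc multiplexer; PCSrc is assumed to be BRANCH AND ZERO (beq only)."""
    pc_src = branch and zero
    return bta if pc_src else pc + 4

if __name__ == "__main__":
    add_fields = decode_fields(0x00430820)            # add $1, $2, $3
    print(add_fields, destination_register(add_fields, 1))
    lw_fields = decode_fields(0x8C410028)             # lw $1, 40($2)
    print(lw_fields, destination_register(lw_fields, 0))
    print(hex(next_pc(0x1000, True, True, 0x1068)))   # taken branch
    print(hex(next_pc(0x1000, True, False, 0x1068)))  # not taken
```

Both example instructions write register $1, but the identifier comes from a different field in each case, which is why the multiplexer is needed.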

Hardware Duplication
Because different instructions are in different stages of the pipeline, a functional unit is often required by more than one instruction at the same time. This necessitates redundancy of hardware. For example, a Harvard architecture (i.e. separate memory units for instructions and data) is required because, in a given pipeline cycle, two instructions may need to use memory (one for instruction fetch and another for data read/write), as shown below.
WB    |    |    |    |    | I1 | I2 | I3 | I4 | I5 | I6
M     |    |    |    | I1 | I2 | I3 | I4 | I5 | I6 | I7
EX    |    |    | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8
ID    |    | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 | I9
IF    | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 | I9 | I10
Cycle | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10

As cycle 4 shows, I1 accesses memory for a data read/write while I4 is being fetched from instruction memory. The Harvard architecture averts this conflict.
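The memory conflict can also be enumerated programmatically. The sketch below assumes a single shared memory and, as a worst case, that every instruction accesses data memory in its M stage; the function name is hypothetical.

```python
def memory_conflict_cycles(n, total_cycles):
    """Cycles in which the IF and M stages would both need one shared memory.

    Returns tuples (cycle, instruction in M, instruction in IF).
    """
    conflicts = []
    for cycle in range(1, total_cycles + 1):
        in_if = cycle      # instruction I_cycle is being fetched
        in_m = cycle - 3   # M is the 4th stage, so it holds I_(cycle - 3)
        if 1 <= in_if <= n and 1 <= in_m <= n:
            conflicts.append((cycle, in_m, in_if))
    return conflicts

if __name__ == "__main__":
    for cycle, m_instr, if_instr in memory_conflict_cycles(10, 10):
        print(f"cycle {cycle}: I{m_instr} in M, I{if_instr} in IF")
```

The first conflict appears in cycle 4 (I1 in M, I4 in IF), and once the pipeline is full it recurs in every cycle, which is why separate instruction and data memories are used.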

Graphical Representation of MIPS Pipeline

Consider pipelined execution of following MIPS instructions:
lw $1, 0($2)
add $3, $4, $5
The lw instruction uses all stages in the pipeline, but add (like any other R-type instruction) doesn't access data memory, i.e. it doesn't use the M stage. If add were allowed to skip the M stage, the progress of the above instructions through the MIPS pipeline would be as illustrated below:

    | CC1 | CC2 | CC3 | CC4 | CC5
lw  | IF  | ID  | EX  | M   | WB
add |     | IF  | ID  | EX  | WB



A resource conflict is indicated in CC5. That is, two different instructions attempt to use the same
hardware in the same cycle. This can be averted by ensuring uniformity: make all instructions pass
through all the stages in the same order.
As a consequence, some instructions will do nothing (accomplished through disabling corresponding
control signals) in some stages.
R-type | IF | ID | EX | (M) | WB
sw     | IF | ID | EX | M   | (WB)
beq    | IF | ID | EX | M   | (WB)

Stages in parentheses are pipeline stages in which the given instruction does nothing. Only lw uses all five stages.
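The conflict in CC5 and its removal by uniform stage ordering can be checked with a small scheduling sketch. The helper names schedule and conflicts are hypothetical, and each stage is assumed to take exactly one cycle.

```python
FULL = ["IF", "ID", "EX", "M", "WB"]

def schedule(start_cycle, stages):
    """Map each stage an instruction uses to the clock cycle in which it uses it."""
    return {stage: start_cycle + i for i, stage in enumerate(stages)}

def conflicts(*schedules):
    """Return (stage, cycle) pairs claimed by more than one instruction."""
    seen = set()
    clashes = []
    for sched in schedules:
        for stage, cycle in sched.items():
            if (stage, cycle) in seen:
                clashes.append((stage, cycle))
            seen.add((stage, cycle))
    return clashes

if __name__ == "__main__":
    lw = schedule(1, FULL)
    add_skipping_m = schedule(2, ["IF", "ID", "EX", "WB"])
    add_uniform = schedule(2, FULL)
    print(conflicts(lw, add_skipping_m))  # WB is claimed twice in cycle 5
    print(conflicts(lw, add_uniform))     # no clash once add passes through M
```

Letting add idle in the M stage costs it one cycle of latency but removes the clash on the write-back hardware.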
*****
