1 1999 ©UCB
Review of Last Class
°MIPS Datapath
°Introduction to Pipelining
°Introduction to Instruction Level Parallelism (ILP)
°Introduction to VLIW
What is Multiprocessing
Architectural Comparisons (cont.)
[Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading]
Intel IXP1200 Network Processor
Initial component of the Intel Internet Exchange Architecture (IXA)
Each microengine is a 5-stage pipeline: no ILP, 4-way multithreaded
7-core multiprocessing: 6 microengines and a StrongARM core
166 MHz fundamental clock rate
Intel claims 2.5 Mpps IP routing for 64-byte packets
Already the most widely used NPU
Or more accurately the most widely admitted use
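As a rough sanity check on the 2.5 Mpps claim, the sketch below converts it to a line rate and to a per-packet cycle budget. It assumes (the slide does not say this) that all six microengines run at the 166 MHz clock and share the load evenly, and it ignores link-level framing overhead. The function names are illustrative only.

```python
# Back-of-the-envelope figures derived from the numbers on the slide.
CLOCK_HZ = 166e6         # fundamental clock rate (from the slide)
MICROENGINES = 6         # microengine count (from the slide)
PACKETS_PER_SEC = 2.5e6  # Intel's claimed rate for 64-byte packets
PACKET_BYTES = 64


def line_rate_bps(pps, packet_bytes):
    """Payload bit rate implied by a packet rate (no framing overhead)."""
    return pps * packet_bytes * 8


def cycles_per_packet(clock_hz, engines, pps):
    """Aggregate microengine cycles available per packet (assumes even sharing)."""
    return clock_hz * engines / pps


if __name__ == "__main__":
    gbps = line_rate_bps(PACKETS_PER_SEC, PACKET_BYTES) / 1e9
    budget = cycles_per_packet(CLOCK_HZ, MICROENGINES, PACKETS_PER_SEC)
    print(f"{gbps:.2f} Gb/s, about {budget:.0f} cycles per packet")
```

Under these assumptions the claim works out to roughly 1.28 Gb/s, with a budget of about 400 microengine cycles per packet.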
IXP1200 Chip Layout
StrongARM processing core
Microengines introduce new ISA
I/O
PCI
SDRAM
SRAM
IX: PCI-like packet bus
On-chip FIFOs: 16 entries, 64 B each
IXP1200 Microengine
4 hardware contexts
Single issue processor
Explicit optional context switch on SRAM access
Registers
All are single ported
Separate GPR
1536 registers total
32-bit ALU
Can access GPR or XFER registers
Standard 5 stage pipe
4KB SRAM instruction store (not a cache!)
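To see why several hardware contexts with a context switch on SRAM access help a single-issue engine, here is a minimal sketch of the textbook latency-hiding model. The cycle counts are hypothetical placeholders, not IXP1200 figures, and the model assumes zero-cost context switches and identical threads.

```python
def utilization(contexts, compute_cycles, memory_latency):
    """Fraction of cycles in which the engine does useful work.

    Each thread computes for `compute_cycles`, then waits
    `memory_latency` cycles on memory. While one thread waits,
    the other contexts can use the pipeline.
    """
    busy = contexts * compute_cycles
    period = compute_cycles + memory_latency
    return min(1.0, busy / period)


if __name__ == "__main__":
    # Hypothetical numbers: 10 cycles of work per 30-cycle memory access.
    for n in (1, 2, 4):
        print(n, "contexts:", utilization(n, compute_cycles=10, memory_latency=30))
```

With these made-up numbers a single context keeps the engine busy a quarter of the time, while four contexts are enough to hide the memory latency completely.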
Intel IXP2400 Microengine (New)
XScale core replaces StrongARM
1.4 GHz target in 0.13-micron
Nearest neighbor routes added between microengines
Hardware to accelerate CRC operations and random number generation
16-entry CAM
MIPS Pipeline
Chapter 6 CS 161 Text
Review: Single-cycle Datapath for MIPS
[Figure: five-stage datapath: PC, Instruction Memory (Imem), Registers, ALU, Data Memory (Dmem), Registers; abbreviated IM, Reg, ALU, DM, Reg]
Stages of Execution in Pipelined MIPS
5 stage instruction pipeline
1) I-fetch: Fetch Instruction, Increment PC
2) Decode: Decode Instruction, Read Registers
3) Execute:
Mem-reference: Calculate Address
R-format: Perform ALU Operation
4) Memory:
Load: Read Data from Data Memory
Store: Write Data to Data Memory
5) Write Back: Write Data to Register
Pipelined Execution Representation
Time runs left to right; Program Flow runs top to bottom
IFtch Dcd   Exec  Mem   WB
      IFtch Dcd   Exec  Mem   WB
            IFtch Dcd   Exec  Mem   WB
                  IFtch Dcd   Exec  Mem   WB
                        IFtch Dcd   Exec  Mem   WB
°To simplify pipeline, every instruction takes same number of steps, called stages
°One clock cycle per stage
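The two rules above (same number of stages for every instruction, one clock cycle per stage) are enough to generate the staggered picture and count cycles. The sketch below is an idealized model that assumes no stalls or hazards; the function names are illustrative.

```python
STAGES = ["IFtch", "Dcd", "Exec", "Mem", "WB"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one text row per instruction, each shifted right by one cycle."""
    width = max(len(s) for s in stages) + 1
    rows = []
    for i in range(num_instructions):
        cells = [""] * i + list(stages)  # instruction i starts in cycle i
        rows.append("".join(c.ljust(width) for c in cells).rstrip())
    return rows


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to finish all instructions in an ideal pipeline (no stalls)."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    print("\n".join(pipeline_diagram(5)))
    print("cycles:", total_cycles(5))
```

For five instructions this model gives 9 cycles, compared with 25 if each instruction had to finish all five stages before the next one started.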
Datapath Timing: Single-cycle vs. Pipelined
°Assume the following delays for major functional units:
• 2 ns for a memory access or ALU operation
• 1 ns for register file read or write
°Total datapath delay for single-cycle:
Insn Type | Insn Fetch | Reg Read | ALU Oper | Data Access | Reg Write | Total Time
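The totals in this table follow from the unit delays listed above. The sketch below assumes the usual mapping of instruction classes to the units they use (for example, that a store skips the register write and a branch skips both data access and register write). That mapping and the class names are assumptions, not taken from the slide.

```python
# Unit delays from the slide, in nanoseconds.
DELAY_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "data": 2, "reg_write": 1}

# Assumed unit usage per instruction class (not stated on the slide).
USES = {
    "R-format": ["fetch", "reg_read", "alu", "reg_write"],
    "Load": ["fetch", "reg_read", "alu", "data", "reg_write"],
    "Store": ["fetch", "reg_read", "alu", "data"],
    "Branch": ["fetch", "reg_read", "alu"],
}


def total_ns(insn_class):
    """Datapath delay for one instruction class in a single-cycle design."""
    return sum(DELAY_NS[unit] for unit in USES[insn_class])


def single_cycle_clock_ns():
    """The single-cycle clock must cover the slowest instruction class."""
    return max(total_ns(c) for c in USES)


def pipelined_clock_ns():
    """The pipelined clock must cover the slowest stage."""
    return max(DELAY_NS.values())


if __name__ == "__main__":
    for c in USES:
        print(c, total_ns(c), "ns")
    print("single-cycle clock:", single_cycle_clock_ns(), "ns")
    print("pipelined clock:", pipelined_clock_ns(), "ns")
```

Under these assumptions the load is the slowest class at 8 ns, so the single-cycle clock is 8 ns while the pipelined clock is set by the slowest stage at 2 ns.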
[Figure: MIPS datapath with PC, Imem, Regs, ALU, Dmem, adders (PC + 4 and branch target), shift left 2, sign extend (16 to 32), and muxes. Widths labeled on the figure: 64 bits, 133 bits, 102 bits, 69 bits]