Documente Academic
Documente Profesional
Documente Cultură
COMPUTER ARCHITECTURE
Bo Yu
Computer Engineering Department
SJSU
Adapted from Computer Organization and Design, 5th Edition, 4th edition, Patterson and Hennessy, MK
and Computer Architecture – A Quantitative Approach, 4th edition, Patterson and Hennessy, MK
Lecture 2 Key Concepts Review
■ Last Lecture
o MIPS Architecture
op rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
0 17 18 8 0 32
000000100011001001000000001000002 = 0232402016
op rs rt constant or address
6 bits 5 bits 5 bits 16 bits
o Immediate arithmetic and load/store instructions
❖ rt: destination or source register number
❖ Constant: –215 to +215 – 1
❖ Address: offset added to base address in rs
● Example of translating MIPS Assembly Language into Machine Language:
A[300] = h + A[300]; $t1 has base address of array A, $s2 corresponds to h
The above assignment statement is compiled into:
Assembly
Language
Machine
Language
Instructions
Machine
Binary
Code
Cmpe 200 Lecture 3 8
Lecture 2: Instructions: Language of the Computer
■ Signed Negation
◼ Complement and add 1
◼ Complement means 1 → 0, 0 → 1
x + x = 1111...1112 = −1
x + 1 = −x
◼ Example: negate +2
◼ +2 = 0000 0000 … 00102
◼ –2 = 1111 1111 … 11012 + 1
= 1111 1111 … 11102
Cmpe 200 Lecture 3 10
Lecture 2: Instructions: Language of the Computer
■ Hexadecimal
◼ Base 16
◼ Compact representation of bit strings
◼ 4 bits per hex digit
■ Logical Operations
● Instructions for bitwise manipulation
● Useful for extracting and inserting groups of bits in a
word
Logical
C Operators Java Operators MIPS Instructions
Operations
Shift left logical << << sll
■ Shift Operations
op rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
■ Conditional Operations
● Branch to a labeled instruction if a condition is true
o Otherwise, continue sequentially
● beq rs, rt, L1
o if (rs == rt) branch to instruction labeled L1;
● bne rs, rt, L1
o if (rs != rt) branch to instruction labeled L1;
● j L1
o unconditional jump to instruction labeled L1
■ Basic Blocks
● C code:
● A basic block is a sequence of instructions with
o No embedded branches (except at end)
o No branch targets (except at beginning)
❖ –1 < +1 $t0 = 1
■ Memory Layout
● Text: program code
● Static data: global variables
o e.g., static variables in C, constant arrays and strings
o $gp initialized to address allowing ±offsets into this
segment
o $sp → starts at 7fff fffc, grows down towards data
segment
o $gp → starts at 1000 8000, moves using positive
and negative 16-bit offset
o Text Segment: MIPS machine code
o Static data: starts at 1000 0000
o Dynamic data: heap grows towards stack
● Dynamic data: heap
o E.g., malloc in C, new in Java
● Stack: automatic storage
Static linking
■ Instruction Execution
● Instruction Fetch
o PC => instruction memory => Instruction register
● Operand Read
o Register Numbers => Register File
o Read Registers
● Execute
o ALU to calculate
o Results for arithmetic operations
o Data memory address for Load/Store
o Branch target address (alternative implementation?)
● Next Instruction
o PC + target address for jumps or PC + 4
Cmpe 200 Lecture 3 24
Lecture 3: The Processor
■ CPU Overview
■ Multiplexers
MUX points
■ Control
MemWrite
Clock
RegWrite
Clock
PC Clock
PC clock
RegWrite clock
PC → Instruction Memory
MemWrite clock
Instruction Fetch
Instruction Decode
Operand Read
State elements: A memory element such as a
Execute
register or a memory, edge-triggered FFs Result Write
Cmpe 200 Lecture 3 28
Lecture 3: The Processor
PC clock
RegWrite clock
MemWrite clock
Cmpe 200 Lecture 3 29
Lecture 3: The Processor
■ Combinational Elements
◼ AND-gate ◼ Adder A
Y
+
◼ Y=A&B ◼ Y=A+B B
A
Y
B
◼ Arithmetic/Logic Unit
◼ Multiplexer ◼ Y = F(A, B)
◼ Y = S ? I1 : I0
A
I0 M
u Y ALU Y
I1 x
B
S F
Cmpe 200 Lecture 3 31
Lecture 3: The Processor
■ Sequential Elements
● Register: stores data in a circuit
o Uses a clock signal to determine when to
update the stored value
o Edge-triggered: update when Clk changes
from 0 to 1
Clk
D Q
D
Clk
Q
■ Sequential Elements
● Register with write control
o Only updates on clock edge when write
control input is 1
o Used when stored value is required later
Clk
D Q Write
Write D
Clk
Q
■ Building a Datapath
● Datapath
o Elements that process data and addresses
in the CPU
❖ Registers, ALUs, mux’s, memories, …
● We will build a MIPS datapath incrementally
o Refining the overview design
■ Data Path
● Instruction Fetch
Increment by
32-bit 4 for next
register instruction
■ R-Format Instructions
● Read two register operands
● Perform arithmetic/logical operation
● Write register result
■ Branch Instructions
● Read register operands
● Compare operands
o Perform ALU op as required for condition testing
o Use ALU, subtract and check Zero output
● Calculate target address based on the
condition
o Sign-extend displacement
o Shift left 2 places (word displacement)
o Add to PC + 4
❖Already calculated by instruction fetch
Examples (with RTL descriptions):
BEQ If [R(rs)==R(rt) then PC PC + signExt(Imm16) || 00
else PC PC + 4
Cmpe 200 Lecture 3 38
Lecture 3: The Processor
■ Branch Instructions
Just
re-routes
wires
Sign-bit wire
replicated
Cmpe 200 Lecture 3 39
Lecture 3: The Processor
Sign
16 Extend 32
zeroExt
RegDst
Add
RegWrite MemWrite
4 MemtoReg
ALUSrc ovf
Instr[25-21] Read Addr 1 zero
Instruction
Register Read Address
Memory Instr[20-16] Read Addr 2 Data 1
Data
Read File
PC Instr[31-0] 0 ALU Memory Read Data 1
Address Write Addr
1 Read 0
Instr[15 Data 2 Write Data 0
Write Data
-11] 1
Branch
RegDst
MemtoReg
ALUSrc MemWrite
RegWrite
ovf
Instr[25-21] Read Addr 1
Instruction Read Address
Memory Register
Instr[20-16] Read Addr 2 Data 1 zero
Data
Read File
PC Instr[31-0] 0 ALU Memory Read Data 1
Address Write Addr
1 Read 0
Instr[15 Data 2 Write Data 0
Write Data
-11] 1
zeroExt Instr[5-0]
■ ALU Control
● ALU used for
o Load/Store: F = add (to compute the address)
o Branch: F = subtract
o R-type: F depends on funct field
■ ALU Control
● Assume 2-bit ALUOp derived from opcode
o Combinational logic derives ALU control
Load/
35 or 43 rs rt address
Store
31:26 25:21 20:16 15:0
Branch 4 rs rt address
31:26 25:21 20:16 15:0
■ J-Type Instruction
Jump 2 address
31:26 25:0
Instruction Opcode funct shamt ALU func ALUContrl RegDst Branch JUMP MemToReg MemWrite ALUsrc RegWrite zeroExt
LW xxxxx ADD 0010 0 0 0 1 0 1 1 0
SW yyyyyy ADD 0010 x 0 0 x 1 1 0 0
BEQ zzzzz SUB 0110 x 1 0 x 0 0 0 0
R-type add 100000 ADD 0010 1 0 0 0 0 0 1 0
sub 100010 SUB 0110 1 0 0 0 0 0 1 0
and 100100 AND 0000 1 0 0 0 0 0 1 0
or 100101 OR 0001 1 0 0 0 0 0 1 0
shift xxxx xxxx SHIFT 1 0 0 0 0 0 1 0
ori 1 0 0 0 0 0 1 1
Jump x x x 1 x 0 x 0 x
■ Performance Issues
● Single Clock
o Functional Block have different delays
o Single clock designs use the clock inefficiently – the clock must be
timed to accommodate the slowest instruction
o Longest delay determines clock period
√ Critical path: load instruction
√ Instruction memory → register file → ALU → data memory → register file
o Violates design principle: Making the common case fast
o We will improve performance by pipelining
Cycle 1 Cycle 2
Clk
lw sw Waste
■ Solution
Single-cycle (Tc= 800ps)
■ Next lecture
● Multicycle datapath - Pipelining