EE37E 2005
Introduction
References:
- Modern Processor Design book (pp. 1-16)
- Computer Organization and Design book (pp. 54-89)
Figure: Components of a computer system
CPU with caches (L2, L3), connected through the System Bus to the Chipset (North Bridge, South Bridge).
Memory Controller and Memory Bus to main memory. Memory types:
- SDRAM: PC100/PC133, 100-133 MHz, 64-128 bits wide, 2-way interleaved, ~900 MBytes/sec
- Double Data Rate (DDR) SDRAM: PC3200, 400 MHz (effective 200x2), 64-128 bits wide, 4-way interleaved, ~3.2 GBytes/sec (second half 2002)
- Rambus DRAM (RDRAM): PC800, PC1060, 400-533 MHz (DDR), 16-32 bits wide channel, ~1.6-3.2 GBytes/sec (per channel)
I/O Buses with adapters, controllers and NICs to the I/O devices: disks, displays, keyboards.
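The bandwidth figures quoted for the memory types follow roughly from clock rate times bus width. The following is a minimal Python sketch of that arithmetic, assuming peak bandwidth is simply transfers per second times bytes per transfer and ignoring interleaving and protocol overhead. The function name `peak_bandwidth_mb_s` and the choice of the narrowest bus width for each memory type are illustrative assumptions, not part of the slides.

```python
def peak_bandwidth_mb_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Peak transfer rate in MBytes/sec (1 MByte = 10**6 bytes).

    Assumption: one transfer of width_bits per clock edge used,
    with no protocol or refresh overhead.
    """
    return clock_mhz * transfers_per_clock * width_bits / 8


# SDRAM, 64-bit bus, single data rate
print(peak_bandwidth_mb_s(100, 64))      # PC100: 800.0
print(peak_bandwidth_mb_s(133, 64))      # PC133: 1064.0

# DDR SDRAM PC3200: 200 MHz clock, two transfers per clock, 64-bit bus
print(peak_bandwidth_mb_s(200, 64, 2))   # 3200.0

# RDRAM PC800: 400 MHz clock, two transfers per clock, 16-bit channel
print(peak_bandwidth_mb_s(400, 16, 2))   # 1600.0
```

The PC100 and PC133 results bracket the ~900 MBytes/sec figure, and the DDR and RDRAM results match the ~3.2 and ~1.6 GBytes/sec figures.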
Figure: Components of a computer system (continued)
CPU (SMT, CMP) with caches (L1, L2, L3), System Bus, Chipset (North Bridge, South Bridge), Memory Controller, Memory Bus, Memory, I/O Buses with adapters, controllers and NICs.
I/O devices: disks (RAID), displays, keyboards, networks.
Networks: Fast Ethernet, Gigabit Ethernet, ATM, Token Ring ..
Integrate the memory controller and a portion of main memory with the CPU: Intelligent RAM.
Integrated memory controller: AMD Opteron, IBM Power5.
Standard operating systems (UNIX, NT) lowered the cost of introducing new architectures.
VLIW
"Superinstructions" grouped together
decreased HW control complexity
CMPs
Single Chip Multiprocessors
duplicate entire processors
(tech soon due to Moore's Law)
Simultaneous Multithreading (SMT)
multiple HW contexts (regs, PC, SP)
each cycle, any context may execute
SMT/CMPs (e.g. IBM Power5 in 2004)
Evolution of microprocessors
Figure 1: Evolution of microprocessors. Transistor count (log scale, 1,000 to 100,000,000) versus year (1970 to 2000), following Moore's Law.
Processors plotted: i4004, i8080, i8086, i80286, i80386, i80486, Pentium, Sparc Ultra (5.2 million), Pentium Pro (5.5 million), PowerPC 620 (6.9 million), Alpha 21164 (9.3 million), Alpha 21264 (15 million).
Annotation: Graduation Window.
CMOS improvements:
Die size: 2X every 3 yrs
Line width: halve / 4-7 yrs
                      1970-1980     1980-1990      1990-2000        2000-2010
Transistor count      2K-100K
Clock frequency       0.1-3 MHz     3-30 MHz       30 MHz-1 GHz     1-15 GHz
Instructions/cycle    0.1 IPC       0.1-0.9 IPC    0.9-1.9 IPC      1.9-2.9 IPC
Figure: Levels of abstraction, from software to hardware
Software: Application, Operating System, Compiler, Assembly Language Programs, Machine Language Program
Software/Hardware Boundary: Instruction Set Architecture
Firmware: Microprogram
Hardware: Register Transfer Notation (RTN), Digital Design (Logic Diagrams), Circuit Design (Circuit Diagrams), Layout
Figure: Hardware building blocks. Datapath and Control; ALU, Regs, Shifter; Nand Gate.
Figure: Design as Search. Problem A is approached through Strategy 1 or Strategy 2; each strategy breaks into subproblems (SubProb 1, SubProb2, SubProb3), which are built from BB1, BB2, BB3, ..., BBn.
Figure 2: ISA. The instruction set sits between software and hardware.
Figure: Architecture as the dynamic-static interface (DSI)
Above the architecture: Program, compiler complexity, exposed to software, static (software).
Below the architecture: Machine, hardware complexity, hidden in hardware, dynamic (hardware).
Figure: Topics in computer architecture
Instruction Set Architecture: addressing, protection, exception handling
Memory hierarchy: L1 cache, L2 cache, interleaving, bus protocols; coherence, bandwidth, latency
VLSI, RAID, emerging technologies
Definitions
Performance is in units of things per sec
bigger is better
$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$
"X is n times faster than Y" means
$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$
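As a small illustration of the definition, the Python sketch below computes n from two execution times. The function names and the timings (10 s and 15 s) are hypothetical values chosen for the example.

```python
def performance(execution_time_s):
    """performance(x) = 1 / execution_time(x)."""
    return 1.0 / execution_time_s


def times_faster(exec_time_x, exec_time_y):
    """Return n such that X is n times faster than Y."""
    return performance(exec_time_x) / performance(exec_time_y)


# Hypothetical timings: X takes 10 s, Y takes 15 s for the same program.
n = times_faster(10.0, 15.0)
print(round(n, 3))  # 1.5
```

Because performance is the reciprocal of execution time, the same n comes out of Execution_time(Y) / Execution_time(X) = 15 / 10.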
$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
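The three-factor product can be evaluated directly. The sketch below is a minimal Python illustration; the function name `cpu_time` and the workload numbers are assumptions made up for the example.

```python
def cpu_time(instructions, cycles_per_instruction, clock_hz):
    """Seconds/Program = Instructions/Program * Cycles/Instruction * Seconds/Cycle.

    Seconds/Cycle is 1 / clock_hz, so the product is written as a division.
    """
    return instructions * cycles_per_instruction / clock_hz


# Hypothetical program: 1e9 instructions, CPI of 2.0, 1 GHz clock.
print(cpu_time(1e9, 2.0, 1e9))   # 2.0

# Halving the CPI halves the execution time at the same clock rate.
print(cpu_time(1e9, 1.0, 1e9))   # 1.0
```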
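Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Amdahl's Law
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$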
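Amdahl's Law
Example: Floating point instructions improved to run 2X; but only 10% of actual instructions are FP
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left( 0.9 + \frac{0.1}{2} \right) = 0.95 \times \text{ExTime}_{\text{old}}$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{0.95} = 1.053$$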
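The floating point example can be checked numerically. This is a minimal Python sketch of the overall speedup formula; the function name `amdahl_speedup` is an assumption for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when fraction_enhanced of the time runs speedup_enhanced times faster."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# The FP example: 10% of the work is sped up 2X.
print(round(amdahl_speedup(0.1, 2.0), 3))    # 1.053

# Even an arbitrarily fast FP unit is bounded by the untouched 90%.
print(round(amdahl_speedup(0.1, 1e12), 3))   # 1.111
```

The second call shows the usual reading of the law: the unenhanced fraction limits the overall speedup to 1 / 0.9.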
Introduction
7.1 Introduction
7.2 Classifying Instruction Set Architectures
7.3 Memory Addressing
7.4 Operations in the Instruction Set
7.5 Type and Size of Operands
7.6 Encoding an Instruction Set
7.7 The Role of Compilers
7.8 The MIPS Architecture and Bonus
7.9 Endianness
Introduction
The Instruction Set Architecture is that portion of the machine visible to the
assembly level programmer or to the compiler writer.
software
instruction set
hardware
Questions:
- What are the advantages and disadvantages of various
instruction set alternatives?
- How do languages and compilers affect ISA?
Stack/accumulator/register
Number of memory operands.
Number of total operands.
Instruction Set Architectures
Basic ISA Classes
Accumulator:
    1 address      add A
    1+x address    addx A
Stack:
    0 address      add
General Purpose Register:
    2 address      add A B
    3 address      add A B C
Load/Store:
    0 Memory
    1 Memory
ALU instructions can have two or three operands.
Instruction Set Architectures
Basic ISA Classes
The effect of the different address classes is easiest to see with the examples here, all of which implement the sequence for C = A + B.

Stack      Accumulator    Register             Register
                          (register-memory)    (load-store)
Push A     Load A         Load R1, A           Load R1, A
Push B     Add B          Add R1, B            Load R2, B
Add        Store C        Store C, R1          Add R3, R1, R2
Pop C                                          Store C, R3
Registers are the class that won out. The more registers on the CPU, the better.
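To make the difference between the classes concrete, the sketch below runs the stack and load-store sequences for C = A + B on two toy interpreters. It is a minimal Python illustration under stated assumptions: the tuple encoding of instructions, the function names `run_stack` and `run_load_store`, and the values A = 2 and B = 3 are all hypothetical.

```python
def run_stack(program, memory):
    """Stack machine: operands are implicit, taken from the top of the stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(memory[args[0]])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "pop":
            memory[args[0]] = stack.pop()
    return memory


def run_load_store(program, memory):
    """Load-store machine: only load and store touch memory; add uses registers."""
    regs = {}
    for op, *args in program:
        if op == "load":
            regs[args[0]] = memory[args[1]]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":
            memory[args[0]] = regs[args[1]]
    return memory


stack_prog = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ls_prog = [("load", "R1", "A"), ("load", "R2", "B"),
           ("add", "R3", "R1", "R2"), ("store", "C", "R3")]

print(run_stack(stack_prog, {"A": 2, "B": 3})["C"])       # 5
print(run_load_store(ls_prog, {"A": 2, "B": 3})["C"])     # 5
```

Both programs compute the same result; the stack version names no operands in its add, while the load-store version names three registers.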
Instruction Set Architectures
Register set:

GPR0  EAX     Accumulator
GPR1  ECX
GPR2  EDX
GPR3  EBX
GPR4  ESP     Stack Pointer
GPR5  EBP
GPR6  ESI     Index Register
GPR7  EDI     Index Register
      CS
      SS
      DS
      ES
      FS      Data Seg. 2
      GS      Data Seg. 3
PC    EIP     Instruction Counter
      Eflags  Condition Codes
Memory Addressing
Sections Include:
Interpreting Memory Addresses
Addressing Modes
Displacement Address Mode
Immediate Address Mode
Memory Addressing
Interpreting Memory Addresses
Memory Addressing
Addressing Modes
This table shows the most common modes. A more complete set is in Figure 2.6.

Addressing Mode     Example Instruction   Meaning          When Used
Register            Add R4, R3                             When a value is in a register.
Immediate           Add R4, #3                             For constants.
Displacement                              M[100+R[R1]]
Register Deferred                                          Using a pointer or a computed address.
Absolute                                                   Used for static data.
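The way each addressing mode locates its operand can be sketched in a few lines of Python. This is an illustration, not the slide's definition: only the displacement form M[100+R[R1]] comes from the table, while the register deferred and absolute cases use the conventional meanings (memory at the address held in a register, and memory at a constant address). The function name `fetch_operand` and all register and memory contents are hypothetical.

```python
def fetch_operand(mode, regs, mem, reg=None, const=None):
    """Return the operand value selected by the given addressing mode."""
    if mode == "register":            # value is in a register
        return regs[reg]
    if mode == "immediate":           # value is the constant itself
        return const
    if mode == "displacement":        # M[const + R[reg]]
        return mem[const + regs[reg]]
    if mode == "register_deferred":   # M[R[reg]] (assumed conventional meaning)
        return mem[regs[reg]]
    if mode == "absolute":            # M[const] (assumed conventional meaning)
        return mem[const]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state.
regs = {"R1": 4, "R3": 7}
mem = {4: 22, 104: 11, 1001: 33}

print(fetch_operand("register", regs, mem, reg="R3"))                 # 7
print(fetch_operand("immediate", regs, mem, const=3))                 # 3
print(fetch_operand("displacement", regs, mem, reg="R1", const=100))  # 11
print(fetch_operand("register_deferred", regs, mem, reg="R1"))        # 22
print(fetch_operand("absolute", regs, mem, const=1001))               # 33
```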
Memory Addressing
Displacement Addressing Mode
Memory Addressing
Immediate Address Mode

At high level:    At Assembler level:
a = b + 3;        Load    R2, 3
                  Add     R0, R1, R2
if ( a > 17 )     Load    R2, 17
                  CMPBGT  R1, R2
goto Addr         Load    R1, Address
                  Jump    (R1)
Operations In The Instruction Set
Operator Types

Operator type            Examples
Arithmetic and logical   and, add
Data transfer            move, load
Control                  branch, jump, call
System                   system call, traps
Floating point           add, mul, div, sqrt
Decimal                  add, convert
String                   move, compare
Multimedia               2D, 3D? e.g., Intel MMX and Sun VIS
Operations In The Instruction Set
Control Instructions
- taken or not
- where is the target
- link return address
- save or restore
The MIPS Architecture
There's MIPS 32, which we learned in CS140:
- 32-bit byte addresses, aligned
- Load/store: only displacement addressing
- Standard datatypes
- 3 fixed-length formats
- 32 32-bit GPRs (r0 = 0)
- 16 64-bit (32 32-bit) FPRs
- FP status register
- No condition codes
There's MIPS 64, the current architecture:
- Standard datatypes
- 4 fixed-length formats (8, 16, 32, 64)
- 32 64-bit GPRs (r0 = 0)
- 64 64-bit FPRs
MIPS Characteristics
Addressing Modes
- Immediate
- Displacement
- (Register mode used only for ALU)
Data transfer
- load/store word, load/store byte/halfword (signed?)
- load/store FP single/double
- moves between GPRs and FPRs
ALU
- add/subtract (signed? immediate?)
- multiply/divide (signed?)
- and, or, xor (immediate?); shifts: ll, rl, ra (immediate?)
- sets (immediate?)
The MIPS Architecture
MIPS Characteristics
Control
- branches == 0, <> 0
- trap, return-from-exception
Floating Point
- add/sub/mul/div, single/double
- fp converts, fp set
The MIPS Architecture
Instruction formats (bit positions, bit 31 is the most significant):

Register-Register
  31-26   25-21   20-16   15-11   10-6    5-0
  Op      Rs1     Rs2     Rd              Opx
Register-Immediate
  31-26   25-21   20-16   15-0
  Op      Rs1     Rd      immediate
Branch
  31-26   25-21   20-16     15-0
  Op      Rs1     Rs2/Opx   immediate
Jump / Call
  31-26   25-0
  Op      target
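The fixed field positions make decoding a matter of shifts and masks. The Python sketch below extracts every field position that appears in the four formats from one 32-bit word. It is an illustration under assumptions: the function name `decode_fields` and the sample word are hypothetical, and Opx is taken to occupy bits 5-0 of the Register-Register format, as the 6/5 boundary in the layout suggests.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into the fields used by the four formats."""
    return {
        "op":        (word >> 26) & 0x3F,   # bits 31-26, all formats
        "rs1":       (word >> 21) & 0x1F,   # bits 25-21
        "rs2_or_rd": (word >> 16) & 0x1F,   # bits 20-16: Rs2, Rd or Rs2/Opx by format
        "rd":        (word >> 11) & 0x1F,   # bits 15-11, Register-Register only
        "opx":       word & 0x3F,           # bits 5-0 (assumed position)
        "immediate": word & 0xFFFF,         # bits 15-0, Register-Immediate and Branch
        "target":    word & 0x3FFFFFF,      # bits 25-0, Jump / Call
    }


# Hypothetical Register-Register word: Op=0, Rs1=1, Rs2=2, Rd=3, Opx=32.
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | 32
fields = decode_fields(word)
print(hex(word))                      # 0x221820
print(fields["op"], fields["rs1"], fields["rs2_or_rd"], fields["rd"], fields["opx"])
# 0 1 2 3 32
```

Which of the extracted fields is meaningful depends on the format selected by Op; a real decoder would dispatch on Op before interpreting the rest.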
Byte Ordering
How should bytes within a multi-byte word be ordered in memory?
Conventions
Big Endian
Suns, Macs are Big Endian machines
Least significant byte has highest address
Little Endian
Least significant byte has lowest address
Example
Variable x has 4-byte representation 0x01234567
Address given by &x is 0x100

Address          0x100   0x101   0x102   0x103
Big Endian       01      23      45      67
Little Endian    67      45      23      01
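The two layouts for x = 0x01234567 can be reproduced with Python's built-in `int.to_bytes`. This is a small sketch; the variable names and the printing format are illustrative choices.

```python
x = 0x01234567
base = 0x100  # address given by &x in the example

big = x.to_bytes(4, byteorder="big")
little = x.to_bytes(4, byteorder="little")

print(big.hex())      # 01234567
print(little.hex())   # 67452301

# Show each byte next to the address it would occupy.
for name, data in (("Big Endian", big), ("Little Endian", little)):
    layout = ", ".join(f"{base + i:#x}: {b:02x}" for i, b in enumerate(data))
    print(name, "->", layout)
```

In the big endian layout the least significant byte 0x67 lands at the highest address 0x103; in the little endian layout it lands at the lowest address 0x100.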
Classification of Processors
We can classify processors according to the areas in which they are mostly used.
We can identify four different groups of processors:
- General-purpose processors, which are used in building computers
- Digital signal processors, which are designed specifically for signal processing
- Microcontrollers, which are small microcomputers that integrate a processor core, I/O elements and a small amount of memory on the same chip
- Application-specific processors, which are designed to perform a specific function (e.g. network processors)
Type of Computer    Processors Used                                 Technology
Macintosh           PowerPC (IBM, Motorola)                         Superscalar
Sun                 UltraSPARC (SUN)                                RISC
IBM Compatible      Intel processors; Athlon, Duron (AMD); Cyrix    Superscalar
DSP
Digital
* operating by the use of discrete signals to represent data
in the form of numbers
Signal
* a variable parameter by which information is conveyed
through an electronic circuit
Processing
* to perform operations on data according to programmed
instructions
Which leads us to a simple definition of: Digital Signal processing
The advantages of DSP are common to many digital systems and include:
Versatility:
digital systems can be reprogrammed for other applications (at least where
programmable DSP chips are used)
digital systems can be ported to different hardware (for example a different
DSP chip or board level product)
Repeatability:
digital systems can be easily duplicated
digital systems do not depend on strict component tolerances
digital system responses do not drift with temperature
Simplicity:
some things can be done more easily digitally than with analogue
systems