
Lesson 5: Processor Design

Topic 1 Methods and Concepts

EE37E 2005

Introduction
References:
- Modern Processor Design book (pp. 1-16)
- Computer Organization and Design book (pp. 54-89)


While introducing this topic we will focus on these points:
- Evolution of microprocessors
- Instruction set processor design
- Principles of processor performance

Microprocessors are instruction set processors (ISPs).
An ISP executes instructions from a predefined instruction set.
A microprocessor's functionality is fully characterized by the instruction set it is capable of executing.
This predefined instruction set is also called the instruction set architecture (ISA).

An ISA serves as an interface between software and hardware.
In terms of processor design methodology, an ISA is the specification of a design, while the microprocessor or ISP is the implementation of that design.

Computer System Components

CPU
- 1000 MHz to 3 GHz (a multiple of the system bus speed)
- Pipelined (7-21 stages)
- Superscalar (max ~4 instructions/cycle), single-threaded
- Dynamically scheduled or VLIW
- Dynamic and static branch prediction

Caches
- L1, L2, L3

System Bus
- Examples: Alpha, AMD K7: EV6, 400 MHz; Intel PII, PIII: GTL+, 133 MHz; Intel P4: 800 MHz
- Support for one or more CPUs

Memory (reached through the memory controller and memory bus)
- SDRAM (PC100/PC133): 100-133 MHz, 64-128 bits wide, 2-way interleaved, ~900 MB/s
- Double Data Rate (DDR) SDRAM (PC3200): 400 MHz (effective, 200x2), 64-128 bits wide, 4-way interleaved, ~3.2 GB/s (second half 2002)
- Rambus DRAM (RDRAM) (PC800, PC1066): 400-533 MHz (DDR), 16-32 bits wide channel, ~1.6-3.2 GB/s per channel

Chipset
- North Bridge, South Bridge

I/O Buses
- Example: PCI-X, 133 MHz
- PCI: 33-66 MHz, 32-64 bits wide, 133-1024 MB/s

I/O Devices (attached through adapters and controllers)
- Disks, displays, keyboards
- NICs; networks: Fast Ethernet, Gigabit Ethernet, ATM, Token Ring, ...

Computer System Components

Enhanced CPU Performance & Capabilities:
- Support for Simultaneous Multithreading (SMT): Alpha EV8.
- VLIW & intelligent compiler techniques: Intel/HP EPIC IA-64.
- More advanced branch prediction techniques.
- Chip Multiprocessors (CMPs): the Hydra Project, IBM Power 4 and 5.
- Vector processing capability: Vector Intelligent RAM (VIRAM), or a multimedia ISA extension.
- Digital Signal Processing (DSP) capability in the system.
- Reconfigurable computing hardware capability in the system.

Memory Latency Reduction:
- Conventional & block-based trace cache.
- Integrate the memory controller & a portion of main memory with the CPU: Intelligent RAM.
- Integrated memory controller: AMD Opteron, IBM Power5.

[Figure: the system block diagram from the previous slide (CPU with L1, L2, L3 caches; system bus; memory controller, memory bus and memory; North Bridge / South Bridge chipset; I/O buses; adapters, controllers and NICs; disks (RAID), displays, keyboards, networks), with the CPU annotated SMT and CMP.]

Recent Trends in Computer Design

The cost/performance ratio of computing systems has seen a steady decline due to advances in:
- Integrated circuit technology: decreasing feature size, λ.
  - Clock rate improves roughly in proportion to the improvement in λ.
  - Number of transistors improves in proportion to λ² (or faster).
- Architectural improvements in CPU design.

Microprocessor systems directly reflect IC improvement in terms of a yearly 35 to 55% improvement in performance.

Assembly language has been mostly eliminated and replaced by alternatives such as C or C++.

Standard operating systems (UNIX, NT) lowered the cost of introducing new architectures.

Emergence of RISC architectures and RISC-core architectures.

Adoption of quantitative approaches to computer design based on empirical performance observations.

Microprocessor Architecture Trends


CISC Machines
- instructions take variable times to complete

RISC Machines (microcode)
- simple instructions, optimized for speed

RISC Machines (pipelined)
- same individual instruction latency
- greater throughput through instruction "overlap"

Superscalar Processors
- multiple instructions executing simultaneously

Multithreaded Processors
- additional HW resources (regs, PC, SP)
- each context gets the processor for x cycles

VLIW
- "super instructions" grouped together
- decreased HW control complexity

CMPs: Single Chip Multiprocessors
- duplicate entire processors
- (tech soon due to Moore's Law)

Simultaneous Multithreading (SMT)
- multiple HW contexts (regs, PC, SP)
- each cycle, any context may execute

SMT/CMPs (e.g. IBM Power5 in 2004)

Evolution of microprocessors

[Figure 1: Evolution of microprocessors. Log-scale plot of transistor count (1,000 to 100,000,000) against year (1970 to 2000), tracking Moore's Law. Plotted processors: i4004, i8080, i8086, i80286, i80386, i80486, Pentium. "Graduation Window" entries: Alpha 21264: 15 million; Pentium Pro: 5.5 million; PowerPC 620: 6.9 million; Alpha 21164: 9.3 million; Sparc Ultra: 5.2 million. CMOS improvements: die size: 2X every 3 yrs; line width: halve / 4-7 yrs.]

Three decades of the history of microprocessors tell a truly remarkable story of advances in the computer industry (Table 1).

                     1970-1980    1980-1990     1990-2000       2000-2010
Transistor count     2K-100K      100K-1M       1M-100M         100M-2B
Clock frequency      0.1-3 MHz    3-30 MHz      30 MHz-1 GHz    1-15 GHz
Instructions/cycle   0.1 IPC      0.1-0.9 IPC   0.9-1.9 IPC     1.9-2.9 IPC

Table 1. The amazing decades of the evolution of microprocessors

Hierarchy of Computer Architecture

[Figure: layered view of a computer system. Software: application, operating system, compiler; high-level language programs, assembly language programs, machine language program. Software/hardware boundary: instruction set architecture, firmware. Hardware: instruction set processor and I/O system; datapath & control; digital design; circuit design; layout. Description levels: microprogram, register transfer notation (RTN), logic diagrams, circuit diagrams.]

Instruction Set Processor Design


Critical to an ISP is the instruction set architecture, which specifies the functionality that must be implemented by the instruction set processor (ISP).

The Design Process


"To Design Is To Represent"
- Design activity yields a description/representation of an object
- The traditional craftsman does not distinguish between the conceptualization and the artifact
- Separation comes about because of complexity
- The concept is captured in one or more representation languages
- This process IS design

Design Begins With Requirements
- Functional capabilities: what it will do
- Performance characteristics: speed, power, area, cost, ...

Design Process (cont.)

Design Finishes As Assembly
- Design understood in terms of components and how they have been assembled
- Top-down decomposition of complex functions (behaviors) into more primitive functions
- Bottom-up composition of primitive building blocks into more complex assemblies

[Figure: component hierarchy from CPU through Datapath and Control, then ALU, Regs and Shifter, down to NAND gates.]

Design is a "creative process," not a simple method

Design as Search

[Figure: search tree. Problem A can be attacked by Strategy 1 or Strategy 2; a strategy breaks the problem into SubProb 1, SubProb2 and SubProb3, which are resolved into building blocks BB1, BB2, BB3, ..., BBn.]

Design involves educated guesses and verification
-- Given the goals, how should these be prioritized?
-- Given alternative design pieces, which should be selected?
-- Given a design space of components & assemblies, which part will yield the best solution?

Feasible (good) choices vs. optimal choices

Instruction Set Architecture
(subset of Computer Architecture)

"... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
Amdahl, Blaauw, and Brooks, 1964

- Organization of programmable storage
- Data types & data structures: encodings & representations
- Instruction set
- Instruction formats
- Modes of addressing and accessing data items and instructions

The Instruction Set: a Critical Interface

[Figure 2: ISA. The instruction set sits between software (above) and hardware (below).]

Dynamic-Static Interface

We have discussed two critical roles played by the ISA:
- Contract between software and hardware, which facilitates the development of programs and machines
- Specification for microprocessor design

The third role is an associated definition of an interface that separates what is done statically at compile time from what is done dynamically at run time. This interface is called the dynamic-static interface (DSI).

[Figure 3: The dynamic-static interface. A program (software) passes through the compiler down to the machine (hardware), with the architecture (DSI) in between. Above the DSI: static, compiler complexity, exposed to software. Below the DSI: dynamic, hardware complexity, hidden in hardware.]

Computer Architecture Topics

- Input/output and storage: disks, WORM, tape; RAID; emerging technologies
- Memory hierarchy: DRAM, L2 cache, L1 cache; interleaving, bus protocols; coherence, bandwidth, latency
- VLSI
- Instruction set architecture: addressing, protection, exception handling
- Pipelining and instruction-level parallelism: pipelining, hazard resolution, superscalar, reordering, prediction, speculation, vector, DSP

Principles of Processor Performance


Definitions

Performance is in units of things per second, so bigger is better.

If we are primarily concerned with response time:

$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$

"X is n times faster than Y" means:

$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$
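
The definition of "n times faster" can be checked numerically. The following is a minimal sketch; the two execution times are made-up values chosen only for illustration, and the function names are not from the lecture.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time (things per second)."""
    return 1.0 / execution_time_s


def times_faster(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that 'X is n times faster than Y'."""
    return performance(exec_time_x) / performance(exec_time_y)


# Hypothetical timings: machine X needs 10 s, machine Y needs 15 s.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```

Because performance is a reciprocal, the ratio of performances equals the inverse ratio of execution times, which is why Y's time ends up in the numerator.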

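Cycles Per Instruction

IC = Instruction Count
CPI = Clock cycles Per Instruction

$$\text{CPU time} = \text{Number of clock cycles} \times \text{Clock cycle time}$$

$$\text{CPU time} = \frac{\text{Number of clock cycles}}{\text{Clock frequency}}$$

$$\text{CPI} = \frac{\text{Number of clock cycles}}{\text{IC}}$$

$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$$

$$\text{CPU time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$$

$$\text{CPU time} = \text{Cycle time} \times \sum_{j=1}^{n} \text{CPI}_j \times \text{IC}_j$$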

Cycles Per Instruction

We may separate the contribution of each type of instruction to the execution time by defining:

$$\text{Number of clock cycles} = \sum_{j=1}^{n} \text{CPI}_j \times \text{IC}_j$$

where $\text{IC}_j$ is the number of times that instruction j is executed, and $\text{CPI}_j$ is the average number of clocks required to execute instruction j.

Processor pipelining and memory interactions limit the accuracy of this approach, but it's a good first guess. For accuracy, it is necessary to simulate the instructions of an entire program with issue, pipeline and memory interactions.
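
As a rough illustration of the per-class sum, the sketch below computes total cycles, average CPI and CPU time for an instruction mix. The mix, the class names in the comment and the 1 GHz clock rate are invented for the example.

```python
def total_cycles(mix):
    """mix is a list of (instruction_count, cpi) pairs, one per instruction class."""
    return sum(ic * cpi for ic, cpi in mix)


def average_cpi(mix):
    """Average CPI = total clock cycles / total instruction count."""
    return total_cycles(mix) / sum(ic for ic, _ in mix)


def cpu_time(mix, clock_rate_hz):
    """CPU time = number of clock cycles / clock rate."""
    return total_cycles(mix) / clock_rate_hz


# Hypothetical mix: ALU, load/store and branch classes.
mix = [(50_000_000, 1), (30_000_000, 2), (20_000_000, 3)]

print("cycles     :", total_cycles(mix))
print("average CPI:", average_cpi(mix))
print("CPU time s :", cpu_time(mix, 1e9))
```

As the slide notes, this is a first-order estimate only; it ignores pipeline and memory interactions.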

Aspects of CPU Performance (CPU Law)

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

Amdahl's Law

Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
E.g. special instructions, memory, I/O, parallel processing.

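Amdahl's Law

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$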

Amdahl's Law

Example: floating point instructions are improved to run 2X faster, but only 10% of actual instructions are FP.

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - 0.1) + \frac{0.1}{2} \right] = 0.95 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{0.95} = 1.053$$
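
The floating point example can be reproduced with a few lines of code. This is a small sketch of the formula; the function name is chosen here for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the task is accelerated by a given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# The slide's example: 10% of instructions are FP, and FP runs 2X faster.
print(round(amdahl_speedup(0.1, 2.0), 3))           # 1.053

# Even an infinitely fast FP unit is bounded by the untouched 90%.
print(round(amdahl_speedup(0.1, float("inf")), 3))  # 1.111
```

The second call shows the practical lesson of the law: the unenhanced fraction caps the achievable speedup at 1 / (1 - F).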

Topic 2: Instruction Set Architecture Design
Adapted from Prof. Jerry Breecher's Notes + my CS21Q Notes
(http://babbage.clarku.edu/~jbreecher/arch/arch.html)

Introduction
7.1 Introduction
7.2 Classifying Instruction Set Architectures
7.3 Memory Addressing
7.4 Operations in the Instruction Set
7.5 Type and Size of Operands
7.6 Encoding an Instruction Set
7.7 The Role of Compilers
7.8 The MIPS Architecture and Bonus
7.9 Endianness

Introduction
The Instruction Set Architecture is that portion of the machine visible to the assembly-level programmer or to the compiler writer.

[Figure: the instruction set as the layer between software and hardware.]

Questions:
- What are the advantages and disadvantages of various instruction set alternatives?
- How do languages and compilers affect the ISA?

Classifying Instruction Set Architectures
Classifications can be by:
1. Stack/accumulator/register
2. Number of memory operands
3. Number of total operands

Instruction Set Architectures: Basic ISA Classes

Accumulator:
  1 address      add A        acc <- acc + mem[A]
  1+x address    addx A       acc <- acc + mem[A + x]

Stack:
  0 address      add          tos <- tos + next

General Purpose Register:
  2 address      add A B      EA(A) <- EA(A) + EA(B)
  3 address      add A B C    EA(A) <- EA(B) + EA(C)

Load/Store:
  0 Memory       load R1, Mem1
                 load R2, Mem2
                 add R1, R2
  1 Memory       add R1, Mem2

ALU instructions can have two or three operands.
ALU instructions can have 0, 1, 2, 3 operands. Shown here are cases of 0 and 1.

Instruction Set Architectures: Basic ISA Classes

The results of the different address classes are easiest to see with the examples here, all of which implement the sequence for C = A + B.

Stack      Accumulator    Register             Register
                          (register-memory)    (load-store)
Push A     Load A         Load R1, A           Load R1, A
Push B     Add B          Add R1, B            Load R2, B
Add        Store C        Store C, R1          Add R3, R1, R2
Pop C                                          Store C, R3

Registers are the class that won out. The more registers on the CPU, the better.
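
To make the contrast between the classes concrete, the sketch below interprets the stack sequence and the load-store sequence for C = A + B. The tuple-based instruction format and both interpreter functions are invented for this illustration; they do not correspond to any real ISA.

```python
def run_stack(program, memory):
    """Interpret a tiny 0-address (stack) program; operands come from the stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(memory[args[0]])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "pop":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return memory


def run_load_store(program, memory):
    """Interpret a tiny load-store program; only load/store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "load":        # load Rd, addr
            regs[args[0]] = memory[args[1]]
        elif op == "add":       # add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":     # store addr, Rs
            memory[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown load-store op: {op}")
    return memory


stack_prog = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ls_prog = [("load", "R1", "A"), ("load", "R2", "B"),
           ("add", "R3", "R1", "R2"), ("store", "C", "R3")]

stack_mem = run_stack(stack_prog, {"A": 2, "B": 3, "C": 0})
ls_mem = run_load_store(ls_prog, {"A": 2, "B": 3, "C": 0})
print(stack_mem["C"], ls_mem["C"])
```

Both programs have four instructions, but the stack version names no operands on its add, while the load-store version names three registers.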

Instruction Set Architectures: Intel 80x86 Integer Registers

GPR0   EAX      Accumulator
GPR1   ECX      Count register, string, loop
GPR2   EDX      Data register; multiply, divide
GPR3   EBX      Base address register
GPR4   ESP      Stack pointer
GPR5   EBP      Base pointer for base of stack seg.
GPR6   ESI      Index register
GPR7   EDI      Index register
       CS       Code segment pointer
       SS       Stack segment pointer
       DS       Data segment pointer
       ES       Extra data segment pointer
       FS       Data seg. 2
       GS       Data seg. 3
PC     EIP      Instruction counter
       Eflags   Condition codes

Memory Addressing
Sections Include:
Interpreting Memory Addresses
Addressing Modes
Displacement Address Mode
Immediate Address Mode


Memory Addressing: Interpreting Memory Addresses

What object is accessed as a function of the address and length?

Objects have byte addresses: an address refers to the number of bytes counted from the beginning of memory.
Little Endian: puts the byte whose address is xx00 at the least significant position in the word.
Big Endian: puts the byte whose address is xx00 at the most significant position in the word.
Alignment: data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the operating system.
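
The alignment rule ("aligned on a boundary equal to its size") reduces to a single modulo test. The helper below is a hypothetical illustration; the addresses are arbitrary.

```python
def is_aligned(address: int, size_bytes: int) -> bool:
    """True when the address is a multiple of the object's size."""
    return address % size_bytes == 0


# A 4-byte word at 0x100 is aligned; the same word at 0x102 is not.
print(is_aligned(0x100, 4))  # True
print(is_aligned(0x102, 4))  # False
# A 2-byte halfword at 0x102 is fine.
print(is_aligned(0x102, 2))  # True
```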

Memory Addressing: Addressing Modes

This table shows the most common modes. A more complete set is in Figure 2.6.

Addressing Mode     Example Instruction   Meaning                            When Used
Register            Add R4, R3            R[R4] <- R[R4] + R[R3]             When a value is in a register.
Immediate           Add R4, #3            R[R4] <- R[R4] + 3                 For constants.
Displacement        Add R4, 100(R1)       R[R4] <- R[R4] + M[100 + R[R1]]    Accessing local variables.
Register Deferred   Add R4, (R1)          R[R4] <- R[R4] + M[R[R1]]          Using a pointer or a computed address.
Absolute            Add R4, (1001)        R[R4] <- R[R4] + M[1001]           Used for static data.
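
The "Meaning" column of the table can be read as a recipe for fetching the second operand. The sketch below models the five modes with dictionaries for the register file and memory; the `fetch_operand` function, the mode names as strings, and all register and memory contents are invented for illustration.

```python
def fetch_operand(mode, regs, mem, reg=None, const=None):
    """Return the source operand for one of the five addressing modes in the table."""
    if mode == "register":
        return regs[reg]
    if mode == "immediate":
        return const
    if mode == "displacement":
        return mem[const + regs[reg]]
    if mode == "register_deferred":
        return mem[regs[reg]]
    if mode == "absolute":
        return mem[const]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state.
regs = {"R1": 900, "R3": 7, "R4": 10}
mem = {900: 50, 1000: 40, 1001: 60}

examples = [
    ("register", {"reg": "R3"}),                    # Add R4, R3
    ("immediate", {"const": 3}),                    # Add R4, #3
    ("displacement", {"reg": "R1", "const": 100}),  # Add R4, 100(R1)
    ("register_deferred", {"reg": "R1"}),           # Add R4, (R1)
    ("absolute", {"const": 1001}),                  # Add R4, (1001)
]

results = {}
for mode, kwargs in examples:
    # Each example starts from the same R4 value so the modes can be compared.
    results[mode] = regs["R4"] + fetch_operand(mode, regs, mem, **kwargs)
    print(f"{mode:18s} R4 <- {results[mode]}")
```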

Memory Addressing: Displacement Addressing Mode

How big should the displacement be?

For addresses that do fit in the displacement size:
Add R4, 10000(R0)
For addresses that don't fit in the displacement size, the compiler must do the following:
Load R1, address
Add R4, 0(R1)
How big the field should be depends on the displacements that typical programs use.
On both IA32 and DLX, the space allocated is 16 bits.
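
The compiler's decision between the one-instruction and two-instruction forms comes down to a range check. The sketch below assumes the 16-bit displacement field is interpreted as a signed two's complement value, which is an assumption made here rather than something the slide states.

```python
def fits_in_displacement(value: int, bits: int = 16) -> bool:
    """True when value is representable as a signed two's complement field of `bits` bits."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


# 10000 fits in 16 bits, so "Add R4, 10000(R0)" can be emitted directly.
print(fits_in_displacement(10000))  # True
# 40000 does not, so the address must first be loaded into a register.
print(fits_in_displacement(40000))  # False
```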

Memory Addressing: Immediate Address Mode

Used where we want to get to a numerical value in an instruction.

At high level:      At assembler level:
a = b + 3;          Load    R2, 3
                    Add     R0, R1, R2
if ( a > 17 )       Load    R2, 17
                    CMPBGT  R1, R2
goto Addr           Load    R1, Address
                    Jump    (R1)

So how would you get a 32-bit value into a register?
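
One common answer to the 32-bit question, used on MIPS with a load-upper-immediate followed by an or-immediate, is to build the constant from two 16-bit halves. The sketch below models that idea in software; the function names are made up, and it is not a description of any particular assembler.

```python
def split_constant(value: int):
    """Split a 32-bit constant into its upper and lower 16-bit halves."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def load_upper_then_or(upper: int, lower: int) -> int:
    """Model two instructions: place `upper` in bits 31..16, then OR in `lower`."""
    reg = (upper << 16) & 0xFFFFFFFF  # step 1: upper half, lower half cleared
    reg |= lower & 0xFFFF             # step 2: OR in the zero-extended lower half
    return reg


upper, lower = split_constant(0x12345678)
print(hex(upper), hex(lower))
print(hex(load_upper_then_or(upper, lower)))
```

The lower half is ORed in zero-extended; adding a sign-extended immediate instead would need the upper half adjusted whenever bit 15 of the constant is set.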

Operations In The Instruction Set


Sections Include:
Detailed information about types of instructions.
Instructions for Control Flow (conditional branches, jumps)


Operations In The Instruction Set: Operator Types

Arithmetic and logical   and, add
Data transfer            move, load
Control                  branch, jump, call
System                   system call, traps
Floating point           add, mul, div, sqrt
Decimal                  add, convert
String                   move, compare
Multimedia               2D, 3D? e.g., Intel MMX and Sun VIS

Operations In The Instruction Set: Control Instructions

Conditional branches are 20% of all instructions!!

Control instruction issues:
- taken or not
- where is the target
- link return address
- save or restore

Instructions that change the PC:
- (conditional) branches, (unconditional) jumps
- function calls, function returns
- system calls, system returns

Type And Size of Operands

The type of the operand is usually encoded in the opcode: an LDW implies loading of a word.
Common sizes are:
- Character (1 byte)
- Half word (16 bits)
- Word (32 bits)
- Single Precision Floating Point (1 word)
- Double Precision Floating Point (2 words)

Integers are two's complement binary.
Floating point is IEEE 754.
Some languages (like COBOL) use packed decimal.

The MIPS Architecture


MIPS is very RISC oriented.


The MIPS Architecture: MIPS Characteristics

There's MIPS 32, which we learned in CS140:
- 32-bit byte addresses, aligned
- Load/store: only displacement addressing
- Standard datatypes
- 3 fixed-length formats
- 32 32-bit GPRs (r0 = 0)
- 16 64-bit (32 32-bit) FPRs
- FP status register
- No condition codes

There's MIPS 64, the current architecture:
- Standard datatypes
- 4 fixed-length formats (8, 16, 32, 64)
- 32 64-bit GPRs (r0 = 0)
- 64 64-bit FPRs

Addressing modes:
- Immediate
- Displacement
- (Register mode used only for ALU)

Data transfer:
- load/store word, load/store byte/halfword (signed?)
- load/store FP single/double
- moves between GPRs and FPRs

ALU:
- add/subtract (signed? immediate?)
- multiply/divide (signed?)
- and, or, xor (immediate?); shifts: ll, rl, ra (immediate?)
- sets (immediate?)

The MIPS Architecture: MIPS Characteristics (cont.)

Control:
- branches == 0, <> 0
- conditional branch testing FP bit
- jump, jump register
- jump & link, jump & link register
- trap, return-from-exception

Floating point:
- add/sub/mul/div
- single/double
- fp converts, fp set

The MIPS Architecture: The MIPS Encoding

Register-Register
  Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | Opx [10:0]
  (the figure also marks a boundary between bits 6 and 5 inside the Opx field)

Register-Immediate
  Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]

Branch
  Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]

Jump / Call
  Op [31:26] | target [25:0]
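
Packing fields into a fixed-length 32-bit word is a matter of shifts and masks. The sketch below encodes and decodes the Register-Register format, assuming field widths of 6, 5, 5, 5 and 11 bits for Op, Rs1, Rs2, Rd and Opx as read from the bit positions in the figure; the field values used are arbitrary and are not real opcodes.

```python
FIELDS = (("op", 6, 26), ("rs1", 5, 21), ("rs2", 5, 16), ("rd", 5, 11), ("opx", 11, 0))


def encode_rr(op, rs1, rs2, rd, opx):
    """Pack the five fields into one 32-bit register-register instruction word."""
    values = {"op": op, "rs1": rs1, "rs2": rs2, "rd": rd, "opx": opx}
    word = 0
    for name, width, shift in FIELDS:
        if not 0 <= values[name] < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= values[name] << shift
    return word


def decode_rr(word):
    """Unpack a 32-bit word back into a dict of fields."""
    return {name: (word >> shift) & ((1 << width) - 1) for name, width, shift in FIELDS}


# Arbitrary illustrative field values.
word = encode_rr(op=0, rs1=1, rs2=2, rd=3, opx=0x20)
print(f"{word:#010x}")
print(decode_rr(word))
```

Fixed field positions are what make decoding cheap in hardware: every field can be extracted in parallel without first parsing the rest of the instruction.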

Byte Ordering
How should bytes within a multi-byte word be ordered in memory?
Conventions
- Suns, Macs are Big Endian machines
  - Least significant byte has highest address
- Alphas, PCs are Little Endian machines
  - Least significant byte has lowest address

Byte Ordering Example

Big Endian: least significant byte has highest address
Little Endian: least significant byte has lowest address

Example
- Variable x has 4-byte representation 0x01234567
- Address given by &x is 0x100

Address          0x100   0x101   0x102   0x103
Big Endian       01      23      45      67
Little Endian    67      45      23      01
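
The two layouts in the example can be reproduced with Python's built-in `int.to_bytes`. This sketch reuses the value 0x01234567 and the base address 0x100 from the slide; the variable names are chosen here.

```python
import sys

x = 0x01234567
base_address = 0x100

big = x.to_bytes(4, byteorder="big")
little = x.to_bytes(4, byteorder="little")

# Print one row per byte address, lowest address first.
for offset, (b, l) in enumerate(zip(big, little)):
    print(f"{base_address + offset:#x}: big={b:02x} little={l:02x}")

# The byte order of the machine running this script.
print("host byte order:", sys.byteorder)
```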

Machine-Level Code Representation

Encode Program as Sequence of Instructions
- Each simple operation: arithmetic operation, read or write memory, conditional branch
- Instructions encoded as bytes
  - Alphas, Suns, Macs use 4-byte instructions: Reduced Instruction Set Computer (RISC)
  - PCs use variable-length instructions: Complex Instruction Set Computer (CISC)
- Different instruction types and encodings for different machines
  - Most code not binary compatible

Programs are Byte Sequences Too!

Classification of Processors
We can classify processors according to the areas in which they are mostly used.
We can identify four different groups of processors:
- General purpose processors, which are used in building computers
- Digital signal processors, which are designed specifically for signal processing
- Microcontrollers, which are small microcomputers that integrate a core processor, I/O elements and a small amount of memory on the same chip
- Application-specific processors, which are designed to perform a specific function (e.g. network processors)

General Purpose Processors

These processors are used to build major computer platforms.
We can name:
- Intel / AMD based computers, also called IBM compatible
- Macintosh computers built using PowerPC processors
- Sun machines that use UltraSPARC processors

Examples of General Purpose Processors

Type of Computer   Processors Used                                 Technology
Macintosh          PowerPC (IBM, Motorola)                         Superscalar
Sun                UltraSPARC (Sun)                                RISC
IBM Compatible     Intel processors; Athlon, Duron (AMD); Cyrix    Superscalar

DSP

Digital Signal Processing (DSP) is used in a wide variety of applications, and it is hard to find a good definition that is general.
We can start with dictionary definitions of the words:

Digital
* operating by the use of discrete signals to represent data in the form of numbers
Signal
* a variable parameter by which information is conveyed through an electronic circuit
Processing
* to perform operations on data according to programmed instructions

Which leads us to a simple definition of digital signal processing: changing or analyzing information which is measured as discrete sequences of numbers.

Note two unique features of digital signal processing as opposed to plain old ordinary digital processing:
- signals come from the real world: this intimate connection with the real world leads to many unique needs, such as the need to react in real time and the need to measure signals and convert them to digital numbers
- signals are discrete: the information in between discrete samples is lost

The advantages of DSP are common to many digital systems and include:

Versatility:
- digital systems can be reprogrammed for other applications (at least where programmable DSP chips are used)
- digital systems can be ported to different hardware (for example a different DSP chip or board-level product)

Repeatability:
- digital systems can be easily duplicated
- digital systems do not depend on strict component tolerances
- digital system responses do not drift with temperature

Simplicity:
- some things can be done more easily digitally than with analogue systems

DSP is used in a very wide variety of applications, but most share some common features:
- they use a lot of math (multiplying and adding signals)
- they deal with signals that come from the real world
- they require a response in a certain time

Where general purpose DSP processors are concerned, most applications deal with signal frequencies that are in the audio range.
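
The "multiplying and adding signals" workload is typified by a finite impulse response (FIR) filter, where every output sample is a sum of products. FIR filtering is brought in here only as a generic illustration; the samples and coefficients below are made up, and a two-tap filter with equal weights is simply a moving average.

```python
def fir_filter(samples, coefficients):
    """Each output is a multiply-accumulate over the most recent input samples."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coefficients):
            if n - k >= 0:            # samples before the start are treated as zero
                acc += c * samples[n - k]
        output.append(acc)
    return output


# Hypothetical discrete signal and a 2-tap averaging filter.
samples = [1.0, 2.0, 3.0, 4.0]
coefficients = [0.5, 0.5]
filtered = fir_filter(samples, coefficients)
print(filtered)
```

The inner loop is a single multiply-accumulate repeated many times per sample, and it has to keep pace with the incoming sample rate.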
