Tom Clarke
EDSAC: the first practical stored-program computer, Cambridge, 1949. 650 instructions per second. 1024 17-bit words of memory in mercury ultrasonic delay lines. 3000 valves, 12 kW power consumption, occupied a room 5 m by 4 m. Early uses included solving problems in meteorology, genetics and X-ray crystallography.
60 years ago
ARM7 core: up to 130 million instructions per second, 1995-2005. The ARM7 core, in its many variations, is the most successful embedded processor today. The picture shows an LPC2124 microcontroller, which includes an ARM7 core plus RAM, ROM and integrated peripherals.
The complete microcontroller is the square chip in the middle: 128K x 32-bit words of flash RAM, 10 mW/MHz of clock.
and Now
tjwc - 13-Oct-10
ARM Organisation
Pipelining Throughput
Memory hierarchy
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.4
Part 1
Introduction to Machine Architecture
High Level Language Program:
    temp := v[k];
    v[k] := v[k+1];
    v[k+1] := temp;
Compiler
Assembly Language Program: lw lw sw sw
Assembler
Machine Language Program: 0000 1010 1100 0101
Machine Interpretation
300 mm wafer, used for production of Intel's Dothan Pentium mobile CPUs
Levels of Abstraction
Application
Operating System
Compiler
Instruction Set Architecture (ISA)
I/O System
Digital Design (low level)
The ISA is what the computer does, not how it does it! The ISA includes:
Instruction (or Operation Code) Set
Data Types & Data Structures: Encodings & Representations
Instruction Formats
Organization of Programmable Storage (main memory etc.)
Modes of Addressing and Accessing Data Items and Instructions
Behaviour on Exceptional Conditions (e.g. hardware divide by 0)
Computer Architecture
Programming languages
Operating Systems
Design expertise!
[Figure: plot against year, 1997 to 2011, with series labelled Memory (DRAM) and disk]
Internal Organisation
Major components of a typical computer system:
Processor, aka CPU (Central Processing Unit): Control and Datapath
Memory
Devices: Input and Output
Data is mostly stored in the computer memory, separate from the processor; however, registers in the processor datapath can also store small amounts of data.
Summary
All computers consist of five components:
Processor (CPU): (1) datapath and (2) control
(3) Memory
(4) Input devices and (5) Output devices
Speed and density of computers are increasing exponentially with time (but different parts at different rates).
This course will concentrate on Instruction Set Architectures and their use through assembly language programming.
Assembly language is a human-readable version of machine code. Think of this as being like learning arithmetic: calculators are better for big calculations, but you can't understand maths without doing it for yourself as well.
Based on the von Neumann model: stored program and data in the same memory.
The Central Processing Unit (CPU) contains:
Arithmetic/Logic Unit (ALU)
Control Unit
Registers: fast memory, local to the CPU
Computer Memory
Look into memory and you'll see 1s and 0s. Their meaning depends on context: they could be program, or they could be data.
Data is stored in memory. All computers require at least three types of signals (buses):
address bus - which location in memory
data bus - carries the contents of the location
control bus - governs the information transfer (read from memory, write to memory)
Memory locations
[Figure: memory drawn as a column of locations with addresses 0 to 10; one location holds the 8-bit value 01001101]
Hexadecimal examples
You should remember these:
10(16) = 16, F(16) = 15
100(16) = 256, FF(16) = 255
1000(16) = 4096, FFF(16) = 4095
10000(16) = 65536, FFFF(16) = 65535
We will say more about this in Part 2. For now, a number stored in a computer will have a fixed number of bits (binary digits), usually 8, 16, or 32.
N bits can store whole unsigned numbers $x$ in the range $0 \le x \le 2^N - 1$.
Each storage location or register in a computer which stores data has a fixed size in bits, and thus can store nonnegative whole numbers up to a given maximum size.
We can write these numbers in decimal, binary or hexadecimal: it makes no difference to how they are stored or what they mean.
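The range rule and the hexadecimal values above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `unsigned_range` is made up for this example and is not part of the course material.

```python
def unsigned_range(n_bits: int) -> tuple[int, int]:
    """Return (smallest, largest) whole number that fits in n_bits unsigned bits."""
    return 0, (1 << n_bits) - 1  # largest value is 2^N - 1


# The same number written in decimal, binary and hexadecimal.
# The notation differs; the stored value does not.
value_decimal = 255
value_binary = 0b11111111
value_hex = 0xFF

if __name__ == "__main__":
    for n in (8, 16, 32):
        low, high = unsigned_range(n)
        print(f"{n:2d} bits: {low} .. {high} (hex {high:X})")
    print(value_decimal == value_binary == value_hex)
```

Choosing a shift (`1 << n_bits`) rather than a power operator mirrors how the limit arises in hardware: one extra bit position beyond the N that are stored.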
[Figure: CPU connected to memory by ADDRESS and DATA buses; the CPU registers include the Accumulator (A)]
Registers: each can store one number (NB: IR is not visible to the programmer).
MU0 Design
Let us design a simple processor, MU0, with a 16-bit instruction and data bus and minimal hardware:
Program Counter (PC) - holds address of the next instruction to execute (a register)
Accumulator (A) - holds data being processed (a register)
Instruction Register (IR) - holds current instruction code being executed
Arithmetic Logic Unit (ALU) - performs operations on data
We will only design 8 instructions, but to leave room for expansion, we will allow capacity for 16 instructions, so we need 4 bits to identify an instruction: the opcode.
Note that the top 4 bits define the operation code (opcode) and the bottom 12 bits define the memory address of the data (the operand).
This machine can address up to $2^{12}$ = 4K words = 8K bytes of data.
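As a small sketch of this instruction format, the following Python splits a 16-bit MU0 word into its 4-bit opcode and 12-bit address, and packs them back together. The function names `decode` and `encode` are hypothetical helpers chosen for this illustration.

```python
ADDRESS_BITS = 12                       # bottom 12 bits: operand address
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1  # 0xFFF


def decode(word: int) -> tuple[int, int]:
    """Split a 16-bit instruction word into (opcode, address)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("MU0 instructions are 16 bits wide")
    return word >> ADDRESS_BITS, word & ADDRESS_MASK


def encode(opcode: int, address: int) -> int:
    """Pack a 4-bit opcode and a 12-bit address into one 16-bit word."""
    if not 0 <= opcode <= 0xF or not 0 <= address <= ADDRESS_MASK:
        raise ValueError("opcode must fit in 4 bits and address in 12 bits")
    return (opcode << ADDRESS_BITS) | address


if __name__ == "__main__":
    opcode, address = decode(0x202F)
    print(f"opcode={opcode:X} address={address:03X}")
    print(f"addressable words: {1 << ADDRESS_BITS}")
```

Because the split falls on a hex digit boundary, the first hex digit of any instruction word is the opcode and the remaining three are the address, which is why the slides write machine code as pairs such as 2 02F.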
LDA    LoaD A
STA    STore A
ADD    ADD to A
SUB    SUBtract from A
JMP    JuMP
JGE    Jump if Gt Equal
JNE    Jump if Not Equal
STP    SToP
We need to load the accumulator with one value, add the other, and then store the result back into memory
Instructions execute in sequence
Note we follow tradition and use Hex notation for addresses and data
[Figure: MU0 (PC, A, IR, control) connected to memory by the address and data buses, with the program's machine code in memory]
Each machine instruction is implemented in hardware by one or more clock cycles in which things happen. We will trace through each cycle
This technique is invaluable when debugging or learning initially!
Memory contents (hex):
address   contents
000       0 02E    machine code
001       2 02F
002       1 030
003       7 000
...
02E       0AA0     data
02F       0110
030       ----
Initially, we assume PC = 0, data and instructions are loaded in memory as shown, other CPU registers are undefined.
Instruction 1: LDA
NB: the data shown is after each cycle has completed, so PC is one more than the PC used to fetch the instruction.
[Figure: MU0 (ALU, PC, A, IR, control) and memory holding machine code 0 02E, 2 02F, 1 030, 7 000 followed by data 0AA0, 0110. Cycle 1 fetches the LDA instruction; cycle 2 puts 02E on the address bus and loads 0AA0 into A.]
Instruction 2: ADD
[Figure: cycle 1 fetches 202F into IR; cycle 2 puts 02F on the address bus, reads 0110 and leaves 0BB0 in A.]
Instruction 3: STA
[Figure: cycle 1 fetches the STA instruction; cycle 2 puts 030 on the address bus and writes 0BB0 to memory.]
Instruction 4: STP
[Figure: cycle 1 fetches the STP instruction and execution stops.]
Program Counter (PC) - holds the address of the instruction to fetch. The PC is incremented automatically each time it is used.
Therefore instructions are normally executed sequentially.
The number of clock cycles taken by a MU0 instruction is the same as the number of memory accesses it makes.
LDA, STA, ADD, SUB therefore take 2 clock cycles each: one to fetch (and decode) the instruction, and a second to access (and operate on) the data.
JMP, JGE, JNE, STP only need one memory read (the instruction itself) and can therefore be executed in one clock cycle.
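The fetch and execute behaviour and the cycle counts described here can be sketched as a tiny simulator. This is an illustrative model, not the course's reference implementation. Opcodes 0 (LDA), 1 (STA), 2 (ADD) and 7 (STP) come from the machine code in the example program; the values 3 to 6 for SUB, JMP, JGE and JNE, and the reading of JGE as "jump if A is non-negative as a signed 16-bit number", are assumptions made for this sketch.

```python
# Opcode numbering: 0, 1, 2 and 7 match the example program; 3..6 are assumed.
LDA, STA, ADD, SUB, JMP, JGE, JNE, STP = range(8)
WORD_MASK = 0xFFFF   # 16-bit data
ADDR_MASK = 0x0FFF   # 12-bit addresses


def run(memory: list[int], max_steps: int = 10_000) -> tuple[int, int]:
    """Execute from address 0 until STP. Returns (accumulator, clock cycles)."""
    pc, acc, cycles = 0, 0, 0
    for _ in range(max_steps):
        ir = memory[pc]                  # cycle 1: fetch (and decode)
        pc = (pc + 1) & ADDR_MASK        # PC incremented each time it is used
        cycles += 1
        opcode, s = ir >> 12, ir & ADDR_MASK
        if opcode in (LDA, STA, ADD, SUB):
            cycles += 1                  # cycle 2: one data memory access
            if opcode == LDA:
                acc = memory[s]
            elif opcode == STA:
                memory[s] = acc
            elif opcode == ADD:
                acc = (acc + memory[s]) & WORD_MASK
            else:
                acc = (acc - memory[s]) & WORD_MASK
        elif opcode == JMP:
            pc = s
        elif opcode == JGE:
            if acc < 0x8000:             # assumed: sign bit clear means >= 0
                pc = s
        elif opcode == JNE:
            if acc != 0:
                pc = s
        elif opcode == STP:
            return acc, cycles
        else:
            raise ValueError(f"unassigned opcode {opcode:X}")
    raise RuntimeError("step limit reached without STP")


def example_memory() -> list[int]:
    """The add program: load mem[02E], add mem[02F], store to mem[030], stop."""
    memory = [0] * 4096
    memory[0:4] = [0x002E, 0x202F, 0x1030, 0x7000]
    memory[0x02E] = 0x0AA0
    memory[0x02F] = 0x0110
    return memory


if __name__ == "__main__":
    mem = example_memory()
    acc, cycles = run(mem)
    print(f"A={acc:04X} mem[030]={mem[0x030]:04X} cycles={cycles}")
```

Counting cycles inside the same branches that touch memory keeps the model honest about the rule stated above: one cycle per memory access, so three two-access instructions plus STP give seven cycles for the example.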
Assume MU0 will always reset to start execution from address 000(16).
This will introduce some of the general features needed to understand other ISAs such as the much more complex ARM ISA which we study in detail in Part 2
CPU design fundamentals Memory architecture
a := b+c
REGISTERS: have e.g. 8 accumulators R0-R7; a, b, c are stored in registers (a: R2, b: R1, c: R0)
3-operand instruction format:
ADD R2, R1, R0
2-operand instruction format (used by the Thumb instruction set of ARM, and the AVR 8-bit microcontrollers)
dest := dest op op1
2-operand (AVR): ADD R0, R1 then MOV R2, R0
1-operand instruction format (used in MU0 and some 8-bit microcontrollers such as MC6811)
acc := acc op op1
Design Strategies
Complex Instruction Set Computers (CISC) [e.g. VAX / ix86]
dense code, simple compiler
powerful instruction set, variable format, multi-word instructions
multi-cycle execution, low clock rate
design emphasis on processor
Reduced Instruction Set Computers (RISC) [e.g. MIPS, SPARC]
high clock rate, low development cost (?)
easier to move to new technology
simple instructions, fixed format, single-word instructions, complex optimizing compiler
design emphasis on compilers
Think of main memory as being an array of words, the array index being the memory address. Each word (array location) has data which can be separately written or read. Usually instructions are one word in length but can be either more or less
[Figure: CPU connected to memory by the memory bus (address bus, data bus, control bus); a second diagram shows a CPU with separate Instruction Memory and Data Memory]
Memory in detail
Memory locations store instructions and data, and each has a unique numeric address.
Usually addresses range from 0 up to some maximum value.
Memory space is the range of possible memory addresses in a computer system.
We talk about the address of a memory location.
Each memory location stores a fixed number of bits of data, normally 8, 16, 32 or 64.
We write mem8[100], mem16[100] to indicate the value of the 8 or 16 bits with memory address 100, etc.
Internal datapaths inside computers can have different widths, for example 4, 8, 16 or 32 bits. For example, the ARM processor uses a 32-bit internal datapath.
WORD = 32 bits for ARM, 16 bits for MU0, 64 bits for the latest x86 processors.
BYTE (8 bits) and NIBBLE (4 bits) are architecture independent.
[Figure: a 32-bit word drawn as four bytes, bits 31-24, 23-16, 15-8 and 7-0, from MSB to LSB]
Endian-ness
16-bit memory with consecutive word addresses separated by 2: word addresses 0, 2, 4, 6, 8.
Little-endian: the LSB of each word is at the even byte address (0, 2, 4, 6) and the MSB is at the following odd byte address (1, 3, 5, 7).
Different views of memory access: byte or word. Different instructions allow either byte or word access to memory.
A word read or write operates on a whole word in memory. The memory address is the word address as on the previous slide.
A byte read or write is identical except that a single byte is read or written.
16-bit word view and byte view of memory (little-endian):
Mem16[800] = 0203(16) = 515
Mem16[802] = 10(16) = 16
Mem8[800] = 3
Mem8[801] = 2
Mem8[802] = 10(16) = 16
Mem8[803] = 0
Word WRITE:
Mem16[addr] := A
Byte READ:
A := 00000000 followed by Mem8[addr] (top 8 bits zero)
Byte WRITE:
Mem8[addr] := A(7:0) (bottom 8 bits)
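The byte and word views of the same little-endian memory can be tried out in Python. The sketch below assumes a small byte-addressed memory held in a `bytearray`; the function names `read8`, `write8`, `read16` and `write16` are illustrative stand-ins for the Mem8 and Mem16 notation.

```python
memory = bytearray(1024)  # byte-addressed memory, initially all zero


def write16(addr: int, value: int) -> None:
    """Mem16[addr] := value, low byte at addr, high byte at addr+1."""
    memory[addr:addr + 2] = (value & 0xFFFF).to_bytes(2, "little")


def read16(addr: int) -> int:
    """Return Mem16[addr] from the two bytes at addr and addr+1."""
    return int.from_bytes(memory[addr:addr + 2], "little")


def write8(addr: int, value: int) -> None:
    """Mem8[addr] := bottom 8 bits of value."""
    memory[addr] = value & 0xFF


def read8(addr: int) -> int:
    """Return Mem8[addr]; the result is zero-extended."""
    return memory[addr]


# Reproduce the values in the word view above.
write16(800, 0x0203)
write16(802, 0x0010)

if __name__ == "__main__":
    for a in range(800, 804):
        print(f"Mem8[{a}] = {read8(a)}")
    print(f"Mem16[800] = {read16(800)}")
```

Writing a single byte with `write8(801, 0xFF)` afterwards changes only the top half of the word at 800, which is a quick way to see why byte and word instructions must both exist.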
Memory-mapped I/O. Some locations (addresses) in memory allow communication with peripheral devices.
For example, a memory write to the data register of a serial communication controller might output a byte on a serial port of a PC. In practice, nearly all I/O in modern systems is memory-mapped.
[Figure: memory map with addresses counting up from 0, including a 512K ROM region]
A := mem[X+2]
Array memory access is possible (the memory location read or written depends on the value of a variable, etc.). If the register is set to a constant value, this can be used just like direct addressing.
This is OK if the CPU has a large number of registers, so that one register can be set permanently.
Subroutine call mechanism: this allows writing modular programs.
Additional internal registers: this reduces the need for accessing external memory and speeds up calculations.
Interrupts, direct memory access (DMA), and cache memory:
Interrupts allow external devices (e.g. mouse, keyboard) to interrupt the current program execution.
DMA allows external high-throughput devices (e.g. display card) to access memory directly rather than through the processor.
Cache is a small amount of fast memory on the processor.
ARM registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (stack pointer) r14 (link register) r15 (PC)
CPSR: bits 31-28 hold the N Z C V flags; bits 7, 6, 5 hold I, F, T; bits 4-0 hold the mode; the bits in between are unused.
MU0 to ARM
Operation                                MU0      ARM
A := mem[S]  /  R0 := mem[S]             LDA S    LDR R0, S
mem[S] := A  /  mem[S] := Rn             STA S    STR R0, S
A := A + mem[S]  /  R0 := R0 + mem[S]    ADD S    LDR R1, S then ADD R0, R0, R1
R0 := S                                  n/a      MOV R0, #S
R0 := R1 + R2                            n/a      ADD R0, R1, R2
PC := S                                  JMP S    B S
Instruction        Operation
MOV ra, rb         ra := rb
MOV ra, #n         ra := n
ADD ra, rb, rc     ra := rb + rc
ADD ra, rb, #n     ra := rb + n
CMP ra, rb         set status bits on ra-rb
CMP ra, #n         set status bits on ra-n
B label            branch to label
BEQ label          branch to label if zero
BNE label          branch if not zero
BMI label          branch if negative
BPL label          branch if zero or plus
LDR ra, label      ra := mem[label]
STR ra, label      mem[label] := ra
ADR ra, label      ra := address of label
LDR ra, [rb]       ra := mem[rb]
STR ra, [rb]       mem[rb] := ra
n is decimal in the range -128 to 127 (other values possible, see later).
SUB => - instead of +
CMP is like SUB but has no destination register and sets status bits.
BL label is branch & link.
Branch conditions apply to the result of the last instruction to set status bits (ADDS/SUBS/MOVS/CMP etc).
LDRB/STRB => byte transfer
Other address modes:
[rb,#n] => mem[rb+n]
[rb,#n]! => mem[rb+n], rb := rb+n
[rb],#n => mem[rb], rb := rb+n
[rb,ri] => mem[rb+ri]
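The difference between the offset, pre-indexed and post-indexed address modes is easiest to see by tracking what happens to the base register. The Python below is a rough model only: registers are a dictionary, memory is a dictionary from address to value, and all function names are invented for this sketch.

```python
def offset(regs: dict, mem: dict, rb: str, n: int) -> int:
    """[rb,#n]: access mem[rb+n]; rb is left unchanged."""
    return mem[regs[rb] + n]


def pre_indexed(regs: dict, mem: dict, rb: str, n: int) -> int:
    """[rb,#n]!: update rb := rb+n first, then access mem[rb]."""
    regs[rb] += n
    return mem[regs[rb]]


def post_indexed(regs: dict, mem: dict, rb: str, n: int) -> int:
    """[rb],#n: access mem[rb] first, then update rb := rb+n."""
    value = mem[regs[rb]]
    regs[rb] += n
    return value


def register_offset(regs: dict, mem: dict, rb: str, ri: str) -> int:
    """Register offset: access mem[rb+ri]; neither register changes."""
    return mem[regs[rb] + regs[ri]]


if __name__ == "__main__":
    regs = {"r1": 100, "r2": 8}
    mem = {100: 10, 104: 20, 108: 30}
    print(offset(regs, mem, "r1", 4), regs["r1"])
    print(post_indexed(regs, mem, "r1", 4), regs["r1"])
    print(pre_indexed(regs, mem, "r1", 4), regs["r1"])
```

Post-indexing is the natural fit for stepping through an array, since each access leaves the base register pointing at the next element.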
[Figure: variables a, b, c, d held in registers R0, R1, R2, R3]
[Figure: assembly listing with opcode, operands and comments columns; the comments read as follows]
; X (initialised to 3)
; Y (initialised to 11)
; 4 bytes (1 word) space for Z, uninitialised
; mark start
; load multiplier from mem[X]
; load number to be multiplied from mem[Y]
; initialise sum
; add Y to sum
; decrement count
; compare & set codes on R0
; loop back if not finished (R0 ≠ 0)
; store product in mem[Z]
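The comments of the listing describe multiplication by repeated addition: a count starts at X, Y is added to a running sum, and the loop repeats until the count reaches zero. A minimal Python sketch of that algorithm follows; the function name is hypothetical, and the guard on `x` is an addition for this sketch, because a loop that tests the count only after decrementing it would not terminate sensibly for a count of zero.

```python
def multiply_by_repeated_addition(x: int, y: int) -> int:
    """Return x * y by adding y to a sum x times (x must be at least 1)."""
    if x < 1:
        raise ValueError("the count is tested after decrementing, so x must be >= 1")
    count = x      # load multiplier
    total = 0      # initialise sum
    while True:
        total += y         # add Y to sum
        count -= 1         # decrement count
        if count == 0:     # compare, loop back if not finished
            break
    return total           # product, to be stored in Z


if __name__ == "__main__":
    print(multiply_by_repeated_addition(3, 11))
```

The loop body runs X times, so the cost grows with the multiplier; that is acceptable for a teaching example but is why real processors provide a multiply instruction.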
Unsigned: $u(b_i) = 2^{n-1}b_{n-1} + 2^{n-2}b_{n-2} + \dots + 8b_3 + 4b_2 + 2b_1 + b_0$, with $0 \le u \le 2^n - 1$
Signed: $z(b_i) = -2^{n-1}b_{n-1} + 2^{n-2}b_{n-2} + \dots + 8b_3 + 4b_2 + 2b_1 + b_0$, with $-2^{n-1} \le z \le 2^{n-1} - 1$
The difference between z and u is not apparent in the lower n bits: n-bit binary addition has an identical sum; the carry is different.
Signed addition, where $q_i$ is the n-bit sum, $c_n$ is the carry out of the top bit and $c_{n-1}$ is the carry into it:
$z(a_i) + z(b_i) = z(q_i) + 2^{n-1}(-2c_n + 2c_{n-1})$
$c_n \ne c_{n-1}$ for signed overflow: V, oVerflow (signed), in the ARM architecture.
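The signed overflow rule, that overflow occurs exactly when the carry into the top bit differs from the carry out of it, can be checked numerically. The following Python is a sketch of an n-bit adder that reports the sum, the carry out, and V computed as the exclusive-or of the two carries; the names `add_n_bits` and `signed` are made up for the illustration.

```python
def add_n_bits(a: int, b: int, n: int = 8) -> tuple[int, int, int]:
    """Add two n-bit patterns. Returns (sum q, carry out c_n, overflow V)."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    low_mask = mask >> 1                       # all bits below the top bit
    carry_into_top = ((a & low_mask) + (b & low_mask)) >> (n - 1)  # c_{n-1}
    total = a + b
    carry_out = total >> n                     # c_n
    q = total & mask
    overflow = carry_out ^ carry_into_top      # V = c_n xor c_{n-1}
    return q, carry_out, overflow


def signed(pattern: int, n: int = 8) -> int:
    """Interpret an n-bit pattern as a two's complement signed number."""
    return pattern - (1 << n) if pattern >> (n - 1) else pattern


if __name__ == "__main__":
    for a, b in ((0x7F, 0x01), (0xFF, 0x01), (0x80, 0x80)):
        q, c, v = add_n_bits(a, b)
        print(f"{a:02X}+{b:02X}: q={q:02X} C={c} V={v} "
              f"signed {signed(a)}+{signed(b)} -> {signed(q)}")
```

The same bit pattern q serves both interpretations; only the flags tell the program whether the unsigned result (C) or the signed result (V) went out of range.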
; set A equal to sum of memory locations from S to S+99
        MOV R0, #0        ; A := 0
        ADR R2, S         ; X := S
        ADR R9, S+400     ; R9 := S+400 for later
LOOP    LDR R1, [R2]      ; tmp := mem[X]
        ADD R0, R0, R1    ; A := A + tmp
        ADD R2, R2, #4    ; X := X+4 (4 bytes)
        CMP R2, R9        ; set condition codes on X-(S+400)
        BMI LOOP          ; branch back if result negative (N=1); why would BNE be better?
STOP    B STOP            ; stop
We will see later this can be simplified using more powerful instructions
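To follow the loop without an ARM toolchain, the same steps can be modelled in Python. This sketch assumes memory is a dictionary from byte address to 32-bit word and mirrors the register roles used in the listing (A, X, tmp, R9); the function name `sum_words` and the `count` parameter are illustrative, and the N flag is modelled as bit 31 of the 32-bit subtraction result.

```python
MASK32 = 0xFFFFFFFF


def sum_words(memory: dict, s: int, count: int = 100) -> int:
    """Sum `count` consecutive 32-bit words starting at byte address s."""
    a = 0                    # MOV R0, #0
    x = s                    # ADR R2, S
    r9 = s + 4 * count       # ADR R9, S+400 when count is 100
    while True:
        tmp = memory[x]              # LDR R1, [R2]
        a = (a + tmp) & MASK32       # ADD R0, R0, R1
        x += 4                       # ADD R2, R2, #4
        diff = (x - r9) & MASK32     # CMP R2, R9 computes X - (S+400)
        n_flag = diff >> 31          # N is the top bit of the result
        if not n_flag:               # BMI LOOP: loop only while N = 1
            break
    return a


if __name__ == "__main__":
    S = 0x1000
    data = {S + 4 * i: i for i in range(100)}
    print(sum_words(data, S))
```

The pointer advances by 4 because each word occupies four byte addresses, so 100 words end at S+396 and the comparison value S+400 is the first address past the array.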
[Figure: memory words at byte addresses S, S+4, S+8, ..., S+396, with S+400 marking the end of the array]