Sunteți pe pagina 1din 18

Introduction to Computer Architecture ISE1 & EE2 Autumn 2010 - Dr.

Tom Clarke
EDSAC first stored program computer Cambridge 1949 650 instructions per second. 1024 17-bit words of memory in mercury ultrasonic delay lines. 3000 valves, 12 kW power consumption, occupied a room 5m by 4m. Early use to solve problems in meteorology, genetics and X-ray crystallography.

60 years ago

ARM7 core up to 130 million instructions per second. 1995-2005. ARM7 core in many variations is most successful embedded processor today. Picture shows LPC2124 microcontroller which includes ARM7 core + RAM, ROM integrated peripherals.
The complete microcontroller is the square chip in the middle 128K X 32 bit words flash RAM 10mW/Mhz clock

and Now

Original ARM design:


Steve Furber, Acorn Risc Machines, Cambridge, 1985 Electronic Delay Storage Automatic Computer

ARM7 CPU LPC2124 microcontroller


1.2

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

ISE1 / EE2 Principles of Computers


Course materials based on two books: Computer Organization & Design 2nd edition, Patterson & Hennessy 1998 (around 30 new - 15 2nd hand via Amazon) Covers most topics on this course V. Useful for ISE also used in 2nd Year. ARM System-on-Chip Architecture, Steve Furber, 2000 (around 25) Best book on ARM processor Administration issues: Around 15 lectures on this part of the course Problem sheets for programming labs (EE) / tutorials (ISE) Coursework: will be programming exercises + on-line tests Course website: www2.ee.ic.ac.uk/t.clarke/arch/
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.3

What will you be learning?


Part 1 - Introduction
What is a computer? What is Instruction Set Architecture (ISA) How is an ISA implemented in hardware? (at an abstract level) MU0 a simple computer Instruction formats Memory addressing

Part 2 ARM Assembly Language


Assembly language - large part of the course Details of the ARM ISA How to use arithmetic & logical data operations

Part 3 Hardware Topics


Computer interfacing with peripherals
interrupts & DMA

ARM Organisation
Pipelining Throughput

Memory hierarchy
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.4

Levels of representation in computers

Part 1
Introduction to Machine Architecture

temp := v[k]; High Level Language Program Compiler Assembly Language Program Assembler Machine Language Program Machine Interpretation v[k] := v[k+1]; v[k+1] := temp;

lw lw sw sw
0000 1010 1100 0101

$15, $16, $16, $15,


1001 1111 0110 1000

0($2) 4($2) 0($2) 4($2)


1100 0101 1010 0000 0110 1000 1111 1001 1010 0000 0101 1100 1111 1001 1000 0110 0101 1100 0000 1010 1000 0110 1001 1111

EDSAC mercury delay line

300mm wafer, used for production of Intels Dothan Pentium mobile CPUs
1.5

Control Signal Specification


tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.6

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

What is Computer Architecture ?


high

What is Instruction Set Architecture (ISA)?


. . . the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.
Amdahl, Blaaw, and Brooks, 1964

Application
Operating System

Levels of Abstraction

Compiler

What the computer does, not how it does it! ISA includes:Instruction (or Operation Code) Set
Data Types & Data Structures: Encodings & Representations Instruction Formats

INSTRUCTION SET ARCHITECTURE Processor Architecture

I/O System

Digital Design
low

VLSI Circuit Design

Key: Instruction Set Architecture (ISA) Different levels of abstraction


tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.7

Organization of Programmable Storage (main memory etc) Modes of Addressing and Accessing Data Items and Instructions Behaviour on Exceptional Conditions (e.g. hardware divide by 0)

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.8

Instruction Set Architecture (ISA)


A very important abstraction
interface between hardware and low-level software standardizes instructions, machine language bit patterns, etc. advantage: different implementations of the same ISA can all run identical software disadvantage: sometimes prevents using new innovations
E.g. 8086 segmented memory architecture for a long time inhibited better microprocessor design for IBM PCs

Factors influencing computer architectures


Technology Applications

Computer Architecture

Programming languages

Modern instruction set architectures:


ARM, AVR, 80x86/Pentium/etc, PowerPC, DEC Alpha, MIPS, SPARC, HP, PIC,
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.9

Operating Systems

Design expertise!

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.10

Logic & memory relative performance


Intel 32nm CPU technology 2008 2000M transistors
Density Improvement in MPU Technology
800 700 600 500 400 300 200 100 Density (M trans/sq cm) Chip size (sq mm)

Intel Dothan Pentium-M 2004 140M transistors

0 1997

1999

2001

2003

2005 Year

2007

2009

2011

Capacity Logic 2x in 3 years 4x in 3 years 4x in 3 years

Speed 2x in 3 years 1.4x in 10 years 1.4x in 10 years


1.12

Memory DRAM
disk
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.11 tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

Processor Speed Improvements


MPU Speed improvement
12 On-chip Clock Frequency (GHz) 10

Internal Organisation
Processor aka CPU (Central Processing Unit)
Computer Processor Control Memory Devices: Input Output

8 6 4 2 0 1997

Datapath

1999

2001

2003

2005 Year

2007

2009

2011

Major components of Typical Computer System Data is mostly stored in the computer memory separate from the Processor, however registers in the processor datapath can also store small amounts of data
1.13 tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.14

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

Summary
All computers consist of five components
Processor (CPU): (1) datapath and (2) control (3) Memory (4) Input devices and (5) Output devices Speed and density of computers is increasing exponentially with time (but different parts at different rates) This course will concentrate on Instruction Set Architectures and their use through assembly language programming
Assembly language is a human-readable version of machine code Think of this as being like learning arithmetic calculators are better for big calculations but you cant understand maths without doing it for yourself as well.

Lecture 2 A Very Simple Processor


The point of philosophy is to start with something so simple as not to seem worth stating, and to end with something so paradoxical that no one will believe it." Bertrand Russell

Based on von Neumann model Stored program and data in same memory Central Processing Unit (CPU) contains:
Arithmetic/Logic Unit (ALU) Control Unit Registers: fast memory, local to the CPU

CPU I/O Memory

We will use the ARM processor as our main example


tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.15 tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.16

Computer Memory
Look into memory and youll see 1s and 0s. Meaning depends on context: could be program, could be data. Data is stored in memory All computers require at least three types of signals (buses):
address bus - which location in memory data bus - carries the contents of the location control bus - governs the information transfer
Read from memory Write to memory
tjwc - 13-Oct-10

Memory locations
ADDRESS

Binary Notation numbers as 1s and 0s


There is no reason why one should be restricted to using base-10 (decimal) numbers only. Digital computers use a binary number system where the base (or radix) is 2:

10 9 8 7 6 5 4 3 2 1 0

01001101

For example, the value of this binary number is as on left: NOTATION:


12(10) 12(8) 1000(2)
1.17 tjwc - 13-Oct-10

MEMORY
ISE1/EE2 Introduction to Computer Architecture

12 decimal 12 octal = 10(10) 1000 binary = 8(16)


ISE1/EE2 Introduction to Computer Architecture 1.18

Numbers: how to represent data in computers?


The translation between lines on a bus, or bits in a register (binary info), and decimal (base 10) numbers is not simple. Could use binary numbers but this is difficult to read Good compromise hexadecimal (base 16) is used for convenience 1238(16) = 1X163 + 2X162 + 3X161 + 8 = 4664(10) = 0001001000111000(2) 1 2 3 8 Each hex digit represents 4 bits
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.19

Octal and hexadecimal number systems


1 1 B Extra digits
1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 0 1 1 0 1 0 1 0 1 10 A 11 B 12 C 13 D 14 E 15 F
ISE1/EE2 Introduction to Computer Architecture 1.20

1 3

1 2 4

0 1

1 6

1 6 6

132166(8)=B476(16) What is this number in decimal? (((11*16+4)*16+7)*16)+6

tjwc - 13-Oct-10

Hexadecimal examples
You should remember these:
10(16) = 16, F(16) = 15 100(16) = 256, FF(16) = 255 1000(16) = 4096, FFF(16) = 4095 10000(16) = 65536, FFFF(16) = 65535

Digital representation of numbers

We will say more about this in part 2 For now, a number stored in a computer will have a fixed number of bits (binary digits) usually 8, 16 , or 32.
N bits can store whole unsigned numbers in range 0 x 2N 1

For convenience 1024 = 1K, so 4096=4K, 65536=64K etc. What is 2013(16)?


In binary? In decimal?

Each storage location or register in a computer which stores data has a fixed size in bits, and thus can store nonnegative whole numbers up a a given maximum size.
We can write these numbers in decimal, binary or hexadecimal it makes no difference to how they are stored or what they mean

Can you work out 1000(16) 8 in hexadecimal?


FF8(16)

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.21

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.22

MU0 - A Very Simple Processor


Program Counter Instruction Register
address

Logical (programmers) view of MU0 Memory


CPU PC

CPU

ADDRESS Memory
0 1 2 3 4 5 551

Memory location with address 0 is storing data 551

DATA
A

data

Registers: Each can store one number (NB IR is not visible to programmer)

Memory Locations: Each can store one number

Arithmetic Logic Unit


tjwc - 13-Oct-10

Accumulator
ISE1/EE2 Introduction to Computer Architecture 1.23 tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.24

MU0 Design
Let us design a simple processor MU0 with 16-bit instruction and data bus and minimal hardware:Program Counter (PC) - holds address of the next instruction to execute (a register) Accumulator (A) - holds data being processed (a register) Instruction Register (IR) - holds current instruction code being executed Arithmetic Logic Unit (ALU) - performs operations on data

MU0 Design (2)


Let us further assume that the memory is word-addressible each 16-bit word has its own location: word 0, word 1, etc.
Cant address individual bytes!

address 0 1

data 0123(16) 7777(16)

The 16-bit instruction code (machine code) has a format:

We will only design 8 instructions, but to leave room for expansion, we will allow capacity for 16 instructions
so we need 4 bits to identify an instruction: the opcode
Note top 4 bits define the operation code (opcode) and the bottom 12 bits define the memory address of the data (the operand) This machine can address up to 212 = 4k words = 8k bytes of data
1.25 tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.26

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

MU0 Instruction Set


mem[S] contents of memory location with address S Think of memory locations as being an array here S is the array index A is the single 16 bit CPU register S is a number from instruction in range 0-4095 (000(16)-FFF(16))
Instruction Opcode (hex) 0000 (0) 0001 (1) 0010 (2) 0011 (3) 0100 (4) 0101 (5) 0110 (6) 0111 (7) Effect A := mem[S] mem[S] := A A := A + mem[S] A := A mem[S] PC := S if A 0, PC := S if A 0, PC := S stop
1.27

Our First Program


The simplest use of our microprocessor: add two numbers
Lets assume these numbers are stored at two consecutive locations in memory, with addresses 2E and 2F Lets assume we wish to store the result back to memory address 30

LoaD A Store A ADD to A SUBtract from A JuMP Jump if Gt Equal Jump if Not Equal SToP
tjwc - 13-Oct-10

LDA S STA S ADD S SUB S JMP S JGE S JNE S STP

We need to load the accumulator with one value, add the other, and then store the result back into memory
Instructions execute in sequence

LDA 02E ADD 02F STA 030 STP

002E 202F 1030 7???


Machine Code

Note we follow tradition and use Hex notation for addresses and data

Human readable (mnemonic) assembly code


tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

ISE1/EE2 Introduction to Computer Architecture

1.28

Walkthrough of MU0 execution


Common technique used to debug computer programs Write down state of all internal registers, memory, etc Trace through instructions making the necessary changes at each step In machine code the PC register determines which instruction will next be executed.
Normally PC is incremented at end of each instruction to get to the next instruction in the sequence

Caught in the Act!


Program
MU0
ALU addr bus

Assembly mnemonics 000 LDA 02E


001 002 003 004

machine code

PC

0
A IR

ADD 02F STA 030 STP ---AA0 110 --

0 02E 2 02F 1 030 7 000


---AA0 110 --

data bus

005 006

control

Each machine instruction is implemented in hardware by one or more clock cycles in which things happen. We will trace through each cycle
This technique is invaluable when debugging or learning initially!
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.29

02E

...

Data

02F 030

Initially, we assume PC = 0, data and instructions are loaded in memory as shown, other CPU registers are undefined.
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.30

Instruction 1: LDA
NB data shown is after each cycle has completed so PC is one more than PC used to fetch instruction

02E
PC A IR

Instruction 2: ADD
machine code 0 02E 2 02F 1 030 7 000 ----

02F
machine code 0 02E 2 02F 1 030 7 000 ---0AA0 0110 -addr bus

MU0
ALU

MU0
ALU

addr bus data bus

PC

000 001 002 003 004 005

Cycle 1
control

0AA0
data bus

000 001 002 003 004 005

Cycle 1 (fetch instr and increment PC)

control

202F

IR

MU0
ALU

006

PC A IR

MU0
0AA0 0110 -Cycle 2
ALU

006

addr bus

addr bus

...

02E 02F 030

...

PC

02E 02F 030

Cycle 2 (execute instruction)

0BB0
data bus control

data bus

control

202F

IR

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.31

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.32

Instruction 3: STA
MU0
ALU PC A IR control

030
machine code 0 02E 2 02F 1 030 7 000 ---0AA0 0110 0BB0
addr bus

Instruction 4: STP
machine code 0 02E 2 02F 1 030 7 000 ---0AA0 0110 0BB0

000 001 002 003 004 005

000 001

Cycle 1

MU0
Cycle 1
ALU

addr bus

PC A IR

002 003 004 005

data bus

data bus

MU0
Cycle 2
ALU

006

control

006

addr bus

...

...

PC A IR

02E 02F 030

02E 02F 030

data bus

control

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.33

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.34

Key Points: instructions


Microprocessors perform operations depending on instruction codes stored in memory Instructions usually have two parts:
Opcode - determines what is to be done with the data Operand - specifies where/what is the data

Key Points: hardware


Memory contains both programs and data Program area and data area in memory are usually well separated (but self-modifying code is possible!) ALU is responsible for arithmetic and logic functions There are usually one or more general purpose registers for storing results or memory addresses (MU0 only has one A (more registers => more powerful) Fetching data from inside the CPU is much faster than from external memory
Assume number of memory operations determines number of cycles needed to execute instruction

Program Counter (PC) - address of current instruction PC incremented automatically each time it is used
Therefore instructions are normally executed sequentially

The number of clock cycles taken by a MU0 instruction is the same as the number of memory accesses it makes.
LDA, STA, ADD, SUB therefore take 2 clock cycles each: one to fetch (and decode) the instruction, a second to fetch (and operate on) the data JMP, JGE, JNE, STP only need one memory read (the instruction itself) and therefore can be executed in one clock cycle.
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.35

Assume MU0 will always reset to start execution from address 00016.
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.36

Lecture 3 Microprocessor Fundamentals


A good scientist is a person with original ideas. A good engineer is a person who makes a design that works with as few original ideas as possible. There are no prima donnas in engineering. Freeman Dyson

How to make CPU faster?


Make each instruction use as few clock cycles as possible Keep as much data inside the CPU as possible (many internal registers) Make each clock cycle as short as possible (high clock frequency) Get each instruction to do as much as possible (?) What do you mean by fast?
Different processor designs will be faster at different tasks Use benchmarks (big standard programs) written in high level languages to compare different processors.
Processor performance is benchmark-specific

This will introduce some of the general features needed to understand other ISAs such as the much more complex ARM ISA which we study in detail in Part 2
CPU design fundamentals Memory architecture

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.37

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.38

Instruction format classification


3-operand instruction format (used by ARM processor)
dest := op1 op op2 a,b,c stored in memory LDA mem[100]

a := b+c
REGISTORS: have e.g 8 accumulators R0-R7 a,b,c stored in registers ADD R0,R1 MOV, R2, R0 ADD R2, R1, R0

2-operand instruction format (used by the Thumb instruction set of ARM, and the AVR 8 bit microcontrollers)
dest := dest op op1

ADD mem[101] STA mem[102] 1 operand (MU0) a: b: c: mem[102] mem[101] mem[100]

2 operand (AVR) a: b: c: R2 R1 R0

3 operand (ARM) a: b: c: R2 R1 R0 ;R2:=R1+R0

1-operand instruction format (used in MU0 and some 8-bit microcontrollers such as MC6811)
acc := acc op op1

ADD R0,R1 ;R0:=R0+R1 MOV R2,R0 ;R2 := R0


ISE1/EE2 Introduction to Computer Architecture 1.39 tjwc - 13-Oct-10

ADD R2,R1,R0

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.40

Design Strategies
Complex Instruction Set Computers (CISC) [e.g. VAX / ix86] dense code, simple compiler powerful instruction set, variable format, multi-word instructions multi-cycle execution, low clock rate Reduced Instruction Set Computers (RISC) [e.g. MIPS, SPARC] high clock rate, low development cost (?) easier to move to new technology Simple instructions, fixed format, single-word instructions, complex optimizing compiler

Modern CPU Design


1. Why the move from CISC to RISC? technology factors increase expense of chip design better compilers, better software engineers Simple ISA better for concurrent execution 2. Load / Store architecture Lots of registers only go to main memory when really necessary. 3. Concurrent execution of instructions for greater speed multiple function units (ALUs, etc) superscalar or VLIW (EPIC) examples: Pentium & Athlon production line arrangement pipeline: all modern CPU

RISC
design emphasis on compilers
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture

CISC
design emphasis on processor
1.41

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.42

Main memory organisation


Main memory is used to store programs, data, intermediate results Two main organisations: Harvard & von Neumann Harvard architecture.
In A Harvard architecture CPU programs are stored in a separate memory (possibly with a different width) from the data memory. This has the added benefit that instructions can be fetched at the same time as data, simplifying & speeding up the hardware. In practice, the convenience of being able to read and write programs just like normal data makes this less usual
still popular for fixed program microcontrollers.

Von Neumann memory architecture


Von Neumann architecture (like MU0).
Programs and data occupy a single memory.

Think of main memory as being an array of words, the array index being the memory address. Each word (array location) has data which can be separately written or read. Usually instructions are one word in length but can be either more or less
memory bus Address bus CPU
Control bus

Instruction Memory
tjwc - 13-Oct-10

CPU

Data Memory
1.43 tjwc - 13-Oct-10

Data & Instruction Memory

Data bus
ISE1/EE2 Introduction to Computer Architecture 1.44

Harvard Introduction to Computer Architecture ISE1/EE2 architecture

Memory in detail
Memory locations store instructions data and each have unique numeric addresses
Usually addresses range from 0 up to some maximum value.
000 001 002 003 004 005 006

Nibbles, Bytes, Words


machine code 0 02E 2 02F 1 030 7 000 ---0AA0 0110 0BB0

Memory space is the unique range of possible memory addresses in a computer system We talk about the address of a memory location. Each memory location stores a fixed number of bits of data, normally 8, 16, 32 or 64 We write mem8[100], mem16[100] to indicate the value of the 8 or 16 bits with memory address 100 etc

Internal datapaths inside computers could be different width - for example 4-bit, 8-bit, 16-bit or 32-bit. For example: ARM processor uses 32-bit internal datapath WORD = 32-bit for ARM, 16-bit for MU0, 64 bit for latest x86 processors BYTE (8 bits) and NIBBLE (4 bits) are architecture independent 31 24 23 16 15 8 7 0

MSB

LSB

02E 02F 030

...

Nibble Byte Word

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.45

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.46

Byte and Word addressing


Many programs, for example those manipulating characters each of which takes up one byte need to access memory as an array of bytes, not words. For example in a 16 bit system there are 2 bytes for every word. This means that byte addresses must increase twice as fast as word addresses. There are two different ways to number bytes within a word as below. MSB 4: Word 3: number 2: 1: 0: 7 6 5 4 3 2 1 0 Little-endian LSB MSB LSB 6 7 4 5 2 3 0 1 Big-endian
1.47

Byte addresses for words


Most computer systems now use little-endian byte addressing, in which the least-significant byte has the lower address. It is inconvenient to have completely separate byte and word addresses, so word addressing usually follows byte addressing.
The word address of a word is the byte address of its lowest numbered byte. This means that consecutive words have addresses separated by 2 (16 bit words) or 4 (32 bit words) etc.

MSB 4: Word 3: number 2: 1: 0:


Not used

LSB 7 5 3 1 6 4 2 0
16 bit memory with consecutive word addresses separated by 2

8: Word 6: address 4: 2: 0:

Endian-ness

Little-endian
ISE1/EE2 Introduction to Computer Architecture 1.48

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

tjwc - 13-Oct-10

Different views of memory access: byte or word Different instructions to allow either byte or word access to memory.
16 bit word Byte view view of memory of memory (little-endian)

Internal Registers & Memory


Internal registers (e.g. A, R0) are same length as memory word Word READ:
A := Mem16[addr]
A 16 bits Top 8 bottom 8

802 0x00 0x10

Word read or write operates on a 800 0x02 0x03 whole word in memory 8 bits 16 bits The memory address is the word Mem16[800]= 0x0203(16) = 515 address as on the previous slide. Mem16[802]= 10(16) =16 Byte read or write is identical Mem8[800]= 3 except that a single byte is read or Mem8[801]= 2 written Mem8[802]= 10(16)= 16
Mem8[803]= 0

0x00 0x10 0x02 0x03

803 802 801 800

Word WRITE:
Mem16[addr] := A
8 bits Memory 8 bits

Byte READ:
A := 00000000 Mem8[addr]

Byte WRITE:
Mem8[addr] := A(7:0) (bottom 8 bits)
16 bits

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.49

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.50

What are memory locations used for?


LPC2138 microcontroller Read-write memory (RAM) is used for data and programs. It loses its contents On-chip memory map E007 0000: on power-down. I/O 28 X 16K Read-only memory (ROM) typically used E000 0000: to hold programs that do not change
Flash ROM allows data to be changed by programming (but not by memory write).

Memory addressing modes


Direct addressing
A := mem[N] (N in instruction word) Limited memory space (by number of bits available) Does not allow array access (where location is selcted by number in register) Indirect addressing mem
4
0 1 2 3 copy 4 5

pointer

41

41

Indirect addressing (not often used now)


A := mem[mem[N]]

A := mem[mem[1]] Register indirect offset addressing X


3

Memory-mapped I/O. Some locations (addresses) in memory allow communication with peripheral devices.
For example, a memory write to the data register of a serial communication controller might output a byte on a serial port of a PC. In practice, all I/O in modern systems is memory-mapped
tjwc - 13-Oct-10

400 7FFF: 32K 400 0000: RAM

Register indirect addressing:


A := mem[X]
where X is the name of an index register in CPU

mem

pointer Offset N=2 A


11

Register indirect offset addressing 7 FFFF:

0 1 2 3 4 5

ROM 512K
0:
1.51

A := mem[X + N] Register indirect is special case where offset = 0


tjwc - 13-Oct-10

11

A := mem[X+2]

copy

ISE1/EE2 Introduction to Computer Architecture

ISE1/EE2 Introduction to Computer Architecture

1.52

Example of register indirect addressing


Using register indirect addressing set A equal to sum of memory locations from S to S+99, then stop.
A := 0 X := S LOOP: A := A + mem[X] X := X + 1 if X < S+100 JMP LOOP STP S S+1 S+2

Advantages of register indirect offset addressing mode over direct addressing


Memory address size limited by register size not by bits in instruction word
32 bits => 232 = 4294967296 memory addresses

Array memory access is possible (memory location read or written depends on value of a variable etc) If register is set to constant value can use just like direct addressing
OK if CPU has large number of register so can set one register permanently

A X S+99

Allows programs to run using data anywhere in memory


Set register to base of program data area

Register used like this is often called base register


1.53 tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.54

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

Lecture 4 - Introduction to ARM programming


Steve is one of the brightest guys I've ever worked with brilliant - but when we decided to do a microprocessor on our own, I made two great decisions - I gave them two things which National, Intel and Motorola had never given their design teams: the first was no money; the second was no people. The only way they could do it was to keep it really simple. - Hermann Hauser talking about Steve Furber and the ARM design

Beyond MU0 - A first look at ARM


Complete instruction set. Wide variety of arithmetic, logical, shift & conditional branch instructions Larger address space - 12bit address gives 4k byte of memory. So use a 32-bit or address bus.
Typical physical memory size 1Mbyte (uses 20 bits) but can be anything up to 232 bytes

Why learn ARM?


Currently dominant architecture for embedded systems 32 bits => powerful & fast Efficient: very low power/MIPS Regular instruction set with many advanced features
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.55

Subroutine call mechanism - this allows writing modular programs. Additional internal registers - this reduces the need for accessing external memory & speeds up calculations
tjwc - 13-Oct-10

Interrupts, direct memory access (DMA), and cache memory. interrupts: allow external devices (e.g. mouse, keyboard) to interrupt the current program execution DMA: allows external highthroughput devices (e.g. display card) to access memory directly rather than through processor Cache: a small amount of fast memory on the processor

ISE1/EE2 Introduction to Computer Architecture

1.56

The ARM Instruction Set


Load-Store architecture Fixed-length (32-bit) instructions 3-operand instruction format (2 source operand regs, 1 result operand reg): ALU operations very powerful (can include shifts) Conditional execution of ALL instructions (v. clever idea!) Load-Store multiple registers in one instruction A single-cycle n-bit shift with ALU operation Combines the best of RISC with the best of CISC

ARM Programmers Model


16 X 32 bit registers R15 is equal to the PC
Its value is the current PC value Writing to it causes a branch!

R0-R14 are general purpose


R13, R14 have additional functions, described later

Current Processor Status Register (CPSR)


Holds condition codes AKA status bits

31

29

CPSR
unused

7 6 5 4

N Z C V

I F T mode

PC
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.57 tjwc - 13-Oct-10

r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 (stack pointer) r14 (link register) r15
1.58

ISE1/EE2 Introduction to Computer Architecture

ARM Programmer's Model (con't)


CPSR is a special register, it cannot be read or written like other registers
The result of any data processing instruction can modify status bits (flags) These flags are read to determine branch conditions etc

ARM's memory organization


Byte addressed memory Maximum 232 bytes of memory A word = 32-bits, half-word = 16 bits Words aligned on 4-byte boundaries

NB - Lowest byte address = LSB of word Little-endian

Main status bits (AKA condition codes):


N (result was negative) Z (result was zero) C (result involved a carry-out) V (result overflowed as signed number)
20 16 12 8 4 0
tjwc - 13-Oct-10 ISE1/EE2 Introduction to Computer Architecture 1.59 tjwc - 13-Oct-10

Other fields described later

Word addresses follow LSB byte address

ISE1/EE2 Introduction to Computer Architecture

1.60

ARM Assembly Quick Introduction


MOV ra, rb MOV ra, #n ADD ra, rb, rc ADD ra, rb, #n CMP ra, rb CMP ra, #n B label BEQ label BNE label BMI label BPL label LDR ra, label STR ra, label ADR ra, label LDR ra, [rb] STR ra, [rb]
tjwc - 13-Oct-10

MU0 to ARM
Operation A := mem[S] R0 := mem[S] mem[S] := A mem[S] := Rn A := A + mem[S] R0 := R0+ mem[S] R0 := S R0 := R1 + R2 PC := S MU0 LDA S STA S ADD S n/a n/a JMP S ARM LDR R0, S STR R0, S LDR R1, S ADD R0, R0, R1 MOV R0, #S ADD R0, R1, R2 B S
A

ra := rb ra := n ra := rb + rc ra := rb + n set status bits on ra-rb set status bits on ra-n branch to label branch to label if zero branch if not zero branch if negative branch if zero or plus ra := mem[label] mem[label] := ra ra :=address of label ra := mem[rb] mem[rb] := ra

n decimal in range -128 to 127 (other values possible, see later) SUB => instead of +
CMP is like SUB but has no destination register ans sets status bits

BL label is branch & link Branch conditions apply to the result of the last instruction to set status bits (ADDS/SUBS/MOVS/CMP etc). LDRB/STRB => byte transfer
Other address modes: [rb,#n] => mem[rb+n] [rb,#n]! => mem[rb+n], rb := rb+n [rb],#n => mem[rb], rb:=rb+n [rb+ri] => mem[rb+ri]
1.61

R0 R1 R2

ISE1/EE2 Introduction to Computer Architecture

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

1.62

Introduction to ARM data processing a := b+c-d


LDR R1, B ARM has 16 registers R0-R15 If a,b,c,d are in registers:

An ARM assembly module


symbols
AREA Example, CODE ;name a code block ;defines a numeric constant 3 11 4 r0, X r1, Y r2, #0 R2, R2, R1 r0, r0, #1 r0, #0 LOOP r2, Z TABSIZE EQU 10 X Y Z DCW DCW % ENTRY LDR LDR MOV LOOP ADD SUB CMP BNE STR END
1.63 tjwc - 13-Oct-10

LOAD data to reg LDR R2, C from memory


LDR R3, D ADD R0, R1, R2 SUB R0, R0, R3

module header and end

a: b: c: d:

R0 R1 R2 R3

; X (initialised to 3) ; Y (initialised to 11) ; 4 bytes (1 word) space for Z, uninitialised ;mark start ;load multiplier from mem[X] ;load number to be multiplied from mem[Y] ;initialise sum ;add Y to sum ;decrement count ;compare & set codes on R0 ;loop back if not finished (R0 0) ;store product in mem[Z]

STORE result to STR R0, A memory from reg

Machine Instructions: ADD Rx,Ry,Rz ;Rx := Ry + Rz SUB Rx,Ry,Rz ;Rx := Ry - Rz


ADD R0, R1, R2 SUB R0, R0, R3

mem[A] mem[B] mem[C] mem[D]


ISE1/EE2 Introduction to Computer Architecture

a b c d

comments

opcode

operands
ISE1/EE2 Introduction to Computer Architecture 1.64

tjwc - 13-Oct-10

CMP instruction & condition codes


CMP R0, #n CMP R0, #0 ; set condition codes computes x = R0 - n BNE LOOP; branch if Z=0 x = 0 <=> Z = 1 z(x) < 0 <=> N = 1 C is carry from addition condition codes AKA status bits V is two's complement overflow BNE ;branch if Z=0 (x 0) N Negative BEQ ;branch if Z=1 (x = 0) Z Zero BMI ;branch if N=1 (z(x) < 0) C Carry BPL ;branch if N=0 (z(x) 0)
z(x) two complement interpretation of bits x
tjwc - 13-Oct-10

Two's Complement in n bit binary word


unsigned binary

2n-1bn-1+ 2n-1bn-1+

2n-2bn-1 2n-2bn-1

.... + 8b3 + 4b2 + 2b1 + b0 = u(bi) 0 u 2n1 .... + 8b3 + 4b2 + 2b1 + b0 = z(bi) 2n-1 s 2n-11
Difference between z & u is not apparent in lower n bits
n bit binary addition has identical sum carry is different

two's complement signed binary

z(bi) = u(bi) 2nbn-1 2n z = (2n-1 z) + 1


2n-1: z: 2n-1-z: 11111111 00000010 11111101

V
ISE1/EE2 Introduction to Computer Architecture

oVerflow (signed)
1.65

Negating two's complement is inverting bits and adding 1


2n does not affect lower n bits
ISE1/EE2 Introduction to Computer Architecture 1.66

tjwc - 13-Oct-10

Overflow & Carry bits


Suppose qi is the N bit sum of a binary adder inputs ai & bi When does overflow happen? Unsigned
2ncn an-1 bn-1 an-2 cn cn-1 cn-2 qn-1
cn-1 weighting sign change from 2n-1 to -2n-1

What is subtraction in binary?


In a microprocessor
Subtract generates correct two's complement answer for two's complement operands. Subtract = negate followed by add: a - b = a + (-b) Example: 4 - 1 0100 0001 two's comp negate is
invert bits & add 1: 0001 => 1110 => 1111 No overflow because: cn=1 cn-1=1

u(ai) + u(bi) = u(qi) + cn = 1 for b n-2 unsigned overflow: C in ARM architecture

qn-2

Signed
z(ai) + z(bi) = -2cn + 2cn-1) cn cn-1 for signed overflow: V in ARM architecture
tjwc - 13-Oct-10

z(qi)+2n-1 (

MS two bits of n bit addition

0100 1111 + 10011


1.68

ISE1/EE2 Introduction to Computer Architecture

1.67

tjwc - 13-Oct-10

ISE1/EE2 Introduction to Computer Architecture

Register indirect addressing on ARM


Using register indirect addressing

Assembly module for answer


S S1 AREA Example2, CODE ;name a code block % 400 ;define 400 bytes space for table S->S+99 ; S1 is label equal to S+400 ENTRY ;start instructions here MOV R0,#0 ;A := 0 ADR R2, S ;X := S ADR R9, S1 ;R9 :=S+400 for later LDR R1, [R2] ADD R0, R0, R1 ADD R2, R2, #4 CMP R2, R9 BMI LOOP STOP B STOP END
1.69 tjwc - 13-Oct-10

MOV R0,#0 ADR R2, S ADR R9, S+400 LOOP LDR R1, [R2] ADD R0, R0, R1 ADD R2, R2, #4 CMP R2, R9 BMI LOOP STOP B STOP

;A := 0 set A equal to sum of memory locations from S to S+99 ;X := S ;R9:=S+400 for later ;tmp := mem[X] S+4 ;A := A + tmp S+8 ;X := X+4 (4 bytes) ;set codes on X-(S+400)? ;why would BNE be better? ;stop
A S

LOOP ;tmp := mem[X] ;A := A + tmp ;X := X+4 ;set condition codes on X-(S+400)? ;branch back if result negative (N=1) ;stop
ISE1/EE2 Introduction to Computer Architecture 1.70

We will see later this can be simplified using more powerful instructions
tjwc - 13-Oct-10

S+396 S+400

ISE1/EE2 Introduction to Computer Architecture

S-ar putea să vă placă și