Embedded Systems Design: A Unified Hardware/Software Introduction
Chapter 3 General-Purpose Processors: Software
Introduction
General-Purpose Processor
Processor designed for a variety of computation tasks
Low unit cost, in part because manufacturer spreads NRE over large numbers of units
Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
Basic Architecture
Control unit and datapath
Note similarity to single-purpose processor
Key differences
Datapath is general
Control unit doesn't store the algorithm; the algorithm is programmed into the memory
[Figure: processor containing a control unit (controller, PC, IR) and a datapath (ALU, registers), linked by control/status signals and connected to I/O and memory]
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Datapath Operations
Load
Read memory location into register
ALU operation
Input certain registers through ALU, store back in register
Store
Write register to memory location
[Figure: the value 10 is loaded from memory into a register, incremented (+1) by the ALU to 11, and written back to memory]
Control Unit
[Figure: processor (control unit with controller, PC, IR; datapath with ALU and registers R0, R1) connected to I/O and memory]
Program in memory:
100 load R0, M[500]
101 inc R1, R0
102 store M[501], R1
Data in memory:
500 10
501 ...
[Figures: the same processor diagram repeated over five slides while the instruction at address 100 executes: PC = 100, IR = load R0, M[500], and R0 receives 10 from M[500]]
Instruction Cycles
PC=100: IR = load R0, M[500]; R0 = 10
PC=101: IR = inc R1, R0; R1 = 11
PC=102: IR = store M[501], R1; M[501] = 11
Program in memory:
100 load R0, M[500]
101 inc R1, R0
102 store M[501], R1
Data in memory after the third cycle:
500 10
501 11
[Figure: clk timeline with one instruction cycle per PC value, beside the processor diagram]
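The three instruction cycles can be traced with a small C sketch. The names `Machine`, `Instr`, `step`, and `run_slide_program` are hypothetical, chosen only for illustration, and the sketch models just the three instructions shown on the slides (load, inc, store), not a complete processor.

```c
#include <assert.h>

/* Hypothetical model of the slides' processor: a PC, two registers, memory. */
enum Op { OP_LOAD, OP_INC, OP_STORE };

struct Instr { enum Op op; int reg_dst; int reg_src; int addr; };

struct Machine {
    int pc;          /* program counter */
    int reg[2];      /* R0, R1 */
    int mem[1024];   /* data memory, indexed by address */
};

/* One instruction cycle: fetch the instruction at PC, advance PC, execute. */
static void step(struct Machine *m, const struct Instr *prog, int base) {
    const struct Instr *ir = &prog[m->pc - base];   /* fetch into the "IR" */
    m->pc += 1;                                     /* PC = PC + 1 */
    switch (ir->op) {
    case OP_LOAD:  m->reg[ir->reg_dst] = m->mem[ir->addr];        break;
    case OP_INC:   m->reg[ir->reg_dst] = m->reg[ir->reg_src] + 1; break;
    case OP_STORE: m->mem[ir->addr]    = m->reg[ir->reg_src];     break;
    }
}

/* Run the program at addresses 100..102 with M[500] = 10; returns M[501]. */
int run_slide_program(void) {
    static const struct Instr prog[] = {
        { OP_LOAD,  0, 0, 500 },   /* 100: load R0, M[500]  */
        { OP_INC,   1, 0, 0   },   /* 101: inc R1, R0       */
        { OP_STORE, 0, 1, 501 },   /* 102: store M[501], R1 */
    };
    struct Machine m = { .pc = 100 };
    m.mem[500] = 10;
    while (m.pc <= 102) {
        step(&m, prog, 100);
    }
    assert(m.reg[0] == 10 && m.reg[1] == 11);
    return m.mem[501];
}
```

Calling `run_slide_program()` returns the value stored at address 501 once all three cycles have completed.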
Architectural Considerations
N-bit processor
N-bit ALU, registers, buses, memory data interface
Embedded: 8-bit, 16-bit, 32-bit common
Desktop/servers: 32-bit, even 64
PC size determines address space
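As a worked illustration of how PC size bounds the address space, the sketch below computes the number of distinct addresses a PC of a given width can hold. The function name `address_space` is an assumption made for this example; it counts addressable locations, whatever the size of each location.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct addresses reachable with a PC that is pc_bits wide. */
uint64_t address_space(unsigned pc_bits) {
    assert(pc_bits > 0 && pc_bits < 64);   /* keep the shift well defined */
    return (uint64_t)1 << pc_bits;
}
```

For example, `address_space(16)` gives 65536 locations, which matches the 64K figure commonly quoted for 16-bit address spaces.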
Architectural Considerations
Clock frequency
Inverse of clock period
Clock period must be longer than longest register-to-register delay in entire processor
Memory access is often the longest
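The relation between the longest register-to-register delay and the highest usable clock frequency can be written out as a short calculation. This is a sketch: `max_clock_mhz` is a hypothetical helper, and the delay value in the usage note is invented for illustration.

```c
#include <assert.h>

/* Upper bound on clock frequency in MHz, given the longest
   register-to-register delay in nanoseconds (f = 1 / T). */
double max_clock_mhz(double longest_delay_ns) {
    assert(longest_delay_ns > 0.0);
    return 1000.0 / longest_delay_ns;   /* 1 / (1 ns) = 1000 MHz */
}
```

If the slowest path were, say, a 10 ns memory access, `max_clock_mhz(10.0)` shows the clock could run no faster than 100 MHz.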
[Figure: non-pipelined versus pipelined execution over time; each instruction passes through the stages Fetch-instr., Decode, Fetch ops., Execute, Store res.]
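The benefit of pipelining can be estimated with a simple cycle count. The sketch below assumes an idealized pipeline in which every stage takes exactly one clock cycle and there are no stalls; both function names are hypothetical.

```c
#include <assert.h>

/* Cycles to run n instructions through k one-cycle stages with no overlap. */
unsigned long cycles_nonpipelined(unsigned long n, unsigned long k) {
    return n * k;
}

/* Same work with ideal pipelining: fill the pipeline once (k cycles for the
   first instruction), then one instruction completes every cycle. */
unsigned long cycles_pipelined(unsigned long n, unsigned long k) {
    assert(k > 0);
    return n == 0 ? 0 : k + n - 1;
}
```

With the five stages named in the figure and eight instructions, the non-pipelined count is 40 cycles and the idealized pipelined count is 12.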
Princeton
Fewer memory wires
Harvard
Simultaneous program and data memory access
[Figure: Harvard processor with separate program memory and data memory; Princeton processor with a single memory (program and data)]
Cache Memory
Memory access may be slow
Cache is small but fast
memory close to processor
Holds copy of part of memory
Hits and misses
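To make hits and misses concrete, here is a minimal sketch of a cache lookup. It assumes a direct-mapped organization with eight one-word lines, which is only one of several possible cache designs and is not stated on the slide; `Cache`, `cache_access`, and `count_hits` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINES 8   /* hypothetical size: eight one-word lines */

struct Cache {
    bool     valid[CACHE_LINES];
    unsigned tag[CACHE_LINES];
    unsigned hits, misses;
};

/* Look up addr. On a miss, install it as if it had been fetched from
   main memory. Returns true on a hit. */
bool cache_access(struct Cache *c, unsigned addr) {
    unsigned index = addr % CACHE_LINES;   /* which line the address maps to */
    unsigned tag   = addr / CACHE_LINES;   /* identifies the address within that line */
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;
    c->tag[index]   = tag;
    c->misses++;
    return false;
}

/* Replay a sequence of addresses on an empty cache; returns the hit count. */
unsigned count_hits(const unsigned *addrs, size_t n) {
    struct Cache c = {0};
    for (size_t i = 0; i < n; i++) {
        cache_access(&c, addrs[i]);
    }
    assert(c.hits + c.misses == n);
    return c.hits;
}
```

Replaying the addresses 500, 501, 500, 508, 500 gives a single hit: the second access to 500 hits, but 508 maps to the same line and evicts it, so the final access to 500 misses again.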
Programmer's View
Programmer doesn't need detailed understanding of architecture
Instead, needs to know what instructions can be executed
Assembly-Level Instructions
Instruction 1: opcode operand1 operand2
Instruction 2: opcode operand1 operand2
Instruction 3: opcode operand1 operand2
Instruction 4: opcode operand1 operand2
...
Instruction Set
Defines the legal set of instructions for that processor
Data transfer: memory/register, register/register, I/O, etc.
Arithmetic/logical: move register through ALU and back
Branches: determine next PC value when not just PC+1
Instruction | First byte (opcode, Rn) | Second byte (operands) | Operation
MOV Rn, direct | 0000 Rn | direct | Rn = M(direct)
MOV direct, Rn | 0001 Rn | direct | M(direct) = Rn
MOV @Rn, Rm | 0010 Rn | Rm | M(Rn) = Rm
MOV Rn, #immed. | 0011 Rn | immediate | Rn = immediate
ADD Rn, Rm | 0100 Rn | Rm | Rn = Rn + Rm
SUB Rn, Rm | 0101 Rn | Rm | Rn = Rn - Rm
JZ Rn, relative | 0110 Rn | relative | PC = PC + relative (only if Rn is 0)
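The two-byte encoding in the instruction-set table can be sketched as a pair of helper functions. The helper and enumerator names are hypothetical. Placing Rm in the upper four bits of the second byte is taken from the simulator and FSMD later in this chapter (`sb >> 4`, `rm IR[7..4]`); treat the rest as a sketch rather than a definitive assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Opcodes from the table: the upper four bits of the first byte. */
enum {
    ENC_MOV_LOAD  = 0x0,   /* Rn = M(direct)   */
    ENC_MOV_STORE = 0x1,   /* M(direct) = Rn   */
    ENC_MOV_IND   = 0x2,   /* M(Rn) = Rm       */
    ENC_MOV_IMM   = 0x3,   /* Rn = immediate   */
    ENC_ADD       = 0x4,   /* Rn = Rn + Rm     */
    ENC_SUB       = 0x5,   /* Rn = Rn - Rm     */
    ENC_JZ        = 0x6    /* PC = PC + relative if Rn is 0 */
};

/* First byte: opcode in bits 7..4, Rn in bits 3..0. */
uint8_t first_byte(unsigned opcode, unsigned rn) {
    assert(opcode <= 0x6 && rn <= 0xF);
    return (uint8_t)((opcode << 4) | rn);
}

/* Second byte for register-register forms: Rm in bits 7..4. */
uint8_t second_byte_rm(unsigned rm) {
    assert(rm <= 0xF);
    return (uint8_t)(rm << 4);
}
```

Under these assumptions, ADD R0, R1 encodes as the bytes 0x40 0x10. For the direct, immediate, and relative forms the second byte is simply the 8-bit operand itself.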
Addressing Modes
Addressing mode | Operand field | Register-file contents | Memory contents
Immediate | Data | |
Register-direct | Register address | Data |
Register indirect | Register address | Memory address | Data
Direct | Memory address | | Data
Indirect | Memory address | | Memory address, then Data
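The five addressing modes differ only in how many lookups separate the operand field from the data. A minimal sketch, with a hypothetical `fetch_operand` function and plain `int` arrays standing in for the register file and memory:

```c
#include <assert.h>

enum Mode { IMMEDIATE, REGISTER_DIRECT, REGISTER_INDIRECT, DIRECT, INDIRECT };

/* Resolve an operand field to data under the given addressing mode. */
int fetch_operand(enum Mode mode, int field, const int *regs, const int *mem) {
    switch (mode) {
    case IMMEDIATE:         return field;              /* field is the data */
    case REGISTER_DIRECT:   return regs[field];        /* register holds the data */
    case REGISTER_INDIRECT: return mem[regs[field]];   /* register holds an address */
    case DIRECT:            return mem[field];         /* field is an address */
    case INDIRECT:          return mem[mem[field]];    /* memory holds an address */
    }
    assert(0 && "unknown addressing mode");
    return 0;
}
```

With `regs[1] = 3`, `mem[1] = 5`, `mem[3] = 42`, and `mem[5] = 7`, the same operand field 1 yields 1, 3, 42, 5, and 7 under the five modes in the order listed.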
Sample Programs
C program
int total = 0;
for (int i=10; i!=0; i--)
    total += i;
// next instructions...
Assembly program
0      MOV R0, #0;    // total = 0
1      MOV R1, #10;   // i = 10
2      MOV R2, #1;    // constant 1
3      MOV R3, #0;    // constant 0
Loop:  JZ R1, Next;   // Done if i=0
5      ADD R0, R1;    // total += i
6      SUB R1, R2;    // i--
7      JZ R3, Loop;   // Jump always
Next:  // next instructions...
Programmer Considerations
Program and data memory space
Embedded processors often very limited
e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
I/O
How to communicate with external signals?
Interrupts
Pin | I/O Direction | Register Address
1 | Output |
2-9 | Output |
10,11,12,13,15 | Input |
14,16,17 | Output |
[Figure: PC parallel port with a switch on pin 13 and an LED on pin 2]
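CheckPort proc
        push ax            ; save the registers used below
        push dx
        mov  dx, 3BCh + 1  ; base + 1 selects register 1
        in   al, dx        ; read register 1
        and  al, 10h       ; mask out all but bit 4
        cmp  al, 0         ; is it 0?
        jne  SwitchOn      ; if not, the switch is on
SwitchOff:
        mov  ...
        in   ...
        and  ...
        out  ...
        jmp  Done
SwitchOn:
        mov  ...
        in   ...
        or   ...
        out  ...
Done:
        pop  dx            ; restore the saved registers
        pop  ax
CheckPort endp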
extern "C" void CheckPort(void);   // defined in assembly
int main(void) {
    while( 1 ) {
        CheckPort();
    }
}
Operating System
Optional software layer providing low-level services to a program (application).
File management, disk access
Keyboard/display interfacing
Scheduling multiple programs for execution
Or even just multiple threads from one program
MOV R0, 1324
MOV R1, file_name
INT 34
JZ R0, L1
...
Development Environment
Development processor
The processor on which we write and debug our programs
Usually a PC
Target processor
The processor that the program will run on in our embedded system
Often different from the development processor
Cross compiler
Runs on one processor, but generates code for another
Assemblers
Linkers
Debuggers
Profilers
[Figure: C files pass through the compiler and assembly files through the assembler to produce binary files; the linker combines the binary files with a library into an executable file (implementation phase); the debugger and profiler are used in the verification phase]
Running a Program
If development processor is different from target, how can we run our compiled code? Two options:
Download to target processor
Simulate
Simulation
One method: Hardware description language
But slow, not always available
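#include <stdio.h>

typedef struct {
    unsigned char first_byte, second_byte;
} instruction;

instruction program[1024];   // instruction memory
unsigned char memory[256];   // data memory

// Execute num_bytes / 2 instructions. Returns 0 on success, 1 on a bad opcode.
int run_program(int num_bytes) {
    int pc = -1;
    unsigned char reg[16] = {0}, fb, sb;
    while( ++pc < (num_bytes / 2) ) {
        fb = program[pc].first_byte;
        sb = program[pc].second_byte;
        switch( fb >> 4 ) {
            case 0: reg[fb & 0x0f] = memory[sb]; break;              // Rn = M(direct)
            case 1: memory[sb] = reg[fb & 0x0f]; break;              // M(direct) = Rn
            case 2: memory[reg[fb & 0x0f]] = reg[sb >> 4]; break;    // M(Rn) = Rm
            case 3: reg[fb & 0x0f] = sb; break;                      // Rn = immediate
            case 4: reg[fb & 0x0f] += reg[sb >> 4]; break;           // Rn = Rn + Rm
            case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break;           // Rn = Rn - Rm
            case 6:                                                  // PC = PC + relative, only if Rn is 0
                if (reg[fb & 0x0f] == 0) pc += (signed char)sb;      // offset is signed; ++pc still follows
                break;
            default: return 1;
        }
    }
    return 0;
}

// Minimal helper so the listing is complete: dump the data memory.
void print_memory_contents(void) {
    for (int i = 0; i < 256; i++)
        printf("%3d: %u\n", i, memory[i]);
}

int main(int argc, char *argv[]) {
    FILE* ifs;
    if( argc != 2 ||
        (ifs = fopen(argv[1], "rb")) == NULL ) {
        return 1;
    }
    // With an element size of 1, fread returns the number of bytes read.
    int num_bytes = (int)fread(program, 1, sizeof(program), ifs);
    fclose(ifs);
    if (run_program(num_bytes) == 0) {
        print_memory_contents();
        return(0);
    }
    else return(-1);
}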
ISS
Download to board
Use device programmer
Runs in real environment, but not controllable
Compromise: emulator
[Figure: implementation and verification phases; the debugger/ISS runs on the development processor, while the emulator and programmer are external tools]
Application-Specific Instruction-Set Processors (ASIPs)
General-purpose processors
Sometimes too general to be effective in demanding application
e.g., video processing requires huge video buffers and operations on large arrays of data, inefficient on a GPP
Still programmable
Microcontroller features
On-chip peripherals
Timers, analog-digital converters, serial communication, etc.
Tightly integrated for programmer, typically part of register space
DSP features
Several instruction execution units
Multiply-accumulate single-cycle instruction, other instrs.
Efficient vector operations, e.g., add two arrays
Vector ALUs, loop buffers, etc.
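The multiply-accumulate operation is the inner step of a dot product, which is why DSPs give it a dedicated single-cycle instruction. The sketch below shows the operation in portable C; `dot_product` is a hypothetical name, and whether a given compiler and target map the loop body onto one MAC instruction is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product written as one multiply-accumulate per element. */
long dot_product(const int *a, const int *b, size_t n) {
    assert(a != NULL && b != NULL);
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (long)a[i] * b[i];   /* the multiply-accumulate step */
    }
    return acc;
}
```

For the arrays {1, 2, 3} and {4, 5, 6} the accumulated result is 1*4 + 2*5 + 3*6 = 32.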
Selecting a Microprocessor
Issues
Technical: speed, power, size, cost
Other: development environment, prior expertise, licensing, etc.
Processor | Clock speed | Periph. | Bus Width | MIPS | Power | Trans. | Price
General Purpose Processors
Intel PIII | 1 GHz | 2x16 K L1, 256K L2, MMX | 32 | ~900 | 97W | ~7M | $900
IBM PowerPC 750X | 550 MHz | 2x32 K L1, 256K L2 | 32/64 | ~1300 | 5W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32 K, 2 way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1W | 2.1M | NA
Microcontroller
Intel 8051 | 12 MHz | | | ~1 | ~0.2W | ~10K | $7
Motorola 68HC811 | 3 MHz | | | ~.5 | ~0.1W | ~10K | $5
Digital Signal Processors
TI C5416 | 160 MHz | | | | NA | NA | $34
Lucent DSP32C | 80 MHz | | 32 | 40 | NA | NA | $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
FSMD
Declarations:
bit PC[16], IR[16];
bit M[64k][16], RF[16][16];
Aliases:
op IR[15..12]
rn IR[11..8]
rm IR[7..4]
dir IR[7..0]
imm IR[7..0]
rel IR[7..0]
States:
Reset: PC=0;
Fetch (entered from Reset and from the states below): IR=M[PC]; PC=PC+1
Decode: branch on op
op = 0000: Mov1: RF[rn] = M[dir]; to Fetch
op = 0001: Mov2: M[dir] = RF[rn]; to Fetch
op = 0010: Mov3: M[rn] = RF[rm]; to Fetch
op = 0011: Mov4: RF[rn] = imm; to Fetch
op = 0100: Add: RF[rn] = RF[rn]+RF[rm]; to Fetch
op = 0101: Sub: RF[rn] = RF[rn]-RF[rm]; to Fetch
op = 0110: Jz
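The FSMD aliases describe how the Decode state slices the 16-bit instruction register into fields. A minimal sketch of that slicing in C, where the `Fields` struct and `decode_ir` function are hypothetical names introduced for illustration:

```c
#include <assert.h>
#include <stdint.h>

struct Fields {
    unsigned op;     /* IR[15..12] */
    unsigned rn;     /* IR[11..8]  */
    unsigned rm;     /* IR[7..4]   */
    unsigned low8;   /* IR[7..0]: read as dir, imm, or rel depending on op */
};

/* Split a 16-bit instruction register according to the FSMD aliases. */
struct Fields decode_ir(uint16_t ir) {
    struct Fields f;
    f.op   = (ir >> 12) & 0xF;
    f.rn   = (ir >> 8)  & 0xF;
    f.rm   = (ir >> 4)  & 0xF;
    f.low8 = ir & 0xFF;
    /* rm overlaps the upper half of dir/imm/rel: the opcode decides
       which interpretation of the low byte applies. */
    assert(f.rm == (f.low8 >> 4));
    return f;
}
```

For example, the word 0x4010 decodes to op = 0100 (Add), rn = 0, rm = 1, so the Add state would compute RF[0] = RF[0] + RF[1].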
[Figure: control unit and datapath of the simple microprocessor. Control unit: controller (next-state and control logic; state register), 16-bit PC with PCld, PCinc, PCclr, and IR with Irld. Datapath: 2x1 mux (RFs) feeding the 16-entry register file RF with RFwa, RFwe, RFr1a, RFr1e, RFr2a, RFr2e; ALU with ALUs and ALUz; 3x1 mux (Ms) in front of memory with Mre, Mwe.]
A Simple Microprocessor
FSMD states with their control signals:
Reset: PC=0; PCclr=1;
Fetch: IR=M[PC]; PC=PC+1; Ms=10; Irld=1; Mre=1; PCinc=1;
Decode: branch on op = 0000, 0001, 0010, 0011, 0100, 0101, 0110
Mov1: RF[rn] = M[dir]; to Fetch
Mov2: M[dir] = RF[rn]; to Fetch; RFr1a=rn; RFr1e=1; Ms=01; Mwe=1;
Mov3: M[rn] = RF[rm]; to Fetch; RFr1a=rn; RFr1e=1; Ms=10; Mwe=1;
Mov4: RF[rn] = imm; to Fetch
Add: RF[rn] = RF[rn]+RF[rm]; to Fetch
Sub: RF[rn] = RF[rn]-RF[rm]; to Fetch
Jz
Chapter Summary
General-purpose processors
Good performance, low NRE, flexible
ASIPs
Microcontrollers, DSPs, network processors, more customized ASIPs