Documente Academic
Documente Profesional
Documente Cultură
Hardware/Software Introduction
Chapter 3 General-Purpose Processors:
Software
Introduction
General-Purpose Processor
Processor designed for a variety of computation tasks
Low unit cost, in part because manufacturer spreads NRE
over large numbers of units
Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
Basic Architecture
Control unit and
datapath
Processor
Control unit
Note similarity to
single-purpose
processor
Datapath
ALU
Controller
Control
/Status
Registers
Key differences
Datapath is general
Control unit doesnt
store the algorithm
the algorithm is
programmed into the
memory
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
PC
IR
I/O
Memory
Datapath Operations
Load
Processor
Control unit
Datapath
ALU
ALU operation
Controller
Registers
Store
Write register to
memory location
+1
Control
/Status
10
PC
11
IR
I/O
Memory
...
10
11
...
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Control Unit
Processor
Control unit
ALU
Controller
Datapath
Control
/Status
Registers
PC
IR
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
500
501
R1
...
10
...
5
Processor
Control unit
ALU
Controller
Control
/Status
Registers
PC
100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Datapath
Memory
500
501
R1
...
10
...
6
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
PC
100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Memory
500
501
R1
...
10
...
7
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
10
PC
100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Memory
500
501
R1
...
10
...
8
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
10
PC
100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Memory
500
501
R1
...
10
...
9
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
10
PC
100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Memory
500
501
R1
...
10
...
10
Instruction Cycles
PC=100
clk
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
10
PC 100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Memory
500
501
R1
...
10
...
11
Instruction Cycles
PC=100
clk
Processor
Control unit
Datapath
ALU
Controller
+1
Control
/Status
PC=101
Registers
clk
10
PC 101
IR
inc R1, R0
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Memory
500
501
11
R1
...
10
...
12
Instruction Cycles
PC=100
clk
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
PC=101
Registers
clk
10
PC 102
IR
store M[501], R1
R0
11
R1
PC=102
clk
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
...
500 10
501 11
...
13
Architectural Considerations
N-bit processor
N-bit ALU, registers,
buses, memory data
interface
Embedded: 8-bit, 16bit, 32-bit common
Desktop/servers: 32bit, even 64
PC size determines
address space
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
PC
IR
I/O
Memory
14
Architectural Considerations
Clock frequency
Inverse of clock
period
Must be longer than
longest register to
register delay in
entire processor
Memory access is
often the longest
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
PC
IR
I/O
Memory
15
Non-pipelined
Dry
Decode
Time
Instruction 1
Execute
Store res.
Fetch ops.
Pipelined
Fetch-instr.
Time
Pipelined
Time
16
17
Princeton
Processor
Fewer memory
wires
Harvard
Simultaneous
program and data
memory access
Program
memory
Data memory
Harvard
Memory
(program and data)
Princeton
18
Cache Memory
Memory access may be slow
Cache is small but fast
memory close to processor
Holds copy of part of memory
Hits and misses
Cache
Memory
19
Programmers View
Programmer doesnt need detailed understanding of architecture
Instead, needs to know what instructions can be executed
20
Assembly-Level Instructions
Instruction 1
opcode
operand1
operand2
Instruction 2
opcode
operand1
operand2
Instruction 3
opcode
operand1
operand2
Instruction 4
opcode
operand1
operand2
...
Instruction Set
Defines the legal set of instructions for that processor
Data transfer: memory/register, register/register, I/O, etc.
Arithmetic/logical: move register through ALU and back
Branches: determine next PC value when not just PC+1
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
21
Addressing Modes
Addressing
mode
Operand field
Immediate
Data
Register-direct
Register-file
contents
Memory
contents
Register address
Data
Register
indirect
Register address
Memory address
Direct
Memory address
Data
Indirect
Memory address
Memory address
Data
Data
22
Sample Programs
C program
int total = 0;
for (int i=10; i!=0; i--)
total += i;
// next instructions...
// total = 0
// i = 10
// constant 1
// constant 0
Loop:
5
6
7
JZ R1, Next;
ADD R0, R1;
SUB R1, R2;
JZ R3, Loop;
// Done if i=0
// total += i
// i-// Jump always
Next:
// next instructions...
23
Programmer Considerations
Program and data memory space
Embedded processors often very limited
e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
I/O
How communicate with external signals?
Interrupts
24
Operating System
Optional software layer
providing low-level services to
a program (application).
File management, disk access
Keyboard/display interfacing
Scheduling multiple programs for
execution
Or even just multiple threads from
one program
R0, 1324
R1, file_name
34
R0, L1
-----
25
Development Environment
Development processor
The processor on which we write and debug our programs
Usually a PC
Target processor
The processor that the program will run on in our embedded
system
Often different from the development processor
Development processor
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
Target processor
26
C File
Compiler
Binary
File
Binary
File
Cross compiler
Asm.
File
Runs on one
processor, but
generates code for
another
Assemble
r
Binary
File
Linker
Library
Exec.
File
Implementation Phase
Debugger
Profiler
Verification Phase
Assemblers
Linkers
Debuggers
Profilers
27
Running a Program
If development processor is different than target, how
can we run our compiled code? Two options:
Download to target processor
Simulate
Simulation
One method: Hardware description language
But slow, not always available
28
typedef struct {
unsigned char first_byte, second_byte;
} instruction;
instruction program[1024];
unsigned char memory[256];
//instruction memory
//data memory
}
return 0;
int main(int argc, char *argv[]) {
FILE* ifs;
If( argc != 2 ||
(ifs = fopen(argv[1], rb) == NULL ) {
return 1;
}
if (run_program(fread(program,
sizeof(program) == 0) {
print_memory_contents();
return(0);
}
else return(-1);
int pc = -1;
unsigned char reg[16], fb, sb;
while( ++pc < (num_bytes / 2) ) {
fb = program[pc].first_byte;
sb = program[pc].second_byte;
switch( fb >> 4 ) {
case 0: reg[fb & 0x0f] = memory[sb]; break;
case 1: memory[sb] = reg[fb & 0x0f]; break;
case 2: memory[reg[fb & 0x0f]] =
reg[sb >> 4]; break;
case 3: reg[fb & 0x0f] = sb; break;
case 4: reg[fb & 0x0f] += reg[sb >> 4]; break;
case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break;
case 6: pc += sb; break;
default: return 1;
29
ISS
(b)
Implementation
Phase
Verification
Phase
Implementation
Phase
Development processor
Debugger/
ISS
Emulator
External tools
Download to board
Use device programmer
Runs in real environment, but
not controllable
Compromise: emulator
Programmer
Verification
Phase
Application-Specific Instruction-Set
Processors (ASIPs)
General-purpose processors
Sometimes too general to be effective in demanding
application
e.g., video processing requires huge video buffers and operations
on large arrays of data, inefficient on a GPP
Still programmable
Embedded Systems Design: A Unified
Hardware/Software Introduction, (c) 2000 Vahid/Givargis
31
Microcontroller features
On-chip peripherals
Timers, analog-digital converters, serial communication, etc.
Tightly integrated for programmer, typically part of register space
32
DSP features
Several instruction execution units
Multiple-accumulate single-cycle instruction, other instrs.
Efficient vector operations e.g., add two arrays
Vector ALUs, loop buffers, etc.
33
34
Chapter Summary
General-purpose processors
Good performance, low NRE, flexible
ASIPs
Microcontrollers, DSPs, network processors, more customized ASIPs
35