P. Bakowski
bako@ieee.org
ARM ALU
The ALU performs the arithmetic and logic functions required by the instruction set.
[Figure: ALU; labels: operands, functions]
[Figure: ARM datapath; labels: data in register, to/from memory, d[31:0], instructions, to instruction register]
[Figure: 3-stage pipeline timing; fetch, decode and execute per clock cycle]
[Figure: ARM datapath; labels: address register, PC, register bank, BS, A bus, B bus, ALU, ALU bus]
Processor performance
The time T required to execute a given program is given by:
$$T = \frac{N_{inst} \times CPI}{f_{clk}}$$
where:
N_inst - number of instructions executed
CPI - average number of clock cycles per instruction
f_clk - clock frequency
Since N_inst is constant for a given program, there are only two ways to increase performance:
increase the clock rate, f_clk;
reduce the average number of clock cycles per instruction, CPI.
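As a worked illustration of the formula above, the following minimal sketch computes T for a made-up workload and shows the two levers. The function name, the instruction count and the clock figures are assumptions chosen for the example, not values from the slides.

```python
def execution_time(n_inst: int, cpi: float, f_clk_hz: float) -> float:
    """Return T = N_inst * CPI / f_clk, in seconds."""
    if f_clk_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return n_inst * cpi / f_clk_hz


# Hypothetical workload: one million instructions.
N_INST = 1_000_000

baseline = execution_time(N_INST, cpi=2.0, f_clk_hz=100e6)      # reference point
faster_clock = execution_time(N_INST, cpi=2.0, f_clk_hz=200e6)  # lever 1: raise f_clk
lower_cpi = execution_time(N_INST, cpi=1.0, f_clk_hz=100e6)     # lever 2: reduce CPI

print(baseline, faster_clock, lower_cpi)
```

In this example, doubling the clock rate and halving the CPI both halve the execution time, which is why pipelining aims at both.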
[Figure: pipelined execution through stage1, stage2 and stage3; labels: IM, DM]
Fetch stage
Pipeline: fetch, decode, execute, buffer, write
FETCH - the instruction is fetched from memory and placed in the instruction cache.
[Figure: fetch stage; labels: next PC, incrementer, I cache, to decoder]
Decode stage
DECODE - the instruction is decoded and the register operands are read from the register file. There are three operand read ports in the register file, so most instructions can obtain all their operands in one cycle.
[Figure: decode stage; labels: I - decode, register file]
Execute stage
EXECUTE - an operand is shifted and the ALU result is generated. If the instruction is a load or store, the memory address is computed in the ALU.
[Figure: execute stage; labels: register file, BS, M, ALU, +4, ALU bus]
Buffer stage
BUFFER/DATA - data memory is accessed if required. Otherwise the ALU result is simply buffered for one clock cycle to give the same pipeline flow for all instructions.
[Figure: buffer stage; labels: byte replication, D cache, +4]
Write stage
WRITE-BACK - the results generated by the instruction are written back to the register file, including any data loaded from memory.
[Figure: write stage; labels: rotation/sign extension, ALU bus, ALU, D cache]
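To make the overlap of the five stages described above concrete, here is a minimal sketch that prints which stage each instruction occupies in each clock cycle. It assumes an ideal pipeline with no stalls and one instruction issued per cycle; the function names are hypothetical.

```python
STAGES = ["fetch", "decode", "execute", "buffer", "write"]


def cycles_needed(n_instructions: int) -> int:
    """Cycles to drain n instructions through an ideal (stall-free) pipeline."""
    return n_instructions + len(STAGES) - 1


def pipeline_schedule(n_instructions: int) -> list:
    """Row c holds, for each instruction, the stage it occupies in cycle c.

    An empty string means the instruction has not started or has finished.
    """
    schedule = []
    for cycle in range(cycles_needed(n_instructions)):
        row = []
        for i in range(n_instructions):
            stage_index = cycle - i  # instruction i enters fetch in cycle i
            in_flight = 0 <= stage_index < len(STAGES)
            row.append(STAGES[stage_index] if in_flight else "")
        schedule.append(row)
    return schedule


for cycle, row in enumerate(pipeline_schedule(3)):
    print(cycle, row)
```

Once the pipeline is full, one instruction completes per cycle, so the average CPI approaches 1 under these idealised assumptions.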
Data forwarding
Data dependencies arise when an instruction needs to use the result of one of its predecessors before that result has returned to the register file.
Because instruction execution in the 5-stage pipeline is spread across three pipeline stages (execute, buffer, write), the only way to resolve data dependencies without stalling the pipeline is to introduce forwarding paths.
Forwarding paths (by-pass paths) allow intermediate results to be passed between stages as soon as they are available. In the 5-stage ARM pipeline, each of the three source operands can be forwarded from any of three intermediate result registers.
[Figure: by-pass paths from the execute, buffer and write stages back to the register file outputs]
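The dependency check described above can be sketched as follows. This is a deliberately simplified model, not the actual ARM control logic: it assumes every instruction produces its result in the execute stage (so loads are ignored), that operands are read in decode, and that a producer 1, 2 or 3 instructions ahead still holds its result in the execute, buffer or write stage respectively. The `Instr` type and the path names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]        # destination register, or None
    sources: Tuple[str, ...]   # source registers


# Assumed mapping: distance to the producer -> stage holding the result.
FORWARD_FROM = {1: "execute result", 2: "buffer result", 3: "write result"}


def forwarding_plan(program):
    """List (consumer index, register, forwarding source) for each bypassed operand."""
    plan = []
    for i, instr in enumerate(program):
        for src in instr.sources:
            # Search the nearest producer first: it holds the newest value.
            for distance in (1, 2, 3):
                j = i - distance
                if j >= 0 and program[j].dest == src:
                    plan.append((i, src, FORWARD_FROM[distance]))
                    break
            # No producer within 3 instructions: read the register file.
    return plan


program = [
    Instr("r1", ("r2", "r3")),   # r1 := r2 op r3
    Instr("r4", ("r1", "r5")),   # needs r1 one instruction later
    Instr("r6", ("r1", "r4")),   # needs r1 (distance 2) and r4 (distance 1)
    Instr("r7", ("r2",)),        # no dependency on recent results
]
print(forwarding_plan(program))
```

In this model a producer four or more instructions ahead has already written the register file, which is consistent with the three intermediate result registers mentioned on the slide.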
PC organization - compatibility
The programming behavior of the PC, implemented through r15, is based on the operational characteristics of the 3-stage ARM pipeline. The 5-stage pipeline reads the instruction operands one stage earlier than the 3-stage pipeline and would naturally obtain a different r15 value (PC+4 rather than PC+8), which is incompatible with the 3-stage design.
PC organization - solution
This problem is resolved by feeding the incremented PC value from the fetch stage directly to the register file in the decode stage, bypassing the pipeline register between the two stages. PC+4 for the next instruction is equal to PC+8 for the current instruction (4 bytes farther), so the correct r15 value is obtained without additional hardware.
[Figure: fetch stage; labels: next PC+4, incrementer, I cache, next PC+8]
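The address arithmetic behind this solution can be checked with a short sketch. The function name and the example address are assumptions made for illustration; the only facts used are the 4-byte instruction size and the PC+8 convention stated above.

```python
INSTRUCTION_BYTES = 4  # size of one ARM instruction


def r15_seen_by(instr_address: int) -> int:
    """Value read from r15 by the instruction at instr_address.

    While this instruction is in decode, the next instruction (address + 4)
    is in fetch, and the incrementer output for that instruction is its own
    address + 4, i.e. the current instruction's address + 8.
    """
    next_instr_address = instr_address + INSTRUCTION_BYTES
    incrementer_output = next_instr_address + INSTRUCTION_BYTES
    return incrementer_output


print(hex(r15_seen_by(0x8000)))  # 0x8008
```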
[Figure: memory organization in bytes and words]
[Figure: execute-stage datapath; labels: BS, M, ALU, +4, ALU bus]
[Figure: data transfer path; labels: register file, D-cache, memory]
ARM instructions
In general the ARM instructions fall into one of the following three categories:
data processing instructions
data transfer instructions
control flow instructions
[Figure: datapath; labels: register file, M, BS, ALU, +4, ALU bus]
[Figure: branch and link; label: link address]
[Figure: link address between user code and system code]
[Figure: memory-mapped devices]
[Figure: interrupts; labels: IRQ, FIQ]
ARM exceptions
The ARM architecture supports a range of:
interrupts
traps
supervisor calls
all grouped under the general heading of exceptions.
ARM exceptions
The general way of exception handling is the same in all cases:
the current state is saved by copying the PC into r14_exc and the CPSR into SPSR_exc (where exc stands for the exception type);
the processor operating mode is changed to the appropriate exception mode;
the PC is forced to a value between 0x00 and 0x1C, the particular value depending on the type of exception.
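The three steps above can be sketched as a small state transformation. The vector addresses and modes in the table are taken from the standard ARM exception vector layout rather than from these slides, and the dictionary-based processor state is a hypothetical simplification. The sketch omits details such as the exact return-address offset stored in r14 and the disabling of interrupts on entry.

```python
# Exception type -> (vector address, exception mode); standard ARM layout.
VECTORS = {
    "reset": (0x00, "svc"),
    "undefined": (0x04, "und"),
    "swi": (0x08, "svc"),
    "prefetch_abort": (0x0C, "abt"),
    "data_abort": (0x10, "abt"),
    "irq": (0x18, "irq"),
    "fiq": (0x1C, "fiq"),
}


def enter_exception(state: dict, exc: str) -> dict:
    """Return the processor state after taking exception exc.

    state is a hypothetical model: {"pc": int, "cpsr": {"mode": str, ...}}.
    """
    vector, mode = VECTORS[exc]
    new_state = dict(state)
    new_state[f"r14_{mode}"] = state["pc"]            # step 1: save the PC
    new_state[f"spsr_{mode}"] = dict(state["cpsr"])   # step 1: save the CPSR
    new_state["cpsr"] = {**state["cpsr"], "mode": mode}  # step 2: change mode
    new_state["pc"] = vector                          # step 3: force the PC
    return new_state


before = {"pc": 0x8004, "cpsr": {"mode": "usr"}}
after = enter_exception(before, "irq")
print(hex(after["pc"]), after["cpsr"]["mode"], hex(after["r14_irq"]))
```

Banking r14 and the SPSR per exception mode is what lets the handler return later by restoring the PC and CPSR from those saved copies.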
[Figure: ARM datapath; labels: address register, incrementer, PC, register bank, BS, A bus, B bus, ALU, ALU bus, d[31:0]]
Store instruction
[Figure: datapath activity, compute address; labels: ir register, a[31:0], address register, incrementer, PC, register bank, BS, A bus, B bus, ALU, ALU bus, new address, d[31:0]]
[Figure: datapath activity, store data and auto-index; same datapath labels, with the auto-index value on the ALU bus]
Branch instruction
[Figure: datapath activity, compute branch address; same datapath labels, with the branch address sent to the address register]
[Figure: datapath activity, store return address; same datapath labels, with the return address written to the register bank]
Summary
ARM register bank
ARM barrel shifter and ALU
ARM 3-stage and 5-stage pipelines
ARM programming model
ARM instructions