Main Memory
Simple: CPU, Cache, Bus, and Memory are the same width (32 bits)
Wide: CPU/Mux 1 word; Mux/Cache, Bus, Memory N words (Alpha: 64 bits & 256 bits)
Interleaved: CPU, Cache, Bus 1 word; Memory N Modules (4 Modules); example shows word interleave
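As a minimal sketch of what word interleaving means for the 4-module example, the snippet below maps a word address to a module by taking the address modulo the number of modules. The function name `bank_and_row` and the sample addresses are illustrative assumptions, not part of the slides.

```python
NUM_BANKS = 4  # the slide's example uses 4 memory modules


def bank_and_row(word_addr, num_banks=NUM_BANKS):
    """Map a word address to (bank number, row inside that bank)
    under word interleaving: consecutive words go to consecutive banks."""
    return word_addr % num_banks, word_addr // num_banks


# The four words of one aligned 4-word block (word addresses 8..11)
# land in four different banks, so their accesses can overlap.
block = [bank_and_row(a) for a in range(8, 12)]
print(block)  # [(0, 2), (1, 2), (2, 2), (3, 2)]
```

Because each word of the block sits in a different module, all four modules can start their access at the same time and only the data transfers are serialized on the one-word bus.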
[Figure: three memory organizations: one-word-wide memory, Wide Memory, and Interleaved Memory. Each shows CPU, Cache, BUS, and memory (M); the wide organization adds a MUX.]
Timing: 1 cycle to send address, 6 cycles access time, 1 cycle to send data
Block access time (miss penalty, M.P.), assuming Cache Block is 4 words:
Simple M.P. = 4 x (1+6+1) = 32
Wide M.P. = 1 + 6 + 1 = 8
Interleaved M.P. = 1 + 6 + (4x1) = 11
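The three miss penalties above can be reproduced with a short sketch. The cycle counts (1, 6, 1) come from the slide; the function names and the `width_words` / `num_banks` parameters are assumptions introduced for illustration.

```python
SEND_ADDR, ACCESS, SEND_DATA = 1, 6, 1  # cycles, as given on the slide


def simple_penalty(block_words):
    """One-word-wide memory: every word pays the full address/access/transfer cost."""
    return block_words * (SEND_ADDR + ACCESS + SEND_DATA)


def wide_penalty(block_words, width_words):
    """Wide memory: each access moves width_words words at once."""
    transfers = -(-block_words // width_words)  # ceiling division
    return transfers * (SEND_ADDR + ACCESS + SEND_DATA)


def interleaved_penalty(block_words, num_banks):
    """Interleaved memory: banks access in parallel, words return one per cycle.
    Assumes the block fits in one pass over the banks."""
    assert block_words <= num_banks
    return SEND_ADDR + ACCESS + block_words * SEND_DATA


print(simple_penalty(4))          # 32
print(wide_penalty(4, 4))         # 8
print(interleaved_penalty(4, 4))  # 11
```

The interleaved case gets most of the benefit of the wide organization while keeping a one-word bus and cache interface.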
[Figure: Address mapping across Bank 0, Bank 1, Bank 2, Bank 3]
bus width
need a multiplexer to get the desired word from a block
2. Interleaved Memory
Interleaved Memory and Wide Memory
Consider the following description of a machine and its cache performance:
mem bus width = 1 word = 32 bits
block size (words):   1    2    4
miss rate (%):        3    2    1
memory accesses/instr = 1.2
cache miss penalty = 8 (1+6+1) cycles
average CPI (ignoring cache misses) = 2
What is the improvement over the base machine (block size = 1) in performance of interleaving 2-way and 4-way versus doubling the width of memory and the bus?
Interleaved Memory
Answer
Effective CPI = CPI + (mem ref/instr. x miss rate x miss penalty)
= 2 + (1.2 x (0.03 for 1-way, 0.02 for 2-way, or 0.01 for 4-way) x miss penalty)
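A minimal sketch of how the formula could be evaluated for each option follows. The miss rates, base CPI, and references per instruction are from the problem statement; the pairing of each configuration with a block size and the miss penalties (derived from the 1/6/1 cycle timing) are assumptions made here, since the slide does not spell them out.

```python
BASE_CPI = 2.0
REFS_PER_INSTR = 1.2
MISS_RATE = {1: 0.03, 2: 0.02, 4: 0.01}  # keyed by block size in words
ACCESS_CYCLES = 1 + 6 + 1                # one full memory access


def cpi(block_words, miss_penalty):
    """Effective CPI = base CPI + refs/instr x miss rate x miss penalty."""
    return BASE_CPI + REFS_PER_INSTR * MISS_RATE[block_words] * miss_penalty


# Assumed penalties: interleaving overlaps the accesses (1 + 6 + n x 1);
# a 64-bit bus moves two words per 8-cycle access.
results = {
    "base, 1-word block": cpi(1, ACCESS_CYCLES),
    "2-way interleaved, 2-word block": cpi(2, 1 + 6 + 2),
    "4-way interleaved, 4-word block": cpi(4, 1 + 6 + 4),
    "64-bit bus, 2-word block": cpi(2, ACCESS_CYCLES),
    "64-bit bus, 4-word block": cpi(4, 2 * ACCESS_CYCLES),
}

base = results["base, 1-word block"]
for name, value in results.items():
    print(f"{name}: CPI = {value:.3f}, speedup over base = {base / value:.3f}")
```

Under these assumptions the CPIs come out to 2.288 (base), 2.216 and 2.132 (2-way and 4-way interleaving), and 2.192 for the doubled bus with either block size. With the same clock rate and instruction count, that is roughly a 3% and 7% improvement for interleaving versus about 4% for doubling the width.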
[Figure: address fields: Superbank Offset, Bank Number, Bank Offset]
The inner loop does column processing, which causes bank conflicts.
[Figure: Bank0, Bank1, ..., Bank127, ..., Bank511; label: "Column elements are in"]
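The bank-conflict effect can be sketched by counting how many distinct banks a column walk touches. The array shape (256 rows of 512 words, stored row-major) is a hypothetical example, and the bank counts 128 and 512 are inferred from the figure labels Bank127 and Bank511; neither is stated outright in the slides.

```python
def banks_touched(stride_words, num_accesses, num_banks):
    """Return the set of banks hit by num_accesses accesses that are
    stride_words apart, with bank = word address mod num_banks."""
    return {(i * stride_words) % num_banks for i in range(num_accesses)}


ROW_LENGTH = 512  # hypothetical: words per row, so a column walk has stride 512
NUM_ROWS = 256    # hypothetical: elements in one column

print(len(banks_touched(ROW_LENGTH, NUM_ROWS, 128)))  # 1
print(len(banks_touched(ROW_LENGTH, NUM_ROWS, 512)))  # 1
print(len(banks_touched(1, NUM_ROWS, 128)))           # 128
```

When the stride is a multiple of the number of banks, every access in the inner loop lands in the same bank and the interleaving gives no overlap. A unit-stride (row) walk spreads the same accesses across all banks.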
                  Seq. Interleaved     Modulo Interleaved
Bank Number:       0    1    2          0    1    2
Addr in Bank: 0    0    1    2          0   16    8
              1    3    4    5          9    1   17
              2    6    7    8         18   10    2
              3    9   10   11          3   19   11
              4   12   13   14         12    4   20
              5   15   16   17         21   13    5
              6   18   19   20          6   22   14
              7   21   22   23         15    7   23
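The modulo-interleaved column of the table is consistent with the mapping bank = address mod 3 and address in bank = address mod 8. The sketch below rebuilds the table under that assumption; the names `modulo_interleave` and `table` are illustrative. Because 3 and 8 share no common factor, every address gets its own (bank, offset) slot.

```python
NUM_BANKS = 3        # banks in the table
WORDS_PER_BANK = 8   # addresses within each bank


def modulo_interleave(addr):
    """Assumed mapping: bank = addr mod 3, address in bank = addr mod 8."""
    return addr % NUM_BANKS, addr % WORDS_PER_BANK


# table[offset][bank] holds the word address stored in that slot.
table = [[None] * NUM_BANKS for _ in range(WORDS_PER_BANK)]
for addr in range(NUM_BANKS * WORDS_PER_BANK):
    bank, offset = modulo_interleave(addr)
    assert table[offset][bank] is None  # no two addresses share a slot
    table[offset][bank] = addr

for offset, row in enumerate(table):
    print(offset, row)
# first line printed: 0 [0, 16, 8]
```

Both index computations are simple modulo operations on the address, and no slot is left empty or used twice.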
e.g., Video RAM for frame buffers, DRAM + fast serial output