Chapter 2
MEMORY HIERARCHY DESIGN
Introduction
Memory Hierarchy
Processor -> L1 Cache -> L2 Cache -> L3 Cache -> Main Memory -> Hard Drive or Flash (latency increases at each level)
[Figure: PROCESSOR; L1: I-Cache and D-Cache; L2: I-Cache and D-Cache, or U-Cache; L3: U-Cache; Main: Main Memory]
Intel Core i7 can generate two data memory references per core per clock
Four cores and 3.2 GHz clock
25.6 billion 64-bit data references/second +
12.8 billion 128-bit instruction references/second
= 409.6 GB/s!
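The arithmetic behind the 409.6 GB/s figure can be checked with a short sketch. It assumes one 128-bit instruction reference per core per clock, which is what the 12.8 billion figure implies; the constant and function names are illustrative only.

```python
# Peak bandwidth demand implied by the figures on the slide.
CORES = 4
CLOCK_HZ = 3_200_000_000        # 3.2 GHz
DATA_REFS_PER_CLOCK = 2         # 64-bit data references per core per clock
INSTR_REFS_PER_CLOCK = 1        # assumed: one 128-bit instruction reference per core per clock
DATA_REF_BYTES = 8              # 64 bits
INSTR_REF_BYTES = 16            # 128 bits


def peak_demand():
    """Return (data refs/s, instruction refs/s, total bytes/s)."""
    data_refs = CORES * CLOCK_HZ * DATA_REFS_PER_CLOCK
    instr_refs = CORES * CLOCK_HZ * INSTR_REFS_PER_CLOCK
    total_bytes = data_refs * DATA_REF_BYTES + instr_refs * INSTR_REF_BYTES
    return data_refs, instr_refs, total_bytes


data_refs, instr_refs, total_bytes = peak_demand()
print(f"{data_refs / 1e9:.1f} billion data references/second")
print(f"{instr_refs / 1e9:.1f} billion instruction references/second")
print(f"{total_bytes / 1e9:.1f} GB/s")
```

Each half contributes 204.8 GB/s, which is why the total is exactly twice either term.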
Intel Core i7
4 cores 8 threads
2.5-3.5 GHz (Normal); 3.7 or 3.9 GHz (Turbo)
Intel Core i5
4 cores 4 threads (or 2 cores 4 threads)
2.3-3.4 GHz (Normal); 3.2-3.8 GHz (Turbo)
Intel Core i3
2 cores 4 threads
3.3 or 3.4 GHz
Placement Problem
[Figure: Main Memory and Cache Memory]
Placement Policies
[Figure: memory block numbers 0-31; cache block numbers 0-7]
Block can be placed in any location in cache (fully associative placement).
[Figure: memory block numbers 0-31; cache block numbers 0-7]
[Figure: cache block numbers 0-7 grouped into sets, set numbers 0-3]
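The three placement policies in these figures differ only in how many cache locations a memory block may use. A minimal sketch, assuming the 8-block cache from the figures and the usual modulo mapping (set = block number mod number of sets); the function name `candidate_frames` and the choice of block 12 are illustrative.

```python
def candidate_frames(block_number, num_cache_blocks, ways):
    """Return the cache block numbers where a memory block may be placed.

    ways == 1                 -> direct mapped
    1 < ways < num_blocks     -> set associative
    ways == num_cache_blocks  -> fully associative
    """
    num_sets = num_cache_blocks // ways
    set_index = block_number % num_sets       # modulo mapping (assumed)
    first = set_index * ways                  # first block of that set
    return list(range(first, first + ways))


direct = candidate_frames(12, 8, ways=1)      # one possible location
two_way = candidate_frames(12, 8, ways=2)     # either block of one set
fully = candidate_frames(12, 8, ways=8)       # anywhere in the cache

print("direct mapped:    ", direct)
print("2-way set assoc.: ", two_way)
print("fully associative:", fully)
```

With 8 cache blocks and 2 ways there are 4 sets, which matches the set numbers 0-3 in the figure.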
Write-through
Write-back
Dirty bit(s)
Not needed in I-caches.
Not needed in a write-through D-cache.
A write-back D-cache needs it.
Write back
[Figure: CPU, cache, main memory]
Write through
[Figure: CPU, cache, main memory]
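The difference between the two write policies, and the role of the dirty bit, can be sketched with a toy model that counts writes reaching main memory. This is an illustration under stated assumptions, not a model of any real cache: it is fully associative with LRU replacement and write-allocate, and the class name `TinyCache` is hypothetical.

```python
from collections import OrderedDict


class TinyCache:
    """Toy fully associative LRU cache that counts writes to main memory."""

    def __init__(self, capacity, write_back):
        self.capacity = capacity
        self.write_back = write_back
        self.lines = OrderedDict()      # block -> dirty bit, in LRU order
        self.memory_writes = 0

    def _touch(self, block):
        if block in self.lines:
            self.lines.move_to_end(block)           # mark most recently used
            return
        if len(self.lines) == self.capacity:
            _, dirty = self.lines.popitem(last=False)
            if dirty:
                self.memory_writes += 1             # write back on eviction
        self.lines[block] = False                   # allocate a clean line

    def write(self, block):
        self._touch(block)                          # write-allocate (assumed)
        if self.write_back:
            self.lines[block] = True                # only set the dirty bit
        else:
            self.memory_writes += 1                 # write through immediately

    def flush(self):
        for block in list(self.lines):
            if self.lines[block]:
                self.memory_writes += 1
                self.lines[block] = False


writes = [0, 0, 0, 1, 1, 2]
through = TinyCache(capacity=2, write_back=False)
back = TinyCache(capacity=2, write_back=True)
for b in writes:
    through.write(b)
    back.write(b)
back.flush()
print("write-through memory writes:", through.memory_writes)
print("write-back memory writes:   ", back.memory_writes)
```

Repeated writes to the same block are what write-back saves: the block goes to memory once, when it is evicted or flushed.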
Miss rate
Compulsory
Capacity
Conflict
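One common way to separate the three miss categories is to compare the real cache with a fully associative LRU cache of the same size. A minimal sketch, assuming a direct-mapped real cache and a trace of block numbers; the function name and the trace are illustrative.

```python
from collections import OrderedDict


def classify_misses(trace, num_blocks):
    """Classify each reference of a direct-mapped cache.

    compulsory: first reference to the block
    capacity:   would also miss in a fully associative LRU cache of equal size
    conflict:   misses only because of the direct-mapped placement
    """
    seen = set()
    fully_assoc = OrderedDict()        # reference model, LRU order
    direct_mapped = {}                 # cache block index -> memory block held
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for block in trace:
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            if len(fully_assoc) == num_blocks:
                fully_assoc.popitem(last=False)     # evict least recently used
            fully_assoc[block] = True
        frame = block % num_blocks
        dm_hit = direct_mapped.get(frame) == block
        direct_mapped[frame] = block
        if dm_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
        seen.add(block)
    return counts


counts = classify_misses([0, 4, 0, 4, 1, 2, 3, 5, 0, 0], num_blocks=4)
print(counts)
```

In this trace, blocks 0 and 4 map to the same cache block and evict each other, which produces the conflict misses.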
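Cache organization
CPU address: Tag <21> | Index <8> | Block offset <5>
Cache entry: Valid <1> | Tag <21> | Data <256>
[Figure: the index selects a cache entry, the stored tag is compared (=) with the address tag, and a MUX selects the requested data]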
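The field widths in the figure (tag 21 bits, index 8 bits, block offset 5 bits) describe a direct-mapped cache with 256 entries of 32 bytes. The lookup can be sketched as follows; the dictionary layout and function names are assumptions for illustration, and no data is actually stored.

```python
TAG_BITS, INDEX_BITS, OFFSET_BITS = 21, 8, 5   # widths from the figure


def split_address(addr):
    """Split an address into (tag, index, block offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


def make_cache():
    """One entry per index value: a valid bit and a tag."""
    return [{"valid": False, "tag": 0} for _ in range(1 << INDEX_BITS)]


def access(cache, addr):
    """Return True on a hit; on a miss, install the block's tag."""
    tag, index, _ = split_address(addr)
    entry = cache[index]
    hit = entry["valid"] and entry["tag"] == tag   # the "=" comparator
    if not hit:
        entry["valid"], entry["tag"] = True, tag
    return hit


cache = make_cache()
print(split_address(0x12345))
print(access(cache, 0x12345))               # cold miss
print(access(cache, 0x12340))               # same block, so a hit
print(access(cache, 0x12345 + (1 << 13)))   # same index, different tag: miss
```

The data capacity follows from the widths: 2^8 entries times 2^5 bytes is 8 KB, and 32 bytes per entry is the 256-bit data field.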
Higher associativity
Metrics:
Advanced Optimizations
2) Way Prediction
Way selection
Increases mis-prediction penalty
3) Pipelining Cache
Examples:
Pentium: 1 cycle
Pentium Pro through Pentium III: 2 cycles
Pentium 4 through Core i7: 4 cycles
4) Nonblocking Caches
5) Multibanked Caches
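A multibanked cache needs a rule for spreading blocks across banks. One simple rule is sequential interleaving, where consecutive block addresses go to consecutive banks. The sketch below assumes that rule; the 4 banks and 64-byte blocks are illustrative values, not taken from the slides.

```python
def bank_of(byte_address, block_size=64, num_banks=4):
    """Bank holding the block that contains byte_address (sequential interleaving)."""
    block_address = byte_address // block_size
    return block_address % num_banks


# Eight consecutive 64-byte blocks rotate through the four banks.
banks = [bank_of(addr) for addr in range(0, 512, 64)]
print(banks)
```

Accesses that fall in different banks can proceed in parallel, so a sequential stream keeps all banks busy.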
Early restart
No write buffering
Write buffering
8) Compiler Optimizations
Loop Interchange
Blocking
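Both compiler optimizations reorder memory accesses without changing the result. The sketch below illustrates them under stated assumptions: a toy direct-mapped cache (16 blocks of 8 elements) counts misses for a row-major array traversed in either loop order, and a blocked matrix multiply with block size `b` is compared against the plain triple loop. All names and sizes are illustrative.

```python
def count_misses(rows, cols, row_order, elems_per_block=8, num_frames=16):
    """Misses when visiting every x[i][j] of a row-major array.

    row_order=True  -> i outer, j inner (after loop interchange)
    row_order=False -> j outer, i inner (strides through memory)
    """
    frames = [None] * num_frames                 # toy direct-mapped cache
    misses = 0
    outer, inner = (rows, cols) if row_order else (cols, rows)
    for a in range(outer):
        for b in range(inner):
            i, j = (a, b) if row_order else (b, a)
            block = (i * cols + j) // elems_per_block
            frame = block % num_frames
            if frames[frame] != block:
                frames[frame] = block
                misses += 1
    return misses


def matmul_naive(y, z, n):
    return [[sum(y[i][k] * z[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul_blocked(y, z, n, b):
    """Same product, computed on b-wide submatrices to improve reuse."""
    x = [[0] * n for _ in range(n)]
    for jj in range(0, n, b):
        for kk in range(0, n, b):
            for i in range(n):
                for j in range(jj, min(jj + b, n)):
                    r = 0
                    for k in range(kk, min(kk + b, n)):
                        r += y[i][k] * z[k][j]
                    x[i][j] += r
    return x


print("column order misses:", count_misses(64, 64, row_order=False))
print("row order misses:   ", count_misses(64, 64, row_order=True))

n = 5
y = [[i + j for j in range(n)] for i in range(n)]
z = [[i * j + 1 for j in range(n)] for i in range(n)]
print(matmul_blocked(y, z, n, b=2) == matmul_naive(y, z, n))
```

In the toy model the strided order misses on every access, while the interchanged order misses once per block.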
9) Hardware Prefetching
Pentium 4 Pre-fetching
Copyright 2012, Elsevier Inc. All rights reserved.
Cache prefetch
Summary
[Figure: Core 1, Core 2, Core 3, each with split L1 I and D caches]
L1 I: 32KB 4-way
L1 D: 32KB 4-way
L2: 256KB
L3: 8MB
Performance metrics
Cycle time
Memory Technology
SRAM
DRAM
Must be refreshed periodically, every ~8 ms
Each row can be refreshed simultaneously
One transistor/bit
Address lines are multiplexed: row address first (RAS), then column address (CAS)
Amdahl:
Some optimizations:
Wider interfaces
Double data rate (DDR)
Multiple banks on each DRAM device
Memory Optimizations
DDR:
DDR2
DDR3: 1.5 V, 800 MHz
DDR4: 1-1.2 V, 1600 MHz
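Because DDR transfers data on both clock edges, the peak transfer rate is twice the clock rate. A rough sketch of the resulting peak bandwidth, assuming a 64-bit (8-byte) module data bus, which the slides do not state; the function name is illustrative.

```python
def ddr_peak_bandwidth_mb_s(clock_mhz, bus_width_bytes=8):
    """Peak MB/s for a DDR interface: two transfers per clock cycle."""
    mega_transfers_per_second = 2 * clock_mhz    # both clock edges
    return mega_transfers_per_second * bus_width_bytes


ddr3 = ddr_peak_bandwidth_mb_s(800)     # DDR3 at the 800 MHz listed above
ddr4 = ddr_peak_bandwidth_mb_s(1600)    # DDR4 at the 1600 MHz listed above
print("DDR3 @ 800 MHz: ", ddr3, "MB/s")
print("DDR4 @ 1600 MHz:", ddr4, "MB/s")
```

These are peak figures only; sustained bandwidth is lower because of row activation, refresh, and bank conflicts.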
Graphics memory:
Lower voltage
Low power mode (ignores clock, continues to refresh)
Flash Memory
Type of EEPROM
Must be erased (in blocks) before being overwritten
Non volatile
Limited number of write cycles
Cheaper than SDRAM, more expensive than disk
Slower than SRAM, faster than disk
Memory Dependability
(Hamming codes)
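How a Hamming code corrects a single flipped bit can be shown with the small (7,4) code. This is an illustrative sketch only: real memory systems use wider codes, and the function names here are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as 7 bits; parity bits sit at positions 1, 2, 4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code):
    """Return (data bits, syndrome); a nonzero syndrome is the flipped position."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1   # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
corrupted = list(word)
corrupted[4] ^= 1              # single-bit error at position 5
data, syndrome = hamming74_correct(corrupted)
print("codeword: ", word)
print("corrupted:", corrupted)
print("recovered:", data, "error at position", syndrome)
```

The syndrome is the binary position of the flipped bit, so correction is a single XOR.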
Virtual Memory
Role of architecture:
Virtual Machines