CPS104 Lec25.1
GK Spring 2004
Admin.
Homework 6 is posted.
- The due date was extended to March 29. No further extensions!
- The second part of this assignment will be posted Monday, March 22.
- This assignment is harder than it looks.
- These two assignments have a larger weight than the other homework assignments.
- Please start ASAP!
[Plot: processor vs. DRAM performance, 1980-2000, on a log scale - the processor-memory performance gap widens every year.]
Clock period: 0.3 ns
Instructions: 1-4 instructions/clock (4-way super-scalar)
- On-chip small-fast SRAM (Level-1 cache): 0.3-0.6 ns (1-2 clocks)
- On-chip large-fast SRAM (Level-2 cache): 4-6 ns (12-18 clocks)
- Off-chip large-fast SRAM (Level-3 cache): 7-14 ns (20-40 clocks)
- Off-chip large-slow DRAM (main memory): 90-120 ns (270-360 clocks)
[Diagram: Processor - Cache - DRAM]
Motivation:
- Large memories (DRAM) are slow.
- Small memories (SRAM) are fast.
Make the average access time shorter by:
- Servicing most accesses from a small, fast memory.
- Reducing the bandwidth required of the large memory.
The memory hierarchy, from upper level (faster) to lower level (larger):

Level     | Staging/Xfer unit | Typical size      | Managed by
Registers | Instr. operands   | 1-8 bytes         | program/compiler
Cache     | Blocks            | 8-128 bytes       | cache controller
Memory    | Pages             | 512 B - 8 KB      | OS
Disk      | Files             | Mbytes            | user/operator
Tape      |                   |                   |
[Diagram: program accesses cluster within small regions of the 2^n-byte address space over time.]
The Principle of Locality:
- Programs access a relatively small portion of the address space at any instant of time.
- Example: 90% of execution time is spent in 10% of the code.
Two different types of locality:
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon.
Cache memory
Cache memory is a small, fast memory buffer that holds blocks of data (and instructions). Cache memory is close to the processor, so it has low access latency. The data blocks in the cache are the ones that were recently used by the processor, and (hopefully) will be used by the processor again. When accessing data, the hardware always checks first to see if the data is in the cache, before looking further down the memory hierarchy.
At any given time, data is copied between only 2 adjacent levels:
- Upper level (cache): the one closer to the processor.
- Lower level (memory): the one further away from the processor.
Block: the minimum unit of information that can either be present or not present in the two-level hierarchy.
[Diagram: the processor reads Blk X from the upper level (cache); Blk Y sits in the lower level (memory).]
Hit: the data appears in some block in the upper level (example: Block X).
- Hit Rate: the fraction of memory accesses found in the upper level.
- Hit Time: the time to access the upper level, which consists of RAM access time + time to determine hit/miss.
Miss: the data needs to be retrieved from a block in the lower level (Block Y).
- Miss Rate = 1 - (Hit Rate)
- Miss Penalty = time to replace a block in the upper level + time to deliver the block to the processor.
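These quantities combine into the standard average memory access time formula, AMAT = Hit Time + Miss Rate x Miss Penalty. A minimal sketch in Python; the numbers are illustrative, not from the lecture:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in clock cycles."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(1, 0.05, 100))  # -> 6.0
```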
Direct Mapped Cache
A direct-mapped cache is an array of fixed-size blocks. Each block holds consecutive bytes of main-memory data. The tag array holds the block's memory address, and a valid bit associated with each cache block tells whether the data is valid.
- Cache Index: the location of a block (and its tag) in the cache.
- Block Offset: the byte location within the cache block.
Cache-Index = (Address mod Cache_Size) / Block_Size
Block-Offset = Address mod Block_Size
Tag = Address / Cache_Size
(all divisions are integer divisions)
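The three formulas can be transcribed directly; a sketch in Python (integer division, which is what the "/" means here for power-of-two sizes):

```python
def cache_index(addr, cache_size, block_size):
    # Cache-Index = (Address mod Cache_Size) / Block_Size
    return (addr % cache_size) // block_size

def block_offset(addr, block_size):
    # Block-Offset = Address mod Block_Size
    return addr % block_size

def tag(addr, cache_size):
    # Tag = Address / Cache_Size
    return addr // cache_size

# For a 1-KB cache with 32-byte blocks, address 0x14020:
print(tag(0x14020, 1024), cache_index(0x14020, 1024, 32), block_offset(0x14020, 32))
# -> 80 1 0   (tag 0x50, index 1, offset 0)
```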
Location 0 can be occupied by data from:
- Memory location 0, 4, 8, ... etc.
- In general: any memory location whose 2 LSBs of the address are 00.
- Address<1:0> => cache index.
Which one should we place in the cache? How can we tell which one is in the cache?
For a 2^M-byte cache with 2^L-byte blocks, there are 2^(M-L) cache blocks:
- The lowest L bits of the address are the block-offset bits.
- The next (M - L) bits are the cache-index bits.
- The remaining (32 - M) bits are the tag bits.

Data address layout:  | Tag (32-M bits) | Cache Index (M-L bits) | Block Offset (L bits) |
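For power-of-two cache and block sizes, this split can be done with shifts and masks matching the bit-field layout above; a sketch:

```python
def split_address(addr, M, L):
    """Split addr into (tag, index, offset) for a 2**M-byte cache
    with 2**L-byte blocks, assuming 32-bit addresses."""
    offset = addr & ((1 << L) - 1)               # lowest L bits
    index = (addr >> L) & ((1 << (M - L)) - 1)   # next M-L bits
    tag = addr >> M                              # top 32-M bits
    return tag, index, offset

# 1-KB cache (M=10), 32-byte blocks (L=5):
print([hex(f) for f in split_address(0x14020, 10, 5)])  # -> ['0x50', '0x1', '0x0']
```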
Example: 1-KB cache with 32-byte blocks:
Cache-Index = (Address mod 1024) / 32
Block-Offset = Address mod 32
Tag = Address / 1024

Address layout:  | Tag (22 bits) | Cache Index (5 bits) | Block Offset (5 bits) |
[Diagram: the cache holds 32 blocks; each entry has a valid bit, a 22-bit tag, and a 32-byte data block (Byte 0 ... Byte 31).]
1K = 2^10 = 1024; 2^5 = 32
For a 1024-byte (2^10) cache with 32-byte blocks:
- The uppermost 22 = (32 - 10) address bits are the Cache Tag.
- The lowest 5 address bits are the Byte Select (Block Size = 2^5).
- The next 5 address bits (bit 5 - bit 9) are the Cache Index.
[Diagram: the address splits into Cache Tag (bits 31-10, example 0x50), Cache Index (bits 9-5, example 0x01), and Byte Select (bits 4-0, example 0x00), indexing into the 32-entry cache data array.]
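The example fields (Cache Tag 0x50, Cache Index 0x01, Byte Select 0x00) determine a single address; this sketch reassembles it and checks it against the mod/div formulas:

```python
M, L = 10, 5                          # 1-KB cache (2**10), 32-byte blocks (2**5)
tag, index, offset = 0x50, 0x01, 0x00
addr = (tag << M) | (index << L) | offset
print(hex(addr))                      # -> 0x14020
assert addr // 1024 == 0x50           # Tag = Address / Cache_Size
assert (addr % 1024) // 32 == 0x01    # Cache-Index = (Address mod 1024) / 32
assert addr % 32 == 0x00              # Byte Select = Address mod 32
```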
[Slides: step-by-step animation of a direct-mapped cache read - the Cache Index selects an entry, the valid bit is checked, the stored tag is compared (=) against the address tag, and the Byte Select picks the byte within the 32-byte block.]
In general, larger block sizes take advantage of spatial locality, BUT:
- A larger block size means a larger miss penalty: it takes longer to fill the block.
- If the block size is too big relative to the cache size, there are too few cache blocks and the miss rate will go up.
[Plot: miss penalty vs. block size; miss rate vs. block size.]
N-way set associative: N entries for each Cache Index
- N direct-mapped caches operating in parallel.
Example: two-way set-associative cache
- The Cache Index selects a set from the cache.
- The two tags in the set are compared in parallel.
- Data is selected based on the tag comparison result.
[Diagram: two-way set-associative cache - the Cache Index selects one set; the two valid tags are compared in parallel (Compare, OR -> Hit), and a mux (SEL0/SEL1) selects the matching Cache Block.]
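The lookup described above can be sketched as a toy tags-only simulation. LRU replacement within a set is an assumption for illustration; the slides do not specify a replacement policy:

```python
class TwoWaySetAssocCache:
    """Toy 2-way set-associative cache: stores tags only, LRU within each set."""
    def __init__(self, num_sets, block_size):
        self.num_sets = num_sets
        self.block_size = block_size
        self.sets = [[] for _ in range(num_sets)]  # each set: up to 2 tags, MRU first

    def access(self, addr):
        block = addr // self.block_size
        index = block % self.num_sets     # Cache Index selects the set
        tag = block // self.num_sets
        ways = self.sets[index]
        if tag in ways:                   # both tags compared "in parallel"
            ways.remove(tag)
            ways.insert(0, tag)           # move to MRU position
            return True                   # hit
        if len(ways) == 2:
            ways.pop()                    # evict the LRU way
        ways.insert(0, tag)
        return False                      # miss

# Two blocks that map to the same set can coexist - no conflict miss:
c = TwoWaySetAssocCache(num_sets=16, block_size=32)
print(c.access(0), c.access(512), c.access(0))  # -> False False True
```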
Higher hit rate for the same cache size. Fewer conflict misses. Can have a larger cache but keep the index small (the same size as the virtual page index).
N-way Set Associative Cache versus Direct Mapped Cache:
- N comparators vs. 1.
- Extra MUX delay for the data.
- Data comes AFTER the Hit/Miss decision and set selection.
In a direct-mapped cache, the Cache Block is available BEFORE Hit/Miss:
- Possible to assume a hit and continue; recover later if it was a miss.
Fully Associative Cache -- push the set-associative idea to its limit!
- Forget about the Cache Index.
- Compare the Cache Tags of all cache entries in parallel.
- Example: with 32-byte blocks, we need N 27-bit comparators.
By definition: Conflict Miss = 0 for a fully associative cache.
[Diagram: fully associative cache - each entry holds a valid bit, a 27-bit tag, and a 32-byte data block; the address tag (bits 31-5) is compared against every entry at once, and bits 4-0 are the Byte Select (example: 0x01).]
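A tags-only sketch of the fully associative lookup. The class name and the LRU replacement policy are assumptions for illustration:

```python
from collections import OrderedDict

class FullyAssocCache:
    """Toy fully associative cache: no index - any block can go in any entry."""
    def __init__(self, num_blocks, block_size=32):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.tags = OrderedDict()            # tags in LRU -> MRU order

    def access(self, addr):
        tag = addr // self.block_size        # the whole block address is the tag
        hit = tag in self.tags               # conceptually N parallel comparators
        if hit:
            self.tags.move_to_end(tag)       # mark as most recently used
        else:
            if len(self.tags) == self.num_blocks:
                self.tags.popitem(last=False)  # evict LRU entry
            self.tags[tag] = None
        return hit

# Blocks that would collide in a direct-mapped cache coexist here:
c = FullyAssocCache(num_blocks=2)
print(c.access(0), c.access(1024), c.access(0))  # -> False False True
```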
Compulsory (cold start, process migration, first reference): the first access to a block.
- Cold fact of life: not a whole lot you can do about it.
Conflict (collision):
- Multiple memory locations mapped to the same cache location.
- Solution 1: increase cache size. Solution 2: increase associativity.
Capacity:
- The cache cannot contain all the blocks accessed by the program.
- Solution: increase cache size.
Invalidation: another process (e.g., I/O) updates memory.
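The split between compulsory and other misses can be illustrated with a toy direct-mapped simulation (a hypothetical helper, not from the lecture): a miss on a never-before-seen block is compulsory; any other miss is a conflict or capacity miss.

```python
def classify_misses(addresses, num_blocks, block_size):
    """Run an address trace through a toy direct-mapped cache and count
    hits, compulsory misses, and other (conflict/capacity) misses."""
    cache = {}      # index -> tag currently stored there
    seen = set()    # blocks referenced at least once
    hits = compulsory = other = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_blocks
        tag = block // num_blocks
        if cache.get(index) == tag:
            hits += 1
        else:
            if block in seen:
                other += 1          # conflict or capacity miss
            else:
                compulsory += 1     # first reference to this block
            seen.add(block)
            cache[index] = tag
    return hits, compulsory, other

# Two blocks (0 and 128) map to the same index of a 4-block cache,
# so every re-reference after the first two misses is a conflict miss:
print(classify_misses([0, 128, 0, 128], num_blocks=4, block_size=32))
# -> (0, 2, 2)
```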
Note: if you are going to run billions of instructions, compulsory misses are insignificant.