
CPS 104 Computer Organization and Programming, Lecture 25: Cache Memory

March 19, 2004 Gershon Kedem http://kedem.duke.edu/cps104/Lectures


Admin.

Homework 6 is posted.
- The due date was extended to March 29.
- No further extensions!
- The second part of this assignment will be posted Monday, March 22.
- This assignment is harder than it looks.
- These two assignments carry more weight than the other homework assignments.
- Please start ASAP!


Who Cares about Memory Hierarchy?


[Figure: The CPU-memory gap. Performance (10^6 cycles/s, log scale from 1 to 1000) plotted against year, 1980-2000, for CPU and DRAM; the CPU curve climbs far faster than the DRAM curve, so the gap widens steadily.]

Review: The CPU-Memory Speed Gap

To illustrate the problem, consider typical delays, measured in ns:
- Clock period: 0.3 ns
- Instructions: 1-4 instructions/clock (4-way super-scalar)
- On-chip small, fast SRAM (Level-1 cache): 0.3-0.6 ns (1-2 clocks)
- On-chip large, fast SRAM (Level-2 cache): 4-6 ns (12-18 clocks)
- Off-chip large, fast SRAM (Level-3 cache): 7-14 ns (20-40 clocks)
- Off-chip large, slow DRAM (main memory): 90-120 ns (270-360 clocks)

Question: How often does the computer access memory?



The Motivation for Caches


[Diagram: the Processor talks to a small Cache, which sits in front of the DRAM main memory (the Memory System).]

Motivation:
- Large memories (DRAM) are slow.
- Small memories (SRAM) are fast.
Make the average access time shorter by:
- Servicing most accesses from a small, fast memory.
- Reducing the bandwidth required of the large memory.


Levels of the Memory Hierarchy


Level           Capacity         Access Time   Cost                    Transfer unit (to next level)   Managed by
CPU Registers   100s of bytes    <10s ns                               Instr. operands, 1-8 bytes      program/compiler
Cache           K bytes          10-100 ns     ~$.0005/bit             Blocks, 8-128 bytes             cache controller
Main Memory     M bytes          100-200 ns    ~$10^-7/byte            Pages, 512-8K bytes             OS
Disk            10-100 G bytes   ms            ~$10^-8 - 10^-10/byte   Files, Mbytes                   user/operator
Tape            infinite         sec-min       ~$10^-11/byte

Upper levels (toward the registers) are faster; lower levels (toward tape) are larger.

The Principle of Locality


[Figure: probability of reference plotted across the address space (0 to 2^n); references concentrate in a few small regions.]

The Principle of Locality:
- Programs access a relatively small portion of the address space at any instant of time.
- Example: 90% of the time is spent in 10% of the code.
Two different types of locality:
- Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon.
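As a small illustrative sketch (not from the original slides), the C loop below exhibits both kinds of locality: the running total sum is reused on every iteration (temporal locality), and the array elements sit at consecutive addresses (spatial locality), so a cache that fetches whole blocks services most of these accesses quickly.

#include <stdio.h>

int main(void) {
    static int a[1024];   /* elements at consecutive addresses: spatial locality             */
    long sum = 0;         /* the same variable is reused every iteration: temporal locality  */

    for (int i = 0; i < 1024; i++) {
        sum += a[i];      /* neighboring elements usually share a cache block */
    }

    printf("sum = %ld\n", sum);
    return 0;
}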

Cache memory

Cache memory is a small, fast memory buffer that holds blocks of data (and instructions). Because cache memory is close to the processor, it has low access latency. The data blocks in the cache are the ones that were recently used by the processor and that (hopefully) will be used by the processor again. When accessing data, the hardware always checks first to see whether the data is in the cache before looking further down the memory hierarchy.


Memory Hierarchy: Principles of Operation

At any given time, data is copied between only two adjacent levels:
- Upper level (cache): the level closer to the processor. Smaller, faster, and uses more expensive technology.
- Lower level (memory): the level further away from the processor. Bigger, slower, and uses less expensive technology.
Block: the minimum unit of information that can either be present or not present in the two-level hierarchy.
[Diagram: blocks such as Blk X and Blk Y move between the upper level (cache) and the lower level (memory), while data flows to and from the processor through the cache.]

Memory Hierarchy: Terminology

Hit: the data appears in some block in the upper level (example: Block X).
- Hit Rate: the fraction of memory accesses found in the upper level.
- Hit Time: the time to access the upper level, which consists of the RAM access time plus the time to determine hit/miss.
Miss: the data must be retrieved from a block in the lower level (Block Y).
- Miss Rate = 1 - (Hit Rate)
- Miss Penalty = time to replace a block in the upper level + time to deliver the block to the processor.
Hit Time << Miss Penalty



Direct Mapped Cache

A direct mapped cache is an array of fixed-size blocks. Each block holds consecutive bytes of main-memory data. The Tag Array holds the block's memory address, and a valid bit associated with each cache block tells whether the data is valid.
- Cache Index: the location of a block (and its tag) in the cache.
- Block Offset: the byte location within the cache block.

Cache-Index  = (<Address> mod Cache_Size) / Block_Size
Block-Offset = <Address> mod Block_Size
Tag          = <Address> / Cache_Size
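As a minimal sketch (not part of the slides), the C program below evaluates these three formulas for a made-up address, assuming a hypothetical 1024-byte cache with 32-byte blocks; with power-of-two sizes the divisions and mod operations reduce to simple bit selection.

#include <stdint.h>
#include <stdio.h>

#define CACHE_SIZE 1024u   /* assumed cache size in bytes (illustration only) */
#define BLOCK_SIZE 32u     /* assumed block size in bytes (illustration only) */

int main(void) {
    uint32_t addr = 0x00001234u;                              /* made-up example address     */

    uint32_t block_offset = addr % BLOCK_SIZE;                /* byte within the block       */
    uint32_t cache_index  = (addr % CACHE_SIZE) / BLOCK_SIZE; /* which block of the cache    */
    uint32_t tag          = addr / CACHE_SIZE;                /* identifies the memory block */

    printf("tag=0x%x index=%u offset=%u\n",
           (unsigned)tag, (unsigned)cache_index, (unsigned)block_offset);
    return 0;
}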

The Simplest Cache: Direct Mapped Cache


[Diagram: a 16-location memory (addresses 0 through F) feeding a 4-byte direct mapped cache with Cache Index values 0 through 3.]

Location 0 can be occupied by data from:
- Memory location 0, 4, 8, ... etc.
- In general: any memory location whose 2 LSBs of the address are 00.
- Address<1:0> => cache index.
Which one should we place in the cache? How can we tell which one is in the cache?

Direct Mapped Cache (Cont.)

For a cache of 2^M bytes with a block size of 2^L bytes:
- There are 2^(M-L) cache blocks.
- The lowest L bits of the address are the Block-Offset bits.
- The next (M - L) bits are the Cache-Index bits.
- The remaining (32 - M) bits are the Tag bits.

Data address layout:  | Tag: 32-M bits | Cache Index: M-L bits | Block Offset: L bits |

Example: a 1-KB cache with 32-byte blocks:

Cache-Index  = (<Address> mod 1024) / 32
Block-Offset = <Address> mod 32
Tag          = <Address> / 1024

Address layout:  | Tag: 22 bits | Cache Index: 5 bits | Block Offset: 5 bits |

[Diagram: a direct mapped cache with 32 cache blocks; each entry holds a valid bit, a 22-bit cache tag, and a 32-byte data block (Byte 0 through Byte 31). Note that 1K = 2^10 = 1024 and 2^5 = 32.]
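In hardware the divisions above become shifts and masks. The following sketch (an illustration, not from the slides) applies the 22/5/5 split with bit operations; the example address is chosen so that it matches the cache-hit example a few slides later (tag 0x50, index 0x01, byte select 0x08).

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr = 0x00014028u;                 /* example address for illustration */

    uint32_t byte_select = addr & 0x1Fu;         /* lowest 5 bits                  */
    uint32_t cache_index = (addr >> 5) & 0x1Fu;  /* next 5 bits (bits 5 through 9) */
    uint32_t cache_tag   = addr >> 10;           /* upper 22 bits                  */

    /* For 0x00014028: byte_select = 0x08, cache_index = 0x01, cache_tag = 0x50. */
    printf("tag=0x%x index=0x%x byte=0x%x\n",
           (unsigned)cache_tag, (unsigned)cache_index, (unsigned)byte_select);
    return 0;
}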

Example: 1KB Direct Mapped Cache with 32B Blocks

For a 1024-byte (2^10) cache with 32-byte blocks:
- The uppermost 22 = (32 - 10) address bits are the Cache Tag.
- The lowest 5 address bits are the Byte Select (block size = 2^5).
- The next 5 address bits (bit 5 - bit 9) are the Cache Index.
[Diagram: an example address with Cache Tag 0x50, Cache Index 0x01, and Byte Select 0x00; the cache holds 32 entries, each with a valid bit, a cache tag (stored as part of the cache state), and a 32-byte data block.]

Example: 1K Direct Mapped Cache


[Diagram: the address presented has Cache Tag 0x0002fe, Cache Index 0x00, and Byte Select 0x00. Entry 0 of the cache is invalid (valid bit 0), while entries 1 and 2 hold tags 0x000050 and 0x004440. The tag comparison fails at index 0, so this access is a Cache Miss.]

Example: 1K Direct Mapped Cache


[Diagram: after the miss, the new block of data is loaded into index 0, its tag is set to 0x0002fe, and its valid bit is set to 1.]

Example: 1K Direct Mapped Cache


[Diagram: the next address has Cache Tag 0x000050, Cache Index 0x01, and Byte Select 0x08. Index 1 is valid and its stored tag is 0x000050, so the comparison succeeds: a Cache Hit. The requested byte is selected from the 32-byte block.]

Example: 1K Direct Mapped Cache


[Diagram: the next address has Cache Tag 0x002450, Cache Index 0x02, and Byte Select 0x04. Index 2 is valid but its stored tag is 0x004440, which does not match, so this access is a Cache Miss.]

Example: 1K Direct Mapped Cache


[Diagram: the block at index 2 is replaced: a new block of data is loaded and the stored tag becomes 0x002450.]

Block Size Tradeoff

In general, a larger block size takes advantage of spatial locality, BUT:
- A larger block size means a larger miss penalty: it takes longer to fill up the block.
- If the block size is too big relative to the cache size, the miss rate will go up: there are too few cache blocks.
In general, the Average Access Time is:

  Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate


[Figure: three sketches plotted against Block Size. Miss Rate first falls as larger blocks exploit spatial locality, then rises when too few blocks remain (compromising temporal locality); Miss Penalty grows steadily with block size; Average Access Time therefore dips and then climbs again due to the increased miss penalty and miss rate.]
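A brief hedged example (not in the slides) that plugs made-up numbers into the average access time formula above: a 1-cycle hit time, a 40-cycle miss penalty, and a 5% miss rate.

#include <stdio.h>

int main(void) {
    /* Average access time = hit_time*(1 - miss_rate) + miss_penalty*miss_rate.
       All numbers are invented for illustration. */
    double hit_time     = 1.0;    /* cycles               */
    double miss_penalty = 40.0;   /* cycles               */
    double miss_rate    = 0.05;   /* 5% of accesses miss  */

    double amat = hit_time * (1.0 - miss_rate) + miss_penalty * miss_rate;
    printf("average access time = %.2f cycles\n", amat);   /* 0.95 + 2.00 = 2.95 */
    return 0;
}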

An N-way Set Associative Cache

N-way set associative: N entries for each Cache Index
- N direct mapped caches operating in parallel.
Example: a two-way set associative cache
- The Cache Index selects a set from the cache.
- The two tags in the set are compared in parallel.
- The data is selected based on the tag comparison result.
[Diagram: a two-way set associative cache. The Cache Index selects one block from each way; the two stored tags are compared against the address tag in parallel, the comparison results are combined (OR) to produce the Hit signal, and a multiplexer selects the Cache Block data from the matching way.]
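The C sketch below (an assumption-laden software model, not the slides' hardware) captures the same lookup steps for a hypothetical two-way set associative cache of 16 sets with 32-byte blocks: the index picks a set, both tags are checked, and a hit returns the matching block.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SETS   16u   /* assumed geometry: 16 sets x 2 ways x 32-byte blocks = 1 KB */
#define NUM_WAYS   2u
#define BLOCK_SIZE 32u

struct line { bool valid; uint32_t tag; uint8_t data[BLOCK_SIZE]; };
static struct line cache[NUM_SETS][NUM_WAYS];

/* Returns a pointer to the block's data on a hit, or NULL on a miss. */
static uint8_t *lookup(uint32_t addr) {
    uint32_t index = (addr / BLOCK_SIZE) % NUM_SETS;   /* Cache Index selects the set        */
    uint32_t tag   = addr / (BLOCK_SIZE * NUM_SETS);   /* tag identifies the memory block    */

    for (unsigned way = 0; way < NUM_WAYS; way++) {    /* hardware compares the two stored   */
        struct line *l = &cache[index][way];           /* tags in parallel; software loops   */
        if (l->valid && l->tag == tag)
            return l->data;                            /* hit: the mux would select this way */
    }
    return NULL;                                       /* miss in both ways */
}

int main(void) {
    return lookup(0x00014028u) == NULL ? 0 : 1;        /* cold cache: this made-up address misses */
}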

Advantages of Set Associative Cache

- Higher hit rate for the same cache size.
- Fewer conflict misses.
- Can have a larger cache but keep the index small (the same size as the virtual page index).


Disadvantage of Set Associative Cache

N-way set associative cache versus direct mapped cache:
- N comparators vs. 1.
- Extra MUX delay for the data.
- The data arrives AFTER the Hit/Miss decision and set selection.
In a direct mapped cache, the cache block is available BEFORE the Hit/Miss decision:
- It is possible to assume a hit and continue, recovering later if it was a miss.
[Diagram: the same two-way set associative cache organization as on the previous slide.]

Another Extreme Example: Fully Associative Cache

Fully associative cache: push the set associative idea to its limit!
- Forget about the Cache Index.
- Compare the Cache Tags of all cache entries in parallel.
- Example: with a block size of 32 bytes, we need N 27-bit comparators.
By definition, Conflict Miss = 0 for a fully associative cache.
[Diagram: a fully associative cache. The address splits into a 27-bit Cache Tag and a Byte Select (e.g. 0x01); every entry holds a valid bit, a stored tag, and a 32-byte data block, and all stored tags are compared against the address tag simultaneously.]
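As a software analogy (a sketch under assumed parameters, not the slides' hardware), a fully associative lookup is simply a search over every entry; real hardware performs all N tag comparisons at once, while the loop below checks them one at a time.

#include <stdbool.h>
#include <stdint.h>

#define NUM_ENTRIES 32u    /* assumed number of cache entries (illustration only) */
#define BLOCK_SIZE  32u    /* 32-byte blocks: 5 byte-select bits, 27-bit tag      */

struct entry { bool valid; uint32_t tag; uint8_t data[BLOCK_SIZE]; };
static struct entry cache[NUM_ENTRIES];

/* Returns the index of the matching entry, or -1 on a miss. */
static int fully_assoc_lookup(uint32_t addr) {
    uint32_t tag = addr >> 5;                      /* drop the 5 byte-select bits         */
    for (unsigned i = 0; i < NUM_ENTRIES; i++) {   /* hardware: N comparators in parallel */
        if (cache[i].valid && cache[i].tag == tag)
            return (int)i;                         /* hit: any entry can hold any block   */
    }
    return -1;                                     /* miss: no stored tag matched         */
}

int main(void) {
    return fully_assoc_lookup(0x00000020u) < 0 ? 0 : 1;   /* cold cache: expect a miss */
}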

Sources of Cache Misses

Compulsory (cold start, process migration, first reference): the first access to a block.
- Cold fact of life: not a whole lot you can do about it.
Conflict (collision):
- Multiple memory locations map to the same cache location (see the sketch after this list).
- Solution 1: increase the cache size.
- Solution 2: increase the associativity.
Capacity:
- The cache cannot contain all the blocks accessed by the program.
- Solution: increase the cache size.
Invalidation: another process (e.g., I/O) updates memory.
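The sketch referenced above (an illustration with assumed parameters, reusing the earlier 1-KB direct mapped cache with 32-byte blocks) shows two made-up addresses exactly 1 KB apart: they share a cache index but differ in tag, so alternating between them causes a conflict miss on every access even though the rest of the cache is empty.

#include <stdint.h>
#include <stdio.h>

/* Assumed geometry: 1-KB direct mapped cache, 32-byte blocks (5 offset bits, 5 index bits). */
static uint32_t index_of(uint32_t addr) { return (addr >> 5) & 0x1Fu; }
static uint32_t tag_of(uint32_t addr)   { return addr >> 10; }

int main(void) {
    uint32_t a = 0x00000040u;      /* made-up addresses exactly 0x400 (1 KB) apart */
    uint32_t b = 0x00000440u;

    /* Same index (2) but different tags (0 vs. 1): the access pattern a, b, a, b, ...
       evicts and reloads the same cache block on every reference in a direct mapped cache. */
    printf("a: index=%u tag=%u   b: index=%u tag=%u\n",
           (unsigned)index_of(a), (unsigned)tag_of(a),
           (unsigned)index_of(b), (unsigned)tag_of(b));
    return 0;
}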


Sources of Cache Misses

                    Direct Mapped    N-way Set Associative    Fully Associative
Cache Size          Big              Medium                   Small
Compulsory Miss     Same             Same                     Same
Conflict Miss       High             Medium                   Zero
Capacity Miss       Low(er)          Medium                   High
Invalidation Miss   Same             Same                     Same

Note: If you are going to run billions of instructions, compulsory misses are insignificant.

