CPS104 Lec25.1
GK Spring 2004
Admin.
Homework 6 is posted.
- The due date was extended to March 29. No further extensions!
- The second part of this assignment will be posted Monday, March 22.
- This assignment is harder than it looks.
- These two assignments have a larger weight than the other homework assignments.
- Please start ASAP!
[Plot: processor vs. DRAM performance, 1980-2000, on a log scale - the processor-memory performance gap widens every year.]
Clock period: 0.3 ns
Instructions: 1-4 instructions/clock (4-way super-scalar)
- On-chip small-fast SRAM (Level-1 cache): 0.3-0.6 ns (1-2 clocks)
- On-chip large-fast SRAM (Level-2 cache): 4-6 ns (12-18 clocks)
- Off-chip large-fast SRAM (Level-3 cache): 7-14 ns (20-40 clocks)
- Off-chip large-slow DRAM (main memory): 90-120 ns (270-360 clocks)
[Diagram: Processor - Cache - DRAM]
Motivation:
- Large memories (DRAM) are slow.
- Small memories (SRAM) are fast.
Make the average access time shorter by:
- Servicing most accesses from a small, fast memory.
- Reducing the bandwidth required of the large memory.
The memory hierarchy, from upper level (faster) to lower level (larger):

Level     | Staging/Xfer unit | Typical size      | Managed by
Registers | Instr. operands   | 1-8 bytes         | program/compiler
Cache     | Blocks            | 8-128 bytes       | cache controller
Memory    | Pages             | 512 B - 8 KB      | OS
Disk      | Files             | Mbytes            | user/operator
Tape      |                   |                   |
[Diagram: program accesses cluster within small regions of the 2^n-byte address space over time.]
The Principle of Locality:
- Programs access a relatively small portion of the address space at any instant of time.
- Example: 90% of execution time is spent in 10% of the code.
Two different types of locality:
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon.
Cache memory
Cache memory is a small, fast memory buffer that holds blocks of data (and instructions). Cache memory is close to the processor, so it has low access latency. The data blocks in the cache are the ones that were recently used by the processor, and (hopefully) will be used by the processor again. When accessing data, the hardware always checks first to see if the data is in the cache, before looking further down the memory hierarchy.
At any given time, data is copied between only 2 adjacent levels:
- Upper level (cache): the one closer to the processor.
- Lower level (memory): the one further away from the processor.
Block: the minimum unit of information that can either be present or not present in the two-level hierarchy.
[Diagram: the processor reads Blk X from the upper level (cache); Blk Y sits in the lower level (memory).]
Hit: the data appears in some block in the upper level (example: Block X).
- Hit Rate: the fraction of memory accesses found in the upper level.
- Hit Time: the time to access the upper level, which consists of RAM access time + time to determine hit/miss.
Miss: the data needs to be retrieved from a block in the lower level (Block Y).
- Miss Rate = 1 - (Hit Rate)
- Miss Penalty = time to replace a block in the upper level + time to deliver the block to the processor.
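These quantities combine into the standard average memory access time formula, AMAT = Hit Time + Miss Rate x Miss Penalty. A minimal sketch in Python; the numbers are illustrative, not from the lecture:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in clock cycles."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(1, 0.05, 100))  # -> 6.0
```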
Direct Mapped Cache
A direct-mapped cache is an array of fixed-size blocks. Each block holds consecutive bytes of main-memory data. The tag array holds the block's memory address, and a valid bit associated with each cache block tells whether the data is valid.
- Cache Index: the location of a block (and its tag) in the cache.
- Block Offset: the byte location within the cache block.
Cache-Index = (Address mod Cache_Size) / Block_Size
Block-Offset = Address mod Block_Size
Tag = Address / Cache_Size
(all divisions are integer divisions)
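The three formulas can be transcribed directly; a sketch in Python (integer division, which is what the "/" means here for power-of-two sizes):

```python
def cache_index(addr, cache_size, block_size):
    # Cache-Index = (Address mod Cache_Size) / Block_Size
    return (addr % cache_size) // block_size

def block_offset(addr, block_size):
    # Block-Offset = Address mod Block_Size
    return addr % block_size

def tag(addr, cache_size):
    # Tag = Address / Cache_Size
    return addr // cache_size

# For a 1-KB cache with 32-byte blocks, address 0x14020:
print(tag(0x14020, 1024), cache_index(0x14020, 1024, 32), block_offset(0x14020, 32))
# -> 80 1 0   (tag 0x50, index 1, offset 0)
```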
Location 0 can be occupied by data from:
- Memory location 0, 4, 8, ... etc.
- In general: any memory location whose 2 LSBs of the address are 00.
- Address<1:0> => cache index.
Which one should we place in the cache? How can we tell which one is in the cache?
For a 2^M-byte cache with 2^L-byte blocks, there are 2^(M-L) cache blocks:
- The lowest L bits of the address are the block-offset bits.
- The next (M - L) bits are the cache-index bits.
- The remaining (32 - M) bits are the tag bits.

Data address layout:  | Tag (32-M bits) | Cache Index (M-L bits) | Block Offset (L bits) |
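For power-of-two cache and block sizes, this split can be done with shifts and masks matching the bit-field layout above; a sketch:

```python
def split_address(addr, M, L):
    """Split addr into (tag, index, offset) for a 2**M-byte cache
    with 2**L-byte blocks, assuming 32-bit addresses."""
    offset = addr & ((1 << L) - 1)               # lowest L bits
    index = (addr >> L) & ((1 << (M - L)) - 1)   # next M-L bits
    tag = addr >> M                              # top 32-M bits
    return tag, index, offset

# 1-KB cache (M=10), 32-byte blocks (L=5):
print([hex(f) for f in split_address(0x14020, 10, 5)])  # -> ['0x50', '0x1', '0x0']
```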
Example: 1-KB cache with 32-byte blocks:
Cache-Index = (Address mod 1024) / 32
Block-Offset = Address mod 32
Tag = Address / 1024

Address layout:  | Tag (22 bits) | Cache Index (5 bits) | Block Offset (5 bits) |
[Diagram: the cache holds 32 blocks; each entry has a valid bit, a 22-bit tag, and a 32-byte data block (Byte 0 ... Byte 31).]
1K = 2^10 = 1024; 2^5 = 32
For a 1024-byte (2^10) cache with 32-byte blocks:
- The uppermost 22 = (32 - 10) address bits are the Cache Tag.
- The lowest 5 address bits are the Byte Select (Block Size = 2^5).
- The next 5 address bits (bit 5 - bit 9) are the Cache Index.
[Diagram: the address splits into Cache Tag (bits 31-10, example 0x50), Cache Index (bits 9-5, example 0x01), and Byte Select (bits 4-0, example 0x00), indexing into the 32-entry cache data array.]
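The example fields (Cache Tag 0x50, Cache Index 0x01, Byte Select 0x00) determine a single address; this sketch reassembles it and checks it against the mod/div formulas:

```python
M, L = 10, 5                          # 1-KB cache (2**10), 32-byte blocks (2**5)
tag, index, offset = 0x50, 0x01, 0x00
addr = (tag << M) | (index << L) | offset
print(hex(addr))                      # -> 0x14020
assert addr // 1024 == 0x50           # Tag = Address / Cache_Size
assert (addr % 1024) // 32 == 0x01    # Cache-Index = (Address mod 1024) / 32
assert addr % 32 == 0x00              # Byte Select = Address mod 32
```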
[Slides: step-by-step animation of a direct-mapped cache read - the Cache Index selects an entry, the valid bit is checked, the stored tag is compared (=) against the address tag, and the Byte Select picks the byte within the 32-byte block.]
In general, larger block sizes take advantage of spatial locality, BUT:
- A larger block size means a larger miss penalty: it takes longer to fill the block.
- If the block size is too big relative to the cache size, there are too few cache blocks and the miss rate will go up.
[Plot: miss penalty vs. block size; miss rate vs. block size.]
N-way set associative: N entries for each Cache Index
- N direct-mapped caches operating in parallel.
Example: two-way set-associative cache
- The Cache Index selects a set from the cache.
- The two tags in the set are compared in parallel.
- Data is selected based on the tag comparison result.
[Diagram: two-way set-associative cache - the Cache Index selects one set; the two valid tags are compared in parallel (Compare, OR -> Hit), and a mux (SEL0/SEL1) selects the matching Cache Block.]
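The lookup described above can be sketched as a toy tags-only simulation. LRU replacement within a set is an assumption for illustration; the slides do not specify a replacement policy:

```python
class TwoWaySetAssocCache:
    """Toy 2-way set-associative cache: stores tags only, LRU within each set."""
    def __init__(self, num_sets, block_size):
        self.num_sets = num_sets
        self.block_size = block_size
        self.sets = [[] for _ in range(num_sets)]  # each set: up to 2 tags, MRU first

    def access(self, addr):
        block = addr // self.block_size
        index = block % self.num_sets     # Cache Index selects the set
        tag = block // self.num_sets
        ways = self.sets[index]
        if tag in ways:                   # both tags compared "in parallel"
            ways.remove(tag)
            ways.insert(0, tag)           # move to MRU position
            return True                   # hit
        if len(ways) == 2:
            ways.pop()                    # evict the LRU way
        ways.insert(0, tag)
        return False                      # miss

# Two blocks that map to the same set can coexist - no conflict miss:
c = TwoWaySetAssocCache(num_sets=16, block_size=32)
print(c.access(0), c.access(512), c.access(0))  # -> False False True
```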
Higher hit rate for the same cache size. Fewer conflict misses. Can have a larger cache but keep the index small (the same size as the virtual page index).
N-way Set Associative Cache versus Direct Mapped Cache:
- N comparators vs. 1.
- Extra MUX delay for the data.
- Data comes AFTER the Hit/Miss decision and set selection.
In a direct-mapped cache, the Cache Block is available BEFORE Hit/Miss:
- Possible to assume a hit and continue; recover later if it was a miss.
Fully Associative Cache -- push the set-associative idea to its limit!
- Forget about the Cache Index.
- Compare the Cache Tags of all cache entries in parallel.
- Example: with 32-byte blocks, we need N 27-bit comparators.
By definition: Conflict Miss = 0 for a fully associative cache.
[Diagram: fully associative cache - each entry holds a valid bit, a 27-bit tag, and a 32-byte data block; the address tag (bits 31-5) is compared against every entry at once, and bits 4-0 are the Byte Select (example: 0x01).]
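A tags-only sketch of the fully associative lookup. The class name and the LRU replacement policy are assumptions for illustration:

```python
from collections import OrderedDict

class FullyAssocCache:
    """Toy fully associative cache: no index - any block can go in any entry."""
    def __init__(self, num_blocks, block_size=32):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.tags = OrderedDict()            # tags in LRU -> MRU order

    def access(self, addr):
        tag = addr // self.block_size        # the whole block address is the tag
        hit = tag in self.tags               # conceptually N parallel comparators
        if hit:
            self.tags.move_to_end(tag)       # mark as most recently used
        else:
            if len(self.tags) == self.num_blocks:
                self.tags.popitem(last=False)  # evict LRU entry
            self.tags[tag] = None
        return hit

# Blocks that would collide in a direct-mapped cache coexist here:
c = FullyAssocCache(num_blocks=2)
print(c.access(0), c.access(1024), c.access(0))  # -> False False True
```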
Compulsory (cold start, process migration, first reference): the first access to a block.
- Cold fact of life: not a whole lot you can do about it.
Conflict (collision):
- Multiple memory locations mapped to the same cache location.
- Solution 1: increase cache size. Solution 2: increase associativity.
Capacity:
- The cache cannot contain all the blocks accessed by the program.
- Solution: increase cache size.
Invalidation: another process (e.g., I/O) updates memory.
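The split between compulsory and other misses can be illustrated with a toy direct-mapped simulation (a hypothetical helper, not from the lecture): a miss on a never-before-seen block is compulsory; any other miss is a conflict or capacity miss.

```python
def classify_misses(addresses, num_blocks, block_size):
    """Run an address trace through a toy direct-mapped cache and count
    hits, compulsory misses, and other (conflict/capacity) misses."""
    cache = {}      # index -> tag currently stored there
    seen = set()    # blocks referenced at least once
    hits = compulsory = other = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_blocks
        tag = block // num_blocks
        if cache.get(index) == tag:
            hits += 1
        else:
            if block in seen:
                other += 1          # conflict or capacity miss
            else:
                compulsory += 1     # first reference to this block
            seen.add(block)
            cache[index] = tag
    return hits, compulsory, other

# Two blocks (0 and 128) map to the same index of a 4-block cache,
# so every re-reference after the first two misses is a conflict miss:
print(classify_misses([0, 128, 0, 128], num_blocks=4, block_size=32))
# -> (0, 2, 2)
```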
Note: if you are going to run billions of instructions, compulsory misses are insignificant.