Exercise 5
Chapter 5: Memory Hierarchy Design and
Chapter 4: Multiprocessors and Thread-Level Parallelism
Mafijul Islam
Department of Computer Science and Engineering
December 3, 2009
Case Studies/Assignments:
Case Study 2: 5.6, 5.7
Case Study 1: 4.1, 4.2
Block size: 32 x 32
Modified C code:
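/* B is the tile edge; B = 32 for the 32 x 32 blocks above. */
for (i = 0; i < 256; i = i + B) {          /* first row of the input tile */
    for (j = 0; j < 256; j = j + B) {      /* first column of the input tile */
        for (m = 0; m < B; m++) {          /* row within the tile */
            for (n = 0; n < B; n++) {      /* column within the tile */
                output[j+n][i+m] = input[i+m][j+n];
            }
        }
    }
}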
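The blocked loop nest above can be wrapped in a function and checked against the definition of a transpose. This is a minimal sketch, not the author's code: the element type `double`, the flat row-major layout, and the names `transpose_blocked` and `is_transpose` are assumptions made for illustration, and the matrix size is assumed to be a multiple of the block size.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose an n x n row-major matrix tile by tile (tile edge b).
 * Assumption: n is a multiple of b, as with 256 and 32 in the slides. */
void transpose_blocked(size_t n, size_t b, const double *in, double *out) {
    assert(b > 0 && n % b == 0);
    for (size_t i = 0; i < n; i += b)              /* tile row origin */
        for (size_t j = 0; j < n; j += b)          /* tile column origin */
            for (size_t m = 0; m < b; m++)         /* row inside the tile */
                for (size_t k = 0; k < b; k++)     /* column inside the tile */
                    out[(j + k) * n + (i + m)] = in[(i + m) * n + (j + k)];
}

/* Return 1 if out is the transpose of in, 0 otherwise. */
int is_transpose(size_t n, const double *in, const double *out) {
    for (size_t r = 0; r < n; r++)
        for (size_t c = 0; c < n; c++)
            if (out[c * n + r] != in[r * n + c]) return 0;
    return 1;
}
```

Calling `transpose_blocked(256, 32, input, output)` on two 256 x 256 buffers corresponds to the loop in the slides. Blocking changes only the order in which elements are visited, so the result is the same as for the unblocked loop.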
4.1  Initial cache and memory state
(State = coherence state: I, S or M. Tag = address tag.)

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      I      100  00 10      S      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 10  00 18  00 20  00 28  00 30
4.1 (a)

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     S      120  00 20      I      100  00 10      S      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10
4.1 (b)

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     M      120  00 80      I      100  00 10      I      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10
4.1 (c)

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      I      100  00 10      M      120  00 80
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10
4.1 (d)

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      I      100  00 10      S      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     S      110  00 30      S      110  00 30      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 30  00 18  00 20  00 28  00 30
4.1 (e)

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      I      100  00 10      S      120  00 20
B1     M      108  00 48      M      128  00 68      I      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10
4.1 (f)

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      I      100  00 10      S      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      130  00 78      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 30  00 18  00 20  00 28  00 30
4.1 (g)

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      I      100  00 10      S      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      110  00 30      I      110  00 10      M      130  00 78
B3     I      118  00 10      S      118  00 18      I      118  00 10
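The state changes in the parts of 4.1 can be cross-checked with a small simulation. What follows is a minimal sketch under stated assumptions, not the exercise's reference model: it assumes an MSI write-invalidate snooping protocol with direct-mapped, write-back, write-allocate caches; it assumes a Modified block is written back to memory when another processor reads it; it reads the addresses as hexadecimal; and it models only the low data word (the high word stays 00). The names `reset`, `cpu_read`, `cpu_write`, and the mapping of P0, P1, P15 onto slots 0, 1, 2 are illustrative.

```c
#include <assert.h>
#include <string.h>

enum state { INV, SHR, MOD };                 /* I, S, M */
struct line { enum state st; unsigned tag; unsigned data; };

#define NCPU 3                                /* slots 0, 1, 2 = P0, P1, P15 */
#define NBLK 4
struct line cache[NCPU][NBLK];
unsigned mem[7];                              /* addresses 0x100..0x130, step 8 */

unsigned blk(unsigned a) { return (a >> 3) & 3; }            /* block index */
unsigned *memp(unsigned a) { return &mem[(a - 0x100) >> 3]; }

/* Load the initial state shown for 4.1. */
void reset(void) {
    static const struct line init[NCPU][NBLK] = {
        {{INV,0x100,0x10},{SHR,0x108,0x08},{MOD,0x110,0x30},{INV,0x118,0x10}},
        {{INV,0x100,0x10},{MOD,0x128,0x68},{INV,0x110,0x10},{SHR,0x118,0x18}},
        {{SHR,0x120,0x20},{SHR,0x108,0x08},{INV,0x110,0x10},{INV,0x118,0x10}},
    };
    static const unsigned m0[7] = {0x00,0x08,0x10,0x18,0x20,0x28,0x30};
    memcpy(cache, init, sizeof cache);
    memcpy(mem, m0, sizeof mem);
}

/* Return p's slot for address a, writing back a dirty victim first. */
struct line *evict(int p, unsigned a) {
    struct line *l = &cache[p][blk(a)];
    if (l->st == MOD && l->tag != a) *memp(l->tag) = l->data;
    return l;
}

/* Other caches snoop the request: a Modified copy is flushed to memory,
 * then every valid copy is invalidated (write) or downgraded (read). */
void snoop(int p, unsigned a, int invalidate) {
    for (int q = 0; q < NCPU; q++) {
        struct line *o = &cache[q][blk(a)];
        if (q == p || o->tag != a || o->st == INV) continue;
        if (o->st == MOD) *memp(a) = o->data;
        o->st = invalidate ? INV : SHR;
    }
}

void cpu_read(int p, unsigned a) {
    struct line *l = &cache[p][blk(a)];
    if (l->tag == a && l->st != INV) return;  /* read hit */
    l = evict(p, a);
    snoop(p, a, 0);
    l->st = SHR; l->tag = a; l->data = *memp(a);
}

void cpu_write(int p, unsigned a, unsigned v) {
    struct line *l = &cache[p][blk(a)];
    if (!(l->tag == a && l->st == MOD)) {     /* must gain ownership first */
        l = evict(p, a);
        snoop(p, a, 1);
    }
    l->st = MOD; l->tag = a; l->data = v;
}
```

For example, `reset()` followed by `cpu_read(1, 0x110)` leaves block B2 of slots 0 and 1 in the Shared state with data 30 and updates memory location 110 to 30, which agrees with the table for part d.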
Latency (cycles)  Implementation 1  Implementation 2
Nmemory           100               100
Ncache            70                130
Ninvalidate       15                15
Nwriteback        10                10
4.2 (a)
Initial state: the cache and memory contents given for 4.1.

Step 1. Stall cycles: Implementation 1: 100; Implementation 2: 100 (Nmemory).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     S      120  00 20      I      100  00 10      S      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 10  00 18  00 20  00 28  00 30

Step 2. Stall cycles: Implementation 1: 70 + 10; Implementation 2: 130 + 10 (Ncache + Nwriteback).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     S      120  00 20      I      100  00 10      S      120  00 20
B1     S      128  00 68      S      128  00 68      S      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 10  00 18  00 20  00 68  00 30

Step 3. Stall cycles: Implementation 1: 100 + 10; Implementation 2: 100 + 10 (Nmemory + Nwriteback).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     S      120  00 20      I      100  00 10      S      120  00 20
B1     S      128  00 68      S      128  00 68      S      108  00 08
B2     S      130  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 30  00 18  00 20  00 68  00 30

Total stall cycles:
Implementation 1: 100 + (70 + 10) + (100 + 10) = 290
Implementation 2: 100 + (130 + 10) + (100 + 10) = 350
4.2 (b)
Initial state: the cache and memory contents given for 4.1.

Step 1. Stall cycles: Implementation 1: 100; Implementation 2: 100 (Nmemory).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     S      100  00 00      I      100  00 10      S      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 10  00 18  00 20  00 28  00 30

Step 2. Stall cycles: Implementation 1: 15; Implementation 2: 15 (Ninvalidate).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     S      100  00 00      I      100  00 10      S      120  00 20
B1     M      108  00 48      M      128  00 68      I      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 10  00 18  00 20  00 28  00 30

Step 3. Stall cycles: Implementation 1: 10 + 100; Implementation 2: 10 + 100 (Nwriteback + Nmemory).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     S      100  00 00      I      100  00 10      S      120  00 20
B1     M      108  00 48      M      128  00 68      I      108  00 08
B2     M      130  00 78      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 30  00 18  00 20  00 28  00 30

Total stall cycles:
Implementation 1: 100 + 15 + (10 + 100) = 225
Implementation 2: 100 + 15 + (10 + 100) = 225
4.2 (c)
Initial state: the cache and memory contents given for 4.1.

Step 1. Stall cycles: Implementation 1: 100; Implementation 2: 100 (Nmemory).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      S      120  00 20      S      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 10  00 18  00 20  00 28  00 30

Step 2. Read hit. Stall cycles: Implementation 1: 0; Implementation 2: 0.
Cache and memory contents are unchanged from step 1.

Step 3. Stall cycles: Implementation 1: 100; Implementation 2: 100 (Nmemory).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      S      120  00 20      S      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      110  00 30      S      130  00 30      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 10  00 18  00 20  00 28  00 30

Total stall cycles:
Implementation 1: 100 + 0 + 100 = 200
Implementation 2: 100 + 0 + 100 = 200
4.2 (d)
Initial state: the cache and memory contents given for 4.1.

Step 1. Stall cycles: Implementation 1: 100; Implementation 2: 100 (Nmemory).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      S      100  00 00      S      120  00 20
B1     S      108  00 08      M      128  00 68      S      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 10  00 18  00 20  00 28  00 30

Step 2. Stall cycles: Implementation 1: 100 + 10; Implementation 2: 100 + 10 (Nmemory + Nwriteback).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      S      100  00 00      S      120  00 20
B1     I      108  00 08      M      108  00 48      I      108  00 08
B2     M      110  00 30      I      110  00 10      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 10  00 18  00 20  00 68  00 30

Step 3. Stall cycles: Implementation 1: 100; Implementation 2: 100 (Nmemory).

       P0                     P1                     P15
Block  State  Tag  Data       State  Tag  Data       State  Tag  Data
B0     I      100  00 10      S      100  00 00      S      120  00 20
B1     I      108  00 08      M      108  00 48      I      108  00 08
B2     M      110  00 30      M      130  00 78      I      110  00 10
B3     I      118  00 10      S      118  00 18      I      118  00 10

Memory address  100    108    110    118    120    128    130
Data            00 00  00 08  00 10  00 18  00 20  00 68  00 30

Total stall cycles:
Implementation 1: 100 + (100 + 10) + 100 = 310
Implementation 2: 100 + (100 + 10) + 100 = 310
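The stall-cycle sums in 4.2 can be reproduced mechanically from the latency table. The following is a rough sketch rather than a definitive model: the event classification (`HIT`, `MISS_MEMORY`, `MISS_CACHE`, `UPGRADE`), the `writeback` flag, and all identifiers are assumptions introduced here, obtained by matching each term in the slides' sums to a latency parameter by value.

```c
#include <assert.h>

/* Latencies in cycles, as listed for the two implementations. */
struct latency { int memory, cache, invalidate, writeback; };

const struct latency IMPL1 = {100, 70, 15, 10};
const struct latency IMPL2 = {100, 130, 15, 10};

/* How one access is serviced (hypothetical classification). */
enum event { HIT, MISS_MEMORY, MISS_CACHE, UPGRADE };

struct access {
    enum event ev;
    int writeback;   /* 1 if a dirty block is also written back */
};

/* Sum the stall cycles of a sequence of accesses. */
int stall_cycles(const struct latency *l, const struct access *seq, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        switch (seq[i].ev) {
        case HIT:         break;                        /* no stall */
        case MISS_MEMORY: total += l->memory;     break;
        case MISS_CACHE:  total += l->cache;      break;
        case UPGRADE:     total += l->invalidate; break;
        }
        if (seq[i].writeback) total += l->writeback;
    }
    return total;
}

/* The three accesses of each part, read off the terms in the sums. */
const struct access PART_A[3] = {{MISS_MEMORY, 0}, {MISS_CACHE, 1}, {MISS_MEMORY, 1}};
const struct access PART_B[3] = {{MISS_MEMORY, 0}, {UPGRADE, 0}, {MISS_MEMORY, 1}};
const struct access PART_C[3] = {{MISS_MEMORY, 0}, {HIT, 0}, {MISS_MEMORY, 0}};
const struct access PART_D[3] = {{MISS_MEMORY, 0}, {MISS_MEMORY, 1}, {MISS_MEMORY, 0}};
```

Only part (a) distinguishes the two implementations, because it is the only sequence in which a miss is served from another cache (Ncache is 70 in one implementation and 130 in the other).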