Unsigned Integers
If we have an n-digit unsigned numeral d_{n-1} d_{n-2} ... d_0 in radix (or base) r, then the value of that numeral is
\sum_{i=0}^{n-1} r^i d_i, which is just fancy notation to say that instead of a 10's or 100's place we have an r's or r^2's place.
For binary, decimal, and hex we just let r be 2, 10, and 16, respectively.
Recall also that we often have cause to write down unreasonably large numbers, and our preferred tool for
doing that is the IEC prefixing system: Ki = 2^10, Mi = 2^20, Gi = 2^30, Ti = 2^40, Pi = 2^50, Ei = 2^60, Zi = 2^70,
Yi = 2^80.
1.1
1. Convert the following numbers from their initial radix into the other two common radices:
0b10010011 = 147 = 0x93
0xD3AD = 0b1101 0011 1010 1101 = 54189
63 = 0b0011 1111 = 0x3F
0b00100100 = 36 = 0x24
0xB33F = 0b1011 0011 0011 1111 = 45887
0 = 0b0 = 0x0
39 = 0b0010 0111 = 0x27
0x7EC4 = 0b0111 1110 1100 0100 = 32452
437 = 0b0001 1011 0101 = 0x1B5
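The conversions above can be checked mechanically. The following is a minimal sketch, not part of the worksheet: the helper names `to_binary` and `from_digits` are made up for illustration, digit strings carry no `0b` or `0x` prefix, and hex digits are assumed to be uppercase.

```c
#include <assert.h>

/* Writes the binary digits of VALUE into OUT (most significant first),
 * padded to WIDTH digits. OUT must hold at least WIDTH + 1 chars. */
void to_binary(unsigned value, int width, char *out) {
    assert(width > 0 && width <= 32);
    for (int i = 0; i < width; i++) {
        unsigned bit = (value >> (width - 1 - i)) & 1u;
        out[i] = bit ? '1' : '0';
    }
    out[width] = '\0';
}

/* Evaluates a digit string in radix R as the sum of r^i * d_i,
 * accumulated left to right (Horner's rule). */
unsigned from_digits(const char *digits, unsigned r) {
    assert(r >= 2 && r <= 16);
    unsigned value = 0;
    for (const char *p = digits; *p; p++) {
        unsigned d = (*p >= '0' && *p <= '9') ? (unsigned)(*p - '0')
                                              : (unsigned)(*p - 'A' + 10);
        assert(d < r);   /* rejects digits that do not exist in radix r */
        value = value * r + d;
    }
    return value;
}
```

For example, `from_digits("D3AD", 16)` evaluates 13*16^3 + 3*16^2 + 10*16 + 13.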
2. Write the following numbers using IEC prefixes: 2^16 = 64 Ki, 2^34 = 16 Gi, 2^27 = 128 Mi, 2^61 = 2 Ei,
2^43 = 8 Ti, 2^47 = 128 Ti, 2^36 = 64 Gi, 2^58 = 256 Pi.
3. Write the following numbers as powers of 2: 2 Ki = 2^11, 256 Pi = 2^58, 512 Ki = 2^19, 64 Gi = 2^36, 16 Mi
= 2^24, 128 Ei = 2^67.
Signed Integers
Unsigned binary numbers work to store natural numbers, but many calculations use negative numbers as well.
To deal with this, a number of different schemes have been used to represent signed numbers, but we will focus
on two's complement.
2.1 Two's complement
The most significant bit has a negative value; all others have positive values.
Otherwise exactly the same as unsigned integers.
A neat trick for flipping the sign of a two's complement number: flip all the bits and add 1.
Addition is exactly the same as with an unsigned number.
Only one 0, and it's located at 0b0.
2.2 Exercises
For the following questions assume an 8-bit integer. Answer each question for the case of a two's complement
number and an unsigned number.
1. What is the largest integer? The largest integer + 1?
(a) [Unsigned:] 255, 0
(b) [Two's Complement:] 127, -128
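The answers above can be reproduced on raw 8-bit patterns. This is a small sketch under the assumption that `uint8_t` is available; `negate_bits` and `as_signed` are illustrative names, not functions from the worksheet.

```c
#include <assert.h>
#include <stdint.h>

/* Negates an 8-bit two's complement value by flipping all bits and adding 1.
 * Works on the raw bit pattern, so the arithmetic wraps modulo 2^8. */
uint8_t negate_bits(uint8_t bits) {
    return (uint8_t)(~bits + 1u);
}

/* Interprets an 8-bit pattern as two's complement: the most significant
 * bit has weight -128, all other bits keep their unsigned weights. */
int as_signed(uint8_t bits) {
    return (int)(bits & 0x7Fu) - ((bits & 0x80u) ? 128 : 0);
}
```

Adding 1 to the pattern 0x7F gives 0x80, which reads as 128 unsigned but as -128 once the top bit carries its negative weight.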
Counting
Bitstrings can be used to represent more than just numbers. In fact, we use bitstrings to represent everything
inside a computer. And, because we don't want to be wasteful with bits, it is important to remember that
n bits can be used to represent 2^n distinct things. To reiterate, n bits can represent up to 2^n distinct objects.
3.1 Exercises
1. If the value of a variable is 0, π, or e, what is the minimum number of bits needed to represent it? 2
2. If we need to address 3 TiB of memory and we want to address every byte of memory, how long does an
address need to be? 42 bits
3. If the only value a variable can take on is e, how many bits are needed to represent it? 0
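The counting rule can be turned around: given a number of distinct things, find the smallest n with 2^n at least that large. A minimal sketch, with `bits_needed` as a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Returns the minimum number of bits needed to give each of COUNT
 * distinct things its own bitstring, i.e. the smallest n with 2^n >= COUNT. */
unsigned bits_needed(uint64_t count) {
    assert(count >= 1);
    unsigned n = 0;
    /* Loop while 2^n < count; the n < 64 guard avoids shifting by 64. */
    while (n < 64 && (UINT64_C(1) << n) < count) {
        n++;
    }
    return n;
}
```

Three values need 2 bits, a single value needs 0 bits, and the 3 * 2^40 bytes of a 3 TiB memory need 42 address bits.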
C Introduction
C is syntactically very similar to Java, but there are a few key differences of which to be wary:
C is function oriented, not object oriented, so no objects for you.
C does not automatically handle memory for you.
In the case of stack memory (things allocated in the usual way), a datum is garbage immediately
after the function in which it was defined returns.
In the case of heap memory (things allocated with malloc and friends), data is freed only when the
programmer explicitly frees it.
In any case, allocated memory always holds garbage until it is initialized.
C uses pointers explicitly. *p tells us to use the value that p points to, rather than the value of p, and &x
gives the address of x rather than the value of x.
There are other differences of which you should be aware, but this should be enough for you to get your feet
wet.
The following functions work correctly (note: this does not mean intelligently), but have no comments. Document the code to prevent it from causing further confusion.
1. /* Returns the sum of the first N elements in ARR. */
   int foo(int *arr, size_t n) {
       return n ? arr[0] + foo(arr + 1, n - 1) : 0;
   }
3. /* Does nothing. */
   void baz(int x, int y) {
       x = x ^ y;
       y = x ^ y;
       x = x ^ y;
   }
Implement the following functions so that they perform as described in the comments.
1. /* Swaps the value of two ints outside of this function. */
   void swap(int *x, int *y) {
       int temp = *x;
       *x = *y;
       *y = temp;
   }
Problem?
The following code segments may contain logic and syntax errors. Find and correct them.
1. /* Returns the sum of all the elements in SUMMANDS. */
   int sum(int* summands) {                         // int sum(int* summands, unsigned int n) {
       int sum = 0;
       for (int i = 0; i < sizeof(summands); i++)   // for (int i = 0; i < n; i++)
           sum += *(summands + i);
       return sum;
   }
2. /* Increments all the letters in the string STRING, held in an array of length N.
    * Does not modify any other memory which has been previously allocated. */
   void increment(char* string, int n) {
       for (int i = 0; i < n; i++)   // for (i = 0; string[i] != 0; i++)
           *(string + i)++;          // string[i]++; or (*(string + i))++;
                                     // consider the corner case of incrementing 0xFF
   }
C Memory Management
1. Match the items on the left with the memory segment in which they are stored. Answers may be used
more than once, and more than one answer may be required.
1. Static variables: B
2. Local variables: D
3. Global variables: B
4. Constants: A, B
5. Machine Instructions: A
6. Data: B
7. malloc(): C
8. String Literals: B
9. Characters: A, B, C, D
A. Code
B. Static
C. Heap
D. Stack
3. Write code to prepend (add to the start) to a linked list, and to free/empty the entire list.
struct ll_node { struct ll_node* next; int value; };
void free_ll(struct ll_node** list) {
    if (*list) {
        free_ll(&((*list)->next));
        free(*list);
    }
    *list = NULL;
}
Note: *list points to the first element of the list, or is NULL if the list is empty.
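Only the free routine is shown for this exercise, so here is one possible sketch of the prepend half. It is an illustration rather than the worksheet's own answer: the name `prepend`, its int return code, and the malloc failure handling are all assumptions. `free_ll` is repeated so the block compiles by itself.

```c
#include <assert.h>
#include <stdlib.h>

struct ll_node { struct ll_node* next; int value; };

/* Adds a new node holding VALUE to the front of *LIST.
 * Returns 1 on success, or 0 (leaving the list unchanged) if malloc fails. */
int prepend(struct ll_node** list, int value) {
    assert(list != NULL);
    struct ll_node* node = malloc(sizeof(struct ll_node));
    if (node == NULL) {
        return 0;
    }
    node->value = value;
    node->next = *list;   /* old first element (or NULL) follows the new node */
    *list = node;         /* caller's pointer now refers to the new node */
    return 1;
}

/* Frees every node, then marks the list as empty. */
void free_ll(struct ll_node** list) {
    if (*list) {
        free_ll(&((*list)->next));
        free(*list);
    }
    *list = NULL;
}
```

Taking a `struct ll_node**` lets both functions update the caller's head pointer, which is why an empty list can be passed as a pointer to NULL.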
MIPS Intro
1. Assume we have an array in memory that contains int* arr = {1,2,3,4,5,6,0}. Let the value of arr
be a multiple of 4 and stored in register $s0. What do the following programs do?
a) lw $t0, 12($s0) // lb,lh
add $t1, $t0, $s0
sw $t0, 4($t1) // arr[2] <- 4; sb,sh
2. In 1), what other instructions could be used in place of each load/store without alignment errors?
3. What are the instructions to branch to label: on each of the following conditions?
$s0 < $s1
$s0 > 1
$s0 >= 1
Translate between the C and MIPS code. You may want to use the MIPS Green Sheet as a reference. In all of
the C examples, we show you how the different variables map to registers, so you don't have to worry about the
stack or any memory-related issues.
C
MIPS
sw $0, 0($s0)
addiu $s1, $0, 2
sw $s1, 4($s0)
sll $t0, $s1, 2
add $t0, $t0, $s0
sw $s1, 0($t0)
int a = 5, b = 10;
if (a + a == b) {
    a = 0;
} else {
    b = a - 1;
}
// computes s1 = 2^30
s1 = 1;
for (s0 = 0; s0 < 30; s0++) {
    s1 *= 2;
}
// Strcpy:
// $s1 -> char s1[]
// $s2 -> char *s2 =
//        malloc(sizeof(char)*7);
int i = 0;
do {
    s2[i] = s1[i];
    i++;
} while (s1[i] != '\0');
s2[i] = '\0';
# s1[i]
# s2[i]
# char is 1 byte!
# unnecessary line
# could use offset
      ...
      beq   $s0, $0, Ret0
      addiu $t2, $0, 1
      beq   $s0, $t2, Ret1
      addiu $s0, $s0, -2
Loop: beq   $s0, $0, RetF
      addu  $s1, $t0, $t1
      addiu $t0, $t1, 0
      addiu $t1, $s1, 0
      addiu $s0, $s0, -1
      j     Loop
Ret0: addiu $v0, $0, 0
      j     Done
Ret1: addiu $v0, $0, 1
      j     Done
RetF: addu  $v0, $0, $s1
Done: ...
// Nth_Fibonacci(n):
// $s0 -> n, $s1 -> fib
// $t0 -> i, $t1 -> j
// Assume fib, i, j are these values
int fib = 1, i = 1, j = 1;
if (n == 0)
    return 0;
else if (n == 1)
    return 1;
n -= 2;
while (n != 0) {
    fib = i + j;
    j = i;
    i = fib;
    n--;
}
return fib;
// Collatz conjecture
// $s0 -> n
unsigned n;
L1: if (n % 2) goto L2;
goto L3;
L2: if (n == 1) goto L4;
n = 3 * n + 1;
goto L1;
L3: n = n >> 1;
goto L1;
L4: return n;
L1:
L2:
L3:
L4:
(1) You need to jump to an instruction that is 2^28 + 4 bytes higher than the current PC. How do you do it?
Assume you know the exact destination address at compile time. (Hint: you need multiple instructions)
The jump instruction can only reach addresses that share the same upper 4 bits as the PC. A jump 2^28 + 4
bytes away would require changing the fourth highest bit, so a jump instruction is not sufficient. We must
manually load our 32-bit address into a register and use jr.
lui $at, {upper 16 bits of Foo}
ori $at, $at, {lower 16 bits of Foo}
jr $at
(2) You now need to branch to an instruction 2^17 + 4 bytes higher than the current PC, when $t0 equals 0.
Assume that we're not jumping to a new 2^28 byte block. Write MIPS to do this.
The largest address a branch instruction can reach is PC + 4 + (SignExtImm << 2). The immediate field is 16
bits and signed, so the largest value is 2^15 - 1 words, or 2^17 - 4 bytes. Thus, we cannot use a branch
instruction to reach our goal, but by the problem's assumption, we can use a jump. Assuming we're jumping
to label Foo:
bne $t0, $0, DontJump
j Foo
DontJump: ...
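The two reach limits used in these answers can be written as predicates. This is a sketch only: `branch_reaches` and `jump_reaches` are hypothetical helper names, and the 0x00400000 in the usage note is an arbitrary example PC, not an address from the worksheet.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a branch at address PC can reach TARGET. The target is
 * PC + 4 + (imm << 2), where imm is a signed 16-bit word offset. */
int branch_reaches(uint32_t pc, uint32_t target) {
    assert(pc % 4 == 0 && target % 4 == 0);
    int64_t words = ((int64_t)target - ((int64_t)pc + 4)) / 4;
    return words >= INT16_MIN && words <= INT16_MAX;
}

/* Returns 1 if a j instruction at PC can reach TARGET: the target must
 * share its upper 4 bits with PC + 4. */
int jump_reaches(uint32_t pc, uint32_t target) {
    assert(pc % 4 == 0 && target % 4 == 0);
    return ((pc + 4) >> 28) == (target >> 28);
}
```

With a PC of 0x00400000, a target 2^17 bytes ahead is the farthest a branch can go, 2^17 + 4 is one instruction too far for a branch but fine for j, and 2^28 + 4 changes the upper 4 bits and so needs lui/ori/jr.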
(3) Given the following MIPS code (and instruction addresses), fill in the blank fields for the following instructions (you'll need your green sheet!):
0x002cff00: loop: addu $t0, $t0, $t0     | 0 | 8 | 8 | 8 | 0 | 0x21 |
0x002cff04:       jal foo                | 3 | 0xc0001 |
0x002cff08:       bne $t0, $zero, loop   | 5 | 8 | 0 | -3 = 0xfffd |
...
0x00300004: foo:  jr $ra                 $ra = 0x002cff08
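The jal address field and the bne immediate follow directly from the instruction addresses. The sketch below recomputes them; the function names are illustrative, and the field layouts are the standard MIPS J-type and I-type formats.

```c
#include <assert.h>
#include <stdint.h>

/* 26-bit address field of a j/jal instruction whose target is TARGET. */
uint32_t jump_field(uint32_t target) {
    assert(target % 4 == 0);
    return (target >> 2) & 0x03FFFFFFu;
}

/* 16-bit immediate field of a branch at PC whose target is TARGET:
 * the word offset measured from PC + 4, in two's complement. */
uint16_t branch_field(uint32_t pc, uint32_t target) {
    assert(pc % 4 == 0 && target % 4 == 0);
    int64_t words = ((int64_t)target - ((int64_t)pc + 4)) / 4;
    assert(words >= INT16_MIN && words <= INT16_MAX);
    return (uint16_t)words;
}

/* Packs an I-type instruction from its opcode, rs, rt, and immediate. */
uint32_t encode_i_type(uint32_t opcode, uint32_t rs, uint32_t rt, uint16_t imm) {
    assert(opcode < 64 && rs < 32 && rt < 32);
    return (opcode << 26) | (rs << 21) | (rt << 16) | imm;
}
```

For the bne at 0x002cff08, PC + 4 is 0x002cff0c, so reaching loop at 0x002cff00 takes an offset of -3 words.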
Conventions
1. How should $sp be used? When do we add or subtract from $sp?
$sp points to a location on the stack to load or store into. Subtract from $sp before storing, and add to $sp
after restoring.
2. Which registers need to be saved or restored before using jr to return from a function?
All $s* registers that were modified during the function must be restored to their value at the start of the
function
3. Which registers need to be saved before using jal?
$ra, and all $t*, $a*, and $v* registers if their values are needed later after the function call.
4. How do we pass arguments into functions?
$a0, $a1, $a2, $a3 are the four argument registers
5. What do we do if there are more than four arguments to a function?
Use the stack to store additional arguments
6. How are values returned by functions?
$v0 and $v1 are the return value registers.
When calling a function in MIPS, who needs to save the following registers to the stack? Answer caller for the
procedure making a function call, callee for the function being called, or N/A for neither.
$0: N/A
$v*: Caller
$a*: Caller
$t*: Caller
$s*: Callee
$sp: N/A
$ra: Caller
Now assume a function foo calls another function bar (which may be called from a main function), which is known to
call some other functions. foo takes one argument and will modify and use $t0 and $s0. bar takes two arguments,
returns an integer, and uses $t0-$t2 and $s0-$s1. In the boxes below, draw a possible ordering of the stack just
before bar calls a function. The top left box is the address of $sp when foo is first called, and the stack goes
downwards, continuing at each next column. Add (f) if the register is stored by foo and (b) if the register is
stored by bar. The first one is written in for you.
1 $ra (f)    5 $t0 (f)    9 $v0 (b)     13 $t1 (b)
2 $s0 (f)    6 $ra (b)    10 $a0 (b)    14 $t2 (b)
3 $v0 (f)    7 $s0 (b)    11 $a1 (b)    15
4 $a0 (f)    8 $s1 (b)    12 $t0 (b)    16
C to MIPS
1. Assuming $a0 and $a1 hold integer pointers, swap the values they point to via the stack and return control.
addiu $sp, $sp, -4
lw    $t0, 0($a0)
sw    $t0, 0($sp)
lw    $t0, 0($a1)
sw    $t0, 0($a0)
lw    $t0, 0($sp)
sw    $t0, 0($a1)
addiu $sp, $sp, 4
jr    $ra
2. Translate the following algorithm that finds the sum of the numbers from 0 to N to MIPS assembly. Assume
$s0 holds N, $s1 holds sum, and that N is greater than or equal to 0.
int sum = 0;
if (N == 0)
    return 0;
while (N != 0) {
    sum += N;
    N--;
}
return sum;
3. What must be done to make the adding algorithm from the previous part into a callable MIPS function?
Add a prologue and epilogue to reserve space on the stack and store all necessary variables (see #3). Use $a0
instead of $s0 to store N, the function's argument.
RecursiveSum:
      addiu $sp, $sp, -8
      sw    $ra, 4($sp)
      sw    $a0, 0($sp)
      li    $v0, 0
      beq   $a0, $0, Ret
      addiu $a0, $a0, -1
      jal   RecursiveSum
      lw    $a0, 0($sp)
      addu  $v0, $v0, $a0
Ret:
      lw    $ra, 4($sp)
      addiu $sp, $sp, 8
      jr    $ra
5.1 Overview
5.2 Exercises
1. What is the Stored Program concept and what does it enable us to do?
It is the idea that instructions are just the same as data, and we can treat them as such. This enables us to
write programs that can manipulate other programs!
2. How many passes through the code does the Assembler have to make? Why?
Two: one to find all the label addresses, and another to convert all instructions while resolving any forward
references using the collected label addresses.
3. What are the different parts of the object files output by the Assembler?
Header: Size and position of other parts
Text: The machine code
Data: Binary representation of any data in the source file
Relocation Table: Identifies lines of code that need to be handled by Linker
Symbol Table: List of the file's labels and data that can be referenced
Debugging Information: Additional information for debuggers
4. Which step in CALL resolves relative addressing? Absolute addressing? Assembler, Linker.
5. What step in CALL may make use of the $at register? Assembler
6. What does RISC stand for? How is this related to pseudoinstructions?
Reduced Instruction Set Computing. A minimal set of instructions leads to many lines of code. Pseudoinstructions are more complex instructions intended to make assembly programming easier for the coder. These are
converted to TAL by the assembler.
State
1. Fill out the timing diagram for the circuit below:
    +---+    +---+    +---+
IN--|D Q|-s0-|D Q|-s1-|D Q|--Out
    +-^-+    +-^-+    +-^-+
      |        |        |
CLK---+--------+--------+
clk
in
s0
s1
out
2. Fill out the timing diagram for the circuit below:
    +---+    +---+
A---|D Q|-R1-|D Q|-R2
    +-^-+    +-^-+
      |        |
CLK---+---|>o--+
clk
!clk
A
R1
R2
Logic Gates
1. Label the following logic gates:
Discussion - Logic
Week
+ AB
+ AB
Solution: AB
(b) XOR
Solution: (!A)B + A(!B)
(c) XNOR
Solution: (!A)(!B) + AB
A
Solution:
Output
4. How many different two-input logic gates can there be? How many n-input logic gates?
Solution: A truth table with n inputs has 2^n rows. Each logic gate has a 0 or a 1 at each of these
rows. Imagining a function as a 2^n-bit number, we count 2^(2^n) total functions, or 16 in the case of n
= 2.
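The counting argument treats a gate as a number whose bits are the truth-table outputs. The sketch below makes that concrete; `count_gates` and `eval_gate`, and the convention that bit (2*a + b) holds the output for inputs a and b, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct n-input logic gates: one output bit for each of the
 * 2^n truth-table rows, so 2^(2^n) functions. Limited to n <= 5 here so
 * the result fits in a uint64_t. */
uint64_t count_gates(unsigned n) {
    assert(n <= 5);
    unsigned rows = 1u << n;        /* 2^n truth-table rows */
    return UINT64_C(1) << rows;     /* 2^(2^n) ways to fill the output column */
}

/* Evaluates the two-input gate whose truth table is the 4-bit number TABLE:
 * bit (2*a + b) of TABLE is the output for inputs a and b. */
unsigned eval_gate(unsigned table, unsigned a, unsigned b) {
    assert(table < 16 && a <= 1 && b <= 1);
    return (table >> (2 * a + b)) & 1u;
}
```

Under this convention XOR is table 6 (binary 0110) and AND is table 8 (binary 1000); every value from 0 to 15 names one of the 16 two-input gates.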
Boolean Logic
(!X denotes NOT X.)
1 + A = 1
A + !A = 1
A + AB = A
(A + B)(A + C) = A + BC
0B = 0
B(!B) = 0
A + (!A)B = A + B
DeMorgan's Law:
!(AB) = !A + !B
!(A + B) = (!A)(!B)
(AA + A(!B) + AB + B(!B))C
= (A + A(!B + B))C = AC     (1)
C + AB
C + AB C + AB
C + ABC + ABC
AC(
= AC + AC + AC
= AC + AC + AC + AC
= (A + A)C + A(C + C)
= A + C
(2)
(3)
(4)
(5)
(c) DeMorgan's: !(A((!B)(!C) + BC))
Solution:
!(A((!B)(!C) + BC))
= !A + !((!B)(!C) + BC)       (6)
= !A + !((!B)(!C)) !(BC)      (7)
= !A + (B + C)(!B + !C)       (8)
= !A + B(!C) + (!B)C          (9)
[Figure: single-cycle MIPS datapath. Stages: Instruction Fetch, Execute, Memory, Register Write (write back the ALU result / the memory load to the register file). Components: PC, +4 adder, Instruction Memory, Register File (Read Addr1 = Inst[25:21], Read Addr2 = Inst[20:16], Write Addr = Inst[20:16] or Inst[15:11]), Sign / Zero Extender on Inst[15:0], ALU with Zero output, Data Memory. Jump Addr = Concat((PC+4)[31:28], Inst[25:0] << 2). Control Unit inputs: Inst[31:26], Inst[5:0]. Control Unit outputs: RegDst, ExtOp, ALUSrc, ALUCtr, MemWr, MemToReg, RegWr, Branch, Jump.]
Note: The Zero signal in the ALU is just one way to do this.
The reasoning for using a Zero signal here is that, for the instructions we need
to account for (listed below), we only want to branch if two values are equal.
We can easily check this by subtracting the two and outputting a
1 if the result is 0 (hence the Zero signal).
Instr  Jump  Branch  RegDst  ExtOp  ALUSrc  ALUCtr  MemWr  MemtoReg  RegWr
add    0     0       1       X      0       0010    0      0         1
ori    0     0       0       0      1       0001    0      0         1
lw     0     0       0       1      1       0010    0      1         1
sw     0     0       X       1      1       0010    1      X         0
beq    0     1       X       1      0       0110    0      X         0
j      1     X       X       X      X       XXXX    0      X         0
X: don't care value (either 0 or 1 is ok)
This table shows the ALUCtr values for each operation of the ALU:
Operation  ALUCtr
AND        0000
OR         0001
ADD        0010
SUB        0110
SLT        0111
NOR        1100
Clocking Methodology
The input signal to each state element must stabilize before each rising edge.
Critical path: Longest delay path between state elements in the circuit.
tclk >= tclk-to-q + tCL + tsetup, where tCL is the critical path in the combinational logic.
If we place registers in the critical path, we can shorten the period by reducing
the amount of logic between registers.
Now, we will optimize a single cycle CPU using pipelining. Pipelining is a powerful logic design
method that reduces the clock time and improves throughput, even though it increases the
latency of an individual task and adds logic. In a pipelined CPU, the execution of multiple instructions
is overlapped. This is a good example of parallelism, which is one of the great ideas
in computer architecture. To obtain a pipelined CPU, we will take the following steps.
Pipelining starts by dividing a large block of combinational logic with pipeline registers. We
have already chopped a single cycle CPU into five stages, and thus will add pipeline registers
between each pair of adjacent stages.
A great advantage of pipelining is the performance improvement from a shorter clock time. We
will use the same timing parameters as those in the previous discussion.
Element            Parameter   Delay (ps)
Register clk-to-q  tclk-to-q   30
Register Setup     tsetup      20
MUX                tmux        25
ALU                tALU        200
Mem Read           tMEMread    250
Mem Write          tMEMwrite   200
RegFile Read       tRFread     150
RegFile Setup      tRFsetup    20
Q1. What was the clock time and frequency of a single cycle CPU?
tclk,single >= tPC,clk-to-q + tIMEMread + tRFread + tALU + tDMEMread + tmux + tRFsetup
= 30 + 250 + 150 + 200 + 250 + 25 + 20 = 925 ps
fclk,single = 1/tclk,single <= 1/(925 ps) = 1.08 GHz
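The Q1 arithmetic can be reproduced from the timing parameters. This sketch only restates the sum given in the answer; the constant and function names are illustrative.

```c
#include <assert.h>

/* Delays in picoseconds, copied from the timing parameters. */
enum {
    T_CLK_TO_Q = 30, T_SETUP = 20, T_MUX = 25, T_ALU = 200,
    T_MEM_READ = 250, T_MEM_WRITE = 200, T_RF_READ = 150, T_RF_SETUP = 20
};

/* Minimum single-cycle clock period: PC clk-to-q, instruction memory read,
 * register file read, ALU, data memory read, mux, register file setup. */
int single_cycle_period_ps(void) {
    return T_CLK_TO_Q + T_MEM_READ + T_RF_READ + T_ALU
         + T_MEM_READ + T_MUX + T_RF_SETUP;
}

/* Converts a clock period in picoseconds to a frequency in GHz. */
double frequency_ghz(int period_ps) {
    assert(period_ps > 0);
    return 1000.0 / period_ps;
}
```

The period is a lower bound, so the resulting frequency is an upper bound on how fast the single cycle design can be clocked.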
Q2. What is the clock time and frequency of a pipelined CPU?
The performance improvement comes at a cost. Pipelining introduces pipeline hazards we have
to overcome.
Structural Hazard
Structural hazards occur when more than one instruction uses the same resource at the same time.
Register File: One instruction reads from the register file while another writes to it. We can
solve this by having separate read and write ports and writing to the register file at the falling
edge of the clock.
Memory: The memory is accessed not only for the instruction but also for the data. Separate
caches for instructions and data solve this hazard.
Q3. Under what conditions do we need to introduce a nop? Under what conditions do we need
to forward the output of the MEM stage to the EX stage? Assume you have the signals
memToReg(n), rt(n), rs(n), regWrite(n), and regDst(n), where n is 0 for the signal of the current
instruction being executed by the EX stage, -1 for the previous, etc.
We forward if (rt(0) == regDst(-2) || rs(0) == regDst(-2)) && memToReg(-2) && regWrite(-2)
Address bits  Cache size  Block size  Tag bits  Index bits  Offset bits  Bits per row
16            4KiB        4B          4         10          2            32+4+1
32            32KiB       16B         17        11          4            128+17+1
32            64KiB       16B         16        12          4            128+16+1
64            2048KiB     128B        43        14          7            1068
3. 3Cs of Caches
3 types of cache misses:
1. Compulsory: Miss to an address not seen before. Reduce compulsory misses by having a longer
cache line, which brings in locations before we ask for them.
2. Conflict: Increasing the associativity or improving the replacement policy would remove the
miss.
3. Capacity: The only way to remove the miss is to increase the cache capacity.
Classify each M and R above as one of the 3 misses above.
4. Analyzing C Code
#define NUM_INTS 8192
int A[NUM_INTS]; /* A lives at 0x10000 */
int i, total = 0;
for (i = 0; i < NUM_INTS; i += 128) { A[i] = i; }       /* Line 1 */
for (i = 0; i < NUM_INTS; i += 128) { total += A[i]; }  /* Line 2 */
Let's say you have a byte-addressed computer with a total memory of 1MiB. It features a 16KiB CPU
cache with 1KiB blocks.
1. How many bits make up a memory address on this computer? 20
2. What is the T:I:O breakdown? tag bits: 6
index bits: 4
offset bits: 10
3. Calculate the cache hit rate for the line marked Line 1: 50%
The integer accesses are 4*128 = 512 bytes apart, which means there are 2 accesses per
block. The first access in each block is a cache miss, but the second is a hit because A[i]
and A[i+128] are in the same cache block.
4. Calculate the cache hit rate for the line marked Line 2: 50%
The size of A is 8192*4 = 2^15 bytes. This is exactly twice the size of our cache. At the end of
line 1, we have the second half of A inside the cache, while in line 2 we start accesses from
the beginning of the array. Thus we cannot reuse any of the content of A and we get the
same hit rate as before. Note that we do not have to consider cache hits for total, since
the compiler will probably leave it in a register.
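Both hit rates can be confirmed with a tiny simulation. The sketch assumes a direct-mapped cache (which the 6:4:10 breakdown implies for 16 blocks) and treats writes like reads, allocating the block on a miss; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_INTS   8192
#define BLOCK_BITS 10                     /* 1 KiB blocks */
#define INDEX_BITS 4                      /* 16 KiB / 1 KiB = 16 blocks */
#define NUM_BLOCKS (1 << INDEX_BITS)

struct cache {
    int valid[NUM_BLOCKS];
    uint32_t tag[NUM_BLOCKS];
};

/* Looks up ADDR in a direct-mapped cache; returns 1 on a hit, and on a
 * miss installs the block and returns 0. */
int cache_access(struct cache *c, uint32_t addr) {
    uint32_t index = (addr >> BLOCK_BITS) & (NUM_BLOCKS - 1);
    uint32_t tag = addr >> (BLOCK_BITS + INDEX_BITS);
    if (c->valid[index] && c->tag[index] == tag) {
        return 1;
    }
    c->valid[index] = 1;
    c->tag[index] = tag;
    return 0;
}

/* Runs one strided pass over A (base address 0x10000, stride 128 ints)
 * and returns the hit rate as a percentage. */
int pass_hit_percent(struct cache *c) {
    int hits = 0, accesses = 0;
    for (int i = 0; i < NUM_INTS; i += 128) {
        hits += cache_access(c, 0x10000u + 4u * (uint32_t)i);
        accesses++;
    }
    assert(accesses > 0);
    return 100 * hits / accesses;
}
```

Calling `pass_hit_percent` twice on the same cache models Line 1 followed by Line 2: the second pass starts with the wrong half of A cached, so it misses on the first access to every block just as the first pass did.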
The IEEE 754 standard defines a binary representation for floating point values using three fields:
The sign determines the sign of the number (0 for positive, 1 for negative)
The exponent is in biased notation with a bias of 127
The significand is akin to unsigned, but used to store a fraction instead of an integer.
The table below shows the bit breakdown for the single precision (32-bit) representation:
Sign   Exponent  Significand
1 bit  8 bits    23 bits
There is also a double precision encoding format that uses 64 bits. This behaves the same as the single
precision but uses 11 bits for the exponent (and thus a bias of 1023) and 52 bits for the significand.
How a float is interpreted depends on the values in the exponent and significand fields:
Exponent  Significand  Meaning
0         Anything     Denorm
1-254     Anything     Normal
255       0            Infinity
255       Nonzero      NaN
For normalized floats:
Value = (-1)^Sign x 2^(Exponent - Bias) x 1.significand_2
For denormalized floats:
Value = (-1)^Sign x 2^(Exponent - Bias + 1) x 0.significand_2
Exercises
1. How many zeroes can be represented using a float? 2
2. What is the largest finite positive value that can be stored using a single precision float?
0x7F7FFFFF = (2 - 2^-23) x 2^127
3. What is the smallest positive value that can be stored using a single precision float?
0x00000001 = 2^-23 x 2^-126
4. What is the smallest positive normalized value that can be stored using a single precision float?
0x00800000 = 2^-126
5. Convert the following numbers from binary to decimal or from decimal to binary:
0x00000000 = 0
8.25 = 0x41040000
0x00000F00 = (2^-12 + 2^-13 + 2^-14 + 2^-15) x 2^-126
39.5625 = 0x421E4000
0xFF94BEEF = NaN
-infinity = 0xFF800000
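The conversions can be checked by decoding the three fields by hand. This is a sketch, not the worksheet's method: `decode_float`, `float_bits`, and `pow2` are illustrative names, and `float_bits` assumes the platform's `float` is IEEE 754 single precision.

```c
#include <assert.h>
#include <math.h>     /* only for the NAN and INFINITY macros */
#include <stdint.h>
#include <string.h>

/* Returns 2^e exactly by repeated doubling or halving. */
static double pow2(int e) {
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Decodes a single precision bit pattern, following the normalized,
 * denormalized, infinity, and NaN cases. */
double decode_float(uint32_t bits) {
    int sign = (bits >> 31) ? -1 : 1;
    int exponent = (int)((bits >> 23) & 0xFFu);
    uint32_t significand = bits & 0x7FFFFFu;
    double fraction = significand / 8388608.0;      /* significand / 2^23 */

    if (exponent == 255) {
        return significand ? NAN : sign * INFINITY;
    }
    if (exponent == 0) {
        return sign * fraction * pow2(-126);         /* 0.significand x 2^(0 - 127 + 1) */
    }
    return sign * (1.0 + fraction) * pow2(exponent - 127);
}

/* Returns the bit pattern the compiler's own float uses for VALUE. */
uint32_t float_bits(float value) {
    uint32_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Copying the bytes with `memcpy` avoids the aliasing problems of casting a `float*` to a `uint32_t*`.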
AMAT
AMAT is the average (expected) time it takes for memory access. It can be calculated using this formula:
AMAT = hit time + miss rate x miss penalty
Exercises
Flynn Taxonomy
1. Explain SISD and give an example if available.
Single Instruction Single Data; each instruction is executed in order, acting on a single stream of data.
For example, traditional computer programs.
2. Explain SIMD and give an example if available.
Single Instruction Multiple Data; each instruction is executed in order, acting on multiple streams of
data. For example, the SSE Intrinsics.
3. Explain MISD and give an example if available.
Multiple Instruction Single Data; multiple instructions are executed simultaneously, acting on a single
stream of data. There are no good modern examples.
4. Explain MIMD and give an example if available.
Multiple Instruction Multiple Data; multiple instructions are executed simultaneously, acting on multiple
streams of data. For example, map reduce or multithreaded programs.
2. Failure in a WSC
1) In this example, a WSC has 55,000 servers, and each server has four disks whose annual failure
rate is 4%. How many disks will fail per hour?
(55,000 x 4 x 0.04) / (365 x 24) = 1.00, so MTTF = 1 hour
2) What is the availability of the system if it does not tolerate the failure? Assume that the time to
repair a disk is 30 minutes.
MTTF = 1, MTTR = 0.5, so Availability = 1 / (1 + 0.5) = 2/3 = 66.6%
3. Performance of a WSC
       DRAM latency (us)  Global hit rate  DRAM bandwidth (MiB/sec)  Disk bandwidth (MiB/sec)
Local  0.1                90%              20,000                    200
Rack   100                9%               100                       100
Array  300                1%               10                        10
1) Calculate the AMAT of this WSC. What is vital for WSC performance?
AMAT = 0.9 x 0.1 + 0.09 x 100 + 0.01 x 300 = 0.09 + 9 + 3 = 12.09 us
Locality of access within a server is vital for WSC performance
2) How long does it take to transfer 1,000 MiB a) between disks within the server, and b) between
DRAM within the rack? What can you conclude from this example?
a) 1,000 / 200 = 5 sec, b) 1,000 / 100 = 10 sec. Data transfer outside a single server is detrimental
to WSC performance. Network switches are the bottlenecks
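The AMAT and transfer-time answers are weighted sums and ratios over the table, which the following sketch spells out. The struct layout and function names are assumptions made for illustration.

```c
#include <assert.h>

struct level {
    double hit_rate;     /* fraction of all accesses served at this level */
    double latency_us;   /* DRAM latency at this level, in microseconds */
};

/* Average memory access time as the hit-rate-weighted sum of latencies. */
double wsc_amat_us(const struct level *levels, int count) {
    assert(count > 0);
    double amat = 0.0;
    for (int i = 0; i < count; i++) {
        amat += levels[i].hit_rate * levels[i].latency_us;
    }
    return amat;
}

/* Seconds needed to move SIZE_MIB at BANDWIDTH_MIB_PER_SEC. */
double transfer_seconds(double size_mib, double bandwidth_mib_per_sec) {
    assert(bandwidth_mib_per_sec > 0.0);
    return size_mib / bandwidth_mib_per_sec;
}
```

The 1% of accesses that go to the array level contribute 3 of the roughly 12 microseconds, which is why locality dominates the result.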
4. Power Usage Effectiveness (PUE) = (Total Building Power) / (IT Equipment Power)
Sources speculate Google has over 1 million servers. Assume each of the 1 million servers draws
an average of 200W, the PUE is 1.5, and that Google pays an average of 6 cents per kilowatt-hour
for datacenter electricity.
1) Estimate Google's annual power bill for its datacenters.
1.5 x 1,000,000 servers x 0.2kW/server x $0.06/kW-hr x 8760 hrs/yr = $157.68 M/yr
2) Google reduced the PUE of a 50,000 machine datacenter from 1.5 to 1.25 without decreasing
the power supplied to the servers. What's the cost savings per year?
(1.5 - 1.25) x 50,000 servers x 0.2kW/server x $0.06/kW-hr x 8760 hrs/yr = $1.314M/yr
2. Given a person's unique int ID and a list of the IDs of their friends, compute the list of mutual
friends between each pair of friends in a social network.
Declare any custom data types here:
FriendPair:
  int friendOne
  int friendTwo

map(int personID, list<int> friendIDs):
  for ( fID in friendIDs ):
    if ( personID < fID ):
      friendPair = ( personID, fID )
    else:
      friendPair = ( fID, personID )
    emit(friendPair, friendIDs)

reduce(FriendPair key,
       Iterable< list<int> > values):
  mutualFriends =
    intersection(
      values.next(), values.next())
  emit(key, mutualFriends)

reduce(CoinPair key,
       Iterable< int > values):
  total = 0
  for ( count in values ):
    total += count
  emit(key, total)

b) Using the output of the first MapReduce, compute the amount of money each person has. The
function valueOfCoin(String coinType) returns a float corresponding to the dollar value of
the coin.
map(CoinPair key, int amount):
  emit(key.person,
       valueOfCoin(key.coinType)*amount)

reduce(String key,
       Iterable< float > values):
  total = 0
  for ( amount in values ):
    total += amount
  emit(key, total)
Page Offset
With 4 KiB pages and byte addresses, 2^(page offset bits) = 4096, so page offset bits = 12.
The Big Picture: Logical Flow
Translate VA to PA using the TLB and Page Table. Then use
PA to access memory as the program intended.
Pages
A chunk of memory or disk with a set size. Addresses in
the same virtual page get mapped to addresses in the
same physical page. The page table determines the
mapping.
The Page Table
Index = Virtual Page Number
(VPN) (not stored)
Page
Valid
Each stored row of the page table is called a page table entry (the grayed section is the first page table
entry). The page table is stored in memory; the OS sets a register telling the hardware the address of the
first entry of the page table. The processor updates the page dirty bit in the page table: page dirty bits
are used by the OS to know whether updating a page on disk is necessary. Each process gets its own
page table.
Protection Fault--The page table entry for a virtual page has permission bits that prohibit the
requested operation.
Page Fault--The page table entry for a virtual page has its valid bit set to false. The page is not in
memory.
Exercises
1) What are three specific benefits of using virtual memory?
Bridges memory and disk in memory hierarchy.
Simulates full address space for each process.
Enforces protection between processes.
2) What should happen to the TLB when a new value is loaded into the page table address
register?
The valid bits of the TLB should all be set to 0. The page table entries in the TLB corresponded to the old
page table, so none of them are valid once the page table address register points to a different page
table.
5) A processor has 16-bit addresses, 256 byte pages, and an 8-entry fully associative TLB with
LRU replacement (the LRU field is 3 bits and encodes the order in which pages were accessed, 0
being the most recent). At some time instant, the TLB for the current process is the initial state
given in the table below. Assume that all current page table entries are in the initial TLB.
Assume also that all pages can be read from and written to. Fill in the final state of the TLB
according to the access pattern below.
Initial TLB
VPN
PPN
Valid
Dirty
LRU
0x01
0x11
0x00
0x00
0x10
0x13
0x20
0x12
0x00
0x11
0x00
0x14
0
1
0
0
7
4
0xac
0x15
0xff
0x16
Read 0x11f0: hit, LRUs: 1,7,2,5,7,0,3,4
Write 0x1301: miss, map VPN 0x13 to PPN 0x17, valid and dirty, LRUs: 2,0,3,6,7,1,4,5
Write 0x20ae: hit, dirty, LRUs: 3,1,4,0,7,2,5,6
Write 0x2332: miss, map VPN 0x23 to PPN 0x18, valid and dirty, LRUs: 4,2,5,1,0,3,6,7
Read 0x20ff: hit, LRUs: 4,2,5,0,1,3,6,7
Write 0x3415: miss and replace last entry, map VPN 0x34 to PPN 0x19, valid and dirty, LRUs: 5,3,6,1,2,4,7,0
Final TLB
VPN
PPN
Valid
Dirty
LRU
0x01
0x11
0x13
0x17
0x10
0x20
0x13
0x12
1
1
1
1
6
1
0x23
0x18
0x11
0x14
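Each access in this exercise starts by splitting the 16-bit address into a VPN and a page offset. The sketch below shows that split for 256-byte pages; the function names are illustrative, and the TLB lookup and LRU bookkeeping are left out.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET_BITS 8   /* 256-byte pages, so 2^8 byte offsets per page */

/* Virtual page number: the address bits above the page offset. */
uint8_t vpn_of(uint16_t vaddr) {
    return (uint8_t)(vaddr >> PAGE_OFFSET_BITS);
}

/* Page offset: the low bits, copied unchanged into the physical address. */
uint8_t offset_of(uint16_t vaddr) {
    return (uint8_t)(vaddr & ((1u << PAGE_OFFSET_BITS) - 1u));
}

/* Builds the physical address from a translated page number and the offset. */
uint16_t physical_address(uint8_t ppn, uint16_t vaddr) {
    return (uint16_t)(((unsigned)ppn << PAGE_OFFSET_BITS) | offset_of(vaddr));
}
```

For the access to 0x1301, the VPN is 0x13 and the offset is 0x01, so once VPN 0x13 maps to PPN 0x17 the physical address is 0x1701.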
Hamming ECC
Recall the basic structure of a Hamming code. Given bits 1, . . . , m, the bit at position 2^n is
parity for all the bits with a 1 in position n. For example, the first bit is chosen such that the sum
of all odd-numbered bits is even.
1. How many bits do we need to add to 0011_2 to allow single error correction?
Parity Bits: 3
2. Which locations in 0011_2 would parity bits be included?
Using P for parity bits: PP0P011_2
3. Which bits does each parity bit cover in 0011_2?
Parity bit #1: 1, 3, 5, 7
Parity bit #2: 2, 3, 6, 7
Parity bit #3: 4, 5, 6, 7
4. Write the completed coded representation for 0011_2 to enable single error correction.
1000011_2
5. How can we enable an additional double error detection on top of this?
Add an additional parity bit over the entire sequence.
6. Find the original bits given the following SEC Hamming Code: 0110111_2
Parity group 1: error
Parity group 2: okay
Parity group 4: error
Incorrect bit: 1 + 4 = 5, change bit 5 from 1 to 0: 0110011_2
0110011_2 -> 1011_2
7. Find the original bits given the following SEC Hamming Code: 1001000_2
Parity group 1: error
Parity group 2: okay
Parity group 4: error
Incorrect bit: 1 + 4 = 5, change bit 5 from 0 to 1: 1001100_2
1001100_2 -> 0100_2
8. Find the original bits given the following SEC Hamming Code: 010011010000110_2
Parity group 1: okay
Parity group 2: error
Parity group 4: okay
Parity group 8: error
Incorrect bit: 2 + 8 = 10, change bit 10 from 0 to 1: 010011010100110_2
010011010100110_2 -> 01100100110_2
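The parity-group checks in the last three exercises follow one procedure, sketched here on code words stored as strings of '0' and '1' with the first character as bit 1. The function names are illustrative, and the code assumes at most one flipped bit.

```c
#include <assert.h>
#include <string.h>

/* Computes the syndrome of a Hamming code word. Parity group 2^k covers
 * every position with bit k set; the syndrome is the sum of the groups
 * whose parity is odd, which is the position of a single flipped bit
 * (0 if every group checks out). */
int hamming_syndrome(const char *code) {
    int n = (int)strlen(code);
    int syndrome = 0;
    for (int group = 1; group <= n; group <<= 1) {
        int ones = 0;
        for (int pos = 1; pos <= n; pos++) {
            if ((pos & group) && code[pos - 1] == '1') {
                ones++;
            }
        }
        if (ones % 2 != 0) {
            syndrome += group;
        }
    }
    return syndrome;
}

/* Flips the bit named by the syndrome in place, then copies the data bits
 * (positions that are not powers of two) into DATA. */
void hamming_decode(char *code, char *data) {
    int n = (int)strlen(code);
    int bad = hamming_syndrome(code);
    assert(bad <= n);
    if (bad != 0) {
        code[bad - 1] = (code[bad - 1] == '1') ? '0' : '1';
    }
    int out = 0;
    for (int pos = 1; pos <= n; pos++) {
        if ((pos & (pos - 1)) != 0) {   /* skip parity positions 1, 2, 4, 8, ... */
            data[out++] = code[pos - 1];
        }
    }
    data[out] = '\0';
}
```

`data` must have room for the data bits plus a terminator; 16 chars is enough for the 15-bit code word in the last exercise.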