
CS61C Fall 2015 Discussion 0 Number Representation

Unsigned Integers

If we have an n-digit unsigned numeral d_{n-1} d_{n-2} . . . d_0 in radix (or base) r, then the value of that numeral is
sum_{i=0}^{n-1} d_i * r^i, which is just fancy notation to say that instead of a 10s or 100s place we have an r's or r^2's place.
For binary, decimal, and hex we just let r be 2, 10, and 16, respectively.
Recall also that we often have cause to write down unreasonably large numbers, and our preferred tool for
doing that is the IEC prefixing system: Ki = 2^10, Mi = 2^20, Gi = 2^30, Ti = 2^40, Pi = 2^50, Ei = 2^60, Zi = 2^70,
Yi = 2^80.

1.1

We don't have calculators during exams, so let's try this by hand

1. Convert the following numbers from their initial radix into the other two common radices: 0b10010011
= 147 = 0x93, 0xD3AD = 0b1101 0011 1010 1101 = 54189, 63 = 0b0011 1111 = 0x3F, 0b00100100 =
36 = 0x24, 0xB33F = 0b1011 0011 0011 1111 = 45887, 0 = 0b0 = 0x0, 39 = 0b0010 0111 = 0x27,
0x7EC4 = 0b0111 1110 1100 0100 = 32452, 437 = 0b0001 1011 0101 = 0x1B5
2. Write the following numbers using IEC prefixes: 2^16 = 64 Ki, 2^34 = 16 Gi, 2^27 = 128 Mi, 2^61 = 2 Ei,
2^43 = 8 Ti, 2^47 = 128 Ti, 2^36 = 64 Gi, 2^58 = 256 Pi.
3. Write the following numbers as powers of 2: 2 Ki = 2^11, 256 Pi = 2^58, 512 Ki = 2^19, 64 Gi = 2^36,
16 Mi = 2^24, 128 Ei = 2^67.

Signed Integers

Unsigned binary numbers work to store natural numbers, but many calculations use negative numbers as well.
To deal with this, a number of different schemes have been used to represent signed numbers, but we will focus
on two's complement.

2.1

Two's complement

Two's complement is the standard solution for representing signed integers.

The most significant bit has a negative value; all others have positive values.
Otherwise it is exactly the same as unsigned integers.
A neat trick for flipping the sign of a two's complement number: flip all the bits and add 1 (see the C sketch below).
Addition is exactly the same as with an unsigned number.
There is only one 0, and it's located at 0b0.
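A minimal C sketch of the inversion trick, assuming an 8-bit two's complement type (the values are just examples):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int8_t x = 17;
    int8_t neg = ~x + 1;                /* flip all the bits, then add 1 */
    printf("%d %d\n", x, neg);          /* prints: 17 -17 */
    printf("%d\n", (int8_t)(x + ~x));   /* x + ~x is all ones, i.e. -1   */
    return 0;
}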

2.2

Exercises

For the following questions assume an 8-bit integer. Answer each question for the case of a two's complement
number and an unsigned number.
1. What is the largest integer? The largest integer + 1?
(a) [Unsigned:] 255, 0
(b) [Twos Complement:] 127, -128

2. How do you represent the numbers 0, 1, and -1?


(a) [Unsigned:] 0b0000 0000, 0b0000 0001, N/A
(b) [Twos Complement:] 0b0000 0000, 0b0000 0001, 0b1111 1111
3. How do you represent 17, -17?
(a) [Unsigned:] 0b0001 0001, N/A
(b) [Twos Complement:] 0b0001 0001, 0b1110 1111
4. What is the largest integer that can be represented by any encoding scheme that only uses 8 bits? There
is no such integer. For example, an arbitrary 8-bit mapping could choose to represent the numbers from
1 to 256 instead of 0 to 255.
5. Prove that the two's complement inversion trick is valid (i.e., that x and ~x + 1 sum to 0). Note that for
any x we have x + ~x = 0b1. . .1. A straightforward hand calculation shows that 0b1. . .1 + 0b1 = 0.
6. Explain where each of the three radices shines and why it is preferred over other bases in a given context.
Decimal is the preferred radix for human hand calculations, likely related to the fact that humans have
10 fingers.
Binary numerals are particularly useful for computers. Binary signals are less likely to be garbled than
higher radix signals, as there is more distance (voltage or current) between valid signals. Additionally,
binary signals are quite convenient to design circuits with, as we'll see later in the course.
Hexadecimal numbers are a convenient shorthand for displaying binary numbers, owing to the fact that
one hex digit corresponds exactly to four binary digits.

Counting

Bitstrings can be used to represent more than just numbers. In fact, we use bitstrings to represent everything
inside a computer. And, because we don't want to be wasteful with bits, it is important to remember that
n bits can be used to represent 2^n distinct things. To reiterate, n bits can represent up to 2^n distinct objects.
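As a quick sanity check, here is a small C sketch of that rule turned around: the minimum number of bits needed for a given count of distinct values (the helper name is just for illustration):

#include <stdio.h>

/* Smallest b such that 2^b >= n, i.e. how many bits n distinct values need. */
static int min_bits(unsigned n) {
    int b = 0;
    while ((1u << b) < n)
        b++;
    return b;
}

int main(void) {
    printf("%d\n", min_bits(1));    /* 0: a variable with one possible value   */
    printf("%d\n", min_bits(3));    /* 2: three distinct values fit in 2 bits  */
    printf("%d\n", min_bits(256));  /* 8: one byte's worth of values           */
    return 0;
}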

3.1

Exercises

1. If the value of a variable is 0, or e, what is the minimum number of bits needed to represent it? 2
2. If we need to address 3 TiB of memory and we want to address every byte of memory, how long does an
address need to be? 42 bits
3. If the only value a variable can take on is e, how many bits are needed to represent it? 0

CS61c Fall 2015 Discussion 1 C


1

C Introduction

C is syntactically very similar to Java, but there are a few key differences of which to be wary:
C is function oriented, not object oriented, so no objects for you.
C does not automatically handle memory for you.
In the case of stack memory (things allocated in the usual way), a datum is garbage immediately
after the function in which it was defined returns.
In the case of heap memory (things allocated with malloc and friends), data is freed only when the
programmer explicitly frees it.
In any case, allocated memory always holds garbage until it is initialized.
C uses pointers explicitly. *p tells us to use the value that p points to, rather than the value of p, and &x
gives the address of x rather than the value of x.
There are other differences of which you should be aware, but this should be enough for you to get your feet
wet.
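A small C sketch tying the pointer and memory points above together (the variable names are just illustrative):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x = 61;
    int *p = &x;          /* &x gives the address of x                        */
    *p = 42;              /* *p is the value p points to, so this changes x   */
    printf("%d\n", x);    /* prints 42 */

    int *arr = malloc(4 * sizeof(int));  /* heap memory: holds garbage until initialized */
    arr[0] = x;
    free(arr);            /* heap data lives until the programmer frees it    */
    return 0;
}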

Uncommented Code? Yuck!

The following functions work correctly (note: this does not mean intelligently), but have no comments. Document the code to prevent it from causing further confusion.
1. /* Returns the sum of the first N elements in ARR. */
int foo(int *arr, size_t n) {
return n ? arr[0] + foo(arr + 1, n - 1) : 0;
}

2. /* Returns -1 times the number of zeroes in the first N elements of ARR. */


int bar(int *arr, size_t n) {
int sum = 0, i;
for (i = n; i > 0; i--) {
sum += !arr[i - 1];
}
return ~sum + 1;
}

3. /* Does nothing observable: it XOR-swaps its local copies of x and y, which are passed by value, so the caller's variables are unchanged. */
void baz(int x, int y) {
x = x ^ y;
y = x ^ y;
x = x ^ y;
}

Programming with Pointers

Implement the following functions so that they perform as described in the comments.
1. /* Swaps the value of two ints outside of this function. */
void swap(int *x, int *y) {
int temp = *x;
*x = *y;
*y = temp;
}

2. /* Increments the value of an int outside of this function by one. */


void plus_plus(int *x) {
(*x)++; // or: x[0]++;
}

3. /* Returns the number of bytes in a string. Does not use strlen. */


int mystrlen(char* str) {
int count = 0;
while(*str++) {
count++;
}
return count;
}

Problem?

The following code segments may contain logic and syntax errors. Find and correct them.
1. /* Returns the sum of all the elements in SUMMANDS. */
int sum(int* summands) { // int sum(int* summands, unsigned int n) {
int sum = 0;
for (int i = 0; i < sizeof(summands); i++) // for (int i = 0; i < n; i++)
sum += *(summands + i);
return sum;
}

2. /* Increments all the letters in the string STRING, held in an array of length N.
* Does not modify any other memory which has been previously allocated. */
void increment(char* string, int n) {
for (int i = 0; i < n; i++) // for (i = 0; string[i] != 0; i++)
*(string + i)++; // string[i]++; or (*(string + i))++;
// consider the corner case of incrementing 0xFF
}

3. /* Copies the string SRC to DST. */


void copy(char* src, char* dst) {
while (*dst++ = *src++);
}
// This code has no errors.

CS61c Fall 2015 Discussion 2 C Memory Management & MIPS


1

C Memory Management
1. Match the items on the left with the memory segment in which they are stored. Answers may be used
more than once, and more than one answer may be required.
1. Static variables: B
2. Local variables: D
3. Global variables: B
4. Constants: A, B
5. Machine Instructions: A
6. Data: B
7. malloc(): C
8. String Literals: B
9. Characters: A, B, C, D

A. Code
B. Static
C. Heap
D. Stack

2. What is wrong with the C code below?


int* ptr = malloc(4 * sizeof(int));
if(extra_large) ptr = malloc(10 * sizeof(int)); // Memory leak if extra_large is true
return ptr;

3. Write code to prepend (add to the start) to a linked list, and to free/empty the entire list.

struct ll_node { struct ll_node* next; int value; };

void prepend(struct ll_node** list, int value) {
    struct ll_node* item = (struct ll_node*) malloc(sizeof(struct ll_node));
    item->value = value;
    item->next = *list;
    *list = item;
}

void free_ll(struct ll_node** list) {
    if (*list) {
        free_ll(&((*list)->next));
        free(*list);
    }
    *list = NULL;
}

Note: *list points to the first element of the list, or is NULL if the list is empty.
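A short usage sketch of the two functions above (assuming the struct ll_node, prepend, and free_ll definitions from this answer are in scope):

#include <stdlib.h>

int main(void) {
    struct ll_node *list = NULL;   /* empty list */
    prepend(&list, 1);
    prepend(&list, 2);
    prepend(&list, 3);             /* list is now 3 -> 2 -> 1 */
    free_ll(&list);                /* frees every node and sets list back to NULL */
    return 0;
}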

MIPS Intro
1. Assume we have an array in memory that contains int* arr = {1,2,3,4,5,6,0}. Let the value of arr
be a multiple of 4 and stored in register $s0. What do the following programs do?
a) lw  $t0, 12($s0)      // lb, lh
   add $t1, $t0, $s0
   sw  $t0, 4($t1)       // arr[2] <- 4; sb, sh

b) addiu $s1, $s0, 27
   lh    $t0, -3($s1)    // $t0 <- 0; lw, lb

c) addiu $s1, $s0, 24
   lh    $t0, -3($s1)    // alignment error; lb

d) addiu $t0, $0, 12
   sw    $t0, 6($s0)     // alignment error; sh, sb

e) addiu $t0, $0, 8
   sw    $t0, -4($s0)    // out of bounds; sh, sb

f) addiu $s1, $s0, 10
   addiu $t0, $0, 6
   sw    $t0, 2($s1)     // arr[3] <- 6; sh, sb

2. In 1), what other instructions could be used in place of each load/store without alignment errors?

3. What are the instructions to branch to label: on each of the following conditions?

$s0 < $s1:   slt $t0, $s0, $s1
             bne $t0, $0, label

$s0 <= $s1:  slt $t0, $s1, $s0
             beq $t0, $0, label

$s0 > 1:     sltiu $t0, $s0, 2
             beq   $t0, $0, label

$s0 >= 1:    bgtz $s0, label

Translating between C and MIPS

Translate between the C and MIPS code. You may want to use the MIPS Green Sheet as a reference. In all of
the C examples, we show you how the different variables map to registers, so you don't have to worry about the
stack or any memory-related issues.
C

MIPS

// $s0 -> a, $s1 -> b


// $s2 -> c, $s3 -> z
int a = 4, b = 5, c = 6, z;
z = a + b + c + 10;

addiu $s0, $0, 4


addiu $s1, $0, 5
addiu $s2, $0, 6
addu $s3, $s0, $s1
addu $s3, $s3, $s2
addiu $s3, $s3, 10

// $s0 -> int * p = intArr;


// $s1 -> a;
*p = 0;
int a = 2;
p[1] = p[a] = a;

sw $0, 0($s0)
addiu $s1, $0, 2
sw $s1, 4($s0)
sll $t0, $s1, 2
add $t0, $t0, $s0
sw $s1, 0($t0)

// $s0 -> a, $s1 -> b

addiu $s0, $0, 5


addiu $s1, $0, 10
addu $t0, $s0, $s0
bne $t0, $s1, else
xor $s0, $0, $0
j exit
else:
addiu $s1, $s0, -1
exit:

int a = 5, b = 10;
if(a + a == b) {
a = 0;
} else {
b = a - 1;
}

// computes s1 = 2^30
s1 = 1;
for(s0=0;s0<30;s0++) {
s1 *= 2;
}

addiu $s0, $0, 0


addiu $s1, $0, 1
addiu $t0, $0, 30
loop:
beq $s0, $t0, exit
addu $s1, $s1, $s1
addiu $s0, $s0, 1
j loop
exit:

// $a0 -> n, $v0 -> sum


int sum;
for(sum=0;n>0;sum+=n--);

xor $v0, $0, $0


loop:
blez $a0, exit
addu $v0, $v0, $a0
addiu $a0, $a0, -1
j loop
exit:

CS61c Summer 2015 Discussion 3 MIPSII/Instruction Formats


1

Translating between C and MIPS

Translate between the C and MIPS code. You may want to use the MIPS Green Sheet as a reference. In all of
the C examples, we show you how the different variables map to registers you dont have to worry about the
stack or any memory-related issues.
C

MIPS

// Strcpy:
// $s1 -> char s1[]
// $s2 -> char *s2 = malloc(sizeof(char) * 7);
int i = 0;
do {
    s2[i] = s1[i];
    i++;
} while (s1[i] != '\0');
s2[i] = '\0';

      addiu $t0, $0, 0
Loop: addu  $t1, $s1, $t0    # &s1[i]
      addu  $t2, $s2, $t0    # &s2[i]
      lb    $t3, 0($t1)      # char is 1 byte!
      sb    $t3, 0($t2)
      addiu $t0, $t0, 1
      addiu $t1, $t1, 1      # unnecessary line
      lb    $t4, 0($t1)      # could use offset
      bne   $t4, $0, Loop
Done: sb    $t4, 1($t2)

      ...
      beq   $s0, $0, Ret0
      addiu $t2, $0, 1
      beq   $s0, $t2, Ret1
      addiu $s0, $s0, -2
Loop: beq   $s0, $0, RetF
      addu  $s1, $t0, $t1
      addiu $t0, $t1, 0
      addiu $t1, $s1, 0
      addiu $s0, $s0, -1
      j     Loop
Ret0: addiu $v0, $0, 0
      j     Done
Ret1: addiu $v0, $0, 1
      j     Done
RetF: addu  $v0, $0, $s1
Done: ...

// Nth_Fibonacci(n):
// $s0 -> n, $s1 -> fib
// $t0 -> i, $t1 -> j
// Assume fib, i, j are these values
int fib = 1, i = 1, j = 1;
if (n==0)
return 0;
else if (n==1) return 1;
n -= 2;
while (n != 0) {
fib = i + j;
j = i;
i = fib;n--;
}
return fib;

// Collatz conjecture
// $s0 -> n
unsigned n;
L1: if (n % 2) goto L2;
goto L3;
L2: if (n == 1) goto L4;
n = 3 * n + 1;
goto L1;
L3: n = n >> 1;
goto L1;
L4: return n;

L1:  addiu $t0, $0, 2
     div   $s0, $t0       # puts (n % 2) in $hi
     mfhi  $t0            # sets $t0 = (n % 2)
     bne   $t0, $0, L2
     j     L3
L2:  addiu $t0, $0, 1
     beq   $s0, $t0, L4
     addiu $t0, $0, 3
     mul   $s0, $s0, $t0
     addiu $s0, $s0, 1
     j     L1
L3:  srl   $s0, $s0, 1
     j     L1
L4:  ...

MIPS Addressing Modes


We have several addressing modes to access memory (immediate not listed):
(a) Base displacement addressing: Adds an immediate to a register value to create a memory address
(used for lw, lb, sw, sb)
(b) PC-relative addressing: Uses the PC (actually the current PC plus four) and adds the I-value of
the instruction (multiplied by 4) to create an address (used by I-format branching instructions like
beq, bne)
(c) Pseudodirect addressing: Uses the upper four bits of the PC and concatenates a 26-bit value from
the instruction (with implicit 00 lowest bits) to make a 32-bit address (used by J-format instructions)
(d) Register Addressing: Uses the value in a register as a memory address (jr)

(1) You need to jump to an instruction that is 2^28 + 4 bytes higher than the current PC. How do you do it?
Assume you know the exact destination address at compile time. (Hint: you need multiple instructions)
The jump instruction can only reach addresses that share the same upper 4 bits as the PC. A jump 2^28 + 4
bytes away would require changing the fourth highest bit, so a jump instruction is not sufficient. We must
manually load our 32-bit address into a register and use jr.
lui $at, {upper 16 bits of Foo}
ori $at, $at, {lower 16 bits of Foo}
jr  $at

(2) You now need to branch to an instruction 2^17 + 4 bytes higher than the current PC, when $t0 equals 0.
Assume that we're not jumping to a new 2^28 byte block. Write MIPS to do this.
The largest address a branch instruction can reach is PC + 4 + SignExtImm. The immediate field is 16
bits and signed, so the largest value is 2^15 - 1 words, or 2^17 - 4 bytes. Thus, we cannot use a branch
instruction to reach our goal, but by the problem's assumption, we can use a jump. Assuming we're jumping
to label Foo:
          beq $t0, $0, DontJump
          j   Foo
DontJump: ...

(3) Given the following MIPS code (and instruction addresses), fill in the blank fields for the following instructions (you'll need your green sheet!):
0x002cff00: loop: addu $t0, $t0, $t0
0x002cff04:
jal foo
0x002cff08:
bne $t0, $zero, loop
...
0x00300004: foo: jr $ra

addu $t0, $t0, $t0:   | 0 | 8 | 8 | 8 | 0 | 0x21 |
jal foo:              | 3 | 0xc0001 |
bne $t0, $zero, loop: | 5 | 8 | 0 | -3 = 0xfffd |
$ra = 0x002cff08

(4) What instruction is 0x00008A03?

Hex -> bin: 0000 0000 0000 0000 1000 1010 0000 0011
0 opcode -> R-type: 000000 00000 00000 10001 01000 000011
sra $s1, $0, 8

CS61C Fall 2015 Discussion 4 MIPS Procedures & CALL


1

MIPS Control Flow

Conventions
1. How should $sp be used? When do we add or subtract from $sp?
$sp points to a location on the stack to load or store into. Subtract from $sp before storing, and add to $sp
after restoring.
2. Which registers need to be saved or restored before using jr to return from a function?
All $s* registers that were modified during the function must be restored to their value at the start of the
function
3. Which registers need to be saved before using jal?
$ra, and all $t*, $a*, and $v* registers if their values are needed later after the function call.
4. How do we pass arguments into functions?
$a0, $a1, $a2, $a3 are the four argument registers
5. What do we do if there are more than four arguments to a function?
Use the stack to store additional arguments
6. How are values returned by functions?
$v0 and $v1 are the return value registers.

When calling a function in MIPS, who needs to save the following registers to the stack? Answer caller for the
procedure making a function call, callee for the function being called, or N/A for neither.

Register | $0  | $v*    | $a*    | $t*    | $s*    | $sp | $ra
Saved by | N/A | Caller | Caller | Caller | Callee | N/A | Caller

Now assume a function foo calls another function bar (which may be called from a main function), which is known to
call some other functions. foo takes one argument and will modify and use $t0 and $s0. bar takes two arguments,
returns an integer, and uses $t0-$t2 and $s0-$s1. In the boxes below, draw a possible ordering of the stack just
before bar calls a function. The top left box is the address of $sp when foo is first called, and the stack goes
downwards, continuing at each next column. Add (f) if the register is stored by foo and (b) if the register is
stored by bar. The first one is written in for you.
1 $ra (f)
2 $s0 (f)
3 $v0 (f)
4 $a0 (f)
5 $t0 (f)
6 $ra (b)
7 $s0 (b)
8 $s1 (b)
9 $v0 (b)
10 $a0 (b)
11 $a1 (b)
12 $t0 (b)
13 $t1 (b)
14 $t2 (b)
15 (empty)
16 (empty)

A Guide to Writing Functions

C to MIPS
1. Assuming $a0 and $a1 hold integer pointers, swap the values they point to via the stack and return control.
void swap(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

addiu $sp, $sp, -4
lw    $t0, 0($a0)
sw    $t0, 0($sp)
lw    $t0, 0($a1)
sw    $t0, 0($a0)
lw    $t0, 0($sp)
sw    $t0, 0($a1)
addiu $sp, $sp, 4
jr    $ra

2. Translate the following algorithm that finds the sum of the numbers from 0 to N to MIPS assembly. Assume
$s0 holds N, $s1 holds sum, and that N is greater than or equal to 0.

int sum = 0;
if (N == 0)
    return 0;
while (N != 0) {
    sum += N;
    N--;
}
return sum;

Start: add   $s1, $0, $0
Loop:  beq   $s0, $0, Ret
       add   $s1, $s1, $s0
       addiu $s0, $s0, -1
       j     Loop
Ret:   addiu $v0, $0, 0
       j     Done
Done:  jr    $ra

3. What must be done to make the adding algorithm from the previous part into a callable MIPS function?
Add a prologue and epilogue to reserve space on the stack and store all necessary variables (see #3). Use $a0
instead of $s0 to store N, the function's argument.
RecursiveSum:
addiu $sp, $sp, -8
sw $ra, 4($sp)
sw $a0, 0($sp)
li $v0, 0
beq $a0, $0, Ret
addiu $a0, $a0, -1
jal RecursiveSum
lw $a0, 0($sp)
addu $v0, $v0, $a0
Ret:
lw $ra, 4($sp)
addiu $sp, $sp, 8
jr $ra

Compile, Assemble, Link, Load, and Go!

5.1

Overview

5.2

Exercises

1. What is the Stored Program concept and what does it enable us to do?
It is the idea that instructions are just the same as data, and we can treat them as such. This enables us to
write programs that can manipulate other programs!
2. How many passes through the code does the Assembler have to make? Why?
Two: one to find all the label addresses and another to convert all instructions while resolving any forward
references using the collected label addresses.
3. What are the different parts of the object files output by the Assembler?
Header: Size and position of other parts
Text: The machine code
Data: Binary representation of any data in the source file
Relocation Table: Identifies lines of code that need to be handled by Linker
Symbol Table: List of the file's labels and data that can be referenced
Debugging Information: Additional information for debuggers
4. Which step in CALL resolves relative addressing? Absolute addressing? Assembler, Linker.
5. What step in CALL may make use of the $at register? Assemble
6. What does RISC stand for? How is this related to pseudoinstructions?
Reduced Instruction Set Computing. Minimal set of instructions leads to many lines of code. Pseudoinstructions are more complex instructions intended to make assembly programming easier for the coder. These are
converted to TAL by the assembler.

State
1. Fill out the timing diagram for the circuit below:

        +---+      +---+      +---+
   IN---|D Q|--s0--|D Q|--s1--|D Q|---Out
        +-^-+      +-^-+      +-^-+
          |          |          |
   CLK----+----------+----------+

   (Timing diagram signals: clk, in, s0, s1, out)

2. Fill out the timing diagram for the circuit below:

        +---+      +---+
   A----|D Q|--R1--|D Q|--R2
        +-^-+      +-^-+
          |          |
   CLK----+---|>o----+

   (Timing diagram signals: clk, !clk, A, R1, R2)

Logic Gates
1. Label the following logic gates:

Solution: not, and, or, xor, nand, nor, xnor

2. Convert the following to boolean expressions (writing X' for NOT X):

(a) NAND
Solution: A'B' + A'B + AB' = (AB)'

(b) XOR
Solution: A'B + AB'

(c) XNOR
Solution: A'B' + AB

3. Create an AND gate using only NAND gates.

Solution: NAND A and B together, then feed that output into both inputs of a second NAND gate (a NAND with its inputs tied together acts as an inverter), giving Output = AB.

4. How many different two-input logic gates can there be? How many n-input logic gates?
Solution: A truth table with n inputs has 2^n rows. Each logic gate has a 0 or a 1 at each of these
rows. Imagining a function as a 2^n-bit number, we count 2^(2^n) total functions, or 16 in the case of n
= 2.
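The 2^(2^n) count can be checked with a few lines of C (small n only, to keep the shift in range):

#include <stdio.h>

int main(void) {
    for (int n = 1; n <= 4; n++) {
        unsigned long long rows  = 1ULL << n;       /* 2^n truth-table rows       */
        unsigned long long gates = 1ULL << rows;    /* 2^(2^n) possible functions */
        printf("n = %d: %llu gates\n", n, gates);   /* n = 2 prints 16            */
    }
    return 0;
}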

Boolean Logic
1 + A = 1              A + A' = 1
0B = 0                 BB' = 0
A + AB = A             A + A'B = A + B
(A + B)(A + C) = A + BC
DeMorgan's Law: (AB)' = A' + B'      (A + B)' = A'B'

1. Minimize the following boolean expressions:

(a) Standard: (A + B)(A + B')C

Solution:
(AA + AB' + AB + BB')C = (A + A(B + B'))C = AC

(b) Grouping & Extra Terms: A'B'C + A'BC + AB'C' + AB'C + ABC' + ABC

Solution:
A'C(B' + B) + AC(B' + B) + AC'(B + B')
= A'C + AC + AC'
= A'C + AC + AC' + AC
= (A + A')C + A(C + C')
= A + C

(c) DeMorgan's: (A(B'C + BC'))'

Solution:
(A(B'C + BC'))'
= A' + (B'C + BC')'
= A' + (B'C)'(BC')'
= A' + (B + C')(B' + C)
= A' + B'C' + BC

CS61C Fall 2015


Discussion 6 Single Cycle CPU Datapath and Control
__________________________________________________________________
Single Cycle CPU Design
Here we have a single cycle CPU diagram. Answer the following questions:
1. Name each component.
2. Name each datapath stage and explain its functionality.
Stage: Functionality
Instruction Fetch: Send an address to the instruction memory; read the instruction (MEM[PC]).
Decode / Register Read: Generate the control signal values using the opcode & funct fields; read the register values with the rs & rt fields; sign / zero extend the immediate.
Execute: Perform arithmetic / logical operations.
Memory: Read from / write to the data memory.
Register Write: Write back the ALU result / the memory load to the register file.

3. Provide data inputs and control signals to the next PC logic.


4. Implement the next PC logic.
[Diagram: next PC logic and the single cycle datapath. The next PC mux selects between PC+4, the branch address (PC+4 plus the sign-extended immediate shifted left by 2, taken when Branch and the ALU Zero output are asserted), and the jump address ((PC+4)[31:28] concatenated with Inst[25:0] and 00). The datapath stages are Instruction Fetch, Decode / Register Read, Execute, Memory, and Register Write, built from the PC, Instruction Memory, Register File, sign / zero extender, ALU, Data Memory, and muxes. The Control Unit decodes Inst[31:26] and Inst[5:0] into the signals Jump, Branch, RegDst, ExtOp, ALUSrc, ALUCtr, MemWr, MemToReg, and RegWr.]

Single Cycle CPU Control Logic

Note: The Zero signal in the ALU is just one way to do this.
The reasoning for using a Zero here is that based on the following instructions
(on the next page) that we need to account for, we only want to branch if two
values are equal. We can easily do this by subtracting the two and outputting a
1 if the result is equivalent to 0 (hence the Zero signal)

Fill out the values for the control signals from the previous CPU diagram.

Control Signals
Instr | Jump | Branch | RegDst | ExtOp | ALUSrc | ALUCtr | MemWr | MemToReg | RegWr
add   |  0   |   0    |   1    |   X   |   0    |  0010  |   0   |    0     |   1
ori   |  0   |   0    |   0    |   0   |   1    |  0001  |   0   |    0     |   1
lw    |  0   |   0    |   0    |   1   |   1    |  0010  |   0   |    1     |   1
sw    |  0   |   0    |   X    |   1   |   1    |  0010  |   1   |    X     |   0
beq   |  0   |   1    |   X    |   1   |   0    |  0110  |   0   |    X     |   0
j     |  1   |   X    |   X    |   X   |   X    |  XXXX  |   0   |    X     |   0
X: don't care value (either 0 or 1 is ok)

This table shows the ALUCtr values for each operation of the ALU:
Operation | AND  | OR   | ADD  | SUB  | SLT  | NOR
ALUCtr    | 0000 | 0001 | 0010 | 0110 | 0111 | 1100

Clocking Methodology

The input signal to each state element must stabilize before each rising edge.
Critical path: Longest delay path between state elements in the circuit.
tclk >= tclk-to-q + tCL + tsetup, where tCL is the critical path in the combinational logic.
If we place registers in the critical path, we can shorten the period by reducing
the amount of logic between registers.

Single Cycle CPU Performance Analysis

The delays of circuit elements are given as follows:

Element    | Register clk-to-q | Register Setup | MUX  | ALU  | Mem Read | Mem Write | RegFile Read | RegFile Setup
Parameter  | tclk-to-q         | tsetup         | tmux | tALU | tMEMread | tMEMwrite | tRFread      | tRFsetup
Delay (ps) | 30                | 20             | 25   | 200  | 250      | 200       | 150          | 20

1. Give an instruction that exercises the critical path.


Load Word (lw)
2. What is the critical path in the single cycle CPU?
Red dashed line in the diagram
3. What are the minimum clock cycle, tclk, and the maximum clock frequency, fclk?
Assume the tclk-to-q > hold time.
tclk >= tPC, clk-to-q + tIMEMread + tRFread + tALU + tDMEMread + tmux + tRFsetup
= 30 + 250 + 150 + 200 + 250 + 25 + 20 = 925 ps
fclk = 1/tclk <= 1/ (925 ps) = 1.08 GHz
4. Why is a single cycle CPU inefficient?
- Not all instructions exercise the critical path, yet every instruction takes a full (worst-case) clock cycle.
- It is not pipelined: each component is busy for only part of the cycle, even though the components could be working on different instructions concurrently.
5. How can you improve its performance? What is the purpose of pipelining?
Pipelining: Put pipeline registers between two datapath stages -> reduce the clock time.

CS61C Fall 2015


Discussion 7 Pipelined CPU
__________________________________________________________________
Pipelined CPU Design

Now, we will optimize a single cycle CPU using pipelining. Pipelining is a powerful logic design
method to reduce the clock time and improve the throughput, even though it increases the
latency of an individual task and adds additional logic. In a pipelined CPU, multiple instructions
are overlapped in execution. This is a good example of parallelism, which is one of the great ideas
in computer architecture. To obtain a pipelined CPU, we will take the following steps.

Step 1: Pipeline Registers

Pipelining starts from adding pipelining registers by dividing a large combinational logic. We
have already chopped a single cycle CPU into five stages, and thus, will add pipeline registers
between two stages.

Step 2: Performance Analysis

A great advantage of pipelining is the performance improvement with a shorter clock time. We
will use the same timing parameters as those in the previous discussion.
Element    | Register clk-to-q | Register Setup | MUX  | ALU  | Mem Read | Mem Write | RegFile Read | RegFile Setup
Parameter  | tclk-to-q         | tsetup         | tmux | tALU | tMEMread | tMEMwrite | tRFread      | tRFsetup
Delay (ps) | 30                | 20             | 25   | 200  | 250      | 200       | 150          | 20

Q1. What was the clock time and frequency of a single cycle CPU?
tclk,single >= tPC,clk-to-q + tIMEMread + tRFread + tALU + tDMEMread + tmux + tRFsetup
= 30 + 250 + 150 + 200 + 250 + 25 + 20 = 925 ps
fclk,single = 1/tclk,single <= 1/(925 ps) = 1.08 GHz
Q2. What is the clock time and frequency of a pipelined CPU?
tclk,pipe >= tclk-to-q + tMEMread + tsetup = 30 + 250 + 20 = 300 ps (the memory stages are the longest)
fclk,pipe = 1/tclk,pipe <= 1/(300 ps) = 3.33 GHz

Q3. What is the speed-up? Why is it less than five?

Speed-up = tclk,single / tclk,pipe = fclk,pipe / fclk,single = 925 / 300 = 3.08.
This is because pipeline stages are not balanced evenly and there is overhead from the pipeline
registers (tclk-to-q, tsetup). Moreover, this does not include the delays from the additional logic for
hazard resolution.

Step 3: Pipeline Hazard

The performance improvement comes at a cost. Pipelining introduces pipeline hazards we have
to overcome.

Structural Hazard
Structural hazards occur when more than one instruction uses the same resource at the same time.
Register File: One instruction reads from the register file while another writes to it. We can
solve this by having separate read and write ports and writing to the register file at the falling
edge of the clock.
Memory: The memory is accessed not only for the instruction but also for the data. Separate
caches for instructions and data solve this hazard.

Data Hazard and Forwarding


Data hazards occur due to data dependencies among instructions. Forwarding can solve many
data hazards.
Q1. Spot the data dependencies in the code below and figure out how forwarding can resolve
data hazards.
Instruction          | C0 | C1  | C2  | C3  | C4  | C5  | C6
addi $t0, $s0, -1    | IF | REG | EX  | MEM | WB  |     |
and  $s2, $t0, $a0   |    | IF  | REG | EX  | MEM | WB  |
sw   $s0, 100($t0)   |    |     | IF  | REG | EX  | MEM | WB
The REG steps for instructions 2 and 3 depend on data in the registers that is only available after the WB
step of instruction 1. We can forward the ALU output of the first instruction to the EX stages of
future instructions.
Q2. In general, under what conditions will an EX stage need to take in forwarded inputs from
previous instructions? Where should those inputs come from in regards to the current cycle?
Assume you have the signals ALUout(n), rt(n), rs(n), regWrite(n), and regDst(n), where n is 0
for the signal of the current instruction being executed by the EX stage, -1 for the previous, etc.
Forward ALUout(-1) if (rt(0) == regDst(-1) || rs(0) == regDst(-1)) && regWrite(-1)
Forward ALUout(-2) if (rt(0) == regDst(-2) || rs(0) == regDst(-2)) && regWrite(-2)
Forward ALUout(-3) if (rt(0) == regDst(-3) || rs(0) == regDst(-3)) && regWrite(-3)

Data Hazard and Stall
Forwarding cannot solve all data hazards. We need to stall the pipeline in some cases.
Q1. Spot the data dependencies in the code below and figure out why forwarding cannot
resolve this hazard.
Instruction          | C0 | C1  | C2  | C3  | C4  | C5
lw   $t0, 20($s0)    | IF | REG | EX  | MEM | WB  |
addu $t1, $t0, $t0   |    | IF  | REG | EX  | MEM | WB
The add instruction needs the value of $t0 at the beginning of C3, but it is ready at the end of C3.
Q2. Now we stall the pipeline one cycle and insert a nop after the lw instruction. Figure out how
this can resolve the hazard.
Instruction          | C0 | C1  | C2  | C3  | C4  | C5  | C6
lw   $t0, 20($s0)    | IF | REG | EX  | MEM | WB  |     |
nop                  |    | IF  | REG | EX  | MEM | WB  |
addu $t1, $t0, $t0   |    |     | IF  | REG | EX  | MEM | WB
By stalling one cycle, the add instruction can start its execute stage after the $t0 value is ready.

Q3. Under what conditions do we need to introduce a nop? Under what conditions do we need
to forward the output of the MEM stage to the EX stage? Assume you have the signals
memToReg(n), rt(n), rs(n), regWrite(n), and regDst(n), where n is 0 for the signal of the current
instruction being executed by the EX stage, -1 for the previous, etc.
We insert a nop if (rt(0) == regDst(-1) || rs(0) == regDst(-1)) && memToReg(-1) && regWrite(-1)
We forward the MEM output if (rt(0) == regDst(-2) || rs(0) == regDst(-2)) && memToReg(-2) && regWrite(-2)
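A rough C restatement of these two conditions (the signal accessors are hypothetical stand-ins, not part of any real datapath code):

#include <stdbool.h>

/* n = 0 is the instruction currently in EX, n = -1 the one ahead of it, etc. */
extern int  rs(int n), rt(int n), regDst(int n);
extern bool regWrite(int n), memToReg(int n);

/* Stall (insert a nop) when the instruction directly ahead is a load whose
 * destination register we are about to read: its data only exists after MEM. */
bool must_stall(void) {
    return (rt(0) == regDst(-1) || rs(0) == regDst(-1))
           && memToReg(-1) && regWrite(-1);
}

/* Forward the MEM output of the instruction two ahead into our EX stage. */
bool forward_mem_output(void) {
    return (rt(0) == regDst(-2) || rs(0) == regDst(-2))
           && memToReg(-2) && regWrite(-2);
}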

Control Hazard and Prediction


Control hazards occur due to jumps and branches. We may solve them by stalling the pipeline.
However, it is painful since the branch condition is calculated after the execution stage and the
pipeline is stalled for three cycles. Instead, we add a branch comparator inside the register read
stage and introduce the branch delay slot, and redefine MIPS so that the instruction after a
branch statement will always be executed.
Q1. Reorder the following sets of instructions to account for the branch delay slot. You may
have to insert a nop instruction.
Set 1:            addiu $t0, $t1, 5
                  ori   $t2, $t3, 0xff
                  beq   $t0, $s0, label
                  lw    $t4, 0($t0)

Reordered Set 1:  addiu $t0, $t1, 5
                  beq   $t0, $s0, label
                  ori   $t2, $t3, 0xff
                  lw    $t4, 0($t0)

Set 2:            addiu $t0, $t1, 5
                  ori   $t2, $t3, 0xff
                  beq   $t0, $t2, label
                  lw    $t4, 0($t0)

Reordered Set 2:  addiu $t0, $t1, 5
                  ori   $t2, $t3, 0xff
                  beq   $t0, $t2, label
                  nop
                  lw    $t4, 0($t0)

CS 61C Fall 2015 Discussion 8 Caches


In the following diagram, each blank box in the CPU Cache represents 8 bits (1 byte) of data. Our
memory is byte-addressed, meaning that there is one address for each byte. Compare this to word-
addressed, which means that there is one address for each word.

Tag bits: 29    Index bits: 1    Offset bits: 2    Total: 32
(The CPU cache pictured has 2 index rows, numbered 1 and 0, each with 4 offset columns, numbered 3, 2, 1, 0.)
Index bits = log2(number of index rows)
Offset bits = log2(number of offset columns)

1. Direct mapped caches


1. How many bytes of data can our cache hold? 8 bytes
How many words? 2 words
2. Fill in the Tag bits, Index bits, Offset bits with the correct T:I:O breakdown according to the
diagram.
3. Let's say we have an 8192 KiB cache with a 128 B block size. What is the tag, index, and offset
of 0xFEEDF00D?
0xFEEDF00D = 0b 1111 1110 1110 1101 1111 0000 0000 1101

Tag: 1 1111 1101 (0x1FD)    Index: 1101 1011 1110 0000 (0xDBE0)    Offset: 000 1101 (0x0D)
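A quick C sketch of the same breakdown using shifts and masks (the bit counts are specific to this 8192 KiB, 128 B-block cache):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    const int offset_bits = 7;    /* 128 B blocks             */
    const int index_bits  = 16;   /* 8192 KiB / 128 B blocks  */
    uint32_t addr = 0xFEEDF00D;

    uint32_t offset = addr & ((1u << offset_bits) - 1);
    uint32_t index  = (addr >> offset_bits) & ((1u << index_bits) - 1);
    uint32_t tag    = addr >> (offset_bits + index_bits);

    printf("tag=0x%X index=0x%X offset=0x%X\n", tag, index, offset);
    /* prints: tag=0x1FD index=0xDBE0 offset=0xD */
    return 0;
}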


4. Fill in the table below. Assume we have a write-through cache, so the number of bits per
row includes only the cache data, the tag, and the valid bit.

Address size (bits) | Cache size | Block size | Tag bits | Index bits | Offset bits | Bits per row
16                  | 4 KiB      | 4 B        | 4        | 10         | 2           | 32 + 4 + 1 = 37
32                  | 32 KiB     | 16 B       | 17       | 11         | 4           | 128 + 17 + 1 = 146
32                  | 64 KiB     | 16 B       | 16       | 12         | 4           | 128 + 16 + 1 = 145
64                  | 2048 KiB   | 128 B      | 43       | 14         | 7           | 1024 + 43 + 1 = 1068

2. Cache hits and misses


Assume we have the following byte-addressed cache. Of the 32 bits in each address, which bits do we
use to find the row of the cache to use? We use the 4th and 5th least significant bits, since the offset is 3
bits and the index is 2 bits.
Classify each of the following byte memory accesses as a cache hit (H), cache miss (M), or cache miss
with replacement (R).
(The cache pictured has 4 index rows, numbered 0-3, each holding 8 bytes at offsets 7-0.)

1. 0x00000004  Index 0, Tag 0: M, Compulsory
2. 0x00000005  Index 0, Tag 0: H
3. 0x00000068  Index 1, Tag 3: M, Compulsory
4. 0x000000C8  Index 1, Tag 6: R, Compulsory
5. 0x00000068  Index 1, Tag 3: R, Conflict
6. 0x000000DD  Index 3, Tag 6: M, Compulsory
7. 0x00000045  Index 0, Tag 2: R, Compulsory
8. 0x00000004  Index 0, Tag 0: R, Capacity
9. 0x000000C8  Index 1, Tag 6: R, Capacity

3. 3Cs of Caches
3 types of cache misses:
1. Compulsory: Miss to an address not seen before. Reduce compulsory misses by having a longer
cache line, which brings in locations before we ask for them.
2. Conflict: Increasing the associativity or improving the replacement policy would remove the
miss.
3. Capacity: The only way to remove the miss is to increase the cache capacity.
Classify each M and R above as one of the 3 misses above.

4. Analyzing C Code
#define NUM_INTS 8192
int A[NUM_INTS];                                        /* A lives at 0x10000 */
int i, total = 0;
for (i = 0; i < NUM_INTS; i += 128) { A[i] = i; }       /* Line 1 */
for (i = 0; i < NUM_INTS; i += 128) { total += A[i]; }  /* Line 2 */

Let's say you have a byte-addressed computer with a total memory of 1 MiB. It features a 16 KiB CPU
cache with 1 KiB blocks.
1. How many bits make up a memory address on this computer? 20
2. What is the T:I:O breakdown? tag bits: 6, index bits: 4, offset bits: 10
3. Calculate the cache hit rate for the line marked Line 1: 50%
The integer accesses are 4 * 128 = 512 bytes apart, which means there are 2 accesses per
block. The first access in each block is a cache miss, but the second is a hit because A[i]
and A[i + 128] are in the same cache block.
4. Calculate the cache hit rate for the line marked Line 2: 50%
The size of A is 8192 * 4 = 2^15 bytes. This is exactly twice the size of our cache. At the end of
Line 1, we have the second half of A inside the cache, while in Line 2 we start accesses from
the beginning of the array. Thus we cannot reuse any of the contents of A and we get the
same hit rate as before. Note that we do not have to consider cache hits for total, since
the compiler will probably keep it in a register.

5. Average Memory Access Time


AMAT is the average (expected) time it takes for memory access. It can be calculated using the
formula:
AMAT = hit_time + miss_rate x miss_penalty
Remember that the miss penalty is the additional time it takes for memory access in the event
of a cache miss. Therefore, a cache miss takes (hit_time + miss_penalty) time.
1. Suppose that you have a cache system with the following properties. What is the AMAT?
a) L1$ hits in 1 cycle (local miss rate 25%)
b) L2$ hits in 10 cycles (local miss rate 40%)
c) L3$ hits in 50 cycles (global miss rate 6%)
d) Main memory hits in 100 cycles (always hits)
The AMAT is 1 + 0.25*(10 + 0.4*(50)) + 0.06*100 = 14.5 cycles.
Alternatively, we can calculate the global hit rates for each level of the hierarchy:
L1$: 0.75
L2$: 0.25*0.6 = 0.15
L3$: 0.94 - (0.75 + 0.15) = 0.04
Main Memory: 1 - 0.75 - 0.15 - 0.04 = 0.06
And the following hit times:
L1$: 1 cycle
L2$: 1 + 10 = 11 cycles
L3$: 1 + 10 + 50 = 61 cycles
Main Memory: 1 + 10 + 50 + 100 = 161 cycles
Then, AMAT = 0.75*1 + 0.15*11 + 0.04*61 + 0.06*161 = 14.5 cycles.
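The same arithmetic as a tiny C check, with the numbers taken straight from the exercise above:

#include <stdio.h>

int main(void) {
    /* Local-miss-rate form (the 6% used for main memory is the global L3 miss rate). */
    double amat_local  = 1 + 0.25 * (10 + 0.4 * 50) + 0.06 * 100;

    /* Global-hit-rate form: weight each level's cumulative hit time. */
    double amat_global = 0.75 * 1 + 0.15 * 11 + 0.04 * 61 + 0.06 * 161;

    printf("%.1f %.1f\n", amat_local, amat_global);   /* 14.5 14.5 */
    return 0;
}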

CS61C Fall 2015 Discussion 9


Floating Point

The IEEE 754 standard defines a binary representation for floating point values using three fields:

The sign determines the sign of the number (0 for positive, 1 for negative)
The exponent is in biased notation with a bias of 127
The significand is akin to unsigned, but used to store a fraction instead of an integer.

The below table shows the bit breakdown for the single precision (32-bit) representation:

Sign  | Exponent | Significand
1 bit | 8 bits   | 23 bits

There is also a double precision encoding format that uses 64 bits. This behaves the same as the single
precision but uses 11 bits for the exponent (and thus a bias of 1023) and 52 bits for the significand.

How a float is interpreted depends on the values in the exponent and significand fields:

Exponent | Significand | Meaning
0        | Anything    | Denorm
1-254    | Anything    | Normal
255      | 0           | +/- Infinity
255      | Nonzero     | NaN

For normalized floats:
Value = (-1)^Sign x 2^(Exponent - Bias) x 1.significand_2
For denormalized floats:
Value = (-1)^Sign x 2^(-Bias + 1) x 0.significand_2

Exercises
1. How many zeroes can be represented using a float? 2

2. What is the largest finite positive value that can be stored using a single precision float?
0x7F7FFFFF = (2 - 2^-23) x 2^127

3. What is the smallest positive value that can be stored using a single precision float?
0x00000001 = 2^-23 x 2^-126

4. What is the smallest positive normalized value that can be stored using a single precision float?
0x00800000 = 2^-126

5. Convert the following numbers from binary to decimal or from decimal to binary:
0x00000000, 8.25, 0x00000F00, 39.5625, 0xFF94BEEF, -∞

0x00000000 = 0
8.25 = 0x41040000
0x00000F00 = (2^-12 + 2^-13 + 2^-14 + 2^-15) x 2^-126
39.5625 = 0x421E4000
0xFF94BEEF = NaN
-∞ = 0xFF800000
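A small C sketch for double-checking conversions like these by reinterpreting a float's bits (memcpy is used to avoid aliasing issues):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

static uint32_t float_bits(float f) {       /* float -> raw 32-bit pattern */
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u) {       /* raw 32-bit pattern -> float */
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

int main(void) {
    printf("0x%08X\n", float_bits(8.25f));       /* 0x41040000 */
    printf("0x%08X\n", float_bits(39.5625f));    /* 0x421E4000 */
    printf("%g\n",     bits_float(0xFF800000u)); /* -inf       */
    return 0;
}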

AMAT
AMAT is the average (expected) time it takes for memory access. It can be calculated using this formula:

AMAT = hit time + miss rate x miss penalty



Miss rates can be given in terms of either local miss rates or global miss rates. The local miss rate of a
cache is the percentage of accesses into the particular cache that miss at the cache, while the global
miss rate is the percentage of all accesses that miss at the cache.

Exercises

Suppose your system consists of:


A L1$ that hits in 2 cycles and has a local miss rate of 20%
A L2$ that hits in 15 cycles and has a global miss rate of 5%
Main memory hits in 100 cycles

1. What is the local miss rate of L2$?
Local miss rate = 5% / 20% = 0.25 = 25%

2. What is the AMAT of the system?
AMAT = 2 + 20% x 15 + 5% x 100 = 10 (using global miss rates)
Alternatively, AMAT = 2 + 20% x (15 + 25% x 100) = 10

3. Suppose we want to reduce the AMAT of the system to 8 or lower by adding in a L3$. If the L3$ has
a local miss rate of 30%, what is the largest hit time that the L3$ can have?
Let H = hit time of the cache. Using the AMAT equation, we can write:
2 + 20% x (15 + 25% x (H + 30% x 100)) <= 8
Solving for H, we find that H <= 30. So the largest hit time is 30 cycles.

Flynn Taxonomy
1. Explain SISD and give an example if available.
Single Instruction Single Data; each instruction is executed in order, acting on a single stream of data.
For example, traditional computer programs.

2. Explain SIMD and give an example if available.
Single Instruction Multiple Data; each instruction is executed in order, acting on multiple streams of
data. For example, the SSE Intrinsics.

3. Explain MISD and give an example if available.
Multiple Instruction Single Data; multiple instructions are executed simultaneously, acting on a single
stream of data. There are no good modern examples.

4. Explain MIMD and give an example if available.
Multiple Instruction Multiple Data; multiple instructions are executed simultaneously, acting on multiple
streams of data. For example, map reduce or multithreaded programs.

CS61C Spring 2015


Discussion 11 Warehouse Scale Computing and Spark
__________________________________________________________________
Warehouse Scale Computing
1. Amdahl's Law
1) You are going to train the image classifier with 50,000 images on a WSC having more than
50,000 servers. You notice that 99% of the execution can be parallelized. What is the speedup?
Speedup = 1 / (0.01 + 0.99 / 50,000) ≈ 1 / 0.01 = 100
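The same estimate in a couple of lines of C (the fraction and server count are the ones from the question):

#include <stdio.h>

int main(void) {
    double f = 0.99;       /* parallelizable fraction */
    double n = 50000.0;    /* number of servers       */
    double speedup = 1.0 / ((1.0 - f) + f / n);
    printf("%.1f\n", speedup);   /* about 99.8, just under the 1/0.01 = 100 bound */
    return 0;
}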

2. Failure in a WSC
1) In this example, a WSC has 55,000 servers, and each server has four disks whose annual failure
rate is 4%. How many disks will fail per hour?
(55,000 x 4 x 0.04) / (365 x 24) = 1.00 disk failure per hour -> MTTF = 1 hour
2) What is the availability of the system if it does not tolerate the failure? Assume that the time to
repair a disk is 30 minutes.
MTTF = 1, MTTR = 0.5 -> Availability = 1 / (1 + 0.5) = 2/3 = 66.6%
3. Performance of a WSC
                         | Local  | Rack | Array
DRAM latency (us)        | 0.1    | 100  | 300
Global hit rate          | 90%    | 9%   | 1%
DRAM bandwidth (MiB/sec) | 20,000 | 100  | 10
Disk bandwidth (MiB/sec) | 200    | 100  | 10

1) Calculate the AMAT of this WSC. What is vital for WSC performance?
AMAT = 0.9 x 0.1 + 0.09 x 100 + 0.01 x 300 = 0.09 + 9 + 3 = 12.09 us
Locality of access within a server is vital for WSC performance
2) How long does it take to transfer 1,000 MiB a) between disks within the server, and b) between
DRAM within the rack? What can you conclude from this example?
a) 1,000 / 200 = 5 sec, b) 1,000 / 100 = 10 sec. Data transfer outside a single server is detrimental
to WSC performance. Network switches are the bottlenecks
4. Power Usage Effectiveness (PUE) = (Total Building Power) / (IT Equipment Power)
Sources speculate Google has over 1 million servers. Assume each of the 1 million servers draw
an average of 200W, the PUE is 1.5, and that Google pays an average of 6 cents per kilowatt-hour
for datacenter electricity.
1) Estimate Googles annual power bill for its datacenters.
1.5 x 1,000,000 servers x 0.2 kW/server x $0.06/kW-hr x 8760 hrs/yr = $157.68M/yr
2) Google reduced the PUE of a 50,000 machine datacenter from 1.5 to 1.25 without decreasing
the power supplied to the servers. Whats the cost savings per year?
(1.5 - 1.25) x 50,000 servers x 0.2kW/server x $0.06/kW-hr x 8760 hrs/yr = $1.314M/yr

Map Reduce
Use pseudocode to write MapReduce functions necessary to solve the problems below. Also,
make sure to fill out the correct data types. Some tips:
The input to each MapReduce job is given by the signature of the map() function.
The function emit(key k, value v) outputs the key-value pair (k, v).
The for (var in list) syntax can be used to iterate through Iterables, or you can call the
hasNext() and next() functions.
Usable data types: int, float, String. You may also use lists and custom data types
composed of the aforementioned types.
The method intersection(list1, list2) returns a list that is the intersection of list1 and
list2.
1. Given the student's name and the course taken, output each student's name and total GPA.

Declare any custom data types here:
CourseData:
    int courseID
    float studentGrade    // a number from 0-4

map(String student, CourseData value):
    emit(student, value.studentGrade)

reduce(String key, Iterable<float> values):
    totalPts = 0
    totalClasses = 0
    for (grade in values):
        totalPts += grade
        totalClasses += 1
    emit(key, totalPts / totalClasses)

2. Given a person's unique int ID and a list of the IDs of their friends, compute the list of mutual
friends between each pair of friends in a social network.

Declare any custom data types here:
FriendPair:
    int friendOne
    int friendTwo

map(int personID, list<int> friendIDs):
    for (fID in friendIDs):
        if (personID < fID):
            friendPair = (personID, fID)
        else:
            friendPair = (fID, personID)
        emit(friendPair, friendIDs)

reduce(FriendPair key, Iterable<list<int>> values):
    mutualFriends = intersection(values.next(), values.next())
    emit(key, mutualFriends)

3. a) Given a set of coins and each coin's owner, compute the number of coins of each
denomination that a person has.

Declare any custom data types here:
CoinPair:
    String person
    String coinType

map(String person, String coinType):
    key = (person, coinType)
    emit(key, 1)

reduce(CoinPair key, Iterable<int> values):
    total = 0
    for (count in values):
        total += count
    emit(key, total)

b) Using the output of the first MapReduce, compute the amount of money each person has. The
function valueOfCoin(String coinType) returns a float corresponding to the dollar value of
the coin.

map(CoinPair key, int amount):
    emit(key.person, valueOfCoin(key.coinType) * amount)

reduce(String key, Iterable<float> values):
    total = 0
    for (amount in values):
        total += amount
    emit(key, total)

Spark
RDD: primary abstraction of a distributed collection of items
Transforms: RDD -> RDD
map(func)          Return a new distributed dataset formed by passing each element of
                   the source through a function func.
flatMap(func)      Similar to map, but each input item can be mapped to 0 or more
                   output items (so func should return a Seq rather than a single item).
reduceByKey(func)  When called on a dataset of (K, V) pairs, returns a dataset of (K, V)
                   pairs where the values for each key are aggregated using the given
                   reduce function func, which must be of type (V, V) => V.
Actions: RDD -> Value
reduce(func)       Aggregate the elements of the dataset regardless of keys using a
                   function func.
1. Implement Problem 1 of MapReduce with Spark
# students: list((studentName, courseData))
studentsData = sc.parallelize(students)
out = studentsData.map(lambda (k, v): (k, (v.studentGrade, 1))) \
                  .reduceByKey(lambda v1, v2: (v1[0] + v2[0], v1[1] + v2[1])) \
                  .map(lambda (k, v): (k, v[0] / v[1]))

2. Implement Problem 2 of MapReduce with Spark

def genFriendPairAndValue(pID, fIDs):
    return [((pID, fID), fIDs) if pID < fID else ((fID, pID), fIDs) for fID in fIDs]
def intersection(l1, l2):
    return [x for x in l1 if x in l2]
# persons: list((personID, list(friendID)))
personsData = sc.parallelize(persons)
out = personsData.flatMap(lambda (k, v): genFriendPairAndValue(k, v)) \
                 .reduceByKey(lambda v1, v2: intersection(v1, v2))

3. Implement Problem 3 of MapReduce with Spark

# coinPairs: list((person, coinType))
coinData = sc.parallelize(coinPairs)
# (3.a) out1: list(((person, coinType), count))
out1 = coinData.map(lambda (k1, k2): ((k1, k2), 1)) \
               .reduceByKey(lambda v1, v2: v1 + v2)
# (3.b)
out2 = out1.map(lambda (k, v): (k[0], v * valueOfCoin(k[1]))) \
           .reduceByKey(lambda v1, v2: v1 + v2)

Virtual Memory Overview


Virtual address (VA): What your program uses
  | Virtual Page Number | Page Offset |

Physical address (PA): What actually determines where in memory to go
  | Physical Page Number | Page Offset |

With 4 KiB pages and byte addresses, 2^(page offset bits) = 4096, so page offset bits = 12.
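A tiny C sketch of that split for 4 KiB pages (the example address is made up):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t va     = 0x00403ABC;
    uint32_t vpn    = va >> 12;     /* virtual page number                   */
    uint32_t offset = va & 0xFFF;   /* page offset: unchanged by translation */
    printf("VPN=0x%X offset=0x%X\n", vpn, offset);  /* VPN=0x403 offset=0xABC */
    return 0;
}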
The Big Picture: Logical Flow
Translate VA to PA using the TLB and Page Table. Then use
PA to access memory as the program intended.
Pages
A chunk of memory or disk with a set size. Addresses in
the same virtual page get mapped to addresses in the
same physical page. The page table determines the
mapping.
The Page Table
The page table is indexed by the Virtual Page Number (VPN), which is therefore not stored. Each stored
row of the page table is called a page table entry and contains a page valid bit, a page dirty bit, page
permission bits (read, write, ...), and the Physical Page Number (PPN); there is one entry per virtual
page, up to the maximum virtual page number.
The page table is stored in memory; the OS sets a register telling the hardware the address of the
first entry of the page table. The processor updates the page dirty bit in the page table: page dirty bits
are used by the OS to know whether updating a page on disk is necessary. Each process gets its own
page table.

Protection Fault--The page table entry for a virtual page has permission bits that prohibit the
requested operation

Page Fault--The page table entry for a virtual page has its valid bit set to false. The entry is not in
memory.

The Translation Lookaside Buffer (TLB)

A cache for the page table. Each block is a single page table entry. If an entry is not in the TLB,
it's a TLB miss. Assuming fully associative, each TLB entry is:
  | Tag = Virtual Page Number | Page Table Entry (Valid, Page Dirty, Permission Bits, Physical Page Number) |

The Big Picture Revisited

Exercises


1) What are three specific benefits of using virtual memory?
Bridges memory and disk in memory hierarchy.
Simulates full address space for each process.
Enforces protection between processes.


2) What should happen to the TLB when a new value is loaded into the page table address
register?
The valid bits of the TLB should all be set to 0. The page table entries in the TLB corresponded to the old
page table, so none of them are valid once the page table address register points to a different page
table.

5) A processor has 16-bit addresses, 256 byte pages, and an 8-entry fully associative TLB with
LRU replacement (the LRU field is 3 bits and encodes the order in which pages were accessed, 0
being the most recent). At some time instant, the TLB for the current process is the initial state
given in the table below. Assume that all current page table entries are in the initial TLB.
Assume also that all pages can be read from and written to. Fill in the final state of the TLB
according to the access pattern below.

Free physical pages: 0x17, 0x18, 0x19


Access pattern:
Read  0x11f0
Write 0x1301
Write 0x20ae
Write 0x2332
Read  0x20ff
Write 0x3415

Initial TLB (8 entries; columns VPN, PPN, Valid, Dirty, LRU). The VPN -> PPN mappings, in order, are:
0x01 -> 0x11, 0x00 -> 0x00 (invalid), 0x10 -> 0x13, 0x20 -> 0x12, 0x00 -> 0x00 (invalid),
0x11 -> 0x14 (valid, not dirty, LRU 4), 0xac -> 0x15, 0xff -> 0x16.





Read 0x11f0: hit, LRUs: 1,7,2,5,7,0,3,4
Write 0x1301: miss, map VPN 0x13 to PPN 0x17, valid and dirty, LRUs: 2,0,3,6,7,1,4,5
Write 0x20ae: hit, dirty, LRUs: 3,1,4,0,7,2,5,6
Write 0x2332: miss, map VPN 0x23 to PPN 0x18, valid and dirty, LRUs: 4,2,5,1,0,3,6,7
Read 0x20ff: hit, LRUs: 4,2,5,0,1,3,6,7
Write 0x3415: miss and replace last entry, map VPN 0x34 to 0x19, dirty, LRUs, 5,3,6,1,2,4,7,0

Final TLB (VPN -> PPN, all entries valid):
0x01 -> 0x11 (LRU 5), 0x13 -> 0x17 (dirty, LRU 3), 0x10 -> 0x13 (LRU 6), 0x20 -> 0x12 (dirty, LRU 1),
0x23 -> 0x18 (dirty, LRU 2), 0x11 -> 0x14 (LRU 4), 0xac -> 0x15 (LRU 7), 0x34 -> 0x19 (dirty, LRU 0).

Hamming ECC
Recall the basic structure of a Hamming code. Given bits 1, . . . , m, the bit at position 2^n is the
parity bit for all bit positions that have a 1 in bit n of their binary representation. For example, the first
bit is chosen such that the sum of all odd-numbered bits is even.

1. How many bits do we need to add to 0b0011 to allow single error correction?
Parity bits: 3
2. Which locations in 0b0011 would parity bits be included?
Using P for parity bits: PP0P011
3. Which bits does each parity bit cover in 0b0011?
Parity bit #1: 1, 3, 5, 7
Parity bit #2: 2, 3, 6, 7
Parity bit #3: 4, 5, 6, 7
4. Write the completed coded representation for 0b0011 to enable single error correction.
0b1000011
5. How can we enable an additional double error detection on top of this?
Add an additional parity bit over the entire sequence.
6. Find the original bits given the following SEC Hamming Code: 0b0110111
Parity group 1: error
Parity group 2: okay
Parity group 4: error
Incorrect bit: 1 + 4 = 5, change bit 5 from 1 to 0: 0b0110011
0b0110011 -> 0b1011
7. Find the original bits given the following SEC Hamming Code: 0b1001000
Parity group 1: error
Parity group 2: okay
Parity group 4: error
Incorrect bit: 1 + 4 = 5, change bit 5 from 0 to 1: 0b1001100
0b1001100 -> 0b0100
8. Find the original bits given the following SEC Hamming Code: 0b010011010000110
Parity group 1: okay
Parity group 2: error
Parity group 4: okay
Parity group 8: error
Incorrect bit: 2 + 8 = 10, change bit 10 from 0 to 1: 0b010011010100110
0b010011010100110 -> 0b01100100110
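A compact C sketch of the parity-group arithmetic used in these exercises, for a 7-bit code with bit 1 as the leftmost bit (the worksheet's numbering):

#include <stdio.h>
#include <stdint.h>

static int get_bit(uint8_t code, int pos) {   /* pos runs 1..7 from the left */
    return (code >> (7 - pos)) & 1;
}

int main(void) {
    uint8_t code = 0x37;                      /* 0b0110111, from exercise 6 */
    int syndrome = 0;
    for (int n = 0; n < 3; n++) {             /* parity groups 1, 2, 4      */
        int parity = 0;
        for (int pos = 1; pos <= 7; pos++)
            if (pos & (1 << n))
                parity ^= get_bit(code, pos);
        if (parity)
            syndrome += 1 << n;               /* failing group adds 2^n     */
    }
    if (syndrome)
        code ^= 1 << (7 - syndrome);          /* flip the offending bit     */
    printf("syndrome=%d corrected=0x%02X\n", syndrome, code);
    /* prints: syndrome=5 corrected=0x33 (0b0110011) */
    return 0;
}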
