Ia Cell Unit 4

IFET COLLEGE OF ENGINEERING
DEPARTMENT OF CSE & IT

CS6303 – COMPUTER ARCHITECTURE
UNIT IV – PARALLELISM
(100% THEORY)
QUESTION BANK
PARALLELISM
Instruction-level parallelism – Parallel processing challenges – Flynn's classification – Hardware
multithreading – Multicore processors
PART-A (2 MARKS)
Knowledge:
1. What is instruction-level parallelism (ILP)? (Nov/Dec 2015) (Nov/Dec 2016)
2. What is meant by static and dynamic multiple issue?
3. Define issue slots and issue packet.
4. Define speculation.
5. What is meant by anti-dependence? How is it removed?
6. Define dynamic multiple-issue processors or superscalar processors. (Nov/Dec 2015)
7. What are reservation stations and the reorder buffer used for?
8. What is the function of the commit unit?
9. Sketch the three primary units of a dynamically scheduled pipeline.
10. What is Flynn’s Classification? (Nov/Dec 2014)
11. Define vector architecture.
12. What is meant by Strip mining?
13. What is multithreading? (Nov/Dec 2014) (Nov/Dec 2016)
14. Define hardware multithreading.
15. Briefly explain coarse-grained multithreading.
16. Define simultaneous multithreading.
17. Briefly explain multicore microprocessors.
18. Write a note on shared memory multiprocessor.
19. Sketch the classic organization of a shared memory multiprocessor.
20. What is synchronization?
Understanding:
1. What is the technique used to get more performance from loops?
2. State the difference between software and hardware speculation.
3. Differentiate in-order execution from out-of-order execution.
4. Differentiate between strong scaling and weak scaling. (Apr/May 2015)
5. Compare SISD and MIMD.
6. State the difference between superscalar processors and multicore microprocessors.
7. Compare UMA and NUMA multiprocessors. (Apr/May 2015)
8. Give an example of each of SISD, SIMD, MISD and MIMD.
9. Differentiate Vector and Multimedia Extensions.
10. State the difference between coarse-grained and fine-grained multithreading on a superscalar processor.
Application:
1. How would this loop be scheduled on a static two-issue pipeline for MIPS?
Loop: lw   $t0, 0($s1)      # $t0 = array element
      addu $t0, $t0, $s2    # add scalar in $s2
      sw   $t0, 0($s1)      # store result
      addi $s1, $s1, -4     # decrement pointer
      bne  $s1, $zero, Loop # branch if $s1 != 0
Reorder the instructions to avoid as many pipeline stalls as possible. Assume branches
are predicted, so that control hazards are handled by the hardware.
2. How are the reservation stations and the reorder buffer used to implement register
renaming effectively?
3. Write about software-centric and hardware-centric approaches to exploiting ILP.
4. Why do only a few applications sustain more than two instructions per clock despite the
existence of processors that can issue four to six instructions per clock?
5. Why is it difficult to write parallel processing programs that are fast, especially as the
number of processors increases?
6. Suppose you want to achieve a speedup of 90 with 100 processors. What
percentage of the original computation can be sequential?
Skill:
1. Why are parallel processing programs much harder to develop than sequential programs?
2. Why is vector architecture not more popular outside high-performance computing, even though
it has several advantages?
3. Draw a neat sketch to show how four vector lanes are used to improve the
performance of a single vector add instruction.
4. Are multicore processors approaching their performance limits?
PART B (13 MARKS)

Knowledge:
1. Explain instruction-level parallel processing. State the challenges of parallel processing. (13)
2. Explain in detail about static multiple-issue processors. (13)
3. (a) Explain multicore processors in detail. (6)
(b) Explain Simultaneous multithreading (SMT). (7)
4. Discuss shared memory multiprocessor with a neat diagram. (13)
5. Briefly describe dynamic multiple-issue processors. (13)
6. (a) Explain vector architecture in detail. (10)
(b) Explain the concept of speculation. (3)
Understanding:
1. What is hardware multithreading? Compare and contrast fine-grained multithreading and
coarse-grained multithreading. (13)
2. Discuss SISD, MIMD, SPMD and vector systems. (13)
3. (a) Compare vector architectures with multimedia extensions. (10)
(b) Compare vector architectures with scalar architectures. (3)
Application:
1. (a) Suppose you want to perform two sums: one is a sum of 10 scalar variables, and one is a
matrix sum of a pair of two-dimensional arrays, with dimensions 10 by 10. For now, assume
that only the matrix sum is parallelizable (parallelizing scalar sums is covered later).
What speedup do you get with 10 versus 40 processors? Next, calculate the speedups
assuming the matrices grow to 20 by 20. (5)
(b) How would this loop be scheduled on a static two-issue pipeline for MIPS? (3)
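Loop: lw   $t0, 0($s1)      # $t0 = array element
      addu $t0, $t0, $s2    # add scalar in $s2
      sw   $t0, 0($s1)      # store result
      addi $s1, $s1, -4     # decrement pointer
      bne  $s1, $zero, Loop # branch if $s1 != 0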
Reorder the instructions to avoid as many pipeline stalls as possible. Assume
branches are predicted, so that control hazards are handled by the hardware.
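(c) How should loop unrolling be used in this loop to avoid delays in multiple-issue
pipelines? (5)
Loop: lw   $t0, 0($s1)      # $t0 = array element
      addu $t0, $t0, $s2    # add scalar in $s2
      sw   $t0, 0($s1)      # store result
      addi $s1, $s1, -4     # decrement pointer
      bne  $s1, $zero, Loop # branch if $s1 != 0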
2. (a) Assume we have 64 processors. Show how to compute the sum of 64,000 numbers on a shared
memory multiprocessor computer with uniform memory access time. (10)
(b) To achieve the speedup of 20.5 with 40 processors, we assumed the load was perfectly
balanced. That is, each of the 40 processors had 2.5% of the work to do. Instead, show the
impact on speedup if one processor's load is higher than all the rest. Calculate at twice the load
(5%) and five times the load (12.5%) for that hardest-working processor. How well are the rest of the
processors utilized? (3)
PART C (15 MARKS)
1. You are trying to bake 3 blueberry pound cakes. Cake ingredients are as follows:
1 cup butter, softened
1 cup sugar
4 large eggs
1 teaspoon vanilla extract
1/2 teaspoon salt
1/4 teaspoon nutmeg
1 1/2 cups flour
1 cup blueberries
The recipe for a single cake is as follows:

Step 1: Preheat oven to 325°F (160°C). Grease and flour your cake pan.
Step 2: In a large bowl, use a mixer to beat together the butter and sugar at medium speed until light
and fluffy. Add the eggs, vanilla, salt and nutmeg. Beat until thoroughly blended. Reduce the mixer
speed to low and add the flour, 1/2 cup at a time, beating just until blended.
Step 3: Gently fold in blueberries. Spread evenly in prepared baking pan. Bake for 60 minutes.
a. Your job is to cook 3 cakes as efficiently as possible. Assuming that you only have one
oven large enough to hold one cake, one large bowl, one cake pan, and one mixer, come
up with a schedule to make three cakes as quickly as possible. Identify the bottlenecks in
completing this task.
b. Assume now that you have three bowls, 3 cake pans and 3 mixers. How much faster is
the process now that you have additional resources?
c. Assume now that you have two friends who will help you cook, and that you have a large
oven that can accommodate all three cakes. How will this change the schedule you
arrived at in part (a) above?
d. Compare the cake-making task to computing 3 iterations of a loop on a parallel computer.
Identify data-level parallelism and task-level parallelism in the cake-making loop.
2. The dining philosophers problem is a classic problem of synchronization and
concurrency. The general problem is stated as philosophers sitting at a round table doing
one of two things: eating or thinking. When they are eating, they are not thinking, and
when they are thinking, they are not eating. There is a bowl of pasta in the center. A fork is
placed between each pair of adjacent philosophers, so each philosopher has one fork to her
left and one fork to her right. Given the nature of eating pasta, the philosopher needs two
forks to eat, and can only use the forks on her immediate left and right. The philosophers
do not speak to one another.
a. Describe the scenario in which none of the philosophers ever eats (i.e., starvation). What
sequence of events leads up to this problem?
b. Describe how we can solve this problem by introducing the concept of a priority. Can
we guarantee that all the philosophers will be treated fairly? Explain.
3. Assume a quad-core computer system can process database queries at a steady state rate
of requests per second. Also assume that each transaction takes, on average, a fixed amount
of time to process. The following table shows pairs of transaction latency and processing
rate.
Average transaction latency    Maximum transaction processing rate
1 ms                           5,000/sec
2 ms                           5,000/sec
1 ms                           10,000/sec
2 ms                           10,000/sec
For each of the pairs in the table, answer the following questions:
a. On average, how many requests are being processed at any given instant?
b. Discuss why we rarely obtain this kind of speedup by simply increasing the number of
cores.
4. On a CC-NUMA system, the cost of accessing non-local memory can limit our ability to
utilize multiprocessing effectively. The following table shows the costs associated with
accessing data in local memory versus non-local memory, and the locality of our application
expressed as the proportion of accesses that are local.
Local load/store (cycles)    Non-local load/store (cycles)    % Local accesses
25                           200                              20
Answer the following questions. Assume that memory accesses are evenly distributed through
the application. Also, assume that only a single memory operation can be active during any
cycle. State all assumptions about the ordering of local versus non-local memory operations.
a. If on average we need to access memory once every 75 cycles, what is the impact on our
application?
b. If on average we need to access memory once every 50 cycles, what is the impact on our
application?
5. Consider the following piece of C code:
for (j = 2; j < 1000; j++)
    D[j] = D[j-1] + D[j-2];
a. Write the MIPS code corresponding to the above fragment.