
Pulung Sombonuryo

23213103
EL6201 Sistem Paralel (Assignment 01)
1. Problem 2.2
Consider a memory system with a level 1 cache of 32 KB and DRAM of 512 MB with the processor
operating at 1 GHz. The latency to L1 cache is one cycle and the latency to DRAM is 100 cycles. In
each memory cycle, the processor fetches four words (cache line size is four words). What is the peak
achievable performance of a dot product of two vectors? Note: Where necessary, assume an optimal
cache placement policy.

/* dot product loop */
for (i = 0; i < dim; i++)
    dot_product += a[i] * b[i];

Answer:
When the cache always hits, the peak performance is computed as follows.
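A reconstruction of the arithmetic under the stated parameters (the dot product performs one multiply-add, i.e. two FLOPs, per element pair, so one FLOP per word fetched; the L1 cache delivers one 4-word line per 1 ns cycle):

\[ R_{\text{hit}} = \frac{4\ \text{words} \times 1\ \text{FLOP/word}}{1\ \text{ns}} = 4\ \text{GFLOPS} \]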

When the cache always misses, all data must be fetched from DRAM.
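Each 4-word line now costs 100 cycles (100 ns), and four iterations consume one line of a plus one line of b (eight words, hence eight FLOPs):

\[ R_{\text{miss}} = \frac{8\ \text{FLOPs}}{2 \times 100\ \text{ns}} = 40\ \text{MFLOPS} \]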

2. Problem 2.6
Consider an SMP with a distributed shared-address-space. Consider a simple cost model in which it
takes 10 ns to access local cache, 100 ns to access local memory, and 400 ns to access remote
memory. A parallel program is running on this machine. The program is perfectly load balanced with
80% of all accesses going to local cache, 10% to local memory, and 10% to remote memory. What is
the effective memory access time for this computation? If the computation is memory bound, what is
the peak computation rate?
Now consider the same computation running on one processor. Here, the processor hits the cache 70%
of the time and local memory 30% of the time. What is the effective peak computation rate for one
processor? What is the fractional computation rate of a processor in the parallel configuration as
compared to the serial configuration?
Hint: Notice that the cache hit ratio for multiple processors is higher than that for one processor. This is
typically because the aggregate cache available on multiprocessor systems is larger than on single-processor
systems.
Answer:
Effective access time for case 1 (parallel configuration):
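Substituting the access fractions and latencies from the stated cost model:

\[ t_{\text{eff}} = 0.8 \times 10\ \text{ns} + 0.1 \times 100\ \text{ns} + 0.1 \times 400\ \text{ns} = 58\ \text{ns} \]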

Peak computation rate for case 1:
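Assuming one memory access per floating-point operation, since the computation is memory bound:

\[ R_{1} = \frac{1}{58\ \text{ns}} \approx 17.2\ \text{MFLOPS} \]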


Effective access time for case 2 (serial configuration):
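With the serial hit ratios of 70% cache and 30% local memory:

\[ t_{\text{eff}} = 0.7 \times 10\ \text{ns} + 0.3 \times 100\ \text{ns} = 37\ \text{ns} \]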

Peak computation rate for case 2:
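Under the same one-access-per-FLOP assumption:

\[ R_{2} = \frac{1}{37\ \text{ns}} \approx 27.0\ \text{MFLOPS} \]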

Fractional computation rate of the parallel configuration compared to the serial configuration:
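Taking the ratio of the two rates computed above:

\[ \frac{R_{1}}{R_{2}} = \frac{17.2\ \text{MFLOPS}}{27.0\ \text{MFLOPS}} \approx 0.64 \]

That is, each processor in the parallel configuration runs at roughly 64% of the serial rate.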

3. Problem 2.7
What are the major differences between message-passing and shared-address-space computers? Also
outline the advantages and disadvantages of the two.
Answer:
Message-passing computer:
- The system consists of several processing nodes, each with its own exclusive address space.
- Each processing node can be a single processor or a shared-address-space multiprocessor.
- Interaction between processing nodes must be accomplished through messages.
- The exchange of messages is used to transfer data, to transfer work, and to synchronize actions
among the processes.
Shared-address-space computer:
- A common data space is accessible to all processors.
- This type of system is also referred to as a multiprocessor.
- Processors interact by modifying data objects stored in the shared address space.
- The memory backing the address space can be local (exclusive to a processor) or global (common
to all processors).
- If the time taken by a processor to access any memory word in the system is identical, the system is
classified as uniform memory access (UMA). If the access time varies with the word's location, the
system is called non-uniform memory access (NUMA). A short code sketch contrasting the two
interaction styles is given below.
4. Problem 2.8
Why is it difficult to construct a true shared-memory computer? What is the minimum number of
switches for connecting p processors to a shared memory with b words (where each word can be
accessed independently)?
Answer:
A true shared-memory computer is difficult to construct because giving every processor independent
access to every memory word requires switching hardware whose complexity and cost grow unreasonably
as the system scales. The number of switches needed for connecting p processors to a shared memory
with b independently accessible words is pb (one switch per processor-word pair).
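For a sense of the magnitude (illustrative numbers, not from the problem statement): even a modest machine with p = 64 processors and b = 2^20 words would need 64 × 2^20 = 2^26 ≈ 6.7 × 10^7 switches, which is the unreasonable cost noted above.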

5. Problem 2.11
Consider the omega network described in Section 2.4.3. As shown there, this network is a blocking
network (that is, a processor that uses the network to access a memory location might prevent another
processor from accessing another memory location). Consider an omega network that connects p
processors. Define a function f that maps P = {0, 1, ..., p − 1} onto a permutation P′ of P (that is, P′[i]
= f(P[i]) and P′[i] ∈ P for all 0 ≤ i < p). Think of this function as mapping communication requests
by the processors so that processor P[i] requests communication with processor P′[i].
a. How many distinct permutation functions exist?
b. How many of these functions result in non-blocking communication?
c. What is the probability that an arbitrary function will result in non-blocking communication?
Answer:
a. Number of distinct input-output mappings (permutations of p elements): p!
b. Number of switches in each stage: p/2
Number of switch settings per stage (each switch is either pass-through or cross-over): 2^(p/2)
Number of stages: log2 p
Number of non-blocking configurations: (2^(p/2))^(log2 p) = p^(p/2)

c. Probability of non-blocking communication: p^(p/2) / p!
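The count in part (b) multiplies the two independent settings of each switch over all stages, and the probability in part (c) then follows directly:

\[ \left(2^{p/2}\right)^{\log_2 p} = 2^{(p/2)\log_2 p} = p^{p/2}, \qquad P(\text{non-blocking}) = \frac{p^{p/2}}{p!} \]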
