
What do you mean by data and control parallelism in Parallel and Distributed Computing? Discuss with examples.

Data parallelism is a way of performing parallel execution of an application on multiple processors.


It focuses on distributing data across the different nodes of the parallel execution environment and enabling simultaneous sub-computations on these distributed portions of data. This is typically achieved in SIMD mode (Single Instruction, Multiple Data) and can either have a single controller directing the parallel data operations or multiple threads performing the same work on the individual compute nodes (SPMD).

Data parallelism is a kind of parallelism that, instead of relying on process or task concurrency, is related to both the flow and the structure of the information. It implies partitioning the data among processes, such that a single portion of data is assigned to a single process. The portions of data are of approximately equal size. If the portions require rather different amounts of time to be processed, the performance is limited by the speed of the slowest process. In that case, the problem can be mitigated by partitioning the data into a larger number of smaller portions: a process takes another portion once it finishes with the previous one, so a faster process is assigned more portions.
The single-program-multiple-data (SPMD) paradigm is an example of data parallelism, as the processes share the same code but operate on different data. Another example is the parallelization of a loop with no loop-carried dependences, in which the processes execute the same loop body but for different loop indices and, consequently, for different data.

For data parallelism, the goal is to scale the throughput of processing based on the ability to
decompose the data set into concurrent processing streams, all performing the same set of
operations. Let us take a small example:

A data-parallel job on an array of n elements can be divided equally among all the processors. Let us assume we want to sum all the elements of the given array and that the time for a single addition operation is Ta time units. In the case of sequential execution, the time taken by the process will be n*Ta time units, as it sums up all the elements of the array. On the other hand, if we execute this job as a data-parallel job on 4 processors, the time taken reduces to (n/4)*Ta + merging overhead time units. Parallel execution therefore yields a speedup of roughly 4 over sequential execution, ignoring the merging overhead. One important thing to note is that the locality of data references plays an important part in the performance of a data-parallel programming model; locality depends on the memory accesses performed by the program as well as the size of the cache.
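As a concrete illustration, the following is a minimal C/OpenMP sketch of this data-parallel summation on 4 threads; the array size, the fill values and the use of a reduction clause for the merging step are illustrative assumptions, not part of the original assignment.

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    for (int i = 0; i < N; i++)       /* fill the array with sample data */
        a[i] = 1.0;

    double sum = 0.0;
    /* Each of the 4 threads sums roughly N/4 elements; the reduction
       clause performs the final merge of the partial sums. */
    #pragma omp parallel for num_threads(4) reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}

Here each thread handles roughly n/4 elements, and the reduction corresponds to the merging overhead mentioned above.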

In a multiprocessor system executing a single set of instructions (SIMD), data parallelism is achieved when each processor performs the same task on different distributed data. In some situations, a single execution thread controls the operations on all the data. In others, different threads control the operation, but they execute the same code.
For instance, consider matrix multiplication and array addition performed sequentially, as in the following example.
The sequential code multiplies two matrices A and B, storing the result in an output matrix C, and adds two arrays element by element. If these programs were executed sequentially, the time taken to calculate the result would be O(n³) for the multiplication and O(n) for the addition.
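The original listings are not reproduced in the extracted text; the following is a minimal C sketch of what such sequential code typically looks like, assuming n×n matrices A and B with output matrix C and length-n arrays a, b and c (all names and sizes are illustrative).

/* Sequential matrix multiplication: C = A * B for n x n matrices, O(n^3). */
void matmul(int n, double A[n][n], double B[n][n], double C[n][n]) {
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += A[i][j] * B[j][k];   /* dot product of row i and column k */
            C[i][k] = sum;
        }
}

/* Sequential element-wise addition of two arrays of length n, O(n). */
void array_add(int n, double a[], double b[], double c[]) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}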
We can exploit data parallelism in the preceding code to execute it faster, as the arithmetic is loop-independent. Parallelism of the matrix multiplication code can be achieved using OpenMP: an OpenMP directive instructs the compiler to execute the body of the for loop in parallel. For the multiplication, we can divide matrices A and B into blocks along rows and columns respectively. This allows every element of matrix C to be calculated independently, making the task parallel. For example, A[m×n] · B[n×k] can be computed in O(n) time instead of O(m·n·k) when executed in parallel using m·k processors.
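A minimal sketch of how the OpenMP version might look, based on the sequential routine above; the specific directive (parallel for with collapse) is an illustrative choice rather than the one prescribed in the original answer.

#include <omp.h>

/* Data-parallel matrix multiplication: every element C[i][k] is independent
   of the others, so the two outer loops are distributed across threads. */
void matmul_omp(int n, double A[n][n], double B[n][n], double C[n][n]) {
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += A[i][j] * B[j][k];
            C[i][k] = sum;
        }
}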

It can be observed from the example that a large number of processors would be required as the matrix sizes keep increasing. Keeping the execution time low is the priority, but as the matrix size increases we face other constraints, such as the complexity of such a system and its associated costs. Therefore, even with a constrained number of processors, we can still apply the same principle and divide the data into bigger chunks to calculate the product of the two matrices.
For the addition of arrays in a data-parallel implementation, let us assume a more modest system with two central processing units (CPUs), A and B. CPU A could add all elements from the top half of the arrays, while CPU B could add all elements from the bottom half. Since the two processors work in parallel, the job of performing the array addition would take roughly one half of the time of performing the same operation serially on one CPU alone.

The program expressed in pseudocode below, which applies an operation foo to every element of the array d, illustrates data parallelism:
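The pseudocode itself is missing from the extracted text; the following minimal C/OpenMP sketch expresses the same idea, with two threads standing in for CPU "a" and CPU "b" and a placeholder foo applied to each thread's half of d (the operation, array contents and size are illustrative assumptions).

#include <stdio.h>
#include <omp.h>

#define N 8

static void foo(double *x) { *x = *x * 2.0; }    /* placeholder operation */

int main(void) {
    double d[N] = {1, 2, 3, 4, 5, 6, 7, 8};

    #pragma omp parallel num_threads(2)
    {
        int cpu = omp_get_thread_num();          /* 0 plays CPU "a", 1 plays CPU "b" */
        int lower = (cpu == 0) ? 0     : N / 2;  /* CPU "a": first half  */
        int upper = (cpu == 0) ? N / 2 : N;      /* CPU "b": second half */
        for (int i = lower; i < upper; i++)
            foo(&d[i]);                          /* same operation, different data */
    }

    for (int i = 0; i < N; i++)
        printf("%g ", d[i]);
    printf("\n");
    return 0;
}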

In an SPMD program run on a 2-processor system, both CPUs execute this code. Data parallelism emphasizes the distributed nature of the data, as opposed to the processing (control parallelism). Most real programs fall somewhere on a continuum between control parallelism and data parallelism.

Data parallel programming environments:

A variety of data parallel programming environments are available today, the most widely used of which are:

1. Message Passing Interface (MPI): a cross-platform message-passing programming interface for parallel computers. It defines the semantics of library functions that allow users to write portable message-passing programs in C, C++ and Fortran (a minimal sketch follows this list).
2. Open Multi-Processing (OpenMP): an Application Programming Interface (API) that supports shared-memory programming models on multiple platforms of multiprocessor systems.
3. CUDA and OpenACC: parallel computing API platforms designed to allow a software engineer to utilize a GPU's computational units for general-purpose processing.
4. Threading Building Blocks (TBB) and RaftLib: both are open-source programming environments that enable mixed data/task parallelism in C/C++ across heterogeneous resources.
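As a small illustration of the first entry in this list, here is a minimal MPI program in C that computes a partial sum on every process and combines the results with a reduction; the data values and chunk size are illustrative assumptions.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

    /* Each process owns a chunk of the (conceptual) global array and
       computes a partial sum over it. */
    const int chunk = 1000;
    double partial = 0.0;
    for (int i = 0; i < chunk; i++)
        partial += 1.0;                     /* stand-in for real data */

    /* Message passing: the partial sums are combined on rank 0. */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f (from %d processes)\n", total, size);

    MPI_Finalize();
    return 0;
}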

Control parallelism
Control parallelism (also called task parallelism) focuses on distributing parallel execution threads across the parallel computing nodes. These threads may execute the same or different code, and they exchange data either through shared memory or through explicit communication messages, as dictated by the parallel algorithm. In the most general case, each thread of a task-parallel system can be doing completely different work while coordinating to solve a specific problem. In the simplest case, all threads execute the same program and differentiate their responsibilities based on their node ids. The most common task-parallel algorithms follow the Master-Worker model, in which a single master distributes the computation to multiple workers based on scheduling rules and other task-allocation strategies (a minimal sketch follows below).
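A minimal sketch of the Master-Worker idea using OpenMP tasks, in which one thread generates the work items and the rest of the team executes them; the number of tasks and the task body are illustrative assumptions.

#include <stdio.h>
#include <omp.h>

#define NUM_TASKS 16

static void do_work(int id) {
    printf("task %d handled by worker (thread %d)\n", id, omp_get_thread_num());
}

int main(void) {
    #pragma omp parallel
    {
        /* One thread acts as the master and creates the work items ... */
        #pragma omp single
        for (int t = 0; t < NUM_TASKS; t++) {
            #pragma omp task firstprivate(t)
            do_work(t);            /* ... the team's other threads act as workers */
        }
    }
    return 0;
}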

In "control parallelism", or task parallelism, processes are assigned pieces of code. Each piece of code works on the same data and is assigned to exactly one process. An example of task parallelism is computing the average and the standard deviation of the same data; these two tasks can be executed by separate processes (a minimal sketch follows this paragraph). Another example is the parallelization of a loop with an if-then-else construct inside, such that different code is executed in different iterations.
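A minimal C/OpenMP sketch of the average/standard-deviation example, using sections so that the two tasks run in different threads over the same data; the sample values and names are illustrative assumptions.

#include <stdio.h>
#include <math.h>
#include <omp.h>

#define N 8

int main(void) {
    double x[N] = {2, 4, 4, 4, 5, 5, 7, 9};
    double mean = 0.0, stddev = 0.0;

    #pragma omp parallel sections
    {
        #pragma omp section
        {   /* task 1: average */
            double s = 0.0;
            for (int i = 0; i < N; i++) s += x[i];
            mean = s / N;
        }
        #pragma omp section
        {   /* task 2: standard deviation (computes its own mean so the
               two tasks stay independent) */
            double s = 0.0, sq = 0.0;
            for (int i = 0; i < N; i++) s += x[i];
            double m = s / N;
            for (int i = 0; i < N; i++) sq += (x[i] - m) * (x[i] - m);
            stddev = sqrt(sq / N);
        }
    }

    printf("mean = %g, stddev = %g\n", mean, stddev);
    return 0;
}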

Now let us take a simple example for a better understanding of control parallelism:

Imagine that you are preparing a three-course dinner, with an appetizer, a main course, and a
dessert. Although our cultural bias constrains the order of eating those courses, there are no rules
about the order of preparing the courses. In other words, you can make the dessert before cooking
the main course, because there is nothing inherent in dessert preparation that depends on main
course preparation.

How long does it take to prepare this meal? If there is only one chef, then the total time to prepare
the meal is the sum of the times to prepare each course. But if we assume that there are three
competent chefs in the kitchen, then each one can take on the task of preparing one of the courses. If
all the participants start at the same time, then the total time to prepare the meal is bounded by the longest of the three courses' preparation times. By delegating each independent task to an
available resource, we reduce the overall preparation time, possibly by two-thirds of the original time
requirement.

This is an example of task parallelism, which is a coarse-grained form of parallelism. In this case, a high-
level process is decomposed into a collection of discrete tasks, each of which performs some set of
operations and results in some output or side effect.

In a multiprocessor system, control parallelism is achieved when each processor executes a different
thread (or process) on the same or different data. The threads may execute the same or different
code. In the general case, different execution threads communicate with one another as they work,
but this is not a requirement. Communication usually takes place by passing data from one thread to
the next as part of a workflow.
As a simple example, if a system is running code on a 2-processor system (CPUs "a" & "b") in a
parallel environment and we wish to do tasks "A" and "B", it is possible to tell CPU "a" to do task "A"
and CPU "b" to do task "B" simultaneously, thereby reducing the run time of the execution. The tasks
can be assigned using conditional statements as described below.
Control parallelism emphasizes the distributed (parallelized) nature of the processing (i.e. threads),
as opposed to the data (data parallelism). Most real programs fall somewhere on a continuum
between task parallelism and data parallelism.
Thread-level parallelism is the parallelism inherent in an application that runs multiple threads at
once. This type of parallelism is found largely in applications written for commercial servers such as
databases. By running many threads at once, these applications are able to tolerate the high amounts of I/O and memory-system latency their workloads can incur: while one thread is delayed waiting for a memory or disk access, other threads can do useful work.
The exploitation of thread-level parallelism has also begun to make inroads into the desktop market
with the advent of multicore microprocessors. This has occurred because, for various reasons, it has
become increasingly impractical to increase either the clock speed or instructions per clock of a single
core. If this trend continues, new applications will have to be designed to utilize multiple threads in
order to benefit from the increase in potential computing power. This contrasts with previous
microprocessor innovations in which existing code was automatically sped up by running it on a
newer/faster computer.
The pseudocode below illustrates control parallelism:
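The pseudocode listing is missing from the extracted text; the following minimal C/OpenMP sketch expresses the same structure, with two threads standing in for CPU "a" and CPU "b" and placeholder task bodies (these are illustrative assumptions).

#include <stdio.h>
#include <omp.h>

static void task_A(void) { printf("task A done by CPU \"a\"\n"); }  /* placeholder */
static void task_B(void) { printf("task B done by CPU \"b\"\n"); }  /* placeholder */

int main(void) {
    #pragma omp parallel num_threads(2)
    {
        int cpu = omp_get_thread_num();   /* 0 plays CPU "a", 1 plays CPU "b" */
        if (cpu == 0)
            task_A();      /* this branch is the code executed by CPU "a" */
        else if (cpu == 1)
            task_B();      /* this branch is the code executed by CPU "b" */
    }
    return 0;
}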

The goal of the program is to do some net total task ("A+B"). If we write the code as above and
launch it on a 2-processor system, then the runtime environment will execute it as follows.

- In an SPMD (single program, multiple data) system, both CPUs will execute the code.
- In a parallel environment, both will have access to the same data.
- The "if" clause differentiates between the CPUs. CPU "a" will read true on the "if" and CPU "b" will read true on the "else if", so each has its own task.
- Both CPUs then execute separate code blocks at the same time, performing different tasks simultaneously.
In the sketch above, the code executed by CPU "a" is the first branch (task "A"), and the code executed by CPU "b" is the second branch (task "B").

This concept can now be generalised to any number of processors.

References:

1. Dinkar Sitaram, Geetha Manjunath, in Moving to the Cloud, 2012.

2. https://www.sciencedirect.com/topics/computer-science/data-parallelism

3. Bertil Schmidt, Parallel Programming, 2018


CSN – 254
Assignment – 2
Group details:

1. Diya (Enrollment No. 18114021, dmourya@cs.iitr.ac.in): Q2, defining data and control parallelism, with examples.
2. Aanand (Enrollment No. 18114001, anambudiripad@cs.iitr.ac.in): Q3, study and analyse the effects of data and control parallelism.
3. Anjali (Enrollment No. 18114005, ameena1@cs.iitr.ac.in): Q1, parts a and b, parallel and distributed computing on the basis of programming and architecture.
4. Unmesh (Enrollment No. 18114079, ukumar@cs.iitr.ac.in): Q1, parts b and c, parallel and distributed computing on the basis of architecture and system.
