Designing Parallel Programs


Dr. Tran, Van Hoai
Department of Systems & Networking
Faculty of Computer Science and Engineering
HCMC University of Technology
E-mail: hoai@cse.hcmut.edu.vn
2009-2010
Issues
Considerations
Parallel machine architectures
Decomposition strategies
Programming models
Performance aspects: scalability, load balance
Parallel debugging, analysis, tuning
I/O on parallel machines
Not easy to suggest a methodical approach
Steps in Designing (I. Foster)
Partitioning: decomposing the problem into small tasks which can be performed in parallel
Communication: determining communication structures and algorithms to coordinate tasks
Agglomeration: combining the tasks into larger ones, considering performance requirements and implementation costs
Mapping: assigning tasks to processors to maximize processor utilization and minimize communication costs
[Figure: Foster's methodology: Problem → Partition → Communicate → Agglomerate → Map]
Other practical issues
Data distribution: input/output & intermediate data
Data access: managing access to shared data
Stage synchronization
Partition (Decomposition)
Tasks: programmer-defined units of computation
Tasks can be executed simultaneously
Once defined, tasks are indivisible units of computation
Fine-grained decomposition
Two dimensions of decomposition:
Domain decomposition: data associated with the problem
Functional decomposition: computation operating on the data
Avoiding data replication
Domain Decomposition
Steps:
Dividing the data into equally-sized small tasks
Input/output & intermediate data
Different partitions may be possible
Different decompositions may exist for different phases
Determining the operations of computation on each task
Task = (data,operations)
Functional Decomposition
Steps:
Dividing the computation into disjoint tasks
Examining data requirements of the tasks
Avoiding data replication
[Figure: a climate model composed of atmospheric, ocean, hydrology, and land surface model components]
A search tree can be considered a functional decomposition
Functional decomposition is a program-structuring technique (modularity)
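To make the idea concrete, here is a minimal Python sketch of functional decomposition, assuming hypothetical stand-in functions for the climate model components above; each component is a separate task operating concurrently on the shared input state.

from concurrent.futures import ThreadPoolExecutor

def atmospheric_model(state):
    # Placeholder computation standing in for the real component model.
    return {"wind": state["t"] * 0.1}

def ocean_model(state):
    return {"current": state["t"] * 0.2}

def hydrology_model(state):
    return {"runoff": state["t"] * 0.3}

def land_surface_model(state):
    return {"albedo": 0.3}

state = {"t": 1.0}
# Functional decomposition: each component model is its own task; all
# four operate concurrently on the same input data.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(f, state) for f in
               (atmospheric_model, ocean_model,
                hydrology_model, land_surface_model)]
    print([f.result() for f in futures])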
Decomposition Methods
Domain decomposition (data decomposition)
Functional decomposition (task decomposition)
Recursive decomposition
Exploratory decomposition
Speculative decomposition
Recursive Decomposition
Suitable for problems that can be solved using the divide-and-conquer paradigm
Each of the subproblems generated by the divide step becomes a task
Quick Sort
QUICKSORT( A, q, r )
    if q < r then
        x := A[q]
        s := q
        for i := q + 1 to r do
            if A[i] ≤ x then
                s := s + 1
                swap( A[s], A[i] )
            end if
        end for
        swap( A[q], A[s] )
        QUICKSORT( A, q, s )
        QUICKSORT( A, s + 1, r )
    end if
[Figure: example run of the partitioning step on the sequence 3 2 1 5 8 4 3 7; each pivot (e.g. 5) is moved to its final position, splitting the sequence for the recursive calls, until the sorted sequence 1 2 3 3 4 5 7 8 is obtained]
Quick Sort
[Figure: quicksort task-dependency graph based on recursive decomposition; the sequence 5 1 12 11 10 6 8 3 7 4 9 2 is recursively partitioned around pivots, and each partitioning step becomes a task]
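A minimal Python sketch of this recursive decomposition; the process pool, the two-worker limit, and the first-element pivot are illustrative choices, not part of the original slides.

from concurrent.futures import ProcessPoolExecutor

def quicksort(a):
    # Sequential quicksort following the pseudocode above (first element as pivot).
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x <= pivot]
    right = [x for x in rest if x > pivot]
    return quicksort(left) + [pivot] + quicksort(right)

def parallel_quicksort(a):
    # One divide step performed serially; the two independent subproblems
    # then become concurrent tasks, as in the task-dependency graph.
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x <= pivot]
    right = [x for x in rest if x > pivot]
    with ProcessPoolExecutor(max_workers=2) as pool:
        lf = pool.submit(quicksort, left)
        rf = pool.submit(quicksort, right)
        return lf.result() + [pivot] + rf.result()

if __name__ == "__main__":
    print(parallel_quicksort([5, 1, 12, 11, 10, 6, 8, 3, 7, 4, 9, 2]))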
Minimum Finding
Divide-and-conquer algorithms can also be used for problems that are traditionally solved by non-divide-and-conquer approaches
FINDMIN( A, n )
    min := A[0]
    for i := 1 to n - 1 do
        if A[i] < min then
            min := A[i]
        end if
    end for
    return min

RECURSIVE_FINDMIN( A, n )
    if n = 1 then
        min := A[0]
    else
        lmin := RECURSIVE_FINDMIN( A, n/2 )
        rmin := RECURSIVE_FINDMIN( &A[n/2], n - n/2 )
        if lmin < rmin then
            min := lmin
        else
            min := rmin
        end if
    end if
    return min
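A sketch in Python of the recursive decomposition above, where the two halves produced by the divide step run as independent tasks; the two-worker pool is an illustrative assumption.

from concurrent.futures import ProcessPoolExecutor

def recursive_findmin(a):
    # Divide-and-conquer minimum, mirroring RECURSIVE_FINDMIN above.
    if len(a) == 1:
        return a[0]
    mid = len(a) // 2
    return min(recursive_findmin(a[:mid]), recursive_findmin(a[mid:]))

def parallel_findmin(a):
    # The two subproblems created by the divide step are independent tasks.
    mid = len(a) // 2
    with ProcessPoolExecutor(max_workers=2) as pool:
        lmin = pool.submit(recursive_findmin, a[:mid])
        rmin = pool.submit(recursive_findmin, a[mid:])
        return min(lmin.result(), rmin.result())

if __name__ == "__main__":
    print(parallel_findmin([10, 6, 13, 7, 19, 3, 9, 4]))  # prints 3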
Domain Decomposition
Operating on large amounts of data
Often performed in two steps:
Partitioning the data
Inducing the computational partitioning from the data partitioning
Data to be partitioned: input/output/intermediate
Domain Decomposition
[Figure: dense matrix-vector multiplication y = A·b decomposed into n tasks, where task i computes the product of row i of A with the vector b; and a 3-D grid decomposition of a regular domain]
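A minimal Python sketch of this row-wise domain decomposition: task i owns row i of A plus the replicated vector b, and computes one entry of y. The thread pool and the row_task helper are illustrative assumptions.

from concurrent.futures import ThreadPoolExecutor

def row_task(row, b):
    # Task i: inner product of row i of A with the replicated vector b.
    return sum(a_ij * b_j for a_ij, b_j in zip(row, b))

A = [[1, 2], [3, 4], [5, 6]]
b = [10, 20]
with ThreadPoolExecutor() as pool:
    y = list(pool.map(row_task, A, [b] * len(A)))
print(y)  # [50, 110, 170]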
Matrix-Matrix Multiplication
Partitioning the output data:

( A11  A12 )   ( B11  B12 )   ( C11  C12 )
( A21  A22 ) · ( B21  B22 ) = ( C21  C22 )

Partitioning:
Task 1: C11 = A11 B11 + A12 B21
Task 2: C12 = A11 B12 + A12 B22
Task 3: C21 = A21 B11 + A22 B21
Task 4: C22 = A21 B12 + A22 B22
Matrix-Matrix Multiplication
There are different decompositions of computations
Decomposition 1:
Task 1: C11 = A11 B11
Task 2: C11 = C11 + A12 B21
Task 3: C12 = A11 B12
Task 4: C12 = C12 + A12 B22
Task 5: C21 = A21 B11
Task 6: C21 = C21 + A22 B21
Task 7: C22 = A21 B12
Task 8: C22 = C22 + A22 B22

Decomposition 2:
Task 1: C11 = A11 B11
Task 2: C11 = C11 + A12 B21
Task 3: C12 = A12 B22
Task 4: C12 = C12 + A11 B12
Task 5: C21 = A22 B21
Task 6: C21 = C21 + A21 B11
Task 7: C22 = A21 B12
Task 8: C22 = C22 + A22 B22
Matrix-Matrix Multiplication
Partitioning the intermediate data
Stage 1:
( A11  A12 )   ( B11  B12 )     ( D111  D112 )   ( D211  D212 )
( A21  A22 ) · ( B21  B22 )  →  ( D121  D122 ) , ( D221  D222 )

Stage 2:
( D111  D112 )   ( D211  D212 )   ( C11  C12 )
( D121  D122 ) + ( D221  D222 ) = ( C21  C22 )
Matrix-Matrix Multiplication
A decomposition induced by a partitioning of D
Task 01: D111 = A11 B11
Task 02: D211 = A12 B21
Task 03: D112 = A11 B12
Task 04: D212 = A12 B22
Task 05: D121 = A21 B11
Task 06: D221 = A22 B21
Task 07: D122 = A21 B12
Task 08: D222 = A22 B22
Task 09: C11 = D111 + D211
Task 10: C12 = D112 + D212
Task 11: C21 = D121 + D221
Task 12: C22 = D122 + D222
Matrix-Matrix Multiplication
[Figure: task-dependency graph for the decomposition above; tasks 1-8 compute the eight block products Dkij concurrently, and tasks 9-12 each add two intermediate blocks to form C11, C12, C21, and C22]
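A Python sketch of the same 12-task structure, assuming numpy; the eight product tasks of stage 1 are independent, while each stage-2 addition task depends on exactly two of them, as in the task-dependency graph.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

n, h = 4, 2
A, B = np.random.rand(n, n), np.random.rand(n, n)

def blk(M, i, j):
    # Extract the h x h block (i, j) of M.
    return M[i*h:(i+1)*h, j*h:(j+1)*h]

with ThreadPoolExecutor() as pool:
    # Stage 1: eight independent product tasks, D[k,i,j] = A(i,k) B(k,j).
    D = {(k, i, j): pool.submit(np.matmul, blk(A, i, k), blk(B, k, j))
         for k in (0, 1) for i in (0, 1) for j in (0, 1)}
    # Stage 2: four addition tasks; each waits on exactly two stage-1 results.
    C = np.block([[D[0, i, j].result() + D[1, i, j].result()
                   for j in (0, 1)] for i in (0, 1)])

assert np.allclose(C, A @ B)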
Domain Decomposition
Most widely-used decomposition technique
Large problems often have large amounts of data
Splitting work based on data is a natural way to obtain high concurrency
Can be combined with other methods
[Figure: domain decomposition combined with recursive decomposition; the input array is first split into parts by domain decomposition, and the partial results are then combined recursively]
Exploratory Decomposition
Decomposing computations that correspond to a search of a space of solutions
Not as general-purpose as the other decomposition techniques
Possibly resulting in speedup anomalies: slow-down or superlinear speedup
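As a sketch of exploratory decomposition in Python, the root's subtrees are searched by independent tasks, and the search stops as soon as any task reports a solution; the tree encoding and goal test are illustrative assumptions.

from concurrent.futures import ThreadPoolExecutor, as_completed

def dfs(node, goal):
    # node = (value, children); depth-first search for the goal value.
    value, children = node
    if value == goal:
        return value
    for child in children:
        found = dfs(child, goal)
        if found is not None:
            return found
    return None

tree = (0, [(1, [(3, []), (4, [])]),
            (2, [(5, []), (6, [])])])
with ThreadPoolExecutor() as pool:
    # Each subtree of the root is an independent search task.
    tasks = [pool.submit(dfs, child, 5) for child in tree[1]]
    for t in as_completed(tasks):
        if t.result() is not None:
            print("found", t.result())
            break  # stop as soon as any task succeeds

How much work is done depends on where the solution lies, which is the source of the anomalies in the figure below.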
[Figure: speedup anomalies in exploratory decomposition; a search over subtrees of m nodes each, where the position of the solution node determines the work: total sequential work 2m+1 vs. total parallel work 1 (superlinear speedup), or total sequential work m vs. total parallel work 4m (slow-down)]
Speculative Decomposition
Extracting concurrency in problems in which the next step is one of many possible actions, and the choice can only be determined when the current task finishes
Principle:
Assuming a certain outcome of currently executed tasks
Executing some of the next steps (speculation)
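A minimal Python sketch of this principle, with illustrative stand-in functions: both possible next steps start speculatively while the current task computes the branch outcome, and the result on the wrong path is discarded.

from concurrent.futures import ThreadPoolExecutor

def current_task(x):
    return x % 2 == 0      # computes the branch outcome

def next_if_true(x):
    return x * 10          # speculative work for one outcome

def next_if_false(x):
    return x + 1           # speculative work for the other outcome

x = 6
with ThreadPoolExecutor() as pool:
    cond = pool.submit(current_task, x)
    # Speculation: start both possible next steps before cond is known.
    t_true = pool.submit(next_if_true, x)
    t_false = pool.submit(next_if_false, x)
    # Keep the result matching the real outcome; the other is wasted work.
    result = t_true.result() if cond.result() else t_false.result()
print(result)  # 60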
[Figure: a network of components A to I in a discrete event simulation, connected from the system inputs to the system output]
Speculative Execution
If predictions are wrong:
Work is wasted
Work may need to be undone (state-restoration overhead)
Sometimes speculation is the only way to extract concurrency
Design Checklist
#tasks ≫ #processors?
If not, not flexible
Avoiding redundant computation and storage requirements?
If not, not scalable
Tasks of comparable size?
If not, hard to allocate to processors
#tasks scalable with problem size?
Ideally, problem size ↑, #tasks ↑
If not, unable to solve large problems with more processors
Several alternative partitions?
If not, not flexible
Communication
Communication is specified in 2 phases:
Defining channel structures (technology-dependent)
Specifying the messages sent and received
Determining communication requirements in functional decomposition is easier than in domain decomposition
Data dependencies among tasks are represented as a task-dependency graph (TDG): certain task(s) can only start once some other task(s) have finished
Task-Dependency Graph
Key concepts derived from task-dependency graph
Degree of concurrency: the number of tasks that can be executed concurrently
Critical path: the longest vertex-weighted path, where weights represent task sizes
Task granularity affects both of the characteristics above
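A small Python sketch computing both quantities for a task-dependency graph given as {task: (weight, [predecessors])}; the level-counting notion of concurrency and the example DAG are simplifying assumptions.

from collections import Counter

def critical_path(dag):
    # Length of the longest vertex-weighted path in the DAG.
    memo = {}
    def length(t):
        if t not in memo:
            w, preds = dag[t]
            memo[t] = w + max((length(p) for p in preds), default=0)
        return memo[t]
    return max(length(t) for t in dag)

def degree_of_concurrency(dag):
    # Count tasks at the same depth level: a simple estimate of how
    # many tasks can execute concurrently.
    depth = {}
    def d(t):
        if t not in depth:
            _, preds = dag[t]
            depth[t] = 1 + max((d(p) for p in preds), default=0)
        return depth[t]
    return max(Counter(d(t) for t in dag).values())

dag = {1: (1, []), 2: (1, []), 3: (1, [1, 2]), 4: (2, [3])}
print(critical_path(dag))          # 4: path 1 -> 3 -> 4 with weights 1+1+2
print(degree_of_concurrency(dag))  # 2: tasks 1 and 2 can run together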
Task-Interaction Graph (TIG)
Captures the pattern of interaction between tasks
TIG usually contains the TDG as a subgraph
i.e., there may be interactions between tasks even if there are no dependencies (e.g., accesses to shared data)
[Figure: an example task-interaction graph among twelve tasks (0 to 11), where edges connect tasks that must exchange or share data]
TDG and TIG are important in developing an effective mapping (maximizing concurrency and minimizing overheads)