2. Classification of parallel algorithms
[Figure panels: (a) Divide and Conquer, (b) Processor Farms; each shows a master with slaves 1, ..., i, ..., n.]
[Figure panels: (a) FSFE, (b) LSFE; legend: thread creation point, thread termination point.]
Figure 5. Iterative transformation.
3. Data structures: source and result data are determined in terms of (a) arrays or others, (b) one, two or three dimensions, and (c) integers, floating-point numbers, characters or structures.

4. Task division: partitioning and distribution of source and result data are determined. Partitioning is selected

8, 9 and 10 should be determined together, because they are interrelated.

The definition of a given problem is described in the applications and specification. Typical data structures, task division, termination conditions, and topologies are presented by the system using figures and explanations. The user selects one of them for each index or describes it. Arrays and structures are candidates for data structures. In task division, data partitioning and data distribution should be determined. For example, row-wise, column-wise and mesh partitioning are possible for two-dimensional arrays. Each partitioned element may be distributed in blocks or cyclically. Sometimes the whole data may be copied.

Typical algorithm classes, parallelization methods and interactions are presented by the system, so the user can select one of them. The algorithm class, which is the key issue in parallel programming, is determined from the data dependencies and the termination conditions of loops. For example, a processor farm can be used if there are no data dependencies and no termination conditions. Process networks can be applied if there are data dependencies but no termination conditions. Since the typical parallel execution structures are stored in the case base for each algorithm class, those for the determined algorithm class are presented to the user. The user selects the most suitable execution structure based on synchronization and the parallelization method. BACS execution structure is given by the system.

relevant case matches with those of a given problem. This means that the bone structure, which we believe is the most important issue in parallel programs, can be reused. Task division can be reused to some extent if the indices of task division are the same. Otherwise, most of it should be adapted or newly described. Unit calculations seem to be almost entirely reusable from a serial program. In addition, a user can refer to the program body and speedup of a relevant case. Thus, the user can predict how much speedup the new program will obtain.
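The task-division choices described above (row-wise or column-wise partitioning, then block or cyclic distribution) can be illustrated with a minimal Python sketch. The function names, the 4x3 example array and the two-worker setting are assumptions made for illustration; they are not taken from the system described in this paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def block_distribution(items: Sequence[T], workers: int) -> List[List[T]]:
    """Give each worker one contiguous block; earlier workers absorb the remainder."""
    base, extra = divmod(len(items), workers)
    parts, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size
    return parts


def cyclic_distribution(items: Sequence[T], workers: int) -> List[List[T]]:
    """Deal items out round-robin: item i goes to worker i % workers."""
    return [list(items[w::workers]) for w in range(workers)]


def columns(matrix: Sequence[Sequence[T]]) -> List[List[T]]:
    """Column-wise partitioning: each element of the result is one column."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical 4x3 array; row-wise partitioning treats each row as one element.
grid = [[r * 10 + c for c in range(3)] for r in range(4)]
rows = [list(row) for row in grid]

rows_block = block_distribution(rows, 2)    # worker 0: rows 0-1, worker 1: rows 2-3
rows_cyclic = cyclic_distribution(rows, 2)  # worker 0: rows 0 and 2, worker 1: rows 1 and 3
cols_cyclic = cyclic_distribution(columns(grid), 2)  # columns dealt out round-robin
```

Partitioning and distribution are kept as separate functions because the text treats them as two separate decisions: the same distribution function works on rows or on columns.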
[Figure 11 plot: x-axis "Number of Threads" with ticks 0, 1, 4, 8, 16; y-axis ticks from 0 to 10.]
Figure 11. Parallelization effects of thinning.

5. Evaluation of case-based parallel programming

Six programs were developed by adapting relevant cases. Image data storage was developed using quick sorting.

Threads, synchronization and task division are reused from skeletons, while unit calculations are reused from serial programs.

Threads and synchronization could be completely reused in five programs. In thinning, the order of generating threads was altered and five synchronizations between the top thread and the tail thread were inserted to confirm the termination of the four direction calculations.

In terms of task division, image data storage, edge detection and the knapsack problem could reuse the skeletons in part, but adaptation and new descriptions were required in general. Most of the unit calculations were reused from the serial programs. In the package wrapping algorithm, shared variables were changed to local variables in each thread.

We believe that it is most important to decide which algorithm class can be applied to a given problem, and which execution structure can implement the algorithm.
This means that it is important and difficult for programmers to decide the most suitable execution structure and how to implement the structure using threads and synchronization. If thinning is registered in the case base, the case is reused. Hence, threads and synchronization can be almost entirely reused from skeletons, provided that the case base is enriched. This means that the execution structure can be supported. Once these structures are determined, the user just inserts unit calculations and variable initializations from a serial program into the program. Since task division requires case adaptation, a supporting mechanism should be considered.

6. Related work

7. Conclusions

... system which allows the most relevant case to be retrieved from a case base and adapted to a given problem.

The skeletons of cases include threads, synchronization and task division, which are the most important and difficult issues in parallel programs. Image data storage, three-dimensional spline, edge detection, thinning, the knapsack problem and the package wrapping algorithm were developed using relevant cases. The experiment showed that thread-related issues and synchronization can be reused from the skeletons, and that task division requires case adaptation by programmers.

We have been developing other new programs using relevant cases to verify the effectiveness of this system. Simplification of task division and automatic case adaptation should be investigated in the future.
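The separation between a reusable skeleton (threads and synchronization) and a unit calculation taken from a serial program can be sketched as follows. This is a minimal Python illustration of a processor-farm style skeleton, which the text says applies when there are no data dependencies. It is not the system's actual skeleton format; `processor_farm`, `row_sum` and the sample `image` data are hypothetical names introduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def processor_farm(unit_calculation: Callable[[T], R],
                   tasks: Sequence[T],
                   num_threads: int) -> List[R]:
    """Reusable skeleton: thread creation, task hand-out and the final join.

    Assumes the tasks are independent (no data dependencies between them).
    """
    # The executor plays the master: it creates the worker threads and hands out tasks.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() yields results in task order; consuming it here waits for all
        # workers, which is the only synchronization point in this sketch.
        return list(pool.map(unit_calculation, tasks))


def row_sum(row: Sequence[int]) -> int:
    """Unit calculation, written exactly as it would appear in a serial loop (hypothetical)."""
    return sum(row)


# Hypothetical input: each row is one independent task.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
serial = [row_sum(row) for row in image]
parallel = processor_farm(row_sum, image, num_threads=4)
```

Only `row_sum` changes from one problem to the next, while `processor_farm` stays fixed, which mirrors the reuse pattern reported in the evaluation. In standard CPython builds with the global interpreter lock, threads do not speed up CPU-bound work, so the sketch shows the structure, not the speedup.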