cells are adjacent to the easternmost cells. Thus, the grid’s shape is like the
surface of a torus (hence “toroidal”) or donut. See Figure 1.
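
The wrap-around adjacency described above can be sketched with modular index arithmetic. This is only an illustration, not the provided code: the function names `wrap` and `count_neighbors` are hypothetical, and the row-major, one-byte-per-cell layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Map index i into [0, n): -1 becomes n-1 and n becomes 0. */
int wrap(int i, int n)
{
    assert(n > 0);
    return ((i % n) + n) % n;
}

/* Count the live neighbors of cell (r, c) on an nrows x ncols toroidal
 * board. ASSUMPTION: the board is stored row-major with one byte per
 * cell (1 = alive, 0 = dead). */
int count_neighbors(const unsigned char *board, int nrows, int ncols,
                    int r, int c)
{
    int count = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0)
                continue; /* skip the cell itself */
            size_t row = (size_t)wrap(r + dr, nrows);
            size_t col = (size_t)wrap(c + dc, ncols);
            count += board[row * (size_t)ncols + col];
        }
    }
    return count;
}
```

On a 3 × 3 board, for example, the corner cell (0, 0) has the opposite corner (2, 2) as a neighbor, because both the row and the column index wrap around.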
[Figure 2: (a) “Horizontal” decomposition; (b) “Vertical” decomposition]
3 Assignment
In this assignment, you will write a shared-memory parallelization of the Game
of Life evolution algorithm, using your choice of Pthreads or OpenMP. Your
assignment is broken into several parts, described in the subsections below.
You should read all parts before you begin, since they may help you plan your
implementation. The first part does not require any coding.
2. Each thread “owns” a particular section of the grid. How does information
pass between threads?
3. When do the threads need to synchronize? What construct can you use
to implement this synchronization?
3.2 Parallel implementation: first pass
Your next step is to write a simple parallel version of the Game of Life. This
should be based on the sequential implementation which is provided in life.c,
in the subroutine sequential_game_of_life(). Please do NOT modify this
function. You should implement a parallel function which computes the same
results as sequential_game_of_life(). You are welcome to change the main()
function (in gol.c) and add or edit whatever source files you wish.
Your program should divide the evolution algorithm between multiple threads
in one of the methods suggested in Figure 2. The number of threads will be an
argument passed to your program (as described below), and you should divide
the board between them as evenly as possible. The threads should synchronize
after every tick. This first pass will be due next Tuesday (a week after release)
at 5pm.
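
One way to divide the board "as evenly as possible" is to give each thread a contiguous block of rows whose sizes differ by at most one. The sketch below assumes a row-wise split; the type `row_range` and the function `rows_for_thread` are hypothetical names, not part of the provided code.

```c
#include <assert.h>

/* Half-open range [begin, end) of rows owned by one thread. */
typedef struct {
    int begin;
    int end;
} row_range;

/* Split nrows rows among nthreads threads. The first (nrows % nthreads)
 * threads receive one extra row, so block sizes differ by at most one. */
row_range rows_for_thread(int nrows, int nthreads, int tid)
{
    assert(nrows >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
    int base = nrows / nthreads;  /* rows every thread gets */
    int extra = nrows % nthreads; /* leftover rows to hand out */
    row_range r;
    r.begin = tid * base + (tid < extra ? tid : extra);
    r.end = r.begin + base + (tid < extra ? 1 : 0);
    return r;
}
```

With 10 rows and 3 threads this yields the ranges [0, 4), [4, 7), and [7, 10). The same arithmetic applies to columns if you choose the other decomposition.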
Your program should take at least these three arguments. If you need to change
the command-line arguments, please document this clearly in a README file
or in your report. For example, you might like to include the number of threads
(if you use a Pthreads solution).
We provide some text files which contain sample initial conditions. You are
also welcome to make and test your own. You might want to try some of the
patterns in the Wikipedia article that repeat with a known period, or that are
stable.
Replace <nrows> with the number of rows in the board, and <ncols> with the
number of columns. The above command writes the initial conditions to a file
called start.txt.
Currently, bitboard.c is set to use a random seed from /dev/random,
/dev/urandom, or the system time, in that order of preference. You’re welcome
to edit bitboard.c and tell it to use the same random seed each time,
if you like. This will produce the same sequence of numbers each time, which
might be useful for testing.
Note that the initboard.x application and its associated source files are
not part of the assignment. They are included only for your convenience, as a
way to test your code. You are welcome to edit them and/or add features, if
you like. Please inform us of such changes in the documentation that you turn
in.
3.2.4 Verification
The main() function in gol.c has a verifyp option. If you set this (in the source
code) to a nonzero number, the main() function will test your game_of_life()
function against the results of sequential_game_of_life(). (This means that if
you want to write your own sequential Game of Life function, you should give
it a different name, so that you won’t mess up the verification!) Note that
verification will incur a runtime cost (though it may or may not affect the
timings), so you might want to turn it off while benchmarking.
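
If you want an additional check of your own while debugging, a cell-by-cell comparison of two final boards is enough. The sketch below is not the verification code in gol.c; `first_mismatch` is a hypothetical helper, and it assumes both boards are flat byte arrays of the same length.

```c
#include <assert.h>
#include <stddef.h>

/* Compare two boards of ncells bytes each. Returns -1 if they are
 * identical, otherwise the index of the first cell that differs.
 * ASSUMPTION: both boards use the same flat, one-byte-per-cell layout. */
long first_mismatch(const unsigned char *expected,
                    const unsigned char *actual, size_t ncells)
{
    assert(expected != NULL && actual != NULL);
    for (size_t i = 0; i < ncells; i++) {
        if (expected[i] != actual[i])
            return (long)i;
    }
    return -1;
}
```

Reporting the index of the first differing cell, rather than a plain yes or no, makes it easier to see whether an error sits on a boundary between two threads' sections.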
3.3 Optimizations
Once you have completed, benchmarked, and turned in a correct parallel
implementation, you will optimize your code. We will release more information
about optimizations soon, including suggested optimizations as well as target
performance. For now, concentrate on writing correct code. You’ll have an
additional week to optimize your code.
The aforementioned Wikipedia article also discusses some (sequential)
optimizations, such as only computing on active regions. You are not required
to try any of these, but you may wish to read about them and try them. Note,
however, that tracking active regions in parallel is tricky, as you have to
account for active regions that span processor boundaries. Benchmarking will
also require care, as tracking active regions won’t improve performance if you
start with a random board.

Table 1: Sequential timings on an Itanium 2.

Grid size      Num. iters   Time (s)
1000 × 1000    10           0.34
2000 × 2000    10           1.9
3000 × 3000    10           4.5
1000 × 1000    100          3.0
3.4 Benchmarking
Benchmark your program much as you did the parallel matrix-matrix multiply
assignment: measure weak and strong speedup. Also, measure the performance
of thread synchronization. You can do this by commenting out the actual
computation, but leaving in the parallel synchronization. If you test enough
generations, you can sum up the resulting timings and get a fairly accurate
measure of parallel overhead.
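
Once you have the timings, the derived quantities are simple ratios. The following sketch uses the usual textbook definitions; the function names are hypothetical and you should adapt them to however you collect your timings.

```c
#include <assert.h>

/* Strong scaling: the problem size is fixed. t1 is the run time on one
 * thread, tp the run time on p threads. Ideal speedup is p. */
double strong_speedup(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Parallel efficiency for strong scaling: speedup divided by p. */
double strong_efficiency(double t1, double tp, int p)
{
    assert(p > 0);
    return strong_speedup(t1, tp) / (double)p;
}

/* Weak scaling: the work per thread is held constant, so the problem
 * grows with p. Ideally tp equals t1, giving an efficiency of 1. */
double weak_efficiency(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Fraction of the full run time spent in synchronization, where t_sync
 * is the time measured with the computation commented out. */
double sync_fraction(double t_sync, double t_total)
{
    assert(t_sync >= 0.0 && t_total > 0.0);
    return t_sync / t_total;
}
```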
You might wonder what problem sizes to use. This is guided in part by how
long it takes to solve the problem. Table 1 shows some sequential timings on an
Itanium 2, for different problem sizes. You might want to use the clock resolution
(available in the main() function in gol.c) to guide the choice of problem size.
Furthermore, you should run several iterations, as this will expose how well your
code exploits the cache (you don’t have to worry about this until next week!).
You should pick at least some problem sizes somewhat larger than the cache
size. (The provided code represents the board with one byte per cell.)
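
With one byte per cell, the memory footprint of a run is easy to estimate, which helps when choosing sizes relative to the cache. In the sketch below, the function names are hypothetical, and the number of boards kept in memory (for instance, a current and a next generation) is an assumption about your implementation; look up the cache size of the machine you are benchmarking on.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for nboards boards of nrows x ncols cells at one byte
 * per cell. ASSUMPTION: nboards is however many copies your code keeps. */
size_t working_set_bytes(size_t nrows, size_t ncols, size_t nboards)
{
    assert(nboards > 0);
    return nrows * ncols * nboards;
}

/* Nonzero if the working set does not fit in a cache of cache_bytes. */
int exceeds_cache(size_t nrows, size_t ncols, size_t nboards,
                  size_t cache_bytes)
{
    return working_set_bytes(nrows, ncols, nboards) > cache_bytes;
}
```

For example, two 1000 × 1000 boards occupy 2,000,000 bytes, which would not fit in a 1 MiB cache, while two 500 × 500 boards (500,000 bytes) would.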
3.5 Report
Include a written report with each homework submission. You may recycle the
relevant parts of the report for reporting on optimizations the following week.
The first week’s report should include:
• Results, presented in graphical form with minimal clutter;²
• Discussion of results;
You should report results in terms of the number of arithmetic operations (which
are not flops) per second: if your grid is n by n, take 7n² and divide it by the
runtime. This is because computing each cell requires seven arithmetic
operations: summing the eight neighboring cells takes seven additions. Don’t
count the actual number of arithmetic operations: otherwise you’ll unfairly
punish an implementation that does less work per cell, or unfairly reward an
implementation that does redundant work in order to save communication.
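
The reporting metric can be computed as in the sketch below. The function name `ops_per_second` is hypothetical. Multiplying by the number of generations is an assumption for runs that time several generations together; pass 1 if your runtime covers a single generation.

```c
#include <assert.h>

/* Nominal arithmetic operations per second for an n x n grid:
 * 7 * n^2 operations per generation, divided by the run time.
 * ASSUMPTION: seconds covers all `generations` generations. */
double ops_per_second(long n, long generations, double seconds)
{
    assert(n > 0 && generations > 0 && seconds > 0.0);
    return 7.0 * (double)n * (double)n * (double)generations / seconds;
}
```

Under that assumption, the sequential 1000 × 1000 run of 10 iterations in 0.34 s corresponds to 7 × 10⁷ / 0.34, or roughly 2.1 × 10⁸ operations per second.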
3.6 Submission
Please use the submit script system to turn in the written report, and all of
your code (including the provided code). We must be able to compile, link, and
run your code without adding or changing source files. (Please also run
make clean before turning in your source files.)
4 Reference Links
• Conway’s Game of Life at Wikipedia
• Life at Mathworld
² This means you may want to leave some results out of the graph, if they don’t contribute
to our understanding of the performance. If you think we should see those results, you can
put them in a separate table or text file.