HW 3 A

UNIVERSITY OF CALIFORNIA AT BERKELEY
Computer Science Division

CS194-2: Parallel Programming
Assignment 3: Parallel Game of Life

1 Game of Life
1.1 Basic rules
In this assignment you will implement a shared-memory parallel version of
Conway’s Game of Life. You’ve probably coded a sequential version of the Game
of Life in the CS 61 series, but you might wish to review Wikipedia’s entry on
Conway’s Game of Life. Here is an excerpt:[1]
    The universe of the Game of Life is an infinite two-dimensional
    orthogonal grid of square cells, each of which is in one of two
    possible states, live or dead. Every cell interacts with its eight
    neighbors, which are the cells that are directly horizontally,
    vertically, or diagonally adjacent. At each step in time, the
    following transitions occur:
    1. Any live cell with fewer than two live neighbors dies, as if by
       loneliness.
    2. Any live cell with more than three live neighbors dies, as if by
       overcrowding.
    3. Any live cell with two or three live neighbors lives, unchanged,
       to the next generation.
    4. Any dead cell with exactly three live neighbors comes to life.
    The initial pattern constitutes the “seed” of the system. The first
    generation is created by applying the above rules simultaneously to
    every cell in the seed. Births and deaths happen simultaneously,
    and the discrete moment at which this happens is sometimes called
    a tick. (In other words, each generation is a pure function of the
    one before.) The rules continue to be applied repeatedly to create
    further generations.
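The four transition rules collapse into a small function of a cell's current state and its live-neighbor count. The following is a minimal sketch, not the code provided in life.c; the name next_state and its argument convention (0 = dead, 1 = live) are assumptions made for illustration.

```c
#include <assert.h>

/* Next state of one cell (0 = dead, 1 = live), given its current state
 * and the number of live cells among its eight neighbors.
 * Hypothetical helper, not part of the provided code. */
int next_state(int alive, int live_neighbors)
{
    assert(live_neighbors >= 0 && live_neighbors <= 8);
    if (alive)
        return live_neighbors == 2 || live_neighbors == 3; /* rules 1-3 */
    return live_neighbors == 3;                            /* rule 4 */
}
```

For example, next_state(1, 4) is 0 (overcrowding) and next_state(0, 3) is 1 (birth).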
1.2 Boundary conditions

Our version will be slightly different. Instead of an infinite grid, our grid will be
finite with toroidal boundary conditions. This means that cells on the edges of
the grid will “wrap around” to connect with the opposite edge of the grid: the
northernmost cells are adjacent to the southernmost cells, and the westernmost
cells are adjacent to the easternmost cells. Thus, the grid’s shape is like the
surface of a torus (hence “toroidal”) or donut. See Figure 1.

[1] I’ve adjusted the punctuation slightly from the original.
Figure 1: Finite Game of Life board showing neighbors with toroidal boundary
conditions. The next state of the lighter-colored cells depends on the current
state of the darker neighbors.
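One way to express the wraparound is with modular arithmetic on the row and column indices. The sketch below assumes a hypothetical row-major layout with one byte per cell, where cell (r, c) lives at board[r * ncols + c]; the layout used by the provided code may differ, and the names wrap and count_live_neighbors are illustrative only.

```c
#include <assert.h>

/* Wrap index i into [0, n). Valid for i >= -n, which covers i = -1 and i = n. */
int wrap(int i, int n)
{
    return (i + n) % n;
}

/* Count the live cells among the eight neighbors of (row, col) on a torus.
 * Assumed layout (hypothetical): cell (r, c) is board[r * ncols + c]. */
int count_live_neighbors(const unsigned char *board, int nrows, int ncols,
                         int row, int col)
{
    assert(row >= 0 && row < nrows && col >= 0 && col < ncols);
    int count = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0)
                continue; /* skip the cell itself */
            int r = wrap(row + dr, nrows);
            int c = wrap(col + dc, ncols);
            count += board[r * ncols + c];
        }
    }
    return count;
}
```

On a 4 by 4 board whose only live cell is (0, 0), the corner cell (3, 3) sees one live neighbor because both of its indices wrap around. The modulo on every access is simple but not fast; it is meant to show the boundary rule, not to suggest an optimized inner loop.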
1.3 Two buffers

Each successive grid state is a pure function of the previous grid state. Due to
the data dependencies, this means that you need two grids to ensure correctness:
current and next. For each cell in current, the evolution algorithm counts the
live cells among the eight neighboring cells in current to determine the state of
the cell in next. Once the evolution algorithm has updated the entire board, the
two boards are swapped (that is, the pointers are swapped, not the values), and
the loop repeats. Swapping the boards lets each generation be a pure function
of the previous generation, without needing a new board for each generation
(it’s a manual version of garbage collection).
If you don’t use two buffers, you’ll get wrong answers: some neighbor counts
will read cells that have already been updated to the next generation.
Furthermore, the answer will depend on the order in which you traverse the
array, and (for a parallel implementation) on the way in which you divide the
array among the processors. Using two buffers ensures that you’ll always get
the same answer, no matter how you parallelize the code.
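The two-buffer scheme can be sketched sequentially as follows. This is an illustration under stated assumptions, not the provided implementation: the names step and evolve are hypothetical, and the board is assumed to be row-major with one byte per cell.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One tick: read only from cur, write only to next (assumed row-major layout). */
void step(const unsigned char *cur, unsigned char *next, int nrows, int ncols)
{
    for (int r = 0; r < nrows; r++) {
        for (int c = 0; c < ncols; c++) {
            int n = 0;
            for (int dr = -1; dr <= 1; dr++)
                for (int dc = -1; dc <= 1; dc++)
                    if (dr != 0 || dc != 0)
                        n += cur[((r + dr + nrows) % nrows) * ncols
                                 + (c + dc + ncols) % ncols];
            int alive = cur[r * ncols + c];
            next[r * ncols + c] = alive ? (n == 2 || n == 3) : (n == 3);
        }
    }
}

/* Evolve board for the given number of generations, leaving the result in board.
 * Returns 0 on success, -1 if the second buffer cannot be allocated. */
int evolve(unsigned char *board, int nrows, int ncols, int generations)
{
    assert(nrows > 0 && ncols > 0 && generations >= 0);
    size_t size = (size_t)nrows * (size_t)ncols;
    unsigned char *scratch = malloc(size);
    if (scratch == NULL)
        return -1;

    unsigned char *cur = board, *next = scratch;
    for (int g = 0; g < generations; g++) {
        step(cur, next, nrows, ncols);
        unsigned char *tmp = cur; /* swap the pointers, not the cells */
        cur = next;
        next = tmp;
    }
    if (cur != board)
        memcpy(board, cur, size); /* the final state ended up in scratch */
    free(scratch);
    return 0;
}
```

Because step never writes to the buffer it reads, the result does not depend on the order in which the cells are visited.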
2 Parallelizing the Game of Life

The Game of Life is inherently a data-parallel problem. For each game tick
(evolution step), a cell’s next state depends only on the current state of its local
neighbors. Thus, we can effectively decompose the evolution algorithm into a
set of localized problems. See Figure 2 for an illustration of some decomposition
options.
(a) “Horizontal” Decomposition (b) “Vertical” Decomposition
Figure 2: Two possible domain decompositions on the Game of Life board. Cells
are colored to show the division of update responsibilities between threads,
with each color corresponding to a thread. The board dimensions and number of
threads were arbitrarily chosen for illustration.
3 Assignment
In this assignment, you will write a shared-memory parallelization of the Game
of Life evolution algorithm, using your choice of Pthreads or OpenMP. Your
assignment is broken into several parts, described in the subsections below.
You should read all parts before you begin, since they may help you plan your
implementation. The first part does not require any coding.
3.1 Questions on Parallelization Issues

Here are some questions to get you thinking about the problem. Please answer
them with concise but complete explanations.
1. Without fixing the number of threads (number of regions), is one of the
domain decompositions in Figure 2 preferable to the others? Why? Hint:
Think of how memory is accessed.
2. Each thread “owns” a particular section of the grid. How does information
pass between threads?
3. When do the threads need to synchronize? What construct can you use
to implement this synchronization?
4. Identify any “parallelization cost” (overhead or redundancy) inherent in
the decompositions in Figure 2.
3.2 Parallel implementation: first pass
Your next step is to write a simple parallel version of the Game of Life. This
should be based on the sequential implementation provided in life.c,
in the subroutine sequential_game_of_life(). Please do NOT modify this function.
You should implement a parallel function which computes the same results
as sequential_game_of_life(). You are welcome to change the main() function
(in gol.c) and add or edit whatever source files you wish.
Your program should divide the evolution algorithm between multiple threads
using one of the decompositions shown in Figure 2. The number of threads will be an
argument passed to your program (as described below), and you should divide
the board between them as evenly as possible. The threads should synchronize
after every tick. This first pass will be due next Tuesday (a week after release)
at 5pm.
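Dividing the board “as evenly as possible” can be sketched as a block partition of rows or columns in which no thread owns more than one extra row or column. The struct range type and the function block_range below are hypothetical names introduced for illustration; they are not part of the provided code.

```c
#include <assert.h>

/* Half-open range [begin, end) of rows (or columns) owned by one thread. */
struct range {
    int begin;
    int end;
};

/* Split n rows (or columns) among nthreads threads.
 * The first (n % nthreads) threads each get one extra row. */
struct range block_range(int n, int nthreads, int tid)
{
    assert(n >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
    int base = n / nthreads;
    int extra = n % nthreads;
    struct range r;
    r.begin = tid * base + (tid < extra ? tid : extra);
    r.end = r.begin + base + (tid < extra ? 1 : 0);
    return r;
}
```

With n = 10 and three threads, the ranges are [0, 4), [4, 7), and [7, 10). If there are more threads than rows, the trailing threads receive empty ranges.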
3.2.1 Program Arguments

We’ve set up gol.c with a main() function that takes three arguments:
$ ./gol <generations> <initial_board_file> <final_board_file>
The arguments have the following meanings:
• <generations>: the number of ticks by which to evolve the initial board.
• <initial_board_file>: the filename from which to read the initial board
configuration.
• <final_board_file>: the filename to which to write the final board
configuration. If this is a single hyphen or is omitted, then the output is
written to stdout instead.
Your program should take at least these three arguments. If you need to change
the command-line arguments, please document this clearly in a README file
or in your report. For example, you might like to include the number of threads
(if you use a Pthreads solution).
3.2.2 Input and output files

Your program will read the initial board configuration from a file, and save
the final board configuration (after the given number of ticks) to another file.
We’ve implemented functions for reading and writing such files; you can find
them in load.h and save.h, respectively. The main() function in gol.c does
this automatically for you; please do NOT change this behavior.
The first line of the file will have two numbers separated by a single space:
the number of rows and the number of columns in the board, in that order. The
following lines list the elements in the board, each separated by a newline, in
column-major order. The numeral 0 represents a dead cell, and 1 represents a
living cell.
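To make the file layout concrete, here is a minimal reader sketch. It is not the API declared in load.h (which you should keep using); the names colmajor_index and read_board are hypothetical, and the sketch assumes the cells are kept in memory in the same column-major order in which the file lists them.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Offset of cell (row, col) when cells are stored column by column. */
size_t colmajor_index(int row, int col, int nrows)
{
    return (size_t)col * (size_t)nrows + (size_t)row;
}

/* Read "<nrows> <ncols>" followed by nrows * ncols values (0 or 1).
 * Returns a malloc'd array in file (column-major) order, or NULL on error.
 * Hypothetical helper; the provided loader may behave differently. */
unsigned char *read_board(FILE *in, int *nrows, int *ncols)
{
    assert(in != NULL && nrows != NULL && ncols != NULL);
    if (fscanf(in, "%d %d", nrows, ncols) != 2 || *nrows <= 0 || *ncols <= 0)
        return NULL;

    size_t count = (size_t)*nrows * (size_t)*ncols;
    unsigned char *cells = malloc(count);
    if (cells == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        int value;
        if (fscanf(in, "%d", &value) != 1 || (value != 0 && value != 1)) {
            free(cells); /* truncated file or a value other than 0 or 1 */
            return NULL;
        }
        cells[i] = (unsigned char)value;
    }
    return cells;
}
```

For a file that starts with the line "2 3", the next two values are column 0 (rows 0 and 1), the two after that are column 1, and so on.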
We provide some text files which contain sample initial conditions. You are
also welcome to make and test your own. You might want to try some of the
patterns in the Wikipedia article that repeat with a known period, or that are
stable.
3.2.3 Generating random input

The file bitboard.c contains a program for generating random input board
files. If you use the included Makefile, this source code should be built into
an executable, called initboard.x. You can use this to generate random initial
conditions:
$ ./initboard.x <nrows> <ncols> > start.txt
Replace <nrows> with the number of rows in the board, and <ncols> with the
number of columns. The above command writes the initial conditions to a file
called start.txt.
Currently, bitboard.c is set to use a random seed from /dev/random,
/dev/urandom, or the system time, in that order of preference. You’re wel-
come to edit bitboard.c and tell it to use the same random seed each time,
if you like. This will produce the same sequence of numbers each time, which
might be useful for testing.
Note that the initboard.x application and its associated source files are
not part of the assignment. They are included only for your convenience, as a
way to test your code. You are welcome to edit them and/or add features, if
you like. Please inform us of such changes in the documentation that you turn
in.
3.2.4 Verification
The main() function in gol.c has a verifyp option. If you set this (in the source
code) to a nonzero number, the main() function will test your game_of_life()
function against the results of sequential_game_of_life(). (This means that if
you want to write your own sequential Game of Life function, you should give
it a different name, so that you won’t mess up the verification!) Note that
verification will incur a runtime cost (though it may or may not affect the
timings), so you might want to turn it off while benchmarking.
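When verification fails, it can help to know where two boards first differ. The helper below is a hypothetical debugging aid, not the check that gol.c performs; it assumes both boards use one byte per cell and the same layout.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first cell at which the two boards differ,
 * or -1 if all ncells cells match. Hypothetical debugging helper. */
long first_mismatch(const unsigned char *a, const unsigned char *b,
                    size_t ncells)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < ncells; i++)
        if (a[i] != b[i])
            return (long)i;
    return -1;
}
```

A mismatch that always falls on the border between two threads' regions often points to a boundary or synchronization bug.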
3.3 Optimizations
Once you have completed, benchmarked, and turned in a correct parallel im-
plementation, you will optimize your code. We will release more information
about optimizations soon, including suggested optimizations as well as target
performance. For now, concentrate on writing correct code. You’ll have an
additional week to optimize your code.
The aforementioned Wikipedia article also discusses some (sequential)
optimizations, such as only computing on active regions. You are not required to
try any of these, but you may wish to read about them and try them. Note,
however, that tracking active regions in parallel is tricky, as you have to account
for active regions that span processor boundaries. Benchmarking will also
require care, as tracking active regions won’t improve performance if you start
with a random board.

Grid size      Num. iters   Time (s)
1000 × 1000    10           0.34
2000 × 2000    10           1.9
3000 × 3000    10           4.5
1000 × 1000    100          3.0
Table 1: Some sequential Game of Life timings on an Itanium 2 Citris node.
3.4 Benchmarking
Benchmark your program much as you did the parallel matrix-matrix multiply
assignment: measure weak and strong speedup. Also, measure the performance
of thread synchronization. You can do this by commenting out the actual com-
putation, but leaving in the parallel synchronization. If you test enough gener-
ations, you can sum up the resulting timings and get a fairly accurate measure
of parallel overhead.
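The quantities to plot can be computed from measured wall-clock times as sketched below. The definitions are the common conventions for strong and weak scaling and are an assumption here; if the matrix-matrix multiply assignment defined them differently, follow that assignment. The function names are hypothetical.

```c
#include <assert.h>

/* Strong scaling: same board, p threads. Speedup = T(1) / T(p). */
double strong_speedup(double t_one_thread, double t_p_threads)
{
    assert(t_one_thread > 0.0 && t_p_threads > 0.0);
    return t_one_thread / t_p_threads;
}

/* Fraction of ideal (linear) speedup achieved with nthreads threads. */
double parallel_efficiency(double speedup, int nthreads)
{
    assert(nthreads > 0);
    return speedup / nthreads;
}

/* Weak scaling: the board grows with p so that each thread keeps the same
 * number of cells. Efficiency = T(1 thread, base board) /
 * T(p threads, board with p times as many cells); 1.0 is ideal. */
double weak_efficiency(double t_base_one_thread, double t_scaled_p_threads)
{
    assert(t_base_one_thread > 0.0 && t_scaled_p_threads > 0.0);
    return t_base_one_thread / t_scaled_p_threads;
}
```

For example, a run that takes 4.0 s on one thread and 1.0 s on eight threads has a strong speedup of 4.0 and an efficiency of 0.5.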
You might wonder what problem sizes to use. This is guided in part by how
long it takes to solve the problem. Table 1 shows some sequential timings on an
Itanium 2, for different problem sizes. You might want to use the clock resolution
(available in the main() function in gol.c) to guide the choice of problem size.
Furthermore, you should run several iterations, as this will expose how well your
code exploits the cache (you don’t have to worry about this until next week!).
You should pick at least some problem sizes somewhat larger than the cache
size. (The provided code represents the board with one byte per cell.)
3.5 Report
Include a written report with each homework submission. You may recycle the
relevant parts of the report for reporting on optimizations the following week.
The first week’s report should include:
• Brief description of computing hardware and timer resolution;
• Answers to the questions listed above;
• Description and justification of your algorithm;
• Your choice of parallel system (OpenMP or Pthreads);
• Description of benchmarking methodology;
• Results, presented in graphical form with minimal clutter;[2]
• Discussion of results;
• Any additional information, including any nonstandard instructions for
building and running your program.
You should report results in terms of the number of arithmetic operations (which
are not flops) per second: if your grid is n by n, take 7n^2 and divide it by the
runtime. This is because computing each cell requires seven arithmetic
operations: summing the eight neighboring cells takes seven additions. Don’t
count the actual number of arithmetic operations: otherwise you’ll unfairly
punish an implementation that does less work per cell, or unfairly reward an
implementation that does redundant work in order to save communication.
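The reporting metric can be computed as in the sketch below. The handout states the count for a single pass over an n by n grid; multiplying by the number of generations for a multi-tick run is an assumption of this sketch, as is the generalization to a non-square grid. The function name ops_per_second is hypothetical.

```c
#include <assert.h>

/* Nominal arithmetic operations per second: seven additions per cell,
 * per generation (the per-generation factor is an assumption), divided
 * by the measured runtime in seconds. */
double ops_per_second(int nrows, int ncols, int generations, double seconds)
{
    assert(nrows > 0 && ncols > 0 && generations > 0 && seconds > 0.0);
    return 7.0 * nrows * ncols * generations / seconds;
}
```

Under that assumption, a 1000 × 1000 board evolved for 10 iterations in 0.34 s corresponds to roughly 2.1 × 10^8 operations per second.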
3.6 Submission
Please use the submit script system to turn in the written report, and all of
your code (including the provided code). We must be able to compile, link, and
run your code without adding or changing source files. (Please also run a make
clean before turning in your source files.)
3.7 Due dates

The first (unoptimized parallel) code and report are due next week (5pm Tues-
day 1 October). The optimized version of the code, and the updated report, are
due the following week (5pm Tuesday 8 October).
4 Reference Links
• Conway’s Game of Life at Wikipedia
• Life at Mathworld
• LLNL OpenMP Tutorial
• LLNL PThreads Tutorial
• Inline Functions at Wikipedia
• CS194-2 Fall 2007 Website
[2] This means you may want to leave some results out of the graph, if they don’t contribute
to our understanding of the performance. If you think we should see those results, you can
put them in a separate table or text file.
