N-Body-Simulation
Thorsten Grahs, 06.07.2015
Overview
Introduction
N-Body model
Algorithms
Hierarchical approach
Multipole expansion
MPI derived data-types
N-Body-Simulation
Dwarf #4: N-Body systems
Multi-body/Particle simulation
Simulation of interacting bodies/particles
Many physical systems can be modelled as particle
simulations
(e.g. a micro-scale view of fluid dynamics)
Cover the whole range of scales:
atomic scale
solar systems
F_i = Σ_{j=1, j≠i}^{n} f(x_i, x_j)

For gravitational interaction:

F_ij = G m_i m_j (x_j − x_i) / |x_j − x_i|³

F_i = G m_i Σ_{j=1, j≠i}^{n} m_j (x_j − x_i) / |x_j − x_i|³
Modelling
The outcome of the N-body forces is a
system of dN 2nd-order ODEs, with
N: # of bodies & d: dimension

F_i = m_i d²x_i/dt²

Can be reformulated into a system of 2dN 1st-order ODEs:

p_i = m_i ẋ_i
ṗ_i = F_i
Time integration with Euler or Runge-Kutta schemes, e.g. explicit Euler:

x_i^{n+1} = x_i^n + Δt v_i^n
v_i^{n+1} = v_i^n + (Δt / m_i) F_i^n
Cost of computation
Computational costs
The cost of evaluating the forces depends
strongly (and not surprisingly) on the number of
involved bodies.
It is of the order
O(N²)
Force evaluation dominates the overall costs of the
simulations
Need for cost reduction in order to achieve
high performance
06.07.2015 Thorsten Grahs Parallel Computing I SS 2015 Seite 14
Cost reduction
Ways to cost reduction
1. For every action there is an equal and opposite reaction
F_ji = −F_ij
2. Introduction of a cutoff radius R.
Forces due to particles outside the radius R are
updated only rarely:
O(N R³ + N²)
3. Hierarchical methods (octree approach)
O(N log N)
4. Multipole methods
O(N)
Hierarchical approach
Barnes-Hut Algorithm (J. Barnes, P. Hut, 1986)
The simulation domain is divided into sub-volumes based
on where particles appear
The algorithm is built on the octree approach
Advantages
Only particles in neighbouring or nearby cells have to be
treated individually.
For remote cells, the particles inside can be treated as one
particle with their summed-up mass at the cell barycentre.
O(N log N)
Quad-/Octree
Particle injection
Mass centre
Force calculation
Multipole expansion
A fast algorithm for particle simulation
L. Greengard and V. Rokhlin, J. Comp. Phys., 73 (1987)
Building blocks:
Multipole expansion of the forces
Approximation of the expansion due to a given tolerance
Translation operators to recentre the expansion
O(N)
Taylor expansion
Force expansion in a Taylor series
Tree structure
Information transformation
Comparison of Methods
Expansion shift
MPI datatype        C type
MPI_CHAR            signed char
MPI_SHORT           signed short int
MPI_INT             signed int
MPI_LONG            signed long int
MPI_UNSIGNED_CHAR   unsigned char
MPI_UNSIGNED_SHORT  unsigned short int
MPI_UNSIGNED        unsigned int
MPI_UNSIGNED_LONG   unsigned long int
MPI_FLOAT           float
MPI_DOUBLE          double
MPI_LONG_DOUBLE     long double
MPI_BYTE            (type-less 8-bit memory range)
MPI_PACKED          (denotes packed data)
MPI_Type_struct
MPI_Pack
MPI_Unpack
Example layout: MPI_DOUBLE (1), MPI_DOUBLE (3), MPI_DOUBLE (3), MPI_INT (1)
struct {
    int num;
    float x;
    double data[2];
} obj;

MPI_Datatype obj_type;

Layout in memory?

MPI_Bcast(&obj, 1, obj_type, 0, comm);
Contiguous
Constructor

int MPI_Type_contiguous(int count, MPI_Datatype oldtype,
                        MPI_Datatype *newtype);

Example

int B[2][3];
MPI_Datatype matrix;
MPI_Type_contiguous(6, MPI_INT, &matrix);
Contiguous example

double A[N][N];
double B[N][N];

MPI_Datatype matrix;
MPI_Type_contiguous(N * N, MPI_DOUBLE, &matrix);
MPI_Type_commit(&matrix);

if (rank == master_rank)
    MPI_Send(A, 1, matrix, 1, 10, comm);
else if (rank == 1)
    MPI_Recv(B, 1, matrix, 0, 10, comm, &status);
Vector
Constructor

int MPI_Type_vector(int count, int blocklength, int stride,
                    MPI_Datatype oldtype, MPI_Datatype *newtype);

newtype has
count blocks, each consisting of
blocklength copies of
oldtype data-types,
with a stride of stride elements between the starts of consecutive blocks.
Vector example

const int N = 5;
MPI_Datatype dt;
MPI_Type_vector(N, 1, N, MPI_INT, &dt);

Variables
count=5, blocklength=1, stride=5
double A[N][N];

MPI_Datatype column;
MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
MPI_Type_commit(&column);

if (rank == master_rank)
    MPI_Send(&A[0][2], 1, column, 1, 10, comm);
else if (rank == 1)
    MPI_Recv(&A[0][2], 1, column, 0, 10, comm, &status);
Structure
Constructor

int MPI_Type_struct(int count, int *array_of_blocklengths,
                    MPI_Aint *array_of_displacements,
                    MPI_Datatype *array_of_types,
                    MPI_Datatype *newtype);

Parameters
count number of blocks (integer)
array_of_blocklengths
number of elements in each block (array)
array_of_displacements
byte displacement of each block (array)
array_of_types type of elements in each block (array)
Structure example

struct {
    int num;
    float x;
    double data[4];
} obj;

Variables
count=3

MPI_Aint intex;
MPI_Type_extent(MPI_INT, &intex);

displacements[0] = static_cast<MPI_Aint>(0);
displacements[1] = intex;
MPI_Aint
Address integers
Addresses in C are integers that may be longer than int.
The datatype for locations/addresses is MPI_Aint, not int.
Displacements, which are differences of two addresses,
can likewise be longer than int.
The datatype MPI_Aint takes care of this possibility.
FORTRAN has an integer type which is four bytes long, hence it is
not necessary to use MPI_Aint there.
Pointers in C/C++
Array

struct {
    int n;
    double x[3];
} obj;

Pointer to array

struct {
    int n;
    double *x;
} obj;
MPI_Address

int MPI_Address(void *location, MPI_Aint *address);

Gets the address of location in memory.

struct {
    int n;
    double *x;
} obj;

obj.n = 10;
obj.x = new double[obj.n];

/* obtain the addresses needed to compute displacements */
MPI_Aint addr_n, addr_x;
MPI_Address(&obj.n, &addr_n);
MPI_Address(obj.x, &addr_x);
MPI_Pack
Packing data
int MPI_Pack(void* inbuf, int incount,
MPI_Datatype datatype, void *outbuf,
int outsize, int *pos, MPI_Comm comm)
Creates a package of non-contiguous data
MPI_Pack | Arguments
inbuf
Input buffer, where the data comes from.
incount
# elements from type datatype, which will be packed.
datatype
Datatype of the elements.
outbuf
Buffer of the packed data.
outsize
Size of outbuf in bytes.
pos
Position in outbuf (in bytes) from where data are injected.
Is increased to the size of the inserted data after injection.
MPI_Pack
Works like a stream
Alternative approach to grouping data for communication
Manually pack variables into a contiguous buffer
Transmit
Unpack into desired variables
MPI_Unpack
Where there's a pack, there's an unpack...
int MPI_Unpack(void* inbuf, int insize,
int *pos, void *outbuf, int outcount,
MPI_Datatype datatype, MPI_Comm comm)
Unpacking the data on the receiver side.
Receiving the data into the inbuf buffer
Storing the unpacked data into the outbuf buffer
MPI_Unpack | Arguments
inbuf
Incoming buffer with the packed data
insize
Size of inbuf in bytes.
pos
Position in inbuf (in bytes) from where the data should be
unpacked.
outbuf
Buffer where the unpacked data are stored.
outcount
# elements to be unpacked
datatype
Datatype of the elements.
Considerations
Good
Sending a few large messages is more efficient
than sending many small messages.
May avoid the use of system buffering
Ideal for sending variable length messages
Sparse matrix data, client/server task dispatch
Bad
You have to pack and unpack data yourself
Requires a lot of attention to detail
Further reading
J. Barnes and P. Hut,
A hierarchical O(N log N) force-calculation algorithm,
Nature 324 (4): 446-449, 1986.
L. Greengard and V. Rokhlin,
A Fast Algorithm for Particle Simulations,
J. Comput. Phys. 73, 325-348, 1987.
Netlib
Netlib Repository at UTK and ORNL
User-Defined Datatypes and Packing
http://www.netlib.org/utk/papers/mpi-book/node70.html