
GPU-based Reaction Ensemble Monte Carlo Method for non-ideal plasma thermodynamics
M. Tuttafesta^a, A. D'Angola^b, A. Laricchiuta^c, G. Colonna^c, M. Capitelli^a

^a Università di Bari, via Orabona 4 - 70126 Bari, Italy
^b Scuola di Ingegneria SI, Università della Basilicata, via dell'Ateneo Lucano 10 - 85100 Potenza, Italy
^c CNR-IMIP Bari, via Amendola 122/D - 70126 Bari, Italy
Abstract
In the paper a Graphics Processing Unit (GPU) CUDA C version of the Reaction Ensemble Monte Carlo method (REMC) is presented, in order to investigate the equilibrium behaviour of chemically reacting systems in highly non-ideal conditions. The GPU version of the code is particularly efficient when the total potential energy of the system must be calculated, and a considerable speed-up is achieved. Results, obtained in the case of Helium plasma at high pressure, show differences between the real and ideal cases.
Keywords: REMC, Thermodynamics of plasmas, Monte Carlo, Real gases
1. Introduction
Reaction Ensemble Monte Carlo [1, 2] (REMC) is a molecular-level computer simulation technique for predicting chemical equilibrium and thermodynamic properties under non-ideal conditions. Non-ideal plasmas are currently of great experimental and theoretical interest for astrophysics and for applications in power engineering such as catalyst development, nanoporous material manufacturing, supercritical fluid separation, propulsion and combustion science, and novel energy storage [3, 4, 5, 6, 7]. Accounting for intermolecular forces between reacting components in non-ideal systems is critical for optimising processes and applications with chemical reactions.
REMC is a computational tool for the exact calculation of macroscopic plasma properties, taking into account the interactions between particles and simultaneous multiple reactions. The information required by molecular-level computer simulation techniques consists of the interaction potentials and the internal thermodynamic properties of the reacting components. REMC is based on the random sampling (Metropolis algorithm) of the grand-canonical ensemble for a multicomponent reacting plasma and consists of a combination of three types of state transitions: particle displacements, reaction moves and volume changes. The REMC approach, accounting for the interaction potential among particles, is particularly suitable at high pressure, where non-ideal effects become important. This method has been successfully applied to Helium [2, 8], Argon and air plasmas [9], consisting of 2, 7 and 26 ionization reactions respectively. The interactions between charged particles in these calculations were described by Deutsch potentials, while both the neutral-neutral and the neutral-ion interactions were approximated by Exp-6 potentials [10].
In this work a GPU-CUDA C version of REMC, applied to Helium at high pressure, is presented. A considerable speed-up of GPU over CPU execution times has been obtained, in particular in the calculation of the total interaction energy.
2. The REMC method
Reaction equilibrium in plasmas at specified temperature and pressure is attained when the Gibbs free energy of the system is minimized subject to mass conservation and charge neutrality. The REMC method involves molecular and atomic production and deletion, as well as changes of particle identities during the simulation. Adding to the total partition function the internal contribution Q_i^{int} of the isolated molecule i (which includes rotational, vibrational, electronic and nuclear contributions), the grand canonical partition function for a mixture of s reacting components is [11]
Q = \sum_{N_1=0}^{\infty} \cdots \sum_{N_s=0}^{\infty} \int \cdots \int dx^{N_1}\, d\omega^{N_1} \cdots dx^{N_s}\, d\omega^{N_s}\, \exp\!\left[ \Phi(\{V, x^{N_i}\}) \right]    (1)

where

\Phi = \sum_{i=1}^{s} \left[ N_i \beta \mu_i - \log(N_i!) + N_i \log\!\left( \frac{V\, Q_i^{\mathrm{int}}}{\Lambda_{\mathrm{th}}^{3}} \right) \right] - \beta\, U(\{V, x^{N_i}\})    (2)
and \beta = 1/kT. Deviations from ideality in plasmas arise from the long-range Coulomb interaction between charged particles, from the lowering of the ionization potential, which depends on the composition, and from the short-range neutral-neutral and neutral-charge interactions. Debye-Hückel theory takes into account the first two sources of non-ideality, and the thermodynamic and transport properties of plasmas used in fluid dynamic codes generally account for this effect [12, 13, 14, 15, 16, 17, 18, 19]. In a plasma, the Coulomb interactions lower the ionization potential of the ions and the thermochemical quantities become composition dependent, resulting in an inherently nonlinear process. This effect can be accounted for in the REMC method.
The REMC method generates a Markov chain to simulate the properties of the system governed by eq. (1); the chain consists of a combination of three main types of Monte Carlo steps: (1) particle displacements, D; (2) reaction moves, R; (3) volume changes at fixed pressure, V.
Particle displacements and volume changes are implemented as in traditional molecular simulations [20, 21]. Particles are initially placed in a simulation box and the total energy of the system is evaluated, considering that only pairwise interactions are relevant. The initial configuration may correspond to a regular crystalline lattice or to a random configuration, but with no hard-core overlaps. A particle is randomly selected and a random displacement is given by

r_l = r_k + \delta\, (\xi - 0.5)    (3)

where k and l denote respectively the initial and the final state of the system, \delta are small displacements properly selected and \xi are random numbers uniformly distributed between 0 and 1. The reverse trial move is equally probable. The particle displacement is accepted with a transition probability k \to l given by

W^{D}_{kl} = \min\left[ 1, \exp(-\beta\, \Delta U_{kl}) \right]    (4)
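As an illustration of how a displacement step can be organized, consider the following C sketch; the helper names uniform01() and delta_U_displacement(), as well as the reuse of the PARTICLE struct of Appendix A.2, are our own placeholders and not routines of the actual code.

#include <math.h>   /* exp, fmin */

/* Sketch of a particle-displacement trial move, eqs. (3) and (4).
   uniform01() (uniform random in [0,1]) and delta_U_displacement()
   (energy change U_l - U_k) are hypothetical helpers. */
void trial_displacement(PARTICLE *part, int Np, double delta, double beta)
{
    int i = (int)(uniform01()*Np);            /* pick a random particle */
    PARTICLE old = part[i];                   /* save state k           */
    part[i].x += delta*(uniform01() - 0.5);   /* eq. (3), per component */
    part[i].y += delta*(uniform01() - 0.5);
    part[i].z += delta*(uniform01() - 0.5);
    double dU = delta_U_displacement(part, Np, i, &old);
    /* Metropolis acceptance, eq. (4) */
    if (uniform01() >= fmin(1.0, exp(-beta*dU)))
        part[i] = old;                        /* reject: restore state k */
}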
Computer simulations are usually performed by using a small number N of particles, and the size of the box is limited by the available storage and by the execution speed of the program. The computational time taken for evaluating the potential energy is proportional to N^2. When a particle moves crossing the boundary of the box, the problem of surface effects can be overcome by implementing periodic boundary conditions. The cubic box is replicated throughout space to form an infinite lattice. There are no walls at the boundary of the central box, and no surface molecules: the box simply provides a convenient axis system for measuring the coordinates of the N molecules. When periodic boundary conditions are adopted, an infinite number of terms would be required to evaluate the total energy. For a short-range potential energy function, the summation can be restricted by considering each particle at the center of a region with the same size and shape as the basic simulation box, so that it interacts only with the molecules whose centers lie within this region (the minimum-image convention).
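In a cubic box of side L this prescription reduces to the standard minimum-image rule; the following sketch is a textbook form, not the program's own routine.

#include <math.h>   /* rint, sqrt */

/* Minimum-image distance between two particles in a cubic periodic
   box of side L (standard textbook form). */
double min_image_distance(const PARTICLE *pi, const PARTICLE *pj, double L)
{
    double dx = pi->x - pj->x;
    double dy = pi->y - pj->y;
    double dz = pi->z - pj->z;
    dx -= L*rint(dx/L);   /* fold each component into [-L/2, L/2] */
    dy -= L*rint(dy/L);
    dz -= L*rint(dz/L);
    return sqrt(dx*dx + dy*dy + dz*dz);
}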
In REMC simulations at fixed pressure and temperature, the volume is simply treated as an additional coordinate, and trial moves in the volume must satisfy the same rules as trial moves in positions. A volume trial move consists of an attempted change of the volume from V_k to V_l [20, 21]

V_l = V_k + \delta V\, (\xi - 0.5)    (5)

and such a random volume-changing move will be accepted with the probability

W^{V}_{kl} = \min\left[ 1, \exp\!\left( -\beta\, \Delta U_{kl} - \beta\, (V_l - V_k)\, P + N \log\frac{V_l}{V_k} \right) \right]    (6)

When a volume change is accepted, the size of the box and the relative positions of the particles change. In order to calculate W^{V}_{kl}, the configuration energy must be recalculated considering the new distances between particles. To simulate a chemically reacting system at specified temperature and pressure, rather than at constant temperature and volume, a trial volume change as in eq. (6) must be considered.
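A volume step can then be sketched as follows; uniform01() and total_energy() are hypothetical helpers, and the linear rescaling of the coordinates with the box side is one possible convention.

#include <math.h>   /* cbrt, exp, log, fmin */

/* Sketch of a volume trial move, eqs. (5) and (6). Positions are
   rescaled linearly with the box side; returns 1 if accepted. */
int trial_volume(PARTICLE *part, int Np, double *V,
                 double dVmax, double beta, double P)
{
    double Vk = *V;
    double Vl = Vk + dVmax*(uniform01() - 0.5);        /* eq. (5)      */
    if (Vl <= 0.0) return 0;                           /* unphysical   */
    double Uk = total_energy(part, Np, cbrt(Vk));
    double s  = cbrt(Vl/Vk);                           /* side scaling */
    for (int i = 0; i < Np; ++i) {
        part[i].x *= s; part[i].y *= s; part[i].z *= s;
    }
    double Ul  = total_energy(part, Np, cbrt(Vl));
    double arg = -beta*((Ul - Uk) + P*(Vl - Vk)) + Np*log(Vl/Vk);
    if (uniform01() < fmin(1.0, exp(arg))) { *V = Vl; return 1; }
    for (int i = 0; i < Np; ++i) {                     /* reject       */
        part[i].x /= s; part[i].y /= s; part[i].z /= s;
    }
    return 0;
}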
Chemical reactions are considered as follows. In the case of a multiple-reaction system [22], for any linearly independent set of n_R chemical reactions given by

\sum_{i=1}^{s_j} \nu_{ij} X_i = 0, \qquad j = 1, 2, \ldots, n_R    (7)

and considering that

N_i = N_i^0 + \sum_{j=1}^{n_R} \nu_{ij}\, \xi_j, \qquad i = 1, \ldots, s;\ j = 1, \ldots, n_R    (8)
the transition probability for a step \xi_j is [11]

W^{\xi_j}_{kl} = \min\left[ 1, \left( \beta\, p^0 V \right)^{\bar{\nu}_j \xi_j} K^{\xi_j}_{p,j} \prod_{i=1}^{s_j} \frac{N_i^0!}{(N_i^0 + \xi_j \nu_{ij})!}\, \exp(-\beta\, \Delta U_{kl}) \right]    (9)

where \bar{\nu}_j = \sum_{i=1}^{s_j} \nu_{ij} is the net change in the total number of molecules for reaction j.
Finally, the procedure for a reaction move is the following (a schematic code sketch is given after the list):
(a) a reaction is randomly selected;
(b) the reaction direction, forward or reverse, is randomly selected;
(c) a set of reactants and products, according to the stoichiometry of the selected reaction, is randomly chosen;
(d) the reaction move is accepted by evaluating the probability associated with performing the changes of particle identities, together with particle insertions and deletions if the total number of molecules changes during the selected reaction.
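A schematic C fragment of steps (a)-(d) might read as follows; all helper names (uniform01(), pick_and_apply_reaction(), acceptance_eq9(), undo_reaction()) are hypothetical placeholders for the corresponding operations.

#include <math.h>   /* fmin */

/* Schematic reaction move, steps (a)-(d); all helpers are hypothetical. */
void trial_reaction(int nR, double beta)
{
    int j  = (int)(uniform01()*nR);            /* (a) pick a reaction    */
    int xi = (uniform01() < 0.5) ? +1 : -1;    /* (b) forward or reverse */
    /* (c) choose reactants/products and apply identity changes,
       insertions and deletions; returns the energy change */
    double dU  = pick_and_apply_reaction(j, xi);
    double acc = acceptance_eq9(j, xi, dU, beta); /* eq. (9)             */
    if (uniform01() >= fmin(1.0, acc))         /* (d) Metropolis test    */
        undo_reaction(j, xi);                  /* reject: restore state  */
}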
3. Case study: Helium plasma
In the Helium plasma (4 species: He, He^+, He^{++}, e^-) we consider, for the He-He^+ interaction, three different potentials: Exp-6, and an average and a statistical selection of the attractive and repulsive Hulburt-Hirschfelder potentials [23]. The following formulas have been used:
1. Exp-6 [9]:

u(r) = \begin{cases} \infty, & r < r_c \\ A \exp(-Br) - C_6\, r^{-6}, & r \ge r_c \end{cases}

where A = 6.161 \times 10^{-17} J, B = 5.2802 \times 10^{10} m^{-1}, C_6 = 0.642 \times 10^{-79} J m^6, r_c = 0.6008 \times 10^{-10} m.

2. Hulburt-Hirschfelder average (HH-modRep-mix) [23]:

u_{mix}(r) = \left( u_{HH}(r) + u_{modRep}(r) \right)/2

where

u_{HH}(r) = u_{HH}(\bar{r}(r)) = \varepsilon_0 \left\{ \exp[-2\alpha_{HH}\bar{r}] - 2\exp[-\alpha_{HH}\bar{r}] + \beta_{HH}\,\bar{r}^3 \left[ 1 + \gamma_{HH}\bar{r} \right] \exp[-2\alpha_{HH}\bar{r}] \right\}

\bar{r} = \left( \frac{r}{r_e} - 1 \right), \quad \varepsilon_0 = 2.4730 eV, r_e = 1.081 Å, \alpha_{HH} = 2.23, \beta_{HH} = 0.2205, \gamma_{HH} = 4.389

u_{modRep}(r) = \varepsilon_1 \exp(-ar - br^2 - cr^3)

\varepsilon_1 = 359 eV, a = 4.184 Å^{-1}, b = 0.649 Å^{-2}, c = 0.08528 Å^{-3}

3. Hulburt-Hirschfelder statistical selection (HH-modRep-stat) [23]:

u_{stat}(r, \xi) = \begin{cases} u_{HH}(r), & 0 \le \xi < 0.5 \\ u_{modRep}(r), & 0.5 \le \xi \le 1 \end{cases}

where \xi is a random number uniformly distributed in [0, 1], selected at every numerical evaluation of u_{stat}. These potentials are reported and compared in Fig. 1.
Figure 1: Exp-6, u_HH, u_modRep and u_mix potentials as a function of the He-He^+ interparticle distance.
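For concreteness, the Exp-6 potential of item 1 can be written against the POT prototype used by the code (see Appendix A.1); the following sketch, with hard-coded constants, is only an illustration and not the program's own source.

#include <math.h>   /* exp, pow */

/* Exp-6 He-He+ potential (item 1) in the POT form of Appendix A.1;
   constants in SI units are hard-coded here for illustration. */
double exp6_HeHep(double r, DATAPOT *dp)
{
    const double A  = 6.161e-17;   /* J      */
    const double B  = 5.2802e10;   /* m^-1   */
    const double C6 = 0.642e-79;   /* J m^6  */
    const double rc = 0.6008e-10;  /* m      */
    (void)dp;                      /* parameters unused in this sketch */
    if (r < rc) return 1.0e300;    /* hard core: effectively infinite  */
    return A*exp(-B*r) - C6/pow(r, 6.0);
}

The r < r_c branch returns a very large finite value standing for the infinite hard-core wall, so that any trial configuration probing it is rejected in practice.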
4. CPU and GPU numerical methods
In order to compare the CPU and GPU executions of the algorithm, we built a single computer program which runs the relevant modules sequentially. The input data are initialized before every module run, storing detailed processing times for the whole modules as well as for the relevant subroutines.
It must be pointed out that both the CPU and GPU modules implement the same algorithm, schematized in Tab. 1.
The basic steps of the CPU and GPU programs (Move-functions) are reported in Tab. 2.
Table 1: General algorithm

Input: pressure, temperature, initial particle distribution, potential types
Assign: initial particle positions, pair-potential pointers to function
Start cycles loop for equilibration
  Start moves loop inside a cycle
    Move-function: randomly generated movement (volume, particle, reaction)
    with fixed probabilities
  End moves loop inside a cycle
  Output: variables as a function of cycles
End cycles loop for equilibration
Cycles loop for averages: same scheme as equilibration
Output: variables averaged at a given temperature and pressure
The functions marked by a box in Tab. 2 contain large loops that are parallelized in GPU kernels. GPUdeltaU is the parallel version of CPUdeltaU; both functions calculate the change in configurational energy for a given movement. B and T are respectively the number of blocks per grid and of threads per block in the kernel configuration. Typically we adopt T = 128 for all kernels, with B = 32 for particle or reaction movements and B = 64 for volume movements. In order to evaluate the speed-up = (CPU execution time)/(GPU execution time) of our code, we perform a test calculation considering only 20 cycles for equilibration and 80 cycles for averages, while varying the number of particles N for the Helium plasma.
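Execution times of this kind are typically collected with CUDA events; the following minimal sketch (not the instrumentation code actually used) times the GPUdeltaU_vol kernel of Appendix A.3, where dev_delta_e, dev_part and dev_MPr are hypothetical device buffers.

/* Minimal CUDA-event timing of a kernel launch; illustrative only. */
cudaEvent_t start, stop;
float ms;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
GPUdeltaU_vol<<<B, T>>>(dev_delta_e, dev_datampot, Lold, Lnew,
                        lock, dev_part, Np, NSPES, dev_MPr);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);             /* wait for the kernel to finish */
cudaEventElapsedTime(&ms, start, stop); /* elapsed time in milliseconds  */
cudaEventDestroy(start);
cudaEventDestroy(stop);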
A CUDA architecture is employed with a GeForce GTS 450 GPU and the Toolkit 4.1, while the CPU is a Pentium(R) Dual-Core E5500. In Fig. 2 the speed-ups are shown for each movement and for the whole module as a function of the number of starting particles (100% He). The Move-volume kernel speed-up is greater than the Move-position and Move-reaction ones, because the former involves N^2 threads while the latter involve N threads. Since the move ratio Move-position : Move-volume : Move-reaction is N̄ : 1 : N̄ (see section 5), the lower speed-ups dominate that of the whole module.
Table 2: CPU and GPU subroutines in a Move-function.

Operations                            | CPU main subroutine | GPU main subroutine
Random Move-configuration             | CPU random function | CPU random function
Move-configuration probability        | CPU Pmove function  | CPU Pmove function
Evaluate ΔU_move                      | CPUdeltaU_move      | GPUdeltaU_move<<<B_move, T_move>>>(...)
Copy ΔU_move from DEVICE to HOST      | -                   | cudaMemcpy
Accept/Reject the Move-configuration  | CPU random function | CPU random function
Update configuration                  | -                   | cudaMemcpy
Figure 2: Detailed speed-up for each movement and for the complete module as a function of the number of initial particles (100% He).
Both the CPUdeltaU_move function and the corresponding GPUdeltaU_move kernel, pointed out in Tab. 2, calculate the change in configurational energy (eq. 6 for volume, eq. 4 for position and eq. 9 for reaction movements). If N is the number of particles, the calculation consists of independent threads, whose number is proportional to N in the case of position and reaction movements and to N^2 for volume movements. The CPUdeltaU_move function performs a sequential loop over the threads with a contextual summation of the scalar terms. The GPUdeltaU_move kernel parallelizes the above loop using 32 or 64 blocks per grid and 128 threads per block, and uses the shared memory to perform a final summation-reduction of the terms.
Inside a single thread, the pair-potential energy of a couple of particles is calculated by means of a multi-potential function, whose implementation is described in Appendix A.1. In order to obtain good generality, flexibility and reusability of the code, C language structs and pointers to functions have been widely used. For instance, the information about particles is stored inside an array of structs, as described in Appendix A.2. We did not use the curand CUDA library; instead, the implementation of the multi-potential function calling, described in Appendix A.3, uses our own hand-coded pseudo-random number generator, which has been suitably tested (see Appendix B).
5. Results and conclusion
In this paper a GPU-CUDA C version of the REMC method has been used to calculate non-ideal effects in Helium plasma at high pressure. The GPU version of the code is particularly efficient for volume steps, where the total energy of the system (proportional to N^2) must be calculated. We performed REMC simulations for Helium plasma at a fixed pressure of p = 400 MPa and for temperatures ranging from 20000 K to 100000 K. For a given temperature we followed these steps:
1. Set the initial numbers of particles: [He, He^+, He^{++}, e^-] = [500, 0, 0, 0];
2. REMC simulation in Ideal Gas (IG) conditions, obtaining the corresponding particle distribution;
3. REMC simulation starting from the IG distribution and considering the Exp-6 or u_stat interaction potential.
In all simulations we generated 300 cycles for equilibration and 700 cycles to accumulate averages of the desired quantities. Each cycle consists of a fixed number of total moves n = n_D + n_V + n_R. Particle moves n_D, volume moves n_V and reaction moves n_R are selected randomly with the probability ratio N̄ : 1 : N̄, where N̄ is the half-sum of the maximum and the minimum number of particles during a simulation run.
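The move-type selection itself can be sketched as below, with uniform01() and the MOVE_* constants as hypothetical placeholders.

enum { MOVE_POSITION, MOVE_VOLUME, MOVE_REACTION };

/* Draw a move type with probability ratio Nbar : 1 : Nbar
   (position : volume : reaction); uniform01() is hypothetical. */
int draw_move(double Nbar)
{
    double x = uniform01()*(2.0*Nbar + 1.0);
    if (x < Nbar)       return MOVE_POSITION;
    if (x < Nbar + 1.0) return MOVE_VOLUME;
    return MOVE_REACTION;
}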
Figure 3: Molar fractions and molar volume obtained at 400 MPa by using the Exp-6 potential. Dashed lines refer to the IG case.
Figures 3 and 4 show the molar fractions, molar volume, molar enthalpy and excess molar energy obtained at 400 MPa by using the Exp-6 potential. Results have been obtained also in the ideal gas (IG) case, where no interactions are considered.
Figure 4: Molar enthalpy and excess molar energy obtained at 400 MPa by using the Exp-6 potential. Dashed lines refer to the IG case.
Figure 5: Mean particle spacing obtained at 400 MPa by using the Exp-6 potential. Dashed lines refer to the IG case.
Some differences can be observed, especially for the molar volume.
The mean particle distance observed in Fig. 5 is larger than 5 Å, so that no differences between the multi-potential curves of Fig. 1 can be observed. At higher pressures, where the mean interparticle distances become smaller, the effects of the multi-potential curves can be appreciated.
Appendix A. CPU and GPU implementation details
Appendix A.1. Potential energy calculation
The implementation of the multi-potential function, which calculates the pair-potential energy of a couple of particles, is for CPU

Listing 1: mpot (CPU)

double mpot(double r, DATAMPOT *dmp, double rnd)
{
    int i;
    for (i = 0; i < (dmp->npot) - 1 && rnd > (dmp->w[i]); i++);
    return dmp->pot[i](r, &(dmp->dp[i]));
}
and for GPU

Listing 2: d_mpot (GPU)

__device__
double d_mpot(double r, DATAMPOT *dmp, double rnd)
{ /* ... same as mpot ... */ }
where r is the inter-particle distance, rnd is a random number uniformly distributed in [0, 1] and DATAMPOT is a typedef of the following struct

Listing 3: DATAMPOT

typedef struct {
    int npot;
    double w[NPOT_MAX];
    POT pot[NPOT_MAX];
    DATAPOT dp[NPOT_MAX];
} DATAMPOT;

where npot is the number of single-potentials belonging to a given DATAMPOT instance, whose maximum value is NPOT_MAX. The array w contains the cumulative statistical limits in the interval [0, 1], that is 0 < w[0] < w[1] < ... < w[npot-2] < w[npot-1] = 1 (with w[-1] = 0 understood), so that w[i]-w[i-1] is the statistical weight of the i-th single-potential.
The single-potential calculation is obtained by using pot, an array of pointers to function, defined by the type POT

typedef double (*POT)(double r, DATAPOT *dp);    (A.1)

The parameter r in (A.1) is, as above, the inter-particle distance and DATAPOT is a typedef of the following struct

Listing 4: DATAPOT

typedef struct {
    double pr[PR_MAX];
} DATAPOT;

where the array pr contains the scalar parameters associated to a single-potential function.
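As a usage illustration (not part of the actual code), a two-potential He-He^+ entry realizing the 50/50 statistical selection of section 3 could be filled as follows, with fpot_HH and fpot_modRep hypothetical POT functions.

/* Illustrative two-potential DATAMPOT entry for the HH-modRep-stat
   selection; fpot_HH and fpot_modRep are hypothetical POT functions. */
DATAMPOT dmp;
dmp.npot   = 2;
dmp.w[0]   = 0.5;        /* cumulative limit of the first potential  */
dmp.w[1]   = 1.0;        /* cumulative limit of the second potential */
dmp.pot[0] = fpot_HH;
dmp.pot[1] = fpot_modRep;
/* mpot(r, &dmp, rnd) then evaluates fpot_HH when rnd <= 0.5
   and fpot_modRep otherwise */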
Appendix A.2. Input and assignment
The initial particle species and distribution are read from file at the Input stage of the general algorithm (see Tab. 1), together with the potential types for every different species pair.
The information about particles is stored in the array of structs allocated as follows
part = (PARTICLE *)malloc(Npmax*sizeof(PARTICLE));       // for CPU
cudaMalloc((void**)&dev_part, Npmax*sizeof(PARTICLE));   // for GPU

where Npmax is the maximum number of particles in a simulation run and PARTICLE is a typedef of the struct

Listing 5: PARTICLE

typedef struct {
    // Coordinates
    double x;
    double y;
    double z;
    int s;     // Species index
    double m;  // Mass
    int c;     // Ionic charge (in elementary units)
} PARTICLE;
The information about species-pair potentials is stored in the array of DATAMPOT structs allocated as follows

Listing 6: datampot array allocation

// For CPU
datampot = (DATAMPOT *)malloc(NSPES*NSPES*sizeof(DATAMPOT));
...
// For GPU
cudaMalloc((void**)&dev_datampot,
           NSPES*NSPES*sizeof(DATAMPOT));

where NSPES is the number of species in a simulation run.
Specific functions for each species-pair potential are defined according to the prototype in (A.1). They are defined in double versions, e.g.:

Listing 7: Species-pair potentials

// For CPU
double fpotname1(...);
double fpotname2(...);
...
// For GPU
__device__ double d_fpotname1(...);
__device__ double d_fpotname2(...);
...
The pot member of each datampot instance is assigned to the above potential functions in the Assign stage of the general algorithm (see Tab. 1), according to the following scheme

Listing 8: pot member assignment

for (i = 0; i < NSPES; ++i) {
    for (j = 0; j <= i; ++j) {
        ...
        for (k = 0; k < datampot[NSPES*j+i].npot; ++k) {
            ...
            if (!strcmp(potname, "fpotname1")) {
                // Assign single-potential function
                datampot[NSPES*j+i].pot[k] = fpotname1;
                ...
            }
            else if (!strcmp(potname, "fpotname2")) {
                // Assign single-potential function
                datampot[NSPES*j+i].pot[k] = fpotname2;
                ...
            }
            ...
        }
        // Symmetric copy
        datampot[NSPES*i+j] = datampot[NSPES*j+i];
    }
}
The dev_datampot instance is assigned by first copying the whole datampot array

Listing 9: dev_datampot assignment, stage 1/2

cudaMemcpy(dev_datampot, datampot,
           NSPES*NSPES*sizeof(DATAMPOT), cudaMemcpyHostToDevice);

After that copy, the pot member of dev_datampot is not assigned correctly (it still holds host function addresses), so we complete the assignment as follows

Listing 10: dev_datampot assignment, stage 2/2

// for every i, j, k index and every fpotname function
...
cudaMemcpyFromSymbol(&pot_pointer, d_fpotname1_pointer,
                     sizeof(POT));
cudaMemcpy(&(dev_datampot[NSPES*j+i].pot[k]), &pot_pointer,
           sizeof(POT), cudaMemcpyHostToDevice);

where pot_pointer is a local POT variable and d_fpotname1_pointer is an auxiliary pointer associated to the d_fpotname1 function and globally assigned as

__device__ POT d_fpotname1_pointer = d_fpotname1;
Appendix A.3. Multi-potential function calling
In the CPU module, the implementation scheme of the CPUdeltaU_volume function, named CPUdeltaU_vol, is as follows

Listing 11: CPUdeltaU_vol

void CPUdeltaU_vol
(double *delta_e_pt, DATAMPOT *dmp, double Lold, double Lnew)
{
    ...
    for (k = 0; k < Np*Np; ++k) {
        j = k%Np;
        i = k/Np;
        if (j > i) {
            ...
            *delta_e_pt +=
                  mpot(r_new, dmp + NSPES*part[i].s + part[j].s,
                       (double)rand()/RAND_MAX)
                - mpot(r_old, dmp + NSPES*part[i].s + part[j].s,
                       (double)rand()/RAND_MAX);
        }
    }
}

where delta_e_pt points to the returned change in volume configuration energy (the \Delta U_{kl} in eq. 6), dmp is the CPU datampot array allocated in List. 6, Lold (Lnew) is the cube root of the current (changed) volume, Np is the instantaneous number of particles, and r_old (r_new) is the distance between particles i and j in the current (changed) volume.
It is important to note that the third argument of the function mpot, defined in List. 1, which has to be a random number uniformly distributed in [0, 1], is here obtained from the function rand of the stdlib C library.
In the GPU module, the implementation scheme of the GPUdeltaU_volume kernel, named GPUdeltaU_vol, is as follows

Listing 12: GPUdeltaU_vol

#include "lock.h"
...
__global__
void GPUdeltaU_vol
(double *delta_e_pt, DATAMPOT *dmp, double Lold, double Lnew,
 Lock lock, PARTICLE *part, int Np, int NSPES, int *MPr)
{
    __shared__ double cache[threadsPerBlock];
    int cacheIndex = threadIdx.x;
    ...
    double delta_ept_temp = 0.0;
    k = threadIdx.x + blockIdx.x*blockDim.x;
    while (k < Np*Np) {
        j = k%Np;
        i = k/Np;
        if (j > i) {
            ...
            delta_ept_temp +=
                  d_mpot(r_new, dmp + NSPES*part[i].s + part[j].s,
                         d_randREMC(MPr+k))
                - d_mpot(r_old, dmp + NSPES*part[i].s + part[j].s,
                         d_randREMC(MPr+k));
        }
        k += blockDim.x*gridDim.x;
    }
    // set the cache values
    cache[cacheIndex] = delta_ept_temp;
    // synchronize threads in this block
    __syncthreads();
    // for reductions, threadsPerBlock must be a power of 2
    // because of the following code
    int ir = blockDim.x/2;
    while (ir != 0) {
        if (cacheIndex < ir) cache[cacheIndex] += cache[cacheIndex + ir];
        __syncthreads();
        ir /= 2;
    }
    if (cacheIndex == 0) {
        // wait until we get the lock
        lock.lock();
        // we have the lock at this point, update and release
        *delta_e_pt += cache[0];
        lock.unlock();
    }
}

where the first four arguments are the same as those of the CPUdeltaU_volume function.
The variable lock is local, simply declared without initialization before the kernel call; its type Lock is defined in the header lock.h, listed in Appendix C (for details see [24]). Such a variable, together with the shared array cache, allows performing the well known [24] parallel summation-reduction algorithm.
The relevant aspect here is that the random argument of the device function d_mpot is provided by the hand-coded pseudo-random number generator d_randREMC, implemented according to the linear congruential scheme x_{i+1} = (a x_i + c) mod m as follows

Listing 13: d_randREMC

#define M_RND 2147483647 // 2^31-1, a prime number
#define A_RND 16807
#define C_RND 0
__device__ double d_randREMC(int *x)
{
    // 64-bit intermediate product avoids signed-integer overflow
    (*x) = (int)(((long long)A_RND*(*x) + C_RND) % M_RND);
    return ((double)(*x))/RAND_MAX;
}
The array MPr, last argument of the kernel GPUdeltaU_vol, contains the seeds used by the function d_randREMC and is allocated with dimension Npmax*Npmax to ensure a unique random sequence for each thread. Every single element of MPr is initialized by a rand call and subsequently passed by reference to the d_randREMC function, which in turn updates its value at every call.
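For instance, the host-side initialization of the seed array might look like the following sketch; dev_MPr and the seeding policy are illustrative assumptions.

/* Illustrative host-side seeding of MPr (one seed per particle pair)
   and copy to the device array dev_MPr; nonzero seeds are required
   because the generator is purely multiplicative (C_RND = 0). */
int *MPr = (int *)malloc(Npmax*Npmax*sizeof(int));
for (int k = 0; k < Npmax*Npmax; ++k)
    MPr[k] = rand() % (M_RND - 1) + 1;   /* seed in [1, M_RND-1] */
cudaMemcpy(dev_MPr, MPr, Npmax*Npmax*sizeof(int),
           cudaMemcpyHostToDevice);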
Appendix B. Validation of the multi-potential statistical selection
The extent to which the multi-potential selections performed by the mpot (List. 1) and d_mpot (List. 2) functions fit the assumed distribution, given by the w array in List. 3, is evaluated by the reduced chi-square \tilde{\chi}^2 (see [25], page 261, and Appendix D), defined as

\tilde{\chi}^2 = \frac{1}{d} \sum_{k=0}^{n-1} \frac{(O_k - E_k)^2}{E_k}    (B.1)

which in general refers to a series of measurements grouped into bins k = 0, \ldots, n-1, where O_k is the number of measurements observed in bin k, E_k is the number of expected ones on the basis of some distribution, and d is the number of degrees of freedom (d = n - c, where c is the number of constraints, that is the number of parameters that had to be calculated from the data to compute the expected numbers E_k).
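In code, eq. (B.1) is a few lines; the following C transcription is illustrative, with argument names of our choosing.

/* Reduced chi-square of eq. (B.1): O[k] observed counts, E[k]
   expected counts, n bins, d degrees of freedom (d = n - c). */
double reduced_chi_square(const double *O, const double *E, int n, int d)
{
    double chi2 = 0.0;
    for (int k = 0; k < n; ++k)
        chi2 += (O[k] - E[k])*(O[k] - E[k])/E[k];
    return chi2/d;
}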
As a test case we considered the simulation described in section 5, for T = 50000 K only, with a He-He^+ interaction that is always Exp-6 but is made artificially multi-potential by setting npot = 8 and

w = [0.125, 0.225, 0.375, 0.5, 0.625, 0.825, 0.875, 1]

The chi-square in (B.1) is calculated by accumulating data from the simulation start to the end of every cycle, considering: n = npot (in the struct described in List. 3); M = total number of multi-potential function calls (mpot for CPU or d_mpot for GPU) for the He-He^+ interaction; O_k = the portion of M corresponding to the number of calls of the k-th single-potential function (pot element of the struct described in List. 3); E_k = M (w[k] - w[k-1]); and then d = n - 1.
As is known [25], if the probability

Q(\chi^2|d) = \frac{2}{2^{d/2}\, \Gamma(d/2)} \int_{\chi}^{\infty} x^{d-1} \exp(-x^2/2)\, dx    (B.2)

that the observed chi-square will exceed its value by chance, even for a correct model, is greater than 0.05, then one can consider the observed distribution compatible with the expected one.
We therefore used the routine gammq from the Numerical Recipes ([26], chapter 6) to calculate the probability (B.2) for every chi-square calculation, that is, as a function of cycles. The results are reported in Fig. B.6, from which one can deduce a high quality of the multi-potential statistical selection for both the CPU and GPU modules, seeing that the Q values are always greater than 5%, frequently greater than 40% and sometimes greater than 90%.
In List. 14 (Appendix C) one can find the lock.h header implementing the atomic locks used by the GPU kernels.
Figure B.6: Percentage probability (B.2), for the same CPU and GPU test case simulation, that the chi-square (B.1) will exceed the value calculated from the simulation start to the end of a given cycle, as a function of cycles.
Appendix C. Atomic locks
Listing 14: lock.h

#ifndef LOCK_H
#define LOCK_H

struct Lock {
    int *mutex;
    Lock(void) {
        cudaMalloc((void**)&mutex, sizeof(int));
        cudaMemset(mutex, 0, sizeof(int));
    }

    ~Lock(void) {
        cudaFree(mutex);
    }

    __device__ void lock(void) {
        // spin until the mutex is acquired (0 -> 1)
        while (atomicCAS(mutex, 0, 1) != 0);
    }

    __device__ void unlock(void) {
        atomicExch(mutex, 0);
    }
};

#endif
References
[1] Johnson, J. K., Panagiotopoulos, A. Z., and Gubbins, K. E., Molecular Physics 81 (1994) 717.
[2] Smith, W. R. and Triska, B., The Journal of Chemical Physics 100 (1994) 3019.
[3] Bezkrovniy, V., Schlanges, M., Kremp, D., and Kraeft, W. D., Phys. Rev. E 69 (2004) 061204.
[4] Bezkrovniy, V. et al., Phys. Rev. E 70 (2004) 057401.
[5] Lisal, M., Brennan, J. K., and Smith, W. R., J. Chem. Phys. 124 (2006) 064712.
[6] Turner, C. H. and Gubbins, K. E., J. Chem. Phys. 119 (2003) 021105.
[7] Bourasseau, E., Dubois, V., Desbiens, N., and Maillet, J. B., J. Chem. Phys. 127 (2007) 084513.
[8] Lisal, M., Smith, W. R., Bures, M., Vacek, V., and Navratil, J., Molecular Physics 100 (2002) 2487.
[9] Lisal, M., Smith, W. R., and Nezbeda, I., Journal of Chemical Physics 113 (2000) 4885.
[10] Ree, F. H., The Journal of Physical Chemistry 87 (1983) 2846.
[11] Capitelli, M., Colonna, G., and D'Angola, A., Fundamental Aspects of Plasma Chemical Physics: Thermodynamics, volume 66 of Atomic, Optical, and Plasma Physics, Springer, New York, 1st edition, 2011.
[12] Bernardi, D., Colombo, V., Coppa, G., and D'Angola, A., European Physical Journal D 14 (2001) 337.
[13] Capitelli, M., Colonna, G., Gorse, C., and D'Angola, A., European Physical Journal D 11 (2000) 279.
[14] Colonna, G., D'Angola, A., and Capitelli, M., Physics of Plasmas 19 (2012).
[15] Colonna, G., D'Angola, A., Laricchiuta, A., Bruno, D., and Capitelli, M., Plasma Chemistry and Plasma Processing (2012), article in press.
[16] Colonna, G. and D'Angola, A., Computer Physics Communications 163 (2004) 177.
[17] D'Angola, A. et al., European Physical Journal D 66 (2012).
[18] D'Angola, A., Colonna, G., Gorse, C., and Capitelli, M., European Physical Journal D 46 (2008) 129.
[19] D'Angola, A., Colonna, G., Gorse, C., and Capitelli, M., European Physical Journal D 65 (2011) 453.
[20] Allen, M. P. and Tildesley, D. J., Computer Simulation of Liquids, Oxford University Press, USA, 1989.
[21] Frenkel, D. and Smit, B., Understanding Molecular Simulation: From Algorithms to Applications, Academic Press, 2nd edition, 2002.
[22] Turner, C. H. et al., Molecular Simulation 34 (2008) 119.
[23] Bruno, D. et al., Physics of Plasmas 17 (2010) 112315.
[24] Sanders, J. and Kandrot, E., CUDA by Example, Addison-Wesley, 2011.
[25] Taylor, J. R., An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, University Science Books, 2nd edition, 1997.
[26] Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P., Numerical Recipes in C (2nd ed.): The Art of Scientific Computing, Cambridge University Press, New York, NY, USA, 1992.