GPU
(Graphics Processing Units)
Daniel Jasiński
Atizar
Who are we?
Atizar
Limited
OpenFOAM
interface
consulting
research
robust
customization
analysis
Atizar Flagship
Product
simFlow
simFlow
Features
create and import mesh
prepare cases
parametrise your problem
calculate solutions
compute in parallel with just one click
post-process results with ParaView
High End
CPU
CPU
Features
Clock Frequency
2.6 GHz
No of Cores
Memory Bandwidth
Feeding
1 CPU
Core
double [] a, b, c;
for(int i = 0; i < N; i++)
{
    c[i] = a[i] + b[i];
}
$$2.6\ \text{GHz} \times (2\ \text{input} + 1\ \text{output}) \times 8\ \text{bytes} \times 4\ \text{ops/cycle} = 249.6\ \text{GB/s}$$
We only have 51.2 GB/s of memory bandwidth
Memory Latency
CPU Core
L1 Cache: ~3 cycles
L2 Cache: ~10 cycles
L3 Cache: ~30 cycles
Main Memory (RAM): ~300 cycles
CPU
Stall
Time
...
CPU Core
GPU Core
Data parallelism
forAll(a,i)
{
c[i] = a[i] + b[i];
}
thrust::transform
(
    a.begin(), a.end(),
    b.begin(),
    c.begin(),
    thrust::plus<scalar>()
);
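Thrust's algorithms mirror those of the C++ standard library, so the same element-wise addition can be written on the host with `std::transform`. The sketch below is a CPU-side analogue only: `addFields` is a hypothetical helper, and `double` stands in for OpenFOAM's `scalar`.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Host-side analogue of the thrust::transform call: c[i] = a[i] + b[i].
// addFields is a hypothetical helper; double stands in for OpenFOAM's scalar.
// Precondition: b holds at least as many elements as a.
std::vector<double> addFields(const std::vector<double>& a,
                              const std::vector<double>& b)
{
    std::vector<double> c(a.size());
    std::transform(a.begin(), a.end(),   // first input range
                   b.begin(),            // second input range
                   c.begin(),            // output range
                   std::plus<double>()); // binary operation applied per element
    return c;
}
```

The argument order (first range, second range, output, operation) is the same as in the Thrust call, which is part of why loops of this kind port to the GPU with little restructuring.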
GPU
Accelerating
Linear
Solvers
OFgpu
SpeedIT
CUFFLINK
PARALUTION
Advantages
Simple drop-in library
No code changes in OpenFOAM
Disadvantages
Limited acceleration
CPU-GPU memory copy: 16 GB/s
RapidCFD Features
Simulations running fully on the GPU
Minimized CPU-GPU memory copy
Support for multiple GPUs
PCG and GAMG solvers
AMI interpolation
RapidCFD Limitations
Not all solvers are available
Need original OpenFOAM for meshing and decomposition
GAMG agglomeration on the CPU
AMI addressing calculation on the CPU
Original Code

    template<class Type>
    void Foam::GAMGAgglomeration::restrictFaceField
    (
        Field<Type>& cf,
        const Field<Type>& ff,
        const label fineLevelIndex
    ) const
    {
        const labelList& fineToCoarse = faceRestrictAddressing_[fineLevelIndex];

        cf = Zero;

        // Scatter-add: every fine face adds its value to its coarse face
        forAll(fineToCoarse, ffacei)
        {
            label cFace = fineToCoarse[ffacei];

            if (cFace >= 0)
            {
                cf[cFace] += ff[ffacei];
            }
        }
    }

GPU Code

    template<class Type>
    struct GAMGAgglomerationRestrictFunctor
    {
        const Type* ff;
        const label* sort;
        const Type zero;

        GAMGAgglomerationRestrictFunctor(const Type* _ff, const label* _sort):
            ff(_ff), sort(_sort), zero(pTraits<Type>::zero){}

        // Sums the fine-face values belonging to one coarse face
        __host__ __device__
        Type operator()(const label& start, const label& end)
        {
            Type out = zero;
            for(label i = start; i<end; i++)
                out += ff[sort[i]];
            return out;
        }
    };

    template<class Type>
    void Foam::GAMGAgglomeration::restrictFaceField
    (
        gpuField<Type>& cf,
        const gpuField<Type>& ff,
        const label fineLevelIndex
    ) const
    {
        const labelgpuList& sort = faceRestrictSortAddressing_[fineLevelIndex];
        const labelgpuList& target = faceRestrictTargetAddressing_[fineLevelIndex];
        const labelgpuList& targetStart =
            faceRestrictTargetStartAddressing_[fineLevelIndex];

        cf = pTraits<Type>::zero;

        // Stencil and predicate arguments require transform_if
        thrust::transform_if
        (
            targetStart.begin(),targetStart.end()-1,
            targetStart.begin()+1,target.begin(),
            thrust::make_permutation_iterator(cf.begin(),target.begin()),
            GAMGAgglomerationRestrictFunctor<Type>
            (
                ff.data(),
                sort.data()
            ),
            nonNegativeGAMGFunctor<label>()
        );
    }
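The key difference between the two versions is the access pattern. The original loop scatters: several fine faces add into the same coarse face, which would cause write conflicts if the iterations ran in parallel. The GPU version gathers: each coarse face sums its own run of fine faces, so every output is written exactly once. The sketch below reproduces that idea on the CPU in plain C++. How RapidCFD actually builds its sort, target and targetStart arrays is not shown on the slides, so the construction here is an assumption, and `RestrictAddressing`, `buildAddressing` and `restrictSorted` are hypothetical names (`int` stands in for `label`, `double` for `Type`).

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical container for the three addressing arrays used by the GPU code.
struct RestrictAddressing
{
    std::vector<int> sort;        // fine-face indices, grouped by coarse face
    std::vector<int> target;      // coarse face of each group (may be negative)
    std::vector<int> targetStart; // group offsets into sort; size = groups + 1
};

// Assumed construction: stable-sort the fine faces by their coarse face,
// then record where each run of equal coarse faces begins.
RestrictAddressing buildAddressing(const std::vector<int>& fineToCoarse)
{
    RestrictAddressing a;
    a.sort.resize(fineToCoarse.size());
    std::iota(a.sort.begin(), a.sort.end(), 0);
    std::stable_sort(a.sort.begin(), a.sort.end(),
        [&](int l, int r) { return fineToCoarse[l] < fineToCoarse[r]; });

    for (std::size_t i = 0; i < a.sort.size(); ++i)
    {
        const int c = fineToCoarse[a.sort[i]];
        if (i == 0 || c != a.target.back())
        {
            a.target.push_back(c);
            a.targetStart.push_back(static_cast<int>(i));
        }
    }
    a.targetStart.push_back(static_cast<int>(a.sort.size()));
    return a;
}

// Segmented gather: one independent sum per coarse face, no write conflicts.
std::vector<double> restrictSorted(const RestrictAddressing& a,
                                   const std::vector<double>& ff,
                                   std::size_t nCoarse)
{
    std::vector<double> cf(nCoarse, 0.0);
    for (std::size_t g = 0; g < a.target.size(); ++g)
    {
        if (a.target[g] < 0) continue; // fine faces with no coarse target
        double out = 0.0;
        for (int i = a.targetStart[g]; i < a.targetStart[g + 1]; ++i)
            out += ff[a.sort[i]];
        cf[a.target[g]] = out;
    }
    return cf;
}
```

In this sketch the outer loop over groups is the part that a GPU would run with one thread per group, and the skip on negative targets plays the role that the predicate plays in the Thrust call. The addressing arrays depend only on the agglomeration, so they can be built once and reused for every restriction.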
Hardware
Performance Tests

                Type   Cores   Bandwidth [GB/s]   GFlops (DP)
Xeon E5-2670    CPU    8       51.2               166.4
Tesla K20X      GPU    2688    250                1312
Case Study
Performance
Tests
3.15M hexahedral cells
pisoFoam + LES
double precision
sim-flow.com