SMP clusters
Communication architecture
[Figure label: SWITCH]
Adapters
Shared devices
Communication bandwidth – IBM p690+
Communication bandwidth – Sun Fire 15K
Communication bandwidth – AlphaServer
Shared memory
Shared memory segments
How message passing works
Message passing performance
Communication between nodes
[Figure labels: User Memory, Communication buffers, Network adapter]
Communication within a node
[Figure label: Memory segment]
Memory bottlenecks
Streams data
Adapter bottlenecks
Switch bottlenecks
Bottlenecks
Optimizing for SMP clusters
Explicit shared memory segments
Optimizing communication pattern
Tree based communication
Optimized trees
Unoptimized trees
Optimized trees
MPI process mapping
Split communicators
Split communicators II
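The snippet under "Optimized AllReduce" relies on two communicators, `local` and `cross`, that the slides do not define. One plausible way to obtain them is to split the world communicator twice: once by node, once by position within the node. The sketch below only computes the colors and keys for such a split, so it runs without an MPI installation. The names `split_plan` and `plan_split` are hypothetical, and the block mapping (consecutive ranks share a node) is an assumption that depends on how the job was launched.

```c
#include <assert.h>

/* Hypothetical helper type: the color/key pairs one might hand to
 * MPI_Comm_split to build a per-node and a cross-node communicator. */
typedef struct {
    int local_color; /* identical for all ranks on the same node      */
    int local_key;   /* orders ranks inside the node (the local rank) */
    int cross_color; /* identical for ranks with the same local rank  */
    int cross_key;   /* orders ranks across nodes (the node index)    */
} split_plan;

/* Assumption: block mapping, i.e. ranks n*ppn .. (n+1)*ppn - 1
 * all run on node n. Other mappings need a different formula. */
static split_plan plan_split(int world_rank, int procs_per_node)
{
    assert(world_rank >= 0 && procs_per_node > 0);

    split_plan p;
    p.local_color = world_rank / procs_per_node; /* node index       */
    p.local_key   = world_rank % procs_per_node; /* rank within node */
    p.cross_color = p.local_key;   /* group equal local ranks together */
    p.cross_key   = p.local_color; /* order that group by node index   */
    return p;
}
```

In an MPI program the plan would typically feed two calls of the form `MPI_Comm_split(MPI_COMM_WORLD, p.local_color, p.local_key, &local)` and `MPI_Comm_split(MPI_COMM_WORLD, p.cross_color, p.cross_key, &cross)`. Every local rank then gets its own `cross` communicator, and only the one for local rank 0 is used by the optimized all-reduce.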
Optimized AllReduce
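/* Phase 1: reduce within each node onto local rank 0. */
MPI_Reduce(sendbuf, scratch, count, dt, op, 0, local);
/* Phase 2: only the node leaders combine partial results across nodes. */
if (local_rank == 0)
    MPI_Allreduce(scratch, recvbuf, count, dt, op, cross);
/* Phase 3: each leader broadcasts the final result inside its node. */
MPI_Bcast(recvbuf, count, dt, 0, local);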
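To make the three phases concrete, the following sketch simulates them for a sum over an in-memory array, with no MPI involved. It is an illustration under stated assumptions, not the presenter's implementation: the function name `hier_allreduce_sum` is hypothetical, ranks are assumed to be block-mapped onto nodes, and the rank count is assumed to be a multiple of the processes per node.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated hierarchical all-reduce (sum).
 * values[r] is the contribution of rank r; result[r] receives the
 * value rank r would hold afterwards. Assumes block mapping and
 * nranks divisible by ppn (processes per node). */
static void hier_allreduce_sum(const double *values, double *result,
                               size_t nranks, size_t ppn)
{
    assert(ppn > 0 && nranks % ppn == 0);
    size_t nnodes = nranks / ppn;

    /* Phase 1: each node reduces onto its leader (local rank 0). */
    for (size_t node = 0; node < nnodes; node++) {
        double scratch = 0.0;
        for (size_t l = 0; l < ppn; l++)
            scratch += values[node * ppn + l];
        result[node * ppn] = scratch;
    }

    /* Phase 2: the leaders combine their partial sums across nodes. */
    double total = 0.0;
    for (size_t node = 0; node < nnodes; node++)
        total += result[node * ppn];

    /* Phase 3: every leader broadcasts the total inside its node. */
    for (size_t r = 0; r < nranks; r++)
        result[r] = total;
}
```

With six ranks on three two-way nodes and contributions 1 through 6, the leaders hold 3, 7 and 11 after phase 1, and every rank ends up with 21. Only phase 2 crosses node boundaries, which is the point of the optimization: traffic through the adapters and the switch is limited to one participant per node.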
Optimizing AlltoAll
Optimized AlltoAllv
Remove order constraints
Order constraint example
Conclusions
References