Approach to Non-uniform Cluster Computing
Vijay Saraswat
IBM Research
Overview
Clustered Computing
Big picture
places, atomic, async, finish, clocks, arrays
Guarantees
Challenges
Acknowledgements
X10 Tools: Julian Dolby, Steve Fink, Robert Fuhrer, Matthias Hauswirth, Peter Sweeney, Frank Tip, Mandana Vaziri
University partners: MIT (StreamIt), Purdue University (X10), UC Berkeley (StreamBit), U. Delaware (Atomic sections), U. Illinois (Fortran plug-in), Vanderbilt University (Productivity metrics), DePaul U (Semantics)
Philippe Charles, Chris Donawa (IBM Toronto), Kemal Ebcioglu, Christian Grothoff (Purdue), Allan Kielstra (IBM Toronto), Maged Michael, Christoph von Praun, Vivek Sarkar
[Figure: levels of parallelism in a clustered system. Clusters (scale-out) of SMP nodes; multiple cores on a chip, coprocessors (SPUs), SMTs; SIMD and ILP within a core; PEs with L1 caches, L2 caches, L3 cache, memory.]
3) Scalability wall: Software will need to deliver ~10^5-way parallelism to utilize peta-scale parallel systems.
[Figure: the cluster memory hierarchy (processor clusters, PEs with L1 caches, L2 caches, L3 cache, memory) shown next to the stages of parallel software development: Requirements, Input Data, Written Specification, Algorithm Development, Parallel Specification, Source Code, Development of Parallel Source Code (design, code, test, port, scale, optimize), Production Runs of Parallel Code, Maintenance and Porting of Parallel Code.]
[Figure: X10 programming environment. Development toolkits for X10, Java, C, Fortran, ...; X10 components on the X10 runtime, connected through a fast extern interface to Java components (Java runtime), Fortran components (Fortran runtime), and C/C++ components (C/C++ runtime). Labels: Productivity, Scalability.]
Place
Granularity of a place can range from a single register file to an entire SMP system.
[Figure: several places, each with a place-local heap and activities with activity-local storage (heap, stack, control). Places exchange inbound and outbound activities and inbound and outbound activity replies. A separate area holds Immutable Data.]
MPI (P > 1)
July 23, 2003
Formalized in Saraswat, Jagadeesan, Concurrent Clustered Programming.
async
async (P) S
cf. Cilk's spawn
finish
finish S
cf. Cilk's sync
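The async/finish pairing can be approximated in plain Java. The sketch below is not X10 and not how the X10 runtime works; it assumes a thread pool standing in for a single place, and the class and method names (AsyncFinishSketch, sumOfSquares) are made up for illustration. Unlike X10's finish, it waits only for the activities it spawned directly, not for activities those might spawn in turn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class AsyncFinishSketch {
    /** Roughly: finish { for (k in 1..n) async { total += k*k; } } */
    static int sumOfSquares(int n) {
        // Assumption: a small thread pool stands in for one place.
        ExecutorService place = Executors.newFixedThreadPool(4);
        AtomicInteger total = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> activities = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int k = i;
                // "async S": start S and continue without waiting for it.
                activities.add(CompletableFuture.runAsync(() -> total.addAndGet(k * k), place));
            }
            // "finish": block until every activity spawned above has terminated.
            CompletableFuture.allOf(activities.toArray(new CompletableFuture[0])).join();
        } finally {
            place.shutdown();
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // prints 385
    }
}
```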
atomic
Conceptually executed in a single step, while other activities are suspended.
An atomic block may not include:
Blocking operations
Accesses to data at remote places
Creation of activities at remote places
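One simple way to get the "single step" behaviour in Java is a single lock per place. This is a sketch under that assumption only; the talk does not say how atomic is implemented, and the names below (AtomicSketch, transfer, total) are hypothetical. The bodies are kept non-blocking and local, matching the restrictions listed above.

```java
import java.util.concurrent.locks.ReentrantLock;

class AtomicSketch {
    // Assumption: one lock per place makes every guarded block appear indivisible.
    private final ReentrantLock placeLock = new ReentrantLock();
    private int from = 100;
    private int to = 0;

    /** Roughly: atomic { from -= amount; to += amount; } */
    void transfer(int amount) {
        placeLock.lock();
        try {
            from -= amount;
            to += amount;
        } finally {
            placeLock.unlock();
        }
    }

    /** Roughly: atomic { return from + to; } */
    int total() {
        placeLock.lock();
        try {
            return from + to;
        } finally {
            placeLock.unlock();
        }
    }

    /** Runs four concurrent activities; no observer can see a half-done transfer. */
    static int demo() {
        AtomicSketch s = new AtomicSketch();
        Thread[] activities = new Thread[4];
        for (int i = 0; i < activities.length; i++) {
            activities[i] = new Thread(() -> {
                for (int j = 0; j < 10; j++) s.transfer(1);
            });
            activities[i].start();
        }
        try {
            for (Thread t : activities) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return s.total();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 100
    }
}
```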
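when
class OneBuffer {
  nullable Object datum = null;
  boolean filled = false;
  // Blocks until the buffer is empty, then fills it in one atomic step.
  public void send(Object v) {
    when ( !filled ) {
      this.datum = v;
      this.filled = true;
    }
  }
  // Blocks until the buffer is full, then empties it in one atomic step.
  public Object receive() {
    when ( filled ) {
      Object v = datum;
      datum = null;
      filled = false;
      return v;
    }
  }
}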
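For readers more familiar with Java monitors, the OneBuffer above can be approximated with synchronized methods and wait/notifyAll. This is a sketch of the same behaviour, not the translation the X10 compiler produces; the class name OneBufferSketch and the demo method are assumptions made for illustration.

```java
class OneBufferSketch {
    private Object datum = null;
    private boolean filled = false;

    // Waits on this object's monitor; the caller must already hold it.
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Roughly: when (!filled) { datum = v; filled = true; } */
    public synchronized void send(Object v) {
        while (filled) await();   // re-check the guard after every wake-up
        datum = v;
        filled = true;
        notifyAll();              // a blocked receive() may now proceed
    }

    /** Roughly: when (filled) { v = datum; datum = null; filled = false; return v; } */
    public synchronized Object receive() {
        while (!filled) await();
        Object v = datum;
        datum = null;
        filled = false;
        notifyAll();              // a blocked send() may now proceed
        return v;
    }

    /** One producer sends 1..5; the caller receives five values and sums them. */
    static int demo() {
        OneBufferSketch buf = new OneBufferSketch();
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 5; i++) buf.send(i);
        });
        producer.start();
        int sum = 0;
        for (int i = 0; i < 5; i++) sum += (Integer) buf.receive();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 15
    }
}
```

The while loops matter: a woken thread must re-test its guard, which is what when does implicitly.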
regions, distributions
Region
a (multi-dimensional) set of indices
region R = 0:100;
region R1 = [0:100, 0:200];
region RInner = [1:99, 1:199];
Distribution
a mapping from indices to places
// a local distribution
distribution D1 = R -> here;
// a blocked distribution
distribution D = block(R);
// union of two distributions
distribution D = (0:1) -> P0 || (2:N) -> P1;
distribution DBoundary = D - RInner;
Based on ZPL.
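To make the idea of a blocked distribution concrete, here is a minimal Java sketch that maps each index of a one-dimensional region to a place number. The ceiling-division block size is an assumption; the talk does not specify how block(R) splits a region whose size is not a multiple of the number of places. The names BlockDistributionSketch and placeOf are hypothetical.

```java
class BlockDistributionSketch {
    /**
     * Maps index (0 <= index < size) to one of numPlaces places,
     * giving each place one contiguous block of indices.
     */
    static int placeOf(int index, int size, int numPlaces) {
        int blockSize = (size + numPlaces - 1) / numPlaces; // ceiling division (assumption)
        return index / blockSize;
    }

    public static void main(String[] args) {
        int size = 101;      // region 0:100 has 101 indices
        int numPlaces = 4;   // hypothetical number of places
        for (int i : new int[] {0, 25, 26, 100}) {
            System.out.println(i + " -> P" + placeOf(i, size, numPlaces));
        }
    }
}
```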
arrays
Arrays may be
Multidimensional
Distributed
Value types
Initialized in parallel:
int [D] A = new int[D] (point [i,j]) { return N*i+j; };
Array section
A[RInner]
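The parallel initializer can be mimicked on a single machine with Arrays.parallelSetAll. The sketch below assumes a row-major flattening of the two-dimensional point [i,j] and ignores distribution across places entirely; ParallelInitSketch and init are made-up names.

```java
import java.util.Arrays;

class ParallelInitSketch {
    /** Builds a rows x cols array, flattened row-major, where element [i,j] = n*i + j. */
    static int[] init(int rows, int cols, int n) {
        int[] a = new int[rows * cols];
        // Each element depends only on its own point, so all can be computed in parallel.
        Arrays.parallelSetAll(a, idx -> {
            int i = idx / cols;
            int j = idx % cols;
            return n * i + j;
        });
        return a;
    }

    public static void main(String[] args) {
        int[] a = init(3, 4, 10);
        System.out.println(a[2 * 4 + 3]); // element [2,3], prints 23
    }
}
```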
ateach, foreach
ateach (point p : A) S
Instance p of statement S is executed at the place where A[p] is located.
foreach (point p : R) S
Creates |R| async statements in parallel at the current place.
Termination of all activities can be ensured using finish.
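A rough Java picture of ateach wrapped in finish: each place is modelled as a single-threaded executor, every point's statement is submitted to the executor that owns that point, and the caller waits for all of them. The cyclic index-to-place mapping and the names AteachSketch and run are assumptions for illustration, not part of X10.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AteachSketch {
    /** Roughly: finish ateach (point p : A) { owner[p] = here; } */
    static int[] run(int n, int numPlaces) {
        // Assumption: one single-threaded executor stands in for each place.
        ExecutorService[] places = new ExecutorService[numPlaces];
        for (int p = 0; p < numPlaces; p++) places[p] = Executors.newSingleThreadExecutor();
        int[] owner = new int[n];
        List<Future<?>> activities = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                final int point = i;
                final int place = point % numPlaces; // hypothetical cyclic distribution
                // The statement for this point runs on the executor of the owning place.
                activities.add(places[place].submit(() -> { owner[point] = place; }));
            }
            // "finish": wait until every activity has terminated.
            for (Future<?> f : activities) f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            for (ExecutorService p : places) p.shutdown();
        }
        return owner;
    }

    public static void main(String[] args) {
        int[] owner = run(6, 3);
        System.out.println(owner[4]); // point 4 ran at place 1
    }
}
```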
clocks
Operations
async clocked(c1, ..., cn) S
next;
c.drop();
Static Semantics
Dynamic Semantics
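Java's java.util.concurrent.Phaser gives a feel for how clock operations behave. The mapping used here is an approximation chosen for this sketch, not a statement about X10's semantics: register() plays the role of async clocked(c), arriveAndAwaitAdvance() plays next, and arriveAndDeregister() plays c.drop(). ClockSketch and demo are hypothetical names.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicInteger;

class ClockSketch {
    /** Returns the smallest phase-1 count any worker observed after its "next". */
    static int demo(int workers) {
        Phaser clock = new Phaser(1);              // creator is registered, like "new clock()"
        AtomicInteger phase1Done = new AtomicInteger();
        int[] seenAfterNext = new int[workers];
        Thread[] activities = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            clock.register();                      // "async clocked(c)": register before start
            activities[w] = new Thread(() -> {
                phase1Done.incrementAndGet();      // phase-1 work
                clock.arriveAndAwaitAdvance();     // "next;"
                seenAfterNext[id] = phase1Done.get();
                clock.arriveAndDeregister();       // "c.drop();"
            });
            activities[w].start();
        }
        clock.arriveAndAwaitAdvance();             // the parent's "next;"
        clock.arriveAndDeregister();               // the parent drops the clock
        int min = workers;
        try {
            for (Thread t : activities) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        for (int seen : seenAfterNext) min = Math.min(min, seen);
        return min;
    }

    public static void main(String[] args) {
        System.out.println(demo(4)); // prints 4: all phase-1 work is done before anyone passes "next"
    }
}
```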
Example: SpecJBB
finish async {
  clock c = new clock();
  Company company = createCompany(...);
  for (int w : 0:wh_num) for (int t : 0:term_num)
    async clocked(c) { // a client
      initialize;
      next; //1.
      while (company.mode != STOP) {
        select a transaction;
        think;
        process the transaction;
        if (company.mode == RECORDING)
          record data;
        if (company.mode == RAMP_DOWN) {
          c.resume(); //2.
        }
      }
      gather global data;
    } // a client
  // master activity
  next; //1.
  company.mode = RAMP_UP;
  sleep rampuptime;
  company.mode = RECORDING;
  sleep recordingtime;
  company.mode = RAMP_DOWN;
  next; //2.
  // All clients in RAMP_DOWN
  company.mode = STOP;
} // finish
// Simulation completed.
print results.
Based on Middleweight Java (MJ)
Configuration is a tree of located processes
Basic theorems
Equational laws
Clock quiescence is stable.
Monotonicity of places.
Deadlock freedom (for language without when).
Type Safety
Memory Safety
Current Status
Timeline
09/03  PERCS Kickoff
02/04  X10 Kickoff
07/04  X10 0.32 Spec Draft
02/05  X10 Prototype #1
07/05  X10 Productivity Study
12/05  X10 Prototype #2
06/06  Open Source Release?
[Figure: translator pipeline. X10 source, X10 Grammar, Parser, AST, Analysis passes, Annotated AST, Code Templates, Code emitter, Target Java, JVM, X10 Multithreaded RTS, Native code, PEM Events, Program output.]
Structure
Translator based on Polyglot (Java compiler framework).
X10 extensions are modular.
Uses Jikes parser generator.
Code metrics (classes+interfaces/LOC)
Parser: ~45/14K
Translator: ~112/9K
RTS: ~190/10K
Polyglot base: ~517/80K
Approx. 180 test cases.
Limitations
Future Work: Implementation
Type checking/inference
  Clocked types
  Place-aware types
Load-balancing
Message aggregation
Efficient implementation of scan/reduce
Efficient invocation of components in foreign languages
  C, Fortran
Continuous optimization
Activity aggregation
Consistency management
Design/Theory
Atomic blocks
Structural study of concurrency and distribution
Tools
Clocked types
Hierarchical places
Weak memory model
Persistence/Fault tolerance
Database integration
Refactoring language.
Applications
Backup material
Type system
Value classes
May only have final fields.
May only be subclassed by value classes.
Instances of value classes can be copied freely between places.
nullable is a type constructor
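The closest everyday Java analogue of a value class is an immutable final class. The sketch below illustrates why such instances are safe to copy: with only final fields, a copy is indistinguishable from the original. ComplexValue is a made-up example; Java has no notion of places, so the "copied freely between places" part is only suggested, not demonstrated.

```java
final class ComplexValue {
    // Only final fields: the state can never change after construction.
    private final double re;
    private final double im;

    ComplexValue(double re, double im) {
        this.re = re;
        this.im = im;
    }

    /** Operations return new values instead of mutating this one. */
    ComplexValue plus(ComplexValue other) {
        return new ComplexValue(re + other.re, im + other.im);
    }

    /** Equality is by content, so a copy is interchangeable with the original. */
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ComplexValue)) return false;
        ComplexValue c = (ComplexValue) o;
        return Double.compare(re, c.re) == 0 && Double.compare(im, c.im) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }

    public static void main(String[] args) {
        ComplexValue sum = new ComplexValue(1, 2).plus(new ComplexValue(3, 4));
        System.out.println(sum.equals(new ComplexValue(4, 6))); // prints true
    }
}
```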
Example: Latch
public interface future {
  boolean forced();
  Object force();
}
public class Latch implements future {
  protected boolean forced = false;
  protected nullable boxed result = null;
  protected nullable exception z = null;
  public atomic boolean setValue( nullable Object val,
                                  nullable exception z ) {
    if ( forced ) return false;
    // these assignments happen only once.
    this.result.val = val;
    this.z = z;
    this.forced = true;
    return true;
  }
  public atomic boolean forced() {
    return forced;
  }
  // Blocks until a value or exception has been set.
  public Object force() {
    when ( forced ) {
      if (z != null) throw z;
      return result;
    }
  }
}
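The same write-once latch can be sketched with a Java monitor. This is an illustration only, with hypothetical names (LatchSketch, demo); it stores the result directly rather than in a boxed wrapper and restricts the stored exception to RuntimeException so that force() needs no throws clause.

```java
class LatchSketch {
    private boolean forced = false;
    private Object result = null;
    private RuntimeException z = null;

    /** Roughly the atomic setValue: only the first call takes effect. */
    public synchronized boolean setValue(Object val, RuntimeException z) {
        if (forced) return false;
        this.result = val;
        this.z = z;
        this.forced = true;
        notifyAll();               // wake every activity blocked in force()
        return true;
    }

    public synchronized boolean forced() {
        return forced;
    }

    /** Roughly: when (forced) { if (z != null) throw z; return result; } */
    public synchronized Object force() {
        while (!forced) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        if (z != null) throw z;
        return result;
    }

    /** A second activity sets the value; the caller blocks in force() until it does. */
    static Object demo() {
        LatchSketch latch = new LatchSketch();
        new Thread(() -> latch.setValue("done", null)).start();
        Object v = latch.force();
        if (latch.setValue("again", null)) throw new IllegalStateException("latch set twice");
        return v;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints done
    }
}
```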