
A New Parallel Programming Paradigm for Emerging Architectures

Franck Butelle and Camille Coti
LIPN, CNRS UMR 7030, Université Paris 13, F-93430 Villetaneuse (France)
{franck.butelle, camille.coti}@lipn.univ-paris13.fr

Context

The latest generation of parallel machines features many cores on each node. This approach reduces the space taken by the machines, but also their electricity consumption, their air-conditioning needs and, more generally, their cost. The latest machines and those currently being built feature 32 cores per node (e.g., IBM's Blue Waters¹). GPGPUs² feature a large number of cores that share a common memory (1,024 cores, and up to 2,048 on the latest cards). Parallel machines such as those produced by SGI in the Altix series³ interconnect nodes through a low-latency, high-throughput network to implement a virtual, global shared memory between the nodes. Microprocessors designed more specifically for embedded systems, such as "cluster-on-chip" systems, currently feature up to 160 cores on a single chip⁴.

The architecture of such systems follows the shared-memory model [4, 7]. Each computation unit (also called a processor in the literature) can access two types of memory areas: a private memory area, which only this computation unit can access, and a public memory area, which is shared between all the computation units. All the computation units can read and write in the shared memory. Parallel applications executed on such machines can use this shared memory to communicate: a computation unit can write data in a segment of shared memory, and these data can be read later by one or several other computation units. In particular, ARM MPCore processors feature a technology that links the cores together and allows a core to access the cache memory of another core directly, without any intermediate copy in its own cache.

Following this hardware trend, programming languages and techniques evolve in order to make efficient use of these emerging architectures. Introduced by Cray and adopted by other vendors (SGI, Quadrics), the SHMEM language is currently going through a standardization and unification process with the OpenSHMEM standard⁵. The interest shown by the high-performance computing community in this standardization effort is the sign of strong expectations for a paradigm specific to these architectures.
1. http://www.ncsa.illinois.edu/BlueWaters/
2. General-Purpose Processing on Graphics Processing Units
3. http://www.sgi.com/products/servers/altix/memory.html
4. http://techresearch.intel.com/ProjectDetails.aspx?Id=151
5. http://computing.ornl.gov/SC10/documents/Kuehn_OpenSHMEM_SC10.pdf
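To make the one-sided, symmetric-memory model described above concrete, here is a minimal sketch in C against the OpenSHMEM interface. The two-PE scenario and the variable names are purely illustrative, and the initialization calls follow recent OpenSHMEM naming (older Cray/SGI SHMEM implementations spell them start_pes()/_my_pe()). Processing element (PE) 0 writes directly into PE 1's copy of a symmetric variable without PE 1 posting any receive:

#include <stdio.h>
#include <shmem.h>

/* Symmetric variable: every PE owns one copy, remotely accessible by design. */
int value = -1;

int main(void)
{
    shmem_init();
    int me  = shmem_my_pe();
    int num = shmem_n_pes();

    if (num >= 2 && me == 0) {
        int payload = 42;
        /* One-sided write: deposit `payload` into PE 1's copy of `value`,
         * without PE 1 taking any part in the communication. */
        shmem_int_put(&value, &payload, 1, 1);
    }

    /* Synchronize so that PE 1 does not read before the put has completed. */
    shmem_barrier_all();

    if (me == 1)
        printf("PE %d sees value = %d\n", me, value);

    shmem_finalize();
    return 0;
}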

Open problems and perspectives

SHMEM is a one-sided communication library that uses shared memory. The specificities of this model for distributed systems offer a new outlook on problems in distributed algorithms. One-sided communications introduce a peer-to-peer communication model which is totally different from the traditional models found in the literature, in which a communication is performed by the conjugate actions of a sender and a receiver. With one-sided communications, each computation unit can read or write data in the shared memory area without notifying the other computation units. We studied a coherency problem on this shared-memory model in [2]. In this model, the programming language must provide some tools to the programmer and enforce some properties. The goal of this thesis is to propose algorithmic solutions that provide these services and enforce these properties (e.g., atomicity of some operations, distributed locks) in the context of shared memory with one-sided communications for high-performance computing.
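As a hedged illustration of the two properties just mentioned, the sketch below expresses an atomic operation and a distributed lock with standard OpenSHMEM calls. The symmetric variables `counter` and `lock` are illustrative only, and shmem_int_fadd is the classic name of the fetch-and-add routine (renamed shmem_int_atomic_fetch_add in recent revisions of the standard):

#include <stdio.h>
#include <shmem.h>

int  counter = 0;   /* symmetric counter, incremented atomically by every PE */
long lock    = 0;   /* symmetric lock word, must start at 0                  */

int main(void)
{
    shmem_init();
    int me = shmem_my_pe();

    /* Atomic fetch-and-add on PE 0's copy of `counter`: no PE can observe
     * a partial update, whatever the interleaving of the accesses. */
    int old = shmem_int_fadd(&counter, 1, 0);

    /* Distributed lock: at most one PE at a time is in the critical section. */
    shmem_set_lock(&lock);
    printf("PE %d in the critical section (counter was %d)\n", me, old);
    shmem_clear_lock(&lock);

    shmem_barrier_all();
    if (me == 0)
        printf("final value of counter on PE 0: %d\n", counter);

    shmem_finalize();
    return 0;
}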

Expected work

The work to be conducted in this thesis is composed of three parts that complement one another. The first aspect will be the design of a system as defined by the OpenSHMEM specification. In particular, the student will design distributed algorithms under this model. For example, guaranteeing the atomicity of operations in a distributed system is not a trivial issue. Examining this problem in a specific model can lead to novel algorithms that are more efficient and exploit the specificities of this model for better performance. The second aspect of this work will consist in the realization of a prototype that implements all or part of the OpenSHMEM specification, in order to validate the algorithmic choices and to propose communication methods (for peer-to-peer and collective communications) that are specific to emerging architectures. The last aspect will be a qualitative analysis of the OpenSHMEM language, and in particular a comparison of its expressiveness with other parallel programming paradigms (such as MPI [5, 6] in a distributed-memory model, or UPC [3] in a distributed shared-memory model). As a follow-up to this thesis, this work may be extended to distributed architectures that use RDMA (Remote Direct Memory Access [1]) network interface cards.
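As a first hint of the expressiveness comparison mentioned above, here is a rough MPI-2 one-sided counterpart of the OpenSHMEM put sketch from the Context section (variable names are again illustrative). The target buffer has to be exposed explicitly through an MPI window and the accesses bracketed by fence synchronization, two steps that the SHMEM symmetric heap keeps implicit:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int value = -1;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    int me, num;
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &num);

    /* Expose `value` for remote access: an explicit step with no SHMEM
     * equivalent, since every symmetric object is remotely accessible. */
    MPI_Win_create(&value, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (num >= 2 && me == 0) {
        int payload = 42;
        /* One-sided write into rank 1's window. */
        MPI_Put(&payload, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);

    if (me == 1)
        printf("rank %d sees value = %d\n", me, value);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}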

Research supervision

This thesis will be supervised by Franck Butelle and Camille Coti.

Franck Butelle is an assistant professor holding the French qualification to supervise research ("Habilitation à Diriger des Recherches"). His work focuses on the design of parallel and distributed algorithms. In particular, he is the author of novel algorithms for spanning trees, with and without constraints, in a distributed computing model, as well as of parallel and distributed scheduling algorithms.

Camille Coti is an assistant professor. She works on parallel systems and run-time environments for parallel applications, and more particularly on the distributed-computing problems raised by the exploitation of distributed resources. Her contributions include fault tolerance, scalability, and connectivity. She also took part in the development of parallel programming libraries such as Open MPI, MPICH-V, and QCG-OMPI.

References
[1] High performance RDMA protocols in HPC. In Proceedings of the 13th European PVM/MPI Users' Group Meeting, Lecture Notes in Computer Science, Bonn, Germany, September 2006. Springer-Verlag.
[2] Franck Butelle and Camille Coti. A model for coherent distributed memory for race condition detection. In Proceedings of the 13th Workshop on Advances in Parallel and Distributed Computational Models (APDCM'11), Anchorage, AK, May 2011. To appear.
[3] UPC Consortium. UPC Language Specifications, v1.2. Technical Report LBNL-59208, Lawrence Berkeley National Laboratory, 2005.
[4] Shlomi Dolev. Self-Stabilization. MIT Press, March 2000.
[5] Message Passing Interface Forum. MPI: A message-passing interface standard. Technical Report UT-CS-94-230, Department of Computer Science, University of Tennessee, April 1994.
[6] Al Geist, William D. Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing L. Lusk, William Saphir, Anthony Skjellum, and Marc Snir. MPI-2: Extending the message-passing interface. In Luc Bougé, Pierre Fraigniaud, Anne Mignotte, and Yves Robert, editors, 1st European Conference on Parallel and Distributed Computing (Euro-Par'96), volume 1123 of Lecture Notes in Computer Science, pages 128–135. Springer, 1996.
[7] Gerard Tel. Introduction to Distributed Algorithms. Cambridge University Press, 1994.
