e-mail: mueller@informatik.hu-berlin.de
Abstract

This paper contributes a study on integrating communication middleware into a multi-threading environment. It addresses problems rooted in blocking and provides solutions to build asynchronous communication mechanisms on top of synchronous ones. The paper details different strategies for handling multi-threaded concurrency in both an event-signalling framework and a polling fashion, depending on the underlying message layer. It motivates the advantages of a single reception point for prioritized multi-threading and discusses the potential for zero-copy overhead on message reception. Implementation details are given for a case study with DSM-Threads, a distributed execution environment with per-node multi-threading, adapted to use Madeleine as an abstraction layer for communication and BIP on the lower level to utilize Myrinet. The study underlines strengths and weaknesses of the layered components and gives general guidelines for approaching similar efforts of communication abstraction.
1 Introduction

In recent years, multi-threading has received increased attention as a means to hide latencies and exploit shared-memory multi-processors (SMPs), which led to standardizations such as POSIX Threads (Pthreads) [10] and the incorporation of threading into main-stream languages such as Java. At the same time, network communication has seen remarkable improvements resulting in higher-bandwidth and lower-latency connections, e.g., via Myrinet, SCI and Gigabit Ethernet. These trends have changed the field of supercomputing. Applications formerly dedicated to supercomputing have been restructured to execute in unison within a cluster of dedicated workstations. The advent of cluster computing emphasizes the need to combine the trends of multi-threading and high-efficiency communication. In particular, processor speeds are still advancing faster than improvements in the networking area. Hence, communication is likely to remain a bottleneck in distributed computing. Multi-threading may hide latencies imposed by communication. In addition, asynchronous communication allows computation to progress without waiting for message reception.
This paper addresses these trends by describing different approaches to combine multi-threading, distributed execution and message-passing communication within existing frameworks. A case study details the efforts of adapting a distributed shared memory framework, DSM-Threads [7], to adhere to the communication abstraction provided by Madeleine [1] and BIP [9] for Myrinet. Several issues are raised, ranging from conflicts in control for distributed execution over different message reception paradigms to methods of integrating thread-safe message passing for asynchronous communication. Calls are considered thread-safe (or MT-safe: they block the calling thread, not the process).
DSM-Threads is a runtime system to support distributed threads with a distributed shared virtual memory. Applications that adhere to the Pthreads standard rely on shared memory and thus experience their best performance on SMPs. The programming model of these applications, however, contains inherent parallelism that is not only limited to SMPs but can readily be exploited on a distributed system with shared virtual memory, such as DSM-Threads. Hence, DSM-Threads supports an API which strongly resembles that of POSIX Threads. It supports scalability through a variety of synchronization and memory coherence protocols that strictly follow a decentralized approach, as opposed to a client-server paradigm where a server may present a bottleneck. Communication relies on message passing with point-to-point connections. It tolerates asynchronous communication on sends and allows out-of-order message passing. In addition, nodes in the distributed environment may themselves be multi-threaded.
Madeleine is a framework for message passing that supports TCP/IP, BIP, VIA and SCI. Besides its task to abstract from actual network interfaces, it also provides distributed execution and supports multi-threading over a non-standard thread package, Marcel. BIP is a fast interface to the Myrinet network architecture that supports various methods for communication.
2 Overview

This section gives an overview of the different software components, starting with drivers for networking on the lowest level, through various levels of middleware for communication abstraction and concurrent execution, up to a distributed runtime system. Figure 1 depicts the components for our sample study, but the approach can be generalized to arbitrary systems with the same design goals where different middleware components may be used.

Figure 1. Layered components: DSM-Threads on top of Madeleine over BIP/TCP, with Pthreads serving as a shallow binding for Marcel.

On the top level, DSM-Threads provides a distributed runtime system. DSM-Threads also happens to support distributed shared memory, but this is not a requirement for this study, i.e., we mostly abstract from the DSM features in the following. The distributed runtime requires an underlying environment for per-node multi-threading (Pthreads) and a message-passing framework. Since message passing has to be mapped onto a variety of network architectures, which are subject to constant changes and improvements in bandwidth and latency, it is imperative to choose a portable and extensible approach for supporting these network architectures. In this study, Madeleine was chosen as an intermediate layer since it supports a variety of network architectures and standards, such as Ethernet via TCP, Myrinet via BIP, SCI and VIA. Our study mostly emphasizes the usage of BIP and TCP in this context. Madeleine also provides a common interface to the upper layers, which enhances their portability.
Modifications within DSM-Threads included the handling of communication, which was mapped onto the Madeleine API, both on the sending and the receiving side. Table 1 depicts the mapping from a TCP-oriented interface to the corresponding Madeleine routines. Madeleine also supports incremental sends and receives of messages, which can be used for zero-copy overhead by directly specifying the origin and destination within memory for message sends and receptions, respectively. This aspect will be detailed later on. Madeleine is thread safe in the sense that access ...
Table 1. Mapping from a TCP-oriented interface to Madeleine routines.

TCP/IP-based     Madeleine
accept           mad_receive
read             mad_unpack_byte
close (read)     mad_recvbuf_receive
open             mad_sendbuf_init
write            mad_pack_byte
close (write)    mad_sendbuf_send
Figure 2. Distributed execution models of DSM-Threads: master and slave processes across nodes 1-3, each node running an initial thread T0 between dsm_init and dsm_exit together with a communication server (CS), and function threads Tfct started via dsm_thread_create/pthread_create.
... lower-level components. Hence, it was decided to adjust DSM-Threads to adhere to the distributed execution model of Madeleine and BIP. Finally, the number of nodes for distributed execution is statically regulated, i.e., remote processes are created at initiation time. No additional nodes can be added later on.
Figure 2(b) depicts the modified DSM-Threads model for distributed execution. It still provides the same API to the user, i.e., the user is given the option to distribute remote execution explicitly, or the runtime handles this task implicitly. However, specifications of target nodes are only treated as hints. At initiation time, the DSM runtime system instructs Madeleine (and thereby BIP) to create processes on a static set of nodes. During runtime, a user request for remote execution simply results in communication with a target node to create a new thread on this node within the existing process. Hence, it is no longer possible to spawn multiple processes per node. Instead, such user requests result in clustering of a set of threads within the dedicated single process on this node. There are no impacts on the semantics of the execution model since DSM-Threads assumes a shared-memory programming paradigm. But there may be lost opportunities to exploit SMPs on the OS level when a Pthreads implementation does not already support SMPs.
4 Communication

This section contrasts different approaches to realize message-based communication with regard to several constraints. The latency and bandwidth limitations of today's systems still pose a problem for contemporary frameworks that support distributed execution. Although considerable advances both in latency and throughput have been made, the network generally remains the bottleneck in distributed environments. Hiding latencies by multi-threading may help but cannot eliminate the problem. Another option to improve the situation is to improve the responsiveness of a system, i.e., to ensure that when an important message is received, it will be handled right away.
4.1 Direct and Indirect Communication
Figure: (a) direct reception of messages on distinct ports/channels, where threads T1 and T2 in processes P1 and P2 each receive on their own points of communication (POC); (b) indirect reception, where a communication server (CS) per process receives all messages into a temporary buffer for a page, either through a registered handler (registerHandler(ReceiveMessage), receive(buffer)) or by polling (test != 0), and a worker transfers the data to the actual page in memory.
Table: Mapping of Marcel routines onto Pthreads (shallow binding).

Marcel Routines          Pthreads
lock_task                pthread_mutex_lock
unlock_task              pthread_mutex_unlock
marcel_key_create        pthread_key_create
marcel_setspecific       pthread_setspecific
marcel_getspecific       pthread_getspecific
marcel_mutex_init        pthread_mutex_init
marcel_mutex_lock        pthread_mutex_lock
marcel_mutex_unlock      pthread_mutex_unlock
marcel_givehandback      sched_yield
tmalloc                  adopted from Marcel
tfree                    adopted from Marcel
marcel_select            select
marcel_sem_init          sem_init (uses lock/cond)
marcel_sem_P             sem_p (uses lock/cond)
marcel_sem_V             sem_v (uses lock/cond)

Data Types
marcel_mutex_t           pthread_mutex_t
marcel_mutexattr_t       pthread_mutexattr_t
marcel_key_t             pthread_key_t
Figure: REQUEST/ACK/DATA hand-shake between sender and receiver, variants (a) and (b): the sender thread T issues a REQUEST and waits on a condition variable (T: wait(c)); the communication servers (CS) on both sides exchange the ACK, the sender's CS signals the condition (CS: signal(c)), T continues, and the DATA transfer proceeds.
... but the server has to remember pending partial operations. Demultiplexing of messages to the correct thread is realized through separate condition variables for each thread. Hence, signaling a specific condition only awakes one selected thread.
The second extension addresses the hand-shaking mechanism for credit handling to ensure that the limited number of buffers within BIP does not overflow. In BIP, each receiver possesses a limited number of buffers per sender for storing short messages. In the beginning, a sender possesses all credits for a receiver, as depicted in Figure 8(a) with Bsnd and Brcv credits for sends to B and receives from B, respectively. The latter denotes the number of already used credits. When node A sends a message to B, the credits on A for B will be decremented, as seen in Figure 8(b-d). A send without proper credits simply results in blocking since it must be assumed that the receiver does not have an empty slot for short messages. In addition, used credits will be returned as piggybacks on every message, as seen in Figure 8(c). This informs the other node that a slot has become available for another message.
Sending messages in DSM-Threads was modified to the extent that (in the absence of credits) messages are stored in a message queue. This effectively results in asynchronous sends for threads on the DSM level. The message queue is later handled by workers when new credits are received. The worker is informed by the communication server about the arrival of new credits. Depending on the number of received credits, a subset of the queued messages may then be sent. Notice that DSM-Threads uses asynchronous communication protocols in general for synchronization and consistency handling on the DSM level, which facilitates credit hand-shaking in the described manner.
6 Related Work

... [5]. They investigate performance issues and mechanisms for matching polling rates and message arrival rates. They conclude that signals are more appropriate for coarse parallelism while polling excels under finer-granular parallelism. They also suggest combining both approaches depending on the current state of the scheduler, i.e., they suggest using polling when no threads are ready. Our work differs in that we aim to minimize modifications to existing components. Neither the thread scheduling (Pthreads), which may even be inside the operating system, nor the communication layer (BIP) was modified in our work.

Maquelin et al. also suggest combining polling and interrupt handling for communication, but on a hardware level [6]. Interrupts are only generated if the network interface realizes that polling may not result in effective responsiveness. Results show that this approach outperforms traditional polling in terms of responsiveness while achieving comparable performance overhead. At the same time, this overhead is lower than that of traditional interrupt handling. In contrast, our work was restricted to software approaches of dealing with communication issues.

Itzkovitz et al. discuss the merits of interrupts and polling on Windows NT for Millipede [4]. They suggest that the combination of polling, multi-threading and asynchronous communication may achieve neither the best responsiveness, nor the best utilization, nor the lowest number of context switches. They enhanced Fast Messages to implement signal notification. Our work focuses on portability and extensibility with regard to existing communication components. Hence, we did not enhance BIP by signal notification. Future work may include such an approach to assess whether the results from NT can be generalized to other environments.

7 Conclusion

We presented our experience on integrating communication middleware into a multi-threading environment ...
Figure 8. Credit Mechanism: send/receive credit counters (Asnd/Arcv on node A, Bsnd/Brcv on node B) in states (a)-(d); consumed credits travel back to the peer piggybacked on messages.
References

[1] Luc Bougé, Jean-François Méhaut, and Raymond Namyst. Madeleine: an efficient and portable communication interface for multithreaded environments. In Proc. 1998 Int. Conf. on Parallel Architectures and Compilation Techniques (PACT '98), pages 240-247, ENST, Paris, France, October 1998. IFIP WG 10.3 and IEEE.

[2] M. Eberl, W. Karl, M. Leberecht, and M. Schulz. Eine Software-Infrastruktur für Nachrichtenaustausch und gemeinsamen Speicher auf SCI-basierten PC-Clustern. In Cluster Computing Workshop, 1999.

[3] B. Herland and M. Eberl. A common design of PVM and MPI using the SCI interconnect. Final ...

[4] ...

[5] ...

[6] ...

[7] ...

... Conference on Parallel and Distributed Processing Techniques and Applications, pages 315-324, April 1998.

[10] Technical Committee on Operating Systems and Application Environments of the IEEE. Portable Operating System Interface (POSIX). ANSI/IEEE Std 1003.1, 1995 Edition, including 1003.1c: Amendment 2: Threads Extension [C Language], 1996.