Version 0.11
Jonathan R. Stanton
jonathan@cnds.jhu.edu
1 Introduction to Spread
1.1 What is Spread?
1.2 Design Issues
1.2.1 Comparison with reliable IP-multicast
1.2.2 Flexibility of services
1.2.3 Modularity of Spread architecture
1.3 Spread Guarantees
1.3.1 Ordering
1.3.2 Reliability
1.4 Additional Information
3 Spread C API
3.1 Introduction
3.1.1 Short Buffer Handling
3.2 API Datatypes
3.3 SP Functions
3.3.1 SP_connect
3.3.2 SP_disconnect
3.3.3 SP_join
3.3.4 SP_leave
3.3.5 SP_multicast and family
3.3.6 SP_receive and SP_scat_receive
3.3.7 SP_equal_group_ids
3.4 Miscellaneous Functions
Introduction to Spread
2 CHAPTER 1. INTRODUCTION TO SPREAD
3. Membership of a Group.
4. Reliable messages to a Group.
5. Ordering of messages sent to a Group.
6. Failure detection of members of the Group.
7. A strong semantic model of how messages are handled when changes to the
Group membership occur.
It should be obvious that the name “Group Communications System” is very
appropriate, as the concept of a “Group” is the fundamental abstraction of the
system. Once you have that abstraction all the other services make sense: knowing
who is in the group, talking to the group, knowing when someone leaves the group,
agreeing on an ordering of events in the group.
Here are a few distinct example applications that exhibit how the group com-
munication model provides a useful abstraction for a wide variety of distributed
applications.
• Service and machine monitoring. A number of machines export their status
to groups of interested monitors. Whenever a failure occurs the monitors are
notified.
• Collaborative tools. Many different groups of participants each want to
share data and hold video and audio conferences.
• DSM (Distributed Shared Memory). Sending pages of memory to the machines
where they are needed using reliable multicast.
• Highly reliable services (such as air traffic control systems, stock exchanges,
military tracking and combat control systems). Services that involve com-
munication of information among numerous machines and people and have
high requirements for both availability and fault-tolerance.
• Replicated databases. A number of instances of a database exist in several
different locations. They must all be kept synchronized in such a way that a
client can query or update any of them and the results will be the same as if
only one copy existed.
best-effort reliability when sending multicast messages for small to medium sized
groups. The key difference is that most reliable IP-multicast protocols also aim
to solve that problem for very large groups, while Spread does not support
very large groups but does provide a stronger model of reliability and additional
services such as ordering.
A practical difference is that reliable IP-multicast usually relies on a wide
area IP-multicast network (such as the mbone, or ISP support for multicast rout-
ing) while Spread only relies on point-to-point unicast IP support, and uses IP-
multicast only as a performance optimization.
One subtle distinction between reliable IP-multicast and Spread’s Reliable
service is that Spread integrates a membership notification service into the stream
of messages. The membership notifications provide some knowledge of who ac-
tually received the reliable messages. The issue of membership is a key distinction
between the unicast, or point-to-point world of TCP/IP and multicast services. In
multicast it is often necessary to know “with whom” you are reliably communi-
cating since there is no obvious ’other party’ as in unicast.
existing group communication abstractions and services would have been much
higher and the performance payoff would not be enough to overcome that.
level of service for each message that it sends. Spread supports 5 different levels
of service. Table 1.1 shows the different types and what kind of ordering and
reliability guarantees they provide.
1.3.1 Ordering
The ordering guarantees defined by Spread are:
None: No ordering guarantee. Any other message also sent with ordering “None”
can arrive either before or after this one. Messages with stricter ordering
CAN depend on this message. For example, if a FIFO_MESS message Ma
follows a RELIABLE_MESS message Mb, then Ma cannot be delivered until
Mb has been delivered (but the reverse is not true).
Fifo by Sender: All messages sent by this sender [1] of at least Fifo ordering are
delivered in FIFO order. As mentioned above, a RELIABLE_MESS sent
after a Fifo message may be delivered before the Fifo message.
Causal (Lamport): All messages sent by all senders are delivered in an order
consistent with Lamport’s definition of “Causal” order. This order is consistent
with Fifo ordering.
Total Order (Consistent w/Causal): All messages sent by all senders are delivered
in the exact same order to all recipients. This order is also consistent
with Causal order. It is provided by making the partial order defined by
causal into a total order. The total order uses the id of the sender to break
ties.
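As a rough illustration of the tie-breaking rule, the sketch below sorts messages by a Lamport timestamp and falls back to the sender id when two timestamps are equal. This is the textbook way of turning a causal partial order into a total order; it is not taken from the Spread sources. The demo_msg struct, its field names, and the integer sender id are assumptions made for this example only.

```c
#include <stdlib.h>

/* Hypothetical message header, for illustration only. */
struct demo_msg {
    unsigned long lamport_ts; /* logical timestamp that respects causal order */
    int sender_id;            /* unique id of the sending connection */
};

/* qsort comparator: order by timestamp, then break ties by sender id. */
static int demo_total_order_cmp(const void *a, const void *b)
{
    const struct demo_msg *x = a;
    const struct demo_msg *y = b;

    if (x->lamport_ts != y->lamport_ts)
        return (x->lamport_ts < y->lamport_ts) ? -1 : 1;
    if (x->sender_id != y->sender_id)
        return (x->sender_id < y->sender_id) ? -1 : 1;
    return 0;
}

/* Sort a batch of messages; every recipient that sorts the same batch
 * this way ends up with the same order. */
void demo_sort_total_order(struct demo_msg *msgs, size_t count)
{
    qsort(msgs, count, sizeof msgs[0], demo_total_order_cmp);
}
```

Because the sender id is unique, two distinct messages with the same timestamp never compare as equal, so the resulting order does not depend on the order in which the messages arrived.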
It is important to note that messages sent with Fifo ordering or less do not
support the full membership semantics of Spread. This is a result of Spread
optimizing two common operations: group joins and leaves, and sending FIFO or
Reliable messages. First, joins and leaves of group members do not cost more than
sending one SAFE message and result in no extra synchronization costs. Second, Fifo
and Reliable messages are not delayed before delivery by any other messages. So
even if gaps exist in the global order of all messages, Reliable messages can still
be delivered and Fifo messages can be delivered as long as all the messages from
their sender have arrived. Because of these two optimizations, it is possible for a
Reliable or Fifo message to be delivered earlier than it would be if it were globally
ordered. However, a gap in the global sequence may contain a join or leave
message (since they are just SAFE messages), so it might be that one process delivers
the Fifo or Reliable message before the join and a different process delivers the
join first and then the message.
[1] A sender is defined as a particular connection to a Spread daemon, so an
application with 3 connections will be considered 3 different senders.
1.3.2 Reliability
The Reliability guarantees defined by Spread are:
Unreliable: The message is unreliable. It may be dropped or lost and will not be
recovered by Spread.
Reliable: The message will be reliably delivered to all recipients who are members
of the group to which the message was sent. Spread will recover the
message to overcome any network losses.
Safe: The message will ONLY be delivered to a recipient if the daemon that
recipient is connected to knows that all Spread daemons have the message. If
a membership change occurs, and as a result the daemon cannot determine
whether all daemons in the old membership have the message, then the
daemon will deliver the Safe message after a TRANSITIONAL MEMBERSHIP
message.
2.1.1 Downloading
Spread can be downloaded from http://www.spread.org/ or http://www.cnds.jhu.edu/.
8 CHAPTER 2. INSTALLING AND CONFIGURING SPREAD
arch-bsdi
arch-sgi
arch-sunos
arch-sunsol
arch-pcsol
arch-linux
arch-freebsd
5. Now you need to copy the files. I will assume you use /usr/local/{bin,include,lib,man}.
Replace “ARCH” with the directory for your architecture.
cp -p include/* /usr/local/include/
cp -p ARCH/libspread.a /usr/local/lib/
cp -p ARCH/libtspread.a /usr/local/lib/
cp -p ARCH/spread /usr/local/bin/
cp -p ARCH/monitor /usr/local/bin/
cp -p ARCH/user /usr/local/bin/
cp -p ARCH/tuser /usr/local/bin/
cp -p ARCH/simple_user /usr/local/bin/
cp -p ARCH/flooder /usr/local/bin/
cp -p docs/*.3 /usr/local/man/man3/
cp -p docs/*.1 /usr/local/man/man1/
To use the Java classes and examples you need to have a copy of the main
’spread’ daemon running. The spread/*.class files then give you the equivalent
of libspread.a as a package of Java classes. The user.java, user.html, and
user.class files give you a demonstration applet and source code. The tree.html,
AllNames.html, and packages.html files give some documentation for the Java
interface.
For Windows (95/NT) systems use the spread.exe daemon and the libspread.lib
or libtspread.lib to link with your programs.
From the directory where you unpacked the Spread source distribution do the
following:
1. Run “./configure”. If you want the binaries and libraries to be installed
somewhere other than /usr/local/, pass configure a --prefix=/my/location/path
option.
2. Run “make”.
3. If you want to install the binaries into your standard system locations, change
to a privileged user and run “make install”. Otherwise, run “make install”
as the user you want to install Spread as.
[Figure: sample network with a single site. Hosts 192.168.1.20 through 192.168.1.25 sit behind one router connected to the Internet.]
[Figure: sample network with two sites. Hosts x.32.49.20 through x.32.49.25 and x.32.50.20 through x.32.50.25 sit behind routers x.32.49.1 and x.32.50.1, which are connected through the Internet.]
[Figure: sample network with four sites, x.32.49.*, x.32.50.*, x.32.51.* and x.32.52.*. Each site has hosts .20 through .25 behind its own router (.1), and the routers are connected through the Internet.]
Figure 2.3: Sample Network with four sites connected over the Internet
Spread_Segment 192.168.1.255:3333 {
    machine1 192.168.1.20
    machine2 192.168.1.21
    machine3 192.168.1.22
    machine4 192.168.1.23
    machine5 192.168.1.24
    machine6 192.168.1.25
}
Spread_Segment x.32.49.255:3333 {
    machine1 x.32.49.20
    machine2 x.32.49.21
    machine3 x.32.49.22
    machine4 x.32.49.23
    machine5 x.32.49.24
    machine6 x.32.49.25
}
Spread_Segment x.32.50.255:3333 {
    machineB1 x.32.50.20
    machineB2 x.32.50.21
    machineB3 x.32.50.22
    machineB4 x.32.50.23
    machineB5 x.32.50.24
    machineB6 x.32.50.25
}
Figure 2.5: Sample configuration file for two sites directly connected
Spread_Segment x.32.49.255:3333 {
    machine1 x.32.49.20
    machine2 x.32.49.21
    machine3 x.32.49.22
    machine4 x.32.49.23
    machine5 x.32.49.24
    machine6 x.32.49.25
}
Spread_Segment x.32.50.255:3333 {
    machineB1 x.32.50.20
    machineB2 x.32.50.21
    machineB3 x.32.50.22
    machineB4 x.32.50.23
    machineB5 x.32.50.24
    machineB6 x.32.50.25
}
Spread_Segment x.32.51.255:3333 {
    machineC1 x.32.51.20
    machineC2 x.32.51.21
    machineC3 x.32.51.22
    machineC4 x.32.51.23
    machineC5 x.32.51.24
    machineC6 x.32.51.25
}
Spread_Segment x.32.52.255:3333 {
    machineD1 x.32.52.20
    machineD2 x.32.52.21
    machineD3 x.32.52.22
    machineD4 x.32.52.23
    machineD5 x.32.52.24
    machineD6 x.32.52.25
}
Figure 2.6: Sample configuration file for four sites connected by the Internet
Flag            Function
PRINT           General info that should always be printed.
EXIT            Errors or other events that cause Spread to quit.
DEBUG           Debugging information.
DATA_LINK       Lowest level of sending and receiving datagrams.
NETWORK         Packing messages and setting who to talk with.
PROTOCOL        Ordering, Token handling, and delivery algorithms.
SESSION         Per user connection management.
CONFIGURATION   Parsing and loading configuration file.
MEMBERSHIP      State and messages sent during membership changes.
FLOW_CONTROL    Flow control state of the ring.
STATUS          Reporting of status information to the monitor.
EVENTS          All events (timed, fd based) and main loop.
GROUPS          Group state and group membership changes.
MEMORY          Memory debugging and allocation.
SKIPLIST        State of data structure.
ALL             Enables all flags.
NONE            Disables all flags.
will print all log messages except those related to data-link or events. The PRINT
and EXIT flags should always be enabled for correct operation of Spread.
The log messages are either printed to the screen of the console where Spread is
run or to the log file specified by the EventLogFile option. The EventLogFile
filename can contain the special string ’%h’, which will be replaced with the
hostname of the machine running the Spread daemon. This makes it easy to have one
configuration file which multiple daemons will use from the same NFS mounted
filesystem. An example is shown in Figure 2.7 in line 3.
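The ’%h’ substitution can be pictured with the small sketch below. It is only an illustration of the idea, not the code Spread uses; the function name demo_expand_log_name and the sample path in the usage note are hypothetical.

```c
#include <string.h>

/*
 * Illustrative only: copy `pattern` into `out`, replacing each "%h" with
 * `hostname`. Returns 0 on success, -1 if the result does not fit in `out`.
 */
int demo_expand_log_name(const char *pattern, const char *hostname,
                         char *out, size_t out_size)
{
    size_t used = 0;
    size_t host_len = strlen(hostname);

    if (out_size == 0)
        return -1;

    while (*pattern != '\0') {
        if (pattern[0] == '%' && pattern[1] == 'h') {
            /* leave room for the terminating NUL */
            if (used + host_len >= out_size)
                return -1;
            memcpy(out + used, hostname, host_len);
            used += host_len;
            pattern += 2;
        } else {
            if (used + 1 >= out_size)
                return -1;
            out[used++] = *pattern++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With a pattern such as "/var/log/spread_%h.log" and the hostname "machine1", the sketch produces "/var/log/spread_machine1.log", so each daemon sharing the configuration file writes to its own log.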
The log messages will be prefixed with a timestamp string if the EventTimeStamp
option is enabled. The timestamp has a default format similar to most log times-
tamps. The format can be customized by setting EventTimeStamp equal to a
format string as shown in Figure 2.7.
The SocketPortReuse option allows one to choose when the SO_REUSEADDR
socket option is used on TCP sockets opened by Spread. When a TCP socket
is open in a server and clients are connected, if the server crashes or goes down
without cleanly closing all of the client TCP connections, some connections
can be left in the TIME_WAIT state on the server, which will prevent the server from
restarting (the bind to the TCP socket will fail) for about 2 minutes (the timeout
on the TIME_WAIT state). In an environment where you want to restart the servers
immediately in the event of a crash or shutdown, this 2 minute wait is clearly
undesirable. The SO_REUSEADDR socket option allows the daemon to restart
immediately, even if some connections are still in the TIME_WAIT state. However,
as a consequence of how it does this, it may also allow OTHER programs to bind
to the same port number and interface that Spread is bound to and possibly steal the
messages destined for Spread.
2.3. RUNNING THE DAEMON AND CLIENTS 15
This is a potentially serious security issue as it could allow a user who has
access to the machine running the Spread daemon to capture Spread traffic, or
interfere with the correct functioning of Spread. This security issue is well known
in the operating system and Internet community, and a number of operating
systems have modified the SO_REUSEADDR option to minimize or avoid the security
issues while maintaining its useful properties. So in many cases it is safe to
enable the SocketPortReuse option, and the default that Spread ships with is AUTO.
In the AUTO setting the SO_REUSEADDR option is enabled when Spread is
configured in the spread.conf file to only bind to specific interfaces, and is disabled
when Spread binds to INADDR_ANY (where no specific interfaces are specified
in the spread.conf file). We believe this is a safe option as the security issue only
arises when a program binds to INADDR_ANY.
If you know you are running on an operating system which has a secure
implementation of SO_REUSEADDR, or you do not allow any non-trusted users to
run programs on the same machines as Spread daemons run on, you can set this to
“On” and the daemon will always use this option to allow fast restarts. If you want
to disable this option completely so the daemon will never use the SO_REUSEADDR
option, set this to “Off”.
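The three settings described above can be summarized in a short sketch. The enum, the function names, and the idea of passing a "binds to specific interfaces" flag are assumptions made for illustration; only setsockopt and SO_REUSEADDR are real POSIX names, and this is not the daemon's actual code.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical names for the three SocketPortReuse settings. */
enum demo_port_reuse { DEMO_REUSE_AUTO, DEMO_REUSE_ON, DEMO_REUSE_OFF };

/*
 * Decide whether SO_REUSEADDR should be set.
 * binds_specific_interfaces is true when the configuration names the
 * interfaces to bind to, false when the daemon binds to INADDR_ANY.
 */
bool demo_should_reuse_addr(enum demo_port_reuse setting,
                            bool binds_specific_interfaces)
{
    switch (setting) {
    case DEMO_REUSE_ON:
        return true;
    case DEMO_REUSE_OFF:
        return false;
    case DEMO_REUSE_AUTO:
    default:
        return binds_specific_interfaces;
    }
}

/* Apply the decision to an already created TCP socket.
 * Returns 0 on success and -1 on error, like setsockopt itself. */
int demo_apply_port_reuse(int sock, enum demo_port_reuse setting,
                          bool binds_specific_interfaces)
{
    int on = demo_should_reuse_addr(setting, binds_specific_interfaces) ? 1 : 0;

    return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
}
```

Keeping the decision in a separate function from the socket call makes the AUTO rule easy to check on its own, without opening a socket.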
The RuntimeDir, DaemonUser and DaemonGroup options allow runtime
configuration of the file system location and uid/gid combination that Spread runs
itself as when it is executed with root privileges. These options only apply to
Unix-like operating systems. When executed with root privileges, Spread will
change to the specified directory in the file system and use the chroot system
call to change its / directory to be that directory. It will then drop all of its
privileges and continue to run as the user and group specified. Spread does both
of these actions after reading in its config file and opening the specified log file,
so both files can exist outside of the RuntimeDir directory. No files need to be
installed in this directory tree.
The spmonitor program will look for a file called spread.conf in three
locations: first, the location given by the -c command line option, if it is used;
second, the directory it is started from; and third, /etc/.
Once it has loaded the configuration file, monitor will give a brief text menu
and prompt as shown in Figure 2.8. You can then select what you want to do and
to which daemons you want the command sent. The most common command will
be to send a daemon a status query. The results of that query will look something
like Figure 2.9. Some of the more interesting and useful information returned in
this status report are:
1. Line 1: The state and gstate should both be 1 during normal operation.
Other values indicate a membership change is occurring.
2. Line 1: The “after 116 seconds” gives the time this daemon has been alive.
3. Line 2: Gives the total number of alive daemons and how many different
segments they are in.
4. Line 3: The rounds value is the number of times the token has revolved
around the daemons.
=============
Monitor Menu:
-------------
0. Activate/Deactivate Status {all, none, Proc, CR}
1. Define Partition
2. Send Partition
3. Review Partition
4. Cancel Partition Effects
9. Exit
Monitor>
5. Line 4: Sent pack and recv pack give the cumulative number of actual pack-
ets sent or received.
8. Line 7: The Groups is the total number of groups that currently exist in the
system.
9. Line 7/8: Window is the flow control window limiting how many packets
are sent each token revolution. Pers Window limits each daemon from
initiating more than that number of packets each time it gets the token.
10. Line 8: Deliver M is really “Deliver Messages” and is the cumulative total
number of messages this daemon has been able to deliver.
11. Line 8: Deliver Pk is really “Deliver Packets” and is the cumulative total
number of packets (a message may contain multiple packets) this daemon
has delivered.
12. Line 9: Delta Mess and Delta Pack are the changes in the delivered message
and packet counters since the last status query was sent.
13. Line 10: Delta sec gives the time between this status query and the last one.
only includes one segment and the other is used when more than one segment
is currently active. The current values are shown below from lines 128-151 of
membership.c.
if( Wide_network )
{
    Token_timeout.sec  = 20;  Token_timeout.usec  = 0;
    Hurry_timeout.sec  = 6;   Hurry_timeout.usec  = 0;
    Alive_timeout.sec  = 1;   Alive_timeout.usec  = 0;
    Join_timeout.sec   = 1;   Join_timeout.usec   = 0;
    Rep_timeout.sec    = 5;   Rep_timeout.usec    = 0;
    Seg_timeout.sec    = 2;   Seg_timeout.usec    = 0;
    Gather_timeout.sec = 10;  Gather_timeout.usec = 0;
    Form_timeout.sec   = 10;  Form_timeout.usec   = 0;
    Lookup_timeout.sec = 90;  Lookup_timeout.usec = 0;
}else{
    Token_timeout.sec  = 5;   Token_timeout.usec  = 0;
    Hurry_timeout.sec  = 2;   Hurry_timeout.usec  = 0;
    Alive_timeout.sec  = 1;   Alive_timeout.usec  = 0;
    Join_timeout.sec   = 1;   Join_timeout.usec   = 0;
    Rep_timeout.sec    = 2;   Rep_timeout.usec    = 500000;
    Seg_timeout.sec    = 2;   Seg_timeout.usec    = 0;
    Gather_timeout.sec = 5;   Gather_timeout.usec = 0;
    Form_timeout.sec   = 5;   Form_timeout.usec   = 0;
    Lookup_timeout.sec = 60;  Lookup_timeout.usec = 0;
}
heavily loaded the token will be significantly slowed and all Spread daemons will
be slowed. When the load gets very high (over 30, or less on a large configuration),
these delays can even cause spurious membership changes as the daemons think
the token was lost, even though it is just slow because of the delays.
The best solution so far for this situation is to make some of the following
three changes.
First, modify the timeouts as described in Section 2.4.1 to be larger. In particular,
increase the Token_timeout and Form_timeout to be at least several seconds larger
than the longest average time a token takes to get to all the machines. For example,
if because of scheduling delays each daemon takes 300 ms to get the CPU when a
token arrives, then allow at least 350 ms per daemon. So with 30 machines you
will want 30 x 350 ms = 10.5 seconds, rounded up to 11, plus a few more, so perhaps
a 15 second timeout for the Token. One
way to calculate this delay is to run the monitor and query one machine every
second watching the token-rounds variable. See how many seconds it takes for
one round of the token to occur under the highest load you normally experience.
Then add a few seconds and use that as your timeout.
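The arithmetic in the example above can be written as a small helper. This is only a sketch of the rule of thumb; the function name and its parameters are made up for illustration and do not exist in Spread.

```c
#include <assert.h>

/*
 * Rough Token_timeout estimate in whole seconds.
 * per_daemon_ms: time allowed for each daemon to pick up and pass the token
 * margin_sec:    safety margin added on top of one full token rotation
 */
int demo_token_timeout_sec(int daemons, int per_daemon_ms, int margin_sec)
{
    assert(daemons > 0 && per_daemon_ms > 0 && margin_sec >= 0);

    long rotation_ms = (long)daemons * per_daemon_ms;
    int rotation_sec = (int)((rotation_ms + 999) / 1000); /* round up */

    return rotation_sec + margin_sec;
}
```

For 30 daemons at 350 ms each and a 4 second margin, the helper returns 15, matching the figure suggested in the text.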
Second, run the Spread daemon with real-time scheduling priority.2 This is
standard on all unixes (and can also be done on Windows), and is quite simple.
This will give Spread the first chance at the CPU whenever it needs it. The costs of
this are straightforward. First it requires root privilege on the machine the daemon
runs on, and second if Spread for some reason becomes a runaway process not
releasing the CPU it is impossible to stop unless you also have a shell set to a
higher real-time scheduling priority. We have never seen Spread run away with
the CPU, and it is very unlikely a bug could cause it because of the event-based
design of Spread.
Third, use the monitor to adjust the flow control parameters of the token. Since
the token rotates much more slowly under high load than under light load, the
daemons are sending fewer messages per second on the network. If the load is high
but spare bandwidth on the network is available, you could try increasing the
number of packets each daemon is allowed to send when it gets the token (the
Personal window of each daemon) and the total number of packets that can be
sent during each rotation of the token (the Window).
but bursts are higher some buffering by the daemon can help significantly.
Two specific values can be of use in tuning the available buffering. The first
is the WATER_MARK variable defined in the spread_params.h file. This sets
the number of messages Spread will accept from all client connections, without
sending them on, before blocking the applications. Once Spread has actually sent
some of the messages onto the network it will unblock the applications.
The second is the number of buffers that Spread will keep for each receiver
when delivering messages. If the client application is not calling SP_receive
often enough to keep up with the number of messages being delivered to it, then
Spread will buffer up to MAX_SESSION_MESSAGES messages (also defined in
spread_params.h).
[3] The only advantage of shrinking MAX_PROCS_SEGMENT is a small decrease in the
required memory, so in almost all cases you will not need to change this value.
Chapter 3
Spread C API
3.1 Introduction
3.1.1 Short Buffer Handling
It is the traditional behavior of networking APIs that when a user provided buffer
is insufficient, the API will provide as much data as possible and truncate the rest.
Sometimes the user receives a notice that some data was truncated and sometimes
no notification is given. Thus it is the user’s responsibility to detect when
datagrams are too short and recover in some way (such as re-requesting data).
The difficulty with using this approach in Spread is that when the application
has to recover from truncation, some properties of the message are lost. For example,
if the message was a SAFE message, the other members can rightly assume that
either all the members will get the data or they will not get it because they crash
or disconnect from Spread. With truncation, some members might get part of the data
and have to recover the rest of it. Also, the data can be lost even when the process
continues to execute correctly, which makes it difficult for the other members to
detect the fault.
Essentially because each message has attached meaning, such as ordering, or
reliability guarantees, unpredictable loss of data in an otherwise reliable system
compromises the very semantics we want to use. It is possible to check for this
loss and recover, but the costs are significant, especially when weighed against
the cost of avoiding the problem in the first place. Thus, unlike UDP datagrams,
Spread messages are designed to be reliable even with short buffers.
The method used is straightforward. Spread will never truncate large messages
unless you explicitly ask it to. When you call SP_receive with a data buffer or
groups list too short to hold all the data, the SP_receive function will return with
an error code of GROUPS_TOO_SHORT or BUFFER_TOO_SHORT and NO data
or groups will be returned. The only information that will be returned is in the
following parameters:
24 CHAPTER 3. SPREAD C API
So, when SP_receive returns one of the *_TOO_SHORT errors you can examine
the service_type and mess_type fields to get some information about what kind of
message Spread is trying to give you. You can then examine the num_groups and
endian_mismatch fields to discover how large your buffers need to be. If either field
is set to 0 then that buffer was large enough and does not have to be increased.
Obviously this can only be true for one of the buffers, since at least one of them
caused the error. You then increase your application buffers and call SP_receive again.
It should return with the message and without error (unless something else is also
wrong).
This retry approach is safe with multi-threaded applications because each call
succeeds or fails on its own, and if two threads retry for the same message, one
will get it and the other will get the message after it (which is what would happen
anyway if they were not retrying).
The retry approach does, however, require that the application check for errors
when calling SP_receive and, if a *_TOO_SHORT error occurs, either enlarge
its buffers or call SP_receive again with the DROP_RECV flag set, as described
below. If it either ignores errors or does not correct the short buffers, the
application will loop continually calling SP_receive and never receive anything.
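The retry pattern can be sketched as follows. To keep the example self-contained it does not call SP_receive. Instead it takes a hypothetical callback, demo_recv_fn, that mimics the behavior described above for the data buffer: on a short buffer it returns an error, copies no data, and reports the size it needs. The error value DEMO_BUFFER_TOO_SHORT and every demo_ name are placeholders, not definitions from sp.h, and the way the required size is reported here is an assumption of the sketch.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_BUFFER_TOO_SHORT (-1) /* placeholder, not the value from sp.h */

/*
 * Hypothetical receive callback. Returns the message length on success.
 * On a short buffer it returns DEMO_BUFFER_TOO_SHORT, copies nothing,
 * and stores the required size in *needed.
 */
typedef int (*demo_recv_fn)(char *buf, int buf_len, int *needed);

/*
 * Call recv_msg until the buffer is large enough. On success returns a
 * malloc'ed buffer (the caller frees it) and stores the length in *out_len.
 * Returns NULL on any other error or if memory runs out.
 */
char *demo_receive_with_retry(demo_recv_fn recv_msg, int initial_len, int *out_len)
{
    int len = initial_len > 0 ? initial_len : 1;
    char *buf = malloc((size_t)len);

    while (buf != NULL) {
        int needed = 0;
        int ret = recv_msg(buf, len, &needed);

        if (ret >= 0) {              /* message delivered intact */
            *out_len = ret;
            return buf;
        }
        if (ret != DEMO_BUFFER_TOO_SHORT || needed <= len)
            break;                   /* unrelated error: give up */

        char *bigger = realloc(buf, (size_t)needed);
        if (bigger == NULL)
            break;
        buf = bigger;                /* enlarge and retry the same message */
        len = needed;
    }
    free(buf);
    return NULL;
}

/* Stand-in for the daemon: always offers the same 12-byte message. */
int demo_fake_recv(char *buf, int buf_len, int *needed)
{
    static const char msg[] = "hello group";
    int msg_len = (int)sizeof msg;   /* includes the trailing NUL */

    if (buf_len < msg_len) {
        *needed = msg_len;
        return DEMO_BUFFER_TOO_SHORT;
    }
    memcpy(buf, msg, (size_t)msg_len);
    return msg_len;
}
```

Calling demo_receive_with_retry(demo_fake_recv, 4, &len) fails once with the 4-byte buffer, grows it to the reported size, and then receives the whole message. An application that never enlarged the buffer would see the same error on every call, which is the endless loop the text warns about.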
If the application does not want to actually receive the entire data buffer or
groups list, it has the option of calling SP_receive with the service_type field set
to the DROP_RECV flag. When this is done, Spread will treat the message just
like most networking systems and return all the data and groups that will fit in the
available space and truncate the rest. It will still return an error value informing
the application that it has lost data. In simple applications, or ones with relaxed or
specialized requirements, this might be more useful than having to check for error
values and retry the SP_receive.
3.3 SP Functions
3.3.1 SP_connect
#include <sp.h>
int SP_connect( const char *spread_name, const char *private_name,
                int priority, int group_membership, mailbox *mbox,
                char *private_group );
SP_connect is the initial call an application must make to establish a connection
with a Spread daemon. All other Spread calls must refer to a valid mbox set
by this function (mbox is passed by reference).
The spread_name is the name of the Spread daemon to connect to. It should be
a string in one of the following forms:
4803             connect to the Spread daemon on the local machine
                 using Unix Domain Sockets with socket on /tmp/4803.
                 This form cannot be used to connect to a
                 Windows95/NT machine.
4803@localhost   connect to the Spread daemon on port 4803 of the
                 local machine through loopback TCP/IP. This form
                 can be used on Windows95/NT machines.
The private_name is the name this connection would like to be known as. It
must be unique on the machine running the Spread daemon. The name can be of
at most MAX_PRIVATE_NAME characters with the same character restrictions
as a group name (mainly it cannot contain the ’#’ character).
The priority is a 0/1 flag for whether this connection will be a “Priority”
connection or not. Currently this has no effect.
The group_membership is a boolean integer. If 1 then the application will
receive group membership messages for this connection; if 0 then the application
will not receive any membership change messages.
The mbox should be a pointer to a mailbox variable. After the SP_connect
call returns this variable will hold the mbox for the connection.
The private_group should be a pointer to a string big enough to hold at least
MAX_GROUP_NAME characters. After the SP_connect call returns it will
contain the private group name of this connection. This group name can be used to
send unicast messages to this connection, and no one can join this special group.
RETURN VALUES
3.3.2 SP_disconnect
#include <sp.h>
int SP_disconnect( mailbox mbox );
SP_disconnect should be called when the application is finished with a connection
to the Spread daemon. The application may have other connections still open
to the daemon and may open a new connection after disconnecting.
The mbox should be for the connection you wish to disconnect from.
RETURN VALUES
NORMAL returns 0 on success
ILLEGAL_SESSION when the session mbox given is not a valid connection.
3.3.3 SP_join
#include <sp.h>
int SP_join( mailbox mbox, const char *group );
SP_join joins a group with the name passed as the string group. If the group does
not exist among the Spread daemons it is created, otherwise the existing group
with that name is joined.
The mbox of the connection upon which to join a group is the first parameter.
The group string represents the name of the group to join.
RETURN VALUES
NORMAL returns 0 on success.
ILLEGAL_GROUP the group given to join was illegal for some reason. Usually
because it was of length 0 or length > MAX_GROUP_NAME.
ILLEGAL_SESSION the session specified by mbox is illegal. Usually because it
is not active.
CONNECTION_CLOSED during communication errors occurred and the join could
not be initiated.
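An application may want to check a group name before calling SP_join, using the restrictions mentioned in this chapter: the name must not be empty, must not exceed MAX_GROUP_NAME, and must not contain the ’#’ character. The sketch below is a hypothetical pre-check, not part of the Spread library, and the value 32 used for DEMO_MAX_GROUP_NAME is an assumption; the real limit comes from sp.h.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_GROUP_NAME 32 /* assumed; use MAX_GROUP_NAME from sp.h */

/*
 * Returns true if `name` passes the checks mentioned in the text:
 * non-empty, within the length limit, and free of the '#' character.
 */
bool demo_group_name_ok(const char *name)
{
    size_t len;

    if (name == NULL)
        return false;
    len = strlen(name);
    if (len == 0 || len > DEMO_MAX_GROUP_NAME)
        return false;
    return strchr(name, '#') == NULL;
}
```

A check like this only saves a round trip for obviously bad names; the daemon remains the authority, so the return value of SP_join still has to be examined.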
3.3.4 SP_leave
#include <sp.h>
int SP_leave( mailbox mbox, const char *group );
SP_leave leaves a group with the name passed as the string group. If the group
does not exist among the Spread daemons this operation is ignored, otherwise the
group is left.
The mbox of the connection upon which to leave a group is the first parameter.
The group string represents the name of the group to leave.
RETURN VALUES
NORMAL returns 0 on success.
ILLEGAL_GROUP the group given to leave was illegal for some reason. Usually
because it was of length 0 or length > MAX_GROUP_NAME.
ILLEGAL_SESSION the session specified by mbox is illegal. Usually because it
is not active.
CONNECTION_CLOSED during communication errors occurred and the leave could
not be initiated.
3.3.5 SP_multicast and family
#include <sp.h>
int SP_multicast( mailbox mbox, service service_type, const char *group,
                  int16 mess_type, int mess_len, const char *mess );
int SP_scat_multicast( mailbox mbox, service service_type,
                       const char *group, int16 mess_type,
                       const scatter *scat_mess );
int SP_multigroup_multicast( mailbox mbox, service service_type,
                             int num_groups, const char groups[][MAX_GROUP_NAME],
                             int16 mess_type, int mess_len, const char *mess );
int SP_multigroup_scat_multicast( mailbox mbox, service service_type,
                                  int num_groups, const char groups[][MAX_GROUP_NAME],
                                  int16 mess_type, const scatter *scat_mess );
SP_multicast and its variants each send a message to one or more groups.
The message is sent on a particular connection and is marked as having come from
that connection. The service_type is a type field that should be set to the service
this message requires. The valid flags for messages are:
• UNRELIABLE_MESS
• RELIABLE_MESS
• FIFO_MESS
• CAUSAL_MESS
• AGREED_MESS
• SAFE_MESS
This type can be bitwise ORed with other flags such as SELF_DISCARD if desired.
Currently SELF_DISCARD is the only additional flag.
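Combining a service type with the additional flag is an ordinary bitwise OR. The sketch below illustrates this under stated assumptions: the EX_ constants and their numeric values are placeholders chosen for the example, and the two helper functions are hypothetical. A real program should use the constants defined in sp.h.

```c
/* Placeholder bit values for illustration only; real programs must use
 * the constants from sp.h rather than these EX_ stand-ins. */
#define EX_RELIABLE_MESS 0x02
#define EX_AGREED_MESS   0x10
#define EX_SAFE_MESS     0x20
#define EX_SELF_DISCARD  0x40

/* Hypothetical helper: add the self-discard flag to a service type. */
int with_self_discard(int service_type)
{
    return service_type | EX_SELF_DISCARD;
}

/* Hypothetical helper: report whether the self-discard flag is set. */
int has_self_discard(int service_type)
{
    return (service_type & EX_SELF_DISCARD) != 0;
}
```

The ordering flag survives the OR unchanged, so the result still carries the requested delivery service alongside the extra flag.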
If SP_multicast or SP_scat_multicast is used, the message can be sent to
only one group, so the group string should contain the name of the group to
send to. If a multigroup variant is used, the groups are specified by the
num_groups integer and the array of group names called groups, which lists
all the groups the message should be sent to. Each group has a string
name of no more than MAX_GROUP_NAME chars. The array should have at
least as many group names as the num_groups parameter indicates.
The Spread system will only send the message once but will deliver it to all
connections which have joined at least one of the groups listed.
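The groups argument of the multigroup variants is a fixed-width, two-dimensional character array. As a rough sketch of how such an array might be filled, the hypothetical helper below copies a list of names into it. EX_MAX_GROUP_NAME is a placeholder for MAX_GROUP_NAME from sp.h, and the helper conservatively reserves one byte per row for the terminating NUL, which is an assumption of this example rather than something the text above states.

```c
#include <string.h>

/* Placeholder row width; real code should use MAX_GROUP_NAME from sp.h. */
#define EX_MAX_GROUP_NAME 32

/* Hypothetical helper: copy num_names group names into a fixed-width
 * array with room for max_groups rows. Returns the number of names
 * copied, or -1 if there are too many names or a name is empty or does
 * not fit in a row together with its terminating NUL. */
int fill_groups(char groups[][EX_MAX_GROUP_NAME], int max_groups,
                const char *const names[], int num_names)
{
    int i;

    if (num_names > max_groups)
        return -1;
    for (i = 0; i < num_names; i++) {
        size_t len = strlen(names[i]);
        if (len == 0 || len >= EX_MAX_GROUP_NAME)
            return -1;
        memcpy(groups[i], names[i], len + 1);  /* include the NUL */
    }
    return num_names;
}
```

The return value can then be passed as num_groups, so the count always matches the number of rows actually filled.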
The mess_type is a short int (16 bits) which the application can use
arbitrarily. The intent is that it can be used to name different kinds of data
messages so they can be differentiated without looking into the body of the
message. This value is endian corrected before the receiver sees it.
If the non-scatter variants are being used, then a single buffer is passed to the
multicast call specifying the full message to be sent. The mess_len field gives the
length in bytes of the message, and the mess field is a pointer to the buffer
containing the message. For a scatter call, both of these are replaced with one
pointer, scat_mess, to a scatter structure, which is just like an iovec. This allows
messages made up of several parts to be sent without an extra copy on systems
that support scatter-gather.
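To make the iovec comparison concrete, the sketch below builds a message from two separate buffers without copying them into one. The ex_scatter and ex_element types, their field names, and the element limit are stand-ins invented for this example; the real scatter definition comes from sp.h and may differ.

```c
#include <string.h>

/* Stand-in types for illustration; the real scatter structure is
 * defined in sp.h and its layout may differ from this sketch. */
#define EX_MAX_ELEMENTS 4

typedef struct {
    const char *buf;   /* start of one message part */
    int         len;   /* length of that part in bytes */
} ex_element;

typedef struct {
    int        num_elements;
    ex_element elements[EX_MAX_ELEMENTS];
} ex_scatter;

/* Append one buffer to the list; returns 0 on success, -1 if full. */
int ex_scatter_add(ex_scatter *s, const char *buf, int len)
{
    if (len < 0 || s->num_elements >= EX_MAX_ELEMENTS)
        return -1;
    s->elements[s->num_elements].buf = buf;
    s->elements[s->num_elements].len = len;
    s->num_elements++;
    return 0;
}

/* Total message length: the sum of all part lengths. */
int ex_scatter_total(const ex_scatter *s)
{
    int i, total = 0;

    for (i = 0; i < s->num_elements; i++)
        total += s->elements[i].len;
    return total;
}

/* Demo: a 4-byte header and a 7-byte body described as two parts. */
int ex_demo_scatter(void)
{
    ex_scatter s;

    s.num_elements = 0;
    ex_scatter_add(&s, "HDR:", 4);
    ex_scatter_add(&s, "payload", (int)strlen("payload"));
    return ex_scatter_total(&s);
}
```

Only pointers and lengths are stored, which is why no extra copy of the message parts is needed.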
RETURN VALUES
ILLEGAL_MESSAGE the message had an illegal structure, such as a scatter not
filled out correctly.
here, unless the array is too small and you have chosen DROP_RECV
semantics by setting that flag in the service_type field when
you called SP_receive. In that case as many group names
as can fit will be listed and the num_groups value will be
set to be negative. For example, if your groups array could
store 5 group names, but a message for 7 groups arrived, the
first five group names would appear in the groups array and
num_groups would be set to -7.
mess_type set to the message type field the application sent with the
original message. This is only a short int (16 bits). This value
is already endian corrected before the application receives it.
endian_mismatch set to true (1) if the endianness of the sending machine
differs from that of this receiving machine. Otherwise set to
false (0). This field is handled in a special way when certain
errors are returned. See Section 3.1.1 for details on this field
when the message buffers are too small.
mess the actual message body being received is stored into this
buffer.
max_mess_len the length of the mess buffer in bytes. Messages larger than
the buffer size are handled in the usual way. See Section 3.1.1
for details.
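The negative num_groups convention described for the groups parameter can be decoded with a little arithmetic. The helpers below are a hypothetical sketch, not part of the Spread API: they assume only what the text states, namely that a negative value means the array was too small and that its magnitude is the number of groups the message was addressed to.

```c
/* Hypothetical helper: number of groups the message was addressed to,
 * whether or not they all fit in the caller's array. */
int groups_needed(int num_groups)
{
    return num_groups < 0 ? -num_groups : num_groups;
}

/* Hypothetical helper: 1 if the groups array was too small. */
int groups_truncated(int num_groups)
{
    return num_groups < 0;
}

/* Hypothetical helper: number of names actually stored in an array
 * that has room for max_groups entries. */
int groups_stored(int num_groups, int max_groups)
{
    int needed = groups_needed(num_groups);

    return needed < max_groups ? needed : max_groups;
}
```

With room for 5 names and a message addressed to 7 groups, a num_groups of -7 gives groups_needed 7 and groups_stored 5, so the caller knows how large an array a retry would require.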
sender set to the name of the group for which the membership change
is occurring.
max_groups not used.
max_mess_len not used.
num_groups always set to 0.
groups is empty, since there are no normal groups for a transitional
membership. The sender field is used instead.
mess_type set to -1.
endian_mismatch set to zero since the transitional does not have any endian
issues.
So, in essence, the only information you get is the sender field, which is set
to the name of the group that received a transitional membership change message. The
importance of the TRANS_MEMB_MESS is that it tells the application that all
messages received after it and before the REG_MEMB_MESS for the same group
are 'clean up' messages that put the messages in a consistent state before the
membership actually changes. For further explanation, please see other
documentation and research papers.
If this is a MEMB_MESSAGE (i.e. membership message) and it is specifically
a REG_MEMB_MESS type membership message, then:
sender set to the name of the group for which the membership change
is occurring.
max_groups same as regular message.
max_mess_len same as regular message.
mess_type set to the index of this process in the array of group members.
endian_mismatch set to 0 since there are no endian issues with regular
memberships.
num_groups set to the number of members in the group after the change.
groups contains a deterministically ordered list of the private group
names of the members of the group after the change.
mess contains the identifier of this group membership and a list of
all the private group names of those processes which came
with your process from the old group membership into this
new membership.
The data buffer will include the following fixed length fields:
• group id;
sender set to the name of the group for which the membership change
is occurring.
max_groups same as for regular message.
max_mess_len same as for regular message.
mess_type set to 0.
endian_mismatch set to 0.
num_groups set to 0.
groups will be empty. This is because this process is no longer part
of the group and thus has no knowledge of it.
mess contains the group id of the new membership and the private
group name of the member who just left. This name should
always be the private group name of the connection which
received this message.
The data buffer will include the following fixed length fields:
• group id;
The trans members array will have 1 group name containing the private group
name of the leaving process, since this case only occurs with a CAUSED_BY_LEAVE
membership change.
RETURN VALUES
NORMAL Returns the size of the message received on success.
ILLEGAL_SESSION the mbox given to receive on was illegal.
ILLEGAL_MESSAGE the message had an illegal structure, such as a scatter not
filled out correctly.
CONNECTION_CLOSED communication errors occurred while receiving the
message and the receive could not be completed.
BUFFER_TOO_SHORT the message body buffer was too short to hold the
message being received.
GROUPS_TOO_SHORT the groups buffer was too short to hold the groups list
or member list being received.
4.1 Introduction
Writing¹ Spread applications in Java is as simple and easy as writing Spread
applications in C, but with the added benefits of the Java language. All of the
functionality of the C interface to Spread is available when developing in Java, with some
extra tools and utilities. The Spread library consists of one package, "spread",
which contains ten classes. The main classes are SpreadConnection, which
represents a connection to a daemon, SpreadGroup, which represents a Spread group,
and SpreadMessage, which represents a message that is either being sent or being
received with Spread.
import spread.*;
¹ A previous version of this chapter was written by Dan Schoenblum, the original author of the
Spread Java Library.
36 CHAPTER 4. SPREAD JAVA API
import spread.*;
SpreadConnection SpreadConnection();
connect(InetAddress spread name, int port, String privateName,
boolean priority, boolean groupMembership);
disconnect();
SpreadGroup getPrivateGroup();
multicast(SpreadMessage message);
multicast(SpreadMessage messages[]);
SpreadMessage receive();
SpreadMessage[] receive(int numMessages);
boolean poll();
add(BasicMessageListener listener);
add(AdvancedMessageListener listener);
remove(BasicMessageListener listener);
remove(AdvancedMessageListener listener);
To establish a connection to a spread daemon, use the SpreadConnection class.
First, create a new SpreadConnection object, then use the connect() method to
make a connection to a daemon:
at most MAX_PRIVATE_NAME characters with the same character restrictions
as a group name (mainly it cannot contain the '#' character).
The priority is a boolean flag for whether this connection will be a "Priority"
connection or not. Currently this has no effect.
The groupMembership argument is a boolean. If true, the application will
receive group membership messages for this connection; if false, the application
will not receive any membership change messages.
This connection can be used until the disconnect() method is called, which
terminates the connection to the daemon.
Aside from adding and removing listeners, no methods should be called on a
SpreadConnection before connect() is called.
The private group should be a pointer to a string big enough to hold at least
MAX_GROUP_NAME characters. After the connect() call returns it will contain
the private group name of this connection. This group name can be used to send
unicast messages to this connection, and no one can join this special group.
To receive a message, use SpreadConnection’s receive() method. receive() will
block until a message is available. When one is ready to be received, the message
will be read and placed into a new SpreadMessage object which is returned by
receive().
The isRegular() method can be used to check if the message is a regular
message. Otherwise, it is a membership message. Membership messages will only
be received if they are requested by passing true as the final argument to
SpreadConnection's connect() method. If the message is a regular message, the get*()
methods in SpreadMessage will provide more information about the message. If
the message is a membership message, the getMembershipInfo() method can be
used to return a MembershipInfo object, which provides information about the
membership change.
if(message.isRegular())
    System.out.println("New message from " + message.getSender());
else
    System.out.println("New membership message from "
        + message.getMembershipInfo().getGroup());
import spread.*;
SpreadMessage SpreadMessage();
boolean isIncoming();
boolean isOutgoing();
int getServiceType();
boolean isRegular();
boolean isMembership();
boolean isUnreliable();
boolean isReliable();
boolean isFifo();
boolean isCausal();
boolean isAgreed();
boolean isSafe();
boolean isSelfDiscard();
SpreadGroup[] getGroups();
SpreadGroup getSender();
byte[] getData();
Object getObject();
Vector getDigest();
short getType();
boolean getEndianMismatch();
setServiceType(int serviceType);
setUnreliable();
setReliable();
setFifo();
setCausal();
setAgreed();
setSafe();
setSelfDiscard(boolean selfDiscard);
addGroup(SpreadGroup group);
addGroup(String group);
addGroups(SpreadGroup groups[]);
addGroups(String groups[]);
setData(byte[] data);
setObject(Serializable object);
digest(Serializable object);
setType(short type);
MembershipInfo getMembershipInfo();
Object clone();
First, create a new SpreadMessage object. This creates a new outgoing message.
Next, set the message data, the groups the message is going to, and the type of
delivery requested, using methods such as setData(), addGroup(),
and setReliable().
The setData() method sets the message's data to an array of bytes. Alternatives
to setData() are setObject() and digest(), each of which takes an object that
implements the Serializable interface. setObject() is used for sending one Java
object, while repeatedly calling digest() can be used to send multiple objects in one
message. The addGroup() method is used to specify a group to send the message
to. The setReliable() method is used to set the delivery method. Possible delivery
methods are: unreliable, reliable, fifo, causal, agreed, and safe. The setSelfDiscard()
method can be used to specify that this message should not be sent back to the
user who is sending it.
To actually send the message, call SpreadConnection’s multicast() method on
the message you want to send.
SpreadGroup SpreadGroup();
join(SpreadConnection connection, String groupname);
leave();
String toString();
boolean equals(Object object);
To join a group on the connection, use the SpreadGroup class. First, create a
new SpreadGroup object, then use the join() method to join a group:
boolean isRegularMembership();
boolean isTransition();
boolean isCausedByJoin();
boolean isCausedByLeave();
boolean isCausedByDisconnect();
boolean isCausedByNetwork();
boolean isSelfLeave();
SpreadGroup getGroup();
GroupID getGroupID();
SpreadGroup[] getMembers();
SpreadGroup getJoined();
SpreadGroup getLeft();
SpreadGroup getDisconnected();
SpreadGroup[] getStayed();
4.5 Exceptions
When an error occurs in a Spread method, a SpreadException is thrown. One
example is if receive() is called on a SpreadConnection() object before connect()
try
{
    connection.multicast(message);
}
catch(SpreadException e)
{
    e.printStackTrace();
    System.exit(1);
}
5.1 Introduction
The Event subsystem in Spread provides an abstract interface to manage all
possible types of events that can occur in a networked application. This includes
network or file IO and timed function calls. These events are registered with the
subsystem along with the functions to be called when the events occur. The Event
subsystem uses whatever tools the operating system provides to monitor system
events and to wait for specified times to elapse, and with them implements a main
loop which calls the registered callback functions whenever appropriate.
A significant difference between the Spread event system and other similar
wrappers around select or poll is that the event system also supports the idea
of priority levels. Each event is registered at a particular priority level. At any
time only events with a certain priority or higher will be handled. This feature
is used in Spread to selectively ignore certain types of events (such as new client
connections) while other more important events are going on (such as membership
changes).
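The priority filtering described above can be pictured with a very small dispatcher. This is only a sketch of the idea, not Spread's implementation: the ex_event structure, the EX_ priority values, and the function names are all hypothetical and chosen for the example.

```c
/* Hypothetical priority levels; higher numbers are more important. */
#define EX_LOW_PRIORITY    0
#define EX_MEDIUM_PRIORITY 1
#define EX_HIGH_PRIORITY   2

/* Hypothetical record of one registered event. */
typedef struct {
    int   priority;
    void (*func)(int code, void *data);
    int   code;
    void *data;
} ex_event;

/* Call the callback of every event whose priority is at least
 * min_priority; events below the threshold are skipped for now.
 * Returns the number of callbacks that were run. */
int ex_dispatch(const ex_event *events, int n, int min_priority)
{
    int i, handled = 0;

    for (i = 0; i < n; i++) {
        if (events[i].priority >= min_priority) {
            events[i].func(events[i].code, events[i].data);
            handled++;
        }
    }
    return handled;
}

/* Demo callback: add code to the integer that data points at. */
void ex_add_code(int code, void *data)
{
    *(int *)data += code;
}

/* Demo: one low-priority and one high-priority event. The return
 * value shows which callbacks ran for the given threshold. */
int ex_demo_priority(int min_priority)
{
    int total = 0;
    ex_event events[2];

    events[0].priority = EX_LOW_PRIORITY;   /* e.g. a new client connection */
    events[0].func = ex_add_code;
    events[0].code = 1;
    events[0].data = &total;
    events[1].priority = EX_HIGH_PRIORITY;  /* e.g. a membership change */
    events[1].func = ex_add_code;
    events[1].code = 10;
    events[1].data = &total;

    ex_dispatch(events, 2, min_priority);
    return total;
}
```

Raising the threshold to the high level leaves the low-priority event pending, which mirrors how new client connections can be ignored while a membership change is in progress.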
46 CHAPTER 5. THE EVENT SUBSYSTEM
The code and data parameters are passed to the function func when it is called
at the specified time. The function can use them however it wants. In most cases
you should use the code parameter if all you need to pass is an integer, for
example, representing a file descriptor, a state value, or to distinguish between the
normal case and a special case for the function. If you need to pass more
complicated state, then create a structure which stores it all and pass a pointer to the
structure in the data parameter.
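The division of labour between code and data might look like the following sketch. The structure ex_conn_state, the EX_ code values, and the callback name are hypothetical; only the (int code, void *data) calling convention is taken from the text above.

```c
/* Hypothetical code values distinguishing the normal case from a
 * special case, as suggested above. */
#define EX_NORMAL 0
#define EX_RESET  1

/* Hypothetical state too complex for a single integer; a pointer to
 * it travels in the data parameter. */
typedef struct {
    int fd;
    int events_seen;
} ex_conn_state;

/* Callback in the (code, data) style: code selects the case, data
 * carries the structure. */
void ex_on_event(int code, void *data)
{
    ex_conn_state *st = (ex_conn_state *)data;

    if (code == EX_RESET)
        st->events_seen = 0;
    else
        st->events_seen++;
}

/* Demo: two normal calls, a reset, then one more normal call. */
int ex_demo_code_data(void)
{
    ex_conn_state st;

    st.fd = 5;
    st.events_seen = 0;
    ex_on_event(EX_NORMAL, &st);
    ex_on_event(EX_NORMAL, &st);
    ex_on_event(EX_RESET, &st);
    ex_on_event(EX_NORMAL, &st);
    return st.events_seen;
}
```

Here the structure lives on the stack of the demo function; state handed to a real event loop would need to outlive the registration.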
Note that the E_dequeue function does not free any data pointed at by the
data parameter, so the application has to make sure to free it if no one besides
this function call needs it.
function for WRITE events on fd 5 and a third function for EXCEPTION events
on fd 5. As a practical matter, you will usually want to register the same function
for both READ and EXCEPTION events because the only way to detect when the
other end of a TCP socket is closed is to call read or recv on it and
check for a return value of 0. The closing of a TCP socket is sometimes considered
a READ event by the operating system and sometimes an EXCEPTION event, so
registering both is necessary to correctly handle closed TCP sockets in all cases.
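The zero-return check is the part a shared READ/EXCEPTION handler would perform. The sketch below shows it in isolation under stated assumptions: the function names are hypothetical, and a Unix-domain stream socket pair stands in for a TCP connection so the example runs without a network. The end-of-stream behaviour of recv is the same for both kinds of stream socket.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical check a handler might run when fd is reported ready:
 * returns 1 if the peer has closed the connection, 0 if data was
 * read, and -1 on error. */
int ex_peer_closed(int fd)
{
    char    buf[64];
    ssize_t n = recv(fd, buf, sizeof buf, 0);

    if (n == 0)
        return 1;   /* orderly shutdown by the other end */
    if (n < 0)
        return -1;
    return 0;
}

/* Demo: create a connected pair, close one end, and check the other.
 * A socket pair is used here only as a stand-in for a TCP connection. */
int ex_demo_close(void)
{
    int sv[2];
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    close(sv[1]);
    result = ex_peer_closed(sv[0]);
    close(sv[0]);
    return result;
}
```

A handler registered for both event types could call a check like this first and tear the connection down when it returns 1.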
events.h
• int E_attach_fd( int fd, int fd_type, void (* func)(), int
code, void *data, int priority );