Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct commercial
advantage, the ACM copyright notice and the title of the publication and its
date appear, and notice is given that copying is by permission of the
Association for Computing Machinery. To copy otherwise, or to republish,
requires a fee and/or specific permission.
SIGCOMM'93 - Ithaca, N.Y., USA /9/93
© 1993 ACM 0-89791-619-0/93/0009/0085...$1.50

The existing multicast architecture is not restricted to IP networks, but is
being accepted as the solution to multicasting in many different kinds of
networks and environments.

For each multicast group, the current architecture builds a shortest-path
source-based delivery tree between each sender and the corresponding
multicast recipients. The multicast tree-building algorithms are
tightly-coupled to particular unicast algorithms. At domain boundaries, where
differing multicast algorithms may interface, various ad hoc means are used
to establish the tree. This is further discussed in section 4.2. Routers on a
multicast tree store (source, group) pair information.

With the advent of multicasting in an internet of ever increasing size and
heterogeneity (with respect to routing and addressing) we feel that the list
of desirable properties a multicast routing algorithm should exhibit, should
be extended to include:

● Scalability. With the internet growing at its current rate, we can expect
to see a large increase in the number of wide-area multicasts. These can vary
considerably in their characteristics. Clearly, any routing
algorithm/protocol that does not exhibit good scaling properties across the
full range of applications will have both limited usefulness and a restricted
lifetime in the internet.

● Robustness. Any multicast routing algorithm should include features that
provide robustness in terms of maintaining/repairing connectivity between
group members.

... multicast packets between the source and destination(s). To summarise,
multicasting optimizes bandwidth consumption and transmitter costs.

In the following subsection we briefly describe some features of the current
IP multicast environment. Subsequent subsections outline current IP multicast
algorithms, namely the Distance-Vector Multicast Routing Protocol (DVMRP) and
the Link-State Multicast Routing Protocol, respectively. A more comprehensive
description of these algorithms can be found in [9] and [8].
3.1 IP Multicast Environment

Most broadcast LANs, such as Ethernet, FDDI, and ATM, intrinsically support
multicast addressing. That is, most end-systems and routers on these types of
LAN are able to distinguish multicast packets from other types of traffic by
means of address type; IP supports class D addressing. A class D address is
an address taken from a portion of the IP address space set aside for
multicasting. Each class D address uniquely identifies a single host group.

Routers normally set their network interfaces to promiscuously receive all
multicast packets, but a host only does so after a higher-layer application
explicitly requests it to do so. A single router, called the membership
interrogator, or designated router, polls the LAN for host memberships at
intervals, to which only group member hosts reply, once for each group. This
group-query/group-reporting mechanism is implemented on LANs in hosts and
routers by means of the Internet Group Management Protocol (IGMP).

3.2 DVMRP

Once the first packet has reached those routers that have neither child
subnets nor leaves with members on them, those routers are each responsible
for sending a special message called a “prune” back one hop on the ...

... can cancel a previously sent “prune” message by sending a “graft” message
to the same router. The “graft” message is propagated as far as necessary to
rejoin the sending router to the multicast tree.

3.3 Link-State Multicast Routing Protocol

The link-state routing algorithm was extended in [9] to support shortest-path
multicast routing by simply having routers include, as part of the “state” of
a link, a list of groups that have members on that link. Whenever a new group
appears or an old group disappears from a link, the designated router on that
link floods the new state to all other routers in the internetwork. Given
full knowledge of which groups have members on which links, any router can
compute the shortest-path multicast tree from any source to any group using
Dijkstra's algorithm [1]. If the router doing the computation falls within
the tree computed, it can determine which links it must use to forward copies
of multicast packets from the given source to the given group.

... that the internet is a fast-growing, enormously complex, heterogeneous
structure. In the not too distant future we expect to see huge numbers of
multicast groups in existence at any one time.
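The class D addressing and group-reporting mechanism of section 3.1 can be illustrated with a minimal Python sketch. It assumes that the class D range is the block 224.0.0.0/4, and it models one polling round of the designated router as a plain union of the hosts' reports. The names `is_class_d`, `Host` and `poll_lan` are hypothetical; they are not part of IGMP or of any library, and timers, report suppression and message formats are left out.

```python
import ipaddress

# Class D: the part of the IPv4 address space set aside for multicasting.
CLASS_D = ipaddress.ip_network("224.0.0.0/4")


def is_class_d(address):
    """Return True if the address lies in the class D (host group) range."""
    return ipaddress.ip_address(address) in CLASS_D


class Host:
    """Hypothetical end-system that joins groups on behalf of applications."""

    def __init__(self, name):
        self.name = name
        self.groups = set()

    def join(self, group):
        # A host only listens to a group after an application asks for it.
        if not is_class_d(group):
            raise ValueError(f"{group} is not a class D address")
        self.groups.add(group)


def poll_lan(hosts):
    """One query round by the designated router.

    Only member hosts report, once per group they belong to; the router
    ends up knowing which groups have at least one member on the LAN.
    """
    reported = set()
    for host in hosts:
        reported |= host.groups
    return reported
```

The router learns only which groups are present on the LAN, not which hosts are members, which is all it needs in order to decide whether to forward a group's traffic onto that LAN.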
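The link-state computation of section 3.3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm as specified in [9]: the flooded link state is reduced to a dictionary of link costs, group membership is reduced to the set of routers that have member links, and the helper names `shortest_path_parents` and `multicast_tree` are hypothetical.

```python
import heapq


def shortest_path_parents(graph, source):
    """Dijkstra's algorithm: map each reachable router to its parent on the
    shortest path from `source`. `graph[u][v]` is the cost of link u-v."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbour, cost in graph[node].items():
            nd = d + cost
            if neighbour not in dist or nd < dist[neighbour]:
                dist[neighbour] = nd
                parent[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    return parent


def multicast_tree(graph, source, member_routers):
    """Return {router: set of downstream neighbours} for the shortest-path
    tree from `source`, pruned to the routers that lead to group members."""
    parent = shortest_path_parents(graph, source)
    children = {}
    for member in member_routers:
        if member not in parent:
            continue  # unreachable member, nothing to forward
        node = member
        while parent[node] is not None:
            children.setdefault(parent[node], set()).add(node)
            node = parent[node]
    return children


# Example topology (assumed for illustration only).
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "E": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"C": 1},
    "E": {"B": 5},
}
tree = multicast_tree(graph, "A", {"D", "E"})
```

A router that appears as a key in `tree` lies within the computed tree and forwards copies on the listed links; a router that does not appear forwards nothing for this (source, group) pair. Note that the computation must be repeated for every source, which is the per-(source, group) cost the paper goes on to criticise.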
4.2 Unicast Routing Algorithm Depen...

A DVMRP router must invest a modest amount of pro...

... trees, were first described by Wall [14]. He used a sin...

... CBT approach are listed below:

... multicast packets between multicast-capable routers. A “tunnel” then, is
a sequence of routers that do not have multicast capability. A “tunnel” is
created using a technique based on loose source routing, encapsulation, or a
combination of both.

⁴ To find the topological centre of a dynamic network is NP-complete.

⁵ A router has the option to refuse a request to become part of a multicast
tree.
... but take full advantage of, underlying unicast routing, irrespective of
which underlying unicast algorithm is operating. All of the multicast tree
information can be derived solely from a router's existing unicast forwarding
tables, with no additional processing necessary. These factors result in the
CBT architecture being as robust as the underlying unicast routing
algorithm - most of which are designed with robustness as a high priority.

[Figure: incoming “multicast” packet containing core address and group-id.
Legend: core router; non-core router; path taken by multicast packet.]

secondly, once on the corresponding tree, multicast packets span the tree
based on the packet's group identifier, or group-id⁶ (similar to a class D
IP address). We consider this two-phase routing approach an important
advancement in multicasting. It has only been possible as a result of
recognizing the need for having just one tree per group. With respect to IP
networks, CBT requires no partition of the unicast address space.
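The two-phase routing described above can be sketched as follows. This is a minimal Python illustration, not the CBT implementation: the unicast forwarding table is reduced to a dictionary from core address to next-hop router, the per-group tree state to a set of neighbouring routers, and the names `Packet`, `Router` and `deliver` are hypothetical. The rule that an on-tree router does not send a packet back over the link it arrived on is an assumption made here so that the trace terminates.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    core: str      # unicast address of the group's core
    group_id: str  # identifies the tree once the packet is on it


@dataclass
class Router:
    name: str
    next_hop: dict = field(default_factory=dict)  # core -> neighbour (unicast)
    fib: dict = field(default_factory=dict)       # group_id -> tree neighbours

    def forward(self, packet, arrived_from=None):
        """Return the neighbours this router sends the packet to."""
        tree_links = self.fib.get(packet.group_id)
        if tree_links is None:
            # Phase 1: off-tree, plain unicast forwarding towards the core.
            return [self.next_hop[packet.core]]
        # Phase 2: on-tree, span the tree using the group-id alone.
        return sorted(tree_links - {arrived_from})


def deliver(routers, start, packet):
    """Trace a packet from `start`; return the routers visited, in order."""
    visited = []
    queue = [(start, None)]
    while queue:
        name, came_from = queue.pop(0)
        visited.append(name)
        for neighbour in routers[name].forward(packet, came_from):
            queue.append((neighbour, name))
    return visited


# Example (assumed topology): S and R1 are off-tree; R2, C, R3, R4 form the tree.
routers = {
    "S": Router("S", next_hop={"core": "R1"}),
    "R1": Router("R1", next_hop={"core": "R2"}),
    "R2": Router("R2", fib={"g": {"C", "R4"}}),
    "C": Router("C", fib={"g": {"R2", "R3"}}),
    "R3": Router("R3", fib={"g": {"C"}}),
    "R4": Router("R4", fib={"g": {"R2"}}),
}
path = deliver(routers, "S", Packet(core="core", group_id="g"))
```

In this trace the off-tree routers S and R1 hold no state for the group and consult only their unicast tables, while the on-tree routers consult only their per-group state. This is the sense in which one tree per group keeps group knowledge out of routers that are not on the tree.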
A group name is decided upon externally. It should be chosen so that some
information as to the nature of the group can be derived from it. The name
itself should be taken from the namespace unique to the group initiator. In
this way, name clashes are only possible within that namespace. However, such
clashes are easily avoided by simply having each name authorised by the local
system administrator. For example, the name of an audio conference initiated
at University College London could be called cbttalk.cs.ucl.ac.uk. Once a
group's name, core address(es), and group-id have been established, directory
service must be updated (in the case of DNS, which cannot be dynamically
updated by hosts, the information must be manually entered⁷).

Whenever a host wishes to join a group for which it knows only the name, it
must query directory service for both the group-id and core address
corresponding to that group name. Subsequently, the host can generate a group
membership report as part of IGMP containing the information necessary for
the local router to send a request to join the group.

⁷ There is a time delay between updating a DNS server and when the
information becomes available, as well as the potentially much longer delay
in getting DNS updated by a system administrator. To overcome these
shortcomings, each site could have a certain number of permanent groups, i.e.
core addresses, group-id, and group name would remain fixed. Whenever a host
wishes to initiate a group, it simply chooses one from a selection of these
tuples which is not in use.

⁸ We considered disseminating core identities by including them in
link-state routing updates. However, this does not provide scalability since
it involves global group information distribution. Further, it involves a
dependency on link-state routing.

6.2 Cores and Core Placement

CBT involves having a single-core tree per group, with additional cores to
add an element of robustness to the model. The cores associated with a
particular tree are collectively known as a core list. Each router in the
core list is assigned a priority or ranking by the group initiator, so the
core list is an explicitly ordered list. The highest-ranked in the list is
known as the primary core, the next-highest being the secondary core,
followed by the tertiary core, and so on.

We now outline several trivial heuristics for core placements. It is
considered likely⁹ that the placement of a core for a multicast tree will
assist in optimizing the routes between any sender and group members on the
tree. Multicast path-finding algorithms from a known source have been devised
[4] for networks of varying multicast capability, but there exists no
polynomial time algorithm that can find the centre of a dynamic multicast
spanning tree. We consider this a topic requiring further research.

Cores could be statically configured throughout the internet - there need
only be some relatively small number of cores per backbone network¹⁰, and
the addresses of these cores would be “well-known”.

Alternatively, and possibly more appropriately, any router could become a
core when a host on one of its attached subnets wishes to initiate a group.
This is particularly attractive for a one-to-many “broadcast” where the
sender remains constant, since, if the sender is the core, the multicast tree
formed will be a shortest-path spanning tree rooted at the sender.

We have stressed that the placement of a group's core should positively
reflect that group's characteristics. In the absence of a dynamic core
placement mechanism, a core should be “hand-picked”, i.e. selected by
external agreement based on a judgement of what is “known”¹¹ about the
network topology between the current members.

⁹ Yet to be proven by experiments.

¹⁰ The storage and switching overhead incurred by these core routers
increases linearly with the number of groups traversing them. A threshold
value could be introduced indicating the maximum number of groups permitted
to traverse a core router. Once exceeded, additional core routers would need
to be assigned to the backbone.

¹¹ alleged to be known

6.3 CBT Forwarding Algorithm

We distinguish between CBT control packets and multicast data packets. CBT
control packets do not carry user data and are primarily concerned with tree
building, re-configuration, or tree teardown. Multicast data packets on the
other hand, carry only user data to/on a tree once the tree has been
established. CBT control packets and multicast data packets travel in IP
datagrams. CBT control packets are forwarded as per unicast and are processed
at each hop. Therefore the IP destination of control packets is always set to
the next-hop on the path to the corresponding core.

● CBT data packet forwarding algorithm. Multicast data travels in an IP
packet. Therefore, CBT-capable routers must have a way of recognizing when
multicast data packets are being carried in an IP packet. To make this
possible, data packets destined for a particular group tree carry the group
core address in the “destination field” and the group-id in the “option”
field of the IP packet's header. This implies the need for a new IP “options
number” to be defined for CBT.

On arriving at an on-tree router, the core address in the destination address
field of the IP header is discarded, and the group-id in the option field is
placed in the destination address field. The reasons for this are twofold:
firstly, it allows for faster on-tree switching, since there is no option
processing necessary; secondly, since multicast-capable (non-CBT) routers
process all multicast packets by default, CBT multicast packets are prevented
from
being forwarded (unicast) by these routers, to the core¹².

CBT routers forward arriving multicast data packets based on information
contained in their CBT Forwarding Information Base (FIB). The FIB contains a
list of interfaces for each group with which it is associated. A multicast
data packet is forwarded on the corresponding outgoing interfaces.

¹² Interoperability has been an important design goal throughout the
development of CBT. For this reason alone, we decided that the group-id
should be selected from the class D IP address space.

6.4 Protocol Overview

CBT routers will continue to use IGMP to monitor group membership on their
directly attached subnetworks. CBT will also continue to use both the “all
hosts” and the “all routers” addresses, and therefore these two addresses
must remain reserved. At the link level, CBT multicasts will appear identical
to existing IP multicasts. Taking Ethernet as an example, currently the
low-order 23 bits of the destination address field are mapped directly onto
the low-order 23 bits of the special Ethernet multicast hardware address. CBT
will continue to do this, except the destination address will not contain a
class D address. However, this should not affect link-level multicasting at
all.

CBT then, is an overlay of the underlying unicast routing algorithm. Its
position and relationship to other IP protocols is shown in Figure 2.

[Figure 2: Protocol Relationships. Labels: IP Service Interface; ICMP, CBT,
IGMP; IP Module; Local Network Service Interface.]

CBT operation is invoked whenever a router receives a group membership report
as part of IGMP from some host on a directly attached subnetwork. A group
membership report contains the group-id. A group membership report may also
have the core list “piggybacked” on it. Alternatively, the core list is
transmitted separately to the local router¹³. Once the local router has
received the core list, it can proceed to send a JOIN-REQUEST.

A JOIN-REQUEST includes, as part of its header, the 32-bit group-id, and an
“active” bit which is set before sending a join. The relevance of the
“active” bit will be explained in section 6.6.1. The JOIN-REQUEST is then
forwarded to the next-hop router on the path to the core, as determined by
the unicast forwarding table. Section 6.7.2 discusses what happens if there
is no such next-hop.

The join continues its journey until it either reaches the addressed core, or
reaches a CBT-capable router that is already part of the tree, as identified
by the group-id. At this point, the join's journey is terminated by the
receiving router, which then decides whether to acknowledge the join request.
A JOIN-REQUEST is normally acknowledged by means of a JOIN-ACK¹⁴.

All the CBT-capable routers traversed by a JOIN-ACK change their status to
CBT-non-core routers for the group identified by group-id. It is the JOIN-ACK
that actually creates a tree branch. Between sending a JOIN-REQUEST and
receiving a JOIN-ACK, a router is in a state of pending membership. A router
that is in the join pending state cannot send join acknowledgements in
response to other join requests received for the same group, but rather
caches them for acknowledgement subsequent to its own join acknowledgement.
Furthermore, if a router in the pending state gets a better route to the core
to which its join was sent, it sends a new join on the better route after
canceling its previous join (this is required to deal with unicast transient
loops).

Each router on a CBT tree records its parent and child interfaces with
respect to a particular tree, i.e. group. The parent interface is that over
which a JOIN-ACK was received. A non-core router can have only one parent
interface per CBT tree. A child interface is one over which a JOIN-ACK has
been forwarded with respect to a particular group¹⁵. A non-core router may
have multiple child interfaces per CBT tree.

A QUIT-REQUEST is a request by a non-core router to leave a group. A
QUIT-REQUEST may be sent by a router to detach itself from a tree if and only
if it has no members for that group on any directly attached subnets, AND it
has received a QUIT-REQUEST on each of its child interfaces for that group.
The QUIT-REQUEST is sent to the parent router. The parent immediately
acknowledges the QUIT-REQUEST with a QUIT-ACK and removes that child
interface from the tree. Any non-core router that sends a QUIT-ACK in
response to receiving a QUIT-REQUEST should itself send a QUIT-REQUEST
upstream if the criteria described above are satisfied.

Failure to receive a QUIT-ACK despite several retransmissions gives the
sending non-core router the right to remove the relevant parent interface
information, and by doing so, removes itself from the CBT tree for that
group.

¹³ On a multi-access LAN one router will be elected as designated router.

¹⁴ However, a negative acknowledgement (JOIN-NACK) may be sent in reply to a
join request for various reasons, for example, in the event of loop
detection, or simply because the router does not wish to become part of a
CBT tree for that group.

¹⁵ And over which no QUIT-ACK has been sent (see below).

● a join is sent over a transient loop, but no part of the corresponding CBT
tree forms part of that loop. In this case, the join will never get
acknowledged and will therefore timeout. Subsequent retries will succeed
after the transient loop has disappeared.

● a join is sent over a transient loop, and the loop consists either partly
or entirely of routers on the corresponding CBT tree. If the loop consists
only partly of routers on the tree and the join originated at a router that
is not attempting to re-join the tree, then the JOIN-REQUEST will be
acknowledged. No further action is necessary since a loop-free path exists
from the originating router to the tree.

If the loop consists entirely of routers on the tree, then the router
originating the join is attempting to re-join the tree. In this case also,
the join could be acknowledged which would result in a loop forming on the
tree, so we have designed a loop-detection mechanism which is described
below.

For reasons of robustness, we need to consider what happens when a primary
core fails. There are two approaches we can take, namely have:

● single-core CBT trees. Paths as well as cores themselves can fail, which
may result in parts of the network being partitioned from others. In the
presence of CBT trees this can mean a single tree can itself become
partitioned. To cater for tree partitions, we have multiple “backup” cores to
increase the probability that every network node can reach at least one of
the cores of a CBT tree. At any one time, a non-core router is part of a
single-core CBT tree.
● multi-core CBT trees. Multi-core CBT trees are most useful for groups that
are topologically widespread. Each core is then strategically placed where
the largest “pockets” of members are located so as to optimize the routes
between those members. Each of the cores must be joined to at least one
other, and a reachability/maintenance protocol must operate between them.
There exists no ordering between the multiple cores, and senders send
multicasts preferably to the nearest core. The essential difference between
multi-core and single-core trees is that single-core trees have no explicit
protocol operating between the “backup” cores (even though the cores are
joined together), and, for any sender at any instant, there is one core to
which it must attempt to send its multicast packets.

Although it may seem advantageous to have multi-core CBT trees as opposed to
single-core trees, complex failure scenarios and the overhead of an explicit
protocol operating between the cores have led us to the conclusion that
management of multi-core CBT trees is too complex to justify. Our choice of
having, in the worst case, multiple partitioned single-core trees per group,
offers a trade-off between complexity and robustness. We feel that our design
offers an acceptable level of robustness.

6.7.2 Robust Single-Core Trees

To expand on our approach, we are presenting a model based on robust
single-core trees. Although our design provides each tree (group) with
multiple cores, which join each other at group initiation time, it is the
primary core that is considered the “central hub” of a tree, with the
additional cores simply providing an element of robustness to the design.
This allows senders of multicast packets more than one destination option,
thus increasing the probability that every network node can “see” a
particular tree.

At group initiation time, each core in a group's core list must attempt to
join the primary core for the group. “Cycles” are avoided between the cores
by having each core set its “re-join” bit prior to sending a JOIN-REQUEST. A
non-primary core that cannot reach the primary must attempt to join the
highest-ranked core that is reachable from it.

A core that has not (yet) been successful in joining the primary core, and
which receives multicast data packets, sends an ICMP cbt-redirect to the
originator of those packets. An ICMP cbt-redirect prevents data packets
disappearing down “black holes” by indicating to the sender to target its
packets elsewhere, but giving no precise indication as to which core the
sender should ...

... can be the targets of CBT control packets and multicast data packets, as
well as being able to generate ICMP cbt-redirects (which non-core routers
cannot do).

All senders should direct multicast packets (and control packets) at the
highest-priority reachable core. Under normal, failure-free circumstances
this will be the primary core router. The reason for this is to force the
emphasis of the tree onto just one router.

If the primary core should fail, the recovery scenario is the same as that
described in section 6.5.

7 Future Work

Work still needs to be done in the following areas:

● Testing and analysis. Once a prototype implementation is in place, it will
be important to establish the answers to some frequently-asked questions. For
example: how will the shape of a CBT tree affect performance/delay
characteristics between members; what are per node switching overheads/delays
on and off-tree; how robust is CBT in the presence of link failures, and how
quickly can CBT adapt to such failures?

● Dynamic core selection and placement. Our scheme, presented in section 6.2,
may involve management overhead to re-select and re-place the core(s) for
those groups whose characteristics vary considerably over the lifetime of the
group. Therefore, dynamic core selection and placement algorithms/heuristics
and mechanisms need to be developed. These could contribute to the optimality
of core-based trees.

● Dynamic group-id assignment. We consider this complementary to dynamic core
selection and placement. As multicast capability has been developing and the
number of multicast applications growing, the issue of dynamic multicast
group address¹⁶ assignment has very much been left in the background. The
failure to address this problem¹⁷ means that group address conflicts are more
and more likely to occur.

We consider the development of a dedicated multicast directory service which
would be responsible for all aspects of group management, a long term goal of
internetwork multicasting in general. A similar idea has recently been
suggested in [13].

¹⁶ For the purpose of our discussion, in this section read group ...

● Scoping. Refined scoping mechanisms need to be developed. We consider the
cores of core-based trees to be focal points with respect to scoping. As
such, we envisage a core-based tree spanning multiple ASs to be made up of a
hierarchy of concatenated single core trees, thus allowing for scoping within
each sub-tree. A sub-tree may comprise any size from a LAN to an AS, or even
multiple ASs.

● Policy Routing and Multicasting. Policy routing [2] and multicasting have
so far evolved independently of each other. As the internet becomes more
policy oriented, it has yet to be investigated what effects this will have on
multicasting. If internet-wide AD policies are going to be dynamic and
wide-ranging, group membership could be severely constrained resulting in a
lack of openness - a desirable property of internetwork multicasting.

● CBT Security Architecture. Security services such as authentication and
encryption have yet to be incorporated into the CBT architecture. The
architecture permits a number of different security approaches. These will be
investigated by the author in due course.

8 Summary

In this paper we have discussed the existing multicast architecture, and
current IP multicast algorithms. We have shown the current architecture,
based on multicast trees rooted at each sender, to scale poorly in an
internet consisting of hundreds of thousands of multicast groups at any one
time. Furthermore, existing multicast schemes were not designed to operate in
a heterogeneous unicast routing environment and therefore do not operate
“invisibly” across or inside heterogeneous domains. Subsequently we presented
a new multicast routing protocol for IP based on a different architecture
that builds core-based multicast trees, one per group. This implies that
routers not on a multicast tree need know nothing about the existence of
groups of which they are not a member, and therefore can route packets from
non-member senders to the tree as per unicast. Routers on a multicast tree
need only know their immediate uptree and downtree neighbour routers - a
minimal amount of information with respect to a group. The architecture is
thus scalable, low-cost, and efficient. It can be applied to networks and
technologies other than IP.

Work is continuing to refine CBT and eliminate potential weak points where
possible. A CBT implementation is planned shortly, after which we can
realistically assess its performance, especially when compared with existing
multicast techniques.

Acknowledgments

There are several people whom we would especially like to thank for their
valuable suggestions and contributions to CBT. Ian Wakeman (UCL) made the
suggestion to separate the group identifier from the IP core address. We
considered this an important design issue offering several advantages over
the original design. Benny Rodrig (RAD Network Devices, Ltd) suggested the
loop detection mechanism of forwarding a special packet uptree. Zheng Wang
(UCL) proposed swapping the group-id and destination (core) address once
on-tree. We also thank Joel Halpern (Network Systems Corporation), Steve
Deering (Xerox PARC), and Scott Brim (Cornell University) for their general
constructive comments and suggestions since the conception of CBT.

References

[1] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. Data Structures and
Algorithms. Addison-Wesley, Reading, Mass., U.S.A., 1983.

[2] Lee Breslau and Deborah Estrin. Design and Evaluation of Inter-Domain
Policy Routing Protocols. Internetworking: Research and Experience,
2:177-198, September 1991.

[3] Scott Brim and John Moy. Support for Multicast Communications Across
Wide-Area Networks. High Performance Network Research Report, Cornell Univ.,
June 1992.

[4] Ching-Hua Chow. On Multicast Path Finding Algorithms. In Infocom,
Conference on Computer Communications, pages 1274-1283. IEEE, April 1991.

[5] D. Waitzman, C. Partridge, and S. Deering. RFC 1075, Distance Vector
Multicast Routing Protocol. SRI Network Information Center, November 1988.

[6] Y. K. Dalal and R. M. Metcalfe. Reverse Path Forwarding of Broadcast
Packets. Communications of the ACM, 21:1040-1048, December 1978.

[7] Martin de Prycker. Asynchronous Transfer Mode. Ellis Horwood Limited,
Chichester, England, 1991.

[8] S. E. Deering. Multicast Routing in Internetworks and Extended LANs. In
ACM Symposium on Communication Architectures and Protocols, pages 55-64. ACM
SIGCOMM, August 1988.

[9] S. E. Deering. Multicast Routing in a Datagram Internetwork. PhD thesis,
Stanford University, California, U.S.A., 1991.

[10] S. E. Hardcastle-Kille. RFC 1279, X.500 and Domains. SRI Network
Information Center, September 1991.

[11] M. Schwartz, A. Emtage, B. Kahle, and B. Neuman. A Comparison of
Internet Resource Discovery Approaches. Computing Systems, 5(4):461-493,
Fall 1992.

[12] D. Piscitello. RFC 1209, The Transmission of IP Datagrams over the SMDS
Service. SRI Network Information Center, March 1991.