
Congestion Control

Alexander Küchler, Matthieu-Patrick Schapranow


Alexander.Kuechler@hpi.uni-potsdam.de, Matthieu.Schapranow@hpi.uni-potsdam.de
CONGESTION describes a situation of extensive resource use in which demand exceeds the available capacity. This phenomenon was expected to disappear with the introduction of more high-end data links, but it is still omnipresent in modern data networks: the transported data volume increases rapidly and users suffer from bottlenecks on the routes taken by their data packets.
Therefore, it becomes more and more important to minimize the risk of congestion and to find ways to eliminate it if it occurs. Hence, this paper classifies elementary types of congestion and points out how to control it by implementing well-known algorithms on either the client or the server side. On the one hand these include ways of preventing congestion from occurring; on the other hand, they offer fast mechanisms to clean up congested network nodes without network-wide starvation.
A paper associated with the seminar

COMMUNICATION NETWORKS
Dr.-Ing. Thi-Thanh-Mai Hoang and Dr.-Ing. Andreas Willig
winter semester 2004/2005
Seminar Communication Networks 9-1
Contents
1 Introduction
  1.1 Definition
  1.2 Possible Solutions
    1.2.1 Increase of Resources
    1.2.2 Decrease of Load
  1.3 Congestion Control vs. Flow Control
  1.4 Classification of Congestion Control Algorithms
2 Host Centric Algorithms
  2.1 Open Loop
    2.1.1 Open Loop and Source Driven
      2.1.1.1 Traffic Shaping
      2.1.1.2 Leaky Bucket
      2.1.1.3 Token Bucket
    2.1.2 Open Loop and Destination Driven
  2.2 Closed Loop
    2.2.1 Closed Loop and Implicit Feedback
      2.2.1.1 Slow Start
      2.2.1.2 Congestion Avoidance
    2.2.2 Closed Loop and Explicit Feedback
      2.2.2.1 Choke Packets
      2.2.2.2 Fast Retransmit
      2.2.2.3 Fast Recovery
      2.2.2.4 Fast Retransmit combined with Fast Recovery
3 Router Centric Algorithms
  3.1 Congestion Collapse
  3.2 Small Packet Problem
  3.3 Router Processed
    3.3.1 Weighted Fair Queuing
    3.3.2 Load Shedding
    3.3.3 Random Early Detection (RED)
  3.4 Router Indicated
    3.4.1 ICMP Source Quench
    3.4.2 Explicit Congestion Notification (ECN)
4 Conclusion
5 Glossary
List of Figures
1 Classification of Congestion Control Algorithms
2 Leaky Bucket Algorithm
3 Token Bucket Algorithm
4 TCP's Slow Start / Congestion Avoidance combination
5 Time-cwnd-diagram: Slow Start / Congestion Avoidance
6 Usual implementation of Choke Packets
7 Choke Packets hop-by-hop scenario
8 Time-cwnd-diagram: Fast Retransmit
9 Time-cwnd-diagram: Fast Recovery
10 Time-cwnd-diagram: Fast Retransmit combined with Fast Recovery
11 Packet Flow: Fast Retransmit combined with Fast Recovery
12 Circuit diagram: Weighted Fair Queuing
13 Load Shedding Algorithm: Dropping packet seven of twelve
14 Load Shedding Algorithm: Dropping packet ten of twelve
15 Petri net: Random Early Detection
1 Introduction
1.1 Definition
A situation is called congestion if performance degrades in a subnet because too many data packets are present, i.e. the traffic load temporarily exceeds the offered resources.
Ideally, the number of packets delivered is proportional to the number of packets sent. But if traffic increases too much, routers are no longer able to handle all of it and packets get lost. With further growing traffic the subnet collapses and no more packets are delivered.
Obviously, two naive solutions are possible: an increase of resources or a decrease of load.
1.2 Possible Solutions
1.2.1 Increase of Resources
An increase of resources can be achieved by stocking up router memory to provide queueing for all input lines feeding one output line. Furthermore, embedding a faster processor that handles background tasks at least as fast as it can produce output is another possibility. Last but not least, higher bandwidth can be a factor in avoiding congestion, too.
But upgrading only some components merely shifts the bottleneck, which is why all components need to be balanced. Because of technical limits it is not possible to increase the resources infinitely (and even if it were possible, it would not avoid congestion), so it is necessary to decrease the load.
1.2.2 Decrease of Load
A real decrease of load in a subnet is only possible by encouraging hosts to reduce their outgoing traffic. Since this idea is not practicable in general, it is necessary to decrease the load at chosen single points. To decrease a router's load it is useful to tell other routers to forward packets along another path that avoids the heavily loaded router.
1.3 Congestion Control vs. Flow Control
Ensuring that all traffic is carried in a subnet is called CONGESTION CONTROL, and controlling the point-to-point traffic between a sender and a receiver is called FLOW CONTROL.
Congestion Control involves all hosts, routers, store-and-forward processes and other factors that affect the subnet's capacity. Flow Control should slow down a sender which is trying to send more than the receiver can deal with. Some Congestion Control Algorithms also implement some kind of slow-down messages, so Flow Control and Congestion Control are unfortunately intermixed.
1.4 Classification of Congestion Control Algorithms
It is practical to divide Congestion Control Algorithms into two main classes describing the place where they influence the network's behavior, as described in [Zha86] and shown in figure 1. This is either on the hosts' side, establishing end-to-end Congestion Control (HOST CENTRIC), or on the routers' side, affecting transferred data packets (ROUTER CENTRIC).
Figure 1: Classification of Congestion Control Algorithms
The Host Centric class is characterized by a high level of abstraction in the network model, so that intermediate network nodes are considered a static and transparent connection channel without any influence on the network's behavior.
However, algorithms of the Router Centric class involve each network actor as an active part of the Congestion Control process, thus more dynamic control can be guaranteed by implementing algorithms up to the Network Layer of the ISO/OSI model.
Four classes of Host Centric Congestion Control Algorithms are discussed by Yang and Reddy in [YR95].
On the one hand, there is the simple static solution called OPEN LOOP, which prevents congestion by understating the possible bandwidth on the sender's side. That means any network client uses only a part of the available network bandwidth instead of bursting
the whole data as fast as possible through the network. Certainly, this reduces the throughput, but it is a simple way to prevent congestion from developing. Algorithms of this type can be implemented on either the source's or the destination's side, so this class is subdivided into source driven and destination driven approaches.
On the other hand, there is the more dynamic method to prevent congestion called CLOSED LOOP. These algorithms adjust system preferences depending on the individual network state by gathering facts about already detected or soon-to-appear congestion situations. This status information can be collected by explicit receiver messages as well as implicit polling initiated by the sender to check whether a route is congested or not.
Additionally, Router Centric Congestion Control Algorithms can be divided into two subclasses. These classes specify the reaction in case of congestion: Router Processed Algorithms do active congestion handling such as packet dropping, whereas Router Indicated Algorithms signal the congestion state and do not influence the condition directly.
2 Host Centric Algorithms
2.1 Open Loop
In the first place, Open Loop Algorithms try to avoid congestion without making any corrections once the system is up. Essential points for Open Loop solutions are, e.g., deciding when to accept new traffic, when to discard which packets, and making scheduling decisions. All of these decisions are based on a sensible system design, so they do not depend on the current network state.
A sender has to determine how many packets can be sent without provoking congestion. The receiver has to decide carefully which packets to discard, because dropping any packet can cause considerable data retransmission and this will result in additional network load.
2.1.1 Open Loop and Source Driven
2.1.1.1 Traffic Shaping
TRAFFIC SHAPING is a generic term for a couple of algorithms avoiding congestion on the sender's side without feedback messages. Therefore, an essential parameter - the data rate - is either negotiated at connection set-up or statically included in the used implementation.
Afterwards, this negotiated data rate is held and variations are negligible. This method can be found especially in ATM telecommunication networks, implemented for example as Leaky Bucket (2.1.1.2) or Token Bucket (2.1.1.3). But it potentially creates latency, which is problematic for some applications, such as real-time audio and video.
2.1.1.2 Leaky Bucket
The LEAKY BUCKET Algorithm generates a constant output flow. The name describes the way it works: like a bucket of water with a leak at the bottom, as shown in figure 2.
How much water runs into the bucket does not matter. As long as there is any water left in the bucket, it runs out at the same constant rate defined by the leak's size. Obviously, if there is no water in the bucket there is no output. If the bucket is completely filled, additional incoming water is lost.
This metaphor reflects typical network behavior where drops of water are data packets and the bucket is a finite internal queue sending one packet per clock tick.
Figure 2: Leaky Bucket Algorithm
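The bucket metaphor can be made concrete as a finite queue drained at a constant rate. The following Python model is purely illustrative; the class and parameter names are our own and not taken from any referenced implementation:

```python
from collections import deque

class LeakyBucket:
    """Finite queue that emits at most `rate` packets per clock tick.

    Packets arriving while the queue is full are dropped, mirroring
    water overflowing the bucket.
    """

    def __init__(self, capacity, rate=1):
        self.queue = deque()
        self.capacity = capacity
        self.rate = rate
        self.dropped = 0

    def arrive(self, packet):
        if len(self.queue) < self.capacity:
            self.queue.append(packet)
        else:
            self.dropped += 1  # bucket overflow: packet is lost

    def tick(self):
        """One clock tick: release up to `rate` queued packets."""
        out = []
        for _ in range(min(self.rate, len(self.queue))):
            out.append(self.queue.popleft())
        return out

# A burst of 5 packets into a bucket of capacity 3: two are dropped,
# the rest leave at one packet per tick.
bucket = LeakyBucket(capacity=3)
for p in range(5):
    bucket.arrive(p)
sent = [bucket.tick() for _ in range(4)]
```

Note how the output rate stays constant regardless of the burstiness of arrivals, which is exactly the shaping property described above.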
2.1.1.3 Token Bucket
The TOKEN BUCKET Algorithm is a variation of the aforementioned LEAKY BUCKET Algorithm (2.1.1.2).
The intention is to allow temporarily high output bursts if the origin normally does not generate huge traffic. One possible implementation uses credit points or tokens which are provided at a fixed time interval. These credit points can be accumulated up to a limited number (= bucket size) in the bucket. When submitting data these credits have to be taken from the bucket, i.e. one credit is consumed per data entity (e.g. one byte or one frame) that is injected into the network. If the credit points are used up (the bucket is empty), the sender has to wait until it gathers new tokens within the next time interval.
This is illustrated in figure 3 by trying to inject five data entities into the network (a) with three available credit points. After transmitting three of five data entities in this time tick, no more credits are available, thus no more data entities are injected into the network (b) until new credits are accumulated with the next time tick.
This algorithm provides a relative priority system. On the one hand, it allows sending small data bursts immediately, which typically do not congest networks. On the other hand, unlike LEAKY BUCKET (2.1.1.2), this algorithm will not drop any packets on the sender's side: if no further tokens are available in the bucket, any sending attempt is blocked until a new token becomes available.
Figure 3: Token Bucket Algorithm
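The token accounting can be sketched as follows (again an illustrative model with names of our own choosing; here one token buys one data entity, and blocked entities simply remain with the sender):

```python
class TokenBucket:
    """Accumulates `refill` tokens per tick up to `size`; sending is
    limited to the available tokens instead of dropping packets."""

    def __init__(self, size, refill):
        self.size = size
        self.refill = refill
        self.tokens = size  # start with a full bucket

    def tick(self):
        """A new time interval provides fresh tokens, capped at `size`."""
        self.tokens = min(self.size, self.tokens + self.refill)

    def send(self, entities):
        """Try to send `entities` units; return how many were sent."""
        sent = min(entities, self.tokens)
        self.tokens -= sent
        return sent

# Figure 3's scenario: five entities, three tokens available.
bucket = TokenBucket(size=3, refill=3)
first = bucket.send(5)        # three entities go out, two must wait
bucket.tick()                 # the next tick refills the bucket
second = bucket.send(5 - first)
```

The two remaining entities are transmitted in the following tick, reproducing steps (a) and (b) of figure 3.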
2.1.2 Open Loop and Destination Driven
Algorithms belonging to this group can be identified by their static behavior: once these implementations are running, they work regardless of how the network's state changes. That means congestion is avoided from the receiver's side because of a well-formulated specification. The question is what such algorithms may look like. They use the receiver's capabilities to influence the initial sender's behavior without any explicit indication.
For instance, one possible implementation could advertise a smaller window (awnd) size in TCP headers than is really possible in order to throttle down the sender's output. Another idea could be to delay ACK packets by a fixed time, which must be clearly below the sender's timeout including the needed network latency. But it is really difficult to determine this delay time statically in more or less dynamic network topologies such as the Internet. Furthermore, the receiver's influence on the sender is only an advice, which can be ignored. Therefore, algorithms belonging to this group are no longer important for development and research, thus no explicit examples will be given in this paper.
2.2 Closed Loop
Closed Loop solutions are the network implementation of a typical control circuit. Algorithms of this class depend on a feedback loop with three parts:
1. monitor the system to detect when and where congestion occurs,
2. pass this information to a point where action can be taken, and
3. adjust system operation to deal with the congestion.
To detect congestion it is useful to monitor network values like the percentage of packets discarded for lack of memory, the number of timed-out and therefore retransmitted packets, and average queue lengths as well as packet delays such as round trip times.
The gathered information has to be sent from the nearly congested point to the responsible party. So it is necessary to send this information, and with these messages the traffic increases even more, which itself encourages congestion to occur.
The main goal of Closed Loop solutions is slowing down the routers sending packets, collecting packets in their own queues, in order to reduce and even break down congestion.
2.2.1 Closed Loop and Implicit Feedback
2.2.1.1 Slow Start
The Slow Start Algorithm as described in [Ste97, section 2] and [APS99, section 3.1] tries to avoid congestion by sending data packets defensively. Therefore, two special variables named congestion window (cwnd) and Slow Start threshold (ssthresh) are stored on the sender's side.
Initially, cwnd is sized to one packet when the sender injects a new packet into the network and waits for the acknowledgment (ACK) from the receiver. Normally, this packet gets through the network and reaches the recipient in time, so it will be answered by an ACK.
If this acknowledgment is received by the sender, cwnd is incremented; if the network capacity is reached and packets get lost, the sender does not increment the number of packets any further. That means with each sending cycle the number of injected data packets is doubled until the network's capacity is reached and the required ACK cannot get through. More precisely, in TCP the minimum of cwnd and TCP's advertised window size specifies the number of data packets to be injected. If the required ACK packets do not reach the sender within a specified timeout, the sender interprets this as evidence of congestion. Therefore, the sender will set cwnd back to its initial value and restart data transmission as aforementioned.
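The sender-side state changes can be sketched in a few lines. This is a deliberately minimal model (the window is counted in whole packets, the advertised window is ignored, and the function names are ours), not the exact TCP implementation:

```python
INITIAL_CWND = 1  # congestion window, counted in packets

def on_window_acked(cwnd):
    """Slow Start: a fully acknowledged window doubles cwnd."""
    return cwnd * 2

def on_timeout(cwnd):
    """A missing ACK is taken as evidence of congestion: start over."""
    return INITIAL_CWND

cwnd = INITIAL_CWND
history = []
for _ in range(4):              # four successful sending cycles: 1, 2, 4, 8
    history.append(cwnd)
    cwnd = on_window_acked(cwnd)
cwnd = on_timeout(cwnd)         # a timeout resets cwnd to one packet
```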
2.2.1.2 Congestion Avoidance
This algorithm, defined in [Ste97, section 2] and [APS99, section 3.2], is used in combination with Slow Start, since the exclusive use of Slow Start produces fluctuating data rates and loads the network additionally. Figure 4 shows that the Slow Start threshold (ssthresh) is set to ssthresh = cwnd_max / 2 if the cwnd size exceeds the network capabilities, while the congestion window is set back to its initial value [Jac88].
Figure 4: TCP's Slow Start / Congestion Avoidance combination
Afterwards, CONGESTION AVOIDANCE starts to work: Slow Start increases the packet sending rate exponentially until the optimal Slow Start threshold (ssthresh) value is reached. On reaching this threshold level, the congestion window is increased linearly, i.e. the rate grows as slowly as necessary and as fast as possible until the maximum network capabilities are reached, that means no further ACKs can get through the network to the sender.
Figure 5 reflects the typical connection set-up process generated by the algorithm pair Slow Start / Congestion Avoidance in TCP. The graph is divided into three segments: from time tick zero to four, from four to eight, and from eight to the end. Starting in segment one, the initial cwnd size is one and the value doubles each clock tick until timeouts occur caused by congestion at clock tick number four, where cwnd reaches
the size of sixteen packets. Hence, the optimal Slow Start threshold is calculated as ssthresh = cwnd / 2 = 8 and the actual cwnd size is set back to its initial value of one. During segment two, Slow Start begins to work again until cwnd reaches its optimal window size of eight at time tick eight. Therefore, the algorithms change and Congestion Avoidance continues the work by increasing cwnd linearly by one each tick (or by any other specified value) in segment number three. On reaching the maximal possible cwnd, here sixteen, Congestion Avoidance stops increasing cwnd and transmits stably at this rate until further timeouts occur and the described algorithm pair starts again.
Figure 5: Time-cwnd-diagram: Slow Start / Congestion Avoidance
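The cwnd trajectory of figure 5 can be reproduced with a few lines of Python. The sketch below (the function name and per-tick granularity are our simplification) grows cwnd exponentially below ssthresh and linearly above it:

```python
def cwnd_trace(ssthresh, capacity, ticks):
    """Per-tick cwnd sizes: Slow Start (doubling) below ssthresh,
    Congestion Avoidance (plus one) above it, capped at capacity."""
    cwnd, trace = 1, []
    for _ in range(ticks):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, capacity)   # exponential phase
        else:
            cwnd = min(cwnd + 1, capacity)   # linear phase
    return trace

# ssthresh = 8 and a network capacity of 16 packets, as in
# segments two and three of figure 5:
trace = cwnd_trace(ssthresh=8, capacity=16, ticks=12)
```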
2.2.2 Closed Loop and Explicit Feedback
2.2.2.1 Choke Packets
Figure 6: Usual implementation of Choke Packets
The Choke Packets approach interprets the whole network as an active part of Flow Control. Each network actor has its own maximum throughput rate, and if it is exceeded, so-called CHOKE PACKETS are sent back to the origin. These specially marked packets prevent further network nodes from generating equal Choke Packets and thus prevent duplicate feedback. Furthermore, after a Choke Packet reaches the initial sender, the sender throttles down its output rate to an adequate level for as long as congestion is reported by further Choke Packets.
The best known Choke Packet is the Source Quench Internet Control Message Protocol (ICMP) message, generated by busy routers, as described in 3.4.1.
Moreover, expansive networks such as the Internet or large LANs nowadays contain numerous nodes between sender and recipient, so that the latency is proportional to the number of nodes among them. It is not exceptional to have twelve or sixteen nodes in-between sender and recipient on a network, and if the last node starts to suffer from congestion, the feedback information in form of Choke Packets travels n - 1 hops back to the source as illustrated in figure 6. Rectangles are network nodes, the dashed one on the right side is congested, and circles in-between are communication channels; the
dark colored rectangles carry a high data rate and the light ones carry a choked data rate; filled circles contain Choke Packets.
Meanwhile, the sender injects data into the network unaffected until the Choke Packet arrives and the output can be choked dramatically. But the already sent packets still travel to the congested node, so that 2n - 1 packets arrive and contribute to congestion at the loaded node, with n indicating the position of the congested router.
A possible solution to minimize this latency is to allow each network node to buffer data arriving from the sender once a Choke Packet has already passed, as illustrated in figure 7. This affects the data flow directly after each node a Choke Packet passes, so the throughput is decreased immediately and the place of congestion moves hop-by-hop to the sender and finally disappears there completely.
Figure 7: Choke Packets hop-by-hop scenario
At this point, a typical congestion scenario with standard Internet routers in homogeneous networks is described. A small LAN containing three clients is connected through a Network Address Translation (NAT) router to the Internet Service Provider (ISP). Two of the three clients sporadically call some websites, but the third one permanently transfers high amounts of data. If any of the harmless clients tries to transfer some data, it will get Choke Packets in response. This is caused by the third, bursting client and is obviously unfair because the affected clients are not responsible for the congestion.
Therefore, it is possible to use a specialized version of this algorithm characterized by one queue per client, so that only the third client would receive Choke Packets and the two others would be able to transfer small amounts of data unaffected. Those types of Router Centric Algorithms are summarized as ACTIVE QUEUE MANAGEMENT and are partially described in the context of Weighted Fair Queuing (cf. 3.3.1).
2.2.2.2 Fast Retransmit
The FAST RETRANSMIT Algorithm uses explicit feedback to avoid long timeout periods waiting for packet retransmission in case of packet loss.
Such problems are inherent in packet-switched data networks because every data packet can travel individually through the network and can take its own route from the sender to the recipient. Consequently, the transmitted data packets will neither reach the recipient in order nor in one continuous stream.
Therefore, after detecting a missing packet, the recipient sends duplicate ACK packets for the last correctly received packet until the missing packet arrives. Unfortunately, TCP may also produce duplicate ACK packets for out-of-order packets, thus two ACK packets do not necessarily indicate a lost packet. Therefore, if a sender receives multiple ACK packets with the same sequence number, normally at least three of them, these packets indicate the last successfully submitted packet. Furthermore, the presence of these ACK packets underlines the absence of congestion, because otherwise these packets could not have been received either. Thus, the sender restarts the transmission with the packet specified by the duplicate ACK packets. This results in fast retransmission of outstanding data without waiting for timers to expire.
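The detection rule can be sketched as a small scanner over the stream of cumulative ACK numbers arriving at the sender. The function and threshold names are ours; a real TCP keeps this state incrementally per connection:

```python
DUP_ACK_THRESHOLD = 3  # the third duplicate ACK triggers retransmission

def detect_loss(acks):
    """Scan a stream of cumulative ACK numbers; return the last
    successfully acknowledged number once three duplicates of it are
    seen (the packet after it is retransmitted), else None."""
    last, dups = None, 0
    for ack in acks:
        if ack == last:
            dups += 1
            if dups >= DUP_ACK_THRESHOLD:
                return ack
        else:
            last, dups = ack, 0
    return None

# Packet 5 is lost: the receiver keeps ACKing 4, so the sender
# retransmits packet 5 without waiting for a timer.
lost_after = detect_loss([1, 2, 3, 4, 4, 4, 4])
```

A single duplicate, as produced by mere reordering, does not trigger anything; only the third repetition is treated as evidence of loss.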
Figure 8: Time-cwnd-diagram: Fast Retransmit
Figure 8 illustrates that Fast Retransmit does not change the data output characteristics. After starting with Slow Start at time tick zero, one packet is not acknowledged. At time ticks five, six and seven the corresponding red-marked ACK duplicates arrive at the sender. The third duplicate ACK triggers Fast Retransmit, so a half Slow
Start cycle continuing with Congestion Avoidance is executed. This behavior is identical to Slow Start / Congestion Avoidance as described in 2.2.1.2. Therefore, it is not necessary to wait for a static timeout as defined at time tick twelve.
Nevertheless, due to network latency this approach becomes problematic with increasing network diameter and numerous intermediate network nodes. This problem is discussed later in the context of Weighted Fair Queuing (cf. 3.3.1).
2.2.2.3 Fast Recovery
The FAST RECOVERY Algorithm is a special Congestion Avoidance Algorithm often combined with Fast Retransmit (cf. 2.2.2.2) to restart transmission at a higher throughput rate than Slow Start (cf. 2.2.1.1) does.
Fast Recovery starts when Fast Retransmit stops working. If no further duplicate ACK packets are received for the Fast Retransmit Algorithm, the sender tries to return to its normal sending state. But instead of Slow Start, Congestion Avoidance (additive increase) is used, because the returned duplicate ACK packets traveled successfully through the network. So no congestion is present on this route at the moment, and the sender can resume transmitting at a relatively high output rate specified by ssthresh.
Figure 9: Time-cwnd-diagram: Fast Recovery
Figure 9 points out how Fast Recovery speeds up transmission in contrast to the classical Slow Start Algorithm (cf. 2.2.1.1). After the first missing ACK packet at time tick four, an additional half Slow Start cycle is skipped and instead Congestion Avoidance is started with output rate eight, specified by the optimal Slow Start threshold.
2.2.2.4 Fast Retransmit combined with Fast Recovery
As aforementioned, Fast Retransmit and Fast Recovery are a so-called algorithm pair, because they are rarely used alone. As discussed in [Ste97, section 4] and [APS99, section 3.2], with the arrival of the third consecutive duplicate ACK packet, cwnd and ssthresh on the sender's side are set to:
ssthresh = max(cwnd / 2, 2), cwnd = ssthresh + 3.
Fast Recovery is triggered by at least three duplicate ACK packets; this implies the successful receipt of at least three packets after the missing one. Each further duplicate ACK packet arriving at the sender increments cwnd by one, because another packet has left the network and is cached in the receiver's input buffer. The receiver may define an advertised window (awnd) size, which indicates the number of additional cachable packets on the receiver's side; this is a destination driven indicator. If the cwnd size is below the awnd size, i.e. cwnd = min(cwnd, awnd), the sender is able to send at least one more data packet to the receiver, because the receiver is able to process it. Otherwise, although the sender would be able to send further packets, it is not advisable to send them immediately, because the receiver would discard them for lack of resources.
When the next non-duplicate ACK packet arrives, which should be the one for the retransmitted packet, the sender sets cwnd back to ssthresh. Furthermore, this ACK packet acknowledges the outstanding packets already sent after the lost one and before the three identical duplicate ACK packets reached the sender. At this point, Congestion Avoidance starts working and increments the output rate linearly as described in 2.2.1.2.
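The window adjustments of this algorithm pair can be summarized in a few transition functions. This is a sketch in units of whole packets with hypothetical function names, following the formulas from [Ste97] and [APS99]:

```python
def on_third_dup_ack(cwnd):
    """Reaction to the third duplicate ACK: halve the window
    (bounded below by 2) and inflate it by the three packets known
    to have left the network."""
    ssthresh = max(cwnd // 2, 2)
    return ssthresh + 3, ssthresh

def on_extra_dup_ack(cwnd):
    """Each further duplicate ACK means another packet left the
    network, so cwnd may be inflated by one."""
    return cwnd + 1

def on_new_ack(cwnd, ssthresh):
    """The retransmission is ACKed: deflate the window back to
    ssthresh and hand over to Congestion Avoidance."""
    return ssthresh

cwnd, ssthresh = on_third_dup_ack(8)   # cwnd = 7, ssthresh = 4
cwnd = on_extra_dup_ack(cwnd)          # a fourth duplicate: cwnd = 8
cwnd = on_new_ack(cwnd, ssthresh)      # deflation: cwnd = 4
```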
Figure 10 and figure 11 illustrate this algorithm pair. After starting with a typical Slow Start at time tick zero, at least one data packet, here packet five, is not acknowledged, and duplicate ACK packets arrive at time ticks three, four and five. With the third duplicate ACK packet a variation of Congestion Avoidance is triggered at time tick six, calculating ssthresh = max(4 / 2, 2) = 2 and setting cwnd = ssthresh + 3 = 5, and linear increasing is started. On the one hand, it is not necessary to wait for timers to time out as defined at time tick ten. On the other hand, the Congestion Avoidance at time tick six sets the data output rate to a relatively high level, here the best possible.
Figure 10: Time-cwnd-diagram: Fast Retransmit combined with Fast Recovery
Figure 11: Packet Flow: Fast Retransmit combined with Fast Recovery
3 Router Centric Algorithms
Up to this point the described algorithms rarely involve the network's intermediate nodes such as routers, switches, and hubs as active actors. Consequently, research and development moved to more dynamic scenarios avoiding progressive network congestion. This concept, called ROUTER CENTRIC, reduces the normally needed communication between the point of congestion and the origin as well as the risk of timeouts and high latency until a reaction can be initiated, as described in [Zha86].
3.1 Congestion Collapse
The phenomenon of Congestion Collapse, occurring in datagram networks using the telnet application, was first defined by John Nagle in 1984 [Nag84]. It only exists in datagram networks with a retransmission policy such as the Transmission Control Protocol (TCP) on top of the Internet Protocol (IP). It is inherent in the system's design: with the occurrence of a bandwidth bottleneck, the number of traveling packets increases. This is the intended behavior given the longer round trip times (RTTs) and does not by itself indicate any problem. If an expected ACK packet from the receiver does not arrive in time, retransmission is started by the sender automatically and, in case of an adaptive host retransmission algorithm, the RTT average threshold is increased. This behavior is intended, but with a sharply rising RTT even adaptive host retransmission cannot overcome the scenario. More and more copies of the same packet are injected into the network on the sender's side and contribute to serious network congestion. A possible solution, as discussed in 3.4.1, is the ICMP Source Quench packet emitted by routers and gateways, so that senders can decrease their output level individually.
3.2 Small Packet Problem
TCP encapsulates network data in IP packets for further network transmission; therefore especially small data quantities suffer large overhead from the associated TCP/IP headers. For instance, transmitting a single character results in a 41-byte data packet containing only one byte of payload and 40 bytes of header overhead. Transmitting multiple small data packets with huge headers stresses congested networks additionally. To avoid these packets, a static delay mechanism can be implemented, delaying packets by some hundred milliseconds. This helps to prevent congestion from occurring, but does not decrease the network load in an already congested state. Therefore, a dynamic approach is given in [Nag84, page 3] which uses buffers to accumulate outgoing data until outstanding ACK packets arrive (after an idle connection the first packet is sent without waiting for any ACK packet). This results in potentially larger data packets, reducing the overall overhead, and stops flooding already congested networks, because no ACK packets will get through.
Assume a file transfer over a network path with a five-second RTT, a window size of
2 kB, and an application writing data to TCP in blocks of 512 bytes. The first packet,
containing 512 bytes of data and an additional 40 bytes of header, is sent to the
receiver. During the relatively long RTT, TCP buffers incoming data from the application,
so after five seconds and one arriving ACK packet the next packet can be sent. From this
packet on, the data amount is a constant 2 kB; so although the second packet (and every
further one) leaves only after a five-second delay, it carries the maximum available
amount of data and the header-to-data ratio decreases.
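The buffering rule from [Nag84] can be sketched as follows. This is an illustrative simplification in Python; the names (NagleSender, MSS) are assumptions, and real TCP stacks implement the rule inside the kernel:

```python
# Sketch of Nagle's send decision: buffer small writes while an ACK is
# outstanding; send immediately when the connection is idle or a full
# segment has accumulated.
MSS = 1460  # maximum segment size in bytes (typical Ethernet value)

class NagleSender:
    def __init__(self):
        self.buffer = b""
        self.ack_outstanding = False

    def write(self, data: bytes) -> list:
        """Returns the segments actually put on the wire."""
        self.buffer += data
        sent = []
        # a full segment is always worth sending
        while len(self.buffer) >= MSS:
            sent.append(self.buffer[:MSS])
            self.buffer = self.buffer[MSS:]
            self.ack_outstanding = True
        # a small segment goes out only if nothing is unacknowledged
        if self.buffer and not self.ack_outstanding:
            sent.append(self.buffer)
            self.buffer = b""
            self.ack_outstanding = True
        return sent

    def ack_received(self):
        self.ack_outstanding = False
```

After an idle period the first small write is sent at once; further small writes accumulate in the buffer and leave as one coalesced segment when the ACK arrives, which is precisely how the header-to-data ratio improves in the example above.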
3.3 Router Processed
3.3.1 Weighted Fair Queuing
Obviously, a fundamental problem on many intermediate network nodes is the fairness
of Congestion Control. This involves the question of which packets from which sender
are dropped from the input buffer in case of occurring congestion. The DROP TAIL
algorithm, which cuts the last i received packets off the queue when the queue length
exceeds a fixed limit, and the LOAD SHEDDING algorithm described in 3.3.2 are simple
but hardly fair algorithms.
Therefore, the continuation of these ideas must be a fair queuing algorithm that avoids
senseless packet dropping, because every sender loads the network individually and
thus contributes to congestion in proportion to its data quantity.
Figure 12: Circuit diagram: Weighted Fair Queuing
A possible solution is an extension of the simple queuing algorithm: WEIGHTED
FAIR QUEUING. This algorithm introduces round-robin scheduling to the queue: every
sender gets its own associated queue, so bursting senders can be treated specially, as
shown in figure 12. Slow or periodically transmitting senders are thus not affected by
Congestion Control, while packets from explicitly flooding senders will be dropped.
Nevertheless, this kind of fair queuing does not handle a scenario in which a
non-bursting sender transmits many small data packets to multiple different recipients.
The paths involved may cross both congested and non-congested network parts, and
the non-congested parts could carry more data than the congested one allows. A fairer
approach would therefore use one queue per source-destination pair, but this would
exceed contemporary hardware limits.
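A minimal per-sender queuing sketch, assuming a fixed per-sender buffer limit and one packet served per round-robin turn; all names are illustrative:

```python
# Per-sender fair queuing with round-robin service, as in figure 12.
# Simplified: every queue has equal weight and a fixed capacity.
from collections import OrderedDict, deque

class FairQueue:
    def __init__(self, per_sender_limit: int):
        self.queues = OrderedDict()  # sender -> deque of packets
        self.limit = per_sender_limit

    def enqueue(self, sender, packet) -> bool:
        q = self.queues.setdefault(sender, deque())
        if len(q) >= self.limit:
            return False  # bursting sender: drop its packet only
        q.append(packet)
        return True

    def dequeue(self):
        """Serve senders round-robin, one packet per turn."""
        for sender in list(self.queues):
            q = self.queues[sender]
            if q:
                packet = q.popleft()
                self.queues.move_to_end(sender)  # others get the next turn
                return packet
        return None
```

A bursting sender only overflows its own queue, so its excess packets are dropped while a slow sender's packets still get through on every round.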
3.3.2 Load Shedding
LOAD SHEDDING, as discussed in [Tan03] and [YR95], is one way to deal with congestion
that does not disappear by itself: packets the router cannot handle are simply discarded.
To give a router the intelligence to decide which packet should be dropped, applications
must mark their packets with priority classes. This marking must be used sensibly,
because it makes no sense to mark all packets with high priority.
Two examples make clear why the classification into priority classes is useful.
The first example describes a discarding and resending problem. If a router drops
packet seven of twelve, the sender will resend packets seven to twelve, as illustrated
in figure 13.
Figure 13: Load Shedding Algorithm: Dropping packet seven of twelve
So, discarding one packet provokes retransmission of six packets. Alternatively, the
router can discard packet ten, so that only three packets have to be resent (figure 14).
Figure 14: Load Shedding Algorithm: Dropping packet ten of twelve
The second example concerns multimedia compression: video compression algorithms
divide a video into sequences of entire frames and subsequent frames. After an entire
frame has been transmitted, only subsequent frames containing the differences from the
last entire frame are transmitted. So, if an entire frame is discarded, every subsequent
frame becomes useless; discarding a subsequent frame, on the other hand, merely shows
the entire but old frame again. Obviously, it is important to choose the right packet,
so that the number of retransmitted packets is minimized; that is Load Shedding's task.
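One possible drop policy matching the two examples above can be sketched as follows, assuming each queued packet carries a (priority, sequence number) pair; the heuristic of dropping the newest low-priority packet is an illustrative choice, not the only valid one:

```python
# Load-shedding drop policy: prefer low-priority packets and, among
# equals, the newest one (highest sequence number), so a Go-Back-N
# sender has as few packets as possible to retransmit.
def choose_drop(queue):
    """queue: list of (priority, seq) tuples; returns the index to drop."""
    return min(range(len(queue)),
               key=lambda i: (queue[i][0], -queue[i][1]))
```

For packets seven to twelve waiting in the queue, this policy drops packet twelve rather than packet seven, so no earlier packets are dragged into the retransmission.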
3.3.3 Random Early Detection (RED)
RANDOM EARLY DETECTION belongs to the class of Active Queue Management algorithms
and is implemented on the router side. This approach tries to prevent completely full
router queues by either marking or discarding packets before the situation gets hopeless
and congestion occurs. In combination with Explicit Congestion Notification, as
described in 3.4.2, it is not even necessary to drop packets; instead they are marked
with a congestion flag and the senders throttle down their output individually. The
well-known implementation, as illustrated in figure 15, can be subdivided into two
main parts as described in [FJ93]: AVERAGE QUEUE LENGTH CALCULATION and
PACKET DROP DECISION.
Each incoming packet triggers a new calculation of the average queue length
(avgQueueLen), and afterwards the packet drop decision is made on the basis of two
fixed thresholds, a lower and an upper one. Three cases are distinguishable:
- the average queue length is below the lower threshold, i.e. no congestion is present;
- the average queue length is above the upper threshold, which means packets have to be
discarded/marked as a result of a serious congestion state;
- the average queue length lies between the lower and the upper threshold, which means
congestion is possibly rising and a marking/discarding probability has to be calculated.
Figure 15: Petri net: Random Early Detection
Average Queue Length Calculation uses the following function:

avgQueueLen = (1 − queueWeight) · avgQueueLen + queueWeight · queueLen,

with the current queue length queueLen and an individual queue weight parameter
queueWeight (approximately between 0.002 and 0.003), which balances the reaction
to temporary incoming bursts.
Packet Drop Decision is calculated when the average queue length avgQueueLen
lies between the lower and the upper threshold, using the following pair of functions:

p_mark(avgQueueLen) = (avgQueueLen − threshold_lower) / (threshold_upper − threshold_lower)

and

p_final(count) = p_impact · p_mark / (1 − p_mark · count),

where count is the number of successfully processed packets since the last marked
packet and p_impact is an individual impact parameter (approximately 0.1) specifying
the algorithm's reaction strength. This yields p_final ∈ [0, p_impact], and p_final
increases with increasing avgQueueLen.
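The two calculations above can be sketched directly in code; the parameter defaults follow the approximate values given in the text, and the function names are assumptions:

```python
# Sketch of the RED averaging and marking decision; queueWeight and
# p_impact defaults follow the approximate values from the text.
def red_update_avg(avg, queue_len, queue_weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1 - queue_weight) * avg + queue_weight * queue_len

def red_mark_probability(avg, count, lower, upper, p_impact=0.1):
    """Marking/dropping probability for one packet."""
    if avg < lower:
        return 0.0   # no congestion: never mark
    if avg >= upper:
        return 1.0   # serious congestion: always mark/drop
    p_mark = (avg - lower) / (upper - lower)
    # real implementations reset count after each mark, which keeps
    # the denominator positive
    return p_impact * p_mark / (1 - p_mark * count)
```

The small queue weight makes the average react slowly, so a short burst barely moves avgQueueLen, while persistent queue growth steadily raises the marking probability.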
With the help of this final probability, some packets are marked or discarded. When
an additional algorithm such as Explicit Congestion Notification (3.4.2) is used, a
specific header bit is set and the sender throttles down its output rate. Dropping
packets forces the same result, because missing ACK packets lead to retransmission of
the discarded packets at half the previous rate, as described in the context of
Congestion Avoidance (2.2.1.2).
The idea behind random discarding is to desynchronize the network clients using this
intermediate node, so that data bursts no longer arrive simultaneously. The intermediate
node can then handle the incoming traffic, because its peaks are shifted to different
points in time.
3.4 Router Indicated
3.4.1 ICMP Source Quench
The TCP/IP suite contains special Internet Control Message Protocol (ICMP) messages
for indicating special network states such as congestion. On the one hand, receivers
should send these messages when packets have been dropped; on the other hand,
intermediate network nodes should send these indications shortly before they have to
drop any packets. Nagle (cf. 3.1) decided to send these messages when a router's
buffer is half filled.
If a host receives a Source Quench message, it should react by decreasing its output
data rate, so that the number of pending packets is reduced. This results in
communication at a more moderate level, but does not end in starvation. However, this
is only a recommendation, so not every host has to interpret these messages. Therefore,
it is important to defend intermediate nodes such as routers and gateways against
excessive traffic generated by malicious hosts.
The only way intermediate nodes can react to such an overload situation is to discard
packets. But how can these packets be selected fairly? On the one hand, a simple
approach is to drop the most recently arrived packets; the network load decreases, but
every host using this gateway is affected and suffers from the malicious host's
behavior. On the other hand, newly arriving packets can be checked for duplicates
already in the processing queue. This can be done with the help of hash functions, so
that a decision is reached quickly with a minimum of computational resources. This
tactic reduces the network load generated by hosts using poor static retransmission
techniques.
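The duplicate check via hash functions could be sketched as follows; the digest fields (source, destination, sequence number, payload) are an assumption about what identifies a retransmitted copy:

```python
# Duplicate detection for queued packets: hash the invariant fields
# once and keep the digests of everything currently queued.
import hashlib

def packet_digest(src, dst, seq, payload: bytes) -> str:
    h = hashlib.sha256()
    h.update(f"{src}|{dst}|{seq}|".encode())
    h.update(payload)
    return h.hexdigest()

class DuplicateFilter:
    def __init__(self):
        self.seen = set()

    def admit(self, src, dst, seq, payload: bytes) -> bool:
        """False if an identical packet is already queued."""
        d = packet_digest(src, dst, seq, payload)
        if d in self.seen:
            return False  # retransmitted copy: discard, queue unchanged
        self.seen.add(d)
        return True
```

A set lookup costs constant time per packet, which is what makes the decision cheap enough for a router's forwarding path; a real filter would also evict digests once the corresponding packet leaves the queue.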
Many other solutions are conceivable, so these algorithms remain a subject of current
research and development.
3.4.2 Explicit Congestion Notification (ECN)
This algorithm is designed for homogeneous networks based on TCP and involves all
intermediate routers as dynamic parts of congestion control. Each router in this
scenario has to support ECN in order to reduce the number of packets dropped as a
result of overflowing router buffers.
ECN uses two specific header bits of the IP protocol, called ECN-Capable Transport
(ECT) and Congestion Experienced (CE), as defined in [RFB01, section 5]. To indicate
ECN support, exactly one of these bits is set to one (the codepoints 10 or 01), whereas
missing ECN support is indicated by setting neither of them (00).
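The two-bit encoding can be sketched as follows; the constant names are illustrative, while the codepoint values follow [RFB01]:

```python
# ECN field semantics: 00 = not ECN-capable, 01/10 = ECN-capable
# transport, 11 = congestion experienced.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

def supports_ecn(ecn_field: int) -> bool:
    """True if the packet's sender advertises ECN support."""
    return ecn_field in (ECT_0, ECT_1)

def mark_congestion(ecn_field: int) -> int:
    """Router action on rising congestion: set CE on capable packets,
    leave non-capable packets unchanged (they must be dropped instead)."""
    return CE if supports_ecn(ecn_field) else ecn_field
```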
ECN-capable routers must notice rising congestion and modify forwarded packets by
setting both of these bits to one, so that the original packet's sender can decrease
its output rate immediately. This implies a problem similar to the one mentioned in
2.2.2.1: if there are numerous hops between the original sender and the point of
occurring congestion, the system-wide latency prevents an immediate reaction, so
already injected data can still contribute to the network's critical load.
For instance, consider sending data packets from host A to host B with fifteen routers
in between, where the last router notices occurring congestion. A data packet passes
router fifteen, both ECN bits are set to one, and the packet travels on to host B,
which responds with an ACK; the ACK passes router fifteen, has both ECN bits set to one
again, and travels back to host A. So, by the time the first marked packet arrives at
the original sender, an additional 2(n + 1) superfluous packets have already been
injected, where n is the number of routers in between.
In the case of few routers, the result is an immediately lowered transmission rate on
the sender's side, and no more congestion is produced by this host. This leads to no or
minimal data retransmission, because no data packets are discarded on the routers'
side, so superfluous network load is prevented. The given implementation is
questionable, though, because the notification itself could get caught in the congestion
or get lost, in which case the indication would never reach the sender and no data
could be choked.
Nowadays, the best-known ECN-like implementation is the ICMP Source Quench choke
packet (3.4.1) generated by routers. In fact, it is only rarely supported and is highly
criticized, because these messages consume bandwidth and could increase congestion
as well.
A similar solution is the so-called DECbit, used in association with the Random Early
Detection algorithm (cf. 3.3.3) in the DECnet protocol. It indicates that a specific
router queue length level has been exceeded. If such modified ACK packets reach the
original data packet's sender, it has to throttle down its output rate, e.g. to half
the prior rate.
4 Conclusion
The chosen classification is only one way to classify the algorithms; others start
from the policies on the different layers of the ISO/OSI model to explain how the
algorithms work. A more detailed look at the implementations of the given algorithms
shows that the chosen classification is very close to a classification based on the
layers of the ISO/OSI model.
The specified algorithms underline the fact that all Host Centric Algorithms work on
the transport layer or above. Most implementations use special (TCP) header fields to
transmit additional information. Most Router Centric Algorithms, however, work on the
network layer, which is an obvious consequence of the TCP/IP stack.
The variety of available Congestion Control algorithms cannot fix the problem
completely, and congestion will never be solved by one algorithm alone. Therefore, it
is necessary to know the causes of congestion in a specific network or subnetwork in
order to choose well-working algorithms, or combinations of algorithms, that reduce
congestion under these specific circumstances.
In further growing wide area networks (WANs) such as the Internet, congestion cannot
be eliminated, because of the growing complexity of these virtual heterogeneous
networks. Large data networks suffer from the mathematical problem that network bursts
and traffic rates cannot be determined analytically, so the best approximation can
only be obtained stochastically.
This leads to the philosophical question of whether such WANs have reached a level of
self-sufficiency, so that they are no longer controlled by individuals but rather by
themselves, because all components contribute to an increasingly intelligent way of
recovering from temporary overload states such as congestion.
5 Glossary
ACK Acknowledgment packet, as used in TCP.
ATM Asynchronous Transfer Mode, using a fixed cell payload length of 48 bytes.
awnd Advertised Window: indicates the maximum available buffer on the receiver's side.
cwnd Congestion Window: sender-side window used in TCP's Slow Start.
ECN Explicit Congestion Notification: a Router Centric Congestion Control algorithm.
ICMP Internet Control Message Protocol, used for special network control messages.
IP Internet Protocol: the unreliable datagram service offered by networks using the
common TCP/IP protocol suite.
ISP Internet Service Provider: offers access to the Internet.
LAN Local Area Network.
NAT Network Address Translation: used by routers to translate packet addresses between
different subnets.
RED Random Early Detection: a Router Centric Congestion Control algorithm.
RTT Round Trip Time: the time a packet needs to travel through the network to its
destination and back.
ssthresh Defines the Slow Start threshold in TCP.
TCP Transmission Control Protocol: used in the TCP/IP protocol suite to offer a
reliable transport service on top of the unreliable IP.
WAN Wide Area Network.
References
[APS99] Mark Allman, Vern Paxson, and W. Richard Stevens. Request For
Comments 2581: TCP Congestion Control. Technical report, April 1999.
http://www.ietf.org/rfc/rfc2581.txt.
[FJ93] Sally Floyd and Van Jacobson. Random Early Detection Gateways for Congestion
Avoidance. IEEE/ACM Transactions on Networking, Vol. 1 No. 4, pages
397-413, August 1993.
http://www.icir.org/floyd/papers/red/red.html.
[Jac88] Van Jacobson. Congestion Avoidance and Control. August 1988.
[Nag84] John Nagle. Request For Comments 896: Congestion Control in IP/TCP
Internetworks. Technical report, January 1984.
http://www.ietf.org/rfc/rfc896.txt.
[RFB01] K. Ramakrishnan, S. Floyd, and D. Black. Request For Comments 3168:
The Addition of Explicit Congestion Notification (ECN) to IP. Technical
report, September 2001. http://www.ietf.org/rfc/rfc3168.txt.
[Ste97] W. Richard Stevens. Request For Comments 2001: TCP Slow Start,
Congestion Avoidance, Fast Retransmit and Fast Recovery Algorithms.
Technical report, January 1997. http://www.ietf.org/rfc/rfc2001.txt.
[Tan03] Andrew S. Tanenbaum. Computer Networks, 4th Edition. Prentice Hall
PTR, 2003.
[YR95] C.-Q. Yang and A.V.S. Reddy. A Taxonomy for Congestion Control
Algorithms in Packet Switching Networks. IEEE Network Magazine, Vol. 9,
pages 34-45, July/August 1995.
[Zha86] Lixia Zhang. Why TCP Timers Don't Work Well. In Communications
Architectures and Protocols (SIGCOMM '86), pages 397-405, August 1986.