
Shahzad Malik,

Ph.D.
smalik@comsats.edu.pk
Internet Architecture and Protocols
Lecture 4: Congestion Control
ETN675
4-2
Some General Questions
How can congestion happen?
What is congestion control?
Why is congestion control difficult?
4-3
Congestion in packet switching networks
When more packets arrive than the outgoing capacity
can carry, a buffer is needed
If the situation lasts too long, the buffer fills,
packets are discarded, and congestion occurs
Retransmission of lost/discarded packets worsens
congestion
As a result, throughput becomes very low
Mechanisms are needed to control and resolve congestion
4-4
Causes/costs of congestion: scenario 1
two senders, two
receivers
one router, infinite
buffers
output link capacity: R
no retransmission
maximum per-connection
throughput: R/2
unlimited shared
output link buffers
[Figure: Host A sends original data at rate λin through unlimited shared output link buffers; Host B receives at throughput λout. The throughput curve rises with λin and saturates at R/2; the delay curve grows without bound.]
large delays as arrival rate λin approaches capacity
4-5
Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of timed-out packet
application-layer input = application-layer output: λin = λout
transport-layer input includes retransmissions: λ'in ≥ λin
[Figure: Host A's application offers λin (original data); its transport layer offers λ'in (original data plus retransmitted data) into the router's finite shared output link buffers; Host B receives λout.]
4-6
Causes/costs of congestion: scenario 2
idealization: perfect knowledge
sender sends only when router buffers available
[Figure: sender A keeps a copy of each packet and transmits only into free buffer space, so λout equals λin all the way up to R/2.]
4-7
Causes/costs of congestion: scenario 2
Idealization: known loss - packets can be lost,
dropped at router due to full buffers
sender only resends if packet known to be lost
[Figure: sender A keeps a copy of each packet; when the router has no buffer space, the arriving packet is dropped before reaching Host B.]
4-8
Causes/costs of congestion: scenario 2
Idealization: known loss - packets can be lost,
dropped at router due to full buffers
sender only resends if packet known to be lost
[Figure: throughput λout vs offered load λ'in; when sending at R/2, some packets are retransmissions, but asymptotic goodput is still R/2 (why?)]
4-9
Causes/costs of congestion: scenario 2
Realistic: duplicates - packets can be lost, dropped
at router due to full buffers; the sender times out
prematurely, sending two copies, both of which are
delivered
[Figure: throughput λout vs offered load λ'in; when sending at R/2, some packets are retransmissions, including duplicates that are delivered!]
4-10
Causes/costs of congestion: scenario 2
[Figure: goodput λout vs offered load λ'in; goodput falls below R/2 as duplicates are delivered.]
costs of congestion:
more work (retransmissions) for a given goodput
unneeded retransmissions: the link carries multiple
copies of a packet, decreasing goodput
4-11
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: four senders (Hosts A-D), multihop paths through routers with finite shared output link buffers; λin is original data, λ'in original plus retransmitted data, λout the delivered throughput.]
A: as red λ'in increases, all arriving blue packets
at the upper queue are dropped; blue throughput goes to 0
4-12
Causes/costs of congestion: scenario 3
another cost

of congestion:
when packet dropped, any upstream
transmission capacity used for that packet was
wasted!
[Figure: throughput λout vs offered load λ'in; throughput collapses toward 0 beyond C/2.]
4-13
The Cost of Congestion
Packet loss
wasted upstream bandwidth when a packet is
discarded downstream
wasted bandwidth due to retransmission (a packet
traverses a link multiple times)
Long delay
[Figure: Delay vs Load and Throughput vs Load curves: delay rises sharply at the knee; throughput falls off the cliff into congestion collapse once packet loss sets in.]
4-14
Principles of congestion control
4-15
Congestion Control
Big picture:
How to determine/control a flow's sending rate?
Congestion:
informally: too many sources sending too much data
too fast for the network to handle
different from flow control!
manifestations:
lost packets (buffer overflow at routers)
wasted bandwidth
long delays (queueing in router buffers)
a top-10 problem !
4-16
Flow Control vs. Congestion Control
Flow control:: keep a fast sender from overrunning
a slow receiver.
Congestion control:: the efforts made by network
nodes to prevent or respond to overload
conditions.
Congestion control is intended to keep a fast
sender from overwhelming the network when
network resources {e.g., available link capacity,
router buffers} are scarce.
4-17
Two Techniques to deal with congestion
pre-allocate resources to avoid congestion
control congestion (e.g. slowing down at the source) when it
occurs
Two points of implementation
hosts at the edges of the network (e.g. transport protocol)
routers inside the network (queuing discipline)
Underlying network service model
best-effort
multiple qualities of service
Congestion Control
4-18
Congestion Control Framework
The problem is really a resource allocation problem
Resource Allocation is a process by which network elements
try to meet the competing demands that applications have
for network resources, primarily link bandwidth and buffer
space in routers and switches
Congestion control comprises efforts made by nodes (hosts and
routers) to prevent or respond to overload conditions.
Taxonomy
open-loop vs closed-loop
router-centric vs host-centric
reservation-based vs feedback-based
window-based vs rate-based
4-19
Congestion Control Taxonomy
Open-loop (preventive approaches):
If admitting a traffic flow would degrade network
performance and QoS cannot be guaranteed, then
reject the traffic flow from the beginning. This is
called admission control.
Closed-loop (after-the-fact approaches):
After congestion has occurred, eliminate or reduce
congestion.
4-20
Congestion Control Taxonomy
Router-Centric
The internal network routers take responsibility for:
which packets to forward
which packets to drop or mark
the nature of the congestion notification to the hosts
This includes the Queuing Algorithm used to manage the
buffers at the router.
Host-Centric
The end hosts adjust their behavior based on
observations of network conditions.
(e.g., TCP Congestion Control Mechanisms)
4-21
Congestion Control Taxonomy
Reservation-based - the hosts attempt to reserve
network capacity when the flow is established.
The routers allocate resources to satisfy reservations or
the flow is rejected.
The reservation can be receiver-based (e.g., RSVP) or
sender-based.
Feedback-based - the transmission rate is
adjusted (via window size) according to feedback
received from the subnetwork.
Explicit feedback: FECN, BECN, ECN
Implicit feedback: router packet drops
4-22
Implicit vs. Explicit
Implicit:
congestion inferred by end
systems through observed
loss, delay (e.g. TCP)
Explicit:
routers provide feedback to
end systems
explicit rate sender
should send at
single bit indicating
congestion (SNA,
DECbit, TCP ECN, ATM)
Congestion Control Taxonomy
4-23
Window-based:
Congestion control by
controlling the window
size of a transport
scheme, e.g. set window
size to 64KBytes
Example: TCP
Rate-based:
Congestion control by
explicitly controlling
the sending rate of a
flow, e.g. set sending
rate to 128Kbps
Example: ATM
Rate-based vs. Window-based
Congestion Control Taxonomy
4-24
Open-loop congestion control
4-25
Open-Loop Congestion Control
It prevents congestion from happening
It entails explicit resource allocation along the
path
It is more versatile in providing different service
models
Quality of Service (QoS)
It involves:
Signaling to specify Resource Requirement
Connection Admission Control (CAC)
Policing
Traffic Shaping
Scheduling
4-26
Open-loop congestion control
Quality of Service (QoS)
To be discussed later: IP QoS
4-27
Closed-loop congestion control
4-28
Approaches towards congestion control
End-end congestion
control:
no explicit feedback from
network
congestion inferred from
end-system observed loss,
delay
approach taken by TCP
Network-assisted
congestion control:
routers provide feedback
to end systems
single bit indicating
congestion (SNA,
DECbit, TCP/IP ECN,
ATM)
explicit rate sender
should send at
Two broad approaches towards congestion control:
4-29
Host-Centric or End-to-End
congestion control/avoidance
TCP -

congestion control
TCP Vegas -

congestion avoidance
4-30
Congestion Control/Avoidance
End-to-End: Two possibilities
TCP, TCP Vegas
Congestion Control - TCP's strategy
control congestion once it happens
repeatedly increase load in an effort to find the
point at which congestion occurs, and then back
off
Congestion Avoidance - TCP Vegas
Alternative strategy
predict when congestion is about to happen
reduce rate before packets start being discarded
call this congestion avoidance, instead of
congestion control
4-31
TCP congestion control
4-32
TCP Congestion Control
Essential strategy :: The TCP host sends packets
into the network without a reservation and then
reacts to observable events.
Originally TCP assumed FIFO queuing.
Basic idea :: each source determines how much
capacity is available to a given flow in the network.
ACKs are used to pace the transmission of packets
such that TCP is self-clocking.
4-33
TCP Congestion Control
Available bandwidth to destination varies with
activity of other users
Transmitter dynamically adjusts transmission rate
according to network congestion as indicated by RTT
(round trip time) and ACKs
Elastic utilization of network bandwidth
[Figure: application data enters the sender's transport buffer; segments flow to the receiver's buffer while ACKs return and the sender performs RTT estimation.]
4-34
Phases of TCP Congestion Behavior
1. Light traffic
Arrival Rate << R
Low delay
Can accommodate more
2. Knee (congestion onset)
Arrival rate approaches R
Delay increases rapidly
Throughput begins to saturate
3. Congestion collapse
Arrival rate > R
Large delays, packet loss
Useful application throughput drops
[Figure: Throughput (bps) and Delay (sec) plotted against Arrival Rate, with capacity R marked on each.]
4-35
TCP Congestion Control
Desired operating point: just before knee
Sources must control their sending rates so that aggregate
arrival rate is just before knee
TCP sender maintains a congestion window cwnd to
control congestion at intermediate routers
Effective sending window is minimum of congestion
window and advertised window (Rcv window, rwnd)
Problem: source does not know what its fair share of
available bandwidth should be -> how to set cwnd
Solution: adapt dynamically to available BW
Sources probe the network by increasing cwnd
When congestion detected, sources reduce rate
Ideally, each source's sending rate stabilizes near the ideal point
4-36
TCP Congestion Control
sender limits transmission:
cwnd is dynamic, function
of perceived network
congestion
TCP sending rate:
roughly: send cwnd
bytes, wait RTT for
ACKS, then send more
bytes
[Figure: sender sequence number space: last byte ACKed; bytes sent but not yet ACKed (in flight); last byte sent; LastByteSent - LastByteAcked ≤ cwnd.]
sending rate ≈ cwnd / RTT bytes/sec
4-37
TCP Congestion Control
How does sender perceive congestion?
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (CongWin) after loss
event
three mechanisms:
Slow start
Congestion avoidance (AIMD)
Fast retransmit/Fast Recovery
4-38
TCP Slow Start
Objective: determine the available capacity at the start
when connection begins,
increase rate exponentially
until first loss event:
initially cwnd = 1 MSS
double cwnd every RTT
done by incrementing
cwnd for every ACK
received
summary: initial rate is slow
but ramps up exponentially
fast
[Figure: Host A sends one segment to Host B, waits one RTT, then sends two segments, then four segments.]
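The doubling behavior can be sketched in a few lines of Python (an illustration, not from the slides): each ACK adds 1 MSS to cwnd, and since cwnd segments are ACKed per RTT, cwnd doubles every RTT.

```python
# Slow start sketch: cwnd (in MSS) grows by 1 MSS per ACK received.
# Since cwnd segments are ACKed each RTT, cwnd doubles every RTT.

def slow_start(rtts, cwnd=1):
    """Return the cwnd trajectory (in MSS) over the given number of RTTs."""
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd          # one ACK per segment sent this RTT
        cwnd += acks         # +1 MSS per ACK => cwnd doubles
        history.append(cwnd)
    return history

print(slow_start(4))  # [1, 2, 4, 8, 16]
```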
4-39
TCP Slow Start
Slow start: increase
congestion window size by
one segment upon receiving
an ACK from receiver
initialized at 1 segment
(cwnd = 1 MSS)
used at (re)start of data
transfer
congestion window
increases exponentially
available bandwidth may
be >> MSS/RTT
[Figure: cwnd grows 1, 2, 4, 8 segments over successive RTTs, each segment's ACK triggering an increase.]
4-40
TCP Congestion Avoidance
Algorithm progressively sets
a congestion threshold
When cwnd > threshold,
slow down rate at which
cwnd is increased
Increase congestion window
size by one segment per
round-trip-time (RTT)
Each time an ACK arrives,
cwnd is increased by 1/cwnd
In one RTT, cwnd segments
are sent, so total increase
in cwnd is cwnd x 1/cwnd = 1
cwnd grows linearly with
time
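The per-ACK rule above can be checked with a short Python sketch (illustrative, not from the slides): adding 1/cwnd per ACK over one RTT's worth of ACKs yields roughly +1 MSS.

```python
# Congestion avoidance sketch: each ACK adds 1/cwnd MSS, so after one
# RTT (cwnd ACKs) the window has grown by about 1 MSS: linear growth.

def cong_avoid_rtt(cwnd):
    """Apply one RTT's worth of per-ACK increases; cwnd in MSS (float)."""
    acks = int(cwnd)
    for _ in range(acks):
        cwnd += 1.0 / cwnd
    return cwnd

w = cong_avoid_rtt(8.0)
print(round(w, 2))   # ~8.95: close to +1 MSS over the RTT
```

It is slightly less than +1 because cwnd grows during the RTT, shrinking each later increment.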
[Figure: cwnd grows exponentially (1, 2, 4, 8) over successive RTTs up to the threshold, then linearly.]
4-41
TCP: detecting, reacting to loss
loss indicated by timeout:
cwnd set to 1 MSS;
window then grows exponentially (as in slow start) to
threshold, then grows linearly
loss indicated by 3 duplicate ACKs: TCP RENO
dup ACKs indicate network capable of delivering some
segments
cwnd is cut in half; window then grows linearly
TCP Tahoe always sets cwnd to 1 (timeout or 3
duplicate acks)
4-42
Congestion is detected upon
timeout (or receipt of
duplicate ACKs TCP Reno)
Assume current cwnd
corresponds to available
bandwidth
Adjust congestion threshold =
1/2 x current cwnd
Reset cwnd to 1
Go back to slow-start
Over several cycles expect to
converge to congestion
threshold equal to about the
available bandwidth
[Figure: congestion window vs round-trip times: slow start up to the threshold, a time-out, then slow start again followed by congestion avoidance.]
TCP Congestion Control
4-43
TCP Congestion Control
How does the TCP congestion algorithm change congestion
window dynamically according to the most up-to-date state of
the network?
At light traffic: each segment is ACKed quickly
Increase cwnd aggressively
Slow Start (starts slowly, grows exponentially)
At knee: segment ACKs arrive, but more slowly
Slow down the increase in cwnd: Additive Increase
At congestion: segments encounter large delays (so
retransmission timeouts occur); segments are dropped in
router buffers (resulting in duplicate ACKs)
Reduce transmission rate, then probe again
Multiplicative Decrease
4-44
[Figure: congestion window sawtooth over time, ranging over roughly 8 to 24 Kbytes.]
multiplicative decrease:

cut CongWin in half
after loss event
additive increase:

increase CongWin by
1 MSS every RTT in
the absence of loss
events: probing
Long-lived TCP connection
AIMD
(Additive Increase / Multiplicative Decrease)
4-45
[Figure: congestion window vs round-trip times: slow start until congestion occurs, threshold set to half the window, slow start again up to the threshold, then congestion avoidance.]
TCP Tahoe Congestion Control
4-46
Fast Retransmit & Fast Recovery
Congestion causes many segments to be
dropped
If only a single segment is dropped, then
subsequent segments trigger duplicate
ACKs before timeout
Can avoid large decrease in cwnd as
follows:
When three duplicate ACKs arrive,
retransmit lost segment immediately
Reset congestion threshold to 1/2 cwnd
Reset cwnd to congestion threshold + 3 to
account for the three segments that
triggered duplicate ACKs
Remain in congestion avoidance phase
However if timeout expires, reset cwnd to 1
In absence of timeouts, cwnd will oscillate
around optimal value
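The triple-duplicate-ACK reaction described above can be sketched in Python (an illustration; units are MSS and the three-segment inflation follows the slide's rule).

```python
# Fast retransmit / fast recovery sketch: on the 3rd duplicate ACK,
# halve the threshold and set cwnd = threshold + 3, crediting the
# three segments that left the network and triggered the duplicates.

def on_dup_ack(cwnd, ssthresh, dupacks):
    dupacks += 1
    if dupacks == 3:                 # fast retransmit trigger
        ssthresh = cwnd // 2
        cwnd = ssthresh + 3          # fast recovery inflation
    return cwnd, ssthresh, dupacks

cwnd, ssthresh, dups = 16, 32, 0
for _ in range(3):                   # three duplicate ACKs arrive
    cwnd, ssthresh, dups = on_dup_ack(cwnd, ssthresh, dups)
print(cwnd, ssthresh)                # 11 8
```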
[Figure: SN=1 is ACKed (ACK=2); SN=2 is lost, so SN=3, SN=4, SN=5 each elicit a duplicate ACK=2; three duplicate ACKs trigger fast retransmit of SN=2.]
4-47
TCP Reno -

Refinement
After 3 dup ACKs:
CongWin is cut in half
window then grows
linearly
But after timeout event:
CongWin instead set to
1 MSS;
window then grows
exponentially
to a threshold, then
grows linearly
Philosophy:
3 dup ACKs indicates network capable of
delivering some segments
timeout before 3 dup ACKs is more alarming
4-48
TCP Reno (Fast Retransmit/Fast Recovery)
[Figure: TCP Reno congestion window vs round-trip times: slow start, congestion avoidance, a time-out (cwnd back to 1 MSS), and a 3-dup-ACK event (cwnd halved), with the threshold updated at each loss.]
4-49
TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.
Implementation:
variable ssthresh
on loss event, ssthresh is
set to 1/2 of cwnd just
before loss event
4-50
TCP Tahoe and TCP Reno
With fast recovery, slow start only occurs:
At cold start
After a coarse-grain timeout
This is the difference between TCP Tahoe and
TCP Reno
4-51
TCP Congestion Control
[FSM, reconstructed from the figure:]

Initialization: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; start in slow start.

Slow start:
new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
duplicate ACK: dupACKcount++
when cwnd > ssthresh: enter congestion avoidance
timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; enter fast recovery

Congestion avoidance:
new ACK: cwnd = cwnd + MSS * (MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
duplicate ACK: dupACKcount++
timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; enter slow start
dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment; enter fast recovery

Fast recovery:
duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
new ACK: cwnd = ssthresh; dupACKcount = 0; enter congestion avoidance
timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; enter slow start
4-52
Summary: TCP Congestion Control
When CongWin is below Threshold, sender in slow-
start phase, window grows exponentially.
When CongWin is above Threshold, sender is in
congestion-avoidance phase, window grows linearly.
When a triple duplicate ACK occurs, Threshold set
to CongWin/2 and CongWin set to Threshold.
When timeout occurs, Threshold set to
CongWin/2 and CongWin is set to 1 MSS.
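These four summary rules can be condensed into a small event-driven sketch (illustrative; units are MSS, per-RTT effects are collapsed into single steps, and fast-recovery inflation is ignored).

```python
# Sketch of the summary rules: exponential growth below Threshold,
# linear growth above it, halve on triple-dup-ACK (Reno style),
# back to 1 MSS on timeout. Ignores pacing, rwnd, and SACK.

def react(cwnd, thresh, event):
    if event == "new_ack":
        cwnd = cwnd * 2 if cwnd < thresh else cwnd + 1  # per-RTT effect
    elif event == "3dup":
        thresh = max(cwnd // 2, 1)
        cwnd = thresh                                   # multiplicative decrease
    elif event == "timeout":
        thresh = max(cwnd // 2, 1)
        cwnd = 1                                        # slow start again
    return cwnd, thresh

cwnd, thresh = 1, 8
trace = []
for ev in ["new_ack"] * 4 + ["3dup", "new_ack", "timeout"]:
    cwnd, thresh = react(cwnd, thresh, ev)
    trace.append(cwnd)
print(trace)  # [2, 4, 8, 9, 4, 5, 1]
```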
4-53
TCP sender congestion control
Event | State | TCP sender action | Commentary
ACK receipt for previously unACKed data | Slow Start (SS) | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to Congestion Avoidance | Results in a doubling of CongWin every RTT
ACK receipt for previously unACKed data | Congestion Avoidance (CA) | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
Loss event detected by triple duplicate ACK | SS or CA | Threshold = CongWin/2; CongWin = Threshold; set state to Congestion Avoidance | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
Timeout | SS or CA | Threshold = CongWin/2; CongWin = 1 MSS; set state to Slow Start | Enter slow start
Duplicate ACK | SS or CA | Increment duplicate ACK count for segment being ACKed | CongWin and Threshold not changed
4-54
TCP throughput
avg. TCP thruput as function of window size, RTT?
ignore slow start, assume always data to send
W: window size (measured in bytes) where loss occurs
avg. window size (# in-flight bytes) is 3/4 W, since
the window oscillates between W/2 and W
avg TCP thruput = (3/4) W / RTT bytes/sec
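The sawtooth average can be evaluated with a trivial Python sketch; the window size and RTT used here are hypothetical numbers, not from the slides.

```python
# Sawtooth average: the window ramps linearly from W/2 back up to W,
# so the mean window is 3/4 W and mean throughput is 3/4 * W / RTT.

def avg_throughput(W, rtt):
    """W in bytes, rtt in seconds; returns bytes/sec."""
    return 0.75 * W / rtt

# hypothetical numbers: W = 64 KB at loss, RTT = 100 ms
print(avg_throughput(64_000, 0.1))  # 480000.0 bytes/sec
```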
4-55
TCP Futures: TCP over long, fat pipes
example: 1500 byte segments, 100ms RTT, want
10 Gbps throughput
requires W = 83,333 in-flight segments
throughput in terms of segment loss probability,
L [Mathis 1997]:
to achieve 10 Gbps throughput, need a loss
rate of L = 2 x 10^-10
a very small loss rate!
new versions of TCP needed for high-speed networks

TCP throughput = 1.22 * MSS / (RTT * sqrt(L))
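The slide's numbers can be checked directly against the [Mathis 1997] formula; this sketch uses consistent units (bits and seconds).

```python
# Checking the slide's 10 Gbps example with
# throughput = 1.22 * MSS / (RTT * sqrt(L)).
import math

MSS = 1500 * 8        # bits
RTT = 0.100           # seconds
target = 10e9         # 10 Gbps

# in-flight window needed for 10 Gbps: rate * RTT / segment size
W = target * RTT / MSS
print(round(W))       # 83333 segments

# solve the formula for the loss rate L sustaining the target rate
L = (1.22 * MSS / (RTT * target)) ** 2
print(f"{L:.2e}")     # 2.14e-10, i.e. ~2 x 10^-10
```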
4-56
TCP Fairness
fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each
should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
4-57
TCP Fairness
two competing sessions:
additive increase gives slope of 1, as throughput increases
multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs Connection 1 throughput. Repeated cycles of congestion avoidance (additive increase, slope 1) and loss (window halved) move the operating point toward the equal-bandwidth-share line, bounded by R on each axis.]
4-58
TCP Fairness
Fairness and UDP
multimedia apps often
do not use TCP
do not want rate
throttled by congestion
control
instead use UDP:
send audio/video at
constant rate, tolerate
packet loss
Fairness, parallel TCP
connections
application can open multiple
parallel connections between
two hosts
web browsers do this
e.g., link of rate R with 9
existing connections:
new app asks for 1 TCP, gets
rate R/10
new app asks for 11 TCPs, gets
R/2
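The slide's arithmetic follows from fairness being per connection rather than per application; this small Python sketch makes the idealized shares explicit.

```python
# Sketch of why parallel connections defeat per-connection fairness:
# an app opening `opened` of the K total connections gets opened/K
# of the bottleneck rate R (idealized equal shares).

def app_share(R, existing, opened):
    total = existing + opened
    return R * opened / total

R = 1.0
print(app_share(R, 9, 1))    # 0.1  -> R/10
print(app_share(R, 9, 11))   # 0.55 -> about R/2
```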
4-59
Congestion Avoidance/Prevention

TCP Vegas
4-60
Congestion Control/Avoidance
End-to-End: Two possibilities
TCP, TCP Vegas
TCPs strategy
control congestion once it happens
repeatedly increase load in an effort to find the
point at which congestion occurs, and then back
off
Alternative strategy - TCP Vegas
predict when congestion is about to happen
reduce rate before packets start being discarded
call this congestion avoidance, instead of
congestion control
4-61
TCP Vegas
Idea: the source watches for some sign that a router's
queue is building up and congestion will soon happen;
e.g., RTT grows, sending rate flattens
[Figure: traces over time from a Vegas-style experiment: congestion window, throughput, and bottleneck queue size; the queue builds as the sending rate flattens.]
4-62
TCP Vegas -

Algorithm
Let BaseRTT be the minimum of all measured RTTs
(commonly the RTT of the first packet)
If not overflowing the connection, then
ExpectedRate = CongestionWindow / BaseRTT
Source calculates sending rate (ActualRate) once per
RTT
Source compares ActualRate with ExpectedRate
Diff = ExpectedRate - ActualRate
if Diff < α:
increase CongestionWindow linearly
else if Diff > β:
decrease CongestionWindow linearly
else:
leave CongestionWindow unchanged
4-63
TCP Vegas -

Algorithm
Parameters
α = 1 packet
β = 3 packets
Even faster retransmit
keep fine-grained timestamps for each packet
check for timeout on first duplicate ACK
[Figure: TCP Vegas trace: congestion window (KB) and measured throughput over time.]
4-64
Router-Centric congestion
control/avoidance
4-65
Definitions
congestion avoidance when impending congestion is
indicated, take action to avoid congestion.
incipient congestion congestion that is beginning to
be apparent.
need to notify connections of congestion at the
router by either marking the packet [ECN] or
dropping the packet {This assumes a drop is an
implied signal to the source host.}
4-66
Packet Queuing (at the router)
Queuing algorithms determine:
How packets are buffered.
Which packets get transmitted.
Which packets get marked or dropped.
Indirectly determine the delay at the router.
Queues at outgoing links drop/mark packets to
implicitly/explicitly signal congestion to TCP sources.
4-67
Queuing Mechanisms
Some of the possible choices in queuing
algorithms:
FIFO (FCFS) also called Drop-Tail
Priority Queuing (PQ)
Fair Queuing (FQ)
Weighted Fair Queuing (WFQ)
4-68
FIFO [FCFS] Queuing
First packet to arrive is first to be transmitted.
FIFO queuing mechanism that drops packets from
the tail of the queue when the queue overflows.
Introduces global synchronization when packets
are dropped from several connections.
FIFO is the scheduling mechanism, Drop Tail is
the policy
4-69
Priority Queuing
Mark each packet with a priority (e.g., in the
TOS (Type of Service) field in IP).
Implement multiple FIFO queues, one for
each priority class.
Always transmit out of the highest priority
non-empty queue.
Still no guarantees for a given priority class.
4-70
Priority Queuing
Problem:: high priority packets can starve
lower priority class packets.
Priority queuing is a simple case of
differentiated services [DiffServ].
One practical use in the Internet is to
protect routing update packets by giving
them a higher priority and a special queue
at the router.
4-71
Fair Queuing [FQ]
The basic problem with FIFO is that it
does not separate packets by flow.
Another problem with FIFO :: an ill-
behaved flow can capture an arbitrarily
large share of the network's capacity.
Idea :: maintain a separate queue for each
flow, and have FQ service these queues in a
round-robin fashion.
4-72
Fair Queuing [FQ]
4-73
Fair Queuing [FQ]
Ill-behaved flows are segregated into
their own queue.
There are many implementation details for
FQ, but the main problem is that packets
are of different lengths, so simple FQ is not
fair!
Ideal FQ:: do bit-by-bit round-robin.
4-74
Weighted Fair Queuing [WFQ]
WFQ idea:: Assign a weight to each flow (queue)
such that the weight logically specifies the number
of bits to transmit each time the router services
that queue.
This controls the percentage of the link capacity
that the flow will receive.
The queues can represent classes of service and
this becomes DiffServ.
An issue: how does the router learn of the weight
assignments?
Manual configuration
Signalling from sources or receivers.
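The bandwidth guarantee WFQ provides can be sketched with a small Python helper (illustrative; flow names and weights are hypothetical, and the shares hold when all queues are backlogged).

```python
# WFQ sketch: a flow with weight w_i receives w_i / sum(w) of the
# link capacity when every queue has packets waiting.

def wfq_shares(capacity_bps, weights):
    total = sum(weights.values())
    return {flow: capacity_bps * w / total for flow, w in weights.items()}

# hypothetical flows and weights on a 10 Mbps link
shares = wfq_shares(10_000_000, {"voice": 1, "video": 3, "bulk": 6})
print(shares)  # voice 10%, video 30%, bulk 60% of the link
```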
4-75
Active Queue Management
TCP sources interact with routers to deal
with congestion caused by an internal
bottlenecked link.
Drop Tail :: FIFO queuing mechanism.
Random Drop
Early Random Drop
RED :: Random Early Detection
Source Quench messages
DECbit scheme
ECN :: Explicit Congestion Notification
4-76
Drop Tail Router
FIFO queuing mechanism that drops packets
from the tail when the queue overflows.
Introduces global synchronization when
packets are dropped from several
connections.
4-77
Random Drop Router
When a packet arrives and the queue is full,
randomly choose a packet from the queue to
drop.
4-78
Early Random Drop Router
If the queue length exceeds a drop level,
then the router drops each arriving packet
with a fixed drop probability.
Does not control misbehaving users (UDP)
p
Drop level
4-79
Random Early Detection (RED)
RED: randomly drop packets before buffer
overflows.
A higher-rate source suffers proportionally more
packet drops.
Notification is implicit
just drop the packet (TCP will timeout)
could make explicit by marking the packet
Early Random Drop
rather than wait for queue to become full, drop
each arriving packet with some drop probability
whenever the queue length exceeds some drop level
Active Queue Management (AQM)
4-80
RED Router
Random Early Detection (RED) detects
congestion early by maintaining an
exponentially-weighted average queue size.
RED probabilistically drops packets before
the queue overflows to signal congestion to
TCP sources.
RED attempts to avoid global synchronization
and bursty packet drops.
4-81
RED
Averaging Queue Length
[Figure: instantaneous queue length fluctuates rapidly over time; the average follows it smoothly.]
4-82
RED

Averaging Queue Length
Compute average queue length:
AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen
0 < Weight < 1 (usually 0.002)
SampleLen is the queue length each time a packet arrives
Weight should be chosen so that changes in queue length on
time scales much shorter than 100 ms are filtered out,
because TCP's response to congestion takes an RTT, currently
estimated at about 100 ms for the Internet
This exponentially weighted moving average is designed so
that short-term increases in queue size from bursty traffic
or transient congestion do not significantly increase the
average queue size
4-83
RED

Queue Thresholds
Two queue length thresholds
if AvgLen <= MinThreshold then
    enqueue the packet
if MinThreshold < AvgLen < MaxThreshold then
    calculate probability P
    drop the arriving packet with probability P
if AvgLen >= MaxThreshold then
    drop the arriving packet
[Figure: queue with MinThreshold and MaxThreshold marks on AvgLen.]
4-84
RED

Drop Probability
Computing probability P:
TempP = MaxP * (AvgLen - MinThreshold) / (MaxThreshold - MinThreshold)
P = TempP / (1 - count * TempP) = 1 / (1/TempP - count)
MaxP = 0.02
count is the number of newly arrived packets that have been
queued (not dropped) while AvgLen has stayed between the two
thresholds
It ensures a roughly even distribution of drops and avoids
cluster dropping
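The full per-packet decision can be sketched as follows; MaxP = 0.02 matches the slide, while the threshold values are illustrative assumptions.

```python
# RED drop-decision sketch: enqueue below MinThreshold, always drop
# above MaxThreshold, probabilistic drop in between, with P growing
# as `count` (packets queued since the last drop) grows.
import random

def red_drop(avg_len, count, min_th=20, max_th=40, max_p=0.02):
    """Return (drop?, new count). min_th/max_th are illustrative."""
    if avg_len <= min_th:
        return False, 0
    if avg_len >= max_th:
        return True, 0
    temp_p = max_p * (avg_len - min_th) / (max_th - min_th)
    p = temp_p / (1 - count * temp_p)   # grows with count since last drop
    if random.random() < p:
        return True, 0
    return False, count + 1

print(red_drop(10, 0)[0])   # False: below MinThreshold
print(red_drop(50, 0)[0])   # True: above MaxThreshold
```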
4-85
RED Router Mechanism
[Figure: dropping/marking probability vs average queue length: 0 up to min_th, rising linearly to max_p at max_th, then jumping to 1.]
4-86
RED Router Mechanism - Enhanced
[Figure: Gentle RED: the dropping/marking probability rises linearly from 0 at the min threshold to max_p at the max threshold, then climbs gradually to 1 (instead of jumping), as a function of the average queue length (avg_q).]
4-87
RED Tuning
Probability of dropping a particular flow's packet(s) is roughly
proportional to the share of the bandwidth that flow is currently
getting
MaxP is typically set to 0.02, meaning that when the average queue
size is halfway between the two thresholds, the gateway drops
roughly one out of 50 packets
If traffic is bursty, then MinThreshold should be sufficiently
large to allow link utilization to be maintained at an acceptably
high level
The difference between the two thresholds should be larger than the
typical increase in the calculated average queue length in one RTT;
setting MaxThreshold to twice MinThreshold is reasonable for
traffic on today's Internet
When the instantaneous queue length exceeds the buffer space,
packets are dropped, causing RED to enter drop-tail mode. One
of the goals of RED is to avoid drop-tail behavior as much as possible
4-88
Source Quench messages
Router sends source quench messages back
to source before the queue reaches capacity.
Complex solution that gets router involved in
end-to-end protocol.
4-89
DECbit scheme
Uses a congestion-indication bit in packet header to
provide feedback about congestion.
Upon packet arrival, the average queue length is
calculated for last (busy + idle) period plus current
busy period.
When the average queue length exceeds one, the
router sets the congestion-indication bit in the
arriving packet's header.
If at least half of the packets in the source's last
window have the bit set, decrease the congestion
window exponentially.
4-90
Explicit Congestion Control
Router employing AQM detects incipient congestion
It can set the congestion experienced (CE) bit in the IP
header of the vulnerable packet and, instead of
dropping the packet, queue the marked packet
More precise indication of congestion
Two unused bits in the IP TOS byte are reserved for the ECN field
The end-host (sender) reacts to packet marking by
adjusting its cwnd
Less severe than packet loss case
Much earlier than fast retransmit case
Real-time applications that suffer from the delay caused
by packet loss benefit from network-assisted
congestion control
4-91
ECN related bit definitions
New definition of the TCP flag field in the TCP header:
bits 0-3: Header Length; bits 4-7: Reserved;
bits 8-15: CWR, ECE, URG, ACK, PSH, RST, SYN, FIN
CWR: Congestion Window Reduced
ECE: ECN Echo

New definition of the IP TOS byte:
bits 0-5: DSCP; bits 6-7: ECN field
RFC 2474: defines the Differentiated Services Code Point (DSCP);
the remaining two bits are Currently Unused (CU); the Class
Selector Code Points (xxx000) carry the old IP Precedence field
RFC 3168: CU is now defined as the ECN field, to notify the
end system of explicit congestion notification within the network

ECN field (ECT, CE bits):
0 0 | Not-ECT
0 1 | ECT(1)
1 0 | ECT(0)
1 1 | CE
ECT: ECN-Capable Transport
CE: Congestion Experienced
4-92
ECN-capable Router
It marks the packet with the CE code point only if it would
otherwise drop the packet due to incipient congestion
It marks only those packets that belong to ECN-capable
transport connection
This is indicated by ECT code point in the IP header
It does not mark, rather drops a packet if the packet is to
be dropped due to reasons other than congestion
E.g. DiffServ edge router may drop packets of certain flows
controlled by QoS policy
It should not set the CE code point based on instantaneous
queue size
E.g. RED is based on average queue size, whereas ATM and
Frame Relay set the congestion bit when the instantaneous
queue size exceeds a threshold, causing noisy notification
It leaves the packet unchanged if the packet is marked with
the CE code point
4-93
ECN-capable transport
ECN-capable transport exhibits the same response to a dropped
packet as to a CE-marked packet
Otherwise it may result in unfair treatment of flows, because routers
may adopt different drop policies for ECN-capable packets and non
ECN-capable packets
CE packet indicates persistent rather than transient congestion
Reaction should be appropriate to the persistent congestion
ECN-capable TCP requires following functionalities:
Negotiation between the end points to determine that both are ECN-
capable
An ECN-Echo (ECE) flag in the TCP header, so that the receiver
can indicate to the sender that it has received a CE packet
A Congestion Window Reduced (CWR) flag in the TCP header, so
that the sender can inform the receiver that it has reduced the
congestion window as a result of receiving an earlier packet
with the ECE flag set
Set ECT code point in the IP header of every packet the sender
generates to indicate to the router to employ ECN instead of dropping
the packet
4-94
ECN Scenario
[Figure: ECN signaling between sender S and receiver R:
1. S to R: ECN-setup SYN (SYN with ECE and CWR bits set)
2. R to S: ECN-setup SYN-ACK (SYN-ACK with ECE bit set)
3. S to R: TCP ACK
4. S to R: data packet sent with ECT bit set; a congested router marks it, so it arrives with ECT and CE bits set
5. R to S: ACK packet with ECE bit set
6. S to R: data packet with CWR bit set]
4-95
Performance Evaluation
The sender's reaction is the same as in the non-ECN case
Reduce the cwnd by one half of the current value
The main differences are:
It may happen sooner
There is no retransmit of lost packet
Short flows experience shorter transfer time
Due to no retransmission of lost packet
Especially in cases when packet drop occurs at or close to the end
of transmission
TCP connection with small cwnd can avoid retransmit timeout
When fast retransmit cannot be triggered due to lack of sufficient
number of duplicate acks
Long flows also experience short transfer time
ECN eliminates the delay of fast retransmit and retransmit
timeout procedures
4-96
Lectures 3 and 4: summary
principles behind transport layer
services:
multiplexing, demultiplexing
reliable data transfer
flow control
congestion control
instantiation, implementation in
the Internet
UDP
TCP
router-centric congestion
control/avoidance
Active queue management
next:
leaving the
network edge
(application,
transport layers)
into the network
core
4-97
Acknowledgement
The slides are primarily adapted from the
slides provided by Kurose and Ross.
Some material is based on Leon-
Garcia/Widjaja and Peterson/Davie.
