
CS551: Queue Management

Christos Papadopoulos
(http://netweb.usc.edu/cs551/)

Congestion control vs. resource allocation

- The network's key role is to allocate its transmission resources to users or applications
- Two sides of the same coin:
  - let the network do resource allocation (e.g., VCs)
    - difficult to allocate distributed resources
    - can be wasteful of resources
  - let sources send as much data as they want, and recover from congestion when it occurs
    - easier to implement, but may lose packets

Connectionless Flows

- How can a connectionless network allocate anything to a user?
  - It doesn't know about users or applications
- Flow:
  - a sequence of packets between the same source-destination pair, following the same route
- A flow is visible to routers; it is not a channel, which is an end-to-end abstraction
- Routers may maintain soft state for a flow
- A flow can be implicitly defined or explicitly established (similar to a VC)
  - Different from a VC in that the routing is not fixed

Taxonomy

- Router-centric vs. host-centric
  - router-centric: address the problem from inside the network; routers decide what to forward and what to drop
    - A variant not captured in the taxonomy: adaptive routing!
  - host-centric: address the problem at the edges; hosts observe network conditions and adjust their behavior
  - not always a clear separation: hosts and routers may collaborate, e.g., routers advise hosts

..Taxonomy..

- Reservation-based vs. feedback-based
  - Reservations: hosts ask for resources, the network responds yes/no
    - implies router-centric allocation
  - Feedback: hosts send with no reservation and adjust according to feedback
    - either router- or host-centric: explicit (e.g., ICMP source quench) or implicit (e.g., loss) feedback

..Taxonomy

- Window-based vs. rate-based
  - Both tell the sender how much data to transmit
  - Window: TCP flow/congestion control
    - flow control: advertised window
    - congestion control: cwnd
  - Rate: still an open area of research
    - may be the logical choice for a reservation-based system

Service Models

- In practice, fewer than the eight possible combinations of these dimensions appear
- Best-effort networks
  - Mostly host-centric, feedback-based, window-based
  - TCP as an example
- Networks with flexible Quality of Service
  - Router-centric, reservation-based, rate-based

Queuing Disciplines

- Each router MUST implement some queuing discipline, regardless of what the resource allocation mechanism is
- Queuing allocates bandwidth, buffer space, and promptness:
  - bandwidth: which packets get transmitted
  - buffer space: which packets get dropped
  - promptness: when packets get transmitted

FIFO Queuing

- FIFO: first-in-first-out (or FCFS: first-come-first-served)
- Arriving packets get dropped when the queue is full, regardless of flow or importance; this implies drop-tail
- Important distinction:
  - FIFO: scheduling discipline (which packet to serve next)
  - Drop-tail: drop policy (which packet to drop next)
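To make that distinction concrete, here is a minimal Python sketch of the two pieces together (illustrative only; it assumes buffer space is measured in whole packets):

```python
from collections import deque

class DropTailFIFO:
    """FIFO scheduling plus drop-tail dropping (a sketch; assumes the
    buffer is measured in whole packets)."""

    def __init__(self, capacity):
        self.capacity = capacity        # buffer space, in packets
        self.queue = deque()

    def enqueue(self, packet):
        if len(self.queue) >= self.capacity:
            return False                # drop policy: drop the arriving packet
        self.queue.append(packet)       # scheduling: serve in arrival order
        return True

    def dequeue(self):
        return self.queue.popleft() if self.queue else None
```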

Dimensions

[Figure: taxonomy of queuing disciplines along two dimensions — scheduling (per-connection state: single class, e.g., FIFO, vs. class-based queuing) and drop policy (drop position: head, tail, or random location; drop time: early drop vs. overflow drop)]

..FIFO

- FIFO + drop-tail is the simplest queuing algorithm
  - used widely in the Internet
- Leaves the responsibility for congestion control to the edges (e.g., TCP)
- FIFO lets a large user get more data through, but shares congestion with others
  - does not provide isolation between different flows
  - no policing

Fair Queuing

Demers90

Fair Queuing

- Main idea:
  - maintain a separate queue for each flow currently flowing through the router
  - the router services the queues in round-robin fashion
- Changes the interaction between packets from different flows
- Provides isolation between flows
  - Ill-behaved flows cannot starve well-behaved flows
- Allocates buffer space and bandwidth fairly

FQ Illustration

[Figure: packets from Flow 1 … Flow n arrive at the input (I/P), each flow gets its own queue, and the queues are serviced round-robin onto the output (O/P)]

Variation: Weighted Fair Queuing (WFQ)

Some Issues

- What constitutes a user?
  - Several granularities at which one can express flows
  - For now, assume the granularity of a source-destination pair, but this assumption is not critical
- Packets are of different lengths
  - A source sending longer packets can still grab more than its share of resources
  - We really need bit-by-bit round-robin
  - Fair Queuing simulates bit-by-bit RR
    - not feasible to interleave bits!

Bit-by-bit RR

- The router maintains a local clock
- Single flow: suppose the clock ticks when a bit is transmitted. For packet i:
  - P_i: length, A_i: arrival time, S_i: begin-transmit time, F_i: finish-transmit time
  - F_i = S_i + P_i
  - F_i = max(F_{i-1}, A_i) + P_i
- Multiple flows: the clock ticks when a bit from all active flows is transmitted
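A one-function sketch of the finish-time recurrence above, with all quantities measured in bit-by-bit clock ticks (not router code):

```python
def finish_time(prev_finish, arrival, length):
    """F_i = max(F_{i-1}, A_i) + P_i, with all quantities measured in
    bit-by-bit round-robin clock ticks."""
    return max(prev_finish, arrival) + length
```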

Fair Queuing

- While we cannot actually perform bit-by-bit interleaving, we can compute F_i for each packet, then use F_i to schedule packets
  - Transmit the packet with the earliest F_i first
- Still not completely fair
  - But the difference is now bounded by the size of the largest packet
  - Compare with the previous approach
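A minimal sketch of FQ scheduling on top of that recurrence. The bookkeeping is simplified: how the virtual clock advances as flows come and go is ignored, and flow_id stands for any hashable flow key (e.g., a source-destination pair):

```python
import heapq

class FairQueue:
    """Fair Queuing scheduler (a sketch): stamp each arrival with the
    virtual finish time F_i = max(F_{i-1}, A_i) + P_i for its flow and
    transmit in order of earliest F_i."""

    def __init__(self):
        self.last_finish = {}   # per-flow F_{i-1}
        self.backlog = []       # heap of (finish_time, seq, flow_id)
        self.seq = 0            # tie-breaker for equal finish times

    def enqueue(self, flow_id, length, virtual_arrival):
        finish = max(self.last_finish.get(flow_id, 0.0), virtual_arrival) + length
        self.last_finish[flow_id] = finish
        heapq.heappush(self.backlog, (finish, self.seq, flow_id))
        self.seq += 1

    def dequeue(self):
        # Earliest finish time first; in the real discipline a packet
        # already being transmitted is never preempted.
        return heapq.heappop(self.backlog)[2] if self.backlog else None
```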

Fair Queuing Example

[Figure, left panel: queued packets with finish times F=5, F=8, and F=10 across Flow 1 and Flow 2; the output transmits them in order of earliest F. Right panel: a Flow 1 packet with F=10 arrives while a packet with F=2 is already transmitting on the output; the packet currently being transmitted cannot be preempted]

Delay Allocation

- Aim: give less delay to those using less than their fair share
- Advance the finish times of sources whose queues drain temporarily
- B_i = P_i + max(F_{i-1}, A_i - d)
- Schedule the packet with the earliest B_i first

Allocate Promptness

- B_i = P_i + max(F_{i-1}, A_i - d)
- d gives added promptness:
  - if A_i < F_{i-1}, the conversation is active and d does not affect it: F_i = P_i + F_{i-1}
  - if A_i > F_{i-1}, the conversation is inactive and d determines how much history to take into account
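The bid computation as a one-function sketch, where delta stands for the promptness parameter d above:

```python
def bid(prev_finish, arrival, length, delta):
    """B_i = P_i + max(F_{i-1}, A_i - d). For an active flow
    (A_i < F_{i-1}) delta has no effect; for a flow whose queue drained
    (A_i > F_{i-1}) it advances the bid, granting extra promptness.
    delta = 0 recovers plain Fair Queuing."""
    return length + max(prev_finish, arrival - delta)
```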

Notes on FQ

- FQ is a scheduling policy, not a drop policy
- Still achieves statistical multiplexing: one flow can fill the entire pipe if there are no contenders (FQ is work-conserving)
- WFQ is a possible variation; the weights need to be learned offline
  - Default is one bit per flow, but sending more bits is possible

More Notes on FQ

- The router does not send explicit feedback to the source; end-to-end congestion control is still needed
- FQ isolates ill-behaved users by forcing users to share overload with themselves
  - user: flow, transport protocol, etc.
- The optimal behavior at the source is to keep one packet in the queue
- But maintaining per-flow state can be expensive
  - Flow aggregation is a possibility

Congestion Avoidance

- TCP's approach is reactive:
  - detect congestion after it happens
  - increase load, trying to maximize utilization, until loss occurs
  - TCP has a congestion avoidance phase, but that's different from what we're talking about here
- Alternatively, we can be proactive:
  - try to predict congestion and reduce the rate before loss occurs
  - this is called congestion avoidance

Router Congestion Notification

- Routers are well-positioned to detect congestion
  - A router has a unified view of queuing behavior
  - Routers can distinguish between propagation and persistent queuing delays
  - Routers can decide on transient congestion, based on workload
- Hosts themselves are limited in their ability to infer these from perceived behavior

Router Mechanisms

- Congestion notification
  - the DEC-bit scheme
  - explicit congestion feedback to the source
- Random Early Detection (RED)
  - implicit congestion feedback to the source
  - well suited for TCP

Design Choices for Feedback

- What kind of feedback?
  - Separate packets (source quench)
  - Mark packets; the receiver propagates marks in ACKs
- When to generate feedback?
  - Based on router utilization
    - You can be near 100% utilization without seeing a throughput degradation
  - Based on queue lengths
    - But which queue lengths (instantaneous, average)?

A Binary Feedback Scheme for Congestion Control in Computer Networks (DEC-bit)

Ramakrishnan90

The DEC-bit Scheme

- Basic ideas:
  - on congestion, the router sets a congestion indication (CI) bit on the packet
  - the receiver relays the bit to the sender in acknowledgements
  - the sender uses the feedback to adjust its sending rate
- Key design questions:
  - Router: feedback policy (how and when does a router generate feedback?)
  - Source: signal filtering (how does the sender respond?)

Why Queue Lengths?

- It is desirable to implement FIFO
  - Fast implementations possible
  - Shares delay among connections
  - Gives low delay during bursts
- FIFO queue length is then a natural choice for detecting the onset of congestion

The Use of Hysteresis

- If we use queue lengths, at what queue length should we generate feedback?
  - Threshold or hysteresis?
- Surprisingly, simulations showed that to increase power you should:
  - Use no hysteresis
  - Use an average queue length threshold of 1
  - This maximizes the power function
- Power = throughput / delay

Computing Average Queue Lengths

- Possibilities:
  - Instantaneous
    - Premature, unfair
  - Averaged over a fixed time window, or an exponential average
    - Can be unfair if the time window differs from the round-trip time
- Solution:
  - Adaptive queue length estimation over busy/idle cycles
  - But need to account for long current busy periods

Sender Behavior

- How often should the source change its window?
- In response to what received information should it change its window?
- By how much should the source change its window?
  - We already know the answer to this: AIMD
  - The DEC-bit scheme uses a multiplicative factor of 0.875

How Often to Change Window?

- Not on every ACK received
  - The window size would oscillate dramatically, because it takes time for a window change's effects to be felt
  - If the window changes to W, it takes (W+1) packets for feedback about that window to be received
- Correct policy: wait for (W + W') ACKs, where W is the window size before the update and W' is the size after the update

Using Received Information

- Use the CI bits from W ACKs to decide whether congestion still persists
- Clearly, if some fraction of the bits are set, then congestion exists
- What fraction?
  - Depends on the policy used to set the threshold
  - When the queue size threshold is 1, the cutoff fraction should be 0.5
  - This has the nice property that the resulting power is relatively insensitive to this choice

Changing the Sender's Window

- Sender policy:
  - monitor packets within a window
  - make a change based on whether more than 50% of the packets had CI set:
    - if < 50% had CI set, increase the window by 1
    - else, new window = window * 0.875
- Additive increase, multiplicative decrease for stability
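A sketch of this sender policy in Python, assuming ci_flags holds the CI bits collected from the last window's worth of ACKs:

```python
def decbit_update(window, ci_flags):
    """DEC-bit sender policy (a sketch): ci_flags holds the CI bits from
    the last window's worth of ACKs. Fewer than 50% set: additive
    increase by 1; otherwise multiplicative decrease by 0.875."""
    if sum(ci_flags) / len(ci_flags) < 0.5:
        return window + 1               # additive increase
    return max(1.0, window * 0.875)     # multiplicative decrease
```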

DEC-bit Evaluation

- Relatively easy to implement
  - No per-connection state
- Stable
- Assumes cooperative sources
- Conservative window increase policy
- Some analytical intuition to guide the design
  - Most design parameters determined by extensive simulation

Random Early Detection (RED)


Floyd93

Random Early Detection (RED)

- Motivation:
  - high bandwidth-delay flows have large queues to accommodate transient congestion
  - TCP detects congestion from loss, after queues have built up and increased delay
- Aim:
  - keep throughput high and delay low
  - accommodate bursts

Why Active Queue Management? (RFC 2309)

- Lock-out problem:
  - drop-tail allows a few flows to monopolize the queue space, locking out other flows (due to synchronization)
- Full-queues problem:
  - drop-tail maintains full or nearly-full queues during congestion; but queue limits should reflect the size of bursts we want to absorb, not steady-state queuing

Other Options

- Random drop:
  - a packet arriving when the queue is full causes some random packet to be dropped
- Drop front:
  - on a full queue, drop the packet at the head of the queue
- Random drop and drop front solve the lock-out problem but not the full-queues problem

Solving the Full-Queues Problem

- Drop packets before the queue becomes full (early drop)
- Intuition: notify senders of incipient congestion
- Example: early random drop (ERD):
  - if qlen > drop level, drop each new packet with fixed probability p
  - does not control misbehaving users
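The ERD test as a one-liner (drop_level and p are the fixed parameters named above):

```python
import random

def erd_should_drop(qlen, drop_level, p):
    """Early random drop: once the queue exceeds drop_level, drop each
    arriving packet with fixed probability p."""
    return qlen > drop_level and random.random() < p
```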

Differences With DEC-bit

- Random marking/dropping of packets
- Exponentially weighted queue lengths
- Senders react to a single packet
- Rationale:
  - Exponential weighting is better for high-bandwidth connections
  - No bias when the weighting interval differs from the round-trip time, since packets are marked randomly
  - Random marking avoids bias against bursty traffic

RED Goals

- Detect incipient congestion, allow bursts
- Keep power (throughput/delay) high
  - keep average queue size low
  - assume hosts respond to lost packets
- Avoid window synchronization
  - randomly mark packets
- Avoid bias against bursty traffic
- Some protection against ill-behaved users

RED Operation

[Figure, top: a FIFO queue with min thresh and max thresh marked against the average queue length. Bottom: drop probability P(drop) vs. average queue length — zero below minthresh, rising linearly to MaxP at maxthresh, then jumping to 1.0]

Queue Estimation

- Standard EWMA: avg ← (1 - w_q) * avg + w_q * qlen
- Upper bound on w_q depends on min_th
  - want to set w_q to allow a certain burst size
- Lower bound on w_q to detect congestion relatively quickly
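The estimator as a one-liner, with names following the slides:

```python
def ewma(avg, qlen, wq):
    """Standard EWMA from the slide: avg <- (1 - wq)*avg + wq*qlen.
    Smaller wq absorbs bursts; larger wq reacts to congestion faster."""
    return (1 - wq) * avg + wq * qlen
```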

Thresholds

- min_th is determined by the utilization requirement
  - Needs to be high for fairly bursty traffic
- max_th set to twice min_th
  - Rule of thumb
  - The difference must be larger than the queue size increase in one RTT
  - Bandwidth dependence

Packet Marking

- Marking probability based on queue length:
  - P_b = max_p * (avg - min_th) / (max_th - min_th)
- Just marking based on P_b can lead to clustered marking -> global synchronization
- Better to bias P_b by the history of unmarked packets:
  - P_a = P_b / (1 - count * P_b)

RED Algorithm
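A minimal Python sketch of the per-arrival RED decision, combining the EWMA estimator and marking rules from the previous slides (the idle-period adjustment of avg is omitted; count follows the paper's convention of -1 while the average is below min_th):

```python
import random

def red_enqueue(avg, qlen, count, wq, minth, maxth, maxp):
    """Per-arrival RED decision (a sketch; names follow the slides).

    avg:   EWMA of the queue length so far
    qlen:  instantaneous queue length at this arrival
    count: packets since the last mark (-1 while avg was below minth)
    Returns (action, new_avg, new_count)."""
    avg = (1 - wq) * avg + wq * qlen          # update the average queue length
    if avg < minth:
        return "enqueue", avg, -1             # no congestion: accept, reset count
    if avg >= maxth:
        return "drop", avg, 0                 # severe congestion: drop/mark
    count += 1                                # in between: mark probabilistically
    pb = maxp * (avg - minth) / (maxth - minth)
    # Bias by the run of unmarked packets: pa = pb / (1 - count*pb).
    if count * pb >= 1 or random.random() < pb / (1 - count * pb):
        return "drop", avg, 0
    return "enqueue", avg, count
```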

RED Variants

- FRED: Fair Random Early Drop (SIGCOMM 1997)
  - maintain per-flow state only for active flows (ones having packets in the buffer)
- CHOKe (choose and keep/kill) (INFOCOM 2000)
  - compare the new packet with a random packet in the queue
  - if from the same flow, drop both
  - if not, use RED to decide the fate of the new packet (see the sketch below)
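A sketch of the CHOKe test, assuming hypothetical packet objects that carry a flow_id attribute and a red_decision callable standing in for the RED check above:

```python
import random

def choke_decide(queue, new_pkt, red_decision):
    """CHOKe test (a sketch): draw a random packet from the queue and
    compare flow IDs with the arrival; a match means both are dropped,
    otherwise RED decides the arrival's fate."""
    if queue:
        victim_idx = random.randrange(len(queue))
        if queue[victim_idx].flow_id == new_pkt.flow_id:
            del queue[victim_idx]     # drop the matching queued packet...
            return "drop_both"        # ...and the arrival as well
    return red_decision(new_pkt)
```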

Extending RED for Flow Isolation

- Problem: what to do with non-cooperative flows?
- Fair queuing achieves isolation using per-flow state, which is expensive at backbone routers
- Pricing can have a similar effect
  - But needs much infrastructure to be developed
- How can we isolate unresponsive flows without per-flow state?

RED Penalty Box

- With RED, monitor the history of packet drops and identify flows that use disproportionate bandwidth
- Isolate and punish those flows

Flows That Must Be Regulated

- Unresponsive:
  - fail to reduce load in response to increased loss
- Not TCP-friendly:
  - long-term usage exceeds that of TCP under the same conditions
- Using disproportionate bandwidth:
  - use disproportionately more bandwidth than other flows during congestion
- Assumption: we can monitor a flow's arrival rate

Identifying Flows to Regulate

- Not TCP-friendly: use a TCP model
  - TCP tput < (1.5 * sqrt(2/3) * B) / (RTT * sqrt(p))
    - B: packet size in bytes, p: packet drop rate
  - A better approximation appears in the Padhye et al. paper
  - Problems: needs bounds on packet sizes and RTTs
- Unresponsive:
  - if the drop rate increases by a factor of x, then the arrival rate should decrease by a factor of sqrt(x)
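A sketch of the TCP-friendliness bound implied by this model (the function and parameter names are illustrative, not from the paper):

```python
import math

def tcp_friendly_rate(pkt_size, rtt, drop_rate):
    """Upper bound on a conformant TCP's throughput (bytes/sec) from the
    slide: tput < (1.5 * sqrt(2/3) * B) / (RTT * sqrt(p)). A flow whose
    measured arrival rate exceeds this bound is not TCP-friendly."""
    return (1.5 * math.sqrt(2.0 / 3.0) * pkt_size) / (rtt * math.sqrt(drop_rate))
```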

..Flows to Regulate

- Flows using disproportionate bandwidth
  - assume additive-increase, multiplicative-decrease flows only
  - assume cwnd = W at loss
  - it can be shown that: loss prob <= 8 / (3 * W^2)
  - for segment size B: tput < 0.75 * W * B / RTT
