Mirja Kühlewind∗, David P. Wagner†, Juan Manuel Reyes Espinosa†, Bob Briscoe‡
∗Communication Systems Group, ETH Zurich, Switzerland (mirja.kuehlewind@tik.ee.ethz.ch)
†Institute of Communication Networks and Computer Engineering, University of Stuttgart, Germany (david.wagner@ikr.uni-stuttgart.de)
‡BT Research, Ipswich, UK (bob.briscoe@bt.com)
control assuming the benefits lead to an increasing proportion of DCTCP flows in the future Internet.

We give an overview of DCTCP in Section II. In Section III we present modifications to the algorithm and implementation of DCTCP and introduce our proposed dual AQM scheme. In Section IV we present our results and show that stable operation with lower latency, including equal sharing if desired, is possible. We also derive rules for setting the AQM parameters to achieve both lower latency and equal sharing.

II. OVERVIEW OF DCTCP

DCTCP is a combination of a congestion control scheme and an AQM scheme that is based on new semantics of congestion notification using ECN signaling. DCTCP implements three changes: a different reaction to congestion in the sender, a specific RED configuration in the network nodes, and a more accurate congestion feedback mechanism from the receiver to the sender.

A. Simple Marking Scheme

The AQM scheme for DCTCP operation is deceptively simple: if the current instantaneous queue occupancy is larger than a certain threshold K, every arriving packet is marked. This mechanism can be implemented as a specific parameterization of RED [5]. RED probabilistically decides on the marking of arriving packets based on the average queue occupancy, calculated as a weighted moving average with weighting factor w. Only above a minimum threshold (Min_Thresh) are arriving packets marked, with linearly increasing probability (as displayed in Figure 3) up to a maximum marking probability (Max_Prob) at the maximum threshold (Max_Thresh), and with probability 1 above it. The proposed AQM scheme for DCTCP can be realized by using RED with Min_Thresh = Max_Thresh = K and w = 1.

B. Congestion Control Algorithm

A TCP sender maintains a congestion window (cwnd), giving the allowed number of packets in flight during one Round Trip Time (RTT). With DCTCP, when an ECN-Echo arrives, cwnd is updated to reflect not just the existence of some congestion within a round trip, but the exact proportion of congestion marks. This is achieved according to the following equation:

    cwnd ← (1 − α/2) · cwnd    (1)

where α is the moving average of the fraction of marked packets in the last RTT. Its calculation is given by

    α ← (1 − g) · α + g · F    (2)

where F is the fraction of marked packets in the last RTT and g is a weighting factor, recommended in [1] to be set to 1/2^4. α is updated once per RTT. This congestion control algorithm allows the sender to gently reduce its congestion window for a low fraction of markings, whereas strong reductions are performed for a high degree of congestion.

C. Enhanced ECN Feedback

ECN allows network nodes to notify of congestion by setting a flag in the Internet Protocol (IP) header (the Congestion Experienced (CE) codepoint), with no need to drop packets. A host receiving a CE-marked packet will send ECN-Echoes (ECE) in every TCP acknowledgement until it receives a Congestion Window Reduced (CWR)-flagged TCP packet. With this mechanism only one congestion feedback signal can be sent per RTT, which is appropriate for conventional TCP congestion control but not for DCTCP. Thus DCTCP changes the ECN feedback mechanism: it aims to get exactly one ECN-Echo for each CE-marked packet. To still be able to use delayed acknowledgements, Alizadeh et al. define in [1] a two-state machine for handling ECN feedback. Note that there is no negotiation, as DCTCP assumes that the receiver is DCTCP-enabled. For wider use in the Internet, the authors are standardizing a negotiation phase [3].

III. MODIFICATIONS

Our evaluation is based on a Linux patch provided by Stanford University [6] applied to Linux kernel version 3.2.18. In the initial phase of our investigations we observed unexpected and undesired behavior of that implementation, which we fixed with the minor modifications described in the following section. Furthermore, we implemented two algorithmic modifications to provide a faster adaptation to the current congestion level. Finally, we present our dual AQM scheme in Subsection III-C.

A. Implementation in Linux

a) Finer resolution for the α value: In the provided implementation [6] the resolution of α was limited to a minimum value of 1/2^10. Because of this, in our simulation scenario the congestion window converged to a fixed value in a situation with very few ECN markings. For our investigations we changed the minimum resolution to 1/2^20. It should be noted that for large congestion windows and very low marking rates, an even finer resolution might be necessary.

Setting of the Slow Start threshold: In the provided DCTCP implementation the Slow Start threshold (ssthresh) is incorrectly set to the current cwnd value after a reduction. In our implementation we correctly reset ssthresh to cwnd − 1 instead. With the original patch a DCTCP sender was in Slow Start (cwnd ≤ ssthresh) after each decrease and thus immediately increased the window (by one packet) on arrival of the first ACK, only then leaving Slow Start and correctly entering Congestion Avoidance. As with DCTCP not every window recalculation causes a window reduction, this error caused a non-linear increase over a noticeable range.

b) Allow the congestion window to grow in CWR state: While the Linux congestion control implementation in general does not allow any further window increase during roughly one RTT after the reception of a congestion signal, this does not seem to be appropriate for DCTCP. Thus in our implementation we allow the congestion window to grow even during this so-called CWR state. Moreover, if no reduction was performed, we do not reset snd_cwnd_cnt, which maintains when to increase the window next, in order to preserve the linear increase behavior.
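The window update of equations (1) and (2) can be sketched in a few lines of Python. This is an illustrative model only, not the patched kernel code; the function names and the traffic numbers are our own.

```python
G = 1.0 / 2**4  # weighting factor g recommended in [1]

def update_alpha(alpha, marked, total, g=G):
    """Equation (2), applied once per RTT: fold the fraction F of
    CE-marked packets into the moving average alpha."""
    F = marked / total if total else 0.0
    return (1.0 - g) * alpha + g * F

def reduce_cwnd(cwnd, alpha):
    """Equation (1): on congestion, scale cwnd by the smoothed
    congestion level instead of unconditionally halving it."""
    return (1.0 - alpha / 2.0) * cwnd

# Mild congestion: 10% of packets marked in each RTT.
alpha = 0.0
for _ in range(100):                          # let alpha converge towards 0.1
    alpha = update_alpha(alpha, marked=10, total=100)
print(round(reduce_cwnd(100.0, alpha), 1))    # -> 95.0, a gentle reduction
# Persistent heavy congestion (alpha -> 1) recovers the Reno-like halving:
print(reduce_cwnd(100.0, 1.0))                # -> 50.0
```

The two prints illustrate the point of the scheme: the same update rule yields a mild back-off under light marking and a full halving when every packet is marked.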
Globecom 2014 Workshop - Telecommunications Standards - From Research to Standards
[Plot residue: congestion window traces (MSS) over time (seconds); legend: TCP Reno, TCP Reno drops, DCTCP, DCTCP markings]

Fig. 2. Congestion window and mark/drop events for one TCP Reno and one DCTCP host sharing an accordingly configured queue

B. Algorithmic Modifications

c) Continuous update of α: As mentioned in Section II-B, α is updated only once per RTT. With such a periodic update scheme, α might not catch the maximum congestion level and, even worse, might still reflect an old value when the congestion window reduction is performed. To avoid this, we update α on the reception of each acknowledgement. Note that with this modification the weighting factor g must also be chosen differently, because α is recalculated more often. We therefore set g to 1/2^8 instead of 1/2^4 to compensate for this effect, making the behavior similar to the original DCTCP patch in our rather static evaluation scenarios. However, the right choice of g depends on the absolute number of markings, and thus on the number of recalculations performed, and therefore actually depends on the current number of packets in flight. This dependency could be compensated for by normalizing the fraction of marked packets F with the current number of packets in flight, or simply with the current congestion window value.

d) Progressive congestion window reduction: In the original implementation the congestion window is recalculated as soon as the CWR state is entered. But, as explained above, the actual congestion level would need to be determined over the following RTT, in which further congestion signals are expected to be received. We cannot wait one whole RTT to perform any window reduction, as this would cause further unnecessary congestion. Thus we decrease the congestion window progressively on reception of each ECN-Echo. For each recalculation we use the congestion window value cwnd_max from the start of the CWR state and reset the congestion window only if the resulting value is lower than the current value.

Figure 1 shows the congestion window of one DCTCP flow, either using the original patch or our modification, in comparison to one TCP Reno flow. It can be seen that after the Slow Start phase our implementation adapts faster, but otherwise the behavior is similar, as desired.

Fig. 3. Packet mark probability calculation

C. Dual AQM Scheme

The packets of DCTCP and other TCP flows need to be handled differently in the AQM scheme of the bottleneck network node according to the different congestion signal semantics. We propose an AQM scheme based on one shared queue but applying two differently parameterized instances of the RED algorithm, one for non-ECN traffic and one for ECN traffic. Our scheme classifies the traffic based on ECN capability; thus packets will be marked or dropped, respectively, to notify of congestion. This approach would probably result in low throughput for ECN-enabled end-systems that still use conventional congestion control such as Reno. However, given that this ECN standard was defined in 2001 and has hardly seen any active use, this is unlikely to be an important factor.

Instead we propose to standardize an ECN signal that indicates congestion immediately, allowing the end hosts to distinguish between smoothed and immediate congestion notification. DCTCP together with more accurate ECN feedback, as already under standardization [3], [4], could be re-implemented and turned on after ECN capability negotiation with the server. The much greater performance benefits of DCTCP could then incentivize OS developers to deploy DCTCP with ECN turned on by default.

Figure 2 shows an example of the resulting congestion signals along with the congestion windows of one Reno and one DCTCP flow equally sharing the bandwidth. The parameter set used is derived from our investigations described later in the second part of Section IV-C.

IV. PRELIMINARY EVALUATION

In this evaluation we investigated DCTCP with the proposed dual AQM scheme in a simplified scenario to show feasibility. Our parameter study shows that a large range of configurations can be used to achieve different operating points in link utilization, queue occupancy (and thus latency) and bandwidth sharing between multiple flows. We investigated two approaches to RED parameterization for the DCTCP traffic, as illustrated in Figure 3: i) (left) a degenerate configuration with Min_Thresh_DCTCP = Max_Thresh_DCTCP = K, creating a simple marking threshold as originally proposed for DCTCP, or ii) (right) Min_Thresh_DCTCP < Max_Thresh_DCTCP as in standard RED configurations, as described in III-C, i.e. either using a marking threshold K or a marking slope. The selected parameterization covers only
Fig. 4. Simulation scenario (only forward direction)

a limited range of the large parameter set, but presents the two most interesting cases. Other scenarios need to be investigated before applying our approach to the Internet, to cover corner cases.

A. Simulation Environment and Scenario

We evaluated our approach based on simulations using the IKR SimLib [7], an event-driven simulation library with an extension to integrate virtual machines [8] running a Linux kernel with our modified DCTCP implementation.

[Plot residue: throughput ratio Reno/DCTCP, utilization, and queue occupancy (BDPs) vs. K/Min_Thresh_Reno, for Min_Thresh_Reno in {BDP/8, BDP/4, BDP/2, BDP/√2, BDP}]

Fig. 5. Results when using marking threshold
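The two algorithmic modifications of Section III-B, the per-ACK update of α with the re-scaled gain g = 1/2^8 and the progressive window reduction from cwnd_max, can be sketched together in a small model. The class and its names are our illustration, not identifiers from the actual kernel patch.

```python
G_PER_ACK = 1.0 / 2**8   # re-scaled g: alpha now moves on every ACK

class DctcpSender:
    """Toy model of our modified sender (Section III-B)."""

    def __init__(self, cwnd):
        self.cwnd = float(cwnd)
        self.alpha = 0.0
        self.in_cwr = False
        self.cwnd_max = float(cwnd)

    def on_ack(self, ce_marked):
        # c) Continuous update: recompute alpha on every ACK, so any
        # reduction uses the current congestion estimate.
        F = 1.0 if ce_marked else 0.0
        self.alpha = (1.0 - G_PER_ACK) * self.alpha + G_PER_ACK * F
        if ce_marked:
            if not self.in_cwr:
                self.in_cwr = True
                self.cwnd_max = self.cwnd   # window at start of CWR state
            # d) Progressive reduction: recalculate from cwnd_max on each
            # ECN-Echo, and only ever shrink the window here.
            self.cwnd = min(self.cwnd,
                            (1.0 - self.alpha / 2.0) * self.cwnd_max)

sender = DctcpSender(cwnd=100)
for i in range(100):
    sender.on_ack(ce_marked=(i % 10 == 0))   # 10% marking rate
print(sender.cwnd < 100 and sender.alpha > 0)   # True: window shrank gently
```

The min() against the current window captures the "reset only if lower" rule of modification d); the toy model deliberately omits window growth, CWR exit and the normalization of F discussed in the text.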
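The dual AQM scheme of Section III-C can be sketched as two RED instances in front of one shared queue: a degenerate step configuration (Min_Thresh = Max_Thresh = K, w = 1) marking ECN-capable traffic, and a conventional slope dropping non-ECN traffic. All concrete numbers below are illustrative, and real RED implementations handle many further details [5], [9].

```python
import random

class Red:
    """Minimal RED: probabilistic congestion signal from a weighted
    moving average of the queue length."""
    def __init__(self, min_th, max_th, max_prob, w):
        self.min_th, self.max_th = min_th, max_th
        self.max_prob, self.w = max_prob, w
        self.avg = 0.0

    def congested(self, queue_len):
        self.avg = (1.0 - self.w) * self.avg + self.w * queue_len
        if self.avg < self.min_th:
            return False
        if self.avg >= self.max_th:
            return True
        frac = (self.avg - self.min_th) / (self.max_th - self.min_th)
        return random.random() < self.max_prob * frac

# DCTCP/ECN traffic: degenerate RED = step at K on the instant queue
K = 20  # packets, illustrative
red_ecn = Red(min_th=K, max_th=K, max_prob=1.0, w=1.0)
# Non-ECN (e.g. Reno) traffic: conventional slope on the averaged queue
red_reno = Red(min_th=30, max_th=90, max_prob=0.1, w=0.002)

def on_arrival(ecn_capable, queue_len):
    """Classify by ECN capability: mark ECN packets, drop the rest."""
    if ecn_capable:
        return "mark" if red_ecn.congested(queue_len) else "enqueue"
    return "drop" if red_reno.congested(queue_len) else "enqueue"

print(on_arrival(True, queue_len=25))   # mark (instant queue above K)
print(on_arrival(True, queue_len=5))    # enqueue
```

With w = 1 the average equals the instantaneous queue, so the ECN instance degenerates to the step marking of Section II-A, while the non-ECN instance keeps the smoothed drop slope that conventional TCP expects.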
[Plot residue: throughput ratio Reno/(DCTCP/N), utilization, Jain's index and queue occupancy (BDPs) for N = 1 to 5 DCTCP flows]

[Plot residue: throughput ratio Reno/DCTCP and queue occupancy (BDPs) vs. Max_Prob_DCTCP, for Min_Thresh_DCTCP in {BDP/8, BDP/4, BDP/2, BDP/√2, BDP}]

Fig. 7. Results when using marking slope

[Plot residue: 3-dimensional surface of Jain's fairness index over Min_Thresh_DCTCP (BDPs) and Max_Prob_DCTCP]

Fig. 8. Jain's fairness index

[Plot residue: maximum-fairness combinations of Min_Thresh_DCTCP and Max_Prob_DCTCP/Max_Prob_Reno]

Fig. 9. Maximum fairness

B. Using a Marking Threshold

…settings with large values of K/Min_Thresh_Reno, which also increase the average queue occupancy. Thus with a larger proportion of DCTCP in a (future) traffic mix, the AQM thresholds could be lowered while maintaining high utilization.

C. Using a Marking Slope

The alternative parameterization for the DCTCP traffic uses a conventional RED configuration, forming a slope of increasing marking probability depending on the average queue occupancy (in contrast to using a step function of the instantaneous queue length). We also studied the influence of the weighting factor w_DCTCP. As expected for a non-dynamic scenario with just long-running flows, we found that it has only minor influence on bandwidth sharing and thus chose the same value of 0.002 for w_DCTCP as for w_Reno. For this study we fixed Min_Thresh_Reno to BDP, resulting in Max_Thresh_Reno = 3 · BDP. We investigated values for Min_Thresh_DCTCP smaller than BDP and shifted Max_Thresh_DCTCP to Min_Thresh_DCTCP + 2 · BDP to keep the same distance between the thresholds, and thus the same slope, which again is a simplification to narrow our parameter set.

Figure 7 shows the throughput ratio between Reno and DCTCP and the queue occupancy when varying Max_Prob_DCTCP from 0 to 1. The plot shows that equal sharing is possible for many parameterizations, except for Min_Thresh_DCTCP = BDP. This is expected, since both flows get the same feedback rate but Reno reacts by halving the sending rate while DCTCP usually will decrease less (depending on the number of markings). Figure 8 shows a 3-dimensional plot of Jain's fairness index [10] depending on Min_Thresh_DCTCP and Max_Prob_DCTCP on the left. The highlighted ridge marks a fairness index of one. Figure 9 shows the parameter combinations of maximum fairness. The function y = x/(x − 1) is overlaid to illustrate that it fits quite closely to the measured maximum-fairness configurations. These results suggest that we can achieve equal sharing for a parameterization according to the following rule:

    Min_Thresh_DCTCP / Min_Thresh_Reno = (Max_Prob_DCTCP − Max_Prob_Reno) / Max_Prob_DCTCP = 1 − Max_Prob_Reno / Max_Prob_DCTCP    (3)

That means for a given configuration for Reno, there is just one parameter to choose, Min_Thresh_DCTCP or Max_Prob_DCTCP.

Equal Sharing Configurations: Since this formula provides configurations that implement (about) equal sharing, we scale the maximum marking probability by Min_Thresh_DCTCP and alter only the minimum
threshold for Reno traffic in this evaluation. That means we set Max_Prob_Reno to 0.1/(Min_Thresh_Reno/BDP) and Max_Prob_DCTCP to 0.2/(Min_Thresh_Reno/BDP). We show results for Min_Thresh_DCTCP = 1/2 · Min_Thresh_Reno.

[Plot residue: Jain's index, utilization, and queue occupancy (BDPs) vs. Min_Thresh_Reno (BDPs)]

Fig. 10. Equal Sharing

Figure 10 shows Jain's fairness index, utilization and queue occupancy. As can be seen, such configurations achieve almost maximum fairness in terms of equal sharing, and the queue occupancy depends approximately linearly on Min_Thresh_Reno. The achieved utilization is close to 100% with a Min_Thresh_Reno of only 0.3 · BDP. For the very simple scenario considered, this finding allows a trade-off to be defined between delay and utilization while keeping the sharing equal, and thus being "TCP-friendly".

V. CONCLUSIONS

In this paper we propose a new dual AQM scheme that can be implemented based on Weighted RED (WRED) to incrementally deploy DCTCP, with its different congestion semantics, in the public Internet. Today ECN sees only minimal deployment, but activities in standardization are under way to re-define the congestion feedback mechanism and its meaning. We argue that a classification based solely on the ECN capability of the traffic provides an opportunity for actual DCTCP deployment in the Internet. Therefore, we evaluated the possibility of concurrent usage of DCTCP with other, conventional TCP congestion control. We evaluated two RED configurations for providing ECN-based feedback for DCTCP traffic: i) a marking threshold K as proposed by the original DCTCP approach and ii) a marking slope as in standard RED configurations. We showed that both approaches can be configured for stable operation, where the proportions of DCTCP and Reno traffic converge to a certain ratio or even to an equal rate, if desired. Moreover, we found a formula for RED parameters that always results in equal sharing between DCTCP and non-DCTCP traffic. This relation allows high utilization to be traded off against low delay. We showed that, even with the minimum threshold set very low to maintain low latency, utilization increases with a larger fraction of DCTCP traffic.

This study is only a first step to show that the proposed way to deploy DCTCP in the Internet would at least give a reasonable share of capacity to long-running flows, while still reducing latency and maintaining high utilization. Further evaluation is needed using scenarios with all kinds of traffic models, e.g. with more, and not only long-running, flows and different shares of DCTCP and conventional TCP flows. Our interest also lies in a wider parameter study focusing on scenarios with ECN marking based on the instantaneous queue length only, as DCTCP already implements smoothing itself. We expect further advantages from DCTCP's reaction to congestion when flows with very small and very large RTTs are sharing the same bottleneck. We also need to show that endpoints and network nodes with the new semantics can safely coexist with any legacy ECN endpoints or network nodes, in case they are deployed without update.

The proposed way to deploy DCTCP in the Internet requires instantaneous and more accurate ECN feedback. Today ECN is defined as a "drop equivalent" and therefore provides only small performance gains, and consequently it has not seen wide deployment. With a change in semantics, ECN could be used as an enabler for new low-latency services also implementing a different response to congestion, similar to DCTCP.

Apart from a more accurate ECN signal, for which a proposal by the authors has already been adopted onto the IETF's agenda, we also see a need to standardize a change to the semantics of ECN to provide immediate congestion information without any further delays in the network. This work provides further input on the needs of a future, immediate, and therefore more beneficial ECN-based congestion control loop and proposes an approach for how congestion control could react to such a signal.

VI. ACKNOWLEDGMENTS

This work was performed while the first author was still with IKR, University of Stuttgart, Germany. It is part-funded by the European Community under its Seventh Framework Programme through the ETICS project and the Reducing Internet Transport Latency (RITE) project (ICT-317700). The views expressed here are solely those of the authors.

REFERENCES

[1] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan, "DCTCP: Efficient packet transport for the commoditized data center," Microsoft Research publications, January 2010.
[2] M. Alizadeh, A. Javanmard, and B. Prabhakar, "Analysis of DCTCP: stability, convergence, and fairness," in SIGMETRICS. ACM, 2011.
[3] B. Briscoe, R. Scheffenegger, and M. Kuehlewind, "More Accurate ECN Feedback in TCP: draft-kuehlewind-tcpm-accurate-ecn-03," IETF, Internet-Draft, Jul. 2014 (Work in Progress).
[4] M. Kühlewind, R. Scheffenegger, and B. Briscoe, "Problem Statement and Requirements for a More Accurate ECN Feedback: draft-ietf-tcpm-accecn-reqs-06," IETF, Internet-Draft, Jul. 2014 (Work in Progress).
[5] S. Floyd and V. Jacobson, "Random Early Detection gateways for Congestion Avoidance," IEEE/ACM Transactions on Networking, pp. 397–413, Aug. 1993.
[6] "DCTCP patch for Linux 3.2," https://github.com/mininet/mininet-tests/blob/master/dctcp/0001-Updated-DCTCP-patch-for-3.2-kernels.patch, 2014.
[7] "IKR Simulation Library," http://www.ikr.uni-stuttgart.de/Content/IKRSimLib/, 2014.
[8] T. Werthmann, M. Kaschub, M. Kühlewind, S. Scholz, and D. Wagner, "VMSimInt: A Network Simulation Tool Supporting Integration of Arbitrary Kernels and Applications," in Proceedings of the 7th ICST Conference on Simulation Tools and Techniques (SIMUTools), 2014.
[9] S. Floyd, "RED: Discussions of setting parameters," http://www.icir.org/floyd/REDparameters.txt, November 1997.
[10] R. Jain, D. Chiu, and W. Hawe, "A quantitative measure of fairness and discrimination for resource allocation in shared computer systems," DEC Research Report TR-301, September 1984.