
Enabling Flow-level Latency Measurements across Routers in Data Centers

Parmjeet Singh, Myungjin Lee, Sagar Kumar, Ramana Rao Kompella
Latency-critical applications in data centers
 Guaranteeing low end-to-end latency is important
 Web search (e.g., Google’s instant search service)
 Retail advertising
 Recommendation systems
 High-frequency trading in financial data centers

 Operators want to troubleshoot latency anomalies


 End-host latencies can be monitored locally
 Detection, diagnosis, and localization across the network: routers/switches have no native support for latency measurements
Prior solutions
 Lossy Difference Aggregator (LDA)
 Kompella et al. [SIGCOMM ’09]
 Aggregate latency statistics

 Reference Latency Interpolation (RLI)


 Lee et al. [SIGCOMM ’10]
 Per-flow latency measurements
 RLI is more suitable because it provides more fine-grained measurements
Deployment scenario of RLI
 Upgrading all switches/routers in a data center network
 Pros
 Provides the finest granularity of latency anomaly localization
 Cons
 Significant deployment cost
 Possible downtime of entire production data centers

 In this work, we consider partial deployment of RLI


 Our approach: RLI across Routers (RLIR)
Overview of RLI architecture

[Figure: a router with ingress interface I and egress interface E]

 Goal
 Per-flow latency statistics between a pair of interfaces

 Problem setting
 Storing a timestamp for each packet at ingress and egress is infeasible due to high storage and communication cost
 Regular packets do not carry timestamps
Overview of RLI architecture

[Figure: delay vs. time at the egress. The Reference Packet Injector at ingress I sends reference packets R1 and R2; the Latency Estimator at egress E interpolates the delays of regular packets L1 and L2 along the line connecting the reference packets' delays]

 Premise of RLI: delay locality
 Approach
1) The injector sends reference packets at regular intervals
2) Each reference packet carries its ingress timestamp
3) Linear interpolation: the latency estimator computes per-packet latency estimates (see the sketch below)
4) Per-flow estimates are obtained by aggregating per-packet estimates
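A minimal Python sketch of the interpolation step, assuming the estimator already has the timestamps and delays of the two reference packets that bracket a regular packet in time; function and variable names are illustrative, not from the RLI implementation:

from collections import defaultdict

def interpolate_delay(t_pkt, t_ref1, d_ref1, t_ref2, d_ref2):
    # Estimate a regular packet's delay from the delays of the two
    # reference packets that bracket it in time (delay-locality premise).
    if t_ref2 == t_ref1:
        return d_ref1
    slope = (d_ref2 - d_ref1) / (t_ref2 - t_ref1)
    return d_ref1 + slope * (t_pkt - t_ref1)

def per_flow_estimates(packet_estimates):
    # packet_estimates: iterable of (flow_id, estimated_delay) pairs.
    # Aggregate per-packet estimates into a per-flow mean delay.
    total, count = defaultdict(float), defaultdict(int)
    for flow_id, delay in packet_estimates:
        total[flow_id] += delay
        count[flow_id] += 1
    return {f: total[f] / count[f] for f in total}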
Full vs. Partial deployment

[Figure: six switches in two rows (Switch 1-6), with RLI senders (Reference Packet Injectors) and RLI receivers (Latency Estimators) placed on switch interfaces]

 Full deployment: 16 RLI sender-receiver pairs

 Partial deployment: 4 RLI senders + 2 RLI receivers
 81.25% deployment cost reduction


Case 1: Presence of cross traffic

[Figure: six switches (Switch 1-6); cross traffic joins the regular traffic's path at an intermediate switch and shares a bottleneck link with it, while link utilization is estimated locally on Switch 1]

 Issue: Inaccurate link utilization estimation at the sender leads to a high reference packet injection rate
 Approach
 Do not actively address the issue
 Evaluation shows little impact on the packet loss rate
 Details in the paper
Case 2: RLI Sender side

[Figure: six switches (Switch 1-6) with RLI senders and receivers; traffic from a sender can reach more than one receiver]

 Issue: Traffic may take different routes at an intermediate switch
 Approach: The sender sends reference packets to all receivers
Case 3: RLI Receiver side

[Figure: six switches (Switch 1-6) with RLI senders and receivers; the receiver observes packets that may have arrived over different intermediate routes]

 Issue: Hard to associate reference packets with the regular packets that traversed the same path
 Approaches
 Packet marking: requires native support from routers
 Reverse ECMP computation: 'reverse' engineer intermediate routes using the ECMP hash function (see the sketch below)
 IP prefix matching: applicable only in limited situations
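A minimal Python sketch of the reverse-ECMP idea, assuming the receiver knows each switch's equal-cost next hops and can reproduce its hash function; the hash and the table layout here are hypothetical, not the paper's:

import hashlib

def ecmp_next_hop(five_tuple, next_hops):
    # Pick a next hop the way an ECMP switch might: hash the 5-tuple and
    # index into the equal-cost next-hop list. Illustrative hash only;
    # real switches use vendor-specific hash functions.
    key = ",".join(map(str, five_tuple)).encode()
    h = int(hashlib.md5(key).hexdigest(), 16)
    return next_hops[h % len(next_hops)]

def reverse_ecmp_path(five_tuple, ecmp_table, src, dst):
    # Replay each switch's ECMP decision to reconstruct the route a
    # packet took between sender and receiver.
    # ecmp_table: {(switch, dst): [equal-cost next hops]} (hypothetical).
    path = [src]
    switch = src
    while switch != dst:
        switch = ecmp_next_hop(five_tuple, ecmp_table[(switch, dst)])
        path.append(switch)
    return path

# A reference packet and a regular packet can be associated whenever
# reverse_ecmp_path() yields the same intermediate route for both.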
Deployment example in fat-tree topology

[Figure: fat-tree topology with RLI senders (Reference Packet Injectors) and RLI receivers (Latency Estimators); IP prefix matching applies in some parts of the tree, and reverse ECMP computation / IP prefix matching in others]
Evaluation

 Simulation setup
 Trace: regular traffic (22.4M pkts) + cross traffic (70M pkts)
 Simulator

[Figure: simulation pipeline. A trace divider splits the trace into regular and cross traffic; the RLI sender at Switch 1 injects reference packets at a 10% or 1% injection rate; a cross traffic injector adds the cross traffic; the RLI receiver at Switch 2 runs the latency estimator]

 Results
 Accuracy of per-flow latency estimates
Accuracy of per-flow latency estimates

[Figure: CDFs of the relative error of per-flow latency estimates at bottleneck link utilizations of 67% and 93%, each with 1% and 10% reference packet injection rates; the marked relative errors are 1.2%, 4.5%, 18%, and 31%]
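Relative error here presumably follows the standard definition (the slide does not spell it out): relative error = |estimated per-flow delay - true per-flow delay| / true per-flow delay.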
Summary

 Low-latency applications in data centers
 Localization of latency anomalies is important

 RLI provides flow-level latency statistics, but full deployment (i.e., at all routers/switches) is expensive

 Proposed a solution enabling partial deployment of RLI
 Little loss in localization granularity (i.e., localization to every other router)
Thank you! Questions?
