
Latency Equalization: A Routing Service for Interactive Applications

Minlan Yu
Princeton University

Marina Thottan
Bell Labs, Alcatel-Lucent

Li Li
Bell Labs, Alcatel-Lucent

Abstract—Multi-party interactive network applications such as teleconferencing, network gaming and online trading are gaining popularity. In addition to end-to-end latency bounds, these applications require that the perceived delay difference among multiple users of the service is minimized. We propose a Latency EQualization (LEQ) service for interactive network applications, which equalizes the perceived latency for all users of an application. To scalably and effectively implement the LEQ service, network support is essential. We propose to deploy a few hubs in provider networks to redirect the packets of interactive applications through routes with similar end-to-end delay. We formulate the hub placement problem, prove its NP-hardness, and provide a greedy algorithm to solve this problem. Through extensive simulations, we show that our LEQ hub routing architecture significantly reduces delay difference in large network topologies. The LEQ architecture is incrementally deployable in today's networks, requiring only a few nodes to be modified to perform the LEQ service.

I. INTRODUCTION

The increased availability of broadband access to the home has spawned a new generation of netizens. Today end users expect to use the network as an interactive medium for multimedia communications and entertainment. This growing consumer space has led to several new network applications in the business and entertainment sector. In the entertainment arena, new applications involve multiple users participating in a single interactive session, for example, online gaming [1] and online music (orchestra) [2]. The commercial sector has defined interactive services such as bidding in e-commerce [3] and telepresence [4]. Depending on the number of participants involved, interactive applications are sensitive to both end-to-end latency and latency differences among participants. Minimizing the perceived latency differences among participants will enable more real-time interactivity. End-to-end latency requirements can be achieved by traffic engineering and other QoS techniques. However, these approaches are insufficient to address the needs of multi-party interactive network applications that require bounded latency difference across multiple users to improve interactivity. In online gaming, the difference in lag experienced by gamers significantly impacts the playability of the game [5], [6]. Game servers have implemented mechanisms by which participating players can vote to exclude players with higher lag times. In distributed live music concerts [2], individual musicians located at different geographic locations experience perceptible sound impairments introduced by delay differences among the musicians, severely degrading the quality of

the music. In e-commerce, latency differences between pairs of shopping agents and pricing agents can result in price oscillations, leading to an unfair advantage for those pairs of agents who have lower latency [3]. Previous work considered application-based solutions, either at the client or the server side, to achieve equalized delay [7], [8], [9]. Client-side solutions are hard to implement because they require all clients to exchange latency information with all other clients, and they are vulnerable to cheating [6]. Server-side techniques rely on the server to estimate network delay. This function places computational and memory overhead on the servers [10], which in turn limits the number of clients the server can support [1]. In this work we take a different approach and investigate network-based Latency EQualization (LEQ). In multi-player gaming, for example, compared to game servers running application-based LEQ solutions, Internet service providers (ISPs) have more detailed knowledge of current network traffic and congestion events, and greater access to network resources and routing control. Therefore, ISPs are better able to support LEQ for a large number of players with varying delays to the servers. This support can significantly improve the game experience, leading to longer play times, which in turn lead to larger revenue streams. We propose a general network-based LEQ service for interactive applications by providing equalized-latency paths between the clients and servers of each application. A direct solution would be to buffer packets on routers. However, this requires large router buffers and complex router scheduling mechanisms to decide how long to buffer each packet, and would entail significant changes to today's routers. Therefore we propose a simple scheme to achieve equalized-latency paths by directing traffic from clients closer to the server along longer paths, so that their delay is similar to that of faraway clients.
Since the traffic load of interactive applications is relatively small [11], sending a few packets through longer paths will not impact other traffic in the network. Equalized-latency paths are achieved by setting up a few hubs in the network; packets from different clients are redirected through these hubs to the servers. Since the redirection is implemented through IP encapsulation, our hub routing mechanism can be deployed by leveraging today's routing architecture. We show in the paper that our mechanism is easier to manage than other network-based solutions. We formulate the hub placement problem, which decides

where to place the hubs and how to assign hubs to different clients to minimize the delay difference. We prove that this hub placement problem is NP-hard and inapproximable. Therefore we propose a greedy algorithm, which works well in achieving equalized-latency paths. Through extensive simulation studies, we show that our LEQ architecture significantly reduces delay difference in both intradomain and interdomain settings. The paper is organized as follows: Section II motivates the need for network support. Section III describes the LEQ architecture and its deployment issues. Section IV provides our theoretical results and algorithms for the hub placement problem. Section V evaluates the LEQ architecture and algorithms under both static and dynamic scenarios. Sections VI and VII discuss related work and conclude the paper.

II. MOTIVATION FOR NETWORK SUPPORT

Current solutions for LEQ in interactive applications are implemented either at the client or server side without any network support. Application-based latency compensation techniques impose memory and computation overhead on the servers and clients [10]. Furthermore, due to the inadequacy of these end-system based techniques to manage latency imbalances, the applications suffer from problems such as inaccuracy of states, and the associated inability to scale to a large number of users [1]. We take gaming as an example to discuss the limitations of these application-based techniques. Client-side latency compensation techniques are based on hardware and software enhancements to speed up the processing of event updates and application rendering. These techniques cannot compensate for network-based delay differences among a group of clients. Buffering event update packets at the client edge routers or client nodes is hard to implement, because this requires the coordination of all the clients regarding which packets to buffer and for how long. This leads to additional measurement and communication overhead. In gaming applications clients implement dead reckoning, a scheme that uses previously received event updates to estimate the new positions of the players. This method does not work well when the actions of the player are hard to predict. Client-side solutions are also prone to cheating [6]. Players can hack the compensation mechanisms or tamper with the buffering strategies to gain unfair advantage in the game. Due to the problems of client-side solutions, several delay compensation schemes are implemented at the server side. However, while introducing CPU and memory overhead on the server, they still do not completely meet the requirements of fairness and interactivity. For example, with the bucket synchronization mechanism [7], the received packets are buffered in a bucket and the server calculations are delayed until the end of each bucket cycle.
The performance of this method is highly sensitive to the bucket (time window) size used, and there is a tradeoff between interactivity and the memory and computation overhead on the server. In the time warp synchronization scheme [8], snapshots of the game state are taken before the execution of each event. When there are late events, the game state is rolled back to one of the previous

snapshots and the game is re-executed with the new events. This scheme does not scale well for fast-paced, high-action games, because taking snapshots on every event requires both fast computation and large amounts of memory. In [9], a game-independent application was placed at the server to equalize delay differences by constantly measuring network delays and adjusting players' total delays by adding artificial lag. Measuring network delay for all clients by polling from the server introduces additional CPU and network overhead. Based on the limitations of the end-system based techniques for latency compensation, we conclude that it is difficult for end hosts to compensate for latency on their own. There is a pressing need to support LEQ as a general service to improve user experience for all interactive network applications. With network support for LEQ, the network delay measurement can be offloaded from the application server and performed more accurately. The network-based LEQ service could also react faster to network congestion or failure. Network support for LEQ is complementary to server-side delay compensation techniques. Since the network-based LEQ service provides a bound on the delay difference among participants of an interactive application, the applications can better fine-tune their performance. For example, in the case of gaming applications, the servers can use smaller bucket sizes in bucket synchronization, or use fewer snapshots in time warp synchronization. Therefore, for the same memory footprint, servers can increase the number of concurrent clients supported and improve the quality of the interactive experience.

III. LATENCY EQUALIZATION ARCHITECTURE

In this section, we first motivate our design choice, hub routing (Section III-A). Then we describe the basic LEQ service in the context of a single administrative domain (Section III-B), which could model either a single AS network or multiple cooperative ASes.
In the cooperative environment we consider the application of the LEQ architecture over the combined topology of ASes. For the basic LEQ architecture we focus on equalizing delays between the edge routers that connect the clients and servers to the network. The extension of the basic LEQ service to consider access network delays and the general case of multiple administrative domains is discussed in Section III-E. We also describe some additional advantages that our LEQ architecture provides for server-centric network applications (Section III-D).

A. Alternative Network-based Solutions

We motivate our design choice of hub routing for LEQ by discussing some alternative network-based solutions. Buffering by routers: One obvious approach to using the network to equalize delays is to buffer packets at the routers. This would require large buffers and appropriate router scheduling mechanisms that take into account packet delay requirements. If the edge router cannot buffer the packet (due to filled buffers), the edge router can stamp a delay value in the packet header so that routers along the path can then buffer the packet for some specified delay. However, we did

not choose the buffering approach, as it introduces significant changes to the normal operation of today's routers. At the very least, routers would have to make scheduling decisions (how long to buffer) based on inspecting the headers of each packet. Source routing: One could use source routing to address the problem of LEQ. Source routing can be used by the sender (i.e., client or client edge router) to choose the path taken by the packet [12]. However, this requires that all senders are aware of the network topology, which is challenging to achieve. Tune link weights: Another approach is to tune the link weights of OSPF. Today, link weights are typically tuned to achieve traffic engineering goals. Using link weights to achieve LEQ would require routers to have two sets of link weights, one optimized for load balancing the network and the other optimized for LEQ. This approach would also require changes to the routers to support application-specific routing tables. Set up MPLS paths: We can set up MPLS paths with equalized latency between each pair of client and server edge routers. This approach incurs the additional overhead of path setup and requires NC × NS MPLS paths to be configured (NC and NS are the number of client and server edge routers, respectively). This solution does not scale well for large numbers of client and server edge routers. Our solution, hub routing: Based on the above design considerations we chose a hub routing approach: using a small number of hubs in the network to redirect packets, we equalize the delays of low-latency clients (that are near the servers) by choosing longer paths. The hubs are optimally placed to ensure that the delay differences are minimized for all users that are steered via these hubs. Since the traffic for interactive applications is small [11] (e.g., in gaming, heavy-duty graphics are loaded to the clients in advance), this redirection will not cause congestion on the hubs or in the network.
We choose hub routing for four reasons: i) it only requires a few hubs in the network, reducing the cost of deployment; ii) it allows incremental deployment: even with one hub, we can reduce the delay difference by 40% on average compared with shortest path routing; iii) no modification of the underlying routing protocols is necessary; and iv) the LEQ architecture can be implemented as an overlay on the underlying routing infrastructure.

B. Basic LEQ Hub Routing Architecture

For the basic LEQ architecture, we consider a single administrative domain scenario, and focus on equalizing delays between the client edge routers and the server edge routers. Based on the LEQ requirements, each client edge router is assigned a set of hubs. Client edge routers redirect the application packets corresponding to the LEQ service through the hubs to the destined server. By redirecting through the specified hubs, packets from different client edge routers with different delays to the server are guaranteed to reach the server within a bounded delay difference.
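The hub-selection idea behind this redirection can be illustrated with the example of Fig. 1: for each client edge router we know the total client-to-server delay through each candidate hub, and we pick per-client hub sets that minimize the spread of those delays. A minimal sketch (the delay values match the Fig. 1 example; the brute-force search itself is only illustrative, not the paper's placement algorithm):

```python
# Toy check of the Fig. 1 example. delay[c][h] is the total path delay
# d(c, h) + d(h, R10) in ms, taken from the paper's example; the search
# below is illustrative, not the paper's hub placement algorithm.
from itertools import combinations

delay = {
    "R1": {"R6": 10, "R7": 10, "R8": 11},  # client 1's edge router
    "R2": {"R6": 18, "R7": 10, "R8": 10},  # client 2's edge router
}

def spread(assignment):
    """Maximum pairwise delay difference over all selected paths."""
    ds = [delay[c][h] for c, hubs in assignment.items() for h in hubs]
    return max(ds) - min(ds)

# Try every assignment of m = 2 hubs per client edge router.
best = None
for hubs1 in combinations(delay["R1"], 2):
    for hubs2 in combinations(delay["R2"], 2):
        cand = {"R1": hubs1, "R2": hubs2}
        if best is None or spread(cand) < spread(best):
            best = cand

print(best, spread(best))  # picks R6/R7 for R1 and R7/R8 for R2, spread 0
```

With these values the search settles on hubs R6/R7 for R1 and R7/R8 for R2, placing all four paths at 10 ms, i.e., a delay difference of zero.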


Fig. 1. LEQ routing architecture. (R1..R10 are ten routers in the network; R1 and R2 are client edge routers; R10 is the server edge router. The numbers on the links represent the latency (ms) of each link. R6, R7 and R8 are hubs.)

Figure 1 illustrates the basic LEQ routing architecture. The client traffic from an interactive application enters the provider network through edge routers R1 and R2. The server for the interactive application is connected to the network through edge router R10. If the traffic is routed using the shortest path routing metric, the latency between R1 and R10 is 3 ms and from R2 to R10 is 10 ms. Using the LEQ routing architecture, R6 and R7 are chosen as hubs for R1, and R7 and R8 are chosen as hubs for R2. Using redirection through hubs, R1 has two paths to the server edge router R10: R1-R6-R10 and R1-R7-R10, both of which have a delay of 10 ms. R2 also has two paths: R2-R7-R10 and R2-R8-R10, whose delay is also 10 ms. Thus LEQ is achieved by clever hub selection: we achieve LEQ between clients 1 and 2 by redirecting the packets from client 1 through hubs along longer paths.

C. Deployment Considerations for LEQ Service
Fig. 2. LEQ Service Deployment. (Setup phase: the LEQ service manager takes the LEQ service parameters [maximum delay bound, number of hubs M, number of hubs per edge router m] and runs the hub placement algorithm to produce a set of hubs for each client edge router. Deployment phase: the manager sends the service ID, application identifiers, and assigned hub IDs to the client edge routers. Running phase: hubs report network state information (server edge router ID, client edge router ID, delay).)

Our LEQ architecture requires a centralized server, the LEQ service manager, to initialize the LEQ service for each interactive application. Figure 2 outlines LEQ service deployment in three phases:
Setup phase: The LEQ service manager considers all the client edge routers as potential entry points for participants in an interactive application. Therefore the service manager runs our offline hub placement algorithm to select a few hubs for each client edge router. The hub placement algorithm is executed offline because it requires operational time to select and set up hubs, and it is executed in a centralized service manager because it requires global knowledge of routing paths. To deal with delay changes due to congestion, each client edge router is assigned to more than one hub, giving the client edge router the flexibility to select paths that avoid congestion. Once the hubs have been chosen, the LEQ service manager will set up the hubs. Hubs can be either routers with access to some network performance measurements or standalone network appliances that can do packet processing and delay monitoring.
Deployment phase: The service manager sends the service ID and application-specific information for packet identification (e.g., port number and server IP addresses) to the client edge routers and hubs. It also sends the information of the assigned hubs (i.e., IDs and hub IP addresses) to the client edge routers. Packet redirection through hubs can be easily implemented through IP encapsulation without any changes in today's routing architecture.
Running phase: Each hub monitors the delay experienced between itself and the different participating client and server edge routers, and sends it to the client edge routers at some application-specific time interval t (e.g., t may be 5 minutes for gaming applications). To provide the LEQ service, the edge router first identifies the application packets that require LEQ service, based on the port number and IP addresses of the servers (known to the service provider in advance), and then sends the packets via one of its assigned hubs.

D. Additional Advantages of LEQ Service

1) Reliability: For reliability we choose multiple hubs for each client edge router. In fact, there is a tradeoff to be made with regard to the appropriate number of hubs for each client.
More hubs lead to more diversity of equalized-latency paths for one client, and thus provide more reliable paths in the face of transient congestion or link/node failures. However, these additional equalized-latency paths are realized by a small compromise in the delay difference achieved. We study this tradeoff through our dynamic simulation in Section V-E.
2) Low management and monitoring overhead: The LEQ architecture requires much less management and monitoring overhead than the other routing-based approaches. From a messaging perspective, at the deployment phase the LEQ service manager sends M + NC control messages, where M is the number of hubs and NC is the number of client edge routers. At the running phase, each edge router receives network state messages from its assigned hubs (usually 2 or 3). The LEQ service does not impose a big monitoring overhead, since each hub only monitors the delay to its set of client and server edge routers, and the monitoring time scale is on the order of a few minutes. In contrast, with source routing each client edge router must communicate with all the other NC − 1 client edge routers. If MPLS paths had been set up

between each pair of client and server edge routers, it would be necessary to monitor the delay on all the NC × NS paths.
3) Application-Based Enhancements: Another advantage of the LEQ architecture is that application-specific functionality can be added to the hubs. For example, in online gaming, the multicast distribution of server updates can be implemented in a resource-efficient manner by leveraging the optimized location of hubs. Furthermore, LEQ service support could simplify the design and implementation of future interactive applications. For example, today in gaming applications, a new game client has to query and select from a large list of servers to find the server with the required delay and delay difference specifications. This query-select process is done manually by the player and is time-consuming. With LEQ support on the edge routers, both the server selection and the routing paths can be pre-calculated for all the clients.

E. Extensions to the Basic LEQ Service

1) Access Network Delay: Latency differences in interactive applications often arise from the disparity in access network delays. Multiple clients may connect to the same client edge router through different access networks. Access network delay depends on the technology used and the current load on the last-mile link [13]. Average access network delays can be: 180 ms for dial-up, 20 ms for cable, 15 ms for Asymmetric Digital Subscriber Line (ADSL), and negligible for Fiber Optic Service (FiOS).1 In our LEQ architecture we account for this disparity of access network delays by grouping clients into latency groups. We provide different hubs for each latency group to achieve latency equalization among all the clients. These hubs are chosen using the median access delay among the clients in each latency group. When a client connects to an edge router, the edge router can measure the access delay of the client through either active or passive probing.
The router then identifies the latency group the client belongs to, based on its access network delay, and forwards the application traffic to the corresponding hub. This implies that multiple clients connected to the same edge router but with widely different access delays will be assigned to different hubs.
2) Multiple Administrative Domains: Hosting interactive applications under multiple administrative domains is challenging. There are two possible scenarios. The first case is a cooperative environment, where different Internet service providers cooperate to provide LEQ service within their respective domains. This case also corresponds to scenarios where service providers can negotiate peering policies that are application-specific [14]. In this cooperative environment we consider the application of the LEQ architecture over the combined topology of both providers. In this scenario, similar to the single administrative domain, the LEQ architecture can significantly reduce delay differences. The second case is the service-agnostic peering environment, where the game server is hosted by one of the providers
1 We assume servers are connected to the network on dedicated high speed links and thus do not have access delay.

and users from other providers at best receive the basic best-effort service. This applies to situations where there is no knowledge of topology and routing in the peering domains, and no cooperation in placing hubs. In this case, the hosting network treats users coming from other peering networks with differing delays at a border router similarly to users with different access delays. Our evaluation in Section V shows that we can indeed reduce delay differences significantly with only one provider supporting the LEQ service.

IV. ALGORITHMS FOR LATENCY EQUALIZATION
The key component of our LEQ architecture is the hub placement algorithm, which addresses the problem of hub selection and the assignment of hubs to the client edge routers. Hubs are selected with the goal of minimizing the delay difference across all client edge routers. We first formulate the basic hub placement problem without considering access delay, and prove that it is NP-hard and inapproximable. We therefore propose a greedy heuristic algorithm to solve this general problem, and we extend the basic algorithm to handle access delays. Using the selected hub nodes, we show that delay differences can be significantly reduced compared to shortest path routing.

A. Formulating the Basic Hub Placement Problem
We use an undirected graph G = (V, E) to represent the network. V consists of client edge routers, candidate hubs, and servers. Let VC ⊆ V denote the set of client edge routers, VS ⊆ V the set of server nodes, and VH the set of routers that can be chosen as hub nodes. We denote by d(u, v), u, v ∈ V, the propagation delay of the underlying network path between routers u and v. In order to balance the load among the NS = |VS| servers, we associate each client edge router with its r closest servers (in terms of propagation delay). We denote by Sci the set of servers that are associated with client edge router ci. Note that the choice of Sci is independent of the hub locations. We also define Dmax as the maximum delay each client edge router ci can tolerate on its path to any server in Sci.2 To reduce the management overhead, we set up at most M hubs in the network. For reliability we require that each client edge router has at least m hubs chosen from the M hubs; thus each client edge router has m different paths to the servers. (See Table I for notations.)

2 The value of Dmax depends on maximum delay requirements and the application's interactivity level.

TABLE I
NOTATIONS OF THE BASIC HUB PLACEMENT PROBLEM
d(u, v): propagation delay between routers u and v
Sci: set of servers associated with client edge router ci
Hci: set of hub nodes assigned to client edge router ci
NS: number of servers in the network
r: number of servers associated with each client edge router
M: total number of hubs
m: number of hubs selected for each client edge router
Dmax: maximum delay bound of each end-to-end path
Δ: delay difference

Given M, m, r, and Dmax, our goal is to find a set Hci of m hubs for each client edge router ci, so that we minimize the delay difference Δ. Let d(ci, hj) denote the delay from a client edge router ci to a hub hj, and let d(hj, sk) denote the delay from a hub hj to a server sk. We write dijk for d(ci, hj) + d(hj, sk). Let yj = 1 denote that router hj is a hub (0 otherwise), and let xij = 1 denote that router hj is a hub for client edge router ci (0 otherwise). The integer programming formulation of the hub placement problem is:

minimize Δ
subject to:
  Σ_{hj ∈ VH} yj ≤ M
  xij ≤ yj, ∀ci ∈ VC, hj ∈ VH
  Σ_{hj ∈ VH} xij ≥ m, ∀ci ∈ VC
  dijk · xij ≤ Dmax, ∀ci ∈ VC, hj ∈ VH, sk ∈ Sci
  |dijk − di'j'k'| · (xij + xi'j' − 1) ≤ Δ, ∀ci, ci' ∈ VC, hj, hj' ∈ VH, sk ∈ Sci, sk' ∈ Sci'
  yj ∈ {0, 1}, ∀j: 1 ≤ j ≤ |VH|
  xij ∈ {0, 1}, ∀i, j: 1 ≤ j ≤ |VH|, 1 ≤ i ≤ |VC|

The first constraint states that the total number of hubs cannot be more than M. The second means that each client edge router can only select its hubs from the hub set VH. The third is the constraint that each client must have at least m hubs. The fourth is the constraint that the maximum delay cannot be more than Dmax. The fifth specifies that the pair-wise delay difference between pairs xij and xi'j' cannot exceed Δ; it takes effect only when xij = 1 and xi'j' = 1, otherwise the constraint is trivially true. The last two constraints indicate that yj and xij are binary variables.

B. Complexity of the Basic Hub Placement Problem

We now show that the basic hub placement problem is NP-hard and inapproximable even for a single server.
Theorem 1: When m < M, the basic hub placement problem is NP-hard and is not approximable.
Proof: Our proof is based on the reduction from the well-known set cover problem, which is NP-hard.
Set cover problem. Consider a ground set U = {e1, e2, ..., en} and a collection of n subsets Bi ⊆ U of that ground set. Given an integer M, the set cover problem is to select at most M subsets such that taken together they cover all the elements in U. In other words, is there a set C ⊆ {B1, ..., Bn} such that |C| ≤ M and ∪_{Bi ∈ C} Bi = U?
Hub placement is NP-hard. Given an instance of the set cover problem, we construct an instance of the hub placement problem. We map each element ei to a client edge router ci, and map each subset Bj to a candidate hub hj. If ei ∈ Bj, we set d(ci, hj) to a small constant δ; otherwise, we set it to Dmax. We set the same delay δ from each candidate hub to each server (d(hj, sk) = δ, ∀j, k; the proof holds for both the one-server and multiple-server cases). Let m = 1, i.e., each client edge router has to have at least one hub. Obviously, there is a set cover of
size M if and only if there are M hubs achieving delay difference zero, with each client edge router assigned at least one hub. Thus we reduce the set cover problem to the hub placement problem. Moreover, given a solution to the hub placement problem, we can easily calculate its delay difference. Therefore the hub placement problem is NP-hard.
Hub placement is inapproximable. Suppose we could approximate the problem within some factor α; let the maximum delay difference produced by such an algorithm be APX and the optimal delay difference be OPT, so that APX ≤ α · OPT. In the constructed instance we cannot pick links with delay Dmax (it would exceed the maximum delay bound), and all the remaining links have delay δ from the client edge routers to the hubs, so OPT must be zero. This means APX ≤ 0, and the algorithm would give an exact solution for the set cover problem, which is a contradiction.

C. Greedy Hub Placement Algorithm and a Special Case

We first provide a greedy algorithm for the basic hub placement problem and then show that when m = M, there exists a polynomial-time optimal solution (Table II).
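Since the problem is NP-hard, exact solutions are practical only for tiny instances, but an exhaustive search is a useful reference point for checking heuristics against the formulation above. A sketch (function and variable names are mine, not the paper's; the toy usage mirrors the Fig. 1 example with a single server S):

```python
# Brute-force reference solver for the basic hub placement problem.
# Exponential time, so tiny instances only; all names here are
# illustrative, not from the paper.
from itertools import combinations

def hub_placement(d, clients, hubs, servers, M, m, Dmax):
    """Minimize the max pairwise path-delay difference, opening at most
    M hubs and assigning m of them to each client edge router."""
    best_delta, best_assign = float("inf"), None
    for hub_set in combinations(hubs, M):
        # Feasible m-hub choices per client, with their path delays dijk.
        choices = []
        for c in clients:
            opts = []
            for hs in combinations(hub_set, m):
                ds = [d[c][h] + d[h][s] for h in hs for s in servers]
                if max(ds) <= Dmax:          # per-path delay bound
                    opts.append((hs, ds))
            choices.append(opts)
        if any(not o for o in choices):
            continue                         # some client cannot be served
        def search(i, picked, ds):
            nonlocal best_delta, best_assign
            if i == len(clients):
                delta = max(ds) - min(ds)    # delay difference of this pick
                if delta < best_delta:
                    best_delta = delta
                    best_assign = dict(zip(clients, picked))
                return
            for hs, hds in choices[i]:
                search(i + 1, picked + [hs], ds + hds)
        search(0, [], [])
    return best_delta, best_assign
```

With a single server S behind R10 (hub-to-server delays 5, 4, and 3 ms for R6, R7, R8) and the Fig. 1 client-to-hub delays, the solver returns a delay difference of zero, assigning {R6, R7} to R1 and {R7, R8} to R2.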
TABLE II
HUB PLACEMENT PROBLEM SUMMARY
Hub constraint | Complexity | Algorithm
m < M | NP-hard, inapproximable | Greedy (Algorithm 1)
m = M | P | Optimal (Algorithm 2)

Algorithm 1 Basic hub placement for min Step 1. Sort all the delays from client edge router ci to server sk through hub hj in increasing order, denote this array A Step 2. For each A[t], binary search to nd the min delay difference: for each delay A[t] lef t = 0, right=Dmax A[t] while(lef t not equal right) t = (lef t + right)/2 Lt = greedycover(A[t], t , m, G, {d(u, v)}) if (|Lt | > M ) lef t = t else right = t Step 3. pick Lt with smallest (t ,A[t]) in terms of lexicographical order.
Fig. 3. The pseudo-code of basic hub placement algorithm

1) Greedy Algorithm for m < M: To solve the hub placement problem, we design a simple greedy heuristic to pick the M hubs (see Figure 3). The algorithm first sorts, in increasing order, all the delays from each client edge router through each possible hub to its associated servers (Step 1); the sorted list is denoted by the array A. For the example in Figure 1, the delays from client 1 through hubs R6, R7, and R8 are 10, 10, and 11, and the delays from client 2 through hubs R6, R7, and R8 are 18, 10, and 10; thus A[0] = 10, A[1] = 11, A[2] = 18. Since we want to optimize for delay difference, we use a binary search to find a feasible solution with minimum delay difference (Step 2). In each step of the binary search, given a candidate minimal delay A[t] and a maximum delay difference bound δt, we use a greedy set-cover algorithm (greedycover in Algorithm 1) to pick the M hubs: we select one hub at a time, each time choosing the hub that covers the maximum number of uncovered client edge routers. A client edge router is not covered in greedycover by a hub if its inclusion would cause the maximum delay difference to exceed the preset bound δt. We initialize δt to Dmax, the maximum tolerable delay, and Lt is the set of hubs returned by greedycover when the minimal absolute delay is A[t]. If m > 1, each client edge router has to be covered m times. If no feasible solution exists, greedycover makes |Lt| larger than M, and the binary search outputs Dmax as δt. Finally, we pick the solution with minimum δt (Step 3); if there are multiple optimal solutions, we pick the one with the smallest minimal delay A[t] among them, because applications are also sensitive to absolute delay.

2) Optimal Algorithm for the Special Case m = M: When m = M, we can design an optimal algorithm, shown in Figure 4. Since each client edge router goes through all M hubs, the minimal delay and maximum delay of all the paths through a given hub hj (paths from any ci via hj to any sk ∈ Sci) are known in advance. This is in contrast to the general case, where these two delays depend on the assignment of hubs to client edge routers, and it is the intuition for why an optimal algorithm exists for the special case but not for the general case. Let B be the set of all candidate hub records, where each record b ∈ B has three fields: hub ID hid, minimal delay mind, and maximum delay maxd. Denote by B1 and B2 the records sorted in increasing order of mind and maxd, respectively (Step 1). For each candidate hub record b1, the algorithm computes a candidate solution Lt by adding hubs b2 with the smallest maxd among those with b2.mind > b1.mind, until |Lt| = M (Step 2). The algorithm then picks the Lt with minimal δt (Step 3).

Theorem 2: Algorithm 2 is optimal when m = M.
Proof: Lt is optimal for each possible mind. Therefore the Lt with minimal δt must be an optimal solution to the problem.

Algorithm 2 Optimal algorithm for m = M
Step 1. Let B1, B2 be the candidate hub records sorted by their min delay and max delay, respectively, in increasing order.
Step 2. for each b1 ∈ B1
            Lt = {b1.hid}, δt = Dmax
            for each b2 ∈ B2
                if (b2.mind > b1.mind and |Lt| < M)
                    Lt = Lt ∪ {b2.hid}
                    δt = b2.maxd − b1.mind
Step 3. Pick the Lt such that δt is the smallest.

Fig. 4. The pseudo-code of the algorithm for m = M
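As a concrete illustration, the basic hub placement of Algorithm 1 might be sketched as follows. This is not the paper's exact code: the delay table layout (delay[c][h] is the end-to-end delay of client edge router c via candidate hub h to its server), the D_MAX value, and all names are illustrative assumptions.

```python
# Sketch of Algorithm 1: greedy set cover plus a search over candidate
# minimal delays A[t], with a binary search on the delay-difference bound.

D_MAX = 100  # maximum tolerable delay (assumed value)

def greedycover(delay, M, m, a_t, bound):
    """Greedily pick up to M hubs so that every client edge router is
    covered m times by hubs whose path delay lies in [a_t, a_t + bound]."""
    clients = list(delay)
    hubs = {h for c in clients for h in delay[c]}
    need = {c: m for c in clients}  # remaining cover count per client

    def covers(h, c):
        return need[c] > 0 and h in delay[c] and a_t <= delay[c][h] <= a_t + bound

    chosen = []
    while any(need.values()) and len(chosen) < M:
        candidates = hubs - set(chosen)
        if not candidates:
            return None
        # pick the hub covering the most still-uncovered client edge routers
        best = max(candidates, key=lambda h: sum(covers(h, c) for c in clients))
        if sum(covers(best, c) for c in clients) == 0:
            return None  # no hub helps: infeasible for this (a_t, bound)
        for c in clients:
            if covers(best, c):
                need[c] -= 1
        chosen.append(best)
    return chosen if not any(need.values()) else None

def place_hubs(delay, M, m):
    """For each candidate minimal delay A[t], binary-search the smallest
    feasible delay-difference bound; keep the best (hubs, delta, A[t])."""
    A = sorted({d for c in delay for d in delay[c].values() if d <= D_MAX})
    best = (None, D_MAX + 1, None)
    for a_t in A:
        lo, hi = 0, D_MAX
        while lo < hi:  # binary search on the delay-difference bound
            mid = (lo + hi) // 2
            if greedycover(delay, M, m, a_t, mid) is not None:
                hi = mid
            else:
                lo = mid + 1
        hubs = greedycover(delay, M, m, a_t, lo)
        if hubs is not None and lo < best[1]:
            best = (hubs, lo, a_t)  # strict < keeps the smallest A[t] on ties
    return best
```

On the example above, place_hubs with M = m = 1 selects hub R7 with delay difference 0 at minimal delay 10. Note that, as in the paper, the binary search treats greedy-cover feasibility as if it were monotone in the bound, so the result is a heuristic rather than an exact optimum in the general case.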

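For the special case m = M, the sweep of Algorithm 2 (Figure 4) can be transcribed almost literally. The tuple encoding of the hub records as (hid, mind, maxd) and the function name are our choices, not the paper's:

```python
# Near-literal transcription of Algorithm 2 (optimal when m = M):
# sweep candidate minimal delays b1.mind in increasing order and, for
# each, greedily complete the solution with the smallest-maxd hubs.

def place_hubs_m_equals_M(records, M, d_max):
    B1 = sorted(records, key=lambda b: b[1])  # sorted by mind
    B2 = sorted(records, key=lambda b: b[2])  # sorted by maxd
    best_hubs, best_delta = None, d_max + 1
    for b1 in B1:
        hubs, delta = [b1[0]], d_max
        for b2 in B2:  # smallest maxd first
            if b2[1] > b1[1] and len(hubs) < M:
                hubs.append(b2[0])
                delta = b2[2] - b1[1]  # maxd − mind of this candidate solution
        if len(hubs) == M and delta < best_delta:
            best_hubs, best_delta = hubs, delta
    return best_hubs, best_delta
```

For instance, with records ('A', 10, 12), ('B', 11, 13), ('C', 12, 20) and M = 2, the sweep keeps {A, B} with delay difference 13 − 10 = 3.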
D. Hub Placement with Access Delays

The basic hub placement problem can easily be extended to account for access delays by extending the definition of the nodes V in the general graph G(V, H) to include client groups. We divide the clients that connect to the same client edge router into groups based on their access delay. For example, we can partition the clients of an edge router into four groups with access delays in [0,30), [30,60), [60,100), and [100,∞). For each client group gi, we define the access delay a(gi) as the median delay between the clients in the group and their associated edge router; this median delay characterizes the client group in our algorithm. Further, we define the delay between client group gi and hub hj as d(gi, hj). This delay consists of two parts: the access delay and the delay from the edge router to hub hj. Similarly, the delay from hj to server sk, denoted d(hj, sk), consists of the delay from hub hj to the edge router and the delay from the edge router to sk. Following Algorithm 1, we obtain the greedy hub placement algorithm shown in Figure 5.
Algorithm 3 Hub placement with access delay
Step 1a. For each client group, calculate the median access delay and its delay to the hubs.
Step 1b. Sort, in increasing order, all the delays no larger than Dmax from client group gi to server sk through candidate hub hj; denote this array A.
Steps 2 and 3: Same as Steps 2 and 3 in Algorithm 1.

Fig. 5. The pseudo-code of hub placement algorithm with access delay
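The grouping step that feeds Algorithm 3 can be sketched as follows. The bucket edges come from the text; the function names, data layout, and use of Python's statistics.median are our assumptions:

```python
# Sketch of the client-grouping step: clients behind one edge router are
# bucketed by access delay, and each group is represented by the median
# access delay of its members.

from statistics import median

BUCKETS = [(0, 30), (30, 60), (60, 100), (100, float("inf"))]

def group_clients(access_delays):
    """Map each access-delay bucket to (member delays, median access delay)."""
    groups = {}
    for lo, hi in BUCKETS:
        members = [d for d in access_delays if lo <= d < hi]
        if members:
            groups[(lo, hi)] = (members, median(members))
    return groups

def group_to_hub_delay(group_median, edge_to_hub):
    """d(g_i, h_j): the group's access delay plus the edge-router-to-hub delay."""
    return group_median + edge_to_hub
```

A newly arrived client is then handled by finding its bucket and forwarding its packets to the hubs already assigned to that client group.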

The use of client groups simplifies the management of the LEQ architecture. For a newly arrived client, we only need to determine the client group to which it belongs; the client edge router then forwards the packets from the new client to the hubs associated with its client group.

V. EXPERIMENTAL EVALUATION

We evaluate our LEQ routing architecture through both static analysis and dynamic simulation on realistic provider network topologies. In the static case we consider only propagation delays, which corresponds to a lightly loaded network. In the dynamic case we evaluate the LEQ routing architecture under a rate-varying traffic matrix where the offered load on the links can approach the maximum link capacity over short periods of time. In each simulation scenario we compare the performance of the LEQ routing scheme with that of OSPF. We also show simulation results that characterize the parameters of the LEQ routing architecture.

A. Simulation Setup

For our network simulations we use large ISP network topologies such as AT&T, Telstra, and Sprint, obtained through the Rocketfuel project [15]. For the dynamic case we consider the Abilene network topology [16]. The key characteristics of these networks are summarized in Table III.

TABLE III
MAIN CHARACTERISTICS OF EXAMPLE NETWORKS

ISP               AT&T   Sprint   Telstra   Abilene
Number of nodes    391    376       97        11
Number of links   1280   1467      132        14

Our evaluation uses several parameters that define the hub routing architecture: the total number of hubs M, the number of hubs selected for each client edge router m, the number of servers in the network NS, and the number of servers allocated to each client edge router r. We evaluate the LEQ routing architecture with and without access delay, and use acd to denote the range of access delays. The performance metric is the delay difference δ, the maximum difference in delay among all the selected paths. We use all the edge nodes in the backbone topology as client edge routers and randomly choose NS edge nodes as server locations. Each client communicates with the r servers nearest to it in terms of propagation delay. We then run the hub routing and shortest path routing algorithms to compute the paths for these clients and servers. In the static case the path computation is based on the propagation delay in the network, which we compute from the geographical distances between any two nodes; in the shortest path routing algorithm we set the link weights to the propagation delays of the links. To eliminate the bias introduced by server location, when NS = 1 we test all possible server locations, and when NS > 1 we use 1000 simulation runs, each time randomizing the server locations.

B. Single Administrative Domain: Without Access Delay

Using the static traffic scenario, we analyze the effect of propagation delay and explore the potential of the LEQ architecture to discover latency-equalized paths. To highlight the main features of the LEQ architecture, we first consider the provider network without network access delays.

(1) The LEQ architecture reduces delay difference significantly compared with shortest path routing, and requires only a few hubs. Figure 6(a) shows the average delay difference over all the client paths to any one server for both LEQ routing and OSPF. In all the networks we tested, LEQ routing with a single hub per client (m = 1) reduces the delay difference to 5 ms (about an 85% reduction for AT&T, 85% for Telstra, and 90% for Abilene over OSPF4). Even with just one hub in the entire network (M = 1), LEQ routing achieves on average a 40% reduction in delay difference. The best performance for LEQ routing is achieved when the number of hubs per client (m) is set to 1; as m increases, the added path diversity increases the average delay difference. However, even with 3 hubs per client the performance of LEQ routing is significantly better than OSPF. From Figure 6(a) we also note that increasing the total number of hubs in the network beyond 5 does not provide any significant further improvement in delay difference. This result holds across topologies of very different sizes (AT&T with 391 nodes and Abilene with 11 nodes).

(2) We trade the smaller delays of clients near the server for an improved overall delay difference for all the clients.

4 The figures for Telstra and Abilene are omitted due to space limitations.
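The simulation setup above derives link propagation delays from the geographical distance between nodes. A minimal sketch of that computation, where the haversine formula and a ~2/3-speed-of-light fiber propagation speed are our assumptions rather than details stated in the paper:

```python
# Sketch: derive OSPF-style link weights (propagation delay in ms)
# from node coordinates. 200 km/ms (~2e8 m/s signal speed in fiber)
# and the haversine great-circle formula are assumptions.

from math import radians, sin, cos, asin, sqrt

SPEED_KM_PER_MS = 200.0  # assumed propagation speed in fiber

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # Earth radius ~6371 km

def propagation_delay_ms(node_a, node_b):
    """Link weight: propagation delay in ms between two (lat, lon) nodes."""
    return haversine_km(*node_a, *node_b) / SPEED_KM_PER_MS
```

For example, a New York to Los Angeles link (roughly 3900 km great-circle) comes out at about 20 ms of propagation delay under these assumptions.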

Figure 6(b) highlights the philosophy behind LEQ routing. It shows the maximum, average, and median delay of the selected paths for both OSPF and LEQ routing, with the number of hubs per client set to 3, in the AT&T network. LEQ routing achieves a smaller delay difference at the expense of increasing the average delay of shorter paths in the network; however, the maximum delay among these paths is similar to that obtained with OSPF. The increase in average delay mainly comes from the increased latency of the smaller-delay clients. We trade the increased delay of some clients for a reduction in the overall delay difference, because delay difference is more critical than delay as long as the absolute delay is bounded [17].

(3) With multiple servers, the LEQ architecture achieves similar performance gains. In some applications, such as P2P gaming and distributed live music performances, there are multiple servers at different places, and each client is associated with several servers. Figure 7 shows the influence of the number of servers (NS) and the number of servers per client (r) on the delay difference δ. Keeping r fixed, increasing the total number of servers does not influence the delay difference of LEQ routing. With OSPF, increasing the total number of servers reduces the average delay difference: with more servers, the clients can choose nearer servers, so their shortest-path delays to the servers decrease and the delay difference decreases correspondingly. Even with 20 servers, LEQ routing still performs better than OSPF.

Fig. 6. Comparison of OSPF and LEQ routing in the AT&T network (NS = 1). (a) Delay difference; (b) max, avg, and median delay (m = 3).

Fig. 7. Influence of the number of servers (NS) and servers per client (r) in LEQ routing (M = 5, m = 2). (a) AT&T network; (b) Telstra network.

C. Single Administrative Domain: With Access Delay

The classification of access delay is discussed in Section III. For simplicity, we consider two ranges of access network delays (acd): [0ms, 50ms] and [0ms, 100ms] (we omit ms in the following text). We assume 20 clients on each edge router, with access delays chosen randomly within the range. A server can impose restrictions on client participation based on the range of access network delay it can support.

(1) The improvement in delay difference depends on the range of variation in the access network delays. Figures 8(a) and (c) show that with access delays ranging from 0 to 50, the delay difference can be reduced by 50% for the Telstra network and 45% for the AT&T network. However, with access delays ranging from 0 to 100 (Figures 8(b) and (d)), the delay difference is reduced by only 25% for the Telstra network and 35% for the AT&T network. This result shows that when the access network delay difference is very large, LEQ routing performance will benefit from improvements in access network technologies that reduce access delay variations.

Fig. 8. Delay difference with access delay. (a) AT&T network, acd: [0-50]; (b) AT&T network, acd: [0-100]; (c) Telstra network, acd: [0-50]; (d) Telstra network, acd: [0-100].

Fig. 9. User experience comparison of max delay and delay difference (acd = [0,50], M = 5). (a) AT&T network; (b) Telstra network.

(2) LEQ improves user experience, an important criterion for interactive applications. The user experience of interactive applications relates to both delay difference and delay. In both the Telstra and AT&T networks (Figure 9), LEQ routing has a smaller delay difference, and thus better fairness, while keeping the same maximum delay as OSPF routing. When there are multiple servers (NS = 5, r = 1), the maximum delay for both OSPF and hub routing is reduced, while LEQ routing

still has a significant improvement in user experience (delay difference) compared to OSPF.

D. LEQ with Multiple Administrative Domains

We consider two ASes, Sprint and AT&T, with the server residing in the AT&T network, while both Sprint and AT&T have clients on their edge routers with access delays within the range acd. We investigate the following scenarios for the two ASes: (1) the ASes cooperate on routing (joint OSPF, joint LEQ), and (2) the ASes make routing decisions independently (OSPF-LEQ, LEQ-LEQ). Our observations are as follows:

(1) Independent of AS-level cooperation, the AS hosting the server can improve game experience by supporting LEQ. Figure 10 plots the delay difference and maximum delay for several scenarios. In the joint OSPF and joint LEQ scenarios, we assume that the routing scheme has complete knowledge of both network topologies. Joint LEQ routing always performs significantly better than joint OSPF. Even without knowledge of the peering Sprint network topology, using the LEQ architecture on the AT&T network alone (OSPF-LEQ) provides better performance than using standard OSPF routing on the joint topology (joint OSPF).

(2) When the server is placed close to a peering node, applying LEQ routing in the server's host AS is enough to improve game experience significantly. In Figure 10, we see that if LEQ routing is deployed independently in both Sprint and AT&T, the game experience is worse than when applying LEQ only in AT&T. This is because the two ASes do not cooperate but independently optimize for LEQ: within the Sprint network the clients may experience equal delay with each other, but once they enter the AT&T network, their delay difference with the clients in AT&T is still large.

(3) When the two ASes have equal access delay, placing the server close to the peering nodes5 improves game experience. In our study the server location is randomized except for the cases in Figures 10 and 11 labeled "fix server", in which game servers are located close to the peering nodes. We see that game performance is improved by placing servers close to the peering node locations when the access delays from both networks are in the same range (Sprint: [0-50], AT&T: [0-50]). From Figure 10, we see that when the server is placed close to the peering node and AT&T applies LEQ routing without any knowledge of the Sprint network (OSPF-LEQ, fix server case), we can achieve a delay difference similar to the case with full knowledge of both topologies (joint LEQ). Some game applications may only allow players with low access delay in Sprint to join the game, to further improve game experience. As shown in Figure 11, when the access delay of players in Sprint is close to 0, game performance also improves. However, in this case the location of the servers does not affect performance: although players behind Sprint edge routers have less access delay, they take longer
paths to reach the AT&T network where the server is located. Therefore, they can be viewed as similar to players in the AT&T network that have some access delay. When the access delays across the networks have large variability, placing servers near the peering nodes has no benefit for LEQ.

5 Peering nodes are the nodes that connect the two ASes.

E. Dynamic Analysis

In a typical service provider network, links are usually maintained at below 50% utilization. However, it has been observed that in the presence of traffic bursts, the average utilization on the time scale of minutes can approach 90%-95% of the link capacity [18]. Under these conditions, queues build up at the links and contribute to the overall delay between the clients and the server. We therefore investigate the performance of the LEQ architecture under dynamic traffic conditions.

We implemented LEQ routing as a new application in ns-2 for packet-level simulations. We generated two classes of packets: background traffic and probing packets. The background traffic represents the traffic of all the other applications in the network. Since game traffic is much smaller than the background traffic and does not influence network conditions, we do not simulate it explicitly; instead, we use small probing packets that follow the LEQ routing paths and measure the actual latency these packets experience. To force probing packets onto the LEQ routing paths, each client edge router marks the probing packets with the address of the destination node and sends them to a hub, selected in round-robin order from the m hubs allocated to that client edge router by the hub placement algorithm. Upon receiving a probing packet, the hub looks up the destination server from the packet and redirects the packet to the server. For comparison, we also send probing packets along shortest paths computed with Dijkstra's algorithm, using propagation delay as the link metric.

For the dynamic analysis we use the Abilene network topology, with a single server located at Washington, D.C. All 11 edge nodes have clients. The bandwidth capacity of each link in Abilene is fixed at 10 Gbps. For background traffic, we generated realistic traffic matrices based on the Abilene NetFlow data [16]. The probing packet size is 48 bytes, similar to the size of typical UDP packets in gaming applications [11].

(1) LEQ achieves reliability under transient congestion by providing multiple hubs for each client. In Figure 12, we run the experiment with 150 realistic Abilene traffic matrices, each applied for 10 seconds. During the interval from 500 to 1000 seconds, we insert traffic bursts on selected links by increasing their utilization to 90 percent of the link capacity; the overloaded links were chosen based on a snapshot from the Abilene network operations center [19]. Probing packets are sent at a fixed interval of 0.01 second. We then analyze all the packet traces and calculate the average delay difference for both LEQ routing and shortest path routing (OSPF). When transient congestion happens, LEQ routing with m = 2, 3
have alternate routes to get around the congested link. Thus the impact of transient congestion is less prominent on LEQ routing than on shortest path routing.

Fig. 10. LEQ in peering ASes with access delay (acd: [0-50]). (OSPF-LEQ means applying OSPF in Sprint and LEQ in AT&T; LEQ-LEQ means applying LEQ separately in both Sprint and AT&T.)

Fig. 11. Effect of access delay in peering ASes. (The first range is the Sprint acd, which is either close to 0 or [0-50]; the second range is the AT&T acd, which is always [0-50]. We apply OSPF in Sprint, LEQ in AT&T.)

Fig. 12. Transient congestion in LEQ routing.
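The round-robin hub selection used for the probing packets in this experiment can be sketched as follows; the class shape and field names are illustrative, not the ns-2 implementation:

```python
# Sketch of per-client round-robin hub selection: each client edge
# router cycles through its m assigned hubs, marking each probe with
# the destination server so the hub can redirect it.

from itertools import cycle

class ClientEdgeRouter:
    def __init__(self, assigned_hubs):
        self._next_hub = cycle(assigned_hubs)  # round-robin over the m hubs

    def forward_probe(self, dst_server):
        """Pick the next hub in round-robin order and mark the probe."""
        hub = next(self._next_hub)
        return {"via_hub": hub, "dst": dst_server}
```

With m = 2 assigned hubs, successive probes alternate between them, which is what gives each client an alternate path around a transiently congested link.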
Fig. 13. Influence of hubs per client (m) on LEQ. (a) Static analysis of propagation delay; (b) packet-level simulation of delay during congestion.
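The post-processing of the packet traces reduces, for each measurement window, the per-client latencies to the delay-difference metric. A minimal sketch, where the per-client averaging and the data layout are our assumptions:

```python
# Sketch of trace post-processing: given per-client latency samples from
# one measurement window, the delay-difference metric is the spread of
# the per-client average latencies.

def delay_difference(samples):
    """samples: {client_id: [latency_ms, ...]} for one time window."""
    per_client = [sum(v) / len(v) for v in samples.values() if v]
    return max(per_client) - min(per_client)
```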

(2) Considering the tradeoff between robustness and performance, we need to assign each client edge router 2 or 3 hubs from the hub set. In Figure 13(a), m = 1 is the optimal number of hubs per client in the static analysis. However, as Figure 13(b) shows, during transient congestion queuing delay becomes more critical than propagation delay, so adding one more hub (changing m from 1 to 2) provides more path diversity while reducing the average delay difference. Robustness to traffic variability and transient network congestion is an important requirement for any real-time interactive application; in the LEQ architecture it can be achieved with m = 2 or 3 without severe impact on the delay difference. This is consistent with the work in [20], which proves that load balancing becomes much easier with just two routing paths and the flexibility to split traffic between them.

VI. RELATED WORK

Network support for gaming and other interactive services is a relatively new topic, since the scalability and commercial significance of these services has only recently become a pressing issue. In [1] the authors motivate network support and design a game booster box, a network-based game platform that combines low-level network awareness with game-specific logic. The goal of the booster box is to offload network functions, specifically network monitoring, from the game server. Shaikh et al. [21] and Saha

et al. [22] proposed an online game hosting platform, a middleware solution based on existing grid components. The platform performs functions such as the addition and removal of servers, and administrative tasks such as new game deployment, directory management, player redirection to servers, and game content distribution. The preliminary study in [23] presented the basic idea of LEQ as a service that could run on programmable routers. Our work discusses the deployment issues of the LEQ architecture (e.g., access delay, multiple administrative domains), provides hub placement algorithms for different network settings, and presents a complete simulation and evaluation of LEQ in various scenarios. As far as we know, there are no prior theoretical results aimed at optimizing latency difference in the network. Previous work focused on reducing delay in overlay networks (e.g., RON [24]) or on reducing bandwidth costs with bounded delay (e.g., VPN tree routing [25]). In this work, we have proved the NP-hardness of the hub placement problem and provided algorithms for hub placement that achieve latency-equalized paths in the network. Cha et al. [26] proposed a strategy to place relay nodes in the intra-domain network; however, their placement algorithm aims to reduce cost, not delay difference.

VII. CONCLUSION

The LEQ routing architecture and algorithms presented in this paper provide a pathway for networks to support scalable and robust multi-party interactive applications. Based on the evaluation of our LEQ architecture, we conclude that, with minimal modifications to the current routing infrastructure, provider networks can easily support and enhance the quality of multi-party interactive applications. It is also possible to envision flexible routing services such as LEQ that are customized for specific applications across multiple ISPs by leveraging virtual networks [27].

REFERENCES
[1] D. Bauer, S. Rooney, and P. Scotton, "Network infrastructure for massively distributed games," in ACM NetGames, 2002.
[2] A. Kapur, G. Wang, P. Davidson, and P. R. Cook, "Interactive network media: A dream worth dreaming?" in Organized Sound, 2005.
[3] A. R. Greenwald, J. O. Kephart, and G. Tesauro, "Strategic pricebot dynamics," in ACM Conference on Electronic Commerce, 1999.
[4] Cisco telepresence solutions, http://www.cisco.com/en/US/netsol/ns669/networking_solutions_solution_segment_home.html.
[5] S. Zander and G. Armitage, "Empirically measuring the QoS sensitivity of interactive online game players," in Australasian Telecommunication Networks and Applications Conference (ATNAC), Dec. 2004.
[6] J. Brun, F. Safaei, and P. Boustead, "Managing latency and fairness in networked games," Communications of the ACM, 2006.
[7] C. Diot and L. Gautier, "A distributed architecture for multiplayer interactive applications on the Internet," 1999.
[8] E. Cronin, B. Filstrup, A. Kurc, and S. Jamin, "A distributed multiplayer game server system," in ACM NetGames, 2002.
[9] S. Zander, I. Leeder, and G. J. Armitage, "Achieving fairness in multiplayer network games through automated latency balancing," in Advances in Computer Entertainment Technology, 2005.
[10] A. Abdelkhalek and A. Bilas, "Parallelization and performance of interactive multiplayer game servers," in IPDPS, 2004.
[11] J. Farber, "Network game traffic modelling," in ACM NetGames, 2002.
[12] J. Postel, "Internet Protocol: DARPA Internet program protocol specification," RFC 791, 1981.
[13] T. Jehaes, D. D. Vleeschauwer, B. V. Doorselaer, E. Deckers, W. Naudts, K. Spruyt, and R. Smets, "Access network delay in networked games," in ACM NetGames, 2003.
[14] R. Mahajan, D. Wetherall, and T. Anderson, "Mutually controlled routing with independent ISPs," in NSDI, 2007.
[15] N. Spring, R. Mahajan, and D. Wetherall, "Measuring ISP topologies with Rocketfuel," IEEE/ACM Transactions on Networking, 2004.
[16] http://abilene.internet2.edu/.
[17] G. Armitage, "An experimental estimation of latency sensitivity in multiplayer Quake 3," in International Conference on Networks, 2003.
[18] R. Prasad, C. Dovrolis, and M. Thottan, "Router buffer sizing revisited: The role of the output/input capacity ratio," in Proc. CoNEXT, 2007.
[19] http://weathermap.grnoc.iu.edu/.
[20] M. Mitzenmacher, "The power of two choices in randomized load balancing," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 10, pp. 1094-1104, 2001.
[21] A. Shaikh, S. Sahu, M. Rosu, M. Shea, and D. Saha, "Implementation of a service platform for online games," in ACM NetGames, 2004.
[22] D. Saha, S. Sahu, and A. Shaikh, "A service platform for online games," in ACM NetGames, 2003.
[23] M. Yu, M. Thottan, and L. Li, "Latency equalization: A programmable routing service primitive," in ACM SIGCOMM PRESTO Workshop, 2008.
[24] D. G. Andersen, H. Balakrishnan, F. Kaashoek, and R. Morris, "Resilient overlay networks," in Proc. Symposium on Operating Systems Principles, Banff, Canada, 2001.
[25] A. Gupta, J. Kleinberg, A. Kumar, R. Rastogi, and B. Yener, "Provisioning a virtual private network: A network design problem for multicommodity flow," in ACM STOC, 2001.
[26] M. Cha, S. Moon, C. D. Park, and A. Shaikh, "Placing relay nodes for intra-domain path diversity," in Proc. IEEE INFOCOM, 2006.
[27] N. Feamster, L. Gao, and J. Rexford, "How to lease the Internet in your spare time," ACM Computer Communication Review, 2007.
