
On the Challenge of Assessing Overlay Topology Adaptation Mechanisms

Jochen Dinger and Hannes Hartenstein
Institut für Telematik, Universität Karlsruhe (TH), Germany
dinger@tm.uka.de, hartenstein@rz.uni-karlsruhe.de

Abstract
Our thesis is that a peer-to-peer network's overlay topology should adapt to match the demand graph of the peer-to-peer network. In order to assess the effectiveness of various adaptation mechanisms, a comparison with an optimal topology for a given demand graph would be helpful. However, several related optimization/decision problems have been shown to be NP-hard. The contributions of this paper are threefold: i) we briefly survey NP-hardness results related to assessing overlay adaptation strategies, ii) we present a specific optimization problem and metric, and iii) we provide experimental results indicating the potential of optimizing the overlay topology. Finally, in the spirit of a challenge paper we state and discuss various open issues.

1. Introduction
In an overlay network, every node is a potential neighbor of every other node since the network's topology is a logical one. Due to constraints on resources, the question arises which topology would be optimal given these constraints. The constraints can be classified as follows: those based on the underlying network characteristics, i.e., delay or capacity of actual links; those based on the location of data and services; and those based on the nodes' capabilities of managing peers, e.g., the number of direct neighbors a node can maintain. The last kind of constraint can be independent of the underlying network. For the first class, for example, [6] and [5] consider the efficiency with respect to the underlying network. In [2] the problem of finding an optimal path in an overlay is analyzed. For the first two classes the corresponding optimization/decision problems are computationally intractable. We briefly survey related NP-hardness results in Section 2.

In this paper we look at the third class of resource constraints, where the constraints are solely based on the number of neighbors a node can actively manage. We furthermore assume that the peer-to-peer network should be able to re-organize itself in order to continuously optimize its performance by taking the actual network status into account. However, when we want to assess a topology adaptation mechanism, we would like to compare the derived network topologies with globally optimal ones. We assume that the demands of each node are known, i.e., we can quantify how often a node requests information from each other node. Furthermore, we do not assume that these demands are uniformly distributed, as is frequently assumed in previous work. Instead, we also take other distributions into account, as is done in [3], because demands in the real world are not necessarily uniformly distributed. Based on the constraints and the demand information we can define a cost metric to minimize the total delay or load in terms of hop counts (w.r.t. the virtual topology). While information on future demands is not available, the retrospective view can serve for assessing the effectiveness of topology adaptation strategies. Unfortunately, the corresponding optimization problem as outlined in Section 3 appears to be a hard problem. To assess how much an optimal solution to our problem could differ from a classical P2P approach, we present some experimental evidence comparing a fixed topology as used by Chord [7] with a (heuristically) optimized topology and with a lower bound based on the cost function of the demand graph (Section 4). The results show a significant improvement for the optimized version in terms of costs when the query (demand) distribution is somewhat skewed. We conclude the paper with a discussion on future directions.

2. NP-Complete Problems and Conjectures


Finding a spanning tree with minimum diameter based on latency constraints of the underlying network has been the subject of various papers such as [1] and [5]. It has been shown that this problem is NP-hard. In [2] it is shown that, given a template for a composed service, finding a suitable path in a service overlay network under QoS constraints raises NP-hard problems. Also, the problem of allocating a single-path flow for a specified demand in capacity-constrained networks is an NP-complete problem [4].

3. Optimization Model
We argue that for a short period of time the P2P network can be treated as a static network. In this period, each node queries for some objects and keeps these queries in a query queue. To resolve them, it has to contact the nodes that store the objects it is interested in through the available connections. Based on these queries (demands) and the constraint that each node can only establish a connection to $d$ other nodes, we want to know which connections should have been chosen. Our metric to evaluate the topology is the distance (hop count) between nodes weighted by the corresponding demands. For a precise understanding we define a graph-based model for our optimization problem: Given is a complete graph $G = (E, V)$ with $|V| = n$ nodes and $n(n-1)/2$ edges in the undirected case or $n(n-1)$ edges in the directed case. (Undirected means that all connections are bidirectional.)
Each node $i$ has a demand vector $D^i = (d^i_1, \ldots, d^i_n)$, where $d^i_k$ denotes how often node $i$ will query node $k$. The total number of queries is $q = \sum_{i,j} d^i_j$.

Node  Objects stored    Objects queried  Nodes queried
0     0, 5, 7, 11, 12   11, 6            0, 3
1     10, 13, 14        8, 0             4, 0
2     4, 9              12, 11           0, 0
3     6, 15             4                2
4     1, 2, 3, 8        15               3

Optimal adjacency matrix (x = edge, o = no edge):
    0 1 2 3 4
0   - x x o o
1   x - o o x
2   x o - x o
3   o o x - x
4   o x o x -

Figure 1. Example of a demand graph. For each node the objects stored at the node, the objects queried by the node, and the corresponding queried nodes are given. An optimal adjacency matrix leads to total costs of 8.
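
As an illustration, the demand vectors of the example in Fig. 1 can be derived from the object placement and the per-node query lists. The following Python sketch is not part of the original experiments: the function name `demand_vectors` and the dictionary layout are hypothetical, and the node numbering 0 to 4 is an inference obtained by matching each queried object to the node that stores it.

```python
# Data of the Fig. 1 example. ASSUMPTION: the node numbering is inferred by
# matching every queried object to the node that stores it.
objects_at = {
    0: [0, 5, 7, 11, 12],
    1: [10, 13, 14],
    2: [4, 9],
    3: [6, 15],
    4: [1, 2, 3, 8],
}
queried_objects = {
    0: [11, 6],
    1: [8, 0],
    2: [12, 11],
    3: [4],
    4: [15],
}


def demand_vectors(objects_at, queried_objects):
    """Return demand[i][k]: how often node i queries node k."""
    # Map every object id to the node that stores it.
    holder = {obj: node for node, objs in objects_at.items() for obj in objs}
    demand = {i: {} for i in objects_at}
    for i, objs in queried_objects.items():
        for obj in objs:
            k = holder[obj]
            demand[i][k] = demand[i].get(k, 0) + 1
    return demand


demand = demand_vectors(objects_at, queried_objects)
# Total number of queries q, summed over all source/target pairs.
q = sum(sum(row.values()) for row in demand.values())
```

For this example the sketch yields q = 8 queries in total. One of them (node 0 asking for object 11) is served locally and therefore never contributes to a hop-count cost.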

We are now looking for a subgraph $G' = (E', V)$ of $G$ such that the costs are minimal under the following definitions: Let $\mathit{path}_{i,j}$ denote the shortest path between nodes $i$ and $j$ in graph $G'$, and let $L(\mathit{path}_{i,j})$ denote the length of that path in hops. We then want to minimize $C = \sum_{i,j} L(\mathit{path}_{i,j}) \cdot d^i_j$.

The demand vectors can be visualized by the demand graph $G^d = (E^d, V)$, where the edges represent the queries between the nodes and the edge weights represent the corresponding number of queries. An example of such a graph is shown in Fig. 1. For example, node 2 has to query node 0 two times. Thus there is an edge between node 0 and node 2 weighted with 2. The figure also shows an optimal adjacency matrix that has total costs of 8. If all vertices in the demand graph have degrees that are less than or equal to $d$, the optimal solution is just the demand graph. But in general the demand graph does not fulfill the degree constraint. Then, not only the edges of the demand graph $G^d$ are candidates for $G'$, but all possible edges from $G$ have to be taken into account. Finding an optimal solution appears to be a hard problem. Therefore, we look at heuristics in the following section on experimental results.

4. Experiments
For a small number of nodes $n$ the problem can be solved exactly, but "small" translates to only about 16 nodes. Because we cannot in general calculate the optimal overlay for larger values of $n$, we calculate a nearly optimal solution. Our approach is as follows: We start with an empty set of edges. First, we order all demands by their value. Starting from the demand with the greatest value, we insert an edge for each demand. Such an edge begins at the demand source and ends at the demand destination. If this is no longer possible because of the degree constraints, we insert an edge such that the resulting path is optimal in a greedy fashion. If there are still unsatisfied demands and we cannot insert any more edges, we delete and mark the least dominating edge w.r.t. the total costs $C$ and return to the previous step. The algorithm terminates when all demands can be satisfied. This approach has proven feasible in our experiments. Fig. 2 shows one of our results, given $2^3$ to $2^{10}$ nodes and a uniform distribution of 4096 objects. The figure shows average values over 100 runs. The nodes have been represented by a directed graph with an in- and out-degree constraint of $\log_2 n$. For comparison we also calculated the costs achieved through a Chord approach. In the case of Chord the ring was fully assigned and the key range adapted. To compare our heuristic algorithm, we additionally calculated the demand graph costs without edge degree constraints because this provides a lower bound. The $q = n$ queries and query targets, respectively, have been generated with a Gaussian distribution ($\mu = 1000$, $\sigma^2 = 100$) based on the object id space. We also performed tests with different distributions and different query-node ratios. The heuristic approach results in significantly lower costs when the query distribution (demand distribution) deviates from a uniform distribution (thus, is somewhat skewed).
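
The cost metric $C$ and the first, direct-edge phase of the heuristic can be sketched in Python as follows. This is an illustrative sketch rather than the code used for the experiments: the function names are hypothetical, the graph is treated as undirected, the degree limit d = 2 for the Fig. 1 example is an assumption read off the printed adjacency matrix, and the later phases of the heuristic (greedy path insertion and deletion of the least dominating edge) are omitted.

```python
from collections import deque

# Demands of the Fig. 1 example: FIG1_DEMAND[i][k] = queries from node i to k.
FIG1_DEMAND = {0: {0: 1, 3: 1}, 1: {4: 1, 0: 1}, 2: {0: 2}, 3: {2: 1}, 4: {3: 1}}
# Ring topology given by the optimal adjacency matrix of Fig. 1.
FIG1_RING = {0: {1, 2}, 1: {0, 4}, 2: {0, 3}, 3: {2, 4}, 4: {1, 3}}


def hop_counts(adj, source):
    """Breadth-first search: hop count from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def total_cost(adj, demand):
    """C = sum over i, j of L(path_ij) * d^i_j; infinite if a target is unreachable."""
    cost = 0
    for i, row in demand.items():
        dist = hop_counts(adj, i)
        for j, d in row.items():
            if i == j or d == 0:
                continue  # local or absent demands cost nothing
            if j not in dist:
                return float("inf")
            cost += dist[j] * d
    return cost


def greedy_direct_edges(nodes, demand, max_degree):
    """First phase only: connect the largest demands directly while degrees allow."""
    adj = {v: set() for v in nodes}
    pairs = sorted(
        ((d, i, j) for i, row in demand.items() for j, d in row.items()
         if i != j and d > 0),
        reverse=True,
    )
    for d, i, j in pairs:
        if j in adj[i]:
            continue  # edge already present
        if len(adj[i]) < max_degree and len(adj[j]) < max_degree:
            adj[i].add(j)
            adj[j].add(i)
    return adj
```

On the Fig. 1 demands, `total_cost(FIG1_RING, FIG1_DEMAND)` evaluates to 8, and with the assumed limit of two neighbors per node the direct-edge phase happens to produce the same ring. In general this first phase can leave demands unreachable, which is why the remaining phases of the heuristic are needed.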

Figure 2. Average costs comparing the demand graph, the topology derived by the heuristic approach, the Chord scheme, and the theoretic costs of Chord for P2P networks of various sizes. (Plot: cost versus number of nodes, 8 to 1024; series: q*log(n)/2, Chord, Heuristic, Demand.)

When the query distribution approaches a uniform distribution, the corresponding costs get close to the costs of the Chord approach. Fig. 2 shows that the costs of the Chord topology are nearly the same as the estimated theoretic costs of $q \cdot \frac{1}{2} \log_2(n)$. As Fig. 2 also shows, our heuristic is close to the lower bound. The results show that there is a potential for optimization in certain situations. Clearly, our approach assumes global knowledge and ignores issues like robustness and, therefore, should not be misunderstood as a new topology adaptation mechanism itself. Instead, the proposed approach can be used to assess existing and new topology adaptation mechanisms and the costs of a completely decentralized design.
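
The two reference curves of Fig. 2 are simple to state in code. The sketch below (Python, with hypothetical function names) evaluates the estimate $q \cdot \frac{1}{2} \log_2(n)$ for Chord and the demand-graph lower bound. The lower bound follows because, without a degree constraint, every demand between two distinct nodes is served over a direct edge, i.e., in exactly one hop. Setting q = n mirrors the experimental setup; the computed numbers are values of the formula only, not measurements from the experiments.

```python
import math


def chord_cost_estimate(q, n):
    """Estimated Chord cost in hops: q queries times (1/2) * log2(n) hops each."""
    return q * 0.5 * math.log2(n)


def demand_lower_bound(demand):
    """Cost of using the demand graph itself as topology (no degree limit).

    Every demand between two distinct nodes then takes exactly one hop, so the
    cost equals the number of non-local queries.
    """
    return sum(d for i, row in demand.items() for j, d in row.items() if i != j)


# Formula values for the network sizes used in the experiments, with q = n.
estimates = {n: chord_cost_estimate(q=n, n=n) for n in (2 ** k for k in range(3, 11))}

# Demands of the Fig. 1 example (node numbering inferred from the figure data).
fig1_demand = {0: {0: 1, 3: 1}, 1: {4: 1, 0: 1}, 2: {0: 2}, 3: {2: 1}, 4: {3: 1}}
fig1_bound = demand_lower_bound(fig1_demand)
```

For the small example of Fig. 1 the lower bound is 7, one hop below the optimal cost of 8 under the degree constraint, which illustrates how the bound can be used to judge a heuristic.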

5. Discussion and Open Issues
Although we have seen that there is a potential for optimization when the demand distribution deviates from a uniform distribution over all nodes, various questions arise in the context of adaptation of overlay topologies:

Routing and discovery. When we optimize an overlay topology with respect to some cost function, how can we find an appropriate routing or search mechanism? For unstructured P2P approaches, the relationship between optimization cost functions and the quality of service of the P2P search can be quite immediate. However, since adaptive and distributed routing in general represents a challenge, the degree of topology dynamics and the associated costs with respect to re-routing and searching lead to a trade-off situation.

Time scale and costs for reorganization. On the one hand, we assume that the overlay topology should change over time to adjust to current demands. On the other hand, we assume that for some (short) period of time the network can be treated as a static network. For which kind of overlay scenario is it acceptable to treat the topology as static for some amount of time? Understanding dynamicity will be a major issue of our future work. We think that, depending on the scenario, we can actually predict demands. Thus, modeling dynamics is the next step for enhanced P2P modeling. Such a model should include node arrivals and departures on the one hand and appropriate models for the distribution of queries over time on the other. In particular, one also has to keep different forms of distributions in mind, such as Gaussian and Zipf-like [3] distributions.

Evaluation of costs and computational complexity. Is the cost function we use an appropriate metric? Is the optimization problem actually NP-hard? A suitable reduction proving NP-hardness is still missing.

Fairness. We focused on the global optimum, i.e., single nodes might be handicapped for the global benefit. But what about fairness for each node? In our opinion, the importance of fairness depends on the purpose, e.g., for SETI@home fairness is not an issue while for file-sharing it is important.

Robustness. The optimal topology stated above is not robust because there are mainly single paths between the nodes. How many additional edges are necessary for a sufficient degree of robustness?

Based on answers to the questions stated above, a theory of overlay adaptation has to be derived that indicates under which assumptions an adaptation strategy improves the network's performance and reduces the cost of a P2P design.

References
[1] E. Brosh and Y. Shavitt. Approximation and heuristic algorithms for minimum delay application-layer multicast trees. In Proc. of INFOCOM, 2004.
[2] X. Gu, K. Nahrstedt, R. Chang, and C. Ward. QoS-assured service composition in managed service overlay networks. In Proc. of Distributed Computing Systems, 2003.
[3] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker. Search and replication in unstructured peer-to-peer networks. In Proc. of Int. Conf. on Supercomputing, New York, USA, 2002.
[4] M. Pioro and D. Medhi. Routing, Flow, and Capacity Design in Communication and Computer Networks. Morgan Kaufmann, 2004.
[5] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker. Topologically-aware overlay construction and server selection. In Proc. of INFOCOM, 2002.
[6] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Int. Conf. on Distributed Systems Platf., 2001.
[7] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proc. of SIGCOMM, 2001.
