CHAPTER 1
INTRODUCTION
1.1 Motivation:
Peer-to-peer (P2P) networks have become an important infrastructure in recent years, evolving from simple systems like Napster and Gnutella to more sophisticated ones based on distributed hash tables, such as CAN and Chord. Although schemes based on hash functions provide good performance for point queries (where the search key is known exactly), they do not work for approximate, range, or text queries; for such queries messages must be flooded, as in Gnutella, which loses scalability and performance. Clearly, such queries call for a different infrastructure, one based on semantic relations among peers and the data they contain. Two main intuitions come to mind. First, queries can be routed only to a semantically chosen subset of peers that are able to answer them: if a peer cannot answer a query fully enough, it forwards the query only to those of its neighbors that may also have answers, and so on; in this way the amount of flooding messages is reduced. Second, data shared in P2P systems often has a pronounced ontological structure, because of its origin and its relation to real-world concepts (music, scientific papers, movies); it is therefore possible to partition such data, classify its content, and identify semantically similar groups. These ideas are presented in this write-up with several extensions from other papers.
Key k is assigned to node successor(k), which is the node whose identifier is equal to or follows the identifier of k. If there are N nodes and K keys, then each node is responsible for roughly K/N keys. When a new node joins or leaves the network, responsibility for O(K/N) keys changes hands.
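The successor relation above can be sketched as follows. This is a minimal illustration, not code from Chord itself: the function name and the use of a plain sorted list of node identifiers are our own simplifications (a real Chord node knows only a few other nodes, not the whole ring).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of successor(k) on a sorted ring of node identifiers:
// the key is assigned to the first node id >= k; if none exists,
// the ring wraps around to the smallest id.
uint64_t successor(const std::vector<uint64_t>& sortedNodeIds, uint64_t k) {
    auto it = std::lower_bound(sortedNodeIds.begin(), sortedNodeIds.end(), k);
    return (it != sortedNodeIds.end()) ? *it : sortedNodeIds.front();
}
```

For example, with nodes {1, 4, 9, 11, 14} on a 4-bit ring, key 5 is assigned to node 9, and key 15 wraps around to node 1.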
If each node knows only the location of its successor, a linear search over the network could locate a particular key. This is a naive method for searching the network, since any given message could potentially have to be relayed through most of the network. Chord therefore implements a faster search method: it requires each node to keep a "finger table" containing up to m entries, where m is the number of bits in the identifiers. The i-th entry of node n contains the address of successor((n + 2^(i-1)) mod 2^m).
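The finger placement just described can be sketched in a few lines. This is an illustrative sketch under the assumptions above (an m-bit identifier space), not code from the Chord implementation; the function name is our own.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Targets of a Chord finger table for node n in an m-bit id space:
// the i-th finger (i = 1..m) points at successor((n + 2^(i-1)) mod 2^m).
// Here we compute only the target identifiers, not the successors.
std::vector<uint64_t> fingerTargets(uint64_t n, unsigned m) {
    std::vector<uint64_t> targets;
    uint64_t ringSize = 1ULL << m;   // 2^m identifiers on the ring
    for (unsigned i = 1; i <= m; ++i)
        targets.push_back((n + (1ULL << (i - 1))) % ringSize);
    return targets;
}
```

For node n = 1 with m = 3, the finger targets are 2, 3 and 5; each hop at least halves the remaining identifier distance, which is what gives Chord its O(log N) lookups.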
Figure 1 [2]
CHAPTER 3
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems
3.1 INTRODUCTION
Peer-to-peer Internet applications have recently been popularized through file sharing applications like Napster, Gnutella and FreeNet. While much of the attention has been focused on the copyright issues raised by these particular applications, peer-to-peer systems have many interesting technical aspects like decentralized control, self-organization, adaptation and scalability. Peer-to-peer systems can be characterized as distributed systems in which all nodes have identical capabilities and responsibilities and all communication is symmetric. Pastry is intended as a general substrate for the construction of a variety of peer-to-peer Internet applications like global file sharing, file storage, group communication and naming systems. Several applications have been built on top of Pastry to date, including a global, persistent storage utility called PAST and a scalable publish/subscribe system called SCRIBE. Other applications are under development. Pastry provides the following capability. Each node in the Pastry network has a unique numeric identifier (nodeId). When presented with a message and a numeric key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes. The expected number of routing steps is O(log N), where N is the number of Pastry nodes in the network. At each Pastry node along the route that a message takes, the application is notified and may perform application-specific computations related to the message. Pastry takes into account network locality; it seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops. Each Pastry node keeps track of its immediate neighbors in the nodeId space, and notifies applications of new node arrivals, node failures and recoveries. 
Because nodeIds are randomly assigned, with high probability, the set of nodes with adjacent nodeIds is diverse in geography, ownership, jurisdiction, etc.[4] Applications can leverage this, as Pastry can route to one of the k nodes that are numerically closest to the key. A heuristic ensures that among a set of nodes with the k closest nodeIds to the key, the message is likely to first reach a node near the node from which the message originates, in terms of the proximity metric. Applications use these capabilities in different ways. PAST, for instance, uses a fileId, computed as the hash of the file's name and owner, as a Pastry key for a file. Replicas of the file are stored on the k Pastry nodes with nodeIds numerically closest to the fileId. A file can be looked up by sending a message via Pastry, using the fileId as the key. By definition, the lookup is guaranteed to reach a node that stores the file as long as one of the k nodes is live. Moreover, it follows that the message is likely to first reach a node near the client, among the k nodes; that node delivers the file and consumes the message. Pastry's notification mechanisms allow PAST to maintain replicas of a file on the k nodes closest to the key, despite node failures and node arrivals, and using only local coordination among nodes with adjacent nodeIds. Details on PAST's use of Pastry are described in the PAST literature. As another sample application, in the SCRIBE publish/subscribe system, a list of subscribers is stored on the node with nodeId numerically closest to the topicId of a topic, where the topicId is a hash of the topic name. That node forms a rendez-vous point for publishers and subscribers. Subscribers send a message via Pastry using the topicId as the key; the registration is recorded at each node along the path. A publisher sends data to the rendez-vous point via Pastry, again using the topicId as the key. The rendez-vous point forwards the data along the multicast tree formed by the reverse paths from the rendez-vous point to all subscribers.[4]
A nodeId is assigned randomly when a node joins the system, for instance by computing a cryptographic hash of the node's public key or IP address. As a result of this random assignment of nodeIds, with high probability, nodes with adjacent nodeIds are diverse in geography, ownership, jurisdiction, network attachment, etc. For the purpose of routing, nodeIds and keys are thought of as a sequence of digits with base 2^b. Pastry routes messages to the node whose nodeId is numerically closest to the given key. This is accomplished as follows. In each routing step, a node normally forwards the message to a node whose nodeId shares with the key a prefix that is at least one digit (or b bits) longer than the prefix that the key shares with the present node's id. If no such node is known, the message is forwarded to a node whose nodeId shares a prefix with the key as long as the current node's, but is numerically closer to the key than the present node's id.
The 2^b − 1 entries at row n of the routing table each refer to a node whose nodeId shares the present node's nodeId in the first n digits, but whose (n+1)th digit has one of the 2^b − 1 possible values other than the (n+1)th digit in the present node's id. Each entry in the routing table contains the IP address of one of potentially many nodes whose nodeIds have the appropriate prefix; in practice, a node is chosen that is close to the present node, according to the proximity metric.[4]
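The prefix comparison that drives row selection in the routing table can be sketched as below. This is an illustrative helper of our own, not Pastry code: it counts how many leading base-2^b digits two identifiers share, using 32-bit ids instead of Pastry's 128-bit ones, and assuming b divides 32.

```cpp
#include <cassert>
#include <cstdint>

// Length of the shared prefix of two ids, counted in base-2^b digits
// (b bits per digit). A Pastry node uses this length to pick the
// routing-table row for the next hop. Assumes b divides 32.
unsigned sharedPrefixDigits(uint32_t a, uint32_t key, unsigned b) {
    unsigned digits = 32 / b;
    for (unsigned i = 0; i < digits; ++i) {
        unsigned shift = 32 - (i + 1) * b;                 // position of digit i
        uint32_t mask = (1u << b) - 1;
        if (((a >> shift) & mask) != ((key >> shift) & mask))
            return i;                                      // first differing digit
    }
    return digits;                                         // identical ids
}
```

With b = 4, ids 0x12345678 and 0x12045678 share the two hex digits "12", so the next hop would be taken from row 2 of the routing table.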
A newly arriving node must know of a nearby Pastry node that is already part of the system. Such a node can be located automatically, for instance, using expanding ring IP multicast, or be obtained by the system administrator through outside channels. Pastry uses an optimistic approach to controlling concurrent node arrivals and departures. Since the arrival/departure of a node affects only a small number of existing nodes in the system, contention is rare and an optimistic approach is appropriate. Briefly, whenever a node A provides state information to a node B, it attaches a timestamp to the message. B adjusts its own state based on this information and eventually sends an update message to A (e.g., notifying A of its arrival). B attaches the original timestamp, which allows A to check if its state has since changed. In the event that its state has changed, it responds with its updated state and B restarts its operation. Node departure: Nodes in the Pastry network may fail or depart without warning. In this section, we discuss how the Pastry network handles such node departures. A Pastry node is considered failed when its immediate neighbors in the nodeId space can no longer communicate with the node.
Some peer-to-peer applications we have built using Pastry replicate information on the k Pastry nodes with the numerically closest nodeIds to a key in the Pastry nodeId space. PAST, for instance, replicates files in this way to ensure high availability despite node failures. Pastry naturally routes a message with the given key to the live node with the numerically closest nodeId, thus ensuring that the message reaches one of the k nodes as long as at least one of them is live. Moreover, Pastry's locality properties make it likely that, along the route from a client to the numerically closest node, the message first reaches a node near the client, in terms of the proximity metric, among the k numerically closest nodes. This is useful in applications such as PAST, because retrieving a file from a nearby node minimizes client latency and network load. Moreover, observe that due to the random assignment of nodeIds, nodes with adjacent nodeIds are likely to be widely dispersed in the network. Thus, it is important to direct a lookup query towards a node that is located relatively near the client.[4]
Pastry uses a heuristic to overcome the prefix mismatch issue described above. The heuristic is based on estimating the density of nodeIds in the nodeId space using local information. Based on this estimation, the heuristic detects when a message approaches the set of k numerically closest nodes, and then switches to routing based on numerical closeness to locate the nearest replica. With this heuristic, Pastry is able to locate the nearest node in over 75% of all queries, and one of the two nearest nodes in over 91% of all queries.
One solution to this problem involves the use of IP multicast. Pastry nodes can periodically perform an expanding ring multicast search for other Pastry nodes in their vicinity. If isolated Pastry overlays exist, they will eventually be discovered and reintegrated. To minimize the cost, this procedure can be performed randomly and infrequently by Pastry nodes, only within a limited range of IP routing hops from the node, and only if no search was performed by another nearby Pastry node recently. As an added benefit, the results of this search can also be used to improve the quality of the routing tables.
k-buckets implement a least-recently seen eviction policy, except that live nodes are never removed from the list. This preference for old contacts is driven by our analysis of Gnutella trace data collected by Saroiu et al. Figure 1 shows the percentage of Gnutella nodes that stay online another hour as a function of current uptime. The longer a node has been up, the more likely it is to remain up another hour. By keeping the oldest live contacts around, k-buckets maximize the probability that the nodes they contain will remain online. A second benefit of k-buckets is that they provide resistance to certain DoS attacks. One cannot flush nodes' routing state by flooding the system with new nodes. Kademlia nodes will only insert the new nodes in the k-buckets when old nodes leave the system.
4.4 Kademlia Protocol Brief: The Kademlia protocol consists of four RPCs: PING, STORE, FIND NODE, and FIND VALUE. The PING RPC probes a node to see if it is online. STORE instructs a node to store a ⟨key, value⟩ pair for later retrieval. FIND NODE takes a 160-bit ID as an argument. The recipient of the RPC returns ⟨IP address, UDP port, Node ID⟩ triples for the k nodes it knows about closest to the target ID. These triples can come from a single k-bucket, or they may come from multiple k-buckets if the closest k-bucket is not full. In any case, the RPC recipient must return k items (unless there are fewer than k nodes in all its k-buckets combined, in which case it returns every node it knows about). FIND VALUE behaves like FIND NODE (returning ⟨IP address, UDP port, Node ID⟩ triples) with one exception: if the RPC recipient has received a STORE RPC for the key, it just returns the stored value. In all RPCs, the recipient must echo a 160-bit random RPC ID, which provides some resistance to address forgery. PINGs can also be piggy-backed on RPC replies for the RPC recipient to obtain additional assurance of the sender's network address. The most important procedure a Kademlia participant must perform is to locate the k closest nodes to some given node ID. We call this procedure a node lookup. Kademlia employs a recursive algorithm for node lookups. The lookup initiator starts
by picking α nodes from its closest non-empty k-bucket (or, if that bucket has fewer than α entries, it just takes the α closest nodes it knows of). The initiator then sends parallel, asynchronous FIND NODE RPCs to the α nodes it has chosen, where α is a system-wide concurrency parameter. In the recursive step, the initiator resends the FIND NODE to nodes it has learned about from previous RPCs. (This recursion can begin before all of the previous RPCs have returned.) Of the k nodes the initiator has heard of closest to the target, it picks α that it has not yet queried and resends the FIND NODE RPC to them. Nodes that fail to respond quickly are removed from consideration until and unless they do respond. If a round of FIND NODEs fails to return a node any closer than the closest already seen, the initiator resends the FIND NODE to all of the k closest nodes it has not already queried. The lookup terminates when the initiator has queried and gotten responses from the k closest nodes it has seen. When α = 1 the lookup algorithm resembles Chord's in terms of message cost and the latency of detecting failed nodes. Most operations are implemented in terms of the above lookup procedure. To store a ⟨key, value⟩ pair, a participant locates the k closest nodes to the key and sends them STORE RPCs. Additionally, each node re-publishes the ⟨key, value⟩ pairs that it has every hour. This ensures persistence (as we show in our proof sketch) of the ⟨key, value⟩ pair with very high probability. Generally, we also require the original publishers of a ⟨key, value⟩ pair to republish it every 24 hours. Otherwise, all ⟨key, value⟩ pairs expire 24 hours after the original publishing, in order to limit stale information in the system. Finally, in order to sustain consistency in the publishing-searching life-cycle of a ⟨key, value⟩ pair, we require that whenever a node w observes a new node u which is closer to some of w's ⟨key, value⟩ pairs, w replicates these pairs to u without removing them from its own database. 
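The core of each lookup round, picking the k nodes closest to a target under the XOR metric, can be sketched as below. This is an illustrative helper of our own (not Kademlia code); real node ids are 160-bit, while 32-bit ids keep the sketch short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sort candidate node ids by XOR distance to the target and keep the
// k closest — the selection a Kademlia initiator performs between
// FIND NODE rounds.
std::vector<uint32_t> kClosest(std::vector<uint32_t> known,
                               uint32_t target, size_t k) {
    std::sort(known.begin(), known.end(),
              [target](uint32_t x, uint32_t y) {
                  return (x ^ target) < (y ^ target);   // XOR metric
              });
    if (known.size() > k)
        known.resize(k);
    return known;
}
```

For target 0b0101 and candidates {0b0001, 0b0100, 0b0111, 0b1000}, the XOR distances are 4, 1, 2 and 13, so the two closest nodes are 0b0100 and 0b0111. Note that the XOR metric is symmetric and unidirectional, which is what makes this selection well defined.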
To find a ⟨key, value⟩ pair, a node starts by performing a lookup to find the k nodes with IDs closest to the key. However, value lookups use FIND VALUE rather than FIND NODE RPCs. Moreover, the procedure halts immediately when any node returns the value. For caching purposes, once a lookup succeeds, the requesting node stores the ⟨key, value⟩ pair at the closest node it observed to the key that did not return the value. Because of the unidirectionality of the topology, future searches for the same key are likely to hit cached entries before querying the closest node. During times of high
popularity for a certain key, the system might end up caching it at many nodes. To avoid over-caching, we make the expiration time of a ⟨key, value⟩ pair in any node's database exponentially inversely proportional to the number of nodes between the current node and the node whose ID is closest to the key ID. While simple LRU eviction would result in a similar lifetime distribution, there is no natural way of choosing the cache size, since nodes have no a priori knowledge of how many values the system will store. Buckets will generally be kept constantly fresh, due to the traffic of requests traveling through nodes. To avoid pathological cases when no traffic exists, each node refreshes a bucket in whose range it has not performed a node lookup within an hour. Refreshing means picking a random ID in the bucket's range and performing a node search for that ID. To join the network, a node u must have a contact to an already participating node w. u inserts w into the appropriate k-bucket. u then performs a node lookup for its own node ID. Finally, u refreshes all k-buckets further away than its closest neighbor. During the refreshes, u both populates its own k-buckets and inserts itself into other nodes' k-buckets as necessary.
assumption, we show that the node lookup procedure is correct and takes logarithmic time. Suppose the closest node to the target ID has depth h. If none of this node's h most significant k-buckets is empty, the lookup procedure will find a node half as close (or rather whose distance is one bit shorter) in each step, and thus turn up the node in h − log k steps. If one of the node's k-buckets is empty, it could be the case that the target node resides in the range of the empty bucket. In this case, the final steps will not decrease the distance by half. However, the search will proceed exactly as though the bit in the key corresponding to the empty bucket had been flipped. Thus, the lookup algorithm will always return the closest node in h − log k steps. Moreover, once the closest node is found, the concurrency switches from α to k. The number of steps to find the remaining k − 1 closest nodes can be no more than the bucket height of the closest node in the kth-closest node, which is unlikely to be more than a constant plus log k. To prove the correctness of the invariant, first consider the effects of bucket refreshing if the invariant holds. After being refreshed, a bucket will either contain k valid nodes or else contain every node in its range if fewer than k exist. (This follows from the correctness of the node lookup procedure.) New nodes that join will also be inserted into any buckets that are not full. Thus, the only way to violate the invariant is for there to exist k + 1 or more nodes in the range of a particular bucket, and for the k actually contained in the bucket all to fail with no intervening lookups or refreshes. However, k was precisely chosen for the probability of simultaneous failure within an hour (the maximum refresh time) to be small. In practice, the probability of failure is much smaller than the probability of k nodes leaving within an hour, as every incoming or outgoing request updates nodes' buckets. 
This results from the symmetry of the XOR metric, because the IDs of the nodes with which a given node communicates during an incoming or outgoing request are distributed exactly compatibly with the node's bucket ranges. Moreover, even if the invariant does fail for a single bucket in a single node, this will only affect running time (by adding a hop to some lookups), not correctness of node lookups. For a lookup to fail, k nodes on a lookup path must each lose k nodes in the same bucket with no intervening lookups or refreshes. If the different nodes' buckets have no overlap, this happens with probability 2^(−k²). Otherwise, nodes appearing in multiple other nodes' buckets will likely have longer uptimes and thus lower probability of failure.
Now we look at a ⟨key, value⟩ pair's recovery. When a ⟨key, value⟩ pair is published, it is populated at the k nodes closest to the key. It is also re-published every hour. Since even new nodes (the least reliable) have probability 1/2 of lasting one hour, after one hour the ⟨key, value⟩ pair will still be present on one of the k nodes closest to the key with probability 1 − 2^(−k). This property is not violated by the insertion of new nodes that are close to the key, because as soon as such nodes are inserted, they contact their closest nodes in order to fill their buckets and thereby receive any nearby ⟨key, value⟩ pairs they should store. Of course, if the k closest nodes to a key all fail and the ⟨key, value⟩ pair has not been cached elsewhere, the system will lose the pair.
4.6 Discussion
The XOR-topology-based routing that we use very much resembles the first step in the routing algorithms of Pastry, Tapestry, and Plaxton's distributed search algorithm. All three of these, however, run into problems when they choose to approach the target node b bits at a time (for acceleration purposes). Without the XOR topology, there is a need for an additional algorithmic structure for discovering the target within the nodes that share the same prefix but differ in the next b-bit digit. All three algorithms resolve this problem in different ways, each with its own drawbacks; they all require secondary routing tables of size O(2^b) in addition to the main tables of size O(2^b log_{2^b} n). This increases the cost of bootstrapping and maintenance, complicates the protocols, and for Pastry and Tapestry prevents a formal analysis of correctness and consistency. Plaxton has a proof, but the system is less geared for highly faulty environments like peer-to-peer networks. Kademlia, in contrast, can easily be optimized with a base other than 2. We configure our bucket table so as to approach the target b bits per hop. This requires having one bucket for each range of nodes at a distance [j·2^(160−(i+1)b), (j+1)·2^(160−(i+1)b)] from us, for each 0 < j < 2^b and 0 ≤ i < 160/b, which amounts to an expected no more than (2^b − 1) log_{2^b} n buckets with actual entries. The implementation currently uses b = 5.
4.7 Summary
With its novel XOR-based metric topology, Kademlia is the first peer-to-peer system to combine provable consistency and performance, latency-minimizing routing, and a symmetric, unidirectional topology. Kademlia furthermore introduces a concurrency parameter, α, that lets people trade a constant factor in bandwidth for asynchronous lowest-latency hop selection and delay-free fault recovery. Finally, Kademlia is the first peer-to-peer system to exploit the fact that node failures are inversely related to uptime.
CHAPTER 5
Koorde Protocol
5.1 Introduction:
Koorde is a new routing protocol. It shares almost all aspects with Chord, but it meets (to within a constant factor) all the lower bounds just mentioned: it has degree 2 with O(log n) hops per lookup, or degree log n with O(log n / log log n) hops, and it is fault tolerant. Like Chord, it also has O(log n) load balance, or constant load balance at the cost of O(log n) times the degree. Each of its nodes has two outgoing neighbors and two incoming neighbors, and it can be shown to have good routing load balance. Identifiers need b = log n bits for n distinct nodes, so log n hops suffice to route.[3]
Machines are not all aware of each other: each node tracks a small set of neighbors, and a message is routed to the responsible node via a sequence of hops to neighbors.
5.4 Tradeoffs:
With a larger degree, one hopes to achieve a smaller hop count and better fault tolerance. But a higher degree implies more routing-table state per node and higher maintenance overhead to keep routing tables up to date.
Theorem: to tolerate half the nodes failing (e.g., a network partition), degree Ω(log n) is needed. Proof sketch: with a smaller degree, some node loses all its neighbors. Given degree Ω(log n), one might as well take O(log n / log log n) hops!
Koorde uses a de Bruijn network: fingers shift in one bit, giving degree 2 (two possible bits to shift in) and diameter log n.
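The one-bit shift above can be sketched as a single routing hop. This is an illustrative function of our own, not code from Koorde: on a b-bit identifier ring, a hop shifts the node id left by one and shifts in the next bit of the key, so after b hops the id equals the key.

```cpp
#include <cassert>
#include <cstdint>

// One de Bruijn routing hop on b-bit identifiers: shift the id left
// by one position and append the next key bit, modulo 2^b.
uint64_t deBruijnHop(uint64_t id, unsigned bit, unsigned b) {
    uint64_t mask = (1ULL << b) - 1;          // keep only b bits
    return ((id << 1) | (bit & 1)) & mask;
}
```

For example, with b = 3, node 0b101 hops to 0b010 when shifting in a 0, and to 0b011 when shifting in a 1; its two out-neighbors are exactly 2·id mod 2^b and 2·id + 1 mod 2^b, which is the degree-2 structure Koorde exploits.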
CHAPTER 6
Simulation framework and tools used: 6.1 OMNeT++ 4.2.1
6.1.1 Introduction OMNeT++ is a discrete event simulation environment. Its primary application area is the simulation of communication networks, but because of its generic and flexible architecture it is successfully used in other areas as well, such as the simulation of complex IT systems, queueing networks or hardware architectures. OMNeT++ provides a component architecture for models. Components (modules) are programmed in C++, then assembled into larger components and models using a high-level language (NED). Reusability of models comes for free. OMNeT++ has extensive GUI support, and due to its modular architecture, the simulation kernel (and models) can be embedded easily into your applications. Although OMNeT++ is not a network simulator itself, it is
currently gaining widespread popularity as a network simulation platform in the scientific community as well as in industrial settings, and building up a large user community.[11]
6.1.2 Components
- simulation kernel library
- compiler for the NED topology description language
- OMNeT++ IDE based on the Eclipse platform
- GUI for simulation execution, links into simulation executable (Tkenv)
- command-line user interface for simulation execution (Cmdenv)
- utilities (makefile creation tool, etc.)
- documentation, sample simulations, etc.
6.1.3 Platforms
OMNeT++ runs on Ubuntu 11.10 and other Linux distributions, Mac OS X, other Unix-like systems, and on Windows (XP, Win2K, Vista, 7). The OMNeT++ IDE requires Linux 32/64-bit, Mac OS X 10.5, or Windows XP.[7]
6.1.4 Screenshots
6.2 OverSim / INET-OverSim
6.2.1 Introduction OverSim is an OMNeT++-based open-source simulation framework for overlay and peer-to-peer networks, developed at the Institute of Telematics, University of Karlsruhe (TH), Germany. The simulator contains several models for structured (e.g. Chord, Kademlia, Pastry) and unstructured (e.g. GIA) peer-to-peer protocols. An example implementation of the framework is an implementation of a peer-to-peer SIP communications network.[6]
OverSim offers, on the one hand, a realistic underlay network model that models packet losses (INETUnderlay), and on the other hand a fast and simple alternative model for high simulation performance (SimpleUnderlay).[5] Scalability: OverSim was designed with performance in mind. On a modern desktop PC a typical Chord network of 10,000 nodes can be simulated in real-time. The simulator was used to successfully simulate networks of up to 100,000 nodes. Base Overlay Class: The base overlay class facilitates the implementation of structured peer-to-peer protocols by providing an RPC interface, a generic lookup class and a common key-based routing API to the application.
Reuse of Simulation Code: The different implementations of overlay protocols are reusable for real network applications, so that researchers can validate the simulation framework results by comparing them to the results from real-world test networks like PlanetLab. To this end, the simulation framework is able to handle and assemble real network packets and to communicate with other implementations of the same overlay protocol. Statistics: The simulator collects various statistical data such as sent, received, or forwarded network traffic per node, successful or unsuccessful packet delivery, and packet hop count. Inet: The INET framework is an open-source communication networks simulation package, written for the OMNEST/OMNeT++ simulation system. The INET framework contains models for several Internet protocols: beyond TCP and IP there is UDP, Ethernet, PPP and MPLS with LDP and RSVP-TE signalling.[5]
6.3 SCAVETOOL: Scave is the result analysis tool of OMNeT++ and its task is to help the user process and visualize simulation results saved into vector and scalar files. Scave is designed so that the user can work equally well on the output of a single simulation run (one or two files) and the result of simulation batches (which may be several hundred files, possibly in multiple directories)[12]. Ad-hoc browsing of the data is supported in addition to systematic and repeatable processing. With the latter, all processing and charts are stored as recipes. For example, if simulations need to be re-run due to a model bug or misconfiguration, existing charts do not need to be drawn all over again. Simply replacing the old result files with the new ones will result in the charts being automatically displayed with the new data. Scave is implemented as a multi-page editor. What the editor edits is the recipe, which includes what files to take as inputs, what data to select from them, what (optional) processing to apply, and what kind of charts to create from them. The pages (tabs) of the editor roughly correspond to these steps. You will see that Scave is much more than just a union of the OMNeT++ 3.x result-analysis tools.
30
The first page displays the result files that serve as input for the analysis. The upper half specifies what files to select, by explicit filenames or by wildcards. The lower half shows what files actually matched the input specification and what runs they contain. Note that OMNeT++ result files contain a unique run ID and several metadata annotations in addition to the actual recorded data. The third tree organizes simulation runs according to their experiment-measurement-replication labels.[11] The underlying assumption is that users will organize their simulation-based research into various experiments. An experiment will consist of several measurements which are typically (but not necessarily) simulations done with the same model
but with different parameter settings; that is, the user will explore the parameter space with several simulation runs. To gain statistical confidence in the results, each measurement will possibly be repeated several times with different random number seeds. It is easy to set up such scenarios with the improved ini files of OMNeT++ 4.x. Then, the experiment-measurement-replication labels will be assigned more-or-less automatically; please refer to the Inifile document (Configuring Simulations in OMNeT++ 4.x) for more discussion.
The second page displays results (vectors, scalars, and histograms) from all files in tables and lets the user browse them. Results can be sorted and filtered. Simple filtering is possible with combo boxes, or when that is not enough, the user can write arbitrarily complex filters using a generic pattern matching expression language. Selected or filtered data can be immediately plotted, or remembered in named datasets for further processing.[11]
32
It is possible to define reusable datasets that are basically recipes on how to select and process data received from the simulation. You can add selection and data processing nodes to a dataset. Chart drawing is possible at any point in the processing tree.
33
Line charts are typically drawn from time-series data stored in vector files. Preprocessing of the data is possible in the dataset. The line chart component can be configured freely to display the vector data according to your needs.[11]
34
Bar charts are created from scalar results and histograms. Relevant data can be grouped and displayed via the Bar chart component. Colors, chart type, and other display attributes can be set on the component.
35
The Output Vector View can be used to inspect the raw numerical data when required. It can show the original data read from the vector file, or the result of a computation. The user can select a point on the line chart or a vector in the Dataset View and its content will be displayed.
The Dataset View is used to show the result items contained in the dataset. The content of the view corresponds to the state of the dataset after the selected processing is done.
    cMessage* bucketRefreshTimer;
    cMessage* siblingPingTimer;

public:
    Kademlia();
    ~Kademlia();

    void initializeOverlay(int stage);
    void finishOverlay();
    void joinOverlay();
    bool isSiblingFor(const NodeHandle& node, const OverlayKey& key,
                      int numSiblings, bool* err);
    int getMaxNumSiblings();
    int getMaxNumRedundantNodes();
    void handleTimerEvent(cMessage* msg);
    bool handleRpcCall(BaseCallMessage* msg);
    void handleUDPMessage(BaseOverlayMessage* msg);
    virtual void proxCallback(const TransportAddress& node, int rpcId,
                              cPolymorphic* contextPointer, Prox prox);

protected:
    NodeVector* findNode(const OverlayKey& key, int numRedundantNodes,
                         int numSiblings, BaseOverlayMessage* msg);
    void handleRpcResponse(BaseResponseMessage* msg, cPolymorphic* context,
                           int rpcId, simtime_t rtt);
    void handleRpcTimeout(BaseCallMessage* msg, const TransportAddress& dest,
                          cPolymorphic* context, int rpcId,
                          const OverlayKey& destKey);

    /** handles an expired bucket refresh timer */
    void handleBucketRefreshTimerExpired();

    OverlayKey distance(const OverlayKey& x, const OverlayKey& y,
                        bool useAlternative = false) const;

    /** updates information shown in the GUI */
    void updateTooltip();

    virtual void lookupFinished(bool isValid);
    virtual void handleNodeGracefulLeaveNotification();

    friend class KademliaLookupListener;

private:
    uint32_t bucketRefreshCount;       /**< statistics: total number of bucket refreshes */
    uint32_t siblingTableRefreshCount; /**< statistics: total number of sibling table refreshes */
    uint32_t nodesReplaced;

    KeyDistanceComparator<KeyXorMetric>* comparator;

    KademliaBucket* siblingTable;
    std::vector<KademliaBucket*> routingTable;
    int numBuckets;

    void routingInit();
    void routingDeinit();
    int routingBucketIndex(const OverlayKey& key, bool firstOnLayer = false);
    KademliaBucket* routingBucket(const OverlayKey& key, bool ensure);
    bool routingAdd(const NodeHandle& handle, bool isAlive,
                    simtime_t rtt = MAXTIME, bool maintenanceLookup = false);
    bool routingTimeout(const OverlayKey& key, bool immediately = false);
    void refillSiblingTable();
    void sendSiblingFindNodeCall(const TransportAddress& dest);
    void setBucketUsage(const OverlayKey& key);
    bool recursiveRoutingHook(const TransportAddress& dest,
                              BaseRouteMessage* msg);
    bool handleFailedNode(const TransportAddress& failed);
};

#endif
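The routing table above orders contacts with a KeyDistanceComparator<KeyXorMetric>, i.e. by Kademlia's XOR metric [1]: the distance between two IDs is their bitwise XOR, and a contact belongs to the k-bucket given by the position of the highest differing bit. A minimal sketch of this metric on toy 32-bit IDs (the helper names are invented for illustration; OverSim works on OverlayKey, not uint32_t):

```cpp
#include <cstdint>
#include <cassert>

// XOR distance between two node IDs (toy 32-bit keys for illustration).
inline uint32_t xorDistance(uint32_t a, uint32_t b) {
    return a ^ b;
}

// Index of the k-bucket a contact falls into: position of the highest
// differing bit between our ID and the contact's ID.
inline int bucketIndex(uint32_t self, uint32_t other) {
    uint32_t d = xorDistance(self, other);
    int idx = -1;
    while (d) {         // find the highest set bit of the distance
        d >>= 1;
        ++idx;
    }
    return idx;         // -1 if other == self
}
```

Closer nodes (smaller XOR distance) fall into lower buckets, which is what makes the sibling table and per-bucket refresh logic above converge towards a key in O(log N) steps.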
.NED file
module KademliaModules like IOverlay
{
    gates:
        input udpIn;    // gate from the UDP layer
        output udpOut;  // gate to the UDP layer
        input tcpIn;    // gate from the TCP layer
        output tcpOut;  // gate to the TCP layer
        input appIn;    // gate from the application
        output appOut;  // gate to the application

    submodules:
        kademlia: Kademlia {
            parameters:
                @display("p=60,60;i=block/circle");
        }

    connections allowunconnected:
        udpIn --> kademlia.udpIn;
        udpOut <-- kademlia.udpOut;
        appIn --> kademlia.appIn;
        appOut <-- kademlia.appOut;
}
.CC file
#include "KademliaBucket.h"

KademliaBucket::KademliaBucket(uint16_t maxSize,
                               const Comparator<OverlayKey>* comparator)
    : BaseKeySortedVector<KademliaBucketEntry>(maxSize, comparator)
{
    lastUsage = -1;
}
KademliaBucket::~KademliaBucket()
{
}

KOORDE: .CC file

#include <IPAddressResolver.h>
#include <IPvXAddress.h>
#include <IInterfaceTable.h>
#include <IPv4InterfaceData.h>
#include <GlobalStatistics.h>
#include "Koorde.h"

using namespace std;

namespace oversim {

Define_Module(Koorde);

void Koorde::initializeOverlay(int stage)
{
    // because of IPAddressResolver, we need to wait until interfaces
    // are registered, address auto-assignment takes place etc.
    if (stage != MIN_STAGE_OVERLAY)
        return;

    // fetch some parameters
    deBruijnDelay = par("deBruijnDelay");
    deBruijnListSize = par("deBruijnListSize");
    shiftingBits = par("shiftingBits");
    useOtherLookup = par("useOtherLookup");
    useSucList = par("useSucList");
    setupDeBruijnBeforeJoin = par("setupDeBruijnBeforeJoin");
    setupDeBruijnAtJoin = par("setupDeBruijnAtJoin");

    // init flags
    breakLookup = false;

    // some local variables
    deBruijnNumber = 0;
    deBruijnNodes = new NodeHandle[deBruijnListSize];

    // statistics
    deBruijnCount = 0;
    deBruijnBytesSent = 0;

    // add some watches
    WATCH(deBruijnNumber);
    WATCH(deBruijnNode);

    // timer messages
    deBruijn_timer = new cMessage("deBruijn_timer");

    Chord::initializeOverlay(stage);
}

Koorde::~Koorde()
{
    cancelAndDelete(deBruijn_timer);
}

void Koorde::changeState(int toState)
{
    Chord::changeState(toState);

    switch (state) {
    case INIT:
        // init de Bruijn nodes
        deBruijnNode = NodeHandle::UNSPECIFIED_NODE;
        for (int i = 0; i < deBruijnListSize; i++) {
            deBruijnNodes[i] = NodeHandle::UNSPECIFIED_NODE;
        }
        updateTooltip();
        break;

    case BOOTSTRAP:
        if (setupDeBruijnBeforeJoin) {
            // setup de Bruijn node before joining the ring
            cancelEvent(join_timer);
            cancelEvent(deBruijn_timer);
            scheduleAt(simTime(), deBruijn_timer);
        } else if (setupDeBruijnAtJoin) {
            cancelEvent(deBruijn_timer);
            scheduleAt(simTime(), deBruijn_timer);
        }
        break;

    case READY:
        // init de Bruijn protocol
        cancelEvent(deBruijn_timer);
        scheduleAt(simTime(), deBruijn_timer);
        // since we don't need the fixfingers protocol in Koorde, cancel its timer
        cancelEvent(fixfingers_timer);
        break;

    default:
        break;
    }
}

void Koorde::handleTimerEvent(cMessage* msg)
{
    if (msg->isName("deBruijn_timer")) {
        handleDeBruijnTimerExpired();
    } else if (msg->isName("fixfingers_timer")) {
        handleFixFingersTimerExpired(msg);
    } else {
        Chord::handleTimerEvent(msg);
    }
}

bool Koorde::handleFailedNode(const TransportAddress& failed)
{
    if (!deBruijnNode.isUnspecified()) {
        if (failed == deBruijnNode) {
            deBruijnNode = deBruijnNodes[0];
            for (int i = 0; i < deBruijnNumber - 1; i++) {
                deBruijnNodes[i] = deBruijnNodes[i + 1];
            }
            if (deBruijnNumber > 0) {
                deBruijnNodes[deBruijnNumber - 1] = NodeHandle::UNSPECIFIED_NODE;
                --deBruijnNumber;
            }
        } else {
            bool removed = false;
            for (int i = 0; i < deBruijnNumber - 1; i++) {
                if ((!deBruijnNodes[i].isUnspecified()) &&
                    (failed == deBruijnNodes[i])) {
                    removed = true;
                }
                if (removed) {
                    // close the gap left by the removed entry
                    deBruijnNodes[i] = deBruijnNodes[i + 1];
                }
            }
            if (removed ||
                ((!deBruijnNodes[deBruijnNumber - 1].isUnspecified()) &&
                 failed == deBruijnNodes[deBruijnNumber - 1])) {
                deBruijnNodes[deBruijnNumber - 1] = NodeHandle::UNSPECIFIED_NODE;
                --deBruijnNumber;
            }
        }
    }
    return Chord::handleFailedNode(failed);
}

void Koorde::handleDeBruijnTimerExpired()
{
    OverlayKey lookup = thisNode.getKey() << shiftingBits;

    if (state == READY) {
        if (successorList->getSize() > 0) {
            // look for some nodes before our actual de Bruijn key
            // to have redundancy if our de Bruijn node fails
            lookup -= (successorList->getSuccessor(
                           successorList->getSize() / 2).getKey() -
                       thisNode.getKey());
        }
        if (lookup.isBetweenR(thisNode.getKey(),
                              successorList->getSuccessor().getKey()) ||
            successorList->isEmpty()) {
            int sucNum = successorList->getSize();
            if (sucNum > deBruijnListSize)
                sucNum = deBruijnListSize;

            deBruijnNode = thisNode;
            for (int i = 0; i < sucNum; i++) {
                deBruijnNodes[i] = successorList->getSuccessor(i);
                deBruijnNumber = i + 1;
            }
            updateTooltip();
        } else if (lookup.isBetweenR(predecessorNode.getKey(),
                                     thisNode.getKey())) {
            int sucNum = successorList->getSize();
            if ((sucNum + 1) > deBruijnListSize)
                sucNum = deBruijnListSize - 1;

            deBruijnNode = predecessorNode;
            deBruijnNodes[0] = thisNode;
            for (int i = 0; i < sucNum; i++) {
                deBruijnNodes[i + 1] = successorList->getSuccessor(i);
                deBruijnNumber = i + 2;
            }
            updateTooltip();
        } else {
            DeBruijnCall* call = new DeBruijnCall("DeBruijnCall");
            call->setDestKey(lookup);
            call->setBitLength(DEBRUIJNCALL_L(call));

            sendRouteRpcCall(OVERLAY_COMP, deBruijnNode,
                             call->getDestKey(), call, NULL,
                             DEFAULT_ROUTING);
        }

        cancelEvent(deBruijn_timer);
        scheduleAt(simTime() + deBruijnDelay, deBruijn_timer);
    } else {
        if (setupDeBruijnBeforeJoin || setupDeBruijnAtJoin) {
            DeBruijnCall* call = new DeBruijnCall("DeBruijnCall");
            call->setDestKey(lookup);
            call->setBitLength(DEBRUIJNCALL_L(call));

            sendRouteRpcCall(OVERLAY_COMP, bootstrapNode, call->getDestKey(),
                             call, NULL, DEFAULT_ROUTING);
            scheduleAt(simTime() + deBruijnDelay, deBruijn_timer);
        }
    }
}

#if 0
void Koorde::handleFixFingersTimerExpired(cMessage* msg)
{
    // just in case not all timers from the Chord code could be canceled
}
#endif

void Koorde::handleUDPMessage(BaseOverlayMessage* msg)
{
    Chord::handleUDPMessage(msg);
}

bool Koorde::handleRpcCall(BaseCallMessage* msg)
{
    if (state == READY) {
        // delegate messages
        RPC_SWITCH_START(msg)
        RPC_DELEGATE(DeBruijn, handleRpcDeBruijnRequest);
        RPC_SWITCH_END()

        if (RPC_HANDLED)
            return true;
    } else {
        EV << "[Koorde::handleRpcCall() @ " << thisNode.getIp()
           << " (" << thisNode.getKey().toString(16) << ")]\n"
           << "    Received RPC call and state != READY!" << endl;
    }
    return Chord::handleRpcCall(msg);
}

const NodeHandle& Koorde::walkDeBruijnList(const OverlayKey& key)
{
    for (int i = 0; i < deBruijnNumber - 1; i++) {
        if (key.isBetweenR(deBruijnNodes[i].getKey(),
                           deBruijnNodes[i + 1].getKey())) {
            return deBruijnNodes[i];
        }
    }
    return deBruijnNodes[deBruijnNumber - 1];
}

const NodeHandle& Koorde::walkSuccessorList(const OverlayKey& key)
{
    for (unsigned int i = 0; i < successorList->getSize() - 1; i++) {
        if (key.isBetweenR(successorList->getSuccessor(i).getKey(),
                           successorList->getSuccessor(i + 1).getKey())) {
            return successorList->getSuccessor(i);
        }
    }
    return successorList->getSuccessor(successorList->getSize() - 1);
}

void Koorde::updateTooltip()
{
    // updates the tooltip display strings
    if (ev.isGUI()) {
        std::stringstream ttString;

        // show our predecessor, successor and de Bruijn node in the tooltip
        ttString << "Pred " << predecessorNode << endl
                 << "This " << thisNode << endl
                 << "Suc  " << successorList->getSuccessor() << endl
                 << "DeBr " << deBruijnNode << endl;

        ttString << "List ";
        for (unsigned int i = 0; i < successorList->getSize(); i++) {
            ttString << successorList->getSuccessor(i).getIp() << " ";
        }
        ttString << endl;

        ttString << "DList ";
        for (int i = 0; i < deBruijnNumber; i++) {
            ttString << deBruijnNodes[i].getIp() << " ";
        }
        ttString << endl;

        getParentModule()->getParentModule()->
            getDisplayString().setTagArg("tt", 0, ttString.str().c_str());
        getParentModule()->
            getDisplayString().setTagArg("tt", 0, ttString.str().c_str());
        getDisplayString().setTagArg("tt", 0, ttString.str().c_str());

        // draw an arrow to our current successor
        showOverlayNeighborArrow(successorList->getSuccessor(), true,
                                 "m=m,50,0,50,0;ls=red,1");
    }
}

void Koorde::finishOverlay()
{
    // statistics
    simtime_t time = globalStatistics->calcMeasuredLifetime(creationTime);
    if (time >= GlobalStatistics::MIN_MEASURED) {
        globalStatistics->addStdDev("Koorde: Sent DEBRUIJN Messages/s",
                                    deBruijnCount / time);
        globalStatistics->addStdDev("Koorde: Sent DEBRUIJN Bytes/s",
                                    deBruijnBytesSent / time);
    }
    Chord::finishOverlay();
}

void Koorde::recordOverlaySentStats(BaseOverlayMessage* msg)
{
    Chord::recordOverlaySentStats(msg);

    BaseOverlayMessage* innerMsg = msg;
    while (innerMsg->getType() != APPDATA &&
           innerMsg->getEncapsulatedPacket() != NULL) {
        innerMsg = static_cast<BaseOverlayMessage*>(
                       innerMsg->getEncapsulatedPacket());
    }

    switch (innerMsg->getType()) {
    case RPC: {
        if ((dynamic_cast<DeBruijnCall*>(innerMsg) != NULL) ||
            (dynamic_cast<DeBruijnResponse*>(innerMsg) != NULL)) {
            RECORD_STATS(deBruijnCount++;
                         deBruijnBytesSent += msg->getByteLength());
        }
        break;
    }
    }
}

OverlayKey Koorde::findStartKey(const OverlayKey& startKey,
                                const OverlayKey& endKey,
                                const OverlayKey& destKey, int& step)
{
    OverlayKey diffKey, newStart, tmpDest, newKey, powKey;
    int nBits;

    if (startKey == endKey)
        return startKey;

    diffKey = endKey - startKey;
    nBits = diffKey.log_2();
    if (nBits < 0) {
        nBits = 0;
    }

    while ((startKey.getLength() - nBits) % shiftingBits != 0) {
        nBits--;
    }
    step = nBits + 1;

#if 0
    // TODO: work in progress to find a better start key
    uint shared;
    for (shared = 0; shared < (startKey.getLength() - nBits);
         shared += shiftingBits) {
        if (destKey.sharedPrefixLength(startKey << shared) >=
            (startKey.getLength() - nBits - shared)) {
            break;
        }
    }

    uint nBits2 = startKey.getLength() - shared;
    newStart = (startKey >> nBits2) << nBits2;
    tmpDest = destKey >> (destKey.getLength() - nBits2);
    newKey = tmpDest + newStart;

    std::cout << "startKey: " << startKey.toString(2) << endl
              << "endKey  : " << endKey.toString(2) << endl
              << "diff    : " << (endKey - startKey).toString(2) << endl
              << "newKey  : " << newKey.toString(2) << endl
              << "destKey : " << destKey.toString(2) << endl
              << "nbits   : " << nBits << endl
              << "nbits2  : " << nBits2 << endl;

    // if the newly constructed route key is between start and end key, return it
    if (newKey.isBetweenR(startKey, endKey)) {
        std::cout << "HIT" << endl;
        return newKey;
    } else {
        nBits2 -= shiftingBits;
        newStart = (startKey >> nBits2) << nBits2;
        tmpDest = destKey >> (destKey.getLength() - nBits2);
        newKey = tmpDest + newStart;

        if (newKey.isBetweenR(startKey, endKey)) {
            std::cout << "startKey: " << startKey.toString(2) << endl
                      << "endKey  : " << endKey.toString(2) << endl
                      << "diff    : " << (endKey - startKey).toString(2) << endl
                      << "newKey  : " << newKey.toString(2) << endl
                      << "destKey : " << destKey.toString(2) << endl
                      << "nbits   : " << nBits << endl
                      << "nbits2  : " << nBits2 << endl;
            std::cout << "HIT2" << endl;
            return newKey;
        }
    }
    std::cout << "MISS" << endl;
#endif

    newStart = (startKey >> nBits) << nBits;
    tmpDest = destKey >> (destKey.getLength() - nBits);
    newKey = tmpDest + newStart;

    // if the newly constructed route key is between start and end key, return it
    if (newKey.isBetweenR(startKey, endKey)) {
        return newKey;
    }

    // If the destination part of the key is smaller than that of the
    // original key, add pow2(nBits) (the first bit in which the start
    // key and end key differ) to the newly constructed key and check
    // whether it is between start and end key.
    newKey += powKey.pow2(nBits);
    if (newKey.isBetweenR(startKey, endKey)) {
        return newKey;
    } else {
        // this part should not be reached
        throw cRuntimeError("Koorde::findStartKey(): Invalid start key");
        return OverlayKey::UNSPECIFIED_KEY;
    }
}

void Koorde::findFriendModules()
{
    successorList = check_and_cast<ChordSuccessorList*>(
        getParentModule()->getSubmodule("successorList"));
}

void Koorde::initializeFriendModules()
{
    // initialize successor list
    successorList->initializeList(par("successorListSize"), thisNode, this);
}

}; //namespace
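findStartKey() and handleDeBruijnTimerExpired() above manipulate keys by shifting shiftingBits per hop, which is the essence of Koorde's de Bruijn routing: each hop shifts the current identifier left and appends the next bits of the destination key, so after m/shiftingBits hops the destination is reached. A toy sketch of one such hop with shiftingBits = 1 on m-bit integer IDs (the function name and uint32_t keys are invented for illustration; Koorde itself operates on OverlayKey):

```cpp
#include <cstdint>
#include <cassert>

// One de Bruijn hop on toy m-bit identifiers: shift the current key
// left by one bit and append the next (most significant unconsumed)
// bit of the destination key. `step` counts how many bits of destKey
// have already been consumed by earlier hops.
inline uint32_t deBruijnHop(uint32_t current, uint32_t destKey,
                            int step, int m) {
    uint32_t mask = (m == 32) ? 0xffffffffu : ((1u << m) - 1);
    uint32_t nextBit = (destKey >> (m - 1 - step)) & 1u;
    return ((current << 1) | nextBit) & mask;
}
```

With shiftingBits > 1, as in the code above, several destination bits are appended per hop, trading larger de Bruijn lists for fewer hops.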
.H file
#ifndef __KOORDE_H_
#define __KOORDE_H_

#include <omnetpp.h>
#include <IPvXAddress.h>
#include <OverlayKey.h>
#include <NodeHandle.h>
#include <BaseOverlay.h>

#include "../chord/ChordSuccessorList.h"
#include "../chord/Chord.h"

namespace oversim {

class Koorde : public Chord
{
public:
    virtual ~Koorde();

    // see BaseOverlay.h
    virtual void initializeOverlay(int stage);
    // see BaseOverlay.h
    virtual void handleTimerEvent(cMessage* msg);
    // see BaseOverlay.h
    virtual void handleUDPMessage(BaseOverlayMessage* msg);
    // see BaseOverlay.h
    virtual void recordOverlaySentStats(BaseOverlayMessage* msg);
    // see BaseOverlay.h
    virtual void finishOverlay();

    virtual void updateTooltip();

protected:
    // parameters
    int deBruijnDelay;        /**< number of seconds between two de Bruijn calls */
    int deBruijnNumber;       /**< current number of nodes in the de Bruijn list;
                                   depends on the number of nodes in the successor list */
    int deBruijnListSize;     /**< maximal number of nodes in the de Bruijn list */
    int shiftingBits;         /**< number of bits shifted in one routing step */
    bool useOtherLookup;      /**< indicates whether the "other lookup" optimization is enabled */
    bool useSucList;          /**< indicates whether the successor-list optimization is enabled */
    bool breakLookup;         /**< used during the recursive step when returning this node */
    bool setupDeBruijnBeforeJoin; /**< if true, first set up the de Bruijn node
                                       using the bootstrap node and then join the ring */
    bool setupDeBruijnAtJoin;     /**< if true, join the ring and set up the de Bruijn
                                       node using the bootstrap node in parallel */

    // statistics
    int deBruijnCount;        /**< number of de Bruijn calls */
    int deBruijnBytesSent;    /**< number of bytes sent during de Bruijn calls */

    // node handles
    NodeHandle* deBruijnNodes; /**< list of de Bruijn nodes */
    NodeHandle deBruijnNode;   /**< handle to our de Bruijn node */

    // timer messages
    cMessage* deBruijn_timer;  /**< timer for periodic de Bruijn stabilization */

    virtual void changeState(int state);
    virtual void handleDeBruijnTimerExpired();
    //virtual void handleFixFingersTimerExpired(cMessage* msg);

    // see BaseOverlay.h
    virtual bool handleRpcCall(BaseCallMessage* msg);
    // see BaseOverlay.h
    virtual void handleRpcResponse(BaseResponseMessage* msg,
                                   cPolymorphic* context, int rpcId,
                                   simtime_t rtt);
    // see BaseOverlay.h
    virtual void handleRpcTimeout(BaseCallMessage* msg,
                                  const TransportAddress& dest,
                                  cPolymorphic* context, int rpcId,
                                  const OverlayKey& destKey);

    virtual void handleRpcJoinResponse(JoinResponse* joinResponse);
    virtual void handleRpcDeBruijnRequest(DeBruijnCall* deBruijnCall);
    virtual void handleRpcDeBruijnResponse(DeBruijnResponse* deBruijnResponse);
    virtual void handleDeBruijnTimeout(DeBruijnCall* deBruijnCall);

    virtual NodeHandle findDeBruijnHop(const OverlayKey& destKey,
                                       KoordeFindNodeExtMessage* findNodeExt);

    // see BaseOverlay.h
    NodeVector* findNode(const OverlayKey& key, int numRedundantNodes,
                         int numSiblings, BaseOverlayMessage* msg);

    virtual OverlayKey findStartKey(const OverlayKey& startKey,
                                    const OverlayKey& endKey,
                                    const OverlayKey& destKey, int& step);

    virtual const NodeHandle& walkDeBruijnList(const OverlayKey& key);
    virtual const NodeHandle& walkSuccessorList(const OverlayKey& key);

    // see BaseOverlay.h
    virtual bool handleFailedNode(const TransportAddress& failed);

    virtual void rpcJoin(JoinCall* call);
    virtual void findFriendModules();
    virtual void initializeFriendModules();
};

}; //namespace

#endif
        double routingTableMaintenanceInterval @unit(s);
        bool overrideOldPastry; // Pastry configuration according to the original paper
        bool overrideNewPastry; // optimized Pastry configuration
        @display("i=block/circle");
}

simple PastryRoutingTable
{
    parameters:
        @display("i=block/table");
}

simple PastryLeafSet
{
    parameters:
        @display("i=block/table");
}

simple PastryNeighborhoodSet
{
    parameters:
        @display("i=block/table");
}

module PastryModules like IOverlay
{
    gates:
        input udpIn;    // gate from the UDP layer
        output udpOut;  // gate to the UDP layer
        input tcpIn;    // gate from the TCP layer
        output tcpOut;  // gate to the TCP layer
        input appIn;    // gate from the application
        output appOut;  // gate to the application

    submodules:
        pastry: Pastry {
            parameters:
                @display("p=60,52;i=block/circle");
        }
        pastryRoutingTable: PastryRoutingTable {
            parameters:
                @display("p=140,68;i=block/table");
        }
        pastryLeafSet: PastryLeafSet {
            parameters:
                @display("p=220,52;i=block/table");
        }
        pastryNeighborhoodSet: PastryNeighborhoodSet {
            parameters:
                @display("p=300,68;i=block/table");
        }

    connections allowunconnected:
        udpIn --> pastry.udpIn;
        udpOut <-- pastry.udpOut;
        appIn --> pastry.appIn;
        appOut <-- pastry.appOut;
}
.H file
#ifndef __PASTRY_H_
#define __PASTRY_H_

#include <vector>
#include <map>
#include <queue>
#include <algorithm>

#include <omnetpp.h>
#include <IPvXAddress.h>
#include <OverlayKey.h>
#include <NodeHandle.h>
#include <BaseOverlay.h>
#include <BasePastry.h>

#include "PastryTypes.h"
#include "PastryMessage_m.h"
#include "PastryRoutingTable.h"
#include "PastryLeafSet.h"
#include "PastryNeighborhoodSet.h"

class Pastry : public BasePastry
{
public:
    virtual ~Pastry();

    // see BaseOverlay.h
    virtual void initializeOverlay(int stage);
    // see BaseOverlay.h
    virtual void handleTimerEvent(cMessage* msg);
    // see BaseOverlay.h
    virtual void handleUDPMessage(BaseOverlayMessage* msg);

    void handleStateMessage(PastryStateMessage* msg);

    virtual void pingResponse(PingResponse* pingResponse,
                              cPolymorphic* context, int rpcId,
                              simtime_t rtt);

protected:
    virtual void purgeVectors(void);
    virtual void changeState(int toState);
    virtual bool recursiveRoutingHook(const TransportAddress& dest,
                                      BaseRouteMessage* msg);
    void iterativeJoinHook(BaseOverlayMessage* msg, bool incrHopCount);

    std::vector<PastryStateMsgHandle> stReceived;
    std::vector<PastryStateMsgHandle>::iterator stReceivedPos;
    std::vector<TransportAddress> notifyList;

private:
    void clearVectors();

    simtime_t secondStageInterval;
    simtime_t routingTableMaintenanceInterval;
    simtime_t discoveryTimeoutAmount;

    bool partialJoinPath;
    int depth;
    int updateCounter;
    bool minimalJoinState;
    bool useDiscovery;
    bool useSecondStage;
    bool sendStateAtLeafsetRepair;
    bool pingBeforeSecondStage;
    bool overrideOldPastry;
    bool overrideNewPastry;

    cMessage* secondStageWait;
    cMessage* ringCheck;
    cMessage* discoveryTimeout;
    cMessage* repairTaskTimeout;

    void doSecondStage(void);
    void doRoutingTableMaintenance();
    bool handleFailedNode(const TransportAddress& failed);
    void checkProxCache(void);
    void processState(void);
    bool mergeState(void);
    void endProcessingState(void);
    void doJoinUpdate(void);

    // see BaseOverlay.h
    virtual void joinOverlay();
};

#endif
.CC file
#include "PastryNeighborhoodSet.h"
#include "PastryTypes.h"

Define_Module(PastryNeighborhoodSet);

void PastryNeighborhoodSet::earlyInit(void)
{
    WATCH_VECTOR(neighbors);
}

void PastryNeighborhoodSet::initializeSet(uint32_t numberOfNeighbors,
                                          uint32_t bitsPerDigit,
                                          const NodeHandle& owner)
{
    this->owner = owner;
    this->numberOfNeighbors = numberOfNeighbors;
    this->bitsPerDigit = bitsPerDigit;

    if (!neighbors.empty())
        neighbors.clear();

    // fill the set with unspecified node handles
    for (uint32_t i = numberOfNeighbors; i > 0; i--)
        neighbors.push_back(unspecNode());
}

void PastryNeighborhoodSet::dumpToStateMessage(PastryStateMessage* msg) const
{
    uint32_t i = 0;
    uint32_t size = 0;
    std::vector<PastryExtendedNode>::const_iterator it;

    msg->setNeighborhoodSetArraySize(numberOfNeighbors);
    for (it = neighbors.begin(); it != neighbors.end(); it++) {
        if (!it->node.isUnspecified()) {
            ++size;
            msg->setNeighborhoodSet(i++, it->node);
        }
    }
    msg->setNeighborhoodSetArraySize(size);
}

const NodeHandle& PastryNeighborhoodSet::findCloserNode(
        const OverlayKey& destination, bool optimize)
{
    std::vector<PastryExtendedNode>::const_iterator it;

    if (optimize) {
        // Pointer to the later return value; initialized to unspecified,
        // so the specialCloserCondition() check is done against our own
        // node as long as no node closer to the destination than our own
        // has been found.
        const NodeHandle* ret = &NodeHandle::UNSPECIFIED_NODE;
        for (it = neighbors.begin(); it != neighbors.end(); it++) {
            if (it->node.isUnspecified())
                break;
            if (specialCloserCondition(it->node, destination, *ret))
                ret = &(it->node);
        }
        return *ret;
    } else {
        for (it = neighbors.begin(); it != neighbors.end(); it++) {
            if (it->node.isUnspecified())
                break;
            if (specialCloserCondition(it->node, destination))
                return it->node;
        }
        return NodeHandle::UNSPECIFIED_NODE;
    }
}

void PastryNeighborhoodSet::findCloserNodes(const OverlayKey& destination,
                                            NodeVector* nodes)
{
    std::vector<PastryExtendedNode>::const_iterator it;

    for (it = neighbors.begin(); it != neighbors.end(); it++)
        if (!it->node.isUnspecified())
            nodes->add(it->node);
}

bool PastryNeighborhoodSet::mergeNode(const NodeHandle& node, simtime_t prox)
{
    std::vector<PastryExtendedNode>::iterator it;
    bool nodeAlreadyInVector = false; // was the node already in the list?
    bool nodeValueWasChanged = false; // true if the list was changed,
                                      // false if the rtt was too big

    // Look for the node in the set; if it is there and its value has
    // changed, erase it (since its position is no longer valid).
    for (it = neighbors.begin(); it != neighbors.end(); it++) {
        if (!it->node.isUnspecified() && it->node == node) {
            if (prox == SimTime::getMaxTime() || it->rtt == prox)
                return false; // nothing to do!
            neighbors.erase(it);
            nodeAlreadyInVector = true;
            break;
        }
    }

    // look for the correct position for the node
    for (it = neighbors.begin(); it != neighbors.end(); it++) {
        if (it->node.isUnspecified() || (it->rtt > prox)) {
            nodeValueWasChanged = true;
            break;
        }
    }

    // insert the entry there
    neighbors.insert(it, PastryExtendedNode(node, prox));

    // if a new entry was inserted, erase the last entry
    if (!nodeAlreadyInVector)
        neighbors.pop_back();

    // return whether a new entry was added
    return !nodeAlreadyInVector && nodeValueWasChanged;
}

void PastryNeighborhoodSet::dumpToVector(
        std::vector<TransportAddress>& affected) const
{
    std::vector<PastryExtendedNode>::const_iterator it;

    for (it = neighbors.begin(); it != neighbors.end(); it++)
        if (!it->node.isUnspecified())
            affected.push_back(it->node);
}

const TransportAddress& PastryNeighborhoodSet::failedNode(
        const TransportAddress& failed)
{
    std::vector<PastryExtendedNode>::iterator it;

    for (it = neighbors.begin(); it != neighbors.end(); it++) {
        if (it->node.isUnspecified())
            break;
        if (it->node.getIp() == failed.getIp()) {
            neighbors.erase(it);
            neighbors.push_back(unspecNode());
            break;
        }
    }

    // never ask for repair
    return TransportAddress::UNSPECIFIED_NODE;
}

std::ostream& operator<<(std::ostream& os, const PastryExtendedNode& n)
{
    os << n.node << ";";
    if (n.rtt != SimTime::getMaxTime())
        os << " Ping: " << n.rtt;
    return os;
}
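The neighborhood set above is only one of Pastry's three state tables; the routing table proper is indexed by the length of the prefix, counted in digits of bitsPerDigit bits, that an entry's key shares with the local key. A toy sketch of that prefix computation on 32-bit keys (the function name and fixed key width are invented for illustration; OverSim provides OverlayKey::sharedPrefixLength for real keys):

```cpp
#include <cstdint>
#include <cassert>

// Length of the shared prefix, in digits of `bitsPerDigit` bits,
// between two toy 32-bit keys. Pastry forwards a message to an entry
// whose key shares a strictly longer prefix with the destination.
inline int sharedPrefixDigits(uint32_t a, uint32_t b, int bitsPerDigit) {
    int digits = 32 / bitsPerDigit;
    uint32_t digitMask = (1u << bitsPerDigit) - 1;
    for (int i = 0; i < digits; ++i) {
        // extract the i-th digit from the most significant end
        int shift = 32 - (i + 1) * bitsPerDigit;
        if (((a >> shift) ^ (b >> shift)) & digitMask)
            return i;   // first digit that differs
    }
    return digits;      // keys are equal
}
```

Each routing hop increases the shared prefix length by at least one digit, which bounds the route length by the number of digits in a key.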
.H file
#ifndef __CHORD_H_
#define __CHORD_H_

#include <BaseOverlay.h>
#include <NeighborCache.h>

#include "ChordMessage_m.h"

namespace oversim {

class ChordSuccessorList;
class ChordFingerTable;

class Chord : public BaseOverlay, public ProxListener
{
public:
    Chord();
    virtual ~Chord();

    // see BaseOverlay.h
    virtual void initializeOverlay(int stage);
    // see BaseOverlay.h
    virtual void handleTimerEvent(cMessage* msg);
    // see BaseOverlay.h
    virtual void handleUDPMessage(BaseOverlayMessage* msg);
    // see BaseOverlay.h
    virtual void recordOverlaySentStats(BaseOverlayMessage* msg);
    // see BaseOverlay.h
    virtual void finishOverlay();
    // see BaseOverlay.h
    OverlayKey distance(const OverlayKey& x, const OverlayKey& y,
                        bool useAlternative = false) const;

    virtual void updateTooltip();

    void proxCallback(const TransportAddress& node, int rpcId,
                      cPolymorphic* contextPointer, Prox prox);

protected:
    int joinRetry;
    int stabilizeRetry;           /**< retries before a neighbor is considered failed */
    double joinDelay;
    double stabilizeDelay;        /**< stabilize interval (secs) */
    double fixfingersDelay;
    double checkPredecessorDelay;
    int successorListSize;
    bool aggressiveJoinMode;      /**< use modified (faster) JOIN protocol */
    bool extendedFingerTable;
    unsigned int numFingerCandidates;
    bool proximityRouting;
    bool memorizeFailedSuccessor;
    bool newChordFingerTable;
    bool mergeOptimizationL1;
    bool mergeOptimizationL2;
    bool mergeOptimizationL3;
    bool mergeOptimizationL4;

    // timer messages
    cMessage* join_timer;
    cMessage* stabilize_timer;
    cMessage* fixfingers_timer;
    cMessage* checkPredecessor_timer;

    // statistics
    int joinCount;
    int stabilizeCount;
    int fixfingersCount;
    int notifyCount;
    int newsuccessorhintCount;
    int joinBytesSent;
    int stabilizeBytesSent;
    int notifyBytesSent;
    int fixfingersBytesSent;
    int newsuccessorhintBytesSent;

    int keyLength;                      /**< length of an overlay key in bits */
    int missingPredecessorStabRequests; /**< missing StabilizeCall msgs */

    virtual void changeState(int toState);

    // node references
    NodeHandle predecessorNode;     /**< predecessor of this node */
    TransportAddress bootstrapNode; /**< node used to bootstrap */

    // module references
    ChordFingerTable* fingerTable;     /**< pointer to this node's finger table */
    ChordSuccessorList* successorList; /**< pointer to this node's successor list */

    // Chord routines
    virtual void handleJoinTimerExpired(cMessage* msg);
    virtual void handleStabilizeTimerExpired(cMessage* msg);
    virtual void handleFixFingersTimerExpired(cMessage* msg);
    virtual void handleNewSuccessorHint(ChordMessage* chordMsg);
    virtual NodeVector* closestPreceedingNode(const OverlayKey& key);
    virtual void findFriendModules();
    virtual void initializeFriendModules();

    // see BaseOverlay.h
    virtual bool handleRpcCall(BaseCallMessage* msg);
    // see BaseOverlay.h
    NodeVector* findNode(const OverlayKey& key, int numRedundantNodes,
                         int numSiblings, BaseOverlayMessage* msg);

    virtual void joinOverlay();
    virtual void joinForeignPartition(const NodeHandle& node);
    virtual bool isSiblingFor(const NodeHandle& node, const OverlayKey& key,
                              int numSiblings, bool* err);
    int getMaxNumSiblings();
    int getMaxNumRedundantNodes();

    void rpcFixfingers(FixfingersCall* call);
    virtual void rpcJoin(JoinCall* call);
    virtual void rpcNotify(NotifyCall* call);
    void rpcStabilize(StabilizeCall* call);

    virtual void handleRpcResponse(BaseResponseMessage* msg,
                                   cPolymorphic* context, int rpcId,
                                   simtime_t rtt);
    virtual void handleRpcTimeout(BaseCallMessage* msg,
                                  const TransportAddress& dest,
                                  cPolymorphic* context, int rpcId,
                                  const OverlayKey& destKey);
    virtual void pingResponse(PingResponse* pingResponse,
                              cPolymorphic* context, int rpcId,
                              simtime_t rtt);
    virtual void pingTimeout(PingCall* pingCall,
                             const TransportAddress& dest,
                             cPolymorphic* context, int rpcId);
    virtual void handleRpcJoinResponse(JoinResponse* joinResponse);
    virtual void handleRpcNotifyResponse(NotifyResponse* notifyResponse);
    virtual void handleRpcStabilizeResponse(StabilizeResponse* stabilizeResponse);
    virtual void handleRpcFixfingersResponse(FixfingersResponse* fixfingersResponse,
                                             double rtt = -1);
    virtual bool handleFailedNode(const TransportAddress& failed);

    friend class ChordSuccessorList;
    friend class ChordFingerTable;

private:
    TransportAddress failedSuccessor;
};

}; //namespace

#endif
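The fingerTable maintained by handleFixFingersTimerExpired() holds, for an m-bit identifier circle, pointers towards the keys n + 2^(i-1) mod 2^m for i = 1..m, so each lookup hop at least halves the remaining distance to the target. The start of each finger interval can be sketched on toy integer IDs as follows (the function name is invented for illustration; OverSim's Chord computes these targets on OverlayKey):

```cpp
#include <cstdint>
#include <cassert>

// Start of the i-th finger interval for node id n on a 2^m identifier
// circle: (n + 2^(i-1)) mod 2^m, with i in [1, m].
inline uint32_t fingerStart(uint32_t n, int i, int m) {
    uint64_t ring = 1ull << m;   // size of the identifier circle
    return (uint32_t)(((uint64_t)n + (1ull << (i - 1))) % ring);
}
```

The i-th finger then points to successor(fingerStart(n, i, m)); rpcFixfingers() above periodically re-resolves these targets so the table survives churn.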
.NED file
module ChordModules like IOverlay
{
    parameters:
        @display("i=block/network2");

    gates:
        input udpIn;    // gate from the UDP layer
        output udpOut;  // gate to the UDP layer
        input tcpIn;    // gate from the TCP layer
        output tcpOut;  // gate to the TCP layer
        input appIn;    // gate from the application
        output appOut;  // gate to the application

    submodules:
        chord: Chord {
            parameters:
                @display("p=60,60");
        }
        fingerTable: ChordFingerTable {
            parameters:
                @display("p=150,60");
        }
        successorList: ChordSuccessorList {
            parameters:
                @display("p=240,60");
        }

    connections allowunconnected:
        udpIn --> chord.udpIn;
        udpOut <-- chord.udpOut;
        appIn --> chord.appIn;
        appOut <-- chord.appOut;
}
CHAPTER 9
References
[1] Kademlia: A Peer-to-peer Information System Based on the XOR Metric. Petar Maymounkov and David Mazières, {petar, dm}@cs.nyu.edu, http://kademlia.scs.cs.nyu.edu
[2] Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, MIT Laboratory for Computer Science, chord@lcs.mit.edu, http://pdos.lcs.mit.edu/chord/
[3] Koorde: http://www.pdos.lcs.mit.edu/chord/
[4] Pastry: 1. Antony Rowstron, Microsoft Research Ltd, St. George House, Guildhall Street, Cambridge, CB2 3NH, UK, antr@microsoft.com; 2. Peter Druschel, Rice University MS-132, 6100 Main Street, Houston, TX 77005-1892, USA, druschel@cs.rice.edu
[5] OverSim: Ingmar Baumgart, Bernhard Heep, and Stephan Krause, Institute of Telematics, Universität Karlsruhe (TH), Zirkel 2, D-76128 Karlsruhe, Germany, {baumgart, heep, stkrause}@tm.uka.de
[6] OverSim/inet-oversim: https://github.com/oversim/inet-oversim.git
[7] OMNeT++ Network Simulation Framework: www.omnetpp.org/
[8] http://h33t.com/tor/456004/omnet-4-2-2-released
[9] http://www.h33t.com:3310/announce
[10] http://fr33dom.h33t.com:3310/announce
[11] www.omnetpp.org/doc/omnetpp/IDE-Overview.pdf
[12] http://www.omnetpp.org/doc/omnetpp/manual/usman.html#sec411