
Fast-Response Dynamic Routing Balancing for High-Speed Interconnection Networks*

D. Lugones, D. Franco, and E. Luque.


Department of Computer Architecture and Operating Systems, Universitat Autònoma de Barcelona, Spain.
diego.lugones@caos.uab.es, {daniel.franco, emilio.luque}@uab.es

Abstract—Communication requirements in High-Performance Computing systems demand the use of high-speed interconnection networks to connect processing nodes. However, when the communication load is unevenly distributed across the network resources, message congestion appears. Congestion spreading increases latency and reduces network throughput, causing significant performance degradation. Fast-Response Dynamic Routing Balancing (FR-DRB) is a method developed to perform a uniform balancing of the communication load over the interconnection network. FR-DRB distributes the message traffic based on a gradual, load-controlled path expansion. The method monitors network message latency and decides the number of alternative paths to be used between each source-destination pair for message delivery. FR-DRB performance has been compared with other routing policies under a representative set of traffic patterns commonly created by parallel scientific applications. Experimental results show an important improvement in both latency and throughput.

1. INTRODUCTION. Interconnection networks play a principal role in today's High-Performance Computing (HPC) systems, which are a very important platform for solving scientific problems requiring ever-larger computational speed. Hence, an efficient design of the interconnection network becomes critical to conceive more powerful techniques that allow delivering messages at the fastest possible speed. As HPC system size increases, the interconnection network becomes a bottleneck. Nowadays, network cost and power consumption are much higher than the processors' [1]. To address this issue, the number of network components is reduced. However, this reduction pushes the network throughput near the saturation point, because the network must fulfill the same communication requirements using fewer resources (switches and links). When the communication load is unevenly distributed across the network, some resources may be idle while others become heavily congested (hot-spots). If congestion is not efficiently controlled, those resources may reach saturation. As a consequence, message latency rises considerably and global system performance is degraded. This situation is even worse in lossless networks because congestion is quickly propagated to the whole network by the flow control mechanism [3]. Therefore, current design trends demand efficient congestion control techniques that improve network throughput using a suitable amount of resources and reduce the congestion caused by adverse traffic [1]. This can be achieved by adaptive routing techniques which dynamically manage existing network resources to reduce congestion.

2. RELATED WORK. Typically, adaptive congestion control mechanisms perform three basic tasks: network traffic monitoring, congestion detection, and congestion control. In traffic monitoring, parameters such as point-to-point message latency [9], buffer occupancy level [3], or link speed-down (also called backpressure) [1] are evaluated in order to detect and notify the congestion onset. After notification is received, some action is performed by network end-nodes or switches to avoid performance degradation. Message Throttling (MT) is probably the most popular control action because it is cheap and easy to implement. MT stops (or reduces) injection until packets belonging to the congested area are delivered to their corresponding destinations. Message throttling keeps buffer occupation bounded in switches; however, latency still increases dramatically, because packets must wait at the source nodes until congestion disappears, so performance is degraded. Other congestion control approaches are based on buffer management in switch ports [3]. In these cases, packet flows are locally reallocated at switches to avoid contention. However, good performance is not achieved because congestion sources are not controlled, and local reallocation is not enough to reduce the traffic demand on the oversubscribed switch. Finally, congestion control techniques based on adaptive routing algorithms modify their behavior according to the traffic conditions to avoid congestion. Such policies handle congestion by sending messages from source to destination through alternative paths. Thus, the congested area is avoided and message injection is sustained. Therefore, the
* This work was funded by MEC-Spain under contract TIN2007-64974.

978-1-4244-5012-1/09/$25.00 ©2009 IEEE

global system performance is improved because the traffic load is fairly distributed over the network resources. Some examples are HSAM [18], RECN-DD [3], PIPD [15], DRB [9] and [7], GOAL [16], and other methods presented in [2], [12], [15], and [4]. Some disadvantages of adaptive routing mechanisms are the overhead resulting from information monitoring and path changing, and the need to guarantee both deadlock freedom [2] and in-order packet delivery. As mentioned above, the information about congestion is analyzed by the routing algorithm in order to perform some corrective action. In this case, information about the past is used to decide the immediate future behavior of the routing algorithm. Hence, a fast response speed is mandatory for the monitoring and notification activities to provide the routing algorithm with updated congestion information. It is also important that the algorithm be robust with respect to the information it uses (i.e., it should make appropriate decisions even when the monitoring information is not entirely accurate). This issue raises a tradeoff: if good decisions are needed, more information is required from the system, but more information means more traffic overhead. Therefore, the amount of information needed, and the overhead required to gather and process it, must be balanced. Consequently, an efficient routing algorithm has to extract the smartest behavior from the information it has, and it must also provide a fast response time (i.e., it must be able to rapidly detect critical situations). In this paper, we present the Fast-Response Dynamic Routing Balancing algorithm (FR-DRB), a new routing policy that uses several alternative paths simultaneously to increase the available effective bandwidth between source-destination pairs for message delivery. Our proposal prevents network congestion and fulfils the features mentioned above.
In FR-DRB we apply the concept of communication load balancing to perform a uniform traffic load distribution over the network resources. Distribution is accomplished by a dynamic path expansion which is controlled according to the congestion level of each source-destination path. The monitoring phase is achieved by measuring the total latency registered by the messages along their path. The notification phase is accomplished by acknowledgment messages (Acks), which are generated according to the congestion level. In order to address the tradeoff between monitoring overhead and response speed, the FR-DRB mechanism generates Acks only when network traffic is low. The destination node sends the Ack only if the message latency does not exceed a threshold latency value. Meanwhile, the source node monitors the time that the user message is delayed in the network by using a watchdog timer. When the watchdog timer expires, FR-DRB immediately begins expanding the source-destination paths in order to achieve greater bandwidth, while avoiding Ack generation at the destination node. Thus, FR-DRB eliminates

monitoring overhead when the network is working near saturation. The idea of using a watchdog timer is common to several systems and contexts. In [9], the watchdog is used in combination with polling in the communication processor delivering messages to the receiver thread. Furthermore, a description of some congestion control techniques using time windows is presented in [6], and also in [11]. FR-DRB is based on DRB [7], an earlier algorithm aimed at providing load balancing in current technologies. FR-DRB extends that functionality while addressing important design goals not included in the former version: fast response to congestion, robustness, and notification overhead reduction. In addition, FR-DRB is in line with current approaches used in commercial interconnects (e.g., InfiniBand), unlike the proposal presented in [9], which imposes additional requirements on the network components (i.e., local adaptivity and acknowledgment generation in switches). The rest of this paper is organized as follows. Section 3 presents a complete description of the FR-DRB policy. Section 4 shows the performance evaluation conducted to compare FR-DRB with other routing methods and to measure its response time. Finally, Section 5 presents conclusions.

3. FAST-RESPONSE DYNAMIC ROUTING BALANCING. FR-DRB defines the Metapath as the set of possible alternative paths between each source-destination pair. Metapath Configuration defines how to create the alternative paths used to expand single paths, and when to use them according to the congestion level. Congestion detection is accomplished by watchdog timers and Ack messages. If a timer exceeds its limit value (timer expiration), a metapath is configured and new alternative paths are selected. Hence, the available effective bandwidth between source-destination pairs is increased when the network is congested. Also, the latency experienced by the messages is recorded by the messages themselves. If the latency is lower than the threshold, it is sent back to the sender node in an acknowledgment message (Ack) to stop the timer and to provide the sender node with latency information. Otherwise, for higher network latency values, the watchdog timer on the sender will reach its time limit, indicating that latency is high; in this case, the acknowledgment message is not generated. Each alternative path in the metapath is created by using two intermediate nodes (INs), which are surrounding neighbors of the source and destination nodes, respectively. These INs act as message scattering and gathering areas around the source and destination nodes. INs are selected by FR-DRB for each source-destination pair in the user application. A three-step path (Multi-Step Path, MSP) is then built by selecting two INs: IN1, a neighbor of the source node, and IN2, a neighbor of the destination node. Thus, the alternative paths created by FR-DRB are built around the original path, and the

Fig. 1. FR-DRB phases: (a) Latency Detection and Notification; (b) Metapath Configuration; (c) MultiStepPath Selection.

latency information is used to decide the number of alternative paths over which messages will be distributed. The basic phases of FR-DRB are shown in Fig. 1. Fig. 1(a), Detection and Notification: congestion is detected according to packet latency and the buffer occupation state in switches. For a non-congested path, notification is achieved by Ack packets; otherwise, watchdog timer expiration notifies the source nodes about congestion. Fig. 1(b), Metapath Configuration: a set of surrounding nodes for each source and each destination node is provided. Fig. 1(c) shows an example of a Metapath: a set of Multi-Step Paths (MSPs) defined by a set of intermediate node pairs. Next, a detailed description of the three algorithm components (Monitoring Activity, Dynamic Metapath Configuration, and MultiStepPath Selection) is provided.

3.1 Monitoring Activity. Traffic load monitoring is accomplished at two different network elements: the sender node and the intermediate switches. At the sender node, a watchdog timer registers the time that the user's message spends traveling to the destination node plus the return time of the Ack message. Meanwhile, the message latency is also accumulated at intermediate switches. The watchdog timer is started (start signal) when the message is injected into the network, and it is stopped (stop signal) when the Ack arrives at the source, or when the timer exceeds a specified time limit. This limit is calculated according to:

    T_limit = LZL(DATA) + LZL(ACK) + Th_lat

where:
- LZL(DATA) is the zero-load latency of the data packet.
- LZL(ACK) is the zero-load latency of the Ack packet.
- Th_lat is the threshold latency value.
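As a rough illustration, the timer limit above can be computed from simple physical parameters. The sketch below assumes an illustrative zero-load latency model (serialization time plus a fixed per-hop delay); the function names and the numeric values are our own examples, not taken from the paper:

```python
def zero_load_latency(hops, packet_bytes, link_bw_bytes_per_s, per_hop_delay_s):
    """Zero-load latency: serialization time plus per-hop switching delay,
    assuming no contention (illustrative model; a real LZL also depends on
    the switching technique and header overheads)."""
    serialization = packet_bytes / link_bw_bytes_per_s
    return serialization + hops * per_hop_delay_s

def watchdog_limit(lzl_data, lzl_ack, th_lat):
    """T_limit = zero-load latency of the data packet plus that of the Ack
    plus the tolerated threshold latency."""
    return lzl_data + lzl_ack + th_lat

# Example: 4-hop path, 2 KB data packet, 64 B Ack, 1 GB/s links, 100 ns per hop.
lzl_data = zero_load_latency(4, 2048, 1e9, 100e-9)
lzl_ack  = zero_load_latency(4, 64, 1e9, 100e-9)
th_lat   = 0.5 * lzl_data       # threshold set 50% above zero-load latency
t_limit  = watchdog_limit(lzl_data, lzl_ack, th_lat)
```

Setting the threshold 50% above the zero-load latency matches the example given in Section 3.1 for deciding when a path is considered congested.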

The zero-load latency is defined as the minimal average latency accumulated by a packet in the network, assuming that the packet does not contend for resources with other packets [15]. Thus, zero-load latency is determined by physical network constraints such as the distance between nodes (hops), link bandwidth, and packet size. When the timer reaches its limit (expiration), metapath configuration is invoked using this value as a parameter. Timer activity is shown in the Watchdog Timer function in Table 1(a). On the other hand, latency information is registered by the FR-DRB switch and transported as the message travels from the source to the destination node, as shown in the Traffic Load Monitoring function (pseudocode in Table 1(b)). The time that a message waits in switch buffers when it is blocked by other messages is known as contention latency; this is the latency value recorded in the message. Latency information is evaluated when a message arrives at its destination. If the latency value is lower than a threshold, an Ack message is generated and sent back to the sender node in order to stop the watchdog timer. Otherwise, if the accumulated latency is higher than the threshold, the Ack message is not generated, because the watchdog timer should already have invoked the metapath configuration module. Thus, Ack messages are not injected when the network is near saturation. Ack messages have higher priority in the routing unit, and their size is less than 1% of the data message, because only a header with the latency value is transported. The threshold value must be set according to the latency that the user's application can tolerate. For example, the threshold can be set to 50% above the zero-load latency; this implies that the average link throughput is reduced by 33% with respect to the nominal value, in which case the path is considered congested. From this point of view, the latency works as a saturation index. When latency is

Watchdog Timer (start, stop: signals):  /* FR-DRB Endnode */
    Wait for start signal to arrive;
    Repeat
        Increase timer
        If (timer > T_limit)
            Call Metapath Configuration (Table 2)
            Reset watchdog timer
        If (stop signal arrives)
            Reset watchdog timer
    End repeat
(a)
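The Table 1(a) routine could be realized at an end-node as a simple tick-driven timer. The sketch below is our own rendering (the class name, callback, and tick interface are assumptions); the expiration callback stands in for the call to the Metapath Configuration routine:

```python
class WatchdogTimer:
    """Per-message watchdog at the end-node: started on injection, stopped
    by the Ack, and triggering metapath reconfiguration on expiration."""

    def __init__(self, t_limit, on_expire):
        self.t_limit = t_limit
        self.on_expire = on_expire
        self.elapsed = None          # None means the timer is not running

    def start(self):
        """Start signal: message injected into the network."""
        self.elapsed = 0.0

    def stop(self):
        """Stop signal: Ack arrived in time."""
        self.elapsed = None

    def tick(self, dt):
        """Advance the timer; on expiration, invoke metapath configuration
        and reset, as in the pseudocode of Table 1(a)."""
        if self.elapsed is None:
            return
        self.elapsed += dt
        if self.elapsed > self.t_limit:
            self.on_expire()
            self.elapsed = 0.0
```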

This metapath expansion is performed gradually by including more surrounding neighbors in the metapath configuration. Thus, the traffic load is fairly distributed over the network resources. The metapath configuration phase is shown in Table 2. If the metapath average latency is larger than the threshold value, the metapath size is increased; otherwise, it is decreased.
Metapath Configuration (MSP, Th_Lat):  /* Executed in source nodes each time a Latency(MSP) arrives or a timer expires */
    Variables:
        Latencies_MSP: Vector[1..Number_of_MSP] of integer;
        Threshold: Th_Lat;
    Begin
        1. Receive a Latency or a Timer Limit;
        2. Calculate the Metapath Latency (P*):
               Latency(P*) = ( Sum_i Latency(MSP_i)^-1 )^-1
        3. If (Latency(P*) > Th_Lat)
               Increase the number of INs to provide new alternative paths.
           Else If (Latency(P*) < Th_Lat)
               Decrease the number of INs to constrict the metapath.
           End If
    End Metapath Configuration

Table 2. Metapath Configuration Code
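In step 2 of the configuration routine, the metapath latency aggregates the per-MSP latencies as parallel resources: bandwidths (the inverses of the latencies) add, so Latency(P*) is the inverse of the summed inverse latencies. A minimal Python sketch of this aggregation and the grow/shrink decision (function and variable names are ours, not the paper's):

```python
def metapath_latency(msp_latencies):
    """Aggregate latency of the metapath P*: alternative paths act in
    parallel, so bandwidths (1/latency) add and
    Latency(P*) = (sum_i 1/L_i)^-1."""
    return 1.0 / sum(1.0 / lat for lat in msp_latencies)

def adjust_metapath(num_ins, msp_latencies, th_lat):
    """Grow the metapath when the aggregate latency exceeds the threshold,
    shrink it when below (sketch of the Table 2 decision logic)."""
    lat = metapath_latency(msp_latencies)
    if lat > th_lat:
        return num_ins + 1            # provide new alternative paths
    elif lat < th_lat:
        return max(1, num_ins - 1)    # constrict the metapath
    return num_ins
```

For example, two alternative paths of latency 2.0 each yield an aggregate metapath latency of 1.0, reflecting the doubled effective bandwidth.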

Traffic Load Monitoring (Msg M, Th_Lat, MSP):  /* FR-DRB Switch */
    Begin
        1. For each step of message M:
            1.1. Accumulate latency (queue time) to calculate the MSP latency.
            1.2. Continue to the next intermediate node or to the final destination.
        2. When the message arrives at the final destination:
            2.1. If (Latency(MSP) > Th_Lat), do not send an acknowledgment message;
                 else, send Latency(MSP) back to the source node in an acknowledgment message.
        3. When the acknowledgment message arrives at the source node:
            3.1. Reset the watchdog timer (stop signal).
            3.2. Deliver Latency(MSP) to the Metapath Configuration function (MSP, Latency(MSP)).
    End Monitoring
(b)

Table 1. Traffic Load Monitoring and Timer functions

going beyond the threshold, the monitoring module assumes that the path's performance is poor and allows FR-DRB to improve it. The goal of latency recording in messages is to identify the network's local traffic at any moment in order to provide routing adaptivity. By using this local information, the effect of other messages (sent by other sources) is taken into account. Consequently, by means of this distributed mechanism, a global and collective effect of mutual influences is achieved.

3.2 Metapath Configuration. FR-DRB executes the dynamic metapath configuration using the information gathered in the monitoring phase. The objective of this configuration is to determine, for each source-destination pair, the type and size of the metapath according to the message latency or the timer information. This is achieved by the selection of intermediate nodes. The INs build a path that is different from the original one. The IN configuration takes into account the latency values at any moment, together with the topological characteristics of the interconnection network. INs are selected according to their distance to the source (or destination) node: INs at 1-hop distance are considered first, then INs at 2-hop distance, and so on.

3.3 MultiStepPath Selection. Each time a message is injected into the network, the MultiStepPath Selection module is invoked to perform the traffic load distribution by selecting one multi-step path. Consequently, messages are proportionally distributed among the MSPs according to the latency information. Hence, the paths with the lowest latency values will receive the greatest number of messages. Given a source node with N alternative paths, let L_Ci (i = 1..N) be the latency recorded on path Ci (if no latency has been recorded yet, the zero-load latency is used), and let B_Ci be the corresponding bandwidth, calculated as B_Ci = 1/L_Ci. Then the alternative path Cx will be selected for the next injection according to the probability:

    U(Cx) = B_Cx / ( Sum_{i=1..N} B_Ci )

Paths are selected according to their latency and also to their length. If paths are long in hops, the message transmission time could be high enough to degrade performance, so the shortest and least loaded paths are preferred. The pseudocode in Table 3 shows the MultiStepPath selection phase. As explained above, when a message is injected into the network, a watchdog timer is started to count the time the acknowledgment message takes to arrive. If the timer exceeds its time limit, it can be deduced that latency is high. Therefore, path selection can be performed before the Ack message arrives, providing a fast response to congestion.
MultiStepPath Selection ():  /* Executed in the source node each time a message is injected */
    Begin
        1. Build the Probability Density Function (PDF) of the MultiStepPath bandwidths (B_Ci).
        2. Select a MultiStepPath using the PDF.
        3. Inject the message into the network:
            3.1. Build the message header.
                3.1.1. Concatenate the IN headers.
            3.2. Inject the message.
            3.3. Start the timer.
    End MultiStepPath Selection

Table 3. FR-DRB MSP Selection Code
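The selection step can be sketched as weighted random sampling over path bandwidths, matching the probability U(Cx) = B_Cx / Sum B_Ci. This is a hedged sketch, not the paper's implementation; the rng parameter exists only to make the example deterministic:

```python
import random

def select_msp(latencies, rng=random.random):
    """Pick a multi-step path index with probability proportional to its
    bandwidth B_Ci = 1 / L_Ci (sketch of the Table 3 selection step)."""
    bandwidths = [1.0 / lat for lat in latencies]
    total = sum(bandwidths)
    r = rng() * total            # uniform draw over the total bandwidth
    acc = 0.0
    for i, b in enumerate(bandwidths):
        acc += b
        if r < acc:
            return i
    return len(bandwidths) - 1   # guard against floating-point edge cases
```

With equal latencies, each path is equally likely; a path with nine times the latency of another receives roughly one tenth of the injections.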

3.4 Putting All Components Together. All the functionality and operations performed by the FR-DRB algorithm are shown in Fig. 2. When a source node injects a message into the interconnection network, a MultiStepPath (MSP) is selected according to the respective latencies of the alternative paths; the path with the lowest latency is selected with the highest probability. The message is then injected into the network and, concurrently, the watchdog timer is started to measure the message trip time. When the message leaves the source node, it is forwarded to the destination node through intermediate switches. Contention suffered at switch buffers (queuing latency) is recorded and stored in the message itself. When the message arrives at its destination, it is delivered to the user. Then, the latency information is sent back to the sender in the Ack header only if Recorded Latency < Threshold; otherwise, the monitoring activity finishes and the acknowledgment message is not generated. Meanwhile, the watchdog timer located at the source node runs in parallel with the sending of the message. If the Ack message arrives before the watchdog expires, the latency value is delivered to the metapath configuration module, which configures the metapath by selecting the alternative paths to be used according to the latency value. However, if the watchdog exceeds the time limit, the FR-DRB algorithm uses the latency threshold to configure the metapath. Current switches in HPC systems are no longer simple packet forwarders; they are endowed with smart capabilities to evaluate and adapt the communication load according to the network condition [7]. For instance, InfiniBand (IBA) switches, the most used technology in today's HPC clusters [18], provide features for buffer monitoring and multipath selection, as required by the FR-DRB policy. IBA also provides the watchdog timers needed to fulfill congestion control requirements [5].

FR-DRB operations are performed concurrently with packet delivery. As shown in Fig. 2, a message is forwarded without any overhead when the output port is free (thick arrows). Otherwise, latency accumulation is performed only while messages are waiting in the buffer; hence, this operation does not delay the send/receive primitives. Likewise, MultiStepPath selection and metapath configuration are performed concurrently with load injection, so messages are not delayed either. Deadlock freedom is ensured by having a separate escape channel for each phase. As we adopt two intermediate nodes, one escape channel is used (if required) from Src to IN1, another from IN1 to IN2, and a third from IN2 to Dst. Hence, each phase defines a virtual network, and packets change virtual network at each intermediate node. Although each virtual network relies on a different escape channel, they all share the same adaptive channel(s). Thus, our current FR-DRB implementation uses four virtual channels. The use of adaptive routing algorithms can cause out-of-order delivery of packets. If the user application requires in-order packet delivery, FR-DRB reorders packets at the destination node using the well-known sliding window protocol, as is the case for other routing policies such as [15].
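The virtual-channel assignment described above can be made concrete as a phase-to-channel map. This sketch renders the stated scheme (three escape channels, one per MSP phase, plus one shared adaptive channel, four virtual channels in total); the channel numbering and phase names are our own assumptions:

```python
def escape_channel(phase):
    """Map each MSP phase to its own escape virtual channel, breaking
    cyclic channel dependencies across phases (sketch; the numbering
    is an assumption, not taken from the paper)."""
    channels = {
        "src_to_in1": 0,   # escape VC for the Src -> IN1 virtual network
        "in1_to_in2": 1,   # escape VC for the IN1 -> IN2 virtual network
        "in2_to_dst": 2,   # escape VC for the IN2 -> Dst virtual network
    }
    return channels[phase]

ADAPTIVE_CHANNEL = 3  # adaptive channel shared by all three virtual networks
```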
4. FR-DRB PERFORMANCE EVALUATION. In order to assess FR-DRB performance, we analyze how the latency and throughput metrics are improved by the monitoring activity and by the multipath configuration and selection mechanisms. The latency metric represents the elapsed time between the generation of a packet at the source node and its complete delivery at the destination node. The throughput metric represents the traffic load accepted by the network versus the traffic load offered by the sender nodes. Both metrics give a global, average description of network performance. In addition, network latency maps and latency-over-time charts are provided to evaluate the mechanism's transient response. The evaluation methodology is divided into two major parts. The first part performs a network response analysis under the hot-spot traffic pattern to evaluate the FR-DRB transient behavior and the traffic load distribution under extreme conditions. This pattern establishes some fixed destinations in order to increase the traffic in a particular network area, causing saturated paths. In addition, the remaining network nodes inject uniform load in order to create background traffic over the network. In the second part, we evaluate the proposed technique using well-known communication patterns: Butterfly, Perfect Shuffle, and Matrix Transpose. These patterns are a collection of benchmarks that reproduce the conditions commonly created by parallel scientific applications (further description of these patterns is provided in [2] and [14]). The FR-DRB operations and modules, together with the network components, were modeled [8] using the standard simulation and modeling tool OPNET Modeler [13]. OPNET

Fig. 2. FR-DRB algorithm: Monitoring, Metapath Configuration and MSP Selection

provides a Discrete Event Simulation (DES) engine. This environment allows defining network component behavior through a Finite State Machine (FSM) approach, and it supports detailed specification of protocols, applications, and queuing policies. The simulations were conducted for three InfiniBand-like networks using the most popular topologies in HPC systems (mesh, torus, and fat-tree), as reported by the Top 500 supercomputer list [18]. In all cases, virtual cut-through switching and credit-based flow control were assumed. In order to achieve a comparative analysis, we implemented five routing policies. Valiant's routing algorithm [17] is an oblivious routing protocol aimed at achieving full load balancing. This mechanism performs two phases: in the first phase, an intermediate node (IN) is randomly selected and the packet is forwarded to this node; after the IN is reached, packets are sent to the destination following the dimension-order routing (DOR) approach [2]. We also implemented the Turn model [12], an adaptive method that allows several possible minimal paths between source and destination. At each switch, this policy tries to forward packets through any free (or less loaded) output link among those belonging to a minimal path; thus, local adaptivity is provided. Finally, in order to evaluate the FR-DRB response time and the impact of timer expiration, we set the watchdog time limit to two different values: a fixed value related to the saturation point, and infinity, which implies no watchdog expiration (as in the former DRB method [7]).
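Valiant's two-phase scheme described above can be sketched in a few lines. Here dor_route is a placeholder for a dimension-order routing function supplied by the caller, and the rng parameter is our own addition to make the sketch testable:

```python
import random

def valiant_route(src, dst, nodes, dor_route, rng=random.choice):
    """Valiant's oblivious routing: phase 1 forwards the packet to a
    randomly chosen intermediate node; phase 2 forwards it from there to
    the destination. Both phases use dimension-order routing (DOR),
    supplied as dor_route(a, b) -> list of hops from a to b."""
    intermediate = rng([n for n in nodes if n not in (src, dst)])
    first = dor_route(src, intermediate)
    second = dor_route(intermediate, dst)
    return first + second[1:]  # drop the duplicated intermediate node
```

Randomizing the intermediate node spreads any traffic pattern over the whole network, which is why Valiant balances load well at the cost of roughly doubling the average path length, as observed in Section 4.1.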
4.1 Hotspot Analysis. Latency and throughput results obtained for the 1024-node mesh network are presented in Fig. 3(a) and (b). FR-DRB shows the same behavior as the other routing policies at low loads, and consequently it does not overload

the network. However, at higher loads, throughput is improved by 94% and latency is reduced by 96% with FR-DRB relative to DOR. The improvement relies on the fact that FR-DRB is a method with a fast response time and low overhead: the FR-DRB mechanism starts as soon as the watchdog timer surpasses its time limit, without waiting for the Ack message, which may arrive much later, as is the case for DRB. Performance improvements are larger at higher loads, which implies that FR-DRB distributes the traffic load better. Hence, independently of the original spatial distribution, the load perceived by each switch is similar, and the latency experienced by messages is uniform. In addition, Fig. 3(c) shows the network latency surface under a quadruple hot-spot pattern, designed to analyze network performance under heavy load. In this case, a deterministic routing algorithm (DOR) was used. As DOR does not perform any load balancing, Fig. 3(c) is useful for seeing the impact of the hot-spot in the network, because this is the worst congestion case. We show the average contention latency by means of the latency surface, in which each grid point (x-y coordinates) represents the average latency in the buffers of the network switches (Figs. 3(d), 3(e), 3(f), and 3(g)). Also, the effective load distribution of each algorithm is shown by the contour lines projected at the base of the charts. The latency reduction accomplished by FR-DRB is 99% with respect to DOR. Fig. 3(d) shows that the Valiant algorithm distributes the traffic load better than the Turn model (Fig. 3(e)); however, its average message latency is worse because the path length is doubled (on average). Thus, the Valiant algorithm performs a suitable load distribution at the expense of a latency rise. Conversely, the local adaptivity of the Turn model improves latency but lacks suitable load balancing. FR-DRB outperforms the algorithms mentioned above because it provides a global load distribution while minimizing the message latency (Fig.


Fig. 3. Performance results for Hot-spot pattern in the mesh network

3(g)). Therefore, network nodes using FR-DRB can adapt themselves to the network condition, avoiding hot-spots by using the free available bandwidth. Finally, Fig. 3(h) shows the latency experienced by messages over time. The hot-spot duration is in the range between 2 and 2.4 seconds. With deterministic routing, the latency peak reaches approximately 45 ms. FR-DRB reduces this peak by a factor of almost 7, thanks to the multipath selection feature and the watchdog timer module, both described in Section 3.
4.2 Benchmark Traffic Analysis. We present the charts of latency reduction and throughput improvement for the three traffic patterns defined above (Butterfly, Perfect Shuffle, and Matrix Transpose). Fig. 4(a) and (b) show performance improvements in a 1024-node torus network for the Valiant, Turn model, DRB, and FR-DRB policies. Results are presented as percentages [%], all relative to

deterministic routing (DOR). We also present, in Fig. 5(a) and (b), the performance improvements achieved in a 64-node network arranged in a fat-tree topology (4-ary 2-tree), which is widely used in today's datacenters. In this topology, the routing algorithms differ from the torus algorithms. Routing in fat-trees is composed of two phases: an adaptive upward phase and a deterministic downward phase. As adaptive routing is used in the ascending phase, several output ports are possible at each switch and the final choice depends on the selection function. The impact of the selection function on performance has been previously studied in [4]. We implemented the First Free (FF) and Cyclic Priority (CP) selection functions to perform a comparative analysis. The FF selection function selects the first free physical link, while the CP selection function uses a round-robin algorithm to choose a different physical link each time a packet is forwarded.

Fig. 4. Performance results obtained for persistent patterns in a Torus topology network


Fig. 5. Performance results for persistent patterns in a Fat-Tree topology network

Latency reduction and throughput improvement results are also presented as percentages [%], relative to deterministic routing, in which packets are delivered through the same statically assigned path and no adaptivity or randomization is provided. Experiments show that FR-DRB achieves lower latencies (up to 80%) and higher throughput (up to 100%) than the other methods in both topologies. As load increases, the latency improvements also increase. This gain allows a heavier communication load for networks using the FR-DRB mechanism; alternatively, in cost-bounded systems, our policy allows using fewer network resources for a given communication load, because those resources are handled more efficiently. The latency improvements shown by FR-DRB come from the monitoring strategy and from the multipath configuration and selection mechanism described in this paper.
5. CONCLUSIONS. In this paper, we proposed the Fast-Response Dynamic Routing Balancing policy to deal with congestion in high-speed interconnection networks. FR-DRB controls the performance degradation produced by packet contention in network resources. Congestion control is accomplished by distributing the communication load over several alternative paths. FR-DRB performs latency monitoring on the path connecting source and destination nodes. When the latency value rises sharply, source nodes start sending messages concurrently through new, different, and less loaded alternative paths. As all source nodes are aware of the latency state, a global latency reduction is achieved, as shown in the experiments. Source nodes are also provided with a watchdog timer that leads to a faster response time when the network is congested. Furthermore, the watchdog limits acknowledgment message generation, which reduces the overhead when the network is near the saturation point. FR-DRB has been developed to fulfill the design objectives for parallel-computer interconnection networks: all-to-all connection and low latency between any pair of nodes for any communication load in the network. Experiments performed to validate the FR-DRB policy have revealed very good improvements in latency and throughput, and congestion is reduced, allowing use of the network at higher loads. Therefore, FR-DRB is useful for persistent and bursty communication patterns, which are those that can produce the worst hot-spot situations.

REFERENCES
[1] Baydal, E. "A Family of Mechanisms for Congestion Control in Wormhole Networks," IEEE TPDS, vol. 16, pp. 772-784, 2005.
[2] Duato, J., Yalamanchili, S., Ni, L. Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.
[3] Garcia, P.J., et al. "RECN-DD: A Memory-Efficient Congestion Management Technique for Advanced Switching," ICPP, pp. 23-32, 2006.
[4] Gilabert, F., Gómez, M., et al. "On the Influence of the Selection Function on the Performance of Fat-Trees," Euro-Par, vol. 4128, pp. 864-873, 2006.
[5] IBTA, InfiniBand Architecture Specification, Volume 1, Release 1.2.1, http://www.infinibandta.org/specs/.
[6] Jain, R. "Congestion control in computer networks: issues and trends," IEEE Network, vol. 4, no. 3, pp. 24-30, 1990.
[7] Lugones, D., Franco, D., Luque, E. "Dynamic Routing Balancing on InfiniBand Networks," Journal of Computer Science & Technology (JCS&T), vol. 8, no. 2, pp. 104-110, 2008.
[8] Lugones, D., Franco, D., Luque, E. "Modeling Adaptive Routing Protocols in High Speed Interconnection Networks," OPNETWORK 2008, Washington, USA, 2008. Available at: https://aomail.uab.es/~dlugones/opnet.html
[9] Lugones, D., Franco, D., Luque, E. "Dynamic and Distributed Multipath Routing Policy for High-Speed Cluster Networks," CCGRID '09, 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 396-403, May 2009.
[10] Maquelin, O., et al. "Polling watchdog: Combining polling and interrupts for efficient message handling," ISCA, pp. 179-188, 1996.
[11] Mo, J., Walrand, J. "Fair end-to-end window-based congestion control," IEEE/ACM Trans. Netw., pp. 556-567, 2000.
[12] Ni, L., Glass, C. "The Turn Model for Adaptive Routing," ISCA, pp. 278-287, 1992.
[13] OPNET Technologies, OPNET Modeler: Accelerating Network R&D, June 2008, http://opnet.com.
[14] Petrini, F., Hoisie, A., Feng, W., Graham, R. "Performance evaluation of the Quadrics interconnection network," IPDPS, pp. 1698-1706, Apr. 2001.
[15] Shihang, Y., Min, G., Awan, I. "An Enhanced Congestion Control Mechanism in InfiniBand Networks for High Performance Computing Systems," AINA, IEEE Computer Society, vol. 1, pp. 845-850, 2006.
[16] Singh, A., Dally, W., Towles, B., Gupta, A.K. "Globally Adaptive Load-Balanced Routing on Tori," IEEE Computer Architecture Letters, vol. 3, p. 69, 2004.
[17] Valiant, L.G., Brebner, G.J. "Universal Schemes for Parallel Communication," ACM STOC, Milwaukee, pp. 263-277, 1981.
[18] Vishnu, A., Koop, M., Moody, A., Mamidala, A., Narravula, S., Panda, D. "Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective," CCGRID, IEEE Computer Society, pp. 479-486, 2007.
[19] Top500 Supercomputers Site, Interconnect Family share for 11/2008, Nov. 2008, http://www.top500.org.
