
OAM Challenges In A Carrier Ethernet Network by Stephen Christo and Robert Roden PhD, LightStorm Networks Ltd

What Is OAM? Operations, administration and maintenance (OAM) is a set of functions that enable the detection of network fault conditions and the measurement of network performance. OAM functionality is required to enforce a carrier's strict service level agreements (SLAs) by ensuring compliance with QoS guarantees, detecting problems before they escalate, and isolating network defects. A fault may trigger control or management plane functions (eg rerouting, alarm signals, etc) but these functions are outside the pure definition of OAM. In traditional TDM networks, data are transferred at a constant rate so any discontinuity can be identified immediately. When a loss-of-signal (LOS) is detected in a TDM network, a platform sends a failure notification and generates an alarm indication signal (AIS), which limits the number of failure notifications that an operator observes. In TDM networks most OAM fields are included in the frame overhead. Ethernet-based networks could take the same approach, but most standards have adopted a model that uses special-purpose OAM packets. Depending upon their location and function within a network, systems will support different OAM standards (see Fig. 1).

Fig 1: Different OAM Standards in Network In access applications Ethernet in the first mile (EFM), developed in the IEEE 802.3 workgroup as the 802.3ah specification, provides visibility at the Ethernet physical layer. It uses OAM protocol data units (PDUs) to provide different types of information and works well for point-to-point connections. However, EFM is limited in its capabilities because it can only place a remote device in loopback, set critical event flags, and query a remote device's configuration. There is no performance measurement and link state information is minimal. Most importantly, EFM does not solve the more difficult and complex issue of end-to-end Ethernet connectivity and service guarantees. The IEEE 802.1ag and ITU Y.1731 specifications enable service layer monitoring and provide a superset of the EFM functionality. As networks become more sophisticated the requirement for service level granularity will increase and these two standards are expected to become more prevalent across the network. EFM will continue to be deployed but will most likely be limited to simple point-to-point applications, and therefore the focus of this TechNote is on Ethernet service layer OAM.

Ethernet Service Layer OAM Overview Ethernet service layer OAM is comprised of two primary components: 1. Connectivity fault management (CFM) 2. Performance monitoring (PM) CFM enables fault detection and localization for point-to-point and multi-point Ethernet virtual connections (EVCs). It provides connectivity checks across multiple administrative domains in order to support end-to-end monitoring. Performance monitoring also spans across domains and measures the frame loss, frame delay, and frame delay variation of packets within a particular EVC or service instance.

Fig 2: Common Multi-Domain Network Model An Ethernet network is often comprised of multiple administrative domains and the IEEE, ITU, and MEF have adopted the common multi-domain network model (see Fig. 2). The service network is divided into three domains: 1. Customer 2. Provider 3. Operator A service instance will span across a network covering one or more operators and each domain will have its own network management system (NMS). A CFM or PM packet would travel between domains on the dotted line (Fig. 2, again). Within each domain, there are maintenance end points (MEPs) and maintenance intermediate points (MIPs). A MEP will initiate and respond to CFM messages. MIPs passively receive these messages and respond back to the originating MEP. A single port can have multiple MEPs and a single MIP (corresponding to the highest level domain).

A maintenance association (MA) is the relationship between two MEPs. MAs are divided into eight levels, which are used to prevent CFM or PM messages from leaking across domains. Two levels are for network operators, two are for service providers, and three are for the customer. The eighth level is used for Ethernet section monitoring. In Fig. 2, assume an MA is established between the two customer sites. A CFM message is generated at one customer MEP and sent across the network to its destination MEP at the customer level. The MEPs in the service provider OAM level are co-located with MIPs in the customer level and transparently pass the CFM message through the network as it belongs to a higher OAM level, thereby enabling true end-to-end management. A similar relationship also exists between the MEPs and MIPs of the service and operator OAM levels.
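As a rough illustration of how these levels prevent leakage, the forwarding decision at a maintenance point can be sketched as follows. This is a simplified sketch in Python; the function name and result strings are illustrative, not from any standard API.

```python
def handle_oam_frame(frame_level: int, mep_level: int) -> str:
    """Decide how a MEP treats an incoming OAM frame by MD level (sketch).

    Frames at the MEP's own level are processed; frames from higher
    (outer) levels are passed through transparently; frames from lower
    levels are dropped so they cannot leak out of their domain.
    """
    if frame_level == mep_level:
        return "process"
    if frame_level > mep_level:
        return "pass-through"
    return "drop"
```

This is why a customer-level CCM crosses the provider network untouched: the provider MEPs sit at a lower level and simply pass the higher-level frame through.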

Connectivity Fault Management Mechanisms Both the 802.1ag and Y.1731 specifications define continuity check messages (CCMs), loopback messages (LBMs), and link trace messages (LTMs). Of these, CCMs are arguably the most important as they function as heart-beat messages between MEPs in order to detect loss of continuity or incorrect network connections. An originating MEP will multicast a CCM within an MA at a rate ranging from as low as one every ten minutes to as high as three hundred per second. All MIPs and MEPs within the MA will receive the CCM but not respond to it. They maintain a context database containing information such as MEP ID, destination address, port, etc. If a receiving MEP receives a CCM that does not match the expected values, or if it does not receive a CCM within three-and-one-half times the configured transmission period, it will declare a loss of continuity or an incorrect network connection. In addition to continuity failure, CCMs can also be used to detect the following defects: service cross-connect (MA ID mismatch), duplicate MEP configurations (MEP ID match), forwarding loops (duplicate sequence numbers), and data corruption (bad data checksum). There are two types of LBMs: in-service and out-of-service. In-service is the most prevalent and is used to identify the precise location of a connectivity fault within an MA. This function is performed by a MEP which sends a ping-like request and waits for a loopback reply (LBR) from the target MEP or MIP. A MIP before the fault will respond; the one behind it will not, identifying the fault location within the MA. Out-of-service LBMs are occasionally used before a service is configured for diagnostic test purposes such as throughput measurements. An LTM is a traceroute-like message that is used to identify the adjacency relationships of remote MEPs and MIPs within the same MA level.
The message body of an LTM includes the destination address of the target MEP or MIP that terminates the link trace. When an intermediate MIP receives an LTM, it will respond back to the originating MEP and forward the LTM until the destination MIP or MEP is reached. Through this function, an LTM can effectively trace the path to a target MEP or MIP.
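The CCM timeout rule described above (loss of continuity declared after three-and-one-half missed intervals) can be sketched as follows; the class and method names are illustrative, not from any standard API.

```python
CCM_INTERVAL_S = 3.3e-3   # 3.3 ms, the fastest standard CCM rate
TIMEOUT_FACTOR = 3.5      # loss declared after 3.5 missed intervals

class RemoteMepState:
    """Tracks the last CCM arrival time for one remote MEP (sketch)."""

    def __init__(self, interval_s: float = CCM_INTERVAL_S):
        self.interval_s = interval_s
        self.last_ccm_time = None

    def on_ccm(self, now: float) -> None:
        # A matching CCM arrived; refresh the heart-beat timer.
        self.last_ccm_time = now

    def loss_of_continuity(self, now: float) -> bool:
        # No baseline yet: cannot declare a loss.
        if self.last_ccm_time is None:
            return False
        return (now - self.last_ccm_time) > TIMEOUT_FACTOR * self.interval_s
```

In a real implementation this check runs per remote MEP in every MA, which is exactly the per-packet state maintenance that motivates the hardware assist discussed later.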

In addition to these common messages, Y.1731 also defines other CFM-type messages and functions, including the alarm indication signal (AIS), remote defect indication (RDI), locked signal function (LCK), and test signal (TST). When a MEP detects a connectivity failure at a specific MA level, an AIS will be multicast away from the failure on each affected S-VLAN. An AIS limits the number of redundant alarms received by an NMS upon a failure and informs clients that a service instance has failed. RDI is an encoded bit in a CCM packet that is used to inform upstream MEPs that there has been a downstream failure. An LCK signal communicates that a MEP is performing intentional diagnostic or administrative actions so that they are not confused with actual fault messaging. The TST message includes data which can be used to test throughput, measure bit errors, or detect out-of-sequence frames.

Performance Management (PM) Mechanisms Ethernet PM encompasses frame delay (with the associated frame delay variation) and frame loss (with the related concept of availability). Initial work on Ethernet PM functions was developed by the ITU as part of their broader activity on Ethernet OAM and is described in Y.1731. At a system level, PM for packet-based networks is an inherently statistical process since the individual measurements will frequently vary widely over time depending on the congestion status of the network; this statistical nature is reflected in the performance attributes often defined in SLAs (which might refer to mean values, median values, minimum/maximum ranges, percentile figures, distributions etc). The overall OAM architecture enabling the collection of PM data is derived from 802.1ag/Y.1731 and includes the previously-mentioned concepts of MAs, MEPs, management domains etc. PMs (delay or loss) related to an EVC are between the MEPs bounding the EVC. Each PM invocation typically lasts for an extended period of time with frequent, periodic message exchanges (typically in the 10-100 ms range), resulting in large amounts of data being collected for subsequent statistical analysis. PMs (delay or loss) can be collected by either a single or dual-ended process. In a single-ended process all control, processing, and collection of measurements is performed at the source end using a request/response exchange, with the destination typically simply responding to messages. In a dual-ended process both ends are coordinated by the management system with, typically, the source generating (one-way) messages and the destination processing and collecting the measurements (based on the received one-way messages). From a management perspective, single-ended processes are often preferred because the two ends don't need to be coordinated and the management system only needs access to the source node.

Fig. 3: Delay Measurement Exchange Single-ended frame delay measurements are collected by a delay measurement message/delay measurement response (DMM/DMR) exchange. The source MEP inserts the DMM transmit time (TxTimestampf) in the DMM (which is assigned the CoS being monitored) while the destination MEP copies that value into the DMR and adds its own DMM receive time (RxTimestampf) and DMR transmit time (TxTimestampb) (see Fig. 3). The source MEP calculates the two-way frame delay after a single DMM/DMR exchange using the following:

2-way FD = RxTimestampb - TxTimestampf

The source MEP can also calculate one-way frame delay variation from two samples s1 and s2, irrespective of clock synchronization, as follows:

1-way forward FDV = [RxTimestampf(s2) - TxTimestampf(s2)] - [RxTimestampf(s1) - TxTimestampf(s1)]
1-way backward FDV = [RxTimestampb(s2) - TxTimestampb(s2)] - [RxTimestampb(s1) - TxTimestampb(s1)]

Dual-ended delay measurements must be initiated at both source and destination MEPs and use the 1DM PDU. The transmitting MEP inserts the TxTimestampf value, which allows the receiving MEP to calculate the one-way FDV (from a sequence of 1DM frames) and the one-way FD if the source and destination clocks are synchronized. To enable frame loss calculations each MEP maintains local counters for frames transmitted to the remote MEP (TxFCl) and received from the remote MEP (RxFCl). Exchanging these counters in protocol messages allows one-way frame loss in each direction to be calculated without requiring synchronization of the local and remote counters. With respect to a particular node, the frame loss when transmitting to the remote node is called far-end frame loss while frame loss when receiving is called near-end frame loss.
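The delay formulas above translate directly into code. This is a minimal sketch with illustrative function names; note how the unknown clock offset between the two MEPs cancels in the FDV subtraction.

```python
def two_way_fd(tx_timestamp_f: float, rx_timestamp_b: float) -> float:
    """Two-way frame delay from one DMM/DMR exchange (simple form,
    ignoring the responder's processing time)."""
    return rx_timestamp_b - tx_timestamp_f

def one_way_fdv(tx_s1: float, rx_s1: float,
                tx_s2: float, rx_s2: float) -> float:
    """One-way frame delay variation between samples s1 and s2.

    Each (rx - tx) term contains the same unknown clock offset between
    the two MEPs, so it cancels when the two samples are subtracted,
    which is why FDV needs no clock synchronization.
    """
    return (rx_s2 - tx_s2) - (rx_s1 - tx_s1)
```

For example, with the two sample exchanges measured 0.2 ms apart in apparent one-way delay, the FDV comes out to that 0.2 ms difference regardless of how far apart the two clocks are.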

Fig. 4: Single-Ended Frame Loss Measurement Single-ended frame loss measurement is used for on-demand OAM and uses loss measurement message/loss measurement reply (LMM/LMR) exchanges to carry the counter information. The Y.1731 notation uses forward and backward directions, which are defined with respect to the initiating end (ie the MEP which transmits the LMM); consequently far-end corresponds to forward and near-end to backward (see Fig. 4). The initiator inserts its local TxFCl as the TxFCf field of the LMM. The responder generates the LMR and: copies the TxFCf from the LMM to the LMR; inserts its local RxFCl as the RxFCf field of the LMR; inserts its local TxFCl as the TxFCb field of the LMR. On receiving an LMR, the initiator can calculate frame loss using the current (c) and previous (p) set of counters:

Frame Loss far-end = (TxFCfc - TxFCfp) - (RxFCfc - RxFCfp)
Frame Loss near-end = (TxFCbc - TxFCbp) - (RxFClc - RxFClp)

Frame loss ratio for both directions can then be calculated simply as frame loss divided by frames transmitted, expressed as a percentage. Dual-ended frame loss measurement is always on and uses CCMs to exchange the counters. The basic idea is similar to that described for single-ended measurement, although the mapping between near end/far end and forward/backward is reversed to reflect the fact that the calculations are performed at the destination rather than the source. Since CCMs are defined to always operate at the same priority, dual-ended frame loss calculations are aggregated over all CoS of the EVC.
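The counter arithmetic above can be sketched as follows; the function names are illustrative, and the same formula applies to both directions with the appropriate counter pair substituted.

```python
def frame_loss(tx_curr: int, tx_prev: int,
               rx_curr: int, rx_prev: int) -> int:
    """Frames lost in one direction over one LMM/LMR interval:
    frames sent minus frames received, computed from current and
    previous counter snapshots so no counter reset or synchronization
    between the two MEPs is needed."""
    return (tx_curr - tx_prev) - (rx_curr - rx_prev)

def frame_loss_ratio(loss: int, tx_curr: int, tx_prev: int) -> float:
    """Frame loss ratio as a percentage of frames transmitted."""
    sent = tx_curr - tx_prev
    return 100.0 * loss / sent if sent else 0.0
```

For instance, if 1,000 frames were sent and 990 received in an interval, the loss is 10 frames and the frame loss ratio is 1%.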

Service OAM Frame Format Fig. 5 shows the standard Ethernet service OAM header. As with all Ethernet packets, the source and destination MAC addresses are preceded by the preamble and frame delimiter. The S-tag and C-tag are optional. The OAM Ethertype is defined by the specification committees (eg IEEE 802.1ag). The MA level corresponds to the administrative domains such as customer, provider, or operator. The version field is currently unused and set to zero until future extensions of the service OAM protocol are added. The OpCode identifies the OAM message type (eg CCM, LBM, LTM, LMM, etc). The 8-bit flag field provides additional information on the OAM message type, such as the CCM interval. The TLV offset contains the offset to the first TLV field (for example, if the offset is zero, TLV information immediately follows the header). Each message type may have specific header information. For example, a continuity check message includes a sequence number, MA ID and MEP ID, while a loopback message includes a transaction ID. The TLVs provide additional information for the OAM packet.
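Assuming the field layout just described (a 3-bit MA level and 5-bit version sharing the first byte after the OAM Ethertype, followed by the OpCode, flags, and first TLV offset bytes), a minimal parser sketch might look like this; the function name and returned keys are illustrative.

```python
def parse_cfm_header(data: bytes) -> dict:
    """Parse the 4-byte common CFM header that follows the OAM
    Ethertype (sketch of the common 802.1ag/Y.1731 layout)."""
    level_version, opcode, flags, tlv_offset = data[0], data[1], data[2], data[3]
    return {
        "ma_level": level_version >> 5,      # top 3 bits: domain level
        "version": level_version & 0x1F,     # bottom 5 bits: currently 0
        "opcode": opcode,                    # message type, eg 1 = CCM
        "flags": flags,                      # eg CCM interval encoding
        "first_tlv_offset": tlv_offset,      # bytes from here to first TLV
    }
```

A first byte of 0xE0, for example, decodes to MA level 7 (the highest customer level) with version 0.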

Fig. 5: Standard Ethernet Service OAM Header

Challenges With Scaling Ethernet CFM

Fig. 6: Network Example With Two Endpoints Fig. 6 shows an example network with two endpoints connected through redundant PBB-TE or Pseudowire (PW) tunnels over 10GbE links. For the sake of simplicity, only the customer OAM domain is shown, but it is assumed that the MIPs of this domain are co-located with MEPs within the lower service provider and operator OAM domains. The originating MEP will send CCM packets at predefined intervals through both the primary and secondary paths, which act as periodic heart-beat messages for each PBB-TE or PW service. The receiving MEP (and MIPs) will maintain CCM context within a MEP database that has entries such as MEP DA, port, etc. When a CCM is received, the MEP will check that it arrived within the specified time interval and compare the packet contents with the expected information in the database. If the CCM packet did not arrive within three-and-one-half times the specified time interval, or if the content comparison reveals a mismatch, a fault condition is reported to higher level management software. In first-generation carrier Ethernet systems, the generation, reception, and inspection of CCM packets was largely performed in software. However, as Ethernet deployments and related services have become more prevalent, this software approach is no longer adequate. In the network example discussed above, assume the customers purchasing services paid for 50 ms recovery times, requiring intervals between CCMs to be set to the 3.3 ms minimum. This minimum interval is based on a failure notification being generated if a CCM packet is not received within three-and-one-half times the set interval, in this case 3.5 * 3.3 ms = 11.55 ms. Once the link fault message is sent to the NMS, it then has ~39 ms to complete the failover of the service(s) to the secondary path. Now assume that 2% of a 10GE trunk is used for OAM messaging (ATM specifies 5% or less today) and that all services require a 50 ms failover time. The size of a typical OAM packet is 100-128 bytes.
Using the following calculations, a realistic number of MAs across a 10GE link can be estimated:

10 Gbit/s * 2% = 200 Mbit/s
(128 bytes * 8 bits) / 3.3 ms ≈ 303 kbit/s
200 Mbit/s / 303 kbit/s ≈ 660 MAs

It is reasonable to assume that this number can grow to over 1,000 MAs per single 10GE port or 10 x 1GE ports. A control processor does not have enough processing cycles and memory bandwidth to support this number of MAs. Preliminary analysis has demonstrated that, at best, the typical control or network processor on an Ethernet line card would support no more than 100 MAs at the minimum 3.3 ms interval. Due to these limitations and the growing requirement to scale services in carrier Ethernet systems, developers are scrambling to upgrade existing platforms. In order to meet carrier requirements and not overwhelm onboard processors, some level of hardware assist is necessary. OAM service accelerators abstract the CCM generation, reception, and checking state machines into hardware, eliminating the need for software intervention unless a fault condition is identified. With hardware assist, the number of MAs that can be maintained increases by no less than an order of magnitude, as shown in Fig. 7.
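The capacity estimate above can be reproduced with simple arithmetic. Note that with a full 128-byte packet the per-MA rate comes out slightly above the article's rounded 303 kbit/s figure, giving roughly 645 rather than 660 MAs; the order of magnitude is the same either way.

```python
# Back-of-envelope estimate of how many maintenance associations (MAs)
# fit in the OAM budget of a 10GE link, using the article's assumptions.
link_rate_bps  = 10e9      # 10GE link
oam_fraction   = 0.02      # 2% of the trunk reserved for OAM
ccm_interval_s = 3.3e-3    # fastest standard CCM interval (3.3 ms)
pkt_bits       = 128 * 8   # one 128-byte OAM packet

oam_budget_bps = link_rate_bps * oam_fraction   # 200 Mbit/s OAM budget
per_ma_bps     = pkt_bits / ccm_interval_s      # ~310 kbit/s per MA
max_mas        = oam_budget_bps / per_ma_bps    # ~645 MAs
```

Varying the packet size across the 100-128 byte range quoted in the text moves the answer between roughly 645 and 825 MAs, so several hundred MAs per 10GE port is a robust conclusion.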

Fig. 7: Hardware Assist

Challenges With Implementing An Accurate Ethernet PM Solution Compared to continuity checks, performance monitoring is not as bandwidth intensive. However, by enabling hardware-based performance monitoring, more deterministic and accurate measurements are possible. For example, measuring frame loss ratio requires messaging between two endpoints with packets that contain the number of frames transmitted and received. If software is generating these message frames, the time between calculating the transmit/receive value and actually sending the packet can vary significantly based on processor activity, leading to false error reporting. By abstracting a portion of the performance monitoring in hardware, additional accuracy in both frame delay (and delay variation) and frame loss can be achieved. The delay measurements are between the ingress and egress MEPs located on the ingress and egress UNIs, which implies that the timestamps should be taken as close as possible to the respective MACs; that is, before queuing on ingress and after queuing on egress (thereby ensuring that queuing delays in the edge nodes are taken into account). By integrating a service accelerator that samples a timestamp immediately before generating an outgoing DMM/DMR, or immediately after receiving an incoming DMM/DMR, the non-deterministic nature of a software measurement is eliminated from the solution. It is assumed that the inaccuracy introduced would be extremely small compared to the actual measured delays (typically at least tens of microseconds and normally tens of milliseconds). Accurate frame loss measurements involve the coordinated collection of Tx and Rx frame counters in a given interval. In effect, the intervals are delimited by the LMM/LMR messages in the traffic stream so, for example, the Tx frame count in an LMM should reflect the number of frames in the traffic stream between this LMM and the previous one. For accurate frame loss measurements, the Tx and Rx counters need to be implemented in hardware. In addition, the insertion of outgoing LMM/LMR messages and the extraction of incoming LMM/LMR packets should be implemented in hardware. By having both the insertion/extraction of frame loss packets and the sampling of counters abstracted in hardware, determinism and accuracy are significantly increased.

Summary As Ethernet becomes more prominent in the network, support for service layer OAM capabilities is becoming a critical requirement for any carrier Ethernet platform. With service layer OAM, true end-to-end connectivity and performance monitoring are possible while maintaining clear demarcation points between customers and service providers, and between service providers and operators. However, as implementations move from first-generation pre-standard solutions to compliant and interoperable second-generation designs, system developers are faced with the challenge of scaling their OAM solutions and making them more accurate. In order to truly support carrier-class service layer OAM, some level of hardware abstraction is required. Solving the problem exclusively in software will no longer be an option, as carriers are demanding increased scaling of continuity checks and improved accuracy of performance measurements in order to enforce strict service level agreements.

About The Authors Robert Roden, Chief Systems Architect Robert Roden has successfully developed advanced networking architectures throughout his career both in Ireland and the UK. Most recently, he was an architect at Intel Corporation developing specifications for network processors, security platforms and modular technology. Roden came to Ireland in 1994 to join Tellabs, where he developed architectures for carrier-class, next-generation switching systems and converged voice/data platforms. Before leaving the UK, he was a Consulting Engineer in the Network Architecture group at Digital Equipment Corporation working on telecommunications and router architecture. His early experience includes LAN technology with ICL and the first European experimental packet switching development with Logica. He earned a BSc and PhD in Physics from Imperial College, London. Steve Christo, Director, Strategic Marketing and Product Management Steve has a 15+ year track record in strategic marketing and product management. Prior to joining Lightstorm Networks he was focused on the high-speed computing and networking markets working for Stargen, Intel, Digital Equipment Corporation and Raytheon. During that period he helped introduce PCI Express and Advanced Switching Interconnect (ASI) to the market through frequent speaking engagements, authoring 20 published articles and as the Chair of the ASI marketing workgroup. He earned a degree in electrical engineering from Northwestern University and an MBA from Bentley College.

References
1. IEEE 802.1ag, Virtual Bridged Local Area Networks, Amendment 5: Connectivity Fault Management, March 2006
2. ITU-T Y.1731, OAM Functions and Mechanisms for Ethernet Based Networks, May 2006
3. Ethernet Service OAM: Overview, Applications, Deployment, and Issues, Fujitsu Network Communications Inc, 2006
4. End-to-End Ethernet Connectivity Fault Management in Metro and Access Networks, Alcatel-Lucent White Paper
5. Ethernet OAM, Yaakov Stein, RAD Data Communications, 2006
6. Ethernet Operations, Administration, and Maintenance, Cisco Systems
