Access Equipment
V200R006C20SPC600
Feature Description
Issue 01
Date 2018-01-30
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the
customer. All or part of the products, services and features described in this document may not be within the
purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information,
and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: support@huawei.com
Purpose
This document describes the feature in terms of its overview, principle, and applications.
This document, together with other types of documents, helps intended readers gain a deep
understanding of the feature. For information on how the ATN equipment supports this
feature, see the Product Description.
Related Version
The following table lists the product version related to this document.
Intended Audience
This document is intended for:
• Network Planning Engineer
• Commissioning Engineer
• Data Configuration Engineer
• System Maintenance Engineer
Security Declaration
• Encryption algorithm declaration
The encryption algorithms DES/3DES/RSA (RSA-1024 or lower)/MD5 (in digital
signature scenarios and password encryption)/SHA1 (in digital signature scenarios)
provide low security, which may bring security risks. If protocols allow, using more secure
Special Declaration
• This document serves only as a guide. The content is written based on device
information gathered under lab conditions. It is intended as general guidance and
does not cover all scenarios. It may differ from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content of this document. The preceding differences are
beyond the scope of this document.
• The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The maximum values actually obtained may differ from those
provided in this document due to factors such as differences in
hardware configurations and carried services.
• Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
• The pictures of hardware in this document are for reference only.
Symbol Conventions
Symbol Description
Command Conventions
Convention Description
GUI Conventions
Convention Description
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Contents
2 System Management
2.1 Information Center
2.1.1 Introduction
2.1.2 Principles
2.1.2.1 Information Classification
2.1.2.2 Information Hierarchy
2.1.2.3 Information Output
2.1.2.4 Information Shield
2.1.2.5 Suppression of the Log Processing Rate
2.1.2.6 Diagnostic Logs in Binary Format
2.1.3 Terms, Acronyms, and Abbreviations
2.2 SNMP
2.2.1 Terms and Abbreviations
2.2.2 Introduction
2.2.3 Principle
2.2.3.1 SNMP Management Model and Related Concepts
2.2.3.2 SNMPv1
2.2.3.3 SNMPv2c
2.2.3.4 SNMPv3
2.2.3.5 Comparisons of SNMPv1, SNMPv2c, and SNMPv3
2.2.3.6 SNMP Attack Defense Mechanism
2.2.4 Applications
2.2.4.1 SNMP for Configuration Management
2.2.4.2 SNMP for VPN User Management
2.2.5 Terms, Acronyms, and Abbreviations
2.3 RMON and RMON2
2.3.1 Introduction
2.3.2 Principles
2.3.2.1 RMON and RMON2 Infrastructure
2.3.2.2 Features of RMON and RMON2
2.3.2.3 Remote Monitoring of RMON and RMON2
2.3.2.4 Table Management in RMON and RMON2
2.3.2.5 Implementation of RMON and RMON2 on Huawei Devices
2.3.3 Terms and Abbreviations
2.4 IP FPM
2.4.1 Introduction
2.4.2 Principles
2.4.2.1 Basic Concepts
2.4.2.2 Basic Functions
2.4.3 Applications
3 Reliability
3.1 VRRP
3.1.1 Introduction
3.1.2 Principles
3.1.2.1 Master/Backup Mode
3.1.2.2 VRRP Load Balancing
3.1.2.3 VRRP Tracking Interface Status
3.1.2.4 BFD for VRRP
3.1.2.5 Pinging the Virtual IP Address
3.1.2.6 VRRP Security
3.1.2.7 VRRP Smooth Switching
3.1.2.8 mVRRP
3.1.2.9 VRRPv3 Packet Format
3.1.3 Applications
3.1.3.1 VRRP Tracking Interface Status
3.1.3.2 mVRRP
3.1.4 Terms, Acronyms and Abbreviations
3.2 Bit-Error-Triggered Protection Switching
3.2.1 Introduction to Bit-Error-Triggered Protection Switching
3.2.2 Principles
3.2.3 Applications
3.2.3.1 Application of Bit-Error-Triggered Protection Switching in a Scenario in Which an RSVP-TE Tunnel Carries a PW
3.2.3.2 Application of Bit-Error-Triggered Protection Switching in a Scenario in Which an LDP LSP Carries a PW
3.2.3.3 Application of Bit-Error-Triggered Protection Switching on Trunk Interfaces
3.2.4 Terms, Acronyms, and Abbreviations
3.3 BFD
3.3.1 Overview
3.3.2 Key Concepts
3.3.2.1 BFD for IP
3.3.2.2 BFD for PIS
3.3.2.3 BFD for TTL
3.3.2.4 Introduction to BFDv6
3.3.3 Application Environment
3.3.3.1 BFD for USR
3.3.3.2 BFD for OSPF
3.3.3.3 BFD for IS-IS
3.3.3.4 BFD for BGP
3.3.3.5 BFD for LSP
3.3.3.6 BFD for PST
3.3.3.7 BFD for TE
3.3.3.8 BFD for PW
3.3.4 Terms and Abbreviations
3.4 NSR Overview
3.4.1 Introduction
3.4.2 NSR Features Supported by the ATN
3.5 Ethernet OAM
3.5.1 Introduction
3.5.2 Principles
3.5.2.1 EFM OAM
3.5.2.2 Ethernet CFM
3.5.2.3 Basic Y.1731 Functions
3.5.2.4 OAM Fault Association
3.5.2.5 OAM-based Security
5.1 Ethernet
5.1.1 Introduction to Ethernet
5.1.2 Principles
5.1.2.1 Physical Layer of the Ethernet
5.1.2.2 Data Link Layer of the Ethernet
5.1.3 Applications
5.1.3.1 Computer Interconnection
5.1.3.2 Interconnection Between High-Speed Network Devices
5.1.3.3 Means to Access MANs
5.1.4 Terms, Acronyms, and Abbreviations
5.2 VLAN
5.2.1 Introduction
5.2.2 Principles
5.2.2.1 Basic Concepts
5.2.2.2 VLAN Communication Principles
5.2.2.3 VLAN Aggregation
5.2.2.4 VLAN Mapping
5.2.2.5 Flexible Service Access Through Sub-interfaces of Various Types
5.2.3 Application
5.2.4 Terms, Acronyms, and Abbreviations
5.3 Trunk
5.3.1 Introduction
5.3.2 Principles
5.3.2.1 Basic Principles
5.3.2.2 Restrictions on Trunk Interfaces
5.3.2.3 Trunk Interface Classification and Features
5.3.2.4 Trunk Forwarding Principles
5.3.2.5 Inter-Board Trunk
5.3.2.6 LACP
5.3.2.7 E-Trunk
5.3.3 Usage Scenario
5.3.3.1 Eth-Trunk
5.3.3.2 Link Aggregation Group
5.3.3.3 E-Trunk Application in Dual-homing Networking
5.3.4 Terms, Acronyms, and Abbreviations
5.4 STP/RSTP/MSTP
5.4.1 Introduction
5.4.2 Principles of STP/RSTP
5.4.2.1 Background
5.4.2.2 Basic Concepts
5.4.2.3 BPDU Format
5.4.2.4 STP Topology Calculation
6.7.4.3 DBA
6.7.4.4 FEC
6.7.4.5 Line Encryption
6.7.5 GPON Terminal Authentication and Management
6.7.5.1 GPON Terminal Authentication (an ONU Not Preconfigured)
6.7.5.2 GPON Terminal Authentication (an ONU Preconfigured)
6.7.6 Networking Applications (FTTx)
6.7.7 Terms, Acronyms, and Abbreviations
7 IP Services
7.1 IP Addressing
7.1.1 Introduction to IP Addresses
7.1.2 Principles
7.1.2.1 Classes of IP Addresses
7.1.2.2 Characteristics of IP Addresses
7.1.2.3 Special IP Addresses
7.1.2.4 Private IP Addresses
7.1.3 Applications
7.1.3.1 Subnetting
7.1.3.2 IP Address Allocation
7.1.3.3 IP Address Unnumbered
7.1.3.4 IP Address Resolution
7.1.3.5 IP Address Overlapping in the VPN Instance
7.1.4 Terms, Acronyms, and Abbreviations
7.2 ARP
7.2.1 Introduction to ARP
7.2.2 Principles
7.2.2.1 Basic ARP Principles
7.2.2.2 Dynamic ARP
7.2.2.3 Static ARP
7.2.2.4 ARP Automatic Scanning and Fixed ARP
7.2.2.5 Gratuitous ARP
7.2.2.6 Proxy ARP
7.2.2.7 ARP-Ping
7.2.2.8 IP Address Conflict Detection
7.2.2.9 ARP Security
7.2.3 Applications
7.2.3.1 Application of Static ARP
7.2.3.2 Application of Proxy ARP Within a VLAN
7.2.4 Terms, Acronyms, and Abbreviations
7.3 ACL
7.3.1 Introduction to the ACL
7.3.2 Principles
8 IP Routing................................................................................................................................... 637
8.1 IP Routing Overview.................................................................................................................................................. 637
8.1.1 Introduction to IP Routing....................................................................................................................................... 637
8.1.2 Principles................................................................................................................................................................. 637
8.1.2.1 Routers..................................................................................................................................................................638
8.1.2.2 Routing Protocols................................................................................................................................................. 639
8.1.2.3 Routing Table and FIB Table................................................................................................................................639
8.1.2.4 Route Iteration...................................................................................................................................................... 642
8.1.2.5 Static Routes and Dynamic Routes...................................................................................................................... 642
9 IP Multicast.................................................................................................................................823
9.1 IP Multicast Overview................................................................................................................................................823
9.1.1 Introduction............................................................................................................................................................. 823
9.1.2 Principles................................................................................................................................................................. 826
9.1.2.1 Basic Concepts..................................................................................................................................................... 826
9.1.2.2 Basic Framework.................................................................................................................................................. 827
9.1.2.3 Multicast Addresses..............................................................................................................................................828
9.1.2.4 Multicast Model Classification.............................................................................................................................830
9.1.2.5 Multicast Protocols............................................................................................................................................... 831
9.1.2.6 Multicast Packet Forwarding................................................................................................................................833
9.1.3 Terms, Acronyms, and Abbreviations..................................................................................................................... 834
9.2 PIM............................................................................................................................................................................. 835
9.2.1 PIM.......................................................................................................................................................................... 835
9.2.2 Principles................................................................................................................................................................. 836
9.2.2.1 PIM-DM............................................................................................................................................................... 836
9.2.2.2 PIM-SM................................................................................................................................................................ 842
9.2.2.3 PIM-SSM..............................................................................................................................................................855
9.2.2.4 PIM Reliability..................................................................................................................................................... 856
9.2.2.5 PIM Security.........................................................................................................................................................858
9.2.2.6 PIM Control Message........................................................................................................................................... 867
9.2.3 Applications.............................................................................................................................................................880
9.2.3.1 PIM-DM Intra-domain......................................................................................................................................... 880
9.2.3.2 PIM Intra-domain................................................................................................................................................ 881
9.2.3.3 PIM-SSM Intra-domain........................................................................................................................................ 883
9.2.4 Terms, Acronyms, and Abbreviations..................................................................................................................... 885
9.3 IGMP.......................................................................................................................................................................... 886
9.3.1 Introduction............................................................................................................................................................. 886
9.3.2 Principles................................................................................................................................................................. 887
9.3.2.1 IGMPv1&v2&v3.................................................................................................................................................. 887
9.3.2.2 IGMP Group Compatibility..................................................................................................................................889
9.3.2.3 IGMP Querier Election.........................................................................................................................................890
9.3.2.4 Router-Alert for IGMP......................................................................................................................................... 891
9.3.2.5 IGMP Only-Link.................................................................................................................................................. 891
9.3.2.6 IGMP On-Demand............................................................................................................................................... 891
9.3.2.7 IGMP Prompt-Leave............................................................................................................................................ 892
9.3.2.8 IGMP Policy Control............................................................................................................................................893
9.3.2.9 SSM Mapping.......................................................................................................................................................895
9.3.2.10 Source Address-based IGMP Message Filtering................................................................................................897
9.3.2.11 Protocol Comparison.......................................................................................................................................... 898
9.3.3 IGMP Applications..................................................................................................................................................898
10 MPLS..........................................................................................................................................968
10.1 MPLS Basics............................................................................................................................................................ 968
10.1.1 Introduction........................................................................................................................................................... 968
10.1.2 Principles............................................................................................................................................................... 969
10.1.2.1 Concepts............................................................................................................................................................. 969
10.1.2.2 Establishing LSPs............................................................................................................................................... 976
10.1.2.3 MPLS Forwarding.............................................................................................................................................. 977
10.1.2.4 MPLS Ping/Traceroute....................................................................................................................................... 982
10.1.3 Applications...........................................................................................................................................................987
10.1.3.1 MPLS-based VPN.............................................................................................................................................. 987
10.1.3.2 PBR to an LSP.................................................................................................................................................... 988
10.1.4 Terms, Acronyms, and Abbreviations................................................................................................................... 988
10.2 MPLS LDP............................................................................................................................................................... 990
10.2.1 Introduction........................................................................................................................................................... 990
10.2.2 Principles............................................................................................................................................................... 990
10.2.2.1 Concepts............................................................................................................................................................. 990
10.2.2.2 LDP Sessions...................................................................................................................................................... 992
11 VPN.......................................................................................................................................... 1119
11.1 VPN Overview........................................................................................................................................................ 1119
11.1.1 Introduction to VPN.............................................................................................................................................1120
11.1.1.1 Classification of VPN....................................................................................................................................... 1122
11.1.1.2 Architecture of VPN......................................................................................................................................... 1127
11.1.1.3 Typical Networking of VPN............................................................................................................................. 1127
11.1.2 Principles..............................................................................................................................................................1127
11.1.2.1 VPN Tunnel...................................................................................................................................................... 1128
11.1.2.2 Implementation Modes of VPN........................................................................................................................ 1128
11.1.2.3 Features Related to the Implementation of VPN.............................................................................................. 1129
11.1.3 VPN Applications................................................................................................................................................ 1130
11.1.4 Terms, Acronyms, and Abbreviations..................................................................................................................1132
11.2 Tunnel Policy.......................................................................................................................................................... 1144
11.2.1 Introduction..........................................................................................................................................................1144
11.2.2 Principles..............................................................................................................................................................1144
11.2.2.1 Tunnel Type Prioritizing Policy........................................................................................................................ 1144
11.2.2.2 Tunnel Binding Policy...................................................................................................................................... 1144
11.2.2.3 Comparison of Tunnel Policies.........................................................................................................................1145
11.2.2.4 Tunnel Selector................................................................................................................................................. 1145
11.2.2.5 Introduction.......................................................................................................................................................1146
11.2.3 Applications......................................................................................................................................................... 1147
11.2.3.1 Connecting Discontinuous Local Networks into a VPN.................................................................................. 1147
11.2.4 Terms, Acronyms, and Abbreviations..................................................................................................................1148
11.7.1 Introduction..........................................................................................................................................................1212
11.7.2 Principles............................................................................................................................................................. 1213
11.7.2.1 Centralized Management of Hard-Pipe-based Leased Line Services on the NMS.......................................... 1213
11.7.2.2 Interface-based Hard Pipe Bandwidth Reservation.......................................................................................... 1215
11.7.2.3 AC Interface Service Bandwidth Limitation.................................................................................................... 1216
11.7.2.4 Hard Pipe TE LSP.............................................................................................................................................1216
11.7.2.5 Hard Pipe VLL/PWE3 PW............................................................................................................................... 1216
11.7.2.6 Hard Pipe Reliability........................................................................................................................................ 1218
11.7.2.7 Hard Pipe Service Quality Monitoring............................................................................................................. 1218
11.7.3 Applications......................................................................................................................................................... 1218
11.7.3.1 Hard-Pipe-based Enterprise Leased Line Application......................................................................................1218
11.7.3.2 Hard-Pipe-based Enterprise Leased Line Protection........................................................................................ 1219
11.7.3.3 Hard-Pipe-based Leased Line Services Implemented Using Both Huawei and Non-Huawei Devices........... 1219
11.7.4 Terms, Acronyms, and Abbreviations................................................................................................................. 1220
11.8 VPLS.......................................................................................................................................................................1220
11.8.1 Introduction to VPLS...........................................................................................................................................1221
11.8.2 Principles............................................................................................................................................................. 1222
11.8.2.1 VPLS Introduction............................................................................................................................................1222
11.8.2.2 LDP VPLS........................................................................................................................................................ 1230
11.8.2.3 BGP AD VPLS................................................................................................................................................. 1231
11.8.2.4 VPLS PW Redundancy.....................................................................................................................................1236
11.8.3 Terms, Acronyms, and Abbreviations................................................................................................................. 1239
11.9 L2VPN Loop Detection.......................................................................................................................................... 1239
11.9.1 Overview..............................................................................................................................................................1240
11.9.2 Principles............................................................................................................................................................. 1240
11.9.2.1 Basic Concepts and Implementation Principles................................................................................................1240
11.9.3 Applications......................................................................................................................................................... 1242
11.9.3.1 Application of L2VPN Loop Detection When a CE Is Single-homed to a PE over Redundant Links............1242
11.9.3.2 Application of L2VPN Loop Detection When a Customer Network Is Dual-homed to a VPLS/VLL Network........ 1243
11.9.4 Terms, Acronyms, and Abbreviations................................................................................................................. 1244
11.10 IP RAN Virtual Cluster.........................................................................................................................................1244
11.10.1 Introduction to IP RAN Virtual Clusters........................................................................................................... 1244
11.10.2 Principles........................................................................................................................................................... 1252
11.10.2.1 Data Plane....................................................................................................................................................... 1252
11.10.2.2 Control Plane.................................................................................................................................................. 1254
11.10.2.3 Management Plane..........................................................................................................................................1259
11.10.2.4 Protection Switching.......................................................................................................................................1260
11.10.2.5 Graceful Restart.............................................................................................................................................. 1261
11.10.2.6 OAM............................................................................................................................................................... 1262
11.10.3 Application.........................................................................................................................................................1264
11.10.3.1 Application of IP RAN Virtual Clusters......................................................................................................... 1264
11.10.4 Terms, Acronyms, and Abbreviations............................................................................................................... 1267
12 QoS........................................................................................................................................... 1269
12.1 QoS Overview........................................................................................................................................................ 1269
12.1.1 Introduction to QoS............................................................................................................................................. 1269
12.1.1.1 Traditional Packets Transmission Application................................................................................................. 1270
12.1.1.2 New Applications Requirements...................................................................................................................... 1270
12.1.2 End-to-End QoS Model....................................................................................................................................... 1270
12.1.2.1 Best-Effort Service Model................................................................................................................................1271
12.1.2.2 Integrated Service Model..................................................................................................................................1271
12.1.2.3 Differentiated Service Model........................................................................................................................... 1271
12.1.3 Techniques Used for the QoS Application.......................................................................................................... 1276
12.1.3.1 Traffic Classification........................................................................................................................................ 1278
12.1.3.2 Traffic Policing and Shaping............................................................................................................................ 1278
12.1.3.3 Congestion Avoidance Configuration...............................................................................................................1279
12.1.3.4 RSVP................................................................................................................................................................ 1281
12.2 Traffic Policing and Traffic Shaping...................................................................................................................... 1281
12.2.1 Introduction......................................................................................................................................................... 1281
12.2.2 Principles............................................................................................................................................................. 1282
12.2.2.1 Basic Principles of Traffic Policing..................................................................................................................1282
12.2.2.2 Basic Principles of Traffic Shaping.................................................................................................................. 1284
12.2.3 Applications.........................................................................................................................................................1286
12.2.4 Terms, Acronyms, and Abbreviations................................................................................................................. 1287
12.3 Congestion Avoidance and Management............................................................................................................... 1288
12.3.1 Introduction......................................................................................................................................................... 1288
12.3.2 Principles............................................................................................................................................................. 1289
12.3.2.1 Basic Principles of Congestion Avoidance.......................................................................................................1289
12.3.2.2 Basic Principles of Congestion Management................................................................................................... 1293
12.3.3 Applications.........................................................................................................................................................1296
12.3.4 Terms, Acronyms, and Abbreviations................................................................................................................. 1296
12.4 Class-Based QoS.................................................................................................................................................... 1298
12.4.1 Introduction......................................................................................................................................................... 1298
12.4.2 Principles............................................................................................................................................................. 1299
12.4.2.1 Simple Traffic Classification............................................................................................................................ 1299
12.4.2.2 Complex Traffic Classification.........................................................................................................................1304
12.4.3 Applications.........................................................................................................................................................1307
12.4.4 Terms, Acronyms, and Abbreviations................................................................................................................. 1309
12.5 HQoS...................................................................................................................................................................... 1309
12.5.1 Introduction to HQoS.......................................................................................................................................... 1310
12.5.2 Principles............................................................................................................................................................. 1310
12.5.2.1 Related Concepts of HQoS...............................................................................................................................1310
12.5.2.2 Queue Scheduling Technology......................................................................................................................... 1311
12.5.2.3 HQoS Queue Scheduling..................................................................................................................................1313
12.5.3 HQoS Applications..............................................................................................................................................1314
13 Clock
13.1 Clock Synchronization
13.1.1 Introduction
13.1.2 Principles
13.1.2.1 Basic Concepts
13.1.2.2 Synchronization Mode and Issues of Concern
13.1.2.3 Networking Mode for Clock Synchronization
13.1.2.4 Typical Networking for Clock Synchronization
13.1.2.5 Clock Protection Switching
13.1.3 Applications
13.1.4 Terms, Acronyms, and Abbreviations
13.2 NTP
13.2.1 Introduction
13.2.2 Principle
13.2.2.1 Network Architecture
13.2.2.2 Operating Mode
13.2.2.3 Event Processing of NTP
13.2.2.4 Operating Principle
13.2.2.5 Security Mechanism
13.2.2.6 Dynamic and Static Associations of NTP
13.2.3 Terms and Acronyms
13.3 1588v2
13.3.1 Introduction to 1588v2
13.3.2 Principles
13.3.2.1 Basic Concepts
13.3.2.2 Principle of Synchronization
13.3.3 Application Environment
13.3.4 Terms and Abbreviations
13.4 1588 ACR
13.4.1 Introduction to 1588 ACR
13.4.2 Principles
13.4.2.1 Basic Mechanisms of 1588 ACR
13.4.2.2 Basic Principles of 1588 ACR
13.4.3 Applications
13.4.4 Terms and Abbreviations
13.5 1588 ATR
13.5.1 Introduction to 1588 ATR
13.5.2 Principles
13.5.2.1 Basic Mechanisms of 1588 ATR
13.5.2.2 Basic Principles of 1588 ATR
13.5.3 Applications
14 Security
14.1 MAC Address Limit
14.1.1 Introduction to MAC Address Limitation
14.1.2 Principles
14.1.2.1 Basic Principles of MAC Address Limit
14.1.2.2 Traffic Suppression Principle
14.1.3 MAC Address Limit Applications
14.1.4 Terms, Acronyms, and Abbreviations
14.2 DHCP Snooping
14.2.1 Introduction
14.2.2 Principles
14.2.2.1 Bogus DHCP Server Attack
14.2.2.2 Middleman Attack and IP/MAC Spoofing Attack
14.2.2.3 DoS Attack Launched by Changing the Value of the CHADDR Field
14.2.2.4 Format of the Option 82 Field
14.2.3 Applications
14.2.4 Terms, Acronyms, and Abbreviations
14.3 URPF
14.3.1 Introduction
14.3.2 Principles
14.3.2.1 Principles of URPF
14.3.3 Applications
14.3.4 Terms, Acronyms, and Abbreviations
14.4 Local Attack Defense
14.4.1 Introduction
14.4.2 Principle of Device Security
14.4.2.1 Management and Control Plane Protection
14.4.2.2 Attack Source Tracing
14.4.2.3 CP-CAR
14.4.2.4 Whitelist-based Application Layer Association
14.4.2.5 Alarm
14.4.3 Applications
14.4.3.1 Whitelist-based Application Layer Association
14.4.3.2 CP-CAR
14.4.4 Acronyms and Abbreviations
14.4.4.1 Abbreviations
14.5 Mirroring
14.5.1 Introduction to Mirroring
14.5.2 Principle
14.5.2.1 Principle of Local Mirroring
14.5.2.2 Application
14.6 Online Packet Head Capture
14.6.1 Introduction
14.6.2 Principles
14.6.3 Applications
14.6.4 Terms, Acronyms, and Abbreviations
14.7 MPAC
14.7.1 Introduction
14.7.2 Principles
14.7.3 Terms, Acronyms, and Abbreviations
14.8 Keychain
14.8.1 Introduction
14.8.2 Principles
14.8.2.1 Principles of Keychain
14.8.3 Applications
14.8.3.1 Non-TCP Applications of Keychain
14.8.3.2 TCP Applications of Keychain
14.8.4 Terms, Acronyms, and Abbreviations
14.9 IPSec
14.9.1 Introduction
14.9.2 Principles
14.9.2.1 IPSec Basic Concepts
14.9.2.2 IPSec Implementation
14.9.3 Applications
14.9.3.1 IPSec Application in PIM
14.9.4 Terms, Acronyms, and Abbreviations
15 User Management
15.1 AAA and User Management
15.1.1 Introduction to AAA and User Management
15.1.2 Principles
15.1.2.1 AAA
15.1.2.2 RADIUS
15.1.2.3 HWTACACS
15.1.2.4 User Management
15.1.3 Applications
15.1.3.1 RADIUS Authentication and Accounting
15.1.3.2 HWTACACS Authentication, Accounting, and Authorization
15.1.4 Acronyms and Abbreviations
15.2 DHCP
15.2.1 DHCP Overview
15.2.2 Principles
15.2.2.1 DHCP Overview
15.2.2.2 Introduction to DHCP Messages
15.2.2.3 Description of the Option 82 Field
15.2.2.4 Operation Principle of a DHCP Client
15.2.2.5 DHCP Relay Principles
15.2.2.6 Working Principles of a DHCP Server
15.2.3 Applications
15.2.3.1 DHCP Client Application
15.2.3.2 DHCP Server Application
15.2.3.3 DHCP Relay Application
15.2.4 Terms and Abbreviations
15.3 DHCPv6
15.3.1 Introduction
15.3.2 Principles
15.3.2.1 Principles of DHCPv6 Access
15.3.3 Applications
15.3.3.1 DHCPv6 Client over PPPoE (Including DHCPv6-PD)
15.3.3.2 DHCPv6 Relay
15.3.4 Terms, Acronyms, and Abbreviations
15.4 Plug-and-Play
15.4.1 Introduction to Plug-and-Play
15.4.2 Principles
15.4.2.1 Principle of DHCP
15.4.2.2 Operation Process of PnP
15.4.3 Applications
15.4.3.1 Application of PnP
15.4.4 Terms, Acronyms, and Abbreviations
15.5 DCN
15.5.1 Introduction
15.5.2 Principles
15.5.2.1 Basic Concepts
15.5.2.2 Basic DCN Principles
15.5.2.3 DCN over Service Interfaces
15.5.2.4 Gateway DCN over the Control Plane
15.5.2.5 DCN Security
15.5.3 Applications
15.5.3.1 Typical DCN Application
15.5.4 Terms and Abbreviations
15.6 PPPoE Access
15.6.1 Introduction to the PPPoE
15.6.2 Principles
15.6.2.1 PPPoE Negotiation Process for User Login
15.6.2.2 PPPoE Packet Format
15.6.2.3 PPPoE Packet Structure
15.6.3 Applications
15.6.4 Terms, Acronyms, and Abbreviations
15.7 PPPoE+
15.7.1 Introduction to PPPoE+
15.7.2 Principles
15.7.3 Applications
15.7.4 Acronyms and Abbreviations
15.8 802.1x Access
15.8.1 Introduction
15.8.2 Principle
15.8.2.1 Basic Principle of 802.1x Access
15.8.2.2 Authentication Initiation and User Logoff
15.8.2.3 EAP Packet Relaying and Termination
15.8.2.4 Basic Process of the 802.1x Authentication System
15.8.3 Applications
15.8.4 Terms, Acronyms, and Abbreviations
15.9 Attributes List of RADIUS, HWTACACS
15.9.1 HWTACACS Attribute
15.9.2 RADIUS Attributes
15.9.2.1 Attributes Carried in RADIUS Packets
15.9.2.1.1 Attributes in RADIUS Packets
15.9.2.1.2 Attributes in RADIUS COA&DM Packets
1 Basic Configurations
This document describes the basic configuration protocols and features in terms of their
overview, principles, and applications.
1.1.1 Overview
This section describes the primary functions and the evolution of the VRP.
NOS
A network operating system (NOS) is system software that provides network access and
interconnection services.
The primary functions of the NOS are as follows:
With the expansion of network scale and the rapid development of Internet technologies, an
efficient and stable NOS becomes key to guaranteeing network services and service quality.
VRP
Like an NOS, the Versatile Routing Platform (VRP) is the nerve center of Huawei products,
which range from low-end to core ATNs and from Ethernet switches to service gateways.
l Unified user interface and management interface: unified kernel of real-time operating
system, IP forwarding engine, IP routing, and configuration management plane.
l Control plane, interface criterion on the forwarding plane, interaction between link layer
of various products and the VRP control plane
l Shielding link layer discrepancy from the network layer through the network interface
layer
(Figure labels: VRP, FIB, packet forwarding engine (CPU/ASIC/NP-based), ASIC/NP, I/O cards)
1.1.2.3 SCP
(Figure labels: AAA, LocalM, CM)
1.1.2.4 DFP
(Figure labels: FE API, FEC, FE DRV)
1.1.2.5 GCP
(Figure labels: MRM4/6, Net Interface Subsystem, MPLS Subsystem)
1.1.2.6 SMP
1.1.2.7 SSP
(Figure labels: Kernel, Operating System)
l All the protocols and features are componentized and can be dynamically controlled
through the License.
l Core components are separated from the hardware platforms, and can provide better
adaptability and cross-platform application.
1.1.3.2 License
The License furnishes the VRP with high flexibility.
The License manages the following items:
l Features (GTL license): A user can access only those features allowed by the License.
l Resources (PAF license): For example, the License sets limits on the number of reserved
routes, on the number of LSPs, and on the number of VPN instances.
Generally, the product price increases along with the quantity of functions purchased.
Functions not required can be removed to save the customer money.
1.1.3.3 HA
High availability (HA) guarantees that the VRP is available 99.999% of the time. In other
words, the unavailable period is no more than about 5.3 minutes in a year.
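As a rough check of this figure, the yearly downtime budget for an availability target can be computed directly. The following is a minimal sketch; the function name and the 365-day year are assumptions made for illustration and are not part of the VRP.

```python
# Yearly downtime budget for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Return the maximum yearly downtime, in minutes, for an availability ratio."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a ratio between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Five nines (99.999%) leaves a budget of roughly 5.26 minutes per year.
    print(round(downtime_minutes_per_year(0.99999), 2))
```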
To enable HA, the VRP adopts the following mechanisms:
l System level hot standby (HSB)
l System level GR
l Protocol level GR
l Fast reroute (FRR)
l FTP client/server
l TFTP client
l SSH FTP (SFTP) client/server
The following describes the principles of every protocol feature according to the type,
including the following parts:
l FTP
l TFTP
l Telnet
l SSH
l User management
l Virtual file system
l Daylight saving time
l Timing restart
Purpose
The terminal service provides the access interface and HMIs for users to configure devices.
File transfer provides transmission control for system files and configuration files, and simple
remote management for the file system.
1.2.2 Principles
1.2.2.1 FTP
As a protocol in the TCP/IP protocol suite, the File Transfer Protocol (FTP), running at the
application layer, is used for transferring files between local and remote hosts over the
Internet. FTP, which is implemented based on the file system, has been widely used during
version upgrade, log downloading and configuration saving.
NOTE
FTP is insecure. Using SFTP is recommended.
(Figure: FTP server and FTP client connected over an IP network)
l FTP server: indicates that the ATN functions as an FTP server to which users can log in
to access files by running the FTP client program.
l FTP client: indicates that the ATN functions as an FTP client that can access files saved
on a remote server. After running the terminal emulation program or using the Telnet
program on a PC to set up a connection to the ATN, a user can set up a connection to a
remote FTP server by using the FTP commands and access files saved on the remote
server.
In addition to file transfer, FTP supports interactive access, format specifications, and
authentication control.
FTP provides common file operations to help users perform simple management over the file
system, and supports file transfer between hosts. Users can use a PC running the FTP
client program to upload files, download files, and access file directories on the ATN that
functions as an FTP server, or use the FTP client program on the ATN that functions as an
FTP client to transfer files to an FTP server.
l File type
– ASCII mode is used for text. Data is converted from the sender's character
representation to "8-bit ASCII" before transmission, and then to the receiver's
character representation on receipt.
– Extended Binary-Coded Decimal Interchange Code (EBCDIC) mode requires that
both ends use the EBCDIC character set.
– Binary mode requires that the sender sends each file byte for byte. This mode is
often used to transfer image files and program files.
– Local mode allows two hosts using different file systems to send files in binary bit
streams. The bit stream of each byte is defined by the sender.
NOTE
The ATN supports the ASCII and binary modes. Differences between these two modes are as
follows:
l In ASCII mode, text is converted during transfer, and each line ends with a carriage
return followed by a line feed.
l In binary mode, data is transferred byte for byte without format conversion.
The client can select an FTP transmission mode, but by default the ASCII mode is used. The client
can use a mode switch command to switch between the two modes.
l File structure
– Byte stream structure is also called the file structure. A file is considered as a
continuous byte stream.
– Record structure is used only for text files in either ASCII or EBCDIC mode.
– Page structure files are transferred page by page, with the pages numbered so that the
receiver can save them even if the pages arrive out of order.
NOTE
The ATN supports both the record structure and the byte stream structure.
l Transfer mode
– Stream mode
Data is sent as a continuous stream. For the file structure, the sender sends an End-
Of-File (EOF) indicator at the end of file transfer to prompt the receiver to close
the data connection. For the record structure, a special two-byte sequence is used to
indicate the end of the record and file.
– Block mode
FTP breaks a file into several blocks and each block starts with a block header.
– Compressed mode
FTP compresses runs of identical bytes that are sent consecutively.
NOTE
The ATN supports the stream mode.
l port command
The port command tells the server which client IP address and port to use for the data
connection. The command format is port a,b,c,d,e,f. a,b,c,d specifies the four bytes of
the IP address, in decimal notation; e,f, which consists of two decimal numbers, specifies
the port number calculated based on the formula of e x 256 + f. For example:
ftp> debug
Debugging On .
ftp> ls
---> PORT 10,164,9,96,5,28
Here, 10.164.9.96 is an IP address; the values 5 and 28 are used to calculate the port
number 1308 (5 x 256 + 28 = 1308).
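The calculation above can be sketched in a few lines. The helper names below are hypothetical and exist only for this illustration; the sketch encodes and decodes the six comma-separated fields that follow PORT.

```python
from typing import Tuple


def encode_port_argument(ip: str, port: int) -> str:
    """Build the a,b,c,d,e,f argument from an IPv4 address and a port number."""
    octets = ip.split(".")
    if len(octets) != 4 or not 0 <= port <= 65535:
        raise ValueError("expected an IPv4 address and a 16-bit port number")
    e, f = divmod(port, 256)  # port = e x 256 + f
    return ",".join(octets + [str(e), str(f)])


def decode_port_argument(argument: str) -> Tuple[str, int]:
    """Recover the IPv4 address and the port number from a,b,c,d,e,f."""
    fields = [int(value) for value in argument.split(",")]
    if len(fields) != 6 or any(not 0 <= value <= 255 for value in fields):
        raise ValueError("expected six decimal fields in the range 0-255")
    a, b, c, d, e, f = fields
    return f"{a}.{b}.{c}.{d}", e * 256 + f


if __name__ == "__main__":
    print(decode_port_argument("10,164,9,96,5,28"))
    print(encode_port_argument("10.164.9.96", 1308))
```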
FTP Connections
Figure 1-10 shows the process of file transfer through FTP.
(Figure 1-10: the User Protocol Interpreter on the client and the Server Protocol Interpreter
on the server are linked by the control connection)
l Control connection
A control connection is set up between the FTP client and the FTP server. The server
listens on well-known port 21 and waits for a connection request from the client; the
client opens a temporary port and then sends a request for setting up a connection to
port 21 on the server.
The control connection stays up for the whole session. It carries commands from the
client to the server and responses from the server to the client.
l Data connection
The server uses port 20 for data connections. Generally, the server can either open or
close a data connection actively. For files sent from the client to the server in the form of
streams, however, only the client can close a data connection.
FTP transfers each file in streams, using an EOF indicator to identify the end of a file.
Therefore, a new data connection is required for each file or directory list to be
transferred. A data connection exists only while a file or directory list is being
transferred between the client and the server.
FTP
In the current system, FTP manages the control connection by using the User Protocol
Interpreter (User-PI) and Server Protocol Interpreter (Server-PI) and transfers files by
using the User Data Transfer Process (User-DTP) and Server Data Transfer Process
(Server-DTP).
l FTP client
The FTP User Interface (UI) provides an interactive command line interface (CLI) for
users, which receives and interprets command lines input by users and offers help
information. After receiving a command on the UI, FTP triggers User-PI to convert the
command into a standard FTP command, and then manages the control connection to the
FTP server.
– After a login command is input, User-PI creates a control connection between the
client and the server.
– After a directory operation command is input, User-PI sends and receives control
data between the client and the server.
– After a file transfer command is input, User-PI enables User-DTP to transfer files
between the client and the server. User-DTP is responsible for creating a data
connection to the FTP server for data exchange. The data connection is temporarily
set up. That is, a data connection is set up when files or directory lists need to be
transferred and disconnected when the transfer process is complete or a
disconnection request is received.
l FTP server
Server-PI listens to FTP standard port 21 to wait for connection requests from the FTP
client. After receiving a login connection request from the FTP client, the FTP server
handles the request and sends a reply.
– After a login command is received, the login authentication process is triggered. If
the login authentication succeeds, a control connection to the FTP client is set up.
– After a file transfer command is received, Server-DTP and User-DTP are triggered to
create a data connection to transfer files.
Server-DTP supports both active and passive data connection requests. By default,
Server-DTP is in the active state.
When Server-DTP is transferring data, a user can forcibly disconnect the connection.
Upon receiving a disconnection request, Server-DTP stops transferring data and
disconnects the connection. Normally, a data connection is automatically disconnected
when file transfer is complete.
The process of setting up an FTP data connection by using active mode is as follows:
1. The server enables port 21 to wait for a connection request from the client.
2. The client sends a connection request to the server.
3. After the request is received, a control connection is set up between the temporary port
on the client and port 21 on the server.
4. The client sends a command for setting up a data connection to the server.
5. The client chooses a temporary port for the data connection and sends the port number
by using the port command to the server over the control connection.
6. The server sends a request to the client for setting up a data connection to the temporary
port on the client.
7. After the request is received by the client, the data connection between the temporary
port on the client and port 20 on the server is set up.
The process of setting up an FTP data connection by using passive mode is as follows:
1. The server enables port 21 to wait for a connection request from the client.
2. The client sends a connection request to the server.
3. After the request is received, a control connection is set up between the temporary port
on the client and port 21 on the server.
4. The client sends a command for setting up a data connection to the server.
5. The client sends a command string PASV to the server to request the port number.
6. The server chooses a temporary port for the data connection and sends the port number
to the client over the control connection.
7. The client sends a request to the server for setting up a data connection to the
temporary port on the server.
8. The data connection between the temporary port on the client and the temporary port for
the data connection on the server is set up.
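In passive mode the client has to extract the server's temporary port from the reply to PASV. The sketch below assumes the common reply form "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" defined for FTP; the source does not show the reply text, and the address used here is a documentation example, not a real server.

```python
import re
from typing import Tuple

# Six decimal fields in parentheses: four address bytes, then two port bytes.
_PASV_FIELDS = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Return the (address, port) the client should connect to for data."""
    if not reply.startswith("227"):
        raise ValueError("not a passive-mode reply")
    match = _PASV_FIELDS.search(reply)
    if match is None:
        raise ValueError("no address and port fields found in the reply")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


if __name__ == "__main__":
    # 9 x 256 + 42 = 2346
    print(parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,9,42)"))
```

After parsing the reply, the client opens the data connection from its own temporary port to the returned address and port.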
Figure 1-11 shows the process of setting up an FTP connection, assuming that the number of
the temporary port for the control connection is 2345 and the number of the temporary port
for the data connection is 2346.
1.2.2.2 TFTP
The Trivial File Transfer Protocol (TFTP) is a simple protocol for file transfer.
The TFTP client supports file upload and download by using TFTP. To keep the
implementation simple and the packets small, TFTP uses the User Datagram Protocol (UDP)
as its transport protocol.
Compared with FTP, TFTP does not require complicated interaction interfaces and
authentication control. Therefore, TFTP is applicable in a networking environment without
complicated interactions between the client and the server. For example, you can obtain the
memory image of the system through TFTP when the system is started up.
Currently, the ATN implements only the TFTP client, not the TFTP server. The TFTP client
can upload and download files, and only the binary transfer type is available.
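To illustrate how little a TFTP client has to send, the sketch below builds a read or write request as laid out in the TFTP specification (RFC 1350): a two-byte opcode, the file name, a zero byte, the mode string, and another zero byte. The "octet" mode corresponds to the binary transfer type. The function name and the file name are hypothetical, and this is not the ATN implementation.

```python
import struct

OPCODE_RRQ = 1  # read request (download)
OPCODE_WRQ = 2  # write request (upload)


def build_request(opcode: int, filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP RRQ or WRQ packet to be sent in a single UDP datagram."""
    if opcode not in (OPCODE_RRQ, OPCODE_WRQ):
        raise ValueError("opcode must be RRQ (1) or WRQ (2)")
    return (
        struct.pack("!H", opcode)          # opcode, network byte order
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"   # "octet" selects binary transfer
    )


if __name__ == "__main__":
    print(build_request(OPCODE_RRQ, "example.cfg"))
```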
NOTE
Telnet is insecure. Using STelnet is recommended.
l NVT ASCII
NVT ASCII is a 7-bit ASCII character set. Each 7-bit character is sent as an 8-bit byte,
with the high-order bit set to 0. The Internet protocol suite including FTP and the Simple
Mail Transfer Protocol (SMTP) uses NVT ASCII.
l IAC
Telnet uses in-band signaling in both directions. The byte 0xff is called the Interpret As
Command (IAC). The next byte is the command byte.
Commands and their meanings are listed as follows:
– SE: suboption end
– SB: suboption begin
– WILL: option negotiation
– WONT: option negotiation
– DO: option negotiation
– DONT: option negotiation
– IAC: data byte 255
– GA: go ahead (code 249)
l Telnet connection
A Telnet connection is a TCP connection used to transmit data with Telnet control
information.
l Telnet client/server mode
Telnet adopts the client/server mode. Figure 1-13 shows the schematic diagram of the
Telnet client/server mode.
(Figure 1-13 labels: user at a terminal, terminal driver, TCP/IP, kernel, TCP connection,
pseudo terminal driver, login shell)
Principle of Telnet
Telnet is designed to operate between any two hosts or terminals. The client operating system
maps to the NVT whatever type of terminal the user is using. The server then maps the NVT
to whatever terminal type the server supports. The types of clients and terminals are ignored.
Communication ends are simply assumed as being connected to the NVTs.
NOTE
Telnet adopts the symmetric mode. Theoretically, there must be an NVT at each of the two ends of a
Telnet connection.
The two ends of a Telnet connection send WILL, WONT, DO, or DONT requests for option
negotiation. The options to be negotiated include echo, character set of command change, and
line mode.
This section describes the operating principles of Telnet:
l Requests in a Telnet connection
Either end of a Telnet connection can initiate a request to the other end. Table 1-2 shows
different requests and their meanings.
NOTE
When the sender sends an "option disable" request, such as WONT and DONT, the receiver must
accept the request.
When the sender sends an "option enable" request, such as WILL and DO, the receiver can either
accept or reject the request.
l If the receiver accepts the request, the option is enabled immediately.
l If the receiver rejects the request, the option remains disabled, but the sender still
retains the basic NVT features.
l Option negotiation
Option negotiation requires three bytes:
the IAC byte, the byte for WILL, DO, WONT, or DONT, and the option ID.
The following example illustrates the process of option negotiation.
The server needs to enable the "remote flow control" option with the option ID 33, and
the client grants the request. The commands exchanged between the server and client are
as follows:
– On the server: <IAC,WILL,33>
– On the client: <IAC,DO,33>
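The three-byte exchange can be sketched as follows. The command codes are the standard Telnet values (IAC 255, DONT 254, DO 253, WONT 252, WILL 251); the function names and the idea of a "supported options" set are assumptions made for this illustration.

```python
from typing import Iterable

IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251


def negotiation(command: int, option: int) -> bytes:
    """Build the three-byte sequence <IAC, command, option ID>."""
    return bytes([IAC, command, option])


def answer(request: bytes, supported_options: Iterable[int]) -> bytes:
    """Answer an 'option enable' request (WILL or DO) by accepting or rejecting it."""
    iac, command, option = request  # raises ValueError unless exactly three bytes
    if iac != IAC or command not in (WILL, DO):
        raise ValueError("expected <IAC, WILL, option> or <IAC, DO, option>")
    accepted = option in set(supported_options)
    if command == WILL:
        reply = DO if accepted else DONT
    else:
        reply = WILL if accepted else WONT
    return negotiation(reply, option)


if __name__ == "__main__":
    offer = negotiation(WILL, 33)       # server: <IAC, WILL, 33>
    print(list(answer(offer, {33})))    # client grants the request: [255, 253, 33]
```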
l Suboption negotiation
Certain options require more information than the option ID. For example, if the sender
requires the receiver to specify the terminal type, the receiver must respond with an
ASCII string to specify the terminal type.
The format of the commands for suboption negotiation is as follows:
< IAC, SB, option code, contents of suboption, IAC, SE >
A complete process of suboption negotiation is as follows:
– The sender sends a DO or WILL command carrying an option ID to request that the
option be enabled.
– The receiver returns a WILL or DO command carrying the option ID to accept the
request.
After the preceding two steps, both ends agree to enable the option.
– One end of the connection starts suboption negotiation by sending a request
composed of the SB, suboption ID, and SE in sequence.
– The opposite end responds to the request for suboption negotiation by sending a
command composed of the SB, suboption ID, related negotiation information, and
SE in sequence.
– The receiver returns a DO or WILL command to accept the negotiation information
about the suboption.
If there are no additional suboptions to be negotiated, the negotiation ends.
NOTE
In the preceding process, the receiver is assumed to accept the request from the sender. In practice,
the receiver can reject requests from the sender at any time as required.
The following example illustrates the process of terminal type negotiation.
The client needs to enable the "terminal type" with the option ID 24. The server grants
the request and sends a request for querying the client terminal type. The client then
sends to the server another request carrying its terminal type "DELLPC". The
commands exchanged between the server and client are as follows:
– On the client: <IAC, WILL, 24>
– On the server: <IAC, DO, 24>
– On the server: <IAC, SB, 24, 1, IAC, SE>
– On the client: <IAC, SB, 24, 0, "D", "E", "L", "L", "P", "C", IAC, SE>
NOTE
l Only the sender that sends the DO command can request terminal type information.
l Only the sender that sends the WILL command can provide terminal type information.
Terminal type information cannot be sent automatically but only in request-response mode.
The terminal type is an NVT ASCII string of case insensitive characters.
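The request-response exchange for the terminal type can be sketched as byte sequences. The values 1 and 0 after the option ID are the SEND and IS codes of the terminal type option; the function names are hypothetical, and "DELLPC" is only the sample string from the byte sequence above.

```python
IAC, SB, SE = 255, 250, 240
TERMINAL_TYPE = 24
IS, SEND = 0, 1


def terminal_type_request() -> bytes:
    """Sent by the end that issued DO: <IAC, SB, 24, 1, IAC, SE>."""
    return bytes([IAC, SB, TERMINAL_TYPE, SEND, IAC, SE])


def terminal_type_response(name: str) -> bytes:
    """Sent by the end that issued WILL: <IAC, SB, 24, 0, name..., IAC, SE>."""
    return bytes([IAC, SB, TERMINAL_TYPE, IS]) + name.encode("ascii") + bytes([IAC, SE])


def parse_terminal_type(frame: bytes) -> str:
    """Extract the terminal type; it is case insensitive, so normalize to upper case."""
    head, tail = bytes([IAC, SB, TERMINAL_TYPE, IS]), bytes([IAC, SE])
    if len(frame) < len(head) + len(tail):
        raise ValueError("frame too short")
    if not (frame.startswith(head) and frame.endswith(tail)):
        raise ValueError("not a terminal type response")
    return frame[len(head):-len(tail)].decode("ascii").upper()


if __name__ == "__main__":
    print(list(terminal_type_request()))
    print(parse_terminal_type(terminal_type_response("dellpc")))
```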
l Operating modes
Telnet has the following operating modes:
– Half-duplex
– Character at a time
– Line at a time
– Line mode
(Figure labels: PC, ATN, CX600, Telnet server)
l Terminal redirection
As shown in Figure 1-15, a user runs the Telnet client application and logs in to the ATN
through a specified port, and then sets up connections with the devices connected to the
ATN through asynchronous serial interfaces. The typical application is that the devices
directly connected to the ATN through asynchronous serial interfaces are remotely
configured and maintained.
(Figure 1-15 labels: Ethernet, ATN)
NOTE
Only the ATNs having asynchronous serial interfaces support terminal redirection.
1.2.2.4 SSH
SSH is short for Secure Shell. Its standard port number is 22.
Data transmission in Telnet mode is prone to attack, because Telnet does not have a secure
authentication mode and uses TCP to transmit data in plain text. Simple Telnet access is also
vulnerable to Denial of Service (DoS) attacks, IP address spoofing, and route spoofing.
With the increasing emphasis on network security, data transmission in plain text used by
traditional Telnet and FTP is becoming unacceptable. SSH is a network security protocol that
provides secure remote access and other secure network services on an insecure network by
encrypting network data.
SSH uses TCP to exchange data and builds a secure channel based on TCP. In addition to
standard port 22, SSH supports access through other service ports to reduce exposure to attacks.
SSH supports password authentication, Elliptic Curve Cryptography (ECC), Digital
Signature Algorithm (DSA), and Rivest-Shamir-Adleman (RSA) authentication. It
uses Data Encryption Standard (DES), 3DES, and Advanced Encryption Standard (AES)
encryption to prevent password interception, ensure the integrity and reliability of the data,
and guarantee secure data transmission. In particular, ECC, RSA, and DSA authentication
supports the combined use of symmetric and asymmetric encryption. This implements secure
key exchange and finally secures the session process.
By virtue of data encryption in transmission and more secure authentication, SSH is widely
used and has become one of the more important network protocols.
SSH has two versions: SSH1 (SSH 1.5) and SSH2 (SSH 2.0). The two versions are
incompatible. SSH 2.0 is superior to SSH 1.5 in security, functions, and performance.
NOTE
SSH in this chapter refers to SSH2.0, unless otherwise specified.
Devices that can function as the STelnet client and server support both SSH1 (SSH 1.5) and
SSH2 (SSH 2.0). Devices that can function as the SFTP client and server support SSH2 (SSH
2.0).
Secure Telnet (STelnet) enables users to remotely and securely log in to the device, and
provides the interactive configuration interface. All data exchanges based on STelnet are
encrypted. This ensures the security of sessions.
The SSH File Transfer Protocol (SFTP) enables users to log in to the device securely for file
management from a remote device. This improves the security of data transmission for the
remote system update. Meanwhile, the client function provided by SFTP enables users to log
in to the remote device for secure file transmission.
The server checks whether the SSH user, public key, and digital user signature are valid.
If all of them are valid, the user is permitted to access the server; if any of them is
invalid, the authentication fails and the user is denied access.
l DSA authentication
The digital signature algorithm (DSA) is an asymmetric algorithm used for
authenticating clients. A DSA key pair consists of a public key and a private key.
As with RSA authentication, the server checks whether the SSH user, public key, and
digital user signature are valid. If all of them are valid, the user is permitted to access
the server; if any of them is invalid, the authentication fails and the user is denied access.
Compared with RSA authentication, DSA authentication uses the DSA algorithm and is
widely used.
– In many cases, SSH only supports DSA to authenticate the server and the client.
– In SSH, DSA authentication takes precedence over RSA authentication.
l ECC authentication
The differences between the ECC and RSA algorithms are as follows:
– The RSA algorithm is based on the difficulty of factoring large numbers, so its
security depends on long keys. Long keys slow down computation and complicate key
storage and management.
– The ECC algorithm is based on the elliptic curve discrete logarithm problem, which is
difficult to crack, so the algorithm is more secure at a given key length.
Compared with the RSA algorithm, the ECC algorithm has the following advantages:
– Provides the same security with a shorter key length.
– Features a shorter computing process and higher processing speed.
– Requires less storage space.
– Requires lower bandwidth.
l Password authentication
Password authentication is based on the user name and password.
On the server, the AAA module assigns a login password to each authorized user. The
server has the mappings between user names and passwords. When a user requests
access to the server, the server authenticates the user name and password. If either of them
fails to pass authentication, the access is denied.
l ECC-password authentication, RSA-password authentication and DSA-Password
authentication
The server can authenticate the client by checking both the public key and the password.
It allows user access only when both public key and password are consistent with those
configured on the server.
l ALL authentication
The server can authenticate the client by checking either the public key or the password.
It allows user access when either the public key or the password is consistent with those
configured on the server.
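The difference between the combined modes and the ALL mode comes down to "and" versus "or". The following is a minimal sketch of that decision; the mode labels and the function name are hypothetical and do not correspond to device commands, and "publickey" stands for any of the ECC, RSA, or DSA checks.

```python
def is_access_allowed(mode: str, public_key_ok: bool, password_ok: bool) -> bool:
    """Decide access from the two check results according to the authentication mode."""
    if mode == "publickey":           # ECC, RSA, or DSA authentication alone
        return public_key_ok
    if mode == "password":            # password authentication alone
        return password_ok
    if mode == "publickey-password":  # both checks must pass
        return public_key_ok and password_ok
    if mode == "all":                 # either check is sufficient
        return public_key_ok or password_ok
    raise ValueError(f"unknown authentication mode: {mode}")


if __name__ == "__main__":
    print(is_access_allowed("publickey-password", True, False))  # False
    print(is_access_allowed("all", True, False))                 # True
```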
(Figure labels: server, laptop, PC, Ethernet 100BASE-TX, WAN)
Principles of SSH
SSH uses the traditional client/server (C/S) application model. Its security is guaranteed by
using the following modes:
Data encryption: Through the negotiation between the client and the server, an encryption key
is generated and used in data symmetric encryption. This ensures confidentiality during data
transmission.
Data integrity: Through the negotiation between the client and the server, an integrity key is
generated and used to uniquely identify a session link. All session packets are identified by
the integrity key. Any modifications made by the third party during transmission can be
discovered by the receiver based on the integrity key. The receiver can discard these modified
packets to ensure the data integrity.
Authority authentication: There are multiple authentication modes. Authority authentication
allows only valid users to have a session with the server, improving system security and
safeguarding the benefits of valid users.
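The data integrity check described above can be illustrated with a keyed message authentication code. The sketch below assumes HMAC-SHA-256 and a 4-byte packet sequence number; the source does not name the algorithm, so both are assumptions, and the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

MAC_LENGTH = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA-256 (assumed algorithm)


def _mac(integrity_key: bytes, sequence_number: int, payload: bytes) -> bytes:
    """Compute the MAC over the sequence number and the payload."""
    message = sequence_number.to_bytes(4, "big") + payload
    return hmac.new(integrity_key, message, hashlib.sha256).digest()


def protect(integrity_key: bytes, sequence_number: int, payload: bytes) -> bytes:
    """Append the MAC to the payload before sending."""
    return payload + _mac(integrity_key, sequence_number, payload)


def verify(integrity_key: bytes, sequence_number: int, packet: bytes) -> Optional[bytes]:
    """Return the payload if the MAC matches; return None so the packet can be discarded."""
    if len(packet) < MAC_LENGTH:
        return None
    payload, received = packet[:-MAC_LENGTH], packet[-MAC_LENGTH:]
    expected = _mac(integrity_key, sequence_number, payload)
    return payload if hmac.compare_digest(received, expected) else None


if __name__ == "__main__":
    key = b"k" * 32
    packet = protect(key, 0, b"display version")
    print(verify(key, 0, packet))
    print(verify(key, 0, b"x" + packet[1:]))  # modified in transit: None
```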
(Figure: SSH session setup phases in order: version negotiation, algorithm negotiation, key
exchange, user authentication, session request, interactive session)
1. Version negotiation
In the version negotiation phase, the SSH client sends a request for setting up a TCP
connection to the SSH server. After the TCP connection is set up, the SSH server and
SSH client negotiate the SSH version. Each protocol version has its own state machine,
so the negotiated version determines the subsequent process. If the version
of the client matches that of the server, algorithm negotiation starts; otherwise, the SSH
server tears down the TCP connection.
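Version negotiation can be sketched as a comparison of the identification strings that both ends send after the TCP connection is set up. The sketch assumes the "SSH-protoversion-softwareversion" string form used by SSH and treats version 1.99 as support for both 1.5 and 2.0; the software names and function names are placeholders, not real products or device APIs.

```python
from typing import Optional, Set


def parse_protocol_version(identification: str) -> str:
    """Extract the protocol version from 'SSH-protoversion-softwareversion'."""
    parts = identification.rstrip("\r\n").split("-", 2)
    if len(parts) != 3 or parts[0] != "SSH" or not parts[2]:
        raise ValueError("malformed identification string")
    return parts[1]


def _supported(version: str) -> Set[str]:
    """Map an advertised version to the protocol versions it can speak."""
    if version == "1.99":  # assumed to mean both SSH 1.5 and SSH 2.0
        return {"1.5", "2.0"}
    return {version} if version in ("1.5", "2.0") else set()


def negotiate_version(client_id: str, server_id: str) -> Optional[str]:
    """Return the version to use, or None if the connection must be torn down."""
    common = _supported(parse_protocol_version(client_id)) & _supported(
        parse_protocol_version(server_id)
    )
    for version in ("2.0", "1.5"):  # prefer SSH 2.0 when both ends support it
        if version in common:
            return version
    return None


if __name__ == "__main__":
    print(negotiate_version("SSH-2.0-ExampleClient\r\n", "SSH-1.99-ExampleServer\r\n"))
    print(negotiate_version("SSH-1.5-ExampleClient\r\n", "SSH-2.0-ExampleServer\r\n"))
```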
2. Algorithm negotiation
In the algorithm negotiation phase, the sender sends algorithm negotiation messages to
the receiver, together with their parameters, such as the random cookie, key exchange
algorithm, host key algorithm, Message Authentication Code (MAC) method, and
supported language.
After receiving these algorithm negotiation messages, the receiver compares the received
algorithm list set with the local algorithm list set. If the key exchange algorithm, public
key encryption algorithm, or MAC algorithm is not found, the receiver tears down the
connection with the sender and the algorithm negotiation fails.
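The comparison of algorithm lists can be sketched as follows. The sketch uses the simple rule of picking the first algorithm in the client's list that the server also supports, and it ignores the extra conditions that apply to key exchange algorithms in SSH 2.0. The category names and the example lists are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional

CATEGORIES = ("kex", "host_key", "encryption", "mac")


def choose(client_list: List[str], server_list: List[str]) -> Optional[str]:
    """Return the first algorithm in the client's list that the server also supports."""
    for name in client_list:
        if name in server_list:
            return name
    return None


def negotiate(client: Dict[str, List[str]], server: Dict[str, List[str]]) -> Dict[str, str]:
    """Pick one algorithm per category; fail if any category has no common entry."""
    chosen = {}
    for category in CATEGORIES:
        name = choose(client[category], server[category])
        if name is None:
            # The receiver tears down the connection and negotiation fails.
            raise ConnectionError(f"no common {category} algorithm")
        chosen[category] = name
    return chosen


if __name__ == "__main__":
    client = {
        "kex": ["ecdh-sha2-nistp256", "diffie-hellman-group14-sha1"],
        "host_key": ["ecdsa-sha2-nistp256", "ssh-rsa"],
        "encryption": ["aes256-ctr", "aes128-ctr"],
        "mac": ["hmac-sha2-256", "hmac-sha1"],
    }
    server = {
        "kex": ["diffie-hellman-group14-sha1"],
        "host_key": ["ssh-rsa", "ssh-dss"],
        "encryption": ["aes128-ctr", "aes256-ctr"],
        "mac": ["hmac-sha1", "hmac-sha2-256"],
    }
    print(negotiate(client, server))
```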
3. Key exchange
After the server and client negotiate the version, the server sends the client a packet
containing the server's host public key, the server public key, the supported encryption
algorithm, the authentication algorithm, the protocol extension flag, and an 8-byte
cookie. This packet is sent in plain text. Then, the server and client calculate a 16-byte
session ID using the same parameters. The client also randomly generates a 32-byte
session key used to encrypt data. The client does not send the session key to the server
directly. Instead, it XORs the most-significant 16 bytes of the session key with the 16-byte
session ID. The client then arranges the result using the Most Significant Bit (MSB)
first rule and obtains a multiple precision (MP) integer. Then the client encrypts the MP
integer using the public key with the smaller modulus, arranges the result using the
MSB first rule again, and obtains a new value. Then the client uses the public key with the
larger modulus to encrypt the new value.
The server is now in the waiting state. When receiving a key generation message from
the client, the server then returns a key generation message to the client, which indicates
that key exchange is complete and that the new key should be used for communications.
If the server fails to receive a key generation message from the client, it returns a key
exchange failure message and tears down the connection.
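The XOR step in the key exchange can be sketched on its own. The sketch assumes that the remaining 16 bytes of the session key are carried unchanged, which the text does not state, and it leaves out the two public key encryptions. The function name is hypothetical.

```python
def mask_session_key(session_key: bytes, session_id: bytes) -> bytes:
    """XOR the most-significant 16 bytes of the session key with the session ID."""
    if len(session_key) != 32 or len(session_id) != 16:
        raise ValueError("expected a 32-byte session key and a 16-byte session ID")
    masked_head = bytes(k ^ s for k, s in zip(session_key[:16], session_id))
    return masked_head + session_key[16:]  # assumption: the low 16 bytes are unchanged


if __name__ == "__main__":
    import os

    session_id = os.urandom(16)
    session_key = os.urandom(32)
    masked = mask_session_key(session_key, session_id)
    # XOR is its own inverse, so the receiver recovers the key with the same operation.
    print(mask_session_key(masked, session_id) == session_key)
```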
4. User authentication
After obtaining the session key, the SSH server authenticates the SSH client. The SSH
client sends the identity information to the SSH server. After a specific authentication
mode is configured on the SSH server, the client sends an authentication request. The
client can keep sending authentication requests until the authentication succeeds or the
connection with the server expires and is terminated.
The SSH server authenticates a user in one of the following methods:
– In ECC, RSA, or DSA authentication, the client generates an ECC, RSA, or DSA key
pair and sends the public key to the server. When a user initiates an authentication
request, the client randomly generates a text, encrypts it with the private key, and
sends it to the server. The server decrypts it by using the public key. If decryption
succeeds, the server considers this user trustworthy and grants access rights. If
decryption fails, the server tears down the connection.
– Password authentication is implemented based on AAA. Like Telnet and FTP, SSH
supports local database authentication and remote RADIUS server authentication.
The SSH server compares the user name and password of an SSH client with the
pre-configured ones. If both are matched, authentication succeeds.
5. Session request
After user authentication is completed, the client sends a session request to the server.
The session requests include the running of Shell and commands. At the same time, the
server waits to process the request from the client. During this phase, the server responds
to the client with an SSH_SMSG_SUCCESS message after successfully processing a
request from the client. If the server fails to process or identify the request, it responds
with an SSH_SMSG_FAILURE message.
Possible causes for an SSH_SMSG_FAILURE response are as follows:
– The server fails to process the request.
– The server cannot identify the request.
6. Interactive session
After the session request is accepted, the SSH connection enters the interactive session
mode. In this phase, data is transmitted bidirectionally.
a. The client sends a packet with the encrypted command to the server.
b. After receiving the packet, the server decrypts the packet and runs the command.
Then, the server packages the encrypted command execution results and sends the
packet to the client.
c. Upon receiving the packet, the client decrypts it and displays the command
execution results on the terminal.
User management, consisting of user interface configurations, user view configurations, and
terminal services, provides users' secure login and operations, thus implementing unified
management over different user interfaces.
User Interface
A User Interface (UI), which is presented in the form of a user interface view, enables users to
log in to the device. Through a user interface, you can configure the parameters on all
physical and logical interfaces that work in asynchronous and interactive modes. In this
manner, you can manage, authenticate, and authorize the login users.
0 CON0
NOTE
In the previous examples, the numbers ranging from 1 to 32 are reserved for VTYs. TTY is a
synchronous or asynchronous terminal line, which is related to specific physical devices.
Currently, the commands for viewing absolute numbering and relative numbering have been
provided.
User Login
In the absence of user authentication, any user can configure the device after the PC is
connected to the device through the console port.
Thus, the device and network are vulnerable to attacks. In this case, users should be created
for the device and passwords should be set for users so that the device can manage users. SSH
users are configured with RSA authentication and other users are configured with AAA. For
more information, refer to the AAA Feature Description.
User Classification
The users of the device can be classified into the following types based on the types of
services that users enjoy.
l HyperTerminal users: indicate the users who log in to the device through the console
port.
l Telnet users: indicate the users who log in to the device through Telnet.
l FTP users: indicate the users who transfer files by setting up the FTP connection with the
device.
l SSH users: indicate the users who perform the remote access to the network by setting
up the SSH connection with the device, including the STelnet mode and the SFTP mode.
l NMS users: indicate the users who set up the connection with the device through SNMP
or Telnet to manage devices in machine-to-machine mode.
One user can obtain multiple services simultaneously to perform multiple functions. VTY
users, namely, Telnet or SSH users, need to be bound to admission protocols in the user
interface view before they log in.
User Priorities
The system supports hierarchical management over HyperTerminal users and VTY users.
Command levels are increased from 4 to 16. Similar to command levels, users are classified
into 16 levels numbered 0 to 15. The greater the number, the higher the user level. The level
of the command that a user can run is determined by the level of this user.
l In the case of password authentication, the level of the command that the user can run
depends on the level of the user interface.
l In the case of AAA authentication, the command the user can run depends on the level of
the local user specified in AAA configuration.
A user can run the commands whose levels are equal to or lower than the user level. For
example, the level 2 user can access the commands at levels 0, 1, and 2. The level 3 user can
access the commands at levels 0, 1, 2, and 3.
NOTE
The one-to-one mapping exists between user levels and command lines.
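The level rule described above can be sketched as follows. This is a minimal illustration only: the function names can_run and accessible_levels are hypothetical and are not device commands or APIs.

```python
MAX_LEVEL = 15  # user and command levels are numbered 0 to 15


def can_run(user_level: int, command_level: int) -> bool:
    """Return True if a user at user_level may run a command at command_level."""
    for level in (user_level, command_level):
        if not 0 <= level <= MAX_LEVEL:
            raise ValueError(f"level must be in 0..{MAX_LEVEL}, got {level}")
    # A user can run commands whose level is equal to or lower than the user level.
    return command_level <= user_level


def accessible_levels(user_level: int) -> list:
    """List every command level that a user at user_level can access."""
    return [level for level in range(MAX_LEVEL + 1) if can_run(user_level, level)]


print(accessible_levels(2))  # [0, 1, 2]
print(accessible_levels(3))  # [0, 1, 2, 3]
```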
User Authentication
After users are configured, the system authenticates the users when they log in to the device.
l Password authentication: In this mode, users can log in to the device by entering the
password rather than the username. This mode is configured based on the terminal line.
A password can be configured for a terminal line or a group of terminal lines.
l AAA authentication: It includes AAA local authentication and AAA remote
authentication. In AAA local authentication, users need to enter both the username and
password on the local device. If necessary, users also need to enter user attributes, such as
user rights and FTP paths of users. In AAA remote authentication, user information needs
to be configured on the AAA server. In general, AAA server authentication is used for
VTY users; AAA local authentication or non-authentication is used for console users.
For more information, refer to the AAA Feature Description.
Planning Users
The network administrator can plan the users of the device as required.
Basic Concepts
l Storage device: a hardware device used to store data
l File: a mechanism used for the system to store and manage information
l Directory: a mechanism used by the system to integrate and organize files and to provide
a logical container of files
Managing Files
You can perform the following operations for files:
Miscellaneous
l Executing batch files
A batch file is created and executed to automate several tasks. Batch files must be edited
on the client and then uploaded to the device.
l Configuring the prompt mode of the file system
If data is lost or damaged during file management, the system should provide prompts as
to corrective steps.
NOTICE
If the prompt mode is set to quiet, the system does not provide prompts when data is lost
because of user misoperations such as accidentally deleting files. Therefore, the quiet
mode should be used with caution.
NOTE
If a great amount of command output is to be displayed, the device takes a long period of time to output
all information. Wait a while to obtain the desired information.
l After: The lines containing user-specified contents and the subsequent lines are
displayed.
l Before + After or After + Before: The lines containing user-specified contents and the
preceding and subsequent lines are displayed.
Generally, all display commands need to support the pipe character. The display commands
that meet the following requirements, however, do not necessarily support the pipe character:
l Commands whose output information is stable and can be displayed on the current screen.
l Commands whose output information does not vary with configurations, dynamic data,
and specifications.
In high latitude areas, the sun rises earlier in summer than in the winter. To reduce evening
usage of incandescent lighting and save energy, clocks are adjusted forward one hour in the
spring. At present, about 110 countries around the world adopt DST.
Users can customize the DST zone according to their countries' or regions' convention. Users
can set when and how clocks are adjusted forward, usually an hour. With DST enabled, the
system time is adjusted accordingly; when it is time to end DST, the system time
automatically returns to normal.
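The DST adjustment described above can be sketched as follows. This is a simplified illustration that assumes the DST window is expressed in standard time and that the offset is one hour; the function name displayed_time and the window dates are hypothetical.

```python
from datetime import datetime, timedelta


def displayed_time(standard_time, dst_start, dst_end, offset=timedelta(hours=1)):
    """Return the system time: standard time plus the DST offset inside the DST window."""
    if dst_start <= standard_time < dst_end:
        return standard_time + offset  # clocks are adjusted forward during DST
    return standard_time  # outside the window, the time returns to normal


# Hypothetical DST window, for illustration only.
DST_START = datetime(2018, 3, 25, 2, 0)
DST_END = datetime(2018, 10, 28, 2, 0)

print(displayed_time(datetime(2018, 6, 1, 12, 0), DST_START, DST_END))   # 2018-06-01 13:00:00
print(displayed_time(datetime(2018, 12, 1, 12, 0), DST_START, DST_END))  # 2018-12-01 12:00:00
```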
1.2.3 Applications
[Figure: networking diagram showing a server (172.16.105.110/24) and the ATN (172.16.105.111/24), a second server (172.16.104.110/24) reached over an IP network, and a console cable connection]
A user can use TFTP to upload or download files to or from the server in a simple interaction
environment. Currently, the device acts only as a TFTP client.
Figure 1-21 shows the networking of downloading or uploading files through TFTP.
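[Figure 1-21: networking for downloading or uploading files through TFTP, showing the server, the ATN (TFTP client), and the PC; address shown: 10.111.16.160/24]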
As shown in Figure 1-22, the user on ATN logs in to the remote CX through Telnet.
[Figure 1-22: the ATN logging in to the remote CX600 through Telnet]
– A device can function as the STelnet server. Alternatively, it can function as the
STelnet client to access other STelnet servers.
– STelnet services can be enabled or disabled as required, and they must be configured
globally. By default, STelnet services are disabled.
l SSH for SFTP
SFTP is based on SSH2.0, which supports the following authentication modes: password
authentication, RSA authentication, DSA authentication and ECC authentication. To
access the server using a client, an authorized user needs to enter the correct user name,
password, and private key to pass the authentication on the server. After that, the user
can use SFTP that is similar to FTP to manage remote file transfer on the network. The
system uses the negotiated session key to encrypt the user's data.
– A device can function as the SFTP server. Alternatively, it can function as the SFTP
client to access other SFTP servers.
– SFTP services can be enabled or disabled as required, and they must be configured
globally. By default, SFTP services are disabled.
– Different users are allowed to use SFTP to access different file directories. Users
can access only the set SFTP directories. Available files for different users are
isolated from each other.
[Figures: SFTP and SSH server access scenarios in which legal users and attackers connect as SFTP or SSH clients across the network; labels shown include setting port, VPN, and ACL]
Terms
Terms Description
FTP In the TCP/IP protocol suite, the File Transfer Protocol (FTP) is applied
to the application layer. It is used to transfer files between local and
remote hosts. FTP is implemented based on the file system.
SSH Secure Shell (SSH) uses multiple encryption and authentication modes
to solve the problem of data encryption and user authentication in
traditional services. In virtue of its mature public key or private key
system, SSH provides an encryption channel between the client and the
server. This solves the problem of insecurity caused when data, such as
passwords, are transmitted over the network in plain text. SSH also
supports multiple authentication modes, such as CA and the smart card,
which solves the authentication problem and eliminates such insecurity
factors as the man-in-the-middle attack.
TLS TLS is a protocol based on Netscape's SSL 3.0 protocol. TLS addresses the
weaknesses of SSL, which was vulnerable to man-in-the-middle attacks and
used a weak MAC construction. The successors of SSL are TLS 1.0 and
TLS 1.1, which are defined by the IETF. HTTPS, LDAP and SNMP are some of
the protocols that continue to use SSL.
Abbreviations
Abbreviations Full Name
2 System Management
This document describes the system management feature in terms of the overview, principle,
and applications.
2.1.1 Introduction
Definition
The information center functions as an information hub and is essential to the operation of a
device. It manages most output information and supports information classification to achieve
effective filtering. Together with debugging commands and the SNMP module, the
information center provides powerful support for network administrators to monitor the
device operation and locate network faults.
1. Receives logs, traps, and debugging information (with different severities) sent from
different modules.
NOTE
The logs, traps, and debugging information are stored in the log, trap, and debugging queues of the
information center. Each queue supports a maximum of 30,000 messages.
2. Distributes the information to different information channels according to user settings.
3. Outputs the information in different directions based on the mappings between the
information channels and directions.
The following table lists the main functions of the information center.
Information classification: The information center classifies information into three types: log, trap, and
debugging information.
Information output: The information center can output information to a log file, console, virtual
type (VTY) terminal, true type (TTY) terminal, log host, SNMP agent, log
buffer, or trap buffer.
In addition, the information center can output SSL-encrypted syslog
packets.
Information shield: You can use commands to shield output information based on severities or
modules.
Purpose
The information center outputs information in a unified format to different directions,
improving information readability, maintainability, and flexibility from the following aspects:
1. Controls the output direction, that is, where information is to be output. Currently,
information can be output to a log file, console, VTY/TTY terminal, log host, SNMP
agent, log buffer, or trap buffer.
2. Filters information based on the information source, severity, type, and output direction.
3. Provides a system-level information output platform.
4. Displays system-level debugging information.
2.1.2 Principles
Logs
l Log overview
According to the ITU-T, logs are records of events and unexpected activities of managed
objects. The log module helps view user operations and manage system security,
providing basis for system diagnosis and maintenance. Therefore, logs are important for
O&M and fault locating.
l Log implementation on devices
The information center is enabled by default. It can output logs to a specified destination
as required.
For example, you can configure the information center to output logs to a specified log
host. Currently, a maximum of eight log hosts can be specified on the device. This feature
allows logs to be simultaneously sent to different log hosts for backup.
The information center can send logs to the console and log buffer by default. If the log
quantity in the log buffer reaches the upper limit, the logs that are stored earliest will be
replaced by new ones.
l Diagnostic log
Diagnostic logs are used for fault locating and are not intended for users. Therefore,
users are not informed of these logs.
The information center still uses the original user log management system to process
diagnostic logs. With this system, you can view user logs rather than diagnostic logs. As
diagnostic log files are encrypted after being generated, specific diagnosis information in
the files cannot be obtained.
By default, diagnostic logs are output to diagnostic log files.
l Security log
The following types of security logs are available:
– Account management security logs: record account operation information, such as
user accounts, IP addresses, login and logout time, and operation time, contents, and
results.
– Protocol security logs: record insecure protocol interactions or algorithms.
– Attack defense security logs: record attack event information, such as the event
occurrence time, attack locations and sources, IP attack types, and attack impacts.
– Status security logs: record software and hardware abnormalities, real-time data of
key performance indicators as well as bandwidth, entry, and storage resources, and
process/branch abnormalities.
l Log output format
Syslog is a sub-function of the information center. Syslog uses UDP to output logs to log
hosts through port 514.
Figure 2-1 shows the log output format.
<Int_16> Leading character Before logs are output to log hosts, leading
characters are added to the logs. However, logs
saved on the local device do not have leading
characters.
AAA Module name Indicates the name of the module that outputs
information to the information center.
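The log output to a log host described above (syslog over UDP port 514, with a leading character added before the log body) can be sketched as follows. This is only a sketch under assumptions: it assumes the leading character follows the common syslog priority convention of facility x 8 + severity in angle brackets, and the names build_packet and send_log are hypothetical.

```python
import socket

SYSLOG_PORT = 514  # UDP port named in the text


def build_packet(facility: int, severity: int, body: str) -> bytes:
    """Prefix a log body with a leading priority field.

    Assumption: the leading character is <facility * 8 + severity>,
    as in the common syslog convention. The text does not define it.
    """
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in 0..23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    return f"<{facility * 8 + severity}>{body}".encode("utf-8")


def send_log(packet: bytes, host: str, port: int = SYSLOG_PORT) -> None:
    """Send one log packet to a log host over UDP (no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


print(build_packet(23, 6, "example log body"))  # b'<190>example log body'
```

Logs saved on the local device would skip build_packet, because the text states that locally saved logs do not carry leading characters.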
Traps
l Trap overview
Traps are notifications generated when the system detects faults. Information about the
faults is carried in traps. Different from logs, traps are time-sensitive and need to be
notified to users in time. Therefore, the information center processes traps sent to the
NMS in a different way.
Traps are sent from a device to an NMS. With the SNMP agent enabled on a device, the
trap function enabled on the associated module, and the NMS host to which traps are
sent configured, when an event occurs (for example, a network interface goes Down), the
device generates a trap and sends it to the specified destination address. If the route
between the device and the NMS is reachable, the NMS can receive the trap.
The device has a trap buffer for storing traps. If the device is specified as an information
source on the information center, the buffer can store traps generated by the local device
regardless of whether a destination NMS host is configured.
l Trap-related concepts
– Event: indicates anything that takes place on the managed object. For example, the
object is added, deleted, or modified.
– Fault: indicates a situation where the system does not function properly. A fault
may cause the system to fail to operate or implement redundancy.
– Trap: indicates a notification generated when the system detects a fault.
l Trap output format
ModuleName Module name Indicates the name of the module that generates a
trap.
Debugging Information
Debugging information records a device's internal running status. A device can generate
debugging information only after the debugging function of the associated module is enabled
in the user view. Debugging information contains the contents of packets sent or received by
the debugged module. Note that enabling debugging only generates debugging information.
Displaying debugging information requires additional configuration. Different from logs and
traps, no buffer is available for debugging information. The information center can be
configured to output debugging information to the console or log hosts.
You can connect the PC to the console port of a device (called console mode) or to a network
interface of a device through Telnet (called terminal mode). When debugging the device in
console or terminal mode, you can determine the debugging information to be output.
Various debugging commands are provided for debugging protocols and functions that a
device supports. You can enable the debugging of a protocol or function for fault diagnosis.
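[Figure: debugging information output. Debugging information 1, 2, and 3 passes the protocol debugging switches (ON, OFF, ON), so only 1 and 3 continue; the terminal screen display switch (OFF or ON) then determines whether 1 and 3 are displayed]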
Overview
If a large amount of information is available, users may find it hard to differentiate between
information about normal operations and information about faults. Therefore, an information
hierarchy is designed to help users roughly determine whether to take an action immediately
or shield the information that does not require any user action.
Information Severities
The information center defines eight information severities. A smaller severity value indicates
a higher severity. Table 2-4 describes the severities.
0 Emergencies A fatal fault occurs in the device, which causes the system to
fail to function properly unless the device is restarted. For
example, a program abnormality leads to a device restart or a
memory error is detected.
2 Critical A critical fault occurs in the device, which requires that actions
be taken to analyze or process it. For example, the memory
usage or temperature falls below the lower limit, Bidirectional
Forwarding Detection (BFD) detects that the device is
unreachable, or the device is generating error messages.
The severity of output information can be modified. If you filter output information based on
a specified severity, only the information with a severity value less than or equal to the
specified value is output. That is, only the information with the specified severity or higher is
output.
For example, if the severity value is set to 6, the information with a severity value ranging
from 0 to 6 is output.
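The severity filtering rule above can be sketched as follows. The function names and the sample messages are hypothetical and serve only to illustrate the comparison.

```python
def is_output(severity_value: int, threshold: int) -> bool:
    """A smaller value means a higher severity, so values <= threshold are output."""
    return severity_value <= threshold


def filter_by_severity(messages, threshold):
    """Keep only (severity_value, text) pairs that pass the severity filter."""
    return [msg for msg in messages if is_output(msg[0], threshold)]


# Hypothetical sample messages as (severity value, text) pairs.
sample = [(0, "fatal fault"), (2, "critical fault"), (6, "routine event"), (7, "debug detail")]
print(filter_by_severity(sample, 6))
# [(0, 'fatal fault'), (2, 'critical fault'), (6, 'routine event')]
```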
4 logbuffer Log buffer Outputs logs to the log buffer of a device. An area
inside the device is specified as the log buffer to
record logs.
As each output direction is associated with an information channel, information can be output
to a specified direction through the associated channel.
You can change information channel names or the mappings between the channels and output
directions as needed.
Information Output
Terminals connected to the device dynamically change. The information center needs to know
the latest change in time to determine whether to output information to terminals and in which
format information is output. Every time an EXEC user logs in, logs out, or has its attributes
changed, the information center is notified of the event through the EXEC module so that
information can be correctly output.
If information is output to a log file, a log file in .zip format is generated. When available
storage space is smaller than the specified threshold, the information center deletes the earliest
log file.
An information shield table helps filter information based on the type, severity, and source
and output information in multiple directions. Multiple information shield tables can be
created in the information center. Each information shield table maps one or multiple output
directions. Shielded information can be unshielded as required.
As shown in Figure 2-4, by default, logs, traps, and debugging information are output
through default channels. You can also specify a channel to output information. For example,
you can configure logs to be output to the log buffer through channel 6. In this way, all the
logs will be output through the specified channel (channel 6) rather than the default channel
(channel 4).
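The channel remapping in this example can be sketched as follows. The default channel numbers are read from Figure 2-4; the class and method names are hypothetical, and the assumption that channels are numbered 0 to 9 is based on the figure only.

```python
# Default channel per output direction, as shown in Figure 2-4.
DEFAULT_CHANNELS = {
    "loghost": 2,
    "trapbuffer": 3,
    "logbuffer": 4,
    "snmpagent": 5,
    "logfile": 9,
}
VALID_CHANNELS = range(0, 10)  # assumption: channels are numbered 0 to 9


class ChannelMap:
    """Tracks which information channel feeds each output direction."""

    def __init__(self):
        self._channels = dict(DEFAULT_CHANNELS)

    def set_channel(self, direction: str, channel: int) -> None:
        if direction not in self._channels:
            raise KeyError(f"unknown output direction: {direction}")
        if channel not in VALID_CHANNELS:
            raise ValueError(f"channel must be in 0..9, got {channel}")
        self._channels[direction] = channel

    def channel_for(self, direction: str) -> int:
        return self._channels[direction]


mapping = ChannelMap()
print(mapping.channel_for("logbuffer"))  # 4 (default channel)
mapping.set_channel("logbuffer", 6)
print(mapping.channel_for("logbuffer"))  # 6 (specified channel)
```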
[Figure 2-4: default information channels and output directions for logs, alarms (traps), and debugging information. Channels shown: 2 loghost (log host), 3 trapbuffer (trap buffer), 4 logbuffer (log buffer), 5 SNMP agent, 6 channel6, 7 channel7, 8 channel8, 9 channel9 (log file)]
The information center monitors the traffic of logs with different IDs. If the traffic of logs
with a specific ID exceeds the threshold during a monitoring period, the information center
processes only the conforming traffic and discards the non-conforming traffic. If the traffic of
logs with a specific ID falls below the threshold and remains below the threshold within five
monitoring periods, the suppression is removed.
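The suppression behavior above can be sketched per log ID as follows. This is a simplified model under assumptions: traffic is counted once per monitoring period, traffic equal to the threshold is treated as conforming, and the class name LogSuppressor and its methods are hypothetical.

```python
class LogSuppressor:
    """Per-ID log suppression evaluated once per monitoring period."""

    def __init__(self, threshold: int, recovery_periods: int = 5):
        self.threshold = threshold
        self.recovery_periods = recovery_periods
        # log ID -> consecutive periods at or below the threshold while suppressed
        self._quiet_periods = {}

    def is_suppressed(self, log_id: int) -> bool:
        return log_id in self._quiet_periods

    def close_period(self, log_id: int, generated: int) -> int:
        """Return how many logs with this ID are processed in the period."""
        if generated > self.threshold:
            # Only the conforming traffic is processed; the rest is discarded.
            self._quiet_periods[log_id] = 0
            return self.threshold
        if log_id in self._quiet_periods:
            self._quiet_periods[log_id] += 1
            if self._quiet_periods[log_id] >= self.recovery_periods:
                del self._quiet_periods[log_id]  # suppression is removed
        return generated


suppressor = LogSuppressor(threshold=10)
print(suppressor.close_period(1077497885, 25))  # 10
print(suppressor.is_suppressed(1077497885))     # True
```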
Common user logs are saved in text rather than binary format. Although they use a large log space, they
can be accessed and viewed at any time, independent of device types and versions.
A diagnostic log consists of two parts: static template data and dynamic log data. The two
parts are associated using a log ID that uniquely identifies a log sent by a log module with a
specific severity. Figure 2-5 shows how static template data and dynamic log data are
combined into a complete log.
Figure 2-5 Combining static template data and dynamic log data into a complete log
Static template data:
<Module name="INFO">
<LOG ID="1079398422" LEVEL="6" ALIAS="SUPPRESS_DIAGLOG">
<Lang name="en-US" value="Last diagnostic message repeated [ULONG] times.([STRING])" />
</LOG>
</Module>
+
Dynamic log data (saved in binary format):
"1079398422" + "Mar 31 2012 04:29:05.230.1-01:00" + "1" + "InfoID=1077497885, ModuleName=SHELL, InfoAlias=CMDTIMEOUT"
=
Complete log:
Mar 31 2012 04:29:05.230.1-01:00 huawei %%01INFO/6/SUPPRESS_DIAGLOG(D)[1464]:Last diagnostic message repeated 1
times.(InfoID=1077497885, ModuleName=SHELL, InfoAlias=CMDTIMEOUT)
l Static template data: contains the log ID and fixed log contents, such as diagnostic log
information about all modules on the device. This type of data is saved in .xml format
and can be configured on the device, enhancing log availability and extensibility.
l Dynamic log data: contains the log ID and variable log contents, such as the time and
dynamic parameters. This type of data is saved in binary format and generated based on
operations, events, or alarms in the system.
You can view a generated diagnostic log file using either of the following methods:
l Run commands directly on the device to view log information. After command
execution, the system fills dynamic data in a template based on the log ID and displays
complete log information.
l Use a log parsing tool to parse the static template data and dynamic log data. The parsing
tool is an .exe file. It fills the dynamic data in the static template based on the log ID and
displays complete log information.
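Filling dynamic data into a static template based on the log ID can be sketched as follows. The template string and values are taken from Figure 2-5; the function name render and the in-memory TEMPLATES table are hypothetical, and the real storage formats (.xml and binary) are not modeled.

```python
import re

# Static template data keyed by log ID (value taken from Figure 2-5).
TEMPLATES = {
    1079398422: "Last diagnostic message repeated [ULONG] times.([STRING])",
}

PLACEHOLDER = re.compile(r"\[(ULONG|STRING)\]")


def render(log_id: int, params) -> str:
    """Fill the dynamic parameters into the template selected by log_id."""
    template = TEMPLATES[log_id]
    values = iter(params)

    def substitute(match):
        value = next(values)  # parameters are consumed in template order
        if match.group(1) == "ULONG":
            return str(int(value))
        return str(value)

    return PLACEHOLDER.sub(substitute, template)


print(render(1079398422, [1, "InfoID=1077497885, ModuleName=SHELL, InfoAlias=CMDTIMEOUT"]))
# Last diagnostic message repeated 1 times.(InfoID=1077497885, ModuleName=SHELL, InfoAlias=CMDTIMEOUT)
```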
Terms
Term Definition
2.2 SNMP
Terms
Terms Explanation
BER Basic Encoding Rules (BER) are part of the ASN.1 syntax structure and
describe how data is represented during transmission.
Abbreviation
Abbreviation Full Spelling
2.2.2 Introduction
Definition
The Simple Network Management Protocol (SNMP) is used to manage TCP/IP networks. It
uses a central computer (a network management station) that runs network management
software. SNMP has the following characteristics:
l Simplicity: SNMP applies to small-scale networks requiring high speed and low costs
because it uses a polling mechanism and provides basic functions. SNMP uses UDP
packets, and is therefore supported by most devices.
l Ease of use: SNMP ensures the transmission of management information between any
two devices on the network, thereby allowing the network administrator to query
information, modify parameters, and locate faults on any device.
Purpose
As networks rapidly develop and applications become more diversified, network management
becomes difficult due to the following factors:
l The number of network devices is dramatically increasing, which increases the network
administrator's workload. In addition, networks' coverage areas are constantly being
expanded, making real-time monitoring and fault location of network devices difficult.
l The network supports a variety of devices from different vendors. Each vendor has a set
of management interfaces (such as command line interfaces), which complicates network
management.
SNMP has been developed to simplify the management of large numbers of network devices.
SNMP uses the network management system (NMS) to manage these network devices in
batches, which greatly improves management efficiency. SNMP can manage various network
devices from different vendors, regardless of the differences between these devices.
Along with hardware and software, SNMP monitors, configures, analyzes, estimates, and
controls network resources, ensuring a higher quality of service and better operating
performance at a lower cost.
Version Evolution
In May 1990, RFC 1157 was developed to define the first SNMP version: SNMPv1. RFC
1157 provides a systematic method for monitoring and managing the network. SNMPv1
cannot ensure the security of the network because it is based on community-name
authentication, and only a few error codes are returned.
Later, the Internet Engineering Task Force (IETF) released SNMPv2p. For network security,
SNMPv2p introduced the concept of a "participant". This concept, however, was not widely
adopted because of problems encountered in practice. SNMPv2p was then replaced by
SNMPv2c. SNMPv2c drops the "participant" concept. It still uses the community-name
authentication of SNMPv1 but adds the get-bulk operation and provides more error
codes.
Because SNMPv2c did not provide a high level of security, the IETF released SNMPv3.
SNMPv3 provides encrypted authentication based on the User-based Security Model (USM)
and access control based on the View-based Access Control Model (VACM).
Benefits
l Improves the work efficiency of the network administrator. The network administrator
can use SNMP to query information, modify information, and locate faults on any
device.
l Reduces management costs. SNMP provides basic functions for managing devices with
different management tasks, physical attributes, and network types.
l Reduces the impact of feature operations on the device. SNMP is simple in terms of
hardware/software installation, packet type, and packet format.
l Ensures reliable packet transmission by providing a retransmission mechanism. SNMP
supports packet transmission in the "request-response" mode and the active report mode.
l Ensures secure packet transmission by providing security mechanisms such as
authentication and encryption.
2.2.3 Principle
[Figure: SNMP management model. Devices are reached through the Internet; each device runs an agent with a MIB (for example, OID 1.3.6.1.2.1.1.1 sysDescr, 1.3.6.1.2.1.1.2 sysObjectID, 1.3.6.1.2.1.1.3 sysUpTime) and contains management objects]
Each network management system has at least one network management station (NMS)
running management processes to manage network devices.
The network has devices to be managed, and the agent process is run on these devices. Each
managed device may have several management objects. The agent queries the MIB on the
device under the request of the NMS.
Elements in the network management system are as follows:
l NMS
A network manager or a system using SNMP to manage or monitor network devices.
The NMS runs on NMS servers.
– An NMS can send requests to an agent on a device to query or modify the value of
one or multiple parameters.
– An NMS can receive trap messages sent from the agent on a device to learn the
current status of the device.
l Agent
An agent process on the network device, which maintains data sent from the managed
device and responds to requests from the NMS by sending management data to the
NMS.
– Upon receiving requests of the NMS, the agent performs the required operation
over the MIB and sends the operation result to the NMS.
– When a fault or an event occurs on the device, the agent running on the device
sends notifications to the NMS, reporting the current status of the device.
l MIB
A database. It contains variables maintained by network elements. These variables can
be queried and set by the management process. The MIB defines the name, status, access
rights, and data type of the managed device.
An agent can use the MIB to:
– Learn the current status of the device.
– Set the status parameter of the device.
As shown in Figure 2-7, data information is saved in a tree structure (OID tree) similar
to that of the Domain Name System (DNS). Each Object Identifier (OID) is mapped with
a management object. In this example, the OID of the system is 1.3.6.1.2.1.1 and the
OID of the interface is 1.3.6.1.2.1.2.
The OID tree facilitates information management and improves management efficiency.
With the OID tree, the network administrator can query information in batches.
[Figure 2-7: OID tree structure, showing nodes such as dod(6) and internet(1)]
l Management object
Object to be managed. A device may have multiple management objects, including a
hardware component (such as an interface board), software, and parameters (such as a
route selection protocol) configured for the hardware or software.
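The OID tree described for the MIB can be sketched as a sorted mapping from OIDs to node names. This is an illustrative model only, not an SNMP implementation: the helper names are hypothetical, the three system nodes are those named in the figure above, and ifNumber (1.3.6.1.2.1.2.1) is added from the standard interfaces group as an assumption to populate the interface subtree.

```python
def parse_oid(text: str) -> tuple:
    """Convert a dotted OID string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


MIB = {
    parse_oid("1.3.6.1.2.1.1.1"): "sysDescr",
    parse_oid("1.3.6.1.2.1.1.2"): "sysObjectID",
    parse_oid("1.3.6.1.2.1.1.3"): "sysUpTime",
    parse_oid("1.3.6.1.2.1.2.1"): "ifNumber",  # assumption: not named in the text
}


def subtree(mib: dict, prefix: str) -> dict:
    """Return all nodes under an OID prefix, in OID order (a batch query)."""
    root = parse_oid(prefix)
    return {oid: name for oid, name in sorted(mib.items()) if oid[:len(root)] == root}


def get_next(mib: dict, oid: str):
    """Return the first OID that follows the given one, or None at the end."""
    current = parse_oid(oid)
    later = sorted(candidate for candidate in mib if candidate > current)
    return later[0] if later else None


print(list(subtree(MIB, "1.3.6.1.2.1.1").values()))  # ['sysDescr', 'sysObjectID', 'sysUpTime']
print(get_next(MIB, "1.3.6.1.2.1.1.3"))              # (1, 3, 6, 1, 2, 1, 2, 1)
```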
2.2.3.2 SNMPv1
This section describes SNMPv1 in terms of the packet format and working principle.
SNMP packets
SNMPv1 packet format: IP header | UDP header | Version | Community name | SNMPv1 PDU
The SNMPv1 PDU is a Get/GetNext/Set PDU, a Response PDU, or a Trap PDU.
2.2.3.3 SNMPv2c
This section describes SNMPv2c in terms of the packet format and working principle.
SNMP packets
SNMPv2c packet format: IP header | UDP header | Version | Community name | SNMPv2c PDU
The SNMPv2c PDU is a Get/GetNext/Set PDU, a Response PDU, a Trap PDU, a GetBulk PDU, or
an Inform PDU.
As shown in Figure 2-10, SNMPv2c Protocol Data Units (PDUs) can be classified into get
PDUs, get-next PDUs, set PDUs, response PDUs, trap PDUs, and two newly added PDUs
(get-bulk PDUs and inform PDUs).
[Figure 2-11: operations added in SNMPv2c between the NM station (UDP port 162) and the agent (UDP port 161): Get-Bulk-request with Get-response, and Inform-request with Response]
Compared with SNMPv1, two operation types are added in SNMPv2c, as shown in Figure
2-11.
l get-bulk
An NMS performs the get-bulk operation to query pieces of information about a
managed device. One get-bulk operation functions the same as multiple consecutive get-
next operations. The number of get-next operations that function the same as one get-
bulk operation (a one-time get-bulk packet exchange on the host side) can be set on the
NMS.
l Inform
A managed device performs the inform operation to send notifications to the NMS. This
operation is only supported in SNMPv2c. Different from trap messages, inform
messages require a response after reaching the NMS. If the NMS does not send a
response, the managed device sends the inform message again until a response is
returned or the number of retransmissions reaches the upper limit. If
an inform message fails to be sent, the system logs the failure event on the managed
device. When the NMS restarts, it is notified of any inform messages that failed to be
sent. Inform messages are more reliable than trap messages.
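The inform retransmission behavior can be sketched as follows. This is a simplified model, not an SNMP implementation: the function send_inform and its parameters are hypothetical, and the transport is replaced by a callable that returns True when the NMS responds.

```python
from typing import Callable, List


def send_inform(send: Callable[[], bool], max_retransmissions: int,
                failure_log: List[str]) -> bool:
    """Send an inform and retransmit until a response arrives or the limit is reached.

    send() stands in for one transmission and returns True if the NMS responded.
    """
    attempts = 1 + max_retransmissions  # the first transmission plus retransmissions
    for _ in range(attempts):
        if send():
            return True
    # No response after the final retransmission: log the failure on the device.
    failure_log.append(f"inform not acknowledged after {attempts} attempts")
    return False


# Simulated NMS that responds only to the third transmission.
responses = iter([False, False, True])
log: List[str] = []
print(send_inform(lambda: next(responses), max_retransmissions=3, failure_log=log))  # True
print(log)  # []
```

A trap, by contrast, would correspond to a single call to send() with no check of the result.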
2.2.3.4 SNMPv3
This section describes SNMPv3 in terms of the packet format and working principle.
SNMPv3 packet format: IP header | UDP header | Version | Header data | Security parameters | SNMPv3 PDU
Compared with SNMPv1 and SNMPv2c, two fields are added to SNMPv3 packets:
l Header data: records the maximum message size supported by the sender, the security
mode, and whether the message is encrypted or authenticated.
l Security parameter: records information of the user name, authentication key, and private
parameter.
The modular architecture of the SNMP entity has the following advantages:
l Strong adaptability: This architecture is adaptable for both simple and complex
networks.
l Easy management: This architecture consists of multiple independent sub-systems and
applications. When a fault occurs in the system, it is easy to locate the sub-system to
which the fault belongs based on the fault type.
l Excellent expandability: An SNMP system can be extended by increasing the number of
modules on the SNMP entity. For example, a module can be added in the security sub-
system for the application of a new security protocol.
SNMPv1 and SNMPv2c use the community name for packet authentication between the NMS
and the agent. This authentication mode is less secure. To enhance system security, SNMPv3
sets private keys for different users and provides data encryption, encrypted authentication,
and user access control functions, and allows the communication between AAA users and the
NMS. AAA users of different levels have permission to access different MIB objects.
NOTE
SNMPv1: Community name and SNMPv1 PDU (get PDU, get-next PDU,
set PDU, response PDU, and trap PDU)
SNMPv2c: Community name and SNMPv2c PDU (get PDU, get-next PDU,
set PDU, response PDU, trap PDU, get-bulk PDU, and inform
PDU)
Compared with SNMPv1, SNMPv2c has the following
characteristics:
l More operation types are provided.
l The inform alarm is more reliable than the trap alarm.
l More standard error codes for defining different scenarios are
supported.
SNMPv3: Header data, security parameter (user name, private key, and
private parameter), and SNMPv3 PDU (get PDU, get-next PDU,
set PDU, response PDU, trap PDU, and get-bulk PDU)
Compared with SNMPv2c, SNMPv3 has the following
characteristics:
SNMPv3 is more secure than SNMPv1 and SNMPv2c because
SNMPv3 supports user authentication, user access control,
authorization, and authentication encryption. Authentication
modes include MD5 and SHA, and the encryption modes include
DES-56, 3DES, AES-128, AES-192, and AES-256.
2.2.4 Applications
network administrator to configure and manage each device on site. If these network devices
are provided by different vendors and the device from each vendor has a set of management
interfaces (such as command line interfaces), the network administrator's workload for
managing these devices is increased. To reduce the operation cost and improve the work
efficiency, the network administrator can use SNMP to manage, configure, and monitor
network devices remotely.
[Figure: an NM station managing devices on multiple LANs across an IP network through SNMP]
SNMP is enabled on the network, SNMP manager is configured on the NMS, and agent is
enabled on the managed device.
With SNMP:
l The NMS can learn the device status by sending requests to the agent and control
devices remotely.
l The agent can report the status and faults of the device to the NMS in real time.
[Figure: an NMS manages PE1 and PE2, which are connected by Tunnel A and Tunnel B; CE1 and CE3 belong to VPN A, and CE2 and CE4 belong to VPN B]
SNMP deployed on PE1 and PE2 can provide the following functions:
l Enables the NMS to manage PEs in batches and establish a tunnel on the VPN network.
l Manages PEs and their accessed CEs using the NMS and ensures that CE1 and CE3 are
added to VPN A, and CE2 and CE4 are added to VPN B.
These functions save the network management cost, monitor device operating performance,
and improve the service quality.
2.3.1 Introduction
This section describes the basic knowledge about RMON.
Drawbacks of SNMP
SNMP is a widely used network management protocol. It collects statistics about network
communications by using the agent software embedded in the managed device. The
management software polls the agent for the information. The agent then searches the MIB
and returns the required information to the NMS. This process implements network
management through the NMS. Although the MIB counters record cumulative statistics, they
cannot show how traffic behaved over time. To obtain complete information about the
traffic and its changes over a day, the NMS software must
poll the agent continuously for the required information and then analyze the network status.
Polling in SNMP has the following drawbacks:
Introduction of RMON
To improve the usability of management information, lighten the burden on the NMS, and
enable the network manager to monitor several network segments, the Internet Engineering
Task Force (IETF) proposed RMON to complement SNMP in managing increasingly distributed
networks.
RMON is based on SNMP and is compatible with SNMP. It consists of two parts: the NMS
and the SNMP agent. The implementation of RMON is simple because it uses the original
mechanism of SNMP. RMON enables the SNMP module to monitor remote network devices
more efficiently and actively. It provides an efficient method to monitor the running status of
sub-networks, which reduces the communication traffic between the NMS and the agent.
Large-scale networks can therefore be managed in a simple and effective manner.
RMON Goals
RMON provides an effective method to monitor traffic behaviors in sub-networks. RMON
goals are as follows:
l Offline operation: The monitor can continuously collect information about errors,
performance, and configuration even when the network manager is not available.
l Proactive monitoring: The monitor must be available at the onset of any network failure.
It can notify the network manager of the failure and provide useful statistics for fault
location.
l Problem detection and reporting: The monitor can be configured to monitor conditions,
such as faults in the network and resource consumption. When any of these conditions
occurs, an event is logged. This is helpful in checking errors.
l Data analyzing: The monitor can collect and analyze data about the sub-network. This
lightens the burden on the NM Station.
l Multiple managers: Multiple managers can be used to enhance reliability. Managers have
different functions and provide different management performance for interior devices.
2.3.2 Principles
RMON defines a set of MIBs, which contain information about standard network
monitoring functions and interfaces. This implements the communication between the SNMP
management terminal and the remotely managed devices.
RMON groups: statistics (1), history (2), alarm (3), host (4), hostTopN (5), matrix (6),
filter (7), capture (8), event (9)
RMON2 groups: protocolDir (11), protocolDist (12), addressMap (13), nlHost (14),
nlMatrix (15), alHost (16), alMatrix (17), usrHistory (18), probeConfig (19)
l Statistics group: collects basic statistics of each monitored sub-network. The statistics
include the data flow on a network segment, distribution of various packets, error frames,
and collision times.
l History group: periodically collects the network status statistics and stores them for
future use.
l Alarm group: allows predefining a set of thresholds for alarm variables that can be any
object in the local MIB. The monitor records logs or sends trap messages to the NMS
when the sample crosses a threshold in a certain direction.
l Host group: contains inbound and outbound traffic statistics associated with each host
discovered on the network.
l HostTopN group: contains statistics about hosts that top a list ordered by one of the
parameters.
l Matrix group: stores errors and useful information in the form of a matrix. This is
convenient for operators to search the information based on any set of two addresses.
l Filter group: allows the monitor to observe packets on the interface and select a specific
packet through filtering.
l Packet capture group: provides a cache mechanism and allows packets to be captured
after they flow through a channel.
l Event group: stores all the events generated by the RMON agent in a table. The event
group records logs or sends trap messages to the NMS when an event occurs.
NOTE
The alarm group requires the implementation of the event group. The hostTopN group requires the
implementation of the host group. The packet capture group requires the implementation of the filter group.
Statistics Group
The statistics group collects basic statistics of each monitored sub-network.
Figure 2-16 shows the three tables in the statistics group.
statistics (rmon 1)
    etherStatsTable (1)
    tokenRingMLStatsTable (2)
    tokenRingPStatsTable (3)
EtherStatsTable
This table contains 21 objects. It has a record entry for each monitored sub-network to display
statistics about the sub-networks. Most objects in this table are counters, used by the monitor
to record packets with different status across sub-networks.
The EtherStatsTable contains information about sub-networks and error information, such as
Cyclic Redundancy Check (CRC) code, and correct and incorrect packets. Therefore, this
table displays the operating status of the entire network. Information collected in the
EtherStatsTable and MIB-II is similar. The information in the EtherStatsTable is more detailed
and is more pertinent to Ethernet networks.
Upper and lower limits in the alarm table are set based on the statistics in the EtherStatsTable.
Setting the alarm threshold is an effective method for network monitoring.
The tokenRingMLStatsTable and the tokenRingPStatsTable provide statistics of token
ring networks. Most objects in the tables are counters.
History Group
The history group periodically collects statistical samples on a monitor.
This group consists of one historyControlTable and three HistoryTables as shown in Figure
2-17.
history (rmon 2)
    historyControlTable (1)
    etherHistoryTable (2)
    tokenRingMLHistoryTable (3)
    tokenRingPHistoryTable (4)
l historyControlTable
The historyControlTable contains detailed information, such as the sampling interval and
interface information. Every record in it defines the sampling interval for a specified
interface. After being sampled, data is saved as a related entry in the data table. As
defined in RMON, a monitored interface must have two control rows, one of which
defines the sampling interval as 30 seconds and the other defines the sampling interval as
30 minutes. The short interval is used to detect the burst communication events, and the
long interval is used to detect the stable communication events.
l Data Table
Data tables record the sampled data. The etherHistoryTable is a data table specifically
for Ethernet networks. The tokenRingMLHistoryTable and the tokenRingPHistoryTable are data
tables for token ring networks. Similar to the statistics group, the data tables also provide
counters.
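The relationship between a control row and its sampled data can be sketched as follows. This is a minimal illustration under stated assumptions, not the ATN implementation: the names `HistoryControlRow` and `take_sample`, the `buckets` limit, and the interface name `eth0` are hypothetical; only the 30-second and 30-minute intervals come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HistoryControlRow:
    """One control row: the interface to sample and the sampling interval."""
    interface: str
    interval_s: int                      # sampling interval in seconds
    buckets: int = 50                    # samples to keep (illustrative value)
    samples: List[int] = field(default_factory=list)  # stands in for the data table
    last: Optional[int] = None           # counter value at the previous sample


def take_sample(row: HistoryControlRow, counter: int) -> None:
    """Record the counter increase over one interval; drop the oldest sample when full."""
    if row.last is not None:
        row.samples.append(counter - row.last)
        if len(row.samples) > row.buckets:
            row.samples.pop(0)
    row.last = counter


# Two control rows for the same interface: a short and a long interval.
short_term = HistoryControlRow("eth0", interval_s=30)
long_term = HistoryControlRow("eth0", interval_s=30 * 60)
```

Storing per-interval deltas rather than raw counter values is what lets the history group show how traffic changed over time, which a single cumulative counter cannot do.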
Alarm Group
The alarm group allows predefining a set of thresholds for variables. If the monitored variable
exceeds the threshold, an event is generated, and the monitor records logs or sends trap
messages to the NMS. This group is dependent on the event group and requires the
implementation of the event group. The alarm group consists of only one table: the
alarmTable. Each record defines the specified variable, sampling interval, and threshold.
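The threshold behavior can be sketched as follows. The function name `alarm_events` is hypothetical, and the re-arming rule (a rising event is not repeated until the value has fallen to the falling threshold, and vice versa) reflects our reading of the alarm hysteresis in RFC 2819 rather than anything stated in this document.

```python
def alarm_events(samples, rising, falling):
    """Return (sample_index, kind) events for a series of sampled values.

    Assumption: both directions are armed at start-up, and an event in one
    direction is re-armed only by an event in the other direction.
    """
    events = []
    armed_rising = armed_falling = True
    for i, value in enumerate(samples):
        if value >= rising and armed_rising:
            events.append((i, "rising"))
            armed_rising, armed_falling = False, True
        elif value <= falling and armed_falling:
            events.append((i, "falling"))
            armed_falling, armed_rising = False, True
    return events


events = alarm_events([30, 60, 70, 55, 20, 15, 65], rising=60, falling=20)
```

With these inputs, the samples 70 and 15 do not generate events because the corresponding direction has not been re-armed, which keeps a value that hovers near a threshold from flooding the NMS with traps.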
Host Group
The host group collects statistics associated with each host newly discovered on a
Local Area Network (LAN). It discovers hosts by monitoring the source and destination MAC
addresses in the packets transmitted across the LAN. The host group retains a group of
statistics for each host.
This group consists of the hostControlTable, the hostTable, and the hostTimeTable, as shown
in Figure 2-18.
host (rmon 4)
    hostControlTable (1)
    hostTable (2)
    hostTimeTable (3)
l hostControlTable
Every row in the hostControlTable corresponds to a monitored network interface.
Options in the control tables define various data. Control tables also record the time
when entries in the data table are deleted. The control tables and the data tables are
directly mapped. The hostTable records the MAC addresses discovered by network
interfaces specified by rows in the control table.
l hostTable
Rows in the hostTable store statistics about hosts. This table can be indexed either by
MAC addresses of hosts or network interfaces. If the network interface discovers a new
host, a row is added in the hostTable. Once a row is added in the hostTable, the monitor
begins to check the MAC address of the corresponding network interface.
l hostTimeTable
Rows in the hostTimeTable store the same information as that in the hostTable. The
hostTimeTable is indexed by the creation time instead of the MAC address.
The hostTimeTable also supports management stations. You can effectively find new
entries for the specified interface without downloading the information of the complete
table.
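The two views of the host data can be sketched as follows. This is an illustration only: the class `HostGroup`, its counter names, and the `new_since` helper are hypothetical and do not correspond to MIB object names.

```python
class HostGroup:
    """Discovers hosts from packet MAC addresses and keeps per-host counters."""

    def __init__(self):
        self.host_table = {}   # MAC -> counters (view indexed by MAC address)
        self.host_time = []    # MACs in discovery order (view indexed by creation order)

    def observe(self, src_mac, dst_mac, length):
        """Account one packet; add a row for any MAC address seen for the first time."""
        for mac in (src_mac, dst_mac):
            if mac not in self.host_table:
                self.host_table[mac] = {"in_pkts": 0, "in_octets": 0,
                                        "out_pkts": 0, "out_octets": 0}
                self.host_time.append(mac)
        self.host_table[src_mac]["out_pkts"] += 1
        self.host_table[src_mac]["out_octets"] += length
        self.host_table[dst_mac]["in_pkts"] += 1
        self.host_table[dst_mac]["in_octets"] += length

    def new_since(self, known_count):
        """Return hosts discovered after the first `known_count` entries."""
        return self.host_time[known_count:]
```

A management station that remembers how many hosts it has already seen can call `new_since` to fetch only the new entries, which is the benefit the text attributes to the hostTimeTable.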
HostTopN Group
The hostTopN group is used to maintain statistics of the hosts in a sub-network. The
monitored hosts top a list ordered by one of their variables. For example, this group can
report the 10 hosts that transmit the most data.
This group consists of the hostTopNControlTable and the hostTopNTable, as shown in Figure
2-19.
hostTopN (rmon 5)
    hostTopNControlTable (1)
    hostTopNTable (2)
l hostTopNControlTable
Every row in the hostTopNControlTable defines a Top-N report of a network interface. It
also covers the period from the time the last Top-N report is initialized to the time the
system starts.
l hostTopNTable
This table contains information about Top-N hosts. Each row represents a unique host.
This table also contains the MAC address of the host and defines the changes of the
sampled data.
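A Top-N report can be sketched as a ranking of counter changes over the report period. The function `top_n` and its dictionary inputs are hypothetical; they stand in for counter values read from the host group at the start and end of the report.

```python
import heapq


def top_n(start, end, n):
    """Rank hosts by how much a counter grew during the report period.

    start, end: dicts mapping MAC address -> counter value at the start and
    end of the period. Hosts first seen during the period count from zero.
    """
    deltas = {mac: end[mac] - start.get(mac, 0) for mac in end}
    return heapq.nlargest(n, deltas.items(), key=lambda item: item[1])


report = top_n({"a": 10, "b": 0, "c": 5},
               {"a": 20, "b": 50, "c": 6, "d": 30}, n=2)
```

Here `report` lists hosts `b` and `d`, the two hosts whose counters changed the most during the period.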
Matrix Group
The matrix group stores statistics of traffic between hosts in a sub-network. The statistics are
stored in a matrix format. This group consists of the matrixControlTable, the matrixSDTable,
and the matrixDSTable, as shown in Figure 2-20.
matrix (rmon 6)
    matrixControlTable (1)
    matrixSDTable (2)
    matrixDSTable (3)
l matrixControlTable
Each row in this table identifies a sub-network. It displays the session status on the
network interface and records the statistics of sessions in two data tables.
l matrixSDTable
This table is used to store the statistics of traffic from the specified source host to
multiple destination hosts. This table records two entries for a pair of hosts that have
exchanged information recently. One entry records traffic sent from the source host to the
destination host; the other entry records traffic sent from the destination host to the
source host.
l matrixDSTable
This table is similar to the matrixSDTable. The difference lies in the sequence of
indexes.
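The two index orders can be sketched as follows. The class `Matrix` and its methods are hypothetical names for illustration; both dictionaries hold the same per-conversation counters and differ only in which address comes first in the key.

```python
from collections import defaultdict


class Matrix:
    """Per-conversation counters stored under two index orders."""

    def __init__(self):
        self.sd = defaultdict(lambda: [0, 0])  # (source, destination) -> [packets, octets]
        self.ds = defaultdict(lambda: [0, 0])  # (destination, source) -> [packets, octets]

    def observe(self, src, dst, length):
        """Account one packet sent from src to dst in both tables."""
        for table, key in ((self.sd, (src, dst)), (self.ds, (dst, src))):
            table[key][0] += 1
            table[key][1] += length

    def sent_by(self, src):
        """All conversations with a given source (source-first index)."""
        return {k: v for k, v in self.sd.items() if k[0] == src}

    def received_by(self, dst):
        """All conversations with a given destination (destination-first index)."""
        return {k: v for k, v in self.ds.items() if k[0] == dst}
```

Keeping both orders makes "everything host X sent" and "everything host X received" equally cheap to look up, which is why the group defines two tables with the same content.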
Filter Group
The filter group allows the monitor to trace the packets on a specified interface. Basic
components of this group are two types of filters: data filter and status filter. A data filter
allows the monitor to match traced packets against a bit pattern under a bit mask. A status
filter allows the monitor to match the packets based on packet status. Filters can be combined
with logical AND/OR operations to form complex tests.
This group consists of the filterTable and the channelTable.
filter (rmon 7)
    filterTable (1)
    channelTable (2)
The filterTable defines related filters. Each row in the channelTable corresponds to a unique
channel and is associated with one or several rows in the filterTable.
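A data filter and a channel can be sketched as follows. This is a simplified illustration: the function names are hypothetical, status filters and inverted (not-mask) matching are omitted, and treating a channel as "accept if any associated filter matches" is an assumption made for the example.

```python
def data_filter_matches(packet: bytes, offset: int, data: bytes, mask: bytes) -> bool:
    """Compare packet bytes with `data` starting at `offset`, checking only masked bits."""
    window = packet[offset:offset + len(data)]
    if len(window) < len(data):
        return False                      # packet too short to hold the pattern
    return all(((p ^ d) & m) == 0 for p, d, m in zip(window, data, mask))


def channel_accepts(packet: bytes, filters) -> bool:
    """filters: iterable of (offset, data, mask) tuples associated with one channel."""
    return any(data_filter_matches(packet, *f) for f in filters)


# Example frame: destination MAC, source MAC, then EtherType 0x0800.
frame = bytes.fromhex("ffffffffffff0011223344550800")
is_ipv4 = data_filter_matches(frame, 12, b"\x08\x00", b"\xff\xff")
```

A mask bit of 0 means "ignore this bit", so a single filter row can match a whole family of packets, for example every EtherType whose first byte is 0x08.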
Packet Capture Group
capture (rmon 8)
    bufferControlTable (1)
    captureBufferTable (2)
Each row in the bufferControlTable defines a cache used to capture and store packets passing
through a channel. Each row in the captureBufferTable corresponds to a captured packet.
Event Group
The event group can define events. An event can be triggered by a certain condition in the
MIB or can trigger a certain operation defined in the MIB. The event also generates logs
(recorded in this group) or SNMP trap messages.
This group consists of the eventTable and the logTable, as shown in Figure 2-23.
event (rmon 9)
    eventTable (1)
    logTable (2)
The eventTable defines events. Each row in the table describes the parameters of the event
triggered by a certain condition.
If events are recorded, corresponding entries in the logTable are created.
management, the monitor must communicate with the central network management station.
Monitors have the following functions:
l Traces information groups in the network, collects statistics, and summarizes the
information.
l Provides important management information for the network manager; stores certain
information groups for later analysis.
l Filters groups according to information types.
l Gets special information groups.
To implement RMON in the network, the monitor and the RMON client software must be
used. The monitor runs effectively without continuously polling the managed devices. It
generates a trend diagram to illustrate the network operating status based on the capability of
the RMON module to store history statistics. The monitor reports the operating status and
describes any obtained abnormal situation regardless of accidental network events. The client
software diagnoses the fault based on the reported information from the RMON module and
finds solutions.
RMON allows multiple monitors and collects data in the following ways:
l Uses a special RMON probe. The NMS obtains management information from the
RMON probe and controls network resources directly. This helps in obtaining all the
information on the RMON MIB. This costs a lot because RMON probes must be
deployed in all LANs.
l Embeds the RMON Agent into a network device (ATN and HUB), enabling the device
with the RMON probe function. The NMS uses the basic SNMP commands to exchange
data information with the RMON agent and to collect the network management
information. This is, however, restricted by the device resources, and the NMS collects
the information in four groups only (alarm, event, history, and statistics) rather than the
entire RMON MIB data. This method improves the efficiency of the network monitoring
and reduces costs.
NOTE
The ATN implements the monitoring and statistics collection function only on the Ethernet interfaces of
network devices.
RMON and RMON2 are both used to monitor Ethernet links. RMON monitors the traffic only
at the MAC layer whereas RMON2 can monitor the traffic at the MAC layer and the
subsequent upper layers.
RMON2 can decode data packets from layer 3 to layer 7 in the OSI model. The RMON2 agent
has the following functions:
l Monitors the traffic based on network layer protocols and addresses. This enables the
agent to learn its connected external LAN network segment and view the incoming
traffic to the LAN through the ATN.
l Records the incoming and outgoing traffic to and from a specific application because the
RMON2 agent decodes and monitors the traffic of applications such as email, the File
Transfer Protocol (FTP), and WWW.
In this manner, the monitor can record the information about the actions of the application on
a host and display diagrams to illustrate the action of each application. This strengthens the
network monitoring.
Configurations
To implement remote monitoring, data collection must be configured, including the type
of data to be collected. MIBs are divided into multiple function
groups. Each group contains one or more control tables, corresponding to one or more data
tables. The manager can read from or write into the control tables, whereas data tables are
read-only. A control table contains all the parameters that govern its data table. The manager
collects the required data by modifying parameters in the control table. Parameter setting is
implemented by adding or deleting records in the control table. After data is collected
according to the parameters in the control table, it is stored in the corresponding data table.
The functions of the monitor are defined and implemented through tables. The
operation process is similar to the database operation. Parameters in the control tables are all
configured with values, and every record defines a specified data collection function. Records
in the data table correspond to the records in the control table. Every record in the control
table and its corresponding data records are bound through pointers. Records in the control
table have indexes, which are used to search the corresponding records in the data table.
Similarly, records in the data table also have indexes, which are applied to find the
corresponding records in the control table.
To modify parameters in the control table, first delete the record with the specified index in
the control table. Note that the corresponding records in the data table must be deleted
simultaneously. The manager then generates a control record and adds it to the control table.
When the records in the control table and the data table are mapped one-to-one, the control
table and the data table can be considered as one table.
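The control-table and data-table relationship described above can be sketched as follows. The class `FunctionGroup` and its method names are hypothetical; the sketch only shows that data rows are bound to a control row by index and that a modification is a delete followed by an add.

```python
class FunctionGroup:
    """A control table whose rows each own a set of read-only data rows."""

    def __init__(self):
        self.control = {}   # index -> parameters of one data collection function
        self.data = {}      # index -> data rows collected for that control row

    def add_control_row(self, index, **params):
        if index in self.control:
            raise KeyError(f"control row {index} already exists")
        self.control[index] = params
        self.data[index] = []

    def delete_control_row(self, index):
        """Deleting a control row also deletes the data rows bound to it."""
        del self.control[index]
        self.data.pop(index, None)

    def modify_control_row(self, index, **params):
        """Modification is a delete followed by an add, as described in the text."""
        self.delete_control_row(index)
        self.add_control_row(index, **params)

    def collect(self, index, value):
        """Called by the monitor, not the manager: data tables are read-only to managers."""
        self.data[index].append(value)
```

Because modification discards the old data rows, a manager never sees samples that were collected under one set of parameters mixed with samples collected under another.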
l A manager requests more resources than the monitor can provide.
l A manager uses a significant amount of resources for a long period. This prevents other
managers from using the monitor.
l A manager uses resources and then crashes; the resources used cannot be released.
A mechanism is developed to prevent the preceding conflicts and help to resolve them. This
mechanism is a simple control function in the control table of the RMON MIB. Each control
table has a label identifying the owner of the function. This shows the relationship between
records and related functions. When multiple managers want to access the same control table,
the following can be implemented based on the relationship:
l A manager may recognize resources it owns as well as resources that it no longer needs.
l A network manager can know the resources and successfully release resources or related
functions.
l An authorized network manager can release resources that are reserved by other
managers.
l Upon initialization, a manager can recognize the resources it has reserved. It then
releases the resources if it no longer needs them.
If multiple NMSs want to access the same control table, it is more effective to use the
resource sharing function. When a manager intends to utilize a function in a monitor, it first
scans the control table of that function to find the function or a similar function defined by
other managers for sharing. If the function is found, the manager can read the records from
the data table corresponding to the control table. The owner of the records may
indiscriminately modify or delete the functions. Therefore, in certain cases, other managers
may find that the expected functions have been modified or deleted.
Generally, during the initialization of each monitor, default function sets should be
configured. The labels of function owners are characters starting with "monitor", indicating
that resources related to the pre-defined functions belong to the specified monitor. If some
managers need to use the functions, they can only read but cannot modify or delete the
function. Functions can be deleted by the manager of the monitor (commonly, network
manager) only.
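The ownership and sharing rules above can be sketched as follows. The function names, the `is_network_manager` flag, and the row layout are hypothetical; the sketch simply encodes the rules as stated: rows whose owner label starts with "monitor" are read-only to ordinary managers, and a manager looks for an existing row before creating its own.

```python
def can_modify(row_owner: str, requester: str, is_network_manager: bool = False) -> bool:
    """Decide whether `requester` may modify or delete a control row."""
    if row_owner.startswith("monitor"):
        return is_network_manager          # pre-defined functions: network manager only
    return requester == row_owner or is_network_manager


def find_shareable(control_table: dict, wanted: dict):
    """Return the index of an existing row with the wanted parameters, or None."""
    for index, row in control_table.items():
        if row["params"] == wanted:
            return index
    return None


table = {
    1: {"owner": "monitor", "params": {"interface": "eth0", "interval": 30}},
    2: {"owner": "nms-a", "params": {"interface": "eth0", "interval": 1800}},
}
shared = find_shareable(table, {"interface": "eth0", "interval": 1800})
```

A manager that shares a row found this way reads its data but does not own it, so it should be prepared for the owner to modify or delete the row at any time.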
Row Addition
The manager obeys the following rules to add rows through the SNMP Set operation:
l The manager sends the Set request to the managed device for adding a row. If the index
of the new row does not conflict with indexes of other rows, the agent generates a new
row.
l If the tabular information is not configured for the new row, the agent can set the row to
the default value or maintain the row in the incomplete status.
l Before the manager requires adding a new row, the inactive row must keep the inactive
status.
l If the new row to be created exists, the agent responds with an error packet.
Row Deletion
If the manager sends a Set request that sets the status object of a row to invalid, the agent
deletes the row.
Row Modification
To modify a row, the manager first sets the status object of the row to invalid and then
sends Set requests that supply the new values. In this way, the row is modified.
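Row addition, deletion, and modification can be sketched as a small state machine on a per-row status object. The four status values follow the EntryStatus convention of RFC 2819; the class `ControlTable`, its methods, and the rule that parameters can be written only before a row becomes valid are simplifying assumptions for the example.

```python
from enum import IntEnum


class EntryStatus(IntEnum):
    VALID = 1
    CREATE_REQUEST = 2
    UNDER_CREATION = 3
    INVALID = 4


class ControlTable:
    def __init__(self):
        self.rows = {}   # index -> {"status": EntryStatus, "params": dict}

    def set_status(self, index, status):
        """Handle a Set on the status object; return True on success, False on error."""
        if status == EntryStatus.CREATE_REQUEST:
            if index in self.rows:
                return False                       # row exists: agent returns an error
            self.rows[index] = {"status": EntryStatus.UNDER_CREATION, "params": {}}
            return True
        if index not in self.rows:
            return False
        if status == EntryStatus.INVALID:
            del self.rows[index]                   # setting invalid deletes the row
            return True
        if status == EntryStatus.VALID:
            self.rows[index]["status"] = EntryStatus.VALID
            return True
        return False

    def set_param(self, index, name, value):
        """Write a parameter; allowed only while the row is not yet valid (assumption)."""
        row = self.rows.get(index)
        if row is None or row["status"] == EntryStatus.VALID:
            return False
        row["params"][name] = value
        return True
```

Under this model, changing a parameter of an active row means setting the row to invalid, creating it again, and writing the new values before activating it.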
Implementation of RMON
RMON effectively implements monitoring on all network segments. In LANs, deploying
RMON probes is highly expensive. In addition, monitoring network segments individually
degrades the performance of RMON and generates heavy traffic on the network.
To solve the preceding problems, manufacturers embed the RMON module into network
devices. This is more economic and effective. The RMON agent module is embedded on
Huawei ATNs, forming a complete system with other modules in the ATN. The NMS can use
SNMP. The network managers then do not need extra learning to handle RMON.
RMON in the ATN supports four groups, namely, statistics, history, alarm, and event, defined
in RFC 2819, and a Performance-MIB defined by Huawei. The four groups are described as
follows:
l Statistics group
The statistics group collects basic statistics of each monitored sub-network. The statistics
include the data flow on a network segment, distribution of various packets, error frames,
and collision times.
The statistics group contains an etherStatsTable. Rows can be created in the
etherStatsTable only on Ethernet and Gigabit Ethernet interfaces (not sub-interfaces).
An interface corresponds to only one row in the etherStatsTable. The etherStatsTable has
a maximum of 100 rows.
l History group
The history group periodically collects the network status statistics and stores them for
future use. The history group has the following tables:
– historyControlTable: controls information such as the sampling interval and
interface information. After being sampled, data is saved as a related entry in the
etherHistoryTable. Rows can be created in the historyControlTable only on
Ethernet and Gigabit Ethernet interfaces (not sub-interfaces) and 10 Gbit/s Ethernet
interfaces. The historyControlTable contains a maximum of 100 rows.
– etherHistoryTable: provides the network administrator with history
statistics such as the traffic on a network segment, error packets, broadcast packets,
utilization, and collision times. Each entry in the etherHistoryTable covers one
sampling interval and is associated with the historyControlTable entry that the sampling
is based on. An entry is created each time a sampling interval elapses.
alarmTable 60 6000
eventTable 60 600
logTable 600 -
prialarmTable 50 6000
Implementation of RMON2
RMON2 is one of the RMON MIB standards, serving as a supplement to RMON. In RMON2,
some groups are added.
As defined in RFC 2021, RMON2 contains several MIB groups: protocolDir, protocolDist,
addressMap, nlHost, nlMatrix, alHost, alMatrix, usrHistory, probeConfig, and
rmonConformance.
Currently, the ATN supports two RMON2 MIB groups: protocolDir and nlHost.
Figure 2-24 shows the relationship between the protocolDirTable, nlHostTable, and
hlHostControlTable.
[Figure 2-24: relationship between the tables; the index fields shown include protocolDirParameters, protocolDirLocalIndex, nlHostTimeMark, and nlHostAddress]
l protocolDirTable
It lists the protocols that the RMON agent can decode and count. Each row in the
table corresponds to one protocol type. The protocols can be network layer protocols,
transport layer protocols, or higher-layer protocols. Note that nlHost supports the
network layer host group instead of the application layer group. That is, application layer
host control and the alHostTable are not implemented in the host control table.
Therefore, only IP can be set in the protocol directory group.
l nlHostTable
The nlHostTable is used to count the amount of inbound and outbound traffic on the
interface. It provides traffic statistics for the specified network address. This table
collects statistics of the host discovered by the RMON agent and classifies statistics
based on network addresses.
l hlHostControlTable
The hlHostControlTable contains two tables: network layer host control table and
application layer host control table. It defines the monitored interface, and records the
total number of frames that are received on the interface but not recorded in the
nlHostTable. It also records the number of times of entry addition and deletion and the
expected maximum number of entries in the nlHostTable. The alHostControlTable
cannot control the alHostTable.
2.4 IP FPM
NOTE
Among the ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports the IP FPM
function.
2.4.1 Introduction
Definition
IP Flow Performance Measurement (FPM) is a Huawei proprietary feature that measures
packet loss rate and delay of end-to-end service packets transmitted on an IP network to
determine network performance. This feature is easy to deploy and provides an accurate
assessment of network performance.
Purpose
As IP services are more widely adopted, fault diagnosis and end-to-end service quality
analysis are becoming an increasingly pressing concern for carriers. However, absence of
effective measures prolongs fault diagnosis and increases the workload. Currently, carriers use
Network quality analysis (NQA) and Y.1731 to measure the quality of services running on IP
radio access networks (RANs).
Both measures, however, have their own shortcomings.
l NQA measures network performance by determining the packet loss rate of simulated
packets, but not actual service packets transmitted on networks. The performance
counters collected by NQA may not represent the actual service quality, and therefore
cannot serve as a solid reference for network performance analysis.
l Y.1731 measures only Layer 2 Ethernet network performance, but not performance of
networks spanning different layers.
l Neither NQA nor Y.1731 can monitor end-to-end networks at different layers, and
therefore neither is effective for monitoring IP network performance.
IP FPM does not have any of these shortcomings. IP FPM directly measures service packets
to assess IP network performance and monitors services in real time for network diagnosis.
Benefits
IP FPM brings the following benefits to carriers:
l Allows carriers to use the network management system (NMS) to monitor the network
running status to determine whether the network quality complies with the service level
agreement (SLA).
l Allows carriers to promptly adjust services based on measurement results to ensure
proper transmission of voice and data services, improving user experience.
2.4.2 Principles
IP FPM Model
The IP Flow Performance Measurement (FPM) model describes how service flows are
measured to obtain the packet loss rate and delay. In statistical terms, the statistical objects are
the service flows, and the statistical calculations determine the packet loss rate and delay of
the service flows traveling across the transit network. Service flow statistical analysis is
performed on the ingress and egress of the transit network.
The IP FPM model is composed of three objects: target flows, the transit network, and the
statistical system. The statistical system is further classified into the Target Logical Port
(TLP), Data Collecting Point (DCP), and Measurement Control Point (MCP). Figure 2-25
shows the IP FPM model.
[Figure 2-25: IP FPM model. Upstream-TLP1 to Upstream-TLP4 and Downstream-TLP1 and Downstream-TLP2 reside on DCPs at the edges of the transit network]
l Target flow
Target flows must be pre-defined.
One or more fields in IP headers can be specified to identify target flows. The field can
be the source IP address or prefix, destination IP address or prefix, protocol type, source
port number, destination port number, or type of service (ToS). The more fields
specified, the more accurately flows can be identified. Specifying as many fields as
possible is recommended to maximize the measurement accuracy.
l Transit network
The transit network only bears target flows. The target flows are not generated or
terminated on the transit network. The transit network can be a Layer 2 (L2), Layer 3
(L3), or L2+L3 hybrid network. Each node on the transit network must be reachable at
the network layer.
l TLP
TLPs are interfaces on the edge nodes of the transit network. TLPs perform the
following actions:
– Compile statistics on the packet loss rate and delay.
– Generate statistics, such as the number of packets sent and received, traffic
bandwidth, and timestamp.
An In-Point-TLP collects statistics about service flows it receives. An Out-Point-TLP
collects statistics about service flows it sends.
l DCP
DCPs are edge nodes on the transit network. DCPs perform the following actions:
– Manage and control TLPs.
– Collect statistics generated by TLPs.
– Report the statistics to an MCP.
l MCP
MCPs can be any nodes on the transit network. MCPs perform the following actions:
– Collect statistics reported by DCPs.
– Summarize and calculate the statistics.
– Report measurement results to user terminals or the network management system
(NMS).
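Target flow identification can be sketched as a match on whichever header fields have been specified. The class `TargetFlow`, its field names, and the dictionary representation of a packet are hypothetical; the set of fields comes from the target flow description above.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class TargetFlow:
    """A target flow definition; fields left as None are not checked."""
    src_prefix: Optional[str] = None
    dst_prefix: Optional[str] = None
    protocol: Optional[int] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    tos: Optional[int] = None

    def matches(self, pkt: dict) -> bool:
        """pkt: dict with keys src, dst, protocol, src_port, dst_port, tos."""
        if self.src_prefix and ip_address(pkt["src"]) not in ip_network(self.src_prefix):
            return False
        if self.dst_prefix and ip_address(pkt["dst"]) not in ip_network(self.dst_prefix):
            return False
        for name in ("protocol", "src_port", "dst_port", "tos"):
            wanted = getattr(self, name)
            if wanted is not None and pkt.get(name) != wanted:
                return False
        return True


flow = TargetFlow(src_prefix="10.1.0.0/16", protocol=17, dst_port=2152)
```

Each additional field narrows the match, which is why specifying more fields identifies the target flow more accurately.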
Measurement Flags
Measurement flags, also called identification flags, identify whether a specific packet is used
to measure packet loss or delay.
A specific bit in the IPv4 packet header can be specified as a measurement flag for packet loss
or delay measurement.
l The third to seventh bits in the ToS field are seldom used in actual applications. These
bits, if available, can be used as measurement flags for service packets.
l Bit 0 in the Flags field is reserved and can be directly used as a measurement flag.
Figure 2-26 shows the possible measurement flags in the IPv4 packet header.
Bits 0-15                                Bits 16-31
Version | IHL | Type of Service          Total Length
Identification                           Flags | Fragment Offset
Time to Live | Protocol                  Header Checksum
Source Address
Destination Address
Options | Padding
If two or more bits in the IPv4 packet header have not been planned for other purposes, they
can be used for packet loss and delay measurement at the same time. If only one bit in the
IPv4 packet header has not been planned, it can be used for either packet loss or delay
measurement in one IP FPM instance.
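Setting and reading a measurement flag can be sketched on a raw IPv4 header as follows. The function names are hypothetical. The sketch assumes the bit numbering of RFC 791, in which bit 0 is the most significant bit of a field, so ToS bits 3 to 7 are the five least significant bits of the second header byte and Flags bit 0 is the most significant bit of the seventh header byte. It does not recompute the header checksum, which a real implementation would have to do.

```python
def set_tos_flag(header: bytearray, bit: int, value: bool) -> None:
    """Set or clear ToS bit `bit` (3..7, numbered from the most significant bit)."""
    if not 3 <= bit <= 7:
        raise ValueError("only ToS bits 3 to 7 are used as measurement flags")
    mask = 0x80 >> bit
    header[1] = (header[1] | mask) if value else (header[1] & ~mask & 0xFF)


def get_tos_flag(header: bytes, bit: int) -> bool:
    """Read ToS bit `bit`, numbered from the most significant bit."""
    return bool(header[1] & (0x80 >> bit))


def set_reserved_flag(header: bytearray, value: bool) -> None:
    """Set or clear bit 0 of the Flags field (the reserved bit)."""
    header[6] = (header[6] | 0x80) if value else (header[6] & 0x7F)


header = bytearray(20)
header[0] = 0x45                 # version 4, header length 5 words
set_tos_flag(header, 6, True)    # use ToS bit 6 as the loss measurement flag
```

With two free bits, one can carry the loss measurement flag and the other the delay measurement flag at the same time; with a single free bit, one IP FPM instance has to choose one of the two.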
Function Overview
IP Flow Performance Measurement (FPM) measures multipoint-to-multipoint (MP2MP)
service flows to obtain the packet loss rate and delay.
Three IP FPM types are available: proactive performance statistics, on-demand performance
statistics, and hop-by-hop performance statistics. Table 2-9 lists the usage scenarios for these
IP FPM types.
Implementation
A bearer network has boundaries through which traffic enters
and leaves. On the IP/MPLS network shown in Figure 2-27, the number of packets entering
the network in the ingress direction on the ATN is PI, and the number of packets leaving the
network in the egress direction on the ATN is PE.
[Figure 2-27: packets entering and leaving an IP/MPLS network through ATNs at its boundary]
Over a specified period, the difference between the number of packets entering the network
and the number of packets leaving the network is the packet loss.
l The number of packets entering the network is the sum of all packets moving in the
ingress direction: PI = PI(1) + PI(2) + PI(3)
l The number of packets leaving the network is the sum of all packets moving in the
egress direction: PE = PE(1) + PE(2) + PE(3)
Over a specified period, the difference between the time a service flow enters the network and
the time the service flow leaves the network is the delay.
Packet Loss Measurement
Packet loss measurement calculates the difference between the volume of traffic entering the
network and the volume of traffic leaving the network over a specified period.
Figure 2-28 shows a typical network where end-to-end performance can be measured.
Service packets enter the network from ATNA and leave the network from ATNB.
[Figure 2-28: service packets at the two measurement points, with the loss measurement flag alternating between 1 and 0 in successive periods, plotted against a time axis marked t0 to t5]
1. t0: ATNA sets the loss measurement flag to 1 for incoming service packets in the first
period and starts counting all service packets with the loss measurement flag as 1.
2. t1: ATNB starts receiving service packets with the loss measurement flag as 1 in the first
period and starts counting these service packets.
3. t2: ATNA finishes counting the incoming service packets with the loss measurement flag
as 1 in the first period and calculates the total number of these service packets PI1.
ATNA then sets the loss measurement flag to 0 for incoming service packets in the
second period and starts counting all service packets with the loss measurement flag as 0.
4. t3: ATNB finishes receiving service packets with the loss measurement flag as 1 in the
first period and calculates the total number of these service packets PE2.
NOTE
ATNB starts receiving service packets with the loss measurement flag as 1 from t1. At t3, the internal
timer has run for the specified period. ATNB determines that it has finished receiving service packets
with the loss measurement flag as 1 in this period based on the elapsed period, not on whether service
packets with a loss measurement flag other than 1 have been received. Therefore, service packet
measurement is not affected by packet disorder. This mechanism ensures that service packets in each
period are correctly collected.
5. t4: ATNA sets the loss measurement flag to 1 for incoming service packets in the third
period and starts counting all service packets with the loss measurement flag as 1.
6. t5: ATNB starts receiving service packets with the loss measurement flag as 1 in the third
period and starts counting these service packets.
ATNB can obtain the number of received service packets with the loss measurement flag as 1
in the first period any time between t3 and t5. The formula is LostPacket = PI1 - PE2.
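The alternating-flag procedure in steps 1 to 6 can be simulated with a small Python sketch. It is a simplification under stated assumptions: the period length, the function names, and the sample timestamps are hypothetical. The receiver here attributes a packet to the previous period when its flag does not match the current period, which assumes the transit delay is shorter than one period.

```python
from collections import Counter

PERIOD_MS = 1000  # hypothetical measurement period


def flag_for_period(index: int) -> int:
    """Period 0 (the first period in the text) uses flag 1, the next 0, and so on."""
    return 1 if index % 2 == 0 else 0


def mark_and_count(send_times_ms):
    """Sender side: color each packet by its period and count per period."""
    counts = Counter()
    marked = []
    for t in send_times_ms:
        index = t // PERIOD_MS
        counts[index] += 1
        marked.append((t, flag_for_period(index)))
    return counts, marked


def count_received(arrivals):
    """Receiver side: attribute each (arrival_ms, flag) pair to a period."""
    counts = Counter()
    for arrival, flag in arrivals:
        index = arrival // PERIOD_MS
        if flag_for_period(index) != flag:
            index -= 1  # late packet that still carries the previous flag
        counts[index] += 1
    return counts


send_times = [100, 300, 500, 950, 1100, 1400, 1990]
sent, marked = mark_and_count(send_times)
# Simulate a 100 ms transit delay and drop the packet sent at t=500.
arrivals = [(t + 100, flag) for t, flag in marked if t != 500]
received = count_received(arrivals)
# Per-period loss, the analogue of LostPacket in the text.
lost_per_period = {p: sent[p] - received[p] for p in sent}
```

The packets sent at 950 ms and 1990 ms arrive after the period boundary but are still counted in the period they were colored for, which is why packets that cross a boundary do not distort the result.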
Delay Measurement
Delay measurement calculates the difference between the time a service flow enters the
network and the time the service flow leaves the network over a specified period.
In IP FPM, delay measurement is implemented for sampled service packets by recording the
time the packets are sent and the time the packets are received.
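[Figure: delay measurement between ATNA and ATNB across the IP/MPLS network. Sampled packets are timestamped at t1 and t2 in one direction and at t3 and t4 in the other direction.]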
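The following Python sketch shows how delays could be derived from the four timestamps. It rests on assumptions that the text does not state explicitly: t1 is the sending time on ATNA, t2 the receiving time on ATNB, t3 the sending time on ATNB, and t4 the receiving time on ATNA, and the two-way delay is taken as the sum of the two one-way values. The function names and sample values are hypothetical.

```python
def one_way_delays(t1, t2, t3, t4):
    """Return (ATNA->ATNB delay, ATNB->ATNA delay) from raw timestamps."""
    return t2 - t1, t4 - t3


def two_way_delay(t1, t2, t3, t4):
    """Sum of both directions; a constant clock offset cancels out."""
    forward, backward = one_way_delays(t1, t2, t3, t4)
    return forward + backward


# True transit times: 2 ms forward, 3 ms backward.
# ATNB's clock is assumed to run 5 ms ahead of ATNA's clock.
offset = 5
t1 = 100                 # ATNA clock
t2 = 100 + 2 + offset    # ATNB clock
t3 = 200 + offset        # ATNB clock
t4 = 200 + 3             # ATNA clock

forward, backward = one_way_delays(t1, t2, t3, t4)
round_trip = two_way_delay(t1, t2, t3, t4)
```

With unsynchronized clocks the one-way values are wrong (here 7 ms and -2 ms), while the two-way value is still 5 ms. This matches the deployment guidance that one-way delay measurement needs high-precision clock synchronization.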
2.4.3 Applications
IP datacom networks, as the mainstream of datacom networks, are large in scale and provide
various access modes. To maximize carriers' return on investment, reduce network
construction costs, and evolve the existing network smoothly into a Long Term Evolution
(LTE) network, an IP RAN solution is introduced.
IP RANs require performance measurement for SLA compliance and routine O&M
performance management. As the bearer network quality (delay, jitter, and packet loss) affects
the radio service quality, the bearer network department must provide optimal methods to
detect the network operating status. In addition, if the service quality deteriorates, the bearer
network must be able to provide its own performance data to help locate faults.
IP RAN provides a variety of solutions. The following section describes the application of IP
FPM end-to-end performance measurement in HVPN, L2+L3 mixed VPN, and L3 dual-
homing scenarios.
HVPN Scenarios
Figure 2-30 shows HVPN networking. Table 2-10 describes how to deploy IP FPM in an
HVPN scenario.
[Figure 2-30: HVPN networking (L3VPN) with eNodeB, AGG1, AGG2, RSG1, RSG2, RNC, and SGW/MME, showing the TLP and MCP positions and the IP FPM service flow between the TX end and the RX end]
TLP deployment:
Performance measurement can be implemented for E2E services and local switching services on IP RANs. On the network shown in Figure 2-30:
l For E2E services, configure IP FPM on both ends (CSG, RSG1, and RSG2) of the Layer 3 service flow; deploy TLPs on the UNIs of the CSG, RSG1, and RSG2 and bind the TLPs to the access-side interfaces (TLPs must be bound to the outbound interfaces on both RSGs).
l For local switching services, deploy TLPs only on the CSG sub-interfaces connecting to the base stations.
DCP deployment:
Configure the CSG, RSG1, and RSG2 as DCPs to send measurement data to the MCP.
MCP deployment:
l If routes are reachable between the access and aggregation networks, deploy the MCP on an RSG.
l If routes are unreachable between the access and aggregation networks, deploy the MCP on an AGG.
On the network in Figure 2-30, deploy the MCP on RSG1.
Clock deployment:
Configure the Network Time Protocol (NTP) or 1588v2 so that all device clocks can be synchronized.
l To implement IP FPM one-way delay measurement, you must configure 1588v2 for clock synchronization. If not, the measurement is incorrect.
l To implement IP FPM two-way delay measurement or packet loss measurement, configure either NTP or 1588v2 for clock synchronization.
1588v2 implements higher-precision clock synchronization than NTP. Using 1588v2 is recommended.
In E2E VPN and native IP+L3VPN scenarios, deploy IP FPM in the same manner as that in
HVPN scenarios.
[Figure 2-31: L2VPN and L3VPN networking with eNodeB, AGG1, AGG2, RSG1, RSG2, RNC, and SGW/MME, showing the TLP and MCP positions and the IP FPM service flow between the TX end and the RX end]
TLP deployment:
For E2E services, deploy TLPs on the AC interfaces that carry services (Layer 2 user interfaces on the CSG and Layer 2/Layer 3 user interfaces on RSGs). Configure flow characteristics based on the 5-tuple, start measurement and counting, and send measurement data to the MCP through protocol packets.
DCP deployment:
Configure the CSG, RSG1, and RSG2 as DCPs to send measurement data to the MCP.
MCP deployment:
Routes are unreachable between the access and aggregation networks, and the CSG does not have routes to RSGs. Therefore, deploy the MCP on an AGG. On the network in Figure 2-31, deploy the MCP on AGG1.
In packet loss measurement, ARP request messages are filtered out and unknown unicast
traffic is also filtered out based on the source IP addresses. The causes are as follows:
L3 Dual-homing Scenarios
Figure 2-32 shows an L3 dual-homing scenario in which a NodeB is dual-homed to two
CSGs and the NodeB's gateway address is the VRRP backup group's virtual IP address.
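[Figure 2-32: L3 dual-homing networking. The NodeB is dual-homed to CSG1 and CSG2 (VRRP over VLANIF); CSG1, AGG1, and RSG1 form one path and CSG2, AGG2, and RSG2 the other, toward the RNC and SGW/MME. Links are numbered 1 to 5, and the figure marks the TLP and MCP positions, the service data and IP FPM packet paths, and the IP FPM service flow between the TX end and the RX end.]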
In normal situations (no link or node failure between the RSGs and CSGs), traffic travels
along the primary path CSG1 -> AGG1 -> RSG1.
l If the link between RSG1 and S-GW/MME fails, traffic switches to the path CSG1 ->
AGG1 -> RSG1 -> RSG2.
l If RSG1 fails, the VPN routes advertised by RSG1 become invalid, and a master/backup
VRRP switchover occurs. As a result, traffic switches to the path CSG1 -> CSG2 ->
AGG2 -> RSG2.
l If CSG1 or the link between CSG1 and the NodeB fails, a master/backup VRRP
switchover occurs, and traffic switches to CSG2.
The upstream traffic enters the network from links 1 and 5 and leaves the network from links
2 and 4. Link 3 is a transit path. The measurement result can be obtained by comparing the
number of packets sent through links 1 and 5 and received through links 2 and 4. The
downstream traffic enters the network from links 2 and 4 and leaves the network from links 1
and 5, and the measurement result can be obtained in the same way. Table 2-12 lists how to
deploy IP FPM in an L3 dual-homing scenario.
TLP deployment:
Deploy TLPs on the UNIs of CSG1, CSG2, RSG1, and RSG2, and bind the TLPs to the UNIs.
DCP deployment:
Configure CSG1, CSG2, RSG1, and RSG2 as DCPs to send measurement data to the MCP.
MCP deployment:
If routes are reachable between the access and aggregation networks, deploy the MCP on an RSG. If routes are unreachable between the access and aggregation networks, deploy the MCP on an AGG. On the network in Figure 2-32, deploy the MCP on RSG1.
DCPs send measurement data collected by the TLPs to the MCP. The MCP uses the
synchronization method to calculate the ingress and egress data to obtain the measurement
result. The measurement is independent of topology changes, so deployment is easy.
Summary
IP FPM offers the following benefits:
l Supports E2E performance measurement on large-scale networks.
l Supports service-based packet loss and delay measurement with high precision.
l Applies to various networking scenarios.
In end-to-end performance measurement, only the traffic entering and leaving the network is
measured. This measurement reflects only the quality of the entire network. If a network fault
occurs, end-to-end performance measurement cannot help locate the fault. To locate the fault,
IP FPM provides hop-by-hop performance measurement.
On the mobile backbone IP RAN shown in Figure 2-33, multiple NEs are deployed, and
services are complex. Once a network fault occurs, it is difficult to locate the fault.
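[Figure 2-33: mobile backbone IP RAN with eNodeB, AGG2, RSG2, and SGW/MME, showing TLPs (TLP0-2, TLP1-2, TLP2-2, TLP3-2) deployed along the IP FPM measurement path]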
IP FPM can function with the NMS for fault locating. The process is as follows:
l The NMS provides visualized service paths for target flows by segmenting the service
forwarding path into multiple closed hops and delivering these closed hops to the DCPs
and MCP.
l The DCPs report the hop-by-hop measurement data to the MCP.
l The MCP calculates the packet loss and delay performance for each hop.
l The NMS displays the real-time data of each hop using the MIB, helping locate the fault.
The following table describes how to deploy IP FPM on the network shown in Figure 2-33.
DCP deployment:
Configure all devices that have TLPs deployed as DCPs to send measurement data to the MCP.
MCP deployment:
If routes are reachable between the access and aggregation networks, deploy the MCP on an RSG. If routes are unreachable between the access and aggregation networks, deploy the MCP on an AGG. On the network in Figure 2-33, deploy the MCP on RSG1.
Clock deployment:
Configure the Network Time Protocol (NTP) or 1588v2 so that all device clocks can be synchronized.
In some situations, an accurate path diagram cannot be obtained, so ACHs cannot be formed.
In this case, some key points on the path can be selected to form measurement sections
through which traffic passes, as shown in Figure 2-35. ACHs can be divided based on these
measurement sections.
[Figure 2-35: measurement sections formed from key points on the path]
In hop-by-hop performance measurement, the MCP measures the packet loss and delay based
on ACHs. The smaller the scope an ACH covers, the more accurately a fault can be located.
ACH division helps narrow a fault down to a local area, a direct link, or the inbound and
outbound interfaces on a device.
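How per-hop results narrow down a fault can be sketched in Python. This is an illustration only: it assumes a simple linear path in which each pair of adjacent TLPs bounds one hop, and the TLP names and counter values are hypothetical.

```python
def per_hop_loss(tlp_counts):
    """Compute packet loss for each hop bounded by two adjacent TLPs.

    tlp_counts: ordered list of (tlp_name, packets_counted) along the path,
    all counted over the same measurement period.
    """
    hops = []
    for (up_name, up_count), (down_name, down_count) in zip(tlp_counts, tlp_counts[1:]):
        hops.append((up_name, down_name, up_count - down_count))
    return hops


def worst_hop(hops):
    """Return the hop with the largest packet loss."""
    return max(hops, key=lambda hop: hop[2])


# Hypothetical counters reported by the DCPs for one period.
counts = [("TLP-A", 1000), ("TLP-B", 1000), ("TLP-C", 990), ("TLP-D", 990)]
hops = per_hop_loss(counts)
suspect = worst_hop(hops)
```

End-to-end measurement would only report 10 lost packets between TLP-A and TLP-D. The per-hop view attributes the loss to the hop between TLP-B and TLP-C.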
DCP: Data Collecting Point, a device that manages TLPs, collects statistics generated by TLPs, and reports the statistics to the MCP in the IP FPM model.
TLP: Target Logical Port, an interface that compiles statistics and outputs data in the IP FPM model.
2.5 NQA
Definition
Network Quality Analysis (NQA) is a feature provided by the device. Independent of lower-
layer hardware, NQA functions above the link layer to measure the performance of protocols
running at the network layer, transport layer, and application layer.
Purpose
The device provides NQA to help carriers to monitor network QoS in real time and locate the
faults occurring on the network.
To visualize the quality of network services and allow users to check whether the quality of
network services meets requirements, carriers must take the following measures:
l Provide statistics on the device to indicate the quality of network services.
Owing to the statistical multiplexing and traffic bursts of IP networks, network quality can
only be described by statistics. Therefore, carriers need to provide relevant statistical
parameters such as delay, jitter, and packet loss ratio at the equipment side.
l Monitor the quality of network services by deploying probe devices.
As the scale of networks continuously increases, if dedicated probe devices are used, for
example, the third-party probe device Brix, more and more probe devices are needed.
This increases carriers' expenditure.
The device provides NQA to meet the preceding requirements. Through the network quality
test function integrated in a device, NQA can accurately test the operating status of the
network and collect statistics. In addition, because dedicated probe devices are not required,
NQA provided by the device also reduces carriers' costs.
NQA measures the performance of different protocols running on the network. In this way,
carriers can collect operation indexes of networks in real time, such as the delay in setting up
a TCP connection, the file transmission rate, and the delay in setting up an FTP connection.
The Ping operation is a traditional method to monitor the network quality. Compared with
information collected through NQA, the information collected through the Ping operation is
limited. The following shows the differences between NQA and Ping in the aspect of
functions and configurations.
Function
NQA: NQA supports not only Internet Control Message Protocol (ICMP) tests but also service availability tests (such as TCP, UDP, FTP, SNMP, Traceroute, and LSP Ping/Traceroute services). Moreover, NQA can be used to test the response time of each service and the jitter time on the network through Jitter tests. By default, NQA supports a maximum of 50 test instances.
Ping: Ping is based on ICMP and is used only to test the round-trip time (RTT) of a datagram between the source and the destination and to test the reachability of the destination. A user can initiate only one Ping operation at a time.
Configuration
NQA: For NQA, you can run commands on the client to view NQA test results. Note the following:
l In NQA, parameters of operations can be set and tests can be started through the Network Management System (NMS). You can obtain statistics by viewing the output test results and history tables.
l In most test instances, you only need to configure NQA clients. Configuring NQA servers is necessary for FTP, TCP, UDP, and UDP Jitter test instances.
l An NQA server responds to the test request from a client through the monitoring function. After being configured with the corresponding destination IP address and port number, the NQA server can respond to the test request. The IP address and port number specified in the monitoring service on the server must be consistent with those configured on the client.
Ping: For Ping, you need to run the ping command on the Console to test the reachability of a specified IP address. The RTT or timeout period of every packet can be displayed in real time.
Scheduling mode
NQA: NQA supports test instance scheduling, which avoids running tests concurrently and hence reduces the burden on a device.
l NQA supports the configuration of different start and end times for a single test instance.
l NQA supports three modes of starting tests: immediate, delayed, and periodical.
l NQA supports five modes of ending tests: automatic, immediate, delayed, timely, and ending tests when the life cycle of the test expires.
l When several tasks are performed at the same time, a device reasonably arranges start times and test intervals.
Ping: Ping only supports command line delivery.
2.5.2 Principles
In NQA, the two test ends are called the NQA client and the NQA server. An NQA test is
initiated by the NQA client. Users can configure test instances through command lines or the
NMS. NQA then places the different types of test instances into test queues for scheduling.
When starting an NQA test instance, you can start it immediately, at a scheduled time, or
after a delay. A test packet is generated according to the test type when the timer expires. If
the size specified for the test packet is smaller than the minimum size of the protocol packet,
the test packet is generated and sent with the minimum size defined for the protocol.
After the test instance starts, the destination returns a response packet, and carriers can
determine the operating status of the protocol by analyzing it. Before the test packet is sent to
the destination, it is marked with a timestamp based on the local system time. After receiving
the test packet, the destination sends a response packet to the source. The source then marks
the received response packet with a timestamp based on the current local system time and
calculates the RTT of the packet from the sending and receiving times.
NOTE
For a Jitter test instance, both the source and the destination mark the packet with timestamps. The
destination marks the packet based on its local system time when it receives the packet and when it
returns the response packet. In this way, the source can calculate the jitter of the packet.
Carriers can learn the network operating status by viewing test results.
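[Figure: jitter test timeline between ATN A and ATN B across the network, with timestamps t1, t2, and t4 on ATN A and t2', t3', and t4' on ATN B]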
A larger absolute jitter value indicates poorer link quality, no matter whether the jitter
value is positive or negative.
A UDP jitter test can also measure the packet loss rate unidirectionally.
[Figure 2-37: UDP Jitter Request and UDP Jitter Reply exchanged between ATN A and ATN B across the network, with timestamps t1 and t4 on ATN A]
On the network shown in Figure 2-37, Server (ATNB) collects statistics about received
packets. If Client (ATNA) finds that the number of packets it sent is different from the
number of packets it received, Client (ATNA) initiates a unidirectional packet loss query to
learn the number of packets received by Server (ATNB).
If Client (ATNA) does not receive any query reply, Client (ATNA) records Packet Loss
Unknown.
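The query logic above can be sketched in Python. The function name and the result layout are hypothetical. The split of the loss into two directions is an assumption based on the server replying to every request it receives; the text only states that the client learns the server's received count.

```python
from typing import Optional


def unidirectional_loss(sent: int, received_by_client: int,
                        received_by_server: Optional[int]) -> dict:
    """Classify packet loss per direction for one UDP jitter test.

    received_by_server is None when the loss query gets no reply.
    """
    if sent == received_by_client:
        return {"status": "no loss"}  # no query is needed
    if received_by_server is None:
        return {"status": "Packet Loss Unknown"}
    return {
        "status": "measured",
        # Requests that never reached the server.
        "client_to_server": sent - received_by_server,
        # Replies that never reached the client (assumes one reply per request).
        "server_to_client": received_by_server - received_by_client,
    }


result = unidirectional_loss(sent=100, received_by_client=95, received_by_server=98)
unknown = unidirectional_loss(sent=100, received_by_client=95, received_by_server=None)
```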
NOTE
By default, UDP jitter (hardware-based) is not enabled. To implement a hardware-based UDP jitter test,
enable the interface board to send packets.
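[Figure: ICMP jitter test timeline between ATN A and ATN B across the network, with timestamps t1, t2, and t4 on ATN A and t2', t3', and t4' on ATN B]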
1. Client (ATNA) adds timestamp t1 to an ICMP packet and sends the packet to Destination
(ATNB).
2. Upon receipt of the packet, Destination (ATNB) adds timestamp t1' to the packet.
3. After processing the packet, Destination (ATNB) adds timestamp t2' to the packet and
sends it back to Client (ATNA).
4. Upon receipt of the packet, Client (ATNA) adds timestamp t2 to the packet.
The following can be calculated based on the timestamp information in the packets received
by Client (ATNA):
l Maximum, minimum, and average jitter of the packets from Client (ATNA) to
Destination (ATNB) and from Destination (ATNB) to Client (ATNA)
l Maximum unidirectional delay from Destination (ATNB) to Client (ATNA) and from
Client (ATNA) to Destination (ATNB)
l Source-to-destination jitter = (t3' - t1') - (t3 - t1)
A larger absolute jitter value indicates poorer link quality, no matter whether the jitter
value is positive or negative.
l Destination-to-source jitter = (t4 - t2) - (t4' - t2')
A larger absolute jitter value indicates poorer link quality, no matter whether the jitter
value is positive or negative.
RTT = (t2 - t1) - (t2' - t1')
If the RTT is longer than the specified timeout period, the network is congested and ICMP
packets will be counted as lost packets.
Packet loss rate = Number of lost ICMP packets/Number of sent ICMP packets
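The RTT and jitter formulas above can be checked with a short Python sketch. It assumes that t3, t3', t4', and t4 are the timestamps of a second probe packet, corresponding to t1, t1', t2', and t2 of the first. The text does not define them explicitly. The class name, function names, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    sent: float      # t1 (or t3): client sends the packet
    dst_recv: float  # t1' (or t3'): destination receives the packet
    dst_sent: float  # t2' (or t4'): destination sends the packet back
    recv: float      # t2 (or t4): client receives the packet


def rtt(p: Probe) -> float:
    """RTT = (t2 - t1) - (t2' - t1'), excluding destination processing time."""
    return (p.recv - p.sent) - (p.dst_sent - p.dst_recv)


def src_to_dst_jitter(first: Probe, second: Probe) -> float:
    """(t3' - t1') - (t3 - t1): receive interval minus send interval."""
    return (second.dst_recv - first.dst_recv) - (second.sent - first.sent)


def dst_to_src_jitter(first: Probe, second: Probe) -> float:
    """(t4 - t2) - (t4' - t2')."""
    return (second.recv - first.recv) - (second.dst_sent - first.dst_sent)


# Hypothetical timestamps in milliseconds.
first = Probe(sent=0, dst_recv=10, dst_sent=12, recv=25)
second = Probe(sent=20, dst_recv=33, dst_sent=34, recv=45)

first_rtt = rtt(first)
forward_jitter = src_to_dst_jitter(first, second)
backward_jitter = dst_to_src_jitter(first, second)
```

Because each jitter value is a difference of intervals measured on a single clock, it does not depend on the two devices having synchronized clocks.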
In an ICMP jitter test, you can set the number of packets to be sent consecutively in a single
test instance to simulate a certain type of traffic.
The following items can be calculated based on the information in the packets received by the
client:
l Maximum, minimum, and average jitters of the packets from the client to the server and
from the server to the client
l Maximum unidirectional delay from the server to the client or from the client to the
server
The NQA path jitter test first identifies the IP address of each hop from the client to the server
by initiating a trace test, and then initiates an ICMP jitter test from the client to obtain the
jitter value of each hop along the path. Figure 2-41 shows the process of a path jitter test:
1. ATN A initiates a trace test to obtain the IP address of each hop along the path to ATN C.
2. ATN A initiates an ICMP jitter test to the IP address of each hop to obtain the jitter value
of each hop.
l Time to set up a control connection: It is the time taken by the client to set up a TCP
control connection with the FTP server through three-way handshake and the time taken
to interchange signals through the control connection.
l Time to set up a data transmission connection: It is the time taken by the client to
download a specified file from the FTP server or upload a specified file to the FTP server
through the data transmission connection.
Through an FTP test, the following can be calculated based on the information in the packets
received by the client:
These statistics can clearly reflect the performance of the FTP protocol over the network.
NOTE
l At present, the NQA FTP test only supports data transmission in proactive and ASCII modes.
Anonymous users cannot be used for the test.
[Figure: UDP test between ATN A (client) and ATN B (UDP server) across the network. A UDP packet is sent at t1 and returned at t2.]
1. Client (ATNA) adds timestamp t1 to a UDP packet and forwards the packet to UDP
Server (ATNB).
2. Upon receipt of the packet, UDP Server (ATNB) directly forwards it back to Client
(ATNA).
3. Upon receipt of the packet, Client (ATNA) adds timestamp t2 to the packet. Client
(ATNA) then calculates the time used for communication between itself and UDP Server
(ATNB) by subtracting the time at which it sends the UDP packet (t1) from the time at
which it receives the UDP packet (t2). The calculation result is called delay, a
performance counter clearly reflecting UDP performance.
If the delay is longer than the specified timeout period, the network is congested and UDP
packets will be counted as lost packets.
A UDP test can also measure the packet loss rate using the following formula:
Packet loss rate = Number of lost UDP packets/Number of sent UDP packets
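The delay and packet loss rate calculation can be sketched in Python. The function name, the sample data, and the use of None for a missing reply are assumptions. The rule that a delay longer than the timeout period counts as a lost packet comes from the text.

```python
def udp_test_summary(samples, timeout):
    """Summarize one UDP test.

    samples: list of (t1, t2) pairs; t2 is None when no reply arrived.
    timeout: a reply slower than this is counted as a lost packet.
    """
    delays = []
    lost = 0
    for t1, t2 in samples:
        if t2 is None or (t2 - t1) > timeout:
            lost += 1
        else:
            delays.append(t2 - t1)  # delay = t2 - t1
    return {
        "sent": len(samples),
        "lost": lost,
        # Packet loss rate = lost UDP packets / sent UDP packets
        "loss_rate": lost / len(samples) if samples else 0.0,
        "avg_delay": sum(delays) / len(delays) if delays else None,
    }


# Hypothetical timestamps in milliseconds: one missing reply, one late reply.
summary = udp_test_summary([(0, 5), (10, 18), (20, None), (30, 95)], timeout=50)
```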
[Figure: ICMP test between ATN A and ATN B across the network. An ICMP Echo Request is sent at t1, and the ICMP Echo Reply is received at t2.]
1. Client (ATNA) adds timestamp t1 to an ICMP Echo Request packet and sends the packet
to Destination (ATNB).
2. Upon receipt of the packet, Destination (ATNB) responds to Client (ATNA) with an
ICMP Echo Reply packet.
3. Upon receipt of the packet, Client (ATNA) adds timestamp t2 to the packet. Client
(ATNA) then calculates the time used for communication between itself and Destination
(ATNB) by subtracting the time at which it sends the ICMP Echo Request packet (t1)
from the time at which it receives the ICMP Echo Reply packet (t2). The calculation
result is called delay, a performance counter clearly reflecting network status.
If the delay is longer than the specified timeout period, the network is congested and ICMP
packets will be counted as lost packets.
An ICMP test can also measure the packet loss rate using the following formula:
Packet loss rate = Number of lost ICMP packets/Number of sent ICMP packets
NOTE
An ICMP test is usually conducted to check the connectivity. However, it cannot accurately test the link
delay. Therefore, to test link performance, you are advised to conduct an NQA jitter or ICMP jitter test
with hardware-based packet sending enabled.
[Figure: IP network connecting ATN A, ATN B, ATN C, and ATN D]
1. In an LSP ping test, a UDP MPLS Echo Request packet is constructed first. The
destination IP field is filled with an IP address on the network segment 127.0.0.0/8. The
client searches for the LDP LSP based on the specified remote LSR ID and forwards the
packet through the LDP LSP in the MPLS domain. For a TE LSP, the packet can be sent
from a tunnel interface and forwarded along a specified CR-LSP.
2. The egress monitors port 3503 and returns an MPLS Echo Reply packet.
The client then calculates the time for the communication between the client and the
egress by subtracting the time at which the client sends the MPLS Echo Request packet
from the time at which the client receives the MPLS Echo Reply packet. This mechanism
helps administrators get a snapshot of the MPLS network status.
[Figure: MPLS backbone with PE-A, P, and PE-B. A PW connects the NodeB side (VLAN1) and the RNC side (VLAN2).]
1. In an LSP trace test, a UDP MPLS Echo Request packet is constructed first. The
destination IP field is filled with an IP address on the network segment 127.0.0.0/8. The
client searches for the LDP LSP based on the specified remote LSR ID. The Echo
Request packet includes a Downstream Mapping TLV that carries information about
the downstream node of the current LSP node, such as the IP address of the next hop and
the outgoing label. The TTL value of the first Trace Echo Request packet is 1.
2. The client forwards the Echo Request packet through the specified LDP LSP in the
MPLS domain. When the TTL expires after the packet reaches the first node on the LSP
path, the node returns an MPLS Echo Reply message.
3. The client continues sending Echo Request packets, with the TTL value increased by 1
each time, until all LSRs on the LSP return MPLS Echo Reply messages.
After the client receives response messages from the LSRs, it displays and collects
information about the LSP forwarding path and devices along the path. This mechanism
helps administrators get a snapshot of the LSP forwarding path from the source host to the
destination host and collect information about devices along the path.
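The TTL-driven loop in steps 1 to 3 can be mimicked with a small Python simulation. It does not send any packets. The LSP is modeled as an ordered list of hypothetical LSR names, and the node at which the TTL expires is assumed to be the one that replies.

```python
def lsp_trace(lsp_path, max_ttl=30):
    """Simulate an LSP trace over an ordered list of downstream LSRs.

    lsp_path: LSR names after the ingress, ending with the egress.
    Returns a list of (ttl, replying_lsr) pairs.
    """
    replies = []
    for ttl in range(1, max_ttl + 1):
        if ttl > len(lsp_path):
            break
        # The Echo Request with this TTL expires at the ttl-th downstream node,
        # which returns an MPLS Echo Reply.
        replies.append((ttl, lsp_path[ttl - 1]))
        if ttl == len(lsp_path):
            break  # the egress has replied, so the trace is complete
    return replies


# Hypothetical three-hop LSP.
trace = lsp_trace(["P1", "P2", "PE-B"])
```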
[Figures: MPLS backbone with PE-A, P, and PE-B connected by a PW between the NodeB and the RNC (shown with and without VLAN1 and VLAN2), and MEP1, MEP2, and MEP3 exchanging LBM and LBR messages]
When one host sends a large number of IP packets to another host, the IP packets are
fragmented according to the maximum acceptable packet length. This affects forwarding
efficiency. It is preferable that these packets be of the largest size that does not require
fragmentation anywhere along the path from the client to the server. This packet size is
referred to as the path MTU.
Usually, the path MTU is equal to the smallest MTU among the hops along the path.
As shown in Figure 2-54, the MTU value between ATN A and ATN B is 100 bytes and
between ATN B and ATN C is 200 bytes. Therefore, the path MTU value between ATN A and
ATN C is 100 bytes.
An NQA path MTU test is initiated from the client to the server. It estimates the path MTU by
sending probe packets whose size increases in steps. Figure 2-54 shows the process of a
path MTU test:
1. ATN A sends an ICMP probe packet to ATN C, with the packet size set to the lower limit
of the range (the value is configurable, and the default value is 48 bytes).
2. When the first probe packet successfully reaches the destination, ATN A continues to
send ICMP probe packets to ATN C, increasing the size by a fixed step each time (the
step is configurable, and the default value is 10 bytes), until three consecutive packets
time out. This indicates that the size of the sent packets is greater than the path MTU.
3. ATN A sends a 48-byte detection packet to ATN C to check the connectivity of the
network. If the connectivity of the network is normal, the size of the last successful
probe packet before the timeout in step 2 is the path MTU.
NOTE
The packet header contains a Don't Fragment (DF) flag, indicating whether a packet can be fragmented.
The DF field should be set to 1, indicating that the device cannot fragment the packet.
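The probing procedure can be sketched in Python under stated assumptions. The probe callable stands in for sending an ICMP packet with the DF flag set and reports whether a reply arrived. The function names and the max_size safety bound are hypothetical. The sketch also assumes that the packet size keeps increasing on every probe, including after a timeout, which the text does not specify.

```python
def path_mtu(hop_mtus):
    """The path MTU is the smallest MTU among the hops along the path."""
    return min(hop_mtus)


def probe_path_mtu(probe, start=48, step=10, max_timeouts=3, max_size=9600):
    """Estimate the path MTU by probing with increasing packet sizes.

    probe(size) -> bool: True if a packet of that size (DF set) gets a reply.
    Returns the last successful size, or None if connectivity fails.
    """
    if not probe(start):
        return None
    last_ok = start
    timeouts = 0
    size = start + step
    while timeouts < max_timeouts and size <= max_size:
        if probe(size):
            last_ok = size
            timeouts = 0
        else:
            timeouts += 1
        size += step
    # Final connectivity check with the smallest packet size.
    if not probe(start):
        return None
    return last_ok


# Hops with MTUs of 100 and 200 bytes, as in the example above.
true_mtu = path_mtu([100, 200])
estimate = probe_path_mtu(lambda size: size <= true_mtu)
```

With the default 10-byte step, the estimate is the largest probed size that fits, so it can be up to one step smaller than the true path MTU. A smaller step trades more probes for a tighter estimate.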
1. A VSI and a MAC address are specified. The MAC address can be the bridge MAC
address of the peer PE on the VPLS network or the MAC address of the CE on the user
side. The test instance constructs an MPLS Echo Request packet, with the network
address 127.0.0.0/8 being added to the IP header as the destination IP address. Then, the
MAC table learned on a PW side is checked. If an entry corresponding to the destination
address is found in the MAC table, the MPLS Echo Request packet is forwarded to the
PW; otherwise, the MPLS Echo Request packet is broadcast throughout all PWs in the
specified VSI.
2. The PE monitors the port numbered 3503. When the port receives the MPLS Echo
Request packet, the PE node responds with an MPLS Echo Reply packet.
3. If the MAC address specified on the client is the MAC address of the CE side, the MPLS
Echo Request packet is not actually forwarded to the CE. Instead, the MAC address of
the requested CE is searched on the PE node to which the requested CE is connected. If
the MAC address of the CE exists on the PE node, the VPLS ping test is regarded as
successful; otherwise, the test is regarded as failed.
The client can then calculate the time for the communication between the client and the
egress by subtracting the time at which the client sends the MPLS Echo Request packet
from the time at which the client receives the MPLS Echo Reply packet. This can clearly
reflect the MPLS network status.
[Figure 2-55: VPLS ping test on a VPLS network. PE1, PE2, and PE3 are connected by PWs in VSI a2. A MAC table entry maps 0018-826D-4917 to port GE0/3/1.3.]
As shown in Figure 2-55, the process of initiating the VPLS ping test on PE1 is as follows:
1. A VPLS ping test instance is configured on PE1, with the MAC address of CE2, namely,
0018-826d-4917, as the destination MAC address. The entry corresponding to the
destination MAC address is not found in the MAC table on PE1. Consequently, the
MPLS Echo Request packet is broadcast to all PWs in the specified VSI.
2. Both PE2 and PE3 receive the MPLS Request packet. Because the destination MAC
address and the bridge MAC address on PE3 are different, and no entry corresponding to
the destination MAC address is found in the MAC table on the CE, according to the split
horizon principle, the Request packet is not forwarded.
3. The destination MAC address and the bridge MAC address on PE2 are different. An
entry corresponding to the destination MAC address, however, is found in the MAC
table on the CE. In this case, an MPLS Reply packet is returned to the client, indicating
that the VPLS ping test is successful.
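[Figure 2-56: VPLS trace test on a VPLS network. The sending PE, the receiving PE, and PE3 are connected by PWs in VSI a2. A MAC table entry maps 0018-826D-4917 to port GE0/3/1.3. Echo Request packets are sent with TTL 1 and TTL 2.]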
Figure 2-56 shows the process of a VPLS trace test initiated on the client PE.
1. A VPLS trace test instance is configured on the sending PE, with the MAC address of
CE2, namely, 0018-826d-4917, as its destination MAC address. An MPLS Echo Request
packet with the TTL being 1 is sent. Because no destination MAC address is found on
the sending PE, the MPLS Echo Request packet is broadcast throughout all PWs of the
specified VSI.
2. After receiving the MPLS Echo Request packet, PE3 checks the destination MAC
address. Because the destination MAC address and the bridge MAC address on PE3 are
different, and no entry corresponding to the destination MAC address is found in the
MAC table, when the TTL carried in the MPLS Echo Request packet expires, the packet
is not forwarded and an MPLS Echo Reply packet is returned to the sending PE.
3. The receiving PE receives the MPLS Echo Request packet. The destination MAC
address and the bridge MAC address on the receiving PE are different. An entry
corresponding to the destination MAC address, however, is found in the MAC
table on the CE. In this case, an MPLS Echo Reply packet is returned to the sending PE,
indicating that the VPLS ping test is successful.
VPLS PW ping or VPLS PW trace operations initiated through NQA commands are the same
as ping or trace operations initiated through common command lines in principle, and
additionally provide the scheduling and result collection mechanism and the threshold-
exceeding alarm function.
VPLS PW ping and VPLS PW trace comply with RFC 4379 and RFC 5085 in implementing
PW detection: MPLS echo packets that carry Forwarding Equivalence Class (FEC) fields are
encapsulated in tunnel mode and labeled with the Router Alert option; the Router Alert
function is enabled on the VPLS network. MPLS echo packets are transmitted between PEs to
detect PWs and are not sent to CEs, which means that the NQA test instance can be
configured only on PEs. If an NQA test instance is configured on a non-PE device, it cannot
be started because there is no VSI used for establishing PWs on the non-PE device and as a
result, the test result is "drop."
During the VPLS PW detection through an NQA test instance, threshold monitoring and
NQA test instance scheduling can be actively performed based on the specifications defined
in the IP SLA.
l A VPLS PW ping or VPLS PW trace test instance can be configured to actively monitor
VPLS services and detect faults in VPLS services. If the round-trip time (RTT) of a
packet exceeds the threshold, a connection is interrupted, or the response to a
request packet times out, the NQA test instance sends an SNMP trap message to the
Network Management System (NMS) for notification and collects statistics (such as
the RTT) for users to query.
l The scheduling function can be enabled to periodically schedule an NQA test instance to
detect a specific VPLS PW as required. When multiple NQA test instances are started
concurrently to detect multiple PWs, the scheduling function enables these NQA test
instances to operate separately and arranges the operation time properly so that as many
test instances as possible can be started for PW detection. The maximum number of test
instances that the system allows is calculated based on the traffic metric of test instances.
label and sends the MPLS echo request packet to the pre-defined destination along the
route.
1. Parameters for the test instance are configured on the sender PE. For example, on a
Martini VPLS network, the destination address, PW ID, and VSI name need to be
configured. Then, the NQA module constructs an MPLS echo request packet carrying
the timestamp in the private TLV field and the Router Alert option, and sends the MPLS
echo request packet to the public network based on the forwarding information in the
forwarding table. The initial and maximum TTL values of the MPLS echo request packet
to be sent can be specified, and the TTL value of the first MPLS echo request packet is 1.
2. After receiving the MPLS echo request packet, the intermediate PE checks whether the
TTL value of the packet expires. If so, the intermediate PE sends the MPLS echo request
packet to the CPU for processing. After that, the intermediate PE constructs an MPLS
echo reply packet and encapsulates the downstream TLV into the MPLS echo reply
packet. After obtaining the next hop information based on the packet's incoming label
and the inbound interface index, the intermediate PE sends the MPLS echo reply packet
to the sender PE.
3. After the sender PE receives the MPLS echo reply packet, it keeps sending MPLS echo
request packets with the TTL values being increased by 1 each time a packet is sent until
an MPLS echo request packet reaches the destination or the TTL value reaches the upper
limit.
4. Information about Ps is not displayed in the NQA test result by default. It can be
obtained by running the lsp-path full-display command.
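The TTL-driven loop in steps 1 to 3 can be sketched as a small simulation. This is an illustration only: the function name trace_pw, the hop names, and the default TTL limits are hypothetical and do not correspond to device commands or to the NQA implementation.

```python
def trace_pw(path, initial_ttl=1, max_ttl=30):
    """Simulate the trace loop: probe with TTL = initial_ttl, initial_ttl + 1, ...

    `path` lists the nodes after the sender PE in order; the last entry is
    the destination PE. Returns the (ttl, replying_node) pairs collected.
    """
    replies = []
    for ttl in range(initial_ttl, max_ttl + 1):
        if ttl > len(path):
            break  # no node left to answer; a real test would time out here
        node = path[ttl - 1]          # the TTL expires at this hop
        replies.append((ttl, node))   # that hop returns an MPLS echo reply
        if ttl == len(path):
            break                     # the request reached the destination
    return replies


# Hypothetical path: one P node, one intermediate PE, then the destination PE.
for ttl, node in trace_pw(["P1", "PE3", "PE2"]):
    print(ttl, node)
```

The loop ends either when the destination replies or when the TTL reaches the upper limit, which matches the two termination conditions in step 3.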
Overview
An NQA general flow test is a standard traffic testing method for evaluating network
performance and is in compliance with RFC 2544. This test can be used in various
networking scenarios that have different packet formats. NQA general flow tests are
conducted using UDP packets with source UDP port 0xC020 and destination UDP port 7. As
defined in RFC 2544, in a general flow test, test results can be written into a file and
proactively pushed to an FTP or SFTP server.
Before a customer performs a service cutover, an NQA general flow test helps the customer
evaluate whether the network performance counters meet the requirements in the design. An
NQA general flow test has the following advantages:
l Enables a device to send simulated service packets to itself before services are deployed
on the device.
Existing methods, unlike general flow tests, can only be used when services have been
deployed on networks. If no services are deployed, testers must be used to send and
receive test packets.
l Uses standard methods and procedures that comply with RFC 2544 so that NQA general
flow tests can be conducted on a network on which both Huawei and non-Huawei
devices are deployed.
Related Concepts
l Specified lower threshold bandwidth: a dynamic value that changes while a test is being
conducted. The initial value of the lower threshold bandwidth is configured.
l Specified upper threshold bandwidth: a dynamic value that changes while a test is being
conducted. The initial value of the upper threshold bandwidth is configured.
Test Procedure
A general flow test is an NQA test tool using UDP packets. Before a general flow test is
conducted, the push function must be configured on an initiator, and the initiator must have a
reachable route to the FTP or SFTP server. An initiator (NQA client) initiates a general flow
test and sends test packets to a reflector. After the test packets arrive at the reflector, the
reflector interchanges the source and destination addresses in the packets and loops the
packets to the initiator. The initiator counts the number of sent and received packets and
calculates indicators based on timestamps carried in the packets. After the general flow test is
complete, the initiator writes test results into a file and uploads the file onto an FTP or SFTP
server.
A general flow test measures the following counters:
l Throughput: maximum rate at which packets are sent without loss. The value is
expressed in kbit/s.
l Packet loss rate: percentage of discarded packets to all sent packets.
l Latency: consists of the bidirectional delay time and jitter calculated based on the
transmission and receipt timestamps carried in test packets. The transmission time in
each direction includes the time the forwarding devices process the test packet. The
value is expressed in microseconds.
These counters are calculated in separate tests. A counter must be specified before a test is
conducted.
On the network shown in Figure 2-57, UNI-A (User Network Interface A) is an initiator, and
UNI-B is a reflector. UNI-A and UNI-B conduct tests on the throughput, delay time, and
packet loss rate.
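[Figure 2-57: the initiator (UNI-A) sends test traffic to the reflector (UNI-B), which loops the traffic back; the initiator pushes test results to an FTP/SFTP server.]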
Throughput tests
NOTE
The packet encapsulation format and the percentage of valid payloads vary with the service scenario.
Therefore, the network throughput data obtained through general flow tests (RFC 2544 tests) differs.
This difference exists no matter whether the network throughput is measured using general flow tests or
any other test method.
Use an L3VPN scenario where two devices are connected through sub-interfaces as an example. The
RFC 2544 test rate is calculated based on the L1 rate, with both the inter-frame gap of Ethernet packets
(12 bytes by default) and the preamble (8 bytes by default) being considered. During L3VPN access,
two MPLS labels (4 bytes per label) need to be added, and a VLAN tag (4 bytes) needs to be added for
each public network sub-interface. Therefore, the scenario-affected theoretical network throughput can
be calculated using the following formula: (Test packet length + Interframe gap length + Preamble
length)/(Test packet length + Interframe gap length + Preamble length + Length of the two MPLS labels
added + Length of the VLAN tag added) x Link bandwidth. For example, if the test packet length is 64
bytes, the theoretical network throughput is calculated as follows: (64 + 12 + 8)/(64 + 12 + 8 + 8 + 4) x
Link bandwidth = 87.5% x Link bandwidth. The theoretical value is for reference only. In real-world
applications, the value may also be affected by other factors.
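The formula in this note can be checked with a short calculation. The sketch below only restates the arithmetic given above; the function and parameter names are illustrative, and the default lengths are the ones quoted in the note.

```python
def theoretical_throughput_ratio(packet_len, ifg=12, preamble=8,
                                 mpls_labels=2, label_len=4, vlan_tag_len=4):
    """Return the fraction of link bandwidth given by the formula in the note."""
    base = packet_len + ifg + preamble                  # bytes counted by the test
    overhead = mpls_labels * label_len + vlan_tag_len   # bytes added by the scenario
    return base / (base + overhead)


print(f"{theoretical_throughput_ratio(64):.1%}")  # 64-byte test packets: 87.5%
```

Larger test packets reduce the relative weight of the added labels and tag, so the ratio moves closer to 100% as the packet length grows.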
l Lossless mode: also called the share mode on an interface. The interface allows both
RFC 2544 traffic and non-RFC 2544 traffic to share bandwidth so that non-RFC 2544
traffic can be transmitted properly without being discarded or interrupted.
NOTE
In lossless mode, when bandwidth congestion occurs, RFC 2544 traffic and non-RFC 2544 traffic
affect each other.
l Lossy mode: This mode supports exclusive port occupation and priority-based blocking.
The test will interrupt services.
– Exclusive port occupation: The port is exclusively occupied, and non-RFC 2544
traffic will be discarded, causing service interruptions.
– Priority-based blocking: Only service packets of a specified priority are blocked,
and those of other priorities can be properly forwarded.
Test procedure
Throughput tests are conducted to test throughput values by sending test packets at rates of
the specified upper and lower rate thresholds or at rates in between. The difference between
the test result and actual throughput must be less than a specified precision value. The test
procedure is as follows:
1. An initiator sends test packets at a rate equal to the lower threshold. The network
bandwidth is acceptable if no packet is dropped within a specified period or the packet
loss rate is less than the configured packet loss rate. Then the test continues.
2. The initiator sends test packets at a rate equal to the upper threshold. The network
bandwidth is acceptable if no packet is dropped within a specified period or the packet
loss rate is less than the configured test failure percentage. Then the test continues.
3. The initiator changes rates to send test packets to find a maximum rate that is the final
throughput test result. In the previous step, if the actual packet loss rate is greater than
the configured packet loss rate, the initiator uses the bisection method to attempt to send
test packets at different rates between the upper and lower rate thresholds. This process
repeats until a maximum rate is found when the test result meets the throughput
precision, and the packet loss rate is less than the configured packet loss rate.
l If a bandwidth that meets these conditions is found within the configured bandwidth
range, the test ends.
l If the configured duration expires first, the test times out. The test results record
the timeout and the tested bandwidth. Because the packet loss ratio cannot be
calculated, the device considers all packets discarded.
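The bisection search described in the procedure can be sketched as follows. This is a simplified model under stated assumptions, not the device algorithm: loss_at is a hypothetical callback that stands in for sending test packets at a given rate, the search returns the upper threshold directly if it already passes, and timeout handling is omitted.

```python
def find_throughput(loss_at, lower, upper, precision, max_loss=0.0):
    """Bisection search for the highest rate whose packet loss is acceptable.

    loss_at(rate) -> measured packet loss rate (0.0 to 1.0) at that rate.
    Rates and precision share one unit, for example kbit/s.
    Returns None if even the lower threshold fails.
    """
    if loss_at(lower) > max_loss:
        return None                 # step 1: the lower threshold is not sustainable
    if loss_at(upper) <= max_loss:
        return upper                # step 2: the upper threshold is sustainable
    good, bad = lower, upper
    while bad - good > precision:   # step 3: halve the interval each round
        mid = (good + bad) / 2
        if loss_at(mid) <= max_loss:
            good = mid
        else:
            bad = mid
    return good


def simulated_loss(rate, capacity=730_000):
    """Hypothetical link that drops everything above its capacity (kbit/s)."""
    return 0.0 if rate <= capacity else 1 - capacity / rate


result = find_throughput(simulated_loss, lower=100_000, upper=1_000_000,
                         precision=1_000)
print(result)
```

The returned rate is always one that passed the loss check, and it lies within the precision value of the first rate that failed.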
Latency tests
Latency tests can only be conducted when background traffic is being transmitted. An
initiator sends background traffic at a specific rate and test packets at a specific interval to a
reflector. The initiator then calculates the bidirectional delay time and jitter based on the
transmission and receipt time.
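A minimal sketch of the delay and jitter calculation, assuming the initiator records a transmission and a receipt timestamp (in microseconds) for each looped test packet. The document does not give the exact jitter formula; this example assumes jitter is the mean absolute difference between consecutive delay samples.

```python
def delay_and_jitter(samples):
    """samples: list of (tx_time_us, rx_time_us) pairs recorded by the initiator.

    Returns (average bidirectional delay, average jitter) in microseconds.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    delays = [rx - tx for tx, rx in samples]
    # Assumed jitter definition: variation between consecutive delays.
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    avg_delay = sum(delays) / len(delays)
    avg_jitter = sum(diffs) / len(diffs) if diffs else 0.0
    return avg_delay, avg_jitter


avg_delay, avg_jitter = delay_and_jitter([(0, 100), (1000, 1120), (2000, 2090)])
print(avg_delay, avg_jitter)
```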
Packet loss rate tests
An initiator sends test packets at a specific rate and interval to a reflector. Software collects
statistics about the sent and received packets every second. The initiator stops sending test
packets and counts the number of sent and received packets. The initiator then calculates the
packet loss rate based on the statistics.
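The final calculation follows the definition given earlier (percentage of discarded packets to all sent packets). The sketch below assumes the per-second statistics have already been summed into two totals; the function name is illustrative.

```python
def packet_loss_rate(sent, received):
    """Percentage of discarded packets relative to all sent packets."""
    if sent <= 0:
        raise ValueError("no packets were sent")
    return (sent - received) / sent * 100


print(packet_loss_rate(sent=1000, received=990))
```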
Applications
A general flow test can be used in the following scenarios:
l Layer 2: native Ethernet scenario and L2VPN scenario, including Virtual Leased Line
(VLL) and Virtual Private LAN Service (VPLS) networking
l Layer 3: native IP scenario and L3VPN scenario
l IP gateway scenario
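[Figure 2-58: the initiator (UNI-A) sends test traffic to the reflector (UNI-B), which loops the traffic back; the initiator pushes test results to an FTP/SFTP server.]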
In both the Layer 2 and Layer 3 scenarios, a general flow test is performed between two UNIs
on the network shown in Figure 2-58. Before a test instance runs, the push function must be
enabled on the initiator, and the initiator must have a reachable route to an FTP or SFTP
server. The initiator sends test packets to the reflector. The reflector returns all test packets
received by a reflector interface or only returns packets matching a specific filter condition.
After the initiator receives the test packets, it collects statistics and yields test results based on
the statistics. The initiator writes the test results into a file and uploads the file onto an FTP or
SFTP server.
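[Figure 2-59: the IP gateway (initiator) sends test traffic to the reflector at UNI-A, which loops the traffic back; the initiator pushes test results to an FTP/SFTP server.]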
On the network shown in Figure 2-59, a reflector functions as a switch. Layer 3 services on a
user-side CE are sent to an IP gateway (initiator) through a Layer 2 network. A general flow
test can be conducted in this scenario. The procedure is similar to that in the Layer 3 scenario.
Unlike the initiator in the Layer 3 scenario, the IP gateway cannot learn the MAC address of
the reflector or CE. The reflector simulates a user logging in to the CE and sends gratuitous
ARP packets to the IP gateway. The IP gateway can learn the MAC address carried in the
gratuitous ARP packets.
NOTE
Among ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports the Ethernet
Service Activation Test function.
Background
Ethernet service activation test is a technique that is provided for carriers to evaluate Ethernet
performance. Before rolling out services, carriers need to know the current network
performance, whether network configurations are correct, and whether the network
performance meets the service level agreement (SLA). The information facilitates future
business planning and service promotion. Therefore, a highly reliable and precise
performance test method is essential for the carriers to quickly evaluate their network
performance.
To address this issue, the Internet Engineering Task Force (IETF) published RFC 2544
"Benchmarking Methodology for Network Interconnect Devices", defining general flow
testing as a standard method of evaluating network performance. However, this type of test
has restrictions in real-world applications due to the following disadvantages:
l The results of a general flow test denote the network performance boundary only. They
cannot be used to evaluate whether a specified service meets the SLA.
l The test is unable to verify network configurations or provide performance counters,
such as the committed information rate (CIR), excess information rate (EIR), and color
mode (CM).
ITU-T Y.1564 (Ethernet Service Activation Test Methodology) was then published in 2011,
providing the test method of Ethernet service activation for carriers to evaluate Ethernet
performance. The values of performance counters in an Ethernet service activation test report
are different from those of the same performance counters in a general flow test report. Table
2-15 lists the differences. Compared with a general flow test, an Ethernet service activation
test offsets the preceding disadvantages and better reflects the performance of network
resources leased to customers.
Table 2-15 Performance counters that the general flow and Ethernet service activation test
reports both provide
Performance Counter | General Flow Test | Ethernet Service Activation Test
Related Concepts
Bandwidth profile
A bandwidth profile defines the bandwidth that a carrier assigns to user services that need to
enter a carrier network and the priorities based on which the user services are processed.
A bandwidth profile restricts the transmission rate of service traffic using parameters, such as
CIR and EIR.
l CIR: Rate at which a frame relay network agrees to transfer information in normal
conditions. That is, it is the rate at which tokens are transferred to the leaky bucket.
l EIR: Bandwidth for excessive or burst traffic above the CIR. It equals the result of the
actual transmission rate without the safety rate.
During testing, service frames are marked green, yellow, or red based on the CIR and EIR.
Devices on the network forward green frames, place yellow frames into queues, and discard
red frames. Figure 2-60 shows the mappings between service transmission rates, CIR and
EIR, and colors.
Figure 2-60 Mappings between service transmission rates, CIR and EIR, and colors
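[Figure content: transmission rate plotted over the test period. Green frames conform to the CIR. Yellow frames conform to the EIR (rates between the CIR and CIR + EIR). Red frames conform to neither the CIR nor the EIR (rates above CIR + EIR, up to 100% of the link rate).]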
NOTE
The mapping process is also called traffic measurement, which can be implemented by hardware using
the token bucket technique.
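The token bucket technique mentioned in this note can be sketched with two buckets, one refilled at the CIR and one at the EIR. This is a simplified color-blind model for illustration: the class name and the burst sizes (cbs_bytes, ebs_bytes) are assumptions that do not appear in this document, and hardware implementations differ in detail.

```python
class TwoRateMarker:
    """Color-blind marker with a committed bucket (CIR) and an excess bucket (EIR)."""

    def __init__(self, cir_bps, eir_bps, cbs_bytes, ebs_bytes):
        self.cir, self.eir = cir_bps / 8, eir_bps / 8   # refill rates in bytes/s
        self.cbs, self.ebs = cbs_bytes, ebs_bytes       # assumed bucket depths
        self.c_tokens, self.e_tokens = cbs_bytes, ebs_bytes
        self.last = 0.0

    def mark(self, now, frame_len):
        """Return "green", "yellow", or "red" for a frame arriving at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill both buckets for the elapsed time, capped at their depths.
        self.c_tokens = min(self.cbs, self.c_tokens + elapsed * self.cir)
        self.e_tokens = min(self.ebs, self.e_tokens + elapsed * self.eir)
        if self.c_tokens >= frame_len:
            self.c_tokens -= frame_len
            return "green"       # conformant to the CIR
        if self.e_tokens >= frame_len:
            self.e_tokens -= frame_len
            return "yellow"      # conformant to the EIR
        return "red"             # conformant to neither


marker = TwoRateMarker(cir_bps=8000, eir_bps=8000, cbs_bytes=1000, ebs_bytes=1000)
burst = [marker.mark(0.0, 1000) for _ in range(3)]
print(burst)
```

In this example a burst of three 1000-byte frames at the same instant drains the committed bucket, then the excess bucket, and the third frame finds both empty.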
In addition to CIR and EIR, the CM parameter is used to control traffic measurement. The
CM enables carriers to measure and process service traffic by marking traffic priorities (such
as 802.1p and DSCP), rather than only based on the CIR and EIR.
The CM parameter is essential when the same service has applications with different
performance requirements. For example, voice traffic that requires a low packet loss ratio and
short delay needs to be marked green. However, TCP file transfer traffic that is insensitive to
transmission problems can be marked yellow.
l Color-aware mode
Before entering a carrier network, high-priority traffic is marked green, while low-
priority traffic is marked yellow. A carrier network device processes traffic marked with
different colors based on the mapping between the CIR and EIR.
In this mode, user traffic must match the mapping between the CIR and EIR. Otherwise,
the traffic color will be changed by the carrier network. For example, if the rate of the
traffic marked green falls between the CIR and the sum of the CIR and EIR, the traffic
color is changed to yellow by the carrier network. If the rate of the traffic marked yellow
exceeds the sum of the CIR and EIR, the carrier network discards the traffic on UNIs.
l Color-blind mode
A carrier network device processes traffic based on the mapping between the CIR and
EIR in compliance with the first in first out (FIFO) rule, regardless of traffic colors.
[Figure: Ethernet service activation test networking. PE1 (initiator, UNI A) and PE2 (reflector, UNI B) sit between two customer networks; test results are pushed to an FTP/SFTP server.]
l Outward mode
In Figure 2-62, the initiator and reflector do not reside on the tested network. They are
external devices connected to NNI A or NNI B at one end of the network. The reflector
can be a CE that supports reflection.
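[Figure 2-62: Ethernet service activation test networking in outward mode. PE1 (initiator, UNI A) and PE2 (reflector, UNI B) sit between two customer networks; test results are pushed to an FTP/SFTP server.]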
Test Procedure
An Ethernet service activation test is conducted to check whether the transmission
performance of Ethernet frames meets the SLA. Figure 2-63 shows the two phases in an
Ethernet service activation test: configuration test and performance test. Before a test
is conducted, the push function must be configured on an initiator, and the initiator must
have a reachable route to the FTP or SFTP server. After the test is complete, the
initiator writes test results into a file and uploads the file onto an FTP or SFTP server.
l Configuration test: Each service flow must be generated and tested separately to verify
the correctness of Ethernet service deployment.
l Performance test: Each service flow must be tested based on the configured CIR to
measure service forwarding quality. A performance test takes a longer period of time to
complete than a configuration test.
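[Figure 2-63: Ethernet service activation test flow. Set test parameters, start the test, run the configuration test, run the performance test after the configuration test passes, and complete the test after the performance test passes.]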
Configuration tests
Configuration tests include CIR, EIR, and traffic policing tests.
1. CIR test
CIR tests include simple and step CIR tests. Table 2-16 describes CIR test methods and
advantages.
Step CIR test
Method (as shown in Figure 2-60):
1. An initiator periodically sends test flows at 25%, 50%, 75%, and 100% of the
configured CIR.
2. A reflector loops the test flows back.
3. The initiator receives the test flows looped by the reflector and calculates the IR,
FLR, FTD, FDV, and AVAIL at each rate.
4. If the calculated counters are within the configured SAC range, the test is
successful, and the next test can be conducted. If the calculated counters are out of
the configured SAC range, the test fails, and the whole test stops.
5. If all tests are successful, the CIR test is successful, and the EIR test is conducted.
Advantage: A step CIR test provides more accurate network status analysis, but takes
longer than a simple CIR test.
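The step CIR procedure can be sketched as a loop over the four rate steps. The sketch is illustrative only: measure is a hypothetical callback standing in for sending and looping test flows, and the SAC is modeled as a set of upper bounds per counter, which is an assumption (a counter such as AVAIL would need a lower bound instead).

```python
def step_cir_test(measure, cir_kbps, sac, steps=(0.25, 0.50, 0.75, 1.00)):
    """Run the step CIR test and stop at the first step that violates the SAC.

    measure(rate_kbps) -> dict of counters, for example {"FLR": 0.0, "FTD": 120}.
    sac maps a counter name to the maximum value allowed (assumed upper bounds).
    Returns (passed, [(rate_kbps, step_passed), ...]).
    """
    results = []
    for fraction in steps:
        rate = cir_kbps * fraction
        counters = measure(rate)
        ok = all(counters[name] <= limit for name, limit in sac.items())
        results.append((rate, ok))
        if not ok:
            return False, results   # the whole test stops on the first failure
    return True, results            # all steps passed; the EIR test would follow


def fake_measure(rate_kbps):
    """Hypothetical network that starts losing frames above 80 000 kbit/s."""
    return {"FLR": 0.0 if rate_kbps <= 80_000 else 0.02, "FTD": 150}


passed, results = step_cir_test(fake_measure, cir_kbps=100_000,
                                sac={"FLR": 0.001, "FTD": 200})
print(passed, results)
```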
2. EIR test
EIR tests are conducted in either color-aware or color-blind mode. Table 2-17 describes
EIR test methods and advantages.
NOTE
If the EIR is set to 0 kbit/s, the EIR test is not conducted, and a traffic policing test is performed.
3. Traffic policing test
Traffic policing tests are conducted in either color-aware or color-blind mode.
Table 2-18 describes traffic policing test methods and advantages.
NOTE
If the traffic policing test for NQA test flows is disabled, a performance test is conducted
immediately.
Performance test
A device automatically starts a performance test only after the configuration tests, including
the CIR, EIR, and traffic policing tests, are complete.
In a performance test, an initiator simultaneously sends test flows for all service flows at the
specified CIR.
A reflector loops the test flows back. Upon receipt, the initiator calculates performance
counters, including the IR, FLR, FTD, FDV, and AVAIL for each service flow. If the counters
are within the configured SAC range, the performance test for a service flow is successful. If
the counters are out of the configured SAC range, the performance test for the service flow fails.
NOTE
During the Ethernet service activation test, if the master and the slave switch over in 1 to 1 mode, and
the ISSU upgrade is performed, the test may fail and need to be restarted.
Usage Scenarios
An Ethernet service activation test applies to the following scenarios:
l Layer 2 scenarios: native Ethernet and L2VPN (such as a scenario where virtual leased
line (VLL) or virtual private LAN service (VPLS) services are created)
l Layer 3 scenarios: native IP and L3VPN
l Virtual gateway scenario
NOTE
The L2VPN and L3VPN scenarios support only the inward mode.
Benefits
Ethernet service activation tests provide accurate test results that can reliably reflect the
performance of networks that carriers lease to customers. Therefore, Ethernet service
activation tests help carriers verify that network quality meets requirements before service
provisioning on networks when high-precision tests are not required, test devices are
insufficient, or onsite operations are difficult to perform.
MA Maintenance Association
MD Maintenance Domain
MP Maintenance Point
Definition
The ping command is a very common debugging tool for testing the accessibility of devices.
It uses a series of Internet Control Message Protocol (ICMP) Echo messages to determine:
The tracert command is used to discover the gateways that packets actually pass through
when traveling from the source host to the destination host.
Purpose
When a device is faulty, you can use the ping and tracert commands to check network
connectivity.
The ping command is used to test the network connectivity and the host accessibility. The
source host first sends an ICMP request message to the destination host, and then waits for an
ICMP reply message.
The tracert command is used to check the network connectivity and locate network faults.
2.6.2 Principles
time called a timeout. If the source does not receive the echo reply message within the
timeout, the source displays that the Request message times out.
The ping command sets the identifier field in the ICMP message as the process ID of the
sending process. This allows the remote end to distinguish multiple ping processes that are
running on the local end simultaneously.
The ping command labels each ICMP Echo Request message with a sequence ID that starts
from 1 and is increased by 1. The number of ICMP Echo Request messages to be sent varies
with different systems. The default number is 5. The number of ICMP Echo Request
messages can also be set through commands. If the destination is reachable, the source can
receive five ICMP Echo Reply messages from the destination, with sequence numbers
corresponding to those of ICMP Echo Request messages.
If the TTL field is reduced to 0 during the message forwarding, the device that the message
reaches sends an ICMP timeout message to the source host, indicating that the destination
host is unreachable.
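The identifier and sequence number handling described above can be illustrated by building ICMP Echo Request messages by hand. This sketch only constructs the messages and does not send them; the payload and the function names are illustrative, and the ICMP header layout (type 8, code 0, checksum, identifier, sequence number) follows the standard ICMP format rather than anything specific to this device.

```python
import os
import struct


def checksum(data: bytes) -> int:
    """Internet checksum (one's complement sum of 16-bit words)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP Echo Request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    csum = checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, identifier, sequence) + payload


ident = os.getpid() & 0xFFFF   # process ID used as the identifier
# Five requests by default, with sequence numbers 1 to 5.
probes = [build_echo_request(ident, seq, b"ping") for seq in range(1, 6)]
print(len(probes), probes[0].hex())
```

A receiver can validate each message by recomputing the checksum over the whole ICMP message, which yields 0 for an intact message.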
2.6.2.3 LSPV
Label Switched Path Verification (LSPV) is a mechanism that uses the MPLS ping and
traceroute (abbreviated as tracert) to detect LSP errors and locate faulty nodes.
MPLS tunnel technologies support multiple upper-layer protocols and services. Similar to the
IP ping and tracert, the MPLS ping and tracert are used to detect the connectivity of an LSP.
In MPLS, the control plane responsible for establishing LSPs cannot detect data forwarding
failures over LSPs. This makes the network maintenance difficult.
The MPLS ping and tracert use MPLS Echo Request messages and MPLS Echo Reply
messages to detect the connectivity of an LSP. Both MPLS Echo Request and MPLS Echo
Reply messages are UDP packets using the well-known UDP port of 3503. The receiver
identifies MPLS Echo Request messages based on the port number. An MPLS Echo Request
message that carries FEC information is sent along the same LSP as common packets with the same FEC to
detect the connectivity of the LSP. MPLS Echo Request messages are transmitted to the
destination by using MPLS, whereas MPLS Echo Reply messages are transmitted to the
source by using IP. To prevent the egress from forwarding the received Echo Request message
to other nodes, the destination address in the IP header of the Echo Request message is set to
127.0.0.1/8 (the local loopback address), and the TTL value contained in the IP header is set
to 1.
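As an illustration of the message described above, the fixed header of an MPLS echo message can be packed as follows. The field layout follows the RFC 4379 header as the author of this example understands it (version, global flags, message type, reply mode, return code, return subcode, sender's handle, sequence number, and two timestamps); the function name is hypothetical, TLVs such as the FEC stack are omitted, and Unix time is used for the timestamp for simplicity. Nothing is sent on the network.

```python
import struct
import time

LSPV_UDP_PORT = 3503            # well-known UDP port named in the text
LOOPBACK_DESTINATION = "127.0.0.1"
IP_TTL = 1                      # keeps the egress from forwarding the request
ECHO_REQUEST, ECHO_REPLY = 1, 2


def build_mpls_echo_header(msg_type, sender_handle, sequence, reply_mode=2, now=None):
    """Pack the 32-byte fixed header of an MPLS echo message (TLVs omitted)."""
    now = time.time() if now is None else now
    sec, usec = int(now), int((now % 1) * 1_000_000)
    return struct.pack(
        "!HHBBBBIIIIII",
        1,              # version number
        0,              # global flags
        msg_type,       # 1 = echo request, 2 = echo reply
        reply_mode,     # 2 = reply through an IPv4/IPv6 UDP packet
        0, 0,           # return code, return subcode
        sender_handle,
        sequence,
        sec, usec,      # timestamp sent
        0, 0,           # timestamp received, filled in by the responder
    )


header = build_mpls_echo_header(ECHO_REQUEST, sender_handle=0x1234, sequence=1)
print(len(header), header.hex())
```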
The VRP supports ping and tracert for the following link types:
NOTE
If there are multiple LDP LSPs, the LDP LSP to be checked is determined by the next hop address.
NOTE
If the TE tunnel is configured with a hot-standby LSP, the connectivity of the hot-standby LSP can be
detected by using a ping or tracert operation.
In a PWE3 VLL ping operation, an Echo Request message is first sent to the peer PE.
After receiving the message, the peer PE extracts and sends FEC information to the
L2VPN module to determine whether the peer PE is the egress. If so, the peer PE returns
an Echo Reply message.
NOTE
l If the reply mode is specified as 4, the label alert function must be enabled for the PW.
l If a multi-hop PW is detected in label alert mode, the PW Switching Point (SPE) sends the Echo
Request message to the L2VPN module. If the L2VPN module determines that the SPE is not the
egress, the SPE will forward the Echo Request message instead of returning an Echo Reply message.
l PWE3 VLL PW tracert
In a PWE3 VLL network, to implement the VLL PW tracert, a PW must be configured
with a PW template enabled with VCCV. A PWE3 VLL tracert can help you obtain
information about SPEs and Ps along the path that the message travels from the source to
the destination, check the connectivity of the PW, and locate the fault of a PW.
A PWE3 VLL tracert can be performed in control word mode, label alert mode, or TTL
mode. The default mode is label alert. The TTL mode and control word mode are
mutually exclusive.
The TTL value of the PW Tracert Request message is increased by 1 each time the
original device sends the request message. Each time the transit node (P node) receives
an Echo Request message with an expired TTL value, it sends the Echo Request message
to the LSPV module. The LSPV module responds with an Echo Reply message
containing information about the next hop of the node that sends the Echo Request
message.
l A PW tracert terminates when either of the following situations occurs:
– The PW Tracert Request message reaches the egress.
– The TTL value of the PW Tracert Request message reaches the upper threshold.
If the optional PW ID is specified, the PW with the specified PW ID is detected. If no
PW ID is specified, the PW associated with the VSI ID is detected.
In the procedure of the Martini VPLS PW ping and tracert, the following functions are
performed by each type of node:
a. Ingress: The ingress obtains the forwarding token from the L2VPN module based
on the VSI name, peer address, and PW ID. If no PW ID is specified, the first PW is
detected by default. The ingress searches for the TunnelInfo at the control plane
based on the forwarding token, obtains the downstream information based on the
TunnelInfo, and then encapsulates the Request message.
b. Transit node: The transit node searches for the Next Hop Label Forwarding Entry
(NHLFE) and Incoming Label Map (ILM) based on the incoming label and then
obtains the downstream information based on the incoming label and index of the
inbound interface.
c. Egress: The egress delivers the incoming label and FEC TLV to the L2VPN
module. The L2VPN module determines whether the egress is the destination of the
packets. If so, the egress returns a Reply message.
d. The detection result or timeout information is displayed.
l LDP LSP
l TE tunnel
On an MPLS network, if a fault occurs on the MPLS network and the control plane fails to
detect the fault, the source cannot successfully ping the destination. To identify whether the
fault occurs on the MPLS network or the IP network, you can specify the first node in the
ping operation. Subsequent ping packets will be forwarded based on IP, which can help you
fast locate the fault.
2.6.2.4 CE Ping
CE ping is a tool used by a PE on the L2VPN network to identify whether the IP address of the
CE is online (reachable) by initiating an ARP request.
As shown in Figure 2-64, a ping operation is performed on PE1 to check whether CE1 is
reachable and an ARP message requesting the MAC address corresponding to the IP address
of CE1 is sent from the AC interface on PE1. CE1 responds to the request with an ARP reply
message when the required IP address is the IP address of itself. Upon receiving the ARP
reply message, PE1 displays that the IP address is reachable.
Figure 2-64 Networking diagram of configuring CE ping to detect the connectivity between
the PE and CE on a VLL network
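The ARP request that PE1 sends in a CE ping can be illustrated by building the frame by hand. The sketch uses the standard Ethernet and ARP formats; the MAC and IP addresses are made-up example values, the function name is hypothetical, and the frame is only constructed, not transmitted.

```python
import socket
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP request for target_ip."""
    broadcast = b"\xff" * 6
    eth_header = broadcast + sender_mac + struct.pack("!H", 0x0806)  # EtherType ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                        # hardware type: Ethernet
        0x0800,                   # protocol type: IPv4
        6, 4,                     # MAC and IPv4 address lengths
        1,                        # opcode 1 = request
        sender_mac, socket.inet_aton(sender_ip),
        b"\x00" * 6,              # target MAC is unknown
        socket.inet_aton(target_ip),
    )
    return eth_header + arp


# Example values only: PE1's AC interface asks for the MAC address of CE1.
frame = build_arp_request(bytes.fromhex("00e0fc000001"), "10.1.1.1", "10.1.1.2")
print(len(frame), frame.hex())
```

If CE1 owns the target IP address, it answers with an ARP reply (opcode 2), which is what PE1 treats as proof that the address is reachable.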
TE Traffic Engineering
CE Customer Edge
PE Provider Edge
UPE Underlayer PE
SPE Superstratum PE
PW Pseudo Wire
2.7.1 Introduction
Definition
Fault Management (FM) is used to dynamically manage and report alarms generated on
devices in a centralized manner.
Purpose
With the rapid growth in network scales and complexity, more and more network
configurations and applied features are required. When a module on a device is faulty, a great
number of alarms may be generated on one or multiple devices. The alarms, however, may be
lost during sending to the network management device because of limited capability of
handling alarms on the devices or the network management system (NMS). As a result,
certain needed alarms cannot be displayed, which inconveniences network management.
In the FM, alarm classification and alarm buffer are introduced.
l Alarm classification: Alarms can be classified into levels. (Default alarm classification is
enabled in the system, and you can modify alarm classification.) You can use alarm
classification to display the concerned alarms and shield the alarms that are not needed
from being displayed.
l Alarm buffer: Alarms or events of specified types can be saved in the devices. (Default
types are set in the system, and you can modify alarm types.) The alarms saved on the
device can be displayed on the NMS through MIB interfaces. In addition, the device
provides the active alarm function, with which the NMS synchronizes the currently
active alarms in real time.
2.7.2 Principles
l If the user focuses on certain types of alarms, he or she can set these types of alarms to
be of the highest level and configure filtering conditions. In this manner, the system
reports only these types of alarms to the NMS.
Before sending an alarm, the system determines whether the alarm is destined for Huawei's
NMS. If so, a parameter DateAndTime is added to the alarm binding table to store the alarm
generation time. The NMS then obtains the alarm generation time by parsing this parameter.
NOTE
This function is applicable only to Huawei's NMS. For a third-party network management
system, the alarm binding table does not contain the parameter DateAndTime. Therefore, this function is
not supported.
After an alarm is sent to the FM module, the system determines whether the alarm needs to be
saved. If it needs to be saved, the system generates a copy of the alarm binding table and stores
the copy in the alarm queue or the event queue according to the alarm type. The system also
provides MIB interfaces, through which users can obtain alarms from a device.
NOTE
The alarm binding table obtained by the NMS is coded according to the type, length, and value (TLV);
the alarms obtained by the NMS through the MIB are coded messages.
The binding table stored on the device is of Huawei's private data structure. Therefore, a third-party
NMS may not be able to correctly parse the alarms.
Terms
None.
Abbreviation
Abbreviation Full Spelling
FM Fault Management
2.8.1 Introduction
Definition
Performance management (PM) is used to monitor and collect performance indexes in the
system, such as the CPU usage and data about received and sent packets. It also periodically
collects data about each performance, provides current and historical performance statistics
for user query, and reports alarms based on user-defined performance thresholds.
Purpose
As the telecommunication industry develops, users pose higher operation and maintenance
requirements on devices. PM is the key feature for improving operation and maintenance
capability of devices. It provides current and historical statistics on all performance indexes,
which are used to judge running status of the system and analyze system errors, and used as
basis of system configuration.
Performance trend can be analyzed based on performance data. For example, you can analyze
the increase trend and speed of network traffic in a month or more based on the peak and
bottom values of traffic in one day.
Based on analysis of various performance data, you can provide materials and basis for
optimizing network configurations and expanding network capacity.
2.8.2 Principles
Performance management includes the threshold monitoring function and statistics function.
2.8.2.1 Statistics
To allow devices to periodically collect performance data, you can configure the statistics
function.
The statistics function consists of statistics tasks. Each statistics task is configured with
only one statistics instance type and one data collection period. You can set the data
collection period (5, 10, 15, 30, 60, or 1440 minutes), set a statistics instance and related
indexes, and set the interval at which the system generates statistics files (the value ranges
from 1 to 16). When a statistics task is running, the system collects the parameters of the
monitored instance and indexes during each period, and calculates the collected values at the
end of the period. The system saves statistics data to a file after each file generation period
(collection period x interval at which statistics files are generated).
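As a worked example of the rule above, the following minimal sketch computes how often a statistics file is generated. Only the allowed collection periods and the interval range of 1 to 16 come from this section; the function name and the validation are illustrative assumptions, not device commands.

```python
# Allowed data collection periods in minutes, as listed in this section.
ALLOWED_PERIODS_MIN = (5, 10, 15, 30, 60, 1440)


def file_generation_period_min(collection_period_min: int, file_interval: int) -> int:
    """Return the minutes between two statistics files.

    Hypothetical helper: file period = collection period x file generation interval.
    """
    if collection_period_min not in ALLOWED_PERIODS_MIN:
        raise ValueError("unsupported data collection period")
    if not 1 <= file_interval <= 16:
        raise ValueError("file generation interval must be in the range 1 to 16")
    return collection_period_min * file_interval


# A 15-minute collection period with an interval of 4 gives one file per hour.
print(file_generation_period_min(15, 4))  # 60
```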
To allow the system to send alarms when the system's performance data exceeds the
corresponding threshold, you can configure the threshold monitoring function. This function
monitors the system periodically. The system compares the indicator value of a collected
instance and the threshold value of monitoring rules within a certain interval. If the indicator
value exceeds the range of the threshold value, an alarm will be triggered. After an alarm is
triggered, the system will monitor the data until the data fall within the specified range. The
statistics function can also suppress the alarm signals to prevent repeated sending of an alarm.
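The threshold monitoring behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the device implementation: the function name, the event labels, and the explicit "clear" event reported when the data returns to the permitted range are hypothetical. Repeated out-of-range samples produce only one alarm, which models alarm suppression.

```python
from typing import List, Tuple


def monitor(samples: List[float], low: float, high: float) -> List[Tuple[int, str]]:
    """Return (sample index, event) pairs for a series of collected indicator values."""
    events = []
    alarm_active = False
    for index, value in enumerate(samples):
        out_of_range = value < low or value > high
        if out_of_range and not alarm_active:
            # First sample outside the threshold range: trigger the alarm once.
            events.append((index, "alarm"))
            alarm_active = True
        elif not out_of_range and alarm_active:
            # Data is back within the range: stop tracking the alarm (assumed "clear").
            events.append((index, "clear"))
            alarm_active = False
        # Further out-of-range samples while the alarm is active are suppressed.
    return events


# CPU usage samples (percent) checked against a permitted range of 0 to 90.
print(monitor([50, 95, 97, 60, 99], low=0, high=90))
```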
You can query statistics files and current or historical performance statistics using the NMS or
related commands, and clear the current performance statistics.
The system can upload statistics files to the performance management server for network
management. The uploading mode is passive. This is because the system is instructed by the
NMS to upload files generated periodically.
Term
None.
PM Performance Management
2.9.1 Overview
Definition
Power over Ethernet (PoE) refers to power supply through an Ethernet. It is also called power
over LAN (PoL) or active Ethernet.
(Figure labels: UPS, IPRAN, PSE, PD)
Purpose
As IP phones, network video monitoring, and wireless Ethernet networks are widely applied,
the demand for power supply over the Ethernet has become urgent. In most situations, access
point devices need DC power supply, but they are often installed outdoors or on ceilings far
from the ground, where a suitable power socket is difficult to find. Even if a suitable power
socket is available, the network administrator finds it hard to install the AC/DC converter
required by access point devices. On many large-scale LANs, administrators need to manage
multiple access point devices that require uniform power supply and management. In this
case, power supply management is difficult. The PoE function addresses this problem.
The PoE technology is used on the wired Ethernet and is most widely used on LANs.
The PoE function transmits power together with data to terminals over cables or transmits
power without data over idle lines. This technology provides power on the 10Base-T,
100Base-TX, or 1000Base-T Ethernet at a distance of up to 100 m. PoE can be used to
effectively provide centralized power for terminals such as IP phones, Access Points (APs),
chargers of portable devices, POS machines, cameras, and data collection devices. Terminals
are provided with power when they access the network. Therefore, indoor cabling of power
supply is not required.
Benefits
l Power supplies are easily and conveniently accessed and the costs of power cables and
cable routing are saved.
l Uninterruptible power supplies (UPSs) are also used to provide redundancy power
supply to IP cameras, video servers, and IP phones, in order to prevent the devices from
being powered off.
1 Detecting the PD: On the PSE, the port where PoE is enabled initially outputs a low voltage
until the PSE detects a PD (connected to the line terminal) that supports IEEE 802.3af or
IEEE 802.3at.
2 Negotiating the power supply capability with the PD: The PSE classifies the PD and
negotiates the power supply capability with the PD by analyzing the detected feature
resistance.
4 Normally supplying power to the PD: After the voltage reaches a relatively steady level, the
PSE starts to supply power to the PD.
5 Powering off the PD: When supplying power to the PD, the PSE continuously detects the
input current of the PD. When the input current of the PD is lower than the limit or increases
sharply, the PSE powers off the PD and starts detecting the PD again. The input current is
lower than the limit when the PD is removed. The input current increases sharply when the PD
is disconnected from the PSE, the PD power is overloaded, the PD is short-circuited, or the PD
power exceeds the power supply capability of the PSE.
As defined in the IEEE standard, PSEs provide power for PDs and are classified into Midspan
(the PoE module is installed outside the device) and Endpoint (the PoE module is integrated
into the device) PSEs. The Endpoint PSE is compatible with 10Base-T, 100Base-TX, and
1000Base-T interfaces. The Endpoint PSE is more widely used than the Midspan PSE.
The ATN supports only Endpoint PSEs.
(Figure labels: ATN as the Power Sourcing Equipment (PSE), connected to a Power Device (PD))
Endpoint PSEs can work in Alternative A (line pair 1/2 and line pair 3/6) and Alternative B
(Line pair 4/5 and line pair 7/8) power supply modes according to different copper line pairs.
l Alternative A mode: Power is transmitted over pairs of lines that transmit data.
The PSE supplies power to PDs using twisted pairs 1/2 and 3/6. The DC power and data
frequency do not interfere with each other. Twisted pair 1/2 forms the positive (negative)
pole while twisted pair 3/6 forms the negative (positive) pole.
10BASE-T and 100BASE-TX interfaces use twisted pairs 1/2 and 3/6 to transmit data,
whereas the 1000BASE-T interface uses all four twisted pairs to transmit data.
Figure 2-67 10BASE-T and 100BASE-TX interfaces using the alternative A power
supply mode
(Figure labels: the PSE and PD are connected over pins 1 to 8; pairs 1/2 and 3/6 are marked as data pairs.)
(Labels of a second figure whose caption is missing: the PSE and PD are connected over pins 1 to 8; pairs 1/2, 3/6, 4/5, and 7/8 are all marked as data pairs.)
Figure 2-69 10BASE-T and 100BASE-TX interfaces using the alternative B power
supply mode
(Figure labels: the PSE and PD are connected over pins 1 to 8; pairs 1/2 and 3/6 are marked as data pairs.)
(Labels of a second figure whose caption is missing: the PSE and PD are connected over pins 1 to 8; pairs 1/2, 3/6, 4/5, and 7/8 are all marked as data pairs.)
Generally, a standard PD supports both modes, whereas the PSE only needs to support one
mode. The ATN supports only Alternative A.
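The pair assignments described in this section can be captured in a small lookup table. The sketch below is illustrative only: the dictionary and function names are assumptions, while the pair numbers and interface types are taken from the text above. It checks whether the pairs that carry power in a given mode also carry data on a given interface type.

```python
# Copper pairs that carry power in each power supply mode.
POWER_PAIRS = {
    "A": {(1, 2), (3, 6)},
    "B": {(4, 5), (7, 8)},
}

# Copper pairs that carry data on each interface type.
DATA_PAIRS = {
    "10BASE-T": {(1, 2), (3, 6)},
    "100BASE-TX": {(1, 2), (3, 6)},
    "1000BASE-T": {(1, 2), (3, 6), (4, 5), (7, 8)},
}


def power_shares_data_pairs(mode: str, interface: str) -> bool:
    """Return True if every power pair of the mode also carries data on the interface."""
    return POWER_PAIRS[mode] <= DATA_PAIRS[interface]


print(power_shares_data_pairs("A", "100BASE-TX"))  # True: power runs over the data pairs
print(power_shares_data_pairs("B", "100BASE-TX"))  # False: power runs over the idle pairs
print(power_shares_data_pairs("B", "1000BASE-T"))  # True: all four pairs carry data
```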
2.9.3 Applications
Usually, terminal equipment (such as IP phones, APs, and data collectors) requires DC power
supply. However, such devices are usually installed in corridors or on ceilings, where suitable
power sockets are unavailable. On many large-scale local area networks (LANs), the
administrator needs to manage devices at multiple access positions simultaneously. These
devices require unified power supply, making power supply management difficult.
As shown in Figure 2-71, after power over Ethernet (PoE) is deployed, the PSE directly
supplies power to access devices (such as IP phones, APs, and other wireless LAN access
devices). This eliminates the need for external power supplies, decreases cable connections,
reduces costs, and simplifies management.
(Figure 2-71 labels: UPS, IPRAN, PSE, PD)
PD Powered device
2.10 TWAMP
NOTE
Among ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports the TWAMP
function.
2.10.1 Introduction
Definition
The Two-Way Active Measurement Protocol (TWAMP) is a technology that measures the
round-trip performance of an IP network.
Purpose
As networks rapidly develop and applications are widely deployed, various services are
provided to meet requirements in different scenarios. Therefore, networks face increasingly
higher requirements for statistics collection. A tool that rapidly provides statistics about the
IP network performance is urgently needed.
Traditionally, network elements (NEs) themselves generate and maintain statistics about the
IP network performance. To display statistics about the performance of the entire network, a
network management system (NMS) is required to manage multiple NEs and collect statistics
about these NEs. However, there may be no NMS deployed or the NMS may be incapable of
collecting statistics.
TWAMP has the following advantages over the traditional tools that collect statistics about IP
network performance:
l Unlike network quality analysis (NQA), TWAMP has a unified test model and packet
format, facilitating deployment.
l IP Flow Performance Measurement (IP FPM) requires end-to-end devices to be
synchronized when implementing statistical analysis, whereas TWAMP does not have this
requirement and is therefore easier to deploy and usable in more scenarios.
Therefore, TWAMP applies to the scenario in which statistics about the IP network
performance must be rapidly obtained but not necessarily be highly accurate.
Benefits
TWAMP brings the following benefits to carriers:
l TWAMP enables carriers to rapidly and flexibly obtain statistics about the performance
of the entire network when the NMS is incapable of collecting such statistics.
l TWAMP can be configured to collect statistics when the IP network does not support
clock synchronization.
2.10.2 Principles
Implementation
The Two-Way Active Measurement Protocol (TWAMP) defines a method for measuring
round-trip IP network performance between two TWAMP-capable devices. Figure 2-72
shows how TWAMP is implemented.
l The performance management system instructs the control-client to establish a test
session with a specific TWAMP server.
l The control-client establishes and completes the test session.
l The performance management system collects the statistics that are saved during the test.
(Figure 2-72 labels: Performance management system, Control-client, IP Network, Server; steps ①, ②, and ③)
TWAMP collects statistics about the delay, jitter, and packet loss rate.
l The delay and jitter are calculated based on timestamps. The session-sender sends a
probe carrying a sending timestamp T0, and the reflector replies with a response probe
carrying a receiving timestamp T1 and a responding timestamp T2. After receiving the
response probe, the session-sender records the receiving timestamp T3. The delay and
jitter during a single period are calculated based on the four timestamps.
l The packet loss rate is calculated based on the serial numbers (starting from 0) carried in
probes. The session-sender sends a probe with a serial number, and the reflector replies
with a response probe with the same serial number. Each time the session-sender sends a
probe or the reflector replies with a response probe, the serial number increases by 1.
The packet loss rate is calculated based on the two rows of serial numbers.
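The packet loss calculation described above can be sketched as follows. This is a minimal illustration, assuming that a probe counts as lost when its serial number never comes back in a response probe; the function name and the sample serial numbers are hypothetical.

```python
from typing import Iterable


def packet_loss_rate(sent_serials: Iterable[int], received_serials: Iterable[int]) -> float:
    """Return the fraction of sent probes whose serial number was never echoed back."""
    sent = set(sent_serials)
    if not sent:
        return 0.0
    lost = sent - set(received_serials)
    return len(lost) / len(sent)


# Ten probes are sent with serial numbers 0 to 9; responses 3 and 8 never arrive.
print(packet_loss_rate(range(10), [0, 1, 2, 4, 5, 6, 7, 9]))  # 0.2
```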
Intercommunication Model
TWAMP uses the client/server mode and defines four logical entities, as shown in Figure
2-73.
l Control-client: establishes, starts, and stops a test session and collects statistics.
l Session-sender: proactively sends probes for performance statistics after being notified
by the control-client.
l Server: responds to the control-client's request for establishing, starting, or stopping a
test session.
l Session-reflector: replies to the probes sent by the session-sender with response probes
after being notified by the server.
(Figure labels: Control-client, Server, TWAMP-Control)
To facilitate implementation, TWAMP unifies the four logical entities, as shown in Figure
2-74. Control signals are exchanged between the control-client and server through a TCP
connection; probes are exchanged between the session-sender and session-reflector through a
UDP connection. The control-client and server establish and start a test session. Once a test
session starts, the control-client and server notify the session-sender and session-reflector
respectively of the session information and allow the session-sender to send probes and the
session-reflector to respond to the probes.
NOTE
On a live network, if a network element (NE) functions as a server and session-reflector alone, the NE
participates in TWAMP session establishment and probe exchanges but does not compile statistics. If a device
or tester functions as the control-client and session-sender, the device or tester proactively establishes a
TWAMP session for statistics collection. Users manage the control-client alone to rapidly obtain statistics
about the performance of the entire IP network.
(Figure labels: messages exchanged between the control-client and session-sender on one side
and the server and session-reflector on the other side)
l Server-Greeting, Set-Up-Response, Server-Start
l Request-TW-Session, Accept-Session
l Start-Session, Start-ACK
l Stop-Session
Test sessions are started, and the session-reflector starts to respond to probes.
2.10.3 Applications
2.10.3.1 TWAMP Applications on an IP Network
As shown in Figure 2-79, ATN A, Router B, and Router C on an IP network function as the
servers in a TWAMP test. Router E functions as the control-client and specifies an IP address
to start collecting statistics. Router E sends statistics to the performance management system.
Users can compare statistics to measure the performance of network segments. For example,
to measure the performance of the IP network between ATN A and Router B, the control-
client initiates a TWAMP test for ATN A and Router B each. Users can check the statistics
about the performance of the IP network between ATN A and Router B by comparing the two
sets of statistics.
(Figure 2-79 labels: ATN A, Router B, Router E (Control-client), TWAMP statistics packets)
Figure 2-80 TWAMP applications on an L3VPN with the control-client on the private
network side
(Figure labels: Performance management system, L3VPN, PE1 (Server), PE2 (Server))
Figure 2-81 TWAMP applications on an L3VPN with the control-client on the public
network side
(Figure labels: Performance management system, L3VPN, PE1 (Control-client), P, PE2)
2.11.1 Introduction
Definition
Two-Way Active Measurement Protocol (TWAMP) Light is a light version of TWAMP, which
is defined in RFC 5357. TWAMP Light measures the round-trip performance of an IP
network by using a simplified control protocol to establish test sessions.
Purpose
On conventional IP radio access networks (IP RANs), carriers urgently need a universal
tool that rapidly provides statistics about the IP network performance for operation,
administration and maintenance (OAM). Currently, Network Quality Analysis (NQA) and IP
Flow Performance Measurement (IP FPM) are mainly used. However, NQA does not allow
for intercommunication between a Huawei device and a non-Huawei device and requires
complex deployment, and IP FPM has high requirements on network devices and applies only
to a few scenarios. To resolve this problem, the Internet Engineering Task Force IP
Performance Metrics (IETF IPPM) working group defined a set of protocols, including TWAMP.
TWAMP, in its standard or light version, measures the round-trip performance of an IP
network.
As described in Table 2-22, TWAMP Light is simpler and easier than TWAMP.
Benefits
TWAMP Light is an IP link detection technology and can be easily used to help users monitor
the network quality (delay, jitter, and packet loss rate).
2.11.2 Principles
Communication Models
The Two-Way Active Measurement Protocol (TWAMP) measures the round-trip performance
of an IP network and has two versions: standard version and light version (TWAMP Light).
TWAMP uses the client/server model and defines four logical entities:
l Control-Client: establishes, starts, and stops a test session and collects statistics.
l Session-Sender: sends probes for performance statistics after being notified by the
Control-Client.
l Server: responds to the Control-Client's request for establishing, starting, or stopping a
test session.
l Session-Reflector: replies to the probes sent by the Session-Sender with response probes
after being notified by the server.
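(Figure 2-82 labels, TWAMP: the Controller contains the Control-Client and Session-Sender; the
Responder contains the Server and Session-Reflector. TWAMP-Control runs between the
Control-Client and Server; TWAMP-Test runs between the Session-Sender and Session-Reflector.)
(Figure 2-83 labels, TWAMP Light: the Controller contains the Server, Control-Client, and
Session-Sender; the Responder contains the Session-Reflector. TWAMP-Test runs between the
Controller and Responder.)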
Compared with TWAMP (in Figure 2-82), TWAMP Light (in Figure 2-83) moves the control
plane (server) from the Responder to the Controller.
Therefore, TWAMP Light simplifies the communication model of TWAMP and greatly
relaxes its requirements on the Responder performance, allowing the Responder to be rapidly
deployed. In addition, TWAMP Light supports plug-and-play.
l Controller: sends and receives packets over a test session, collects and calculates
performance statistics, and reports the statistics to the NMS.
l Responder: responds to the packets received over a test session.
NOTE
Different from TWAMP, TWAMP Light has parameters statically configured for a test session. You can
configure the IP address and UDP port number on the Responder using MIBs. After a test session is
created, TWAMP-Test packets are transmitted over the test session to help calculate the performance
statistics, such as the packet loss rate, delay, and jitter. Therefore, TWAMP Light does not need any
control protocol for parameter negotiation. TWAMP Light simplifies the working process of protocols
and is easier to deploy in real world situations.
2.11.2.2 Principles
Related Concepts
The Two-Way Active Measurement Protocol (TWAMP) Light consists of on-demand
measurement and proactive measurement.
l On-demand measurement works in a specified period after being started. It can be
performed once or periodically in the specified period.
l Proactive measurement works continuously after being started to collect statistics.
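(Figure 2-85 labels: the Controller sends Test-request packets across the network at t1 and t3.
The Responder receives them at t1' and t3' and sends Test-response packets at t2' and t4'. The
Controller receives the responses at t2 and t4.)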
In Figure 2-85, TWAMP-Test packets function as probes and carry the IP address and
UDP port number that are predefined for the test session between the Controller and
Responder. The Controller sends a TWAMP-Test packet to the Responder, and the
Responder replies to it. The Controller collects TWAMP statistics as follows:
a. The Controller collects statistics about the two-way delay, jitter, and packet loss rate
based on the sequence numbers and timestamps carried in TWAMP-Test packets.
Delay
The delay is calculated based on timestamps. The Controller sends a probe carrying
a sending timestamp t1, and the Responder replies with a response probe carrying a
receiving timestamp t1' and a responding timestamp t2'. After receiving the
response probe, the Controller records the receiving timestamp t2. The delay during
a single period is calculated based on the four timestamps.
Delay1 = t2 - t1 - (t2' - t1')
Jitter
The jitter is calculated based on two consecutive delays.
Based on the preceding delay formula, the next delay is calculated as
follows: Delay2 = t4 - t3 - (t4' - t3')
Jitter = | Delay2 - Delay1 |
Packet Loss Rate
The packet loss rate is calculated based on the sequence numbers (starting from 0)
carried in probes. The Controller sends a probe with a sequence number, and the
Responder replies with a response probe with the same sequence number. Each time
the Controller sends a probe or the Responder replies with a response probe, the
sequence number increases by 1. The packet loss rate is calculated based on the two
rows of sequence numbers.
Packet loss rate = Number of lost packets/Total number of sent packets
b. The Controller collects performance statistics based on TWAMP-Test packets and
reports the statistics to the NMS using Performance Monitoring (PM) for proactive
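The delay and jitter formulas in this section can be checked with a short calculation. The sketch below is illustrative: the function and parameter names are assumptions, and the timestamp values (in milliseconds) are made up. The time the Responder spends between receiving the probe and sending the response is subtracted from the round-trip time measured on the Controller.

```python
def two_way_delay(ctrl_send: float, ctrl_recv: float, resp_recv: float, resp_send: float) -> float:
    """Delay = (Controller receive - Controller send) - (Responder send - Responder receive)."""
    return (ctrl_recv - ctrl_send) - (resp_send - resp_recv)


def jitter(delay1: float, delay2: float) -> float:
    """Jitter is the absolute difference between two consecutive delays."""
    return abs(delay2 - delay1)


# First probe: t1 = 0, t2 = 12, t1' = 5, t2' = 6 (milliseconds, made-up values).
delay1 = two_way_delay(ctrl_send=0, ctrl_recv=12, resp_recv=5, resp_send=6)
# Second probe: t3 = 100, t4 = 115, t3' = 107, t4' = 108.
delay2 = two_way_delay(ctrl_send=100, ctrl_recv=115, resp_recv=107, resp_send=108)

print(delay1, delay2, jitter(delay1, delay2))  # 11 14 3
```

Because only differences taken on the same device are used, the calculation does not depend on the Controller and Responder clocks being synchronized.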
2.11.3 Applications
(Figure labels: ATN, Router, Network, Controller, Responder)
3 Reliability
This document describes the reliability in terms of the overview, principle, and applications.
3.1 VRRP
3.2 Bit-Error-Triggered Protection Switching
3.3 BFD
3.4 NSR Overview
Only devices with two main control boards (such as ATN 950Bs) support the ISSU feature. This
section describes how to implement NSR and related technologies.
3.5 Ethernet OAM
3.6 E-LMI
3.7 MPLS-TP OAM
3.8 ISSU Feature Description
Only devices with two main control boards (such as ATN 950Bs) support ISSU feature.
3.1 VRRP
3.1.1 Introduction
Purpose
The Virtual Router Redundancy Protocol (VRRP) is a fault tolerant protocol that groups
several ATNs into a virtual router. If the next hop ATN of a host fails, VRRP switches traffic
to another ATN, ensuring continuous and reliable communication.
The basic concepts related to VRRP are as follows:
l VRRP device: a router running VRRP, which may belong to one or multiple virtual
routers.
l Virtual router: an abstract device managed by VRRP, also called a VRRP backup group.
A virtual router functions as a default gateway on a shared local area network (LAN). A
virtual router is identified by a virtual router identifier and has a set of virtual IP
addresses.
l Virtual IP address: IP address of a virtual router. A virtual router is manually assigned
one or multiple virtual IP addresses.
l IP address owner: a VRRP device that uses a virtual router's IP address as an actual
interface address. When working normally, the VRRP device responds to packets
destined for the virtual IP address, such as ping packets and TCP packets.
l Virtual MAC address: a MAC address that is generated according to a virtual router ID.
A VRRP virtual router has a virtual MAC address in the format of 00-00-5E-00-01-
{VRID}, and a VRRP6 virtual router has a virtual MAC address in the format of
00-00-5E-00-02-{VRID}. A virtual router responds to Address Resolution Protocol
(ARP) requests using the virtual MAC address but not the interface's actual MAC
address.
l Primary IP address: an IP address selected from one of the physical interfaces' IP
addresses. It is usually the first configured IP address. The primary IP address functions
as the source IP address in VRRP multicast packets.
l Master Router (virtual router master): a VRRP device that forwards packets to the virtual
IP address and responds to ARP requests. When an IP address owner is available, it
usually functions as the master router.
l Backup Router (virtual router backup): a set of VRRP devices that do not forward
packets. If the master router fails, the backup routers will compete to be the new master
router.
l Preemption mode: a mode in which a backup router becomes the master router if the
backup router has a higher priority than the current master router.
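The virtual MAC address formats listed above can be derived from the virtual router ID as follows. This is a minimal sketch: the function name is an assumption, and the VRID range of 1 to 255 comes from the VRRP specification rather than from this section.

```python
def virtual_mac(vrid: int, ipv6: bool = False) -> str:
    """Return the virtual MAC address of a VRRP (IPv4) or VRRP6 (IPv6) virtual router."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1 to 255")
    family_octet = 0x02 if ipv6 else 0x01  # 01 for VRRP, 02 for VRRP6
    return "00-00-5E-00-%02X-%02X" % (family_octet, vrid)


print(virtual_mac(1))              # 00-00-5E-00-01-01
print(virtual_mac(18, ipv6=True))  # 00-00-5E-00-02-12
```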
Description
As networks rapidly develop and applications become diversified, various value-added
services, such as Internet Protocol television (IPTV) and video conferencing, have become
widespread. Demands for network infrastructure reliability are increasing, especially in
nonstop network transmission.
Generally, hosts use one default gateway to communicate with external networks. If the
default gateway fails, communication between the hosts and external networks is interrupted.
System reliability can be improved using dynamic routing protocols (such as RIP and OSPF)
or ICMP Router Discovery Protocol (IRDP). However, this method requires complex
configurations and each host must support dynamic routing protocols.
VRRP resolves this issue by enabling several routers to be grouped into a virtual router, also
called a VRRP backup group. In normal circumstances, the master router in the VRRP backup
group functions as a default gateway and provides access services for users. If the master
router fails, VRRP elects a backup router from the VRRP backup group to provide access
services for users.
Benefits
VRRP offers the following benefits to carriers:
l Reliable transmission: A logical VRRP gateway on a multicast or broadcast local area
network (LAN), such as an Ethernet network, ensures reliable transmission over key
links. VRRP helps prevent service interruptions if a link to a physical VRRP gateway
fails.
l Flexible applications: A VRRP header is encapsulated into an IP packet. This
implementation allows the association between VRRP and various upper-layer protocols.
l Low network overheads: VRRP uses only VRRP Advertisement packets.
3.1.2 Principles
VRRP combines a group of routing devices on a LAN into a backup group that functions as a
virtual router. Hosts on the LAN only need to obtain the IP address of the virtual router rather
than the IP address of a specific device in the backup group. When the IP address of the
virtual router is configured as the default gateway for the hosts, the hosts can communicate
with an external network through the virtual gateway.
VRRP dynamically associates the virtual router with a physical device that transmits services.
When the device fails, another device is selected to transmit services. The switchover is
transparent to users, allowing the internal and external networks to communicate without
interruption.
(Figure labels: HostA, HostB, and HostC connect over Ethernet to ATN A (Master, 10.110.10.5),
ATN B (Backup, 10.110.10.6), and ATN C (Backup, 10.110.10.7), which connect to the network.
Virtual IP address: 10.110.10.1.)
l ATN A, ATN B, and ATN C form a VRRP backup group that functions as a virtual
router. The IP address of the virtual router is 10.110.10.1. The virtual IP address can be
specified or borrowed from an interface of a device in this VRRP backup group.
l The actual IP addresses of ATN A, ATN B, and ATN C are 10.110.10.5, 10.110.10.6, and
10.110.10.7, respectively.
l Hosts on a LAN only need to set the default route to 10.110.10.1 rather than a physical
interface address of a specific device.
Hosts communicate with external networks through this virtual gateway. The virtual router
functions as follows:
l The master device is selected according to device priorities:
– The device with a higher priority is selected as the master device.
– If two devices have the same priority and one of them is the master device, the
backup device will remain in the backup state. If the two devices with the same
priority compete for becoming the master device, the device with a larger interface
IP address will be selected as the master device.
l Other devices function as backup devices and track the status of the master device.
– The master device sends a VRRP multicast packet at intervals of
Advertisement_Interval to notify backup devices in the backup group that the
master device is working normally.
– In a VRRP group with one backup device, when the backup device does not receive
packets from the master device within the period of Master_Down_Interval, the
backup device transitions itself to become the master device. In a VRRP group with
multiple backup devices, when the backup devices do not receive packets from the
master device within the period of Master_Down_Interval, multiple backup devices
may become the master devices in a short period. The devices then compare the
priorities in the received VRRP packets with their local priorities, and the device
with the highest priority is selected as the master device. After a backup device
becomes the master device, it sends gratuitous ARP packets to update MAC entries
on the switches. User traffic is then switched to the master device. The entire
process is transparent to users.
The preceding analysis demonstrates that when using VRRP, hosts do not need to perform
additional operations and can communicate with external networks even when a device fails.
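The selection rule used when several devices compete to become the master can be sketched as follows. This is an illustration only: the function name and the priority values are hypothetical, and the sketch models just the competition case (highest priority first, then the larger interface IP address). It does not model the case in which an existing master keeps its role against a backup of equal priority.

```python
import ipaddress
from typing import List, Tuple


def elect_master(devices: List[Tuple[str, int, str]]) -> str:
    """Return the name of the winning device.

    Each device is (name, priority, interface IP address). The highest priority wins;
    if priorities are equal, the larger interface IP address wins.
    """
    winner = max(devices, key=lambda d: (d[1], int(ipaddress.ip_address(d[2]))))
    return winner[0]


group = [
    ("ATN A", 120, "10.110.10.5"),
    ("ATN B", 100, "10.110.10.6"),
    ("ATN C", 100, "10.110.10.7"),
]
print(elect_master(group))      # ATN A: highest priority
print(elect_master(group[1:]))  # ATN C: equal priority, larger IP address
```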
(Partially recovered labels of the VRRP packet format figure: ..., IP Address (n),
Authentication Data (1), Authentication Data (2))
State Machine
VRRP defines three states: Initialize, Master, and Backup. Only the device in the Master state
can forward packets destined for the virtual IP address.
Figure 3-3 shows the VRRP state transition.
(Figure labels, state transitions:)
l INITIALIZE to MASTER: receives a Startup message with priority 255.
l INITIALIZE to BACKUP: receives a Startup message with priority lower than 255.
l MASTER to INITIALIZE: receives a Shutdown message.
l BACKUP to INITIALIZE: receives a Shutdown message.
l MASTER to BACKUP: receives a packet with higher priority.
l BACKUP to MASTER: MASTER_DOWN_TIMER times out.
Initialize: An ATN is in the Initialize state when started. If a Startup message is received, the
ATN changes to the Backup state or the Master state. If the ATN is the IP address owner, it
changes to the Master state directly. In this state, the ATN does not process VRRP packets.
Master: In the Master state, the ATN performs the following:
l Sends the VRRP packets periodically.
l Sends the virtual MAC address in response to ARP requests for the virtual IP address.
l Forwards IP packets in which the destination MAC address is the virtual MAC address.
l If the ATN is the virtual IP address owner, it accepts IP packets of which the destination
IP address is the virtual IP address. If the ATN is not the virtual IP address owner, it
discards these IP packets.
l Transitions to the Backup state if the priority in the received packet is greater than the
local priority.
l Transitions to the Initialize state when the interface is shut down.
Backup: In the Backup state, the ATN performs the following:
l Accepts VRRP packets sent by the master and determines whether the master is working
properly.
l Does not respond to ARP requests with the virtual IP address.
l Discards IP packets in which the destination MAC address is the virtual MAC address.
l Discards IP packets in which destination IP address is the virtual IP address.
l When receiving a packet of lower priority, it immediately switches to the Master state by
default. If non-preemption is configured, the ATN resets the timer. If a preemption delay
is configured, the ATN resets the timer and switches to the Master state after the
preemption delay expires. When receiving a packet of higher priority, the ATN resets the
timer. When receiving a packet of equal priority, the ATN resets the timer but does not
compare IP addresses.
l Transitions to the Master state when MASTER_DOWN_TIMER times out.
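The state transitions described in this section can be summarized in a small transition function. This is a simplified sketch, not the device implementation: the function, event, and parameter names are assumptions, timers are not modeled, and the preemption delay is omitted, so a preempting backup switches to the Master state immediately.

```python
def next_state(state, event, *, is_ip_owner=False, local_priority=100,
               received_priority=None, preempt=True):
    """Return the next VRRP state for a (state, event) pair.

    Events: "startup", "shutdown", "advertisement", "master_down_timer".
    received_priority must be given for "advertisement" events.
    """
    if event == "shutdown":
        return "Initialize"
    if state == "Initialize" and event == "startup":
        # The IP address owner becomes the master directly.
        return "Master" if is_ip_owner else "Backup"
    if state == "Master" and event == "advertisement":
        # A master yields only to a higher priority.
        return "Backup" if received_priority > local_priority else "Master"
    if state == "Backup":
        if event == "master_down_timer":
            return "Master"
        if event == "advertisement":
            # Preempt only when the advertised priority is lower and preemption is enabled.
            if received_priority < local_priority and preempt:
                return "Master"
            return "Backup"  # equal or higher priority: the timer is reset, state unchanged
    return state


print(next_state("Initialize", "startup", is_ip_owner=True))          # Master
print(next_state("Backup", "advertisement", received_priority=80))    # Master
print(next_state("Master", "advertisement", received_priority=120))   # Backup
```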
l Each backup group consists of a master device and multiple backup devices.
l The master devices of backup groups can be different.
l A device can join multiple backup groups and obtain different priorities in each group.
[Figure 3-4 diagram: ATN B (10.110.10.6, labeled Backup) and ATN C (10.110.10.7, labeled Backup/Master) connect HostB and HostC to the network over Ethernet. Backup group 2 uses the virtual IP address 10.110.10.2.]
As shown in Figure 3-4, two backup groups are configured, that is, Backup group 1 and
Backup group 2.
l ATN A is the master in Backup group 1 and the backup in Backup group 2.
Principles
Devices in a VRRP backup group exchange VRRP Advertisement packets to negotiate the
master/backup status and implement backup. If the link between devices in a VRRP backup
group fails, VRRP Advertisement packets cannot be exchanged to negotiate the master/
backup status. A backup device attempts to preempt the Master state after a period three times
as long as the time interval at which VRRP Advertisement packets are broadcast. During this
period, user traffic is still forwarded to the master device, which results in user traffic loss.
Bidirectional Forwarding Detection (BFD) can rapidly detect faults in links or IP routes. BFD
for VRRP enables a master/backup VRRP switchover to be completed within 1 second,
preventing user traffic loss. A BFD session is established between the master and backup
devices in a VRRP backup group and is bound to the VRRP backup group. BFD immediately
detects communication faults in the VRRP backup group and instructs the VRRP backup
group to perform a master/backup switchover, minimizing service interruptions.
Related Concepts
Association between a VRRP backup group and a BFD session can be implemented in either
of the following modes:
l When an NPE is directly connected to a UPE, a VRRP backup group can be bound to a
common BFD session. If the BFD session detects a fault and goes Down, the BFD
module notifies the VRRP backup group of the status change. After receiving the
notification, the VRRP backup group changes VRRP priorities of devices and determines
whether to perform a master/backup VRRP switchover.
l When an NPE is connected to a UPE through another device, a VRRP backup group can
be bound to link and peer BFD sessions. If the BFD session detects a fault and goes
Down, the BFD module notifies the VRRP backup group of the status change. After
receiving the notification, the VRRP backup group directly performs a master/backup
switchover.
NOTE
In both association modes, the VRRP backup group can be bound to a static BFD session or a static
BFD session with automatically negotiated discriminators.
Implementation
The following sections describe how to associate a VRRP backup group with a BFD session
in different modes.
Association Between a VRRP Backup Group and a Common BFD Session
On the network shown in Figure 3-5, VRRP is enabled on NPE1 and NPE2. NPE1 functions
as the master device and NPE2 functions as the backup device. NPE1 is transmitting user
traffic. A common BFD session is established between NPE1 and NPE2. The VRRP backup
group tracks the status of the BFD session. If the BFD session detects a fault and goes Down,
the BFD module notifies the VRRP backup group of the status change. After receiving the
notification, the VRRP backup group changes VRRP priorities of devices and performs a
master/backup VRRP switchover.
If the BFD session detects a fault in a link between NPE1 and the UPE, the BFD session goes
Down. The BFD module notifies the VRRP backup group of the status change. After
receiving the notification, NPE2's VRRP priority increases to be higher than NPE1's VRRP
priority. NPE2 becomes the master device and takes over traffic. During this process, a rapid
master/backup VRRP switchover is performed.
Figure 3-5 Association between a VRRP backup group and a common BFD session
[Diagram: NPE1 (Master) and NPE2 (Backup) run VRRP and a BFD session between them; both connect to the UPE.]
Figure 3-5 shows the network on which a VRRP backup group tracks a common BFD session
when and after a fault occurs.
l NPE1's VRRP priority is 120 and NPE1 is in the Master state in a VRRP backup group.
l NPE2's VRRP priority is 100 and NPE2 is in the Backup state in a VRRP backup group.
The immediate preemption mode is enabled on NPE2.
l On NPE2, the VRRP backup group is configured to track a common BFD session. If the
BFD session detects a fault and goes Down, NPE2 increases its VRRP priority by 40
after being notified.
Implementation is as follows:
1. When NPE1 works properly, NPE1 periodically sends VRRP Advertisement packets to
inform NPE2 that NPE1 works properly. NPE2 tracks the status of NPE1 and the BFD
session.
2. If the BFD session detects either of the following faults, the BFD session goes Down:
– Link or device fault between NPE1 and NPE2
NPE2 increases its VRRP priority to 140 (100 + 40), higher than NPE1's VRRP
priority. NPE2 preempts the Master state and sends gratuitous ARP packets to
update MAC addresses on downstream devices.
Before receiving a packet from NPE2, NPE1 retains the Master state, while NPE2
becomes the master device. Both NPE1 and NPE2 are therefore in the Master state for a
short period of time. Using a trunk technique can prevent dual masters in a VRRP
backup group.
NPE1 then receives a VRRP Advertisement packet from NPE2, which has become the
master device. After detecting that the priority carried in the VRRP Advertisement packet
is higher than the local priority, NPE1 stops sending VRRP Advertisement packets and
enters the Backup state.
– NPE1 device fault
NPE2 increases its VRRP priority to 140 (100 + 40), higher than NPE1's VRRP
priority. NPE2 preempts the Master state and sends gratuitous ARP packets to
update MAC addresses on downstream devices.
3. After the fault is rectified, the BFD session goes Up.
NPE2 restores the priority value of 100. NPE2 retains the Master state and is still able to
send VRRP Advertisement packets.
After receiving the packets sent by NPE2, NPE1 detects that the priority carried in the
packets is lower than the local VRRP priority, and waits a specified period before
preempting the Master state. After restoring the Master state, NPE1 sends VRRP
Advertisement packets and gratuitous ARP packets.
After receiving VRRP Advertisement packets carrying a higher priority than the local
priority, NPE2 enters the Backup state.
4. NPE1 in the Master state forwards user traffic to networks and NPE2 is in the Backup
state.
The preceding process shows that BFD for VRRP is different from VRRP. After BFD for
VRRP is used and a fault occurs, the backup device immediately preempts the Master state
without waiting a period three times the interval at which a VRRP Advertisement packet is
broadcast. A master/backup VRRP switchover can be implemented in milliseconds.
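The priority arithmetic in this scenario can be sketched as follows. This is an illustrative model only: the class name, field names, and the tie-breaking rule are assumptions, and only the values 120, 100, and 40 come from the example above.

```python
from dataclasses import dataclass


@dataclass
class VrrpMember:  # hypothetical name, for illustration only
    name: str
    configured_priority: int
    bfd_down_increase: int = 0   # added to the priority while the tracked BFD session is Down
    bfd_session_up: bool = True

    @property
    def effective_priority(self) -> int:
        if self.bfd_session_up:
            return self.configured_priority
        return self.configured_priority + self.bfd_down_increase


def higher_priority_member(a: VrrpMember, b: VrrpMember) -> VrrpMember:
    # Simplification: ties go to the first argument. Real VRRP breaks ties differently.
    return a if a.effective_priority >= b.effective_priority else b


npe1 = VrrpMember("NPE1", configured_priority=120)
npe2 = VrrpMember("NPE2", configured_priority=100, bfd_down_increase=40)

# Normal operation: NPE1 (120) outranks NPE2 (100).
before_fault = higher_priority_member(npe1, npe2).name

# The tracked BFD session goes Down: NPE2 rises to 140 and preempts.
npe2.bfd_session_up = False
during_fault = higher_priority_member(npe1, npe2).name

# The BFD session recovers: NPE2 falls back to 100 and NPE1 can preempt again.
npe2.bfd_session_up = True
after_recovery = higher_priority_member(npe1, npe2).name
```

The sketch compares priorities only. It does not model advertisement timers or the short dual-master window described in step 2.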
Association Between a VRRP Backup Group and Link and Peer BFD Sessions
On the network shown in Figure 3-6, VRRP runs between two NPEs. A peer BFD session is set
up between the NPEs to detect link and device failures. A link BFD session is established
between each NPE and the UPE to detect link and device failures. When NPE2 detects that the
peer BFD session is Down while the Link2 BFD session is Up, NPE2 changes from the
Backup state to the Master state and takes over traffic.
Figure 3-6 Association between a VRRP backup group and link and peer BFD sessions
[Diagram: NPE1 and NPE2, both connected to the UPE.]
Figure 3-6 shows the network on which a VRRP backup group tracks link and peer BFD
sessions.
l NPE1 and NPE2 run VRRP.
l A peer BFD session is established between NPEs through the UPE to detect link and
device failures.
l Link1 BFD session is established between the UPE and NPE1. Link2 BFD session is
established between the UPE and NPE2 to detect link and device failures.
Implementation is as follows:
1. When NPE1 works properly, NPE1 periodically sends VRRP Advertisement packets to
inform NPE2 that NPE1 works properly. NPE1 tracks the BFD session status. NPE2
tracks the status of NPE1 and the BFD session.
2. If a BFD session detects either of the following faults, the BFD session goes Down:
– Link1 or the UPE fails. Link1 BFD session and the peer BFD session go Down.
Link2 BFD session is Up.
NPE1's VRRP status directly becomes Initialize.
NPE2's VRRP status directly becomes Master.
– NPE1 fails. Link1 BFD session and the peer BFD session go Down. Link2 BFD
session is Up. NPE2's VRRP status becomes Master.
3. After a fault is rectified, the BFD sessions go Up, and the NPEs in the VRRP backup
group restore their VRRP status.
NOTE
A Link2 fault does not affect NPE1 status, and NPE1 continues to forward upstream traffic properly.
However, NPE2's VRRP status becomes Master if the peer BFD session and Link2 BFD session go
Down, and NPE2 detects the peer BFD session status change before detecting Link2 BFD session status
change. After NPE2 detects Link2 BFD session status change, NPE2's VRRP status enters Initialize.
Figure 3-7 shows the state machine for the association between a VRRP backup group and
link and peer BFD sessions.
Figure 3-7 State machine for the association between a VRRP backup group and link and
peer BFD sessions
[Diagram: state machine with the states INITIALIZE, MASTER, and BACKUP. BACKUP changes to MASTER when the peer BFD session goes Down and the link BFD session goes Up. The transitions from and to INITIALIZE are labeled with the link BFD session status (Up or Down) and with whether the VRRP priority value is 255.]
The preceding process shows that after link BFD for VRRP and peer BFD for VRRP are
used, the backup device can immediately preempt the Master state if a fault occurs. The
backup device does not need to wait for a period of three times the interval at which VRRP
Advertisement packets are sent, or for its VRRP priority to be changed. A master/backup VRRP
switchover can be performed in milliseconds.
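The way the link and peer BFD session states drive the VRRP state in this mode can be condensed into one function. This is a hedged sketch based only on the fault cases listed above; the function name and the negotiated_state parameter are hypothetical.

```python
def vrrp_state(link_bfd_up: bool, peer_bfd_up: bool, negotiated_state: str) -> str:
    """Return the VRRP state of one NPE from its tracked BFD sessions.

    negotiated_state is the state the NPE would hold from normal VRRP
    negotiation ("Master" or "Backup") when both sessions are Up.
    """
    if not link_bfd_up:
        # The NPE's own link to the UPE is down: it cannot forward traffic.
        return "Initialize"
    if not peer_bfd_up:
        # Own link is fine but the peer is unreachable: take over directly.
        return "Master"
    return negotiated_state


# Link1 or the UPE fails: Link1 and peer sessions Down, Link2 session Up.
npe1_state = vrrp_state(link_bfd_up=False, peer_bfd_up=False, negotiated_state="Master")
npe2_state = vrrp_state(link_bfd_up=True, peer_bfd_up=False, negotiated_state="Backup")
```

In this fault case NPE1 ends up in Initialize and NPE2 in Master, without any priority change.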
Benefits
BFD for VRRP speeds up master/backup VRRP switchovers if faults occur.
As a result, a backup device preempts to become the master device. Then the new master
device sends a gratuitous ARP packet to the virtual IP address of each virtual router to notify
the related bound modules of the status change. In preemption mode, if the original master
device has a higher priority, it can preempt to become the master device again after the
switchover. This causes the VRRP status to change twice, affecting service traffic.
To prevent service traffic forwarding from being affected during an AMB/SMB switchover,
VRRP devices must support VRRP smooth switching.
When the AMB and SMB on a device are working properly, the master device in a VRRP
backup group sends VRRP multicast packets at intervals of Advertisement_Interval. The
backup device determines whether the master device works properly based on the multicast
packets it receives.
During VRRP smooth switching, the master device cooperates with backup devices to ensure
smooth transmission of services.
l To perform VRRP smooth switching, the master device and backup devices must be
enabled to learn the interval at which VRRP packets are sent. After this function is
enabled:
– The master device does not learn the interval at which VRRP packets are sent or
check consistency of the intervals.
– When a backup device receives a VRRP packet from the master device, it checks
the interval in the VRRP packets. If the interval in the packet is different from the
interval configured on the device, the backup device changes its own interval to the
interval specified in the packet.
l ATN A is configured with VRRP smooth switching. After an AMB/SMB switchover
occurs and the new AMB starts, VRRP saves the currently configured interval, changes
the interval of the master VRRP backup group, and sends a VRRP switching packet
carrying the new interval to ATN B at the currently configured intervals.
l After receiving the VRRP packet, ATN B finds that the interval carried in the VRRP
packet is different from the locally configured interval. ATN B then changes the local interval
to the interval carried in the received VRRP packet.
l After smooth switching is complete, ATN A sends a VRRP Recovery packet carrying the
interval set before the AMB/SMB switchover. ATN B then learns the interval again.
When performing VRRP smooth switching, note the following:
l During VRRP smooth switching, the interval learning function takes precedence over the
preemption function. That is, when the interval carried in the received packet is different
from the current interval and the priority carried in the received packet is lower than the
current priority, VRRP first learns the interval and resets the timeout timer, and then
determines whether to preempt to become the master.
l VRRP smooth switching also depends on the system performance. If the system is very
busy after an AMB/SMB switchover occurs and cannot schedule operations of the VRRP
module, VRRP smooth switching cannot take effect.
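The interval learning behavior on a backup device, including its precedence over preemption, can be sketched as follows. The class, field, and action names are hypothetical, and the interval values are arbitrary examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackupGroup:  # hypothetical name, for illustration only
    interval: int            # locally used advertisement interval, in seconds
    priority: int
    learn_interval: bool = True

    def on_advertisement(self, pkt_interval: int, pkt_priority: int) -> List[str]:
        """Process one VRRP packet and return the actions taken, in order."""
        actions = []
        if self.learn_interval and pkt_interval != self.interval:
            # Interval learning comes first: adopt the interval and reset the timer.
            self.interval = pkt_interval
            actions += ["learn-interval", "reset-timeout-timer"]
        if pkt_priority < self.priority:
            # Only afterwards does the device decide whether to preempt.
            actions.append("evaluate-preemption")
        return actions


atn_b = BackupGroup(interval=1, priority=120)
# Switching packet from ATN A carrying a longer interval and a lower priority.
switching_actions = atn_b.on_advertisement(pkt_interval=3, pkt_priority=100)
# Recovery packet carrying the interval used before the AMB/SMB switchover.
recovery_actions = atn_b.on_advertisement(pkt_interval=1, pkt_priority=100)
```

After the recovery packet, atn_b.interval is back at its original value, which mirrors ATN B learning the interval again.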
3.1.2.8 mVRRP
Principles
A UPE is usually dual-homed to two NPEs at the aggregation layer on a MAN. Multiple
VRRP backup groups can be configured on the two NPEs to transmit various types of
services. Each VRRP backup group maintains its own state machine, leading to transmission
of a lot of VRRP Advertisement packets between NPEs.
To help reduce bandwidth and CPU resource consumption during VRRP packet transmission,
a VRRP backup group can be configured as a Management Virtual Router Redundancy
Protocol (mVRRP) backup group. Other VRRP backup groups are bound to the mVRRP
backup group and become service VRRP backup groups. Only the mVRRP backup group, not
service VRRP backup groups, sends VRRP packets to negotiate the master/backup status. The
mVRRP backup group determines the master/backup status of the service VRRP backup
groups.
Related Concepts
An mVRRP group has all functions of a common VRRP backup group. Different from a
common VRRP backup group, an mVRRP backup group can be bound to other service VRRP
backup groups and determine the status of the service VRRP backup groups.
An mVRRP backup group can be bound to a maximum of 127 VRRP backup groups, but cannot be
bound to another mVRRP backup group.
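The binding relationship can be pictured with a short sketch in which service VRRP backup groups copy the state chosen by the mVRRP backup group. The class and method names are hypothetical. The sketch assumes that service groups are in the Backup state whenever the mVRRP backup group is not Master.

```python
MAX_BOUND_GROUPS = 127  # limit stated above


class MvrrpGroup:  # hypothetical name, for illustration only
    def __init__(self, vrid: int) -> None:
        self.vrid = vrid
        self.state = "Initialize"
        self.services = {}  # service VRID -> state

    def _service_state(self) -> str:
        # Assumption: service groups are Backup unless the mVRRP group is Master.
        return "Master" if self.state == "Master" else "Backup"

    def bind(self, service_vrid: int) -> None:
        if len(self.services) >= MAX_BOUND_GROUPS:
            raise ValueError("an mVRRP backup group can be bound to at most 127 groups")
        self.services[service_vrid] = self._service_state()

    def set_state(self, state: str) -> None:
        # Only the mVRRP group negotiates; the service groups follow its result.
        self.state = state
        for service_vrid in self.services:
            self.services[service_vrid] = self._service_state()


group = MvrrpGroup(vrid=1)
for vrid in range(2, 129):        # binds 127 service groups
    group.bind(vrid)
group.set_state("Master")
all_master = all(s == "Master" for s in group.services.values())
group.set_state("Initialize")
all_backup = all(s == "Backup" for s in group.services.values())
```

Only one state machine runs VRRP negotiation here, which is the source of the bandwidth and CPU savings described in the Principles part.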
Benefits
This feature offers the following benefits:
The VRRP protocol supports both VRRPv2 and VRRPv3 packets. VRRPv2 is defined in RFC
3768, and VRRPv3 is defined in RFC 5798. Both VRRPv2 and VRRPv3 are used to advertise
the priority and status of the master device to other devices in a backup group. Figure 3-8
shows the format of a VRRPv3 packet.
[Figure 3-8 diagram: VRRPv3 packet format, ending with the IPvX Address(es) field.]
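As a rough illustration of the VRRPv3 layout defined in RFC 5798, the following sketch decodes the fixed 8-byte header and the address list. The function name and the returned dictionary keys are hypothetical, and the checksum is read but not verified.

```python
import ipaddress
import struct


def parse_vrrpv3(payload: bytes, ipv6: bool = False) -> dict:
    """Decode a VRRPv3 packet body (RFC 5798 layout); the checksum is not verified."""
    if len(payload) < 8:
        raise ValueError("VRRPv3 header is 8 bytes")
    ver_type, vrid, priority, count, rsvd_interval, checksum = struct.unpack(
        "!BBBBHH", payload[:8]
    )
    addr_len = 16 if ipv6 else 4
    if len(payload) < 8 + count * addr_len:
        raise ValueError("packet shorter than its address count implies")
    addresses = [
        str(ipaddress.ip_address(payload[8 + i * addr_len: 8 + (i + 1) * addr_len]))
        for i in range(count)
    ]
    return {
        "version": ver_type >> 4,                     # high 4 bits
        "type": ver_type & 0x0F,                      # low 4 bits, 1 = Advertisement
        "vrid": vrid,
        "priority": priority,
        "max_adver_int_cs": rsvd_interval & 0x0FFF,   # 12 bits, in centiseconds
        "checksum": checksum,
        "addresses": addresses,
    }


# Example packet: version 3, type 1, VRID 1, priority 120, one IPv4 address,
# advertisement interval of 100 centiseconds, checksum left at 0 for the example.
sample = struct.pack("!BBBBHH", 0x31, 1, 120, 1, 100, 0)
sample += ipaddress.IPv4Address("10.110.10.2").packed
parsed = parse_vrrpv3(sample)
```

The virtual IP address 10.110.10.2 is reused from the load-balancing example earlier in this document. The other field values are arbitrary.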
3.1.3 Applications
[Figure 3-9 diagram: network with ATN A, ATN B, a switch, and the Internet.]
Solved problem: VRRP cannot detect status changes on interfaces that are not enabled with
VRRP. In this case, when the outbound interface is faulty, VRRP cannot detect the fault,
which causes service interruption.
The configuration is as follows:
l VRRP is enabled to track specified interfaces.
l A VRRP backup group tracks an interface in Increased mode or Reduced mode.
l When the status of the interface tracked by VRRP changes, the VRRP backup group is
notified of the change and then increases or decreases the VRRP priority to determine
VRRP switchover.
As shown in Figure 3-9, ATN A and ATN B are enabled with VRRP. The priority of the VRRP
backup group on ATN B is higher than that of the VRRP backup group on ATN A, and ATN B
tracks an interface in Reduced mode. ATN B functions as the master device and forwards
user traffic, as shown by the dotted lines in Figure 3-9. If GE 1/0/0, the interface on ATN B
that connects to the Internet, becomes faulty, the VRRP backup group that tracks GE 1/0/0 in
Reduced mode decreases its priority. ATN A then preempts the Master state, receives user
traffic, and forwards the traffic to the Internet.
3.1.3.2 mVRRP
[Figure 3-10 diagram: NPE1 (Master) and NPE2 (Backup) run an mVRRP backup group and service VRRP backup groups; the UPE connects to both NPEs.]
Problem: A large number of VRRP packets are transmitted, wasting bandwidth and CPU
resources.
l An mVRRP backup group and multiple ordinary VRRP backup groups are set up on
NPE 1 and NPE 2. The ordinary VRRP backup groups are bound to the mVRRP backup
group and function as service VRRP backup groups.
l The UPE does not sense the mVRRP backup group and service VRRP backup groups.
As shown in Figure 3-10, when an mVRRP backup group on NPE 1 changes from the Master
state to the Backup or Initialize state, the mVRRP backup group requests all its bound service
VRRP backup groups to change their state to Backup. In this case, the mVRRP backup group
on NPE 2 changes from the Backup state to the Master state, and all service VRRP backup
groups bound to it also change their status to Master. When the mVRRP backup group and the
service backup groups change to the Master state, they broadcast gratuitous ARP packets to
switch user traffic to the new master backup groups.
ME Metro Ethernet
PW pseudo wire
Purpose
The demand for network bandwidth is rapidly increasing as mobile services evolve from
narrowband voice services to integrated broadband services, including voice and streaming
media. Meeting the bandwidth demand with traditional bearer networks dramatically raises
carriers' operation costs. To tackle the challenges posed by this rapid broadband-oriented
development, carriers urgently need mobile bearer networks that feature flexibility, low costs,
and high efficiency. IP-based mobile bearer networks are an ideal choice. IP radio access
networks (RANs), a type of IP-based mobile bearer network, are increasingly widely used.
Traditional bearer networks use the retransmission mechanism or the mechanism that allows
one end to accept only one copy of packets from the multiple copies of packets sent by the
other end to minimize bit error impact. IP RANs have higher reliability requirements than
traditional bearer networks when carrying broadband services. Traditional fault detection
mechanisms cannot trigger protection switching based on random bit errors. As a result, bit
errors may degrade or even interrupt services on an IP RAN in extreme cases.
NOTE
To prevent impacts on services, check whether protection links have sufficient bandwidth resources
before deploying bit-error-triggered protection switching.
Benefits
Bit-error-triggered protection switching offers the following benefits:
l Protects traffic against random bit errors, meeting high reliability requirements and
improving service quality.
l Enables devices to record bit error events. These records help carriers locate the nodes or
lines that have bit errors and take corrective measures promptly.
3.2.2 Principles
Table 3-1 describes the functions provided by bit-error-triggered protection switching.
Interface-based bit error detection | This function detects bit errors, calculates the bit error rate (BER), and reports bit error events. | This function is the foundation for bit-error-triggered protection switching.
Bit-error-triggered trunk update | This function detects bit error events on member trunk interfaces and uses bit error events to trigger trunk interfaces to update the availability status of member interfaces. | This function protects trunk interfaces against bit errors.
Bit-error-triggered section switching | This function triggers routes to re-converge after detecting bit error events on interfaces, which in turn triggers traffic to switch from the faulty route to another route. | This function protects services transmitted over a Label Distribution Protocol (LDP) label switched path (LSP) against bit errors.
Related Concepts
Bit-error-triggered protection switching involves the following concepts:
l Bit error: refers to the deviation between a bit that is sent and the bit that is received.
l BER: is the number of bit errors divided by the total number of transferred bits during a
studied time interval. The BER can be considered as an approximate estimate of the bit
error probability.
l Segment BER: is calculated based on the bit errors received by the inbound interface on
an LSP node.
l LSP BER: is calculated based on the BER of each segment on an LSP.
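The BER definition above amounts to a single division. The following sketch uses a hypothetical function name and made-up sample numbers.

```python
def bit_error_rate(bit_errors: int, total_bits: int) -> float:
    """BER = number of bit errors / total number of transferred bits in the interval."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    return bit_errors / total_bits


# Example: 3 errored bits among 1,000,000 transferred bits gives a BER of 3e-6.
sample_ber = bit_error_rate(3, 1_000_000)
```

This section does not give the formula that combines segment BERs into an LSP BER, so the sketch does not attempt it.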
A trunk interface goes Down when the number of member interfaces in the Up state falls
below the configured lower threshold.
To deploy bit-error-triggered section switching, configure bit error detection on the interfaces
along an LDP LSP and configure the switching type as trigger-section. If an interface along
an LDP LSP detects a bit error event, the interface triggers route re-convergence, which in
turn triggers the LDP LSP to switch to another LSP.
As a result, the LDP LSP always uses the link with a lower BER to transmit traffic,
minimizing the impact of bit errors on services.
NOTE
Bit-error-triggered route switching and section switching are mutually exclusive. Before you configure
bit-error-triggered route switching for an LDP LSP, ensure that bit-error-triggered section switching is
not configured.
Figure 3-12 shows how LSP bit error status is determined. If the BER of an LSP reaches or
exceeds the bit-error-triggered protection switching threshold of the RSVP-TE tunnel, the
LSP is in the excessive BER state. If the BER of the LSP is below the bit-error-triggered
protection switching threshold, the LSP is in the normalized BER state.
[Figure 3-12 diagram: LSP bit error status relative to the protection switching threshold and the revertive switching threshold.]
After the bit error status of the primary and backup LSPs is determined, the RSVP-TE
tunnel determines whether to perform a primary/backup LSP switchover based on the
following principles:
l If the primary and backup LSPs are both in the excessive or normalized BER state, the
RSVP-TE tunnel transmits traffic over the primary LSP.
l If one LSP is in the excessive BER state and the other LSP is in the normalized BER
state, the RSVP-TE tunnel transmits traffic over the latter one, no matter whether the
latter LSP is the primary or backup LSP.
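The two selection principles above reduce to a small decision function. The sketch below uses hypothetical function names and example thresholds. It classifies the BER state against the protection switching threshold only and does not model the revertive switching threshold.

```python
def ber_state(ber: float, protection_threshold: float) -> str:
    """Classify an LSP as 'excessive' or 'normalized' by the protection switching threshold."""
    return "excessive" if ber >= protection_threshold else "normalized"


def select_lsp(primary_state: str, backup_state: str) -> str:
    """Return which LSP of the RSVP-TE tunnel carries traffic."""
    if primary_state == "excessive" and backup_state == "normalized":
        return "backup"
    # Both states equal, or only the backup LSP is excessive: stay on the primary LSP.
    return "primary"


threshold = 1e-4  # example value, not a default of any product
chosen = select_lsp(ber_state(1e-3, threshold), ber_state(1e-6, threshold))
```

With these sample values the primary LSP is in the excessive BER state and the backup LSP is normalized, so traffic moves to the backup LSP.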
Bit-Error-Triggered PW Switching
As shown in Figure 3-13, an RSVP-TE tunnel carries a PW. PW redundancy is configured to
provide service-level protection. If the RSVP-TE tunnel does not have a TE hot standby
tunnel or the primary and backup LSPs of the RSVP-TE tunnel are both in the excessive BER
state, bit-error-triggered RSVP-TE tunnel switching cannot protect traffic against bit errors.
To resolve this issue, you can configure bit-error-triggered PW switching.
[Figure 3-13 diagram: a CE at a VPN site, the UPE, SPE1, SPE2, and the NPE; PW1, PW2, and a bypass PW are carried over RSVP-TE tunnels.]
The bit error status of the tunnel carrying the PW refers to the bit error status of the LSP that transmits
traffic in the tunnel.
You can configure a revertive switching policy to control revertive PW switching. When the tunnel
carrying the primary PW enters the normalized BER state, the revertive switching policy allows traffic
to immediately switch back to the primary PW, to switch back to the primary PW after a delay, or not to
switch back to the primary PW.
tunnel switching cannot protect L3VPN services against bit errors. To resolve this issue,
configure bit-error-triggered VPN route switching on SPE1. After the configuration is
complete, SPE1 automatically reduces the priority of VPN routes advertised by itself if it
detects a bit error event. Then, the UPE and NPE preferentially select the VPN routes
advertised by SPE2. As a result, L3VPN services are transmitted over the links without bit
errors. After the RSVP-TE tunnel between the UPE and SPE1 recovers, SPE1 automatically
increases the priority of VPN routes advertised by itself, so that the UPE and NPE
preferentially select the VPN routes advertised by SPE1 again.
[Diagram: backbone network with a CE at a VPN site, the UPE, SPE2, and the NPE, connected over an RSVP-TE tunnel.]
3.2.3 Applications
Networking Description
Figure 3-15 shows a typical IP radio access network (RAN) networking diagram. The IP
RAN uses a Resource Reservation Protocol-Traffic Engineering (RSVP-TE) tunnel to carry a
pseudo wire (PW). A traffic engineering (TE) hot standby tunnel is configured for the RSVP-
TE tunnel to provide link-level protection. PW redundancy is configured to provide service-
level protection.
[Figure 3-15 diagram: IP RAN with an RNC and a BNC; labels include PW1 (primary) and PW3.]
Feature Deployment
To meet the high reliability requirements of the IP RAN and better protect services against bit
errors, configure bit-error-triggered protection switching for both the RSVP-TE tunnel and
PWs.
To configure bit-error-triggered protection switching for the RSVP-TE tunnel, enable bit error
detection on the interfaces along the primary and backup LSPs, configure the switching type
as trigger-LSP, and configure bit error alarm thresholds. Then, enable bit-error-triggered
protection switching on the tunnel interface and set the bit-error-triggered protection
switching threshold and bit-error-triggered revertive switching threshold.
After you configure bit-error-triggered protection switching for the RSVP-TE tunnel,
configure bit-error-triggered protection switching for the PW carried over the RSVP-TE
tunnel and the backup PW.
Networking Description
Figure 3-16 shows a typical IP radio access network (RAN) networking diagram. The IP
RAN uses a Label Distribution Protocol (LDP) label switched path (LSP) to carry a pseudo
wire (PW). A protection mechanism, such as LDP fast reroute (FRR) or LDP-Interior
Gateway Protocol (IGP) synchronization, is used to provide link-level protection. PW
redundancy is configured to provide service-level protection.
[Figure 3-16 diagram: IP RAN with an RNC and a BNC; labels include PW1 (primary) and PW3.]
Feature Deployment
To meet the high reliability requirements of the IP RAN and protect services against bit
errors, configure bit-error-triggered protection switching for the LDP LSP. To do so, enable
bit error detection on the interfaces along the LDP LSP, configure the switching type as
trigger-section, and configure bit error alarm thresholds. After an interface along the LDP
LSP detects a bit error event, the interface updates its own status and triggers route re-
convergence, which in turn triggers LDP LSP or PW switching (or revertive LDP LSP or PW
switching).
Networking Description
Figure 3-17 shows a trunk interface networking diagram.
Feature Deployment
To improve trunk reliability, you can configure bit error detection for trunk interfaces. To do
so, enable bit error detection on each member interface and then on the trunk interface itself.
After bit error detection is configured for a trunk interface, a member interface changes its
status when detecting a bit error event, no matter whether the configured switching type is
trigger-LSP or trigger-section. Then, the trunk interface updates the availability status of the
member interface:
l If the status of the member interface changes from Up to Down, the trunk interface
disables the member interface from forwarding traffic.
l If the status of the member interface changes from Down to Up, the trunk interface
enables the member interface to forward traffic.
A trunk interface goes Down when the number of member interfaces in the Up state falls
below the configured lower threshold.
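The effect of bit error events on a trunk interface can be sketched by tracking member availability against the lower threshold. The function names and the member count are hypothetical.

```python
from typing import List, Sequence


def trunk_is_up(member_up: Sequence[bool], lower_threshold: int) -> bool:
    """The trunk stays Up while the number of Up members is not below the threshold."""
    return sum(member_up) >= lower_threshold


def mark_bit_error(member_up: Sequence[bool], index: int) -> List[bool]:
    """Return a copy in which the member that detected a bit error event is Down."""
    updated = list(member_up)
    updated[index] = False
    return updated


members = [True, True, True]          # three member interfaces, all Up
after_one = mark_bit_error(members, 0)
after_two = mark_bit_error(after_one, 1)
states = [trunk_is_up(m, lower_threshold=2) for m in (members, after_one, after_two)]
```

With a lower threshold of 2, the trunk survives the first bit error event and goes Down after the second.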
PW pseudo wire
3.3 BFD
3.3.1 Overview
Purpose
Bidirectional forwarding detection (BFD) rapidly monitors communications faults between
systems and notifies upper-layer applications of those faults.
Description
To minimize the impact of a fault on services and improve network availability, a network
device must rapidly detect communications faults between adjacent devices so that the upper
layer protocol can resolve the issue and recover services.
The existing detection mechanisms are as follows:
l Hardware detection: For example, Synchronous Digital Hierarchy (SDH) alarms are
used to detect link faults. Hardware detection can detect a fault quickly; however, not all
media support this mechanism.
l Slow Hello: Usually refers to the Hello mechanism used by a routing protocol. The slow
Hello mechanism takes seconds to detect a fault. In high-speed data transmission, for
example at gigabit rates, a detection time of more than one second results in a large amount
of data loss. Delay-sensitive services, such as voice, cannot tolerate a delay of more than
one second.
l Other detection mechanisms: Different protocols or manufacturers may provide their
own proprietary detection mechanisms; however, deploying proprietary detection
mechanisms on different systems can be very difficult.
l Low-cost fast fault detection for channels between adjacent forwarding engines. Faults
can be detected on interfaces, data links, and forwarding engines.
l A single mechanism capable of real-time detection over any media, at any protocol layer.
l Upper layer applications provide BFD with parameters, such as the detection address and
the detection time.
l BFD creates, deletes, or modifies a BFD session according to this information and
notifies the upper layer applications of the session status.
The following sections describe basic BFD concepts, including the BFD detection
mechanism, detected link types, BFD session modes, and session management.
BFD control packets are encapsulated in UDP packets. In the initial phase of a BFD session,
both systems negotiate with each other using parameters in BFD control packets, such as
discriminators, expected minimum intervals for sending and receiving BFD control packets,
and local BFD session status. When negotiations are successful, the two systems send BFD
control packets to each other at the negotiated intervals.
To meet fast detection requirements, the BFD draft specified that BFD control packets must
be sent and received at intervals expressed in microseconds. However, BFD-enabled devices
of most manufacturers can only process BFD control packets within milliseconds due to
limited processing capabilities. Therefore, the configured interval is expressed in milliseconds
and is converted to microseconds during internal processing. The minimum detection time
that the ATN supports is 10 milliseconds.
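The millisecond-to-microsecond conversion, together with the detection time rule from the BFD specification (RFC 5880, asynchronous mode), can be sketched as follows. The function names and sample values are hypothetical. The detection time formula comes from the RFC, not from this section.

```python
def ms_to_us(interval_ms: int) -> int:
    """Convert a configured interval in milliseconds to microseconds."""
    return interval_ms * 1000


def detection_time_us(remote_detect_mult: int,
                      local_required_min_rx_ms: int,
                      remote_desired_min_tx_ms: int) -> int:
    """Detection time = remote Detect Mult x the agreed receive interval (RFC 5880)."""
    agreed_rx_us = max(ms_to_us(local_required_min_rx_ms),
                       ms_to_us(remote_desired_min_tx_ms))
    return remote_detect_mult * agreed_rx_us


# Example: both ends use 10 ms intervals and a detection multiplier of 3.
example_us = detection_time_us(3, 10, 10)   # 30000 microseconds, that is 30 ms
```

The multiplier of 3 and the 10 ms intervals are sample values only. Supported ranges depend on the device.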
PWs
l SS PWs
l MS PWs
l BGP PWs
l IP links
In the ATN, BFD in either single-hop detection mode or multi-hop detection mode can
monitor the following IP links:
– Layer 3 physical interfaces
– Ethernet sub-interfaces (including Eth-Trunk sub-interfaces)
– MLPPP
When a physical Ethernet interface has several sub-interfaces, BFD sessions can be
established on the physical Ethernet interface and each of its sub-interfaces.
l Eth-Trunk
– Layer 2 Eth-Trunk links
– Layer 3 Eth-Trunk links
l VLANIF
– VLAN Ethernet member links
– VLAN Ethernet sub-interfaces
– VLANIF interfaces
BFD sessions used to detect a VLANIF interface and VLAN member interfaces are
independent from each other and can detect these interfaces at the same time.
l MPLS LSP
To detect Multiprotocol Label Switching label switched path (MPLS LSP) connectivity,
BFD session negotiation is performed in the following modes:
Static mode: BFD session parameters, such as the local and remote
discriminators, are manually configured and delivered for BFD
session establishment.
NOTE
In static mode, configure unique local and remote discriminators for each
BFD session. This mode prevents incorrect discriminators from affecting
BFD sessions that have correct discriminators and prevents BFD sessions
from alternating between Up and Down.
1. BFD configured on both ATN A and ATN B independently starts state machines. The
initial status of BFD state machines is Down. ATN A and ATN B send BFD control
packets with the State field set to Down. If BFD sessions are established in static mode,
the value of Your Discriminator in BFD control packets is manually specified. If BFD
sessions are established in dynamic mode, the value of Your Discriminator is set to 0.
2. After receiving a BFD control packet with the State field set to Down, ATN B switches
the session status to Init and sends a BFD control packet with the State field set to Init.
NOTE
After the local BFD session status of ATN B changes to Init, ATN B no longer processes the
received BFD control packets with the State field set to Down.
3. The BFD session status change of ATN A is the same as that of ATN B.
4. After receiving a BFD control packet with the State field set to Init, ATN B changes the
local session status to Up.
5. The BFD session status change of ATN A is the same as that of ATN B.
1. ATN and CX-B each start a BFD state machine. The initial status of the BFD state
machines is Down. ATN and CX-B send BFD control packets with the State field set to
Down. If the BFD session is configured statically, Your Discriminator in the BFD
control packet is specified manually. If the BFD session is established dynamically, Your
Discriminator is 0.
2. After receiving a BFD packet with the State field set to Down, CX-B switches the
session status to Init and sends BFD packets with the State field set to Init.
3. After the local BFD session status of CX-B changes to Init, CX-B no longer processes
received BFD packets with the State field set to Down.
4. The status change of the BFD session on ATN is the same as the status change of the
BFD session on CX-B.
5. After receiving a BFD packet with the State field set to Init, CX-B changes the local
session status to Up.
6. The status change of the BFD session on ATN is the same as the status change of the
BFD session on CX-B.
Example 2
Figure 3-20 shows a multi-hop BFD session detecting a path between ATN A and ATN C.
The BFD session is bound to the peer IP address but not the outgoing interface.
In BFD for PIS, after detecting a link fault, a BFD session immediately sends a Down
message to the corresponding interface. Then, the interface enters the BFD Down state, which
matches the link protocol Down state. An interface in BFD Down state processes only BFD
packets, so the interface can quickly detect link faults.
To configure BFD for PIS, configure a multicast BFD session and associate it with an
interface. In BFD for PIS, BFD packet forwarding is independent of the IP attributes on the
interface.
As shown in Figure 3-21, a BFD session is established on ATN and CX-B. The BFD session
sends a packet with the source address being the default multicast IP address to GE 1/0/0 to
detect the single-hop link. After BFD for PIS is enabled, when BFD detects a link fault, the
BFD session sends a Down message to the corresponding interface and then the interface
enters the BFD Down state.
The BFD control packets are encapsulated in the UDP packets, using the source port in the
range of 49152 to 65535 and destination port 3784 or 4784. As defined in the BFD draft, the
destination port 4784 is used by multi-hop BFD control packets.
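The port rules above can be expressed as a short helper. This is a sketch under the assumption that destination port 3784 is the single-hop counterpart of 4784; the function and constant names are hypothetical and are not device commands.

```python
SINGLE_HOP_PORT = 3784  # Assumed to be used by single-hop BFD control packets.
MULTI_HOP_PORT = 4784   # Used by multi-hop BFD control packets.
SOURCE_PORT_RANGE = range(49152, 65536)  # 49152 to 65535 inclusive.


def bfd_destination_port(multi_hop: bool) -> int:
    """Pick the UDP destination port for a BFD control packet."""
    return MULTI_HOP_PORT if multi_hop else SINGLE_HOP_PORT


def is_valid_source_port(port: int) -> bool:
    """Check that a UDP source port lies in the range stated above."""
    return port in SOURCE_PORT_RANGE
```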
Application Environment
Typical Application 1
Figure 3-22 shows that a BFD session detects a single-hop path between devices and the BFD
session is bound to the outgoing interface.
Typical Application 2
Figure 3-23 shows that a BFD session detects a multi-hop path between ATN A and CXC and
the BFD session is bound to the peer IP address but not the outgoing interface.
With BFDv6 and routing protocol association, after a new neighbor relationship is set up
based on the routing protocol, a BFDv6 session is dynamically established to detect the link
between the neighbors. After detecting a link failure, BFDv6 notifies the routing protocol of
the failure. In this manner, faster convergence is achieved. If the neighbor relationship is
Down, the BFDv6 session is deleted dynamically.
In BFD for IS-IS, a BFD session is dynamically triggered by IS-IS rather than
configured manually. When detecting a fault, the BFD session notifies IS-IS of the fault
through the Routing Management Module (RM). IS-IS processes the neighbor-Down event,
quickly sends the link state PDU (LSP), and performs the partial route calculation (PRC).
In this manner, IS-IS routes converge quickly.
The BFD fault detection interval is at the millisecond level. Instead of replacing the IS-IS
Hello mechanism, BFD works with IS-IS to detect the adjacency fault more quickly. In
addition, BFD instructs IS-IS to recalculate routes, ensuring correct packet forwarding.
The RM allows IS-IS and BFD to interact with each other. Through the RM, IS-IS instructs
BFD to dynamically set up or delete BFD sessions. The BFD event messages are also
delivered to IS-IS through the RM.
After BFD is enabled on ATN-A, ATN-B, and ATN-C, the BFD session can quickly
detect faults on the link between ATN-A and ATN-B, and notify IS-IS through the RM.
Then, IS-IS sets the neighbor status to Down to trigger the IS-IS topology calculation. In
addition, IS-IS updates LSPs to ensure that ATN-C (ATN-B's neighbor) can receive the
updated LSPs from ATN-B in time. This implements fast network topology convergence.
NOTE
By default, a multi-hop BGP session is established between Huawei devices that set up an IBGP peer
relationship. A BFD for IGP session and a BFD for IBGP session cannot both be set up between a
Huawei device and a non-Huawei device that sets up a single-hop BGP session with its peer by default.
In such a situation, setting up only a BFD for IGP session or a BFD for IBGP session between the
Huawei and non-Huawei devices is recommended.
As shown in Figure 3-25, ATN-A belongs to AS 100 and ATN-B belongs to AS 200. ATN-A
and ATN-B are directly connected and establish an External Border Gateway Protocol
(EBGP) peer relationship. A BFD session is established to detect the BGP neighbor
relationship between ATN-A and ATN-B. When the link between ATN-A and ATN-B is
faulty, the BFD session can quickly detect the fault and notify BGP.
l Static configuration: The negotiation of a BFD session is performed using the local
discriminator and remote discriminator that are configured manually.
l Dynamic establishment: The negotiation of a BFD session is performed using the BFD
discriminator TLV in LSP ping packets.
BFD detects the following types of LSPs:
l Static LSP
l LDP LSP
l Static CR-LSP
l Dynamic CR-LSP
BFD uses the asynchronous mode to detect LSP connectivity. That is, the ingress and the egress
periodically send BFD packets to each other. If the ingress or the egress does not receive BFD
packets from the other within the detection period, the LSP is considered Down and BFD
sends an LSP Down message to the LSP management module (LSPM).
As shown in Figure 3-26, only traffic from PE1 to CE2 is involved in BFD for LSP. When a
fault occurs on the link between PE1 and P1, PE1 can detect the fault through the interface,
and BFD for LDP LSP does not need to be configured. When a fault occurs on the link
between P1 and PE2, PE1 cannot detect the fault through the interface, and BFD for LDP LSP
must be configured to perform fast detection.
An LDP LSP destined for PE2 is set up on PE1. BFD for LDP LSP is enabled and a BFD
session is set up. Policies of Virtual Private Network fast reroute (VPN FRR) are configured
on PE1, and the path from PE1 to PE3 is configured as the protection path.
When a fault occurs on the link between PE1 and P1 or between P1 and PE2, PE1 quickly
detects the fault and triggers VPN FRR switching. Then, traffic sent to CE2 is switched to the
protection path from PE1 to PE3.
As shown in Figure 3-27, a BFD session detects a fault on the link through which the primary
LSP passes. When a fault occurs on the link of the primary LSP, the BFD session on the
ingress notifies the LSPM of the fault. Then, the ingress switches traffic to the backup LSP
and a new BFD session is set up along the link, through which the backup LSP passes, to
detect the link status.
[Figure labels: R1, R2, P1, P2, P3; legend: primary tunnel, backup tunnel, primary LSP, backup LSP]
L2VPN can use BFD for PW to rapidly detect tunnels or PWs between two PEs and trigger
service switchover in case of a fault, reducing the impact of link failures on services.
The TTL mode indicates that the TTL value is variable (automatically calculated or manually set);
the non-TTL mode indicates that the TTL value is fixed at 255.
– BFD for PW in TTL mode: BFD packets are encapsulated into PW packets and
transmitted over a PW regardless of whether the PW is in control word mode or
non-control word mode.
Figure 3-29 Networking diagram for the AC fault detection and notification mechanism
On the network shown in Figure 3-29, if the AC interface connecting CE1 to PE1 becomes
faulty:
4. After PE2 receives the OAM notification message, if a secondary PW exists between
PE1 and PE2, traffic switches to the secondary PW; if no secondary PW exists between
PE1 and PE2, PE2 sends the message to CE2 through the AC interface found on the
basis of OAM mappings.
Applications
As shown in Figure 3-30, the link UPE1-> SPE1-> UPE2 is the primary PW and the link
UPE1-> SPE2-> UPE2 is the secondary PW. A BFD session is established between UPE1 and
UPE2 to detect multi-segment PWs from UPE1 to UPE2. If the BFD session detects a fault in
the primary PW between UPE1 and UPE2, traffic is rerouted from the primary PW to the
secondary PW.
Figure 3-30 Networking diagram for the configuration of a static BFD session to detect the
multi-segment PW
Abbreviation
Abbreviation Full Spelling
VC Virtual Circuit
AC Attachment Circuit
TE Traffic Engineering
PW Pseudo Wire
3.4.1 Introduction
NSR is a reliability technology that maintains the neighbor relationships of a device
during the active/standby switchover of main control boards on the device.
Non-Stop Forwarding (NSF) and Non-Stop Routing (NSR) are two solutions to High
Availability (HA).
l NSF: ensures that forwarding services are not interrupted during the active/standby
switchover of main control boards by using the protocol-specific GR mechanism.
– When a fault occurs in the system, forwarding services are not interrupted during
the active/standby switchover of main control boards.
– After the device recovers, it can re-establish neighbor relationships with other
devices, and then rebuild the routing table based on the information obtained from
its neighbors.
For details about the GR configuration, see the chapter "GR Configuration" in the
Configuration Guide - Reliability.
l NSR: ensures that route processing is not interrupted on the control plane and the
forwarding plane during the active/standby switchover of main control boards by using
the backup mechanism of a related protocol.
During the active/standby switchover of main control boards on a device, the route
processing is not interrupted because of the following factors:
– No neighbor or topology information is lost.
– No neighbor relationship goes Down.
The advantages of NSR are as follows:
– NSR on the local device does not depend on or affect the remote device. Therefore,
the local and remote devices can communicate properly.
– The route convergence speed of NSR is higher than that of NSF.
NSR
Advantages:
l When the control plane becomes faulty, the forwarding plane can still provide
forwarding services.
l During the active/standby switchover of main control boards, the route processing is
not interrupted because of the following factors:
– No neighbor or topology information is lost.
– No neighbor relationship goes Down.
l The active/standby switchover of main control boards is relevant to only the local
device.
– The active/standby switchover of main control boards on the local device does not
depend on or affect the remote device. Therefore, the local device and the remote
device can communicate properly.
– The route convergence speed of NSR is higher than that of NSF.
Disadvantages:
l More bandwidth is required and more system resources are consumed.
NSR and GR
A device that performs active/standby switchovers of main control boards supports two HA
protection mechanisms: NSR and GR. NSR and GR are mutually exclusive for a specific
protocol. However, after NSR is deployed on a device, the device can still be configured as a
GR helper to help its neighbors complete GR, which improves service reliability on all nodes
of a network.
l ISIS
l OSPF
l BGP
l IPv4 L3VPN
l RSVP
l LDP
l BFD
3.5.1 Introduction
Definition
Ethernet operation, administration and maintenance (OAM) is for use on Ethernet networks.
l Fault management
– Ethernet OAM enables a device to send detection packets, either on demand or
periodically, to monitor network connectivity.
– Ethernet OAM uses methods similar to Packet Internet Groper (PING) and
traceroute used on IP networks to diagnose faults on Ethernet networks.
– Ethernet OAM can work with a protection switching protocol to trigger a device or
link switchover if a connectivity fault is detected. Switchovers help networks
achieve carrier-class reliability, by ensuring that network interruptions are less than
or equal to 50 milliseconds.
l Performance management
Performance management is usually implemented at the attachment circuit (AC)
interface and measures the packet loss ratio, delay, and jitter during packet transmission.
It also collects statistics on various types of traffic. By using performance management
tools in a network management system (NMS), carriers can monitor the network running
status, diagnose faults, and check whether the network forwarding capability complies
with the service level agreement (SLA) that has been signed with users.
Purpose
Since its appearance, Ethernet has gradually become the major local area network (LAN)
technology owing to its easy implementation and low costs. With the application of gigabit
Ethernet (GE) and 10 gigabit Ethernet (10GE) technologies in recent years, Ethernet has been
applied in metropolitan area networks (MANs) and wide area networks (WANs).
Ethernet was originally developed for LANs, which do not have high requirements for
reliability and stability compared with MANs and WANs. As a result, Ethernet lacks an OAM
mechanism, which hinders its use on ISP networks. Therefore, deploying Ethernet OAM has
become a trend.
3.5.2 Principles
Ethernet OAM is classified as link- or network-level Ethernet OAM.
[Figure labels: devices Node B, ATN, CE, UPE, PE-AGG, BRAS, and CX600; sites SOHO, commercial centre, residential area, and intranet core; EFM OAM (802.3ah) on the Ethernet in the first mile; Ethernet CFM (802.1ag) on the access convergence layer network on the MAN; IP/MPLS backbone]
EFM OAM provides point-to-point fault detection on the link between two directly connected
devices.
Peer Discovery
The EFM OAM working mode is an attribute of the interface on which EFM OAM is
enabled. EFM OAM has two working modes: active mode and passive mode. The default
EFM OAM working mode of an interface is the active mode.
Before configuring EFM OAM on an interface, configure a working mode for the interface:
l If the active mode is configured, the interface initiates the peer discovery process. When
EFM OAM is enabled and the interface initiates the peer discovery process, the interface
and its peer interface enter the EFM OAM discovery phase.
l If the passive mode is configured, the interface does not initiate the peer discovery
process. If both interfaces are in passive mode, neither initiates discovery, so the two
interfaces cannot negotiate a session. In addition, interfaces in passive mode cannot
initiate requests for remote loopback or variables.
On the network shown in Figure 3-32, the EFM OAM working modes of interfaces 1 and 2
are active and passive, respectively. After EFM OAM is enabled on interface 1, the peer
discovery process is as follows:
1. Interface 1 sends an OAM protocol data unit (OAM PDU) to interface 2. This OAM
PDU carries the EFM OAM configuration of interface 1.
2. After receiving the OAM PDU, interface 2 compares its EFM OAM configuration with
that of interface 1 and then responds with an OAM PDU. The OAM PDU sent from
interface 2 to interface 1 carries not only the EFM OAM configurations of both
interfaces 1 and 2, but also the Flags field, which indicates whether interface 2 is
satisfied with the EFM OAM configuration of interface 1.
Figure 3-33 shows the OAM PDU format.
OAM PDU fields:
l Destination MAC=01-80-C2-00-00-02
l Source MAC
l Slow protocol type=88-09
l Subtype=03
l Flags
l Code
l Data/Pad (carries the Local TLV, the Remote TLV, and so on)
l Frame Check Sequence
TLV fields:
l TLV Type
l TLV Length
l OAM Version Number
l OAM Revision Number
l State Field
l OAM Configuration
l OAM PDU configuration
l OUI
l Vendor Specific Info
Bit   Name                    Description
7:5   Reserved                Set to 0 in the local information TLV.
4     Variable reachability   1 = The DTE supports sending OAM PDUs in response to variable requests.
                              0 = The DTE does not support sending OAM PDUs in response to variable requests.
3     Link event              1 = The DTE supports parsing link events.
                              0 = The DTE does not support parsing link events.
0     OAM mode                1 = The DTE is configured to work in active mode.
                              0 = The DTE is configured to work in passive mode.
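To make the bit layout concrete, the following sketch decodes the bits of the OAM Configuration field that are listed above. The function name and dictionary keys are hypothetical, and the field is assumed to be a single byte; bits that are not described in this section are deliberately left undecoded.

```python
def decode_oam_configuration(value: int) -> dict[str, bool]:
    """Decode the documented bits of a one-byte OAM Configuration field."""
    if not 0 <= value <= 0xFF:
        raise ValueError("OAM Configuration is assumed to be one byte")
    return {
        "active_mode": bool(value & 0x01),            # Bit 0: OAM mode.
        "parses_link_events": bool(value & 0x08),     # Bit 3: link event.
        "answers_variable_requests": bool(value & 0x10),  # Bit 4.
    }


if __name__ == "__main__":
    print(decode_oam_configuration(0b00011001))
```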
3. After receiving the OAM PDU from interface 2, interface 1 compares its EFM OAM
configuration with that of interface 2 to check whether their configurations match.
After the preceding process is complete, interfaces 1 and 2 enter the Detect state if their EFM
OAM configurations match. In the Detect state, the two interfaces periodically send OAM
PDUs to maintain their neighbor relationship. If their EFM OAM configurations do not
match, the two interfaces remain in the Discovery state and keep sending OAM PDUs for
status negotiation until the negotiation is successful or EFM is disabled on either or both of
the interfaces.
Link Monitoring
After link monitoring is configured, the system queries physical-layer statistics from the
interface management module and checks the link quality of an interface. Within a specified
period, if the number of errored frames, errored codes, or errored frame seconds detected on
an interface reaches or exceeds a specified threshold, the link on which the interface resides is
considered faulty. The local device generates an alarm, reports the alarm to an NMS, and sends an OAM
PDU to notify the remote device of the link fault. An errored frame second is a 1-second
interval during which at least one errored frame is detected.
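The errored frame second check can be illustrated as follows. This is a sketch that assumes the device already has one errored-frame count per 1-second interval in the monitoring period; the function names and the input format are hypothetical.

```python
def errored_frame_seconds(errors_per_second: list[int]) -> int:
    """Count the 1-second intervals that contain at least one errored frame."""
    return sum(1 for count in errors_per_second if count >= 1)


def link_is_faulty(errors_per_second: list[int], threshold: int) -> bool:
    """Report a fault when the count reaches or exceeds the threshold."""
    return errored_frame_seconds(errors_per_second) >= threshold


if __name__ == "__main__":
    samples = [0, 3, 0, 1, 7]  # Errored frames seen in five consecutive seconds.
    print(errored_frame_seconds(samples), link_is_faulty(samples, threshold=3))
```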
Fault Notification
Faults that can be reported include protocol packet timeout, physical link faults, and OAM
module transmission faults.
l If a protocol packet times out or a physical link fails, the fault event is logged and
reported to an NMS.
l If a transmission fault occurs on the OAM module, the fault event is logged and reported
to an NMS.
If a reverse link is reachable, an OAM PDU is sent to notify the peer of the fault. After
receiving the OAM PDU, the peer logs and reports the fault event to an NMS.
l If the EFM OAM module is associated with other modules, such as BFD and Ethernet CFM,
the OAM fault association module notifies the associated modules of the fault.
Remote Loopback
In Figure 3-35, when the local interface sends non-OAM PDUs to the remote interface, the
remote interface sends the non-OAM PDUs back to the local interface instead of forwarding
them to their destination addresses. This is called remote loopback.
Remote loopback can be used to locate link faults and test link quality. In remote loopback
mode, the local interface sends test packets to the remote interface. The local device then
calculates communication quality parameters (such as the packet loss ratio) of the current link
based on the number of packets sent and received.
Only interfaces in active mode can initiate remote loopback. Remote loopback can be enabled
on an interface only when the interface is in active mode and both this interface and its remote
peer are in the Detect state. The remote loopback process is as follows:
1. The local interface sends a loopback request to the remote interface and waits for a reply.
2. After receiving the loopback request from the local interface, the remote interface sends
a loopback reply to the local interface and enters the remote loopback state.
3. If the local interface receives the loopback reply within 2 seconds, it enters the remote
loopback state. If the local interface does not receive a loopback reply within 2 seconds,
it retransmits a loopback request to the remote interface. An interface can retransmit a
loopback request a maximum of three times.
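The request and retransmission behavior in steps 1 to 3 can be sketched as a retry loop. The callback send_request is a hypothetical stand-in for sending one loopback request and waiting for a reply; it is assumed to return True if a reply arrives within the given timeout.

```python
from typing import Callable

REPLY_TIMEOUT_SECONDS = 2.0
MAX_RETRANSMISSIONS = 3


def start_remote_loopback(send_request: Callable[[float], bool]) -> tuple[bool, int]:
    """Try to enter the remote loopback state.

    Returns (entered_loopback, number_of_requests_sent).
    """
    attempts = 0
    # One initial request plus at most three retransmissions.
    for _ in range(1 + MAX_RETRANSMISSIONS):
        attempts += 1
        if send_request(REPLY_TIMEOUT_SECONDS):
            return True, attempts
    return False, attempts
```

With these constants, a peer that never replies causes four requests in total before the local interface gives up.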
To stop remote loopback, the local interface sends the remote interface a message for
disabling remote loopback. After receiving this message, the remote interface exits the
loopback state.
To prevent service interruptions caused by users forgetting to stop remote loopback, remote
loopback is automatically disabled after a timeout period. This timeout period is configurable.
After remote loopback times out, the local interface automatically sends the remote interface a
message to disable remote loopback.
Table 3-6 Differences between IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007
Item    IEEE 802.1ag Draft 7    IEEE Std 802.1ag-2007    Remarks
Because IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 define packets in different
formats, only one version can be used if Ethernet CFM is required.
Basic Concepts
l MD
An MD is a network or a part of a network for which connectivity is managed by CFM.
Devices in an MD are managed by an Internet service provider (ISP) or carrier.
Each MD has a level, which ranges from 0 to 7. A larger value indicates a higher level.
802.1ag packets in low-level MDs cannot pass through high-level MDs, whereas 802.1ag
packets in high-level MDs can pass through low-level MDs.
l Default MD
Each device can be configured with a single default MD with the highest priority
according to IEEE Std 802.1ag-2007. The default MD allows a high-level MD to detect
the internal topology of a low-level MD.
As shown in Figure 3-38, in the scenarios of MD nesting, devices with high-level MDs
configured may be the edge and intermediate devices of low-level MDs. When 802.1ag
packets in high-level MDs pass through low-level MDs, the packets are transparently
transmitted. If no default MD is configured and the internal topologies of low-level MDs
need to be detected, devices in low-level MDs must create MIPs with specified
priorities on specified interfaces to reply to devices in high-level MDs with loopback
reply (LBR) or linktrace reply (LTR) messages.
[Figure labels: MD1 (Level=6), MD2 (Level=3), MIP]
If default MDs with the same level as high-level MDs are configured on devices in low-
level MDs, MIPs are created based on default MDs to reply to requests sent by devices
in high-level MDs. CFM detects topology changes and monitors the connectivity of both
high- and low-level MDs.
The default MD must have a higher level than all MDs to which MEPs configured on the
local device belong. In addition, the default MD must have the same level as a high-level
MD. The default MD is used to transmit high-level continuity check messages (CCMs)
and create MIPs to send LTR messages.
IEEE Std 802.1ag-2007 states that one default MD can be configured on each device and
associated with multiple virtual local area networks (VLANs). VLAN interfaces can
automatically create MIPs based on default MDs.
NOTE
On a device with a default MD configured, the VLAN that has been associated with the default
MD must not be associated with an MA.
l MA
An MA is a part of an MD. An MD can be divided into one or more MAs. Ethernet CFM
detects connectivity faults in each MA.
On a provider network, a VLAN is generally mapped to a service instance (SI). MA
division helps detect connectivity faults on networks where an SI is transmitted.
The level of an MA is the level of the MD to which the MA belongs.
l MEP
As shown in Figure 3-39, a MEP is located at the edge of an MA.
A MEP is configured on an interface. The level of a MEP is the level of the MD to which
the MEP belongs.
l MIP
As shown in Figure 3-39, a MIP is located within an MA.
A MIP is automatically created on an interface based on a specific rule. Table 3-7
describes the differences between IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 in
creating MIPs.
Table 3-7 Differences between IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 in
creating MIPs
Rule for Creating a MIP    IEEE 802.1ag Draft 7    IEEE Std 802.1ag-2007
The level of a MIP is determined by a creation rule and the level of the MD for creating
the MIP.
On the network shown in Figure 3-40, MD1 to MD5 are nested in MD7, and MD2 to
MD5 are nested in MD1. MD7 has a higher level than MD1 to MD5, and MD1 has a
higher level than MD2 to MD5. Multiple MEPs are created on ATN A in MD1, and the
MEPs belong to MDs at different levels.
[Figure labels: MD7 (Level=7), MD1 (Level=6), MD2 (Level=5), MD3 (Level=4), MD4 (Level=3), MD5 (Level=2), VLAN1, VLAN2, ATN A]
A default rule is configured on ATN A to create a MIP in MD1. The process for creating
a MIP is as follows:
a. ATN A compares MEP levels and finds the MEP at level 5, the highest level. The
level of a MEP is determined by the level of the MD to which the MEP belongs.
b. ATN A selects the MD at level 6, which is higher than the MEP at level 5.
c. ATN A creates a MIP at level 6.
If MDs at level 6 or higher do not exist, no MIP can be created.
If a MIP at level 1 already exists on ATN A, a MIP at level 6 cannot be created.
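The selection in steps a to c can be sketched as follows. This is an illustration under two assumptions that go beyond the text: the MD chosen is the lowest-level MD above the highest MEP level, and any MIP that already exists on the device blocks creation. The function name and parameters are hypothetical.

```python
from typing import Iterable, Optional


def select_mip_level(
    mep_levels: Iterable[int],
    md_levels: Iterable[int],
    existing_mip_levels: Iterable[int] = (),
) -> Optional[int]:
    """Return the level at which a MIP would be created, or None."""
    if list(existing_mip_levels):
        return None  # Assumption: an existing MIP prevents a new one.
    # Step a: find the highest MEP level (-1 if no MEP is configured).
    highest_mep = max(mep_levels, default=-1)
    # Step b: keep only the MDs whose level is higher than that MEP level.
    candidates = [level for level in md_levels if level > highest_mep]
    # Step c: create the MIP at the lowest qualifying MD level, if any.
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # MEPs at levels 5, 4, 3, and 2; MDs at levels 7 down to 2.
    print(select_mip_level([5, 4, 3, 2], [7, 6, 5, 4, 3, 2]))
```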
l MP
MEPs and MIPs are called MPs.
Continuity Check
Ethernet CFM enables MEPs to periodically send continuity check messages (CCMs) to one
another to check the continuity between them. This check is called continuity check (CC).
Figure 3-41 CC
l CCM generation
A MEP generates and sends CCMs. MEP1, MEP2, and MEP3 are in the same MA on
the network shown in Figure 3-41. After the function of sending CCMs is enabled,
MEP1, MEP2, and MEP3 send multicast CCMs to one another at the same interval.
Each CCM carries a level equal to the MEP level.
l MEP database establishment
Each Ethernet CFM-enabled device has a MEP database. A MEP database records
information about the local MEP and RMEPs in the same MA. The local MEP and
RMEPs are manually configured, and their information is automatically recorded in the
MEP database.
l Fault identification
If a MEP does not receive CCMs from its RMEP within a period that is 3.5 times the
interval at which CCMs are sent, the MEP considers the path to the RMEP faulty. If
OAM fault association is configured, the OAM module triggers the associated module to
react or triggers a switchover.
l CCM termination
A MEP terminates CCMs. If a MEP receives a CCM carrying a level higher than the
local level, it forwards this CCM. If the MEP receives a CCM carrying a level lower than
or equal to the local level, it does not forward this CCM, which ensures that CCMs in a
lower-level maintenance domain (MD) are not sent to a higher-level MD.
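The fault identification and CCM termination rules above can be written as two small checks. This is a sketch: the function names are hypothetical, times are assumed to be in seconds, and a fault is assumed to be declared as soon as the elapsed time reaches 3.5 intervals.

```python
CCM_LOSS_MULTIPLIER = 3.5


def rmep_path_faulty(last_ccm_time: float, now: float, ccm_interval: float) -> bool:
    """True if no CCM has arrived for 3.5 times the CCM interval."""
    return (now - last_ccm_time) >= CCM_LOSS_MULTIPLIER * ccm_interval


def mep_forwards_ccm(ccm_level: int, mep_level: int) -> bool:
    """A MEP forwards only CCMs whose level is higher than its own level."""
    return ccm_level > mep_level


if __name__ == "__main__":
    print(rmep_path_faulty(last_ccm_time=0.0, now=3.5, ccm_interval=1.0))
    print(mep_forwards_ccm(ccm_level=6, mep_level=3))
```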
A MEP initiates an 802.1ag MAC ping test to monitor a path to a MEP or MIP destination
address. These nodes have the same level and they can share an MA or be in different MAs.
On the network shown in Figure 3-42, MEP1 initiates an 802.1ag MAC ping operation to
MEP2. The process is as follows:
1. MEP1 sends a loopback message (LBM) to MEP2. The LBM must carry either the host
MAC address or MEP ID of MEP2.
2. After receiving the LBM, MEP2 responds with a loopback reply (LBR). MEP1
measures the duration of the ping operation to analyze network performance.
Within a specified timeout period, if MEP1 does not receive an LBR from MEP2, MEP1
considers MEP2 unreachable; if MEP1 receives an LBR from MEP2, MEP1 calculates
the delay from MEP1 to MEP2 based on the timestamp carried in the LBR. In addition,
MEP1 can measure the frame loss ratio based on the difference between the number of
LBMs and the number of LBRs.
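The frame loss ratio mentioned above follows directly from the two counts. The sketch below assumes the ratio is the share of LBMs that received no LBR; the function name is hypothetical.

```python
def loopback_frame_loss_ratio(lbms_sent: int, lbrs_received: int) -> float:
    """Return the fraction of LBMs for which no LBR was received."""
    if lbms_sent <= 0:
        raise ValueError("at least one LBM must have been sent")
    if not 0 <= lbrs_received <= lbms_sent:
        raise ValueError("LBR count must be between 0 and the LBM count")
    return (lbms_sent - lbrs_received) / lbms_sent


if __name__ == "__main__":
    print(loopback_frame_loss_ratio(lbms_sent=10, lbrs_received=8))
```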
On the network shown in Figure 3-43, MEP1 initiates an 802.1ag MAC trace operation to
MEP2. The process is as follows:
1. MEP1 sends MEP2 an LTM carrying a time to live (TTL) value and the MAC address of
the destination MEP2.
2. After the LTM arrives at MIP1, MIP1 reduces the TTL value in the LTM by 1 and
forwards the LTM if the TTL value is not zero. MIP1 then replies with an LTR to MEP1.
The LTR carries forwarding information and the TTL value carried in the received LTM.
3. After the LTM reaches MIP2 and MEP2, the process described above for MIP1 is
repeated for MIP2 and MEP2. In addition, MEP2 finds that its MAC address is the
destination address carried in the LTM and therefore does not forward the LTM.
4. The LTRs from MIP1, MIP2, and MEP2 provide MEP1 with information about the
forwarding path between MEP1 and MEP2.
If a fault occurs on the path between MEP1 and MEP2, MEP2 or a MIP cannot receive
the LTM or reply with an LTR. MEP1 can locate the faulty node based on such a
response failure. For example, if the link between MEP1 and MIP2 works properly, but
the link between MIP2 and MEP2 is faulty, MEP1 can receive LTRs from MIP1 and
MIP2 but fails to receive an LTR from MEP2. MEP1 then considers the path between
MIP2 and MEP2 faulty.
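The trace and fault location logic above can be modeled with a short simulation. This is a simplified sketch: the path is a list of node names after the source MEP, failed_link is a hypothetical index of the first node that the LTM cannot reach, and the TTL is assumed to be large enough that missing replies indicate a fault rather than TTL expiry.

```python
from typing import Optional


def linktrace(path: list[str], ttl: int, failed_link: Optional[int] = None) -> list[str]:
    """Return the nodes that answer with an LTR, in path order."""
    replies = []
    for index, node in enumerate(path):
        if failed_link is not None and index >= failed_link:
            break  # The LTM is lost on the faulty link.
        if ttl <= 0:
            break  # The previous node did not forward the LTM.
        replies.append(node)  # The node replies with an LTR.
        ttl -= 1  # The node reduces the TTL by 1 before forwarding.
    return replies


def locate_fault(path: list[str], replies: list[str], source: str = "MEP1") -> Optional[tuple[str, str]]:
    """Return the segment considered faulty, or None if every node replied."""
    if len(replies) == len(path):
        return None
    last_reachable = replies[-1] if replies else source
    return last_reachable, path[len(replies)]


if __name__ == "__main__":
    nodes = ["MIP1", "MIP2", "MEP2"]
    answered = linktrace(nodes, ttl=64, failed_link=2)
    print(answered, locate_fault(nodes, answered))
```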
Function Overview
Y.1731 can manage fault information and monitor performance.
l Fault management functions include continuity check (CC), loopback (LB), and linktrace
(LT). The principles of Y.1731 fault management are the same as those of CFM fault
management.
l Performance monitoring functions include single- and dual-ended frame loss
measurement, one- and two-way frame delay measurement, single-ended synthetic loss
measurement (SLM), and alarm indication signal (AIS). These functions apply to virtual
private LAN service (VPLS) networks, virtual leased line (VLL) networks, and virtual
local area networks (VLANs).
Function: Single-ended frame loss measurement
Description: Collects frame loss statistics to assess the quality of links between MEPs,
independent of continuity check.
Function: Dual-ended frame loss measurement
Description: Collects frame loss statistics to assess link quality on CFM CC-enabled devices.
Usage scenario: To collect frame loss statistics, select either single- or dual-ended frame
loss measurement:
l Dual-ended frame loss measurement provides more accurate results than the
single-ended method. The interval between dual-ended frame loss measurements varies
with the interval between CCM transmissions. The CCM transmission interval is shorter
than the interval between single-ended frame loss measurements. The dual-ended method
allows for a short interval between dual-ended frame loss measurements.
l Single-ended frame loss measurement can be used to minimize the impact of many
CCMs on the network.
Function: One-way frame delay measurement
Description: Measures the network delay time on a unidirectional link between MEPs.
Function: Two-way frame delay measurement
Description: Measures the network delay time on a bidirectional link between MEPs.
Usage scenario: To measure the link delay time, select either one- or two-way frame delay
measurement:
l One-way frame delay measurement can be used to measure the delay time on a
unidirectional link between a MEP and its RMEP. The MEP must synchronize its time
with its RMEP.
l Two-way frame delay measurement can be used to measure the delay time on a
bidirectional link between a MEP and its RMEP. The MEP does not need to synchronize
its time with its RMEP.
Function: Ethernet test (ETH-test)
Description: Measures the bandwidth throughput and code errors on links.
Usage scenario: An ETH-test can measure the link bandwidth throughput and code errors on a
newly established link. After the carrier leases this link to a user, the user also
conducts an ETH-test to measure the link bandwidth throughput and code errors.
ETH-LM
Ethernet frame loss measurement (ETH-LM) enables a local MEP and its RMEP to exchange
ETH-LM frames to collect frame loss statistics on E2E links. ETH-LM modes are classified
as near-end ETH-LM or far-end ETH-LM.
After single-ended frame loss measurement is enabled, a MEP on provider edge PE1
sends an RMEP on PE2 an ETH-LMM containing an ETH-LM request. The MEP then
receives an ETH-LMR message containing an ETH-LM response from the RMEP on
PE2. The ETH-LMM carries TxFCf, which is the value of the local transmit counter TxFCl
at the time when the message is sent by the local MEP. After receiving the ETH-LMM,
PE2 replies with an ETH-LMR message containing the following information:
– TxFCf: copied from the ETH-LMM
– RxFCf: value of the local counter RxFCl at the time of ETH-LMM reception
– TxFCb: value of the local counter TxFCl at the time of ETH-LMR transmission
After receiving the ETH-LMR message, PE1 measures near- and far-end frame loss
based on the following values:
– Received ETH-LMR message's TxFCf, RxFCf, and TxFCb values and the local counter
RxFCl value at the time when this ETH-LMR message was received. These
values are represented as TxFCf[tc], RxFCf[tc], TxFCb[tc], and RxFCl[tc].
tc is the time when this ETH-LMR message was received.
– Previously received ETH-LMR message's TxFCf, RxFCf, and TxFCb values and the
local counter RxFCl value at the time when the previous ETH-LMR message was
received. These values are represented as TxFCf[tp], RxFCf[tp], TxFCb[tp], and
RxFCl[tp].
tp is the time when the previous ETH-LMR message was received.
Far-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCf[tc] - RxFCf[tp]|
Near-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCl[tc] - RxFCl[tp]|
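The two equations above can be sketched in a few lines of Python. This is only an illustration: the LmrSample type, its field names, and the counter values are assumptions made for the example, not part of the product.

```python
from dataclasses import dataclass


@dataclass
class LmrSample:
    """Counters known to the local MEP when one ETH-LMR arrives (hypothetical type)."""
    tx_fcf: int  # TxFCf copied back from the ETH-LMM
    rx_fcf: int  # RxFCf: RMEP's RxFCl when the ETH-LMM arrived
    tx_fcb: int  # TxFCb: RMEP's TxFCl when the ETH-LMR was sent
    rx_fcl: int  # local RxFCl when the ETH-LMR arrived


def single_ended_loss(prev, curr):
    """Return (far_end_loss, near_end_loss) between samples taken at tp and tc."""
    far_end = abs(curr.tx_fcf - prev.tx_fcf) - abs(curr.rx_fcf - prev.rx_fcf)
    near_end = abs(curr.tx_fcb - prev.tx_fcb) - abs(curr.rx_fcl - prev.rx_fcl)
    return far_end, near_end


# Made-up counter values for two consecutive ETH-LMR messages.
tp = LmrSample(tx_fcf=1000, rx_fcf=990, tx_fcb=2000, rx_fcl=1995)
tc = LmrSample(tx_fcf=1500, rx_fcf=1480, tx_fcb=2600, rx_fcl=2590)
far, near = single_ended_loss(tp, tc)
print(far, near)  # 10 frames lost towards PE2, 5 frames lost towards PE1
```

Because only counter differences are used, the absolute counter values at the two ends do not need to be aligned.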
Service packets are prioritized based on 802.1p priorities and are transmitted using
different policies. Traffic passing through a provider (P) device on the network shown in
Figure 3-45 carries 802.1p priority values of 1 and 2.
Single-ended frame loss measurement is enabled on PE1, which sends traffic with the priority value of 1 to measure frame loss on the link between PE1 and PE2. Traffic with the priority value of 2 is also sent. After receiving traffic with both priority values, the P forwards the higher-priority traffic first, delaying the arrival of traffic with the priority value of 1 at PE2. As a result, the measured frame loss ratio is not accurate.
802.1p priority-based single-ended frame loss measurement can be enabled to obtain
accurate results.
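[Figure: user networks connected across a Y.1731 network that carries Priority 1 and Priority 2 traffic]
[Figure: ETH-CCM exchange in both directions in a Y.1731 network connecting CE2, CE3, CE4, and CE6]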
After dual-ended frame loss measurement is configured, each MEP periodically sends a
CCM carrying a request to its RMEP. After receiving the CCM, an RMEP collects near-
and far-end frame loss statistics and does not forward the message. The CCM contains
the following information:
– TxFCf: value of the local counter TxFCl at the time of CCM transmission
– RxFCb: value of the local counter RxFCl at the time of the reception of the last
CCM
– TxFCb: value of TxFCf in the last received CCM
PE1 uses the received information to measure near- and far-end frame loss based on the following values:
– TxFCf, RxFCb, and TxFCb values in the received CCM, and the value of the local counter RxFCl at the time when this CCM was received. These values are represented as TxFCf[tc], RxFCb[tc], TxFCb[tc], and RxFCl[tc].
tc is the time when this CCM was received.
– TxFCf, RxFCb, and TxFCb values in the previously received CCM, and the value of the local counter RxFCl at the time when that CCM was received. These values are represented as TxFCf[tp], RxFCb[tp], TxFCb[tp], and RxFCl[tp].
tp is the time when the previous CCM was received.
Far-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCb[tc] - RxFCb[tp]|
Near-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCl[tc] - RxFCl[tp]|
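The dual-ended calculation can be sketched as follows. The sketch follows the ITU-T Y.1731 form of the equations, in which the near-end transmit term is the difference between two TxFCf values; the CcmSample type, its field names, and the counter values are assumptions made for the example.

```python
from typing import NamedTuple


class CcmSample(NamedTuple):
    """Counters known to a MEP when one CCM arrives (hypothetical type)."""
    tx_fcf: int  # TxFCf: sender's TxFCl when the CCM was sent
    rx_fcb: int  # RxFCb: sender's RxFCl when it received its last CCM
    tx_fcb: int  # TxFCb: TxFCf of the last CCM the sender received
    rx_fcl: int  # local RxFCl when this CCM arrived


def dual_ended_loss(prev, curr):
    """Return (far_end_loss, near_end_loss) between CCMs received at tp and tc."""
    far_end = abs(curr.tx_fcb - prev.tx_fcb) - abs(curr.rx_fcb - prev.rx_fcb)
    near_end = abs(curr.tx_fcf - prev.tx_fcf) - abs(curr.rx_fcl - prev.rx_fcl)
    return far_end, near_end


# Made-up counter values for two consecutive CCMs.
tp = CcmSample(tx_fcf=5000, rx_fcb=4000, tx_fcb=4100, rx_fcl=4990)
tc = CcmSample(tx_fcf=5800, rx_fcb=4690, tx_fcb=4800, rx_fcl=5788)
far, near = dual_ended_loss(tp, tc)
print(far, near)  # 10 frames lost in the far-end direction, 2 in the near-end direction
```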
Single-ended ETH-SLM
SLM measures frame loss using synthetic frames instead of data traffic. When implementing
SLM, the local MEP exchanges frames containing ETH-SLM information with one or more
RMEPs.
A frame with the single-ended ETH-SLM request information is called an SLM, and a frame
with the single-ended ETH-SLM reply information is called an SLR. SLM frames carry SLM
protocol data units (PDUs), and SLR frames carry SLR PDUs.
Single-ended SLM and single-ended frame LM differ as follows: On the point-to-multipoint network shown in Figure 3-47, inward MEPs are configured on PE1's and PE3's interfaces, and single-ended frame LM is performed on the PE1-PE3 link. Traffic coming through PE1's interface is destined for both PE2 and PE3, so single-ended frame LM collects frame loss statistics for all traffic, including the PE1-to-PE2 traffic. As a result, the collected statistics are not accurate. Unlike single-ended frame LM, single-ended SLM collects frame loss statistics only for the PE1-to-PE3 traffic, which is more accurate.
[Figure: point-to-multipoint network in which CE1, CE2, and CE3 attach to PE1, PE2, and PE3; SLM and SLR frames are exchanged between PE1 and PE3]
When implementing single-ended SLM, PE1 sends SLM frames to PE3 and receives SLR frames from PE3. SLM frames contain TxFCf, the value of TxFCl (frame transmission counter), indicating the frame count at the transmit time. SLR frames contain the following information:
l TxFCf: value of TxFCl (frame transmission counter) indicating the frame count on PE1 upon the SLM transmission
l TxFCb: value of RxFCl (frame receive counter) indicating the frame count on PE3 upon the SLR transmission
After receiving the last SLR frame during a measurement period, a MEP on PE1 measures the near-end and far-end frame loss based on the following values:
l Last received SLR's TxFCf and TxFCb, and value of RxFCl (frame receive counter) indicating the frame count on PE1 upon the SLR reception. These values are represented as TxFCf[tc], TxFCb[tc], and RxFCl[tc].
tc indicates the time when the last SLR frame was received during the measurement period.
l Previously received SLR's TxFCf and TxFCb, and value of RxFCl (frame receive counter) indicating the frame count on PE1 upon the SLR reception. These values are represented as TxFCf[tp], TxFCb[tp], and RxFCl[tp].
tp indicates the time when the last SLR frame was received during the previous measurement period.
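This section does not state the SLM loss equations, so the sketch below assumes the form given in ITU-T G.8013/Y.1731 for single-ended synthetic loss measurement. The SlrSample type, its field names, and the counter values are hypothetical.

```python
from typing import NamedTuple


class SlrSample(NamedTuple):
    """Counters known to PE1 when the last SLR of a period arrives (hypothetical type)."""
    tx_fcf: int  # TxFCf: SLM frames PE1 had sent when the SLM left
    tx_fcb: int  # TxFCb: SLM frames PE3 had received when the SLR left
    rx_fcl: int  # local RxFCl: SLR frames PE1 had received on SLR arrival


def slm_loss(prev, curr):
    """Return (far_end_loss, near_end_loss), assuming the Y.1731 equations."""
    far_end = abs(curr.tx_fcf - prev.tx_fcf) - abs(curr.tx_fcb - prev.tx_fcb)
    near_end = abs(curr.tx_fcb - prev.tx_fcb) - abs(curr.rx_fcl - prev.rx_fcl)
    return far_end, near_end


# Made-up counters at the end of two consecutive measurement periods.
tp = SlrSample(tx_fcf=100, tx_fcb=100, rx_fcl=100)
tc = SlrSample(tx_fcf=200, tx_fcb=197, rx_fcl=195)
far, near = slm_loss(tp, tc)
print(far, near)  # 3 synthetic frames lost from PE1 to PE3, 2 lost from PE3 to PE1
```

Only synthetic frames are counted, which is why traffic from PE1 to PE2 does not distort the result.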
On a network, each packet carries the IEEE 802.1p field, indicating its priority. According to
packet priority, different QoS policies will be applied. On the network shown in Figure 3-48,
the PE1-to-PE3 traffic has two priorities: 1 and 2, as indicated by the IEEE 802.1p field.
When implementing single-ended SLM for traffic over the PE1-PE3 link, PE1 sends SLM
frames with varied priorities and checks the frame loss. Based on the check result, the
network administrator can adjust the QoS policy for the link.
[Figure: point-to-multipoint Y.1731 network in which CE1, CE2, and CE3 attach to PE1, PE2, and PE3; MEPs measure Priority 1 and Priority 2 traffic between PE1 and PE3]
ETH-DM
Delay measurement (DM) measures the delay time and delay variation. A MEP sends its
RMEP a message carrying ETH-DM information and then receives a response message
carrying ETH-DM information from its RMEP.
ETH-DM supports the following modes:
l One-way frame delay measurement
A MEP sends its RMEP a 1DM message carrying one-way ETH-DM information. After receiving this message, the RMEP measures the one-way frame delay or delay variation.
The one-way frame delay can be measured only after the MEP synchronizes its time with its RMEP. The delay variation can be measured regardless of whether the time is synchronized. That is, if the time is synchronized, both the one-way frame delay and the delay variation can be measured; if the time is not synchronized, only the one-way delay variation can be measured.
One-way frame delay measurement can be implemented in either of the following modes:
– The on-demand mode computes the one-way frame delay once or a specific number of times for diagnosis.
– The proactive mode computes the one-way frame delay periodically.
Figure 3-49 illustrates the procedure for one-way delay measurement.
[Figure: 1DM PDU sent across a Y.1731 network connecting CE2 and CE4]
One-way frame delay measurement is implemented on an E2E link between a local MEP and its RMEP. The local MEP sends 1DMs to the RMEP, and the RMEP performs the measurement. After one-way frame delay measurement is configured, a MEP periodically sends 1DMs carrying TxTimeStampf (the time when the 1DM was sent). After receiving a 1DM, the RMEP parses TxTimeStampf and compares this value with RxTimef (the time when the 1DM was received). The RMEP calculates the one-way frame delay based on these values using the following equation:
Frame delay = RxTimef - TxTimeStampf
The frame delay value can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
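The following sketch applies the equation to two 1DMs and also shows why the delay variation stays valid when the clocks are not synchronized. The function names, the microsecond unit, and the timestamps are assumptions made for the example.

```python
def one_way_delay(rx_time_f, tx_timestamp_f):
    """Frame delay = RxTimef - TxTimeStampf (both in microseconds here)."""
    return rx_time_f - tx_timestamp_f


def delay_variation(delay_a, delay_b):
    """Delay variation is the absolute difference between two delays."""
    return abs(delay_b - delay_a)


# Two 1DMs on synchronized clocks (made-up timestamps, microseconds).
d1 = one_way_delay(rx_time_f=1_000_450, tx_timestamp_f=1_000_000)
d2 = one_way_delay(rx_time_f=2_000_520, tx_timestamp_f=2_000_000)
variation = delay_variation(d1, d2)

# Same frames, but the RMEP clock runs 5000 microseconds ahead of the MEP clock.
offset = 5_000
skewed_d1 = one_way_delay(1_000_450 + offset, 1_000_000)
skewed_d2 = one_way_delay(2_000_520 + offset, 2_000_000)
skewed_variation = delay_variation(skewed_d1, skewed_d2)

print(d1, d2, variation)   # 450 520 70
print(skewed_d1, skewed_variation)  # 5450 70: the delay is wrong, the variation is not
```

The constant clock offset appears in both delays and cancels in their difference, so only the delay itself depends on time synchronization.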
802.1p priorities carried in service packets prioritize services. Traffic passing through a P
on the network shown in Figure 3-50 carries 802.1p priority values of 1 and 2.
One-way delay measurement is enabled on PE1, which sends traffic with the priority value of 1 to measure the frame delay on the link between PE1 and PE2. Traffic with the priority value of 2 is also sent. After receiving traffic with both priority values, the P forwards the higher-priority traffic first, delaying the arrival of traffic with the priority value of 1 at PE2. As a result, the frame delay calculated on PE2 is not accurate.
802.1p priority-based one-way frame delay measurement can be enabled to obtain
accurate results.
[Figure: 1DM PDUs sent between user networks across a Y.1731 network that carries Priority 1 and Priority 2 traffic]
[Figure: Y.1731 network connecting CE3 and CE6]
After receiving a DMM, the RMEP replies with a DMR message. This message carries RxTimeStampf (the time when the DMM was received) and TxTimeStampb (the time when the DMR was sent). The value in every field of the DMM is copied to the DMR, except that the source and destination MAC addresses are interchanged. Upon receipt of the DMR message, the MEP calculates the two-way frame delay using the following equation:
Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)
The frame delay value can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
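A short sketch of the two-way equation follows. The function name, the microsecond unit, and the timestamps are assumptions made for the example; the two clocks are deliberately far apart to show that they need not be synchronized.

```python
def two_way_delay(tx_timestamp_f, rx_timestamp_f, tx_timestamp_b, rx_time_b):
    """Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)."""
    round_trip = rx_time_b - tx_timestamp_f              # measured on the MEP clock
    rmep_processing = tx_timestamp_b - rx_timestamp_f    # measured on the RMEP clock
    return round_trip - rmep_processing


# Made-up timestamps in microseconds; the RMEP clock is unrelated to the MEP clock.
delay = two_way_delay(
    tx_timestamp_f=1_000,    # MEP clock: DMM sent
    rx_timestamp_f=900_400,  # RMEP clock: DMM received
    tx_timestamp_b=900_450,  # RMEP clock: DMR sent
    rx_time_b=1_850,         # MEP clock: DMR received
)
print(delay)  # 800
```

Each subtraction uses two timestamps taken from the same clock, which is why two-way measurement does not require time synchronization.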
802.1p priorities carried in service packets prioritize services. Traffic passing through a P
on the network shown in Figure 3-52 carries 802.1p priority values of 1 and 2.
Two-way delay measurement is enabled on PE1, which sends traffic with the priority value of 1 to measure the frame delay on the link between PE1 and PE2. Traffic with the priority value of 2 is also sent. After receiving traffic with both priority values, the P forwards the higher-priority traffic first, delaying the arrival of traffic with the priority value of 1 at PE2. As a result, the calculated frame delay is not accurate.
802.1p priority-based two-way frame delay measurement can be enabled to obtain
accurate results.
[Figure: DMR returned between user networks across a Y.1731 network that carries Priority 1 and Priority 2 traffic]
AIS
AIS is a protocol used to transmit fault information.
On the user network shown in Figure 3-53, a MEP is configured in MD1 (level 6) on the access interface of each customer edge, CE1 and CE2. On the carrier network, a MEP is configured in MD2 (level 3) on the access interface of each of PE1 and PE2.
l If CFM detects a fault on the link between the AIS-enabled PEs, CFM sends AIS PDUs to the CEs. After receiving the AIS PDUs, the CEs suppress alarms, reducing the number of alarms reported to the network management system (NMS).
l After the link between the PEs recovers, the PEs stop sending AIS PDUs. If the CEs receive no AIS PDUs within a period 3.5 times as long as the interval at which AIS PDUs are sent, they cancel the alarm suppression function.
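The suppression behavior on a CE can be sketched as a simple timer. This is an illustrative model only: the AisSuppression class, its method names, and the use of seconds are assumptions, not the device implementation.

```python
class AisSuppression:
    """Tracks whether a CE should keep suppressing alarms (hypothetical model)."""

    def __init__(self, ais_interval_s):
        # Suppression is cancelled after 3.5 AIS intervals without an AIS PDU.
        self.timeout_s = 3.5 * ais_interval_s
        self.last_ais_s = None

    def on_ais_pdu(self, now_s):
        """Record the arrival time of an AIS PDU."""
        self.last_ais_s = now_s

    def suppressing(self, now_s):
        """Return True while alarms should still be suppressed."""
        if self.last_ais_s is None:
            return False
        return (now_s - self.last_ais_s) < self.timeout_s


ce = AisSuppression(ais_interval_s=1.0)
before_fault = ce.suppressing(now_s=0.0)    # no AIS PDU seen yet
ce.on_ais_pdu(now_s=10.0)                   # PE link fault: AIS PDU arrives
during_fault = ce.suppressing(now_s=12.0)   # 2 s since the last AIS PDU
after_recovery = ce.suppressing(now_s=14.0) # 4 s of silence exceeds 3.5 s
```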
[Figure: MD1 (level 6) spans the CEs over VLAN/QinQ access links; MD2 (level 3) spans the PEs over the VLL/VPLS/VLAN carrier network]
Usage Scenario
Y.1731 applies to virtual leased lines (VLLs), virtual private LAN service (VPLS) connections, and virtual local area networks (VLANs). AIS and multicast loopback applications are the same in these scenarios. Different Y.1731 statistical functions are supported in specific scenarios. The following example illustrates Y.1731 statistical functions in different scenarios on the network shown in Figure 3-54.
[Figure: Y.1731 deployed on the AC side and the PW side across the core]
[Figure: CE1 and CE2 connected through PE1 to PE4 over link1 to link4 and PW2, with BFD; on the ATN and CX-B, the interface is associated with EFM OAM]
NOTE
l For ATN 950Bs, only ATN 950Bs with AND2CXPB/AND2CXPEs configured support the
association between Ethernet CFM and an interface.
l Only the ATN 950B can detect faults on the active link in a manually configured 1:1 active/standby
link aggregation group.
Figure 3-60 Association between Ethernet OAM and clearing ARP entries
[Figure labels: Node B, ATN, L2VPN, master PE2a, backup PE2b, RNC; ETH OAM runs towards the master PE]
Anti-Jitter
NOTE
Anti-jitter is supported only on devices that comply with IEEE Standard 802.1ag-2007.
Alarm Suppression
If different types of faults trigger more than one alarm, CFM alarm suppression allows only
the alarm with the highest level to be sent to an NMS. If alarms remain after the highest-level
alarm is cleared, the alarm with the next highest level is sent to the NMS. The process repeats
until all alarms are cleared.
The principles of CFM alarm suppression are:
l High-level alarms require immediate troubleshooting.
l A single fault may trigger alarms with different levels. After the highest-level alarm is
cleared, alarms with lower levels may also be cleared.
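The reporting order described above can be sketched as follows. The alarm names and numeric levels are made up for the example, and the sketch assumes that a larger number means a higher alarm level.

```python
from typing import Dict, Optional


def alarm_to_report(active_alarms: Dict[str, int]) -> Optional[str]:
    """Return the active alarm with the highest level, or None if none remain."""
    if not active_alarms:
        return None
    return max(active_alarms, key=active_alarms.get)


# Hypothetical alarms triggered by different fault types (name -> level).
active = {"alarm_a": 3, "alarm_b": 2, "alarm_c": 1}

reported = []
while active:
    top = alarm_to_report(active)  # only this alarm is sent to the NMS
    reported.append(top)
    del active[top]                # the highest-level alarm is cleared

print(reported)  # ['alarm_a', 'alarm_b', 'alarm_c']
```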
EFM OAM
EFM OAM provides continuity check, fault monitoring, fault notification, and remote loopback functions for the link between directly connected devices. EFM OAM is used on links between customer edges (CEs) and user-end provider edges (UPEs) on a metropolitan area network (MAN) shown in Figure 3-61. EFM OAM helps maintain the reliability and stability of connections between a user network and a provider network.
Ethernet CFM
Ethernet CFM provides E2E connectivity fault detection, fault notification, fault acknowledgement, and fault locating functions. Ethernet CFM is used at the access and aggregation layers of the MAN shown in Figure 3-61 to monitor network-wide connectivity and detect connectivity faults. Ethernet CFM can work with a protection switching technology to improve network reliability.
[Figure: MAN in which CEs (SOHO, intranet, commercial centre, residential area) connect to UPEs, which connect through PE-AGGs to a BRAS, a CX, and the IP/MPLS core. EFM OAM (802.3ah) covers the Ethernet in the first mile, Ethernet CFM (802.1ag) covers the access convergence layer on the MAN, and the backbone network lies beyond]
3.5.3 Applications
Fault and Performance Detection on E-Line Services
[Figure: Node Bs connected through ATNs and PW/PBT across the metro network to the RNC]
Defined in MEF 6, VLL services refer to Ethernet line services based on point-to-point
Ethernet virtual connections.
As shown in Figure 3-62, PWs are set up using the MPLS technology. From the perspective of the whole service channel, a tunnel can be considered as a single hop. On the metropolitan network, Ethernet OAM is used to set up an MD from the local UPE interface that accesses a CPE to the remote UPE interface that accesses a CPE. MAs are set up for specific user services, and a MEP in inward mode is created on each UPE interface that accesses a CPE. In this manner, faults on PWs can be detected and services transmitted through PWs can be protected (MA2).
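[Figure: Node Bs (CPEs) connected to ATNs (UPEs) across an MSTP/RRPP/RPR metro network; 3AH runs between CPE and UPE, and 1AG runs between UPEs]
[Figure: Node Bs (CPEs) connected to ATNs (UPEs) across a VPLS metro network; 3AH runs between CPE and UPE, and 1AG runs between UPEs]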
Defined in MEF 6, E-LAN services provide P2MP connections for users. E-LAN services can be implemented using technologies such as pure Ethernet, VPLS, and QinQ. Each node on a network must learn MAC addresses.
UPEs implement fault detection on a network of multiple UPEs through 802.1ag by sending multicast CC packets. Delay or jitter detection can be implemented only between two specific UPEs. Fault detection between the CPE and UPE uses 802.3ah, and no P2MP issues are involved.
inter-VLAN replication. For E-tree services, fault detection is performed on the leaf node for
the link from the root to the leaf and the leaf node is dual-homing protected.
l Fault detection on the single-direction P2MP network:
As shown in Figure 3-65, no inter-VLAN replication is applied on such a network. Each leaf node is concerned only with the status of its link to the root node, not with the status of the links to other leaf nodes. The root node does not need to track the status of the link to any leaf node. Therefore, you can set up only one MA and configure all leaf nodes and the root node as MEPs. On the root node, specify all leaf nodes as remote MEPs and enable the sending of CC packets. On each leaf node, specify only the root node as the remote MEP and enable the receiving of CC alarms. After the configuration, you can perform OAM detection on the network.
[Figure: multicast source behind a Metro/IP/MPLS core, with a bridge connecting Leaf1, Leaf2, and Leaf3]
[Figure: the same network with MEPs configured in MA1, MA2, and MA3]
Figure 3-67 Fault detection and protection switchover of the LACP static link aggregation
group
[Figure: Ethernet CFM runs on three links between ATN and CX-B: Link1 (GE0/2/1 to GE1/0/1), Link2 (GE0/2/2 to GE1/0/2), and Link3 (GE0/2/3 to GE1/0/3)]
You can configure Ethernet CFM on ATN and CX-B and configure MEPs on all the member
interfaces of the aggregation group. MEPs on the interfaces of the same link are configured
within the same MA. MEPs on the interfaces of different links are configured within different
MAs. MEPs on all the interfaces belong to the same MD. You can detect the link connectivity
by exchanging CCMs between MEPs of the same link. You can then associate Ethernet CFM
with the interfaces.
When a connectivity fault occurs on Link1, the OAM modules on ATN and CX-B block and
then unblock their GE0/2/1 and GE1/0/1 interfaces respectively. In this manner, the LACP
module senses the connectivity fault on Link1 and switches the service data from Link1 to the
inactive Link3.
Terms
None
MA maintenance association
MD maintenance domain
3.6 E-LMI
3.6.1 Introduction
Definition
Ethernet Local Management Interface (E-LMI) is an operation, administration and
maintenance (OAM) protocol defined in the Metro Ethernet Forum (MEF) 16 Technical
Specification. E-LMI runs on the user-to-network interface (UNI) link between a provider
edge (PE) and a customer edge (CE), enabling the PE to notify the CE of the connectivity
status and Ethernet service configuration parameters available for the UNI on the CE side.
Figure 3-68 shows where E-LMI is deployed on a network.
Purpose
E-LMI enables the PE to implement the following functions:
l Notify the CE of the addition or deletion of an Ethernet virtual connection (EVC).
l Notify the CE of the EVC status (Active, Not Active, or Partially Active).
l Communicate UNI and EVC attributes to the CE.
3.6.2 Principles
E-LMI enables a CE to request and receive status and service attributes from a PE so that the
CE can be automatically configured to implement Metro Ethernet services. Figure 3-69
shows E-LMI networking.
[Figure: E-LMI runs on the UNI at each end of an EVC]
An EVC associates two or more UNIs and is classified as point-to-point or multipoint-to-
multipoint. Figure 3-70 shows EVC classification.
[Figure: point-to-point EVC and multipoint-to-multipoint EVC]
E-LMI Messages
Table 3-10 describes E-LMI messages.
STATUS ENQUIRY message with a report type of Full Status: The CE sends this message to the PE to request the status of the UNI and all EVCs.
STATUS message with a report type of Full Status: The PE returns this message carrying the status of the UNI and all EVCs to the CE.
STATUS ENQUIRY message with a report type of E-LMI Check: The CE uses this message to determine whether E-LMI is operational.
STATUS message with a report type of E-LMI Check: The PE uses this message to determine whether E-LMI is operational.
As shown in Figure 3-71, after receiving a STATUS ENQUIRY message with a report type of
Full Status from the CE, the PE checks whether the length of the message exceeds the length
of a single Ethernet frame. If the length of the message exceeds the length of a single Ethernet
frame, the PE returns a STATUS message with a report type of Full Status Continued. After
receiving the message, the CE returns a STATUS ENQUIRY message with a report type of
Full Status Continued. This process continues until all the remaining status information is sent
out in a single Ethernet frame. The PE then sends a STATUS message with a report type of
Full Status.
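The message sequence described above can be sketched as a small simulation. It is a simplified model: the function name is hypothetical, and the frame limit is expressed as a number of EVC entries per frame rather than as a byte length.

```python
from typing import List, Tuple


def full_status_exchange(
    evc_ids: List[int], max_evcs_per_frame: int
) -> List[Tuple[str, str, List[int]]]:
    """Return (enquiry report type, status report type, EVCs carried) per round trip."""
    if max_evcs_per_frame < 1:
        raise ValueError("a frame must be able to carry at least one EVC entry")
    exchange = []
    enquiry = "Full Status"  # the CE starts with a Full Status enquiry
    remaining = list(evc_ids)
    while True:
        chunk = remaining[:max_evcs_per_frame]
        remaining = remaining[max_evcs_per_frame:]
        # The PE answers "Full Status Continued" while more data is pending.
        status = "Full Status Continued" if remaining else "Full Status"
        exchange.append((enquiry, status, chunk))
        if not remaining:
            return exchange
        enquiry = "Full Status Continued"  # the CE asks for the next part


rounds = full_status_exchange(evc_ids=[1, 2, 3, 4, 5], max_evcs_per_frame=2)
for enquiry, status, evcs in rounds:
    print(f"CE: STATUS ENQUIRY ({enquiry}) -> PE: STATUS ({status}) {evcs}")
```

With five EVCs and two entries per frame, the exchange takes three round trips, and only the last STATUS message has the Full Status report type.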
Purpose
Both networks and services are part of an ongoing process of transformation and integration.
New services like triple play services, Next Generation Network (NGN) services, carrier
Ethernet services, and Fiber-to-the-x (FTTx) services are constantly emerging from this
process. Such services demand more investment and have higher OAM costs. They require
state-of-the-art QoS, full service access, and high levels of expansibility, reliability, and
manageability of transport networks. Traditional transport network technologies, such as
Multi-Service Transfer Platform (MSTP), Synchronous Digital Hierarchy (SDH), or
Wavelength Division Multiplexing (WDM), cannot meet these requirements because they lack
a control plane. Unlike traditional technologies, MPLS-TP does meet these requirements
because it can be used on next-generation transport networks that can process data packets, as
well as on traditional transport networks.
Because traditional transport networks like SDH or Optical Transport Network (OTN) networks
have high reliability and maintenance benchmarks, MPLS-TP must provide powerful OAM
capabilities. MPLS-TP OAM provides the following functions:
l Fault management
l Performance monitoring
l Protection switching
3.7.2 Principles
3.7.2.1 MPLS-TP OAM Functional Components
MPLS-TP OAM uses a range of functional components that have been defined in ITU-T Y.1731. These definitions are widely used by IETF MPLS-TP work teams. The functional components mentioned below are defined in the majority of OAM-associated MPLS-TP standards documentation.
ME and MEG
MPLS-TP OAM functions are performed for maintenance entities (MEs). An ME consists of
a pair of maintenance entity group end points (MEPs) at either end of a transport path as well
as maintenance entity group intermediate points (MIPs) located between the two MEPs. OAM
operates between two MEPs along a transport path. The path can be either a P2P transport
path, such as a pseudo wire (PW) or a point-to-point label switched path (P2P LSP), or a
point-to-multipoint (P2MP) path, for example, a P2MP LSP.
One or more MEs that use a transport path form a maintenance entity group (MEG).
Figure 3-72 illustrates either a P2P LSP or a PW. If the figure shows a P2P LSP, A and D are label edge routers (LERs), and B and C are label switching routers (LSRs). If it illustrates a multi-segment pseudo wire (MS-PW), A and D are terminating provider edges (T-PEs), and B and C are switching provider edges (S-PEs). In both cases, A and D are MEPs, and B and C are MIPs. The equipment in the diagram can be connected by a physical link, an LSP, or a sub-layer transport path.
MEP
A MEP is the source or sink node of a MEG. The MEP can only be an LER on an MPLS-TP
LSP, or a T-PE on an MPLS-TP PW.
On a P2P LSP shown in Figure 3-73, only PE1 and PE2 are MEPs.
[Figure: end-to-end LSP with a maintenance end point at each end]
CC
CC is a pro-active OAM operation. It detects loss of continuity (LOC) faults between any two MEPs that are part of a MEG. A MEP sends CC messages (CCMs) to its RMEP at specified intervals. If the RMEP does not receive a CCM within a period 3.5 times as long as the specified interval, it considers the connection between the two MEPs to be faulty. The RMEP then reports an alarm and enters the Down state, which triggers automatic protection switching (APS) on both MEPs. After receiving a CCM from the MEP again, the RMEP clears the alarm and exits the Down state.
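The timeout rule can be sketched as a small state tracker. The CcMonitor class, its method names, and the millisecond unit are assumptions made for the example; alarm reporting and APS triggering are left out.

```python
class CcMonitor:
    """Tracks the Up/Down state of an RMEP based on CCM arrivals (hypothetical model)."""

    def __init__(self, ccm_interval_ms, start_ms=0):
        self.timeout_ms = 3.5 * ccm_interval_ms
        self.last_ccm_ms = start_ms
        self.down = False

    def on_ccm(self, now_ms):
        """A CCM arrived: clear the alarm and leave the Down state."""
        self.last_ccm_ms = now_ms
        self.down = False

    def poll(self, now_ms):
        """Enter the Down state if no CCM arrived within 3.5 intervals."""
        if now_ms - self.last_ccm_ms > self.timeout_ms:
            self.down = True
        return self.down


rmep = CcMonitor(ccm_interval_ms=10)
still_up = rmep.poll(now_ms=30)    # 30 ms of silence is within the 35 ms limit
gone_down = rmep.poll(now_ms=40)   # 40 ms of silence exceeds the limit
rmep.on_ccm(now_ms=45)             # CCMs resume
recovered = rmep.poll(now_ms=50)
```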
CV
CV is also a pro-active OAM operation. It enables a MEP to report alarms when packets are
received unexpectedly or in error. For example, if a CV-enabled MEP receives a packet from
an LSP and finds that this packet has been transmitted in error by the LSP, the MEP will
report an alarm indicating a forwarding error.
[Figure: single-ended LM over MPLS-TP; the LMM carries TxFCf, and the LMR carries TxFCf, RxFCf, and TxFCb]
Figure 3-74 shows single-ended and dual-ended LM processes. Dual-ended LM works only in pro-active monitoring mode. Two MEPs in this mode periodically send CCMs carrying the following information:
l TxFCf: the local TxFCl value recorded when a CCM is sent.
l RxFCb: the local RxFCl value recorded when a CCM is received.
l TxFCb: the TxFCf value carried in a received CCM. This TxFCb value is the RMEP's TxFCl.
After receiving CCMs carrying frame count information, both MEPs use formulas shown in Figure 3-75 to implement near-end and far-end LM.
TxFCf[tc], RxFCb[tc], and TxFCb[tc] are the TxFCf, RxFCb, and TxFCb values carried in the most recently received CCM. RxFCl[tc] is the local RxFCl value recorded when this CCM is received, and tc is the time when this CCM is received.
TxFCf[tp], RxFCb[tp], and TxFCb[tp] are the TxFCf, RxFCb, and TxFCb values carried in the previously received CCM. RxFCl[tp] is the local RxFCl value recorded when the previous CCM was received, and tp is the time when the previous CCM was received.
Single-ended LM usually works in on-demand mode. In this mode, only the MEP periodically sends a loss measurement message (LMM); the RMEP does not. LMMs carry the following information to the RMEP:
l TxFCf: the local TxFCl value recorded when the LMM is sent.
After receiving an LMM, the RMEP returns a loss measurement reply (LMR) carrying the
following information:
l TxFCf: the TxFCf value carried in the LMM.
l RxFCf: the local RxFCl value recorded when the LMM was received.
l TxFCb: the local TxFCl value recorded when the LMR is sent.
After receiving an LMR, the local MEP uses the equations shown in Figure 3-76 to calculate
near-end and far-end frame loss measurement.
TxFCf[tc], RxFCf[tc], and TxFCb[tc] are the TxFCf, RxFCf, and TxFCb values carried in the most recently received LMR. RxFCl[tc] is the local RxFCl value recorded when the LMR is received, and tc is the time when the LMR is received.
TxFCf[tp], RxFCf[tp], and TxFCb[tp] are the TxFCf, RxFCf, and TxFCb values carried in the previously received LMR. RxFCl[tp] is the local RxFCl value recorded when the previous LMR was received, and tp is the time when the previous LMR was received.
[Figure: two-way DM over MPLS-TP; the DMM and the DMR both carry TxTimeStampf]
3.7.3 Application
An MPLS-TP network uses Pseudo Wire Emulation Edge to Edge (PWE3) to transmit TDM,
ATM, or Ethernet services.
Figure 3-78 Networking diagram for MPLS-TP OAM over an IP RAN in the Layer 2 to edge
scenario
[Figure labels: BTS/NodeBs attach over FE/GE, N*E1, and IMA E1 links to the MPLS-TP network, which connects to the RNC/BSC over GE and STM-1 links]
MPLS-TP OAM is used for MPLS-TP operation and maintenance. MPLS-TP OAM can
effectively detect, identify, and locate faults in the client layer and quickly switch traffic when
links or nodes become defective. This reduces network maintenance expenditures.
CC Continuity Check
CV Connectivity Verification
DM Delay Measurement
LM Loss Measurement
NOTE
Users cannot specify the source and target versions at random for ISSU. The two versions need to be
planned and developed in advance. For details about ISSU target versions supported by the current
version, contact Huawei R&D engineers.
3.8.1 Introduction
Definition
In-Service Software Upgrade (ISSU) is an upgrade mode in which services are not affected.
ISSU greatly reduces service interruption time and enhances network availability.
Purpose
The traditional software upgrade mode interrupts services running on the device for a long time, which reduces the benefits to operators. The online software patch technology can upgrade some software modules to rectify defects while the device is running; however, this technology is subject to restrictions in most situations. As a result, a full image software upgrade is still required.
In addition, traditional software upgrades are carried out at midnight to reduce the impact on services and have strict requirements on upgrade operations. If a traditional software upgrade is not finished within the specified period, the device must be rolled back to the previous version, and the software has to be upgraded again. As a result, new services cannot be provided to users and defects cannot be rectified in time. The time limitation also increases the probability of manual operation failures and the cost of human resources and management.
Besides choosing a proper upgrade time, traditional upgrade technologies reduce service interruption time by establishing multiple equal-cost paths or backup paths to which services are switched. As a result, network configurations must be modified, which increases the failure probability and prolongs the upgrade period. In addition, traffic may be interrupted because some backup paths are too congested to carry the traffic after the service switchover, as shown in Figure 3-79.
This method takes time and resources to switch services to the backup path. If the network
does not have backup paths, this method does not take effect. This method does not apply to
the cluster chassis which is a trend of network development.
ISSU greatly reduces the impact of software upgrades on services, improves customer satisfaction, enhances product competitiveness, and makes it easier for Huawei engineers to maintain devices. With ISSU, ideally no services are interrupted, and software can be upgraded at any time without switching services, which simplifies software upgrades, as shown in Figure 3-80.
Benefits
l Benefits Brought to Operators
ISSU reduces service interruption time to seconds. In addition, ISSU can be
implemented without switching services, and thus it is free from backup links and
reduces maintenance cost on software upgrade.
l Benefits Brought to Users
In ISSU, users' services are seldom affected and thus customer satisfaction is improved.
3.8.2 Principles
ISSU implements automatic software upgrade without interrupting services. Its principle is as
follows:
l ISSU restarts the SMB based on the new version. The new processes running on the SMB after the restart form the new forwarding plane and Active Management Plane (AMP).
l Data synchronization and configuration restoration are performed between the new AMB
and old AMB.
l The new AMP and forwarding plane replace the old ones to implement ISSU.
NOTE
In Figure 3-81, A is the old version and B is the new version.
2. Start the Standby Management Plane (SMP) based on the new version and synchronize
data.
– The AMB generates a configuration file of the new version and synchronizes it to the SMB (AMB of the new AMP).
– The AMB of the new AMP restores configurations.
– The data is dynamically synchronized between the new and old AMPs.
3. Perform the AMP/SMP switchover.
– After data backup is complete, AMP/SMP switchover can be performed. The new
AMP (namely the original SMP) takes over the whole host system, including data
forwarding plane.
– The incremental updates of service entries and data smoothing of the new AMP are
complete.
4. Update the new SMP.
– Restart the new SMP based on the new version to complete ISSU.
Devices that support ISSU can be upgraded through ISSU. Check the version compatibility using a version comparison tool or command lines. The proper ISSU mode is then automatically selected accordingly. As shown in Figure 3-82, the device marked in yellow is to be upgraded. Through ISSU, you can upgrade this device directly instead of switching services to multiple equal-cost or backup paths.
Terms
NA.
Abbreviations
Abbreviation Full Spelling
4 Interface Management
4.1.1 Introduction
Definition
A logical interface is a virtualized interface and does not physically exist. It must be manually
configured to exchange data. Logical interfaces include:
l Sub-interface
l Trunk interface
l VLANIF interface
l Virtual Ethernet (VE) interface
l Loopback interface
l Null0 interface
l Tunnel interface
Purpose
l Sub-interface
For point-to-point communication, assigning one IP address per physical interface
generally meets requirements. However, if the link layer of an interface supports
4.1.2 Principles
l Member interfaces must have consistent parameters on the two ends of a trunk link.
These parameters include:
– Interface quantity
– Interface transmission rate
– Interface duplex mode
l Data sequence must be guaranteed.
A data flow is a group of packets with the same source and destination MAC addresses,
source and destination IP addresses, and source and destination port numbers. For
example, the Telnet or FTP connection between two devices is a data flow.
Before trunking is configured, data flow frames can reach their destination in the correct
order because only one physical connection exists between two devices. However, after
trunking is configured, frames may reach the destination in an incorrect order, because
packets are transmitted over multiple links.
To prevent frame misorder, a packet forwarding mechanism is configured.
After the packet forwarding mechanism is configured, packets are transmitted in any of the
following manners:
l Packets with the same source and destination MAC addresses are transmitted over the
same physical link.
l Packets with the same source and destination IP addresses are transmitted over the same
physical link.
l Packets with the same source and destination IP addresses, source and destination
TCP/UDP port numbers, and IP protocol types are transmitted over the same physical
link.
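[Figure: position of the trunk module in the protocol stack. The data link layer contains the
LLC, MAC, and Trunk sub-layers; the physical layer contains the PHY.]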
The MAC sub-layer regards trunk interfaces as physical interfaces and delivers frames
directly to trunk interfaces.
The trunk module maintains a trunk forwarding table that contains the following two items:
l Key value
Key values are calculated using a hash algorithm based on packets' MAC or IP
addresses.
l Interface number
The number of entries in a trunk forwarding table is the same as the number of bundled
member interfaces. For example, if interfaces 3, 4, 5, and 6 are bundled into a trunk
interface, a trunk forwarding table contains four entries, as shown in the following
figure.
KEY 0 1 2 3
PORT 3 4 5 6
The trunk module forwards frames based on the trunk forwarding table in the following way:
1. The trunk module receives a frame from the MAC sub-layer and extracts its source MAC or
IP address, or its destination MAC or IP address.
2. The trunk module uses a hash algorithm to calculate the key value.
3. The trunk module searches for the interface number in the trunk forwarding table based
on the key value.
4. The trunk module sends the frame through the corresponding interface.
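The lookup steps above can be sketched as follows. This is a minimal illustration only: the
document does not describe the hash algorithm the device uses, so CRC-32 is used here as a
stand-in, and the function names and sample MAC addresses are hypothetical. The table values
come from the example with interfaces 3, 4, 5, and 6.

```python
import zlib

# Trunk forwarding table from the example above: key value -> member interface number.
TRUNK_TABLE = {0: 3, 1: 4, 2: 5, 3: 6}


def flow_key(fields, table_size):
    """Hash the selected packet fields to a key in the range 0..table_size-1.

    CRC-32 is an assumption made for this sketch, not the device's real algorithm.
    """
    data = "|".join(str(field) for field in fields).encode()
    return zlib.crc32(data) % table_size


def select_port(fields, table=TRUNK_TABLE):
    """Return the member interface that carries every frame with these fields."""
    return table[flow_key(fields, len(table))]


# Hashing on source and destination MAC addresses (hypothetical sample values).
flow = ("00e0-fc00-0001", "00e0-fc00-0002")
port = select_port(flow)
```

Because the key depends only on the hashed fields, every frame of one data flow maps to the
same member interface, which is what keeps frames of that flow in order.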
For more trunk details, see ATN Multi-service Access Equipment Feature Description -
LAN Access and MAN Access.
4.1.3 Applications
Eth-Trunk Interface
An Eth-Trunk interface aggregates the bandwidth of its member interfaces. As shown in the
following figure, an Eth-Trunk interface is created on ATN A and ATN B, and two full-duplex
GE interfaces are added to the Eth-Trunk interface. The bandwidth of the Eth-Trunk link
totals the bandwidths of the two GE interfaces.
An Eth-Trunk interface improves traffic transmission reliability. If one member link fails,
traffic is switched to the other member link.
An Eth-Trunk interface supports load balancing on its member interfaces to prevent network
congestion.
VLANIF Interface
VLANIF interfaces are used for inter-VLAN communication. In Figure 4-4, if hosts in
VLAN 2 need to communicate with hosts in VLAN 3, VLANIF interfaces must be created for
VLAN 2 and VLAN 3 on the ATN device.
[Figure 4-4: an ATN device with one VLANIF interface for VLAN 2 and one for VLAN 3.]
Loopback Interface
l Reliability improvement
– Application in IP address unnumbered
When an interface uses an IP address only for a short period, the interface can
borrow the IP address of a loopback interface to save IP address
resources and maintain interface stability.
– Application in router IDs
Some dynamic routing protocols require that ATNs have router IDs. A router ID
uniquely identifies an ATN in an autonomous system (AS).
For example, if no router ID is configured for OSPF or BGP on a device, the
device needs to select the largest IP address as the router ID from the local interface
The system allows only the packets sent from the loopback interface address to
access the RADIUS server, thereby facilitating log reading and writing.
Tunnel Interface
Tunnels such as GRE tunnels use tunnel interfaces to forward packets. Tunnel interfaces are
virtual interfaces that must be created before these tunnels can be used.
The source and destination addresses of a tunnel uniquely identify a tunnel. The source tunnel
address of the local end is the destination address of the remote end. Conversely, the
destination tunnel address of the local end is the source address of the remote end.
Terms
Term Definition
FR Frame Relay
GE Gigabit Ethernet
IP Internet Protocol
MP Multilink PPP
VA virtual access
VE virtual Ethernet
enabled on an interface, the physical status of the interface frequently switches between Up
and Down because alarm reporting is also faster. This frequent switching is called network
flapping.
l The transmission alarm customization function allows you to specify alarms that can
cause an interface to change its physical status.
l The transmission alarm suppression function allows you to suppress network flapping by
setting a series of thresholds.
Purpose
Transmission alarm customization allows you to filter unwanted alarms, and transmission
alarm suppression enables you to set thresholds on customized alarms. Together, they allow
devices to ignore alarm burrs (also called alarm chattering) generated during transmission
link protection and prevent frequent network flapping.
If the transmission device fails, the IP device receives alarms. Then, the transmission device
protects the link and notifies the IP device that alarms are cleared. The entire process in which
the alarms are generated and then cleared generally lasts from 50 ms to 200 ms. For the IP
device, the process is generally regarded as a burr of a corresponding duration.
The IP device is expected to ignore such burrs. That is, while the transmission device
maintains or protects the link, the system uses alarm suppression to prevent route flapping,
ensuring that the network continues to run stably. Transmission alarm customization
minimizes the impact of alarms on the physical status of the interface, and transmission
alarm suppression filters and suppresses alarm signals to prevent interface flapping.
4.2.2 Principles
Alarm Burr
An alarm burr, also called alarm chattering, is a process in which alarm generation and
clearance signals are received in a short period (the period varies with specific usage
scenarios, devices, or service types).
For example, if a loss of signal (LOS) alarm is cleared 50 ms after it is generated, the process
from the alarm generation to clearance is an alarm burr.
Alarm Flapping
Alarm flapping is a process in which an alarm is repeatedly generated and cleared in a short
period (the period varies with specific usage scenarios, devices, or service types).
For example, if an LOS alarm is generated and cleared 10 times in 1 s, alarm flapping occurs.
1. After a transmission device generates an alarm, it determines whether to report the alarm
to its connected IP device based on the alarm type.
– If the alarm type is b3tca, sdbere, or sfbere, the device determines whether the
alarm threshold is reached.
If the threshold is reached, the device reports the alarm to the IP device for
processing.
Otherwise, the device ignores the alarm.
– All other alarm types are directly reported to the IP device for processing.
2. If alarms are configured to be recorded to logs, the alarms are recorded after being
generated.
3. The IP device determines whether to change the physical status of the interface based on
customized alarm types.
– If no alarm types are customized to affect the physical status of the interface, alarms
of these types are ignored. The physical status of the interface remains unchanged.
– If an alarm type is customized to affect the physical status of the interface, the
alarm is processed based on the transmission alarm customization mechanism.
l If a certain type of alarms is customized to affect the interface status but transmission
alarm filtering or suppression is not configured:
– The physical status of the interface changes to Down if such an alarm is generated.
– The physical status of the interface changes to Up if such an alarm is cleared.
l If a certain type of alarms is customized to affect the interface status and transmission
alarm filtering or suppression is configured, the IP device processes the alarm according
to the filtering mechanism or suppression parameters.
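The decision flow above can be sketched as two small functions. This is an illustration of
the described logic only, not device code: the function names, parameters, and return strings
are hypothetical, and the threshold check is reduced to a boolean supplied by the caller.

```python
# Alarm types that are reported only after their alarm threshold is reached.
THRESHOLD_ALARMS = {"b3tca", "sdbere", "sfbere"}


def transmission_reports(alarm_type, threshold_reached=False):
    """Step 1: does the transmission device report the alarm to the IP device?"""
    if alarm_type in THRESHOLD_ALARMS:
        return threshold_reached
    return True  # all other alarm types are reported directly


def interface_action(alarm_type, event, customized, suppression_configured=False):
    """Step 3: effect of a reported alarm on the physical status of the interface.

    event is "generated" or "cleared"; customized is the set of alarm types
    that have been customized to affect the interface status.
    """
    if alarm_type not in customized:
        return "unchanged"
    if suppression_configured:
        return "per filtering/suppression"
    return "Down" if event == "generated" else "Up"
```

For example, `interface_action("los", "generated", customized={"los"})` yields `"Down"`, and
the same alarm with an empty `customized` set leaves the status unchanged.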
[Figure 4-5: figure of merit plotted against time (t1 to t5), with the ceiling, suppress, and
reuse thresholds marked.]
Figure 4-5 shows how figure of merit increases and decreases as a transmission device sends
alarm generation signals.
1. At t1 and t2, figure of merit is smaller than suppress. Therefore, alarm signals
generated at t1 and t2 affect the physical status of the interface, and the physical status of
the interface changes to Down.
2. At t3, figure of merit exceeds suppress, and the alarm is suppressed. The physical status
of the interface is not affected, even if new alarm signals arrive.
3. At t4, figure of merit reaches ceiling. If new alarm signals arrive, figure of merit is
recalculated but does not exceed ceiling.
4. At t5, figure of merit falls below reuse, and the alarm is free from suppression.
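The behavior at t1 to t5 can be reproduced with a small model. The sketch below makes several
assumptions that this document does not state: each alarm adds a fixed penalty, figure of
merit decays exponentially with a fixed half-life, and all numeric values are arbitrary.
Only the roles of suppress, reuse, and ceiling are taken from the description above.

```python
class AlarmDamper:
    """Toy model of transmission alarm suppression (all values hypothetical)."""

    def __init__(self, penalty=1000.0, suppress=2000.0, reuse=750.0,
                 ceiling=6000.0, half_life=5.0):
        self.penalty = penalty      # added to figure of merit per alarm (assumed)
        self.suppress = suppress    # above this, the alarm is suppressed
        self.reuse = reuse          # below this, suppression ends
        self.ceiling = ceiling      # figure of merit never exceeds this
        self.half_life = half_life  # seconds for the value to halve (assumed)
        self.merit = 0.0
        self.suppressed = False
        self.last_time = 0.0

    def _decay(self, now):
        """Age figure of merit to time `now` and lift suppression if possible."""
        elapsed = now - self.last_time
        self.merit *= 0.5 ** (elapsed / self.half_life)
        self.last_time = now
        if self.suppressed and self.merit < self.reuse:
            self.suppressed = False

    def on_alarm(self, now):
        """Register an alarm; return True if it may change the interface status."""
        self._decay(now)
        self.merit = min(self.merit + self.penalty, self.ceiling)
        if self.merit > self.suppress:
            self.suppressed = True
        return not self.suppressed

    def is_suppressed(self, now):
        """Report whether the alarm is still suppressed at time `now`."""
        self._decay(now)
        return self.suppressed
```

With these sample values, alarms at 0 s and 1 s stay below suppress and affect the interface,
a third alarm at 2 s pushes figure of merit above suppress, and a long quiet period lets it
fall below reuse again.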
Terms
None
4.3.1 Introduction
Definition
Alarm inversion inverts the state of LOS alarms when they are generated on physical
interfaces during device deployment.
Purpose
Alarm inversion inverts the reported state of an LOS alarm. This function is useful during
device deployment when a device's physical interfaces have services configured but are not
yet connected to any cables. In this scenario, a device without alarm inversion enabled
reports LOS alarms to the NMS; with alarm inversion enabled, the device does not report
them. Clearing the alarms disables alarm inversion so that subsequent LOS alarms can be
reported. Alarm inversion does not affect network monitoring.
4.3.2 Principles
This document describes the LAN access and MAN access in terms of the overview,
principle, and applications.
5.1 Ethernet
5.2 VLAN
5.3 Trunk
5.4 STP/RSTP/MSTP
5.5 QinQ
5.6 RRPP
5.7 LLDP
5.8 Transparent Transmission of Layer 2 Protocol Packets
5.9 ERPS (G.8032)
5.10 Automatic Link Discovery
5.1 Ethernet
Definition
The Ethernet technology originates from an experimental network with the purpose of
connecting multiple PCs at the speed of 3 Mbit/s. In general, Ethernet refers to a standard for
10 Mbit/s Ethernet networks. The Digital Equipment Corporation (DEC), Intel, and Xerox
(DIX) joined efforts to develop and then issued the standard in 1982. The IEEE 802.3
standard is developed on the basis of the Ethernet standard, and is compatible with it.
In TCP/IP, the encapsulation format of IP packets of the Ethernet is defined in RFC 894, and
that of the IEEE 802.3 network is defined in RFC 1042. Currently, the most commonly-used
encapsulation format is that defined in RFC 894, which is called Ethernet_II or Ethernet DIX.
NOTE
To distinguish Ethernet frames of those two types, in this document, Ethernet frames defined in RFC
894 are called Ethernet_II frames; Ethernet frames defined in RFC 1042 are called IEEE 802.3 frames.
Purpose
Ethernet is a universal communication protocol standard used for local area networks (LANs).
This standard defines the cable type and signal processing method used for LANs.
Ethernet networks are broadcast networks established based on the Carrier Sense Multiple
Access with Collision Detection (CSMA/CD) mechanism. Collisions restrict Ethernet
performance. Early Ethernet devices such as hubs work at the physical layer, and cannot
confine collisions to a particular scope. This restricts network performance improvement.
Working at the data link layer, switches are able to confine collisions to a particular scope.
Therefore, switches help improve Ethernet performance and gradually replace hubs to become
mainstream Ethernet devices. Switches, however, do not restrict broadcast traffic on the
Ethernet. This affects Ethernet performance. To resolve this problem, divide a LAN into
virtual local area networks (VLANs) on switches or use Layer 3 switches.
As a simple, cheap, and easy-to-implement LAN technology, Ethernet has become the
mainstream in the industry. The development of Fast Ethernet (FE) and Gigabit Ethernet
(GE), which provide higher Ethernet performance, helps Ethernet become the most promising
network technology.
5.1.2 Principles
NOTE
The fatal defect of the coaxial cable is the fact that devices on the cable are connected in series and
therefore a single node failure can cause the breakdown of the entire network. As the physical
standards of coaxial cables, 10BASE-2 and 10BASE-5 have fallen into disuse.
l 100M Ethernet cable standards
The 100M Ethernet is also called Fast Ethernet (FE). Compared with the 10M Ethernet,
the 100M Ethernet has a higher transmission rate at the physical layer, but the two are
identical at the data link layer.
Table 5-2 lists the 100M Ethernet cable standards.
Both the 10Base-T and 100Base-TX are applied to Category 5 twisted pair cables. They
have different transmission rates. The 10Base-T transmits data at a rate of 10 Mbit/s
whereas the 100Base-TX transmits data at 100 Mbit/s.
The 100Base-T4 is rarely adopted now.
l Gigabit Ethernet cable standards
The Gigabit Ethernet is developed on the basis of the Ethernet standard defined in IEEE
802.3. Based on the Ethernet protocol, the transmission rate of the FE is increased by 10
times and reaches 1 Gbit/s. Table 5-3 lists the Gigabit Ethernet cable standards.
Using the Gigabit Ethernet technology, you can upgrade the existing Fast Ethernet from
100 Mbit/s to 1000 Mbit/s.
The physical layer of Gigabit Ethernet over optical fiber uses 8B/10B coding. In the
traditional Ethernet technology, the data link layer delivers 8-bit data sets to the
physical layer, which processes them and transmits them still as 8-bit data sets.
On Gigabit Ethernet over optical fiber, the physical layer instead maps each 8-bit data
set received from the data link layer to a 10-bit data set and then sends it out.
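The cost of this mapping is easy to quantify: every 8 data bits occupy 10 bits on the line.
The short calculation below is a sketch of that arithmetic; the function name is
hypothetical.

```python
def line_rate_8b10b(data_rate_bps):
    """Return the line bit rate needed to carry data_rate_bps with 8B/10B coding."""
    return data_rate_bps * 10 // 8


print(line_rate_8b10b(1_000_000_000))  # 1250000000
```

A 1 Gbit/s data rate therefore requires a line rate of 1.25 Gbit/s on the fiber.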
l 10GE cable standards
IEEE 802.3ae is the 10GE cable standard. For a 10GE, the cables are all optical fiber in
full-duplex mode.
The 10GE is under way, and will be widely deployed in future.
CSMA/CD
l Concept of CSMA/CD
The Ethernet network was originally designed to connect computers and other digital
devices on a shared physical line. The computers and digital devices can access the
shared line only in half-duplex mode. Therefore, a mechanism of collision detection and
avoidance is required to prevent multiple devices from contending for the line. Carrier
Sense Multiple Access with Collision Detection (CSMA/CD) is therefore introduced.
The concept of CSMA/CD is described as follows:
– CS: carrier sense
Before transmitting data, a station monitors the line to check whether the line is
idle. In this manner, chances of collision are decreased.
– MA: multiple access
The data sent by a station can be received by multiple stations.
– CD: collision detection
If two stations transmit electrical signals at the same time, the signals are
superimposed, and the voltage amplitude becomes twice the normal amplitude.
This indicates a collision.
The stations stop transmitting after sensing the collision and resume
transmission after a random delay.
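This document says only that transmission resumes after a random delay. The sketch below
shows one way to compute such a delay, using the truncated binary exponential backoff rule
from IEEE 802.3 (an addition from the standard, not from this document). The function name
is hypothetical.

```python
import random

MAX_ATTEMPTS = 16   # the frame is dropped once this many attempts have collided
BACKOFF_LIMIT = 10  # the exponent stops growing after this many collisions


def backoff_slots(collisions, rng=random):
    """Return how many slot times to wait after the given number of collisions.

    After the n-th collision, a station waits a random whole number of slot
    times in the range 0 .. 2**min(n, 10) - 1.
    """
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions: frame is dropped")
    exponent = min(collisions, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)
```

Because the range doubles with each successive collision, stations that keep colliding
spread their retries over a longer and longer window.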
l Half-duplex mode
The half-duplex mode has the following features:
– Receiving data or sending data takes place in only one direction at a time.
– The CSMA/CD mechanism is adopted.
– The transmission distance is limited.
Hubs work in half-duplex mode.
l Full-duplex mode
After Layer 2 switches replaced Hubs in networking, shared Ethernet became switched
Ethernet, and half-duplex mode was replaced by full-duplex mode. As a result,
throughput increases drastically: the maximum throughput is double the transmission
rate.
Full-duplex mode eliminates collisions, so CSMA/CD is no longer needed on
full-duplex links.
The full-duplex mode has the following features:
– Transmitting data and receiving data can take place simultaneously.
– The maximum throughput doubles the transmission rate.
– This mode does not have the limitation on the transmission distance.
Except for Hubs, network cards, Layer 2 devices, and Layer 3 devices produced in the
last 10 years all support full-duplex mode.
To realize the full-duplex mode, the hardware requirements are as follows:
– Full-duplex network cards and chips
– Physical media over which sending and receiving frames are separated
– Point-to-point connection
Similar to an Ethernet network that uses twisted pair cables, an Ethernet network that
uses optical modules and optical fibers also implements auto-negotiation by sending
code streams. These code streams are called Configuration (C) code streams. Different
from electrical interfaces, optical interfaces generally do not negotiate traffic
transmission rates and work only in full-duplex mode. Therefore, only flow control
parameters are negotiated.
Auto-negotiation priorities of the Ethernet duplex link are listed as follows in a
descending order:
– 1000M full-duplex
– 1000M half-duplex
– 100M full-duplex
– 100M half-duplex
– 10M full-duplex
– 10M half-duplex
If auto-negotiation succeeds, the Ethernet card activates the link. Then, data can be
transmitted on the link. If auto-negotiation fails, the link is unavailable.
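The priority list above amounts to picking the highest-priority mode that both ends
support. The sketch below illustrates that selection; the function name and the capability
sets are hypothetical, and real auto-negotiation is performed by the physical-layer chip,
not by software like this.

```python
# Auto-negotiation priorities in descending order, as listed above.
PRIORITY = [
    "1000M full-duplex",
    "1000M half-duplex",
    "100M full-duplex",
    "100M half-duplex",
    "10M full-duplex",
    "10M half-duplex",
]


def negotiate(local_modes, remote_modes):
    """Return the highest-priority mode both ends support, or None on failure."""
    common = set(local_modes) & set(remote_modes)
    for mode in PRIORITY:
        if mode in common:
            return mode
    return None  # auto-negotiation fails and the link is unavailable


local = {"1000M full-duplex", "100M full-duplex", "100M half-duplex"}
remote = {"100M full-duplex", "100M half-duplex", "10M full-duplex"}
result = negotiate(local, remote)
```

Here `result` is `"100M full-duplex"`, the best mode in the intersection of the two
capability sets.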
Auto-negotiation is implemented based on the chip design at the physical layer. As
defined in IEEE 802.3, auto-negotiation is implemented in any of the following cases:
– A faulty link recovers.
– A device is re-powered on.
– Either of two connected devices resets.
– A renegotiation request packet is received.
In other cases, two connected devices do not always send auto-negotiation code streams.
Auto-negotiation does not use special packets or bring additional protocol costs.
l Auto-negotiation rules for interfaces
Two connected interfaces can communicate with each other only when they are in the
same working mode.
– If both interfaces work in the same non-auto-negotiation mode, the interfaces can
communicate.
– If both interfaces work in auto-negotiation mode, the interfaces can communicate
through negotiation. The negotiated working mode depends on the interface with
lower capability (specifically, if one interface works in full-duplex mode and the
other interface works in half-duplex mode, the negotiated working mode is half-
duplex). The auto-negotiation function also allows the interfaces to negotiate about
the traffic control function.
– If a local interface works in auto-negotiation mode and the remote interface works
in a non-auto-negotiation mode, the negotiated working mode of the local interface
depends on the working mode of the remote interface.
Table 5-4 describes the auto-negotiation rules for interfaces of the same type.
Table 5-4 Auto-negotiation rules for interfaces of the same type (the local interface
works in auto-negotiation mode)
Table 5-5 describes the auto-negotiation rules for interfaces of different types.
According to the auto-negotiation rules described in Table 5-4 and Table 5-5, if an
interface works in auto-negotiation mode and the connected interface works in a
non-auto-negotiation mode, packets may be dropped or auto-negotiation may fail. It
is recommended that you configure two connected interfaces to work in the same
mode to ensure that they can communicate properly.
FE and higher-rate optical interfaces only support the full duplex mode. Auto-
negotiation is enabled on GE interfaces for the negotiation of traffic control. When
devices are directly connected using GE optical interfaces, auto-negotiation is
enabled on the optical interfaces to detect the unidirectional optical fiber fault. If
one of two optical fibers is faulty, the fault information is synchronized on both
ends through auto-negotiation. As a result, interfaces on both ends go Down. After
the fault is rectified, the interfaces go Up again through auto-negotiation.
HUB
l Hub principle
When terminals are connected using twisted pair cables, a convergence device, which is
called Hub, is required. Operating at the physical layer, Hubs connect devices. Figure
5-2 shows a Hub operation model.
[Figure 5-2: Hub operation model. Each of the two end stations implements all seven layers,
from the application layer down to the physical layer; the Hub between them implements
only the physical layer.]
The appearance of a Hub is a box with multiple interfaces. Each interface can connect to
a terminal. Therefore, multiple devices can be connected through a Hub to form a star
topology.
NOTE
Note that although the topology is physically a star shape, the Hub uses the bus and CSMA/CD
technologies.
[Figure: a Hub with interfaces 1 to 5. A signal received on interface 1 (IN) is sent out of
interfaces 2 to 5 (OUT).]
l According to the supported interfaces, Hubs can be classified into the following two
types:
– Category-I Hub: supports physical interfaces of one type.
For example, a Category-I Hub provides only Category-5 twisted pair interfaces,
Category-3 twisted pair interfaces, or optical fiber interfaces.
– Category-II Hub: provides interfaces of different types. For example, a Category-II
Hub can provide both Category-5 twisted pair interfaces and optical fiber interfaces.
The two types do not differ in internal operation mode; however, they are used in
different scenarios because they provide different types of interfaces. In practice,
Category-I Hubs are commonly used.
Duplex mode, either half or full, refers to the operation mode of the physical layer. Access
mode refers to the access of the data link layer. Therefore, in the Ethernet, the data link layer
and physical layer are associated.
Therefore, different access modes are required for different operation modes. This brings
about some inconvenience to the design and application of the Ethernet.
Some organizations and vendors propose to divide the data link layer into two sub-layers: the
Media Access Control (MAC) sub-layer and the Logical Link Control (LLC) sub-layer.
Therefore, different physical layers correspond to different MAC sub-layers, and the LLC
sub-layer becomes totally independent, as shown in Figure 5-4.
Figure 5-4 Hierarchical structure of the data link layer of the Ethernet
[Diagram: the network layer sits above the data link layer, which is divided into the LLC
sub-layer and the MAC sub-layer; the physical layer sits below.]
d. The Ethernet frame is sent to the peer according to the destination MAC address.
e. The peer compares the destination MAC address with entries in the MAC address
table.
n If an entry is matched, the frame is accepted.
n If no entry is matched, the frame is discarded.
The Cyclic Redundancy Check (CRC) field provides an error detection mechanism.
Each sending device calculates a CRC code over the DMAC, SMAC, Type, and Data
fields and fills it into the 4-byte CRC field.
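The CRC calculation can be sketched with Python's standard `zlib.crc32`, which uses the same
CRC-32 polynomial as Ethernet. The helper names and sample addresses are hypothetical, and
the little-endian placement of the 4 CRC bytes follows the convention commonly used by
software tools; treat it as an assumption rather than a statement from this document.

```python
import struct
import zlib


def build_frame(dmac, smac, eth_type, data):
    """Assemble DMAC + SMAC + Type + Data and append the 4-byte CRC field."""
    body = dmac + smac + struct.pack("!H", eth_type) + data
    crc = struct.pack("<I", zlib.crc32(body))  # byte order is an assumption
    return body + crc


def crc_ok(frame):
    """Recompute the CRC over everything except the last 4 bytes and compare."""
    body, crc = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == crc


frame = build_frame(
    bytes.fromhex("00e0fc000002"),  # DMAC (sample value)
    bytes.fromhex("00e0fc000001"),  # SMAC (sample value)
    0x0800,                         # Type: IPv4
    bytes(46),                      # Data: 46 bytes of padding
)
```

A receiver that runs `crc_ok(frame)` accepts the frame as built, while changing any single
byte of the frame makes the check fail.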
l Format of an IEEE 802.3 frame
As shown in Figure 5-6, the format of an IEEE 802.3 frame is similar to that of an
Ethernet_II frame except that in an IEEE 802.3 frame, the Type field is changed to the
Length field, and the LLC field and the Sub-Network Access Protocol (SNAP) field
occupy 8 bytes of the Data field.
– Length
The Length field specifies the number of bytes of the Data field.
– LLC
The LLC field consists of three sub-fields: Destination Service Access Point
(DSAP), Source Service Access Point (SSAP), and Control.
– SNAP
The SNAP field consists of the Org Code field and the Type field. Three bytes in
the Org Code field are all 0s. The Type field functions the same as the Type field in
Ethernet_II frames.
For descriptions about other fields, see the relevant description of Ethernet_II frames.
Based on the values of DSAP and SSAP, IEEE 802.3 frames can be divided into the
following types:
– If DSAP and SSAP are both 0xff, the IEEE 802.3 frame changes to a Netware-
Ethernet frame that bears NetWare data.
– If DSAP and SSAP are both 0xaa, the IEEE 802.3 frame changes to an
Ethernet_SNAP frame.
Ethernet_SNAP frames can be encapsulated with data of multiple protocols. The
SNAP can be considered as an extension of the Ethernet protocol. SNAP allows
vendors to invent their own Ethernet transmission protocols.
The Ethernet_SNAP standard is defined by IEEE 802.1 to guarantee
interoperability between IEEE 802.3 LANs and Ethernet networks.
– Other values of DSAP and SSAP indicate IEEE 802.3 frames.
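The classification above can be sketched as a small parser. Two points are not from this
document and come from IEEE 802.3 instead: a value of 0x0600 or more in the Type/Length
field marks an Ethernet_II frame, and a value of 1500 or less is a length. The function name
is hypothetical, and the input is assumed to start at the DMAC and be at least 16 bytes long.

```python
import struct


def classify(frame):
    """Classify a frame (starting at the DMAC, preamble removed) by its headers."""
    (type_or_length,) = struct.unpack("!H", frame[12:14])
    if type_or_length >= 0x0600:
        return "Ethernet_II"          # field is a Type
    if type_or_length > 1500:
        return "undefined"            # neither a valid length nor a Type
    dsap, ssap = frame[14], frame[15]  # first two bytes of the LLC field
    if dsap == 0xFF and ssap == 0xFF:
        return "Netware-Ethernet"
    if dsap == 0xAA and ssap == 0xAA:
        return "Ethernet_SNAP"
    return "IEEE 802.3"
```

For example, a frame whose bytes 12 and 13 are `08 00` is classified as Ethernet_II, while a
frame with a length of 46 followed by `aa aa 03` is classified as Ethernet_SNAP.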
LLC Sub-layer
As described, the MAC sub-layer supports two types of frame: IEEE 802.3 frames and
Ethernet_II frames. In an Ethernet_II frame, the Type field identifies the upper layer protocol.
Therefore, on a device, only the MAC sub-layer is required, and the LLC sub-layer does not
need to be realized.
In an IEEE 802.3 frame, besides the traditional services of the data link layer, the LLC sub-
layer defines additional useful features. All these features are provided by the sub-fields of
DSAP, SSAP, and Control.
l Connectionless service
Currently, the Ethernet implements this service.
l Connection-oriented service
The connection is set up before data is transmitted. The reliability of the data is
guaranteed during the transmission.
l Connectionless data transmission with acknowledgement
The connection is not required before data transmission. The acknowledgement
mechanism is adopted to improve the reliability.
The following is an example that describes the applications of SSAP and DSAP. Assume that
terminals A and B use connection-oriented services. Data is transmitted in the following
process:
5.1.3 Applications
Initially, many computers were connected using coaxial cables to access shared
directories or a file server located on the local network segment. All the computers,
whether servers or hosts, were equal on the network.
This structure, however, could not keep up with the development of applications. Currently,
most traffic flows between clients and servers, and this traffic model inevitably creates a
bottleneck at the servers.
After full-duplex Ethernet technology and Ethernet switches were introduced, servers
were connected to high-speed interfaces (100 Mbit/s) on Ethernet switches, and clients were
connected to low-speed interfaces, which alleviates the traffic bottleneck. Modern operating
systems provide distributed services and database services. Servers based on these
operating systems communicate with clients and other servers for data synchronization.
100M FE cannot meet the resulting bandwidth requirement; therefore, the 1000M Ethernet
technology was developed.
Terms
Term Description
MAC It is short for Media Access Control. At the data link layer of the OSI
model, the MAC sub-layer is adjacent to the physical layer.
GE Gigabit Ethernet
5.2 VLAN
5.2.1 Introduction
Definition
The Virtual Local Area Network (VLAN) technology logically divides a physical LAN into
multiple VLANs, each of which is a broadcast domain. Only intra-VLAN communication is
allowed for higher network security.
Purpose
The traditional LAN technology uses the bus structure and has the following shortcomings:
l Conflict occurs if multiple nodes send messages simultaneously.
l Messages are broadcast to all nodes.
l Networks have security risks because all hosts in a LAN share the same transmission
channel.
To overcome these shortcomings, bridges and Layer 2 switches are used to effectively isolate
the collision domain.
However, bridges and Layer 2 switches cannot address the network security issues caused by
broadcast domains.
NOTE
To reduce broadcast traffic, broadcasts must be isolated between hosts that do not need to
communicate with each other. The ATN can select routes based on IP addresses and
effectively reduce broadcast traffic between two connected network segments. However, this
solution is costly. VLANs, which are multiple logical LANs built on one physical LAN, were
therefore developed.
Only hosts in the same VLAN can communicate with each other. Broadcast packets are
therefore confined within each VLAN, and network security is also enhanced.
For example, it is costly for different companies in the same building to build their own
LANs. If these companies share the same LAN in the building, there may be security
problems.
To address these problems, these companies can use the VLAN technology.
[Figure 5-7: hosts attached to three switches, grouped into VLAN-A, VLAN-B, and VLAN-C.]
Figure 5-7 shows a networking diagram for a typical VLAN application. Three switches are
placed at different locations (for example, different floors of a building). Each switch
connects to three hosts that belong to different VLANs. Each VLAN can be used by a
different company. In the diagram, a dotted box indicates a VLAN.
5.2.2 Principles
Link Types
VLAN links can be classified into the following types:
l Access link: a link connecting a host and a switch. In Figure 5-9, the links between PCs
and switches are all access links.
l Trunk link: a link connecting switches. In Figure 5-9, the links between switches are
trunk links. Frames transmitted over trunk links carry VLAN tags.
[Figure 5-9: hosts in VLAN 2 and VLAN 3 connected to switches over access links; the switches
are interconnected over trunk links.]
Port Types
Some ports of a device can identify VLAN frames defined by IEEE 802.1Q, whereas others
cannot. Ports can be classified into three types based on whether they can identify VLAN
frames:
l Access port
An access port connects a switch to a host over an access link, as shown in Figure 5-9.
An access port has the following features:
– Directly discards frames with VLAN tags.
– Adds a tag carrying the PVID to each untagged frame it receives.
– Removes the tag from a frame before it sends the frame.
l Trunk port
As shown in Figure 5-9, a trunk port connects a switch to another switch over a trunk
link. A trunk port has the following features:
– Allows tagged frames from multiple VLANs to pass.
– Directly discards untagged frames.
– Directly sends the tagged frame.
l Hybrid port
As shown in Figure 5-10, a hybrid port connects a switch to either a host over an access
link or another switch over a trunk link. A hybrid port allows frames from multiple
VLANs to pass and removes tags from outgoing VLAN frames by default.
[Figure 5-10: hybrid ports connecting a switch to hosts over access links and to another
switch over a trunk link.]
Default VLAN
A default VLAN can be configured on access, trunk, and hybrid ports. However, the
meanings of default VLAN vary with port types.
l Default VLAN of an access port
– If an access port receives a tagged frame, it discards the frame.
– Before an access port sends a frame with a tag whose VID is the same as the PVID,
it removes the VLAN tag from the frame. Frames sent by an access port to a peer
device never carry VLAN tags.
l Default VLAN of a hybrid port
– If a hybrid port receives an untagged frame, it adds a VLAN tag to the frame and
sets the VID in the tag to the PVID.
– When a hybrid port receives a tagged frame,
n If the frame's VLAN ID is permitted by the port, the port accepts the tagged
frame.
n If the frame's VLAN ID is denied by the port, the port discards the tagged
frame.
– When a hybrid port sends a frame,
n If the frame's VLAN ID is permitted by the port, the port directly transmits the
frame.
n If the frame's VLAN ID is the same as the PVID, the port strips the VLAN tag
and sends the frame out.
Basic Principles
To improve frame processing efficiency, frames arriving at a switch all carry VLAN tags for
uniform processing. If an untagged frame enters a switch port which has a default VLAN
configured, the port then adds a VLAN tag whose VID is the same as the PVID to the frame.
If a tagged frame enters a switch port, the port does not add any tag to the frame, even if the
port has a default VLAN configured.
The switch processes frames in a different way according to port types. The following table
describes how ports of different types process frames.
l Access port
– Untagged frame received: accepts the frame and adds a tag with the default VLAN ID to
the frame.
– Tagged frame received: discards the frame.
– Frame sent: removes the tag that contains the PVID and sends the frame out.
l Trunk port
– Untagged frame received: discards the frame.
– Tagged frame received: accepts the frame if the frame's VLAN ID is permitted by the
port; discards the frame if the frame's VLAN ID is denied by the port.
– Frame sent: transmits the frame if the frame's VLAN ID is permitted by the port;
otherwise, discards the frame.
l QinQ port
A QinQ port adds a tag to each single-tagged frame and supports a maximum of
4094 x 4094 VLAN tags, which meet the requirement of a Metropolitan Area
Network (MAN) on the number of VLANs.
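The access and trunk port rules above can be sketched as two functions. This is an
illustration of the listed behavior only; the function names, the use of `None` for an
untagged or discarded frame, and the check that an access port sends only frames of its own
PVID are assumptions of this sketch.

```python
def ingress(port_type, frame_vid, pvid=None, permitted=()):
    """Return the VLAN ID a received frame carries inside the switch.

    frame_vid is None for an untagged frame. The result is None if the
    port discards the frame.
    """
    if port_type == "access":
        # Untagged frames get the default VLAN ID; tagged frames are discarded.
        return pvid if frame_vid is None else None
    if port_type == "trunk":
        if frame_vid is None:
            return None  # untagged frames are discarded
        return frame_vid if frame_vid in permitted else None
    raise ValueError(f"unsupported port type: {port_type}")


def egress(port_type, vid, pvid=None, permitted=()):
    """Return the tag sent on the wire: a VLAN ID, "untagged", or None if dropped."""
    if port_type == "access":
        # The tag that contains the PVID is removed before sending.
        return "untagged" if vid == pvid else None
    if port_type == "trunk":
        return vid if vid in permitted else None
    raise ValueError(f"unsupported port type: {port_type}")
```

For example, an untagged frame entering an access port with PVID 2 is handled as VLAN 2
inside the switch and leaves a trunk port that permits VLANs 2 and 3 still tagged with 2.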
[Figure 5-11: intra-VLAN communication across ATNA and ATNB. The diagram labels VLAN 2,
VLAN 3, Host A, and Host B.]
In Figure 5-11, the trunk link between ATNA and ATNB must support both communication
within VLAN 2 and communication within VLAN 3. To implement this, the ports at both
ends of the trunk link must be configured to belong to both VLANs. Specifically, GE 1 on
ATNA and GE 2 on ATNB must belong to both VLAN 2 and VLAN 3.
Host A sends a frame to Host B in the following process:
1. Host A sends the frame to GE1 on ATNA.
2. GE1 adds a tag with a VLAN ID of 2 to the frame. 2 is the ID of the VLAN to
which GE1 belongs.
3. ATNA sends the frame to all its interfaces that belong to VLAN 2 (except GE1).
4. GE2 sends the frame to ATNB.
5. After receiving the frame, ATNB finds that the frame belongs to VLAN 2 and sends the
frame to its interfaces that belong to VLAN 2 (except GE2).
6. GE4 sends the frame to Host B.
Communication within VLAN 3 is similar and is omitted here.
Inter-VLAN Communication
After VLANs are configured, hosts in different VLANs cannot directly communicate with
each other at Layer 2. To implement communication between VLANs, you must create routes
between these VLANs. The implementation details are as follows:
CX
Subinterface
VLAN Trunk
Access port
VLAN2 VLAN3
routing protocol on the Layer 3 switch is required. VLANIF interfaces are therefore
introduced.
A VLANIF interface is a Layer 3 logical interface, which can be configured on either a
Layer 3 switch or a router.
As shown in Figure 5-13, two VLANs, VLAN 2 and VLAN 3, are configured on the
switch. To implement communication between the two VLANs, create two VLANIF
interfaces on the switch and assign IP addresses and configure routes for the VLANIF
interfaces.
VLANIF VLANIF
VLAN2 VLAN3
Layer 3 switching addresses the shortcomings of the Layer 2 switch + router scheme
and implements faster traffic forwarding at a lower cost. Nevertheless, Layer 3
switching has the following shortcomings:
– Layer 3 switching applies only to networks in which almost all interfaces are
Ethernet interfaces.
– Layer 3 switching applies only to networks with stable routes and few changes
in the network topology.
Background
A VLAN is widely used on switching networks because of its flexible control of broadcast
domains and convenient deployment. On a Layer 3 switch, inter-VLAN communication is
implemented by configuring a VLANIF interface (logical Layer 3 interface) to each VLAN
and assigning an IP address to each VLANIF interface. This wastes IP addresses. Figure 5-14
shows a typical VLAN division in the device.
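[Figure 5-14: VLANIF2: 1.1.1.1, VLANIF3: 1.1.1.17, VLANIF4: 1.1.1.25]
Table 5-6
VLAN  Subnet       Gateway Address  Available Addresses  Available Hosts  Required Hosts
2     1.1.1.0/28   1.1.1.1          14                   13               10
3     1.1.1.16/29  1.1.1.17         6                    5                5
4     1.1.1.24/30  1.1.1.25         2                    1                1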
In Table 5-6, VLAN 2 requires 10 host addresses. Subnet 1.1.1.0/28 with mask length 28 bits
is assigned to VLAN 2. 1.1.1.0 is the subnet address, and 1.1.1.15 is the directed broadcast
address. These two addresses cannot be used as host addresses. In addition, 1.1.1.1, the
default gateway address of the subnet, cannot be used as a host address. The other 13
addresses ranging from 1.1.1.2 to 1.1.1.14 can be used by the hosts. In this way, although
VLAN 2 needs only ten addresses, 13 addresses need to be assigned to it according to the
division of the subnet.
VLAN 3 requires five host addresses, and subnet 1.1.1.16/29 with mask length 29 bits needs
to be assigned to VLAN 3. VLAN 4 requires only one address, and subnet 1.1.1.24/30 with
mask length 30 bits needs to be assigned to VLAN 4.
In the above example, 16 (10+5+1) addresses are required for all the VLANs. However, 28
(16+8+4) addresses are consumed according to the common VLAN addressing mode, even if
the optimal scheme is used. Therefore, nearly half of the addresses are wasted. In addition,
if VLAN 2 is later accessed by only three hosts instead of ten, the extra addresses are also
wasted.
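The address arithmetic above can be checked with the Python standard library. The following is a minimal sketch: the plan dictionary simply restates the three subnets and host requirements from Table 5-6, and the function name summarize is hypothetical.

```python
import ipaddress

# VLAN ID -> (subnet, number of hosts the VLAN actually requires), from Table 5-6.
plan = {
    2: ("1.1.1.0/28", 10),
    3: ("1.1.1.16/29", 5),
    4: ("1.1.1.24/30", 1),
}


def summarize(plan):
    """Return (vlan, addresses consumed, addresses usable by hosts, hosts required)."""
    rows = []
    for vlan, (cidr, required) in plan.items():
        net = ipaddress.ip_network(cidr)
        consumed = net.num_addresses
        # The subnet address, the directed broadcast address, and the gateway
        # address cannot be used as host addresses.
        usable_by_hosts = consumed - 3
        rows.append((vlan, consumed, usable_by_hosts, required))
    return rows


rows = summarize(plan)
total_consumed = sum(row[1] for row in rows)   # 16 + 8 + 4
total_required = sum(row[3] for row in rows)   # 10 + 5 + 1
wasted = total_consumed - total_required
```

With the values from Table 5-6, total_consumed is 28 and total_required is 16, so 12 of the 28 addresses carry no host.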
This division is inconvenient for future network upgrade and expansion. If VLAN 4 needs two
additional hosts, the assigned IP addresses are not to be changed, and the addresses
after 1.1.1.24 have been assigned to others, a new subnet with mask length 29 bits and a new
VLAN need to be assigned to VLAN 4's new customers. As a result, VLAN 4's customers
have only three hosts, but these hosts are assigned to two different subnets in separate
VLANs, which is inconvenient for network management.
In the preceding example, several IP addresses are used as subnet addresses, directed
broadcast addresses, and default gateway addresses of subnets, and therefore cannot be
used as host addresses in the VLAN. VLAN aggregation is used to eliminate this limitation
on address assignment.
Principles
VLAN aggregation, also known as a super-VLAN, partitions broadcast domains by using
multiple VLANs in a physical network so different VLANs can belong to the same subnet. In
VLAN aggregation, two basic concepts are involved, super VLAN and sub-VLAN.
l Super VLAN: Super VLANs differ from common VLANs. In super VLANs, only Layer
3 interfaces are created and physical ports are not contained. The super VLAN can be
regarded as a logical Layer 3 collection of many sub-VLANs.
l Sub-VLAN: Sub-VLANs are used to isolate broadcast domains. In sub-VLANs, only
physical ports are contained, and Layer 3 VLAN interfaces cannot be created. The Layer
3 switching with the external network is implemented through the super VLAN Layer 3
interface.
A super VLAN can contain one or more sub-VLANs each with different broadcast domains.
The sub-VLAN does not occupy an independent subnet segment. In the same super VLAN, IP
addresses of hosts belong to the super VLAN's subnet segment, regardless of the mapping
between hosts and sub-VLANs.
The same Layer 3 interface is shared by sub-VLANs. Some subnet IDs, default gateway
addresses of the subnet, and directed broadcast addresses of the subnet are saved. In addition,
different broadcast domains can use the unused addresses in the same subnet segment. As a
result, subnet differences are eliminated, addressing becomes flexible and previously wasted
addresses can be used.
Table 5-6 is used as an example to explain the implementation principle. Suppose that user
demands are unchanged: VLAN 2 requires 10 host addresses, VLAN 3 requires 5 host
addresses, and VLAN 4 requires 1 host address.
Create VLAN 10 and configure VLAN 10 as a super VLAN. Then assign subnet address
1.1.1.0/24 with mask length being 24 bits to VLAN 10, where 1.1.1.0 is the subnet ID and
1.1.1.1 is the gateway address of the subnet, as shown in Figure 5-15. The corresponding sub-
VLAN address assignment of VLAN 2, VLAN 3, and VLAN 4 is shown in Table 5-7.
Super VLAN 10
VLANIF10:1.1.1.1/24
3 5 1.1.1.12-1.1.1.16 5
4 1 1.1.1.17 1
In VLAN aggregation implementation, sub-VLANs are not divided according to the previous
subnet border. Instead, their addresses are flexibly assigned in the super VLAN's subnet
according to the required number of hosts.
Table 5-7 shows that VLAN 2, VLAN 3, and VLAN 4 share a subnet (1.1.1.0/24), a default
gateway address of the subnet (1.1.1.1), and a directed broadcast address of the subnet
(1.1.1.255). In this manner, the former subnet IDs (1.1.1.16 and 1.1.1.24), default gateway
addresses (1.1.1.17 and 1.1.1.25), and directed broadcast addresses (1.1.1.15, 1.1.1.23,
and 1.1.1.27) can be used as host IP addresses.
In total, 16 addresses (10 + 5 + 1 = 16) are required for the three VLANs, and exactly 16
addresses (1.1.1.2 to 1.1.1.17) in this subnet are assigned to them. A total of 19 IP
addresses are used, that is, the 16 host addresses together with the subnet ID
(1.1.1.0), the default gateway of the subnet (1.1.1.1), and the directed broadcast address of
the subnet (1.1.1.255). Of the 256 addresses in the network segment, 237 (256 - 19 = 237)
remain available and can be used by any host in the sub-VLANs.
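The sequential assignment of host addresses within the super VLAN's subnet can be sketched as follows. This is an illustration under stated assumptions: the function name assign is hypothetical, and handing out addresses strictly in ascending order is only one possible policy that happens to reproduce the ranges 1.1.1.2 to 1.1.1.17 given above.

```python
import ipaddress

super_net = ipaddress.ip_network("1.1.1.0/24")   # subnet of super VLAN 10
gateway = ipaddress.ip_address("1.1.1.1")        # VLANIF10 address
demand = {2: 10, 3: 5, 4: 1}                     # sub-VLAN ID -> hosts required


def assign(net, gw, demand):
    """Give each sub-VLAN the next free host addresses, ignoring subnet borders."""
    # hosts() already skips the subnet ID and the directed broadcast address.
    pool = (addr for addr in net.hosts() if addr != gw)
    return {vlan: [next(pool) for _ in range(count)] for vlan, count in demand.items()}


ranges = assign(super_net, gateway, demand)
host_addresses = sum(len(addrs) for addrs in ranges.values())
# Host addresses plus the subnet ID, the gateway, and the directed broadcast address.
addresses_used = host_addresses + 3
```

Here ranges[3] runs from 1.1.1.12 to 1.1.1.16 and ranges[4] contains only 1.1.1.17, while addresses_used evaluates to 19.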
Figure 5-16 Layer 3 communication between different sub-VLANs based on ARP proxy
Super VLAN 10
VLANIF10: 1.1.1.1/24
VLAN 2 VLAN 3
Host A Host B
1.1.1.2/24 1.1.1.3/24
If Host A's ARP table has no corresponding entry for Host B and the gateway between
sub-VLANs is enabled with the ARP proxy, communication between Host A in VLAN 2
and Host B in VLAN 3 proceeds as follows:
a. After comparing the IP address of Host B 1.1.1.3 with its IP address, Host A finds
that both IP addresses are in the same network segment 1.1.1.0/24, and its ARP
table has no entry corresponding to Host B.
b. Host A initiates an ARP broadcast to request Host B's MAC address.
c. Host B is not in the broadcast domain of VLAN 2, and cannot receive the ARP
request.
d. Since the gateway's ARP proxy is enabled between sub-VLANs, after receiving
Host A's ARP request, the gateway finds that the IP address of Host B (1.1.1.3)
is on a directly connected network segment. The gateway then initiates an
ARP broadcast to all other sub-VLAN interfaces to request Host B's MAC address.
e. After receiving the ARP request, Host B sends an ARP response.
f. After receiving Host B's ARP response, the gateway replies to Host A's ARP
request with its own MAC address.
g. The ARP tables in both the gateway and Host A have entries corresponding to Host
B.
h. To send packets to Host B, Host A initially sends packets to the gateway, and then
the gateway carries out Layer 3 forwarding.
The process that Host B uses to send packets to Host A functions in the same way.
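The outcome of the ARP exchange above can be modelled with a small lookup function. This is a simplified sketch, not the device logic: all MAC addresses, the extra host 1.1.1.4, and the function name resolve are hypothetical, and the sketch assumes, as step h implies, that the MAC address Host A ends up using for a host in another sub-VLAN is the gateway's.

```python
from typing import Optional

GATEWAY_MAC = "00:00:00:00:00:fe"  # hypothetical MAC address of VLANIF10

# Hypothetical hosts in super VLAN 10: IP address -> sub-VLAN and MAC address.
hosts = {
    "1.1.1.2": {"vlan": 2, "mac": "00:00:00:00:00:0a"},  # Host A
    "1.1.1.3": {"vlan": 3, "mac": "00:00:00:00:00:0b"},  # Host B
    "1.1.1.4": {"vlan": 2, "mac": "00:00:00:00:00:0c"},  # another host in VLAN 2
}


def resolve(src_ip: str, dst_ip: str, hosts: dict, proxy_arp: bool) -> Optional[str]:
    """Return the MAC address the source host learns for dst_ip, or None."""
    src = hosts[src_ip]
    dst = hosts.get(dst_ip)
    if dst is None:
        return None
    if src["vlan"] == dst["vlan"]:
        # Same broadcast domain: the target hears the request and answers itself.
        return dst["mac"]
    if not proxy_arp:
        # Different sub-VLANs and no ARP proxy: the request is never answered.
        return None
    # The gateway answers on behalf of the target and later forwards at Layer 3.
    return GATEWAY_MAC
```

With the ARP proxy enabled, resolve("1.1.1.2", "1.1.1.3", hosts, True) returns the gateway's MAC address; with it disabled, the same call returns None.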
l Layer 2 communication between a sub-VLAN and an external network
In Figure 5-17, in port-based Layer 2 VLAN communication, the received or sent
frames are not tagged with the super VLAN ID.
ATN1
GE1 GE2
Super VLAN 10
VLANIF10:1.1.1.1/24
VLAN 2 VLAN 3
Host A Host B
1.1.1.2/24 1.1.1.3/24
Host A sends a frame to Switch 1 through GE1. Upon receipt, Switch 1 adds a VLAN
tag with a VLAN ID 2 to the frame. The VLAN ID is not changed to VLAN 10's ID on
Switch 1 even if VLAN 2 is the sub-VLAN of VLAN 10. After passing through GE3 (a
trunk port), this frame still carries VLAN 2's ID.
That is, Switch 1 itself does not send VLAN 10's frames.
A super VLAN has no physical port. This limitation is obligatory, as shown below:
– If you configure the super VLAN and then the trunk interface, the frames of a super
VLAN are filtered automatically according to the VLAN range set on the trunk
interface.
In Figure 5-17, no frame of the super VLAN 10 passes through GE3 on Switch 1,
even though the interface allows frames from all VLANs to pass through.
– If you complete configuring the trunk interface and allow all VLANs to pass
through, the super VLAN still cannot be configured on Switch 1, because any
VLAN with physical ports cannot be configured as the super VLAN, and the trunk
interface allows only the frames tagged with VLAN IDs to pass through.
As for Switch 1, the valid VLANs are VLAN 2 and VLAN 3, and all frames are
forwarded in these VLANs.
l Layer 3 communication between a sub-VLAN and an external network
VLANIF20
1.1.3.1/24
ATN2 GE2
GE1 VLANIF10
Host C
1.1.2.2/24
1.1.3.2/24
VLANIF10
GE3 1.1.2.1/24
ATN1
GE1 GE2
Super VLAN 4
VLANIF4:1.1.1.1/24
VLAN 2 VLAN 3
Host A Host B
1.1.1.2/24 1.1.1.3/24
b. Host A initiates an ARP broadcast to its gateway, requesting the gateway's MAC
address.
c. After receiving the ARP request, Switch 1 identifies the correlation between the
sub-VLAN and the super VLAN, and offers an ARP response to Host A through
sub-VLAN 2. The source MAC address in the ARP response packet is the MAC
address of VLANIF4 for super VLAN 4.
d. Host A learns the gateway's MAC address.
e. Host A sends the packet to the gateway, with the destination MAC address as the
MAC address of VLANIF4 for super VLAN 4, and the destination IP address of
1.1.3.2.
f. After receiving the packet, Switch 1 performs Layer 3 forwarding and sends the
packet to Switch 2, with the next hop address as 1.1.2.2, the outgoing interface as
VLANIF10.
g. After receiving the packet, Switch 2 performs Layer 3 forwarding and sends the
packet to Host C through the directly-connected interface VLANIF20.
h. The response packet from Host C reaches Switch 1 after Switch 2 carries out Layer
3 forwarding.
i. After receiving the packet, Switch 1 performs Layer 3 forwarding and sends the
packet to Host A through the super VLAN.
[Figure: VLAN mapping between VLAN 2 and VLAN 3 through GE1 on ATN A and ATN B; host
addresses 172.16.0.1/16 and 172.16.0.7/16]
If devices in two VLANs need to communicate through VLAN mapping, the IP addresses of
these devices must be on the same network segment. Otherwise, VLAN mapping does not
take effect, as the devices communicate through Layer 3 routes.
Background
On an ME network, commonly, users and services are first identified based on single VLAN
tags or double VLAN tags carried in packets and then access different VPNs through sub-
interfaces. In some special scenarios where the access device does not support QinQ or a
VLAN tag is used in different services, different services cannot be distributed to different
VSIs or VPN instances.
As shown in Figure 5-20, the High Speed Internet (HSI), Voice over IP (VoIP), and Internet
Protocol Television (IPTV) services belong to VLAN 10 and are converged to the CSG; the
CSG is connected to the UPE through L2VPNs.
If the CSG does not support QinQ, it cannot differentiate the received HSI, VoIP, and IPTV
services for transmitting them through different PWs. In this case, you can configure the CSG
to resolve the 802.1p priorities. Then, the UPE can transmit different packets through different
PWs based on the 802.1p priorities of the packets.
In a similar manner, if the CSG is connected to UPE through L3VPNs, the UPE can transmit
different services through different VPN instances based on the 802.1p priorities of the
packets.
Figure 5-20 Networking diagram of multiple services belonging to the same VLAN
BTV VOD
Platform
SR Video
HSI
PW1
VoIP
CSG PW2 PE
Internet
IPTV BRAS
VLAN 10
Data flow1
Data flow2
Basic Concepts
The sub-interfaces are classified as shown in Table 5-8 based on service identification
policies configured on them.
For details on ATM cell relay implemented through PWE3, refer to the chapter "PWE3" in the
Feature Description - VPN.
In the access of a non-IP station to an L2VPN, the process of transmitting ATM packets
is as follows:
a. The NodeB differentiates service types (voice, data, or signal). The NodeB
encapsulates the IP packets as follows:
n Encapsulates different users with different VLAN IDs.
n Encapsulates different services with different 802.1p priorities.
n Encapsulates different services of the same user with the same VLAN ID but
different 802.1p priorities.
n Encapsulates different services of different users with different VLAN IDs but
the same or different 802.1p priorities.
After the NodeB encapsulates packets with different VLAN IDs and 802.1p
priorities, it encapsulates the packets with tunnel labels and PW labels for
implementing ATM cell relay. Then, the ATM services can be transparently
transmitted to the CSG through label switching. Figure 5-22 shows the format of a
packet on the outbound interface of the CSG.
b. After CSG receives the ATM services, the 802.1p sub-interface on CSG resolves
the packets to obtain their VLAN IDs and 802.1p priorities. The packets then access
different VSIs through priority mapping. In this manner, different services are
transmitted to PE2 through different VSIs. Figure 5-23 shows the format of a
packet transmitted between CSG and PE.
c. Upon receiving the packets, the PE2 decapsulates the packets to obtain the original
ATM packets and then sends the original ATM packets to the BSC.
l Access of an IP station
As shown in Figure 5-24, when a CSG accesses an IP station, PWE3 is not required on
the CSG and MASG. After the CSG receives IP packets, it performs the following:
a. The NodeB differentiates service types (voice, data, or signal). The NodeB
encapsulates the IP packets as follows:
n Encapsulates different users with different VLAN IDs.
n Encapsulates different services with different 802.1p priorities.
n Encapsulates different services of the same user with the same VLAN ID but
different 802.1p priorities.
n Encapsulates different services of different users with different VLAN IDs but
the same or different 802.1p priorities.
b. After CSG receives the packets, its 802.1p sub-interface resolves the packets to
obtain their VLAN IDs and 802.1p priorities. The packets then access different
VSIs through priority mapping. In this manner, different services are transmitted to
PE2 through different VSIs.
c. The PE2 then transmits the packets to the BSC.
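The classification performed by an 802.1p sub-interface can be sketched as follows. The bit layout of the 16-bit tag control field (3-bit priority, 1-bit drop eligible indicator, 12-bit VLAN ID) follows IEEE 802.1Q; everything else is an assumption for illustration, including the VSI names, the priority values in SERVICE_MAP, and the function names.

```python
from typing import Optional, Tuple


def parse_tci(tci: int) -> Tuple[int, int]:
    """Split a 16-bit 802.1Q tag control field into (VLAN ID, 802.1p priority)."""
    priority = (tci >> 13) & 0x7   # upper 3 bits
    vlan_id = tci & 0x0FFF         # lower 12 bits
    return vlan_id, priority


# Hypothetical mapping: (VLAN ID, 802.1p priority) -> VSI name.
SERVICE_MAP = {
    (10, 0): "VSI_DATA",
    (10, 5): "VSI_VOICE",
    (10, 6): "VSI_SIGNAL",
}


def classify(tci: int) -> Optional[str]:
    """Return the VSI for a tagged frame, or None if no mapping is configured."""
    return SERVICE_MAP.get(parse_tci(tci))


voice_tci = (5 << 13) | 10   # priority 5, VLAN 10
```

In this sketch, frames of the same VLAN reach different VSIs only because their 802.1p priorities differ, which is the behavior the preceding steps describe.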
Manage
NOTE
As shown in Figure 5-25, a CSG accesses a non-IP station. NodeB and the CSG, and the
MASG and RNC are connected through TDM or ATM physical links. Then, the CSG
differentiates service types (voice, data, or signal) based on timeslots in TDM or PVCs in
ATM.
Voice
Manage
CSG L3VPN PE2 RNC
Data NodeB
For details on ATM cell relay implemented through PWE3, refer to the chapter "PWE3" in the
Feature Description - VPN.
In the access of a non-IP station to an L3VPN, the process of transmitting ATM packets
is as follows:
a. The NodeB differentiates service types (voice, data, or signal). The NodeB
encapsulates the IP packets as follows:
n Encapsulates different users with different VLAN IDs.
n Encapsulates different services with different 802.1p priorities.
n Encapsulates different services of the same user with the same VLAN ID but
different 802.1p priorities.
n Encapsulates different services of different users with different VLAN IDs but
the same or different 802.1p priorities.
After the NodeB encapsulates packets with different VLAN IDs and 802.1p
priorities, it encapsulates the packets with tunnel labels and PW labels for
implementing ATM cell relay. Then, the ATM services can be transparently
transmitted to DSLAM through label switching. Figure 5-26 shows the format of a
packet on the outbound interface of the CSG.
b. After CSG receives the ATM services, the 802.1p sub-interface on CSG resolves
the packets to obtain their VLAN IDs and 802.1p priorities. The packets then access
different VPN instances through priority mapping. In this manner, different services
are transmitted to PE2 through different VPN instances. Figure 5-27 shows the
format of a packet transmitted between the CSG and PE2.
c. Upon receiving the packets, the PE2 decapsulates the packets to obtain the original
ATM packets and then sends the original ATM packets to the RNC.
l Access of an IP station
As shown in Figure 5-28, when a CSG accesses an IP station, PWE3 is not required on
the CSG and MASG. After the CSG receives IP packets, it performs the following:
a. The NodeB differentiates service types (voice, data, or signal). The NodeB
encapsulates the IP packets as follows:
n Encapsulates different users with different VLAN IDs.
n Encapsulates different services with different 802.1p priorities.
n Encapsulates different services of the same user with the same VLAN ID but
different 802.1p priorities.
n Encapsulates different services of different users with different VLAN IDs but
the same or different 802.1p priorities.
b. After the CSG receives the packets, its 802.1p sub-interface resolves the packets to
obtain their VLAN IDs and 802.1p priorities. The packets then access different
VPN instances through priority mapping. In this manner, different services are
transmitted to PE2 through different VPN instances.
c. After PE2 receives the packets, it sends the packets to the RNC.
5.2.3 Application
Port-Based VLAN Division
Router
Trunk Link
Different companies residing in the same business premises may need to isolate service data.
Therefore, according to the port requirements of each company, VLANs are created on the
core switch of the business premises, and the ports of each company are assigned to the
corresponding VLAN. This ensures that each company has its own "virtual switch" or
"virtual workstation".
ATN CX
Trunk Link
As shown in Figure 5-30, the trunk link can be utilized to connect different NodeB and RNC.
In this manner, data of different NodeBs can be isolated, and the inter-department
communication within the NodeB and RNC can be implemented.
Figure 5-31 Networking diagram of communications between multiple VLANs on the same
Layer 3 device
CX600
Trunk Link
ATN
As shown in Figure 5-31, if VLAN 2, VLAN 3, and VLAN 4 only belong to CX, these
VLANs are not VLANs across different switches. In such a situation, you can configure a
VLANIF interface for each VLAN on CX to implement the communications between these
VLANs.
The Layer 3 device shown in Figure 5-31 can be an ATN or a Layer 3 switch.
l Multiple VLANs belong to different Layer 3 devices.
CX-A CX-B
As shown in Figure 5-32, VLAN 2, VLAN 3, and VLAN 4 are VLANs across different
ATNs. In such a situation, you can configure a VLANIF interface respectively on CX-A and
CX-B for each VLAN, and then configure the static route or run a routing protocol between
CX-A and CX-B.
The Layer 3 device shown in Figure 5-32 can be an ATN or a Layer 3 switch.
5.3 Trunk
5.3.1 Introduction
Definition
Trunking bundles multiple physical interfaces into a single logical interface, which is called a
trunk interface. The bundled physical interfaces are member interfaces.
Purpose
Before trunking is used, the transmission rate between two network devices connected by a
fast Ethernet twisted pair cable is limited to 100 Mbit/s. To provide a higher transmission rate,
the twisted pair cable has to be replaced with a gigabit optical fiber, or the existing network
has to be upgraded to a Gigabit Ethernet network. These solutions are costly and not suitable
for small- and medium-sized enterprises or institutions.
5.3.2 Principles
Additionally, a trunk interface can be configured to support routing protocols and services.
Figure 5-33 shows a simple Eth-Trunk example in which two devices are directly connected
through three GE interfaces. These three interfaces are bundled into an Eth-Trunk interface at
both ends of the trunk link. In this way, bandwidth is increased, and reliability is improved.
A trunk link can be considered a direct point-to-point link, with the devices on both ends
being ATNs, switches, or an ATN on one end and a switch on the other end.
l Load balancing
Load balancing can be implemented on a trunk interface. For example, on an Eth-Trunk
interface, you can configure weights for member links to carry out load balancing.
l Higher reliability
When the physical link of a member interface fails, the traffic on the member link is
switched to another member link, ensuring uninterrupted service on the trunk link.
l Increased bandwidth
The bandwidth of a trunk interface equals the sum of the bandwidth of all member
interfaces.
Table 5-9 shows the Eth-Trunk link aggregation modes that the ATN supports.
Manual load balancing mode
– Usage scenario: If the device on one end of an Eth-Trunk link does not support the Link
Aggregation Control Protocol (LACP), you can create an Eth-Trunk interface in load
balancing mode on the ATN and add multiple interfaces to the Eth-Trunk interface to
increase bandwidth and enhance transmission reliability.
– Description: Manual load balancing is a basic link aggregation mode, in which you must
manually create the Eth-Trunk interface, add interfaces to the Eth-Trunk interface, and
specify active member interfaces. LACP is not involved.
In manual load balancing mode, all active member interfaces forward data and perform
load balancing. In this mode, traffic can be evenly balanced among all member
interfaces. Alternatively, you can set a weight for each member interface to implement
uneven load balancing; in this manner, the interface that has a greater weight value
transmits a larger volume of traffic. If an active link in the link aggregation group fails,
traffic is balanced among the remaining active links.
Static LACP mode
– Usage scenario: If devices on both ends of an Eth-Trunk link support LACP, you can
create an Eth-Trunk interface in static LACP mode on the ATN. In this mode, both load
balancing and backup can be implemented.
– Description: In static LACP mode, you must manually create an Eth-Trunk interface and
add interfaces to the Eth-Trunk interface. Different from link aggregation in manual load
balancing mode, active member interfaces are selected by sending LACP data units
(LACPDUs) in static LACP mode. That is, when a group of interfaces are added to an
Eth-Trunk interface, devices at both ends determine active and inactive interfaces by
sending LACPDUs to each other.
– Frames with the same source and destination IP addresses, source and destination
TCP/UDP port numbers, and IP protocol types are transmitted over the same
physical link.
Classification
Trunk interfaces can be classified into two types: Eth-Trunk and IP-Trunk.
Features
Eth-Trunk interfaces configured on the ATN support the following features:
l IP address assignment and allowing each trunk member interface to "borrow" a trunk
interface's IP address
l Layer 2 forwarding, MPLS forwarding, and Layer 3 forwarding (unicast)
l Hash algorithm-based load balancing
l QoS
l VPN instance binding
l Hot backup and hot swapping
l Addition of interfaces on different boards to a single trunk interface
NOTE
The maximum number of Up member links can be configured only for Eth-Trunk interfaces in
static LACP mode.
The maximum number of Up member links is used to control the number of member links in the
Up state. After this number is reached, additional Up member links are forcibly set to Down.
NOTE
Trunk member interface backup can only be configured for Eth-Trunk interfaces in static LACP mode.
[Figure: position of the trunk module in the protocol stack. Data link layer: LLC, MAC,
Trunk. Physical layer: PHY.]
The MAC sub-layer regards trunk interfaces as physical interfaces and delivers frames
directly to the trunk interfaces.
The trunk module maintains a trunk forwarding table that contains the following two items:
l HASH-KEY value
Key values are calculated using the hash algorithm based on packets' MAC or IP
addresses
l Interface number
KEY 0 1 2 3
PORT 3 4 5 6
The trunk module forwards a frame based on the trunk forwarding table in the following way:
1. The trunk module receives a frame from the MAC sub-layer and extracts its source MAC
address/IP address or destination MAC address/IP address.
2. The trunk module uses the hash algorithm to calculate the HASH-KEY value.
3. Based on the HASH-KEY value, the trunk module searches the trunk forwarding table
for the interface number and sends the frame from the corresponding interface.
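The three forwarding steps above can be sketched with the KEY/PORT values shown in the trunk forwarding table. The hash function used here (XOR of all address bytes, reduced modulo the number of table entries) is purely an assumption for illustration; the document does not describe the hash algorithm that the device uses, and the function names are hypothetical.

```python
# Trunk forwarding table from the text: HASH-KEY value -> interface number.
FORWARDING_TABLE = {0: 3, 1: 4, 2: 5, 3: 6}


def hash_key(src_mac: bytes, dst_mac: bytes, buckets: int) -> int:
    """Illustrative hash: XOR every byte of both addresses, then reduce to a key."""
    value = 0
    for byte in src_mac + dst_mac:
        value ^= byte
    return value % buckets


def select_interface(src_mac: bytes, dst_mac: bytes) -> int:
    """Step 1: take the addresses; step 2: hash them; step 3: look up the interface."""
    key = hash_key(src_mac, dst_mac, len(FORWARDING_TABLE))
    return FORWARDING_TABLE[key]


host_1 = bytes.fromhex("000000000001")
host_2 = bytes.fromhex("000000000002")
interface = select_interface(host_1, host_2)
```

Because the key depends only on the addresses, all frames of the same flow leave through the same member interface, which preserves frame order within the flow.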
The preceding mechanisms allow user traffic to be quickly switched if the status of the
member interfaces changes, preventing traffic loss.
5.3.2.6 LACP
As shown in Figure 5-36, a trunk link is established between ATN A and CX-B. Four full-
duplex GE interfaces on ATN A are bundled into a trunk interface and connected to the
corresponding interfaces on CX-B. One of the GE interfaces, however, is incorrectly
connected to the interface on CX-C. The trunk interface cannot effectively detect the fault and
still sends data to CX-C.
If LACP is enabled on ATN A, CX-B, and CX-C, and ATN A is configured with an LACP
priority higher than that of CX-B, after the LACP negotiation, data can be correctly sent from
ATN A to CX-B.
Trunk
CX-C
Basic Concepts
l Link aggregation
Link aggregation bundles a group of physical interfaces into a logical interface to
increase bandwidth and improve reliability.
l Link aggregation group
A link aggregation group (LAG), also called a trunk link, is a logical link formed by
bundling several physical links.
If all bundled links are Ethernet links, the LAG is called an Ethernet LAG or an Eth-
Trunk link. The LAG interface is called an Eth-Trunk interface, and Ethernet interfaces
that constitute an Eth-Trunk interface are called member interfaces.
An Eth-Trunk interface can be considered as a single Ethernet interface. The only
difference is that an Eth-Trunk interface must select one or more member Ethernet
interfaces before forwarding data. You can configure features on an Eth-Trunk interface
the same way as on a single Ethernet interface, except for some features that take effect
only on physical Ethernet interfaces.
NOTE
An Eth-Trunk member interface cannot be added to another Eth-Trunk interface.
l Active and inactive interfaces
Member interfaces can be active or inactive. Active interfaces forward data, whereas
inactive interfaces cannot.
Links connected to active interfaces are called active links, and links connected to
inactive interfaces are called inactive links.
To enhance link reliability, a backup link is used. Interfaces on the two ends of the
backup link are inactive. The inactive interfaces become active only when the active
interfaces fail.
l Maximum number of active member interfaces
If the number of active member interfaces reaches this threshold, additional member
interfaces cannot become active.
l Minimum number of active member interfaces
The minimum number of active member interfaces is specified to ensure Eth-Trunk
interface bandwidth. This threshold prevents data loss during transmission when the
number of active interfaces is insufficient.
If the number of active member interfaces falls below this threshold, the Eth-Trunk
interface goes Down, and all member interfaces of the Eth-Trunk interface stop
forwarding data.
l System LACP priority
A system LACP priority is set to prioritize the devices at both ends. In static LACP
mode, the active interfaces selected by devices must be consistent on both ends;
otherwise, the LAG cannot be set up. To ensure consistency, you can set a higher system
LACP priority for one end. Then, the other end selects the active interfaces based on the
end with a higher system LACP priority.
A smaller system LACP priority value indicates a higher system LACP priority. The
default system LACP priority value is 32768.
l Interface LACP priority
An interface LACP priority determines the likelihood that an interface can be selected as
an active interface. Interfaces with higher priorities are selected as active interfaces. A
smaller interface LACP priority value indicates a higher interface LACP priority.
l M:N backup
Link aggregation in static LACP uses LACPDUs to negotiate active link selection. This
mode is also called M:N mode where M indicates the number of active links and N
indicates the number of backup links. This mode improves link reliability and allows
load balancing to be performed across M active links.
In Figure 5-37, M+N links with the same attributes (in the same LAG) are set up
between two devices. When data is transmitted over the aggregated link, load balancing
is performed on the M active links; no data is transmitted over the N backup links.
Therefore, the actual bandwidth of the aggregated link is the sum of the M links'
bandwidth, and the maximum bandwidth of the aggregated link is the sum of the M+N
links' bandwidth.
If one of the M links fails, LACP selects a link from the N backup links to take over the
traffic. In such a situation, the actual bandwidth of the aggregated link is still the sum of
M links' bandwidth, but the maximum bandwidth of the aggregated link becomes the
sum of the M+N-1 links' bandwidth.
Primary link
Backup link
M:N backup applies when the bandwidth of M links needs to be provided and link
redundancy is required. If an active link fails, the system automatically selects the
backup link with the highest priority and adds it to the current LAG.
If no backup link is available and the number of Up member links is less than the lower
threshold for the number of Up links, the device shuts down the trunk interface.
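The M:N behavior described above can be sketched as a selection over the Up member links. This is a simplified model under stated assumptions: the link names and priority values are hypothetical, ties are broken by link name as a stand-in for the interface ID, and the function names are not taken from the product.

```python
from typing import List, Tuple

Link = Tuple[str, int, bool]  # (name, LACP priority value, is Up)


def select_active(links: List[Link], max_active: int) -> List[str]:
    """Pick up to max_active Up links; a smaller priority value wins."""
    up_links = sorted((l for l in links if l[2]), key=lambda l: (l[1], l[0]))
    return [name for name, _, _ in up_links[:max_active]]


def trunk_state(links: List[Link], max_active: int, min_active: int):
    """Return ("Up", active links) or ("Down", []) if too few links are Up."""
    active = select_active(links, max_active)
    if len(active) < min_active:
        return ("Down", [])
    return ("Up", active)


# 2:1 backup: two active links and one backup link.
normal = [("GE1", 100, True), ("GE2", 100, True), ("GE3", 200, True)]
one_failed = [("GE1", 100, False), ("GE2", 100, True), ("GE3", 200, True)]
two_failed = [("GE1", 100, False), ("GE2", 100, False), ("GE3", 200, True)]
```

With max_active=2 and min_active=2, the backup link GE3 takes over when GE1 fails, and the trunk interface goes Down once a second link fails and no backup remains.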
The manual 1:1 master/backup mode is used when the peer device does not support
LACP.
l Static LACP mode
In static LACP mode, you must also manually create a trunk interface and add member
interfaces to it. Compared with link aggregation in manual load balancing mode, active
interfaces in static LACP mode are selected through exchange of Link Aggregation
Control Protocol Data Units (LACPDUs). To be specific, when a group of interfaces are
added into a trunk interface, the status of each member interface (active or inactive)
depends on LACP negotiation.
Table 5-10 shows a comparison between manual load balancing and static LACP modes.
Table 5-10 Comparison between the manual load balancing and static LACP modes
ATNA CX-B
In this mode, load balancing is implemented among all member interfaces. The ATN supports
the following load balancing types:
The first two types apply to per-destination load balancing, and the third type applies to
per-packet load balancing.
NOTE
In manual load balancing mode, member interfaces on different boards can be added into the same Eth-
Trunk interface.
LACPDU format (fields):
Destination Address, Source Address, Length/Type, Subtype=LACP, Version Number
TLV_type=Actor Information, Actor_Information_Length=20, Actor_System_Priority, Actor_System, Actor_Key, Actor_Port_Priority, Actor_Port, Actor_State, Reserved
TLV_type=Partner Information, Partner_Information_Length=20, Partner_System_Priority, Partner_System, Partner_Key, Partner_Port_Priority, Partner_Port, Partner_State, Reserved
TLV_type=Collector Information, Collector_Information_Length=16, CollectorMaxDelay, Reserved
TLV_type=Terminator, Terminator_Length=0, Reserved
FCS
b. Devices at both ends determine the Actor based on the system LACP priority and
system ID.
In Figure 5-41, devices at both ends receive LACPDUs from each other. Use ATN
B as an example. When ATN B receives LACPDUs from CX-A, ATN B checks and
records information about CX-A and compares system priorities. If the system
priority of CX-A is higher than that of ATN B, CX-A acts as the Actor and ATN B
selects the active interfaces based on the priorities of the corresponding interfaces
on CX-A. In this manner, active interfaces of both devices are determined.
c. Devices at both ends determine active interfaces based on the Actor's LACP
priorities and interface IDs.
In Figure 5-42, after devices at both ends determine the Actor, they select active
interfaces according to the priorities of the Actor's interfaces.
Then active interfaces are selected, active links in the LAG are specified, and load
balancing is implemented across these active links.
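The two selection steps above can be sketched as follows. This is a minimal illustration, assuming that a smaller priority value means a higher priority and that the system ID breaks ties; the dictionary layout, the priority values, and the `max_active` limit are hypothetical.

```python
def select_actor(device_a, device_b):
    """Pick the Actor: smaller system priority value wins, then smaller system ID."""
    return min(device_a, device_b,
               key=lambda dev: (dev["system_priority"], dev["system_id"]))


def select_active_interfaces(actor, max_active):
    """Rank the Actor's interfaces by (LACP priority value, interface ID)."""
    ranked = sorted(actor["ports"], key=lambda port: (port["priority"], port["id"]))
    return [port["id"] for port in ranked[:max_active]]


# Illustrative values only; system IDs are MAC addresses written as integers.
cx_a = {"name": "CX-A", "system_priority": 100, "system_id": 0x00E0FC000002,
        "ports": [{"id": 1, "priority": 32768},
                  {"id": 2, "priority": 100},
                  {"id": 3, "priority": 100}]}
atn_b = {"name": "ATN B", "system_priority": 32768, "system_id": 0x00E0FC000001,
         "ports": [{"id": 1, "priority": 32768},
                   {"id": 2, "priority": 32768},
                   {"id": 3, "priority": 32768}]}

actor = select_actor(cx_a, atn_b)
active_ids = select_active_interfaces(actor, max_active=2)
print(actor["name"], active_ids)
```

In this sketch the peer end would then activate the interfaces that are directly connected to the Actor's active interfaces.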
5.3.2.7 E-Trunk
Enhanced Trunk (E-Trunk) controls and implements link aggregation among multiple devices.
E-Trunk implements device-level link reliability, instead of board-level link reliability.
NOTE
Only the ATN 950B (AND2CXPB/AND2CXPE) supports this feature.
Basic Concepts
l E-Trunk ID
An E-Trunk ID is an integer that uniquely identifies an E-Trunk link.
l Eth-Trunk ID
An Eth-Trunk ID is an integer that uniquely identifies an Eth-Trunk link.
l E-Trunk priority
E-Trunk priorities determine the master/backup status of two devices in an aggregation
group. In Figure 5-44, PE1 has a higher E-Trunk priority than PE2. Therefore PE1 is the
master device, and PE2 is the backup device. A smaller value indicates a higher E-Trunk
priority.
l E-Trunk system ID
E-Trunk system IDs determine the master/backup status of two devices in an aggregation
group when they have the same E-Trunk priority. An E-Trunk system ID is the MAC
address of an Ethernet interface on a main control board. The device with a smaller
MAC address has a higher priority.
l LACP E-Trunk system priority
The Link Aggregation Control Protocol (LACP) system priority of an E-Trunk member
interface (Eth-Trunk interface) is called LACP E-Trunk system priority.
If an E-Trunk consists of Eth-Trunk interfaces in static LACP mode, LACP E-Trunk
system priorities determine the LACP Actor from the two ends of an Eth-Trunk link. The
end with a smaller LACP E-Trunk system priority value functions as the Actor. The
Actor selects active interfaces from its local Eth-Trunk member interfaces, and then the
other end selects its local Eth-Trunk member interfaces that are directly connected to the
active interfaces of the Actor as active interfaces.
NOTE
l LACP E-Trunk system priorities apply to E-Trunks that consist of Eth-Trunk interfaces in
static LACP mode.
l LACP system priorities apply to Eth-Trunk interfaces in static LACP mode.
l LACP E-Trunk system priorities and LACP system priorities are configurable. If both of them
are configured and Eth-Trunk interfaces in static LACP mode are added to an E-Trunk, the
LACP E-Trunk system priority is used.
l LACP E-Trunk system ID
The LACP system ID of an E-Trunk member interface (Eth-Trunk interface) is called
LACP E-Trunk system ID.
If two devices have the same LACP E-Trunk system priority, the device with the smaller
LACP E-Trunk system ID has a higher priority.
NOTE
l LACP E-Trunk system IDs apply to E-Trunks that consist of Eth-Trunk interfaces in static
LACP mode.
l LACP system IDs apply to Eth-Trunk interfaces in static LACP mode.
l LACP E-Trunk system IDs are configurable, whereas LACP system IDs are not because
LACP system IDs are the MAC addresses of the Ethernet interfaces on main control boards.
In E-Trunk, to enable a CE to consider the remote PEs as a single device, you must
configure the same LACP E-Trunk system priority and system ID for the PEs. In Figure
5-44, the LACP E-Trunk system ID is in the format of a MAC address.
l Eth-Trunk working mode
The Eth-Trunk working mode is the mode in which an Eth-Trunk interface works as a
member interface of an E-Trunk. Eth-Trunk interfaces that are added to an E-Trunk can
work in any of the following modes:
– Automatic
– Forcible master
– Forcible backup
l Timeout period
Normally, the master and backup devices in an E-Trunk periodically exchange Hello
messages. If the backup device does not receive any Hello message within the timeout
period, it becomes the master device.
The timeout period is calculated as follows: Timeout period = Interval at which
Hello messages are sent × Time multiplier.
If the time multiplier is 3, the backup device becomes the master device if it does not
receive any Hello message within three consecutive sending intervals.
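The master/backup election and the timeout rule described in the concepts above can be sketched as follows. This is a hedged illustration only: the function names, the dictionary layout, and all numeric values are assumptions, and the time unit is left abstract.

```python
def elect_master(pe_a, pe_b):
    """Smaller E-Trunk priority value wins; the smaller system ID breaks a tie."""
    return min(pe_a, pe_b, key=lambda pe: (pe["priority"], pe["system_id"]))


def timeout_period(hello_interval, multiplier):
    """Timeout period = interval at which Hello messages are sent x multiplier."""
    return hello_interval * multiplier


def backup_takes_over(time_since_last_hello, hello_interval, multiplier):
    """The backup becomes master if no Hello arrived within the timeout period."""
    return time_since_last_hello >= timeout_period(hello_interval, multiplier)


# Illustrative values; system IDs are MAC addresses written as integers.
pe1 = {"name": "PE1", "priority": 10, "system_id": 0x00E0FC000002}
pe2 = {"name": "PE2", "priority": 20, "system_id": 0x00E0FC000001}

master = elect_master(pe1, pe2)
tie_winner = elect_master(dict(pe1, priority=20), pe2)
print(master["name"], tie_winner["name"], timeout_period(10, 3))
```

With equal priorities, the device with the smaller system ID (PE2 in this example) would be elected.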
Figure 5-44 (labels: CE, PE1, PE2, Eth-Trunk 1, Eth-Trunk 10, E-Trunk 1)
NOTE
When you configure IP addresses for Eth-Trunk interfaces connecting the CE and PEs to
transmit Layer 3 services, the PE Eth-Trunk interface configurations must meet the
following requirements:
l The Eth-Trunk interfaces must have IP addresses residing on the same network
segment.
In most cases, the master device advertises the direct route to its Eth-Trunk interface,
and the backup device does not. After a master/backup device switchover is complete,
the new master device (former backup device) advertises the direct route to its Eth-
Trunk interface.
l The same MAC address must be configured for the PE Eth-Trunk interfaces.
This spares the CE from spending a long time updating its ARP entries when a
master/backup device switchover is performed and therefore ensures uninterrupted
service forwarding.
Few scenarios require configuring IP addresses for the Eth-Trunk interfaces connecting the
CE and PEs to transmit Layer 3 services and then adding the PE Eth-Trunk interfaces to an
E-Trunk.
n PE end
The same Eth-Trunk and E-Trunk interfaces are created on PE1 and PE2. In
addition, the Eth-Trunk interfaces are added to the E-Trunk group.
n CE end
Eth-Trunk interfaces in static LACP mode are configured on the CE. By using
the Eth-Trunk interfaces, the CE is connected to PE1 and PE2.
The E-Trunk group is invisible to the CE.
i. E-Trunk master/backup status
PE1 and PE2 negotiate the E-Trunk master/backup status by exchanging E-
Trunk packets. Normally, after the negotiation one PE functions as the master
and the other as the backup.
The master/backup status of a PE depends on the E-Trunk priority and E-Trunk
system ID carried in E-Trunk packets. The smaller the E-Trunk priority value,
the higher the E-Trunk priority. The PE with the higher E-Trunk priority
functions as the master. If the E-Trunk priorities of the PEs are the same, the
PE with the smaller E-Trunk system ID functions as the master.
ii. Master/backup status of a member Eth-Trunk interface in the E-Trunk group
The master/backup status of a member Eth-Trunk interface in the E-Trunk
group is determined by its E-Trunk status and the peer Eth-Trunk interface
status.
As shown in Figure 5-44, PE1 and PE2 are on the two ends of the E-Trunk
link. PE1 is considered as the local end and PE2 as the peer end.
The status of each member Eth-Trunk interface in the E-Trunk group is
determined, as shown in Table 5-11.
Table 5-11 Master/backup status of an E-Trunk group and its member Eth-
Trunk interfaces
Status of the Local E-Trunk | Working Mode of the Local Eth-Trunk Interface | Status of the Peer Eth-Trunk Interface | Status of the Local Eth-Trunk Interface
In normal situations:
○ If PE1 functions as the master, Eth-Trunk 10 of PE1 functions as the
master, and its link status is Up.
○ If PE2 functions as the backup, Eth-Trunk 10 of PE2 functions as the
backup, and its link status is Down.
If the link between the CE and PE1 fails, the following situations occur:
1) PE1 sends an E-Trunk packet containing information about the faulty
Eth-Trunk 10 of PE1 to PE2.
2) After receiving the E-Trunk packet, PE2 finds that Eth-Trunk 10 on the
peer is faulty. Then, the status of Eth-Trunk 10 on PE2 becomes master.
Through the LACP negotiation, the status of Eth-Trunk 10 on PE2
becomes Up.
The Eth-Trunk status on PE2 becomes Up, and traffic of the CE is
forwarded through PE2. In this way, traffic destined for the peer CE is
protected.
If PE1 is faulty, the following situations occur:
1) If the PEs are configured with BFD, PE2 detects that the BFD session
status becomes Down. PE2 then functions as the master, and Eth-Trunk 10 of
PE2 functions as the master.
2) If the PEs are not configured with BFD, PE2 will not receive any E-Trunk
packet from PE1 before its timeout period runs out, after which PE2 will
function as the master and Eth-Trunk 10 of PE2 will function as the
master.
Through the LACP negotiation, the status of Eth-Trunk 10 on PE2
becomes Up. The traffic of the CE is forwarded through PE2. In this way,
traffic destined for the peer CE is protected.
– Sending and receiving of E-Trunk packets
E-Trunk packets carrying the source IP address and port number configured on the
local end are sent through UDP. Factors triggering the sending of E-Trunk packets
are as follows:
n The sending timer times out.
n The configurations change. For example, the E-Trunk priority, packet sending
period, timeout period multiplier, addition/deletion of a member Eth-Trunk
interface, or source/destination IP address of the E-Trunk group changes.
n A member Eth-Trunk interface fails or recovers.
E-Trunk packets contain the timeout period to be used as the timeout period for the
peer.
– BFD fast detection
A device cannot quickly detect a fault on its peer based on the timeout period of
received packets. In this case, BFD can be configured on the device. The peer end
needs to be configured with an IP address. After a BFD session is established to
detect whether the route destined for the peer is reachable, E-Trunk can sense any
fault detected by BFD.
– Switchback mechanism
The local device is in master state. In such a situation, if the physical status of the
Eth-Trunk interface on the local device goes Down or the local device fails, the peer
device becomes the master and the physical status of the member Eth-Trunk
interface becomes Up.
When the local end recovers, the local end needs to function as the master.
Therefore, the local Eth-Trunk interface enters the LACP negotiation state. After
being informed by LACP that the negotiation ability is Up, the local device starts
the switchback delay timer. After the switchback delay timer times out, the local
Eth-Trunk interface becomes the master. After LACP negotiation, the Eth-Trunk
interface becomes Up.
l Eth-Trunk interfaces in manual load balancing mode are added to an E-Trunk.
– Master/backup status negotiation
As shown in Figure 5-45, the CE is directly connected to PE1 and PE2, and E-
Trunk runs between PE1 and PE2.
Figure 5-45 (labels: CE, PE1, PE2, Eth-Trunk 1, Eth-Trunk 10, E-Trunk 1)
n PE end
The same Eth-Trunk and E-Trunk interfaces are created on PE1 and PE2. In
addition, the Eth-Trunk interfaces are added to the E-Trunk group.
i. E-Trunk master/backup status
PE1 and PE2 negotiate the E-Trunk master/backup status by exchanging E-
Trunk packets. Normally, after the negotiation one PE functions as the master
and the other as the backup.
The master/backup status of a PE depends on the E-Trunk priority and E-Trunk
system ID carried in E-Trunk packets. The smaller the E-Trunk priority value,
the higher the E-Trunk priority. The PE with the higher E-Trunk priority
functions as the master. If the E-Trunk priorities of the PEs are the same, the
PE with the smaller E-Trunk system ID functions as the master.
ii. Master/backup status of a member Eth-Trunk interface in the E-Trunk group
The master/backup status of a member Eth-Trunk interface in the E-Trunk
group is determined by its E-Trunk status and the peer Eth-Trunk interface
status.
As shown in Figure 5-45, PE1 and PE2 are on the two ends of the E-Trunk
link. PE1 is considered as the local end and PE2 as the peer end.
The status of each member Eth-Trunk interface in the E-Trunk group is
determined, as shown in Table 5-12.
Table 5-12 Master/backup status of an E-Trunk group and its member Eth-
Trunk interfaces
Status of the Local E-Trunk | Working Mode of the Local Eth-Trunk Interface | Status of the Peer Eth-Trunk Interface | Status of the Local Eth-Trunk Interface
In normal situations:
○ If PE1 functions as the master, Eth-Trunk 10 of PE1 functions as the
master, and its link status is Up.
○ If PE2 functions as the backup, Eth-Trunk 10 of PE2 functions as the
backup, and its link status is Down.
If the link between the CE and PE1 fails, the following situations occur:
1) PE1 sends an E-Trunk packet containing information about the faulty
Eth-Trunk 10 of PE1 to PE2.
2) After receiving the E-Trunk packet, PE2 finds that Eth-Trunk 10 on the
peer is faulty. Then, the status of Eth-Trunk 10 on PE2 becomes master.
Through the Trunk negotiation, the status of Eth-Trunk 10 on PE2
becomes Up.
The Eth-Trunk status on PE2 becomes Up, and traffic of the CE is
forwarded through PE2. In this way, traffic destined for the peer CE is
protected.
If PE1 is faulty, the following situations occur:
1) If the PEs are configured with BFD, PE2 detects that the BFD session
status becomes Down. PE2 then functions as the master, and Eth-Trunk 10 of
PE2 functions as the master.
2) If the PEs are not configured with BFD, PE2 will not receive any E-Trunk
packet from PE1 before its timeout period runs out, after which PE2 will
function as the master and Eth-Trunk 10 of PE2 will function as the
master.
Through the Trunk negotiation, the status of Eth-Trunk 10 on PE2
becomes Up. The traffic of the CE is forwarded through PE2. In this way,
traffic destined for the peer CE is protected.
– Sending and receiving of E-Trunk packets
E-Trunk packets carrying the source IP address and port number configured on the
local end are sent through UDP. Factors triggering the sending of E-Trunk packets
are as follows:
E-Trunk Restrictions
To improve the reliability of CE and PE links, and to ensure that traffic can be automatically
switched between these links, the configurations on both ends of the E-Trunk link must be
consistent. Use the networking in Figure 5-44 as an example.
l The Eth-Trunk link directly connecting PE1 to the CE and the Eth-Trunk link directly
connecting PE2 to the CE must be configured with the same working rate and duplex
mode. This ensures that both Eth-Trunk interfaces have the same key and join the same
E-Trunk group.
l Proper IP addresses must be specified for the two PEs to ensure Layer 3 connectivity.
The address of the local PE is the peer address of the peer PE, and the address of the peer
PE is the peer address of the local PE. Here, it is recommended that the addresses of the
PEs are configured as loopback interface addresses.
l The two PEs must be configured with the same security key (if necessary).
5.3.3.1 Eth-Trunk
In Figure 5-46, an Eth-Trunk link is established between ATN A and CX-B, and two
full-duplex GE interfaces are added to the Eth-Trunk interface. The total bandwidth of the
Eth-Trunk interface is twice that of each GE interface.
Figure 5-46 (labels: Eth-Trunk1, 10.1.1.2/24)
An Eth-Trunk interface improves link reliability. If one Eth-Trunk member link fails, traffic
switches to the other member link.
An Eth-Trunk interface also alleviates network congestion, because the Eth-Trunk interface
balances its traffic between two member links.
Figure (labels: Core Network, PE-AGG, Eth-Trunk 1, UPE)
You can set a working mode for the Eth-Trunk interfaces as required:
l If devices at both ends of the Eth-Trunk link support LACP, set the Eth-Trunk interfaces
to work in static LACP mode.
l If the device at either end of the Eth-Trunk does not support LACP, set the Eth-Trunk
interfaces to work in manual load balancing mode.
After Eth-Trunk 1 is created, you can implement QoS on the logical interface in the same
way as on a common interface.
You can also implement traffic shaping, congestion management, and congestion avoidance
for outgoing traffic on Eth-Trunk 1 on both the UPE and PE-AGG. These configurations ensure
that packets of high priorities are preferentially sent.
Service Overview
Eth-Trunk implements link reliability between single devices. However, if a device fails, Eth-
Trunk ceases to take effect.
To improve network reliability, carriers introduced the device redundancy method that
requires master and backup devices. If the master device or primary link fails, the backup
device can take over user services. In this situation, another device must be dual-homed to the
master and backup devices, and inter-device link reliability must be ensured.
Networking Description
In Figure 5-48, the customer edge (CE) is dual-homed to the virtual private LAN service
(VPLS) network, and Eth-Trunk is deployed on the CE and provider edges (PEs) to
implement link reliability.
In normal situations, the CE communicates with remote devices on the VPLS network
through PE1. If PE1 or the link between the CE and PE1 fails, the CE cannot communicate
with PE1. To ensure that services are not interrupted, deploy E-Trunk on PE1 and PE2. If PE1
or the link between the CE and PE1 fails, traffic is switched to PE2. The CE then continues to
communicate with remote devices on the VPLS network through PE2. If PE1 or the link
between the CE and PE1 recovers, traffic is switched back to PE1. E-Trunk provides backup
between Eth-Trunk links of the PEs, improving device-level reliability.
Figure 5-48 (labels: Eth-Trunk 10, E-Trunk 1, P, PE2, PE3; LAG: link aggregation)
5.4 STP/RSTP/MSTP
5.4.1 Introduction
Definition
Redundant links are generally used on an Ethernet switching network to provide link backup
and enhance network reliability. However, the use of redundant links may produce loops,
causing broadcast storms and rendering the MAC address table unstable. As a result,
communication quality deteriorates, and communication services may be interrupted. The
Spanning Tree Protocol (STP) solves this problem.
STP has a narrow sense and a broad sense:
l STP, in a narrow sense, refers only to STP as defined in IEEE 802.1D.
l STP, in a broad sense, refers to STP as defined in IEEE 802.1D, the Rapid
Spanning Tree Protocol (RSTP) defined in IEEE 802.1w, and the Multiple Spanning
Tree Protocol (MSTP) defined in IEEE 802.1s.
The following spanning tree protocols are defined:
l STP
IEEE 802.1D, issued in 1998, defines STP.
STP, a management protocol at the data link layer, is used to detect and prevent loops on
a Layer 2 network. STP blocks redundant links on a Layer 2 network and trims a
network into a loop-free tree topology.
The STP topology, however, converges slowly. Even an edge port cannot
change to the Forwarding state until twice the amount of time specified by the Forward
Delay timer elapses. The default value of the Forward Delay timer is 15 seconds.
l RSTP
IEEE 802.1w, issued in 2001, defines RSTP.
RSTP, as an enhancement of STP, achieves fast convergence of the network topology.
Both RSTP and STP have one defect: All the Virtual Local Area Networks (VLANs) in a
LAN share the same spanning tree. As a result, data traffic from different VLANs cannot
be balanced. Even worse, packets in some VLANs cannot be forwarded.
RSTP is backward compatible with STP and can be used together with STP on a
network.
l MSTP
IEEE 802.1s, issued in 2002, defines MSTP.
MSTP defines a VLAN mapping table in which VLANs are associated with multiple
spanning tree instances (MSTIs). In addition, MSTP divides a switching network into
multiple regions, each of which has multiple independent MSTIs. In this manner, the
entire network is trimmed into a loop-free tree topology, and replication and circular
propagation of packets and broadcast storms are prevented on the network. MSTP also
provides multiple redundant paths to load-balance VLAN traffic.
Table 5-13 provides a comparison between these spanning tree protocols.
Purpose
After a spanning tree protocol is configured on an Ethernet switching network, it calculates
the network topology and implements the following functions to remove network loops:
l Loop cut-off: The potential loops on the network are cut off by blocking redundant links.
l Link redundancy: If an active path becomes faulty, a redundant link can be activated to
ensure network connectivity.
5.4.2.1 Background
STP is used to prevent loops in the LAN. The switching devices running STP discover loops
on the network by exchanging information with one another, and block certain interfaces to
cut off loops. Along with the growth of the LAN scale, STP has become an important
protocol for the LAN.
Figure 5-49 (labels: Host A, Host B, S1 and S2 each with port1 and port2; data flow numbered 1 to 5)
On the network shown in Figure 5-49, the following situations may occur:
l Broadcast storms render the network unavailable.
It is known that loops lead to broadcast storms. In Figure 5-49, STP is not enabled on
the switches S1 and S2. If Host A broadcasts a request, the request is received by port 1
and forwarded by port 2 on both S1 and S2. S1's port 2 then receives the request from
S2's port 2 and forwards the request from S1's port 1. Similarly, S2's port 2 receives the
request from S1's port 2 and forwards the request from S2's port 1. As such transmission
repeats, resources on the entire network are exhausted, and the network becomes
unavailable.
l Flapping of MAC address tables damages MAC address entries.
As shown in Figure 5-49, even the MAC address entry updates triggered by received
unicast packets can damage the MAC address table.
Assume that no broadcast storm occurs on the network. Host A unicasts a packet to Host
B. If Host B is temporarily removed from the network at this time, the MAC address
entries of Host B on S1 and S2 are deleted. The packet unicast by Host A to Host B is
received by port 1 on S1. S1, however, does not have associated MAC address entries.
Therefore, the unicast packet is forwarded to port 1 and port 2. Then, port 2 on S2
receives the unicast packet from port 2 on S1 and sends it out through port 1. As such
transmission repeats, port 1 and port 2 on S1 and S2 continuously receive unicast packets
from Host A. Therefore, S1 and S2 modify the MAC address entries continuously,
causing the MAC address table to flap. As a result, MAC address entries are damaged.
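The looping behavior described above can be reproduced with a toy model. This is a simplified sketch, not device behavior: it assumes that S1 and S2 each attach port 1 to Host A's segment and port 2 to Host B's segment, and that a blocked port neither receives nor sends user traffic. All names are hypothetical.

```python
SEGMENT_OF_PORT = {1: "A", 2: "B"}  # port 1 faces Host A, port 2 faces Host B
SWITCHES = ("S1", "S2")


def flood(blocked_ports=frozenset(), rounds=5):
    """Return the number of copies of one broadcast in flight after each round.

    blocked_ports holds (switch, port) pairs that carry no user traffic,
    which is the effect of an STP-blocked port.
    """
    in_flight = [("A", None)]  # (segment, sending switch); Host A sends first
    history = []
    for _ in range(rounds):
        next_round = []
        for segment, sender in in_flight:
            for switch in SWITCHES:
                if switch == sender:
                    continue  # a switch does not receive its own frame
                in_port = 1 if segment == "A" else 2
                out_port = 2 if in_port == 1 else 1
                if (switch, in_port) in blocked_ports:
                    continue  # frame dropped on a blocked receiving port
                if (switch, out_port) in blocked_ports:
                    continue  # frame not flooded out of a blocked port
                next_round.append((SEGMENT_OF_PORT[out_port], switch))
        in_flight = next_round
        history.append(len(in_flight))
    return history


without_stp = flood()
with_blocking = flood(blocked_ports=frozenset({("S2", 2)}))
print(without_stp)    # the copies never die out
print(with_blocking)  # blocking one port cuts the loop
```

Without a blocked port, the copies of the broadcast keep circulating between the two segments; blocking a single port is enough to stop them in this model.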
Basic Design
STP runs at the data link layer. The devices running STP discover loops on the network by
exchanging information with each other and trim the ring topology into a loop-free tree
topology by blocking a certain interface. In this manner, replication and circular propagation
of packets are prevented on the network. In addition, STP prevents the processing
performance of network devices from deteriorating.
The devices running STP usually communicate with each other by exchanging configuration
Bridge Protocol Data Units (configuration BPDUs). BPDUs are classified into two types:
l Configuration BPDU: used to calculate a spanning tree and maintain the spanning tree
topology.
l Topology Change Notification BPDU (TCN BPDU): used by a downstream device to
inform upstream devices of a topology change.
NOTE
Configuration BPDUs contain sufficient information for devices to calculate the spanning tree. They
contain the following information:
l Root bridge ID: is composed of a root bridge priority and the root bridge's MAC address. Each
STP network has only one root bridge.
l Cost of the root path: indicates the cost of the shortest path to the root bridge.
l ID of a designated bridge: is composed of a bridge priority and a MAC address.
l ID of a designated port: is composed of a port priority and a port number.
l Message Age: sets the lifetime of a BPDU on the network.
l Max Age: sets the maximum time a BPDU is saved.
l Hello Time: sets the interval at which BPDUs are sent.
l Forward Delay: indicates the time interface status transition takes.
On an STP-capable network, the device with the smallest BID is selected to be the
root bridge.
– PID
The PID is composed of a 4-bit port priority and a 12-bit port number. The port
priority occupies the leftmost 4 bits, and the port number occupies the remaining
12 bits on the right.
The PID is used to select the designated port.
NOTE
The port priority affects the role of a port in a specified spanning tree instance. For details,
see 5.4.2.4 STP Topology Calculation.
l Path cost
The path cost is a port variable and is used to select a link. STP calculates the path cost
to select a robust link and blocks redundant links to trim the network into a loop-free tree
topology.
On an STP-capable network, the accumulative cost of the path from a certain port to the
root bridge is the sum of the costs of all the segment paths into which the path is
separated by the ports on the transit bridges.
Table 5-14 shows the path costs defined in IEEE 802.1t. Different device manufacturers
use different path cost standards.
NOTE
The rate of an aggregated link is the sum of the rates of all Up member links in the aggregated
group.
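The PID layout and the accumulation of path costs described above can be sketched as follows. This is an illustration under stated assumptions: the priority is given as the raw 4-bit field value (0 to 15), and the function names and example costs are hypothetical.

```python
def make_pid(port_priority, port_number):
    """Pack a 4-bit port priority and a 12-bit port number into a 16-bit PID."""
    if not 0 <= port_priority <= 0xF:
        raise ValueError("port priority must fit in 4 bits")
    if not 0 <= port_number <= 0xFFF:
        raise ValueError("port number must fit in 12 bits")
    return (port_priority << 12) | port_number  # priority in the leftmost 4 bits


def split_pid(pid):
    """Return (port_priority, port_number) from a 16-bit PID."""
    return pid >> 12, pid & 0xFFF


def root_path_cost(segment_costs):
    """Accumulated cost to the root bridge: the sum of the per-segment costs."""
    return sum(segment_costs)


pid = make_pid(port_priority=8, port_number=1)
print(hex(pid), split_pid(pid), root_path_cost([100, 99]))
```

Because the priority sits in the most significant bits, comparing two PIDs as plain integers compares the priorities first and the port numbers second.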
Three Elements
There are generally three elements used when a ring topology is to be trimmed into a tree
topology: root bridge, root port, and designated port. Figure 5-50 shows the three elements.
Figure 5-50 (labels: root bridge S1, S2, S3, S4; ports A and B on each device, annotated with path cost (PC) and root path cost (RPC) values)
l Root bridge
The root bridge is the bridge with the smallest BID. The smallest BID is discovered by
exchanging configuration BPDUs.
l Root port
The root port is the port with the smallest root path cost to the root bridge. The root
port is determined based on the path cost. Among all STP-capable ports on a network bridge,
the port with the smallest root path cost is the root port. There is only one root port on an
STP-capable device, but there is no root port on the root bridge.
l Designated port
For description of the designated bridge and designated port, see Table 5-15.
As shown in Figure 5-51, AP1 and AP2 reside on S1; BP1 and BP2 reside on S2; CP1
and CP2 reside on S3.
– S1 sends configuration BPDUs to S2 through AP1. S1 is the designated bridge of
S2, and AP1 on S1 is the designated port.
– Two devices, S2 and S3, are connected to the LAN. If S2 is responsible for
forwarding configuration BPDUs to the LAN, S2 is the designated bridge of the
LAN and BP2 on S2 is the designated port.
Figure 5-51 Networking diagram of the designated bridge and designated port
After the root bridge, root port, and designated port are selected successfully, the entire tree
topology is set up. If the topology is stable, only the root port and the designated port forward
traffic. All the other ports are in the Blocking state and receive only STP protocol packets
instead of forwarding user traffic.
Root BID | Each STP-capable network has only one root bridge.
Root path cost | Cost of the path from the port sending configuration BPDUs to the root bridge.
After a device on the STP-capable network receives configuration BPDUs, it compares the
fields shown in Table 5-16 with those of its own configuration BPDUs. The four
comparison principles are as follows:
NOTE
During the STP calculation, the smaller the value, the higher the priority.
l Smallest BID: used to select the root bridge. Devices running STP select the smallest
BID as the root BID shown in Table 5-16.
l Smallest root path cost: used to select the root port on a non-root bridge. On the root
bridge, the root path cost of each port is 0.
l Smallest sender BID: used to select the root port between two ports that have the same
root path cost. The port that receives BPDUs from the sender with the smaller BID is
selected as the root port in STP calculation. Assume that the BID of S2 is smaller than
that of S3 in Figure 5-50. If the path costs in the BPDUs received by port A and port B
on S4 are the same, port B becomes the root port.
l Smallest PID: used to block the port with a greater PID but not the port with a smaller
PID when the ports have the same path cost. The PIDs are compared in the scenario
shown in Figure 5-52. The PID of port A on S1 is smaller than that of port B. In the
BPDUs that are received on port A and port B, the path costs and BIDs of the sending
devices are the same. Therefore, port B with a greater PID is blocked to cut off loops.
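The four comparison principles amount to a field-by-field comparison in which the smaller value wins. A minimal sketch follows, assuming each BID is modeled as a (priority, MAC address) pair and that the root path cost already includes the cost of the receiving port; the `Bpdu` type and all values are hypothetical.

```python
from typing import NamedTuple, Tuple


class Bpdu(NamedTuple):
    """Fields compared in order; tuples compare field by field, smaller wins."""
    root_bid: Tuple[int, int]    # (bridge priority, MAC address as integer)
    root_path_cost: int
    sender_bid: Tuple[int, int]
    sender_pid: int


def best_port(bpdus_by_port):
    """Return the port whose BPDU is the best (smallest) one."""
    return min(bpdus_by_port, key=lambda port: bpdus_by_port[port])


root = (0, 0x00E0FC000001)
s2 = (32768, 0x00E0FC000002)  # smaller BID than S3
s3 = (32768, 0x00E0FC000003)

# S4 receives equal root path costs on port A (from S3) and port B (from S2).
root_port_on_s4 = best_port({
    "A": Bpdu(root, 300, s3, 0x8001),
    "B": Bpdu(root, 300, s2, 0x8001),
})

# Same sender and cost on both ports: the smaller sender PID decides.
kept_port = best_port({
    "A": Bpdu(root, 100, s2, 0x8001),
    "B": Bpdu(root, 100, s2, 0x8002),
})
print(root_port_on_s4, kept_port)
```

In the second case, the port that receives the BPDU with the greater PID (port B) is the one that would be blocked.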
Figure 5-52 (labels: S1, S2, port A, port B; legend: designated port, blocked port)
Forwarding | A port in the Forwarding state forwards user traffic and BPDUs. | Only the root port and designated port can enter the Forwarding state.
Learning | When a device has a port in the Learning state, the device creates a MAC address table based on the received user traffic but does not forward user traffic. | This is a transitional state, which is designed to prevent temporary loops.
Listening | All ports are in the Listening state when STP calculation is being implemented to determine port roles. | This is a transitional state.
Blocking | A port in the Blocking state receives only BPDUs and does not forward user traffic. | This is the final state of a blocked port.
Disabled | A port in the Disabled state does not forward BPDUs or user traffic. | The port is Down.
Figure (STP port state transitions among Disabled or Down, Blocking, Listening, Learning, and Forwarding; transitions numbered 1 to 5)
NOTICE
A Huawei datacom device uses MSTP by default. After a device transitions from the MSTP
mode to the STP mode, its STP-capable port supports the same port states as those supported
by an MSTP-capable port, including the Forwarding, Learning, and Discarding states. For
details, see Table 5-18.
Port Status | Description
Forwarding | A port in the Forwarding state can send and receive BPDUs as well as forward user traffic.
Learning | A port in the Learning state learns MAC addresses from user traffic to construct a MAC address table. In the Learning state, the port can send and receive BPDUs, but not forward user traffic.
The following parameters affect the STP-capable port states and convergence.
l Hello time
The Hello timer specifies the interval at which an STP-capable device sends
configuration BPDUs to detect link faults.
After the network topology becomes stable, a change to the interval takes effect
only after a new root bridge takes over. The new root bridge adds certain fields in
BPDUs to inform non-root bridges of the change in the interval. After the topology
changes, TCN BPDUs are sent. The Hello interval does not affect the transmission of TCN
BPDUs.
l Forward Delay time
The Forward Delay timer specifies the delay for interface status transition. When a link
fault occurs, STP recalculation is performed, causing the structure of the spanning tree to
change. The configuration BPDUs generated during STP recalculation cannot be
immediately transmitted over the entire network. If the root port and designated port
forward data immediately after being selected, transient loops may occur. Therefore, an
interface status transition mechanism is introduced by STP. The newly selected root port
and designated port do not forward data until an amount of time equal to twice the
forward delay has passed. In this manner, the newly generated BPDUs can be transmitted
over the network before the newly selected root port and designated port forward data,
which prevents transient loops.
NOTE
The Forward Delay timer specifies how long a port stays in each of the Listening and Learning
states. A port in the Listening or Learning state is blocked for user traffic, which is key to
preventing transient loops.
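The timing described above can be sketched as follows. This is a minimal illustration, assuming a Forward Delay of 15 seconds and that the port spends one Forward Delay in each of the Listening and Learning states; the function name is hypothetical and is not a device command.

```python
# Sketch: when a newly selected root or designated port starts forwarding.
# Assumption: one Forward Delay is spent in Listening and one in Learning.

def port_state_timeline(forward_delay_s=15):
    """Return (state, start_time_in_seconds) pairs for a port moving to Forwarding."""
    return [
        ("Listening", 0),
        ("Learning", forward_delay_s),
        ("Forwarding", 2 * forward_delay_s),
    ]


for state, start in port_state_timeline():
    print(f"{start:>3} s  {state}")
```

With these assumptions the port waits twice the Forward Delay before it forwards user traffic.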
l Max Age time
The Max Age time specifies the aging time of BPDUs. The Max Age time can be
manually configured on the root bridge.
Configuration BPDUs are transmitted over the entire network, so all bridges use the same Max
Age value. After a non-root bridge running STP receives a configuration BPDU, the
non-root bridge compares the Message Age value with the Max Age value in the
received configuration BPDU.
– If the Message Age value is smaller than or equal to the Max Age value, the non-
root bridge forwards the configuration BPDU.
– If the Message Age value is larger than the Max Age value, the configuration
BPDU ages out and the non-root bridge discards it. In this case, the network
is considered too large and the non-root bridge disconnects from the root bridge.
NOTE
If the configuration BPDU is sent from the root bridge, the value of Message Age is 0. Otherwise,
the value of Message Age indicates the total time during which a BPDU is sent from the root
bridge to the local bridge, including the delay in transmission. In real world situations, each time a
configuration BPDU passes through a bridge, the value of Message Age increases by 1.
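The Message Age check can be illustrated with a short sketch. It assumes that Message Age grows by exactly 1 per bridge, as the note states, and uses a Max Age of 20 purely as an example value; the function names are hypothetical.

```python
# Sketch of the Message Age / Max Age comparison made by a non-root bridge.

def message_age_after(bridges_passed):
    """Message Age of a BPDU that left the root (age 0) and passed this many bridges."""
    return bridges_passed


def process_config_bpdu(message_age, max_age):
    """Return 'forward' if the BPDU is still valid, otherwise 'discard'."""
    return "forward" if message_age <= max_age else "discard"


max_age = 20  # example value, configured on the root bridge
for hops in (0, 20, 21):
    print(hops, process_config_bpdu(message_age_after(hops), max_age))
```

Under these assumptions, a bridge that is more than Max Age hops away from the root never accepts the root's BPDUs, which is why the network is then considered too large.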
l Configuration BPDUs are heartbeat packets. STP-enabled designated ports send BPDUs
at intervals specified by the Hello timer.
l Topology Change Notification (TCN) BPDUs are sent only after the device detects
network topology changes.
A BPDU is encapsulated into an Ethernet frame. In an Ethernet frame, the destination MAC
address is the multicast MAC address 01-80-C2-00-00-00; the value of the Length/Type field
is the length of MAC data; in the Logical Link Control (LLC) header, as defined in the IEEE
standard, the values of Destination Service Access Point (DSAP) and Source Service Access
Point (SSAP) are 0x42 and the value of Control is 0x03; the BPDU header follows the LLC
header. Figure 5-54 shows the format of an Ethernet frame.
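The encapsulation described above can be sketched in a few lines. This is an illustration only: the source MAC address and the all-zero payload are placeholders, the helper name is hypothetical, and frame padding and the frame check sequence are omitted.

```python
import struct

STP_DEST_MAC = bytes.fromhex("0180c2000000")  # multicast 01-80-C2-00-00-00
LLC_HEADER = bytes([0x42, 0x42, 0x03])        # DSAP, SSAP, Control


def encapsulate_bpdu(src_mac: bytes, bpdu: bytes) -> bytes:
    """Prefix a BPDU with the Ethernet and LLC headers (no padding, no FCS)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    mac_data = LLC_HEADER + bpdu
    # The Length/Type field carries the length of the MAC data (LLC header + BPDU).
    header = STP_DEST_MAC + src_mac + struct.pack("!H", len(mac_data))
    return header + mac_data


src = bytes.fromhex("00000c123458")  # placeholder source MAC
frame = encapsulate_bpdu(src, bytes(35))  # placeholder 35-byte BPDU
print(frame[:17].hex("-"))
```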
Configuration BPDU
Configuration BPDUs are most commonly used.
During initialization, each bridge actively sends configuration BPDUs. After the network
topology becomes stable, only the root bridge actively sends configuration BPDUs. Other
bridges send configuration BPDUs only after receiving configuration BPDUs from upstream
devices. A configuration BPDU is at least 35 bytes long, including the parameters such as the
BID, path cost, and PID. A BPDU is discarded if both the sender BID and Port ID field values
are the same as those of the local port. Otherwise, the BPDU is processed. In this manner,
BPDUs containing the same information as that of the local port are not processed.
Field                         Length (bytes)  Description
Protocol Identifier           2               Always 0
Protocol Version Identifier   1               Always 0
BPDU Type                     1               Indicates the type of a BPDU. The value is one of the following:
                                              l 0x00: Configuration BPDU
                                              l 0x80: TCN BPDU
Root Path Cost                4               Indicates the cumulative cost of all links to the root bridge.
Message Age                   2               Records the time since the root bridge originally generated the information that a BPDU is derived from.
                                              If the configuration BPDU is sent from the root bridge, the value of Message Age is 0. Otherwise, the value of Message Age indicates the total time during which a BPDU is sent from the root bridge to the local bridge, including the delay in transmission. In real world situations, each time a configuration BPDU passes through a bridge, the value of Message Age increases by 1.
Forward Delay                 2               Indicates the time spent in the Listening and Learning states.
Figure 5-55 shows the Flags field. Only the leftmost and rightmost bits are used in STP.
[Figure 5-55: Flags field, shown from Bit7 (leftmost) to Bit0 (rightmost); the bits between them are marked Reserved]
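A configuration BPDU can be assembled as shown in the following sketch. The field order and the encoding of timer values in units of 1/256 second follow IEEE 802.1D rather than this document, and the assignment of TCA to Bit7 and TC to Bit0 is likewise taken from the standard; the helper names and example values are hypothetical.

```python
import struct

TC_BIT = 0x01   # Bit0 (rightmost): Topology Change
TCA_BIT = 0x80  # Bit7 (leftmost): Topology Change Acknowledgment


def bridge_id(priority: int, mac: bytes) -> bytes:
    """8-byte bridge ID: 2-byte priority followed by the 6-byte MAC address."""
    return struct.pack("!H", priority) + mac


def pack_config_bpdu(root_id, root_path_cost, sender_id, port_id,
                     message_age_s, max_age_s, hello_s, forward_delay_s,
                     tc=False, tca=False):
    """Pack a 35-byte configuration BPDU (layout assumed from IEEE 802.1D)."""
    flags = (TC_BIT if tc else 0) | (TCA_BIT if tca else 0)
    units = lambda seconds: int(seconds * 256)  # timers are in 1/256 s units
    return struct.pack(
        "!HBBB8sI8sHHHHH",
        0,      # Protocol Identifier, always 0
        0,      # Protocol Version Identifier, always 0
        0x00,   # BPDU Type: configuration BPDU
        flags, root_id, root_path_cost, sender_id, port_id,
        units(message_age_s), units(max_age_s),
        units(hello_s), units(forward_delay_s),
    )


root = bridge_id(32768, bytes.fromhex("00000c123458"))  # example values
bpdu = pack_config_bpdu(root, 0, root, 0x8001, 0, 20, 2, 15, tc=True, tca=True)
print(len(bpdu), hex(bpdu[4]))
```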
TCN BPDU
A TCN BPDU contains only three fields: Protocol ID, Version, and Type, as shown in
Table 5-19. The value of the Type field is 0x80, and the entire TCN BPDU is four bytes
long.
TCN BPDUs are transmitted by each device to its upstream device to notify the upstream
device of changes in the downstream topology, until they reach the root bridge. A TCN BPDU
is generated in one of the following scenarios:
l A port enters the Forwarding state.
l A designated port receives a TCN BPDU and sends a copy toward the root bridge.
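The four-byte TCN BPDU and the Type values listed earlier can be illustrated as follows. This is a minimal sketch; the function names are hypothetical.

```python
import struct

CONFIG_TYPE = 0x00
TCN_TYPE = 0x80


def pack_tcn_bpdu() -> bytes:
    """Protocol ID (2 bytes, 0), Version (1 byte, 0), Type (1 byte, 0x80)."""
    return struct.pack("!HBB", 0, 0, TCN_TYPE)


def classify_bpdu(bpdu: bytes) -> str:
    """Classify a BPDU by its Type field (the fourth byte)."""
    if len(bpdu) < 4:
        raise ValueError("a BPDU is at least 4 bytes long")
    _protocol_id, _version, bpdu_type = struct.unpack("!HBB", bpdu[:4])
    if bpdu_type == TCN_TYPE:
        return "TCN"
    if bpdu_type == CONFIG_TYPE:
        return "Configuration"
    return "Unknown"


print(pack_tcn_bpdu().hex(), classify_bpdu(pack_tcn_bpdu()))
```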
NOTE
As each bridge considers itself the root bridge, the value of the root BID field in the BPDU sent by
each port is recorded as its BID; the value of the Root Path Cost field is the cumulative cost of all
links to the root bridge; the sender BID is the ID of the local bridge; the Port ID is the PID of the
local bridge port that sends the BPDU.
[Figure 5-56: port A on S1 connects to port B on S2; the BPDU shown on the link is {S2_MAC, 0, S2_MAC, B_PID}]
Once a port receives a BPDU with a higher priority than its own BPDU, the port extracts
information from the received BPDU and updates its own information accordingly.
After saving the updated BPDU, the port immediately stops sending its own
BPDU.
When sending a BPDU, each device fills in the Sender BID field with its own BID.
When a device considers itself the root bridge, the device fills in the Root BID field with
its own BID. As shown in Figure 5-56, Port B on S2 receives a BPDU with a higher
priority from S1, and therefore considers S1 the root bridge. When another port on S2
sends a BPDU, the port fills in its Root BID field with S1_BID. The preceding
intercommunication is repeatedly performed between two devices until all devices
consider the same device as the root bridge. This indicates that the root bridge is
selected. Figure 5-57 shows the root bridge selection.
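The outcome of this repeated exchange is that the bridge with the smallest BID becomes the root bridge. The sketch below computes that outcome directly instead of simulating each BPDU. It assumes that a BID is compared first by priority and then by MAC address; the bridge names and MAC addresses are example values.

```python
# Sketch: the exchange of BPDUs converges on the bridge with the smallest BID.

def make_bid(priority: int, mac: str):
    """BID as a comparable tuple; a smaller tuple means a higher priority."""
    return (priority, int(mac.replace("-", ""), 16))


def elect_root(bridges: dict) -> str:
    """Return the name of the bridge with the smallest BID."""
    return min(bridges, key=lambda name: bridges[name])


bridges = {
    "S1": make_bid(32768, "0000-0C12-3456"),  # example MAC
    "S2": make_bid(32768, "0000-0C12-3458"),
    "S3": make_bid(32768, "0000-0C12-3457"),  # example MAC
}
print(elect_root(bridges))
```

With equal priorities the MAC address decides. Lowering the priority value of one bridge makes it the root regardless of its MAC address.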
[Figure 5-57: root bridge selection; the surviving labels show bridge priority 32768 and MAC address 0000-0C12-3458]
least-cost path). The port connecting to that path becomes the root port of the bridge.
Figure 5-58 shows the root port selection.
NOTE
In the Root Path Cost algorithm, after a port receives a BPDU, the port extracts the value of the
Root Path Cost field and adds it to the path cost of the port itself to obtain the root
path cost. The path cost of a port covers only the directly connected link and can be
manually configured. If the root path costs on two or more ports are the same, the port
that receives the BPDU with the smallest sender BID value is selected as the root port.
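The root port selection in the note can be sketched as follows. The port names, costs, and BIDs are example values, and tie-breakers beyond the sender BID are not covered by this excerpt and are therefore left out.

```python
# Sketch: choose the root port from the BPDUs received on each port.

def select_root_port(ports: dict) -> str:
    """ports maps a port name to (received_root_path_cost, port_path_cost, sender_bid).

    The root path cost of a port is the received cost plus the port's own cost.
    Ties are broken by the smallest sender BID.
    """
    def key(name):
        received_cost, port_cost, sender_bid = ports[name]
        return (received_cost + port_cost, sender_bid)

    return min(ports, key=key)


ports = {
    "port3": (0, 19, (32768, 0x00000C123456)),   # directly connected to the root
    "port4": (19, 19, (32768, 0x00000C123458)),  # reaches the root through another bridge
}
print(select_root_port(ports))
```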
[Figure 5-58: root port selection; S1 is the root bridge, S3 and ports port1 to port4 are labeled, and a link cost of 19 is shown; legend: root port, designated port, blocked port]
1. After the network topology changes, a downstream device continuously sends Topology
Change Notification (TCN) BPDUs to an upstream device.
2. After the upstream device receives TCN BPDUs from the downstream device, only the
designated port processes them. The other ports may receive TCN BPDUs but do not
process them.
3. The upstream device sets the Topology Change Acknowledgment (TCA) bit of the Flags
field in the configuration BPDUs to 1 and returns the configuration BPDUs to instruct
the downstream device to stop sending TCN BPDUs.
4. The upstream device sends a copy of the TCN BPDUs to the root bridge.
5. Steps 1, 2, 3 and 4 are repeated until the root bridge receives the TCN BPDUs.
6. The root bridge sets the Topology Change (TC) bit of the Flags field in the configuration
BPDUs to 1 to instruct the downstream device to delete MAC address entries.
NOTE
l TCN BPDUs are used to inform the upstream device and root bridge of topology changes.
l Configuration BPDUs with the TCA bit being set to 1 are used by the upstream device to inform the
downstream device that the topology changes are known and instruct the downstream device to stop
sending TCN BPDUs.
l Configuration BPDUs with the TC bit being set to 1 are used by the upstream device to inform the
downstream device of topology changes and instruct the downstream device to delete MAC address
entries. In this manner, fast network convergence is achieved.
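The steps above can be sketched as a sequence of messages along the path from the device that detects the change to the root bridge. The device names are placeholders, and each hop is reduced to one TCN BPDU and one acknowledgment, although a real device keeps sending TCN BPDUs until it is acknowledged.

```python
# Sketch: message sequence for a topology change travelling to the root bridge.

def propagate_topology_change(path_to_root):
    """path_to_root lists device names from the detecting device up to the root bridge.

    Returns (sender, receiver, message) tuples in the order they occur.
    """
    events = []
    for downstream, upstream in zip(path_to_root, path_to_root[1:]):
        events.append((downstream, upstream, "TCN BPDU"))
        events.append((upstream, downstream, "Configuration BPDU, TCA=1"))
    root = path_to_root[-1]
    events.append((root, "downstream devices", "Configuration BPDU, TC=1"))
    return events


for event in propagate_topology_change(["S3", "S2", "S1"]):
    print(event)
```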
Figure 5-59 is used as an example to show how the network topology converges when the
root bridge or designated port of the root bridge becomes faulty.
l The root bridge becomes faulty.
Figure 5-61 Diagram of topology changes in the case of a faulty root bridge
[Figure content: labels S1, S2 (ports A and B), S3, port1 to port6, and root bridge; legend: root port, designated port]
As shown in Figure 5-61, when the root bridge becomes faulty, S2 and S3 exchange
configuration BPDUs to reselect the root bridge.
l The designated port of the root bridge becomes faulty.
Figure 5-62 Diagram of topology changes in the case of a faulty designated port on the
root bridge
[Figure content: labels S1 (root bridge), S2 (ports A and B), S3, and port1 to port6; legend: root port, designated port]
As shown in Figure 5-62, the designated port of the root bridge, port1, becomes faulty.
S2 and S3 exchange configuration BPDUs, and port6 is selected as the root port.
In addition, port6 sends TCN BPDUs after entering the Forwarding state. Once the root
bridge receives the TCN BPDUs, it sends configuration BPDUs with the TC bit set to 1 to
instruct the downstream devices to delete MAC address entries.
Disadvantages of STP
STP ensures a loop-free network but has a slow network topology convergence speed, leading
to service deterioration. If the network topology changes frequently, the connections on the
STP-capable network are frequently torn down, causing frequent service interruption. Users
can hardly tolerate such a situation.
Disadvantages of STP are as follows:
l Port states and port roles are not clearly distinguished, which makes STP harder for
beginners to learn and deploy.
A network protocol that clearly defines and distinguishes different situations is likely to
outperform the others.
– Ports in the Listening, Learning, and Blocking states do not forward user traffic, so
these states are indistinguishable to users.
– From the perspective of use and configuration, the essential difference between ports
lies in their roles, not in their states.
It is possible that the root port and designated port are both in the Listening state or
both in the Forwarding state.
l The STP algorithm determines topology changes after the time set by the timer expires,
which slows down network convergence.
l The STP algorithm requires a stable network topology. After the root bridge sends
configuration BPDUs, other devices forward them until all bridges on the network
receive the configuration BPDUs.
This also slows down topology convergence.
[Figure 5-63: RSTP port roles; S1 is the root bridge and connects to S2 and S3; legend: root port, designated port, alternate port, backup port]
As shown in Figure 5-63, RSTP defines four port roles: root port, designated port,
alternate port, and backup port.
The functions of the root port and designated port are the same as those defined in STP.
The alternate port and backup port are described as follows:
Port status and port roles are not necessarily related. Table 5-20 lists states of ports with different
roles.
Table 5-20 Comparison between states of STP ports and RSTP ports with different roles
l Configuration BPDUs are defined differently in RSTP. Port roles are described by using
the Flags field defined in STP.
Compared with STP, RSTP slightly redefines the format of configuration BPDUs.
– The value of the Type field is no longer 0 but 2. Therefore, an STP-capable
device always discards the configuration BPDUs sent by an RSTP-capable device.
– The 6 bits in the middle of the Flags field, which are reserved in STP, are used in
RSTP. Such a configuration BPDU is called a Rapid Spanning Tree (RST) BPDU, as
shown in Figure 5-64.
the root bridge to a designated port on the network segment connecting to the
alternate port.
When the port role changes, the network topology will change accordingly. For
details, see 5.4.2.6 Details About RSTP.
– Edge ports
In RSTP, a designated port on the network edge is called an edge port. An edge port
directly connects to a terminal and does not connect to any other switching devices.
An edge port does not receive configuration BPDUs, and therefore does not
participate in the RSTP calculation. It can directly change from the Disabled state to
the Forwarding state without any delay, just like an STP-incapable port. If an edge
port receives bogus configuration BPDUs from attackers, it is deprived of the edge
port attributes and becomes a common STP port. The STP calculation is
implemented again, causing network flapping.
l Protection functions
Table 5-21 shows protection functions provided by RSTP.
P/A Mechanism
The Proposal/Agreement (P/A) mechanism helps a designated port to enter the Forwarding
state as soon as possible. As shown in Figure 5-65, the P/A negotiation is performed based on
the following port variables:
[Figure 5-65: an upstream device and a downstream device connected by a link; legend: root port, designated port]
1. proposing: When a port is in the Discarding or Learning state, this variable is set to 1.
Additionally, a Rapid Spanning Tree (RST) BPDU with the Proposal field being 1 is sent
to the downstream switching device.
2. proposed: After a port receives an RST BPDU with the Proposal field being 1 from the
designated port on the peer device, this variable is set to 1, urging the designated port on
this network segment to enter the Forwarding state.
3. sync: After the proposed variable is set to 1, the root port receiving the proposal sets the
sync variable to 1 for the other ports on the same device; a non-edge port receiving the
proposal enters the Discarding state.
4. synced: After a port enters the Discarding state, it sets its synced variable to 1 in the
following manner: If this port is the alternate, backup, or edge port, it will immediately
set its synced variable to 1. If this port is the root port, it will monitor the synced
variables of the other ports. After the synced variables of all the other ports are set to 1,
the root port sets its synced variable to 1, and sends an RST BPDU with the Agreement
field being 1.
5. agreed: After the designated port receives an RST BPDU with the Agreement field being
1 and the port role field indicating the root port, this variable is set to 1. Once the agreed
variable is set to 1, this designated port immediately enters the Forwarding state.
[Figure 5-66: S1 (port p0) connects to S2 (port p1); step 1 is a Proposal and step 3 is an Agreement; S2 also has ports p2, p3, and p4]
As shown in Figure 5-66, a new link is established between the root bridge S1 and S2. On
S2, p2 is an alternate port; p3 is a designated port in the Forwarding state; p4 is an edge port.
The P/A mechanism works in the following process:
NOTE
To use the P/A mechanism, ensure that the link between the two devices is a P2P link in full-duplex
mode. If the P/A negotiation fails, the designated port is handled through STP negotiation and can
forward traffic only after the Forward Delay timer expires twice.
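The handling of a proposal on the downstream device can be sketched as follows. The port names mirror those on S2 in Figure 5-66, and the assumption that p1 is the root port facing S1 is made only for illustration. The sketch runs synchronously, so every port reports synced immediately, which a real device does not guarantee.

```python
# Sketch: a downstream device processes a proposal received on its root port.

def handle_proposal(ports: dict, root_port: str) -> bool:
    """Apply the sync step and return True if an agreement is sent upstream.

    ports maps a port name to a dict with "role" and "state" keys.
    """
    ports[root_port]["proposed"] = True
    for name, port in ports.items():
        if name == root_port:
            continue
        port["sync"] = True
        # Alternate and backup ports are already Discarding and edge ports keep
        # forwarding; the other non-edge ports enter the Discarding state.
        if port["role"] not in ("alternate", "backup", "edge"):
            port["state"] = "Discarding"
        port["synced"] = True
    if all(p["synced"] for n, p in ports.items() if n != root_port):
        ports[root_port]["synced"] = True
        ports[root_port]["state"] = "Forwarding"
        return True  # an RST BPDU with Agreement = 1 is sent
    return False


s2_ports = {
    "p1": {"role": "root", "state": "Discarding"},
    "p2": {"role": "alternate", "state": "Discarding"},
    "p3": {"role": "designated", "state": "Forwarding"},
    "p4": {"role": "edge", "state": "Forwarding"},
}
agreement_sent = handle_proposal(s2_ports, "p1")
print(agreement_sent, {name: port["state"] for name, port in s2_ports.items()})
```

After the agreement is received, the upstream designated port enters the Forwarding state, and the blocked designated port p3 can start its own negotiation with the device below it.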
On a network where both STP-capable and RSTP-capable devices are deployed, STP-capable
devices ignore RST BPDUs; if a port on an RSTP-capable device receives a configuration
BPDU from an STP-capable device, the port switches to the STP mode after two Hello
intervals and starts to send configuration BPDUs. In this manner, RSTP and STP are
interoperable.
After STP-capable devices are removed, Huawei RSTP-capable datacom devices can switch
back to the RSTP mode.
[Figure 5-67: HostA (VLAN2) and HostC (VLAN3) attach to a switching network in which S2, S3, S5, and S6 are labeled; links are marked VLAN2 or VLAN3; one spanning tree with root bridge S6]
On the network shown in Figure 5-67, STP or RSTP is enabled. The broken line shows the
spanning tree. S6 is the root switching device. The links between S1 and S4 and between S2
and S5 are blocked. VLAN packets are transmitted by using the corresponding links marked
with "VLAN2" or "VLAN3."
Host A and Host B belong to VLAN 2 but they cannot communicate with each other because
the link between S2 and S5 is blocked and the link between S3 and S6 denies packets from
VLAN 2.
To fix the defect of STP and RSTP, the IEEE released 802.1s in 2002, defining the Multiple
Spanning Tree Protocol (MSTP). MSTP implements fast convergence and provides multiple
paths to load balance VLAN traffic.
MSTP divides a switching network into multiple regions, each of which has multiple
spanning trees that are independent of each other. Each spanning tree is called a multiple
spanning tree instance (MSTI) and each region is called a multiple spanning tree (MST)
region.
NOTE
[Figure 5-68: the same network with two spanning trees, one with root bridge S4 and one with root bridge S6; HostA (VLAN2) and HostC (VLAN3) are labeled]
As shown in Figure 5-68, MSTP maps VLANs to MSTIs in the VLAN mapping table. Each
VLAN can be mapped to only one MSTI. This means that traffic of a VLAN can be
transmitted in only one MSTI. An MSTI, however, can correspond to multiple VLANs.
Two spanning trees are calculated:
l MSTI 1 uses S4 as the root switching device to forward packets of VLAN 2.
l MSTI 2 uses S6 as the root switching device to forward packets of VLAN 3.
In this manner, devices within the same VLAN can communicate with each other; packets of
different VLANs are load balanced along different paths.
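The VLAN mapping table can be pictured as a lookup from VLAN ID to MSTI ID. The following sketch is an illustration only; the function names are hypothetical, and treating unmapped VLANs as members of MSTI 0 is an assumption for this example.

```python
# Sketch: a VLAN mapping table in which each VLAN belongs to exactly one MSTI.

def build_vlan_map(msti_to_vlans: dict) -> dict:
    """Invert {msti: [vlans]} into {vlan: msti}, rejecting a VLAN mapped twice."""
    vlan_to_msti = {}
    for msti, vlans in msti_to_vlans.items():
        for vlan in vlans:
            if vlan in vlan_to_msti:
                raise ValueError(f"VLAN {vlan} is already mapped to MSTI {vlan_to_msti[vlan]}")
            vlan_to_msti[vlan] = msti
    return vlan_to_msti


def msti_for_vlan(vlan_to_msti: dict, vlan: int) -> int:
    """Return the MSTI that carries the VLAN; unmapped VLANs use MSTI 0 (assumption)."""
    return vlan_to_msti.get(vlan, 0)


vlan_map = build_vlan_map({1: [2], 2: [3]})  # MSTI 1 carries VLAN 2, MSTI 2 carries VLAN 3
print(msti_for_vlan(vlan_map, 2), msti_for_vlan(vlan_map, 3), msti_for_vlan(vlan_map, 10))
```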
MSTP Network
[Figure 5-69: an MSTP network made up of several MST regions, each containing MSTIs such as MSTI0, MSTI1, and MSTI2]
MST Region
An MST region contains multiple switching devices and network segments between them.
The switching devices of one MST region have the following characteristics:
l MSTP-enabled
l Same region name
l Same VLAN-MSTI mappings
l Same MSTP revision level
A LAN can comprise several MST regions that are directly or indirectly connected. Multiple
switching devices can be grouped into an MST region by using MSTP configuration
commands.
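The four conditions for belonging to the same MST region can be expressed as a simple comparison. This is a sketch with hypothetical names; it compares the configured values directly and does not model how devices exchange them.

```python
from dataclasses import dataclass


@dataclass
class MstConfig:
    mstp_enabled: bool
    region_name: str
    revision_level: int
    vlan_to_msti: dict  # VLAN ID -> MSTI ID


def same_region(a: MstConfig, b: MstConfig) -> bool:
    """Two devices are in one MST region only if all four characteristics match."""
    return (
        a.mstp_enabled and b.mstp_enabled
        and a.region_name == b.region_name
        and a.revision_level == b.revision_level
        and a.vlan_to_msti == b.vlan_to_msti
    )


s1 = MstConfig(True, "D0", 0, {1: 1, 2: 2, 3: 2})
s2 = MstConfig(True, "D0", 0, {1: 1, 2: 2, 3: 2})
s5 = MstConfig(True, "D0", 1, {1: 1, 2: 2, 3: 2})  # different revision level
print(same_region(s1, s2), same_region(s1, s5))
```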
As shown in Figure 5-70, the MST region D0 contains the switching devices S1, S2, S3, and
S4, and has three MSTIs.
[Figure 5-70: MST region D0 with S1 (port AP1, master bridge), S2, S3, and S4; MSTI1: root switch S3, carries VLAN1; MSTI2: root switch S2, carries VLAN2 and VLAN3; MSTI0 (IST): root switch S1, carries the other VLANs]
Regional Root
Regional roots are classified into Internal Spanning Tree (IST) and MSTI regional roots.
In regions B0, C0, and D0 on the network shown in Figure 5-72, the switching devices
closest to the Common and Internal Spanning Tree (CIST) root are the IST regional roots.
An MST region can contain multiple spanning trees, each called an MSTI. An MSTI regional
root is the root of the MSTI. On the network shown in Figure 5-71, each MSTI has its own
regional root.
[Figure 5-71: an MST region whose links are labeled VLAN 10, VLAN 20&30, VLAN 10&20, or VLAN 10&20&30; two devices are labeled Root]
MSTIs are independent of each other. An MSTI can correspond to one or more VLANs, but a
VLAN can be mapped to only one MSTI.
Master Bridge
The master bridge is the IST master, which is the switching device closest to the CIST root in
a region, for example, S1 shown in Figure 5-70.
If the CIST root is in an MST region, the CIST root is the master bridge of the region.
CIST Root
[Figure 5-72: MST regions A0, B0, C0, and D0; the CIST root is in A0; B0, C0, and D0 each have a region root; legend: IST, CST]
On the network shown in Figure 5-72, the CIST root is the root bridge of the CIST. The CIST
root is a device in A0.
CST
A Common Spanning Tree (CST) connects all the MST regions on a switching network.
If each MST region is considered a node, the CST is calculated by using STP or RSTP based
on all the nodes.
As shown in Figure 5-72, the MST regions are connected to form a CST.
IST
An IST resides within an MST region.
An IST is a special MSTI with the MSTI ID being 0, called MSTI 0.
An IST is a segment of the CIST in an MST region.
As shown in Figure 5-72, the switching devices in an MST region are connected to form an
IST.
CIST
A CIST, calculated by using STP or RSTP, connects all the switching devices on a switching
network.
As shown in Figure 5-72, the ISTs and the CST form a complete spanning tree, the CIST.
SST
A Single Spanning Tree (SST) is formed in either of the following situations:
l A switching device running STP or RSTP belongs to only one spanning tree.
l An MST region has only one switching device.
Port Role
Based on RSTP, MSTP has two additional port types. MSTP ports can be root ports,
designated ports, alternate ports, backup ports, edge ports, master ports, and regional edge
ports.
The functions of root ports, designated ports, alternate ports, and backup ports have been
defined in RSTP. Table 5-22 lists all port roles in MSTP.
NOTE
Port Role            Description
Root port            A root port is the non-root bridge port closest to the root bridge. Root bridges do not have root ports.
                     Root ports are responsible for sending data to root bridges.
                     As shown in Figure 5-73, S1 is the root; CP1 is the root port on S3; BP1 is the root port on S2; DP1 is the root port on S4.
Designated port      The designated port on a switching device forwards bridge protocol data units (BPDUs) to the downstream switching device.
                     As shown in Figure 5-73, AP2 and AP3 are designated ports on S1; BP2 is a designated port on S2; CP2 is a designated port on S3.
Master port          A master port is on the shortest path connecting MST regions to the CIST root.
                     BPDUs of an MST region are sent to the CIST root through the master port.
                     Master ports are special regional edge ports, functioning as root ports on ISTs or CISTs and master ports in instances.
                     As shown in Figure 5-73, S1, S2, S3, and S4 form an MST region. AP1 on S1, being the nearest port in the region to the CIST root, is the master port.
Regional edge port   A regional edge port is located at the edge of an MST region and connects to another MST region or an SST.
                     During MSTP calculation, the roles of a regional edge port in the MSTI and the CIST instance are the same. If the regional edge port is the master port in the CIST instance, it is the master port in all the MSTIs in the region.
                     As shown in Figure 5-73, ports AP1, DP2, and DP3 in the MST region are directly connected to other regions, and therefore they are all regional edge ports of the MST region.
                     As shown in Figure 5-73, AP1 is a regional edge port and a master port in the CIST. Therefore, AP1 is the master port in every MSTI in the MST region.
Edge port            An edge port is located at the edge of an MST region and does not connect to any switching device.
                     Generally, edge ports are directly connected to terminals.
                     After MSTP is enabled on a port, edge-port detecting is started automatically. If the port fails to receive BPDU packets within (2 x Hello Timer + 1) seconds, the port is set to an edge port. Otherwise, the port is set to a non-edge port.
                     As shown in Figure 5-73, BP3 is an edge port.
[Figure 5-73: an MST region containing S1 (root bridge, ports AP1 to AP4), S2 (ports BP1 to BP3), S3 (ports CP1 to CP3), and S4 (ports DP1 to DP4), with a PC attached; legend: root port, designated port, alternate port, backup port, regional edge port, master port, edge port]
Port Status    Description
Forwarding     A port in the Forwarding state can send and receive BPDUs as well as forward user traffic.
Learning       A port in the Learning state learns MAC addresses from user traffic to construct a MAC address table.
               In the Learning state, the port can send and receive BPDUs, but not forward user traffic.
There is no necessary link between the port status and the port role. Table 5-24 lists the
relationships between port roles and port status.
Field                                         Octet
Protocol Identifier                           1-2
Protocol Version Identifier                   3
BPDU Type                                     4
CIST Flags                                    5
CIST Root Identifier                          6-13
CIST External Path Cost                       14-17
CIST Regional Root Identifier                 18-25
CIST Port Identifier                          26-27
Message Age                                   28-29
Max Age                                       30-31
Hello Time                                    32-33
Forward Delay                                 34-35
Version 1 Length=0                            36
Version 3 Length                              37-38
MST Configuration Identifier                  39-89
CIST Internal Root Path Cost                  90-93
CIST Bridge Identifier                        94-101
CIST Remaining Hops                           102
MSTI Configuration Messages (may be absent)   103 to 39+Version 3 Length
The fields starting at octet 37 are the MST special fields.
The first 36 bytes of an intra-region or inter-region MST BPDU are the same as those of an
RST BPDU.
Fields from the 37th byte of an MST BPDU are MSTP-specific. The field MSTI
Configuration Messages consists of configuration messages of Multiple Spanning Tree
Instances (MSTIs).
Table 5-26 lists the major information carried in an MST BPDU.
Field                          Length (bytes)  Description
CIST External Path Cost        4               Indicates the total path costs from the MST region where the switching device resides to the MST region where the CIST root switching device resides. This value is calculated based on link bandwidth.
Hello Time                     2               Indicates the Hello timer value. The default value is 2 seconds.
Forward Delay                  2               Indicates the forwarding delay timer. The default value is 15 seconds.
CIST Internal Root Path Cost   4               Indicates the total path costs from the local port to the IST master. This value is calculated based on link bandwidth.
Figure 5-75 shows the sub-fields in the MST Configuration Identifier field.
Field                                      Octet
Configuration Identifier Format Selector   39
Configuration Name                         40-71
Revision Level                             72-73
Configuration Digest                       74-89
Table 5-27 describes the sub-fields in the MST Configuration Identifier field.
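The sub-field offsets shown for the MST Configuration Identifier can be used to extract it from a received MST BPDU, as in the following sketch. The octet numbers in the figure are 1-based, so octets 39 to 89 correspond to the Python slice 38:89. Treating the Configuration Name as NUL-padded ASCII is an assumption, and the function name is hypothetical.

```python
import struct


def parse_mst_config_id(bpdu: bytes) -> dict:
    """Extract the MST Configuration Identifier (octets 39-89) from an MST BPDU."""
    raw = bpdu[38:89]
    if len(raw) != 51:
        raise ValueError("BPDU too short to contain an MST Configuration Identifier")
    # 1-byte selector, 32-byte name, 2-byte revision level, 16-byte digest
    selector, name, revision, digest = struct.unpack("!B32sH16s", raw)
    return {
        "format_selector": selector,
        "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
        "revision_level": revision,
        "digest": digest.hex(),
    }


# Build a placeholder BPDU: 38 leading bytes, the identifier, 13 trailing bytes.
identifier = bytes([0]) + b"region1".ljust(32, b"\x00") + struct.pack("!H", 1) + bytes(range(16))
sample_bpdu = bytes(38) + identifier + bytes(13)
print(parse_mst_config_id(sample_bpdu))
```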
Figure 5-76 shows the sub-fields in the MSTI Configuration Messages field.
Table 5-28 describes the sub-fields in the MSTI Configuration Messages field.
Field                          Length (bytes)  Description
MSTI Internal Root Path Cost   4               Indicates the total path costs from the local port to the MSTI regional root switching device. This value is calculated based on link bandwidth.
If a port transmits either dot1s or legacy BPDUs by default, the user needs to identify the
format of BPDUs sent by the peer and then run a command to configure the port to support
the peer BPDU format. If the configuration is incorrect, a loop may occur due to
incorrect MSTP calculation.
By using the stp compliance command, you can configure a port on a Huawei datacom
device to automatically adjust the MST BPDU format. With this function, the port
automatically adopts the peer BPDU format. The following MST BPDU formats are
supported by Huawei datacom devices:
l auto
l dot1s
l legacy
In addition to dot1s and legacy formats, the auto mode allows a port to automatically switch
to the BPDU format used by the peer based on BPDUs received from the peer. In this manner,
the two ports use the same BPDU format. In auto mode, a port uses the dot1s BPDU format
by default, and keeps pace with the peer after receiving BPDUs from the peer.
After a switching device becomes the root, it sends BPDUs at Hello intervals. Non-root
switching devices adopt the Hello Time value set for the root.
Huawei datacom devices allow the maximum number of BPDUs sent by a port at a Hello
interval to be configured as needed.
The greater the Hello Time value, the more BPDUs sent at a Hello interval. Setting the Hello
Time to a proper value limits the number of BPDUs sent by a port at a Hello interval. This
helps prevent network topology flapping and avoid excessive use of bandwidth resources by
BPDUs.
MSTP Principle
In MSTP, the entire Layer 2 network is divided into multiple MST regions, which are
interconnected by a single Common Spanning Tree (CST). In an MST region, multiple
spanning trees are calculated, each of which is called a Multiple Spanning Tree Instance
(MSTI). Among these MSTIs, MSTI 0 is also known as the internal spanning tree (IST). Like
STP, MSTP uses configuration messages to calculate spanning trees, but the configuration
messages are MSTP-specific.
Vectors
Both MSTIs and the Common and Internal Spanning Tree (CIST) are calculated based on
vectors, which are carried in Multiple Spanning Tree (MST) BPDUs. Therefore, switching
devices exchange MST BPDUs to calculate MSTIs and the CIST.
Vector                            Description
Root ID                           Identifies the root switching device for the CIST. The root identifier consists of the priority value (16 bits) and MAC address (48 bits).
External root path cost (ERPC)    Indicates the path cost from a CIST regional root to the root. ERPCs saved on all switching devices in an MST region are the same. If the CIST root is in an MST region, ERPCs saved on all switching devices in the MST region are 0s.
Regional root ID                  Identifies the MSTI regional root. The regional root ID consists of the priority value (16 bits) and MAC address (48 bits).
Internal root path cost (IRPC)    Indicates the path cost from the local bridge to the regional root. The IRPC saved on a regional edge port is greater than the IRPC saved on a non-regional edge port.
Designated switching device ID    Identifies the nearest upstream bridge on the path from the local bridge to the regional root. If the local bridge is the root or the regional root, this ID is the local bridge ID.
Designated port ID                Identifies the port on the designated switching device connected to the root port on the local bridge. The port ID consists of the priority value (4 bits) and port number (12 bits). The priority value must be a multiple of 16.
Receiving port ID                 Identifies the port receiving the BPDU. The port ID consists of the priority value (4 bits) and port number (12 bits). The priority value must be a multiple of 16.
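The vectors can be modeled as tuples that are compared field by field. The sketch below assumes that the fields are compared in the order listed above and that a smaller value is better in every field; this comparison rule is not stated in this excerpt. It also shows how a port ID is built from a 4-bit priority and a 12-bit port number. All names and values are examples.

```python
# Sketch: port ID encoding and field-by-field comparison of priority vectors.

def make_port_id(priority: int, port_number: int) -> int:
    """Encode a 16-bit port ID: upper 4 bits priority / 16, lower 12 bits port number."""
    if priority % 16 != 0 or not 0 <= priority <= 240:
        raise ValueError("port priority must be a multiple of 16 between 0 and 240")
    if not 0 <= port_number < 4096:
        raise ValueError("port number must fit in 12 bits")
    return ((priority // 16) << 12) | port_number


def make_bridge_id(priority: int, mac: int) -> int:
    """Encode a 64-bit ID: 16-bit priority followed by the 48-bit MAC address."""
    return (priority << 48) | mac


root = make_bridge_id(4096, 0x00000C123456)
regional_root = make_bridge_id(32768, 0x00000C123457)
upstream = make_bridge_id(32768, 0x00000C123458)

# (root ID, ERPC, regional root ID, IRPC, designated device ID,
#  designated port ID, receiving port ID)
v1 = (root, 0, regional_root, 10, upstream, make_port_id(128, 1), make_port_id(128, 2))
v2 = (root, 0, regional_root, 20, upstream, make_port_id(128, 1), make_port_id(128, 3))

best = min(v1, v2)  # tuples compare field by field, smaller is better
print(hex(make_port_id(128, 1)), best is v1)
```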
CIST Calculation
After completing the configuration message comparison, the switching device with the
highest priority on the entire network is selected as the CIST root. MSTP calculates an IST
for each MST region, and computes a CST to interconnect MST regions. On the CST, each
MST region is considered a switching device. The CST and ISTs constitute a CIST for the
entire network.
MSTI Calculation
In an MST region, MSTP calculates an MSTI for each VLAN based on mappings between
VLANs and MSTIs. Each MSTI is calculated independently. The calculation process is
similar to the process for STP to calculate a spanning tree. For details, see 5.4.2.4 STP
Topology Calculation.
MSTIs have the following characteristics:
l The spanning tree is calculated independently for each MSTI, and spanning trees of
MSTIs are independent of each other.
l MSTP calculates the spanning tree for an MSTI in the manner similar to STP.
l Spanning trees of MSTIs can have different roots and topologies.
l Each MSTI sends BPDUs in its spanning tree.
l The topology of each MSTI is configured by using commands.
l A port can be configured with different parameters for different MSTIs.
l A port can play different roles or have different status in different MSTIs.
On an MSTP-aware network, a VLAN packet is forwarded along the following paths:
l MSTI in an MST region
l CST among MST regions
[Figure: P/A negotiation between an upstream device and a downstream device. The upstream device sends a proposal so that the port can rapidly enter the Forwarding state; the root port on the downstream device blocks all the other non-edge ports; agreements are sent; the root port and the designated port enter the Forwarding state. Legend: root port, designated port]
5.4.4 Applications
STP Application
On a complex network, network designers tend to deploy multiple physical links between two
devices, one of which is the master link and the others are backup links. Loops are likely or
bound to occur in such a situation.
[Figure 5-78: CE1 and CE2, with PC1 and PC2 attached, connect to a network running STP; legend: blocked port]
On the network shown in Figure 5-78, after a CE and a PE running STP discover loops on the
network by exchanging information with each other, they trim the ring topology into a loop-
free tree topology by blocking a certain port. In this manner, replication and circular
propagation of packets are prevented on the network, and the switching devices are released
from processing duplicated packets, thereby improving their processing performance.
MSTP Application
[Figure 5-79: an MST region containing ATNA, ATNB, ATNC, and ATND; the surviving link labels are "all VLAN" and "VLAN 20&40"]
In Figure 5-79, MSTP can be configured to use different spanning tree instances to forward
packets in different VLANs. The detailed configurations are as follows:
l Configure all switches on the network to belong to the same MST region.
l Configure VLAN 10 packets to be forwarded within MSTI 1; VLAN 30 packets within
MSTI 3; VLAN 40 packets within MSTI 4; VLAN 20 packets within MSTI 0.
BPDU Tunneling
The BPDU tunneling technology allows a user's networks located in different areas to
transparently transmit BPDUs on a specified VLAN VPN within a carrier's network. In this
manner, all devices on the user's networks can calculate the spanning tree. The user's
networks and the operator's networks have their own independent spanning trees.
As shown in Figure 5-80, the upper part is the operator's network and the lower part is the
user's network. The operator's network contains the packet ingress and egress devices; the
user's network consists of user's network A and user's network B.
You can configure the packet ingress device to replace the original destination MAC address
of a BPDU with a MAC address in a special format and the packet egress device to replace
the MAC address in a special format with the original MAC address. In this manner, the
BPDU is transparently transmitted.
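The MAC address replacement described above can be sketched as follows. This is a minimal illustration, assuming frames are raw byte strings that start with the destination MAC address. `STP_BPDU_MAC` is the standard STP bridge group address; `TUNNEL_MAC` is a hypothetical placeholder for the "MAC address in a special format", whose real value is not given in this section.

```python
STP_BPDU_MAC = bytes.fromhex("0180c2000000")  # standard STP BPDU destination MAC
TUNNEL_MAC = bytes.fromhex("010000000001")    # hypothetical special-format MAC


def tunnel_ingress(frame: bytes) -> bytes:
    """Ingress device: replace the BPDU destination MAC with the tunnel MAC."""
    if frame[:6] == STP_BPDU_MAC:
        return TUNNEL_MAC + frame[6:]
    return frame  # non-BPDU frames are left untouched


def tunnel_egress(frame: bytes) -> bytes:
    """Egress device: restore the original BPDU destination MAC."""
    if frame[:6] == TUNNEL_MAC:
        return STP_BPDU_MAC + frame[6:]
    return frame


bpdu = STP_BPDU_MAC + bytes.fromhex("00e0fc000001") + b"bpdu-payload"
restored = tunnel_egress(tunnel_ingress(bpdu))
print(restored == bpdu)
```

Because the carrier's devices see only the replaced address inside the tunnel, they do not treat the frame as one of their own BPDUs, which keeps the two spanning trees independent.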
[Figure 5-80: operator's network connecting user's network A and user's network B]
STP: Spanning Tree Protocol. A protocol used in a local area network (LAN) to eliminate
loops. STP-capable devices exchange protocol packets to discover loops in the network and
block redundant interfaces to eliminate loops.
RSTP: Rapid Spanning Tree Protocol. A protocol defined in IEEE 802.1w. RSTP is a
supplement to STP and implements faster convergence than STP.
VLAN: Virtual local area network. A logical switched network that is constructed across
different network segments by using network management software. A VLAN forms a logical
subnet (broadcast domain). One VLAN can include multiple network devices.
5.5 QinQ
5.5.1 Introduction
Definition
802.1Q-in-802.1Q (QinQ) adds another IEEE 802.1Q tag to 802.1Q tagged packets entering
the network. QinQ expands the VLAN space by tagging the tagged packets. It allows services
in a private VLAN to be transparently transmitted over a public network.
Purpose
The 12-bit VLAN ID field defined in IEEE 802.1Q identifies a maximum of only 4096 VLANs,
which is not enough to isolate and identify users on the growing metro Ethernet (ME) network.
QinQ was therefore developed to expand the VLAN space by adding another 802.1Q tag to an
802.1Q tagged packet. In this way, the number of VLANs increases to 4096 x 4096.
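The arithmetic behind the expanded VLAN space can be checked with a short sketch. The helper `qinq_index` is hypothetical and only illustrates that every (outer, inner) pair is a distinct identifier.

```python
VLAN_ID_BITS = 12

single_tag_space = 2 ** VLAN_ID_BITS      # VLAN IDs available with one tag
double_tag_space = single_tag_space ** 2  # combinations of outer and inner IDs


def qinq_index(outer_vid: int, inner_vid: int) -> int:
    """Map an (outer, inner) VLAN ID pair to a unique index (illustrative only)."""
    for vid in (outer_vid, inner_vid):
        if not 0 <= vid < single_tag_space:
            raise ValueError(f"VLAN ID {vid} does not fit in {VLAN_ID_BITS} bits")
    return outer_vid * single_tag_space + inner_vid


print(single_tag_space, double_tag_space)
```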
In addition to expanding the VLAN space, QinQ is applied in other scenarios as the ME network
develops and carriers require more refined operation. The outer and inner VLAN tags can be
used to differentiate users from services. For example, the inner tag represents a user,
whereas the outer tag represents a service. Moreover, when double-tagged QinQ packets traverse
the Internet Service Provider (ISP) network, the inner tag is transmitted transparently, so
QinQ functions as a simple and practical VPN technology that carries private VLAN services
over a public network. It extends services of a core MPLS VPN to the ME network and
implements an end-to-end VPN.
Because QinQ is easy to use, it has been widely applied in ISP networks. For example, it is
used by multiple services in the metro Ethernet. The introduction of selective QinQ (VLAN
stacking) makes QinQ more popular among ISPs. As the metro Ethernet develops, different
vendors propose their own metro Ethernet solutions. QinQ, with its simplicity and
flexibility, plays an important role in these solutions.
5.5.2 Principles
5.5.2.1 Principles
802.1Q-in-802.1Q (QinQ) expands the VLAN space by adding another 802.1Q tag to a
tagged 802.1Q packet. To keep pace with ME network development, QinQ has diversified in
its encapsulation and termination modes and is increasingly applied in refined service
operation.
802.1Q Encapsulation
Field   DA       SA       ETYPE    TAG      LEN/ETYPE  DATA                    FCS
Length  6 bytes  6 bytes  2 bytes  2 bytes  2 bytes    46 bytes to 1500 bytes  4 bytes

QinQ Encapsulation
Field   DA       SA       ETYPE    TAG      ETYPE    TAG      LEN/ETYPE  DATA                    FCS
Length  6 bytes  6 bytes  2 bytes  2 bytes  2 bytes  2 bytes  2 bytes    42 bytes to 1500 bytes  4 bytes
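The two frame layouts above can be sketched with Python's `struct` module. This is a minimal illustration under stated assumptions: both tags use the 802.1Q TPID 0x8100 in the ETYPE position (some deployments use a different outer TPID), the FCS is omitted, and the helper names are hypothetical.

```python
import struct
from typing import List

TPID_8021Q = 0x8100  # assumption: inner and outer tags both use this ETYPE


def vlan_tag(vid: int, pcp: int = 0) -> bytes:
    """Build one 4-byte tag: 2-byte ETYPE plus 2-byte TAG (priority + VLAN ID)."""
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    return struct.pack("!HH", TPID_8021Q, (pcp << 13) | vid)


def build_frame(dst: bytes, src: bytes, vids: List[int],
                ethertype: int, payload: bytes) -> bytes:
    """DA + SA + tags (outermost first) + LEN/ETYPE + DATA, without FCS."""
    tags = b"".join(vlan_tag(v) for v in vids)
    return dst + src + tags + struct.pack("!H", ethertype) + payload


def push_outer_tag(frame: bytes, outer_vid: int) -> bytes:
    """QinQ encapsulation: insert a new tag right after the SA field."""
    return frame[:12] + vlan_tag(outer_vid) + frame[12:]


def read_vids(frame: bytes) -> List[int]:
    """Return the VLAN IDs of all tags, outermost first."""
    vids, offset = [], 12
    while struct.unpack_from("!H", frame, offset)[0] == TPID_8021Q:
        vids.append(struct.unpack_from("!H", frame, offset + 2)[0] & 0x0FFF)
        offset += 4
    return vids


single = build_frame(b"\xff" * 6, bytes.fromhex("00e0fc000001"), [100],
                     0x0800, bytes(46))
double = push_outer_tag(single, 1000)
print(read_vids(single), read_vids(double), len(double) - len(single))
```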
QinQ Encapsulation
QinQ encapsulation adds an 802.1Q tag to an 802.1Q tagged packet on the UPE interfaces
that connect to users. In some cases, QinQ encapsulation can also be performed on a routing
sub-interface.
l QinQ encapsulation on a routing sub-interface
In general, QinQ encapsulation is performed on a switched port. In special situations,
however, QinQ encapsulation can be performed on a routing sub-interface.
When data needs to be transmitted transparently over the MPLS/IP core network by
PWE3/VLL/VSI, the routing sub-interface on the NPE encapsulates packets with an
outer VLAN ID based on the user VLAN ID and accesses VLL/PWE3 through the outer
VLAN. This sub-interface is also called a QinQ stacking sub-interface.
This encapsulation mode is traffic-based. QinQ stacking sub-interfaces support only
L2VPN (PWE3/VLL/VPLS) and do not support Layer 3 forwarding.
l The VLAN IDs deployed at new sites and old sites conflict, but new sites need to
communicate with old sites.
l The VLAN ID planning of each site on the public network is different. As a result, the
VLAN IDs conflict. The sites, however, do not need to communicate.
l The VLAN IDs on both ends of the public network are different.
IP forwarding can be configured on a sub-interface for dot1q VLAN tag termination or sub-
interface for QinQ VLAN tag termination, depending on whether the user packets received by
the NPE carry one or two VLAN tags:
l If the user packets contain one tag, the sub-interface for dot1q VLAN tag termination
must support IP forwarding.
l If the user packets contain double tags, the sub-interface for QinQ VLAN tag termination
must support IP forwarding.
[Figure: dot1q termination at both edges of the ISP MPLS/IP network; user packets carry VLAN tags 100 and 200]
The sub-interface for dot1q VLAN tag termination first identifies the outer VLAN tag and
then generates an ARP entry containing the IP address, MAC address, and outer VLAN tag.
l For the upstream traffic, the termination sub-interface strips the MAC address and the
outer VLAN tag, and searches the routing table to perform Layer 3 forwarding based on
the destination IP address.
l For the downstream traffic, the termination sub-interface encapsulates IP packets with
the MAC address and outer VLAN tag according to ARP entries and then sends IP
packets to the target user.
[Figure: QinQ termination on the NPE and PE at the edges of the ISP MPLS/IP network; user packets carry inner tags 200 and 300 with outer tag 1000]
The sub-interface for QinQ VLAN tag termination first identifies double VLAN tags and then
generates an ARP entry containing the IP address, MAC address, and double VLAN tags.
l For the upstream traffic, the termination sub-interface strips the MAC address and
double VLAN tags, and searches the routing table to perform Layer 3 forwarding based
on the destination IP address.
l For the downstream traffic, the termination sub-interface encapsulates IP packets with
the MAC address and double VLAN tags according to ARP entries and then sends IP
packets to the target user.
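The upstream and downstream handling described for both termination types can be sketched as one small model. This is an illustration only: packets are modeled as dictionaries, the class and method names are hypothetical, and the routing table lookup itself is left out.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class ArpEntry:
    mac: str
    vlan_tags: Tuple[int, ...]  # (outer,) for dot1q, (outer, inner) for QinQ


class TerminationSubInterface:
    """Toy model of a sub-interface for dot1q or QinQ VLAN tag termination."""

    def __init__(self, tag_count: int) -> None:
        if tag_count not in (1, 2):
            raise ValueError("tag_count must be 1 (dot1q) or 2 (QinQ)")
        self.tag_count = tag_count
        self.arp: Dict[str, ArpEntry] = {}

    def learn(self, ip: str, mac: str, vlan_tags: Sequence[int]) -> None:
        """Generate an ARP entry holding the IP address, MAC address, and tags."""
        if len(vlan_tags) != self.tag_count:
            raise ValueError("unexpected number of VLAN tags")
        self.arp[ip] = ArpEntry(mac, tuple(vlan_tags))

    def upstream(self, frame: dict) -> dict:
        """Strip the MAC address and VLAN tags; the result goes to Layer 3 forwarding."""
        return {"dst_ip": frame["dst_ip"], "payload": frame["payload"]}

    def downstream(self, packet: dict) -> dict:
        """Re-encapsulate an IP packet with the MAC address and tags from the ARP entry."""
        entry = self.arp[packet["dst_ip"]]
        return {"dst_mac": entry.mac, "vlan_tags": entry.vlan_tags, **packet}


qinq = TerminationSubInterface(tag_count=2)
qinq.learn("10.1.1.1", "00e0-fc00-0001", [1000, 200])
print(qinq.downstream({"dst_ip": "10.1.1.1", "payload": b"reply"}))
```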
Proxy ARP can be configured for a sub-interface for dot1q VLAN tag termination or sub-
interface for QinQ VLAN tag termination, depending on whether the user packet received by
a PE contains one or two VLAN tags.
l If the user packet contains one tag, the sub-interface that has proxy ARP configured is a
sub-interface for Dot1q VLAN tag termination.
l If the user packet contains double tags, the sub-interface that has proxy ARP configured
is a sub-interface for QinQ VLAN tag termination.
When PC1 and PC3 want to communicate with each other, PC1 sends an ARP request to PC3.
However, because PC1 and PC3 are in different VLANs, PC3 does not receive the ARP
request from PC1.
To resolve this problem, configure proxy ARP on the sub-interface for dot1q VLAN tag
termination.
Figure 5-84 Proxy ARP on a sub-interface for dot1q VLAN tag termination
[Diagram: PE (10.1.1.254/24) with a dot1q termination sub-interface connected to Switch1]
On the network shown in Figure 5-85, PC1, PC2, and PC3 are on the same network segment.
PC1 and PC2 belong to VLAN 100; PC3 belongs to VLAN 200. Switch 1 has selective QinQ
enabled and attaches outer VLAN tag 1000 to the packets from Switch 2 and Switch 3 to the
PE.
When PC1 and PC3 want to communicate with each other, PC1 sends an ARP request to PC3.
However, because PC1 and PC3 are in different VLANs, PC3 does not receive the ARP
request from PC1.
To resolve this problem, enable proxy ARP on the sub-interface for QinQ VLAN tag
termination.
Figure 5-85 Proxy ARP on a sub-interface for QinQ VLAN tag termination
[Diagram: PE (10.1.1.254/24) with a QinQ termination sub-interface for VLAN 1000, connected to Switch1]
Figure 5-86 L3VPN access through a sub-interface for dot1q VLAN tag termination
[Diagram elements: CE1 to CE4 in VPN1 (VLAN100, VLAN200, VLAN300); Switch1 to Switch3; DSLAM1 to DSLAM4; dot1q termination on PE1, PE2, and PE3; ISP backbone network]
Figure 5-87 L3VPN access through a sub-interface for QinQ VLAN tag termination
[Diagram elements: VPN1 (VLAN100, VLAN300); Switch1 to Switch3; DSLAM4]
Sub-interface for QinQ VLAN tag termination access to Pseudo-Wire Emulation Edge to
Edge (PWE3)/VLL means that the sub-interface for QinQ VLAN tag termination is
configured with PWE3/VLL functions. After a range of inner and outer VLAN tags are
configured on the sub-interface for QinQ VLAN tag termination of the PE, users within the
VLAN tag range are allowed to access PWE3/VLL. The packet that carries double tags is
transparently transmitted to the remote end as Layer 2 data for identification and
authentication. The remote end is often a Broadband Remote Access Server (BRAS).
Figure 5-88 shows a typical networking for PWE3/VLL access through the sub-interfaces for
QinQ VLAN tag termination.
Figure 5-88 PWE3/VLL access through the sub-interfaces for QinQ VLAN tag termination
[Diagram elements: CE1, DSLAM1, Switch, PE1, ISP network; packets with inner tags 100, 200, and 300 and outer tag 1000]
On a VPLS network, one Virtual Circuit (VC) link connects only a user's two VLANs that are
distributed in different places. If the user wants to connect multiple VLANs distributed in
different places, multiple VCs are required.
VPLS access can be configured for a sub-interface for dot1q VLAN tag termination or sub-
interface for QinQ VLAN tag termination, depending on whether the user packet received by
a PE contains one or two VLAN tags.
l If a user packet carries one VLAN tag, configure a sub-interface for dot1q VLAN tag
termination for VPLS access.
l If a user packet carries double VLAN tags, configure a sub-interface for QinQ VLAN
tag termination for VPLS access.
Figure 5-89 VPLS access through a sub-interface for dot1q VLAN tag termination
[Diagram elements: CE1 and CE2 (each with VLAN100 and VLAN200), CE3, PE3 with VSI1, dot1q termination; packet tags 100, 200, and 3000]
VPLS supports point-to-multipoint (P2MP) connectivity and forwards data by learning MAC
addresses. In this case, VPLS access through a sub-interface for dot1q VLAN tag termination
relies on MAC address learning on the basis of a single VLAN tag. Note that there are no
restrictions on VLAN tags for VPLS access.
Figure 5-90 VPLS access through a sub-interface for QinQ VLAN tag termination
[Diagram elements: CE1 and CE2 (each with VLAN100 and VLAN200), CE3, PE3 with VSI1, QinQ termination; packet tags 100, 200, and 3000 with outer tag 1000]
VPLS supports P2MP connectivity and forwards data by learning MAC addresses. In this case,
VPLS access through a sub-interface for QinQ VLAN tag termination relies on MAC address
learning on the basis of double VLAN tags. Note that there are no restrictions on VLAN tags
for VPLS access.
[Figures: VPLS networking over the ISP network and the IP/MPLS network with PE1, PE2, and PE3 (VSI1) and CE1, CE2, and CE3; user VLANs 100 and 200; inner tags 1~200 with outer tag 1000]
5.5.3 Applications
[Figure 5-93: UPE and NPEs (with VRRP) between the metro Ethernet and the core network. PVC101, PVC301, and PVC501 carry VLAN 101, VLAN 301, and VLAN 501. Inner VLANs 1XX map to outer VLAN 1000 or 1001, inner VLANs 3XX to outer VLAN 2000 or 2001, and inner VLANs 5XX to outer VLAN 3000 or 3001.]
On the network shown in Figure 5-93, DSLAMs support multiple Permanent Virtual Channel
(PVC) access. One user uses multiple services, such as HSI, IPTV and VoIP.
PVCs are used to carry services that are assigned with different VLAN ID ranges. Figure
5-93 lists the VLAN ID ranges for each service.
If a user needs to use the VoIP service, user VoIP packets are sent to a DSLAM over a
specified PVC and assigned with VLAN ID 301. When the packets reach the UPE, an outer
VLAN ID (for example, 2000) is added to the packets. The inner VLAN ID (301) represents
the user, and the outer VLAN ID (2000) represents the VoIP service (the DSLAM location
can also be marked if different DSLAMs add different VLAN tags to packets). The UPE then
sends the VoIP packets to the NPE where the double VLAN tags are terminated. Then, the
NPE sends the packets to an IP core network or a Virtual Private Network (VPN).
HSI and IPTV services are processed in the same way. The difference is that QinQ
termination of HSI services is implemented on the BRAS.
The NPE can also perform HQoS scheduling based on the double tags and generate a
Dynamic Host Configuration Protocol (DHCP) binding table to avoid network attacks. In
addition, the NPE can implement DHCP authentication based on the double tags or other
information and enable QinQ Virtual Router Redundancy Protocol (VRRP) to ensure reliable
service access.
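The way the UPE chooses an outer VLAN ID from the inner (user) VLAN ID can be sketched as a range lookup. The ranges below are an assumption read from the figure labels (inner 1XX, 3XX, and 5XX mapped to outer 1000, 2000, and 3000); only the VoIP pair (301, 2000) is confirmed by the text, and the function names are hypothetical.

```python
from typing import Optional, Tuple

# (inner VLAN ID range, outer VLAN ID); assumed from the figure labels.
STACKING_RULES = [
    (range(100, 200), 1000),
    (range(300, 400), 2000),  # VoIP example in the text: inner 301 -> outer 2000
    (range(500, 600), 3000),
]


def outer_vlan_for(inner_vid: int) -> Optional[int]:
    """Return the outer VLAN ID for an inner VLAN ID, or None if no rule matches."""
    for inner_range, outer_vid in STACKING_RULES:
        if inner_vid in inner_range:
            return outer_vid
    return None


def stack(inner_vid: int) -> Tuple[int, int]:
    """Return the (outer, inner) tag pair the UPE would send toward the NPE."""
    outer_vid = outer_vlan_for(inner_vid)
    if outer_vid is None:
        raise LookupError(f"no stacking rule for inner VLAN {inner_vid}")
    return (outer_vid, inner_vid)


print(stack(301))
```

Keeping the rules in a table rather than in code paths mirrors how the outer tag identifies the service while the inner tag continues to identify the user.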
[Figure: two ME networks connected over VPLS; each site has Finance (VLAN100), Marketing (VLAN200), and Others (VLAN300)]
The carrier uses VPLS on the MPLS/IP core network and QinQ on the ME network. Each site
is configured with three VLANs. VLANs 100, 200, and 300 represent the finance, marketing,
and other departments respectively. The UPE encapsulates packets with outer VLAN 1000.
(Different UPEs can add different outer VLAN tags.) The Virtual Switching Instance (VSI)
on the NPE is in symmetry mode. After the configurations, only users of the same VLAN in
different sites can communicate with each other.
QinQ interface: An interface that can process VLAN frames with a single tag (dot1q
termination or VLAN mapping) or with double tags (QinQ termination or VLAN stacking).
QinQ termination sub-interface: An interface that can identify single or double VLAN tags
and then strip the tags or send the packets according to the subsequent forwarding.
5.6 RRPP
NOTE
Only ATN 910/ATN 910B/ATN 910I/ATN 950B support this feature.
5.6.1 Introduction
Definition
The Rapid Ring Protection Protocol (RRPP) is a Huawei-specific link layer protocol that
prevents loops on an Ethernet ring network. RRPP enables devices to detect loops by
exchanging RRPP packets and to eliminate loops by blocking certain interfaces.
Purpose
Most MANs and LANs use ring networks to achieve high reliability. A node or link failure
on the ring does not affect services if a backup link is deployed.
When a device or link fails, it takes some time for traffic to switch to a backup device or
link. To reduce convergence time and remove the impact of network scale on convergence
time, Huawei developed RRPP. Compared with other Ethernet ring technologies, RRPP has
the following features, as shown in Table 5-31:
l The convergence speed is rapid: The convergence time of Layer 2 traffic is less than 50
milliseconds (ms) and that of Layer 3 traffic is less than 200 ms.
l Convergence time is independent of the number of nodes on the ring network. Therefore,
RRPP can be applied to a network with a large diameter.
l RRPP can prevent broadcast storms caused by loops when an Ethernet ring network is
complete.
l On an Ethernet ring network, when a link is torn down, a backup link immediately starts
to resume the normal communication between nodes.
l The cost is low.
Token ring
Description: Token ring is the first ring technology on data communication networks and
adopts a single-direction ring structure based on MAC layer protocols.
Disadvantages: Token ring does not have a self-healing capability. Token ring is low speed,
so it applies only to local area networks (LANs).

Fiber Distributed Data Interface (FDDI)
Description: FDDI is an enhancement of token ring and adopts the double-ring structure.
FDDI uses a token to control a ring network. FDDI uses fibers to transmit data, providing
higher performance and efficiency compared with token ring.
Disadvantages: FDDI does not have a self-healing capability. FDDI does not consider
bandwidth consumption because FDDI uses the source address stripping technique.
5.6.2 Principles
[Figure: RRPP domain with an RRPP major ring and RRPP sub-ring 1, showing the master node, transit node, and edge node (devices CX-A and CX-B)]
l RRPP domain
Each RRPP domain is uniquely identified by an integer ID.
An RRPP domain consists of a group of switches that are mutually connected and
configured with the same domain ID and control VLAN.
Network elements that form an RRPP domain are as follows:
– RRPP major ring
– RRPP sub-ring
– Control VLAN
– Master node
– Transit node
– Edge node
– Assistant edge node
– Common port
– Edge port
– Primary port
– Secondary port
l RRPP ring
Physically, an RRPP ring corresponds to an Ethernet ring topology. Each RRPP ring is a
part of an RRPP domain.
An RRPP domain can comprise a single RRPP ring or a major ring plus multiple sub-
rings.
Sub-ring protocol packets are transmitted through the major ring as data packets; major
ring protocol packets are transmitted only within the major ring.
NOTE
An RRPP domain can only have one RRPP major ring.
l Control VLAN
In an RRPP domain, a control VLAN is used to transmit only RRPP packets, whereas a
data VLAN is used to transmit data packets. A data VLAN can contain both the RRPP
and non-RRPP ports.
Each RRPP domain is configured with two control VLANs: major control VLAN and
sub-control VLAN. A major control VLAN belongs to a major ring, whereas a sub-
control VLAN belongs to a sub-ring. During the configuration, you need only to specify
a major control VLAN, and the system will automatically take the VLAN with an ID 1
greater than the major control VLAN ID as a sub-control VLAN.
Protocol packets of the major ring are transmitted in the major control VLAN; protocol
packets of the sub-ring are transmitted in the sub-control VLAN. VLANIF interfaces cannot
be configured for the major control VLAN or the sub-control VLAN.
On each switch, the port connected to an RRPP ring network belongs to a control
VLAN.
l Node type
Each device on an Ethernet ring is a node. Nodes on the RRPP ring are classified into
the following types:
– Master node
The master node determines how to handle topology changes. Each RRPP ring must
have only one master node.
Any device on the Ethernet ring can serve as the master node.
The status of the master node can be either Complete or Failed.
When all links on the ring network are in the Up state and the master node can
receive Hello packets sent by itself over the secondary port, the master node is in
the Complete state.
The status of the master node represents the status of the RRPP ring. When the
master node is in the Complete state, the RRPP ring is also in the Complete state. In
this situation, the master node blocks the secondary port to prevent data packets
from forming broadcast loops on the ring topology. After being blocked, the
secondary port can receive RRPP protocol packets, but cannot transmit data
packets.
When a link on the ring network is in the Down state, the master node is in the
Failed state. In this situation, the master node unblocks the secondary port to ensure
uninterrupted communication between nodes on the ring network.
– Transit node
On an RRPP ring, all nodes except the master node are transit nodes. Each transit
node monitors the status of its directly connected RRPP link and notifies the master
node of any changes in link status.
The status of transit nodes can be Link-Up, Link-Down, or Preforwarding.
When both the primary and secondary ports of a transit node are in the Up state, the
transit node is in the Link-Up state.
When either the primary port or secondary port of a transit node is in the Down
state, the transit node is in the Link-Down state.
When either the primary port or secondary port of a transit node is in the Blocked
state, the transit node is in the Preforwarding state.
As shown in Figure 5-96, when a transit node in the Link-Up state detects that the
link of the primary port or secondary port turns Down, the transit node switches to
the Link-Down state and sends a Link-Down packet to notify the master node.
The transit node never directly switches back to the Link-Up state from the Link-
Down state. When the link on a port of the transit node in the Link-Down state turns
Up and the primary port and secondary port return to the Up state, the transit node
switches to the Preforwarding state and blocks the recovered port. When the
primary and secondary ports go Up, the master node does not immediately detect
the change, and the secondary port therefore remains unblocked. If the transit node
immediately switches back to the Link-Up state, broadcast loops formed by data
packets occur on the ring network. Therefore, the transit node first enters
Preforwarding from the Link-Down state.
When a port on the transit node in the Preforwarding state goes Down, the transit
node enters the Link-Down state. When an interface on the transit node in the
Preforwarding state goes Up and the transit node receives a COMPLETE-FLUSH-
FDB packet from the master node, the transit node enters the Link-Up state. If the
COMPLETE-FLUSH-FDB packet is lost during the transmission, RRPP provides a
backup mechanism to unblock temporarily blocked ports and trigger the state
transition. Specifically, the transit node automatically changes to Link-Up and
unblocks the temporarily blocked port.
[Figure 5-96: state transitions of a transit node, including the Link-Up state]
The primary and secondary ports of the master node perform different functions.
The master node sends a Hello packet from the primary port. If the secondary port
can receive this packet, the RRPP ring of the node is complete. The master node
must block the secondary port to prevent a data loop.
However, if the packet is not received within the specified period, the RRPP ring is
faulty. The master node must unblock the secondary port to guarantee normal
communication between nodes on the ring.
If the secondary port on the master node of the major ring is blocked, both the data
packets and protocol packets of the sub-rings are prevented from passing through
the port. When the secondary port is unblocked, both the data packets and protocol
packets of the sub-rings are permitted to pass through the port. On the major ring,
protocol packets of the sub-rings are processed as data packets.
The primary and secondary ports of the transit node provide the same function.
– Common port and edge port
On an edge node or an assistant edge node, the port shared by the sub-ring and
major ring is called the common port. The port only on the sub-ring is called the
edge port.
A common port is regarded as a port on the major ring and belongs to both the
major control VLAN and the sub-control VLAN. The RRPP port on the sub-ring
only belongs to the sub-control VLAN. The major ring is regarded as a logical node
of the sub-ring, and packets of the sub-ring are transparently transmitted through the
major ring. However, the packets of the major ring are transmitted only in the major
ring.
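The transit node state transitions described in the node type list above can be sketched as a small state machine. This is a simplified illustration, not the protocol implementation: the class and method names are hypothetical, timers are reduced to an explicit `on_backup_timeout` call, and a LINK-DOWN packet is recorded only for the Link-Up to Link-Down transition that the text describes.

```python
from enum import Enum


class TransitState(Enum):
    LINK_UP = "Link-Up"
    LINK_DOWN = "Link-Down"
    PREFORWARDING = "Preforwarding"


class TransitNode:
    def __init__(self) -> None:
        self.state = TransitState.LINK_UP
        self.ports_up = {"primary": True, "secondary": True}
        self.blocked = set()  # temporarily blocked ports
        self.sent = []        # packets sent toward the master node

    def port_down(self, port: str) -> None:
        self.ports_up[port] = False
        self.blocked.discard(port)
        if self.state is TransitState.LINK_UP:
            self.sent.append("LINK-DOWN")  # notify the master node
        self.state = TransitState.LINK_DOWN

    def port_up(self, port: str) -> None:
        self.ports_up[port] = True
        if self.state is TransitState.LINK_DOWN and all(self.ports_up.values()):
            # Never jump straight back to Link-Up: the master's secondary port
            # may still be unblocked, so block the recovered port first.
            self.blocked.add(port)
            self.state = TransitState.PREFORWARDING

    def on_complete_flush_fdb(self) -> None:
        if self.state is TransitState.PREFORWARDING:
            self.blocked.clear()
            self.state = TransitState.LINK_UP

    def on_backup_timeout(self) -> None:
        # Backup mechanism if COMPLETE-FLUSH-FDB is lost: same transition.
        self.on_complete_flush_fdb()


node = TransitNode()
node.port_down("primary")
node.port_up("primary")
print(node.state, node.blocked)
node.on_complete_flush_fdb()
print(node.state, node.blocked)
```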
RRPP Packets
HEALTH (HELLO): A packet sent from the master node to detect whether a loop exists on a
network.
LINK-DOWN: A packet sent from a transit, edge, or assistant edge node to notify the master
node that a port has gone Down and the loop has disappeared.
COMMON-FLUSH-FDB: A packet sent from the master node to instruct the transit, edge, or
assistant edge node to update its MAC address forwarding table, ARP entries, and ND entries.
COMPLETE-FLUSH-FDB: A packet sent from the master node to instruct the transit, edge, or
assistant edge node to update its MAC address forwarding table, ARP entries, and ND entries.
In addition, this packet instructs the transit node to unblock the temporarily blocked ports.
MAJOR-FAULT: A packet sent from an assistant edge node to notify the edge node that the
major ring in the domain fails if the assistant edge node does not receive the Edge-Hello
packet from the edge port within a specified period.
0 7 8 15 16 23 24 31 32 39 40 47
Destination MAC address (6 bytes)
Source MAC address (6 bytes)
EtherType PRI VLAN ID Frame Length
DSAP/SSAP CONTROL OUI = 0x00e02b
0x00bb 0x99 0x0b RRPP Length
RRPP_VER RRPP TYPE Domain ID Ring ID
0x0000 SYSTEM_MAC_ADDR (6 bytes)
HELLO_TIMER FAIL_TIMER
0x00 LEVEL HELLO_SEQ 0x0000
RESERVED(0x000000000000)
RESERVED(0x000000000000)
RESERVED(0x000000000000)
RESERVED(0x000000000000)
RESERVED(0x000000000000)
RESERVED(0x000000000000)
l Destination MAC Address: indicates the destination MAC address of an RRPP packet.
This field occupies 48 bits.
l Source MAC Address: indicates the source MAC address of an RRPP packet. This field
occupies 48 bits and is the bridge MAC address of a device.
l EtherType: indicates the encapsulation type. This field occupies 16 bits and has a fixed
value of 0x8100 for tagged encapsulation.
l PRI: indicates the priority of Class of Service (COS). This field occupies 4 bits and has a
fixed value of 0xe.
l VLAN ID: indicates the ID of a VLAN to which the packet belongs. This field occupies
12 bits.
l Frame Length: indicates the length of the Ethernet frame. This field occupies 16 bits and
has a fixed value of 0x0048.
l DSAP/SSAP: indicates the destination service access point/source service access point.
This field occupies 16 bits and has a fixed value of 0xaaaa.
l CONTROL: an 8-bit field of no significance. This field has a fixed value of 0x03.
l OUI: a 24-bit field of no significance. This field has a fixed value of 0x00e02b.
l RRPP_LENGTH: indicates the length of an RRPP data unit. This field occupies 16 bits
and has a fixed value of 0x0040.
l RRPP_VER: indicates the version of an RRPP packet. This field occupies 8 bits, and
the current version is 0x01.
l RRPP TYPE: indicates the type of an RRPP packet. This field occupies 8 bits.
– HEALTH = 0x05
– COMPLETE-FLUSH-FDB = 0x06
– COMMON-FLUSH-FDB = 0x07
– LINK-DOWN = 0x08
– EDGE-HELLO = 0x0a
– MAJOR-FAULT= 0x0b
l DOMAIN_ID: indicates the ID of the RRPP domain to which the packet belongs. This
field occupies 16 bits.
l RING_ID: indicates the ID of the RRPP ring to which the packet belongs. This field
occupies 16 bits.
l SYSTEM_MAC_ADDR: indicates the bridge MAC address from which the packet is
sent. This field occupies 48 bits.
l HELLO_TIMER: indicates the timeout period of the Hello timer on the node that sends
the packet, in seconds. This field occupies 16 bits.
l FAIL_TIMER: indicates the timeout period of the Fail timer on the node that sends the
packet, in seconds. This field occupies 16 bits.
l LEVEL: indicates the level of the RRPP ring to which the packet belongs. This field
occupies 8 bits.
l HELLO_SEQ: indicates the sequence number of the Hello packet. This field occupies 16
bits.
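The field list above can be turned into a pack/parse sketch with Python's `struct` module. The byte layout is an assumption: the 6-byte rows of the format figure are packed contiguously, with SYSTEM_MAC_ADDR spilling from one row into the next, which gives 90 bytes in total and leaves exactly 0x48 bytes after the Frame Length field. The function names and the destination MAC address used in the example are hypothetical.

```python
import struct

# Assumed contiguous layout of the rows in the format figure (90 bytes).
RRPP_FORMAT = "!6s6sHHH" "HB3s" "HBBH" "BBHH" "H6sHH" "BBHH" "36s"

RRPP_TYPES = {0x05: "HEALTH", 0x06: "COMPLETE-FLUSH-FDB", 0x07: "COMMON-FLUSH-FDB",
              0x08: "LINK-DOWN", 0x0A: "EDGE-HELLO", 0x0B: "MAJOR-FAULT"}


def build_rrpp(dst_mac: bytes, bridge_mac: bytes, vlan_id: int, rrpp_type: int,
               domain_id: int, ring_id: int, hello_timer: int, fail_timer: int,
               level: int = 0, hello_seq: int = 0) -> bytes:
    """Pack an RRPP packet using the fixed values from the field list."""
    tci = (0xE << 12) | (vlan_id & 0x0FFF)  # PRI (4 bits, 0xe) + VLAN ID (12 bits)
    return struct.pack(
        RRPP_FORMAT,
        dst_mac, bridge_mac, 0x8100, tci, 0x0048,        # MACs, EtherType, tag, length
        0xAAAA, 0x03, bytes.fromhex("00e02b"),           # DSAP/SSAP, CONTROL, OUI
        0x00BB, 0x99, 0x0B, 0x0040,                      # fixed bytes, RRPP_LENGTH
        0x01, rrpp_type, domain_id, ring_id,             # RRPP_VER, RRPP TYPE, IDs
        0x0000, bridge_mac, hello_timer, fail_timer,     # SYSTEM_MAC_ADDR, timers
        0x00, level, hello_seq, 0x0000,                  # LEVEL, HELLO_SEQ
        bytes(36))                                       # six RESERVED rows


def parse_rrpp(frame: bytes) -> dict:
    """Unpack the variable fields of an RRPP packet."""
    f = struct.unpack(RRPP_FORMAT, frame)
    return {"vlan_id": f[3] & 0x0FFF, "type": RRPP_TYPES.get(f[13], "UNKNOWN"),
            "domain_id": f[14], "ring_id": f[15], "system_mac": f[17],
            "hello_timer": f[18], "fail_timer": f[19],
            "level": f[21], "hello_seq": f[22]}


hello = build_rrpp(bytes.fromhex("010203040506"),  # hypothetical destination MAC
                   bytes.fromhex("00e0fc000001"), vlan_id=4000, rrpp_type=0x05,
                   domain_id=1, ring_id=2, hello_timer=1, fail_timer=6, hello_seq=7)
print(len(hello), parse_rrpp(hello)["type"])
```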
Polling Mechanism
l Hello timer and fail timer
When RRPP detects the link status of the Ethernet ring through the Polling mechanism,
the master node sends Hello packets according to the Hello timer and checks whether the
secondary port receives Hello packets within a set period according to the Fail timer.
Then, the master node determines whether to unblock the secondary port.
– The value of the Hello timer specifies the interval at which the master node sends
Hello packets from the primary port.
– The value of the Fail timer specifies the maximum delay between the primary port
sending a Hello packet and the secondary port receiving it.
When the link is faulty, RRPP fast convergence enables the transit node in the Link-Up
state to immediately instruct the master node to unblock the secondary port through the
Link-Down packet. When the link recovers, the master node instructs the transit node to
[Figure: the master node changes its secondary port from Block to Forward after a LINK-DOWN packet; legend: P = primary port, S = secondary port]
Checking the Channel Status of the Sub-Ring Protocol Packets on the Major Ring
On a network where multiple sub-rings intersect the major ring, check the channel status of
the sub-ring protocol packets on the major ring to prevent loops among sub-rings after the
master nodes on the sub-rings unblock their secondary ports.
In Figure 5-99, if the common link between the major ring and sub-ring is faulty and at least
one non-common link is faulty, the master node of each sub-ring unblocks its secondary port
("S" in the figure) because the secondary port no longer receives the Hello packet. Broadcast
loops (blue dashed lines in the figure) may then occur between sub-rings. To prevent loops,
check the channel status of the sub-ring protocol packets on the major ring.
[Figure 5-99: major ring with sub-ring 1 and sub-ring 2, showing the master, transit, edge, and assistant edge nodes and blocked ports]
[Figure: EDGE-HELLO packets on the major ring; legend: P = primary port, S = secondary port]
If the assistant edge node receives the Edge-Hello packets within the specified period,
the packet channel is normal. Otherwise, the channel is faulty.
2. The channel breaks off and the edge node blocks the edge port.
After the assistant edge node detects that the channel for sub-ring protocol packets
breaks off, the assistant edge node immediately sends the Major-Fault packet to the edge
node through the sub-ring. After receiving the Major-Fault packet, the edge node blocks
its edge port.
As shown in Figure 5-101, the assistant edge node sends the Major-Fault packet to the
edge node through the sub-ring.
Figure 5-101 Blocking the edge port in response to the Major-Fault packet received on edge node
[Diagram legend: MAJOR-FAULT packet, data packet, P = primary port, S = secondary port]
3. The master node of the sub-ring unblocks the secondary port after the Fail timer
expires.
After the edge node blocks its edge port, the channel for sub-ring protocol packets is
broken because of the failure in the major ring. Therefore, the master node of the sub-ring
cannot receive the Hello packet within the specified period. The master node therefore
enters the Failed state and unblocks the secondary port.
As shown in Figure 5-102, the edge node blocks its edge port. The master node of the
sub-ring unblocks the secondary port that is blocked in Figure 5-101.
Figure 5-102 Sub-ring failed due to the blocked channel on the major ring
[Diagram legend: data packet, P = primary port, S = secondary port]
[Figure: HELLO packets on the sub-ring with blocked ports; legend: P = primary port, S = secondary port]
In Figure 5-104, the master node on the sub-ring sends the COMPLETE-FLUSH-FDB
packets. After receiving the packets, the edge node unblocks the edge port.
Figure 5-104 Unblocking the edge port on the edge node of the sub-ring
[Diagram legend: COMPLETE-FLUSH-FDB packet, data packet, P = primary port, S = secondary port]
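The interaction between the assistant edge node and the edge node described in this procedure can be sketched as follows. This is a simplified model under stated assumptions: time is passed in explicitly, the timeout value is arbitrary, and the class and method names are hypothetical.

```python
from typing import Optional


class AssistantEdgeNode:
    """Watches the Edge-Hello channel that crosses the major ring."""

    def __init__(self, edge_hello_timeout: float) -> None:
        self.edge_hello_timeout = edge_hello_timeout
        self.last_edge_hello = 0.0

    def on_edge_hello(self, now: float) -> None:
        self.last_edge_hello = now

    def check_channel(self, now: float) -> Optional[str]:
        """Return the packet to send over the sub-ring, if any."""
        if now - self.last_edge_hello > self.edge_hello_timeout:
            return "MAJOR-FAULT"  # the channel on the major ring is broken
        return None


class EdgeNode:
    def __init__(self) -> None:
        self.edge_port_blocked = False

    def on_major_fault(self) -> None:
        # Block the edge port so that sub-rings cannot form a loop.
        self.edge_port_blocked = True

    def on_complete_flush_fdb(self) -> None:
        # The sub-ring master reports a complete ring: unblock the edge port.
        self.edge_port_blocked = False


assistant = AssistantEdgeNode(edge_hello_timeout=3.0)  # hypothetical timeout
edge = EdgeNode()
assistant.on_edge_hello(now=1.0)
if assistant.check_channel(now=10.0) == "MAJOR-FAULT":
    edge.on_major_fault()
print(edge.edge_port_blocked)
```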
[Figure 5-105: ring in the Complete state; the master node sends HELLO packets and blocks its secondary port; legend: P = primary port, S = secondary port]
In Figure 5-105,
– If all links on the ring are in the Up state, the RRPP ring is in the Complete state.
The status of the master node reflects the health status of the ring.
– When the ring is in the Complete state, the master node blocks its secondary port to
prevent a broadcast loop.
– The master node periodically sends a Hello packet from the primary port. The Hello
packet is transmitted through all transit nodes and reaches the secondary port of the
master node.
2. A link fails.
[Figure 5-106: a transit node sends a LINK-DOWN packet to the master node; legend: P = primary port, S = secondary port]
In Figure 5-106,
– When a link on an RRPP port of the transit node is faulty, the node sends a Link-
Down packet to inform the master node.
– When the master node receives the Link-Down packet, it changes its status from
Complete to Failed and unblocks the secondary port.
If the Link-Down packet is lost during transmission, the master node relies on the
Polling mechanism to restore communication between nodes. If the secondary port
of the master node does not receive a Hello packet from the primary port within
before the Fail timer expires, the master node will also discover the fault and
unblock the secondary port.
– When the network topology changes, the master node updates the forwarding
database (FDB) to ensure that packets can be sent to the correct destination. In
addition, the master node sends a COMMON-FLUSH-FDB packet from the
primary port and secondary port to instruct all transit nodes to update their FDBs, as
shown in Figure 5-107.
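The failure handling described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the ATN implementation: the class name `MasterNode`, its method names, and the Fail timer value are hypothetical, and only the Complete-to-Failed transition described in this section is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class MasterNode:
    """Toy model of an RRPP master node (names and timer value are hypothetical)."""
    fail_timer: float = 6.0            # assumed value, in seconds
    state: str = "Complete"
    secondary_blocked: bool = True
    last_hello: float = 0.0
    sent: list = field(default_factory=list)   # control packets the node would send

    def _enter_failed(self) -> None:
        # Complete -> Failed: unblock the secondary port and ask transit
        # nodes to refresh their FDBs.
        if self.state != "Failed":
            self.state = "Failed"
            self.secondary_blocked = False
            self.sent.append("COMMON-FLUSH-FDB")

    def on_link_down(self) -> None:
        """A transit node reported a faulty link with a Link-Down packet."""
        self._enter_failed()

    def on_hello_at_secondary(self, now: float) -> None:
        """The node's own Hello packet arrived at the secondary port."""
        self.last_hello = now

    def tick(self, now: float) -> None:
        """Polling fallback: no Hello before the Fail timer expired."""
        if now - self.last_hello >= self.fail_timer:
            self._enter_failed()
```

Calling `tick()` periodically covers the case in which the Link-Down packet is lost: the node still reaches the Failed state once the Fail timer runs out.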
Legend: P = primary port; S = secondary port. Packets shown: COMMON-FLUSH-FDB, data packet.
Legend: P = primary port; S = secondary port (blocked). Packets shown: COMPLETE-FLUSH-FDB, data packet.
5.6.3 Applications
Single RRPP Ring
Figure 5-109 shows the networking of a single RRPP ring. Normally, data flow is transmitted
along the path of Transit 1 -> Transit 2 -> Master. If the link between Transit 1 and Transit 2
fails, the path of the data flow on the RRPP ring changes.
Figure elements: RRPP domain with Master, Transit 1, Transit 2, and Transit 3; NodeBs; MSE/NPE; IP/MPLS core.
Legend: P = primary port; S = secondary port (blocked); data packet.
l One layer is the aggregation layer between the aggregation devices PE-AGGs, for
example, RRPP Domain 1 in Figure 5-110.
l The other layer is the access layer between PE-AGGs and UPEs, such as RRPP Domain
2 and RRPP Domain 3 in Figure 5-110.
As shown in Figure 5-110, in this networking, tangent RRPP rings can be adopted. That is,
the aggregation layer is the RRPP major ring; the access layer is the RRPP sub-ring.
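Figure elements: RRPP Domain 1 (PE-AGGs acting as Master, Transit 1, and Transit 2; NPE; IP/MPLS core); RRPP Domain 2 and RRPP Domain 3 (UPEs and PE-AGGs, each domain with its own master).
Legend: P = primary port; S = secondary port (blocked). PE-AGG: PE-aggregation.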
Two tangent rings cannot belong to the same RRPP domain. The tangency point is
configured in both domains. The master node on a ring can be the tangency point.
For multiple tangent RRPP rings, the failure in one ring does not affect other domains. The
convergence process of RRPP rings in a domain is the same as that of a single ring.
Figure elements: PE1, PE2, PE3, and PE4 in a VPLS network; CE1 and CE2; an RRPP ring with a master node.
Note in figure: CE2 does not support local switching.
CE2 accesses PE2 and PE4 through Layer 2 interfaces. PE2 and PE4 connect at Layer 2. CE2
supports port isolation within a VLAN and does not support local switching. Therefore, CE2,
PE2, and PE4 cannot form a ring.
PE1, PE2, PE3, and PE4 set up a VPLS network. As a result, the ring CE1 - PE1 - PE3 - CE1
forms. Enabling RRPP on CE1, PE1, and PE3 eliminates the loop.
RRPP Rapid Ring Protection Protocol. A link layer protocol specially used to prevent
loops on an Ethernet ring network. Devices running RRPP discover loops on the
network by exchanging information with each other, and block certain interfaces
to eliminate loops.
MSTP Multiple Spanning Tree Protocol. A spanning tree protocol defined in IEEE 802.1s. It
introduces concepts of region and instance. Based on different requirements,
MSTP divides a big network into regions where multiple spanning tree instances
(MSTIs) are created. These MSTIs are mapped to virtual LANs (VLANs), and
bridge protocol data units (BPDUs) are transmitted between network bridges.
VPLS Virtual Private LAN Service. A type of point-to-multipoint service used in public
networks. VPLS ensures that isolated user sites can be connected through
MAN/WAN and two sites can communicate as if they were in a LAN.
FDB Forwarding database. A database that includes entries for guiding multicast data
forwarding. FDBs can be layer 2 or layer 3. The layer 2 FDB refers to the MAC
table, which provides information about the MAC address and outbound interface
and directs layer 2 forwarding. The layer 3 FDB refers to the ARP table, which
provides information about the IP address and outbound interface and directs layer
3 forwarding.
5.7 LLDP
5.7.1 Introduction
Definition
The Link Layer Discovery Protocol (LLDP) is a Layer 2 discovery protocol defined in the
IEEE 802.1ab standard. Each LLDP interface stores local status information in the standard
Purpose
The Ethernet technology is widely used on the Local Area Network (LAN) and Metropolitan
Area Network (MAN). Network scale expansion requires enhanced Ethernet network
management capabilities, such as the capabilities to automatically obtain the topology of
interconnected devices and solve configuration conflicts between different devices.
Currently, the NMS uses the automated discovery function to trace topology changes. Most
NMSs can only analyze topologies up to the network layer. The information obtained by these
NMSs concerns only basic events such as adding or deleting devices. The NMSs cannot
obtain information about interfaces through which a device connects to other devices. That is
to say, the NMSs cannot locate a device position or describe the current network topology.
LLDP is introduced to address these problems. LLDP provides information about device
positions and the interfaces through which one device connects to other devices. In addition,
LLDP discovers the paths between NEs, such as a client, device, application server, and
network server.
Benefits
LLDP improves O&M efficiency by allowing an NMS to rapidly obtain Layer 2 network
topologies and topology changes.
5.7.2 Principles
LLDP Packets
LLDP packets are Ethernet packets encapsulated with LLDP data units (LLDPDUs). LLDP
packets support two encapsulation modes: Ethernet II and Subnetwork Access Protocol
(SNAP). Currently, the versatile routing platform (ATN) supports the Ethernet II
encapsulation mode. Figure 5-113 shows the format of an Ethernet II LLDP packet.
Field Description
Source MAC address An interface MAC address or a bridge MAC address for a device
(The interface MAC address takes precedence over the bridge
MAC address.)
LLDPDU
An LLDPDU is a data unit encapsulated in the data field of an LLDP packet.
LLDP requires that each LLDPDU carry a maximum of 28 types of TLVs and that each
LLDPDU start with Chassis ID TLV, Port ID TLV, and Time to Live TLV, and end with
End of LLDPDU TLV. Other types of TLVs are optional.
TLV
A TLV is the smallest unit of an LLDPDU and indicates an object's type, length, and value.
For example, a device ID is carried in Chassis ID TLV, interface ID in Port ID TLV, and
network management address in Management Address TLV.
l TLV type indicates the type of a TLV. This field occupies seven bits. Each TLV type has
a unique value. For example, the value of End of LLDPDU TLV is 0 and the value of
Chassis ID TLV is 1.
l TLV information string length indicates the length of the TLV information. This field
occupies 9 bits.
l TLV information string indicates TLV information. This field occupies a maximum of
511 bytes.
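The TLV layout above (a 7-bit type and a 9-bit length sharing a 2-byte header, followed by up to 511 bytes of information) can be illustrated with a short encoder and decoder. This is a sketch for illustration only; the function names `pack_tlv` and `unpack_tlvs` are assumptions and do not come from any product API.

```python
import struct


def pack_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV: 7-bit type, 9-bit length, then the information string."""
    if not 0 <= tlv_type <= 0x7F:
        raise ValueError("TLV type must fit in 7 bits")
    if len(value) > 511:
        raise ValueError("TLV information string is limited to 511 bytes")
    header = (tlv_type << 9) | len(value)
    return struct.pack("!H", header) + value


def unpack_tlvs(data: bytes) -> list:
    """Decode consecutive TLVs, stopping after End of LLDPDU TLV (type 0)."""
    tlvs = []
    offset = 0
    while offset + 2 <= len(data):
        (header,) = struct.unpack_from("!H", data, offset)
        tlv_type = header >> 9        # upper 7 bits
        length = header & 0x1FF       # lower 9 bits
        value = data[offset + 2:offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV")
        tlvs.append((tlv_type, value))
        offset += 2 + length
        if tlv_type == 0:
            break
    return tlvs
```

For example, `pack_tlv(1, value)` produces a Chassis ID TLV header whose upper seven bits hold the value 1, and `pack_tlv(0, b"")` produces the two zero bytes of End of LLDPDU TLV.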
Figure elements: NMS; ATN A and ATN B.
Legend: SNMP packets; LLDPDU frames.
When LLDP is enabled on both ATNA and ATNB, LLDP discovers a network topology as
follows:
1. ATNA sends its status information to ATNB using LLDPDUs.
2. ATNB analyzes the received LLDPDUs and stores the analysis result in its LLDP remote
system MIB so that the NMS can extract the network topology information.
3. ATNB also sends its status information to ATNA.
4. ATNA analyzes the received LLDPDUs and stores the analysis result in its LLDP remote
system MIB so that the NMS can extract the network topology information.
5. The NMS extracts local information and neighbor information from ATNA and ATNB.
The NMS then analyzes the information and determines the network topology.
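Step 5 can be pictured as merging the neighbor entries collected from each device into a set of links. The sketch below is a simplified illustration: the data layout, the function name `build_topology`, and the interface names are assumptions, and a real NMS reads this information from the LLDP MIBs over SNMP.

```python
def build_topology(remote_mibs: dict) -> set:
    """Merge per-device neighbor entries into a set of undirected links.

    remote_mibs maps a device name to a list of
    (local_port, neighbor_device, neighbor_port) tuples (hypothetical layout).
    """
    links = set()
    for local_dev, neighbors in remote_mibs.items():
        for local_port, nbr_dev, nbr_port in neighbors:
            # Sort the two endpoints so that the entry learned on ATNA and the
            # matching entry learned on ATNB collapse into a single link.
            link = tuple(sorted([(local_dev, local_port), (nbr_dev, nbr_port)]))
            links.add(link)
    return links


mibs = {
    "ATNA": [("GE0/2/0", "ATNB", "GE0/2/1")],   # hypothetical interface names
    "ATNB": [("GE0/2/1", "ATNA", "GE0/2/0")],
}
topology = build_topology(mibs)
```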
Figure elements: NMS; ATN A and ATN B connected through a tunnel across the ISP network.
Legend: LLDPDU frames; SNMP packets.
1. ATNA sends LLDP multicast packets with a packet type of 0x88CC and a destination MAC
address of 01-80-C2-00-00-0E. The LLDP packets are transparently transmitted to ATNB
through a tunnel on the Internet Service Provider (ISP) network.
2. After receiving the LLDP packets, ATNB checks the packet type and determines that
these LLDP packets can be processed. ATNB then further analyzes the LLDP packets
and stores the analysis result in its LLDP remote system MIB so that the NMS can
extract the network topology information.
3. ATNB sends LLDP packets in the same manner as ATNA. ATNA also analyzes the
LLDP packets sent from ATNB and stores the analysis result in its LLDP remote system
MIB so that the NMS can obtain the network topology information.
4. The NMS locates ATNA and ATNB based on the management addresses and obtains the
topology information for analysis.
NOTE
To implement LLDP topology discovery between indirectly connected neighbors, a tunnel must have
been established between ATNA and ATNB on the ISP network for transparent transmission of LLDP
packets.
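The packet type and multicast MAC address mentioned in step 1 can be shown with a small frame builder. This is an illustrative sketch: the function names are assumptions, the source MAC address in the usage is made up, and padding and the frame check sequence are left out.

```python
import struct

LLDP_DEST_MAC = bytes.fromhex("0180C200000E")   # 01-80-C2-00-00-0E
LLDP_ETHERTYPE = 0x88CC


def build_lldp_frame(src_mac: bytes, lldpdu: bytes) -> bytes:
    """Prepend an Ethernet II header to an LLDPDU (no padding, no FCS)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC address must be 6 bytes")
    return LLDP_DEST_MAC + src_mac + struct.pack("!H", LLDP_ETHERTYPE) + lldpdu


def is_lldp_frame(frame: bytes) -> bool:
    """Check the packet type field, as the receiver does in step 2."""
    return len(frame) >= 14 and frame[12:14] == struct.pack("!H", LLDP_ETHERTYPE)


frame = build_lldp_frame(bytes.fromhex("001122334455"), b"\x00\x00")
```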
After the interval is set on a device, all LLDP interfaces on the device also send LLDP
packets to neighbors at this interval. The time at which these interfaces begin to send LLDP
packets can be different.
The interval for sending LLDP packets determines the network topology discovery speed and
can be adjusted according to the network load:
l A larger value reduces the frequency at which LLDP packets are exchanged, and
therefore conserves system resources. However, if the value is too large, a device may
fail to efficiently notify neighbors of its status, affecting network topology change
discovery.
l A smaller value increases the frequency at which a local device sends its status
information to its neighbors, helping the NMS to efficiently discover network topology
changes. However, if the value is too small, LLDP packets are exchanged too frequently,
increasing the system burden and wasting resources.
When the status of a device changes frequently, increase the delay to reduce the frequency at
which LLDP packets are sent to neighbors.
The delay in sending LLDP packets must be adjusted according to the network load:
l A larger value reduces the frequency at which LLDP packets are exchanged, and
therefore conserves system resources. However, if the value is too large, a device may
fail to efficiently notify neighbors of its status, affecting network topology change
discovery.
l A smaller value increases the frequency at which a local device sends its status
information to its neighbors, helping the NMS to efficiently discover network topology
changes. However, if the value is too small, LLDP packets are exchanged too frequently,
increasing the system burden and wasting resources.
l The interval at which LLDP packets are sent must be at least four times the delay in
sending LLDP packets. Figure 5-118 shows the relationship between the
interval and delay for sending LLDP packets.
Figure 5-118 Relationship between the interval and delay in sending LLDP packets
Flowchart steps:
1. An LLDP frame is sent and the Interval timer is triggered.
2. If local status information does not change within the interval, the Interval timer continues until it times out.
3. If local status information changes within the interval, the Delay timer runs. When the Delay timer times out, the device checks whether local status information has changed.
int: is short for interval, indicating the interval for sending LLDP frames.
del: is short for delay, indicating the delay in sending LLDP frames.
A, B, C, and D refer to different time points at which LLDP frames are sent.
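The relationship between the two timers can be sketched in a few lines. This is one plausible reading, offered as an assumption rather than the documented device behavior: the interval must be at least four times the delay, a frame is sent when the interval expires, and a status change inside the interval triggers a frame no sooner than one delay after the previous frame. The function names and sample values are hypothetical.

```python
from typing import Optional


def validate_timers(interval: float, delay: float) -> None:
    """Reject timer settings that break the interval >= 4 x delay rule."""
    if delay <= 0 or interval < 4 * delay:
        raise ValueError("interval must be at least four times the delay")


def next_send_time(last_sent: float, interval: float, delay: float,
                   change_time: Optional[float] = None) -> float:
    """Return when the next LLDP frame would be sent (assumed model).

    change_time is the moment local status information changed, or None if
    nothing changed; it is assumed to be no earlier than last_sent.
    """
    validate_timers(interval, delay)
    if change_time is None or change_time >= last_sent + interval:
        # No change within the interval: wait for the Interval timer.
        return last_sent + interval
    # Change within the interval: send as soon as the delay since the
    # previous frame has elapsed.
    return max(change_time, last_sent + delay)
```

With an interval of 30 and a delay of 2 (sample values), a change 1 second after the last frame leads to a send at time 2, while a change at time 10 leads to an immediate send at time 10.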
Figure 5-119 shows the process of sending LLDP packets at different time points.
Timeline: time points A and B are labeled with the interval only; time points C and D are labeled with both the interval and the delay.
Operation Mode
The ATN allows LLDP to work in duplex operation mode. Specifically, the ATN can send and
receive LLDP packets at the same time.
5.7.3 Applications
Figure 5-120 LLDP configurations on a network where an interface has multiple neighbors
Figure elements: NMS; CX-D, CX-F, and Router E exchanging LLDPDUs.
Legend: SNMP packets; LLDPDU; interfaces enabled with LLDP. NMS: Network Management System.
NOTE
Layer 2 networks involved in Layer 2 protocol tunneling refer to networks constructed by Layer 2
interfaces but not Layer 2 virtual private networks (L2VPNs).
Purpose
Transparent transmission of Layer 2 protocol packets is a technology used to transparently
transmit the protocol packets of users over the ISP network. On the ingress of the ISP
network, protocol packets sent by users are forwarded to the ISP network after their multicast
destination MAC addresses are replaced; on the egress of the ISP network, the
multicast destination MAC addresses of the protocol packets are restored to the original ones.
5.8.2 Principles
Generally, the destination MAC addresses of Layer 2 protocol packets are the same. For
example, the MSTP packets are BPDUs, of which the destination MAC address is 0180-
C200-0000. Therefore, when a Layer 2 protocol packet reaches a PE on the ISP network, the
PE sends the protocol packet to the CPU to perform STP calculation, without identifying
whether the protocol packet comes from a user network or the ISP network.
In this case, devices in user network1 perform STP calculation together with PE1 rather than
devices in user network2. As a result, the Layer 2 protocol packets in user network1 cannot
traverse the ISP network to reach user network2.
Figure 5-121 Transparent transmission of Layer 2 protocol packets in the ISP network
ISP
network
PE1 PE2
CE1 CE2
User User
network1 network2
To address the preceding problem, you can configure transparent transmission of Layer 2
protocol packets. Currently, the Huawei devices support the transparent transmission of
packets of the following Layer 2 protocols:
l Cisco Discovery Protocol (CDP)
l Device Link Detection Protocol (DLDP)
l Dynamic Trunking Protocol (DTP)
l Ethernet Operation, Administration, and Maintenance 802.3ah (EOAM3ah)
l Generic Multicast Registration Protocol (GMRP)
l Generic VLAN Registration Protocol (GVRP)
l HUAWEI Group Management Protocol (HGMP)
l Link Aggregation Control Protocol (LACP)
l Link Layer Discovery Protocol (LLDP)
l Port Aggregation Protocol (PAGP)
l Per VLAN Spanning Tree Plus (PVST+)
l Spanning Tree Protocol (STP)
l Unidirectional Link Detection (UDLD)
l VLAN Trunking Protocol (VTP)
l User-defined protocols
If Layer 2 protocol packets need to be transparently transmitted on the ISP network, the
following conditions must be met during the transmission process:
l Each site of a user network can receive the Layer 2 protocol packets from other sites.
l The Layer 2 protocol packets of a user network cannot be processed by the CPUs of the
devices on the ISP network.
l Layer 2 protocol packets of different user networks must be isolated and do not affect
each other.
Transparent transmission of Layer 2 protocol packets prevents the Layer 2 protocol
packets of different user networks from interfering with each other, which cannot be achieved by
the previous technologies.
Frame fields shown in the figure: Source address, Length, BPDU data.
MAC address according to the mapping between the specific destination multicast MAC
address configured on the device and the Layer 2 protocol. In addition, the egress
determines whether to remove the outer VLAN tag according to the configured
transparent transmission mode, and then forwards the protocol packet to the UPE.
The Huawei devices support the following transparent transmission modes of Layer 2
protocol packets in different application scenarios:
As shown in Figure 5-123, each interface on a PE connects to one user network. The user
networks belong to different LANs, that is, LAN-A and LAN-B. BPDUs sent from user
networks to the PE are untagged. The PE, however, needs to identify the LAN from which
the BPDUs come. BPDUs of a user network in LAN-A must be sent to other user networks in
LAN-A rather than the user networks in LAN-B. In addition, BPDUs must not be processed
by PEs.
l Change the default multicast MAC address of the Layer 2 BPDU that can be identified
by the devices on the ISP network into another multicast MAC address.
a. Set the role of the ingress device on the ISP network to provider. As a result, the
destination MAC addresses of the BPDUs sent by the devices on the ISP network
are changed to 01-80-C2-00-00-08 instead of the original 01-80-C2-00-00-00.
b. Set the roles of all devices in a user network to customer. Therefore, the destination
MAC addresses of the BPDUs sent by the user network are still 01-80-
C2-00-00-00.
By default, the device is configured as the customer on the network.
c. On the device of the ISP network, add the interfaces that connect to the same user
network to the same VLAN. After receiving the Layer 2 protocol packet from the
user network, the device on the ISP network adds the default VLAN ID of the
interface to the packet.
d. The devices (of the provider type) on the ISP network do not take the BPDU as the
Layer 2 BPDU and do not send the BPDU to the CPU for processing. Instead, the
devices select a corresponding Layer 2 tunnel according to the default VLAN ID of
the interface to forward the BPDU.
e. The BPDU is normally forwarded by the devices on the ISP network and normally
traverses the ISP network.
f. When reaching the egress on the ISP network, the Layer 2 BPDU is forwarded to
the UPE without being changed.
l Replace the original multicast MAC address of the Layer 2 BPDU with a specified
multicast MAC address.
NOTE
This method applies to all types of transparent transmission of Layer 2 protocol packets.
a. After receiving and identifying the Layer 2 protocol packet (such as a BPDU of the
STP protocol) from the user network, the device on the ISP network adds the
default VLAN ID of the interface to the Layer 2 protocol packet.
b. According to the mapping between the specified destination multicast MAC address
and the Layer 2 protocol, the device on the ISP network changes the standard
destination multicast MAC address of the Layer 2 BPDU into the specified
destination multicast MAC address.
c. The Layer 2 BPDU is normally forwarded by the devices on the ISP network,
therefore successfully traversing the ISP network.
d. When the Layer 2 BPDU reaches the egress, the egress restores the destination
multicast MAC address to the standard destination multicast MAC address of the
Layer 2 BPDU according to the mapping between the special destination multicast
MAC addresses and Layer 2 protocols, and then forwards the BPDU to the UPE.
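The second method (replace the multicast MAC address at the ingress and restore it at the egress) can be sketched as two small functions. This is a minimal illustration under stated assumptions: the `Frame` type, the function names, and the tunnel MAC address value are hypothetical; in practice the tunnel address is whatever the operator configures in the mapping.

```python
from dataclasses import dataclass, replace
from typing import Tuple

STANDARD_MAC = {"stp": "01-80-C2-00-00-00"}
TUNNEL_MAC = {"stp": "01-00-0C-CD-CD-D0"}   # example value only (assumption)


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    vlan_tags: Tuple[int, ...]   # outermost tag first
    payload: bytes


def ingress(frame: Frame, protocol: str, default_vlan: int) -> Frame:
    """Steps a and b: add the interface default VLAN ID and swap the MAC address."""
    if frame.dst_mac != STANDARD_MAC[protocol]:
        return frame             # not a packet of the tunneled protocol
    return replace(frame, dst_mac=TUNNEL_MAC[protocol],
                   vlan_tags=(default_vlan,) + frame.vlan_tags)


def egress(frame: Frame, protocol: str, strip_outer_tag: bool = True) -> Frame:
    """Step d: restore the standard MAC address before forwarding to the user side."""
    if frame.dst_mac != TUNNEL_MAC[protocol]:
        return frame
    tags = frame.vlan_tags[1:] if strip_outer_tag else frame.vlan_tags
    return replace(frame, dst_mac=STANDARD_MAC[protocol], vlan_tags=tags)
```

Between ingress and egress the frame carries a destination address that ISP devices do not treat as a BPDU, so it is forwarded like ordinary Layer 2 traffic.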
Figure elements: PE 1 and PE 2 connected by BPDU tunnels across the ISP network; user networks LAN-A and LAN-B, each running MSTP.
Currently, some Layer 2 protocol packets, such as protocol packets of a spanning tree
protocol, need to carry VLAN tags. When receiving Layer 2 protocol packets with VLAN tags, a
device on the ISP network considers them as invalid protocol packets and discards them. To
avoid this problem, you can configure VLAN-based transparent transmission of Layer 2
protocol packets on the devices on the ISP network. In this manner, the Layer 2 protocol
packets can traverse the ISP network through Layer 2 tunnels.
Similar to the interface-based transparent transmission of Layer 2 protocol packets, there are
two processing methods in this application scenario:
l Change the default multicast MAC address of the Layer 2 protocol packet that can be
identified by the devices on the ISP network into another multicast MAC address.
a. Set the role of the ingress device on the ISP network to provider. As a result, the
destination MAC addresses of the BPDUs sent by the devices on the ISP network
are changed to 01-80-C2-00-00-08 instead of the original 01-80-C2-00-00-00.
b. Set the roles of all devices in a user network to customer. Therefore, the destination
MAC addresses of the BPDUs sent by the user network are still 01-80-
C2-00-00-00.
By default, the device is configured as the customer on the network.
c. Set specific VLAN IDs for the Layer 2 protocol packets that are sent from user
networks to the ISP network.
d. Configure the devices on the ISP network to identify the Layer 2 protocol packets
with VLAN IDs and allow the packets to pass through.
e. The devices (of the provider type) on the ISP network do not take the packet as the
BPDU and do not send the packet to the CPU for processing. Instead, the devices
select a corresponding Layer 2 tunnel to forward the packet according to the VLAN
IDs with which the packets are allowed to pass through.
f. The Layer 2 protocol packet is transmitted as an ordinary Layer 2 packet by the
devices on the ISP network, therefore successfully traversing the ISP network.
g. When reaching the egress on the ISP network, the Layer 2 protocol packet is
forwarded to the CE without being changed.
l Replace the original multicast MAC address of the Layer 2 protocol packet with a
specified multicast MAC address.
NOTE
This method applies to transparent transmission of all types of Layer 2 protocol packets.
a. Set specific VLAN IDs for the Layer 2 protocol packets that are sent from user
networks to the ISP network.
b. Configure the devices on the ISP network to identify the Layer 2 protocol packets
with VLAN IDs and allow the packets to pass through.
c. According to the mapping between the specified destination multicast MAC address
and the Layer 2 protocol, the device on the ISP network changes the standard
destination multicast MAC address of the Layer 2 protocol packet into the specified
destination multicast MAC address.
d. After the MAC address is changed, the Layer 2 protocol packet is transmitted as an
ordinary Layer 2 packet by the devices on the ISP network, therefore successfully
traversing the ISP network.
e. When the Layer 2 protocol packet reaches the egress, the egress restores the
destination multicast MAC address to the standard destination multicast MAC
address according to the mapping between the specified destination multicast MAC
addresses and Layer 2 protocols, and then forwards the Layer 2 protocol packet to
the CE.
tag or public tag, used for carrying the VLAN ID of a public network. The inner tag is
usually known as the private tag, used for carrying the VLAN ID of a private network.
NOTE
The QinQ function configured on a Layer 2 interface is also called VLAN stacking.
QinQ Encapsulation
Field   DA       SA       ETYPE    TAG      ETYPE    TAG      LEN/ETYPE  DATA              FCS
Length  6 bytes  6 bytes  2 bytes  2 bytes  2 bytes  2 bytes  2 bytes    38 to 1500 bytes  4 bytes
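The field layout above can be reproduced with a short header builder. This is a sketch for illustration: the function names are assumptions, and the tag protocol identifier 0x8100 for both tags is a common default that is assumed here, since the value used on a given device depends on its configuration.

```python
import struct


def vlan_tag(tpid: int, vlan_id: int, priority: int = 0) -> bytes:
    """Build one 4-byte tag: 2-byte ETYPE (TPID) plus 2-byte TAG."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (priority << 13) | vlan_id
    return struct.pack("!HH", tpid, tci)


def build_qinq_header(dst: bytes, src: bytes, outer_vlan: int, inner_vlan: int,
                      ethertype: int, outer_tpid: int = 0x8100,
                      inner_tpid: int = 0x8100) -> bytes:
    """DA + SA + outer tag + inner tag + LEN/ETYPE (22 bytes in total)."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    return (dst + src
            + vlan_tag(outer_tpid, outer_vlan)    # public (outer) tag
            + vlan_tag(inner_tpid, inner_vlan)    # private (inner) tag
            + struct.pack("!H", ethertype))


header = build_qinq_header(bytes(6), bytes(6), outer_vlan=20, inner_vlan=100,
                           ethertype=0x0800)
```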
Figure elements: PE 1 and PE 2 across the ISP network; user networks LAN-A and LAN-B, each running MSTP; BPDU tunnels for CE-VLAN 100 and CE-VLAN 200.
Mapping: PE-VLAN 20 carries CE-VLAN 100 to 199; PE-VLAN 30 carries CE-VLAN 200 to 299.
As shown in Figure 5-126, the convergence interfaces on the PEs are configured with
the function of QinQ-based transparent transmission of Layer 2 protocol packets. Then,
the PEs add different outer tags to the packets from different user networks.
In this application scenario, the following processing methods are available:
– Change the default multicast MAC address of the Layer 2 BPDU that can be
identified by the devices on the ISP network into another multicast MAC address.
i. Set the role of the ingress device on the ISP network to provider. As a result,
the destination MAC addresses of the BPDUs sent by the devices on the ISP
network are changed to 01-80-C2-00-00-08 instead of the original
01-80-C2-00-00-00.
ii. Set the roles of all devices in a user network to customer. Therefore, the
destination MAC addresses of the BPDUs sent by the user network are still
01-80-C2-00-00-00.
By default, the device is configured as the customer on the network.
iii. Set specific VLAN IDs for the Layer 2 protocol packets that are sent from user
networks to the ISP network.
iv. Configure transparent transmission of Layer 2 protocol packets and the QinQ
function on the interfaces of the ingress on the ISP network.
v. According to the user VLAN IDs, the ingress on the ISP network allocates
different outer tags, that is, the public VLAN IDs, to the Layer 2 protocol
packets.
vi. The ingress on the ISP network selects different Layer 2 tunnels according to
different outer tags. Then, the Layer 2 protocol packets are transmitted as
ordinary Layer 2 packets by the devices on the ISP network.
vii. The Layer 2 protocol packet is transmitted as an ordinary Layer 2 packet by
the devices on the ISP network, therefore successfully traversing the ISP
network.
viii. The egress removes the outer tags and forwards the Layer 2 protocol packets
to the corresponding user networks according to the inner tags.
As shown in Figure 5-126, after receiving the BPDUs with the tags ranging from
100 to 199, the PEs label the BPDUs with the outer tag 20, and then forward the
BPDUs in the ISP network; after receiving the BPDUs with the tags ranging from
200 to 299, the PEs label the BPDUs with the outer tag 30, and then forward the
BPDUs in the ISP network. In this way, the BPDUs of different user networks can
be transparently transmitted in the ISP network; moreover, fewer VLAN IDs are
occupied.
– Replace the original multicast MAC address of the Layer 2 protocol packet with a
specified multicast MAC address.
i. Set specific VLAN IDs for the Layer 2 protocol packets that are sent from user
networks to the ingress device on the ISP network.
ii. Configure transparent transmission of Layer 2 protocol packets and the QinQ
function on the interfaces of the ingress on the ISP network.
iii. According to the user VLAN IDs, the ingress on the ISP network allocates
different outer tags, that is, the public VLAN IDs, to the Layer 2 protocol
packets.
iv. The ingress on the ISP network selects different Layer 2 tunnels according to
different outer tags. Then, the Layer 2 protocol packets are transmitted as
ordinary Layer 2 packets by the devices on the ISP network.
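The outer tag allocation described for Figure 5-126 (user VLANs 100 to 199 receive outer tag 20, user VLANs 200 to 299 receive outer tag 30) can be sketched as a range lookup. The table layout and function names below are assumptions made for the example.

```python
from typing import Optional, Tuple

# (user VLAN range, public VLAN ID), taken from the Figure 5-126 example.
OUTER_TAG_RANGES = [
    (range(100, 200), 20),
    (range(200, 300), 30),
]


def outer_tag_for(ce_vlan: int) -> Optional[int]:
    """Return the outer (public) VLAN ID for a user VLAN ID, or None if unmapped."""
    for ce_range, pe_vlan in OUTER_TAG_RANGES:
        if ce_vlan in ce_range:
            return pe_vlan
    return None


def push_outer_tag(tags: Tuple[int, ...]) -> Tuple[int, ...]:
    """Ingress: add the outer tag selected by the inner (first) tag."""
    outer = outer_tag_for(tags[0])
    if outer is None:
        raise ValueError("no outer tag configured for this user VLAN")
    return (outer,) + tags


def pop_outer_tag(tags: Tuple[int, ...]) -> Tuple[int, ...]:
    """Egress: remove the outer tag and keep the inner tag."""
    return tags[1:]
```

Because a whole range of user VLANs shares one public VLAN ID, the ISP network needs only two VLANs in this example to carry two hundred user VLANs.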
As shown in Figure 5-127, PE1, PE2, and PE3 are connected to construct a Layer 2 network;
VLAN 2 and VLAN 3 are respectively created in user networks LAN-A and LAN-C and in
user networks LAN-B and LAN-D; Layer 2 protocol packets with VLAN IDs as VLAN 2 and
VLAN 3 are sent from LAN-A and LAN-B, and then forwarded by CE1, CE2, and CE3. In
addition, a standard Layer 2 protocol, such as the Link Layer Discovery Protocol (LLDP), of
the untagged type needs to be run between CE1, CE2, and CE3.
In this scenario, PEs may receive Layer 2 protocol packets with VLAN IDs and without
VLAN IDs. In this case, you can configure hybrid VLAN-based transparent transmission of
Layer 2 protocol packets on the PEs of the ISP network to enable the PEs to transparently
transmit Layer 2 protocol packets with VLAN tags and without VLAN tags.
5.8.3 Applications
PE1, PE2, and PE3 are connected to construct a Layer 2 switching network, and access LAN-A
and LAN-B through different interfaces. Each LAN runs a Layer 2 control protocol.
Here, STP is taken as an example.
l The type of the Layer 2 control protocol packets that need to be transparently transmitted
is set on the interfaces that connect PE1, PE2, and PE3 to CEs, and the original multicast
MAC address of Layer 2 protocol packets from user networks is replaced with a
specified multicast MAC address.
l After identifying that the packets received from CEs are Layer 2 control protocol
packets, PE1 replaces the original multicast MAC address of the packets with the
specified multicast MAC address according to the configured mapping, and then
forwards the packets. The packets whose multicast MAC address is replaced with the
specified multicast MAC address are forwarded as common Layer 2 packets on the ISP
network.
l When the packets reach PE2, PE2 restores the multicast MAC address of the packets to
the standard multicast MAC address according to the configured mapping between
multicast MAC addresses and Layer 2 control protocol packets, and then forwards the
packets to the corresponding CE, completing transparent transmission of Layer 2
protocol packets.
Figure elements: PE 1, PE 2, and PE 3 on the ISP network; BPDU tunnel for CE-VLAN 100; user networks LAN-A and LAN-B, each running MSTP.
PE1, PE2, and PE3 are connected to construct a Layer 2 ISP network. CEs add one tag to
Layer 2 control protocol packets from user networks and then send them to the PEs. The
packets received by PEs have only one tag.
The process of transparently transmitting Layer 2 control protocol packets is as follows:
l VLAN-based transparent transmission of Layer 2 control protocol packets is configured
on the interfaces that connect PE1, PE2, and PE3 to CEs.
l After identifying that the packets received from CEs are Layer 2 control protocol
packets, PE1 replaces the original multicast MAC address of the packets with the
specified multicast MAC address according to the configured mapping, and then
forwards the packets. The packets whose multicast MAC address is replaced with the
specified multicast MAC address are forwarded as common Layer 2 VLAN packets on
the ISP network.
l When the packets reach PE2, PE2 restores the multicast MAC address of the packets to
the standard multicast MAC address according to the configured mapping between
multicast MAC addresses and Layer 2 control protocol packets, and then forwards the
packets to the corresponding CE, completing transparent transmission of Layer 2
protocol packets.
Figure elements: PE 1 and PE 2 across the ISP network; user networks LAN-A and LAN-B, each running MSTP; BPDU tunnels for CE-VLAN 100 and CE-VLAN 200.
Mapping: PE-VLAN 20 carries CE-VLAN 100 to 199; PE-VLAN 30 carries CE-VLAN 200 to 299.
PE1 and PE2 are connected to construct a Layer 2 switching network. VLAN 20 and VLAN
30 are configured on the PEs. CEs send tagged Layer 2 control protocol packets (VLAN ID
being 100 or 200) to the PEs. QinQ is configured on the interfaces that connect PE1 and PE2
to CEs.
l Set specific VLAN IDs for the Layer 2 protocol packets that are sent from user networks
to the ISP network.
l Configure transparent transmission of Layer 2 protocol packets and the QinQ function on
the interfaces of the ingress on the ISP network.
l According to the user VLAN IDs, the ingress on the ISP network allocates different
outer tags, that is, the public VLAN IDs, to the Layer 2 protocol packets.
l The ingress on the ISP network selects different Layer 2 tunnels according to different
outer tags. Then, the Layer 2 protocol packets are transmitted as ordinary Layer 2 packets
by the devices on the ISP network.
l Configure transparent transmission of Layer 2 protocol packets and the QinQ function on
the interfaces of the egress on the ISP network.
l The egress removes the outer tags and forwards the Layer 2 protocol packets to the
corresponding user networks according to the inner tags.
Figure elements: PE1, PE2, and PE3 on the ISP network; CE1, CE2, and CE3; user networks LAN-A, LAN-B, LAN-C, and LAN-D; VLAN 2 and VLAN 3.
PE1, PE2, and PE3 are connected to construct a Layer 2 switching network. USR_A, USR_B,
USR_C, and USR_D form different Layer 2 domains of VLAN 2 and VLAN 3 and send
tagged Layer 2 control protocol packets. CE1, CE2, and CE3 forward tagged Layer 2 control
protocol packets (VLAN ID being 2 or 3) and standard untagged Layer 2 control protocol
packets.
l The default VLAN and the dot1q tunnel attribute are configured on the interfaces that
connect PE1, PE2, and PE3 to CEs. In addition, interface-based transparent transmission
of Layer 2 control protocol packets is configured on these interfaces.
l After receiving Layer 2 control protocol packets (tagged or untagged), PE1 replaces the
original multicast MAC address of the packets with a specified multicast MAC address
according to the configured mapping, and then adds an outer VLAN tag with the default
VLAN ID to the packets before forwarding them. The packets whose multicast MAC
address is replaced with the specified multicast MAC address are forwarded as common
Layer 2 VLAN packets on the ISP network.
l When the packets reach PE2 and PE3, PE2 and PE3 restore the multicast MAC address
of the packets to the standard multicast MAC address according to the configured
mapping between multicast MAC addresses and Layer 2 control protocol packets, and
then remove the outer VLAN tag from the packets and forward the packets to the
corresponding CEs, completing transparent transmission of Layer 2 protocol packets.
Purpose
Redundant links are generally used on an Ethernet switching network to provide link backup
for higher network reliability. However, redundant links may produce loops, causing
broadcast storms and reducing the stability of MAC address tables. As a result, the
communication quality deteriorates, and communication services may be interrupted. To
resolve these problems, ring network protocols must be used to prevent loops.
Benefits
ERPS offers the following benefits:
5.9.2 Principles
As shown in Figure 5-132, ATN A through ATN D constitute a ring and are dual-homed to an
upper-layer network. This access mode will cause a loop on the entire network. To ensure link
connectivity, ERPS is used to prevent loops.
[Figure 5-132: ERPS single-ring networking with NPE1, NPE2, ATN A through ATN D, and a CE; the RPL, RPL owner, and RPL neighbour are marked]
Figure 5-132 shows a typical ERPS single-ring network. The following describes ERPS
based on this networking:
ERPS Ring
An ERPS ring consists of interconnected ATN devices that have the same control VLAN. A
ring is a basic ERPS unit.
ERPSv1 supports only major rings (closed). ERPSv2 supports both major rings and sub-rings
(open). Major rings can be reconfigured as sub-rings.
Node
A node is an ATN added to an ERPS ring. A node can have a maximum of two ports added to
the same ERPS ring.
Port Role
ERPS defines three port roles: ring protection link (RPL) owner port, RPL neighbor port
(only in ERPSv2), and ordinary port.
l RPL owner port
An RPL owner port is a ring port responsible for blocking traffic over the RPL to prevent
loops. An ERPS ring has only one RPL owner port.
When the node on which the RPL owner port resides receives an R-APS PDU indicating
the failure of a link or node on the ring, it unblocks the RPL owner port to allow the port
to send and receive traffic. This process ensures that traffic is not interrupted.
l RPL neighbor port
An RPL neighbor port is a ring port directly connected to an RPL owner port and is used
to reduce the number of times that filtering database (FDB) entries are refreshed.
RPL owner and neighbor ports are both blocked under normal conditions to prevent
loops.
If an ERPS ring fails, both RPL owner and neighbor ports are unblocked.
l Ordinary port
Ordinary ports are ring ports other than the RPL owner and neighbor ports.
An ordinary port monitors the status of the directly connected ERPS link and sends R-
APS PDUs to inform the other ports if the link status changes.
Port Status
On an ERPS ring, an ERPS-enabled port can be in either of the following states:
l Forwarding: The port forwards user traffic and sends and receives R-APS PDUs.
l Discarding: The port only sends R-APS PDUs.
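The relationship between port roles and port states can be summarized in a short sketch. It only restates the rules given in this section; the enum and function names are illustrative, and the `local_link_failed` flag is an assumption added to cover ports that are blocked because their own link is faulty.

```python
from enum import Enum


class Role(Enum):
    RPL_OWNER = "RPL owner"
    RPL_NEIGHBOR = "RPL neighbor"
    ORDINARY = "ordinary"


class State(Enum):
    FORWARDING = "Forwarding"
    DISCARDING = "Discarding"


def port_state(role: Role, ring_failed: bool, local_link_failed: bool = False) -> State:
    """Return the state of a ring port for the given ring condition."""
    if local_link_failed:
        # A port on a faulty link is blocked regardless of its role.
        return State.DISCARDING
    if role in (Role.RPL_OWNER, Role.RPL_NEIGHBOR):
        # RPL ports are blocked normally and unblocked when the ring fails.
        return State.FORWARDING if ring_failed else State.DISCARDING
    return State.FORWARDING
```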
Control VLAN
A control VLAN is used to transmit R-APS PDUs for an ERPS ring.
Each ERPS ring must be configured with a control VLAN. After a port is added to an ERPS
ring that has a control VLAN configured, the port is added to the control VLAN
automatically.
Different ERPS rings cannot be configured with the same control VLAN ID.
Unlike control VLANs, data VLANs are used to transmit data packets.
ERP Instance
On an ATN running ERPS, the VLAN in which R-APS PDUs and data packets are transmitted
must be mapped to an Ethernet Ring Protection (ERP) instance so that ERPS forwards or
blocks the VLAN packets based on blocking rules. Otherwise, VLAN packets may cause
broadcast storms on the ring network and render the network unavailable.
Timer
ERPS defines four timers: guard timer, wait to restore (WTR) timer, hold-off timer, and wait
to block (WTB) timer (only in ERPSv2).
l Guard timer
After a faulty link or node recovers or a clear operation is executed, the nodes on the two
ends of the link or the recovered node send R-APS No Request (NR) messages to
inform the other nodes of the link or node recovery and start a guard timer. To avoid
receiving out-of-date R-APS Signal Fail (SF) messages before the timer expires, each
involved node does not process any R-APS PDUs. After the timer expires, if the
involved node still receives an R-APS (SF) message, the local port enters the Forwarding
state. (An R-APS (SF) message is sent by a node to other nodes after the node detects
that one of its ring ports is Down.)
l WTR timer
If the RPL owner port is unblocked due to a link or node failure, the involved port may
not go Up immediately after the link or node recovers. To prevent the RPL owner port
from alternating between Up and Down, the node on which the RPL owner port resides
starts a WTR timer after receiving an R-APS No Request (NR) message. If the node
receives an R-APS (SF) message before the timer expires, it terminates the WTR timer.
If the node does not receive any R-APS (SF) message before the timer expires, it blocks
the RPL owner port when the timer expires and sends an R-APS NR, RPL Blocked (NR,
RB) message. After receiving this R-APS (NR, RB) message, the nodes set their
recovered ports on the ring to the Forwarding state.
l Hold-off timer
Protection switching sequence requirements vary for Layer 2 networks running ERPS.
For example, in a multi-layer service application, if a server fails, it will require a certain
period of time to recover. No protection switching is performed immediately after the
server fails, and the client does not detect the failure during this period. A hold-off timer
can be set to meet this requirement. If a fault occurs, the fault is not immediately
reported to ERPS. Instead, the hold-off timer starts. If the fault persists after the timer
expires, the fault will be reported to ERPS.
l WTB timer
The WTB timer starts after an FS or MS operation is performed. When multiple nodes
on an ERPS ring are in the FS or MS state, the clear operation takes effect only after the
WTB timer expires. This ensures that the RPL owner port will not be blocked
immediately.
The WTB timer value cannot be configured. Its value is the guard timer value plus 5 seconds.
l In revertive switching, the RPL owner port is re-blocked after the wait to restore (WTR)
timer expires, and the traffic channel is blocked on the RPL.
l In non-revertive switching, the traffic channel continues to use the RPL.
ERPSv1 supports only revertive switching. ERPSv2 supports both revertive and non-revertive
switching.
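The timer rules in the Timer description can be expressed as a few small functions. This is a sketch under stated assumptions: all values are taken to be in seconds, the WTR timer is assumed to start at time 0 when the R-APS (NR) message arrives, and the function names are illustrative rather than part of any device interface.

```python
WTB_EXTRA_SECONDS = 5  # WTB is derived from the guard timer, not configured


def wtb_timer(guard_timer_s: float) -> float:
    """WTB timer value: guard timer value plus 5 (assumed to be seconds)."""
    return guard_timer_s + WTB_EXTRA_SECONDS


def fault_reported(fault_duration_s: float, hold_off_s: float) -> bool:
    """A fault reaches ERPS only if it persists after the hold-off timer expires."""
    return fault_duration_s > hold_off_s


def rpl_owner_blocks_after_wtr(sf_arrival_times_s, wtr_s: float) -> bool:
    """True if the WTR timer runs to expiry without an R-APS (SF) message.

    An SF message received before expiry terminates the timer, so the
    RPL owner port is not re-blocked in that case.
    """
    return not any(0 <= t < wtr_s for t in sf_arrival_times_s)
```

For example, with a guard timer of 2 seconds the derived WTB timer is 7 seconds, and a 1-second fault is never reported when the hold-off timer is 3 seconds.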
[Figure: a major ring with two sub-rings attached through interconnection nodes, one sub-ring with a virtual channel and one without a virtual channel]
By default, sub-rings use NVCs to transmit R-APS PDUs, except for the scenario shown in
Figure 5-134.
NOTE
When sub-ring links are not contiguous, VCs must be used. On the network shown in Figure 5-134,
links b and d belong to major rings 1 and 2, respectively; links a and c belong to the sub-ring. Because
links a and c are not contiguous, they cannot detect the status change between each other. Therefore,
VCs must be used for R-APS PDU transmission.
[Figure 5-134: a sub-ring with a virtual channel between Major Ring1 (link b) and Major Ring2 (link d), attached through interconnection nodes]
Table 5-36 lists the advantages and disadvantages of R-APS PDU transmission modes on
sub-rings with VCs or NVCs.
Table 5-36 Comparison between R-APS PDU transmission modes on sub-rings with VCs or
NVCs
R-APS PDU Transmission Mode on Sub-rings | Advantage | Disadvantage
Using NVCs | Does not require resource reservation or control VLAN assignment from adjacent rings. | Inapplicable when sub-ring links are not contiguous.
...
[Figure: R-APS PDU format; at octet 37 the optional TLVs start, otherwise the End TLV; the last field is the End TLV (0)]
MEL | 3 bits | Identifies the maintenance entity group (MEG) level of the R-APS PDU.
OpCode | 8 bits | Indicates an R-APS PDU. The value of this field is 0x28.
Flags | 8 bits | The value of this field is 0x00. This field should be ignored upon reception.
R-APS Specific Information | 32 x 8 bits | Carries R-APS ring information and is the core in an R-APS PDU. This field has different meanings for some of its sub-fields in ERPSv1 and ERPSv2. Figure 5-136 shows the R-APS Specific Information field format in ERPSv1. Figure 5-137 shows the R-APS Specific Information field format in ERPSv2.
Request/State | 4 bits | Indicates that this R-APS PDU is a request or state PDU. The value can be:
l 1101: forced switch (FS)
l 1110: Event
l 1011: signal fail (SF)
l 0111: manual switch (MS)
l 0000: no request (NR)
l Others: reserved
Node ID | 6 x 8 bits | Identifies the MAC address of a node on the ERPS ring. It is informational and does not affect protection switching on the ERPS ring.
Reserved 2 | 24 x 8 bits | Reserved for future extension and should be ignored upon reception. Currently, this sub-field should be encoded as all 0s in transmission.
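The field descriptions above can be turned into a small decoder. The sketch below is illustrative only. The Request/State codes, the OpCode value 0x28, and the field lengths come from the tables; the byte offsets are assumptions based on the ITU-T G.8032 layout (a 4-octet common header with the MEL in the top 3 bits of the first octet, Request/State in the top 4 bits of the first information octet, and the Node ID in information octets 3 through 8).

```python
REQUEST_STATE = {
    0b1101: "FS",     # forced switch
    0b1110: "Event",
    0b1011: "SF",     # signal fail
    0b0111: "MS",     # manual switch
    0b0000: "NR",     # no request
}

RAPS_OPCODE = 0x28
HEADER_LEN = 4   # assumed common header length in octets
INFO_LEN = 32    # R-APS Specific Information: 32 x 8 bits


def parse_raps(pdu: bytes) -> dict:
    """Decode the fields described in the tables from a raw R-APS PDU."""
    if len(pdu) < HEADER_LEN + INFO_LEN:
        raise ValueError("R-APS PDU too short")
    if pdu[1] != RAPS_OPCODE:
        raise ValueError("not an R-APS PDU")
    info = pdu[HEADER_LEN:HEADER_LEN + INFO_LEN]
    request = info[0] >> 4
    node_id = info[2:8]
    return {
        "mel": pdu[0] >> 5,
        "request": REQUEST_STATE.get(request, "reserved"),
        "node_id": ":".join(f"{octet:02x}" for octet in node_id),
    }


# Example PDU: MEL 1, Request/State 1011 (SF), Node ID 00:11:22:33:44:55.
sample = (bytes([0x20, 0x28, 0x00, 0x20, 0xB0, 0x00])
          + bytes.fromhex("001122334455") + bytes(24))
decoded = parse_raps(sample)
```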
[Figure: ERPS single-ring networking with NPE1, NPE2, ATN A, ATN B, ATN D, ATN E, a CE, and the RPL; legend: blocked interface, data flow]
A Link Fails
As shown in Figure 5-139, if the link between ATN D and ATN E fails, the ERPS protection
switching mechanism is triggered. The ports on both ends of the faulty link are blocked, and
the RPL owner port and RPL neighbor port are unblocked to send and receive traffic. This
mechanism ensures that traffic is not interrupted. The process is as follows:
1. After ATN D and ATN E detect the link fault, they block their ports on the faulty link
and perform a Filtering Database (FDB) flush.
2. ATN D and ATN E send three consecutive R-APS Signal Fail (SF) messages to the other
LSWs and then, after 5s, send another R-APS (SF) message.
3. After receiving an R-APS (SF) message, the other LSWs perform an FDB flush. ATN C
on which the RPL owner port resides and ATN B on which the RPL neighbor port
resides unblock the respective RPL owner port and RPL neighbor port, and perform an
FDB flush.
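The effect of the switchover described in these steps can be checked with a small connectivity model. It is a sketch under assumptions: the ring order A-B-C-D-E and the helper names are illustrative, the RPL is taken to be the link between ATN B and ATN C, and R-APS signaling and FDB flushing are not modeled.

```python
from collections import deque

RING = ["A", "B", "C", "D", "E"]  # assumed order of ATN A through ATN E


def ring_links(nodes):
    """All links of a closed ring, each as an unordered pair of nodes."""
    return {frozenset((nodes[i], nodes[(i + 1) % len(nodes)]))
            for i in range(len(nodes))}


def forwarding_links(nodes, rpl, failed=None):
    """Links that carry traffic: the RPL is blocked unless a link has failed."""
    links = ring_links(nodes)
    if failed is None:
        return links - {rpl}
    return links - {failed}  # faulty link blocked, RPL unblocked


def reachable(links, start):
    """Breadth-first search over the forwarding links."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for link in links:
            if current in link:
                (other,) = link - {current}
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


RPL = frozenset(("B", "C"))
FAILED = frozenset(("D", "E"))
normal = forwarding_links(RING, RPL)
after_failure = forwarding_links(RING, RPL, failed=FAILED)
```

In both cases exactly one ring link is blocked, so the topology stays loop-free while every node remains reachable.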
Figure 5-139 ERPS single ring networking (unblocking the RPL owner port and RPL neighbor port if a link fails)
3. After receiving an R-APS (NR, RB) message, ATN D and ATN E unblock the ports at
the two ends of the link that has recovered, stop sending R-APS (NR) messages, and
perform an FDB flush. The other LSWs also perform an FDB flush after receiving an R-
APS (NR, RB) message.
Protection Switching
l Forced switch
On the network shown in Figure 5-140, ATN A through ATN E on the ERPS ring can
communicate with each other. A forced switch (FS) operation is performed on the ATN
E's port that connects to ATN D, and the ATN E's port is blocked. Then the RPL owner
port and RPL neighbor port are unblocked to send and receive traffic. This mechanism
ensures that traffic is not interrupted. The process is as follows:
a. After the ATN E's port that connects to ATN D is forcibly blocked, ATN E performs
an FDB flush.
b. ATN E sends three consecutive R-APS (FS) messages to the other LSWs and then
sends one R-APS (FS) message at an interval of 5s afterwards.
c. After receiving an R-APS (FS) message, the other LSWs perform an FDB flush.
ATN C on which the RPL owner port resides and ATN B on which the RPL
neighbor port resides unblock the respective RPL owner port and RPL neighbor
port, and perform an FDB flush.
[Figure: ERPS single-ring networking with NPE1, NPE2, ATN A, ATN B, ATN D, ATN E, a CE, and the RPL; legend: blocked interface, data flow]
l Clear
After a clear operation is performed on ATN E, the port that is forcibly blocked by FS
sends R-APS (NR) messages to all other ports on the ERPS ring.
– If the ERPS ring uses revertive switching, the RPL owner port starts the wait to
block (WTB) timer after receiving an R-APS (NR) message. After the WTB timer
expires, the FS operation is cleared. The RPL owner port is then blocked, and the
blocked port on ATN E is unblocked. If you perform a clear operation on ATN C
(on which the RPL owner port resides) before the WTB timer expires, the RPL
owner port is immediately blocked, and the blocked port on ATN E is unblocked.
– If the ERPS ring uses non-revertive switching and you want to block the RPL
owner port, perform a clear operation on ATN C (on which the RPL owner port
resides).
l Manual switch
Compared with an FS operation, a manual switch (MS) operation triggers protection
switching in a similar way except that an MS operation does not take effect in FS, MS,
or link failure conditions.
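The precondition for a manual switch can be written as a one-line check. The sketch below only encodes the rule stated above; the condition labels and the function name are illustrative.

```python
BLOCKING_CONDITIONS = {"FS", "MS", "SF"}  # forced switch, manual switch, link failure


def manual_switch_takes_effect(active_conditions) -> bool:
    """An MS request is accepted only if no FS, MS, or SF condition exists on the ring."""
    return not (set(active_conditions) & BLOCKING_CONDITIONS)
```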
[Figure: ERPS multi-ring networking with a major ring, Sub-Ring1, and Sub-Ring2; nodes NPE1, NPE2, ATN A through ATN G, PC1, PC2; legend: RPL owner, data flow]
A Link Fails
As shown in Figure 5-142, if the link between ATN D and ATN G fails, the ERPS protection
switching mechanism is triggered. The ports on both ends of the faulty link are blocked, and
the RPL owner port on sub-ring 2 is unblocked to send and receive traffic. In this situation,
traffic from PC1 still travels along the original path. ATN C and ATN D inform the other
nodes on the major ring of the topology change so that traffic from PC2 is also not
interrupted. Traffic between PC2 and the upper-layer network travels along the path PC2 <->
ATN G <-> ATN C <-> ATN B <-> ATN A <-> ATN E <-> ATN B. The process is as
follows:
1. After ATN D and ATN G detect the link fault, they block their ports on the faulty link
and perform a Filtering Database (FDB) flush.
2. ATN G sends three consecutive R-APS (SF) messages to the other LSWs and then, after
5s, sends another R-APS (SF) message.
3. ATN G then unblocks the RPL owner port and performs an FDB flush.
4. After the interconnection node ATN C receives an R-APS (SF) message, it performs an
FDB flush. ATN C and ATN D then send R-APS Event messages within the major ring
to notify the topology change in sub-ring 2.
5. After receiving an R-APS Event message, the other LSWs on the major ring perform an
FDB flush.
Then traffic from PC2 is switched to a normal link.
Figure 5-142 ERPS multi-ring networking (unblocking the RPL owner port if a link fails)
The following example uses revertive switching to describe the process after the link
recovers.
1. After the link between ATN D and ATN G recovers, ATN D and ATN G start a guard
timer to avoid receiving out-of-date R-APS PDUs. The two ATN devices do not receive
any R-APS PDUs before the timer expires. Then ATN D and ATN G send R-APS (NR)
messages within sub-ring 2.
2. ATN G on which the RPL owner port resides starts the wait to restore (WTR) timer.
After the WTR timer expires, ATN G blocks the RPL owner port and unblocks its port
on the link that has recovered and then sends R-APS (NR, RB) messages within sub-ring
2.
3. After receiving an R-APS (NR, RB) message from ATN G, ATN D unblocks its port on
the recovered link, stops sending R-APS (NR) messages, and performs an FDB flush.
ATN C also performs an FDB flush.
4. ATN C and ATN D, the interconnection nodes, then send R-APS Event messages within
the major ring to notify the link recovery of sub-ring 2.
5. After receiving an R-APS Event message, the other LSWs on the major ring perform an
FDB flush.
Then traffic changes to the normal state, as shown in Figure 5-141.
[Figure: two ERPS rings on the same physical ring below NPE1 and NPE2; CE1 uses VLANs 100~200 and CE2 uses VLANs 300~400; ports P1 and P2 are marked; legend: ERPS ring1, ERPS ring2, Blocked Interface1, Blocked Interface2, Data Flow1, Data Flow2]
After Ethernet CFM is deployed on ERPS nodes connecting to transmission devices and
detects a transmission link failure, Ethernet CFM informs the ERPS ring of the failure so that
ERPS can perform fast protection switching.
NOTE
On the network shown in Figure 5-144, ATN A, ATN B, and ATN C form an ERPS ring.
Three relay nodes exist between ATN A and ATN C. Ethernet CFM is configured on ATN A
and ATN C. Interface1 on ATN A is associated with Interface1 on Relay1, and Interface1 on
ATN C is associated with Interface1 on Relay3.
In normal situations, the RPL owner port sends R-APS (NR) messages to all other nodes on
the ring at an interval of 5s, indicating that ERPS links are normal.
Figure 5-144 ERPS ring over transmission links (links are normal)
If Relay2 fails, ATN A and ATN C detect the Ethernet CFM failure, block their Interface1,
send R-APS (SF) messages through their respective interfaces connected to ATN B, and then
perform a Filtering Database (FDB) flush.
After receiving an R-APS (SF) message, ATN B unblocks the RPL owner port and performs
an FDB flush. Figure 5-145 shows the networking after Relay2 fails.
[Figure: ERPS ring over transmission links after Relay2 fails, with ATN A, ATN B, ATN C, Relay1, Relay2, and Relay3; legend: blocked interface, data flow]
After Relay2 recovers, ATN B, on which the RPL owner port resides, re-blocks the RPL owner
port in revertive switching mode and sends R-APS (NR, RB) messages.
After ATN A and ATN C receive an R-APS (NR, RB) message, ATN A and ATN C unblock
their blocked Interface1 and perform an FDB flush so that traffic changes to the normal state,
as shown in Figure 5-144.
5.9.3 Applications
To prevent loops caused by redundant links, enable ERPS on the nodes of the ring network.
ERPS is a Layer 2 loop-breaking protocol defined by the ITU-T. It converges quickly,
completing protection switching within 50 ms.
As shown in Figure 5-146, ATN A through ATN E constitute an aggregation ring that
provides Layer 2 aggregation services and accesses a Layer 3 network for service processing.
The aggregation ring runs ERPS, providing protection switching for Layer 2 redundant links.
VLANIF interfaces are configured on ATN A and ATN B for Layer 3 access. In addition,
VRRP is configured on the VLANIF interfaces to function as the virtual gateway, and peer
BFD is enabled for fast fault detection and then fast VRRP switching.
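[Figure 5-146: ERPS aggregation ring of ATN A through ATN E below NPE1 and NPE2, with VRRP and peer BFD on the uplink side; CE1, CE2, and CE3 attached; the RPL and RPL owner are marked; legend: blocked port, Data Flow1, Data Flow2, Data Flow3]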
Terms
Term Description
FDB Forwarding database, including entries for guiding data forwarding. There are
Layer 2 FDB and Layer 3 FDB. The Layer 2 FDB refers to the MAC table,
which provides information about MAC addresses and outbound interfaces and
guides Layer 2 forwarding. The Layer 3 FDB refers to the ARP table, which
provides information about IP addresses and outbound interfaces and guides
Layer 3 forwarding.
MSTP The Multiple Spanning Tree Protocol (MSTP) is a new spanning tree protocol
defined in IEEE 802.1s. MSTP uses the concepts of region and instance. Based
on different requirements, MSTP divides a large network into regions where
instances are created. These instances are mapped to VLANs. BPDUs with
region and instance information are transmitted between bridges. A bridge
determines which domain it belongs to based on the information carried in
BPDUs.
RRPP The Rapid Ring Protection Protocol (RRPP) is a link layer protocol specially
used to prevent loops on an Ethernet ring network. Devices running RRPP detect
loops on the network by exchanging information with each other, and block
certain interfaces to eliminate loops.
RSTP The Rapid Spanning Tree Protocol (RSTP) is defined in IEEE 802.1w released
in 2001. RSTP amends and supplements STP to implement rapid
convergence.
STP The Spanning Tree Protocol (STP) is defined in IEEE 802.1d released in 1998.
This protocol is used to eliminate loops on a LAN. The ATN devices running
STP detect loops on the network by exchanging information with each other, and
block specified interfaces to eliminate loops.
FS forced switch
MS manual switch
NR No Request
SF Signal Fail
5.10.1 Introduction
Definition
The Automatic Link Discovery Protocol (ALDP) is a Huawei proprietary feature used by the
ATN to discover neighbors at the link layer. This protocol allows the Network Management
System (NMS) to use the Simple Network Management Protocol (SNMP) and Management
Information Base (MIB) to initiate a link detection process. After receiving the set operation
delivered through the MIB, the ATN sends Link Detect packets to its neighboring devices. Upon
receiving the Link Detect packets, the neighboring devices respond with Link Reply packets
to the ATN. The ATN then saves neighbor information. The NMS can use the MIB to view
the neighbor information on the ATN and calculate the network topology based on the
obtained neighbor information.
Purpose
At present, many NMSs use the Automated Discovery function to trace topology changes.
The function only allows the NMS to calculate the topology at the network layer and
determine to which subnet a device belongs. As a result, the topology discovery result only
shows basic topology information such as device addition or deletion, but not detailed
topology information such as interfaces through which devices are connected, device
locations, and network topology status.
To discover detailed topology information at the link layer, including interface connection
information on devices, the automatic link discovery protocol is introduced.
Benefits
This feature improves carriers' network sensitivity to topology changes and operating
efficiency.
5.10.2 Principles
[Figure: automatic link discovery between ATN A and ATN B; link detection is triggered through SNMP, and Link Detect packets cross the network through a tunnel]
Packet Format
Figure 5-149 and Figure 5-150 show formats of automatic link discovery packets.
Figure 5-149 Format of an automatic link discovery packet on an Ethernet physical link
DA (6 bytes, 0xff-ff-ff-ff-ff-ff) | SA (6 bytes) | Type (2 bytes, 0x0000) | Flag (20 bytes, "Huawei Link Search") | Information (*) | CRC (4 bytes)
Figure 5-150 Format of an automatic link discovery packet on an Ethernet sub-interface link
DA | SA | 0x8100 | VLAN | Type | Hardware type + protocol type + hardware address length + protocol address length | Flag | Information | CRC
Packet Type
Automatic link discovery packets are classified into two types:
l Link Detect packet: generated by the NMS to discover links. The TLV field in a Link
Detect packet is Send Link Info SubTLV.
l Link Reply packet: a response to a Link Detect packet. The TLV fields in the Link
Reply packet are Recv Link Info SubTLV and Send Link Info SubTLV.
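The exchange of the two packet types can be modeled as follows. This is a hypothetical sketch: the class and function names and the interface names are invented for illustration, and it assumes that the Recv Link Info SubTLV in a reply echoes the sender information from the Link Detect packet. The actual SubTLV encoding is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LinkInfo:
    device: str
    interface: str


@dataclass(frozen=True)
class LinkDetect:
    send_link_info: LinkInfo  # Send Link Info SubTLV


@dataclass(frozen=True)
class LinkReply:
    recv_link_info: LinkInfo  # Recv Link Info SubTLV (assumed echo of the detect)
    send_link_info: LinkInfo  # Send Link Info SubTLV of the replying device


def answer(detect: LinkDetect, local: LinkInfo) -> LinkReply:
    """Neighbor side: build a Link Reply for a received Link Detect."""
    return LinkReply(recv_link_info=detect.send_link_info, send_link_info=local)


def record_neighbor(table: Dict[Tuple[str, str], LinkInfo], reply: LinkReply) -> None:
    """Detecting side: save the neighbor against the local interface."""
    key = (reply.recv_link_info.device, reply.recv_link_info.interface)
    table[key] = reply.send_link_info


atn_a = LinkInfo("ATN A", "GE0/2/0")  # interface names are hypothetical
atn_b = LinkInfo("ATN B", "GE0/2/1")
neighbors: Dict[Tuple[str, str], LinkInfo] = {}
record_neighbor(neighbors, answer(LinkDetect(atn_a), atn_b))
```

The resulting table pairs each local interface with the remote device and interface, which is the per-link information the NMS needs to compute a link-layer topology.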
5.10.3 Applications
The automatic link discovery function allows the device to obtain neighbor information at the
link layer, expanding the network scale managed by the NMS and providing network
administrators with detailed network topology information.
In Figure 5-151, the network on which automatic link discovery is enabled can be a VLAN
network or a network that traverses third-party SDH devices. A network administrator can click
a link on the NMS to acquire information about the link and its connected network elements.
6 WAN Access
This document describes the WAN features in terms of the overview, principle, and
applications.
6.1.1 Introduction
Definition
IMA is the acronym of Inverse Multiplexing for ATM. The general idea of IMA is that the
sender schedules and distributes a high-speed ATM cell stream to multiple low-speed physical
links for transmission, and then the receiver schedules and reassembles the stream fragments
into one cell stream and submits the cell stream to the ATM layer. In this manner, bandwidths
are multiplexed flexibly, improving the efficiency of bandwidth usage.
Based on ATM circuits on PSNs, Asynchronous Transfer Mode over Packet Switching
Networks (ATMoPSN) is a type of PWE3 service emulation. ATMoPSN emulates ATM
services over a PSN such as an MPLS or Ethernet network, and transparently transmits ATM
services over the PSN. ATM cells can be encapsulated in the following modes: 1-to-1 VPC, 1-
to-1 VCC, N-to-1 VPC, and N-to-1 VCC.
Purpose
Currently, on mobile carriers' networks, a great number of ATM switches are deployed at
convergence points to converge ports and bandwidths for the ATM and IMA interfaces of base
stations. With the changes in the entire industry chain, ATM switches are showing
disadvantages in terms of costs and scalability.
Along with the trend of All-IP on core networks and increasing use of the Ethernet technology
on access-layer devices, the Ethernet plus IP solution has become more appealing to
customers than conventional service access and bearing solutions, in terms of both costs and
resource usage. Therefore, for service providers and users, the provision and bearing of ATM
services need to be shifted to PSNs. ATMoPSN is a well-developed solution to meet this
need.
Benefits
This feature offers the following benefits to carriers:
l Construction and maintenance of networks will cost less.
l Networks can be expanded flexibly and bandwidth usage is more efficient.
This feature offers the following benefits to users:
None
6.1.2 Principles
Basic IMA Concepts
l ICP cell
ICP is short for IMA Control Protocol. ICP cells are a type of IMA negotiation cells,
used mainly to synchronize frames and transmit control information (such as the IMA
version, IMA frame length, and peer mode) between communicating devices. The offset
of ICP cells in IMA frames on a link is fixed. Like common cells, ICP cells consist of a
5-byte header and 48-byte payload.
l Filler cell
In the ATM model without an IMA sub-layer, decoupling of cell rates is implemented by
Idle cells at the Transmission Convergence (TC) sub-layer. After the IMA sub-layer is
adopted, decoupling of cell rates can no longer be implemented at the TC sub-layer due
to frame synchronization. Therefore, Filler cells are defined at the IMA sub-layer to
implement decoupling of cell rates. If there is no ATM cell to be sent, the sender sends
Filler cells so that the physical layer transmits cells at a fixed rate.
l Minimum number of active links
It refers to the minimum number of active links that are required when the IMA group
enters the Operational state. Link faults may cause the number of active links for the
IMA group in the Operational state to be smaller than the configured minimum value. As
a result, the IMA group status changes and IMA may go Down. Two communication
devices can be configured with different minimum numbers of active links, but both
devices must be configured with at least the specified minimum number of active links to
be able to properly send ATM cells.
l Differential delay
Links in an IMA group may have different delays and jitters. If the difference between
the greatest phase and the smallest phase in an IMA group exceeds the configured
differential delay, the IMA group removes the link with the longest delay from the
cyclical sending queue and informs the peer that the link is unavailable by sending the
Link Control Protocol (LCP) cells. Through negotiation between the two ends of a link,
the link becomes active and then rejoins the cyclical sending queue of the IMA group.
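The differential delay and minimum active link rules can be sketched together. This is an illustration under assumptions: delays are given in milliseconds per link, links are removed one at a time starting with the longest delay until the spread fits the configured value, and the names are hypothetical. Renegotiation and rejoining are not modeled.

```python
def prune_links(delays_ms: dict, max_diff_ms: float) -> dict:
    """Remove the slowest link while the delay spread exceeds the configured value."""
    active = dict(delays_ms)
    while len(active) > 1 and max(active.values()) - min(active.values()) > max_diff_ms:
        slowest = max(active, key=active.get)
        del active[slowest]
    return active


def group_operational(active: dict, min_active_links: int) -> bool:
    """The IMA group stays usable only with at least the minimum number of active links."""
    return len(active) >= min_active_links


delays = {"E1-0": 2.0, "E1-1": 3.0, "E1-2": 30.0}  # hypothetical link delays
active = prune_links(delays, max_diff_ms=25.0)
```

Here the third link is removed because its delay exceeds the fastest link's delay by more than 25 ms; the group remains operational if the minimum number of active links is 2 but not if it is 3.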
ATMoPSN
Figure 6-1 shows a reference model of ATM cell transport.
Emulated Service
Layer 2 service emulation attempts to emulate original ATM services between two PEs
connected through PWs that are set up to transmit packets, cells, and bit streams over public
networks or PSNs.
The outer tag (called PSN Label) identifies a PSN tunnel; the inner tag (called PW Header)
identifies a PW; ATM cells that are used for Layer 2 connections are the payload of PWs.
ATM cell transport involves three levels (port, VP, and VC), four encapsulation modes (N-to-1,
1-to-1, AAL5-PDU, and AAL5-SDU), and two transparent transport modes (cell and
frame). They are applicable to different scenarios.
Currently, only the N-to-1 VPC, N-to-1 VCC, 1-to-1 VPC, and 1-to-1 VCC encapsulation modes
for AAL0 cells are supported.
l N-to-1 VPC
In N-to-1 VPC ATM cell transport, a PW transmits cells of multiple ATM VPCs. A
tunnel packet carries both the VPI and the VCI information, as shown in Figure 6-2.
l N-to-1 VCC
In N-to-1 VCC ATM cell transport, a PW transmits cells of multiple ATM VCCs. A
tunnel packet carries both the VPI and the VCI information. ATM cell transport through
PWs must support the N-to-1 VCC mode. In this mode, multiple VCs can be set up
between a PE and a CE. Data transmission on VCs is independent of each other.
In N-to-1 VCC ATM cell transport, multiple VCs of different service boards can be
mapped to a PW, as shown in Figure 6-3.
l 1-to-1 VPC
In 1-to-1 VPC ATM cell transport, one PW transmits cells of one ATM VPC. A tunnel
packet carries the VCI information but not the VPI information, as shown in Figure 6-4.
[Figure 6-4: 1-to-1 VPC encapsulation; the pseudowire header is followed by ATM cells that each carry a VCI]
l 1-to-1 VCC
In 1-to-1 VCC ATM cell transport, one PW transmits cells of one ATM VCC. A tunnel
packet does not carry the VPI or VCI information, as shown in Figure 6-5.
[Figure 6-5: 1-to-1 VCC encapsulation; the pseudowire header is followed by the fields M, V, Res, PTI, and C]
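The four encapsulation modes differ in which ATM identifiers travel in the tunnel packet. The lookup below restates the descriptions above; the dictionary and function names are illustrative, and the actual packet encoding is not modeled.

```python
# Identifiers carried in the tunnel packet for each encapsulation mode.
CARRIED_FIELDS = {
    "N-to-1 VPC": ("vpi", "vci"),
    "N-to-1 VCC": ("vpi", "vci"),
    "1-to-1 VPC": ("vci",),  # the PW itself identifies the VPC
    "1-to-1 VCC": (),        # the PW itself identifies the VCC
}


def tunnel_cell_ids(mode: str, vpi: int, vci: int) -> dict:
    """Return the identifiers that a tunnel packet carries for one cell."""
    values = {"vpi": vpi, "vci": vci}
    return {name: values[name] for name in CARRIED_FIELDS[mode]}
```

For example, a cell with VPI 1 and VCI 32 keeps both identifiers in either N-to-1 mode, only the VCI in 1-to-1 VPC mode, and neither in 1-to-1 VCC mode.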
6.1.3 Applications
Applicable Scenario 1
[Figure 6-6: 2G and 2.5G base stations connect to PE1 through ATM over E1 links; PE1 and PE2 communicate across the packet switched network; PE2 connects to the 2G BSC and 2.5G BSC through ATM over E1 or E3/OC3 links]
Scenario description
As shown in Figure 6-6, after ATM services from the NodeB are converged at the E1 interface
on PE1, ATM cells are encapsulated into PSN packets that can be transmitted over PSNs.
After arriving at the downlink PE2, the PSN packets are decapsulated into the original ATM
cells and then the ATM cells are sent to the RNC.
Advantages of the solution
In this solution, services of multiple types are converged at a PE on a PSN. This improves the
efficiency of current network resources, reduces Plesiochronous Digital Hierarchy (PDH)
VLLs, and facilitates the deployment of new sites as well as the maintenance and
management of multiple services.
Applicable Scenario 2
[Figure 6-7: NodeBs connect over N*E1 (ATM IMA) links on the Iub interface; PWE3 ATM transparent cell transport crosses MPLS over Metro Ethernet to the CX600, which connects to the RNC over N*E1 (ATM IMA) links]
Scenario description
Deploying ATN on a Metro Ethernet-based MPLS network, as shown in Figure 6-7, can
solve the problem of bandwidth statistical multiplexing. In this scheme, a NodeB is connected
to the ATN that provides an E1 IMA interface. After the ATN receives cells on the IMA
interface, it transparently transmits the high-speed ATM cell stream through ATM PWE3 to
the CX at the RNC side. Then, the CX at the RNC side divides the high-speed ATM cell
stream into N segments, and sends each segment along a low-speed E1 link to the RNC.
Advantages of the solution
In this solution, MPLS networks are used to implement bandwidth multiplexing, reducing
costs on network construction and maintenance.
Terms
None
AN Access Node
PW Pseudo Wire
Definition
The Point-to-Point Protocol (PPP) is a link layer protocol. It includes the following components:
l Link Control Protocol (LCP): creates, monitors, and tears down PPP links.
l Network Control Protocol (NCP) suite: negotiates the format and type of packets
transmitted on data links.
l Extended PPP suite (such as PPPoE): extends PPP functions.
In the event that a single synchronous serial interface cannot meet the bandwidth requirement,
you can use the Multilink PPP (MP) to bundle multiple synchronous serial interfaces to form
a logical interface to meet the bandwidth requirement.
Purpose
PPP transmits data between two peers over full-duplex synchronous or asynchronous links. In
addition, PPP provides authentication mechanisms.
6.2.2 Principles
[Figure 6-8: PPP phase diagram with the phases Dead, Establish, Authenticate, Network, and Terminate and the events UP, OPENED, SUCCESS/NONE, FAIL, and DOWN]
In the process of configuring, maintaining, and terminating the point-to-point (P2P) link, the
P2P link goes through several distinct phases which are specified in Figure 6-8:
1. Link Dead
Setup of a PPP link begins and ends with the Link Dead phase.
After the communicating devices on both ends detect that a physical link is activated
(generally, carrier signals are detected on the link), the devices enter the Link
Establishment phase.
2. Link Establishment
In this phase, the LCP negotiation is performed. The negotiation involves options such
as: Maximum Receive Unit (MRU), authentication mode, magic number, and
asynchronous character mapping.
If the LCP negotiation fails, both ends return to the Link Dead phase. If the LCP
negotiation succeeds, LCP changes to an Open state, indicating that the lower-layer link
has been established and the devices enter the next phase. If authentication is configured,
the devices enter the Authentication phase; if authentication is not configured, the
devices enter the Network-Layer Protocol phase.
3. Network-Layer Protocol
Once PPP has completed the previous phases, each network-layer protocol (such as IP,
IPX, or AppleTalk) must be separately configured by the appropriate Network Control
Protocol (NCP). After an NCP enters the Open state, PPP will carry the corresponding
network-layer protocol packets.
If one device receives a Configure-Request packet in this phase, both devices return to
the Link Establishment phase.
4. Link Termination
PPP can terminate the link at any time. This might happen because of the loss of carrier
signal, authentication failure, link quality failure, the expiration of an idle-period timer,
or the administrative closing of the link.
LCP is used to close the link through the exchange of Terminate packets. When the link
is closing, PPP informs the network-layer protocols so that they may take appropriate
action. After the exchange of Terminate packets, the implementation should signal the
physical-layer to disconnect in order to enforce the termination of the link.
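The phase transitions described above can be captured in a small state table. This is a sketch rather than a PPP implementation: the phase and event names follow the phase diagram, the CLOSING event from the Network phase is taken from RFC 1661 and does not appear in the text above, and the `run` helper is illustrative.

```python
# (current phase, event) -> next phase
TRANSITIONS = {
    ("Dead", "UP"): "Establish",              # physical link activated
    ("Establish", "OPENED"): "Authenticate",  # LCP negotiation succeeded
    ("Establish", "FAIL"): "Dead",            # LCP negotiation failed
    ("Authenticate", "SUCCESS"): "Network",
    ("Authenticate", "NONE"): "Network",      # authentication not configured
    ("Authenticate", "FAIL"): "Terminate",
    ("Network", "CLOSING"): "Terminate",      # event name assumed from RFC 1661
    ("Terminate", "DOWN"): "Dead",
}


def run(events, phase="Dead"):
    """Apply a sequence of events and return the resulting phase."""
    for event in events:
        key = (phase, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in phase {phase!r}")
        phase = TRANSITIONS[key]
    return phase
```

For example, `run(["UP", "OPENED", "NONE"])` ends in the Network phase, while a failed LCP negotiation (`["UP", "FAIL"]`) returns the link to Dead.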
6.2.3 Applications
None
MP Multilink PPP
6.3 CES
6.3.1 Introduction
Definition
l TDM
Time Division Multiplexing (TDM) divides a channel by time. Voice signals are sampled,
and each sampled signal occupies a fixed interval, called a timeslot, in time sequence.
In this way, TDM combines multiple signals into one high-rate composite digital signal
(group signal) with a defined structure. Each signal is transmitted independently.
– In the PDH system, E1, T1, E3, and T3 are usually used.
– In the SDH system, the STM-1, STM-4, and STM-16 are usually used.
Clock Synchronization
TDM services require clock synchronization. One of the two communicating parties
takes the clock of the other as its source; that is, the device functioning as the Data
Circuit-terminating Equipment (DCE) outputs clock signals to the device functioning as
the Data Terminal Equipment (DTE). If the clock mode is incorrect or the clock is faulty,
bit errors occur or synchronization fails.
The synchronization clock signals for TDM services are extracted from the physical
layer. The 2.048 MHz synchronization clock signals for E1 are extracted from the line
code. The transmission adopts HDB3 or AMI coding that carries timing information.
Therefore, devices can extract clock signals from these two types of codes.
l PWE3
Pseudo Wire Emulation Edge-to-Edge (PWE3) is a mechanism that emulates the core
features of a telecom service, such as a T1 leased line or frame relay (FR), over a
packet switched network (PSN). The PW technology carries emulated services from one PE
to another PE or to multiple PEs through a PSN. It uses a tunnel (IP/MPLS) on the PSN to
emulate multiple services, such as HDLC, PPP, TDM, and Ethernet.
A PSN can transmit Protocol Data Units (PDUs) of multiple services. Interoperability and
conversion between the services are not required. Tunnels used for PWE3 are called
pseudo wires (PWs). PW data traffic is invisible to the core network. The core network
transparently transmits CE services.
(Figure: ACs at both ends, connected by a PW that is carried in a PSN tunnel)
l TDMoPSN
TDM Circuits over Packet Switching Networks (TDMoPSN) is a kind of PWE3 service
emulation. TDMoPSN emulates TDM circuits over a PSN such as an MPLS or Ethernet
network, thereby transparently transmitting TDM services over the PSN. TDMoPSN is
mainly implemented by means of two protocols: Structure-Agnostic TDM over Packet
(SAToP) and Structure-Aware TDM Circuit Emulation Service over Packet Switched
Network (CESoPSN).
l IP RAN
IP RAN is a technology that mobile carriers use to carry wireless services over an IP
network. IP RAN scenarios are complex because different base stations (BSs), interface
technologies, and access and convergence scenarios are involved.
– 2G/2.5G/3G/LTE, traditional BSs/IP BSs, GSM/CDMA, and TDM/ATM/IP (interface
technologies) are involved.
– Depending on the BS type, distribution model, network environment, and evolution
process, the convergence modes include microwave, MSTP, DSL, PON, and fiber.
You can converge services on BSs directly to the MAN UPE or through
convergence gateways (with functions of BS convergence, compression
optimization, packet gateway, and offload).
– Reliability, security, QoS, and operation and maintenance (OM) are considered in IP
RAN scenarios. In some IP RAN scenarios, transmission efficiency is also a concern.
Purpose
TDMoPSN is a mature solution of this kind. It is used to access and carry TDM services
on the PSN. TDMoPSN is mainly applied in IP RAN to carry wireless services and to carry
fixed network services between MSAN devices.
Benefits
The TDMoPSN feature offers the following benefits to carriers:
l Saves rent for expensive TDM leased lines.
l Facilitates smooth evolution of the network.
l Simplifies network operations and reduces maintenance cost.
l Binds only the useful time slots into packets to improve the resource utilization.
The TDMoPSN feature offers the following benefits to users:
Enterprises that access the network for voice services no longer need to pay fixed
network operators expensive rent for leased lines.
6.3.2 Principles
TDMoPSN
A TDMoPSN packet, as defined by RFC 4553 (Structure-Agnostic Time Division
Multiplexing over Packet), includes the Ethernet header, the TDMoPSN packet (CESoPSN
or SAToP packet), and the FCS.
TDMoPSN Frame
l SAToP
The Structure-Agnostic TDM over Packet (SAToP) function emulates low-rate PDH circuit
services.
SAToP is used to carry E1 services in unframed (non-structured) mode. It divides and
encapsulates the serial data streams of TDM services, and then transmits the
encapsulated packets in a PW. SAToP is the simplest method for transparently
transmitting low-rate PDH services among TDM circuit emulation schemes.
(Figure: a DS1/T1 frame, with framing bit F and channels Ch1 to Ch24, carried in UDT mode through the TDMoIP IWF)
TDMoPSN services on the ATN are encapsulated through MPLS. The CESoPSN
encapsulation structure complies with draft-ietf-pwe3-cesopsn-07, and the SAToP
encapsulation structure complies with RFC 4553 (Structure-Agnostic Time Division
Multiplexing over Packet).
CESoPSN implementation
CESoPSN services are encapsulated through MPLS, with the structure defined by
Recommendation draft-ietf-pwe3-cesopsn-07 as shown in Figure 6-14.
(Figure 6-14: CESoPSN encapsulation structure, including an OPTIONAL field)
l MPLS Label
The specified PSN header includes the data required for forwarding packets from the PSN
border gateway to the TDM border gateway.
PWs are distinguished by PW labels that are carried on the specified layer of the PSN.
Because TDM is bidirectional, the two PWs in opposite directions must be correlated.
l PW Control Word
The structure of the CESoPSN control word is defined by Recommendation draft-ietf-
pwe3-cesopsn-07 as shown in Figure 6-15.
The padding method for the RTP header on the ATN is to keep the sequence number (16
bits) consistent with the PW control word and pad other bits with 0s.
l TDM Payload
The length of the TDM payload, in bytes, is the number of encapsulated frames multiplied
by the number of timeslots bound to the PW. When the whole PW packet is shorter than
64 bytes, fixed padding bits are added to meet the requirements of Ethernet
transmission.
SAToP implementation
SAToP services are encapsulated through MPLS, with the structure defined by RFC 4553
(Structure-Agnostic Time Division Multiplexing over Packet), as shown in Figure 6-17.
(Figure 6-17: SAToP encapsulation structure, with the MPLS label stack, an OPTIONAL field, and the TDM data (payload))
l MPLS Label
The MPLS label for SAToP is the same as the MPLS label for CESoPSN.
l PW Control Word
The structure of the SAToP control word is defined by RFC 4553 (Structure-Agnostic
Time Division Multiplexing over Packet), as shown in Figure 6-18.
– Sequence number (16 bits): It is used for PW sequencing and enables the detection
of discarded and disordered packets. The sequence number space is an unsigned
circular space. The initial value of the sequence number is random.
l Optional RTP
The optional RTP for SAToP is the same as the optional RTP for CESoPSN.
l TDM Payload
The length of TDM payload is the number of encapsulated frames multiplied by 32
(bytes). When the length of the whole PW packet is shorter than 64 bytes, the fixed bits
are padded to meet requirements of Ethernet transmission.
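The two payload-length rules above (CESoPSN and SAToP) can be expressed as a short calculation. This is a minimal sketch: the function names are hypothetical, and the limit of 31 bound timeslots for an E1 in CESoPSN mode is an assumption based on timeslot 0 not carrying data.

```python
E1_TIMESLOTS = 32  # bytes per E1 frame


def cesopsn_payload_bytes(frames_per_packet, timeslots_bound):
    """TDM payload length in CESoPSN mode: frames x bound timeslots."""
    if frames_per_packet < 1:
        raise ValueError("at least one frame must be encapsulated")
    if not 1 <= timeslots_bound <= E1_TIMESLOTS - 1:
        raise ValueError("an E1 PW binds between 1 and 31 timeslots")
    return frames_per_packet * timeslots_bound


def satop_e1_payload_bytes(frames_per_packet):
    """TDM payload length in SAToP mode for E1: frames x 32 bytes."""
    if frames_per_packet < 1:
        raise ValueError("at least one frame must be encapsulated")
    return frames_per_packet * E1_TIMESLOTS
```

For example, with 8 frames per packet, a CESoPSN PW bound to all 31 data timeslots carries 248 payload bytes, and a SAToP PW carries 256 payload bytes.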
Implementation Procedures
E1 frames are transmitted at 8000 frames/second, and each frame contains 32 bytes. An E1
frame consists of 32 timeslots, and each timeslot corresponds to one of the 32 bytes. In
CESoPSN mode, timeslot 0 (byte 0 of the 32 bytes) serves as the frame header; it cannot
carry data and is reserved for special processing. The other 31 timeslots correspond to
bytes 1 to 31 of each E1 frame. In SAToP mode, no frame header is used and all 32 bytes
of an E1 frame are treated as data.
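The frame arithmetic above also yields the familiar E1 rates. The following sketch simply multiplies out the figures given in the text; the constant and function names are hypothetical.

```python
FRAMES_PER_SECOND = 8000
TIMESLOTS_PER_FRAME = 32
BITS_PER_TIMESLOT = 8


def e1_line_rate_bps():
    """Bit rate of a complete E1 frame stream (SAToP carries all of it)."""
    return FRAMES_PER_SECOND * TIMESLOTS_PER_FRAME * BITS_PER_TIMESLOT


def cesopsn_max_payload_rate_bps():
    """Bit rate of the 31 data timeslots (timeslot 0 is the frame header)."""
    return FRAMES_PER_SECOND * (TIMESLOTS_PER_FRAME - 1) * BITS_PER_TIMESLOT
```

The first value is 2,048,000 bit/s, which matches the 2.048 MHz E1 clock mentioned earlier; the second is 1,984,000 bit/s.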
As shown in Figure 6-19, traffic in the following implementation procedure goes from CE1
through PE1 and PE2 to CE2. In the direction from CE1 to PE1, in CESoPSN mode, PE1
encapsulates bytes 1 to 31 (the payload) of each E1 frame received from CE1 into a PW
packet. In SAToP mode, PE1 takes 256 bits (32 x 8 = 256 bits) at a time from the bit
stream and encapsulates them as payload into a PW packet. Because the frequency of E1
frames is fixed, PE1 receives data (31 bytes or 256 bits) from CE1 at a fixed frequency
and encapsulates it into the PW packet continuously. When the number of encapsulated
frames reaches the preconfigured number, the whole PW packet is sent to the PSN.
In the encapsulation structure of a PW packet, the control word is mandatory. The L bit,
R bit, and sequence number field require particular attention. The L bit and R bit carry
alarm information. They are used when E1 frame data received by PE1 is transparently
transmitted in a PW to an E1 interface of PE2 and PE1 needs to pass alarm information
(such as AIS and RDI) from CE1 to the remote device. PE1 reports the received alarm
information (AIS/RDI) to the control plane. The control plane modifies the L bit and R
bit in the control word of the PW packet, which is then sent with the E1 frame data to PE2.
The sequence number is used to detect PW packets that are discarded or disordered
during forwarding on the PSN. Every time PE1 sends a PW packet, the sequence number
increases by 1.
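Because the sequence number is a 16-bit unsigned circular value, both the increment and the comparison must wrap at 65536. The sketch below illustrates one way a receiver could classify an arriving packet. The half-window rule used to tell a missing packet from a late one is a common convention assumed here for illustration; it is not taken from this document.

```python
SEQ_MODULUS = 1 << 16  # 16-bit circular sequence space


def next_seq(seq):
    """Sequence number of the next PW packet sent, wrapping after 65535."""
    return (seq + 1) % SEQ_MODULUS


def classify(expected, received):
    """Compare a received sequence number with the expected one."""
    gap = (received - expected) % SEQ_MODULUS
    if gap == 0:
        return "in order"
    if gap < SEQ_MODULUS // 2:
        # Assumption: a small forward gap means packets were lost or delayed.
        return f"{gap} packet(s) missing"
    return "late or duplicate"
```

For example, if the receiver expects 65535 and receives 1, the gap is 2 even though the number has wrapped.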
The downstream traffic goes from PE2 to CE2. After receiving a PW packet from the PSN,
PE2 caches the packet in a buffer selected by applying a mask to the sequence number.
For example, if the sequence number is 16 bits long and 256 buffers are configured for
caching, the lowest 8 bits of the 16-bit sequence number are used as the buffer address.
When the received PW packets are in sequence and the configured jitter buffer has filled
to its threshold, the PW packets are unpacked and then sent. For example, assume that 8
frames are encapsulated in a packet. At 8000 frames/second, 8 frames take 1 ms. If the
jitter buffer is configured to 3 ms, PW packets are not sent until 3 packets have been
buffered.
If the PW packet corresponding to a sequence number is not received, an idle code (its
payload is configurable) is sent.
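The buffer addressing, the jitter-buffer threshold, and the idle-code substitution described above can be sketched as follows. All names are hypothetical, the buffer count is assumed to be a power of two so that a bit mask can be used, and the dictionary stands in for whatever storage the hardware actually uses.

```python
import math

FRAMES_PER_SECOND = 8000


def buffer_slot(seq, slots=256):
    """Map a sequence number to a buffer address using its low-order bits."""
    if slots & (slots - 1):
        raise ValueError("slots must be a power of two to use a mask")
    return seq & (slots - 1)


def packets_to_fill(jitter_buffer_ms, frames_per_packet):
    """Number of packets that must be buffered before play-out starts."""
    frames_needed = jitter_buffer_ms * FRAMES_PER_SECOND / 1000
    return math.ceil(frames_needed / frames_per_packet)


def play_out(received, first_seq, count, idle_payload):
    """Return `count` payloads in order, substituting the idle code for gaps.

    `received` maps sequence numbers to payloads.
    """
    return [
        received.get((first_seq + i) % 65536, idle_payload)
        for i in range(count)
    ]
```

With 8 frames per packet and a 3 ms jitter buffer, packets_to_fill returns 3, which matches the example in the text.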
Before the PW packet is parsed and the sequence number is processed, the L bit and R bit
must be processed. The L bit and R bit, which carry alarm information, are sent to PE2.
After the payload is extracted, it is sent to CE2 at the same frequency as that of
CE1, with 31 bytes or 256 bits in each frame; otherwise, PE2 overruns or
underruns. Therefore, clock synchronization (frequency synchronization) is required between
the CE1 clock and the PE2 clock in TDM transparent transmission.
(Figure: ACs at both ends, connected by a PW that is carried in a PSN tunnel)
As shown in Figure 6-19, it is assumed that data is transmitted from CE2 to CE1. Alarm
transparent transmission is the process of transmitting E1/T1 alarms on PE1 to downstream
PE2 through the PW control word, restoring E1/T1 alarms, and then transmitting them to
CE2, and vice versa.
The types of alarms that can be transparently transmitted are AIS and RDI. The PW
control word fields involved are the L bit, R bit, and M bit.
Other Features
Both the non-slotted TDM interface (SAToP transparent transmission) and the slotted TDM
interface (CESoPSN transparent transmission) can be created.
The serial port supports packet encapsulation through multiple protocols, such as TDM,
ATM, and PPP.
Both dynamic and static PWs are supported.
6.3.3 Applications
Applicable Scenario 1
(Figure: Node Bs connect to PE1 through ATM over E1 and TDM over E1; PE1 and PE2 communicate over the packet switched network; PE2 connects to the RNCs through E1 and E3/OC3 links.)
Scenario description
After TDM services from 2G base stations are converged on the E1 interface on PE1, the
TDM data is encapsulated into PSN packets that can be transmitted on the PSN. After
reaching downstream PE2, the PSN packets are decapsulated to restore the original TDM
data, which is then sent to the 2G convergence device.
Advantages of the solution
In the solution, multiple types of services are converged at a PE on the PSN. The solution
effectively saves original network resources, uses fewer PDH VLLs, and facilitates site
deployment and the maintenance and administration of multiple services.
Purpose
BER measurement uses pseudo-random binary sequences (PRBS) sent over an entire link to
monitor link connectivity and quality.
Benefits
BER measurement brings the following benefits to operators:
l Monitors link quality during network cutover and helps identify potential risks,
improving the cutover success ratio and minimizing user complaints about operator
network issues.
l Helps speed up service deployment and cutover on a network, shortening the service
launch period.
6.4.3 Application
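(Figure: Node B connected to the ATN over E1, the ATN connected to the CX through the MPLS/IP core, and the CX connected to the RNC over E1)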
The ATN can detect the bit error rate on E1 links between itself and a NodeB or RNC.
Terms
BER Description
6.5 APS
This document describes principles and applications of the Automatic Protection Switching
(APS) feature.
NOTE
Only ATN 950B supports the Automatic Protection Switching (APS) feature.
Definition
Automatic Protection Switching (APS) is a mechanism of using a protection interface on the
Synchronous Digital Hierarchy (SDH) network as the backup for a working interface. When
APS detects a fault on the working link, a switchover request conveyed by the K1 and K2
bytes of the Multiplex Section Overhead (MSOH) on the protection link is sent to the peer
device. Upon receiving the switchover request, the peer device returns a reply and performs
the switchover action.
Object
APS is an inherent feature of the SDH network. In a mobile bearer network, the ATN needs to
be connected to the add/drop multiplexer (ADM) on the SDH network or to the RNC. The
earlier protection mechanisms on the ATN, however, cannot protect the link between the
ATN and the ADM or RNC. The APS feature, which is supported on the ATN, ADM, and RNC,
is therefore introduced to meet the requirement for link protection.
Benefits
The APS feature brings remarkable benefits to operators:
6.5.2 Principles
1001 Unused
0111 Unused
0101 Unused
0100 Exercise
0011 Unused
0000 No request
The priority of 0000 is the lowest, and the priority of 1111 is the highest.
K2 byte coding
l Bits 1 to 4 indicate the link number. The link is defined with the same syntax as K1 Bits
5 to 8.
l Bit 5 indicates the protection mode: 1 indicates the 1:N mode and 0 indicates the 1+1
mode.
l Bits 6 to 8 indicate the operation mode or operation code. For details, see Table 6-2.
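The K1 and K2 field layout described above can be illustrated with a small decoder. This sketch assumes the usual SDH convention that bit 1 is the most significant bit of the byte, and that bits 1 to 4 of K1 carry the request code whose values are listed in the table above. The operation code in bits 6 to 8 of K2 is returned as a raw number because Table 6-2 is not reproduced here. Function names are hypothetical.

```python
def decode_k1(k1):
    """Split a K1 byte into request code (bits 1-4) and link number (bits 5-8)."""
    if not 0 <= k1 <= 0xFF:
        raise ValueError("K1 must be a single byte")
    return {"request_code": k1 >> 4, "link_number": k1 & 0x0F}


def decode_k2(k2):
    """Split a K2 byte into link number, protection mode, and operation code."""
    if not 0 <= k2 <= 0xFF:
        raise ValueError("K2 must be a single byte")
    return {
        "link_number": k2 >> 4,                            # bits 1 to 4
        "protection_mode": "1:N" if k2 & 0x08 else "1+1",  # bit 5
        "operation_code": k2 & 0x07,                       # bits 6 to 8
    }


def higher_priority_request(k1_a, k1_b):
    """Return the K1 byte with the higher-priority request (1111 is highest)."""
    return k1_a if (k1_a >> 4) >= (k1_b >> 4) else k1_b
```

For example, decode_k2(0b00011101) reports link number 1, the 1:N mode, and operation code 5.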
APS Modes
l According to the protection architecture, APS modes can be classified into 1 + 1 mode
and 1:N mode.
– In 1 + 1 mode, a protection link is paired with each working link. Normally, the
sender periodically sends the signal payload to both the working and protection
links (this process is called bridging), and the receiver obtains the signal payload
from the working link unless the working link becomes unavailable. In most cases,
the switchover action is performed only on the receiver and the negotiation between
the sender and receiver based on K1 and K2 bytes is not required.
In APS 1 + 1 mode, the time taken for the switchover is short and the switch
reliability is high; however, the link usage is as low as 50%. Figure 6-23 shows
detailed switchover procedures in APS 1 + 1 mode.
(Figure 6-23: APS 1 + 1 mode between source and destination. Normal condition: one signal is chosen per pair. Failure condition: the "best" signal is chosen.)
(Figure: APS 1:N mode between source and destination. Normal condition: the protection channel is empty.)
When multiple working links become unavailable, only the data on the working link
with the highest priority is switched to the protection link. Data on other working
links is discarded.
When N is 1, the APS mode is 1:1.
In 1:N mode, both the sender and receiver perform the switchover action after the
negotiation based on K1 and K2 bytes. Compared with 1 + 1 mode, 1:N mode
features higher link usage but lower reliability.
l According to the revertive mode, APS modes can be classified into revertive mode and
non-revertive mode.
In revertive mode, data is switched back from the protection link to the working link
after the working link becomes available and remains stable for several minutes. In
non-revertive mode, data is not switched back from the protection link to the working
link after the working link becomes available. The APS 1 + 1 mode can be revertive or
non-revertive; the default mode is non-revertive. The APS 1:1 mode can be revertive or
non-revertive; the default mode is revertive.
l According to the switchover mode in the event of link failure, APS modes can be
classified into unidirectional mode and bidirectional mode.
– In unidirectional mode, only the receiver detects the link failure and performs the
switchover action. After the switchover, the sender and receiver select different
links to receive traffic.
– In bidirectional APS mode, the receiver detects the fault, but both the receiver and
sender perform the switchover action after the negotiation based on K bytes. After
the switchover, the receiver and sender select the same link to send or receive data.
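Two of the rules above lend themselves to a short illustration: in 1:N mode only the failed working link with the highest priority is switched to the protection link, and in revertive mode traffic returns to the working link only after that link has been stable for some time. The sketch below is hypothetical throughout; in particular, the 300-second wait period is an assumed value standing in for the "several minutes" mentioned in the text.

```python
def select_protected_link(failed_links, priority):
    """Pick the failed working link that gets the protection link in 1:N mode.

    `priority` maps each link ID to a number; a larger number means a higher
    priority. Returns None when no working link has failed.
    """
    failed = list(failed_links)
    if not failed:
        return None
    return max(failed, key=lambda link: priority[link])


def traffic_on_protection(on_protection, working_ok, stable_seconds,
                          revertive, wait_to_restore_s=300):
    """Decide whether traffic should be on the protection link."""
    if not working_ok:
        return True
    if on_protection and revertive and stable_seconds >= wait_to_restore_s:
        return False  # switch back to the recovered working link
    return on_protection
```

In non-revertive mode the second function keeps returning True after a switchover, so traffic stays on the protection link even when the working link recovers.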
To guarantee the normal transmission of services (above Layer 2) after the APS switchover,
physical interfaces are added to a trunk interface, such as the CPOS-Trunk interface (Trunk-
Serial and Global-MP-Group). Service attributes are configured on the trunk interface to
transparently transmit services above Layer 2. However, physical attributes are configured on
the physical interfaces.
6.5.3 Applications
On the following mobile bearer network, NodeBs connect to an MSTP network through E1
lines. After processing services, the MSTP network sends the services to the ATN device
through CPOS interfaces. To improve service reliability, an APS group can be created, and the
two CPOS interfaces can be added to the APS group. If the working link fails, the ATN
device automatically receives traffic from the protection link, thereby ensuring service
reliability.
(Figure: APS networking with NodeB1 and NodeB2 (E1 lines), the MSTP network (STM-1), interfaces Cpos0/2/1, Cpos0/2/2, GE1/3/3, and GE0/3/3, and the RNC)
6.6 xDSL
6.6.1 Introduction
Definition
Digital subscriber line (DSL) provides digital connections over telephone lines without
affecting the plain old telephone service (POTS).
xDSL refers to a family of modulation and demodulation DSL technologies. xDSL uses a
high frequency (over 4 kHz) digital compression mechanism to provide high-speed broadband
network access service. Because the frequency band for xDSL is higher than that for voice
signals, telephone lines can transmit both data and voice signals without one affecting the
other.
Ethernet in the first mile (EFM) combines the technical advantages of SHDSL and Ethernet,
and provisions the POTS and high-speed Internet access services over common twisted pairs,
while addressing user demands for high definition (HD) TV and video on demand (VOD).
EFM is ideal for providing "last mile" access to residential areas.
Purpose
Obtaining cost-effective broadband resources is a challenge for carriers building mobile
backhaul networks. Fixed-line networks are a viable option for mobile backhaul, but terminal
modems on these networks are difficult to manage and their reliability cannot be guaranteed.
The xDSL physical interface cards (PICs) for ATNs integrate the traditional modems for easy
management and high reliability. xDSL PICs and other PICs designated for ATNs provide
mobile backhaul solutions for carriers.
Benefits
This feature helps carriers reduce investment costs in building 3G networks by using the high
bandwidth provided by the legacy copper cable fixed network. Compared with traditional
modems, xDSL also ensures higher reliability and easier management.
6.6.2 Principles
l In ATM mode, the xDSL PIC encapsulates packets in AAL5 format, converts them into
ATM cells, processes the ATM cells at the physical layer, and then transmits them.
l In EFM mode, the xDSL PIC encapsulates packets in EFM format, processes them at the
physical layer, and then transmits them.
l In IMA mode, the xDSL PIC encapsulates packets in AAL5 format, converts them into
IMA cells, processes the IMA cells at the physical layer, and then transmits them.
l xDSL interface: a physical interface on an xDSL PIC. You can disable or enable an
xDSL interface in the xDSL interface view, and configure an xDSL interface in a VE
interface view.
l DSL-group interface: a link-layer logical interface. You can set link-layer attributes in a
DSL-group interface view.
l VE interface: a Layer 3 logical interface used only on the NNI side to carry ETHoA
services. You can also run the portswitch command to switch a VE interface to a Layer
2 interface.
Before configuring xDSL services, create a VE interface and a DSL-group interface. Then,
enter the DSL-group interface view and configure link-layer attributes for the xDSL PIC.
Establish the binding relationships between VE and DSL-group interfaces and between
DSL-group and xDSL interfaces. After the binding relationships are established, configure
xDSL services on the VE interfaces. The services are then carried over the xDSL
interfaces.
In the transmit direction, an xDSL PIC receives Ethernet packets, processes them, and then
transmits them. In the receive direction, an xDSL PIC receives and processes service packets,
and then converts them into Ethernet packets for the ATN to further process.
6.6.3 Applications
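(Figure: ETH-based offload, with the HSDPA flow carried through the DSLAM and the wholesale xDSL service network; ATM IMA and STM-1 links are also shown)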
In the offload scenario for ETH-based service forwarding, data services (HSDPA flow) are
carried over the Layer 2 Ethernet switching network (wholesale xDSL service network). An
MPLS tunnel must be set up between the ATN and CX devices to carry the PW.
In the ATN-to-CX direction:
l The ATN encapsulates the ATM cells into Ethernet frames. Then, the xDSL PIC
performs EFM/PTM encapsulation or performs AAL5 adaptation and ATM
encapsulation.
l The DSLAM terminates the xDSL and ATM encapsulation and transports the Ethernet
frames into the Layer 2 Ethernet switching network.
l The CX receives and decapsulates the Ethernet frames to ATM cells.
(Figure: IP-based offload, with the HSDPA flow carried through the DSLAM and the wholesale xDSL service network; ATM IMA and STM-1 links are also shown)
In the offload scenario for IP-based xDSL service forwarding, data services (HSDPA flow)
are carried over the IP switching network (wholesale xDSL service network). A GRE tunnel
must be set up between the ATN and CX to carry the PW. GRE in this figure indicates that IP
packets are carried by the GRE tunnel.
l The ATN encapsulates the ATM cells into Ethernet frames. Then, the xDSL PIC
performs EFM encapsulation or performs AAL5 adaptation and ATM encapsulation.
l DSLAM terminates the xDSL and ATM encapsulation and forwards the packets over
Layer 3 to the CX.
6.7 GPON
Gigabit-capable passive optical network (GPON) is a PON technology standardized by ITU-T
Recommendation G.984.x. GPON devices support high-bandwidth transmission, thereby
addressing the bandwidth bottleneck in twisted-pair access and meeting user demands on
high-bandwidth services.
6.7.1 Overview
With the wide use of broadband services and fiber-in and copper-out development, carriers
require a longer transmission reach, higher bandwidth and reliability, and lower operating
expense (OPEX) on services. GPON meets the requirements by providing:
l A longer transmission reach: Optical fibers are used for transmission, providing a
coverage radius of 20 km for the access layer.
l A higher bandwidth: The maximum downstream and upstream bandwidths are 2.5 Gbit/s
and 1.25 Gbit/s, respectively, for each user.
l Quality of service (QoS) for all services: A GPON carries GPON encapsulation mode
(GEM) frames to ensure better QoS.
l Optical splitters: An optical splitter splits a single optical fiber into multiple optical
fibers, allowing a single optical fiber from the central office (CO) to feed multiple users.
Optical splitters conserve optical fiber resources, reduce the number of optical and
electrical devices in the CO, and reduce the OPEX.
A PON is a point to multi-point (P2MP) network and consists of three parts, as shown in
Figure 6-29.
l The optical line terminal (OLT) implements a PON protocol and aggregates PON traffic,
and is located at the CO.
l Optical network units (ONUs)/Optical network terminals (ONTs) are located on the user
side to provide various ports for connecting to user terminals. The OLT and ONUs are
connected through a passive optical distribution network (ODN) for communication.
l The ODN is composed of passive optical components, such as optical fibers and one or
more passive optical splitters (POSs). The ODN provides highly reliable optical channels
between the OLT and ONUs.
NOTE
A passive ODN does not require active optical amplifiers or regenerators, saving the costs associated
with maintaining outdoor active devices.
6.7.2 Introduction
GPON System
Gigabit-capable passive optical network (GPON) is a mainstream PON technology that
provides gigabit access speeds. Other PON technologies include Ethernet passive optical
network (EPON) and broadband passive optical network (BPON). BPON uses ATM
encapsulation for carrying ATM services (however, as ATM becomes obsolescent, BPON
usage is shrinking).
Figure 6-30 shows the components involved in a GPON network.
(Figure 6-30: OLT, ODN, and ONU. Downstream wavelength: 1490 nm. Upstream wavelength: 1310 nm.)
Main features:
l On a GPON network, an OLT connects to multiple ONUs by way of an optical splitter
that splits the optical fiber connection from the OLT into multiple optical fibers that
connect to the ONUs. The GPON network wavelength used for transmission in the
upstream direction is 1310 nm, and that used in the downstream direction is 1490 nm.
l Wavelength division multiplexing (WDM) is used to transmit data over an ODN: data is
broadcast in the downstream direction, and time division multiple access (TDMA) is
used in the upstream direction.
(Figures: an OLT connected through a splitter to ONU1, ONU2, and ONU3, shown for the downstream and upstream directions)
GEM Frame
On a gigabit-capable passive optical network (GPON), a GPON encapsulation mode (GEM)
frame is the smallest service-carrying unit and the basic encapsulation structure. All service
streams are encapsulated into GEM frames, transmitted over GPON media, and identified by
GEM ports. Each GEM port is identified by a unique port ID that is globally allocated by an
OLT. Similar to the virtual path identifier (VPI)/virtual channel identifier (VCI) in an
asynchronous transfer mode (ATM) virtual connection, a GEM port identifies a virtual service
channel that transmits service streams between the OLT and an ONU.
A GEM header consists of payload length indicator (PLI), Port ID, payload type indicator
(PTI), and header error check (HEC), and is used to differentiate data of different GEM ports.
l PLI: identifies the length of the data payload.
l Port ID: uniquely identifies a GEM port.
l PTI: identifies the type and status of the data that is being transmitted. For example, the
PTI value can indicate whether an operation, administration and maintenance (OAM)
message is being transmitted or whether data transmission is complete.
l HEC: provides error detection and correction for the GEM header to ensure transmission
quality.
l Fragment payload: identifies the payload of a frame fragment.
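The header fields listed above can be illustrated by packing them into a fixed-size header and parsing them back. The field widths used here (PLI 12 bits, Port ID 12 bits, PTI 3 bits, HEC 13 bits, 5 bytes in total) are taken from ITU-T G.984.3 and are not stated in this document, so treat them as an assumption. The HEC value is handled as an opaque number; this sketch does not compute or verify it.

```python
# (name, width in bits), most significant field first. Assumed from G.984.3.
GEM_FIELDS = (("pli", 12), ("port_id", 12), ("pti", 3), ("hec", 13))
GEM_HEADER_BYTES = 5


def pack_gem_header(pli, port_id, pti, hec=0):
    """Pack the four header fields into 5 bytes, big-endian."""
    values = {"pli": pli, "port_id": port_id, "pti": pti, "hec": hec}
    word = 0
    for name, width in GEM_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(GEM_HEADER_BYTES, "big")


def parse_gem_header(header):
    """Split a 5-byte GEM header into its fields."""
    if len(header) != GEM_HEADER_BYTES:
        raise ValueError("a GEM header is 5 bytes long")
    word = int.from_bytes(header, "big")
    fields = {}
    shift = 8 * GEM_HEADER_BYTES
    for name, width in GEM_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```

An ONU can use the parsed port_id to decide whether a frame belongs to one of its own GEM ports, and the PLI to find where the payload ends.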
Figure 6-34 shows the mapping between an Ethernet frame and GEM frame.
l The GPON system parses Ethernet frames and maps Ethernet data into GEM payloads
for transmission.
l GEM frames automatically encapsulate header information.
l The mapping format is clear and widely compatible.
T-CONT
A transmission container (T-CONT) is a carrier and basic control unit of upstream service
streams in the GPON system. Each T-CONT is identified by an Alloc-ID, which is allocated
by a GPON port of the OLT. All GEM ports are mapped to T-CONTs. T-CONTs then transmit
upstream service streams to an OLT through dynamic bandwidth allocation (DBA)
scheduling.
(Figure: GEM ports on the ONU are mapped to T-CONTs, which carry upstream traffic to the OLT.)
T-CONTs are divided into five types, which can be selected based on the upstream service
streams during scheduling. Each T-CONT type has its own quality of service (QoS) feature.
Table 6-3 lists the T-CONT types. Type 1 through Type 5 represent fixed, assured, non-
assured, best-effort, and hybrid modes, respectively.
Bandwidth Type       Type 1   Type 2   Type 3   Type 4   Type 5
Fixed Bandwidth      X        –        –        –        X
Assured Bandwidth    –        Y        Y        –        Y
Maximum Bandwidth    –        –        Z        Z        Z
Description
l Type 1: The fixed bandwidth is reserved for specific ONUs or specific services on
ONUs. It cannot be used by other ONUs even if no upstream service streams are carried
on the specific ONUs. This type applies to TDM or VoIP services that are sensitive to
service quality.
l Type 2: The assured bandwidth is available at any time required by an ONU. When the
bandwidth required by the service streams on the ONU is smaller than the assured
bandwidth, the system can use the DBA mechanism to allocate the remaining bandwidth
to services on other ONUs. Because DBA is required, this type provides lower real-time
performance compared with the fixed bandwidth.
l Type 3: This type is the combination of the assured bandwidth and maximum
bandwidth. The system assures some bandwidth for users and allows users to preempt
bandwidth. However, the total used bandwidth cannot exceed the maximum configured
bandwidth. This type applies to VoIP services.
l Type 4: This type is the maximum bandwidth that can be used by an ONU. It applies to
IPTV and high-speed Internet services.
l Type 5: This type is the combination of the fixed, assured, and maximum bandwidths.
It supports the following functions:
– Reserves bandwidth that cannot be preempted for users.
– Provides bandwidth to an ONU when required.
– Allows users to preempt some bandwidth. (The total used bandwidth cannot exceed
the maximum configured bandwidth.)
NOTE
In Table 6-3, X indicates the fixed bandwidth value, Y indicates the assured bandwidth value, Z indicates the
maximum bandwidth value, and a hyphen (-) indicates not involved.
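The relationship between T-CONT types and bandwidth parameters can be sketched as a simple configuration check. The assignment of the maximum bandwidth (Z) to Types 3, 4, and 5 follows the type descriptions in Table 6-3. The rule that the fixed plus assured bandwidth must not exceed the maximum bandwidth is an assumption made for this sketch, and the function name and units are hypothetical.

```python
# Bandwidth parameters that each T-CONT type takes (X, Y, Z in Table 6-3).
REQUIRED_FIELDS = {
    1: {"fixed"},
    2: {"assured"},
    3: {"assured", "maximum"},
    4: {"maximum"},
    5: {"fixed", "assured", "maximum"},
}


def check_tcont(tcont_type, **bandwidth_kbps):
    """Return a list of problems with a T-CONT profile; empty means valid."""
    required = REQUIRED_FIELDS.get(tcont_type)
    if required is None:
        return [f"unknown T-CONT type {tcont_type}"]
    given = set(bandwidth_kbps)
    problems = [f"missing {name} bandwidth" for name in sorted(required - given)]
    problems += [f"{name} bandwidth not used by this type"
                 for name in sorted(given - required)]
    guaranteed = bandwidth_kbps.get("fixed", 0) + bandwidth_kbps.get("assured", 0)
    maximum = bandwidth_kbps.get("maximum")
    if "maximum" in required and maximum is not None and guaranteed > maximum:
        problems.append("fixed plus assured bandwidth exceeds maximum bandwidth")
    return problems
```

For example, check_tcont(5, fixed=1024, assured=1024, maximum=4096) returns an empty list, whereas check_tcont(3, assured=2048, maximum=1024) reports that the guaranteed bandwidth exceeds the maximum.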
l Each GEM port can carry one or more types of service streams. For GEM ports carrying
service streams, each GEM port must be mapped to a T-CONT before upstream service
scheduling. Each optical network unit (ONU) supports multiple T-CONTs that can
transmit different types of services.
l A T-CONT can be bound to one or more GEM ports. On the optical line terminal (OLT),
GEM ports are demodulated from the T-CONT, and service streams are demodulated
from the GEM port payload for further processing.
Service mapping
l In the downstream direction, the GPON service processing unit encapsulates all service
streams into GEM ports and broadcasts the streams to all ONUs connected to the OLT's
GPON port. Each ONU filters data according to GEM port IDs and accepts only its own
services. Then, each ONU decapsulates service streams from the GEM port and sends
them to the user terminal through an ONU service port. Figure 6-35 shows GPON
service mapping in the downstream direction.
(Figure 6-35: downstream mapping, with a GEM port filter at the IFgpon interface of each ONU)
l In the upstream direction, ONUs map service streams to GEM ports and then to different
types of T-CONTs. After services are transmitted to an OLT, the T-CONT demodulates
GEM ports and sends them to the GPON MAC chip. The MAC chip demodulates
service streams in the GEM port payload and then sends them to a service processing
unit. Figure 6-36 shows GPON service mapping in the upstream direction.
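The downstream filtering step described above, in which every ONU receives the broadcast but keeps only the frames for its own GEM ports, can be sketched in a few lines. The frame representation (a pair of GEM port ID and payload) and the function name are hypothetical.

```python
def onu_accept(broadcast_frames, own_port_ids):
    """Keep only the payloads whose GEM port ID belongs to this ONU.

    `broadcast_frames` is an iterable of (gem_port_id, payload) pairs as
    broadcast by the OLT's GPON port.
    """
    own = set(own_port_ids)
    return [payload for port_id, payload in broadcast_frames if port_id in own]
```

Each ONU runs the same filter with its own set of port IDs, so a single broadcast stream serves all ONUs on the GPON port.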
(Figure 6-36: upstream mapping, with GEM ports mapped to T-CONTs at the IFgpon interface of each ONU)
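(Figure: GPON framing. Each downstream frame lasts 125 µs and consists of the physical control block downstream (PCBd) and the payload. The PCBd carries the upstream bandwidth map, in which each entry contains an Alloc-ID, a start time, and an end time. In the upstream direction, each ONU sends the PLOu, which includes the preamble, delimiter, ONU-ID, and Ind fields, followed by its allocation intervals.)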
Upstream framing
Each upstream frame contains the content carried by one or more T-CONTs. The BWmap in
each downstream frame identifies the transmission start time and end time for each T-CONT.
When an ONU takes over the PON media access right from another ONU, it must send physical
layer overhead upstream (PLOu) data. If an ONU is allocated two consecutive Alloc-IDs (the
start time of the second Alloc-ID is greater by 1 than the end time of the first), the ONU
does not send the PLOu for the second Alloc-ID.
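The PLOu rule above can be illustrated with a minimal sketch. The `Alloc` tuple, the `needs_plou` helper, and the sample start/end values are hypothetical names and numbers chosen for illustration; the sketch assumes all listed allocations belong to the same ONU and are sorted by start time.

```python
from typing import List, NamedTuple


class Alloc(NamedTuple):
    alloc_id: int
    start: int  # start time taken from the BWmap
    end: int    # end time taken from the BWmap


def needs_plou(allocs: List[Alloc]) -> List[bool]:
    """Return, per allocation, whether the ONU sends PLOu before it."""
    flags = []
    prev_end = None
    for alloc in allocs:
        # Consecutive: this start is exactly one greater than the previous end.
        consecutive = prev_end is not None and alloc.start == prev_end + 1
        flags.append(not consecutive)
        prev_end = alloc.end
    return flags


allocs = [Alloc(256, 100, 199), Alloc(257, 200, 299), Alloc(258, 400, 450)]
plou_flags = needs_plou(allocs)
print(plou_flags)
```

In this example the second allocation directly follows the first, so only the first and third allocations carry PLOu.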
Upstream GPON Frame
An upstream GPON frame consists of the physical layer overhead upstream (PLOu), physical
layer operations, administration, and management upstream (PLOAMu), dynamic bandwidth
report upstream (DBRu), and payload fields. These fields are described as follows:
l PLOu: used for frame alignment, synchronization, and identification for an ONU.
l PLOAMu: used for reporting ONU management messages, including maintenance and
management status. This field must be negotiated and may or may not be carried in a
frame.
l DBRu: used for reporting the T-CONT status to apply for bandwidth next time and for
allocating dynamic bandwidths. This field must be negotiated and may or may not be
carried in a frame.
l Payload: can be a DBA status report or data frame. If this field is a data frame, this field
consists of a GEM header and frames.
Downstream GPON Frame
GPON uses TDMA for upstream transmission. If multiple ONUs transmit data upstream
concurrently, transmission conflicts occur. To prevent conflicts, an OLT sends a notification
through the downstream frame, informing each ONU of its timeslot for upstream
transmission.
The OLT broadcasts PCBd to all ONUs. Each ONU receives the entire PCBd and performs
operations based on the information contained in the PCBd.
Figure 6-38 shows the PCBd structure.
PCBd Payload
PCBd contains PSync, Ident, PLOAMd, BIP, PLend, and US BW Map fields. These fields are
described as follows:
l PSync: used by ONUs to identify the start of each frame.
l Ident: used for indicating the position of a frame within a longer sequence of frames.
l PLOAMd: used for delivering management messages to ONUs, including maintenance and
management status. This field must be negotiated and may or may not be carried in a
frame.
l BIP: used for performing a parity check on all bytes between two BIP fields (excluding
the preamble and delimiter) to monitor bit errors.
l PLend: used for specifying the length of the BWmap field.
l US BW Map: used by the OLT for sending the upstream bandwidth mapping to each
T-CONT. The BWmap specifies the start and end times for each T-CONT in transmitting
data.
6.7.4.1 Ranging
The logical distances from optical network units (ONUs) to an optical line terminal (OLT)
vary. The round trip delays (RTDs) between an OLT and ONUs also vary depending on time
and environment. Therefore, collisions may occur when ONUs send data in TDMA mode
(in this mode, only one of the ONUs connecting to a PON port sends data at any given
moment), as shown in Figure 6-39.
[Figure 6-39: upstream collision without ranging; labels: ONU1, ONU2, ONU3, splitter, OLT, collision]
Ranging helps prevent these collisions and is performed when an ONU registers for the first time.
In the ranging process, the OLT measures the RTD and calculates the equalization delay
(EqD) of each ONU to ensure that the Teqd value (which is equal to RTD plus EqD) is the same
for all ONUs connected to the same PON port. As a result, the logical distance from
each ONU to the OLT is the same, preventing collisions during upstream transmission.
[Figure: result of ranging; labels: ONU1 (Td1), ONU2 (Td2), ONU3 (Td3), splitter, OLT]
NOTE
In the ranging process, the OLT must open a window and pause upstream transmission channels of other
ONUs.
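The relationship Teqd = RTD + EqD described above can be sketched as follows. The function name, the microsecond units, and the sample RTD values are assumptions made for illustration only; they are not taken from the product.

```python
def equalization_delays(rtd_us: dict, teqd_us: float) -> dict:
    """Compute EqD per ONU so that RTD + EqD equals the common Teqd."""
    eqd = {}
    for onu, rtd in rtd_us.items():
        if rtd > teqd_us:
            # An ONU farther away than the common target cannot be equalized.
            raise ValueError(f"{onu}: RTD {rtd} exceeds Teqd {teqd_us}")
        eqd[onu] = teqd_us - rtd
    return eqd


# Hypothetical measured round trip delays in microseconds.
rtd_us = {"ONU1": 50.0, "ONU2": 120.0, "ONU3": 200.0}
eqd_us = equalization_delays(rtd_us, teqd_us=250.0)
print(eqd_us)
```

The nearest ONU receives the largest equalization delay, so all ONUs appear to sit at the same logical distance from the OLT.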
apply in this direction. Figure 6-41 shows the burst transmit function supported by ONU-side
optical modules, and Figure 6-42 shows the burst receive function supported by OLT-side
optical modules.
[Figure: burst transmission; labels: ONU1, ONU2, ONU3, OLT, burst-transmit module, continuous-transmit module]
Ranging can be implemented to prevent cells transmitted by different ONUs from conflicting
with each other on the OLT. However, the ranging accuracy is ±1 bit, and the cells
transmitted by different ONUs have a protection time of several bits (not a multiple of 1 bit).
If the ONU-side optical modules do not support the burst transmit function, the
transmitted signals overlap and distortion occurs.
Main features:
l The distance from each ONU to the OLT varies and therefore the optical signal
attenuation varies for each ONU. As a result, the OLT may receive packets at different
optical power levels in different timeslots.
l If the OLT-side optical modules do not support the burst receive function, the OLT may
restore incorrect signals because only the level greater than the level threshold is valid
and the signals with the level lower than the level threshold cannot be restored.
6.7.4.3 DBA
In the GPON system, an OLT controls an ONU's upstream data traffic by sending
authorization signals to the ONU. PON requires an effective TDMA mechanism to control the
upstream traffic so that data packets from multiple ONUs do not collide in upstream
transmission. However, such a mechanism requires quality of service (QoS) management in
an optical distribution network (ODN). Because the ODN is a passive network, this management
either cannot be implemented or severely decreases efficiency. To resolve this problem,
ITU-T Recommendation G.984.3 defines the dynamic bandwidth allocation (DBA) protocol
for managing upstream PON traffic.
DBA brings the following benefits:
l Improved upstream bandwidth usage on a PON port
l More users on a PON port
l Higher bandwidths for services that have burst requirements
Figure 6-43 shows DBA principles.
[Figure 6-43: DBA principles; labels: ONU (T-CONT, scheduler), OLT (DBA algorithm logic), DBA report,
BW Map, time slot, control plane, data plane]
l The OLT controls the upstream traffic by allocating data authorization to each
transmission container (T-CONT) inside the ONU.
l The ONUs report their data status to the OLT. After receiving this report, the OLT uses
DBA to periodically update the information according to the status of data waiting to be
sent on the ONU and notifies all ONUs of the updates through a downstream frame.
l Each ONU dynamically adjusts its upstream bandwidth according to the allocated
bandwidth.
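The report-and-grant cycle above can be illustrated with a simplified allocation pass. This is only a sketch under stated assumptions, not the algorithm used by the OLT: the function name, the T-CONT names, the Mbit/s figures, and the order in which surplus bandwidth is handed out are all hypothetical. It assumes each T-CONT has fixed, assured, and maximum bandwidth values and reports its current demand.

```python
def dba_allocate(capacity: int, tconts: dict) -> dict:
    """Grant bandwidth per T-CONT: fixed first, then assured, then surplus."""
    grants = {}
    remaining = capacity
    # 1. Fixed bandwidth is granted regardless of reported demand.
    for name, t in tconts.items():
        grants[name] = t["fixed"]
        remaining -= t["fixed"]
    # 2. Assured bandwidth is granted only as far as demand requires it.
    for name, t in tconts.items():
        want = min(t["assured"], max(t["demand"] - grants[name], 0))
        grants[name] += want
        remaining -= want
    # 3. Surplus is shared in listing order, capped by maximum and by demand.
    for name, t in tconts.items():
        ceiling = min(t["maximum"], max(t["demand"], t["fixed"]))
        extra = min(max(ceiling - grants[name], 0), max(remaining, 0))
        grants[name] += extra
        remaining -= extra
    return grants


# Hypothetical T-CONT profiles and reported demand, in Mbit/s.
tconts = {
    "A": {"fixed": 100, "assured": 0, "maximum": 100, "demand": 0},
    "B": {"fixed": 0, "assured": 200, "maximum": 600, "demand": 500},
    "C": {"fixed": 0, "assured": 100, "maximum": 1000, "demand": 900},
}
grants = dba_allocate(1000, tconts)
print(grants)
```

T-CONT A keeps its fixed grant even with no traffic, while B and C share what is left according to their reported demand.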
NOTE
Bandwidth can also be allocated in static mode, which is also called fixed mode. In this mode, an OLT
periodically allocates a fixed bandwidth to each ONU based on the ONU's service level agreement (SLA),
bandwidth, and delay indicators.
l In static mode, an OLT uses a polling mechanism. The bandwidths allocated to ONUs may vary but the
bandwidth allocated to each ONU is the same in each polling period. The bandwidth guarantee depends
on an ONU's SLA but not on its upstream service traffic. An ONU is allocated a fixed bandwidth,
regardless of whether it is carrying upstream services.
l Static allocation mode is simple and applies to services that require a fixed bandwidth, such as TDM.
However, this mode does not apply to IP services that have burst requirements on bandwidth. If this
mode is applied to IP services, the upstream bandwidth may fail to meet the upstream service
transmission requirement.
6.7.4.4 FEC
Forward error correction (FEC) detects and corrects bit errors by allowing the transmit end to
encode redundant signals and the receive end to decode the signals based on specific rules.
Common FEC codes include Hamming codes, Reed-Solomon (RS) codes, and convolutional
codes. "Forward" in FEC means error correction is unidirectional, and no error feedback is
provided.
GPON uses RS(255,239) codes in which the codeword is 255 bytes long, consisting of 239
data bytes followed by 16 overhead bytes. RS(255,239) complies with ITU-T G.984.3. The
FEC algorithm reduces the bit error rate (BER) from 10^-3 to 10^-12 for GPON lines. However, due
to the overhead caused by multi-frame tail fragments, the bandwidth throughput of the GPON
system with FEC enabled is about 90% of that with FEC disabled. Figure 6-44 shows FEC
principles.
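The arithmetic behind the RS(255,239) parameters can be worked through in a few lines. The variable names are illustrative. The correction capability uses the general Reed-Solomon property that a code with n - k parity symbols corrects up to (n - k) / 2 symbol errors per codeword.

```python
N = 255  # codeword length in bytes
K = 239  # data bytes per codeword

parity_bytes = N - K                    # overhead bytes per codeword
correctable_bytes = parity_bytes // 2   # byte errors correctable per codeword
code_rate = K / N                       # fraction of each codeword that is data

print(parity_bytes, correctable_bytes, round(code_rate, 3))
```

The coding overhead alone leaves about 93.7% of the line rate for data; the lower throughput figure quoted above also includes the tail-fragment overhead mentioned in the text.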
Encryption Algorithm
The encryption algorithm uses the advanced encryption standard (AES). Also known as the
Rijndael algorithm, AES is a block cipher standard described in documents published
by the National Institute of Standards and Technology (NIST). AES replaces the original data
encryption standard (DES) and has been used worldwide after being analyzed by multiple
institutes. The GPON system uses the AES-128 encryption algorithm in counter (CTR) mode.
In this mode, the AES-128 encryption algorithm generates a stream of 16-byte pseudo-random
cipher blocks that is XORed with the input plaintext to produce the ciphertext. To recover
the plaintext, the ciphertext is XORed with the same pseudo-random cipher block stream.
The AES key length is fixed at 128 bits.
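The XOR symmetry of counter mode can be demonstrated with a short sketch. Because the Python standard library has no AES primitive, the sketch substitutes a SHA-256 based block generator for AES-128; this stand-in, the `keystream` and `xor_crypt` names, and the sample key and nonce are assumptions for illustration and must not be used as real encryption.

```python
import hashlib

BLOCK_SIZE = 16  # bytes per pseudo-random cipher block


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for AES-128 in CTR mode: one 16-byte block per counter value."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        digest = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out += digest[:BLOCK_SIZE]
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    stream = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


key = bytes(16)            # hypothetical 128-bit key
nonce = b"frame-0001"      # hypothetical per-frame counter input
plaintext = b"GEM payload example"
ciphertext = xor_crypt(key, nonce, plaintext)
recovered = xor_crypt(key, nonce, ciphertext)
print(recovered)
```

Applying the same keystream twice returns the original data, which is why both ends only need to agree on the key and the counter.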
Key Change
1. An OLT initiates a key change request to an ONU. The ONU responds to the request and
sends a new key to the OLT.
2. After receiving the new key, the OLT replaces the existing key with the new one and
uses the new key to encrypt data.
3. The OLT sends the frame number that uses the new key to the ONU.
4. The ONU receives the frame number and changes the verification key on data frames.
NOTE
l Because the length of a physical layer OAM (PLOAM) message is limited, the ONU sends the key to the
OLT in two pieces. For redundancy, the key is sent three times. If the OLT fails to receive
either part of the key after the three sending attempts, the OLT re-initiates a key change request to
the ONU. If the key transmission fails three times, the OLT declares a loss of key synchronization (LOKi)
and deactivates the ONU.
l The OLT delivers a command three times to instruct the ONU to use the frame number of the new key.
The ONU switches the verification key on data frames once it receives the command.
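The two-piece, three-times transmission in the note above can be sketched as follows. The function names, the equal split into two halves, and the message tuple layout are assumptions for illustration; the real PLOAM message format is not reproduced here.

```python
from typing import List, Optional, Tuple

REPEATS = 3  # each key fragment is sent three times for redundancy


def onu_send_key(key: bytes) -> List[Tuple[int, bytes]]:
    """Split the key into two fragments and repeat each one REPEATS times."""
    half = len(key) // 2
    fragments = [(0, key[:half]), (1, key[half:])]
    return [(index, frag) for index, frag in fragments for _ in range(REPEATS)]


def olt_reassemble(messages: List[Tuple[int, bytes]]) -> Optional[bytes]:
    """Rebuild the key, or return None if either fragment never arrived."""
    received = {}
    for index, frag in messages:
        received[index] = frag
    if set(received) != {0, 1}:
        return None  # the OLT would re-initiate the key change request
    return received[0] + received[1]


key = bytes(range(16))                 # hypothetical 128-bit key
messages = onu_send_key(key)
lossy = [messages[1], messages[5]]     # only one copy of each fragment survives
rebuilt = olt_reassemble(lossy)
missing = olt_reassemble(messages[:3])  # second fragment lost entirely
print(rebuilt == key, missing)
```

One surviving copy of each fragment is enough to rebuild the key; losing every copy of a fragment triggers a new request.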
[Figure: ONU registration message flow; labels: Upstream_Overhead PLOAM, O2: Standby state,
SN_Request (BWMap), Serial_Number_ONU PLOAM, Ranging request, Ranging time, Request password, Password]
In SN authentication, the OLT matches only the ONU SN. Figure 6-47 shows the SN
authentication process.
[Figure 6-47: SN authentication process; labels: Serial_Number_ONU PLOAM, Ranging request, Ranging time,
Normal-state ONU, Normal-state OLT]
l After receiving an SN response message from an ONU, the OLT checks whether another
ONU with the same SN is online. If such an ONU is online, the OLT reports an SN
conflict alarm to the command line interface (CLI) or network management system
(NMS). Otherwise, the OLT directly assigns an ONU ID to the ONU.
l After the ONU enters the operation state, the OLT does not send a password request to
this ONU. Instead, the OLT automatically configures a GPON encapsulation mode
(GEM) port for the ONU to carry optical network terminal management and control
interface (OMCI) messages, and allows the ONU to go online. The GEM port must have
the same ID as the ONU ID. After the ONU goes online, the OLT reports an ONU online
alarm to the CLI or NMS.
SN+Password Authentication
In SN+password authentication, the OLT matches both the ONU SN and password. Figure
6-48 shows the SN+password authentication process.
[Figure 6-48: SN+password authentication process; labels: Upstream_Overhead PLOAM, O2: Standby state,
SN_Request (BWMap), Serial_Number_ONU PLOAM, O3: Serial number state, SN is matched, Assign ONU_ID,
Ranging request, Ranging time, Request password, Password, Normal-state ONU, Normal-state OLT]
l After receiving an SN response message from an ONU, the OLT checks whether another
ONU with the same SN is online. If such an ONU is online, the OLT reports an SN
conflict alarm to the CLI or NMS. Otherwise, the OLT directly assigns an ONU ID to the
ONU.
l After the ONU enters the operation state, the OLT sends a password request to the ONU.
After the ONU responds with a password, the OLT compares the password with the local
password. If the two passwords are the same, the OLT directly configures a GEM port
for the ONU to carry OMCI messages, allows the ONU to go online, and reports an
ONU online alarm to the CLI or NMS. If the two passwords are different, the OLT
reports a password error alarm to the CLI or NMS. The OLT does not report an ONU
automatic discovery message even if the ONU automatic discovery function is enabled
on the PON port. Instead, the OLT sends the Deactivate_ONU-ID PLOAM message to
deregister the ONU.
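The decisions described for SN authentication and SN+password authentication can be condensed into a small sketch. The function name, the result strings, the sample serial number, and the handling of an SN that is not provisioned are assumptions for illustration; the text above does not describe the unprovisioned case.

```python
def authenticate(sn, password, provisioned, online_sns, mode):
    """Return the outcome for one ONU; mode is "sn" or "sn+password"."""
    if sn in online_sns:
        return "sn-conflict"       # SN conflict alarm to the CLI or NMS
    if sn not in provisioned:
        return "sn-not-matched"    # assumption: not covered by the text
    if mode == "sn+password" and password != provisioned[sn]:
        return "password-error"    # alarm, then Deactivate_ONU-ID PLOAM
    return "online"                # GEM port for OMCI configured, ONU online


# Hypothetical provisioning data: serial number mapped to password.
provisioned = {"HWTC00000001": "pw1"}
sn_only = authenticate("HWTC00000001", None, provisioned, set(), "sn")
wrong_pw = authenticate("HWTC00000001", "bad", provisioned, set(), "sn+password")
conflict = authenticate("HWTC00000001", "pw1", provisioned, {"HWTC00000001"}, "sn+password")
print(sn_only, wrong_pw, conflict)
```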
Password Authentication
In password authentication, an ONU that has password authentication configured connects to
a PON port. If the OLT determines that the ONU SN or password conflicts with that of an
online ONU, the OLT deregisters the ONU to be authenticated, protecting the online ONU
from being affected. Password authentication is available in two modes: once-on and always-
on.
The once-on mode applies to the following scenario: A carrier allocates a password to a user
and requires the user to go online within a specified time. After going online, the user cannot
change the ONU. To change the ONU, the user must notify the carrier. In once-on mode, the
aging time is configurable. After the aging time is set, the ONU must register with the OLT
and go online within the preset aging time. Otherwise, the ONU is not allowed to register with
the OLT or go online. Once the ONU is authenticated, its SN cannot be changed.
For the once-on mode:
l Only the initial authentication of an ONU is performed by password, as shown in Figure
6-49.
l In subsequent authentications, the ONU can be authenticated in SN or SN+password
mode according to the CLI configuration, as shown in Figure 6-47 or Figure 6-48.
[Figure 6-49: password authentication process in once-on mode; labels: Upstream_Overhead PLOAM,
O2: Standby state, SN_Request (BWMap), Serial_Number_ONU PLOAM, Assign ONU_ID, Ranging request,
Ranging time, Request password, Password, Password is matched, O5: Operation state,
Normal-state ONU, Normal-state OLT]
In once-on mode, before the ONU registration times out or before the ONU successfully
registers with the OLT for the first time, the ONU discovery status is ON. Only the ONU
whose discovery status is ON is allowed to register with the OLT and go online. After the
ONU registration times out or after the ONU successfully registers with the OLT for the first
time, the OLT sets the ONU discovery status to OFF.
l The ONU whose registration times out is not allowed to register with the OLT or go
online. The registration timeout flag of the ONU needs to be reset at the central office
(CO), and then the ONU can go online.
l An ONU that successfully registers for the first time is allowed to register and go online
again.
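The once-on behavior above (discovery status ON until the first successful registration or until the aging time expires) can be modeled with a small state holder. The class name, method name, time units, and sample values are hypothetical; subsequent authentications are simplified here to an SN comparison.

```python
class OnceOnEntry:
    """Tracks the discovery status of one once-on password entry."""

    def __init__(self, password: str, aging_time_s: float, created_at: float = 0.0):
        self.password = password
        self.deadline = created_at + aging_time_s
        self.discovery_on = True
        self.bound_sn = None  # set after the first successful registration

    def try_register(self, sn: str, password: str, now: float) -> bool:
        if self.bound_sn is not None:
            # Already registered once: the same ONU may register again,
            # but its SN cannot be changed.
            return sn == self.bound_sn
        if now > self.deadline:
            self.discovery_on = False  # registration timed out
        if not self.discovery_on or password != self.password:
            return False
        self.bound_sn = sn
        self.discovery_on = False
        return True


entry = OnceOnEntry("pw", aging_time_s=3600)
first = entry.try_register("SN1", "pw", now=100.0)
other_onu = entry.try_register("SN2", "pw", now=200.0)
again = entry.try_register("SN1", "pw", now=5000.0)

late = OnceOnEntry("pw", aging_time_s=3600)
timed_out = late.try_register("SN1", "pw", now=4000.0)
print(first, other_onu, again, timed_out)
```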
The always-on mode applies to the following scenario: A carrier allocates a password to a
user, and the user can use different ONUs with this password and different SNs. The user can
change the ONU without notifying the carrier. In always-on mode, no restriction is set on the
time when the user goes online.
l An ONU is authenticated in password mode when it goes online for the first time. After
the ONU passes the password authentication and goes online successfully, the OLT
generates an SN+password binding entry according to the ONU SN and password.
Figure 6-50 shows the authentication process.
l If an ONU goes online not for the first time, the following situation may occur:
– If the SN and password of the ONU are the same as those of the ONU that goes
online for the first time, the ONU is authenticated in SN+password mode. Figure
6-48 shows the authentication process.
– If the user replaces the ONU with another ONU that has the same password but a
different SN, the new ONU is authenticated in password mode. After this ONU
passes authentication and goes online successfully, the original SN+password
binding entry is updated. Figure 6-50 shows the authentication process.
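The always-on cases above can be sketched as a lookup against an SN+password binding table. The function name, the dictionary keyed by password, and the returned mode strings are assumptions for illustration.

```python
def always_on_login(bindings: dict, sn: str, password: str, valid_passwords: set) -> str:
    """Return the authentication mode used and update the binding entry."""
    if password not in valid_passwords:
        return "rejected"
    if bindings.get(password) == sn:
        return "sn+password"   # same ONU as before
    # First login, or the ONU was replaced: authenticate by password
    # and create or update the SN+password binding entry.
    bindings[password] = sn
    return "password"


bindings = {}
valid = {"pw"}
first_login = always_on_login(bindings, "SN1", "pw", valid)
second_login = always_on_login(bindings, "SN1", "pw", valid)
replaced_onu = always_on_login(bindings, "SN2", "pw", valid)
print(first_login, second_login, replaced_onu, bindings)
```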
Upstream_Overhead PLOAM
O2: Standby state
SN _Request(BWMap)
Serial_Number_ONU PLOAM
O3: Serial number state
Assign ONU_ID
Ranging request
Ranging time
Request password
Password
Password is
O5: Operation state
matched.
Normal-state Normal-state
ONU OLT
l Voice
l Data
l Video
l Leased line
l Distributed service
[Figure: network diagram; labels: PBX, STM-1/E1, SDH/Metro, ONU, Enterprise, Enterprise Router, Splitter,
HQ, FTTB/FTTC, OLT, E1/GE]
Terms
None
7 IP Services
This document describes the IP services in terms of the overview, principle, and applications.
7.1 IP Addressing
This chapter provides an introduction to Internet Protocol (IP) addressing, the principles of IP
addresses, and IP applications.
7.2 ARP
7.3 ACL
7.4 IPv4
7.5 IP Unicast Policy-Based Routing
7.6 IPv6
7.1 IP Addressing
This chapter provides an introduction to Internet Protocol (IP) addressing, the principles of IP
addresses, and IP applications.
You need to allocate IP addresses for the hosts on an IP network. To connect a computer to the
Internet, you need to apply to the Internet Service Provider (ISP) for an IP address.
byte. For example, the binary IP address of Host A is 00001010 00000001 00000001
00000010; the decimal IP address of Host A is 10.1.1.2.
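The conversion between the binary and dotted decimal forms in the example above can be checked with a short sketch. The helper names are illustrative.

```python
def binary_to_dotted(bits: str) -> str:
    """Convert four space-separated 8-bit groups to dotted decimal notation."""
    return ".".join(str(int(octet, 2)) for octet in bits.split())


def dotted_to_binary(address: str) -> str:
    """Convert dotted decimal notation to four space-separated 8-bit groups."""
    return " ".join(format(int(part), "08b") for part in address.split("."))


host_a_decimal = binary_to_dotted("00001010 00000001 00000001 00000010")
host_a_binary = dotted_to_binary("10.1.1.2")
print(host_a_decimal, host_a_binary)
```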
l Network ID field (net-id): It is used to distinguish networks. The first bits of the net-id are
called the class field (or class bits). These bits are used to distinguish the IP address
class.
l Host ID field (host-id): It is used to distinguish different hosts on the network.
The network ID field identifies a network, and the host ID field identifies a connection of the
network device on the network. If multiple network devices have the same network ID, they
reside on the same network regardless of their locations. That is, whether multiple network
devices on a public network reside on the same network does not depend on their locations.
7.1.2 Principles
This section describes the classification and characteristics of IP addresses, as well as private
and special IP addresses.
You can determine the class of an IP address depending on the first bits of the network ID
field. This is the simplest method to distinguish each class of addresses.
Class   Leading bits   Remaining fields
B       1 0            Net-id, Host-id
C       1 1 0          Net-id, Host-id
D       1 1 1 0        Multicast-address
E       1 1 1 1        Reserved
Most IP addresses in use belong to Class A, Class B, or Class C. Class D IP addresses are
multicast addresses, and Class E IP addresses are reserved. For details, refer to RFC 1166
(Internet Numbers).
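Determining the class from the leading bits can be sketched as follows. The function name is illustrative; the Class A test (leading bit 0) follows the standard classful addressing scheme.

```python
def address_class(address: str) -> str:
    """Return the class (A to E) of a dotted decimal IPv4 address."""
    first_octet = int(address.split(".")[0])
    if first_octet & 0b10000000 == 0b00000000:
        return "A"  # leading bit 0
    if first_octet & 0b11000000 == 0b10000000:
        return "B"  # leading bits 1 0
    if first_octet & 0b11100000 == 0b11000000:
        return "C"  # leading bits 1 1 0
    if first_octet & 0b11110000 == 0b11100000:
        return "D"  # leading bits 1 1 1 0 (multicast)
    return "E"      # leading bits 1 1 1 1 (reserved)


classes = [address_class(a) for a in
           ("10.1.1.2", "172.16.0.1", "192.168.1.1", "224.0.0.5", "240.0.0.1")]
print(classes)
```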
Certain IP addresses are reserved for special uses. Table 7-1 lists the ranges of IP addresses
for all five classes.
about the host position. The network ID field determines which network a host belongs
to.
l When a host is connected to two networks, the host must have two IP addresses with
different net-IDs. This host is called a multi-homed host. Each interface on a host has an
IP address. Therefore, a multi-interface host has multiple IP addresses.
l According to the Internet concept, different LANs connected through repeaters or
bridges are in the same network. Therefore, these LANs have the same net-ID.
l For IP addresses, all networks assigned with net-IDs are equal (regardless of whether it
is a small LAN or a large WAN).
NOTE
In Table 7-2, net-id and subnet-id indicate the fields that are neither all zero bits nor all one bits.
7.1.3 Applications
This section describes applications of IP addresses.
7.1.3.1 Subnetting
The network part of an IP address is called the network address. The network address
identifies a unique network segment. A network administrator can divide a network address
into subnets so that broadcast packets are transmitted within a single subnet.
From the perspective of address allocation, subnets are supplements to network addresses.
Only the net-id is assigned so that IP addresses can be used flexibly when an enterprise
applies for IP addresses. The specific host-ids are assigned by the enterprise as long as there is
no repetition of host IDs in the Intranet.
When hosts are widely scattered on a network, you can divide the internal host-ids into many
subnets. Through the subnet classification, the entire network can be divided into smaller
networks.
Subnets on an enterprise network are invisible outside the enterprise. When an external packet
enters the enterprise network, the internal devices select the routes based on the subnet ID.
The devices then forward the packet to the destination host.
Figure 7-2 shows the subnetting of a Class B IP address. The subnet mask consists of a string
of continuous 1s and 0s. The 1s correspond to the net ID field and the subnet ID field. The 0s
correspond to the host ID field.
Mask 11111111 11111111 11111100 00000000
After performing an AND operation on the 32-bit IP address and the corresponding subnet
mask, you can get the net ID of an IP address. If the IP address is 10.1.1.2 and the subnet
mask is 255.255.0.0, you can get 10.1.0.0 as the network address after performing the AND
operation on the IP address and the corresponding subnet mask.
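The AND operation in the example above can be reproduced with the standard `ipaddress` module. The variable names are illustrative.

```python
import ipaddress

ip = int(ipaddress.IPv4Address("10.1.1.2"))
mask = int(ipaddress.IPv4Address("255.255.0.0"))

# Bitwise AND keeps the bits covered by 1s in the mask and clears the rest.
network_address = ipaddress.IPv4Address(ip & mask)
print(network_address)
```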
Subnetting reduces the available IP addresses for hosts. For example, an IP address of Class B
originally can accommodate 65534 host IDs. After a 6-bit subnet field is allocated, there can
be a maximum of 64 subnets. Each subnet has a 10-bit host ID, which means each subnet has
a maximum of 1022 (2^10 - 2, excluding the host IDs with all 1s and all 0s) host IDs. Therefore,
there are 65408 (64 x 1022 = 65408) host IDs, 126 fewer than the number of IDs before subnet
classification.
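The host ID counts in the Class B example above follow from simple powers of two, as this short check shows. The variable names are illustrative.

```python
host_bits = 16    # host ID length of a Class B address
subnet_bits = 6   # bits taken from the host ID for the subnet field

hosts_before = 2 ** host_bits - 2                        # all 0s and all 1s excluded
subnets = 2 ** subnet_bits
hosts_per_subnet = 2 ** (host_bits - subnet_bits) - 2
hosts_after = subnets * hosts_per_subnet
lost = hosts_before - hosts_after

print(hosts_before, subnets, hosts_per_subnet, hosts_after, lost)
```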
If an enterprise does not divide its network into subnets, the subnet mask is the default value.
The number of 1s in the subnet mask indicates the net ID length. Therefore, the default values
of the subnet mask for Class A, Class B and Class C IP addresses are 255.0.0.0, 255.255.0.0,
and 255.255.255.0 respectively.
During subnetting and IP address planning, consider the following rules to implement
reasonable and efficient network planning:
Hierarchy
To divide a network hierarchically, consider geographic and service factors so that subnetting
follows the network hierarchy from the top down. In this manner, networks are effectively
managed and routing tables are simplified. In most cases:
Consistency
Consecutive addresses facilitate route aggregation on a hierarchical network, which greatly
reduces the number of routing table entries and improves route lookup efficiency. When
allocating IP addresses, note the following issues:
Expansibility
When you allocate addresses, reserve certain addresses in each hierarchy. In this manner,
consecutive addresses can be allocated to an expanded network, implementing long-term
network planning.
A backbone network must have enough consecutive addresses for independent ASs and
further network expansion.
Efficiency
When planning subnets, fully use address resources as follows to ensure that there are sufficient IP
addresses for hosts:
l Use variable-length subnet masking (VLSM) to fully and properly use address resources.
l Consider the routing mechanism of networks to fully use the IP address spaces that
have been allocated for better IP address utilization.
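VLSM and the route aggregation that consecutive addresses make possible can be illustrated with the standard `ipaddress` module. The 10.1.0.0/24 block and the variable names are hypothetical planning values chosen for this sketch.

```python
import ipaddress

block = ipaddress.ip_network("10.1.0.0/24")

# VLSM: one large subnet (/25) and two smaller subnets (/26) from the same block.
large, rest = block.subnets(prefixlen_diff=1)
small_a, small_b = rest.subnets(prefixlen_diff=1)

# Because the subnets are consecutive, they aggregate back into one route.
aggregated = list(ipaddress.collapse_addresses([large, small_a, small_b]))

print(large, small_a, small_b, aggregated)
```

The large subnet offers 126 usable host addresses and each small subnet offers 62, while upstream devices still need only the single /24 route.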
Service-oriented
Devices that have similar functions should be allocated IP addresses of the same type. IP
address allocation complies with the following rules:
Figure 7-3 Relationship between the host name, IP address, and physical address
DNS: host name -> IP address
ARP: IP address -> MAC address
Destination: host name HostB, IP address 209.0.0.6/24, MAC address 0800-2B00-EE0A
VPN Instance
The concept of VPN instance was first introduced by BGP/MPLS VPN to isolate VPN routes
from public routes and to isolate the routes of different VPNs from each other.
VPN instances can also be widely used in non-BGP/MPLS VPN network
environments. By using VPN instances, you can create several independent virtual devices on
the same device. In addition, the routes of these virtual devices on an IP network are isolated.
In the ATN, various software features support VPN instances. When bound to different
VPN instances, these features implement the VPN multi-instance function, such as multi-instances
of various routing protocols (RIP multi-instances, OSPF multi-instances, IS-IS multi-instances,
and BGP multi-instances).
dotted decimal notation: A format of IP address. IP addresses in this format are separated into
four parts by dots ("."), with each part in the decimal numeral format.
7.2 ARP
For the ATN 905A-P that functions as a small cell dock, if the destination IP address in ARP
reply packets received by an interface is not the interface's IP address, ARP entries are not
learned or updated.
Purpose
If two hosts need to communicate, the network-layer address (IP address) of the receiver must
be available to the sender. Because IP datagrams must be encapsulated into frames before they
can be transmitted over the physical network, the physical address (MAC address) of the
receiver must also be available to the sender. The sender must map the IP address of the
receiver to the receiver's MAC address, so that IP datagrams can be successfully transmitted.
ARP provides a mechanism for mapping IP addresses to MAC addresses.
Function Overview
In addition to the previous function, ARP has other features, as described in Table 7-4.
Benefits
ARP implements mapping between IP addresses at the network layer and MAC addresses at
the link layer on the Ethernet network. It is the basis for Ethernet communication.
7.2.2 Principles
Related Concepts
l Address Resolution Protocol (ARP) messages
ARP messages include Request messages and Reply messages. Figure 7-4 shows the
ARP message format.
NOTE
The Ethernet Address of destination contains a total of 48 bits. Ethernet Address of destination
(0-31) indicates the first 32 bits of the Ethernet Address of destination field and Ethernet Address of
destination (32-47) indicates the last 16 bits of the Ethernet Address of destination field.
An ARP message consists of 42 bytes. The first 14 bytes indicate the Ethernet frame
header, and the last 28 bytes are the content of the ARP Request or Reply message.
Table 7-5 describes the fields in an ARP message.
Ethernet address of destination (48 bits): Ethernet destination MAC address. This field in an ARP
    Request message is the broadcast MAC address, with a value of 0xFF-FF-FF-FF-FF-FF.
Frame type (16 bits): Frame type. For an ARP Request or Reply message, the value of this field
    is 0x0806.
Hardware type (16 bits): Type of the hardware address. For an Ethernet network, the value of
    this field is 1.
Protocol length (8 bits): Length of the protocol address. For an ARP Request or Reply message,
    the value of this field is 4.
Ethernet address of sender (48 bits): Source MAC address. The value of this field is the same
    as the Ethernet source MAC address in the Ethernet frame header.
Ethernet address of destination (48 bits): Destination MAC address. The value of this field in an
    ARP Request message is 0x00-00-00-00-00-00.
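The 42-byte layout described above (14-byte Ethernet header plus 28-byte ARP payload) can be assembled with the standard `struct` module. This is a sketch only: the function names and the sample MAC and IP addresses are hypothetical, and the protocol type (0x0800), hardware address length (6), and Request opcode (1) are the standard ARP values from RFC 826 rather than values listed in the rows above.

```python
import socket
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert a MAC address such as 00-11-22-33-44-55 to 6 raw bytes."""
    return bytes.fromhex(mac.replace("-", ""))


def build_arp_request(src_mac: str, src_ip: str, dst_ip: str) -> bytes:
    # Ethernet header: broadcast destination, sender MAC, frame type 0x0806.
    ethernet = b"\xff" * 6 + mac_bytes(src_mac) + struct.pack("!H", 0x0806)
    # ARP header: hardware type 1, protocol type 0x0800,
    # hardware length 6, protocol length 4, opcode 1 (Request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_bytes(src_mac) + socket.inet_aton(src_ip)
    # Destination MAC is all zeros in a Request because it is not yet known.
    arp += b"\x00" * 6 + socket.inet_aton(dst_ip)
    return ethernet + arp


frame = build_arp_request("00-11-22-33-44-55", "10.10.10.1", "10.10.10.2")
print(len(frame), frame.hex())
```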
l ARP table
If a host broadcasts an ARP Request message before it sends every IP datagram, the
communication traffic on the network will greatly increase. Furthermore, all other hosts
on the network have to receive and process the ARP Request messages, which lowers
network efficiency. To solve this problem, an ARP table is maintained on each host to
ensure efficient ARP operations. An ARP table contains the latest mapping between IP
addresses and MAC addresses. The mapping between an IP address and a MAC address
is called an ARP entry.
ARP entries can be classified as dynamic or static.
– Dynamic ARP entries are automatically generated and maintained by using ARP
messages. Dynamic ARP entries can be aged and overwritten by static ARP entries.
– Static ARP entries are manually configured and maintained by a network
administrator. Static ARP entries can neither be aged nor be overwritten by dynamic
ARP entries.
Before sending IP datagrams, a host searches the ARP table for the MAC address
corresponding to the destination IP address.
– If the ARP table contains the corresponding MAC address, the host directly sends
the IP datagrams to the MAC address instead of sending an ARP Request message.
– If the ARP table does not contain the corresponding MAC address, the host
broadcasts an ARP Request message to request the MAC address of the destination
host.
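The ARP table behavior described above (dynamic entries age out and can be overwritten, static entries cannot) can be modeled with a small class. The class name, method names, the 1200-second aging time, and the sample addresses are assumptions for illustration, not product defaults.

```python
class ArpTable:
    """Maps IP addresses to MAC addresses with static and dynamic entries."""

    def __init__(self, aging_s: float = 1200.0):
        self.aging_s = aging_s
        self.entries = {}  # ip -> (mac, kind, learned_at)

    def add_static(self, ip: str, mac: str) -> None:
        # A static entry overwrites any dynamic entry and never ages.
        self.entries[ip] = (mac, "static", None)

    def learn(self, ip: str, mac: str, now: float) -> bool:
        existing = self.entries.get(ip)
        if existing is not None and existing[1] == "static":
            return False  # dynamic learning cannot overwrite a static entry
        self.entries[ip] = (mac, "dynamic", now)
        return True

    def lookup(self, ip: str, now: float):
        entry = self.entries.get(ip)
        if entry is None:
            return None  # caller broadcasts an ARP Request
        mac, kind, learned_at = entry
        if kind == "dynamic" and now - learned_at > self.aging_s:
            del self.entries[ip]  # aged out
            return None
        return mac


table = ArpTable()
table.learn("10.10.10.2", "00-00-00-00-00-02", now=0.0)
table.add_static("10.10.10.3", "00-00-00-00-00-03")
fresh = table.lookup("10.10.10.2", now=60.0)
aged = table.lookup("10.10.10.2", now=5000.0)
overwrite = table.learn("10.10.10.3", "00-00-00-00-00-99", now=10.0)
static_mac = table.lookup("10.10.10.3", now=5000.0)
print(fresh, aged, overwrite, static_mac)
```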
l Reverse Address Resolution Protocol (RARP)
If only the MAC address of a host is available, its IP address can also be obtained using
RARP messages.
You need to establish the mapping between MAC addresses and IP addresses on a
gateway. When a new host must be configured, the RARP client requests the host's IP
address from the RARP server on the gateway.
Implementation
l ARP implementation within a network segment
Figure 7-5 shows how ARP is implemented within a network segment, by using IP
datagram transmission from Host A to Host B as an example.
NOTE
The numbers in the following figures correspond to the steps described below the figures.
Figure 7-5 ARP implementation between Host A and Host B in the same network segment
[Figure labels: PE (Port1, Port2), CE1, CE2, IP datagram; the numbers 1 to 5 mark the steps below]
a. Host A searches its ARP table and does not find the mapping between the IP
address and MAC address of Host B. Host A then sends an ARP Request message
to request the MAC address of Host B. In this ARP Request message, the source IP
address and source MAC address are respectively the IP address and MAC address
of Host A, the destination IP address and destination MAC address are respectively
the IP address of Host B and 00-00-00-00-00-00, and the Ethernet source MAC
address and Ethernet destination MAC address are respectively the MAC address of
Host A and the broadcast MAC address.
b. After receiving the ARP Request message, CE1 broadcasts it in the network
segment.
c. After receiving the ARP Request message, Host B adds the MAC address of Host A
into its ARP table and sends an ARP Reply message to Host A. In this ARP Reply
message, the source IP and MAC addresses are respectively the IP and MAC
addresses of Host B, the destination IP and MAC addresses are respectively the IP
and MAC addresses of Host A, and the Ethernet source and destination MAC
addresses are respectively the MAC addresses of Host B and Host A.
NOTE
The destination IP address in the ARP Request message is not the IP address of PE. Therefore, PE
discards the received ARP Request message.
d. After receiving the ARP Reply message, CE1 forwards it to Host A.
e. After receiving the ARP Reply message, Host A adds the MAC address of Host B
into its ARP table and sends the IP datagrams to Host B.
l ARP implementation between different network segments
NOTE
ARP messages are Layer 2 messages. Therefore, ARP is applicable only to devices on the same
network segment. If two hosts in different network segments need to communicate, the source host
sends IP datagrams to the default gateway, and then the default gateway forwards the IP datagrams
to the destination host. ARP implementation between different network segments involves
separate ARP implementation within network segments. In this manner, hosts in different network
segments can communicate.
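The choice described in the note above (resolve the destination directly when it is on the same segment, otherwise resolve the default gateway) can be sketched with the standard `ipaddress` module. The function name and the host addresses 10.10.10.2 and 10.10.11.2 are hypothetical; the gateway address reuses the Port1 address from the figures in this section.

```python
import ipaddress


def arp_target(source_interface: str, destination_ip: str, gateway_ip: str) -> str:
    """Return the IP address whose MAC address the source host must resolve."""
    interface = ipaddress.ip_interface(source_interface)
    if ipaddress.ip_address(destination_ip) in interface.network:
        return destination_ip  # same segment: resolve the destination itself
    return gateway_ip          # different segment: resolve the default gateway


same_segment = arp_target("10.10.10.2/24", "10.10.10.5", "10.10.10.3")
other_segment = arp_target("10.10.10.2/24", "10.10.11.2", "10.10.10.3")
print(same_segment, other_segment)
```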
Figure 7-6 and Figure 7-7 show how ARP is implemented between different network
segments, by using IP datagram transmission from Host A to Host C as an example.
Figure 7-6 shows the ARP implementation between Host A and PE in the same network
segment. The ARP implementation enables Host A to send IP datagrams to PE.
Figure 7-6 ARP implementation between Host A and PE in the same network segment
[Figure labels: PE Port1 (IP address: 10.10.10.3/24, MAC address: 3-3-3), PE Port2 (IP address:
10.10.11.1/24, MAC address: 4-4-4), CE1, CE2, IP datagram; the numbers 1 to 5 mark the steps below]
a. Host A searches its ARP table and does not find the mapping between the IP
address and MAC address of port 1 on the default gateway PE, which is connected
to Host C. Host A then sends an ARP Request message to request the MAC address
of port 1 on PE. In this ARP Request message, the source IP and MAC addresses
are respectively the IP and MAC addresses of Host A, the destination IP and MAC
addresses are respectively the IP address of port 1 on PE and 00-00-00-00-00-00,
and the Ethernet source and destination MAC addresses are respectively the MAC
address of Host A and the broadcast MAC address.
b. After receiving the ARP Request message, CE1 broadcasts it in the network
segment.
c. After receiving the ARP Request message, PE adds the MAC address of Host A
into its ARP table and sends an ARP Reply message to Host A. In this ARP Reply
message, the source IP and MAC addresses are respectively the IP and MAC
addresses of port 1 on PE, the destination IP and MAC addresses are respectively
the IP and MAC addresses of Host A, and the Ethernet source and destination MAC
addresses are respectively the MAC address of port 1 on PE and the MAC address
of Host A.
NOTE
The destination IP address in the ARP Request message is not the IP address of Host B.
Consequently, Host B discards the received ARP Request message.
d. After receiving the ARP Reply message, CE1 forwards it to Host A.
e. After receiving the ARP Reply message, Host A adds the MAC address of port 1 on
PE into its ARP table and sends the IP datagrams to PE.
Figure 7-7 shows the ARP implementation between PE and Host C in the same network
segment. The ARP implementation enables PE to send the IP datagrams to Host C.
Figure 7-7 ARP implementation between PE and Host C in the same network segment
(Diagram omitted. Devices shown: PE with its routing table, CE1, and CE2. PE port 1: IP
address 10.10.10.3/24, MAC address 3-3-3. PE port 2: IP address 10.10.11.1/24, MAC
address 4-4-4.)
PE queries its routing table and sends the IP datagrams from port 1 to port 2.
a. PE searches its ARP table and does not find the mapping between the IP address
and MAC address of Host C. Then, PE sends an ARP Request message to request
the MAC address of Host C. In this ARP Request message, the source IP and MAC
addresses are respectively the IP and MAC addresses of port 2 on PE, the
destination IP and MAC addresses are respectively the IP address of Host C and
00-00-00-00-00-00, and the Ethernet source and destination MAC addresses are
respectively the MAC address of port 2 on PE and the broadcast MAC address.
b. After receiving the ARP Request message, CE2 broadcasts it in the network
segment.
c. After receiving the ARP Request message, Host C adds the MAC address of port 2
on PE into its ARP table and sends an ARP Reply message to PE. In this ARP
Reply message, the source IP and MAC addresses are respectively the IP and MAC
addresses of Host C, the destination IP and MAC addresses are respectively the IP
and MAC addresses of port 2 on PE, and the Ethernet source and destination MAC
addresses are respectively the MAC address of Host C and the MAC address of port
2 on PE.
NOTE
The destination IP address in the ARP Request message is not the IP address of Host D.
Consequently, Host D discards the received ARP Request message.
d. After receiving the ARP Reply message, CE2 forwards it to PE.
e. After receiving the ARP Reply message, PE adds the MAC address of Host C into
its ARP table and sends the IP datagrams to Host C.
So far, the IP datagram transmission from Host A to Host C is complete.
NOTE
1. ARP Request messages are broadcast, whereas ARP Reply messages are unicast.
2. In the ARP implementation, the switches CE1 and CE2 transparently forward IP datagrams and do
not modify them.
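The field layout described in steps a and c can be made concrete with a short sketch. The
following Python fragment is illustrative only: it packs an ARP Request and an ARP Reply in
the standard Ethernet/IPv4 ARP layout. The full 6-byte MAC addresses and the IP address
10.10.10.1 assumed for Host A are hypothetical stand-ins, because the figures use shortened
addresses such as 3-3-3.

```python
import socket
import struct

BROADCAST_MAC = bytes.fromhex("ffffffffffff")
ZERO_MAC = bytes(6)            # 00-00-00-00-00-00, used as the unknown target MAC
ETHERTYPE_ARP = 0x0806
OP_REQUEST, OP_REPLY = 1, 2


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa-bb-cc-dd-ee-ff' or 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace("-", "").replace(":", ""))


def build_arp_frame(op, eth_dst, sender_mac, sender_ip, target_mac, target_ip):
    """Return an Ethernet frame that carries one ARP message."""
    eth_header = eth_dst + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,          # hardware type: Ethernet
        0x0800,     # protocol type: IPv4
        6, 4,       # MAC address length, IPv4 address length
        op,
        sender_mac, socket.inet_aton(sender_ip),
        target_mac, socket.inet_aton(target_ip),
    )
    return eth_header + arp_body


def arp_request(sender_mac, sender_ip, target_ip):
    """Broadcast request: the target MAC is unknown, so it is all zeros."""
    return build_arp_frame(OP_REQUEST, BROADCAST_MAC,
                           sender_mac, sender_ip, ZERO_MAC, target_ip)


def arp_reply(sender_mac, sender_ip, target_mac, target_ip):
    """Unicast reply addressed to the device that sent the request."""
    return build_arp_frame(OP_REPLY, target_mac,
                           sender_mac, sender_ip, target_mac, target_ip)


# Hypothetical addresses standing in for Host A and port 1 on PE.
host_a_mac = mac_bytes("00-00-00-01-01-01")
pe_port1_mac = mac_bytes("00-00-00-03-03-03")
request = arp_request(host_a_mac, "10.10.10.1", "10.10.10.3")
reply = arp_reply(pe_port1_mac, "10.10.10.3", host_a_mac, "10.10.10.1")
```

The request is sent to the broadcast MAC address with an all-zero target MAC field, whereas
the reply is unicast back to the requester, which matches item 1 of the preceding NOTE.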
Definition
Dynamic Address Resolution Protocol (ARP) means that devices dynamically learn and
update the mapping between IP addresses and MAC addresses by using ARP messages. You
do not need to manually configure the mapping.
Related Concepts
l Dynamic ARP aging mechanism
The dynamic ARP aging mechanism enables an ARP entry that is not used in a specified
period to be automatically deleted. By deleting seldom used ARP entries, the dynamic
ARP aging mechanism helps to reduce storage space of ARP tables and speed up ARP
table queries.
Table 7-6 describes concepts related to the dynamic ARP aging mechanism.
l Aging time
– Description: A dynamic ARP entry has a life cycle, which is called the aging time. If a
dynamic ARP entry is not updated before its life cycle ends, the entry is deleted from
the ARP table.
– Usage scenario: Two interconnected devices can learn the mapping between their
respective IP and MAC addresses using ARP and can save the mapping in their ARP
tables. The two devices can then communicate by using the ARP entries. When the peer
device becomes faulty, or the network adapter of the peer device is replaced but the
local device does not receive any status change information about the peer device, the
local device continues sending IP datagrams to the peer device. As a result, network
traffic is interrupted because the ARP table of the local device is not updated in
time. To reduce the risk of network traffic interruption, an aging timer can be set for
each ARP entry. After the aging timer of a dynamic ARP entry expires, the entry is
automatically deleted.
l Number of aging probe attempts
– Description: Before a dynamic ARP entry is aged, the local device sends ARP aging
probe messages to the peer device. If the local device does not receive an ARP Reply
message after the number of aging probe attempts reaches the specified number, the
dynamic ARP entry is deleted.
– Usage scenario: The ARP aging timer helps reduce the risk of network traffic
interruptions that occur because an ARP table is not updated quickly enough, but it
cannot eliminate problems due to delays. Specifically, if the aging timer of a dynamic
ARP entry is N seconds, the local device can detect a status change of the peer device
only after N seconds. During the N seconds, the ARP table of the local device is not
updated. If the number of aging probe attempts is specified, the local device can
obtain the status change information about the peer device and update its ARP table.
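The interaction between the aging time and the number of aging probe attempts can be
sketched as a small table model. This is a minimal illustration under stated assumptions,
not the device implementation: the class and method names are hypothetical, the default
values (1200 seconds, 3 attempts, 5 seconds between probes) are placeholders chosen for the
example, and time is passed in explicitly so that the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class DynamicArpEntry:
    mac: str
    expires_at: float       # time at which the current timer expires
    probes_sent: int = 0    # aging probes sent since the last refresh


class DynamicArpTable:
    """Hypothetical model of dynamic ARP aging with probe attempts."""

    def __init__(self, aging_time=1200.0, probe_attempts=3, probe_interval=5.0):
        self.aging_time = aging_time          # assumed default, in seconds
        self.probe_attempts = probe_attempts  # assumed default
        self.probe_interval = probe_interval  # assumed default, in seconds
        self.entries = {}

    def learn(self, ip, mac, now):
        """Create or refresh an entry; this restarts its aging timer."""
        self.entries[ip] = DynamicArpEntry(mac, now + self.aging_time)

    def tick(self, now):
        """Return the IP addresses to probe at `now`.

        Entries whose probes have all gone unanswered are deleted.
        """
        to_probe = []
        for ip, entry in list(self.entries.items()):
            if now < entry.expires_at:
                continue
            if entry.probes_sent >= self.probe_attempts:
                del self.entries[ip]      # no reply to any probe: age out
            else:
                entry.probes_sent += 1
                entry.expires_at = now + self.probe_interval
                to_probe.append(ip)
        return to_probe
```

In this model, an ARP Reply to a probe is handled by calling learn() again, which resets
the probe counter and restarts the aging timer for that entry.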
Enhanced Functions
Layer 2 topology probe
With the Layer 2 topology probe function, the aging time of all ARP entries corresponding to
the VLAN to which a Layer 2 interface belongs is set to 0 when the status of the Layer 2
interface changes from Down to Up. Then, the device resends ARP probe messages to update
all the ARP entries.
If a non-Huawei device is interconnected with a Huawei device, the non-Huawei device does
not respond to an ARP aging probe message with the destination MAC address as the
broadcast MAC address if the ARP table of the non-Huawei device contains the mapping
between the IP address and MAC address of the Huawei device. Then, the Huawei device
considers that the link to the non-Huawei device is in the Down state and deletes the mapping
between the IP address and MAC address of the non-Huawei device. Therefore, if a non-
Huawei device is interconnected with a Huawei device, configure the Huawei device to
unicast ARP aging probe messages to the non-Huawei device.
Implementation
Devices dynamically learn and update the mapping between IP addresses and MAC addresses
by using ARP messages. The process involves the creation, update, and aging of dynamic
ARP entries.
l Creating and updating dynamic ARP entries
If an ARP message received by a device meets any of the following conditions, the
system automatically creates or updates the corresponding ARP entry:
– The source IP address of the ARP message is in the same network segment as the IP
addresses of inbound interfaces. The destination IP address of the ARP message is
the IP address of the interface on the device.
– The source IP address of the ARP message is in the same network segment as the IP
addresses of inbound interfaces. The destination IP address of the ARP message is
the virtual IP address of the Virtual Router Redundancy Protocol (VRRP) backup
group configured on the interface on the device.
– The source IP address of the ARP message is in the same network segment as the IP
addresses of inbound interfaces, which are virtual Ethernet interfaces applied in the
IP over Ethernet over AAL5 (IPoEoA) service.
l Aging dynamic ARP entries
After the aging timer of a dynamic ARP entry on a device expires, the device sends ARP
aging probe messages to the peer device. If the device does not receive an ARP Reply
message after the number of aging probe attempts reaches the specified number, the
dynamic ARP entry is aged.
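The three creation and update conditions listed above can be read as a single predicate.
The following sketch is one possible reading, with hypothetical function and parameter
names; in particular, the `is_ipoeoa_ve` flag is an assumed way to represent "the inbound
interface is a virtual Ethernet interface applied in the IPoEoA service".

```python
import ipaddress


def should_learn(src_ip, dst_ip, iface_network, iface_ip,
                 vrrp_virtual_ips=(), is_ipoeoa_ve=False):
    """Return True if a received ARP message may create or update an entry."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)

    # All three conditions require the source IP address to be in the same
    # network segment as the inbound interface.
    if src not in ipaddress.ip_network(iface_network):
        return False

    # Condition 3: virtual Ethernet interface used for IPoEoA.
    if is_ipoeoa_ve:
        return True

    # Condition 1: the destination is the interface's own IP address.
    if dst == ipaddress.ip_address(iface_ip):
        return True

    # Condition 2: the destination is a VRRP virtual IP address on the interface.
    return dst in {ipaddress.ip_address(v) for v in vrrp_virtual_ips}
```

For example, a message from 10.10.10.1 to 10.10.10.3 received on an interface addressed
10.10.10.3/24 satisfies the first condition, whereas the same message from 10.10.11.7 is
ignored because its source lies outside the interface's network segment.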
Usage Scenarios
Dynamic ARP is applicable to a network with a complex topology, sufficient bandwidth
resources, and a high requirement for real-time communication.
Benefits
Dynamic ARP entries are dynamically created and updated using ARP messages. They do not
need to be manually maintained, greatly reducing maintenance workload.
Definition
Static Address Resolution Protocol (ARP) means that the mapping between IP addresses and
MAC addresses is manually created by a network administrator.
Principles
Static ARP and dynamic ARP differ in ARP entry creation and maintenance methods.
Dynamic ARP entries are automatically created and maintained using ARP messages,
whereas static ARP entries are manually configured and maintained by a network
administrator. The advantages and disadvantages of dynamic ARP and static ARP are as
follows:
l Dynamic ARP
– Advantages: Dynamic ARP entries do not need to be manually configured and
maintained. When a device becomes faulty or the network adapter on a host is
frequently replaced, the ARP entry can be updated in real time. Maintenance
workload is greatly reduced.
– Disadvantages:
n Dynamic ARP entries can be aged and overwritten by new dynamic ARP
entries. This affects the stability and security of network communications.
n The execution of dynamic ARP consumes some network resources, which may
affect user services. Therefore, dynamic ARP is not applicable to a network
with insufficient bandwidth resources.
l Static ARP
– Advantages:
n Static ARP entries are neither aged nor overwritten by dynamic ARP entries.
This ensures the reliability of network communications.
Related Concepts
Static ARP entries are classified into short and long entries.
l Short static ARP entries
A short static ARP entry cannot be used to forward messages directly, because it does not
specify an outbound interface. The device first sends ARP Request messages. If the source
IP and MAC addresses of a received ARP Reply message are the same as the configured IP
and MAC addresses, the interface that received the ARP Reply message is added to the
static ARP entry. The device can then use this interface to forward messages directly.
NOTE
If a MAC address is configured for multiple interfaces, the short static ARP entry in which the
MAC address exists cannot be updated by users.
l Long static ARP entries
When configuring long static ARP entries, configure IP and MAC addresses as well as
the VLAN and outbound interface through which devices send messages based on the
ARP entries. Long static ARP entries are used to forward messages directly.
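The difference between short and long static ARP entries can be sketched as a record whose
outbound interface is either configured up front (long entry) or resolved later from a
matching ARP Reply message (short entry). The class, field, and function names below are
hypothetical, and the interface names and MAC addresses are example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StaticArpEntry:
    ip: str
    mac: str
    vlan: Optional[int] = None        # configured only for long entries
    interface: Optional[str] = None   # configured (long) or resolved (short)

    @property
    def can_forward(self) -> bool:
        """An entry can forward only once its outbound interface is known."""
        return self.interface is not None


def on_arp_reply(entry, src_ip, src_mac, in_interface) -> bool:
    """Resolve a short entry from an ARP Reply; return True if it was bound."""
    if entry.can_forward:
        return False                  # long or already resolved: unchanged
    if src_ip != entry.ip or src_mac.lower() != entry.mac.lower():
        return False                  # reply does not match the configuration
    entry.interface = in_interface
    return True


short_entry = StaticArpEntry("10.1.1.2", "0013-46e7-2ef5")
long_entry = StaticArpEntry("10.1.1.3", "0013-46e7-2ef6", vlan=10, interface="GE0/2/1")
```

A reply that carries a different MAC address leaves the short entry unresolved, which is
what allows the entry to bind a user's IP and MAC addresses while the access interface
remains free to change.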
Usage Scenarios
l Static ARP is applicable to a network with a simple topology and high stability.
l Static ARP is applicable to a network where information security is of high priority, such
as a governmental network or military network.
NOTE
Short static ARP entries mainly apply to a scenario in which network administrators want to bind users'
IP and MAC addresses but users' access interfaces can change.
Benefits
Static ARP ensures the communication security. If a static ARP entry is configured on a
device, the device can communicate with the peer device using only the specified MAC
address. Network attackers cannot modify the mapping between the IP address and MAC
address by using ARP messages, ensuring normal communications between the two devices.
Background
Static ARP protects a network against ARP spoofing attacks. However, network
administrators must configure static ARP entries, which can be time-consuming and
laborious, and errors may occur during the configuration. ARP automatic scanning and fixed
ARP solve this problem, while ensuring reliable and secure network operations.
Related Concepts
ARP automatic scanning: A device automatically sends ARP Request packets to all its
neighbor devices on a local area network (LAN) to obtain the MAC addresses of the neighbor
devices and generate dynamic ARP entries.
Fixed ARP: The device converts the generated dynamic ARP entries to static ARP entries.
ARP automatic scanning is generally used with fixed ARP. A device uses ARP automatic
scanning to generate dynamic ARP entries and uses fixed ARP to convert these dynamic ARP
entries to static ARP entries. These features prevent network attackers from modifying ARP
entries to attack the network.
Implementation
Figure 7-8 shows a network that implements ARP automatic scanning and fixed ARP.
(Diagram omitted. Nodes shown: Internet, PE, CE1, and CE2.)
Host A, Host B, Host C, and Host D on a LAN communicate with the Internet through a
provider edge (PE) on the network shown in Figure 7-8. The implementation of ARP
automatic scanning and fixed ARP is as follows:
1. After the PE is configured with ARP automatic scanning, the PE sends ARP Request
packets to each host to learn their MAC addresses and generate dynamic ARP entries.
2. After the PE is configured with fixed ARP, the PE converts the generated dynamic ARP
entries to static ARP entries.
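The two steps above can be sketched as follows. This is a simplified model rather than the
device behavior: `resolve_mac` is a hypothetical callback that stands in for sending one
ARP Request packet and waiting for the reply, and the table is a plain dictionary.

```python
def auto_scan(candidate_ips, resolve_mac):
    """Step 1: probe each address and record replies as dynamic ARP entries.

    resolve_mac is a hypothetical callback that returns the replying MAC
    address, or None if no host answers.
    """
    table = {}
    for ip in candidate_ips:
        mac = resolve_mac(ip)
        if mac is not None:
            table[ip] = {"mac": mac, "kind": "dynamic"}
    return table


def fix_arp(table):
    """Step 2: convert every learned dynamic entry to a static entry."""
    for entry in table.values():
        entry["kind"] = "static"
    return table


def learn(table, ip, mac):
    """Apply a later ARP message; static entries are never overwritten."""
    entry = table.get(ip)
    if entry is not None and entry["kind"] == "static":
        return False
    table[ip] = {"mac": mac, "kind": "dynamic"}
    return True
```

After fix_arp() runs, learn() rejects any ARP message that tries to change a fixed entry,
which is the property that protects the table against modification by attackers.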
Usage Scenario
ARP automatic scanning and fixed ARP apply to small-sized LANs.
Benefits
ARP automatic scanning and fixed ARP rapidly configure static ARP entries to maintain
reliable and secure network communications.
Principles
To ensure the stability and reliability of network communication, a device can broadcast
gratuitous Address Resolution Protocol (ARP) messages to notify the other devices in the
same network segment of its address information in the following scenarios:
l You need to check whether the IP address of a device conflicts with the IP address of
another device in the same network segment. The IP address of each device must be
unique to ensure the stability of network communication.
l When the MAC address of a host changes because its network adapter is replaced, the host
must quickly notify other devices in the same network segment of the MAC address
change before the ARP entry is aged. This ensures the reliability of network
communication.
l When a master/slave switchover occurs in a Virtual Router Redundancy Protocol
(VRRP) backup group, the new master router needs to notify other devices in the same
network segment of its status change.
Related Concepts
Gratuitous ARP message
A gratuitous ARP message is a special ARP message. The source and destination IP addresses
in a gratuitous ARP message are the IP addresses of the sender.
Implementation
l If a device finds that the source IP address in a received gratuitous ARP message is the
same as its own IP address, the device sends a gratuitous ARP message to notify the
sender of the address conflict.
l If a device finds that the source IP address in a received gratuitous ARP message is
different from its own IP address, the device maintains the corresponding ARP entry
based on the information (such as the sender's IP address and MAC address) carried in
the gratuitous ARP message.
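The two cases above can be sketched as one handler. The function name, the return labels,
and the dictionary used as the ARP table are assumptions made for this illustration; a
message is treated as gratuitous when its source and destination IP addresses are equal,
as defined under Related Concepts.

```python
def handle_gratuitous_arp(own_ip, arp_table, sender_ip, sender_mac, target_ip):
    """Process one received ARP message and return what the device does.

    Returns "conflict" when the device should answer with its own gratuitous
    ARP message, "updated" when it refreshes its ARP table, and
    "not-gratuitous" when the message is an ordinary ARP message.
    """
    if sender_ip != target_ip:
        return "not-gratuitous"

    if sender_ip == own_ip:
        # Another device claims this device's IP address: notify the sender.
        return "conflict"

    # Different IP address: maintain the corresponding ARP entry.
    arp_table[sender_ip] = sender_mac
    return "updated"
```

A caller would send a gratuitous ARP message of its own whenever the handler returns
"conflict"; the ARP table is left untouched in that case.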
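Figure 7-9 (diagram omitted). Port 1 on PE1 and port 2 on PE2 are both configured with the
IP address 10.1.1.1/24 and are connected through CE. The diagram shows port 1 sending an
ARP Request message, after which the two ports exchange gratuitous ARP messages.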
As shown in Figure 7-9, the IP address of port 1 on PE1 is 10.1.1.1, and the IP address of port
2 on PE2 is 10.1.1.1.
1. Port 1 broadcasts an ARP Request message. Port 2 receives the ARP Request message
and finds that the source IP address in the message conflicts with its own IP address.
Then, port 2 performs the following operations:
a. Port 2 sends a gratuitous ARP message to notify port 1 of its IP address.
b. A conflict node is generated on the conflict link of port 2. Then, port 2 sends
gratuitous ARP messages to port 1 at an interval of 5 seconds.
2. Port 1 receives the gratuitous ARP messages from port 2 and finds that the source IP
address in the message conflicts with its own IP address. Then, port 1 performs the
following operations:
a. Port 1 sends a gratuitous ARP message to notify port 2 of its IP address.
b. A conflict node is generated on the conflict link of port 1. Then, port 1 sends
gratuitous ARP messages to port 2 at an interval of 5 seconds.
Port 1 and port 2 send gratuitous ARP messages to each other at an interval of 5 seconds until
the address conflict is rectified.
If one port does not receive a gratuitous ARP message from the other port within 8 seconds,
the port considers that the address conflict has been rectified. The port deletes the conflict
node on its conflict link and stops sending gratuitous ARP messages to the other port.
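The timing rules above (resend every 5 seconds, clear the conflict after 8 seconds of
silence) can be sketched as a small state object. The class and method names are
hypothetical, and the caller is assumed to supply the current time in seconds; the first
gratuitous ARP message of step a is assumed to have been sent when the object is created.

```python
SEND_INTERVAL = 5.0    # seconds between gratuitous ARP messages (from the text)
CLEAR_TIMEOUT = 8.0    # seconds of silence after which the conflict is cleared


class ConflictNode:
    """Hypothetical model of one conflict node on a port's conflict link."""

    def __init__(self, now):
        self.last_heard = now                 # last conflicting message seen
        self.next_send = now + SEND_INTERVAL  # next periodic transmission

    def on_conflicting_garp(self, now):
        """Record that the peer is still advertising the same IP address."""
        self.last_heard = now

    def poll(self, now):
        """Return "cleared", "send", or "wait" for the current time."""
        if now - self.last_heard >= CLEAR_TIMEOUT:
            return "cleared"      # delete the node and stop sending
        if now >= self.next_send:
            self.next_send += SEND_INTERVAL
            return "send"         # send another gratuitous ARP message
        return "wait"
```

Because the clear timeout is longer than the send interval, a single lost message does not
end the exchange early; the node is removed only after the peer has stayed silent for
longer than one full interval.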
Functions
Gratuitous ARP implements the following functions:
l To check for an IP address conflict on the network, a device sends a gratuitous ARP
message. If the device then receives a gratuitous ARP message from another device in
response, the IP addresses of the two devices conflict.
l When the MAC address of a host changes after its network adapter is replaced, the host
sends a gratuitous ARP message to notify other devices of the MAC address change
before the ARP entry is aged. This ensures the reliability of network communication.
After receiving the gratuitous ARP message, other devices maintain the corresponding
ARP entry in their ARP tables based on the address information carried in the message.
l When a master/slave switchover occurs in the VRRP backup group, the new master
router sends a gratuitous ARP message to notify other devices on the network of its
status change.
Benefits
Gratuitous ARP reveals address conflict on a network so that ARP tables of devices can be
quickly updated. This ensures the stability and reliability of network communication.
Principles
The Address Resolution Protocol (ARP) is applicable only to devices on the same physical
network. When a device on a physical network needs to send IP datagrams to another physical
network, the gateway needs to query the routing table to implement communication between
the two networks. However, routing table query consumes system resources and can affect
other services. To resolve the problem, you can deploy proxy ARP on an intermediary device.
The proxy ARP feature helps reduce system resource consumption caused by routing table
queries and improve the efficiency of system processing.
Implementation
l Routed proxy ARP
A large network of a company is usually divided into multiple subnets to facilitate
management. One way to let hosts in different subnets communicate is to modify the
routing information of each host so that IP datagrams sent from the host to another subnet
are first sent to the gateway and then to that subnet. With this solution, however,
devices are hard to manage and maintain. Deploying proxy ARP on the gateway effectively
resolves the management and maintenance problems caused by network division.
Figure 7-10 shows how proxy ARP is implemented using the communication between
Host A and Host B as an example.
Figure 7-10 (diagram omitted). Recoverable labels: Port1 with IP address 10.10.10.2/24 and
MAC address 2-2-2; ARP Request message with destination IP address 10.10.11.1 and
destination MAC address FF-FF-FF.
a. Host A sends an ARP Request message to request the MAC address of Host B.
b. After receiving the ARP Request message, PE checks the destination IP address of
the message and finds that the requested MAC address is not its MAC address. PE
then checks whether there are routes to Host B.
n If there are routes to Host B, PE checks whether routed proxy ARP is enabled
on it.
○ If routed proxy ARP is enabled on PE, PE sends the MAC address of its
port 1 to Host A.
○ If routed proxy ARP is not enabled on PE, PE discards the ARP Request
message sent by Host A.
n If there are no routes to Host B, PE discards the ARP Request message sent by
Host A.
c. After learning the MAC address of port 1, Host A sends IP datagrams to PE based
on this MAC address.
After receiving the IP datagrams, PE forwards them to Host B.
l Proxy ARP within a VLAN
Figure 7-11 shows how proxy ARP is implemented within a VLAN by using the
communication between Host A and Host C as an example.
Figure 7-11 Typical networking diagram for proxy ARP within a VLAN
(Diagram omitted. Recoverable labels: VLAN 4; VLANIF 4 on CE with IP address 10.10.10.4/24
and MAC address 4-4-4; interface isolation deployed on CE.)
Host A, Host B, and Host C belong to the same VLAN. Port isolation is configured on
CE. Therefore, Host A and Host C cannot communicate at Layer 2. You can configure a
VLANIF interface on CE and enable proxy ARP within a VLAN to implement
communication between Host A and Host C.
a. Host A sends an ARP Request message to request the MAC address of Host C.
b. After receiving the ARP Request message, CE checks the destination IP address of
the message and finds that the requested MAC address is not the MAC address of
its VLANIF 4. Then, CE searches its ARP table for the ARP entry indicating the
mapping between the IP address and MAC address of Host C.
n If CE finds this ARP entry in its ARP table, CE checks whether proxy ARP
within a VLAN is enabled on it.
○ If proxy ARP within a VLAN is enabled on CE, CE sends the MAC
address of its VLANIF 4 to Host A.
○ If proxy ARP within a VLAN is not enabled on CE, CE discards the ARP
Request message sent by Host A.
n If CE does not find this ARP entry in its ARP table, CE discards the ARP
Request message sent by Host A and checks whether proxy ARP within a
VLAN is enabled on it.
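○ If proxy ARP within a VLAN is enabled on CE, CE sends the ARP
Request message to Host C. After CE receives an ARP Reply message
from Host C, an ARP entry indicating the mapping between the IP
address and MAC address of Host C is generated in the ARP table.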
Figure 7-12 Typical networking diagram for proxy ARP between VLANs
(Diagram omitted. Recoverable labels: PE with Super-VLAN 4 and VLANIF 4, IP address
10.10.10.3/24, MAC address 3-3-3; Sub-VLAN 2 and Sub-VLAN 3; Host A with IP address
10.10.10.1/24 and MAC address 1-1-1; Host B with IP address 10.10.10.2/24 and MAC address
2-2-2; ARP Request message with destination IP address 10.10.10.2 and destination MAC
address FF-FF-FF.)
a. Host A sends an ARP Request message to request the MAC address of Host B.
b. After receiving the ARP Request message, PE checks the destination IP address of
the message and finds that the requested MAC address is not the MAC address of
its VLANIF 4. Then, PE searches its ARP table for the ARP entry indicating the
mapping between the IP address and MAC address of Host B. The ARP entries
include dynamically learned and statically configured ARP entries.
n If PE finds this ARP entry in its ARP table, PE checks whether proxy ARP
between VLANs is enabled on it.
○ If proxy ARP between VLANs is enabled on PE, PE sends the MAC
address of its VLANIF 4 to Host A.
○ If proxy ARP between VLANs is not enabled on PE, PE discards the
ARP Request message sent by Host A.
n If PE does not find this ARP entry in its ARP table, PE discards the ARP
Request message sent by Host A and checks whether proxy ARP between
VLANs is enabled on it.
○ If proxy ARP between VLANs is enabled on PE, PE sends the ARP
Request message to Host B. After PE receives an ARP Reply message
from Host B, an ARP entry indicating the mapping between the IP
address and MAC address of Host B is generated in the ARP table.
○ If proxy ARP between VLANs is not enabled on PE, PE does not perform
any operations.
c. After learning the MAC address of VLANIF 4, Host A sends IP datagrams to PE
based on this MAC address.
After receiving the IP datagrams, PE forwards them to Host B.
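The decision trees above differ mainly in what the device consults: routed proxy ARP checks
for a route to the target, whereas proxy ARP within a VLAN or between VLANs checks the ARP
table. The sketch below covers only the branch in which the requested address is not the
device's own. The function names are hypothetical, and `send_arp_request` is an assumed
callback that stands in for forwarding the ARP Request message to the target host.

```python
def routed_proxy_arp_reply(has_route, proxy_enabled, port_mac):
    """Routed proxy ARP: return the MAC address to answer with, or None.

    None means that the ARP Request message is discarded.
    """
    if has_route and proxy_enabled:
        return port_mac
    return None


def vlan_proxy_arp_reply(target_ip, arp_table, proxy_enabled, vlanif_mac,
                         send_arp_request):
    """Proxy ARP within a VLAN or between VLANs.

    Returns the VLANIF MAC address to answer with, or None when the request
    is discarded. If the target has no ARP entry and proxy ARP is enabled,
    the device asks the target itself so that an entry can be learned.
    """
    if target_ip in arp_table:
        return vlanif_mac if proxy_enabled else None
    if proxy_enabled:
        send_arp_request(target_ip)   # learn the target for later requests
    return None
```

In both variants the requesting host only ever learns the MAC address of the gateway port
or VLANIF interface, so the gateway receives the IP datagrams and forwards them to the
real destination.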
Usage Scenarios
Table 7-7 describes the usage scenarios for the three types of proxy ARP.
l Routed proxy ARP: Two hosts that need to communicate belong to the same network
segment but are located on different physical networks.
l Proxy ARP within a VLAN: Two hosts that need to communicate belong to the same
network segment and to the same VLAN, in which user isolation is configured.
l Proxy ARP between VLANs: Two hosts that need to communicate belong to the same
network segment but different VLANs.
NOTE
In the VLAN aggregation scenario, proxy ARP between VLANs can be enabled on the
VLANIF interface corresponding to the super-VLAN to implement communication between
sub-VLANs.
Benefits
l Proxy ARP lets a host on a network treat the destination host as if it were in the same
network segment. In this manner, the details of the physical network are hidden, and the
division of the network into subnets is transparent to hosts.
l All processing related to proxy ARP is performed on a gateway, with no configuration
needed on the hosts connected to it. In addition, proxy ARP affects only the ARP tables
on hosts and does not affect the ARP table and routing table on a gateway.
7.2.2.7 ARP-Ping
Principles
ARP-Ping includes ARP-Ping IP and ARP-Ping MAC, and is used to maintain a network on
which Layer 2 features are deployed (ARP refers to Address Resolution Protocol).
l ARP-Ping IP
Before configuring an IP address for a device, check whether the IP address is being
used by another device. Generally, the ping command can be used to check whether an
IP address is being used. However, if a firewall is configured for the device using the IP
address, and the firewall is configured not to respond to ping messages, you may
mistakenly believe that the IP address is not being used. To solve this problem, use the
ARP-Ping IP feature. ARP messages are Layer 2 protocol messages and, in most cases,
can pass through a firewall configured not to respond to ping messages.
l ARP-Ping MAC
The host's MAC address is the fixed address of the network adapter on the host. It does
not normally need to be configured manually; however, there are exceptions. For
example, if a device has multiple interfaces and the manufacturer does not specify MAC
addresses for these interfaces, the MAC addresses must be configured, or a virtual MAC
address must be configured for a Virtual Router Redundancy Protocol (VRRP) backup
group. Before configuring a MAC address, use the ARP-Ping MAC feature to check
whether the MAC address is being used by another device.
Related Concepts
l ARP-Ping IP
A device obtains the specified IP address and outbound interface number from the
configuration management plane, saves them to the buffer, constructs an ARP Request
message, and broadcasts the message on the outbound interface. If the device does not
receive an ARP Reply message within a specified period, the device displays a message
indicating that the IP address is not being used by another device. If the device receives
an ARP Reply message, then after the specified timeout expires, it compares the
source IP address in the ARP Reply message with the IP address stored in the buffer. If
the two IP addresses are the same, the device displays the source MAC address in the
ARP Reply message and displays a message indicating that the IP address is being used
by another device.
l ARP-Ping MAC
The ARP-Ping MAC process is similar to the ping process. It varies in that ARP-Ping
MAC is applicable only to directly connected Ethernet LANs or Layer 2 Ethernet virtual
private networks (VPNs). A device obtains the specified MAC address and outbound
interface number (optional) from the configuration management plane, constructs an
Internet Control Message Protocol (ICMP) Echo Request message, and broadcasts the
message on the outbound interface. If the device does not receive an ICMP Echo Reply
message within a specified period, the device displays a message indicating that the
MAC address is not being used by another device. If the device receives an ICMP Echo
Reply message within a specified period, the device compares the source MAC address
in the message with the MAC address stored on the device. If the two MAC addresses
are the same, the device displays the source IP address in the ICMP Echo Reply message
and displays a message indicating that the MAC address is being used by another device.
Implementation
l ARP-Ping IP implementation
Figure 7-13 (diagram omitted). ATN A connects to Ethernet A through interface GE0/2/0,
whose IP address is 10.1.1.1/24.
As shown in Figure 7-13, ATN A can use ARP-Ping IP to check whether the IP address
10.1.1.2 is being used. ATN A receives an ARP Reply message from Host A, whose IP
address is 10.1.1.2. After the specified timeout expires, ATN A displays the MAC
address of Host A along with a message indicating that the IP address is in use by
another host.
The ARP-Ping IP implementation process is as follows:
a. After the IP address 10.1.1.2 is specified using a command line on ATN A, ATN A
broadcasts an ARP Request message and starts a timer for ARP Reply messages.
b. After receiving the ARP Request message, Host A on the same LAN finds that the
destination IP address in the message is the same as its own IP address and sends an
ARP Reply message to ATN A.
c. After ATN A receives the ARP Reply message and the specified timeout expires,
ATN A compares the source IP address in the ARP Reply message with the IP
address stored on the device.
n If the two IP addresses are the same, ATN A displays the source MAC address
in the message and displays a message indicating that the IP address is being
used by another host. Meanwhile, ATN A stops the timer for ARP Reply
messages.
n If the two IP addresses are different, ATN A discards the ARP Reply message
and displays a message indicating that the IP address is not being used by
another host.
If ATN A does not receive any ARP Reply messages before the ARP Reply
message timer expires, it displays a message indicating that the IP address is not
being used by another host.
l ARP-Ping MAC implementation
Figure 7-14 (diagram omitted). ATN A connects to Ethernet A through interface GE0/2/0,
whose IP address is 10.1.1.1/24.
As shown in Figure 7-14, ATN A can use ARP-Ping MAC to check whether the MAC
address 0013-46E7-2EF5 is being used by another host. After receiving ICMP Echo
Reply messages from all the hosts on the network, ATN A displays the IP address of the
host with a MAC address of 0013-46E7-2EF5 and displays a message indicating that the
MAC address is being used by another host.
The ARP-Ping MAC implementation process is as follows:
a. After the MAC address 0013-46E7-2EF5 is specified using a command line on
ATN A, ATN A broadcasts an ICMP Echo Request message and starts a timer for
ICMP Echo Reply messages.
b. After receiving the ICMP Echo Request message, all the other hosts on the same
LAN send ICMP Echo Reply messages to ATN A.
c. If ATN A receives an ICMP Echo Reply message from a host, ATN A compares the
source MAC address in the message with the MAC address in the command line.
n If the two MAC addresses are the same, ATN A displays the source IP address
in the ICMP Echo Reply message and displays a message indicating that the
MAC address is being used by another host. Meanwhile, ATN A stops the
timer for ICMP Echo Reply messages.
n If the two MAC addresses are different, ATN A discards the ICMP Echo Reply
message and displays a message indicating that the MAC address is not being
used by another host.
If ATN A does not receive any ICMP Echo Reply messages before the ICMP Echo
Reply message timer expires, it displays a message indicating that the MAC address
is not being used by another host.
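The comparison step in both procedures can be sketched as a pair of functions that inspect
the replies collected before the timer expired. This is a minimal sketch with hypothetical
function names: it models only the decision, not the sending of ARP Request or ICMP Echo
Request messages, and the MAC address used for Host A in the usage example is made up.

```python
def arp_ping_ip(target_ip, arp_replies):
    """ARP-Ping IP: decide whether target_ip is already in use.

    arp_replies holds (source_ip, source_mac) pairs from the ARP Reply
    messages received before the reply timer expired.
    Returns (in_use, mac_of_user).
    """
    for source_ip, source_mac in arp_replies:
        if source_ip == target_ip:
            return True, source_mac
    return False, None


def arp_ping_mac(target_mac, echo_replies):
    """ARP-Ping MAC: decide whether target_mac is already in use.

    echo_replies holds (source_mac, source_ip) pairs from the ICMP Echo
    Reply messages received before the reply timer expired.
    Returns (in_use, ip_of_user).
    """
    wanted = target_mac.lower()
    for source_mac, source_ip in echo_replies:
        if source_mac.lower() == wanted:
            return True, source_ip
    return False, None
```

For example, `arp_ping_ip("10.1.1.2", [("10.1.1.2", "0000-0000-00aa")])` reports that the
address is in use and returns the replying MAC address, while an empty reply list yields
`(False, None)`, the case in which the timer expires without any reply.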
Usage Scenarios
ARP-Ping is applicable to directly connected Ethernet LANs or Layer 2 Ethernet VPNs.
Benefits
ARP-Ping checks whether an IP address or MAC address to be configured is being used by
another device, preventing address conflict.
Background
The occurrence of an IP address conflict causes route flapping and traffic interruptions,
affecting user services. IP address conflicts are often caused by incorrect networking or
configurations. Users expect that devices can automatically detect IP address conflicts on a
network and immediately notify users of conflict reasons, so that they can rapidly resolve
such conflicts and minimize impact on services.
IP address conflict detection can help users quickly locate and modify the conflicting IP
addresses and instruct users to properly configure and manage the IP addresses of devices on
a network.
Implementation
IP address conflict detection can be classified into active and passive detection, and their
differences are as follows:
l Active detection
When the protocol status of an interface on a device changes to Up, the device actively
sends gratuitous ARP packets to detect possible IP address conflicts. For the detailed
detection procedure, see 7.2.2.5 Gratuitous ARP.
l Passive detection
When a device receives ARP packets that are not gratuitous ARP packets, it checks the
IP addresses carried by the ARP packets. The device concludes that IP address conflicts
exist on the network if any of the following conditions are met:
– The source IP address in an ARP packet is the same as the IP address of the
inbound interface that receives the ARP packet, but the source MAC address in the
ARP packet is different from the MAC address of the inbound interface.
– The source IP address in an ARP packet is the same as the IP address in an existing
ARP entry, but the source MAC address is different from the MAC address in the
ARP entry.
– The source IP address in an ARP packet is different from the CE IP address
configured on the inbound interface that connects to the CE, or the source MAC
address is different from the CE MAC address configured on the inbound interface
that connects to the CE.
– The source IP address in an ARP packet is 0.0.0.0 (probe ARP packet), the
destination IP address is the same as the IP address of the inbound interface that
receives the ARP packet, but the source MAC address in the ARP packet is
different from the MAC address of the inbound interface.
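The four passive-detection conditions can be sketched as a single check. This is a minimal illustration rather than the device implementation: the ArpPacket and Interface types, their field names, and the returned reason strings are hypothetical, and the caller is assumed to have already excluded gratuitous ARP packets.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ArpPacket:
    src_ip: str
    src_mac: str
    dst_ip: str


@dataclass
class Interface:
    ip: str
    mac: str
    ce_ip: Optional[str] = None   # CE IP address configured on the interface, if any
    ce_mac: Optional[str] = None  # CE MAC address configured on the interface, if any


def detect_conflict(pkt: ArpPacket, iface: Interface,
                    arp_table: Dict[str, str]) -> Optional[str]:
    """Return a reason string if the packet indicates a conflict, else None."""
    if pkt.src_ip == "0.0.0.0":
        # Probe ARP packet: only the destination IP address is meaningful.
        if pkt.dst_ip == iface.ip and pkt.src_mac != iface.mac:
            return "probe targets the interface IP address"
        return None
    if pkt.src_ip == iface.ip and pkt.src_mac != iface.mac:
        return "source IP equals the interface IP address"
    known_mac = arp_table.get(pkt.src_ip)
    if known_mac is not None and known_mac != pkt.src_mac:
        return "source IP maps to another MAC in the ARP table"
    if iface.ce_ip is not None and pkt.src_ip != iface.ce_ip:
        return "source IP differs from the configured CE IP address"
    if iface.ce_mac is not None and pkt.src_mac != iface.ce_mac:
        return "source MAC differs from the configured CE MAC address"
    return None
```

The function returns the first condition that matches, so a packet that satisfies several conditions is reported once.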
Usage Scenario
IP address conflict detection is applicable to Ethernet LANs.
Benefits
IP address conflict detection helps users quickly locate and modify IP address conflicts to
ensure stability and security of user services.
Principles
The Address Resolution Protocol (ARP) is simple and easy to implement. It is the basis for
Ethernet communication. However, ARP does not provide any security mechanisms.
Attackers can modify ARP entries by transmitting pseudo ARP messages to attack the
network. ARP attacks and ARP viruses pose a serious threat to LAN security. Network
devices must be able to utilize various technologies to effectively detect and avoid ARP
attacks.
ARP security ensures the security and robustness of network devices by filtering out untrusted
ARP messages and enabling timestamp suppression on certain ARP messages.
Related Concepts
ARP Miss message
An ARP Miss message is reported by a device to the upper-layer software when the device
fails to find a matched ARP entry for IP datagram forwarding. After receiving the ARP Miss
message, the upper-layer software generates a fake ARP entry and sends it to the device. The
upper-layer software then sends an ARP Request message to request the destination MAC
address. After receiving an ARP Reply message, the upper-layer software learns address
information in the message and sends the real ARP entry to the device to replace the fake
ARP entry. The device can then forward IP datagrams.
A dynamic fake ARP entry has an aging time.
l Before the aging time elapses, the device stops sending ARP Miss messages to the
upper-layer software.
l After the aging time elapses, the dynamic fake ARP entry is deleted. If the device still
cannot find the matched ARP entry, the device sends another ARP Miss message to the
upper-layer software.
Implementation
Table 7-8 shows how ARP security is implemented.
Function: Interface-based ARP entry restriction
Description: The number of ARP entries that an interface can learn is restricted,
effectively preventing ARP entry overflow and ensuring ARP entry security.
Scenario: An unauthorized user sends a large number of ARP messages to a device.
This results in the device having to learn a large number of ARP entries in a short
period of time, causing ARP entry overflow. As a result, authorized users cannot use
the network as normal.

Function: Timestamp suppression on ARP Miss messages
Description: A device counts received ARP Miss messages. If the number of ARP Miss
messages received in a specified period exceeds the threshold, the device does not
process excess ARP Miss messages.
NOTE
Currently, timestamp suppression on ARP Miss messages can be performed based only
on source IP addresses.
Scenario: Unauthorized users use specific tools to send a large number of ARP
messages to hosts in the local network segment or other network segments. Many ARP
Miss messages are generated because MAC addresses corresponding to the destination
IP addresses do not exist. Devices have to spend a lot of resources processing these
ARP Miss messages, and the processing of other services is affected.
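Timestamp suppression on ARP Miss messages can be pictured as a per-source-IP counter over a time window. The sketch below is an illustration under assumptions: the class name, the sliding-window bookkeeping, and the choice not to count suppressed messages are not taken from this document, which states only that excess messages in a specified period are not processed.

```python
from collections import defaultdict, deque


class ArpMissSuppressor:
    """Limit processed ARP Miss messages per source IP address (illustrative)."""

    def __init__(self, threshold: int, period: float):
        self.threshold = threshold      # messages processed per period
        self.period = period            # period length in seconds
        self._seen = defaultdict(deque)  # source IP -> timestamps of processed messages

    def accept(self, src_ip: str, now: float) -> bool:
        """Return True if the ARP Miss message should be processed."""
        window = self._seen[src_ip]
        # Forget messages that fall outside the current period.
        while window and now - window[0] >= self.period:
            window.popleft()
        if len(window) >= self.threshold:
            return False  # excess message: not processed
        window.append(now)
        return True
```

Because the counters are keyed by source IP address, a flood from one source does not suppress ARP Miss messages triggered by other sources.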
Enhanced Functions
ARP security not only provides solutions to various attacks, but also sends alarms when
potential attack behaviors are encountered.
Usage Scenarios
ARP security is deployed at the access layer and aggregation layer.
l After ARP security is deployed at the edge of the access layer, a device can learn only
address information carried in the ARP Reply messages corresponding to the ARP
Request messages sent by the device. This mechanism prevents attacks from most ARP
Request messages.
l After ARP security is deployed at the edge of the aggregation layer, many untrusted ARP
messages are filtered out and timestamp suppression is performed on certain ARP
messages. This mechanism ensures security and stability of core network devices.
Benefits
ARP security ensures the reliability of network communication, and the security and
robustness of network devices.
7.2.3 Applications
Networking Description
As shown in Figure 7-15, the intranet of an organization communicates with the Internet by
using the gateway PE. You can deploy static Address Resolution Protocol (ARP) to prevent
network attackers from obtaining private information by modifying ARP entries on PE.
[Figure 7-15: networking diagram showing the Internet, the gateway PE, and CE1 and CE2]
l Before static ARP is deployed, PE dynamically learns and updates ARP entries using
ARP messages. However, dynamic ARP entries can be aged and overwritten by new
dynamic ARP entries. Therefore, network attackers can send fake ARP messages to
modify ARP entries on PE to obtain the private information of the organization.
l After static ARP is deployed, ARP entries on PE are manually configured and
maintained by a network administrator. Static ARP entries are neither aged nor
overwritten by dynamic ARP entries. Therefore, deploying static ARP can prevent
network attackers from sending pseudo ARP messages to modify ARP entries on PE,
and information security is ensured.
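The difference between the two cases can be shown with a toy ARP table in which static entries take precedence over dynamically learned ones. This is a sketch only: the ArpTable class and its method names are hypothetical, and aging of dynamic entries is left out.

```python
from typing import Dict, Optional, Tuple


class ArpTable:
    """Toy ARP table: static entries are never overwritten by dynamic learning."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, bool]] = {}  # ip -> (mac, is_static)

    def add_static(self, ip: str, mac: str) -> None:
        # Configured manually by the administrator.
        self._entries[ip] = (mac, True)

    def learn_dynamic(self, ip: str, mac: str) -> bool:
        """Apply a learned mapping; return False if a static entry blocks it."""
        entry = self._entries.get(ip)
        if entry is not None and entry[1]:
            return False
        self._entries[ip] = (mac, False)
        return True

    def lookup(self, ip: str) -> Optional[str]:
        entry = self._entries.get(ip)
        return entry[0] if entry else None
```

With only dynamic entries, a forged ARP message replaces the stored MAC address; once the mapping is static, the same message is ignored.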
Feature Deployment
Deploy static ARP on PE to set up fixed mapping between IP addresses and MAC addresses
of hosts on the intranet. This can prevent network attackers from sending pseudo ARP
messages to modify ARP entries on PE, ensuring the stability and security of network
communication and minimizing the risk of private information being stolen.
Networking Description
As shown in Figure 7-16, to simplify management, communication isolation is
implemented for various departments on the intranet of a company. For example, although
Host A of the president's office, Host B of the R&D department, and Host C of the financial
department belong to the same VLAN, they cannot communicate at Layer 2. However, the
business requires that the president's office communicate with the financial department. To
permit this, you can enable proxy Address Resolution Protocol (ARP) within a VLAN on CE
so that Host A can communicate with Host C.
l Before proxy ARP within a VLAN is enabled, if Host A sends an ARP Request message
to request the MAC address of Host C, the message cannot be broadcast to hosts of the
R&D department and financial department due to interface isolation configured on CE.
Therefore, Host A can never learn the MAC address of Host C and cannot communicate
with Host C.
l After proxy ARP within a VLAN is enabled, CE does not discard an ARP Request
message sent from Host A although the destination IP address in the message is not the
IP address of CE. Instead, CE sends the MAC address of its VLANIF 4 to Host A. Then,
Host A sends IP datagrams to this MAC address.
Figure 7-16 Typical networking diagram for proxy ARP within a VLAN
[Diagram labels: VLAN 4; VLANIF 4 on CE with IP address 10.10.10.4/24 and MAC address
4-4-4; interface isolation deployed on CE; numbered steps 1 to 3; IP datagram]
Feature Deployment
Configure VLANIF 4, which is a Layer 3 interface, on CE, and enable proxy ARP within a
VLAN on VLANIF 4. After the deployment, CE sends the MAC address of its VLANIF 4 to
Host A when receiving a request for the MAC address of Host C from Host A. Host A then
sends IP datagrams to CE, which forwards the IP datagrams to Host C. Consequently, the
communication between Host A and Host C is implemented.
VE virtual Ethernet
7.3 ACL
NOTE
In this document, if an ACL function supports both IPv4 and IPv6, the implementation of this ACL
function is the same for IPv4 and IPv6 unless otherwise specified. For ACL function support for IPv4
and IPv6 and implementation differences between IPv4 and IPv6, see Appendix.
Purpose
ACLs are used to ensure reliable data transmission between devices on a network by
performing the following:
l Defend the network against various attacks, such as attacks by using IP, Transmission
Control Protocol (TCP), or Internet Control Message Protocol (ICMP) packets.
l Control network access. For example, ACLs can be used to control enterprise network
user access to external networks, to specify the specific network resources accessible to
users, and to define the time ranges in which users can access networks.
l Limit network traffic and improve network performance. For example, ACLs can be
used to limit the bandwidth for upstream and downstream traffic and to apply charging
Benefits
ACL rules are used to classify packets. After ACL rules are applied to a device, the device
permits or denies packets based on them. The use of ACL rules therefore greatly improves
network security.
NOTE
An ACL is a set of rules. It identifies a type of packet but does not filter packets. Other ACL-associated
functions are used to filter identified packets.
7.3.2 Principles
An ACL manages all rules configured by users and provides a rule matching algorithm for
services. Services can then permit or deny packets according to the matched rule.
Management of ACL
As a group of rules, each ACL can store multiple rules. If a user attempts to add ACL
groups or rules beyond the supported number, the system reports a configuration failure.
Packets are considered not to match an ACL if the ACL does not exist, the ACL contains
no rules, or none of the rules in the ACL meet the matching conditions.
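The three "no match" cases and the first-match behavior can be sketched as follows. This is an illustration, not the device algorithm: rules are modeled as hypothetical (rule ID, action, source address, wildcard) tuples for a basic ACL, and None stands for "the packet does not match the ACL".

```python
import ipaddress
from typing import List, Optional, Tuple

Rule = Tuple[int, str, str, str]  # (rule_id, action, source address, wildcard)


def wildcard_match(addr: str, base: str, wildcard: str) -> bool:
    """Compare only the bits where the wildcard is 0."""
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)


def match_acl(rules: Optional[List[Rule]], src_ip: str) -> Optional[str]:
    """Return the action of the first matching rule, or None for no match."""
    if not rules:
        return None  # the ACL does not exist or contains no rules
    for _rule_id, action, base, wildcard in sorted(rules, key=lambda r: r[0]):
        if wildcard_match(src_ip, base, wildcard):
            return action
    return None  # no rule met the matching conditions
```

Returning None rather than "deny" keeps the ACL a pure classifier; the service that uses the ACL decides what to do with unmatched packets.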
The rule order is determined by two factors: rule ID and rule matching order.
There are two rule matching orders, namely, configuration order and automatic order.
l Configuration order indicates that ACL rules are matched according to their
configuration order. Users can configure rule IDs, or the system automatically generates
rule IDs according to an ACL step. The ACL step enables users to easily maintain or add
rules. For example, the step of ACL is 5 by default. When a user does not configure a
rule ID, the system automatically generates a rule ID, 5, for the first rule. In this manner,
if the user intends to add a new rule before rule 5, he or she only needs to input a rule ID
smaller than 5. Then, after rearrangement, the new rule becomes the first rule.
l In the case of automatic order, the system automatically assigns rule IDs and places the
most specific rule first, following the depth-first principle. Specificity is determined
by comparing address wildcards: the smaller the wildcard, the smaller the specified host
range.
For example, 129.102.1.1 0.0.0.0 specifies the host at 129.102.1.1, and 129.102.1.1
0.0.0.255 specifies the network segment ranging from 129.102.1.0 to 129.102.1.255. In
this case, the former rule, which specifies a smaller host range, is placed before the
latter one in an ACL. The detailed standards are as follows:
– The clauses of basic ACL rules are ordered as follows:
n The clause carrying VPN instance information is ordered first.
n If the VPN instance information is the same, the clause with a smaller range of
source IP addresses is ordered first.
n If the ranges of source IP addresses are the same, the clause configured first is
ordered first.
– The any clauses of interface-based ACL rules are ordered last, and the other clauses
are ordered in the configuration sequence.
– The clauses of advanced ACL rules are ordered as follows:
n The clause carrying VPN instance information is ordered first.
n If the VPN instance information is the same, the clause with IPv4 protocols is
ordered first.
n If the protocol information is the same, the clause with a smaller range of
source IP addresses is ordered first.
n If the range of source IP addresses is the same, the clause with a smaller range
of destination IP addresses is ordered first.
n If the range of destination IP addresses is the same, the clause with a smaller
range of TCP/UDP port numbers is ordered first.
n If the ranges of TCP/UDP port numbers are the same, the clause configured
first is ordered first.
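For basic ACL rules, the depth-first ordering above amounts to a three-part sort key. The sketch below assumes that "a smaller range of source IP addresses" can be measured by the number of 1 bits in the wildcard; the BasicRule type and its field names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BasicRule:
    source: str
    wildcard: str
    vpn_instance: Optional[str] = None


def wildcard_bits(wildcard: str) -> int:
    """Number of 'do not care' bits: fewer bits means a smaller address range."""
    return bin(int(ipaddress.IPv4Address(wildcard))).count("1")


def depth_first_order(rules: List[BasicRule]) -> List[BasicRule]:
    """Order rules: VPN instance first, then smaller range, then configuration order."""
    indexed = list(enumerate(rules))  # the index records the configuration order
    indexed.sort(key=lambda item: (
        0 if item[1].vpn_instance is not None else 1,
        wildcard_bits(item[1].wildcard),
        item[0],
    ))
    return [rule for _, rule in indexed]
```

An advanced ACL would extend the key with the protocol, destination range, and port range in the order listed above.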
A rule is identified by a rule ID, which is configured by a user or generated by the system
according to the ACL step. All rules in an ACL are arranged in ascending order of rule IDs.
Rule IDs are separated by a certain space. The size of the space depends on the ACL step. For
example, if the ACL step is set to 5, consecutive automatically generated rule IDs differ by
5, giving 5, 10, 15, and so on. If the ACL step is 2, the rule IDs automatically generated by
the system start from 2. Because the first generated rule ID equals the step rather than 1,
the user can still add a rule before the first rule.
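The effect of the ACL step on rule IDs can be sketched with a small allocator. It reproduces the examples in the text (step 5 gives 5, 10, 15; step 2 starts from 2). How the system picks the next ID after a user has specified one manually is an assumption here: the sketch uses the next multiple of the step above the largest existing ID. The class name is hypothetical.

```python
from typing import List, Optional


class AclRuleIds:
    """Assign rule IDs from an ACL step and keep them in ascending order."""

    def __init__(self, step: int = 5):
        if step < 1:
            raise ValueError("step must be a positive integer")
        self.step = step
        self.ids: List[int] = []

    def add(self, rule_id: Optional[int] = None) -> int:
        if rule_id is None:
            # Assumption: use the next multiple of the step above the largest ID.
            largest = self.ids[-1] if self.ids else 0
            rule_id = (largest // self.step + 1) * self.step
        elif rule_id in self.ids:
            raise ValueError(f"rule {rule_id} already exists")
        self.ids.append(rule_id)
        self.ids.sort()  # rules are arranged in ascending order of rule IDs
        return rule_id
```

Adding a rule with ID 3 to an ACL that already holds rules 5, 10, and 15 places the new rule first, which is the case the step is designed to allow.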
l Configuration order
– If rule IDs are not specified, the system automatically assigns rule IDs according to
the ACL step and the configuration order of rules. For example, the user configures
three rules without rule IDs. If the ACL step is 5, the system assigns rule IDs 5, 10,
and 15 to the three rules according to the configuration order.
– If rule IDs are specified, rules are arranged according to their rule IDs. For example,
rule IDs are 5, 10, and 15. If a rule ID, 3, is specified for a new ACL rule, the order
of the rules is 3, 5, 10, and 15. It can be considered that a new rule is added before
rule 5.
Therefore, in the case of the configuration order, the system performs rule matching
according to the configuration order of rules. In essence, the system performs rule
matching in ascending order of the rule IDs. In this manner, a new rule may be matched
earlier.
l Automatic order
In the case of the automatic order, the user cannot specify rule IDs. Instead, the system
automatically assigns rule IDs according to the principle of depth first. In addition, the
user cannot add a new rule. The rule that specifies a smaller packet range obtains a
smaller rule ID. The system performs rule matching in ascending order of rule IDs.
NOTE
7.3.3 Applications
Application of ACLs in Route Filtering
ACLs can be applied in various dynamic routing protocols to filter the advertised and
received routes.
[Figure 7-17: networking diagram. Labels: OSPF, Internet, ATN A, ATN B, ATN D; routes
172.1.17.0/24, 172.1.18.0/24, 172.1.19.0/24, and 172.1.20.0/24]
As shown in Figure 7-17, in a network running the Open Shortest Path First (OSPF) protocol,
ATN A receives routes from the Internet, and provides part of the Internet routes for ATN B.
An ACL is configured on ATN A and applied in OSPF to control the advertisement and
receiving of routes.
l ATN A provides routes 172.1.17.0/24, 172.1.18.0/24, and 172.1.19.0/24 for ATN B.
l ATN C accepts only the route 172.1.18.0/24.
l ATN D accepts all the routes provided by ATN B.
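The filtering outcome in this example can be reproduced with a simple prefix filter. This is a simplification under assumptions: a real ACL matches routes by address and wildcard, whereas the hypothetical filter_routes helper below compares whole prefixes, and the 172.1.20.0/24 route is taken from the figure labels.

```python
import ipaddress
from typing import Iterable, List


def filter_routes(routes: Iterable[str], permitted: Iterable[str]) -> List[str]:
    """Keep only the routes whose prefix is explicitly permitted."""
    allowed = {ipaddress.ip_network(p) for p in permitted}
    return [r for r in routes if ipaddress.ip_network(r) in allowed]


# Routes that ATN A learns from the Internet (from the figure labels).
internet_routes = ["172.1.17.0/24", "172.1.18.0/24", "172.1.19.0/24", "172.1.20.0/24"]

# ATN A advertises three of the four routes to ATN B.
routes_at_b = filter_routes(
    internet_routes, ["172.1.17.0/24", "172.1.18.0/24", "172.1.19.0/24"])

# ATN C accepts only 172.1.18.0/24; ATN D accepts everything ATN B provides.
routes_at_c = filter_routes(routes_at_b, ["172.1.18.0/24"])
routes_at_d = list(routes_at_b)
```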
[Figure 7-18: networking diagram showing ATN A connected to Network A, Network B,
Network C, Network D, and Network E]
As shown in Figure 7-18, an ACL is configured on ATN A to identify all packets from
Network A. Then, the ACL is applied to the QoS policy. In this manner, all the packets from
Network A are forwarded only after ATN A performs QoS processing. The packets from other
networks, however, are forwarded normally, because they do not match the ACL.
Term Description
Term: Basic ACL
Description: A basic ACL defines rules based only on source addresses.
Term: Advanced ACL
Description: An advanced ACL defines rules based on source addresses, destination
addresses, protocol type, and protocol-specific fields, such as TCP source or destination
ports and ICMP message types and codes.
Term: Ethernet frame header-based ACL
Description: An Ethernet frame header-based ACL defines rules to filter packets based on
the source MAC address, destination MAC address, or protocol type of Ethernet frames.
Abbreviations
Abbreviation Full Spelling
7.3.5 Appendix
ACL Type Support for Support for Implementation Difference
IPv4 IPv6
7.4 IPv4
Definition
At the core of the TCP/IP protocol suite, Internet Protocol Version 4 (IPv4) works at the
Internet layer in the TCP/IP model. This layer corresponds to the network layer in the OSI
model. At the IP layer, information is divided into data units, and address and control
information is added to allow datagrams to be routed.
Connectionless transmission means that IP does not maintain status information for
subsequent datagrams. Every datagram is processed independently, meaning that IP
datagrams may not be received in the same order they are sent. If a source sends two
consecutive datagrams A and B in sequence to the same destination, each datagram is
possibly routed over a different path to the destination, and therefore B may arrive ahead of
A.
Purpose
IPv4 shields the differences at the link layer and provides the upper layer with services based
on a uniform standard of transmission on the network layer.
7.4.2 Principles
Figure 7-19 shows the position of TCP in the hierarchical architecture. Below it is the IP
protocol. TCP transmits data of different sizes based on the services provided by IP. IP
fragments and reassembles data and then transmits packets on different networks.
[Figure 7-19: network layers, from top to bottom: higher layer, TCP, IP, transport network]
In the OSI reference model, TCP connects upper-layer application programs and the
lower-layer IP protocol.
TCP can transmit data to upper-layer application programs asynchronously. Assume that the
lower-layer interface is the IP protocol interface. To implement reliable data transmission in
connection-oriented mode on unreliable networks, TCP must provide the following:
[Figure: user data format]
[Figure: position of the socket API in the protocol stack, from top to bottom: Application
Layer, Socket API, Transport Layer, Network Layer, Data Link Layer, Physical Layer]
Four types of sockets are supported, shielding the differences at the transport layer:
l TCP-based socket: ensures reliable transmission of data streams to the application layer.
l UDP-based socket: provides connectionless and unreliable data transmission to the
application layer. Such transmission, however, can provide packet boundaries.
l Raw IP-based socket: also called the raw socket. Similar to the UDP-based socket, the
raw IP-based socket provides connectionless and unreliable data transmission and packet
boundaries. It allows application programs to directly access the network layer.
l Link-layer-based socket: provided for the Intermediate System-to-Intermediate System
(IS-IS) routing protocol. The link-layer-based socket allows IS-IS to directly access the
link layer.
7.4.3 Applications
The ATN supports control over ICMP message sending on outbound interfaces. You can run
commands to enable or disable the sending of ICMP Host Unreachable and Redirect
messages. When sending is disabled, the system no longer sends these two types of
messages, which reduces traffic and helps protect the network from malicious attacks.
Terms
None
Abbreviations
Abbreviation Full Spelling
7.5.1 Introduction
Definition
Policy-based routing (PBR) is a mechanism used to make routing decisions based on user-
defined routing policies. This differs from the routing mechanism based on destination
addresses of IP packets.
NOTE
Purpose
Traditionally, packets are forwarded based on destination addresses in routing tables
constructed based on routing protocols. This mechanism allows routers to route packets based
on only destination addresses of packets. This routing mode meets requirements for data
forwarding but does not support differentiated services. IP PBR allows network
administrators to select forwarding paths based on packet attributes, such as destination
addresses, source addresses, and packet sizes.
Benefits
l This feature improves flexibility of route selection.
7.5.2 Principles
Related Concepts
Policy-based routing (PBR) can be categorized into the following types:
l Interface PBR: applies to received packets instead of locally sent packets (such as ping
packets).
l Local PBR: applies to locally sent packets instead of received packets.
Implementation
PBR is implemented in the following steps:
1. Specify packets suitable for PBR.
2. Specify routes for these packets.
ATN PBR allows routers to flexibly select routes
according to access control list (ACL)-based packet filtering results, addresses, and
packet sizes. ACL-based packet filtering allows routers to classify packets based on
source and destination addresses, protocols, port numbers, priorities, types of services
(ToSs), time segments, and virtual private networks (VPNs). Then, the routers forward
these packets along different routes.
PBR is implemented as follows:
If PBR has been configured, a router first checks whether packets match any PBR nodes when
sending or forwarding packets.
l If the router finds matched PBR nodes, it performs the following steps to send or forward
packets:
a. The router sets priorities for packets based on the predefined priority rules to
differentiate services based on priorities. After priorities are set, the process goes to
step b.
b. The router checks whether sending interfaces are configured for the matched PBR
nodes.
n If yes, the router sends packets through these sending interfaces.
n If no, the process goes to step c.
c. The router checks whether next hops are configured for matched PBR nodes.
NOTE
Multiple next hops can be configured for a PBR node for load balancing.
n If yes, the router sends packets to next hops.
n If no, the router follows the normal procedure for sending packets by searching
routes based on destination addresses of packets. If no route is available, the
process goes to step d.
d. The router checks whether a default sending interface has been configured.
n If yes, the router sends packets to the default sending interface.
n If no, the process goes to step e.
e. The router checks whether default next hops are configured for matched PBR
nodes.
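The order of checks for a matched PBR node can be sketched as a chain of fallbacks. This is a minimal sketch under assumptions: the PbrNode fields and result labels are hypothetical, priority marking is omitted, the first configured next hop stands in for load balancing, and the outcome of the final check is not described in the text, so the last two branches are an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PbrNode:
    """Forwarding actions of a matched PBR node (hypothetical field names)."""
    out_interface: Optional[str] = None
    next_hops: List[str] = field(default_factory=list)
    default_out_interface: Optional[str] = None
    default_next_hops: List[str] = field(default_factory=list)


def pbr_decide(node: PbrNode,
               routed_next_hop: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (decision, target) for a packet that matched the given PBR node.

    routed_next_hop is the result of the normal destination-based lookup,
    or None if no route is available.
    """
    if node.out_interface is not None:
        return ("out-interface", node.out_interface)
    if node.next_hops:
        return ("next-hop", node.next_hops[0])  # simplification: no load balancing
    if routed_next_hop is not None:
        return ("routing-table", routed_next_hop)
    if node.default_out_interface is not None:
        return ("default-out-interface", node.default_out_interface)
    if node.default_next_hops:
        return ("default-next-hop", node.default_next_hops[0])  # assumed outcome
    return ("unspecified", None)  # not covered by the text
```

The default sending interface and default next hops are consulted only when the destination-based lookup finds no route, which is what distinguishes them from the non-default settings.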
Usage Scenarios
PBR can be used for:
l Security: PBR can be configured to filter the IP address of a network attacker and
prevent routers from forwarding data flows from this IP address.
l Load balancing: When multiple paths to an Internet service provider (ISP) network are
available, network administrators use PBR to distribute traffic based on route bandwidths
to achieve load balancing.
l Routing based on source addresses: When a network provides two lines with different
rates to access the Internet, network administrators use PBR to ensure that users with
high priorities use the line with the higher rate and common users use the line with the
lower rate.
l Routing based on service classes: Data packets have different service requirements for
transmission rate, throughput, and reliability. PBR allows routers to route data packets
based on the network status. For example, routers use large-bandwidth lines for voice
and video services and small-bandwidth lines for data services.
Benefits
Different from traditional routing protocols, PBR allows network administrators to control
packet forwarding and storage more effectively and flexibly. For example, when packets have
the same destination address, PBR allows routers to select routes based on source addresses of
packets.
7.5.3 Applications
Service Overview
As shown in Figure 7-23, the internal network is connected to the Internet through a router.
The router provides multiple interfaces to connect to the Internet.
l To ensure that a certain type of packet is forwarded through a specified interface,
configure IP unicast policy-based routing (PBR) for the interface.
l To apply PBR to packets that are generated on a router, configure local PBR.
Networking Description
[Figure 7-23: networking diagram. Labels: PC1, PC2, PC3, network 10.110.0.0, RouterA
with Port1, Port2, and Port3, Internet]
Feature Deployment
l Routing based on source addresses: When a network provides two lines with different
rates to access the Internet, network administrators use PBR to ensure that users with
high priorities use the line with the higher rate and common users use the line with the
lower rate. A PBR node is configured on ATN A. The PBR node defines routing rules
and actions. For example, PBR is enabled on Ethernet port 3. The PBR configuration
allows ATN A to send all packets that are received on port 3 from PC1 at 10.110.0.11/24
through port 2 and send other packets based on their destination addresses.
l Routing based on service classes: Data packets have different service requirements for
transmission rate, throughput, and reliability. PBR allows routers to route data packets
differently based on the network status. For example, routers use large-bandwidth lines
for voice and video services and small-bandwidth lines for data services. Suppose that
the bandwidth of the line for sending packets from port 1 is larger than that from port 2.
PBR can then be configured on port 3 of ATN A to enable ATN A to send voice and video
services from port 1 and data services from port 2.
Terms
Term Description
7.6 IPv6
Purpose
The IPv4-based Internet has achieved great success, and IP technology is widely applied as
a result. With the rapid development of the Internet, however, deficiencies in IPv4 have
become increasingly obvious in the following aspects:
l The IPv4 address space is insufficient.
An IPv4 address is 32 bits long. In theory, a maximum of about 4.3 billion
addresses can be provided. In actual applications, fewer than 4.3 billion addresses are
available because of address allocation. In addition, IPv4 address resources are allocated
unevenly. Address resources of the USA occupy almost half of the global address space;
the address resources of Europe are relatively fewer than those of the USA; the address
resources of the Asian-Pacific region are much fewer. The development of mobile IP and
broadband technology requires more IP addresses. Consequently, limited IPv4 address
resources directly restrict the further development of the IP technology.
There are several solutions to IPv4 address shortage. Classless Interdomain Routing
(CIDR) and Network Address Translator (NAT) are two representative solutions to IPv4
address shortage. CIDR and NAT, however, have their disadvantages and unsolved
problems. This promotes the development of IPv6.
l The backbone device maintains too many routing entries.
Many discontinuous IPv4 addresses are allocated because of the problems in the initial
IPv4 address allocation planning. As a result, routes cannot be aggregated effectively.
The increasingly large routing table consumes a lot of memory, degrading forwarding
efficiency. Subsequently, device manufacturers have to upgrade products to improve
route addressing and forwarding performance.
l Address autoconfiguration and readdressing cannot be performed easily.
An IPv4 address occupies only 32 bits and IP addresses are allocated unevenly.
Consequently, IP addresses need to be reallocated during network expansion or network
replanning. The workload for maintenance is heavy.
l Security cannot be well guaranteed.
As the Internet develops, security problems become more serious. The IPv4 design does
not fully consider security, so the original framework cannot ensure end-to-end security.
IPv6 provides end-to-end security by using IP security (IPSec) as a standard extension
header.
IPv6 radically solves the problem of IP address shortage. Moreover, IPv6 is easy to
deploy, is compatible with various applications, and makes it easy for IPv4 networks to
transition to IPv6 networks. With these advantages over IPv4, IPv6 has developed rapidly.
7.6.2 Principles
Basic functions of IPv6 include IPv6 neighbor discovery and IPv6 path MTU (PMTU)
discovery. Neighbor discovery and PMTU discovery are implemented through Internet
Control Message Protocol for IPv6 (ICMPv6) messages.
Fixed Header
[Figure: IPv6 header format (bit positions 0 to 31): Version, Traffic class, Flow label;
Payload length, Next header, Hop limit; Source address; Destination address]
Field: Hop Limit
Length: 8 bits
Description: Replaces the IPv4 Time to Live field. This field defines the maximum
allowable number of hops a packet can pass through. The value is decreased by 1 for each
node that forwards the packet. The packet is discarded if Hop Limit is decreased to zero.
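The fixed header layout can be illustrated by unpacking its 40 bytes. The field widths used here (4-bit Version, 8-bit Traffic Class, 20-bit Flow Label, 16-bit Payload Length, 8-bit Next Header, 8-bit Hop Limit, two 128-bit addresses) follow the IPv6 specification; the function names and dictionary keys are hypothetical, and the sketch is for illustration rather than a description of the ATN forwarding path.

```python
import ipaddress
import struct
from typing import Optional


def parse_ipv6_fixed_header(data: bytes) -> dict:
    """Decode the 40-byte IPv6 fixed header."""
    if len(data) < 40:
        raise ValueError("IPv6 fixed header is 40 bytes long")
    first_word, payload_len, next_header, hop_limit = struct.unpack("!IHBB", data[:8])
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_len,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": ipaddress.IPv6Address(data[8:24]),
        "destination": ipaddress.IPv6Address(data[24:40]),
    }


def forward_hop_limit(hop_limit: int) -> Optional[int]:
    """Decrease Hop Limit by 1; return None if the packet must be discarded."""
    new_value = hop_limit - 1
    return new_value if new_value > 0 else None
```

A forwarding node would call forward_hop_limit on the parsed value and drop the packet when the result is None.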
Extension Header
Optional Internet layer information is encoded in separate headers to reduce IPv6 packet
processing costs and to limit the bandwidth needed for IPv6 headers. Extension headers are
classified as follows:
l Hop-by-Hop Options header
This header is used to specify sending parameters for each hop on the path of a packet.
Every intermediate node on the path needs to read and process the field. It is identified
by the Next Header value 0 in the IPv6 header.
[Figure 7-26: format of the Hop-by-Hop Options header (bit positions 0 to 31): Next
Header, Hdr Ext Len, Options]
– Next Header: 8 bits. It identifies the type of header immediately following the Hop-
by-Hop Options header. Its functions are the same as the Next Header field in a
fixed header. It is included in all options headers.
– Hdr Ext Len: 8 bits. It indicates the length of the Hop-by-Hop Options header, not
including the first 8 bytes.
– Options: a combination of fields. This is used to describe a data forwarding feature
or to fill in the Hop-by-Hop Options header. A Hop-by-Hop Options header can
contain one or more Options fields. The Options field that describes data
forwarding is essential for the Hop-by-Hop Options header. The following table
describes the Options field format.
The Options field is used in the Destination Options header as well as in the Hop-
by-Hop Options header. Each option is encoded in the type-length-value (TLV)
format.
n Option Type: 8 bits. It identifies the option type and specifies the method used
by relevant nodes to process this field.
n Opt Data Length: 8 bits. It indicates the length of the Option Data field for this
option, not including the Option Type and Opt Data Length fields.
n Option Data: variable-length field. It contains data specific to this Option
Type.
l Destination Options header
The format of the Destination Options header is similar to the Hop-by-Hop Options
header shown in Figure 7-26, except that the value of the Next Header field in the
Destination Options header is 60. Destination Options headers are the only type of
header that can occur twice in a packet, once before a Routing header and once before
the upper-layer header. When the Destination Options header is before a Routing header,
it is processed by the nodes in the address list contained in the Routing header. When the
Destination Options header is before the upper-layer header, it is processed by the
destination device. Options that need to be processed by all nodes on a specified
forwarding path are placed before the Routing header, whereas options that need to be
processed only by the destination device are placed before the upper-layer header.
l Routing header
The Routing header is used to specify the intermediate nodes that a packet must pass
through. Figure 7-28 shows the format of the Routing header.
[Figure 7-28: format of the Routing header, ending with the Type-specific data field]
The Next Header and Hdr Ext Len fields mean the same things in a Routing header as
they do in a Hop-by-Hop Options header, except that the Next Header field has the value
43 in a Routing header. The other Routing header fields are as follows:
– Routing Type: 8 bits. It identifies the type-specific data. At present, RFC 2460 has
defined only Routing Type=0.
– Segments Left: 8 bits. It indicates the number of route segments remaining. This
refers to the number of listed intermediate nodes still to be visited before the
destination is reached.
– Type-specific data: Variable-length field. The Routing Type determines the format
of this field. Type-specific data for Routing Type=0 defined by RFC 2460 is the IP
addresses of intermediate nodes to be visited.
l Fragment header
When packet size exceeds the Maximum Transmission Unit (MTU), the packet needs to
be fragmented. Fragments are identified by the Fragment header. Unlike IPv4,
fragmentation in IPv6 is performed only by source nodes, not by routers along the path a
packet traverses. Figure 7-29 shows the format of the Fragment header.
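The TLV encoding of options in the Hop-by-Hop Options and Destination Options headers can be illustrated with a short parser. It relies on two details from the IPv6 specification rather than from this document: Hdr Ext Len is counted in 8-byte units, and the Pad1 option (type 0) is a single byte with no length or data. The function name is hypothetical.

```python
from typing import List, Tuple


def parse_options_header(data: bytes) -> Tuple[int, List[Tuple[int, bytes]]]:
    """Return (next_header, [(option_type, option_data), ...])."""
    if len(data) < 2:
        raise ValueError("header too short")
    next_header, hdr_ext_len = data[0], data[1]
    total = (hdr_ext_len + 1) * 8  # Hdr Ext Len excludes the first 8 bytes
    if len(data) < total:
        raise ValueError("truncated options header")
    options: List[Tuple[int, bytes]] = []
    i = 2
    while i < total:
        opt_type = data[i]
        if opt_type == 0:  # Pad1: one byte, no length field
            i += 1
            continue
        if i + 1 >= total:
            raise ValueError("option length field missing")
        opt_len = data[i + 1]  # length of Option Data only
        end = i + 2 + opt_len
        if end > total:
            raise ValueError("option data runs past the header")
        options.append((opt_type, data[i + 2:end]))
        i = end
    return next_header, options
```

Padding options are returned like any other option except Pad1, which is skipped, so a caller can see how the header was filled to a multiple of 8 bytes.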
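[Figure 7-29: format of the Fragment header, including the Identification field]
[Figure labels from subsequent extension header diagrams: Authentication Data (variable),
Sequence Number]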
NOTE
When using extension headers, note the following:
l When more than one extension header is used in the same packet, it is recommended that those
headers appear in the previously mentioned order.
l Not all extension headers need to be checked and processed by intermediate nodes. When an
intermediate node forwards a packet, it determines whether to process extension headers carried in
the packet on the basis of the Next Header field value in the fixed header. Packets have only one of
each type of extension header, with the exception of the Destination Options header. This header
may occur twice, once before a Routing header and once before the upper-layer header.
l The value 59 in the Next Header field of an IPv6 header or extension header indicates that there is
nothing following that header. Even if the Payload Length field indicates that there are more bytes
behind that header, those bytes must be ignored, and passed on unchanged if the packet is forwarded.
l X:X:X:X:X:X:X:X
– An IPv6 address is divided into eight groups, separated by colons. Each group (an
X) is a 16-bit number written as four hexadecimal digits (0 to 9 and A to F).
For example, 2031:0000:130F:0000:0000:09C0:876A:130B is an IPv6 address.
For convenience, leading 0s in a group can be omitted, and a group containing all
0s is displayed as a single 0. The example address can be written as
2031:0:130F:0:0:9C0:876A:130B.
– Two or more consecutive groups of 0s can be replaced with an empty group using a
pair of colons (::), which helps minimize the IPv6 address length. The example
address can also be written as 2031:0:130F::9C0:876A:130B.
An IPv6 address can contain only a single pair of colons (::). If an IPv6 address
contained more than one pair of colons, a computer could not restore the compressed
address to the original 128-bit address because it could not determine how many
groups of 0s each pair of colons represents.
l X:X:X:X:X:X:d.d.d.d
Each "X" is 16 bits long and consists of four hexadecimal digits. Each "d" is 8 bits long
and is represented by a decimal number. "d.d.d.d" represents an IPv4 address. The
following addresses are expressed in this format:
– 0:0:0:0:0:0:IPv4-address: an IPv4-compatible IPv6 address. The most significant
96 bits are 0s, followed by a 32-bit IPv4 address. The IPv4 address must be reachable on
an IPv4 network and can only be a unicast address, but not a multicast address, a
broadcast address, a loopback address, or an unspecified address (0.0.0.0, for
example).
An IPv4-compatible IPv6 address is used to configure an IPv6 over IPv4 tunnel.
– 0:0:0:0:0:FFFF:IPv4-address: an IPv4-mapped IPv6 address that is mapped to an
IPv4 address of an IPv4 node. This address type is used to represent the address of
an IPv4 node as an IPv6 address.
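The notation rules above can be checked with a short sketch. It uses Python's standard ipaddress module, which is an assumption of this example and not something the feature itself relies on; 192.0.2.1 is a documentation address chosen purely for illustration.

```python
import ipaddress

addr = ipaddress.IPv6Address("2031:0000:130F:0000:0000:09C0:876A:130B")

# Full eight-group form and the shortest form (leading 0s dropped, one "::").
full_form = addr.exploded
short_form = addr.compressed

# An IPv4-mapped IPv6 address (0:0:0:0:0:FFFF:IPv4-address).
mapped = ipaddress.IPv6Address("::FFFF:192.0.2.1")
embedded_ipv4 = mapped.ipv4_mapped


def is_valid_ipv6(text):
    """Return True if text parses as an IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ipaddress.AddressValueError:
        return False


print(full_form)
print(short_form)
print(embedded_ipv4)
# An address with two "::" pairs is ambiguous and is rejected.
print(is_valid_ipv6("2031::130F::9C0"))
```

Note that the module prints hexadecimal digits in lowercase; the address value is the same.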
64 bits 64 bits
2001:A304:6101:0001 0000:00E0:F726:4E58
local IPv6 unicast address as a source or destination address are forwarded only on
a local link. A link-local IPv6 unicast address can be automatically configured on
any interface using a link-local prefix FE80::/10 (1111 1110 10 in binary) and an
EUI-64 interface ID.
– A unique-local unicast address identifies a single site and has a globally unique
prefix. Sites use unique-local unicast addresses to establish private connections,
without incurring address conflicts. Even if routes destined for unique-local unicast
addresses leak, the routes do not conflict with Internet routes. Upper layer
applications use unique-local unicast addresses as global unicast addresses. Figure
7-34 shows the unique-local unicast address structure. The address contains the
following fields:
n 1111110: the 7-bit prefix of a unique-local unicast address (FC00::/7).
n L: a 1-bit field. The value can be:
○ 1: The address is used locally.
○ 0: The address is reserved for future use.
n Global ID: a 40-bit global identifier that is a pseudo random number.
n Subnet ID: a 16-bit subnet identifier that identifies a subnet within a site.
n Interface ID: a 64-bit identifier that identifies an interface.
n Global routing prefix: with three left-most bits of 001. When an ISP assigns a
global routing prefix to an organization, the global routing prefix must have at
least 48 bits.
n Subnet ID: identifies a subnet within a site.
n Interface ID: uniquely identifies an interface.
l Anycast address: identifies a group of interfaces on different nodes. Packets bound for an
anycast address reach the interface that is nearest to the source node among interfaces in
the interface group identified by the anycast address. A routing protocol determines the
shortest path.
Applicable environment: When a mobile host needs to communicate with the mobile
agent on the home subnet, it uses the anycast address of the device of the subnet.
Specifications of addresses: Anycast addresses do not have independent address space.
They can use the format of any unicast address. Therefore, an anycast address cannot be
differentiated from a unicast address by its syntax; a node must be explicitly configured
to treat an address as an anycast address.
l Multicast address: identifies a group of interfaces on different nodes. A multicast IPv6
address is similar to an IPv4 multicast address. Packets bound for a specified multicast
address reach all interfaces identified by the multicast address. Figure 7-36 shows the
multicast address structure. The address contains the following fields:
– 11111111: a binary number that identifies a multicast address.
– Flags: a 4-bit field. The lowest-order bit, the T flag, can be:
n 0: permanent multicast address
n 1: transient or dynamic multicast address
– Scope: a 4-bit field that identifies the usage scope of a multicast address. Some
meaningful scope values are as follows:
n 1: Interface-Local scope. A multicast address is locally used on a node.
n 2: Link-Local scope. A multicast address is locally used on a link.
n 4: Admin-Local scope. A multicast address is locally used for management.
n 5: Site-Local scope. A multicast address is locally used within a site.
n 8: Organization-Local scope. A multicast address is locally used by an
organization.
n E: Global scope. A multicast address is globally used.
– Group ID: a 112-bit field that identifies a multicast group. The multicast group can
be permanent or transient within a specified scope.
Although no IPv6 broadcast addresses exist, IPv6 multicast addresses provide broadcast
address functions.
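As an illustration of the multicast address fields described above, the following minimal sketch splits an address into its flags, scope, and group ID. The function name and the returned dictionary are hypothetical, and the position of the T flag (lowest-order flag bit) follows RFC 4291.

```python
import ipaddress

# Scope values listed in the text above.
SCOPES = {
    0x1: "Interface-Local",
    0x2: "Link-Local",
    0x4: "Admin-Local",
    0x5: "Site-Local",
    0x8: "Organization-Local",
    0xE: "Global",
}


def decode_multicast(text):
    """Split an IPv6 multicast address into flags, scope, and group ID."""
    value = int(ipaddress.IPv6Address(text))
    if value >> 120 != 0xFF:                 # first 8 bits must be 11111111
        raise ValueError("not a multicast address")
    flags = (value >> 116) & 0xF             # next 4 bits
    scope = (value >> 112) & 0xF             # next 4 bits
    group_id = value & ((1 << 112) - 1)      # remaining 112 bits
    return {
        "transient": bool(flags & 0x1),      # T flag: 0 permanent, 1 transient
        "scope": SCOPES.get(scope, "other"),
        "group_id": group_id,
    }


print(decode_multicast("FF02::1"))
```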
Insert FFFE:
00000000 00010010 00110100 11111111 11111110 00000000 10101011 11001101
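The FFFE insertion can be sketched as follows. This is a minimal illustration, not the device implementation: the function names are hypothetical, and the inversion of the universal/local bit is taken from RFC 4291 (it is consistent with the link-local example FE80::260:97FF:FE02:6EA5 for MAC address 00-60-97-02-6E-A5 used elsewhere in this document).

```python
import ipaddress


def eui64_interface_id(mac):
    """Build a 64-bit interface ID from a 48-bit MAC address."""
    octets = bytearray(int(part, 16) for part in mac.replace(":", "-").split("-"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02                        # invert the universal/local bit
    # Insert FFFE between the third and fourth bytes of the MAC address.
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])


def link_local_address(mac):
    """Combine the link-local prefix with the EUI-64 interface ID."""
    prefix = int(ipaddress.IPv6Address("FE80::"))
    interface_id = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(prefix | interface_id)


print(link_local_address("00-60-97-02-6E-A5"))
```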
l Supporting QoS
In an IPv6 header, a new field, namely, the Flow Label field, specifies how to identify
and process traffic. The Flow Label field identifies a flow and allows a device to
recognize packets in a flow and to provide special processing.
QoS is guaranteed for even the packets encrypted with IPSec because the IPv6 header
can identify different types of flows.
l Built-in security
Adopting IPSec as the standard extension header, IPv6 provides end-to-end security.
This provides specifications for ensuring network security, and improves interoperation
between different IPv6 applications.
l Flexible extension header
An IPv4 header supports at most 40 bytes of options, whereas the size of IPv6 extension
headers is limited only by the IPv6 packet size.
IPv6 introduces multiple extension headers to replace the Options field in the IPv4
header. This improves the packet processing efficiency, enhances IPv6 flexibility, and
provides better scalability for the IP protocol. Figure 7-38 shows an IPv6 extension
header.
IPv6 header | Fragment extension header | IPv6 data
IPv6 header | Routing extension header | Destination extension header | IPv6 data
When multiple extension headers are used in the same packet, the headers must be listed in
the following order:
Not all extension headers need to be examined and processed by devices. When forwarding
packets, a device determines whether to process the extension headers based on the Next
Header value in the IPv6 basic header.
The destination options extension header may appear twice in a packet: once before the
routing extension header and once before the upper-layer header. The other extension
headers appear only once.
7.6.2.4 ICMPv6
As one base protocol of IPv6, Internet Control Message Protocol for IPv6 (ICMPv6)
generates error messages and informational messages, which are used by IPv6 nodes to report
errors and information generated during packet processing. Figure 7-39 shows the format of
an ICMPv6 message.
Bits 0-7: Type (1 byte) | Bits 8-15: Code (1 byte) | Bits 16-31: Checksum (2 bytes)
Packet Content (variable length)
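The fixed part of an ICMPv6 message can be unpacked as in the following sketch. The function name and the returned dictionary are hypothetical; the rule that types 0 to 127 are error messages and types 128 to 255 are informational messages comes from RFC 4443.

```python
import struct


def parse_icmpv6_header(message):
    """Unpack the 4-byte ICMPv6 header: Type (1), Code (1), Checksum (2)."""
    if len(message) < 4:
        raise ValueError("an ICMPv6 message is at least 4 bytes long")
    msg_type, code, checksum = struct.unpack("!BBH", message[:4])
    return {
        "type": msg_type,
        "code": code,
        "checksum": checksum,
        "is_error": msg_type < 128,   # 0-127 error, 128-255 informational
        "content": message[4:],
    }


# Type 135 is a Neighbor Solicitation; the checksum here is a dummy value.
print(parse_icmpv6_header(bytes([135, 0, 0x12, 0x34]) + b"payload"))
```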
l ND Packet Format
l Router Discovery
l Default Router Priority and Route Information
l Duplicate Address Detection
l Neighbor Discovery
ND Packet Format
After being configured with an IPv6 address, a node checks whether this address is available
and does not conflict with other addresses. When a node is a host, an ATN needs to notify the
host of the optimal next-hop address of a packet to a destination. When a node is an ATN, it
needs to advertise its address, address prefix, and other configuration parameters to instruct
hosts to configure parameters. When forwarding IPv6 packets, a node needs to know the link
layer addresses and check reachability of neighboring nodes. IPv6 ND provides five types of
ICMPv6 messages:
l Router Solicitation (RS): After startup, a host sends an RS message to a device and waits
for the device to respond with a Router Advertisement (RA) message. Figure 7-40
shows the RS message format.
– Prf: a 2-bit Default Router Preference flag. The Prf value of a router that sends the
RA message is used as the priority of the default router for hosts.
– P: a 1-bit Proxy flag. Its value can be:
n 0: disables ND proxy.
n 1: enables ND proxy.
– Rsv: This field must be initialized to 0 on the transmit end and be ignored on the
receive end.
– Router Lifetime: a 16-bit field that indicates the lifetime (in seconds) of a default
router. The lifetime of a router that sends the RA message is used as the lifetime of
the default router for hosts. The default value is 30 minutes, and the maximum
value is 18.2 hours. Value 0 indicates that the router sending the RA message does
not function as the default router, but the other information carried in the RA
message still takes effect.
– Reachable Time: a 32-bit field that indicates a period of time (in milliseconds),
during which a router considers its neighbor reachable after having received a
reachability confirmation. A router sends an RA message through an interface to
enable all nodes on a link connected to the interface to use the same reachable time.
The value can be set. The default value is 0 in an RA message. Value 0 means that a
router does not use this field.
– Retrans Timer: a 32-bit retransmission field that indicates the interval at which NS
messages are resent. The Retrans Timer value is used during neighbor
unreachability detection and address resolution. The value can be set. The default
value is 0 in an RA message. Value 0 means that a router does not use this field.
– Options:
n Source link-layer option: only used on link layers that have addresses. A router
may omit this option to enable inbound load sharing among multiple
link-layer addresses.
n MTU option: MTU of the link. This option is sent on links that have a variable MTU.
n Prefix Information option: specifies one or more prefixes for address
autoconfiguration.
n Advertisement Interval option: interval (in milliseconds) at which RA
messages are sent. This option is used for mobile IPv6.
n Home Agent option: used for mobile IPv6.
n Route Information option: used by a host to generate a default route.
l Redirect: When a device finds that the inbound interface and outbound interface of a
packet are the same, the device can send Redirect messages to instruct the host that sends
the packet to choose a better next hop. Figure 7-42 shows the Redirect message format.
Router Discovery
Router discovery is used to locate a neighboring device and learn the address prefix and
configuration parameters related to address autoconfiguration. IPv6 router discovery is
implemented based on the following messages:
l Router Solicitation (RS) message
When a host is not configured with a unicast address, for example, when the system is
just started, it sends an RS message. An RS message helps the host rapidly perform
address autoconfiguration without waiting for the RA message periodically sent by an
IPv6 device. An RS message is an ICMPv6 message with type 133.
l Router Advertisement (RA) message
Interfaces on each IPv6 device periodically send RA messages only when they are
enabled to send IPv6 RA messages. After receiving an RS message from a node on
the local link, a device responds with an RA message. An RA message is sent to the
all-nodes multicast address (FF02::1) or to the IPv6 unicast address of the node that sends
the RS message. An RA message is an ICMPv6 message with type 134 and contains
the following information:
– Whether to use address autoconfiguration.
– Supported autoconfiguration type: stateless or stateful.
– One or multiple on-link prefixes. On-link nodes can perform address
autoconfiguration using these address prefixes.
– Lifetime of the advertised on-link prefixes
– Whether the device that sends an RA message can be used as a default router. If
yes, the lifetime, expressed in seconds, of the default router is also included.
– Other information about the host, such as the hop limit and the MTU that specifies
the maximum size of the packet initiated by a host.
After an IPv6 node on the local link receives the RA message, it extracts the preceding
information to obtain the updated default device list, prefix list, and other configurations.
Address Autoconfiguration
A router sends RA messages with the M field to instruct a host how to perform address
autoconfiguration. A host selects an address configuration mode based on the M flag in an RA
message shown in Figure 7-41. The configuration modes include stateless and stateful
address configuration.
l If the M field is set to 0, stateless address allocation is used. The host does not need to be
additionally configured, the router needs a few configurations, and no server is needed.
After a host receives an RA message, it uses prefix information in the message and local
interface ID to automatically calculate an IPv6 address. The host also sets the default
router according to the default router information in the message. Stateless address
allocation only applies to hosts, not routers.
l If the M field is set to 1, stateful address allocation is used. A server, for example, a
DHCPv6 server, assigns a host an IPv6 address. The server maintains a database that
contains the host information and configured addresses. Stateful address allocation
allows hosts to obtain IPv6 addresses from a server.
Hosts can select the mode for configuring other information, such as DNS and SIP server
addresses, based on the O field carried in the RA messages:
l If the O field is set to 0, the host obtains IPv6 settings (except an IPv6 address) using a
stateless protocol, for example, ND.
l If the O field is set to 1, the host obtains IPv6 settings (except an IPv6 address) using a
stateful protocol, for example, DHCPv6.
NOTE
RFC 4861 specifies that if the M flag is set to 1, the O flag is redundant and can be ignored,
because DHCPv6 returns all available configuration information.
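The selection logic described by the M and O flags can be sketched as follows. The function name and the returned strings are hypothetical and only restate the rules in the lists above; treating M=1 as implying stateful configuration of other settings follows RFC 4861 and is an assumption of this sketch.

```python
def autoconfig_modes(m_flag, o_flag):
    """Map the M and O flags of an RA message to configuration modes."""
    address_mode = "stateful" if m_flag else "stateless"
    # With M=1, a stateful protocol such as DHCPv6 also supplies other settings.
    other_mode = "stateful" if (m_flag or o_flag) else "stateless"
    return {"address": address_mode, "other_settings": other_mode}


for m in (0, 1):
    for o in (0, 1):
        print(m, o, autoconfig_modes(m, o))
```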
NS (multicast): Neighbor Solicitation
Destination MAC: 33-33-FF-52-F9-D8
Source Addr: ::
Destination Addr: FF02::1:FF52:F9D8
Target Addr: FEC0::2:260:8FF:FE52:F9D8
Nodes: HostB, HostC
MAC: 00-60-08-52-F9-D8
IP: FEC0::2:260:8FF:FE52:F9D8
Neighbor Discovery
Similar to IPv4 ARP, IPv6 ND resolves the addresses of neighbors and monitors the
reachability of neighbors based on NS and NA messages.
When a node needs to obtain the link-layer address of another node on the same local link, it
sends an NS message of type 135. The NS message is similar to an IPv4 ARP Request
message but is destined for a multicast address instead of a broadcast address. Only nodes
whose addresses have the same last 24 bits as the multicast address receive the NS message.
This helps minimize the possibility of broadcast storms. The destination node fills in its
link-layer address in the NA message.
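The multicast destination of an NS message can be derived from the target address as in the following sketch. The function names are hypothetical; the solicited-node prefix FF02::1:FF00:0/104 comes from RFC 4291 and the 33-33 Ethernet mapping from RFC 2464, and both agree with the example values used in this section.

```python
import ipaddress


def solicited_node_multicast(target):
    """Append the last 24 bits of the target to the prefix FF02::1:FF00:0/104."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("FF02::1:FF00:0"))
    return ipaddress.IPv6Address(base | low24)


def multicast_mac(group):
    """Map an IPv6 multicast address to 33-33 plus its last 32 bits."""
    low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    return "33-33-" + "-".join(f"{b:02X}" for b in low32.to_bytes(4, "big"))


group = solicited_node_multicast("FE80::260:97FF:FE02:6EA5")
print(group, multicast_mac(group))
```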
NS (multicast): Neighbor Solicitation
Destination MAC: 33-33-FF-02-6E-A5
Source Addr: FE80::210:5AFF:FEAA:20A2
Destination Addr: FF02::1:FF02:6EA5
Target Addr: FE80::260:97FF:FE02:6EA5
Source Link-Layer Addr: 00-10-5A-AA-20-A2
Nodes: HostB, HostC
MAC: 00-60-97-02-6E-A5
IP: FE80::260:97FF:FE02:6EA5
An NS message is also used to monitor the reachability of a neighbor when the link-layer
address of the neighbor is known. After receiving an NS message, a destination node responds
with an NA message of type 136 on the local link. After receiving the NA message, the source
node can communicate with the destination node. When the link-layer address of a node on
the local link changes, the node proactively sends an NA message.
Attack type: NS/NA spoofing
Defense: SEND CGA
Attack description: An attacker sends a legitimate node (host or ATN) a neighbor
solicitation (NS) message that contains a bogus source link-layer address option or a
neighbor advertisement (NA) message that contains a bogus target link-layer address
option. NS/NA spoofing causes messages for the legitimate node to be sent to the bogus
address.
Defense description: Such an attack can be launched during ND address resolution
because legitimate nodes cannot verify IPv6 addresses, link-layer addresses, or the
bindings between them. Therefore, legitimate nodes incorrectly accept NS or NA messages
sent from the attacker. In response to this attack, SEND CGA combines a CGA address, a
CGA option, and an RSA option to authenticate the validity of the source address carried
in an ND message.

Attack type: Spoofed Redirect message
Defense: SEND CGA
Attack description: An attacker uses the link-layer address of the current first-hop
router to send a Redirect message to a legitimate host. The legitimate host accepts this
message because the host mistakenly considers that the message came from the first-hop
router.
Defense description: SEND counters this attack by requiring a Redirect message to
contain an RSA Signature option. The RSA signature is calculated using the private key
of the sender and is verified using the sender's public key. All messages that fail to pass
the RSA signature-based authentication are discarded.

Attack type: Replay attacks
Defense: SEND CGA
Attack description: An attacker captures valid messages and replays them. That is, even
if Neighbor Discovery Protocol (NDP) messages are cryptographically protected so that
their contents cannot be forged, they are still prone to replay attacks.
Defense description: For solicited exchanges, SEND protects against this attack by
including a Nonce option in solicitation messages (such as NS/RS messages) and requiring
response messages (such as NA/RA messages) to include a matching Nonce option. For
unsolicited messages (such as NA/RA/Redirect messages), SEND protects against this
attack by including a Timestamp option.
l Basic Concepts
CGA option: The CGA option includes the sender's modifier value and public key. The
receiver can use the CGA option to verify the sender's CGA.
RSA Signature option: The RSA Signature option includes the hash value of the sender's
public key and the digital signature constructed using the sender's private key and ND
messages. The receiver uses the RSA Signature option to verify the integrity of ND
messages and authenticate the identity of the sender.
Trust Anchor option: The Trust Anchor option identifies a trust anchor for which a given
certification path should be constructed.
l Deployment Model
– SEND deployment with no public key infrastructure (PKI)
SEND-NS(SLLA,CGA,Nonce,(Time Stamp),RSA)
SEND-NA(SLLA,CGA,Nonce,(Time Stamp),RSA)
extension. The ATN can then advertise only the prefixes that are within
the prefix range specified in the certificate.
○ Unconstrained prefix: If the IP address extension in the certificate for an
ATN is missing or is the null prefix (::/0), the prefixes that the ATN
advertises are said to be unconstrained. That is, the ATN is allowed to
advertise any prefix.
Legend: CA = Certificate Authority; TA = Trust Anchor; CRL = Certificate Revocation List
Entities: CA (C0), CA (C1), CA (C2), CRL, TA, Device (CR), Host A (on link), Host B (off link)
Message exchange between the host and the device:
1. RS(SLLA,CGA,Nonce,(Time Stamp),RSA)
2. RA(SLLA,CGA,Nonce,(Time Stamp),RSA)
3. CPS(Trust Anchor)
4. CPA(Trust Anchor,Certificate(C1)), CPA(Trust Anchor,Certificate(C2)),
CPA(Trust Anchor,Certificate(CR))
The host then performs certificate verification, signature verification, and prefix verification.
As shown in Figure 7-48, when SEND ADD is deployed with PKI, both the host
and ATN use trust anchors. The ATN is certified using CA2, with CA2 certified
using CA1 and CA1 certified using CA0. CA0 is trusted by the host.
SEND ADD includes offline preparation and online operation.
n Offline preparation:
NOTE
The host caches the certificates that passed the authentication so that no
more CPS messages will be sent upon receiving an RA message. The
cached certificates require periodic CRL checks to ensure availability.
2) Signature authentication:
If certificate authentication succeeds, the host uses the public key
carried in the ATN certificate (CR) to verify the digital signature
contained in the RSA Signature option of the RA message. If
signature authentication fails, the host discards the RA message.
3) (Optional) Prefix authentication:
If signature verification succeeds and the ATN certificate (CR)
contains the IP address extension, the host authenticates the prefix
carried in the IP address extension.
○ If stateful address autoconfiguration is performed, the host
authenticates the prefix provided by the DHCP server rather
than the prefix carried in the RA message.
○ If stateless address autoconfiguration is performed, the host
authenticates the prefix (or prefix range) carried in an RA
message sent by the ATN.
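The prefix authentication step can be illustrated with a minimal sketch, assuming that the prefixes from the certificate's IP address extension are already available as a list of strings. The function name and the example prefixes (taken from the 2001:DB8::/32 documentation range) are hypothetical; certificate and signature checks are outside the scope of this sketch.

```python
import ipaddress


def prefix_allowed(advertised, certified_prefixes):
    """Check an advertised prefix against the certificate's prefix range."""
    prefix = ipaddress.IPv6Network(advertised)
    if not certified_prefixes:
        # Missing IP address extension: the prefix is unconstrained.
        return True
    # The null prefix ::/0 contains every prefix, so it is also unconstrained.
    return any(
        prefix.subnet_of(ipaddress.IPv6Network(allowed))
        for allowed in certified_prefixes
    )


print(prefix_allowed("2001:db8:1::/64", ["2001:db8::/32"]))
print(prefix_allowed("2001:db9::/64", ["2001:db8::/32"]))
```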
NOTE
When the PMTU learned by the node is smaller than or equal to the actual PMTU, the PMTU
discovery process is complete. Before the PMTU discovery process is complete, ICMPv6
Packet Too Big messages may be repeatedly sent and received because smaller IPv6
MTUs may be found on links farther along the path.
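The repeated exchange described in this note can be simulated with a short sketch. It is a simplified model with hypothetical names: each element of link_mtus is the MTU of one link on the path, and the first link whose MTU is smaller than the current packet size triggers an ICMPv6 Too Big message (type 2, named Packet Too Big in RFC 4443) that reports that MTU.

```python
def discover_pmtu(first_hop_mtu, link_mtus):
    """Return the discovered PMTU and the number of Too Big messages received."""
    pmtu = first_hop_mtu
    too_big_messages = 0
    while True:
        # Find the first link on the path that cannot carry a packet of size pmtu.
        bottleneck = next((mtu for mtu in link_mtus if mtu < pmtu), None)
        if bottleneck is None:
            return pmtu, too_big_messages
        pmtu = bottleneck            # MTU reported in the Too Big message
        too_big_messages += 1


print(discover_pmtu(1500, [1500, 1400, 1300]))
```

In this example the sender learns 1400 first and then 1300, because the smaller MTU is on a link farther along the path.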
Figure 7-49 Structure of a single protocol stack and dual protocol stacks in Ethernet
7.6.2.8 TCP6
Transmission Control Protocol Version 6 (TCP6) provides a mechanism to establish virtual
circuits between processes on two endpoints. A TCP6 virtual circuit is similar to a
full-duplex circuit that transmits data between systems. Because it provides reliable data
transmission between processes, TCP6 is called a reliable protocol. TCP6 also provides a
mechanism to optimize the transmission performance according to the network status.
When all the data can be received and acknowledged, the transmission rate increases
gradually. However, a delay in receiving Acknowledgement packets causes the sending
host to decrease the sending rate.
TCP6 is generally used in interactive applications, such as the Web. However, certain errors in
data reception affect the normal operation of devices. TCP6 establishes virtual circuits by
using the three-way handshake mechanism and tears them down through the
four-way handshake. TCP6 connections provide checksum and reliability functions,
but at an increased cost. As a result, TCP6 has lower efficiency than User Datagram Protocol
Version 6 (UDP6).
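Socket call sequence between the client and the server:
Initialization: The client and the server each call the socket function and receive its return
value. The server calls the bind/listen functions and receives their return values, and then
calls the accept function and waits.
Set up a connection: The client calls the connect function, which sends a SYN. The server
responds with a SYN|ACK, and the client returns an ACK. The client receives the return
value of connect, and the server receives the return value of accept.
Data transmission: The client calls the recv function and waits. The server calls the send
function (Data) and receives its return value, and the client receives the return value of recv.
The server calls the recv function and waits. The client calls the send function (Data|ACK),
the server receives the return value of recv, and an ACK is returned.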
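The socket call sequence can be tried out with a minimal sketch that runs a one-shot echo exchange over the IPv6 loopback address ::1. The function name echo_once is hypothetical, and the sketch assumes that the local system has IPv6 enabled; the data direction is simplified to one request and one reply.

```python
import socket
import threading


def echo_once(payload):
    """Send payload to a local TCP6 echo server and return the reply."""
    with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as server:
        server.bind(("::1", 0))              # port 0: the OS picks a free port
        server.listen(1)
        port = server.getsockname()[1]

        def serve():
            conn, _ = server.accept()        # returns after the three-way handshake
            with conn:
                conn.sendall(conn.recv(1024))

        worker = threading.Thread(target=serve)
        worker.start()
        # connect() triggers the SYN, SYN|ACK, ACK exchange.
        with socket.create_connection(("::1", port)) as client:
            client.sendall(payload)
            reply = client.recv(1024)
        worker.join()
    return reply
```

Calling echo_once(b"hello") returns the same bytes once the server has echoed them; closing the sockets triggers the connection teardown.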
7.6.2.9 UDP6
User Datagram Protocol Version 6 (UDP6) is a computer communication protocol used to
exchange packets on a network. UDP6 has the following characteristics:
l UDP6 uses only source and destination information and is mainly used in the simple
request/response structure.
l UDP6 is unreliable, so it cannot be determined whether UDP6 datagrams reach their
destinations.
l UDP6 is connectionless. That is, no virtual circuits are required during data transmission
between hosts.
The connectionless feature of UDP6 enables UDP6 to send data to multicast addresses. This
is different from TCP6, which requires specific source and destination addresses.
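The connectionless behavior can be seen in a minimal sketch that sends one datagram over the IPv6 loopback address ::1 without any connection setup. The function name is hypothetical, and the sketch assumes that the local system has IPv6 enabled.

```python
import socket


def send_datagram(payload):
    """Send one UDP6 datagram to a local receiver and return what arrives."""
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sender:
        receiver.bind(("::1", 0))            # port 0: the OS picks a free port
        receiver.settimeout(2.0)             # delivery is not guaranteed
        port = receiver.getsockname()[1]
        sender.sendto(payload, ("::1", port))    # no handshake before sending
        data, _ = receiver.recvfrom(1024)
    return data
```

Unlike the TCP6 case, there is no listen, accept, or connect call; the sender only needs the destination address and port.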
7.6.2.10 RawIP6
RawIP6 fills only a limited number of fields in the IPv6 header, and it allows application
programs to provide their own IPv6 headers.
RawIP6 is similar to UDP6 in the following aspects:
l RawIP6 is unreliable, so it cannot be determined whether RawIP6 datagrams reach their
destinations.
l RawIP6 is connectionless. That is, no virtual circuits are required during data
transmission between hosts.
Unlike UDP6, RawIP6 allows application programs to directly operate the IP layer through
the socket. This facilitates the direct interactions with the lower layer.
7.6.3 Applications
Topology: The router (VLANIF1) connects to Switch1 (VLAN 1), to which PC 1 and PC 2 are attached.
If PC1 needs to communicate with PC2, PC1 first sends an NS packet to query the MAC
address of PC2; the NS packet, however, cannot reach PC2 because interface isolation is
configured on S1. The routing device, therefore, is responsible for forwarding the NS packet
to PC2. Note that on the routing device, the MAC address carried in the NS packet is changed
to the MAC address of VLANIF1. PC2 then returns an NA packet to PC1. After receiving the
NA packet, the routing device generates an ND entry for PC2 and related routing entries,
changes the MAC address carried in the NA packet to the MAC address of itself, and
forwards the NA packet to PC1. In this manner, the MAC address of PC2 learnt by PC1 is
actually the MAC address of the routing device.
PC1 then encapsulates packets based on the learnt ND entries and sends the packets to the
ATN and the ATN forwards the packets to PC2 based on the learnt route.
As shown in Figure 7-52, PC1 and PC2 belong to VLAN 1 (Access-VLAN) and VLAN 2
(Access-VLAN) respectively and are connected to the routing device through S1 and S2
separately. VLAN 1 and VLAN 2 both belong to VLAN 3 (Aggregate-VLAN). In such a
case, you can configure ND proxy between VLANs on VLANIF3 of the ATN so that PC1 can
communicate with PC2.
Topology: ATN A connects to Switch1 (VLAN 1) through GE0/2/1 and to Switch2 (VLAN 2)
through GE0/2/2. VLAN 1 and VLAN 2 belong to VLAN 3.
If PC1 needs to communicate with PC2, PC1 first sends an NS packet to query the MAC
address of PC2. The routing device then changes the MAC address carried in the NS packet to
the MAC address of VLANIF3 on the routing device. PC2 then returns an NA packet to PC1.
After receiving the NA packet, the routing device generates an ND entry for PC2 and related
routing entries, changes the MAC address carried in the NA packet to the MAC address of
itself, and forwards the NA packet to PC1. In this manner, the MAC address learnt by PC1 is
actually the MAC address of the routing device.
PC1 then encapsulates packets based on the learnt ND entries and sends the packets to the
routing device and the routing device forwards the packets to PC2 based on the learnt route.
ND: Neighbor discovery, which is used during the forwarding of IPv6 packets for
duplicate address detection, neighbor address resolution, and neighbor
reachability detection. Additionally, ND is a set of protocols and processes for
host address configuration. In ND, different ICMPv6 messages are used for
router discovery and neighbor discovery.
ICMPv6: Internet Control Message Protocol Version 6, which is a base protocol of IPv6
and generates error messages and informational messages used by IPv6 nodes
to report errors and information generated during packet processing.
PMTU: Path MTU, the MTU supported on a specific path, which is discovered by using
ICMPv6 Packet Too Big messages.
Abbreviations
Abbreviation Full Spelling
ND Neighbor Discovery
RS Router Solicitation
RA Router Advertisement
NS Neighbor Solicitation
NA Neighbor Advertisement
8 IP Routing
This document describes the IP routing in terms of the overview, principle, and applications.
Definition
Routing is the basic element of data communication networks. Routing information guides
data packet forwarding. IP routing refers to the process of relaying and forwarding packets.
8.1.2 Principles
8.1.2.1 Routers
In the Internet, network connecting devices control traffic and ensure the quality of data
transmission on the network. Common network connecting devices include hubs, bridges,
switches, and routers.
As a typical network connection device, a router is used to select routes and forward packets.
According to the destination address in the received packet, a router selects a proper path,
which consists of one or more hops, and sends the packet to the next router. The last
router is responsible for sending the packet to the destination host. In addition, the router can
select an optimal path to transmit data.
The hop count from a router to its directly connected network is zero, and to a network
through another router, is one. The remaining number of hops required for the route can be
deduced by analogy. If a router is connected to another router through a network, that is, a
network segment exists between the two routers, the two routers are considered as adjacent
routers on the Internet. This connection between routers is independent of the physical links
that constitute each network segment.
In Figure 8-1, to get from Host A to Host C, a packet needs to go through three networks and
two routers. The bold arrows indicate network segments.
The size of networks may vary, and the length of each network segment may also vary. In this
case, the number of network segments is multiplied by a weighted coefficient when the actual
length of a path is measured.
Routing through the minimum number of network segments is not always the ideal path. For
example, routing through three high-speed LAN network segments is probably much faster
than routing through two low-speed WAN network segments.
Routing Table
Each ATN maintains the protocol routing table for each type of protocol and a local core
routing table (or routing management table).
l Protocol routing table
A protocol routing table stores the routing information discovered by the protocol.
A routing protocol can import and advertise the routes that are discovered by other
protocols. For example, if an ATN that runs the Open Shortest Path First (OSPF) protocol
needs to use OSPF to advertise direct routes, static routes, or Intermediate
System-Intermediate System (IS-IS) routes, the ATN must import the routes into the OSPF
routing table.
l Local core routing table
An ATN uses the local core routing table to store protocol routes and preferred routes. The
ATN then sends the preferred routes to the FIB table to guide packet forwarding.
The ATN selects routes according to the priorities of protocols and costs stored in the
routing table. To view the local core routing table of an ATN, run the
display ip routing-table command.
NOTE
An ATN that supports Layer 3 Virtual Private Network (L3VPN) maintains a local core routing
table for each VPN instance.
A routing table contains the following key items for each route:
l Destination address: is used to identify the destination IP address or the destination
network address of an IP packet.
l Network mask: is combined with the destination address to identify the address of the
network segment where the destination host or ATN resides.
– The network address of the destination host or ATN is obtained through the "AND"
operation on the destination address and network mask. For example, if the
destination address is 1.1.1.1 and the mask is 255.255.255.0, the address of the
network where the host or ATN resides is 1.1.1.0.
– The network mask is composed of several consecutive 1s. These 1s can be
expressed in either the dotted decimal notation or the number of consecutive 1s in
the mask. For example, the network mask can be expressed either as 255.255.255.0
or 24.
l Proto: indicates the protocol through which routes are learned.
l Pre: indicates the preference of a route added to the IP routing table. Multiple routes
with different next hops and outgoing interfaces may exist to the same destination. These
routes may be discovered by different routing protocols or may be manually
configured static routes. The router selects the route with the highest
preference (the smallest value) as the optimal route. For more information on the
preference of each protocol, see Table 8-1.
l Cost: indicates the route cost. When multiple routes to the same destination have the
same preference, the route with the lowest cost is selected as the optimal route.
NOTE
The Preference value is used to compare the preferences of various routing protocols, while the
Cost value is used to compare the preferences of different routes of the same routing protocol.
l NextHop: indicates the IP address of the next device that an IP packet passes through.
l Interface: indicates the outgoing interface through which an IP packet is forwarded.
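The "AND" operation and the two mask notations described in the Network mask item can be illustrated with a minimal Python sketch. The function names are hypothetical and only the standard ipaddress module is used.

```python
import ipaddress

def network_address(destination: str, mask: str) -> str:
    """AND the destination address with the mask to get the network address."""
    dest_bits = int(ipaddress.IPv4Address(destination))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(dest_bits & mask_bits))

def prefix_length(mask: str) -> int:
    """Count the 1 bits in a dotted decimal mask (assumes a contiguous mask)."""
    return bin(int(ipaddress.IPv4Address(mask))).count("1")

print(network_address("1.1.1.1", "255.255.255.0"))  # 1.1.1.0
print(prefix_length("255.255.255.0"))               # 24
```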
The routes are divided according to the destination of the packet into the following types:
l Subnet route: The destination is a subnet.
l Host route: The destination is a host.
In addition, based on whether the router is directly connected to the network in which the
destination resides, a route is one of the following connection types:
l Direct route: The ATN is directly connected to the destination network.
l Indirect route: The ATN is not directly connected to the destination network.
To reduce the number of entries in the routing table, you can set a default route. All packets
that fail to match entries in the routing table are forwarded through this default route. For
example, the first route listed in the preceding routing table, with the destination address of
0.0.0.0/0, is a default route.
As shown in Figure 8-2, ATN A is connected to three networks, so it has three IP addresses
and three physical interfaces. Figure 8-2 also shows the routing table of ATN A.
2.2.2.2/24 3.3.3.2/24
12.0.0.0/8 13.0.0.0/8
The ATN performs the "AND" operation on the destination address in the packet and the
network mask of each entry in the FIB table. The ATN then compares the result of the "AND"
operation with the entries in the FIB table to find a match. The ATN chooses the optimal route
to forward packets according to the best or "longest" match.
NOTE
The complete routing table contains active routes and inactive routes. The brief routing table contains
only active routes. To view the complete routing table, run the display ip routing-table verbose
command.
After receiving a packet that carries the destination address 9.1.2.1, the ATN searches the
following table:
FIB Table:
Total number of Routes : 5
Destination/Mask   Nexthop        Flag   TimeStamp  Interface  TunnelID
9.1.2.1/32         192.168.22.2   DGHUT  t[11687]   GE0/2/1    0xa
192.168.7.255/32   127.0.0.1      HU     t[11637]   InLoop0    0x0
192.168.7.2/32     127.0.0.1      HU     t[11637]   InLoop0    0x0
1.1.1.1/32         192.168.2.1    DGHUT  t[288]     Eth0/2/0   0x7
4.4.4.255/32       127.0.0.1      HU     t[213]     InLoop0    0x0
The ATN chooses the 9.1.2.1/32 entry because it is the longest match. The ATN then
forwards the packet through GE0/2/1, the outgoing interface of that entry.
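The longest-match lookup described above can be sketched in Python as follows. This is an illustration only: the FIB entries are copied from the table above, the lookup function is a hypothetical helper, and a real forwarding engine uses optimized structures rather than a linear scan.

```python
import ipaddress

# (destination/mask, next hop, outgoing interface), copied from the FIB table
FIB = [
    ("9.1.2.1/32", "192.168.22.2", "GE0/2/1"),
    ("192.168.7.255/32", "127.0.0.1", "InLoop0"),
    ("192.168.7.2/32", "127.0.0.1", "InLoop0"),
    ("1.1.1.1/32", "192.168.2.1", "Eth0/2/0"),
    ("4.4.4.255/32", "127.0.0.1", "InLoop0"),
]

def lookup(destination, fib=FIB):
    """Return the matching entry with the longest mask, or None if no entry matches."""
    dest = ipaddress.IPv4Address(destination)
    best = None
    for prefix, next_hop, interface in fib:
        network = ipaddress.ip_network(prefix)
        # "dest in network" is the AND-and-compare step
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop, interface)
    if best is None:
        return None
    return (str(best[0]), best[1], best[2])

print(lookup("9.1.2.1"))  # ('9.1.2.1/32', '192.168.22.2', 'GE0/2/1')
```

If a default route 0.0.0.0/0 were present in the table, it would match every destination but would be chosen only when no entry with a longer mask matches.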
Static routes cannot automatically adapt to changes in the network topology, so they must be manually
configured.
On the other hand, dynamic routing protocols use routing algorithms to automatically adapt to
changes in network topology. Dynamic routes are applicable to networks that are equipped
with Layer 3 devices. Dynamic routing, however, places higher requirements on system
performance (such as a large memory capacity) and occupies more network
resources.
Range of Functions
Routing protocols are classified according to the application range:
l Interior Gateway Protocol (IGP): runs inside an AS, such as RIP, OSPF, and IS-IS.
l Exterior Gateway Protocol (EGP): runs between different ASs, such as BGP.
Algorithm
Routing protocols are classified according to the type of algorithm they use:
l Distance-Vector Routing Protocol: includes RIP and BGP (BGP is also called Path-
Vector).
l Link-State Routing Protocol: includes OSPF and IS-IS.
The algorithms differ mainly in their methods of route discovery and route calculation.
Destination Addresses
Routing protocols are classified by the following types of destination addresses:
l Unicast routing protocol: includes RIP, OSPF, BGP, and IS-IS.
Static routes and dynamic routes discovered by the routing protocol are managed in the ATN.
All these routes can be shared among different routing protocols to implement
Readvertisement of Routing Information.
Route Preferences
Routing protocols (including the static route) can learn different routes to the same
destination, but not all routes are optimal. At any one time, only one routing protocol determines
the optimal route to a destination. To select the optimal route, each routing protocol
(including the static route) is configured with a preference. When multiple routing
information sources coexist, the route with the highest preference (the smallest value)
is selected as the optimal route. Table 8-1 lists the routing protocols and the default
preferences of routes found by each
protocol.
In Table 8-1, 0 indicates the direct route, and 255 indicates any route learned from unreliable
sources.
Routing Protocol or Route Type   Preference
DIRECT                           0
OSPF                             10
IS-IS                            15
STATIC                           60
RIP                              100
IBGP                             255
EBGP                             255
You can manually configure the preference of every routing protocol except direct routes. In
addition, each static route can be given its own preference.
The ATN also defines the external preference and internal preference. External preference is
the preference set by a user for each routing protocol. Table 8-1 shows the default external
preference.
If different routing protocols are configured with the same preference, the system determines
which routes discovered by these routing protocols become the preferred routes through an
internal preference. Table 8-2 shows the internal preferences of routing protocols.
Routing Protocol or Route Type   Internal Preference
DIRECT                           0
OSPF                             10
IS-IS Level-1                    15
IS-IS Level-2                    18
STATIC                           60
UNR                              65
RIP                              100
IBGP                             200
EBGP                             20
For example, two routes, an OSPF route and a static route, can reach the destination
10.1.1.0/24, and the preferences of both routes are set to 5. In this case, the ATN determines
the optimal route according to the internal preferences listed in Table 8-2. The internal
preference of OSPF (value 10) is higher than that of the static route (value 60).
Therefore, the system selects the route discovered by OSPF as the optimal route.
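The selection rules in this section can be summarized in a short Python sketch. It assumes that candidates are compared first by the configured (external) preference, then by the internal preference from Table 8-2, and finally by cost. The select_optimal function and the route dictionaries are hypothetical illustrations, not the device's implementation.

```python
# Internal preferences from Table 8-2 (smaller value = higher preference)
INTERNAL_PREFERENCE = {
    "DIRECT": 0, "OSPF": 10, "IS-IS Level-1": 15, "IS-IS Level-2": 18,
    "STATIC": 60, "UNR": 65, "RIP": 100, "IBGP": 200, "EBGP": 20,
}

def select_optimal(routes):
    """Pick the route with the smallest (preference, internal preference, cost)."""
    return min(
        routes,
        key=lambda r: (r["preference"], INTERNAL_PREFERENCE[r["protocol"]], r["cost"]),
    )

# The example above: both routes to 10.1.1.0/24 are configured with preference 5
candidates = [
    {"protocol": "STATIC", "preference": 5, "cost": 0, "next_hop": "10.0.0.1"},
    {"protocol": "OSPF", "preference": 5, "cost": 2, "next_hop": "10.0.0.2"},
]
print(select_optimal(candidates)["protocol"])  # OSPF
```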
Definition
Priority-based route convergence, which provides faster convergence of routes for key
services, is an important technology to improve network reliability.
Routes can be set with different convergence priorities, such as critical, high, medium, and
low. The system performs route convergence based on the convergence priorities and a
convergence rule. In other words, the system schedules the convergence of routes with
different convergence priorities in proportion to a weighting scheme.
Purpose
As network services are integrated, the services must be differentiated. Operators require
that the routes for key services, such as Voice over IP (VoIP) and video conferencing,
converge as fast as possible, while the routes for common services can converge
relatively slowly. To improve network reliability, the system converges routes
based on their convergence priorities.
Principle
Table 8-3 shows the default convergence priorities of public routes. The routing protocols
first compute and deliver routes of high convergence priorities to the system. By default, the
system converges routes according to the scheduling weight values assigned to the
convergence priorities in the proportions of critical:high:medium:low = 8:4:2:1. You can
reconfigure the scheduling weight values as required.
Direct High
Static Medium
RIP Low
BGP Low
NOTE
For private routes, only 32-bit host routes of OSPF and IS-IS can be identified as medium, and all other
routes are identified as low.
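To make the 8:4:2:1 proportion concrete, the following Python sketch models the scheduling as a weighted round robin. The document does not specify the actual scheduling algorithm, so the weighted round robin, the schedule function, and the queue contents are assumptions made for illustration.

```python
from collections import deque

# Default scheduling weights, critical:high:medium:low = 8:4:2:1
WEIGHTS = {"critical": 8, "high": 4, "medium": 2, "low": 1}

def schedule(pending):
    """Return the order in which pending routes are converged.

    pending maps a convergence priority to a list of routes. In each round,
    up to `weight` routes are taken from each priority queue.
    """
    queues = {prio: deque(pending.get(prio, [])) for prio in WEIGHTS}
    order = []
    while any(queues.values()):
        for prio, weight in WEIGHTS.items():
            for _ in range(weight):
                if not queues[prio]:
                    break
                order.append(queues[prio].popleft())
    return order

print(schedule({"high": ["h1", "h2", "h3", "h4", "h5"], "low": ["l1", "l2"]}))
# ['h1', 'h2', 'h3', 'h4', 'l1', 'h5', 'l2']
```

With this scheme, low-priority routes are never starved, but high-priority routes are processed four times as often.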
Load Balancing
The ATN supports the multi-route model (multiple routes with the same destination and
priority). Routes discovered by one routing protocol with the same destination and cost can
load-balance traffic. In each routing protocol view, you can run the maximum load-balancing
number command to configure the number of routes for load balancing. The ATN
adopts per-flow load balancing.
l Per-flow load balancing
After per-flow load balancing is configured, the ATN forwards packets based on the
5-tuple (the source address, destination address, source port, destination port, and
protocol in the packets). Packets with the same 5-tuple are always sent to the
same next hop. Figure 8-3 shows the
networking for per-flow load balancing.
RouterB
GE0/2/0
10.1.1.0/24
P1~P6 10.1.1.0/24
ATN-A 10.2.1.0/24
10.2.1.0/24
GE0/2/4 P1~P6 RouterD
RouterC
ATN-A needs to forward packets to 10.1.1.0/24 and 10.2.1.0/24. Based on per-flow load
balancing, packets of the same flow are transmitted along the same path. The process for
ATN-A to forward packets is as follows:
– The first packet P1 to 10.1.1.0/24 is forwarded through GE 0/2/0, and all
subsequent packets to 10.1.1.0/24 are forwarded through the same interface.
– The first packet P1 to 10.2.1.0/24 is forwarded through GE 0/2/4, and all
subsequent packets to 10.2.1.0/24 are forwarded through the same interface.
Currently, the protocols that support load balancing are RIP, OSPF, BGP, and IS-IS. In
addition, static routes support load balancing.
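Per-flow load balancing can be sketched in Python as a hash over the five fields that identify a flow. The CRC32 hash and the pick_next_hop function are assumptions chosen for illustration; the hash actually used by the forwarding engine is not described in this document.

```python
import zlib

def pick_next_hop(flow, next_hops):
    """Map a flow to one of the equal-cost next hops.

    flow is (source address, destination address, source port,
    destination port, protocol). The same flow always yields the same
    next hop, so packets of one flow follow one path.
    """
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

next_hops = ["GE0/2/0", "GE0/2/4"]
flow = ("192.168.1.10", "10.1.1.5", 40000, 80, "tcp")
first = pick_next_hop(flow, next_hops)
# Every later packet of the same flow is sent the same way
assert all(pick_next_hop(flow, next_hops) == first for _ in range(6))
```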
IP FRR Overview
Fast reroute (FRR) is a mechanism in which a fault detected at the physical layer or data link
layer is reported to the upper-layer routing system, and a backup link is immediately used to forward
packets.
Background of IP FRR
On traditional IP networks, when a fault occurs at the lower layer of the forwarding link, the
visible evidence is that the physical interface on the ATN becomes Down. After the ATN
detects the fault, it informs the upper layer routing system to recalculate routes and then
update routing information. Usually, it takes the routing system several seconds to re-select an
available route.
For services that require a low delay and low packet loss ratio, a convergence time of
several seconds is intolerable because it may lead to service interruption. For example, Voice
over Internet Protocol (VoIP) services can tolerate interruptions of only milliseconds. IP FRR
ensures that the forwarding system swiftly detects such a fault and then takes measures to
restore services as soon as possible.
l IP FRR for the public network: protects ATNs of the public network.
l IP FRR for the private network: protects customer edges (CEs).
1. If the primary link is available, you can configure IP FRR by using a routing policy to
provide the forwarding information of the backup route for the forwarding engine.
2. If the forwarding engine is notified of a link fault, the engine uses the backup link to
forward traffic before the routes on the control plane converge.
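The two steps above can be modeled with a small Python sketch: the forwarding entry holds both the primary and the backup forwarding information in advance, and a fault notification only flips which one is used. The FrrEntry class and its method names are hypothetical.

```python
class FrrEntry:
    """Forwarding entry with a pre-installed backup (illustrative model only)."""

    def __init__(self, primary, backup):
        self.primary = primary      # (next hop, outbound interface) of the primary route
        self.backup = backup        # (next hop, outbound interface) of the backup route
        self.primary_up = True

    def link_fault(self):
        # Step 2: the forwarding engine is notified of the fault and switches
        # to the backup before the control plane converges.
        self.primary_up = False

    def link_recovered(self):
        self.primary_up = True

    def forwarding_info(self):
        return self.primary if self.primary_up else self.backup

entry = FrrEntry(("10.0.0.2", "GE0/2/0"), ("10.0.1.2", "GE0/2/4"))
print(entry.forwarding_info())  # ('10.0.0.2', 'GE0/2/0')
entry.link_fault()
print(entry.forwarding_info())  # ('10.0.1.2', 'GE0/2/4')
```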
Item Description
IP FRR IP FRR is suitable for IP services that require a low delay and low packet
loss ratio.
l Protects the public network and CEs.
l Implements FRR through a backup route.
VPN FRR VPN FRR is suitable for services that require a low delay and low packet
loss ratio on VPNs.
l Protects provider edges (PEs).
l Implements FRR through a backup tunnel.
In the ATN, the routes discovered by a protocol can be imported to the routing table of another
protocol. Each protocol has its own mechanism to import routes. For details, refer to the chapter
"Routing Policy".
Definition
Indirect next hop changes the direct association between route prefixes and the next hop
into an indirect association. Next hop information can then be refreshed independently, the
prefixes that share a next hop do not need to be refreshed one by one, and route convergence
is sped up.
Purpose
In scenarios that require route iteration, Forwarding Information Base (FIB) entries are
quickly refreshed when IGP routes or tunnels are switched. This implements fast traffic
convergence and reduces the impact on services.
Iteration Policy
An iteration policy is used to control the iteration result of the next hop to meet the
requirements of different application scenarios. In route iteration, iteration behaviors do not
need to be controlled by the iteration policy. Instead, they only need to comply
with the longest matching rule. The iteration policy needs to be applied only
when VPN routes iterate tunnels.
By default, the system selects LSPs for a VPN. If other types of tunnels are required, you
need to configure a tunnel policy and bind the tunnel policy to a tunnel. After a tunnel policy
is applied, the system adopts the tunnel bound in the tunnel policy or selects a tunnel
according to the priorities of different types of tunnels.
Prefix 1    Nexthop 1    Forwarding Information 1
Prefix 2    Nexthop 2    Forwarding Information 2
……          ……           ……
Prefix N    Nexthop N    Forwarding Information N
As shown in Figure 8-4, before indirect next hop is adopted, prefixes are totally independent,
each corresponding to its next hop and forwarding information. When a dependent route
changes, the next hop corresponding to each prefix is iterated and forwarding information is
updated based on the prefix. In this case, the convergence speed is related to the number of
prefixes.
Actually, prefixes of a BGP neighbor have the same next hop, forwarding information, and
refreshed forwarding information.
Prefix 1
Prefix 2    Nexthop    Forwarding Information
……
Prefix N
As shown in Figure 8-5, after indirect next hop is adopted, prefixes of a BGP neighbor share
a next hop. When a dependent route changes, only the shared next hop is iterated and
forwarding information is updated based on the next hop. In this case, traffic of all prefixes
can be converged at a time. The convergence speed is irrelevant to the number of prefixes.
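The difference between the two models can be illustrated with a minimal Python sketch in which all prefixes reference one shared next hop object. The NextHop class, the addresses, and the path names are hypothetical and chosen only for illustration.

```python
class NextHop:
    """Shared next hop with its forwarding information (illustrative model)."""

    def __init__(self, address, forwarding_info):
        self.address = address
        self.forwarding_info = forwarding_info

# Indirect next hop: every prefix points to the same NextHop object
shared = NextHop("4.4.4.4", "path-1")
fib = {f"prefix-{i}": shared for i in range(1, 4001)}

# When the dependent route changes, only the shared next hop is refreshed once
shared.forwarding_info = "path-2"

# All 4000 prefixes now forward along the new path without per-prefix updates
print(all(hop.forwarding_info == "path-2" for hop in fib.values()))  # True
```

Without the shared object, the same change would require one update per prefix, so the work would grow with the number of prefixes.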
8.1.3 Applications
IP forwarding
Link A PE1
IP forwarding
PE2
Indirect Next Hop Enabled When IBGP Routes Are Iterated to an IGP Route
AS100
IGP IGP
ATN-B
IBGP
ATN-A ATN-D
IGP IGP
ATN-C
As shown in Figure 8-7, ATN-A and ATN-D establish an IBGP neighbor relationship. To
refresh Forwarding Information Base (FIB) entries and guide the packet forwarding, the real
outbound interface and the directly connected next hop must be identified based on the
original IBGP next hop. Note that the next hop of an IBGP route cannot be used to guide
packet forwarding, because the IBGP neighbor relationship is generally established through
two loopback interfaces, and the next hop is not directly reachable.
ATN-D receives 4000 routes from ATN-A. These routes have the same original BGP
next hop. After being iterated, these routes eventually follow the same IGP path (A->B->D).
When the IGP path (A->B->D) fails, these IBGP routes do not need to be iterated separately,
and the relevant FIB entries do not need to be refreshed one by one. Only the shared
next hop needs to be iterated and refreshed. Consequently, these IBGP routes converge to
the path (A->C->D) at one time in the forwarding plane. Convergence time is therefore related
only to the number of next hops, and sub-second convergence that is independent of the number
of prefixes is achieved.
If ATN-A and ATN-D establish a multi-hop EBGP neighbor relationship, the convergence
procedure is the same as the previous procedure. Next hop separation also applies to multi-
hop EBGP route iteration.
Indirect Next Hop Enabled When VPN Routes Are Iterated to a Tunnel
AS100
P1
Tunnel1
As shown in Figure 8-8, PE1 and PE2 establish a neighbor relationship and PE2 receives 4000
routes from PE1. These routes have the same original BGP next hop. After being
iterated, these private routes eventually follow the same public network tunnel, namely, tunnel
1. When tunnel 1 fails, these routes do not need to be iterated separately, and the FIB entries
do not need to be refreshed one by one. Only the shared next hop needs to be iterated,
and only the relevant FIB entries need to be refreshed. Consequently, these VPN routes
converge to tunnel 2 at one time in the forwarding plane. Convergence time is therefore
related only to the number of next hops, and sub-second convergence that is independent of the
number of prefixes is achieved.
Term Description
FRR FRR is applicable to services that are very sensitive to packet loss and delay.
When a fault is detected at the lower layer, the lower layer informs the upper
layer routing system of the fault. Then, the routing system forwards packets
through a backup link. In this manner, the impact of the link fault on services is
minimized.
UNR When a user goes online through a Layer 2 device, such as a switch, but there is
no available Layer 3 interface and the user is assigned an IP address, no
dynamic routing protocol can be used. To enable devices to use IP routes to
forward the traffic of this user, use the Huawei User Network Route (UNR)
technology to assign a route to forward the traffic of the user.
Abbreviations
Abbreviation Full Name
CE Customer Edge
PE Provider Edge
RM Route Management
Definition
Static routes need to be manually configured by the administrator.
Purpose
On a simple network, the administrator just needs to configure static routes so that the
network can run properly. Properly configuring and using static routes can improve network
performance and guarantee the required bandwidth for important applications.
8.2.2 Principles
Actually, each routing entry requires a next-hop address. Before sending a packet, a device
needs to use the longest match rule to search its routing table for the route that matches the
destination address in the packet. The device can find the associated link layer address only
after the next-hop address of the packet is specified.
NOTE
If the next hop IP address manually specified for a static route changes, the device on which the static
route is configured is unaware of the change. As a result, traffic fails to be forwarded along the static
route. To address this problem, associate the static route with DHCP so that the static route can obtain
the next hop IP address dynamically.
l If the next hop IP address obtained using DHCP changes, the static route updates it.
l If no next hop IP address can be obtained using DHCP, the static route is invalid.
l For a Point-to-Point (P2P) interface, the next-hop address is the address you specify as
the outbound interface. That is, the address of the remote interface connected to this
interface is the next-hop address. For example, when an MP-group interface is
encapsulated with the Point-to-Point Protocol (PPP) and obtains the remote IP address
through PPP negotiation, you need to specify only the outbound interface rather than the
next-hop address.
l Non-Broadcast Multiple-Access (NBMA) interfaces (such as an ATM interface) are
applicable to Point-to-Multipoint (P2MP) networks. IP routes and the mappings between
IP addresses and link layer addresses are required. Therefore, you need to configure
next-hop addresses.
l When configuring static routes, do not specify the Ethernet interface as the outbound
interface. An Ethernet interface is a broadcast interface, and a VT interface can be
associated with several virtual access (VA) interfaces. If an Ethernet or VT interface is
specified as the outbound interface, a unique next hop cannot be determined because
multiple next hops exist. In actual applications, if you need to specify a broadcast interface
(such as an Ethernet interface) or a VT interface as the outbound interface, you are advised
to specify the associated next-hop address as well.
2 ATN B 4
1 5
ATN A ATN C
In Figure 8-9, static routes to network segments 3, 4, and 5 need to be configured on ATN A;
static routes to network segments 1 and 5 need to be configured on ATN B; and static routes
to network segments 1, 2, and 3 need to be configured on ATN C.
ATN B
Preference=60
Preference=100
ATN A ATN C
ATN D
After BFD for static routes is configured, each static route can be associated with a BFD
session. In addition to route selection rules, whether a static route can be selected as the
optimal route is subject to BFD session status.
l If the BFD session associated with a static route detects a link failure and
goes Down, the BFD session reports the link failure to the system. The system then
deletes the static route from the IP routing table.
l If the BFD session associated with a static route detects that the faulty link has recovered
and goes Up, the BFD session reports the fault recovery to the system. The
system then adds the static route to the IP routing table again.
l By default, a static route can still be selected even though the BFD session associated
with it is AdminDown (triggered by the shutdown command run either locally or
remotely). If a device is restarted, the BFD session needs to be re-negotiated. In this
case, whether the static route associated with the BFD session can be selected as the
optimal route is subject to the re-negotiated BFD session status.
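The effect of the BFD session status on a static route can be summarized in a short Python sketch. The StaticRouteTable class is a hypothetical model of the default behavior described above, not the device's implementation.

```python
class StaticRouteTable:
    """Tracks which BFD-associated static routes are in the IP routing table."""

    def __init__(self):
        self.active = set()

    def on_bfd_state(self, route, state):
        if state == "Down":
            # Link failure reported: delete the static route
            self.active.discard(route)
        elif state in ("Up", "AdminDown"):
            # Up: the link works. AdminDown: by default the route can still be selected.
            self.active.add(route)
        else:
            raise ValueError(f"unknown BFD state: {state}")

table = StaticRouteTable()
table.on_bfd_state("10.1.1.0/24", "Up")
table.on_bfd_state("10.1.1.0/24", "Down")
print("10.1.1.0/24" in table.active)  # False
table.on_bfd_state("10.1.1.0/24", "AdminDown")
print("10.1.1.0/24" in table.active)  # True
```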
l Single-hop detection
For a non-iterated static route, the configured outbound interface and next-hop address
provide the information about the directly connected next hop. In this case, the outbound
interface bound to the BFD session is the outbound interface of the static route, and the
peer address is the next-hop address of the static route.
l Multi-hop detection
For an iterated static route, only the next-hop address is configured. Therefore, the
directly connected next-hop and outbound interface need to be iterated. In this case, the
peer address of the BFD session is the original next-hop address of the static route, and
the outbound interface is not specified. Generally, the original next hop to be iterated is
an indirect next hop. Therefore, multi-hop detection is performed on the static routes that
support route iteration.
NOTE
If the next hop of a route is not directly reachable, the route cannot be used for packet forwarding. Based
on information about the current next hop of this route, the system will calculate an actual outbound
interface and an actual next hop. This process is called route iteration. In the display ip routing-table
command output, if the Flags value of a route is R, the route is an iterated route. Otherwise,
the route is not an iterated route.
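Route iteration as described in this note can be sketched in Python. The routing table layout (a dictionary keyed by prefix, with interface set to None for routes whose next hop is not directly reachable), the addresses, and the function names are assumptions made for illustration.

```python
import ipaddress

def longest_match(address, table):
    """Return the table entry with the longest prefix that contains address."""
    addr = ipaddress.ip_address(address)
    matches = [p for p in table if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return table[max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)]

def iterate(next_hop, table, max_depth=8):
    """Resolve next_hop to (actual next hop, actual outbound interface)."""
    for _ in range(max_depth):
        entry = longest_match(next_hop, table)
        if entry is None:
            return None                      # next hop unreachable
        if entry["interface"] is not None:   # directly reachable: iteration ends
            return (entry["next_hop"] or next_hop, entry["interface"])
        next_hop = entry["next_hop"]         # iterate again on the new next hop
    return None

table = {
    "10.0.0.0/24": {"next_hop": None, "interface": "GE0/2/0"},  # direct route
    "4.4.4.4/32": {"next_hop": "10.0.0.2", "interface": None},  # needs iteration
}
# A static route whose configured next hop is 4.4.4.4
print(iterate("4.4.4.4", table))  # ('10.0.0.2', 'GE0/2/0')
```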
Terms
Term Description
FRR Fast Reroute is applicable to the services that are very sensitive to packet loss
and delay. After FRR is configured, when a fault is detected at the lower layer,
the fault is reported to the upper-layer routing system. Then, packets are
forwarded through a backup link. Therefore, the impact of link faults on the
carried services is minimized.
Abbreviations
Abbreviation Full Name
RM Route Management
8.3 RIP
8.3.1 Introduction
Definition
Routing Information Protocol (RIP) is a simple Interior Gateway Protocol (IGP). RIP is used
in small-scale networks, such as campus networks and simple regional networks.
RIP employs the hop count as the metric to measure the distance to the destination. In RIP, by
default, the hop count from a router to its directly connected network is 0, and the hop count
from a router to a network that is reachable through another router is 1, and so on. That is, the
hop count equals the number of routers along the path from the local network to the
destination network. To speed up the convergence, RIP defines the hop count as an integer
ranging from 0 to 15. A hop count greater than or equal to 16 is considered infinite, indicating
that the destination network or host is unreachable. Due to the hop limit, RIP is not applicable
to large-scale networks.
RIP supports split horizon, poison reverse, and triggered update, which improves performance
and prevents routing loops.
Purpose
As the earliest IGP, RIP is used in small- and medium-sized networks. Its implementation is
simple, and the configuration and maintenance of RIP are easier than those of Open Shortest
Path First (OSPF) and Intermediate System-to-Intermediate System (IS-IS). Therefore, RIP is
widely used on live networks.
8.3.2 Principles
RIP is based on the Distance-Vector (DV) algorithm. It exchanges routing information in User
Datagram Protocol (UDP) packets. RIP uses timers to guarantee advertisement, update, and aging of
routing information. However, design defects in RIP may cause routing loops. Therefore, split
horizon, poison reverse, and triggered update were introduced into RIP to prevent routing
loops.
In addition, RIP periodically advertises its routing table to neighbors, and route
summarization was introduced to reduce the size of the routing table.
8.3.2.1 RIP-1
RIP Version 1 (RIP-1) is a classful routing protocol, and its protocol packets can only be
broadcast. Figure 8-11 shows the packet format. A RIP packet can carry a maximum of 25
entries. RIP is based on UDP, and RIP-1 data packets cannot be longer than 512 bytes.
Because RIP-1 packets do not carry any mask information, RIP-1 can identify only the routes
to natural network segments, such as Class A, Class B, and Class C. Therefore, RIP-1 does
not support route aggregation or discontinuous subnets.
Bit:           0          7          15                    31
Header         Command  | Version  | Must be zero
Route Entries  Address family identifier | Must be zero
               IP address
               Must be zero
               Must be zero
               Metric
8.3.2.2 RIP-2
RIP Version 2 (RIP-2) is a classless routing protocol. Figure 8-12 shows the format of a
RIP-2 packet.
Bit:           0          7          15                    31
Header         Command  | Version  | Must be zero
Route Entries  Address Family Identifier | Route Tag
               IP Address
               Subnet Mask
               Next Hop
               Metric
l Supports external route tags and flexibly controls routes based on the tag using a routing
policy.
l Supports route summarization and Classless Inter-domain Routing (CIDR) because
RIP-2 packets carry mask information.
l Supports next hop specification so that the optimal next hop address can be specified on
the broadcast network.
l Uses multicast routes to send update packets. Only RIP-2 routers can receive protocol
packets, which reduces resource consumption.
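The RIP-2 packet layout in Figure 8-12 can be illustrated by packing a response with Python's struct module. The field widths follow the figure (a 4-byte header and 20-byte route entries); the command value 2 (response) and address family identifier 2 (IP) are the standard RIP values, and the function names and sample route are hypothetical.

```python
import socket
import struct

MAX_ENTRIES = 25  # maximum number of route entries in one RIP packet

def rip2_entry(address, mask, next_hop, metric, route_tag=0):
    """Pack one 20-byte RIP-2 route entry."""
    return struct.pack(
        "!HH4s4s4sI",
        2,                           # address family identifier: IP
        route_tag,
        socket.inet_aton(address),
        socket.inet_aton(mask),
        socket.inet_aton(next_hop),
        metric,
    )

def rip2_response(entries):
    """Pack the 4-byte header (command, version, must be zero) plus the entries."""
    if len(entries) > MAX_ENTRIES:
        raise ValueError("a RIP packet carries at most 25 entries")
    return struct.pack("!BBH", 2, 2, 0) + b"".join(entries)

entry = rip2_entry("10.1.1.0", "255.255.255.0", "0.0.0.0", 2)
print(len(entry))                                 # 20
print(len(rip2_response([entry] * MAX_ENTRIES)))  # 504
```

A full packet of 25 entries is 504 bytes long, which stays within the 512-byte limit mentioned for RIP-1 packets.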
8.3.2.3 Timers
RIP uses the following three timers:
l Update timer: The update timer periodically triggers update packet transmission. By
default, the interval at which update packets are sent is 30s.
l Age timer: If a RIP device does not receive any packets from its neighbor to update a
route before the route expires, the RIP device considers the route unreachable. By
default, the age timer interval is 180s.
l Garbage-Collect timer: If a route becomes invalid after the age timer expires, the route is
placed into a garbage queue instead of being immediately deleted from the RIP routing
table. If an Update packet of a route is received before the garbage-collect timer expires,
the route is placed back into the age queue. If no Update packet of a route is received
before the garbage-collect timer expires, the route is deleted from the RIP routing table.
The advertisement of RIP routing updates is triggered by the update timer at a default interval
of 30 seconds. Each entry is associated with the age timer and garbage-collect timer. After a
route is learned from a neighbor, it is added to the routing table, and the age timer is started. If
no update packet is received from the neighbor within 180s, the cost of the route is set to 16
(indicating that the route is unreachable). At the same time, the garbage-collect timer is
started. If no update packet is received within 120 seconds, the entry is deleted after the
garbage-collect timer expires.
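The interaction of the age timer and the garbage-collect timer can be sketched in Python, using the default values given above (180 seconds and 120 seconds). The route_state function and its state names are hypothetical.

```python
AGE_TIMER = 180              # seconds without an update before the route is unreachable
GARBAGE_COLLECT_TIMER = 120  # further seconds before the route is deleted
UNREACHABLE = 16

def route_state(seconds_since_update, learned_metric):
    """Return (state, metric) of a route, given the time since its last update."""
    if seconds_since_update < AGE_TIMER:
        return ("valid", learned_metric)
    if seconds_since_update < AGE_TIMER + GARBAGE_COLLECT_TIMER:
        # Age timer expired: cost set to 16, garbage-collect timer running
        return ("unreachable", UNREACHABLE)
    return ("deleted", None)

for elapsed in (30, 180, 299, 300):
    print(elapsed, route_state(elapsed, 3))
```

Receiving an update for the route resets seconds_since_update to 0, which returns the route to the valid state.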
10.0.0.0/8
ATN A ATN B
10.0.0.0/8
In Figure 8-13, ATN B sends a route to 10.0.0.0 to ATN A, and ATN A does not send the
route back to ATN B.
10.0.0.0/8
cost=16
ATN A ATN B
cost=1
10.0.0.0/8
On the network shown in Figure 8-14, if poison reverse is not configured, ATN B sends ATN
A a route that was learned from ATN A. The metric of the route from ATN A to network
10.0.0.0 is 1. If the route from ATN A to network 10.0.0.0 becomes unreachable and ATN B keeps
sending ATN A routes to network 10.0.0.0 because ATN B fails to receive a route update
packet from ATN A, a routing loop occurs.
If poison reverse is configured and ATN A sends ATN B a message indicating that the route
received from ATN B is unreachable, ATN B does not learn the unreachable route from ATN A, which
prevents routing loops.
If both split horizon and poison reverse are configured, only poison reverse takes effect.
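The difference between split horizon and poison reverse can be sketched in Python as two ways of building the update sent out of an interface. The build_update function, the routing table layout, and the interface names are hypothetical.

```python
UNREACHABLE = 16

def build_update(routes, out_interface, mode="split-horizon"):
    """Build the routes advertised on out_interface.

    routes maps a destination to (metric, interface the route was learned on).
    mode is "split-horizon", "poison-reverse", or "none".
    """
    update = {}
    for destination, (metric, learned_on) in routes.items():
        if learned_on == out_interface:
            if mode == "split-horizon":
                continue                           # do not send the route back
            if mode == "poison-reverse":
                update[destination] = UNREACHABLE  # send it back as unreachable
                continue
        update[destination] = metric
    return update

routes = {"10.0.0.0/8": (1, "GE0/2/0"), "172.16.0.0/16": (2, "GE0/2/1")}
print(build_update(routes, "GE0/2/0", "split-horizon"))   # {'172.16.0.0/16': 2}
print(build_update(routes, "GE0/2/0", "poison-reverse"))  # {'10.0.0.0/8': 16, '172.16.0.0/16': 2}
```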
ATN C 10.3.0.0
E0 S0
The network to
10.4.0.0 fails.
10.4.0.0
In the networking shown in Figure 8-15, when network 10.4.0.0 becomes unreachable, ATN
C learns the information first. By default, a RIP-enabled device sends routing updates to its
neighbors every 30s. If the update message of ATN B is sent to ATN C while ATN C is
waiting to send its route update message, ATN C learns the incorrect route to 10.4.0.0. In this
case, the next hops of the routes from ATN B and ATN C to 10.4.0.0 are ATN C and ATN B
respectively, which results in a routing loop. If ATN C sends an Update packet to ATN B
immediately after it detects the network failure, the routing table of ATN B is updated in time,
which prevents routing loops.
In addition, if the next hop of a route becomes unavailable due to a link failure, the local
device sets the cost of the route to 16 and then advertises the route immediately to its
neighbors. This process is called route poisoning.
summarization because RIP-2 packets carry mask information. Therefore, RIP-2 supports
subnetting.
In RIP-2, route summarization reduces the size of the routing table and improves the
extensibility and efficiency of a large-scale network.
Route summarization is classified as follows:
l Process-based classful summarization
For example, a RIP process summarizes the route 10.1.1.0/24 with metric 2 and route
10.2.2.0/24 with metric 3 into the route 10.0.0.0/8 with metric 2.
l Interface-based summarization
Users can specify a summary address.
For example, users can configure a RIP-enabled interface to summarize the route
10.1.1.0/24 with metric 2 and route 10.2.2.0/24 with metric 3 into the route 10.1.0.0/16
with metric 2.
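Process-based classful summarization can be sketched in Python with the standard ipaddress module. The sketch assumes that the summary route takes the lowest metric of the routes it covers, as the examples above suggest, and it handles only Class A, B, and C addresses. The function names are hypothetical.

```python
import ipaddress

def classful_network(prefix):
    """Return the natural (Class A, B, or C) network that contains prefix."""
    network = ipaddress.ip_network(prefix)
    first_octet = int(network.network_address) >> 24
    if first_octet < 128:
        length = 8       # Class A
    elif first_octet < 192:
        length = 16      # Class B
    else:
        length = 24      # Class C (Class D and E are not handled in this sketch)
    if network.prefixlen > length:
        return network.supernet(new_prefix=length)
    return network

def summarize(routes):
    """Summarize (prefix, metric) pairs into classful routes with the lowest metric."""
    summary = {}
    for prefix, metric in routes:
        key = str(classful_network(prefix))
        summary[key] = min(metric, summary.get(key, 16))
    return summary

print(summarize([("10.1.1.0/24", 2), ("10.2.2.0/24", 3)]))  # {'10.0.0.0/8': 2}
```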
Routers with a distributed architecture support the RIP Hot Standby (HSB) feature. RIP backs
up data from the Active Main Board (AMB) to the Standby Main Board (SMB). Whenever
the AMB fails, the SMB becomes active. Therefore, RIP is not affected.
RIP backs up only RIP configurations. During a Graceful Restart (GR), a RIP-enabled device
resends a routing request to neighbors to synchronize its route database.
NOTE
The ATN can be used only as a GR Helper, not as a GR Restarter.
Poison reverse Poison reverse allows a RIP-enabled interface to set the metric of the
route that it learns from a neighbor to 16 (indicating that the route is
unreachable) and then send the route back. After receiving this route, the
neighbor deletes the useless route from its routing table, which prevents
loops.
Split horizon Split horizon prevents a RIP-enabled interface from sending back the
routes it learns, which reduces bandwidth consumption and prevents
routing loops.
8.4 RIPng
8.4.1 Introduction
Definition
RIP next generation (RIPng) is an extension to RIP Version 2 (RIPv2) on IPv6 networks.
Most RIP concepts apply to RIPng.
RIPng is a distance-vector routing protocol, which measures the distance (metric or cost) to
the destination host by the hop count. In RIPng, the hop count from a device to its directly
connected network is 0, and the hop count from a device to a network that is reachable
through another device is 1. When the hop count is greater than or equal to 16, the destination
network or host is considered unreachable.
To be applied on IPv6 networks, RIPng makes the following changes to RIP:
l UDP port number: RIPng uses UDP port number 521 to send and receive routing
information.
l Multicast address: RIPng uses FF02::9 as the link-local multicast address of a RIPng
device.
l Prefix length: RIPng uses a 128-bit (the mask length) prefix in the destination address.
l Next hop address: RIPng uses a 128-bit IPv6 address.
l Source address: RIPng uses a link-local address (FE80::/10) as the source address to
send RIPng update packets.
Purpose
RIPng is an extension to RIP for support of IPv6.
8.4.2 Principles
RIPng is an extension to RIPv2 on IPv6 networks and uses the same timers as RIPv2. RIPng
supports split horizon, poison reverse, and triggered update, which prevents routing loops.
---------
l Next hop RTE: It defines the IPv6 address of the next hop and is located before a group
of IPv6-prefix RTEs that have the same next hop.
l IPv6-prefix RTE: It describes the destination IPv6 address and the cost in the RIPng
routing table and is located after a next hop RTE. A next hop RTE can be followed by
multiple different IPv6-prefix RTEs.
8.4.2.2 Timers
RIPng uses the following three timers:
l Update timer: This timer periodically triggers update packet transmission. By default, the
interval at which update packets are sent is 30s. This timer is used to synchronize RIPng
routes on the network.
l Age timer: If a RIPng device does not receive any update packet from its neighbor
before a route expires, the RIPng device considers the route to its neighbor unreachable.
l Garbage-collect timer: If no packet is received to update an unreachable route after the
Age timer expires, this route is deleted from the RIPng routing table.
The advertisement of RIPng routing updates is periodically triggered by the update timer with
default value 30 seconds. Each routing entry is associated with the age timer and garbage-
collect timer. Each time a route is learned and added to the routing table, the age timer is
started. If no update packet is received from the neighbor within 180 seconds, the metric of
the route is set to 16, and the garbage-collect timer is started. If no update packet is received
within the next 120 seconds, the garbage-collect timer expires and the route is deleted.
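The timer behavior described above can be sketched as a small state function. This is a minimal illustration, not the device implementation: the names `Route`, `route_state`, and `effective_metric` are hypothetical, and time is modeled as plain seconds since the last update was received.

```python
from dataclasses import dataclass

AGE_TIMEOUT = 180       # seconds without an update before the route is marked unreachable
GARBAGE_TIMEOUT = 120   # further seconds before the route is deleted
UNREACHABLE = 16        # metric that marks a route as unreachable


@dataclass
class Route:
    prefix: str
    metric: int
    last_update: float  # time at which the most recent update was received


def route_state(route: Route, now: float) -> str:
    """Return 'valid', 'unreachable', or 'deleted' for the route at time `now`."""
    idle = now - route.last_update
    if idle < AGE_TIMEOUT:
        return "valid"
    if idle < AGE_TIMEOUT + GARBAGE_TIMEOUT:
        return "unreachable"   # age timer expired, garbage-collect timer running
    return "deleted"           # garbage-collect timer expired


def effective_metric(route: Route, now: float) -> int:
    """Metric to use for the route: 16 once the age timer has expired."""
    return route.metric if route_state(route, now) == "valid" else UNREACHABLE
```

With the default values, a route that receives no update is usable for 180 seconds, is kept with metric 16 for a further 120 seconds, and is then removed.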
Figure 8-19 (ATN A and ATN B; network 2001:DB8:1::/64)
On the network shown in Figure 8-19, after ATN B sends a route to network 2001:DB8:1::/64
to ATN A, ATN A does not send the route back to ATN B.
Figure 8-20 (ATN A and ATN B; route 2001:DB8:1::/64 shown with metric=1 and with metric=16)
As shown in Figure 8-20, if poison reverse is not configured, ATN B sends ATN A a route
that was learned from ATN A. The metric of the route from ATN A to network
2001:DB8:1::/64 is 1. If this route becomes unreachable and ATN B, not having received
an update packet from ATN A, keeps sending ATN A the route to network
2001:DB8:1::/64, a routing loop occurs.
With poison reverse, after receiving a route from ATN B, ATN A sends ATN B a message that
the route is unreachable. ATN B then no longer learns the reachable route from ATN A, which
prevents routing loops.
If both poison reverse and split horizon are configured, only poison reverse takes effect.
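The difference between split horizon and poison reverse can be sketched as a filter applied when an update is built for one interface. This is an illustrative sketch only: the function `build_update`, its parameters, and the interface names are hypothetical, and routes are modeled as simple tuples.

```python
UNREACHABLE = 16


def build_update(routes, out_interface, split_horizon=True, poison_reverse=False):
    """Build the (prefix, metric) list to advertise on out_interface.

    routes: iterable of (prefix, metric, learned_from_interface) tuples.
    If both options are enabled, only poison reverse takes effect.
    """
    update = []
    for prefix, metric, learned_from in routes:
        if learned_from == out_interface:
            if poison_reverse:
                # Send the route back, but mark it unreachable.
                update.append((prefix, UNREACHABLE))
                continue
            if split_horizon:
                # Do not send the route back at all.
                continue
        update.append((prefix, metric))
    return update
```

For a route learned on an interface, split horizon omits the route from updates sent on that interface, whereas poison reverse includes it with metric 16. This is why poison reverse consumes more bandwidth than split horizon.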
Figure 8-21 (ATN C with interfaces E0 and S0; network 2001:DB8:11::; the network to 2001:DB8:1:: fails)
On the network shown in Figure 8-21, if network 2001:DB8:1:: is unreachable, ATN C learns
the information first. By default, a RIPng-enabled device sends Update packets to its
neighbors every 30 seconds. If ATN C receives an Update packet from ATN B within 30s
when ATN C is still waiting to send update packets, ATN C learns the incorrect route to
network 2001:DB8:1:: from ATN B. In this case, the next hops of the routes from ATN B and
ATN C to network 2001:DB8:1:: are ATN C and ATN B, respectively, which results in a
routing loop. If ATN C sends an Update packet to ATN B immediately after it detects a
network fault, ATN B can rapidly update its routing table, which prevents routing loops.
In addition, if the next hop of a route becomes unavailable due to a link failure, the local ATN
sets the metric of the route to 16 and then advertises the route immediately to its neighbors.
This process is called route poisoning.
8.4.2.7 Multi-process
RIPng supports multi-process and multi-instance, which simplifies network management and
improves service control efficiency. Multi-process allows a set of interfaces to be associated
with a specific RIPng process, which ensures that the specific RIPng process performs all the
protocol operations only on this set of interfaces. Therefore, multiple RIPng processes can run
on one router, and each process manages a unique set of interfaces. In addition, the routing
data of each RIPng process is independent; however, processes can import routes from each
other.
Routers with a distributed architecture support RIPng Hot Standby (HSB). In the RIPng HSB
process, RIPng backs up RIPng configuration from the Active Main Board (AMB) to the
Standby Main Board (SMB). Whenever the AMB fails, the SMB becomes active.
After the SMB is activated, RIPng resends a request to neighbors to synchronize the route
database. Therefore, RIPng is not affected.
NOTE
The ATN can function as a GR helper, not a GR restarter.
Poison reverse Poison reverse allows a RIPng-enabled interface to set the metric of the
route that it learns from a neighbor to 16 (indicating that the route is
unreachable) and then send the route back. After receiving this route, the
neighbor deletes the useless route from its routing table, which prevents
loops.
Split horizon Split horizon prevents a RIPng-enabled interface from sending back the
routes it learns, which reduces bandwidth consumption and prevents
routing loops.
8.5 IS-IS
Definition
Intermediate System to Intermediate System (IS-IS) is a link-state routing protocol that uses
the shortest path first (SPF) algorithm to calculate routes. IS-IS is an Interior Gateway
Protocol (IGP) and is used within an autonomous system (AS).
IS-IS was initially designed by the International Organization for Standardization (ISO) for its
Connectionless Network Protocol (CLNP).
To support IP routing, the Internet Engineering Task Force (IETF) extended and modified IS-
IS in RFC 1195. This modification enables IS-IS to be applied to TCP/IP and Open Systems
Interconnection (OSI) environments. This type of IS-IS is called Integrated IS-IS or Dual IS-
IS.
The term IS-IS used in this document refers to Integrated IS-IS, unless otherwise stated.
NOTE
If IS-IS IPv4 and IS-IS IPv6 implement a feature in the same way, details are not provided in this
chapter. For details about the implementation differences, see the appendix 8.5.4 Appendixes.
Purpose
The United States Government Open Systems Interconnection Profile (GOSIP) treated TCP/IP
as an interim protocol suite that would eventually be replaced by the OSI suite.
All routing protocols except IS-IS support TCP/IP only. IS-IS can apply to both TCP/IP and
OSI networks and supports dynamic routing information exchange on an IP network.
Benefits
IS-IS has become a scalable, powerful, and easy-to-use IGP after many years of development.
It has the following advantages:
l Implements routing in a routing domain.
l Supports fast network convergence when a fault occurs on a network.
l Provides loop-free routes.
l Improves network stability.
l Supports network scalability.
l Improves network resource usage.
Because of these advantages, carriers deploy IS-IS widely on live networks to
guarantee network stability, security, and scalability.
8.5.2 Principles
Development of IS-IS
CLNP is a Layer 3 protocol in the OSI model proposed by the ISO. IS-IS was initially designed
by the ISO and is used as a routing protocol based on CLNP addressing.
(Figure: OSI network layer, showing CONP/CMNS, CLNP/CLNS, IS-IS, and ES-IS)
OSI adopts systemized (or hierarchical) addressing. The services at the transport layer in OSI
can be addressed through the Network Service Access Point (NSAP).
With the growing popularity of TCP/IP, the IETF extended and modified IS-IS in RFC 1195 to
support IP routing. This enables IS-IS to be applied to TCP/IP and OSI environments. This type
of IS-IS is called Integrated IS-IS or Dual IS-IS.
The lengths of the IDP and the DSP are variable. The length of the NSAP varies from 8 bytes
to 20 bytes.
(Figure: NSAP structure, showing the IDP, the DSP, and the area address)
8 bytes to 20 bytes. When configuring IS-IS on a device, you can configure only a NET
instead of an NSAP.
In general, an IS-IS process is configured with only one NET. When areas need to be
redefined, for example, areas need to be combined or an area needs to be divided into
sub-areas, you can configure multiple NETs.
An IS-IS process can be configured with a maximum of three area addresses; therefore, a
maximum of three NETs can be configured. When configuring multiple NETs, ensure
that their system IDs are the same.
For example, in NET ab.cdef.1234.5678.9abc.00, the area is ab.cdef, the system ID is
1234.5678.9abc, and the SEL is 00.
NOTE
The routers in the same area must have the same area address.
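The NET example above can be reproduced with a short parser. This is a sketch under stated assumptions: the function name `parse_net` and the returned dictionary layout are hypothetical, and the parser relies on the facts given in the text (a NET is 8 to 20 bytes long, and its SEL is 00) plus the fixed 6-byte system ID shown in the example.

```python
def _dotted(hex_digits: str) -> str:
    """Group hex digits in fours from the right, for example 'abcdef' -> 'ab.cdef'."""
    groups = []
    while hex_digits:
        groups.insert(0, hex_digits[-4:])
        hex_digits = hex_digits[:-4]
    return ".".join(groups)


def parse_net(net: str) -> dict:
    """Split a NET into area address, system ID, and SEL."""
    digits = net.replace(".", "").lower()
    if len(digits) % 2 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError("a NET must consist of whole bytes in hexadecimal")
    if not 8 <= len(digits) // 2 <= 20:
        raise ValueError("a NET must be 8 to 20 bytes long")
    sel = digits[-2:]            # last byte
    system_id = digits[-14:-2]   # 6 bytes before the SEL
    area = digits[:-14]          # everything else
    if sel != "00":
        raise ValueError("the SEL of a NET must be 00")
    return {"area": _dotted(area), "system_id": _dotted(system_id), "sel": sel}
```

Calling `parse_net("ab.cdef.1234.5678.9abc.00")` yields the area ab.cdef, the system ID 1234.5678.9abc, and the SEL 00, matching the example in the text.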
As shown in Figure 8-25, most fields in a P2P IIH are the same as those in a LAN IIH.
The P2P IIH does not have the priority and LAN ID fields, but has a local circuit ID
field. The local circuit ID indicates the local link ID.
l LSP packet format
LSPs are used to exchange link-state information. There are two types of LSPs: Level-1
LSPs and Level-2 LSPs. Level-1 IS-IS transmits Level-1 LSPs; Level-2 IS-IS transmits
Level-2 LSPs; and Level-1-2 IS-IS can transmit both Level-1 and Level-2 LSPs.
Level-1 and Level-2 LSPs have the same format, as shown in Figure 8-26.
(Figure: networking with ATN A, ATN B, ATN C, ATN D, and ATN E, with the Overload marking)
l CLV
The variable length fields in a PDU are the multiple Code-Length-Values (CLVs). A
CLV is also called Type-Length-Value (TLV). Figure 8-30 shows the CLV format.
8 Padding IIH
The CLVs with codes ranging from 1 to 10 are defined in ISO 10589 (CLV 3 and CLV 5
are not listed in the table), and the other CLVs are defined in RFC 1195.
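Because the CLV format figure is not reproduced here, the following sketch may help show how the variable-length part of a PDU is walked. It assumes the standard IS-IS layout of a 1-byte code, a 1-byte length, and then that many bytes of value; the function name `parse_clvs` is hypothetical.

```python
def parse_clvs(data: bytes):
    """Split the variable-length part of a PDU into (code, value) pairs."""
    clvs = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated CLV header")
        code = data[offset]          # 1-byte code (type)
        length = data[offset + 1]    # 1-byte length of the value field
        value = data[offset + 2:offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated CLV value")
        clvs.append((code, value))
        offset += 2 + length
    return clvs
```

A receiver that does not recognize a code can still skip the CLV by using the length field, which is what allows new CLVs to be added without changing the PDU header.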
IS-IS Areas
l Two-Level structure
To support large-scale routing networks, IS-IS adopts a two-level structure in a routing
domain. A large domain can be divided into areas. In general, Level-1 routers are located
in an area, Level-2 routers are located between areas, and Level-1-2 routers are located
between Level-1 and Level-2 routers.
l Level-1 router
A Level-1 router manages intra-area routing. It establishes neighbor relationships with
only the Level-1 and Level-1-2 routers in the same area and maintains a Level-1 LSDB.
The LSDB contains routing information in the local area. A packet to a destination
beyond this area is forwarded to the nearest Level-1-2 router.
l Level-2 router
A Level-2 router manages inter-area routing. It can establish neighbor relationships with
Level-2 routers or Level-1-2 routers in other areas. It maintains a Level-2 LSDB which
contains inter-area routing information.
All Level-2 routers form the backbone network of the routing domain. They are
responsible for communications between areas. The Level-2 routers in the routing
domain must be contiguous to ensure the continuity of the backbone network. Only
Level-2 routers can exchange data packets or routing information with routers beyond
the area.
l Level-1-2 router
A router that belongs to both a Level-1 area and a Level-2 area is called a Level-1-2
router. It can establish Level-1 neighbor relationships with Level-1 routers and Level-1-2
routers in the same area. It can also establish Level-2 neighbor relationships with Level-2
routers and Level-1-2 routers in other areas. Level-1 routers can be connected to other
areas only through Level-1-2 routers.
A Level-1-2 device maintains two LSDBs: a Level-1 LSDB and a Level-2 LSDB. The
Level-1 LSDB is used for intra-area routing, whereas the Level-2 LSDB is used for
inter-area routing.
NOTE
Level-1 routers in different areas cannot establish neighbor relationships. Level-2 routers can
establish neighbor relationships with each other, regardless of the areas to which the Level-2
routers belong.
l Interface level
A Level-1-2 device may need to establish only a Level-1 adjacency with a neighbor and
establish only a Level-2 adjacency with another neighbor. In this case, you can set the
level of an interface to control the setting of adjacencies on the interface. Specifically,
only Level-1 adjacencies can be established on a Level-1 interface, and only Level-2
adjacencies can be established on a Level-2 interface.
Figure 8-31 shows a network that runs IS-IS. The network is similar to an OSPF network
with multiple areas. The entire backbone area contains all routers in area 1 and Level-1-2
routers in other areas.
Figure 8-31 (IS-IS topology: backbone Area1 with Level-2 routers, connected to Area2, Area3, Area4, and Area5, which contain Level-1 and Level-1-2 routers)
Figure 8-32 shows another type of IS-IS topology. All the contiguous Level-1-2 and Level-2
routers form the backbone area of IS-IS. In this topology, Level-2 routers belong to different
areas, and Level-1-2 routers also belong to different areas. No area is specifically defined as
the backbone area.
Figure 8-32 (IS-IS topology: Area1, Area2, Area3, and Area4 containing Level-1, Level-1-2, and Level-2 routers, with no dedicated backbone area)
NOTE
For OSPF, inter-area routes are forwarded by the backbone area, and the SPF algorithm is
used only in the same area. For IS-IS, both Level-1 and Level-2 routes are calculated through
the SPF algorithm to generate the Shortest Path Tree (SPT).
(Figure: Level-1-2 routers on a broadcast network, showing L1 adjacencies, L2 adjacencies, the L1 DIS, and the L2 DIS)
A DIS is used to create and update pseudo nodes. It also generates LSPs of the pseudo nodes.
The LSPs describe the available routers on the network.
The pseudo node is used to simulate the virtual node on the broadcast network and is not a
real router. In IS-IS, a pseudo node is identified by the system ID of the DIS and the 1-byte
Circuit ID (its value is not 0).
With pseudo nodes, the network topology is simplified, and LSPs are shortened. When the
network changes, the number of generated LSPs is reduced. Therefore, the SPF consumes
fewer resources.
NOTE
On IS-IS broadcast networks, although all the routers set up adjacencies with each other, the LSDBs are
synchronized by the DISs.
(Figure: ATN A, ATN B, ATN C, and ATN D on a broadcast network)
ATN A, ATN B, ATN C, and ATN D are Level-2 routers. ATN A is newly added to the
broadcast network. The process of establishing the neighbor relationship between ATN A
and ATN C or between ATN A and ATN D is similar to that between ATN A and ATN
B.
Figure 8-35 shows the process of establishing the neighbor relationship between ATN
A and ATN B.
Figure 8-35 (exchange of L2 LAN IIH packets between ATN A and ATN B)
ATN A broadcasts a Level-2 LAN IS-IS Hello PDU. After receiving the PDU, ATN B
sets its neighbor status with ATN A to Initial. Then, ATN B responds to ATN A with a
Level-2 LAN IIH packet indicating that ATN A is a neighbor of ATN B. On receiving
the IIH packet, ATN A sets its neighbor status with ATN B to Up.
The network is a broadcast network, and a DIS needs to be elected. After the neighbor
relationship is established, routers wait for two Hello intervals before
electing the DIS. The IIH packets exchanged by the routers contain the Priority field. The
router with the highest priority is elected as the DIS. If the routers have the same priority,
the router with the largest interface MAC address is elected as the DIS.
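The DIS election rule above amounts to picking the maximum of a (priority, MAC address) pair. The following sketch illustrates this; the function name `elect_dis`, the tuple layout, and the sample MAC addresses are hypothetical.

```python
def elect_dis(candidates):
    """Return the name of the DIS.

    candidates: list of (name, priority, mac) tuples, where mac is a string
    such as '00-e0-fc-00-00-01'. The highest priority wins; if priorities
    are equal, the largest interface MAC address wins.
    """
    def key(candidate):
        _, priority, mac = candidate
        mac_value = int(mac.replace("-", "").replace(":", ""), 16)
        return (priority, mac_value)

    return max(candidates, key=key)[0]
```

For example, among three routers with priority 64, the router with the largest MAC address becomes the DIS, but a router that joins with priority 100 wins regardless of its MAC address.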
l Establishment of a neighbor relationship on a P2P link
The establishment of a neighbor relationship on a P2P link is different from that on a
broadcast link. On a P2P link, the establishment of a neighbor relationship can be
conducted in 2-way or 3-way handshake mode.
– 2-way mode
Upon receiving an IS-IS Hello packet, a router unidirectionally sets up the neighbor
relationship.
– 3-way mode
A neighbor relationship is established after IS-IS Hello PDUs are sent three
times, which is similar to the establishment of a neighbor relationship on a
broadcast link.
NOTE
For details about the IS-IS 3-way handshake mechanism, see the IS-IS 3-Way Handshake chapter.
l Only neighboring routers of the same level can set up the neighbor relationship with each
other.
l For Level-1 routers, their area IDs must be the same.
l Routers must be on the same network segment.
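The level and area rules above can be sketched as a function that returns which adjacency levels two directly connected routers can form. This is a simplified illustration: the function name `adjacency_levels` and the level labels are hypothetical, each router is assumed to have a single area address, and the network-segment check is not modeled.

```python
def adjacency_levels(level_a, area_a, level_b, area_b):
    """Return the set of adjacency levels ('L1', 'L2') the two routers can form.

    level_a and level_b are 'L1', 'L2', or 'L1/2'.
    """
    a_l1, a_l2 = level_a in ("L1", "L1/2"), level_a in ("L2", "L1/2")
    b_l1, b_l2 = level_b in ("L1", "L1/2"), level_b in ("L2", "L1/2")

    levels = set()
    # Level-1 adjacencies require the same area.
    if a_l1 and b_l1 and area_a == area_b:
        levels.add("L1")
    # Level-2 adjacencies are formed regardless of the area.
    if a_l2 and b_l2:
        levels.add("L2")
    return levels
```

Two Level-1-2 routers in the same area can therefore form both a Level-1 and a Level-2 adjacency, whereas a Level-1 router and a Level-2 router cannot become neighbors even in the same area.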
Network types of IS-IS interfaces on both ends of a link must be consistent. Otherwise, a
neighbor relationship cannot be established. By simulating Ethernet interfaces as P2P
interfaces, you can establish a neighbor relationship on a P2P link.
IS-IS runs on the data-link layer and was initially designed for CLNP. Therefore, the
establishment of an IS-IS neighbor relationship is, in principle, not related to IP addresses. In the
implementation of a device, however, IS-IS runs only over the IP layer. Therefore, IS-IS needs to check
the IP address of its neighbor. If secondary IP addresses are assigned to the interfaces, the
routers can still set up the IS-IS neighbor relationship as long as either the primary IP
addresses or the secondary IP addresses are on the same network segment.
When IP address unnumbered is not configured, if the IP address of a neighbor and the
address of an interface through which the local device receives packets are not on the same
network segment, the neighbor relationship cannot be set up, preventing IP unreachability.
The neighbor relationship can be set up if you prevent the device from checking the IP
addresses contained in received Hello PDUs.
l For P2P interfaces, you can prevent them from checking IP addresses.
l For Ethernet interfaces, simulate them as P2P interfaces and then prevent them from
checking IP addresses.
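The IP address check described above can be sketched with the Python standard library `ipaddress` module. The function name `hello_ip_check` and its parameters are hypothetical; the sketch only illustrates the rule that a neighbor address carried in a Hello PDU must fall in the segment of a primary or secondary address of the receiving interface, unless the check is disabled.

```python
import ipaddress


def hello_ip_check(local_interface_addrs, hello_addrs, check_enabled=True):
    """Return True if the neighbor relationship may be set up.

    local_interface_addrs: primary and secondary addresses of the receiving
        interface, with prefix length, for example ['10.1.1.1/24'].
    hello_addrs: IP addresses carried in the received Hello PDU.
    check_enabled: False models an interface configured not to check the
        IP addresses in received Hello PDUs.
    """
    if not check_enabled:
        return True
    networks = [ipaddress.ip_interface(a).network for a in local_interface_addrs]
    return any(
        ipaddress.ip_address(addr) in net
        for addr in hello_addrs
        for net in networks
    )
```

With the check enabled, a neighbor at 10.2.1.2 is rejected on an interface configured as 10.1.1.1/24, but accepted once the check is disabled on that interface.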
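(Figure: LSDB synchronization on a broadcast link between ATN A, ATN B (DIS), and ATN C. ATN C sends LSP ATN-C.00-00; the DIS sends a CSNP listing ATN-A.00-00, ATN-B.00-00, ATN-B.01-00, and ATN-C.00-00; ATN C sends a PSNP for ATN-A.00-00, ATN-B.00-00, and ATN-B.01-00; the DIS then sends these LSPs.)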
– Newly added ATN C sends Hello packets to establish neighbor relationships with
the other routers in the broadcast domain. For details, see "Establishment of a
neighbor relationship on a broadcast link."
– After setting up the neighbor relationships with other routers, ATN C sends its LSP
to the following multicast addresses after the LSP timer expires:
Level-1: 01-80-C2-00-00-14
Level-2: 01-80-C2-00-00-15
Then, all neighbors on the network can receive the LSP.
– The DIS on the network segment adds the LSP received from ATN C to its LSDB.
After the CSNP timer expires, the DIS sends CSNPs to synchronize the LSDBs on
the network. By default, CSNPs are sent at an interval of 10 seconds.
– After receiving the CSNPs from the DIS, ATN C checks its LSDB and sends a
PSNP to request the LSPs it does not have.
– After receiving the PSNP, the DIS sends the required LSPs to synchronize LSDBs.
l Process of updating the LSDB of the DIS
– When the DIS receives an LSP, it searches the LSDB for related records. If the DIS
does not find the LSP in its LSDB, the DIS adds the LSP to its LSDB and
broadcasts the new LSDB.
– If the sequence number of the received LSP is greater than that of the local LSP in
the LSDB, the DIS replaces the local LSP with the received LSP in the LSDB, and
broadcasts the new LSDB.
– If the sequence number of the received LSP is less than that of the local LSP in the
LSDB, the DIS sends the local LSP to the inbound interface.
– If the sequence number of the received LSP is equal to that of the local LSP in the
LSDB, the DIS checks whether the Remaining Lifetime of the received LSP is 0. If
the Remaining Lifetime of the received LSP is not 0 and the Remaining Lifetime of
the local LSP in the LSDB is 0, the DIS replaces the local LSP with the received
LSP and broadcasts the new LSDB. If the Remaining Lifetime of the received LSP
is 0 and the Remaining Lifetime of the local LSP in the LSDB is not 0, the DIS
sends the local LSP in the LSDB to the inbound interface.
– If the sequence numbers of the received LSP and local LSP in the LSDB are the
same, and the Remaining Lifetimes of the two LSPs are not 0, the DIS compares the
checksum of the two LSPs. If the checksum of the received LSP is greater than that
of the local LSP in the LSDB, the DIS replaces the local LSP with the received LSP
and broadcasts the new LSDB. If the checksum of the received LSP is less than that
of the local LSP in the LSDB, the DIS sends the local LSP in the LSDB to the
inbound interface.
– If the sequence numbers of the received LSP and local LSP in the LSDB are the
same, the Remaining Lifetimes of the two LSPs are not 0, and the checksums of the
two LSPs are the same, the DIS does not forward the received LSP.
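The comparison rules listed above can be summarized as a decision function. This is a sketch that follows the rules exactly as stated in this section, not a reference implementation: the names `LspHeader` and `handle_received_lsp` and the three result strings are hypothetical, and the case in which both Remaining Lifetimes are 0 is not covered by the text, so the sketch simply ignores the received LSP in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LspHeader:
    seq: int                 # sequence number
    remaining_lifetime: int  # in seconds
    checksum: int


def handle_received_lsp(received: LspHeader, local: Optional[LspHeader]) -> str:
    """Return 'install' (store and flood), 'send_local', or 'ignore'."""
    if local is None:
        return "install"                  # LSP not yet in the LSDB
    if received.seq > local.seq:
        return "install"
    if received.seq < local.seq:
        return "send_local"               # reply with the local copy

    # Sequence numbers are equal: compare the Remaining Lifetime values.
    rx_zero = received.remaining_lifetime == 0
    local_zero = local.remaining_lifetime == 0
    if not rx_zero and local_zero:
        return "install"
    if rx_zero and not local_zero:
        return "send_local"
    if rx_zero and local_zero:
        return "ignore"                   # assumption: not covered by the text

    # Both lifetimes are non-zero: compare the checksums.
    if received.checksum > local.checksum:
        return "install"
    if received.checksum < local.checksum:
        return "send_local"
    return "ignore"                       # identical LSP, nothing to do
```

The order of the tests matters: the sequence number is decisive, the Remaining Lifetime is consulted only when the sequence numbers match, and the checksum only when neither LSP has a Remaining Lifetime of 0.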
l Synchronizing the LSDB on a P2P link
(Figure: LSDB synchronization on a P2P (PPP) link between ATN A and ATN B. LSP ATN-A.00-00 is sent and acknowledged by a PSNP for ATN-A.00-00; if retransmission times out, the LSP is resent and again acknowledged by a PSNP.)
a. When the neighbor relationship is set up for the first time, a router sends a CSNP to
its neighbor. If the LSDB of the neighbor is not synchronized with the CSNP, the
neighbor sends a PSNP to request the required LSPs.
b. The router sends the required LSP to the neighbor and starts the LSP retransmission
timer. The router then waits for a PSNP from the neighbor as an acknowledgement
of receiving the LSP.
c. If the router does not receive the PSNP from the neighbor after the LSP
retransmission timer expires, it resends the LSP.
NOTE
– When an IS-IS process is created, it can be associated with a VPN instance. Then,
the IS-IS process belongs to the VPN instance and processes events only in the
VPN instance. If the VPN instance is deleted, the IS-IS process is also deleted.
For easy management and effective control, IS-IS supports multi-process and multi-instance.
In the scenario where IS-IS is applied to users on private networks, after a VPN is created,
interfaces bound to the VPN and routes in the VPN are isolated from other VPNs and public
network data. In this case, you can adopt IS-IS multi-instance to deploy IS-IS in the VPN.
For the routers that support the VPN, each IS-IS process is associated with a specific VPN
instance. All the interfaces attached to an IS-IS process, therefore, must be associated with the
VPN instance with which this IS-IS process is associated.
At present, VPN instances are maintained by the VPN module. IS-IS multi-instance is
implemented by associating an IS-IS process with a VPN instance when creating the IS-IS
process.
In most cases, intra-area routes are managed by Level-1 devices. All Level-2 and Level-1-2
devices form a contiguous backbone area. A Level-1 area can only be connected to the
backbone area, not to another Level-1 area.
The routing information of a Level-1 area is advertised to a Level-2 area through a Level-1-2
device; therefore, Level-1-2 and Level-2 devices know the routing information of the entire
IS-IS domain. A Level-2 device, by default, does not inform a Level-1 area of the learned
routing information of either the backbone area or other Level-1 areas. Therefore, the Level-1
devices do not know the routing information beyond the area. As a result, the Level-1 devices
cannot select the optimal route to the destination beyond the area.
In IS-IS route leaking, you can define access control lists (ACLs), routing policies, and tags
on Level-1-2 routers so that these routers select eligible routes about other Level-1 areas and
the backbone area. The Level-1-2 routers can then advertise to their Level-1 areas these
eligible routes.
(Figure: route leaking networking, showing ATN A (Level-1, 1.1.1.1/24), ATN C (Level-1-2, 1.1.1.2/24), link costs 10 and 50, and address 4.4.4.2/24)
l In the figure, ATN A, ATN B, ATN C, and ATN D belong to area 10. ATN A and ATN B
are Level-1 routers; ATN C and ATN D are Level-1-2 routers.
l ATN E and ATN F are Level-2 routers and belong to area 20.
The optimal route for ATN A to send a packet to ATN F is ATN A -> ATN B -> ATN D ->
ATN E -> ATN F, which has a cost of 40. However, the selected route that the packet traverses
is ATN A -> ATN C -> ATN E -> ATN F, which has a cost of 70. This route is not the optimal
route from ATN A to ATN F.
Because ATN A does not detect the routes outside the local area, ATN A sends the packets to
other network segments through the default route generated by the nearest Level-1-2 router.
To ensure that the optimal route is selected, you can enable route leaking on the Level-1-2
routers ATN C and ATN D.
l Incremental SPF (I-SPF): recalculates only the routes of the changed nodes rather than
all the nodes when the network topology changes, which speeds up route calculation.
l Partial Route Calculation (PRC): calculates only the changed routes when the routes on
the network change.
l LSP fast flooding: speeds up LSP flooding.
l Intelligent timer: is applicable to LSP generation and SPF calculation.
The first timeout period of the timer is fixed. If an event triggers the timer before the set
timer expires, the next timeout period increases.
I-SPF
In ISO 10589, the Dijkstra algorithm is used to calculate routes. When a node changes on the
network, this algorithm is used to recalculate all routes. The calculation takes a long time and
consumes too many CPU resources, which affects the convergence speed.
I-SPF improves the algorithm. Except for the first time, only the nodes that have changed
rather than all nodes on the network are calculated. The SPT generated using I-SPF is the
same as that generated using the Dijkstra algorithm. This significantly decreases CPU usage
and speeds up network convergence.
PRC
Similar to I-SPF, PRC calculates only the changed routes, but it does not calculate the shortest
path. It updates routes based on the SPT calculated by I-SPF.
In route calculation, a leaf represents a route, and a node represents a router. If the SPT
changes after I-SPF calculation, PRC calculates all the leaves only on the changed node. If the
SPT remains unchanged, PRC processes only the changed leaves.
For example, if IS-IS is enabled on an interface of a node, the SPT calculated by I-SPF
remains unchanged. PRC updates only the routes of this interface, consuming fewer CPU
resources.
PRC working with I-SPF further improves network convergence performance and replaces
the SPF algorithm.
NOTE
With LSP fast flooding, when a device receives newer LSPs, it floods them out, up to
the specified number, before calculating routes, which speeds up network convergence.
Intelligent Timer
Although the route calculation algorithm is improved, the long interval for triggering route
calculation affects the convergence speed. Frequent network changes also consume too many
CPU resources. The SPF intelligent timer addresses these problems.
In most cases, an IS-IS network running normally is stable. Frequent changes on a network
are rather rare, and IS-IS does not calculate routes frequently. Therefore, a short period
(within milliseconds) can be configured as the first interval for route calculation. If the
network topology changes frequently, the interval set by the intelligent timer increases with
the number of calculations, which reduces CPU consumption.
The LSP generation intelligent timer is similar to the SPF intelligent timer. When the LSP
generation intelligent timer expires, the system generates a new LSP based on the current
topology. The original mechanism uses a timer with fixed intervals, which results in slow
convergence and high CPU consumption. Therefore, the LSP generation timer is designed as
an intelligent timer to respond to emergencies (for example, the interface goes Up or Down)
quickly and speed up network convergence. In addition, when the network changes
frequently, the interval for the intelligent timer becomes longer to reduce CPU consumption.
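The behavior of an intelligent timer can be sketched as a short first interval followed by growing intervals up to a ceiling. The text states only that the interval increases with the number of triggers; the doubling rule, the function name `intelligent_intervals`, and the parameter names below are assumptions made for illustration.

```python
def intelligent_intervals(init_ms, incr_ms, max_ms, triggers):
    """Return the interval (in milliseconds) applied to each consecutive trigger.

    init_ms: fixed first interval.
    incr_ms: interval for the second trigger; assumed to double afterwards.
    max_ms:  upper limit of the interval.
    """
    intervals = []
    for n in range(triggers):
        if n == 0:
            interval = init_ms
        else:
            interval = min(incr_ms * 2 ** (n - 1), max_ms)
        intervals.append(interval)
    return intervals
```

Under these assumptions, `intelligent_intervals(50, 200, 5000, 7)` gives 50, 200, 400, 800, 1600, 3200, and 5000 ms: a single change is handled within milliseconds, whereas a flapping network is throttled to one calculation every 5 seconds.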
You can assign the highest convergence priority to routes for key services so that these routes
converge first. This decreases the impact on key services and improves network reliability.
As defined in RFC 3786, virtual system IDs can be configured, and virtual LSPs that carry
routing information can be generated for IS-IS.
Terms
l Originating system: is a router that runs the IS-IS protocol. A single IS-IS process can
advertise its LSPs as if it were multiple "virtual" routers, whereas the originating system
refers to the real IS-IS process.
l Normal system ID: is the system ID of the originating system.
l Additional system ID: assigned by network administrators, is used to generate additional
or extended LSP fragments. Up to 256 additional or extended LSP fragments can be
generated. Like a normal system ID, an additional system ID must be unique in the
routing domain.
l Virtual system: identified by an additional system ID, is used to generate extended LSP
fragments. These fragments carry the additional system IDs in their LSP IDs.
Principles
IS-IS LSP fragments are identified by the LSP Number field in their LSP IDs. The LSP
Number field is 1 byte. An IS-IS process can generate a maximum of 256 fragments. A 1497-
byte LSP can carry about 30,000 routes. With fragment extension, more information can be
carried.
With additional system IDs (up to 50 virtual systems), an IS-IS process can generate a
maximum of 13056 LSP fragments.
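The figure of 13056 follows from the 1-byte LSP Number field: each system ID, whether normal or additional, can identify 256 fragments, and the originating system counts in addition to its 50 virtual systems. A minimal check of this arithmetic (the function name `max_fragments` is hypothetical):

```python
LSP_NUMBER_BITS = 8  # the LSP Number field is 1 byte


def max_fragments(virtual_systems: int) -> int:
    """Maximum number of LSP fragments for one IS-IS process."""
    fragments_per_system = 2 ** LSP_NUMBER_BITS   # 256
    systems = 1 + virtual_systems                 # originating system + virtual systems
    return systems * fragments_per_system
```

Without fragment extension the limit is `max_fragments(0)`, that is, 256 fragments; with 50 virtual systems it is 51 x 256 = 13056.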
When a virtual system and fragment extension are configured, an IS-IS router adds the
contents that cannot be contained in its LSPs to the LSPs of the virtual system and notifies
other routers of the relationship between the virtual system and itself through a special TLV in
the LSPs.
IS Alias ID TLV
A special TLV, IS Alias ID TLV, is defined in RFC 3786.
Type 1 byte TLV type. If the value is 24, it indicates the IS Alias
ID TLV.
Regardless of the operation mode, the originating system and virtual system send the LSPs
with fragment number 0 carrying the IS Alias ID TLV to indicate the originating system.
Operation Modes
The following figure shows the networking for LSP fragment extension, which can be run in
two different modes.
Figure 8-39 (ATN B connected to ATN A, with virtual systems ATN A1 and ATN A2)
The IS-IS router can run the LSP fragment extension feature in the following modes:
l Mode-1: is used when some routers on the network do not support LSP fragment
extension.
In this mode, virtual systems participate in the SPF calculation. The originating system
advertises LSPs that contain information about the links to each virtual system.
Similarly, each virtual system advertises LSPs that contain information about the links to
the originating system. In this manner, the virtual systems function the same as the
physical devices connected to the originating system on the network.
Mode-1 is a transitional mode for earlier IS-IS versions that do not support fragment
extension. In the earlier versions, IS-IS cannot identify Alias ID TLVs. Therefore, the
LSP sent by a virtual system must resemble a common IS-IS LSP.
The LSP sent by a virtual system contains the same area address and overload bit as
those in the common LSP. If the LSPs sent by a virtual system contain TLVs specified in
other features, the TLVs must be the same as those in common LSPs.
LSPs sent by a virtual system carry information of the neighbor (the originating system),
and the carried cost is the maximum value minus 1. LSPs sent by the originating system
carry information of the neighbor (the virtual system), and the carried cost is 0. This
mechanism ensures that the virtual system is a node downstream of the originating
system when other devices calculate routes.
In Figure 8-39, ATN B does not support the LSP fragment extension; ATN A supports
the LSP fragment extension in mode-1, and ATN A1 and ATN A2 are virtual systems of
ATN A. In this example, ATN A1 and ATN A2 send LSPs carrying routing information
of ATN A. After receiving LSPs from ATN A, ATN A1, and ATN A2, ATN B considers
there to be three devices at the peer end and calculates routes normally. Because the cost
of the route from ATN A to ATN A1 and the cost of the route from ATN A to ATN A2
are both 0, the cost of the route from ATN B to ATN A is equal to that of the route from
ATN B to ATN A1.
l Mode-2: is used when all routers on the network support LSP fragment extension.
In this mode, virtual systems do not participate in the SPF calculation. All routers on the
network detect that the LSPs generated by the virtual systems belong to the originating
system.
Working in mode-2, IS-IS identifies IS Alias ID TLVs, which are used to calculate the
SPT and routes.
In Figure 8-39, ATN B supports LSP fragment extension; ATN A supports LSP
fragment extension in mode-2; and ATN A1 and ATN A2 send LSPs carrying routing
information of ATN A. When receiving LSPs from ATN A1 and ATN A2, ATN B
obtains the IS Alias ID TLV and learns that the originating system of ATN A1 and ATN A2
is ATN A. ATN B considers information advertised by ATN A1 and ATN A2 to be about
ATN A.
Regardless of the LSP fragment extension mode in use, LSPs can be resolved. However, if
LSP fragment extension is not supported, only LSPs in mode-1 can be resolved.
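The mode-1 cost rule can be checked with a small SPF calculation. The following Python sketch is illustrative only: the link cost of 10 between ATN B and ATN A and the value 62 (an assumed narrow-metric maximum of 63, minus 1) are example values, not product defaults.

```python
import heapq

def spf(graph, source):
    """Dijkstra shortest-path costs from source over a directed graph."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist

MAX_NARROW_METRIC = 63  # assumption: narrow metric style is in use

graph = {
    "ATN B": {"ATN A": 10},                            # assumed link cost
    "ATN A": {"ATN B": 10, "ATN A1": 0, "ATN A2": 0},  # originating -> virtual: cost 0
    "ATN A1": {"ATN A": MAX_NARROW_METRIC - 1},        # virtual -> originating: max - 1
    "ATN A2": {"ATN A": MAX_NARROW_METRIC - 1},
}
costs = spf(graph, "ATN B")
```

Under these assumptions, ATN B reaches ATN A, ATN A1, and ATN A2 at the same cost, so routes carried in the virtual systems' LSPs cost the same as routes carried by ATN A itself.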
Table 8-11 Comparison between LSP fragment extension mode-1 and mode-2
Area Yes No
Process
After LSP fragment extension is configured, if information is lost because LSPs overflow, the
system restarts the IS-IS process. After being restarted, the originating system loads as much
routing information as possible. Any excessive information beyond the forwarding capability
of the system is added to the LSPs of the virtual systems for transmission.
Usage Scenario
NOTE
If there are non-Huawei devices on the network, LSP fragment extension must be set to mode-1.
Otherwise, these devices cannot identify LSPs.
The value of an administrative tag is associated with certain attributes. If the cost-style is
wide, wide-compatible, or compatible and the prefix of the reachable IP address to be
advertised by IS-IS has these attributes, IS-IS adds the administrative tag to the TLV that
carries the prefix. The tag is flooded with the prefix throughout the routing domain.
On an IS-IS router without hostname exchange, information about IS-IS neighbors and
LSDBs is represented by a system ID with 12 hexadecimal digits, for example,
aaaa.eeee.1234. This representation is hard to read and remember.
To easily maintain and manage IS-IS networks, the dynamic hostname exchange mechanism
was introduced.
Dynamic hostname information is advertised in the form of a dynamic hostname TLV (type
137) in LSPs. The dynamic hostname exchange mechanism also provides a service to
associate a host name with the Designated IS (DIS) on a broadcast network. Then, this
mechanism advertises this association through LSPs in the form of a dynamic hostname TLV.
On the ATN, routers with IS-IS dynamic hostname mapping enabled add the Dynamic
Hostname TLV (TLV type 137) that records the local host name to the LSPs they generate
before sending the LSPs.
Dynamic Hostname TLV (TLV type 137) includes the following fields:
The Dynamic Hostname TLV is optional and can be inserted anywhere in an LSP. The
hostname value cannot be null. The sending router determines whether to add the TLV to its
LSPs. The receiving router determines whether to ignore the TLV or to record it in its
mapping table.
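The handling of the Dynamic Hostname TLV can be sketched in Python as follows. This is a minimal illustration, not the device implementation: the function names are hypothetical, the one-byte type and one-byte length layout is the generic IS-IS TLV header, and the 64-byte limit on the sending side is taken from the configuration limit stated in this document.

```python
DYNAMIC_HOSTNAME_TLV = 137
MAX_CONFIGURED_HOSTNAME = 64  # configuration limit stated in this document

def encode_hostname_tlv(hostname: str) -> bytes:
    """Build a Dynamic Hostname TLV: type, length, then the hostname bytes."""
    value = hostname.encode("ascii")
    if not 1 <= len(value) <= MAX_CONFIGURED_HOSTNAME:
        raise ValueError("hostname must be 1 to 64 bytes")
    return bytes([DYNAMIC_HOSTNAME_TLV, len(value)]) + value

def find_hostname(tlvs: bytes):
    """Walk a TLV sequence and return the advertised hostname, or None."""
    offset = 0
    while offset + 2 <= len(tlvs):
        tlv_type, length = tlvs[offset], tlvs[offset + 1]
        value = tlvs[offset + 2 : offset + 2 + length]
        if tlv_type == DYNAMIC_HOSTNAME_TLV and length > 0:
            return value.decode("ascii")
        offset += 2 + length  # skip TLVs that are not of interest
    return None

# An area address TLV (type 1) followed by the hostname TLV.
lsp_tlvs = bytes([1, 4, 3, 0x49, 0x00, 0x01]) + encode_hostname_tlv("ATN-A")
```

Because the TLV can appear anywhere in the LSP, the receiver scans the whole TLV sequence instead of expecting a fixed position.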
Implementation
l Matching rules
The dynamic hostname mechanism abides by the longest matching rule. System ID+NSEL
is compared first. If that does not match, the system ID is then compared.
l Dynamic hostname transmission
The dynamic hostname can be carried by the original LSP only.
l DIS dynamic hostname transmission
The DIS dynamic hostname is transmitted through the LSPs generated by the DIS.
l Dynamic hostname priority
The dynamic hostname takes precedence over the static hostname. When both dynamic
and static hostnames are configured, the dynamic hostname replaces the static hostname.
l Dynamic hostname configuration and resolution
The dynamic hostname is a maximum of 64 bytes, and a maximum of 255-byte content
can be resolved.
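The matching rule above can be illustrated with a short Python sketch. The two lookup tables and the function name are hypothetical; they only show the order of comparison (System ID+NSEL first, then the system ID alone).

```python
def resolve_hostname(system_id, nsel, by_sysid_nsel, by_sysid):
    """Return the hostname for an ID, preferring the longer match."""
    name = by_sysid_nsel.get((system_id, nsel))  # longest match first
    if name is not None:
        return name
    return by_sysid.get(system_id)  # fall back to the system ID alone

# Hypothetical mapping tables built from received Dynamic Hostname TLVs.
by_sysid_nsel = {("aaaa.eeee.1234", "01"): "ATN-A-DIS"}
by_sysid = {"aaaa.eeee.1234": "ATN-A"}
```

If neither table holds an entry, the function returns None and the system ID would be displayed unchanged.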
Usage Scenario
In maintenance and management, the hostname is easier to identify and retain than the system
ID. After a hostname is configured, it rather than the system ID is displayed when you view
information about IS-IS on the router.
The hostname exchange mechanism implemented on the ATN includes dynamic and static
hostname mapping. The system ID is replaced by the hostname in the following cases:
l When an IS-IS neighbor is displayed, the system ID of the IS-IS neighbor is replaced by
the dynamic hostname. If the IS-IS neighbor is the DIS, the system ID of the DIS is
replaced by the dynamic hostname of the neighbor.
l When an LSP in the IS-IS LSDB is displayed, the system ID in the LSP ID is replaced
by the dynamic hostname of the router that advertises the LSP.
l When details about the IS-IS LSDB are displayed, the Host Name field is included for
the LSP generated by the router where dynamic hostname exchange is enabled; the
system ID is replaced by the dynamic hostname of the IS-IS neighbor.
8.5.2.9 IS-IS HA
NOTE
IS-IS HA includes hot standby, data backup, command line backup, batch backup, and
real-time backup.
IS-IS backs up data from the Active Main Board (AMB) to the Standby Main Board (SMB).
If the AMB fails, the SMB becomes active and takes traffic over from the AMB. IS-IS,
therefore, can keep working normally.
Basic Concepts
l Data backup
It indicates backup of data of processes and interfaces.
l Command line backup
If the AMB processes command lines successfully, it sends them to the SMB for
processing. If the AMB fails to process the command lines, it logs that the command
lines fail to take effect and does not send them to the SMB for processing. If the SMB
fails to process the command lines, the failure is recorded in a log.
Hot Standby
The IS-IS Hot Standby (HSB) feature is supported by devices.
IS-IS HSB allows IS-IS configurations on the AMB and those on the SMB to be consistent.
When an AMB/SMB switchover occurs, IS-IS on the new AMB performs GR. The new AMB
sends requests to neighbors to reestablish neighbor relationships and synchronize the LSDB.
Traffic, therefore, is not affected.
NOTE
The ATN can function as a GR helper, not a GR restarter.
Batch Backup
l Backing up data in batches
When the SMB is installed, all data of the AMB is backed up to the SMB. No
configuration can be changed during batch backup.
l Backing up command lines in batches
When the SMB is installed, all configurations of the AMB are backed up to the SMB.
No configuration can be changed during batch backup.
Real-time Backup
l Real-time backup of data
It indicates real-time backup of changed data of processes and interfaces to the SMB.
l Real-time backup of command lines
It indicates that command lines that were run successfully on the AMB are backed up to
the SMB.
NOTE
By default, the IS-IS 3-way handshake mechanism is implemented on P2P links, as defined in RFC
3373.
8.5.2.11 IS-IS GR
IS-IS Graceful Restart (GR) is a high availability (HA) technology and ensures non-stop
forwarding.
Because IS-IS is a link state routing protocol, all routers in an area must maintain the same
network topology and share the same LSDB.
After a master/slave switchover, no neighbor information is stored on the restarted router.
Therefore, the first Hello packets sent by the router after restart do not contain the neighbor
list. After receiving the Hello packets, the neighbor checks the 2-way neighbor relationship
and detects that it is not in the neighbor list of the Hello packets sent by the router. Then the
neighbor relationship is interrupted.
The neighbor then generates new LSPs and floods the topology changes to all other routers in
the area. The routers recalculate routes, which leads to a routing interruption or even a routing
loop.
Because no LSDB is stored on the restarted router, the router needs to synchronize its LSDB
with those of its neighbors.
When IS-IS restarts without GR, IS-IS neighbor relationships are reset, and LSPs are
regenerated and flooded. This triggers the SPF calculation in the entire area, which causes
route flapping and forwarding interruptions in the area.
The IETF defines IS-IS GR in RFC 3847. Protocol restarts are processed for both reserved
and unreserved FIB entries, preventing route flapping and traffic forwarding interruptions
caused by the restarts.
When a router fails, neighbors at the routing protocol layer detect that their neighbor
relationships are Down and then become Up again after a period. This is neighbor relationship
flapping, which may cause route flapping, black-hole routes, or routing loops on the restarted
router, decreasing network reliability. To address this problem, GR was introduced.
Basic Concepts
IS-IS GR involves two roles: GR restarter and GR helper.
NOTE
The ATN can function as a GR helper, not a GR restarter.
To implement GR, IS-IS introduces the restart Type-Length-Value (TLV), T1 timer, T2 timer,
and T3 timer.
Restart TLV
The restart TLV is an extended part of an IS-to-IS Hello (IIH) PDU. All IIH packets of the
router that supports IS-IS GR contain the restart TLV that carries the parameters for protocol
restarts. Figure 8-40 shows the format of the restart TLV.
Field           Length   Description
Type            1 byte   TLV type. Type value 211 indicates the restart TLV.
Remaining Time  2 bytes  Time during which the neighbor retains the adjacency, in
                         seconds. When RA is set, the value is mandatory.
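A restart TLV can be built and parsed as in the following Python sketch. The type value 211 and the 2-byte Remaining Time field come from this document; the single flags byte and the bit positions of RR, RA, and SA are assumptions based on the published restart signaling specification, and the function names are hypothetical.

```python
import struct

RESTART_TLV = 211
RR, RA, SA = 0x01, 0x02, 0x04  # assumed flag bit positions

def build_restart_tlv(rr=False, ra=False, sa=False, remaining_time=None):
    """Encode a restart TLV; Remaining Time is required when RA is set."""
    if ra and remaining_time is None:
        raise ValueError("Remaining Time is mandatory when RA is set")
    flags = (RR if rr else 0) | (RA if ra else 0) | (SA if sa else 0)
    value = bytes([flags])
    if remaining_time is not None:
        value += struct.pack("!H", remaining_time)  # 2 bytes, network order
    return bytes([RESTART_TLV, len(value)]) + value

def parse_restart_tlv(tlv: bytes):
    """Decode a restart TLV with a value length of 1 or 3 bytes."""
    tlv_type, length = tlv[0], tlv[1]
    if tlv_type != RESTART_TLV or length not in (1, 3):
        raise ValueError("not a restart TLV handled by this sketch")
    flags = tlv[2]
    remaining = struct.unpack("!H", tlv[3:5])[0] if length == 3 else None
    return {
        "RR": bool(flags & RR),
        "RA": bool(flags & RA),
        "SA": bool(flags & SA),
        "remaining_time": remaining,
    }
```

For example, `build_restart_tlv(rr=True)` models the restart request sent by a GR restarter, and `build_restart_tlv(ra=True, remaining_time=30)` models the acknowledgment from a GR helper.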
Timers
IS-IS GR has three timers: T1, T2, and T3.
l T1
Any interface enabled with IS-IS GR maintains a T1 timer. On a Level-1-2 router,
broadcast interfaces maintain a T1 timer for Level-1 and Level-2 neighbor relationships.
If the GR restarter has already sent an IIH packet with RR being set but does not receive
any IIH packet that carries the restart TLV and the RA set from the GR helper even after
the T1 timer expires, the GR restarter resets the T1 timer and continues to send the
restart TLV.
If the ACK packet is received or the T1 timer expires three times, the T1 timer is
disabled. The default value of a T1 timer is 3 seconds.
l T2
Level-1 and Level-2 LSDBs maintain independent T2 timers.
The value of the T2 timer indicates the longest time during which the system waits for
the LSDB synchronization. The default value is 60s.
l T3
The entire system maintains a T3 timer.
T3 indicates the maximum time that a whole GR process is allowed to last.
If the T3 timer expires, GR fails.
The initial value of the T3 timer is 65535 seconds. After the IIH packets with RA set are
received from neighbors, the T3 timer uses the smallest value of the Remaining Time
field in the IIH packets.
The T3 timer only applies when devices are restarted.
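The way the T3 timer converges on the smallest Remaining Time can be shown in a few lines of Python. The default values are those given above; the three Remaining Time values are made-up examples.

```python
T1_DEFAULT = 3       # seconds
T2_DEFAULT = 60      # seconds
T3_INITIAL = 65535   # seconds

def update_t3(current_t3, remaining_time):
    """Keep the smaller of the current T3 value and a received Remaining Time."""
    return min(current_t3, remaining_time)

t3 = T3_INITIAL
for remaining in (30, 27, 45):  # example values from IIH packets with RA set
    t3 = update_t3(t3, remaining)
```

With these example values, T3 ends at 27 seconds, the shortest time for which any neighbor will retain the adjacency.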
The following describes the process of IS-IS GR in restarting and starting modes:
IS-IS Restarting
Figure 8-41 shows the process of IS-IS restarting.
[Figure 8-41: sequence diagram of IS-IS restarting. Labels: active/standby switchover, CSNP, delete T1 timer, LSPs, delete T2 timer]
1. After performing a protocol restart, the GR restarter performs the following actions:
– Starts T1, T2, and T3 timers.
– Sends IIH packets that contain the restart TLV from all interfaces. In these packets,
RR is set to 1, and RA and SA are set to 0.
2. After receiving an IIH packet, the GR helper performs the following actions:
– Maintains the neighbor relationship and updates the current Holdtime.
– Replies with an IIH packet containing the restart TLV. In the packet, RR is set to 0,
RA is set to 1, and the value of the Remaining Time field indicates the time
remaining before the Holdtime expires.
– Sends CSNPs and all LSPs to the GR restarter.
3. After the GR restarter receives the IIH response packet, in which RR is set to 0 and RA
is set to 1, from the neighbor, it performs the following actions:
– Compares the current value of the T3 timer with the value of the Remaining Time
field in the packet. The smaller value is used as the value of the T3 timer.
– Deletes the T1 timer maintained by the interface that receives the ACK packet and
CSNPs.
– If the interface does not receive the ACK packet or CSNPs, the GR restarter
repeatedly resets the T1 timer and resends the IIH packet that contains the restart
TLV. If the number of timeouts of the T1 timer exceeds the threshold value, the GR
restarter deletes the T1 timer and initiates the normal IS-IS processing to complete
LSDB synchronization.
4. After the GR restarter deletes the T1 timers on all interfaces, the synchronization with all
neighbors is complete when the CSNP list is cleared and all LSPs are collected. The T2
timer is then deleted.
5. After the T2 timer is deleted, LSDBs of the corresponding level are synchronized.
– In the case of a Level-1 or Level-2 router, SPF calculation is triggered.
– In the case of a Level-1-2 router, it determines whether the T2 timer of the other
level is also deleted. If both T2 timers are deleted, SPF calculation is triggered.
Otherwise, the router waits for the T2 timer of the other level to expire.
6. After all T2 timers are deleted, the GR restarter deletes the T3 timer and updates FIB
entries. The GR restarter re-generates the LSPs of each level and floods them. During
LSDB synchronization, the GR restarter deletes the LSPs generated before the restarting.
7. At this point, the IS-IS restarting of the GR restarter is complete.
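The IIH exchange and the T1 retry behavior in steps 1 to 3 can be sketched in Python as follows. This is a simplified model, not the device implementation: packets are plain dictionaries, the function names are hypothetical, and the retry threshold of 3 is taken from the T1 timer description.

```python
def helper_reply(iih, holdtime_left):
    """GR helper: acknowledge an IIH that has RR set (step 2)."""
    if iih.get("RR"):
        return {"RR": 0, "RA": 1, "remaining_time": holdtime_left}
    return None  # no restart request, nothing to acknowledge

def run_t1(acked_after, max_expiries=3):
    """GR restarter: resend the IIH each time T1 expires (step 3).

    acked_after is the number of IIH packets sent before the IIH ACK and
    CSNP arrive, or None if they never arrive.
    Returns (outcome, number_of_iih_packets_sent).
    """
    expiries = 0
    sent = 1  # the first IIH with RR=1 is sent when T1 starts
    while True:
        if acked_after is not None and sent >= acked_after:
            return "t1-deleted-after-ack", sent
        expiries += 1  # T1 expired without an acknowledgment
        if expiries >= max_expiries:
            return "t1-deleted-normal-processing", sent
        sent += 1  # reset T1 and resend the IIH with the restart TLV
```

In this model, an interface that is never acknowledged sends three IIH packets and then falls back to normal IS-IS processing for LSDB synchronization.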
IS-IS Starting
The starting device does not retain FIB entries. Before it starts, the starting device needs to
reset its adjacencies that are Up with its neighbors and suppress the neighbors from
advertising the adjacencies. The IS-IS starting process is different from the IS-IS restarting
process, as shown in Figure 8-42.
[Figure 8-42: sequence diagram of IS-IS starting between the GR restarter and the GR helper. Labels: starting, CSNP, delete T1 timer, LSPs, delete T2 timer]
Replies with an IIH packet that does not contain the restart TLV. The neighbor then
initiates the normal IS-IS processing. In this case, the neighbor does not suppress
the advertisement of the adjacency with the GR restarter. On a P2P link, the
neighbor also sends a CSNP.
3. After the adjacency is re-initiated, the GR restarter re-establishes the adjacency with the
neighbors on each interface. When an adjacency set on an interface is Up, the GR
restarter starts the T1 timer for the interface.
4. After the T1 timer expires, the GR restarter sends an IIH packet in which both RR and
SA are set to 1.
5. After the neighbor receives the IIH packet, it replies with an IIH packet, in which RR is
set to 0 and RA is set to 1, and sends a CSNP.
6. After the GR restarter receives the IIH ACK packet and CSNP from the neighbor, it
deletes the T1 timer.
If the GR restarter does not receive the IIH packet or CSNP, it repeatedly resets the T1
timer and resends the IIH packet in which RR and SA are set to 1. If the number of
timeouts of the T1 timer exceeds the threshold value, the GR restarter deletes the T1
timer and initiates the normal IS-IS processing to complete LSDB synchronization.
7. After receiving the CSNP from the helper, the GR restarter synchronizes the LSDB.
8. After the LSDB of this level is synchronized, the T2 timer is deleted.
9. After all T2 timers are deleted, the SPF calculation is started, and LSPs are regenerated
and flooded.
10. At this point, the IS-IS starting of the GR restarter is complete.
Usage Scenario
GR is typically applied to PEs, especially single-point PEs, to keep services running when a
PE fails at a single point or when a master/slave control board switchover is triggered by
maintenance operations such as software upgrades. GR ensures non-stop forwarding of key
services. Figure 8-43 shows the networking for the application of GR.
[Figure 8-43: networking for the application of GR. Labels: VPN A, VPN B, CE-1, CE-2, PE1, PE3]
NOTE
NSF is deployed on PE2 to prevent single points of failure on PE2 (IS-IS GR and LDP GR run
on PE2).
IS-IS GR or LDP GR runs on the PEs and on the Ps. The MPU/SRUs on the PEs and Ps work
in backup mode.
l IPv6 Reachability
The IPv6 Reachability TLV indicates the reachability of a network by specifying the
route prefix and cost. The type value is 236 (0xEC).
l IPv6 Interface Address
The IPv6 Interface Address TLV is similar to the IP interface address TLV of IPv4 in
function, except that it changes the original 32-bit IPv4 address to a 128-bit IPv6
address. The type value is 232 (0xE8).
The NLPID is an 8-bit field that identifies network layer protocol packets. The NLPID of
IPv6 is 142 (0x8E). If an IS-IS router supports IPv6, it advertises routing information through
the NLPID value.
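The type values above can be collected into a small Python sketch. The IPv6 values are those stated in this section; the IPv4 NLPID 0xCC is a standard value that this section does not state, and the function names are hypothetical. The sketch assumes that the value of an IPv6 Interface Address TLV is a sequence of 16-byte addresses.

```python
import ipaddress

NLPID_IPV4 = 0xCC                  # standard value, not stated in this section
NLPID_IPV6 = 0x8E                  # 142
TLV_IPV6_INTERFACE_ADDRESS = 0xE8  # 232
TLV_IPV6_REACHABILITY = 0xEC       # 236

def supports_ipv6(nlpids):
    """Return True if the advertised NLPIDs include IPv6."""
    return NLPID_IPV6 in nlpids

def parse_ipv6_interface_addresses(value: bytes):
    """Split a TLV value into 128-bit IPv6 addresses."""
    if len(value) % 16 != 0:
        raise ValueError("value must be a multiple of 16 bytes")
    return [ipaddress.IPv6Address(value[i:i + 16])
            for i in range(0, len(value), 16)]
```

The 2001:db8::/32 documentation prefix is convenient for trying the parser, for example with `ipaddress.IPv6Address("2001:db8::1").packed` as input.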
8.5.2.13 IS-IS TE
IS-IS Traffic Engineering (TE) is an extension of IS-IS to support MPLS TE.
To establish CR-LSPs, MPLS needs to learn the traffic attributes of all the links in the local
area. MPLS can acquire the TE information of the links through IS-IS.
Traditional routers select the shortest path as the primary route regardless of other factors,
such as bandwidth, even when the path is congested.
[Figure 8-44: network topology with ATN A, ATN B, ATN C, ATN D, ATN E, ATN F, ATN G, and ATN H]
In Figure 8-44, all the links have the same cost. The shortest path from ATN A/ATN H to
ATN E is ATN A/ATN H -> ATN B -> ATN C -> ATN D -> ATN E. Data is forwarded along
this shortest path. The path ATN A/ATN H -> ATN B -> ATN C -> ATN D -> ATN E may be
congested, and the path ATN A/ATN H -> ATN B -> ATN F -> ATN G -> ATN D -> ATN E
may be idle.
To solve the preceding problem, the cost of the path ATN B-ATN C can be increased so that
the traffic is switched to the path ATN A/ATN H -> ATN B -> ATN F -> ATN G -> ATN D ->
ATN E.
This method eliminates the congestion on the link ATN A/ATN H -> ATN B -> ATN C ->
ATN D -> ATN E; however, the other link ATN A/ATN H -> ATN B -> ATN F -> ATN G ->
ATN D -> ATN E may be congested. In addition, on networks with complicated topologies,
changing the cost of one link may affect multiple routes.
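The effect of raising a link cost can be reproduced with a shortest-path calculation. The Python sketch below is an illustration under assumptions: the link list is inferred from the paths named in the text, every link is given an example cost of 10, and the raised cost of 30 on the ATN B to ATN C link is arbitrary.

```python
import heapq

def shortest_path(links, source, target):
    """Return (cost, path) of the cheapest path over undirected links."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    heap = [(0, [source])]
    visited = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(heap, (cost + link_cost, path + [neighbor]))
    return None

links = {("A", "B"): 10, ("H", "B"): 10, ("B", "C"): 10, ("C", "D"): 10,
         ("D", "E"): 10, ("B", "F"): 10, ("F", "G"): 10, ("G", "D"): 10}
before = shortest_path(links, "A", "E")

links[("B", "C")] = 30  # raise the cost of the ATN B to ATN C link
after = shortest_path(links, "A", "E")
```

Before the change, traffic from ATN A follows B, C, D, E; after it, all traffic moves to B, F, G, D, E. The cost change shifts the load as a whole and cannot split it, which is why the second path may become congested in turn.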
As an overlay model, MPLS can set up a virtual topology over the physical network topology
and map traffic to this virtual topology, effectively combining MPLS and TE technology into
MPLS TE.
MPLS TE can resolve network congestion problems by allowing carriers to precisely control
the path through which traffic passes and to prevent traffic from passing through congested
nodes. Meanwhile, MPLS TE can reserve resources during the establishment of LSPs to
ensure service quality.
To ensure continuity of services, MPLS TE provides the CR-LSP backup and fast reroute
(FRR) mechanisms. If a link fault occurs, traffic can be switched immediately. Through
MPLS TE, service providers (SPs) can fully utilize the current network resources to provide
diverse services, optimize network resources, and methodically manage the network.
To accomplish the preceding tasks, MPLS TE needs to learn TE information about all devices
on the network. MPLS TE itself lacks a mechanism in which each device floods its TE
information throughout the entire network for TE information synchronization, whereas IS-IS
provides such a mechanism. Therefore, MPLS TE can advertise and synchronize TE
information with the help of IS-IS. To support MPLS TE, IS-IS needs to be extended.
In brief, IS-IS TE collects TE information on IS-IS networks and then transmits the TE
information to the CSPF module.
Basic Principle
As specified in RFC 5305 and RFC 4205, IS-IS TE defines new TLVs and sub-TLVs in IS-IS
LSPs to carry TE information, floods, synchronizes, and resolves TE information, and
transmits the resolved TE information to the CSPF module. IS-IS TE plays the role of a porter
in MPLS TE. Figure 8-45 shows the relationships between IS-IS TE, MPLS TE, and CSPF.
[Figure 8-45: relationships between IS-IS TE, MPLS TE, and CSPF. Labels: TE management, feedback and adjust, advertising, CSPF calculating, TE flooding, TE collecting]
To carry TE information in LSPs, IS-IS TE defines the following TLVs in RFC 5305:
l Extended IS reachability TLV
The Extended IS reachability TLV replaces the IS reachability TLV and extends the TLV
format using sub-TLVs. The implementation of sub-TLVs in TLVs is the same as that of
TLVs in LSPs. Sub-TLVs are used to carry TE information configured on physical
interfaces.
NOTE
Currently, all sub-TLVs defined in RFC 5305 and sub-TLV type 22 defined in RFC 4124 are
supported.
Sub-TLV Name          Type  Length (bytes)
Administrative Group  3     4
Unreserved Bandwidth  11    32
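The two sub-TLVs listed above can be decoded as in the following Python sketch. The type and length values come from the table; the interpretation of the 32-byte Unreserved Bandwidth value as eight 32-bit floating-point numbers (one per priority level) follows RFC 5305 and is stated here as an assumption about the encoding. The function name is hypothetical.

```python
import struct

SUBTLV_ADMIN_GROUP = 3     # length 4
SUBTLV_UNRESERVED_BW = 11  # length 32

def parse_te_subtlvs(data: bytes):
    """Decode Administrative Group and Unreserved Bandwidth sub-TLVs."""
    result = {}
    offset = 0
    while offset + 2 <= len(data):
        sub_type, length = data[offset], data[offset + 1]
        value = data[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-TLV")
        if sub_type == SUBTLV_ADMIN_GROUP and length == 4:
            result["admin_group"] = struct.unpack("!I", value)[0]
        elif sub_type == SUBTLV_UNRESERVED_BW and length == 32:
            # assumed layout: eight 32-bit floats, one per priority level
            result["unreserved_bw"] = list(struct.unpack("!8f", value))
        offset += 2 + length  # unknown sub-TLVs are skipped
    return result

sample = (bytes([3, 4]) + struct.pack("!I", 0x5)
          + bytes([11, 32]) + struct.pack("!8f", *[125000.0] * 8))
decoded = parse_te_subtlvs(sample)
```

Sub-TLVs are walked in the same way as TLVs in an LSP, which matches the statement that sub-TLVs are implemented in TLVs as TLVs are in LSPs.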
Usage Scenario
IS-IS TE helps MPLS TE set up TE tunnels. In Figure 8-46, a TE tunnel is set up between
ATN A and ATN C.
[Figure 8-46: TE tunnel between ATN A and ATN C. Labels: ATN B, ATN C, ATN D, tunnel, TE tunnel FA, link costs of 10]
If packets from ATN A to ATN C need to travel through the TE tunnel, you can enable IS-IS
Shortcut (AA) and the IS-IS process on the TE tunnel interfaces. Then, ATN A considers the
cost of the path to ATN C as 10 and then selects the tunnel interface as the outbound interface.
IS-IS Shortcut (AA) applies only to the local interface and functions unidirectionally.
IS-IS Shortcut (AA) does not affect the original structure of the IS-IS SPT, regardless of
whether a TE tunnel exists. Apart from the link from ATN A to ATN B, and that from ATN B
to ATN C, a link marked with an S from ATN A to ATN C is added. S is short for Shortcut.
The link marked with an S participates in route calculation.
l Absolute metric
An absolute metric indicates that the metric of TE tunnels in IS-IS is fixed.
l Relative metric
A relative metric indicates that the metric of TE tunnels in IS-IS is relative. The route
cost is the physical link cost plus the relative metric.
In Figure 8-47, if the relative metric is set to 1, the cost of the path from ATN A to ATN C
through the TE tunnel is 21 (10+10+1).
If the relative metric is set to 0, the TE tunnel and physical link have the same cost on the
outbound interface. If the relative metric is less than 0, the TE tunnel interface is preferred as
the outbound interface.
The metric of IS-IS Shortcut (AA) takes precedence over the IS-IS cost. If the metric of
IS-IS Shortcut (AA) is not configured, IS-IS uses the IS-IS cost of the TE tunnel interface.
If the metric of IS-IS Shortcut (AA) is configured, IS-IS uses its value.
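The difference between the absolute and relative metrics can be expressed as a short Python function. The function name and the mode strings are hypothetical; the arithmetic follows the description above, with a physical path cost of 20 (10+10) from ATN A to ATN C.

```python
def tunnel_cost(physical_path_cost, mode, value):
    """Cost of the route through a TE tunnel for a given metric mode."""
    if mode == "absolute":
        return value                       # fixed, independent of the path
    if mode == "relative":
        return physical_path_cost + value  # offset from the physical cost
    raise ValueError("mode must be 'absolute' or 'relative'")

PHYSICAL_COST_A_TO_C = 10 + 10  # ATN A -> ATN B -> ATN C
```

With a relative metric of 1 the tunnel costs 21; with 0 it ties with the physical path; with a negative value such as -1 it costs 19 and the tunnel interface is preferred.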
The algorithm of IS-IS Advertise (FA) is the same as that of IS-IS Shortcut (AA).
The differences between IS-IS Advertise (FA) and IS-IS Shortcut (AA) are listed as follows:
l IS-IS Advertise (FA) advertises the TE tunnel information to other ISs, whereas IS-IS
Shortcut (AA) does not.
In Figure 8-47, if the TE tunnel is enabled with IS-IS Advertise (FA), ATN A advertises
information indicating that ATN C is its neighbor. The neighbor information is carried in
TLV type 22 with no sub-TLVs. That is, no TE information is carried. If the TE tunnel is
enabled with IS-IS Shortcut (AA), ATN A does not advertise such information.
l IS-IS Advertise (FA) functions only when bidirectional TE tunnels are configured.
If the TE tunnel is enabled with IS-IS Advertise (FA), ATN C must advertise information
indicating that ATN A is its neighbor. Then, the TE tunnel interface can be used by ATN
A to forward traffic. If the TE tunnel is enabled with IS-IS Shortcut (AA), ATN A does
not check whether ATN C is its neighbor.
l IS-IS Advertise (FA) affects the SPTs of other routers.
If the TE tunnel is enabled with IS-IS Advertise (FA), ATN A advertises the message
that ATN C is a neighbor of ATN A to other routers on the network. Other routers then
consider ATN C a neighbor of ATN A and add ATN C to the SPT without marking it
with an S.
l IS-IS Advertise (FA) does not support the relative metric.
IS-IS Advertise (FA) functions on the entire network. Therefore, note the following
points when deploying IS-IS Advertise (FA):
– TE tunnels enabled with IS-IS Advertise (FA) are preferred to be bidirectional.
– In Figure 8-47, you must enable IS-IS Advertise (FA) on the TE tunnel from ATN
C to ATN A so that the TE tunnel from ATN A to ATN C is available.
– If there are P2P neighbors between the two devices enabled with IS-IS Advertise
(FA), a unidirectional TE tunnel is also available.
– In Figure 8-47, if the TE tunnel from ATN A to ATN B is enabled with IS-IS
Advertise (FA), and ATN A and ATN B are connected through networks other than
Ethernet, this TE tunnel is available. In this case, the physical link from ATN B to
ATN A functions as a TE tunnel enabled with IS-IS Advertise (FA) in the other
direction.
In the earlier ISO 10589, the largest metric of an interface is 63. TLV type 128 and TLV type
130 contain information about routes, and TLV type 2 contains information about IS-IS
neighbors.
As defined in RFC 3784, with IS-IS wide metric, the largest metric of an interface is extended
to 16777215, and the largest metric of a route is 4261412864. With IS-IS wide metric
enabled, TLV type 135 contains information about routes; TLV type 22 contains information
about IS-IS neighbors.
l The following TLVs are used in narrow mode:
– IP Internal Reachability: carries routes within an area.
– IP External Reachability: carries routes outside an area.
– IS Neighbors: carries information about neighbors.
l The following TLVs are used in wide mode:
– Extended IP Reachability TLV: replaces the earlier IP Reachability TLV and carries
information about routes. This TLV expands the range of the route cost to 4 bytes
and carries sub-TLVs.
– IS Extended Neighbors TLV: carries information about neighbors.
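The metric limits quoted above follow from the field widths, which the Python sketch below makes explicit. The function name is hypothetical, and the check covers only the upper bounds given in this section.

```python
NARROW_MAX_INTERFACE_METRIC = 63        # 6-bit field: 2**6 - 1
WIDE_MAX_INTERFACE_METRIC = 16777215    # 24-bit field: 2**24 - 1
WIDE_MAX_ROUTE_METRIC = 4261412864      # 0xFE000000

def interface_metric_fits(metric, style):
    """Return True if the metric fits the interface limit of the style."""
    limits = {"narrow": NARROW_MAX_INTERFACE_METRIC,
              "wide": WIDE_MAX_INTERFACE_METRIC}
    return 0 <= metric <= limits[style]
```

A metric of 1000, for example, fits only in wide mode, which is one reason a network that needs fine-grained costs has to run the wide metric style.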
NOTE
IS-IS in wide mode and IS-IS in narrow mode cannot communicate. If IS-IS in wide mode and IS-IS in
narrow mode need to communicate, you must change the mode to enable all routers on the network to
receive packets sent by other routers.
Table 8-14 Metric styles of received and sent packets under different metric style
configurations
When the metric style is set to compatible, IS-IS sends the information both in narrow and
wide modes.
Process
NOTICE
Once the metric style is changed, the IS-IS process restarts. Therefore, exercise caution when
changing the metric style.
l If the metric style carried in sent packets is changed from narrow to wide:
The information previously carried by TLV type 128, TLV type 130, and TLV type 2 is
now carried by TLV type 135 and TLV type 22.
l If the metric style carried in sent packets is changed from wide to narrow:
The information previously carried by TLV type 135 and TLV type 22 is now carried by
TLV type 128, TLV type 130, and TLV type 2.
l If the metric style carried in sent packets is changed from narrow or wide to narrow and
wide:
The information previously carried in narrow or wide mode is now carried by TLV type
128, TLV type 130, TLV type 2, TLV type 135, and TLV type 22.
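The three cases above reduce to set operations on TLV types, as the following Python sketch shows. The style names used as strings and the function names are illustrative; the TLV type numbers are those listed in this section.

```python
NARROW_TLVS = frozenset({128, 130, 2})
WIDE_TLVS = frozenset({135, 22})

def sent_tlv_types(style):
    """TLV types used in sent packets for a given sending style."""
    if style == "narrow":
        return set(NARROW_TLVS)
    if style == "wide":
        return set(WIDE_TLVS)
    if style == "narrow-and-wide":
        return set(NARROW_TLVS | WIDE_TLVS)
    raise ValueError("unknown style")

def style_change(old, new):
    """Return (added, removed) TLV types when the sending style changes."""
    before, after = sent_tlv_types(old), sent_tlv_types(new)
    return after - before, before - after
```

For instance, `style_change("narrow", "wide")` returns the wide TLV types as added and the narrow TLV types as removed.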
[Figure 8-48: networking with PE1, P1, P2, P3, P4, and PE2]
IS-IS LDP synchronization on P1 and P2 can shorten the traffic interruption during traffic
switchback from the backup LSP to the primary LSP.
To prevent packet loss during traffic switchback, LDP-IGP synchronization delays switchback
of the traffic forwarded by IGP routes until LDP sessions are established. That is, before an
LSP is set up, the original LSP is not deleted and is still used to forward traffic.
[Figure: LDP-IGP synchronization state machine. States: Init, Hold Down, Hold max cost, Hold max cost timer expired, Sync Achieved]
– If an interface is in the Init state and the LDP session is Down, the interface changes
to the HoldDown state when it receives a message indicating that the interface is
Up.
– If an interface is in the Init state and the LDP session is Up, the interface changes to
the Achieve state when it receives a message indicating that the interface is Up.
– If an interface is in the Holdtimeout state, the interface changes to the Init state
when it receives a message indicating that the interface is Down.
– If an interface is in the Holdtimeout state, the interface changes to the Achieve state
when it receives a message indicating that the LDP session is Up.
– If an interface is in the HoldMaxCost state, the interface changes to the Achieve
state when it receives a message indicating that the LDP session is Up.
– If an interface is in the HoldMaxCost state, the interface changes to the Init state
when it receives a message indicating that the interface is Down.
– If the HoldMaxCost timer expires, an interface changes to the Holdtimeout state
when it does not receive a message indicating that the LDP session is Up.
– If an interface in the HoldDown state receives a message indicating that the LDP
session is Up, the interface state changes to Achieve.
– If an interface in the HoldDown state receives a message indicating that the
interface is Down, the interface state changes to Init.
– If an interface in the HoldDown state receives a message indicating that the Hold
Down timer expires, the interface state changes to HoldMaxCost.
– If an interface in the Achieve state receives a message indicating that the LDP
session is Down, the interface state changes to HoldMaxCost.
– If an interface in the Achieve state receives a message indicating that the interface
is Down, the interface state changes to Init.
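The state transitions listed above can be captured in a transition table. The Python sketch below is a simplified model with hypothetical event names. It assumes that the transitions from HoldDown and HoldMaxCost to Achieve are triggered by the LDP session going Up, which is the reading consistent with the purpose of LDP-IGP synchronization.

```python
TRANSITIONS = {
    ("Holdtimeout", "interface_down"): "Init",
    ("Holdtimeout", "ldp_up"): "Achieve",
    ("HoldMaxCost", "ldp_up"): "Achieve",        # assumed trigger
    ("HoldMaxCost", "interface_down"): "Init",
    ("HoldMaxCost", "holdmaxcost_timer_expired"): "Holdtimeout",
    ("HoldDown", "ldp_up"): "Achieve",           # assumed trigger
    ("HoldDown", "interface_down"): "Init",
    ("HoldDown", "holddown_timer_expired"): "HoldMaxCost",
    ("Achieve", "ldp_down"): "HoldMaxCost",
    ("Achieve", "interface_down"): "Init",
}

def next_state(state, event, ldp_session_up=False):
    """Return the next synchronization state of an interface."""
    if state == "Init" and event == "interface_up":
        # The target state depends on the LDP session status.
        return "Achieve" if ldp_session_up else "HoldDown"
    # Events that are not listed leave the state unchanged.
    return TRANSITIONS.get((state, event), state)

# Walk one possible sequence of events.
trace = ["Init"]
for event in ("interface_up", "holddown_timer_expired", "ldp_up",
              "ldp_down", "holdmaxcost_timer_expired", "interface_down"):
    trace.append(next_state(trace[-1], event))
```

Keeping the rules in a table makes it easy to compare the model line by line with the list of transitions.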
Usage Scenario
In the networking shown in Figure 8-48, LDP-IGP synchronization can be configured to
prevent packet loss during traffic switchback from the backup LSP to the primary LSP.
Two systems periodically send BFD packets on the path between them. If one system does not
receive any BFD packets from its peer within the detection period, the system detects that the
bidirectional path to its peer is faulty. Under some conditions, systems need to negotiate the
sending and receiving rates to reduce the load.
NOTE
BFD uses the local discriminator and remote discriminator to differentiate multiple BFD sessions
between the same pair of systems.
Static BFD
In static BFD, BFD session parameters including local and remote discriminators are set using
commands, and the requests for establishing BFD sessions are manually delivered.
In this mode, the creation and deletion of BFD sessions also need to be triggered manually,
which is inflexible and prone to configuration errors. For example, if the local discriminator
or remote discriminator is incorrectly configured, the BFD session does not function
properly.
Dynamic BFD
Dynamic BFD is more flexible than static BFD. In dynamic BFD, routing protocols trigger
the establishment of BFD sessions. The establishment of a BFD-for-IPv4 session is triggered
by IS-IS when an IPv4 neighbor relationship is set up.
In setting up a new neighbor relationship, IS-IS sends parameters of the neighbors and
detection parameters (including source and destination IP addresses) to BFD. BFD then sets
up a session according to the received parameters.
The RM module provides related services for association with the BFD module for IS-IS.
Through RM, IS-IS prompts BFD to set up or tear down BFD sessions by sending notification
messages. In addition, BFD events are transmitted to IS-IS through RM.
l Conditions for setting up a BFD session
– Basic IS-IS functions are configured on each router and IS-IS is enabled on the
interfaces of the routers.
– BFD is enabled on each router, and BFD for IPv4 is enabled on interfaces or
processes of the routers.
– BFD for IPv4 is enabled on interfaces or processes, and the status of the
neighboring router is Up (the DIS must be elected on a broadcast network).
l Process of setting up a BFD session
– P2P network
After the conditions for setting up a BFD session are satisfied, IS-IS instructs BFD
through RM to directly set up a BFD session between neighbors.
– Broadcast network
After the conditions for establishing BFD sessions are met, and the DIS is elected,
IS-IS instructs BFD through RM to establish a BFD session between the DIS and
each router. No BFD session is established between non-DISs.
On a broadcast network, the routers (including non-DIS routers) of the same level on the
same network segment can set up neighbor relationships. In the implementation of IS-IS
BFD, however, BFD sessions are set up between the DIS and non-DIS devices rather
than between non-DISs. On a P2P network, BFD sessions are directly set up between
neighbors.
If a Level-1-2 neighbor relationship is set up between two routers on a link, IS-IS sets up
two BFD sessions for the Level-1 and Level-2 neighbors on a broadcast network, but
sets up only one BFD session on a P2P network.
l Conditions for tearing down a BFD session
– P2P network
When a neighbor relationship that was set up on P2P interfaces by IS-IS is down
(that is, the neighbor relationship is not in the Up state) or when the IP protocol type
of a neighbor is deleted, IS-IS tears down the BFD session.
– Broadcast network
When a neighbor relationship that was set up on broadcast interfaces by IS-IS is torn
down (that is, the neighbor relationship is not in the Up state), when the IP protocol
type of a neighbor is deleted, or when the DIS is re-elected, IS-IS tears down the
BFD session.
When the configurations of a dynamically established BFD session are deleted or BFD
for IS-IS is disabled on an interface, all BFD sessions that correspond to the neighbor
relationships on the interface (between devices, or between devices and the DIS) are
deleted.
After dynamic BFD is globally disabled in an IS-IS process, the BFD sessions on all the
interfaces in this IS-IS process are deleted.
NOTE
BFD detects only one-hop links between IS-IS neighbors, because IS-IS establishes only one-hop
neighbor relationships.
l Response to the Down event of a BFD session
When detecting a link failure, BFD generates a Down event and notifies RM of the
event. RM then instructs IS-IS to delete the neighbor relationship. IS-IS recalculates
routes to speed up route convergence on the entire network. After BFD for IPv4 informs
IS-IS of the link failure, IS-IS changes only the IPv4 route.
When a router and its neighbor are Level-1-2 routers, they set up two neighbor
relationships, that is, the Level-1 neighbor relationship and the Level-2 neighbor
relationship. Then, IS-IS sets up two BFD sessions for the Level-1 neighbor relationship
and Level-2 neighbor relationship. In this case, the RM module deletes the neighbor
relationship of a specific level.
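The session setup rules described above can be sketched as follows. This is a minimal illustration, not the device implementation: the function name `bfd_sessions`, its parameters, and the assumption that the same router is the DIS at both levels are hypothetical simplifications.

```python
def bfd_sessions(network_type, routers, dis=None, levels=("L1",)):
    """Return the BFD sessions IS-IS would request, as (router_a, router_b, level) tuples.

    Hypothetical helper: names and the tuple layout are illustrative only.
    """
    if network_type == "p2p":
        a, b = routers  # a P2P link has exactly two neighbors
        # Only one session is set up on a P2P link, even for Level-1-2 neighbors.
        return [(a, b, None)]
    if network_type == "broadcast":
        if dis is None:
            return []  # no session is set up before the DIS is elected
        # Sessions exist only between the DIS and each other router,
        # one per level; non-DIS routers have no session between them.
        return [(dis, r, lvl) for r in routers if r != dis for lvl in levels]
    raise ValueError(f"unsupported network type: {network_type}")


if __name__ == "__main__":
    for session in bfd_sessions("broadcast", ["A", "B", "C"], dis="A", levels=("L1", "L2")):
        print(session)
```

With three Level-1-2 routers on a broadcast network and A as the DIS, the sketch yields four sessions (A-B and A-C at each level) and none between B and C.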
Applicable Environment
NOTICE
You must configure BFD according to the actual network environment. If timer parameters
are set improperly, network flapping may occur.
BFD for IS-IS can quickly sense link changes to implement fast route convergence.
[Figure: BFD for IS-IS networking with a primary path and a backup path; diagram not reproduced]
Background
As networks develop, services such as Voice over IP (VoIP) and online video require
high-quality real-time transmission. If an IS-IS link fault occurs, however, traffic can
be switched to a new link only after fault detection, LSP update, LSP flooding, route
calculation, and FIB entry delivery are complete. As a result, it takes much more than
50 ms to rectify the fault, which cannot meet the requirement of real-time transmission
services on the network.
Implementation Principle
IS-IS Auto FRR pre-computes a backup link by using the Loop-Free Alternate (LFA)
algorithm, and then adds the backup link and the primary link to the forwarding table. If
an IS-IS network failure occurs, IS-IS Auto FRR can quickly switch traffic to the backup
link before routes on the control plane converge. This ensures normal transmission of
traffic and improves the reliability of the IS-IS network.
The backup link is calculated through the LFA algorithm. With the neighbor that can provide
the backup link being the root, the shortest path to the destination node is calculated by a
device through the SPF algorithm. Then, the loop-free backup link is calculated according to
the inequality defined in RFC 5286.
IS-IS Auto FRR can filter backup routes that need to be added to the IP routing table. Only
the backup routes matching the filtering policy are added to the IP routing table. In this
manner, users can flexibly control the addition of IS-IS backup routes to the IP routing table.
In the scenario where a BFD session is bound to IS-IS Auto FRR, when BFD detects a link
fault on an interface, the BFD session goes Down, triggering FRR on the interface. After that,
the traffic is switched from the faulty link to the backup link, which protects the traffic.
IS-IS Auto FRR supports the following types of TE links:
l IP protecting TE
As shown in Figure 8-51, the TE tunnel has the smallest IS-IS cost among the paths
from ATN S to ATN D. Therefore, ATN S selects the TE tunnel as the primary path to
ATN D. The path ATN S->ATN N->ATN D has the second smallest cost. According to
the LFA algorithm, ATN S selects the path ATN S->ATN N->ATN D as the backup path.
The outbound interface of the backup path is the physical interface that connects ATN S
to ATN N.
NOTE
If the outbound interface of the backup link is the actual outbound interface of the TE tunnel, IP
protecting TE fails.
[Figure 8-51: IP protecting TE, showing ATN S, ATN N, and ATN D with the IS-IS cost of each
link and the traffic path in the normal state; diagram not reproduced]
l TE protecting IP
As shown in Figure 8-52, the physical path ATN S-->ATN N-->ATN D has the smallest
IS-IS metric among the paths from ATN S to ATN D. Therefore, ATN S prefers the path
ATN S-->ATN N-->ATN D as the primary path from ATN S to ATN D. The IS-IS cost of
the TE tunnel is 12, and the explicit path of the TE tunnel is the direct link from ATN S
to ATN D. The IS-IS metric of the direct link from ATN S to ATN D is 13, which is
greater than the IS-IS metric of the TE tunnel. Therefore, IS-IS selects the TE tunnel as
the backup path. TE protecting IP is implemented.
[Figure 8-52: TE protecting IP, showing ATN S, ATN N, and ATN D with the IS-IS cost of each
link and the traffic path in the normal state; diagram not reproduced]
Application Environment
IS-IS Auto FRR traffic protection is classified into link protection and link-node dual
protection. Distance_opt(X, Y) indicates the shortest path between node X and node Y.
Link protection: indicates that the object to be protected is the traffic passing through an IS-IS
Auto FRR-enabled link. The link cost must satisfy the inequality: Distance_opt(N, D) <
Distance_opt(N, S) + Distance_opt(S, D). In the inequality, S indicates the source node of
traffic, N indicates a node on the backup link, and D indicates the destination node of traffic.
As shown in Figure 8-53, traffic is forwarded from ATN S to ATN D. The link cost satisfies
the link protection inequality. When the primary link fails, ATN S switches traffic to the
backup link from ATN S to ATN N so that the traffic can be further transmitted along
downstream paths. This ensures that the traffic interruption period is less than 50 ms.
[Figure 8-53: IS-IS Auto FRR link protection, showing ATN S, ATN D, and ATN N with link
costs of 10; diagram not reproduced]
Link-node dual protection: Figure 8-54 shows link-node dual protection of IS-IS Auto FRR.
Node protection takes precedence over link protection.
Link-node dual protection must satisfy the following conditions:
l The link cost must satisfy the inequality: Distance_opt(N, D) < Distance_opt(N, S) +
Distance_opt(S, D).
l The interface cost of the router must satisfy the inequality: Distance_opt(N, D) <
Distance_opt(N, E) + Distance_opt(E, D).
S indicates the source node of traffic; E indicates the faulty node; N indicates the node on the
backup link; D indicates the destination node of traffic.
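The two inequalities above can be checked with a short sketch. The code below is only an illustration under stated assumptions: the topology and its costs are hypothetical (they are not taken from the figures), links are treated as symmetric, and Distance_opt is computed with a plain Dijkstra search. The function names `shortest_distances` and `lfa_protection` are invented for this example.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra: shortest distance from source to every reachable node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return dist


def lfa_protection(graph, s, n, d, e=None):
    """Evaluate the link protection and link-node dual protection inequalities."""
    dist_n = shortest_distances(graph, n)
    dist_s = shortest_distances(graph, s)
    # Link protection: Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)
    link = dist_n[d] < dist_n[s] + dist_s[d]
    node = False
    if e is not None:
        dist_e = shortest_distances(graph, e)
        # Node protection: Distance_opt(N, D) < Distance_opt(N, E) + Distance_opt(E, D)
        node = link and dist_n[d] < dist_n[e] + dist_e[d]
    return {"link": link, "node": node}


if __name__ == "__main__":
    # Hypothetical topology: primary path S-E-D, candidate backup neighbor N.
    topology = {
        "S": {"E": 5, "N": 10},
        "E": {"S": 5, "D": 5},
        "N": {"S": 10, "D": 10},
        "D": {"E": 5, "N": 10},
    }
    print(lfa_protection(topology, "S", "N", "D", e="E"))
```

In this hypothetical topology, N reaches D at cost 10 without passing through S or E, so both inequalities hold. Raising the N-D cost to 30 makes N's best path to D run back through S, and the link protection inequality fails.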
[Figure 8-54: IS-IS Auto FRR link-node dual protection, showing ATN S, ATN E, ATN N, and
ATN D with the cost of each link; diagram not reproduced]
Based on the authentication modes of packets, authentication is classified into the following
types:
l Simple authentication
The authenticated party directly adds the configured password to packets for
authentication. This authentication mode provides the lowest password security. Because
this imposes security risks, the MD5 authentication was introduced.
l MD5 authentication
In MD5 authentication, passwords are encrypted through the MD5 algorithm before they
are added to packets. This improves the security of passwords.
l Keychain authentication
Keychain authentication further improves network security by using a configurable key
chain that changes over time.
IS-IS provides a TLV to carry authentication information. The TLV components are as
follows:
l Type: identifies the authentication TLV. ISO defines the value as 10. The field is 1 byte
long.
l Length: indicates the length of the Value field. The Length field is 1 byte long.
l Value: indicates the authentication information, including authentication type and
password, which ranges from 1 to 254 bytes.
– Type 0 is reserved.
– Type 1 indicates simple authentication.
– Type 54 indicates MD5 authentication.
– Type 255 indicates private routing domain authentication.
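The TLV layout described above can be illustrated with a small encoder and decoder. This is a sketch that covers simple authentication only; the function names are hypothetical, and the assumption that the Value field is one authentication-type byte followed by the password is inferred from the description above.

```python
import struct

AUTH_TLV_TYPE = 10  # TLV type defined by ISO for authentication
AUTH_SIMPLE = 1     # authentication type 1: simple authentication


def build_auth_tlv(auth_type, auth_data):
    """Encode Type (1 byte), Length (1 byte), and Value (auth type + data)."""
    value = bytes([auth_type]) + auth_data
    if not 1 <= len(value) <= 254:
        raise ValueError("Value field must be 1 to 254 bytes long")
    return struct.pack("!BB", AUTH_TLV_TYPE, len(value)) + value


def parse_auth_tlv(tlv):
    """Decode an authentication TLV into (auth_type, auth_data)."""
    tlv_type, length = struct.unpack("!BB", tlv[:2])
    if tlv_type != AUTH_TLV_TYPE or length < 1 or len(tlv) != 2 + length:
        raise ValueError("not a well-formed authentication TLV")
    return tlv[2], tlv[3:]


if __name__ == "__main__":
    encoded = build_auth_tlv(AUTH_SIMPLE, b"secret")
    print(encoded.hex())
    print(parse_auth_tlv(encoded))
```

The example shows why simple authentication is weak: the password travels in the packet unchanged.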
l The authentication password for IIH packets is saved on interfaces for interface
authentication.
l The authentication password for Level-1 LSPs and SNPs is saved in the IS-IS process
for area authentication.
l The authentication password for Level-2 LSPs and SNPs is saved in the IS-IS process
for routing domain authentication.
l A router sends authentication packets with the authentication TLV and verifies the
authentication information of the packets it receives.
l A router sends authentication packets with the authentication TLV but does not verify the
authentication information of the packets it receives.
For area authentication and routing domain authentication, you can enable a router to
authenticate SNPs and LSPs separately in the following ways:
l A router sends LSPs and SNPs that carry the authentication TLV and verifies the
authentication information of received LSPs and SNPs.
l A router sends LSPs that carry the authentication TLV and verifies the authentication
information of received LSPs. The router sends SNPs that carry the authentication TLV
but does not verify the authentication information of received SNPs.
l A router sends LSPs that carry the authentication TLV and verifies the authentication
information of received LSPs. The router sends SNPs without the authentication TLV
and does not verify the authentication information of received SNPs.
l A router sends LSPs and SNPs that carry the authentication TLV but does not verify the
authentication information of received LSPs and SNPs.
Usage Scenario
l IS-IS neighbor relationships can be set up between multiple routers on the same network
only when interface authentication is configured in the same manner on all the routers.
l When multiple routers are in the same area, you must configure area authentication the
same way on all the routers to ensure synchronization of their Level-1 LSDBs.
l When Level-2 neighbor relationships are set up between multiple routers, you must
configure routing domain authentication the same way on all the routers to ensure the
synchronization of their Level-2 LSDBs.
Terms
Terms Description
LSP Link State Protocol Data Unit. It broadcasts link states in the area and contains all
information about a router. The information includes IS-IS neighbors, IP address
prefix, the ES it is connected to, and the area address. LSPs are classified as
Level-1 LSPs or Level-2 LSPs. A router generates one Level-1 LSP and one
Level-2 LSP with fragments included.
CSNP Complete Sequence Numbers Protocol Data Unit. It contains brief information
about the local LSDB and is used to synchronize the LSDBs of neighbors. CSNPs
are sent and resolved at different levels.
Pseudonode A virtual node that is used to simulate a broadcast network. It is generated by
the DIS and sets up neighbor relationships with all routers on the broadcast network.
PE Provider Edge
CE Customer Edge
TLV Type-Length-Value
MI Multiple Instance
MT Multi-topology
GR Graceful Restart
RM Routing Management
8.5.4 Appendixes
Feature    Supported by IPv4    Supported by IPv6    Differences
8.6 OSPF
8.6.1 Introduction
Definition
Open Shortest Path First (OSPF), developed by the Internet Engineering Task Force (IETF), is
a link-state Interior Gateway Protocol (IGP).
At present, OSPF Version 2, defined in RFC 2328, is intended for IPv4. OSPF stated in this
document refers to OSPFv2, unless otherwise stated.
Purpose
Before the emergence of OSPF, the Routing Information Protocol (RIP) was widely used as
an IGP on networks.
RIP is a distance vector algorithm-based routing protocol. Due to its slow convergence,
routing loops, and poor scalability, RIP is gradually being replaced with OSPF.
As a link-state protocol, OSPF can solve many problems encountered by RIP. Additionally,
OSPF has the following advantages:
l Receives and sends packets in multicast mode, which reduces the load on devices that do
not run OSPF.
l Supports Classless Interdomain Routing (CIDR).
l Supports load balancing among equal-cost routes.
l Supports packet authentication.
With the preceding advantages, OSPF is widely accepted and used as an IGP.
8.6.2 Principles
Database Description (DD): DD packets carry brief information about the local Link State
Database (LSDB) and are used to synchronize the LSDBs of two routers.
Link State Request (LSR): LSR packets are used to request the required LSAs from
neighbors. LSR packets are sent only after DD packets have been exchanged successfully.
Link State Update (LSU): LSU packets are used to send the required LSAs to neighbors.
Link State Acknowledgment (LSAck): LSAck packets are used to acknowledge the received
LSAs.
LSA Type
Router-LSA (Type1): Describes the link status and link cost of the ATN. Generated by each
ATN and advertised in the area to which the ATN belongs.
Network-LSA (Type2): Describes the link status of all routers in the local network
segment. Generated by a designated router (DR) and advertised in the area to which the DR
belongs.
Router Type
Figure 8-56 illustrates the common types of routers in OSPF.
[Figure 8-56: Common types of routers in OSPF; diagram not reproduced]
Area Border Router (ABR) An ABR can belong to two or more areas; one of the areas
must be a backbone area.
An ABR is used to connect the backbone area and non-
backbone areas. It can be physically or logically connected
to the backbone area.
Type1 external route Because of the high reliability of Type1 external routes,
the calculated cost of external routes equals that of AS
internal routes.
In other words, the cost of a Type1 external route equals
the cost of the route from the router to the corresponding
ASBR plus the cost of the route from the ASBR to the
destination.
Type2 external route Because of the low reliability of Type2 external routes,
their costs are considered to be greater than the cost of
any internal path to an ASBR.
The cost of a Type2 external route equals the cost of the
route from the ASBR to the destination.
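The difference between the two cost rules can be written out as a short sketch. The function name and parameters are hypothetical and only restate the two rules above.

```python
def external_route_cost(route_type, cost_to_asbr, cost_asbr_to_destination):
    """Cost of an AS external route as described for Type1 and Type2 routes."""
    if route_type == 1:
        # Type1: internal cost to the ASBR plus the external cost.
        return cost_to_asbr + cost_asbr_to_destination
    if route_type == 2:
        # Type2: only the external cost from the ASBR to the destination.
        return cost_asbr_to_destination
    raise ValueError("route_type must be 1 or 2")


if __name__ == "__main__":
    print(external_route_cost(1, cost_to_asbr=10, cost_asbr_to_destination=20))
    print(external_route_cost(2, cost_to_asbr=10, cost_asbr_to_destination=20))
```

For a router 10 away from the ASBR and an external cost of 20, the Type1 cost is 30 and the Type2 cost is 20.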
Area Type
Totally stub area Allows Type3 default routes that are advertised by an ABR, and
denies inter-area routes and the routes outside an AS.
NSSA area Imports routes from outside an AS, unlike a stub area. An ASBR
advertises Type7 LSAs in the local area.
Network Description
Non-Broadcast If the link layer protocol is frame relay (FR), ATM, or X.25,
Multiple Access OSPF defaults the network type to NBMA.
(NBMA) In NBMA networks, protocol packets, such as Hello, DD, LSR,
LSU, and LSAck packets, are transmitted in unicast mode.
Point-to-Multipoint Regardless of the link layer protocol, OSPF does not default the
(P2MP) network type to P2MP. A P2MP network must be forcibly
changed from other network types. The common practice is to
change a non-fully connected NBMA network to a P2MP
network.
In P2MP networks:
l Hello packets are transmitted in multicast mode through the
multicast address 224.0.0.5.
l Other protocol packets, such as DD, LSR, LSU, and LSAck
packets, are transmitted in unicast mode.
Point-to-point (P2P) If the link layer protocol is PPP, HDLC, or LAPB, OSPF
defaults the network type to P2P.
In P2P networks:
l Protocol packets, such as Hello, DD, LSR, LSU, and LSAck
packets, are transmitted in multicast mode through the
multicast address 224.0.0.5.
l LSU packets are retransmitted in unicast mode.
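The defaults in the rows above can be summarized as a lookup. This sketch covers only the network types listed here; the dictionary and function names are hypothetical, and it models only the destination of Hello packets.

```python
# Default OSPF network type per link layer protocol, as listed above.
# P2MP never appears as a default: it must be configured explicitly.
DEFAULT_NETWORK_TYPE = {
    "FR": "NBMA",
    "ATM": "NBMA",
    "X.25": "NBMA",
    "PPP": "P2P",
    "HDLC": "P2P",
    "LAPB": "P2P",
}

ALL_SPF_ROUTERS = "224.0.0.5"


def hello_destination(network_type, neighbor_address):
    """Return the destination address used for Hello packets."""
    if network_type == "NBMA":
        return neighbor_address  # unicast to the configured neighbor
    if network_type in ("P2MP", "P2P"):
        return ALL_SPF_ROUTERS  # multicast
    raise ValueError(f"network type not covered by this sketch: {network_type}")


if __name__ == "__main__":
    net_type = DEFAULT_NETWORK_TYPE["PPP"]
    print(net_type, hello_destination(net_type, "10.1.1.2"))
```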
Stub Area
A stub area is a special area where ABRs do not flood the received external routes. In a stub
area, the size of the routing table of routers and routing information in transmission are
greatly reduced.
Configuring a stub area in a network is optional. Not all areas can be configured as stub areas.
Generally, a stub area is a non-backbone area with only one ABR and is located at the AS
boundary.
To ensure the reachability of a destination outside an AS, the ABR in a stub area generates a
default route and advertises it to non-ABRs in the stub area.
When you configure a stub area, note the following:
l The backbone area cannot be configured as a stub area.
l If an area needs to be configured as a stub area, use the stub command to configure all
the routers in this area.
l An ASBR cannot exist in a stub area. That is, external routes are not flooded in the stub
area.
l A virtual link cannot pass through a stub area.
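The four restrictions above lend themselves to a simple pre-deployment check. The sketch below is illustrative only: the function `stub_area_violations` and its inputs are hypothetical and do not correspond to any device command or API.

```python
def stub_area_violations(area_id, router_stub_flags, has_asbr, carries_virtual_link):
    """Return the stub area restrictions that a planned area would violate.

    router_stub_flags maps each router in the area to True if the stub
    setting is configured on it.
    """
    problems = []
    if area_id == 0:
        problems.append("the backbone area cannot be configured as a stub area")
    missing = sorted(r for r, flagged in router_stub_flags.items() if not flagged)
    if missing:
        problems.append("stub is not configured on: " + ", ".join(missing))
    if has_asbr:
        problems.append("an ASBR cannot exist in a stub area")
    if carries_virtual_link:
        problems.append("a virtual link cannot pass through a stub area")
    return problems


if __name__ == "__main__":
    print(stub_area_violations(1, {"R1": True, "R2": False}, False, False))
```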
router calculates routes by using an LSA that describes a default route in an LSDB, but
not an LSA of the same type advertised by another router.
l If the OSPF router advertises an LSA that describes a default route only when another
route exists, that route cannot be a route in the local routing domain, that is, a route
learned by the local OSPF process. An external default route guides forwarding outside
the local OSPF routing domain, whereas the next hops of routes in the local OSPF routing
domain are inside that domain and therefore cannot forward packets out of it.
l The router checks whether it has any peer in the Full state in area 0 before
advertising the default route. The router advertises the default route only when such a
peer exists. Without such a peer, the backbone area cannot forward packets, and
advertising the default route is meaningless.
Table 8-21 shows the advertisement of default routes in different areas.
Totally stub area AS external routes in Type5 LSAs or inter-area routes in Type3
LSAs cannot be advertised in a totally stub area.
Routers in the totally stub area have to learn AS external routes and
routes to other areas through an ABR. To help OSPF generate a
default route, you need to configure a totally stub area. After the
totally stub area is configured, an ABR automatically generates a
default summary-LSA (Type3 LSA) and advertises it to the entire
totally stub area. Routers in the totally stub area can obtain
reachable AS external routes and routes to other areas through the
ABR.
NSSA area A small number of AS external routes that are obtained through the
ASBR in the NSSA can be imported to an NSSA. Routes to other
areas in ASE LSAs (Type5 LSAs) cannot be advertised in the
NSSA. AS external routes are imported by the ASBR, and other
external routes are advertised through other areas. The ABR
generates a default NSSA LSA (Type7 LSA) automatically and
advertises it in the entire NSSA. A small number of AS external
routes can be obtained through the ASBR in the NSSA, and other
routes to other areas can be obtained through the ABR in the NSSA
connected to ASBR in other areas. You need to run commands on
the ASBR. The ASBR generates a default NSSA LSA (Type7
LSA) and advertises it to the entire NSSA. This way, external
routes can be received through the ASBR in an NSSA.
A Type7 LSA that describes a default route is neither translated
into a Type5 LSA that describes a default route on an ABR nor
advertised in the entire OSPF routing domain.
Totally NSSA area External routes in ASE LSAs (Type5 LSAs) to other areas or inter-
area routes in Type3 LSAs cannot be advertised in a totally NSSA.
Routers in the totally NSSA learn routes to other areas from an
ABR. You can configure a totally NSSA so that an ABR
automatically generates a default Type7 LSA and advertises it to
the entire totally NSSA. In this manner, routes to external areas and
inter-area routes can be advertised in the totally NSSA through the
ABR.
The filtering action determines whether to add routing entries to the routing table. That
is, only the routes that pass the filtering are added to the local routing table. All the
routes, however, can still be advertised from the OSPF routing table.
l Learning of inter-area LSAs
You can configure ABRs to filter the incoming summary-LSAs of the local area using a
command. This configuration takes effect only on ABRs, because only ABRs can
advertise summary-LSAs.
l Advertisement of inter-area LSAs
You can configure ABRs to filter the outgoing summary-LSAs of the local area through
a command. This configuration takes effect only on ABRs.
Table 8-22 Differences between inter-area LSA learning and route learning
Inter-area LSA learning: Filters the incoming LSAs of an area directly.
Route learning: Filters only the calculated routes in LSAs to determine whether these
routes are added to the local routing table.
l A virtual link must be configured on both ends of the link; otherwise, it does not take
effect.
l A transit area provides an internal route of a non-backbone area for both ends of the
virtual link.
According to RFC 2328, during the deployment of OSPF, all non-backbone areas need to be
connected to the backbone area. Otherwise, some areas will be unreachable.
As shown in Figure 8-57, Area 2 is not connected to the backbone area (Area 0), and ATN A
is not an ABR. Therefore, ATN A does not advertise routing information of Network 1 in
Area 0. As a result, ATN B does not have the route to Network 1.
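[Figure 8-57: Area 2 (containing Network 1 and ATN A) is not connected to the backbone
area, Area 0; diagram not reproduced]
[Figure 8-58: Virtual link between two ABRs across the transit area, Area 1, connecting
Area 2 to Area 0; diagram not reproduced]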
As shown in Figure 8-58, OSPF packets transmitted between two ABRs are only forwarded
by the OSPF routers that reside between the two ABRs. These routers detect that they are not
the destinations of the packets, and forward the packets as common IP packets.
OSPF Multi-process
OSPF supports multi-processes. Multiple OSPF processes can run on the same router
independently. Route interaction between different OSPF processes is similar to route
interaction between different routing protocols.
An interface of a router can belong to only one OSPF process.
A typical application of OSPF multi-process is to run OSPF between PEs and CEs in the VPN
where OSPF is also adopted in the backbone network. On the PEs, the two OSPF processes
are independent of each other.
8.6.2.2 OSPF GR
More and more routers use technologies that separate the control plane from the forwarding
plane. With such technologies, when the network topology remains stable, a restart of the
control plane does not affect the forwarding plane, and the forwarding plane can still forward
data properly, which ensures non-stop service forwarding.
Graceful restart (GR) is such a technology. It ensures that the forwarding plane keeps
forwarding data even if a restart occurs, and the actions on the control plane, such as re-
establishment of neighbor relationships and route calculation, do not affect the forwarding
plane. GR prevents service interruptions caused by route flapping, improving network
reliability.
Basic Concepts
GR is one of the high availability (HA) technologies, which comprise a set of comprehensive
technologies, such as fault-tolerant redundancy, link protection, faulty node recovery, and
Unless otherwise stated, GR described in this section refers to the GR technology defined in
RFC 3623.
l Grace-LSA
OSPF implements GR by flooding grace LSAs. Grace LSAs are used to inform a
neighbor of the GR time, cause, and interface address when the GR starts and ends.
l Role of a router during GR:
NOTE
The ATN device can function as a GR helper, but not as a GR restarter.
– Restarter: is the router that restarts. The restarter can be configured to support
totally GR or partly GR.
– Helper: is the router that helps the restarter. The helper can be configured to support
planned GR or unplanned GR or to selectively support GR through configured
policies.
l Causes of GR:
– Unknown: GR is triggered for an unknown reason.
– Software restart: GR is triggered by command execution.
– Software reload/upgrade: GR is triggered by a software restart or upgrade.
– Switch to redundant control processor: GR is triggered by an unexpected master/
slave control board switchover.
l GR period
The GR period cannot exceed 1800s. OSPF routers can exit from GR regardless of
whether GR succeeds or fails, without waiting for GR to expire.
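To make the contents of a grace LSA concrete, the sketch below encodes the three pieces of information named above (GR period, cause, and interface address) as TLVs. The TLV type numbers, the reason codes 0 to 3, and the 4-byte padding follow RFC 3623 as commonly described; they are not stated in this document and should be treated as assumptions, as should the function names.

```python
import ipaddress
import struct

# Reason codes, in the order the causes are listed above (assumed per RFC 3623).
GR_REASONS = {
    "unknown": 0,
    "software restart": 1,
    "software reload/upgrade": 2,
    "switch to redundant control processor": 3,
}
MAX_GRACE_PERIOD = 1800  # seconds, the upper limit given above


def _tlv(tlv_type, value):
    """Encode a TLV: 2-byte type, 2-byte length, value padded to 4 bytes."""
    padding = (-len(value)) % 4
    return struct.pack("!HH", tlv_type, len(value)) + value + b"\x00" * padding


def build_grace_lsa_body(grace_period, reason, interface_address):
    """Build the TLV body of a grace LSA (LSA header not included)."""
    if not 0 < grace_period <= MAX_GRACE_PERIOD:
        raise ValueError("GR period must be between 1 and 1800 seconds")
    return (
        _tlv(1, struct.pack("!I", grace_period))                    # grace period
        + _tlv(2, bytes([GR_REASONS[reason]]))                      # restart reason
        + _tlv(3, ipaddress.IPv4Address(interface_address).packed)  # interface address
    )


if __name__ == "__main__":
    print(build_grace_lsa_body(120, "software restart", "10.1.1.1").hex())
```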
Classification of OSPF GR
Classification based on GR status:
l Totally GR: indicates that if a neighbor of a router does not support GR, the router exits
from GR.
l Partly GR: indicates that if a neighbor does not support GR, only the interface associated
with this neighbor exits from GR, whereas the other interfaces perform GR normally.
l Planned GR: indicates that a router restarts or performs the master/slave control board
switchover because of command execution. The restarter sends a grace LSA before the
restart or master/slave control board switchover.
l Unplanned GR: indicates that a router restarts or performs a master/slave control board
switchover because of faults. A router performs the master/slave control board
switchover, without sending a grace LSA, and then enters GR after the slave control
board goes Up.
GR Process
l A router starts GR.
In planned GR mode, after a master/slave control board switchover is triggered because
of command execution, the restarter sends a grace LSA to all neighbors to notify them of
the start, period, and cause of GR, and then performs the master/slave control board
switchover.
In unplanned GR mode, the restarter does not send a grace LSA before the switchover.
Instead, the restarter sends a grace LSA immediately after the slave board goes Up,
informing neighbors of the start, period, and cause of GR. The restarter then sends the
grace LSA to each neighbor five times to ensure that neighbors receive it. This
implementation is proposed by manufacturers but not defined by OSPF.
The restarter sends a grace LSA to notify neighbors that it enters GR. During GR,
neighbors retain neighbor relationships with the restarter so that other routers are not
aware of the switchover of the restarter.
l The router implements GR.
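[Figure: GR process between the restarter and the helper. Before the active/standby
switchover, the restarter sends a grace-LSA. The helper enters helper mode and returns an
LSAck packet for the received LSA. The restarter then finishes the switchover. Diagram not
reproduced.]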
GR succeeds.
l Restarter: Before GR expires, the restarter re-establishes neighbor relationships with
all neighbors before a master/slave control board switchover.
l Helper: After the helper receives the grace LSA with the Age being 3600s from the
restarter, their neighbor relationship enters the Full state.
Table 8-24 Comparison of master/slave control board switchovers in the GR mode and
non-GR mode
Switchover in Non-GR Mode
l OSPF neighbor relationships are re-established.
l Routes are recalculated.
l FIB entries change.
l The entire network detects route changes, and routes flap for a short period.
l Packets are lost during forwarding, and services are interrupted.
Switchover in GR Mode
l OSPF neighbor relationships are re-established.
l Routes are recalculated.
l FIB entries remain unchanged.
l Except for neighbors of the device where master/slave control board switchover occurs,
other routers do not detect route changes.
l No packets are lost, and services are not affected.
8.6.2.3 OSPF TE
OSPF Traffic Engineering (TE) is a new feature extended on the basis of OSPF to support
MPLS TE and establish and maintain the Label Switch Path (LSP) of TE. In the MPLS TE
In addition to the network topology, TE also needs to know network constraints, such as the
bandwidth, TE metric, administrative group, and affinity attribute. Current OSPF functions,
however, cannot meet these requirements. Therefore, OSPF needs to be extended by
introducing a new type of LSAs to advertise network constraints. Based on the network
constraints, the Constraint Shortest Path First (CSPF) algorithm can calculate the path that
satisfies certain constraints.
[Figure: Information flooding, message advertisement, and the packet forwarding component
that handles incoming and outgoing packets; diagram not reproduced]
OSPF does not care what the specific information is or how MPLS uses the information.
TE LSA
OSPF uses a new type of LSAs, namely, Type10 opaque LSAs, to collect and advertise TE
information. This type of LSA contains the link status information required by TE, including
the maximum link bandwidth, maximum reservable bandwidth, current reserved bandwidth,
and link color. Type10 opaque LSAs synchronize link status information among ATNs in an
area through the OSPF flooding mechanism. In this manner, a uniform TEDB is formed for
route calculation.
0 15 23 31
LS age Options LS type = 10
Opq type = 1 Opaque ID
Advertising router
LS sequence number
LS checksum length = 132
TLV type = 1 TLV length = 4
Router address
TLV type = 2 TLV length = 100
Sub-TLV type = 1 Sub-TLV length = 1
Link type = 1 Padding
Sub-TLV type = 2 Sub-TLV length = 4
External route tag
Link ID
Sub-TLV type = 3 Sub-TLV length = 4N
Local IP address
Remote IP address
The TE LSA uses the TLV format to carry the needed information. At present, two types of
TLVs are defined as follows:
l Router address TLV: uniquely identifies an MPLS node. In CSPF, this is known as the
router ID.
l Link TLV: carries the attributes of a link enabled with MPLS TE. Table 8-25 shows the
sub-TLVs that can be carried in the Link TLV.
Type2: Link ID (the length of the Value field is 4 bytes)
Link ID, in the format of an IP address.
l For a point-to-point link, this field indicates the OSPF router ID of the neighbor.
l For a multi-access link, this field indicates the interface IP address of the designated
router (DR).
Type5: Traffic Engineering Metric (the length of the Value field is 4 bytes)
TE metric configured on a TE link. The data format is ULONG.
Type6: Maximum Bandwidth (the length of the Value field is 4 bytes)
Maximum bandwidth of a link. The data format is 4 bytes in floating point.
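The three sub-TLVs listed above can be encoded as in the following sketch. The 2-byte type and 2-byte length header matches the packet format shown earlier; the function names are hypothetical, and expressing bandwidth in bytes per second follows the usual RFC 3630 convention, which this document does not state.

```python
import ipaddress
import struct


def sub_tlv(sub_type, value):
    """Encode a sub-TLV: 2-byte type, 2-byte length, then the value."""
    return struct.pack("!HH", sub_type, len(value)) + value


def link_id_sub_tlv(link_id):
    """Type2: Link ID, a 4-byte value in IP address format."""
    return sub_tlv(2, ipaddress.IPv4Address(link_id).packed)


def te_metric_sub_tlv(metric):
    """Type5: Traffic Engineering Metric, a 4-byte unsigned integer."""
    return sub_tlv(5, struct.pack("!I", metric))


def max_bandwidth_sub_tlv(bytes_per_second):
    """Type6: Maximum Bandwidth, a 4-byte floating point value.

    The unit (bytes per second) is an assumption of this sketch.
    """
    return sub_tlv(6, struct.pack("!f", bytes_per_second))


if __name__ == "__main__":
    link_tlv_body = (
        link_id_sub_tlv("10.0.0.2")
        + te_metric_sub_tlv(10)
        + max_bandwidth_sub_tlv(125_000_000.0)  # 1 Gbit/s expressed in bytes per second
    )
    print(link_tlv_body.hex())
```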
Definition
As an extension of OSPF, OSPF VPN multi-instance enables Provider Edges (PEs) and
Customer Edges (CEs) in VPNs to run OSPF for interworking and use OSPF to learn and
advertise routes.
Purpose
As a widely used IGP, in most cases, OSPF runs in VPNs. If OSPF runs between PEs and
CEs, and PEs use OSPF to advertise VPN routes to CEs, no other routing protocols need to be
configured on CEs for interworking with PEs, which simplifies management and
configuration of CEs.
In Figure 8-62, CE1, CE3, and CE4 belong to VPN 1, and the numbers following OSPF refer
to the process IDs of the multiple OSPF instances running on PEs.
Figure 8-62 Networking with OSPF running between PEs and CEs
[Diagram not reproduced: Site 1 (CE1), Site 3 (CE3), and Site 4 (CE4) belong to VPN 1, and
Site 2 (CE2) belongs to VPN 2. The sites connect through the MPLS VPN backbone, and the
PEs run OSPF process 100 for VPN 1.]
The routes that PE1 receives from CE1 are advertised to CE3 and CE4 as follows:
1. PE1 imports OSPF routes of CE1 into BGP and converts them to BGP VPNv4 routes.
2. PE1 uses MP-BGP to advertise the BGP VPNv4 routes to PE2.
3. PE2 imports the BGP VPNv4 routes into OSPF and then advertises these routes to CE3
and CE4.
The process of advertising routes of CE4 or CE3 to CE1 is the same as the preceding process.
NOTE
By default, when OSPF is configured in VPN instances, the device is considered as an ASBR for each
area. For this reason, if two OSPF non-backbone areas (for example area 2 and area 4) are connected by
two parallel links, routes are imported between the two areas. Specifically, the routes in area 2 are
advertised to area 4, and the routes in area 4 are advertised to area 2. As a result, the two areas have the
same routing table. To prevent this problem, use one of the following methods:
l Filter LSAs or routes using a route-filter, route-policy, or any other filters available.
l Configure two VPN instances on each link, deploy import/export extended communities for
communication, and configure two OSPF processes.
In the extended application of OSPF VPN, the MPLS VPN backbone network is considered
Area 0. OSPF requires that Area 0 be contiguous. Therefore, Area 0 of all VPN sites must be
connected to the MPLS VPN backbone network. If a VPN site has Area 0, the PEs that CEs
access must be connected to the backbone area of this VPN site through Area 0. In this
scenario, a virtual link can also be deployed between the PEs and the backbone area. Figure
8-63 shows the networking for configuring OSPF areas between PEs and CEs.
[Figure 8-63: OSPF areas between PEs and CEs, showing PE1 and PE2 on the VPN backbone,
Area 0, Area 1, and a virtual link; diagram not reproduced]
In Figure 8-63, a non-backbone area (Area 1) is configured between PE1 and CE1, and a
backbone area (Area 0) is configured in Site 1. Then, the backbone area in Site 1 is separated
from the VPN backbone area. To ensure that the backbone areas are contiguous, a virtual link
is configured between PE1 and CE1.
OSPF Domain ID
If inter-area routes are advertised between local and remote OSPF areas, these areas are
considered to be in the same OSPF domain.
Before advertising the remote routes sent by BGP to CEs, PEs need to determine the type of
OSPF route (Type 3 or Type 5) to be advertised to CEs based on domain IDs.
l If local domain IDs are the same as or compatible with remote domain IDs in BGP
routes, PEs advertise Type 3 routes.
l If local domain IDs are different from or incompatible with remote domain IDs in BGP
routes, PEs advertise Type 5 routes.
The remote domain ID is different from the local primary domain ID or any of the local secondary domain IDs. | Different | If the local area is a non-NSSA area, external routes are generated. If the local area is an NSSA, NSSA routes are generated.
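The domain ID comparison described above can be sketched as follows. This is a minimal illustration, not the device's implementation: the function name and arguments are hypothetical, and "compatible" is simplified to plain membership in the set of local primary and secondary domain IDs.

```python
def advertised_route_type(local_domain_ids, remote_domain_id, local_area_is_nssa=False):
    """Pick the OSPF route type a PE advertises to a CE for a BGP-learned route.

    local_domain_ids: the local primary and secondary domain IDs.
    Assumption: "compatible" is reduced to simple membership in that set.
    """
    if remote_domain_id in local_domain_ids:
        return "Type 3"  # same OSPF domain: inter-area route
    # Different domain: external route, or NSSA route if the local area is an NSSA
    return "Type 7" if local_area_is_nssa else "Type 5"


print(advertised_route_type({"0.0.0.1", "0.0.0.2"}, "0.0.0.2"))
print(advertised_route_type({"0.0.0.1"}, "0.0.0.9"))
print(advertised_route_type({"0.0.0.1"}, "0.0.0.9", local_area_is_nssa=True))
```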
Figure 8-64 (elements: PE1, PE2, CE1, VPN backbone)
In Figure 8-64, on PE1, OSPF imports a BGP route destined for 10.1.1.1/32 and then
generates and advertises a Type 5 or Type 7 LSA to CE1. Then, CE1 learns an OSPF route
with 10.1.1.1/32 as the destination address and PE1 as the next hop and advertises the route to
PE2. Therefore, PE2 learns an OSPF route with 10.1.1.1/32 as the destination address and
CE1 as the next hop.
Similarly, CE1 also learns an OSPF route with 10.1.1.1/32 as the destination address and PE2
as the next hop. PE1 learns an OSPF route with 10.1.1.1/32 as the destination address and
CE1 as the next hop.
As a result, CE1 has two equal-cost routes with PE1 and PE2 as next hops, respectively, and
the next hops of the routes from PE1 and PE2 to 10.1.1.1/32 are CE1, which leads to a routing
loop.
In addition, the priority of an OSPF route is higher than that of a BGP route. Therefore, on
PE1 and PE2, BGP routes to 10.1.1.1/32 are replaced with the OSPF route, and the OSPF
route with 10.1.1.1/32 as the destination address and CE1 as the next hop is active in the
routing tables of PE1 and PE2.
The BGP route is inactive, and therefore, the LSA generated when this route is imported by
OSPF is deleted, which causes the OSPF route to be withdrawn. As a result, no OSPF route
exists in the routing table, and the BGP route becomes active again. This cycle causes route
flapping.
OSPF VPN provides solutions to this problem, as described in Table 8-27.
VPN route tag | The VPN route tag is carried in the Type 5 or Type 7 LSA generated by PEs based on the received BGP private route. It is not carried in BGP extended community attributes. The VPN route tag is valid only on PEs that receive BGP routes and generate OSPF LSAs. | When a PE detects that the VPN route tag in the incoming LSA is the same as that in the local LSA, the PE ignores this LSA, which prevents routing loops.
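The VPN route tag check can be pictured with a short sketch. The function name and the tag values are hypothetical; the only behavior taken from the text is that a PE ignores a Type 5 or Type 7 LSA whose tag equals its own.

```python
def accept_external_lsa(lsa_route_tag, local_vpn_route_tag):
    """Return False when the PE should ignore the LSA to avoid a routing loop."""
    return lsa_route_tag != local_vpn_route_tag


# Hypothetical tag values, used only for illustration
local_tag = 3489661028
incoming_tags = [3489661028, 100]
usable = [tag for tag in incoming_tags if accept_external_lsa(tag, local_tag)]
print(usable)
```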
Sham Link
OSPF sham links are unnumbered P2P links between two PEs over an MPLS VPN backbone
network.
Generally, BGP extended community attributes carry routing information over the MPLS
VPN backbone between BGP peers. OSPF running on the other PE can use the routing
information to generate inter-area routes from PEs to CEs.
Figure 8-65 (elements: CE12 and CE22 in Area 1 of OSPF 200, VPN1 site1 and site3, backdoor link)
In Figure 8-65, an intra-area OSPF link (the backdoor link) exists between the network segments of the local and remote CEs. Routes that pass through this intra-area link have higher priorities than the inter-area routes that pass through the MPLS VPN backbone network. As a result, VPN traffic is always forwarded through the intra-area link instead of the backbone network. To prevent such a problem, an OSPF sham link can be established between PEs so that the routes that pass through the MPLS VPN backbone network also become OSPF intra-area routes and can take precedence.
l A sham link is a link between two VPN instances. Each VPN instance contains the
address of an end-point of a sham link. The address is a loopback address with the 32-bit
mask in the VPN address space on the PE.
l After a sham link is established between two PEs, the PEs become neighbors on the
sham link and exchange routing information.
l A sham link functions as a P2P link within an area. Users can select a route from the
sham link and intra-area route link by adjusting the metric.
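The effect of a sham link on route selection can be sketched as below. OSPF prefers intra-area routes over inter-area routes regardless of cost, so cost matters only once both paths are intra-area. The route names and costs are hypothetical.

```python
# Lower rank wins first; cost only breaks ties within the same route type.
ROUTE_TYPE_RANK = {"intra-area": 0, "inter-area": 1}


def best_route(routes):
    """routes: list of (name, route_type, cost) tuples."""
    return min(routes, key=lambda r: (ROUTE_TYPE_RANK[r[1]], r[2]))


# Without a sham link, the backbone path is an inter-area route and always loses.
without_sham = [("backdoor", "intra-area", 100), ("backbone", "inter-area", 10)]
# With a sham link, both paths are intra-area, so the metric decides.
with_sham = [("backdoor", "intra-area", 100), ("backbone via sham link", "intra-area", 10)]

print(best_route(without_sham)[0])
print(best_route(with_sham)[0])
```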
Multi-VPN-Instance CE
OSPF multi-instance generally runs on PEs. The routers that run OSPF multi-instance within
user LANs are called Multi-VPN-Instance CEs (MCEs).
Compared with OSPF multi-instance running on PEs, MCEs have the following
characteristics:
l Do not need to support OSPF-BGP association.
l Establish one OSPF instance for each service. Different virtual CEs transmit different
services, which ensures LAN security at a low cost.
l Implement OSPF multi-instances on a CE. The key to implementing MCEs is to disable
loop detection and calculate routes directly. MCEs also need to use the received LSAs
with the DN-bit set to 1 for route calculation.
Definition
Not-so-stubby areas (NSSAs) are a special type of OSPF area.
Derived from stub areas, NSSAs resemble stub areas in many ways. The difference between
NSSAs and stub areas is that NSSAs can import and flood AS external routes to the entire
OSPF AS, without learning external routes in other areas of the OSPF network.
Purpose
As defined in OSPF, stub areas cannot import external routes. This prevents a large number of external routes from consuming the bandwidth and storage resources of the ATNs in stub areas. Stub areas therefore cannot meet the requirements of scenarios where external routes need to be imported while the resource consumption caused by external routes also needs to be avoided. NSSAs are introduced to address such scenarios.
Figure (elements: RIP, Type5 LSAs, Type7 LSA, NSSA area)
l Type7 LSAs are a new type of LSA that was introduced to support NSSAs and describe
the imported external routes.
l Type7 LSAs are generated by the ASBRs of NSSAs and flooded only in the NSSAs
where ASBRs reside.
l When receiving Type7 LSAs, the ABRs of NSSAs selectively translate Type7 LSAs to
Type5 LSAs so that external routes can be advertised in other areas of the OSPF
network.
l Default routes can also be expressed through Type7 LSAs so that traffic can be
forwarded to other ASs.
N-bit
All ATNs in an area must be configured with the same area type. In OSPF, the N-bit is carried in a Hello packet to indicate that an ATN supports NSSAs. OSPF neighbor relationships cannot be established between ATNs with different area types.
Going against RFC 1587, some manufacturers also set the N-bit in OSPF Database
Description (DD) packets. Huawei devices can be configured to be compatible with the
devices of these manufacturers for interworking.
The forwarding address (FA) indicates that packets to a specific destination address are to be forwarded to the address specified by the FA.
The loopback interface address in an area is preferentially selected as the FA. If no loopback
interface exists, the address of the interface that is Up and has the largest logical index in the area
is selected as the FA.
l Default Type7 LSAs that meet the preceding conditions can also be translated.
l Type7 LSAs generated by ABRs are not set with the P-bit.
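The selective Type7-to-Type5 translation can be sketched as follows. The translation conditions are not fully listed in this section, so the sketch assumes the conditions from RFC 3101: the P-bit is set and the forwarding address is non-zero. The dictionary layout and function name are hypothetical.

```python
def translate_type7(lsas, is_translator_abr=True):
    """Return the Type5 LSAs an NSSA ABR would originate from Type7 LSAs.

    Assumption (RFC 3101): only Type7 LSAs with the P-bit set and a
    non-zero forwarding address (FA) are translated.
    """
    if not is_translator_abr:
        return []
    translated = []
    for lsa in lsas:
        if lsa["p_bit"] and lsa["forwarding_address"] != "0.0.0.0":
            translated.append({
                "type": 5,
                "prefix": lsa["prefix"],
                "forwarding_address": lsa["forwarding_address"],
            })
    return translated


type7_lsas = [
    {"prefix": "172.16.1.0/24", "p_bit": True, "forwarding_address": "10.0.0.1"},
    # Generated by an ABR itself: P-bit not set, so it stays inside the NSSA
    {"prefix": "172.16.2.0/24", "p_bit": False, "forwarding_address": "10.0.0.2"},
]
print([lsa["prefix"] for lsa in translate_type7(type7_lsas)])
```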
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects connectivity of a data protocol on the same path between two
systems. The path can be a physical link, logical link, or tunnel.
In BFD for OSPF, a BFD session is associated with OSPF. The BFD session immediately detects a link fault and then notifies OSPF of the fault. This speeds up OSPF's response to network topology changes.
Purpose
A link fault or topology change can cause ATNs to recalculate routes. The convergence of
routing protocols must be sped up to improve network performance.
Because link faults are unavoidable, a feasible solution is required to detect faults faster and notify routing protocols of the faults immediately. If BFD is associated with routing protocols, BFD can speed up the convergence of routing protocols when a link fault occurs.
Not associated | An OSPF Dead timer expires. By default, the timeout period is 40s. | Seconds level
Principle
Figure (elements: GE0/2/0 1.1.1.2/24, GE0/2/1 2.2.2.1/24, ATN C, Area0)
3. The outbound interface on ATN A connected to ATN B is GE 0/2/1. If the link fails,
BFD detects the fault and notifies ATN A.
4. ATN A processes the event that a neighbor relationship is Down and re-calculates routes. After the calculation, the outbound interface is GE 0/2/0, and traffic passes through ATN C and then reaches ATN B.
Definition
Generalized TTL security mechanism (GTSM) is a mechanism that protects services over the
IP layer by checking whether the TTL value in an IP packet header is within a pre-defined
range.
Purpose
On networks, attackers may simulate OSPF packets and keep sending them to a device. After
receiving these packets, the device directly sends them to the control plane for processing
without checking their validity if the packets are destined for the device. As a result, the
control plane is busy processing these packets, resulting in high CPU usage.
GTSM is used to protect the TCP/IP-based control plane against CPU-utilization attacks, such
as CPU-overload attacks.
Principles
GTSM-enabled devices check the TTL value in each received packet based on a configured
policy. The packets that fail to match the policy are discarded or sent to the control plane,
which prevents the devices from possible CPU-utilization attacks. A GTSM policy involves
the following items:
l For directly connected OSPF neighbors, the TTL value of the protocol packets to be sent
is set to 255.
l For multi-hop neighbors, a reasonable TTL range is defined.
l GTSM takes effect on unicast packets rather than multicast packets. This is because the
TTL value of multicast packets can only be 255, and therefore GTSM is not needed to
protect against multicast packets.
l GTSM does not support tunnel-based neighbors.
l GTSM is applicable to Non-Broadcast Multi-Access (NBMA) networks, virtual links,
and sham links.
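The TTL check behind a GTSM policy can be sketched as below. The sketch assumes that the valid TTL range is [255 - hops + 1, 255], where hops is the configured maximum number of hops to the neighbor; the function and parameter names are hypothetical.

```python
MAX_TTL = 255


def gtsm_accept(ttl, valid_hops=1):
    """Accept a packet only if its TTL lies within [255 - valid_hops + 1, 255].

    valid_hops=1 models a directly connected neighbor (TTL must be 255).
    """
    if not 1 <= valid_hops <= MAX_TTL:
        raise ValueError("valid_hops must be between 1 and 255")
    return MAX_TTL - valid_hops + 1 <= ttl <= MAX_TTL


print(gtsm_accept(255))                # directly connected neighbor
print(gtsm_accept(254))                # one hop too far for a direct neighbor
print(gtsm_accept(253, valid_hops=3))  # multi-hop neighbor within range
```

A packet that fails the check would then be discarded or passed on, depending on the configured default action.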
Definition
Generally, routers periodically send Hello packets through OSPF interfaces. Specifically, a
router uses a Hello timer to control the interval at which Hello packets are sent and can send
Hello packets again only after the Hello timer expires. Therefore, sending Hello packets at a
fixed interval slows down the establishment of OSPF neighbor relationships.
Without Smart-discover:
l Hello packets are sent only when the Hello timer expires.
l Neighbors keep waiting to receive Hello packets within the Hello interval.
Principles
In the following scenarios, Smart-discover-enabled interfaces can send Hello packets to
neighbors regardless of whether the Hello timer expires:
Definition
When a new router is deployed on the network or a router is restarted, network traffic may be
lost during BGP route convergence because IGP routes converge more quickly than BGP
routes.
Purpose
If a backup link exists, during traffic switchback, BGP traffic may be lost because BGP routes
converge more slowly than OSPF routes do.
In Figure 8-68, ATN A, ATN B, ATN C, and ATN D run OSPF and are IBGP peers. ATN C is
the backup device of ATN B. When the network is stable, BGP and OSPF routes converge
completely on the devices.
In most cases, traffic from ATN A to 10.3.1.0/30 passes through ATN B. If ATN B fails,
traffic is switched to ATN C. After ATN B recovers, traffic is switched back to ATN B.
However, convergence of OSPF routes is complete while BGP route convergence is still
going on at this point. As a result, ATN B does not have the route to 10.3.1.0/30.
When packets from ATN A to 10.3.1.0/30 reach ATN B, ATN B discards them because ATN
B has no route to 10.3.1.0/30.
Figure 8-68 (elements: ATN-A through ATN-F, AS 10, AS 20, EBGP, and /30 link addresses in 10.1.1.0, 10.1.2.0, 10.1.3.0, 10.1.4.0, 10.2.1.0, and 10.3.1.0)
Principles
In Figure 8-68, OSPF-BGP synchronization is enabled on ATN B. In this situation, before
BGP route convergence is complete, ATN A continues to forward traffic to the backup link
ATN C, without forwarding traffic to ATN B, until BGP route convergence on ATN B is
complete.
The router enabled with OSPF-BGP synchronization remains as a stub router within the set
synchronization period. During this period, the link cost in the LSA advertised by the router is
the maximum value (65535), instructing other OSPF devices not to use it as a transit router
for data forwarding.
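The stub-router behavior can be illustrated with a small cost comparison. The configured costs below are hypothetical; only the maximum value 65535 comes from the text.

```python
MAX_LINK_COST = 65535


def advertised_cost(configured_cost, bgp_converged):
    """Cost a router puts in its LSA: the maximum value while it acts as a stub router."""
    return configured_cost if bgp_converged else MAX_LINK_COST


def path_cost_via(first_hop_cost, transit_cost, transit_bgp_converged=True):
    """Cost of a two-hop path through a transit router."""
    return first_hop_cost + advertised_cost(transit_cost, transit_bgp_converged)


# Hypothetical costs: ATN B is normally the cheaper transit router.
during_sync = {
    "ATN B": path_cost_via(10, 10, transit_bgp_converged=False),
    "ATN C": path_cost_via(20, 20),
}
after_sync = {"ATN B": path_cost_via(10, 10), "ATN C": path_cost_via(20, 20)}

print(min(during_sync, key=during_sync.get))
print(min(after_sync, key=after_sync.get))
```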
Background
In the networking with primary and backup links, if the primary link fails and then recovers,
traffic is switched from the backup link back to the primary link.
IGP route convergence completes before an LDP session is established. Consequently, the
original LSP is deleted before the new LSP is established. As a result, LSP traffic is
interrupted.
Purpose
In Figure 8-69, the primary link travels along the path PE1→P1→P2→P3→PE2, and the
backup link travels along the path PE1→P1→P4→P3→PE2.
If the primary link fails, traffic is switched to the backup link. After the primary link recovers,
traffic is switched back to the primary link. During this process, traffic is interrupted for a
long time.
Figure 8-69 (elements: PE1, P1, P2, P3, P4, PE2)
No Seconds
Yes Milliseconds
Principles
OSPF-LDP synchronization delays route switchback by suppressing the establishment of
OSPF neighbor relationships until LDP convergence is complete. Specifically, the backup link
continues to forward traffic until an LSP is established on the primary link, and then the
backup link is deleted.
l Hold-down
l Hold-max-cost
l Delay
After the primary link recovers, a router performs the following operations:
1. Starts the hold-down timer. The OSPF interface does not establish OSPF neighbors but
waits for establishment of an LDP session until the timer expires.
2. Starts the hold-max-cost timer when the hold-down timer expires and advertises the
maximum link cost of the interface connected to the primary link through local LSAs.
3. Starts the Delay timer to wait for establishment of an LSP after an LDP session is
reestablished on the primary link.
4. When the Delay timer expires, LDP notifies OSPF that synchronization is complete,
regardless of the OSPF status.
Definition
OSPF requires that routers in the same area have the same link state database (LSDB).
When a router fails to store additional routing information because of limited system
resources, OSPF database overflow occurs.
Purpose
You can configure stub areas or NSSAs to solve the problem of the continuous increase in
routing information that causes the exhaustion of system resources of routers. However,
configuring stub areas or NSSAs cannot prevent the database overflow caused by the increase
in dynamic routes. To reduce the LSDB size, set the maximum number of external LSAs in
the LSDB.
Principles
To prevent database overflow, you can set the maximum number of non-default external
routes on a router.
All routers on the OSPF network must be set with the same upper limit. If the number of
external routes on a router reaches the upper limit, the router enters the Overflow state and
starts an overflow timer. The router automatically exits from the overflow state after the timer expires. By default, the timer is 5 seconds.
Entering overflow state: The router deletes all non-default external routes that it generated.
Staying in overflow state:
l The router does not generate non-default external routes.
l The router discards newly received non-default external routes and does not reply with an LSAck packet.
l When the overflow timer expires, the router checks
whether the number of external routes still exceeds the
upper limit.
– If so, the router restarts the timer.
– If not, the router exits from overflow state.
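The overflow behavior described above can be sketched as a small state holder. The class and method names are hypothetical, and the timer itself is left to the caller; the sketch only models what happens on entering the state, while staying in it, and when the timer expires.

```python
class OverflowGuard:
    """Tracks non-default external routes against a configured upper limit."""

    def __init__(self, limit, timer_seconds=5):
        self.limit = limit
        self.timer_seconds = timer_seconds  # default overflow timer value
        self.external = {}                  # prefix -> "self" or "received"
        self.in_overflow = False

    def add(self, prefix, origin):
        """Try to install a non-default external route; return True if accepted."""
        if self.in_overflow:
            return False  # not generated / discarded while in overflow state
        self.external[prefix] = origin
        if len(self.external) >= self.limit:
            self._enter_overflow()
        return True

    def _enter_overflow(self):
        self.in_overflow = True
        # Delete all self-generated non-default external routes.
        self.external = {p: o for p, o in self.external.items() if o != "self"}

    def on_timer_expiry(self):
        """Return True if the router exits the overflow state."""
        if len(self.external) > self.limit:
            return False  # still above the limit: restart the timer
        self.in_overflow = False
        return True


guard = OverflowGuard(limit=2)
guard.add("192.0.2.0/24", "self")
guard.add("198.51.100.0/24", "received")   # limit reached: overflow state
print(guard.in_overflow, sorted(guard.external))
```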
In HSB mode, OSPF backs up necessary information on the Active Main Board (AMB) to the
Standby Main Board (SMB). When the AMB fails, the SMB replaces it to ensure the normal
operation of OSPF.
l Incremental SPF (I-SPF): recalculates only the routes of the changed nodes rather than
all the nodes when the network topology changes, which speeds up route calculation.
l Partial Route Calculation (PRC): calculates only the changed routes when the routes on
the network change.
l An OSPF intelligent timer: can dynamically adjust its value based on the user's
configuration and the interval at which an event is triggered, such as the route calculation
interval, which ensures rapid and stable network operation.
The OSPF intelligent timer uses the exponential backoff technology so that the value of
the timer can reach the millisecond level.
I-SPF
In ISO 10589, the Dijkstra algorithm was adopted to calculate routes. When a node changes
on the network, this algorithm is used to recalculate all routes. The calculation takes a long
time and consumes too many CPU resources, which affects the convergence speed.
I-SPF improves the Dijkstra algorithm. Except for the first time, only changed nodes instead
of all nodes are involved in calculation. The SPT generated at last is the same as that
generated by the previous algorithm. I-SPF decreases CPU usage and speeds up network
convergence.
PRC
Similar to I-SPF, PRC calculates only the changed routes. However, PRC does not calculate
the shortest path. It updates routes based on the SPT calculated by I-SPF.
In route calculation, a leaf represents a route, and a node represents a router. Either an SPT or
a leaf change causes a route change. The SPT change is irrelevant to the leaf change. PRC
processes routing information as follows:
l If the SPT changes, PRC processes the routing information of all leaves on a changed
node.
l If the SPT remains unchanged, PRC does not process the routing information on any
node.
l If a leaf changes, PRC processes the routing information on the leaf only.
l If a leaf remains unchanged, PRC does not process the routing information on any leaf.
For example, if OSPF is enabled on an interface of a node, the SPT calculated by I-SPF
remains unchanged. PRC updates only the routes of this interface, consuming less CPU
resources.
PRC improves the SPF algorithm. Working with I-SPF, PRC further improves network convergence performance.
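The PRC processing rules can be summarized in a short sketch. The data layout (a mapping from node to its leaves) and the function name are hypothetical; the logic only restates the rules above.

```python
def leaves_to_process(changed_nodes, changed_leaves, leaves_by_node):
    """Return the set of leaves (routes) that PRC must reprocess.

    changed_nodes: nodes whose position in the SPT changed (reported by I-SPF).
    changed_leaves: routes that changed while the SPT stayed the same.
    leaves_by_node: mapping of node -> routes attached to that node.
    """
    to_process = set(changed_leaves)
    for node in changed_nodes:
        to_process |= set(leaves_by_node.get(node, ()))
    return to_process


leaves_by_node = {
    "R1": ["10.0.1.0/24", "10.0.2.0/24"],
    "R2": ["10.0.3.0/24"],
}
# OSPF is enabled on one more interface of R2: only that leaf is processed.
print(leaves_to_process([], ["10.0.3.0/24"], leaves_by_node))
```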
NOTE
On live networks, only I-SPF and PRC are used to calculate OSPF routes.
To speed up route convergence on the entire network, the OSPF intelligent timer controls
route calculation, LSA generation, and LSA receiving.
l On a network where routes are calculated repeatedly, the OSPF intelligent timer
dynamically adjusts the route calculation based on user's configuration and the
exponential backoff technology. The number of route calculation times and the CPU
resource consumption are decreased. Routes are calculated after the network topology
stabilizes.
l On an unstable network, if a router generates or receives LSAs due to frequent topology
changes, the OSPF intelligent timer can dynamically adjust the interval. No LSAs are
generated or processed within an interval, which prevents invalid LSAs from being
generated and advertised on the entire network.
The OSPF intelligent timer is started by default and uses the default value.
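Exponential backoff can be sketched as follows. The parameter names (start, hold, and maximum intervals) and the doubling rule are assumptions made for illustration; this section does not specify the actual parameters or default values of the intelligent timer.

```python
def backoff_intervals(start_ms, hold_ms, max_ms, events):
    """Return the wait applied before each of `events` consecutive triggers.

    Assumption: the first trigger waits start_ms, the second hold_ms, and each
    later trigger doubles the previous hold interval, capped at max_ms.
    """
    intervals = []
    for n in range(events):
        if n == 0:
            wait = start_ms
        else:
            wait = min(hold_ms * 2 ** (n - 1), max_ms)
        intervals.append(wait)
    return intervals


print(backoff_intervals(start_ms=50, hold_ms=200, max_ms=1000, events=5))
```

With frequent triggers the interval grows toward the maximum, which reduces the number of calculations; once the network stabilizes, the interval returns to the short initial value.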
Definition
The management information base (MIB) is a database that stores information. The network
administrator can call MIB objects through the agent to control, configure, or monitor
network devices. For details, refer to chapter "SNMP".
As defined in RFC 4750, the OSPF MIB is used to set, modify, and view the running status of
OSPF on network devices.
Purpose
The network administrator can use the MIB to query information about the operation of
managed devices and to configure network devices through the set operation. The OSPF MIB
helps the administrator monitor and manage networks more rapidly and effectively.
The network administrator can perform the get and get-next operations (rather than the set
operation) on all OSPF MIB objects defined in RFC 4750.
To enhance and supplement MIBs defined in RFC 4750, private OSPF MIBs are supported,
and you can perform the set operation on the private MIBs.
Principles
After an OSPF process is bound to the MIB, the network administrator can perform the get
and get-next operations through the OSPF MIB to obtain information about OSPF link state
databases (LSDBs), areas, interfaces, and neighbors of the bound OSPF process.
OSPF supports the set operation on three private MIB tables, process table, area table, and
network table.
l Using the set operation on the process table of a private MIB, you can create or delete an
OSPF process, and configure or delete parameters of the OSPF process.
l Using the set operation on the area table of a private MIB, you can create or delete an
OSPF area, and set or delete parameters of the OSPF area.
l Using the set operation on the network table of a private MIB, you can create or delete a
specific network segment for an OSPF area.
Using the set operation on the three tables, you can configure the basic OSPF functions and
set up a basic OSPF topology to conveniently manage and configure the network.
Definition
When multiple concurrent links exist, you can deploy OSPF mesh-group to add the links to a
mesh group. Then, OSPF floods LSAs only to a link selected from the mesh group, reducing
the pressure on the system.
The mesh-group feature is disabled by default.
Purpose
After receiving or generating an LSA, an OSPF process floods the LSA. When there are
multiple concurrent links, OSPF floods the LSA to each link and sends Update messages.
If there are 2000 concurrent links, OSPF floods each LSA 2000 times. Only one flooding,
however, is valid.
To prevent burden on the system caused by repetitive flooding, enable mesh-group to add
multiple concurrent links to a group so that the system floods LSAs only to a primary link
selected from the mesh group.
Principles
In Figure 8-70, ATN-A and ATN-B are OSPF neighbors and are connected through three
links. After receiving a new LSA from interface 4, ATN-A floods it to ATN-B through
interfaces 1, 2, and 3.
This flooding causes a heavy load on the concurrent links. For the neighbor with concurrent
links, only one link is needed to flood the LSA.
Figure 8-70 (LSA received on interface 4 of ATN-A and flooded to ATN-B through the parallel interfaces)
When multiple concurrent links exist between a device enabled with OSPF mesh-group and
its neighbor, the device selects a primary link to flood received LSAs, as shown in Figure
8-71.
Figure 8-71 (elements: ATN-A, ATN-B, interfaces 1 to 4, LSA)
As defined in OSPF, a device floods LSAs to a link only when the neighbor status is
Exchange or higher. When the status of the interface on the primary link is lower than
Exchange, the device reselects a primary link from the concurrent links and then floods the
LSA. After receiving the LSA flooded by ATN-A from link 1, ATN-B does not flood the LSA
back through interfaces 2 and 3.
As defined by the mesh-group feature, the router ID of a neighbor uniquely identifies the
mesh group. Interfaces connected to the same neighbor with the status higher than Exchange
belong to the same mesh group.
In Figure 8-72, a mesh group of ATN-A resides in Area 0, which contains the links of
interface 1 and interface 2. Interface 3 resides on the broadcast link and has more than one
neighbor. Therefore, interface 3 cannot be added to the mesh group.
Figure 8-72 (elements: ATN-A, ATN-B, interfaces 2, 3, and 4, Area0)
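Mesh-group membership and primary-link selection can be sketched as below. The interface records, the numeric state ranks, and the rule of picking the first link as the primary link are hypothetical; the sketch treats "Exchange or higher" as the membership threshold.

```python
from collections import defaultdict

EXCHANGE = 5  # hypothetical numeric rank of the Exchange neighbor state
FULL = 8      # hypothetical numeric rank of the Full neighbor state


def build_mesh_groups(interfaces):
    """Group interfaces by the router ID of their single neighbor.

    interfaces: list of dicts with 'name' and 'neighbors' [(router_id, state_rank)].
    """
    groups = defaultdict(list)
    for intf in interfaces:
        neighbors = intf["neighbors"]
        if len(neighbors) != 1:
            continue  # e.g. a broadcast link with more than one neighbor
        router_id, state = neighbors[0]
        if state >= EXCHANGE:
            groups[router_id].append(intf["name"])
    return dict(groups)


def primary_links(groups):
    """Pick one link per mesh group for flooding (here simply the first one)."""
    return {router_id: names[0] for router_id, names in groups.items()}


interfaces = [
    {"name": "interface 1", "neighbors": [("2.2.2.2", FULL)]},
    {"name": "interface 2", "neighbors": [("2.2.2.2", FULL)]},
    {"name": "interface 3", "neighbors": [("2.2.2.2", FULL), ("3.3.3.3", FULL)]},
]
groups = build_mesh_groups(interfaces)
print(groups)
print(primary_links(groups))
```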
NOTE
If mesh-group is enabled on a router and the router IDs of the router and its directly connected neighbor
are the same, LSDBs cannot be synchronized, and routes cannot be calculated correctly. In this case, you
need to reconfigure the router ID of the neighbor.
NOTE
The nexthop and outbound interface of an OSPF loop-free backup link can be obtained using either of
the following methods:
l For a static backup link, after IP FRR is enabled, configure a nexthop and an outbound interface for
the static backup link.
l For a dynamic backup link, after OSPF IP FRR is enabled, use the LFA algorithm to calculate the
nexthop and outbound interface for the dynamic backup link.
This section describes how to obtain the nexthop and outbound interface for the dynamic backup link.
Background
As networks develop, voice over IP (VoIP) and online video services pose higher
requirements for real-time transmission. Nevertheless, if a primary link fails, OSPF-enabled
devices need to perform multiple operations, including detecting the fault, updating the link-
state advertisement (LSA), flooding the LSA, calculating routes, and delivering forward
information base (FIB) entries before switching traffic to a new link. This process takes a
much longer time than 50 ms, the minimum delay to which users are sensitive. As a result, the
requirements for real-time transmission cannot be met. OSPF IP FRR can solve this problem.
OSPF IP FRR conforms to dynamic IP FRR defined by RFC 5286. With OSPF IP FRR,
devices can switch traffic from a faulty primary link to a backup link within 50 ms, protecting
against a link or node failure.
Major Auto FRR techniques include loop-free alternate (LFA), U-turn, Not-Via, Remote LFA,
and MRT, among which OSPF supports only LFA and Remote LFA.
Related Concepts
OSPF IP FRR
OSPF IP FRR refers to a mechanism in which a device uses the loop-free alternate (LFA)
algorithm to compute the next hop of a backup link and stores the next hop together with the
primary link in the forwarding table. If the primary link fails, the device switches the traffic to
the backup link before routes are converged on the control plane. This mechanism keeps the
traffic interruption duration within 50 ms and minimizes the impacts.
An OSPF IP FRR policy is used to filter alternate next hops. Only the alternate next hops that
match the filtering rules of the policy can be added to the IP routing table. Users can
configure a desired OSPF IP FRR policy to filter alternate next hops.
LFA algorithm
A device uses shortest path first (SPF) algorithm to calculate the shortest path from each
neighbor that can provide a backup link to the destination node. The device then uses the
inequalities defined in RFC 5286 and the LFA algorithm to calculate the next hop of the loop-
free backup link that has the smallest cost of the available shortest paths.
Remote LFA
LFA Auto FRR cannot be used to calculate alternate links on large-scale networks, especially
on ring networks. Remote LFA Auto FRR addresses this problem by calculating a PQ node
and establishing a tunnel between the source node of a primary link and the PQ node. If the
primary link fails, traffic can be automatically switched to the tunnel, which improves
network reliability.
P space
P space consists of the nodes through which the shortest path trees (SPTs) with the source
node of a primary link as the root are reachable without passing through the primary link.
Extended P space
Extended P space consists of the nodes through which the SPTs with neighbors of a primary
link's source node as the root are reachable without passing through the primary link.
Q space
Q space consists of the nodes through which the SPTs with the destination node of a primary
link as the root are reachable without passing through the primary link.
PQ node
A PQ node exists both in the extended P space and Q space and is used by Remote LFA as the
destination of a protection tunnel.
NOTE
In the following description, Distance_opt(X, Y) indicates the cost of the shortest path from X to Y. S stands for a source node, N for a node along a backup link, and D for a destination node.
Link protection takes effect when the traffic to be protected flows along a specified link and
the link costs meet the inequality: Distance_opt(N, D) < Distance_opt(N, S) +
Distance_opt(S, D).
As shown in Figure 8-73, traffic flows from Router S to Router D. The primary link is Router
S->Router E->Router D, and the backup link is Router S->Router N->Router E->Router D.
The preceding inequality is met. With OSPF IP FRR, Router S switches the traffic to the
backup link if the primary link fails, keeping the traffic interruption duration within 50 ms.
Node-and-link protection
NOTE
In the following description, Distance_opt(X, Y) indicates the cost of the shortest path from X to Y. S stands for a source node, E for the faulty node, N for a node along a backup link, and D for a destination node.
Node-and-link protection takes effect when the traffic to be protected flows along a specified
link and node and the following conditions are met:
As shown in Figure 8-74, traffic flows from Router S to Router D. The primary link is Router
S->Router E->Router D, and the backup link is Router S->Router N->Router D. The
preceding inequalities are met. With OSPF IP FRR, Router S switches the traffic to the
backup link if the primary link fails, keeping the traffic interruption duration within 50 ms.
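The LFA checks can be sketched with a plain Dijkstra computation. The link-protection inequality is the one given above. The node-protection condition is not reproduced in this section, so the sketch assumes the node-protecting inequality from RFC 5286, Distance_opt(N, D) < Distance_opt(N, E) + Distance_opt(E, D). The two topologies and their costs are hypothetical stand-ins for Figure 8-73 and Figure 8-74.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra: cost of the shortest path from source to every node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def make_graph(links):
    """Build an undirected graph from (node_a, node_b, cost) tuples."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def lfa_protection(graph, s, e, n, d):
    """Return (link_protecting, node_protecting) for neighbor n of s."""
    dist = {x: shortest_costs(graph, x) for x in graph}
    link = dist[n][d] < dist[n][s] + dist[s][d]
    node = dist[n][d] < dist[n][e] + dist[e][d]
    return link, node


# Backup path S -> N -> E -> D: protects the link S-E but still crosses E.
link_only = make_graph([("S", "E", 1), ("E", "D", 1), ("S", "N", 1), ("N", "E", 1)])
# Backup path S -> N -> D: avoids both the link S-E and the node E.
node_and_link = make_graph([("S", "E", 1), ("E", "D", 1), ("S", "N", 2), ("N", "D", 2)])

print(lfa_protection(link_only, "S", "E", "N", "D"))
print(lfa_protection(node_and_link, "S", "E", "N", "D"))
```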
In Figure 8-75, traffic flows through PE1 -> P1 -> P2 -> PE2, and the primary link is between
P1 and P2. Remote LFA calculates a PQ node (P4) and establishes a Label Distribution
Protocol (LDP) tunnel between P1 and P4. If P1 detects a failure on the primary link, P1
encapsulates packets into MPLS packets and forwards MPLS packets to P4. After receiving
the packets, P4 removes the MPLS label from them and searches the IP routing table for a
next hop to forward the packets to PE2. Remote LFA ensures uninterrupted traffic forwarding.
On the network shown in Figure 8-75, Remote LFA calculates the PQ node as follows:
1. Calculates the SPTs with all neighbors of P1 as roots. The nodes through which the SPTs
are reachable without passing through the primary link form an extended P space. The
extended P space in this example is {PE1, P1, P3, P4}.
2. Calculates the SPTs with P2 as the root and obtains the Q space {PE2, P4}.
3. Selects the PQ node (P4) that exists both in the extended P space and Q space.
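The PQ node calculation above can be reproduced with a short sketch. It assumes that every link has a cost of 1 and that the topology is the ring P1-P2-P4-P3 with PE1 attached to P1 and PE2 attached to P2; neither assumption is stated in the text. Space membership is tested with cost inequalities in the style of RFC 7490 instead of explicit SPT pruning, and the root P2 is left out of the Q space to match the example.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra: cost of the shortest path from source to every node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def pq_nodes(graph, s, e):
    """Return (extended P space, Q space, PQ nodes) for the protected link s-e."""
    dist = {x: shortest_costs(graph, x) for x in graph}
    link_cost = graph[s][e]
    neighbors = [n for n in graph[s] if n != e]
    # p is in the extended P space if some neighbor n of s reaches p on
    # shortest paths that never cross the protected link.
    ext_p = {p for p in graph for n in neighbors
             if dist[n][p] < dist[n][s] + link_cost + dist[e][p]}
    # x is in the Q space if x reaches e without crossing the protected link.
    q = {x for x in graph if x != e and dist[x][e] < dist[x][s] + link_cost}
    return ext_p, q, ext_p & q


links = [("PE1", "P1"), ("P1", "P2"), ("P2", "PE2"),
         ("P1", "P3"), ("P3", "P4"), ("P4", "P2")]
graph = {}
for a, b in links:
    graph.setdefault(a, {})[b] = 1  # assumed unit cost
    graph.setdefault(b, {})[a] = 1

ext_p, q, pq = pq_nodes(graph, "P1", "P2")
print(sorted(ext_p), sorted(q), sorted(pq))
```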
OSPF anti-microloop
In Figure 8-75, OSPF remote LFA FRR is enabled, the primary link is PE1 -> P1 -> P2 ->
PE2, and the backup link is PE1 -> P1 -> P3 -> P4 -> P2 -> PE2. If the primary link fails,
traffic is switched to the backup link. Specifically, after P1 completes route convergence, its
next hop becomes P3. However, the route convergence on P3 is slower than that on P1, and
P3's next hop is still P1. As a result, a temporary loop occurs between P1 and P3. OSPF anti-
microloop can address this problem by delaying P1 from switching traffic to P3 until the route
convergence on P3 completes.
NOTE
OSPF anti-microloop applies only to OSPF remote LFA FRR.
Derivative Functions
If you bind a Bidirectional Forwarding Detection (BFD) session with OSPF IP FRR, the BFD
session goes Down if BFD detects a link fault. If the BFD session goes Down, OSPF IP FRR
is triggered on the interface to switch traffic from the faulty link to the backup link, which
minimizes the loss of traffic.
According to the types of packets, the authentication is classified into the following:
l Area authentication
This authentication is configured in the OSPF area view and applies to the packets
received by all the interfaces in the OSPF area.
l Interface authentication
This authentication is configured in the interface view and applies to all the packets
received by the interface.
According to the authentication modes of packets, the authentication is classified into the
following:
l Non-authentication
Authentication is not required.
l Simple authentication
The authenticated party directly adds the configured password to packets for
authentication. This imposes security threats.
l MD5 authentication
The authenticated party encrypts the configured password using a Message Digest 5
(MD5) algorithm and adds the ciphertext password to packets for authentication. This
authentication mode improves password security. The supported MD5 algorithms include MD5 and HMAC-MD5.
l Keychain authentication
A keychain consists of multiple authentication keys, each of which contains an ID and a password. Each key has a lifetime. Based on the lifetimes of the keys, different authentication keys are dynamically selected from the keychain, which enhances attack defense.
Keychain provides authentication protection for OSPF by dynamically changing
algorithms and keys to improve the security of OSPF.
l HMAC-SHA256 authentication
The HMAC-SHA256 algorithm is used to encrypt a password before the password is added to the packet, which improves password security.
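Lifetime-based key selection in a keychain can be sketched as follows. The key fields, dates, and passwords are hypothetical, and the sketch does not model how a real keychain handles overlapping lifetimes or tolerance windows.

```python
from datetime import datetime


def active_key(keychain, now):
    """Return the first key whose lifetime contains `now`, or None."""
    for key in keychain:
        if key["start"] <= now < key["end"]:
            return key
    return None


# Hypothetical keys with consecutive lifetimes
keychain = [
    {"id": 1, "password": "example-key-1",
     "start": datetime(2018, 1, 1), "end": datetime(2018, 2, 1)},
    {"id": 2, "password": "example-key-2",
     "start": datetime(2018, 2, 1), "end": datetime(2018, 3, 1)},
]

print(active_key(keychain, datetime(2018, 1, 30))["id"])
print(active_key(keychain, datetime(2018, 2, 15))["id"])
```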
OSPF carries authentication types in packet headers and authentication information in packet
tails.
l 0: Non-authentication
l 1: Simple authentication
l 2: Ciphertext authentication
Application Environment
l OSPF neighbor relationships can be set up between multiple devices on the same
network only when interface authentication is configured in the same manner on all the
devices.
l When multiple devices are in the same area, you must configure area authentication in
the same manner on all the devices.
8.6.3.1 OSPF GR
On the network shown in Figure 8-77, ATN-A, CX-B, CX-C, and CX-D run OSPF for
interworking, and GR is enabled on ATN-A and CX-B. When ATN-A restarts, CX-B helps
ATN-A to perform GR, without notifying other neighbors of ATN-A's restart. This ensures
uninterrupted network traffic forwarding.
[Figure 8-77: ATN-A and CX-B set up a neighbor relationship and negotiate GR. CX-B does
not notify CX-C that ATN-A restarts.]
Terms
Term Description
OSPF: Open Shortest Path First (OSPF) is a link-state Interior Gateway Protocol
(IGP) developed by the Internet Engineering Task Force (IETF). OSPF
version 2 (OSPFv2), which is defined in RFC 2328, is intended for IPv4.
OSPF version 3 (OSPFv3), which is defined in RFC 2740, is intended for
IPv6.
OSPF IP FRR: With OSPF IP fast reroute (FRR), a device pre-computes alternate next hops
and stores them in the IP routing table. If a primary link fails, the device
switches the traffic to a backup link within 50 ms.
OSPF TE: OSPF Traffic Engineering (TE) is a new feature extended on the basis of
OSPF to support MPLS TE and establish and maintain the Label Switch
Path (LSP) of TE. In the MPLS TE architecture, OSPF functions as the
information advertising component, responsible for collecting and
advertising MPLS TE information.
Sham links: Sham links are unnumbered P2P links between two PEs over an MPLS VPN
backbone network.
Virtual link: A virtual link is a logical channel established between two ABRs over a
non-backbone area.
8.7 OSPFv3
8.7.1 Introduction
Definition
Open Shortest Path First (OSPF), developed by the Internet Engineering Task Force (IETF), is
a link-state Interior Gateway Protocol (IGP).
At present, OSPF Version 2 (OSPFv2) is used for IPv4, and OSPF Version 3 (OSPFv3) is
used for IPv6.
Purpose
The primary purpose of OSPFv3 is to develop a routing protocol independent of any specific
network layer. The internal routing information of OSPFv3 is redesigned to serve this
purpose.
l OSPFv3 does not insert IP-based data in the header of each packet and Link State
Advertisement (LSA).
l OSPFv3 executes some crucial tasks that originally require the data in the IP packet
header using the information independent of any network protocol. For example,
OSPFv3 can identify the LSA that advertises the routing data.
8.7.2 Principles
Database Description (DD) packet: A DD packet contains the summary of the local LSDB.
It is exchanged between two OSPFv3 routers to update the LSDBs.
Link State Request (LSR) packet: LSR packets are sent to the neighbor to request the
required LSAs. An OSPFv3 router sends LSR packets to its neighbor only after they
exchange DD packets.
Link State Update (LSU) packet: The LSU packet is used to transmit required LSAs to
the neighbor.
Link State Acknowledgment (LSAck) packet: The LSAck packet is used to acknowledge the
received LSAs.
LSA Type
LSA Type Description
Link-LSA (Type8) Each router generates a link LSA for each link. A link LSA
describes the link-local address and IPv6 address prefix
associated with the link and the link option set in the
network LSA. It is transmitted only on the link.
Router Type
[Figure: OSPFv3 router types. Labels in the figure: IS-IS, ASBR, Area0, Area1, Area4,
Backbone Router, Internal Router.]
Area border router (ABR): An ABR can belong to two or more areas, but one of the
areas must be a backbone area. An ABR is used to connect the backbone area and the
non-backbone areas. It can be physically or logically connected to the backbone area.
AS boundary router (ASBR): A router that exchanges routing information with other ASs
is called an ASBR. An ASBR is not necessarily located on the boundary of an AS. It can
be an internal router or an ABR.
Type1 external routes: Because Type1 external routes are highly reliable, the
calculated cost of a Type1 external route is of the same order as the cost of AS
internal routes and can be compared with the cost of OSPFv3 routes.
That is, the cost of a Type1 external route equals the cost of the route from the
router to the corresponding ASBR plus the cost of the route from the ASBR to the
destination address.
Type2 external routes: Because Type2 external routes are less reliable, the cost of
the route from the ASBR to a destination outside the AS is considered far greater
than the cost of any internal path to an ASBR.
Therefore, OSPFv3 takes only the cost of the route from the ASBR to a destination
outside the AS into account when calculating route costs. That is, the cost of a
Type2 external route equals the cost of the route from the ASBR to the destination
of the route.
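The two cost rules can be written as a short Python sketch. The function and parameter names are hypothetical, and the sample costs are arbitrary values chosen for illustration.

```python
def type1_cost(cost_to_asbr: int, asbr_to_destination: int) -> int:
    """Type1: internal cost to the ASBR plus the ASBR's cost to the destination."""
    return cost_to_asbr + asbr_to_destination


def type2_cost(cost_to_asbr: int, asbr_to_destination: int) -> int:
    """Type2: only the ASBR's cost to the destination is taken into account."""
    # cost_to_asbr is deliberately ignored for Type2 external routes.
    return asbr_to_destination


# Sample values: internal cost 10 to the ASBR, external cost 20 from the ASBR.
print(type1_cost(10, 20))
print(type2_cost(10, 20))
```

With the same inputs, the Type1 cost grows with the distance to the ASBR, whereas the Type2 cost is the same on every router in the AS.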
Area
When a large number of ATNs run OSPFv3, link state databases (LSDBs) become very large
and require a large amount of storage space. Large LSDBs also complicate shortest path first
(SPF) computation and are computationally intensive for the ATNs. Network expansion
causes the network topology to change, which results in route flapping and frequent OSPFv3
packet transmission. When a large number of OSPFv3 packets are transmitted on the network,
bandwidth usage efficiency decreases. Each change in the network topology causes all ATNs
on the network to recalculate routes.
OSPFv3 resolves this problem by partitioning an AS into different areas. An area is regarded
as a logical group, and each group is identified by an area ID. An ATN, not a link, resides at the
border of an area. A network segment or link can belong only to one area. An area must be
specified for each OSPFv3 interface.
OSPFv3 areas include common areas, stub areas, and not-so-stubby areas (NSSAs), as
described in Table 8-34.
Stub area: A stub area is a non-backbone area with only one ABR and generally resides
at the border of an AS. The area border router (ABR) in a stub area does not transmit
received AS external routes, which significantly decreases the number of entries in
the routing table on the ABR and the amount of routing information to be transmitted.
To ensure the reachability of AS external routes, the ABR in the stub area generates
a default route and advertises the route to non-ABRs in the stub area.
A totally stub area allows only intra-area routes and ABR-advertised Type 3 link state
advertisements (LSAs) carrying a default route to be advertised within the area.
l The backbone area cannot be configured as a stub area.
l An autonomous system boundary router (ASBR) cannot exist in a stub area. Therefore,
AS external routes cannot be advertised within the stub area.
l A virtual link cannot pass through a stub area.
Non-broadcast Multiple Access (NBMA): If the link layer protocol is frame relay, ATM,
or X.25, OSPFv3 defaults the network type to NBMA.
In this type of network, protocol packets such as Hello messages, DD packets, LSR
packets, LSU packets, and LSAck packets are transmitted in unicast mode.
Point-to-Multipoint (P2MP): Regardless of the link layer protocol, OSPFv3 does not
default the network type to P2MP. A P2MP network must be forcibly changed from other
network types. The common practice is to change a non-fully connected NBMA network to
a P2MP network.
In this type of network, the following situations occur:
l Hello messages are transmitted in multicast mode with the multicast address FF02::5.
l Other protocol packets, including DD packets, LSR packets, LSU packets, and LSAck
packets, are transmitted in unicast mode.
Point-to-point (P2P): If the link layer protocol is PPP, HDLC, or LAPB, OSPFv3
defaults the network type to P2P.
In this type of network, the protocol packets, including Hello messages, DD packets,
LSR packets, LSU packets, and LSAck packets, are transmitted to the multicast address
FF02::5.
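The defaults and delivery modes listed for NBMA, P2MP, and P2P networks can be summarized in a small lookup sketch. The dictionary keys and function names are hypothetical spellings chosen for illustration, not device keywords, and only the network types described in the rows above are covered.

```python
# Default OSPFv3 network type per link layer protocol, as listed in the text.
DEFAULT_NETWORK_TYPE = {
    "frame relay": "NBMA",
    "atm": "NBMA",
    "x.25": "NBMA",
    "ppp": "P2P",
    "hdlc": "P2P",
    "lapb": "P2P",
}

ALL_OSPF_ROUTERS = "FF02::5"  # multicast address named in the text


def default_network_type(link_protocol):
    """Return the default network type, or None if the rows above list none."""
    return DEFAULT_NETWORK_TYPE.get(link_protocol.lower())


def delivery_mode(network_type, packet):
    """Return 'unicast' or the multicast address used for a protocol packet."""
    if network_type == "NBMA":
        return "unicast"
    if network_type == "P2MP":
        # Only Hello messages are multicast on P2MP networks.
        return ALL_OSPF_ROUTERS if packet == "Hello" else "unicast"
    if network_type == "P2P":
        return ALL_OSPF_ROUTERS
    raise ValueError(f"network type not covered by this sketch: {network_type}")
```

P2MP never appears as a value in the lookup table because, as stated above, it is never a default and must be configured explicitly.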
Stub Area
A stub area is a special area where the ABRs do not flood the received external routes. In stub
areas, the size of the routing table of the routers and the routing information in transmission
are reduced.
Configuring a stub area is optional. Not all areas can be configured as stub areas. Usually, a
stub area is a non-backbone area with only one ABR and is located at the AS boundary.
To ensure the reachability of a destination outside the AS, the ABR in the stub area generates
a default route and advertises it to the non-ABR routers in the stub area.
[Figure 8-79: A virtual link between two ABRs across the transit area Area1 connects
Area0 and Area2.]
As shown in Figure 8-79, OSPFv3 packets transmitted between two ABRs are only
forwarded by the OSPFv3 devices that reside between the two ABRs. The OSPFv3 devices
detect that they are not the destinations of the packets, so they forward the packets as common
IP packets.
OSPFv3 Multi-process
OSPFv3 supports multi-process. More than one OSPFv3 process can run on the same router
because processes are independent of each other. Route interaction between different OSPFv3
processes is similar to the route interaction between different routing protocols.
– OSPFv3 floods LSAs in an OSPFv3 area or on a link. It uses the U flag bit of an LSA
to control the handling of unidentified LSAs: such an LSA is either flooded with
link-local scope or stored and forwarded, for example, to a stub area.
For example, ATN A and ATN B can identify LSAs of a certain type. They are
connected through ATN C, which, however, cannot identify this type of LSAs. When
ATN A floods an LSA of this type, ATN C can still flood the received LSA to ATN B
although it does not identify this LSA. ATN B then processes the LSA.
If OSPFv2 is run, ATN C discards the unidentified LSA so that the LSA cannot reach
ATN B.
l OSPFv3 supports multi-process on a link.
In OSPFv2, only one OSPF process can be configured on a physical interface.
In OSPFv3, one physical interface can be configured with multiple processes that are
identified by different instance IDs. That is, multiple OSPFv3 instances can run on one
physical link. They establish neighbor relationships with the other end of the link and
transmit packets to the other end without interfering with each other.
Thus, the resources of a link can be shared among OSPFv3 instances that simulate
multiple OSPFv3 routers, which improves the utilization of limited router resources.
l OSPFv3 uses IPv6 link-local addresses.
IPv6 implements neighbor discovery and automatic configuration based on link-local
addresses. Routers running IPv6 do not forward IPv6 packets whose destination address
is a link-local address. Those packets can only be exchanged on the same link. Unicast
link-local addresses start with the prefix FE80::/10.
As a routing protocol running on IPv6, OSPFv3 also uses link-local addresses to
maintain neighbor relationships and update LSDBs. Except for Vlink interfaces, all OSPFv3
interfaces use link-local addresses as the source address and the next-hop address to
transmit OSPFv3 packets.
The advantages are as follows:
– OSPFv3 can calculate the topology without knowing the global IPv6 addresses
so that topology calculation is not based on IP addresses.
– The packets flooded on a link are not transmitted to other links, which prevents
unnecessary flooding and saves bandwidth.
l OSPFv3 packets do not contain authentication fields.
OSPFv3 directly adopts IPv6 authentication and security measures. Thus, OSPFv3 does
not need to perform authentication. It only focuses on the processing of packets.
l OSPFv3 supports two new LSAs.
– Link LSA: A router floods a link LSA on the link where it resides to advertise its
link-local address and the configured global IPv6 address.
– Intra-area prefix LSA: A router advertises an intra-area prefix LSA in the local
OSPF area to inform the other routers in the area or the network, which can be a
broadcast network or an NBMA network, of its IPv6 global address.
l OSPFv3 identifies neighbors based on router IDs only.
On broadcast, NBMA, and P2MP networks, OSPFv2 identifies neighbors based on IPv4
addresses of interfaces.
OSPFv3 identifies neighbors based on router IDs only. Thus, even if global IPv6
addresses are not configured or they are configured in different network segments,
OSPFv3 can still establish and maintain neighbor relationships so that topology
calculation is not based on IP addresses.
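The link-local behavior described in the list above can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the helper names `is_link_local` and `may_forward` are hypothetical, and the addresses are sample values.

```python
import ipaddress

# Unicast link-local prefix used by OSPFv3 as source and next-hop addresses.
LINK_LOCAL = ipaddress.ip_network("fe80::/10")


def is_link_local(addr: str) -> bool:
    """Return True if the IPv6 address falls within FE80::/10."""
    return ipaddress.ip_address(addr) in LINK_LOCAL


def may_forward(destination: str) -> bool:
    """A router does not forward packets destined for a link-local address."""
    return not is_link_local(destination)


print(is_link_local("fe80::1"))        # inside FE80::/10
print(is_link_local("2001:db8::1"))    # a global-style documentation address
print(may_forward("fe80::1"))
```

Because FE80::/10 fixes only the first 10 bits, addresses from `fe80::` through `febf:ffff:...` match the prefix.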
OSPF: Open Shortest Path First (OSPF) is a link-state Interior Gateway Protocol
(IGP) developed by the Internet Engineering Task Force (IETF). OSPF
version 2 (OSPFv2), which is defined in RFC 2328, is intended for IPv4.
OSPF version 3 (OSPFv3), which is defined in RFC 2740, is intended for
IPv6.
OSPFv3 IP FRR: With OSPFv3 IP fast reroute (FRR), a device pre-computes alternate next
hops and stores them in the IP routing table. If a primary link fails, the
device switches the traffic to a backup link within 50 ms.
8.8 BGP
l If BGP and BGP4+ implement a feature in the same way, details are not provided in this chapter.
l For the route aggregation function, BGP supports both automatic aggregation and manual
aggregation, whereas BGP4+ supports only manual aggregation.
Border Gateway Protocol (BGP) is a dynamic routing protocol used between autonomous
systems (ASs).
BGP-1 (defined in RFC 1105), BGP-2 (defined in RFC 1163), and BGP-3 (defined in RFC
1267) are three earlier-released versions of BGP. BGP exchanges the reachable inter-AS
routes, establishes inter-AS paths, avoids routing loops, and applies routing policies between
ASs.
The current BGP version is BGP-4 defined by RFC 4271.
As an exterior routing protocol on the Internet, BGP is widely used among Internet Service
Providers (ISPs).
Purpose
BGP transmits routes between ASs, but is not required in all situations.
[Figure 8-80: A client AS is connected through EBGP to ISP1 and ISP2, which connect to
the Internet. IBGP runs within the client AS.]
l As shown in Figure 8-80, the user (Client AS) needs to be connected to two or more
ISPs. The ISPs need to provide all or part of the Internet routes for the user. Based on the
AS Path carried in BGP routes, the ATN selects the optimal route through the AS of an
ISP to the destination.
l Different organizations need to transmit the AS_Path.
l Users transmit private network routes through Layer 3 VPN. For details, see the Feature
Description - VPN.
8.8.2 Principles
This chapter describes BGP features.
BGP is called IBGP when it runs within an AS and is called EBGP when it runs between ASs.
[Figure: IBGP runs within the client AS, and EBGP runs between the client AS and ISP1
and ISP2, which connect to the Internet.]
BGP Messages
BGP runs by sending five types of BGP messages: Open, Update, Notification, Keepalive,
and Route-refresh.
l Open message: is the first message that is sent after a TCP connection is set up, and is
used to negotiate capability in order to set up BGP peer relationships. After the peer
receives an Open message and peer negotiation succeeds, the peer sends a Keepalive
message to confirm and maintain the peer relationship. Then, peers can exchange
Update, Notification, Keepalive, and Route-refresh messages.
l Update message: is used to exchange routes between BGP peers. Update messages can
be used to send the following communications:
– Advertise multiple reachable routes with the same attributes. These routes can share
a group of route attributes. Route attributes contained in an Update message are
applicable to all destination addresses (expressed by IP prefixes) contained in the
Network Layer Reachability Information (NLRI) field of the Update message.
– Withdraw multiple unreachable routes. Each route is identified by its destination
address, which identifies routes previously advertised between BGP speakers.
– Withdraw routes only. In this case, the message does not need to carry the path
attributes or NLRI. Conversely, an Update message can be used only to advertise
the reachable routes, so it does not need to carry information about withdrawn
routes.
l Notification message: is sent to its peer when BGP detects an error. The BGP connection
is then torn down immediately.
l Keepalive message: is sent periodically to the peer to maintain the peer relationship.
l Route-refresh message: is used to request that the peer resend all reachable routes.
If all devices of BGP are enabled with Route-refresh capability, the local BGP device
sends Route-refresh messages to peers when the import routing policy of BGP changes.
After receiving the message, the peers resend their routing information to the local BGP
device. The BGP routing table can be dynamically refreshed, and the new routing policy
can be used, without tearing down BGP connections.
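The five message types can be told apart by the type field in the fixed BGP message header. The following Python sketch assumes the header layout of RFC 4271 (a 16-byte marker, a 2-byte length, and a 1-byte type) and the type code 5 for Route-refresh from RFC 2918; these codes are not stated in this document, and the function names are hypothetical.

```python
import struct

# Type codes taken from RFC 4271 and RFC 2918 (assumption: not listed in the text).
MESSAGE_TYPES = {
    1: "Open",
    2: "Update",
    3: "Notification",
    4: "Keepalive",
    5: "Route-refresh",
}

# Fixed header: 16-byte marker, 2-byte total length, 1-byte type (network byte order).
HEADER = struct.Struct("!16sHB")
MARKER = b"\xff" * 16


def build_message(msg_type: int, body: bytes = b"") -> bytes:
    """Prepend the fixed header to a message body."""
    return HEADER.pack(MARKER, HEADER.size + len(body), msg_type) + body


def parse_header(data: bytes):
    """Return (message name, total length) from the fixed header."""
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("unexpected marker")
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type {msg_type}")
    return MESSAGE_TYPES[msg_type], length


# A Keepalive message consists of the header only.
print(parse_header(build_message(4)))
```

A receiver that hits either `ValueError` branch corresponds to the error case in which BGP sends a Notification message and tears down the connection.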
– If BGP initiates a TCP connection with an unknown IP address, the TCP connection
fails. When this occurs, BGP restarts the ConnectRetry Timer with the initial value
and stays in the Active state.
l In OpenSent state, BGP has sent one Open message to its peer and waits for the other
Open message from the peer.
– If there are no errors in the Open message received, BGP changes its state to
OpenConfirm and sends a Keepalive message.
– If there are errors in the Open message received, BGP sends a Notification message
to the remote peer and changes its state to Idle.
– If the TCP connection fails, BGP restarts the ConnectRetry Timer with the initial
value, continues to listen for a TCP connection initiated by the remote peer, and
changes its state to Active.
l In OpenConfirm state, BGP waits for a Notification message or a Keepalive message.
– If BGP receives a Notification message or the TCP connection fails, BGP changes
its state to Idle.
– If BGP receives a Keepalive message, BGP changes its state to Established.
l In Established state, BGP peers can exchange Update, Route-Refresh, Keepalive, and
Notification messages.
During establishment of BGP peer relationships, BGP is usually in the Idle, Active, or
Established state.
l If BGP receives an Update or a Keepalive message, its state stays in Established.
l If BGP receives a Notification message, BGP changes its state to Idle.
The BGP peer relationship can be established only when both BGP peers are in the
Established state. The two peers send Update messages to exchange routes.
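The state transitions described above for the OpenSent, OpenConfirm, and Established states can be sketched as a lookup table. This is a simplified illustration, not a complete BGP finite state machine: the event names are hypothetical, and states or events that the text does not describe are rejected rather than guessed.

```python
# (current state, event) -> next state, following the transitions in the text.
TRANSITIONS = {
    ("OpenSent", "open_ok"): "OpenConfirm",       # valid Open received
    ("OpenSent", "open_error"): "Idle",           # Notification sent to the peer
    ("OpenSent", "tcp_fail"): "Active",           # ConnectRetry Timer restarted
    ("OpenConfirm", "notification"): "Idle",
    ("OpenConfirm", "tcp_fail"): "Idle",
    ("OpenConfirm", "keepalive"): "Established",
    ("Established", "update"): "Established",
    ("Established", "keepalive"): "Established",
    ("Established", "notification"): "Idle",
}


def next_state(state: str, event: str) -> str:
    """Return the next state, or raise if the text defines no such transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition described for {state!r} on {event!r}")


def run(state: str, events) -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state


print(run("OpenSent", ["open_ok", "keepalive", "update"]))
```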
BGP Processing
l BGP adopts TCP as its transport layer protocol. Before the BGP peer relationship is set
up, a TCP connection must be set up between the peers. Then, BGP peers exchange
Open messages to negotiate related parameters, and finally establish the BGP peer
relationship.
l After the peer relationship is set up, BGP peers exchange BGP routing tables. BGP does
not periodically update the routing table. When BGP routes change, however, BGP
updates the BGP routing table incrementally through Update messages.
l BGP sends Keepalive messages to maintain the BGP connection between peers. When BGP
detects an error on a network, for example, when it receives error packets or packets
that indicate an unsupported negotiation capability, BGP sends a Notification message
to report the error, and the BGP connection is torn down.
BGP Attributes
The BGP route attribute is a set of parameters that further describe routes. With the BGP route
attribute, BGP can filter and select routes. BGP route attributes are classified into the
following types:
l Well-known mandatory: can be identified by all BGP devices. This type of attribute is
mandatory and must be carried in Update messages. Without this attribute, errors occur
in the routing information.
l Origin: defines the origin of a route and marks the paths of a BGP route. The Origin
attributes are classified into the following types:
– IGP: indicates the highest priority. For routing information obtained through an IGP
of the AS that originates the route, the Origin attribute is IGP. For example, for
routes imported to the BGP routing table through the network command, the Origin
attribute is IGP.
– Exterior Gateway Protocol (EGP): indicates the second highest priority. The Origin
attribute of routes obtained through EGP is EGP.
– Incomplete: indicates the lowest priority. The Origin attribute of routes learned by
other means is Incomplete. For example, for the routes imported by BGP through
the import-route command, the Origin attribute is Incomplete.
l AS_Path: is used to record all ASs that a route passes through from the local end to the
destination in the distance-vector (DV) order.
Assume that the BGP speaker advertises a local route:
– When advertising the route to other ASs, the BGP speaker adds the local AS
number in the AS_Path list, and advertises it to the neighboring devices through
Update messages.
– When advertising the route to the local AS, the BGP speaker creates an empty
AS_Path list in an Update message.
Assume that the BGP speaker advertises the routes learned from Update messages of
other BGP speakers:
– When advertising the route to other ASs, the BGP speaker adds the local AS
number to the beginning of the AS_Path list. According to the AS_Path attribute,
the BGP device that receives the route can detect the ASs through which the route
passes to the destination. The number of the AS that is nearest to the local AS is
placed at the top of the list. The other AS numbers are arranged in sequence.
– When the BGP speaker advertises the route to the local AS, it does not change the
AS_Path.
The AS_Path attribute has four types:
– AS_Sequence: a sequenced set of numbers of the ASs that a route passes through
from a local end to the destination
– AS_Set: an unsequenced set of numbers of the ASs that a route passes through from
a local end to the destination. The AS_Set attribute is used in the route aggregation
scenario. After route aggregation, the device cannot sequence the numbers of ASs
that specific routes pass through, so the AS_Set attribute is used to record the
unsequenced AS numbers. No matter how many AS numbers an AS_Set contains,
BGP regards the AS_Set as one AS number to calculate routes.
– AS_Confed_Sequence: a sequenced set of sub-AS numbers in a confederation
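The AS_Path update rules described above for local routes and learned routes reduce to one operation: the local AS number is added to the front of the list only when the route is advertised to another AS. A minimal Python sketch follows; the function name and the AS numbers are hypothetical, and the AS_Path is modeled as a plain list (an AS_Sequence only).

```python
def advertise(as_path, local_as, to_other_as):
    """Return the AS_Path carried in the Update message sent to a peer.

    as_path     -- AS_Path of the route as held locally ([] for a local route)
    local_as    -- AS number of the advertising BGP speaker
    to_other_as -- True when the peer is in another AS, False for the local AS
    """
    if to_other_as:
        # The nearest AS is placed at the beginning of the list.
        return [local_as] + list(as_path)
    # Within the local AS the AS_Path is left unchanged
    # (an empty list for a locally originated route).
    return list(as_path)


# A speaker in AS 100 advertises a local route and a learned route.
print(advertise([], 100, to_other_as=True))
print(advertise([], 100, to_other_as=False))
print(advertise([200, 300], 100, to_other_as=True))
print(advertise([200, 300], 100, to_other_as=False))
```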
c. A route imported using the network command is preferred over a route imported
using the import-route command.
4. Prefers a route that carries the Accumulated Interior Gateway Protocol Metric (AIGP)
attribute.
– The priority of a route that carries the AIGP attribute is higher than the priority of a
route that does not carry the AIGP attribute.
– If two routes both carry the AIGP attribute, the route with a smaller AIGP attribute
value plus IGP metric of the iterated next hop is preferred over the other route.
5. Prefers the route with the shortest AS_Path.
– The AS_CONFED_SEQUENCE and AS_CONFED_SET are not included in the
AS_Path length.
– An AS_SET counts as 1, no matter how many ASs are in the set.
– After you run the bestroute as-path-ignore command, the AS_Path attributes of
routes are not compared in the route selection process.
6. Prefers the route with the highest Origin type. IGP is higher than EGP, and EGP is higher
than Incomplete.
7. Prefers the route with the lowest MED.
– BGP compares MEDs of only routes from the same AS, but not a confederation
sub-AS. MEDs of two routes are compared only when the first AS number in the
AS_SEQUENCE (excluding AS_CONFED_SEQUENCE) is the same for the two
routes.
– A route without MED is assigned a MED of 0, unless the bestroute med-none-as-
maximum command is run. If you run the bestroute med-none-as-maximum
command, the route is assigned the highest MED of 4294967295.
– After you run the compare-different-as-med command, MEDs in routes received
from peers in different ASs are compared. Do not use this command unless you
confirm different ASs use the same IGP and route selection mode. Otherwise, a
loop can occur.
– If you run the bestroute med-confederation command, MEDs are compared for
routes that consist of only AS_CONFED_SEQUENCE. The first AS number in the
AS_CONFED_SEQUENCE must be the same for the routes.
– After you run the deterministic-med command, routes are not selected in the
sequence in which routes are received.
8. Prefers EBGP routes over IBGP routes.
EBGP is higher than IBGP, IBGP is higher than LocalCross, and LocalCross is higher
than RemoteCross.
If the export route target (ERT) of a VPNv4 route in the routing table of a VPN instance
on a Provider Edge (PE) matches the import route target (IRT) of another VPN instance
on the PE, the Virtual Private Network version 4 (VPNv4) route is added to the routing
table of the second VPN instance. This is called LocalCross. If the ERT of a VPNv4
route from a remote PE is learned by the local PE and matches the IRT of a VPN
instance on the local PE, the VPNv4 route will be added to the routing table of that VPN
instance. This is called RemoteCross.
9. Prefers the route with the lowest IGP metric to the BGP next hop.
After the bestroute igp-metric-ignore command is run, the IGP metrics are not
compared for routes during route selection.
NOTE
Assume that load balancing is configured. If routes are equal under the preceding rules
and there are multiple external routes with the same AS_Path, load balancing is performed
based on the configured number of routes.
10. Prefers the route with the shortest Cluster_List.
NOTE
By default, Cluster_List takes precedence over Originator_ID during BGP route selection. To
enable Originator_ID to take precedence over Cluster_List during BGP route selection, run the
bestroute routerid-prior-clusterlist command.
11. Prefers the route advertised by the device with the smallest router ID.
NOTE
If routes carry the Originator_ID, the originator ID is substituted for the router ID during route
selection. The route with the smallest Originator_ID is preferred.
12. Prefers the route learned from the peer with the smallest address if the IP addresses of
peers are compared in the route selection process.
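Part of the selection sequence above can be sketched as a pairwise comparison. The following Python example covers only rules 5 to 9 and rule 11 (AS_Path length, Origin, MED, EBGP over IBGP, IGP metric, and router ID) with default behavior, so it ignores the earlier rules, Cluster_List, peer addresses, LocalCross and RemoteCross, and every `bestroute` option. The `Route` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "Incomplete": 2}  # lower rank is preferred


@dataclass
class Route:
    segments: List[Tuple[str, List[int]]]  # (segment type, AS numbers)
    origin: str
    med: Optional[int]      # None when the route carries no MED
    is_ebgp: bool
    igp_metric: int
    router_id: int


def as_path_length(route: Route) -> int:
    """AS_SET counts as 1; confederation segments are not counted."""
    length = 0
    for seg_type, asns in route.segments:
        if seg_type == "AS_SEQUENCE":
            length += len(asns)
        elif seg_type == "AS_SET":
            length += 1
    return length


def first_as(route: Route) -> Optional[int]:
    """First AS number in the AS_SEQUENCE, ignoring confederation segments."""
    for seg_type, asns in route.segments:
        if seg_type == "AS_SEQUENCE" and asns:
            return asns[0]
    return None


def prefer(a: Route, b: Route) -> Route:
    """Return the preferred route under the simplified rules."""
    la, lb = as_path_length(a), as_path_length(b)
    if la != lb:
        return a if la < lb else b
    oa, ob = ORIGIN_RANK[a.origin], ORIGIN_RANK[b.origin]
    if oa != ob:
        return a if oa < ob else b
    # MEDs are compared only for routes from the same AS; a missing MED is 0.
    if first_as(a) is not None and first_as(a) == first_as(b):
        ma, mb = a.med or 0, b.med or 0
        if ma != mb:
            return a if ma < mb else b
    if a.is_ebgp != b.is_ebgp:
        return a if a.is_ebgp else b
    if a.igp_metric != b.igp_metric:
        return a if a.igp_metric < b.igp_metric else b
    return a if a.router_id <= b.router_id else b
```

Writing the comparison pairwise rather than as a single sort key keeps the MED step conditional on both routes coming from the same neighboring AS, which a plain tuple sort cannot express.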
BGP ECMP
When multiple equal-cost routes have the same destination address, traffic can be evenly load
balanced using BGP Equal Cost Multiple Path (ECMP).
Condition for BGP ECMP: Routes must have the same first nine attributes defined in the
preceding "Policies for BGP Route Selection".
l Advertises only the optimal route to its peer when there are multiple valid routes.
l Advertises the routes learned from EBGP devices to all BGP peers, including EBGP
peers and IBGP peers.
l Does not advertise the routes learned from IBGP devices to its IBGP peers.
l Advertises the routes learned from IBGP devices to its EBGP peers.
l Advertises all BGP optimal routes to new peers when the peer relationship is established.
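The advertisement rules in the list above can be condensed into a single decision function. This Python sketch uses hypothetical names and string labels, and it models only the listed rules; mechanisms such as route reflectors that relax the IBGP rule are outside its scope.

```python
def should_advertise(learned_from: str, peer_type: str, is_optimal: bool = True) -> bool:
    """Decide whether a route is advertised to a peer.

    learned_from -- "EBGP" or "IBGP": the kind of peer the route was learned from
    peer_type    -- "EBGP" or "IBGP": the kind of peer being advertised to
    is_optimal   -- only the optimal route among valid routes is advertised
    """
    if peer_type not in ("EBGP", "IBGP"):
        raise ValueError(f"unknown peer type {peer_type!r}")
    if not is_optimal:
        return False
    if learned_from == "EBGP":
        return True                      # to all EBGP and IBGP peers
    if learned_from == "IBGP":
        return peer_type == "EBGP"       # never back to IBGP peers
    raise ValueError(f"unknown source {learned_from!r}")


print(should_advertise("IBGP", "IBGP"))
print(should_advertise("IBGP", "EBGP"))
```

The IBGP-to-IBGP restriction is the reason the later text requires IBGP peers in an AS to be fully meshed.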
[Figure 1: Networking diagram showing the route 8.0.0.0/8, AS10, AS20, and AS30, the
devices ATN A to ATN E, and the EBGP, IBGP, and IGP connections between them.]
If synchronization is configured, devices check the IGP routing table before they add an
IBGP route to the routing table and advertise it to EBGP peers. The IBGP route is added
to the routing table and advertised to EBGP peers only when the IGP has also learned this route.
The synchronization can be disabled in the following cases:
l The local AS is not a transit AS. (AS20 in Figure 1 is a transit AS.)
l All devices in the local AS are full-meshed IBGP peers.
Purpose
On medium or large-scale Border Gateway Protocol (BGP) networks, the BGP routing table
on a device contains a large number of routing entries. Storing the routing table consumes a
great deal of memory, and transmitting and processing routing information consume
significant network resources. Route summarization can reduce the size of a routing table,
prevent specific routes from being advertised, and minimize the impact of route flapping on
network performance.
Definition
Route summarization is the process of summarizing specific routes with the same IP prefix
into a summary route. BGP supports automatic and manual route summarization. Table 8-36
defines the differences between the two modes.
Automatic route summarization: After automatic route summarization is configured, BGP
summarizes routes based on the natural network segment and sends only the summarized
route to peers. For example, 10.1.1.1/24 and 10.2.1.1/24 are summarized into
10.0.0.0/8, which is a Class A address.
l BGP summarizes only local routes that are imported using the import-route command.
l During BGP route selection, an automatically summarized route has a lower priority
than a manually summarized one.
l An automatically summarized route does not carry path information because BGP
summarizes only local routes that are imported using the import-route command.
l BGP4+ does not support automatic route summarization.
An automatically summarized route comes from local routes, and the mechanism of automatic
route summarization is much less complex than that of manual route summarization.
Therefore, the next section describes only manual route summarization.
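Summarization to the natural network segment, as in the 10.1.1.1/24 example given for automatic route summarization, can be reproduced with Python's standard `ipaddress` module. This is a sketch of the classful calculation only, not of the device behavior; the function name is hypothetical.

```python
import ipaddress


def natural_network(prefix: str) -> ipaddress.IPv4Network:
    """Return the classful (natural) network that contains an IPv4 prefix."""
    iface = ipaddress.ip_interface(prefix)
    first_octet = int(iface.ip) >> 24
    if first_octet < 128:        # Class A
        length = 8
    elif first_octet < 192:      # Class B
        length = 16
    elif first_octet < 224:      # Class C
        length = 24
    else:
        raise ValueError("Class D and Class E addresses have no natural network")
    # strict=False masks the host bits instead of rejecting them.
    return ipaddress.ip_network(f"{iface.ip}/{length}", strict=False)


# Both specific routes from the example fall into the same Class A network.
print(natural_network("10.1.1.1/24"))
print(natural_network("10.2.1.1/24"))
```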
Related Concepts
Atomic_Aggregate: a well-known discretionary BGP attribute, carried in Update messages,
indicating that a route is a summarized one. BGP speakers cannot delete this attribute during
route transmission.
AS_Set affects BGP route selection. Whenever AS_Set changes, a router sends Update messages to its
peers whose routes are not summarized by the router to notify the change. If the summarized route
passes through a large number of ASs and the specific routes change frequently, the router needs to send
Update messages frequently to its peers to notify them of the AS_Set changes. This process may lead to
route flapping.
Implementation
As shown in Figure 8-83, the router in AS 100 summarizes the routes from AS 65001, AS
65002, and AS 65003 into the route 10.1.1.0/24 and then advertises it. Because the route
10.1.1.0/24 originates from AS 100, it carries only AS 100 without the path information about
the specific routes for the summarization.
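[Figure 8-83: The router in AS 100 (local route 10.1.1.128/27) receives 10.1.1.192/27
with AS_Path 65003 65001 and 10.1.1.160/27 with AS_Path 65003 from AS 65003, and
10.1.1.224/27 with AS_Path 65002 from AS 65002. AS 65001 advertises 10.1.1.192/27 with
AS_Path 65001 to AS 65003. The router in AS 100 advertises the summarized route
10.1.1.0/24 with AS_Path: 100.]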
Without the path information, AS_Path carried in the route 10.1.1.0/24 can no longer prevent
routing loops. To warn downstream routers that the path information has been lost, the router
in AS 100 adds Atomic_Aggregate to an Update message.
As shown in Figure 8-84, the router in AS 100 adds Atomic_Aggregate and Aggregator to
an Update message to advertise the route 10.1.1.0/24.
NOTE
BGP speakers cannot delete Atomic_Aggregate carried in a summarized route during route
transmission. After the downstream router receives this route, the router cannot restore the
lost path information.
[Figure 8-84: Same networking as Figure 8-83. The summarized route 10.1.1.0/24
advertised by the router in AS 100 (router ID 10.1.1.1) carries AS_Path: 100,
Atomic_Aggregate, and Aggregator=100, 10.1.1.1.]
However, Atomic_Aggregate and Aggregator alone cannot prevent routing loops. AS_Set
can address this problem. If AS_Set is configured on the ATN in AS 100 in the networking
shown in Figure 8-85, the summarized route 10.1.1.0/24 carries AS_Set {65001, 65002,
65003}, which records all the ASs that the specific routes pass through.
[Figure 8-85: Same networking as Figure 8-83. The summarized route 10.1.1.0/24
advertised by the router in AS 100 (router ID 10.1.1.1) carries
AS_Path: 100 {65001, 65002, 65003} and Aggregator=100, 10.1.1.1.]
Because the ATN in AS 100 cannot determine the AS sequence, it records the ASs in AS_Set
without an order. When the ATN in AS 65001, AS 65002, or AS 65003 later receives the route
10.1.1.0/24 from elsewhere, it checks the AS_Path carried in the route. Because the AS_Path
contains its own AS number, the ATN discards the route. Therefore, even though the ASs are listed
without an order in AS_Set, routing loops can still be prevented.
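The effect of AS_Set on loop prevention can be shown with a short Python sketch that uses the AS numbers from this example. The function names are hypothetical, and the AS_Path is modeled simply as an ordered list plus an unordered set.

```python
def summarize(local_as, specific_paths, with_as_set):
    """Build the AS_Path of a summarized route.

    Returns (sequence, as_set): the ordered part holds only the local AS, and
    the unordered AS_Set collects the ASs of the specific routes if enabled.
    """
    as_set = set()
    if with_as_set:
        for path in specific_paths:
            as_set.update(path)
    return [local_as], as_set


def accepts(receiver_as, sequence, as_set):
    """A receiver discards a route whose AS_Path contains its own AS number."""
    return receiver_as not in sequence and receiver_as not in as_set


# AS_Paths of the specific routes received by the router in AS 100.
specific = [[65003, 65001], [65003], [65002]]

seq, as_set = summarize(100, specific, with_as_set=True)
print(seq, sorted(as_set))
print(accepts(65001, seq, as_set))   # AS 65001 finds itself in the AS_Set

seq_plain, empty_set = summarize(100, specific, with_as_set=False)
print(accepts(65001, seq_plain, empty_set))  # the loop goes undetected
```

The order of the members in the set is irrelevant to the check, which matches the observation that an unordered AS_Set still prevents loops.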
In the preceding section, all ATNs are old speakers. If new speakers co-exist with old
speakers, AS4_Path and AS4_Aggregator must be available. Figure 8-86 shows such a
scenario in which new speakers use 4-byte AS numbers and old speakers use 2-byte AS
numbers. The ATN in AS 3.3 summarizes the routes from AS 1.1, AS 65001, AS 2.2, and AS
65002 into the route 10.0.0.0/8 carrying AS4_Path {1.1, 65001, 2.2, 65002} and advertises it
to the ATNs in AS 4.4 and AS 65003. The carried AS4_Path equals an AS4_Set in function.
Figure 8-86 Networking for route summarization in which new speakers co-exist with old
speakers
[Figure content: the ATN in AS 3.3 (new speaker, router ID 192.168.1.1) advertises 10.0.0.0/8.
On the new session: AS_Path: 3.3 {1.1 65001 2.2 65002}, AS_Aggregator: 3.3 192.168.1.1.
On the old session: AS_Path: 23456 {23456 65001 65002}, Aggregator: 23456 192.168.1.1,
AS4_Path: 3.3 {1.1 65001 2.2 65002}, AS4_Aggregator: 3.3 192.168.1.1.]
l Because the ATN in AS 65003 does not support 4-byte AS numbers, the BGP connection between
the ATN in AS 3.3 and the ATN in AS 65003 is an old session. The ATN in AS 3.3 therefore
replaces the 4-byte AS numbers in AS_Path and Aggregator with 23456 (AS_Trans) before it
sends the route 10.0.0.0/8 to the ATN in AS 65003. As a result, the AS_Path carried in the
route is 23456{23456, 65001, 23456, 65002}, and the Aggregator is 23456 192.168.1.1.
After the ATN in AS 65003 receives the route 10.0.0.0/8, it checks the AS_Path.
Because its own AS number is not listed in the AS_Path, the ATN in AS 65003 accepts
the route.
NOTE
23456 is a reserved AS number and cannot be the number of the AS to which the downstream
router that receives the summarized route belongs. Therefore, the downstream router does not
discard the summarized route.
In addition, the ATN in AS 65003 may be connected to downstream new speakers in an
AS numbered in 4-byte format, AS 5.5 for example. To ensure that the ATN in AS 5.5
knows about the actual path that the route passes through, the ATN in AS 3.3 adds
AS4_Path and AS4_Aggregator to the Update message to advertise the route 10.0.0.0/8
to the ATN in AS 65003. After the ATN in AS 65003 receives the message, it
transparently transmits the message to the ATN in AS 5.5. After the ATN in AS 5.5
receives the message, it constructs the actual path that the route passes based on
AS4_Path and AS4_Aggregator carried in the message.
l Because the ATN in AS 4.4 supports 4-byte AS numbers, the BGP connection between the ATN
in AS 3.3 and the ATN in AS 4.4 is a new session. The ATN in AS 3.3 therefore adds only
AS_Path and AS_Aggregator to the Update message to advertise the route 10.0.0.0/8 to the
ATN in AS 4.4. After the ATN in AS 4.4 receives the route 10.0.0.0/8, it checks the AS_Path.
Because its own AS number is not listed in the AS_Path, the ATN in AS 4.4 accepts the
route.
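The substitution that a new speaker performs on an old session can be sketched as follows. This is an illustrative sketch only: the function names and the dictionary keys are hypothetical, and the code assumes that an AS number written as x.y stands for x * 65536 + y.

```python
AS_TRANS = 23456       # reserved 2-byte AS number used as a placeholder
MAX_2BYTE_AS = 65535


def parse_as(text):
    """Convert "x.y" or plain decimal text to an integer AS number."""
    high, dot, low = text.partition(".")
    return int(high) * 65536 + int(low) if dot else int(high)


def squash(asn):
    """Return the value that an old speaker can carry for this AS number."""
    return asn if asn <= MAX_2BYTE_AS else AS_TRANS


def encode_for_old_session(sequence, as_set, aggregator_as):
    """Build the attributes sent on an old session.

    AS_Path and Aggregator carry AS_TRANS in place of 4-byte AS numbers;
    AS4_Path and AS4_Aggregator carry the real values for downstream
    new speakers.
    """
    return {
        "as_path_sequence": [squash(a) for a in sequence],
        "as_path_set": [squash(a) for a in as_set],
        "aggregator_as": squash(aggregator_as),
        "as4_path_sequence": list(sequence),
        "as4_path_set": list(as_set),
        "as4_aggregator_as": aggregator_as,
    }


update = encode_for_old_session(
    sequence=[parse_as("3.3")],
    as_set=[parse_as("1.1"), 65001, parse_as("2.2"), 65002],
    aggregator_as=parse_as("3.3"),
)
print(update["as_path_sequence"], update["as_path_set"])
```

On a new session no substitution is needed, so only the AS_Path and aggregator values with the real 4-byte numbers would be sent.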
Benefits
Route summarization brings the following benefits:
l Reduces the router load: Route summarization reduces the size of a routing table and
spares a router from advertising a large number of specific routes, which reduces the
transmitting load. Route summarization also reduces the receiving load because
downstream routers receive only the summarized route.
l Reduces the link load: A router advertises only the summarized route to its peers, which
reduces link bandwidth consumption.
l Minimizes the impact of route flapping: If route flapping occurs in the ASs that the
specific routes for summarization pass through, its impact will not spread beyond the
ASs.
[Figure: penalty value plotted against time, showing the suppress value, reuse value, suppress
time, and half-life]
The community attribute is used to simplify the application, maintenance, and management of
routing policies. With the community attribute, a group of BGP peers in multiple ASs can
share the same routing policy. The community attribute is a route attribute. It is transmitted
between BGP peers and is not restricted by the AS. Before advertising a route with the
community attribute to peers, a BGP peer can change the original community attribute of this
route.
The peers in a peer group share the same policy, whereas the routes with the same community
attribute share the same policy.
The well-known communities are described in the next section. Users can also create their
own communities to filter routes.
Well-known Community
Table 8-37 lists the well-known community attributes of BGP routes.
Usage Scenario
In Figure 8-88, EBGP connections are established between ATN B and ATN A, and between
ATN B and ATN C. With the community attribute of No_Export configured on ATN A, routes
from AS 10 advertised to AS 20 are not advertised to other ASs by AS 20.
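[Figure 8-88: ATN-A in AS 10, ATN-B in AS 20, and ATN-C in AS 30; EBGP connections run between
ATN-A and ATN-B and between ATN-B and ATN-C; the figure is labeled with the address
200.1.3.1/24]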
to be established. If many IBGP peers exist, network resources and Central Processing Unit
(CPU) resources are greatly consumed. To solve this problem, route reflection is introduced in
the network.
In an AS, one router serves as a Route Reflector (RR) and the other routers serve as clients.
The clients establish IBGP connections with the RR. The RR and its clients form a cluster.
The RR reflects routes between clients, and clients do not need to establish BGP connections.
A BGP device that functions as neither an RR nor a client is called a non-client. A non-client
must establish a fully meshed connection with an RR and with all the other non-clients, as
shown in Figure 8-89.
[Figure 8-89: a route reflector and its clients form a cluster; the route reflector, the clients,
and a non-client are connected through IBGP]
Applications
After receiving routes from peers, an RR selects the optimal route based on BGP route
selection policies. The RR advertises the learned routes to its IBGP peers according to the
rules defined in RFC 2796.
l After learning routes from a non-client IBGP peer, the RR advertises the routes to all the
clients.
l After learning routes from a client, the RR advertises the routes to all the other clients
and all non-clients.
l After learning routes from an EBGP peer, the RR advertises the routes to all clients and
non-clients.
An RR is easy to configure because only the device that functions as a reflector needs to be
configured; clients do not need to know that they are clients.
On some networks, if clients of an RR establish fully meshed connections between each other,
they can exchange routing information directly. In this case, route reflection between clients is
unnecessary and occupies bandwidth needlessly. On the ATN, you can run the undo reflect
between-clients command to disable route reflection between clients, but routes between
clients and non-clients can still be reflected. Route reflection between clients is enabled by
default.
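The advertisement rules above, together with the option to disable reflection between clients, can be summarized in a short sketch. The `PeerType` enumeration and the `reflection_targets` function are hypothetical names used only for illustration; the sketch models only which IBGP peers of the RR receive a route.

```python
from enum import Enum


class PeerType(Enum):
    CLIENT = "client"
    NON_CLIENT = "non-client"
    EBGP = "ebgp"


def reflection_targets(source, peers, reflect_between_clients=True):
    """Return the IBGP peers to which the RR advertises a route.

    source: name of the peer from which the route was learned.
    peers:  mapping of peer name to PeerType.
    """
    source_type = peers[source]
    targets = []
    for name, peer_type in peers.items():
        if name == source or peer_type is PeerType.EBGP:
            continue  # only advertisement to other IBGP peers is modeled
        if source_type is PeerType.NON_CLIENT and peer_type is not PeerType.CLIENT:
            continue  # routes from non-clients go to clients only
        if (source_type is PeerType.CLIENT and peer_type is PeerType.CLIENT
                and not reflect_between_clients):
            continue  # client-to-client reflection has been disabled
        targets.append(name)
    return sorted(targets)


peers = {
    "client1": PeerType.CLIENT,
    "client2": PeerType.CLIENT,
    "nonclient1": PeerType.NON_CLIENT,
    "ebgp1": PeerType.EBGP,
}
print(reflection_targets("nonclient1", peers))
print(reflection_targets("client1", peers, reflect_between_clients=False))
```

With reflection between clients disabled, a route learned from a client is still passed to non-clients, which matches the behavior described for the undo reflect between-clients command.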
Originator_ID
The Originator_ID attribute and Cluster_List attribute are defined in RFC 2796. They are
used to detect and prevent routing loops.
The Originator_ID attribute is four bytes long and is generated by an RR. It carries the router
ID of the originator of the route in a local AS.
l When a route is reflected by an RR for the first time, the RR adds the Originator_ID
attribute to the route to identify the originating router. If a route already has the
Originator_ID attribute, the RR does not create a new Originator_ID.
l When another BGP speaker receives the route, it compares the Originator_ID added to
the route with the local router ID. If the Originator_ID and local router ID are the same,
the BGP speaker ignores the route.
Cluster_List
To avoid routing loops between ASs, a BGP router uses the AS_Path attribute to record the
ASs that a route passes through. The router discards any route whose AS_Path contains the
local AS number. To avoid routing loops within an AS, a BGP router prohibits IBGP peers from
advertising routes learned from the local AS.
Route reflection, however, relies on IBGP peers advertising to each other the routes learned
from the local AS. The Cluster_List attribute is therefore introduced to avoid routing loops
within an AS.
To avoid routing loops, the RR uses the Cluster_List attribute to record Cluster_IDs of all
RRs that a route passes through.
The Cluster_List is composed of a series of Cluster_IDs. It records all the RRs that a route
passes through. The Cluster_List is similar to the AS_Path list and is generated by an RR.
l When an RR reflects routes between its clients or between its clients and non-clients, the
RR adds the local Cluster_ID to the top of the Cluster_List. If the Cluster_List is empty,
the RR creates a new one.
l When receiving an updated route, the RR checks its Cluster_List. If the Cluster_List
contains the local Cluster_ID, the RR discards the received route. If the Cluster_List
does not contain the local Cluster_ID, the RR adds the local Cluster_ID to the
Cluster_List, and then reflects the updated route.
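The Originator_ID and Cluster_List handling described in this section can be sketched as follows. The `Route` type and both functions are hypothetical names for illustration; router IDs and Cluster_IDs are modeled as dotted strings.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()


def reflect(route, local_cluster_id, source_router_id):
    """Return the route as reflected by an RR, or None if it is discarded."""
    if local_cluster_id in route.cluster_list:
        return None  # the route already passed through this cluster
    # Originator_ID is set only the first time the route is reflected.
    originator = route.originator_id or source_router_id
    return replace(
        route,
        originator_id=originator,
        cluster_list=(local_cluster_id,) + route.cluster_list,
    )


def speaker_accepts(route, local_router_id):
    """A BGP speaker ignores a route that it originated itself."""
    return route.originator_id != local_router_id


first = reflect(Route("10.1.0.0/16"), "0.0.0.1", "1.1.1.1")
print(first)
print(reflect(first, "0.0.0.1", "9.9.9.9"))
```

The second call returns None because the Cluster_List already contains the local Cluster_ID, which is also what happens when two RRs share one Cluster_ID.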
Backup RR
To enhance network reliability and avoid a single point of failure, more than one RR
needs to be configured in a cluster. To avoid routing loops, RRs in the same cluster have the
same Cluster_ID. On the ATN, you can run the reflector cluster-id command to configure the
same Cluster_ID for all RRs in a cluster.
In the redundant environment, clients can receive multiple routes to the same destination from
different RRs. Clients then apply route selection policies to select the optimal route.
In Figure 8-90, RR1 and RR2 are in the same cluster. RR1 and RR2 establish an IBGP
connection. That is, the two RRs are non-clients.
[Figure 8-90: RR1 and RR2, connected through IBGP, serve Client1, Client2, and Client3 in one
cluster in AS 65000]
l When Client 1 receives an updated route from an external peer, it advertises the route to
RR1 and RR2 through IBGP.
l After it receives the updated route, RR1 reflects the route to other clients (Client 2 and
Client 3) and non-clients (RR2) and adds the local Cluster_ID to the top of the
Cluster_List.
l After receiving the reflected route, RR2 checks the Cluster_List. RR2 finds that its
Cluster_ID is contained in the Cluster_List; therefore, it discards the updated route and
does not reflect the route to its clients.
NOTE
Application of the Cluster_List ensures that routing loops do not occur between RRs in the same AS.
Multiple Clusters in an AS
Multiple clusters can exist in an AS. RRs are IBGP peers of each other. An RR can be
configured as a client or non-client of another RR. The relationship between clusters in an AS
can be configured flexibly.
For example, a backbone network is divided into multiple reflection clusters. Each RR is
configured as a non-client of the other RRs, which are fully meshed. Each client establishes
IBGP connections with only the RRs in the same cluster. All BGP devices in the AS can
receive the reflected routes, as shown in Figure 8-91.
[Figure 8-91: multiple clusters in an AS, each consisting of an RR and its clients]
Hierarchical Reflector
In the actual deployment of RRs, the scenario of the hierarchical reflector is most often used.
In Figure 8-92, an Internet Service Provider (ISP) provides Internet routes for AS 100. Two
EBGP connections are established between the ISP and AS 100. AS 100 is divided into two
clusters. The four ATNs in Cluster 1 are core routers.
l Two Level-1 RRs (RR-1) are deployed in Cluster 1. This redundant structure ensures
reliability of the AS100 core layer. The other two ATNs at the core layer serve as clients
of RR-1.
l One Level-2 RR (RR-2) is deployed in Cluster 2. RR-2 is the client of RR-1.
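[Figure 8-92: an ISP has two EBGP connections to AS 100. Cluster 1 contains two RR-1 devices and
their clients; one client of RR-1 also serves as RR-2 for the clients in Cluster 2.]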
NOTE
In a networking with RRs, if BGP preferred routes do not need to guide packet forwarding, you can
configure the BGP-RIB-only feature. After it is configured, BGP preferred routes are neither added
to the IP routing table nor delivered to the forwarding layer. This improves forwarding efficiency
and expands system capacity.
[Figure 8-93: confederation AS 200, divided into sub-ASs AS 65001, AS 65002, and AS 65003, and
an external AS 100; the devices are ATN-A through ATN-F]
BGP speakers outside the confederation (such as the devices in AS 100) are unaware of the
sub-ASs (AS 65001, AS 65002, and AS 65003) in the confederation. The external
devices do not need to know the topology of each sub-AS. The confederation ID is the AS
number that is used to identify the entire confederation. As shown in Figure 8-93, AS 200 is
the confederation ID.
In Figure 8-93, AS 200 has multiple BGP devices. To reduce IBGP connections, AS 200 is
divided into three sub-ASs: AS 65001, AS 65002, and AS 65003. In AS 65001, IBGP full
meshes are established between the three devices.
NOTE
4-byte AS numbers do not support confederations, which may incur routing loops. Therefore, old BGP
speakers with 2-byte AS numbers and new speakers with 4-byte AS numbers cannot exist in the same
confederation.
8.8.2.8 BGP GR
Graceful restart (GR) is one of the high availability (HA) technologies that comprise a series
of comprehensive technologies, such as fault-tolerant redundancy, link protection, faulty node
GR is usually used when the active route processor (RP) fails because of a software or
hardware error, or used by an administrator to perform the master/slave switchover.
Related Concepts
The concepts related to GR are as follows:
NOTE
The ATN device can only function as a GR Helper.
l GR Restarter: a device on which a master/slave switchover is triggered by the
administrator or by a failure. A GR Restarter must support GR.
l GR Helper: a neighbor of a GR Restarter. A GR Helper must support GR.
l GR session: a session through which a GR Restarter and a GR Helper negotiate GR
capabilities.
l GR time: the period during which a GR Helper, after finding that the GR Restarter is Down,
keeps the topology information or routes obtained from the GR Restarter.
l End-of-RIB (EOR): a BGP message that notifies a BGP peer that the first route update
after the negotiation is complete.
l EOR timer: the maximum time that a local device waits for the EOR information
sent from the peer. If the local device does not receive the EOR information from the
peer before the EOR timer expires, the local device selects an optimal route from the
routes that it currently has.
Principles
Principles of BGP GR are as follows:
l During BGP peer relationship establishment, devices negotiate GR capabilities by
sending supported GR capabilities to each other.
l When detecting the master/slave switchover of the GR Restarter, a GR helper does not
delete the routing information and forwarding entries related to the GR Restarter within
the GR time, but waits to re-establish a BGP connection with the GR Restarter.
NOTE
If the GR Helper sends Keepalive packets to the Restarter but receives no reply within the
Holdtimer, the GR Helper is in GR state and marks the route sent from the Restarter as Stale. The
Restarter restart may trigger the GR Helper to enter the GR state.
l After the master/slave switchover, the GR Restarter receives routes from all the peers
with which it negotiated GR capabilities before the switchover, and starts the EOR timer.
The GR Restarter selects a route when either of the following conditions is met:
a. The GR Restarter receives the EOR information from all peers and deletes the EOR
timer.
b. The EOR timer expires before the GR Restarter has received the EOR information
from all peers.
l The GR Restarter sends the optimal route to the GR Helper and the GR Helper starts the
EOR timer. The GR Helper exits GR when either of the following conditions is met:
a. The GR Helper receives the EOR information from the GR Restarter and deletes the
EOR timer.
b. The EOR timer expires before the GR Helper has received the EOR information
from the GR Restarter.
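The two exit conditions in the preceding procedure follow the same pattern: proceed as soon as EOR has arrived from every expected peer, or when the EOR timer expires, whichever happens first. A minimal sketch, in which the function name, the return strings, and the use of seconds are assumptions made for illustration:

```python
def eor_wait_result(expected_peers, peers_with_eor, elapsed_s, eor_timer_s):
    """Decide whether a device can stop waiting for EOR.

    Returns a reason string once route selection (GR Restarter) or GR exit
    (GR Helper) may proceed, or None while the device must keep waiting.
    """
    if set(expected_peers) <= set(peers_with_eor):
        return "eor-received-from-all-peers"   # condition a: the timer is deleted
    if elapsed_s >= eor_timer_s:
        return "eor-timer-expired"             # condition b
    return None


# A GR Restarter with two GR-capable peers and a hypothetical 600-second timer.
print(eor_wait_result({"peerA", "peerB"}, {"peerA"}, 30, 600))
print(eor_wait_result({"peerA", "peerB"}, {"peerA", "peerB"}, 45, 600))
print(eor_wait_result({"peerA", "peerB"}, {"peerA"}, 600, 600))
```

For a GR Helper, the set of expected peers contains only the GR Restarter.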
GR Reset
Currently, BGP does not support dynamic capability negotiation. Therefore, each time a new
BGP capability (such as the IPv4, IPv6, VPNv4, and VPNv6 capabilities) is enabled on a
BGP speaker, the BGP speaker tears down existing sessions with its peer and renegotiates
BGP capabilities. This process will interrupt ongoing services.
To prevent the service interruptions, the ATN provides the GR reset function that enables the
ATN to reset a BGP session in GR mode. With the GR reset function configured, when you
enable a new BGP capability on the BGP speaker, the BGP speaker enters the GR state, resets
the BGP session, and renegotiates BGP capabilities with the peer. In the whole process, the
BGP speaker re-establishes the existing sessions but does not delete the routing entries for the
existing sessions, so that the existing services are not interrupted.
Benefits
BGP GR ensures uninterrupted forwarding. In addition, the flapping of BGP occurs only on
the peers of the GR Restarter. This is important for BGP that needs to process a large number
of routes.
BGP Authentication
BGP uses TCP as the transport layer protocol. To enhance BGP security, you can perform
Message Digest 5 (MD5) authentication or Keychain authentication when a TCP
connection is set up. MD5 authentication or Keychain authentication, however, does not
authenticate BGP packets. Instead, it sets the authentication password for the TCP connection,
and the authentication is performed by TCP. If authentication fails, the TCP connection cannot
be established.
GTSM of BGP
The Generalized TTL Security Mechanism (GTSM) defends against attacks by checking the
time to live (TTL) value (maximum number of routers through which a packet can pass). If an
attacker simulates real BGP packets and keeps sending them to a router, an interface board on
the router receives the packets and sends them directly to the main control board for BGP
processing, without checking the validity of the packets. When the router is tied up in
processing these packets, Central Processing Unit (CPU) usage is high.
The GTSM checks whether the TTL value in the IP packet header is within a pre-defined
value range. To enhance system security, the GTSM can protect services above the IP layer.
After the GTSM of BGP is enabled, an interface board checks the TTL values carried in all
BGP packets. As required by the actual networking, packets whose TTL values are not within
the specified range are either allowed to pass or discarded by the GTSM. To configure the
GTSM to discard packets by default, you need to set an appropriate TTL value range
according to the network topology. Then, packets whose TTL values are not within the specified
range are discarded and attacks by bogus BGP packets are avoided.
You can also enable the log function to record information when GTSM drops packets to help
you locate faults.
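The TTL range check can be illustrated with a short sketch. The following is an assumption-laden example rather than the device logic: it assumes that peers send BGP packets with a TTL of 255 and that the valid range is derived from a configured maximum hop count as [255 - hops + 1, 255]; the function name and the action strings are hypothetical.

```python
MAX_TTL = 255


def gtsm_verdict(ttl, valid_hops, out_of_range_action="drop"):
    """Classify a received BGP packet by its TTL value.

    valid_hops: maximum number of hops allowed between the peers
                (1 for directly connected peers).
    out_of_range_action: what to do with packets outside the range,
                either "drop" or "pass".
    """
    lowest_valid_ttl = MAX_TTL - valid_hops + 1
    if lowest_valid_ttl <= ttl <= MAX_TTL:
        return "pass"
    return out_of_range_action


print(gtsm_verdict(255, valid_hops=1))   # directly connected peer
print(gtsm_verdict(250, valid_hops=1))   # packet sent from several hops away
```

Because every router on the path decrements the TTL, a packet forged by a remote attacker arrives with a TTL below the valid range and can be discarded on the interface board.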
To meet the requirement for high reliability of carrier-class networks, BFD for BGP is
introduced in the network to detect faults on the links between BGP peers in milliseconds and
notify the faults to BGP so that routes can converge quickly.
Networking
In Figure 8-94, ATN A belongs to AS 100, ATN B belongs to AS 200, and the EBGP peer
relationship is established between them.
BFD is enabled to detect the BGP relationship between ATN A and ATN B. When the link
between ATN A and ATN B becomes faulty, BFD can quickly detect the fault and notify BGP.
[Figure 8-94: ATN A in AS 100 and ATN B in AS 200 connected through EBGP]
To ensure network stability, configure a proper delay for BGP to wait before it terminates a
connection after detecting that the associated peer is unreachable.
l If the delay is set to 0, BGP terminates the connection between the local device and its
peer immediately after detecting that the peer is unreachable.
l If IGP route flapping occurs and the delay is set to 0, the peer relationship between the
local device and its peer alternates between Up and Down. Therefore, setting the delay to
a value greater than the IGP route convergence time is recommended.
l If BGP peers have successfully negotiated GR and a master/slave control board
switchover is performed on a peer, set the delay to a value greater than the GR
convergence time to prevent a GR failure. If the delay is less than the GR convergence
time, the connection between the local device and its BGP peer is terminated,
which causes GR to fail.
BGP peer tracking can speed up network convergence and is easy to deploy. However, BGP
route convergence on a network configured with BGP peer tracking is slower than that on a
network enabled with BFD. BGP peer tracking cannot meet the requirements of voice
services that require a high convergence speed.
Networking
In Figure 8-95, an IBGP peer relationship is established between ATN A and ATN C. BGP
peer tracking is configured on ATN A. If the link between ATN A and ATN B fails, ATN A
detects that ATN C is unreachable after IGP fast route convergence and then terminates the
BGP connection with ATN C.
With BGP Auto FRR, if a peer has multiple routes with the same prefix that are learned from
different peers, the peer uses the optimal route as the primary link to forward packets and the
less optimal route as a backup link. If the primary link fails, the peer rapidly notifies other
peers that the BGP route has become unreachable and then switches traffic from the primary
link to the backup link.
Usage Scenario
In Figure 8-96, ATN Y advertises a learned BGP route to ATN X2 and ATN X3 in AS 100;
ATN X2 and ATN X3 then advertise the BGP route to ATN X1 through the reflector. ATN X1
therefore receives two routes whose next hops are ATN X2 and ATN X3 respectively. Then,
ATN X1 selects a route according to the configured policy. Assume that the route received
from ATN X2 (Link A) is preferred. Link B then functions as the backup link.
[Figure 8-96: BGP Auto FRR networking across AS 100 and AS 200. Labels in the figure: ATN X2,
ATN X3, two RRs, Link A, and the Loopback1 addresses 1.1.1.1/32, 2.2.2.2/32, and 3.3.3.3/32.]
When a device along Link A fails or faults occur on Link A, the next hop of the route to ATN
X2 becomes invalid on ATN X1. If BGP Auto FRR is enabled on ATN X1, the forwarding
plane then quickly switches traffic sent from ATN X1 to ATN Y to Link B, which ensures
uninterrupted traffic transmission. In addition, ATN X1 reselects the route received from ATN
X3 according to the forwarding prefixes and then updates the FIB table.
carrier then filters out unwanted routes during route advertisement based on the received
inbound policies. This prevents users from receiving a large number of unwanted routes
and saves resources.
Applications
As shown in Figure 8-97, ATN A and ATN B are directly connected, and are enabled with
prefix-based ORF; after negotiating the prefix-based ORF capability with ATN B, ATN A
adds the local prefix-based inbound policy to a Route-Refresh message, and then sends the
Route-Refresh message to ATN B. Based on the received Route-Refresh message, ATN B
works out an outbound policy for advertising routes to ATN A.
[Figure 8-97: ATN A in AS 100 directly connected to ATN B in AS 200]
As shown in Figure 8-98, there is an RR in the domain, and ATN A and ATN B are the
clients of the RR; ATN A, ATN B, and the RR are enabled with prefix-based ORF. After
negotiating prefix-based ORF with the RR, ATN A and ATN B add the local prefix-based
inbound policies to Route-Refresh messages, and then send the Route-Refresh messages to
the RR. Based on the Route-Refresh messages received from ATN A and ATN B, the RR
works out associated outbound policies for reflecting routes to ATN A and ATN B.
[Figure 8-98: ATN A and ATN B as clients of an RR]
8.8.2.14 Active-Route-Advertise
BGP advertises only optimal routes to peers. In earlier versions, only the routes preferred by
the routing management layer are advertised. Active-route-advertise is designed to implement
forward compatibility.
By default, when a route is preferred by BGP, the route can be advertised to peers. When
active-route-advertise is configured, only the route preferred by BGP and also active on the
routing management layer is advertised to peers.
NOTE
Imported routes are active in the IP routing table and are not restricted by the active-route-advertise
command.
Usage Scenario
The BGP dynamic update peer-group feature is applicable to the following scenarios:
l Scenario with an international gateway
l Scenario with an RR
l Scenario where routes received from EBGP peers need to be sent to all IBGP peers
The following figures represent each scenario in turn.
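[Figures: (1) international gateway scenario in which Internet routes are exchanged among
AS 1000, AS 200, AS 65001, AS 30, AS 100, AS 65002, and AS 120; (2) RR scenario in AS 100 with
RR1 and RR2 connected to their peers through IBGP; (3) scenario with ATN-A through ATN-F, in
which routes received over EBGP are sent to IBGP peers, across AS 100 and AS 200]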
The preceding scenarios have in common that a router needs to send routes to a large number
of BGP peers, most of which share the same outbound policy. This situation is most evident in
the networking shown in Figure 8-100. When a large number of peers and routes exist, the
forwarding efficiency is low.
For example, an RR has clients and needs to reflect routes to them. If the RR groups the
routes for each peer before sending the routes to 100 clients, the total number of times that all
routes are grouped . After the dynamic update peer-groups feature is applied, the total number
of times that all routes are grouped changes to . The efficiency is times higher than before.
On a local device that has a slave control plane, NSR ensures that a peer is unaware of a
control plane fault on the local device. In this process, the peer relationships set up through
specific routing protocols, MPLS, and other protocols that carry services are not interrupted.
As an HA solution, NSR ensures that user services are not affected, or are affected as little as
possible, in the case of device failures.
During the master/slave switchover, BGP NSR ensures uninterrupted forwarding and BGP
route advertisement.
The 4-byte AS number feature defines a new capability code and new optional transitive attributes
to negotiate the 4-byte AS number capability and to transmit 4-byte AS numbers. This mechanism
enables communication between new speakers and between old speakers and new speakers.
BGP Extension
Open capability code 0x41, defined for BGP connection negotiation, indicates that the BGP
speaker supports 4-byte AS numbers.
Two new optional transitive attributes, AS4_Path with attribute code 0x11 and
AS4_Aggregator with the attribute code 0x12, are defined to transmit 4-byte AS numbers on
old sessions.
If a new speaker with an AS number greater than 65535 communicates with an old speaker,
the old speaker needs to set the peer AS number to AS_TRANS. The value of AS_TRANS is
23456 and reserved.
Principles
When setting up connections, BGP peers determine whether the peer supports 4-byte AS
numbers according to the optional capability field in Open messages.
l New sessions are set up between new speakers. AS_Path and Aggregator in an Update
message carry 4-byte AS numbers.
l Old sessions are set up between new and old speakers. AS_Path and Aggregator on old
speakers carry 2-byte AS numbers.
– When a new speaker sends an Update message to an old speaker, if the AS number
of the new speaker is greater than 65535, AS4_Path and AS4_Aggregator are used
together with AS_Path and AS_Aggregator to carry 4-byte AS numbers. AS4_Path
and AS4_Aggregator are transparent to the old speaker.
– When receiving messages that contain AS_Path, AS4_Path, AS_Aggregator, and
AS4_Aggregator from an old speaker, a new speaker reconstructs the actual
AS_Path and AS_Aggregator based on the reconstruction algorithm.
Usage Scenario
Figure 8-102 shows old speakers and new speakers. The 4-byte AS number feature, together
with AS4_Path, transmits routing information between the old and new speakers.
[Figure content: ATN-A (old speaker, AS 10) advertises D=(8.0.0.0) with AS_Path (10). ATN-B
(new speaker, AS 20.1) advertises it with AS_Path (23456, 10) and AS4_Path (20.1, 10). ATN-C
(new speaker, AS 50.5) receives it with AS_Path (40.4, 30, 20.1, 10).]
In Figure 8-102, before advertising route D=8.0.0.0 of AS 10 to other ASs, a BGP device
performs the following:
1. BGP adds AS 10 to the AS_Path list (10).
2. When the route passes AS 20.1, to enable ATN-D (old speaker) to transmit AS path
information with 4-byte AS numbers, this route carries the AS4_Path attribute (20.1, 10).
ATN-B then adds AS 20.1 to the beginning of the AS_Path list (23456, 10). (The value
23456 is obtained when AS_TRANS replaces 20.1.)
3. When the route passes AS 30, ATN-D, an old speaker, transparently transmits AS4_Path
(20.1, 10) to ATN-E. ATN-D then adds AS 30 to the beginning of the AS_Path list (30,
23456, 10).
4. When the route passes AS 40.4, after the reconstruction of AS_Path and AS4_Path, BGP
adds AS 40.4 to the beginning of the AS_Path list (40.4, 30, 20.1, 10).
The rest may be deduced by analogy. After the device in AS 50.5 receives the route, the
device learns the path to AS 10 according to the AS_Path list.
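The reconstruction in step 4 can be sketched as follows. This is a simplified illustration that handles only ordered AS sequences; the function name is hypothetical, and 4-byte AS numbers written as x.y are assumed to stand for x * 65536 + y.

```python
AS_TRANS = 23456

AS_20_1 = 20 * 65536 + 1   # AS 20.1
AS_40_4 = 40 * 65536 + 4   # AS 40.4


def reconstruct_as_path(as_path, as4_path):
    """Merge AS_Path and AS4_Path received on an old session.

    The leading entries of AS_Path that were added after AS4_Path was last
    updated are kept; the remainder is taken from AS4_Path, which holds the
    real 4-byte AS numbers. If AS_Path is shorter than AS4_Path, AS4_Path
    is ignored.
    """
    if not as4_path or len(as_path) < len(as4_path):
        return list(as_path)
    keep = len(as_path) - len(as4_path)
    return list(as_path[:keep]) + list(as4_path)


# What the new speaker in AS 40.4 receives from the old speaker in AS 30.
received_as_path = [30, AS_TRANS, 10]
received_as4_path = [AS_20_1, 10]

rebuilt = reconstruct_as_path(received_as_path, received_as4_path)
advertised = [AS_40_4] + rebuilt   # AS 40.4 prepends its own AS number
print(advertised)
```

The placeholder 23456 inserted for AS 20.1 disappears after reconstruction, so the device in AS 50.5 sees the real path (40.4, 30, 20.1, 10).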
Application
On the network shown in Figure 8-103, IBGP peer relationships are established between
ATN A and ATN B, and between ATN A and ATN C through loopback interfaces. ATN A
receives a BGP route with the prefix 10.10.10.10/32 from ATN B and ATN C. The original
next hop of the BGP route received from ATN B is 2.2.2.2. The address of GE 0/2/1 on ATN
A is 2.2.2.10/24.
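[Figure 8-103: in AS 100, ATN A (Loopback0 1.1.1.1/32; GE0/2/1 at 2.2.2.10/24; also GE0/2/2 and
GE0/2/3) has IBGP peers ATN B (Loopback0 2.2.2.2/32) and ATN C (Loopback0 3.3.3.3/32); the
route 10.10.10.10/32 is received from both ATN B and ATN C]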
When ATN B runs normally, the BGP route with the prefix 10.10.10.10/32 is iterated to the
IGP route 2.2.2.2/32. If ATN B fails, the IGP route 2.2.2.2/32 becomes unavailable, which
triggers new route iteration. ATN A searches the IP routing table based on the original next
hop 2.2.2.2 and uses the route 2.2.2.0/24 for iteration. However, when 2.2.2.2 is
unreachable, ATN A is expected to use the route whose next hop is 3.3.3.3 instead.
In this situation, you can configure a routing policy with the mask length of the route to the
original next hop as the matching condition to control the route iteration. In this example, you
can configure a routing policy so that the route with the original next hop 2.2.2.2 depends on
only the IGP route 2.2.2.2/32.
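The effect of a mask-length condition on route iteration can be shown with the addresses from this example. The sketch below is illustrative only: the `iterate_next_hop` function and its `min_mask_length` parameter are hypothetical stand-ins for the routing policy, and the routing table is reduced to a list of prefixes.

```python
import ipaddress


def iterate_next_hop(next_hop, routing_table, min_mask_length=0):
    """Return the longest matching prefix for next_hop, or None.

    Prefixes whose mask is shorter than min_mask_length are ignored,
    which models a routing policy that matches on mask length.
    """
    address = ipaddress.ip_address(next_hop)
    matches = [
        network
        for network in map(ipaddress.ip_network, routing_table)
        if address in network and network.prefixlen >= min_mask_length
    ]
    if not matches:
        return None
    return str(max(matches, key=lambda network: network.prefixlen))


# Routing table on ATN A after ATN B fails: 2.2.2.2/32 is gone, but the
# directly connected segment 2.2.2.0/24 on GE 0/2/1 still covers 2.2.2.2.
table_after_failure = ["2.2.2.0/24", "3.3.3.3/32"]

print(iterate_next_hop("2.2.2.2", table_after_failure))
print(iterate_next_hop("2.2.2.2", table_after_failure, min_mask_length=32))
```

Without the condition, the next hop still iterates to 2.2.2.0/24 and the failed route stays in use. With a required mask length of 32, iteration fails, the route through ATN B becomes invalid, and the route received from ATN C can be selected.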
BGP Border Gateway Protocol. Dynamic routing protocol used between ASs.
Different from the IGP, such as OSPF and RIP, BGP controls route
transmission and selects optimal routes rather than discovering or
calculating routes.
RM Routing Management
AS Autonomous System
CE Customer Edge
PE Provider Edge
P Provider
RR Route Reflector
GR Graceful Restart
Definition
Routing policies are used to filter routes and set attributes for routes. Changing route
attributes (including reachability) changes the path that network traffic passes through.
NOTE
The difference between a routing policy and policy-based routing (PBR) is as follows:
l Routing policies apply to routes. Based on routing protocols, the result of route generation,
advertisement, and selection is changed by following rules, changing parameters, or using control
modes. That is, the contents in the routing table are changed.
l PBR applies to data packets. PBR provides a means to route or forward data packets flexibly based
on predefined policies instead of following the routes in the existing routing table.
For details about PBR, see Feature Description - IP Services.
Purpose
When advertising, receiving, and importing routes, the ATN implements certain policies
based on actual networking requirements to filter routes and change the attributes of the
routes. Routing policies serve the following purposes:
A routing protocol may import routes discovered by other routing protocols. Only routes
that satisfy certain conditions are imported to meet the requirements of the protocol.
l Modify attributes of specified routes
Attributes of the routes that are filtered by a routing policy are modified to meet the
requirements of the local device.
l Configure fast reroute (FRR)
If a backup next hop and a backup outbound interface are configured for the routes that
match a routing policy, IP FRR, VPN FRR, and IP+VPN FRR can be implemented.
Benefits
This feature brings the following benefits:
l Controls the size of the routing table, saving system resources.
l Controls route receiving and advertising, improving network security.
l Modifies attributes of routes for proper traffic planning, improving network
performance.
l Improves network reliability using FRR.
8.9.2 Principles
1. Define rules. Define features of routing information to which routing policies are
applied. That is, you need to define a set of matching rules regarding different attributes
of routing information such as the destination address and AS number.
2. Implement the rules. Apply matching rules to the routing policies to advertise, receive,
and import desired routes.
ACL
There are ACLs for IPv4 packets. When defining an ACL, you can specify an IP address and
a subnet range against which the destination network segment address or the next hop address
of a route is matched.
IP Prefix List
There are IP prefix lists for IPv4 routes.
An IP prefix list is identified by its name. Each IP prefix list can contain multiple entries.
Each entry can independently specify a matching range in the form of a network prefix. The
matching range is identified by an index number that specifies the matching sequence.
During route matching, the device checks entries identified by the index number in ascending
order. If a route matches an entry, the route is not matched against the next entry.
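The first-match behavior described above can be sketched as follows. This is a minimal illustration, not the ATN implementation: the function name, the dictionary layout, and the rule that an entry matches any route contained in its prefix are all assumptions made for the example.

```python
from ipaddress import ip_network

def first_matching_entry(entries, route):
    """Return the index number of the first entry that matches the route.

    entries: hypothetical mapping of index number -> network prefix.
    Assumption: an entry matches when the route lies inside its prefix.
    """
    route_net = ip_network(route)
    # Entries are checked in ascending order of index number.
    for index, prefix in sorted(entries.items()):
        if route_net.subnet_of(ip_network(prefix)):
            return index  # stop here: later entries are not checked
    return None  # no entry matched

# Hypothetical IP prefix list, deliberately written out of order.
entries = {20: "10.0.0.0/8", 10: "10.1.0.0/16", 30: "192.168.0.0/16"}
print(first_matching_entry(entries, "10.1.2.0/24"))
```

The route 10.1.2.0/24 lies inside both 10.1.0.0/16 and 10.0.0.0/8, but only the entry with the lower index number is reported because matching stops at the first hit.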
AS_Path
Each BGP route contains an AS_Path attribute. AS_Path filters specify matching rules
regarding AS_Path attributes. AS_Path filters are applicable only to BGP.
For more information about the AS_Path attribute, refer to RFC 4271.
Community
Community filters are applicable only to BGP. A BGP route can carry a community
attribute that identifies the communities to which the route belongs. Community filters
specify matching rules regarding community attributes.
For more information about the community attribute, refer to RFC 1997.
Extended Community
Extended community filters are applicable only to BGP. Currently, Huawei devices support
route filtering only through VPN route-target (RT) extended community attributes.
RD
RD filters are applicable only to BGP. RD filters specify matching rules regarding VPN RD
attributes.
Route-Policy
Matching rules are the core of route-policies.
A route-policy can use the preceding filters to define its matching rules. A route-policy can
consist of multiple nodes, and the relationship between these nodes is OR. The system checks
the nodes based on index numbers. If a route matches a node in the route-policy, the route is
not matched against the next node.
Each node comprises a set of if-match and apply clauses. The if-match clauses define the
matching rules that are used to filter certain route attributes. The relationship between the if-
match clauses of a node is AND. A route matches a node only when the route matches all the
matching rules defined by the if-match clauses of the node. The apply clauses specify
actions. When a route matches a node, the apply clauses set certain attributes for the route.
Nodes have the following matching modes:
l Permit: If a route matches all the if-match clauses of a node, the route matches the route-
policy, and all the actions defined by apply clauses are performed on the route. If a route
does not match one if-match clause of a node, the route is matched against subsequent
nodes.
l Deny: If a route matches all the if-match clauses of a node, the route is denied and is not
matched against the next node.
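The node logic above (AND between the if-match clauses of a node, OR between nodes, permit and deny modes) can be sketched as follows. The data layout and names are hypothetical, and the final rule that a route matching no node is denied is an assumption that this section does not state.

```python
def apply_route_policy(nodes, route):
    """Return the route after apply actions if permitted, or None if denied.

    nodes: hypothetical mapping of node index -> dict with "mode",
    "if_match" (list of predicates), and "apply" (list of actions).
    """
    for index in sorted(nodes):  # nodes are checked by index number
        node = nodes[index]
        # AND: the route must satisfy every if-match clause of the node.
        if all(clause(route) for clause in node["if_match"]):
            if node["mode"] == "deny":
                return None  # denied; later nodes are not checked
            updated = dict(route)  # leave the caller's route untouched
            for action in node["apply"]:
                action(updated)
            return updated  # OR: the first matching node decides
    return None  # assumption: a route that matches no node is denied

# Hypothetical two-node policy.
policy = {
    10: {"mode": "deny",
         "if_match": [lambda r: 65001 in r["as_path"]],
         "apply": []},
    20: {"mode": "permit",
         "if_match": [lambda r: r["prefix"].startswith("10."),
                      lambda r: r["protocol"] == "bgp"],
         "apply": [lambda r: r.update(cost=50)]},
}
```

A BGP route to 10.1.0.0/16 whose AS_Path does not contain 65001 skips node 10, matches both clauses of node 20, and is returned with its cost set to 50.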
RM Route Management
Protocol   UDP Port   TCP Port
RIP        520        -
RIPv2      520        -
RIPng      521        -
BGP        -          179
OSPF       -          -
IS-IS      -          -
Note that "-" indicates that the related transport layer protocol is not used.
Protocol   UDP Port   TCP Port
DHCP       67         -
DNS        53         53
FTP        -          20/21
HTTP       -          80
IMAP       -          993
POP3       -          110
SMTP       25         25
SNMP       161        -
TELNET     -          23
TFTP       69         -
Note that "-" indicates that the related transport layer protocol is not used.
9 IP Multicast
This document describes the IP multicast in terms of the overview, principle, and applications.
NOTE
IP multicast is under GTL License control. IP multicast can be enabled on the ATN only after a valid
GTL License file is loaded and activated.
9.1.1 Introduction
IP multicast is a method of sending a single IP stream to multiple receivers simultaneously,
reducing bandwidth consumption. IP multicast provides benefits for point-to-multipoint
(P2MP) services, such as e-commerce, online conferencing, and video on demand. P2MP
services offer opportunities for significant profits, yet require high bandwidth and secure
operation. IP multicast is used to meet these requirements.
IP Data Transmission
IP data transmission is based on IP addresses. An IP address identifies a specific device on a
specific network.
l Unicast IP address
A unicast IP address can identify only one host, and a host can identify only one unicast
IP address. An IP packet that carries a unicast destination address can be received only
by one host.
l Broadcast IP address
A broadcast IP address can identify all hosts on a network segment, and an IP packet that
carries a broadcast destination IP address can be received by all hosts on a network
segment. However, a host can identify only one broadcast IP address. IP broadcast
packets cannot be transmitted across network segments.
l Multicast IP address
A multicast IP address can identify multiple hosts at different locations, and a host can
identify multiple multicast IP addresses. An IP packet that carries a multicast destination
IP address can therefore be received by multiple hosts at different locations.
IP Transmission Modes
Based on the IP address types, networks can transmit packets in the following modes:
l IP unicast mode
l IP broadcast mode
l IP multicast mode
Any of these modes can be used for P2MP data transmission.
l Unicast transmission
– Features: A unicast packet uses a unicast address as the destination address. If
multiple receivers require the same packet from a source, the source sends an
individual unicast packet to each receiver.
– Disadvantages: This mode consumes unnecessary bandwidth and processor
resources when sending the same packet to a large number of receivers.
Additionally, the unicast transmission mode does not guarantee transmission quality
when a large number of hosts exist.
l Broadcast transmission
– Features: A broadcast packet uses a broadcast address as the destination address. In
this mode, a source sends only one copy of each packet to all hosts on the network
segment, irrespective of whether a host requires the packet.
– Disadvantages: This mode requires that the source and receivers reside on the same
network segment. Because all hosts on the network segment receive packets sent by
the source, this mode cannot guarantee information security or support charging for
services.
l Multicast transmission
The following example uses the network shown in Figure 9-1 to illustrate the multicast
transmission mode. A source exists on the network. User A and User C require
information from the source, while User B does not.
[Figure 9-1 (diagram not reproduced). Labels: Source; RouterA, RouterB, RouterC; ATN D, ATN E, ATN F; UserA (receiver), UserB, UserC (receiver).]
– Advantages: In multicast mode, only one copy of a multicast packet exists on each
link and is sent to users along the distribution tree. Only users who require the
packet receive it, providing the basis for information security. Compared with
unicast, multicast does not increase the network load when the number of users
increases in the same multicast group. As a result, multicast requires fewer server
and CPU resources. Compared with broadcast, multicast can transmit information
across network segments and across long distances.
– Applications: Multicast applies to all P2MP applications, such as multimedia
presentations, streaming media, and finance (stock-trading) applications. IP
multicast is being widely used in Internet services, such as online broadcast,
network TV broadcast, and real-time video and audio conferencing.
9.1.2 Principles
Multicast Group
A multicast group consists of a group of receivers that require the same data stream. A
multicast group uses an IP multicast address identifier. A host that joins a multicast group
becomes a member of the group and can identify and receive IP packets that have the IP
multicast address as the destination address.
Multicast Source
A multicast source sends IP packets that carry multicast destination addresses.
Multicast Router
An ATN that supports the multicast function is called a multicast router. A multicast
router provides the following functions:
l Manages group members on the leaf segment networks that connect to users.
l Routes and forwards multicast packets.
IP multicast is an end-to-end service. Figure 9-2 shows four IP multicast functions from the
lower protocol layer to the upper protocol layer.
[Figure 9-2 (diagram not reproduced). Functions shown, from lower to upper layer: addressing mechanism, host registration, multicast routing, multicast application. Devices shown: multicast source (host), multicast routers, receiver (host).]
A multicast packet's source address field contains a Class A, B, or C unicast address. A Class
D address cannot be a source IP address in a multicast packet. Class E addresses are reserved
for future use.
All receivers in a multicast group are identified by the same IPv4 multicast group address on
the network layer. Once a user joins the group, the user can receive all IP packets sent to the
group.
Class D addresses are in the 224.0.0.0 to 239.255.255.255 range. For details, see Table 9-2.
l A permanent multicast group address, also known as a reserved multicast group address,
identifies all devices in a multicast group that may contain any number (including 0) of
members. For details, see Table 9-3.
A multicast MAC address identifies receivers of the same multicast group at the link layer.
Ethernet interface boards can identify multicast MAC addresses. After a multicast MAC
address of a multicast group is configured on a device's driver, the device can then receive and
forward data of the multicast group on the Ethernet. The mapping between the multicast IPv4
address and multicast IPv4 MAC address is as follows:
As defined by the IANA, the 24 most significant bits of a MAC address are 0x01005e, the
25th bit is 0, and the 23 least significant bits are the same as those of a multicast IPv4 address.
Figure 9-3 shows the mapping relationships between multicast IPv4 addresses and multicast
MAC addresses.
Figure 9-3 Mapping relationships between multicast IPv4 addresses and multicast MAC
addresses
[Figure body not reproduced. It marks the 5 bits of the multicast IPv4 address that are lost in the mapping.]
The first four bits of an IPv4 multicast address are always 1110 and are not carried in the
multicast MAC address, whose 25 most significant bits are fixed. Of the remaining 28 bits,
only the 23 least significant bits are mapped to the MAC address, resulting in the loss of
5 bits. Therefore, 32 IPv4 multicast addresses are mapped to the same MAC address.
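The mapping can be checked with a short sketch. The function name is hypothetical; the fixed prefix 0x01005e, the 0 in the 25th bit, and the 23 copied bits follow the description above.

```python
from ipaddress import IPv4Address

def multicast_mac(group):
    """Map a multicast IPv4 address to its multicast MAC address."""
    addr = IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a Class D (multicast) address")
    low23 = int(addr) & 0x7FFFFF   # keep only the 23 least significant bits
    mac = 0x01005E000000 | low23   # 24-bit prefix 0x01005e, 25th bit is 0
    # Format the 48-bit value as six colon-separated octets.
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

print(multicast_mac("224.0.0.13"))
print(multicast_mac("224.0.0.1") == multicast_mac("224.128.0.1"))
```

The second line prints True because 224.0.0.1 and 224.128.0.1 differ only in one of the 5 bits that are dropped, so both map to the same MAC address.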
NOTE
This document focuses on IP multicast technology and device operation. Multicast in the document
refers to IP multicast, unless otherwise specified.
l ASM model
l SFM model
l SSM model
ASM Model
In the any-source multicast (ASM) model, any sender can act as a multicast source and send
information to a multicast group address. Receivers cannot know the multicast source location
before they join a multicast group.
SFM Model
From the sender's point of view, the source-filtered multicast (SFM) model works the same as
the ASM model. That is, any sender can act as a multicast source and send information to a
multicast group address.
Compared with the ASM model, the SFM model extends the following function: The upper
layer software checks the source addresses of received multicast packets, permitting or
denying packets of multicast sources as configured.
NOTE
Compared with ASM, SFM adds multicast source filtering policies. The basic principles and
configurations of ASM and SFM are the same. In this document, information about ASM also applies to
SFM.
SSM Model
In real-world situations, users may not require all data sent by multicast sources. The source-
specific multicast (SSM) model allows users to specify multicast data sources.
Compared with receivers in the ASM model, receivers in the SSM model know the multicast
source location before they join a multicast group. The SSM model uses a different multicast
address scope from the ASM model and sets up a dedicated forwarding path between a source
and receivers.
[Figure (diagram not reproduced). Labels: AS1, AS2, Source, PIM, MSDP, IGMP, User.]
The ATN supports various multicast routing protocols to implement different applications.
Table 9-4 describes commonly used multicast routing protocols.
Multicast protocols have two main types of functions: managing member relationships;
establishing and maintaining multicast routes.
Terms
Terms Definition
MSDP Multicast Source Discovery Protocol. A protocol that applies only to the any-source
multicast (ASM) model in PIM-SM domains.
After an MSDP peer relationship is set up between rendezvous points (RPs) of
different PIM-SM domains, multicast source information can be shared between
the PIM-SM domains, and the inter-domain multicast can then be implemented.
After an MSDP peer relationship is set up between RPs of the same PIM-SM
domain, multicast source information can be shared in the PIM-SM domain, and
Anycast-RP can then be implemented.
9.2 PIM
9.2.1 PIM
Purpose
A multicast network requires multicast protocols to replicate and forward multicast data.
Protocol Independent Multicast (PIM) is a widely used intra-domain multicast protocol that
builds multicast distribution trees (MDTs) to transmit multicast data between devices in
the same domain.
PIM can create multicast routing entries on demand, forward packets based on multicast
routing entries, and dynamically respond to network topology changes.
Definition
PIM is a multicast routing protocol that uses unicast routing information to forward
multicast data but is independent of any specific unicast routing protocol.
PIM can be implemented in PIM-DM, PIM-SM, or PIM-SSM mode on IPv4 networks.
Benefits
PIM works together with other multicast protocols to implement applications, such as:
l Multimedia and media streaming applications
l Training and tele-learning communication
l Data storage and financial management applications
IP multicast is being widely used in Internet services, such as online broadcasts, network TV,
e-learning, telemedicine, network TV stations, and real-time video/voice conference services.
9.2.2 Principles
9.2.2.1 PIM-DM
PIM-DM is used for P2MP data transmission on small-scale networks on which users are
densely distributed.
PIM-DM uses a flooding-pruning method to forward multicast data. PIM-DM is not suitable
for large-scale networks with sparsely distributed users because a large number of Prune
messages will be generated and the flooding-pruning process is time-consuming.
PIM-DM constructs SPT MDTs with multicast sources as roots and group members as leaves.
PIM-DM assumes that all members are densely distributed and each network segment has
members. Based on these assumptions, a multicast source first floods multicast data to each
network segment and then prunes segments without any members. Through regular flooding
and pruning, PIM-DM creates and maintains a unidirectional and loop-free SPT that connects
the multicast source and group members.
Related Concepts
This section provides basic PIM-DM concepts. See Figure 9-5.
[Figure 9-5 (diagram not reproduced). Labels: Source, Source DR, Receiver DR, Receiver, Ethernet, PIM Router.]
l PIM device
A multicast router that supports PIM is called a PIM device. A PIM-enabled interface on
a PIM device is called a PIM interface.
l DR
A designated router (DR) is responsible for forwarding multicast data and is categorized
as a source's DR or receiver's DR.
– A multicast source's DR is a PIM device directly connected to a multicast source in
a PIM-DM domain and is responsible for forwarding multicast data packets to other
PIM devices.
– A receiver's DR is a PIM device directly connected to receivers' hosts and is
responsible for forwarding multicast data to group members.
l SPT
A shortest path tree (SPT) is a multicast distribution tree (MDT) with the multicast
source at the root and group members at leaves. SPTs can be used in PIM-DM, PIM-SM,
and PIM-SSM scenarios.
Implementation
The multicast data forwarding process in a PIM-DM domain is as follows:
1. Neighbor Discovery
Each PIM device in a PIM-DM domain periodically sends Hello messages to all other
PIM devices to discover PIM neighbors and maintain PIM neighbor relationships.
NOTE
By default, a PIM device permits other PIM control messages or multicast messages from a
neighbor, irrespective of whether the PIM device has received Hello messages from the neighbor.
However, if a PIM device has the neighbor check function enabled, the PIM device permits other
PIM control messages or multicast messages from a neighbor only after the PIM device has
received Hello messages from the neighbor.
2. Flooding
PIM-DM assumes that at least one multicast group member exists on each network
segment, and floods multicast data to all routers on the network. Therefore, all PIM
devices on the network can receive multicast data.
3. Prune
After flooding multicast data, PIM-DM prunes network segments that have no multicast
data receiver and retains only the network segments that have multicast data receivers.
Only PIM devices that require multicast data can receive multicast data.
4. State Refresh
If a downstream device is in the prune state, the upstream device maintains a prune timer
for this device. When the prune timer expires, the upstream device resumes data
forwarding to the downstream device, which wastes network resources. To prevent this
problem, the state-refresh function can be enabled on the upstream router. This function
enables the upstream router to periodically send State-Refresh messages to refresh the
status of the prune timers of downstream devices. Downstream devices that do not
require multicast data remain in the prune state.
5. Graft
If a node on a pruned network segment has new group members, PIM-DM uses the graft
mechanism to enable the node to immediately forward multicast data.
If there are multiple PIM devices on a network segment, the same multicast packets are sent
repeatedly across the network segment. The Assert mechanism can be used to select a unique
multicast data forwarder, preventing redundant multicast data forwarding.
Neighbor Discovery
Each PIM-enabled interface on a PIM device sends Hello messages. A multicast packet that
carries a Hello message has the following features:
l The destination address is 224.0.0.13.
l The source address is an interface address.
l The TTL is 1, indicating that packets are sent to neighbor interfaces only.
Hello messages are used to discover neighbors, adjust protocol parameters, and maintain
neighbor relationships.
Flooding
The following example uses the network shown in Figure 9-6 to describe the flooding
function. The source sends a data packet to ATN A. Then ATN A floods the packet to all its
neighbors. ATN B and ATN C also exchange data packets with each other. To prevent data
duplication, ATN B uses the reverse path forwarding (RPF) check to ensure that it accepts
data packets from only one neighbor, ATN A or ATN C.
The RPF check works as follows: When a device receives a multicast packet, the device
searches the unicast routing table, Multicast Border Gateway Protocol (MBGP) routing
table, and static multicast routing table for an RPF route that matches the packet source
address. If the packet inbound interface matches the RPF interface, the packet passes the
RPF check; otherwise, the packet is considered invalid and discarded. The RPF check also
determines whether multicast routing entries are created and maintained. It is the basis for
multicast routing because it ensures that multicast data is forwarded along correct paths.
Based on the RPF check result, ATN B permits the data packet from ATN A and sends the
packet to User A.
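The RPF check described above can be sketched as follows. This is a simplified illustration: a single hypothetical routing table stands in for the unicast, MBGP, and static multicast routing tables, and the interface names are invented for the example.

```python
from ipaddress import ip_address, ip_network

def rpf_interface(routes, source):
    """Return the interface of the longest-prefix route toward the source.

    routes: hypothetical mapping of prefix -> outbound interface name.
    """
    src = ip_address(source)
    candidates = [(ip_network(prefix), ifname)
                  for prefix, ifname in routes.items()
                  if src in ip_network(prefix)]
    if not candidates:
        return None  # no RPF route for this source
    return max(candidates, key=lambda c: c[0].prefixlen)[1]

def passes_rpf_check(routes, source, inbound_interface):
    """A packet passes only if it arrived on the RPF interface."""
    expected = rpf_interface(routes, source)
    return expected is not None and expected == inbound_interface

# Hypothetical view from ATN B: the best route to the source points to ATN A.
routes = {"10.0.0.0/8": "to_ATN_C", "10.1.1.0/24": "to_ATN_A"}
print(passes_rpf_check(routes, "10.1.1.5", "to_ATN_A"))
print(passes_rpf_check(routes, "10.1.1.5", "to_ATN_C"))
```

With these routes, a packet from source 10.1.1.5 is accepted when it arrives from ATN A and discarded when the same packet arrives from ATN C.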
[Figure 9-6 (diagram not reproduced). Labels: PIM-DM, UserA, ATN C. Legend: packets, Flooding.]
Prune
The following example uses the network shown in Figure 9-7 to describe the prune function.
ATN C has no receivers, so it sends a Prune message upstream to ATN A to instruct ATN A
to stop forwarding data to the interface connected to ATN C.
After receiving the Prune message, ATN A stops forwarding data to the downstream
interface connected to ATN C. This process is called pruning. Because a downstream interface
on ATN A is connected to ATN B that has a receiver, ATN A forwards multicast data to the
downstream interface connected to ATN B. In this manner, a unidirectional and loop-free SPT
is set up from the source to User A.
[Figure 9-7 (diagram not reproduced). Labels: PIM-DM, ATN C. Legend: packets, Prune.]
State Refresh
The following example uses the network shown in Figure 9-7 to describe the state refresh
function. After ATN A prunes the network segment of ATN C, ATN A maintains a prune
timer for ATN C. When the prune timer expires, ATN A resumes data forwarding to ATN C.
This results in a waste of network resources.
The state refresh function can prevent this problem and works as follows: ATN A periodically
floods State-Refresh messages to all its downstream interfaces to reset the prune timers of all
the downstream devices.
Graft
The following example uses the network shown in Figure 9-8 to describe the graft function.
If User B sends an IGMP Report message to the pruned ATN C to join a multicast group or
to respond to a Query message, ATN C would otherwise have to wait for the next
flooding-pruning cycle before it receives multicast data, which prolongs the service access
process. The graft function avoids this wait as follows:
ATN C sends a Graft message upstream to request ATN A to restore the forwarding status of
the downstream interface connected to ATN C. After restoring the forwarding status, ATN
A sends multicast data to ATN C. Therefore, the graft function implements rapid data
forwarding for devices in the pruned state.
[Figure 9-8 (diagram not reproduced). Labels: PIM-DM, Receiver, UserB, ATN C. Legend: packets, Graft.]
Assert
The following example uses the network shown in Figure 9-9 to describe the assert function.
ATN B and ATN C both receive multicast packets from the multicast source S, and the
packets pass the RPF check on both devices. (S, G) entries are therefore created on ATN B
and ATN C. Because the downstream interfaces of ATN B and ATN C are connected to the same
network segment, ATN B and ATN C can both send multicast data to the network segment.
The assert function is used to ensure that only one multicast data forwarder exists on the
network segment. The assert process is as follows:
1. ATN B receives a multicast packet from ATN C through a downstream interface, but this
packet fails the RPF check and is discarded by ATN B. At the same time, ATN B sends
an Assert message to the network segment.
2. ATN C compares its routing information with that carried in the Assert message sent by
ATN B. ATN C loses the assert election because the route cost from ATN B to the source
is lower. The downstream interface of ATN C is then prohibited from forwarding
multicast packets and is deleted from the downstream interface list of the (S, G) entry.
3. ATN C receives a multicast packet from ATN B through the network segment, but the
packet fails the RPF check and therefore is discarded.
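The comparison in step 2 can be sketched as follows. The steps above mention only the route cost to the source. The other two criteria in this sketch (the route preference compared before the cost, and the highest IP address as the final tie-breaker) follow the PIM specifications and are assumed, not confirmed by this document, to apply to the ATN. All names and values are hypothetical.

```python
from ipaddress import ip_address

def assert_winner(candidates):
    """Return the name of the device that keeps forwarding on the segment.

    candidates: list of dicts with "name", "preference", "cost", "address".
    Lower preference wins, then lower cost, then the highest IP address.
    """
    best = min(candidates,
               key=lambda c: (c["preference"],
                              c["cost"],
                              -int(ip_address(c["address"]))))
    return best["name"]

# Hypothetical values: ATN B has the lower route cost to the source.
candidates = [
    {"name": "ATN B", "preference": 10, "cost": 10, "address": "192.168.1.2"},
    {"name": "ATN C", "preference": 10, "cost": 20, "address": "192.168.1.3"},
]
print(assert_winner(candidates))
```

With these values ATN B wins, which matches the outcome described in step 2.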
[Figure 9-9 (diagram not reproduced). Labels: ATN B, ATN C, Ethernet. Legend: multicast packets, Assert message from RouterB, Assert message from RouterC.]
9.2.2.2 PIM-SM
Protocol Independent Multicast-Sparse Mode (PIM-SM) implements P2MP data transmission
on large-scale networks on which multicast data receivers are sparsely distributed. PIM-SM
forwards multicast data only to network segments with active receivers that have required the
data.
PIM-SM assumes that no host wants to receive multicast data, so PIM-SM sets up an MDT
only after a host requests multicast data, and then sends the data to the host along the MDT.
Concepts
This section provides basic PIM-SM concepts. Figure 9-10 shows a typical PIM-SM
network.
[Figure 9-10 (diagram not reproduced). Labels: PIM-SM, Source, Source DR, RP, BSR, Receiver DR, Receiver, Ethernet, PIM Router.]
l PIM device
A router that runs PIM is called a PIM device. A router interface on which PIM is
enabled is called a PIM interface.
l PIM domain
A network constructed by PIM devices is called a PIM network.
A PIM-SM network can be divided into multiple PIM-SM domains by configuring
BootStrap router (BSR) boundaries on router interfaces to restrict BSR message
transmission. PIM-SM domains isolate multicast traffic between domains and facilitate
network management.
l DR
A designated router (DR) can be a multicast source's DR or a receiver's DR.
– A multicast source's DR is a PIM device directly connected to a multicast source
and is responsible for sending Register messages to an RP.
– A receiver's DR is a PIM device directly connected to receiver hosts and is
responsible for sending Join messages to an RP and forwarding multicast data to
receiver hosts.
l RP
A rendezvous point (RP) is the forwarding core in a PIM-SM domain and is used to
process hosts' join requests and multicast source's registration requests. An RP constructs
an MDT with the RP at the root, called an RP tree (RPT). An RP creates (S, G) entries to
transmit multicast data to hosts. All routers in the PIM-SM domain need to know the
RP's location.
The following table lists the types of RPs.
l BSR
Implementation
The multicast data forwarding process in a PIM-SM domain is as follows:
1. Neighbor discovery
Each PIM device in a PIM-SM domain periodically sends Hello messages to all other
PIM devices in the domain to discover PIM neighbors and maintain PIM neighbor
relationships.
NOTE
By default, a PIM device permits other PIM control messages or multicast messages from a
neighbor, irrespective of whether the PIM device has received Hello messages from the neighbor.
However, if a PIM device has the neighbor check function enabled, the PIM device permits
other PIM control messages or multicast messages from a neighbor only after the PIM
device has received Hello messages from the neighbor.
2. DR Election
PIM devices exchange Hello messages to elect a DR on a shared network segment. The
receiver's DR is the only multicast data forwarder on a shared network segment. The
source's DR is responsible for forwarding multicast data received from the multicast
source along an MDT.
3. RP discovery
An RP is the forwarding core in a PIM-SM domain. A dynamic or static RP forwards
multicast data on the entire network.
4. RPT setup
PIM-SM assumes that no hosts want to receive multicast data, so PIM-SM sets up an
RPT only after a host requests multicast data, and then sends the data from the RP to the
host along the RPT.
5. SPT switchover
A multicast group in a PIM-SM domain is associated with only one RP and one RPT. All
multicast data packets are forwarded by the RP. The path along which the RP forwards
multicast data may not be the shortest path from the multicast source to receivers. The
load of the RP increases when the multicast traffic volume increases. If the multicast
data forwarding rate exceeds a configured threshold, an RPT-to-SPT switchover can be
implemented to reduce the burden on the RP.
If a network problem occurs, the assert mechanism or a DR switchover delay can be used to
ensure successful multicast data transmission.
l Assert
If multiple multicast data forwarders exist on a network segment, each multicast packet
is repeatedly sent across the network segment, generating redundant multicast data. To
resolve this issue, the assert mechanism can be used to select a unique multicast data
forwarder on a network segment.
l DR switchover delay
If the role of an interface on a PIM device is changed from DR to non-DR, the PIM
device immediately stops using this interface to forward data. If multicast data sent from
a new DR does not arrive, multicast data traffic is temporarily interrupted. If a DR
switchover delay is configured, the interface continues to forward multicast data until the
delay expires. Therefore, setting a DR switchover delay prevents multicast data traffic
from being interrupted.
The detailed PIM-SM implementation process is as follows:
Neighbor Discovery
Neighbor discovery in PIM-SM is the same as that in PIM-DM. For details, see Neighbor
Discovery.
DR Election
The network segment on which a multicast source or group members reside is usually
connected to multiple PIM devices, as shown in Figure 9-11. The PIM devices exchange
Hello messages to set up PIM neighbor relationships. A Hello message carries the DR
priority and the address of the interface that connects the PIM device to this network segment.
A PIM device compares its own information with that carried in messages sent from
neighbors to elect a DR. This process is a DR election. The DR election rules are as follows:
l The PIM device with the highest DR priority wins if all the PIM devices send Hello
messages that carry a DR priority.
l If the PIM devices have the same DR priority or one or more PIM devices do not support
Hello messages carrying a DR priority, the PIM device with the highest IP address wins.
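The two DR election rules above can be sketched as follows. The tuple layout is hypothetical, and a priority of None stands for a Hello message that does not carry a DR priority.

```python
from ipaddress import ip_address

def elect_dr(hellos):
    """Return the interface address of the elected DR.

    hellos: list of (interface_address, dr_priority or None) tuples,
    one per PIM device on the shared network segment.
    """
    all_carry_priority = all(priority is not None for _, priority in hellos)
    if all_carry_priority:
        # Highest DR priority wins; equal priorities fall back to the address.
        key = lambda h: (h[1], int(ip_address(h[0])))
    else:
        # At least one device sent no DR priority: compare addresses only.
        key = lambda h: int(ip_address(h[0]))
    return max(hellos, key=key)[0]

print(elect_dr([("10.0.0.1", 5), ("10.0.0.2", 1)]))
print(elect_dr([("10.0.0.1", 5), ("10.0.0.2", None)]))
```

In the first call the device with priority 5 wins. In the second call one Hello message carries no priority, so the election falls back to the highest IP address.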
[Figure 9-11 (diagram not reproduced). Labels: Source, Server, DR, RP, UserA, UserB, Ethernet. Legend: Hello, Join, Register Message.]
RP Discovery
l Static RP
A static RP is specified by running commands. A static RP's address must be manually
configured on other routers so they can find and use this RP for data forwarding.
l Dynamic RP
A dynamic RP is elected among multiple PIM devices.
[Figure 9-12 (diagram not reproduced). Labels: PIM-SM, C-BSR, BSR, C-RP. Legend: Bootstrap, C-RP advertisement.]
The network shown in Figure 9-12 is used as an example to describe the dynamic RP
election process:
a. To use a dynamic RP, configure C-BSRs to elect a BSR among the C-BSRs.
At first, each C-BSR considers itself a BSR and advertises a Bootstrap message.
The Bootstrap message carries the address and priority of the C-BSR. Each router
compares the information contained in its received Bootstrap messages to elect a
BSR as follows:
i. The C-BSR with the highest priority wins (the greater the priority value, the
higher the priority).
ii. If all the C-BSRs have the same priority, the C-BSR with the highest IP
address wins.
All the ATNs follow the same BSR election rules, so they will elect the same BSR
and learn the BSR address.
b. The C-RPs send C-RP Advertisement messages to the BSR. Each Advertisement
message carries the address of the C-RP that sent it, the range of multicast groups
that the C-RP serves, and the priority of the C-RP.
c. The BSR collects the received information as an RP-Set, encapsulates the RP-Set
information in a Bootstrap message, and advertises the Bootstrap message to all
PIM-SM devices.
d. Each ATN uses the RP-Set information to perform the same calculations and
comparisons to elect an RP among multiple C-RPs as follows:
i. A C-RP wins if it serves the group address that has the longest mask.
ii. If group addresses have the same mask length, the C-RP with the highest
priority wins (the greater the priority value, the lower the priority).
iii. If the C-RPs have the same priority, a hash function is applied. The C-RP with
the greatest calculated value wins.
iv. If none of the above criteria can determine a winner, the C-RP with the highest
address wins.
e. Because all ATNs use the same RP-set and the same election rules, the relationship
between the multicast group and the RP is the same for all ATNs. ATNs save this
relationship to guide subsequent multicast operations.
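The BSR and RP election rules in steps a and d can be sketched as follows. The data layout is hypothetical, and the hash value is supplied as a precomputed number per C-RP because this section does not specify the hash function.

```python
from ipaddress import ip_address, ip_network

def elect_bsr(c_bsrs):
    """c_bsrs: list of (address, priority). Greater priority value wins,
    then the highest IP address."""
    return max(c_bsrs, key=lambda c: (c[1], int(ip_address(c[0]))))[0]

def elect_rp(rp_set, group):
    """Return the address of the RP elected for a group, or None.

    rp_set: list of dicts with "address", "group_range", "priority",
    and "hash_value" (hypothetical precomputed hash result).
    """
    g = ip_address(group)
    serving = [c for c in rp_set if g in ip_network(c["group_range"])]
    if not serving:
        return None  # no C-RP serves this group
    best = max(serving, key=lambda c: (
        ip_network(c["group_range"]).prefixlen,  # i. longest mask
        -c["priority"],                          # ii. smaller value, higher priority
        c["hash_value"],                         # iii. greatest hash value
        int(ip_address(c["address"])),           # iv. highest address
    ))
    return best["address"]

# Hypothetical RP-Set.
rp_set = [
    {"address": "10.1.1.1", "group_range": "224.0.0.0/4", "priority": 0, "hash_value": 5},
    {"address": "10.1.1.2", "group_range": "225.1.0.0/16", "priority": 192, "hash_value": 1},
    {"address": "10.1.1.3", "group_range": "225.1.0.0/16", "priority": 100, "hash_value": 9},
]
print(elect_rp(rp_set, "225.1.1.1"))
```

For group 225.1.1.1, the two C-RPs that serve 225.1.0.0/16 beat the C-RP that serves 224.0.0.0/4 on mask length, and 10.1.1.3 then wins because its priority value is smaller. Because every device runs the same comparison on the same RP-Set, all devices reach the same result.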
l Anycast-RP
In a traditional PIM-SM domain, each multicast group is mapped to only one RP. When
the network is overloaded or traffic is heavy, many network problems can occur. For
example, if the RP is overloaded, routes will converge slowly, or the multicast
forwarding path will not be optimal.
Anycast-RP can be used to address these problems. Currently, Anycast-RP can be
implemented through MSDP or PIM:
– Through MSDP: To use this mode, configure multiple RPs with the same address in
a PIM-SM domain, and allow the RPs to set up Multicast Source Discovery Protocol
(MSDP) peer relationships, so that they can share multicast data source information.
This mode is only for use on IPv4 networks. For details about the implementation
principles, see Anycast-RP in MSDP.
– Through PIM: To use this mode, configure multiple RPs with the same address in a
PIM-SM domain, and assign a unique local address for each RP. These local
addresses are used to set up connectionless peer relationships between the RP
devices. The peers share multicast source information by exchanging Register
messages.
This mode is for use on IPv4 networks.
NOTE
These two modes cannot be both configured on the same device in a PIM-SM domain. If Anycast-RP is
implemented through PIM, you can also configure a device in a local domain to advertise the source
information obtained from extra-domain MSDP peers to the peers in the local domain.
With Anycast-RP, both a multicast receiver and a source select their topologically closest RPs
to create RPTs. After receiving multicast data, the receiver's DR determines whether to trigger
a switchover to an SPT. Therefore, Anycast-RP facilitates optimal RP route selection and
implements load sharing on RPs.
[Figure 9-13 (diagram not reproduced). Labels: PIM-SM, RP1, RP2, DR1, DR2, S1, S2, U1, U2. Legend: Register message.]
In the PIM-SM domain shown in Figure 9-13, multicast sources S1 and S2 send multicast
data to multicast group G, and U1 and U2 are members of group G. Perform the following
operations to implement Anycast-RP through PIM:
l Configure RP1 and RP2 and assign the same IP address to both (for example, the
loopback interface address 10.10.10.10).
l Assign a unique local IP address to each RP (for example, 1.1.1.1 to RP1 and 2.2.2.2
to RP2), so that the RPs can set up a connectionless peer relationship.
– After receiving the (S2, G) Register message from DR2, RP2 replaces the source
and destination addresses with 2.2.2.2 and 1.1.1.1 respectively, and re-encapsulates
and sends the message to RP1. After receiving the re-encapsulated Register
message, RP1 processes this Register message but does not forward it to other
peers.
4. Each RP joins an SPT with the source's DR as the root to obtain multicast data.
– RP1 sends a Join message to S2. Then, S2 sends multicast data to RP1 along the
SPT, and RP1 sends the data to U1 along the RPT.
– RP2 sends a Join message to S1. Then, S1 sends multicast data to RP2 along the
SPT, and RP2 sends the data to U2 through the RPT.
5. After receiving the multicast data, each receiver's DR determines whether to trigger a
switchover to an SPT based on configurations.
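The Register handling between Anycast-RP peers described above can be sketched as follows. This is a minimal illustration, not device code: the class and method names (AnycastRp, on_register), the DR address 192.0.2.2, the source address 198.51.100.2, and the group 225.1.1.1 are assumptions made for the example. Only the rule stated in the text is modeled: a Register message received from a source's DR is re-encapsulated and sent to the peer RPs, and a Register message received from a peer RP is processed but not forwarded.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    """Simplified Register message: outer addresses plus the (S, G) it carries."""
    src: str      # outer source address
    dst: str      # outer destination address
    source: str   # multicast source S
    group: str    # multicast group G


@dataclass
class AnycastRp:
    """Hypothetical model of one Anycast-RP member (not a device API)."""
    local_addr: str                       # unique local address, e.g. 2.2.2.2
    peers: list                           # local addresses of the other RPs
    known_sources: set = field(default_factory=set)

    def on_register(self, msg: Register) -> list:
        """Process a Register and return the copies to send to peer RPs."""
        self.known_sources.add((msg.source, msg.group))
        if msg.src in self.peers:
            # Already re-encapsulated by a peer RP: process it, do not forward.
            return []
        # Received from a source's DR: re-encapsulate towards every peer.
        return [Register(self.local_addr, peer, msg.source, msg.group)
                for peer in self.peers]


rp1 = AnycastRp("1.1.1.1", ["2.2.2.2"])
rp2 = AnycastRp("2.2.2.2", ["1.1.1.1"])
# DR2 registers (S2, G) with the shared RP address 10.10.10.10 and reaches RP2.
from_dr2 = Register("192.0.2.2", "10.10.10.10", "198.51.100.2", "225.1.1.1")
forwarded = rp2.on_register(from_dr2)     # RP2 -> RP1, source 2.2.2.2
relayed = rp1.on_register(forwarded[0])   # RP1 processes it and stops there
```

After these calls, both RPs hold the (S2, G) source information, which is the outcome the re-encapsulation step is meant to achieve.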
RPT Setup
A PIM-SM RPT is an MDT that uses an RP as a root and group member ATNs as leaves.
An RP is a data forwarding core for processing Register messages from source's DRs and
Join messages from receivers. Therefore, an RP acts as an information convergence center.
All PIM devices need to know the RP address.
Figure 9-14 shows the RPT setup and data forwarding processes.
Figure 9-14 (diagram not reproduced; labels: Receiver, RouterC, DR, ATN D, (*,G) join, packets)
NOTE
To reduce the RPT forwarding loads and improve multicast data forwarding efficiency, PIM-SM
supports switchovers to SPTs, allowing a multicast network to set up an SPT. Then, the multicast source
can send multicast data directly to receivers along the SPT.
SPT Switchover
A PIM-SM SPT is an MDT with the multicast source as the root and the group members as
leaves.
In a PIM-SM domain, a multicast group interacts with only one RP, and only one RPT is set
up. If SPT switchover is not enabled, all multicast packets must be encapsulated in Register
messages and then sent to the RP. After receiving the packets, the RP de-encapsulates them
and forwards them along the RPT.
Since all multicast messages forwarded along the RPT are transferred by the RP, the RP may
be overloaded when multicast traffic is heavy. To resolve this problem, PIM-SM allows the
RP or the receiver's DR to trigger an SPT switchover.
Figure 9-15 (diagram not reproduced; labels: RouterC, DR, ATN D, (*,G) join, (S,G) join, packets)
After the SPT is set up and the RP receives the first multicast data message on the SPT,
the RP stops processing Register messages. This frees the source's DR and RP from
encapsulating and decapsulating messages. Multicast data is sent from the ATN directly
connected to the multicast source to the RP along the SPT and then forwarded to group
members along the RPT.
l SPT switchover triggered by the receiver's DR
a. As shown in Figure 9-15, multicast data is transmitted along the RPT. The
receiver's DR (ATN D) sends (*, G) Join messages to the RP. Multicast data is sent
to receivers along the path: source's DR (ATN A)->RP (ATN B)-> receiver's DR
(ATN D).
b. The receiver's DR periodically checks the forwarding rate of multicast packets. If
the receiver's DR detects that the forwarding rate is greater than a configured
threshold, the DR triggers an RPT-to-SPT switchover.
c. The receiver's DR sends (S, G) Join messages to the source's DR. After receiving
multicast data along the SPT, the receiver's DR discards multicast data received
along the RPT and sends a Prune message to the RP to delete the receiver from the
RPT. The switchover from the RPT to the SPT is then complete.
d. Multicast data is forwarded along the SPT. Specifically, multicast data is
transmitted to receivers along the path: multicast source's DR (ATN A) -> receiver's
DR (ATN D).
After an SPT is set up, subsequent packets may not pass through the RP. After a
switchover to an SPT, delays in transmitting multicast data are reduced because the
previously used RPT may not have the shortest path.
If one source sends packets to multiple groups simultaneously and an SPT switchover policy
is specified for a specified group range:
l Before an SPT switchover, packets reach the receiver's DR along the RPT.
l After an SPT switchover, only the packets within the group range specified in the SPT
switchover policy are forwarded along the SPT. Other packets are still forwarded along
the RPT.
NOTE
By default, the RP performs an SPT switchover immediately after receiving the first Register message,
and the receiver's DR performs an SPT switchover immediately after receiving the first multicast data
message.
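The switchover decision made by the receiver's DR can be sketched as a small function. This is an illustration under stated assumptions: the function name, the kbit/s unit, and the representation of the switchover policy as a list of group prefixes are hypothetical, and a threshold of 0 is used here to model the default behavior of switching on the first multicast data message.

```python
import ipaddress


def should_switch_to_spt(group, rate_kbps, threshold_kbps, policy_ranges=None):
    """Return True if the receiver's DR should move this group to the SPT.

    threshold_kbps = 0 models the default of switching on the first data packet.
    policy_ranges, if given, restricts switchover to the listed group ranges.
    """
    if policy_ranges is not None:
        addr = ipaddress.ip_address(group)
        if not any(addr in ipaddress.ip_network(r) for r in policy_ranges):
            return False          # outside the policy: stay on the RPT
    return rate_kbps > threshold_kbps


# Default behavior: any traffic at all triggers the switchover.
default_case = should_switch_to_spt("225.1.1.1", 10, 0)
# Configured threshold of 1024 kbit/s with a policy limited to 225.1.0.0/16.
in_policy = should_switch_to_spt("225.1.1.1", 2048, 1024, ["225.1.0.0/16"])
out_of_policy = should_switch_to_spt("226.1.1.1", 2048, 1024, ["225.1.0.0/16"])
```

Groups outside the policy range keep using the RPT regardless of their rate, which matches the group-range behavior described above.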
Assert
Either of the following conditions indicates other multicast forwarders are present on a
network segment:
If other multicast forwarders are present on the network segment, the ATN starts the Assert
mechanism as follows.
The ATN sends an Assert message through the downstream interface, and the downstream
interface also receives an Assert message from a forwarder on the network segment. In an
Assert message, the destination address is 224.0.0.13, the source address is the downstream
interface address, and the TTL is 1. An Assert message carries the route cost from the PIM
ATN to the source or RP, priority of the used unicast routing protocol, and group address.
The ATN compares information in its sent and received Assert messages to start Assert
election. The election rules are as follows:
1. The ATN with the highest unicast routing protocol priority wins.
2. If the ATNs have the same unicast routing protocol priority, the ATN with the smaller
route cost to the source or RP wins.
3. If the ATNs have the same priority and route cost, the ATN with the highest IP address
for the downstream interface wins.
The ATN performs the following operations based on the Assert election result:
l If the ATN wins, the downstream interface of the ATN is responsible for forwarding
multicast packets on the network segment. The downstream interface is called an Assert
winner.
l If the ATN loses, the downstream interface is prohibited from forwarding multicast
packets and deleted from the downstream interface list of the (S, G) entry. The
downstream interface is called an Assert loser.
After the Assert election is complete, only one upstream ATN on the network segment retains
a downstream interface for the entry, so only one copy of the multicast traffic is
transmitted on the segment. The Assert winner then periodically sends Assert messages to
maintain its status as the Assert winner. If the Assert loser does not receive any Assert
message from the Assert winner within the time limit, it re-adds its downstream interface
for multicast data forwarding.
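The three election rules can be expressed as a single sort key. The sketch below is illustrative: the type and function names are hypothetical, and it assumes that a smaller preference value means a higher unicast routing protocol priority, which is the usual convention but is not stated in this section.

```python
import ipaddress
from typing import NamedTuple


class AssertInfo(NamedTuple):
    preference: int   # routing protocol preference (assumed: smaller = higher priority)
    cost: int         # route cost to the source or RP
    address: str      # IP address of the downstream interface


def assert_key(info):
    # Lower preference first, then lower cost, then higher interface address.
    return (info.preference, info.cost, -int(ipaddress.ip_address(info.address)))


def elect_assert_winner(candidates):
    """Return the candidate that wins the Assert election."""
    return min(candidates, key=assert_key)


a = AssertInfo(10, 20, "10.1.1.1")
b = AssertInfo(10, 20, "10.1.1.2")   # same priority and cost as a, higher address
c = AssertInfo(60, 1, "10.1.1.3")    # lowest cost, but lower protocol priority
winner = elect_assert_winner([a, b, c])
```

In this example b wins: c loses on protocol priority despite its low cost, and the tie between a and b is broken by the higher interface address.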
DR Switchover Delay
If an existing DR fails, the PIM neighbor relationship times out, and a new DR election is
triggered.
By default, when an interface changes from a DR to a non-DR, the ATN immediately stops
using the interface to forward data. If multicast data sent from a new DR has not yet arrived,
multicast data flows are temporarily interrupted.
However, if a PIM-SM interface that has a PIM DR switchover delay configured receives
Hello messages from a new neighbor and changes from a DR to a non-DR, the interface
continues to function as a DR and to forward multicast messages until the delay elapses.
If the PIM-SM interface receives packets from a new DR before the delay elapses, the
interface immediately stops forwarding messages, preventing duplicated multicast data
transmission. Therefore, when a new IGMP Report message is received on the shared network
segment, the new DR, not the old DR configured with the DR switchover delay, sends a PIM
Join message to the upstream device.
NOTE
If the new DR receives multicast data from the original DR before the DR switchover delay elapses, an
Assert election is triggered.
division also allows for the use of private group addresses to provide user services in a
specified domain.
Each BSR administrative domain has one BSR, and this BSR serves multicast groups in a
specific address range. The global domain also has one BSR, and the BSR serves all multicast
groups that are not served by the BSR administrative domains.
The relationship between the BSR administrative domain and the global domain is described
as follows in terms of the domain space, group address range, and multicast function.
l Domain space
Figure 9-16 (diagram not reproduced; labels: BSR1 domain, Global domain, BSR, C-RP)
Each BSR administrative domain contains its own exclusive set of ATNs, and each ATN belongs
to only one BSR administrative domain, as shown in Figure 9-16. BSR administrative
domains are independent and geographically isolated from each other. Each BSR
administrative domain serves multicast groups in a specific address range and cannot
transmit multicast messages to other BSR administrative domains.
The global domain contains all the ATNs on the PIM-SM network. The multicast
messages that do not belong to a BSR administrative domain can be transmitted over the
entire PIM network.
l Group address range
Figure 9-17 (diagram not reproduced; labels: BSR1 with G1 address, BSR2 with G2 address, BSR3 with G3 address, Global with G-G1-G2 address)
Each BSR administrative domain provides services for multicast groups within a specific
address range. The multicast groups that different BSR administrative domains serve can
overlap. However, a multicast group address is valid only in its serving BSR
administrative domain and considered a private group address of the domain. As shown
in Figure 9-17, the group address range of BSR1 overlaps with that of BSR3.
The multicast group that does not belong to any BSR administrative domain belongs to
the global domain. In the example shown in Figure 9-17, the group address range of the
global domain is G-G1-G2.
l Multicast function
The global domain and each BSR administrative domain have their respective C-RP and
BSR devices, as shown in Figure 9-16. Devices only function in the domain to which
they are assigned. Each BSR administrative domain performs BSR and RP elections
independently.
Each BSR administrative domain has a border. Multicast information for this domain,
such as the C-RP Advertisement messages and BSR Bootstrap message, can be
transmitted only within the domain. Multicast information for the global domain can be
transmitted over the entire global domain and can traverse any BSR administrative
domain.
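From the point of view of a single ATN, the group address ranges decide which domain's BSR and RP information applies to a group. The sketch below is a simplified illustration: the function name, the domain name BSR1, and the range 239.0.0.0/8 are assumptions chosen for the example, not values taken from this document.

```python
import ipaddress


def serving_domain(group, admin_domain, admin_range):
    """Return the domain that serves `group` on an ATN in `admin_domain`.

    Groups inside the administrative domain's range are private to that
    domain; all other groups are served by the global domain.
    """
    if ipaddress.ip_address(group) in ipaddress.ip_network(admin_range):
        return admin_domain
    return "global"


private_group = serving_domain("239.1.1.1", "BSR1", "239.0.0.0/8")
global_group = serving_domain("225.1.1.1", "BSR1", "239.0.0.0/8")
```

Because each ATN belongs to only one BSR administrative domain, a lookup with a single range per device is sufficient for this illustration.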
9.2.2.3 PIM-SSM
PIM-SM needs to maintain Rendezvous Points (RPs) to transmit multicast data. If receivers
know the exact location of a multicast source and want to request multicast data directly from
a multicast source, Protocol Independent Multicast-Source-Specific Multicast (PIM-SSM) can
enable user hosts to rapidly join multicast groups. A shortest path tree (SPT) is set up between
the multicast source and group members.
Unlike the Any-Source Multicast (ASM) model, the Source-Specific Multicast (SSM) model
does not need to maintain an RP, construct a rendezvous point tree (RPT), or register a
multicast source.
The SSM model is based on the PIM-SM technology and IGMPv3/Multicast Listener
Discovery (MLD)v2. The procedure for setting up a multicast forwarding tree on a PIM-SSM
network is similar to the procedure for setting up an SPT on a PIM-SM network. The
receiver's Designated router (DR), which knows the exact position of the multicast source,
sends Join messages directly to the source so that multicast data streams can be sent to the
receiver's DR.
Related Concepts
PIM-SSM is implemented based on the PIM-SM technology. For details about PIM-SSM
concepts, see Related Concepts.
Implementation
The process for forwarding multicast data in a PIM-SSM domain is as follows:
1. Neighbor Discovery
Each PIM device in a PIM-SSM domain periodically sends Hello messages to all other
PIM devices in the domain to discover PIM neighbors and maintain PIM neighbor
relationships.
NOTE
By default, a PIM device permits other PIM control messages or multicast messages from a
neighbor, irrespective of whether the PIM device has received Hello messages from the neighbor.
However, if a PIM device has the neighbor check function, the PIM device permits other PIM
control messages or multicast messages from a neighbor only after the PIM device has received
Hello messages from the neighbor.
2. DR Election
PIM devices exchange Hello messages to elect a DR on a shared network segment. The
receiver's DR is the only multicast data forwarder on the segment.
3. SPT Setup
Users on a PIM-SSM network can know the exact position of the multicast source in
advance and can, therefore, specify the source when joining a multicast group. After
receiving a Report message from a user, the receiver's DR sends a Join message towards
the multicast source to establish an SPT between the source and the user. Multicast data
is sent by the multicast source to the user along the SPT.
NOTE
l The SPT establishment can be triggered by a user dynamically joining a multicast group, static join,
or SSM-mapping.
l The DR in an SSM scenario is valid only in the shared network segment connected to group
members. The DR on the group member side sends Join messages to the multicast source, creates
the (S, G) entry hop by hop, and then sets up an SPT.
l PIM-SSM supports a PIM DR switchover delay, PIM silent, and BFD for PIM.
NOTE
Figure 9-18 (diagram not reproduced; labels: Source, PIM-SM, ATN B, GE1, ATN C, GE2, Ethernet, Receiver)
As shown in Figure 9-18, in the shared network segment where user hosts reside, a PIM BFD
session is set up between the downstream interface GE1 of ATN B and the downstream
interface GE2 of ATN C. Both interfaces send BFD packets to detect the status of the link
between them.
GE1 of ATN B is elected as the DR for forwarding multicast data to Receiver. If GE1 fails,
BFD quickly notifies the RM of the session status, and the RM then notifies PIM. PIM triggers
a new DR election. GE2 of ATN C is then elected as the new DR to forward multicast data to
Receiver.
PIM NSR
Multicast NSR enables the protocol control plane to back up protocol control information,
including neighbor information, MDT information, and RP set information. Multicast NSR
also synchronizes information between the protocol control and forwarding control planes,
and between the forwarding control planes of the MPU and LPUs.
Currently, multicast NSR can be used with PIM-SM and PIM-SSM.
NOTE
PIM IPSec (PIM-DM, PIM-SM, PIM-SSM)
This function is used to authenticate IPv4 or IPv6 PIM packets to prevent bogus IPv4 or IPv6
PIM protocol packet attacks or denial of service (DoS) attacks, improving multicast service
security.
PIM IPSec uses security associations (SAs) to authenticate sent and received IPv4 or IPv6 PIM
packets. The PIM IPSec implementation process is as follows:
l Before an interface sends out an IPv4 or IPv6 PIM protocol packet, IPSec adds a protocol
header to the packet.
l After an interface receives an IPv4 or IPv6 PIM protocol packet, IPSec uses the protocol
header in the packet to authenticate the packet. If the authentication is successful, the
packet is forwarded. Otherwise, the packet is discarded.
PIM IPSec can authenticate the following types of IPv4 or IPv6 PIM packets:
l IPv4 or IPv6 PIM multicast protocol packets, such as Hello and Join/Prune packets.
l IPv4 or IPv6 PIM unicast protocol packets, such as Register and Register-Stop packets.
NOTE
For the IPSec feature description, see IPSec.
All PIM devices on a network.
l The value of the protocol type field in the IP header is 103. It indicates that the PIM
message is encapsulated in the data field.
l The destination address of the IP header identifies the receiver of the PIM message. The
destination address can be either a unicast address or a multicast address.
l PIM-DM and PIM-SM support different control messages.
The PIM message header occupies bits 0 to 31 and has the following layout:
Bits 0-3 Version
Bits 4-7 Type
Bits 8-15 Reserved
Bits 16-31 Checksum
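The header layout above can be decoded with a few lines of Python. This is a minimal sketch: the function name and the sample bytes are illustrative, the checksum is only extracted and not verified, and the interpretation of type 0 as a Hello message follows the standard PIM message type numbering rather than a table in this section.

```python
import struct


def parse_pim_header(data: bytes) -> dict:
    """Decode the 4-byte PIM header at the start of `data`."""
    if len(data) < 4:
        raise ValueError("PIM header needs at least 4 bytes")
    ver_type, reserved, checksum = struct.unpack("!BBH", data[:4])
    return {
        "version": ver_type >> 4,      # bits 0-3
        "type": ver_type & 0x0F,       # bits 4-7
        "reserved": reserved,          # bits 8-15
        "checksum": checksum,          # bits 16-31
    }


# 0x20 carries version 2 in the high nibble and type 0 in the low nibble.
header = parse_pim_header(bytes([0x20, 0x00, 0x12, 0x34]))
```

The network byte order format string "!BBH" matches the field widths of 8 (4 + 4), 8, and 16 bits.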
Hello Message
NOTE
Hello messages are used in both PIM-DM and PIM-SM, so a Hello message alone does not indicate
whether PIM-DM or PIM-SM is in use.
PIM devices periodically send Hello messages through all PIM interfaces. PIM devices
discover neighbors and maintain the neighbor relationship by exchanging Hello messages.
The source address of the IP packet encapsulated with the Hello message is the local interface
address. The destination address is 224.0.0.13, and the TTL value is 1. The message is
transmitted in multicast mode.
...
Reserved Indicates that this field is reserved. The field is set to 0 when
the message is sent, and is ignored when the message is
received.
OptionType Indicates the types of parameters. For the valid values, see
Table 9-10.
OptionType OptionValue
Register Message
NOTE
When an active multicast source appears in the PIM-SM network, the Designated router (DR)
at the source side sends a Register message to the Rendezvous Point (RP) to register the
source.
The source address of the IP packet encapsulated with the Register message is the address of
the DR at the source side and the destination address is the address of the RP. The message is
transmitted in unicast mode.
Reserved Indicates that this field is reserved. The field is set to 0 when
the message is sent, and is not processed when the message is
received.
Field Description
Reserved2 Indicates that this field is reserved. The field is set to 0 when
the message is sent, and is not processed when the message is
received.
Multicast data packet Indicates the multicast data packet. The DR at the source side
encapsulates the received multicast data in a Register message
and sends the message to the RP. After decapsulating the
message, the RP learns the (S, G) information of the multicast
data packet.
If a multicast source sends data to multiple groups, the DR at the source side must send a
Register message to the RP that corresponds to each group. A Register message encapsulates
only one multicast data packet, so the message carries only one copy of the (S, G)
information.
During the register suppression period, the DR sends Null-Register messages to notify the RP
that the multicast source is still active. After the register suppression timer expires, the
DR resumes using Register messages to encapsulate multicast data packets. In a Null-Register
message, the multicast data packet field contains only the IP header of the multicast data
packet, including the source address and group address.
Register-Stop Message
NOTE
In a PIM-SM network, the RP sends Register-Stop messages to the DR at the source side in
the following cases:
l Receivers have not received data sent to a certain group from the RP.
l The RP does not serve a certain multicast group.
l Multicast data has been switched from the rendezvous point tree (RPT) to the shortest
path tree (SPT).
After receiving the Register-Stop message, the DR at the source side stops using Register
messages to encapsulate multicast data packets and enters the register suppression state.
The source address of the IP packet encapsulated with the Register-Stop message is the address
of the RP and the destination address is the address of the DR at the source side. The message
is transmitted in unicast mode.
Version Type -
Group Address
Source Address
An RP can serve multiple groups at the same time, and a group may correspond to multiple
sources that send data to it. Therefore, an RP may handle multiple (S, G) registrations at the
same time.
A Register-Stop message carries only one copy of the (S, G) information. Each Register-Stop
message that the RP sends to the DR at the source side therefore ends only one (S, G)
registration.
After receiving the Register-Stop message carrying the (S, G) information, the DR at the
source side stops encapsulating (S, G) packets. The DR still uses Register messages to
encapsulate the packets that S sends to other groups.
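The per-(S, G) effect of a Register-Stop message can be sketched as a small state holder on the source's DR. The class and method names are hypothetical, the addresses are example values, and the register suppression timer is left out to keep the illustration short.

```python
class RegisterState:
    """Tracks which (S, G) pairs the source's DR has stopped registering."""

    def __init__(self):
        self.suppressed = set()

    def on_register_stop(self, source, group):
        # One Register-Stop message ends exactly one (S, G) registration.
        self.suppressed.add((source, group))

    def should_encapsulate(self, source, group):
        # Other groups of the same source are still sent in Register messages.
        return (source, group) not in self.suppressed


dr = RegisterState()
dr.on_register_stop("10.1.1.1", "225.1.1.1")
stopped = dr.should_encapsulate("10.1.1.1", "225.1.1.1")
still_registering = dr.should_encapsulate("10.1.1.1", "225.1.1.2")
```

Keying the state on the (S, G) pair, rather than on the source alone, is what lets the same source keep registering its other groups.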
Join/Prune Message
NOTE
A Join/Prune message can contain both Join and Prune information. A Join/Prune message that
contains only Join information is called a Join message. A Join/Prune message that contains
only Prune information is called a Prune message.
l When the downstream interface of a PIM device no longer has any receivers, the PIM
device sends a Prune message through the upstream interface to request the upstream
device to stop forwarding packets to the network segment.
l When a group member appears on the PIM-SM network, the DR at the group member
side sends a Join message through the Reverse Path Forwarding (RPF) interface towards
the RP to request the upstream neighbor to forward packets to the network segment. The
Join message is sent upstream hop by hop, and the RPT is set up.
l When the RP triggers the SPT switchover, the RP sends a Join message through the RPF
interface towards the source to request the upstream neighbor to forward packets to the
network segment. The Join message is sent upstream hop by hop, and the RP-source
tree is set up.
l When the DR at the group member side triggers the SPT switchover, the DR sends a Join
message through the RPF interface towards the source to request the upstream neighbor
to forward packets to the network segment. The Join message is sent upstream hop by
hop, and the SPT is set up.
l A PIM network segment may be connected to a downstream interface and multiple
upstream interfaces. Assume that an upstream interface sends a Prune message. If other
upstream interfaces still need to receive multicast packets, these interfaces must send a
Join message within the override interval to cancel the prune. Otherwise, the downstream
interface responsible for forwarding packets on the network segment performs the prune
action.
NOTE
Figure (diagram not reproduced; labels: ATN A, GE1, Ethernet, Prune, Join, GE2, GE3, ATN B, ATN C)
The source address of the IP packet encapsulated with Join/Prune message is the local
interface address. The destination address is 224.0.0.13, and the TTL value is 1. The message
is transmitted in multicast mode.
Version Type -
Upstream Neighbor Address
- Number of Groups(N) Holdtime
...
Group Address [ 1 ]
Number of Joined Sources( J ) Number of Pruned Sources( P )
Joined Source Address [ 1 ]
...
Joined Source Address [ J ]
Pruned Source Address [ 1 ]
...
Pruned Source Address [ P ]
Upstream Neighbor Indicates the address of the upstream neighbor, that is, the
Address address of the downstream interface that receives the Join/
Prune message and performs the Join and Prune actions.
Number of Joined Indicates the number of sources that the ATN joins.
Sources
Number of Pruned Indicates the number of sources that the ATN prunes.
Sources
Joined Source Address Indicates the address of the source that the ATN joins.
Pruned Source Address Indicates the address of the source that the ATN prunes.
Bootstrap Message
NOTE
When a dynamic RP is used on the PIM-SM network, ATNs configured as Candidate-BSRs
(C-BSRs) periodically send Bootstrap messages through all PIM interfaces to take part in
the bootstrap router (BSR) election. The ATN that wins the election continues to send
Bootstrap messages carrying RP-set information to all PIM devices in the domain.
The source address of the IP packet encapsulated with the Bootstrap message is the C-BSR
address and the destination address is 224.0.0.13. The packet is sent in multicast mode. The
packet has a TTL of 1 and is forwarded hop by hop, so it is eventually flooded throughout
the PIM-SM domain.
Version Type -
Fragment Tag Hash Mask Length BSR-priority
BSR-Address
Group-RP Record [ 1 ]
...
Group-RP Record [ N ]
Group Address
RP-Count Frag RP-Cnt(M) -
RP-address [ 1 ]
RP-holdtime [ 1 ] RP-Priority [ 1 ] -
...
RP-address [ M ]
RP-holdtime [ M ] RP-Priority [ M ] -
Hash Mask length Indicates the length of the Hash mask of the C-BSR.
Field Description
Frag RP-Cnt Indicates the total number of the C-RPs that want to serve the
group in the network segment. The packet may be fragmented
and the RP-Set information may not be integrated, so the field
is used to indicate this.
The BSR boundary of a PIM interface can be set by using the pim bsr-boundary command
on the interface. Multiple BSR boundary interfaces divide the network into different PIM-SM
domains. Bootstrap messages cannot pass through the BSR boundary.
Assert Message
NOTE
In the shared network segment, if a PIM device receives an (S, G) packet from the
downstream interface of the (S, G) or (*, G) entry, it indicates that other forwarders exist in
the network segment. The PIM device sends an Assert message through the downstream
interface to take part in the election. The ATN that fails in the election stops forwarding
multicast packets through the downstream interface.
The source address of the IP packet encapsulated with the Assert message is the local
interface address, the destination address is 224.0.0.13, and the TTL value is 1. The packet is
sent in multicast mode.
Version Type -
Group Address
Source Address
R Metric Preference
-
Metric
Source address If the ATN elects the unique forwarder of the (S, G) entry, the
address is the source address. If the ATN elects the unique
forwarder of the (*, G) entry, the address is 0.
R Indicates the RPT bit. If the ATN elects the unique forwarder of
the (S, G) entry, the bit is set to 0; if the ATN elects the unique
forwarder of the (*, G) entry, the bit is set to 1.
Metric Preference Indicates the priority of the unicast path to the source address.
If the R field value is set to 1, this field indicates the priority of
the unicast path to the RP.
Metric Indicates the cost of the unicast route to the source address.
If the R field value is set to 1, this field indicates the cost of the
unicast path to the RP.
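The R, Metric Preference, and Metric fields can be packed and unpacked as shown below. The sketch assumes the standard PIM Assert layout, in which R is the most significant bit of the 32-bit word that also holds a 31-bit Metric Preference, followed by a 32-bit Metric. The function names are illustrative.

```python
import struct


def pack_assert_metric(rpt_bit: bool, preference: int, metric: int) -> bytes:
    """Encode R (1 bit), Metric Preference (31 bits), and Metric (32 bits)."""
    if not 0 <= preference < 2**31:
        raise ValueError("preference must fit in 31 bits")
    first_word = (int(rpt_bit) << 31) | preference
    return struct.pack("!II", first_word, metric)


def unpack_assert_metric(data: bytes):
    """Decode the 8 bytes produced by pack_assert_metric."""
    first_word, metric = struct.unpack("!II", data[:8])
    return bool(first_word >> 31), first_word & 0x7FFFFFFF, metric


# An Assert for a (*, G) entry: R = 1, preference 10, cost 20 towards the RP.
encoded = pack_assert_metric(True, 10, 20)
decoded = unpack_assert_metric(encoded)
```

Because R occupies the top bit, comparing the first word numerically makes an (S, G) Assert (R = 0) compare lower than any (*, G) Assert (R = 1).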
Graft Message
NOTE
In the PIM-DM network, when the ATN receives a Report message from a host, the ATN
sends a Graft message through the upstream interface of the related (S, G) entry if the ATN is
not on the SPT. The upstream neighbor immediately restores the forwarding of the
downstream interface. If the upstream neighbor is not on the SPT, the neighbor continues to
send the Graft message to the upstream.
The source address of the IP packet encapsulated with the Graft message is the local interface
address and the destination address is the address of the RPF neighbor. The packet is sent in
unicast mode.
The format of the Graft message is the same as that of the Join/Prune message, as shown in
Table 9-16. Only some field values differ.
Joined Source Address Indicates the source address of the (S, G) to be grafted.
Graft-Ack Message
NOTE
In the PIM-DM network, when the ATN receives a Graft message from the downstream, the
ATN restores the forwarding of the related downstream interface. At the same time, the ATN
sends a Graft-Ack message through the downstream interface to notify that it has received the
Graft message.
If the ATN that sends out the Graft message does not receive a Graft-Ack message within the
specified time, the ATN considers that the upstream device did not receive the Graft message
and resends it.
The source address of the IP packet encapsulated with the Graft-Ack message is the downstream
interface address of the upstream device and the destination address is the address of the ATN
that sends out the Graft message. The packet is sent in unicast mode.
The format of the Graft-Ack message is the same as that of the Graft message, and the
Graft-Ack message copies some contents of the Graft message. Only some field values differ.
Upstream Neighbor Indicates the address of the ATN that sends out the Graft
Address message.
When the dynamic RP is used in the PIM-SM network, ATNs configured with C-RP
periodically send Advertisement messages to notify the BSR of the range of groups they want
to serve.
The source address of the IP packet encapsulated with the Advertisement message is the C-RP
address and the destination address is the BSR address. The packet is sent in unicast mode.
Version Type -
Prefix-Cnt Priority Holdtime
RP-Address
Group Address [ 1 ]
...
Group Address [ N ]
Field Description
State-Refresh Message
NOTE
In a PIM-DM network, to prevent a pruned interface from resuming forwarding because its prune
timer has expired, the first-hop router nearest to the source periodically triggers
State-Refresh messages. The State-Refresh message is flooded throughout the network and
refreshes the prune timers on all ATNs.
The source address of the IP packet encapsulated with the State-Refresh message is the
downstream interface address, the destination address is 224.0.0.13, and the TTL value is 1.
The packet is sent in multicast mode.
Version Type -
Multicast Group Address
Source Address
Originator Address
Metric Preference
Metric
Masklength TTL P - Interval
Field Description
Metric Preference Indicates the priority of the unicast route to the source.
Masklength Indicates the address mask length of the unicast route to the
source.
9.2.3 Applications
Figure 9-33 (diagram not reproduced; labels: Source, PIM-DM, RouterD, ATN B, ATN C, HostA, HostB)
Implementation Solution
On the network shown in Figure 9-33, Hosts A and B are multicast information receivers,
each located on a different leaf network. The hosts receive VoD information in multicast
mode. PIM-DM is used throughout the PIM domain. RouterD is connected to the multicast
source. ATN A is connected to Host A. ATNs B and C are connected to Host B.
Figure 9-34 shows a large-scale network with multicast services deployed. An IGP has been
deployed, and each network segment route is reachable. Group members are distributed
sparsely. Users on the network want VoD services, but network bandwidth resources are
limited.
Figure 9-34 (diagram not reproduced; labels: S1, S2, RouterB, RouterG, ATN C, ATN F, HostA, PIM-SM, Loopback0 on the C-BSR and C-RP devices)
Implementation Solution
As shown in Figure 9-34, Host A and Host B are multicast information receivers, each
located on a different leaf network. The hosts receive VoD information in multicast mode.
PIM-SM is used in the entire PIM domain. RouterB is connected to multicast source S1.
RouterA is connected to multicast source S2. ATN C is connected to Host A. ATNs E and F
are connected to Host B.
Avoid configuring static RPs on some devices and dynamic RPs on others in the same PIM
domain. This ensures that RP information is consistent throughout the PIM domain.
l IGMP runs between ATN C and Host A, between ATN E and Host B, and between ATN F
and Host B.
When configuring IGMP on device interfaces, ensure that interface parameters are
consistent. All devices connected to the same network must run the same version of
IGMP (IGMPv2 is recommended) and be configured with the same parameter values,
such as the interval at which IGMP Query messages are sent and the holdtime of
memberships. Otherwise, IGMP group memberships on different devices are
inconsistent.
l Hosts A and B send Join messages to the RP as required to obtain required information
from the multicast source.
NOTE
Configuring interfaces on network edge devices to statically join all multicast groups is
recommended to increase the speed for changing channels and to provide a stable viewing
environment for users.
Figure 9-35 (diagram not reproduced; labels: S1, S2, RouterA, RouterB, RouterD, RouterG, ATN C, ATN E, ATN F, HostA, HostB, PIM-SM)
Implementation Solution
On the network shown in Figure 9-35, Hosts A and B are multicast information receivers,
each located on a different leaf network. The hosts receive VoD information in multicast
mode. PIM-SSM is used throughout the PIM domain. RouterB is connected to multicast
source S1. RouterA is connected to multicast source S2. ATN C is connected to Host A.
ATNs E and F are connected to Host B.
Network configuration details are as follows:
l PIM-SSM is enabled on all device interfaces.
NOTE
A receiver in a PIM-SSM scenario can send a Join message directly to a specific multicast source.
A shortest path tree (SPT) is established between the multicast source and receiver. It is
unnecessary to maintain a Rendezvous Point (RP) on a PIM-SSM network.
l IGMP runs between ATN C and Host A, between ATN E and Host B, and between ATN
F and Host B.
When configuring IGMP on device interfaces, ensure that interface parameters are
consistent. All devices connected to the same network must run the same version of
IGMP (IGMPv3 is recommended) and be configured with the same interface parameter
values, such as the Query timer value and hold time of memberships. If the IGMP
versions or interface parameters are different, IGMP group memberships are inconsistent
on different devices.
l Host A can send Join messages to S1. Host B can send Join messages to S2. Information
sent by these multicast sources can reach corresponding user hosts.
NOTE
Configuring interfaces on network edge devices to statically join all multicast groups is
recommended to increase the speed for changing channels and to provide a stable viewing
environment for users.
Terms
Terms Definition
C-BSR Candidate-BSR
RP Rendezvous Point
C-RP Candidate-RP
9.3 IGMP
9.3.1 Introduction
Definition
In the TCP/IP protocol suite, the Internet Group Management Protocol (IGMP) manages IPv4
multicast members, and sets up and maintains multicast member relationships between IP
hosts and their directly connected multicast ATNs.
After IGMP is configured on hosts and their directly connected multicast ATNs, the hosts can
dynamically join multicast groups, and the multicast ATNs can manage multicast group
members on the local network.
IGMP has three versions: IGMPv1 (defined by RFC 1112), IGMPv2 (defined by RFC 2236),
and IGMPv3 (defined by RFC 3376). All the IGMP versions support any-source multicast
(ASM). IGMPv3 supports source-specific multicast (SSM), not requiring the SSM mapping
technique; however, IGMPv1 and IGMPv2 require the SSM mapping technique to support
SSM.
Purpose
IGMP allows receivers to access IP multicast networks, join multicast groups, and receive
multicast data from multicast sources. IGMP manages multicast group members by
exchanging IGMP messages between hosts and ATNs. In addition, IGMP records host join
and leave information on interfaces, ensuring correct multicast data forwarding on the
interfaces.
9.3.2 Principles
9.3.2.1 IGMPv1&v2&v3
[Figure: IGMP networking diagram with ATN A and ATN B between the ISP network and an
Ethernet segment]
IGMPv2 is capable of suppressing IGMP Report messages to reduce repetitive IGMP Report
messages. This function works as follows:
After a host (for example, Host A) joins a multicast group G, Host A receives an IGMP Query
message from the ATN. Then the host randomly selects a value from 0 to the maximum
response time (specified in the IGMP Query message) as the timer value. When the timer
expires, Host A sends an IGMP Report message of group G to the ATN. However, if Host A
receives an IGMP Report message of group G from another host in group G before the timer
expires, Host A does not send an IGMP Report message of group G to the ATN.
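The Report suppression behavior described above can be illustrated with a minimal sketch. It assumes that all members of group G share one network segment and hear each other's Report messages; the function names and host names are hypothetical and are not part of any device software.

```python
import random


def pick_timers(hosts, max_response_time, rng):
    """Each member host picks a random delay between 0 and the maximum response time."""
    return {host: rng.uniform(0, max_response_time) for host in hosts}


def simulate_query_response(timers):
    """Return (reporter, suppressed_hosts) for one group after a Query message.

    The host whose timer expires first sends the Report message. Every other
    member hears that Report before its own timer expires and stays silent.
    """
    reporter = min(timers, key=timers.get)
    suppressed = sorted(host for host in timers if host != reporter)
    return reporter, suppressed


# Hypothetical hosts in group G; the maximum response time is taken from the Query.
timers = pick_timers(["HostA", "HostB", "HostC"], max_response_time=10.0,
                     rng=random.Random(1))
reporter, suppressed = simulate_query_response(timers)
print(f"{reporter} reports; suppressed: {suppressed}")
```

Only one Report message per group reaches the ATN, which is why the ATN cannot tell from a later Leave message whether other members remain.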
When a host leaves group G, the host sends an IGMP Leave message of group G to the ATN.
With the Report message suppression function in IGMPv2, the ATN cannot determine
whether another host exists in group G. Therefore, the ATN triggers a query on group G. If
another host exists in group G, the host sends an IGMP Report message of G to the ATN.
If the ATN sends the query on group G a specified number of times but does not receive
an IGMP Report message for group G, the ATN deletes information about group G and stops
forwarding multicast data of group G.
NOTE
Both IGMP queriers and non-queriers can process IGMP Report messages, while only queriers can
forward IGMP Report messages. IGMP non-queriers cannot process IGMP Leave messages.
In IGMP group compatibility mode, a multicast device of a higher IGMP version can also be
compatible with the hosts of a lower IGMP version.
For example, the multicast device of the IGMPv2 version can correctly process the joining of
hosts in the IGMPv1 version; the multicast device of the IGMPv3 version can correctly
process the joining of hosts in the IGMPv1 or IGMPv2 version. When the multicast device
operates in IGMP group compatibility mode, and receives IGMP Report messages from the
hosts in a lower IGMP version, the multicast device automatically lowers the version of the
corresponding multicast group to be the same as that for the hosts and then operates in this
version.
For example, when the multicast device of IGMPv2 or IGMPv3 version receives Report
messages from the hosts in the IGMPv1 version, the multicast device lowers the version of
the corresponding multicast group to IGMPv1. Then, the multicast device ignores the
IGMPv2 Leave messages in the multicast group.
In addition, when the multicast device of the IGMPv3 version receives Report messages from
the hosts in the IGMPv2 version, the multicast device lowers the version of the corresponding
multicast group to IGMPv2. Then, the multicast device ignores the IGMPv3 BLOCK
messages, the IGMPv3 TO_IN messages, and the multicast source list in the IGMPv3 TO_EX
messages. The multicast source-selecting function of IGMPv3 messages is suppressed.
If a multicast device is reconfigured with a higher IGMP version, a multicast group that
operates in the original IGMP version still functions properly as long as the multicast
group contains hosts.
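The version-lowering rule of IGMP group compatibility mode can be sketched as follows. This is an illustration under stated assumptions, not the device implementation: the function names are hypothetical, and the table of ignored messages lists only the cases named in the text above.

```python
def group_version(device_version, host_report_versions):
    """Version in which the device operates for one group in compatibility mode.

    The device lowers the group version to the lowest version seen in the
    Report messages received from hosts in that group.
    """
    return min([device_version, *host_report_versions])


# Messages ignored once the group version has been lowered (from the text above).
IGNORED_MESSAGES = {
    1: {"IGMPv2 Leave"},
    2: {"IGMPv3 BLOCK", "IGMPv3 TO_IN"},
}


def is_ignored(group_ver, message_type):
    """Return True if the device ignores this message type for the group."""
    return message_type in IGNORED_MESSAGES.get(group_ver, set())


# An IGMPv3 device that hears an IGMPv2 Report operates the group in IGMPv2.
version = group_version(3, [3, 2])
print(version, is_ignored(version, "IGMPv3 BLOCK"))
```

For a group lowered to IGMPv2, the source list in IGMPv3 TO_EX messages is also ignored; the sketch does not model source lists.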
NOTE
The IGMP-enabled multicast device plays the following two roles on the network segment:
l Querier
The querier is responsible for sending IGMP Query messages and receiving IGMP
Report messages and Leave messages from hosts. In this way, the multicast device
knows which multicast group has receivers (multicast group members) on the relevant
network segment.
l Non-querier
The non-querier only receives IGMP Report messages from hosts, and learns which
multicast groups on the network segment have receivers. Then, according to the actions
of the querier on the network segment, the multicast device identifies which receivers
leave the network segment.
Generally, only one querier exists on a network segment. The querier is elected among the
multicast devices according to the following rules (ATN A, ATN B, and ATN C are used as
an example):
l After IGMP is enabled on ATN A, ATN A considers itself the querier of the network
segment during IGMP startup and sends IGMP Query messages on the network segment.
If ATN A receives an IGMP Query message from ATN B, which has a lower IP address,
ATN A changes from querier to non-querier, starts the another-querier-existing timer,
and records ATN B as the querier of the network segment.
l If ATN A in the non-querier state receives an IGMP Query message from the querier
ATN B, ATN A resets the another-querier-existing timer. If ATN A in the non-querier
state receives an IGMP Query message from ATN C, which has a lower IP address than
the querier ATN B, ATN A records ATN C as the querier and resets the
another-querier-existing timer.
l If the another-querier-existing timer expires while ATN A is in the non-querier state,
ATN A changes from non-querier to querier.
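The election rules above can be sketched as a small state holder. The class name, method names, and IP addresses are hypothetical, and the timer is reduced to a return value that tells the caller when to start or reset it.

```python
import ipaddress


class QuerierState:
    """Querier election state of one device on a network segment (sketch)."""

    def __init__(self, own_ip):
        self.own_ip = ipaddress.IPv4Address(own_ip)
        # After IGMP is enabled, the device considers itself the querier.
        self.querier_ip = self.own_ip

    @property
    def is_querier(self):
        return self.querier_ip == self.own_ip

    def on_query(self, source_ip):
        """Process a received Query; return True if the
        another-querier-existing timer must be started or reset."""
        src = ipaddress.IPv4Address(source_ip)
        if src < self.querier_ip:
            # A device with a lower IP address becomes the recorded querier.
            self.querier_ip = src
            return True
        if src == self.querier_ip and not self.is_querier:
            # Query from the current querier: only the timer is reset.
            return True
        return False

    def on_timer_expiry(self):
        """The recorded querier has gone silent: take over as querier."""
        self.querier_ip = self.own_ip


# Hypothetical addresses: ATN A = 10.0.0.3, ATN B = 10.0.0.2, ATN C = 10.0.0.1.
atn_a = QuerierState("10.0.0.3")
atn_a.on_query("10.0.0.2")   # ATN B wins, ATN A becomes non-querier
atn_a.on_query("10.0.0.1")   # ATN C has an even lower address
print(atn_a.querier_ip, atn_a.is_querier)
```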
NOTE
IGMPv1 does not support the querier election, and the querier in IGMPv1 is designated by the upper-
layer protocol, such as the Protocol Independent Multicast (PIM). At present, only the querier election
for multicast devices of the same network segment and same IGMP version is supported. Therefore, all
multicast devices on the same network segment must be configured with the same IGMP version.
Generally, packets are sent to and processed by the routing protocol only if the destination IP
address is the IP address of an interface on the device. In real applications, if the destination
IP address of protocol packets is a multicast address or a particular IP address, the packets
may not be sent to the routing protocol.
Therefore, Router-Alert is introduced as a mechanism for marking protocol packets. If a
packet contains the Router-Alert option, the packet must be sent to and processed by the
routing protocol.
The destination IP address of IGMP packets is usually a multicast address, and thus the IGMP
packets may not be sent to the routing protocol. In such a situation, the Router-Alert
option addresses the problem:
l By default, Router-Alert is not checked. The multicast device sends received IGMP
messages to the routing protocol regardless of whether these messages contain the
Router-Alert option.
l When Router-Alert is configured to be checked, only the IGMP messages containing the
Router-Alert option are sent to the routing protocol.
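The two checking modes can be sketched as follows. The sketch assumes the IPv4 Router Alert option encoding from RFC 2113 (option type 0x94, length 4) and operates on the raw options bytes of the IP header; the function names are hypothetical.

```python
ROUTER_ALERT = 0x94  # IPv4 Router Alert option type (RFC 2113)


def has_router_alert(options: bytes) -> bool:
    """Scan the IPv4 header options for the Router Alert option."""
    i = 0
    while i < len(options):
        opt = options[i]
        if opt == 0:          # End of Option List
            break
        if opt == 1:          # No-Operation is a single byte
            i += 1
            continue
        if i + 1 >= len(options) or options[i + 1] < 2:
            break             # malformed option, stop scanning
        if opt == ROUTER_ALERT:
            return True
        i += options[i + 1]   # skip to the next option
    return False


def accept_igmp(options: bytes, check_router_alert: bool = False) -> bool:
    """Decide whether an IGMP message is sent to the routing protocol."""
    return not check_router_alert or has_router_alert(options)


ra = bytes([0x94, 0x04, 0x00, 0x00])
print(accept_igmp(b""), accept_igmp(b"", check_router_alert=True),
      accept_igmp(ra, check_router_alert=True))
```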
IGMP Only-Link is a mechanism in which the interface that connects the multicast device
to hosts is enabled only with IGMP (and not with upper-layer protocols such as PIM), and
IGMP guides data forwarding on the corresponding network segment.
Compared with PIM-guided data forwarding on a network segment, IGMP Only-Link spares
the multicast device from maintaining information such as PIM neighbors and the state
machine of the PIM interface.
After IGMP Only-Link is used, the querier provides the following functions:
l By sending IGMP Query messages to hosts and receiving IGMP Report messages and
Leave messages from hosts, the querier can know which multicast group contains
receivers on the relevant network segment.
l The querier maintains the Join/Leave status of the IGMP multicast group, and guides
data forwarding on the relevant network segment according to the Join/Leave status.
The non-querier only maintains the Join/Leave status of the IGMP multicast group.
NOTE
If PIM is enabled on the interface, the designated router (DR) is responsible for guiding data forwarding.
For details, see the section "Basic Principles of PIM DR Election" in PIM-SM.
By sending IGMP Query messages to the connected host and receiving IGMP Report
messages and Leave messages from the host, a multicast device can learn which multicast
groups contain receivers on the relevant network segment. The device connected to the
multicast device, however, may not be a host, but an access device that is enabled with
IGMP proxy.
To reduce packet exchange between the multicast device and the access device, the
exchange can be optimized. After aggregating the Report/Leave status of an IGMP multicast
group, the access device reports the status to the multicast device only if the status of
the IGMP multicast group changes. In other words, the access device sends an IGMP Report
message to the multicast device only when the first member joins the multicast group, and
sends an IGMP Leave message to the multicast device only when the last member leaves the
multicast group. This is called IGMP On-Demand.
The multicast device enabled with IGMP On-Demand does not proactively send IGMP Query
messages to identify whether the IGMP multicast group contains receivers on the
network segment. Instead, it maintains the IGMP multicast group based on the Report/Leave
status of the multicast group aggregated by its connected access device (IGMP proxy).
IGMP On-Demand applies to IGMPv2 and IGMPv3 only. After a multicast device is
enabled with IGMP On-Demand, its IGMP implementation differs from the standard one
as follows:
l The multicast device does not proactively send IGMP Query messages.
l After the multicast device receives the IGMP Report message, the multicast device
creates the entry about the multicast group and multicast source, and the entry never
expires.
l The multicast device deletes the relevant entries only if it receives the IGMP Leave
message.
When a host leaves multicast group G, the host sends an IGMP Leave message for G to the
multicast device. Because of the Report suppression mechanism in IGMPv2, the multicast
device cannot determine whether G still contains other hosts. Therefore, the multicast
device triggers a query on G. If G still contains another host, that host sends an IGMP
Report message for G to the multicast device. If the multicast device sends the query on G
several times but receives no IGMP Report message from any host, the multicast device
deletes the information about G and stops forwarding the multicast data of G to the
relevant network segment.
If the multicast device is connected only to an access device that is enabled with IGMP proxy,
when the access device leaves a multicast group G and sends the IGMP Leave message of G
to the multicast device, the multicast device can determine that G contains no receivers and
therefore does not need to trigger an IGMP Query message. The multicast device can then
delete all records about G and stop forwarding data of G to the relevant network segment.
This is called IGMP Prompt-Leave.
After the multicast device is enabled with IGMP Prompt-Leave, the multicast device triggers
no IGMP Query messages for a multicast group when it receives the IGMP Leave message for
that group. Instead, the multicast device deletes all records about the multicast group and
stops forwarding the data of the multicast group to the relevant network segment. In this
manner, the multicast device responds faster to the IGMP Leave message.
NOTE
The IGMP On-Demand feature already includes the IGMP Prompt-Leave feature.
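The difference between standard leave processing and IGMP Prompt-Leave can be sketched as follows. The function name, the action strings, and the group address are hypothetical; the repeated group-specific queries are collapsed into a single flag that says whether any Report message came back.

```python
def handle_leave(entries, group, prompt_leave, report_after_query=False):
    """Process a Leave message for a group and return the list of actions taken.

    entries is the set of groups for which the device currently forwards data.
    """
    actions = []
    if not prompt_leave:
        # Standard behavior: ask whether other members remain in the group.
        actions.append("send group-specific query")
        if report_after_query:
            actions.append("keep entry")
            return actions
    # Prompt-Leave, or no Report received after the queries: remove the group.
    entries.discard(group)
    actions.append("delete entry and stop forwarding")
    return actions


entries = {"225.1.1.1"}
print(handle_leave(entries, "225.1.1.1", prompt_leave=True), entries)
```

With Prompt-Leave, the query step is skipped, which is safe only when a single IGMP proxy is attached to the segment.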
IGMP-Limit
[Figure: IGMP-limit networking diagram. ATN A (GE0/3/0 192.168.1.1/24, GE0/3/1
10.110.1.1/24), ATN B (GE0/3/0 192.168.2.1/24, GE0/3/1 10.110.2.1/24), and ATN C
(GE0/3/0 192.168.3.1/24, GE0/3/1 10.110.2.2/24) connect the PIM network to leaf
networks N1 and N2 over Ethernet, with receivers among HostA, HostB, HostC, and HostD]
When a large number of multicast users request multiple programs simultaneously, bandwidth
resources may be exhausted and the ATN's performance degraded, which deteriorates the
multicast service quality. To prevent this problem, use IGMP-limit to restrict
the maximum number of multicast groups on specific or all interfaces. This function enables
users who have successfully joined multicast groups to enjoy smoother multicast services.
IGMP-limit can be applied to a specific ATN interface, to a single instance, or to all
instances (the entire system). With IGMP-limit, when an IGMP Report message for a new
group arrives, the ATN first checks whether the number of multicast groups exceeds the
upper limit. If the upper limit is not exceeded, the ATN establishes a membership for the
group and forwards data of this group. However, if the upper limit is exceeded, the ATN
rejects the join request.
l IGMP-limit on an interface
– You can set an IGMP entry limit on an interface. Then, after the interface receives
an IGMP Join message, the interface can determine whether to create an entry
based on the IGMP entry limit.
– You can configure groups, including source-specific groups, to be free of the IGMP
entry limit.
l IGMP entry limit on the ATN
You can set the IGMP entry limit on the ATN. That is, you can limit the number of
IGMP entries on the interfaces belonging to all instances on the ATN.
– After an interface receives an IGMP Join message, the interface determines whether
to create an entry according to whether the number of the IGMP entries on the
whole ATN reaches the configured limit.
– When an interface deletes (*, G) and (S, G) entries, the interface decreases the
IGMP entries on the ATN correspondingly.
The preceding IGMP entry limit policies are subject to the following rules:
l A (*, G) entry or an (S, G) entry is counted as one entry.
l A (*, G) entry used in SSM mapping is counted as one entry; however, the (S, G) entry
mapped by the (*, G) entry is not counted as an entry.
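The counting rules above can be sketched as follows. This is an illustration, not the ATN implementation: the class and method names are hypothetical, and the sketch assumes that a join is rejected once the number of counted entries has reached the configured limit.

```python
class IgmpLimiter:
    """IGMP entry limit for one scope (interface, instance, or whole ATN)."""

    def __init__(self, limit, exempt_groups=()):
        self.limit = limit
        self.exempt = set(exempt_groups)   # groups free of the entry limit
        self.entries = set()               # counted (source or "*", group) entries

    def join(self, group, source="*"):
        """Return True if the entry is accepted, False if the join is rejected.

        For SSM mapping, callers pass only the (*, G) entry; the mapped
        (S, G) entry is not counted.
        """
        key = (source, group)
        if key in self.entries or group in self.exempt:
            return True                    # existing or exempt entry
        if len(self.entries) >= self.limit:
            return False                   # limit reached (assumed semantics)
        self.entries.add(key)              # each (*, G) or (S, G) counts as one
        return True

    def leave(self, group, source="*"):
        """Deleting an entry decreases the entry count."""
        self.entries.discard((source, group))


limiter = IgmpLimiter(limit=2, exempt_groups={"232.1.1.1"})
print(limiter.join("225.1.1.1"), limiter.join("225.1.1.2"), limiter.join("225.1.1.3"))
```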
Static-Group
[Figure: Static-Group networking diagram with Source1, Source2, ATN A, ATN B, User1, and
User2 on a PIM-DM or PIM-SM network]
data, the ATN can fast forward the multicast data. This step shortens the channel switchover
period.
IGMP Group-Policy
[Figure: IGMP Group-Policy networking diagram. ATN A (GE0/3/0 192.168.1.1/24, GE0/3/1
10.110.1.1/24), ATN B (GE0/3/0 192.168.2.1/24, GE0/3/1 10.110.2.1/24), and ATN C
(GE0/3/0 192.168.3.1/24, GE0/3/1 10.110.2.2/24) connect the PIM network to leaf
networks N1 and N2 over Ethernet, with receivers among HostA, HostB, HostC, and HostD]
Group-Policy refers to a filtering policy configured on the ATN interface. After Group-Policy
is configured, the ATN can set restrictions on certain multicast groups, and establish no
entries for these multicast groups.
When too many users watch multiple programs simultaneously, more ATN bandwidth is
consumed, which degrades ATN performance. To avoid this degradation, you can use
Group-Policy to set restrictions on certain multicast groups and limit the number of multicast
groups. In addition, for network security or easier management, you can use Group-Policy
to reject IGMP Report messages for certain multicast groups and prohibit forwarding data
of these multicast groups.
The Source-Specific Multicast mapping (SSM mapping) mechanism provides compatibility for
hosts running versions earlier than IGMPv3, and ensures that these hosts can also use
services in the SSM range. Specifically, the SSM mapping mechanism converts
the (*,G) of IGMPv1/v2 in the SSM range into the (S,G) according to the configured
conversion principle. In this manner, hosts of lower IGMP versions can also enjoy multicast
services in the SSM range.
In addition, the SSM mapping mechanism can better protect the multicast source server and
prevent attacks on the server.
NOTE
The multicast device does not process the (*,G) requirements, but only the (S,G) requirements from the
multicast group of the SSM range. For details of SSM, see Protocol Independent Multicast-Source-
Specific Multicast (PIM-SSM).
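[Figure 9-40: hosts on the user network segment of the SSM network sending IGMP Report
messages to the multicast device]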
As shown in Figure 9-40, in the user network segment of the SSM network, Host A runs
IGMPv3, Host B runs IGMPv2, and Host C runs IGMPv1. To provide SSM multicast services
for all hosts in the network segment without upgrading the IGMP versions of Host B and
Host C to IGMPv3, the multicast device needs to support SSM mapping.
If the multicast device supports SSM mapping, and is configured with the relevant conversion
principle, the multicast device performs either of the following after receiving the IGMP
Report messages (*,G) from Host B and Host C:
l If the multicast group of the messages indicates the ASM range, see the section 9.3.2.1
IGMPv1&v2&v3 for the processing method.
l If the multicast group of the messages indicates the SSM range, follow the SSM
mapping mechanism to convert the (*,G) of IGMPv1/v2 into the (S,G) according to the
configured conversion principle.
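The two cases above can be sketched as follows. The sketch assumes the default SSM range 232.0.0.0/8 (a device may be configured with a different range), and the mapping rules, source addresses, and function name are hypothetical examples.

```python
import ipaddress

SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")  # default SSM range (assumption)

# Hypothetical conversion rules: group range -> multicast sources.
MAPPING_RULES = [
    (ipaddress.ip_network("232.1.1.0/24"), ["10.1.1.1"]),
    (ipaddress.ip_network("232.1.0.0/16"), ["10.2.2.2"]),
]


def map_report(group):
    """Convert the (*, G) of an IGMPv1/v2 Report message into forwarding entries.

    Groups in the ASM range keep their (*, G) entry. Groups in the SSM range
    are converted into one (S, G) entry per source of every matching rule.
    """
    g = ipaddress.ip_address(group)
    if g not in SSM_RANGE:
        return [("*", group)]
    sources = []
    for network, rule_sources in MAPPING_RULES:
        if g in network:
            sources.extend(rule_sources)
    return [(source, group) for source in sources]


print(map_report("225.1.1.1"))
print(map_report("232.1.1.5"))
```

A Report message for an SSM-range group that matches no rule yields no entry in this sketch, which is consistent with the NOTE above that (*,G) requests in the SSM range are not processed.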
Figure 9-41 Networking diagram for various SSM mapping settings on various interfaces
[Figure 9-41 diagram: an ATN on the SSM network whose interfaces GE0, GE1, and GE2 connect
to HostA, HostB, and HostC]
For example, the network shown in Figure 9-41 provides SSM multicast services for all
interfaces connected to hosts.
l If all the interfaces require the same SSM multicast service, configure an entry
conversion principle in the IGMP view.
l If the interfaces require different SSM multicast services, configure an entry conversion
principle for each interface in the system view and enable the principle on the
corresponding interface.
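[Figure 9-42: ATN A and ATN B between the ISP network and an Ethernet segment; the
interface connected to the hosts has the address 10.0.0.1/24]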
Source address-based Internet Group Management Protocol (IGMP) message filtering enables
a multicast device's interface to filter IGMP messages based on the access control list (ACL)
configuration to protect a multicast device against attacks from user hosts. To ensure the
precision in multicast traffic sending, configure source address-based IGMP message filtering
on the multicast device's interface connected to user hosts. Different IGMP messages have
different source address-based filtering policies:
l IGMP Report or Leave messages
– If you have not specified an ACL rule:
n If the source address of an IGMP Report or Leave message and the IP address
of the receiving interface are on the same network segment, or the source address
of the IGMP Report or Leave message is 0.0.0.0, the message passes IGMP
source address filtering.
n If the source address of an IGMP Report or Leave message and the IP address
of the receiving interface are on different network segments, the IGMP source
address filtering fails and the IGMP Report or Leave message is discarded.
– If you have specified an ACL rule, the interface filters out the IGMP Report or
Leave messages whose source addresses do not match the ACL rule.
l IGMP Query messages: The interface filters out IGMP Query messages whose source
addresses do not match a specified ACL rule.
As shown in Figure 9-42, ATN A is connected to the hosts through the interface at
10.0.0.1/24. The source addresses of IGMP Report or Leave messages sent by Host A, Host
B, and Host C are 11.0.0.1, 10.0.0.8, and 0.0.0.0, respectively. If you have not specified an
ACL rule, the interface filters out the IGMP Report or Leave messages from Host A. If you
have specified an ACL rule, the interface filters out the IGMP messages whose source
addresses do not match the rule.
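The default filtering rule (no ACL rule specified) can be checked against the example above with a short sketch. The function name is hypothetical; the addresses are those given for Figure 9-42.

```python
import ipaddress


def passes_default_filter(source, interface):
    """Default source address filtering for IGMP Report or Leave messages.

    A message passes if its source address is 0.0.0.0 or is on the same
    network segment as the receiving interface.
    """
    iface = ipaddress.ip_interface(interface)
    src = ipaddress.ip_address(source)
    return src == ipaddress.ip_address("0.0.0.0") or src in iface.network


# Host A, Host B, and Host C from the example, received on 10.0.0.1/24.
for host, source in (("Host A", "11.0.0.1"), ("Host B", "10.0.0.8"), ("Host C", "0.0.0.0")):
    verdict = "passed" if passes_default_filter(source, "10.0.0.1/24") else "discarded"
    print(host, source, verdict)
```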
IGMPv1 has no IGMP Leave messages. | IGMPv2 provides IGMP Leave messages. | IGMPv2 can manage members of multicast groups effectively.
IGMPv1 provides only General Query messages. | IGMPv2 provides General Query messages and Group-specific Query messages. | The multicast group can be selected directly, and thus the selection is more precise.
The message contains the multicast group information, rather than the multicast source information. | A message contains not only the multicast group information, but also the multicast source information. | The multicast source can be selected directly, and thus the selection is more precise.
The IGMP Query message of a specified multicast group features no re-transmission mechanism. | The IGMP Query message of a specified multicast group and a specified multicast source features the re-transmission mechanism. | The multicast information maintained by the non-querier and querier can be kept consistent better.
[Figure: IGMP application networking diagram with Source1, Source2, ATN A, ATN B, User1,
and User2 on a PIM-SM network]
IGMP is the protocol responsible for adding hosts into the routing network. Therefore, IGMP
is applied to the area where the multicast device and host are connected. Note that IGMP can
be used for hosts and multicast devices of different versions.
The IGMP On-Demand and IGMP Prompt-Leave features are only applicable to the
scenario where only a single multicast device and a single access device are located on the
shared network segment.
Terms
Terms Description
IGMP The Internet Group Management Protocol (IGMP) refers to the signaling
mechanism between the host and multicast device on the leaf network of IP
multicast.
The host joins or leaves a multicast group by sending relevant IGMP messages;
the multicast device identifies whether the multicast group contains members on
the downstream network.
(S,G) (S,G) refers to a multicast routing entry. S indicates a multicast source, and G
indicates a multicast group.
After a multicast message with S as the source address and G as the group
address reaches the multicast device, it is forwarded through the downstream
interface of the (S, G) entry.
Usually, the multicast message is expressed as the (S, G) message.
(*,G) (*,G) refers to a PIM routing entry. * indicates any multicast source, and G
indicates a multicast group.
(*, G) is applicable to all multicast messages with the multicast group address as
G. That is, all the multicast messages sent to G are forwarded through the
downstream interface of the (*, G) entry, regardless of which multicast sources
send the multicast messages.
9.4.1 Introduction
Definition
Layer 2 multicast is used to transmit multicast data on the data link layer. On the network
shown in Figure 9-44, after Layer 2 multicast is configured on ATN B (a Layer 2 device),
ATN B listens to Internet Group Management Protocol (IGMP) packets exchanged between
Router A (a Layer 3 device) and hosts, and creates a Layer 2 multicast forwarding table. This
implements on-demand multicast data transmission and ensures proper deployment of
multicast services on the data link layer.
[Figure 9-44: Layer 2 multicast networking diagram; the legend contains the entries
"L2 Multicast" and "IGMP Protocol Packet"]
Purpose
The primary purpose of Layer 2 multicast is to reduce network bandwidth consumption. On
the network shown in Figure 9-44, after receiving multicast packets from Router A, ATN B at
the edge of the access network forwards these multicast packets to multicast receivers. If
Layer 2 multicast is not configured on ATN B, ATN B will broadcast received multicast data
packets in the broadcast domain to which the packets belong. All hosts including group
members and non-group members in the broadcast domain will receive the packets. This is
because ATN B does not know which interfaces are connected to receivers. This wastes
network bandwidth and adversely affects network security.
The problem of bandwidth waste can be addressed by configuring Layer 2 multicast on ATN
B. Layer 2 multicast enables ATN B to record the mappings between multicast group
addresses and relevant ports in the forwarding table. Instead of flooding multicast data
packets, ATN B can now forward these multicast data packets based on the forwarding table.
Upon receiving multicast packets, ATN B searches the forwarding table for downstream ports
based on group addresses of the multicast packets and forwards these multicast packets to
relevant users.
Functions
Layer 2 multicast has the following principal functions:
l IGMP snooping
Internet Group Management Protocol (IGMP) snooping provides a way to control
multicast traffic at Layer 2. By listening to IGMP packets exchanged between an
upstream device and hosts, IGMP snooping can set up Layer 2 multicast forwarding
tables to deliver traffic only to interfaces with at least one group member, significantly
reducing the volume of multicast traffic.
l Static Layer 2 multicast
A Layer 2 multicast forwarding table is manually configured in which interfaces and
multicast address entries are bound. This enables multicast data packets to be forwarded
to hosts that must steadily receive multicast data.
l Layer 2 Source-Specific Multicast (SSM) mapping
Layer 2 SSM mapping enables IGMPv2 hosts to enjoy IGMPv3 services.
l IGMP Snooping Proxy
An IGMP Snooping Proxy-enabled device has two roles. Acting as a host attached to an
upstream device, it responds rapidly to Query messages sent from the upstream device
and forwards Report and Leave messages sent from users to the upstream device. Acting
as an upstream device directly connected to hosts, it sends IGMP Query messages to the
hosts and processes IGMP Report messages sent by the hosts. This reduces the load on
the upstream device and saves network bandwidth.
l Multicast VLAN
After the multicast VLAN function is configured on a Layer 2 device, an upstream
device of the Layer 2 device sends multicast data only to a specific multicast VLAN. The
Layer 2 device replicates the multicast data to its other VLANs. This reduces bandwidth
consumption on the upper-layer network.
Benefits
Layer 2 multicast implements the on-demand multicast data distribution on the data link layer.
It provides the following benefits:
l Saves network bandwidth.
9.4.2 Principles
Principles
Layer 3 devices and attached hosts use IGMP to implement multicast data communications.
In IGMP, before a host joins a multicast group, it needs to send an IGMP Report message to
the upstream device to which it is directly connected. The upstream device can then send
multicast packets to the host. IGMP messages are encapsulated in IP packets (Layer 3
packets). A link layer device cannot, however, process Layer 3 information carried in
packets. In addition, the link layer device cannot learn any multicast MAC address by
learning the source MAC addresses of link layer data frames because the source MAC
addresses of the data frames cannot be multicast MAC addresses. When a link layer device
receives a data frame whose destination MAC address is a multicast MAC address, the device
cannot find a matching entry in its MAC address table. Consequently, the link layer device
broadcasts all
multicast packets it receives. This wastes bandwidth resources and poses a threat to network
security.
IGMP snooping is a basic Layer 2 multicast function, and is used to control multicast traffic at
Layer 2. A Layer 2 device that runs IGMP snooping listens to and analyzes IGMP messages
exchanged between a Layer 3 device and hosts. The Layer 2 device sets up a Layer 2
forwarding table based on these messages and uses this table to forward data packets.
Figure 9-45 shows a network on which ATN B functions as a Layer 2 device.
l If ATN B does not run IGMP snooping, multicast data is broadcast at the data link layer.
l If ATN B runs IGMP snooping, it does not broadcast the multicast data for group
225.0.0.1. Instead, it uses the Layer 2 multicast forwarding table to send the multicast
data only through Port1 and Port2 to the group members. Ports on which ATN B has
received no Report messages for this group do not receive the data.
Figure 9-45 Multicast packet transmission before and after IGMP snooping is configured on a
Layer 2 device
[Figure 9-45 diagram: Source, RouterA (PIM), and ATN B with Port1, Port2, and Port3, shown
without and with IGMP snooping. Forwarding table with IGMP snooping: 225.0.0.1 -> Port1;
225.0.0.1 -> Port2]
Related Concepts
Figure 9-46 is used to introduce concepts related to IGMP snooping.
[Figure 9-46: networking diagram with a Source, the Internet/intranet, ATN A, and ATN B]
l Router port: a port (labeled with a circle in Figure 9-46) connecting a link layer
multicast device to an upstream multicast router.
Router ports are either dynamic or static. Dynamic router ports are discovered by
protocols. Static router ports are manually configured.
l Member port of a multicast group: a port (labeled with a square in Figure 9-46)
connecting a link layer multicast device to a group member host. The link layer multicast
device uses the member port of a multicast group to send multicast packets to a host. A
member port of a multicast group is called a member port for short.
Member ports are either dynamic or static. Dynamic member ports are discovered by
protocols. Static member ports are manually configured.
l Layer 2 multicast forwarding entry: the basis for multicast data forwarding. Devices at
the link layer use entries in the multicast forwarding table to forward multicast packets
sent by an upstream device to receivers. An entry in a Layer 2 multicast forwarding table
contains the following information:
– VLAN ID or VSI name
– Multicast group address
– Router port (upstream port)
– Member port list (downstream port list)
l Multicast MAC address: a MAC address mapped from the multicast IP address contained in a
multicast data packet. Multicast MAC addresses are used to transmit multicast data packets
at the data link layer. As defined by the Internet Assigned Numbers Authority (IANA), the
24 most significant bits of a multicast MAC address are 0x01005e, the 25th bit is 0, and
the 23 least significant bits are the same as the 23 least significant bits of the
multicast IP address.
Figure 9-47 shows the mapping between a multicast IP address and a multicast MAC address.
For example, if the IP address of a multicast group is 224.0.1.1, the MAC address of this
multicast group is 01-00-5e-00-01-01. Information about 5 bits of the IP address is lost,
because only 23 bits of the 28 least significant bits of the IP address are mapped to the MAC
address. As a result, 32 IPv4 multicast addresses are mapped to the same MAC address. For
example, IP multicast addresses 224.0.1.1, 224.128.1.1, 225.0.1.1, and 239.128.1.1 all
correspond to multicast MAC address 01-00-5e-00-01-01.
Figure 9-47 Mapping between an IP multicast address and a multicast MAC address
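[Figure 9-47 content: in the 32-bit IP address 1110 XXXX X XXXXXXX XXXXXXXX XXXXXXXX, the
5 bits that follow the leading 1110 are lost; the 23 least significant bits are copied into
the multicast MAC address]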
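The mapping from a multicast IP address to a multicast MAC address can be reproduced with a short sketch that uses the addresses from the example above. The function name is hypothetical.

```python
import ipaddress


def multicast_mac(group):
    """Map an IPv4 multicast address to its multicast MAC address.

    The MAC address consists of the fixed prefix 0x01005e, a 0 bit, and the
    23 least significant bits of the IP address.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # keep the 23 least significant bits
    mac = (0x01005E << 24) | low23        # 25th bit stays 0
    return "-".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


for group in ("224.0.1.1", "224.128.1.1", "225.0.1.1", "239.128.1.1"):
    print(group, multicast_mac(group))
```

All four addresses produce the same MAC address because they differ only in the 5 bits that are not mapped.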
Implementation
The process for implementing IGMP snooping is as follows:
1. IGMP snooping analyzes IGMP messages exchanged between hosts and a Layer 3
device and sets up a Layer 2 multicast forwarding table on the basis of this analysis.
Forwarding table entries contain VLAN IDs or VSI names, multicast source addresses,
multicast group addresses, and downstream port lists.
– After receiving an IGMP Query message from an upstream device, IGMP snooping
sets the network-side port as a dynamic router port.
– After receiving an IGMP Report message from a downstream device or a user,
IGMP snooping sets the user-side port as a dynamic member port.
2. When multicast data traffic passes through a Layer 2 device, the Layer 2 device forwards
the multicast data traffic based on its Layer 2 multicast forwarding table.
NOTE
When multiple router ports exist (for example, in a dual-homing scenario) and one of them receives
multicast traffic, the Layer 2 device forwards the traffic to all the other router ports while forwarding the
traffic to users based on the Layer 2 multicast forwarding table.
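The table-building and forwarding behavior described above can be sketched as follows. This is a simplified model, not the device implementation: the class name, method names, and port names are assumptions for illustration, and entry aging and static ports are not modeled.

```python
from collections import defaultdict

class SnoopingTable:
    """Minimal model of a Layer 2 multicast forwarding table."""

    def __init__(self):
        self.router_ports = defaultdict(set)   # VLAN ID -> router ports
        self.member_ports = defaultdict(set)   # (VLAN ID, group) -> member ports

    def on_query(self, vlan, port):
        # An IGMP Query from upstream marks the port as a dynamic router port.
        self.router_ports[vlan].add(port)

    def on_report(self, vlan, group, port):
        # An IGMP Report from downstream marks the port as a dynamic member port.
        self.member_ports[(vlan, group)].add(port)

    def egress_ports(self, vlan, group, in_port):
        # Forward to member ports and to all other router ports (see the NOTE).
        members = self.member_ports.get((vlan, group), set())
        routers = self.router_ports.get(vlan, set())
        return (members | routers) - {in_port}

table = SnoopingTable()
table.on_query(10, "ge0/1")                 # hypothetical port names
table.on_query(10, "ge0/2")
table.on_report(10, "225.0.0.1", "ge0/5")
print(table.egress_ports(10, "225.0.0.1", "ge0/1"))
```

In this dual-homing example, traffic that arrives on router port ge0/1 is sent to the other router port ge0/2 and to member port ge0/5.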
Other Functions
IGMP snooping also supports the following functions:
Deployment Scenarios
IGMP snooping can be used on VLANs and VPLS networks.
Benefits
IGMP snooping enabled on the ATN connected to a user network segment provides the
following benefits:
Principles
Multicast data can be transmitted to user terminals over an IP bearer network in either
dynamic or static multicast mode.
l In dynamic multicast mode, a device receives and delivers the data for a channel
(multicast group) only after it receives a Report message for the channel from the first
user. The device stops receiving data for the channel after it receives the Leave message
from the last member. The dynamic multicast mode has both an advantage and a
disadvantage:
– Advantage: It reduces bandwidth consumption by reducing multicast traffic.
– Disadvantage: It brings in a delay when a user switches a channel.
l In static multicast mode, multicast forwarding entries are configured for each channel
(multicast group) on a device. Multicast traffic for each channel is delivered to the
device, regardless of whether there are users attached to the device. The static multicast
mode has the following advantages and disadvantages:
– Advantages:
n Multicast routes are fixed, and multicast paths exist regardless of whether there
are multicast data receivers. Users can change channels without delays and the
quality of user experience is good.
n Multicast source and group ranges are easy to manage because multicast paths
are stable.
n The delay when data is first forwarded is minimal because static routes already
exist and do not need to be established as dynamic multicast routes do.
– Disadvantages:
n Each device on a multicast data transmission path must be manually
configured. The device configuration load is heavy.
n Sub-optimal multicast forwarding paths may be generated because
downstream ports must be specified in advance on each device.
n When there are changes in the network topology or unicast routes, static
multicast paths may need to be reconfigured. The workload is heavy.
n Multicast routes exist even when no multicast data needs to be forwarded. This
wastes network resources and creates high bandwidth requirements.
A Layer 2 multicast forwarding table can be built dynamically with IGMP snooping or can be
manually configured. Network quality requirements or the kinds of services demanded by
users can be the basis for determining whether to use dynamic or static multicast mode on
network devices.
If network bandwidth is sufficient and hosts need to receive multicast data for specific
multicast groups from a router port for a long period of time, static Layer 2 multicast can be
used to implement stable multicast data transmission on a MAN or bearer network.
Related Concepts
Static router or member ports are used in static Layer 2 multicast.
l Static member ports are used to send data for specific multicast groups.
Deployment Scenarios
Static Layer 2 multicast is used on VLANs and VPLS networks.
Benefits
After static Layer 2 multicast is deployed on a device, multicast entries on the device do not
age and users attached to the device can regularly receive multicast data for specific multicast
groups. Static Layer 2 multicast provides the following benefits:
l Simplifies network management.
l Reduces network delays.
l Protects unregistered users from being attacked by multicast data and protocol packets,
improving information security.
Principles
IGMPv3 supports Source Specific Multicast (SSM). While many multicast devices currently
support IGMPv3, most old multicast terminals support IGMPv1 or IGMPv2. The SSM
mapping mechanism enables devices running IGMPv3 to provide SSM services for hosts
running IGMPv1 or IGMPv2. The SSM mapping mechanism maps IGMPv1 or IGMPv2 (*,
G) messages in which group addresses are within an SSM group address range into IGMPv3
(S, G) messages. This enables hosts running IGMPv1 or IGMPv2 to obtain SSM services.
The SSM mapping mechanism effectively protects multicast sources from being attacked.
Layer 2 SSM mapping is used to implement SSM mapping on Layer 2 networks. In the
networking shown in Figure 9-48, the Layer 3 device runs IGMPv3 and is directly connected
to a Layer 2 device. NodeB-A runs IGMPv3, NodeB-B runs IGMPv2, and NodeB-C runs
IGMPv1 on the Layer 2 network. If the IGMP versions of NodeB-B and NodeB-C cannot be
upgraded to IGMPv3, Layer 2 SSM mapping needs to be configured on the Layer 2 device to
provide SSM services for all hosts on the network segment.
Implementation
If SSM mapping is configured on a multicast device and mappings between group addresses
and source addresses are configured, the multicast device will perform the following actions
after receiving a (*, G) message from NodeB-B or NodeB-C:
l If the multicast group address contained in the message is out of the SSM group address
range, the device processes the message in the same manner as it processes an
IGMPv1/v2 message.
l If the multicast group address contained in the message is within the SSM group address
range, the device maps the (*, G) Report message into (S, G) Report messages based
on mapping rules.
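The two cases above can be sketched as a small lookup. This is a minimal sketch under stated assumptions: the SSM group range is taken to be the IANA default 232.0.0.0/8 (the range on a device may be configured differently), and the mapping table and source addresses are hypothetical. The source text does not say what happens when a group is in the SSM range but has no mapping rule; the sketch returns an empty list in that case.

```python
import ipaddress

SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")          # assumed SSM group range
SSM_MAPPING = {"232.1.1.1": ["10.1.1.1", "10.1.1.2"]}    # hypothetical mapping rules

def map_report(group):
    """Return the (source, group) pairs produced from an IGMPv1/v2 (*, G) Report."""
    if ipaddress.ip_address(group) not in SSM_RANGE:
        # Outside the SSM range: handled as an ordinary IGMPv1/v2 (*, G) message.
        return [("*", group)]
    # Inside the SSM range: one (S, G) per configured source for this group.
    return [(source, group) for source in SSM_MAPPING.get(group, [])]

print(map_report("225.0.0.1"))
print(map_report("232.1.1.1"))
```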
Deployment Scenarios
Layer 2 SSM mapping is used on VLANs and VPLS networks.
Benefits
Layer 2 SSM mapping deployed on Layer 2 devices provides the following benefits:
l Enables IGMPv1/v2 terminal users to enjoy SSM services.
l Better protects multicast sources from being attacked.
Principles
On the network shown in Figure 9-49, forwarding entries are set up based on IGMP messages
exchanged between the PE (a Layer 3 device) and user hosts. If there are many user hosts,
redundant IGMP messages increase the work load of the PE.
IGMP Snooping Proxy can be deployed on the CE (a Layer 2 device) connecting the PE and
hosts to address this problem by terminating IGMP messages. The CE where IGMP Snooping
Proxy is configured has the following functions:
l Periodically sends Query messages to the hosts and receives Report and Leave messages
from the hosts.
l Maintains group member relationships.
l Sends Report and Leave messages to the PE.
l Forwards multicast traffic only to those hosts that require it.
After IGMP Snooping Proxy is deployed on the CE, the PE believes that it is interacting with
only one user. The CE interacts with the upstream PE and downstream user hosts, and is not
completely transparent.
Implementation
A device that runs IGMP Snooping Proxy establishes and maintains a multicast forwarding
table and sends multicast data to users based on this table. The process for implementing
IGMP Snooping Proxy is as follows:
l An IGMP Snooping Proxy-enabled device sends Query messages to query members of
multicast groups. The querier function must be enabled on the device if its upstream
device does not send IGMP Query messages or if static multicast groups are
configured on the upstream device.
l The IGMP Snooping Proxy-enabled device suppresses Report and Leave messages if
large numbers of users frequently join or leave multicast groups. This reduces message
processing pressure on the upstream device.
– When receiving the first Report message for a multicast group from a user host, the
device checks whether there is an entry for this group. If no entry exists, the device
sends the Report message to its upstream device and also creates an entry for this
group. If the entry exists, the device adds the host to the multicast group and does
not send any Report messages to its upstream device.
– After receiving a Leave message for a group from a user host, the device sends a
group-specific query message to check whether there are any members of this
group. If there are other members of this group, the device deletes the user from the
group. If there are no other members of this group, the device considers the user as
the last member of the group and sends a Leave message to its upstream device.
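The Report and Leave suppression described above can be sketched as follows. This is a simplified model: the class and method names are assumptions for illustration, and the group-specific query sent after a Leave message is represented only by checking the remaining members in the local table.

```python
class SnoopingProxy:
    """Minimal model of Report/Leave suppression toward the upstream device."""

    def __init__(self):
        self.members = {}   # group -> set of member hosts

    def on_report(self, group, host):
        """Return "report" if a Report must be sent upstream, else None."""
        first = group not in self.members
        self.members.setdefault(group, set()).add(host)
        return "report" if first else None

    def on_leave(self, group, host):
        """Return "leave" if a Leave must be sent upstream, else None."""
        hosts = self.members.get(group)
        if not hosts or host not in hosts:
            return None
        hosts.discard(host)
        if hosts:                      # other members remain: suppress the Leave
            return None
        del self.members[group]        # last member left: notify the upstream device
        return "leave"

proxy = SnoopingProxy()
print(proxy.on_report("225.0.0.1", "host1"))   # first member of the group
print(proxy.on_report("225.0.0.1", "host2"))   # suppressed
print(proxy.on_leave("225.0.0.1", "host1"))    # suppressed
print(proxy.on_leave("225.0.0.1", "host2"))    # last member
```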
Deployment Scenarios
IGMP Snooping Proxy is used on VLANs and VPLS networks.
Benefits
IGMP Snooping Proxy deployed on a Layer 2 ATN connected to a user network segment
provides the following benefits:
l Reduces bandwidth consumption by reducing IGMP message exchanges.
l Reduces the load of a directly connected Layer 3 device by processing protocol
messages received from downstream hosts and maintaining group memberships.
Principles
As shown in Figure 9-50, in traditional multicast on-demand mode, bandwidth is wasted and
extra loads are borne by both the Layer 3 PE and Layer 2 CE if users in different VLANs
(VLAN 11 and VLAN 22) need to receive multicast data from the same source through the
same device. The Layer 3 PE must send one copy of the multicast data for each VLAN to the
Layer 2 CE and the Layer 2 CE must send a copy of the multicast data to each user.
The multicast VLAN function can be used to address this problem. Based on IGMP snooping,
the multicast VLAN function implements multicast replication across broadcast domains on
Layer 2 devices. After the multicast VLAN function is enabled on the CE, the PE connected
to the CE sends one copy of multicast traffic only to VLAN 3 (multicast VLAN) of the CE.
The multicast data is replicated on the CE and copies are sent to VLAN 11 and VLAN 22.
The PE no longer needs to send more than one identical multicast data flow downstream. This
saves network bandwidth and reduces the load on the PE.
Figure 9-50 Comparison of networks with and without the multicast VLAN function
The multicast VLAN function generally must be used together with IGMP Snooping Proxy
for the following reasons:
l On the network shown in Figure 9-50, if IGMP Snooping Proxy is not enabled on
VLAN 3 and users in different VLANs want to join the same group, the CE needs to
forward an IGMP Report message from each user to the PE. Similarly, if users in
different VLANs want to leave the same group, the CE also needs to forward an IGMP
Leave message from each user to the PE.
l After IGMP Snooping Proxy is enabled on VLAN 3, if users in different VLANs want to
join the same group, the CE needs to send only one IGMP Report message to the PE. If
the last member of the group leaves, the CE sends an IGMP Leave message to the PE.
This reduces network-side bandwidth consumption on the CE and performance pressure
on the PE.
Related Concepts
The following concepts are involved in the multicast VLAN function:
l Multicast VLAN: is a VLAN to which the interface connected to a multicast source
belongs. A multicast VLAN is used to aggregate multicast flows.
l User VLAN: is a VLAN to which a group member host belongs. A user VLAN is used
to receive multicast flows from a multicast VLAN.
One multicast VLAN can be bound to multiple user VLANs.
After the multicast VLAN function is configured on a device, the device receives multicast
traffic through the multicast VLANs and sends the multicast traffic to users through user
VLANs.
Implementation
The multicast VLAN implementation process can be divided into two parts:
Deployment Scenarios
The multicast VLAN function is used on VLANs.
Benefits
The multicast VLAN function moves the multicast replication point downstream to edge
devices so that multicast data can be transmitted in different VLANs. The multicast VLAN
function provides the following benefits:
l Reduces bandwidth consumption.
l Reduces the loads of Layer 3 devices.
l Facilitates management of multicast sources and multicast group members.
Principles
In traditional multicast on-demand mode, if users in different VLANs or VPLS networks need
to receive the multicast data from the same source through a device, the upstream device of
this device must send several identical multicast data flows downstream. This wastes the
bandwidth and imposes extra processing burdens on the upstream device.
One or more Layer 2 multicast instances can be deployed on the Layer 2 network to address
this problem. The Layer 2 multicast instance function enhances the multicast VLAN function.
This function implements multicast data replication across VLANs and VPLS networks, and
limits the multicast group range in different instances. This saves bandwidth resources and
simplifies multicast group management. In the networking shown in Figure 9-51, if users in
VLAN 11 and VLAN 22 require the multicast data for channels in the range of 225.0.0.1 to
225.0.0.5, Layer 2 multicast instances can be deployed on the CE. After the PE sends one
copy of the multicast data traffic through VLAN 3, the CE replicates the multicast data and
sends a copy to each VLAN. This reduces bandwidth consumption.
Figure 9-51 (IP core; PE; VLAN 3 carrying 225.0.0.1 to 225.0.0.5; CE; VLAN 11 and VLAN 22)
Multicast users are allowed to receive multicast data traffic across different types of networks.
This facilitates flexible deployment of multicast services and satisfies the requirements of
different types of networking. For example, users are allowed to receive multicast data traffic
across VPLS networks and VLANs.
Related Concepts
The following concepts are involved in the Layer 2 multicast instance function:
l Multicast instance: is the instance to which the interface connected to the multicast
source belongs. A multicast instance is used to aggregate multicast flows.
l User instance: is the instance to which a group member host belongs. A user instance is
used to receive multicast flows from a multicast instance.
One multicast instance can be bound to multiple user instances.
l Channel: is a series of multicast groups. To facilitate program management, content
providers operate different types of channels in different Layer 2 multicast instances.
Channels need to be configured in the Layer 2 multicast instances.
Implementation
The Layer 2 multicast instance implementation is similar to the multicast VLAN
implementation. After receiving a multicast data packet from an upstream device, a Layer 2
device searches the multicast forwarding table for a matching entry based on the multicast
instance ID and the destination address (multicast group address) contained in the packet. If a
matching forwarding entry exists, the Layer 2 device will identify the downstream interfaces
and their VLAN IDs or VSI names, replicate the multicast data packet on each downstream
interface, and send a copy of the packet to user instances. If no matching forwarding entry exists,
the Layer 2 device will broadcast the multicast data packets in the local multicast VLAN or
VSI.
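The lookup described above can be sketched as follows. This is an illustrative model only: the table layout, the function name replicate, and the instance and port names are assumptions, not the device's data structures.

```python
def replicate(table, instance_id, group, local_ports):
    """Return the (instance, port) pairs that receive a copy of the packet."""
    entry = table.get((instance_id, group))
    if entry is None:
        # No matching entry: broadcast in the local multicast VLAN or VSI.
        return [(instance_id, port) for port in local_ports]
    # Matching entry: one copy per downstream interface and user instance.
    return list(entry)

# Hypothetical entry: multicast instance "vlan3" feeds user VLANs 11 and 22.
table = {("vlan3", "225.0.0.1"): [("vlan11", "ge0/1"), ("vlan22", "ge0/2")]}

print(replicate(table, "vlan3", "225.0.0.1", ["ge0/1", "ge0/2", "ge0/3"]))
print(replicate(table, "vlan3", "225.0.0.9", ["ge0/3"]))
```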
Deployment Scenarios
The Layer 2 multicast instance function is used on VLANs and VPLS networks.
Benefits
The Layer 2 multicast instance function provides the following benefits:
l Reduces bandwidth consumption.
l Ensures network security.
l Separates the unicast and multicast domains and prevents the traffic of a particular user
from affecting other users or the network as a whole.
9.4.3 Applications
Given the characteristics of IPTV, multicast technologies can be used to bear IPTV services.
Unlike traditional unicast, multicast does not require more network bandwidth as the number
of users increases. It reduces loads on video servers and the bearer network. If service
providers want to deploy IPTV services quickly and economically, E2E multicast push is
recommended.
Network Description
Currently, an IP MAN consists of a metro backbone network and a broadband access network.
IPTV service traffic is pushed through the metro backbone network to the broadband access
network and finally to user terminals. Figure 9-52 shows an E2E IPTV service push model.
The metro backbone network is made up primarily of network layer (Layer 3) devices. PIM,
such as PIM-SM, is used on metro backbone devices to connect to the multicast source.
Devices directly connected to the broadband access network use IGMP to forward multicast
packets to user terminals. The broadband access network consists of data link layer (Layer 2)
devices. Layer 2 devices can use Layer 2 multicast technologies such as IGMP Snooping
Proxy or IGMP snooping to forward multicast packets to terminal users. Multicast
technology ensures that only one copy of multicast data is transmitted on the metro
backbone and broadband access networks, greatly reducing bandwidth consumption.
Figure 9-52 E2E IPTV service push model (elements: Server, metro IP/MPLS backbone network,
PIM/IGMP, SR, BSR, IGMP snooping/IGMP proxy/multicast VLAN, CE1, CE2, multicast packets)
The following section describes Layer 2 multicast features used on the broadband access
network.
PW Pseudo Wire
9.5 MSDP
Purpose
A network composed of multiple PIM-SM devices is called the PIM-SM network. A large
PIM-SM network may be maintained by multiple Internet Service Providers (ISPs).
PIM-SM domains are isolated by Rendezvous Points (RPs). The multicast source can only
register to the local RP, and hosts can only send the Join message to the local RP. As a result,
the RP only recognizes the local multicast source and distributes the data from the multicast
source to the local users.
A PIM-SM network depends on RPs to forward multicast data. To implement load balancing
among RPs, enhance network reliability, and facilitate management, you can group multiple
RPs into different domains on the PIM-SM network. Each domain is called a PIM-SM
domain.
After a PIM-SM network is divided into multiple PIM-SM domains, RPs in different domains
cannot communicate with each other. To implement the communication between PIM-SM
domains, MSDP is introduced.
NOTE
A PIM-SM domain can be considered the service scope of an RP, and different PIM-SM domains can be
divided by the BootStrap Router (BSR) boundary or by configuring different static RPs on different
ATNs.
9.5.2 Principles
l Establish MSDP peer relationships between RPs in the same autonomous system (AS)
but of different PIM-SM domains.
l Establish MSDP peer relationships between RPs in different ASs.
NOTICE
To ensure successful Reverse Path Forwarding (RPF) checks in an inter-AS scenario, a
BGP or a Multicast Border Gateway Protocol (MBGP) peer relationship must be
established on the same interfaces as the MSDP peer relationship.
NOTE
For details of MBGP, refer to the chapter "MBGP Configuration" in the Configuration Guide - IP
Multicast.
Basic Principle
Setting up the MSDP peer relationships between RPs in different PIM-SM domains ensures
communications between MSDP peers (RPs). This procedure forms an MSDP-connected
graph.
MSDP peers then exchange Source Active (SA) messages. An SA message carries the (S, G)
information that the source DR has registered with its RP. SA messages are relayed among
MSDP peers so that an SA message sent by one RP can be received by all the other RPs.
As shown in Figure 9-53, the PIM-SM network is divided into four PIM-SM domains. The
multicast source of PIM-SM1 domain (Source) sends data to the multicast group G. Receiver
in the PIM-SM3 domain, as a member of G, maintains an RP-rooted Shared Tree (RPT) of G
with RP3.
Figure 9-53 (elements: Source, DR1, RP1, RP2, RP3, DR3, Receiver; domains PIM-SM 1 to
PIM-SM 4; legend: MSDP peers, multicast packet, Register, SA message, Join)
As shown in Figure 9-53, Receiver can receive the multicast data sent by Source after the
MSDP peer relationships between RP1, RP2, and RP3 are set up.
1. Source sends multicast data to G. DR1 (Designated Router) then encapsulates the data
into the Register message and sends the message to RP1. As the RP of the multicast
source, RP1 creates an SA message, which carries the IP addresses of the multicast
source, multicast group G, and RP1, and sends the SA message to the peer RP2.
2. After RP2 receives the SA message, it performs an RPF check on the message. If the
check succeeds, RP2 forwards the message to RP3.
3. After RP3 receives the SA message, it performs an RPF check on the message, and the
check succeeds. RP3 has a (*, G) entry, which indicates that its domain contains a member of G.
4. RP3 creates an (S, G) entry and sends a Join message with the (S, G) information to
Source hop by hop. A multicast path (source tree) from Source to RP3 is set up. After the
multicast data reaches RP3 along the source tree, RP3 forwards it to Receiver along the
RPT.
5. After Receiver receives the multicast data, it determines whether to initiate the SPT
switchover.
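The way an RP reacts to an SA message in steps 2 to 4 can be sketched as follows. This is a simplified model: the class names, the string results, and the example addresses are assumptions for illustration, and the RPF check is passed in as a precomputed result. Forwarding of the SA message to further peers by RP3 is not modeled.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SourceActive:
    source: str      # multicast source address
    group: str       # multicast group G
    origin_rp: str   # RP that created the SA message

@dataclass
class RendezvousPoint:
    name: str
    star_g: set = field(default_factory=set)   # groups with a (*, G) entry
    s_g: set = field(default_factory=set)      # (S, G) entries

    def on_sa(self, sa, rpf_ok):
        if not rpf_ok:
            return "discard"                   # failed RPF check
        if sa.group in self.star_g:
            # Local members exist: create (S, G) and join toward the source.
            self.s_g.add((sa.source, sa.group))
            return "join"
        return "forward"                       # no local members: relay the SA only

sa = SourceActive("10.1.1.1", "225.1.1.1", "RP1")
rp2 = RendezvousPoint("RP2")
rp3 = RendezvousPoint("RP3", star_g={"225.1.1.1"})
print(rp2.on_sa(sa, rpf_ok=True), rp3.on_sa(sa, rpf_ok=True))
```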
Applicable Environment
In a traditional PIM-SM domain, each multicast group is mapped to only one Rendezvous
Point (RP). When the network is overloaded or traffic is heavy, problems can occur, such as
an overloaded RP, slow convergence after the RP fails, and non-optimal multicast
forwarding paths.
Therefore, anycast RP is introduced in MSDP. After anycast RP is enabled in MSDP, multiple
RPs can be configured with the same loopback address in a PIM-SM domain, and MSDP peer
relationships are established between these RPs. As a result, the path destined for the RP is
optimal, and load balancing is implemented among RPs.
To sum up, anycast RP can properly address the problem of heavy loading on a single RP in a
PIM-SM domain, which is caused by the convergence of all multicast source information and
multicast join information on the RP. Meanwhile, anycast RP ensures the path destined for an
RP is optimal because the receiver and multicast source join and register to the nearest RP.
Principles
As shown in Figure 9-54, in the PIM-SM domain, the multicast sources, S1 and S2, send
multicast data to the multicast group G that contains multicast members, U1 and U2.
messages. MSDP strictly controls inbound SA messages and directly discards any
SA message that does not comply with the RPF rules.
MSDP has the following RPF rules:
l Rule 1: If the peer that sends the SA message is the source Rendezvous Point (RP), the
SA message is received and forwarded to other peers.
l Rule 2: The SA message sent by the static RPF peer is received. A device can set up an
MSDP peer relationship with multiple devices. Users can select one or multiple peers
from these remote peers and set them as static RPF peers.
l Rule 3: If a device has only one remote MSDP peer, the remote peer automatically
becomes the RPF peer. The device receives the SA message sent by the remote peer. A
PIM-SM domain that has only one remote MSDP peer outside the PIM-SM domain is
called a stub domain.
l Rule 4: If the peer that sends the SA message and the local device belong to the same
mesh group, the local device receives the SA message. The SA messages from the mesh
group are not forwarded to the members of the mesh group, but to all the peers that do
not belong to the mesh group.
l Rule 5: If the peer that sends the SA message is the next hop of the route to the source
RP or a route forwarder, the local device receives SA messages and forwards them to
other peers. The route types can be Multicast Border Gateway Protocol (MBGP) routes,
BGP routes, static multicast routes, and Interior Gateway Protocol (IGP) routes.
l Rule 6: If the route that reaches the source RP spans multiple autonomous systems
(ASs), only the SA message received from the peer whose AS number is in the AS-path
is accepted.
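The six rules above can be sketched as a single acceptance check. This is a simplified illustration, not the device algorithm: the class and field names are hypothetical, the rules are evaluated in the listed order with the first match accepting the message (the source text does not state a precedence), and the route lookup for Rule 5 is reduced to a next-hop table.

```python
from dataclasses import dataclass, field

@dataclass
class MsdpRouter:
    peers: set                                             # remote MSDP peers
    static_rpf_peers: set = field(default_factory=set)
    mesh_group: set = field(default_factory=set)           # peers in the local mesh group
    next_hop_to: dict = field(default_factory=dict)        # source RP -> next-hop peer
    peer_as: dict = field(default_factory=dict)            # peer -> AS number
    as_path_to: dict = field(default_factory=dict)         # source RP -> AS-path

    def accepts_sa(self, sender, source_rp):
        if sender == source_rp:                            # Rule 1
            return True
        if sender in self.static_rpf_peers:                # Rule 2
            return True
        if self.peers == {sender}:                         # Rule 3
            return True
        if sender in self.mesh_group:                      # Rule 4
            return True
        if self.next_hop_to.get(source_rp) == sender:      # Rule 5
            return True
        path = self.as_path_to.get(source_rp, [])          # Rule 6
        return sender in self.peer_as and self.peer_as[sender] in path

router = MsdpRouter(peers={"RP1", "RP2"}, next_hop_to={"RP9": "RP2"})
print(router.accepts_sa("RP2", "RP9"), router.accepts_sa("RP1", "RP9"))
```

An SA message for which accepts_sa returns False is the one that MSDP discards; forwarding restrictions, such as not relaying to members of the same mesh group, are outside this sketch.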
Inter-Domain Multicast
l The MSDP peer relationship is set up between Rendezvous Points (RPs) of two PIM-SM
domains. In this manner, information about the multicast source is shared between the
two PIM-SM domains.
l When receiving the multicast data, the source RP (RP1) sends Source Active (SA)
messages that carry the multicast source information to RP2.
l Then RP2 forwards the multicast data to the receiver in its domain.
l After receiving the multicast data, the receiver decides whether to initiate the shortest
path tree (SPT) switchover.
Anycast RP
l ATN 1 and ATN 2, as RPs, establish an MSDP peer relationship with each other.
l Intra-domain multicast is performed through the MSDP peer relationship. In
addition, the receiver sends a Join message to the nearest RP to set up a rendezvous point
tree (RPT).
l The multicast source registers to the nearest RP, and RPs send each other SA messages to
share the multicast source information.
l RPs join the SPT with the source designated router (DR) as root to obtain the multicast
data.
l After the receiver receives the multicast data, it determines whether to initiate the SPT
switchover.
Terms
Terms Description
MSDP Multicast Source Discovery Protocol (MSDP) is only applicable to the PIM-SM
domain and only meaningful for the Any-Source Multicast (ASM) model.
After the MSDP peer relationship is set up between RPs of different PIM-SM
domains, multicast source information can be shared between PIM-SM domains,
and the inter-domain multicast can be implemented.
After the MSDP peer relationship is set up between RPs of the same PIM-SM
domain, multicast source information can be shared in the PIM-SM domain, and
anycast RP can be implemented.
PIM Protocol Independent Multicast (PIM) is one of the multicast routing protocols.
PIM forwarding can be implemented only if unicast routes are reachable. By
using the existing unicast routing information, PIM performs Reverse Path
Forwarding (RPF) check on multicast messages. In this manner, multicast
routing entries are created and the multicast distribution tree is set up.
SPT Shortest Path Tree (SPT) distributes multicast data by taking the multicast
source as the root and multicast group members as leaves. SPT is applicable to
PIM-DM, PIM-SM, and PIM SSM.
BSR BootStrap Router (BSR), also called Boot Router, is the management core of the
PIM-SM network. The BSR collects the C-RP information into an RP-set,
encapsulates the RP-set into a Bootstrap message, and advertises the Bootstrap
message to each PIM-SM device in the entire network. The PIM-SM device
then calculates the RP corresponding to the specified multicast group according
to the RP-set.
AS Autonomous System
RP Rendezvous Point
Definition
With the fast development of the Internet, there has been a considerable growth in all types of
data and voice and video information exchanged in the network, which speeds up the
development of multicast services. Multicast management provides the following tools for
multicast service probe and fault diagnosis.
l Multicast Ping (MPing): is a tool used to probe multicast services. By sending Internet
Control Message Protocol (ICMP) Echo Request messages, MPing triggers the setup of
the multicast forwarding tree and detects the members of reserved multicast groups over
the network.
NOTE
Reserved multicast group: The reserved multicast group addresses are within the range from
224.0.0.0 to 224.0.0.255. For example, 224.0.0.5 is reserved for the OSPF multicast group;
224.0.0.13 is reserved for the PIMv2 multicast group.
l Multicast trace route (MTrace): is a tool used to trace multicast forwarding paths. It can
trace the path from a receiver to a multicast source along the multicast forwarding tree.
Purpose
As multicast services are widely applied, MPing and MTrace become more important in
multicast service maintenance and fault location. When selecting the network devices that
support multicast, users demand that the devices should support not only multicast forwarding
and multicast routing protocols but also tools for diagnosing multicast faults. With the
development of multicast services, multicast maintenance and fault location are absolutely
necessary.
l Performing statistics on the ICMP Echo Reply messages sent from the destination host to
calculate the time-to-live (TTL) and response time from the multicast source to the
member of the multicast group
l Obtaining the network delay and route jitter by performing MPing periodically
l Pinging the address of a reserved multicast group
l Checking the members of reserved multicast groups over the network
l Locating faulty nodes and finding configuration errors in multicast troubleshooting and
routine maintenance
l Tracing the actual forwarding path of packets and collecting traffic information during
the trace; calculating multicast traffic rate in cyclic path tracing
l Outputting information about the faulty nodes for the NMS to analyze the fault and
generate alarms
9.6.2 Principles
9.6.2.1 MPing
MPing uses standard Internet Control Message Protocol (ICMP) messages to detect the
connectivity of a multicast path. A standard ICMP message used by MPing is an ICMP Echo
Request message, with the encapsulated destination address being a multicast address (either a
multicast address for the reserved multicast group or a common multicast group address).
l If the encapsulated destination address is a multicast address for the reserved multicast
group, the querier must specify the outgoing interface of the ICMP Echo Request
message. Finding that the destination address of the received ICMP Echo Request
message is the address of the reserved multicast group, the member (multicast device) of
the reserved multicast group responds with an ICMP Echo Reply message. Therefore,
MPing can be used to check the members of reserved multicast groups over the network.
l If the encapsulated destination address is a common multicast group address, the querier
cannot specify the outgoing interface of the ICMP Echo Request message. The ICMP
Echo Request message, as multicast traffic, is forwarded across the multicast network,
which can build multicast routing. The network quality analysis (NQA) software can
perform the MPing operations on multicast groups, and then gather the information
about delay and jitter. In this manner, multicast services can be successfully maintained
and multicast faults can be located.
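The delay and jitter gathered from periodic MPing operations can be computed as in the following sketch. The function name and the jitter definition (mean absolute difference between consecutive response times) are assumptions for illustration; the source text does not specify how NQA calculates these values.

```python
from statistics import mean

def mping_stats(rtts_ms):
    """Return (average delay, jitter) in milliseconds from Echo Reply response times."""
    if not rtts_ms:
        raise ValueError("at least one response time is required")
    delay = mean(rtts_ms)
    # Assumed jitter definition: mean absolute change between consecutive samples.
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return delay, jitter

print(mping_stats([10.0, 12.0, 11.0]))
```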
9.6.2.2 MTrace
MTrace provides a mechanism to trace the path on which multicast data is forwarded
from a multicast source to a designated receiver.
MTrace relies on a multicast-enabled network, for example one running Protocol Independent
Multicast (PIM-DM or PIM-SM), and on an established multicast distribution tree. MTrace
probes the multicast forwarding path by sending IGMP Tracert messages. IGMP Tracert
messages fall into the following types: IGMP Tracert Query message, IGMP Tracert Request
message, and IGMP Tracert Response message.
l The IGMP Tracert Request message is the IGMP Tracert Query message with an
additional response data block added to the end of the message.
l The IGMP Tracert Response message is the IGMP Tracert Request message with only
the message type field changed.
1. Run the MTrace command on the querier, with the multicast source address, destination
host address, and multicast group being specified.
2. The querier sends an IGMP Tracert Query message to the last-hop device connected with
the destination host.
3. After receiving the IGMP Tracert Query message, the last-hop device adds a response
data block containing the information about the interface receiving this IGMP Tracert
Query message to construct an IGMP Tracert Request message, and sends the message
to the previous-hop device.
4. The device of each hop adds a response data block to the IGMP Tracert Request message
and sends the message upstream.
5. When the first-hop device connected with the multicast source receives the IGMP
Tracert Request message, it also adds a response data block and sends the IGMP Tracert
Response message to the querier.
6. The querier parses the IGMP Tracert Response message and obtains the information
about the forwarding path from the multicast source to the destination host.
7. If an error prevents the IGMP Tracert Request message from reaching the first-hop
device, the device at which the error occurs sends the IGMP Tracert Response message
directly to the querier. The querier then parses the response data blocks to locate the
faulty node.
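The hop-by-hop procedure above can be sketched as a small simulation. This is a minimal
illustration only, not the device implementation: the TraceMessage class, the mtrace function,
the broken_after parameter, and the device names are all hypothetical, and a response data
block is reduced to the name of the hop that added it.

```python
from dataclasses import dataclass, field


@dataclass
class TraceMessage:
    kind: str          # "query", "request", or "response"
    source: str
    group: str
    destination: str
    blocks: list = field(default_factory=list)  # one response data block per hop


def mtrace(path, source, group, destination, broken_after=None):
    """Walk the reverse path from the last-hop device toward the source.

    path: device names ordered from the last hop to the first hop.
    broken_after: hypothetical name of a device that cannot send the
    Request any further upstream.
    """
    msg = TraceMessage("query", source, group, destination)
    for hop in path:
        # Each hop appends a response data block. The first block turns
        # the Query into a Request.
        msg.blocks.append(hop)
        msg.kind = "request"
        if hop == broken_after:
            break  # this hop answers the querier directly
    # Only the message type changes when the Request becomes a Response.
    msg.kind = "response"
    return msg


full = mtrace(["R3", "R2", "R1"], "10.1.1.1", "225.1.1.1", "192.168.1.1")
partial = mtrace(["R3", "R2", "R1"], "10.1.1.1", "225.1.1.1", "192.168.1.1",
                 broken_after="R2")
print(full.blocks)     # complete path, last hop first
print(partial.blocks)  # the last block identifies where tracing stopped
```

In the partial trace, the last response data block belongs to the device that could not
forward the Request, which is how the querier narrows down the faulty node.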
An MTrace operation can be initiated in the following modes. The initiating modes vary
with networking environment.
– all-router: indicates that the current multicast device is directly connected to the
destination host but it is not the last-hop device. 224.0.0.2 is set as the destination
address of the message. Such a message can be received by all multicast devices
residing on the network segment of the destination host, including the last-hop
device.
– last-hop: indicates that the IP address of the last-hop multicast device is set as the
destination address of the message. This mode requires the user to input the IP
address of the last-hop device.
– destination: indicates that the IP address of the destination host is set as the
destination address of the message. When the multicast device that is directly
connected to the destination host receives such a message, the device determines
whether it is the last-hop device. If not, the device re-encapsulates the IGMP Tracert
Query message in all-router mode.
– multicast-tree: indicates that the querier is just on the path from the multicast source
to the destination host (for example, the first-hop multicast device). The IP address
of the traced multicast group is set as the destination address of the message, and
the IP address of the multicast source is set as the source address of the message.
Then, the message is forwarded along the multicast path and finally arrives at the
last-hop multicast device.
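The four initiating modes differ mainly in the destination address of the IGMP Tracert Query
message. The mapping can be sketched as follows. The function name query_destination and its
parameters are hypothetical and do not correspond to any command syntax; the example addresses
are illustrative.

```python
ALL_ROUTERS = "224.0.0.2"  # all-routers address named in the text


def query_destination(mode, last_hop=None, destination_host=None, group=None):
    """Return the destination address of the IGMP Tracert Query for a mode."""
    choices = {
        "all-router": ALL_ROUTERS,
        "last-hop": last_hop,
        "destination": destination_host,
        "multicast-tree": group,
    }
    if mode not in choices:
        raise ValueError(f"unknown mode: {mode}")
    address = choices[mode]
    if address is None:
        raise ValueError(f"mode {mode} needs an address argument")
    return address


print(query_destination("all-router"))
print(query_destination("last-hop", last_hop="10.2.2.2"))
print(query_destination("multicast-tree", group="225.1.1.1"))
```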
Terms
None.
Definition
Multicast Route Management is used to manage multicast routing tables and control the
creation and change of multicast routes.
Purpose
l RPF check
This function is used to search for an optimal unicast route to a multicast source and
create a multicast forwarding tree. The outgoing interface of the unicast route is the
incoming interface of the forwarding entry. Then, when the forwarding module receives
multicast data packets, it searches the forwarding entry and checks whether the incoming
interface of the data packets is correct. If the interface that a multicast data packet
reaches is the outgoing interface of the unicast route, the packet passes the RPF check;
otherwise, the packet cannot pass the RPF check and is discarded. The RPF check
effectively avoids traffic loops during multicast data forwarding.
l Multicast load splitting
During multicast routing, you can configure a multicast load splitting policy on the ATN
so that the ATN can select different routes from the equal-cost routes as RPF routes for
different forwarding entries to guide data forwarding. Because the RPF routes of
forwarding entries can be distributed to different equal-cost routes, multicast data
distribution is implemented.
l Longest-match multicast routing
During multicast routing, the ATN prefers a route whose destination address mask and
source address mask are of the longest match to achieve accurate route matching.
l Multicast boundary designation
By configuring a multicast boundary on an interface, you can block multicast data on the
interface; that is, the interface no longer forwards the multicast data it receives.
l Multicast NSR
With multicast non-stop routing (NSR), adjacent devices do not detect a master/slave
switchover on the local device. Multicast routing is therefore not interrupted, the MDT
does not change, and no reprocessing is triggered on the adjacent devices.
9.7.2 Principles
The Reverse Path Forwarding (RPF) check rules are as follows: According to the source of a
packet, a multicast device searches its unicast routing table, Multicast Border Gateway
Protocol (MBGP) routing table, Multicast Interior Gateway Protocol (MIGP) routing table,
and static multicast routing table for an optimal route as an RPF route. A packet passes the
RPF check only when the interface that the packet reaches is the same as the RPF interface.
If the MIGP, MBGP, and multicast static route (MSR) routing tables all contain candidate
routes, the system first selects one optimal route from each routing table. If the routes
selected from these tables are Rt_urt (migp), Rt_mbgp, and Rt_msr, the system selects the RPF
route based on the following rules:
l By default, the system selects the RPF route based on the route priority.
a. The system compares the priorities of Rt_urt (migp), Rt_mbgp, and Rt_msr. The
route with the smallest priority value is preferentially selected as the RPF route.
b. If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same priority, the system selects
the RPF route in descending order of Rt_msr, Rt_mbgp, and Rt_urt (migp).
l If the multicast longest-match command is run to control route selection based on the
route mask:
– The system compares the mask lengths of Rt_urt (migp), Rt_mbgp, and Rt_msr.
The route with the longest mask is preferentially selected as the RPF route.
– If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same mask length, the system
compares their priorities. The route with the smallest priority value is preferentially
selected as the RPF route.
– If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same mask length and priority, the
system selects the RPF route in descending order of Rt_msr, Rt_mbgp, and Rt_urt
(migp).
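The selection rules above amount to comparing the three candidate routes by an ordered key. The
following is a minimal sketch under stated assumptions: the Candidate class, the table labels
"urt", "mbgp", and "msr", and all priority values and interface names are hypothetical and are
used only to show the comparison order.

```python
from dataclasses import dataclass

# Tie-break order when priority (and mask length) are equal:
# Rt_msr first, then Rt_mbgp, then Rt_urt (migp).
TABLE_ORDER = {"msr": 0, "mbgp": 1, "urt": 2}


@dataclass(frozen=True)
class Candidate:
    table: str      # "urt" (unicast or MIGP), "mbgp", or "msr"
    mask_len: int
    priority: int   # a smaller value is preferred
    interface: str


def select_rpf_route(candidates, longest_match=False):
    """Pick the RPF route from the per-table optimal routes."""
    def key(c):
        base = (c.priority, TABLE_ORDER[c.table])
        # With longest match enabled, the mask length is compared first.
        return (-c.mask_len,) + base if longest_match else base
    return min(candidates, key=key)


def rpf_check(incoming_interface, rpf_route):
    """A packet passes only if it arrives on the RPF interface."""
    return incoming_interface == rpf_route.interface


routes = [
    Candidate("urt", 24, 10, "GE0/2/1"),
    Candidate("mbgp", 16, 5, "GE0/2/0"),
    Candidate("msr", 16, 5, "GE0/2/2"),
]
print(select_rpf_route(routes).table)                      # priority first
print(select_rpf_route(routes, longest_match=True).table)  # mask length first
```

With these sample values, the default rule prefers the static route because it ties with the
MBGP route on priority, whereas the longest-match rule prefers the unicast route with the /24
mask.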
Figure 9-58 (diagram omitted). Labels: Source, ISP, 192.168.0.1/24, RouterA, ATN B, GE0/2/0,
GE0/2/1, Receiver.
In Figure 9-58, ATN C searches its routing tables and finds that GE 0/2/1 is the outbound
interface of the shortest path to the source, so GE 0/2/1 is the RPF interface. If a packet
matching the (S,G) entry arrives on GE 0/2/1, ATN C forwards the packet. If the packet arrives
on any other interface, ATN C discards it.
Figure (diagram omitted). Labels: Source1, RouterB, RouterC, RouterD, RouterF, ATN G; entries
(S,G1), (S,G2), (S,G3), (S,G4), ......
Based on a series of algorithms, a multicast ATN can select an appropriate route among
several equal-cost routes for each multicast group. This route is used for packet forwarding
for this group. Finally, multicast traffic for different groups can be split into different
forwarding paths.
Figure (diagram omitted). Labels: Source1 ...... Source10, RouterB, RouterC, RouterD, RouterF,
ATN G; entries (S1,G) ...... (S10,G).
Based on a series of algorithms, a multicast ATN can select an appropriate route among
several equal-cost routes for each multicast source. This route is used for packet forwarding
for this source. Finally, multicast traffic from different sources can be split into different
forwarding paths.
Figure 9-61 Networking diagram of multicast source- and multicast group-based load
splitting
Based on a series of algorithms, a multicast ATN can select an appropriate route among
several equal-cost routes for each source-specific multicast group. This route is used for
packet forwarding for this source-specific multicast group. Finally, multicast traffic for
different source-specific groups can be split into different forwarding paths.
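The text does not specify the algorithms used to map entries to equal-cost routes. Purely as an
illustration of the three splitting modes, the sketch below assumes a simple hash of the key
(group, source, or source and group) modulo the number of equal-cost routes. The function
pick_route, the mode names, and the use of CRC32 are assumptions and are not the algorithm
used by the ATN.

```python
import zlib


def pick_route(routes, source, group, mode):
    """Map an (S, G) entry to one of the equal-cost routes (hypothetical hash)."""
    if mode == "group":
        key = group                  # same group, same route
    elif mode == "source":
        key = source                 # same source, same route
    elif mode == "source-group":
        key = f"{source},{group}"    # per (S, G) pair
    else:
        raise ValueError(f"unknown mode: {mode}")
    index = zlib.crc32(key.encode()) % len(routes)
    return routes[index]


routes = ["RouterB", "RouterC", "RouterD"]
for g in ("225.1.1.1", "225.1.1.2", "225.1.1.3", "225.1.1.4"):
    print(g, pick_route(routes, "10.1.1.1", g, "group"))
```

Whatever the real algorithm is, the property illustrated here is the same: in group-based mode
the choice depends only on the group, in source-based mode only on the source, and in the
combined mode on both.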
Figure (diagram omitted). Labels: Source, RouterA, RouterB, RouterC, RouterD, RouterE, RouterF,
ATN G, Receiver.
l Implementation principle
The ATN configured with stable-preferred load splitting selects the most appropriate
route for a newly created entry, that is, the route assigned the fewest entries. When the
network topology and entries are stable, all entries with the sources on the same network
segment are distributed evenly among the equal-cost routes.
If entries become unbalanced because an entry is deleted or the weight of a route changes,
the ATN configured with stable-preferred load splitting restores the balance by selecting
the most appropriate routes for subsequent entries.
In stable-preferred load splitting mode, when the device finds that entries are not
balanced among paths, it waits for a period of time before balancing them, which reduces
the impact of frequent entry changes on the system.
You can set a load balancing timer to change this waiting time.
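The rule "select the route assigned the fewest entries" can be sketched as follows. This is a
simplified illustration: the function assign_entry is hypothetical, route weights and the
waiting timer are ignored, and ties are broken by route name only to keep the result
deterministic.

```python
def assign_entry(entry_counts):
    """Assign a new entry to the equal-cost route that carries the fewest entries.

    entry_counts: dict mapping a route name to its current number of entries.
    Existing entries are never moved; only the new entry is placed.
    """
    route = min(entry_counts, key=lambda r: (entry_counts[r], r))
    entry_counts[route] += 1
    return route


# An unbalanced starting point, for example after some entries were deleted.
counts = {"RouterB": 2, "RouterC": 0, "RouterD": 1}
chosen = [assign_entry(counts) for _ in range(3)]
print(chosen)
print(counts)
```

Because existing entries stay where they are, the distribution converges toward balance only
as new entries are created.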
interfaces change, or the number of equal-cost routes changes. When the entries are
unbalanced, there is a delay for the ATN enabled with balance-preferred load splitting to
balance the entries, which prevents the frequent changes of routes for the entries. In addition,
within the delay, the ATN can balance the entries by selecting the most appropriate routes for
subsequent entries.
Currently, setting a load balancing timer to change the waiting time before balancing entries is
supported.
During route selection, an optimal intra-domain unicast route, an optimal inter-domain unicast
route, and an optimal multicast static route are selected. One of them is finally selected as
the forwarding path for the multicast data.
The longest match principle works as follows:
1. If the longest match principle is configured for route selection, a route with the longest
matched mask is chosen by the multicast router.
For example, there is a multicast source with the IP address of 10.1.1.1, and multicast
data needs to be sent to a host with the IP address of 192.168.1.1. There are two
reachable routes to the source in the static routing table and intra-domain unicast routing
table, and the destination network segments are 10.1.0.0/16 and 10.1.1.0/24. Based on
the longest match principle for route selection, the route to the network segment of
10.1.1.0/24 is chosen as the forwarding path for the multicast data.
2. If the mask lengths of the routes are the same, the route with a higher priority is chosen
as the forwarding path for the multicast data.
3. If the mask lengths and priorities of the routes are the same, a route is selected in the
order of a static route, an inter-domain unicast route, and an intra-domain unicast route
as the forwarding path for multicast data.
4. If all the preceding conditions cannot determine a forwarding path for multicast data, the
route with the highest next-hop address is chosen.
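The four steps above can be sketched with the Python standard ipaddress module, using the
source address 10.1.1.1 from the example. The Route class, the priority values, and the
next-hop addresses are hypothetical, and the sketch assumes that a smaller priority value means
a higher priority.

```python
import ipaddress
from dataclasses import dataclass

# Step 3 tie-break order.
KIND_ORDER = {"static": 0, "inter-domain": 1, "intra-domain": 2}


@dataclass(frozen=True)
class Route:
    prefix: str
    kind: str       # "static", "inter-domain", or "intra-domain"
    priority: int   # assumption: a smaller value means a higher priority
    next_hop: str


def longest_match(source, routes):
    """Select the route toward the multicast source by the four steps in the text."""
    addr = ipaddress.ip_address(source)
    reachable = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    if not reachable:
        return None

    def key(r):
        net = ipaddress.ip_network(r.prefix)
        return (
            -net.prefixlen,                           # 1. longest mask
            r.priority,                               # 2. higher priority
            KIND_ORDER[r.kind],                       # 3. static, inter-, intra-domain
            -int(ipaddress.ip_address(r.next_hop)),   # 4. highest next-hop address
        )

    return min(reachable, key=key)


routes = [
    Route("10.1.0.0/16", "static", 1, "10.0.0.1"),
    Route("10.1.1.0/24", "intra-domain", 10, "10.0.0.2"),
]
print(longest_match("10.1.1.1", routes).prefix)
```

The /24 route wins even though the static route has the better priority, because the mask
length is compared before anything else.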
Applicable Environment
A multicast boundary is used to control the transmission of multicast information. After this
function is enabled, the multicast information each multicast group corresponds to can be
transmitted only within a certain range. You can configure a multicast boundary on an
interface to form a closed multicast forwarding area. When an interface of the multicast
device is configured with the multicast boundary for a group, the interface does not forward
or receive any packet for this group.
Principles
Figure 9-63 (diagram omitted). Labels: Source1, Source2, RouterA, ATN B (GE0/2/0), RouterC,
ATN D (GE0/2/0), RouterE, RouterF, Receivers, multicast packet.
As shown in Figure 9-63, RouterA, ATN B, and RouterC form a multicast domain 1; ATN D,
RouterE, and RouterF form a multicast domain 2. The two multicast domains communicate
through ATN B and ATN D.
If the data for a multicast group (G) in one multicast domain needs to be isolated from the
other multicast domain, you only need to configure GE 0/2/0 of ATN B or GE 0/2/0 of ATN
D as a multicast boundary for G so that the interface no longer forwards data to and receives
data from G.
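The effect of a multicast boundary on an interface can be sketched as a simple filter. This is
an illustration only: the Interface class and its methods are hypothetical, the group range
239.1.1.0/24 is an example value, and expressing the boundary as an address range rather than a
single group is an assumption.

```python
import ipaddress


class Interface:
    def __init__(self, name):
        self.name = name
        self.boundaries = []  # group ranges blocked on this interface

    def add_boundary(self, group_range):
        self.boundaries.append(ipaddress.ip_network(group_range))

    def permits(self, group):
        """Return False if the group falls inside a configured boundary.

        The same check applies in both directions: a blocked group is
        neither forwarded out of nor accepted on the interface.
        """
        addr = ipaddress.ip_address(group)
        return not any(addr in net for net in self.boundaries)


ge = Interface("GE0/2/0")
ge.add_boundary("239.1.1.0/24")
print(ge.permits("239.1.1.5"))   # inside the boundary
print(ge.permits("225.1.1.1"))   # outside the boundary
```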
Terms
Term                      Description
Multicast load splitting  Multicast load splitting is different from load balancing. Multicast
                          load splitting indicates that multicast entries can be distributed to
                          multiple equal-cost routes and the number of multicast entries
                          transmitted on each equal-cost route can be different.
Definition
Multicast VPN (MVPN) in Rosen Mode implements multicast service transmission over
MPLS/BGP VPNs. It is a solution based on the multicast domain (MD) scheme defined in
RFC 6037.
Purpose
MVPN in Rosen Mode implements multicast transmission on MPLS/BGP VPNs. It transmits the
multicast data and control messages of the PIM instances in a private network over the public
network to the remote sites of the VPN.
The PIM instances in the public network (PIM P-instances) do not need to know the multicast
data transmitted between the private networks, and the PIM instances in the private networks
(PIM C-instances) do not need to know the multicast routing information of the PIM
P-instances. In this way, the PIM instances of the public network are isolated from those of
the private networks.
9.8.2 Principles
l MD
MD is short for Multicast Domains. MD is the set of all the VPN instances that can
transmit multicast packets on each Provider Edge (PE). Different VPN instances belong
to different MDs. An MD serves a specific VPN. All private multicast data transmitted in
the VPN is transmitted in the MD.
l Share-Group
Based on the MD principle, all the VPN instances on the PEs in the same MD must join
a common group, called a Share-Group.
Currently, one VPN instance can be configured with only one Share-Group, that is, one
VPN instance can join only one MD.
l Share-Multicast Distribution Tree
Share-MDT is short for Share-Multicast Distribution Tree. Actually, it is set up when the
PIM C-instances on the PEs join Share-Groups. A Share-MDT transmits the PIM
protocol packets and data packets to other PEs within the same VPN. The Share-MDT is
regarded as a multicast tunnel (MT) within an MD.
l MTI
MTI is short for Multicast Tunnel Interface. It is the outgoing interface or incoming
interface of an MT, and is equivalent to the outgoing interface or incoming interface of
an MD. The local PE and remote PE send and receive VPN data through MTIs.
The MTI is the channel through which the public network instance and VPN instances
on PEs communicate. PEs are connected to an MT through MTIs, as if the PEs were
connected to a shared network segment. On each PE, VPN instances that belong to the
MD set up PIM neighbor relationships on MTIs.
l Switch-Group
It is the group that all PEs with attached VPN receivers join to establish a
Switch-MDT after a Share-MDT is established.
l Switch-MDT
Switch-MDT is short for Switch-Multicast Distribution Tree. It prevents multicast data
packets from being transmitted to unnecessary PEs. After a Share-MDT is set up, all the
PEs to which the receivers in the VPN are attached join an MDT set up based on Switch-
Groups. A Switch-MDT can transmit high-rate data packets to other PEs in the same
VPN.
Figure 9-64 (diagram omitted). Labels: VPN RED, VPN BLUE, CE1R, CE2B, P, PC1, PC2.
The process of implementing the communication between PIM C-instances on the PEs
through MVPN is as follows:
In this manner, the VPN instances with the same Share-Group address form a multicast
domain (MD).
In Figure 9-64, the VPN BLUE instances bound to PE1 and PE2 communicate through MD BLUE, and
the VPN RED instances bound to PE1 and PE2 communicate through MD RED, as shown in
Figure 9-65 and Figure 9-66.
Figure 9-65 and Figure 9-66 (diagrams omitted). Labels: Source1, CE1B, CE2B, VPN BLUE, PC2;
Source2, CE1R, CE2R, VPN RED, PC1.
The PIM C-instance on the PE considers the MTI as a LAN interface and sets up the PIM
neighbor relationship with the remote PIM C-instance through MTIs. The PIM C-instances
then use MTIs to perform DR election, send Join/Prune messages, and forward and receive
multicast data.
The PIM C-instance sends PIM protocol packets or multicast data packets to the MTI and the
MTI encapsulates the received packets. The packets after encapsulation are public network
multicast data packets and therefore are forwarded by the PIM P-instances on the network. In
conclusion, an MT is actually a multicast distribution tree on the public network.
l Different VPNs use different topologies and each topology uses a unique packet
encapsulation mode. In this manner, multicast data in different VPNs is isolated from
each other.
l The PIM C-instances on the PEs in the same VPN use the same MT and communicate
through this MT.
NOTE
A VPN uniquely defines an MD. An MD serves only one VPN. This relationship is called one-to-one
relationship. The VPN, MD, MTI, Share-Group, and Switch-group-pool are all in one-to-one
relationship.
A PIM neighbor relationship is set up between two or more directly connected multicast
devices that reside on the same network segment. There are three types of PIM neighbor
relationships in a multicast domain (MD) VPN: PE-Customer Edge (CE) neighbor relationship,
PE-Provider (P) neighbor relationship, and PE-PE neighbor relationship.
As shown in Figure 9-67, VPN A instances on each PE and the sites that belong to the VPN
A implement VPN A multicast. Figure 9-68 shows the neighbor relationship between CE,
PE, and P.
Figure 9-67 and Figure 9-68 (diagrams omitted). Labels: VPN A site1, site2, and site3; CE1, CE2,
CE3; PE1, PE2, PE3 with PE1_vpnA-instance, PE2_vpnA-instance, and PE3_vpnA-instance; P; MD A;
legend: PE-PE neighbor, PE-P neighbor, PE-CE neighbor.
l PE-CE neighbor relationship
It is set up between the interface on the PE bound to a VPN instance and the interface on
the CE at the remote end of the link.
l PE-P neighbor relationship
It is set up between the interface on the public network side of the PE and the interface
on the P at the remote end of the link.
l PE-PE neighbor relationship
It is set up after the VPN instance on the local PE receives Hello packets from the VPN
instance on the remote PE through a Multicast Tunnel Interface (MTI).
The multicast distribution tree (MDT) that takes the Share-Group address as the group address
is called a Share-MDT. The VPN uniquely identifies a Share-MDT by using a Share-Group.
The public network can run PIM-SM or PIM-DM. The process of establishing a Share-MDT
is different in the two cases.
Figure 9-69 (diagram omitted). Labels: P, RP, MD, PE1 (IBGP: 11.1.1.1/24), PE2 (IBGP:
11.1.2.1/24).
As shown in Figure 9-69, the public network runs PIM-SM. The process of establishing a
Share-MDT is as follows:
1. The PIM P-instance on PE1 sends a Join message with the Share-Group address being
the multicast group address to the Rendezvous Point (RP) in the public network. PEs that
receive the Join message then create the (*, 239.1.1.1) entry on themselves. PE2 and PE3
also send Join messages to the RP in the public network. A Rendezvous Point Tree
(RPT) is thus formed in the MD, with the RP being the root and PE1, PE2, and PE3
being leaves.
2. The PIM P-instance on PE1 sends a Register message with the Multicast Tunnel
Interface (MTI) address being the source address and the Share-Group address being the
group address to the RP in the public network. The RP then creates the (11.1.1.1,
239.1.1.1) entry on itself. PE2 and PE3 also send Register messages to the RP in the
public network. Thus, three independent RP-source trees that connect PEs to the RP are
formed in the multicast domain (MD).
In the PIM-SM network, an RPT (*, 239.1.1.1) and three independent RP-source trees form a
Share-MDT.
Figure 9-70 (diagram omitted). Labels: MD, PE1 (IBGP: 11.1.1.1/24), PE2 (IBGP: 11.1.2.1/24),
PE3 (IBGP: 11.1.3.1/24).
As shown in Figure 9-70, the public network runs PIM-DM. The process of establishing a
Share-MDT is as follows.
A flooding-pruning process is initiated on the entire public network with the PIM P-instance
on PE1 being a multicast source, the Share-Group address being the multicast group address,
and other PEs that support VPN A being group members. During this process, the (11.1.1.1,
239.1.1.1) entry is created on the PEs along the path in the public network. A Shortest Path
Tree (SPT) with PE1 being the root and PE2 and PE3 being leaves is thus set up. PE2 and
PE3 also start the similar flooding-pruning process in the public network to form two SPTs.
As a result, in the PIM-DM network, three independent SPTs are created and form a Share-
MDT.
Figure 9-71 shows the process of converting a VPN multicast packet into a public network
multicast data packet and then into a VPN multicast packet. Table 9-21 describes the meaning
of each field in a VPN or public network multicast packet.
Figure 9-71 (diagram omitted; surviving label: P-IP Header).
Table 9-21 (rows not recoverable; columns: Field, Description).
l All interfaces that belong to the same VPN, including the PE interfaces bound to the VPN instance
and MTI, must be in the same PIM mode.
l The VPN instance and the public network instance are independent of each other. They can be in
different PIM modes.
The process of transmitting VPN multicast data across the public network is as follows:
1. The source sends VPN multicast data (192.1.1.1, 225.1.1.1) to CE1.
2. CE1 forwards the VPN multicast data to PE1 along the SPT. The VPN instance on PE1
searches for the forwarding entry. If the outgoing interface of the forwarding entry
contains an MTI, the instance forwards the VPN multicast data to the related P for
further processing. The VPN instance on PE1 then considers that the Join message is
sent out from the MTI.
3. PE1 encapsulates the VPN multicast data with GRE and converts it into a public network
multicast data packet (11.1.1.1, 239.1.1.1), with the address of the IBGP interface on
PE1 being the source address and the share-group address being the group address. PE1
then forwards the multicast data packet to the public network instance for forwarding.
4. The multicast data packet (11.1.1.1, 239.1.1.1) is sent to the public network instance on
each PE along Share-MDT. Each PE decapsulates it, reverts it to VPN multicast data,
and forwards it to the related VPN instance for further processing. If there is an SPT
downstream interface on the PE, the data is forwarded along SPT. Otherwise, the data is
discarded.
5. The VPN instance on PE2 searches for the forwarding entry and then sends the VPN
multicast data to the receiver. So far, the process of transmitting VPN multicast data
across the public network is complete.
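The encapsulation and decapsulation in steps 3 and 4 can be sketched as follows. This is a
conceptual model only: the CPacket and PPacket classes and the two functions are hypothetical,
GRE is represented by simple nesting rather than by real headers, and the addresses are sample
values (192.1.1.1 and 225.1.1.1 for the VPN packet, 11.1.1.1 as the IBGP interface address of
PE1, and 239.1.1.1 as the share-group address).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CPacket:
    """VPN (private network) multicast packet."""
    source: str
    group: str
    payload: bytes


@dataclass(frozen=True)
class PPacket:
    """Public network multicast packet that carries a VPN packet."""
    source: str     # IBGP interface address of the sending PE
    group: str      # share-group address
    inner: CPacket


def encapsulate(c_packet, pe_address, share_group):
    """Sending PE: wrap the VPN packet for transport along the Share-MDT."""
    return PPacket(pe_address, share_group, c_packet)


def decapsulate(p_packet, local_share_group, has_downstream):
    """Receiving PE: recover the VPN packet, or return None to discard it."""
    if p_packet.group != local_share_group:
        return None              # packet belongs to a different MD
    if not has_downstream:
        return None              # no downstream interface in the VPN instance
    return p_packet.inner


vpn_data = CPacket("192.1.1.1", "225.1.1.1", b"data")
public = encapsulate(vpn_data, "11.1.1.1", "239.1.1.1")
print(decapsulate(public, "239.1.1.1", has_downstream=True))   # delivered
print(decapsulate(public, "239.1.1.1", has_downstream=False))  # discarded
```

The second call models a PE that joined the share-group but has no receivers: it still receives
and decapsulates the packet before discarding it.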
Background
In the process described in the previous section, the VPN instance bound to PE3 has no
receivers, yet PE3 still receives the VPN multicast data packets of the group (192.1.1.1,
225.1.1.1). This is a defect of the multicast domain (MD) scheme: all the PEs that belong
to the same MD receive multicast data packets regardless of whether they have receivers.
This wastes bandwidth and imposes an extra burden on PEs.
Implementation
Figure 9-74 shows the switch-MDT implementation process based on the assumption that a
share-MDT has been successfully established.
Figure 9-74 (diagram omitted). Labels: PE3 (IBGP: 11.1.3.1/24), RP, S: 192.1.1.1/24,
G: 225.1.1.1, Share-Group: 239.1.1.1.
l If the advanced ACL rules used to control the switchover of VPN multicast data packets
to the Switch-MDT change, the VPN multicast data packets cannot pass the filtering of
new ACL rules.
Applicable Environment
Multicast VPN extranet meets the following requirements:
l Distributes multicast services among different enterprise users.
l Enables service or content providers to distribute multicast services to different
enterprise users. The multicast data of a VPN can be provided for users in other VPNs to
use.
Principles
Multicast VPN extranet is applicable to two scenarios: remote-cross scenario and local-cross
scenario. The basic principles of multicast VPN extranet applied in the two scenarios are
described as follows:
l Remote-cross scenario
Figure 9-75 Networking diagram of the remote-cross scenario of multicast VPN extranet
As shown in Figure 9-75, VPN RED is configured on PE1; the address of the share-
group is the address of G1; the site where CE1 resides is connected to the multicast
source of VPN RED. VPN BLUE is configured on PE2; the address of the share-group is
the address of G2; the site where CE2 resides is connected to the multicast source of
VPN BLUE. Therefore, PE1 functions as the source PE of VPN RED, and PE2 functions
as the source PE of VPN BLUE. VPN BLUE is configured on PE3; the address of the
share-group is the address of G2; PE3 establishes an MDT with PE2 on the public
network. A user at the site where CE3 resides needs to receive multicast data from both
VPN BLUE and VPN RED. Therefore, PE3 functions as the receiver PE of VPN RED
and VPN BLUE.
In such a scenario, after configuring a VPN instance on the local PE, you need to
establish a multicast tunnel between the VPN instance of the local PE and that of the
remote PE. There are two configuration options available to provide multicast VPN
extranet services:
– Configure the source VPN on the PE where the receiver VPN resides. Based on the
multicast domain (MD) to which the VPN to be accessed belongs, configure source
VPN RED on PE3 and a multicast routing policy for the receiver VPN instance.
Then, hosts in the receiver VPN instance can send Join messages to source VPN.
PE3 then encapsulates multicast Join messages with the share-group address of
VPN RED, and sends the multicast Join messages to PE1 over the public network.
Finally, the multicast Join messages reach the multicast source of VPN RED.
Similarly, the multicast source of VPN RED sends multicast traffic over the public
network to VPN RED at the PE3 side. The multicast traffic is then imported to VPN
BLUE, and finally reaches the user.
– Configure the receiver VPN on the PE where the source VPN resides. Based on the
MD to which the VPN to be accessed belongs, configure receiver VPN BLUE
instance on PE1. Then, the source VPN instance and the receiver VPN instance can
exchange unicast routes. Hosts in the receiver VPN instance send Join messages to
the source VPN instance.
PE3 then encapsulates multicast Join messages with the share-group address of
VPN BLUE, and then sends the multicast Join messages to VPN BLUE on PE1
over the public network. PE1 then imports the multicast Join messages from VPN
BLUE to VPN RED. Therefore, the multicast Join messages reach the multicast
source of VPN RED. Similarly, after multicast traffic sent by the multicast source of
VPN RED is imported by PE1 to receiver VPN BLUE, VPN BLUE encapsulates
the multicast traffic with its share-group address, and then sends the multicast
traffic to the local VPN instance of PE3. Finally, the multicast traffic is forwarded
to the user on the associated VPN.
l Local-cross scenario
Figure 9-76 Networking diagram of the local-cross scenario of multicast VPN extranet
As shown in Figure 9-76, users at the site where CE3-2 resides need to receive multicast
data from both VPN BLUE and VPN RED. PE2 is the source PE of VPN BLUE. The
site where CE2 resides is connected to the multicast source of VPN BLUE. The
multicast source of VPN RED is connected to CE3-1. Both CE3-1 and CE3-2 are at the
PE3 side.
In the local-cross scenario, the receiver VPN and the source VPN are on the same PE,
and multicast traffic enters the PE through a VPN instance and leaves the PE through
another VPN instance. On PE3, the Import Route Target (IRT) of VPN BLUE needs to
be configured to be the same as the Export Route Target (ERT) of VPN RED so that
CE3-1 and CE3-2 can exchange VPN unicast routes. The process for a user to request
and receive multicast data from VPN RED is as follows:
a. A user at the site where CE3-2 resides requests multicast data from VPN RED. PE3
receives a PIM Join message from CE3-2, and then creates a multicast routing entry
of VPN BLUE. Through the RPF check, PE3 finds that the upstream interface of
the RPF route belongs to VPN RED. Then, PE3 sends a Join message to VPN RED.
b. PE3 creates a multicast routing entry (which has the receiver list including receiver
VPN BLUE) for VPN RED and then sends a PIM Join message to CE3-1.
c. The multicast data of VPN RED reaches PE3 through CE3-1. PE3 then imports the
multicast data to receiver VPN BLUE based on the multicast routing entries of VPN
RED.
d. After importing multicast data from VPN RED to VPN BLUE, PE3 sends the
multicast data to CE3-2 based on multicast routing entries of VPN BLUE. CE3-2
then sends the required multicast data of VPN RED to the user.
NOTE
A VPN extranet's multicast protocol and data packets are not encapsulated by GRE if the VPN extranet
connects to multicast sources on a public network.
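The local-cross process (steps a to d) can be sketched as two multicast routing tables on the
same PE, where the source VPN entry keeps a list of receiver VPNs. This is a conceptual model
only: the VpnInstance class, the handle_join and forward functions, the interface names, and
the addresses 10.3.1.1 and 225.2.2.2 are all hypothetical.

```python
class VpnInstance:
    def __init__(self, name, unicast_routes):
        self.name = name
        # source address -> (VPN that owns the route, upstream interface)
        self.unicast_routes = unicast_routes
        # (source, group) -> {"downstream": set, "receiver_vpns": set}
        self.entries = {}

    def entry(self, source, group):
        return self.entries.setdefault(
            (source, group), {"downstream": set(), "receiver_vpns": set()})


def handle_join(pe, vpn_name, source, group, downstream):
    """Steps a and b: create the receiver entry, then the source entry."""
    vpn = pe[vpn_name]
    vpn.entry(source, group)["downstream"].add(downstream)
    owner, upstream = vpn.unicast_routes[source]   # RPF lookup
    if owner != vpn_name:
        # The upstream interface belongs to another VPN on the same PE.
        pe[owner].entry(source, group)["receiver_vpns"].add(vpn_name)
    return owner, upstream   # where the Join is sent next


def forward(pe, source_vpn, source, group):
    """Steps c and d: import data into receiver VPNs and collect outputs."""
    entry = pe[source_vpn].entries.get((source, group))
    if entry is None:
        return set()
    out = set(entry["downstream"])
    for name in entry["receiver_vpns"]:
        out |= pe[name].entries[(source, group)]["downstream"]
    return out


route = {"10.3.1.1": ("RED", "to-CE3-1")}
pe3 = {"RED": VpnInstance("RED", route), "BLUE": VpnInstance("BLUE", route)}
print(handle_join(pe3, "BLUE", "10.3.1.1", "225.2.2.2", "to-CE3-2"))
print(forward(pe3, "RED", "10.3.1.1", "225.2.2.2"))
```

Sharing the same unicast route in both instances stands in for the route exchange that results
from matching the IRT of VPN BLUE with the ERT of VPN RED.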
Applicable Environment
In a multicast domain (MD)-based MVPN, because the PEs that belong to the same VPN
know neither the BGP peers of each other nor multicast source information, the PEs cannot
send Join messages to the multicast source to establish a Protocol Independent Multicast-
Source-Specific Multicast (PIM-SSM) multicast distribution tree (MDT). Therefore, the
share-MDT of the public network cannot use a PIM-SSM tunnel.
MVPN in BGP auto-discovery (A-D) mode is introduced to address this problem. In MVPN
in BGP A-D mode, PEs exchange BGP Update packets carrying A-D routes (recording the
peers of each PE) to automatically discover the BGP peers of the PEs on a multicast VPN. In
this manner, multicast VPN services can be transmitted over a public-network tunnel based on
a PIM-SSM MDT.
Related Concepts
BGP A-D MVPN related concepts are as follows:
l Peer: BGP speakers that exchange messages with each other are called peers.
l A-D route: is used to discover all peers in the same VPN. This type of route helps
implement tunnel setup and control message exchange between peers.
l BGP update message: is used to exchange routes between BGP peers.
Principles
Currently, two BGP A-D modes, namely, MDT-Subsequent Address Family Identifier (SAFI)
A-D and MCAST-VPN SAFI A-D, are supported:
l In MDT-SAFI A-D mode, a new address family is defined by BGP. After a VPN instance
is configured on a PE, the PE advertises the VPN configuration, including the RD and
share-group address, to all its BGP peers. After a remote PE receives an MDT-SAFI
message advertised by BGP, the remote PE compares the share-group address in the
message with its own share-group address. If the remote PE confirms that it is in the
same VPN as the sender of the MDT-SAFI message, the remote PE establishes the
PIM-SSM MDT on the public network to transmit multicast VPN services.
l The principles of MCAST-VPN SAFI A-D are similar to those of MDT-SAFI A-D. That
is, the multicast VPN configuration is transmitted through BGP Update packets. The
difference is that in MCAST-VPN SAFI A-D mode, a BGP Update packet carries more
multicast VPN attributes and information for establishing the public network tunnel.
Therefore, the MCAST-VPN SAFI A-D mode is applicable to the next-generation
multicast VPN.
Different PEs in the same VPN can use the same BGP A-D mode, and different VPNs on the same
PE can use different BGP A-D modes to automatically discover the BGP peers of PEs.
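The MDT-SAFI comparison described above can be sketched as follows: a PE keeps the A-D routes
whose share-group address matches its own and derives one PIM-SSM (source, group) entry per
discovered peer. The ADRoute class, the ssm_entries function, and all addresses and RD values
are hypothetical, and the sketch does not model the actual BGP encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ADRoute:
    pe_address: str    # address of the advertising PE
    rd: str            # route distinguisher
    share_group: str   # share-group address of the VPN instance


def ssm_entries(local, received):
    """Return the (source, group) entries for peers in the same VPN.

    A peer is considered to be in the same VPN when its advertised
    share-group address equals the local share-group address.
    """
    return sorted(
        (r.pe_address, r.share_group)
        for r in received
        if r.share_group == local.share_group
        and r.pe_address != local.pe_address
    )


pe3 = ADRoute("3.3.3.3", "100:1", "232.1.1.1")
updates = [
    ADRoute("1.1.1.1", "100:1", "232.1.1.1"),   # PE1, same VPN
    ADRoute("2.2.2.2", "100:1", "232.1.1.1"),   # PE2, same VPN
    ADRoute("4.4.4.4", "200:1", "232.2.2.2"),   # a PE in another VPN
]
print(ssm_entries(pe3, updates))
```

Each resulting entry corresponds to one PIM-SSM Join that the PE sends directly toward the peer,
which is why no RP is needed on the public network in this mode.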
l Scenario where the same VPN to which different PEs are added can use the same BGP
A-D mode in MVPN in BGP A-D mode
Figure 9-77 Networking diagram of the scenario where the same VPN to which different
PEs are added can use the same BGP A-D mode in MVPN in BGP A-D mode
As shown in Figure 9-77, PE1, PE2, and PE3 belong to VPN RED, and join the share-
group G1. The address of G1 is within the SSM group address range. BGP A-D in the
same mode is enabled on each PE. In addition, the BGP A-D function is enabled on VPN
RED. The site where CE1 resides is connected to Source of VPN RED, and CE2 and
CE3 are connected to VPN users. Based on the BGP A-D mechanism, every PE on the
network obtains and records information about all its BGP peers on the same VPN, and
then directly establishes a PIM-SSM MDT on the public network for transmitting
multicast VPN services. In this manner, MVPN services can be transmitted over a public
network tunnel based on the PIM-SSM MDT.
The following takes PE3 as an example to describe service processing in MVPN in BGP
A-D mode:
a. After being configured with the BGP A-D function, PE1, PE2, and PE3 negotiate
session parameters, and confirm that both ends support the BGP A-D function.
Then, the PEs can establish BGP peer relationships. After receiving a BGP Update
packet from PE1 and PE2 respectively, PE3 obtains and records the BGP peer
addresses of PE1 and PE2. The BGP Update packets carry the information about
the PEs that send packets, such as the PE address and supported tunnel type.
b. VPN RED is configured on PE3. PE3 joins the share-group G1. PE3 creates a PIM-
SSM entry with G1 being the group address and the address of PE1 being the
source address and another PIM-SSM entry with G1 being the group address and
the address of PE2 being the source address. PE3 then directly sends PIM Join
messages to PE1 and PE2 to establish two PIM-SSM MDTs to PE1 and PE2
respectively.
c. CE3 sends a Join message to PE3. After receiving the Join message, PE3
encapsulates the Join message with the PIM-SSM share-group address, and then
sends it to PE1 over the public network tunnel. PE1 then decapsulates the received
Join message, and then sends it to the multicast source.
d. After the multicast data sent by the multicast source reaches PE1, PE1 encapsulates
the multicast data with the share-group address, and then forwards it to PE3 over
the public network tunnel. PE3 then forwards the multicast data to CE3, and CE3
sends the multicast data to the user.
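Step b above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ADRoute record, the PE addresses (10.0.0.x), and the share-group address 232.1.1.1 are hypothetical values chosen for the example and do not come from the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ADRoute:
    """Hypothetical, simplified view of one BGP A-D advertisement."""
    pe_address: str   # address of the advertising PE
    vpn: str          # VPN to which the advertisement belongs
    tunnel_type: str  # supported tunnel type, for example "PIM-SSM"


def ssm_entries(local_pe, vpn, share_group, routes):
    """Return the (source, group) entries the local PE creates for one VPN.

    One entry is created per remote PE of the same VPN, with the remote PE
    address as the source and the share-group address as the group.
    """
    return sorted(
        (r.pe_address, share_group)
        for r in routes
        if r.vpn == vpn and r.pe_address != local_pe and r.tunnel_type == "PIM-SSM"
    )


# Assumed addresses: PE1 = 10.0.0.1, PE2 = 10.0.0.2, PE3 = 10.0.0.3 (local PE).
routes = [
    ADRoute("10.0.0.1", "RED", "PIM-SSM"),
    ADRoute("10.0.0.2", "RED", "PIM-SSM"),
]
entries = ssm_entries("10.0.0.3", "RED", "232.1.1.1", routes)
print(entries)
```

PE3 would then send one PIM Join message per entry, which corresponds to the two PIM-SSM MDTs described in step b.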
l Scenario where the different VPNs to which the same PE is added can use different BGP
A-D modes in MVPN in BGP A-D mode
Figure 9-78 Networking diagram of the scenario where the different VPNs to which the
same PE is added can use different BGP A-D modes in MVPN in BGP A-D mode
[Figure content: multicast sources of VPN RED and VPN BLUE; CE1 belongs to VPN RED; PE3 connects CE3 of VPN RED and CE2 of VPN BLUE]
As shown in Figure 9-78, PE1 belongs to VPN RED, PE2 belongs to VPN BLUE, and
PE3 belongs to both VPN RED and VPN BLUE. BGP A-D in MDT-SAFI mode is
enabled on PE1, BGP A-D in MCAST-VPN SAFI mode is enabled on PE2, and both
BGP A-D in MDT-SAFI mode and BGP A-D in MCAST-VPN SAFI mode are
enabled on PE3. In addition, on PE3, BGP A-D in MDT-SAFI mode is used by
VPN RED whereas BGP A-D in MCAST-VPN SAFI mode is used by VPN BLUE.
The site where CE1 resides is connected to the multicast source of VPN RED and the
site where CE4 resides is connected to the multicast source of VPN BLUE. CE2 and
CE3 are connected to VPN users. Based on the BGP A-D mechanism, BGP peers
enabled with BGP A-D in the same mode obtain the BGP A-D information from each
other, and every PE on the network obtains and records information about all peers on
the same VPN and directly establishes a PIM-SSM MDT on the public network for
transmitting multicast VPN services. In this manner, MVPN services can be sent over a
public network tunnel based on the PIM-SSM MDT.
The interaction between MVPNs supporting BGP A-D through different BGP A-D
address families is as follows:
a. After being configured with BGP A-D, PE1, PE2, and PE3 negotiate session
parameters. Because PE1 and PE2 are configured with BGP A-D in different
modes, PE1 and PE2 fail to negotiate session parameters and cannot set up the BGP
peer relationship. Because PE3 is configured with BGP A-D in both modes, PE3
can establish BGP peer relationships with PE1 and PE2 respectively. After
receiving a BGP Update packet from PE1 and PE2 respectively, PE3 obtains and
records the BGP peer addresses of PE1 and PE2. The BGP update packets carry the
information about the PEs that send packets, such as the PE address and supported
tunnel type.
b. PE3 encapsulates its BGP A-D information including the address and supported
tunnel type in a packet used by MVPN in MDT-SAFI mode and in another packet
used by MVPN in MCAST-VPN SAFI mode, and then sends the former packet to
PE1 and latter packet to PE2. After receiving the corresponding packet, PE1 and
PE2 record the information about PE3.
c. PE1 and PE3 that belong to the same VPN RED obtain each other's information,
and send an (S, G) Join message to each other. Similarly, PE2 and PE3 that belong
to the same VPN BLUE obtain each other's information, and send an (S, G) Join message
to each other.
d. Finally, the multicast traffic of each VPN is forwarded to users attached to CE3 of
VPN RED and users attached to CE2 of VPN BLUE respectively.
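The negotiation result in step a can be illustrated with a short Python sketch. It assumes, as a simplification, that two PEs can exchange BGP A-D information only if they share at least one A-D mode; the dictionary layout and function name are hypothetical.

```python
from itertools import combinations

# BGP A-D modes enabled on each PE, as described for Figure 9-78.
pe_modes = {
    "PE1": {"MDT-SAFI"},
    "PE2": {"MCAST-VPN SAFI"},
    "PE3": {"MDT-SAFI", "MCAST-VPN SAFI"},
}


def ad_peerings(modes):
    """Map each PE pair to the A-D modes that both PEs have enabled.

    Pairs without a common mode are omitted, because they cannot
    negotiate session parameters for BGP A-D.
    """
    result = {}
    for a, b in combinations(sorted(modes), 2):
        shared = modes[a] & modes[b]
        if shared:
            result[(a, b)] = shared
    return result


peerings = ad_peerings(pe_modes)
for pair, shared in sorted(peerings.items()):
    print(pair, sorted(shared))
```

In this sketch, PE1 and PE2 have no common mode and therefore do not appear in the result, whereas PE3 peers with PE1 in MDT-SAFI mode and with PE2 in MCAST-VPN SAFI mode.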
Principles
As multicast services are widely deployed and the number of users requesting multicast
services is increasing, multicast users and multicast sources may reside in different ASs. To
allow multicast users in a different AS to enjoy multicast services, multicast services need to
be transmitted in a VPN across ASs. There are two types of inter-AS MVPNs: OptionA and
OptionC.
Related Concepts
The following part briefly describes the concept of inter-AS MVPN with reference to the
following figures:
Implementation
Of the two types of inter-AS MVPN, OptionA supports Any-Source Multicast (ASM) and
Source-Specific Multicast (SSM), whereas OptionC supports only the MDT-SAFI auto-discovery
(A-D) mode of MVPN BGP A-D in SSM. The implementation of each type of inter-AS MVPN is
described as follows:
l Inter-AS MVPN OptionA:
As shown in Figure 9-79, an independent multicast domain (MD) is established in each
AS, and VPN multicast data is transmitted between MDs.
In inter-AS MVPN OptionA, VPN multicast data is transmitted in the following process:
a. CE1 in VPN1 sends VPN multicast data to the CE of ASBR1", that is, ASBR2";
CE2 sends VPN multicast data to the CE of ASBR2", that is, ASBR1".
b. After the VPN multicast data of CE1 reaches ASBR2", ASBR2" considers that the
multicast data comes from VPN2. ASBR2" then encapsulates the multicast data and
forwards it to PE2 and then CE2 in MD2. Similarly, VPN multicast data of CE2 can
also reach CE1 based on the preceding process. In this manner, CE1 and CE2 can
exchange VPN multicast data across ASs.
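[Figure content: inter-AS MVPN OptionC networking with CE1, PE1", P1, P2, PE2", and CE2; on each PE, the public instance and the VPN instance (VPN instance1, VPN instance2) are connected through an MTI, and the MT forms a virtual multicast link within one MD]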
In inter-AS MVPN OptionC, VPN multicast data is transmitted in the following process:
a. VPN multicast data of CE1 is encapsulated on PE1 based on MTI and then
forwarded over MT tunnels. The encapsulated VPN multicast data is transmitted
over the public network as common multicast data based on Share-Group or
Switch-Group entries of the public network.
b. The VPN multicast data reaches PE2 and inter-AS multicast is implemented. PE1
and PE2 do not know how VPN multicast data is transmitted across ASs and
consider that the VPN multicast data is transmitted within the same AS.
Application scenario
The application scenarios are inter-AS MVPN OptionA and inter-AS MVPN OptionC.
Advantages
Inter-AS MVPN allows carriers to deploy multicast VPN across ASs to provide multicast
services for users in different ASs.
The single-AS MD VPN is mainly used to isolate multicast services in different VPNs within
a multicast domain (MD).
[Figure 9-81: single-AS MD VPN networking; labels recoverable from the figure are Source1, Source2, PC1, PC2, CE1R, CE2B, P, VPN RED, and VPN BLUE]
As shown in Figure 9-81, a single AS runs MPLS/BGP VPN. Both PE1 and PE2 are
configured with two VPN instances, namely, VPN BLUE and VPN RED, and the same Share-
Group address is set for the same VPN instances on the two PEs. In such a case, the VPN
instances with the same Share-Group address join the same MD. After the corresponding
Share-multicast distribution tree (MDT) is established, the protocol packets and low-rate data
in the VPNs can be transmitted through their respective Multicast Tunnel (MT).
VPN BLUE is taken as an example to describe how multicast services are transmitted
between VPNs.
1. A VPN instance named VPN BLUE is configured on both PE1 and PE2 and the
instances on the two PEs use the same Share-Group address. After the corresponding
Share-MDT is established, the VPN BLUE instances connected with CE1B and CE2B
can exchange multicast protocol packets through the corresponding MT.
2. Multicast devices in the VPNs connected with CE1B and CE2B can then establish
neighbor relationships, and send Join, Prune, and BootStrap router (BSR) messages to
each other. The protocol packets in the VPNs are encapsulated and decapsulated only on
the MT of the PEs. The devices, however, do not know that they are in VPN networks.
They still process the multicast protocol packets and forward multicast data packets like
the devices in the public network. In this way, multicast service transmission in one VPN
instance is implemented and multicast services in different VPN instances are isolated.
Terms
Terms Explanation
PIM It is a multicast routing protocol, with the full name being Protocol
Independent Multicast. Reachable unicast routes are the basis of PIM
forwarding. PIM uses the existing unicast routing information to perform
the RPF check on multicast packets to create multicast routing entries and
set up an MDT.
SPT It is a shortest path tree, with the multicast source being the root and
group members being leaves. SPT is applicable to PIM-DM, PIM-SM,
and PIM-SSM.
Share-Group Based on the MD principle, all the VPN instances on the PEs in the same
MD must join a common group, called a Share-Group.
Currently, one VPN instance can be configured with only one Share-
Group, that is, one VPN instance can join only one MD.
MTI MTI is short for Multicast Tunnel Interface. It is the outgoing interface or
incoming interface of an MT. An MTI is equal to the outgoing interface or
incoming interface of an MD. The local PE sends VPN data through an
MTI. The remote PE receives it through an MTI.
The MTI is the channel through which the public network instance and
VPN instances on PEs communicate. PEs are connected to an MT by
using MTIs, which is equal to the situation that PEs are connected to a
shared network segment. On each PE, VPN instances that belong to the
MD set up the PIM neighbor relationships on MTIs.
AS Autonomous System
RP Rendezvous Point
9.9.2 Principles
For details about the limit on global IGMP entries, see "IGMP-Limit" in IGMP Policy
Control.
The following methods can be used to limit the number of (S, G) entries in an MSDP SA
cache:
l Limiting the number of (S, G) entries in a single instance
l Limiting the total number of (S, G) entries on all MSDP devices
l Limiting the number of (S, G) entries on each MSDP peer
The limit on the number of PIM neighbors, which means the maximum number of PIM
neighbors in the neighbor list of an interface on a device, ensures the normal operation of a
device by preventing an interface on the device from establishing too many PIM neighbor
relationships.
Source Policy
A source policy is used to filter received multicast data packets based on source addresses or
source/group addresses.
SSM Policy
A Source-Specific Multicast (SSM) policy is used to change the SSM group address range in
a certain instance.
After an SSM policy is configured on a device, all PIM-SM interfaces on the device consider
that multicast groups whose addresses are within the SSM group address range adopt the
PIM-SSM model.
For details about an SSM policy, see SSM Mapping.
BSR Policy
A BootStrap router (BSR) policy is used to limit the range of valid BSR addresses. A device
configured with a BSR policy discards messages received from the BSRs whose addresses are
beyond the set address range, preventing BSR spoofing.
C-RP Policy
A Candidate-Rendezvous Point (C-RP) policy is used to set the range of valid
C-RP addresses and the range of multicast groups that each C-RP serves. The BSR configured
with a C-RP policy discards messages received from the C-RPs whose addresses are beyond
the set address range, preventing C-RP spoofing.
Register Policy
After being configured with a register policy, a device accepts or denies Register messages
according to the register policy, which prevents illegal Register messages from being processed.
MSDP SA Policy
A Multicast Source Discovery Protocol (MSDP) Source Active (SA) policy is used to filter
the received or sent SA messages.
When receiving SA messages from a specified MSDP peer or forwarding SA messages to a
specified MSDP peer, a device configured with an MSDP SA policy filters the (S, G)
information in the SA messages based on source addresses or source/group addresses. In this
manner, the transmission of source information or source/group information is controlled.
When creating an SA message, a device configured with this policy filters the (S, G) entry
contained in the SA message based on the source address or the source/group address,
controlling the advertisement of source or source/group information during the creation of an
SA message.
Multicast Boundary
The multicast information to which each multicast group corresponds needs to be transmitted
in a certain range on a network. A multicast boundary can be configured on an interface to
define the forwarding range of the data of a multicast group, forming a closed multicast
forwarding area. When an interface of a device is configured with a forwarding boundary for
a group, the interface does not forward or receive any packet of the group.
BSR Boundary
A BSR boundary can be configured on an edge device to restrict the range of a PIM domain,
implementing refined management of networks.
For details about a Source Address-based IGMP Message Filtering, see "". The principles for
a Source Address-based MLD Message Filtering are similar to those for a Source Address-
based IGMP Message Filtering.
PIM Silent
PIM silent can be configured on the interface connecting a device to hosts to prevent hosts
from maliciously sending Hello messages to attack the device. After entering the PIM silent
state, the interface is forbidden to receive or forward any PIM packet. All PIM neighbors and
the PIM state machines on this interface are deleted and the interface automatically becomes a
DR. The IGMP function on the interface, however, is not affected.
For details about PIM silent, see "PIM Silent" in PIM Security.
10 MPLS
This document describes the MPLS in terms of the overview, principle, and applications.
10.1.1 Introduction
Background
IP-based Internet prevailed in the mid-1990s. The technology is simple and costs little to deploy.
However, nowadays IP technology, which relies on the longest match algorithm, is not the
most efficient choice for forwarding packets.
In comparison, asynchronous transfer mode (ATM) technology is much more efficient at
forwarding packets. It uses labels (particularly, cells) of fixed length and maintains a label
table that is much smaller than a routing table. ATM technology, however, is a complex
protocol with a high deployment cost, which hinders its widespread popularity and growth.
Users wanted a technology that combines the best that IP and ATM technologies have to offer.
This has sparked the emergence of MPLS technology.
Multiprotocol Label Switching (MPLS) is designed to increase forwarding rates. Unlike IP
technology, MPLS analyzes packet headers only on the edge of a network, not at each hop.
Therefore, packet processing time is shortened.
MPLS no longer has the high-speed forwarding advantages since application-specific
integrated circuit (ASIC) technology has been developed to increase the routing rate. MPLS
supports multi-layer labels, and its forwarding plane is connection-oriented. MPLS is widely
used in virtual private network (VPN), traffic engineering (TE), and quality of service (QoS).
Overview
MPLS works between the data link layer and the network layer in the TCP/IP protocol stack.
MPLS provides connections for the IP layer and obtains services from the data link layer.
MPLS replaces IP forwarding with label switching. A label is a short connection identifier of
fixed length that is meaningful to the local end. The label is similar to the ATM virtual path
identifier (VPI)/virtual channel identifier (VCI) and the Frame Relay data link connection
identifier (DLCI). The label is encapsulated between the data link layer and network layer.
MPLS can use any Layer 2 media to transfer packets and is not limited by any specific
protocol on the data link layer.
The origin of MPLS is the Internet Protocol version 4 (IPv4). The core MPLS technology can
be extended to multiple network protocols, such as the Internetwork Packet Exchange (IPX),
AppleTalk, DECnet, and Connectionless Network Protocol (CLNP). MPLS supports label
switching between multiple network protocols, as implied by its name.
The MPLS technology is a tunneling technology, but not a service or an application. It
supports multiple protocols and services. Moreover, it improves data transmission security.
10.1.2 Principles
10.1.2.1 Concepts
[Figure content: MPLS network structure; core LSRs reside inside the MPLS network, and LERs at its edge connect to non-MPLS networks]
All LSRs on the MPLS network forward data based on labels. When an IP packet enters an
MPLS network, an LER adds a label to it. Before the IP packet leaves the MPLS network,
another LER removes the label.
The path that MPLS packets take in an MPLS network is called a label switched path (LSP).
The LSP is a unidirectional path that transmits traffic from the ingress to the egress.
The start node of an LSP is the ingress. The end node of the LSP is the egress. The nodes
between both ends along the LSP are transit nodes. An LSP may have none, one, or several
transit nodes and has only one ingress and one egress.
Label
A label is a 20-bit identifier that uniquely identifies the FEC to which a packet belongs. A label
is only meaningful to a local end. A FEC can be mapped to multiple incoming labels to
balance loads, but a label only represents a single FEC. A label on an MPLS network
performs the same function as a virtual path identifier (VPI)/virtual channel identifier (VCI)
in an ATM network or a data link connection identifier (DLCI) in a Frame Relay network.
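[Figure content: MPLS label format. Label: bits 0 to 19; Exp: bits 20 to 22; S: bit 23; TTL: bits 24 to 31]
l Label: a 20-bit label value.
l Exp: a 3-bit field used for extension. This field is used by the class of service (CoS)
function, which is similar to Ethernet 802.1p.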
l S: a 1-bit field that identifies the bottom of a label stack. MPLS supports multiple labels
that may be stacked. If the S field value is set to 1, the label is at the bottom of the label
stack.
l TTL: a time to live value. The length is 8 bits. This field is the same as the TTL in IP
packets.
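The field layout above (20-bit Label, 3-bit Exp, 1-bit S, 8-bit TTL) can be illustrated with a minimal Python sketch that packs and unpacks one 32-bit label stack entry. The function names are hypothetical and the example is not taken from the product software.

```python
import struct


def encode_entry(label, exp, s, ttl):
    """Pack one MPLS label stack entry into 4 bytes (network byte order)."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    # Label occupies the 20 most significant bits, followed by Exp, S, and TTL.
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)


def decode_entry(data):
    """Unpack 4 bytes into the Label, Exp, S, and TTL fields."""
    (word,) = struct.unpack("!I", data)
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


# Label 16 at the bottom of the stack (S = 1) with a TTL of 255.
raw = encode_entry(16, 0, 1, 255)
print(raw.hex(), decode_entry(raw))
```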
Labels are encapsulated between the data link layer and network layer and supported by all
data link layer protocols.
Label Space
Label space is the label value range. The ATN supports the following label ranges:
l 0 to 15: special labels. For details about special labels, see Table 10-1.
l ATN 910 support for 16 to 2559: the label space shared by static LSPs, static CR-LSPs,
and dynamic signaling protocols, such as LDP, RSVP-TE, and MP-BGP.
l ATN 910I support for 16 to 2559: the label space shared by static LSPs, static CR-LSPs,
and dynamic signaling protocols, such as LDP, RSVP-TE, and MP-BGP.
l ATN 910B support for 16 to 7167: the label space shared by static LSPs, static CR-LSPs,
and dynamic signaling protocols, such as LDP, RSVP-TE, and MP-BGP.
l ATN 950B with the control board AND1CXPA/AND1CXPB installed support for 16 to
3071: the label space shared by static LSPs, static CR-LSPs, and dynamic signaling
protocols, such as LDP, RSVP-TE, and MP-BGP.
l ATN 950B with the control board AND2CXPB/AND2CXPE installed support for 16 to
7167: the label space shared by static LSPs, static CR-LSPs, and dynamic signaling
protocols, such as LDP, RSVP-TE, and MP-BGP.
l ATN 905 support for 16 to 2559: the label space shared by static LSPs, static CR-LSPs,
and dynamic signaling protocols, such as LDP, RSVP-TE, and MP-BGP.
NOTE
When the ATN 905 supports 4K VLANs, the label space is 16 to 271.
Table 10-1 Special labels
Label Value   Label                      Description
0             IPv4 Explicit NULL Label   If the egress receives a packet carrying a label with this value, the egress must remove the label from the packet. The egress then forwards the packet using IPv4.
1             Router Alert Label         If a node receives a packet carrying a label with this value, the node sends the packet to a software module, without implementing hardware forwarding. The node forwards the packet based on the next layer label. If the packet needs to be forwarded using hardware, the node pushes the Router Alert Label back onto the top of the label stack before forwarding the packet. This label takes effect only when it is not at the bottom of a label stack.
2             IPv6 Explicit NULL Label   If the egress receives a packet carrying a label with this value, the egress removes the label from the packet and forwards the packet using IPv6.
13            OAM Router Alert Label     If the ingress receives a packet carrying a label with this value, the ingress considers it an Operation, Administration and Maintenance (OAM) packet and transparently forwards it to the egress. MPLS OAM sends OAM packets to monitor LSPs and advertise faults.
14 to 15      Reserved                   N/A
Label Stack
A label stack in an MPLS packet contains a set of labels. The label next to the Layer 2 header
is the top or outer label. The label next to the Layer 3 header is the bottom or inner label.
Theoretically, there is no limitation to the number of MPLS labels that can be stacked.
[Figure content: label stack encapsulation order: link layer header, outer label, inner label, Layer 3 header, Layer 3 payload]
The labels are processed from the top of the stack based on the last in, first out principle.
Label Operations
The label forwarding table defines the following label operations:
l Push: The ingress adds a label to a packet between the Layer 2 header and IP header
before forwarding the packet over an MPLS network. Within an MPLS network, an LSR
adds a label to the top of the label stack.
l Swap: A transit node replaces a label on the top of the label stack in an MPLS packet
with another label, which is assigned by the next hop.
l Pop: The penultimate LSR removes the top label from the label stack to decrease the
number of labels in the stack. The egress removes a label from the MPLS packet before
the packet leaves an MPLS network.
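The Push, Swap, and Pop operations can be sketched on a plain Python list. In this sketch, index 0 is assumed to be the top (outer) label, and the label values 100, 200, and 300 are arbitrary examples.

```python
def push(stack, label):
    """Add a label to the top of the label stack."""
    return [label] + stack


def swap(stack, new_label):
    """Replace the top label with the label assigned by the next hop."""
    if not stack:
        raise ValueError("cannot swap on an empty label stack")
    return [new_label] + stack[1:]


def pop(stack):
    """Remove the top label from the label stack."""
    if not stack:
        raise ValueError("cannot pop from an empty label stack")
    return stack[1:]


stack = push([], 100)      # inner label
stack = push(stack, 200)   # outer label, added last
stack = swap(stack, 300)   # a transit node swaps only the top label
print(stack)
stack = pop(stack)         # the label added last is removed first
print(stack)
```

Each function returns a new list instead of modifying its argument, which keeps the intermediate stacks easy to compare.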
The VPN Option C scenario supports the following actions to process labels:
l Swappush: swaps an existing outer label for a new one and pushes a label of another
tunnel into a packet.
l Popgo: pops out outer labels from a packet and pushes a label of another tunnel into the
packet.
LER
An LER is an LSR that resides on the edge of an MPLS domain. When an LSR connects to a
node that does not run MPLS, the LSR acts as the LER.
The LER classifies the packets entering an MPLS domain by FECs and pushes labels into
them. Then, the LER forwards MPLS packets based on these labels. When packets leave the
MPLS domain, the labels are popped out. The packets again become IP packets and are
forwarded.
LSPs are unidirectional and originate from the ingress and terminate at the egress. LSPs
perform the same functions on MPLS networks as permanent virtual circuits (PVCs) on ATM
and Frame Relay networks.
l Ingress LSR: the start node on an LSP. An LSP can have only one ingress.
The ingress pushes a new label into the packet and encapsulates the IP packet as an
MPLS packet to be forwarded.
l Transit LSR: the intermediate node of an LSP. Multiple transit LSRs may exist on an
LSP.
The transit LSR searches for routes in the label forwarding table and swaps labels to
forward MPLS packets.
l Egress LSR: the end node on an LSP. An LSP can have only one egress.
The egress removes labels from MPLS packets and forwards the resultant IP packets.
As shown in Figure 10-6, LSRA is the upstream LSR of LSRB, and the LSRB is the
downstream LSR of LSRA. Similarly, LSRB is the upstream LSR of LSRC. LSRC is the
downstream LSR of LSRB.
[Figure 10-6: data flows downstream from LSR-A to LSR-B and from LSR-B to LSR-C]
Label Distribution
An LSR records a mapping between a label and FEC and notifies upstream LSRs of the
mapping. This process is called label distribution.
On the network shown in Figure 10-7, packets with destination address 192.168.1.0/24 are
assigned to a specific FEC. LSRB and LSRC assign labels that represent the FEC and
advertise the mapping between labels and the FEC to upstream LSRs.
MPLS Architecture
As shown in Figure 10-8, the MPLS architecture consists of a control plane and a forwarding
plane.
[Figure 10-8: MPLS architecture. Control plane: IP routing protocol and routing information base (RIB). Forwarding plane: label forwarding information base (LFIB)]
l The control plane is connectionless and is used to distribute labels, create a label
forwarding table, and establish or tear down LSPs.
l The forwarding plane, also known as the data plane, is connection-oriented. It can apply
services and protocols supported by ATM, Frame Relay, and Ethernet networks. The
forwarding plane adds labels to IP packets, forwards packets based on the label forwarding
table, and removes labels from MPLS packets before the packets leave the MPLS network.
Procedure
MPLS assigns packets to a FEC, distributes labels that identify the FEC, and establishes an
LSP. Packets travel along the LSP.
Labels are assigned and distributed by a downstream LSR to an upstream LSR. As shown in
Figure 10-9, packets destined for 3.3.3.3 are assigned to a FEC. Downstream LSRs assign
labels for the FEC to upstream LSRs and use a label advertisement protocol to inform the
upstream LSRs of the mapping between the labels and FEC. Each upstream LSR adds the
mapping to a label forwarding table. An LSP is established using the label mapping
information.
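The downstream-to-upstream label distribution described above can be sketched as follows. The node names, the starting label value 1024, and the table layout are assumptions made for the illustration; a real label advertisement protocol exchanges protocol messages instead of sharing a Python dictionary.

```python
def distribute_labels(path, fec, first_label=1024):
    """Build simplified label forwarding tables for one FEC.

    path lists the LSRs from ingress to egress. Starting at the egress,
    each LSR allocates an incoming label for the FEC and advertises it to
    its upstream neighbor, which records it as its outgoing label.
    """
    tables = {}
    out_label = None              # the egress has no downstream neighbor
    label = first_label
    for index in range(len(path) - 1, -1, -1):
        node = path[index]
        is_ingress = index == 0
        in_label = None if is_ingress else label
        tables[node] = {"fec": fec, "in_label": in_label, "out_label": out_label}
        out_label = in_label      # advertised to the upstream LSR
        label += 1
    return tables


tables = distribute_labels(["LSRA", "LSRB", "LSRC"], "3.3.3.3/32")
for node, entry in tables.items():
    print(node, entry)
```

The outgoing label of each upstream LSR equals the incoming label of its downstream neighbor, and this chain of mappings is what forms the LSP.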
LSPs can be either static or dynamic. Static LSPs are established manually. Dynamic LSPs
are established using a routing protocol and a label distribution protocol.
NOTE
A reachable route is only required on the ingress for establishing a static LSP, but not on the transit node
or egress.
A static LSP is established without label distribution protocols or the exchanging of control
packets. The static LSP has a low cost and is recommended for small-scale networks with
simple and stable topology. The static LSP cannot vary dynamically with the network
topology. Instead, it needs to be configured by an administrator.
An LSP for FEC with the destination address 3.3.3.3/32 is established on the MPLS network
shown in Figure 10-11.
Nodes along an LSP search the following tables for entries used to forward MPLS packets:
1. The ingress searches the FIB and NHLFE tables.
2. The transit node searches the ILM and NHLFE tables.
3. The egress searches the ILM table.
FIB entries, ILM entries, and NHLFEs are associated with each other using the token field in
a tunnel ID.
l The ingress performs the following steps:
a. Searches the FIB table and finds a tunnel ID mapped to a specific destination IP
address.
b. Finds an NHLFE mapped to the tunnel ID in the FIB table and associates the FIB
entry with the NHLFE.
c. Searches the NHLFE table for the outbound interface name, next-hop IP address,
outgoing label value, and label operation. The label operation type is Push.
d. Pushes a label into an IP packet, processes the EXP field based on a specific QoS
policy and TTL field and sends the encapsulated MPLS packet to a transit node.
l A transit node performs the following steps:
a. Searches the ILM table mapped to an MPLS label for the token.
b. Finds the NHLFE mapped to the token in the ILM table and associates the ILM
entry with the NHLFE.
c. Searches the NHLFE table for the outbound interface name, next-hop IP address,
outgoing label value, and label operation.
d. Processes the MPLS packets based on the specific label value:
n If the label value is greater than or equal to 16, the label operation is Swap.
The transit node performs the following operations:
○ Replaces the existing label with a new label in the MPLS packet.
○ Processes the EXP field and TTL field.
○ Forwards the MPLS packet with the new label to the egress.
n If the label value is 3, the label operation is Pop. The transit node performs the
following operations:
○ Removes the label from the MPLS packet.
○ Processes the EXP field and TTL field.
○ Forwards the packet over IP routes or based on the next layer label.
l The egress performs the following steps:
a. Searches for the label operation. The operation is Pop.
b. Processes the EXP field and TTL field.
c. Determines the forwarding path:
– When the S field in the label is equal to 1, the label is at the bottom of the stack.
Therefore, the egress forwards the packet over an IP route.
– When the S field in the label is equal to 0, the label is not at the bottom of the stack.
Therefore, the egress forwards the packet based on the next layer label.
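The table lookups performed by the ingress, a transit node, and the egress can be sketched with plain dictionaries. The token values, label values, and node names below are hypothetical, the FIB lookup uses an exact match instead of the longest match, and EXP and TTL processing is omitted for brevity.

```python
# Per-node tables: FIB and ILM entries point to an NHLFE through a token.
nodes = {
    "ingress": {"fib": {"3.3.3.3/32": 1}, "nhlfe": {1: ("push", 1025, "transit")}},
    "transit": {"ilm": {1025: 2}, "nhlfe": {2: ("swap", 1024, "egress")}},
    "egress": {"ilm": {1024: 3}},
}


def forward(dest):
    """Trace one packet along the LSP and return (node, operation, stack)."""
    trace = []

    # Ingress: FIB -> token -> NHLFE, then push the outgoing label.
    node = nodes["ingress"]
    token = node["fib"][dest]
    op, out_label, _next_hop = node["nhlfe"][token]
    stack = [out_label]
    trace.append(("ingress", op, list(stack)))

    # Transit: ILM -> token -> NHLFE, then swap the top label.
    node = nodes["transit"]
    token = node["ilm"][stack[0]]
    op, out_label, _next_hop = node["nhlfe"][token]
    stack[0] = out_label
    trace.append(("transit", op, list(stack)))

    # Egress: the ILM entry must exist; the label operation is Pop.
    node = nodes["egress"]
    _ = node["ilm"][stack[0]]
    stack.pop(0)
    trace.append(("egress", "pop", list(stack)))
    return trace


for step in forward("3.3.3.3/32"):
    print(step)
```

After the egress pops the only label, the stack is empty, which corresponds to the case in which the S field is 1 and the packet is forwarded over an IP route.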
As defined in RFC 3443, MPLS processes TTLs in either uniform or pipe mode. By default,
MPLS processes TTLs in pipe mode.
l Uniform mode
The IP TTL value decreases by one each time the packet passes through a node in an MPLS
network.
When IP packets enter the MPLS network shown in Figure 10-13, the ingress reduces
the IP TTL value in an IP packet by one and copies the IP TTL into the MPLS TTL field.
Each transit node only processes the MPLS TTL. The egress reduces the MPLS TTL by
one and copies it into the IP TTL field before the packet leaves the MPLS network.
[Figure 10-13: uniform mode on the path CE, PE, P, PE, CE. CE to PE: IP TTL 255. PE to P: MPLS TTL 254, IP TTL 254. P to PE: MPLS TTL 253, IP TTL 254. PE to CE: IP TTL 252]
l Pipe mode
The IP TTL value decreases by one only when passing through the ingress and egress.
On the network shown in Figure 10-14, the ingress reduces the IP TTL value in packets
by one and sets the MPLS TTL to a specific value. Transit nodes only process the MPLS
TTL. When the egress receives the packets, it removes the MPLS label carrying the
MPLS TTL from each packet and reduces the IP TTL value by one.
[Figure 10-14: pipe mode on the path CE, PE, P, PE, CE. CE to PE: IP TTL 255. PE to P: MPLS TTL 255, IP TTL 254. P to PE: MPLS TTL 254, IP TTL 254. PE to CE: IP TTL 253]
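The difference between the two modes can be checked with a short Python sketch that follows the descriptions above. The function name and parameters are hypothetical, the MPLS TTL set by the ingress in pipe mode is assumed to be 255, and TTL expiry is not modelled.

```python
def ip_ttl_after_mpls(ip_ttl, transit_hops, mode, pipe_mpls_ttl=255):
    """Return the IP TTL of a packet when it leaves the MPLS network."""
    if mode not in ("uniform", "pipe"):
        raise ValueError("mode must be 'uniform' or 'pipe'")

    # Ingress: reduce the IP TTL by one, then initialize the MPLS TTL.
    ip_ttl -= 1
    mpls_ttl = ip_ttl if mode == "uniform" else pipe_mpls_ttl

    # Transit nodes process only the MPLS TTL.
    mpls_ttl -= transit_hops

    # Egress: uniform mode copies the MPLS TTL back into the IP TTL,
    # whereas pipe mode discards it and reduces the IP TTL by one.
    if mode == "uniform":
        mpls_ttl -= 1
        ip_ttl = mpls_ttl
    else:
        ip_ttl -= 1
    return ip_ttl


print(ip_ttl_after_mpls(255, 1, "uniform"))
print(ip_ttl_after_mpls(255, 1, "pipe"))
```

With one transit node and an initial IP TTL of 255, the sketch yields 252 in uniform mode and 253 in pipe mode. In pipe mode the result does not depend on the number of transit nodes, so the MPLS network appears as a fixed two hops to the IP layer.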
Overview
On an MPLS network, when data fails to be transmitted across an LSP, the MPLS control
plane cannot detect the transmission failure. Network maintenance is difficult to carry out.
MPLS ping and traceroute functions provide a mechanism used to detect LSP faults and
locate faulty nodes.
MPLS ping is used to test the network connectivity and host accessibility. MPLS traceroute is
used to check network connectivity and locate network faults.
Similar to the IP ping and traceroute, the MPLS ping and traceroute monitor the LSP
availability using MPLS Echo Request and MPLS Echo Reply messages. These two messages
are sent over UDP with port number 3503. The receiver can distinguish between these
messages based on the received UDP port number.
An MPLS Echo Request message contains information about the FEC of the LSP to be
monitored. The message is forwarded along the LSP in the same way as other packets that
belong to the FEC, so the LSP itself is tested. MPLS Echo Request messages are transmitted
to the destination using MPLS, whereas MPLS Echo Reply messages are transmitted to the
source using IP.
The destination address in the IP header of the Echo Request message is set to 127.0.0.1/8 and
the IP TTL is set to 1. This prevents the egress from forwarding the message to other nodes.
Format of message
The MPLS Echo Request and MPLS Echo Reply messages use the same format, as shown in
Figure 10-15.
[Figure 10-15: message format; fields recoverable from the figure are Sender's Handle, Sequence Number, and TLVs]
Return Code The Return Code is set to zero by a sender. The receiver can set it
to one of the following values:
l 0: No return code
l 1: Malformed echo request received
l 2: One or more of the TLVs was not understood
l 3: Replying device is an egress for the FEC at stack-depth RSC
l 4: Replying device has no mapping for the FEC at stack-depth
RSC
l 5: Downstream mapping mismatch
l 6: Upstream interface index unknown
l 7: Reserved
l 8: Label switched at stack-depth RSC
l 9: Label switched but no MPLS forwarding at stack-depth RSC
l 10: Mapping for this FEC is not the given label at stack-depth
RSC
l 11: No label entry at stack-depth RSC
l 12: Protocol not associated with interface at FEC stack-depth
RSC
l 13: Premature termination of ping due to label stack shrinking
to a single label
Field Description
TimeStamp Sent Time (in seconds and microseconds) when the MPLS Echo
Request was sent.
TimeStamp Received Time (in seconds and microseconds) when the corresponding
MPLS Echo Request was received. It is carried in an MPLS Echo
Reply message.
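The Return Code values listed above lend themselves to a simple lookup table. The following Python sketch is illustrative only: the helper names are hypothetical, and treating code 3 as the sign that a ping reached the egress of the LSP is an assumption based on the description of that code.

```python
RETURN_CODES = {
    0: "No return code",
    1: "Malformed echo request received",
    2: "One or more of the TLVs was not understood",
    3: "Replying device is an egress for the FEC at stack-depth RSC",
    4: "Replying device has no mapping for the FEC at stack-depth RSC",
    5: "Downstream mapping mismatch",
    6: "Upstream interface index unknown",
    7: "Reserved",
    8: "Label switched at stack-depth RSC",
    9: "Label switched but no MPLS forwarding at stack-depth RSC",
    10: "Mapping for this FEC is not the given label at stack-depth RSC",
    11: "No label entry at stack-depth RSC",
    12: "Protocol not associated with interface at FEC stack-depth RSC",
    13: "Premature termination of ping due to label stack shrinking to a single label",
}


def describe(code):
    """Return the description of a Return Code value."""
    return RETURN_CODES.get(code, "Unknown return code")


def reached_egress(code):
    """Assumption: code 3 indicates that the reply came from the egress."""
    return code == 3


print(describe(3), reached_egress(3))
print(describe(8), reached_egress(8))
```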
Figure 10-16 shows the format of the TLV carried in the MPLS Echo Request and MPLS
Echo Reply messages used to monitor LDP LSPs. Figure 10-17 shows the format of the TLV
carried in the MPLS Echo Request and MPLS Echo Reply messages used to monitor RSVP
LSPs.
[Figures 10-16 and 10-17: TLV formats; fields recoverable from the figures are IPv4 prefix and Extended Tunnel ID]
MPLS Ping
[Figure 10-18: LSP from ATNA through CX-B and CX-C to ATND; the three links use the address pairs 1.1.1.1/30 and 1.1.1.2/30, 2.2.2.1/30 and 2.2.2.2/30, and 3.3.3.1/30 and 3.3.3.2/30]
As shown in Figure 10-18, an LSP whose FEC is the destination address of ATND is
established on ATNA. ATNA uses the MPLS ping feature to monitor the LSP:
1. ATNA checks whether the LSP exists. For a TE tunnel, ATNA checks whether the tunnel
interface exists and whether a CR-LSP is established successfully. If the LSP does not
exist, an error message is returned, and ATNA stops pinging. If the LSP exists, ATNA
proceeds with the following steps.
2. ATNA constructs an MPLS Echo Request packet. The destination address in the IP
packet header is 127.0.0.1/8 and the IP TTL is 1. ATNA searches for a matching LSP and
pushes a label (with the TTL of 255) of the LSP into the packet. Then, ATNA sends the
packet to CX-B.
3. CX-B and CX-C that serve as transit nodes forward the MPLS Echo Request packet as a
common MPLS packet.
If a transit node fails to forward the packet, the transit node returns a reply message
carrying the error code.
4. When the MPLS forwarding path is working properly, transit nodes forward the packet
successfully to ATND, namely, the egress of the LSP. ATND processes the packet and
replies with an MPLS Echo Reply packet.
MPLS Traceroute
As shown in Figure 10-18, ATNA uses the MPLS traceroute feature to monitor an LSP with
the destination address of 4.4.4.4/32:
1. ATNA checks whether an LSP exists.
– If the LSP exists, ATNA proceeds with the following steps.
– If the LSP does not exist, an error message is returned, and ATNA stops tracing the
route.
2. ATNA constructs an MPLS Echo Request packet. The destination address is 127.0.0.1/8
in the IP packet header and the IP TTL is 1. ATNA searches for a matching LSP and
pushes a label (with the TTL value of 1) of the LSP into the packet. Then, ATNA sends
the packet to CX-B. CX-B receives this packet and the TTL of the label times out. Then,
an MPLS Echo Reply message is returned. The destination UDP port and the destination
IP address of the MPLS Echo Reply message are the source UDP port and the source IP
address, respectively, of the MPLS Echo Request packet. The IP TTL is 255.
3. After receiving the MPLS Echo Reply message, ATNA sends an MPLS Echo Request
packet. The TTL of the label is 2. CX-B forwards this packet as a common MPLS
packet. CX-C receives this packet and the TTL of the label times out. Then, an MPLS
Echo Reply message is returned.
4. After receiving the MPLS Echo Reply message, ATNA sends an MPLS Echo Request
packet. The TTL of the label is 3. CX-B and CX-C forward this packet as a common
MPLS packet. ATND receives the packet and finds that the destination address of the
packet is a local loopback IP address. Then, ATND returns an MPLS Echo Reply message.
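The TTL-driven probing described in the preceding steps can be illustrated with a small
simulation. This is a minimal sketch, not the ATN implementation: the function name
mpls_traceroute, the max_ttl limit, and the rule that every node decrements the label TTL by 1
are assumptions made for the example.

```python
def mpls_traceroute(path, max_ttl=30):
    """Simulate MPLS traceroute along an LSP.

    path lists the nodes after the ingress in forwarding order, egress last.
    Returns one (label_ttl, replying_node) pair per Echo Request sent.
    """
    if not path:
        raise ValueError("path must contain at least the egress")
    egress_index = len(path) - 1
    replies = []
    for label_ttl in range(1, max_ttl + 1):
        remaining = label_ttl
        replier_index = egress_index
        for index in range(len(path)):
            remaining -= 1  # assumption: each node decrements the label TTL by 1
            # A node replies when the label TTL times out or when it is the egress.
            if remaining == 0 or index == egress_index:
                replier_index = index
                break
        replies.append((label_ttl, path[replier_index]))
        if replier_index == egress_index:
            break  # the egress has replied, so the trace is complete
    return replies
```

For the topology in Figure 10-18, calling mpls_traceroute(["CX-B", "CX-C", "ATND"]) yields one
reply per label TTL value, from CX-B, CX-C, and ATND in that order.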
10.1.3 Applications
10.1.3.1 MPLS-based VPN
A traditional virtual private network (VPN) transmits private network data over a public
network using tunneling protocols, such as the Generic Routing Encapsulation (GRE), Layer
2 Tunneling Protocol (L2TP), and Point to Point Tunneling Protocol (PPTP).
The MPLS-based VPN technology establishes LSPs to connect private network branches
within a single VPN and to connect VPNs. Figure 10-19 shows the devices in the MPLS-
based VPN.
The following devices are deployed on the MPLS-based VPN:
l Customer edge (CE): an edge device on a customer network. The CE can be a router, a
switch, or a host.
l Provider edge (PE): an edge device on a service provider (SP) network.
l Provider (P): a backbone device on an SP network. A P is not directly connected to CEs.
Ps only need to provide basic MPLS forwarding capabilities and do not maintain VPN
information.
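Figure 10-19 (diagram omitted): CE1 in VPN branch 1 connects to PE1, and CE2 in VPN branch 2
connects to PE2. PE1 and PE2 are connected over the backbone network.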
Policy-based routing (PBR) enables the ATN to select routes based on a user-defined policy,
which helps transmit traffic securely or balance traffic. On an MPLS network, IP packets that
meet a PBR policy can be forwarded along a specified LSP.
In Figure 10-20, ATN-A, ATN-B, ATN-C, ATN-D, and ATN-E are in the original network.
ATN-F and ATN-G are added to provide new services. Traffic is forwarded as follows:
l Traffic for original services is forwarded through the original network.
l Traffic for new services is forwarded by ATN-F and ATN-G.
Figure 10-20 (diagram omitted): the recoverable device labels are ATN-F and ATN-G.
To allow part of the new services to pass through the original network, PBR can be
configured on ATN-A. The services matching a specific PBR policy can travel along LSPs
over the original network.
PBR to an LSP can also be used together with LDP FRR to divert some traffic to the
backup LSP for load balancing.
Term Definition
DoD downstream-on-demand
DU downstream unsolicited
10.2.1 Introduction
Definition
The Label Distribution Protocol (LDP) is a control protocol of Multiprotocol Label Switching
(MPLS). It is similar to a signaling protocol working on a traditional network. It classifies
packets based on forwarding equivalence classes (FECs), distributes labels, and establishes
and maintains label switched paths (LSPs). In addition, LDP defines the messages and
procedures for distributing labels.
Purpose
MPLS supports multiple labels, and its forwarding plane is connection-oriented. This
scalability enables an MPLS/IP-based network to provide various services. Label
switching routers (LSRs) run LDP to map routing information at the network layer to the
switched paths at the data link layer, and thereby establish LSPs at the network layer. LDP
features simple networking and configuration, supports route topology-driven establishment
of LSPs and a large number of LSPs, and is widely used to provide virtual private network
(VPN) services.
10.2.2 Principles
10.2.2.1 Concepts
The MPLS architecture consists of multiple label distribution protocols, among which LDP is
widely used. Label switching routers (LSRs) exchange LDP messages to obtain information
about incoming labels, next-hop nodes, and outgoing labels for specified FECs so that they
can establish LSPs. For LDP specifications, see RFC 5036 titled "LDP Specification."
LDP Adjacency
When an LSR receives a Hello message from a peer, the LSR establishes an adjacency with
the peer. An LDP adjacency maintains a peer relationship between the two LSRs. There are
two types of LDP adjacencies:
l Local adjacency: established by exchanging Link Hello messages between two LSRs.
l Remote adjacency: established by exchanging Targeted Hello messages between two LSRs.
LDP Peers
Two LDP peers establish an LDP session and exchange Label Mapping messages over the
session so that they can establish an LSP.
LDP peers learn each other's labels through the LDP session between them.
LDP Sessions
An LDP session established between LSRs helps them exchange messages, such as Label
Mapping messages and Label Release messages. LDP sessions are classified into the
following types:
l Local LDP session: established between two LSRs that are directly connected.
l Remote LDP session: established between two LSRs that are directly or indirectly
connected.
Figure 10-21 (diagram omitted): the two LSRs exchange Hello messages (Step1) and set up a TCP
connection (Step2). The active role sends an Initialization message to negotiate parameters
(Step3). When the parameters are received, an Initialization message and a Keepalive message
are sent (Step4). When the parameters are received, a Keepalive message is sent (Step5).
1. Two LSRs send a Hello message to each other. The Hello message contains the transport
address that the two parties use to establish an LDP session. The LSR with the larger
transport address initiates a TCP connection and functions as the active role. As shown
in Figure 10-21, LSRA starts to establish a TCP connection and functions as the active
role, and LSRB waits for the TCP connection request and functions as the passive role.
2. After the TCP connection is successfully established, LSRA sends an Initialization
message to negotiate parameters used to establish the LDP session with LSRB. These
parameters include the LDP version, label distribution mode, value of the Keepalive
timer, maximum length of PDUs, and label space.
3. After LSRB receives the Initialization message, either of the following situations occurs:
– If LSRB rejects some parameters, it sends a Notification message to instruct LSRA
to terminate the process of establishing the LDP session. The whole process ends.
– If LSRB accepts all parameters, it sends an Initialization message and a Keepalive
message to LSRA.
4. After LSRA receives the Initialization message, either of the following situations occurs:
– If the active role LSRA cannot accept some parameters, it sends a Notification
message to LSRB to terminate the process of establishing the LDP session.
– If LSRA accepts all parameters, it sends a Keepalive message to LSRB.
After both LSRs receive the Keepalive messages from each other, the LDP session is
successfully established.
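The role selection and the message exchange in the preceding steps can be sketched in Python as
follows. The function names active_role and negotiate_session and the boolean inputs are
assumptions made for illustration. The sketch models only who sends which message, not the
actual parameter negotiation, and it assumes that both transport addresses belong to the same
address family.

```python
import ipaddress


def active_role(transport_a, transport_b):
    """Return the transport address of the LSR that initiates the TCP connection.

    The LSR with the larger transport address plays the active role.
    """
    a = ipaddress.ip_address(transport_a)
    b = ipaddress.ip_address(transport_b)
    return transport_a if a > b else transport_b


def negotiate_session(passive_accepts, active_accepts):
    """Replay the Initialization/Keepalive exchange after the TCP connection is up.

    Returns (established, messages); messages lists (sender, message type) in order.
    """
    messages = [("active", "Initialization")]
    if not passive_accepts:
        messages.append(("passive", "Notification"))
        return False, messages
    messages.append(("passive", "Initialization"))
    messages.append(("passive", "Keepalive"))
    if not active_accepts:
        messages.append(("active", "Notification"))
        return False, messages
    messages.append(("active", "Keepalive"))
    return True, messages
```

The addresses are compared numerically rather than as strings, so 10.1.1.10 is treated as
larger than 10.1.1.9.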
algorithm to calculate a PQ node, the ingress needs to run LDP to automatically establish a
remote LDP session with the destination IP address set to the PQ node's IP address.
After a Remote LFA-enabled LSR receives a Targeted Hello message with the R bit of 1, the
LSR automatically establishes a remote LDP peer relationship with its peer and replies with a
Targeted Hello message with the R bit of 0, which triggers the establishment of a remote LDP
session. The R bit of 1 in the Targeted Hello message indicates that the receive end is
requested to periodically reply with a Targeted Hello message. The R bit of 0 in the Targeted
Hello message indicates that the receive end does not need to periodically reply with a
Targeted Hello message. If the LSR no longer receives Targeted Hello messages with the R bit
of 1, the LSR deletes the established remote LDP session.
NOTE
On the ATN, LDP by default works in the DU label advertisement mode, ordered label control mode,
and liberal label retention mode.
As shown in Figure 10-23, the upstream ingress sends the Label Request message. The
downstream egress receives this message and sends the Label Mapping message
upstream to advertise the label of the host route to 192.168.1.1/32.
An upstream LSR and a downstream LSR must use the same label advertisement mode.
If the next hop of an LSR changes, either of the following situations occurs:
l In Liberal mode, the LSR can use an existing label advertised by a non-next-hop LSR to
quickly establish an LSP. (For information about the establishment of an LDP LSP, see
Establishment of an LDP LSP.) Liberal mode requires more memory and label space
than conservative mode.
An LSP that is assigned a label but is not successfully established is called a Liberal LSP.
l In Conservative mode, the LSR only preserves the label advertised by a new next hop.
This mode saves memory and label space, but the LSP is reestablished more slowly.
Conservative label retention mode is usually used together with DoD on the LSRs that
have limited label space.
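The difference between the two label retention modes can be shown with a short Python sketch.
The function names, the dictionary layout (peer name mapped to the label that the peer
advertised for one FEC), and the label values are assumptions made for illustration.

```python
def retained_labels(advertised, next_hop, mode):
    """Return the labels an LSR keeps for one FEC.

    advertised maps each peer to the label it advertised for the FEC.
    """
    if mode == "liberal":
        # Keep every label, including those from peers that are not the next hop.
        return dict(advertised)
    if mode == "conservative":
        # Keep only the label advertised by the current next hop.
        return {peer: label for peer, label in advertised.items() if peer == next_hop}
    raise ValueError("mode must be 'liberal' or 'conservative'")


def label_after_next_hop_change(retained, new_next_hop):
    """Return the label usable immediately after the next hop changes.

    None means that the LSR must first obtain a label from the new next hop.
    """
    return retained.get(new_next_hop)
```

With labels advertised by two peers, liberal mode already holds the label from the new next
hop when the route changes, whereas conservative mode returns None and the LSP has to be
reestablished first.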
1. If an LER finds a new host route in the routing table and the destination IP address in the
host route is not mapped to any existing FEC, the LER by default creates a FEC for the
destination IP address.
2. If the egress has available labels, it distributes labels for FECs and proactively sends a
Label Mapping message to an upstream transit LSR. The Label Mapping message
contains distributed labels and bound FECs.
3. After receiving the Label Mapping message, a transit LSR adds the mapping entry to its
label forwarding table and then proactively sends a Label Mapping message of the
specified FEC to the ingress.
4. After receiving the Label Mapping message, the ingress also adds the mapping to its
label forwarding table. An LSP is established, and the packets classified as the FEC can
be forwarded based on the label.
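The hop-by-hop label distribution in the preceding steps (DU advertisement with ordered
control) can be sketched as follows. The function name establish_lsp, the table layout, and
the label numbering starting at 1024 are assumptions made for the example. Real LSRs allocate
labels independently from their own label spaces.

```python
def establish_lsp(path, fec, first_label=1024):
    """Build per-node label forwarding entries for one FEC.

    path lists the nodes from ingress to egress. Label Mapping messages travel
    upstream, so the walk starts at the egress.
    """
    if len(path) < 2:
        raise ValueError("path needs at least an ingress and an egress")
    tables = {}
    downstream_label = None  # the egress has no outgoing label for the FEC
    next_free_label = first_label
    for index in range(len(path) - 1, -1, -1):
        node = path[index]
        if index == 0:
            in_label = None  # the ingress receives unlabeled packets for the FEC
        else:
            in_label = next_free_label  # label advertised to the upstream node
            next_free_label += 1
        tables[node] = {fec: {"in": in_label, "out": downstream_label}}
        downstream_label = in_label
    return tables
```

Each node's outgoing label equals the incoming label of its downstream neighbor, which is what
allows packets classified into the FEC to be label-switched along the LSP.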
NOTE
The maximum number of equal-cost LDP LSPs that can be established on the ingress or a transit node
depends on the device type.
A proxy egress LSP can be established on a network with MPLS-incapable routers or in the
Border Gateway Protocol (BGP) route load balancing scenario. For example, on the network
shown in Figure 10-24, LSRA, LSRB, and LSRC are in an MPLS domain, and LSRD is not.
An LSP is established along the path LSRA -> LSRB -> LSRC. LSRC functions as a proxy
egress and extends the LSP to LSRD. The extended LSP is a proxy egress LSP.
Proxy Egress
Principles
After the GR Restarter performs an AMB/SMB switchover, the GR Helper's interface may go
Up slowly. As a result, the GR Helper fails to receive the Hello messages in the following
situations:
l A coexistent local and remote LDP session is established between GR-enabled LSRs
(called the GR Restarter and GR Helper).
l Multiple LDP-enabled links reside between the GR Restarter and GR Helper.
This causes link protocol timeout on the control plane: the GR Helper cannot receive Hello
messages before the Hello hold timer expires. In this situation, the LDP adjacency goes
Down without bringing down the LDP session between the GR Restarter and GR Helper,
because other LDP adjacencies still maintain the session. As a result, the GR Helper does not
enter the GR process and deletes the LDP LSP for the LDP adjacency that went Down.
To prevent this problem, delayed LDP adjacency deletion can be used. With this setting,
LDP deletes an LDP adjacency in the Down state and its LDP LSP only after a specified delay
has elapsed since the adjacency went Down, ensuring stable LSP traffic transmission.
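The effect of the deletion delay can be sketched as a simple decision function. This is a
hedged illustration only: the function name adjacency_action, its parameters, and the three
result strings are assumptions, and the sketch does not describe how the ATN implements the
timer internally.

```python
def adjacency_action(down_at, now, delete_delay, hello_received_at=None):
    """Decide what to do with an LDP adjacency that went Down at time down_at.

    All times are in seconds on the same clock. hello_received_at is the time
    of the first Hello message received after the adjacency went Down, if any.
    """
    deadline = down_at + delete_delay
    if hello_received_at is not None and down_at <= hello_received_at < deadline:
        # Hello messages resumed within the delay: keep the adjacency and its LSP.
        return "restore"
    if now >= deadline:
        # The delay elapsed without a Hello message: delete the adjacency and its LSP.
        return "delete"
    return "hold"
```

With a delay of 10 seconds, an adjacency whose Hello messages resume after 4 seconds is
restored and its LSP is never deleted, which is the behavior the delay is intended to achieve.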
Implementation
Delayed LDP adjacency deletion is implemented as follows:
Usage Scenario
Delayed LDP adjacency deletion is used when a coexistent local and remote LDP session is
established or multiple LDP-enabled links reside between the GR Restarter and Helper.
l Scenario in which a coexistent local and remote LDP session is established between the
GR Restarter and Helper
On the network shown in Figure 10-25, a coexistent local and remote LDP session is
established between the GR Restarter and Helper and is maintained by the local and
remote LDP adjacencies.
After the GR Restarter performs an AMB/SMB switchover, an interface may go Up
slowly before being able to send Hello messages to the GR Helper. The GR Helper
cannot receive the Hello messages before the Hello hold timer expires. This causes the
LDP local adjacency to go Down. The LDP session does not go Down, because it is
maintained by the remote LDP adjacency. As a result, the GR Helper does not enter the
GR process and deletes the LDP LSP for the local LDP adjacency. This causes traffic
loss during the GR process. In this case, delayed LDP adjacency deletion can be
deployed so that the local adjacency and its LSP are deleted only after the specified
delay. As a result, the link protocol can be restored to ensure traffic transmission.
Figure 10-25 Scenario in which a coexistent local and remote LDP session is established
between the GR Restarter and Helper
(Diagram omitted: the GR Restarter and GR Helper are connected by a local adjacency and a remote adjacency.)
l Scenario in which multiple links exist between the GR Restarter and Helper
On the network shown in Figure 10-26, an LDP session is established between the GR
Restarter and Helper and is maintained by two local LDP adjacencies.
After the GR Restarter performs an AMB/SMB switchover, an interface may go Up
slowly before being able to send Hello messages to the GR Helper. The GR Helper
cannot receive the Hello messages before the Hello hold timer expires. This causes LDP
local adjacency 1 to go Down. The LDP session does not go Down, because it is
maintained by local LDP adjacency 2. As a result, the GR Helper does not enter the GR
process and deletes the LDP LSP for local LDP adjacency 1. This causes traffic loss
during the GR process. In this case, delayed LDP adjacency deletion can be deployed so
that local adjacency 1 and its LSP are deleted only after the specified delay. As a result,
the link protocol can be recovered to ensure traffic transmission.
Figure 10-26 Scenario in which multiple links exist between the GR Restarter and
Helper
(Diagram omitted: the GR Restarter and GR Helper are connected by Local Adjacency1 and Local Adjacency2.)
Benefits
This function minimizes packet loss during a GR process and helps implement stable traffic
transmission when a coexistent local and remote LDP session is established or multiple LDP-
enabled links reside between the GR Restarter and Helper.
Background
LDP-IGP synchronization enables the LDP status and the IGP status to go Up simultaneously,
which helps minimize traffic interruption time if a fault occurs.
LDP converges more slowly than IGP routes, which can cause traffic loss. On the network
shown in Figure 10-27, which is configured with active and standby links, traffic is dropped
in the following situations:
1. When an active link fails, an IGP route of a standby link becomes reachable. A backup
LSP over the standby link takes over traffic. This process is implemented usually using
LDP FRR. After the active link recovers, the IGP route of the active link becomes
reachable before an LDP session is established over the active link. As a result, traffic is
dropped when being transmitted using the reachable IGP route along the unreachable
LSP.
2. When the IGP route of the active link is reachable and an LDP session between nodes on
the active link fails, traffic is directed using the IGP route of the active link, whereas the
LSP over the active link is torn down. Because a preferred IGP route of the standby link
is unavailable, an LSP over the standby link cannot be established, causing traffic loss.
Figure 10-27 (diagram omitted): the recoverable labels are LSR4, Primary tunnel, and Backup tunnel.
Related Concepts
LDP-IGP synchronization delays IGP route advertisement to ensure that the LDP session and
IGP route converge simultaneously.
Implementation
l LDP-IGP synchronization state machine
After LDP-IGP synchronization is enabled on an interface, the LDP-IGP synchronization
state machine operates based on the flowchart shown in Figure 10-28.
Figure 10-28 (flowchart omitted): the recoverable labels are Init, Query interface and LDP
session status, and LDP session goes Down.
NOTE
The Hold-down timer can be set. Using the default value is recommended.
b. The following process takes place after the physical fault is rectified on the active
link:
i. An LDP session between nodes on the active link fails.
ii. The LDP module notifies the IGP module of the fault. The IGP interface enters
the Hold-max-cost state. An IGP advertises the maximum cost of the active
link and starts the Hold-max-cost timer.
iii. The IGP route of the standby link becomes reachable.
iv. An LSP is established over the standby link and the LDP module on LSR2
delivers forwarding entries.
NOTE
The Hold-max-cost timer can be configured to always advertise the maximum cost of the
active link. This setting allows traffic to keep traveling through the standby link before the
LDP session over the active link is reestablished.
Other Functions
On the MPLS network shown in Figure 10-29, a graceful restart (GR) process is performed
after LSR2 goes faulty. The LDP session between LSR2 and LSR3 may be established after
the GR process is complete. If LDP-IGP synchronization is enabled on the interface between
LSR2 and LSR3, LSR2 and LSR3 perform the following operations:
l LSR2 functioning as a GR Restarter
a. During the GR process, an IGP advertises the actual cost of the active link and
starts the GR Delay timer that delays the GR completion. The LDP session is
waiting to be established before the GR is complete.
b. After the GR Delay timer expires, the GR is complete. If the LDP session is not
established at this time, the IGP starts the Hold-max-cost timer and advertises the
maximum active link cost of the interface, switching the IGP route to the standby
link.
c. If the LDP session is reestablished or the Hold-max-cost timer expires, the IGP
resumes the actual link cost of the interface, switching the IGP route back to the
active link.
l LSR3 functioning as a GR Helper
a. LSR3 retains the original IGP route and the LSP before the LDP GR is complete.
When the LDP session goes Down, LDP does not notify the IGP link of the session
Down event. In this case, the IGP still advertises the actual link cost, ensuring that
the IGP route is not switched to the standby link.
b. If the LDP session is not established after the GR is complete, the IGP starts the
Hold-max-cost timer and advertises the maximum active link cost of the interface,
switching the IGP route to the standby link.
c. If the LDP session is reestablished or the Hold-max-cost timer expires, the IGP
resumes the actual link cost of the interface, switching the IGP route back to the
active link.
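The cost rule shared by the preceding two lists (advertise the maximum cost while the LDP
session is Down and the Hold-max-cost timer is running, otherwise advertise the actual cost)
can be sketched as follows. The function name advertised_cost, its parameters, and the value
65535 used for the maximum cost are assumptions. The real maximum depends on the IGP in use.

```python
MAX_LINK_COST = 65535  # assumption: placeholder for the IGP's maximum link cost


def advertised_cost(actual_cost, ldp_session_up, elapsed, hold_max_cost_timer):
    """Return the link cost the IGP advertises once GR is complete.

    elapsed is the time since the Hold-max-cost timer started.
    hold_max_cost_timer is the timer length, or None if the timer is configured
    to always advertise the maximum cost until the LDP session is reestablished.
    """
    if ldp_session_up:
        return actual_cost  # LDP and the IGP are in sync
    if hold_max_cost_timer is not None and elapsed >= hold_max_cost_timer:
        return actual_cost  # the timer expired, so the actual cost is resumed
    return MAX_LINK_COST  # steer the IGP route to the standby link
```

Passing None as the timer length models the setting in which traffic keeps traveling through
the standby link until the LDP session over the active link is reestablished.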
Usage Scenario
Figure 10-30 shows an LDP-IGP synchronization scenario.
On the network shown in Figure 10-30, an active link and a standby link are established.
LDP-IGP synchronization and LDP FRR are deployed.
Figure 10-30 (diagram omitted): the recoverable labels are PE1, P1, P3, P4, PE2, CE2, CE4, VPN,
Primary tunnel, and Backup tunnel.
Benefits
Packet loss is reduced during an active/standby link switchover or the GR process, improving
network reliability.
delay traffic switchback to the active link. This process ensures that LDP is synchronized with
static routes.
Synchronization between LDP and static routes is used to minimize packet loss during a
traffic switchover or switchback on the network with active and standby links. As shown in
Figure 10-31, on a network with active and standby links, a static route is configured between
LSRA and LSRD, and an LSP between the two devices is established based on the static
route. Normally, Link A is preferred.
Figure 10-31 Networking diagram for LSP switching with synchronization between LDP and
static routes
(Diagram omitted: LSRA connects to LSRD over Link A through LSRB and over Link B through LSRC.)
l Switchover scenario
If the link between LSRA and LSRB is working properly but the LDP session between
LSRA and LSRB goes Down, the static route on the active link Link A remains reachable
but the LSP on Link A is deleted. The static route on the standby link Link B is not
available, so no LSP can be established on Link B, and the traffic carried on the
deleted LSP is lost.
With synchronization between LDP and static routes enabled on LSRA, when the LDP
session between LSRA and LSRB goes Down, the static route on Link A becomes
unreachable. The LDP session between LSRA and LSRC goes Up, causing the
static route on Link B to become reachable. As a result, the LSP switches from Link A to
Link B so that traffic on the LSP is not interrupted.
l Switchback scenario
If the link between LSRA and LSRB fails, both the static route and the LSP on Link A
switch to Link B. After the link between LSRA and LSRB recovers, the static route
switches back to Link A before the LSP does, because a static route converges
faster than LDP. In this case, the backup LSP on Link B can no longer be used and the LSP
on Link A is not yet established, which interrupts LSP traffic.
With synchronization between LDP and static routes enabled on LSRA, the static
route on Link A does not become reachable until the LDP session between LSRA and
LSRB goes Up. This configuration enables the static route and the LSP to switch back to
Link A at the same time, which prevents traffic loss.
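Both scenarios reduce to one rule: with synchronization enabled, a static route counts as
reachable only while the LDP session on its link is Up. The following Python sketch
illustrates this rule. The function names and the tuple layout (link name, link state, LDP
session state) are assumptions made for the example.

```python
def static_route_reachable(link_up, ldp_session_up, sync_enabled):
    """Return True if the static route over a link may be used for forwarding."""
    if not link_up:
        return False
    # With synchronization, reachability also depends on the LDP session state.
    return ldp_session_up if sync_enabled else True


def select_link(links, sync_enabled):
    """Pick the first reachable link; links are ordered by preference.

    Each entry is (name, link_up, ldp_session_up). Returns None if no link is usable.
    """
    for name, link_up, ldp_session_up in links:
        if static_route_reachable(link_up, ldp_session_up, sync_enabled):
            return name
    return None
```

In the switchover scenario (Link A is Up but its LDP session is Down), the sketch keeps
selecting Link A without synchronization, where no LSP exists, and selects Link B once
synchronization is enabled.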
10.2.2.8 LDP GR
LDP graceful restart (GR), with the help of a helper, implements uninterrupted forwarding
during an active main board (AMB)/standby main board (SMB) switchover. Without GR,
LDP GR timers
l MPLS Forwarding State Holding timer: When a GR restarter restarts the LDP protocol,
it sets forwarding entries to the Stale state and starts this timer. After this timer expires,
the GR restarter deletes forwarding entries in the Stale state.
l LDP Reconnect timer: After a GR helper finds that the LDP session established with the
GR restarter goes Down, the helper retains the FEC label mapping for the LDP session,
sets the mapping to the Stale state, and starts the LDP Reconnect timer. If the LDP
session is not established after this timer expires, the GR helper deletes the FEC label
mapping and forwarding entries for the LDP session.
l LDP Recovery timer: After the LDP session is reestablished between the GR helper and
restarter, the GR helper starts the LDP recovery timer. After this timer expires, the GR
helper deletes the stale FEC label mapping and forwarding entries for the LDP session.
Figure 10-32 describes the usage scenario and timing sequence of using the preceding timers.
Figure 10-32 Usage scenario and timing sequence of using LDP GR timers
(Timing diagram omitted. GR Restarter timeline: an LDP session goes Down, LDP restarts, the
LDP session is reestablished, and a forwarding entry is created. GR Helper timeline: the helper
detects that the LDP session went Down, the LDP session is reestablished, and a forwarding
entry is created. Other labels: MPLS Forwarding State Holding timer, session reestablishment time.)
LDP GR implementation
4. If the LDP session is reestablished between the restarter and helper before the LDP
Reconnect timer expires, the helper stops the LDP Reconnect timer and starts the LDP
Recovery timer.
5. Before the LDP Recovery timer expires, the helper and restarter exchange Label
Mapping messages to restore forwarding entries.
6. After the MPLS Forwarding State Holding timer expires, the restarter deletes the stale
forwarding entries.
7. After the LDP Recovery timer expires, the helper deletes the stale FEC label mapping
and forwarding entries.
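From the GR helper's point of view, the LDP Reconnect and LDP Recovery timers determine which
stale FEC label mappings survive. The following Python sketch summarizes that outcome. The
function name, its parameters, and the use of seconds counted from the moment the helper
detects the session failure are assumptions made for illustration.

```python
def helper_entries_after_gr(stale_fecs, reconnect_timer, reestablished_at, relearned_fecs):
    """Return the FECs the GR helper still holds after the LDP Recovery timer expires.

    stale_fecs: FECs marked Stale when the session went Down (time 0).
    reestablished_at: when the session came back, or None if it never did.
    relearned_fecs: FECs restored through Label Mapping messages during recovery.
    """
    if reestablished_at is None or reestablished_at >= reconnect_timer:
        # The LDP Reconnect timer expired first: all stale mappings are deleted.
        return set()
    # The session came back in time: relearned mappings are kept, and the rest
    # are still stale and are deleted when the LDP Recovery timer expires.
    return set(stale_fecs) & set(relearned_fecs)
```

The sketch shows only the end state. It does not model the forwarding that continues on the
stale entries while the timers are running.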
(Diagram omitted: both ends establish an LDP session, exchange Label Mapping messages, and
create forwarding entries.)
NOTE
The ATN can only function as the GR helper.
The non-stop routing (NSR) technology is an innovation based on the non-stop forwarding
(NSF) technology but differs fundamentally from NSF. If a software or hardware fault
occurs on the control plane, NSR ensures uninterrupted forwarding and uninterrupted
connections on the control plane. In addition, the control plane of a neighbor does not detect
the fault.
LDP NSR is implemented through synchronization between the master control board and
slave control board. When it starts, the slave control board backs up the data of the master
board in batches to ensure data consistency on both boards at this stage. LDP NSR then
notifies the master and slave control boards simultaneously of received packets in real time,
so that the slave control board stays synchronized with the master board. After a
switchover, the slave board quickly takes over services from the original master board, and
the neighbor does not detect the fault on the local router.
LDP NSR synchronizes the following key data between the master control board and slave
control board:
l LSP forwarding entries
l Cross connect (XC) information, used to describe the cross connection between a
forwarding equivalence class (FEC) and an LSP
l Labels, including the following types:
– LDP LSP labels on a public network
– Labels of VCs in Martini mode in a VLL networking
– Labels of VCs in Martini mode in a VPLS networking
– PW labels used by dynamic PWs in a PWE3 networking
l LDP protocol control blocks
Usage Scenario
Figure 10-34 Typical usage scenario for LDP FRR (triangle topology)
(Diagram omitted: triangle topology of LSRA, LSRB, and LSRC.)
Figure 10-34 shows a typical usage scenario for LDP FRR. The preferred LSRA-to-LSRB
route is LSRA-LSRB, and the second optimal route is LSRA-LSRC-LSRB. A primary LSP
between LSRA and LSRB is established on LSRA, and a backup LSP of LSRA-LSRC-LSRB
is established to protect the primary LSP. After receiving a label from LSRC, LSRA compares
the label with the LSRA-to-LSRB route. Because the next hop of the LSRA-to-LSRB route is
not LSRC, LSRA preserves the label as a liberal label. LSRA can establish a backup LSP from
the liberal label if either of the following conditions is met:
l The source of a liberal label for LDP manual FRR corresponds to a specified outbound
interface and next hop.
l The backup route corresponding to the source of the liberal label for LDP auto FRR
exists, its destination meets the policy for LDP to create a backup LSP, and no
manual FRR backup LSP is established over the backup route.
In this case, LSRA applies for a forwarding entry for the liberal label, establishes a backup
LSP as the backup forwarding entry of the primary LSP, and sends the entries mapped to both
the primary and backup LSPs to the forwarding plane. In this way, the primary LSP is
associated with the backup LSP.
LDP FRR is triggered when the interface itself detects a fault, BFD detects a fault on the
interface, or BFD detects a primary LSP failure. After LDP FRR is complete, traffic is
switched to the backup LSP based on the backup forwarding entry. Then, the route
converges to LSRA-LSRC-LSRB. A new LSP is established over the path of the original
backup LSP, the original primary LSP is torn down, and the traffic is forwarded along
the new LSP over the path LSRA-LSRC-LSRB.
Figure 10-35 Typical usage scenario for LDP FRR (rectangle topology)
(Diagram omitted: rectangle topology of S, D, N1, and N2.)
LDP FRR is applicable to a triangle network with three devices deployed, but may not be
supported in a rectangle network with four devices. On the network shown in Figure
10-35, if the optimal route from N1 to D is N1-N2-D (load balancing is unavailable), S
receives a liberal label from N1 and is bound to LDP FRR. If the link between S and D is
faulty, traffic is switched to the route of S-N1-N2-D without forming a loop.
However, if the optimal route from N1 to D is load balanced between N1-N2-D and N1-S-D,
S, as a downstream neighbor of N1, does not necessarily receive the liberal label from
N1. In addition, even if S receives the liberal label (LDP distributes labels for each peer)
and is configured with LDP FRR, traffic may still return to S after it switches to N1,
which forms a loop until the route from N1 to D converges to N1-N2-D.
LDP Remote LFA FRR
LDP LFA FRR cannot calculate backup paths on large networks, especially ring networks,
and therefore fails to meet reliability requirements. To address this issue, LDP Remote LFA
FRR is used. LDP Remote LFA FRR is implemented as LDP Auto FRR based on IGP Remote LFA
FRR (OSPF IP FRR). Figure 10-36 illustrates the typical LDP Auto FRR usage scenario. The
primary LDP LSP is established over the path PE1 -> PE2. Remote LFA FRR establishes a
Remote LFA FRR LSP over the path PE1 -> P2 -> PE2 to protect the primary LDP LSP.
Figure 10-36 Typical LDP Auto FRR usage scenario - ring topology
(Diagram omitted: ring topology of PE1, PE2, P1, and P2, where P2 is the PQ node. Legend: LDP tunnel.)
3. LDP-enabled PE1 establishes an LDP LSP over the path PE1 -> P1 -> P2 with the
iterated outbound interface's next hop. This LSP is called a Remote LFA FRR iterated
LSP.
If PE1 detects a fault, PE1 rapidly switches traffic to the Remote LFA FRR LSP.
Principles
LDP LSP forwarding and common IP forwarding differ greatly in implementation
mechanism but handle the MTU in a similar way. Both of them are
required to send packets smoothly to the receiver through each hop without reassembly.
The MPLS MTU, like the interface MTU, has a default value and is configurable. Before
informing the upstream device of the LDP MTU, an LSR calculates the LDP MTU by
selecting the smallest value among the MTU values used by all downstream devices and the
MTU of the egress. The LSR adds this smallest MTU value to the MTU TLV of a Label
Mapping message and then sends the message to an upstream device. If any of the MTUs
mentioned previously changes due to configuration modifications, or the outbound interface
changes on the local end, the LSR recalculates the MTU and sends a Label Mapping message
that contains the recalculated MTU to all upstream devices.
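The MTU calculation described above is a minimum over the values that an LSR knows about. The
following Python sketch shows the idea. The function name ldp_mtu and the parameter name
local_mtu are assumptions. The text calls this second input "the MTU of the egress", and the
sketch reads it as the MTU configured on the local outbound side, which is an interpretation.

```python
def ldp_mtu(downstream_mtus, local_mtu):
    """Return the MTU to place in the MTU TLV of a Label Mapping message.

    downstream_mtus: MTU values advertised by downstream devices (may be empty).
    local_mtu: the locally configured MTU (assumed reading of "MTU of the egress").
    """
    return min([local_mtu, *downstream_mtus])
```

If a configuration change lowers any input, calling the function again yields the recalculated
value that is then advertised to all upstream devices.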
LDP MD5
Message-digest algorithm 5 (MD5) is a standard digest algorithm defined in RFC 1321.
Typically, MD5 is used to compute a message digest to prevent message spoofing. The MD5
message digest is a unique result calculated by an irreversible character string algorithm. If a
message is modified during transmission, a different digest is generated. After the message
arrives at the receiving end, the receiving end can determine whether the packet is modified
by comparing the received digest with the digest that it computes locally.
LDP MD5 protects LDP packets against modification by generating a unique digest for each
message. This authentication is stricter than the common checksum verification of TCP
connections.
Before sending packets over a TCP connection, the sender performs LDP MD5 authentication
by adding the unique message digest after the TCP header. The message digest is computed
using the TCP header, LDP message, and password set by the user.
After receiving this TCP packet, the receiver obtains the TCP header, digest, and LDP
message, and uses MD5 to calculate a digest based on the received TCP header, received LDP
message, and locally stored password. The receiver compares the calculated digest with the
received one to check whether the packet is modified.
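The sender and receiver procedures above can be sketched as follows. This is a simplified illustration that assumes the digest is taken over the plain concatenation of the TCP header, LDP message, and password. The exact byte layout used on the wire is not described here and may cover additional fields. The function names are hypothetical.

```python
import hashlib
import hmac


def ldp_md5_digest(tcp_header: bytes, ldp_message: bytes, password: bytes) -> bytes:
    """Compute an MD5 digest over the TCP header, LDP message, and password.

    The concatenation order is an assumption made for this sketch.
    """
    md5 = hashlib.md5()
    md5.update(tcp_header)
    md5.update(ldp_message)
    md5.update(password)
    return md5.digest()


def packet_is_unmodified(tcp_header: bytes, ldp_message: bytes,
                         received_digest: bytes, local_password: bytes) -> bool:
    """Receiver side: recompute the digest and compare it with the received one."""
    expected = ldp_md5_digest(tcp_header, ldp_message, local_password)
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(expected, received_digest)


# Sender side: compute the digest that is carried after the TCP header.
digest = ldp_md5_digest(b"tcp-header", b"ldp-message", b"secret")
```

A packet whose LDP message is altered in transit, or a receiver configured with a different password, produces a different digest and the check fails.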
A password can be set in either ciphertext or simple text. The simple password is directly
recorded in the configuration file. The ciphertext password is recorded in the configuration
file after being encrypted using a special algorithm.
During the calculation of a digest, the manually entered character string is used regardless of
whether the password is in simple text or ciphertext. This means that a password in ciphertext
does not participate in MD5 calculation.
LDP Keychain
Keychain, an enhancement to MD5 authentication, calculates a message digest for the
same LDP message to prevent the message from being modified.
During keychain authentication, a group of passwords is defined to form a password string.
Each password is assigned encryption and decryption algorithms, such as MD5 and SHA-1,
and a validity period. The system selects a valid password based on the user's
configuration. Within the validity period of the password, the system uses the encryption
algorithm matching the password to encrypt the packet before sending it out, or uses the
decryption algorithm matching the password to decrypt the packet before accepting it. In
addition, the system automatically uses a new password after the previous password
expires, preventing the password from being decrypted.
A keychain configuration node consists of three elements: the keychain authentication
password, the encryption and decryption algorithms, and the password validity period.
These elements are configured using different commands. A keychain configuration node
requires at least one password and its encryption and decryption algorithms.
To reference a keychain configuration node, specify a desired peer and the name of the node
in the MPLS LDP view so that an LDP session is encrypted. Different peers can reference the
same keychain configuration node.
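The selection of a valid password by validity period can be sketched as follows. This is a minimal illustration only. The Key class, its field names, and the rule of taking the first valid key are assumptions made for this sketch, not the device behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Key:
    key_id: int
    password: str
    algorithm: str          # for example "md5" or "sha-1"
    valid_from: datetime
    valid_until: datetime


def select_active_key(keychain: List[Key], now: datetime) -> Optional[Key]:
    """Return the first key whose validity period contains `now`, if any."""
    for key in keychain:
        if key.valid_from <= now < key.valid_until:
            return key
    return None


# Two keys with consecutive validity periods (hypothetical values).
keychain = [
    Key(1, "first", "md5", datetime(2018, 1, 1), datetime(2018, 2, 1)),
    Key(2, "second", "sha-1", datetime(2018, 2, 1), datetime(2018, 3, 1)),
]
active = select_active_key(keychain, datetime(2018, 2, 10))
```

After the first key expires, the same call returns the second key, which models the automatic switch to a new password.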
As shown in Figure 10-37, the entire network is an MPLS VPN that runs LDP as a signaling
protocol and provides common VPN services. LSR1 and LSR5 are PEs. After a large number
of users are connected to the network, all traffic between LSR1 and LSR5 passes through the
link between LSR2 and LSR3. The link is then congested. The link between LSR2 and BP1 is
idle. The LSP, however, cannot use the link between LSR2 and BP1 because the IGP cost of
this link is high.
To prevent traffic congestion, LDP over TE can be deployed. A TE tunnel can be established
between LSR2 and LSR4, and the tunnel passes through BP1 and BP2. The IGP cost value is
adjusted so that routes can be balanced on LSR2 on the following two types of interfaces:
A specific number of TE tunnels can be established on idle links. This setting has more
advantages than adjusting IGP cost values and is widely applied in MPLS TE.
Principles
Coexistence of local and remote LDP sessions mainly applies to L2VPNs.
Both the local and remote LDP adjacencies can be connected to the same peer. The peer is
maintained by both the local and remote LDP adjacencies.
In Figure 10-38, when the local LDP adjacency is deleted due to the faulty link to which the
adjacency is connected, the peer type may change without affecting the existence and status of
the peer. The peer type is determined by the type of adjacencies. The type of adjacencies can
be local, remote, or coexistence of the local and remote.
If the link becomes faulty or is recovering, the peer type may change, and the session type
corresponding to the peer also changes. The session remains Up, not being deleted or going
Down.
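The relationship between adjacencies and the peer type can be sketched as follows. This is a minimal illustration. The function name and the returned strings are hypothetical, and the case in which no adjacency remains is outside what the text describes.

```python
from typing import Optional


def peer_type(has_local_adjacency: bool, has_remote_adjacency: bool) -> Optional[str]:
    """Derive the peer type from the adjacencies that are currently Up."""
    if has_local_adjacency and has_remote_adjacency:
        return "local and remote"
    if has_local_adjacency:
        return "local"
    if has_remote_adjacency:
        return "remote"
    return None  # no adjacency remains; not covered by this sketch


# The directly connected link fails: only the remote adjacency remains,
# so the peer type changes while the peer itself still exists.
before = peer_type(True, True)
after = peer_type(False, True)
```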
Usage Scenario
Figure 10-38 Networking topology for the coexistence of the local and remote LDP sessions
[Figure: CE1, PE1, PE2, and CE2; a local adjacency and a remote adjacency between PE1 and PE2]
A typical application scenario is L2VPN. In Figure 10-38, L2VPN services are configured on
PE1 and PE2. When the directly connected link between PE1 and PE2 is disconnected and
then recovers, the procedure is as follows:
1. A session for the coexistence of the local and remote LDP adjacencies is created on the
two directly connected devices. L2VPN messages are sent over this session.
2. The physical link between PE1 and PE2 becomes Down, and the local LDP adjacency of
the peer becomes Down. The route between PE1 and PE2 is reachable through the P
because the remote LDP adjacency is still Up. When the session type changes, the
session becomes a remote session and is still Up. The L2VPN cannot detect the session
type change and does not delete the session. This implementation prevents the L2VPN
from disconnecting neighbors and then recovering and reduces the service interruption
time.
3. After the fault is rectified, the link between PE1 and PE2 goes Up and then the local
LDP adjacency goes Up. The session type changes back to the coexistence of the local
and remote LDP adjacencies, and the session is still Up. The L2VPN cannot detect the
session type change and does not delete the session. This implementation helps reduce
the service interruption time.
>P2->P4->P3->PE3, and P2 becomes the downstream node of P1. P2, however, does not send
the Label Mapping message to P1 and has to wait to resend the Label Mapping message. In
the process, LSP reconvergence is slow.
When LDP distributes labels for all peers, and P2 receives the Label Mapping message from
P1, P2 directly sends the Label Mapping message about the route to P1 and LDP generates a
liberal LSP on P1. In this manner, when the link between P1 and P3 is faulty, the route from
PE1 to PE3 is switched from PE1->P1->P3->PE3 to PE1->P1->P2->P4->P3->PE3, P2
becomes the downstream node of P1, and the liberal LSP directly changes to a normal LSP.
Then, LSP convergence is accelerated.
In addition, you can configure split horizon to determine the upstream peers to which Label
Mapping messages are sent, and the upstream peers to which Label Mapping messages are not
sent.
Figure 10-39 Networking topology for distributing labels for all peers by LDP
[Figure: nodes PE1, P1, P3, PE3 and PE2, P2, P4, PE4; legend: primary LSP, backup LSP, LSP from P2 to PE3]
Background
By default, the Label Distribution Protocol (LDP) establishes label switched paths (LSPs)
using Interior Gateway Protocol (IGP) host routes with 32-bit masks. A growing network has
an increasing number of routes that are used to establish a great number of LDP LSPs. Because
only some LDP LSPs are used to transmit services, the unused LDP LSPs waste forwarding
resources and may prevent LSPs for some services from being established.
Although manual policies prevent unwanted LDP LSPs from being established, they have the
following drawbacks:
l The configuration is complex and involves operations on multiple devices.
l Configuration errors may arise if policy configurations differ between devices.
To address these issues, a smart LDP ingress policy can be configured to only allow service-
specific LSPs to be established. This decreases resource consumption and simplifies manual
configuration.
Usage Scenario
A smart LDP ingress policy is used when the following routes are reachable:
l Exact routes used by LDP in Downstream Unsolicited (DU) mode
l Exact routes used by LDP in Downstream on Demand (DoD) mode
l Longest match rule-based routes used by LDP in DU mode
The ingress running LDP obtains tunnel information based on BGP or VPN services and
establishes LSPs only for the services. This prevents establishment of unwanted LSPs, and
increases the efficiency of forwarding resources.
NOTE
In DU mode, the exact route rule and longest match rule take effect on routing information, not on LDP
LSP establishment. Therefore, exact routes and longest match rule-based routes are the same for the
establishment of smart LDP LSPs in DU mode.
Smart LDP ingress policies take effect on ingress LSPs, not transit or egress LSPs.
In Figure 10-40, an L2VPN, L3VPN, or BGP service is deployed between PE1 and PE2.
These services are similar, with the exception that an L2VPN service involves a remote LDP
session. A smart LDP ingress policy is configured on PE1 and PE2. With this policy
configured, PE1 runs LDP to obtain tunnel information for a specified service and to establish
an ingress LSP to PE2. PE2 runs LDP to obtain tunnel information for a specified service and
to establish an ingress LSP to PE1. The P functions as a transit node and does not need to
establish an ingress LSP.
Figure 10-41 shows the process of LDP smartly obtaining tunnel information.
1. The BGP, L2VPN, or L3VPN service module notifies the tunnel management module
(TNLM) of tunnel iteration information.
2. The TNLM module advertises service-specific tunnel information to the LDP module.
3. The LDP module enforces the smart LDP ingress policy and uses service-specific tunnel
information to establish an ingress LSP.
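The three steps above can be sketched as follows. This is a simplified model, not the device implementation. The class names, method names, and addresses are hypothetical.

```python
class Ldp:
    """Establishes ingress LSPs only for destinations that a service needs."""

    def __init__(self, host_routes):
        self.host_routes = set(host_routes)   # routes available to LDP
        self.ingress_lsps = set()

    def on_tunnel_info(self, destination):
        # Step 3: enforce the policy and establish an ingress LSP only
        # for the service-specific destination.
        if destination in self.host_routes:
            self.ingress_lsps.add(destination)


class Tnlm:
    """Tunnel management module that relays service-specific tunnel information."""

    def __init__(self, ldp):
        self.ldp = ldp

    def notify(self, service, destination):
        # Step 1 is the call to notify() by a service module.
        # Step 2: advertise the service-specific tunnel information to LDP.
        self.ldp.on_tunnel_info(destination)


ldp = Ldp(["10.0.0.1/32", "10.0.0.2/32", "10.0.0.3/32"])
tnlm = Tnlm(ldp)
tnlm.notify("L3VPN", "10.0.0.2/32")
```

Although three host routes are reachable, only the destination requested by the service results in an ingress LSP.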
Benefits
A smart LDP ingress policy helps reduce the number of ingress LSPs to be established and
minimize resource consumption, which ensures that LDP LSPs can be established based on
services.
Background
In seamless Multiprotocol Label Switching (MPLS) networking, downstream on demand
(DoD) LDP sessions are established on the access side, and an Interior Gateway Protocol
(IGP) advertises default routes (or static default routes are configured). In such deployment,
an ingress must be able to send requests based on a specified service to establish LSPs
because the ingress cannot obtain routing information stored in its routing table to establish
LSPs.
The service can only be L2VPN, not BGP or L3VPN. The existing remote-ip auto-dod-
request command enables an ingress to use a remote LDP session to automatically send DoD
requests to a specified peer for a Label Mapping message. The remote peer must have been
configured. This function is supported by L2VPN services, not BGP or L3VPN services. This
is because the BGP and L3VPN services do not need remote LDP peers that are mandatory
for L2VPN services. To overcome the drawback in BGP and L3VPN implementation, a smart
request policy can be used to allow a service to trigger a request to establish an ingress LSP.
With the smart request policy, LDP can obtain tunnel information needed by BGP or VPN
services to send requests to establish ingress LSPs, without deploying remote LDP peers.
Usage Scenario
A smart LDP request policy is used when longest-match-rule routes are reachable for LDP in
Downstream Unsolicited (DU) mode.
NOTICE
An L2VPN service can be configured in DU and DoD mixed networking. In this situation, a
transit policy, not an ingress policy, is used to enable nodes to send requests to establish LSPs,
including unwanted transit LSPs. To allow only wanted LSPs to be established, a remote peer-
specific pseudo wire emulation edge-to-edge (PWE3) policy can be configured.
Figure 10-42 Smart request policy with the DoD mode configured
This implementation also applies to L2VPN services and does not conflict with the auto DoD
request function.
Figure 10-43 shows the process of LDP smartly obtaining tunnel information.
1. The BGP, L2VPN, or L3VPN service module notifies the tunnel management module
(TNLM) of tunnel iteration information.
2. The TNLM module advertises service-specific tunnel information to the LDP module.
3. The LDP module enforces the smart LDP request policy and uses service-specific tunnel
information to establish an ingress LSP.
Benefits
In a BGP or VPN service scenario, a device can send requests to establish an ingress LSP,
without a remote LDP peer configured.
Terms
Term Definition
GR graceful restart
P2MP point-to-multipoint
10.3 MPLS TE
10.3.1 MPLS TE
Multiprotocol Label Switching (MPLS) traffic engineering (TE) effectively schedules,
allocates, and uses existing network resources to provide sufficient bandwidth and support for
quality of service (QoS). MPLS TE helps carriers minimize expenditures without requiring
hardware upgrades. TE is implemented based on MPLS techniques and is easy to deploy and
maintain on live networks. MPLS TE supports a range of reliability techniques, which helps
backbone networks achieve carrier- and device-class reliability.
Purpose
Traffic engineering techniques are common for carriers operating IP/MPLS bearer networks.
These techniques are used to prevent traffic congestion and uneven resource allocation.
A node on a conventional IP network selects the shortest path as an optimal route, regardless
of other factors, for example, bandwidth. The shortest path may be congested with traffic,
whereas other available paths are idle.
Each link on the network shown in Figure 10-44 has a bandwidth of 100 Mbit/s and the
same metric value. LSRA sends traffic to LSRJ at 40 Mbit/s, and LSRG sends traffic to LSRJ
at 80 Mbit/s. Traffic from both routers travels through the shortest path LSRA (LSRG) →
LSRB → LSRC → LSRD → LSRI → LSRJ, which is calculated by an Interior Gateway Protocol
(IGP). As a result, this path may be congested because it carries 120 Mbit/s in total,
which exceeds the link bandwidth, while the path LSRA (LSRG) → LSRB → LSRE →
LSRF → LSRH → LSRI → LSRJ is idle.
Congestion is a major cause for poor performance of a backbone network. A network may be
congested because of insufficient resources or be partially congested because of network
resource imbalance. TE resolves congestion caused by load imbalance. Conventional TE
solutions are as follows:
l TE controls network traffic by adjusting the metric of a path. This method eliminates
congestion only on some links. Adjusting a metric is difficult on a complex network
because a link change affects multiple routes.
l TE directs some traffic to virtual connections (VCs) based on an overlay model. The
current IGPs are topology driven and applicable to only static network connections,
regardless of dynamic factors, such as bandwidth and traffic attributes.
The overlay model, such as IP over asynchronous transfer mode (ATM) or IP over frame
relay (FR), complements IGP disadvantages. An overlay model provides a virtual
topology over a physical topology for a network. This helps properly adjust traffic and
implement QoS features, but has high costs and poor extensibility.
A scalable and simple solution is required to implement TE on a large-scale network. MPLS,
which is also an overlay model, allows a virtual topology to be established over a physical
topology and maps traffic to the virtual topology. MPLS TE was therefore introduced to
integrate MPLS with TE.
Definition
MPLS TE establishes label switched paths (LSPs) satisfying specific constraints and
transparently transmits traffic over the LSPs based on labels. This satisfies constraints, such as
controllable paths and sufficient link bandwidth reserved for services transmitted over the
LSPs. MPLS TE can be used on the network shown in Figure 10-44 to address congestion.
MPLS TE establishes an 80 Mbit/s LSP over the path LSRG → LSRB → LSRC → LSRD →
LSRI → LSRJ and a 40 Mbit/s LSP over the path LSRA → LSRB → LSRE → LSRF →
LSRH → LSRI → LSRJ. MPLS TE directs traffic to the two LSPs, preventing congestion.
Function Module        Description
Basic function         Includes basic MPLS TE settings and the tunnel establishment capability.
Tunnel optimization    Allows existing tunnels to be reestablished over other paths if the topology is
                       changed, or these tunnels can be reestablished using updated bandwidth if
                       service bandwidth values are changed.
Benefits
MPLS TE offers the following benefits:
l Provides sufficient bandwidth and supports QoS capabilities for services.
l Optimizes bandwidth allocation.
l Establishes public network tunnels to isolate virtual private network (VPN) traffic.
l Is implemented based on existing MPLS techniques and its deployment and maintenance
are simple.
l Supports carrier- and device-level reliability functions.
10.3.2 Principles
MPLS TE Tunnel
Multiple LSPs are bound together to form an MPLS TE tunnel. An MPLS TE tunnel is
uniquely identified by the following parameters:
l Tunnel interface: a P2P virtual interface that encapsulates packets. Similar to a loopback
interface, a tunnel interface is a logical interface. A tunnel interface name is identified by
an interface type and number. The interface type is "tunnel." The interface number is
expressed in the format of SlotID/CardID/PortID.
l Tunnel ID: a decimal number that identifies an MPLS TE tunnel and facilitates tunnel
planning and management. A tunnel ID must be specified before an MPLS TE tunnel
interface is configured.
A primary LSP with LSP ID 2 is established along the path LSRA → LSRB → LSRC →
LSRD → LSRE on the network shown in Figure 10-46. A backup LSP with LSP ID 32500 is
established along the path LSRA → LSRF → LSRG → LSRH → LSRE. The two LSPs are in
a tunnel named Tunnel 0/1/0 with a tunnel ID 100.
CR-LSPs
LSPs in an MPLS TE tunnel are constraint-based routed LSPs (CR-LSPs).
Unlike Label Distribution Protocol (LDP) LSPs that are established using routing
information, CR-LSPs are established based on bandwidth and path constraints, in addition to
routing information.
Link Attributes
MPLS TE link attributes describe bandwidth resources, route costs, and link reliability. The
link attributes are as follows:
l Total link bandwidth
Bandwidth of all physical links.
l Maximum reservable bandwidth
Maximum bandwidth that a link can reserve for an MPLS TE tunnel to be established.
The maximum reservable bandwidth must be lower than or equal to the total link
bandwidth. The maximum reservable bandwidth can be manually set.
l TE metric
A TE metric is used in TE tunnel path calculation, allowing the calculation process to be
independent from IGP route-based path calculation. The IGP metric is used for MPLS
TE tunnels by default.
l SRLG
A shared risk link group (SRLG) is a set of links which are likely to fail concurrently
when sharing a physical resource (for example, an optical fiber). Links in an SRLG share
the same risk of faults. If one link fails, other links in the SRLG also fail.
An SRLG enhances CR-LSP reliability on an MPLS TE network enabled with CR-LSP
hot standby or TE FRR. For more information about the SRLG, see SRLG.
l Link administrative group
Link administrative group is also called link color. A link administrative group is a 32-bit
vector, with each bit set to a specified value that is associated with a desired meaning.
For example, a link administrative group attribute can be configured to describe link
bandwidth, a performance parameter (for example, the delay time) or a management
policy. The policy can be a traffic type (for example, multicast) or a flag indicating that
an MPLS TE tunnel passes over the link. The link administrative group attribute is used
together with affinities to control the paths for tunnels.
Tunnel Attributes
MPLS TE tunnels support the following attributes:
l Bandwidth
Bandwidth values are planned based on services that are to pass through a tunnel. The
configured bandwidth is reserved on each node through which a tunnel passes.
l Affinity attribute
An affinity is a 32-bit vector, configured on the ingress of a tunnel. It must be used
together with a link administrative group attribute.
After a tunnel is configured with an affinity, a device compares the affinity with the
administrative group value during link selection to determine whether a link with
specified attributes is selected or not. The device implements two AND operations, one
between a 32-bit mask and each affinity, and one between the 32-bit mask and the
administrative group value. If the two AND operations yield the same results, the path is
selected. If the results are different, the path is not selected. The following rules apply:
– For the bits that are 1s in a mask, at least one of the corresponding bits in the
administrative group must be 1 where the affinity bit is also 1. Where the affinity bit
is 0, the corresponding bit in the administrative group cannot be 1.
For example, an affinity is 0x0000FFFF and its mask is 0xFFFFFFFF. The higher-
order 16 bits in the administrative group of available links are 0 and at least one of
the lower-order 16 bits is 1. This means the administrative group attribute ranges
from 0x00000001 to 0x0000FFFF.
– If some bits in a mask are 0s, the corresponding bits in the administrative group are
not compared with the affinity bits.
For example, an affinity is 0xFFFFFFFF, and its mask is 0xFFFF0000. At least one
of the higher-order 16 bits in an administrative group attribute is 1, and the lower-
order 16 bits can be 0s and 1s. This means that the administrative group attribute
ranges from 0x00010000 to 0xFFFFFFFF.
NOTE
Understand specific comparison rules before deploying devices of different vendors because the
comparison rules vary with vendors.
A network administrator can use the link administrative group and affinities to control
the paths over which MPLS TE tunnels are established.
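The two bitwise rules and their examples for the affinity attribute can be sketched as follows. This is one reading of the rules, chosen so that both example ranges hold. It is not a vendor-independent definition, and the function name is hypothetical.

```python
WORD = 0xFFFFFFFF  # affinity, mask, and administrative group are 32-bit vectors


def link_matches_affinity(admin_group: int, affinity: int, mask: int) -> bool:
    """Return True if a link's administrative group satisfies the affinity.

    Bits that are 0 in the mask are not compared.
    """
    must_have_one = affinity & mask & WORD    # at least one of these bits must be 1
    must_be_zero = ~affinity & mask & WORD    # none of these bits may be 1
    return (admin_group & must_have_one) != 0 and (admin_group & must_be_zero) == 0


# Affinity 0x0000FFFF with mask 0xFFFFFFFF accepts 0x00000001 to 0x0000FFFF.
low_ok = link_matches_affinity(0x00000001, 0x0000FFFF, 0xFFFFFFFF)
# Affinity 0xFFFFFFFF with mask 0xFFFF0000 accepts 0x00010000 to 0xFFFFFFFF.
high_ok = link_matches_affinity(0x00010000, 0xFFFFFFFF, 0xFFFF0000)
```

Because comparison rules vary with vendors, the function should be treated as an aid for reasoning about the examples rather than as a reference implementation.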
l Explicit path
[Figure: explicit path hop list in which each of the hops B, C, D, E, and F is marked Strict]
For example, a CR-LSP is established between LSRA and LSRF on the network
shown in Figure 10-47. LSRA is the ingress, and LSRF is the egress. "X strict"
specifies the LSR through which the CR-LSP must travel. For example, "B strict"
indicates that the CR-LSP must travel through LSRB, and the previous hop of
LSRB must be LSRA. "C strict" indicates that the CR-LSP must travel through
LSRC, and the previous hop of LSRC must be LSRB. The procedure repeats. A
path with each node specified is provided for the CR-LSP.
– Loose explicit path
A loose explicit path contains specified nodes through which a CR-LSP must pass.
Other routers that are not specified can also exist on the CR-LSP.
For example, a CR-LSP is established over a loose explicit path between LSRA and
LSRF on the network shown in Figure 10-48. LSRA is the ingress, and LSRF is the
egress. "D loose" indicates that the CR-LSP must pass through LSRD, but LSRD
and LSRA do not need to be directly connected. This means that other LSRs may
exist between LSRD and LSRA.
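The difference between strict and loose hops can be sketched as a check of a candidate path against an explicit path. This is a minimal illustration. The function name and the data layout are hypothetical.

```python
def satisfies_explicit_path(path, hops):
    """Check a path (node names from ingress to egress) against explicit hops.

    hops is a list of (node, mode) pairs, where mode is "strict" or "loose".
    A strict hop must directly follow the previously matched node (the ingress
    for the first hop). A loose hop may appear anywhere later on the path.
    """
    position = 0  # index of the most recently matched node; starts at the ingress
    for node, mode in hops:
        if mode == "strict":
            if position + 1 >= len(path) or path[position + 1] != node:
                return False
            position += 1
        elif mode == "loose":
            try:
                position = path.index(node, position + 1)
            except ValueError:
                return False
        else:
            raise ValueError(f"unknown hop mode: {mode}")
    return True


strict_hops = [("B", "strict"), ("C", "strict"), ("D", "strict"),
               ("E", "strict"), ("F", "strict")]
strict_ok = satisfies_explicit_path(["A", "B", "C", "D", "E", "F"], strict_hops)
loose_ok = satisfies_explicit_path(["A", "B", "D", "F"], [("D", "loose")])
```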
l Hop limit
Hop limit is a condition for path selection during CR-LSP establishment. Similar to the
administrative group and affinity attributes, a hop limit constrains path selection. It
defines the maximum number of hops that a CR-LSP allows.
l Route pinning
Any changes in the network topology or tunnel functions may cause an established CR-
LSP to be reestablished, leading to the following issues:
– The reestablished CR-LSP may be over a path that is different from the original
one, causing management difficulties.
– Traffic must switch from the original CR-LSP to the new one, causing traffic loss.
Route pinning can be used to resolve the preceding problems. Route pinning helps an
established CR-LSP remain over a path regardless of route changes. This function
improves service traffic continuity and reliability.
l Priorities and preemption
Priorities and preemption allow TE tunnels that transmit important services to be
established preferentially, preventing random resource competition during tunnel establishment.
If there is no path meeting the bandwidth requirement of a desired CR-LSP, a device can
tear down an established CR-LSP and use the bandwidth assigned to that CR-LSP to
establish a desired CR-LSP. This is called preemption. The following preemption modes
are supported:
– Hard preemption: A CR-LSP with a higher priority can directly preempt resources
assigned to a CR-LSP with a lower priority. Traffic is dropped on the CR-LSP with
a lower priority during the hard preemption process until the lower priority tunnel is
reestablished.
– Soft preemption: The make-before-break mechanism applies. A CR-LSP with a
higher priority has to wait until traffic over a lower-priority CR-LSP switches to
another CR-LSP before the higher-priority CR-LSP preempts bandwidth assigned
to the lower-priority CR-LSP.
CR-LSPs use setup and holding priorities to determine whether to preempt resources.
The priority value ranges from 0 to 7. A smaller value indicates a higher priority. The
setup priority of a tunnel must be lower than or equal to its holding priority, which
means that the setup priority value must be greater than or equal to the holding
priority value.
The priority and preemption attributes are used in conjunction to determine resource
preemption among tunnels. If multiple CR-LSPs are to be established, CR-LSPs with
high priorities can be established by preempting resources. If resources (such as
bandwidth) are insufficient, a CR-LSP with a higher setup priority can preempt resources
of an established CR-LSP with a lower holding priority.
The following tunnels are established on the network shown in Figure 10-49.
– Tunnel 1: established over the path LSRA → LSRF → LSRD. Its bandwidth is 155
Mbit/s, and its setup and holding priority values are 0.
– Tunnel 2: established over the path LSRB → LSRF → LSRC. Its bandwidth is 150
Mbit/s, and its setup and holding priority values are 7.
If the link between LSRF and LSRD fails, LSRA recalculates a path LSRA → LSRF →
LSRC → LSRE → LSRD for tunnel 1. The link between LSRF and LSRC is shared by
tunnels 1 and 2, but has insufficient bandwidth for these two tunnels. As a result,
preemption is triggered.
– If hard preemption is used, LSRF directly sends an RSVP message to tear down
tunnel 2 because tunnel 1 has a higher priority than tunnel 2. As a result, some
traffic on tunnel 2 is dropped if tunnel 2 is transmitting traffic.
– If soft preemption is used, LSRF sends LSRB an RSVP message. After LSRB
receives this message, LSRB reestablishes tunnel 2 over another path LSRB →
LSRD → LSRE → LSRC. LSRB switches traffic to the new path before tearing
down tunnel 2 over the original path.
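The preemption decision can be sketched as follows. This is a minimal illustration under stated assumptions: the CrLsp class and the lsps_to_preempt function are hypothetical, the 200 Mbit/s link capacity is an assumed value, and only the comparison between setup and holding priorities is modeled, not RSVP signaling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrLsp:
    name: str
    bandwidth: int          # Mbit/s
    setup_priority: int     # 0 (highest) to 7 (lowest)
    holding_priority: int   # 0 (highest) to 7 (lowest)

    def __post_init__(self) -> None:
        for value in (self.setup_priority, self.holding_priority):
            if not 0 <= value <= 7:
                raise ValueError("priority values range from 0 to 7")
        # The setup priority must not be higher than the holding priority,
        # so its value must not be smaller.
        if self.setup_priority < self.holding_priority:
            raise ValueError("setup priority is higher than holding priority")


def lsps_to_preempt(capacity: int, established: List[CrLsp],
                    new: CrLsp) -> Optional[List[str]]:
    """Return the names of CR-LSPs to preempt, or None if `new` cannot fit."""
    free = capacity - sum(lsp.bandwidth for lsp in established)
    victims: List[str] = []
    # Consider the lowest holding priority (largest value) first.
    for lsp in sorted(established, key=lambda l: l.holding_priority, reverse=True):
        if free >= new.bandwidth:
            break
        if new.setup_priority < lsp.holding_priority:
            victims.append(lsp.name)
            free += lsp.bandwidth
    return victims if free >= new.bandwidth else None


tunnel1 = CrLsp("Tunnel 1", 155, setup_priority=0, holding_priority=0)
tunnel2 = CrLsp("Tunnel 2", 150, setup_priority=7, holding_priority=7)
# Tunnel 1 is rerouted onto a link that already carries tunnel 2.
victims = lsps_to_preempt(200, [tunnel2], tunnel1)
```

With these values, tunnel 1 preempts tunnel 2, whereas tunnel 2 could not preempt tunnel 1 in the reverse situation. Whether the preempted tunnel is torn down immediately or rerouted first depends on the preemption mode.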
10.3.2.2 Implementation
An MPLS TE tunnel is established using four components. Table 10-6 lists the components
and describes their functions.
No.  Name                  Description
2    Path Selection        Runs Constraint Shortest Path First (CSPF) and uses TEDB data to
     Component             calculate a path that satisfies specific constraints. CSPF evolves from
                           the Shortest Path First (SPF) protocol. CSPF excludes nodes and links
                           that do not satisfy specific constraints and uses the same algorithm that
                           SPF supports to calculate a path.
4    Traffic Forwarding    Directs traffic to a CR-LSP and forwards the traffic along the CR-LSP.
     Component             Although a CR-LSP can be established using the preceding three
                           components, the CR-LSP cannot automatically import traffic. The
                           traffic forwarding component can be used to direct traffic to the
                           CR-LSP.
NOTE
l A static CR-LSP is manually established, and there is no need to use the information advertisement
component or the path calculation component.
l A dynamic CR-LSP is dynamically established by signaling. Therefore, all the preceding
components are used to establish a dynamic CR-LSP.
A network administrator can configure link and tunnel attributes to enable MPLS TE to
automatically establish a CR-LSP. The network administrator can then direct traffic to the
CR-LSP and forward traffic over the CR-LSP.
Contents to Be Advertised
The network resource information to be advertised includes the following items:
l Link status information: interface IP addresses, link types, and link metric values, which
are collected by an Interior Gateway Protocol (IGP)
l Bandwidth information, such as maximum link bandwidth and maximum reservable
bandwidth
l TE metric: TE link metric, which is the same as the IGP metric by default
Advertisement Methods
Either of the following link-state protocol extensions can be used to advertise TE
information:
l OSPF TE
l IS-IS TE
Open Shortest Path First (OSPF) TE and Intermediate System to Intermediate System (IS-IS)
TE automatically collect TE information and flood it to MPLS TE nodes.
Figure 10-50 Proportion of the bandwidth reserved for each MPLS TE tunnel to the
available bandwidth in the TEDB
CSPF Fundamentals
CSPF works based on the following parameters:
A TEDB can be generated only after Interior Gateway Protocol (IGP) TE is configured. On an IGP TE-
incapable network, CR-LSPs are established based on IGP routes, but not CSPF calculation results.
NOTE
CSPF attempts to use the OSPF TEDB to establish a path for a CR-LSP by default. If a path is
successfully calculated using OSPF TEDB information, CSPF completes calculation and does not use
the IS-IS TEDB to calculate a path. If path calculation fails, CSPF attempts to use IS-IS TEDB
information to calculate a path.
CSPF can be configured to use the IS-IS TEDB to calculate a CR-LSP path. If path
calculation fails, CSPF uses the OSPF TEDB to calculate a path.
CSPF calculates the shortest path to a destination. If there are several shortest paths with the
same metric, CSPF uses a tie-breaking policy to select one of them. The following tie-
breaking policies for selecting a path are available:
l Most-fill: selects a link with the highest proportion of used bandwidth to the maximum
reservable bandwidth, efficiently using bandwidth resources.
l Least-fill: selects a link with the lowest proportion of used bandwidth to the maximum
reservable bandwidth, evenly using bandwidth resources among links.
l Random: selects links randomly, allowing LSPs to be established evenly over links,
regardless of bandwidth distribution.
When several links have the same proportion of used bandwidth to the maximum reservable
bandwidth (for example, the links do not use the reserved bandwidths or the same bandwidth
is used on every link), the link discovered first is selected, irrespective of whether most-fill or
least-fill is configured.
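The tie-breaking policies can be sketched as follows. This is a minimal illustration. The dictionary keys and the function names are hypothetical, and the candidate list is assumed to be in discovery order.

```python
import random


def fill_ratio(link):
    """Proportion of used bandwidth to the maximum reservable bandwidth."""
    return link["used"] / link["max_reservable"]


def break_tie(links, policy, rng=random):
    """Select one link from equal-cost candidates listed in discovery order."""
    if policy == "most-fill":
        # max() returns the first maximal item, so the link discovered
        # first wins when ratios are equal.
        return max(links, key=fill_ratio)
    if policy == "least-fill":
        # min() likewise returns the first minimal item.
        return min(links, key=fill_ratio)
    if policy == "random":
        return rng.choice(links)
    raise ValueError(f"unknown policy: {policy}")


candidates = [
    {"name": "link-a", "used": 20, "max_reservable": 100},
    {"name": "link-b", "used": 60, "max_reservable": 100},
    {"name": "link-c", "used": 60, "max_reservable": 100},
]
most = break_tie(candidates, "most-fill")["name"]
least = break_tie(candidates, "least-fill")["name"]
```

Here link-b and link-c have the same ratio, so most-fill selects link-b because it was discovered first.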
For example, CSPF removes links marked blue and links each with bandwidth of 50 Mbit/s
based on tunnel constraints. It uses other links each with bandwidth of 100 Mbit/s to calculate
a path for an MPLS TE tunnel on the network shown in Figure 10-51. The constraints include
the destination LSRE, bandwidth of 80 Mbit/s, and a transit node LSRH.
CSPF calculates a path shown in Figure 10-52 in the same way SPF would calculate it.
CSPF and SPF differ in the following ways:
l CSPF calculates the shortest path between the ingress and egress. SPF calculates the
shortest path between a node and each of the other nodes on a network.
l CSPF uses metrics such as the bandwidth, link attributes, and affinity attributes, in
addition to link costs, which are the only metric used by SPF.
l CSPF does not support load balancing and uses three tie-breaking policies to determine a
path if multiple paths have the same attributes.
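The prune-then-SPF behavior of CSPF can be sketched as follows. This is a simplified illustration, not the device algorithm: links are treated as bidirectional, only bandwidth and link color constraints are modeled (transit-node constraints and tie-breaking policies are omitted), and the topology and the function name are hypothetical.

```python
import heapq


def cspf(links, source, destination, bandwidth, excluded_colors=frozenset()):
    """Return (cost, path) of the shortest path that meets the constraints.

    links: iterable of (node_u, node_v, cost, available_bandwidth, color).
    Returns None if no path satisfies the constraints.
    """
    # Step 1: exclude links that do not satisfy the constraints.
    graph = {}
    for u, v, cost, available, color in links:
        if available < bandwidth or color in excluded_colors:
            continue
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))

    # Step 2: run the shortest path algorithm (Dijkstra) on what remains.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == destination:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (dist + cost, neighbor, path + [neighbor]))
    return None


topology = [
    ("A", "B", 1, 50, None), ("B", "E", 1, 100, None),
    ("A", "C", 1, 100, "blue"), ("C", "E", 1, 100, None),
    ("A", "F", 1, 100, None), ("F", "G", 1, 100, None),
    ("G", "H", 1, 100, None), ("H", "E", 1, 100, None),
]
result = cspf(topology, "A", "E", bandwidth=80, excluded_colors={"blue"})
```

With the 50 Mbit/s link and the blue link removed, the only remaining path from A to E runs through F, G, and H, even though shorter paths exist in the unconstrained topology.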
RSVP-TE Overview
The Resource Reservation Protocol (RSVP) is designed on the basis of the Integrated
Services model. RSVP can reserve resources on each node of a CR-LSP. RSVP, an Internet
control protocol, operates at the transport layer and does not transport application data. RSVP-
TE is an extension to RSVP. RSVP-TE can establish or delete CR-LSPs using TE attributes in
extended objects.
RSVP-TE has the following unique aspects compared with RSVP:
l RSVP-TE appends Label Request objects to RSVP Path messages to request labels. Resv
messages carry Label objects that are used to allocate labels. TE tunnels can be
established based on the labels.
l The extended RSVP messages can carry information about path constraint parameters, in
addition to label binding information.
l RSVP-TE supports MPLS TE attributes, such as resource reservation, carried in the
extended objects.
RSVP-TE Principles
Table 10-7 lists RSVP-TE principles.
Path Maintenance: RSVP-TE sends messages to maintain the path status on each node.
Path Teardown: A CR-LSP is torn down, and labels and bandwidth are released on each node.
The ingress initiates the teardown request.
CR-LSP Establishment
Figure 10-53 shows the process of establishing an RSVP-TE CR-LSP.
1. PE1 uses CSPF to calculate a path between PE1 and PE2. The IP address of every hop
on this path has been specified. PE1 generates a Path message and creates a PSB. PE1
then adds the explicit route object (ERO) field containing a list of IP addresses calculated
by CSPF, and sends the Path message to P1 along the path specified by the ERO.
2. After P1 receives the Path message, P1 parses the message and creates a PSB based on
the Path message. P1 then generates a new Path message and sends it to P2 based on the
ERO.
– The HOP field in the Path message updated by PE1 specifies the IP address of the
outbound interface through which PE1 sends the message to P1. The HOP field in
the Path message updated by P1 specifies the IP address of the outbound interface
through which P1 sends the message to P2.
– P1 deletes the local LSR ID and IP addresses of the inbound and outbound
interfaces from the ERO field in the Path message.
3. P2 deals with the received Path message in the same process as that on P1. P2 creates a
PSB based on the Path message, updates the new Path message, and sends it to PE2.
4. After PE2 receives a Path message, PE2 knows that it is the egress of the CR-LSP to be
established based on the Tunnel Address field in the Session object. PE2 then allocates a
label and bandwidth resources, and generates an RSB based on the Resv message. The
Resv message is sent to P2 and carries the label which is allocated by PE2.
Different from the destination IP address in the Path message, the destination IP address
of the Resv message sent by PE2 is the IP address carried in the HOP field of the
received Path message, not the LSR ID of the ingress. The IP header of the Resv
message does not need to contain the Router Alert option.
The Resv message is forwarded along the reverse path. Therefore, the Resv message
does not carry the ERO field.
OBJECT Value
SESSION Source:PE2-if0;Destination:P2-if1
HOP PE2-if0
LABEL 3
REQUEST RECORD_ROUTE PE2-if0
5. After P2 receives the Resv message, P2 creates an RSB based on the Resv message,
allocates a new label, updates the Resv message, and sends the message to P1.
OBJECT Value
SESSION Source:P2-if0;Destination:P1-if1
HOP P2-if0
LABEL 17
REQUEST RECORD_ROUTE P2-if0;PE2-if0
6. P1 deals with the received Resv message in the same process as that on P2. P1 updates
the Resv message and sends it to PE1.
OBJECT Value
SESSION Source:P1-if0;Destination:PE1-if1
HOP P1-if0
LABEL 18
REQUEST RECORD_ROUTE P1-if0;P2-if0;PE2-if0
7. PE1 obtains the label allocated by P1 based on the received Resv message. Resource
reservation succeeds and a CR-LSP is established.
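The Path and Resv exchange in the preceding steps can be modeled in a few lines. This is a simplified sketch under stated assumptions: nodes are identified by name rather than interface address, bandwidth admission is omitted, and the function and field names are invented for illustration. The label values 3, 17, and 18 are taken from the tables above.

```python
def signal_cr_lsp(ero, labels):
    """Walk a Path message downstream and a Resv message upstream.

    ero    -- node names from ingress to egress, as CSPF would supply them
    labels -- label that each non-ingress node allocates for its upstream neighbor
    """
    path_state, resv_state, trace = {}, {}, []

    # Path phase: each node stores path state and forwards the remaining ERO.
    remaining = list(ero)
    prev_hop = None
    while remaining:
        node = remaining.pop(0)  # the node removes itself from the ERO
        path_state[node] = {"prev_hop": prev_hop, "ero": list(remaining)}
        trace.append(("Path", node))
        prev_hop = node

    # Resv phase: the message follows the stored previous hops, so no ERO is needed.
    node = ero[-1]
    record_route = []
    while path_state[node]["prev_hop"] is not None:
        record_route.insert(0, node)
        upstream = path_state[node]["prev_hop"]
        resv_state[upstream] = {
            "out_label": labels[node],
            "record_route": list(record_route),
        }
        trace.append(("Resv", upstream))
        node = upstream
    return path_state, resv_state, trace

path_state, resv_state, trace = signal_cr_lsp(
    ["PE1", "P1", "P2", "PE2"], {"PE2": 3, "P2": 17, "P1": 18}
)
```

After the run, resv_state["PE1"]["out_label"] holds the label that P1 allocated, which corresponds to step 7.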
Reservation Styles
A reservation style defines how resources are reserved for different senders within the same
session. The following reservation styles are supported:
l Fixed Filter (FF) style: allows a particular sender to create a separate reservation for a
tunnel. This sender does not share its resource reservation with other senders. A resource
reservation on the same link is used by a specific CR-LSP.
l Shared Explicit (SE) style: allows a set of selected upstream senders to share a single
reservation. The same resource reservation on the same link is shared by different CR-
LSPs.
NOTE
A Refresh message is not a new type of message. Refresh messages are the messages that have already
been advertised.
The refreshing interval is specified in the Time Value field.
If a node does not receive any Refresh message for a specific PSB or RSB before the
specified refreshing intervals elapse, it deletes the state.
RSVP Refresh messages can also monitor the reachability between RSVP neighbors and
maintain RSVP neighbor relationships.
Figure 10-60 shows an RSVP Refresh message. Path and Resv messages are sent separately.
[Figure 10-60: RSVP Refresh messages. Path and Resv messages are exchanged between PE1, P1, P2,
and PE2 along a time axis marked 0:00, 0:30, 1:00, 1:30, ...]
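The soft-state behavior described above (state is kept only while Refresh messages keep arriving) can be sketched as a small table with timestamps. The class name and the lifetime multiplier of 3 are assumptions made for this illustration; the 30-second interval matches the time axis of the figure.

```python
class SoftStateTable:
    """Path/Resv state that is deleted unless it is refreshed in time."""

    def __init__(self, refresh_interval, lifetime_multiplier):
        # State lifetime expressed as a number of refresh intervals (assumed model).
        self.lifetime = refresh_interval * lifetime_multiplier
        self.last_refresh = {}

    def refresh(self, key, now):
        """Record that a Refresh message for this state block arrived at time now."""
        self.last_refresh[key] = now

    def expire(self, now):
        """Delete and return every state block whose lifetime has elapsed."""
        stale = [k for k, t in self.last_refresh.items() if now - t > self.lifetime]
        for key in stale:
            del self.last_refresh[key]
        return stale

table = SoftStateTable(refresh_interval=30, lifetime_multiplier=3)
table.refresh("psb-1", now=0)
table.refresh("rsb-1", now=0)
table.refresh("psb-1", now=30)   # only the PSB keeps being refreshed
expired = table.expire(now=91)
```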
Path Teardown
After a user instructs an ingress to delete a CR-LSP or the ingress receives a PathErr message,
the ingress sends a PathTear message to a downstream node. The downstream node receives
this message, tears down the CR-LSP, and replies to the ingress with a ResvTear message.
Error Signaling
RSVP-TE uses the following messages to advertise LSP errors.
l PathErr message: sent upstream by an RSVP node if an error occurs while this node is
processing a Path message. A PathErr message is forwarded by consecutive transit nodes
and arrives at the ingress.
l ResvErr message: sent downstream by an RSVP node if an error occurs while this node
is processing a Resv message. A ResvErr message is forwarded by consecutive transit
nodes and arrives at the egress.
Background
RSVP Refresh messages are used to synchronize path state block (PSB) and reservation state
block (RSB) information between nodes. They can also be used to monitor the reachability
between RSVP neighbors and maintain RSVP neighbor relationships. Because Path and Resv
messages are relatively large, sending many of them when many CR-LSPs are established
consumes considerable network resources. RSVP Srefresh can be used to address this
problem.
Implementation
RSVP Srefresh defines new objects based on the existing RSVP protocol:
l Message_ID extension and retransmission extension
The Srefresh extension builds on the Message_ID extension. According to the
Message_ID extension mechanism defined in RFC 2961, RSVP messages carry
extended objects, including Message_ID and Message_ID_ACK objects. The two
objects are used to confirm RSVP messages and support reliable RSVP message
delivery.
The Message_ID object can also be used to provide the RSVP retransmission
mechanism. For example, a node initializes the retransmission interval to Rf seconds after
it sends an RSVP message carrying the Message_ID object. If the node receives no ACK
message within Rf seconds, the node retransmits the RSVP message and increases the
interval to (1 + Delta) x Rf seconds. Delta determines the rate at which the sender
increases the retransmission interval. The node keeps retransmitting the message until it
receives an ACK message or the number of retransmissions reaches a specific threshold
(called a retransmission increment value).
l Summary Refresh extension
The Summary Refresh extension supports Srefresh messages to update the RSVP status,
without standard Path or Resv messages transmitted.
Each Srefresh message carries a Message_ID object. Each object contains multiple
message IDs, each of which identifies a Path or Resv state to be refreshed. If a CR-LSP
changes, its message ID value increases.
Only the state that was previously advertised by Path and Resv messages containing
Message_ID objects can be refreshed using the Srefresh extension.
After a node receives an Srefresh message, the node compares the Message_ID with that
saved in a local state block.
– If they match, the node does not change the state.
– If the Message_ID is greater than that saved in the local state block, the node sends
a NACK message to the sender, refreshes the PSB or RSB based on the Path or
Resv message, and updates the Message_ID.
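The two mechanisms above can be illustrated with a short sketch. It assumes the exponential back-off defined in RFC 2961, where each interval is (1 + Delta) times the previous one; the function names, the state keys, and the sample values of Rf, Delta, and the retransmission limit are hypothetical.

```python
def retransmission_intervals(rf, delta, limit):
    """Waiting time before each successive transmission while no ACK arrives."""
    intervals = []
    interval = rf
    for _ in range(limit):
        intervals.append(interval)
        interval *= (1 + delta)  # back-off: each wait is (1 + Delta) times longer
    return intervals

def process_srefresh(saved_ids, srefresh_ids):
    """Return the states for which a NACK must be sent.

    saved_ids    -- Message_ID stored in each local state block
    srefresh_ids -- Message_ID carried for each state in the Srefresh message
    """
    nack = []
    for state, message_id in srefresh_ids.items():
        saved = saved_ids.get(state)
        if saved is not None and message_id > saved:
            # The state changed on the sender; a full Path or Resv message is
            # needed before the local state block and Message_ID are updated.
            nack.append(state)
    return nack

waits = retransmission_intervals(rf=0.5, delta=1, limit=3)
nacks = process_srefresh({"lsp1": 5, "lsp2": 7}, {"lsp1": 5, "lsp2": 8})
```

In this run, lsp1 matches its saved Message_ID and is left unchanged, while lsp2 carries a larger Message_ID and triggers a NACK.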
Background
RSVP Refresh messages are used to synchronize path state block (PSB) and reservation state
block (RSB) information between nodes. They can also be used to monitor the reachability
between RSVP neighbors and maintain RSVP neighbor relationships.
Detecting neighbor reachability with Path and Resv messages is slow, which delays a traffic
switchover if a link fault occurs. The RSVP Hello extension can address this problem.
Related Concepts
l RSVP Refresh messages: Although an MPLS TE tunnel is established using Path and
Resv messages, RSVP nodes still send Path and Resv messages over the established
tunnel to update the RSVP status. These Path and Resv messages are called RSVP
Refresh messages.
l RSVP GR: ensures uninterrupted transmission on the forwarding plane when an
AMB/SMB switchover is performed on the control plane. A GR helper assists a GR
restarter in rapidly restoring the RSVP status.
l TE FRR: a local protection mechanism for MPLS TE tunnels. If a fault occurs on a
tunnel, TE FRR rapidly switches traffic to a bypass tunnel.
Implementation
The principles of the RSVP Hello extension are as follows:
1. Hello handshake mechanism
LSRA and LSRB are directly connected on the network shown in Figure 10-61.
– If RSVP Hello is enabled on LSRA, LSRA sends a Hello Request message to
LSRB.
– After LSRB receives the Hello Request message and is also enabled with RSVP
Hello, LSRB sends a Hello ACK message to LSRA.
– After receiving the Hello ACK message, LSRA considers LSRB reachable.
2. Detecting neighbor loss
After a successful Hello handshake is implemented, LSRA and LSRB exchange Hello
messages. If LSRB does not respond to three consecutive Hello Request messages sent
by LSRA, LSRA considers LSRB lost and re-initializes the RSVP Hello process.
3. Detecting neighbor restart
If LSRA and LSRB are enabled with RSVP GR, and the Hello extension detects that
LSRB is lost, LSRA waits for LSRB to send a Hello Request message carrying a GR
extension. After receiving the message, LSRA starts the GR process on LSRB and sends
a Hello ACK message to LSRB. After receiving the Hello ACK message, LSRB
performs the GR process and restores the RSVP soft state. LSRA and LSRB exchange
Hello messages to maintain the restored RSVP soft state.
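The handshake and neighbor-loss rules above amount to a small state machine. The sketch below is illustrative only: the class, method, and state names are invented, and timers are replaced by explicit on_timeout calls. The threshold of three unanswered Hello Request messages comes from the text.

```python
class HelloNeighbor:
    """Tracks one RSVP Hello neighbor from the local node's point of view."""

    MAX_MISSED = 3  # three consecutive unanswered Hello Request messages

    def __init__(self):
        self.state = "init"
        self.missed = 0

    def on_ack(self):
        """A Hello ACK arrived: the neighbor is reachable."""
        self.state = "up"
        self.missed = 0

    def on_timeout(self):
        """A Hello Request went unanswered."""
        if self.state != "up":
            return
        self.missed += 1
        if self.missed >= self.MAX_MISSED:
            # The caller re-initializes Hello, or waits for a GR-capable
            # Hello Request if RSVP GR is enabled.
            self.state = "lost"
            self.missed = 0

neighbor = HelloNeighbor()
neighbor.on_ack()
neighbor.on_timeout()
neighbor.on_timeout()
state_after_two = neighbor.state
neighbor.on_timeout()
state_after_three = neighbor.state
```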
Deployment Scenarios
The RSVP Hello extension applies to networks enabled with both RSVP GR and TE FRR.
Static Route
Static route is the simplest method for directing traffic to a CR-LSP in an MPLS TE tunnel. A
TE static route works in the same way as a common static route and has a TE tunnel interface
as an outbound interface.
Auto Route
With an auto route, an Interior Gateway Protocol (IGP) treats a CR-LSP in a TE tunnel as a
logical link when calculating paths. The tunnel interface is used as the outbound
interface in the auto route. The TE tunnel is considered a P2P link with a specified metric
value. The following auto routes are supported:
l IGP shortcut: A route related to a CR-LSP is not advertised to neighbor nodes,
preventing other nodes from using the CR-LSP.
l Forwarding adjacency: A route related to a CR-LSP is advertised to neighbor nodes,
allowing these nodes to use the CR-LSP.
Forwarding adjacency allows tunnel information to be advertised based on IGP neighbor
relationships.
If the forwarding adjacency is used, nodes on both ends of a CR-LSP must be in the
same area.
The following example demonstrates the IGP shortcut and forwarding adjacency.
Figure 10-62 Schematic diagram for IGP shortcut and forwarding adjacency
A CR-LSP over the path LSRG → LSRF → LSRB is established on the network shown in
Figure 10-62, and the TE metric values are specified. Either of the following configurations
can be used:
l The auto route is not used. LSRE uses LSRD as the next hop in a route to LSRA and a
route to LSRB; LSRG uses LSRF as the next hop in a route to LSRA and a route to
LSRB.
l The auto route is used. Either IGP shortcut or forwarding adjacency can be configured:
– The IGP shortcut is used to advertise the route of Tunnel 1. LSRE uses LSRD as the
next hop in the route to LSRA and the route to LSRB; LSRG uses Tunnel 1 as the
next hop in the route to LSRA and the route to LSRB. LSRG, unlike LSRE, uses
Tunnel 1 in IGP path calculation.
– The forwarding adjacency is used to advertise the route of Tunnel 1. LSRE uses
LSRG as the next hop in the route to LSRA and the route to LSRB; LSRG uses
Tunnel 1 as the next hop in the route to LSRA and the route to LSRB. Both LSRE
and LSRG use Tunnel 1 in IGP path calculation.
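The difference between the two auto route modes can be reproduced with a small SPF sketch. The link metrics below are hypothetical (the figure's values are not recoverable here), node names are shortened to single letters, and the tunnel is modeled as a one-way logical link from G to B for simplicity. The only thing that changes between modes is which nodes see the tunnel during path calculation.

```python
import heapq

# Hypothetical physical links: (node_a, node_b, metric)
PHYSICAL = [("E", "D", 10), ("D", "C", 10), ("C", "B", 10), ("B", "A", 10),
            ("E", "G", 15), ("G", "F", 10), ("F", "B", 10)]
TUNNEL = ("G", "B", 5, "Tunnel1")  # ingress, egress, TE metric, interface name

def first_hop(root, dst, mode):
    """Next hop that root uses toward dst.

    mode -- "none", "shortcut" (tunnel visible only to its ingress),
            or "fa" (tunnel advertised to every node as a link)
    """
    edges = {}
    for a, b, cost in PHYSICAL:
        edges.setdefault(a, []).append((b, cost, b))
        edges.setdefault(b, []).append((a, cost, a))
    head, tail, metric, name = TUNNEL
    if mode == "fa" or (mode == "shortcut" and root == head):
        edges[head].append((tail, metric, name))

    queue = [(0, root, "")]
    done = set()
    while queue:
        cost, node, hop = heapq.heappop(queue)
        if node == dst:
            return hop
        if node in done:
            continue
        done.add(node)
        for nxt, link_cost, label in edges.get(node, []):
            # The first hop is fixed when leaving the root and inherited afterwards.
            heapq.heappush(queue, (cost + link_cost, nxt, label if node == root else hop))
    return None

hops = {mode: (first_hop("E", "A", mode), first_hop("G", "A", mode))
        for mode in ("none", "shortcut", "fa")}
```

With these metrics the result follows the three cases in the text: without an auto route E uses D and G uses F, with the IGP shortcut only G uses Tunnel1, and with forwarding adjacency E also steers traffic toward G.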
Policy-based Routing
Policy-based routing (PBR) allows the system to select routes based on user-defined
policies, improving security and balancing traffic load. If PBR is enabled on an MPLS
network, IP packets are forwarded over specific CR-LSPs based on PBR rules.
MPLS TE PBR, like IP unicast PBR, is implemented based on a set of matching rules
and behaviors. The rules and behaviors are defined using an apply clause, in which the
outbound interface is a specific tunnel interface. If packets do not match PBR rules, they are
forwarded using normal IP forwarding; if they match PBR rules, they are forwarded over
specific CR-LSPs.
Tunnel Policy
Tunnel policies applied to virtual private networks (VPNs) guide VPN traffic to tunnels in
either of the following modes:
l Select-seq mode: The system selects tunnels for VPN traffic in the specified tunnel
selection sequence.
l Tunnel binding mode: A CR-LSP is bound to a destination address in a tunnel policy.
This policy applies only to CR-LSPs.
Background
MPLS TE tunnels are used to optimize traffic distribution over a network. An MPLS TE
tunnel is configured using static information, such as a bandwidth setting and a calculated
path. Without the optimization function, an MPLS TE tunnel cannot be automatically updated
after the service bandwidth or a tunnel management policy changes. This wastes network
resources. MPLS TE tunnels need to be optimized after being established.
Implementation
The optimization enables the CR-LSP to be reestablished over the optimal path with the
smallest metric. A specific event that occurs on the ingress can trigger optimization for a CR-
LSP bound to an MPLS TE tunnel.
After the interval at which a CR-LSP is optimized elapses, constraint shortest path first
(CSPF) attempts to calculate a new path. If the calculated path has a metric smaller than
that of the existing CR-LSP, a new CR-LSP is established over the new path. After the
CR-LSP is successfully established, the ingress instructs the forwarding plane to switch
traffic to the new CR-LSP and tear down the original CR-LSP. Re-optimization is then
complete. If the CR-LSP fails to be established, traffic is still forwarded along the
existing CR-LSP.
l Manual re-optimization
A re-optimization command is run in the user view to trigger re-optimization.
CR-LSP attribute templates can be used to flexibly configure MPLS TE tunnels in batches
and effectively manage these tunnels.
Background
If many MPLS TE tunnels need to be established, lots of MPLS TE functions need to be
configured and managed. To reduce workload, CR-LSP attribute templates each with a set of
parameters can be used, providing configuration flexibility.
Related Concept
A CR-LSP attribute template is a set of CR-LSP parameters that are configured on a tunnel
interface.
Implementation
A network administrator creates a CR-LSP attribute template and sets attributes in this
attribute template. This attribute template is used on a tunnel interface of an ingress. The
ingress can use this template to create CR-LSPs. Table 10-8 lists attributes that can be
configured in a CR-LSP attribute template.
Attribute                                     Description
Flag indicating the route and label record    Enables a tunnel interface to record routes and labels.
Constraints for a bypass tunnel               Setup priority value, holding priority value, and bandwidth for a bypass tunnel that protects a CR-LSP established using this template.
After an attribute template is used to create a CR-LSP, this template can also be used to
manage and maintain CR-LSP attributes. CR-LSP attributes can be modified in either of the
following modes:
l Configurations in the attribute template that is used to establish a CR-LSP are modified.
The attribute template can be modified to update the attributes of an existing CR-LSP
that uses this attribute template.
A specific attribute template update has a specific impact on the setup of CR-LSPs:
– If the priority or bandwidth type is changed, the ingress tears down the existing CR-
LSP and uses the changed attribute to establish a new CR-LSP.
– If other attributes are changed, the ingress implements the make-before-break
procedure.
l Commands are run on the tunnel interface to update attributes.
When attributes are configured both using the attribute template and command lines on
the tunnel interface, the attributes configured using command lines on the tunnel
interface are used to establish a CR-LSP.
For example, an attribute template named lsp-attribute 1 is used to establish a tunnel named
Tunnel 1 with the hop limit 24. If the mpls te hop-limit command is used on the tunnel
interface to set the hop limit to 16, the ingress implements the make-before-break procedure
and establishes a new CR-LSP with the hop limit 16.
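The precedence and update rules above can be summarized in a short sketch. The attribute keys, the bandwidth value, and the function names are hypothetical; the hop limits 24 and 16 and the two update behaviors come from the text.

```python
# Attributes whose change forces a teardown before the new CR-LSP is set up.
BREAK_FIRST = {"priority", "bandwidth_type"}

def effective_attributes(template, interface_settings):
    """Settings configured on the tunnel interface override template values."""
    merged = dict(template)
    merged.update(interface_settings)
    return merged

def update_action(changed_attribute):
    """How the ingress reacts when one attribute changes."""
    if changed_attribute in BREAK_FIRST:
        return "tear-down-then-setup"
    return "make-before-break"

template = {"hop_limit": 24, "bandwidth": 100}  # bandwidth value is illustrative
attributes = effective_attributes(template, {"hop_limit": 16})
action = update_action("hop_limit")
```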
Other Usage
A primary CR-LSP can be established using a specific attribute template. A maximum of
three attribute templates can be specified for a hot-standby CR-LSP or an ordinary backup
CR-LSP. Each attribute template contains specific attributes. One attribute template can be
selected to establish a desired CR-LSP.
In addition to the attribute templates, a best-effort path can also be configured on the tunnel
interface. This means that the hot-standby CR-LSP, ordinary backup CR-LSP, and best-effort
path can be configured on the same tunnel interface. If no attribute template is used, only hot-
standby CR-LSPs and best-effort paths can be configured simultaneously on a tunnel
interface.
Deployment Scenarios
CR-LSP attribute templates can be used to establish primary and backup CR-LSPs bound to a
TE tunnel.
Benefits
CR-LSP attribute templates on a tunnel interface offer the following advantages:
l CR-LSPs with the same TE attributes can be established in a batch, which greatly
simplifies configurations.
l CR-LSP attribute templates with different settings can be configured, and one of them
can be selected to establish a hot-standby or an ordinary CR-LSP. More attributes and
paths are provided for the CR-LSP than those configured using commands.
l A maximum of three CR-LSP attribute templates on a TE tunnel interface are designated
for a hot-standby CR-LSP or an ordinary CR-LSP. Different protection paths are
available.
l A hot-standby CR-LSP, an ordinary CR-LSP, and a best-effort path are configured
simultaneously to protect a primary CR-LSP on the same tunnel interface.
l Modifying attributes in a CR-LSP attribute template updates the configuration of CR-
LSPs that have been established using that attribute template, providing more flexibility
for CR-LSP configuration.
Fault detection    Rapidly detects MPLS TE network faults to speed up a protection switchover.
                   l RSVP Hello
                   l BFD for TE
10.3.5.2 Make-Before-Break
The make-before-break mechanism prevents traffic loss during a traffic switchover between
two CR-LSPs. This mechanism improves MPLS TE tunnel reliability.
Background
MPLS TE provides a set of tunnel update mechanisms, which prevent traffic loss during
tunnel updates. In real-world situations, an administrator can modify the bandwidth or explicit
path attributes of an established MPLS TE tunnel based on service requirements. An updated
topology allows for a path better than the existing one, over which an MPLS TE tunnel can be
established. Any change in bandwidth or path attributes causes a CR-LSP in an MPLS TE
tunnel to be reestablished using new attributes and causes traffic to switch from the previous
CR-LSP to the newly established CR-LSP. During the traffic switchover, the make-before-
break mechanism prevents traffic loss that occurs if the traffic switchover is implemented
more quickly than the path switchover.
Principles
Make-before-break is a mechanism that allows a CR-LSP to be established using changed
bandwidth and path attributes over a new path before the original CR-LSP is torn down. It
helps minimize data loss and additional bandwidth consumption. The new CR-LSP is called a
modified CR-LSP. Make-before-break is implemented using the shared explicit (SE) resource
reservation style.
The new CR-LSP competes with the original CR-LSP on some shared links for bandwidth.
The new CR-LSP cannot be established if it fails the competition. The make-before-break
mechanism allows the system to reserve bandwidth used by the original CR-LSP for the new
CR-LSP, without calculating the bandwidth to be reserved. Additional bandwidth is used if
links on the new path do not overlap the links on the original path.
[Figure 10-63: network with LSRA, LSRB, LSRC, LSRD, and LSRE used in the make-before-break
examples]
In this example, the maximum reservable bandwidth on each link is 60 Mbit/s on the network
shown in Figure 10-63. A CR-LSP along the path LSRA → LSRB → LSRC → LSRD is
established, with the bandwidth of 40 Mbit/s.
The path is expected to change to LSRA → LSRE → LSRC → LSRD to forward data
because LSRE has a light load. The reservable bandwidth of the link between LSRC and
LSRD is just 20 Mbit/s. The total available bandwidth for the new path is less than 40 Mbit/s.
The make-before-break mechanism can be used in this situation.
The make-before-break mechanism allows the newly established CR-LSP over the path
LSRA → LSRE → LSRC → LSRD to use the bandwidth of the original CR-LSP's link
between LSRC and LSRD. After the new CR-LSP is established over the path, traffic
switches to the new CR-LSP, and the original CR-LSP is torn down.
In addition to the preceding method, another method of increasing the tunnel bandwidth can
be used. If the reservable bandwidth of a shared link increases to a certain extent, a new CR-
LSP can be established.
In the example shown in Figure 10-63, the maximum reservable bandwidth on each link is 60
Mbit/s. A CR-LSP along the path LSRA → LSRB → LSRC → LSRD is established, with the
bandwidth of 30 Mbit/s.
The path is expected to change to LSRA → LSRE → LSRC → LSRD to forward data
because LSRE has a light load, and the bandwidth is expected to increase to 40 Mbit/s. The
reservable bandwidth of the link between LSRC and LSRD is just 30 Mbit/s. The total
available bandwidth for the new path is less than 40 Mbit/s. The make-before-break
mechanism can be used in this situation.
The make-before-break mechanism allows the newly established CR-LSP over the path
LSRA → LSRE → LSRC → LSRD to use the bandwidth of the original CR-LSP's link
between LSRC and LSRD. The bandwidth of the new CR-LSP is 40 Mbit/s, out of which 30
Mbit/s is released by the link between LSRC and LSRD. After the new CR-LSP is
established, traffic switches to the new CR-LSP and the original CR-LSP is torn down.
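The two examples above can be checked with a little arithmetic. The sketch below assumes that, under the SE style, a link shared by the original and new CR-LSPs only needs the increase in bandwidth, whereas a link used only by the new CR-LSP needs the full amount. The function names are invented for illustration; the bandwidth figures are those of the examples.

```python
def extra_bandwidth_needed(old_path, new_path, old_bw, new_bw):
    """Additional reservation per link for the new CR-LSP under the SE style."""
    old_links = set(zip(old_path, old_path[1:]))
    need = {}
    for link in zip(new_path, new_path[1:]):
        if link in old_links:
            need[link] = max(new_bw - old_bw, 0)  # shared link: only the increase
        else:
            need[link] = new_bw                   # new link: full bandwidth
    return need

def can_establish(need, reservable):
    """True if every link has enough unreserved bandwidth left."""
    return all(reservable[link] >= bw for link, bw in need.items())

OLD = ["LSRA", "LSRB", "LSRC", "LSRD"]
NEW = ["LSRA", "LSRE", "LSRC", "LSRD"]

# Example 1: 60 Mbit/s per link, original CR-LSP 40 Mbit/s, new CR-LSP 40 Mbit/s.
need_1 = extra_bandwidth_needed(OLD, NEW, 40, 40)
free_1 = {("LSRA", "LSRE"): 60, ("LSRE", "LSRC"): 60, ("LSRC", "LSRD"): 20}
ok_1 = can_establish(need_1, free_1)

# Example 2: original CR-LSP 30 Mbit/s, new CR-LSP 40 Mbit/s.
need_2 = extra_bandwidth_needed(OLD, NEW, 30, 40)
free_2 = {("LSRA", "LSRE"): 60, ("LSRE", "LSRC"): 60, ("LSRC", "LSRD"): 30}
ok_2 = can_establish(need_2, free_2)

# Without sharing, the LSRC-LSRD link in example 1 would need 40 but has only 20.
ok_without_sharing = can_establish({("LSRC", "LSRD"): 40}, free_1)
```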
torn down a specified delay later after a new CR-LSP is established. The switching delay and
deletion delay can be manually configured.
10.3.5.3 TE FRR
TE FRR protects links and nodes on CR-LSPs bound to an MPLS TE tunnel. If a link or node
fails, TE FRR rapidly switches traffic to a backup path, minimizing traffic loss.
Background
A link or node failure triggers a primary/backup CR-LSP switchover. IGP routes of the
backup path need to converge, and CSPF recalculates a path over which a CR-LSP is
established. Traffic is dropped during this process.
TE FRR can be used to prevent traffic loss. After a link or node fails, TE FRR establishes a
bypass CR-LSP, which excludes the faulty link or node. The bypass CR-LSP can rapidly take
over traffic, minimizing traffic loss. The ingress can reestablish a primary CR-LSP.
Related Concepts
[Figure: primary CR-LSP and bypass CR-LSP between the PLR and MP]
Concept                        Description
Bypass CR-LSP                  A CR-LSP that protects the primary CR-LSP. The bypass CR-LSP is usually idle and transmits little data. If the bypass CR-LSP needs to forward service data while it protects the primary CR-LSP, sufficient bandwidth must be allocated to the bypass CR-LSP.
Point of Local Repair (PLR)    The ingress of the bypass CR-LSP. It must be on the path of the primary CR-LSP. The PLR can be the ingress, but not the egress, of the primary CR-LSP.
Merge point (MP)               The egress of the bypass CR-LSP. It must be on the path of the primary CR-LSP. The MP cannot be the ingress of the primary CR-LSP.
Object to be protected
  Link protection: The PLR (LSRB) and MP (LSRC) are directly connected, and the primary CR-LSP
  passes through the direct link. Bypass CR-LSP 1 protects the direct link, as shown in
  Figure 10-65.
  Node protection: A primary CR-LSP between the PLR (LSRB) and MP (LSRD) passes through LSRC.
  Bypass CR-LSP 2 protects LSRC on the primary CR-LSP, as shown in Figure 10-65.
Bandwidth
  Bandwidth protection: The bandwidth of a bypass CR-LSP is higher than or equal to that of the
  primary CR-LSP. The bypass CR-LSP protects the primary CR-LSP and its bandwidth.
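[Figure 10-65: link protection (bypass CR-LSP 1) and node protection (bypass CR-LSP 2) on a
primary CR-LSP from LSRA to LSRE]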
NOTE
A bypass CR-LSP supports the combination of protection modes. For example, manual protection, node
protection, and bandwidth protection can be implemented together on a bypass CR-LSP.
Implementation
The PLR implements TE FRR as follows:
[Figure: Path message sent from LSRA toward LSRE through the PLR and MP, carrying the
SESSION_ATTRIBUTE object with the "Local protection desired" and "Bandwidth protection
desired" flags]
If multiple bypass CR-LSPs are established, the PLR selects the one with the highest
priority. The PLR prioritizes bypass CR-LSPs in the following order:
– Bandwidth protection
– Non-bandwidth protection
– Manual protection
– Auto FRR protection
– Node protection
– Link protection
Both bypass CR-LSPs 1 and 2 shown in Figure 10-67 are manually configured and
provide bandwidth protection. Bypass CR-LSP 1, which protects a link, has a lower
priority than bypass CR-LSP 2, which protects a node. In such a scenario, bypass CR-
LSP 2 is then bound to a primary CR-LSP. If bypass CR-LSP 1 only protects bandwidth
and bypass CR-LSP 2 only protects a link, bypass CR-LSP 1 is then bound to the
primary CR-LSP.
After the binding is complete, the primary CR-LSP NHLFE records the bypass CR-LSP
NHLFE index and an inner label that the MP allocates for the primary CR-LSP. The
label is used to forward traffic from the MP to the next hop along the primary CR-LSP.
3. Performs fault detection.
– Link protection directly uses a data link layer protocol to detect and report faults.
The speed of fault detection at the data link layer depends on the link type.
– Node protection uses a data link layer protocol to detect link faults. If no link fault
occurs, the bidirectional forwarding detection (BFD) mechanism is used to detect
faults in a protected node.
After a link or node fault is detected, FRR switching is triggered immediately.
NOTE
If node protection is enabled, only the link between the protected node and PLR is protected. The
PLR cannot detect faults in the link between the protected node and MP.
4. Performs a traffic switchover.
If the primary CR-LSP fails, both data traffic and RSVP messages switch to the bypass
CR-LSP, and the switchover event is reported upstream. The PLR pushes both an inner
label that the MP assigns for the primary CR-LSP and an outer label assigned for the
bypass CR-LSP into a packet. The outer label is removed at the penultimate hop of the
bypass CR-LSP, and the packet, only with the inner label, arrives at the MP. The MP
forwards the packet to the next hop along the primary CR-LSP.
[Figures 10-68 and 10-69: primary and bypass LSPs from LSRA to LSRE with the PLR and MP marked.
At the faulty point, the PLR swaps label 1024 for 1022 and pushes label 34; the packet reaches
the MP with label 1022.]
Figure 10-68 shows nodes on the primary and bypass CR-LSPs and their allocated
labels and forwarding behaviors. The bypass CR-LSP provides node protection. If the
link between LSRB and LSRC fails or LSRC fails, LSRB (PLR) swaps an inner label
1024 for an inner label 1022, pushes an outer label 34 into the packet, and forwards the
packet over the bypass CR-LSP. After the packet arrives at LSRD, LSRD forwards the
packet to the next hop LSRE. For the detailed transmission procedure, see Figure 10-69.
5. Performs a traffic switchback.
After TE FRR (either manual or Auto FRR) switching is complete, the PLR (ingress)
attempts to reestablish the primary CR-LSP using the make-before-break mechanism.
Service traffic and RSVP messages switch from the bypass CR-LSP back to the primary
CR-LSP after the primary CR-LSP is successfully reestablished. The reestablished CR-
LSP is called a modified CR-LSP. The make-before-break mechanism allows the original
primary CR-LSP to be torn down only after the modified CR-LSP is established
successfully.
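Two parts of the procedure above lend themselves to a short sketch: choosing a bypass CR-LSP by priority and building the label stack at the PLR during a switchover. The dictionary keys and function names are invented for illustration, and the priority is modeled as a simple tuple comparison in the order listed in the text. The labels 1024, 1022, and 34 are taken from the example.

```python
def select_bypass(candidates):
    """Pick the bypass CR-LSP with the highest priority.

    Order from the text: bandwidth protection over non-bandwidth protection,
    then manual over Auto FRR, then node protection over link protection.
    """
    return max(
        candidates,
        key=lambda c: (c["bandwidth_protection"], c["manual"], c["node_protection"]),
    )

def plr_switch(label_stack, mp_label, bypass_label):
    """Swap the top label for the one the MP assigned, then push the bypass label."""
    return [bypass_label, mp_label] + label_stack[1:]

def penultimate_hop_pop(label_stack):
    """The penultimate hop of the bypass CR-LSP removes the outer label."""
    return label_stack[1:]

bypass_1 = {"name": "bypass1", "bandwidth_protection": True,
            "manual": True, "node_protection": False}
bypass_2 = {"name": "bypass2", "bandwidth_protection": True,
            "manual": True, "node_protection": True}
chosen = select_bypass([bypass_1, bypass_2])

on_bypass = plr_switch([1024], mp_label=1022, bypass_label=34)
at_mp = penultimate_hop_pop(on_bypass)
```

The packet arrives at the MP carrying only the inner label, so the MP can forward it along the primary CR-LSP as if no fault had occurred.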
NOTE
FRR does not take effect if multiple nodes fail simultaneously. This means that after FRR switches data
from the primary CR-LSP to the bypass CR-LSP, all nodes on the bypass CR-LSP must be working
properly when transmitting data. If the bypass CR-LSP fails, the protected data cannot be forwarded,
and the FRR function fails. Even if the bypass CR-LSP is reestablished, it cannot forward data. Data will
be restored only after the primary CR-LSP is restored or reestablished.
Other Usage
l Board hot removal protection
Board hot removal protection protects traffic on the primary CR-LSP's outbound
interface on a PLR. If an interface board on which a protected outbound interface of a
primary CR-LSP resides is removed from a PLR, the PLR rapidly switches traffic to a
bypass CR-LSP. After the interface board is re-installed and the outbound interface of the
primary CR-LSP becomes available, traffic switches back to the primary CR-LSP.
Hot removal protection does not apply to an interface board on which tunnel interfaces
are configured. If an interface board configured with a tunnel interface is removed, CR-
LSP information is lost and traffic is interrupted. The primary and bypass CR-LSPs'
tunnel interfaces and the bypass CR-LSP's outbound interface must be configured on
boards different from the board configured with the primary CR-LSP's outbound interface
on the PLR.
Configuring tunnel interfaces on the main control board of the PLR is recommended. If
the interface board on which the primary CR-LSP's outbound interface resides is removed or
fails, the primary CR-LSP's tunnel interface enters the Stale state, and resources
allocated to the tunnel interface remain. After the interface board is re-installed, the
tunnel interface recovers and a primary CR-LSP is reestablished.
l N:1 protection
A single bypass CR-LSP can protect traffic over multiple primary CR-LSPs.
Deployment Scenarios
TE FRR is a local protection mechanism that applies to MPLS TE networks that have backup
paths.
Benefits
TE FRR provides carrier-class local protection capabilities for CR-LSPs to improve network
reliability.
Related Concepts
CR-LSP backup functions include hot standby, ordinary backup, and the best-effort path
function. CR-LSP backup functions are as follows:
l Hot standby: A hot-standby CR-LSP is established immediately after a primary CR-LSP
is created. If the primary CR-LSP fails, the hot-standby CR-LSP takes over traffic from
the primary CR-LSP. After the primary CR-LSP recovers, traffic switches back.
l Ordinary backup: An ordinary backup CR-LSP can be established only after a primary
CR-LSP fails. The ordinary backup CR-LSP takes over traffic if the primary CR-LSP
fails. After the primary CR-LSP recovers, traffic switches back.
l Best-effort path
If both the primary and backup CR-LSPs fail, a best-effort path is established and takes
over traffic.
For example, the primary CR-LSP is established over the path PE1 → P1 → P2 → PE2,
and the backup CR-LSP is established over the path PE1 → P3 → PE2 shown in Figure
10-70. If both CR-LSPs fail, PE1 establishes a best-effort path PE1 → P4 → PE2 to take
over traffic.
[Figure 10-70: primary CR-LSP (PE1 → P1 → P2 → PE2), backup CR-LSP through P3, and best-effort
path through P4]
NOTE
A best-effort path has no bandwidth reserved for traffic, but has an affinity and a hop limit
configured as needed.
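The order in which the backup functions take over traffic can be sketched as a simple fallback chain. This is an illustration only: the function name and the "up"/"down" status strings are invented, and the paths are those of the Figure 10-70 example.

```python
PATHS = {
    "primary": ["PE1", "P1", "P2", "PE2"],
    "backup": ["PE1", "P3", "PE2"],
    "best-effort": ["PE1", "P4", "PE2"],  # established only after both others fail
}

def forwarding_path(status):
    """Return the name and hops of the path that carries traffic, or None."""
    for name in ("primary", "backup", "best-effort"):
        if status.get(name) == "up":
            return name, PATHS[name]
    return None

normal = forwarding_path({"primary": "up", "backup": "up"})
primary_failed = forwarding_path({"primary": "down", "backup": "up"})
both_failed = forwarding_path({"primary": "down", "backup": "down", "best-effort": "up"})
```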
Implementation
The procedure of CR-LSP backup is as follows:
1. CR-LSP backup is deployed.
Plan the paths, bandwidth values, and deployment modes. Table 10-12 lists CR-LSP
backup deployment items.
Dynamic bandwidth protection ensures that the hot-standby CR-LSP does not use bandwidth
while the primary CR-LSP is transmitting traffic. The dynamic bandwidth protection process
is as follows:
1. If the primary CR-LSP fails, traffic immediately switches to the hot-standby CR-LSP
with 0 bit/s bandwidth. The ingress uses the make-before-break mechanism to establish a
hot-standby CR-LSP.
2. After the new hot-standby CR-LSP has been successfully established, the ingress
switches traffic to this CR-LSP and tears down the hot-standby CR-LSP with 0 bit/s
bandwidth.
3. After the primary CR-LSP recovers, traffic switches back to the primary CR-LSP. The
hot-standby CR-LSP then releases the bandwidth it uses, and the ingress establishes
another hot-standby CR-LSP with no bandwidth.
The dynamic bandwidth function can be configured to allow the system to create a primary CR-
LSP and a hot-standby CR-LSP with the bandwidth of 0 bit/s simultaneously. The hot-standby
CR-LSP does not use bandwidth resources before the primary CR-LSP fails.
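The dynamic bandwidth protection process can be traced with a small state model. The
following Python sketch is illustrative only: the class, attribute, and method names are
hypothetical and do not correspond to device commands or to the device implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Tunnel:
    """Toy model of a tunnel with dynamic bandwidth protection (hypothetical names)."""
    bandwidth: int                  # bandwidth required by the service, in bit/s
    active: str = "primary"         # CR-LSP that currently carries traffic
    hot_standby_bw: int = 0         # bandwidth reserved by the hot-standby CR-LSP
    log: list = field(default_factory=list)

    def primary_fails(self) -> None:
        # Step 1: traffic moves at once to the 0 bit/s hot-standby CR-LSP.
        self.active = "hot-standby"
        self.log.append("switch to hot-standby CR-LSP (0 bit/s)")
        # Steps 1 and 2: make-before-break. A new hot-standby CR-LSP with
        # bandwidth is set up before the 0 bit/s CR-LSP is torn down.
        self.hot_standby_bw = self.bandwidth
        self.log.append("new hot-standby CR-LSP up, 0 bit/s CR-LSP torn down")

    def primary_recovers(self) -> None:
        # Step 3: traffic returns to the primary CR-LSP, and the hot-standby
        # CR-LSP is re-created without bandwidth.
        self.active = "primary"
        self.hot_standby_bw = 0
        self.log.append("switch back to primary, hot-standby CR-LSP at 0 bit/s")


tunnel = Tunnel(bandwidth=50_000_000)
tunnel.primary_fails()
print(tunnel.active, tunnel.hot_standby_bw)
tunnel.primary_recovers()
print(tunnel.active, tunnel.hot_standby_bw)
```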
Background
Most live IP radio access networks (RANs) use ring topologies and have the access ring
separated from the aggregation ring. To improve the end-to-end and inter-ring LSP reliability,
many IP RAN carriers require isolated primary and hot-standby LSPs. The CSPF algorithm
does not meet this reliability requirement, because CSPF is a metric-based path computing
algorithm that may compute two intersecting LSPs. Specifying explicit paths can meet this
reliability requirement; this method, however, does not adapt to topology changes. Each time
a node is added to or deleted from the IP RAN, operators must configure new explicit paths,
which is time-consuming and laborious. To resolve these problems, you can configure
isolated LSP computation.
Figure 10-71 shows an IP RAN on which a Multiprotocol Label Switching (MPLS) Traffic
Engineering (TE) tunnel is established between a cell site gateway (CSG) on the access ring
and a radio service gateway (RSG) on the aggregation ring. The MPLS TE tunnel implements
the end-to-end virtual private network (VPN) service. To improve the network reliability, this
network requires the constraint-based routed label switched path (CR-LSP) hot standby
feature and isolated primary and hot-standby LSPs.
Without the isolated LSP computation feature, CSPF on this network will compute CSG ->
ASG1 -> ASG2 -> RSG as the primary LSP. This LSP does not have an isolated hot-standby
LSP. However, two isolated LSPs exist on this network: CSG -> ASG1 -> RSG and CSG ->
ASG2 -> RSG. With the isolated LSP computation feature, the disjoint and CSPF algorithms
work simultaneously to get the two isolated LSPs.
Figure 10-71 Application of isolated LSP computation on an end-to-end VPN bearer network
[Figure: Node B, CSG, RSG, and RNC; link metric = 1; labels: L3VPN, Eth, PW, ATM/TDM]
Implementation
Isolated LSP computation is implemented by the disjoint and CSPF algorithms working
together. The feature computes the primary and hot-standby LSPs simultaneously and excludes
the overlapping segments of the two LSPs to obtain two isolated LSPs. In the example shown
in Figure 10-72, before isolated LSP computation is configured, CSPF computes LSRA -> LSRB
-> LSRC -> LSRD as the primary LSP and, if path overlapping is allowed, LSRA -> LSRC ->
LSRD as the hot-standby LSP. These two LSPs intersect, so they do not meet the reliability
requirement. After isolated LSP computation is configured, the disjoint and CSPF algorithms
compute LSRA -> LSRB -> LSRD as the primary LSP and LSRA -> LSRC -> LSRD as the
hot-standby LSP. These two LSPs do not intersect, so they meet the reliability requirement.
[Figure 10-72: two panels, "Before isolated LSP computation is configured" and "After
isolated LSP computation is configured", each showing LSRB and LSRC with link metrics 2
and 3; legend: primary LSP, hot-standby LSP, excluded path]
NOTE
l Isolated LSP computation is a best-effort technique. If the disjoint and CSPF algorithms cannot get
isolated primary and hot-standby LSPs or two isolated LSPs do not exist, the device uses the
primary and hot-standby LSPs computed by CSPF.
l The disjoint algorithm cannot work together with the following features: explicit path, affinities, hop
limit, CR-LSP attribute template, and automatic bandwidth adjustment. Therefore, before you
configure isolated LSP computation, check that all those features are disabled. Otherwise, the device
does not allow you to configure isolated LSP computation. After you configure isolated LSP
computation, the device does not allow you to configure any of those features, either.
l After you configure isolated LSP computation, the shared risk link group (SRLG), if configured,
becomes ineffective.
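The effect of isolated LSP computation can be illustrated with a short Python sketch. The
link metrics below are hypothetical values chosen so that the lowest-metric path is LSRA
-> LSRB -> LSRC -> LSRD, as in the example above. The document does not specify the
disjoint algorithm, so exhaustive path enumeration stands in for it here; this is an
assumption suitable only for very small topologies.

```python
from itertools import combinations

# Hypothetical undirected link metrics, loosely modeled on Figure 10-72.
LINKS = {
    ("LSRA", "LSRB"): 1, ("LSRB", "LSRC"): 1, ("LSRC", "LSRD"): 1,
    ("LSRA", "LSRC"): 3, ("LSRB", "LSRD"): 3,
}


def neighbors(node):
    """Yield (neighbor, metric) for every link attached to node."""
    for (a, b), metric in LINKS.items():
        if a == node:
            yield b, metric
        elif b == node:
            yield a, metric


def simple_paths(src, dst, path=None, cost=0):
    """Yield (cost, path) for every loop-free path from src to dst."""
    path = path or [src]
    if path[-1] == dst:
        yield cost, tuple(path)
        return
    for nxt, metric in neighbors(path[-1]):
        if nxt not in path:
            yield from simple_paths(src, dst, path + [nxt], cost + metric)


def isolated(p, q):
    """True if two distinct paths share no transit node (and so no link)."""
    return not (set(p[1:-1]) & set(q[1:-1]))


def compute_lsps(src, dst):
    """Return (primary, hot_standby, isolated_flag)."""
    paths = sorted(simple_paths(src, dst))
    pairs = [(c1 + c2, p1, p2)
             for (c1, p1), (c2, p2) in combinations(paths, 2) if isolated(p1, p2)]
    if pairs:
        _, primary, standby = min(pairs)
        return primary, standby, True
    # Best effort: fall back to the two lowest-metric paths.
    primary = paths[0][1]
    standby = paths[1][1] if len(paths) > 1 else None
    return primary, standby, False


print(compute_lsps("LSRA", "LSRD"))
```

With these metrics, the sketch selects LSRA -> LSRB -> LSRD and LSRA -> LSRC -> LSRD, the
only pair of paths that share no transit node.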
Usage Scenario
Isolated LSP computation applies to networks on which Resource Reservation Protocol -
Traffic Engineering (RSVP-TE) tunnels and the CR-LSP hot standby feature are configured.
Benefits
Isolated LSP computation offers the following benefits to carriers:
10.3.5.6 SRLG
The shared risk link group (SRLG) functions as a constraint that is used to calculate a backup
path in the scenario where CR-LSP hot standby or TE FRR is used. This constraint helps
prevent backup and primary paths from overlapping over links with the same risk level,
improving MPLS TE tunnel reliability as a consequence.
Background
Carriers use CR-LSP hot standby or TE FRR to improve MPLS TE tunnel reliability.
However, in real-world situations, protection failures may occur, requiring the SRLG
technique to be configured as a preventative measure, as the following example demonstrates.
[Figure 10-73: logical topology (PE1, P1, P2, PE2, and P3, with the SRLG marked) above the
physical topology, in which P1, P2, and P3 connect through optical transport device NE1
over shared links]
The primary tunnel is established over the path PE1 → P1 → P2 → PE2 on the network
shown in Figure 10-73. The link between P1 and P2 is protected by a TE FRR bypass tunnel
established over the path P1 → P3 → P2.
In the lower part of Figure 10-73, core nodes P1, P2, and P3 are connected using a transport
network device. They share some transport network links marked yellow. If a fault occurs on
a shared link, both the primary and FRR bypass tunnels are affected, causing an FRR
protection failure. An SRLG can be configured to prevent the FRR bypass tunnel from
sharing a link with the primary tunnel, ensuring that FRR properly protects the primary
tunnel.
Related Concepts
An SRLG is a set of links that share the same risk of failure. If one link in an SRLG
fails, the other links in the group are likely to fail as well. Therefore, if a
hot-standby CR-LSP or FRR bypass tunnel uses a link in the same SRLG as the path it
protects, the hot-standby CR-LSP or FRR bypass tunnel cannot provide protection.
Implementation
An SRLG link attribute is a number. Links with the same SRLG number belong to the same
SRLG.
Interior Gateway Protocol (IGP) TE advertises SRLG information to all nodes in a single
MPLS TE domain. The Constrained Shortest Path First (CSPF) algorithm uses the SRLG
attribute together with other constraints, such as bandwidth, to calculate a path. CSPF
applies the SRLG attribute in either of the following modes:
l Strict mode: The SRLG attribute is a necessary constraint used by CSPF to calculate a
path for a hot-standby CR-LSP or an FRR bypass tunnel.
l Preferred mode: The SRLG attribute is an optional constraint used by CSPF to calculate
a path for a hot-standby CR-LSP or FRR bypass tunnel. For example, if CSPF fails to
calculate a path for a hot-standby CR-LSP based on the SRLG attribute, CSPF
recalculates the path, regardless of the SRLG attribute.
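The difference between the two modes can be sketched in Python. The topology, metrics,
SRLG numbers, and function names below are hypothetical; the sketch uses a plain Dijkstra
search in place of CSPF and ignores bandwidth and other constraints.

```python
import heapq


def cspf(links, src, dst, avoid_srlgs=frozenset(), avoid_links=frozenset()):
    """Lowest-metric path from src to dst, skipping excluded links and SRLGs."""
    queue, seen = [(0, src, (src,))], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for (a, b), (metric, srlgs) in links.items():
            if node not in (a, b) or (a, b) in avoid_links or srlgs & avoid_srlgs:
                continue
            nxt = b if node == a else a
            if nxt not in seen:
                heapq.heappush(queue, (cost + metric, nxt, path + (nxt,)))
    return None


def backup_path(links, src, dst, protected_link, mode):
    """Compute a bypass or hot-standby path in 'strict' or 'preferred' mode."""
    risk = links[protected_link][1]
    avoid = frozenset([protected_link])
    path = cspf(links, src, dst, avoid_srlgs=risk, avoid_links=avoid)
    if path is None and mode == "preferred":
        # Preferred mode: recalculate the path regardless of the SRLG attribute.
        path = cspf(links, src, dst, avoid_links=avoid)
    return path


# Hypothetical topology: value = (metric, set of SRLG numbers).
LINKS = {
    ("P1", "P2"): (1, frozenset({100})),
    ("P1", "P3"): (1, frozenset({100})),   # shares a transport link with P1-P2
    ("P3", "P2"): (1, frozenset()),
}
print(backup_path(LINKS, "P1", "P2", ("P1", "P2"), "strict"))     # None
print(backup_path(LINKS, "P1", "P2", ("P1", "P2"), "preferred"))  # ('P1', 'P3', 'P2')
```

In strict mode no bypass path exists because the only alternative shares SRLG 100 with the
protected link. In preferred mode the same path is accepted as a fallback.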
Usage Scenario
The SRLG attribute is used in either the TE FRR or CR-LSP hot-standby scenario.
Benefits
The SRLG attribute limits the selection of a path for a hot-standby CR-LSP or an FRR bypass
tunnel, which prevents the primary and bypass tunnels from sharing links with the same risk
level.
Related Concepts
Concepts related to a tunnel protection group are as follows:
[Figure 10-74: working tunnel-1, working tunnel-2, and protection tunnel-3 between LSRA
and LSRB; legend: data flow when the primary tunnel is normal, data flow when the primary
tunnel has failed]
Primary tunnels tunnel-1 and tunnel-2, and the bypass tunnel tunnel-3 are established on the
ingress LSRA on the network shown in Figure 10-74.
Tunnel-3 is configured as a protection tunnel for primary tunnels tunnel-1 and tunnel-2 on
LSRA. If the configured fault detection mechanism on the ingress detects a fault in tunnel-1,
traffic switches to tunnel-3. LSRA attempts to reestablish tunnel-1. If tunnel-1 is successfully
established, traffic switches back to the primary tunnel.
Implementation
A TE tunnel protection group uses a configured protection tunnel to protect traffic on the
working tunnel, improving tunnel reliability. For the protection to be effective, plan the
network so that the protection tunnel excludes the links and nodes through which the
working tunnel passes.
1. Establishment
The working and protection tunnels must have the same ingress and destination address. The
protection tunnel is established in the same procedure as a regular tunnel. The protection
tunnel can use attributes that differ from those for the working tunnel. Ensure that the
working and protection tunnels are established over different paths as much as possible.
NOTE
l A protection tunnel cannot be protected or enabled with TE FRR.
l Attributes for a protection tunnel can be configured independently of those for the
working tunnel, which facilitates the network planning.
2. Binding between the working and protection tunnels
The protection tunnel is bound to the tunnel ID of the working tunnel so that the two
tunnels form a tunnel protection group.
3. Fault detection
In addition to MPLS TE's own detection mechanism, MPLS OAM and BFD for CR-LSP are used to
detect faults in a tunnel protection group to speed up protection switching.
4. Protection switching
The tunnel protection group supports either of the following protection switching modes:
l Manual switching: Traffic is forcibly switched to the protection tunnel.
l Automatic switching: Traffic automatically switches to the protection tunnel if the
working tunnel fails.
A time interval can be set for automatic switching.
Other Usage
A tunnel protection group works in either 1:1 or N:1 mode. The 1:1 mode enables a protection
tunnel to protect only a single working tunnel. The N:1 mode enables a protection tunnel to
protect more than one working tunnel.
[Figure: working tunnel-1 and working tunnel-2 between LSRA and LSRB, protected by
protection tunnel-3; legend: data flow when the primary tunnel is normal, data flow when
the primary tunnel has failed]
Table 10-14 Comparison between CR-LSP backup and a tunnel protection group
Item: Object to be protected
l CR-LSP backup: Primary and backup CR-LSPs are established on the same tunnel interface.
A backup CR-LSP protects traffic on a primary CR-LSP.
l Tunnel protection group: One tunnel protects traffic over another tunnel in a tunnel
protection group.
Item: LSP attributes
l CR-LSP backup: Primary and backup CR-LSPs have the same attributes, except for the TE
FRR attribute. In addition, the bandwidth for the backup CR-LSP can be set separately.
l Tunnel protection group: The attributes of one tunnel in a tunnel protection group are
independent of the attributes of the other tunnel. For example, a protection tunnel with
no bandwidth can protect traffic on a working tunnel that has a bandwidth.
Background
TE FRR, CR-LSP backup, and tunnel protection groups can be used to improve the reliability
of MPLS TE networks. Without BFD, a fault is detected only when no message arrives within
the refresh period of RSVP Hello or RSVP messages, which makes detection slow. When a
Layer 2 device (such as a switch or hub) exists on the faulty link, slow detection delays
the traffic switchover and causes some traffic to be dropped. BFD rapidly detects faults in
MPLS TE tunnels and triggers a rapid traffic switchover to minimize traffic loss.
Related Concepts
BFD sessions are classified into the following types:
l Static BFD session: Local and remote discriminators are configured manually.
l Dynamic BFD session: Local and remote discriminators are allocated automatically.
NOTE
For details about BFD, see the chapter "BFD" in Feature Description - Reliability.
Implementation
The following BFD functions are supported for MPLS TE:
l BFD for CR-LSP
BFD monitors CR-LSPs. After BFD detects a fault in a CR-LSP, the BFD module
immediately instructs the forwarding plane to trigger a rapid traffic switchover. BFD for
CR-LSP is used together with a hot-standby CR-LSP or a tunnel protection group.
l BFD for RSVP
BFD can detect faults in links between RSVP neighboring nodes in milliseconds. BFD
for RSVP applies to a TE FRR network, on which Layer 2 devices exist between the
PLR and its RSVP neighboring nodes over the primary CR-LSP.
l BFD for TE tunnel
BFD can monitor MPLS TE tunnels that are used as public network tunnels to transmit
VPN traffic. BFD monitors a whole TE tunnel. If BFD detects a fault in a tunnel that
transmits private network traffic, the BFD module instructs the VPN FRR module to
perform a traffic switchover.
Figure 10-76 Traffic forwarding of a BFD session before and after a traffic switchover
[Figure: LSRA, LSRB, LSRC, and LSRD shown before and after the switchover; legend: primary
LSP, backup LSP, BFD session]
A BFD session for RSVP is established to monitor the link between RSVP neighbors shown
in Figure 10-77. The RSVP module can rapidly detect a link failure.
BFD for RSVP can share BFD sessions with BFD for Open Shortest Path First (OSPF), BFD
for Intermediate System to Intermediate System (IS-IS), or BFD for Border Gateway Protocol
(BGP). The local node selects the smallest values of parameters between the two ends of the
shared BFD session as local BFD parameters. The parameters include the interval at which
BFD packets are sent, interval at which BFD packets are received, and local detection
multiplier.
Differences
Table 10-15 lists differences between BFD for CR-LSP, BFD for RSVP, and BFD for TE
tunnel.
Table 10-15 Differences between BFD for CR-LSP, BFD for RSVP, and BFD for TE tunnel
Detection technique: BFD for RSVP
l Detection object: RSVP neighbor relationships
l Node: Two ends of an RSVP session
l Usage scenario: Can be used with TE FRR
l BFD session support: Dynamic
10.3.5.9 RSVP GR
RSVP graceful restart (GR) ensures uninterrupted transmission on the forwarding plane when
an active main board (AMB)/standby main board (SMB) switchover is performed on the
control plane. A GR helper assists a GR restarter in rapidly restoring the RSVP status.
Background
GR applies to provider edge (PE) devices on the provider network shown in Figure 10-78.
User nodes access the provider network through only a single PE. RSVP-TE tunnels are
established between PEs on the network to implement TE or transmit VPN traffic. If a PE
fails or a maintenance measure (such as a software upgrade) is taken, an AMB/SMB
switchover is performed on the PE. To prevent traffic loss during a traffic switchover, RSVP
GR can be implemented to ensure the uninterrupted transmission of critical services.
[Figure 10-78: CE1 and CE2 in VPNA and CE3 and CE4 in VPNB, connected through PE1, PE2,
PE3, and PE4 on the provider network]
Related Concepts
RSVP GR is a rapid status restoration mechanism for RSVP-TE that is implemented based on
non-stop forwarding (NSF).
Devices play the following roles during a GR process:
l GR restarter: performs a graceful restart.
l GR helper: assists the GR restarter in implementing a graceful restart.
RSVP GR supports the following messages:
l Hello messages: used to create a Hello session between RSVP neighboring nodes.
l Path messages with a restoration label: also called GRPath messages. A GRPath message
is sent by an upstream node and carries the content of the latest refreshed Path message.
l RecoveryPath messages: sent by a downstream node and carry the content of the last
Path message that is received by the downstream node.
The RSVP GR process involves the following periods:
l RSVP restart time: the period during which the restarter restarts RSVP-TE components and
reestablishes an RSVP signaling channel. The time is specified by the Restart timer.
l RSVP recovery time: the period, starting after the restarter and helper exchange Hello
messages, during which the restarter restores the RSVP soft state and refreshes MPLS
forwarding entries. The time is specified by the Recovery timer.
Implementation
RSVP GR depends on the RSVP Hello extension capability. After the RSVP Hello extension
capability is enabled, RSVP neighbors exchange Hello messages to advertise their GR
capabilities and GR parameters, including the RSVP restart time and recovery time. In
Figure 10-79, if a fault occurs and an AMB/SMB switchover is performed on a device, the
device functions as the restarter, and its upstream and downstream GR-capable RSVP
neighbors function as helpers.
[Figure 10-79: RSVP GR message exchange. The restarter and its upstream and downstream
helpers exchange Hello messages to advertise GR capabilities and time parameters to each
other. After RSVP restarting is complete, RecoveryPath, GRPath, Path, and Resv messages are
exchanged, followed by Hello messages, until RSVP recovery is complete.]
Deployment Scenarios
RSVP GR can be used on nodes that run RSVP-TE to establish MPLS TE tunnels to improve
device reliability.
Benefits
RSVP GR ensures uninterrupted data service transmission when the control plane performs an
AMB/SMB switchover. It supports device-level reliability for MPLS TE nodes.
Background
RSVP uses raw IP to transmit packets. Raw IP has no security mechanism and is prone to
attacks. RSVP authentication can be used to verify packets based on keys to prevent attacks.
Original RSVP authentication, however, cannot prevent replay attacks or the termination of
neighbor relationships caused by out-of-order RSVP messages. The RSVP authentication
enhancements address these problems by adding the authentication lifetime, handshake, and
message window functions. The enhancements improve security and make authentication more
robust in a harsh network environment, such as a congested network.
Related Concepts
l Raw IP: like UDP, a connectionless and unreliable service. Raw IP provides no delivery
control, so whether raw IP datagrams reach their destinations is uncertain. Hosts can
exchange data over raw IP without setting up virtual circuits.
Implementation
l Key authentication
RSVP authentication uses keys carried in packets exchanged between RSVP neighboring
nodes to verify those packets, preventing spoofing attacks. The same key must be
configured on two RSVP neighboring nodes before they perform RSVP authentication.
The RSVP authentication implementation is as follows:
a. A local node uses the Keyed-Hashing for Message Authentication-Message Digest 5
(HMAC-MD5) algorithm and the configured key to calculate a digest of an RSVP message.
b. The local node adds this digest to the RSVP message as an integrity object and
sends the message to the remote node.
c. After the remote node receives the message, the node uses the same key and
algorithm to calculate a digest and checks whether the calculated digest is the same
as the received one.
n If they match, the remote node accepts the message.
n If they do not match, the remote node discards the message.
l Handshake mechanism
The handshake mechanism maintains the RSVP authentication status. After RSVP
neighboring nodes authenticate each other, they exchange handshake packets. If they
accept the packets, they record a successful handshake. If a local node receives a packet
with a sequence number less than the local maximum sequence number, the local node
processes the packet as follows:
– Discards the packet if the packet shows that the handshake mechanism is not
enabled on the remote node.
– Discards the packet if the packet shows that the handshake mechanism is enabled
on the remote node and the local node has a record of a successful handshake. If
the local node does not have a record of a successful handshake, this packet is
the first one that arrives at the local node, and the local node starts a handshake
process.
NOTE
In the preceding procedure, the local node only records the maximum sequence number, without
the message window enabled.
l Message window
A message window saves sequence numbers of received RSVP messages. If the window
size is 1, only the largest sequence number is saved. If the window size is set to a value
greater than 1, the specified number of largest sequence numbers can be saved. For
example, a window size is set to 10, and the largest sequence number of a received
RSVP message is 80. The sequence numbers between 71 and 80 can be saved if there is
no packet mis-sequence. If a packet mis-sequence problem occurs, the local node
arranges the messages and records the 10 largest sequence numbers.
l Authentication lifetime
Authentication can be performed at a specified interval.
l Neighbor-oriented authentication
You can configure authentication information, such as authentication keys, based on
neighbor addresses. RSVP then authenticates each neighbor separately.
Either of the following items can be used as an RSVP neighbor address:
– IP address of an interface on an RSVP neighboring node
– LSR ID of an RSVP neighboring node
l Interface-oriented authentication
Authentication is configured on interfaces, and RSVP authenticates messages based on
inbound interfaces.
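Key authentication and the message window can be sketched with the Python standard
library. The function and class names are hypothetical, the message is a placeholder byte
string rather than a real RSVP packet, and the rule that the window rejects replayed or
too-old sequence numbers is an assumption added for illustration.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """HMAC-MD5 digest to be carried in the integrity object (simplified)."""
    return hmac.new(key, message, hashlib.md5).digest()


def verify(key: bytes, message: bytes, digest: bytes) -> bool:
    """Recompute the digest with the local key and compare it with the received one."""
    return hmac.compare_digest(sign(key, message), digest)


class MessageWindow:
    """Keeps the `size` largest sequence numbers seen so far."""

    def __init__(self, size: int):
        self.size = size
        self.seen = set()

    def accept(self, seq: int) -> bool:
        # Assumed policy: reject replays and numbers below a full window.
        if seq in self.seen or (len(self.seen) == self.size and seq < min(self.seen)):
            return False
        self.seen.add(seq)
        if len(self.seen) > self.size:
            self.seen.remove(min(self.seen))
        return True


key = b"shared-key"                 # the same key must be configured on both neighbors
msg = b"placeholder RSVP message"
print(verify(key, msg, sign(key, msg)))          # True: digests match
print(verify(b"other-key", msg, sign(key, msg))) # False: message is discarded

window = MessageWindow(size=10)
for seq in (75, 71, 80, 72, 79, 73, 78, 74, 77, 76):   # out-of-order arrival
    window.accept(seq)
print(sorted(window.seen))          # sequence numbers 71 to 80
```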
10.3.7 DS-TE
This section describes the background, basic concepts, principle, and applications of DS-TE.
10.3.7.1 Background
Background
l Advantages and disadvantages of MPLS TE
Multiprotocol label switching traffic engineering (MPLS TE) uses available resources to
establish a label switched path (LSP), and therefore provides guaranteed bandwidth for
traffic. MPLS TE can also precisely control traffic paths so that current bandwidth can be
fully used.
MPLS TE, however, cannot provide differentiated QoS guarantees for traffic of different
types. When both voice and video traffic is transmitted, video frames may be
retransmitted over a long period of time, so it may be required that video traffic be of a
higher drop precedence than voice traffic. MPLS TE, however, does not classify traffic
and processes voice and video traffic with the same drop precedence.
reduce the delay in processing voice packets on each hop. When traffic congestion
occurs, the more packets, the longer the queue, and the higher the delay in processing
packets. Therefore, you must restrict the voice traffic on each link.
If the MPLS DiffServ model is used in this case, services are distinguished, and a
specific MPLS TE LSP is configured for each type of service. When a link or node fails
on the network, the network topology changes, or an LSP is preempted, the voice traffic
rate on the link may still exceed the specification, and end-to-end QoS cannot be
guaranteed.
[Figure 10-81: topology with R1, R2, R3, R4, R5, and R7; 100 Mbit/s links and one 1000
Mbit/s link toward the Internet; traffic labels: VoIP 60 Mbit/s, VoIP 40 Mbit/s, and HSI
20 Mbit/s]
As shown in Figure 10-81, the bandwidth of each link is 100 Mbit/s, and all links share
the same metric. Voice traffic is transmitted from R1 to R4 and from R2 to R4 at the rate
of 60 Mbit/s and 40 Mbit/s, respectively. Traffic from R1 to R4 is transmitted along the
LSP over the path R1 → R3 → R4, with the ratio of voice traffic being 60% between R3
and R4. Traffic from R2 to R4 is transmitted along the LSP over the path R2 → R3 →
R7 → R4, with the ratio of voice traffic being 40% between R7 and R4.
When the link between R3 and R4 fails, as shown in Figure 10-82, the LSP between R1
and R4 switches to the path R1 → R3 → R7 → R4 because this path is the shortest path
with sufficient bandwidth. At this time, the ratio of voice traffic from R7 to R4 reaches
100%, which increases the total delay of voice traffic.
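The voice traffic ratios in this example follow from simple arithmetic, sketched below in
Python. The function name and data layout are hypothetical; the rates and paths are taken
from the example.

```python
LINK_BW = 100  # Mbit/s, bandwidth of every link in the example


def voice_ratio(lsps, link):
    """Share of link bandwidth used by the voice LSPs that traverse `link`."""
    load = sum(rate for rate, path in lsps if link in zip(path, path[1:]))
    return load / LINK_BW


# (voice rate in Mbit/s, path) before the failure
before = [(60, ["R1", "R3", "R4"]), (40, ["R2", "R3", "R7", "R4"])]
# after the R3-R4 link fails, the first LSP is rerouted through R7
after = [(60, ["R1", "R3", "R7", "R4"]), (40, ["R2", "R3", "R7", "R4"])]

print(voice_ratio(before, ("R3", "R4")))  # 0.6
print(voice_ratio(before, ("R7", "R4")))  # 0.4
print(voice_ratio(after, ("R7", "R4")))   # 1.0
```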
[Figure 10-82: the same topology after the link between R3 and R4 fails; one link is
labeled 100%]
[Figure: the same topology with per-link CT labels: CT for VoIP 60% and CT for HSI 20% on
one path; CT for VoIP 40% and CT for HSI 20% on another]
When the link from R3 to R4 fails, VoIP and HSI services from R1 to R4 are switched to the
path R1 → R3 → R5 → R6 → R4, as shown in Figure 10-84. Voice services from R1 to R4
can also be controlled within a proper range.
[Figure 10-84: the same topology after the link from R3 to R4 fails, with per-link CT
labels: CT for VoIP 60% and CT for HSI 20%; CT for VoIP 40% and CT for HSI 20%]
DS Field
To implement the DiffServ model, RFC 2474 redefines the ToS field in the IPv4 packet header
as the Differentiated Services (DS) field. The high-order 6 bits in the DS field specify
the DS Code Point (DSCP), and the low-order 2 bits are reserved.
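As a small illustration, the following Python sketch splits a DS field value into its two
parts, assuming the RFC 2474 layout in which the DSCP occupies the six most significant
bits of the byte. The function name is hypothetical.

```python
def split_ds_field(ds: int) -> tuple:
    """Split the 8-bit DS field into (DSCP, value of the 2 low-order bits)."""
    if not 0 <= ds <= 0xFF:
        raise ValueError("the DS field is one byte")
    return ds >> 2, ds & 0b11


print(split_ds_field(0xB8))   # (46, 0): DSCP 46 is Expedited Forwarding (EF)
```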
CT
To provide differentiated services, the DS-TE model divides the bandwidth of an LSP into
eight parts and allocates each part to a different service class. The set of bandwidth of
one LSP or a group of LSPs with the same service class is called a class type (CT). One CT
carries traffic of a single service type.
As defined by the IETF, DS-TE supports a maximum of eight CTs. A CT is represented as CTi,
where i ranges from 0 to 7.
TE-class
A TE-class indicates the combination of a CT and a priority in format of <CT, priority>.
The priority indicates the priority of the CR-LSP preemption rather than the value of the EXP
field in the MPLS packet header. The value of the preemption priority ranges from 0 to 7. The
smaller the value is, the higher the priority is. A CR-LSP can be set up only when both the
combination of its CT and setup priority (<CT, setup-priority>) and the combination of its CT
and holding-priority (<CT, hold-priority>) exist in the TE-class mapping table. For example,
suppose the TE-class mapping table of a certain node contains only TE-class[0] = <CT0, 6>
and TE-class[1] = <CT0, 7>.
Only the following types of CR-LSPs can be set up successfully:
l Class-Type = CT0, setup-priority = 6, holding-priority = 6
l Class-Type = CT0, setup-priority = 7, holding-priority = 6
l Class-Type = CT0, setup-priority = 7, holding-priority = 7
NOTE
The CR-LSPs of "Class-Type = CT0, setup-priority = 6, holding-priority = 7" cannot be configured,
because the setup priority of the CR-LSP cannot be higher than its holding priority.
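The setup rule can be checked with a few lines of Python. The function name and the
representation of the TE-class mapping table as a set of (CT, priority) pairs are
assumptions made for this sketch.

```python
def can_set_up(te_classes, ct, setup_priority, hold_priority):
    """Return True if a CR-LSP with these attributes can be set up."""
    # A smaller value means a higher priority. The setup priority must not be
    # higher than the holding priority.
    if setup_priority < hold_priority:
        return False
    return (ct, setup_priority) in te_classes and (ct, hold_priority) in te_classes


te_classes = {(0, 6), (0, 7)}   # TE-class[0] = <CT0, 6>, TE-class[1] = <CT0, 7>
for setup, hold in [(6, 6), (7, 6), (7, 7), (6, 7)]:
    print(setup, hold, can_set_up(te_classes, 0, setup, hold))
```

The first three combinations are accepted and the last one is rejected, which matches the
list above.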
Each of eight CTs can be combined with any of eight priorities, which theoretically yields 64
TE-classes. In the ATN, eight TE-classes can be configured manually.
DS-TE Modes
The DS-TE modes are as follows:
l IETF mode: the mode defined by the IETF. Eight CTs combined with eight priorities
specify 64 TE-classes. On the ATN, a maximum of eight TE-classes can be configured.
l Non-IETF mode: a mode not defined by the IETF. Each of two CTs is combined with each of
eight priorities, which yields 16 TE-classes.
10.3.7.3 Implementation
Basic Implementation
The edge nodes in the DiffServ model divide the traffic into several classes and add the class
information into the DSCP field in packets. The internal node chooses a proper PHB for the
packet according to the DSCP value.
The EXP field in the MPLS packet header carries information related to the DiffServ model.
The key to implementing DS-TE is mapping the DSCP field (with a maximum of 64 values)
to the EXP field (with a maximum of eight values). RFC 3270 defines the following
solutions:
l Label-Only-Inferred-PSC LSP (L-LSP): The drop priority is specified in the EXP field,
and the PHB is determined by the label value. During packet forwarding, the label
determines the packet forwarding path and allocates the PHB for the path.
l EXP-Inferred-PSC LSP (E-LSP): The PHB and the drop priority are specified in the
EXP field of the MPLS label. During packet forwarding, the label value determines the
packet forwarding path, and the EXP value determines the PHB. The E-LSP applies to
networks that support not more than 8 PHBs.
The ATN implements the E-LSP. The mapping of the DSCP to the EXP field complies with
RFC 3270. The mapping of the EXP field to the PHB is configured manually.
The CT is introduced to the DS-TE to allocate resources according to traffic types. The DS-
TE maps the traffic of the same PHB to one CT and allocates resources to each CT separately.
DS-TE LSPs are set up based on the CT. DS-TE calculates the path and reserves resources
based on the CT and its bandwidth.
IGP Extension
RFC 4124 extends IGP to support the DS-TE. RFC 4124 introduces a Bandwidth Constraints
sub-TLV into IGP and redefines the Unreserved Bandwidth sub-TLV. These sub-TLVs are
used to collect and advertise the reservable bandwidth for each CT along a link. For details,
see RFC 4124.
RSVP Extensions
IETF extends RSVP to implement the DS-TE in IETF mode. RFC 4124 defines a
CLASSTYPE object for Path messages. In the IETF draft (draft-minei-diffserv-te-multi-
class), a new object is defined, the extended-classtype object. For details, refer to RFC 4124
and draft-minei-diffserv-te-multi-class.
When an LSR along an LSP receives an RSVP Path message carrying the CT information and
has sufficient resources, the LSR agrees to set up the LSP and recalculates the reservable
bandwidth for each CT. After the LSP is set up, the reservable bandwidth information is
reported to IGP, and IGP advertises the information to other nodes on the network.
In the MAM, the total bandwidth of CTi along an LSP is not more than that of BCi (0 <=
i <= 7). The total bandwidth of CTs of all LSPs is not more than the maximum
reservable bandwidth.
For example, suppose the bandwidth of a link is 100 Mbit/s; the MAM is applied, and
three CTs are supported: CT0, CT1, and CT2. BC0 is 20 Mbit/s, bearing CT0 traffic (for
example, BE traffic); BC1 is 50 Mbit/s, bearing CT1 traffic (for example, AF traffic);
BC2 is 30 Mbit/s, bearing CT2 traffic (for example, EF traffic). The total bandwidth of
all LSPs bearing BE traffic cannot be more than 20 Mbit/s; the total bandwidth of all
LSPs bearing AF traffic cannot be more than 50 Mbit/s; the total bandwidth of all LSPs
bearing EF traffic cannot be more than 30 Mbit/s.
In the MAM, the bandwidth preemption is allowed between the LSPs of the same CT,
and is not allowed between different CTs. In the MAM, however, the bandwidth may be
wasted.
l Russian Dolls Model (RDM): CTs can share bandwidth. The BC model ID of the RDM
is 0.
The bandwidth of BC0 is equal to or less than the maximum reservable bandwidth of the
link. As shown in Figure 10-86:
– The total bandwidth of all LSPs from CT7 <= Bandwidth of BC7
– The total bandwidth of all LSPs from CT6 and CT7 <= Bandwidth of BC6
– The total bandwidth of all LSPs from CT5, CT6, and CT7 <= Bandwidth of BC5
– The total bandwidth of all LSPs from CT0, CT1,... CT7 <= Bandwidth of BC0 <=
Maximum reservable bandwidth
This model resembles Russian dolls, in which each bigger doll nests the smaller ones.
[Figure 10-86: nested bandwidth constraints. BC7 contains CT7; BC1 contains CT1 + ... +
CT7; BC0 contains CT0 + CT1 + ... + CT7.]
Max. reservable bandwidth >= BC0 >= BC1 >= ... >= BC7
For example, the bandwidth of a link is 100 Mbit/s; the RDM applies, and three CTs are
supported: CT0, CT1, and CT2. CT0 bears BE traffic; CT1 bears AF traffic; CT2 bears
EF traffic. The bandwidth of BC0 is 100 Mbit/s; the bandwidth of BC1 is 50 Mbit/s; the
bandwidth of BC2 is 20 Mbit/s. The total bandwidth of all LSPs bearing EF traffic
cannot be more than 20 Mbit/s; the total bandwidth of all LSPs bearing AF and EF traffic
cannot be more than 50 Mbit/s; the total bandwidth of all LSPs cannot be more than 100
Mbit/s.
The RDM allows the bandwidth preemption between CTs. If 0 <= m < n <= 7 and 0 <= i
< j <= 7, the CTi of priority m can preempt the bandwidth of CTi of priority n and the
bandwidth of CTj of priority n. The total bandwidth of CTi of all LSPs cannot exceed the
bandwidth of BCi.
In the RDM, the bandwidth can be used efficiently.
l Extended-MAM: a bandwidth allocation mode that supports E-LSPs. The BC model ID
of the extended-MAM is 254.
Unlike the MAM, the extended-MAM supports eight additional implicit CTs (the
combinations of CT0 and the eight priorities). IGP floods these eight implicit CTs, which
are carried in the unreserved BW TLV.
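The MAM and RDM admission rules can be compared with a short Python sketch that reuses the
bandwidth values from the two examples (all values in Mbit/s). The function names and data
layout are hypothetical, and preemption is not modeled.

```python
def mam_admits(bc, reserved, ct, bw, max_reservable):
    """MAM: each CT is limited by its own BC, and the sum by the link maximum."""
    new = dict(reserved)
    new[ct] = new.get(ct, 0) + bw
    return new[ct] <= bc[ct] and sum(new.values()) <= max_reservable


def rdm_admits(bc, reserved, ct, bw):
    """RDM: for every i, the bandwidth of CTi and all higher CTs must fit in BCi."""
    new = dict(reserved)
    new[ct] = new.get(ct, 0) + bw
    return all(
        sum(v for c, v in new.items() if c >= i) <= limit
        for i, limit in bc.items()
    )


mam_bc = {0: 20, 1: 50, 2: 30}                   # BC0 (BE), BC1 (AF), BC2 (EF)
print(mam_admits(mam_bc, {0: 15}, 0, 10, 100))   # False: CT0 would reach 25 > BC0
print(mam_admits(mam_bc, {0: 15}, 1, 50, 100))   # True

rdm_bc = {0: 100, 1: 50, 2: 20}
print(rdm_admits(rdm_bc, {1: 40}, 2, 20))        # False: CT1 + CT2 = 60 > BC1
print(rdm_admits(rdm_bc, {1: 40}, 0, 60))        # True: CT0 uses what CT1 and CT2 leave
```

The last call shows why the RDM uses bandwidth more efficiently: BE traffic can occupy the
60 Mbit/s that AF and EF traffic do not reserve, whereas the MAM would cap it at BC0.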
Table 10-16 Differences between the IETF mode and non-IETF mode
Item: Bandwidth constraints model
l Non-IETF mode: Supports the MAM and RDM.
l IETF mode: Supports the RDM, MAM, and extended-MAM.
Item: IGP message
l Non-IETF mode: The reservable bandwidth is carried in the Unreserved Bandwidth sub-TLV
based on the priority.
l IETF mode: The CT information is carried in the following sub-TLVs:
– Unreserved Bandwidth sub-TLV: carries the unreserved bandwidth of the 8 TE-classes, in
byte/s.
– Bandwidth Constraints sub-TLV: For RDM and MAM, it carries information about the BC
model and the BC bandwidth, in byte/s. For extended-MAM, it carries the unreserved
bandwidth of the 8 implicit TE-classes, in byte/s.
Item: Change in the TE-class mapping table
  Switching from the non-IETF mode to the IETF mode: If the TE-class mapping table is
  configured, it applies. Otherwise, the default one applies. For information about the
  default TE-class mapping table, see Table 10-18.
  Switching from the IETF mode to the non-IETF mode: The TE-class mapping table is not
  applied.
  l If a TE-class mapping table is configured, it is not deleted.
  l If no TE-class mapping table is configured, the default one is deleted.
Item: LSP deletion
  Switching from the non-IETF mode to the IETF mode: LSPs whose <CT, set-priority> or
  <CT, hold-priority> is not in the TE-class mapping table are torn down on the ingress
  and transit nodes.
  Switching from the IETF mode to the non-IETF mode: The following LSPs are torn down on
  the ingress and transit nodes:
  l Multi-CT LSPs
  l LSPs of single CT from CT2 to CT7
TE-Class        CT    Priority
TE-Class [0]    0     0
TE-Class [1]    1     0
TE-Class [2]    2     0
TE-Class [3]    3     0
TE-Class [4]    0     7
TE-Class [5]    1     7
TE-Class [6]    2     7
TE-Class [7]    3     7
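The default mapping above and the LSP teardown rule can be expressed as a small lookup. This is a sketch that assumes the two numeric columns of the table are the CT and the priority, in that order. The function names are hypothetical.

```python
# Default TE-class mapping from the table: TE-class index -> (CT, priority).
DEFAULT_TE_CLASS_MAP = {
    0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0),
    4: (0, 7), 5: (1, 7), 6: (2, 7), 7: (3, 7),
}


def te_class_of(ct, priority, table=DEFAULT_TE_CLASS_MAP):
    """Return the TE-class index for <CT, priority>, or None if it is absent."""
    for index, pair in table.items():
        if pair == (ct, priority):
            return index
    return None


def lsp_is_kept(ct, setup_priority, hold_priority, table=DEFAULT_TE_CLASS_MAP):
    """An LSP is kept only if both <CT, setup priority> and
    <CT, hold priority> appear in the TE-class mapping table."""
    return (te_class_of(ct, setup_priority, table) is not None
            and te_class_of(ct, hold_priority, table) is not None)
```

For example, in this sketch an LSP of CT1 with setup priority 7 and hold priority 0 is kept, whereas an LSP of CT2 with setup priority 3 is torn down because <2, 3> is not in the default table.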
Background
On a transport network that does not run a routing protocol, service packets exchanged by
two nodes must travel through the same links and nodes in both directions. Static
bidirectional co-routed LSPs meet this requirement.
Related Concept
Static bidirectional co-routed LSP:
A static bidirectional co-routed LSP is a type of LSP over which two flows are transmitted in
opposite directions by the same nodes over the same links. A static bidirectional co-routed
LSP is established manually. A static bidirectional co-routed LSP differs from two LSPs that
transmit traffic in opposite directions. Two unidirectional LSPs bound to a static bidirectional
co-routed LSP function as a whole LSP. Two forwarding tables are used to forward traffic in
opposite directions.
The static bidirectional co-routed LSP can go Up only when the conditions for forwarding
traffic in opposite directions are met. If the conditions for forwarding traffic in one direction
are not met, the bidirectional LSP is in the Down state. If no IP forwarding capabilities are
enabled on the bidirectional LSP, any intermediate node on the bidirectional LSP can reply
with a packet along the original path.
Implementation
A static bidirectional co-routed LSP is established by allocating labels manually to a specific
forwarding equivalence class (FEC). In manual label allocation mode, the outgoing label
value of a node is equal to the incoming label value of its next hop. Although this LSP is
established in the same way as a common static CR-LSP, a static bidirectional co-routed LSP
requires two forwarding tables, one for sending packets and the other for receiving packets.
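The manual label rule (the outgoing label of a node equals the incoming label of its next hop) and the two per-direction forwarding tables can be sketched as follows. The Entry type, the label values, and the three-node topology are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """One node's forwarding entry for one direction of the LSP."""
    in_label: Optional[int]   # None on the node where this direction originates
    out_label: Optional[int]  # None on the node where this direction terminates


def labels_consistent(entries: List[Entry]) -> bool:
    """Entries are ordered from the originating node to the terminating node.
    Each node's outgoing label must equal the next hop's incoming label."""
    return all(a.out_label is not None and a.out_label == b.in_label
               for a, b in zip(entries, entries[1:]))


def bidirectional_up(forward: List[Entry], reverse: List[Entry]) -> bool:
    """The bidirectional LSP is Up only if both directions can forward."""
    return labels_consistent(forward) and labels_consistent(reverse)


# CR-LSP: ingress -> transit -> egress.
forward = [Entry(None, 100), Entry(100, 200), Entry(200, None)]
# Reverse CR-LSP: egress -> transit -> ingress.
reverse = [Entry(None, 300), Entry(300, 400), Entry(400, None)]
# A reverse direction with a mismatched label on the transit node.
broken_reverse = [Entry(None, 300), Entry(301, 400), Entry(400, None)]
```

With these values, the pair (forward, reverse) is Up, and the pair (forward, broken_reverse) is Down because one direction cannot forward traffic.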
A node on a static bidirectional co-routed LSP only has information about the local LSP and
cannot obtain information about nodes on the other LSP. A static bidirectional co-routed LSP
shown in Figure 10-87 consists of a CR-LSP and a reverse CR-LSP. The CR-LSP originates
from the ingress and terminates on the egress. Its reverse CR-LSP originates from the egress
and terminates on the ingress.
Background
On an IP RAN, low-speed E1/T1 services are transmitted on the AC side. If some packets are
dropped or bit errors occur on a PW, faults must be diagnosed. To diagnose the faults, a static
bidirectional co-routed CR-LSP overlapping the PW is established, and loopback detection is
enabled to locate faults along the PW.
Implementation
In Figure 10-88, loopback detection is enabled for a static bidirectional co-routed CR-LSP
that overlaps a PW.
To avoid affecting existing services, the PW and the static bidirectional co-routed
CR-LSP must overlap. The PW must be a static PW whose outgoing label is the same as its
incoming label.
Fault diagnosis is performed on a low-speed AC-side interface that is not transmitting
services. The pseudo random binary sequence (PRBS) detection mechanism simulates traffic,
and the dichotomy (binary search) method is used in loopback detection to test the link to
each hop along the CR-LSP and locate the point where packet loss occurs.
Figure 10-88 Networking diagram for loopback detection for a static bidirectional co-routed
CR-LSP
With loopback detection enabled, a specified transit node loops traffic back to the ingress
along the CR-LSP. Loopback alarms can then be generated to inform users that loopback
detection is in progress. Loopback detection can be disabled manually, or it is disabled
automatically after it is complete. Its configuration takes effect only on a main control
board. After a master/slave main control board switchover is performed, loopback detection
is automatically disabled.
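The dichotomy method described above can be sketched as a binary search over the hops of the CR-LSP. The callback loopback_clean is hypothetical: it stands for enabling loopback on a given hop, sending PRBS traffic, and checking that the looped-back stream shows no loss. The sketch assumes a single fault, so every hop beyond the faulty point also shows loss.

```python
def locate_first_bad_hop(num_hops, loopback_clean):
    """Return the 1-based index of the first hop whose loopback test shows
    loss, or None if the loopback at every hop is clean."""
    lo, hi = 1, num_hops
    first_bad = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if loopback_clean(mid):
            lo = mid + 1          # the fault, if any, lies beyond this hop
        else:
            first_bad = mid       # the fault lies at or before this hop
            hi = mid - 1
    return first_bad


tested = []  # records which hops were actually looped back


def fake_loopback(hop, fault_at=6):
    """Simulated test: hops before the fault are clean."""
    tested.append(hop)
    return hop < fault_at


result = locate_first_bad_hop(8, fake_loopback)
```

In this simulated run over eight hops with the fault at hop 6, only three loopback tests are needed (hops 4, 6, and 5) instead of testing each hop in turn.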
Benefits
Loopback detection for a static bidirectional co-routed CR-LSP helps rapidly diagnose low-
speed service faults and improve network operation and maintenance efficiency.
Background
Existing networks have the following issues:
l RSVP-TE tunnels for transmitting TE services are unidirectional, and TE services are
transmitted from the ingress to the egress of a tunnel. TE services can be transmitted
from the egress to the ingress only using IP routes, which may cause traffic congestion.
l Another RSVP-TE tunnel (a reverse tunnel) can be configured to send services from the
egress to the ingress. If either the tunnel or its reverse tunnel fails, that tunnel
performs a traffic switchover, but the other tunnel cannot detect the fault or perform a
traffic switchover, which causes a service interruption.
In this case, you can deploy two RSVP-TE tunnels on two devices functioning as the source
and destination of each other. Then bind two unidirectional dynamic LSPs of the two tunnels
into an associated bidirectional dynamic LSP. The associated bidirectional dynamic LSP can
transmit bidirectional traffic, preventing network congestion. In addition, when one end of the
LSP fails, the other end will be notified of the fault, and the two ends can perform link
switching at the same time, preventing service interruptions.
Related Concepts
APS coordinates the source and destination ends to perform a protection switchover, a
delayed switchover, or a switchover after a wait-to-restore (WTR) time elapses.
Implementation
l During service deployment, the reverse RSVP-TE LSP uses labels to establish a connection to
the forward RSVP-TE LSP on the forwarding plane. If PHP is supported, label 0 or 3 will be
popped out at the penultimate hop so that messages with label 0 or label 3 cannot be sent to the
destination.
l Using the same path to establish an LSP and its reverse LSP is recommended, which ensures
the same delay time for packets in opposite directions.
[Figure: network consisting of LSRA, LSRB, LSRC, LSRD, LSRE, and LSRF]
If a fault occurs on the LSP that originates from LSRB and is destined for LSRC on the
network shown in Figure 10-91, traffic on LSRB switches to the link LSRB -> LSRE ->
LSRD, and a fault notification is sent to LSRD to instruct LSRD to switch traffic to the link
LSRD -> LSRE -> LSRB. Bidirectional services between LSRA and LSRF switch to the
bypass tunnels LSRB -> LSRE -> LSRD and LSRD -> LSRE -> LSRB, preventing service
interruptions.
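The coordinated switchover in this example can be sketched as two tunnel ends that notify each other of faults. The class TunnelEnd and its method names are hypothetical. The primary path through LSRC in both directions is an assumption made for the illustration.

```python
class TunnelEnd:
    """One end of an associated bidirectional LSP."""

    def __init__(self, name, primary, bypass):
        self.name = name
        self.primary = primary
        self.bypass = bypass
        self.active = primary
        self.peer = None  # the other end of the associated bidirectional LSP

    def on_local_fault(self):
        """Switch local traffic to the bypass path and notify the peer."""
        self.active = self.bypass
        if self.peer is not None:
            self.peer.on_remote_fault()

    def on_remote_fault(self):
        """Switch to the bypass path after receiving a fault notification."""
        self.active = self.bypass


lsrb = TunnelEnd("LSRB", ["LSRB", "LSRC", "LSRD"], ["LSRB", "LSRE", "LSRD"])
lsrd = TunnelEnd("LSRD", ["LSRD", "LSRC", "LSRB"], ["LSRD", "LSRE", "LSRB"])
lsrb.peer, lsrd.peer = lsrd, lsrb

# A fault is detected on the LSP from LSRB towards LSRC.
lsrb.on_local_fault()
```

After the call, both ends in this sketch forward over LSRE, so traffic in the two directions switches at the same time.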
[Figure: network consisting of LSRA, LSRB, LSRC, LSRD, LSRE, and LSRF]
Deployment Scenarios
l Associated bidirectional LSPs apply to the scenario in which bidirectional services need
bandwidth protection.
l To configure bit-error-triggered RSVP-TE tunnel switching, configure associated
bidirectional LSPs.
RSVP Messages
RSVP has the following message types:
l Path message: sent by a sender to receivers to collect path information of the passing
nodes.
l Resv message: sent upstream by a receiver hop-by-hop in response to the Path message to
request resource reservation.
l PathTear message: sent to remove path state of the passing nodes.
l ResvTear message: sent to remove resource reservation state on a node.
l PathErr message: sent upstream by a node to report errors in processing of the Path
messages.
l ResvErr message: sent downstream by a node if errors occur during the processing of the
Resv messages.
l ResvConf message: sent downstream by a sender hop-by-hop to confirm the resource
reservation requests. It is sent only when the Resv message contains the
RESV_CONFIRM object.
Each type of RSVP message contains a common header. The length and types of the other
fields are not fixed. Figure 10-92 shows the format of RSVP messages.
Objects ( Variable )
Field Description
NOTE
l Length: total length of the object, in bytes. Its value must be a multiple of 4, and at least
4.
l Class_Number: an object class. Each object class has a name, such as SESSION,
SENDER_TEMPLATE, and TIME_VALUES.
l C-Type: object type, unique within the Class_Number. The Class_Number and C-Type
are used together to define a unique type for each object.
l Object Content: content of the object. The length of this field is variable.
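The object fields listed above can be decoded with a short parser. This sketch assumes the RFC 2205 object header layout (16-bit Length, 8-bit Class_Number, 8-bit C-Type, all in network byte order) and uses the RFC 2205 class numbers for three object classes. The function name parse_objects is hypothetical.

```python
import struct

# A few class numbers as assigned in RFC 2205 (not a complete list).
CLASS_NAMES = {1: "SESSION", 5: "TIME_VALUES", 11: "SENDER_TEMPLATE"}


def parse_objects(data: bytes):
    """Split the object part of an RSVP message into
    (class name, C-Type, content) tuples."""
    objects = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated object header")
        length, class_num, c_type = struct.unpack_from("!HBB", data, offset)
        # Length covers the whole object, must be a multiple of 4, at least 4.
        if length < 4 or length % 4 != 0:
            raise ValueError("invalid object length %d" % length)
        if offset + length > len(data):
            raise ValueError("object extends past end of data")
        content = data[offset + 4:offset + length]
        objects.append((CLASS_NAMES.get(class_num, str(class_num)), c_type, content))
        offset += length
    return objects


# A TIME_VALUES object (class 5, C-Type 1) with a 30000 ms refresh period.
sample = struct.pack("!HBBI", 8, 5, 1, 30000)
parsed = parse_objects(sample)
```

The parser rejects an object whose Length is below 4 or is not a multiple of 4, which matches the rule stated for the Length field.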
Path Message
In RSVP-TE, a Path message is used to create an RSVP session and maintain a Path state.
The Path message is sent by the ingress node to the egress node in the direction of data flows.
On each node, the path state block (PSB) is created.
NOTE
The source IP address of a Path message is the LSR ID of the ingress node and the destination IP
address is the LSR ID of the egress node.
HOP: Identifies the IP address and the handle of the outbound interface of the previous
hop that sends the Path message.
Explicit Route Object (ERO): Describes information about the path through which the LSP
passes. The explicit paths can be strict or loose. Path messages are then forwarded with
the specified ERO, without being restricted by the IGP shortest path.
Record Route Object (RRO): Lists the LSRs through which the Path message passes when
being transmitted. An RRO can be used to collect path information and discover route
loops. It can also be copied to the next Path message for implementing Route Pinning.
Session Attribute: Specifies the setup priority, holding priority, reservation style,
affinity, and other information.
Resv Message
After receiving a Path message, a transit node and the egress node reply with Resv messages.
The Resv message, carrying resource reservation information, is sent to the previous node
hop-by-hop. Each passing node creates and maintains a reservation state block (RSB) and
allocates a label. When the Resv message reaches the ingress node, an LSP is established
successfully.
HOP: Identifies the IP address and the index of the outbound interface that sends the
Resv message.
FILTER_SPEC: Specifies the sender IP address and LSP ID of the node that sends the
message.
RRO: Collects the IP address of the inbound interface, LSR ID, and the IP address of the
outbound interface of the node along the path.
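The Path and Resv exchange described above can be sketched as two passes over the nodes of an LSP. The function signal_lsp, the node names, and the label numbering are hypothetical. The sketch assumes that the ingress does not allocate a label because it has no upstream node.

```python
def signal_lsp(nodes, first_label=1000):
    """Simulate LSP setup over `nodes`, ordered from ingress to egress.

    Returns (psb, rsb, labels), where labels[node] is the label that the
    node allocates and advertises to its upstream neighbor.
    """
    # Path message: travels downstream, each node creates a PSB that
    # records its previous hop.
    psb = {}
    for i, node in enumerate(nodes):
        psb[node] = {"previous_hop": nodes[i - 1] if i > 0 else None}

    # Resv message: travels upstream hop-by-hop, each node creates an RSB
    # and every node except the ingress allocates a label.
    rsb = {}
    labels = {}
    next_label = first_label
    for i in range(len(nodes) - 1, -1, -1):
        node = nodes[i]
        rsb[node] = {"next_hop": nodes[i + 1] if i < len(nodes) - 1 else None}
        if i > 0:
            labels[node] = next_label
            next_label += 1
    return psb, rsb, labels


psb, rsb, labels = signal_lsp(["ingress", "transit", "egress"])
```

Once the upstream pass reaches the ingress, every node holds both a PSB and an RSB, which corresponds to the point at which the LSP is established.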
Service Overview
An IP RAN is a transport network that transmits traffic between wireless base stations and
base station controllers. A conventional RAN is circuit-switched and consists of
multi-service transmission platform (MSTP) and microwave devices. As data, audio, and
video services grow, they demand more bandwidth on RANs, while carriers operate on
decreasing profit margins due to fierce competition. Carriers therefore use IP/MPLS
techniques to transmit services on IP RANs to meet the bandwidth requirements and to
reduce costs.
Long Term Evolution (LTE) S1 services: An eNodeB uses IP/MPLS techniques to send LTE S1
services to an MME through an Ethernet interface.
Service Description
Networking Description
Figure 10-94 shows an IP RAN that consists of the access, aggregation, and service core
layers. Access and aggregation layers use ring networking. Wireless base stations are
connected to the RNC, BSC, and MME through the access and aggregation networks.
[Figure 10-94: IP RAN networking. Nodes: BTS, NodeB, eNodeB, CSG, AGG1, AGG2, RSG1, RSG2,
BSC, RNC, and MME. Services: 2G TDM, 2G ATM, 3G Eth, and LTE S1/X2. Tunnels and VPNs:
MPLS TE1, MPLS TE2, L3VPN, and L2VPN.]
NOTE
In this section, an E2E virtual private network (VPN) solution is used to demonstrate how MPLS TE
techniques are used to transmit IP RAN services.
Feature Deployment
MPLS TE tunnels are established as public network tunnels to transmit E2E VPN services
over an IP RAN. Table 10-23 lists MPLS TE-related features.
QoS: E2E QoS must be configured between CSGs and radio service gateways (RSGs) to ensure
service quality. Using DS-TE to establish MPLS TE tunnels is recommended.
To prevent different services in one tunnel from interfering with each other, you can set
up dedicated VPNs and TE tunnels to bear specific services. However, when multiple VPNs
and tunnels are set up at the same time to bear different types of services over a
network, resources may be wasted.
Alternatively, you can deploy DS-TE to use a multi-CT LSP to bear services over one VPN.
A multi-CT LSP can reserve up to 8 CTs. Each CT can bear one type of services of one VPN.
Services among different CTs cannot interfere with each other.
As shown in Figure 10-95, VPN1 bears EF, AF, and BE services. One DS-TE tunnel needs to
be set up and configured with CT0 (100 Mbit/s), CT2 (50 Mbit/s), and CT7 (10 Mbit/s). The
tunnel is bound to VPN1 on the ingress. After traffic of VPN1 is classified, the traffic enters
corresponding CT queues.
Figure 10-95 Networking diagram for one LSP bearing different services on one VPN
CR-LSP backup: The bypass CR-LSP inherits the CTs and their bandwidth from the primary
CR-LSP. The best-effort path cannot guarantee QoS and it does not inherit the CTs and
bandwidth from the primary CR-LSP.
Tunnel protection group: Two independent tunnels are bound and they form a tunnel
protection group. One tunnel is the primary tunnel and the other is the bypass tunnel.
DS-TE features on the backup tunnel can be configured. The CTs and bandwidth of the bypass
tunnel should be consistent with those of the primary tunnel.
In addition, MPLS OAM packets are sent through the queue of the highest TE tunnel
priority.
On a network, if bandwidth preemption is enabled among CTs, the RDM is applicable and
bandwidth can be utilized efficiently. If bandwidth preemption is disabled among
CTs, the MAM or extended-MAM is applicable. The extended-MAM is recommended when
a node needs to interwork with both a non-DS-TE node and a DS-TE node.
In the ATN implementation, all links on the same node use the same bandwidth constraints
model. In addition, using the same bandwidth constraints model on all nodes over the
entire network is recommended, which simplifies network configuration and maintenance.
CT class type
MP merge point
TE traffic engineering
Definition
Seamless MPLS is a bearer technique that extends MPLS techniques to access networks. All
services can be encapsulated using MPLS on access networks. Seamless MPLS establishes an
end-to-end (E2E) LSP across the access, aggregation, and core layers to transmit services.
Purpose
MPLS is a mature technique that is widely deployed in service provider networks. MPLS can
converge multiple networks on an Ethernet-based infrastructure, which takes full advantage
of the single forwarding model and reduces network construction costs. MPLS has been
widely used on aggregation and core networks.
With the trend towards a flattened network structure, the metropolitan area network (MAN)
is evolving into an Ethernet architecture. This makes it possible to use MPLS techniques
on the MAN and access networks. To meet this requirement, the seamless MPLS technique
was developed. Seamless MPLS is not a new technique. It uses existing Border Gateway
Protocol (BGP), Interior Gateway Protocol (IGP), and MPLS techniques to establish an E2E
LSP across the access, aggregation, and core layers so that traffic can be encapsulated and
forwarded using MPLS over a whole network.
Benefits
Seamless MPLS offers the following benefits:
l Converges the access, aggregation, and core layers on an MPLS network, encapsulates
all services using MPLS, and transmits these services along an E2E LSP. Seamless
MPLS simplifies network provisioning, operation, and maintenance.
l Supports high deployment flexibility and scalability. On a seamless MPLS network, any
two nodes on an LSP can be connected and roll out services using MPLS.
10.4.2 Principles
Usage Scenario
Seamless MPLS establishes a BGP LSP across the access, aggregation, and core layers on a
network and transmits services along the E2E BGP LSP. Any two nodes on the LSP can
exchange service traffic. The seamless MPLS network architecture maximizes service
scalability using the following functions:
[Figure: seamless MPLS networking across the access, aggregation, and core layers, with
NodeB/eNodeB, CSG2, AGG2, core ABR2, MASG2, and MME/SGW]
Control plane
In Figure 10-96, routing protocol deployment is as follows:
l An IGP (IS-IS or OSPF) is enabled on devices at each of the access, aggregation, and
core layers to implement intra-AS connectivity.
l The path CSG1 -> AGG1 -> core ABR1 -> MASG1 is used in the following example. An IBGP
peer relationship is established between each pair of the following devices:
– CSG and AGG
– AGG and core ABR
– Core ABR and MASG
The AGG and core ABR are configured as route reflectors (RRs) so that the CSG and MASG
can obtain routes destined for each other's loopback addresses.
l The AGG and core ABR set the next hop addresses in BGP routes to their own addresses to
prevent advertising unnecessary IGP area-specific public routes.
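The effect of the reflectors setting the next hop to their own addresses can be illustrated with a small sketch. The function propagate is hypothetical and models only the next hop that each receiver sees for the route to the MASG1 loopback address. It does not model BGP itself.

```python
def propagate(path, next_hop_self=True):
    """Return the BGP next hop that each receiver on `path` sees.

    `path` is ordered from the route originator to the last receiver.
    The originator advertises itself as the next hop. Each reflector on
    the path rewrites the next hop to itself only if next_hop_self is True.
    """
    seen = {}
    next_hop = path[0]
    for i in range(1, len(path)):
        seen[path[i]] = next_hop
        if next_hop_self:
            next_hop = path[i]
    return seen


path = ["MASG1", "core ABR1", "AGG1", "CSG1"]
with_next_hop_self = propagate(path)
without_next_hop_self = propagate(path, next_hop_self=False)
```

With the rewrite, CSG1 sees AGG1 as the next hop, which is reachable within its own IGP area. Without it, CSG1 would see MASG1 as the next hop and would need an IGP route to an address in another area.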
[Figures omitted: seamless MPLS networking diagrams. Labels recovered from the diagrams:
NodeB/eNodeB, CSG2, AGG2, core ABR2, AGG ASBR2, core ASBR2, MASG2, and MME/SGW; access,
aggregation, and core layers; MPLS LDP/MPLS TE; VPN; MP-IBGP, MP-EBGP, and HVPN.]
Reliability
Seamless MPLS network reliability can be improved using a variety of functions. If a network
fault occurs, devices with reliability functions enabled immediately detect the fault and switch
traffic from active links to standby links.
Figure 10-106 Traffic protection triggered by a fault in the link between the CSG and
AGG on the inter-AS seamless MPLS network
Figure 10-108 Traffic protection triggered by a fault in the link between an AGG and an
AGG ASBR on the inter-AS seamless MPLS network
Figure 10-109 Traffic protection triggered by a fault in an AGG ASBR on the inter-AS
seamless MPLS network
l A fault occurs on the link between an AGG ASBR and a core ASBR.
As shown in Figure 10-110, BFD for interface is configured on AGG ASBR1 and core
ASBR1. If the BFD module detects a fault in the link between AGG ASBR1 and core
ASBR1, the BFD module triggers the BGP Auto FRR function. BGP Auto FRR switches
both upstream and downstream traffic from the primary path to backup paths.
Figure 10-110 Traffic protection triggered by a fault in the link between an AGG ASBR
and a core ASBR on the inter-AS seamless MPLS network
Figure 10-111 Traffic protection triggered by a fault in a core ASBR on the inter-AS
seamless MPLS network
Figure 10-112 Traffic protection triggered by a link fault in a core area on the inter-AS
seamless MPLS network
The following example describes how load balancing works between core ASBRs and
MASGs in inter-AS seamless MPLS networking:
Multiple BGP IPv4 unicast label peer relationships are configured between each core ASBR
and an MASG. After the MASG receives BGP LSP labeled routes with the same prefix but
different next hops, these BGP LSP labeled routes participate in load balancing, which
improves network resource utilization. In Figure 10-114, two BGP LSPs are configured
between MASG1 and each core ASBR for load balancing.
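Load balancing among labeled routes with the same prefix but different next hops can be sketched as a per-flow hash over the candidate next hops. The hashing scheme, the function name pick_next_hop, and the sample prefix are assumptions made for illustration. The source does not state how the device distributes flows.

```python
import zlib


def pick_next_hop(routes, prefix, flow_key):
    """Choose a next hop for one flow.

    routes is a list of (prefix, next_hop) pairs. All routes with the
    requested prefix participate, and the flow key is hashed so that one
    flow always uses the same next hop.
    """
    next_hops = sorted({nh for p, nh in routes if p == prefix})
    if not next_hops:
        return None
    index = zlib.crc32(flow_key.encode()) % len(next_hops)
    return next_hops[index]


# Labeled routes received by the MASG: same prefix, different next hops.
routes = [
    ("10.1.1.1/32", "core ASBR1"),
    ("10.1.1.1/32", "core ASBR2"),
    ("10.2.2.2/32", "core ASBR1"),
]

choice = pick_next_hop(routes, "10.1.1.1/32", "flow-a")
```

Hashing on a flow key keeps the packets of one flow on one BGP LSP, which avoids reordering while still spreading different flows across the available LSPs.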
[Figure 10-114: BGP LSPs for load balancing between MASG1 and the core ASBRs]
10.4.3 Applications
Service Overview
With the growth of third-generation mobile telecommunications (3G) services and Long
Term Evolution (LTE) services, inter-AS leased line services have become key services.
Seamless MPLS can establish an E2E LSP between a cell site gateway (CSG) and a mobile
aggregate service gateway (MASG) to transmit virtual private network (VPN) services.
Seamless MPLS helps carriers reduce the costs of network construction, operation, and
maintenance and allows carriers to operate and maintain networks in a unified manner.
Networking Description
Figure 10-115 illustrates an LTE network. The access and aggregation layers belong to one
AS, and the core layer belongs to another AS. To transmit VPN services, the inter-AS
seamless MPLS+HVPN networking can be used to establish an LSP between each pair of a
CSG and an MASG. CSGs are connected to NodeBs functioning as Wideband Code Division
Multiple Access (WCDMA) 3G base stations and to eNodeBs functioning as LTE base
stations. MASGs are connected to a mobility management entity (MME) or serving gateway
(SGW). VPN instances can be configured between CSGs and MASGs to transmit various
types of services. An HVPN is deployed between each pair of a CSG and an aggregation
(AGG) node, and an inter-AS LSP between each pair of an AGG and an MASG is established
using the seamless MPLS technique. A NodeB or eNodeB can then communicate with the
MME or SGW.
Enterprise leased line services: The VPN services of large-scale enterprises can be
provisioned. The Layer 2 and Layer 3 leased lines connected to CSGs are easy to deploy to
transmit VPN services.
CSG performance requirements: CSGs that maintain a few routes only need to process
packets each with two labels.
11 VPN
This document describes the VPN feature in terms of the overview, principle, and
applications.
Characteristics of VPN
A VPN has the following characteristics:
l Privacy: For a VPN user, the VPN has no difference from a traditional private network in
terms of privacy. Resources of a VPN are separated from its bearer network. Therefore,
the resources of a VPN cannot be used by other users outside this VPN. In addition,
VPNs offer sufficient security measures to ensure that the internal information is free
from external interference.
l Virtuality: VPN users communicate with each other through public networks. The public
networks are used by other non-VPN users at the same time. That is, a VPN is a logical
private network. The public networks are called VPN backbone networks.
Given the characteristics of privacy and virtuality, a VPN can segment an existing IP
network into several logically isolated networks. Such logical segmentation is flexible.
It can be used to interconnect different departments or branches of an enterprise. VPNs can
also provide enhanced services. For example, creating a VPN for the IP phone service can
solve the problem of inadequate IP addresses, guarantee Quality of Service (QoS), and pave
the way for enhanced services.
VPNs, especially Multiprotocol Label Switching VPNs (MPLS VPNs), are highly valued
by operators for providing interworking between enterprises and other enhanced
services. VPNs have therefore become an important means for operators to provide
Value-Added Services (VASs) on IP networks.
Advantages of VPN
Compared with traditional private networks, for a user, a VPN has the following advantages:
l A VPN can guarantee the data security. On a VPN, reliable connections are established
between remote users, branches, partners, suppliers, and company headquarters to ensure
the security of data transmission. High security is of great significance to the
combination of e-business or financial networks with communication networks.
l A VPN is an economical solution. By using public networks, an enterprise can connect its
headquarters with branches, personnel on business, and business partners at a low cost.
l A VPN supports mobile services. VPN users that are located outside the headquarters
can access the VPN regardless of time and place. As a result, the increasing demand for
mobile services can be met.
l A VPN can guarantee QoS. A VPN with QoS such as MPLS VPN can provide VPN
users with QoS of different levels.
l VPNs are easy to operate. The resource utilization is improved, and profits of carriers are
increased.
l The configurations of VPNs are flexible. The carriers can add or delete VPN users
through software configurations without modifying hardware configurations. Therefore,
VPNs have flexible and wide applications.
l VPNs provide multiple services. In addition to basic VPN interworking services, carriers
can also provide enhanced services such as network outsourcing, service outsourcing,
and customized services.
Given these advantages, VPNs relieve enterprises of part of the burden of network
operation and maintenance, help enterprises achieve their business goals, and have
therefore become popular with enterprises. In addition, an operator can manage and
operate a network and provide multiple services on it, such as best-effort IP service,
VPN, traffic engineering, and differentiated services (DS). This reduces the operator's
construction, maintenance, and operation costs.
In addition to security, reliability, and manageability, VPNs provide strong scalability
and flexibility. Regardless of location, users can use VPN services as long as they can
access the Internet.
[Figure: a remote user connected through ISDN or PSTN and the Internet to the corporation
gateway and the corporation interior network]
Compared with VPNs of other types, VPDNs provide more flexible authentication
mechanisms, accounting schemes, and higher security. In addition, VPDNs support
dynamic address assignment. VPDNs adopt Layer 2 tunnels and support multiple Layer
3 protocols.
l Virtual Private Routing Network (VPRN)
A Virtual Private Routing Network (VPRN) connects the headquarters, branches, and
remote offices through virtual devices. Different from VPNs of other types, in VPRNs,
packets are forwarded on the network layer. Each VPN node on the public network sets
up a private routing forwarding table, which contains information about reachability of
the network layer, for each VPN. Data traffic between VPN nodes and that between VPN
nodes and user sites is transmitted on the basis of the forwarding tables.
VPRNs can be implemented in two ways: by using traditional VPN protocols such as the
Generic Routing Encapsulation (GRE) protocol, or by using Multiprotocol Label Switching
(MPLS).
NOTE
l For more information about GRE, refer to the chapter GRE in this manual.
l For more information about MPLS VPRN, refer to the chapter BGP/MPLS IP VPN in this
manual.
l Virtual Private Wire Service (VPWS)
Virtual Private Wire Service (VPWS) is also called Virtual Leased Line (VLL). By
using IP networks to emulate private leased lines, VPWS provides an asymmetric, low-cost
Digital Data Network (DDN) service. For users on the two ends of a virtual leased line,
the virtual line is similar to a traditional leased line.
VPWS is available in traditional private networks such as ATM and FR networks.
Operators can smoothly upgrade ATM or FR networks to VPWS networks.
As a service of virtual leased line, VPWS is generally used on the access layer and the
convergence layer. VPWS is divided into the following types:
– Circuit Cross-Connect (CCC)
– Static Virtual Circuit (SVC)
– Martini VPWS
As an end-to-end Layer 2 service-bearing technology, Pseudo-Wire Emulation Edge-to-
Edge (PWE3) is an extension of Martini VPWS.
NOTE
l For more information about VPWS, refer to the chapter VLL in this manual.
l For more information about PWE3, refer to the chapter PWE3 in this manual.
VPWS is suitable for VPNs of star topology; VPRN is suitable for fully connected
VPNs.
l Virtual Private LAN Service (VPLS)
Virtual Private LAN Service (VPLS) connects LANs through a virtual private network
segment. VPLS is an extension of LANs over IP public networks.
VPLS is also called Transparent LAN Service (TLS). Unlike the common L2VPN P2P service,
VPLS allows SPs to provide Ethernet-based multipoint services over MPLS backbone
networks.
Thanks to advantages such as flexible configuration of VLAN logical interfaces and a
high bandwidth/cost ratio, the Ethernet technology is widely used nowadays.
VPRNs and VPWS networks can also provide LAN services; however, the following limitations
of the traditional Ethernet technology still exist:
– Broadcast storms of frames with unknown destination MAC addresses cannot be
avoided.
– The expansion of the Spanning Tree Protocol (STP) is limited.
– VLAN address spaces are limited.
VPLS is thus introduced to solve those problems. Instead of running STP, VPLS
backbone networks use full-mesh connections and split horizon to eliminate loops. For
unicast or multicast frames with unknown destination MAC addresses, VPLS discards the
frames, handles them on the local node, or broadcasts them. VPLS, therefore, can
expand the range of a VLAN to a country or even the whole world.
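The loop-avoidance behavior described for VPLS (full-mesh PWs plus split horizon) can be sketched as the forwarding decision of one PE. The class VplsPe and its port names are hypothetical. For frames with unknown destination MAC addresses, this sketch chooses the broadcast option out of the three options the text mentions.

```python
class VplsPe:
    """Forwarding decision of one PE in a VPLS instance."""

    def __init__(self, ac_ports, pw_ports):
        self.ac_ports = set(ac_ports)  # attachment circuits towards CEs
        self.pw_ports = set(pw_ports)  # pseudo wires towards other PEs
        self.mac_table = {}            # learned MAC address -> port

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source MAC and return the sorted list of output ports."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        from_pw = in_port in self.pw_ports
        if out_port is not None:
            # Known unicast: never send back to the input port, and never
            # forward from one PW to another PW (split horizon).
            if out_port == in_port or (from_pw and out_port in self.pw_ports):
                return []
            return [out_port]
        # Unknown destination: flood, still honoring split horizon.
        candidates = (self.ac_ports | self.pw_ports) - {in_port}
        if from_pw:
            candidates -= self.pw_ports
        return sorted(candidates)


pe = VplsPe(["ac1", "ac2"], ["pw1", "pw2"])
flood_from_ac = pe.receive("ac1", "aa", "bb")   # unknown destination from an AC
flood_from_pw = pe.receive("pw1", "bb", "cc")   # unknown destination from a PW
known_unicast = pe.receive("ac2", "cc", "bb")   # bb was learned on pw1
```

Because every PE is directly connected to every other PE by the full mesh, a frame received on a PW never needs to be relayed to another PW, which is why split horizon can replace STP on the backbone.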
l Intranet VPN
Intranet VPNs connect all the branches of an enterprise through public networks. Intranet
VPNs are the extension or substitute for traditional private networks or other enterprise
networks.
Through Intranet VPNs, headquarters, branches, offices, and mobile personnel of an
enterprise compose an intranet by using public networks. VPNs can be applied to
constructing intranets of banks and governments.
Chain businesses such as chain stores, storage and logistics companies, and gas stations
are typical users of intranet VPNs.
l Extranet VPN
Extranet VPNs extend enterprise networks to suppliers, partners, and clients by using
VPNs. The VPNs are established between different enterprises with common benefit
through public networks. Parts of the resources are thus shared among different VPN
users.
On a network of traditional leased lines, building an extranet requires managing the
network, performing access control, and even installing compatible network devices on the
user side. Although an extranet can be established in dialing mode, each extranet user
must be configured separately, so the configuration is not simplified. In addition, an
extranet in dialing mode is expensive to construct and maintain because partners and
customers are widely distributed. Therefore, most enterprises give up extranets, which
complicates business processes between the enterprises and reduces their efficiency.
Extranet VPNs are thus introduced. Similar to intranet VPNs in terms of technology
implementation, extranet VPNs are easy to construct and manage. Currently, enterprises
generally use VPNs to construct extranets. To guarantee QoS, external
communication of an enterprise is generally not carried over the Internet. The reason is
that data transmission requires a high security guarantee, and the security of extranets
is stronger than that of the Internet. The access rights of an extranet VPN can be
configured by each extranet user. For example, a user can configure firewalls to perform
access control.
l Access VPN
Through access VPNs, personnel on business, Small Office Home Office (SOHO), and
remote offices can access the servers of an intranet through cheap dialing media and set
up private network connections with intranets and extranets. Access VPNs are also
called VPDNs.
Access VPNs are divided into two types: client-initiated VPN and NAS-initiated VPN.
l L3VPN
Layer 3 VPNs (L3VPNs) are also called VPRNs. GRE VPN, BGP/MPLS VPN based on
RFC 4364, and BGP/MPLS VPN with GRE tunnels are all L3VPNs. The BGP/MPLS
VPN is generally applied at the forwarding layer of the core network; GRE VPN is
mainly applied at the access layer.
NOTE
l For more information about L3VPN, refer to the chapter BGP/MPLS IP VPN in this manual.
l L2VPN
With the development of network technologies, carrier's networks become increasingly
complex. New technologies are required to integrate traditional switching networks such
as ATM and FR networks with IP or MPLS networks. Layer 2 VPN (L2VPN) is thus
introduced.
L2VPN includes the previously described VPWS and VPLS. VPWS is suitable for
large-scale enterprises that are connected through Wide Area Networks (WANs); VPLS is
suitable for small-scale enterprises that are connected through Metropolitan Area
Networks (MANs). VPLS cannot avoid broadcast storm. In addition, on a VPLS
network, Provider Edge (PE) devices need to learn Medium Access Control (MAC)
addresses of devices in the private network. Protocol and storage involve a high cost.
L2VPNs only use Layer 2 links of SP networks. Therefore, L2VPNs can support
multiple Layer 3 protocols. L3VPNs also support multiple protocols, but with more
limitations than L2VPNs.
NOTE
l For more information about L2VPN, refer to the chapter VLL, PWE3 and VPLS in this
manual.
Traditional VPNs based on public IP networks (IP VPNs) belong to CPE-based VPNs.
CPE-based VPNs set up VPN security tunnels between private devices to transmit
private data of users. The Internet is a typical public IP network. Constructing VPNs
based on the Internet is economical; however, QoS cannot be guaranteed. When
planning an IP VPN, an enterprise should consider which kind of public IP network
to use.
l Network-based VPN
In Network-based VPN mode, ISPs build, manage, and maintain VPNs. The ISPs allow
users to manage and control services in some measure. The functions and features are
mainly implemented on the devices on network side. On user side, only network
interconnection is required.
This mode reduces the user investment, improves the flexibility and scalability of
services, and brings more incomes to operators.
VPNs based on MPLS, namely, MPLS VPNs belong to Network-based VPNs. Owing to
the advantages on flexibility, scalability, and QoS, MPLS VPNs become the major IP
VPN technology and are widely used in telecom carrier's networks and enterprise
networks. As an important technology to connect branches of VIP customers, to isolate
3G and NGN services, MPLS VPNs are generally applied on the backbone core network
and the convergence layer. MPLS VPNs are also of great importance to MANs. The
MPLS VPN technology applied within a MAN is an important means to improve IP
MAN values and increase profits of operators.
In an MPLS VPN network, user sites can use T1, FR, ATM VCs, and Digital Subscriber
Lines (DSLs) to access the MPLS VPN backbone network. No additional configuration
is required on user devices.
Table 11-2 lists the differences between a CPE-based VPN and a Network-based VPN.
Seamless integration of CPE-based VPNs with Network-based VPNs can provide users with
more reliable, more secure, and richer VPN services.
l VPN tunnels
– Establishment of tunnels.
– Management of tunnels.
l VPN management
– VPN configuration management.
– VPN member management.
– VPN attribute management: the management of attributes of multiple VPNs on PE
devices and differentiation of VPN address spaces.
– VPN automatic configuration: the establishment of one-to-one relationship between
VPN internal links in L2VPNs after information about the peer links is received on
the local link.
l VPN signaling protocol
– Exchange and share of VPN resources between CE devices on a VPN: For an
L2VPN, information about data links is exchanged; for an L3VPN, routing
information is exchanged; for a VPDN, information about a single data link is
exchanged.
– VPN member discovery in some applications.
l Access layer
The devices on the access layer provide users with the access function. Those devices
need not realize many functions, but require many access interfaces. For MANs in big
cities, the access layer needs to realize more functions in addition to the access function.
On the access layer, generally, a CE device is dual-homed to or multi-homed to access
nodes. The dual homing is either physical or logical. In the physical dual homing, a CE
device accesses two nodes through two physical links; in the logical dual homing, a CE
device accesses two nodes through loops. The logical dual homing is widely used in
L2VPN networks.
l Convergence layer
The convergence layer is of either a mesh topology or a ring topology.
l Backbone layer
The backbone layer must be of a full-mesh topology and multi-level backup. The devices
on the backbone layer are generally connected through high-speed interfaces.
11.1.2 Principles
Instantiation
In instantiation mode, each VPN on Layer 2 and Layer 3 is instantiated, and instances of
private forwarding information of each VPN are established. Besides tunnel management, a
VPN in this mode performs member discovery, member management, and VPN automatic
configuration.
Operability
The VPN technology is generally used to provide services for different departments of an
enterprise through public networks. Nowadays, more and more VPN users require VPN
services to be operable. They do not want to spend excessive time and resources
on network maintenance, and require operators to undertake the task. Therefore, when
designing a VPN, consider the operability first.
Manageability
On a VPN, network management of an enterprise is seamlessly extended from LANs to the
public network, even clients and partners. Besides assigning some nonessential network
management tasks to the carrier, the enterprise must also fulfill many network management
tasks. A complete VPN management system is therefore necessary.
l VPN management reduces network risks. After a VPN intranet is extended to a public
network, the intranet is faced with more risks. VPN management can guarantee the
integrity of data resources of an intranet when branches, clients, and partners of an
enterprise access a VPN.
l VPN management provides better expansibility. VPN management quickly makes
adjustment to the increased number of clients and partners, including the upgrade of
network hardware and software, the guarantee of network quality, and the maintenance
of security policies.
l VPN management reduces costs. VPN management controls expenses of operation and
maintenance and ensures service scalability at the same time.
l VPN management improves the reliability of a VPN. VPNs are set up over a public
network. Compared with traditional WANs using leased lines, the controllability of the
VPNs is lower. VPN management should guarantee the reliable and stable operation of a
VPN.
Security
VPNs are constructed over public networks. The implementation of a VPN is simple,
convenient, and flexible. However, network risks arise at the same time.
l On a traditional IP VPN, an enterprise must guarantee that the VPN data is not
intercepted and modified by attackers, and prohibit the access of unauthorized users.
Extranet VPNs are faced with even more serious risks.
The following solutions can improve the security of a VPN:
– Tunnel technology and encryption: By performing multi-protocol encapsulation, the
tunnel technology can enhance the flexibility of a VPN and provide P2P logical
channel on connectionless IP networks. When users require a more secured data
transmission, an encrypted tunnel is utilized, which can prevent data from being
intercepted and modified.
– Data authentication: In an insecure network such as a public network used by a
VPN, packets may be illegally intercepted and modified. The receiver receives the
incorrect packets. By using data authentication, the receiver can recognize the
modification.
– User authentication: Through user authentication, a VPN can allow legal users to
access enterprise resources and prohibit the access of unauthorized users. After the
configuration of Authentication, Authorization and Accounting (AAA), ATNs can
authenticate users, authorize users with different levels, and generate access
records. User authentication greatly improves the security of access VPNs and
extranet VPNs.
– Firewalls and attack detection: Firewalls are used to filter packets and prevent
illegal access. Attack detection is used to judge the validity of packets by analyzing
the packets, implement security policies in real time, disconnect the illegal sessions,
and record illegal access.
l MPLS VPNs are built on the network side on the basis of forwarding tables and packet
labels. If an MPLS network is not connected to the Internet, the security of
internal resources of the MPLS network is guaranteed. The MPLS VPN, therefore, can
ensure the security of the VPN to some extent.
If the MPLS VPN users want to access the Internet, a channel with a firewall can be
created to provide a secure connection for the VPN. The MPLS VPN is easy to manage
because only one security policy is applied in the VPN.
[Figure 11-2: network diagram; recoverable labels: Node B, RNC]
In the scenario shown in Figure 11-2, when a CSG receives an ARP/ND packet for PWE3
services, the CSG encapsulates the PW label and tunnel label into the ARP/ND packet and
then transparently transmits the packet to the remote AGG. As a result, only the master AGG
learns this ARP/ND packet. Specifically, only one AGG has the ARP/ND packet information.
When a fault occurs on the master AGG, the Layer 3 service is switched from the master
AGG to the backup AGG. The backup AGG, which does not have the ARP/ND packet
information, has to trigger ARP/ND packet learning by sending traffic. It takes a long time for
the backup AGG to learn the ARP/ND packet of the base station. As a result, many packets
are lost during path switching.
To reduce packet loss, configure ARP/ND dual fed on CSGs. When the primary and
secondary PWs are working properly, the AC-side device transparently transmits the received
ARP/ND packet to master and backup AGGs over the primary and secondary PWs
respectively so that Layer 3 interfaces of the master and backup AGGs can learn the ARP/ND
packet of the base station. When a fault occurs on the master AGG, the Layer 3 service is
switched from the master AGG to the backup AGG. Because the backup AGG has already
learned the ARP/ND packet of the base station, it does not have to learn it again after the
switchover. Consequently, packet loss is reduced and device
performance is improved. When the Layer 3 service on the aggregation ring recovers and is
switched back from the backup AGG to the master AGG, the master AGG does not have to
learn the ARP/ND packet and corresponding packet loss will not occur.
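The effect of ARP/ND dual fed can be sketched in Python as follows. This is a simplified model, assuming that each AGG keeps a plain table of learned entries and that both PWs are working properly. The names Agg and csg_forward_arp are hypothetical and are not device commands or APIs.

```python
class Agg:
    """Toy aggregation node that records the ARP/ND entries it has learned."""

    def __init__(self, name):
        self.name = name
        self.arp_table = {}

    def learn(self, ip, mac):
        self.arp_table[ip] = mac


def csg_forward_arp(ip, mac, master, backup, dual_fed):
    """Model a CSG relaying a base station ARP/ND packet toward the AGGs."""
    master.learn(ip, mac)      # the primary PW always carries the packet
    if dual_fed:
        backup.learn(ip, mac)  # with dual fed, the secondary PW carries a copy
```

With dual_fed set to False, the backup AGG table stays empty and must be filled after a switchover. With dual_fed set to True, the entry is already present when the Layer 3 service switches to the backup AGG.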
Terms
Term Description
AVP The attribute value pairs (AVP) that are used by the L2TP
protocol to transmit and negotiate the L2TP parameters. A
control message contains multiple AVPs.
Control connection A connection that defines a pair of LNS and LAC and controls
the establishment, maintenance and dismantlement of tunnels
and sessions. The procedures for establishing a control
connection involve the exchange of information about identity
protection, L2TP version, frame type, and parameters of the
physical links.
Intranet VPN A VPN that connects sites within an enterprise through the
public network.
MPLS L2VPN A VPN that provides Layer 2 VPN services based on the MPLS
network to enable the carriers to provide VPNs of different
media, including ATM, FR, VLAN, Ethernet, and PPP on
unified MPLS network.
Network-based VPN A VPN in which users entrust maintenance of the VPN to ISPs
and realize VPN features and functions on the network edge
devices.
Single segment PW A situation in which only one PW exists between the U-PEs.
The label switching on the PW label level is not needed.
Service quality The priority information in the Layer 2 frame header is mapped
to the priority of QoS in the packet that is transmitted on the
public network. Generally, it is applied to MPLS TE networks.
SVC An implementation of static MPLS L2VPN that does not use the
signaling protocol to transmit L2VPN information. In SVC, VC
label information needs manual configuration.
Tunnel interface A virtual P2P interface that can encapsulate packets. Similar to
loopback interfaces, tunnel interfaces are logical interfaces.
Tunnel Management A module that manages tunnels. It notifies the tunnel status to the
application that uses this tunnel and queries the tunnel and
configured policy based on the destination IP address. It
provides uniform interfaces to such upper-layer applications as
L3VPN, L2VPN, Resource Manager (RM), and the Border
Gateway Protocol (BGP).
Tunnel switch A technology that is used to implement the L2TP tunnel relay. A
device supporting the tunnel switch works on the one hand as an
LNS to set up the tunnel connection with the LAC, and on the
other hand works as an LAC to set up the tunnel connection with
the LNS.
VCCV A tool that is used to manually test the connectivity of the virtual
circuit. Similar to ICMP ping and LSP ping, it is realized
through the extended LSP ping.
VLL A line that emulates the leased line by using an IP network and
therefore provides asymmetrical and low-cost Digital Data
Network (DDN) service.
VPLS A service that is used to connect more than one Ethernet LAN
segment through the PSN and make them operate in an
environment similar to a LAN.
VPN instance An entity that is set up and maintained by the PE devices for
directly-connected sites. Each site has its VPN instance on a PE
device. A VPN instance is also called VPN Routing and
Forwarding (VRF) table. A PE device has multiple forwarding
tables, including a public-network routing table and one or
multiple VRFs.
VPN route matching A process in which VPNv4 routes and VPN targets of the local
VPN instances are matched.
VPN target A BGP extended community attribute that is also called Route
Target. In BGP/MPLS IP VPN, VPN Target is used to control
VPN routing information. The VPN Target attribute defines which
sites can receive a VPN-IPv4 route and from which sites a PE
device can receive routes.
Abbreviations
Abbreviation Full Name
AC Attachment Circuit
AS Autonomous System
CE Customer Edge
CW Control Word
DR Designated Router
DU Downstream Unsolicited
FR Frame Relay
HoPE Hierarchy of PE
LO Label-block Offset
LR Label Range
P2MP Point-to-Multipoint
P2P Point-to-Point
PE Provider Edge
PW Pseudo-Wire
QinQ 802.1q-in-802.1q
RD Route Distinguisher
RR Route-Reflector
SP Service Provider
S-PE Switching-point PE
TE Traffic Engineering
U-PE Ultimate PE
VC Virtual Circuit
11.2.1 Introduction
Definition
A tunnel policy determines which type of tunnel can be selected for an application. Tunnel
policies can be classified into the following types:
l Tunnel type prioritizing policy: selects tunnels for an application based on the tunnel
type priorities defined in the policy.
l Tunnel binding policy: selects only a specified tunnel for an application.
The two types of policies are mutually exclusive.
A tunnel selector selects a tunnel policy for each route based on route attributes.
Purpose
Currently, multiple types of tunnels are provided, such as LSPs (including LDP LSPs and
static LSPs), constraint-based routed LSPs (CR-LSPs), and Generic Routing Encapsulation
(GRE) tunnels. The tunnel management (TNLM) module selects tunnels for applications in
accordance with configured tunnel policies.
11.2.2 Principles
NOTE
l If no tunnel policy is configured for an application or the tunnel policy to be configured has not been
created yet, the system selects a tunnel based on the default tunnel selection policy. Specifically, the
system selects only one LSP for the application.
l If a protection group is configured for CR-LSPs, the protection CR-LSP cannot be selected. In other
words, the tunnel playing the protection role cannot be selected.
l If a CR-LSP is reserved for tunnel binding, the CR-LSP cannot be selected.
A tunnel binding policy can select only TE tunnels with the reserved-for-binding attribute configured.
Tunnel type prioritizing policy: Cannot ensure which tunnel is selected if there are several
tunnels of the same type.
Tunnel binding policy: Precisely specifies which TE tunnel can be used. Tunnel binding
policies apply only to TE tunnels, but TE tunnels can also use the tunnel type prioritizing
policy.
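The two policy types and the selection restrictions described above can be sketched in Python as follows. This is a minimal illustration, not the TNLM implementation. It assumes that a TE tunnel is modeled as a CR-LSP, that only tunnels in the up state are eligible, and that a policy can request more than one tunnel. The names Tunnel, select_by_priority, select_by_binding, and the count parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tunnel:
    name: str
    kind: str                          # "lsp", "cr-lsp", or "gre"
    up: bool = True                    # assumption: only tunnels that are up are eligible
    is_protection: bool = False        # protection CR-LSP of a protection group
    reserved_for_binding: bool = False


def select_by_priority(tunnels, type_order=("lsp",), count=1):
    """Tunnel type prioritizing policy; the defaults model 'select only one LSP'."""
    chosen = []
    for kind in type_order:
        for tunnel in tunnels:
            if len(chosen) == count:
                return chosen
            if tunnel.kind != kind or not tunnel.up:
                continue
            if tunnel.is_protection or tunnel.reserved_for_binding:
                continue  # neither kind of CR-LSP may be picked by this policy
            chosen.append(tunnel)
    return chosen


def select_by_binding(tunnels, bound_name):
    """Tunnel binding policy: only the named TE tunnel that is reserved for binding."""
    for tunnel in tunnels:
        if (tunnel.name == bound_name and tunnel.kind == "cr-lsp"
                and tunnel.reserved_for_binding and tunnel.up):
            return [tunnel]
    return []
```

When several eligible tunnels share the same type, this sketch simply takes them in list order, which mirrors the point that a tunnel type prioritizing policy cannot ensure which tunnel of a given type is selected.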
l Permit: If a route matches all the if-match clauses of a node, the route matches the node
and the actions defined by the apply clause are performed on the route. If a route does
not match one if-match clause of a node, the route continues to match the next node.
l Deny: In this mode, the actions defined by the apply clause are not performed. If a route
matches all the if-match clauses of a node, the route is denied and does not match the
next node.
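The Permit and Deny matching modes can be sketched in Python as follows. This is a simplified model that assumes each node is a dictionary holding a mode, a list of if-match predicates, and the tunnel policy named by its apply clause. The function name run_tunnel_selector and the dictionary keys are hypothetical.

```python
def run_tunnel_selector(route, nodes):
    """Return the tunnel policy chosen for a route, or None if none applies."""
    for node in nodes:
        matched = all(clause(route) for clause in node["if_match"])
        if not matched:
            continue               # the route continues to match the next node
        if node["mode"] == "permit":
            return node["apply"]   # apply clause: the tunnel policy to use
        return None                # deny: no apply action and no further nodes
    return None
```

A node with an empty if-match list matches every route in this sketch, so it can serve as a catch-all placed last.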
11.2.2.5 Introduction
Definition
Generic Routing Encapsulation (GRE) is a tunneling protocol that encapsulates the packets of
a wide variety of network layer protocols, such as Internetwork Packet Exchange (IPX),
Asynchronous Transfer Mode (ATM), IPv6, and AppleTalk, into IP tunneling packets, so that
these packets can be transmitted over an IPv4 network.
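The encapsulation step can be illustrated with a short Python sketch that builds the basic 4-byte GRE header and prepends it to a passenger packet. It assumes the simplest header form with no checksum, key, or sequence number, and it leaves out the outer IPv4 header, which would carry IP protocol number 47. The function and constant names are hypothetical.

```python
import struct

# The GRE Protocol Type field reuses EtherType values to identify the passenger protocol.
GRE_PROTO_IPV4 = 0x0800
GRE_PROTO_IPV6 = 0x86DD


def gre_encapsulate(payload, protocol_type):
    """Prepend a basic GRE header (flags and version all zero) to the payload."""
    flags_and_version = 0x0000
    header = struct.pack("!HH", flags_and_version, protocol_type)
    return header + payload
```

For example, gre_encapsulate(ipv6_packet, GRE_PROTO_IPV6) yields the bytes that would be placed after the outer IPv4 header, which is how an IPv6 packet can cross an IPv4 network.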
Purpose
GRE is introduced so that the packets of a wide variety of network layer protocols, such as
IPX, ATM, IPv6, and AppleTalk, can be transmitted over an IPv4 network. GRE solves the
transmission problem faced by heterogeneous networks.
In addition, GRE serves as a Layer 3 tunneling protocol of VPNs, and provides a tunnel for
transparently transmitting VPN packets. Currently, GRE is supported by IPv4 L3VPN, but not
IPv6 L3VPN.
Keepalive Detection
GRE Black Hole
GRE does not support link status detection. As a result, when the remote interface is
unreachable, the local end cannot immediately close the tunnel connection and continues
forwarding data to the peer. The peer, however, discards all the packets. A black hole is
therefore generated.
Keepalive Detection
The device provides link status detection, also called Keepalive detection, for GRE tunnels.
Keepalive detection is used to detect whether the tunnel link is in the Keepalive state,
specifically, whether the peer of the tunnel is reachable. If the peer is not reachable, the tunnel
is disconnected to prevent data loss caused by black holes.
After Keepalive detection is enabled for a GRE tunnel, the ingress periodically sends
Keepalive detection packets to the peer. If the peer is reachable, the ingress receives a reply
packet from the peer. Otherwise, the ingress cannot receive any reply packet.
NOTE
The endpoint of a GRE tunnel has a Keepalive detection mechanism as long as it has Keepalive
detection configured. The peer does not need to have the Keepalive detection mechanism. After the peer
receives a Keepalive detection packet, it sends a reply packet, regardless of whether it has Keepalive
detection configured.
Unreachability Counter
After Keepalive detection is enabled for a GRE tunnel, the ingress creates a counter,
periodically sends Keepalive detection packets, and counts the number of sent detection
packets. The number increases by one each time a detection packet is sent.
The peer sends a reply packet to the ingress after receiving a detection packet. Upon receipt of
the reply packet, the ingress clears the counter value.
If the ingress receives a reply packet before the counter value reaches the preset value
(the retry times), the ingress considers the peer reachable. If the ingress does not receive
any reply packet before the counter reaches the preset value, the ingress considers the peer
unreachable and closes the tunnel connection.
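The unreachability counter can be sketched in Python as follows. This is a simplified model that advances one detection period at a time and is told whether a reply arrived in that period. The class name GreKeepalive, the method period, and the default of three retries are assumptions made for illustration.

```python
class GreKeepalive:
    """Toy model of the ingress-side unreachability counter."""

    def __init__(self, retry_times=3):
        self.retry_times = retry_times  # preset value; 3 is an assumed example
        self.counter = 0
        self.tunnel_up = True

    def period(self, reply_received):
        """Send one detection packet, then process the reply if one arrived."""
        self.counter += 1               # incremented each time a packet is sent
        if reply_received:
            self.counter = 0            # a reply clears the counter
        elif self.counter >= self.retry_times:
            self.tunnel_up = False      # peer considered unreachable; close the tunnel
        return self.tunnel_up
```

With retry_times set to 3, two missed replies followed by a reply keep the tunnel up, whereas three consecutive missed replies close it.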
11.2.3 Applications
[Figure: GRE tunnel across the VPN backbone between VPN site1 and VPN site2; device chain CE, PE, PE, CE]
l On a network-based VPN, both ends of a GRE tunnel reside on PEs, as shown in Figure
11-4.
[Figure 11-4: GRE tunnel across the VPN backbone between VPN site1 and VPN site2; device chain CE, PE, PE, CE]
Usually, the MPLS backbone network uses LSPs as public tunnels. If Ps do not support MPLS
but PEs do, LSPs cannot be used as public tunnels. In this situation, you can use GRE tunnels
for L2VPN or L3VPN solutions. Figure 11-5 shows the format of a GRE VPN packet
transmitted over an MPLS backbone network.
11.3.1 Introduction
Definition
A BGP/MPLS IP VPN is a Layer 3 virtual private network (L3VPN), which uses BGP to
advertise VPN routes and uses MPLS to forward VPN packets on the IP backbone networks
of service providers (SPs).
[Figure: BGP/MPLS IP VPN model. Sites of VPN 1 and VPN 2 connect through CEs to PEs on the service provider's backbone, which also contains P devices]
l CE: an edge device on a customer network. A CE provides interfaces that are directly
connected to the SP network. A CE can be a router, a switch, or a host. Usually, a CE is
unaware of the VPN and does not need to support MPLS.
l PE: an edge device on an SP network. A PE is directly connected to a CE. On an MPLS
network, PEs process all VPN services. The requirements on the performance of PEs are
rather high.
l P: a backbone device on an SP network. A P does not directly connect to a CE. Ps only
need to possess basic MPLS forwarding capabilities and do not maintain VPN
information.
PEs and Ps are managed by SPs. CEs are managed by users, unless the users entrust SPs
with the management rights.
A PE can connect to multiple CEs. A CE can connect to multiple PEs, no matter whether
these PEs belong to the same SP.
Purpose
MPLS seamlessly integrates the flexibility of IP routing and simplicity of ATM label
switching. A connection-oriented control plane is introduced into an MPLS IP network, which
enriches the means of managing and operating the network. On IP networks, MPLS TE has
become an important tool in managing network traffic, reducing network congestion, and
ensuring QoS.
The VPNs using MPLS IP networks as the backbone networks are highly valued by carriers,
and have become an important means of providing value-added services.
Unlike the IGP, BGP focuses on controlling route transmission and choosing optimal routes
instead of discovering and calculating routes. VPNs use public networks to transmit VPN
data, and the public networks use an IGP to discover and calculate their routes. The key to
constructing a VPN is to control the transmission of VPN routes and choose the optimal
routes between two PEs.
BGP uses TCP (with port number 179) as the transport layer protocol, enhancing transmission
reliability. VPN routes can therefore be exchanged directly between two PEs even when other
routers are located between them.
BGP can append any information to a route as optional BGP attributes. The information is
transparently forwarded by BGP devices that cannot identify those attributes. Therefore, VPN
routes can be conveniently transmitted between PEs.
When routes are updated, BGP sends only updated routes rather than all routes. This
implementation saves the bandwidth consumed by route transmission, making the
transmission of a great number of routes over a public network possible.
As an Exterior Gateway Protocol (EGP), BGP is best suited for VPNs that cross the networks
of multiple carriers.
11.3.2 Principles
Definition
A BGP/MPLS IP VPN is a Layer 3 virtual private network (L3VPN), which uses BGP to
advertise VPN routes and uses MPLS to forward VPN packets on the IP backbone networks,
as shown in Figure 11-7. A BGP/MPLS IP VPN applies to scenarios where there is only one
carrier backbone network or the backbone networks of multiple carriers belong to the same
AS. A BGP/MPLS IP VPN has the following characteristics:
l Advertises VPN routes using extended BGP.
l Encapsulates and transmits VPN packets over MPLS LSPs serving as public network
tunnels.
l Allows a device to play only one role at a time, either PE, P, or CE.
[Figure 11-7: BGP/MPLS IP VPN networking. Site1 to Site4 of VPN1 and VPN2 connect through CEs to PEs; MP-BGP runs across the MPLS backbone, which contains a P device between the PEs]
Related Concepts
l Site
The site concept is frequently mentioned in the VPN technology. The following
describes a site from different aspects:
– A site is a group of IP systems with IP connectivity that can be achieved
independent of service provider (SP) networks.
As shown in Figure 11-8, on the networks on the left, the headquarters of company
X in city A is a site, and the branch of company X in city B is another site. IP
devices within each site can communicate without using the SP network.
[Figure 11-8: Sites. Left: the headquarters of company X in city A (Site A) and the branch of company X in city B (Site B) each connect to the carrier's network through a CE. Right: the headquarters in city A and the branch in city B together form one site (Site X)]
– Sites are classified based on the topological relationships between devices rather
than the geographical locations of devices, although devices at a site are
geographically adjacent to each other in general. If two geographically separated IP
devices are connected over a leased line, the two devices form a site if they can
communicate without the help of SP networks.
As shown in Figure 11-8, if the branch network in city B connects to the
headquarters network in city A over a leased line instead of an SP network, the
branch network and the headquarters network form a site.
– The devices at a site may belong to multiple VPNs. In other words, a site may
belong to more than one VPN.
As shown in Figure 11-9, in company X, the decision-making department in city A
(Site A) is allowed to communicate with the R&D department in city B (Site B) and
the financial department in city C (Site C). Site B and Site C are not allowed to
communicate with each other. In this case, two VPNs (VPN1 and VPN2) can be
established with Site A and Site B belonging to VPN1 and Site A and Site C
belonging to VPN2. In this manner, Site A is configured to belong to multiple
VPNs.
[Figure 11-9: Company X sites connected through CEs to the carrier's network: decision-making department in city A (Site A), R&D department in city B (Site B), and financial department in city C (Site C), grouped into VPN 1 and VPN 2]
– A site connects to an SP network using a CE. A site may contain more than one CE,
but a CE belongs to only one site.
It is recommended that you determine the devices to be used as CEs based on the
following principles:
If the site is a host, use the host as the CE.
If the site is a subnet, use switches as CEs.
If the site comprises multiple subnets, use routers as CEs.
Sites connected to the same SP network can be classified into different sets based
on configured policies. Only sites that belong to the same set can access each other,
and this set is a VPN.
l Address space overlapping
As a private network, a VPN independently manages an address space. Address spaces
of different VPNs may overlap. For example, if both VPN1 and VPN2 use addresses on
the network segment 10.110.10.0/24, address space overlapping occurs.
NOTE
A VPN instance is also called a VPN routing and forwarding (VRF) table. A PE
maintains multiple routing and forwarding tables, including a public routing and
forwarding table and one or more VRFs. A PE has multiple instances, including a public
network instance and one or more VPN instances, as shown in Figure 11-10. Each VPN
instance maintains routes from the corresponding VPN. The public network instance
maintains public network routes. This enables a PE to keep all routes from VPNs,
irrespective of whether their address spaces overlap.
[Figure 11-10: VPN instances on a PE. Site1 (VPN1) and Site2 (VPN2) connect through CEs to the PE, which holds one VPN-instance per VPN and a public forwarding table toward the backbone]
The differences between a public routing and forwarding table and a VRF are as follows:
– A public routing table contains the IPv4 routes of all PEs and Ps. These IPv4 routes
are static routes configured on the backbone network or are generated by routing
protocols configured on the backbone network.
– A VPN routing table contains the routes of all sites that belong to the corresponding
VPN instance. The routes are obtained through exchange of VPN routes between
PEs or between CEs and PEs.
– According to route management policies, a public forwarding table contains the
minimum forwarding information extracted from the corresponding routing table,
whereas a VPN forwarding table contains the minimum forwarding information
extracted from the corresponding VPN routing table.
VPN instances on a PE are independent of each other and of the public routing and
forwarding table.
Each VPN instance can be regarded as a virtual router, which maintains an
independent address space and has one or more interfaces connected to the router.
In RFC 4364 (BGP/MPLS IP VPNs), a VPN instance is called a per-site forwarding
table. As the name suggests, one VPN instance corresponds to one site. To be more
accurate, every connection between a CE and a PE corresponds to a VPN instance,
but this is not a one-to-one mapping. The VPN instance is manually bound to the
PE interface that directly connects to the CE.
A VPN instance uses a route distinguisher (RD) to identify an independent address
space and uses VPN targets to manage VPN memberships and routing principles of
directly connected sites and remote sites.
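The separation between VPN instances can be illustrated with a minimal Python sketch. It assumes that each VPN instance is a plain prefix-to-next-hop table selected by the PE interface on which a packet arrives, and it uses the overlapping segment 10.110.10.0/24 from the example above. The class ProviderEdge, its methods, and the interface names used with it are hypothetical.

```python
import ipaddress


class ProviderEdge:
    """Toy PE holding one public table plus one table per VPN instance."""

    def __init__(self):
        self.public_table = {}  # prefix -> next hop (public network instance)
        self.vrfs = {}          # VPN instance name -> {prefix -> next hop}
        self.bindings = {}      # PE interface -> VPN instance name

    def create_vpn_instance(self, name):
        self.vrfs[name] = {}

    def bind_interface(self, interface, vpn_instance):
        self.bindings[interface] = vpn_instance

    def add_vpn_route(self, vpn_instance, prefix, next_hop):
        self.vrfs[vpn_instance][ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, in_interface, destination):
        """Longest-prefix match in the table selected by the incoming interface."""
        vpn_instance = self.bindings.get(in_interface)
        table = self.public_table if vpn_instance is None else self.vrfs[vpn_instance]
        address = ipaddress.ip_address(destination)
        candidates = [net for net in table if address in net]
        if not candidates:
            return None
        best = max(candidates, key=lambda net: net.prefixlen)
        return table[best]
```

Two instances can each hold a route for 10.110.10.0/24 with different next hops, and a lookup never crosses from one table into another or into the public table.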
l Relationships between VPNs, sites, and VPN instances
The relationships between VPNs, sites, and VPN instances are as follows:
RDs are used to distinguish address spaces with the same IPv4 address prefix. The
format of RDs enables SPs to allocate RDs independently. An RD, however, must be
unique on the entire network to ensure correct routing if CEs are dual-homed to PEs.
IPv4 addresses with RDs are called VPNv4 addresses. After receiving IPv4 routes from a
CE, a PE converts the routes to globally unique VPNv4 routes and advertises the routes
on the public network.
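The way an RD makes overlapping IPv4 prefixes distinct can be sketched in Python as follows. The sketch assumes the type 0 RD layout from RFC 4364 (a 2-byte type field, a 2-byte AS number, and a 4-byte assigned number) and builds the 12-byte VPNv4 address by placing the RD in front of the IPv4 address. The function names and the RD values in the usage note are hypothetical.

```python
import ipaddress
import struct


def make_rd_type0(asn, assigned_number):
    """Encode a type 0 RD: 2-byte type, 2-byte AS number, 4-byte assigned number."""
    return struct.pack("!HHI", 0, asn, assigned_number)


def vpnv4_address(rd, ipv4):
    """A VPNv4 address is the 8-byte RD followed by the 4-byte IPv4 address."""
    return rd + ipaddress.IPv4Address(ipv4).packed
```

For example, combining 10.110.10.0 with the RDs 100:1 and 100:2 gives two different 12-byte values, so routes for the same IPv4 prefix in two VPNs no longer collide when they are advertised on the public network.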
l VPN target
The VPN target, also called the route target (RT), is a 64-bit (8-octet) BGP extended community
attribute. BGP/MPLS IP VPN uses the VPN target to control the advertising of VPN
routing information.
A VPN instance is associated with one or more VPN targets. VPN targets are classified
into the following types:
– Export target: After learning an IPv4 route from a directly connected site, a PE
converts the route to a VPNv4 route and sets export targets for the route. As an
extended community attribute, export targets are advertised with the route.
– Import target: After receiving a VPNv4 route from one PE, a second PE checks the
export targets of the route. If one of the export targets is identical with an import
target of a VPN instance on the PE, the PE adds the route to the corresponding
VRF.
A VPN target defines which sites can receive a VPN route and which VPN routes of
which sites can be received by a PE.
After receiving a route from a directly connected CE, a PE sets export targets for the
route. The PE then uses BGP to advertise the route with the export targets to related PEs.
After receiving the route, the related PEs compare the export targets with the import
targets of all their VPN instances. If an export target is identical with an import target,
the route is added to the corresponding VRF.
The reasons for using the VPN target instead of the RD as the extended community
attribute are as follows:
– A VPNv4 route has only one RD, but can be associated with multiple VPN targets.
With multiple extended community attributes, BGP can greatly improve the
flexibility and expansibility of a network.
– VPN targets can be used to control route advertisement between different VPNs on
a PE. With properly configured VPN targets, different VPN instances on a PE can
import routes from each other.
On a PE, different VPNs have different RDs, but the extended community attributes
allowed by BGP are limited. Using RDs for route importing limits network expansibility.
On a BGP/MPLS IP VPN, VPN targets can be used to control exchange of VPN routes
between sites. Export targets and import targets are independent of each other and can be
configured with multiple values, ensuring flexible VPN access control and diversified
VPN networking modes.
l MP-BGP
Traditional BGP-4 defined in RFC 1771 can manage IPv4 routes but not the routes of
VPNs with overlapped address spaces.
To correctly process VPN routes, VPNs use MP-BGP defined in RFC 2858
(Multiprotocol Extensions for BGP-4). MP-BGP supports multiple network layer
protocols. Network layer protocol information is contained in the Network Layer
Reachability Information (NLRI) field and the Next Hop field of an MP-BGP Update
message.
MP-BGP uses the address family to differentiate network layer protocols. An address
family can be a traditional IPv4 address family or any other address family, such as a
VPNv4 address family or an IPv6 address family. For the values of address families, see
RFC 1700 (Assigned Numbers).
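The RD and VPN target behavior described above can be illustrated with a minimal Python sketch. It models only two rules: an RD keeps identical IPv4 prefixes distinct, and a PE imports a VPNv4 route into every VPN instance whose import targets overlap the export targets of the route. The names Vpnv4Route, VpnInstance, to_vpnv4, and import_route, and all RD, target, and prefix values, are hypothetical and are not part of any product interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vpnv4Route:
    rd: str                    # route distinguisher, for example "100:1"
    prefix: str                # IPv4 prefix learned from the CE
    export_targets: frozenset  # VPN targets attached by the advertising PE


@dataclass
class VpnInstance:
    name: str
    import_targets: frozenset
    routes: list = field(default_factory=list)


def to_vpnv4(rd, ipv4_prefix, export_targets):
    """Prefix an IPv4 route with an RD and attach the export targets."""
    return Vpnv4Route(rd, ipv4_prefix, frozenset(export_targets))


def import_route(route, instances):
    """Add the route to every VPN instance that has a matching import target."""
    matched = []
    for instance in instances:
        if route.export_targets & instance.import_targets:
            instance.routes.append(route)
            matched.append(instance.name)
    return matched


# Two sites use the same IPv4 prefix; different RDs keep the routes distinct.
route_a = to_vpnv4("100:1", "10.1.1.0/24", {"100:1"})
route_b = to_vpnv4("100:2", "10.1.1.0/24", {"100:2", "100:9"})

vrfs = [
    VpnInstance("vpn_a", frozenset({"100:1"})),
    VpnInstance("vpn_b", frozenset({"100:2"})),
    VpnInstance("vpn_shared", frozenset({"100:1", "100:9"})),
]

print(route_a != route_b)           # True
print(import_route(route_a, vrfs))  # ['vpn_a', 'vpn_shared']
print(import_route(route_b, vrfs))  # ['vpn_b', 'vpn_shared']
```

Because export targets and import targets are separate sets, the hypothetical vpn_shared instance can import routes from both VPNs without exporting anything to them.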
In this example, the final outer label of the packet is O-L2. If penultimate hop popping (PHP) is
configured, O-L2 is removed on the penultimate hop, and the egress PE receives a packet with the
inner label only.
4. The egress PE removes the inner label residing at the bottom of the label stack.
5. The egress PE sends the packet from the corresponding outbound interface to CE2. After
its labels are removed, the packet becomes a pure IP packet.
In this manner, the packet is sent from CE1 to CE2. CE2 forwards the packet to the
destination in the way it sends other IP packets.
Benefits
BGP/MPLS IP VPN offers the following benefits:
l Enables users to communicate with each other over networks of geographically different
regions.
l Ensures the security of VPN data during transmission on the public network.
11.3.2.2 HVPN
Background
Currently, hierarchical architectures are used in most networking designs. For example,
metropolitan area networks (MANs) typically use a three-layer architecture consisting of an
access layer, an aggregation layer, and a core layer. On the network shown in Figure 11-13,
all PEs reside on the same plane and must provide the following functions:
l Provide access services for users. This function requires each PE to provide a large
number of interfaces.
l Manage and advertise VPN routes and process user packets. This function requires
each PE to have a high-capacity memory and strong forwarding capabilities.
[Figure 11-13: sites of VPN 1 and VPN 2 connect through CEs to PEs at the edge of the
service provider's backbone; P devices form the core of the backbone.]
Related Concepts
Figure 11-14 shows a basic HVPN architecture consisting of mainly user-end PEs (UPEs),
superstratum PEs (SPEs), and network PEs (NPEs):
l UPE: directly connected to CEs and provides access services for users.
l SPE: connected to UPEs and located on the core of a network. SPEs manage and
advertise VPN routes.
l NPE: connected to SPEs and located on the network side.
A UPE and an SPE are connected by only one link and exchange packets based on labels. An
SPE does not need to provide a large number of interfaces for access users. UPEs and SPEs
can be connected by physical interfaces with physical links, by sub-interfaces with virtual
local area networks (VLANs) or permanent virtual circuits (PVCs), or by tunnel interfaces
with label switched paths (LSPs). If an IP or MPLS network resides between a UPE and an
SPE, the UPE and SPE can be connected by tunnel interfaces to exchange labeled packets
over a tunnel.
The capabilities of SPEs and UPEs differ according to the roles they play on a network. SPEs
require large-capacity routing tables and high forwarding performance, but few interface
resources. UPEs, on the other hand, require only low-capacity routing tables and low
forwarding performance, but high access capabilities.
NOTE
The roles of UPEs and SPEs are relative. On an HVPN, a superstratum PE is the SPE of an understratum
PE, and an understratum PE is the UPE of a superstratum PE.
An HoPE is compatible with common PEs on an MPLS network.
If a UPE and an SPE belong to the same autonomous system (AS), they use the Multi-
protocol Extensions for Interior Border Gateway Protocol (MP-IBGP). If they belong to
different ASs, they use the Multi-protocol Extensions for Exterior Border Gateway Protocol
(MP-EBGP).
If MP-IBGP is used, an SPE can function as the route reflector (RR) for multiple UPEs to
advertise routes between IBGP peers. To reduce the number of routes on UPEs, ensure that an
SPE that is already acting as the RR for UPEs is not used as the RR for other PEs.
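[Figure 11-14: sites of VPN1 and VPN2 connect through CEs to a hierarchy of UPEs,
SPEs, and NPEs; the surviving device labels include UPE2, SPE2, and NPE2.]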
The following describes the route exchanging and packet forwarding processes on an HoVPN
and an H-VPN. In the following figures, N indicates a next hop, and L indicates a label.
4. After receiving the VPNv4 routes, the UPE converts these routes to IPv4 routes and
imports routes with reachable next hops to its VPN IPv4 routing table.
5. The UPE advertises the IPv4 routes to CE1 using the IP protocol.
3. After receiving the packet, the SPE replaces the outer label Lv with Lu and the inner
label L2 with L3. Then, the SPE sends the packet to the NPE over the same tunnel.
4. After receiving the packet, the NPE removes the outer label Lu, searches for a VPN
instance corresponding to the packet based on the inner label L3, and removes the inner
label L3 after the VPN instance is found. Then, the NPE searches the VPN forwarding
table of this VPN instance for the outbound interface of the packet based on the
destination address of the packet and sends the packet through this outbound interface to
CE2. The packet sent by the NPE is a pure IP packet with no label.
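Steps 3 and 4 above can be sketched in Python as follows. The sketch is illustrative only: the function names spe_swap and npe_deliver, the interface names, and the prefixes are hypothetical, and the use of longest-prefix matching in the VPN forwarding table is an assumption, because the text states only that the lookup is based on the destination address.

```python
import ipaddress


def spe_swap(labels, outer_map, inner_map):
    """Step 3: the SPE swaps both the outer (tunnel) and inner (VPN) labels."""
    outer, inner = labels
    return [outer_map[outer], inner_map[inner]]


def npe_deliver(labels, dst_ip, label_to_vpn, vpn_tables):
    """Step 4: the NPE pops both labels and looks up the VPN forwarding table."""
    _outer, inner = labels            # the outer label is removed
    vpn = label_to_vpn[inner]         # the inner label selects the VPN instance
    dst = ipaddress.ip_address(dst_ip)
    candidates = [
        (ipaddress.ip_network(prefix), interface)
        for prefix, interface in vpn_tables[vpn].items()
        if dst in ipaddress.ip_network(prefix)
    ]
    # Assumed longest-prefix match to pick the outbound interface.
    _net, interface = max(candidates, key=lambda item: item[0].prefixlen)
    return interface, []              # the packet leaves with no labels


labels_at_spe = ["Lv", "L2"]
labels_at_npe = spe_swap(labels_at_spe, {"Lv": "Lu"}, {"L2": "L3"})
print(labels_at_npe)  # ['Lu', 'L3']

out_if, remaining = npe_deliver(
    labels_at_npe,
    "10.2.0.7",
    label_to_vpn={"L3": "vpn1"},
    vpn_tables={"vpn1": {"10.2.0.0/16": "to_CE2", "0.0.0.0/0": "to_default"}},
)
print(out_if, remaining)  # to_CE2 []
```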
Related Functions
H-VPN supports HoPE embedding.
l You can connect a new SPE to an existing SPE and configure the existing SPE to be the
UPE of the new SPE.
l You can connect new UPEs to an existing UPE and configure the existing UPE to be the
SPE of the new UPEs.
l HoPEs can be embedded repeatedly in the preceding two methods.
Figure 11-21 shows a three-layer H-VPN, and the PEs in the middle are referred to as middle-
level PEs (MPEs). MP-BGP runs between the SPE and MPEs, and between the MPEs and
UPEs.
NOTE
The MPE concept is introduced solely for descriptive purposes and does not actually exist
in an H-VPN model.
MP-BGP advertises all the VPN routes of UPEs to the SPE, but advertises only the default
routes of the VPN instances of the SPE to UPEs.
An SPE maintains the routes of all VPN sites connected to its understratum PEs, whereas a
UPE maintains only the routes of its directly connected VPN sites. The numbers of routes
maintained by an SPE, an MPE, and a UPE are in descending order.
[Figure 11-21: three-layer H-VPN in which MPEs sit between the SPE and the UPEs, and
CEs attach to the UPEs.]
Benefits
HVPN networking provides the following benefits:
l Flexible expandability
If the performance of a UPE is insufficient, you can add an SPE for the UPE to access. If
the access capabilities of an SPE are insufficient, add more UPEs to the SPE.
l Reduced interface resource requirements
Since a UPE and an SPE exchange packets based on labels, they only need to be
connected over a single link.
l Reduced burdens on UPEs
A UPE needs to maintain only local VPN routes. The remote VPN routes are represented
by a default or aggregated route, lightening the burdens on UPEs.
l Simpler configuration
SPEs and UPEs use MP-BGP, a dynamic routing protocol, to exchange routes and
advertise labels. Each UPE only needs to establish a single MP-BGP peer relationship
with an SPE.
11.3.2.3 VPN FRR
Background
As networks develop rapidly, the time taken for end-to-end services to converge after a
fault occurs on a carrier's network has become an indicator of bearer network
performance. MPLS TE FRR is one of the commonly used fast switching technologies. The
solution is to create an end-to-end TE tunnel between two PEs and a backup LSP that protects
the primary label switched path (LSP). When either of the PEs detects that the primary LSP is
unavailable because of a node or link failure, the PE switches the traffic to the backup LSP.
MPLS TE FRR, however, cannot implement fast switching if faults occur on the ingress or
egress. If a fault occurs on the ingress or egress, services can only be restored through end-to-
end route convergence and LSP convergence. The service convergence time is closely related
to the number of routes inside an MPLS VPN and the number of LSP hops on the bearer
network. The more VPN routes, the longer the service convergence time, and the more traffic
is lost.
VPN FRR sets up, in advance, forwarding entries on a remote PE that point to the active PE
and the standby PE. In collaboration with fast PE fault detection, VPN FRR reduces
end-to-end service convergence time if a fault occurs on an MPLS VPN where a CE is
dual-homed to two PEs. With VPN FRR, service convergence time depends only on the time
required to detect remote PE faults and change tunnel status, and is therefore independent
of the number of VPN routes on the bearer network.
Implementation
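[Figure 11-22: CE1 at one VPN site connects to PE1; across the backbone, PE1 reaches PE2
over Link A and PE3 over Link B; CE2 at the other VPN site is dual-homed to PE2 and PE3.]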
As shown in Figure 11-22, normally, CE1 accesses CE2 over Link A. If PE2 is Down, CE1
accesses CE2 over Link B.
Based on the traditional BGP/MPLS IP VPN technology, both PE2 and PE3 advertise routes
destined for CE2 to PE1, and assign VPN labels to these routes. PE1 then selects a preferred
VPNv4 route based on the routing policy. In this example, the preferred route is the one
advertised by PE2. Only the routing information advertised by PE2, including the
forwarding prefix, inner label, and selected LSP, is filled in the forwarding entry of the
forwarding engine to guide packet forwarding.
When a fault occurs on PE2, PE1 detects the fault of PE2 (the BGP peer goes Down or the
MPLS LSP is unavailable), re-selects the route advertised by PE3, and updates the forwarding
entry to complete end-to-end convergence. Before PE1 re-delivers the forwarding entry for
the route advertised by PE3, CE1 cannot reach CE2 for a certain period, because PE2, an end
point of the LSP, is Down. As a result, end-to-end services are interrupted.
VPN FRR is an improvement of the traditional reliability technology. With VPN FRR, PE1
can select the appropriate VPNv4 routes based on the matching rules. For these routes, in
addition to information about the preferred routes advertised by PE2, information about the
second-best route advertised by PE3 is also filled in the forwarding entry.
If a fault occurs on PE2, the MPLS LSP between PE1 and PE2 becomes unavailable. After
detecting the fault by means of techniques such as bidirectional forwarding detection (BFD),
PE1 marks the corresponding entry in the LSP status table as unavailable, and delivers the
setting to the forwarding table. After selecting a forwarding entry, the forwarding engine
examines the status of the LSP corresponding to the forwarding entry. If the LSP is
unavailable, the forwarding engine uses the second-best route carried in the forwarding entry
to forward packets. After being tagged with the inner labels assigned by PE3, packets are
transmitted to PE3 over the LSP between PE1 and PE3 and then forwarded to CE2. In this
manner, fast end-to-end service convergence is implemented and traffic from CE1 to CE2 is
restored.
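The forwarding decision described above can be sketched in Python as follows. This is a simplified model, not the forwarding engine's actual data structure: the names NextHop, FrrEntry, and select_next_hop, the label values, and the LSP identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NextHop:
    pe: str
    inner_label: int
    lsp: str


@dataclass(frozen=True)
class FrrEntry:
    prefix: str
    primary: NextHop  # preferred route (advertised by PE2 in the example)
    backup: NextHop   # second-best route (advertised by PE3 in the example)


def select_next_hop(entry, lsp_status):
    """Use the primary path unless its LSP is marked unavailable."""
    if lsp_status.get(entry.primary.lsp, False):
        return entry.primary
    return entry.backup


entry = FrrEntry(
    prefix="10.2.0.0/16",
    primary=NextHop("PE2", inner_label=1025, lsp="lsp_pe1_pe2"),
    backup=NextHop("PE3", inner_label=1031, lsp="lsp_pe1_pe3"),
)
lsp_status = {"lsp_pe1_pe2": True, "lsp_pe1_pe3": True}
print(select_next_hop(entry, lsp_status).pe)  # PE2

# Fault detection (for example, BFD) marks the LSP to PE2 as unavailable.
lsp_status["lsp_pe1_pe2"] = False
print(select_next_hop(entry, lsp_status).pe)  # PE3
```

Because the backup next hop is already stored in the entry, the switchover requires only a status change in the LSP table and no route reselection, which is why the convergence time does not grow with the number of VPN routes.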
Other Functions
VPN FRR is a fast switching technique based on inner labels. The outer tunnels can be LDP
LSPs, RSVP-TE tunnels, or traditional tunnels used by L3VPN (such as GRE tunnels). When
the forwarding engine detects that the outer tunnel is unavailable during packet forwarding,
fast switching based on inner labels can be implemented.
Usage Scenario
On a VPN where a CE is dual-homed to two PEs, after a PE fails, VPN FRR ensures that the
VPN services from the CE to the PE can be rapidly switched to the standby PE for
transmission.
Benefits
On a VPN where a CE is dual-homed to two PEs, VPN FRR speeds up service convergence
and enhances network availability in the case of PE failures.
11.3.2.4 VPN GR
Graceful restart (GR) is a high availability (HA) technology. HA comprises a comprehensive
set of techniques, such as fault-tolerant redundancy, link protection, faulty node recovery, and
traffic engineering. As a fault-tolerant redundancy technology, GR ensures normal forwarding
of data when the routing protocol restarts to prevent interruption of key services. Currently,
GR has been widely applied to active/standby switchovers and system upgrades.
GR is usually used when the active route processor (RP) fails due to a software or hardware
error, or used when an administrator performs a master/slave main control board switchover.
Implementation Prerequisite
On a traditional routing device, a processor performs both control and forwarding. The
processor finds routes based on routing protocols and maintains the routing and forwarding
tables of a device. High- and medium-end devices generally use the multi-RP structure to
improve forwarding performance and reliability. The processor responsible for routing
protocols is mostly located on the main control board, whereas the processor responsible for
data forwarding is located on the interface board. This design helps to ensure the continuity of
packet forwarding on the interface board during the restart of the main processor. The
forwarding-control decoupling technology satisfies the prerequisite for GR implementation.
A GR-capable device must have two control boards, and its interface board must have an
independent processor and memory.
Related Concepts
GR involves the following concepts:
Overview
VPN GR is the application of the GR technology on a VPN. VPN GR ensures that VPN
traffic is not interrupted when a master/slave control board switchover is performed on a
device that transmits VPN services. VPN GR offers the following benefits:
l Reduces the impact of VPNv4 route or BGP label route flapping on the entire network
during a master/slave control board switchover.
l Decreases the packet loss rate of VPN services to almost 0%.
l Protects important VPN services.
l Improves VPN reliability by reducing PE or CE single-point failures.
To support VPN GR, a BGP/MPLS IP VPN must support IGP GR and BGP GR. When using
an MPLS LDP LSP as a tunnel, the BGP/MPLS IP VPN must support MPLS LDP GR. If
traffic engineering is used, the BGP/MPLS IP VPN must also support RSVP GR. After a
master/slave control board switchover is performed on a PE or CE, the PE or CE and its
connected PEs keep the forwarding information of all VPN routes for a certain period to
ensure that VPN traffic is not interrupted. CEs connecting to a PE on which a master/slave
control board switchover is performed also need to keep the forwarding information of all
VPN routes for a certain period.
On a common L3VPN, a master/slave control board switchover may be performed on any PE,
CE, or P.
b. After the LSPM module deletes all LSPs in the Stale state, VPN GR is complete.
The processing on devices connecting to a PE is as follows:
l After a CE connecting to this PE detects the restart of the PE, the CE uses the same
processing flow as that of the GR helper in common IGP GR or BGP GR and keeps
information about all IPv4 routes for a certain period.
l After a P connecting to this PE detects the restart of the PE, either of the following
situations occurs:
– If BGP is not configured, the P uses the same processing flow as that of the GR
helper in common IGP GR and MPLS LDP GR.
– If BGP is configured, the BGP processing flow is the same as that of the GR helper
in the common BGP GR except that the BGP processing flow includes additional
IGP GR processing and MPLS LDP GR processing, and the P then keeps
information about all the public IPv4 routes for a certain period.
l After detecting the restart of the PE, the RRs reflecting VPNv4 routes and the other PEs
(including ASBRs) connecting to this PE use the same processing flow as that of the GR
helper in BGP GR. They then keep information about all the public IPv4 routes and
VPNv4 routes for a certain period.
affected, with neighbors not knowing the switchover on the local device. This ensures
uninterrupted transmission of VPN services.
To avoid routing loops in a VPN site, you can configure an SoO attribute on PE1 for CE1.
The SoO attribute identifies the site where CE1 resides. The routes advertised by CE1 to PE1
then carry this SoO attribute, and PE1 advertises the routes with the SoO attribute to other
PEs across the backbone network. Before advertising the received routes to its peer CE2, PE2
checks whether the routes carry the SoO attribute specified for the site where CE2 resides. If a
route carries this SoO attribute, the route was advertised from the site where CE2 resides.
PE2 then does not advertise the route to CE2, thereby avoiding routing loops in the site.
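The SoO check can be sketched in Python as a simple filter. The function name routes_to_advertise, the dictionary layout, and the SoO values are hypothetical and serve only to illustrate the rule described above.

```python
def routes_to_advertise(routes, ce_site_soo):
    """Drop routes whose SoO matches the SoO configured for the CE's site."""
    if ce_site_soo is None:
        # No SoO is configured for this CE, so nothing is filtered.
        return list(routes)
    return [route for route in routes if route.get("soo") != ce_site_soo]


routes_on_pe2 = [
    {"prefix": "10.1.0.0/16", "soo": "65000:1"},  # originated in CE1's site
    {"prefix": "10.2.0.0/16", "soo": "65000:2"},  # originated in CE2's own site
    {"prefix": "10.3.0.0/16"},                    # carries no SoO attribute
]
kept = routes_to_advertise(routes_on_pe2, ce_site_soo="65000:2")
print([route["prefix"] for route in kept])  # ['10.1.0.0/16', '10.3.0.0/16']
```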
On the network shown in Figure 11-24, a public network tunnel and a BGP VPNv4 peer
relationship are established between PE1 and PE2. After PE2 receives VPNv4 routes
advertised by PE1, PE2 performs VPN route matching and iterates the matching route to a
public network tunnel based on next hop and tunnel policy information. If tunnel load
balancing is configured, the route is iterated to multiple public network tunnels. Then, a
bearer relationship is established between the public network tunnels specified on PE2 and a
specific VRF.
The MIB queries a VPN's bearer tunnel based on the specified VRF name, public network
next hop, and tunnel ID, and returns the queried tunnel information to the NMS client through
SNMP packets. The queried tunnel information includes the destination address of the tunnel,
source address of the tunnel, tunnel type, outbound interface of the tunnel, load balancing
status of the tunnel, LSP index, outbound interface of the LSP, outgoing label of the LSP, next
hop of the LSP, LSP FEC, mask length of the LSP FEC, and LSP status (primary or backup).
Note that tunnels can be of different types, such as LocalIfNet, TE, GRE, and LSP. Only
tunnels of the LSP and TE types have LSP information, including the LSP index, outbound
interface of the LSP, outgoing label of the LSP, next hop of the LSP, LSP FEC, mask length of
the LSP FEC, and LSP status (primary or backup).
Term Description
L2TP A Layer 2 tunneling protocol that is drafted by IETF and involves the
participation of companies such as Microsoft. L2TP combines the
advantages of both PPTP and L2F.
PE A device that is located in the backbone network of the MPLS VPN structure.
A PE is responsible for VPN user management, establishment of LSPs
between PEs, and exchange of routing information between sites of the same
VPN. During the process, a PE performs the mapping and forwarding of
packets between the private network and the public channel. A PE can be a
UPE, an SPE, or an NPE.
tunnel A channel on the packet switching network that transmits service traffic
between PEs. In VPN, a tunnel is an information transmission channel
between two entities. The tunnel ensures secure and transparent transmission
of VPN information. In most cases, a tunnel is an MPLS tunnel.
VPN instance An entity that is set up and maintained by PEs for directly-connected sites.
Each site has its VPN instance on a PE. A VPN instance is also called the
VPN Routing and Forwarding (VRF) table. A PE has multiple forwarding
tables, including a public-network routing table and one or multiple VRFs.
VPN target A BGP extended community attribute that is also called Route Target. In
BGP/MPLS IP VPN, VPN-Target is used to control VPN routing information.
The VPN-Target attribute defines which sites can receive a VPN IPv4 route
and the routes from which sites can be received by a PE.
AS autonomous system
CE customer edge
HoPE Hierarchy of PE
P provider
PE provider edge
RD route distinguisher
RR route reflector
11.4 VLL
11.4.1 Introduction to the VLL
Definition
MPLS L2VPN
The Multiprotocol Label Switching Layer 2 Virtual Private Network (MPLS L2VPN)
transmits Layer 2 VPN services over an MPLS network. MPLS L2VPN enables operators to
provide L2VPN services over different media, such as Asynchronous Transfer Mode (ATM),
Frame Relay (FR), virtual local area network (VLAN), Ethernet, and Point-to-Point Protocol
(PPP), on a unified MPLS network.
Simply, the MPLS L2VPN indicates that Layer 2 data is transmitted transparently over an
MPLS network. For the users, the MPLS network functions as a Layer 2 switched network
through which Layer 2 connections can be set up between nodes. Layer 2 connections can be
set up in virtual leased line (VLL) mode and virtual private LAN service (VPLS) mode.
l VLL
The VLL is an emulation of the traditional leased line service. It emulates the leased line
over an IP network, and provides the asymmetrical digital data network (DDN) service at
low costs. For users at both ends of a VLL, the VLL is similar to the traditional leased
line. The VLL is a point-to-point virtual private wire technology that can support almost
all the link layer protocols. The VLL can be implemented in the following modes:
– Circuit Cross Connect (CCC): It is a mode of implementing the L2VPN through
static configuration.
– Static Virtual Circuit (SVC): It is a mode of implementing the MPLS L2VPN. The
SVC is similar to the Label Distribution Protocol (LDP) L2VPN. The difference is
that LDP is not used as the signaling protocol for transmitting VC labels or link
information. Instead, VC labels are manually configured on the SVC.
– Martini: It implements the MPLS L2VPN by using LDP as the signaling protocol
for transmitting the VC information.
– Pseudo-Wire Emulation Edge to Edge (PWE3): It is an extension of Martini mode
and a technology for end-to-end Layer 2 service transmission.
l VPLS
VPLS uses the packet switched network (PSN) to connect multiple Ethernet LAN segments
so that these segments work as one LAN. VPLS is also called transparent LAN service or
virtual private switched network service (VPSNS).
Different from the point-to-point service of the common L2VPN, VPLS enables the
service provider to offer Ethernet-based multipoint service to users through an MPLS
backbone network.
Purpose
l Extended network functions and service capabilities of operators
Operators can provide MPLS L2VPN services over only one network. In addition,
operators can use enhanced technologies related to MPLS, such as traffic engineering
(TE) and Quality of Service (QoS), to provide users with different classes of services to
meet users' requirements.
l Higher scalability
In an ATM or FR network on which MPLS is not enabled, VCs provide the L2VPN service.
For each VC, the provider edge (PE) devices and provider (P) devices in the network
need to maintain the complete VC information. When PEs of the operators are
connected to multiple customer edge (CE) devices, multiple VCs are created. Therefore,
PEs and P devices must maintain information about multiple VCs. The MPLS L2VPN,
however, can use label stacking to multiplex multiple VCs in a label switched path
(LSP). Therefore, P devices only need to maintain information about one LSP. This
improves scalability of a system.
l Separation of administrative responsibilities
In the MPLS L2VPN, operators provide only Layer 2 connectivity while users are
responsible for Layer 3 connectivity such as routing. Therefore, route flapping caused by
incorrect configurations does not affect stability of operators' networks.
l Privacy of routing and security of user information
Users maintain their own routing information; therefore, operators do not need to be
concerned about address overlapping or IP address planning, or about the routing
information of one user being leaked to other users in private networks. This
reduces the management burden on operators and enhances security of user
information.
l Enhanced security and confidentiality
The MPLS L2VPN provides the same security and confidentiality as ATM and FR
networks. By having users maintain their own routing information, operators do not have
to worry about address overlapping or the risk of leaking the routing information of one
user to another user. The MPLS L2VPN reduces the management pressure of operators
and improves user information security.
l Support for multiple protocols
Operators provide only Layer 2 connections; therefore, users can use any Layer 3
protocol such as IPv4 and IPv6.
l Smooth network upgrade
The MPLS L2VPN is transparent to users; therefore, when operators upgrade networks
from traditional L2VPNs such as ATM and FR networks to MPLS L2VPNs, users do not
need to perform any configuration. The network upgrade does not affect user services
except for data loss in a short period during the switchover.
11.4.2 Principles
l Martini
VLL supports the following link layer protocols:
l VLAN
l Ethernet
VLL supports the following types of interfaces:
l Ethernet interface
l Ethernet sub-interface
l GE interface
l GE sub-interface
l Eth-Trunk interface
l Eth-Trunk sub-interface
l MP interface
An AC interface in VLAN encapsulation mode can be an Ethernet interface or an Ethernet
sub-interface. However, the encapsulation mode of an Ethernet interface used as an AC
interface must be Ethernet, not VLAN.
In VLL networking, only one VC can be configured on each interface.
VLL Architecture
The VLL architecture comprises two ACs, one VC, and one tunnel, as shown in Figure
11-25.
[Figure 11-25: each CE connects to its PE over an AC; the VC between the two PEs is carried
in a tunnel across the MPLS network.]
Functional Modules
VLL involves the following functional modules:
l AC: an independent physical or virtual circuit connecting a CE and a PE. An AC
interface can be either a physical or a virtual interface. The AC attributes include the
encapsulation type, maximum transmission unit (MTU), and interface parameters of the
specified link type.
l VC: a virtual connection between two PEs.
l Tunnel: a virtual link used to transparently transmit service data.
CCC must be configured by network administrators and is best suited for small MPLS
networks with simple topologies. The establishment of CCC virtual circuits (VCs) does not
require signaling negotiation or exchange of control packets. Compared with other types of
Layer 2 connections, CCC VCs consume fewer resources and are easy to configure.
The most significant advantage of an MPLS L2VPN in local CCC mode is that an ISP
network can support this type of L2VPN so long as the ISP network supports MPLS
forwarding. Exchange of labels or signaling packets carrying L2VPN information is not
required during the establishment of this type of L2VPN. In addition, QoS can be guaranteed
for CCC VCs. This is because an LSP used by a CCC VC can no longer be used by other
types of Layer 2 connections.
Definition
A Martini VLL uses LDP as the signaling protocol to transmit VC information. The Martini
mode complies with RFC 4906, which extends LDP by adding a new type of forwarding
equivalence class (FEC) for exchanging VC labels. A PE assigns a VC label to each
connection between two CEs. L2VPN information carrying VC label information is
forwarded to the remote PE over an LSP established using LDP. In this manner, a VC LSP is
set up over the ordinary LSP.
In Figure 11-27, Site1 and Site2 in each VPN (VPN1 and VPN2) are interconnected using an
LSP on the ISP network. Site1 and Site2 in VPN1 can also multiplex an LSP with Site1 and
Site2 in VPN2.
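[Figure 11-27: the CEs at Site1 attach to PE1 and the CEs at Site2 attach to PE2; PE1 and
PE2 are connected through P devices on the ISP network.]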
A Martini VLL supports remote connections, but not local connections. The Martini mode
supports graceful restart (GR). After the ATN performs a switchover, the VC labels remain
unchanged. During the switchover, the packet forwarding on the VC remains unaffected.
Related Concepts
If PW redundancy is configured for a Martini VLL, you need to configure the following
parameters:
l VC type: indicates the encapsulation type of a VC, such as ATM, VLAN, and PPP.
l VC ID: identifies a VC. The IDs of VCs of the same type must be unique on a PE,
unless these VCs belong to the same MS-PW.
l Peer IP address: indicates the IP address of the remote PE for a VC. The peer IP address
uniquely identifies a VC. The loopback interface IP address of the remote PE is usually
used as the peer IP address.
The PEs that are connected to two CEs exchange VC labels using LDP and bind the
corresponding CEs to the VC ID. If two PEs that exchange VC labels are not directly
connected, a remote LDP session must be established over which the VC FEC and the VC
label are transmitted. A VC can be set up for two CEs to transmit Layer 2 data if the
following conditions are met:
l The physical status of the AC interfaces is Up.
l A tunnel exists between the two PEs.
l The VC labels have been exchanged between PEs and CEs have been bound to the VC
ID.
mode are adopted. To set up a PW, an LDP session must be established first. Which type of
LDP session needs to be established between PEs depends on the following situations:
l If a P exists between PEs, the LDP session needs to be established in remote mode.
l If PEs are directly connected, an ordinary LDP session needs to be established.
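[Figure: LDP signaling between PE1 and PE2. Recoverable labels: the commands
mpls l2vc 2.2.2.2 101 and mpls l2vc 1.1.1.1 101; the messages Request, Mapping, Withdraw,
and Release; and the states VC state up and VC state down.]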
local VC to Up. A PW consisting of two unidirectional VCs in opposite directions between
PE1 and PE2 is set up successfully.
1. If PE1 detects that the AC interface or tunnel is Down or the AC interface is deleted,
PE1 sends the Withdraw and Release messages to PE2. The Withdraw message instructs
the peer to withdraw the VC label; the Release message responds to a Withdraw message
and instructs the peer that sent the Withdraw message to withdraw the VC label. To
delete the PW quickly, PE1 sends the Withdraw message and the Release message
consecutively.
2. After receiving the Withdraw message and Release message from PE1, PE2 processes
the Withdraw message of PE1.
3. PE2 sends the Release message to PE1.
4. After PE1 receives the Release message of PE2, the PW between PE1 and PE2 is
deleted.
[Figure 11-29: packet transmission in Martini mode between Site1 and Site2 across PE1, the
P devices of the ISP network, and PE2. Recoverable labels: the label stacks 1000/3000,
1001/3000, and 1002/3000, and 1000/4000, 1001/4000, and 1002/4000; VLAN 10 and VLAN 20
on the AC side.]
As Figure 11-29 shows, packets are transmitted in Martini mode as follows:
After these packets reach PE2, PE2 strips the incoming label 1002 of LSP1 and selects
outbound interfaces according to the VC labels 3000 and 4000. The VC labels 3000 and
4000 are transmitted to PE1 through LDP signaling when PE2 sets up VCs with PE1.
l From Site2 to Site1
After the packet sent from Site2 of VPN1 to VLAN20 of PE2 reaches PE2, PE2 adds a
VC label 3500 and an outgoing label 2000 of LSP2 to the packet and sends the packet to
LSP2 (indicated by the blue dashed line) for transmission. For the ATM packet sent from
Site2 of VPN2 to PE2 with the VCI as 205, PE2 adds a VC label 4500 and an outgoing
label 2000 of LSP2 to the packet and sends the packet to LSP2 (indicated by the blue
dashed line) for transmission.
After these packets reach PE1, PE1 strips the incoming label 2002 of LSP2 and selects
outbound interfaces according to the VC labels 3500 and 4500. The VC labels 3500 and
4500 are transmitted to PE2 through LDP signaling when PE1 sets up VCs with PE2.
The preceding process of packet transmission shows that the outer LSP tunnel is shared. After
receiving the packets, PE2 maps the packets to different VCs according to different inner
labels.
In Martini mode, the LSP label is used to transmit the data of each VC on the ISP network.
The VC label is used to identify service data. An LSP on the ISP network can be shared by
multiple VCs. The LSP is used to transmit the VC data across the ISP network and can be
encapsulated into the IP tunnel. To deploy the Martini mode, the ISP network must be able to
automatically set up LSPs. The ISP network must support MPLS forwarding and MPLS LDP.
If not, GRE tunnel encapsulation can be implemented.
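The two-level labeling described above can be illustrated with a minimal Python sketch. It is not the device implementation: the Packet class, the function names, and the interface names in PE2_VC_TABLE are hypothetical, and only the label values (1000, 1001, 1002, 3000, 4000) come from the example in this section. The sketch shows that P devices touch only the outer LSP label, while the egress PE uses the inner VC label to choose the outbound interface.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: str
    labels: list = field(default_factory=list)  # outermost label first


def ingress_pe(payload, vc_label, lsp_label):
    """Push the inner VC label first, then the outer LSP label on top."""
    return Packet(payload, [lsp_label, vc_label])


def p_swap(packet, new_lsp_label):
    """A P device swaps only the outer LSP label and ignores the VC label."""
    packet.labels[0] = new_lsp_label
    return packet


def egress_pe(packet, vc_table):
    """Strip the outer label, then pick the outbound interface by VC label."""
    _lsp_label, vc_label = packet.labels
    return vc_table[vc_label], packet.payload


# Hypothetical VC table on PE2: VC label -> outbound AC interface name.
PE2_VC_TABLE = {3000: "ac-vlan", 4000: "ac-atm"}

results = []
for vc_label in (3000, 4000):
    pkt = ingress_pe(f"data-for-vc-{vc_label}", vc_label, lsp_label=1000)
    for lsp_label in (1001, 1002):  # label swaps along the shared LSP
        pkt = p_swap(pkt, lsp_label)
    results.append(egress_pe(pkt, PE2_VC_TABLE))

print(results)
```

Both packets travel over the same LSP label sequence, which is why adding a VC requires no change on the P devices.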
In Martini mode, extended remote LDP sessions are established between PEs to transmit VC
information. Type 128 FECs are added to transmit VC information. Figure 11-31 shows the
format of a VC FEC.
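 0             7 8                         23               31
+---------------+-+--------------------------+----------------+
| VC TLV (0x80) |C|         VC Type          | VC Info Length |
+---------------+-+--------------------------+----------------+
|                          Group ID                           |
+-------------------------------------------------------------+
|                           VC ID                             |
+-------------------------------------------------------------+
|                    Interface Parameters                     |
+-------------------------------------------------------------+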
In a Type 128 FEC, the length of the interface parameters is variable. The length information
is contained in the VC info length field.
VC info length (8 bits): Indicates the length of the VC information. The value is the total
length of the VC ID and the interface parameters.
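As an illustration of the field layout described above, the following Python sketch decodes a Type 128 FEC element from raw bytes. It is a sketch under assumptions, not the device's parser: the function name parse_vc_fec and the sample bytes are hypothetical, and the 32-bit widths of the Group ID and VC ID follow RFC 4447 rather than this document. The interface parameters are returned as raw bytes without further decoding.

```python
import struct


def parse_vc_fec(data: bytes) -> dict:
    """Decode a Type 128 (0x80) VC FEC element."""
    fec_type, c_and_type, info_len = struct.unpack_from("!BHB", data, 0)
    if fec_type != 0x80:
        raise ValueError("not a Type 128 FEC element")
    (group_id,) = struct.unpack_from("!I", data, 4)
    result = {
        "control_word": bool(c_and_type & 0x8000),  # C bit
        "vc_type": c_and_type & 0x7FFF,             # remaining 15 bits
        "vc_info_length": info_len,
        "group_id": group_id,
        "vc_id": None,
        "interface_parameters": b"",
    }
    if info_len >= 4:
        (result["vc_id"],) = struct.unpack_from("!I", data, 8)
        # VC info length covers the 4-byte VC ID plus the interface parameters.
        result["interface_parameters"] = data[12:8 + info_len]
    return result


# Hypothetical sample: C bit set, VC type 0x0005 (Ethernet), VC ID 101,
# followed by 4 bytes of interface parameters.
sample = (
    struct.pack("!BHB", 0x80, 0x8000 | 0x0005, 8)
    + struct.pack("!I", 0)      # Group ID
    + struct.pack("!I", 101)    # VC ID
    + bytes([0x01, 0x04, 0x05, 0xDC])
)
parsed = parse_vc_fec(sample)
print(parsed)
```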
Usage Scenarios
The Martini mode applies to networks with sparse Layer 2 connections, such as networks
with the star topology.
Benefits
In Martini mode, the ISP network can be shared by multiple VCs. The Martini mode is easy
to extend because in the carrier's network, only the PE needs to save information about VC
labels and LSP mapping and the P does not contain any L2VPN information. When you add a
VC, you only need to configure two unidirectional VCs on the two related PEs without
affecting the running of the network. Compared with the Kompella mode, the Martini mode
adopts LDP rather than BGP as the signaling protocol. The Martini mode is independent of
the timing refresh mechanism. This mode is faster in fault detection.
Definition
SVC VLL is an L2VPN technology that uses VC labels manually configured based on VC
IDs to transmit data. SVC VLL is similar to Martini VLL, except that Martini VLL uses LDP
to exchange VC labels. SVC VLL can be regarded as simplified Martini VLL.
SVC VLL does not require a signaling protocol to exchange VC labels. The network topology
and packet exchange process of SVC VLL are the same as those of Martini VLL.
When creating a static Layer 2 VC connection in SVC mode, you can specify an LDP LSP or a
constraint-based routed label switched path (CR-LSP) as the bearer tunnel in the tunnel
policy. You can also specify multiple bearer tunnels for load balancing. SVC VLL supports
multi-hop inter-AS L2VPN, but does not support local connections.
Introduction
Heterogeneous VLL applies to scenarios where the AC interfaces at both ends of an L2VPN
connection have different link types. After a PE receives a frame from a CE, the PE removes
the frame header and transparently transmits the IP packet over an MPLS network to the peer
PE. The peer PE re-encapsulates the IP packet according to its own link layer protocol and
transmits the packet to the connected CE. PEs directly process link-layer control packets
received from CEs without transmitting these packets over the MPLS network and silently
discard non-IP packets, including MPLS and Internet Packet Exchange (IPX) packets.
Topology
Heterogeneous VLL is required when two heterogeneous sites accessing an L2VPN backbone
network need to communicate. On the network shown in Figure 11-32, Site 3 and Site 4 are
homogeneous sites, but Site 1 and Site 2 are heterogeneous sites.
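[Figure 11-32: CEs at Site 1 through Site 4 attach to the backbone network; one CE attaches over an ATM interface and the others over GE interfaces. Site 3 and Site 4 belong to VPN2.]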
Table 11-6 lists the types of data that can be transparently transmitted over a VLL.
Table 11-6 Types of data that can be transparently transmitted over a VLL
Value Type
0x0005 Ethernet
0x0007 PPP
11.4.2.7 Comparison Between the MPLS L2VPN and the BGP/MPLS VPN
Table 11-8 shows the differences between the Martini MPLS L2VPN and the BGP/MPLS VPN.
Table 11-8 Comparison between the MPLS L2VPN and the BGP/MPLS VPN
Item | BGP/MPLS VPN | Martini L2VPN
Cost of PEs | The memory cost is high; the consumption of interface resources is low; the signaling cost is low. | The memory cost is low; the consumption of interface resources is high; the signaling cost is high.
Flooding mode of VPN routes | The VPN routes are flooded through PEs and converge slowly. | The VPN routes are flooded directly between CEs and converge rapidly.
Access mode of CEs | Different sites in the same VPN can have different access modes. | Martini L2VPNs of different encapsulation types, such as PPP, ATM, and Ethernet (VLAN), can interwork through heterogeneous IP-interworking.
Inheritance from the traditional VPN | Inherits and improves the traditional L2VPN. | Inherits and improves the traditional L2VPN.
[Figure: networking with Node B, PE1, P1, P2, PE2, PE3, and the RNC; attachment circuits AC1 and AC3 are shown.]
l Backbone tunnel backup: As shown in Figure 11-34, a primary tunnel and one or more
secondary tunnels are set up between PEs on both ends of the link.
This scheme is a networking solution considering tunnel faults in a backbone network.
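[Figure 11-34: backbone tunnel backup. Node B and the RNC attach over AC1 and AC2 to PE1 and PE2, which are connected across the VPN backbone through P1 and P2.]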
Redundant L2VPN is not required, because if a secondary tunnel exists between PEs,
BFD can directly detect the tunnel fault and switch tunnels, which speeds up fault
convergence and avoids PW faults.
AC attachment circuit
CE customer edge
PE provider edge
SP service provider
VC virtual circuit
11.5 PWE3
11.5.1 Introduction
Definition
A pseudo wire emulation edge to edge (PWE3) service is a point-to-point (P2P) connection
on a multiprotocol label switching (MPLS) Layer 2 virtual private network (L2VPN). PWE3
provides methods for carrying network services such as asynchronous transfer mode (ATM),
frame relay (FR), Ethernet, time division multiplexing (TDM), and synchronous optical
network/synchronous digital hierarchy (SONET/SDH) over a packet switched network (PSN).
PWE3 is developed based on draft-martini-l2circuit-trans-mpls and in compliance with RFC
4447. Currently, PWE3 supports only FEC 128.
Purpose
IP networks have developed rapidly in recent years, owing to their advantages in
upgradability, expansibility, and interoperability. In comparison, the development of
traditional communications networks is confined due to limitations on transmission modes
and service types. To upgrade traditional communications networks and expand their capacity,
integrate them with existing PSNs. This solution maximizes the use of existing network
resources.
PWE3 is used to carry various types of services such as Ethernet, ATM, TDM, and PPP over
broadband metropolitan access networks or mobile broadband networks. As shown in Figure
11-35, the headquarters and branch of company A use traditional communications networks
such as ATM and FR networks. A pseudo wire (PW) is established between PE1 and PE2
using PWE3, so that the headquarters and branch can communicate over the MPLS network.
By converging previous access modes with the current IP backbone network, PWE3 prevents
repetitious network construction and saves operation costs.
[Figure 11-35: the headquarters and branch of company A attach through CE1 and CE2 over ACs to PE1 and PE2, which are connected by an MPLS tunnel across the MPLS network.]
Benefits
As an independent workgroup of the Internet Engineering Task Force (IETF), the PWE3
workgroup extends draft-martini-l2circuit-trans-mpls and defines a complete PW architecture.
This architecture uses some specifications of Martini virtual leased line (VLL) defined in
draft-martini-l2circuit-trans-mpls and has the following features compared with the Martini
PW architecture:
l Advertising PW status using Label Distribution Protocol (LDP) signaling Notification
messages
Notification messages only advertise PW status. A PW established in PWE3 mode is
torn down only when PW configurations are deleted or the LDP session is interrupted.
This feature reduces signaling control packets exchanged between PEs, reducing
signaling costs. PWE3 PWs can work with Martini PWs.
l Supporting multi-segment PWs (MS-PWs)
The number of LDP connections required on an access device is reduced, minimizing
LDP session costs on the access device. MS-PWs allow more flexible networking.
l Supporting TDM interfaces
TDM interfaces can use the control word (CW) feature to sequence TDM packets and
the Real-Time Transport Protocol (RTP) to extract and synchronize clock signals.
l Providing the fragmentation negotiation mechanism
l Providing PW connectivity detection functions, such as virtual circuit connectivity
verification (VCCV) and PW operation, administration, and maintenance (OAM)
PW connectivity detection ensures quicker network convergence and enhanced network
reliability.
l Enriching and optimizing MIB functions and improving MIB maintainability
In addition to carrying various types of services, PWE3 enables a mobile network to evolve
towards LTE. PWE3 can protect carriers' investment during the migration of services such as
ATM and TDM from traditional communications networks to IP networks.
NOTE
For similarities between PWE3 and Martini, such as L2VPN heterogeneous interworking and inter-AS
VPN, see Martini VLL.
11.5.2 Principles
PW Classification
PWs can be classified into static PWs and dynamic PWs or single-segment PWs (SS-PWs)
and MS-PWs, depending on different classification methods.
Martini VLL supports dynamic PWs established using LDP signaling. In addition to dynamic
PWs, PWE3 also supports static PWs.
An SS-PW is a PW set up between two PEs without PW label switching. PW1 in Figure
11-36 is an example of an SS-PW.
An MS-PW is a set of two or more PW segments that function as a single PW. The
forwarding mechanisms of PEs for the SS-PW and MS-PW are the same. The only difference
is that PW labels are switched on switching PEs (SPEs) for MS-PWs. PW2 in Figure 11-36 is
an example of an MS-PW.
NOTE
If two PEs cannot establish a connection using signaling or cannot establish a direct tunnel, configure an MS-
PW between the two PEs instead. By supporting MS-PWs, PWE3 enables networking modes to be more
flexible.
The preceding PW classification methods can be used together. For example, an MS-PW can
be a set of static and dynamic PW segments.
[Figure 11-36: PE1, a P device, and PE2, with an SPE joining Segment1 and Segment2 of PW2.]
dynamic PW, the label distribution mode is downstream unsolicited (DU) and the label
retention mode is liberal label retention.
NOTE
If Ps exist between the two PEs, the LDP session must be established in remote mode. If the two PEs are
directly connected, the local LDP session is established.
After PWE3 is configured on the two PEs and an LDP session is established between the two
PEs, the dynamic PW starts to be established. Figure 11-37 shows the process of establishing
a dynamic PW.
1. PE1 sends a Label Request message and a Label Mapping message to PE2.
2. After receiving the Label Request message from PE1, PE2 sends a Label Mapping
message to PE1.
3. After receiving the Label Mapping message from PE1, PE2 determines whether its PW
configurations are consistent with those on PE1. If its PW configurations such as the VC
ID, VC type, MTU, and CW enabling status are consistent with those on PE1, PE2 sets
the PW status as Up.
4. After receiving the Label Mapping message from PE2, PE1 determines whether its PW
configurations are consistent with those on PE2. If consistent, PE1 sets the PW status as
Up. After that, a dynamic PW is established between PE1 and PE2.
5. After the dynamic PW is established, PE1 and PE2 learn the status of each other by
exchanging Notification messages.
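The consistency check in steps 3 and 4 can be sketched as follows. This is an illustrative model only: the PwConfig class and the pw_state function are hypothetical names, and returning "Down" on a mismatch is an assumption, because the steps above describe only the matching case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PwConfig:
    vc_id: int
    vc_type: int
    mtu: int
    control_word: bool


def pw_state(local: PwConfig, from_peer_mapping: PwConfig) -> str:
    """Return "Up" only if every negotiated parameter matches the peer's."""
    # Dataclass equality compares VC ID, VC type, MTU, and CW status together.
    return "Up" if local == from_peer_mapping else "Down"


pe1 = PwConfig(vc_id=100, vc_type=0x0005, mtu=1500, control_word=True)
pe2 = PwConfig(vc_id=100, vc_type=0x0005, mtu=1500, control_word=True)
pe2_bad_mtu = PwConfig(vc_id=100, vc_type=0x0005, mtu=1400, control_word=True)

print(pw_state(pe1, pe2))          # both ends agree
print(pw_state(pe1, pe2_bad_mtu))  # MTU mismatch
```

Each PE runs the same check against the Label Mapping message it receives, so the PW is usable only when both directions report Up.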
[Figure 11-37: PE1 and PE2 exchange Mapping messages; each PE sets the VC to Up when the parameters match, and Notification messages report later AC or tunnel state changes.]
If the AC interface of a PW is Down or the corresponding tunnel is Down, Martini and PWE3
use different processing mechanisms:
l In Martini mode, the local Provider Edge (PE) sends a Label Withdraw packet to its peer
to tear down the PW. After the AC interface or tunnel goes Up, another round of
negotiation is required for the PEs to establish a PW.
l In PWE3 mode, the local PE sends a Notification message to notify its peer that packets
cannot be forwarded, but the PW is not torn down. When the AC interface or tunnel goes
Up, the local PE sends a Notification message to notify its peer that packets can be
forwarded.
The PW is torn down only when PW configurations are deleted from the PEs or the LDP
session is interrupted. Notification messages prevent repeated PW establishment and deletion
caused by network flapping.
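The difference between the two mechanisms under link flapping can be sketched with a small Python model. It is a simplification under stated assumptions: the Pw class, its attributes, and the message strings are hypothetical, and the Martini renegotiation is collapsed into a single Label Mapping step.

```python
class Pw:
    def __init__(self, mode):
        self.mode = mode          # "martini" or "pwe3"
        self.established = True   # VC labels are bound on both PEs
        self.forwarding = True
        self.sent = []            # messages sent to the peer

    def local_fault(self):
        """The AC interface or tunnel goes Down."""
        self.forwarding = False
        if self.mode == "martini":
            self.sent.append("Label Withdraw")
            self.established = False  # the PW is torn down
        else:
            self.sent.append("Notification: not forwarding")

    def local_recovery(self):
        """The AC interface or tunnel goes Up again."""
        if self.mode == "martini":
            self.sent.append("Label Mapping")  # renegotiation, simplified
            self.established = True
        else:
            self.sent.append("Notification: forwarding")
        self.forwarding = True


def flap(pw, times):
    """Flap the local AC and count how often the PW is torn down."""
    teardowns = 0
    for _ in range(times):
        pw.local_fault()
        if not pw.established:
            teardowns += 1
        pw.local_recovery()
    return teardowns


martini, pwe3 = Pw("martini"), Pw("pwe3")
martini_teardowns = flap(martini, 3)
pwe3_teardowns = flap(pwe3, 3)
print(martini_teardowns, pwe3_teardowns)
```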
[Figures: PW deletion between PE1 (Loopback1 1.1.1.1/32) and PE2 (Loopback1 2.2.2.2/32) using Release messages, and PW setup through a switching PW using Request and Mapping messages; the VC goes Up when the parameters match.]
Derivative Functions
PWE3 reliability requirements are increasing as the PWE3 technology becomes more widely
used. Currently, many fast fault detection and protection switching mechanisms are available,
such as bidirectional forwarding detection (BFD), OAM, and FRR. These mechanisms,
however, address only link or node failures within a PSN, but not PE or AC failures between
PEs and CEs. To solve this problem, PW automatic protection switching (APS) and PW
redundancy are introduced. PW APS is in compliance with G.8131 and PW redundancy is in
compliance with draft-ietf-pwe3-redundancy. For details, see PWE3 Reliability.
Background
ATM is a traditional multi-service bearer technology used on backbone networks. ATM
networks can carry services such as IP, FR, voice, teleconference, and ISDN/DSL and provide
well-designed quality of service (QoS) mechanisms for these services. ATM networks have
been used to carry important services.
By interconnecting ATM networks over a PSN, ATM cell relay emulates traditional ATM
services when they are transmitted over the PSN. This allows end users to be unaware of
network differences and protects carriers' investment during network convergence and
construction.
Related Concepts
l ATM cell: A cell is the basic ATM transmission unit. An ATM cell consists of 53 bytes,
comprising a 5-byte header and a 48-byte payload. Each ATM cell is transmitted
independently with a short transmission delay.
l VC: ATM is a VC-based and connection-oriented switching technology. Each VC is
identified by a virtual path identifier (VPI) and a virtual channel identifier (VCI). A
VPI/VCI pair is valid for only a link between ATM devices.
l PVC: A permanent virtual circuit (PVC) is a type of ATM connection configured by a
network administrator. The establishment of a PVC does not require signaling.
l SVC: A switched virtual circuit (SVC) is a type of ATM connection dynamically
established using signaling.
l VCC: A virtual circuit connection (VCC) is a type of ATM connection established based
on VCI switching.
l VPC: A virtual path connection (VPC) is a type of ATM connection established based on
VPI switching.
l AAL: The ATM adaptation layer is similar to the data link layer of the OSI reference
model and is integrated with the ATM layer. The AAL is responsible for separating the
upper layer from the ATM layer. The AAL prepares for conversion between service data
and ATM cells by fragmenting service data into 48-byte payloads for ATM cells.
l VPI/VCI mapping: As shown in Figure 11-40, a PW is used to emulate an ATM switch.
To retain the configurations on the ATM switch, VPI/VCI pairs 1/100 and 2/200 must be
mapped to each other on PE1 and PE2. In this manner, the VPI/VCI pairs for the CEs of a
VC are mapped. If the PW emulates only one VPC or VCC, the PW functions as an ATM
switch and mapping between VPI/VCI pairs does not need to be configured on PE1 and
PE2. If the PW emulates two or more VPCs or VCCs, mapping between VPI/VCI pairs
needs to be configured on PE1 and PE2.
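The cell structure and the VPI/VCI mapping above can be illustrated with a short Python sketch. Several simplifications are assumed: the function names and PE_MAP are hypothetical, the header uses the UNI layout with the GFC, PT, and CLP fields left at 0, the fifth header byte (HEC) is left as a zero placeholder instead of being computed, and the last payload is padded with zeros rather than following a real AAL procedure.

```python
import struct

CELL_PAYLOAD = 48  # bytes of payload per 53-byte cell


def build_cell(vpi, vci, payload):
    """Build a 53-byte cell: 5-byte header plus 48-byte payload."""
    if len(payload) != CELL_PAYLOAD:
        raise ValueError("payload must be exactly 48 bytes")
    word = (vpi << 20) | (vci << 4)  # GFC, PT, and CLP left at 0
    return struct.pack("!IB", word, 0) + payload  # HEC byte not computed


def cell_vpi_vci(cell):
    """Read the VPI/VCI pair from the first four header bytes."""
    (word,) = struct.unpack_from("!I", cell, 0)
    return (word >> 20) & 0xFF, (word >> 4) & 0xFFFF


def segment(data, vpi, vci):
    """Split service data into 48-byte payloads, one cell per payload."""
    cells = []
    for i in range(0, len(data), CELL_PAYLOAD):
        chunk = data[i:i + CELL_PAYLOAD].ljust(CELL_PAYLOAD, b"\x00")
        cells.append(build_cell(vpi, vci, chunk))
    return cells


# Mapping from the example above: 1/100 on one side, 2/200 on the other.
PE_MAP = {(1, 100): (2, 200)}


def remap(cell, table):
    """Rewrite the VPI/VCI pair of a cell according to the mapping table."""
    new_vpi, new_vci = table[cell_vpi_vci(cell)]
    return build_cell(new_vpi, new_vci, cell[5:])


cells = segment(b"x" * 100, vpi=1, vci=100)
remapped = [remap(c, PE_MAP) for c in cells]
print(len(cells), cell_vpi_vci(cells[0]), cell_vpi_vci(remapped[0]))
```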
Figure 11-40 Networking diagram for ATM cell relay over a P2P tunnel on a PSN
[Figure content: two ATM CEs on ATM switching networks attach to PE1 and PE2 across the PSN; the PW between the PEs takes the place of an ATM switch, with VPI/VCI 1/100 on one side and VPI/VCI 2/200 on the other.]
NOTE
For details about ATM, see the ATM description in ATN Multi-service Access Equipment Feature
Description - WAN Access.
Implementation
ATM cell relay interconnects traditional ATM networks and carries ATM cells over a point-to-
point PW on a PSN.
Figure 11-41 shows the label encapsulation mode for ATM cell relay over a PSN. The outer
label is the MPLS tunnel label and the inner label is the VC label used to identify the PW.
Figure 11-41 Networking diagram for ATM cell relay over a PSN
[Figure content: PSN-based ATM encapsulation consists of the PSN transport header (outer label identifying the PSN tunnel), the PW header (inner label identifying the PW), the ATM control word, and the ATM service payload. Two ATM networks attach to PEs, which carry the ATM service over a PW in a PSN tunnel across the MPLS network.]
A VPI/VCI pair is used to identify an ATM VC. Based on PW emulation types and
comparison between ATM cell relay and AAL5 SDU relay, the following ATM cell relay
modes are defined:
l One-to-one (1-to-1): One PW emulates one VCC or VPC to carry ATM cells.
l N-to-one (N-to-1): One PW emulates two or more VCCs or VPCs to carry ATM cells.
l ATM port cell relay: One PW emulates one dedicated ATM transport line to carry ATM
cells and VPC or VCC emulation is not required.
As shown in Figure 11-42, ATM cell relay is classified into the following modes based on
PWE3 networking modes:
l Remote ATM cell relay: CEs are connected to two different PEs on the PSN, and ATM
cells need to be transparently transmitted over the PSN.
l Local ATM cell relay: CEs are connected to the same PE on the PSN. ATM cells are
directly forwarded by the PE, instead of being transparently transmitted over the PSN.
Figure 11-42 Networking diagram for local and remote ATM cell relay
[Figure content: CE1 and CE2 on ATM networks attach to PE1, and CE3 on an ATM network attaches to PE2 across the PSN; the legend distinguishes the local connection from the remote connection.]
Table 11-9 lists the characteristics of different ATM cell relay modes.
1-to-1 VPC cell relay | All AAL types | VP | The VCI but not the VPI is encapsulated into the ATM cell. The control word is required for the PW.
ATM port cell relay | All AAL types | Port | The VPI/VCI pair is encapsulated into the ATM cell. The control word is optional for the PW.
Usage Scenario
The following describes usage scenarios for different ATM cell relay modes.
Figure 11-43 shows an example of a VCC. A VCC is the basic transmission unit of an ATM
network. VCCs can carry various ATM services.
[Figure 11-43: CE, PE, PE, and CE connected across an ATM network, the PSN, and an ATM network; the PVCs on the two sides are VPI1/VCI1 and VPI2/VCI2.]
Figure 11-44 shows an example of a VPC. A VPC is a set of VCCs with the same destination.
VPCs can carry various ATM services. ATM VPC cell relay applies to the scenario in which
packets from multiple users are bound to the same destination. ATM VPC cell relay features
rapid transmission, easy management, and convenient configuration.
[Figure 11-44: CE, PE, PE, and CE connected across an ATM network, the PSN, and an ATM network; the PVCs on the two sides are VPI1 and VPI2.]
Figure 11-45 shows an example for ATM port cell relay. ATM port cell relay allows an ATM
port to be connected to another ATM port for ATM cell transmission. ATM port cell relay
applies to the scenario in which ATM cells need to be transmitted between two CEs over a
connection other than the VPC or VCC. The ingress PE discards idle and unassigned cells
received on an ATM port, saving bandwidth resources.
Benefits
By interconnecting traditional ATM network resources over a PSN, ATM cell relay emulates
traditional ATM services when they are being transmitted over the PSN. This allows end users
to be unaware of network differences and protects carriers' investment during network
convergence and construction.
11.5.2.3 PW Template
A PW template is a set of common attributes abstracted from PWs. Before configuring PWs
with similar attributes, you can define a PW template that contains the common attributes of
these PWs. Then, you can configure these PWs based on the PW template to simplify the
configuration process.
The ATN supports binding PWs to PW templates and resetting PW templates.
PW Template Attributes
On the endpoint PEs of a PW, you can create a PW template and specify the related attributes,
such as the peer IP address, control word, tunnel policy, and maximum number of cells
allowed in a frame. These attributes are optional and can be selected as required. If you want
to perform the continuity check in control word mode, enable the control word function in
advance.
[Figure: CE-A and CE-B attach to U-PE1 and U-PE2, which are connected by a dynamic PW and a static PW.]
AC attachment circuit
CW control word
FR frame relay
PE provider edge
PW pseudo wire
VC virtual circuit
11.6.1 Overview
Introduction
Pseudo Wire Emulation Edge to Edge (PWE3) is a bidirectional and point to point (P2P)
service on a multiprotocol label switching Layer 2 virtual private network (MPLS L2VPN).
High reliability is required for the VPN service. There are many fast fault detection and
protection switching mechanisms such as bidirectional forwarding detection (BFD),
operation, administration and maintenance (OAM), and fast reroute (FRR). These
mechanisms, however, address only link or node failures within a packet switched network
(PSN), but not PE failures or attachment circuit (AC) failures between PEs and CEs.
To protect services against PE and AC failures, PWE3 reliability mechanisms are required.
[Figure 11-47: PE1 and PE2 are connected by a PW across the packet switched network; each AC is protected by Trunk/APS.]
The most effective way to protect ACs is to deploy multiple physical links between PEs and
CEs connected by the ACs. In Figure 11-47, the trunk technique is used to bundle multiple
physical links into a logical link and automatic protection switching (APS) is configured for
the trunk. Trunk applies to Ethernet links, whereas APS applies to asynchronous transfer
mode (ATM) or time division multiplexing (TDM) links.
Either trunk or APS protects services against only AC failures between PEs and CEs, but not
PE failures.
To protect services against both PE and AC failures, enhanced trunk (E-Trunk) and PW
redundancy/PW APS are used. E-Trunk is deployed between devices.
On the network shown in Figure 11-48, PE2 is the master and PE3 is the backup; the AC
between CE2 and PE2 is active and the AC between CE2 and PE3 is standby. A primary PW
is deployed between PE1 and PE2; a secondary PW is deployed between PE1 and PE3. The
backup PE and standby AC protect services on the master PE and active AC.
[Figure 11-48: CE1 attaches to PE1 over an AC with Trunk/APS; across the PSN, a primary PW, a secondary PW to PE3, and a bypass PW are shown; CE2 attaches over ACs with Trunk/APS.]
When PWE3 FRR is used and a public network link on the PSN fails, traffic must also be
switched between ACs. As Figure 11-48 shows, the path between PE1 and PE2 is the active
path and the path between PE1 and PE3 is the standby path. When the public network link
between PE1 and PE2 fails or PE2 fails, PWE3 triggers Ethernet OAM to rapidly notify CE2
of the failure. Upon receipt of the failure notification, CE2 switches traffic to the link
between CE2 and PE3. If PWE3 is associated with E-Trunk in this network, traffic cannot be
rapidly switched back after the failure is removed.
PWE3 FRR | Only PWE3 and Martini, but not SVC | 1:1 | Static LSP; static CR-LSP; dynamic CR-LSP; LDP LSP | BFD for PW, OAM mapping, and physical layer failure notification | Poor
11.6.2 Principles
11.6.2.1 PW Redundancy
PW Redundancy Signaling
In conventional PWE3, one-to-one mapping is implemented between ACs and PWs. To
ensure the same forwarding capability, the PW protection mechanism to be used must allow
the configuration of a single PW in a PW group as an active PW and the remaining as inactive
PWs.
RFC 4447 (Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP) )
specifies the PW Status TLV to transmit the PW forwarding status. The PW Status TLV is
transported to the remote PW peer using a Label Mapping or LDP Notification message. The
PW Status TLV carries a 32-bit status code field. Each bit in the status code field can be set
individually, so more than one failure can be indicated at once. PW redundancy introduces a
new PW status code, 0x00000020. When this bit is set, it indicates "PW forwarding standby".
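The bitwise nature of the status code can be illustrated with a short Python sketch. Only the value 0x00000020 is taken from this document; the other bit values follow RFC 4447, and the member names and the describe function are hypothetical names chosen for readability.

```python
from enum import IntFlag


class PwStatus(IntFlag):
    NOT_FORWARDING = 0x00000001
    LOCAL_AC_RX_FAULT = 0x00000002
    LOCAL_AC_TX_FAULT = 0x00000004
    LOCAL_PSN_RX_FAULT = 0x00000008
    LOCAL_PSN_TX_FAULT = 0x00000010
    FORWARDING_STANDBY = 0x00000020  # added for PW redundancy


def describe(code: int) -> list:
    """List the names of all status bits set in a 32-bit status code."""
    status = PwStatus(code)
    return [flag.name for flag in PwStatus if flag in status]


# A code of 0 means that no fault or standby bit is set.
print(describe(0x00000000))
# Several bits can be reported in a single status code.
print(describe(0x00000022))
```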
NOTE
l Primary and secondary are terms used to describe PW forwarding priorities and can be
configured.
A PE selects the primary PW in preference to a secondary PW when both PWs are in the
Active state. Currently, only one secondary PW can be configured for a primary PW.
l Active and inactive are terms used to describe PW forwarding and operating status and
cannot be configured.
Only active PWs are used to forward traffic. The signaling status and configured
forwarding priority determine PW forwarding status. A PW with the highest priority will
be selected as an active PW to forward traffic. All the other PWs will be in the Inactive
state and must not be used to forward traffic. Inactive PWs used in the VLL service can
be configured to receive traffic though.
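The selection rule in the note above can be sketched as follows. This is a simplified model, not the device's algorithm: the Pw class, the usable flag (standing in for the signaling status), and the select_active function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pw:
    name: str
    primary: bool  # configured forwarding priority
    usable: bool   # signaling status allows forwarding (assumed flag)


def select_active(pws):
    """Pick the usable PW with the highest configured priority."""
    candidates = [pw for pw in pws if pw.usable]
    if not candidates:
        return None
    # False sorts before True, so a primary PW wins over a secondary PW.
    return min(candidates, key=lambda pw: not pw.primary)


both_ok = [Pw("PW1", primary=True, usable=True),
           Pw("PW2", primary=False, usable=True)]
primary_down = [Pw("PW1", primary=True, usable=False),
                Pw("PW2", primary=False, usable=True)]

print(select_active(both_ok).name)
print(select_active(primary_down).name)
```

All PWs other than the selected one stay inactive and are not used to forward traffic.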
Operation Modes
PW redundancy operation modes are specified on PEs where primary and secondary PWs
have been configured. If a PW redundancy operation mode is not specified, PWE3 FRR will
be used.
NOTE
In PWE3 FRR, a PE locally determines the primary and secondary status of the PWs, of which a remote
PE is not informed. PWE3 FRR is implemented on Huawei devices only and is not recommended.
There are two PW redundancy operation modes:
Master/slave mode:
A PE locally determines the primary and secondary status of the PWs, and uses signaling to
inform a remote PE of the status. The PW status is independent of the AC status, and
therefore PW and AC failures are isolated.
Independent mode:
On a PE, its PW status is determined by the remote AC status after negotiation procedures.
The remote PE then informs the PE of the PW status. If an AC fails and protection switching
is triggered, protection switching will also be implemented on the PWs. This mode cannot
isolate PW and AC failures.
11.6.2.2 PW APS
Definition
APS instructs the source and destination ends to implement protection switching in the same
manner to achieve traffic switching, delayed switching, and wait-to-restore. APS always
transmits protocol traffic along the backup channel. Both the transmit and receive ends know
that they receive APS protocol packets through each other's backup channel. This
implementation helps determine whether both ends are configured with the same master and
backup channels.
PW APS is an application of APS on PWs. PW APS uses PW OAM to monitor the PW status.
If a PE detects that the primary PW fails, PW APS is triggered, and traffic is switched to the
secondary PW, implementing service protection.
Purpose
PWs are generally used to transmit 2G services between base transceiver stations (BTSs) and
base station controllers (BSCs), 3G services between NodeBs and RNCs, and long term
evolution (LTE) services between eNodeBs and mobility management entities (MMEs)/
serving gateways (S-GWs). PWs meet requirements for bandwidth, expansion, and flexible
configuration of these services. The bearer network solution includes:
l Static solution: Static routes, LSPs, and PWs are used.
l Dynamic solution: Dynamic routes, LSPs/TE tunnels, and PWs are used.
As static PWs do not use signaling, the primary and secondary PW status negotiation, PW
switchover, and PW switchback cannot be implemented using signaling. PW redundancy
currently supported addresses only PWE3 reliability, but not reliability for PWs in SVC or
Martini mode. SVC PWs are static PWs. PW APS can provide reliability for PWs in SVC,
Martini, or PWE3 mode.
l PW APS uses PW OAM (MPLS OAM or TP OAM) to rapidly monitor PW status and
notifies APS of the status.
l The primary/secondary PW protection group is associated with APS instances. APS
instructs the source and destination ends to implement bidirectional PW protection
switching in the same manner, as defined in G.8131.
PW APS applies to SVC, Martini, or PWE3 PWs.
Using only one of PW APS and PW redundancy across the entire network is recommended. PW
APS and PW redundancy are both reliability mechanisms but are implemented differently.
Basic Concepts
Protection Type
PW APS can work in 1:1 or 1+1 mode, in which the primary and secondary PWs back up each
other. In PW APS 1:1 mode, traffic is transmitted and received through a single link. In PW
APS 1+1 mode, traffic is transmitted through both links but accepted through only one
link.
Switching Type
PW APS supports bidirectional protection switching. If a working PW fails in one direction,
APS switches traffic in both directions to a protection PW.
Operation Mode
The PW APS operation mode can be a revertive operation mode or a non-revertive operation
mode. In non-revertive mode, traffic will not be switched back from the protection PW to the
working PW even if the working PW recovers. In revertive mode, traffic will return to the
working PW after the wait-to-restore (WTR) timer configured for the working PW expires.
WTR Time
The WTR time is counted from the time when the primary PW recovers to the time when
traffic is switched back from the secondary PW. Setting a WTR time prevents frequent traffic
switching.
Delayed Switching Time
The delayed switching time is the time after which a protection switching is triggered if a
signal fail (SF) is still detected on a PW. Setting a delayed switching timer prevents switching
from immediately occurring after an SF is detected.
Dual-Homing Protection
Dual-homing protection is implemented by connecting two PEs to a CE through respective
ACs. This protects PE services on the bearer network.
PW APS Bundling
The device usually needs to undergo a great deal of PW APS protection switching. If PW
APS enables a state machine for each protection switching, the device will not be able to
implement all protection switching due to limited resources and capabilities. Configuring an
APS state machine to process a great deal of PW APS protection switching decreases resource
consumption. This APS state machine is shared by multiple PWs, which is called PW APS
bundling.
Switching Mechanism
PW APS uses PW OAM to monitor the primary and secondary PW status. PW OAM sends
detection packets from the ingress to the egress periodically. If the egress fails to receive any
detection packets in a certain period, it considers that an SF occurs and notifies the remote
APS module of the fault. This implements service switching and protection.
As shown in Figure 11-49, PW APS is configured on PE1 and PE2. Normally, upstream
traffic from a BTS/NodeB is transmitted along the path PE1->primary PW->PE2 on the PSN.
PE2 forwards the traffic to a BSC/RNC. Downstream traffic from a BSC/RNC is transmitted
along the path PE2->primary PW->PE1 on the PSN. PE1 forwards the traffic to a BTS/
NodeB.
[Figure 11-49 legend: primary PW, secondary PW, service flow.]
As shown in Figure 11-50, if the primary PW fails, PW OAM on PE1 and PE2 detects the
failure and triggers APS. Both upstream and downstream traffic are switched to the secondary
PW.
The delayed revertive operation mode is used for PW APS by default. After the primary PW
recovers, PW OAM on PE1 and PE2 detects the recovery but waits a delayed switching time
before triggering an APS revertive operation. Both upstream and downstream traffic are then
switched back to the primary PW.
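The switching and revertive behavior can be sketched as a small state machine in Python. It is a simplified model under assumptions: the PwAps class and its parameter names are hypothetical, time is supplied by the caller in arbitrary units, the return to the primary PW is governed by the WTR timer described under Basic Concepts, and APS signaling between the two ends is not modeled.

```python
class PwAps:
    def __init__(self, hold_off, wtr):
        self.hold_off = hold_off  # delayed switching time
        self.wtr = wtr            # wait-to-restore time
        self.active = "primary"
        self._sf_since = None     # when the signal fail was first seen
        self._ok_since = None     # when the primary PW recovered

    def tick(self, now, primary_ok):
        """Feed the PW OAM result for the primary PW at time `now`."""
        if self.active == "primary":
            if primary_ok:
                self._sf_since = None
            else:
                if self._sf_since is None:
                    self._sf_since = now
                # Switch only if the SF persists for the hold-off time.
                if now - self._sf_since >= self.hold_off:
                    self.active = "secondary"
                    self._ok_since = None
        else:
            if not primary_ok:
                self._ok_since = None
            else:
                if self._ok_since is None:
                    self._ok_since = now
                # Revert only after the primary PW stays healthy for WTR.
                if now - self._ok_since >= self.wtr:
                    self.active = "primary"
                    self._sf_since = None
        return self.active


aps = PwAps(hold_off=2, wtr=5)
timeline = [(0, True), (1, False), (2, False), (3, False),
            (4, True), (8, True), (9, True)]
states = [aps.tick(now, ok) for now, ok in timeline]
print(states)
```

In a non-revertive configuration, the second branch would simply never switch back to the primary PW.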
11.6.3 Applications
11.6.3.1 PW Redundancy in the Scenario that the Node B Accesses Three PEs
(PWE3)
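Figure 11-51 Networking diagram of PW redundancy in the scenario that the Node B
accesses three PEs
[Figure content: the RNC attaches to PE1 and PE2 through an E-Trunk; a bypass PW runs between PE1 and PE2; PW1 and PW2 connect PE1 and PE2 to PE3, with BFD running on the PWs; the Node B attaches to PE3.]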
Figure 11-51 shows the networking diagram of PW redundancy in the scenario that CEs
asymmetrically access three PEs. This section uses E-Trunk as an example to describe how
the primary/secondary statuses of PWs are dynamically negotiated.
Table 11-11 Type and configuration of the link for PW redundancy in the scenario that the Node B accesses three PEs
Type of the AC Link | Configuration on the RNC | Configuration on the PE
b. The local statuses of the PWs on PE1 and PE2 are notified to PE3 through LDP
packets.
Note that LDP packets of PE1 and PE2 reach PE3 in a random sequence.
c. After receiving the LDP packets from PE1 and PE2, PE3 acknowledges that PW1
of PE1 is the primary PW and PW2 of PE2 the secondary PW.
In this case, the unidirectional traffic path is RNC -> PE1 -> PW1 -> PE3 -> Node B.
Primary/Secondary PW Switchover
The primary/secondary PW switchover occurs in one of the following situations:
l The E-Trunk priority is changed, and statuses of PWs are renegotiated.
l PE1 becomes faulty. In this case, the E-Trunk detects the fault, and changes the status of
PE2 from backup to master. Statuses of the PWs are then renegotiated.
Note that statuses of the PWs are not affected if the backup node PE2 becomes faulty.
l The AC link between PE1 and RNC becomes faulty. The processing flow is similar to
that for the fault of PE1.
Note that statuses of the PWs are not affected if the AC link between PE2 and RNC
becomes faulty.
After the primary/secondary PW switchover, the unidirectional traffic path becomes RNC ->
PE2 -> PW2 -> PE3 -> Node B.
After the faulty node or link recovers, the master/backup statuses of the PEs are renegotiated in
the E-Trunk, and PE1 resumes the master state because its priority has not changed.
Figure 11-52 shows typical PW APS networking. The network comprises an access ring and
an aggregation ring. A BTS/NodeB is connected to a CSG. A BSC/RNC is connected to an
RSG. Primary and secondary PWs are established between a CSG and an RSG. The PWs can
be either single-segment PWs (SS-PWs) or multi-segment PWs (MS-PWs). A BTS/NodeB
communicates with a BSC/RNC through a mobile broadband (MBB) network.
PW APS is deployed on the bearer network to improve reliability. APS instances are
configured on CSGs and RSGs, and the primary/secondary PW protection group is associated
with each APS instance. APS instructs the source and destination ends to implement
bidirectional protection switching in the same manner to achieve delayed switching and WTR
for PW protection.
(Figure 11-52 elements: BTS/NodeB, CSG1 to CSG3, SPE1 to SPE4, RSG, BSC/RNC, access ring, aggregation ring, PW APS, primary PW, secondary PW)
AC Attachment Circuit
PE Provider Edge
PW Pseudo Wire
SPE Switching PE
UPE Ultimate PE
VC Virtual Circuit
11.7.1 Introduction
Definition
An IP hard pipe is an MPLS LSP or a PW with a bandwidth that is guaranteed and can neither
be exceeded nor infringed upon. IP hard pipe provides quality guarantee for leased line
services of high-value customers.
In the IP hard pipe solution, the U2000 manages bandwidth resources network-wide. The
physical interface bandwidth on the public network is divided and allocated to soft and hard
pipes. For example, on a 10G Ethernet interface, 2 Gbit/s bandwidth is allocated to the hard
pipe, and the remaining 8 Gbit/s is allocated to the soft pipe. The hard and soft pipe
bandwidths are isolated and cannot be preempted.
(Figure elements: Enterprise 1 and Enterprise 2 sites on both sides of a PE, P, PE path; a physical interface divided into a soft pipe and a hard pipe)
Purpose
Customers who have strict bandwidth, delay, and security requirements generally use
synchronous digital hierarchy (SDH) networks. Retaining these customers is expensive
because carriers must maintain both IP and SDH networks. Therefore, to reduce maintenance
costs and facilitate user management, carriers expect to migrate their SDH networks to IP
networks.
To meet these expectations, IP hard pipe has been developed. IP hard pipe provides SDH-like
service quality for access services on IP networks by providing guaranteed bandwidth and low
delay. It also provides granular and service-specific OAM and SLA monitoring, which can
accelerate the migration of SDH networks to IP networks.
Benefits
IP hard pipe offers the following benefits to carriers:
l Deployment of high-quality leased lines for VIP customers on newly deployed or
existing routers, reducing SDH network construction and costs for maintaining both
SDH and IP networks
l Rapid service protection, ensuring highly reliable service quality
l Granular service quality measurement using IP FPM, providing flexible and effective
maintenance and management for leased lines dedicated to VIP customers
11.7.2 Principles
This section describes the implementation principles of IP hard pipe.
(Figure labels: Physical Network; PE, P, P, PE; link bandwidths 10G, 1G, 9G, 2G, 8G)
In the physical network topology, select the public network links that require hard pipe
deployment and set the hard pipe bandwidth for each link. The hard pipe topology is then
established. On the network shown in Figure 11-55, after hard pipes are established over
the entire network, the physical network is divided into two logical networks: a hard pipe
network and a normal service network (called a soft pipe network).
(Figure labels: Hardened Pipe Network; PE, P, P, PE; link bandwidths 1G to 3G and 7G to 10G)
2. Provision services.
The service bandwidth, source and destination devices, and service IDs must be
manually configured for VIP customers. The intermediate path can be manually
configured or automatically calculated by the NMS.
The NMS checks whether the hard pipe bandwidth on each node is adequate for service
provisioning:
– If the bandwidth is inadequate, the NMS stops service provisioning and displays an
error message.
– If the bandwidth is adequate, the NMS delivers configurations to devices.
After services are provisioned, the NMS updates the bandwidth resource database.
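The bandwidth check performed during service provisioning can be sketched as follows. This is only an illustration of the check-then-commit idea under stated assumptions: the function name, the dictionary that stands in for the bandwidth resource database, and the node names are hypothetical and do not represent the NMS interface.

```python
def provision_leased_line(free_hard_bw, path, service_bw):
    """Admit a service only if every node on the path has enough hard pipe bandwidth.

    free_hard_bw: dict mapping node name -> free hard pipe bandwidth (Mbit/s);
                  a stand-in for the bandwidth resource database.
    path:         list of node names the service traverses.
    service_bw:   requested service bandwidth (Mbit/s).
    """
    short = [node for node in path if free_hard_bw[node] < service_bw]
    if short:
        # Inadequate bandwidth: stop provisioning and report an error.
        raise ValueError("insufficient hard pipe bandwidth on: " + ", ".join(short))
    # Adequate bandwidth: commit the reservation on every node,
    # which corresponds to updating the resource database.
    for node in path:
        free_hard_bw[node] -= service_bw
    return free_hard_bw
```

Checking every node before changing any entry keeps the database unchanged when a request is rejected.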
(Figure labels: Leased Line 1, 300M; PE, P, P, PE; hard pipe link bandwidths 1G to 3G)
Hard pipe services on the ATN have a higher priority than soft pipe services. If traffic is
transmitted through both the hard pipe and soft pipe, the bandwidth and low delay are
preferentially guaranteed for hard pipe traffic.
The ATN models have different chip capabilities, and therefore the hard pipe implementations
are also different.
The ATN 910/ATN 910I/ATN 905 supports IP hard pipe in interface-shared mode. In this
mode, an interface can carry both hard pipe and soft pipe traffic. Hard pipe traffic enters only
the CS7 queue, whereas soft pipe traffic can enter any queue from BE to CS7. The bandwidth
available to soft pipe traffic entering the CS7 queue equals the interface bandwidth minus the
hard pipe bandwidth. Bandwidth unused by hard pipe traffic can be used by soft pipe traffic.
If a hard pipe is configured on an interface, do not apply for extended queues for soft pipe
traffic. If you do so, packet loss or high delay may occur for hard pipe traffic.
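The bandwidth relationship in interface-shared mode is simple arithmetic. The sketch below restates it; the function names and the Mbit/s unit are assumptions made for illustration.

```python
def soft_pipe_cs7_bandwidth(interface_mbps, hard_pipe_mbps):
    """Bandwidth available to soft pipe traffic entering the CS7 queue."""
    if not 0 <= hard_pipe_mbps <= interface_mbps:
        raise ValueError("hard pipe bandwidth must be between 0 and the interface bandwidth")
    return interface_mbps - hard_pipe_mbps


def soft_pipe_usable_now(interface_mbps, hard_pipe_load_mbps):
    """Bandwidth soft pipe traffic can use at a given moment, because
    bandwidth left unused by hard pipe traffic can be borrowed."""
    return interface_mbps - hard_pipe_load_mbps
```

For a 10 Gbit/s interface with a 2 Gbit/s hard pipe, the first function returns 8000 Mbit/s. If the hard pipe currently carries only 500 Mbit/s, the second returns 9500 Mbit/s.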
E2E hard pipe services can only be deployed using an NMS. The NMS delivers hard pipe
VLL and TE LSP configurations based on the hard pipe's processing capabilities. A device
establishes VLL PWs and TE LSPs based on the delivered data and transmits VLL and TE
services through the hard pipe.
The NMS supports alarm thresholds for services exceeding the hard pipe's processing
capabilities, ensuring that services transmitted over the hard pipe do not exceed the hard
pipe's processing capabilities.
Principles
After the hard pipe bandwidth is reserved on a physical interface on a carrier network, the
logical hard pipe network comes into being. At this point, path planning is required for
service provisioning. Static bidirectional co-routed TE LSPs can be established to provide
P2P leased line services between two PEs.
After a carrier determines the PE for user access on the NMS, the network transmission paths
can be manually specified or automatically generated. When the NMS automatically plans
paths, hard pipe bandwidth is reserved on a hop-by-hop basis along the transmission path
based on user access bandwidth. If the hard pipe bandwidth of all links on the transmission
path meets the user access service requirements, a hard pipe TE LSP is established between
the PEs. The NMS then updates the bandwidth resource database. This implements hard pipe
services over the TE LSP.
Principles
After the hard pipe bandwidth is reserved on a physical interface on a carrier network, the
logical hard pipe network comes into being. At this point, path planning is required for
services provisioned to users.
After a carrier determines the PE for user access on the NMS, the network transmission paths
can be either manually specified or automatically generated. After a transmission path is
determined and a hard pipe TE tunnel is established, the NMS reserves the hard pipe
bandwidth on a hop-by-hop basis on the TE tunnel based on user access bandwidth. A PW
can then be established and bound to the TE tunnel, and bandwidth limitation can be deployed
on the AC interface. This implements hard pipe services over the VLL/PWE3 PW, with
guaranteed bandwidth and low delay.
The U2000 must reserve bandwidth for the hard pipe on the public network side based on the
expanded access user bandwidth.
(Figure 11-57 elements: CE, AC, PE, tunnel carrying the VC across the MPLS network, PE, AC, CE)
On the network shown in Figure 11-57, the length of packets received on a PE from a CE is
L1 (CRC length included). The length of the packets sent by the PE to the public network
interface is L2. The public network interface is an Ethernet interface that sends double-tagged
packets. L2 is calculated as follows:
L2 = L1 + Public network header length (length of the destination MAC address, source MAC
address, Eth_Type, outer VLAN tag, inner VLAN tag, TE label, and VC label)
L2 = L1 + 30
The calculation shows that the public network packet length is determined by the following
factors:
l Service packet length
l Public network link type
The service packet length varies. Even in a data flow from the same access user, packet
lengths will vary. This variability means that Ethernet links cannot use a fixed bandwidth
expansion proportion. However, a bandwidth expansion proportion can be calculated based on
the average packet length.
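The 30-byte figure and the resulting expansion proportion can be checked with a short calculation. The field sizes below are the standard lengths of each header element (6-byte MAC addresses, 2-byte Eth_Type, 4-byte VLAN tags, and 4-byte MPLS labels). The function names and the sample packet lengths are chosen for illustration only.

```python
# Public network header added by the PE on a double-tagged Ethernet interface.
PUBLIC_HEADER_BYTES = {
    "destination_mac": 6,
    "source_mac": 6,
    "eth_type": 2,
    "outer_vlan_tag": 4,
    "inner_vlan_tag": 4,
    "te_label": 4,
    "vc_label": 4,
}


def public_packet_length(l1):
    """L2 = L1 + public network header length (bytes)."""
    return l1 + sum(PUBLIC_HEADER_BYTES.values())


def expansion_proportion(average_l1):
    """Ratio of public network bandwidth to access bandwidth for a given
    average service packet length (bytes)."""
    return public_packet_length(average_l1) / average_l1
```

With an average packet length of 300 bytes the proportion is 330/300, that is, 10% more bandwidth on the public network side. With 64-byte packets it rises to 94/64, which is why no fixed proportion fits all traffic.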
The VLL bandwidth expansion proportion parameters can be configured. The default value is
calculated based on the average parameter values:
NOTE
The bandwidth expansion proportion varies according to the POS and Ethernet encapsulation lengths
and the number of VLAN tags on the Ethernet network.
11.7.3 Applications
This section describes typical IP hard pipe applications.
IP hard pipe applies to P2P leased line services of high-end enterprise users.
(Figure labels: PW, P, backup tunnel)
To protect PE1, deploy primary and secondary PWs. On the network shown in Figure 11-60,
if the master PE fails, traffic is switched to the backup PE, implementing user node
protection.
(Figure 11-60 elements: user networks on both sides of the IP hard pipe network, three PEs, primary PW, secondary PW)
Huawei devices to implement E2E hard pipe services and the non-Huawei devices also
support PWs used to implement hard pipe, multi-segment PWs (MS-PWs) can be deployed.
Figure 11-61 Hard-pipe-based leased line services implemented using both Huawei and non-Huawei devices
(Figure elements: user networks; AS1 and AS2 with Huawei devices (PE, P, SPE); AS3 with non-Huawei devices (ASBRs))
Terms
Term Definition
IP hard pipe A technology that provides IP leased line services with strict bandwidth
guarantee and low delay.
11.8 VPLS
(Figure elements: VPN1 sites 1, 3, and 5 with CE1, CE3, and CE5; VPN2 sites 2 and 4 with CE2 and CE4; PE1, PE2, and PE3 on the MPLS backbone)
Purpose
As enterprises set up more and more branches in different regions and office flexibility
increases, applications such as VoIP, instant messaging, and teleconferencing are increasingly
widely used. This imposes high requirements for end-to-end (E2E) datacom technologies. A
network capable of providing P2MP services is the key to datacom function implementation.
Traditional asynchronous transfer mode (ATM) and frame relay (FR) technologies provide
only Layer 2 point-to-point (P2P) connections. In addition, those network types have
disadvantages such as high construction costs, low speed, and complex deployment. The
development of IP has led to the MPLS VPN technology, which can provide VPN services
over an IP network and offers advantages such as easy configuration and flexible bandwidth
control. MPLS VPNs can be classified into MPLS L2VPNs and MPLS L3VPNs.
l Traditional MPLS L2VPNs, such as the virtual leased lines (VLLs) or virtual private
wire services (VPWSs), can provide P2P services but not P2MP services over a public
network.
l MPLS L3VPNs can provide P2MP services on the precondition that PEs keep routes
destined for end users. This implementation requires high routing performance of PEs.
Benefits
VPLS brings the following benefits:
11.8.2 Principles
The ATN implements the VPLS control plane by running LDP. VPLS based on LDP is referred to as
Martini VPLS.
The following table describes the various concepts related to VPLS networks.
PW signaling A type of signaling used to create and maintain PWs. PW signaling is the
foundation for VPLS implementation. Currently, the PW signaling is
LDP or BGP.
(Figure elements: VPN1 sites 1 to 3 and VPN2 sites 1 and 2; CE1 to CE5; PE1 to PE3 on the MPLS network; forwarder, AC, PW, PW signaling, tunnel)
3. PE1 then adds two MPLS labels to the packet based on the PW forwarding entry and
tunnel information and sends the packet to PE2. The private network label identifies the
PW, and the public network label identifies the tunnel between PE1 and PE2.
4. After PE2 receives the packet from the public tunnel, PE2 removes the private network
label of the packet.
5. The forwarder of PE2 selects an AC and forwards the packet to CE3 over the AC.
Ethernet networks often use the Spanning Tree Protocol (STP) to prevent loops. VPLS networks,
however, use full-mesh PWs and split horizon to avoid loops as follows:
l The PEs in a VSI must be fully meshed. That is, a PE must create a tree path to every
other PE in the VSI.
l Each PE must support split horizon to avoid loops. Split horizon requires that packets
received from a PW in a VSI should not be forwarded to other PWs in the VSI. Any two
PEs in a VSI must communicate over a direct PW, which is why full-mesh PWs are
required between PEs in a VSI.
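The split horizon rule can be illustrated with a short forwarding sketch for flooded traffic in one VSI. The function name and the port naming are hypothetical, and the sketch ignores MAC learning and HVPLS spoke PWs.

```python
def flood_targets(ingress, ports):
    """Return the ports in a VSI to which a flooded frame is forwarded.

    ports:   dict mapping port name -> "ac" or "pw" for every port in the VSI.
    ingress: name of the port on which the frame arrived.
    """
    ingress_kind = ports[ingress]
    targets = []
    for name, kind in ports.items():
        if name == ingress:
            continue  # never send a frame back out of the port it arrived on
        if ingress_kind == "pw" and kind == "pw":
            continue  # split horizon: PW -> PW forwarding is not allowed
        targets.append(name)
    return targets
```

A frame that arrives on an AC is flooded to all PWs and other ACs, whereas a frame that arrives on a PW reaches only local ACs. Because a PE never relays traffic between PWs, every pair of PEs needs a direct PW, which is the full-mesh requirement.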
(Figure elements: CEs in VLAN1 and VLAN2 attached to three PEs, each PE hosting VSI 1 and VSI 2)
BGP mode: VPLS implemented in BGP mode, also called Kompella VPLS, uses BGP signaling.
l PEs must run BGP, and demands on PE performance are high. Automatic member discovery is
supported, simplifying user operations.
l After a PE is added, configurations on existing PEs do not need to be modified, as long as the
total number of PEs does not exceed the number allowed by the label block.
l RRs are used to reduce the number of BGP connections, increasing network expansibility.
l Usage of label blocks wastes label resources to some extent.
l VPN targets are used to identify VPN member relationships. This feature allows a VPLS
network to span multiple ASs.
Usage scenario: The BGP mode applies to VPLS networks that reside on the core layers of
large-scale networks, span multiple ASs, or have PEs that run BGP.
VLAN access: The header of each Ethernet frame sent between CEs and PEs carries
a VLAN tag, known as the provider-tag (P-Tag). This is a service
delimiter identifying users on an ISP network.
Ethernet access: The header of each Ethernet frame sent between CEs and PEs does
not carry a P-Tag. If the frame header contains a VLAN tag, it is an
inner VLAN tag called the user-tag (U-Tag). A CE does not add the
U-Tag to an Ethernet frame; instead, the tag is carried in a packet
before the packet is sent to the CE. A U-Tag informs the CE to which
VLAN the packet belongs, and is meaningless to PEs.
Encapsulation modes of packets transmitted over ACs and PWs can be used together. As
shown in Figure 11-65, CE1 and CE3 are connected to the PEs in VLAN access mode,
whereas CE2 and CE4 are connected to the PEs in Ethernet access mode. Packets on the PW
between PE1 and PE3 are encapsulated in tagged mode, whereas packets on the PW between
PE2 and PE4 are encapsulated in raw mode. The following uses Ethernet+raw encapsulation
and VLAN+tagged encapsulation as examples to describe the packet exchange process.
(Figure 11-65: packet encapsulation on ACs and PWs.
VLAN access with tagged mode, CE1 - PE1 - PE3 - CE3:
AC packet: L2 Header, P-TAG, IP Header, Data
PW packet: L2 Header, Tunnel Label, VC Label, L2 Header, P-TAG, IP Header, Data
Ethernet access with raw mode, CE2 - PE2 - PE4 - CE4:
AC packet: L2 Header, IP Header, Data
PW packet: L2 Header, Tunnel Label, VC Label, L2 Header, IP Header, Data)
l Ethernet+raw encapsulation
As shown in Figure 11-65, Ethernet+raw encapsulation is used on the path of CE2 ->
PE2 -> PE4 -> CE4 and its reverse path. The packet exchange process is as follows:
a. CE2 sends a Layer 2 packet without a P-Tag to PE2.
b. PE2 searches the corresponding VSI for a forwarding entry and selects a tunnel and
a PW to forward the packet based on the found forwarding entry. PE2 adds double
labels (outer tunnel label and inner VC label) to the packet based on the selected
tunnel and PW, performs Layer 2 encapsulation, and forwards the packet to PE4.
c. Upon receipt, PE4 removes the Layer 2 encapsulation carried out by PE2 and the
double labels.
d. PE4 sends the original Layer 2 packet to CE4.
The process of sending a packet from CE4 to CE2 is similar to this process.
l VLAN+tagged encapsulation
As shown in Figure 11-65, VLAN+tagged encapsulation is used on the path of CE1 ->
PE1 -> PE3 -> CE3 and its reverse path. The packet exchange process is as follows:
a. CE1 sends a Layer 2 packet with a P-Tag to PE1.
b. PE1 retains the P-Tag because a packet sent to a PW with the tagged packet
encapsulation mode must carry a P-Tag.
c. PE1 searches the corresponding VSI for a forwarding entry and selects a tunnel and
a PW to forward the packet based on the found forwarding entry. PE1 adds double
labels (outer tunnel label and inner VC label) to the packet based on the selected
tunnel and PW, performs Layer 2 encapsulation, and forwards the packet to PE3.
d. Upon receipt, PE3 removes the Layer 2 encapsulation carried out by PE1 and the
double labels.
e. PE3 sends the original Layer 2 packet to CE3.
The process of sending a packet from CE3 to CE1 is similar to this process.
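The two encapsulation processes above differ only in whether the P-Tag is present. The following sketch models a packet as a list of header names to make the label stacking visible. It covers only the Ethernet+raw and VLAN+tagged combinations described in this section, and all function and field names are illustrative.

```python
def pe_encapsulate(ac_frame, pw_mode):
    """Ingress PE: wrap a frame received on an AC for transport over a PW.

    ac_frame: list of header names, for example
              ["L2 Header", "P-TAG", "IP Header", "Data"].
    pw_mode:  "raw" or "tagged".
    """
    has_ptag = "P-TAG" in ac_frame
    if pw_mode == "tagged" and not has_ptag:
        raise ValueError("a packet sent to a tagged-mode PW must carry a P-Tag")
    if pw_mode == "raw" and has_ptag:
        raise ValueError("VLAN access with raw mode is outside this sketch")
    # New Layer 2 header, outer tunnel label, inner VC label, then the original frame.
    return ["L2 Header", "Tunnel Label", "VC Label"] + list(ac_frame)


def pe_decapsulate(pw_packet):
    """Egress PE: remove the Layer 2 encapsulation and the double labels."""
    return list(pw_packet[3:])
```

Decapsulating the output of pe_encapsulate returns the original Layer 2 frame, which matches the last step of both processes.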
By default, traffic can be forwarded between AC interfaces, between UPE PWs, and between
AC interfaces and UPE PWs in a VSI. On a non-hierarchical VPLS network, VPLS service
isolation prohibits traffic forwarding between AC interfaces. On an HVPLS network, VPLS
service isolation prohibits traffic forwarding between AC interfaces, between UPE PWs, and
between AC interfaces and UPE PWs.
Background
Label Distribution Protocol (LDP) virtual private LAN service (VPLS), also called Martini
VPLS, uses a static discovery mechanism to discover VPLS members using LDP signaling.
VPLS information is carried in extended type-length-value (TLV) fields (type 128 and type
129 FEC TLVs) of LDP signaling packets. During the establishment of a pseudo wire (PW),
the label distribution mode is downstream unsolicited (DU) and the label retention mode is
liberal.
Related Concepts
LDP VPLS involves the following concepts:
l FEC: A set of packets with similar or identical characteristics and forwarded in the same
way by label switching routers (LSRs). Characteristics determining the FEC of a packet
include the destination address, service type, and QoS attribute.
l TLV: A highly efficient and expansible coding mode for protocol packets. To support
new features, you only need to add new types of TLVs to carry information required by
the features.
l DU: A label distribution mode in which an LSR distributes labels to FECs without
having to receive Label Request messages from its upstream LSR.
l Liberal: A label retention mode in which an LSR retains the label mapping received from
a neighboring LSR, regardless of whether the neighboring LSR is its next hop. In liberal
label retention mode, an LSR can use the labels sent from neighboring LSRs that are not
at the next hop to re-establish an LSP. This mode requires more memory and label space
than the conservative mode.
Implementation Process
l Figure 11-67 shows the process of establishing a PW using LDP signaling.
(Figure 11-67 elements: PE1 and PE2, each with a VSI; unidirectional VC1 and VC2 between them)
a. After PE1 is associated with a VSI, and PE2 is configured as a peer of PE1, PE1
sends a Label Mapping message to PE2 in DU mode if an LDP session already
exists between PE1 and PE2. The Label Mapping message carries information
required to establish a PW, such as the PW ID, VC label, and interface parameters.
b. Upon receipt of the message, PE2 checks whether it has been associated with
the VSI. If PE2 has been associated with the VSI and PW parameters on PE1 and
PE2 are consistent, PE1 and PE2 belong to the same VSI. In this case, PE2
establishes a unidirectional VC named VC1 immediately after PE2 receives the
Label Mapping message. Meanwhile, PE2 sends a Label Mapping message to PE1.
After receiving the message, PE1 takes a similar sequence of actions to PE2 and
establishes VC2.
l Figure 11-68 shows the process of tearing down a PW using LDP signaling.
a. After the peer configuration about PE2 is deleted from PE1, PE1 sends a Label
Withdrawal message to PE2. After receiving the Label Withdrawal message, PE2
withdraws its local VC label, tears down VC1, and sends a Label Release message
to PE1.
b. After receiving the Label Release message, PE1 withdraws its local VC label and
tears down VC2.
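The check that a PE performs when it receives a Label Mapping message can be sketched as follows. This is a simplified model: the class and function names are hypothetical, and MTU and encapsulation type are assumed here as examples of the interface parameters that must be consistent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelMapping:
    """Information carried in a Label Mapping message for one PW."""
    pw_id: int
    vc_label: int
    mtu: int              # assumed example of an interface parameter
    encapsulation: str    # assumed example of an interface parameter


def accept_label_mapping(local: Optional[LabelMapping], remote: LabelMapping) -> bool:
    """Return True if the receiving PE can establish a unidirectional VC.

    local is None when the receiving PE has not been associated with the VSI.
    VC labels are locally significant, so they are not compared.
    """
    if local is None:
        return False
    return (local.pw_id, local.mtu, local.encapsulation) == (
        remote.pw_id, remote.mtu, remote.encapsulation)
```

When the check succeeds on PE2, VC1 is established and PE2 sends its own Label Mapping message. PE1 then runs the same check to establish VC2.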
Usage Scenario
The LDP mode applies to VPLS networks that do not have many sites, do not span multiple
ASs, or have PEs that do not run BGP.
Benefits
LDP VPLS brings the following benefits:
l Easy configuration
l Saved label resources
Definition
BGP AD VPLS, short for Border Gateway Protocol Auto-Discovery Virtual Private LAN
Service, is a new technology for automatically deploying VPLS services.
Purpose
The wide use of VPLS technologies leads to the growing scale of VPLS networks and
configurations. BGP AD VPLS is introduced to simplify configurations, enable automatic
service deployment, and reduce OpEx.
BGP AD VPLS combines the advantages of both Kompella and Martini VPLS. BGP AD
VPLS-enabled devices exchange extended BGP Update packets to automatically discover BGP
peers in a VPLS domain. After BGP peer relationships are established, these devices use LDP
FEC 129 to negotiate and establish VPLS PWs. On the established PWs, VPLS services are
automatically deployed.
Concepts
Acronym and Abbreviation | Full Name | Description
FEC 129 | Forwarding Equivalence Class 129 | New type of FEC used by LDP signaling
Principles
BGP AD VPLS obtains the advantages of both Kompella and Martini VPLS. BGP AD VPLS
automatically discovers VPLS BGP peers, simplifying the configurations and saving labels.
BGP AD VPLS-enabled devices exchange extended BGP Update packets carrying VSI
information and automatically discover BGP peers in a VPLS domain. After BGP peer
relationships are established, these devices use LDP FEC 129 to negotiate and establish VPLS
PWs. On the established PWs, VPLS services are automatically deployed.
(Figure elements: PE1 with Loopback1 1.1.1.1/32 and PE2 with Loopback1 2.2.2.2/32 in AS 65535. The BGP Update message carries VPLS-ID 65535:100, RD 65535:100, VSI-ID 1.1.1.1, RT 5:5, and Next Hop 1.1.1.1.)
1. The VPLS ID, RT, and VSI ID are set on PE1 and encapsulated in BGP AD Update
messages. These messages are sent to all peer PEs in all BGP areas. The operations and
process are the same on PE2.
NOTE
By default, the RD is equal to the VPLS ID. If the VPLS ID is set, the RD does not need to be set. The
VSI ID is equal to the local LSR ID and does not need to be set.
2. After receiving BGP AD packets, PEs check whether the BGP AD packets match the RT
policy. If they match, PEs obtain the VSI information carried in the packets and compare
the obtained information to the local configuration. After comparison, either of the
following results is obtained:
– If VPLS IDs of VSIs on both PEs are the same, it indicates that the two VSIs are in
the same VPLS domain. This allows one and only one PW to be established
between them.
– If VPLS IDs of VSIs on the two PEs are different, it indicates that the two VSIs are
in different VPLS domains. This allows no PW between them.
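The member discovery decision described above reduces to two checks: RT policy first, then VPLS ID. The sketch below assumes a simple dictionary representation of the received Update message and of the local VSI. The key names, including import_rts, are hypothetical.

```python
def pw_allowed(local_vsi, update):
    """Decide whether a PW may be established toward the PE that sent a BGP AD Update.

    local_vsi: dict with "vpls_id" and "import_rts" (assumed local configuration).
    update:    dict with "vpls_id" and "rts" taken from the received Update message.
    """
    # Step 1: the Update must match the local RT policy.
    if not set(update["rts"]) & set(local_vsi["import_rts"]):
        return False
    # Step 2: the same VPLS ID means the two VSIs are in the same VPLS domain.
    return update["vpls_id"] == local_vsi["vpls_id"]
```

With a local VSI that uses VPLS ID 65535:100 and imports RT 5:5, an Update carrying the same values passes both checks, and exactly one PW is negotiated toward the sender.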
Automatically Deploying a PW
After a PE discovers remote PEs in a VPLS domain, BGP AD uses LDP FEC 129 to negotiate
the creation of PWs. Figure 11-70 shows the exchange process and information used during
negotiation.
(Figure 11-70 elements: PE1 with Loopback1 1.1.1.1/32 and PE2 with Loopback1 2.2.2.2/32 in AS 65535)
1. If no LDP session is established between two PEs in one VPLS domain, the two PEs
initiate negotiation on the creation of an LDP session. If an LDP session is established,
the PEs exchange LDP Mapping messages with each other by using FEC 129 signaling.
The LDP Mapping messages carry information such as AGI, SAII, TAII, and the label.
NOTE
After BGP AD VPLS members are discovered, BGP AD VPLS proactively triggers LDP to establish an
LDP session, allowing the establishment of a PW for VPLS services. If VPLS services are deleted and
this LDP session is no longer used, LDP is proactively triggered to delete the LDP session. This
simplifies maintenance of the LDP session, saves the network resource cost, and improves system
resource usage and network performance.
2. After a PE receives LDP Mapping messages, the PE parses and obtains information
including the VPLS ID, PW type, MTU, and TAII. The PE compares the information to
the local VSI information. If they are the same and meet the requirements for setting up a
PW, the PE sets up a PW to the remote PE.
(Figure elements: CE1, CE2, and CE3 attached to PE1, PE2, and PE3 on the MPLS network)
Background
NOTE
Among ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports this function.
To protect against failures and improve reliability, a redundant provider edge (PE) is often
deployed for a service. If a redundant PE is provided for a virtual private wire service
(VPWS) or virtual private LAN service (VPLS), two pseudo wires (PWs) are deployed for
PW protection. This mechanism is called PW redundancy.
PW redundancy, which is widely used for point-to-point services on VPWS networks, can be
used on VPLS networks because point-to-multipoint VPLS services can be considered as
point-to-point services for each point.
Related Concepts
Some key concepts for VPLS PW redundancy are described by using service traffic protection
between customer edge 1 (CE1) and CE2 on the VPLS network in Figure 11-72 as an
example.
PEs on the two ends of a PW group must negotiate PW statuses to ensure that they select the
same PW to transmit packets.
l Primary and secondary are terms used to describe PW forwarding priorities. The PW
forwarding priorities can be configured, and a smaller value indicates a higher priority. A
PW with the highest priority is the primary PW.
NOTE
PW forwarding priorities take effect only when PE1 uses PW redundancy in Master/Slave mode.
In Master/Slave mode, PE1 instructs PE2 and PE5 to change forwarding statuses of PWs to be the
same as those of PWs on PE1. In Independent mode, the master and backup statuses of PE2 and
PE5 determine forwarding statuses of local PWs.
l Active and Standby are terms used to describe PW forwarding statuses and cannot be
configured. Only active PWs are used for forwarding traffic. Standby PWs may be used
for receiving traffic.
NOTE
Active and Inactive, and Primary and Backup are terms used by Huawei that have the same
meaning as Active and Standby as defined in draft-ietf-pwe3-redundancy-bit-04. They all
indicate PW forwarding statuses.
Implementation
To ensure the same forwarding capability, the PW redundancy protection mechanism must
allow exactly one PW in a PW group to be the active PW and the remaining PWs to be
standby PWs. This requires corresponding signaling control.
RFC 4447 (Pseudowire Setup and Maintenance Using the Label Distribution Protocol [LDP])
specifies the PW Status TLV to transmit the PW forwarding status. The PW Status TLV is
transported to the remote PW peer using a Label Mapping or LDP Notification message. The
PW Status TLV is a 32-bit status code field. Each bit in the status code field can be set
individually to indicate more than one failure. PW redundancy introduces a new PW status
code 0x00000020. When the code is set, it indicates "PW forwarding standby".
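Because each bit of the status code is independent, a received code can be decoded with bitwise tests. In the sketch below, the first five bit values follow the status codes defined in RFC 4447, and 0x00000020 is the standby bit described above. The member and function names are illustrative.

```python
from enum import IntFlag


class PwStatus(IntFlag):
    """Bits of the 32-bit PW status code (0x00000000 means PW forwarding, no fault)."""
    NOT_FORWARDING = 0x00000001
    LOCAL_AC_RX_FAULT = 0x00000002
    LOCAL_AC_TX_FAULT = 0x00000004
    LOCAL_PSN_RX_FAULT = 0x00000008
    LOCAL_PSN_TX_FAULT = 0x00000010
    FORWARDING_STANDBY = 0x00000020   # introduced for PW redundancy


def decode_pw_status(code):
    """Return the names of all status bits set in the code."""
    return [flag.name for flag in PwStatus if code & flag]


def is_standby(code):
    """True if the remote end signals "PW forwarding standby"."""
    return bool(code & PwStatus.FORWARDING_STANDBY)
```

A code of 0x00000022, for example, reports a local AC receive fault and the standby state at the same time.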
NOTE
Forwarding priorities (Primary or Secondary) must be configured for PWs that back up each
other. The highest priority PW will be selected as the primary PW to forward traffic. The
remaining PWs will be in the Secondary state to protect the primary PW.
NOTE
The forwarding status of a PW determines whether the PW is used to forward traffic. The PW
forwarding statuses depend on:
l Local and remote PW signaling statuses: A PE monitors the local signaling status and
uses PW redundancy signaling to obtain remote signaling status from a remote PE.
l PW redundancy mode: Master/Slave or Independent mode is specified on PE1.
l PW forwarding priorities: PW forwarding priorities (Primary or Secondary) are specified
on PE1.
Figure 11-72 shows that VPLS PW redundancy is configured on PE1. In normal cases, all
local and remote PW signaling statuses on PE1 are Up. PEs at the two ends of a PW in
different VPLS PW redundancy modes use different methods to select the same PW for
transmitting user packets.
NOTE
VPLS PW redundancy is similar to VPWS PW redundancy, with the exception that a virtual switch
instance (VSI) has multiple PWs to different PEs. These PWs form various PW groups. PW switching in
one group does not affect other PW groups.
Currently, VPLS supports only the master/slave PW redundancy mode.
Derivative Function
In addition to protection against network faults in real time, VPLS PW redundancy allows
users to manually switch traffic between PWs in a group during network operation and
maintenance. For example, if a device providing a primary PW needs to be maintained, a user
can switch traffic to the secondary PW and switch it back to the primary PW after the
maintenance.
NOTE
Usage Scenarios
VPLS PW redundancy can be used on hierarchical virtual private LAN service (HVPLS)
networks and on interconnected VPLS and virtual leased line (VLL) networks. Both types
of networks can bear any service, but for newly planned or deployed networks it is
recommended that each type carry the services that suit its networking characteristics.
l HVPLS networks are suitable for bearing multicast services because HVPLS networks
can save VPLS core network bandwidth.
l VPLS and VLL interconnected networks are suitable for bearing unicast services,
because VLL PEs do not need to learn user MAC addresses.
VPLS PW redundancy can also be used to improve reliability of existing networks. On the
VPLS network in Figure 11-72, CE1 communicates with CE2, CE3, and CE4 through PWs
between one VSI on PE1 and PE2, PE3, and PE4.
As services develop, services between CE1 and CE2 and between CE1 and CE3 require high
reliability. Services between CE1 and CE4 do not require high reliability.
To meet the reliability requirements, PE5 and PE6 are deployed on the VPLS network to
provide VPLS PW redundancy protection for PE2 and PE3, respectively. In addition, multiple
PW groups to peer PEs are configured in one VSI on PE1. Links between CE1 and CE4
remain unchanged.
VPLS PW redundancy protects services against failures on the network side, AC side, or PEs
without affecting existing services, improving network reliability.
NOTE
VPLS PW redundancy can be provided for the desired services without affecting services on other PWs,
which reduces costs and maximizes profits.
(Figure 11-72 elements: CE1 attached to PE1; CE2, CE3, and CE4 reached through PE2 to PE6 across the VPLS network; primary PW, secondary PW)
AC attachment circuit
CE customer edge
PE provider edge
SP service provider
VC virtual circuit
11.9.1 Overview
L2VPN loop detection can detect and eliminate L2VPN loops, preventing L2VPN broadcast
storms.
Purpose
Generally, redundant links are used on an Ethernet switching network to provide link backup
for higher network reliability. The use of redundant links, however, may produce loops,
causing broadcast storms and MAC table instability. As a result, the communication quality
may deteriorate and services may even be interrupted.
As Layer 2 Ethernet technologies, L2VPN technologies, including virtual private LAN
service (VPLS) and virtual leased line (VLL), also encounter loop problems in practical
application:
l If a customer network uses multiple leased lines, loops may accidentally occur due to
reasons such as incorrect network configurations, resulting in broadcast storms.
l If a customer network passes through a third-party network, loops may accidentally
occur due to reasons such as incorrect third-party network configurations, resulting in
broadcast storms.
To prevent loops on Ethernet switching networks, the Spanning Tree Protocol (STP) is used.
However, if STP is used to prevent L2VPN loops, varying and complex customer networks
will pose great network maintenance difficulties to carriers. This is because STP also needs to
be deployed on CEs and relies on the customer network to some extent.
To address this issue, L2VPN loop detection is introduced.
Benefits
L2VPN loop detection offers the following benefits to carriers:
l Reduced device burdens: L2VPN loop detection effectively prevents broadcast storms,
reducing device burdens.
l Flexible and controllable deployment: L2VPN loop detection only needs to be deployed
on PEs and is totally independent of the customer network. Therefore, L2VPN loop
detection can be deployed in a flexible and controllable manner.
11.9.2 Principles
Basic Concepts
L2VPN loop detection can detect and eliminate L2VPN loops, preventing L2VPN broadcast
storms.
Implementation
L2VPN loop detection is deployed on the AC interfaces of PEs. AC interfaces that have
L2VPN loop detection enabled send L2VPN loop detection packets (Layer 2 packets) at an
interval that ranges from 100 ms to 4000 ms. The interval changes automatically each time.
The L2VPN loop detection packets are destined for broadcast MAC addresses and are
forwarded within a VLAN domain.
L2VPN loop detection can work in either self-sending self-receiving mode or loose mode.
Self-Sending Self-Receiving Mode
l If an AC interface on a PE receives an L2VPN loop detection packet from another AC
interface on the same PE and the two AC interfaces are bound to the same VSI or VLL,
the PE experiences an L2VPN network loop. The PE then compares the two looped AC
interfaces, automatically blocks the AC interface with a smaller interface index, and
reports an alarm.
Interface blocking rule: The blocking priorities and interface indexes of two AC
interfaces are compared. The AC interface with a higher blocking priority is blocked
preferentially. If the blocking priorities are the same, the indexes of both AC interfaces
are compared. The AC interface with a smaller index is blocked preferentially.
Interface index comparison rule:
– If a loop occurs on two GigabitEthernet or Ethernet interfaces, the PE compares the
slot IDs, interface IDs, and sub-interface IDs in order until the interface to be
blocked is determined. For example, if a loop occurs on GigabitEthernet 0/x1/y1.z1
and GigabitEthernet 0/x2/y2.z2, the PE compares x1 and x2 first. If x1 is
smaller than x2, the PE blocks GigabitEthernet 0/x1/y1.z1 directly without going on
to compare y1 and y2 or z1 and z2.
– If a loop occurs on two Eth-Trunk interfaces, the PE compares the interface IDs and
sub-interface IDs in order until the interface to be blocked is determined. For
example, if a loop occurs on Eth-Trunk m1.n1 and Eth-Trunk m2.n2, the PE
compares m1 and m2 first. If m1 is smaller than m2, the PE blocks
Eth-Trunk m1.n1 directly without going on to compare n1 and n2.
– If a loop occurs on a GigabitEthernet or Ethernet interface and an Eth-Trunk
interface, the PE blocks the Eth-Trunk interface.
l If a PW interface on a PE receives an L2VPN loop detection packet from an AC
interface on the same PE and the AC and PW interface are bound to the same VSI or
VLL, the PE is considered in an L2VPN loop. The PE then automatically blocks the AC
interface that sends the L2VPN loop detection packet and reports an alarm.
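The interface blocking rule for two looped AC interfaces on the same PE can be sketched as follows. This is an illustrative Python model, not device code. The AcInterface class, its field names, and the convention that a larger blocking_priority number means a higher blocking priority are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AcInterface:
    """Hypothetical model of an AC interface (names are illustrative only)."""
    kind: str                    # "GigabitEthernet", "Ethernet", or "Eth-Trunk"
    ids: Tuple[int, ...]         # (slot, interface, sub-interface) or (trunk, sub-interface)
    blocking_priority: int = 0   # assumption: larger number = higher blocking priority


def interface_to_block(a: AcInterface, b: AcInterface) -> AcInterface:
    """Return the AC interface that is blocked when a loop is detected between a and b."""
    # Step 1: the interface with the higher blocking priority is blocked preferentially.
    if a.blocking_priority != b.blocking_priority:
        return a if a.blocking_priority > b.blocking_priority else b

    # Step 2: between a physical interface and an Eth-Trunk, the Eth-Trunk is blocked.
    a_trunk = a.kind == "Eth-Trunk"
    b_trunk = b.kind == "Eth-Trunk"
    if a_trunk != b_trunk:
        return a if a_trunk else b

    # Step 3: same interface family. Tuples compare element by element, which
    # matches "compare the IDs in order until the interface is determined".
    # The interface with the smaller index is blocked.
    return a if a.ids < b.ids else b


ge_a = AcInterface("GigabitEthernet", (1, 0, 10))
ge_b = AcInterface("GigabitEthernet", (2, 0, 5))
blocked = interface_to_block(ge_a, ge_b)  # slot 1 < slot 2, so ge_a is blocked
```

The tuple comparison stops at the first differing ID, so the sub-interface IDs are consulted only when the slot and interface IDs are equal.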
Loose Mode
On the network shown in Figure 11-73, AC1 on PE1 and AC2 on PE2 send L2VPN loop
detection packets to the customer network.
[Figure 11-73: PE1 (AC1) and PE2 (AC2) on a VPLS/VLL network, attached to an L2 network with CE1 and CE2]
Two PEs (PE1 and PE2) reside on a VPLS or VLL network. If PE1's AC interface AC1
receives an L2VPN loop detection packet from PE2's AC interface AC2, PE1 and PE2 are
considered in an L2VPN loop, irrespective of whether AC1 and AC2 are bound to the same
VSI or VLL. A PE then blocks an AC interface according to the following interface blocking
rules and reports an alarm.
Interface blocking rule: The blocking priorities, system MAC addresses, and interface indexes
of AC1 and AC2 are compared. If an interface is blocked preferentially due to a higher
blocking priority, the system MAC addresses and interface indexes of the two interfaces will
not be compared. For example, if AC1 has a higher blocking priority than AC2, AC1 is
blocked preferentially.
11.9.3 Applications
On the network shown in Figure 11-74, AC1 and AC2 on PE1 send L2VPN loop detection
packets to the customer network.
[Figure 11-74: PE1 (AC1 and AC2) on a VPLS/VLL network, attached to an L2 network with CE1 and CE2]
On the network shown in Figure 11-75, AC1 on PE1 and AC2 on PE2 send L2VPN loop
detection packets to the customer network.
[Figure 11-75: PE1 (AC1) and PE2 (AC2) on a VPLS/VLL network, attached to an L2 network with CE1 and CE2]
AC attachment circuit
CE customer edge
PE provider edge
Purpose
IP RAN solutions have been developed to maximize carriers' return on investment, reduce
network construction costs, and evolve the existing network smoothly into a Long Term
Evolution (LTE) network.
The existing IP RAN solutions include end-to-end (E2E) virtual private network (VPN),
hierarchical VPN (HVPN), mixed VPN, native IP+L3VPN, and ATN+CX gateway solutions.
However, these solutions have disadvantages, as described in Table 11-16.
Layer 3 solution
l This type of solution requires high-performance access devices because the dynamic
signaling protocols, such as the Border Gateway Protocol (BGP), Resource Reservation
Protocol (RSVP), and Label Distribution Protocol (LDP), need to be enabled on the
devices. The protocols generate a large number of packets, which consume large amounts
of network bandwidth and system process resources.
l This type of solution involves complex Layer 3 technologies and therefore requires
highly skilled operation and maintenance (O&M) personnel.
Layer 2 solution
l This type of solution has complex data planning, and a large number of features need
to be deployed.
l This type of solution has complex configurations, and a large number of configuration
procedures are required.
l This type of solution has high O&M costs, and a lot of manpower is required for
routine maintenance.
IP RAN virtual clusters overcome these disadvantages. As shown in Figure 11-76, a virtual
cluster is deployed on the access ring. The access aggregation gateways (AGGs) perform
centralized path calculation, service provisioning, and traffic control for the cell site gateways
(CSGs). This virtual cluster simplifies network O&M and deployment. Table 11-17 and Table
11-18 respectively describe the configuration and protocol changes before and after a virtual
cluster is deployed.
[Figure 11-76: virtual cluster on the access ring. Labels: BTS, NodeB, eNodeB, CSG (AP), AGG (master), RSG, BSC, RNC, MME]
Table 11-17 Configuration changes before and after a virtual cluster is deployed
Columns: Item; CSG (before and after a virtual cluster is deployed); Primary AGG (before
and after a virtual cluster is deployed); Secondary AGG (before and after a virtual cluster is
deployed)
l Configuring Bidirectional Forwarding Detection (BFD) for LSP
l Binding the L3VE interface to virtual routing and forwarding (VRF)
l Configuring the Border Gateway Protocol (BGP)
Table 11-18 Protocol changes before and after a virtual cluster is deployed
Columns: Item; CSG (before and after a virtual cluster is deployed); Primary AGG (before
and after a virtual cluster is deployed); Secondary AGG (before and after a virtual cluster is
deployed)
Definition
Virtual cluster: a technology that simplifies network O&M and management and reduces
device loads. The control layers of all devices on a network are centralized on one device,
which performs centralized path calculation, service provisioning, and traffic control for the
other devices on the network.
Master: a server in a virtual cluster that performs centralized path calculation, service
provisioning, and traffic control for access points (APs). The AGGs shown in Figure 11-76
are masters. Masters are classified as primary or secondary masters.
AP: a client in a virtual cluster that is connected to base stations. The CSGs shown in
Figure 11-76 are APs.
To enhance network reliability, deploy the primary and secondary masters in a virtual cluster
to implement two control planes. As shown in Figure 11-77, an AP can belong to two masters
that work in primary/secondary mode.
[Figure 11-77: AP1, AP2, and AP3 in virtual cluster A (master A) and virtual cluster B (master B)]
Different APs can belong to different primary and secondary masters. For example, AP1 can
belong to the primary master A and secondary master B, and AP2 can belong to the primary
master B and secondary master A.
If the primary master becomes faulty, traffic on the primary master switches to the secondary
master. The secondary master automatically becomes a new primary master for path
calculation. If the original primary master recovers, traffic switches back to the original
primary master. However, the control layer is still located on the new primary master.
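The switchover behavior described above separates two roles: the master that carries traffic and the master that hosts the control layer. The following Python sketch models only that distinction. The ApMastership class and its method names are hypothetical and do not correspond to any device interface.

```python
class ApMastership:
    """Hypothetical model of which master carries traffic and which hosts the control layer."""

    def __init__(self, primary: str, secondary: str) -> None:
        self._original_primary = primary
        self._original_secondary = secondary
        self.traffic_master = primary   # master that currently carries traffic
        self.control_master = primary   # master that currently performs path calculation

    def on_primary_fault(self) -> None:
        # Traffic switches to the secondary master, which also becomes
        # the new primary master for path calculation.
        self.traffic_master = self._original_secondary
        self.control_master = self._original_secondary

    def on_primary_recovery(self) -> None:
        # Traffic switches back to the original primary master, but the
        # control layer stays on the new primary master.
        self.traffic_master = self._original_primary


ap1 = ApMastership(primary="Master A", secondary="Master B")
ap1.on_primary_fault()
ap1.on_primary_recovery()
# At this point traffic is on "Master A" and the control layer is on "Master B".
```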
Benefits
IP RAN virtual clusters offer the following benefits:
l The control layers of all devices on access rings are centralized on the AGG, which
significantly simplifies network deployment.
l The entire network automatically adapts to network topology changes, which reduces
O&M costs.
l Dynamic protocols, such as RSVP-TE, LDP, and BGP, do not need to run on access
devices, which reduces network complexity and device loads.
11.10.2 Principles
Related Concepts
l VP: A virtual path (VP) is a bidirectional label switched path (LSP) that is established on
a virtual cluster and is used to forward PW packets on the public network.
[Figure: virtual cluster with eNodeB, AP, master, RSG, and MME]
A tunnel label is swapped at each hop in a Multiprotocol Label Switching (MPLS) domain.
l X2 service
– A master forwards X2 packets between adjacent APs.
– If signaling needs to be forwarded to the MME, the forwarding process is similar to
that for S1 packets.
NOTE
The process of forwarding 2G/3G Ethernet packets is similar to the process of forwarding LTE S1 and
X2 packets.
l For the process of forwarding 2G/3G Ethernet packets from a BTS/NodeB to a BSC/RNC or the
core network, see the process of forwarding LTE S1 packets.
l For the process of forwarding 2G/3G Ethernet packets between adjacent APs, see the process of
forwarding LTE X2 packets.
Related Concepts
Establishing a virtual cluster involves the following concepts:
l Remote-AP interface: a virtual interface defined on a master in a virtual cluster. The
interface is mapped to an AP's physical interface connected to a base station. The
remote-AP interface terminates the virtual circuit (VC) from the AP to the master and
provides access to the VC from the master to the AP.
l vBridge interface: a virtual interface created on a master. The interface applies to
Ethernet services when base stations use the same IP network segment. Multiple base
stations use the interface to share a Layer 3 gateway address on the master. Other
interfaces connected to the interface form a vBridge broadcast domain.
l Master Slave Control Protocol (MSCP): an extension defined by Huawei to the
Diameter protocol. MSCP is used to establish a control channel between an AP and a
master. Using the control channel, the AP reports node information to the master and the
master delivers control information to the AP.
Implementation
Figure 11-79 shows the process of establishing a virtual cluster. All APs flood their own node
and topology information to the entire network. The same topology database (TOPO DB) is
established on the APs and masters. The primary master uses the TOPO DB to calculate the
forwarding paths between each AP and the primary master and between each AP and the
secondary master. Then the primary master advertises the calculation results to the secondary
master. Based on the calculation results, the primary and secondary masters generate tunnel
and VC forwarding entries and deliver the forwarding entries to the APs to establish tunnels
and VCs.
[Figure 11-79: process of establishing a virtual cluster among the AP, primary master, and secondary master, in procedures 1 to 5]
As shown in Figure 11-79, establishing a virtual cluster involves five procedures. Table
11-19 describes details about the five procedures.
2. Topology Information Collection
l AP: The AP runs Intermediate System to Intermediate System (IS-IS), collects topology
information, and floods the collected information to the entire network.
l Primary master: The primary master runs IS-IS, collects topology information, and floods
the collected information to the entire network.
l Secondary master: The secondary master runs IS-IS, collects topology information, and
floods the collected information to the entire network.
AP Registration
Before an AP joins a virtual cluster, the AP needs to register with a master. After the AP
registers with the master, you can use the master to log in to the AP and manage it.
AP registration involves the following scenarios:
First AP registration
After you configure a virtual cluster, a master first enters the virtual cluster mode and waits
for an AP to register with it. After the AP enters the virtual cluster mode, the AP sends a
request for establishing an MSCP channel to the master. The AP and master must exchange
routes to management IP addresses before establishing an MSCP channel. The AP can
automatically or statically obtain the local management IP address and the primary and
secondary masters' management IP addresses. In automatic mode, the AP randomly selects a
local loopback interface address as the local management IP address and obtains the primary
and secondary masters' management IP addresses after the virtual cluster accesses an
Intermediate System to Intermediate System (IS-IS) process. In static mode, the three
management IP addresses can be specified on the AP. After the MSCP channel is established,
the AP collects its own label space and interface information and sends a registration message
to the master. If the registration is successful, the master saves and maintains the label space
and interface information of the AP.
AP attribute change
When attributes on an AP change, the AP sends an update message to a master to update the
AP's information. For example, if an AP no longer belongs to a master, the AP sends a
registration cancel message to the master and sends a registration message to a new master.
Virtual clusters allow you to dynamically add or delete APs and change the status of the
primary and secondary masters to which an AP belongs.
Versatile Routing Platform (VRP) extends IS-IS and adds a type-length-value (TLV) field to
carry topology information specific to a virtual cluster.
Establishing an IS-IS process for a virtual cluster involves the following scenarios:
l No IS-IS process exists in the virtual cluster.
After you configure a virtual cluster, an AP and a master automatically establish an IS-IS
process with the ID of 65534 for the virtual cluster.
l One or more IS-IS processes exist in the virtual cluster. An AP and a master have
different implementation modes:
– AP
If the AP has only one IS-IS process, the process is enabled for the virtual cluster.
If the AP has multiple IS-IS processes, the AP searches for an unused IS-IS process
ID in descending order from 65534 and uses the ID to establish an IS-IS process for
the virtual cluster.
– Master
The master can have multiple IS-IS processes for the virtual cluster. If the master
has no IS-IS process for the virtual cluster, the master searches for an unused IS-IS
process ID in descending order from 65534 and uses the ID to establish the first IS-
IS process for the virtual cluster. You need to manually establish other IS-IS
processes.
NOTE
The original standard IS-IS process on the aggregation ring is used between the primary and secondary
masters.
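The process ID selection described above can be sketched as follows. This is a minimal Python illustration, not device code. The function names and the lower bound of 1 for valid process IDs are assumptions made for this example.

```python
from typing import Iterable, Optional

START_ID = 65534  # the search starts from this ID and proceeds in descending order


def pick_unused_process_id(used_ids: Iterable[int], lowest_id: int = 1) -> Optional[int]:
    """Return the first unused IS-IS process ID, searching downward from 65534.

    lowest_id is an assumed lower bound for valid process IDs.
    Returns None if every ID in the range is already in use.
    """
    used = set(used_ids)
    for candidate in range(START_ID, lowest_id - 1, -1):
        if candidate not in used:
            return candidate
    return None


def ap_virtual_cluster_process_id(existing_ids: Iterable[int]) -> Optional[int]:
    """Select the IS-IS process ID that an AP uses for the virtual cluster."""
    existing = set(existing_ids)
    if len(existing) == 1:
        # The AP has only one IS-IS process: that process is enabled for the virtual cluster.
        return next(iter(existing))
    # No process (65534 is free) or multiple processes: search downward from 65534.
    return pick_unused_process_id(existing)


no_process = ap_virtual_cluster_process_id([])             # 65534
one_process = ap_virtual_cluster_process_id([100])          # 100
many_processes = ap_virtual_cluster_process_id([1, 65534])  # 65533
```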
Neighbor establishment and information flooding for extended IS-IS are similar to those for
standard IS-IS. After all information is flooded, the same TEDB is established on APs and
masters. The primary master uses the TEDB to calculate the optimal traffic forwarding path in
the virtual cluster.
Path Calculation
The primary master uses the Constraint Shortest Path First (CSPF) algorithm to calculate four
virtual paths (VPs) for an AP. As shown in Figure 11-80, the four VPs are VP1 to VP4. Table
11-20 describes the VPs and their constraints.
VP1: Primary VP from an AP to the primary master
VP2: Backup VP for VP1
VP3: Primary VP from an AP to the secondary master
VP4: Backup VP for VP3
Constraints:
l VP1 and VP2 do not intersect.
l VP1 and VP3 do not intersect.
l VP3 and VP4 do not intersect.
[Figure 11-80: virtual paths VP1 to VP4 between an AP and the primary and secondary masters]
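The idea of a primary VP and a non-intersecting backup VP can be illustrated with a simplified sketch. The Python code below is not the CSPF algorithm used by the master. It substitutes a breadth-first search by hop count, computes the primary path first, and then searches again with the primary path's links and intermediate nodes excluded. The topology, the node names, and the interpretation of "do not intersect" as sharing no links and no intermediate nodes are assumptions made for this example.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def shortest_path(graph: Graph, src: str, dst: str,
                  banned_nodes: FrozenSet[str] = frozenset(),
                  banned_links: FrozenSet[FrozenSet[str]] = frozenset()
                  ) -> Optional[List[str]]:
    """Breadth-first search by hop count, skipping banned nodes and links."""
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(graph[node]):  # sorted for deterministic results
            link = frozenset((node, nxt))
            if nxt in visited or nxt in banned_nodes or link in banned_links:
                continue
            visited.add(nxt)
            queue.append(path + [nxt])
    return None


def primary_and_backup(graph: Graph, ap: str, master: str
                       ) -> Tuple[Optional[List[str]], Optional[List[str]]]:
    """Return a primary path and a backup path that does not intersect it."""
    primary = shortest_path(graph, ap, master)
    if primary is None:
        return None, None
    banned_nodes = frozenset(primary[1:-1])  # intermediate nodes of the primary path
    banned_links = frozenset(frozenset(pair) for pair in zip(primary, primary[1:]))
    backup = shortest_path(graph, ap, master, banned_nodes, banned_links)
    return primary, backup


# Hypothetical access ring: AP1 - AP2 - MasterA - MasterB - AP3 - AP1
ring: Graph = {
    "AP1": {"AP2", "AP3"},
    "AP2": {"AP1", "MasterA"},
    "MasterA": {"AP2", "MasterB"},
    "MasterB": {"MasterA", "AP3"},
    "AP3": {"MasterB", "AP1"},
}
vp1, vp2 = primary_and_backup(ring, "AP1", "MasterA")
```

On a ring, the two results are the two directions around the ring. This two-pass approach can fail to find a disjoint pair that exists on more general topologies, which is one reason it should be read as an illustration only.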
Tunnel Establishment
After path calculation is complete, the primary and secondary masters allocate labels and
forwarding entries to all APs based on the calculation results. Then the primary and secondary
masters use the MSCP channels to deliver the allocated labels and forwarding entries to the
APs. The primary master calculates four VPs for an AP. Two of the VPs form a VP protection
group from the primary master to the AP, and the other two form a VP protection group from
the secondary master to the AP. Two VPs in each VP protection group work in hot standby
mode. Table 11-21 describes the relationship between the VP protection groups and VPs.
VC Establishment
A master uses a remote-AP interface to establish a VC to an AP or terminate the VC. A
service bearer between a master and an RSG varies for different scenarios. Table 11-22
describes VC establishments in different scenarios.
Implementation
You can use either of the following methods to manage devices on an access ring:
l Using commands: You can log in to a master and run commands to manage the devices.
When you run commands on the master to configure an AP, the master automatically
uses Telnet to connect to APs. The master only transparently transmits AP configurations
and does not save them. To ensure the security of automatic login, an AP needs to verify
the identity of a login user.
l Using an NMS: You can use an NMS to manage the devices as you did before a virtual
cluster was deployed.
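[Figure: device management on an access ring through an NMS and through Telnet from the master to APs. Labels: BTS, NodeB, eNodeB, CSG (AP), AGG (master), RSG, BSC, RNC, MME]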
[Figure: primary VC and secondary VC between the AP and the masters. Labels: BTS, NodeB, AP, VRF, secondary master, RSG2, BSC, RNC]
Background
If the MSCP channel in a virtual cluster becomes faulty, the data plane is interrupted. The
MSCP channel becomes faulty if either of the following cases occurs:
l A master in the virtual cluster becomes faulty, and a switchover occurs between the
master's master and slave main control boards.
l The link between the AP and master becomes faulty.
GR ensures that the data plane is not interrupted if the MSCP channel becomes faulty.
Related Concepts
GR: Graceful restart (GR) is defined by a series of IETF extensions to Internet
Protocol/Multiprotocol Label Switching (IP/MPLS) protocols, such as Open Shortest Path
First (OSPF), Intermediate System to Intermediate System (IS-IS), Border Gateway Protocol
(BGP), Label Distribution Protocol (LDP), and Resource Reservation Protocol (RSVP). These
extensions ensure that forwarding is not interrupted when the system restarts, and they reduce
protocol flapping on the control plane when the system performs an AMB/SMB switchover.
Implementation
As shown in Figure 11-86, traffic is transmitted from the AP to the master over a pseudo wire
(PW). If the MSCP channel becomes faulty due to a switchover between the master's master
and slave main control boards or other causes, the AP and master both enter the GR state. The
data plane is not interrupted during MSCP channel recovery. The radio network controller site
gateways (RSGs) outside the virtual cluster do not detect the fault. The GR process is as
follows:
[Figure 11-86: GR process between the AP and the master]
1. Before a switchover occurs between the master's master and slave main control boards,
the AP sends a registration request carrying the GR flag to notify the master of its own
GR capability. After the AP registers with the master, the master notifies the AP of its
own GR capability.
2. If the MSCP channel is interrupted, the master and AP start their own GR Reconnect
timers. The AP sends a re-registration request to the master. During the re-registration,
the AP and master re-exchange GR capability information with each other. Before the
GR Reconnect timers expire, forwarding entries are retained, which ensures traffic
forwarding continuity.
3. After the MSCP channel recovers, the AP and master stop their own GR Reconnect
timers, restore the forwarding data before the fault occurred, and start their own GR
Recovery timers.
4. After the forwarding data is restored, the AP sends a GR End message to notify the
master that GR ends.
NOTE
If GR ends but service data is still not restored, traffic switches to the secondary PW or backup LSP.
11.10.2.6 OAM
IP radio access network (RAN) virtual clusters use the following operation, administration
and maintenance (OAM) techniques:
l Bidirectional Forwarding Detection (BFD): used for fault detection
l IP Flow Performance Management (IP FPM) for Ethernet services: used for performance
monitoring
l Virtual circuit connectivity verification (VCCV) ping and label switched path (LSP)
ping/tracert: used for fault locating
Fault Detection
When a virtual cluster is established to carry Ethernet services, BFD sessions are
automatically established for LSPs and PWs in the virtual cluster. You need to manually
configure BFD for TE tunnel and BFD for LSP outside the virtual cluster.
Fault Locating
If a traffic interruption occurs, perform the following operations to locate the fault:
l Perform end-to-end detection in the order of RSG -> master -> AP.
l Perform TE tunnel detection between the RSG and master.
l Perform PW detection between the master and AP.
l Perform LSP detection.
Figure 11-87 shows methods for locating faults between nodes.
[Figure 11-87: methods for locating faults between nodes, including ICMP ping at the IP layer]
As shown in Figure 11-87, methods for locating faults between nodes are classified into the
following categories:
l End-to-end fault locating
Internet Control Message Protocol (ICMP) ping/tracert can be used for Ethernet services
to implement end-to-end fault locating.
l Fault locating outside a virtual cluster
LSP ping/tracert can be used for Ethernet services to implement fault locating
outside the virtual cluster.
l Fault locating in a virtual cluster
– PW ping in label alert mode can be used for fault locating in a virtual cluster. You
can use a ping command on the master to detect PW continuity. Because the AP has
no route to the master, a reply packet must be transmitted over an LSP.
– LSP ping/tracert can also be used for fault locating in a virtual cluster. You can use
a ping or tracert command on the master to detect LSP continuity. Because the AP
has no route to the master, a reply packet must be transmitted over an LSP.
11.10.3 Application
Deployment Position
The Versatile Routing Platform (VRP) allows a virtual cluster to be deployed on an access
ring of an IP RAN. That is, you can deploy a virtual cluster on cell site gateways (CSGs) and
access aggregation gateways (AGGs). Figure 11-88 shows the deployment position of a
virtual cluster.
[Figure 11-88: deployment position of a virtual cluster. Labels: BTS, NodeB, eNodeB, CSG (AP), AGG (master), RSG, BSC, RNC, MME]
[Figure: ring topology with masters A to D and APs 1 to 13]
[Figure: ring topology with AGGs A to D and CSGs 1 to 13]
The procedure for deploying virtual clusters on the entire network is as follows:
1. Upgrade the A-D ring and ensure that D's interface connecting to AP13 runs a virtual
cluster and D's interface connecting to AP12 still runs a non virtual cluster, as shown in
Figure 11-91.
[Figure 11-91: ring topology with nodes A to D and 1 to 13]
[Figure: ring topology with nodes A to D and 1 to 13]
3. Upgrade C and AP7 to implement virtual clusters on the entire network, as shown in
Figure 11-93.
[Figure 11-93: ring topology with nodes A to D and 1 to 13]
Term Description
AP access point
VP virtual path
VC virtual circuit
12 QoS
This document describes the QoS in terms of the overview, principle, and applications.
The differentiated service model is called Diff-Serv for short. In the model, the application
program does not need to send its request for network resource before sending the packets.
The application program informs network nodes of its demand for QoS by using QoS
parameters in the IP packet header. Then ATNs along the path obtain the demand by
analyzing the header of the packet.
To implement Diff-Serv, the access ATN classifies packets and marks the class of service
(CoS) in the IP packet header. The downstream ATNs then identify the CoS and forward the
packets on the basis of CoS. Diff-Serv is therefore a class-based QoS solution.
[Figure: DS domain containing DS nodes, bordered by non-DS domains]
throughput. R bit: one bit that indicates reliability. C bit: one bit that indicates
cost. The lowest bit of the ToS field must be 0.
The ATN first checks the IP precedence of packets to implement QoS. The other bits are
not fully used.
The ToS octet of the IPv4 packet header is redefined in RFC 2474 as the DS field. As shown
in Figure 12-2, the leftmost 6 bits (0 through 5) of the DS field are used as the DSCP, and
the rightmost 2 bits (6 and 7) are reserved. The leftmost 3 bits (0 through 2) are the
Class Selector CodePoint (CSCP), which indicates a type of DSCP. A DS node selects the PHB
according to the DSCP value.
[Figure 12-2: bit layout (bits 0 through 7) of the IPv4 ToS field and of the DS field containing the DSCP]
The DSCP field within the DS field is capable of conveying 64 distinct codepoints. The
codepoint space is divided into three pools as shown in Table 12-1.
Code pool 1 (xxxxx0) is used for standards action. Code pool 2 (xxxx11) and code pool 3
(xxxx01) are used for experimental purposes or future extension.
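The relationship between the DS field, the DSCP, and the three code pools can be expressed in a few lines of Python. The function names below are chosen for this example only and are not part of any product interface.

```python
def dscp_from_ds_field(ds_field: int) -> int:
    """Extract the 6-bit DSCP from the 8-bit DS field (former ToS octet)."""
    if not 0 <= ds_field <= 0xFF:
        raise ValueError("DS field must fit in one octet")
    # The DSCP occupies the leftmost 6 bits; the rightmost 2 bits are dropped.
    return ds_field >> 2


def codepoint_pool(dscp: int) -> int:
    """Return the code pool (1, 2, or 3) that a DSCP value belongs to."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must be a 6-bit value")
    if dscp & 0b1 == 0:
        return 1            # xxxxx0: standards action
    if dscp & 0b11 == 0b11:
        return 2            # xxxx11: experimental use
    return 3                # xxxx01: experimental use or future extension


ef = dscp_from_ds_field(0xB8)   # 0xB8 = 10111000, so the DSCP is 101110 (46)
pool = codepoint_pool(ef)       # 101110 ends in 0, so it belongs to pool 1
```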
l Standard PHB
The DS node implements the PHB behavior on the data flow. The network administrator
can configure the mapping from DSCP to PHB. When a packet is received, the DS node
detects its DSCP to find the mapping from DSCP to PHB. If no matching mapping is
found, the DS node selects the default PHB (Best-Effort, DSCP=000000) to forward the
packet. All the DS nodes support the default PHB.
The following are the four standard PHBs defined by the IETF: Class selector (CS),
Expedited forwarding (EF), Assured forwarding (AF) and Best-Effort (BE). The default
PHB is BE.
– CS PHB
Service levels defined by the CS are the same as the IP precedence used on the
network.
The value of the DSCP is XXX000 where the value of "X" is either 1 or 0. When
the value of DSCP is 000000, the default PHB is selected.
– EF PHB
EF means that the flow rate should never be less than the specified rate from any
DS node. EF PHB cannot be re-marked in DS domain except on border node. New
DSCP is required to meet EF PHB features.
EF PHB is defined to simulate the forwarding of a virtual leased line in the DS
domain to provide the forwarding service with low drop ratio, low delay, and high
bandwidth.
– AF PHB
AF PHB allows traffic of a user to exceed the order specification agreed by the user
and the ISP. It ensures that traffic within the order specification is forwarded. The
traffic exceeding the specification is not simply dropped, but is forwarded at lower
service priorities.
Four classes of AF: AF1, AF2, AF3, and AF4 are defined. Each class of AF can be
classified into three different dropping priorities. AF codepoint AFij indicates AF
class is i (1<=i<=4) and the dropping priority is j (1<=j<=3). When providing AF
service, the carrier allocates different bandwidth resource for each class of AF.
A special requirement for AF PHB is that the traffic control cannot change the
packet sequence in a data flow. For instance, in traffic policing, different packets in
a service flow are marked with different dropping priorities even if the packets
belong to the same AF class. Although the packets in different service flows have
different dropping ratio, their sequence remains unchanged. This mechanism is
especially applicable to the transmission of multimedia service.
– BE PHB
BE PHB is the traditional IP packet transmission that focuses only on reachability.
All ATNs support BE PHB.
l Recommended DSCP
Different DS domains can have self-defined mapping from DSCP to PHB. RFC2474
recommends code values for BE, EF, AFij, and Class Selector Codepoints (CSCP).
CSCP is designed to be compatible with IPv4 precedence model.
– BE: DSCP=000000
– EF: DSCP=101110
– AFij codepoint
AFij codepoint is shown in Table 12-2.
In traffic policing:
n If j=1, the packet color is marked as green.
n If j=2, the packet color is marked as yellow.
n If j=3, the packet color is marked as red.
The first three bits of all codepoints in the same AF class are identical. For
example, the first three bits of AF1j are 001, those of AF3j are 011, and those of
AF4j are 100. Bit 3 and bit 4 indicate the dropping priority, which has three valid
values: 01, 10, and 11. The greater the value, the higher the dropping priority.
– Class selector codepoint
In the Diff-Serv standard, the CSCP is defined to make the DSCP compatible
with the precedence field of the IPv4 packet header. The ATNs identify the
priority of the packets through IP precedence. The IP precedence and the
CSCP parameters map with each other. The user should configure the values
for these parameters. In CSCP, the higher the value of DSCP=xxx000 is, the
lower the forwarding delay of PHB is.
The default mapping between DSCP and IPv4 precedence is shown in Table
12-3.
Table 12-3 The default mapping between IPv4 precedence and DSCP
IPv4 Precedence    DSCP (in binary)    DSCP (in decimal)    Service Class
0 000000 0 BE
1 001000 8 AF1
2 010000 16 AF2
3 011000 24 AF3
4 100000 32 AF4
5 101000 40 EF
6 110000 48 CS6
7 111000 56 CS7
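The AFij codepoints and the class selector values in Table 12-3 follow directly from the bit layout of the DSCP. The Python sketch below derives them. The function names are chosen for this example only.

```python
def af_dscp(af_class: int, drop_priority: int) -> int:
    """Return the DSCP of AFij: i in the first three bits, j in bits 3 and 4, last bit 0."""
    if not 1 <= af_class <= 4:
        raise ValueError("AF class i must be in the range 1 to 4")
    if not 1 <= drop_priority <= 3:
        raise ValueError("dropping priority j must be in the range 1 to 3")
    return (af_class << 3) | (drop_priority << 1)


def cs_dscp(precedence: int) -> int:
    """Return the class selector codepoint (xxx000) for an IPv4 precedence."""
    if not 0 <= precedence <= 7:
        raise ValueError("IPv4 precedence must be in the range 0 to 7")
    return precedence << 3


def precedence_of(dscp: int) -> int:
    """Return the IPv4 precedence carried in the first three bits of a DSCP."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must be a 6-bit value")
    return dscp >> 3


af11 = af_dscp(1, 1)         # 001010, decimal 10
af43 = af_dscp(4, 3)         # 100110, decimal 38
cs5 = cs_dscp(5)             # 101000, decimal 40, as in Table 12-3
ef_prec = precedence_of(46)  # EF (101110) carries precedence 5
```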
The MPLS packet header, defined in RFC 3032, is shown in Figure 12-3. The EXP field is
three bits long. Its value ranges from 0 to 7 and indicates the traffic type. By default, EXP
corresponds to IPv4 priority.
[Figure 12-3: MPLS header. LABEL (bits 0 to 19), EXP (bits 20 to 22), S (bit 23), TTL (bits 24 to 31)]
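The field layout of the MPLS header can be illustrated by decoding one 4-byte label stack entry. The following Python sketch is for illustration. The MplsEntry type and the parse_mpls_entry function are names chosen for this example.

```python
from typing import NamedTuple


class MplsEntry(NamedTuple):
    label: int  # 20 bits
    exp: int    # 3 bits, value 0 to 7
    s: int      # 1 bit, bottom-of-stack flag
    ttl: int    # 8 bits


def parse_mpls_entry(raw: bytes) -> MplsEntry:
    """Decode one 32-bit MPLS label stack entry in network byte order."""
    if len(raw) != 4:
        raise ValueError("an MPLS label stack entry is exactly 4 bytes")
    word = int.from_bytes(raw, "big")
    return MplsEntry(
        label=word >> 12,        # leftmost 20 bits
        exp=(word >> 9) & 0x7,   # next 3 bits
        s=(word >> 8) & 0x1,     # next 1 bit
        ttl=word & 0xFF,         # rightmost 8 bits
    )


# Label 16, EXP 5, bottom of stack, TTL 64 encodes as 00 01 0B 40.
entry = parse_mpls_entry(bytes([0x00, 0x01, 0x0B, 0x40]))
```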
It should be noted that QoS is an end-to-end solution, while MPLS only ensures that data can
enjoy the services regulated in SLA. After the data enters the IP network, IP network ensures
QoS.
l Traffic classification
l Traffic policing
l Traffic shaping
l Congestion management
l Congestion avoidance
Traffic classification is the basis of the QoS application. With this technique, packets are
identified based on certain mapping rules. This is a precondition for providing differentiated
services. Traffic policing, traffic shaping, congestion management, and congestion avoidance
control the network traffic and resource allocation from different aspects. They feature the
Diff-Serv concept. The following describes these techniques in detail:
For the common QoS features in the Diff-Serv model, see Figure 12-4.
[Figure 12-4: common QoS features in the Diff-Serv model, including traffic shaping on a DS node between non-DS domains]
In the Int-Serv model, the Resource Reservation Protocol (RSVP) is used as signaling for the
transmission of QoS requests. When a user needs QoS guarantee, the user sends a QoS
request to the network devices through the RSVP signaling. The request may be a
requirement for delay, bandwidth, or packet loss ratio. After receiving the RSVP request, the
nodes along the transfer path perform admission control to check the validity of the user and
the availability of resources. Then the nodes decide whether to reserve resources for the
application program. The nodes along the transfer path meet the request of the user by
allocating resources to the user. This ensures the QoS of the user services.
When implementing QoS in the Diff-Serv model, the ATN needs to identify each class of traffic.
The ATN classifies traffic in the following three ways:
l Complex traffic classification: Packets are classified using complex rules that combine
link layer, network layer, and transport layer information (such as source MAC address,
destination MAC address, source IP address, destination IP address, user group number,
protocol type, and application-specific TCP/UDP port number). Complex traffic
classification is performed on ATNs at the edge of a Diff-Serv domain.
l Simple traffic classification: Packets are classified based on the IP precedence, DSCP,
MPLS EXP, or 802.1p precedence that they carry. A collection of packets of the same class
is called a Behavior Aggregate (BA). Generally, the core ATNs in a Diff-Serv domain perform
only simple traffic classification.
l Forced traffic classification: You can run a command to configure forced traffic
classification on the inbound interface to set the precedence for the traffic. The traffic
is then forwarded to the outbound interface carrying the specified precedence.
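The difference between the three classification methods can be sketched in Python as follows. This is an illustration under stated assumptions, not the ATN implementation: the Packet and Rule types, the rule fields, and the fallback from complex to simple classification are all hypothetical, and the simple classifier assumes the default mapping from the three high-order DSCP bits to a service class.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed default: service class indexed by the three high-order DSCP bits.
BA_BY_PRECEDENCE = ["BE", "AF1", "AF2", "AF3", "AF4", "EF", "CS6", "CS7"]


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str
    dst_port: int
    dscp: int


@dataclass
class Rule:
    """Hypothetical complex-classification rule; None means 'match any'."""
    service_class: str
    src_ip: Optional[str] = None
    protocol: Optional[str] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        return ((self.src_ip is None or self.src_ip == pkt.src_ip)
                and (self.protocol is None or self.protocol == pkt.protocol)
                and (self.dst_port is None or self.dst_port == pkt.dst_port))


def simple_classify(pkt: Packet) -> str:
    """Simple classification: trust the marking the packet already carries."""
    return BA_BY_PRECEDENCE[pkt.dscp >> 3]


def complex_classify(pkt: Packet, rules: List[Rule]) -> str:
    """Complex classification: first matching multi-field rule wins."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.service_class
    return simple_classify(pkt)  # assumed fallback when no rule matches


def forced_classify(pkt: Packet, service_class: str) -> str:
    """Forced classification: the inbound interface dictates the class."""
    return service_class
```

A border node would typically hold the rule list, whereas a core node would call only the simple classifier.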
In a Diff-Serv domain, traffic policing and traffic shaping are performed by the traffic
conditioner. A traffic conditioner consists of four parts: Meter, Marker, Shaper, and Dropper,
as shown in Figure 12-5.
l Meter: Measures the traffic and judges whether the traffic complies with the
specifications defined in the traffic conditioning specification (TCS). Based on the result,
the ATN performs other actions through the Marker, Shaper, and Dropper.
l Marker: Re-marks the DSCP of the packet and puts the re-marked packet into the
specified BA. The available measures include lowering the color and service level of a
packet flow that does not match the traffic specifications (out-of-profile), or
maintaining the color and service level.
l Shaper: Buffers the received traffic and ensures that packets are sent at a rate not
higher than the committed rate.
l Dropper: Performs the traffic policing action, which controls the traffic by dropping
packets so that the traffic rate conforms to the committed rate. A Dropper can be
implemented by setting the Shaper buffer to 0 or a small value.
[Figure 12-5 labels: Packets, Classifier, Meter, Marker, Shaper/Dropper]
In Diff-Serv, ATNs must support traffic control on the inbound and outbound interfaces
simultaneously. The functions of an ATN vary with its location:
l The border ATN handles the access of a limited number of low-speed users, so traffic
control can be completed efficiently there. Most traffic classification and traffic
control is performed by the border ATN.
l The core ATN only performs PHB forwarding for the BA to which a packet flow belongs.
In this way, PHB forwarding is completed with high efficiency, which meets the
high-speed forwarding requirements of the Internet core network.
Causes of Congestion
Congestion often occurs in complex packet switching environment of the Internet. It is caused
by the bandwidth bottleneck of two types of links, as shown in Figure 12-6.
l Packets enter the ATN at high rate through v1, and are forwarded at low rate through v2.
Congestion occurs in the ATN because the rate of v1 is greater than that of v2.
l Packets from multiple links enter the ATN at the rate of v1, v2, and v3. They are
forwarded at the same rate of v4 through a single link. Congestion occurs in the ATN
because the total rate of v1, v2, and v3 is greater than that of v4.
Congestion can also result from the following causes:
l Packets enter the ATN at line speed.
l Resources such as available CPU time, buffer, or memory used for sending packets are
insufficient.
l Packets that arrive at the ATN within a certain period of time are not well controlled. As
a result, the network resources required to handle the traffic exceed the available
resources.
Congestion Results
Congestion has the following impacts:
l It increases the delay and jitter in sending packets. Long delays can cause packets to
be retransmitted.
l It reduces the effective throughput of the network and wastes network resources.
l It consumes more network resources, particularly storage resources, as congestion
worsens. If the resources are not properly allocated, they may be exhausted and the
system may crash.
Congestion is the main cause of low QoS. It is very common in complex networks and must
be resolved to increase the efficiency of the network.
Congestion Solutions
When congestion occurs or worsens, queue scheduling and packet discard policies can be
used to allocate network resources to the traffic of each service class. The commonly used
packet discard policies are as follows:
l Tail Drop
When the queue is full, subsequent packets that arrive are discarded.
l Random Early Detection (RED)
When the queue reaches a certain length, packets are discarded randomly. This avoids
the global synchronization caused by TCP slow start.
l Weighted Random Early Detection (WRED)
When discarding packets, the ATN considers both the queue length and the packet
precedence. Packets with low precedence are discarded first and with a higher probability.
The ATN adopts WRED to avoid congestion problems.
12.1.3.4 RSVP
Through RSVP signaling, requests for resources are transmitted between nodes on the entire
network. The nodes then allocate resources based on the priorities of requests.
RSVP is an end-to-end protocol.
Requests for resources are transmitted between nodes through RSVP, and the nodes allocate
resources in response. This is the process of resource reservation. Before accepting a
request, each node checks it against the currently available network resources. If the
current network resources are limited, certain requests can be rejected.
Different priorities can be set for different requests for resources. Therefore, a request with a
higher priority can preempt reserved resources when network resources are limited.
RSVP determines whether to accept requests for resources and promises to meet the accepted
requests. RSVP itself, however, does not implement the promised service. Instead, it uses the
techniques such as queuing to guarantee the requested service.
Network nodes need to maintain some soft state information for the reserved resource.
Therefore, the maintenance cost is very high when RSVP is implemented on large networks.
RSVP is therefore not recommended for the backbone network.
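The admission and preemption behavior described above can be sketched for a single node as follows. This Python sketch models only the accept, reject, and preempt decision. It does not model RSVP messages or soft-state refresh, and the class name, units, and the rule that lower-priority reservations are preempted lowest first are assumptions made for illustration.

```python
class ReservationNode:
    """One node that reserves bandwidth for flows (admission control only)."""

    def __init__(self, capacity_kbps: int):
        self.capacity = capacity_kbps
        self.reservations = {}  # flow id -> (bandwidth, priority)

    def available(self) -> int:
        """Bandwidth that is not reserved by any flow."""
        return self.capacity - sum(bw for bw, _ in self.reservations.values())

    def request(self, flow_id: str, bandwidth: int, priority: int) -> bool:
        """Admit the request if possible, preempting lower-priority flows."""
        # Candidates for preemption: strictly lower priority, lowest first.
        victims = sorted(
            (f for f, (_, p) in self.reservations.items() if p < priority),
            key=lambda f: self.reservations[f][1],
        )
        freed, chosen = self.available(), []
        for flow in victims:
            if freed >= bandwidth:
                break
            freed += self.reservations[flow][0]
            chosen.append(flow)
        if freed < bandwidth:
            return False  # rejected; existing reservations are untouched
        for flow in chosen:
            del self.reservations[flow]
        self.reservations[flow_id] = (bandwidth, priority)
        return True
```

On a 1000 kbit/s node that already holds a 600 kbit/s reservation of priority 1, a second 600 kbit/s request of the same priority is rejected, whereas one of priority 5 is admitted by preempting the first.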
Purpose
Controlling the traffic that users send to a network is important for ISPs. Controlling the
traffic of certain applications in an enterprise network is a means of managing the network
status. A typical application of traffic policing (TP) is to monitor the specification of a
type of traffic that enters a network. Based on the result, the traffic can be limited to a
reasonable range, or the traffic that exceeds the limit can be "punished." As a result, the
network resources and the interests of a carrier are protected.
A typical application of traffic shaping is to control the normal flow and burst flow of
outgoing traffic based on the network connection so that packets are sent at a
uniform rate.
Benefits
l This feature brings the following benefit to carriers: traffic that exceeds the limit is
punished, which protects the network resources and the interests of the carrier.
12.2.2 Principles
12.2.2.1 Basic Principles of Traffic Policing
– If the packet is green while Tc-B ≥ 0, the packet is remarked in green and Tc
decreases by B.
– If the packet is green or yellow while Tc-B < 0 and Tp-B ≥ 0, the packet is
remarked in yellow and Tp decreases by B.
– Otherwise, the packet is remarked in red and Tc and Tp remain unchanged.
l Double Rate Traffic Policing
When the network traffic is complex, the double rate traffic policing can be used.
In the double rate traffic policing, two token buckets are used. The capacity of one token
bucket is the committed burst size (CBS); this token bucket is called bucket C for short.
The capacity of the other token bucket is the peak burst size (PBS); this token bucket is
called bucket P for short. Tc and Tp represent the token quantity of the two buckets.
At initialization, Tc equals CBS and Tp equals PBS. CBS is less than PBS. The two buckets
are filled with tokens at different rates: the committed information rate (CIR) and the
peak information rate (PIR). An incoming packet is processed (colored, discarded, or
forwarded) according to the current token quantities of the two buckets.
Tokens are filled into bucket C at the rate of CIR and into bucket P at the rate of PIR.
When a bucket is full, that is, when Tc equals CBS or Tp equals PBS, the extra tokens
are dropped.
Tc is refreshed CIR times per second and Tp is refreshed PIR times per second. At each
refresh, the following rules are observed:
– If Tc < CBS, Tc increases by 1.
– If Tp < PBS, Tp increases by 1.
– Otherwise, Tc and Tp remain unchanged.
The processing also goes in the following two modes:
– The color-blind mode
– The color-aware mode
In the color-blind mode, when an arriving packet is measured (suppose the size of the
arriving packet is B), the following rules are observed:
– If Tp-B < 0, the packet is remarked in red.
– If Tp-B ≥ 0 and Tc-B < 0, the packet is remarked in yellow and Tp decreases by B.
– Otherwise, the packet is remarked in green and both Tc and Tp decrease by B.
In the color-aware mode, when an arriving packet is measured (suppose the size of the
arriving packet is B), the following rules are observed:
– If the packet is red or Tp - B < 0, the packet is remarked in red.
– If the packet is yellow, or if Tp - B ≥ 0 and Tc - B < 0, the packet is remarked in
yellow and Tp decreases by B.
– Otherwise, the packet is remarked in green and both Tc and Tp decrease by B.
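The double rate rules above can be sketched in Python as follows. The class name and the choice of bytes and seconds as units are assumptions. The refill is expressed as a continuous rate over an elapsed time rather than as individual one-token refreshes, which gives the same token counts. Passing no color selects the color-blind mode; passing the incoming color selects the color-aware mode.

```python
from typing import Optional


class TwoRateMarker:
    """Double rate policer with bucket C (CIR, CBS) and bucket P (PIR, PBS)."""

    def __init__(self, cir: float, cbs: float, pir: float, pbs: float):
        self.cir, self.cbs, self.pir, self.pbs = cir, cbs, pir, pbs
        self.tc, self.tp = cbs, pbs  # both buckets start full

    def refill(self, seconds: float) -> None:
        """Add tokens for the elapsed time; tokens beyond the capacity are lost."""
        self.tc = min(self.cbs, self.tc + self.cir * seconds)
        self.tp = min(self.pbs, self.tp + self.pir * seconds)

    def mark(self, size: float, color: Optional[str] = None) -> str:
        """Return the new color of a packet of the given size.

        color=None means color-blind mode; otherwise the incoming color
        ("green", "yellow", or "red") is taken into account.
        """
        if color == "red" or self.tp - size < 0:
            return "red"                  # no tokens are consumed
        if color == "yellow" or self.tc - size < 0:
            self.tp -= size
            return "yellow"
        self.tc -= size
        self.tp -= size
        return "green"
```

With CIR 1000, CBS 1500, PIR 2000, and PBS 3000, three back-to-back 1500-byte packets are marked green, yellow, and red in that order.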
...
[Figure labels: tokens fill the token bucket at a specified rate; packets are classified and then passed or dropped]
l Tokens are put into the TB at the rate preset by the user. The capacity of the TB is
also preset by the user. When the number of tokens reaches the capacity of the TB, the
number does not increase any more.
l On arrival, packets are classified according to information such as the IP
precedence, source address, or destination address. The packets that match the
preset features go into the TB for further processing.
l If the TB has enough tokens for sending a packet, the packet is forwarded, and the
number of tokens is reduced by the packet length. If the TB is empty or contains
insufficient tokens, the packets that cannot be assigned enough tokens are discarded, or
are redirected and re-sent. In this case, the number of tokens in the TB remains
unchanged.
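The token bucket procedure above can be sketched as follows. This is a minimal Python illustration, not the CAR implementation: the class name is hypothetical, tokens are counted in bytes, the bucket is assumed to start full, and a non-conforming packet is simply reported as dropped.

```python
class TokenBucket:
    """Single token bucket used for policing."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens (bytes) added per second
        self.capacity = capacity  # maximum number of tokens in the bucket
        self.tokens = capacity    # assumption: the bucket starts full

    def refill(self, seconds: float) -> None:
        """Add tokens for the elapsed time, never exceeding the capacity."""
        self.tokens = min(self.capacity, self.tokens + self.rate * seconds)

    def police(self, length: int) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        if self.tokens >= length:
            self.tokens -= length  # forwarding consumes tokens
            return True
        return False               # tokens stay unchanged on a drop
```

With a rate of 1000 bytes per second and a capacity of 1500 bytes, a first 1000-byte packet is forwarded and an immediate second one is dropped, leaving 500 tokens in the bucket.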
The main function of CAR is to limit the traffic rate. With the CAR technology, a TB is used
to measure the data traffic that flows through the ports of an ATN so that, in a specified
time, only the packets that are assigned tokens pass through the ATN. In this way, the traffic
rate is limited. CAR limits the maximum traffic rates of both incoming packets at the ingress
and outgoing packets at the egress. In addition, the rate of certain types of traffic can be
controlled according to such information as the IP address, port number, and precedence.
Traffic that does not match the preset conditions is not rate limited; such traffic is
forwarded at its original rate.
The CAR technology is used at the network edge to ensure that the core device can process
data normally.
l On the interface, configure different shaping parameters for the packets that participate
in the traffic shaping based on different service classes (EF, AF1, AF2, AF3, AF4, BE,
CS6, or CS7).
l The scheduling mode of the GTS queue can adopt either the PQ scheduling or the WFQ
scheduling. In the GTS queue, the scheduling modes of the packets with different service
levels have the following default values:
– For the AF1 to AF4 queue or the BE queue, by default, the WFQ scheduling is
configured. The bandwidth is allocated based on the configured weight parameters.
– For the EF, CS6, or CS7 queue, by default, the PQ scheduling is configured. Based
on the priority, the PQ scheduling is applied to the services that are sensitive to the
delay.
l When the GTS queue adopts the WFQ scheduling, the weight value can be configured,
which represents the ratio among the bandwidth occupied by all the WFQ queues.
l You can configure the shaping value on the interface, that is, the rate of putting the
tokens in the token bucket. If the rate of the packets exceeds this value, the packets enter
the GTS queue.
NOTE
The lengths of the frame header and CRC field are counted in the bandwidth of packets to
which CAR applies, but not in the bandwidth of packets to which traffic shaping applies.
For example, if the traffic shaping value is set to 23 Mbit/s for IPoE packets, the IP packets
are transmitted at a rate of 23 Mbit/s, with the lengths of the frame header and CRC field
not counted.
For the ATN, the depth of the token bucket is set by the system.
Processing Procedure
To reduce packet loss, packets need to be processed by GTS on the outbound interface of the
upstream ATN. The packets that exceed the GTS traffic features are cached in the
interface buffer of the upstream ATN. When the network congestion disappears, GTS extracts
the packets from the buffer queue and continues to send them. Therefore, the packets
sent to the downstream ATNs conform to the traffic specifications, and the packet loss ratio
on the downstream ATNs is decreased. If the packets are not processed by GTS on the
outbound interface of the upstream ATN, the packets that exceed the CAR specifications of
the downstream ATN will be dropped by the downstream ATN.
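The difference from policing is that a shaper buffers excess packets instead of dropping them. The following Python sketch illustrates this with a token bucket plus a FIFO buffer. The class name, the byte and second units, and the unlimited buffer size are assumptions; a real interface buffer is finite.

```python
from collections import deque
from typing import List


class Shaper:
    """Token bucket with a buffer: excess packets wait instead of being dropped."""

    def __init__(self, rate: float, depth: float):
        self.rate = rate      # tokens (bytes) added per second
        self.depth = depth    # token bucket depth
        self.tokens = depth   # assumption: the bucket starts full
        self.queue = deque()  # buffered packet lengths, oldest first

    def enqueue(self, length: int) -> None:
        """Buffer a packet until enough tokens are available."""
        self.queue.append(length)

    def tick(self, seconds: float) -> List[int]:
        """Add tokens for the elapsed time, then send packets in arrival order."""
        self.tokens = min(self.depth, self.tokens + self.rate * seconds)
        sent = []
        while self.queue and self.queue[0] <= self.tokens:
            length = self.queue.popleft()
            self.tokens -= length
            sent.append(length)
        return sent
```

If three 1000-byte packets arrive at once at a shaper with rate 1000 bytes per second and depth 1000, one packet leaves immediately and the others leave one per second, so the downstream device sees a uniform rate.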
Type               Advantage                              Disadvantage
Traffic shaping    Prevents unnecessary packet loss.      Introduces delay and jitter. More buffer
                                                          resources are needed to cache the packets.
Traffic policing   Supports marking. No extra buffer      Packet loss may lead to retransmission.
                   is needed.
12.2.3 Applications
Applications of Traffic Policing
Traffic policing is a traffic control policy that limits a traffic rate and resource usage through
monitoring the traffic specifications. Traffic policing is applied at the edge of a network to
ensure that core devices process data normally. Figure 12-8 is a typical networking diagram
for traffic policing.
[Figure 12-8 labels: WAN. Output interface is not congested; queuing and WRED do not work. Congestion in WAN network results in nonintelligent Layer 2 drops.]
When a network with a high traffic rate transmits data to a network with a low traffic rate,
the entrance of the low-rate network becomes a bottleneck and causes serious data loss. This
especially affects data that demands low latency, such as voice data. Traffic control limits
the transmission of data at a high rate; the voice data, however, must be forwarded first.
The traffic classification function can be used to assign high priority to the voice data
stream. CAR traffic policing and queue scheduling work together to ensure the quality of
communication.
The ATN supports interface-based traffic policing, which controls all the traffic that enters
the interface regardless of the packet type. Traffic policing can be applied to both
incoming and outgoing packets.
The CAR works together with other QoS policies to provide QoS control for an entire
network. When the CAR works together with other QoS policies, the working order of the
policies is as follows:
l When policies are configured at the ingress, all policies take effect before the
forwarding decision is made. If the CAR and other policies such as PQ and WFQ are
configured, the CAR takes effect earlier than the other policies.
l When policies are configured at the egress, all policies take effect after the
forwarding decision is made. If the CAR and other policies such as PQ and WFQ are
configured, the CAR takes effect earlier than the other policies.
[Figure 12-9 labels: ATN A, ATN B, physical line]
As shown in Figure 12-9, ATN A sends packets to ATN B. To reduce packet loss, the packets
are processed by GTS on the outbound interface of ATN A. The packets that do not conform
to the GTS traffic features are cached on ATN A. When ATN A can send the next batch of
packets, GTS extracts the packets from the buffer queue and sends them. Therefore, the
packets sent to ATN B conform to the traffic specifications of ATN B, and the packet loss
ratio on ATN B is decreased.
Terms
Terms                           Description
Committed Access Rate (CAR)     Committed Access Rate (CAR) limits the traffic volume of a
                                specified type of packets. A token bucket (TB) is used to
                                perform TP.
Generic Traffic Shaping (GTS)   The typical application of GTS is to control the volume and
                                burst of outgoing traffic based on the network connection so
                                that packets are sent at a uniform rate. Traffic shaping is
                                implemented by using a buffer and a token bucket. When packets
                                are sent too fast, they are first cached in the buffer and then
                                sent at a uniform rate under the control of the token bucket.
Traffic Shaping                 A typical application of traffic shaping is to control the flow
                                and burst of outgoing traffic based on the network connection
                                so that packets are sent at a uniform rate. Traffic shaping
                                adopts Generic Traffic Shaping (GTS) to shape traffic that is
                                irregular or does not conform to the preset traffic features,
                                which helps match the bandwidth between the upstream and
                                downstream of the network.
TP                              Traffic Policing
TS                              Traffic Shaping
Definition
l Congestion Avoidance
Congestion avoidance is a traffic control mechanism that monitors the usage of network
resources such as queues and buffer memory. When congestion tends to intensify, the ATN
actively discards packets to regulate network traffic so that the network is not
overloaded.
l Congestion Management
Congestion management provides means to manage and control traffic when traffic
congestion occurs. The queue scheduling technology is used to handle traffic congestion.
Packets sent from one interface are placed into many queues which are identified with
different priorities. Packets are then sent according to the priorities. A proper queue
scheduling mechanism can provide packets of different types with reasonable QoS
features such as the bandwidth, latency, and jitter. The queue here refers to the outgoing
packet queue. Packets are buffered into queues before the interface is able to send them.
Therefore, the queue scheduling mechanism works only when an outbound interface is
congested. The queue scheduling mechanism can re-arrange the order of packets except
those in First In First Out (FIFO) queues.
Purpose
Congestion avoidance and congestion management are traffic control mechanisms that regulate
network traffic so that the network is not overloaded.
Benefits
l This feature brings the following benefit to users: packets with high priorities are
preferentially transmitted when traffic congestion occurs.
12.3.2 Principles
12.3.2.1 Basic Principles of Congestion Avoidance
Congestion avoidance is a traffic control mechanism used to discard packets according to the
queue status when the network is congested. Through congestion avoidance, the QoS of
traffic is improved when the network is congested.
The traditional solution adopted by congestion avoidance is tail drop. That is, all newly
arriving packets are discarded once the queue is full. If a large number of packets from a
TCP connection are discarded, the TCP connection times out and enters the slow start state,
and then sends fewer packets. When packets from multiple TCP connections are discarded in a
queue, these TCP connections enter the congestion avoidance and slow start state at the same
time, which is referred to as global TCP synchronization. These TCP connections then
simultaneously send fewer packets to the queue, so the rate of incoming packets falls below
the rate of outgoing packets and the bandwidth usage drops.
To avoid the preceding problems, packets must be discarded before the queue becomes
congested. WRED is a congestion avoidance mechanism that discards packets to prevent
queues from being congested. WRED discards packets that may cause congestion with a
probability that increases as the queue grows. Therefore, the bandwidth consumed by TCP
connections on the outgoing interface is reduced gradually, which does not cause global
synchronization of a large number of TCP connections. This also reduces the average queue
length and shortens the delay for sending traffic.
The ATN uses both the tail-drop and the WRED algorithms for congestion avoidance. In the
Diff-Serv model, the ATN preserves eight service queues for each port. The queues map the
following service types respectively: BE, AF1 to AF4, EF, CS6, and CS7. By default, AF1 to
AF4 and BE queues are applied with the WFQ scheduling; they are allocated with bandwidth
according to configured weight parameters. EF, CS6, and CS7 queues are configured to the
PQ scheduling by default.
RED Algorithm
To avoid global TCP synchronization, you can use the random early detection (RED)
mechanism. RED is a mechanism for detecting congestion. You can define a type of
traffic so that when the length of a queue exceeds a limit, the router starts to discard a
proportion of the packets at random before the queue is full. Figure 12-10 shows the working
principle of RED.
[Figure 12-10: drop probability (10% marked) plotted against average queue size, with a minimum threshold of 20 and a maximum threshold of 40]
According to the RED algorithm, each queue is set with a minimum threshold and a
maximum threshold:
l When a queue is shorter than the minimum threshold, the router does not discard
packets.
l When a queue is longer than the maximum threshold, the router discards all incoming
packets.
l When the length of a queue is between the minimum threshold and the maximum
threshold, the router discards packets at random. Each arriving packet is assigned a
random number, which is compared with the current discarding probability of the queue.
If the number is less than the discarding probability, the packet is discarded. The
longer the queue, the higher the drop probability. The probability, however, cannot
exceed the maximum value.
If the router compared the instantaneous queue length with the minimum and maximum
thresholds, that is, if it judged by the absolute length of the queue, the mechanism would be
unfair to burst traffic and unfavorable for data transmission. Therefore, the router instead
compares the average queue size (AQS) with the minimum threshold and the maximum
threshold. The AQS reflects the changing tendency of a queue and is insensitive to abrupt
changes of the queue length. This prevents burst traffic from being treated unfairly.
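The RED decision can be sketched in Python as follows. A packet is dropped when a uniform random draw falls below the current drop probability, so a higher probability means more drops. Several details are assumptions made for illustration and are not stated in this document: the average queue size is computed as an exponentially weighted moving average with a hypothetical weight parameter, and the drop probability rises linearly between the two thresholds. The threshold values in the usage note echo the numbers printed in Figure 12-10.

```python
import random


class RedQueue:
    """RED drop decision based on the average queue size (AQS)."""

    def __init__(self, min_th, max_th, max_p, weight=0.002, rng=None):
        self.min_th, self.max_th, self.max_p = min_th, max_th, max_p
        self.weight = weight  # assumed smoothing factor for the average
        self.avg = 0.0        # average queue size
        self.rng = rng or random.Random()

    def update_average(self, queue_len: float) -> float:
        """Fold the instantaneous queue length into the moving average."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self) -> float:
        """Return 0 below min_th, 1 above max_th, and a linear ramp in between."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg > self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, queue_len: float) -> bool:
        """Update the average and decide whether to drop the arriving packet."""
        self.update_average(queue_len)
        return self.rng.random() < self.drop_probability()
```

With a minimum threshold of 20, a maximum threshold of 40, and a maximum drop probability of 10%, an average queue size of 30 gives a drop probability of 5%. Because the average changes slowly, a short burst barely moves the drop probability.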
The random discarding of packets in the RED mechanism prevents the traffic rates of many
TCP connections from dropping simultaneously. In this way, global TCP synchronization is
avoided. When the packets of one TCP connection are discarded and its traffic rate decreases,
packets of other TCP connections are still sent at a high rate. At any time, some TCP
connections are sending packets at a high rate. As a result, the bandwidth of the link is
fully used.
[Figure 12-11 labels: average link use; Flow A, Flow B, Flow C]
Figure 12-11 shows that when the RED congestion avoidance algorithm is used, the traffic
flows on a network are stable.
WRED Algorithm
The RED algorithm can better solve the problem of the global TCP synchronization. This
algorithm, however, cannot sense any QoS signaling: all types of packets are considered
equally. Therefore, this algorithm is less flexible. To adopt differentiated discarding policies
to different types of packets, the weighted random early detection (WRED) algorithm is
introduced.
The WRED algorithm is similar to the RED algorithm. In the WRED algorithm, each queue is
also set with a minimum threshold and a maximum threshold:
l When a queue is shorter than the minimum threshold, the ATN does not discard packets.
l When a queue is longer than the maximum threshold, the ATN discards all incoming
packets.
l When the length of a queue is between the minimum threshold and the maximum
threshold, the ATN discards packets at random. Each arriving packet is assigned a
random number, which is compared with the current discarding probability of the queue.
If the number is less than the discarding probability, the packet is discarded. The
longer the queue, the higher the discarding probability. The probability, however,
cannot exceed the upper limit.
l In addition, the average queue length is compared with the minimum threshold
and the maximum threshold so that burst traffic is processed fairly.
NOTE
The longer the queue, the higher the drop probability. When the queue lengths are the same, the higher
the maximum drop probability, the higher the drop probability.
Unlike in the RED algorithm, the drop decision in the WRED algorithm is based on the
precedence. In the WRED mechanism, the DSCP value that indicates the IP precedence is
introduced to identify discarding policies. You can set different queue lengths, queue
thresholds, and drop probabilities for different DSCP values so that packets of different
precedence are subject to different discarding probabilities. This is the important feature
of the WRED algorithm.
l When the weighted fair queuing (WFQ) is used in the queuing mechanism, packets of
different precedence can be set with different minimum threshold, maximum threshold,
and drop probability. In this way, packets of different precedence are provided with
different discarding features.
l When FIFO or PQ is used in the queuing mechanism, you can set a different minimum
threshold, maximum threshold, and drop probability for each queue so that packets of
different types are provided with different discarding features.
Figure 12-12 shows the relationships between the WRED and queues.
[Figure 12-12 labels: Classifying; Queue N-1 (weight N-1), Queue N (weight N); Scheduling; Forwarded packets; Dropped packets]
You can configure a template on a device to implement WRED. First define WRED
templates: set the maximum and minimum thresholds for packets in different colors and
set the drop probability. Then apply the WRED templates for different levels of quality
on the interface. You can configure a maximum of eight WRED templates for queues on
an interface. Each template supports the processing of packets of no more than three
colors: red, yellow, and green. Generally, green packets are set to a low drop probability
and a high threshold, whereas red packets are set to a high drop probability and a low
threshold. You can flexibly configure different thresholds and drop probabilities for
packets of different colors.
When traffic congestion occurs, a queue begins to buffer packets. According to the
classification of packets, red packets are set to low threshold and high drop probability;
therefore, the red packets begin to be dropped first. When the queue is long enough,
green packets begin to be dropped. When the queue length reaches the maximum
threshold of a color, packets of this color begin to be applied with the tail-drop policy.
Because the WFQ queues share the bandwidth in proportion, traffic congestion occurs
easily. The use of the WRED policy can effectively prevent the global TCP
synchronization.
Currently, the device supports the application of the WRED policy only on outbound
interfaces.
l Congestion Avoidance of Flow Queues
The downstream FQs support the WRED and tail drop mechanisms.
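The per-color behavior of a WRED template can be sketched as follows. The template values below are hypothetical and chosen only for illustration; they are not defaults of the device. As in the description of WRED templates in this section, red packets get the lowest thresholds and the highest drop probability, and once the queue length reaches the maximum threshold of a color, all packets of that color are dropped. The linear ramp between the thresholds is an assumption.

```python
import random

# Hypothetical WRED template:
# color -> (low threshold, high threshold, maximum drop probability)
TEMPLATE = {
    "green":  (60, 80, 0.10),
    "yellow": (40, 60, 0.20),
    "red":    (20, 40, 0.30),
}


def wred_drop_probability(avg_queue_len, color, template=TEMPLATE):
    """Return the drop probability for a packet of the given color."""
    low, high, max_p = template[color]
    if avg_queue_len < low:
        return 0.0
    if avg_queue_len >= high:
        return 1.0  # tail drop for this color
    return max_p * (avg_queue_len - low) / (high - low)


def wred_should_drop(avg_queue_len, color, rng=random):
    """Drop when a uniform random draw falls below the drop probability."""
    return rng.random() < wred_drop_probability(avg_queue_len, color)
```

With these values, at an average queue length of 50 all red packets are dropped, yellow packets are dropped with a probability of 10%, and green packets are not dropped at all.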
[Figure 12-13 labels: packets to be sent from this interface; scheduling]
As shown in Figure 12-13, FIFO does not classify packets. When packets enter the
interface at a rate higher than the interface can support, FIFO lets the packets
that arrive earlier enter the queue first. At the outbound interface, FIFO lets the packets
leave the interface in the same order in which they entered. This is
called first in, first out.
In the FIFO mechanism, if a queue is defined to be too long, the queue does not fill up
easily and fewer packets are discarded, but the long queue results in long latency. If a
queue is defined to be too short, latency is short but more packets are discarded. In
configuration, you must balance the two factors to achieve a favorable result. The same
problem also exists in other queue scheduling mechanisms.
l PQ
In the Priority Queuing (PQ) mechanism, queues are generally classified into four levels,
namely, top, middle, normal, and bottom, from high to low in priority.
NOTE
On the device, queues are classified into eight priority levels, from 0 to 7.
As shown in Figure 12-14, when packets arrive, PQ organizes the packets into four
classes. Each class of packets is sent to one of the four PQ queues.
When packets leave a queue, PQ lets the packets from the queue of the top priority go
first. Packets from this queue keep being sent until the queue is empty. When the packets
from the queue of the top priority are all sent, packets from the queue of middle priority
are sent. When the packets from the queue of the middle priority are all sent, the packets
from the queue of the normal priority are sent; finally, the packets from the queue of the
bottom priority are sent.
In this way, packets from the queue of high priority are sent earlier according to the
classification. When congestion occurs, packets from the queue of high priority are still
allowed to leave earlier. This ensures that the packets of important services such as the
enterprise resource planning (ERP) service are handled earlier. The packets of less
important services such as the email service are handled only after the packets of
important services are all sent and the network is idle. As a result, key services are
handled first and network resources are also fully used.
PQ has the following features:
– ACLs can be used for packets classification and then classified packets are put into
different queues as required.
– The tail drop mechanism is used as the only packet drop policy when congestion
occurs.
– The FIFO is used in the queue internally.
– In queue scheduling, packets from the queue of high priority are scheduled first.
PQ has the following obvious advantages and disadvantages:
– Advantages: Packets from the queue of high priority are provided with higher
bandwidth, lower latency, and less jitter.
– Disadvantages: Packets from the queue of low priority are not scheduled in time so
that they keep "starving."
l WFQ
Weighted fair queuing (WFQ) is a complex queuing algorithm. With this algorithm,
services of the same priority are processed in a fair manner, and services of different
priorities are processed according to their weights.
Figure 12-15 shows the WFQ queuing principle.
[Figure 12-15: WFQ queuing. Packets to be sent from this interface are classified into
queues (up to Queue N) and then scheduled.]
The queue scheduling for the device consists of three stages: the traffic rate limit, the default
SQ queue scheduling and the FQ queue scheduling on the interface. The following are
configurable parameters for queues:
In the Diff-Serv model, the ATN reserves eight service queues for each interface. These
queues map to the service types BE, AF1 to AF4, EF, CS6, and CS7. By default, the AF1 to
AF4 and BE queues use the WFQ scheduling scheme, in which bandwidth is distributed
proportionally according to the preset weights. The EF, CS6, and CS7 queues use the PQ
scheduling scheme by default, which is based on absolute priorities. PQ scheduling is
used for latency-sensitive services.
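The default combination of PQ and WFQ described above can be illustrated with a minimal
Python sketch. It is not the device implementation: the class name PortScheduler, the
strict order CS7 > CS6 > EF, and the WFQ weights are assumptions made for illustration,
and the WFQ part uses a simplified virtual finish-time rule.

```python
from collections import deque

PQ_ORDER = ["CS7", "CS6", "EF"]  # strict priority, highest first (order assumed)
WFQ_WEIGHTS = {"AF4": 4, "AF3": 3, "AF2": 2, "AF1": 1, "BE": 1}  # hypothetical weights


class PortScheduler:
    """Serves PQ queues strictly first, then WFQ queues by virtual finish time."""

    def __init__(self):
        self.queues = {cos: deque() for cos in PQ_ORDER + list(WFQ_WEIGHTS)}
        self.last_finish = {cos: 0.0 for cos in WFQ_WEIGHTS}
        self.virtual_time = 0.0

    def enqueue(self, cos, size):
        tag = 0.0
        if cos in WFQ_WEIGHTS:
            # A larger weight gives a smaller finish tag, so the queue is served sooner.
            start = max(self.virtual_time, self.last_finish[cos])
            tag = start + size / WFQ_WEIGHTS[cos]
            self.last_finish[cos] = tag
        self.queues[cos].append((tag, size))

    def dequeue(self):
        # PQ: a lower-priority queue is served only when all higher ones are empty.
        for cos in PQ_ORDER:
            if self.queues[cos]:
                return cos, self.queues[cos].popleft()[1]
        # WFQ: pick the head packet with the smallest finish tag.
        heads = [(self.queues[cos][0][0], cos) for cos in WFQ_WEIGHTS if self.queues[cos]]
        if not heads:
            return None
        tag, cos = min(heads)
        self.virtual_time = tag
        return cos, self.queues[cos].popleft()[1]
```

In this sketch an EF packet always leaves before any AF or BE packet, while AF4 receives
roughly four times the bandwidth of BE when both queues stay backlogged.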
12.3.3 Applications
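[Figure 12-16: Networking diagram. Server (10.1.1.2/24), Telephone (10.1.1.3/24),
PC1 (10.1.1.4/24), and PC2 (10.1.1.5/24) connect to ATN A, which connects to CX and the
network. Interface labels and addresses shown: GE1, 10.1.1.1/24, S0, S1, GE2,
100.1.1.1/24.]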
As shown in Figure 12-16, devices Server, Telephone, PC1 and PC2 all send data to the
network through ATN A. The data sent from Server is of critical traffic class; the data sent
from Telephone is of voice services; the data from PC1 and PC2 is of normal services.
Because the rate of the inbound interface GE0/3/0 on ATN A is greater than that of the
outbound interface GE0/3/1, congestion may occur on GE0/3/1.
When network congestion occurs, the data sent by Server and Telephone must be transmitted
first. Users PC1 and PC2 can tolerate a small delay in the transmission of their data, but
they also require a bandwidth guarantee because they are VIP users. Therefore, ATN A must
discard packets based on packet priority when the network congestion intensifies.
ATN A sends packets through the S0 interface to CX. Because the bandwidth of the S0
interface is less than that of the S1 interface, the S0 interface on ATN A is prone to
congestion.
Queuing technologies are needed to manage and control the congested interface. First,
classify the packets to be sent from the S0 interface and place them into different queues.
Then process the queues according to their priorities. Packets of high priority are
handled first.
Terms
Term: Description
Congestion: A network state in which the network traffic exceeds the supported value
so that the system cannot process network messages normally.
First In First Out Queuing (FIFO): Packets enter and leave the queue in the order in
which they reach the interface.
Priority Queue (PQ): Priority queuing classifies packets based on the information they
carry and sends the packets based on the specified priorities. Priority queuing ensures
larger bandwidth, lower delay, and less jitter for the queue of higher priority. Packets
in the queue of lower priority cannot be scheduled immediately and may be dropped.
Weighted Fair Queue (WFQ): Weighted fair queuing classifies traffic dynamically based
on the quintuple (source IP address, destination IP address, protocol number, source
port number, and destination port number). A hash algorithm is used to map the flows to
different queues, and bandwidth is allocated to each flow based on the priority of the
flow. In some cases, the ToS field is used.
Weighted Random Early Detection (WRED): A mechanism for detecting traffic congestion.
You can define a type of traffic so that when the length of a queue exceeds a limit, the
ATN randomly discards a proportion of packets in advance. This mechanism introduces
weights on the basis of random early detection (RED).
Abbreviations
Abbreviation Full Spelling
FQ Fair Queue
PQ Priority Queue
12.4.1 Introduction
Definition
Traffic classification groups packets according to rules defined on specific information
contained in the packets, and then applies different QoS policies to the packets that
match different rules.
Based on the matching rules, traffic classification is divided into simple traffic
classification and complex traffic classification.
Purpose
Traffic classification provides differentiated services for the traffic of users in the Diff-Serv
domain.
Due to the characteristics of the traffic model and service model on the IP network, the
Internet backbone network needs to serve thousands of traffic flows at the same time. As a
result, the approach of reserving bandwidth for each flow does not scale, which seriously
restricts the application of IntServ on practical networks. IntServ is also restricted by
other factors, such as the large-scale deployment of RSVP signaling, the interworking
between devices of different manufacturers, and service-based management (including
authentication and accounting). Since 1994, IntServ has not been used for commercial
purposes.
The Diff-Serv, however, is a class-based QoS technology. On the ingress of the network, Diff-
Serv is used to implement traffic classification and traffic control based on the service
requirements and set the ToS fields of the packets. Diff-Serv is also used to differentiate the
communications based on the values of the ToS fields in packets and provide QoS services
including resource allocation, queue scheduling, and packet discarding policy, which are
called Per Hop Behaviors (PHBs). All the nodes in the Diff-Serv domain abide by the PHB
based on the DSCP fields of packets. The Diff-Serv model classifies services, which improves
the service scalability.
The Diff-Serv model provides different services for different types of traffic. Therefore, in the
Diff-Serv model, the service traffic needs to be classified based on the service requirements,
which is the prerequisite and basis for the differentiated service.
Benefits
l This feature brings the following benefits to carriers:
Class-based QoS provides different QoS services by differentiating users and services.
12.4.2 Principles
Simple traffic classification is divided into two types: upstream simple traffic
classification and downstream simple traffic classification.
l Upstream simple traffic classification: Based on DSCP values of IP packets, EXP values
of MPLS packets, and 802.1p values of VLAN packets, the packets are classified into
eight CoSs (CS7, CS6, EF, AF4 to AF1, and BE) and marked with three colors (green,
yellow, and red). When the CoS of packets is EF, BE, CS6, or CS7, by default the
packets can be re-marked in green. Upstream simple traffic classification is used to
differentiate services such as voice, video, and data services. During congestion
management and queue scheduling, different services enter different queues. Therefore,
different scheduling solutions are implemented. For example, voice services can enter
the PQ queue of a higher priority, and short delay is ensured. If upstream simple traffic
classification is not implemented, the service type of all the packets is BE.
l Downstream simple traffic classification: Based on the CoSs (CS7, CS6, EF, AF4 to
AF1, and BE) and three colors (green, yellow, and red), DSCP values of IP packets, EXP
values of MPLS packets, or 802.1p values of VLAN packets are re-set. Downstream
simple traffic classification implements the re-marking function, that is, re-marking
DSCP values of IP packets, EXP values of MPLS packets, or 802.1p values of VLAN
packets.
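[Figure: BA mapping and PHB mapping. BA mapping maps the 802.1p, DSCP, or MPLS EXP value
of a packet to a service class and color. PHB mapping maps the service class and color to
an 802.1p, DSCP, or MPLS EXP value.]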
Table 12-5 Default mapping between DSCP values of IP packets and CoSs in default domain
DSCP CoS Color DSCP CoS Color
01 BE Green 33 BE Green
03 BE Green 35 BE Green
05 BE Green 37 BE Green
07 BE Green 39 BE Green
09 BE Green 41 BE Green
11 BE Green 43 BE Green
13 BE Green 45 BE Green
15 BE Green 47 BE Green
17 BE Green 49 BE Green
19 BE Green 51 BE Green
21 BE Green 53 BE Green
23 BE Green 55 BE Green
25 BE Green 57 BE Green
27 BE Green 59 BE Green
29 BE Green 61 BE Green
31 BE Green 63 BE Green
Table 12-6 Default mapping between 802.1p values of VLAN packets and CoSs in the default
domain template
802.1p CoS Color 802.1p CoS Color
Table 12-7 Default mapping between EXP values of MPLS packets and CoSs
EXP CoS Color EXP CoS Color
Table 12-8 Default mapping between the CoS value and the DSCP value in default domain
Service Color DSCP
BE Green 0
AF1 Green 10
AF1 Yellow 12
AF1 Red 14
AF2 Green 18
AF2 Yellow 20
AF2 Red 22
AF3 Green 26
AF3 Yellow 28
AF3 Red 30
AF4 Green 34
AF4 Yellow 36
AF4 Red 38
EF Green 46
CS6 Green 48
CS7 Green 56
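The downstream re-marking defined by Table 12-8 can be expressed as a simple lookup. The
following Python sketch copies the table values; the names COS_COLOR_TO_DSCP, remark_dscp,
and af_dscp are hypothetical. The helper af_dscp only illustrates that the AF rows of the
table follow the standard AF code point pattern (8 x class + 2 x drop precedence).

```python
# (CoS, color) -> DSCP, copied from Table 12-8.
COS_COLOR_TO_DSCP = {
    ("BE", "Green"): 0,
    ("AF1", "Green"): 10, ("AF1", "Yellow"): 12, ("AF1", "Red"): 14,
    ("AF2", "Green"): 18, ("AF2", "Yellow"): 20, ("AF2", "Red"): 22,
    ("AF3", "Green"): 26, ("AF3", "Yellow"): 28, ("AF3", "Red"): 30,
    ("AF4", "Green"): 34, ("AF4", "Yellow"): 36, ("AF4", "Red"): 38,
    ("EF", "Green"): 46,
    ("CS6", "Green"): 48,
    ("CS7", "Green"): 56,
}


def remark_dscp(cos, color):
    """Return the DSCP value to write into an outgoing IP packet.

    Raises KeyError for combinations that Table 12-8 does not list.
    """
    return COS_COLOR_TO_DSCP[(cos, color)]


def af_dscp(af_class, color):
    """Compute the AF code point from the class number (1 to 4) and the color."""
    drop_precedence = {"Green": 1, "Yellow": 2, "Red": 3}[color]
    return 8 * af_class + 2 * drop_precedence
```

For example, remark_dscp("AF3", "Yellow") returns 28, the same value that
af_dscp(3, "Yellow") computes.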
The default mapping between the CoS value and the 802.1p value is shown in Table 12-9.
Table 12-9 Mappings from QoS CoSs and colors to 802.1p priorities in the default domain
template
CoS Color 802.1p
BE Green 0
EF Green 5
CS6 Green 6
CS7 Green 7
The default mapping between the CoS value and the EXP value is shown in Table 12-10.
Table 12-10 Default mapping between the CoS value and the EXP value
Service Color MPLS EXP
BE Green 0
EF Green 5
CS6 Green 6
CS7 Green 7
Traffic Classifiers
A classifier is a set of conditions for classifying packets. Packets are classified
according to certain fields in the packets.
Multiple matching rules can be defined in a classifier. The default relationship between
the rules is "OR": the corresponding behaviors are applied to a packet when the packet
matches any one of the rules. The relationship between the rules can be specified by
setting the parameter operator. It can be specified only when the traffic classifier is
created and cannot be changed afterward.
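The effect of the operator parameter can be sketched in Python as follows. The Classifier
class and the packet representation (a dictionary of fields) are hypothetical; the frozen
dataclass mirrors the restriction that the relationship is fixed when the classifier is
created.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)  # frozen: the operator cannot be changed after creation
class Classifier:
    rules: Tuple[Callable[[dict], bool], ...]
    operator: str = "or"  # "or" is the default relationship between rules

    def __post_init__(self):
        if self.operator not in ("or", "and"):
            raise ValueError("operator must be 'or' or 'and'")

    def matches(self, packet):
        results = (rule(packet) for rule in self.rules)
        return all(results) if self.operator == "and" else any(results)


# Two illustrative rules: one on the DSCP field, one on the VLAN ID.
rules = (lambda p: p["dscp"] == 46, lambda p: p["vlan"] == 20)
packet = {"dscp": 46, "vlan": 10}

or_classifier = Classifier(rules)                   # matches if any rule matches
and_classifier = Classifier(rules, operator="and")  # matches only if all rules match
```

With the sample packet, or_classifier.matches(packet) is true because the DSCP rule
matches, whereas and_classifier.matches(packet) is false because the VLAN rule does not.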
Traffic Behaviors
Traffic classification is performed to provide differentiated services. Therefore, traffic
classification is useful only after it is associated with certain traffic control actions or resource
distribution actions. The traffic behaviors are as follows (these behaviors can be used
together):
l Deny or permit
The deny or permit action is the simplest traffic behavior. The network traffic can be
controlled by permitting packets to pass through or denying packets.
l Traffic policing
Traffic policing, also called CAR, is one of the traffic behaviors. Through CAR,
operators can set the maximum volume of traffic for various services at the network
edge and control the usage of network resources, which ensures QoS on the entire
network. Operators sign service level agreements (SLAs) for cooperation. An SLA
contains parameters such as the Committed Information Rate (CIR), Peak
Information Rate (PIR), Committed Burst Size (CBS), and Peak Burst Size (PBS) of
various service traffic. For traffic that exceeds the agreed limit, the device takes
actions such as passing the packets, dropping them, or re-marking their priorities.
l Re-marking
Re-marking marks service traffic with classes according to the SLA and the results of
traffic classification. At present, the related RFCs define six types of standard
services (EF, AF1 to AF4, and BE) and specify the requirements for implementing these
services by defining their PHBs, that is, the requirements for how the device processes
these services. EF traffic requires short delay, low jitter, and a low packet
loss ratio, and corresponds to real-time services such as video services, voice services,
and video conferences. AF traffic requires short delay, a low packet loss ratio, and high
reliability, and corresponds to services that have high requirements for data reliability,
such as e-business and enterprise VPNs. BE traffic has no requirement for the CIR or
delay, and corresponds to traditional Internet services. The device can specify the service
type of packets, that is, EF, AF1 to AF4, or BE.
l Redirect
The redirect action indicates that the device does not forward packets according to their
original destination addresses but forwards them to a specified next
hop. In this manner, policy-based routing is implemented. Currently, the redirect action
is valid only for Layer 3 packet forwarding.
The device can implement multiple types of redirect action.
– IPv4 strong redirection
If a user specifies the next-hop IP address and outgoing interface of a packet, the
device does not need to search the FIB table for an entry for forwarding the packet.
The device can directly send the packet to the outgoing interface specified by the
user. The packet can be sent after being encapsulated with the ARP information on
the outgoing interface. If the outgoing interface is Down, the packet is discarded
and is not forwarded according to the original destination address.
– IPv4 weak redirection
When a user specifies the next-hop IP address of a packet but does not specify the
outgoing interface of the packet, the device searches the FIB table according to the
next-hop IP address configured by the user for an entry for forwarding the packet. If
the path specified by the user is available, the device forwards the packet along the
path. If the path specified by the user is unavailable, the device forwards the packet
according to the original destination address of the packet.
l Security
Security actions apply measures such as port mirroring or traffic sampling to
packets. Security actions are not QoS measures but can be used together with other
actions to improve the security of the network and packets.
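Of the behaviors listed above, traffic policing is the most algorithmic. The following
Python sketch shows one common way to combine CIR, CBS, PIR, and PBS: the color-blind
two-rate three-color marker of RFC 2698. This document does not state which metering
algorithm the device uses, so the algorithm choice, the class name TwoRateMarker, and the
per-color actions in ACTIONS are assumptions for illustration only.

```python
class TwoRateMarker:
    """Color-blind two-rate three-color marker (after RFC 2698).

    Rates are in bytes per second, burst sizes and packet sizes in bytes.
    """

    def __init__(self, cir, cbs, pir, pbs):
        self.cir, self.cbs, self.pir, self.pbs = cir, cbs, pir, pbs
        self.tc = cbs   # committed bucket starts full
        self.tp = pbs   # peak bucket starts full
        self.last = 0.0

    def color(self, now, size):
        # Refill both buckets for the elapsed time, capped at the burst sizes.
        elapsed = now - self.last
        self.last = now
        self.tc = min(self.cbs, self.tc + self.cir * elapsed)
        self.tp = min(self.pbs, self.tp + self.pir * elapsed)
        if self.tp < size:          # exceeds the peak rate
            return "red"
        if self.tc < size:          # within the peak rate, above the committed rate
            self.tp -= size
            return "yellow"
        self.tc -= size             # within the committed rate
        self.tp -= size
        return "green"


# Hypothetical mapping from color to the actions named in the text.
ACTIONS = {"green": "pass", "yellow": "re-mark", "red": "drop"}
```

Traffic within the CIR is marked green, traffic between the CIR and the PIR is marked
yellow, and traffic above the PIR is marked red. The action taken for each color would be
chosen according to the SLA.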
Traffic Policies
A traffic policy is an integrated QoS policy formed by associating traffic classifiers with QoS
behaviors. A traffic policy can be applied to interfaces, thus applying traffic classifiers and
behaviors defined in the traffic policy.
The traffic policy supports two attributes: shared and unshared. The shared attribute
indicates that different interfaces use the same traffic policy and share one set of
traffic classifier and traffic behavior entries. The unshared attribute indicates
that different interfaces use the same traffic policy but use separate sets of traffic
classifier and traffic behavior entries generated based on interfaces and VLANs.
When two interfaces use the same traffic policy and the attribute of the traffic policy
is shared, the two interfaces share one set of rules and behaviors. If CAR is set, the
traffic on the two interfaces is limited as a whole.
If the attribute of the traffic policy is unshared, the two interfaces use two sets of
rules and behaviors. The rules are the same, but each interface has its own behavior
entries. If CAR is set, the traffic on the two interfaces is limited independently.
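The difference between the two attributes can be sketched in Python. The sketch uses a toy
byte budget in place of a real CAR meter, and the names ByteBudget and TrafficPolicy are
hypothetical. A shared policy keeps one meter for all interfaces, whereas an unshared
policy creates one meter per interface.

```python
class ByteBudget:
    """Toy CAR meter: admits packets until `limit` bytes have been used."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def admit(self, size):
        if self.used + size > self.limit:
            return False
        self.used += size
        return True


class TrafficPolicy:
    def __init__(self, limit, shared):
        self.limit = limit
        self.shared = shared
        self._meters = {}

    def admit(self, interface, size):
        # Shared: every interface maps to the same meter (key None).
        # Unshared: each interface gets its own meter.
        key = None if self.shared else interface
        if key not in self._meters:
            self._meters[key] = ByteBudget(self.limit)
        return self._meters[key].admit(size)
```

With a limit of 100 bytes, two interfaces that each send 80 bytes are both admitted under
an unshared policy, but the second one is rejected under a shared policy because the two
interfaces draw from the same budget.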
The device supports the dynamic modification of rules of a traffic policy but does not support
the dynamic modification of the shared attribute or the unshared attribute of a traffic policy.
After applying a traffic policy on an interface, you can dynamically add, delete, or change the
rules and behaviors of the traffic policy, but you cannot change the shared attribute of the
traffic policy. You can change the shared attribute of the traffic policy only after disabling the
traffic policy on the interface.
[Figure: Process of complex traffic classification. If complex traffic classification is
enabled, the device constructs a key value (GID, SIP, DIP, ...) from the packet, applies
the rule mask (SIP_mask, DIP_mask, ...), and searches the table of complex traffic
classification to determine whether the packet matches the traffic policy.]
The preceding figure shows the basic process of implementing complex traffic classification
for packets. When the AND operation is performed between the masks of the rules of a traffic
policy and the source IP addresses and destination IP addresses of packets, the packets match
the policy if the value obtained through the AND operation is the same as the value defined in
the rules. Then, the behavior specified in the traffic policy is implemented for the packets.
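The mask comparison described above can be sketched in Python. The rule layout, the field
names sip and dip, and the sample addresses are hypothetical, and other key fields such as
the GID are omitted. The sketch assumes that the value stored in a rule is already masked.

```python
import ipaddress


def to_int(address):
    """Convert a dotted-decimal IPv4 address or mask to an integer."""
    return int(ipaddress.IPv4Address(address))


def rule_matches(rule, packet):
    """Return True if every masked packet field equals the value in the rule."""
    for field in ("sip", "dip"):
        value, mask = rule[field]
        if to_int(packet[field]) & to_int(mask) != to_int(value):
            return False
    return True


# Illustrative rule: source in 10.1.1.0/24 and destination in 100.1.0.0/16.
rule = {
    "sip": ("10.1.1.0", "255.255.255.0"),
    "dip": ("100.1.0.0", "255.255.0.0"),
}
```

A packet with source 10.1.1.4 and destination 100.1.1.1 matches this rule, so the behavior
bound to the rule would be applied. A packet with source 10.1.2.4 does not match.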
12.4.3 Applications
As shown in the preceding figure, ATN A and CX connect to each other through a VLAN.
When IP packets sent from ATN A enter the VLAN, the priority of the IP packets is mapped
to the priority of the VLAN frames according to the default mapping. When the packets from
the VLAN reach CX, the priority of the packets is mapped according to the priority mapping
for the DS domain set on CX.
l Simple traffic classification applied in MPLS networks
As shown in the preceding figure, the three devices set up MPLS peer relationships. After
reaching ATN A, IP traffic is forwarded through MPLS from ATN A to CXC. After the IP
traffic leaves CXC, the IP traffic is forwarded through IP. The mapping from IP DSCP values
to MPLS EXP values is set on GE1 of ATN A, and the mapping from MPLS EXP values to IP
DSCP values is set on GE1 of CXC. Simple traffic classification is enabled on the two
interfaces. Therefore, the DSCP value of the IP traffic can be changed to the EXP value of
MPLS traffic on ATN A, and the EXP value of MPLS traffic can be changed to the DSCP
value of the IP traffic on CXC.
[Figure: Complex traffic classification on edge access nodes. Company A (193.2.0.0) and
Company B (193.1.0.0) connect through edge access nodes to the core network.]
As shown in the preceding figure, assume that the bandwidth purchased by Company A is 200
Mbit/s, and that purchased by Company B is 400 Mbit/s. To ensure the bandwidth, you can
configure complex traffic classification on the edge access node. The node can thus
differentiate the traffic of Company A from that of Company B based on the IP addresses, and
then carry out different traffic policing policies.
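The selection of a policing rate by source address can be sketched in Python. The
document gives only the network addresses 193.2.0.0 and 193.1.0.0, so the /16 prefix
length used here is an assumption, as are the names POLICERS and policer_for.

```python
import ipaddress

# (source network, customer, purchased bandwidth in Mbit/s); /16 is assumed.
POLICERS = [
    (ipaddress.ip_network("193.2.0.0/16"), "Company A", 200),
    (ipaddress.ip_network("193.1.0.0/16"), "Company B", 400),
]


def policer_for(source_ip):
    """Return (customer, rate in Mbit/s) for a source address, or None if unmatched."""
    address = ipaddress.ip_address(source_ip)
    for network, customer, rate in POLICERS:
        if address in network:
            return customer, rate
    return None
```

Traffic that matches neither network returns None in this sketch; how such traffic is
handled on a real node depends on the configured policy.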
Terms
None.
12.5 HQoS
Purpose
Along with the emergence of new applications on IP networks, new requirements are
presented to the QoS of IP networks. For example, real-time services such as Voice over IP
(VoIP) demand a shorter delay. A long delay for packet transmission is unacceptable. Email
and File Transfer Protocol (FTP) services are comparatively insensitive to the delay. To
support the services that present different service requirements, such as voice, video, and data
services, the network is required to distinguish the services before providing corresponding
quality of service for them. Therefore, the QoS technology is introduced. With the rapid
development of network equipment, the capacity of a single interface increases along with the
number of users accessing it. HQoS is needed because traditional QoS encounters the
following problems:
l Traditional traffic management schedules traffic based on the bandwidth of interfaces.
As a result, traffic management is sensitive to the class of services rather than users,
which is fit for traffic at the network core side but not fit for traffic at the service access
side.
l Traditional traffic management has great difficulties in simultaneously controlling
multiple services of many users.
To solve these problems and provide better QoS, a QoS technology is urgently needed that
can carry out queue scheduling based on the priorities of services and also control the
traffic of users. Combined with the Diff-Serv scheme, HQoS supported by the ATN adopts
three levels of scheduling. HQoS enables the equipment to apply policies for controlling
internal resources with the existing hardware. It can both provide quality assurance for
advanced users and reduce the total cost of network construction.
Family HQoS classifies services with the specified characteristics into the same
family. These services enter one subscriber queue (SQ). Currently, the methods of
identifying family members are NONE and C-VLAN. Subscribers going online from different
sub-interfaces of the same interface can also be classified as belonging to the same family.
A leased line user is an enterprise that leases an entire interface (or interfaces) from the
operator. All subscribers from this enterprise are scheduled and managed in a unified manner.
12.5.2 Principles
the cache in a queue. The time and sequence for packets leaving related queues and the
scheduling of packets in various queues are determined by scheduling policies. The QoS
queue structure only includes the downstream queue structure, which includes flow queues/
Subscriber Queue and port queues, as shown in Figure 12-22.
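[Figure 12-22: QoS queue structure. Packets enter flow queues (FQ0, FQ1, ...), each with
traffic shaping. The FQs are scheduled by PQ/WFQ into subscriber queues (SQ-a to SQ-c),
each with traffic shaping.]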
Each FQ belongs to only one SQ, and each SQ corresponds to eight FQs. In practice, each SQ
maps one user (Tunnel, port or "port + VLAN"). Each user can use one to eight FQs.
Port Queue
A Port Queue (PQ) is a virtual queue. A virtual queue means that there is no buffer for the
queue. Data packets of the queue enter or leave the queue without any delay. The queue is
only a level in hierarchical scheduling for output packets. In practice, each PQ maps one
physical port or one logical port.
NOTE
Among the ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports LPQ.
RR
RR is a simple scheduling method. Through RR, multiple queues can be scheduled.
[Figure: RR scheduling. The scheduler visits the queues in turn and takes one packet away
from each queue.]
RR schedules multiple queues in ring mode. If the queue on which RR is performed is not
empty, the scheduler takes one packet away from the queue. If the queue is empty, the queue
is skipped and the scheduler does not wait.
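The ring behavior described above can be sketched as a short Python generator. The
function name round_robin and the sample packet labels are illustrative only.

```python
from collections import deque


def round_robin(queues):
    """Visit the queues in ring order, taking one packet from each non-empty queue."""
    while any(queues):          # an empty deque is falsy
        for queue in queues:
            if queue:           # empty queues are skipped without waiting
                yield queue.popleft()


queues = [deque(["a1", "a2", "a3"]), deque(), deque(["c1"])]
order = list(round_robin(queues))
```

Here order is ["a1", "c1", "a2", "a3"]: the empty second queue is skipped, and the first
queue continues to be served after the third queue has drained.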
PQ (SP)
PQ (also called Strict Priority (SP)) is an absolute priority scheduling method. In PQ, packets
with a high priority are scheduled with precedence.
WFQ
WFQ is used to assign bandwidth to queues taking part in the scheduling according to the
weights of the queues. In WFQ, unused bandwidth is reassigned.
LPQ (SPL)
LPQ (also called Strict Priority Low (SPL)) is also an absolute priority scheduling method.
The difference between LPQ and PQ lies in their priorities during two-level scheduling of
HQoS. The priority sequence applied for two-level scheduling is PQ > WFQ > LPQ.
QoS queue scheduling includes FQ/SQ scheduling and Target Port scheduling.
FQ/SQ Scheduling
Flow queue scheduling includes FQ traffic shaping, FQ scheduling, SQ traffic shaping, and
SQ scheduling. Traffic shaping uses a double token bucket and adds tokens to the bucket
according to the rate of the traffic shaper.
An SQ corresponds to a user (Tunnel, port or "port + VLAN"). The SQ scheduling ensures the
CIR and PIR bandwidth of the user.
The FQ scheduling adopts PQ/WFQ/LPQ. Each SQ corresponds to eight FQs, and the eight
FQs share the bandwidth of the SQ. Traffic shaping is implemented on each FQ to limit the
traffic rate. That is, the PIR is configured for each FQ.
NOTE
For ATN 950B series, only ATN 950B (AND2CXPB/AND2CXPE) supports LPQ.
[Figure: FQ scheduling. FQ5 to FQ7 use PQ, FQ1 to FQ4 use WFQ, and FQ0 uses LPQ. The PQ,
WFQ, and LPQ groups are then scheduled by priority.]
[Figure: HQoS on a mobile backhaul network. Elements shown: base station, ATN equipment,
PE, CX, and RNC.]
Differentiated Service: A QoS model that classifies the service level according to the
packet precedence field (IP Precedence and DSCP), the source IP address, and the
destination IP address. Packets with different levels can be provided with different
service levels. It is commonly used to provide end-to-end QoS for specified application
programs.
Weighted Fair Queue: It features automatic traffic classification and balances the
delay and jitter of each flow. Compared with Fair Queue, it favors high-priority
packets.
Abbreviations
Abbreviation Full Spelling
RR Round Robin
SP Strict Priority
13 Clock
This document describes the clock and time synchronization features in terms of overview,
principle, and applications.
NOTE
If the SFP interface houses an electrical module, the interface does not support synchronous
Ethernet, IEEE 1588v2, or IEEE 1588 ACR.
13.1.1 Introduction
Definition
Clock synchronization, including frequency synchronization and time synchronization, refers
to a precise and specific relationship between the frequencies or phases of signals. Signals
move at the same rate at any given moment, which guarantees that all the devices
in the communication network work at the same speed.
Information is coded into discrete Pulse Code Modulation (PCM) pulses for transmission over
digital communications networks. If two digital switching devices have different clock
frequencies, or if digital bit streams are impaired or damaged, phase drift or jitter occurs
during transmission. This results in the loss and duplication of elements in the buffer
storage device of the digital switching system, leading to slips in the transmitted
bit streams.
13.1.2 Principles
Clock Source
A device that provides clock signals to a local device is called a clock source. A local device
may have multiple clock sources. There are several types of clock sources:
SSM
SSMs are also referred to as synchronization quality messages. On a link carrying
synchronization timing signals, SSMs indicate clock quality levels of the timing signals.
l Pseudo synchronization
l Master/slave synchronization
Pseudo Synchronization
Pseudo synchronization refers to situations in which each switching site has its own highly
accurate and highly stable independent clock. The clocks of the switching sites are not
synchronized. Differences in clock frequency and phasing between different switching sites
are, however, very small. They do not affect data transmissions and can be ignored.
Master/Slave Synchronization
Master/slave synchronization refers to situations in which a highly accurate clock is set as the
internal master clock for a network. Clocks at all sites within the network trace the master
clock. Each sub-site traces a higher level clock until the highest level network element is
reached.
There are two types of master/slave synchronization:
l Direct master/slave synchronization
l Level-based master/slave synchronization
Figure 13-1 shows direct master/slave synchronization. All of the slave clocks synchronize
directly with the primary reference clock. Direct master/slave synchronization is used on
networks with relatively simple structures.
Figure 13-2 shows level-based master/slave synchronization. Devices on the network are
divided into three levels. Level two clocks synchronize with the level one reference clock.
Level three clocks synchronize with level two clocks. Level-based master/slave
synchronization is used on networks of larger scale and complicated structure.
To improve the reliability of master/slave synchronization, two master clocks are set on the
network: an active master clock and a standby master clock. Both are cesium clocks.
Under normal circumstances, each network element traces the active master clock. The standby
master clock also traces the active master clock. If the active master clock is faulty, the
standby master clock takes over and becomes the reference clock for the entire network. After
the fault is repaired and the original active master clock recovers, a switchover occurs: the
original active master clock becomes active again and serves as the reference clock.
The device that functions as a slave clock can work in the following statuses:
l Trace status
A slave clock traces and locks on to a clock source provided by a higher level clock. This
clock source may be the master clock or it may be an internal clock source on the
network element at the next highest level.
l Holding status
If a slave clock loses reference clocks, the slave clock enters holding status. The slave
clock uses the last frequency stored before the reference clocks were lost. In addition, the
slave clock provides clock signals that conform to the source reference clock. This
ensures that there are only minor frequency differences between the clock signals
provided by the slave clock and those of the reference clock. After the holding time
expires, the slave clock enters free running status.
l Free running status
If a slave clock loses all external reference clocks, the slave clock loses clock reference
memories. As a result, the oscillator inside the slave clock works in free running status.
Fixed oscillation frequencies may drift over time. This means that a clock in holding status
cannot retain its accuracy for a long period of time. The accuracy of a clock in holding status
is inferior to that of a clock in trace status.
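The transitions between the three statuses can be sketched as a small state machine in
Python. The class name SlaveClock, the holding_time parameter and its unit, and the return
to trace status as soon as a reference is available again are assumptions for
illustration.

```python
TRACE, HOLDING, FREE_RUNNING = "trace", "holding", "free running"


class SlaveClock:
    def __init__(self, holding_time):
        self.holding_time = holding_time  # hypothetical value, in seconds
        self.state = TRACE                # assumes a reference is present at start
        self.lost_at = None

    def update(self, now, reference_available):
        if reference_available:
            # A usable reference brings the clock back to trace status.
            self.state, self.lost_at = TRACE, None
        elif self.state == TRACE:
            # Reference lost: keep using the last stored frequency.
            self.state, self.lost_at = HOLDING, now
        elif self.state == HOLDING and now - self.lost_at >= self.holding_time:
            # Holding time expired: the internal oscillator runs freely.
            self.state = FREE_RUNNING
        return self.state
```

With a holding time of 10, a clock that loses its reference at time 5 stays in holding
status until time 15 and then enters free running status.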
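[Figure: Clock synchronization through external clock interfaces. The BITS connects to the
CLK-IN interface of CX-A. The CLK-OUT interface of each device connects to the CLK-IN
interface of the next device (CX-A, CX-B, ATN).]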
The networking previously described can only be used to connect devices at the same site.
The distance between the ATN and CX cannot exceed 200 meters.
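[Figure: Clock synchronization over Ethernet links. The BITS provides an external clock
through the CLK-IN interface. The devices are connected over Ethernet links through their
E and W side interfaces.]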
NOTE
In all of the networking diagrams for this chapter, W represents the west side interface, and E represents
the east side interface.
The ATNA serves as the local clock for this network element. It extracts clock information
from signals, which are received at the E side interface. The clock board on CX-C also acts as
the local clock for its network element, extracting clock information from signals, which in
this case are received at the W side interface. At the same time, clock information is attached
to signals and these are transmitted downstream to ATND. ATND receives these signals at the
W side interface and uses the clock information extracted as a reference point to complete
clock synchronization with the main station CX-B.
Performance degradation of the clock on ATNA will not influence the clocks on CX-C and
ATND, but performance degradation on CX-C can influence the ATND clock because ATND
traces its clock through the higher level device, CX-C.
If a link is very long, clock signals transmitted to a slave station must be transmitted a long
distance or divided into several transmissions. To ensure that slave stations receive high
quality clock signals, two master clocks can be set on the network to act as reference clocks.
Network elements can trace one or the other of these reference clocks. The two reference
clocks must maintain synchronization and be at the same quality level.
[Figure: Networking diagram with CX-A, ATNA, ATNB, ATNC, ATND, and ATNE connected through
their E and W side interfaces.]
Mixed Topology
As shown in Figure 13-7, ATNA, ATNB, ATNC, and CX-D form a ring network topology
and make use of STM-N links. CX-D and CX-E form a link network topology and make use
of STM-M links. N > M: Indicates that the link bandwidth of the ring network is greater than
that of the link network.
Serving as the main station, CX-E uses an external clock source as the reference clock for all
the devices on the network. CX-E and CX-D are connected through a low-speed link.
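[Figure 13-7: Mixed topology. ATNA, ATNB, ATNC, and CX-D form a ring over STM-N links
through their E and W side interfaces. CX-D connects to CX-E over an STM-M link.]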
ATNA, ATNB, and ATNC use both E side and W side interfaces to trace and lock the clock of
CX-D. The CX-D clock traces the clock transmitted by the main station CX-E. CX-D
extracts clock information from the STM-M signals transmitted by CX-E and uses it to
synchronize the downstream devices.
This section describes how to deploy a network with highly reliable clock synchronization.
The following topics are covered:
l Overview of clock protection switching
l Implementation
l Boards participating in clock protection switching
Overview
Each device on the network uses a particular clock synchronization path moving from level to
level to trace the same reference clock. Clock synchronization for the entire network is
implemented in this way. Usually, a device does not acquire just one path to one clock source.
It may have multiple clock sources. These clock sources may come from the same master
clock or they may come from reference clocks with different quality levels. On a
synchronized network, it is very important to keep the clocks of the device synchronous. If
one clock synchronization path is faulty, synchronization on the entire network is faulty.
Automatic protection switching for synchronized clocks can be used to avoid this situation.
Automatic protection switching means that, if a device loses all paths that it has traced to a
certain clock source, it can automatically trace another clock source. The new path may lead
to the same reference clock which the device was previously tracing or it may lead to a
different lower quality clock source.
The clock source switchover can be lossless, that is, it does not generate bit errors.
Implementation
l Specifying a clock reference source manually
This method is used to designate a particular, fixed clock source for a clock board to
trace. In addition, the active and standby clock boards can be configured to trace
different clock sources.
As shown in Figure 13-8, on the master clock CX-A, the active clock board has been set
manually to trace BITS1 and the standby clock board has been set to trace BITS2. Under
normal circumstances, the master clock traces the BITS1 reference clock. If the active
clock board is faulty, there is a switchover between the active and standby clock boards.
After the switchover, CX-A traces the BITS2 reference clock. CX-B traces the clock of
CX-A, and ATNC traces the clock of CX-B.
The problem with this method is that all of the devices on the network are set to trace the
clock of CX-A. If CX-A is faulty, the entire network has no reference clock, and all of the
devices enter the free oscillation state.
Figure 13-8 Networking diagram for specifying the clock reference source manually
[Figure: BITS1 and BITS2 connect to the CLK-IN ports of CX-A; CX-A connects to CX-B, and
CX-B connects to ATNC]
If the SSM level of a reference source is DNU and SSM is enabled, this reference source
is not chosen during protection switching.
The SSM level of output signals is determined by the traced clock source. When the
clock works in trace status, the port of the currently traced clock source outputs signals
with an SSM level of DNU, and the other clock source ports output clock signals with the
same SSM level as the traced clock. When the clock does not work in trace status, the SSM
level of the output signals is SEC.
To set a clock source for a PIC, an SSM can be extracted from the PIC and reported to
the system control board. The system control board uses the SSM of the line clock
source to set the clock board. The system control board can also forcibly set the SSM of
the PIC clock source.
For a clock module that uses a BITS clock source, if the signal is 2.048 Mbit/s, an SSM
can be extracted by the clock module.
13.1.3 Applications
deliver wireless services, the IP bearer network uses IP and Ethernet technologies between the
RNC and NodeBs. Clock information sent by devices on the bearer network is synchronized
and then allocated to data communication devices connected to the base stations. Base
stations receive reliable clock transmissions and quality is guaranteed.
[Figure: BTSs attached over FE links; ATNA, ATNB, CX600, and RNC connected over GE links]
[Figure: ring network of CX-A, ATNB, ATNC, CX-D, ATNE, and ATNF connected through E and W
side interfaces; a BITS is attached to CX-A and another to CX-D]
CX-A | External clock source | External clock source, East side clock source, West side clock source, and internal clock source
ATN B | East side clock source | East side clock source, West side clock source, and internal clock source
ATN C | East side clock source | East side clock source, West side clock source, and internal clock source
CX-D | East side clock source | East side clock source, West side clock source, External clock source, and internal clock source
ATN E | West side clock source | West side clock source, East side clock source, and internal clock source
ATN F | West side clock source | West side clock source, East side clock source, and internal clock source
In addition, for CX-A and CX-D, the timeslot where the S1 byte of the external BITS clock is
stored must be configured.
l Use of clock protection switching when the link between ATN B and ATN C is faulty
If the optical fiber between ATN B and ATN C breaks, the synchronized clocks switch
automatically. Under normal circumstances, CX-D traces the clock from ATN C. According
to the switching protocol, the clock quality message that CX-D sends back to ATN C is
therefore "Do Not Use (DNU)", that is, the S1 byte is 0x0F. When ATN C detects the loss
of the east side clock source, it cannot use the west side clock source as the
synchronization source for this station, so it can only use the internal clock source as a
reference clock. ATN C reports this to CX-D through the S1 byte, which it sets to 0x0B,
meaning "Synchronous Equipment Timing Source (SETS) clock signals".
After CX-D receives this signal, the quality of all synchronization sources traced by CX-D
decreases (the original clock source is the G.812 local clock; its S1 byte is 0x08). None
of these sources can satisfy the quality requirements set for synchronization sources.
CX-D must choose a different clock source that satisfies the quality requirements. Four
clock sources are available for CX-D:
– East side clock source
– West side clock source
– Internal clock source
– External BITS clock source
At this time, only the west side clock source and the external BITS clock source satisfy
the quality threshold requirements.
The level of the west side clock source configured on CX-D is superior to that of the
external BITS clock source, so CX-D finally chooses the former as the synchronization
source for this station. After CX-D switches from the east side to west side to trace a
synchronization source, the west side clock source of ATN C becomes available. At this
time, among the clock sources available to ATN C, the west side clock source meets
quality threshold requirements and is the highest level clock available. As a result, ATN
C chooses the west side clock source as the synchronization source. Clock traces for the
entire network are shown in Figure 13-11.
Figure 13-11 Diagram of clock traces when the optical fiber between ATN B and ATN C
is damaged
[Figure: ring of CX-A, ATNB, ATNC, CX-D, ATNE, and ATNF; a BITS is attached to CX-A and
another to CX-D]
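The selection that CX-D performs in this scenario can be sketched as follows. This is a minimal illustration, not the device implementation: the names ClockSource and select_source, the numeric priorities, the quality threshold, and the SSM values assigned to the west side and external BITS sources are assumptions made for the example. Only the S1 codes 0x08, 0x0B, and 0x0F come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# S1-byte SSM codes. 0x08, 0x0B, and 0x0F appear in the text; 0x02 (G.811)
# is used here only to give the healthy sources an illustrative value.
SSM_G811 = 0x02
SSM_G812_LOCAL = 0x08
SSM_SETS = 0x0B
SSM_DNU = 0x0F


@dataclass
class ClockSource:
    name: str
    priority: int        # hypothetical: 1 is the most preferred
    ssm: Optional[int]   # None models a lost signal


def select_source(sources, threshold):
    """Return the preferred usable source, or None if nothing qualifies."""
    usable = [
        s for s in sources
        if s.ssm is not None       # signal is present
        and s.ssm != SSM_DNU       # a DNU source is never chosen
        and s.ssm <= threshold     # for these codes, smaller means better
    ]
    # Among the sources that meet the threshold, configured priority decides.
    return min(usable, key=lambda s: s.priority, default=None)


# CX-D after the fiber between ATN B and ATN C breaks (assumed values).
cx_d = [
    ClockSource("east", 1, SSM_SETS),          # ATN C now reports 0x0B
    ClockSource("west", 2, SSM_G811),
    ClockSource("external BITS", 3, SSM_G811),
    ClockSource("internal", 4, SSM_SETS),
]
chosen = select_source(cx_d, threshold=SSM_G812_LOCAL)
print(chosen.name)
```

With these assumed values, the east side and internal sources fail the threshold, and the west side source wins over the external BITS because of its higher configured priority, which matches the outcome described above.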
Figure 13-12 Diagram of clock traces when the external BITS clock on CX-A is faulty
[Figure: ring of CX-A, ATNB, ATNC, CX-D, ATNE, and ATNF; only the BITS attached to CX-D is
shown]
Figure 13-13 Diagram of clock traces when all the external BITS clocks are faulty
[Figure: ring of CX-A, ATNB, ATNC, CX-D, ATNE, and ATNF; no BITS is shown, and an inner
(internal) clock source is labeled]
Terms
Terms Description
13.2 NTP
13.2.1 Introduction
Definition
Network Time Protocol (NTP) is an application layer protocol used on the Internet to
synchronize clocks among a set of distributed time servers and clients. In this manner, the
clock of the host is synchronized with certain time standards.
Purpose
NTP synchronizes the time of all devices that have clocks on the network. If time
synchronization is not performed using NTP, the devices may encounter time errors. NTP
enables all network devices to have consistent time so that the devices provide various
applications based on unified time.
13.2.2 Principle
NTP synchronizes time among a set of distributed time servers and clients. In this manner, the
time of the host is synchronized with certain time standards. The server and client are two
relative concepts. The device that announces the willingness to synchronize clocks and
provides the standard time is a server; the device that announces its willingness to be
synchronized is a client. A local system running NTP can be synchronized by other clock
sources or act as a clock source to synchronize other clocks. In addition, mutual
synchronization can be implemented through NTP packet exchanges. NTP message
transmission is based on UDP.
l A primary time server is directly synchronized with a primary reference source, which is
usually a radio clock or Global Positioning System (GPS).
l A secondary time server synchronizes its clock with the clock of the primary time server
on the network or other secondary time servers, and transmits the time information to
other hosts on the network through NTP.
Under normal circumstances, primary and secondary time servers in the synchronization
subnet assume a hierarchical-master-slave structure, with the primary server at the root and
the secondary server at successive stratums toward the leaf node. The higher the stratum level
is, the less accurate the clock will be.
Kiss-o'-Death (KOD) packets provide useful information to a client and are used for status
reporting and access control. When KOD is enabled at the server, the server may send packets
with kiss codes DENY and RATE to the client.
l When the client receives a packet with kiss code DENY, the client demobilizes any
associations with that server and stops sending packets to that server.
l When the client receives a packet with kiss code RATE, the client immediately reduces its
polling rate to that server (that is, it polls less frequently) and continues to reduce it
each time it receives a RATE kiss code.
Peer Mode
In this mode, the active peer and the passive peer can be synchronized with each other. To be
specific, the higher stratum (lower level) peer is synchronized with the lower stratum (higher
level) peer. The active and passive peers first exchange NTP packets whose Mode field values
are 3 (sent by the client) and 4 (sent by the server).
l Active peer: A host that functions as an active peer sends packets periodically. The value
of the Mode field in a packet is set to 1. This indicates that the packet is sent by an active
peer, without considering whether its peer is reachable and which stratum its peer is on.
The active peer can provide time information about the local clock for its peer, or
synchronize the time information about the local clock based on that of the peer clock.
l Passive peer: A host that functions as a passive peer receives packets from the active
peer and sends response packets. The value of the Mode field in a response packet is set
to 2. This indicates that the packet is sent by a passive peer. The passive peer can provide
time information about the local clock for its peer, or synchronize the time information
about the local clock based on that of the peer clock.
l Prerequisites for a host to function as a passive peer: The packets received by the local
host are sent by an active peer. The number of the stratum that the active peer is on must
be less than or equal to the number of the stratum that the local host is on. In addition,
the routes between the local host and the active peer must be reachable.
NOTE
The host operating in passive mode is at the lower stratum in the synchronization subnet. You do not
need to obtain information about the peer in advance because the connection between peers is not set up
and status variables are not configured unless the passive host receives NTP messages from the peer.
Broadcast Mode
l A host that runs in broadcast mode sends clock synchronization packets to the broadcast
address 255.255.255.255 periodically. The value of the Mode field in a packet is set to 5.
This indicates that the packet is sent by a host that runs in broadcast mode, without
considering whether its peer is reachable and which stratum its peer is on. The host
running in broadcast mode is usually a time server on high-speed broadcast media on the
network; it provides synchronization information for all of its peers but does not alter
its own clock.
l The client listens to the broadcast packets sent from the server. When the client receives
the first broadcast packet, the client and server exchange NTP packets whose Mode field
values are 3 (sent by the client) and 4 (sent by the server). In this process, the client
enables the server/client mode for a short time to exchange information with the remote
server. This allows the client to obtain the network delay between the client and the
server. Then, the client returns to broadcast mode and continues to listen for incoming
broadcast packets to synchronize the local clock.
Broadcast mode applies to high-speed networks that have multiple workstations and do not
require high accuracy. In a typical scenario, one or more time servers on the network
periodically send broadcast packets to the workstations. The delay of packet transmission in a
LAN is at the millisecond level.
Multicast Mode
l A server running in multicast mode sends clock synchronization packets to a multicast
address periodically. The value of the Mode field in a packet is set to 5. This indicates
that the packet is sent by a host that runs in multicast mode. The host running in
multicast mode is usually a time server on high-speed broadcast media on the network; it
provides synchronization information for all of its peers but does not alter its own
clock.
l The client listens to the multicast packets from the server. When the client receives the
first multicast packet, the client and the server exchange NTP packets whose Mode field
values are 3 (sent by the client) and 4 (sent by the server). In this process, the client
enables the server/client mode for a short time to exchange information with the remote
server. This allows the client to obtain the network delay between the client and the
server. Then, the client returns to multicast mode and continues to listen for incoming
multicast packets to synchronize the local clock.
Multicast mode is useful when large numbers of clients are distributed in a network, which
would otherwise result in a large number of NTP packets. In multicast mode, a single NTP
multicast packet can potentially reach all the clients in the network and therefore reduce
the control traffic on the network.
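The Mode field values used by the modes above (1 to 5) can be read from the first octet of an NTP header, where the Mode field occupies the three low-order bits. The following is a minimal sketch; the names NtpMode and ntp_mode are chosen for this example and are not part of any device interface.

```python
from enum import IntEnum


class NtpMode(IntEnum):
    ACTIVE_PEER = 1    # sent by an active peer
    PASSIVE_PEER = 2   # sent by a passive peer
    CLIENT = 3         # sent by a client
    SERVER = 4         # sent by a server
    BROADCAST = 5      # broadcast mode; multicast mode uses the same value


def ntp_mode(packet: bytes) -> NtpMode:
    """Extract the Mode field from the first octet of an NTP header."""
    if not packet:
        raise ValueError("empty packet")
    # The first octet holds LI (2 bits), version (3 bits), and Mode (3 bits).
    # Values outside 1 to 5 raise ValueError in this sketch.
    return NtpMode(packet[0] & 0x07)


# First octet 0x23: LI 0, version 4, Mode 3 (a client request).
request = bytes([0x23]) + bytes(47)
print(ntp_mode(request).name)
```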
Each mode processes the event with the same procedure: sending messages, receiving
messages, processing messages, and updating the clock.
Sending Messages
In all modes except the broadcast client mode and all server modes, the peer sends NTP
request messages when its timer times out. In broadcast client mode, the peer never sends
NTP request messages. In server mode, the peer sends NTP messages only in response to
received messages. If a received NTP request message does not result in a local permanent
connection, receiving the message triggers the sending of a message to retain the
connection.
To ensure a valid response message, the time when the message is sent must be accurately
saved and then added to the message.
Receiving Messages
After receiving the NTP request message, the server first checks the mode field of the NTP
request message. If the value is 0, it indicates that the peer adopts an earlier NTP version.
Then, the server checks whether modes of the local end and the peer are matched. The
following cases exist:
l If the matching result is displayed as error, the message is discarded and an error
message is returned.
l If the matching result is displayed as recv, the received message is processed. If the
packet header is valid, the connection is marked as reachable. If both the packet header
and the data are valid, the clock-update procedure is invoked to update the local clock.
Otherwise, the connection is deleted if it is not configured in advance.
l If the matching result is displayed as xmit, the received message is processed and a
response message is sent immediately. If the connection is not configured in advance, it
is deleted.
l If the matching result is displayed as pkt, the received message is processed. If the
packet header is valid, the connection is marked as reachable. If both the packet header
and the data are valid, the clock-update procedure is invoked to update the local clock.
Otherwise, if the connection is not configured in advance, a response message is sent
immediately, and then the connection is deleted.
Processing Messages
This process is used to check the message validity, calculate delay/offset samples, and invoke
other procedures to filter data and select reference source. This process first requires that the
transmit timestamp should be different from the transmit timestamp of the last message
received from the same peer; otherwise, the message may be outdated.
Secondly, it is required that the originate timestamp should match the transmit timestamp
of the last message sent to the same peer; otherwise, the message may be mis-sequenced,
bogus, or less accurate. In broadcast mode (5), the roundtrip delay is zero. In this case,
the high accuracy of the time-transfer operation is not ensured. However, the accuracy
achieved may be adequate for most objectives.
After the preceding procedure, the best clock sample can be selected from a specified clock
and the best clock can be selected from clock groups at different stratums. Finally, the delay
(peer.delay), offset (peer.offset), and dispersion (peer.dispersion) for the peer are all
determined.
The clock-selection procedure is then invoked, which contains two algorithms: intersection
and clustering. The intersection algorithm generates a list of candidate peers suitable to
serve as the reference source and calculates a confidence interval for each peer. It then
discards falsetickers using a technique adapted from Marzullo and Owicki [MAR85]. The
clustering algorithm orders the list of remaining candidates based on their stratums and
synchronization distance. It repeatedly discards outlier peers based on select dispersion
until only the most accurate, precise, and stable candidates remain.
If the offset, delay, and dispersion of the candidate peers are nearly identical, the
clock-combining procedure analyzes the clock candidates and then provides the parameters
determined through comprehensive analysis to the local end for updating the local clock.
[Figure: NTP message exchange between ATN A and ATN B across the network in four steps
(Step 1 to Step 4)]
Assume that:
l Before the clocks of ATN A and ATN B are synchronized, the clock of ATN A is
10:00:00 am and the clock of ATN B is 11:00:00 am.
l ATN B acts as an NTP time server. ATN A must synchronize its clock with ATN B.
l Unidirectional transmission of an NTP message between ATN A and ATN B takes one
second.
l Both ATN A and ATN B take one second to process an NTP message.
The process of synchronizing the system clock is as follows:
l ATN A sends an NTP message to ATN B. The message carries an initial timestamp,
10:00:00 am (T1), indicating the time when it leaves ATN A.
l When the NTP message reaches ATN B, ATN B adds a receive timestamp, namely,
11:00:01 am (T2), to the NTP message, indicating the time when ATN B receives the message.
l When the NTP message leaves ATN B, ATN B adds a transmit timestamp, namely,
11:00:02 am (T3), to the NTP message, indicating the time when the message leaves
ATN B.
l When ATN A receives this response message, it adds a new receive timestamp, which is
10:00:03 am (T4).
ATN A uses the received information to calculate the following two important parameters:
l A roundtrip delay of the NTP message: Delay = (T4 - T1) - (T3 - T2).
l The clock offset of ATN A by taking ATN B as a reference: Offset = ((T2 - T1) + (T3 -
T4))/2.
ATN A sets its clock based on the delay and offset to implement clock synchronization with
ATN B.
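The two formulas can be checked against the timestamps of this example. The following is a minimal sketch; the function name ntp_delay_offset and the calendar date are arbitrary choices for illustration.

```python
from datetime import datetime


def ntp_delay_offset(t1, t2, t3, t4):
    """Return (roundtrip delay, clock offset) from the four NTP timestamps."""
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return delay, offset


day = datetime(2018, 1, 30)                     # arbitrary date
t1 = day.replace(hour=10, minute=0, second=0)   # leaves ATN A (ATN A clock)
t2 = day.replace(hour=11, minute=0, second=1)   # reaches ATN B (ATN B clock)
t3 = day.replace(hour=11, minute=0, second=2)   # leaves ATN B (ATN B clock)
t4 = day.replace(hour=10, minute=0, second=3)   # reaches ATN A (ATN A clock)

delay, offset = ntp_delay_offset(t1, t2, t3, t4)
print(delay.total_seconds(), offset.total_seconds())
print((t4 + offset).time())
```

The delay evaluates to 2 seconds and the offset to 3600 seconds, so ATN A advances its clock by one hour. Its reading at T4 becomes 11:00:03, which is the time on ATN B at that moment.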
NOTE
NTP uses the standard algorithm in RFC 1305 to ensure the precision of clock synchronization. The
preceding example is only a brief introduction to the operating principle of NTP.
Access Authority
The device protects local NTP services by setting access authority. This is a simple measure
to ensure security.
The device provides four access authority levels. When an NTP access request message
reaches the local end, the device matches it with the access authority from level 1 to level 4.
The first matched authority level takes effect. The matching sequence is as follows:
l peer: indicates the maximum access authority. The remote end can perform time requests
and control queries for the local NTP service. The local clock can also be synchronized
with the clock of the remote server.
l server: indicates that the remote end can perform time requests and control queries for
the local NTP service. The local clock, however, cannot be synchronized with the clock
of the remote server.
l synchronization: indicates that the remote end can perform time requests only for the
local NTP service.
l query: indicates the minimum access authority. The remote end can perform control
queries only for the local NTP service.
Authentication
NTP authentication can be enabled on networks demanding high security. NTP authentication
should be separately configured on the client and the server.
l Configurations of NTP authentication on both the client and the server must be complete.
Otherwise, the authentication does not take effect. If NTP authentication is enabled, you
must configure the key and declare the key as reliable.
l Keys configured on the server and the client must be identical.
Static Association
A static association is set up through command lines.
Dynamic Association
A dynamic association is set up dynamically during the NTP implementation process.
Terms | Description
NTP | Network Time Protocol (NTP) is an application layer protocol used on the Internet to synchronize timekeeping among a set of distributed time servers and clients. In this manner, the clock of the host is synchronized with certain time standards.
Clock offset | Clock offset is the time difference between the local clock and the reference clock. It represents the offset time to be adjusted when the local clock is synchronized with the reference clock.
Round trip delay | Round trip delay refers to the time difference between the sending and receiving of an NTP message on the client. It indicates the capability of the local clock to send a message to the reference clock within a specified time.
Dispersion | Dispersion is the maximum error of the local clock compared with the reference clock.
Clock filtering | Clock filtering is used to select the best time sample from a specified peer.
Clock selection | Clock selection is a method of selecting reference clocks by using the clock selection algorithm.
Acronyms
Acronyms Full Spelling
13.3 1588v2
13.3.1 Introduction to 1588v2
Definition
l Synchronization
This is the process of ensuring that the frequency offset or time difference between
devices is kept within a reasonable range. In a modern communications network, most
telecommunications services require network clock synchronization in order to function
properly. Network clock synchronization includes time synchronization and frequency
synchronization.
– Time synchronization
Time synchronization, also called phase synchronization, means that both the
frequency of and the time between signals remain constant. In this case, the time
offset between signals is always 0.
– Frequency synchronization
Frequency synchronization, also called clock synchronization, refers to a constant
frequency offset or phase offset. In this case, signals are transmitted at a constant
average rate during any given time period so that all the devices on the network can
work at the same rate.
[Figure 13-20: Watch A and Watch B under phase synchronization, and Watch A and Watch B
under frequency synchronization]
Figure 13-20 shows the differences between time synchronization and frequency
synchronization. If Watch A and Watch B always have the same time, they are in time
synchronization. If Watch A and Watch B have different times, but the time offset remains
constant, for example, 6 hours, they are in frequency synchronization.
l IEEE 1588
IEEE 1588 is defined by the Institute of Electrical and Electronics Engineers (IEEE) as
the Precision Clock Synchronization Protocol for networked measurement and control
systems. It is called the Precision Time Protocol (PTP) for short.
IEEE 1588v1, released in 2002, applies to the industrial automation and test and
measurement fields. With the development of IP networks and the popularization of 3G
networks, the demand for time synchronization on telecommunications networks has
increased. To satisfy this need, IEEE drafted IEEE 1588v2 based on IEEE 1588v1 in
June 2006, revised IEEE 1588v2 in 2007, and released IEEE 1588v2 at the end of 2008.
Targeted at telecommunications industry applications, IEEE 1588v2 improves on IEEE
1588v1 in the following aspects:
– Encapsulation of Layer 2 and Layer 3 packets has been added.
– The transmission rate of Sync messages is increased.
– A transparent clock (TC) model has been developed.
– Hardware timestamp processing has been defined.
– Type-length-value (TLV) extension is used to enhance protocol features and
functions.
1588v2 is a time synchronization protocol which allows for highly accurate time
synchronization between devices. It is also used to implement frequency synchronization
between devices.
Purpose
Data communications networks do not require time or frequency synchronization and,
therefore, routers on such networks do not need to support time or frequency synchronization.
On IP radio access networks (RANs), time or frequency needs to be synchronized among base
transceiver stations (BTSs). Therefore, routers on IP RANs are required to support time or
frequency synchronization.
Table 13-3 Requirements of wireless standards for time synchronization and frequency
accuracy
Benefits
This feature brings the following benefits to operators:
l Construction and maintenance costs for time synchronization on wireless networks are
reduced.
l High-accuracy NQA-based unidirectional delay measurement is supported.
13.3.2 Principles
Clock Domain
Logically, a physical network can be divided into multiple clock domains. Each clock domain
has a reference time with which all devices in the domain are synchronized. Each clock
domain has its own reference time and these times are independent of one another.
A device can transparently transmit time signals from multiple clock domains over a bearer
network to provide specific reference times for multiple mobile operator networks. The
device, however, can join only one clock domain and can synchronize only with the
synchronization time of that clock domain.
Clock Node
Each node on a time synchronization network is a clock. The 1588v2 protocol defines the
following types of clocks:
l Ordinary clock
An ordinary clock (OC) has only one 1588v2 clock interface (a clock interface enabled
with 1588v2) through which the OC synchronizes with an upstream node or distributes
time signals to downstream nodes.
l Boundary clock
A boundary clock (BC) has multiple 1588v2 clock interfaces, one of which is used to
synchronize with an upstream node. The other interfaces can be used to distribute time
signals to downstream nodes.
The following is an example of a special case: If an ATN obtains the standard time from a
BITS through an external time interface (which is not enabled with 1588v2) and then
distributes time signals through two 1588v2 enabled clock interfaces to downstream
nodes, this router is a BC node, as it has more than one 1588v2 clock interface.
l Transparent clock
A transparent clock (TC) does not synchronize the time with other devices (unlike BCs
and OCs) but has multiple 1588v2 clock interfaces through which it transmits 1588v2
messages and corrects message transmission delays.
TCs are classified into end-to-end (E2E) TCs and peer-to-peer (P2P) TCs.
Figure 13-21 shows the location of the TC, OC, and TC+OC on a time synchronization
network.
Figure 13-21 Location of the TC, OC, and TC+OC on a time synchronization network
[Figure: grandmaster clock, BC1, and TC1 to TC4, with a cyclic path among the TCs]
1588v2 Announce messages are exchanged between clocks to convey time source information,
including the priority level of the GM, time strata, time accuracy, and the distance and
hops to the GM. After this information has been gathered, one of the clock nodes is
selected to be the GM, the interface to be used for transmitting clock signals issued by
the GM is selected, and master and slave relationships between nodes are specified. A
loop-free and full-meshed GM-rooted spanning tree is established after completion of the
process.
If a master-slave relationship has been set up between two nodes, the master node periodically
sends Announce messages to the slave node. If the slave node does not receive an Announce
message from the master node within a specified period of time, it terminates the current
master-slave relationship and finds another interface with which to establish a new master-
slave relationship.
Grandmaster
A time synchronization network is like a GM-rooted spanning tree. All other nodes
synchronize with the GM.
Master/Slave
When a pair of nodes perform time synchronization, the upstream node distributing the
reference time signals is the master node and the downstream node receiving the reference
time signals is the slave node.
1588v2 presumes that the link delay is constant or changes so slowly that the change between
two synchronization processes can be ignored, and the packet transmission delays in two
directions on a link are identical. Packets are time-stamped for delay measurement at the
physical layer of the LPU. This ensures that time synchronization based on the obtained link
delay is extremely accurate.
1588v2 defines two modes for the delay measurement and time synchronization mechanisms,
namely, Delay and Peer Delay (PDelay).
Delay Mode
The Delay mode is applied to end-to-end (E2E) delay measurement. Figure 13-22 shows the
delay measurement in Delay mode.
[Figure 13-22: Delay mode message exchange (Follow_Up, Delay_Req, and Delay_Resp) with
timestamps t1 to t4 and the delay t-sm]
NOTE
As shown in Figure 13-22, t-sm and t-ms represent the sending and receiving delays respectively and
are presumed to be identical. If they are different, they should be made identical through asymmetric
delay correction. For details about asymmetric delay correction, see the following part of this section.
Follow_Up packets are used in two-step mode. Only the one-step mode is described in this part and
Follow_Up packets are not mentioned. For details about the two-step mode, see the following part of
this section.
A master node periodically sends a Sync packet carrying the sending timestamp t1 to the slave
node. When the slave node receives the Sync packet, it time-stamps t2 to the packet.
The slave node periodically sends the Delay_Req packet carrying the sending timestamp t3 to
the master node. When the master node receives the Delay_Req packet, it time-stamps t4 to
the packet and returns a Delay_Resp packet to the slave node.
The slave node receives a set of timestamps, including t1, t2, t3, and t4. Other elements
affecting the link delay are ignored.
The time offset between the master and slave nodes equals [(t2-t1)-(t4-t3)]/2.
Based on the time offset, the slave node synchronizes with the master node.
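The offset formula can be sketched as follows, together with the mean path delay that follows from the same four timestamps when the two directions are assumed to be symmetric. The function name and the nanosecond values are hypothetical and chosen only for illustration.

```python
def delay_mode_offset(t1, t2, t3, t4):
    """Return (offset, mean_path_delay) of the slave relative to the master."""
    ms = t2 - t1   # master-to-slave difference: link delay + offset
    sm = t4 - t3   # slave-to-master difference: link delay - offset
    offset = (ms - sm) / 2
    mean_path_delay = (ms + sm) / 2
    return offset, mean_path_delay


# Hypothetical timestamps in nanoseconds: the slave clock is 100 ns ahead of
# the master clock, and the link delay is 50 ns in each direction.
offset, path_delay = delay_mode_offset(t1=1000, t2=1150, t3=2000, t4=1950)
print(offset, path_delay)
```

With these values the slave obtains an offset of 100 ns and a path delay of 50 ns, so it moves its clock back by 100 ns to synchronize with the master.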
[Figure 13-23: BC (master) and OC (slave) exchanging Sync, Delay_Req, and Delay_Resp
packets with timestamps t1 to t4]
The BC and OC can be directly connected as shown in Figure 13-23. Alternatively, they can
be connected through other devices, but these devices must be TCs to ensure the accuracy of
time synchronization. The TC only transparently transmits 1588v2 packets and corrects the
packet transmission delay (which requires that the TC identify these 1588v2 packets).
To ensure the high accuracy of 1588v2 time synchronization, it is required that the packet
transmission delays in two directions between master and slave nodes be stable. Usually, the
link delay is stable but the transmission delay on devices is unstable. Therefore, if two nodes
performing time synchronization are connected through forwarding devices, the time
synchronization accuracy cannot be guaranteed. The solution to the problem is to perform the
transmission delay correction on these forwarding devices, which requires that the forwarding
devices be TCs.
Figure 13-24 shows how the transmission delay correction is performed on a TC.
The TC performs the transmission delay correction by adding the time it takes to transmit the
packet to the Correction field of a 1588v2 packet. This means that the TC deducts the
receiving timestamp of the 1588v2 packet on its inbound interface and adds the sending
timestamp to the 1588v2 packet on its outbound interface.
In this manner, the 1588v2 packets exchanged between the master and slave nodes, when
passing through multiple TCs, carry the packet transmission delays of all TCs in the Correction
field. When the value of the Correction field is deducted, the value obtained is the link delay,
ensuring high-accuracy time synchronization.
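A minimal sketch of this accumulation follows. The function names and timestamp values are hypothetical, and the example assumes for simplicity that the slave clock has no offset, so that t2 - t1 is pure transit time. Each TC uses only the difference between its own two timestamps, so the TCs' local clocks do not need to agree with each other.

```python
def tc_update_correction(correction, ingress_ts, egress_ts):
    """Deduct the receiving timestamp and add the sending timestamp (one TC)."""
    return correction - ingress_ts + egress_ts


def link_delay_after_correction(t1, t2, correction):
    """Transit time with the accumulated TC residence time deducted."""
    return (t2 - t1) - correction


# Hypothetical path: links of 3, 2 and 5 units; TC residence times of 4 and 1.
correction = 0
for ingress, egress in [(1003, 1007), (509, 510)]:  # each TC's local clock
    correction = tc_update_correction(correction, ingress, egress)

total_link_delay = link_delay_after_correction(t1=0, t2=15, correction=correction)
print(correction, total_link_delay)  # 5 10
```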
A TC that records the transmission delay from end to end as described above is the E2E TC.
Time synchronization in Delay mode can be applied only to E2E TCs. Figure 13-25 shows
how the BC, OC, and E2E TC are connected and how 1588v2 operates.
Figure 13-25 Networking diagram of the BC, OC, and E2E TC and the 1588v2 operation
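(Diagram: The BC (master) and the OC (slave) exchange Sync and DelayResp packets with timestamps t1 to t4 through an E2E TC, which applies a correction to packets in each direction.)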
PDelay Mode
When performing time synchronization in PDelay mode, the slave node deducts both the
packet transmission delay and upstream link delay. This requires that adjacent devices
perform the delay measurement in PDelay mode to enable each device on the link to know its
upstream link delay. Figure 13-26 shows the delay measurement in PDelay mode.
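Figure 13-26 (diagram): A Pdelay_Req packet is sent at t1 and received at t2 (delay t-ms). A Pdelay_Resp packet is sent at t3 and received at t4 (delay t-sm), followed by a Pdelay_Resp_Follow_Up packet.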
NOTE
As shown in Figure 13-22, t-sm and t-ms represent the sending and receiving delays respectively and
are presumed to be identical. If they are different, they should be made identical through asymmetric
delay correction. For details about asymmetric delay correction, see the following part of this section.
Follow_Up packets are used in two-step mode. This part describes the one-step mode and does not
cover Follow_Up packets. For details about the two-step mode, see the following part of
this section.
Node 1 periodically sends a PDelay_Req packet carrying the sending timestamp t1 to node 2.
When the PDelay_Req packet is received, node 2 time-stamps t2 to the PDelay_Req packet.
Then, node 2 sends a PDelay_Resp packet carrying the sending timestamp t3 to node 1. When
the PDelay_Resp packet is received, node 1 time-stamps t4 to the PDelay_Resp packet.
Node 1 obtains a set of timestamps, including t1, t2, t3, and t4. Other elements affecting the
link delay are ignored.
The total packet transmission delay in the two directions on the link between node 1 and node 2 equals
(t4 - t1) - (t3 - t2).
If the packet transmission delays in the two directions on the link between node 1 and node 2 are
identical, the packet transmission delay in one direction equals [(t4 - t1) - (t3 - t2)]/2.
The delay measurement in PDelay mode does not differentiate between the master and slave
nodes. All nodes send PDelay packets to their adjacent nodes to calculate adjacent link delay.
This calculation process repeats and the packet transmission delay in one direction is updated
accordingly.
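The link delay measurement can be sketched as follows. The function name and sample values are hypothetical. Note that t1 and t4 are read on node 1 and t2 and t3 on node 2, so any time offset between the two nodes cancels out.

```python
def pdelay_one_way_delay(t1, t2, t3, t4):
    """One-way link delay from a PDelay_Req/PDelay_Resp exchange.

    t1: node 1 sends PDelay_Req      (node 1 clock)
    t2: node 2 receives PDelay_Req   (node 2 clock)
    t3: node 2 sends PDelay_Resp     (node 2 clock)
    t4: node 1 receives PDelay_Resp  (node 1 clock)
    """
    round_trip = (t4 - t1) - (t3 - t2)
    return round_trip / 2  # assumes identical delays in both directions


# Hypothetical case: link delay of 4 units, node 2 clock 100 units ahead,
# node 2 takes 6 units to respond.
print(pdelay_one_way_delay(t1=0, t2=104, t3=110, t4=14))  # 4.0
```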
The delay measurement in PDelay mode does not trigger time synchronization. To implement
time synchronization, the master node needs to periodically send Sync packets to the slave
node and the slave node receives the t1 and t2 timestamps. The slave node then deducts the
packet transmission delay on the link from the master node to the slave node. The obtained
value, t2 - t1 - CorrectionField, is the time offset between the slave and master nodes. The slave node uses
the time offset to synchronize with the master node. Figure 13-27 shows how time
synchronization is implemented in PDelay mode in the scenario where the BC and OC are
directly connected.
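A sketch of the offset calculation in PDelay mode follows. It assumes, for illustration only, that the slave keeps its own measured upstream link delay separate from the Correction field carried in the Sync packet and deducts both; the function and parameter names are hypothetical.

```python
def pdelay_mode_offset(t1, t2, correction, upstream_link_delay):
    """Offset of the slave relative to the master in PDelay mode.

    t1: sending timestamp carried in the Sync packet (master clock)
    t2: receiving timestamp of the Sync packet       (slave clock)
    correction: value of the Correction field in the Sync packet
    upstream_link_delay: delay measured earlier with PDelay packets
    """
    return (t2 - t1) - correction - upstream_link_delay


# Hypothetical case: direct BC-OC link of 4 units, slave 7 units ahead.
print(pdelay_mode_offset(t1=1000, t2=1011, correction=0, upstream_link_delay=4))  # 7
```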
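Figure 13-27 Networking diagram of time synchronization in PDelay mode on the directly-connected BC and OC
(Diagram: The BC (master) and the OC (slave) each measure the link delay by exchanging PDelay Req and PDelay Resp packets with timestamps t1 to t4. The BC then sends a Sync packet at t1, which the OC receives at t2.)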
Figure 13-29 shows how the BC, OC, and P2P TC are connected and how 1588v2 operates.
(Diagram: The BC (master), P2P TC, and OC (slave) measure the delay of each link by exchanging PDelayReq and PDelayResp packets with timestamps t1 to t4. The Sync packet sent by the BC at t1 passes through the P2P TC, which applies a correction, and reaches the OC at t2.)
One-Step/Two-Step
In one-step mode, both the Sync packets for time synchronization in Delay mode and
PDelay_Resp packets for time synchronization in PDelay mode are stamped with a sending
time.
Asymmetric Correction
Theoretically, 1588v2 requires the packet transmission delays in two directions on a link to be
symmetrical. Otherwise, the algorithms of 1588v2 time synchronization cannot be
implemented. In practice, however, the packet transmission delays in two directions on a link
may be asymmetric due to the attributes of a link or a device. For example, the delays
between receiving a packet and time-stamping it may differ in the two directions.
For such cases, 1588v2 provides a mechanism of asymmetric delay correction, as shown in Figure 13-30.
Master clock
or
Responder
A B
t-sm t-ms
Slave clock
or
Requestor
Usually, t-ms is identical with t-sm. If they are different, the user can set a delay offset
between them as long as the delay offset is constant and obtainable. 1588v2 performs the time
synchronization calculation according to the asymmetric correction value. In this manner, a
high level of time synchronization accuracy can be achieved on an asymmetric-delay link.
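The effect of a configured asymmetry value can be illustrated with a short sketch. The function name and the sign convention (asymmetry defined as t-ms minus t-sm) are assumptions made for this example, not the device's configuration semantics. With t2 - t1 = offset + t-ms and t4 - t3 = t-sm - offset, subtracting the two gives the corrected formula used below.

```python
def offset_with_asymmetry(t1, t2, t3, t4, asymmetry=0):
    """Offset of the slave relative to the master on an asymmetric link.

    asymmetry: known constant difference t_ms - t_sm, where t_ms is the
    master-to-slave delay and t_sm is the slave-to-master delay.
    """
    return ((t2 - t1) - (t4 - t3) - asymmetry) / 2


# Hypothetical case: true offset 5, t_ms = 12, t_sm = 8.
uncorrected = offset_with_asymmetry(100, 117, 130, 133)
corrected = offset_with_asymmetry(100, 117, 130, 133, asymmetry=4)
print(uncorrected, corrected)  # 7.0 5.0
```

Without the correction, half of the asymmetry (2 units here) appears as an error in the calculated offset.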
Packet Encapsulation
1588v2 defines the following multiple packet encapsulation modes:
l Layer 3 unicast encapsulation through unicast UDP
The destination UDP port number is 319 or 320, depending on the types of 1588v2
packets.
Currently, it is recommended that Huawei base stations adopt Layer 3 unicast
encapsulation. The IP clock server consists of multiple BTSs and uses unicast UDP
packets to exchange 1588v2 protocol packets. Figure 13-31 shows Layer 3 unicast
encapsulation without VLAN tags.
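Figure 13-31 (frame layout, without VLAN tags): DA | SA | 0x800 | IP (header) | UDP (header) | 1588 packet
(Frame layout with a VLAN tag): DA | SA | 0x8100 | priority (3 bits) | VLAN (12 bits) | IP (header) | UDP (header) | 1588 packet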
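The port choice can be sketched as a lookup. The split below follows IEEE 1588-2008, which assigns UDP port 319 to event messages (those that are time-stamped) and port 320 to general messages; this document itself states only that the port depends on the packet type. The function name is hypothetical.

```python
EVENT_PORT = 319    # time-stamped (event) messages
GENERAL_PORT = 320  # general messages

EVENT_MESSAGES = {"Sync", "Delay_Req", "Pdelay_Req", "Pdelay_Resp"}
GENERAL_MESSAGES = {
    "Announce", "Follow_Up", "Delay_Resp",
    "Pdelay_Resp_Follow_Up", "Signaling", "Management",
}


def ptp_udp_port(message_type):
    """Return the destination UDP port for a 1588v2 message type."""
    if message_type in EVENT_MESSAGES:
        return EVENT_PORT
    if message_type in GENERAL_MESSAGES:
        return GENERAL_PORT
    raise ValueError(f"unknown 1588v2 message type: {message_type}")


print(ptp_udp_port("Sync"), ptp_udp_port("Delay_Resp"))  # 319 320
```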
BITS Interface
1588v2 enables clock nodes to synchronize with each other, but cannot enable them to
synchronize with Greenwich Mean Time (GMT). If the clock nodes need to synchronize with
GMT, an external time source is required. That is, the GM needs to be connected to an
external time source to obtain the reference time in non-1588v2 mode.
Currently, the external time sources are from satellites, such as the GPS from the U.S.A,
Galileo from Europe, GLONASS from Russia, and Beidou from China. Figure 13-33 shows
how the GM and an external time source are connected.
Grandmaster
1588v2
External
time port
ATN CX BITS
Clock Synchronization
In addition to time synchronization, 1588v2 can be used for clock synchronization, that is,
frequency recovery can be achieved through 1588v2 packets.
1588v2 time synchronization in Delay or PDelay mode requires the device to periodically
send Sync packets to its peer.
The sent Sync packet carries a sending timestamp. After receiving the Sync packet, the peer
adds a receiving timestamp to it. When the link delay is stable, the two timestamps change at
the same pace. If the receiving timestamp changes are faster or slower, it indicates that the
clock of the receiving device runs faster or slower than the clock of the sending device. In this
case, the clock of the receiving device needs to be adjusted. After the adjustment, the frequencies
of the two devices are synchronized.
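The comparison of timestamp pace can be sketched as below. This is a simplified illustration that uses only the first and last Sync packets and assumes a stable link delay; the function name and values (in nanoseconds) are hypothetical, and a real implementation would filter many samples.

```python
def frequency_offset_ppm(send_ts, recv_ts):
    """Estimate how fast the receiver clock runs relative to the sender.

    send_ts: sending timestamps of successive Sync packets (sender clock)
    recv_ts: receiving timestamps of the same packets (receiver clock)
    Returns the offset in parts per million; positive means the receiver
    clock runs faster than the sender clock.
    """
    sender_elapsed = send_ts[-1] - send_ts[0]
    receiver_elapsed = recv_ts[-1] - recv_ts[0]
    return (receiver_elapsed / sender_elapsed - 1) * 1e6


# Hypothetical samples: the receiver gains 100 ns per 1,000,000 ns.
send_ts = [0, 1_000_000, 2_000_000]
recv_ts = [500, 1_000_600, 2_000_700]
print(round(frequency_offset_ppm(send_ts, recv_ts), 3))  # 100.0
```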
The frequency restored through 1588v2 packets has a lower accuracy than the frequency
restored through synchronous Ethernet. Therefore, it is recommended to perform frequency
synchronization through synchronous Ethernet and time synchronization through 1588v2.
1588v2 restores the frequency in the following modes:
l Hop-by-hop
In hop-by-hop mode, all devices on a link are required to support 1588v2. The frequency
recovery in this mode is highly accurate. In the case of a small number of hops, the
frequency recovery accuracy can meet the requirement of ITU-T G.813 (stratum 3
standard).
l End-to-end (Delay and jitter may occur on the transit network.)
In end-to-end mode, the forwarding devices do not need to support 1588v2, and the
delay of the forwarding path is only required to meet a specified level, for example, less
than 20 ms. The frequency recovery accuracy in this mode is low, and can meet only the
requirements of G.8261 and base stations (50 ppb) rather than those of the stratum 3
clock standard.
To achieve high frequency recovery accuracy, Sync packets must be transmitted at a high
frequency. For physical-layer frequency synchronization, at least 16 packets must be
transmitted per second; for 1588v2 frequency synchronization, at least 128 packets must be
transmitted per second.
Figure 13-34 (diagram): A clock server with 1588 and NodeBs with 1588 are connected through FE and GE links across the bearer network.
As shown in Figure 13-34, clock servers and NodeBs exchange TOP-encapsulated 1588
messages over a QoS-enabled bearer network with the jitter being less than 20 ms.
Scenario description:
Solution description:
l The bearer network is connected to a wireless IP clock server and adopts 1588v2 clock
synchronization and frequency recovery in E2E mode.
l 1588v2 timing messages need to be transparently transmitted by priority over the bearer
network; the E2E jitter on the bearer network must be less than 20 ms.
l Advantage of the solution: Devices on the bearer network are not required to support
1588v2, and are therefore easily deployed.
l Disadvantage of the solution: Only frequency synchronization rather than time
synchronization is performed. In practice, an E2E jitter of less than 20 ms is not ensured.
Figure 13-35 (diagram): NodeBs with and without 1588 are connected over FE and GE links that use the WAN clock, synchronous Ethernet, and 1588. The legend distinguishes physical clock signal transfer from 1588 clock signal transfer.
As shown in Figure 13-35, the clock source can send clock signals to NodeBs through the
1588v2 clock, synchronous Ethernet clock, or any combination of clocks.
Scenario description:
Solution description:
l The Synchronous Digital Hierarchy (SDH) or synchronous Ethernet clock sends stratum
3 clock signals through physical links. On the GE links that do not support the
synchronous Ethernet clock, stratum 3 clock signals are transmitted through 1588v2.
l Advantage of the solution: The solution is simple and flexible.
l Disadvantage of the solution: Only frequency synchronization rather than time
synchronization is performed.
Figure 13-36 Networking diagram of the bearer and wireless networks in the same clock
domain
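(Diagram: GPS+BITS sources feed a chain of BC nodes connected over GE links. NodeBs with 1588 are connected through FE links, and a NodeB without 1588 is connected through an E1 link. The legend distinguishes physical clock signal transfer from 1588 clock signal transfer.)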
Scenario description:
l NodeBs need to synchronize time with each other.
l The bearer and wireless networks are in the same clock domain.
Solution description:
l The core node supports GPS or BITS clock interfaces.
l All nodes on the bearer network function as BC nodes, which support the link delay
measurement mechanism to handle fast link switching.
l Links or devices that do not support 1588v2 can be connected to devices with GPS or
BITS clock interfaces to perform time synchronization.
l Advantage of the solution: The time of all nodes is synchronous on the entire network.
l Disadvantage of the solution: All nodes on the entire network must support 1588v2.
Terms
Terms | Description
Clock domain | Logically, a physical network can be divided into multiple clock domains. Each clock domain has a reference time, with which all devices in the domain are synchronized. Different clock domains have their own reference time, which is independent of each other.
Clock node | Each node on a time synchronization network is a clock. The 1588v2 protocol defines three types of clocks: OC, BC, and TC.
Clock reference source | Clock source selection is a method to select reference clocks based on the clock selection algorithm.
One-step mode | In one-step mode, Sync messages in Delay mode and PDelay_Resp messages in PDelay mode are stamped with the time when messages are sent.
Two-step mode | In two-step mode, Sync messages in Delay mode and PDelay_Resp messages in PDelay mode only record the time when messages are sent and carry no timestamps. The timestamps are carried in the messages, such as Follow_Up and PDelay_Resp_Follow_Up messages.
Abbreviations
Abbreviation Full Spelling
BC Boundary Clock
OC Ordinary Clock
TC Transparent Clock
Purpose
All-IP has become the trend for future networks and services. Therefore, traditional networks
based on the Synchronous Digital Hierarchy (SDH) have to overcome various constraints
before migrating to IP packet-switched networks.
1588v2 is a software-based technology that carries out time and frequency synchronization.
To achieve higher accuracy, 1588v2 requires that all devices on a network support 1588v2; if
not, frequency synchronization cannot be achieved.
Derived from 1588v2, 1588 ACR implements frequency synchronization with clock servers
on a network with both 1588v2-aware devices and 1588v2-unaware devices. Therefore, in the
situation where only frequency synchronization is required, 1588 ACR is more applicable
than 1588v2.
Benefits
This feature brings the following benefits to operators:
13.4.2 Principles
A client initiates a negotiation with a server in the server list by sending a request to the
server. After receiving the request, the server replies with an authorization packet,
implementing a 2-way handshake. After the handshake is complete, the client and server
exchange Layer 3 unicast packets to set up a clock link, and then exchange 1588v2 messages
over the link to achieve frequency synchronization.
Duration Mechanism
On a 1588 ACR client, you can configure a duration for Announce, Sync, and delay_resp
packets. The duration value is carried in the TLV field of a packet for negotiating signaling
and sent to a server.
Generally, the client sends a packet to renegotiate with the server before the duration times
out so that the server can continue to provide the client with synchronization services.
If the link connected to the client goes Down or fails, the client cannot renegotiate with the
server. When the duration times out, the server stops sending Sync packets to the client.
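The duration behavior can be modeled with a toy sketch. The class and method names are hypothetical and the signaling exchange itself (the TLV carried in the negotiation packet) is not modeled; the sketch only shows that the server keeps sending Sync packets while the most recent grant has not timed out.

```python
class SyncGrant:
    """Server-side view of one client's negotiated duration (in seconds)."""

    def __init__(self, duration):
        self.duration = duration
        self.granted_at = None

    def negotiate(self, now):
        """Called when a (re)negotiation request from the client succeeds."""
        self.granted_at = now

    def server_sends_sync(self, now):
        """True while the grant has not timed out."""
        return self.granted_at is not None and now - self.granted_at < self.duration


grant = SyncGrant(duration=300)
grant.negotiate(now=0)
grant.negotiate(now=250)                 # client renegotiates before the timeout
print(grant.server_sends_sync(now=500))  # True: the grant runs until t = 550
print(grant.server_sends_sync(now=560))  # False: link failed, no renegotiation
```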
NOTE
t1 Data obtained
by the client
clock
t2
t1' t1 t2
a. The server sends the client 1588v2 messages at t1 and t1' and time-stamps the
messages with t1 and t1'.
b. The client receives the 1588v2 messages at t2 and t2' and time-stamps the messages
with t2 and t2'.
t1 and t1' are the clock time of the server, and t2 and t2' are the clock time of the client.
By comparing the sending time on the server and the receiving time on the client, 1588
ACR calculates a frequency offset between the server and client and then implements
frequency synchronization. For example, if the result of the formula (t2 - t1)/(t2' - t1') is
1, frequencies on the server and client are the same; if not, the frequency of the client
needs to be adjusted so that it is the same as the frequency of the server.
l Two-way mode
Data obtained
t1 Sync by the client
clock
t2 t1 t2
t3 t1 t2 t3
Delay_Req
t4
t5
Delay_Resp t1 t2 t3 t4
a. The server clock sends a 1588 Sync packet carrying the timestamp t1 to the client
clock at t1.
b. The client clock receives the 1588 Sync packet from the server clock at t2.
c. The client clock sends a 1588 Delay_Req packet to the server clock at t3.
d. The server clock receives the 1588 Delay_Req packet from the client clock at t4, and
sends a Delay_Resp packet to the client clock.
The same calculation method is used in two-way and one-way modes. In two-way mode, the pair t1 and t2
is compared with the pair t3 and t4, and the group of data with less jitter is used for calculation. Under
the same network conditions, two-way mode can therefore trace the clock signals in the direction with
less jitter, which is more precise than tracing clock signals in a single fixed direction.
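The direction choice in two-way mode can be sketched as follows. The text does not specify how jitter is measured; this example assumes the population standard deviation of the per-packet delay samples, and the function name and sample values are hypothetical.

```python
from statistics import pstdev


def pick_direction(t1, t2, t3, t4):
    """Choose the direction whose delay samples vary less.

    t1/t2: send and receive timestamps of Sync packets (server to client)
    t4/t3: receive and send timestamps of Delay_Req packets (client to server)
    Returns "forward" (t1, t2) or "reverse" (t3, t4).
    """
    forward = [rx - tx for tx, rx in zip(t1, t2)]
    reverse = [rx - tx for tx, rx in zip(t3, t4)]
    return "forward" if pstdev(forward) <= pstdev(reverse) else "reverse"


# Hypothetical samples: forward delays 10, 18, 5; reverse delays 11, 10, 11.
choice = pick_direction(
    t1=[0, 100, 200], t2=[10, 118, 205],
    t3=[20, 120, 220], t4=[31, 130, 231],
)
print(choice)  # reverse
```

A constant clock offset between server and client shifts every sample in a direction by the same amount, so it does not affect the comparison.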
13.4.3 Applications
Typical Applications of 1588 ACR
On an IP RAN shown in Figure 13-39, NodeBs need to implement only frequency
synchronization rather than phase synchronization; devices on an MPLS backbone network do
not support 1588v2; the RNC-side device is connected to an IPCLK server; cell site gateways
(CSGs) support 1588 ACR.
NodeB1 transmits wireless services along an E1 link to a CSG, and NodeB2 transmits
wireless services along an Ethernet link to the other CSG.
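Figure 13-39 (diagram): NodeB1 is connected to a CSG over an E1 link, and NodeB2 is connected to the other CSG over an FE link. The CSGs reach RSG1 and RSG2 across the MPLS backbone, and the RSG side connects to the RNC and the IP CLK server. The legend marks 1588v2 packets, line clock signals, and NodeB services.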
On the preceding network, CSGs support 1588 ACR and function as clients to initiate
requests for Layer 3 unicast connections to the upstream IPCLK server. The CSGs then
exchange 1588 messages with the IPCLK server over the connections, achieving frequency
recovery. RSG1 and RSG2 are configured as clock servers for the CSGs to provide protection.
One CSG sends line clock signals carrying frequency information to NodeB1 along an E1
link. The other CSG transmits frequency information to NodeB2 along a synchronous
Ethernet link. In this manner, both NodeBs connected to the CSGs can achieve frequency
synchronization.
Figure 13-40 Networking diagram of 1588 ACR applications in the L3VPN scenario
On the preceding network, CSGs support 1588 ACR and function as clients to initiate
requests for Layer 3 unicast connections to the upstream IP CLK server. The CSGs then
exchange 1588 messages with the IP CLK server over the connections, achieving frequency
recovery. The two CSGs are configured as active and standby clock servers to provide
protection. LSP labels are added to service flows, which are forwarded through PWs on an MPLS network.
The ATN adds LSP labels to locally generated and received packets.
l Downstream
The CX connected to IP CLK servers and the level-2 network implements service PWE
encapsulation. The CX, also functioning as the 1588 ACR server, generates 1588 ACR
packets and sends them to the level-2 network along with other service
packets through an Ethernet interface. Ethernet service packets, not clock packets, are
then transparently transmitted over the level-2 network. The ATNA connected to the PSN
receives and forwards the service packets to downstream devices. The ATNA,
also functioning as the 1588 ACR client, extracts 1588 ACR packets from the service
packets to restore the clock, implements hop-by-hop frequency synchronization in
synchronous Ethernet mode, and sends the clock packets to the next-hop ATNA. The
ATNA connected to the NodeB extracts E1 services from the service packets and sends the
clock packets to the NodeB in E1 re-timing mode.
l Upstream
The ATNA connected to NodeB encapsulates service packets, adds LSP labels to them,
and sends them to ATNB that connects to level-2 network through an Ethernet interface.
Ethernet service packets, not clock packets, are then transparently transmitted over
level-2 network. The CX connected to level-2 network receives and forwards the service
packets to upstream devices, and extracts 1588 ACR packets from the service packets as
the 1588 ACR server.
The CSG can send frequency information to NodeB and eNB using a synchronized Ethernet
clock line, implementing frequency synchronization at all NodeBs.
Terms
Term | Description
IEEE 1588v2 PTP | 1588v2, defined by the Institute of Electrical and Electronics Engineers (IEEE), is a standard for Precision Clock Synchronization Protocol for Networked Measurement and Control Systems. The Precision Time Protocol (PTP) is used for short.
Abbreviations
Abbreviation Full Spelling
The preceding networks are a summary of the general experience for availability evaluation on
network deployment of the end-to-end 1588 time synchronization solution, instead of sufficient
conditions for stable running of end-to-end 1588 time synchronization. The general experience can
be used for preliminary evaluation.
Definition
The 1588 Adaptive Time Recovery (ATR) algorithm is used to carry out time synchronization
between the clock clients and clock servers by exchanging 1588v2 messages over a clock link
that is set up by sending Layer 3 unicast packets.
Unlike 1588v2 that achieves time synchronization only when all devices on a network support
1588v2, 1588 ATR is capable of implementing time synchronization on a network with both
1588v2-aware devices and 1588v2-unaware devices.
1588 ATR is a client/server protocol through which servers communicate with clients to
achieve time synchronization.
Purpose
All-IP has become the trend for future networks and services. Therefore, traditional networks
based on the Synchronous Digital Hierarchy (SDH) have to overcome various constraints
before migrating to IP packet-switched networks. Transmitting Time Division Multiplexing
(TDM) services over IP networks presents a major technological challenge. TDM services are
classified into two types: voice services and clock synchronization services. With the
development of VoIP, technologies of transmitting voice services over an IP network have
become mature and have been extensively used. However, development of technologies of
transmitting clock synchronization services over an IP network is still under way.
1588v2 is a software-based technology that carries out time and frequency synchronization. To
achieve higher accuracy, 1588v2 requires that all devices on a network support 1588v2; if not,
time synchronization cannot be achieved.
To address this disadvantage, 1588 ATR is introduced to allow time synchronization over a
third-party network that includes 1588v2-incapable devices. On the live network, 1588v2 is
preferred for 1588v2-capable devices, and 1588 ATR is used when 1588v2-incapable devices
exist.
Benefits
This feature brings the following benefits to operators:
l Does not require 1588v2 to be supported by all network devices, reducing network
construction costs.
l Operators can provide more services that can meet subscribers' requirements for time
synchronization.
13.5.2 Principles
A client initiates a negotiation request with a server. The server replies with an authorization
packet to implement handshake. After the handshake succeeds, the client and server establish
a clock link through Layer 3 unicast packets. Then, the client and server exchange PTP
packets to implement time synchronization over the clock link.
Duration Mechanism
On a 1588 ATR client, you can configure a duration for Announce, Sync, and delay_resp
packets. The duration value is carried in the TLV field of a packet for negotiating signaling
and sent to a server.
Generally, the client sends a packet to renegotiate with the server before the duration times
out so that the server can continue to provide the client with synchronization services.
If the link connected to the client goes Down or fails, the client cannot renegotiate with the
server. When the duration times out, the server stops sending Sync packets to the client.
After the client triggers negotiation with a server, the client periodically checks the
negotiation result. If the client finds that the negotiation process fails or the server fails after
the client implements synchronization with the server, the client detects the negotiation status
change. If the client finds that servers are working properly during the query of the
negotiation result, the client selects a server to connect to based on the quality levels of the
servers.
When only one server is configured, the client re-attempts to negotiate with the server after a
negotiation failure. This allows a client to renegotiate with a server that is only temporarily
unavailable in certain situations, such as when the server fails and then recovers or when the
server is restarted.
1588 ATR sends Layer 3 unicast packets to establish a time link between a client and a server
to exchange 1588v2 messages. 1588 ATR obtains a time offset by comparing timestamps
carried in the 1588v2 messages, which enables the client to synchronize time with the server.
NOTE
1588 ACR clock synchronization is implemented in two modes: one-way mode and two-way
mode.
l One-way mode
t1 Data obtained
by the client
clock
t2
t1' t1 t2
a. The server sends the client 1588v2 messages at t1 and t1' and time-stamps the
messages with t1 and t1'.
b. The client receives the 1588v2 messages at t2 and t2' and time-stamps the messages
with t2 and t2'.
t1 and t1' are the clock time of the server, and t2 and t2' are the clock time of the client.
By comparing the sending time on the server and the receiving time on the client, 1588
ACR calculates a frequency offset between the server and client and then implements
frequency synchronization. For example, if the result of the formula (t2 - t1)/(t2' - t1') is
1, frequencies on the server and client are the same; if not, the frequency of the client
needs to be adjusted so that it is the same as the frequency of the server.
l Two-way mode
Data obtained
t1 Sync by the client
clock
t2 t1 t2
t3 t1 t2 t3
Delay_Req
t4
t5
Delay_Resp t1 t2 t3 t4
a. The server clock sends a 1588 Sync packet carrying the timestamp t1 to the client
clock at t1.
b. The client clock receives the 1588 Sync packet from the server clock at t2.
c. The client clock sends a 1588 Delay_Req packet to the server clock at t3.
d. The server clock receives the 1588 Delay_Req packet from the client clock at t4, and
sends a Delay_Resp packet to the client clock.
The round-trip latency of the link between the server and client is (t4-t1)-(t3-t2). 1588 ATR
requires the same link latency on two links involved in the same round trip. Therefore, the
offset of the client is t2-t1-[(t4 - t1) - (t3 - t2)]/2 = [(t2 - t1) -(t4 - t3)]/2, compared to the time
of the server. The client then uses the calculation result to adjust its local time.
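The equality stated above can be checked by expanding the terms. The names "delay" and "offset" are shorthand introduced here for the one-way link latency and the client offset; t1 to t4 are the timestamps defined in the preceding steps.

```latex
\begin{aligned}
\text{delay}  &= \frac{(t_4 - t_1) - (t_3 - t_2)}{2} \\
\text{offset} &= (t_2 - t_1) - \text{delay}
               = \frac{2(t_2 - t_1) - (t_4 - t_1) + (t_3 - t_2)}{2}
               = \frac{(t_2 - t_1) - (t_4 - t_3)}{2}
\end{aligned}
```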
13.5.3 Applications
Typical Applications of 1588 ATR
On the IP RAN shown in the following figure, time synchronization needs to be performed
between NodeBs, but the third-party network (such as a microwave or switch network) does
not support 1588v2.
If the third-party network supports frequency synchronization but not time synchronization,
frequency is restored at the physical layer hop by hop, and time is restored using 1588 ATR. If
the third-party network does not support frequency synchronization or time synchronization,
1588 ATR is used for frequency and time synchronization.
Client 1
Server 1
NodeB 1 Master CLK 1
Server 2 CLK 2
NodeB 2 Client 2
The ATN device can function as a boundary clock (BC) to restore time from the upstream
device and also function as a master to implement E2E 1588 time synchronization with the
downstream slave device. The slave device can request time synchronization packets from the
master device to implement time synchronization and function as a BC to provide the hop-by-
hop time synchronization information to downstream devices. The slave device can
implement physical-layer hop-by-hop frequency recovery or 1588 ATR-based frequency
recovery. The slave device can negotiate with multiple master devices to implement time
source backup and protection switching.
IEEE 1588v2 PTP | 1588v2, defined by the Institute of Electrical and Electronics Engineers (IEEE), is a standard for Precision Clock Synchronization Protocol for Networked Measurement and Control Systems. The Precision Time Protocol (PTP) is used for short.
Abbreviations
Abbreviation Full Spelling
Device Description
13.6.1 Introduction
Definition
Circuit emulation service (CES) clock synchronization implements adaptive clock frequency
synchronization and asynchronous clock frequency synchronization based on CESs. CES
clock synchronization uses special circuit emulation headers to encapsulate time division
multiplexing (TDM) service packets that carry clock frequency information and transmits these packets
over a packet switched network (PSN).
Purpose
If a clock frequency is out of the allowed error range, problems such as bit errors and jitter
occur. As a result, network transmission performance deteriorates. Clock synchronization
confines the clock frequencies of all network elements (NEs) on a digital network to the
allowed error range, enhancing network transmission stability.
CES clock synchronization applies when the intermediate PSN does not support clock synchronization
at the physical layer and needs to transmit clock frequency information using the TDM services of the CES.
13.6.2 Principles
CES
The circuit emulation service (CES) technology originated from the asynchronous transfer
mode (ATM) network. CES uses emulated circuits to encapsulate circuit service data into
ATM cells and transmits these cells over the ATM network. Circuit emulation was later used
on the metro Ethernet network to transparently transmit circuit switched services like TDM.
CES uses special circuit emulation headers to encapsulate TDM service packets that carry
clock information and transmits these packets over a PSN.
CES ACR
The CES technology generally uses the adaptive clock recovery (ACR) algorithm to
synchronize clock frequencies. If an Ethernet network transmits TDM services over emulated circuits,
the network uses the ACR algorithm to extract clock synchronization information from data
packets.
(Diagram: CE1 is connected over an E1 link to IWF1. IWF1 and IWF2 carry TDM services over a PW across the PSN.)
CES DCR
As shown in Figure 13-44, if the PSN does not support physical-layer clock synchronization,
clock information is sent through TDM packets in CES DCR mode. Details are as follows:
1. The clock source sends clock frequency information to CE1.
2. CE1 encapsulates the clock frequency information into TDM service packets and sends them to
gateway IWF1.
3. The master gateway IWF1 and the slave gateway IWF2 obtain the common clock for
CES DCR using clock source 2 and clock source 3, respectively. Clock source 2 and
clock source 3 are connected to the same upper-layer clock source.
4. The master gateway IWF1 periodically sends the slave gateway IWF2 the service clock
information, which is contained in an E1 emulation packet and expressed as a sequence
number or timestamp.
5. The slave gateway IWF2 extracts the timestamp from the E1 emulation packet. Using the
common clock, IWF2 first restores clock information based on the timestamp and then restores
the service clock information using a differential algorithm. Over the long term, the
local clock recovered on the slave gateway IWF2 is the same as the source clock. In
this manner, frequency synchronization is implemented between the two IWFs on the PSN.
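The differential idea in the steps above can be illustrated with a simplified sketch. It assumes that each packet carries a fixed number of service-clock bits and a timestamp counted in ticks of the common clock; the function name, the 10 MHz common clock, and the 2048-bit payload are hypothetical values chosen for the example, not device parameters.

```python
def dcr_recover_frequency(timestamps, bits_per_packet, common_clock_hz):
    """Recover the service clock frequency on the slave gateway.

    timestamps: common-clock tick counts stamped by the master gateway,
                one per emulation packet, in sending order
    bits_per_packet: service-clock bits carried by each packet
    common_clock_hz: frequency of the common clock shared by both gateways
    """
    elapsed_ticks = timestamps[-1] - timestamps[0]
    packets = len(timestamps) - 1
    return packets * bits_per_packet * common_clock_hz / elapsed_ticks


# Hypothetical E1 service: 2048 bits per packet, one packet every 10,000
# ticks of a 10 MHz common clock (that is, every 1 ms).
recovered = dcr_recover_frequency([0, 10_000, 20_000, 30_000], 2048, 10_000_000)
print(recovered)  # 2048000.0
```

Because the timestamps are expressed in ticks of a clock that both gateways share, packet delay variation on the PSN does not enter the calculation.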
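(Diagram: CE1, clocked by BITS1, is connected over an E1 link to IWF1. IWF1 and IWF2 carry TDM services over a PW across the PSN and obtain the common clock from BITS2 and BITS3 through synchronization networks.)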
13.6.3 Applications
CES ACR is used in scenarios in which the intermediate PSN does not support clock
synchronization at the physical layer and needs to transmit clock frequency information using
TDM services.
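Figure 13-46 (diagram): A BITS clock source feeds CE1, which is connected over an E1 link to IWF1. IWF1 and IWF2 carry TDM services over a PW across the PSN.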
As shown in Figure 13-46, the clock source sends clock frequency information to a CE. The
CE encapsulates clock frequency information into TDM service packets and transmits these
packets over the intermediate PSN to the peer CE. CES ACR recovers clock frequency
information at the IWF connected to the peer CE. In practical application, multiple E1
interfaces can belong to the same clock recovery domain. By default, the system selects a PW
as the primary PW and uses the primary PW to recover clock signals. If the primary PW fails,
the system selects the next available PW as the primary PW to recover clocks. In this manner,
clock protection among multiple PWs is implemented.
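The PW protection behavior can be sketched as a simple selection. The text does not state how the "next available PW" is ordered; this example assumes a configured list order, and the function and PW names are hypothetical.

```python
def select_primary_pw(pws):
    """Return the name of the first available PW, or None if all have failed.

    pws: list of (name, is_up) tuples in the clock recovery domain,
         in the assumed order of preference
    """
    for name, is_up in pws:
        if is_up:
            return name
    return None


domain = [("pw1", True), ("pw2", True), ("pw3", True)]
print(select_primary_pw(domain))  # pw1

domain[0] = ("pw1", False)        # the primary PW fails
print(select_primary_pw(domain))  # pw2
```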
Abbreviations
Abbreviation Full Spelling
13.7 G.8275.1
13.7.1 Introduction
Definition
G.8275.1 is a precise time synchronization standard defined by the International Telecommunication Union -
Telecommunication Standardization Sector (ITU-T).
Purpose
Because datacom networks do not require time or clock synchronization, routers on such
networks do not implement time or clock synchronization. To meet requirements of base
stations on an IP radio access network (RAN), routers used on the IP RAN must implement
time and clock synchronization. Clock synchronization between base stations is necessary for
the IP RAN. Frequency asynchrony leads to call drops when wireless terminals switch
between base stations. In addition to frequency synchronization (clock synchronization), some
wireless standards require phase synchronization (time synchronization). Table 13-7 lists time
and clock synchronization requirements of wireless standards.
The requirements for clock synchronization on base stations that support different standards
can be met using multiple methods, such as physical clocks (including external clock input
and synchronous Ethernet) and packet-based recovery clocks (including NTP and 1588v2).
Traditionally, base stations are connected only to the global positioning system (GPS) to
implement time synchronization. Packet-based time synchronization that the Network Time
Protocol (NTP) and IEEE 1588v1 support cannot meet requirements of base stations. Time
synchronization can reach only sub-second precision using NTP and sub-millisecond
precision using IEEE 1588v1. The cost of GPS installation and maintenance is high. In
addition, because the GPS relies on satellites owned by different countries, communication
security cannot be guaranteed.
To resolve the preceding problems, ITU-T G.8275.1 can be used to implement
submicrosecond time synchronization. ITU-T G.8275.1, with hardware processing supported,
is cost effective and less dependent on the GPS, mitigating some of the security concerns
about the GPS.
Benefits
ITU-T G.8275.1 offers the following benefits:
l Implements high-precision clock synchronization.
l Reduces the costs of construction and maintenance of a wireless network with time
synchronization implemented.
l Enhances communication security, with time and clock synchronization independent of
the GPS.
13.7.2 Principles
Synchronization
Most telecommunication services running on a modern communications network require
network-wide synchronization with the frequency offset or time difference between devices
remaining in a specified range. Network clocks implement the following synchronization:
l Clock synchronization
Also called frequency synchronization. Signals keep the same frequency, so the phase offset
between them is a constant value. Signals are sent or received at the same average rate in
any given period of time so that all devices on a communications network can operate at the
same rate.
l Time synchronization
Also called phase synchronization. The frequency offset and phase offset between
signals are always 0.
Figure 13-47 (panels: Phase synchronization, Watch A and Watch B; Frequency synchronization,
Watch A and Watch B)
Figure 13-47 illustrates time and clock synchronization between Watch A and Watch B. In
time (phase) synchronization, Watch A and Watch B always keep the same time. In clock
(frequency) synchronization, Watch A and Watch B keep different time, but the time
difference between the two watches is a constant value, for example, 6 hours.
Clock Domain
A physical network can be logically divided into multiple clock domains. Each clock domain
has its own independent synchronous time, with which clocks in the same domain
synchronize. Each device can be added only to a single clock domain. Devices in the same
clock domain have the same domain ID.
Clock Nodes
ITU-T G.8275.1 defines the following clock nodes:
NOTE
An ATN device can only function as a T-BC node.
l Telecom grandmaster (T-GM)
A T-GM can only function as a master clock. It has one or more PTP ports and is unable
to trace other PTP clocks.
l Telecom boundary clock (T-BC)
A T-BC can function as either a master or slave clock. As a slave clock, the T-BC traces
other PTP clocks.
l Telecom time slave clock (T-TSC)
A T-TSC can only function as a slave clock. It traces other PTP clocks.
Figure 13-48 shows the locations of the three types of clocks on a time synchronization
network.
Figure 13-48 Locations of the three types of clocks on a time synchronization network
(M: master port; S: slave port)
Packet Encapsulation
ITU-T G.8275.1 defines untagged Layer 2 multicast encapsulation. Figure 13-49 shows an
untagged Layer 2 multicast packet.
l EtherType field: set to 0x88F7.
l Source MAC address (SA) field: set to the MAC address of a device that sends a packet.
l Destination MAC address (DA) field: set to either the unforwardable multicast MAC
address 01-80-C2-00-00-0E or the forwardable multicast MAC address
01-1B-19-00-00-00.
l PTP message: ITU-T G.8275.1 defines five types of messages: Announce, Sync,
Delay_Req, Delay_Resp, and Follow_Up.
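The encapsulation fields listed above can be sketched as follows. This is a minimal illustration, not the device implementation: the helper names and the sample source MAC address are assumptions, and the PTP message is treated as an opaque byte string.

```python
import struct

PTP_ETHERTYPE = 0x88F7
NON_FORWARDABLE_DA = "01-80-C2-00-00-0E"
FORWARDABLE_DA = "01-1B-19-00-00-00"


def mac_to_bytes(mac: str) -> bytes:
    """Convert a MAC address written as 'AA-BB-CC-DD-EE-FF' into 6 raw bytes."""
    octets = bytes(int(part, 16) for part in mac.split("-"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac}")
    return octets


def build_ptp_frame(src_mac: str, ptp_message: bytes, forwardable: bool = True) -> bytes:
    """Prepend an untagged Ethernet header (DA, SA, EtherType) to a PTP message."""
    dst_mac = FORWARDABLE_DA if forwardable else NON_FORWARDABLE_DA
    header = (
        mac_to_bytes(dst_mac)
        + mac_to_bytes(src_mac)
        + struct.pack("!H", PTP_ETHERTYPE)  # network byte order, 2 bytes
    )
    return header + ptp_message


# Hypothetical source MAC address and a placeholder 2-byte payload.
frame = build_ptp_frame("00-11-22-33-44-55", b"\x00\x02")
print(frame.hex())
```

Because the frame is untagged, the EtherType directly follows the two MAC addresses at byte offset 12.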
Packet Types
ITU-T G.8275.1 defines five message types: Announce, Sync, Delay_Req, Delay_Resp, and
Follow_Up. Table 13-8 outlines these message types and their functions.
Packet Type   Function
Announce      To select the master and slave clocks, clock nodes send Announce messages
              to one another to exchange time source information, including the priority of
              the grandmaster, SSM level, time precision, and the number of hops to the
              grandmaster.
Sync          The master clock sends Sync messages time-stamped with t1 to a slave
              clock.
              Sync messages can be sent in either of the following modes:
              l one-step: A Sync message carries the timestamp of when the message was
              sent.
              l two-step: A Sync message does not carry a transmission timestamp; the
              sender records the time when the message is sent. A Follow_Up message sent
              after the Sync message carries the timestamp of when the Sync message was sent.
Delay_Req     A slave clock sends Delay_Req messages to the master clock. The t3 timestamp
              records the time when the Delay_Req message was sent.
Delay_Resp    The master clock sends Delay_Resp messages carrying a timestamp t4 and
              the ID of the requesting interface to a slave clock. The t4 timestamp records
              the time when the master clock received the Delay_Req message.
Follow_Up     Follow_Up messages are used only in two-step mode. The master clock sends
              a Sync message and then a Follow_Up message time-stamped with t1 to a
              slave clock.
Asymmetric Correction
ITU-T G.8275.1 requires symmetric delays in opposite directions of a link. Asymmetric
delays reduce the precision of time synchronization algorithms. The delays in opposite
directions, however, may be asymmetric due to link problems or device processing issues.
ITU-T G.8275.1 provides the asymmetric delay correction mechanism, as shown in Figure 13-50.
Figure 13-50 (labels: nodes A and B; master clock or responder; slave clock or requestor;
unidirectional delays t-ms and t-sm)
Both t-sm and t-ms are unidirectional delays, and t-ms is expected to be the same as t-sm. If
they are different, an asymmetric correction value can be set to compensate for the
asymmetric delays. In time synchronization calculation, the asymmetric correction value is
used to ensure the precision of time synchronization even if delays on the forward and reverse
links are different.
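The effect of the asymmetric correction value can be sketched as follows. This is an illustration under stated assumptions, not the device algorithm: the function name is hypothetical, the correction value is assumed to be defined as t-ms minus t-sm, and all times are in nanoseconds.

```python
def offset_from_master(t1, t2, t3, t4, asymmetry_correction=0.0):
    """Return the slave-minus-master time offset.

    t1: master time when Sync was sent       t2: slave time when Sync arrived
    t3: slave time when Delay_Req was sent   t4: master time when Delay_Req arrived
    asymmetry_correction: assumed here to be (t-ms - t-sm); 0 means symmetric delays.
    """
    raw_offset = ((t2 - t1) - (t4 - t3)) / 2
    # With unequal delays the raw result is biased by half the delay difference.
    return raw_offset - asymmetry_correction / 2


# Scenario: the slave is really 100 ns ahead, t-ms = 600 ns, t-sm = 400 ns.
t1, t2, t3, t4 = 0, 700, 1000, 1300
print(offset_from_master(t1, t2, t3, t4))                            # uncorrected
print(offset_from_master(t1, t2, t3, t4, asymmetry_correction=200))  # corrected
```

In this scenario the uncorrected result overestimates the offset by half of the 200 ns delay difference, and the corrected result recovers the true 100 ns offset.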
Overview
In ITU-T G.8275.1 time synchronization, the master clock periodically sends Sync messages
carrying timestamps to a slave clock. Upon receipt of the Sync messages, the slave clock
records the timestamps. When the link delay is stable, the sending and receiving timestamps
change at the same pace. If the receiving timestamp changes faster or slower than the
sending timestamp, the clock on the receiving device runs faster or slower than the clock on
the sending device. In this case, the local clock on the receiving device must be adjusted. This
process helps two devices synchronize to the same frequency.
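The comparison of sending and receiving timestamps can be sketched as a simple rate estimate. This is a minimal sketch assuming a stable link delay and timestamps in nanoseconds; the function name and sample values are hypothetical, and a real device would filter many samples rather than use two points.

```python
def frequency_offset_ppb(send_times, recv_times):
    """Estimate how fast the receiver clock runs relative to the sender, in ppb.

    send_times: timestamps carried in Sync messages (sender clock)
    recv_times: local timestamps recorded on receipt (receiver clock)
    A positive result means the receiver clock runs faster than the sender clock.
    """
    if len(send_times) != len(recv_times) or len(send_times) < 2:
        raise ValueError("need at least two matching timestamp pairs")
    sent_interval = send_times[-1] - send_times[0]
    recv_interval = recv_times[-1] - recv_times[0]
    if sent_interval <= 0:
        raise ValueError("sending timestamps must increase")
    return (recv_interval - sent_interval) * 1e9 / sent_interval


# Over 1 s of sender time the receiver counted 50 ns too many.
print(frequency_offset_ppb([0, 1_000_000_000], [500, 1_000_000_550]))
```

The constant link delay (500 ns in the sample) cancels out because only the intervals between timestamps are compared.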
NOTE
The frequencies restored using G.8275.1 messages have lower precision than those on a synchronous
Ethernet network. It is recommended that you use synchronous Ethernet to implement clock synchronization
and ITU-T G.8275.1 to implement time synchronization.
Synchronization Mechanism
ITU-T G.8275.1 restores frequencies by using packets on each hop.
Per-hop packet-based frequency recovery can be implemented only when all devices on a path
support ITU-T G.8275.1. On a path with only a few hops, ITU-T G.8275.1 can restore
Stratum 3 frequencies that meet the ITU-T G.813 standard.
To achieve high frequency recovery precision, ITU-T G.8275.1-enabled devices must send
Sync messages at a minimum rate of 128 messages every second.
Overview
On an ITU-T G.8275.1 time synchronization network, all clocks establish master/slave
synchronization relationships with one another. The grandmaster clock, at the highest level in
the hierarchy, is used as the system reference clock. The synchronization topology is
automatically generated using the best master clock (BMC) algorithm. Clock nodes exchange
time source information, select the grandmaster clock, and determine which local ports
receive clock signals sent by the grandmaster clock. With the BMC algorithm, a loop-free,
tree-shaped topology rooted at the grandmaster clock is built. In addition, a master
node periodically sends packets to its slave nodes. If the slave nodes do not receive packets
sent by the master node within a specific period of time, the slave nodes consider the master/
slave relationships invalid and start to re-select a master node that provides a clock source.
Synchronization Mechanism
The time synchronization principles of ITU-T G.8275.1 are the same as those of IEEE
1588v2. The master and slave nodes send and receive timing packets to and from each other.
Based on the receiving and sending timestamps in the timing packets, the total delay in
bidirectional transmission can be calculated. If the delays in opposite directions are the same,
the total delay divided by 2 is equal to the unidirectional delay. Using this delay, the slave
node calculates the time difference between itself and the master node and corrects its local
time by that difference. The slave node is then synchronized with the master node.
Time synchronization precision, however, is low due to delay variation (jitter) on an existing
network and different delays in opposite directions on a link. For example, NTP implements
time synchronization only with precision that ranges from 10 ms to 100 ms. In addition,
software on a control board of the ATN runs NTP, which means that software processing is
also involved in NTP delay calculation. NTP calculates communication delays, including the
link delay and internal processing delays stemming from queuing, software invoking, and
software processing. The delay variation can be large, and the delays in opposite directions on
a link are asymmetric. As a result, time synchronization precision cannot meet requirements.
Unlike NTP, ITU-T G.8275.1 assumes that the link delay is a constant value (or a trivial value
that can be ignored between synchronization processes), and delays in opposite directions
along a link are the same. In this case, the link delay can be measured using timestamps on
two ends of a link to implement highly precise time synchronization.
The Delay mode is used in E2E delay measurement to repeatedly implement synchronization.
Figure 13-51 shows the flowchart for ITU-T G.8275.1 E2E delay measurement in Delay
mode.
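Figure 13-51 E2E delay measurement in Delay mode (message sequence between the master
and slave: the master sends Sync at t1, which the slave receives at t2 after delay t-ms;
Follow_Up; the slave sends Delay_Req at t3, which the master receives at t4 after delay
t-sm; Delay_Resp. Timestamps known by the slave accumulate as t2; t1, t2; t1, t2, t3.)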
NOTE
In Figure 13-51, t-sm and t-ms are delays in opposite directions. In the following example, the two
delay values are the same. If they are different, the asymmetric delay correction mechanism can be
used to compensate for the asymmetric delay.
In the following example, the one-step mode is used. Follow_Up messages are not used in one-step
mode because they are only used in two-step mode.
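The Delay-mode calculation with the four timestamps t1 to t4 can be sketched as follows, assuming that t-ms equals t-sm. The function name and the sample values are hypothetical; times are in nanoseconds.

```python
def delay_request_response(t1, t2, t3, t4):
    """Return (mean_path_delay, offset) from one Sync/Delay_Req exchange.

    t2 - t1 = offset + t-ms   (Sync, master to slave)
    t4 - t3 = t-sm - offset   (Delay_Req, slave to master)
    Adding the two equations gives the total delay in bidirectional transmission.
    """
    total_delay = (t2 - t1) + (t4 - t3)
    mean_path_delay = total_delay / 2          # valid only if t-ms == t-sm
    offset = (t2 - t1) - mean_path_delay       # slave time minus master time
    return mean_path_delay, offset


# Scenario: one-way delay 500 ns in both directions, slave 100 ns ahead.
delay, offset = delay_request_response(t1=0, t2=600, t3=1000, t4=1400)
print(delay, offset)
local_time = 2000
print(local_time - offset)  # slave time after correction
```

The slave subtracts the calculated offset from its local time, which is the correction step described in the synchronization mechanism.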
13.7.3 Applications
Service Description
To meet clock and time synchronization requirements of base stations on an IP RAN, routers
on bearer networks must support clock and time synchronization. ITU-T G.8275.1 is a per-hop
synchronization protocol used only in the telecommunications field. ITU-T G.8275.1 implements
high-precision clock and time synchronization.
Networking Description
Per-hop clock synchronization
NodeBs need to perform clock synchronization; however, a bearer network does not support
the synchronous Ethernet technique. ITU-T G.8275.1 can be configured to perform clock
synchronization between NodeBs (configuration on each hop is not required). Clock
information can be sent by a clock source to a destination node using any combination of
1588 and synchronous Ethernet clocks. As shown in Figure 13-52, nodes A and B implement
clock synchronization.
Figure 13-52 Per-hop clock synchronization (Node A and Node B with G.8275.1, connected
through T-BC and T-GM nodes over GE, FE, and E1 links; legend: G.8275.1 clock signal
transfer, physical clock signal transfer)
Terms
Term             Description
ITU-T G.8275.1   A standard entitled precision time protocol telecom profile for phase/time
                 synchronization with full timing support from the network, defined by the
                 ITU-T.
Clock domain     A physical network can be logically divided into multiple clock domains. Each
                 clock domain has its own independent synchronous time, with which clocks in
                 the same domain synchronize.
13.8.1 Introduction
Definition
The rapid commercial deployment of Long Term Evolution (LTE) Time Division Duplex
(TDD) and LTE-Advanced (LTE-A) drives the need for time synchronization of base stations.
Two time synchronization solutions are commonly used: one solution is to directly connect
base stations to the Global Positioning System (GPS) and the other solution is to obtain the
Precision Time Protocol (PTP) time from the network.
If base stations connect directly to the GPS, each base station must pay GPS deployment
costs. The total cost of ownership (TCO) therefore increases as the number of base stations
increases. If base stations obtain PTP time from the network, the entire network must support
PTP time synchronization, which incurs high network-wide reconstruction costs.
Using the GPS solution also has additional limitations. For example, the GPS antenna must be
installed outdoors and positioned to receive signals from GPS satellites. Long feeders must
therefore be used to connect to devices that are deployed indoors, and holes must be drilled
through walls in order to route these feeders indoors. In addition, requirements such as
lightning protection must be considered when selecting antenna sites. These conditions make
it difficult and costly to deploy GPS antennas for indoor devices. Furthermore, rented indoor
equipment rooms may have restrictions in place that prevent or strictly control through-wall
installation of cables, and obtaining permissions for such installation may be complex. For
example, Japanese law does not allow GPS radio frequency (RF) cables to be installed from
outdoor to indoor.
To address this situation, the ATN provides Atom GPS timing. It uses a built-in AE 905S
module that provides GPS access. This module functions as a lightweight building integrated
timing supply (BITS) to receive clock and time signals from the GPS and converts them into
synchronous Ethernet (SyncE) signals and 1588v2 signals (PTP time signals), respectively.
The AE 905S module then outputs the signals to the ATN, which in turn synchronizes SyncE
clock and PTP time to all base stations connected to the ATN. This feature greatly reduces the
TCO for clock and time synchronization.
Benefits
Atom GPS timing offers the following benefits to carriers:
l Time synchronization deployment costs reduced by 80% for newly constructed networks
l Carrier investment protected by employing existing ATN networks in network expansion
scenarios
13.8.2 Principles
Atom GPS timing is implemented by using a GPS antenna, GPS receiver, PLL, SyncE
processing module, and PTP processing module.
Related Modules
GPS Antenna
It receives GPS satellite signals.
GPS Receiver
It processes GPS RF signals and extracts frequency and time information from the GPS RF
signals.
Phase-locked loop (PLL)
The PLL can be:
l Frequency PLL: locks 1 PPS reference clocks and outputs a high-frequency clock.
l Analog PLL (APLL): multiplies the system clock to a higher-frequency clock.
l Time PLL: locks the UTC time and outputs the system time.
RTC
The real time clock (RTC) provides real-time timestamps for PTP event messages.
PTP GM
The PTP Grandmaster module periodically sends Announce, Sync, and Delay_Resp messages
and receives Delay_Req messages.
SyncE Slave
It is an ATN device's slave clock processing module that extracts SyncE clock signals.
PTP BC
It is an ATN device's PTP processing module that functions as the slave BC to process PTP
messages and extract PTP time.
Implementation
Atom GPS timing provides two service functions:
1. Service function 1: Atom GPS timing allows an AE 905S to function as a SyncE clock
reference source to provide clock synchronization for ATN.
2. Service function 2: Atom GPS timing allows an AE 905S to function as a PTP time
reference source to provide time synchronization for ATN.
Service function 1 is implemented as follows:
1. The AE 905S module uses a built-in GPS receiver to receive satellite signals from the
GPS antenna and output GPS clock signals at 1 PPS.
2. The AE 905S module uses a built-in frequency PLL module to trace and lock 1 PPS
phase and frequency and output the system clock.
3. The AE 905S module uses a built-in APLL to multiply the system clock to a clock at GE
rate, which is then used as the SyncE transmit clock.
4. The ATN uses the GE interface equipped with the AE 905S module to obtain the SyncE
clock signals from the AE 905S module and transfer the clock signals to downstream
devices.
Service function 2 is implemented as follows:
1. The AE 905S module uses a built-in GPS receiver to receive satellite signals from the
GPS antenna and output the UTC time.
2. The AE 905S module uses a built-in time PLL module to trace and lock the UTC time
and output the system time.
3. The AE 905S module uses a built-in RTC module to obtain the system time.
4. The AE 905S module uses a built-in PTP Grandmaster module to process PTP messages.
The timestamp carried in PTP event messages is generated by the RTC module.
5. The ATN uses the GE interface equipped with the AE 905S module to obtain the PTP
time signals from the AE 905S module and transfer the time signals to downstream
devices.
13.8.3 Applications
In comparison with indoor deployment, outdoor deployment has the following advantages:
1. The deployment is simple and requires less work hours and engineering costs.
2. Cable costs are low due to the use of a fixed-length feeder.
3. Integrated outdoor installation prevents the routing of a long feeder, reducing the fault
possibility and facilitating maintenance.
14 Security
This document describes the security feature in terms of the overview, principle, and
applications.
Purpose
MAC entries on the Layer 2 network are essential to forwarding packets. When MAC attacks
are launched on a network, MAC entries are exhausted by invalid MAC addresses, denying
the access of authorized users to the network. To prevent this problem, you can configure
MAC address limit to minimize the impact of MAC attacks.
Benefits
Benefits brought to operators
To solve the preceding problem, MAC address limitation is introduced. By configuring the
maximum number of MAC addresses to be learnt by an interface, a VLAN, or a Virtual
Switch Instance (VSI), you can minimize the impact of an attack and protect other users.
MAC address limitation minimizes the impact of attacks so that the security of users is
enhanced and the bandwidth usage is improved.
14.1.2 Principles
MAC address limit allows you to set the maximum number of MAC addresses to be learnt by
an interface on an ATN. After the number of learnt MAC addresses reaches the set maximum
number, the interface handles a subsequent packet as follows:
l If the source MAC address of the packet exists in the MAC table, the interface forwards
the packet.
l If the source MAC address of the packet does not exist in the MAC table, the interface
forwards or discards the packet based on the action configured in MAC address limit. For
example, the packet is discarded if the action is configured as Discard.
1. When a user packet passes through a port enabled with MAC address limit based on Port or
Port+VLAN, the ATN learns the source MAC address and forwarding information carried in
the user packet and experiences the limit process.
2. Limit process: The ATN first determines whether the source MAC address to be learnt
exists in the MAC table. If so, the packet is simply forwarded; if not, the ATN checks whether
the number of MAC addresses that are learnt previously reaches the maximum number set in
MAC address limit. If the set maximum number is not reached, the ATN learns the MAC
address of the packet; if the set maximum number is reached, the ATN discards or forwards
the packet based on the action set in MAC address limit.
1. When a user packet is forwarded in a broadcast domain configured with MAC address
limit, the ATN learns the source MAC address of the packet on the outbound interface. If the
source MAC address of the packet exists in the MAC table, the ATN simply forwards the
packet; if the source MAC address of the packet does not exist in the MAC table, the ATN
checks whether the number of MAC addresses learnt previously reaches the maximum
number set in MAC address limit. If not, the ATN learns the MAC address of the packet; if
so, the ATN discards or forwards the packet based on the action configured in MAC address
limit.
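The limit process described above can be sketched as follows. This is a simplified model, not the ATN implementation: the class name and the sample MAC addresses are hypothetical, and aging of MAC entries is ignored.

```python
class MacLimiter:
    """Model of MAC address limit on one interface, VLAN, or broadcast domain."""

    def __init__(self, maximum: int, action: str = "discard"):
        if action not in ("discard", "forward"):
            raise ValueError("action must be 'discard' or 'forward'")
        self.maximum = maximum
        self.action = action
        self.mac_table = set()

    def handle(self, src_mac: str) -> str:
        # Known source MAC address: the packet is simply forwarded.
        if src_mac in self.mac_table:
            return "forward"
        # Unknown source MAC address and the limit is not reached: learn it.
        if len(self.mac_table) < self.maximum:
            self.mac_table.add(src_mac)
            return "forward"
        # Limit reached: apply the configured action without learning.
        return self.action


limiter = MacLimiter(maximum=2, action="discard")
for mac in ["00-00-00-00-00-0A", "00-00-00-00-00-0B", "00-00-00-00-00-0C", "00-00-00-00-00-0A"]:
    print(mac, limiter.handle(mac))
```

With a limit of 2, the third address is neither learnt nor forwarded, while traffic from the two addresses already in the table is unaffected.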
Traffic Types
Traffic in a Layer 2 network is classified into the following types:
l Unicast traffic: For the unicast packets that have destination MAC mapping entries in the
MAC table, the switch forwards them based on the mapping entries.
l Unknown unicast traffic: For the unicast packets that have no destination MAC mapping
entries in the MAC table, the switch broadcasts them.
l Multicast traffic: For the packets whose destination MAC addresses are multicast MAC
addresses, the switch broadcasts them.
l Broadcast traffic: For the packets whose destination MAC addresses are broadcast MAC
addresses, the switch broadcasts them.
To ensure normal forwarding of unicast traffic, you can limit the bandwidth for forwarding
the unknown unicast traffic, multicast traffic, and broadcast traffic by configuring traffic
suppression on the switch.
A switch learns the source MAC address carried in each data frame that it receives and
constructs a MAC address table to save the mapping between the MAC address and the
source interface.
After receiving a data frame, the switch searches the MAC address table for the destination
MAC address of the frame. If a matching entry is found, the switch forwards the frame only
through the interface in that entry. In this manner, the switch implements collision isolation.
Otherwise, the switch broadcasts the frame to all the interfaces except the interface that
received the frame. Broadcast storms may then occur across the network.
When receiving a multicast or broadcast packet, the switch cannot determine a single
destination interface for the packet based on the destination MAC address. The switch then
also needs to forward the multicast or broadcast packet to all the interfaces except the
interface that received the packet. In such a case, broadcast storms may also be generated.
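The learning and forwarding behavior described above can be sketched as follows. The class, port numbers, and MAC addresses are hypothetical, and VLANs and entry aging are left out to keep the sketch short.

```python
BROADCAST = "FF-FF-FF-FF-FF-FF"


def is_group_address(mac: str) -> bool:
    """True for multicast and broadcast MACs (lowest bit of the first octet set)."""
    return int(mac.split("-")[0], 16) & 1 == 1


class Switch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> interface on which it was learnt

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source MAC address and return the list of outbound ports."""
        self.mac_table[src_mac] = in_port
        if not is_group_address(dst_mac) and dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]           # known unicast
        # Unknown unicast, multicast, and broadcast are flooded.
        return [p for p in self.ports if p != in_port]


sw = Switch(ports=[1, 2, 3])
print(sw.receive(1, "00-00-00-00-00-0A", "00-00-00-00-00-0B"))  # unknown unicast
print(sw.receive(2, "00-00-00-00-00-0B", "00-00-00-00-00-0A"))  # known unicast
print(sw.receive(1, "00-00-00-00-00-0A", BROADCAST))            # broadcast
```

Only the second frame leaves through a single port; the other two are copied to every port except the receiving one, which is the traffic that suppression targets.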
Deploying switches in the network can improve the unicast forwarding efficiency. The
broadcast traffic, however, degrades the switch performance. To solve this problem, traffic
suppression is introduced.
Traffic Suppression
If the broadcast traffic is not suppressed, a great amount of network bandwidth is consumed
when a great deal of broadcast traffic flows through the network. The network performance is
therefore degraded, and communication may even be interrupted.
In such a case, you need to configure broadcast traffic suppression on the switch to ensure that
the switch can reserve a part of bandwidth for forwarding unicast traffic when broadcast
traffic bursts across the network.
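One way to picture traffic suppression is a byte budget per time window that applies only to broadcast, multicast, and unknown unicast frames. The sketch below is an assumption made for illustration: the document does not state which metering algorithm the device uses, and the class name, window length, and limit are hypothetical.

```python
class SuppressionMeter:
    """Allow at most limit_bytes of suppressed traffic per fixed time window."""

    def __init__(self, limit_bytes: int, window_seconds: float = 1.0):
        self.limit = limit_bytes
        self.window = window_seconds
        self.window_start = 0.0
        self.used = 0

    def allow(self, frame_len: int, now: float) -> bool:
        # Start a new window when the current one has elapsed.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.used = 0
        # Frames that would exceed the budget are dropped.
        if self.used + frame_len > self.limit:
            return False
        self.used += frame_len
        return True


meter = SuppressionMeter(limit_bytes=1000)
for length, now in [(600, 0.0), (600, 0.1), (400, 0.2), (600, 1.0)]:
    print(now, length, meter.allow(length, now))
```

Known unicast frames would bypass the meter entirely, so the remaining bandwidth stays available for them during a broadcast burst.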
You can enable MAC address limit on Layer 2 interfaces of the ATN to control the total
number of MAC addresses that can be learnt from all the attached user networks, regardless
of the VLANs to which each user network belongs. As shown in Figure 14-3, you can
configure MAC address limit on port1 of the ATN.
Figure 14-3 (labels: Layer 2 network, Internet, Port1, VLAN10, VLAN20)
MAC Address Limit for One or More VLANs to Which an Inbound Interface
Belongs
In addition to MAC address limit on an inbound interface, you can also configure MAC
address limit for one or more specific VLANs to which an inbound interface belongs.
As shown in Figure 14-3, you can configure MAC address limit based on Port+VLAN on
port1 of the ATN to restrict the number of MAC addresses only on VLAN 10 or VLAN 20.
Term                   Description
MAC Address Learning   An ATN or a switch learns the source MAC address of a user packet to
                       forward the packet to the proper destination.
MAC Address Limit      You can restrict the number of MAC addresses to be learnt by an interface
                       to enhance the network security.
Action                 It is the behavior adopted by the ATN to process a subsequent packet when
                       the MAC address limit threshold is reached. Currently, there are two
                       actions: discard and forward.
Abbreviations
Abbreviation Full Name
14.2.1 Introduction
Definition
Dynamic Host Configuration Protocol (DHCP) snooping, a DHCP security feature, filters out
untrusted DHCP messages by means of a DHCP snooping binding table and IP+MAC binding.
DHCP snooping functions as a firewall between the DHCP client and DHCP server to prevent
DHCP-associated attacks.
Purpose
DHCP snooping prevents the following attacks:
l Bogus DHCP server attacks
l Middleman attacks and IP/MAC spoofing attacks
l DoS attacks launched by changing the value of the CHADDR field
The working mode of DHCP snooping varies according to the type of attack, as shown in
Table 14-1.
Middleman attack and IP/MAC spoofing attack   Using a DHCP snooping binding table
DoS attack launched by changing the value     Checking the CHADDR field in a DHCP
of the CHADDR field                           message
14.2.2 Principles
to the DHCP client. This causes the Denial of Service (DoS). Figure 14-4 shows a bogus
DHCP server attack.
Figure 14-4 Bogus DHCP server attack (nodes: DHCP server, DHCP pseudo server; message flow:
DHCP discovery (broadcast), DHCP offer (unicast from the pseudo server), DHCP request
(broadcast), DHCP ack (unicast from the pseudo server))
To prevent a bogus DHCP server attack, enable DHCP snooping to work in trusted or
untrusted mode.
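A common way to realize trusted/untrusted mode is to accept server-originated DHCP messages only on trusted interfaces. The sketch below assumes that behavior; the document does not spell out the exact rule set, and the function name is hypothetical.

```python
# Message types that only a DHCP server should send.
SERVER_MESSAGES = {"DHCPOFFER", "DHCPACK", "DHCPNAK"}


def accept_dhcp_message(message_type: str, interface_trusted: bool) -> bool:
    """Return True if the message may be forwarded.

    Assumption: replies from a server are accepted only when they arrive on a
    trusted interface; client messages are accepted on any interface.
    """
    if message_type in SERVER_MESSAGES:
        return interface_trusted
    return True


print(accept_dhcp_message("DHCPOFFER", interface_trusted=False))    # pseudo server
print(accept_dhcp_message("DHCPOFFER", interface_trusted=True))     # real server
print(accept_dhcp_message("DHCPDISCOVER", interface_trusted=False)) # client
```

Under this assumption, an offer from a pseudo server attached to an untrusted interface never reaches the client.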
Figure (trusted/untrusted mode; labels: DHCP Server, Trusted, Untrusted, DHCP Pseudo Server)
Middleman Attack
A middleman sends a packet carrying its own MAC address and the IP address of the DHCP
server to the client. The client then learns the IP and MAC addresses and mistakenly regards
the middleman as the DHCP server. From then on, a packet sent from the client is always
destined for the middleman before reaching the DHCP server, and the middleman, as a
response, sends a packet carrying its own MAC address and the IP address of the client to the
DHCP server. The DHCP server then learns the IP and MAC addresses and mistakenly
regards the middleman as the client. From then on, a packet sent from the DHCP server is
always destined for the middleman before reaching the client, as shown in Figure 14-6.
The middleman therefore participates in data communications between the DHCP server and
client. The DHCP server and client mistakenly believe that they are exchanging packets
directly, but the packets are actually bogus packets processed by the middleman.
Figure 14-6 Middleman attack (nodes: DHCP server 10.1.1.1/32, MAC 1-1-1; middleman; DHCP
client; other addresses shown: 10.1.1.2/32, MAC 2-2-2 and 10.1.1.3/32, MAC 3-3-3; steps (1)
to (3))
To prevent a middleman attack and IP/MAC spoofing attack, you can use a DHCP snooping
binding table.
The ATN applies the Discard policy by default. After receiving an ARP or IP packet, an
interface compares its source IP address and source MAC address with the entries in the
DHCP snooping binding table. As shown in Figure 14-8, if a matched entry is found, the
packet is forwarded; if no matched entry is found, the packet is discarded.
For the clients with static IP addresses configured, ARP packets or IP packets sent from them
are discarded. This is because these clients do not obtain IP addresses by sending
DHCPREQUEST messages and no DHCP snooping binding entry exists. In this manner,
these clients cannot access the network.
Similarly, for the clients that steal valid IP addresses of other clients, ARP packets or IP
packets sent from them are also discarded. This is because these clients do not obtain IP
addresses by sending DHCPREQUEST messages and the MAC address and interface
information corresponding to the IP address in the DHCP snooping binding table are therefore
different from those of the packet sender. In this manner, these clients cannot access the
network.
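The binding table check can be sketched as follows. The entry layout (IP address, MAC address, interface) follows the description above; the type name, addresses, and interface names are hypothetical.

```python
from typing import NamedTuple


class Binding(NamedTuple):
    ip: str
    mac: str
    interface: str


def check_packet(bindings, src_ip, src_mac, in_interface):
    """Apply the default Discard policy to an ARP or IP packet."""
    if Binding(src_ip, src_mac, in_interface) in bindings:
        return "forward"
    return "discard"


# One entry learnt through DHCP for a legitimate client.
bindings = {Binding("10.1.1.2", "00-00-00-00-00-02", "port1")}

print(check_packet(bindings, "10.1.1.2", "00-00-00-00-00-02", "port1"))  # DHCP client
print(check_packet(bindings, "10.1.1.9", "00-00-00-00-00-09", "port2"))  # static IP
print(check_packet(bindings, "10.1.1.2", "00-00-00-00-00-03", "port3"))  # stolen IP
```

The client with a static address fails because no entry exists for it, and the client that steals an address fails because its MAC address and interface differ from the entry.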
Figure 14-8 (labels: DHCP snooping enable, ISP network)
The entries in the DHCP snooping binding table are classified into the following two types:
l Static entries configured using command lines. These entries can be deleted only
using command lines.
l Dynamic entries automatically learned through DHCP snooping. These entries are aged
based on the lease.
The dynamic entries in the DHCP snooping binding table are automatically generated based
on DHCPACK messages from the DHCP server.
For the untrusted interface, a Layer 3 device intercepts the DHCPREPLY message to obtain
information including the IP address assigned by the DHCP server, the MAC address of
the interface, and the interface through which the message passes. An IP and MAC binding
entry of the untrusted interface is then generated. A binding entry has the same lease as the IP
address of the client. When the lease expires or the client releases this IP address, the entry is
automatically deleted.
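The life cycle of dynamic entries (created from a DHCPACK message, deleted on lease expiry or release) can be sketched as follows. The class and method names are hypothetical, and time is passed in explicitly as seconds so that the behavior is easy to follow.

```python
class BindingTable:
    """Dynamic DHCP snooping entries keyed by client IP address."""

    def __init__(self):
        self.dynamic = {}  # ip -> (mac, interface, expiry_time)

    def learn_from_ack(self, ip, mac, interface, lease_seconds, now):
        # The entry has the same lease as the IP address of the client.
        self.dynamic[ip] = (mac, interface, now + lease_seconds)

    def release(self, ip):
        # The client released the IP address before the lease expired.
        self.dynamic.pop(ip, None)

    def age(self, now):
        """Delete entries whose lease has expired and return their IP addresses."""
        expired = [ip for ip, (_, _, expiry) in self.dynamic.items() if expiry <= now]
        for ip in expired:
            del self.dynamic[ip]
        return expired


table = BindingTable()
table.learn_from_ack("10.1.1.2", "00-00-00-00-00-02", "port1", lease_seconds=3600, now=0)
print(table.age(now=3599))
print(table.age(now=3600))
```

Static entries are not modeled here because they are removed only by command, not by aging.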
14.2.2.3 DoS Attack Launched by Changing the Value of the CHADDR Field
In a DHCP exhaustion attack, the attacker may change the Client Hardware Address
(CHADDR) carried in the DHCP message rather than the source MAC address in the frame
header to repeatedly apply for IP addresses, as shown in Figure 14-9. The attack packets may
be retransmitted normally because the device verifies a packet based on only the source MAC
address in the frame header.
Figure 14-9 (labels: DHCP snooping enable, ISP network)
You can configure DHCP snooping on the device to check the CHADDR field carried in a
DHCPREQUEST message. If the CHADDR field matches the source MAC address in the
frame header, the message is forwarded. Otherwise, the message is discarded.
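The CHADDR check can be sketched as a byte comparison. The function name and sample addresses are hypothetical; the sketch relies on the standard DHCP layout in which the CHADDR field is 16 bytes long and an Ethernet hardware address occupies its first 6 bytes.

```python
def chaddr_matches(frame_src_mac: bytes, chaddr_field: bytes, hlen: int = 6) -> bool:
    """Compare the hardware address in CHADDR with the frame's source MAC address.

    frame_src_mac: 6-byte source MAC address from the Ethernet header
    chaddr_field:  16-byte CHADDR field from the DHCP message
    hlen:          hardware address length (6 for Ethernet)
    """
    return chaddr_field[:hlen] == frame_src_mac


src_mac = bytes.fromhex("001122334455")
honest = src_mac + bytes(10)                       # CHADDR padded to 16 bytes
forged = bytes.fromhex("0011223344ff") + bytes(10)  # attacker changed CHADDR only

print(chaddr_matches(src_mac, honest))  # forwarded
print(chaddr_matches(src_mac, forged))  # discarded
```

A request whose CHADDR differs from the frame header is discarded, so changing only the CHADDR value no longer yields additional addresses.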
Option 82 field format: Code (82), Length (N), followed by N bytes of agent information
(i1 i2 i3 i4 i5 … iN)
The ATN device uses the Option 82 field to define the address assignment policies or other
policies for the DHCP server to perform.
Figure 14-11 Appending an Option 82 field to the DHCP message on an ATN Device
(Message flow among Client1 to Client3, the ATN, the DHCP relay, and the DHCP server across
the Internet: Discover, Discover+Option82, Offer+Option82, Offer, Request, Request+Option82,
Ack+Option82, Ack, Data Exchange)
Option 82 Implementation
After Option 82 is enabled, an interface checks whether the DHCPREQUEST message sent
from a client or the message ready to send to a client contains an Option 82 field.
l If the DHCPREQUEST message contains an Option 82 field:
The device checks the configuration of Option 82 field appending. If the current interface is
configured with the Rebuild mode, the interface does not trust the Option 82 field contained
in the received message and must modify Sub-option 1 contained in the Option 82 field.
l If the DHCPREQUEST message does not contain an Option 82 field:
The device adds an Option 82 field with Sub-option 1.
When the DHCPREPLY message is forwarded, the device first checks whether the message
contains Sub-option 1 and whether the sub-option contains the Huawei Device Identifier field.
If so, the device can successfully parse the Option 82 field, and then removes the Huawei
Device Identifier field from Sub-option 1 before forwarding the message.
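The request-side handling can be sketched as follows. This is a simplification under stated assumptions: the function names and the sample circuit ID are hypothetical, the whole option is re-encoded in Rebuild mode instead of editing Sub-option 1 in place, an existing option is kept unchanged when Rebuild mode is not configured, and the Huawei Device Identifier field is not modeled.

```python
OPTION_82 = 82
SUBOPTION_1 = 1  # Sub-option 1 (circuit ID)


def encode_option82(circuit_id: bytes) -> bytes:
    """Encode Option 82 as code, length, then Sub-option 1 as code, length, value."""
    if len(circuit_id) > 253:
        raise ValueError("circuit ID too long for a one-byte option length")
    sub_option = bytes([SUBOPTION_1, len(circuit_id)]) + circuit_id
    return bytes([OPTION_82, len(sub_option)]) + sub_option


def process_request(existing_option82, local_circuit_id: bytes, rebuild: bool) -> bytes:
    """Return the Option 82 bytes to place in the forwarded DHCPREQUEST message."""
    if existing_option82 is None:
        return encode_option82(local_circuit_id)   # no option present: add one
    if rebuild:
        return encode_option82(local_circuit_id)   # received option is not trusted
    return existing_option82                       # assumption: keep it unchanged


print(encode_option82(b"port1").hex())
print(process_request(encode_option82(b"fake"), b"port1", rebuild=True).hex())
```

The nested code-length-value layout matches the field format shown earlier: the outer length counts the two header bytes of Sub-option 1 plus its value.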
14.2.3 Applications
A Dynamic Host Configuration Protocol (DHCP) server dynamically assigns IP addresses to
DHCP clients. Attacks, such as a bogus DHCP server attack and a DHCP denial of service
(DoS) attack may occur during IP allocation. To address this problem, deploy DHCP
snooping. DHCP snooping can be deployed on Layer 2 or Layer 3 devices. The DHCP relay
is required when DHCP snooping is deployed on Layer 3 devices.
Figure 14-12 Networking diagram for configuring DHCP snooping on a Layer 2 device
(Labels: ISP network, L3 network, DHCP relay, L2 network, DHCP snooping with Trusted and
Untrusted interfaces, DHCP server, user network)
Figure 14-13 Networking diagram for configuring DHCP snooping on a Layer 3 device
(Labels: ISP network, L3 network, DHCP snooping and DHCP relay with Trusted and Untrusted
interfaces, L2 network, DHCP server, user network)
MP MultiLink PPP
14.3 URPF
14.3.1 Introduction
Definition
Unicast Reverse Path Forwarding (URPF) is a security measure used to prevent source
address spoofing attacks across the network.
Purpose
As IP networks are developing, threats to network security increase, and network devices are
vulnerable. Source address spoofing attacks have become a typical security threat on the
Internet. URPF helps prevent such attacks.
When URPF is enabled on the ATN equipment, the system obtains the source address of a
received packet and searches the forwarding table for the outbound interface that corresponds
to this address. If this outbound interface does not match the interface that received the
packet, the source address is considered spoofed and the packet is dropped. In this way,
URPF protects the device against source address spoofing attacks.
(Diagram labels: server 10.10.1.2/24; user 10.1.1.2/24; attacker; ATN with interfaces GE0/2/0 and GE0/2/1; Internet; breakdown. The user data flow and the attacker data flow both carry SIP 10.1.1.2/24 and DIP 10.10.1.2/24.)
As shown in Figure 14-14, an attacker forges packets with the source addresses of authorized
users and sends the packets to the server. These forged packets flood the network resources
and cause Denial of Service (DoS) to the server and users.
Typical countermeasures against DoS attacks include Address Resolution Protocol
(ARP) attack defense, URPF, and Dynamic Host Configuration Protocol (DHCP) snooping.
URPF can be configured at the network ingress to block packets with forged source
addresses, which prevents source address spoofing attacks.
Benefits
Benefits to carriers
URPF is an additional layer of network security. It helps protect devices against source
address spoofing, reducing DoS and Distributed DoS (DDoS) attacks.
Benefits to users
URPF increases users' security on the network.
14.3.2 Principles
Among the ATN 950B series, only the ATN 950B (with the control board AND2CXPB/AND2CXPE
installed) supports Strict mode (excluding the default route).
Strict Mode
In strict URPF mode, a data packet can pass the URPF check only when the forwarding table
contains a matched entry and the outbound interface of the entry matches the inbound
interface of the packet.
After interface-based strict URPF is enabled on a router, the router searches the routing table
for a matched entry based on the source IP address and the VRF (index of a VPN) of a
received data packet. If the router finds such an entry, it compares the outbound interface of
the entry with the inbound interface of the packet. If the two interfaces match, the packet
passes the URPF check and is forwarded normally. If the router finds no such entry, or the
outbound interface of the entry does not match the inbound interface of the packet, the router
treats the source address of the data packet as a bogus source address and discards the
data packet.
If there is only one path between two network edge routers, routes are symmetrical and strict
URPF safeguards the network to the greatest extent.
Loose Mode
In loose URPF mode, a packet can pass the URPF check as long as there is a route with the
destination address being the source address of the packet, regardless of whether the outbound
interface of the route matches the inbound interface of the packet.
After interface-based loose URPF is enabled on a router, the router searches the routing table
for an entry based on the source IP address and the VRF (index of a VPN) of a received data
packet. If a matched entry is found, the data packet passes the URPF check and is forwarded
normally. If no matched entry is found, the source address of the packet is considered a
bogus source address, and the packet is discarded.
If there are multiple connections between two network edge devices, routes may be
asymmetrical, but loose URPF still safeguards the network to a certain extent.
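The two checks can be illustrated with a minimal Python sketch. It is not the device implementation: the forwarding table contents, prefixes, and interface names are hypothetical, the VRF dimension is omitted, and longest-prefix matching is assumed for the lookup.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, outbound interface).
FIB = [
    (ipaddress.ip_network("1.1.1.0/24"), "GE0/2/0"),
    (ipaddress.ip_network("2.2.2.0/24"), "GE0/2/1"),
]


def lookup(source_ip, fib):
    """Return the outbound interface of the longest-prefix match, or None."""
    addr = ipaddress.ip_address(source_ip)
    matches = [(net, ifname) for net, ifname in fib if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def urpf_check(source_ip, inbound_if, fib, mode="strict"):
    """Return True if the packet passes the URPF check."""
    out_if = lookup(source_ip, fib)
    if out_if is None:
        # No route back to the source address: fails in both modes.
        return False
    if mode == "loose":
        # Loose mode only requires that a matching route exists.
        return True
    # Strict mode also requires the route to point back out the inbound interface.
    return out_if == inbound_if
```

With this table, a packet with source 2.2.2.2 arriving on GE0/2/0 fails the strict check but passes the loose check, because a route to 2.2.2.0/24 exists but points to a different interface.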
14.3.3 Applications
Figure 14-15 URPF application environment where a client is connected to the ISP network through only one path
(Diagram labels: Network A; Network C; PE-3; NodeB (two); packet with source 2.2.2.2 and destination 3.3.3.3.)
As shown in Figure 14-15, network A and network B are connected to ATN-1. URPF is
enabled on GE 0/2/0 and GE 0/2/1 of ATN-1 to protect the ISP network against source
address spoofing attacks from network A and network B.
Assume that a PC in network A sends a request packet with the forged source address
2.2.2.2 to network C. After receiving the request packet, ATN-1 performs the URPF
check based on the inbound interface and source address of the packet. ATN-1 finds
that the request packet should enter through GE 0/2/1 but it entered through GE 0/2/0.
ATN-1 therefore considers the source address of the packet a bogus source address and
discards the packet. In this manner, ATN-1 is protected against the source address
spoofing attack.
Normal packets sent by a PC in network B to network C pass the URPF check and are
forwarded normally.
(Diagram labels: NodeB (two); ATN (two); 1.1.1.1/24; 2.2.2.2/24; CX-A; server 3.3.3.3/24; two packets, each with source 2.2.2.2 and destination 3.3.3.3.)
l As shown in Figure 14-16, multiple connections are set up between the NodeB network and
an ISP to ensure reliability. In this case, symmetrical routes between the NodeB network
and the ISP network cannot be ensured, and loose URPF must be used.
(Diagram labels: Network A; NodeB; ATN A; ISP A; CX-B; ISP B; CX-C; Internet.)
As shown in Figure 14-17, the NodeB network is connected to multiple ISP networks.
Therefore, it is difficult to ensure symmetrical routes between the NodeB network and two
ISP networks. Loose URPF must be used.
URPF applied in the scenario where a NodeB network is connected to multiple ISP networks
has the following characteristics:
l If any special packet is required to pass the URPF check in all conditions, you can
specify the source address in an ACL.
l Many users' ATNs may have only one default route to the ATN of an ISP network.
Therefore, the default routing entries should be configured.
14.4.1 Introduction
Definition
The local attack defense feature restricts the packets to be sent to the CPU of an ATN device
to protect services on the ATN device.
Purpose
As networks develop and are widely applied, users have become more concerned about
how to keep their private data and resources confidential and secure in an open
network environment.
Protecting the CPU is therefore necessary for the device to process and respond to
normal services. Both valid packets and attack packets destined for the CPU exist on
the network. Attack packets destined for the CPU can interrupt services or even
paralyze the system. Moreover, a burst of valid packets increases the CPU usage,
degrading CPU performance.
The local attack defense feature on an ATN device targets the packets to be sent to the ATN
CPU in an attempt to protect running services and prevent mutual impact of services on each
other in the case of attacks.
Local attack defense on the ATN device offers the following functions:
l Restricts users' remote access to the ATN device through unauthorized interfaces.
l Restricts control packets received on unauthorized interfaces.
l Ensures that packets matching the configured whitelist are preferentially sent to the
CPU.
l Restricts the rate at which packets are sent to the CPU.
l Records information about attack packets.
Benefits
The local attack defense feature offers the following benefits to carriers:
l Services are not interrupted or affected when the ATN device is under attack, improving
the ATN device's working duration and carriers' service capabilities.
The local attack defense feature offers the following benefits to users:
l Attacks on the ATN device are restricted, improving user information security and
bandwidth usage.
(Diagram labels: ATN-A, GE1, 1.1.1.1/24; ATN-B, GE2, 2.2.2.2/24; ATN-C, GE3, 3.3.3.3/24; Internet.)
When under attacks from the Internet, ATN-A applies local attack defense to sustain its
service continuity and protect the communication with ATN-B and ATN-C.
Management and control plane protection has two functions. One function is to specify some
interfaces as management interfaces and have the other interfaces discard all received
management packets. In this manner, management and control plane protection prevents
attackers from remotely controlling the ATN. The other function is to control protocol
packets at the software layer. By applying policies globally or to specific interfaces,
management and control plane protection can flexibly specify the types of protocol packets
for an interface on the ATN. If no active interface on the device has the protocols FTP, SSH,
SNMP, Telnet, and TFTP enabled, the command for disabling these protocols globally does
not take effect. For example, if no active interface on the device has FTP enabled, the
command for disabling FTP globally does not take effect. This prevents the device from
being disconnected.
The attack source tracing module can be considered as a powerful log processing center. It
records the information about attack packets coming from each function module, and
sequences the attack packets by timestamp in its buffer area. In addition, it supports exact
query and fuzzy query of information about the attack packets and retains the information
after the device is reset. Users can export the information in a standard format to the CF card
on the system control board by running a specific command.
Currently, information recorded by attack source tracing cannot be saved on an independent
server.
14.4.2.3 CP-CAR
CP-CAR restricts the rate at which packets are sent to the CPU to protect the CPU.
Rate limitation can be implemented on packets first by a specific protocol and then by all
protocols.
l Rate limitation on protocol-specific packets: Specific bandwidth is configured for
packets of a specific protocol, such as ARP and DHCP, so that packets of this protocol
do not preempt bandwidth that is intended for other protocol packets.
l Rate limitation on all protocol packets: The excess packets over the rate limit are
dropped so that the CPU is protected against overloads. After packets of each protocol
are transmitted at a protocol-specific rate, these packets are added to 14 protocol groups,
for each of which bandwidth is assigned based on its weight. Packets in each protocol
group are placed in eight queues for transmission to ensure that packets of different
protocols do not affect each other.
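The two-stage limit described above can be sketched in Python as follows. This is a simplified illustration, not the device implementation: the token-bucket mechanism, the class names, and the rates are assumptions, and the 14 weighted protocol groups and eight queues are collapsed into a single aggregate bucket.

```python
class TokenBucket:
    """Assumed limiter: `rate` tokens per second, at most `burst` tokens stored."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now):
        # Refill according to the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CpCar:
    """Stage 1: protocol-specific limit. Stage 2: limit on all protocol packets."""

    def __init__(self, per_protocol, total):
        self.per_protocol = per_protocol  # dict: protocol name -> TokenBucket
        self.total = total                # TokenBucket shared by all protocols

    def admit(self, protocol, now):
        bucket = self.per_protocol.get(protocol)
        if bucket is not None and not bucket.allow(now):
            return False  # dropped by the protocol-specific limit
        return self.total.allow(now)  # excess over the overall limit is dropped
```

In this sketch, a burst of ARP packets exhausts only the ARP bucket, so OSPF packets arriving at the same moment are still admitted to the CPU.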
The alarm function accompanies rate limitation.
l Rate limitation on protocol-specific packets: If both the number of protocol-specific
packets sent to the CPU and the CPU usage reach the configured thresholds within a
certain period, an alarm is generated.
l Rate limitation on all protocol packets: If the number of all protocol packets sent to the
CPU and the CPU usage reach the configured thresholds within a certain period, an
alarm is generated.
14.4.2.5 Alarm
If the number of discarded packets exceeds the threshold in a period, the device with the
security function generates alarms that inform the NMS of massive packet loss that must be
handled.
If the number of discarded packets is under the threshold, the alarm is cleared.
14.4.3 Applications
To resolve the problem shown in Figure 14-20, enable the whitelist-based application layer
association function to add the user to the whitelist to prevent the user's connection to the
ATN device from being interrupted.
14.4.3.2 CP-CAR
CP-CAR is used to protect the ATN device's CPU so that the CPU can properly work even
under attacks.
In Figure 14-21, the ATN device needs to process OSPF protocol packets transmitted from the
network. An attacker sends a large number of attack packets to the ATN device to consume
excessive CPU resources, so that the ATN device cannot process OSPF protocol packets
transmitted on the network side. As a result, the OSPF connection fails to be established
between the ATN device and the network-side device.
After CP-CAR is configured on the ATN device, attack packets are sent to the CPU at a
restricted rate so that sufficient CPU resources are allocated to OSPF protocol packets
transmitted on the network side. In this manner, OSPF protocol packets can be properly
transmitted between the ATN device and network-side device, allowing valid OSPF users to
access the Internet through the ATN device.
14.4.4.1 Abbreviations
None
14.5 Mirroring
Definition
NOTE
The mirroring feature may be used to analyze the communication information of terminal customers for
a maintenance purpose. Before enabling the mirroring function, ensure that it is performed within the
boundaries permitted by applicable laws and regulations. Effective measures must be taken to ensure
that information is securely protected.
Mirroring can be classified into the following types based on the locations of the mirrored
port and the observing port:
l Local mirroring
In local mirroring, traffic on an interface is copied and sent through another interface of a
local device. As shown in Figure 14-22, Port 1 is the local mirrored port and Port 3 is
the local observing port.
(Diagram labels: Port 1; Port 2; Network 1; Network 2; imported packets; forwarded packets; analyzer.)
Mirroring can also be classified into the following types based on mirroring policies:
(Diagram labels: source ATN (ATN-A); destination ATN (ATN-B); MPLS/GRE tunnel; GE0/2/0, GE0/2/1, and GE0/2/2; mirrored port; local mirroring source; remote mirroring source; local observing port; analyzer; mirrored packets shown as L2/L3, Label/VC/L2/L3, and GRE/VC/L2/L3; forwarded flow; mirroring flow.)
l Local mirroring source: In local mirroring, it is an interface whose packets are copied
and then sent out through a specified local observing interface. As shown in Figure 14-23,
GE 0/2/0 of ATN-B is a local mirroring source.
l Local observing port: It is an interface for sending the mirrored traffic. As shown in
Figure 14-23, GE 0/2/1 of ATN-B is a local observing port.
Functions supported in both local mirroring and remote mirroring are as follows:
Purposes
When connected to the Internet, a device faces various attacks. The self-protection
capability of the device must therefore be enhanced by analyzing attack packets promptly
to eliminate attack threats, filtering attack packets before attacks take effect, and
tracing attack sources to prevent the same attacks from recurring.
Benefits
Benefits Brought to Users
Customers and customer service personnel apply mirroring to locate and analyze network
problems.
14.5.2 Principle
This section describes the basic principle and application of local mirroring.
14.5.2.2 Application
(Diagram labels: Port 1; Port 2; Network 1; Network 2; imported packets; forwarded packets; analyzer.)
As shown in Figure 14-24, Port 1 on ATN-A is a mirrored port that mirrors received packets
and Port 3 is an observing port.
14.6.1 Introduction
Definition
NOTE
The NetStream function conforms to IETF RFC3954. For security risks, see IETF RFC3954. This
function involves analyzing the communications information of terminal customers. Before enabling the
function, ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Online packet head capture is used to intercept sent and received packets for analysis.
Purpose
When network devices send or receive incorrect packets or drop packets, the quality of voice
services on the network will deteriorate and mosaics will occur, affecting network
performance. To solve the preceding problems, you can use the online packet head capture
function.
In the traditional method for obtaining packet information, files on a device must be
exported, and maintenance engineers must connect to devices or modify configurations on
site. This is labor-intensive and increases device operating risks. The online packet head
capture function is easy to configure and allows you to quickly capture packet heads for a
specified problem.
You can check the packet heads captured on a device without exporting files from the device,
which shortens fault location time and improves location efficiency.
Benefits
This function brings the following benefits to carriers:
l This function can be deployed remotely to facilitate packet head capture for devices
deployed far away.
l The topology of devices does not need to be modified, shortening location time.
l Compared with the mirroring function, online packet head capture does not need
observing interfaces, saving interface resources.
14.6.2 Principles
Principles
Online packet head capture is similar to the mirroring function. Packets are processed using
the mirroring process. The captured packet heads are then sent to the CPU of an MPU and
stored in the CPU memory and on the CF card.
A device can capture packet heads of various protocol packets sent to a CPU, including BGP,
LDP, FTP, and Telnet packets.
Profile Instance
The profile instance for online packet head capture defines the following parameters:
l Duration
Indicates the duration for capturing packet heads.
l Number of captured packet heads
Indicates the maximum number of packet heads to be captured.
l Size of packet heads to be captured
Indicates the size of a packet head capture file (storing captured packet heads). Packet
heads are no longer captured if the size of the packet head capture file exceeds the
specified value.
l Length of packet head
Indicates the length of each captured packet head.
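How the four profile parameters bound a capture can be sketched in Python. The field names, units, and the `capture` function are hypothetical and chosen for illustration only; the real profile is configured through device commands that are not shown in this document.

```python
from dataclasses import dataclass


@dataclass
class CaptureProfile:
    duration_s: float     # how long capturing may run
    max_packets: int      # maximum number of packet heads to capture
    max_file_bytes: int   # capture stops once the file exceeds this size
    head_length: int      # bytes kept from the start of each packet


def capture(packets, profile):
    """Capture packet heads from (timestamp_s, raw_bytes) pairs.

    Timestamps are assumed to be relative to the start of the capture.
    """
    heads = []
    file_bytes = 0
    for timestamp, raw in packets:
        if timestamp > profile.duration_s:
            break  # capture duration elapsed
        if len(heads) >= profile.max_packets:
            break  # packet-count limit reached
        if file_bytes > profile.max_file_bytes:
            break  # capture file exceeded the specified size
        head = raw[:profile.head_length]  # keep only the packet head
        heads.append(head)
        file_bytes += len(head)
    return heads
```

Whichever limit is reached first ends the capture, so a short duration or a small file size can stop it before the packet-count limit applies.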
14.6.3 Applications
The voice service quality of a subscriber attached to a PE on the live network deteriorates and
troubleshooting is required, as shown in Figure 14-25.
To quickly locate the network fault, a network engineer can deploy the remote packet head
capture function on that PE and capture packet heads of packets sent to the CPU of that PE for
processing. Then the network engineer can analyze captured packet heads for fault
rectification.
(Diagram labels: ISP networks; CE; office; Internet; PE; engineer; BGP, LDP, FTP, and Telnet packets; forwarding packet.)
None.
14.7 MPAC
14.7.1 Introduction
Definition
The Management Plane Access Control (MPAC) feature protects devices from attacks.
MPAC enables devices to filter packets destined for the CPU based on rules specified in an
MPAC policy and discard unneeded packets, which helps prevent attacks to the CPU.
Purpose
On an Internet service provider (ISP) network, user-side interfaces on a local device receive a
great number of packets to be forwarded to the CPU. Some of these packets are attempts to
attack the CPU. If too many packets reach the CPU, CPU usage increases sharply and device
performance deteriorates, which affects services running on the device. A sustained stream
of attack packets keeps the CPU busy processing them, which affects other services or even
causes a system crash.
An MPAC policy can be configured to allow the device to send valid packets to the CPU and
to discard attack packets, which prevents attacks to the CPU. MPAC is enabled to protect
TCP/IP-based control plane protocols against Denial of Service (DoS) attacks. For example,
an attacker keeps sending packets to a device by simulating a routing protocol. The device
receives and processes the attack packets as valid packets. As a result, the device becomes
extremely busy, and its CPU usage increases. To prevent CPU overload, you can set an MPAC
rule to enable the device to drop forged packets destined for the CPU.
14.7.2 Principles
On an Internet service provider (ISP) network shown in Figure 14-26, user-side interfaces on
a local device receive a great number of packets to be forwarded to the CPU. Some packets
attempt to initiate attacks to the CPU.
(Diagram labels: user-side; network-side; network; ATNA.)
Attack packets destined for the CPU on a device pose the following threats to the device:
l A great number of packets sent to the CPU are likely to cause a sharp spike in CPU
usage. If the CPU is overloaded, device performance deteriorates, and services may be
interrupted.
l Malicious packets allowed to reach the CPU consume resources, which causes a service
interruption or even a system crash.
To prevent CPU resource exhaustion for stable network operation, configure an MPAC policy
on sub-interfaces, physical interfaces, and the entire device. The rules in the policy determine
whether protocol-specific packets with the specified source and destination addresses can be
sent to the service module.
l If packets match a rule and the behavior in the rule is "permit", the packets are sent to the
service module.
l If packets match a rule and the behavior in the rule is "deny", the packets are discarded.
l If packets do not match rules in the policy, the packets are sent to the service module.
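The three outcomes listed above can be sketched in Python. The rule fields, the first-match evaluation order, and the function name are assumptions made for illustration; the document does not specify how rules are ordered or configured.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    protocol: str      # protocol the rule applies to, e.g. "ospf"
    source: str        # source prefix, e.g. "10.1.1.0/24"
    destination: str   # destination prefix
    action: str        # "permit" or "deny"


def mpac_decide(rules, protocol, src_ip, dst_ip):
    """Return "send" (to the service module) or "discard" for one packet."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:  # assumption: the first matching rule wins
        if (rule.protocol == protocol
                and src in ipaddress.ip_network(rule.source)
                and dst in ipaddress.ip_network(rule.destination)):
            return "send" if rule.action == "permit" else "discard"
    # Packets that match no rule are sent to the service module.
    return "send"
```

Because unmatched packets are sent onward, a policy in this model acts as a deny list with explicit exceptions rather than as an allow list.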
Figure 14-27 demonstrates how an MPAC-capable device processes packets. You can define
rules in an MPAC policy to meet site-specific requirements.
Terms
None
14.8 Keychain
14.8.1 Introduction
Definition
Applications, such as Routing Protocol Application (RPA), Transmission Control Protocol
(TCP), and signaling protocols (like LDP), exchange authenticated packets over the network
for security reasons, but the authentication mechanism in these applications is not robust.
Each application uses a constant authentication key unless the administrator of the network
changes the key manually. Manual authentication key change is a cumbersome procedure.
During the change, packets can be dropped, because it is very difficult to change the keys
instantaneously on all routers.
Another drawback of this type of authentication mechanism is that there is no central
application to control all the authentication functionality. Each application maintains its own
set of authentication rules. If there are many application instances that require the same set of
authentications, this results in duplication of data and processing.
This authentication system needs a mechanism to achieve centralization of all authentication
processing and dynamic change of authentication keys with little human intervention. To
achieve this, a new application called Keychain has been added to the system.
Keychain is a centralized application that provides authentication functionality to all
applications that require them. It also provides dynamic change of authentication keys to all
required applications.
Purpose
When routing applications communicate over a network, persons with malicious intent can
tamper with packets or pretend to be authenticated users. To detect modified messages and to
authenticate the sender, routing applications support message authentication by defining the
authentication rules statically. Each application may use different authentication rules, but
using the same authentication rule over a long period will eventually compromise security.
Manually changing the authentication rules on communicating peers simultaneously is
error-prone.
If each application maintains its own set of authentication rules, multiple instances of the
same set of authentication information create duplication of data and processing across
networking applications.
Keychain centralizes the storage of authentication information and provides dynamic
modification of authentication information without human intervention for all applications
that need to perform authenticated communication.
14.8.2 Principles
14.8.3 Applications
This section describes typical applications of Keychain.
Non-TCP applications such as RIP and ISIS can initialize or de-initialize with the Keychain
module through the exposed initialization application programming interface (API) provided
with Keychain.
When an application needs to send packets, it performs the process shown in the following
figure.
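(Diagram: exchanges between the application and Keychain. The application requests the active key-id; Keychain provides the active key-id information; the application sends the packet data for which a MAC has to be created; Keychain provides the calculated MAC value.)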
1. Through the Keychain API, the application queries Keychain for the active send key-id.
When it receives the active key-id, the application constructs the packet data for which a
MAC needs to be calculated. Then it sends the packet data to Keychain.
2. Keychain generates a MAC for the packet data and sends the calculated MAC to the
application.
3. The application formulates a packet with authentication information and sends it out.
When an application receives a packet, it performs the process shown in the following figure.
(Diagram: exchanges between the application and Keychain. The application receives a packet with authentication information and sends the information to Keychain for validation; Keychain sends success or failure; the application parses and accepts the packet based on the validation.)
When an application that does not carry the key-id in the packet, such as ISIS, receives a
packet, it performs the following process:
l The application extracts the authentication information and sends the information
(Keychain name, packet data, algorithm type, MAC) to Keychain for validation.
l Keychain re-calculates the MAC for each active receive key-id and compares them with
the MAC received in the packet. If the MACs match, then Success is returned to the
application; otherwise, failure is returned to the application.
l The application accepts or rejects the packet based on the Keychain validation.
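The send and receive processes for a non-TCP application can be sketched in Python. The `Keychain` class and its method names are hypothetical stand-ins for the Keychain API, and HMAC-SHA-256 is an assumed algorithm; the document does not name either.

```python
import hashlib
import hmac


class Keychain:
    """Hypothetical stand-in for the Keychain module; not the device API."""

    def __init__(self, keys, active_send_id, active_receive_ids):
        self.keys = keys                          # key-id -> secret bytes
        self.active_send_id = active_send_id      # key-id used for sending
        self.active_receive_ids = active_receive_ids

    def get_active_send_key_id(self):
        return self.active_send_id

    def compute_mac(self, key_id, packet_data):
        # Assumed algorithm: HMAC-SHA-256 over the packet data.
        return hmac.new(self.keys[key_id], packet_data, hashlib.sha256).digest()

    def validate(self, packet_data, received_mac):
        # The packet carries no key-id, so every active receive key is tried.
        return any(
            hmac.compare_digest(self.compute_mac(key_id, packet_data), received_mac)
            for key_id in self.active_receive_ids
        )
```

A sender would call `get_active_send_key_id()` and `compute_mac()` before transmitting, and a receiver would call `validate()` and accept or reject the packet based on the result. Trying every active receive key is what allows the send key to be changed without dropping packets that were authenticated with the previous key.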
In TCP application of Keychain, authentication is done at the TCP level, not at the application
level. An application specifies that TCP will use Keychain to extract authentication
information. TCP initializes or de-initializes itself with the Keychain module through the
exposed Keychain initialization API.
TCP uses the Enhanced Authentication Option for authenticated communication, as specified
in the TCPM Working Group draft (draft-bonica-tcp-auth-06.txt). The following figure shows
the Option format.
Authentication Data
Because the draft is not yet a standard, the Internet Assigned Numbers Authority (IANA) has
not defined the kind value (Option type) or the algorithm-id for some algorithms, so
different vendors use different values. To interoperate with other vendors, the TCP kind
value and TCP algorithm-id are configurable and are maintained in Keychain.
The Keychain API provides a query function for applications to obtain TCP kind and
algorithm-id values.
When a TCP application needs to send packets, it performs the process shown in the
following figure.
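(Diagram: exchanges between the TCP application and Keychain. The application requests the TCP kind and algorithm-id values, and Keychain sends them; the application requests the active key-id, and Keychain provides the active key-id information; the application sends the packet data for which a MAC has to be created, and Keychain provides the calculated MAC value.)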
1. To set the Enhanced Authentication Option, the application queries the Keychain module
to get the active send key-id authentication information.
2. From the authentication information obtained, the application generates packet data and
sends it to Keychain to generate a MAC. Keychain calculates the MAC and sends it to
the application.
3. The application fills in the TCP kind value, TCP algorithm-id that corresponds to the
active send key-id algorithm, and generated MAC in the Enhanced Authentication
Option format and sends out the packet.
When the TCP application receives a packet, it performs the process shown in the following
figure.
(Diagram: exchanges between the TCP application and Keychain. The application receives a packet with authentication information; on success, the application parses and accepts the packet based on the validation.)
l The application extracts authentication information from the packet and provides it to
Keychain for validation.
l Keychain checks whether the TCP algorithm-id in the packet matches the TCP
algorithm-id that corresponds to the received key-id algorithm. If algorithm-ids do not
match, then a failure message will be returned.
l Keychain re-calculates the MAC and compares the generated MAC and received MAC.
If they match, then a success message is returned to the application; otherwise, a failure
message is returned.
l The application accepts or rejects the packet based on the Keychain validation.
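The receive-side validation for TCP can be sketched in Python as follows. All values here are hypothetical: the algorithm-id numbers stand for operator-configured values (IANA has not assigned any), the key table is illustrative, and the HMAC algorithms are assumptions.

```python
import hashlib
import hmac

# Hypothetical operator-configured mapping from algorithm name to TCP algorithm-id.
TCP_ALGORITHM_IDS = {"hmac-sha1": 2, "hmac-sha256": 5}

# Hypothetical receive keys: key-id -> secret and algorithm.
KEYS = {
    7: {"secret": b"k7", "algorithm": "hmac-sha256"},
}

DIGESTS = {"hmac-sha1": hashlib.sha1, "hmac-sha256": hashlib.sha256}


def compute_mac(key_id, data):
    key = KEYS[key_id]
    return hmac.new(key["secret"], data, DIGESTS[key["algorithm"]]).digest()


def validate_tcp(key_id, algorithm_id, data, received_mac):
    """Return True if the received segment passes Keychain validation."""
    key = KEYS.get(key_id)
    if key is None:
        return False  # unknown key-id
    # The algorithm-id in the packet must match the one configured for this key.
    if TCP_ALGORITHM_IDS[key["algorithm"]] != algorithm_id:
        return False
    # Recalculate the MAC and compare it with the received MAC.
    return hmac.compare_digest(compute_mac(key_id, data), received_mac)
```

Checking the algorithm-id before recomputing the MAC mirrors the order of the steps above and rejects packets from a peer whose configured values differ.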
Terms
Term Description
14.9 IPSec
NOTE
The ATN does not support data encryption on an IPsec VPN tunnel. To comply with RFC standards,
IPsec on the ATN applies only to the IPv4 PIM, IPv6 PIM, MLD, OSPFv3, RIPng protocol packets but
not to the transmitted data.
14.9.1 Introduction
Definition
Internet Protocol Security (IPSec) is a security protocol suite defined by the Internet
Engineering Task Force (IETF). IPSec secures data transmission on the Internet through data
origin authentication, data encryption, data integrity check, and anti-replay functions:
l Data origin authentication: The receiver checks the validity of the sender.
l Data encryption: The sender encrypts data packets and transmits them in cipher text on
the Internet. The receiver decrypts or directly forwards the received data packets.
l Data integrity check: The receiver validates received data to check whether the data has
been tampered with.
l Anti-replay: The receiver rejects old or duplicate packets to prevent attacks that
malicious users initiate by re-sending obtained data packets.
Purpose
On the Internet, most data is transmitted in plain text, causing security risks. For example,
bank accounts and passwords may be intercepted or tampered with, user identities may be
counterfeited, or bank networks may be attacked. IPSec can protect transmitted IP packets to
reduce the risk of information leaks.
Benefits
IPSec reduces the risk of information leaks and tampering, ensures data integrity and
confidentiality, and secures service transmission.
14.9.2 Principles
IPSec configurations include the security association (SA), security protocol, encapsulation
mode, authentication algorithm, and encryption algorithm.
SA
IPSec provides secure communication between IPSec peers (two communication ends).
An SA is an agreement between IPSec peers on certain elements and is used to protect
data, providing IPSec with essential functions. It defines the security protocol to be applied,
encapsulation mode, authentication mode, and shared key for data protection. The security
protocol can be configured as either Authentication Header (AH) or Encapsulating Security
Payload (ESP), while the authentication mode can be set to Message Digest 5 (MD5), Secure
Hash Algorithm 1 (SHA-1), or SHA-2.
An SA is unidirectional, with incoming packets and outgoing packets being processed by
different SAs. Therefore, if two hosts (ATN A and ATN B) are to communicate through ESP,
two SAs must be set up on ATN A, one of which processes the outgoing packets and the other
processes the incoming packets. Similarly, two SAs must be set up on ATN B, as shown in
Figure 14-33.
SAs are protocol-specific. If ATN A and ATN B use both AH and ESP for secure
communication, four SAs are required on ATN A. Two SAs (one for incoming packets and
one for outgoing packets) are configured for AH and the other two SAs (one for incoming
packets and one for outgoing packets) are configured for ESP. Similarly, four SAs with
equivalent relationships are required on ATN B.
An SA is uniquely identified by a 3-tuple, which comprises the Security Parameter Index
(SPI), destination IP address, and security protocol (AH or ESP). The SPI is a 32-bit number
generated to identify an SA and is carried in the AH or ESP header during transmission.
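The SA count and the identifying 3-tuple can be illustrated with a short Python sketch. The addresses, the SPI values, and the `build_sas` helper are hypothetical; on a real device the SPIs come from configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaKey:
    spi: int            # 32-bit Security Parameter Index
    destination: str    # destination IP address
    protocol: str       # security protocol: "AH" or "ESP"


def build_sas(local, peer, protocols, first_spi=0x1000):
    """Create one outbound and one inbound SA per security protocol."""
    sas = {}
    spi = first_spi
    for protocol in protocols:
        # An SA is unidirectional, so each direction needs its own SA.
        for direction, destination in (("outbound", peer), ("inbound", local)):
            sas[SaKey(spi, destination, protocol)] = direction
            spi += 1
    return sas
```

For example, `build_sas("10.0.0.1", "10.0.0.2", ["AH", "ESP"])` yields four SAs on the local device, while using ESP alone yields two.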
Security protocol
IPSec ensures security by authenticating and encrypting data using two security protocols,
AH and ESP, the features of which are shown in Table 14-2.
Function: Data integrity authentication
AH: This function prevents data from being modified by unauthorized users during
transmission. Data security is ensured by using an authentication key and authentication
algorithm that are shared by sending and receiving parties. Before transmitting data, the
sending party calculates the data using the authentication key and specified authentication
algorithm. The sending party then sends the calculation result together with the data packet
to the receiving party. After receiving the packet, the receiving party uses the same
authentication key and authentication algorithm to calculate the data. If the result
calculated by the receiving party is the same as the calculation result sent by the sending
party, the packet is considered intact and not to have been modified during data
transmission; otherwise, the packet is considered to have been modified and is dropped.
If AH is used to protect data, the entire IP packet is authenticated.
ESP: This function prevents data from being modified by unauthorized users during
transmission. Data security is ensured by using an authentication key and authentication
algorithm that are shared by sending and receiving parties. The authentication process is
the same as that of AH, with the difference that, using ESP, all IP packet contents except
the IP header are authenticated. Therefore, AH provides a more secure service than ESP.

Function: Data origin authentication
AH: This function checks that the party sending the data is authorized.
ESP: This function checks that the party sending the data is authorized.
Encapsulation mode
IPSec currently supports the transport mode for encapsulation. With this mode, the AH or
ESP header is inserted following the IP header, but before all transport layer protocol headers
or all other IPSec protocol headers, as shown in Figure 14-34.
(Figure 14-34: diagram not reproduced; surviving labels: Mode, Transport, Protocol.)
The transport mode is applicable to a scenario in which two hosts, or a host and a security
gateway, are communicating with each other. In transport mode, the two devices encrypting
and decrypting packets must be the original packet sender and the final receiver, respectively.
Both AH and ESP can check the integrity of IP packets and determine whether a packet has
been modified during data transmission. Authentication is based on a hash function, a type
of algorithm that does not limit the length of input messages but always outputs a message of
a fixed length. The output message is called the message summary (also known as the message
digest). To authenticate message integrity, IPSec peers each calculate the message summary
using the hash function. If the message summaries are identical on both peers, the packet is
considered intact and not to have been modified. The following IPSec authentication
algorithms are supported:
l MD5: generates a 128-bit message summary for an input message of any length.
l SHA-1: generates a 160-bit message summary for an input message of less than 2^64 bits.
l SHA2-256: generates a 256-bit message summary, of which IPSec carries the first 128 bits,
for an input message of less than 2^64 bits.
l SHA2-384: generates a 384-bit message summary, of which IPSec carries the first 192 bits,
for an input message of less than 2^128 bits.
l SHA2-512: generates a 512-bit message summary, of which IPSec carries the first 256 bits,
for an input message of less than 2^128 bits.
SHA-1 and SHA-2 compute longer message summaries than MD5 and therefore provide a more
secure service.
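The integrity check described above can be sketched with a keyed hash (HMAC) from the Python standard library. This is an illustrative sketch only: the function names are hypothetical, and the optional truncation of the result to a shorter integrity check value (ICV) is an assumption about how the summary is carried in the packet, not a statement of the ATN implementation.

```python
import hashlib
import hmac
from typing import Optional


def compute_icv(key: bytes, data: bytes, algorithm: str,
                icv_bits: Optional[int] = None) -> bytes:
    """Compute a keyed message summary; optionally keep only the first icv_bits."""
    digest = hmac.new(key, data, algorithm).digest()
    if icv_bits is None:
        return digest
    if icv_bits % 8 or icv_bits > len(digest) * 8:
        raise ValueError("icv_bits must be a multiple of 8 within the digest size")
    return digest[: icv_bits // 8]


def verify_icv(key: bytes, data: bytes, algorithm: str, received: bytes,
               icv_bits: Optional[int] = None) -> bool:
    """The receiver recomputes the summary and compares it with the one received."""
    expected = compute_icv(key, data, algorithm, icv_bits)
    return hmac.compare_digest(expected, received)


# Sender side: key and algorithm are shared with the receiver in advance.
shared_key = b"example-shared-key"
payload = b"PIM hello"
icv = compute_icv(shared_key, payload, "sha256", icv_bits=128)

# Receiver side: an unmodified packet passes, a modified one is dropped.
accepted = verify_icv(shared_key, payload, "sha256", icv, icv_bits=128)
rejected = verify_icv(shared_key, payload + b"!", "sha256", icv, icv_bits=128)
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not depend on how many leading bytes match.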
Encryption algorithm
ESP encrypts an IP packet to prevent disclosure of the packet contents during transmission.
The encryption algorithm is implemented based on a symmetric key system, which encrypts
and decrypts data using the same key. IPSec uses the following three encryption algorithms:
l Data Encryption Standard (DES): uses a 56-bit key to encrypt a 64-bit plaintext block.
l Triple Data Encryption Standard (3DES): uses three 56-bit keys (in effect, a 168-bit key)
to encrypt a plaintext packet.
l Advanced Encryption Standard (AES): uses a 128-bit, 192-bit, or 256-bit key to encrypt a
plaintext packet.
Peers using IPSec perform various security functions for different data flows. The
implementation process of IPSec on the ATN is as follows:
l Define a security proposal, and specify the security protocol, authentication algorithm,
encryption algorithm, and encapsulation mode in the proposal.
l Define an SA, and specify the association relationship between the security protocols,
SPIs, and authentication keys.
l Apply the SA to the service type that requires protection.
The ATN supports the AH and ESP security protocols, which can be used independently or
together. Both AH and ESP support three authentication algorithms (MD5, SHA-1, and
SHA-2), and ESP supports three encryption algorithms (DES, 3DES, and AES).
Defining an SA
An SA imports the security proposal to specify the security protocol, authentication
algorithm, encryption algorithm, and encapsulation mode for service-based protection. An SA
determines the key for data authentication and encryption and is uniquely identified by the SA
name and SPI. Therefore, the key, SPI, and security proposal are required for an SA.
Applying an SA
Currently, IPSec configured on the ATN protects data based on the service type. Packets to be
protected are encapsulated and the receiving party drops the packets that are not protected by
IPSec or fail to be decapsulated. Service-based IPSec does not require an Access Control List
(ACL) to specify the data flow to be protected or a specific segment of an IPSec tunnel. It is
bound only to a specific service and protects all packets of this service, regardless of which
interface sends the packets.
14.9.3 Applications
Service Overview
Protocol Independent Multicast (PIM) is the most widely used intra-domain multicast
protocol. PIM builds a multicast distribution tree (MDT) to forward multicast data and
therefore requires a high level of protection. PIM itself does not define any authentication
mechanism. If no additional authentication mechanism is configured for PIM, PIM packets
are prone to interception, modification, or forgery, which can affect PIM neighbor
relationships and interrupt multicast network communication.
IPSec can be used for authenticating PIM packets. An AH or ESP header inserted into a PIM
packet provides a basis for data origin authentication and data integrity authentication,
protecting PIM neighbor relationships and network communication.
Networking Description
On the network shown in Figure 14-35, the multicast service is deployed. ATN A sets up
PIM neighbor relationships with ATN B and ATN C, and the devices exchange PIM protocol
packets to maintain the neighbor relationships and multicast routing entries. Attackers on
this network may send forged PIM protocol packets to the ATNs, causing the ATNs to fail to
forward multicast data. To prevent such attacks, you can configure PIM IP Security (IPSec)
on the interfaces of ATN A, ATN B, and ATN C to authenticate the IPv6 PIM protocol packets
transmitted between them. This ensures normal multicast data transmission, so that Receiver
can receive multicast data from Source.
(Figure 14-35: diagram not reproduced. Surviving labels: Source, ATN A, ATN B, ATN C, two
Receivers, Ethernet links, PIM SM, IPSec SA Negotiation.)
Feature Deployment
IPSec can be deployed in a PIM process or on an interface.
PIM IPsec configured in the interface view has the same effect as that configured in the PIM
view, but their application scopes are different:
l PIM IPsec configured in the interface view: applies only to the current interface.
l PIM IPsec configured in the PIM view: applies to all interfaces.
PIM IPsec configured in the interface view takes precedence over PIM IPsec configured in
the PIM view. If no PIM IPsec configuration exists in the interface view, the interface uses the
PIM IPsec configuration in the PIM view.
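The precedence rule above amounts to a simple fallback lookup. The following is a minimal sketch; the function name, the interface names, and the SA names are hypothetical and do not correspond to ATN commands.

```python
from typing import Dict, Optional


def effective_pim_ipsec_sa(interface: str,
                           interface_view: Dict[str, str],
                           pim_view: Optional[str]) -> Optional[str]:
    """Return the SA that protects PIM packets on the given interface.

    interface_view maps an interface name to the SA configured in that
    interface view; pim_view is the SA configured in the PIM view, if any.
    The interface-view configuration takes precedence.
    """
    return interface_view.get(interface, pim_view)


# Only GE0/2/0 has its own configuration; every other interface inherits
# the PIM-view configuration.
per_interface = {"GE0/2/0": "sa-interface"}
on_ge020 = effective_pim_ipsec_sa("GE0/2/0", per_interface, "sa-global")
on_ge021 = effective_pim_ipsec_sa("GE0/2/1", per_interface, "sa-global")
```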
As shown in Figure 14-35, IPSec is configured on all the interfaces so that PIM neighbor
relationships are set up only when IPSec authentication succeeds. Packets that fail IPSec
authentication, or that are protected with an authentication mode different from the one
configured on the IPSec peer, are dropped.
AH Authentication Header
SA Security Association
15 User Management
This document describes the overview, principles, and typical applications of the user
management feature.
Definition
AAA, short for Authentication, Authorization, and Accounting, provides the following types
of security functions:
The ATN implements Authentication and Authorization through the Remote Authentication
Dial-In User Service (RADIUS) protocol or the Huawei Terminal Access Controller Access
Control System (HWTACACS) protocol.
l RADIUS
RADIUS is one of the most commonly used protocols to implement AAA. As an
application-layer protocol running between the ATN and a RADIUS server, RADIUS
defines the procedure for transmitting user information and accounting information
between the ATN and the RADIUS server and the format of packets exchanged between
them.
l HWTACACS
AAA can also be implemented through HWTACACS. HWTACACS is an enhancement
of TACACS, an access control protocol defined in RFC 1492. Similar to RADIUS,
HWTACACS adopts the client/server model to communicate with the HWTACACS
server, thereby implementing Authentication and Authorization for various users.
In actual applications (except the applications of non-accounting) on the ATN, all user
accounts must be configured on an AAA server, and all the domains to which the user
accounts belong must be configured on the ATN. The ATN supports the configuration and
management of local user accounts.
Commonly, the service attributes configured in a domain have a lower priority than the
service attributes delivered by an AAA server. Therefore, when service attributes are both
configured for a domain and delivered by an AAA server, the ATN adopts the service
attributes that are delivered by the AAA server. The service attributes configured for a domain
take effect only when the AAA server does not support or deliver the service attributes.
Purpose
The ATN implements Authentication, Authorization, and Accounting through either RADIUS
or HWTACACS.
The ATN supports domain-based or user account-based user management and supports
multiple authentication and accounting policies.
Benefits
This feature brings the following benefits to carriers:
l Access users are identified to guarantee legitimate service access.
l Authorities of access users are controlled through domain-based user management.
l The reliability of access user accounting is ensured through the RADIUS or
HWTACACS accounting protocol and through the local accounting function in case of
a remote accounting failure.
15.1.2 Principles
15.1.2.1 AAA
Authentication
The ATN supports the following authentication modes. The modes can be used in
combination.
l Local authentication
In this mode, user information, including the user name, password, and attributes, is
configured on the ATN. This mode features fast processing speed and low operation
costs. The major limitation is that the information storage capacity is subject to the
capacity of device hardware.
l Remote authentication
In this mode, user information, including the user name, password, and attributes, is
configured on an authentication server. The ATN supports remote authentication through
RADIUS or HWTACACS. As a client, the ATN communicates with the RADIUS or
HWTACACS server. The RADIUS protocol can be either a standard RADIUS protocol
or an extended RADIUS protocol of Huawei, that is, RADIUS+V1.0 or RADIUS+V1.1.
l First local authentication and later remote authentication
It is a local-authentication-preferred policy. That is, remote authentication is performed
only if the user name does not exist locally.
l First remote authentication and later local authentication
It is a remote-authentication-preferred policy. That is, local authentication is performed
only if the AAA server gives no response.
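The four authentication modes and their fallback conditions can be sketched as follows. This is an illustration under stated assumptions: the mode names, the `Outcome` values, and the `remote` callback (which stands in for a RADIUS or HWTACACS exchange) are hypothetical, and passwords are compared in plain text only to keep the example short.

```python
from enum import Enum
from typing import Callable, Dict


class Outcome(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    NO_USER = "no_user"          # user name is not configured locally
    NO_RESPONSE = "no_response"  # AAA server did not answer


def local_auth(users: Dict[str, str], name: str, password: str) -> Outcome:
    if name not in users:
        return Outcome.NO_USER
    return Outcome.ACCEPT if users[name] == password else Outcome.REJECT


def authenticate(mode: str, users: Dict[str, str],
                 remote: Callable[[str, str], Outcome],
                 name: str, password: str) -> Outcome:
    if mode == "local":
        return local_auth(users, name, password)
    if mode == "remote":
        return remote(name, password)
    if mode == "local-remote":
        # Remote authentication is tried only if the user name does not exist locally.
        result = local_auth(users, name, password)
        return remote(name, password) if result is Outcome.NO_USER else result
    if mode == "remote-local":
        # Local authentication is tried only if the AAA server gives no response.
        result = remote(name, password)
        return local_auth(users, name, password) if result is Outcome.NO_RESPONSE else result
    raise ValueError(f"unknown authentication mode: {mode}")


local_users = {"admin": "s3cret"}
server_down = lambda name, password: Outcome.NO_RESPONSE
server_rejects = lambda name, password: Outcome.REJECT
```

Note that in the remote-preferred mode a rejection from the server is final; the local database is consulted only when the server is silent.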
Authorization
The ATN authorizes users during user login and supports the following authorization modes:
l Local authorization
In this mode, users are authorized based on the attributes of local user accounts
configured on the ATN.
l HWTACACS authorization
In this mode, users are authorized through a HWTACACS server.
l If-authenticated authorization
In this mode, users pass the authorization after passing authentication.
l RADIUS authorization
RADIUS integrates authentication and authorization. Therefore, RADIUS authorization
cannot be performed independently.
Accounting
The ATN supports the non-accounting, HWTACACS accounting, and RADIUS accounting
modes. By default, the RADIUS accounting mode is adopted.
l Accounting mode
AAA supports the following accounting modes:
– Non-accounting
Free services are provided.
– Remote accounting
The ATN supports remote accounting through a RADIUS server.
During remote accounting, the real-time accounting function can be enabled. By
default, real-time accounting is disabled.
n Real-time accounting
During real-time accounting for online users, the ATN periodically generates
accounting packets and sends them to a remote accounting server. Real-time
accounting is also a bill protection measure: it minimizes billing errors and keeps
accounting information accurate in case of a link failure.
l Accounting failure policy
The ATN supports the configuration of a remote accounting failure policy. Remote
accounting failure policies include:
– Policy for start-accounting failures
When start-accounting fails,
n If the policy is set to "offline", the ATN terminates user access.
n If the policy is set to "online", the user remains online, but no real-time
accounting packets are exchanged with the AAA server, even if the AAA server
starts responding again. An accounting packet still needs to be sent to the AAA
server when the user goes offline.
– Policy for real-time accounting failures
When real-time accounting fails,
n If the policy is set to "offline", the ATN terminates user access.
n If the policy is set to "online", the user remains online and sends real-time
accounting packets to the AAA server. If the user needs to go offline, it sends
an accounting packet to the AAA server.
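The two failure policies can be summarized as a small decision function. This is a hypothetical sketch: the stage and policy strings and the `FailureAction` fields are names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureAction:
    keep_online: bool          # False means the ATN terminates user access
    realtime_accounting: bool  # whether real-time accounting packets continue


def on_accounting_failure(stage: str, policy: str) -> FailureAction:
    """Map (failure stage, configured policy) to the resulting behavior."""
    if stage not in ("start", "realtime"):
        raise ValueError("stage must be 'start' or 'realtime'")
    if policy not in ("offline", "online"):
        raise ValueError("policy must be 'offline' or 'online'")
    if policy == "offline":
        return FailureAction(keep_online=False, realtime_accounting=False)
    if stage == "start":
        # The user stays online, but real-time accounting does not resume.
        return FailureAction(keep_online=True, realtime_accounting=False)
    # Real-time accounting failed with policy "online": the user stays online
    # and real-time accounting packets continue to be sent.
    return FailureAction(keep_online=True, realtime_accounting=True)
```

In both "online" cases an accounting packet is still sent when the user eventually goes offline, which the sketch does not model.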
15.1.2.2 RADIUS
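Figure 15-2 Process of exchanging RADIUS messages between the RADIUS server and
client
(Diagram not reproduced. Message flow: 1. User name and password, from User to ATN;
2. Request, from ATN to RADIUS server; 3. Response, from RADIUS server to ATN.)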
1. A user initiates authentication and sends a user name and password to the ATN.
2. After the RADIUS client configured on the ATN receives the user name and password, it
sends an authentication request to the RADIUS server.
3. If the request is valid, the RADIUS server completes the authentication and sends the
required authorization information back to the RADIUS client.
Authentication information is encrypted before being transmitted between the RADIUS client
and RADIUS server. This prevents theft of information on an insecure network.
The process of exchanging accounting messages is similar to that of exchanging
authentication or authorization messages.
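As an illustration of how a password can be protected in transit, the following sketch implements the User-Password hiding scheme of standard RADIUS (RFC 2865, section 5.2). It is assumed here, not confirmed by this document, that the ATN follows the standard scheme; the function names and sample values are hypothetical.

```python
import hashlib


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Hide a password using the shared secret and the 16-byte Request Authenticator."""
    if len(authenticator) != 16:
        raise ValueError("Request Authenticator must be 16 bytes")
    if not 1 <= len(password) <= 128:
        raise ValueError("password must be 1 to 128 bytes")
    # Pad with nulls to a multiple of 16 bytes.
    padded = password + b"\x00" * (-len(password) % 16)
    hidden = b""
    previous = authenticator
    for i in range(0, len(padded), 16):
        mask = hashlib.md5(secret + previous).digest()
        block = _xor(padded[i:i + 16], mask)
        hidden += block
        previous = block  # each block chains on the previous ciphertext block
    return hidden


def recover_password(hidden: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Reverse the hiding on the server side, which knows the same shared secret."""
    plain = b""
    previous = authenticator
    for i in range(0, len(hidden), 16):
        mask = hashlib.md5(secret + previous).digest()
        plain += _xor(hidden[i:i + 16], mask)
        previous = hidden[i:i + 16]
    return plain.rstrip(b"\x00")


secret = b"shared-secret"
request_authenticator = bytes(range(16))
hidden = hide_password(b"a-long-user-password", secret, request_authenticator)
```

Only the password attribute is protected this way; the rest of a standard RADIUS packet travels in clear text.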
RADIUS Features
RADIUS adopts the server/client model and has the following characteristics:
l RADIUS features excellent real-time performance by using the User Datagram Protocol
(UDP) as the transmission protocol.
l RADIUS possesses high reliability owing to the retransmission mechanism and backup
server mechanism.
l RADIUS is easy to implement and is applicable to the multi-threaded server in the case
of a large number of users.
RADIUS Versions
The ATN supports standard RADIUS, RADIUS+V1.0, and RADIUS+V1.1. RADIUS+V1.0
and RADIUS+V1.1 are Huawei proprietary protocols derived from the standard RADIUS
protocol. Both protocols are applicable to IPHotel and Portal services, though they extend
the standard protocol in different ways.
l RADIUS+V1.0
In RADIUS+V1.0, a private attribute set is suffixed to the standard attribute set. That is,
the private attributes are added to the standard attribute set. Such an extension may
conflict with the subsequent extension of the standard RADIUS protocol.
l RADIUS+V1.1
In RADIUS+V1.1, all private attributes are treated as a subset contained in the
vendor-specific attribute defined in RFC 2865. This ensures interworking and
controllability between the extended RADIUS+V1.1 of Huawei and the extended RADIUS
protocols defined by other vendors, and avoids conflicts between the extended
RADIUS+V1.1 of Huawei and subsequent extensions of the standard RADIUS protocol.
Dynamic Authorization
The RADIUS server changes the service attributes of online users through CoA packets. The
format of the CoA packet is the same as that of the normal RADIUS packet, as shown in
Figure 15-1. In the CoA packet, the Code field has the following values:
l 43, CoA-Request packet
l 44, CoA-ACK packet
l 45, CoA-NAK packet
Dynamic authorization is applied in destination address accounting (DAA) services that
use service policies delivered by the RADIUS server. For details of the DAA services, refer to
"Configuring DAA Service" of the ATN Multi-service Access Equipment Configuration Guide
- User Access - DAA Configuration.
Figure 15-3 shows the basic process of dynamic authorization.
(Figure 15-3: diagram not reproduced. Message labels: 1. User subscribes to a service;
2. Information about the ordered service; 3. CoA-Request packet; 4. CoA-ACK/CoA-NAK
packet; 5. Updated policy.)
1. The user accesses the portal server and subscribes to a DAA service online.
2. The portal server sends information about the service that the user subscribes to, to the
RADIUS server.
3. The RADIUS server makes or modifies the service policies according to the service
information and user information. Then, the RADIUS server sends the CoA-Request
packet to the ATN, and requests to modify the authorization information of the user.
4. After receiving the CoA-Request packet from the RADIUS server, the ATN modifies the
authorization information of the user without changing the online state of the user. If the
modification is successful, the ATN returns the CoA-ACK packet to the RADIUS server;
if the modification fails, the ATN returns the CoA-NAK packet to the RADIUS server.
5. If the modification is successful, when the user uses the DAA service, the ATN controls
the service based on the modified authorization attribute.
Disconnect Message
A Disconnect Message (DM) is used by the AAA server to force a user offline. In the DM
packet, the Code field has the following values:
l 40 - Disconnect-Request
l 41 - Disconnect-ACK
l 42 - Disconnect-NAK
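The Code values listed for CoA and DM packets can be captured in an enumeration, together with a parser for the 20-byte header of the standard RADIUS packet format (Code, Identifier, Length, Authenticator). This is a minimal sketch; the class and function names are hypothetical, and attribute parsing is omitted.

```python
import struct
from enum import IntEnum
from typing import Tuple


class DynAuthCode(IntEnum):
    DISCONNECT_REQUEST = 40
    DISCONNECT_ACK = 41
    DISCONNECT_NAK = 42
    COA_REQUEST = 43
    COA_ACK = 44
    COA_NAK = 45


def parse_header(packet: bytes) -> Tuple[DynAuthCode, int, int, bytes]:
    """Parse Code (1 byte), Identifier (1 byte), Length (2 bytes), Authenticator (16 bytes)."""
    if len(packet) < 20:
        raise ValueError("a RADIUS packet is at least 20 bytes long")
    code, identifier, length, authenticator = struct.unpack("!BBH16s", packet[:20])
    if length < 20 or length > len(packet):
        raise ValueError("Length field does not match the packet")
    return DynAuthCode(code), identifier, length, authenticator


def reply_code(request: DynAuthCode, success: bool) -> DynAuthCode:
    """Choose the ACK or NAK code that answers a CoA or DM request."""
    if request is DynAuthCode.COA_REQUEST:
        return DynAuthCode.COA_ACK if success else DynAuthCode.COA_NAK
    if request is DynAuthCode.DISCONNECT_REQUEST:
        return DynAuthCode.DISCONNECT_ACK if success else DynAuthCode.DISCONNECT_NAK
    raise ValueError("not a request code")


coa_request = struct.pack("!BBH16s", 43, 7, 20, bytes(16))
parsed = parse_header(coa_request)
```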
Figure 15-4 shows the basic process of DM.
Figure 15-4 DM
(Diagram not reproduced. Participants: User, ATN, RADIUS Server, Portal Server. Surviving
labels: 3. DM Response packet; 4. Updates new policy succeeds; 5. Informs the user to get
offline.)
3. After receiving the DM-Request packet from the RADIUS server, the ATN processes the
DM message without changing the online state of the user. If the processing is
successful, the ATN returns the DM-ACK packet to the RADIUS server; if the
processing fails, the ATN returns the DM-NAK packet to the RADIUS server.
4. If the processing is successful, the ATN instructs the user to go offline.
15.1.2.3 HWTACACS
Features of HWTACACS
Compared with RADIUS, HWTACACS is more reliable in transmission and encryption and
therefore is more suitable for security control. Table 15-1 shows comparisons between
HWTACACS and RADIUS.
Table 15-1 (columns: HWTACACS, RADIUS)
HWTACACS: Encrypts the main structure of a packet except the standard HWTACACS header.
RADIUS: Encrypts only the password field in the authentication packet.
HWTACACS: Authorizes the commands executed by administrative users.
RADIUS: Does not authorize the commands executed by administrative users.
(Diagram not reproduced. Participants: User, ATN, TACACS Server. Surviving label:
3. author-cmd ACK.)
Overview
Currently, the device manages users in the following modes:
The service attributes configured for a domain have a lower priority than the service attributes
delivered by an AAA server. Therefore, when service attributes are both configured for a
domain and delivered by an AAA server, the ATN adopts the service attributes that are
delivered by the AAA server. The service attributes configured for a domain take effect only
when the AAA server does not support or deliver the service attributes.
Overview of a Domain
The ATN supports a user account in the format of username@domain or domain@username.
Here, @ is a domain name delimiter. The positions of the domain name and the user name can
be exchanged. If the user account that is input when a user accesses the ATN does not contain
a domain name, it indicates that the user belongs to the default domain of the system.
l Default domain
A default domain is fixed in the system. The service attributes of the default domain can
be modified, but the domain itself cannot be deleted.
The ATN has one default domain: default_admin, as shown in Table 15-2.
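The account format rules above can be sketched as a small parsing function. The function name and the `domain_first` parameter are hypothetical; how the ATN decides which of the two orders is in effect is not described here, so the sketch simply takes it as an argument.

```python
from typing import Tuple

DEFAULT_DOMAIN = "default_admin"  # the default domain named in this section


def split_account(account: str, domain_first: bool = False,
                  delimiter: str = "@") -> Tuple[str, str]:
    """Return (user name, domain name) for an account string.

    domain_first=False parses username@domain; domain_first=True parses
    domain@username. An account without a delimiter belongs to the
    default domain.
    """
    if delimiter not in account:
        return account, DEFAULT_DOMAIN
    if domain_first:
        domain, user = account.split(delimiter, 1)
    else:
        user, domain = account.rsplit(delimiter, 1)
    if not user or not domain:
        raise ValueError("user name and domain name must both be non-empty")
    return user, domain
```

For example, `split_account("alice@isp1")` yields the user `alice` in domain `isp1`, while `split_account("alice")` places `alice` in `default_admin`.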
15.1.3 Applications
(Diagram not reproduced. Surviving labels: RADIUS (master) 129.7.66.66, RADIUS (backup)
129.7.66.67, Internet, ATN, user, BTS1, BTS2, BTS3.)
server goes Down, the packets are switched to the backup server for authentication. After the
authentication succeeds, the HWTACACS server delivers corresponding rights to the users.
Figure 15-7 shows the network diagram of HWTACACS authentication, accounting, and
authorization.
(Figure 15-7: diagram not reproduced. Surviving labels: HWTACACS (master), HWTACACS
(backup), Internet, ATN, user, BTS1, BTS2, BTS3.)
15.2 DHCP
Purpose
As the network expands and becomes complex, the number of hosts often exceeds the number
of available IP addresses. As portable computers and wireless networks are widely used, the
positions of computers often change, causing IP addresses of the computers to be changed
accordingly. As a result, network configurations become increasingly complex. To properly
and dynamically assign IP addresses to hosts, DHCP is used.
DHCP is developed based on the Bootstrap Protocol (BOOTP). BOOTP runs on networks
where each host has a fixed network connection. The administrator configures a BOOTP
parameter file for each host, and the file remains unchanged for a long period of time. DHCP
has the following new features compared with BOOTP:
l Dynamically assigns IP addresses and configuration parameters to clients.
l Enables a host to obtain an IP address dynamically, instead of requiring a fixed IP address
to be specified for each host.
DHCP rapidly and dynamically allocates IP addresses, which improves IP address usage.
15.2.2 Principles
This section describes the implementation of DHCP.
DHCP Architecture
Figure 15-8 shows the DHCP architecture.
(Figure 15-8 DHCP architecture: diagram not reproduced; surviving label: IP Network.)
their address configuration. Using a DHCP relay agent eliminates the need for deploying
a DHCP server on each network segment. This feature reduces network deployment
costs and facilitates device management.
In the DHCP architecture, the DHCP relay agent is optional. A DHCP relay agent is
required only when the server and client are located on different network segments.
l DHCP Server
A DHCP server processes requests for address allocation, lease extension, and address
release from a DHCP client or a DHCP relay agent, and allocates IP addresses and other
network configuration parameters to the DHCP client.
(Figure 15-9: layout only partially recovered; trailing fields: sname (64), file (128),
options (variable).)
In Figure 15-9, numbers in the round brackets indicate the field length, expressed in bytes.
(Table columns: Field, Length, Description)
Field: op (op code)
Length: 1 byte
Description: Indicates the message type. The options are as follows:
l 1: DHCP Request message
l 2: DHCP Reply message
Field: htype (hardware type)
Length: 1 byte
Description: Indicates the hardware address type. For Ethernet, the value of this field is 1.
Field: hlen (hardware length)
Length: 1 byte
Description: Indicates the length of a hardware address, expressed in bytes. For Ethernet,
the value of this field is 6.
Field: hops
Length: 1 byte
Description: Indicates the number of DHCP relay agents that a DHCP Request message
passes through. This field is set to 0 by a DHCP client or a DHCP server. The value increases
by 1 each time a DHCP Request message passes through a DHCP relay agent. This field
limits the number of DHCP relay agents that a DHCP message can pass through.
Field: secs (seconds)
Length: 2 bytes
Description: Indicates the time elapsed since the client obtained or renewed an IP address,
in seconds.
Field: flags
Length: 2 bytes
Description: Indicates the Flags field. Only the leftmost bit of the Flags field is valid and
other bits are set to 0. The leftmost bit determines whether the DHCP server unicasts or
broadcasts a DHCP Reply message. The options are as follows:
l 0: The DHCP server unicasts a DHCP Reply message.
l 1: The DHCP server broadcasts a DHCP Reply message.
Field: yiaddr (your client ip address)
Length: 4 bytes
Description: Indicates the DHCP client IP address assigned by the DHCP server. The DHCP
server fills this field into a DHCP Reply message.
Field: siaddr (server ip address)
Length: 4 bytes
Description: Indicates the IP address of the server from which a DHCP client obtains the
startup configuration file.
Field: giaddr (gateway ip address)
Length: 4 bytes
Description: Indicates the IP address of the first DHCP relay agent. If the DHCP server and
client are located on different network segments, the first DHCP relay agent fills its IP
address into this field of the DHCP Request message sent by the client and forwards the
message to the DHCP server. The DHCP server determines the network segment where the
client resides based on this field, and assigns an IP address on this network segment from an
address pool.
The DHCP server also returns a DHCP Reply message to the first DHCP relay agent. The
DHCP relay agent then forwards the DHCP Reply message to the client.
Field: chaddr (client hardware address)
Length: 16 bytes
Description: Indicates the client MAC address. This field must be consistent with the
hardware type and hardware length fields. When sending a DHCP Request message, the
client fills its hardware address into this field. For Ethernet, a 6-byte Ethernet MAC address
must be filled in this field when the hardware type and hardware length fields are set to 1
and 6 respectively.
Field: sname (server host name)
Length: 64 bytes
Description: Indicates the name of the server from which a client obtains configuration
parameters. This field is optional and is filled in by the DHCP server. The field must be
filled in with a character string that ends with 0.
Field: file (file name)
Length: 128 bytes
Description: Indicates the Bootfile name specified by the DHCP server for a DHCP client.
This field is filled in by the DHCP server and is delivered to the client when the IP address is
assigned to the client. This field is optional. The field must be filled in with a character
string that ends with 0.
Field: options
Length: Variable
Description: Indicates the DHCP Options field, which has a maximum of 312 bytes. This
field contains the DHCP message type and configuration parameters assigned by a server to
a client, including the gateway IP address, DNS server IP address, and IP address lease.
(Table columns: Message Name, Description)
Message Name: DHCP OFFER
Description: A DHCP Offer message is sent by a DHCP server to respond to a DHCP
Discover message. A DHCP Offer message carries various configuration information.
Message Name: DHCP ACK
Description: A DHCP ACK message is sent by a DHCP server to acknowledge the DHCP
Request message from a DHCP client. After receiving a DHCP ACK message, the DHCP
client obtains the configuration parameters including the IP address.
Message Name: DHCP NAK
Description: A DHCP NAK message is sent by a DHCP server to reject the DHCP Request
message from a DHCP client. For example, if a DHCP server receives a DHCP Request
message but cannot find a matching lease record, it sends a DHCP NAK message, notifying
the DHCP client that no IP address is available.
Message Name: DHCP DECLINE
Description: A DHCP Decline message is sent by a DHCP client to notify the DHCP server
that the assigned IP address conflicts with another IP address. Then the DHCP client applies
to the DHCP server for another IP address.
As shown in Figure 15-10, the Code field in Option 82 is 82; the Length field indicates the
total number of bytes in the Agent Information field; each iN field is a sub-option of the
Agent Information field, and each sub-option is a SubOpt/Length/Value tuple.
The initially assigned device sub-options are as follows:
1: agent circuit ID sub-option
A DHCP server uses the agent circuit ID sub-option for IP address and other parameter
assignment policies.
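(Figure 15-10: layout not reproduced. Fields in order: Code (82), Length (N), Agent
Information field i1, i2, i3, i4, i5, …, iN.)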
The ATN device uses the Option 82 field to define the address assignment policies or other
policies for the DHCP server to perform.
Figure 15-11 Appending an Option 82 field to the DHCP message on an ATN Device
(Diagram not reproduced. Participants: Client1, Client2, Client3, ATN (DHCP Relay),
Internet, DHCP Server. Message labels, in order: Discover, Discover+Option82,
Offer+Option82, Offer, Request, Request+Option82, Ack+Option82, Ack, Data Exchange.)
Option 82 Implementation
After Option 82 is enabled, an interface checks whether the DHCPREQUEST message sent
from a client, or the message about to be sent to a client, contains an Option 82 field.
When a DHCPREPLY message is forwarded, the device first checks whether the message
contains Sub-option 1 and whether the sub-option contains the Huawei Device Identifier field.
If so, the device can parse the Option 82 field, and it removes the Huawei Device Identifier
field from Sub-option 1 before forwarding the message.
(Diagram not reproduced. Message labels: 1. DHCP Discover; 2. DHCP Offer; 3. DHCP
Request; 4. DHCP ACK.)
1. The DHCP client sends a DHCPDISCOVER packet to the DHCP server and enters the
selecting state. Then, the DHCP client creates a timer to wait for DHCPOFFER packets
from the DHCP server.
– If the DHCP client receives a non-DHCPOFFER packet, it discards the packet.
– If the DHCP client receives no DHCPOFFER packet before the timer expires, the
DHCP client is initialized and sends another request for an IP address.
2. After receiving a DHCPOFFER packet, the DHCP client deletes the timer and sends a
DHCP request. Then, the DHCP client creates a timer to wait for a DHCPACK packet.
– If the DHCP client receives a packet that is not a DHCPACK or DHCPNAK packet,
it discards the packet.
– If the DHCP client receives a DHCPNAK packet, it sends another request for an
address.
– If the DHCP client has not received a DHCPACK or DHCPNAK packet before the
timer expires, it resends the request packet four times, at intervals of 4s, 8s, 16s, and
32s. If the DHCP client still does not receive any response within 60s after the
request packets have been sent four times, it resends DHCPDISCOVER packets at
intervals of 4s, 8s, 16s, 32s, and 64s in sequence, and every 64s thereafter, to
restart address allocation and apply for an IP address again until a DHCPOFFER
packet is received.
3. After being allocated an IP address, the DHCP client sends a gratuitous ARP packet to
check whether the allocated address is already in use. If the address is in use, the DHCP
client sends a DHCPDECLINE packet to the DHCP server and returns to the initial state.
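The retransmission intervals in step 2 follow a doubling pattern that is capped at 64s for DHCPDISCOVER packets. The sketch below generates both schedules; the function names and the `base` and `cap` parameters are hypothetical, and no actual timers or packets are involved.

```python
from itertools import islice
from typing import Iterator, List


def request_retry_intervals(base: int = 4, retries: int = 4) -> List[int]:
    """Intervals, in seconds, between the four DHCP request retransmissions."""
    return [base * 2 ** n for n in range(retries)]


def discover_retry_intervals(base: int = 4, cap: int = 64) -> Iterator[int]:
    """Endless DHCPDISCOVER schedule: doubles from base, then stays at cap."""
    interval = base
    while True:
        yield interval
        interval = min(interval * 2, cap)


request_schedule = request_retry_intervals()
first_discover_intervals = list(islice(discover_retry_intervals(), 7))
```

The generator never terminates on its own, which mirrors the client retrying until a DHCPOFFER packet is received; callers take as many intervals as they need with `islice`.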
The DHCP relay function enables message exchanges between a DHCP server and a client on
different network segments. When the DHCP client and server are on different network
segments, the DHCP relay agent transparently transmits DHCP messages to the destination
DHCP server. In this way, DHCP clients on different network segments can communicate
with one DHCP server.
Figure 15-13 shows how a DHCP client uses the DHCP relay agent to apply for an IP address
for the first time.
Figure 15-13 shows the working process of a DHCP relay agent. The DHCP client sends a
Request message to the DHCP server. When receiving the message, the DHCP relay agent
processes and unicasts the message to the specified DHCP server on the other network
segment. The DHCP server sends requested configurations to the client through the DHCP
relay agent based on information in the Request message.
1. After receiving a DHCP Discover message or a Request message, the DHCP relay agent
performs the following operations:
– Discards the DHCP Request message if its number of hops is larger than the hop
limit, to prevent loops. Otherwise, increases the value of the hops field by 1,
indicating that the message has passed through a DHCP relay agent.
– Checks the giaddr field. If the value is 0, sets the giaddr field to the IP address of
the interface that received the Request message, selecting one IP address if the
interface has multiple IP addresses. All Request messages received by the interface
later use this IP address to fill the giaddr field. If the value is not 0, leaves the value
unchanged.
– Sets the TTL in the request packets to the default value on the DHCP relay device,
not the value calculated by decreasing the original TTL by 1. You can adjust the
hop limit applied to the hops field to prevent loops and limit hops.
– Changes the destination IP address of the DHCP Request message to the IP address
of the DHCP server or the IP address of the next DHCP relay agent. In this way, the
DHCP Request message can be forwarded to the DHCP server or the next DHCP
relay agent.
2. The DHCP server assigns IP addresses to the client based on the Relay Agent IP Address
field and sends the DHCP Reply message to the DHCP relay agent specified in the Relay
Agent IP Address field. After receiving the DHCP Reply message, the DHCP relay agent
performs the following operations:
– The DHCP relay agent assumes that all the Reply messages are sent to the directly-
connected DHCP clients. The Relay Agent IP Address field identifies the interface
directly connected to the client. If the value of the Relay Agent IP Address field is
not the IP address of a local interface, the DHCP relay agent discards the Reply
message.
– The DHCP relay agent checks the broadcast flag bit of the message. If the broadcast
flag bit is 1, the DHCP relay agent broadcasts the DHCP Reply message to the
DHCP client; otherwise, the DHCP relay agent unicasts the DHCP Reply message
to the DHCP client. The destination IP address is the value in the Your (Client) IP
Address field, and the MAC address is the value in the Client Hardware Address
field.
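The relay agent checks described above can be sketched as follows. This is a minimal illustration, not the device implementation: the DhcpMessage type, the function names, and the hop limit of 16 are assumptions made for the example, and only the fields mentioned in the text are modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set, Tuple

HOP_LIMIT = 16  # assumed value for illustration; the real limit is device-specific


@dataclass(frozen=True)
class DhcpMessage:
    hops: int
    giaddr: str              # "0.0.0.0" means no relay agent has set it yet
    broadcast: bool          # broadcast flag bit
    yiaddr: str = "0.0.0.0"  # Your (Client) IP Address
    chaddr: str = ""         # Client Hardware Address


def relay_request(msg: DhcpMessage, ingress_ip: str,
                  hop_limit: int = HOP_LIMIT) -> Optional[DhcpMessage]:
    """Return the message to forward to the server, or None if it is discarded."""
    if msg.hops > hop_limit:
        return None  # too many hops: discard to prevent loops
    # Fill giaddr only if no earlier relay agent has done so.
    giaddr = ingress_ip if msg.giaddr == "0.0.0.0" else msg.giaddr
    return replace(msg, hops=msg.hops + 1, giaddr=giaddr)


def relay_reply(msg: DhcpMessage, local_ips: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (destination IP, destination MAC) for the client, or None if discarded."""
    if msg.giaddr not in local_ips:
        return None  # giaddr is not a local interface address: discard
    if msg.broadcast:
        # Broadcast addresses are an assumption of this sketch.
        return ("255.255.255.255", "ff:ff:ff:ff:ff:ff")
    return (msg.yiaddr, msg.chaddr)
```

In this sketch a request that arrives with giaddr already set keeps that value, so the server still replies to the first relay agent on the path.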
Figure 15-14 shows how a DHCP client extends the IP address lease through the DHCP relay
agent.
Figure 15-14 Extending the IP address lease through the DHCP relay agent
[Figure: message flow between the client, DHCP relay, and server. Step 1: DHCP REQUEST (unicast)]
1. After accessing the network for the first time, the DHCP client only needs to unicast a
DHCP Request message to the DHCP server that assigned its currently-used IP address.
2. The DHCP server then directly unicasts a DHCP ACK message or a DHCP NAK
message to the client.
DHCP Releasing
The DHCP relay agent, instead of the client, can send a Release message to the DHCP server
to release the IP addresses that are assigned to the DHCP clients. To do so, you run a
command on the DHCP relay agent that releases the IP addresses that the DHCP server
assigned to the DHCP client.
Figure 15-15 Interaction with the DHCP server when the DHCP client accesses a network for the first time
[Figure: message flow between the client and server. Step 1: DHCP DISCOVER, DHCP OFFER. Step 2: DHCP REQUEST, DHCP ACK/DHCP NAK]
When the DHCP client accesses a network for the first time, it goes through the
following stages to set up a connection to the DHCP server:
a. Discovery stage: The DHCP client searches for the DHCP server. The DHCP client
broadcasts a DHCP Discover packet, and only the DHCP server replies to the
packet.
b. Offer stage: The DHCP server offers an IP address to the DHCP client. After
receiving the DHCP Discover packet from the DHCP client, the DHCP server
selects an unassigned IP address from the IP address pool and then sends to the
DHCP client a DHCP Offer packet that carries information about the leased IP
address and other settings.
c. Request stage: The DHCP client selects an IP address. If multiple DHCP servers
send DHCP Offer packets to the DHCP client, the DHCP client accepts only the
first received DHCP Offer packet and then broadcasts to each DHCP server a
DHCP Request packet that carries information about the selected IP address.
d. Acknowledgement stage: The DHCP server acknowledges the IP address that is
offered. After receiving the DHCP Request packet from the DHCP client, the
DHCP server sends the DHCP client a DHCP ACK packet that carries the offered
IP address and other settings. After receiving the DHCP ACK packet, the DHCP
client broadcasts a gratuitous ARP packet to check whether any host is using the IP
address assigned by the DHCP server. If the DHCP client does not receive a
response within a specified period, it uses the IP address. If the DHCP client
receives a response within a specified period, it sends a DHCP Decline packet to the
DHCP server to notify the DHCP server that the IP address is unavailable. The
DHCP client then re-applies for an IP address.
The unassigned IP addresses offered by other DHCP servers (except the DHCP server
selected by the DHCP client) are available for other DHCP clients.
After sending a ping packet, the DHCP server checks whether a response to the ping packet
can be received within a specified period. If the number of ping packets reaches the upper
limit and there is still no response, the DHCP server considers that the IP address is not
used by any device on the network segment of the IP address. This ensures that the IP
address assigned to the client is unique (implemented according to RFC 2132).
The DHCP server sends two ping packets by default, and the default timeout period for each
ping response is 500 ms.
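The conflict check above can be sketched as a small function. The defaults of two ping packets and a 500 ms timeout come from the text; the function name and the injected ping callable are assumptions of this example, so that no real network access is needed.

```python
from typing import Callable


def address_is_free(ip: str,
                    ping: Callable[[str, float], bool],
                    attempts: int = 2,
                    timeout_s: float = 0.5) -> bool:
    """Return True if no ping attempt is answered within the timeout.

    `ping(ip, timeout_s)` is a caller-supplied function (hypothetical here)
    that returns True when a response arrives in time.
    """
    for _ in range(attempts):
        if ping(ip, timeout_s):
            return False  # some host already uses the address
    return True  # the attempt limit was reached with no response
```

A server would offer the address only when this check returns True and would otherwise select another address from the pool.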
When the lease renewal timer expires, the DHCP client must renew its IP address. The DHCP
client automatically sends a DHCP Request packet to the DHCP server that assigned the IP
address and enters the renewal state. If the IP address is valid, the DHCP server replies with a
DHCP ACK packet to grant the DHCP client a new lease. The DHCP client then re-enters
the binding state. If the DHCP client receives a DHCP NAK packet from the DHCP server, it
enters the initializing state.
After sending a DHCP Request packet for extending the lease, the DHCP client remains in the
renewal state and waits for a response. If the DHCP client does not receive any response from
the DHCP server until the rebinding timer expires, it considers the original DHCP server
unavailable and starts to broadcast a DHCP Request packet.
Any DHCP server on the network can reply to the DHCP Request packet with a DHCP ACK
or DHCP NAK packet.
If the DHCP client receives a DHCP ACK packet, it re-enters the binding state and resets the
lease renewal and rebinding timers. If the DHCP client receives only DHCP NAK packets, it
stops using the IP address immediately and returns to the initializing state to apply for a new
IP address.
If the DHCP client does not receive any responses before the lease expiration timer expires, it
stops using the IP address immediately and returns to the initializing state. The DHCP client
then sends a DHCP Discover packet to apply for a new IP address (implemented according to
RFC 2131).
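The lease renewal behavior described above can be summarized as a small state machine. This is a sketch under stated assumptions: the enum and event names are chosen for the example, the state reached after the rebinding timer expires is labeled REBINDING after the RFC 2131 state diagram, and timer values are left out.

```python
from enum import Enum


class State(Enum):
    INIT = "initializing"
    BOUND = "binding"
    RENEWING = "renewal"
    REBINDING = "rebinding"


class Event(Enum):
    RENEW_TIMER = 1    # lease renewal timer expires
    REBIND_TIMER = 2   # rebinding timer expires
    LEASE_EXPIRED = 3  # lease expiration timer expires
    ACK = 4            # DHCP ACK received
    NAK = 5            # DHCP NAK received


TRANSITIONS = {
    (State.BOUND, Event.RENEW_TIMER): State.RENEWING,      # unicast DHCP Request
    (State.RENEWING, Event.ACK): State.BOUND,              # new lease granted
    (State.RENEWING, Event.NAK): State.INIT,
    (State.RENEWING, Event.REBIND_TIMER): State.REBINDING, # broadcast DHCP Request
    (State.REBINDING, Event.ACK): State.BOUND,             # timers are reset
    (State.REBINDING, Event.NAK): State.INIT,
    (State.REBINDING, Event.LEASE_EXPIRED): State.INIT,    # send DHCP Discover again
}


def next_state(state: State, event: Event) -> State:
    """Return the next client state; pairs not listed leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```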
15.2.3 Applications
This section describes the applicable scenario of DHCP.
Figure 15-16 shows a networking diagram in which the UPE functioning as a DHCP client
initiates a request for obtaining a management IP address.
DHCP Server
As shown in Figure 15-17, a DHCP server and multiple DHCP clients (such as PCs and
portable computers) are deployed.
DHCP Clients
Generally, the DHCP server is used to assign IP addresses in the following scenarios:
l On a large network, manual configurations take a long time and bring difficulties to
centralized management over the entire network.
l The network has more hosts than available IP addresses, so not every host can have a
fixed IP address. Many hosts need to dynamically obtain IP addresses through the DHCP
server. In addition, network administrators want to limit the number of users that are
online at the same time.
l Only a few hosts on the network require fixed IP addresses.
Internet
DHCP Clients
The earlier DHCP protocol applies only to the scenario in which the DHCP client and DHCP
server are on the same network segment. To dynamically assign IP addresses to hosts on
multiple network segments, the network administrator needs to configure a DHCP server on
each network segment, which increases costs.
The DHCP relay function is introduced to solve this problem. A DHCP client can apply to the
DHCP server on another network segment to obtain a valid IP address. In this manner, DHCP
clients on multiple network segments can share one DHCP server. This reduces costs and
facilitates centralized management.
Terms
None
SNP Snooping
15.3 DHCPv6
Only the ATN 910/ATN 910I/ATN 910B/ATN 905/ATN 950B (with the AND2CXPB/AND2CXPE
configured) support the DHCPv6 Relay. Only the ATN 910I/ATN 910B/ATN 905 support the
DHCPv6 Client.
15.3.1 Introduction
Definition
Dynamic Host Configuration Protocol for IPv6 (DHCPv6) is designed to assign IPv6
addresses, prefixes, and other network configuration parameters to hosts.
Purpose
The IPv6 protocol provides a huge address space formed by 128-bit IPv6 addresses that require
proper and efficient assignment and management policies. IPv6 stateless address
autoconfiguration defined in RFC 2462 is widely used. Hosts configured with the stateless
address autoconfiguration function automatically configure IPv6 addresses based on prefixes
carried in Router Advertisement (RA) packets sent from a neighboring router.
When stateless address autoconfiguration is used, routers do not record IPv6 addresses of
hosts. Therefore, stateless address autoconfiguration has poor manageability. In addition,
hosts configured with stateless address autoconfiguration cannot obtain other configuration
parameters such as the DNS server address. ISPs do not provide instructions for automatic
allocation of IPv6 prefixes for routers. Therefore, users need to manually configure IPv6
addresses for routing and switching devices during IPv6 network deployment.
DHCPv6 solves this problem. DHCPv6 is a stateful protocol for configuring IPv6 addresses
automatically. During stateful address configuration, a DHCPv6 server assigns a complete
IPv6 address to a host and provides other configuration parameters, such as the DNS server
address. A DHCPv6 relay agent may be used to relay DHCPv6 packets. The DHCPv6 server
binds the IPv6 address to a client. This improves network manageability.
Compared with manual address configuration and IPv6 stateless address autoconfiguration
that uses network prefixes in RA packets, DHCPv6 has the following advantages:
l Controls IPv6 address assignment better. A DHCPv6 device can record addresses
assigned to hosts and assign requested addresses. This function facilitates network
management.
l Assigns IPv6 address prefixes to network devices. This function facilitates automatic
configuration and hierarchical network management.
l Provides other network configuration parameters such as the DNS server address.
15.3.2 Principles
– After receiving the Response message, the client uses the IPv6 address/prefix and
other configuration information in the Response message. If the client receives only
a DHCPv6 Advertise message within the specified period, the client undergoes four
stages to obtain the configuration information according to the configured policy.
l Exchanging IPv6 addresses/prefixes and other configuration information, involving four
stages:
When a DHCPv6 client accesses the network for the first time, similar to a DHCPv4
client, the DHCPv6 client undergoes four stages to obtain an IPv6 address/prefix and
other configuration information:
– Discovering stage: indicates the stage at which the DHCPv6 client searches for a
DHCPv6 server. The client multicasts a DHCPv6 Solicit message.
– Offering stage: indicates the stage at which the DHCPv6 server offers an IPv6
address/prefix to the DHCPv6 client. After receiving the DHCPv6 Solicit message
from the client, the DHCPv6 server selects an unassigned IPv6 address/prefix from
the IPv6 address/prefix pool, and then sends a DHCPv6 Advertise message
containing the leased IPv6 address/prefix and other configuration information to the
client.
– Selecting stage: indicates the stage at which the DHCPv6 client selects an IPv6
address/prefix. If multiple DHCPv6 servers send DHCPv6 Advertise messages to
the client, the client selects a server according to the configured policy. If the
Advertise message contains the Server Unicast option, and the client also supports
this option, the client unicasts a DHCPv6 Request message to the selected DHCPv6
server. Otherwise, the client multicasts a DHCPv6 Request message containing
information used to instruct the selected DHCPv6 server to offer an IPv6 address/
prefix.
– Acknowledging stage: indicates the stage at which the DHCPv6 server
acknowledges the IPv6 address/prefix to be offered. After receiving the DHCPv6
Request message from the client, the DHCPv6 server sends a DHCPv6 Response
message to the client. The DHCPv6 Response message contains the offered IPv6
address/prefix and other configuration information. After receiving the DHCPv6
Response message, the client uses the offered IPv6 address/prefix and other
configuration information.
l The DHCPv6 client extends the IPv6 address/prefix lease.
When a DHCPv6 server assigns an IPv6 address/prefix to a client, the server sends a
message containing the preferred lifetime, valid lifetime, lease renew time, and rebind
time. The relationship between them is as follows: lease renew time < rebind time <
preferred lifetime < valid lifetime.
The preferred lifetime is used to limit the lease renew time and rebind time. By default,
the lease renew time and rebind time account for 50% and 80% respectively of the
preferred lifetime.
The valid lifetime is the lease set for the IPv6 address/prefix assigned to a client. The
server retrieves the IPv6 address/prefix after the valid lifetime expires. If the client
intends to continue to use this IPv6 address/prefix, it needs to extend the IPv6 address/
prefix lease before the valid lifetime ends.
When the lease renew time is reached, the DHCPv6 client automatically sends a DHCPv6
Renew message to the server. If the client and server support unicast, the client
unicasts a DHCPv6 Renew message. Otherwise, the client multicasts a DHCPv6 Renew
message.
After the DHCPv6 server receives the DHCPv6 Renew message, if the contained IPv6
address/prefix is valid and the lease can be renewed, the server replies with a DHCPv6
Response message containing the new lease of the IPv6 address/prefix. After receiving
the DHCPv6 Response message from the server, the client renews the lease of its IPv6
address/prefix.
When the rebind time expires, if the DHCPv6 client does not finish renewing the lease of
its IPv6 address/prefix, it multicasts a DHCPv6 Rebind message to all available servers.
After the DHCPv6 server receives the DHCPv6 Rebind message, if the contained IPv6
address/prefix is valid and the lease can be renewed, the server replies with a DHCPv6
Response message containing the new lease of the IPv6 address/prefix. After receiving
the DHCPv6 Response message from the server, the client renews the lease of its IPv6
address/prefix.
l When the link to which the DHCPv6 client is connected changes, the client needs to
check whether its IPv6 address/prefix is still available.
When the link to which the DHCPv6 client is connected changes, for example, the
network cable is loosely connected, the client needs to send a DHCPv6 Confirm message
to the server to check whether its IPv6 address/prefix is still available.
If the IPv6 address needs to be validated, the client multicasts a DHCPv6 Confirm
message containing the IPv6 address to be validated.
After the DHCPv6 server receives the DHCPv6 Confirm message, if the IPv6 address/
prefix assigned to the client is still available, the server replies with a DHCPv6 Response
message in which the status of the IPv6 address is set to Success. After receiving the
DHCPv6 Response message from the server, the client continues to use this IPv6
address.
If the IPv6 prefix needs to be validated, the client multicasts a DHCPv6 Rebind message
containing the IPv6 prefix to be validated. The DHCPv6 server processes the received
Rebind message and then replies with a Response message. After the client receives the
Response message from the server, if the lifetime of the IPv6 prefix is not 0, the client
continues to use this prefix and renews the lease.
l The DHCPv6 client detects a duplicate IPv6 address.
If the DHCPv6 client detects a duplicate IPv6 address, it notifies the server of the
address conflict.
That is, the DHCPv6 client sends a DHCPv6 Decline message containing the duplicate
IPv6 address to the server. The source address of the Decline message cannot be the
duplicate address. If the client and server support unicast, the client unicasts a DHCPv6
Decline message to the server. Otherwise, the client multicasts a DHCPv6 Decline
message to the server.
When receiving the DHCPv6 Decline message, the server marks the IPv6 address
contained in the Decline message as a duplicate address.
l The DHCPv6 client releases an IPv6 address/prefix.
To release its IPv6 address/prefix, the DHCPv6 client sends a DHCPv6 Release message
containing the IPv6 address/prefix to be released to the server. If the client and server
support unicast, the client unicasts a DHCPv6 Release message. Otherwise, the client
multicasts a DHCPv6 Release message.
After receiving the DHCPv6 Release message from the client, the server releases the
IPv6 address/prefix assigned to the client and responds with a Reply message.
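The timer relationship stated in the lease extension description above (lease renew time < rebind time < preferred lifetime < valid lifetime, with defaults of 50% and 80% of the preferred lifetime) can be worked through with a short sketch. The function names are assumptions of this example, and integer seconds are used for simplicity.

```python
from typing import Tuple


def default_timers(preferred_lifetime_s: int) -> Tuple[int, int]:
    """Return (lease renew time, rebind time) as 50% and 80% of the preferred lifetime."""
    renew = preferred_lifetime_s * 50 // 100
    rebind = preferred_lifetime_s * 80 // 100
    return renew, rebind


def timers_are_consistent(renew: int, rebind: int, preferred: int, valid: int) -> bool:
    """Check renew < rebind < preferred lifetime < valid lifetime."""
    return renew < rebind < preferred < valid


# Example: a preferred lifetime of 1000 s gives a renew time of 500 s
# and a rebind time of 800 s.
renew_s, rebind_s = default_timers(1000)
```

With these defaults the client sends a Renew message at 500 s and, if the renewal has not completed, a Rebind message at 800 s, both before the preferred lifetime ends.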
[Figure: DHCPv6 relay message flow. Packets sent by the client are carried in Relay-Forward messages; packets from the server are carried in Relay-Reply messages]
A DHCPv6 relay agent encapsulates all request messages that pass through it into Relay-
forward messages before forwarding them to other relay agents or the server. These request
messages include DHCP request messages originating from clients and Relay-forward
messages originating from other relay agents. The server then encapsulates messages in
response to the clients into Relay-reply messages and sends the Relay-reply messages to the
relay agent.
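The encapsulation described above can be sketched with the Relay-forward layout from RFC 3315: message type 12, a hop count, a link address, a peer address, and a Relay Message option (code 9) that carries the original message. The function name and the sample addresses are assumptions of this example; other options and the hop count limit check are omitted.

```python
import ipaddress
import struct

RELAY_FORW = 12       # DHCPv6 Relay-forward message type (RFC 3315)
OPTION_RELAY_MSG = 9  # Relay Message option code (RFC 3315)


def relay_forward(inner: bytes, link_address: str, peer_address: str) -> bytes:
    """Wrap a client message or another Relay-forward message in a Relay-forward."""
    # A message from a client starts at hop count 0; a message that is already
    # a Relay-forward gets the received hop count plus 1.
    if inner and inner[0] == RELAY_FORW:
        hop_count = inner[1] + 1
    else:
        hop_count = 0
    header = struct.pack("!BB", RELAY_FORW, hop_count)
    header += ipaddress.IPv6Address(link_address).packed
    header += ipaddress.IPv6Address(peer_address).packed
    option = struct.pack("!HH", OPTION_RELAY_MSG, len(inner)) + inner
    return header + option


# A 4-byte Solicit header (type 1 plus a 3-byte transaction ID) relayed twice.
solicit = bytes([1, 0xAA, 0xBB, 0xCC])
first_hop = relay_forward(solicit, "2001:db8::1", "fe80::1")
second_hop = relay_forward(first_hop, "::", "fe80::2")
```

The server reverses the nesting when it builds Relay-reply messages, so each relay agent on the path can unwrap one layer and pass the rest on.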
15.3.3 Applications
15.3.3.1 DHCPv6 Client over PPPoE (Including DHCPv6-PD)
The ATN device that functions as a CPE supports the following functions:
l Routed mode, either numbered or unnumbered
– In numbered routed mode, the ATN device sends DHCPv6 requests that carry the
IA_NA option to apply for IPv6 addresses of WAN interfaces, as shown in Figure
15-20.
– In unnumbered routed mode, the ATN device sends DHCPv6 requests that do not
carry the IA_NA option so that IPv6 addresses of WAN interfaces are not applied
separately, as shown in Figure 15-21.
l Applying for access-side IPv6 address pool using DHCPv6-PD and assigning IPv6
prefixes to the user side using ND
NMS
CPE PE-AGG
IP/MPLS
DHCPv6 Server
SNP Snooping
15.4 Plug-and-Play
Purpose
A great number of devices need to access the network. As a result, the CapEx of project
deployment, especially of on-site commissioning of devices on the mobile bearer network, is
high, and the profit of the carrier is greatly affected. To address this problem, Huawei has
launched a PnP solution for networking schemes.
Benefits
This feature brings the following benefits to carriers:
PnP greatly reduces the time taken for on-site commissioning of devices and spares
commissioning engineers from working in harsh outdoor environments. In this manner,
PnP accelerates project progress and improves project quality.
15.4.2 Principles
The Dynamic Host Configuration Protocol (DHCP) provides a framework for transmitting
configuration information to hosts on a TCP/IP network. DHCP, based on the Bootstrap
Protocol (BOOTP), adds the capability of automatically allocating reusable network addresses
and adds additional configuration options to DHCP packets.
DHCP packets can be classified into eight types. A DHCP server and a DHCP client
communicate with each other by exchanging these DHCP packets.
l DHCPDISCOVER: It is the first packet used to search for a DHCP server when a DHCP
client accesses the network for the first time.
l DHCPOFFER: It is sent by a DHCP server to respond to a DHCPDISCOVER packet. A
DHCPOFFER packet carries configuration information.
l DHCPREQUEST: The DHCP client sends a DHCPREQUEST packet to the DHCP
server in any of the following situations.
– After being initialized, the DHCP client broadcasts a DHCPREQUEST packet to
respond to the DHCPOFFER packet sent from the DHCP server.
– After being restarted, the DHCP client broadcasts a DHCPREQUEST packet to
confirm the correctness of the configurations, such as the previously allocated IP
address.
– After being bound to an IP address, the DHCP client sends a unicast
DHCPREQUEST packet to extend the lease of the IP address.
l DHCPACK: It is sent by a DHCP server to acknowledge the DHCPREQUEST packet
sent from a DHCP client. After receiving a DHCPACK packet, the DHCP client obtains
the configuration information, including the IP address.
l DHCPNAK: It is sent by a DHCP server to refuse the DHCPREQUEST message from a
DHCP client. For example, the IP address that is assigned by the DHCP server to the
DHCP client expires, or the DHCP client moves to another network.
l DHCPDECLINE: It is sent by a DHCP client to notify the DHCP server that the
assigned IP address conflicts with the other IP addresses. Then, the DHCP client applies
to the DHCP server for another IP address.
l DHCPRELEASE: It is sent by a DHCP client to ask the DHCP server to release the
network address and cancel the remaining lease.
l DHCPINFORM: It is sent by a DHCP client to the DHCP server to ask for configuration
parameters after the DHCP client obtains an IP address.
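The eight packet types listed above map to the values of the DHCP Message Type option (Option 53) defined in RFC 2132. The following sketch records that mapping; the enum, the set, and the helper name are assumptions of this example.

```python
from enum import IntEnum


class DhcpMessageType(IntEnum):
    """Values of the DHCP Message Type option (Option 53, RFC 2132)."""
    DHCPDISCOVER = 1
    DHCPOFFER = 2
    DHCPREQUEST = 3
    DHCPDECLINE = 4
    DHCPACK = 5
    DHCPNAK = 6
    DHCPRELEASE = 7
    DHCPINFORM = 8


# Packet types that the text above describes as sent by the DHCP client.
SENT_BY_CLIENT = {
    DhcpMessageType.DHCPDISCOVER,
    DhcpMessageType.DHCPREQUEST,
    DhcpMessageType.DHCPDECLINE,
    DhcpMessageType.DHCPRELEASE,
    DhcpMessageType.DHCPINFORM,
}


def message_type_option(msg_type: DhcpMessageType) -> bytes:
    """Encode Option 53 as code (53), length (1), and value."""
    return bytes([53, 1, int(msg_type)])
```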
[Figure: PnP process between the UPE (DHCP client), the DHCP relay, and the NMS (DHCP server) across the IP/MPLS network, steps 1 to 11]
1. After being powered on, the UPE starts the PnP process automatically. First, the UPE
sends a DHCP Discover broadcast packet that carries the Vendor Class Identifier (VCI)
in the Option 60 field and the client identifier in the Option 61 field.
2. The DHCP relay agent receives the DHCP Discover packet and appends the Option 82
field to the packet. Then, the DHCP relay agent unicasts the packet to the DHCP server,
which functions as the NMS.
3. The DHCP server searches the network element planning forms for a fixed IP address
according to the Option 60, Option 61 and Option 82 fields carried in the packet. The
DHCP server allocates a fixed IP address and responds to the DHCP relay agent with a
DHCP Offer packet.
4. The DHCP relay agent receives the DHCP Offer packet and then sends it to the UPE.
5. The UPE broadcasts a DHCP request.
6. The DHCP relay agent receives the DHCP request and appends the Option 82 field to the
packet. Then, the DHCP relay agent unicasts the packet to the DHCP server, which
functions as the NMS.
7. The DHCP server checks information in the received packet and acknowledges the
address allocation for the UPE. In addition, the DHCP server responds to the DHCP
relay agent with a DHCP ACK packet.
8. The DHCP relay agent receives the DHCP ACK packet and sends it to the UPE.
9. After receiving the DHCP ACK packet, the UPE sends gratuitous ARP packets and
checks whether the allocated address is already in use.
If the UPE finds that the allocated address is not in use, the UPE obtains information
such as the IP address (yiaddr), mask (option 1), and gateway (option 3) from the DHCP
ACK packet and generates a route. Meanwhile, the IP address X.X.X.X mask command
is automatically executed; Telnet, AAA administrative user, and SNMP are configured
on the UPE. Finally, the DHCP client function is disabled on the UPE, and the UPE
cannot send or handle DHCP packets any longer.
10. The NMS delivers configuration files, startup files, and a restart command to the UPE.
11. The UPE can use the PnP feature. That is, the PnP process is complete.
NOTE
After the PnP process is complete, all VTY channels are available. The first user is required to log in to the
device through the DHCP server or DHCP relay and create the login password.
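Step 3 of the process above, in which the DHCP server finds a fixed IP address from the Option 60, Option 61, and Option 82 fields, can be pictured as a table lookup. This is only a sketch: the table layout, the key format, and every value in it are hypothetical and do not reflect the actual format of the network element planning forms.

```python
from typing import Dict, Optional, Tuple

# (Option 60 value, Option 61 value, Option 82 value); all entries are hypothetical.
PlanKey = Tuple[str, str, str]

PLANNING_TABLE: Dict[PlanKey, str] = {
    ("vendor-x", "client-0001", "relay-a/port-1"): "192.0.2.10",
    ("vendor-x", "client-0002", "relay-a/port-2"): "192.0.2.11",
}


def fixed_address(option60: str, option61: str, option82: str,
                  table: Dict[PlanKey, str] = PLANNING_TABLE) -> Optional[str]:
    """Return the planned fixed IP address, or None if the device is not planned."""
    return table.get((option60, option61, option82))
```

Because Option 82 is appended by the DHCP relay agent, the same device plugged into a different relay port would not match its planned entry in this sketch.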
15.4.3 Applications
As shown in Figure 15-24, the UPE obtains a management IP address through DHCP and
starts a management channel through automatic configuration. The NMS delivers
configuration files and startup files through the management channel.
DHCP Server
PNP Plug-and-Play
15.5 DCN
15.5.1 Introduction
Definition
The data communication network (DCN) refers to the network on which network elements
(NEs) exchange Operation, Administration and Maintenance (OAM) information with the
network management system (NMS), and it is constructed for communication between
managed devices.
A DCN can be an external or internal DCN. As shown in Figure 15-25, the external DCN is
the network between the NMS and the access point. The internal DCN is the network on
which network elements transmit OAM information. The DCN mentioned in the following
description is internal DCN.
NMS
External
DCN
GNE GNE
NE NE
Internal Internal
DCN DCN
NE NE
Gateway network elements (GNEs) are the network elements that connect to the NMS using
protocols, such as Transmission Control Protocol (TCP) and Simple Network Management
Protocol (SNMP). GNEs are able to forward data at the network or application layer. The
NMS can use GNEs to manage remote NEs connected through optical fibers.
Purpose
When constructing a large network, hardware engineers must install devices on site, and
software commissioning engineers must configure the devices on site. This network
construction method requires significant human and material resources, causing high capital
expenditure (CAPEX) and operational expenditure (OPEX). If a new NE is deployed but the
NMS cannot detect the NE, the network administrator cannot manage or control the NE. Plug-
and-play is required so that the NMS can automatically detect new NEs and remotely
commission NEs to reduce CAPEX/OPEX.
The DCN technique offers a mechanism to implement plug-and-play. After an NE is installed
and started, an IP address (NEIP address) is automatically generated for the NE based on the
NEID. Each NE adds its NEID and NEIP address to a Type-10 LSA. Then, OSPF advertises
all the Type-10 LSAs to construct a core routing table consisting of mappings between NEIP
addresses and NEIDs on each NE. After detecting a new NE, the GNE reports the NE to the
NMS. The NMS accesses the NE using the IP address of the GNE and ID of the NE. To
commission NEs, the NMS can use the GNE to remotely manage the NEs on the network.
NOTE
To improve the system security, it is recommended that the NEIP address be changed to the planned one.
Benefits
The NMS is able to manage NEs using service channels provided by the managed NEs. No
additional devices are required, reducing CAPEX/OPEX.
15.5.2 Principles
most significant bits 00001001, which is 9 in decimal format. Therefore, the NEIP
address derived from 0x09BFE0 is 128.9.191.224.
Before the NEIP address is manually changed, the NEIP address and NEID are
associated; therefore, the NEIP address changes if the NEID is changed. Once the NEIP
address is manually changed, it no longer changes when the associated NEID is changed.
NOTE
To improve the system security, it is recommended that the NEIP address be changed to the
planned one.
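The worked example above (NEID 0x09BFE0 giving NEIP address 128.9.191.224) can be reproduced with a short sketch. It assumes, based only on that example, that the default NEIP address is formed from a first octet of 128 followed by the three bytes of the 24-bit NEID; the function name is hypothetical.

```python
def neid_to_neip(neid: int) -> str:
    """Derive the default NEIP address from a 24-bit NEID.

    Assumption inferred from the example in the text: the first octet is 128
    and the remaining three octets are the NEID bytes, most significant first.
    """
    if not 0 <= neid <= 0xFFFFFF:
        raise ValueError("NEID must fit in 24 bits")
    return "128.{}.{}.{}".format((neid >> 16) & 0xFF, (neid >> 8) & 0xFF, neid & 0xFF)


# 0x09 -> 9, 0xBF -> 191, 0xE0 -> 224
example_neip = neid_to_neip(0x09BFE0)
```

This mapping applies only while the NEIP address is still associated with the NEID, that is, before the NEIP address is manually changed.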
[Figure: NMS, GNE, and NEs (NE1, NE2). Original packet from the NMS: management information, ID 9-3, IP 10.9.0.1, ETH. Packet processed by the GNE: UDP, IP 10.9.0.3, PPP, ETH]
The devices on a DCN communicate with each other using the Point-to-Point Protocol (PPP)
through single-hop logical channels. Therefore, packets transmitted on the DCN are
encapsulated into PPP frames and forwarded at the data link layer of service ports.
As shown in Figure 15-26, the NMS uses the GNE to manage NEs in the following process:
1. After the DCN function is enabled, a PPP channel and an OSPF neighbor relationship
are established between devices.
2. OSPF protocol packets are sent between OSPF neighbors to learn host routes carrying
NEIP addresses to obtain mappings between NEIP addresses and NEIDs.
3. When the NMS accesses an NE, it addresses a TCP packet to the GNE, using the NEIP
address of the GNE as the destination address and carrying the NEID of the NE.
4. The GNE sends the packet to its application layer, searches for the destination NEIP
address based on the NEID, sets the destination address to the NEIP address of the NE,
encapsulates the TCP packet into a UDP packet, and searches the local routing
table to send the packet to the NE.
5. After receiving the packet, the NE obtains the IP address carried in the packet, verifies
that the IP address is its own NEIP address, and sends the packet to the application layer
for further processing.
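Steps 3 to 5 above can be sketched as a lookup in the mapping between NEIDs and NEIP addresses. The packet types and function names are assumptions of this example, transport details (TCP toward the NMS, UDP toward the NE) are reduced to comments, and the sample addresses are illustrative.

```python
from typing import Dict, NamedTuple


class NmsPacket(NamedTuple):
    dst_ip: str     # NEIP address of the GNE
    neid: int       # NEID of the target NE
    payload: bytes  # management information (arrives over TCP)


class DcnPacket(NamedTuple):
    dst_ip: str     # NEIP address of the target NE
    payload: bytes  # management information (sent on over UDP)


def gne_forward(pkt: NmsPacket, gne_neip: str,
                core_table: Dict[int, str]) -> DcnPacket:
    """Rewrite the destination from the GNE to the NE identified by the NEID."""
    if pkt.dst_ip != gne_neip:
        raise ValueError("packet is not addressed to this GNE")
    # A KeyError here means the NEID has not been learned through OSPF.
    return DcnPacket(dst_ip=core_table[pkt.neid], payload=pkt.payload)


def ne_accepts(pkt: DcnPacket, own_neip: str) -> bool:
    """Step 5: the NE processes the packet only if it carries its own NEIP address."""
    return pkt.dst_ip == own_neip


core_table = {0x090003: "128.9.0.3"}  # NEID 9-3 mapped to its NEIP address
out = gne_forward(NmsPacket("128.9.0.1", 0x090003, b"info"), "128.9.0.1", core_table)
```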
Figure 15-27 Using the sub-interface numbered 4094 for DCN communication
NMS GNE NE
TCP GNE
IP:128.9.0.1
NE1 NE2
ETH
Management
Info
ID:9-3 Original packet from the NMS
UDP
IP:128.9.0.3 Packet processed by the GNE
ETH
2. OSPF protocol packets are sent between OSPF neighbors to learn host routes carrying
NEIP addresses to obtain mappings between NEIP addresses and NEIDs.
3. When the NMS accesses an NE, it addresses a TCP packet to the GNE, using the NEIP
address of the GNE as the destination address and carrying the NEID of the NE.
4. The GNE sends the packet to its application layer, searches for the destination NEIP
address based on the NEID, sets the destination address to the NEIP address of the NE,
encapsulates the TCP packet into a UDP packet, and searches the local routing
table to send the packet to the NE.
5. After receiving the packet, the NE obtains the IP address carried in the packet, verifies
that the IP address is its own NEIP address, and sends the packet to the application layer
for further processing.
Routes in the DCN core routing table are generated in the same process.
Figure 15-28 Gateway DCN application layer in the protocol stack (PTN forwarding mode)
[Figure: protocol stack layers: Application, TCP, IP, VLAN/ETH, Ethernet, ETH, GE/FE; TCP connection]
Figure 15-29 Gateway DCN application layer in the protocol stack (IPRAN forwarding mode)
[Figure: protocol stack layers between the NMS, the management network, and the GNE: Application, TCP, IP, VLAN/ETH, Ethernet, ETH, GE/FE; TCP connection]
When ATNs are connected through service interfaces, intra-domain DCN over service
interfaces is deployed.
The following figure shows intra-domain DCN application layer in the protocol stack.
Figure 15-30 Intra-domain DCN application layer in the protocol stack (PTN forwarding mode)
[Figure: protocol stack layers: Application, IP]
Figure 15-31 Intra-domain DCN application layer in the protocol stack (IPRAN forwarding mode)
[Figure: protocol stack layers: Application, TCP, UDP, IP]
Service and DCN packets are transmitted separately on service interfaces through DCN VRF.
Figure 15-32 Protocol layers used in gateway DCN over the control plane (PTN forwarding mode)
[Figure: protocol stack layers: Application, TCP, IP, ETH/PPP, Ethernet, Service Port; TCP connection]
Figure 15-33 Protocol layers used in gateway DCN over the control plane (IPRAN forwarding mode)
[Figure: protocol stack layers: Application, TCP, IP, ETH/PPP, Ethernet, Service Port; TCP connection]
In gateway DCN over the control plane and gateway DCN over service interfaces, service
interfaces are used for access and function as gateways. In gateway DCN over the control
plane, the VLAN tag does not need to be removed, and DCN packets are transmitted over the
control plane, avoiding address conflicts with the management plane.
[Figure: user accessing the GNE over the DCN network]
Unauthorized users may send frequent authentication attempts to crack account information,
either by simulating Qx or MML AAA login packets or by using brute-force (exhaustive)
attacks to construct login information. To prevent this, DCN supports delayed processing
for users with authentication failures. The detailed implementation process is as follows:
1. When a user logs in to a device for the first time, DCN authenticates the user. If the
authentication is successful, the user can log in to the device. If the authentication fails,
DCN locks the user and sets an 8-second timeout period.
2. If the user resends a login request before the timeout period expires, DCN discards the
request. If the user resends a login request after the timeout period expires, DCN
reauthenticates the user. If the authentication is successful, the user can log in to the
device. If the authentication fails again, DCN relocks the user and sets a 16-second
timeout period.
3. If the user resends a login request before the 16-second timeout period expires, DCN
discards the request. If the user resends a login request after the 16-second timeout
period expires, DCN reauthenticates the user. If the authentication is successful, the user
can log in to the device. If the authentication fails again, DCN relocks the user and sets a
32-second timeout period.
4. If the user fails to log in to the device three consecutive times, DCN disconnects the
user's TCP connection after the 32-second timeout period expires.
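The delayed processing steps above can be sketched as a small guard object. The 8-second, 16-second, and 32-second timeout periods come from the text; the class name, the returned strings, and the use of a caller-supplied timestamp are assumptions of this example.

```python
LOCK_SECONDS = (8, 16, 32)  # timeout after the 1st, 2nd, and 3rd failure


class LoginGuard:
    """Tracks consecutive authentication failures for one user connection."""

    def __init__(self) -> None:
        self.failures = 0
        self.locked_until = 0.0

    def attempt(self, now: float, credentials_ok: bool) -> str:
        if now < self.locked_until:
            return "discarded"      # request arrives before the timeout expires
        if self.failures >= len(LOCK_SECONDS):
            return "disconnected"   # three consecutive failures, timeout over
        if credentials_ok:
            self.failures = 0
            return "logged_in"
        # Authentication failed: lock the user for the next timeout period.
        self.locked_until = now + LOCK_SECONDS[self.failures]
        self.failures += 1
        return "locked"
```

In this sketch the disconnection is reported on the first request after the 32-second period, whereas the text describes DCN closing the TCP connection when that period expires; a timer-driven implementation would not wait for another request.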
15.5.3 Applications
15.5.3.1 Typical DCN Application
During network deployment, every network element (NE) must be configured with software
and commissioned after hardware installation to ensure that all NEs can communicate with
each other. As a large number of NEs are deployed, on-site deployment for each NE requires
significant manpower and is time-consuming. In order to reduce the on-site deployment times
and the cost of operation and maintenance, the DCN can be deployed.
[Figure: external DCN (TCP); active GNE and standby GNE; UDP]
As shown in Figure 15-35, active and standby GNEs can be deployed to improve reliability.
If the active GNE fails, the NMS can gracefully switch this function to the standby GNE.
[Figure: the NMS and GNE on a HUAWEI network connect across a third-party network through VLL-1 to VLL-N to ATN-1 through ATN-N on HUAWEI networks]
1. A DCN VLAN group is configured on the GNE, and the VLAN ID of the Dot1q
termination subinterface is the same as the DCN VLAN ID of the main interface.
2. The GNE sends DCN negotiation packets to VLANs in the DCN VLAN group.
3. The DCN negotiation packets are sent to different leaf nodes through VLLs.
4. NEs learn the DCN VLAN ID sent by the GNE and establish DCN connections with the
GNE.
Terms
Term Description
Core routing table A core routing table consists of mappings between NEID
and NEIP addresses of NEs on a data communication
network (DCN). Before accessing a non-GNE through a
GNE, the NMS must search the core routing table for the
NEIP address of the non-GNE based on the destination
NEID.
Abbreviations
Abbreviation Full Name