ATN 905&910&910I&910B&950B V200R006C20SPC600 Feature Description 01 (CLI) PDF

ATN 905&910&910I&910B&950B Multi-Service
Access Equipment
V200R006C20SPC600
Feature Description
Issue 01
Date 2018-01-30
HUAWEI TECHNOLOGIES CO., LTD.

Copyright © Huawei Technologies Co., Ltd. 2018. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior written
consent of Huawei Technologies Co., Ltd.
Trademarks and Permissions
HUAWEI and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the
customer. All or part of the products, services and features described in this document may not be within the
purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information,
and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Huawei Technologies Co., Ltd.

Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China
Website: http://www.huawei.com
Email: support@huawei.com
Issue 01 (2018-01-30) Huawei Proprietary and Confidential i

About This Document
Purpose
This document describes the feature in terms of its overview, principles, and applications.
Together with other types of documents, it helps intended readers gain a thorough
understanding of the feature. For information on how the ATN equipment supports this
feature, see the Product Description.
Related Version
The following table lists the product version related to this document.
Product Name    Version
ATN 905         V200R006C20SPC600
ATN 910         V200R006C20SPC600
ATN 910I        V200R006C20SPC600
ATN 910B        V200R006C20SPC600
ATN 950B        V200R006C20SPC600
Intended Audience
This document is intended for:
• Network Planning Engineer
• Commissioning Engineer
• Data Configuration Engineer
• System Maintenance Engineer
Security Declaration
• Encryption algorithm declaration
The encryption algorithms DES/3DES/RSA (RSA-1024 or lower)/MD5 (in digital
signature scenarios and password encryption)/SHA1 (in digital signature scenarios)
provide low security, which may pose security risks. If the protocols allow, using more
secure encryption algorithms, such as AES/RSA (RSA-2048 or higher)/SHA2/HMAC-SHA2,
is recommended.
• Password configuration declaration
– Do not set both the start and end characters of a password to "%^%#". Doing so causes
the password to be displayed directly in the configuration file.
– To further improve device security, change the password periodically.
• Personal data declaration
Your purchased products, services, or features may use some personal data of users during
service operation or fault locating. You must define user privacy policies in compliance
with local laws and take proper measures to fully protect personal data.
• Feature declaration
– The mirroring feature may be used to analyze the communication information of
terminal customers for maintenance purposes. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
– The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that information is securely protected.
• Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device- and solution-level protection. Device-level protection includes the planning
principles of dual-network and inter-board dual-link to avoid single points of failure and
single-link failures. Solution-level protection refers to fast convergence mechanisms, such
as FRR and VRRP.
Special Declaration
• This document serves only as a guide. The content is written based on device
information gathered under lab conditions. It is intended as general guidance and
does not cover all scenarios. The content may differ from the information on user
device interfaces due to factors such as version upgrades and differences in device
models, board restrictions, and configuration files. The actual user device information
takes precedence over the content provided by this document. The preceding differences
are beyond the scope of this document.
• The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The maximum values actually obtained may differ from those provided
in this document due to factors such as differences in hardware configurations and
carried services.
• Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
• The pictures of hardware in this document are for reference only.
Symbol Conventions
Symbol      Description
DANGER      Indicates an imminently hazardous situation which, if not avoided, will result in death or serious injury.
WARNING     Indicates a potentially hazardous situation which, if not avoided, could result in death or serious injury.
CAUTION     Indicates a potentially hazardous situation which, if not avoided, may result in minor or moderate injury.
NOTICE      Indicates a potentially hazardous situation which, if not avoided, could result in equipment damage, data loss, performance deterioration, or unanticipated results. NOTICE is used to address practices not related to personal injury.
NOTE        Calls attention to important information, best practices, and tips. NOTE is used to address information not related to personal injury, equipment damage, and environment deterioration.
Command Conventions
Convention          Description
Boldface            The keywords of a command line are in boldface.
Italic              Command arguments are in italics.
[ ]                 Items (keywords or arguments) in brackets [ ] are optional.
{ x | y | ... }     Optional items are grouped in braces and separated by vertical bars. One item is selected.
[ x | y | ... ]     Optional items are grouped in brackets and separated by vertical bars. One item is selected or no item is selected.
{ x | y | ... }*    Optional items are grouped in braces and separated by vertical bars. A minimum of one item or a maximum of all items can be selected.
[ x | y | ... ]*    Optional items are grouped in brackets and separated by vertical bars. Several items or no item can be selected.
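The selection rules in the table above can be modeled with a short Python sketch. It is illustrative only: the function name selection_is_valid and the group labels "{}", "[]", "{}*", and "[]*" are hypothetical and not part of any device CLI or Huawei tool. The sketch also assumes that "several items" for [ x | y | ... ]* means any number from zero up to all listed items.

```python
# Hypothetical helper for illustration; it only models the selection
# rules stated in the Command Conventions table.

def selection_is_valid(group, options, chosen):
    """Return True if `chosen` is a legal selection from `options`.

    `group` is one of "{}", "[]", "{}*", "[]*", mirroring the table rows.
    """
    chosen = list(chosen)
    # Each chosen item must be a listed option and may appear only once.
    if len(set(chosen)) != len(chosen):
        return False
    if any(item not in options for item in chosen):
        return False

    count = len(chosen)
    if group == "{}":    # exactly one item is selected
        return count == 1
    if group == "[]":    # one item or no item is selected
        return count <= 1
    if group == "{}*":   # at least one item, at most all items
        return 1 <= count <= len(options)
    if group == "[]*":   # assumption: zero up to all items
        return count <= len(options)
    raise ValueError(f"unknown group notation: {group!r}")


if __name__ == "__main__":
    options = ["x", "y", "z"]
    # Braces require a choice; brackets allow the group to be omitted.
    print(selection_is_valid("{}", options, ["x"]))
    print(selection_is_valid("{}", options, []))
    print(selection_is_valid("[]", options, []))
    print(selection_is_valid("{}*", options, ["x", "z"]))
```

The practical difference is that a braces group always requires at least one choice, whereas a brackets group may be left out entirely.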
GUI Conventions
Convention    Description
Boldface      Buttons, menus, parameters, tabs, window, and dialog titles are in boldface. For example, click OK.
>             Multi-level menus are in boldface and separated by the ">" signs. For example, choose File > Create > Folder.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
Changes in Issue 01 (2018-01-30)

This document is the first release of the V200R006C20SPC600 version.
Contents
About This Document
1 Basic Configurations
1.1 VRP Overview
1.1.1 Overview
1.1.1.1 VRP Introduction
1.1.2 VRP Architecture
1.1.2.1 NOS Model
1.1.2.2 System Plane
1.1.2.3 SCP
1.1.2.4 DFP
1.1.2.5 GCP
1.1.2.6 SMP
1.1.2.7 SSP
1.1.3 VRP System Features
1.1.3.1 Componentized Structure
1.1.3.2 License
1.1.3.3 HA
1.2 Basic Configuration
1.2.1 Introduction to Basic Configuration
1.2.2 Principles
1.2.2.1 FTP
1.2.2.2 TFTP
1.2.2.3 Introduction to Telnet
1.2.2.4 SSH
1.2.2.5 User Management
1.2.2.6 Virtual File System
1.2.2.7 Pipe Character
1.2.2.8 Daylight Saving Time
1.2.2.9 Timing Restart
1.2.3 Applications
1.2.3.1 Applications of FTP
1.2.3.2 Applications of TFTP
1.2.3.3 Applications of Telnet
1.2.3.4 Applications of SSH
1.2.4 Terms, Acronyms, and Abbreviations
2 System Management
2.1 Information Center
2.1.1 Introduction
2.1.2 Principles
2.1.2.1 Information Classification
2.1.2.2 Information Hierarchy
2.1.2.3 Information Output
2.1.2.4 Information Shield
2.1.2.5 Suppression of the Log Processing Rate
2.1.2.6 Diagnostic Logs in Binary Format
2.2 SNMP
2.2.1 Terms and Abbreviations
2.2.2 Introduction
2.2.3 Principle
2.2.3.1 SNMP Management Model and Related Concepts
2.2.3.2 SNMPv1
2.2.3.3 SNMPv2c
2.2.3.4 SNMPv3
2.2.3.5 Comparisons of SNMPv1, SNMPv2c, and SNMPv3
2.2.3.6 SNMP Attack Defense Mechanism
2.2.4 Applications
2.2.4.1 SNMP for Configuration Management
2.2.4.2 SNMP for VPN User Management
2.3 RMON and RMON2
2.3.1 Introduction
2.3.2 Principles
2.3.2.1 RMON and RMON2 Infrastructure
2.3.2.2 Features of RMON and RMON2
2.3.2.3 Remote Monitoring of RMON and RMON2
2.3.2.4 Table Management in RMON and RMON2
2.3.2.5 Implementation of RMON and RMON2 on Huawei Devices
2.3.3 Terms and Abbreviations
2.4 IP FPM
2.4.1 Introduction
2.4.2 Principles
2.4.2.1 Basic Concepts
2.4.2.2 Basic Functions
2.4.3 Applications
2.4.3.1 End-to-End Performance Measurement Scenarios
2.4.3.2 Hop-by-Hop Performance Measurement Scenarios
2.5 NQA
2.5.1 Introduction to NQA
2.5.2 Principles
2.5.2.1 UDP Jitter Test
2.5.2.2 UDP Jitter Test (hardware-based)
2.5.2.3 ICMP Jitter Test
2.5.2.4 ICMP Jitter Test (hardware-based)
2.5.2.5 Path Jitter Test
2.5.2.6 FTP Test
2.5.2.7 SNMP Test
2.5.2.8 TCP Test
2.5.2.9 UDP Test
2.5.2.10 ICMP Test
2.5.2.11 Trace Test
2.5.2.12 LSP PING test
2.5.2.13 LSP Trace test
2.5.2.14 LSP Jitter test
2.5.2.15 PWE3 Ping Test
2.5.2.16 PWE3 Trace Test
2.5.2.17 MAC Ping Test
2.5.2.18 Path MTU test
2.5.2.19 VPLS Ping test
2.5.2.20 VPLS Trace test
2.5.2.21 VPLS PW Ping test and VPLS PW Trace test
2.5.2.22 General Flow Test
2.5.2.23 Ethernet Service Activation Test
2.5.3 Terms and Abbreviations
2.6 Ping and Tracert
2.6.1 Introduction to Ping and Tracert
2.6.2 Principles
2.6.2.1 Working of Ping
2.6.2.2 Working of Tracert
2.6.2.3 LSPV
2.6.2.4 CE Ping
2.7 Fault Management
2.7.1 Introduction
2.7.2 Principles
2.7.2.1 Fault Management
2.8 Performance Management
2.8.1 Introduction
2.8.2 Principles
2.8.2.1 Statistics
2.9 PoE Features
2.9.1 Overview
2.9.2 Principle Description
2.9.2.1 Power Supply Procedure
2.9.2.2 Power Supply Modes
2.9.3 Applications
2.9.3.1 Typical Applications
2.9.4 Terms, Acronyms, and Abbreviations
2.10 TWAMP
2.10.1 Introduction
2.10.2 Principles
2.10.2.1 TWAMP Implementation Principles
2.10.2.2 TWAMP Implementation Process
2.10.3 Applications
2.10.3.1 TWAMP Applications on an IP Network
2.10.3.2 TWAMP Applications on an L3VPN
2.10.4 Terms and Abbreviations
2.11 TWAMP Light
2.11.1 Introduction
2.11.2 Principles
2.11.2.1 Comparison Between TWAMP Light and TWAMP
2.11.2.2 Principles
2.11.3 Applications
2.11.3.1 TWAMP Light Application on an L3VPN
2.11.4 Terms, Acronyms, and Abbreviations
3 Reliability
3.1 VRRP
3.1.1 Introduction
3.1.2 Principles
3.1.2.1 Master/Backup Mode
3.1.2.2 VRRP Load Balancing
3.1.2.3 VRRP Tracking Interface Status
3.1.2.4 BFD for VRRP..................................................................................................................................................... 170
3.1.2.5 Pinging the Virtual IP Address............................................................................................................................. 174
3.1.2.6 VRRP Security..................................................................................................................................................... 174
3.1.2.7 VRRP Smooth Switching..................................................................................................................................... 174
3.1.2.8 mVRRP.................................................................................................................................................................175
3.1.2.9 VRRPv3 Packet Format........................................................................................................................................176
3.1.3 Applications.............................................................................................................................................................177
3.1.3.1 VRRP Tracking Interface Status...........................................................................................................................178
3.1.3.2 mVRRP.................................................................................................................................................................179
3.1.4 Terms, Acronyms and Abbreviations...................................................................................................................... 179
3.2 Bit-Error-Triggered Protection Switching.................................................................................................................. 180
3.2.1 Introduction to Bit-Error-Triggered Protection Switching...................................................................................... 180
3.2.2 Principles................................................................................................................................................................. 181
3.2.3 Applications.............................................................................................................................................................186
3.2.3.1 Application of Bit-Error-Triggered Protection Switching in a Scenario in Which an RSVP-TE Tunnel Carries a PW...186
3.2.3.2 Application of Bit-Error-Triggered Protection Switching in a Scenario in Which an LDP LSP Carries a PW...187
3.2.3.3 Application of Bit-Error-Triggered Protection Switching on Trunk Interfaces................................................... 188
3.3 BFD............................................................................................................................................................................ 189
3.3.1 Overview................................................................................................................................................................. 189
3.3.2 Key Concepts...........................................................................................................................................................190
3.3.2.1 BFD for IP............................................................................................................................................................ 196
3.3.2.2 BFD for PIS.......................................................................................................................................................... 197
3.3.2.3 BFD for TTL........................................................................................................................................................ 198
3.3.2.4 Introduction to BFDv6..........................................................................................................................................199
3.3.3 Application Environment........................................................................................................................................ 200
3.3.3.1 BFD for USR........................................................................................................................................................ 200
3.3.3.2 BFD for OSPF...................................................................................................................................................... 200
3.3.3.3 BFD for IS-IS....................................................................................................................................................... 201
3.3.3.4 BFD for BGP........................................................................................................................................................ 202
3.3.3.5 BFD for LSP......................................................................................................................................................... 202
3.3.3.6 BFD for PST......................................................................................................................................................... 204
3.3.3.7 BFD for TE........................................................................................................................................................... 204
3.3.3.8 BFD for PW..........................................................................................................................................................206
3.4 NSR Overview............................................................................................................................................................209
3.4.1 Introduction............................................................................................................................................................. 209
3.4.2 NSR Features Supported by the ATN...................................................................................................................... 211
3.5 Ethernet OAM............................................................................................................................................................ 212
3.5.1 Introduction............................................................................................................................................................. 212
3.5.2 Principles................................................................................................................................................................. 212
3.5.2.1 EFM OAM............................................................................................................................................................214
3.5.2.2 Ethernet CFM....................................................................................................................................................... 218
3.5.2.3 Basic Y.1731 Functions........................................................................................................................................ 226
3.5.2.4 OAM Fault Association........................................................................................................................................239
3.5.2.5 OAM-based Security............................................................................................................................................ 241
3.5.2.6 Comparison Between Protocols............................................................................................................................242
3.5.3 Applications.............................................................................................................................................................244
3.6 E-LMI......................................................................................................................................................................... 248
3.6.1 Introduction............................................................................................................................................................. 249
3.6.2 Principles................................................................................................................................................................. 249
3.7 MPLS-TP OAM......................................................................................................................................................... 254
3.7.1 Introduction to MPLS-TP OAM..............................................................................................................................254
3.7.2 Principles................................................................................................................................................................. 254
3.7.2.1 MPLS-TP OAM Functional Components............................................................................................................ 254
3.7.2.2 Continuity Check.................................................................................................................................................. 255
3.7.2.3 Loss Measurement................................................................................................................................................ 256
3.7.2.4 Delay Measurement.............................................................................................................................................. 257
3.7.3 Application.............................................................................................................................................................. 258
3.7.3.1 MPLS-TP OAM over an IP RAN in the Layer 2 to Edge Scenario..................................................................... 258
3.7.4 Acronyms and Abbreviations.................................................................................................................................. 259
3.8 ISSU Feature Description........................................................................................................................................... 260
3.8.1 Introduction............................................................................................................................................................. 260
3.8.2 Principles................................................................................................................................................................. 262
3.8.2.1 ISSU Principle...................................................................................................................................................... 262
3.8.3 Typical Applications................................................................................................................................................264
4 Interface Management.............................................................................................................. 266
4.1 Logical Interface.........................................................................................................................................................266
4.1.1 Introduction............................................................................................................................................................. 266
4.1.2 Principles................................................................................................................................................................. 268
4.1.2.1 Trunk Interface..................................................................................................................................................... 268
4.1.2.2 VLANIF Interface................................................................................................................................................ 269
4.1.3 Applications.............................................................................................................................................................269
4.2 Transmission Alarm Customization and Suppression................................................................................................ 273
4.2.1 Introduction............................................................................................................................................................. 273
4.2.2 Principles................................................................................................................................................................. 274
4.2.2.1 Basic Concepts..................................................................................................................................................... 274
4.2.2.2 Transmission Alarm Processing........................................................................................................................... 275
4.3 Interface Alarm Inversion...........................................................................................................................................278
4.3.1 Introduction............................................................................................................................................................. 278
4.3.2 Principles................................................................................................................................................................. 278
4.3.2.1 Alarm Inversion Mode..........................................................................................................................................278
5 LAN Access and MAN Access................................................................................................ 280
5.1 Ethernet.......................................................................................................................................................................280
5.1.1 Introduction to Ethernet...........................................................................................................................................280
5.1.2 Principles................................................................................................................................................................. 281
5.1.2.1 Physical Layer of the Ethernet..............................................................................................................................281
5.1.2.2 Data Link Layer of the Ethernet........................................................................................................................... 290
5.1.3 Applications.............................................................................................................................................................294
5.1.3.1 Computer Interconnection.................................................................................................................................... 294
5.1.3.2 Interconnection Between High-Speed Network Devices..................................................................................... 295
5.1.3.3 Means to Access MANs....................................................................................................................................... 295
5.2 VLAN......................................................................................................................................................................... 297
5.2.1 Introduction............................................................................................................................................................. 297
5.2.2 Principles................................................................................................................................................................. 298
5.2.2.1 Basic Concepts..................................................................................................................................................... 298
5.2.2.2 VLAN Communication Principles....................................................................................................................... 301
5.2.2.3 VLAN Aggregation.............................................................................................................................................. 305
5.2.2.4 VLAN Mapping....................................................................................................................................................312
5.2.2.5 Flexible Service Access Through Sub-interfaces of Various Types.....................................................................313
5.2.3 Application.............................................................................................................................................................. 320
5.3 Trunk...........................................................................................................................................................................322
5.3.1 Introduction............................................................................................................................................................. 322
5.3.2 Principles................................................................................................................................................................. 322
5.3.2.1 Basic Principles.................................................................................................................................................... 322
5.3.2.2 Restrictions on Trunk Interfaces...........................................................................................................................324
5.3.2.3 Trunk Interface Classification and Features......................................................................................................... 325
5.3.2.4 Trunk Forwarding Principles................................................................................................................................ 326
5.3.2.5 Inter-Board Trunk................................................................................................................................................. 327
5.3.2.6 LACP.................................................................................................................................................................... 328
5.3.2.7 E-Trunk.................................................................................................................................................................336
5.3.3 Usage Scenario........................................................................................................................................................ 343
5.3.3.1 Eth-Trunk..............................................................................................................................................................343
5.3.3.2 Link Aggregation Group...................................................................................................................................... 344
5.3.3.3 E-Trunk Application in Dual-homing Networking.............................................................................................. 345
5.4 STP/RSTP/MSTP....................................................................................................................................................... 346
5.4.1 Introduction............................................................................................................................................................. 346
5.4.2 Principles of STP/RSTP.......................................................................................................................................... 348
5.4.2.1 Background...........................................................................................................................................................348
5.4.2.2 Basic Concepts..................................................................................................................................................... 350
5.4.2.3 BPDU Format....................................................................................................................................................... 359
5.4.2.4 STP Topology Calculation....................................................................................................................................361
5.4.2.5 Evolution from STP to RSTP............................................................................................................................... 367
5.4.2.6 Details About RSTP............................................................................................................................................. 374
5.4.3 MSTP Principles......................................................................................................................................................377
5.4.3.1 MSTP Background............................................................................................................................................... 377
5.4.3.2 Basic MSTP Concepts.......................................................................................................................................... 378
5.4.3.3 MST BPDUs.........................................................................................................................................................386
5.4.3.4 MSTP Topology Calculation................................................................................................................................ 391
5.4.3.5 MSTP Fast Convergence...................................................................................................................................... 394
5.4.4 Applications.............................................................................................................................................................395
5.5 QinQ........................................................................................................................................................................... 398
5.5.1 Introduction............................................................................................................................................................. 398
5.5.2 Principles................................................................................................................................................................. 399
5.5.2.1 Principles.............................................................................................................................................................. 399
5.5.2.2 VLAN Stacking.................................................................................................................................................... 400
5.5.2.3 QinQ Mapping...................................................................................................................................................... 400
5.5.2.4 IP Forwarding on Termination Sub-interfaces..................................................................................................... 401
5.5.2.5 Proxy ARP on a VLAN Tag Termination Sub-interface...................................................................................... 402
5.5.2.6 L3VPN Access Through a Termination Sub-interface.........................................................................................404
5.5.2.7 PWE3/VLL Access Through a Termination Sub-interface.................................................................................. 406
5.5.2.8 VPLS Access Through a Termination Sub-interface............................................................................................407
5.5.2.9 PWE3 or VLL Access Through QinQ Stacking Sub-interfaces...........................................................................409
5.5.2.10 VPLS Access Through QinQ Stacking Sub-interfaces...................................................................................... 410
5.5.3 Applications............................................................................................................................................................. 411
5.5.3.1 Public User Services on an ME Network............................................................................................................. 412
5.5.3.2 Enterprise User Communication Through Private Lines..................................................................................... 413
5.6 RRPP.......................................................................................................................................................................... 415
5.6.1 Introduction............................................................................................................................................................. 415
5.6.2 Principles................................................................................................................................................................. 417
5.6.2.1 Basic Concepts..................................................................................................................................................... 417
5.6.2.2 RRPP Implementation.......................................................................................................................................... 423
5.6.2.3 RRPP Running Principles.....................................................................................................................................429
5.6.3 Applications.............................................................................................................................................................432
5.7 LLDP.......................................................................................................................................................................... 436
5.7.1 Introduction............................................................................................................................................................. 436
5.7.2 Principles................................................................................................................................................................. 437
5.7.2.1 Basic Principles.................................................................................................................................................... 437
5.7.2.2 LLDP Parameters................................................................................................................................................. 442
5.7.2.3 LLDP Implementation.......................................................................................................................................... 445
5.7.3 Applications.............................................................................................................................................................446
5.8 Transparent Transmission of Layer 2 Protocol Packets............................................................................................. 447
5.8.1 Introduction to Transparent Transmission of Layer 2 Protocol Packets..................................................................447
5.8.2 Principles................................................................................................................................................................. 447
5.8.2.1 Basic Concepts of Transparent Transmission of Layer 2 Protocol Packets......................................................... 447
5.8.2.2 Principles of Transparent Transmission of Layer 2 Protocol Packets.................................................................. 449
5.8.3 Applications.............................................................................................................................................................456
5.8.3.1 Interface-based Transparent Transmission of Layer 2 Protocol Packets..............................................................457
5.8.3.2 VLAN-based Transparent Transmission of Layer 2 Protocol Packets.................................................................458
5.8.3.3 QinQ-based Transparent Transmission of Layer 2 Protocol Packets................................................................... 459
5.8.3.4 Hybrid VLAN-based Transparent Transmission of Layer 2 Protocol Packets.....................................................460
5.9 ERPS (G.8032)........................................................................................................................................................... 461
5.9.1 Overview................................................................................................................................................................. 461
5.9.2 Principles................................................................................................................................................................. 462
5.9.2.1 Basic Concepts..................................................................................................................................................... 463
5.9.2.2 R-APS PDU Format............................................................................................................................................. 468
5.9.2.3 ERPS Single Ring Principles................................................................................................................................471
5.9.2.4 ERPS Multi-ring Principles.................................................................................................................................. 476
5.9.2.5 ERPS Multi-instance............................................................................................................................................ 480
5.9.2.6 Association Between ERPS and Ethernet CFM................................................................................................... 481
5.9.3 Applications.............................................................................................................................................................483
5.9.3.1 ERPS Layer 2 Transparent Transmission............................................................................................................. 483
5.10 Automatic Link Discovery....................................................................................................................................... 486
5.10.1 Introduction........................................................................................................................................................... 486
5.10.2 Principles............................................................................................................................................................... 486
5.10.2.1 Basic Principles.................................................................................................................................................. 487
5.10.2.2 Automatic Link Discovery Packets.................................................................................................................... 488
5.10.2.3 Neighbor Information Parameters...................................................................................................................... 489
5.10.2.4 Implementation of Automatic Link Discovery...................................................................................................489
5.10.3 Applications...........................................................................................................................................................489
6 WAN Access............................................................................................................................... 491

6.1 ATM IMA................................................................................................................................................................... 491
6.1.1 Introduction............................................................................................................................................................. 491
6.1.2 Principles................................................................................................................................................................. 492
6.1.3 Applications.............................................................................................................................................................496
6.2 PPP and MP................................................................................................................................................................ 498
6.2.1 Introduction............................................................................................................................................................. 498
6.2.2 Principles................................................................................................................................................................. 499
6.2.2.1 Process of Establishing a PPP Connection........................................................................................................... 499
6.2.2.2 Process of Establishing an MP Connection.......................................................................................................... 500
6.2.2.3 PPP and MP Features Supported by the ATN...................................................................................................... 500
6.2.3 Applications.............................................................................................................................................................500
6.3 CES............................................................................................................................................................................. 501
6.3.1 Introduction............................................................................................................................................................. 501
6.3.2 Principles................................................................................................................................................................. 503
6.3.2.1 Basic Concepts..................................................................................................................................................... 503
6.3.2.2 IP RAN Implementation on the Device................................................................................................................505
6.3.3 Applications............................................................................................................................................................. 511
6.4 BER Measurement......................................................................................................................................................511
6.4.1 Introduction to BER Measurement.......................................................................................................................... 511
6.4.2 Principles of E1 BER Measurement........................................................................................................................ 512
6.4.2.1 Basic Principle of BER Measurement.................................................................................................................. 512
6.4.3 Application.............................................................................................................................................................. 513
6.5 APS............................................................................................................................................................................. 514
6.5.1 Introduction to APS................................................................................................................................................. 514
6.5.2 Principles................................................................................................................................................................. 514
6.5.2.1 Basic APS Principles............................................................................................................................................ 514
6.5.2.2 Implementation of APS........................................................................................................................................ 518
6.5.3 Applications.............................................................................................................................................................518
6.5.4 Acronyms and Abbreviations.................................................................................................................................. 518
6.6 xDSL...........................................................................................................................................................................519
6.6.1 Introduction............................................................................................................................................................. 519
6.6.2 Principles................................................................................................................................................................. 520
6.6.2.1 Packet Encapsulation Mode..................................................................................................................................520
6.6.2.2 Principles of xDSL on the ATN............................................................................................................................520
6.6.3 Applications.............................................................................................................................................................521
6.6.3.1 Ethernet-based xDSL Service Forwarding in the Offload Solution..................................................................... 521
6.6.3.2 IP-based xDSL Service Forwarding for the Offload Solution..............................................................................522
6.7 GPON......................................................................................................................................................................... 523
6.7.1 Overview................................................................................................................................................................. 524
6.7.2 Introduction............................................................................................................................................................. 525
6.7.3 GPON Principles..................................................................................................................................................... 526
6.7.3.1 Basic Concepts..................................................................................................................................................... 526
6.7.3.2 Service Multiplexing Principles........................................................................................................................... 530
6.7.3.3 GPON Frame Structure........................................................................................................................................ 531
6.7.4 Key GPON Technologies........................................................................................................................................ 533
6.7.4.1 Ranging.................................................................................................................................................................533
6.7.4.2 Burst Optical/Electrical Technology.....................................................................................................................534
6.7.4.3 DBA......................................................................................................................................................................536
6.7.4.4 FEC....................................................................................................................................................................... 537
6.7.4.5 Line Encryption.................................................................................................................................................... 538
6.7.5 GPON Terminal Authentication and Management..................................................................................................539
6.7.5.1 GPON Terminal Authentication (an ONU Not Preconfigured)........................................................................... 539
6.7.5.2 GPON Terminal Authentication (an ONU Preconfigured).................................................................................. 540
6.7.6 Networking Applications (FTTx)............................................................................................................................ 544
7 IP Services................................................................................................................................... 547
7.1 IP Addressing............................................................................................................................................................. 547
7.1.1 Introduction to IP Addresses................................................................................................................................... 547
7.1.2 Principles................................................................................................................................................................. 548
7.1.2.1 Classes of IP Addresses........................................................................................................................................ 548
7.1.2.2 Characteristics of IP Addresses............................................................................................................................ 549
7.1.2.3 Special IP Addresses............................................................................................................................................ 550
7.1.2.4 Private IP Addresses............................................................................................................................................. 551
7.1.3 Applications.............................................................................................................................................................551
7.1.3.1 Subnetting............................................................................................................................................................. 551
7.1.3.2 IP Address Allocation...........................................................................................................................................553
7.1.3.3 IP Address Unnumbered.......................................................................................................................................553
7.1.3.4 IP Address Resolution.......................................................................................................................................... 553
7.1.3.5 IP Address Overlapping in the VPN Instance...................................................................................................... 554
7.2 ARP............................................................................................................................................................................ 555
7.2.1 Introduction to ARP.................................................................................................................................................555
7.2.2 Principles................................................................................................................................................................. 557
7.2.2.1 Basic ARP Principles............................................................................................................................................557
7.2.2.2 Dynamic ARP.......................................................................................................................................................563
7.2.2.3 Static ARP............................................................................................................................................................ 566
7.2.2.4 ARP Automatic Scanning and Fixed ARP........................................................................................................... 568
7.2.2.5 Gratuitous ARP.....................................................................................................................................................569
7.2.2.6 Proxy ARP............................................................................................................................................................ 571
7.2.2.7 ARP-Ping..............................................................................................................................................................576
7.2.2.8 IP Address Conflict Detection.............................................................................................................................. 579
7.2.2.9 ARP Security........................................................................................................................................................ 580
7.2.3 Applications.............................................................................................................................................................582
7.2.3.1 Application of Static ARP.................................................................................................................................... 583
7.2.3.2 Application of Proxy ARP Within a VLAN.........................................................................................................583
7.3 ACL............................................................................................................................................................................ 585
7.3.1 Introduction to the ACL.......................................................................................................................................... 585
7.3.2 Principles................................................................................................................................................................. 586
7.3.2.1 Differences Between ACL4 and ACL6................................................................................................................ 588
7.3.3 Applications.............................................................................................................................................................588
7.3.5 Appendix................................................................................................................................................................. 591
7.4 IPv4.............................................................................................................................................................................593
7.4.1 Introduction to IPv4.................................................................................................................................................593
7.4.2 Principles................................................................................................................................................................. 593
7.4.2.1 Principle of TCP................................................................................................................................................... 593
7.4.2.2 Principle of UDP.................................................................................................................................................. 595
7.4.2.3 Principle of Raw IP...............................................................................................................................................595
7.4.2.4 Principle of the Socket..........................................................................................................................................595
7.4.3 Applications.............................................................................................................................................................596
7.5 IP Unicast Policy-Based Routing............................................................................................................................... 597
7.5.1 Introduction............................................................................................................................................................. 597
7.5.2 Principles................................................................................................................................................................. 598
7.5.3 Applications.............................................................................................................................................................599
7.6 IPv6.............................................................................................................................................................................600
7.6.1 Introduction to IPv6.................................................................................................................................................601
7.6.2 Principles................................................................................................................................................................. 602
7.6.2.1 IPv6 Header Format..............................................................................................................................................602
7.6.2.2 IPv6 Addresses..................................................................................................................................................... 608
7.6.2.3 Features of IPv6....................................................................................................................................................612
7.6.2.4 ICMPv6................................................................................................................................................................ 614
7.6.2.5 Neighbor Discovery..............................................................................................................................................615
7.6.2.6 Path MTU............................................................................................................................................................. 631
7.6.2.7 Dual Protocol Stacks............................................................................................................................................ 632
7.6.2.8 TCP6..................................................................................................................................................................... 632
7.6.2.9 UDP6.................................................................................................................................................................... 633
7.6.2.10 RawIP6............................................................................................................................................................... 634
7.6.3 Applications.............................................................................................................................................................634
8 IP Routing................................................................................................................................... 637
8.1 IP Routing Overview.................................................................................................................................................. 637
8.1.1 Introduction to IP Routing....................................................................................................................................... 637
8.1.2 Principles................................................................................................................................................................. 637
8.1.2.1 Routers..................................................................................................................................................................638
8.1.2.2 Routing Protocols................................................................................................................................................. 639
8.1.2.3 Routing Table and FIB Table................................................................................................................................639
8.1.2.4 Route Iteration...................................................................................................................................................... 642
8.1.2.5 Static Routes and Dynamic Routes...................................................................................................................... 642
8.1.2.6 Classification of Dynamic Routing Protocols...................................................................................................... 643
8.1.2.7 Routing Protocols and Route Preferences............................................................................................................ 643
8.1.2.8 Priority-based Route Convergence....................................................................................................................... 645
8.1.2.9 Load Balancing and Route Backup...................................................................................................................... 646
8.1.2.10 Principle of IP FRR............................................................................................................................................ 647
8.1.2.11 Re-advertisement of Routing Information..........................................................................................................648
8.1.2.12 Indirect Next Hop............................................................................................................................................... 648
8.1.2.13 Default Routes.................................................................................................................................................... 651
8.1.3 Applications.............................................................................................................................................................651
8.1.3.1 Typical Application of IP FRR............................................................................................................................. 651
8.1.3.2 Typical Application of Indirect Next Hop............................................................................................................ 652
8.2 Static Routes............................................................................................................................................................... 655
8.2.1 Introduction to Static Routes................................................................................................................................... 655
8.2.2 Principles................................................................................................................................................................. 655
8.2.2.1 Components of Static Routes............................................................................................................................... 655
8.2.2.2 Applications of Static Routes............................................................................................................................... 656
8.2.2.3 Functions of Static Routes.................................................................................................................................... 658
8.2.2.4 BFD for Static Routes...........................................................................................................................................658
8.3 RIP.............................................................................................................................................................................. 659
8.3.1 Introduction............................................................................................................................................................. 659
8.3.2 Principles................................................................................................................................................................. 660
8.3.2.1 RIP-1.....................................................................................................................................................................660
8.3.2.2 RIP-2.....................................................................................................................................................................660
8.3.2.3 Timers................................................................................................................................................................... 661
8.3.2.4 Split Horizon.........................................................................................................................................................662
8.3.2.5 Poison Reverse..................................................................................................................................................... 662
8.3.2.6 Triggered Update.................................................................................................................................................. 663
8.3.2.7 Route Summarization........................................................................................................................................... 663
8.3.2.8 Multi-process and Multi-instance......................................................................................................................... 664
8.3.2.9 Hot Standby.......................................................................................................................................................... 664
8.4 RIPng.......................................................................................................................................................................... 665
8.4.1 Introduction............................................................................................................................................................. 665
8.4.2 Principles................................................................................................................................................................. 666
8.4.2.1 RIPng Packet Format............................................................................................................................................666
8.4.2.2 Timers................................................................................................................................................................... 667
8.4.2.3 Split Horizon.........................................................................................................................................................667
8.4.2.4 Poison Reverse..................................................................................................................................................... 668
8.4.2.5 Triggered Update.................................................................................................................................................. 668
8.4.2.7 Multi-process........................................................................................................................................................ 669
8.4.2.8 Hot Standby.......................................................................................................................................................... 670
8.5 IS-IS............................................................................................................................................................................670
8.5.1 Introduction to IS-IS................................................................................................................................................ 671
8.5.2 Principles................................................................................................................................................................. 672
8.5.2.1 Basic Concepts of IS-IS........................................................................................................................................672
8.5.2.2 IS-IS Multi-instance and Multi-process............................................................................................................... 689
8.5.2.3 IS-IS Route Leaking............................................................................................................................................. 690
8.5.2.4 IS-IS Fast Convergence........................................................................................................................................ 691
8.5.2.5 Priority-based IS-IS Convergence........................................................................................................................ 693
8.5.2.6 IS-IS LSP Fragment Extension.............................................................................................................................693
8.5.2.7 IS-IS Administrative Tag......................................................................................................................................696
8.5.2.8 Dynamic Hostname Exchange..............................................................................................................................696
8.5.2.9 IS-IS HA............................................................................................................................................................... 698
8.5.2.10 IS-IS 3-way Handshake...................................................................................................................................... 699
8.5.2.11 IS-IS GR............................................................................................................................................................. 699
8.5.2.12 IS-IS for IPv6......................................................................................................................................................706
8.5.2.13 IS-IS TE.............................................................................................................................................................. 707
8.5.2.14 IS-IS Shortcut (AA) and Advertise (FA)............................................................................................................ 710
8.5.2.15 IS-IS Wide Metric...............................................................................................................................................713
8.5.2.16 IS-IS LDP Synchronization................................................................................................................................ 714
8.5.2.17 BFD for IS-IS..................................................................................................................................................... 716
8.5.2.18 IS-IS Auto FRR.................................................................................................................................................. 719
8.5.2.19 IS-IS Authentication........................................................................................................................................... 723
8.5.4 Appendixes.............................................................................................................................................................. 727
8.6 OSPF...........................................................................................................................................................................727
8.6.1 Introduction............................................................................................................................................................. 727
8.6.2 Principles................................................................................................................................................................. 728
8.6.2.1 Fundamentals of OSPF......................................................................................................................................... 728
8.6.2.2 OSPF GR.............................................................................................................................................................. 737
8.6.2.3 OSPF TE...............................................................................................................................................................740
8.6.2.4 OSPF VPN............................................................................................................................................................744
8.6.2.5 OSPF NSSA......................................................................................................................................................... 750
8.6.2.6 BFD for OSPF...................................................................................................................................................... 751
8.6.2.7 OSPF GTSM.........................................................................................................................................................753
8.6.2.8 OSPF Smart-discover........................................................................................................................................... 754
8.6.2.9 OSPF-BGP Synchronization................................................................................................................................ 754
8.6.2.10 OSPF-LDP Synchronization...............................................................................................................................755
8.6.2.11 OSPF Database Overflow...................................................................................................................................757
8.6.2.12 OSPF Hot Standby..............................................................................................................................................758
8.6.2.13 OSPF Fast Convergence..................................................................................................................................... 758
8.6.2.14 OSPF MIB.......................................................................................................................................................... 760
8.6.2.15 OSPF Mesh-Group............................................................................................................................................. 761
8.6.2.16 Priority-based OSPF Convergence..................................................................................................................... 762
8.6.2.17 OSPF IP FRR......................................................................................................................................................762
8.6.2.18 OSPF Authentication.......................................................................................................................................... 766
8.6.3 OSPF Applications.................................................................................................................................................. 767
8.6.3.1 OSPF GR.............................................................................................................................................................. 767
8.7 OSPFv3.......................................................................................................................................................................769
8.7.1 Introduction............................................................................................................................................................. 769
8.7.2 Principles................................................................................................................................................................. 769
8.7.2.1 Principle of OSPFv3............................................................................................................................................. 770
8.7.2.2 Comparison between OSPFv3 and OSPFv2........................................................................................................ 777
8.8 BGP............................................................................................................................................................................ 779
8.8.1 Introduction to BGP.................................................................................................................................................779
8.8.2 Principles................................................................................................................................................................. 781
8.8.2.1 Basic Principle of BGP.........................................................................................................................................781
8.8.2.2 Route Import.........................................................................................................................................................789
8.8.2.4 Route Dampening................................................................................................................................................. 796
8.8.2.5 Community Attribute............................................................................................................................................797
8.8.2.6 Route Reflector.....................................................................................................................................................798
8.8.2.7 BGP Confederation...............................................................................................................................................803
8.8.2.8 BGP GR................................................................................................................................................................ 804
8.8.2.9 BGP Security........................................................................................................................................................ 806
8.8.2.10 BFD for BGP...................................................................................................................................................... 807
8.8.2.11 BGP Peer Tracking............................................................................................................................................. 808
8.8.2.12 BGP Auto FRR................................................................................................................................................... 808
8.8.2.13 BGP ORF............................................................................................................................................................809
8.8.2.14 Active-Route-Advertise...................................................................................................................................... 811
8.8.2.15 BGP Dynamic Update Peer-Groups................................................................................................................... 811
8.8.2.16 BGP NSR............................................................................................................................................................814
8.8.2.17 4-Byte AS Number............................................................................................................................................. 814
8.8.2.18 Routing Policy-based Next Hop Iteration.......................................................................................................... 816
8.9 Routing Policies..........................................................................................................................................................818
8.9.1 Introduction to Routing Policies..............................................................................................................................818
8.9.2 Principles................................................................................................................................................................. 819
8.9.2.1 Basic Principles of Routing Policies.................................................................................................................... 819
8.9.2.2 Usage Scenario..................................................................................................................................................... 821
8.10 Appendix List of Port Numbers of Common Protocols........................................................................................... 821
9 IP Multicast.................................................................................................................................823
9.1 IP Multicast Overview................................................................................................................................................823
9.1.1 Introduction............................................................................................................................................................. 823
9.1.2 Principles................................................................................................................................................................. 826
9.1.2.1 Basic Concepts..................................................................................................................................................... 826
9.1.2.2 Basic Framework.................................................................................................................................................. 827
9.1.2.3 Multicast Addresses..............................................................................................................................................828
9.1.2.4 Multicast Model Classification.............................................................................................................................830
9.1.2.5 Multicast Protocols............................................................................................................................................... 831
9.1.2.6 Multicast Packet Forwarding................................................................................................................................833
9.2 PIM............................................................................................................................................................................. 835
9.2.1 PIM.......................................................................................................................................................................... 835
9.2.2 Principles................................................................................................................................................................. 836
9.2.2.1 PIM-DM............................................................................................................................................................... 836
9.2.2.2 PIM-SM................................................................................................................................................................ 842
9.2.2.3 PIM-SSM..............................................................................................................................................................855
9.2.2.4 PIM Reliability..................................................................................................................................................... 856
9.2.2.5 PIM Security.........................................................................................................................................................858
9.2.2.6 PIM Control Message........................................................................................................................................... 867
9.2.3 Applications.............................................................................................................................................................880
9.2.3.1 PIM-DM Intra-domain......................................................................................................................................... 880
9.2.3.2 PIM Intra-domain................................................................................................................................................ 881
9.2.3.3 PIM-SSM Intra-domain........................................................................................................................................ 883
9.3 IGMP.......................................................................................................................................................................... 886
9.3.1 Introduction............................................................................................................................................................. 886
9.3.2 Principles................................................................................................................................................................. 887
9.3.2.1 IGMPv1&v2&v3.................................................................................................................................................. 887
9.3.2.2 IGMP Group Compatibility..................................................................................................................................889
9.3.2.3 IGMP Querier Election.........................................................................................................................................890
9.3.2.4 Router-Alert for IGMP......................................................................................................................................... 891
9.3.2.5 IGMP Only-Link.................................................................................................................................................. 891
9.3.2.6 IGMP On-Demand............................................................................................................................................... 891
9.3.2.7 IGMP Prompt-Leave............................................................................................................................................ 892
9.3.2.8 IGMP Policy Control............................................................................................................................................893
9.3.2.9 SSM Mapping.......................................................................................................................................................895
9.3.2.10 Source Address-based IGMP Message Filtering................................................................................................897
9.3.2.11 Protocol Comparison.......................................................................................................................................... 898
9.3.3 IGMP Applications..................................................................................................................................................898
9.3.3.1 Typical IGMP Applications..................................................................................................................................898
9.4 Layer 2 Multicast........................................................................................................................................................900
9.4.1 Introduction............................................................................................................................................................. 900
9.4.2 Principles................................................................................................................................................................. 902
9.4.2.1 IGMP Snooping.................................................................................................................................................... 902
9.4.2.2 Static Layer 2 Multicast........................................................................................................................................907
9.4.2.3 Layer 2 SSM Mapping......................................................................................................................................... 908
9.4.2.4 IGMP Snooping Proxy......................................................................................................................................... 909
9.4.2.5 Multicast VLAN................................................................................................................................................... 911
9.4.2.6 Layer 2 Multicast Instance................................................................................................................................... 913
9.4.3 Applications.............................................................................................................................................................915
9.4.3.1 Application of Layer 2 Multicast for IPTV Services............................................................................................915
9.5 MSDP......................................................................................................................................................................... 918
9.5.1 Introduction to MSDP............................................................................................................................................. 918
9.5.2 Principles................................................................................................................................................................. 919
9.5.2.1 Inter-Domain Multicast in MSDP........................................................................................................................ 919
9.5.2.2 Anycast RP in MSDP........................................................................................................................................... 921
9.5.2.3 MD5/Key-Chain Authentication.......................................................................................................................... 922
9.5.2.4 RPF Rules of SA Messages.................................................................................................................................. 922
9.5.3 MSDP Applications................................................................................................................................................. 923
9.6 Multicast Management............................................................................................................................................... 926
9.6.1 Introduction to Multicast Management................................................................................................................... 926
9.6.2 Principles................................................................................................................................................................. 927
9.6.2.1 MPing................................................................................................................................................................... 927
9.6.2.2 MTrace..................................................................................................................................................................927
9.7 Multicast Route Management.....................................................................................................................................929
9.7.1 Introduction to Multicast Route Management.........................................................................................................930
9.7.2 Principles................................................................................................................................................................. 930
9.7.2.1 RPF Check............................................................................................................................................................ 930
9.7.2.2 Multicast Load Splitting....................................................................................................................................... 932
9.7.2.3 Longest-Match Multicast Routing........................................................................................................................935
9.7.2.4 Multicast Boundary Designation.......................................................................................................................... 936
9.7.2.5 Multicast NSR...................................................................................................................................................... 937
9.8 Multicast VPN in Rosen Mode...................................................................................................................................938
9.8.1 Introduction to Multicast VPN................................................................................................................................ 938
9.8.2 Principles................................................................................................................................................................. 938
9.8.2.1 Concepts in MVPN...............................................................................................................................................938
9.8.2.2 Basic Implementation Principles............................................................................................................................. 939
9.8.2.3 PIM Neighbor Relationship Between CE, PE, and P........................................................................................... 941
9.8.2.4 Process of Establishing a Share-MDT..................................................................................................................943
9.8.2.5 MT Transmission Process Based on the Share-MDT...........................................................................................944
9.8.2.6 Switch-MDT Switchover......................................................................................................................................949
9.8.2.7 Multicast VPN Extranet........................................................................................................................................951
9.8.2.8 Multicast VPN in BGP A-D Mode.......................................................................................................................954
9.8.2.9 Inter-AS MVPN....................................................................................................................................................957
9.8.3 MVPN Applications................................................................................................................................................ 959
9.8.3.1 Single-AS MD VPN............................................................................................................................................. 960
9.9 Multicast Security.......................................................................................................................................................962
9.9.1 Introduction to Multicast Security........................................................................................................................... 962
9.9.2 Principles................................................................................................................................................................. 962
9.9.2.1 Limit on the Total Number of Multicast Entries.................................................................................................. 962
9.9.2.2 Limit on the Number of Downstream Interfaces of a Multicast Entry.................................................................963
9.9.2.3 Limit on Multicast Protocol Status....................................................................................................................... 963
9.9.2.4 Multicast Filtering Policies...................................................................................................................................964
9.9.2.5 Multicast Protocol Packet Attack Defense........................................................................................................... 966
9.9.2.6 Multicast Security Authentication........................................................................................................................ 966
9.9.3 Multicast Security Applications.............................................................................................................................. 966
9.9.3.1 Measures to Guarantee Network Security............................................................................................................ 966
9.9.3.2 Measures to Guarantee Protocol-Layer Security..................................................................................................966
9.9.3.3 Measures to Guarantee Device Security...............................................................................................................967
10 MPLS..........................................................................................................................................968
10.1 MPLS Basics............................................................................................................................................................ 968
10.1.1 Introduction........................................................................................................................................................... 968
10.1.2 Principles............................................................................................................................................................... 969
10.1.2.1 Concepts............................................................................................................................................................. 969
10.1.2.2 Establishing LSPs............................................................................................................................................... 976
10.1.2.3 MPLS Forwarding.............................................................................................................................................. 977
10.1.2.4 MPLS Ping/Traceroute....................................................................................................................................... 982
10.1.3 Applications...........................................................................................................................................................987
10.1.3.1 MPLS-based VPN.............................................................................................................................................. 987
10.1.3.2 PBR to an LSP.................................................................................................................................................... 988
10.1.4 Terms, Acronyms, and Abbreviations................................................................................................................... 988
10.2 MPLS LDP............................................................................................................................................................... 990
10.2.1 Introduction........................................................................................................................................................... 990
10.2.2 Principles............................................................................................................................................................... 990
10.2.2.1 Concepts............................................................................................................................................................. 990
10.2.2.2 LDP Sessions...................................................................................................................................................... 992
10.2.2.3 Advertising and Managing Labels......................................................................................................................994
10.2.2.4 LDP LSP Establishment..................................................................................................................................... 996
10.2.2.5 Delayed LDP Adjacency Deletion..................................................................................................................... 997
10.2.2.6 LDP-IGP Synchronization..................................................................................................................................999
10.2.2.7 Synchronization Between LDP and Static Routes........................................................................................... 1004
10.2.2.8 LDP GR............................................................................................................................................................ 1005
10.2.2.9 LDP NSR.......................................................................................................................................................... 1007
10.2.2.10 LDP FRR........................................................................................................................................................ 1008
10.2.2.11 LDP MTU....................................................................................................................................................... 1011
10.2.2.12 LDP Authentication........................................................................................................................................ 1011
10.2.2.13 LDP over TE...................................................................................................................................................1012
10.2.2.14 Coexistence of the Local and Remote LDP Sessions.....................................................................................1013
10.2.2.15 Distributing Labels for All Peers by LDP...................................................................................................... 1014
10.2.2.16 Smart LDP Ingress Policy.............................................................................................................................. 1015
10.2.2.17 Smart LDP Request Policy............................................................................................................................. 1017
10.2.3 Terms, Acronyms, and Abbreviations................................................................................................................. 1019
10.3 MPLS TE................................................................................................................................................................ 1020
10.3.1 MPLS TE............................................................................................................................................................. 1020
10.3.2 Principles............................................................................................................................................................. 1023
10.3.2.1 Basic Concepts................................................................................................................................................. 1023
10.3.2.2 Implementation................................................................................................................................................. 1028
10.3.2.3 Information Advertisement Component........................................................................................................... 1029
10.3.2.4 Path Calculation Component............................................................................................................................ 1031
10.3.2.5 Establishing a CR-LSP Using RSVP-TE......................................................................................................... 1034
10.3.2.6 RSVP Summary Refresh.................................................................................................................................. 1039
10.3.2.7 RSVP Hello...................................................................................................................................................... 1040
10.3.2.8 Traffic Forwarding Component........................................................................................................................ 1041
10.3.3 Tunnel Optimization............................................................................................................................................ 1043
10.3.3.1 Tunnel Re-optimization.................................................................................................................................... 1043
10.3.4 CR-LSP Attribute Templates...............................................................................................................................1044
10.3.5 MPLS TE Reliability........................................................................................................................................... 1046
10.3.5.1 Reliability Overview.........................................................................................................................................1046
10.3.5.2 Make-Before-Break.......................................................................................................................................... 1047
10.3.5.3 TE FRR.............................................................................................................................................................1049
10.3.5.4 CR-LSP Backup............................................................................................................................................... 1055
10.3.5.5 Isolated LSP Computation................................................................................................................................1060
10.3.5.6 SRLG................................................................................................................................................................ 1062
10.3.5.7 TE Tunnel Protection Group.............................................................................................................................1063
10.3.5.8 BFD for MPLS TE........................................................................................................................................... 1066
10.3.5.9 RSVP GR..........................................................................................................................................................1069
10.3.5.10 RSVP NSR..................................................................................................................................................... 1072
10.3.6 MPLS TE Security...............................................................................................................................................1072
10.3.6.1 RSVP Authentication and Its Enhancements................................................................................................... 1072
10.3.7 DS-TE.................................................................................................................................................................. 1075
10.3.7.1 Background.......................................................................................................................................................1075
10.3.7.2 Related Concepts.............................................................................................................................................. 1078
10.3.7.3 Implementation................................................................................................................................................. 1080
10.3.8 Static Bidirectional Co-routed LSPs....................................................................................................................1084
10.3.9 Loopback Detection for a Static Bidirectional Co-Routed CR-LSP................................................................... 1086
10.3.10 Associated Bidirectional Dynamic LSPs...........................................................................................................1087
10.3.11 MPLS TE Control Message...............................................................................................................................1089
10.3.11.1 RSVP-TE Control Message............................................................................................................................ 1089
10.3.12 MPLS TE Applications..................................................................................................................................... 1093
10.3.12.1 MPLS TE Application on an IP RAN............................................................................................................ 1093
10.3.12.2 DS-TE Applications....................................................................................................................................... 1096
10.3.13 Terms, Acronyms, and Abbreviations............................................................................................................... 1099
10.4 Seamless MPLS...................................................................................................................................................... 1100
10.4.1 Introduction..........................................................................................................................................................1100
10.4.2 Principles............................................................................................................................................................. 1100
10.4.3 Applications......................................................................................................................................................... 1116
10.4.3.1 Seamless MPLS Applications in VPN Services............................................................................................... 1117
11 VPN.......................................................................................................................................... 1119
11.1 VPN Overview........................................................................................................................................................ 1119
11.1.1 Introduction to VPN.............................................................................................................................................1120
11.1.1.1 Classification of VPN....................................................................................................................................... 1122
11.1.1.2 Architecture of VPN......................................................................................................................................... 1127
11.1.1.3 Typical Networking of VPN............................................................................................................................. 1127
11.1.2 Principles..............................................................................................................................................................1127
11.1.2.1 VPN Tunnel...................................................................................................................................................... 1128
11.1.2.2 Implementation Modes of VPN........................................................................................................................ 1128
11.1.2.3 Features Related to the Implementation of VPN.............................................................................................. 1129
11.1.3 VPN Applications................................................................................................................................................ 1130
11.1.4 Terms, Acronyms, and Abbreviations..................................................................................................................1132
11.2 Tunnel Policy.......................................................................................................................................................... 1144
11.2.1 Introduction..........................................................................................................................................................1144
11.2.2 Principles..............................................................................................................................................................1144
11.2.2.1 Tunnel Type Prioritizing Policy........................................................................................................................ 1144
11.2.2.2 Tunnel Binding Policy...................................................................................................................................... 1144
11.2.2.3 Comparison of Tunnel Policies.........................................................................................................................1145
11.2.2.4 Tunnel Selector................................................................................................................................................. 1145
11.2.2.5 Introduction.......................................................................................................................................................1146
11.2.3 Applications......................................................................................................................................................... 1147
11.2.3.1 Connecting Discontinuous Local Networks into a VPN.................................................................................. 1147
11.3 BGP/MPLS IP VPN................................................................................................................................................1148
11.3.1 Introduction..........................................................................................................................................................1148
11.3.2 Principles..............................................................................................................................................................1149
11.3.2.1 BGP/MPLS IP VPN..........................................................................................................................................1150
11.3.2.2 HVPN................................................................................................................................................................1157
11.3.2.3 VPN FRR.......................................................................................................................................................... 1165
11.3.2.4 VPN GR............................................................................................................................................................ 1167
11.3.2.5 VPN NSR..........................................................................................................................................................1170
11.3.2.6 BGP SoO...........................................................................................................................................................1171
11.3.2.7 Querying Bearer Relationships Between Tunnels and VPNs........................................................................... 1171
11.4 VLL.........................................................................................................................................................................1175
11.4.1 Introduction to the VLL....................................................................................................................................... 1175
11.4.2 Principles..............................................................................................................................................................1176
11.4.2.1 Basic Concepts..................................................................................................................................................1176
11.4.2.2 CCC VLL..........................................................................................................................................................1177
11.4.2.3 Martini VLL......................................................................................................................................................1178
11.4.2.4 SVC VLL.......................................................................................................................................................... 1186
11.4.2.5 Heterogeneous VLL..........................................................................................................................................1187
11.4.2.6 Comparison Between Modes of Implementing the VLL..................................................................................1188
11.4.2.7 Comparison Between the MPLS L2VPN and the BGP/MPLS VPN............................................................... 1189
11.4.3 Application Environment.....................................................................................................................................1189
11.5 PWE3...................................................................................................................................................................... 1191
11.5.1 Introduction..........................................................................................................................................................1191
11.5.2 Principles..............................................................................................................................................................1193
11.5.2.1 Basic PWE3 Principles..................................................................................................................................... 1193
11.5.2.2 ATM Cell Relay................................................................................................................................................ 1196
11.5.2.3 PW Template.....................................................................................................................................................1201
11.5.2.4 Static-Dynamic MS-PW................................................................................................................................... 1201
11.5.2.5 Other Related Features......................................................................................................................................1202
11.6 PWE3 Reliability.................................................................................................................................................... 1202
11.6.1 Overview..............................................................................................................................................................1202
11.6.2 Principles............................................................................................................................................................. 1205
11.6.2.1 PW Redundancy............................................................................................................................................... 1205
11.6.2.2 PW APS............................................................................................................................................................ 1206
11.6.3 Applications......................................................................................................................................................... 1208
11.6.3.1 PW Redundancy in the Scenario that the Node B Accesses Three PEs (PWE3).............................................1208
11.6.3.2 PW APS Application........................................................................................................................................ 1210
11.7 IP Hard Pipe............................................................................................................................................................1211
11.7.1 Introduction..........................................................................................................................................................1212
11.7.2 Principles............................................................................................................................................................. 1213
11.7.2.1 Centralized Management of Hard-Pipe-based Leased Line Services on the NMS.......................................... 1213
11.7.2.2 Interface-based Hard Pipe Bandwidth Reservation.......................................................................................... 1215
11.7.2.3 AC Interface Service Bandwidth Limitation.................................................................................................... 1216
11.7.2.4 Hard Pipe TE LSP.............................................................................................................................................1216
11.7.2.5 Hard Pipe VLL/PWE3 PW............................................................................................................................... 1216
11.7.2.6 Hard Pipe Reliability........................................................................................................................................ 1218
11.7.2.7 Hard Pipe Service Quality Monitoring............................................................................................................. 1218
11.7.3 Applications......................................................................................................................................................... 1218
11.7.3.1 Hard-Pipe-based Enterprise Leased Line Application......................................................................................1218
11.7.3.2 Hard-Pipe-based Enterprise Leased Line Protection........................................................................................ 1219
11.7.3.3 Hard-Pipe-based Leased Line Services Implemented Using Both Huawei and Non-Huawei Devices........... 1219
11.8 VPLS.......................................................................................................................................................................1220
11.8.1 Introduction to VPLS...........................................................................................................................................1221
11.8.2 Principles............................................................................................................................................................. 1222
11.8.2.1 VPLS Introduction............................................................................................................................................1222
11.8.2.2 LDP VPLS........................................................................................................................................................ 1230
11.8.2.3 BGP AD VPLS................................................................................................................................................. 1231
11.8.2.4 VPLS PW Redundancy.....................................................................................................................................1236
11.9 L2VPN Loop Detection.......................................................................................................................................... 1239
11.9.1 Overview..............................................................................................................................................................1240
11.9.2 Principles............................................................................................................................................................. 1240
11.9.2.1 Basic Concepts and Implementation Principles................................................................................................1240
11.9.3 Applications......................................................................................................................................................... 1242
11.9.3.1 Application of L2VPN Loop Detection When a CE Is Single-homed to a PE over Redundant Links............1242
11.9.3.2 Application of L2VPN Loop Detection When a Customer Network Is Dual-homed to a VPLS/VLL Network........ 1243
11.10 IP RAN Virtual Cluster.........................................................................................................................................1244
11.10.1 Introduction to IP RAN Virtual Clusters........................................................................................................... 1244
11.10.2 Principles........................................................................................................................................................... 1252
11.10.2.1 Data Plane....................................................................................................................................................... 1252
11.10.2.2 Control Plane.................................................................................................................................................. 1254
11.10.2.3 Management Plane..........................................................................................................................................1259
11.10.2.4 Protection Switching.......................................................................................................................................1260
11.10.2.5 Graceful Restart.............................................................................................................................................. 1261
11.10.2.6 OAM............................................................................................................................................................... 1262
11.10.3 Application.........................................................................................................................................................1264
11.10.3.1 Application of IP RAN Virtual Clusters......................................................................................................... 1264
11.10.4 Terms, Acronyms, and Abbreviations............................................................................................................... 1267
12 QoS........................................................................................................................................... 1269
12.1 QoS Overview........................................................................................................................................................ 1269
12.1.1 Introduction to QoS............................................................................................................................................. 1269
12.1.1.1 Traditional Packets Transmission Application................................................................................................. 1270
12.1.1.2 New Applications Requirements...................................................................................................................... 1270
12.1.2 End-to-End QoS Model....................................................................................................................................... 1270
12.1.2.1 Best-Effort Service Model................................................................................................................................1271
12.1.2.2 Integrated Service Model..................................................................................................................................1271
12.1.2.3 Differentiated Service Model........................................................................................................................... 1271
12.1.3 Techniques Used for the QoS Application.......................................................................................................... 1276
12.1.3.1 Traffic Classification........................................................................................................................................ 1278
12.1.3.2 Traffic Policing and Shaping............................................................................................................................ 1278
12.1.3.3 Congestion Avoidance Configuration...............................................................................................................1279
12.1.3.4 RSVP................................................................................................................................................................ 1281
12.2 Traffic Policing and Traffic Shaping...................................................................................................................... 1281
12.2.1 Introduction......................................................................................................................................................... 1281
12.2.2 Principles............................................................................................................................................................. 1282
12.2.2.1 Basic Principles of Traffic Policing..................................................................................................................1282
12.2.2.2 Basic Principles of Traffic Shaping.................................................................................................................. 1284
12.2.3 Applications.........................................................................................................................................................1286
12.3 Congestion Avoidance and Management............................................................................................................... 1288
12.3.1 Introduction......................................................................................................................................................... 1288
12.3.2 Principles............................................................................................................................................................. 1289
12.3.2.1 Basic Principles of Congestion Avoidance.......................................................................................................1289
12.3.2.2 Basic Principles of Congestion Management................................................................................................... 1293
12.3.3 Applications.........................................................................................................................................................1296
12.4 Class-Based QoS.................................................................................................................................................... 1298
12.4.1 Introduction......................................................................................................................................................... 1298
12.4.2 Principles............................................................................................................................................................. 1299
12.4.2.1 Simple Traffic Classification............................................................................................................................ 1299
12.4.2.2 Complex Traffic Classification.........................................................................................................................1304
12.4.3 Applications.........................................................................................................................................................1307
12.5 HQoS...................................................................................................................................................................... 1309
12.5.1 Introduction to HQoS.......................................................................................................................................... 1310
12.5.2 Principles............................................................................................................................................................. 1310
12.5.2.1 Related Concepts of HQoS...............................................................................................................................1310
12.5.2.2 Queue Scheduling Technology......................................................................................................................... 1311
12.5.2.3 HQoS Queue Scheduling..................................................................................................................................1313
12.5.3 HQoS Applications..............................................................................................................................................1314
12.5.4 Terms and Abbreviations.....................................................................................................................................1315
13 Clock........................................................................................................................................ 1316
13.1 Clock Synchronization........................................................................................................................................... 1316
13.1.1 Introduction......................................................................................................................................................... 1316
13.1.2 Principles............................................................................................................................................................. 1317
13.1.2.1 Basic Concepts................................................................................................................................................. 1317
13.1.2.2 Synchronization Mode and Issues of Concern................................................................................................. 1317
13.1.2.3 Networking Mode for Clock Synchronization................................................................................................. 1319
13.1.2.4 Typical Networking for Clock Synchronization...............................................................................................1321
13.1.2.5 Clock Protection Switching.............................................................................................................................. 1323
13.1.3 Applications.........................................................................................................................................................1325
13.2 NTP.........................................................................................................................................................................1331
13.2.1 Introduction......................................................................................................................................................... 1332
13.2.2 Principle...............................................................................................................................................................1332
13.2.2.1 Network Architecture....................................................................................................................................... 1332
13.2.2.2 Operating Mode................................................................................................................................................ 1333
13.2.2.3 Event Processing of NTP..................................................................................................................................1336
13.2.2.4 Operating Principle...........................................................................................................................................1338
13.2.2.5 Security Mechanism......................................................................................................................................... 1339
13.2.2.6 Dynamic and Static Associations of NTP........................................................................................................ 1340
13.2.3 Terms and Acronyms...........................................................................................................................................1340
13.3 1588v2.................................................................................................................................................................... 1342
13.3.1 Introduction to 1588v2........................................................................................................................................ 1342
13.3.2 Principles............................................................................................................................................................. 1344
13.3.2.1 Basic Concepts................................................................................................................................................. 1345
13.3.2.2 Principle of Synchronization............................................................................................................................ 1347
13.3.3 Application Environment.................................................................................................................................... 1356
13.4 1588 ACR............................................................................................................................................................... 1360
13.4.1 Introduction to 1588 ACR................................................................................................................................... 1360
13.4.2 Principles............................................................................................................................................................. 1361
13.4.2.1 Basic Mechanisms of 1588 ACR......................................................................................................................1361
13.4.2.2 Basic Principles of 1588 ACR..........................................................................................................................1362
13.4.3 Applications.........................................................................................................................................................1364
13.5 1588 ATR................................................................................................................................................................1367
13.5.1 Introduction to 1588 ATR....................................................................................................................................1368
13.5.2 Principles............................................................................................................................................................. 1369
13.5.2.1 Basic Mechanisms of 1588 ATR...................................................................................................................... 1369
13.5.2.2 Basic Principles of 1588 ATR.......................................................................................................................... 1370
13.5.3 Applications.........................................................................................................................................................1372
13.6 CES ACR/DCR...................................................................................................................................................... 1374
13.6.1 Introduction......................................................................................................................................................... 1375
13.6.2 Principles............................................................................................................................................................. 1375
13.6.2.1 Basic Concepts................................................................................................................................................. 1375
13.6.2.2 Basic Principles................................................................................................................................................ 1375
13.6.3 Applications.........................................................................................................................................................1377
13.7 G.8275.1................................................................................................................................................................. 1378
13.7.1 Introduction......................................................................................................................................................... 1378
13.7.2 Principles............................................................................................................................................................. 1380
13.7.2.1 Basic Concepts................................................................................................................................................. 1380
13.7.2.2 Clock Synchronization Principles.................................................................................................................... 1384
13.7.2.3 Time Synchronization Principles......................................................................................................................1384
13.7.3 Applications.........................................................................................................................................................1386
13.8 Atom GPS Timing.................................................................................................................................................. 1388
13.8.1 Introduction......................................................................................................................................................... 1389
13.8.2 Principles............................................................................................................................................................. 1389
13.8.3 Applications.........................................................................................................................................................1391
14 Security.................................................................................................................................... 1393
14.1 MAC Address Limit............................................................................................................................................... 1393
14.1.1 Introduction to MAC Address Limitation........................................................................................................... 1393
14.1.2 Principles............................................................................................................................................................. 1394
14.1.2.1 Basic Principles of MAC Address Limit.......................................................................................................... 1394
14.1.2.2 Traffic Suppression Principle........................................................................................................................... 1395
14.1.3 MAC Address Limit Applications.......................................................................................................................1396
14.1.4 Terms, Acronyms, and Abbreviations................................................................. 1397
14.2 DHCP Snooping..................................................................................................................................................... 1398
14.2.1 Introduction......................................................................................................................................................... 1398
14.2.2 Principles............................................................................................................................................................. 1398
14.2.2.1 Bogus DHCP Server Attack............................................................................................................................. 1398
14.2.2.2 Middleman Attack and IP/MAC Spoofing Attack........................................................................................... 1399
14.2.2.3 DoS Attack Launched by Changing the Value of the CHADDR Field............................................................1401
14.2.2.4 Format of the Option 82 Field.......................................................................................................................... 1402
14.2.3 Applications.........................................................................................................................................................1404
14.2.4 Terms, Acronyms, and Abbreviations................................................................. 1405
14.3 URPF...................................................................................................................................................................... 1405
14.3.1 Introduction......................................................................................................................................................... 1406
14.3.2 Principles............................................................................................................................................................. 1407
14.3.2.1 Principles of URPF........................................................................................................................................... 1407
14.3.3 Applications.........................................................................................................................................................1408
14.4 Local Attack Defense..............................................................................................................................................1411
14.4.1 Introduction..........................................................................................................................................................1411
14.4.2 Principle of Device Security................................................................................................................................1412
14.4.2.1 Management and Control Plane Protection...................................................................................................... 1412
14.4.2.2 Attack Source Tracing...................................................................................................................................... 1412
14.4.2.3 CP-CAR............................................................................................................................................................1413
14.4.2.4 Whitelist-based Application Layer Association............................................................................................... 1414
14.4.2.5 Alarm................................................................................................................................................................ 1414
14.4.3 Applications.........................................................................................................................................................1414
14.4.3.1 Whitelist-based Application Layer Association............................................................................................... 1414
14.4.3.2 CP-CAR............................................................................................................................................................1415
14.4.4 Acronyms and Abbreviations.............................................................................................................................. 1415
14.4.4.1 Abbreviations....................................................................................................................................................1416
14.5 Mirroring................................................................................................................................................................ 1416
14.5.1 Introduction to Mirroring.....................................................................................................................................1416
14.5.2 Principle...............................................................................................................................................................1417
14.5.2.1 Principle of Local Mirroring.............................................................................................................................1418
14.5.2.2 Application....................................................................................................................................................... 1418
14.6 Online Packet Head Capture...................................................................................................................................1419
14.6.1 Introduction......................................................................................................................................................... 1419
14.6.2 Principles............................................................................................................................................................. 1419
14.6.3 Applications.........................................................................................................................................................1420
14.7 MPAC..................................................................................................................................................................... 1421
14.7.1 Introduction......................................................................................................................................................... 1421
14.7.2 Principles............................................................................................................................................................. 1422
14.8 Keychain................................................................................................................................................................. 1424
14.8.1 Introduction......................................................................................................................................................... 1424
14.8.2 Principles............................................................................................................................................................. 1425
14.8.2.1 Principles of Keychain......................................................................................................................................1425
14.8.3 Applications.........................................................................................................................................................1425
14.8.3.1 Non-TCP Applications of Keychain.................................................................................................................1425
14.8.3.2 TCP Applications of Keychain.........................................................................................................................1427
14.9 IPSec....................................................................................................................................................................... 1430
14.9.1 Introduction......................................................................................................................................................... 1430
14.9.2 Principles............................................................................................................................................................. 1431
14.9.2.1 IPSec Basic Concepts....................................................................................................................................... 1431
14.9.2.2 IPSec Implementation.......................................................................................................................................1435
14.9.3 Applications.........................................................................................................................................................1436
14.9.3.1 IPSec Application in PIM.................................................................................................................................1436
15 User Management..................................................................................................................1439
15.1 AAA and User Management.................................................................................................................................. 1439
15.1.1 Introduction to AAA and User Management...................................................................................................... 1440
15.1.2 Principles............................................................................................................................................................. 1441
15.1.2.1 AAA..................................................................................................................................................................1441
15.1.2.2 RADIUS........................................................................................................................................................... 1443
15.1.2.3 HWTACACS.................................................................................................................................................... 1447
15.1.2.4 User Management.............................................................................................................................................1448
15.1.3 Applications.........................................................................................................................................................1449
15.1.3.1 RADIUS Authentication and Accounting........................................................................................................ 1449
15.1.3.2 HWTACACS Authentication, Accounting, and Authorization........................................................................1449
15.2 DHCP......................................................................................................................................................................1450
15.2.1 DHCP Overview..................................................................................................................................................1450
15.2.2 Principles............................................................................................................................................................. 1451
15.2.2.1 DHCP Overview...............................................................................................................................................1451
15.2.2.2 Introduction to DHCP Messages...................................................................................................................... 1452
15.2.2.3 Description of the Option 82 Field................................................................................................................... 1455
15.2.2.4 Operation Principle of a DHCP Client............................................................................................................. 1457
15.2.2.5 DHCP Relay Principles.................................................................................................................................... 1458
15.2.2.6 Working Principles of a DHCP Server............................................................................................................. 1460
15.2.3 Applications.........................................................................................................................................................1464
15.2.3.1 DHCP Client Application................................................................................................................................. 1464
15.2.3.2 DHCP Server Application................................................................................................................................ 1464
15.2.3.3 DHCP Relay Application................................................................................................................................. 1465
15.3 DHCPv6..................................................................................................................................................................1466
15.3.1 Introduction......................................................................................................................................................... 1466
15.3.2 Principles............................................................................................................................................................. 1467
15.3.2.1 Principles of DHCPv6 Access.......................................................................................................................... 1467
15.3.3 Applications.........................................................................................................................................................1470
15.3.3.1 DHCPv6 Client over PPPoE (Including DHCPv6-PD)................................................................................... 1470
15.3.3.2 DHCPv6 Relay................................................................................................................................................. 1471
15.4 Plug-and-Play......................................................................................................................................................... 1472
15.4.1 Introduction to Plug-and-Play............................................................................................................................. 1472
15.4.2 Principles............................................................................................................................................................. 1472
15.4.2.1 Principle of DHCP............................................................................................................................................1472
15.4.2.2 Operation Process of PnP................................................................................................................................. 1473
15.4.3 Applications.........................................................................................................................................................1475
15.4.3.1 Application of PnP............................................................................................................................................1475
15.5 DCN........................................................................................................................................................................1476
15.5.1 Introduction......................................................................................................................................................... 1476
15.5.2 Principles............................................................................................................................................................. 1477
15.5.2.1 Basic Concepts................................................................................................................................................. 1477
15.5.2.2 Basic DCN Principles....................................................................................................................................... 1478
15.5.2.3 DCN over Service Interfaces............................................................................................................................ 1481
15.5.2.4 Gateway DCN over the Control Plane............................................................................................................. 1483
15.5.2.5 DCN Security................................................................................................................................................... 1484
15.5.3 Applications.........................................................................................................................................................1485
15.5.3.1 Typical DCN Application................................................................................................................................. 1485
15.6 PPPoE Access.........................................................................................................................................................1487
15.6.1 Introduction to the PPPoE................................................................................................................................... 1487
15.6.2 Principles............................................................................................................................................................. 1487
15.6.2.1 PPPoE Negotiation Process for User Login..................................................................................................... 1487
15.6.2.2 PPPoE Packet Format....................................................................................................................................... 1490
15.6.2.3 PPPoE Packet Structure....................................................................................................................................1491
15.6.3 Applications.........................................................................................................................................................1494
15.7 PPPoE+...................................................................................................................................................................1495
15.7.1 Introduction to PPPoE+....................................................................................................................................... 1495
15.7.2 Principles............................................................................................................................................................. 1495
15.7.3 Applications.........................................................................................................................................................1497
15.8 802.1x Access......................................................................................................................................................... 1498
15.8.1 Introduction......................................................................................................................................................... 1499
15.8.2 Principle...............................................................................................................................................................1499
15.8.2.1 Basic Principle of 802.1x Access..................................................................................................................... 1499
15.8.2.2 Authentication Initiation and User Logoff....................................................................................................... 1500
15.8.2.3 EAP Packet Relaying and Termination............................................................................................................ 1501
15.8.2.4 Basic Process of the 802.1x Authentication System........................................................................................ 1501
15.8.3 Applications.........................................................................................................................................................1503
15.9 Attributes List of RADIUS, HWTACACS.............................................................................................................1504
15.9.1 HWTACACS Attribute........................................................................................................................................1504
15.9.2 RADIUS Attributes............................................................................................................................................. 1506
15.9.2.1 Attributes Carried in RADIUS Packets............................................................................................................ 1506
15.9.2.1.1 Attributes in RADIUS Packets...................................................................................................................... 1506
15.9.2.1.2 Attributes in RADIUS COA&DM Packets................................................................................................... 1512
15.9.2.2 Description of RADIUS Attributes.................................................................................................................. 1517

15.9.2.2.1 RADIUS Attributes Defined by RFC............................................................................................................ 1517
1. User-Name (1)............................................................................................................................................................ 1517
2. User-Password (2)...................................................................................................................................................... 1518
3. CHAP-Password (3)................................................................................................................................................... 1518
4. NAS-IP-Address (4)................................................................................................................................................... 1518
5. NAS-Port (5)...............................................................................................................................................................1519
6. Service-Type (6)......................................................................................................................................................... 1520
7. Framed-Protocol (7)................................................................................................................................................... 1521
8. Framed-IP-Address (8)............................................................................................................................................... 1521
9. Framed-IP-Netmask (9).............................................................................................................................................. 1522
10. Filter-Id (11)............................................................................................................................................................. 1522
11. Framed-MTU (12).................................................................................................................................................... 1523
12. Login-IP-Host (14)................................................................................................................................................... 1523
13. Login-Service (15)....................................................................................................................................................1524
14. Reply-Message (18)..................................................................................................................................................1524
15. Callback-Number (19).............................................................................................................................................. 1525
16. Framed-route (22).....................................................................................................................................................1525
17. State (24)...................................................................................................................................................................1526
18. Class (25).................................................................................................................................................................. 1526
19. Vendor-Specific (26).................................................................................................................................................1527
20. Session-Timeout (27)................................................................................................................................................1528
21. Idle-Timeout (28)......................................................................................................................................................1528
22. Termination-Action (29)...........................................................................................................................................1529
23. Called-Station-Id (30)...............................................................................................................................................1530
24. Calling-Station-Id (31)............................................................................................................................................. 1530
25. NAS-Identifier (32).................................................................................................................................................. 1532
26. Proxy-State (33)........................................................................................................................................................1532
27. Acct-Status-Type (40)...............................................................................................................................................1533
28. Acct-Delay-Time (41)...............................................................................................................................................1534
29. Acct-Input-Octets (42)..............................................................................................................................................1534
30. Acct-Output-Octets (43)........................................................................................................................................... 1535
31. Acct-Session-Id (44).................................................................................................................................................1535
32. Acct-Authentic (45).................................................................................................................................................. 1536
33. Acct-Session-Time (46)............................................................................................................................................1537
34. Acct-Input-Packets (47)............................................................................................................................................1537
35. Acct-Output-Packets (48)......................................................................................................................................... 1537
36. Acct-Terminate-Cause (49)...................................................................................................................................... 1538
37. Acct-Multi-Session-Id (50).......................................................................................................................................1539
38. Acct-Input-Gigawords (52)...................................................................................................................................... 1540
39. Acct-Output-Gigawords (53)....................................................................................................................................1540
40. Event-Timestamp (55).............................................................................................................................................. 1541
41. CHAP-Challenge (60).............................................................................................................................................. 1541

42. NAS-Port-Type (61)................................................................................................................................................. 1541
43. Port-Limit (62)..........................................................................................................................................................1542
44. Tunnel-Type (64)...................................................................................................................................................... 1542
45. Tunnel-Medium-Type (65)....................................................................................................................................... 1543
46. Tunnel-Client-Endpoint (66).................................................................................................................................... 1543
47. Tunnel-Server-Endpoint (67)....................................................................................................................................1544
48. Acct-Tunnel-Connection (68)...................................................................................................................................1544
49. Tunnel-Password (69)...............................................................................................................................................1545
50. Connect-Info (77)..................................................................................................................................................... 1545
51. EAP-Message (79)....................................................................................................................................................1546
52. Message-Authenticator (80)..................................................................................................................................... 1546
53. Tunnel-Assignment-ID (82)..................................................................................................................................... 1547
54. Tunnel-Preference (83)............................................................................................................................................. 1547
55. Acct-Tunnel-Packets-Lost (86).................................................................................................................................1548
56. NAS-Port-Id (87)...................................................................................................................................................... 1548
57. Framed-Pool (88)......................................................................................................................................................1549
58. Chargeable-User-Identity (89)..................................................................................................................................1549
59. Tunnel-Client-Auth-ID (90)..................................................................................................................................... 1549
60. Tunnel-Server-Auth-ID (91).....................................................................................................................................1550
61. NAS-IPv6-Address (95)........................................................................................................................................... 1550
62. Framed-Interface-Id (96).......................................................................................................................................... 1551
63. Framed-Ipv6-Prefix (97)...........................................................................................................................................1551
64. Error-Cause (101)..................................................................................................................................................... 1552
15.9.2.2.2 RADIUS Attributes Defined by Huawei+1.1 Protocol (Vendor = 2011)...................................................... 1553
1. HW-Input-Committed-Information-Rate (2).............................................................................................................. 1554
2. HW-Input-Peak-Information-Rate (3)........................................................................................................................ 1554
3. HW-Output-Committed-Information-Rate (5)........................................................................................................... 1554
4. HW-Output-Peak-Information-Rate (6)..................................................................................................................... 1555
5. HW-Input-Kilobytes-Before-Tariff-Switch (7)...........................................................................................................1555
6. HW-Output-Kilobytes-Before-Tariff-Switch (8)........................................................................................................1556
7. HW-Input-Packets-Before-Tariff-Switch (9)..............................................................................................................1556
8. HW-Output-Packets-Before-Tariff-Switch (10)......................................................................................................... 1557
9. HW-Input-Kilobytes-After-Tariff-Switch (11)........................................................................................................... 1557
10. HW-Output-Kilobytes-After-Tariff-Switch (12)...................................................................................................... 1558
11. HW-Input-Packets-After-Tariff-Switch (13).............................................................................................................1558
12. HW-Output-Packets-After-Tariff-Switch (14)..........................................................................................................1559
13. HW-Remanent-Volume (15).....................................................................................................................................1559
14. HW-Tariff-Switch-Interval (16)................................................................................................................................1560
15. HW-Subscriber-QoS-Profile (17)............................................................................................................................. 1560
16. HW-Command (20).................................................................................................................................................. 1560
17. HW-Priority (22).......................................................................................................................................................1561
18. HW-Control-Identifier (24)...................................................................................................................................... 1561

19. HW-Result-Code (25)...............................................................................................................................................1562
20. HW-Connect-ID (26)................................................................................................................................................ 1562
21. HW-Portal-URL (27)................................................................................................................................................ 1563
22. HW-FTP-Directory (28)........................................................................................................................................... 1563
23. HW-Exec-Privilege (29)........................................................................................................................................... 1563
24. HW-SIP-Server (32)................................................................................................................................................. 1564
25. HW-User-Password (33)...........................................................................................................................................1564
26. HW-Command-Mode (34)....................................................................................................................................... 1565
27. HW-Renewal-Time (35)........................................................................................................................................... 1566
28. HW-Rebinding-Time (36).........................................................................................................................................1566
29. HW-Igmp-Enable (37).............................................................................................................................................. 1566
30. HW-NAS-Startup-Time-Stamp (59).........................................................................................................................1567
31. HW-IP-Host-Address (60)........................................................................................................................................ 1567
32. HW-Up-Priority (61)................................................................................................................................................ 1568
33. HW-Down-Priority (62)........................................................................................................................................... 1568
34. HW-Tunnel-VPN-Instance (63)................................................................................................................................1568
35. HW-User-Date (65).................................................................................................................................................. 1569
36. HW-User-Class (66)................................................................................................................................................. 1569
37. HW-Subnet-Mask (72)............................................................................................................................................. 1570
38. HW-Gateway-Address (73)...................................................................................................................................... 1570
39. HW-Lease-Time (74)................................................................................................................................................1570
40. HW-Ascend-Client-Primary-WINS (75).................................................................................................................. 1571
41. HW-Ascend-Client-Second-WIN (76)..................................................................................................................... 1571
42. HW-Tunnel-Session-Limit (80)................................................................................................................................1572
43. HW-Portal-Mode (85)...............................................................................................................................................1572
44. HW-Policy-Route (87)..............................................................................................................................................1572
45. HW-Framed-Pool (88).............................................................................................................................................. 1573
46. HW-L2TP-Terminate-Cause (89)............................................................................................................................. 1573
47. HW-Multicast-Profile-Name (93).............................................................................................................................1574
48. HW-VPN-Instance (94)............................................................................................................................................ 1574
49. HW-Tunnel-Group-Name (96)................................................................................................................................. 1575
50. HW-Client-Primary-DNS (135)............................................................................................................................... 1575
51. HW-Client-Secondary-DNS (136)........................................................................................................................... 1575
52. HW-Domain-Name (138)......................................................................................................................................... 1576
53. HW-ANCP-Profile (139).......................................................................................................................................... 1576
54. HW-HTTP-Redirect-URL (140)...............................................................................................................................1576
55. HW-Qos-Profile-Type (142).....................................................................................................................................1577
56. HW-Max-List-Num (143).........................................................................................................................................1577
57. HW-Acct-ipv6-Input-Octets (144)........................................................................................................................... 1578
58. HW-Acct-ipv6-Output-Octets (145).........................................................................................................................1578
59. HW-Acct-ipv6-Input-Packets (146)......................................................................................................................... 1579
60. HW-Acct-ipv6-Output-Packets (147).......................................................................................................................1579

61. HW-Acct-ipv6-Input-Gigawords (148).................................................................................................................... 1580
62. HW-Acct-ipv6-Output-Gigawords (149)................................................................................................................. 1580
63. HW-DHCPv6-Option37 (150)..................................................................................................................................1580
64. HW-User-Mac (153)................................................................................................................................................. 1581
65. HW-Version (254).....................................................................................................................................................1582
66. HW-Product-ID (255)...............................................................................................................................................1582
15.9.2.2.3 RADIUS Attributes Defined by Redback (Vendor ID = 2352).....................................................................1582
1. Forward-Policy (92)................................................................................................................................................... 1582
1 Basic Configurations
About This Chapter
This chapter describes the basic configuration protocols and features in terms of their
overview, principles, and applications.
1.1 VRP Overview

This chapter introduces the features of the Versatile Routing Platform (VRP).
1.2 Basic Configuration
1.1 VRP Overview

This chapter introduces the features of the Versatile Routing Platform (VRP).
1.1.1 Overview
This section describes the primary functions and the evolution of the VRP.
1.1.1.1 VRP Introduction

The Versatile Routing Platform (VRP) is a Network Operating System (NOS) used by the
data communication products developed by Huawei Technologies Co., Ltd.
A brief definition of the NOS is presented as follows.
NOS
An NOS is system software that provides network access and interconnection services.
The primary functions of the NOS are as follows:
● Allocating and transferring system resources
● Providing network communication services
● Providing user access control and system security management
● Providing application management
With the expansion of network scale and the rapid development of Internet technologies, an
efficient and stable NOS becomes key to guaranteeing network services and service quality.
VRP
As an NOS, the VRP is the nerve center of Huawei products, which range from low-end to
core ATNs and from Ethernet switches to service gateways.
The VRP has the following functions:
● Unified user interface and management interface: a unified real-time operating system
kernel, IP forwarding engine, IP routing, and configuration management plane
● Control plane and interface specifications for the forwarding plane, covering the
interaction between the link layer of various products and the VRP control plane
● Shielding link-layer differences from the network layer through the network interface
layer
1.1.2 VRP Architecture

This section describes the structure and functions of the VRP system planes.
1.1.2.1 NOS Model

The Network Operating System (NOS) models of the VRP are divided into the centralized model
and the distributed model, which are shown in Figure 1-1 and Figure 1-2, respectively.
Figure 1-1 Centralized NOS model of VRP
[Figure labels: Main Processing Unit; Routing Engine (CPU-based); VRP; FIB; Packet Forwarding
Engine (CPU/ASIC/NP-based); FIB; ASIC/NP; I/O Cards]
Figure 1-2 Distributed NOS model of VRP
[Figure labels: Main Processing Unit; Routing Engine (CPU-based); VRP; FIB; Line Processing
Unit; VRP; FIB; Packet Forwarding Engine (CPU/ASIC/NP-based); FIB; ASIC/NP; I/O Cards]
1.1.2.2 System Plane

The VRP is divided into five system planes, as shown in Figure 1-3.
Figure 1-3 VRP architecture
[Figure labels: Service Control Plane (SCP); General Control Plane (GCP); Data Forwarding
Plane (DFP); System Management Plane (SMP); System Service Plane (SSP)]
1.1.2.3 SCP
Figure 1-4 SCP architecture
[Figure labels: Service Control Plane (SCP); Proto Client; AAA; LocalM; CM]
The primary features of the SCP are as follows:
● Connection management
● User authentication, authorization, and accounting
● User administration
1.1.2.4 DFP
Figure 1-5 DFP architecture
[Figure labels: Data Forwarding Plane (DFP); FE API; FEC; FE DRV]
The primary functions of the DFP are as follows:
● Forwarding packets
● Applying QoS policy
● Applying security policy
● Maintaining forwarding information
1.1.2.5 GCP
Figure 1-6 GCP architecture
[Figure: the General Control Plane (GCP) comprises the Multicast, Routing, VPN, Net Interface,
MPLS, IP Stack, Security, and QoS subsystems. Component labels: GMP4/6, MSDP, PIM-SM4/6,
PIM-DM4/6, PIM-SSM4/6, MRM4/6; OSPFv2, OSPFv3, IS-IS, BGP, RIP, RIPng, RM4/6; L2VPN, L3VPN,
VPDN, IPSec VPN, VPN IFM; IFNET, TunnelM; MPLS IFM, LSPM, LDP, CSPF, RSVP-TE, TEDB, LSP Agent;
PPP, ETH, ATM, Tunn, IP Application 4/6, Link Manager, Socket Layer, TCP4/6, UDP4/6, ICMP4/6,
IP4/6; IPSec, Firewall, ACL, NAT, CA, DecSec; BW-M, SAA, QoSM, RSVP]
The primary functions of the GCP are as follows:
● Sockets
● TCP protocol stack
● IP protocol stack
● Network interface layer
● Link layer protocols
● Routing protocols
● QoS
● VPN
● MPLS
1.1.2.6 SMP
Figure 1-7 SMP architecture
[Figure: the System Management Plane (SMP) comprises the Config Management, Information
Management, and Device Management subsystems. Component labels: WebUI, CLI, MML, SNMP, BINA,
CMO; IM, TRACE, Multi-Language; BoardM, HeartBeat, State, CFG-RS, SwitchOver, Hot Plug,
Alarm Proc]
The primary functions of the SMP are as follows:
● System configuration
● Output management
1.1.2.7 SSP
Figure 1-8 SSP architecture
[Figure labels: System Service Plane (SSP); System Function; Kernel; Operating System]
The primary functions of the SSP are as follows:
● Shielding the low-layer operating system from the upper application layer
● Kernel management: management of tasks, memory, messages, time, timers, semaphores,
events, message queues, socket interfaces, and file systems
● System functions: online patching, command line, storage protection, and system
monitoring
1.1.3 VRP System Features

This section covers three system-level features.
1.1.3.1 Componentized Structure

With its componentized structure, the VRP can be tailored and scaled to suit each application,
in addition to offering abundant features.
To be specific, in the VRP:
● All the protocols and features are componentized and can be dynamically controlled
through the License.
● Core components are separated from the hardware platforms, which improves adaptability
and allows cross-platform application.
1.1.3.2 License
The License furnishes the VRP with high flexibility.
The License manages the following items:
● Features (GTL license): A user can access only those features allowed by the License.
● Resources (PAF license): For example, the License sets limits on the number of reserved
routes, on the number of LSPs, and on the number of VPN instances.
Generally, the product price increases along with the quantity of functions purchased.
Functions not required can be removed to save the customer money.
1.1.3.3 HA
High availability (HA) is designed to keep the VRP available 99.999% of the time. In other
words, the unavailable period is about 5 minutes (roughly 5.26 minutes) in a year.
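The relationship between an availability percentage and the yearly downtime it allows can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name is illustrative and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# "Five nines" leaves a budget of a little over five minutes per year.
print(round(max_downtime_minutes(99.999), 2))
```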
To enable HA, the VRP adopts the following mechanisms:
l System level hot standby (HSB)
l System level GR
l Protocol level GR
l Fast reroute (FRR)
1.2 Basic Configuration
1.2.1 Introduction to Basic Configuration

Definition
In configuration management, the terminal service provides the access interface and human-
machine interfaces (HMIs) for users to configure devices.
The login modes include:
l Login through the console port
l Telnet server/client
l Login through Secure Shell (SSH), with password authentication, Elliptic Curve
Cryptography (ECC) authentication, Rivest-Shamir-Adleman (RSA) authentication, or
Digital Signature Algorithm (DSA) authentication
l Login through customized user interfaces providing multiple user authentication and
authorization modes
The file transfer mode provides transmission control for system files and configuration files,
and simple remote management for the file system.

The file transfer modes include:
l FTP client/server
l TFTP client
l SSH FTP (SFTP) client/server
The following sections describe the principles of each feature, by type:
l FTP
l TFTP
l Telnet
l SSH
l User management
l Virtual file system
l Daylight saving time
l Timing restart
Purpose
The terminal service provides the access interface and HMIs for users to configure devices.
File transfer provides transmission control for system files and configuration files, and simple
remote management for the file system.
1.2.2 Principles
1.2.2.1 FTP
As a protocol in the TCP/IP protocol suite, the File Transfer Protocol (FTP), running at the
application layer, is used for transferring files between local and remote hosts over the
Internet. FTP, which is implemented based on the file system, has been widely used during
version upgrade, log downloading and configuration saving.
NOTE
FTP is insecure. Using SFTP is recommended.
FTP is built on the client-server architecture, as shown in Figure 1-9.
Figure 1-9 FTP client/server architecture
(Figure: an FTP server and an FTP client connected across an IP network.)
The ATN provides the following FTP functions:

l FTP server: indicates that the ATN functions as an FTP server to which users can log in
to access files by running the FTP client program.
l FTP client: indicates that the ATN functions as an FTP client that can access files saved
on a remote server. After running the terminal emulation program or using the Telnet
program on a PC to set up a connection to the ATN, a user can set up a connection to a
remote FTP server by using the FTP commands and access files saved on the remote
server.
In addition to file transfer, FTP supports interactive access, format specifications, and
authentication control.
FTP provides common file operations to help users perform simple management over the file
system, in addition to supporting file transfer between hosts. Users can use a PC running the FTP
client program to upload files, download files, and access file directories on the ATN that
functions as an FTP server, or use the FTP client program on the ATN that functions as an
FTP client to transfer files to an FTP server.
Basic Concepts of FTP

Before using FTP, familiarize yourself with the following basic concepts about file transfer:
l File type
– ASCII mode is used for text. Data is converted from the sender's character
representation to "8-bit ASCII" before transmission, and then from "8-bit ASCII" to the
receiver's character representation on receipt.
– Extended Binary-Coded Decimal Interchange Code (EBCDIC) mode requires that
both ends use the EBCDIC character set.
– Binary mode requires that the sender sends each file byte for byte. This mode is
often used to transfer image files and program files.
– Local mode allows two hosts using different file systems to send files in binary bit
streams. The bit stream of each byte is defined by the sender.
NOTE
The ATN supports the ASCII and binary modes. Differences between these two modes are as
follows:
l In ASCII mode, the end of a line is transmitted as a carriage return followed by a line
feed, and each end converts between this form and its local line-ending convention.
l In binary mode, data is transferred without format conversion.
The client can select an FTP transmission mode, but by default the ASCII mode is used. The client
can use a mode switch command to switch between the two modes.
l File structure
– Byte stream structure is also called the file structure. A file is considered as a
continuous byte stream.
– Record structure is used only for text files in either ASCII or EBCDIC mode.
– Page structure files are transferred page by page, with the pages numbered so that the
receiver can store them correctly even if they arrive out of order.
NOTE
The ATN supports both the record structure and the byte stream structure.
l Transfer mode
– Stream mode
Data is sent as a continuous stream. For the file structure, the sender sends an End-
Of-File (EOF) indicator at the end of file transfer to prompt the receiver to close
the data connection. For the record structure, a two-byte control sequence is used to
indicate the end of a record and the end of the file.
– Block mode
FTP breaks a file into several blocks and each block starts with a block header.
– Compressed mode
FTP compresses runs of identical bytes that are sent consecutively.
NOTE
The ATN supports the stream mode.
l port command
The port command specifies the IP address and port number that the client uses for a data
connection. The command format is port a,b,c,d,e,f. a,b,c,d specifies the IP address, in
dotted decimal notation written with commas; e,f, which consists of two decimal numbers,
specifies the port number calculated based on the formula e x 256 + f. For example:
ftp> debug
Debugging On .
ftp> ls
---> PORT 10,164,9,96,5,28
Here, 10.164.9.96 is the IP address; the values 5 and 28 are used to calculate the port
number 1308 (5 x 256 + 28 = 1308).
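The decoding of the port command argument described above can be sketched as follows. This is a minimal illustration, not the ATN implementation; the function name parse_port_argument is hypothetical.

```python
def parse_port_argument(argument: str) -> tuple[str, int]:
    """Split 'a,b,c,d,e,f' into a dotted-decimal address and a port number."""
    fields = [int(part) for part in argument.split(",")]
    if len(fields) != 6 or any(not 0 <= value <= 255 for value in fields):
        raise ValueError(f"expected six values in the range 0-255, got {argument!r}")
    address = ".".join(str(value) for value in fields[:4])
    port = fields[4] * 256 + fields[5]  # e x 256 + f
    return address, port


# The example from the debug output above.
print(parse_port_argument("10,164,9,96,5,28"))
```

Rejecting values outside 0 to 255 keeps the decoded port within the 16-bit range.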
FTP Connections
Figure 1-10 shows the process of file transfer through FTP.
Figure 1-10 File transfer through FTP
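(Figure: on the client, the user works through the User Interface, which drives the User
Protocol Interpreter and the User Data Transfer Function. On the server, the Server Protocol
Interpreter drives the Server Data Transfer Function. Each data transfer function accesses its
local file system. The control connection links the two protocol interpreters, and the data
connection links the two data transfer functions.)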
FTP uses two TCP connections to transfer files. They are:
l Control connection

A control connection is set up between the FTP client and the FTP server. The server
listens on well-known port 21 and waits for a connection request from the client; the
client opens a temporary port and sends a request for setting up a connection to port 21 on
the server.
The control connection stays open for the entire session. It carries commands from the
client to the server and responses from the server to the client.
l Data connection
The server uses port 20 for data connections. Generally, the server can either open or
close a data connection actively. For files sent from the client to the server in the form of
streams, however, only the client can close a data connection.
FTP transfers each file in streams, using an EOF indicator to identify the end of a file.
Therefore, a new data connection is required for each file or directory list to be
transferred. A data connection exists only while a file or directory list is being
transferred between the client and the server.
FTP
In the current system, FTP manages the control connection by using the User Protocol
Interpreter (User-PI) and Server Protocol Interpreter (Server-PI), and transfers files by
using the User Data Transfer Process (User-DTP) and Server Data Transfer Process
(Server-DTP).
l FTP client
The FTP User Interface (UI) provides an interactive command line interface (CLI) for
users. It receives and interprets command lines input by users and offers help
information. After receiving a command on the UI, FTP triggers User-PI to convert the
command into a standard FTP command, and then manages the control connection to the
FTP server.
– After a login command is input, User-PI creates a control connection between the
client and the server.
– After a directory operation command is input, User-PI sends and receives control
data between the client and the server.
– After a file transfer command is input, User-PI enables User-DTP to transfer files
between the client and the server. User-DTP is responsible for creating a data
connection to the FTP server for data exchange. The data connection is temporarily
set up. That is, a data connection is set up when files or directory lists need to be
transferred and disconnected when the transfer process is complete or a
disconnection request is received.
l FTP server
Server-PI listens to FTP standard port 21 to wait for connection requests from the FTP
client. After receiving a login connection request from the FTP client, the FTP server
handles the request and sends a reply.
– After a login command is received, the login authentication process is triggered. If
the login authentication succeeds, a control connection to the FTP client is set up.
– After a file transfer command is received, Server-DTP and User-DTP are triggered to
create a data connection to transfer files.
Server-DTP supports both active and passive data connection requests. By default,
Server-DTP works in active mode.

When Server-DTP is transferring data, a user can forcibly disconnect the connection.
Upon receiving a disconnection request, Server-DTP stops transferring data and
disconnects the connection. Normally, a data connection is automatically disconnected
when file transfer is complete.
Process of Setting Up an FTP Connection

The process of setting up an FTP data connection by using active mode is as follows:
1. The server enables port 21 to wait for a connection request from the client.
2. The client sends a connection request to the server.
3. After the request is received, a control connection is set up between the temporary port
on the client and port 21 on the server.
4. The client sends a command for setting up a data connection to the server.
5. The client chooses a temporary port for the data connection and sends the port number
by using the port command to the server over the control connection.
6. The server sends a request to the client for setting up a data connection to the temporary
port on the client.
7. After the request is received by the client, the data connection between the temporary
port on the client and port 20 on the server is set up.
The process of setting up an FTP data connection by using passive mode is as follows:
1. The server enables port 21 to wait for a connection request from the client.
2. The client sends a connection request to the server.
3. After the request is received, a control connection is set up between the temporary port
on the client and port 21 on the server.
4. The client sends a command for setting up a data connection to the server.
5. The client sends a command string PASV to the server to request the port number.
6. The server chooses a temporary port for the data connection and sends the port number
to the client over the control connection.
7. The client sends a request to the server for setting up a data connection to the
temporary port on the server.
8. The data connection between the temporary port on the client and the temporary port for
the data connection on the server is set up.
Figure 1-11 Process of setting up an FTP connection

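(Figure: the FTP client at 10.168.2.45/32 sets up the control connection from port 2345 to
port 21 on the FTP server and sends PORT 10,168,2,45,9,42 over it. The server then sets up the
data connection from port 20 to port 2346 on the client.)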

Figure 1-11 shows the process of setting up an FTP connection, assuming that the number of
the temporary port for the control connection is 2345 and the number of the temporary port
for the data connection is 2346.
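The port command shown in Figure 1-11 can be reproduced from the client address and the temporary data port. The following is a minimal sketch; the function name format_port_command is hypothetical.

```python
def format_port_command(address: str, port: int) -> str:
    """Build 'PORT a,b,c,d,e,f' from an IPv4 address and a data port number."""
    octets = address.split(".")
    if len(octets) != 4 or not 0 <= port <= 65535:
        raise ValueError("expected an IPv4 address and a 16-bit port number")
    high, low = divmod(port, 256)  # port = high x 256 + low
    return "PORT " + ",".join(octets + [str(high), str(low)])


# Client 10.168.2.45 with temporary data port 2346, as in Figure 1-11.
print(format_port_command("10.168.2.45", 2346))
```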
1.2.2.2 TFTP
The Trivial File Transfer Protocol (TFTP) is a simple protocol for file transfer.
To keep the implementation simple and its packets small, TFTP uses the User Datagram
Protocol (UDP) as its transport protocol.
Compared with FTP, TFTP does not require complicated interaction interfaces and
authentication control. Therefore, TFTP is applicable in a networking environment without
complicated interactions between the client and the server. For example, you can obtain the
memory image of the system through TFTP when the system is started up.
Presently, the ATN implements the TFTP client rather than the TFTP server. The TFTP client
can upload and download files.
Basic Concepts of TFTP

l Operation code
The TFTP packet header contains a two-byte operation code, with values defined as follows:
– 1: Read request (RRQ)
– 2: Write request (WRQ)
– 3: Data (DATA): indicates a data packet.
– 4: Acknowledgment (ACK): indicates a positive reply packet.
– 5: Error (ERROR): indicates an error packet.
l File type
TFTP supports the following file types:
– Binary type: is used to transfer program files.
– ASCII type: is used to transfer text files.
Currently, the ATN can act only as the TFTP client and only the binary transfer type is
available.
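The two-byte operation code and the binary file type can be illustrated by building a request packet. The following is a minimal sketch that assumes the RFC 1350 request layout (operation code, file name, zero byte, mode string, zero byte), where the binary type is named "octet". The file name vrp.cc and the function names are hypothetical.

```python
import struct

# Operation codes as listed above.
OPCODES = {"RRQ": 1, "WRQ": 2, "DATA": 3, "ACK": 4, "ERROR": 5}


def build_request(kind: str, filename: str, mode: str = "octet") -> bytes:
    """Build an RRQ or WRQ packet: opcode, file name, 0, mode, 0."""
    if kind not in ("RRQ", "WRQ"):
        raise ValueError("only RRQ and WRQ are request packets")
    header = struct.pack("!H", OPCODES[kind])  # two bytes, network byte order
    return header + filename.encode("ascii") + b"\x00" + mode.encode("ascii") + b"\x00"


def read_opcode(packet: bytes) -> str:
    """Return the name of the operation code carried in the first two bytes."""
    (code,) = struct.unpack("!H", packet[:2])
    for name, value in OPCODES.items():
        if value == code:
            return name
    raise ValueError(f"unknown operation code {code}")


request = build_request("RRQ", "vrp.cc")
print(request, read_opcode(request))
```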
Basic Principle of TFTP

l A user name and password are not required.
This is because TFTP is designed for the bootstrap process.
l TFTP transfer
The client initiates the TFTP transfer.
– To download files, the client sends an RRQ to the server. The server then accepts
the request and sends a data packet to the client. After receiving the data packet, the
client sends an ACK packet to the server.
– To upload files, the client sends a WRQ to the server. After the server accepts the
request, the client sends a data packet to the server and waits for an ACK packet
from the server.
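The download exchange described above, where each data packet is answered by an ACK packet, can be simulated in memory without any network access. The following is a minimal sketch under assumptions taken from RFC 1350 rather than from this document: each DATA packet carries a block number and up to 512 data bytes, and a block shorter than 512 bytes ends the transfer. Timeouts and retransmission are left out, and all names are illustrative.

```python
import struct

BLOCK_SIZE = 512  # data bytes per DATA packet (RFC 1350 assumption)
DATA, ACK = 3, 4  # operation codes


def server_data_packets(content: bytes):
    """Yield the DATA packets a server would send for one file."""
    block, offset = 1, 0
    while True:
        chunk = content[offset:offset + BLOCK_SIZE]
        yield struct.pack("!HH", DATA, block) + chunk
        if len(chunk) < BLOCK_SIZE:  # a short block marks the end of the file
            return
        block += 1
        offset += BLOCK_SIZE


def download(content: bytes) -> tuple[bytes, list[bytes]]:
    """Receive every DATA packet and record the ACK sent for each block."""
    received = bytearray()
    acks = []
    for packet in server_data_packets(content):
        opcode, block = struct.unpack("!HH", packet[:4])
        if opcode != DATA:
            raise ValueError("expected a DATA packet")
        received += packet[4:]
        acks.append(struct.pack("!HH", ACK, block))
    return bytes(received), acks


data, acks = download(bytes(1000))
print(len(data), len(acks))
```

An upload works the same way with the roles reversed: the client sends the DATA packets and the server returns the ACK packets.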

1.2.2.3 Introduction to Telnet

The Telecommunication Network Protocol (Telnet) originated on the ARPANET in 1969 and is one
of the earliest Internet applications. Telnet enables a terminal to remotely log
in to a server and provides an interactive operation interface. Through Telnet, a login user of
one host can log in to other hosts to configure and manage them without being physically
connected to each of them.
NOTE
Telnet is insecure. Using STelnet is recommended.
Basic Concepts of Telnet

l NVT
The Network Virtual Terminal (NVT) is a virtual device to and from which both ends of a
Telnet connection, the client and the server, map their real terminals. By using
the NVT, Telnet can operate between any hosts (running any operating system) or terminals.
That is, the client operating system must map to the NVT whatever type of terminal the
user is using. The server must then map the NVT to whatever terminal type the server
supports.
Figure 1-12 shows conversion between physical terminals and the NVT.
Figure 1-12 Conversion between physical terminals and the NVT
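(Figure: the Telnet client converts between the local character set of the terminal and the
NVT character set, which is carried across the Internet. The Telnet server converts between
the NVT character set and the remote character set used by its terminal driver.)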
l NVT ASCII
NVT ASCII is a 7-bit ASCII character set. Each 7-bit character is sent as an 8-bit byte,
with the high-order bit set to 0. Several protocols in the Internet protocol suite,
including FTP and the Simple Mail Transfer Protocol (SMTP), use NVT ASCII.
l IAC
Telnet uses in-band signaling in both directions. The byte 0xff is called the Interpret As
Command (IAC). The next byte is the command byte.
Commands and their meanings are listed as follows:
– SE: suboption end
– SB: suboption begin
– WILL: option negotiation
– WONT: option negotiation
– DO: option negotiation
– DONT: option negotiation
– IAC: data byte 255

Table 1-1 Telnet command set defined in RFCs

Name Code (Decimal Notation) Description
EOF 236 End of file
SUSP 237 Suspend current process (job control)
ABORT 238 Abort process
EOR 239 End of record
SE 240 Suboption end
NOP 241 No operation
DM 242 Data mark
BRK 243 Break
IP 244 Interrupt process
AO 245 Abort output
AYT 246 Are you there?
EC 247 Escape character
EL 248 Erase line
GA 249 Go ahead
SB 250 Suboption begin
WILL 251 Option negotiation
WONT 252 Option negotiation
DO 253 Option negotiation
DONT 254 Option negotiation
IAC 255 Data byte
l Telnet connection
A Telnet connection is a TCP connection used to transmit data with Telnet control
information.
l Telnet client/server mode
Telnet adopts the client/server mode. Figure 1-13 shows the schematic diagram of the
Telnet client/server mode.

Figure 1-13 Schematic diagram of the Telnet client/server mode
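(Figure: on the client side, a user at a terminal uses the Telnet client, which sits above the
terminal driver and TCP/IP in the kernel. On the server side, the Telnet server sits above
TCP/IP and the pseudo terminal driver in the kernel, and the login shell attaches to the
pseudo terminal. The client and the server communicate over a TCP connection.)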
The preceding diagram shows that:

– Telnet uses TCP.
– All echo messages of the Telnet connection are output to the terminal.
– The server interacts directly with the pseudo terminal.
– Commands and data are transmitted between the server and the client through the
TCP connection.
– The client logs in to the server.
Principle of Telnet
Telnet is designed to operate between any two hosts or terminals. The client operating system
maps to the NVT whatever type of terminal the user is using. The server then maps the NVT
to whatever terminal type the server supports. The types of clients and terminals are ignored.
Communication ends are simply assumed as being connected to the NVTs.
NOTE
Telnet adopts the symmetric mode. Theoretically, there must be an NVT at each of the two ends of a
Telnet connection.
The two ends of a Telnet connection send WILL, WONT, DO, or DONT requests for option
negotiation. The options to be negotiated include echo, changing the command character set,
and line mode.
This section describes the operating principles of Telnet:
l Requests in a Telnet connection
Either end of a Telnet connection can initiate a request to the other end. Table 1-2 shows
different requests and their meanings.
Table 1-2 Description of requests for a Telnet connection

Request  Description                              Response
                                                  WILL              WONT                     DO                DONT
WILL     Sender wants to enable option            -                 -                        Receiver says OK  Receiver says NO
WONT     Sender wants to disable option           -                 -                        -                 Receiver must say OK
DO       Sender wants receiver to enable option   Receiver says OK  Receiver says NO         -                 -
DONT     Sender wants receiver to disable option  -                 Receiver must say OK(1)  -                 -
NOTE
When the sender sends an "option disable" request, such as WONT and DONT, the receiver must
accept the request.
When the sender sends an "option enable" request, such as WILL and DO, the receiver can either
accept or reject the request.
l If the receiver accepts the request, the option is enabled immediately.
l If the receiver rejects the request, the option remains disabled, and the sender continues
to operate with the basic NVT features.
l Option negotiation
Option negotiation requires three bytes: the IAC byte, the command byte (WILL, DO, WONT,
or DONT), and the option ID.
The following example illustrates the process of option negotiation.
The server needs to enable "remote flow control" with the option ID 33, and the
client grants the request. The commands exchanged between the server and client are as
follows:
– On the server: <IAC,WILL,33>
– On the client: <IAC,DO,33>
l Suboption negotiation
Certain options require more information than the option ID. For example, if the sender
requires the receiver to specify the terminal type, the receiver must respond with an
ASCII string to specify the terminal type.
The format of the commands for suboption negotiation is as follows:
< IAC, SB, option code, contents of suboption, IAC, SE >
A complete process of suboption negotiation is as follows:
– The sender sends a DO or WILL command carrying an option ID to request that the
option be enabled.
– The receiver returns a WILL or DO command carrying the option ID to accept the
request. After these two steps, both ends agree to enable the option.
– One end of the connection starts suboption negotiation by sending a request
composed of the SB, suboption ID, and SE in sequence.
– The opposite end responds to the request for suboption negotiation by sending a
command composed of the SB, suboption ID, related negotiation information, and
SE in sequence.
– The receiver returns a DO or WILL command to accept the negotiation information
about the suboption.
If there are no additional suboptions to be negotiated, the negotiation ends.
NOTE
In the preceding process, the receiver is assumed to accept the request from the sender. In practice,
the receiver can reject requests from the sender at any time as required.
The following example illustrates the process of terminal type negotiation.
The client needs to enable the "terminal type" with the option ID 24. The server grants
the request and sends a request for querying the client terminal type. The client then
sends to the server another request carrying its terminal type "DELL PC". The
commands exchanged between the server and client are as follows:
– On the client: <IAC, WILL, 24>
– On the server: <IAC, DO, 24>
– On the server: <IAC, SB, 24, 1, IAC, SE>
– On the client: <IAC, SB, 24, 0, "D", "E", "L", "L", " ", "P", "C", IAC, SE>
NOTE
l Only the sender that sends the DO command can request terminal type information.
l Only the sender that sends the WILL command can provide terminal type information.
Terminal type information cannot be sent automatically but only in request-response mode.
The terminal type is an NVT ASCII string of case insensitive characters.
l Operating modes
Telnet has the following operating modes:
– Half-duplex
– Character at a time
– Line at a time
– Line mode
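The request and response rules in Table 1-2 and the command formats for option and suboption negotiation can be sketched in a few functions. This is a minimal illustration, not the ATN implementation. The command codes come from Table 1-1, the values 1 and 0 that follow option ID 24 come from the terminal type example, and the names SEND and IS are the labels RFC 1091 gives those values. All function names are hypothetical.

```python
IAC, DONT, DO, WONT, WILL, SB, SE = 255, 254, 253, 252, 251, 250, 240
TERMINAL_TYPE = 24
IS, SEND = 0, 1  # qualifiers used in the terminal type suboption


def negotiate(command: int, option: int) -> bytes:
    """Build the three-byte option negotiation command <IAC, command, option>."""
    return bytes([IAC, command, option])


def reply_to(command: int, accept: bool) -> int:
    """Choose the response command according to Table 1-2."""
    if command == WILL:
        return DO if accept else DONT
    if command == DO:
        return WILL if accept else WONT
    if command == WONT:  # a request to disable must be accepted
        return DONT
    if command == DONT:
        return WONT
    raise ValueError(f"not a negotiation command: {command}")


def suboption(option: int, payload: bytes) -> bytes:
    """Build <IAC, SB, option, payload, IAC, SE>, doubling any 255 in the payload."""
    escaped = payload.replace(bytes([IAC]), bytes([IAC, IAC]))
    return bytes([IAC, SB, option]) + escaped + bytes([IAC, SE])


# Terminal type negotiation with the terminal type "DELL PC" from the example.
exchange = [
    negotiate(WILL, TERMINAL_TYPE),                      # client offers the option
    negotiate(reply_to(WILL, True), TERMINAL_TYPE),      # server accepts with DO
    suboption(TERMINAL_TYPE, bytes([SEND])),             # server asks for the type
    suboption(TERMINAL_TYPE, bytes([IS]) + b"DELL PC"),  # client reports the type
]
for message in exchange:
    print(list(message))
```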
Telnet Services Provided by the ATN

The ATN provides the following Telnet services:
l Telnet server
A user runs the Telnet client application on a PC to log in to the ATN and then configure
and manage it.
The standard port number for a Telnet server is 23. If attackers access the standard port
continuously, the bandwidth is consumed and the performance of the server is degraded.
As a result, legitimate users cannot access the port.
In this case, you can configure another port number to replace the standard port number
23. Attackers who do not know the new port number will still send requests for socket
connections to port 23. The Telnet server will reject the requests after detecting the
wrong port number. This effectively prevents bandwidth consumption and waste of
system resources caused by an attack on the standard Telnet server port.
l Telnet client

After running the terminal emulation program or Telnet client application on a PC to
connect to the ATN, a user runs the telnet command on the ATN to log in to another device
and manage it. As shown in Figure 1-14, ATN A can function as both a Telnet server and a
Telnet client.
Figure 1-14 ATN A functioning as a Telnet client

(Figure: the PC sets up Telnet session 1 to the ATN, and the ATN sets up Telnet session 2 to
the CX600, which functions as the Telnet server.)
l Terminal redirection
As shown in Figure 1-15, a user runs the Telnet client application and logs in to the ATN
through a specified port, and then sets up connections with the devices connected to the
ATN through asynchronous serial interfaces. The typical application is that the devices
directly connected to the ATN through asynchronous serial interfaces are remotely
configured and maintained.
Figure 1-15 Terminal redirection

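(Figure: a PC connects to the ATN over Ethernet. The asynchronous serial interfaces of the
ATN, shown as Async0, Async1, Async2, and up to Async8/16, connect to CX600-1, a LAN switch,
a modem, and CX600-2.)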
NOTE
Only the ATNs having asynchronous serial interfaces support terminal redirection.
1.2.2.4 SSH
SSH is short for Secure Shell. Its standard port number is 22.

Data transmission in Telnet mode is prone to attack, because Telnet does not have a secure
authentication mode and uses TCP to transmit data in plain text. Simple Telnet access is also
vulnerable to Denial of Service (DoS) attacks, IP address spoofing, and route spoofing.
With the increasing emphasis on network security, data transmission in plain text used by
traditional Telnet and FTP is becoming unacceptable. SSH is a network security protocol that
provides secure remote access and other secure network services on an insecure network by
encrypting network data.
SSH uses TCP to exchange data and builds a secure channel based on TCP. In addition to
standard port 22, SSH supports access through other service ports to prevent attacks.
SSH supports password authentication, Elliptic Curve Cryptography (ECC) authentication,
Digital Signature Algorithm (DSA) authentication, and Rivest-Shamir-Adleman (RSA)
authentication. It uses Data Encryption Standard (DES), 3DES, and Advanced Encryption
Standard (AES) encryption to prevent password interception, which protects the integrity and
reliability of the data and secures data transmission. In particular, ECC, RSA, and DSA
authentication supports the combined use of symmetric and asymmetric encryption. This
implements secure key exchange and ultimately secures the session.
By virtue of data encryption in transmission and more secure authentication, SSH is widely
used and has become one of the more important network protocols.
SSH has two versions: SSH1 (SSH 1.5) and SSH2 (SSH 2.0). The two versions are
incompatible. SSH 2.0 is superior to SSH 1.5 in security, functions, and performance.
NOTE
SSH in this chapter refers to SSH2.0, unless otherwise specified.
Devices that can function as the STelnet client and server support both SSH1 (SSH 1.5) and
SSH2 (SSH 2.0). Devices that can function as the SFTP client and server support SSH2 (SSH
2.0).
Secure Telnet (STelnet) enables users to remotely and securely log in to the device, and
provides the interactive configuration interface. All data exchanges based on STelnet are
encrypted. This ensures the security of sessions.
The SSH File Transfer Protocol (SFTP) enables users to log in to the device securely for file
management from a remote device. This improves the security of data transmission for the
remote system update. Meanwhile, the client function provided by SFTP enables users to log
in to the remote device for secure file transmission.
Basic Concepts of SSH

l SFTP
SFTP guarantees secure file transfer over an insecure network by authenticating the
client and encrypting data in bidirectional mode.
l STelnet
STelnet ensures secure Telnet services. It guarantees secure remote login on a traditional
insecure network by authenticating the client and encrypting data in bidirectional mode.
l RSA authentication
RSA authentication is based on the private key of the client. RSA is a public key
cryptosystem, that is, an asymmetric encryption algorithm, whose security relies on the
difficulty of factoring large numbers. RSA is mainly used to transmit the keys of the
symmetric encryption algorithm, which improves encryption efficiency and simplifies key
management.

The server checks whether the SSH user, public key, and digital user signature are valid.
If all of them are valid, the user is permitted to access the server; if any of them is
invalid, the authentication fails and the user is denied access.
l DSA authentication
The Digital Signature Algorithm (DSA) is an asymmetric algorithm used for
authenticating clients. A DSA key pair consists of a public key and a private key.
As with RSA authentication, the server checks whether the SSH user, public key, and digital
user signature are valid. If all of them are valid, the user is permitted to access the
server; if any of them is invalid, the authentication fails and the user is denied access.
DSA authentication differs from RSA authentication in that it uses the DSA signature
algorithm. It is also widely used.
– In many cases, SSH only supports DSA to authenticate the server and the client.
– In SSH, DSA authentication takes precedence over RSA authentication.
l ECC authentication
The differences between the ECC and RSA algorithms are as follows:
– The RSA algorithm is based on large number factorization, which requires long keys.
Long keys slow down computing and complicate key storage and management.
– The ECC algorithm is based on the elliptic curve discrete logarithm problem, which makes
it difficult to crack. It provides the same security as the RSA algorithm with a shorter key.
Compared with the RSA algorithm, the ECC algorithm has the following advantages:
– Provides the same security with a shorter key length.
– Features a shorter computing process and higher processing speed.
– Requires less storage space.
– Requires lower bandwidth.
l Password authentication
Password authentication is based on the user name and password.
On the server, the AAA module assigns a login password to each authorized user. The
server stores the mappings between user names and passwords. When a user requests
access to the server, the server authenticates the user name and password. If either of them
fails to pass authentication, the access is denied.
l ECC-password authentication, RSA-password authentication and DSA-Password
authentication
The server can authenticate the client by checking both the public key and the password.
It allows user access only when both public key and password are consistent with those
configured on the server.
l ALL authentication
The server can authenticate the client by checking either the public key or the password.
It allows user access when either the public key or the password is consistent with that
configured on the server.
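The way the authentication modes above combine the public key check and the password check can be summarized as a small decision function. This is a minimal sketch: the mode labels are illustrative strings, not CLI keywords, and the two boolean inputs stand for the results of checks performed elsewhere.

```python
PUBLIC_KEY_MODES = {"rsa", "dsa", "ecc"}
COMBINED_MODES = {"rsa-password", "dsa-password", "ecc-password"}


def access_allowed(mode: str, public_key_ok: bool, password_ok: bool) -> bool:
    """Decide whether to admit a user, given the results of the two checks."""
    if mode == "password":
        return password_ok
    if mode in PUBLIC_KEY_MODES:
        return public_key_ok
    if mode in COMBINED_MODES:
        return public_key_ok and password_ok  # both checks must pass
    if mode == "all":
        return public_key_ok or password_ok   # either check is sufficient
    raise ValueError(f"unknown authentication mode: {mode}")


# A valid password with an invalid key passes ALL but not RSA-password.
print(access_allowed("all", False, True), access_allowed("rsa-password", False, True))
```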

SSH Features Supported by the Device

l Basic SSH functions
– Different encryption algorithms for incoming and outgoing data
– Different MAC algorithms for incoming and outgoing data
– Encryption algorithms of 3DES, DES, Advanced Encryption Standard (AES128),
AES256, AES128_CTR, and AES256_CTR
– HMAC authentication algorithms of SHA1, SHA1-96, SHA2-256, SHA2-256-96,
MD5 and MD5-96
– DH_Group1, DH_Exchange_Group, and DH_Group14 algorithms for key
exchange
– Public key format of SSH-RSA
– Public key format of SSH-DSA
– Public key format of SSH-ECC
– Key re-exchange (It indicates renegotiation of the key. During this process, the
algorithm and the key used for the algorithm are negotiated.)
– Public key authentication and password authentication
l SSH client function
The SSH client function allows users to establish SSH connections with a UNIX host or
the device supporting the SSH server. Figure 1-16 and Figure 1-17 show the
establishment of an SSH connection in the Local Area Network (LAN) and in the Wide
Area Network (WAN) respectively.
Figure 1-16 Establishing an SSH connection in a LAN

(Figure: a workstation, a server, a laptop, and a PC running the SSH client connect to the ATN
over 100BASE-TX Ethernet.)
Figure 1-17 Establishing an SSH connection in a WAN
(Figure: a PC running the SSH client on the local LAN connects through the ATN and across the
WAN to an SSH router and a PC on the remote LAN.)

l SSH for SFTP

SFTP is based on SSH2.0. It guarantees secure file transfer on a traditional insecure
network by authenticating the client and encrypting data in bidirectional mode.
An SFTP-enabled device can provide the following functions:
– Acting as the SFTP client or the SFTP server
– Enabling or disabling SFTP services (By default, SFTP services are
disabled.)
– Setting the default directory that the SFTP client is allowed to access
l SSH for STelnet
An STelnet-enabled device can provide the following functions:
– Acting as the STelnet client or the STelnet server
– Enabling or disabling STelnet services (By default, STelnet services
are disabled.)
l SSH for non-standard ports
The standard SSH listening port number is 22. When attackers continuously access the
port, the bandwidth and performance of the server are reduced and authorized users are
prevented from accessing this port. This is known as a DoS attack.
To address the problem, you can change the listening port to another port on the SSH
server. This prevents attackers from consuming bandwidth and system resources.
Authorized users can still access the SSH server through the non-standard port, which
reduces exposure to DoS attacks.
Applications of this function are as follows:
– The STelnet client can access the server using a non-standard port.
– The listening port can be set on the SSH server.
Principles of SSH
SSH uses the traditional client/server (C/S) application model. Its security is guaranteed by
using the following modes:
Data encryption: Through the negotiation between the client and the server, an encryption key
is generated and used in data symmetric encryption. This ensures confidentiality during data
transmission.
Data integrity: Through the negotiation between the client and the server, an integrity key is
generated and used to uniquely identify a session link. All session packets are identified by
the integrity key. Any modifications made by the third party during transmission can be
discovered by the receiver based on the integrity key. The receiver can discard these modified
packets to ensure the data integrity.
Authority authentication: There are multiple authentication modes. Authority authentication
allows only valid users to have a session with the server, improving system security and
safeguarding the benefits of valid users.
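The data integrity mode described above can be illustrated with a minimal Python sketch. It assumes HMAC with SHA2-256 (one of the HMAC algorithms listed for the SSH server) and a hypothetical fixed integrity key; in a real session the key comes out of the negotiation. The function names protect and verify are illustrative and not part of any device interface.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32  # length of an HMAC-SHA-256 tag in bytes


def protect(integrity_key: bytes, payload: bytes) -> bytes:
    """Append an HMAC tag so that the receiver can detect modification."""
    tag = hmac.new(integrity_key, payload, hashlib.sha256).digest()
    return payload + tag


def verify(integrity_key: bytes, packet: bytes) -> Optional[bytes]:
    """Return the payload if the tag matches, or None if the packet must be discarded."""
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(integrity_key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    return payload if hmac.compare_digest(tag, expected) else None
```

A packet altered by a third party fails verification and is dropped, which is the behavior the paragraph on data integrity describes.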
Establishment of an SSH Connection

The SSH connection goes through six phases during the entire communication process, as
shown in Figure 1-18. The SSH connection is established through negotiation. The following
is the entire SSH negotiation procedure.

Figure 1-18 Establishment of an SSH connection
(Figure: the six phases in order: Version Negotiation, Algorithm Negotiation, Key Exchange, User Authentication, Session Request, Interactive Session)
1. Version negotiation
In the version negotiation phase, the SSH client sends a request for setting up a TCP
connection to the SSH server. After the TCP connection is set up, the SSH server and
SSH client negotiate the SSH version. Different protocol versions correspond to
different state machine processes, so both ends must settle on one version. If the version
of the client matches that of the server, negotiation proceeds to the next phase; otherwise,
the SSH server tears down the TCP connection.
2. Algorithm negotiation
In the algorithm negotiation phase, the sender sends algorithm negotiation messages to
the receiver, together with their parameters, such as the random cookie, key exchange
algorithm, host key algorithm, Message Authentication Code (MAC) method, and
supported language.
After receiving these algorithm negotiation messages, the receiver compares the received
algorithm list set with the local algorithm list set. If the key exchange algorithm, public
key encryption algorithm, or MAC algorithm is not found, the receiver tears down the
connection with the sender and the algorithm negotiation fails.
3. Key exchange
After the server and client negotiate the version, the server sends the client a packet
containing the server's host public key, the server public key, the supported encryption
algorithm, the authentication algorithm, the protocol extension flag, and an 8-byte
cookie. This packet is sent in plain text. Then, the server and client calculate a 16-byte
session ID using the same parameters. The client also randomly generates a 32-byte
session key used to encrypt data. The client does not send the session key to the server
directly. Instead, it XORs the most-significant 16 bytes of the session key with the 16-byte
session ID. The client then arranges the result using the Most Significant Bit (MSB)
first rule and obtains a multiple precision (MP) integer. Then the client encrypts the MP
integer using the public key with the smaller modulus, arranges the result using the
MSB first rule again, and obtains a new value. Then the client uses the public key with the
larger modulus to encrypt the new value.
The server is now in the waiting state. When receiving a key generation message from
the client, the server returns a key generation message to the client, which indicates
that key exchange is complete and that the new key should be used for communications.
If the server fails to receive a key generation message from the client, it returns a key
exchange failure message and tears down the connection.
4. User authentication
After obtaining the session key, the SSH server authenticates the SSH client. The SSH
client sends the identity information to the SSH server. After a specific authentication
mode is configured on the SSH server, the client sends an authentication request. If the
authentication fails or the connection with the server times out, the connection is
terminated.
The SSH server authenticates a user in one of the following methods:
– In ECC, RSA, or DSA authentication, the client generates an ECC, RSA, or DSA key
pair and sends the public key to the server. When a user initiates an authentication
request, the client randomly generates a text encrypted with the private key and
sends it to the server. The server decrypts it by using the public key. If decryption
succeeds, the server considers this user trustable and grants access rights. If
decryption fails, the server tears down the connection.
– Password authentication is implemented based on AAA. Like Telnet and FTP, SSH
supports local database authentication and remote RADIUS server authentication.
The SSH server compares the user name and password of an SSH client with the
pre-configured ones. If both are matched, authentication succeeds.
5. Session request
After user authentication is completed, the client sends a session request to the server.
The session requests include the running of Shell and commands. At the same time, the
server waits to process the request from the client. During this phase, the server responds
to the client with an SSH_SMSG_SUCCESS message after successfully processing a
request from the client. If the server fails to process or identify the request, it responds
with an SSH_SMSG_FAILURE message.
Possible causes for the session request failure are as follows:
– The server fails to process the request.
– The server cannot identify the request.
6. Interactive session
After the session request is accepted, the SSH connection enters the interactive session
mode. In this phase, data is transmitted bidirectionally.
a. The client sends a packet with the encrypted command to the server.
b. After receiving the packet, the server decrypts the packet and runs the command.
Then, the server packages the encrypted command execution results and sends the
packet to the client.
c. Upon receiving the packet, the client decrypts it and displays the command
execution results on the terminal.
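Two of the steps above lend themselves to a short Python sketch: the algorithm negotiation in phase 2 and the XOR step of the key exchange in phase 3. The sketch rests on stated assumptions. The category names ("kex", "mac") are hypothetical, the rule of taking the first algorithm in the sender's list that the receiver also supports is an assumed selection order, and keeping the lower 16 bytes of the session key unchanged after the XOR is an assumed reading of the text. The public key encryption steps are omitted.

```python
from typing import Dict, List


def negotiate(sender: Dict[str, List[str]],
              receiver: Dict[str, List[str]]) -> Dict[str, str]:
    """Choose one algorithm per category, or fail if a category has no match."""
    chosen: Dict[str, str] = {}
    for category, offered in sender.items():
        local = receiver.get(category, [])
        match = next((alg for alg in offered if alg in local), None)
        if match is None:
            # Per the text, the receiver tears down the connection in this case.
            raise ConnectionError(f"no common {category} algorithm")
        chosen[category] = match
    return chosen


def mask_session_key(session_key: bytes, session_id: bytes) -> bytes:
    """XOR the most-significant 16 bytes of a 32-byte key with the session ID."""
    if len(session_key) != 32 or len(session_id) != 16:
        raise ValueError("expected a 32-byte session key and a 16-byte session ID")
    head = bytes(k ^ s for k, s in zip(session_key[:16], session_id))
    return head + session_key[16:]  # assumption: lower 16 bytes are unchanged
```

Because XOR is its own inverse, a peer that knows the session ID can apply mask_session_key again to recover the original key.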
1.2.2.5 User Management

Users can log in to the device to configure, monitor, and maintain local or remote network
devices only after user interfaces, user management, and terminal services are configured.
User interfaces provide the login place; user management ensures login security; terminal
services offer login protocols.
The device supports the following login modes:

l Login through the console port

l Local or remote login through Telnet or SSH
User management, consisting of user interface configurations, user view configurations, and
terminal services, provides users' secure login and operations, thus implementing unified
management over different user interfaces.
User Interface
A User Interface (UI), which is presented in the form of a user interface view, enables users to
log in to the device. Through a user interface, you can configure the parameters on all
physical and logical interfaces that work in asynchronous and interactive modes. In this
manner, you can manage, authenticate, and authorize the login users.
l The system supports the following user interfaces:

– Console port: It is a linear port that is provided by the main control board of the
device.
Each main control board provides a console port. The serial port of the user
terminal can directly connect to the console port of the device to implement local
configurations of the device.
– Virtual Terminal (VTY)
It is a kind of virtual interface indicating a logical terminal line.
When you set up a Telnet or SSH connection with the device through a terminal,
you set up a VTY. You can also perform the local or remote access to the device
through the virtual connection established through VTY.
l Numbering of user interfaces
You can number a user interface in the following manners:
– Relative numbering
The format of relative numbering is: user interface type + number.
Relative numbering indicates that the interfaces of the same type are numbered.
Relative numbering uniquely specifies a user interface of the same type. Relative
numbering must comply with the following rules:
Number of the CON port: CON 0
Number of the VTY: The first VTY is 0, the second VTY is 1, and so on
– Absolute numbering
Absolute numbering uniquely specifies a user interface or a group of user
interfaces.
Absolute numbers start with 0 and are allocated in the sequence of the CON port
and the VTY.
On a main control board, only one CON port is present but a maximum of 20 VTYs
are present. (The VTYs ranging from 1 to 14 are provided for ordinary Telnet or
SSH users and those ranging from 16 to 20 are reserved for Network Management
System (NMS) users.) In the system view, the allowable maximum number of user
interfaces can be set. The default value is 5.
By default, the absolute numbering of the CON port and the VTY is shown in Table
1-3.

Table 1-3 Example for the absolute numbering of user interfaces

Absolute Numbering    User Interface
0                     CON0
34                    VTY0: the first VTY
35                    VTY1: the second VTY
36                    VTY2: the third VTY
37                    VTY3: the fourth VTY
38                    VTY4: the fifth VTY
NOTE
In the previous examples, the numbers ranging from 1 to 32 are reserved for TTYs. A TTY is a
synchronous or asynchronous terminal line, which is related to specific physical devices.
Currently, the commands for viewing absolute numbering and relative numbering have been
provided.
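The relationship between relative and absolute numbering can be sketched in Python as follows. The sketch assumes the default mapping given in Table 1-3 (CON 0 maps to 0 and VTY 0 maps to 34); the function name absolute_number is hypothetical, and no upper bound on the VTY index is enforced.

```python
import re

CON_BASE = 0   # absolute number of CON 0, taken from Table 1-3
VTY_BASE = 34  # absolute number of VTY 0, taken from Table 1-3


def absolute_number(relative: str) -> int:
    """Convert a relative number such as 'CON 0' or 'VTY 3' to an absolute number."""
    match = re.fullmatch(r"(CON|VTY)\s*(\d+)", relative.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized user interface: {relative!r}")
    kind, index = match.group(1).upper(), int(match.group(2))
    if kind == "CON":
        if index != 0:
            raise ValueError("only CON 0 exists")
        return CON_BASE
    return VTY_BASE + index
```

For example, absolute_number("VTY 4") yields 38, which matches the last row of Table 1-3.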
User Login
In the absence of user authentication, any user can configure the device after the PC is
connected to the device through the console port.
Thus, the device and network are vulnerable to attacks. In this case, users should be created
for the device and passwords should be set for users so that the device can manage users. SSH
users are configured with RSA authentication and other users are configured with AAA. For
more information, refer to the AAA Feature Description.
User Classification
The users of the device can be classified into the following types based on the types of
services that users enjoy.
l HyperTerminal users: indicate the users who log in to the device through the console
port.
l Telnet users: indicate the users who log in to the device through Telnet.
l FTP users: indicate the users who transfer files by setting up the FTP connection with the
device.
l SSH users: indicate the users who perform the remote access to the network by setting
up the SSH connection with the device, including the STelnet mode and the SFTP mode.
l NMS users: indicate the users who set up the connection with the device through SNMP
or Telnet to manage devices in machine-to-machine mode.
One user can obtain multiple services simultaneously to perform multiple functions. VTY
users, namely, Telnet or SSH users, need to be bound to admission protocols in the user
interface view before they log in.
User Priorities
The system supports hierarchical management over HyperTerminal users and VTY users.

Command levels are increased from 4 to 16. Similar to command levels, users are classified
into 16 levels numbered 0 to 15. The greater the number, the higher the user level. The level
of the command that a user can run is determined by the level of this user.
l In the case of password authentication, the level of the command that the user can run
depends on the level of the user interface.
l In the case of AAA authentication, the command the user can run depends on the level of
the local user specified in AAA configuration.
A user can run the commands whose levels are equal to or lower than the user level. For
example, the level 2 user can access the commands at levels 0, 1, and 2. The level 3 user can
access the commands at levels 0, 1, 2, and 3.
NOTE
The one-to-one mapping exists between user levels and command lines.
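The level rule above (a user can run commands whose level is equal to or lower than the user level) can be expressed as a minimal Python sketch. The function names are hypothetical; the 0 to 15 range comes from the text.

```python
from typing import List

MAX_LEVEL = 15  # levels are numbered 0 to 15


def can_run(user_level: int, command_level: int) -> bool:
    """Return True if a user at user_level may run a command at command_level."""
    for name, level in (("user", user_level), ("command", command_level)):
        if not 0 <= level <= MAX_LEVEL:
            raise ValueError(f"{name} level must be in 0..{MAX_LEVEL}, got {level}")
    return command_level <= user_level


def runnable_levels(user_level: int) -> List[int]:
    """List every command level available to a user at user_level."""
    return [lvl for lvl in range(MAX_LEVEL + 1) if can_run(user_level, lvl)]
```

With this sketch, runnable_levels(2) gives levels 0, 1, and 2, in line with the level 2 example in the text.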
User Authentication
After users are configured, the system authenticates the users when they log in to the device.
Two authentication modes are available: password authentication, and Authentication,
Authorization, and Accounting (AAA) authentication.
l Password authentication: In this mode, users can log in to the device by entering the
password rather than the username. This mode is configured based on the terminal line.
A password can be configured for a terminal line or a group of terminal lines.
l AAA authentication: It includes AAA local authentication and AAA remote
authentication. In AAA local authentication, users need to enter both the username and
password on the local device. If necessary, users also need to enter user attributes, such as
user rights and FTP paths of users. In AAA remote authentication, user information needs
to be configured on the AAA server. In general, AAA server authentication is used for
VTY users; AAA local authentication or non-authentication is used for console users.
For more information, refer to the AAA Feature Description.
Planning Users
The network administrator can plan the users of the device as required.
l Usually, at least one HyperTerminal user needs to be created on the device.
l Telnet or SSH users need to be configured to implement remote login to the device through
Telnet or SSH.
l FTP or SFTP users need to be configured to enable remote users to upload or download
files to or from the device.
1.2.2.6 Virtual File System

The virtual file system is easy to use and tailorable. It has two functions: managing
storage devices and managing the files that are stored on them. In the file
system, users can create, delete, modify, and rename a file or a directory, and view the
contents of a file. Because the virtual file system hides the differences between
bottom-layer storage devices, mass storage devices must support it so that they can be
managed effectively.

Basic Concepts
l Storage device: a hardware device used to store data
l File: a mechanism used for the system to store and manage information
l Directory: a mechanism used by the system to integrate and organize files and to provide
a logical container of files
Managing Storage Devices

l Repairing the storage device with the abnormal file system
When the file system on a storage device fails, the device terminal prompts that the fault
should be rectified.
l Formatting the storage device
When the repair of the file system fails or when the data on the storage device is no
longer needed, the storage device can simply be reformatted. However, all data on the
device will be lost.
If reformatting the storage device fails, the device may have a physical fault.
Managing File Directories

When files are transmitted between the client and the server, directories need to be set up in the
file system. The specific operations are as follows:
l Display the current directory.

l Change the current directory.
l Display directories or file information.
l Create a directory.
l Delete a directory.
NOTE
Either the absolute path or relative path is applicable.
Managing Files
You can perform the following operations for files:
l Display file contents.

l Copy files.
l Move files. This changes the file storage location.
l Rename files. This changes the names of existing files.
l Delete files. This moves existing files to the recycle bin instead of erasing them, so the
operation is reversible. The wildcard (*) can be used to delete multiple files at a time.
l Delete files from the recycle bin. This operation is irreversible.
l Restore deleted files. Restoring files from the recycle bin. Restoring deleted files is a
reverse operation of deleting files.
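The delete, restore, and purge behavior listed above can be modeled with a small in-memory Python sketch. It is a toy model, not the device file system: the class and method names are hypothetical, and wildcard matching uses Python's fnmatch module, which accepts more patterns than the single * wildcard mentioned in the text.

```python
import fnmatch
from typing import Dict, List


class RecycleBinFS:
    """Toy model in which deleting a file moves it to a recycle bin."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}        # live files: name -> contents
        self.recycle_bin: Dict[str, bytes] = {}  # deleted files awaiting purge

    def delete(self, pattern: str) -> List[str]:
        """Move every file matching pattern to the recycle bin (reversible)."""
        matched = sorted(n for n in self.files if fnmatch.fnmatchcase(n, pattern))
        for name in matched:
            self.recycle_bin[name] = self.files.pop(name)
        return matched

    def restore(self, name: str) -> None:
        """Reverse a delete by moving the file back from the recycle bin."""
        self.files[name] = self.recycle_bin.pop(name)

    def purge(self) -> None:
        """Empty the recycle bin; after this, restore is no longer possible."""
        self.recycle_bin.clear()
```

Keeping deleted files in a separate mapping is what makes the first delete reversible and the purge irreversible.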
Miscellaneous
l Executing batch files

A batch file is created and executed to automate several tasks. Batch files must be edited
on the client and then uploaded to the device.
l Configuring the prompt mode of the file system
If data is lost or damaged during file management, the system should provide prompts as
to corrective steps.
NOTICE
If the prompt mode is set to quiet, the system does not provide prompts when data is lost
because of user misoperations such as accidentally deleting files. Therefore, quiet
mode should be used with caution.
1.2.2.7 Pipe Character

The pipe character is used to filter and then display the output of display commands
according to the rules set by a user.
During device maintenance, a display command may output a lot of information, only a part
of which has real value to the user, for example, the status of interfaces, the status of OSPF
peers, and the Cyclic Redundancy Check (CRC) statistics of interfaces (used to determine or
locate a fault). If all the output of a display command remains unfiltered, users cannot readily
obtain pertinent information. The pipe character filters irrelevant information out of the
command output, ensuring that the desired information stands out and helping users rapidly
determine the exact nature of the problem.
NOTE
If a great amount of command output is to be displayed, the device takes a long time to output
all information. Wait a while to obtain the desired information.
Filtration rules of the pipe character are as follows:

l include + regular expression
In this mode, the lines containing user-specified contents are displayed.
l begin + regular expression
In this mode, the lines from the first line containing user-specified contents are
displayed.
l exclude + regular expression
In this mode, the lines not containing user-specified contents are displayed.
l count
In this mode, the lines to be output are counted and only the number of lines is displayed.
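The four filtration rules can be sketched in Python as shown below. This is an illustration only: it uses Python's re syntax, which may differ from the regular expression syntax the device accepts, and the function name and the format of the count result are hypothetical.

```python
import re
from typing import List


def pipe_filter(lines: List[str], mode: str, pattern: str = "") -> List[str]:
    """Apply one filtration rule (include, begin, exclude, or count) to output lines."""
    if mode == "count":
        return [str(len(lines))]  # only the number of lines is reported
    regex = re.compile(pattern)
    if mode == "include":
        return [ln for ln in lines if regex.search(ln)]
    if mode == "exclude":
        return [ln for ln in lines if not regex.search(ln)]
    if mode == "begin":
        # Show everything from the first matching line onward.
        for i, ln in enumerate(lines):
            if regex.search(ln):
                return lines[i:]
        return []
    raise ValueError(f"unknown mode: {mode}")
```

For instance, applying the include rule with the pattern down to a list of interface status lines keeps only the lines that mention down.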
Context of Filtered Information

After display command output is filtered, the displayed information is difficult to understand
in absence of context. To facilitate understanding, a context can be displayed with the
displayed information. Context rules are as follows:
l Before: The lines containing user-specified contents and the preceding lines are
displayed.

l After: The lines containing user-specified contents and the subsequent lines are
displayed.
l Before + After or After + Before: The lines containing user-specified contents and the
preceding and subsequent lines are displayed.
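The context rules resemble the before and after options of common text search tools. A minimal Python sketch, assuming the context is expressed as a number of lines (the parameter names before and after and the function name are hypothetical):

```python
import re
from typing import List


def with_context(lines: List[str], pattern: str,
                 before: int = 0, after: int = 0) -> List[str]:
    """Return matching lines plus the requested preceding and subsequent lines."""
    regex = re.compile(pattern)
    keep = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            first = max(0, i - before)
            last = min(len(lines) - 1, i + after)
            keep.update(range(first, last + 1))
    # Emit each kept line once, in its original order.
    return [lines[i] for i in sorted(keep)]
```

Collecting line indexes in a set keeps overlapping contexts from printing the same line twice.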
Special Processing of the Table-form Output

The output of certain display commands contains tables such as FIB and ARP tables. A table
is composed of the table heading, table tail, and table text (entries). If the table heading and
tail are included in the pipe character filtration, they are probably filtered out. This is not
convenient. It is necessary, therefore, that table headings and tails are not included in the
filtration process.
Generally, all display commands need to support the pipe character. The display commands
that meet the following requirements, however, do not necessarily support the pipe character:
l Commands whose output information is stable and can be displayed on the current screen.
l Commands whose output information does not vary with configurations, dynamic data,
and specifications.
1.2.2.8 Daylight Saving Time

Daylight Saving Time (DST), also referred to as summer time, is a convention established by
communities for prolonging daylight hours and saving resources such as the cost of lighting
office buildings and schools.
In high latitude areas, the sun rises earlier in summer than in the winter. To reduce evening
usage of incandescent lighting and save energy, clocks are adjusted forward one hour in the
spring. At present, about 110 countries around the world adopt DST.
Users can customize the DST zone according to their countries' or regions' convention. Users
can set when and how clocks are adjusted forward, usually an hour. With DST enabled, the
system time is adjusted accordingly; when it is time to end DST, the system time
automatically returns to normal.
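The adjustment can be sketched in Python as follows. The sketch assumes that the DST start and end are given in standard (non-DST) time, that the start precedes the end within one year, and that the offset defaults to one hour; the function name and the dates in the usage example are hypothetical.

```python
from datetime import datetime, timedelta


def apply_dst(standard_time: datetime, dst_start: datetime, dst_end: datetime,
              offset: timedelta = timedelta(hours=1)) -> datetime:
    """Return the displayed time: standard time plus offset while DST is active."""
    if dst_start <= standard_time < dst_end:
        return standard_time + offset
    # Outside the DST period the clock shows standard time again.
    return standard_time
```

As a usage example, with a DST period from 2018-03-25 02:00 to 2018-10-28 02:00, a standard time of 12:00 on 2018-07-01 is displayed as 13:00, and 12:00 on 2018-12-01 is displayed unchanged.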
1.2.2.9 Timing Restart

The system supports timing restart: at a specified time, the system automatically restarts
and updates system files. This is useful when a device upgrade needs to be performed at an
exact time. To use timing restart, maintenance personnel only need to prepare the software
packages and system image files, and set the time and files for the automatic device restart.
1.2.3 Applications
1.2.3.1 Applications of FTP

l Device functioning as an FTP client
A user logs in to the FTP server from the device acting as an FTP client and then
downloads files from the server to the client storage device.
In Figure 1-19, the device with the IP address of 172.16.105.111 acts as the FTP client.
The user then can log in to the FTP server from the client through FTP.

Figure 1-19 Networking diagram of the device functioning as an FTP client
(Figure labels: Server 172.16.105.110/24, IP Network, ATN 172.16.105.111/24)
l Device functioning as an FTP server

A user logs in to the device, which functions as an FTP server, from a HyperTerminal
and downloads files from the FTP server. In Figure 1-20, the device with the IP
address of 172.16.104.110 acts as the FTP server.
Figure 1-20 Networking diagram of the device functioning as an FTP server
(Figure labels: Server 172.16.104.110/24, console cable)
1.2.3.2 Applications of TFTP

Downloading or Uploading Files Through TFTP
A user can use TFTP to upload or download files to or from the server in a simple interaction
environment. Currently, the device acts only as a TFTP client.
Figure 1-21 shows the networking of downloading or uploading files through TFTP.
Figure 1-21 Networking diagram of uploading or downloading files through TFTP
(Figure labels: Server, ATN, PC, TFTP Client, 10.111.16.160/24)
1.2.3.3 Applications of Telnet

Telnet is used for remote login to configure, monitor, and maintain remote or local
devices.
As shown in Figure 1-22, the user on the ATN logs in to the remote CX600 through Telnet.
Figure 1-22 Networking diagram of login through Telnet
(Figure labels: ATN 10.1.1.1/24, CX600 10.1.1.2/24)
1.2.3.4 Applications of SSH

Attackers cannot pass authentication because they cannot provide the correct private key or
password. In addition, they cannot obtain the session key between another client and the
server. Only the server and the related client can decrypt packets exchanged between them.
Even if attackers intercept packets exchanged between the server and the client, they cannot
decrypt the packets. In this manner, secure data transmission on the network is guaranteed.
l SSH for STelnet

The STelnet client is based on SSHv2 and the STelnet server is based on SSHv1.x and
SSHv2. The client and the server set up a secure connection through negotiation. The
client can then log in to the server in the same way as with Telnet. Figure 1-23 shows the
networking of SSH for STelnet.
Figure 1-23 Networking diagram of SSH for STelnet
(Figure labels: STelnet Client, SSH Server)
– A device can function as the STelnet server. Alternatively, it can function as the
STelnet client to access other STelnet servers.
– STelnet services can be enabled or disabled as required and must be configured
in global mode. By default, STelnet services are disabled.
l SSH for SFTP
SFTP is based on SSH2.0, which supports the following authentication modes: password
authentication, RSA authentication, DSA authentication and ECC authentication. To
access the server using a client, an authorized user needs to enter the correct user name,
password, and private key to pass the authentication on the server. After that, the user
can use SFTP that is similar to FTP to manage remote file transfer on the network. The
system uses the negotiated session key to encrypt the user's data.
– A device can function as the SFTP server. Alternatively, it can function as the SFTP
client to access other SFTP servers.

– SFTP services can be enabled or disabled as required and must be configured
in global mode. By default, SFTP services are disabled.
– Different users are allowed to use SFTP to access different file directories. Users
can access only the set SFTP directories. Available files for different users are
isolated from each other.
Figure 1-24 Networking diagram of SSH for SFTP
(Figure labels: SFTP Client (legal user), Network, SFTP Server, SSH Client (setting port), VPN, SFTP Server, SFTP Client (attacker))
l SSH for the private network

A device can function as either an STelnet client or an SFTP client. Therefore, the client
(device) on a public network can set up a Socket connection with the server in a VPN:
– The STelnet client can access the SSH server on the private network.
– The SFTP client can access the SSH server on the private network.
Figure 1-25 Networking diagram of SSH for the private network
(Figure labels: SSH Client (legal user), Network, SSH Client (setting port), SSH Server, VPN, SSH Client (attacker))
l SSH for non-standard ports

The standard SSH listening port number is 22. If attackers continuously access this port,
the available bandwidth and the performance of the server are reduced and authorized
users cannot access this port.
To address this problem, you can change the listening port on the SSH server to a non-
standard port. The port change is invisible to attackers, so they continue to send socket
connection requests to the standard listening port 22. If the SSH server detects that the
connection requests are not forwarded to the actual listening port, it denies the requests.
Only authorized clients can set up socket connections with the SSH server using non-
standard ports. The client and the server then negotiate the SSH version, algorithms and
session keys. User authentication, session request, and interactive session are performed
subsequently.
SSH can be used on intermediate switching devices or edge devices on a network to
secure user access and device management.
Figure 1-26 Networking diagram of SSH for non-standard ports
(Figure labels: SSH Client (legal user), Network, SSH Client (setting port), SSH Server, SSH Client (attacker))
l SSH for Remote Authentication Dial in User Service (RADIUS)

If password authentication is required, SSH calls the interface provided by AAA in the
same manner as FTP and Telnet. After user authentication is configured as RADIUS in
AAA, and when SSH authentication is enabled, the SSH server sends the authentication
information (user name and password) to the RADIUS server (which is compatible with
the HWTACACS server). The RADIUS server then sends the authentication result (pass
or fail) to the SSH server, which determines whether to establish a connection
with the SSH client.
Figure 1-27 SSH for RADIUS
(Figure labels: SSH Client, SSH Server, RADIUS Server)
l SSH for ACLs

The SSH server uses ACLs to limit the call-in and call-out rights of SSH users. This
prevents unauthorized users from establishing TCP connections or entering the SSH
negotiation phase, thus improving the security of the SSH server.
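The idea of checking a client against an ACL before the TCP connection is accepted can be sketched in Python. The rule format, the first-match evaluation order, and the deny-by-default result are assumptions made for this illustration; the text does not specify how the device evaluates ACL rules. The function name and the addresses in the usage example are hypothetical.

```python
import ipaddress
from typing import List, Tuple

Rule = Tuple[str, str]  # (action, network), where action is "permit" or "deny"


def acl_permits(rules: List[Rule], source_ip: str) -> bool:
    """Return True if the first rule matching source_ip is a permit rule."""
    address = ipaddress.ip_address(source_ip)
    for action, network in rules:
        if address in ipaddress.ip_network(network):
            return action == "permit"
    # Assumption: a source that matches no rule is denied.
    return False
```

As a usage example, with the rules [("deny", "10.1.1.2/32"), ("permit", "10.1.1.0/24")], a client at 10.1.1.1 is allowed to start SSH negotiation, whereas clients at 10.1.1.2 and 192.168.0.1 are rejected.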
Figure 1-28 Networking diagram of SSH for ACLs
(Figure labels: SSH Client, ACL, SSH Server)
1.2.4 Terms, Acronyms, and Abbreviations
Terms
Terms Description
FTP In the TCP/IP protocol suite, the File Transfer Protocol (FTP) is applied
to the application layer. It is used to transfer files between local and
remote hosts. FTP is implemented based on the file system.
TFTP TFTP is short for Trivial File Transfer Protocol.
Telnet The Telecommunication Network Protocol (Telnet) is applied to the
application layer in the TCP/IP protocol suite. Telnet enables a terminal
to remotely log in to a server, presenting an interactive operation
interface.
NVT The Network Virtual Terminal (NVT) is a bidirectional virtual device,
to and from which both ends of the connection, the client and the server,
map their physical terminals. Because of the use of a uniform NVT,
Telnet can operate between any two hosts (on any operating system) or
terminals.
SSH Secure Shell (SSH) uses multiple encryption and authentication modes
to solve the problem of data encryption and user authentication in
traditional services. In virtue of its mature public key or private key
system, SSH provides an encryption channel between the client and the
server. This solves the problem of insecurity caused when data, such as
passwords, are transmitted over the network in plain text. SSH also
supports multiple authentication modes, such as CA and the smart card,
which solves the authentication problem and eliminates such insecurity
factors as the man-in-the-middle attack.
SFTP The Secure File Transfer Protocol (SFTP) is an SSH-based upper-layer
application, which provides secure file transmission.
STelnet The Secure Shell Telnet (STelnet) is an SSH-based upper-layer
application, which provides secure login operations.
TLS TLS is a protocol based on Netscape's SSL 3.0 protocol. TLS
fixes weaknesses of SSL, which was vulnerable to man-in-the-
middle attacks and used a weak MAC construction. The successors of
SSL are TLS 1.0 and TLS 1.1, which are defined by IETF. HTTPS,
LDAP and SNMP are some of the protocols that continue to use SSL.
Abbreviations
Abbreviations Full Name
AAA Authentication, Authorization, Accounting
ACL Access control list
AES Advanced Encryption Standard
CON Console, Primary terminal line
FTP File Transfer Protocol
FTPS FTP Secure
IETF Internet Engineering Task Force
MAC Message Authentication Code
NVT Network Virtual Terminal
RSA Rivest, Shamir, and Adleman
SFTP Secure File Transfer Protocol
SSH Secure Shell
SSL Secure Socket Layer
TACACS Terminal Access Controller Access Control System
Telnet Telecommunication network protocol
TFTP Trivial File Transfer Protocol
TTY Terminal controller (A/S or SA)
VPN Virtual Private Network
VRP Versatile router platform

2 System Management
About This Chapter
This document describes the system management feature in terms of the overview, principle,
and applications.
2.1 Information Center

2.2 SNMP
2.3 RMON and RMON2
This section introduces Remote Network Monitoring (RMON) and RMON2 and
describes their basic concepts and principles as well as applications on Huawei devices.
2.4 IP FPM
2.5 NQA
2.6 Ping and Tracert
2.7 Fault Management
2.8 Performance Management
2.9 PoE Features
2.10 TWAMP
2.11 TWAMP Light
2.1 Information Center
2.1.1 Introduction
Definition
The information center functions as an information hub and is essential to the operation of a
device. It manages most output information and supports information classification to achieve
effective filtering. Together with debugging commands and the SNMP module, the
information center provides powerful support for network administrators to monitor the
device operation and locate network faults.
The working mechanism of the information center is as follows:
In general, the information center distributes 3 types of information with 8 severities to 10
information channels, and then outputs that information in different directions. Specifically,
the information center processes information as follows:
1. Receives logs, traps, and debugging information (with different severities) sent from
different modules.
NOTE
The logs, traps, and debugging information are stored in the log, trap, and debugging queues of the
information center. Each queue supports a maximum of 30,000 messages.
2. Distributes the information to different information channels according to user settings.
3. Outputs the information in different directions based on the mappings between the
information channels and directions.
The following table lists the main functions of the information center.
Table 2-1 Main functions of the information center

Function                    Description
Information classification  The information center classifies information into three types: log, trap,
                            and debugging information.
Information severity        The information center defines eight information severities. A smaller
                            severity value indicates a higher severity.
Information output          The information center can output information to a log file, console,
                            virtual type (VTY) terminal, true type (TTY) terminal, log host, SNMP
                            agent, log buffer, or trap buffer.
                            In addition, the information center can output SSL-encrypted syslog
                            packets.
Information shield          You can use commands to shield output information based on severities
                            or modules.
Purpose
The information center outputs information in a unified format to different directions,
improving information readability, maintainability, and flexibility from the following aspects:
1. Controls the output direction, that is, where information is to be output. Currently,
information can be output to a log file, console, VTY/TTY terminal, log host, SNMP
agent, log buffer, or trap buffer.
2. Filters information based on the information source, severity, type, and output direction.
3. Provides a system-level information output platform.
4. Displays system-level debugging information.
5. Transmits encrypted packets.
2.1.2 Principles
2.1.2.1 Information Classification

The information center classifies information into logs, traps, and debugging information to
meet the requirements of different output directions.
Logs record user operations and diagnostic information. Diagnostic logs are used by R&D
personnel for fault locating; only user operation logs can be viewed by users.
Traps record faults. After receiving traps, the information center sends them to the SNMP
agent. Then the SNMP agent forwards the traps to an NMS.
Debugging information records device operating status.
Logs
• Log overview
According to the ITU-T, logs are records of events and unexpected activities of managed
objects. The log module helps view user operations and manage system security,
providing a basis for system diagnosis and maintenance. Therefore, logs are important for
O&M and fault locating.
• Log implementation on devices
The information center is enabled by default. It can output logs to a specified destination
as required.
For example, you can configure the information center to output logs to a specified log
host. Currently, the device supports a maximum of eight log hosts. This feature
allows logs to be sent to different log hosts simultaneously for backup.
The information center can send logs to the console and log buffer by default. If the log
quantity in the log buffer reaches the upper limit, the logs that were stored earliest are
replaced by new ones.
• Diagnostic log
Diagnostic logs are used for fault locating and are not intended for users. Therefore,
users are not informed of these logs.
The information center still uses the original user log management system to process
diagnostic logs. With this system, you can view user logs but not diagnostic logs. Because
diagnostic log files are encrypted after being generated, specific diagnosis information in
the files cannot be obtained.
By default, diagnostic logs are output to diagnostic log files.
• Security log
The following types of security logs are available:
– Account management security logs: record account operation information, such as
user accounts, IP addresses, login and logout time, and operation time, contents, and
results.
– Protocol security logs: record insecure protocol interactions or algorithms.
– Attack defense security logs: record attack event information, such as the event
occurrence time, attack locations and sources, IP attack types, and attack impacts.
– Status security logs: record software and hardware abnormalities, real-time data of
key performance indicators as well as bandwidth, entry, and storage resources, and
process/branch abnormalities.
• Log output format
Syslog is a sub-function of the information center. Syslog uses UDP to output logs to log
hosts through port 514.
Figure 2-1 shows the log output format.
Figure 2-1 Log output format

<Int_16>TIMESTAMP HOSTNAME %%ddAAA/B/CCC(t)[e]:slot=XXX; YYYY
Table 2-2 describes the fields in a log.
Table 2-2 Fields in a log

Field      Meaning               Description
<Int_16>   Leading character     Before logs are output to log hosts, leading characters are added
                                 to the logs. However, logs saved on the local device do not have
                                 leading characters.
TIMESTAMP  Time to send logs     Five timestamp formats are available:
                                 • boot: a format of relative time used for debugging information
                                   by default
                                 • date: a format of system time used for logs and traps by default
                                 • short-date: a format of system time that does not contain year
                                   information
                                 • format-date: another format of system time
                                 The timestamp and host name are separated by a space.
HOSTNAME   Host name             By default, the system name is HUAWEI.
%%         Huawei identifier     Indicates that the log is output by a Huawei product.
dd         Version               Indicates the version of the log format.
AAA        Module name           Indicates the name of the module that outputs information to the
                                 information center.
B          Severity              Indicates the log severity.
CCC        Brief description     Indicates the brief description of a log.
(t)        Information type      • l: log
                                 • T: trap
                                 • d: debugging information
                                 • D: diagnostic log
                                 • s: security log
[e]        Information counter   Indicates the serial number of a log.
slot=XXX   Location information  Indicates the slot number. Some modules generate logs without
                                 location information.
YYYY       Descriptor            Indicates detailed log information output from a module to the
                                 information center. Every time a log is output, the module fills
                                 this field with detailed information.
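The field layout in Table 2-2 can be illustrated with a small parser. The following Python sketch is not part of the device software; it assumes that the leading character is a decimal integer in angle brackets and that the module name contains no slash. The sample line and the names LOG_RE and parse_log are hypothetical.

```python
import re

# Pattern for "<Int_16>TIMESTAMP HOSTNAME %%ddAAA/B/CCC(t)[e]:slot=XXX; YYYY".
# Assumption: the leading character is a decimal integer in angle brackets.
LOG_RE = re.compile(
    r"^(?:<(?P<leading>\d+)>)?"      # leading character, added only for log hosts
    r"(?P<timestamp>.+?) "           # timestamp (may itself contain spaces)
    r"(?P<hostname>\S+) "            # host name
    r"%%(?P<version>\d\d)"           # Huawei identifier plus log format version
    r"(?P<module>[^/]+)/"            # module name
    r"(?P<severity>[0-7])/"          # severity value
    r"(?P<brief>[^(]+)"              # brief description
    r"\((?P<type>[lTdDs])\)"         # information type
    r"\[(?P<counter>\d+)\]:"         # information counter
    r"(?P<description>.*)$"          # location information and descriptor
)


def parse_log(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, made up for illustration.
sample = ("<190>Mar 31 2012 04:29:05 HUAWEI %%01SHELL/5/LOGIN(l)[12]:"
          "slot=1; user logged in")
fields = parse_log(sample)
print(fields["module"], fields["severity"], fields["type"])
```

Because the timestamp may contain spaces, the pattern anchors on the %% identifier rather than splitting the line on whitespace.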
Traps
• Trap overview
Traps are notifications generated when the system detects faults. Information about the
faults is carried in traps. Unlike logs, traps are time-sensitive, and users need to be
notified of them promptly. Therefore, the information center processes traps sent to the
NMS in a different way.
Traps are sent from a device to an NMS. When an event occurs (for example, a network
interface goes Down), the device generates a trap and sends it to the specified destination
address, provided that the SNMP agent is enabled on the device, the trap function is
enabled on the associated module, and the NMS host to which traps are sent is configured.
If the route between the device and the NMS is reachable, the NMS can receive the trap.
The device has a trap buffer for storing traps. If the device is specified as an information
source on the information center, the buffer can store traps generated by the local device
regardless of whether a destination NMS host is configured.
• Trap-related concepts
– Event: indicates anything that takes place on the managed object. For example, the
object is added, deleted, or modified.
– Fault: indicates a situation where the system does not function properly. A fault
may cause the system to fail to operate or implement redundancy.
– Trap: indicates a notification generated when the system detects a fault.
• Trap output format
Figure 2-2 Output format of traps

TimeStamp HostName ModuleName Severity Brief:Description
Table 2-3 describes the fields in a trap.

Table 2-3 Fields in a trap
Field        Meaning               Description
TimeStamp    Time to send traps    Five timestamp formats are available:
                                   • boot: a format of relative time used for debugging
                                     information by default
                                   • date: a format of system time used for logs and traps by
                                     default
                                   • short-date: a format of system time that does not contain
                                     year information
                                   • format-date: another format of system time
                                   The timestamp and host name are separated by a space.
HostName     Host name             By default, the system name is HUAWEI.
                                   The host name and module name are separated by a space.
ModuleName   Module name           Indicates the name of the module that generates a trap.
Severity     Severity              Indicates the trap severity.
                                   • Critical
                                   • Major
                                   • Minor
                                   • Warning
Brief        Brief description     Indicates the brief description of a trap.
Description  Detailed description  Indicates the detailed description of a trap.
Debugging Information
Debugging information records a device's internal running status. A device can generate
debugging information only after the debugging function of the associated module is enabled
in the user view. Debugging information contains the contents of packets sent or received by
the debugged module. Note that enabling debugging only generates debugging information.
Displaying debugging information requires additional configuration. Different from logs and
traps, no buffer is available for debugging information. The information center can be
configured to output debugging information to the console or log hosts.
You can connect the PC to the console port of a device (called console mode) or to a network
interface of a device through Telnet (called terminal mode). When debugging the device in
console or terminal mode, you can determine the debugging information to be output.
Various debugging commands are provided for debugging protocols and functions that a
device supports. You can enable the debugging of a protocol or function for fault diagnosis.
The output of debugging information depends on the following points:
• Whether to enable the debugging function for a protocol
• Whether to display debugging information on the terminal screen
Figure 2-3 shows the relationships between the preceding points. After the debugging
function is enabled for protocols 1 and 3, debugging information is output. As screen display
is also enabled, the debugging information is displayed on the screen. No debugging
information about protocol 2 is output because the debugging function is not enabled for this
protocol.
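Figure 2-3 Output of debugging information
[Figure: debugging information for protocols 1, 2, and 3 first passes the protocol debugging
switches (ON, OFF, ON), so only information for protocols 1 and 3 is generated. It then passes
the terminal screen display switch: with display OFF nothing is shown, and with display ON the
information for protocols 1 and 3 is shown.]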
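The two conditions described above can be written as a small function. This Python sketch only models the decision; the function name and the dictionary layout are assumptions made for illustration.

```python
def displayed_protocols(debug_enabled, screen_display_on):
    """Return the protocols whose debugging information reaches the screen.

    debug_enabled maps a protocol number to True (debugging ON) or False.
    Information is displayed only if debugging is enabled for the protocol
    AND terminal screen display is enabled.
    """
    if not screen_display_on:
        return []
    return sorted(p for p, enabled in debug_enabled.items() if enabled)


# Debugging is enabled for protocols 1 and 3 but not for protocol 2.
switches = {1: True, 2: False, 3: True}
print(displayed_protocols(switches, screen_display_on=True))
print(displayed_protocols(switches, screen_display_on=False))
```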
2.1.2.2 Information Hierarchy
Overview
If a large amount of information is available, users may find it hard to distinguish between
information about normal operations and information about faults. Therefore, an information
hierarchy is designed to help users quickly determine whether to take an action immediately
or shield the information that does not require any user action.
Information Severities
The information center defines eight information severities. A smaller severity value indicates
a higher severity. Table 2-4 describes the severities.

Table 2-4 Information severities
Value  Severity       Description
0      Emergencies    A fatal fault occurs in the device, which causes the system to fail to
                      function properly unless the device is restarted. For example, a program
                      abnormality leads to a device restart or a memory error is detected.
1      Alert          A severe fault occurs in the device, which requires immediate actions. For
                      example, memory usage of the system reaches the upper limit.
2      Critical       A critical fault occurs in the device, which requires that actions be taken
                      to analyze or process it. For example, the memory usage or temperature falls
                      below the lower limit, Bidirectional Forwarding Detection (BFD) detects that
                      the device is unreachable, or the device is generating error messages.
3      Error          An incorrect operation is performed or an abnormal process occurs in the
                      device, which does not affect subsequent services but requires attention and
                      cause analysis. For example, users enter incorrect instructions, user names,
                      or passwords, or error protocol packets received by other devices are
                      detected.
4      Warning        An abnormality that may result in a fault occurs in the device, which
                      requires attention. For example, the routing process is disabled, BFD
                      detects a packet loss event, or error protocol packets are detected.
5      Notification   A key operation is performed to keep the device functioning properly. For
                      example, the shutdown command is run on an interface, a neighbor is
                      discovered, or the state of a protocol changes.
6      Informational  A general operation is performed. For example, the display command is run.
7      Debugging      A general operation is performed, which requires no attention.
The severity of output information can be modified. If you filter output information based on
a specified severity, only the information with a severity value less than or equal to the
specified value is output. That is, only the information with the specified severity or higher is
output.
For example, if the severity value is set to 6, the information with a severity value ranging
from 0 to 6 is output.
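The filtering rule can be expressed in a few lines. The following Python sketch uses the severity names from Table 2-4; the function name should_output is an assumption made for illustration.

```python
# Severity values and names, as listed in Table 2-4.
SEVERITIES = {
    0: "Emergencies", 1: "Alert", 2: "Critical", 3: "Error",
    4: "Warning", 5: "Notification", 6: "Informational", 7: "Debugging",
}


def should_output(severity_value, configured_value):
    """A smaller value means a higher severity, so information is output
    when its value is less than or equal to the configured value."""
    return 0 <= severity_value <= configured_value


# With the severity value set to 6, values 0 through 6 pass the filter.
passed = [SEVERITIES[v] for v in SEVERITIES if should_output(v, 6)]
print(passed)
```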
2.1.2.3 Information Output

The information center needs to output information to a terminal, console, log buffer, SNMP
agent, or log file. 10 information channels that work independently from each other are
defined for the information center to facilitate information output.

Information Output Channels

The 10 information output channels are equal in precedence. You must specify information
sources for the channels to be used. By default, information sources are specified for six
channels numbered from 0 to 5: console, monitor, log host, trap buffer, log buffer, and SNMP
agent. For devices with storage media, the information center can output log files through
channel 9 by default.
Besides the default channels, you can customize information sources for the remaining four
channels, numbered from 6 to 9, by configuring their channel names or by running configuration
commands.
Information Output Directions

The information center supports 10 channels, among which channels 0 through 5 have their
default channel names. By default, the six information channels are respectively mapped to
six output directions, as described in Table 2-5.
Table 2-5 Mappings between information channels and output directions

Channel  Default Channel  Output       Description
Number   Name             Direction
0        console          Console      Outputs logs, traps, or debugging information to the local
                                       console.
1        monitor          Monitor      Outputs logs, traps, or debugging information to a VTY
                                       terminal to facilitate remote maintenance.
2        loghost          Log host     Outputs logs, traps, or debugging information as a file to a
                                       log host for your reference.
3        trapbuffer       Trap buffer  Outputs traps to the trap buffer of a device. An area inside
                                       the device is specified as the trap buffer to record traps.
4        logbuffer        Log buffer   Outputs logs to the log buffer of a device. An area inside
                                       the device is specified as the log buffer to record logs.
5        snmpagent        SNMP agent   Outputs traps to the SNMP agent.
6        unspecified      Unspecified  Reserved.
7        unspecified      Unspecified  Reserved.
8        unspecified      Unspecified  Reserved.
9        channel9         Log file     Outputs logs, traps, or debugging information to log files.
                                       The log files are saved to a hard disk or flash card of a
                                       device.
As each output direction is associated with an information channel, information can be output
to a specified direction through the associated channel.
You can change information channel names or the mappings between the channels and output
directions as needed.
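The default mappings in Table 2-5 can be modeled as a lookup table. This Python sketch is only a data-structure illustration, not device configuration; the function names, and the name "channel6" chosen for the remapped channel, are assumptions.

```python
# Default mappings from Table 2-5: channel number -> (channel name, direction).
DEFAULT_CHANNELS = {
    0: ("console", "Console"),
    1: ("monitor", "Monitor"),
    2: ("loghost", "Log host"),
    3: ("trapbuffer", "Trap buffer"),
    4: ("logbuffer", "Log buffer"),
    5: ("snmpagent", "SNMP agent"),
    9: ("channel9", "Log file"),
}


def output_direction(channel, channels=DEFAULT_CHANNELS):
    """Return the output direction mapped to a channel, or None if unmapped."""
    entry = channels.get(channel)
    return entry[1] if entry else None


def remap(channels, channel, name, direction):
    """Return a copy of the table with one channel renamed and remapped."""
    updated = dict(channels)
    updated[channel] = (name, direction)
    return updated


# Map channel 6 to the log buffer; the default table is left unchanged.
custom = remap(DEFAULT_CHANNELS, 6, "channel6", "Log buffer")
print(output_direction(4), output_direction(6), output_direction(6, custom))
```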
Information Output
Terminals connected to the device dynamically change. The information center needs to know
the latest change in time to determine whether to output information to terminals and in which
format information is output. Every time an EXEC user logs in, logs out, or has its attributes
changed, the information center is notified of the event through the EXEC module so that
information can be correctly output.
If information is output to a log file, a log file in .zip format is generated. When available
storage space is smaller than the specified threshold, the information center deletes the earliest
log file.
2.1.2.4 Information Shield

The information center provides the information shield function for you to flexibly control
information output. You can use commands to configure the type, severity, and originating
module of the information to be output.
Information Shield Table

The information center filters information using an information shield table.
An information shield table helps filter information based on the type, severity, and source
and output information in multiple directions. Multiple information shield tables can be
created in the information center. Each information shield table maps one or multiple output
directions. Shielded information can be unshielded as required.
The contents of an information shield table are as follows:
• Number of the module that outputs information
• Whether logs can be output
• Severity at which logs can be output
• Whether traps can be output
• Severity at which traps can be output
• Whether debugging information can be output
• Severity at which debugging information can be output
As shown in Figure 2-4, by default, logs, traps, and debugging information are output
through default channels. You can also specify a channel to output information. For example,
you can configure logs to be output to the log buffer through channel 6. In this way, all the
logs will be output through the specified channel (channel 6) rather than the default channel
(channel 4).
Figure 2-4 Information filtering
[Figure: three columns. Information types (logs, traps, debugging information) are mapped to
information channels 0 through 9 (console, monitor, loghost, trapbuffer, logbuffer, SNMP agent,
channel6, channel7, channel8, channel9), which are mapped to output directions (console, remote
terminal, log host, trap buffer, log buffer, SNMP agent, log file). Separate arrows mark the
direction of logs, the direction of alarms, and the direction of debugging information.]
2.1.2.5 Suppression of the Log Processing Rate
Overview of the Suppression of the Log Processing Rate

If too many logs with the same ID are generated on a running device, the information center is
too busy to process logs with other IDs, which may adversely affect service running.
The information center monitors the traffic of logs with different IDs. If the traffic of logs
with a specific ID exceeds the threshold during a monitoring period, the information center
processes only the conforming traffic and discards the non-conforming traffic. If the traffic of
logs with a specific ID falls below the threshold and remains below the threshold within five
monitoring periods, the suppression is removed.
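The suppression behavior can be sketched as a per-ID state machine. The following Python model is a simplified illustration, not the device implementation. It assumes that traffic is counted once per monitoring period, that "exceeds" means strictly greater than the threshold, and that the class and method names are hypothetical.

```python
class LogRateSuppressor:
    """Simplified model of per-ID log rate suppression."""

    def __init__(self, threshold, recovery_periods=5):
        self.threshold = threshold                # max logs processed per period
        self.recovery_periods = recovery_periods  # quiet periods needed to recover
        self._quiet = {}                          # log ID -> consecutive quiet periods

    def is_suppressed(self, log_id):
        return log_id in self._quiet

    def close_period(self, log_id, generated):
        """Account for one monitoring period and return how many of the
        `generated` logs are processed; the remainder is discarded."""
        if generated > self.threshold:
            # Start (or restart) suppression and keep only conforming traffic.
            self._quiet[log_id] = 0
            return self.threshold
        if log_id in self._quiet:
            # Traffic stayed within the threshold for one more period.
            self._quiet[log_id] += 1
            if self._quiet[log_id] >= self.recovery_periods:
                del self._quiet[log_id]           # suppression is removed
        return generated


suppressor = LogRateSuppressor(threshold=10)
print(suppressor.close_period("id-A", 25), suppressor.is_suppressed("id-A"))
```

Restarting the quiet-period count whenever the threshold is exceeded again is one way to model "remains below the threshold within five monitoring periods".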
Advantages of the Suppression of the Log Processing Rate

• Configurable monitoring threshold
You can use commands to specify a monitoring threshold for logs with a specific ID
based on their generation scenario and importance.
• Configurable monitoring period
You can use commands to specify a scenario-based monitoring period for the
information center to adjust the suppression condition. To avoid traffic fluctuations, the
suppression can be removed only when the number of logs that are generated every
second falls below the threshold and remains below the threshold within five monitoring
periods.
• Configurable global processing capability
You can configure and adjust the processing capability of the information center to meet
service requirements.
• Suppression record query
You can use commands to query suppression records in real time. In addition, you can
query historical suppression records. For example, if the suppression record is updated
every half an hour, you can query the logs that were suppressed during the past half an hour.
After the suppression is removed, you can view information about the suppression period
and the number of logs that were sent and received during that period.
2.1.2.6 Diagnostic Logs in Binary Format

Diagnostic logs are system logs used by technical support personnel to locate faults.
Only dynamic data in logs is saved in binary format. This feature allows more logs to be
saved in a longer period on a device and improves log processing efficiency. Logs saved in
binary format use only 25% of the space used to save logs in non-binary format.
NOTE
Common user logs are saved in text rather than binary format. Although they use a large log space, they
can be accessed and viewed at any time, independent of device types and versions.
A diagnostic log consists of two parts: static template data and dynamic log data. The two
parts are associated using a log ID that uniquely identifies a log sent by a log module with a
specific severity. Figure 2-5 shows how static template data and dynamic log data are
combined into a complete log.
Figure 2-5 Combining static template data and dynamic log data into a complete log
Static template data:
<Module name="INFO">
  <LOG ID="1079398422" LEVEL="6" ALIAS="SUPPRESS_DIAGLOG">
    <Lang name="en-US" value="Last diagnostic message repeated [ULONG] times.([STRING])" />
  </LOG>
</Module>
Dynamic log data (saved in binary format):
"1079398422" + "Mar 31 2012 04:29:05.230.1-01:00" + "1" + "InfoID=1077497885,
ModuleName=SHELL, InfoAlias=CMDTIMEOUT"
Complete log:
Mar 31 2012 04:29:05.230.1-01:00 huawei %%01INFO/6/SUPPRESS_DIAGLOG(D)[1464]:Last
diagnostic message repeated 1 times.(InfoID=1077497885, ModuleName=SHELL,
InfoAlias=CMDTIMEOUT)

• Static template data: contains the log ID and fixed log contents, such as diagnostic log
information about all modules on the device. This type of data is saved in .xml format
and can be configured on the device, enhancing log availability and extensibility.
• Dynamic log data: contains the log ID and variable log contents, such as the time and
dynamic parameters. This type of data is saved in binary format and generated based on
operations, events, or alarms in the system.
You can view a generated diagnostic log file using either of the following methods:
• Run commands directly on the device to view log information. After command
execution, the system fills dynamic data in a template based on the log ID and displays
complete log information.
• Use a log parsing tool to parse the static template data and dynamic log data. The parsing
tool is an .exe file. It fills the dynamic data in the static template based on the log ID and
displays complete log information.
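Filling dynamic data into a static template can be sketched as follows. This Python example reuses the values shown in Figure 2-5. It is not the actual parsing tool: the template dictionary, the function name render, and passing the host name, format version, and counter as arguments are assumptions made for illustration.

```python
import re

# Static template data keyed by log ID (values taken from Figure 2-5).
TEMPLATES = {
    1079398422: {
        "module": "INFO",
        "level": 6,
        "alias": "SUPPRESS_DIAGLOG",
        "text": "Last diagnostic message repeated [ULONG] times.([STRING])",
    },
}


def render(log_id, timestamp, params, hostname, counter, version="01"):
    """Combine dynamic log data with the static template for log_id."""
    template = TEMPLATES[log_id]
    values = iter(params)
    # Replace each typed placeholder with the next dynamic parameter.
    text = re.sub(r"\[(?:ULONG|STRING)\]",
                  lambda match: str(next(values)), template["text"])
    header = "%%{}{}/{}/{}(D)[{}]".format(
        version, template["module"], template["level"],
        template["alias"], counter)
    return "{} {} {}:{}".format(timestamp, hostname, header, text)


complete = render(
    1079398422,
    "Mar 31 2012 04:29:05.230.1-01:00",
    [1, "InfoID=1077497885, ModuleName=SHELL, InfoAlias=CMDTIMEOUT"],
    hostname="huawei",
    counter=1464,
)
print(complete)
```

Storing only the log ID and the parameters for each record is what keeps the dynamic data small; the fixed text is stored once in the template.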
Terms
Term       Definition
Log        A record of transactions or activities that take place in the system.
Event      An action or occurrence to which a program may respond. For example, a managed
           object is added, deleted, or modified.
Trap       Information used to report a fault.
Debugging  Information used to record device operating status.
SNMP       Acronym for Simple Network Management Protocol, the network management protocol
           of TCP/IP. In SNMP, agents, which can be hardware as well as software, monitor the
           activity in the various devices on the network and report to the network console
           workstation. Control information about each device is maintained in a structure
           known as a Management Information Base (MIB).
2.2 SNMP
2.2.1 Terms and Abbreviations
Terms
Terms   Explanation
SNMP    Simple Network Management Protocol
Agent   An agent is a process that runs on a managed device.
ASN.1   Abstract Syntax Notation One (ASN.1) is a notation used to specify protocols. It
        describes the syntax of data during transmission rather than the detailed meaning of
        the data.
BER     The Basic Encoding Rules (BER) are part of the ASN.1 framework and describe how data
        is represented during transmission.
Entity  An entity is the software or hardware to be managed.
Abbreviation
Abbreviation Full Spelling
SMI Structure of Management Information
MIB Management Information Base
PDU Protocol Data Unit
NMS Network Management System
2.2.2 Introduction
Definition
The Simple Network Management Protocol (SNMP) is used to manage TCP/IP networks. It
uses a central computer (a network management station) that runs network management
software. SNMP has the following characteristics:
• Simplicity: SNMP applies to small-scale networks requiring high speed and low costs
because it uses a polling mechanism and provides basic functions. SNMP uses UDP
packets, and is therefore supported by most devices.
• Ease of use: SNMP ensures the transmission of management information between any
two devices on the network, thereby allowing the network administrator to query
information, modify parameters, and locate faults on any device.
Purpose
As networks rapidly develop and applications become more diversified, network management
becomes difficult due to the following factors:
• The number of network devices is dramatically increasing, which increases the network
administrator's workload. In addition, networks' coverage areas are constantly being
expanded, making real-time monitoring and fault location of network devices difficult.
• The network supports a variety of devices from different vendors. Each vendor has a set
of management interfaces (such as command line interfaces), which complicates network
management.

SNMP has been developed to simplify the management of large numbers of network devices.
SNMP uses the network management system (NMS) to manage these network devices in
batches, which greatly improves management efficiency. SNMP can manage various network
devices from different vendors, regardless of the differences between these devices.
Along with hardware and software, SNMP monitors, configures, analyzes, estimates, and
controls network resources, ensuring a higher quality of service and better operating
performance at a lower cost.
Version Evolution
In May 1990, RFC 1157 was developed to define the first SNMP version: SNMPv1. RFC
1157 provides a systematic method for monitoring and managing the network. SNMPv1
cannot ensure the security of the network because it is based on community-name
authentication, and only a few error codes are returned.
Later, the Internet Engineering Task Force (IETF) released SNMPv2p. For network security,
SNMPv2p introduced the concept of a "participant". This concept, however, was not widely
adopted because of problems encountered in practice. SNMPv2p was then replaced by
SNMPv2c. SNMPv2c does not include the "participant" concept. It still uses the
community-name authentication of SNMPv1 but introduces the get-bulk operation and provides
more error codes.
Because SNMPv2c did not provide a high level of security, the IETF released SNMPv3.
SNMPv3 provides encrypted authentication based on the User-based Security Model (USM) and
access control based on the View-based Access Control Model (VACM).
At present, Huawei products support SNMPv1, SNMPv2c, and SNMPv3.
Benefits
• Improves the work efficiency of the network administrator. The network administrator
can use SNMP to query information, modify information, and locate faults on any
device.
• Reduces management costs. SNMP provides basic functions for managing devices with
different management tasks, physical attributes, and network types.
• Reduces the impact of feature operations on the device. SNMP is simple in terms of
hardware/software installation, packet type, and packet format.
• Ensures reliable packet transmission by providing a retransmission mechanism. SNMP
supports packet transmission in the "request-response" mode and the active report mode.
• Ensures secure packet transmission by providing security mechanisms such as
authentication and encryption.
2.2.3 Principle
2.2.3.1 SNMP Management Model and Related Concepts

Figure 2-6 shows the Simple Network Management Protocol (SNMP) management model.
Figure 2-6 SNMP management model

[Figure: an NMS connects through the Internet to managed devices. Each device runs an agent
with a MIB whose OID nodes include 1.3.6.1.2.1.1.1 (sysDescr), 1.3.6.1.2.1.1.2 (sysObjectID),
and 1.3.6.1.2.1.1.3 (sysUpTime). Each node corresponds to a management object.]
Each network management system has at least one network management station (NMS)
running management processes to manage network devices.
The network contains devices to be managed, and an agent process runs on each of these
devices. Each managed device may have several management objects. The agent queries the MIB
on the device at the request of the NMS.
Elements in the network management system are as follows:
• NMS
A network manager or a system using SNMP to manage or monitor network devices.
The NMS runs on NMS servers.
– An NMS can send requests to an agent on a device to query or modify the value of
one or multiple parameters.
– An NMS can receive trap messages sent from the agent on a device to learn the
current status of the device.
• Agent
An agent process on the network device, which maintains data of the managed
device and responds to requests from the NMS by sending management data to the
NMS.
– Upon receiving requests from the NMS, the agent performs the required operation
on the MIB and sends the operation result to the NMS.
– When a fault or an event occurs on the device, the agent running on the device
sends notifications to the NMS, reporting the current status of the device.
• MIB
A database. It contains variables maintained by network elements. These variables can
be queried and set by the management process. The MIB defines the name, status, access
rights, and data type of each managed object.
An agent can use the MIB to:
– Learn the current status of the device.
– Set the status parameter of the device.
As shown in Figure 2-7, data information is saved in a tree structure (OID tree) similar
to that of the Domain Name System (DNS). Each Object Identifier (OID) is mapped with
a management object. In this example, the OID of the system is 1.3.6.1.2.1.1 and the
OID of the interface is 1.3.6.1.2.1.2.
The OID tree facilitates information management and improves management efficiency.
With the OID tree, the network administrator can query information in batches.
Figure 2-7 OID tree

root
  ccitt(0)
  iso(1)
    standard(0)
    registration authority(1)
    member body(2)
    identified organization(3)
      dod(6)
        internet(1)
          directory(1)
          mgmt(2)
            mib-2(1)   1.3.6.1.2.1
              system(1), interface(2), at(3), ip(4), icmp(5), tcp(6), udp(7), egp(8), ...
          experimental(3)
          private(4)
            enterprises(1)   1.3.6.1.4.1
          security(5)
          snmpv2(6)
  joint-iso-ccitt(2)
• Management object
Object to be managed. A device may have multiple management objects, including a
hardware component (such as an interface board), software, and parameters (such as a
route selection protocol) configured for the hardware or software.
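The OID tree described above lends itself to prefix comparison. The following Python sketch shows how an OID can be tested against the system and interface subtrees named in the text; the function names and the SUBTREES table are assumptions made for illustration.

```python
def parse_oid(text):
    """Convert a dotted OID string such as "1.3.6.1.2.1.1" into a tuple."""
    return tuple(int(part) for part in text.split("."))


def in_subtree(oid, root):
    """Return True if oid lies under root in the OID tree."""
    oid_parts, root_parts = parse_oid(oid), parse_oid(root)
    return oid_parts[:len(root_parts)] == root_parts


# Subtree roots given in the text.
SUBTREES = {"system": "1.3.6.1.2.1.1", "interface": "1.3.6.1.2.1.2"}


def subtree_name(oid):
    """Return the name of the known subtree that contains oid, if any."""
    for name, root in SUBTREES.items():
        if in_subtree(oid, root):
            return name
    return None


print(subtree_name("1.3.6.1.2.1.1.3"))
```

Comparing integer tuples rather than strings avoids false matches: as a string, "1.3.6.1.2.1.11" starts with "1.3.6.1.2.1.1", but it is not part of the system subtree.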
2.2.3.2 SNMPv1
This section describes SNMPv1 in terms of the packet format and working principle.

SNMPv1 Packet Format

As shown in Figure 2-8, an SNMPv1 packet has the version, community name, and SNMP
Protocol Data Unit (PDU) fields.
Figure 2-8 SNMPv1 packet format

[Figure: an IP data packet carries a UDP data packet, which carries the SNMP packet. Layout:
IP header | UDP header | Version | Community name | SNMPv1 PDU, where the PDU is a
Get/GetNext/Set PDU, a Response PDU, or a Trap PDU.]
The fields in an SNMPv1 packet are defined as follows:

• Version: specifies the SNMP version. Its value is the version number minus one. For
example, the value of the version field in an SNMPv1 packet is 0.
• Community name: authenticates packets exchanged between the agent and the NMS. Its
value is a character string; a common value is the 6-character string "public". Community
names are classified into readable community names and writable community names. The
readable community name is used for the get and get-next operations; the writable
community name is used for the set operation.
• SNMPv1 PDU: contains information about the PDU type, request ID, and variable
binding list. SNMP PDUs can be classified into get PDUs, get-next PDUs, set PDUs,
response PDUs, and trap PDUs.
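The fields listed above can be modeled with two small data classes. This Python sketch is a logical model only: it does not perform BER encoding, and the class names, the function community_accepts, and the community strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pdu:
    pdu_type: str                 # "get", "get-next", "set", "response", or "trap"
    request_id: int
    varbinds: list = field(default_factory=list)  # variable binding list


@dataclass
class SnmpV1Message:
    community: str
    pdu: Pdu
    version: int = 0              # SNMPv1: version number 1 minus one


READ_OPERATIONS = {"get", "get-next"}
WRITE_OPERATIONS = {"set"}


def community_accepts(message, read_community, write_community):
    """Check the community name against the operation type of a request."""
    if message.pdu.pdu_type in READ_OPERATIONS:
        return message.community == read_community
    if message.pdu.pdu_type in WRITE_OPERATIONS:
        return message.community == write_community
    return False


request = SnmpV1Message("public", Pdu("get", 1, [("1.3.6.1.2.1.1.1.0", None)]))
print(request.version, community_accepts(request, "public", "private"))
```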
SNMPv1 Working Principle

As shown in Figure 2-9, SNMPv1 defines five types of operations for exchanging
information between the NMS and the agent.
Figure 2-9 SNMPv1-defined operation types

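[Figure: the NM station (UDP port 162) sends Get-request, GetNext-request, and Set-request
packets to the agent (UDP port 161). The agent answers get and GetNext requests with
Get-response, answers set requests with Set-response, and sends Trap packets to the NM
station.]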
The operation types defined in SNMPv1 are as follows:

l get-request
An NMS performs a get-request operation to obtain the value of one or several MIB
objects of the agent process. The agent process tries to process the get request and sends
a get-response packet to the NMS if the required information is found. If a device is too
heavily loaded to process the request, the agent process discards the request packet.
l GetNext-request
An NMS performs a GetNext-request operation to obtain the value of the next MIB
object, in lexicographic order, from the agent process. After receiving the response
packet, the NMS performs another GetNext-request operation, and so on, until every MIB
sub-tree has been traversed and the agent process returns an error indicating that there
are no more MIB objects.
l set-request
An NMS performs the set-request operation to set the value for one or more MIB objects
of the agent. The set-request operation can set or change the values of the MIB objects
with a maximum access right of "read-write".
l response
The agent process performs the response operation to return one or more values in
response to the get or set request of the NMS.
l trap
The agent performs the trap operation to notify the NMS of a fault or event on the
managed device. This operation allows the network administrator to process the fault or
event in time.
The agent sends trap messages to the NMS only when conditions that are predefined on a
module are triggered. This trap sending mode has the following advantages:
– Only critical events are reported, which reduces the number of trap messages sent to
the NMS.
– The contents of the trap messages are simple, which reduces the byte counts.
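The repeated get-next traversal described above can be sketched in Python with a toy in-memory agent. This is only an illustration under stated assumptions: ToyAgent, oid_key, and walk are hypothetical names, the MIB is a plain dictionary, and a return value of None stands in for the error response that signals the end of the MIB.

```python
from bisect import bisect_right


def oid_key(oid: str) -> tuple:
    """Turn '1.3.6.1' into (1, 3, 6, 1) so that OIDs compare lexicographically."""
    return tuple(int(part) for part in oid.split("."))


class ToyAgent:
    """Hypothetical agent that holds MIB objects in lexicographic order."""

    def __init__(self, mib: dict):
        self._mib = mib
        self._oids = sorted(mib, key=oid_key)
        self._keys = [oid_key(o) for o in self._oids]

    def get_next(self, oid: str):
        """Return (oid, value) of the first object after `oid`, or None at the end."""
        i = bisect_right(self._keys, oid_key(oid))
        if i == len(self._oids):
            return None  # stands in for the end-of-MIB error response
        nxt = self._oids[i]
        return nxt, self._mib[nxt]


def walk(agent: ToyAgent, start: str = "1.3"):
    """Issue get-next repeatedly, feeding each returned OID into the next request."""
    results, current = [], start
    while (reply := agent.get_next(current)) is not None:
        current, value = reply
        results.append((current, value))
    return results


agent = ToyAgent({
    "1.3.6.1.2.1.1.10.0": "c",
    "1.3.6.1.2.1.1.5.0": "b",
    "1.3.6.1.2.1.1.1.0": "a",
    "1.3.6.1.2.1.2.1.0": 4,
})
for oid, value in walk(agent):
    print(oid, value)
```

Comparing OIDs as integer tuples rather than as strings matters: sub-identifier 5 sorts before 10, which a plain string comparison would get wrong.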
2.2.3.3 SNMPv2c
This section describes SNMPv2c in terms of the packet format and working principle.
SNMPv2c Packet Format
Figure 2-10 SNMPv2c packet format

IP header | UDP header | Version | Community name | SNMPv2c PDU
SNMPv2c PDU: Get/GetNext/Set PDU, Response PDU, Trap PDU, GetBulk PDU, or Inform PDU
(The version, community name, and PDU form the SNMP packet, which is carried in a UDP
data packet inside an IP data packet.)

As shown in Figure 2-10, SNMPv2c Protocol Data Units (PDUs) can be classified into get
PDUs, get-next PDUs, set PDUs, response PDUs, trap PDUs, and two newly added PDUs
(get-bulk PDUs and inform PDUs).
SNMPv2c Working Principle
Figure 2-11 get-bulk operation
NM Station to Agent (UDP port 161): Get-Bulk-request; the agent replies with Get-response
Agent to NM Station (UDP port 162): Inform-request; the NM Station replies with Response
Compared with SNMPv1, two operation types are added in SNMPv2c, as shown in Figure
2-11.
l get-bulk
An NMS performs the get-bulk operation to query multiple pieces of information about a
managed device in one request. One get-bulk operation is equivalent to multiple
consecutive get-next operations. The number of get-next operations that one get-bulk
operation replaces (that is, the number carried out in a single get-bulk packet
exchange) can be set on the NMS.
l Inform
A managed device performs the inform operation to send notifications to the NMS. This
operation is only supported in SNMPv2c. Unlike a trap message, an inform message
must be acknowledged by the NMS. If the NMS does not send a response, the managed
device resends the inform message until a response is returned or the number of
retransmissions reaches the upper limit. If an inform message fails to be sent, the
managed device logs the failure. When the NMS restarts, it is notified of any inform
messages that failed to be sent. Inform messages are therefore more reliable than trap
messages.
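The acknowledge-or-retransmit behavior of the inform operation can be sketched as follows. This is a simplified illustration, not the device implementation: send_once is a hypothetical callback that returns True when the NMS responds, the retry limit of 3 is an assumed value, and real timers between retransmissions are omitted.

```python
failed_informs = []  # stands in for the failure log kept on the managed device


def send_inform(send_once, max_retries: int = 3):
    """Send an inform and resend it until acknowledged or the retry limit is hit.

    Returns (acknowledged, attempts).
    """
    attempts = 0
    for _ in range(1 + max_retries):  # first transmission plus retransmissions
        attempts += 1
        if send_once():
            return True, attempts
    return False, attempts


def notify(event: str, send_once, max_retries: int = 3) -> bool:
    """Send one notification and log it if no response was ever received."""
    acknowledged, _ = send_inform(send_once, max_retries)
    if not acknowledged:
        failed_informs.append(event)
    return acknowledged


# The NMS misses the first two transmissions and acknowledges the third.
replies = iter([False, False, True])
print(send_inform(lambda: next(replies)))

# The NMS never answers, so the event ends up in the failure log.
notify("linkDown", lambda: False)
print(failed_informs)
```

A trap, by contrast, corresponds to a single call of send_once with the result ignored.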
2.2.3.4 SNMPv3
This section describes SNMPv3 in terms of the packet format and working principle.
SNMPv3 Packet Format

SNMPv3 defines a new packet format shown in Figure 2-12.

Figure 2-12 SNMPv3 packet format

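IP header | UDP header | Version | Header data | Security parameters | SNMPv3 PDU
Security parameters: Username, Private key, Private parameter
SNMPv3 PDU: Get/GetNext/Set PDU, Response PDU, Trap PDU, or GetBulk PDU
(The fields from Version onward form the SNMP packet, which is carried in a UDP data
packet inside an IP data packet.)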
Compared with SNMPv1 and SNMPv2c, two fields are added to SNMPv3 packets:
l Header data: records the maximum message size supported by the sender, the security
mode, and whether the message is encrypted or authenticated.
l Security parameter: records the user name, authentication key, and private
parameter.
SNMPv3 Working Principle

SNMPv3 uses the SNMPv3 entity for the communication between different SNMP-enabled
NMSs. An SNMPv3 entity consists of SNMPv3 engines and applications, and each SNMPv3
engine or application has multiple modules.
The modular architecture of the SNMP entity has the following advantages:
l Strong adaptability: This architecture is adaptable to both simple and complex
networks.
l Easy management: This architecture consists of multiple independent sub-systems and
applications. When a fault occurs in the system, the sub-system in which the fault
occurred can be located easily based on the fault type.
l Excellent expandability: An SNMP system can be extended by adding modules to the
SNMP entity. For example, a module can be added to the security sub-system to support
a new security protocol.
In addition, security is enhanced in SNMPv3.
SNMPv1 and SNMPv2c use the community name for packet authentication between the NMS
and the agent. This authentication mode is less secure. To enhance system security, SNMPv3
sets private keys for different users and provides data encryption, encrypted authentication,
and user access control functions, and allows the communication between AAA users and the
NMS. AAA users of different levels have permission to access different MIB objects.
NOTE
To improve system security, it is recommended to configure different authentication and
encryption passwords for an SNMP user.

2.2.3.5 Comparisons of SNMPv1, SNMPv2c, and SNMPv3
Usage Scenarios of SNMP Versions

Usage scenarios of the Simple Network Management Protocol (SNMP) versions are as
follows:
Table 2-6 Usage scenarios of SNMP versions

Protocol Version  Usage Scenario
SNMPv1            Applies to small-scale networks that are secure and stable or have
                  low security requirements, such as campus networks or small-scale
                  enterprise networks.
SNMPv2c           Applies to medium- and large-scale networks that are secure and
                  stable or have low security requirements, such as VPN networks.
                  Traffic congestion may occur if these networks are overloaded, so
                  the alarm type is set to inform on these networks to ensure that
                  notifications from the managed device are received by the NMS.
SNMPv3            Applies to networks of any scale, particularly networks that have
                  high security requirements and allow only authenticated
                  administrators to manage network devices.
Packet Fields of Each SNMP Version

Packet fields of each SNMP version are as follows:
Table 2-7 Packet fields of each SNMP version

Protocol Version  Field
SNMPv1            Community name and SNMPv1 PDU (get PDU, get-next PDU, set PDU,
                  response PDU, and trap PDU)
SNMPv2c           Community name and SNMPv2c PDU (get PDU, get-next PDU, set PDU,
                  response PDU, trap PDU, get-bulk PDU, and inform PDU)
                  Compared with SNMPv1, SNMPv2c has the following characteristics:
                  l More operation types are provided.
                  l The inform alarm is more reliable than the trap alarm.
                  l More standard error codes for defining different scenarios are
                    supported.
SNMPv3            Header data, security parameter (user name, private key, and
                  private parameter), and SNMPv3 PDU (get PDU, get-next PDU, set PDU,
                  response PDU, trap PDU, and get-bulk PDU)
                  Compared with SNMPv2c, SNMPv3 has the following characteristics:
                  SNMPv3 is more secure than SNMPv1 and SNMPv2c because it supports
                  user authentication, user access control, authorization, and
                  authentication encryption. Authentication modes include MD5 and
                  SHA, and the encryption modes are DES-56, 3DES, AES-128, AES-192,
                  and AES-256.
2.2.3.6 SNMP Attack Defense Mechanism

Simple Network Management Protocol (SNMP) provides a sound attack defense mechanism
to prevent invalid users from logging in to the device and protect the device against attacks.
The SNMP attack defense mechanism is implemented as follows:
l When a user logs in to the device for the first time, SNMP authenticates the user. A user
that passes authentication logs in successfully. If the user fails authentication, SNMP
locks the user, adds the user to the list of locked users, and sets an 8-second timeout
period.
l If the user re-sends a login request before the timeout period expires, SNMP discards the
request. If the user re-sends a login request after the timeout period expires, SNMP
re-authenticates the user. If the user passes authentication, the login succeeds and
SNMP deletes the user from the list of locked users. If the user fails authentication
again, SNMP re-adds the user to the list of locked users and doubles the timeout
period to 16 seconds.
l If a user fails authentication repeatedly, the user is locked for 8, 16, and 32 seconds
for the first, second, and third failures, respectively, and for 5 minutes for the fourth
and each subsequent failure.
l The lockout record is kept for 5 minutes from the time the user is added to the list of
locked users the last time. If the 5-minute period does not expire, the record is still kept
in the list even if a lockout timeout occurs. SNMP deletes the record from the list only
when the 5-minute period expires.
l SNMP generates a lockout log every time the user is added to the list of locked users.
When a lockout timeout occurs, SNMP generates an unlocking log.
l If the number of locked IP addresses exceeds the upper threshold, the system reports an
alarm. If the number of locked IP addresses reaches the upper threshold, the system stops
processing SNMP packets until the number falls below the upper threshold.
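The lockout timing described in this section can be summarized in a short Python sketch. It models only the per-user timer: the class and function names are hypothetical, time is passed in as plain seconds, and resetting the failure count when the 5-minute record expires is an assumption made for this illustration rather than a statement from this document.

```python
LOCK_STEPS = (8, 16, 32)  # seconds for the first, second, and third failure
MAX_LOCK = 300            # 5 minutes from the fourth failure onward
RECORD_TTL = 300          # lockout record lifetime, counted from the last lock


def lock_seconds(failures: int) -> int:
    """Lock duration that applies after the given number of failures."""
    if failures < 1:
        raise ValueError("failures must be at least 1")
    return LOCK_STEPS[failures - 1] if failures <= len(LOCK_STEPS) else MAX_LOCK


class LockList:
    """Hypothetical list of locked users: user -> (failures, locked_at)."""

    def __init__(self):
        self._records = {}

    def attempt(self, user: str, password_ok: bool, now: float) -> str:
        """Process one login request; return 'discarded', 'accepted', or 'locked'."""
        record = self._records.get(user)
        if record and now - record[1] >= RECORD_TTL:
            del self._records[user]  # record expired (assumed to reset the count)
            record = None
        if record and now < record[1] + lock_seconds(record[0]):
            return "discarded"  # request arrived before the lock timed out
        if password_ok:
            self._records.pop(user, None)
            return "accepted"
        failures = record[0] + 1 if record else 1
        self._records[user] = (failures, now)
        return "locked"


locks = LockList()
print(locks.attempt("admin", False, now=0))   # first failure, locked for 8 s
print(locks.attempt("admin", True, now=5))    # still locked, request discarded
print(locks.attempt("admin", False, now=8))   # second failure, locked for 16 s
print(locks.attempt("admin", True, now=24))   # lock expired, login accepted
```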
2.2.4 Applications
2.2.4.1 SNMP for Configuration Management

The network administrator needs to configure and manage all devices on the network. On a
network with widely dispersed devices, as shown in Figure 2-13, the network administrator
cannot configure and manage each device on site. If the network devices are provided by
different vendors and each vendor's devices have their own management interfaces (such as
command line interfaces), the administrator's workload increases further. To reduce
operating costs and improve efficiency, the network administrator can use SNMP to manage,
configure, and monitor network devices remotely.
Figure 2-13 SNMP for configuration management
(Figure content: an NM Station running the manager (M) uses SNMP over the IP network to
manage devices running the agent (A) on multiple LANs.)
M (manager): a program that enables the NMS to send requests to devices.
A (agent): a program that enables a device to respond to requests from the NMS.
SNMP is enabled on the network, the SNMP manager is configured on the NMS, and the agent
is enabled on the managed device.
With SNMP:
l The NMS can learn the device status by sending requests to the agent and control
devices remotely.
l The agent can report the status and faults of the device to the NMS in real time.
2.2.4.2 SNMP for VPN User Management

The virtual private network (VPN) technology is developed and used over an IP network to
ensure efficient, secure, and fast information transmission.
To establish VPNs between devices and ensure that each VPN user can access the correct
VPN, the Internet Service Provider (ISP) needs to manage devices uniformly. SNMP can be
used to implement uniform management over devices on the network.
On the network shown in Figure 2-14, PE1 is connected to CE1 and CE2, and PE2 is
connected to CE3 and CE4. Users on VPN A can communicate with each other through
tunnel A, and users on VPN B can communicate with each other through tunnel B.
l Customer Edge (CE): an edge device on a customer network that provides interfaces
directly connected to the Service Provider (SP) network.
l Provider Edge (PE): an edge device on the SP network. A PE is directly connected
to a CE.
l Tunnel: a connection-oriented channel used to transmit messages on the VPN
network.
Figure 2-14 SNMP for VPN user management
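(Figure content: the NMS (M) manages PE1 and PE2 (A). CE1 in VPN A and CE2 in VPN B
connect to PE1; CE3 in VPN A and CE4 in VPN B connect to PE2. Tunnel A and Tunnel B run
between PE1 and PE2.)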
SNMP deployed on PE1 and PE2 can provide the following functions:
l Enables the NMS to manage PEs in batches and establish a tunnel on the VPN network.
l Manages PEs and their accessed CEs using the NMS and ensures that CE1 and CE3 are
added to VPN A, and CE2 and CE4 are added to VPN B.
These functions reduce network management costs, make it possible to monitor device
operating performance, and improve service quality.
Term
Term              Description
Agent process     A management process running on the managed devices and
                  management software
Acronym and Abbreviation
Acronym and Abbreviation    Full Name
NMS                         Network Management Station
2.3 RMON and RMON2

This chapter introduces Remote Network Monitoring (RMON) and RMON2 and
describes their basic concepts, principles, and applications on Huawei devices.
2.3.1 Introduction
This section describes the basic knowledge about RMON.
Basic Concepts of RMON

l NMS
A Network Management System (NMS) is the network management software running
on the Network Management Workstation (NM WS). The network manager sends
requests to the managed devices, and monitors and configures the network devices
through the NMS.
l MIB
A Management Information Base (MIB) is a virtual database that collects the
information about the status of the managed devices.
l Agent
An agent is a process that runs on the managed devices.
l Monitor
A monitor is used to trace communications across network devices. It is also called the
network probe.
l Poll
The NM workstation sends SNMP packets to query the operating status of the device and
configure parameters.
Drawbacks of SNMP
SNMP is a widely used network management protocol. It collects statistics about network
communications by using the agent software embedded in the managed device. The
management software polls the agent for the information. The agent then searches the MIB
and returns the required information to the NMS. This process implements network
management through the NMS. Although the MIB counters record cumulative totals, they
cannot be used to analyze how daily communications changed over time. To obtain complete
information about traffic and changes in traffic volume over a day, the NMS software must
poll the agent continuously for the required information and then analyze the network status.
Polling in SNMP has the following drawbacks:
l Consumes network resources heavily.
In a large-scale network, polling generates heavy traffic across the network, which can
lead to network congestion or even blocking. In addition, SNMP collects a large amount
of information, such as routing table information. Therefore, SNMP is not suitable for
managing large-scale networks.
l Burdens the network manager.
In polling, the network manager is responsible for collecting information through the
NMS software. If the manager monitors more than three network segments, the
workload may be too heavy for the network manager to complete the monitoring task.
In addition, MIB-II contains public MIBs and private MIBs defined by manufacturers. They
collect information about devices, such as port status and traffic information, instead of the
information about the NMS running in the sub-network. Therefore, SNMP is not suitable to
manage large-scale networks.
Introduction of RMON
To improve the usability of management information, lighten the burden on the NMS, and
enable the network manager to monitor several network segments, the Internet Engineering
Task Force (IETF) proposed RMON to replace SNMP for managing increasingly distributed
networks.
RMON is based on SNMP and is compatible with SNMP. It consists of two parts: the NMS
and the SNMP agent. The implementation of RMON is simple because it uses the original
mechanism of SNMP. RMON enables the SNMP module to monitor remote network devices
more efficiently and actively. It provides an efficient method to monitor the running status of
sub-networks, which reduces the communication traffic between the NMS and the Agent.
Large-scale networks can therefore be managed in a simple and effective manner.
RMON Goals
RMON provides an effective method to monitor traffic behaviors in sub-networks. RMON
goals are as follows:
l Offline operation: The monitor can continuously collect information about errors,
performance, and configuration even when the network manager is not available.
l Proactive monitoring: The monitor must be available at the onset of any network failure.
It can notify the network manager of the failure and provide useful statistics for fault
location.
l Problem detection and reporting: The monitor can be configured to monitor conditions,
such as faults in the network and resource consumption. When any of these conditions
occurs, an event is logged. This is helpful in checking errors.
l Data analyzing: The monitor can collect and analyze data about the sub-network. This
lightens the burden on the NM Station.
l Multiple managers: Multiple managers can be used to enhance reliability. Managers have
different functions and provide different management performance for interior devices.
2.3.2 Principles
RMON defines a set of MIBs, which contain information about standard network
monitoring functions and interfaces. This implements the communication between the SNMP
management terminal and the remotely managed devices.

2.3.2.1 RMON and RMON2 Infrastructure

RMON defines a set of MIBs, which contain information about standard network
monitoring functions and interfaces. This implements the communication between the SNMP
management terminal and the remotely managed devices. MIBs are divided into a number of
function table groups. Each table group may have one or more control tables and
data tables. Managers can read from or write into control tables. The data tables, however,
are read-only. Control tables define the data collection function, whereas data tables
collect data according to the specifications in control tables. The RMON MIBs are part of
MIB-II, and the identifier of the RMON sub-tree is 16. The RMON MIBs are divided into nine
groups with different functions.
Figure 2-15 shows the MIB groups defined in RMON and RMON2.

Figure 2-15 MIB groups in RMON and RMON2
rmon (mib-2 16)
RMON groups: statistics (1), history (2), alarm (3), host (4), hostTopN (5), matrix (6),
filter (7), capture (8), event (9)
RMON2 groups: protocolDir (11), protocolDist (12), addressMap (13), nlHost (14),
nlMatrix (15), alHost (16), alMatrix (17), userHistory (18), probeConfig (19)
l Statistics group: collects basic statistics of each monitored sub-network. The statistics
include the data flow on a network segment, distribution of various packets, error frames,
and collision times.
l History group: periodically collects the network status statistics and stores them for
future use.

l Alarm group: allows predefining a set of thresholds for alarm variables that can be any
object in the local MIB. The monitor records logs or sends trap messages to the NMS
when the sample crosses a threshold in a certain direction.
l Host group: contains inbound and outbound traffic statistics associated with each host
discovered on the network.
l HostTopN group: contains statistics about hosts that top a list ordered by one of the
parameters.
l Matrix group: stores errors and useful information in the form of a matrix. This is
convenient for operators to search the information based on any set of two addresses.
l Filter group: allows the monitor to observe packets on the interface and select a specific
packet through filtering.
l Packet getting group: provides a cache mechanism and allows packets to be obtained
after they flow through a channel.
l Event group: stores all the events generated by the RMON agent in a table. The event
group records logs or sends trap messages to the NMS when an event occurs.
NOTE
The alarm group requires the implementation of the event group. The hostTopN group requires the
implementation of the host group. The packet getting group requires the implementation of the
filter group.
Statistics Group
The statistics group collects basic statistics of each monitored sub-network.
Figure 2-16 shows the three tables in the statistics group.
Figure 2-16 Statistics group
statistics (rmon 1)
etherStatsTable (1)
tokenRingMLStatsTable (2)
tokenRingPStatsTable (3)
EtherStatsTable
This table contains 21 objects. It has a record entry for each monitored sub-network to display
statistics about the sub-networks. Most objects in this table are counters, used by the monitor
to record packets with different status across sub-networks.
The EtherStatsTable contains information about sub-networks and error information, such as
Cyclic Redundancy Check (CRC) errors and the numbers of correct and incorrect packets.
This table therefore reflects the operating status of the entire network. The information
collected in the EtherStatsTable is similar to that in MIB-II, but it is more detailed
and more pertinent to Ethernet networks.

Upper and lower limits in the alarm table are set based on the statistics in the EtherStatsTable.
Setting the alarm threshold is an effective method for network monitoring.
tokenRingMLStatsTable and tokenRingPStatsTable
The tokenRingMLStatsTable and the tokenRingPStatsTable provide statistics of token
ring networks. Most objects in the tables are counters.
History Group
The history group periodically collects statistical samples on a monitor.
This group consists of one historyControlTable and three HistoryTables as shown in Figure
2-17.
Figure 2-17 History group
history (rmon 2)
historyControlTable (1)
etherHistoryTable (2)
tokenRingMLHistoryTable (3)
tokenRingPHistoryTable (4)
l historyControlTable
The historyControlTable contains detailed information, such as the sampling interval and
interface information. Every record in it defines the sampling interval for a specified
interface. After being sampled, data is saved as a related entry in the data table. As
defined in RMON, a monitored interface must have two control rows, one of which
defines the sampling interval as 30 seconds and the other defines the sampling interval as
30 minutes. The short interval is used to detect the burst communication events, and the
long interval is used to detect the stable communication events.
l Data Table
Data tables are used to record data. The etherHistoryTable is a data table specifically
for Ethernet networks. The tokenRingMLHistoryTable and the tokenRingPHistoryTable are
data tables for token ring networks. Similar to the statistics group, the data tables
also provide counters.
Alarm Group
The alarm group allows a set of thresholds to be predefined for variables. If a monitored
variable crosses a threshold, an event is generated, and the monitor records a log or sends
a trap message to the NMS. This group depends on the event group and requires the
implementation of the event group. The alarm group consists of only one table: the
alarmTable. Each record defines the specified variable, sampling interval, and threshold.
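The threshold-crossing behavior of the alarm group can be sketched in Python. This is a simplified model under stated assumptions: alarm_events is a hypothetical function, the rising threshold is assumed to be greater than the falling threshold, and each direction is re-armed only after the opposite threshold has been crossed, so that a value hovering around one threshold does not produce a flood of events.

```python
def alarm_events(samples, rising: int, falling: int):
    """Return (sample_index, direction) for each threshold crossing.

    After a rising event, no further rising event is generated until the value
    has dropped to the falling threshold, and vice versa.
    """
    events = []
    rising_armed = falling_armed = True
    for index, value in enumerate(samples):
        if value >= rising and rising_armed:
            events.append((index, "rising"))
            rising_armed, falling_armed = False, True
        elif value <= falling and falling_armed:
            events.append((index, "falling"))
            falling_armed, rising_armed = False, True
    return events


# Sampled values of one monitored variable, one entry per sampling interval.
samples = [30, 60, 70, 55, 20, 15, 80]
print(alarm_events(samples, rising=50, falling=25))
```

Each reported crossing would correspond to an event in the event group, which then records a log or sends a trap message to the NMS.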
Host Group
The host group collects statistics associated with each host newly discovered on a
Local Area Network (LAN). It discovers hosts by monitoring the source and destination MAC
addresses in the packets transmitted across the LAN. The host group retains a set of
statistics for each host.
This group consists of the hostControlTable, the hostTable, and the hostTimeTable, as shown
in Figure 2-18.
Figure 2-18 Host group
host ( rmon 4 )
hostControlTable( 1 )
hostTable( 2 )
hostTimeTable( 3 )
l hostControlTable
Every row in the hostControlTable corresponds to a monitored network interface.
Options in the control tables define various data. Control tables also record the time
when entries in the data table are deleted. The control tables and the data tables are
directly mapped. The hostTable records the MAC addresses discovered by network
interfaces specified by rows in the control table.
l hostTable
Rows in the hostTable store statistics about hosts. This table can be indexed either by
the MAC addresses of hosts or by network interfaces. If the network interface discovers a
new host, a row is added to the hostTable. Once a row is added to the hostTable, the
monitor begins to check the MAC address of the corresponding network interface.
l hostTimeTable
Rows in the hostTimeTable store the same information as that in the hostTable. The
hostTimeTable is indexed by the creation time instead of the MAC address.
The hostTimeTable also helps management stations: new entries for a specified interface
can be found efficiently without downloading the complete table.
HostTopN Group
The hostTopN group is used to maintain statistics of the hosts in a sub-network. The
monitored hosts top a list ordered by one of their variables. For example, this group can
collect information about the 10 hosts that transmit the most data.

This group consists of the hostTopNControlTable and the hostTopNTable, as shown in Figure
2-19.
Figure 2-19 HostTopN group
hostTopN (rmon 5)
hostTopNControlTable (1)
hostTopNTable (2)
l hostTopNControlTable
Every row in the hostTopNControlTable defines a Top-N report of a network interface. It
also covers the period from the time the last Top-N report is initialized to the time the
system starts.
l hostTopNTable
This table contains information about the Top-N hosts. Each row represents a unique host.
This table also contains the MAC address of the host and defines the changes of the
sampled data.
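A Top-N report of this kind can be sketched as a ranking of hosts by the change in one sampled counter over the report interval. The function name top_n_hosts, the MAC addresses, and the counter values below are hypothetical and serve only as an illustration.

```python
import heapq


def top_n_hosts(before: dict, after: dict, n: int):
    """Rank hosts by how much one counter grew between two samples.

    `before` and `after` map a host MAC address to the counter value at the
    start and at the end of the report interval.
    """
    deltas = {mac: after[mac] - before.get(mac, 0) for mac in after}
    return heapq.nlargest(n, deltas.items(), key=lambda item: item[1])


start = {"00-e0-fc-00-00-01": 100, "00-e0-fc-00-00-02": 50}
end = {"00-e0-fc-00-00-01": 150, "00-e0-fc-00-00-02": 500, "00-e0-fc-00-00-03": 20}
print(top_n_hosts(start, end, n=2))
```

A host that first appears during the interval, such as the third address above, is counted from zero.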
Matrix Group
The matrix group stores statistics of traffic between hosts in a sub-network. The statistics are
stored in a matrix format. This group consists of the matrixControlTable, the matrixSDTable,
and the matrixDSTable, as shown in Figure 2-20.
Figure 2-20 Matrix group
matrix (rmon 6)
matrixControlTable (1)
matrixSDTable (2)
matrixDSTable (3)
l matrixControlTable
Each row in this table identifies a sub-network. It displays the session status on the
network interface and records the statistics of sessions in two data tables.

l matrixSDTable
This table is used to store the statistics of traffic from the specified source host to
multiple destination hosts. This table records two entries for each pair of hosts that
have exchanged information recently. One entry records traffic sent from the source host
to the destination host; the other entry records traffic sent from the destination host
to the source host.
l matrixDSTable
This table is similar to the matrixSDTable. The difference lies in the sequence of
indexes.
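The relationship between the two data tables can be sketched as one set of counters kept under two index orders. The class name TrafficMatrix and its methods are hypothetical; the sketch shows only why a second table with the index order reversed makes lookups by destination as direct as lookups by source.

```python
from collections import defaultdict


class TrafficMatrix:
    """Hypothetical model of the two matrix data tables."""

    def __init__(self):
        self.sd = defaultdict(int)  # (source, destination) -> packets
        self.ds = defaultdict(int)  # (destination, source) -> packets

    def record(self, source: str, destination: str, packets: int = 1) -> None:
        """Count traffic for one conversation direction in both tables."""
        self.sd[(source, destination)] += packets
        self.ds[(destination, source)] += packets

    def sent_by(self, source: str) -> dict:
        """Scan the source-first table for everything one host sent."""
        return {d: n for (s, d), n in sorted(self.sd.items()) if s == source}

    def received_by(self, destination: str) -> dict:
        """Scan the destination-first table for everything one host received."""
        return {s: n for (d, s), n in sorted(self.ds.items()) if d == destination}


matrix = TrafficMatrix()
matrix.record("A", "B", 3)
matrix.record("B", "A")      # the reverse direction is a separate entry
matrix.record("C", "B", 2)
print(matrix.sent_by("A"))
print(matrix.received_by("B"))
```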
Filter Group
The filter group allows the monitor to trace the packets on a specified interface. The basic
components of this group are two types of filters: data filters and status filters. A data
filter allows the monitor to screen traced packets by bit pattern. A status filter allows the
monitor to match packets based on packet status. Filters can be combined using logical
AND/OR operations to form a complex test pattern.
This group consists of the filterTable and the channelTable.
Figure 2-21 Filter group
filter (rmon 7)
filterTable (1)
channelTable (2)
The filterTable defines related filters. Each row in the channelTable corresponds to a unique
channel and is associated with one or several rows in the filterTable.
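Bit-pattern and status matching can be sketched as masked comparisons. This is a simplified model, not the full RMON filter definition: the Filter fields and function names are hypothetical, inverted ("not") masks are left out, and the sketch assumes that a filter matches when both its data test and its status test pass (AND), while a channel accepts a packet when any of its filters matches (OR).

```python
from typing import NamedTuple


class Filter(NamedTuple):
    offset: int        # where in the packet the comparison starts
    data: bytes        # expected bytes
    data_mask: bytes   # bits set to 1 are compared, bits set to 0 are ignored
    status: int        # expected status bits
    status_mask: int   # status bits that are compared


def data_matches(packet: bytes, f: Filter) -> bool:
    """Compare the masked bits of the packet with the expected data."""
    window = packet[f.offset:f.offset + len(f.data)]
    if len(window) < len(f.data):
        return False  # packet too short to contain the tested field
    return all((p ^ d) & m == 0 for p, d, m in zip(window, f.data, f.data_mask))


def status_matches(status: int, f: Filter) -> bool:
    """Compare the masked status bits with the expected status."""
    return (status ^ f.status) & f.status_mask == 0


def channel_accepts(packet: bytes, status: int, filters) -> bool:
    """A channel accepts a packet if any associated filter matches it."""
    return any(data_matches(packet, f) and status_matches(status, f) for f in filters)


# Match the EtherType field (offset 12) of error-free frames (status bit 0 clear).
ipv4 = Filter(12, b"\x08\x00", b"\xff\xff", 0, 0b1)
arp = Filter(12, b"\x08\x06", b"\xff\xff", 0, 0b1)
frame = bytes(12) + b"\x08\x06" + b"payload"
print(channel_accepts(frame, 0, [ipv4, arp]))
print(channel_accepts(frame, 0, [ipv4]))
```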
Packet Getting Group

The packet getting group provides a cache mechanism that is used to obtain packet headers
after the packets flow through a channel. The packet getting group requires the
implementation of the filter group.
Figure 2-22 shows the two tables that the packet getting group contains.

Figure 2-22 Packet getting group
capture (rmon 8)
bufferControlTable (1)
captureBufferTable (2)
Each row in the bufferControlTable defines a cache used to get and store packets passing
through a channel. Each row in the captureBufferTable corresponds to an obtained packet.
Event Group
The event group can define events. An event can be triggered by a certain condition in the
MIB or can trigger a certain operation defined in the MIB. The event also generates logs
(recorded in this group) or SNMP trap messages.
This group consists of the eventTable and the logTable, as shown in Figure 2-23.
Figure 2-23 Event group
event ( rmon 9)
eventTable(1)
logTable(2)
The eventTable defines events. Each row in the table describes the parameters of the event
triggered by a certain condition.
If events are recorded, corresponding entries in the logTable are created.
2.3.2.2 Features of RMON and RMON2
Description of RMON and RMON2 Features

RMON MIBs contain information used for statistics, analysis, and diagnosis. The manager
can obtain the data by using standard tools from various manufacturers, which makes
remote network analysis possible. Generally, each sub-network has one
monitor, also called a probe or an RMON agent. A monitor is an independent device
dedicated to collecting and analyzing traffic statistics. To ensure more effective network
management, the monitor must communicate with the central network management station.
Monitors have the following functions:
l Traces packets in the network, collects statistics, and summarizes the
information.
l Provides important management information for the network manager and stores certain
packets for later analysis.
l Filters packets by type.
l Captures specific packets.
To implement RMON in the network, the monitor and the RMON client software must be
used. The monitor runs effectively without continuously polling the managed devices. It
generates a trend diagram to illustrate the network operating status based on the capability of
the RMON module to store history statistics. The monitor reports the operating status and
describes any obtained abnormal situation regardless of accidental network events. The client
software diagnoses the fault based on the reported information from the RMON module and
finds solutions.
RMON allows multiple monitors and collects data in the following ways:
l Uses a dedicated RMON probe. The NMS obtains management information from the
RMON probe and controls network resources directly. This approach gives access to all
the information in the RMON MIB, but it is costly because RMON probes must be
deployed in all LANs.
l Embeds the RMON agent in a network device (such as an ATN or a hub) so that the device
provides the RMON probe function. The NMS uses basic SNMP commands to exchange
data with the RMON agent and to collect network management information.
Because this approach is restricted by device resources, the NMS collects
information in four groups only (alarm, event, history, and statistics) rather than the
entire RMON MIB. This method improves the efficiency of network monitoring
and reduces costs.
NOTE
The ATN implements the monitoring and statistics collection function only on the Ethernet interfaces of
network devices.
New Feature of RMON2

RMON2 is one of the RMON MIB standards, serving as a supplement to RMON. In RMON2,
certain new groups are added.
RMON and RMON2 are both used to monitor Ethernet links. RMON monitors the traffic only
at the MAC layer whereas RMON2 can monitor the traffic at the MAC layer and the
subsequent upper layers.
RMON2 can decode data packets from layer 3 to layer 7 in the OSI model. The RMON2 agent
has the following functions:
l Monitors the traffic based on network layer protocols and addresses. This enables the
agent to learn its connected external LAN network segment and view the incoming
traffic to the LAN through the ATN.
l Records the incoming and outgoing traffic to and from a specific application because the
RMON2 agent decodes and monitors the traffic of applications such as email, the File
Transfer Protocol (FTP), and WWW.

In this manner, the monitor can record the information about the actions of the application on
a host and display diagrams to illustrate the action of each application. This strengthens the
network monitoring.
2.3.2.3 Remote Monitoring of RMON and RMON2

Special devices or a function module of the system can be used to implement remote
monitoring. To manage the remote monitor effectively, RMON MIB supports the terminal
management function.
Configurations
To implement remote monitoring, data collection must be configured, including the type of
data to be collected. MIBs are divided into multiple function groups. Each group contains one
or more control tables, corresponding to one or more data tables. The manager can read from
or write into the control tables, whereas data tables are read-only. A control table contains the
parameters that define what is collected in the data table. The manager collects the required
data by setting parameters in the control tables. Parameter setting is implemented by adding or
deleting records in the control table. After data is collected according to the settings in the
control table, it is stored in the corresponding data table.
The functions of the monitor are defined and implemented through tables. The
operation process is similar to database operations. Parameters in the control tables are all
configured with values, and every record defines a specified data collection function. Records
in the data table correspond to the records in the control table. Every record in the control
table and its corresponding data records are bound through pointers. Records in the control
table have indexes, which are used to search the corresponding records in the data table.
Similarly, records in the data table also have indexes, which are applied to find the
corresponding records in the control table.
To modify parameters in the control table, first delete the record with the specified index in
the control table. Note that the corresponding records in the data table must be deleted
simultaneously. The manager then generates a control record and adds it to the control table.
When the records in the control table and the data table are mapped one-to-one, the control
table and the data table can be considered as one table.
Operation Control on the Monitor

RMON inherits the operating modes of SNMP. RMON issues control commands by reading and
setting the values of objects in the MIB. Each object can be used to represent a command. If
the object is set to a special value, a specific operation is performed.
Multiple Manager Control

One RMON agent may be controlled by multiple managers. Then, resources must be shared
among managers. This may lead to conflicts and unexpected results. The following are the
problems that occur when the RMON agent is shared:
l A manager requests more resources than the monitor can provide.
l A manager uses a significant amount of resources for a long period. This prevents other
managers from using the monitor.
l A manager uses resources and then crashes; the resources used cannot be released.

A mechanism is developed to prevent the preceding conflicts and help to resolve them. This
mechanism is a simple control function in the control table of the RMON MIB. Each control
table has a label identifying the owner of the function. This shows the relationship between
records and related functions. When multiple managers want to access the same control table,
the following can be implemented based on the relationship:
l A manager may recognize resources it owns as well as resources that it no longer needs.
l A network manager can know the resources and successfully release resources or related
functions.
l An authorized network manager can release resources that are reserved by other
managers.
l Upon initialization, a manager can recognize the resources it has reserved. It then
releases the resources if it no longer needs them.
In RMON, the owner ID cannot be used as a password or an access control mechanism. In the
SNMP management framework, the only access control mechanisms are SNMP views and community
names. If a readable and writable RMON control table exists in the views of some managers,
all these managers can read from and write into the control table. The control table can be
deleted or modified by the owner only. Other managers have read-only authority.
If multiple NMSs want to access the same control table, it is more effective to use the
resource sharing function. When a manager intends to utilize a function in a monitor, it first
scans the control table of that function to find the function or a similar function defined by
other managers for sharing. If the function is found, the manager can read the records from
the data table corresponding to the control table. The owner of the records may
indiscriminately modify or delete the functions. Therefore, in certain cases, other managers
may find that the expected functions have been modified or deleted.
Generally, during the initialization of each monitor, default function sets should be
configured. The labels of function owners are characters starting with "monitor", indicating
that resources related to the pre-defined functions belong to the specified monitor. If some
managers need to use the functions, they can only read but cannot modify or delete the
function. Functions can be deleted by the manager of the monitor (commonly, network
manager) only.
2.3.2.4 Table Management in RMON and RMON2

RMON MIBs are composed of control tables and data tables. Control tables define the
structure of data tables by setting parameters, and data tables are used to store information.
Without modifying or violating the SNMP management framework, RMON provides standard
specifications, clearly describing the operations as row addition or row deletion.
Row Addition
The manager obeys the following rules to add rows through the SNMP Set operation:
l The manager sends the Set request to the managed device for adding a row. If the index
of the new row does not conflict with indexes of other rows, the agent generates a new
row.
l If the tabular information is not configured for the new row, the agent can set the row to
the default value or maintain the row in the incomplete status.
l Before the manager requires adding a new row, the inactive row must keep the inactive
status.

l If the new row to be created exists, the agent responds with an error packet.
Row Deletion
To delete a row, the manager sends a Set request to the agent that sets the status object of
the row to invalid.
Row Modification
To modify a row, the manager first sets the status object of the row to invalid and then sends
a Set request with the new values.
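The row operations described in this section can be sketched as a small in-memory model. This is only an illustration, assuming a row status with the two states valid and invalid; the names ControlTable, add_row, delete_row, modify_row, and RowExistsError are hypothetical and are not part of SNMP or of the ATN software.

```python
class RowExistsError(Exception):
    """Stands in for the error response sent when the row already exists."""


class ControlTable:
    """Toy model of an RMON control table keyed by row index (hypothetical)."""

    def __init__(self):
        self.rows = {}  # index -> parameters, including the "status" object

    def add_row(self, index, **params):
        # The agent creates the row only if the index is not already in use.
        if index in self.rows:
            raise RowExistsError(index)
        self.rows[index] = {"status": "valid", **params}

    def delete_row(self, index):
        # Setting the status object to invalid removes the row.
        self.rows.pop(index, None)

    def modify_row(self, index, **params):
        # Modification is modeled as: invalidate the old row, then add it again.
        old = {k: v for k, v in self.rows.get(index, {}).items() if k != "status"}
        old.update(params)
        self.delete_row(index)
        self.add_row(index, **old)
```

In this sketch, a second add_row call with the same index raises RowExistsError, which mirrors the rule that the agent responds with an error when the row to be created already exists.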
2.3.2.5 Implementation of RMON and RMON2 on Huawei Devices

This section describes the implementation of RMON and RMON2 on Huawei devices.
Implementation of RMON
RMON effectively implements monitoring on all network segments. In LANs, deploying
RMON probes is highly expensive. In addition, monitoring network segments individually
degrades the performance of RMON and generates heavy traffic on the network.
To solve the preceding problems, manufacturers embed the RMON module into network
devices, which is more economical and effective. The RMON agent module is embedded on
Huawei ATNs, forming a complete system with other modules in the ATN. Because the NMS
uses standard SNMP, network managers do not need additional training to use RMON.
RMON in the ATN supports four groups, namely, statistics, history, alarm, and event, defined
in RFC 2819, and a Performance-MIB defined by Huawei. The four groups are described as
follows:
l Statistics group
The statistics group collects basic statistics of each monitored sub-network. The statistics
include the data flow on a network segment, distribution of various packets, error frames,
and collision times.
The statistics group contains an ethernetStatsTable. Rows can be created in the
ethernetStatsTable only on Ethernet and Gigabit Ethernet interfaces (not sub-interfaces).
An interface corresponds to only one row in the etherStatsTable. The etherStatsTable has
a maximum of 100 rows.
l History group
The history group periodically collects the network status statistics and stores them for
future use. The history group has the following tables:
– historyControlTable: controls information such as the sampling interval and
interface information. After being sampled, data is saved as a related entry in the
ethernetHistoryTable. Rows can be created in the historyControlTable only on
Ethernet and Gigabit Ethernet interfaces (not sub-interfaces) and 10 Gbit/s Ethernet
interfaces. The historyControlTable contains a maximum of 100 rows.
– ethernetHistoryTable: provides the network administrator with history statistics such
as the traffic on a network segment, error packets, broadcast packets, utilization, and
collision times. Each entry in the ethernetHistoryTable corresponds to one sample and
is associated with the historyControlTable entry that the sampling is based on. A
history record is created each time a sampling interval elapses.
Each entry in the historyControlTable corresponds to a maximum of 10 history
records in the ethernetHistoryTable. When the number of records exceeds 10, the
oldest records are overwritten cyclically. The manager cannot delete an individual
row from the ethernetHistoryTable because that table does not contain a row status
object. Instead, when the RMON module receives a request to delete a row from the
historyControlTable, all the associated ethernetHistoryTable records are deleted.
l Alarm group
The alarm group allows predefining a set of thresholds for alarm variables that can be
any object in the local MIB. The monitor records logs or sends trap messages to the
NMS when the sample crosses a threshold in a certain direction. The alarm table requires
the implementation of the event group. As defined in RFC 2819, the alarm function has a
hysteresis mechanism to limit the generation of alarms. This mechanism generates an
event when the sampled data crosses the threshold in one direction, and does not
generate more events until the sampled data crosses the threshold in the opposite
direction. This is inconvenient for monitoring devices.
The ATN does not apply this mechanism because it may stop alarms from being generated
for a long period. In the ATN, once the sampled value returns to normal, the alarm function
can be reused to monitor the alarm variables. The alarm table takes effect and automatically
updates its status to valid, provided that its parameters are effective, only after the
corresponding event table is created. The alarm table contains a maximum of 60 rows.
The alarm group contains an alarmTable.
l Event group
The event group records all events occurring on the RMON agent. It controls events and
prompts on the managed devices. Entries in this group are used to describe parameters
that trigger the events. Each event row is activated by related variables in the MIB. After
an event row is created, its associated alarm row and prialarm row automatically update
their status to valid if the parameters are effective. If the event row becomes invalid, it
must update the status of the associated valid alarm row and prialarm row, from invalid
to critical. The event table has a maximum of 60 rows.
Logs or trap messages are sent to the NMS for notifying that an event occurs.
The event group contains the eventTable and the logTable.
l Performance-MIB
Based on the alarmTable in RFC 2819, the RMON prialarm group is enhanced with a
function: setting the alarm object and the time span of an alarm entry in the form of an
expression. The RMON Performance-MIB has a prialarmTable. After an event row is
created, its associated prialarm row automatically updates the status to valid if the
configured parameters are effective. If the event row becomes invalid, it must update the
status of the associated effective prialarm row, from invalid to critical. The prialarmTable
has a maximum of 50 rows.
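The hysteresis mechanism described for the alarm group can be illustrated with a short sketch. It is a simplification that assumes separate rising and falling thresholds and ignores startup behavior; the function name hysteresis_events is hypothetical. It models the RFC 2819 behavior only, not the ATN-specific behavior.

```python
def hysteresis_events(samples, rising, falling):
    """Return (sample index, direction) events under the hysteresis rule.

    After a rising event, no further rising event is generated until a
    falling event has occurred, and vice versa.
    """
    events = []
    armed_rising = True
    armed_falling = True
    for i, value in enumerate(samples):
        if value >= rising and armed_rising:
            events.append((i, "rising"))
            armed_rising, armed_falling = False, True
        elif value <= falling and armed_falling:
            events.append((i, "falling"))
            armed_falling, armed_rising = False, True
    return events


# Samples 15, 11, and 13 stay above the rising threshold but raise no new event.
print(hysteresis_events([5, 12, 15, 11, 13, 3, 2, 14], rising=10, falling=4))
```

The sketch shows why a value that stays above the threshold for a long time produces only one event, which is the limitation that the ATN implementation is described as avoiding.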
In the ATN, each entry is assigned a lifetime to save system resources. The lifetime
indicates how long an entry that is not in the valid state is retained. The longer an entry
remains invalid, the shorter its remaining lifetime. The entry is deleted when the lifetime
reaches 0.
Table 2-8 shows the capacity of various tables and the maximum time span of each table.

Table 2-8 Time span of each table

Table                 Entry Capacity (rows)   Maximum Lifetime (s)
ethernetStatsTable    100                     600
historyControlTable   100                     600
alarmTable            60                      6000
eventTable            60                      600
logTable              600                     -
prialarmTable         50                      6000
When an interface board or an interface card is removed, its corresponding ethernetStatsTable
and historyControlTable entries become invalid. The ATN then sets the lifetime of these
entries to the default value of 1200s. When the lifetime reaches 0, the entries are deleted.
If the interface is added again while its corresponding entries still exist, the entries become
valid again.
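The aging behavior described above can be sketched as follows. This is a minimal model, assuming that the lifetime is counted down in seconds only while an entry is invalid; the class AgingTable and its method names are hypothetical.

```python
DEFAULT_LIFETIME_S = 1200  # lifetime applied when the interface is removed


class AgingTable:
    """Toy model of lifetime aging for invalid table entries (hypothetical)."""

    def __init__(self):
        # entry name -> remaining lifetime in seconds, or None while valid
        self.entries = {}

    def add(self, name):
        self.entries[name] = None

    def interface_removed(self, name):
        # The entry becomes invalid and starts aging.
        if name in self.entries:
            self.entries[name] = DEFAULT_LIFETIME_S

    def interface_added(self, name):
        # An entry that still exists becomes valid again.
        if name in self.entries:
            self.entries[name] = None

    def tick(self, seconds):
        # Age invalid entries and delete those whose lifetime reaches 0.
        for name, remaining in list(self.entries.items()):
            if remaining is None:
                continue
            remaining -= seconds
            if remaining <= 0:
                del self.entries[name]
            else:
                self.entries[name] = remaining
```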
Implementation of RMON2
RMON2 is one of the RMON MIB standards, serving as a supplement to RMON. In RMON2,
some groups are added.
As defined in RFC 2021, RMON2 contains several MIB groups: protocolDir, protocolDist,
addressMap, nlHost, nlMatrix, alHost, alMatrix, usrHistory, probeConfig, and
rmonConformance.
Currently, the ATN supports two RMON2 MIB groups: protocolDir and nlHost.
Figure 2-24 shows the relationship between the protocolDirTable, nlHostTable, and
hlHostControlTable.
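Figure 2-24 Relationship between tables
[Figure: three related tables. The protocolDirTable contains protocolDirID,
protocolDirParameters, protocolDirLocalIndex, and other parameters. The nlHostTable is
indexed by four values (index 1 to index 4), including nlHostTimeMark (index 2) and
nlHostAddress (index 4), and contains other parameters. The hlHostControlTable contains
hlHostControlIndex and other parameters.]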

l protocolDirTable
It lists the protocols that the RMON agent can decode and count. Each row in the
table corresponds to one type of protocol. The protocols can be network layer protocols,
transport layer protocols, or higher-layer protocols. Note that nlHost supports the
network layer host group instead of the application layer group. That is, application layer
host control and the alHostTable are not implemented in the host control table.
Therefore, only IP can be set in the protocol directory group.
l nlHostTable
The nlHostTable is used to count the amount of inbound and outbound traffic on the
interface. It provides traffic statistics for the specified network address. This table
collects statistics of the host discovered by the RMON agent and classifies statistics
based on network addresses.
l hlHostControlTable
The hlHostControlTable contains two tables: network layer host control table and
application layer host control table. It defines the monitored interface, and records the
total number of frames that are received on the interface but not recorded in the
nlHostTable. It also records the number of times of entry addition and deletion and the
expected maximum number of entries in the nlHostTable. The alHostControlTable
cannot control the alHostTable.
Abbreviations
Abbreviations Full Spelling
RMON Remote Network Monitoring
2.4 IP FPM
NOTE
Among the ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports the IP FPM
function.
2.4.1 Introduction
Definition
IP Flow Performance Measurement (FPM) is a Huawei proprietary feature that measures
packet loss rate and delay of end-to-end service packets transmitted on an IP network to
determine network performance. This feature is easy to deploy and provides an accurate
assessment of network performance.
Purpose
As IP services are more widely adopted, fault diagnosis and end-to-end service quality
analysis are becoming an increasingly pressing concern for carriers. However, absence of
effective measures prolongs fault diagnosis and increases the workload. Currently, carriers use
network quality analysis (NQA) and Y.1731 to measure the quality of services running on IP
radio access networks (RANs).
Both measures, however, have their own shortcomings.
l NQA measures network performance by determining the packet loss rate of simulated
packets, but not actual service packets transmitted on networks. The performance
counters collected by NQA may not represent the actual service quality, and therefore
cannot serve as a solid reference for network performance analysis.
l Y.1731 measures only Layer 2 Ethernet network performance, but not performance of
networks spanning different layers.
l Neither NQA nor Y.1731 can monitor end-to-end networks at different layers, and
therefore are not effective for monitoring IP network performance.
IP FPM does not have any of these shortcomings. IP FPM directly measures service packets
to assess IP network performance and monitors services in real time for network diagnosis.
Benefits
IP FPM brings the following benefits to carriers:
l Allows carriers to use the network management system (NMS) to monitor the network
running status to determine whether the network quality complies with the service level
agreement (SLA).
l Allows carriers to promptly adjust services based on measurement results to ensure
proper transmission of voice and data services, improving user experience.
2.4.2 Principles
2.4.2.1 Basic Concepts
IP FPM Model
The IP Flow Performance Measurement (FPM) model describes how service flows are
measured to obtain the packet loss rate and delay. In statistical terms, the statistical objects are
the service flows, and the statistical calculations determine the packet loss rate and delay of
the service flows traveling across the transit network. Service flow statistical analysis is
performed on the ingress and egress of the transit network.
The IP FPM model is composed of three objects: target flows, the transit network, and the
statistical system. The statistical system is further classified into the Target Logical Port
(TLP), Data Collecting Point (DCP), and Measurement Control Point (MCP). Figure 2-25
shows the IP FPM model.

Figure 2-25 IP FPM model
[Figure: a transit network with a DCP at each edge and an MCP above them. Upstream-TLP1 to
Upstream-TLP4 are located on one side of the transit network, and Downstream-TLP1 and
Downstream-TLP2 on the other side.]
l Target flow
Target flows must be pre-defined.
One or more fields in IP headers can be specified to identify target flows. The field can
be the source IP address or prefix, destination IP address or prefix, protocol type, source
port number, destination port number, or type of service (ToS). The more fields
specified, the more accurately flows can be identified. Specifying as many fields as
possible is recommended to maximize the measurement accuracy.
l Transit network
The transit network only bears target flows. The target flows are not generated or
terminated on the transit network. The transit network can be a Layer 2 (L2), Layer 3
(L3), or L2+L3 hybrid network. Each node on the transit network must be reachable at
the network layer.
l TLP
TLPs are interfaces on the edge nodes of the transit network. TLPs perform the
following actions:
– Compile statistics on the packet loss rate and delay.
– Generate statistics, such as the number of packets sent and received, traffic
bandwidth, and timestamp.
An In-Point-TLP collects statistics about service flows it receives. An Out-Point-TLP
collects statistics about service flows it sends.
l DCP
DCPs are edge nodes on the transit network. DCPs perform the following actions:
– Manage and control TLPs.
– Collect statistics generated by TLPs.
– Report the statistics to an MCP.
l MCP
MCPs can be any nodes on the transit network. MCPs perform the following actions:
– Collect statistics reported by DCPs.
– Summarize and calculate the statistics.
– Report measurement results to user terminals or the network management system
(NMS).
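How a target flow is identified by header fields can be sketched as a simple matcher. This is only an illustration under assumptions: the class TargetFlow, its field names, and the packet dictionary layout are hypothetical, and a field left unspecified is treated as a wildcard.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetFlow:
    """Hypothetical flow definition; a field left as None matches anything."""
    src_prefix: Optional[str] = None
    dst_prefix: Optional[str] = None
    protocol: Optional[int] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    tos: Optional[int] = None

    def matches(self, pkt: dict) -> bool:
        # Address fields match when the packet address falls inside the prefix.
        for key, prefix in (("src", self.src_prefix), ("dst", self.dst_prefix)):
            if prefix is not None:
                if ipaddress.ip_address(pkt[key]) not in ipaddress.ip_network(prefix):
                    return False
        # The remaining fields must be equal when they are specified.
        for key in ("protocol", "src_port", "dst_port", "tos"):
            wanted = getattr(self, key)
            if wanted is not None and pkt.get(key) != wanted:
                return False
        return True
```

The more fields are set, the fewer unrelated packets match, which corresponds to the recommendation to specify as many fields as possible.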

Measurement Flags
Measurement flags, also called identification flags, identify whether a specific packet is used
to measure packet loss or delay.
A specific bit in the IPv4 packet header can be specified as a measurement flag for packet loss
or delay measurement.
l The third to seventh bits in the ToS field are seldom used in actual applications. These
bits, if available, can be used as measurement flags for service packets.
l Bit 0 in the Flags field is reserved and can be directly used as a measurement flag.
Figure 2-26 shows the possible measurement flags in the IPv4 packet header.
Figure 2-26 IPv4 packet header format
0 15 16 31 bit
Version IHL Type of Service Total Length
Identification Flags Fragment Offset
Time to Live Protocol Header Checksum
Source Address
Destination Address
Options Padding
If two or more bits in the IPv4 packet header have not been planned for other purposes, they
can be used for packet loss and delay measurement at the same time. If only one bit in the
IPv4 packet header has not been planned, it can be used for either packet loss or delay
measurement in one IP FPM instance.
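Setting and reading a single flag bit in a header field can be sketched as follows. The sketch assumes that bits are numbered from the most significant bit (bit 0), as in the IPv4 header diagram; the function names set_flag_bit and get_flag_bit are hypothetical, and the bit positions used in the example are arbitrary choices for illustration.

```python
def set_flag_bit(field, bit, flag, width=8):
    """Return `field` with the given bit (0 = most significant) set to `flag`."""
    mask = 1 << (width - 1 - bit)
    return (field | mask) if flag else (field & ~mask)


def get_flag_bit(field, bit, width=8):
    """Read the given bit (0 = most significant) of a `width`-bit field."""
    return (field >> (width - 1 - bit)) & 1


tos = 0b10111000                      # example 8-bit ToS value
marked = set_flag_bit(tos, 6, 1)      # use bit 6 of the ToS field as a flag
flags = set_flag_bit(0b010, 0, 1, width=3)  # bit 0 of the 3-bit Flags field
```

Using one bit for packet loss and another for delay, as in this example, corresponds to the case in which two or more bits are available.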
2.4.2.2 Basic Functions
Function Overview
IP Flow Performance Measurement (FPM) measures multipoint-to-multipoint (MP2MP)
service flows to obtain the packet loss rate and delay.
Three IP FPM types are available: proactive performance statistics, on-demand performance
statistics, and hop-by-hop performance statistics. Table 2-9 lists the usage scenarios for these
IP FPM types.
Table 2-9 IP FPM classification
Category                            Usage Scenario
Proactive performance statistics    When users want to detect network performance deterioration in
                                    real time, implement end-to-end proactive performance statistics to
                                    continuously monitor the network.
On-demand performance statistics    When network performance deteriorates or users want to learn
                                    about real-time performance statistics of specific service flows,
                                    implement end-to-end on-demand performance statistics in a
                                    specified period.
Hop-by-hop performance statistics   When network performance deteriorates, implement hop-by-hop
                                    on-demand performance statistics to locate the faulty node.
                                    NOTE
                                    Currently, the ATN supports the following hop-by-hop performance
                                    statistics functions: hop-by-hop packet loss measurement, hop-by-hop
                                    one-way delay measurement, and hop-by-hop two-way delay
                                    measurement.
The ATN supports the following IP FPM functions:

l Packet Loss Measurement
– P2P packet loss measurement measures packet loss on a link between two devices.
– MP2MP packet loss measurement measures packet loss on links between multiple
devices.
l Delay Measurement
– P2P one-way delay measurement measures one-way delay on a link between two
devices.
– P2P two-way delay measurement measures round-trip delay on a link between two
devices.
Implementation
A bearer network has boundaries through which traffic enters and leaves. On the IP/MPLS
network shown in Figure 2-27, the number of packets entering the network in the ingress
direction on the ATNs is PI, and the number of packets leaving the network in the egress
direction on the ATNs is PE.
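Figure 2-27 IP FPM statistics collection
[Figure: ATNA, ATNB, and ATNC at the edge of an IP/MPLS network. Packets entering the
network in the ingress direction are counted as PI1, PI2, and PI3, and packets leaving the
network in the egress direction are counted as PE1, PE2, and PE3.]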

Over a specified period, the difference between the number of packets entering the network
and the number of packets leaving the network is the packet loss.
l The number of packets entering the network is the sum of all packets moving in the
ingress direction: PI = PI(1) + PI(2) + PI(3)
l The number of packets leaving the network is the sum of all packets moving in the
egress direction: PE = PE(1) + PE(2) + PE(3)
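As a worked illustration of these sums, the following sketch computes the packet loss over one period from per-interface counters. The function name packet_loss and the sample counter values are hypothetical, and the loss rate is assumed here to be the lost packets divided by the packets that entered the network.

```python
def packet_loss(ingress_counts, egress_counts):
    """Return (lost packets, loss rate) for one measurement period."""
    total_in = sum(ingress_counts)    # PI = PI(1) + PI(2) + PI(3)
    total_out = sum(egress_counts)    # PE = PE(1) + PE(2) + PE(3)
    lost = total_in - total_out
    rate = lost / total_in if total_in else 0.0
    return lost, rate


lost, rate = packet_loss([1000, 500, 500], [990, 500, 490])
print(lost, rate)
```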
Over a specified period, the difference between the time a service flow enters the network and
the time the service flow leaves the network is the delay.
Packet Loss Measurement
Packet loss measurement calculates the difference between the volume of traffic entering the
network and the volume of traffic leaving the network over a specified period.
Figure 2-28 shows a typical network where end-to-end performance can be measured.
Service packets enter the network from ATNA and leave the network from ATNB.
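Figure 2-28 IP FPM packet loss measurement
[Figure: service packets enter the IP/MPLS network at the ingress of the sending device (PI1)
and leave it at the egress of the receiving device (PE2). The loss measurement flags of the
packets alternate between 1 and 0 in successive periods along the time axis t0 to t5. On the
receiving device, two packets at a period boundary arrive out of order.]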
1. t0: ATNA sets the loss measurement flag to 1 for incoming service packets in the first
period and starts counting all service packets with the loss measurement flag as 1.
2. t1: ATNB starts receiving service packets with the loss measurement flag as 1 in the first
period and starts counting these service packets.
3. t2: ATNA finishes counting the incoming service packets with the loss measurement flag
as 1 in the first period and calculates the total number of these service packets PI1.
ATNA then sets the loss measurement flag to 0 for incoming service packets in the
second period and starts counting all service packets with the loss measurement flag as 0.
4. t3: ATNB finishes receiving service packets with the loss measurement flag as 1 in the
first period and calculates the total number of these service packets PE2.

NOTE
ATNB starts receiving service packets with the loss measurement flag as 1 from t1. At t3, the internal
timer has run for a specified period. ATNB determines that it has finished receiving service packets with
the loss measurement flag as 1 in this period based on the elapsed period, not on whether service packets
with a loss measurement flag other than 1 have been received. Therefore, service packet measurement
is not affected by packet disorder. This mechanism ensures that service packets in each period are
correctly counted.
5. t4: ATNA sets the loss measurement flag to 1 for incoming service packets in the third
period and starts counting all service packets with the loss measurement flag as 1.
6. t5: ATNB starts receiving service packets with the loss measurement flag as 1 in the third
period and starts counting these service packets.
ATNB can obtain the number of received service packets with the loss measurement flag as 1
in the first period any time between t3 and t5. The packet loss in the first period is then
calculated as LostPacket = PI1 - PE2.
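The alternating-flag procedure above can be sketched as a small simulation. It is a simplification under assumptions: only two periods are modeled, the list received stands for the packets that the receiving device observes in its counting windows for those two periods, and the function names mark_periods and lost_in_period are hypothetical.

```python
def mark_periods(packets_per_period):
    """Sender side: return the flag carried by each packet, period by period."""
    wire = []
    for period, count in enumerate(packets_per_period):
        flag = 1 if period % 2 == 0 else 0  # the first period uses flag 1
        wire.extend([flag] * count)
    return wire


def lost_in_period(sent_count, received_flags, flag):
    """LostPacket = PI - PE for the packets that carry `flag`."""
    received_count = sum(1 for f in received_flags if f == flag)
    return sent_count - received_count


wire = mark_periods([5, 4])          # five packets flagged 1, then four flagged 0
received = [1, 1, 1, 0, 1, 0, 0, 0]  # one flag-1 packet lost, one pair reordered
loss_first = lost_in_period(5, received, 1)
loss_second = lost_in_period(4, received, 0)
```

Because packets are counted by flag rather than by arrival order, the reordered pair at the period boundary does not change either result.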
Delay Measurement
Delay measurement calculates the difference between the time a service flow enters the
network and the time the service flow leaves the network over a specified period.
In IP FPM, delay measurement is implemented for sampled service packets by recording the
time the packets are sent and the time the packets are received.
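Figure 2-29 IP FPM delay measurement
[Figure: ATNA and ATNB exchange service packets across an IP/MPLS network. In the
ATNA-to-ATNB direction, a packet with the delay measurement flag set to 1 is timestamped
t1 at ATNA and t2 at ATNB. In the ATNB-to-ATNA direction, a flagged packet is
timestamped t3 at ATNB and t4 at ATNA.]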
Figure 2-29 shows a typical network where delay measurement is implemented.

l When service packets are transmitted from ATNA to ATNB, the procedure is as follows:
– t1: ATNA sets the delay measurement flag to 1 for specified incoming service
packets and obtains the timestamp t1.
– t2: ATNB starts receiving the service packets with the delay measurement flag as 1
and obtains the timestamp t2.
l When service packets are transmitted from ATNB to ATNA, the procedure is as follows:
– t3: ATNB sets the delay measurement flag to 1 for specified incoming service
packets and obtains the timestamp t3.
– t4: ATNA starts receiving the service packets with the delay measurement flag as 1
and obtains the timestamp t4.
The delay measurement results are as follows:

l The one-way delay from ATNA to ATNB is: 1d(ATNA->ATNB) = t2 - t1

l The one-way delay from ATNB to ATNA is: 1d(ATNB->ATNA) = t4 - t3
l The two-way delay is: 2d = (t2 - t1) + (t4 - t3) = (t4 - t1) - (t3 - t2)
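The delay formulas can be checked with a short sketch. The timestamps below are invented integer values in microseconds, and the function names are hypothetical. The example also applies an assumed clock offset to ATNB to show that the two-way result does not depend on it, whereas the one-way result does.

```python
def one_way_delay(t_sent, t_received):
    """One-way delay, for example t2 - t1 for ATNA -> ATNB."""
    return t_received - t_sent


def two_way_delay(t1, t2, t3, t4):
    """Two-way delay: (t2 - t1) + (t4 - t3) = (t4 - t1) - (t3 - t2)."""
    return (t4 - t1) - (t3 - t2)


t1, t2, t3, t4 = 1000, 1300, 5000, 5250   # microseconds, synchronized clocks
offset = 40                               # assumed clock error on ATNB

exact = two_way_delay(t1, t2, t3, t4)
skewed = two_way_delay(t1, t2 + offset, t3 + offset, t4)
one_way_skewed = one_way_delay(t1, t2 + offset)
```

In this example, the two-way delay is the same with and without the offset, while the one-way delay from ATNA to ATNB is overstated by the offset. This is consistent with one-way delay measurement depending on accurate clock synchronization between the two devices.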
2.4.3 Applications
2.4.3.1 End-to-End Performance Measurement Scenarios
IP datacom networks, as the mainstream of datacom networks, are large in scale and provide
various access modes. To maximize carriers' return on investment, reduce network
construction costs, and evolve the existing network smoothly into a Long Term Evolution
(LTE) network, an IP RAN solution is introduced.
IP RANs require performance measurement for SLA compliance and routine O&M
performance management. As the bearer network quality (delay, jitter, and packet loss) affects
the radio service quality, the bearer network department must provide optimal methods to
detect the network operating status. In addition, if the service quality deteriorates, the bearer
network must be able to provide its own performance data to help fault locating.
IP RAN provides a variety of solutions. The following section describes the application of IP
FPM end-to-end performance measurement in HVPN, L2+L3 mixed VPN, and L3 dual-
homing scenarios.
HVPN Scenarios
Figure 2-30 shows HVPN networking. Table 2-10 describes how to deploy IP FPM in an
HVPN scenario.
Figure 2-30 IP FPM application in an HVPN scenario
[Figure: a NodeB and an eNodeB connect through CSG1 (last mile), AGG1 and AGG2 (access
and aggregation), and RSG1 and RSG2 to the RNC and SGW/MME over an L3VPN. The figure
marks the TLPs and the MCP and distinguishes service data from IP FPM packets. A service
flow strip from the TX end to the RX end shows measurement flags alternating between
Period i and Period i+1, with CSG ingress measurement points and RSG egress measurement
points.]

Table 2-10 IP FPM deployment in an HVPN scenario

Statistics object   A specific service type on a base station is being measured. The service type
                    can be the 3G Ethernet service (signaling, voice, and data), S1 service
                    (signaling, voice, and data), OM service, or IP clock service. The service flow
                    is identified by the source IP address, destination IP address, and DSCP value.
TLP deployment      Performance measurement can be implemented for E2E services and local
                    switching services on IP RANs. On the network shown in Figure 2-30:
                    l For E2E services, configure IP FPM on both ends (CSG, RSG1, and RSG2) of
                    the Layer 3 service flow; deploy TLPs on the UNIs of the CSG, RSG1, and
                    RSG2 and bind the TLPs to the access-side interfaces (TLPs must be bound to
                    the outbound interfaces on both RSGs).
                    l For local switching services, deploy TLPs only on the CSG sub-interfaces
                    connecting to the base stations.
DCP deployment      Configure the CSG, RSG1, and RSG2 as DCPs to send measurement data to the
                    MCP.
MCP deployment      l If routes are reachable between the access and aggregation networks, deploy
                    the MCP on an RSG.
                    l If routes are unreachable between the access and aggregation networks,
                    deploy the MCP on an AGG.
                    On the network in Figure 2-30, deploy the MCP on RSG1.
Clock deployment    Configure the network time protocol (NTP) or 1588v2 so that all device clocks
                    can be synchronized.
                    l To implement IP FPM one-way delay measurement, you must configure 1588v2
                    for clock synchronization. If not, the measurement is incorrect.
                    l To implement IP FPM two-way delay measurement or packet loss
                    measurement, configure either NTP or 1588v2 for clock synchronization.
                    1588v2 implements higher-precision clock synchronization than NTP. Using
                    1588v2 is recommended.
In E2E VPN and native IP+L3VPN scenarios, deploy IP FPM in the same manner as that in
HVPN scenarios.
L2+L3 Mixed VPN Scenarios

Figure 2-31 shows L2+L3 mixed VPN networking. Table 2-11 describes how to deploy IP FPM
in an L2+L3 mixed VPN scenario.

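Figure 2-31 IP FPM application in an L2+L3 mixed VPN scenario
[Figure: a NodeB and an eNodeB connect through the CSG over an L2VPN to AGG1 and AGG2,
and then over an L3VPN to RSG1 and RSG2, which connect to the RNC and SGW/MME. The
figure marks the TLPs and the MCP and distinguishes service data from IP FPM packets. A
service flow strip from the TX end to the RX end shows the alternating measurement flags.]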

Table 2-11 IP FPM deployment in an L2+L3 mixed VPN scenario

Statistics object   Same as that in an HVPN scenario.
TLP deployment      For E2E services, deploy TLPs on the AC interfaces that carry services (Layer 2
                    user interfaces on the CSG and Layer 2/Layer 3 user interfaces on RSGs).
                    Configure flow characteristics based on the 5-tuple, start measurement and
                    counting, and send measurement data to the MCP through protocol packets.
DCP deployment      Configure the CSG, RSG1, and RSG2 as DCPs to send measurement data to the
                    MCP.
MCP deployment      Routes are unreachable between the access and aggregation networks, and the
                    CSG does not have routes to RSGs. Therefore, deploy the MCP on an AGG. On
                    the network in Figure 2-31, deploy the MCP on AGG1.
Clock deployment    Same as that in an HVPN scenario.
In packet loss measurement, ARP request messages are filtered out and unknown unicast
traffic is also filtered out based on the source IP addresses. The causes are as follows:

l ARP request messages are generated on AGGs in the downstream direction.

l Broadcast traffic is generated in the downstream direction if the VSI has not learned
MAC addresses.
L3 Dual-homing Scenarios
Figure 2-32 shows an L3 dual-homing scenario in which a NodeB is dual-homed to two
CSGs and the NodeB's gateway address is the VRRP backup group's virtual IP address.
Figure 2-32 IP FPM application in an L3 dual-homing scenario

[Figure: labeled elements are NodeB, CSG1, CSG2, AGG1, AGG2, RSG1, RSG2, RNC, and SGW/MME; VRRP and VLANIF markers; TLP and MCP markers; links numbered 1 to 5. Legend: service data, IP FPM packets, service flow.]

In normal situations (no link or node failure between the RSGs and CSGs), traffic travels
along the primary path CSG1 -> AGG1 -> RSG1.
l If the link between RSG1 and the SGW/MME fails, traffic switches to the path CSG1 ->
AGG1 -> RSG1 -> RSG2.
l If RSG1 fails, the VPN routes advertised by RSG1 become invalid, and a master/backup
VRRP switchover occurs. As a result, traffic switches to the path CSG1 -> CSG2 ->
AGG2 -> RSG2.
l If CSG1 or the link between CSG1 and the NodeB fails, a master/backup VRRP
switchover occurs, and traffic switches to CSG2.
The upstream traffic enters the network from links 1 and 5 and leaves the network from links
2 and 4. Link 3 is a transit path. The measurement result can be obtained by comparing the
number of packets sent through links 1 and 5 and received through links 2 and 4. The
downstream traffic enters the network from links 2 and 4 and leaves the network from links 1
and 5, and the measurement result can be obtained in the same way. Table 2-12 lists how to
deploy IP FPM in an L3 dual-homing scenario.

Table 2-12 IP FPM deployment in an L3 dual-homing scenario
Statistics object: Same as that in an HVPN scenario.
TLP deployment: Deploy TLPs on the UNIs of CSG1, CSG2, RSG1, and RSG2, and bind the TLPs to the UNIs.
DCP deployment: Configure CSG1, CSG2, RSG1, and RSG2 as DCPs to send measurement data to the MCP.
MCP deployment: If routes are reachable between the access and aggregation networks, deploy the MCP on an RSG. If routes are unreachable between the access and aggregation networks, deploy the MCP on an AGG. On the network in Figure 2-32, deploy the MCP on RSG1.
Clock deployment: Same as that in an HVPN scenario.
DCPs send the measurement data collected by the TLPs to the MCP. The MCP uses the
synchronization method to correlate the ingress and egress data and obtain the measurement
result. Because the result depends only on the ingress and egress counters, topology changes
do not affect the measurement, which keeps the deployment simple.
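The comparison of packets entering through links 1 and 5 with packets leaving through links 2 and 4 can be illustrated with a minimal Python sketch. The function name, link labels, and sample counters are hypothetical, and the sketch assumes that all counters belong to the same measurement period.

```python
def flow_packet_loss(ingress_counts, egress_counts):
    """Return (lost_packets, loss_rate) for one measurement period.

    ingress_counts / egress_counts map a link label to the number of
    packets counted by the TLP on that link (hypothetical data layout).
    """
    sent = sum(ingress_counts.values())
    received = sum(egress_counts.values())
    lost = sent - received
    rate = lost / sent if sent else 0.0
    return lost, rate


# Upstream traffic: enters on links 1 and 5, leaves on links 2 and 4.
upstream = flow_packet_loss({"link1": 600, "link5": 400},
                            {"link2": 590, "link4": 400})
print(upstream)
```

Because only the totals on the ingress and egress sides are compared, the result stays valid when traffic moves between the two CSGs or the two RSGs after a switchover.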
Summary
IP FPM offers the following benefits:
l Supports E2E performance measurement on large-scale networks.
l Supports service-based packet loss and delay measurement with high precision.
l Applies to various networking scenarios.
2.4.3.2 Hop-by-Hop Performance Measurement Scenarios
In end-to-end performance measurement, only the traffic entering and leaving the network is
measured. This measurement reflects only the quality of the entire network. If a network fault
occurs, end-to-end performance measurement cannot help locate the fault. To locate the fault,
IP FPM provides hop-by-hop performance measurement.
On the mobile backbone IP RAN shown in Figure 2-33, multiple NEs are deployed, and
services are complex. Once a network fault occurs, it is difficult to locate the fault.

Figure 2-33 IP FPM fault locating networking
[Figure: labeled elements are NodeB, eNodeB, CSG, AGG1, AGG2, RSG1, RSG2, RNC, and SGW/MME; L2VPN and L3VPN segments; the MCP; TLP0-1, TLP0-2, TLP1-1, TLP1-2, TLP2-1, TLP2-2, TLP3-1, and TLP3-2. Legend: service data, IP FPM packets.]
IP FPM can function with the NMS for fault locating. The process is as follows:
l The NMS provides visualized service paths for target flows by segmenting the service
forwarding path into multiple closed hops and delivering these closed hops to the DCPs
and MCP.
l The DCPs report the hop-by-hop measurement data to the MCP.
l The MCP calculates the packet loss and delay performance for each hop.
l The NMS displays the real-time data of each hop using the MIB, helping locate the fault.
Table 2-13 lists how to deploy IP FPM on the network shown in Figure 2-33.
Table 2-13 IP FPM deployment in a fault locating scenario
Statistics object: A specific service type on a base station is measured. The service type can be the 3G Ethernet service (signaling, voice, and data), S1 service (signaling, voice, and data), OM service, or IP clock service. The service flow is identified by the source IP address, destination IP address, and DSCP value.
TLP deployment: Hop-by-hop performance measurement applies only to unidirectional target flows. Deploy TLPs as shown in Figure 2-33 for upstream traffic. Hop-by-hop performance measurement differs from end-to-end performance measurement in the following aspects:
l TLPs must be deployed hop by hop, not only on the network ingress and egress.
l Atomic Closed Hops (ACHs) must be configured, and TLP in-groups and out-groups must be specified for target flows in the ACHs. As shown in Figure 2-33, the ingress TLPs (TLP0-1 and TLP0-2) and the egress TLPs (TLP1-1 and TLP1-2) form an ACH. Similarly, TLP1-1 and TLP1-2 form an ACH with TLP2-1 and TLP2-2, and TLP2-1 and TLP2-2 form an ACH with TLP3-1 and TLP3-2. The MCP calculates the packet loss and delay of each closed hop based on hop and synchronization information to obtain hop-by-hop performance of the entire network.
DCP deployment: Configure all devices that have TLPs deployed as DCPs to send measurement data to the MCP.
MCP deployment: If routes are reachable between the access and aggregation networks, deploy the MCP on an RSG. If routes are unreachable between the access and aggregation networks, deploy the MCP on an AGG. On the network in Figure 2-33, deploy the MCP on RSG1.
Clock deployment: Configure the Network Time Protocol (NTP) or 1588v2 so that all device clocks can be synchronized.
ACH Division Based on Measurement Sections

The MCP calculates hop-by-hop performance statistics based on hops along a service path.
Accurate measurement can be obtained based only on closed hops. In an ACH, traffic from
the ingress TLPs is sent to the egress TLPs, and the egress TLPs receive only traffic sent from
the ingress TLPs. As shown in Figure 2-34, the TLPs on nodes A and B form an ACH with
the TLP on node D; the TLPs on nodes D and E form an ACH with the TLP on node F; the
TLP on node F forms an ACH with the TLPs on nodes G, H, and I.
Figure 2-34 Path-based ACH division
[Figure: labeled nodes are A, B, C, D, E, F, G, H, and I.]
In some situations, an accurate path diagram cannot be obtained, so path-based ACHs
cannot be formed. In this case, key points on the path can be selected to form measurement
sections through which the traffic passes, as shown in Figure 2-35. ACHs can then be divided
based on these measurement sections.
Figure 2-35 Measurement section-based ACH division
[Figure: labeled element is the measurement section.]
In hop-by-hop performance measurement, the MCP measures the packet loss and delay based
on ACHs. The smaller the scope that an ACH covers, the more precisely a fault can be located.
Depending on how ACHs are divided, a fault can be narrowed down to a local area, a direct
link, or the inbound and outbound interfaces on a device.
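The way per-ACH results narrow down a fault can be sketched in Python as follows. This is only an illustration: the ACH names, the data layout, and the counter values are hypothetical, while the TLP names follow Figure 2-33. Each ACH is described by its TLP in-group and out-group, and an ACH is reported when the two groups counted different numbers of packets in the same period.

```python
def locate_lossy_achs(achs, counters):
    """Return [(ach_name, lost_packets)] for ACHs that lost packets.

    achs maps an ACH name to (in_group, out_group), each a list of TLP names.
    counters maps a TLP name to the packets it counted in one period.
    """
    lossy = []
    for name, (in_group, out_group) in achs.items():
        sent = sum(counters[tlp] for tlp in in_group)
        received = sum(counters[tlp] for tlp in out_group)
        if sent != received:
            lossy.append((name, sent - received))
    return lossy


achs = {
    "ACH1": (["TLP0-1", "TLP0-2"], ["TLP1-1", "TLP1-2"]),
    "ACH2": (["TLP1-1", "TLP1-2"], ["TLP2-1", "TLP2-2"]),
    "ACH3": (["TLP2-1", "TLP2-2"], ["TLP3-1", "TLP3-2"]),
}
counters = {
    "TLP0-1": 500, "TLP0-2": 500,
    "TLP1-1": 700, "TLP1-2": 300,
    "TLP2-1": 680, "TLP2-2": 300,
    "TLP3-1": 480, "TLP3-2": 500,
}
print(locate_lossy_achs(achs, counters))
```

With these sample counters only the second ACH shows a difference, so the fault would be sought between the TLP1 and TLP2 groups.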

Terms
Term      Description
DCP       Data Collecting Point, a device that manages TLPs, collects statistics generated by TLPs, and reports the statistics to the MCP in the IP FPM model.
IP FPM    IP Flow Performance Measurement, a method used to measure service flows to obtain the packet loss rate and delay of end-to-end IP networks.
MCP       Measurement Control Point, a device that collects and calculates statistics reported by DCPs in the IP FPM model.
TLP       Target Logical Port, an interface that compiles statistics and outputs data in the IP FPM model.
Acronyms and Abbreviations
Acronym and Abbreviation    Full Name
DCP       Data Collecting Point
IP FPM    IP Flow Performance Measurement
MCP       Measurement Control Point
TLP       Target Logical Port
2.5 NQA
2.5.1 Introduction to NQA
Definition
Network Quality Analysis (NQA) is a feature provided by the device. Independent of lower-
layer hardware, NQA functions above the link layer to measure the performance of protocols
running at the network layer, transport layer, and application layer.
Purpose
The device provides NQA to help carriers monitor network QoS in real time and locate
faults on the network.
To make the quality of network services visible and allow users to check whether it meets
requirements, carriers must take the following measures:
l Provide device statistics that reflect the quality of network services.
Owing to the statistical multiplexing and traffic bursts of IP networks, network quality can
only be described by statistics. Therefore, carriers need to provide relevant statistical
parameters such as delay, jitter, and packet loss ratio at the equipment side.
l Monitor the quality of network services by deploying probe devices.
As networks grow, using dedicated probe devices (for example, the third-party probe
device Brix) requires more and more probes, which increases carriers' expenditure.
The device provides NQA to meet the preceding requirements. Through the network quality
test function integrated in a device, NQA can accurately test the operating status of the
network and collect statistics. In addition, because dedicated probe devices are not required,
NQA also reduces carriers' costs.
NQA measures the performance of different protocols running on the network. In this way,
carriers can collect network operation indexes in real time, such as the delay in setting up a
TCP connection, file transmission rate, and delay in setting up an FTP connection.
The Ping operation is a traditional method of monitoring network quality. Compared with
the information collected through NQA, the information collected through the Ping operation
is limited. The following shows the differences between NQA and Ping in terms of functions
and configurations.
Table 2-14 Comparisons between NQA and Ping
Function
NQA: NQA supports not only Internet Control Message Protocol (ICMP) tests but also service availability tests (such as TCP, UDP, FTP, SNMP, Traceroute, and LSP Ping/Traceroute services). Moreover, NQA can test the response time of each service and the jitter on the network through Jitter tests. By default, NQA supports a maximum of 50 test instances.
Ping: Ping is based on ICMP and tests only the round-trip time (RTT) of a datagram between the source and the destination and the reachability of the destination. A user can initiate only one Ping operation at a time.
Configuration
NQA: For NQA, you can run commands on the client to view NQA test results. Note the following:
In NQA, parameters of operations can be set and tests can be started through the Network Management System (NMS). You can obtain statistics by viewing the output test results and history tables.
In most test instances, you only need to configure NQA clients. Configuring NQA servers is necessary for FTP, TCP, UDP, and UDP Jitter test instances.
An NQA server responds to the test request from a client through the monitoring function. After being configured with the corresponding destination IP address and port number, the NQA server can respond to the test request. The IP address and port number specified in the monitoring service on the server must be consistent with those configured on the client.
Ping: For Ping, you need to run the ping command on the Console to test the reachability of a specified IP address. The RTT or timeout period of every packet can be displayed in real time.
Scheduling mode
NQA: NQA supports test instance scheduling, which avoids running tests concurrently and therefore reduces the burden on a device.
NQA supports the configuration of different start and end times for a single test instance.
NQA supports three modes of starting tests: immediate, delayed, and periodical.
NQA supports five modes of ending tests: automatic, immediate, delayed, timely, and ending tests when the life cycle of the test expires.
When several tasks are performed at the same time, a device reasonably arranges start times and test intervals.
Ping: Ping supports only command line delivery.
2.5.2 Principles
In NQA, the two test ends are called the NQA client and the NQA server. An NQA test is
initiated by the NQA client. Users can configure test instances through command lines or the
NMS. NQA then places different types of test instances into test queues for scheduling.
When starting an NQA test instance, you can start it immediately, at a scheduled time, or
after a delay. A test packet is generated according to the test type when the timer expires. If
the size of the generated test packet is smaller than the minimum size of the protocol packet,
the test packet is generated and sent with the defined minimum size of the protocol packet.
After the test instance starts, a response packet is returned. Carriers can determine the
operating status of the protocol by analyzing the received response packet. The test packet
is marked with a timestamp based on the local system time before being sent to the
destination. After receiving the test packet, the destination sends a response packet to the
source. The source then marks the received response packet with a timestamp based on the
current local system time, and calculates the RTT of the packet from the sending and
receiving times.
NOTE
For a Jitter test instance, both the source and the destination mark the packet with timestamps. The
destination adds timestamps based on its local system time when it receives the packet and when it
returns the response packet. In this way, the source can calculate the jitter of the packet.
Carriers can learn the network operating status by viewing test results.
2.5.2.1 UDP Jitter Test

A UDP jitter test uses timestamped UDP packets to measure the delay, jitter, and packet loss
rate. Jitter is calculated by subtracting the interval for sending two consecutive packets from
the interval for receiving these two packets.

Figure 2-36 UDP jitter test networking
[Figure: ATN A and ATN B exchange UDP packets across the network. Timestamps t1, t2, t3, and t4 are marked on ATN A; t1', t2', t3', and t4' are marked on ATN B.]
The process of a UDP jitter test is as follows:

1. Source (ATNA) adds timestamp t1 to a UDP packet and sends the packet to Destination
(ATNB).
2. Upon receipt of the packet, Destination (ATNB) adds timestamp t1' to the packet.
3. After processing the packet, Destination (ATNB) adds timestamp t2' to the packet and
sends it back to Source (ATNA).
4. Upon receipt of the packet, Source (ATNA) adds timestamp t2 to the packet.
The following can be calculated based on the timestamp information in the packets received
by Source (ATNA):
l Maximum, minimum, and average jitter of the packets from Source (ATNA) to
Destination (ATNB) and from Destination (ATNB) to Source (ATNA)
l Maximum unidirectional delay from Destination (ATNB) to Source (ATNA) and from
Source (ATNA) to Destination (ATNB)
The obtained jitter information clearly reflects network status.
RTT = (t2 - t1) - (t2' - t1')
If the RTT is longer than the specified timeout period, the network is congested and UDP
packets will be counted as lost packets.
Packet loss rate = Number of lost UDP packets/Number of sent UDP packets
A UDP jitter test can measure jitter either unidirectionally or bidirectionally. Timestamps
t3, t3', t4', and t4 are recorded for the next packet in the same way as t1, t1', t2', and t2:
l Source-to-destination jitter = (t3' - t1') - (t3 - t1)
A larger absolute jitter value indicates poorer link quality, no matter whether the jitter
value is positive or negative.
l Destination-to-source jitter = (t4 - t2) - (t4' - t2')
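The RTT and jitter formulas can be checked with a short Python sketch. The function name and the sample timestamps are hypothetical; the argument names ending in `p` stand for the primed timestamps recorded on the destination. The sample deliberately uses two clocks that differ by about 1000 time units to show that each formula only subtracts timestamps taken on the same device.

```python
def udp_jitter_metrics(t1, t2, t3, t4, t1p, t2p, t3p, t4p):
    """Apply the RTT and jitter formulas to two consecutive probe packets.

    t1..t4 are source timestamps; t1p..t4p are destination timestamps
    (t1', t2', t3', t4' in the text). All values share one time unit.
    """
    rtt = (t2 - t1) - (t2p - t1p)          # round trip minus processing time
    sd_jitter = (t3p - t1p) - (t3 - t1)    # receive interval minus send interval
    ds_jitter = (t4 - t2) - (t4p - t2p)    # same, for the reply direction
    return rtt, sd_jitter, ds_jitter


# Source clock starts near 0, destination clock near 1000 (unsynchronized).
print(udp_jitter_metrics(t1=0, t2=12, t3=20, t4=31,
                         t1p=1005, t2p=1006, t3p=1027, t4p=1028))
```

A negative jitter value, as in the reply direction of this sample, means the second packet arrived closer to the first than it was sent.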

A UDP jitter test can also measure the packet loss rate unidirectionally.
Figure 2-37 Unidirectional packet loss measurement in a UDP jitter test
[Figure: ATN A and ATN B exchange UDP packets across the network, with timestamps t1 to t4 on ATN A and t1' to t4' on ATN B. Additional labeled messages: UDP Jitter Request and UDP Jitter Reply.]
On the network shown in Figure 2-37, Server (ATNB) collects statistics about received
packets. If Client (ATNA) finds that the number of packets it sent differs from the number
of packets it received, Client (ATNA) initiates a unidirectional packet loss query to learn
the number of packets received by Server (ATNB).
Source-to-destination packet loss = Number of packets sent by Client (ATNA) - Number of
packets received by Server (ATNB)
Destination-to-source packet loss = Number of packets received by Server (ATNB) - Number
of packets received by Client (ATNA)
If Client (ATNA) does not receive any query reply, Client (ATNA) records Packet Loss
Unknown.
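The unidirectional packet loss calculation can be sketched as follows. The function name is hypothetical, and using `None` to represent a missing query reply is an assumption made only for this illustration.

```python
def unidirectional_loss(sent_by_client, received_by_server, received_by_client):
    """Split the packet loss of a UDP jitter test by direction.

    received_by_server is None when the client got no reply to its
    unidirectional packet loss query (assumed representation).
    Returns (source_to_destination_loss, destination_to_source_loss),
    or None for "Packet Loss Unknown".
    """
    if received_by_server is None:
        return None
    sd_loss = sent_by_client - received_by_server
    ds_loss = received_by_server - received_by_client
    return sd_loss, ds_loss


print(unidirectional_loss(100, 97, 95))
print(unidirectional_loss(100, None, 95))
```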
2.5.2.2 UDP Jitter Test (hardware-based)

UDP Jitter (hardware-based) supplements the UDP jitter test. It uses the hardware
forwarding engine to transmit packets and add timestamps to packets. The hardware
forwarding engine can:
l Reduce the interval for sending packets.
l Increase the number of concurrent test instances.
l Improve the accuracy of delay and jitter calculation.
This reflects the network status more accurately and improves the efficiency of the
device.
NOTE
By default, UDP jitter (hardware-based) is not enabled. To implement a hardware-based UDP jitter test,
enable the interface board to send packets.
Figure 2-38 Application scenario of UDP jitter (hardware-based)
[Figure: labeled elements are the NQA agent and the NQA server.]
2.5.2.3 ICMP Jitter Test

An ICMP jitter test uses timestamped ICMP packets to measure the delay, jitter, and packet
loss rate. Jitter is calculated by subtracting the interval for sending two consecutive packets
from the interval for receiving these two packets.
Figure 2-39 ICMP jitter test networking
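[Figure: ATN A and ATN B exchange ICMP packets across the network. Timestamps t1, t2, t3, and t4 are marked on ATN A; t1', t2', t3', and t4' are marked on ATN B.]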
The process of an ICMP jitter test is as follows:
1. Source (ATNA) adds timestamp t1 to an ICMP packet and sends the packet to Destination
(ATNB).
2. Upon receipt of the packet, Destination (ATNB) adds timestamp t1' to the packet.
3. After processing the packet, Destination (ATNB) adds timestamp t2' to the packet and
sends it back to Source (ATNA).
4. Upon receipt of the packet, Source (ATNA) adds timestamp t2 to the packet.
The following can be calculated based on the timestamp information in the packets received
by Source (ATNA):
l Maximum, minimum, and average jitter of the packets from Source (ATNA) to
Destination (ATNB) and from Destination (ATNB) to Source (ATNA)
l Maximum unidirectional delay from Destination (ATNB) to Source (ATNA) and from
Source (ATNA) to Destination (ATNB)
The obtained jitter information clearly reflects network status.
An ICMP jitter test can measure jitter either unidirectionally or bidirectionally:
l Source-to-destination jitter = (t3' - t1') - (t3 - t1)
l Destination-to-source jitter = (t4 - t2) - (t4' - t2')
RTT = (t2 - t1) - (t2' - t1')
If the RTT is longer than the specified timeout period, the network is congested and ICMP
packets will be counted as lost packets.
Packet loss rate = Number of lost ICMP packets/Number of sent ICMP packets
In an ICMP jitter test, you can set the number of packets to be sent consecutively in a single
test instance to simulate a certain type of traffic.
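When several packets are sent consecutively, the maximum, minimum, and average jitter can be derived by applying the jitter formula to each pair of neighboring packets. The Python sketch below assumes that the send and receive timestamps are available as two equally long lists and that the statistics are taken over signed jitter values; the function name and sample data are hypothetical.

```python
def jitter_stats(send_times, recv_times):
    """Return (max, min, average) jitter for one direction.

    send_times[i] and recv_times[i] are the send and receive timestamps of
    packet i, each taken on its own device clock. At least two packets
    are required.
    """
    jitters = [
        (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        for i in range(1, len(send_times))
    ]
    return max(jitters), min(jitters), sum(jitters) / len(jitters)


# Four packets sent every 20 time units; receive intervals are 22, 17, and 22.
print(jitter_stats([0, 20, 40, 60], [105, 127, 144, 166]))
```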
2.5.2.4 ICMP Jitter Test (hardware-based)

ICMP Jitter (hardware-based) supplements the ICMP jitter test. It uses the hardware
forwarding engine to transmit packets and add timestamps to packets. The hardware
forwarding engine can:
l Reduce the interval for sending packets.
l Increase the number of concurrent test instances.
l Improve the accuracy of delay and jitter calculation.
The following items can be calculated based on the information in the packets received by the
client:
l Maximum, minimum, and average jitter of the packets from the client to the server and
from the server to the client
l Maximum unidirectional delay from the server to the client or from the client to the
server
The obtained jitter can clearly reflect network status.
Figure 2-40 Application scenario of ICMP jitter (hardware-based)
[Figure: labeled elements are the NQA agent and the NQA server.]

2.5.2.5 Path Jitter Test

An NQA UDP jitter test instance can accurately measure the delay and jitter along the path
from the client to the server, but cannot locate the fault if the jitter value is too large.
An NQA path jitter test instance, however, can identify the ATN at which the jitter value
is large.
The NQA path jitter test first identifies the IP address of each hop from the client to the server
by initiating a trace test, and then initiates an ICMP jitter test from the client to obtain the
jitter value of each hop along the path. Figure 2-41 shows the process of a path jitter test:
1. ATN A initiates a trace test to obtain the IP address of each hop along the path to ATN C.
2. ATN A initiates an ICMP jitter test to the IP address of each hop to obtain the jitter value
of each hop.
Figure 2-41 Application scenario of a Path Jitter test
[Figure: labeled elements are ATN A, ATN B, and ATN C.]
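The two phases of a path jitter test can be sketched in Python with stand-in functions. Everything here is hypothetical: `fake_trace` and `fake_jitter` replace the real trace and ICMP jitter tests, and the addresses come from a documentation address range.

```python
def run_path_jitter(destination, trace, measure_jitter):
    """Phase 1: discover the hops. Phase 2: measure jitter to each hop."""
    hops = trace(destination)
    return {hop: measure_jitter(hop) for hop in hops}


def worst_hop(per_hop_jitter):
    """Return the hop with the largest absolute jitter value."""
    return max(per_hop_jitter, key=lambda hop: abs(per_hop_jitter[hop]))


def fake_trace(destination):
    # Stand-in for the trace test: returns the hop addresses in order.
    return ["192.0.2.1", "192.0.2.5", destination]


def fake_jitter(hop):
    # Stand-in for an ICMP jitter test toward one hop.
    return {"192.0.2.1": 1, "192.0.2.5": 9, "192.0.2.9": 3}[hop]


result = run_path_jitter("192.0.2.9", fake_trace, fake_jitter)
print(result, worst_hop(result))
```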
2.5.2.6 FTP Test

An NQA FTP test measures the speed of downloading a specified file from the FTP
server or uploading a specified file to the FTP server. The FTP test is carried over TCP.
An NQA FTP test measures the response time in two phases. Figure 2-42 shows the
process of an FTP test.
l Time to set up a control connection: It is the time taken by the client to set up a TCP
control connection with the FTP server through three-way handshake and the time taken
to interchange signals through the control connection.
l Time to set up a data transmission connection: It is the time taken by the client to
download a specified file from the FTP server or upload a specified file to the FTP server
through the data transmission connection.
Through an FTP test, the following can be calculated based on the information in the packets
received by the client:
l Minimum, maximum, and average time to set up a control connection

l Minimum, maximum, and average time to set up a data transmission connection
These statistics can clearly reflect the performance of the FTP protocol over the network.
Figure 2-42 Applicable scenario of the FTP test
[Figure: labeled elements are the FTP client and the FTP server.]
NOTE
At present, the NQA FTP test supports data transmission only in active and ASCII modes.
Anonymous users cannot be used for the test.
2.5.2.7 SNMP Test

An NQA SNMP test checks the speed of the communication between a host and an
SNMP agent. The SNMP test is carried over UDP. Figure 2-43 shows the process of an
SNMP test.
1. The client (ATNA) sends a Request packet to the SNMP agent (ATNC) to obtain the
system time.
2. After receiving the Request packet, the SNMP agent queries the system time,
constructs a Response packet, and sends the Response packet to the client.
After receiving the Response packet, the client calculates the time for the communication
between the client and the SNMP agent by subtracting the time at which the client
sends the Request packet from the time at which the client receives the Response packet.
This can clearly reflect the performance of the SNMP protocol over the network.
Figure 2-43 Application scenario of the SNMP test
[Figure: labeled elements are ATN A, ATN B, and the SNMP agent (ATN C).]
2.5.2.8 TCP Test

An NQA TCP test checks the speed of setting up a TCP connection between a host
and a TCP server through three-way handshake. Figure 2-44 shows the process of a TCP test.
1. ATNA sends a TCP SYN packet to ATNB (TCP server) to set up a TCP connection.
2. After receiving the TCP SYN packet, the TCP server accepts the request and responds
to the client with a TCP SYN ACK packet.
3. After receiving the SYN ACK packet, the client returns an ACK packet to the TCP
server. Then, a TCP connection is successfully set up.
The client can calculate the time taken in three-way handshake for setting up the TCP
connection with the TCP server by subtracting the time at which the client sends the
TCP SYN packet from the time at which the client receives the SYN ACK packet. This
can clearly reflect the performance of the TCP protocol over the network.
Figure 2-44 Applicable scenario of the TCP test
[Figure: labeled elements are ATN A and ATN B.]
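The same idea can be tried from any host with the Python standard library. This is not how the device implements the test; it is a minimal sketch, assuming that timing the `socket.create_connection` call is an acceptable approximation of the handshake time as seen by the client. The function name and the default timeout are hypothetical.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=3.0):
    """Return the seconds taken to establish a TCP connection to host:port.

    create_connection() returns once the client side of the three-way
    handshake has completed; socket.timeout or OSError is raised otherwise.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start
```

For example, `tcp_connect_time("192.0.2.10", 80)` would report the connection setup time toward a reachable TCP server at that (documentation-range) address. The measurement uses a monotonic clock so that system time adjustments do not distort the result.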

2.5.2.9 UDP Test

In NQA, a UDP test measures the rate at which a client and a UDP server exchange packets.
Figure 2-45 UDP test networking
[Figure: ATN A and the UDP server (ATN B) exchange a UDP packet across the network. Timestamps t1 and t2 are marked on ATN A.]
The process for a UDP test is as follows:
1. Client (ATNA) adds timestamp t1 to a UDP packet and sends the packet to UDP
Server (ATNB).
2. Upon receipt of the packet, UDP Server (ATNB) directly sends it back to Client
(ATNA).
3. Upon receipt of the packet, Client (ATNA) adds timestamp t2 to the packet. Client
(ATNA) then calculates the time used for communication between itself and UDP Server
(ATNB) by subtracting the time at which it sends the UDP packet (t1) from the time at
which it receives the UDP packet (t2). The calculation result is called delay, a
performance counter clearly reflecting UDP performance.
If the delay is longer than the specified timeout period, the network is congested and UDP
packets will be counted as lost packets.
A UDP test can also measure the packet loss rate using the following formula:
Packet loss rate = Number of lost UDP packets/Number of sent UDP packets
The packet loss rate further reflects network status.
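The delay and packet loss rules above can be combined in a small Python sketch. The function name, the probe representation (a `(t1, t2)` pair with `t2` set to `None` when no reply arrived), and the sample values are hypothetical.

```python
def udp_test_summary(probes, timeout):
    """Return (average_delay, packet_loss_rate) for a list of probes.

    A probe counts as lost when no reply arrived or when its delay
    (t2 - t1) exceeds the timeout period. probes must not be empty.
    """
    delays = [
        t2 - t1
        for t1, t2 in probes
        if t2 is not None and t2 - t1 <= timeout
    ]
    lost = len(probes) - len(delays)
    loss_rate = lost / len(probes)
    avg_delay = sum(delays) / len(delays) if delays else None
    return avg_delay, loss_rate


# Times in milliseconds; the third probe got no reply, the fourth timed out.
probes = [(0, 10), (100, 130), (200, None), (300, 3400)]
print(udp_test_summary(probes, timeout=3000))
```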
2.5.2.10 ICMP Test

In NQA, an ICMP test checks whether the route from an NQA client to a destination is
reachable. An ICMP test provides functions similar to the ping command, but the ICMP test
provides more information.

l By default, the results of the latest five tests can be saved.

l The test results include the average delay, the packet loss rate, and the time the last
packet is correctly received.
Figure 2-46 ICMP test networking
[Figure: ATN A sends an ICMP echo request to ATN B across the network and receives an ICMP echo reply. Timestamps t1 and t2 are marked on ATN A.]
The process for an ICMP test is as follows:
1. Client (ATNA) adds timestamp t1 to an ICMP Echo Request packet and sends the packet
to Destination (ATNB).
2. Upon receipt of the packet, Destination (ATNB) responds to Client (ATNA) with an ICMP
Echo Reply packet.
3. Upon receipt of the packet, Client (ATNA) adds timestamp t2 to the packet. Client
(ATNA) then calculates the time used for communication between itself and Destination
(ATNB) by subtracting the time at which it sends the ICMP Echo Request packet (t1)
from the time at which it receives the ICMP Echo Reply packet (t2). The calculation
result is called delay, a performance counter clearly reflecting network status.
If the delay is longer than the specified timeout period, the network is congested and ICMP
packets will be counted as lost packets.
An ICMP test can also measure the packet loss rate using the following formula:
Packet loss rate = Number of lost ICMP packets/Number of sent ICMP packets
The packet loss rate further reflects network status.
NOTE
An ICMP test is usually conducted to check the connectivity. However, it cannot accurately test the link
delay. Therefore, to test link performance, you are advised to conduct an NQA jitter or ICMP jitter test
with hardware-based packet sending enabled.

2.5.2.11 Trace Test

An NQA Trace test detects the forwarding path between the NQA client and a
destination and collects statistics related to the devices along the forwarding path. The Trace
test is similar to the tracert command. The difference is that the Trace test
provides more output:
l Information about each hop in the test result contains the average delay and packet loss ratio.
l The test result contains the time at which the last packet is received.
Figure 2-47 shows the process of a Trace test.
1. The client (ATN A) constructs a UDP packet, with the TTL being 1, and sends the packet
to the destination (ATN B).
2. After the first-hop device (ATN C) receives the UDP packet, it checks the TTL field and
finds that the TTL has decreased to 0. Then, ATN C returns an ICMP Time Exceeded
packet.
3. After the client receives the ICMP Time Exceeded packet, it obtains the IP address of
the first-hop device and re-constructs a UDP packet, with the TTL being 2.
4. After the second-hop device (ATN D) receives the UDP packet, it checks the TTL field
and finds that the TTL has decreased to 0. Then, ATN D returns an ICMP Time Exceeded
packet.
5. The procedure repeats. After the packet reaches the last-hop device, the device returns
an ICMP Port Unreachable packet to the client.
Based on the ICMP packet returned by each hop, the client (ATN A) obtains the forwarding
path from the client to the destination and collects statistics related to each device along
the path. This clearly reflects the network status and the forwarding path from the source
host to the destination host.
Figure 2-47 Applicable scenario of the Trace test
[Figure: labeled elements are ATN A, ATN B, ATN C, and ATN D connected through the IP network.]
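The TTL-driven discovery in the preceding steps can be simulated in a few lines of Python. No packets are sent; the path is given as a list, and the function name, the `max_ttl` limit of 30, and the sample path are assumptions made for this illustration.

```python
def simulate_trace(path, max_ttl=30):
    """Simulate a Trace test over a known path.

    path lists the devices in order from the first hop to the destination.
    Returns one (ttl, replying_device, reply_type) tuple per probe.
    """
    results = []
    for ttl in range(1, min(max_ttl, len(path)) + 1):
        hop = path[ttl - 1]                 # device at which the TTL reaches 0
        if ttl == len(path):
            reply = "ICMP Port Unreachable"  # the probe reached the destination
        else:
            reply = "ICMP Time Exceeded"     # a transit device dropped the probe
        results.append((ttl, hop, reply))
    return results


for probe in simulate_trace(["ATN C", "ATN D", "ATN B"]):
    print(probe)
```

The loop stops either when the destination answers or when the TTL limit is reached, whichever comes first.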

2.5.2.12 LSP Ping Test

An NQA LSP ping test checks the reachability of two types of LSPs: LDP LSPs and TE LSPs.
As shown in Figure 2-48, the LSP ping test works as follows:
1. In an LSP ping test, a UDP MPLS Echo Request packet is constructed first. The
destination IP field is filled with an IP address on the network segment 127.0.0.0/8. The
client searches for the LDP LSP based on the specified remote LSR ID and forwards the
packet through the LDP LSP in the MPLS domain. For a TE LSP, the packet can be sent
from a tunnel interface and forwarded along a specified CR-LSP.
2. The egress monitors port 3503 and returns an MPLS Echo Reply packet.
The client then calculates the time for the communication between the client and the
egress by subtracting the time at which the client sends the MPLS Echo Request packet
from the time at which the client receives the MPLS Echo Reply packet. This mechanism
helps administrators get a snapshot of the MPLS network status.
Figure 2-48 Usage scenario of the LSP ping test
[Figure: labeled elements are the MPLS backbone with PE-A, P, and PE-B; a PW; VLAN1 and VLAN2; NodeB and RNC.]
2.5.2.13 LSP Trace Test

An NQA LSP trace test checks the forwarding path of two types of LSPs, LDP LSPs and TE
LSPs, and collects statistics about devices along the path. As shown in Figure 2-49, the LSP
trace test works as follows:
1. In an LSP trace test, a UDP MPLS Echo Request packet is constructed first. The
client searches for the LDP LSP based on the specified remote LSR ID. The Echo
Request packet includes a Downstream Mapping TLV that carries information about
the downstream node of the current LSP node, such as the IP address of the next hop and
the outgoing label. The TTL value of the first Echo Request packet is 1.
2. The client forwards the Echo Request packet through the specified LDP LSP in the
MPLS domain. When the TTL expires after the packet reaches the first node on the LSP
path, the node returns an MPLS Echo Reply message.
3. The client continues sending Echo Request packets with the TTL value increased by 1
each time until all LSRs on the LSP return MPLS Echo Reply messages.
After the client receives response messages from the LSRs, it displays and collects
information about the LSP forwarding path and devices along the path. This mechanism
helps administrators get a snapshot of the LSP forwarding path from the source host to the
destination host and collect information about devices along the path.
Figure 2-49 Usage scenario of the LSP trace test
[Figure: labeled elements are the MPLS backbone with PE-A, P, and PE-B; a PW; VLAN1 and VLAN2; NodeB and RNC.]
2.5.2.14 LSP Jitter test

An NQA LSP Jitter test measures the jitter, delay, and packet loss ratio on two
types of LSPs, LDP LSPs and TE LSPs, based on the timestamps in the packets. Figure
2-50 shows the process of an LSP Jitter test:
1. In an LSP Jitter test, a UDP MPLS Echo Request packet is first constructed. The
client then searches for the corresponding LSP and forwards the packet through the LSP
in the MPLS domain at certain intervals. For a TE LSP, the packet can be sent from a
tunnel interface and then forwarded along a specified CR-LSP.
2. The egress of the LSP monitors port 3503 and sends an MPLS Echo Reply packet
marked with a timestamp to the client.
3. After receiving the returned packet, the client calculates the packet jitter by subtracting
the interval for the egress to receive the packets from the interval for the client to send
the packets.
The client can also calculate the maximum, minimum, and average jitter time in the
transmission of the packet from the client to the egress. This can clearly reflect network
status.
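The interval comparison in step 3 can be sketched as follows. The function name `jitter_stats` and the sample timestamps are hypothetical. The sketch reports the magnitude of the difference between the receive interval and the send interval, so it does not depend on the sign convention.

```python
from statistics import mean
from typing import Dict, List


def jitter_stats(send_times: List[float], recv_times: List[float]) -> Dict[str, float]:
    """Compute minimum, maximum, and average jitter from per-packet timestamps.

    send_times[i] is when the client sent packet i, and recv_times[i] is when
    the egress received it (both in the same unit, for example milliseconds).
    """
    if len(send_times) != len(recv_times) or len(send_times) < 2:
        raise ValueError("need matching timestamps for at least two packets")
    jitters = []
    for i in range(1, len(send_times)):
        send_interval = send_times[i] - send_times[i - 1]
        recv_interval = recv_times[i] - recv_times[i - 1]
        # Jitter is how much the spacing between consecutive packets changed.
        jitters.append(abs(recv_interval - send_interval))
    return {"min": min(jitters), "max": max(jitters), "avg": mean(jitters)}


# Packets sent every 20 ms; the receive spacing varies (22, 18, and 25 ms).
stats = jitter_stats([0, 20, 40, 60], [5, 27, 45, 70])
print(stats["min"], stats["max"], stats["avg"])
```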

Figure 2-50 Application scenario of the LSP Jitter test
[Figure: MPLS backbone with PE-A, P, and PE-B; PW; NodeB; RNC]
2.5.2.15 PWE3 Ping Test

An NQA PWE3 Ping test is used to detect the reachability of the PW for MPLS forwarding.
Figure 2-51 shows the process of a PWE3 Ping test:
1. The client constructs an MPLS Echo Request packet and sends the packet through a
specified PW based on the configured PW ID. After the packet reaches the remote PE,
the remote PE responds to the client with an MPLS Echo Reply packet whose destination
address is the IP address of the interface that sent the MPLS Echo Request packet.
2. The client can forward data along the PW only when receiving an MPLS Echo Reply
packet returned by the remote PE.
The client can then calculate the time for the communication between the client and the
destination by subtracting the time at which the client sends the MPLS Echo Request
packet from the time at which the client receives the MPLS Echo Reply packet. This can
clearly reflect the PW status.

Figure 2-51 Application scenario of the PWE3 Ping test
[Figure: MPLS backbone with PE-A, P, and PE-B; PW; NodeB; RNC]
2.5.2.16 PWE3 Trace Test

An NQA PWE3 Trace test is used to detect the forwarding path of the MPLS-based PW and
collect statistics related to each device along the forwarding path. Figure 2-52 shows the
process of a PWE3 Trace test.
1. The client sends an MPLS Echo Request packet with the TTL set to 1. The Request
packet is forwarded through a specified PW. After the packet reaches the first-hop device
along the PW, its TTL decreases to 0 and expires, and the first-hop device returns an
MPLS Echo Reply packet.
2. After receiving the MPLS Echo Reply packet from the first-hop device, the client
sends another MPLS Echo Request packet along the specified PW with the TTL
set to 2. After the packet reaches the second-hop device along the PW, the TTL
decreases to 0 and expires, and the second-hop device returns an MPLS Echo Reply
packet.
3. The preceding procedure is repeated and as a result the client can collect information
about each node along the PW.
The client can obtain the forwarding path of the PW from the client to the destination
and collect statistics related to each LSR along the forwarding path based on the MPLS
Echo Reply packet returned by each LSR. This can clearly reflect the PW status.

Figure 2-52 Application scenario of the PWE3 Trace test
[Figure: MPLS backbone with PE-A, P, and PE-B; PW; NodeB; RNC]
2.5.2.17 MAC Ping Test

An NQA MAC Ping test is a detection tool provided by Ethernet OAM. It is
implemented based on 802.1ag. A MAC Ping test is initiated by an MEP, and the destination
node can be any MP node in the same MA.
Figure 2-53 shows the process of an 802.1ag MAC Ping test from MEP1 to MEP2.
1. MEP1 sends a Loopback Message (LBM) to MEP2.
2. After receiving the LBM, MEP2 responds to MEP1 with a Loopback Reply (LBR)
message. MEP1 then calculates the time of the Ping operation for analyzing the network
performance.
Within the timeout period:
– If MEP1 does not receive the LBR message from MEP2, it considers that the route
between MEP1 and MEP2 is not reachable.
– If MEP1 receives the LBR message from MEP2, it calculates the transmission delay
from MEP1 to MEP2 based on the timestamps carried in the messages.
During the test, the client can send multiple LBMs continuously and then observe
whether LBR messages are returned. Through a MAC Ping test, statistics about the
performance of Ethernet OAM, including average delay, jitter, and packet loss ratio can
be collected based on the timestamps in the test packets. These statistics can clearly
reflect the Ethernet network status.

Figure 2-53 Application scenario of the MAC Ping test
[Figure: MEP1, MEP2, and MEP3; an LBM data stream and an LBR data stream between MEP1 and MEP2]
2.5.2.18 Path MTU test

A path MTU test instance is used to obtain the maximum MTU value that does not require
packet fragmentation during the packet transmission on the link.
When one host sends a large number of IP packets to another host, the IP packets are
fragmented according to the maximum acceptable packet length. This affects forwarding
efficiency. It is preferable that these packets be of the largest size that does not require
fragmentation anywhere along the path from the client to the server. This packet size is
referred to as the path MTU.
Usually, the path MTU equals the smallest MTU of all hops along the path.
As shown in Figure 2-54, the MTU value between ATN A and ATN B is 100 bytes and
between ATN B and ATN C is 200 bytes. Therefore, the path MTU value between ATN A and
ATN C is 100 bytes.
An NQA path MTU test is initiated from the client to the server. It requires several
incremental steps to estimate the maximum path MTU. Figure 2-54 shows the process of a
path MTU test:
1. ATN A sends an ICMP probe packet to ATN C, with the packet size set to the lower limit
of the range (the value is configurable and the default value is 48 bytes).
2. When the first probe packet successfully reaches the destination, ATN A continues to send
ICMP probe packets to ATN C, increasing the size by one step each time (the step is
configurable and the default value is 10 bytes), until three consecutive packets time out.
This indicates that the size of the sent packet is greater than the path MTU.
3. ATN A sends a 48-byte detection packet to ATN C to check the connectivity of the
network. If the connectivity of the network is normal, the size of the last successful
probe packet before the timeout in step 2 is the maximum path MTU.
NOTE
The packet header contains a Don't Fragment (DF) flag, indicating whether a packet can be fragmented.
The DF field should be set to 1, indicating that the device cannot fragment the packet.
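The probing steps above can be sketched as a small simulation. It makes several assumptions: the network is replaced by a `probe` callback that returns whether a probe of a given size gets a reply, and the names `discover_path_mtu`, `max_size`, and `timeouts_to_stop` are hypothetical rather than NQA parameters.

```python
from typing import Callable, Optional


def discover_path_mtu(
    probe: Callable[[int], bool],
    start: int = 48,            # default lower limit of the probe size (bytes)
    step: int = 10,             # default increment between probes (bytes)
    max_size: int = 1500,       # hypothetical upper bound for the search
    timeouts_to_stop: int = 3,  # stop after this many consecutive timeouts
) -> Optional[int]:
    """Return the largest probe size that got through, or None on failure."""
    # Step 1: the first probe uses the lower limit of the range.
    if not probe(start):
        return None
    last_ok = start
    consecutive_timeouts = 0
    size = start + step
    # Step 2: grow the probe until three consecutive probes time out.
    while size <= max_size and consecutive_timeouts < timeouts_to_stop:
        if probe(size):
            last_ok = size
            consecutive_timeouts = 0
        else:
            consecutive_timeouts += 1
        size += step
    # Step 3: confirm that the timeouts were not caused by lost connectivity.
    if not probe(start):
        return None
    return last_ok


# Path from the example: 100 bytes (A to B) and 200 bytes (B to C).
path_mtu = min(100, 200)
print(discover_path_mtu(lambda size: size <= path_mtu))  # prints 98
```

With the default 10-byte step, the result (98 bytes here) approaches the real path MTU from below to within one step. A smaller step gives a finer estimate at the cost of more probes.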

Figure 2-54 Application scenario of a Path MTU test
[Figure: ATN A, ATN B, and ATN C connected in series]
2.5.2.19 VPLS Ping test

An NQA VPLS Ping test instance is used to test whether a particular server is reachable
across a VPLS network. Figure 2-55 shows the process of an NQA VPLS Ping test:
1. A VSI and a MAC address are specified. The MAC address can be the bridge MAC
address of the peer PE on the VPLS network or the MAC address of the CE on the user
side. The test instance constructs an MPLS Echo Request packet, with the network
address 127.0.0.0/8 being added to the IP header as the destination IP address. Then, the
MAC table learned on a PW side is checked. If an entry corresponding to the destination
address is found in the MAC table, the MPLS Echo Request packet is forwarded to the
PW; otherwise, the MPLS Echo Request packet is broadcast throughout all PWs in the
specified VSI.
2. The PE monitors the port numbered 3503. When the port receives the MPLS Echo
Request packet, the PE node responds with an MPLS Echo Reply packet.
3. If the MAC address specified on the client is the MAC address of the CE side, the MPLS
Echo Request packet is not actually forwarded to the CE. Instead, the MAC address of
the requested CE is searched on the PE node to which the requested CE is connected. If
the MAC address of the CE exists on the PE node, the VPLS ping test is regarded as
successful; otherwise, the test is regarded as failed.
The client can then calculate the time for the communication between the client and the
egress by subtracting the time at which the client sends the MPLS Echo Request packet
from the time at which the client receives the MPLS Echo Reply packet. This can clearly
reflect the MPLS network status.
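The forwarding decision in step 1 (send to the PW learned for the destination MAC address, otherwise broadcast to all PWs in the VSI) can be sketched as a lookup. The function name `select_pws` and the PW names are hypothetical, and the MAC table is modeled as a plain dictionary.

```python
from typing import Dict, List


def select_pws(dest_mac: str, mac_table: Dict[str, str], vsi_pws: List[str]) -> List[str]:
    """Return the PWs on which the MPLS Echo Request packet is sent.

    mac_table maps a learned MAC address (lowercase) to the PW it was learned on.
    vsi_pws lists all PWs in the specified VSI.
    """
    learned_pw = mac_table.get(dest_mac.lower())
    if learned_pw is not None:
        return [learned_pw]   # known destination: forward on that PW only
    return list(vsi_pws)      # unknown destination: broadcast to every PW


pws = ["pw-to-pe2", "pw-to-pe3"]
print(select_pws("0018-826D-4917", {}, pws))
print(select_pws("0018-826D-4917", {"0018-826d-4917": "pw-to-pe2"}, pws))
```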
Figure 2-55 Applicable scenario of the VPLS Ping test
[Figure: VPLS network with PE1, PE2, and PE3 in VSI a2, interconnected by PWs; NodeB, CE1, and CE2 (MAC 0018-826D-4917) on the user side; MAC table entry: MAC 0018-826D-4917, port GE0/3/1.3]
As shown in Figure 2-55, the process of initiating the VPLS ping test on PE1 is as follows.
1. A VPLS ping test instance is configured on PE1, with the MAC address of CE2, namely,
0018-826d-4917, as the destination MAC address. The entry corresponding to the
destination MAC address is not found in the MAC table on PE1. Consequently, the
MPLS Echo Request packet is broadcast to all PWs in the specified VSI.
2. Both PE2 and PE3 receive the MPLS Echo Request packet. On PE3, the destination MAC
address differs from the bridge MAC address, and no entry corresponding to
the destination MAC address is found in the MAC table learned on the CE side. According
to the split horizon principle, PE3 does not forward the Echo Request packet.
3. On PE2, the destination MAC address also differs from the bridge MAC address. An
entry corresponding to the destination MAC address, however, is found in the MAC
table learned on the CE side. In this case, an MPLS Echo Reply packet is returned to the
client, indicating that the VPLS ping test is successful.
2.5.2.20 VPLS Trace test

An NQA VPLS trace test instance is used to detect the connectivity of the forwarding path
across the VPLS network, locate faults on the VPLS network, and collect statistics about
every device along the forwarding path. Figure 2-56 shows the process of a VPLS trace test.
1. A VSI and MAC address are specified for the VPLS trace test. The MAC address can be
the bridge MAC address of the peer PE on the VPLS network or the MAC address of the
CE on the user side. The test instance creates an MPLS Echo Request packet and adds
the network address 127.0.0.0/8 to the IP header as the destination IP address. Then, the
MAC table learned on a PW side is checked. If an entry corresponding to the destination
address is found in the MAC table, the MPLS Echo Request packet is forwarded to the
PW; otherwise, the MPLS Echo Request packet is broadcast throughout all PWs
corresponding to the specified VSI. The Echo Request packet should include a
downstream mapping TLV that carries downstream information about the LSP at the
current node, such as next-hop address and outbound label. The TTL (MPLS TTL) value
of the first MPLS Echo Request packet is 1.
2. The MPLS Echo Request packet is forwarded in the VPLS domain through a specified
PW. When the TTL carried in the packet expires, an MPLS Echo Reply packet is
returned.
3. The client continues to send Echo Request packets carrying incremental TTL values.
This process repeats until the destination or the edge of the VPLS network returns its
response. Then, the VPLS trace process is complete.
4. If the MAC address specified on the client is the MAC address of the CE, the MPLS
Echo Request packet is not forwarded to the CE and the PE node that is connected to the
CE is checked. If the MAC address of the CE exists on the PE node, the VPLS trace test
is successful; otherwise, the test fails.
Based on the MPLS Echo Reply packet returned by each node, the client collects and
displays information about the forwarding path from the client to the server and each
device along the forwarding path.

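Figure 2-56 Applicable scenario of a VPLS trace test
[Figure: VPLS network with the sending PE, the receiving PE, and PE3 in VSI a2, interconnected by PWs; Echo Request packets marked TTL1 and TTL2; NodeB, CE1, and CE2 (MAC 0018-826D-4917) on the user side; MAC table entry: MAC 0018-826D-4917, port GE0/3/1.3]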
Figure 2-56 shows the process of a VPLS trace test initiated on the client PE.
1. A VPLS trace test instance is configured on the sending PE, with the MAC address of
CE2, namely, 0018-826d-4917, as its destination MAC address. An MPLS Echo Request
packet with the TTL set to 1 is sent. Because no entry for the destination MAC address is
found on the sending PE, the MPLS Echo Request packet is broadcast to all PWs of the
specified VSI.
2. PE3 receives the MPLS Echo Request packet. The destination
MAC address differs from the bridge MAC address on PE3, and no entry
corresponding to the destination MAC address is found in the MAC table. When the TTL
carried in the MPLS Echo Request packet expires, the packet is not forwarded and an MPLS
Echo Reply packet is returned to the sending PE.
3. The receiving PE receives the MPLS Echo Request packet. The destination MAC
address differs from the bridge MAC address on the receiving PE. An entry
corresponding to the destination MAC address, however, is found in the MAC
table learned on the CE side. In this case, an MPLS Echo Reply packet is returned to the
sending PE, indicating that the VPLS trace test is successful.
2.5.2.21 VPLS PW Ping test and VPLS PW Trace test

As a main technology for setting up a metropolitan area network (MAN), Virtual Private LAN
Service (VPLS) has been widely applied globally. VPLS, however, is poor in terms of service
management and monitoring. In this case, an optimized VPLS OAM mechanism is required.
On a VPLS network, the performance of PWs affects the entire network performance. For
example, the connectivity of PWs determines whether traffic can be normally forwarded
between users, and the forwarding performance of PWs determines whether the forwarding
capacity of the network complies with the Service Level Agreement (SLA) signed with users.
To monitor PWs on the VPLS network, VPLS PW ping and VPLS PW trace are developed
for detecting the connectivity of PWs, collecting performance information about PWs,
discovering packet forwarding paths along PWs, and locating faults on PWs.

VPLS PW ping or VPLS PW trace operations initiated through NQA commands are the same
as ping or trace operations initiated through common command lines in principle, and
additionally provide the scheduling and result collection mechanism and the threshold-
exceeding alarm function.
VPLS PW ping and VPLS PW trace comply with RFC 4379 and RFC 5085 in implementing
PW detection: MPLS echo packets that carry Forwarding Equivalence Class (FEC) fields are
encapsulated in tunnel mode and labeled with the Router Alert option; the Router Alert
function is enabled on the VPLS network. MPLS echo packets are transmitted between PEs to
detect PWs and are not sent to CEs, which means that the NQA test instance can be
configured only on PEs. If an NQA test instance is configured on a non-PE device, it cannot
be started because there is no VSI used for establishing PWs on the non-PE device and as a
result, the test result is "drop."
During the VPLS PW detection through an NQA test instance, threshold monitoring and
NQA test instance scheduling can be actively performed based on the specifications defined
in the IP SLA.
• A VPLS PW ping or VPLS PW trace test instance can be configured to actively monitor
VPLS services and detect faults in VPLS services. If the round-trip time
(RTT) of a packet exceeds the threshold, a connection is interrupted, or the response to a
request packet times out, the NQA test instance sends an SNMP trap message to the
Network Management System (NMS) for notification and collects statistics (such as
the RTT) for users to query.
• The scheduling function can be enabled to periodically schedule an NQA test instance to
detect a specific VPLS PW as required. When multiple NQA test instances are started
concurrently to detect multiple PWs, the scheduling function enables these NQA test
instances to operate separately and arranges the operation time properly so that as many
test instances as possible can be started for PW detection. The maximum number of test
instances that the system allows is calculated based on the traffic metric of test instances.
Procedures for starting a VPLS PW ping test instance are as follows:

1. Parameters for the test instance are configured on the sender PE. For example, on a
Martini VPLS network, the destination address, PW ID, and VSI name need to be
configured. Then, the NQA module constructs an MPLS echo request packet carrying
the timestamp in the private type-length-value (TLV) field and the Router Alert option,
and sends the MPLS echo request packet to the public network based on the forwarding
information in the forwarding table.
2. After receiving the MPLS echo request packet, the receiver PE parses the packet and
determines whether it is the destination of the packet. If so, the receiver PE sends the
MPLS echo request packet to the CPU for processing. After that, it constructs an MPLS
echo reply packet carrying timestamps indicating the time when the MPLS echo request
packet is received and when the MPLS echo reply packet is sent, and then sends the echo
reply packet to the sender PE in a pre-defined manner.
3. When the MPLS echo reply packet reaches the sender PE, the sender PE saves the time
information in the packet to the test result table. Based on the time information, the
system can calculate the unidirectional delay of the packet transmission after the sender
and receiver PEs have synchronized their clocks. If the sender PE does not receive the
echo reply packet within the timeout period, it saves an error packet record to the test
result table.
4. In the case of a multi-segment PW formed by connecting a VLL PW to a VPLS PW, the
intermediate PE receives the MPLS echo request packet and finds itself not the
destination of the packet. Then, it searches for a route based on the packet's incoming
label and sends the MPLS echo request packet to the pre-defined destination along the
route.
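The timestamp handling in steps 1 to 3 can be illustrated with a short calculation. This is a sketch only: the class and field names are hypothetical, the timestamps are plain integers in microseconds, and the result is meaningful only if the sender and receiver PE clocks are synchronized, as step 3 requires.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EchoTimestamps:
    request_sent: int      # stamped by the sender PE in the echo request
    request_received: int  # stamped by the receiver PE on arrival
    reply_sent: int        # stamped by the receiver PE in the echo reply
    reply_received: int    # recorded by the sender PE on arrival


def one_way_delays(ts: EchoTimestamps) -> Tuple[int, int]:
    """Return (sender-to-receiver delay, receiver-to-sender delay)."""
    forward = ts.request_received - ts.request_sent
    backward = ts.reply_received - ts.reply_sent
    return forward, backward


sample = EchoTimestamps(request_sent=1000, request_received=1400,
                        reply_sent=1450, reply_received=1900)
print(one_way_delays(sample))  # prints (400, 450)
```

The 50 microseconds between `request_received` and `reply_sent` is processing time on the receiver PE and is excluded from both one-way delays.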
Procedures for starting a VPLS PW trace test instance are as follows:
1. Parameters for the test instance are configured on the sender PE. For example, on a
Martini VPLS network, the destination address, PW ID, and VSI name need to be
configured. Then, the NQA module constructs an MPLS echo request packet carrying
the timestamp in the private TLV field and the Router Alert option, and sends the MPLS
echo request packet to the public network based on the forwarding information in the
forwarding table. The initial and maximum TTL values of the MPLS echo request packet
to be sent can be specified, and the TTL value of the first MPLS echo request packet is 1.
2. After receiving the MPLS echo request packet, the intermediate PE checks whether the
TTL value of the packet expires. If so, the intermediate PE sends the MPLS echo request
packet to the CPU for processing. After that, the intermediate PE constructs an MPLS
echo reply packet and encapsulates the downstream TLV into the MPLS echo reply
packet. After obtaining the next hop information based on the packet's incoming label
and the inbound interface index, the intermediate PE sends the MPLS echo reply packet
to the sender PE.
3. After the sender PE receives the MPLS echo reply packet, it keeps sending MPLS echo
request packets with the TTL values being increased by 1 each time a packet is sent until
an MPLS echo request packet reaches the destination or the TTL value reaches the upper
limit.
4. Information about P devices is not displayed in the NQA test result by default. It can be
obtained by running the lsp-path full-display command.
2.5.2.22 General Flow Test

NOTE
Among ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports the General Flow
Test function.
Overview
An NQA general flow test is a standard traffic testing method for evaluating network
performance and is in compliance with RFC 2544. This test can be used in various
networking scenarios that have different packet formats. NQA general flow tests are
conducted using UDP packets with source UDP port 0xC020 and destination UDP port 7. As
defined in RFC 2544, in a general flow test, test results can be written into a file and
proactively pushed to an FTP or SFTP server.
Before a customer performs a service cutover, an NQA general flow test helps the customer
evaluate whether the network performance counters meet the requirements in the design. An
NQA general flow test has the following advantages:
• Enables a device to send simulated service packets to itself before services are deployed
on the device.
Existing methods, unlike general flow tests, can only be used when services have been
deployed on networks. If no services are deployed, testers must be used to send and
receive test packets.
• Uses standard methods and procedures that comply with RFC 2544 so that NQA general
flow tests can be conducted on a network on which both Huawei and non-Huawei
devices are deployed.

Related Concepts
• Specified lower threshold bandwidth: a dynamic value that changes while a test is being
conducted. The initial value of the lower threshold bandwidth is configured.
• Specified upper threshold bandwidth: a dynamic value that changes while a test is being
conducted. The initial value of the upper threshold bandwidth is configured.
Test Procedure
A general flow test is an NQA test tool using UDP packets. Before a general flow test is
conducted, the push function must be configured on an initiator, and the initiator must have a
reachable route to the FTP or SFTP server. An initiator (NQA client) initiates a general flow
test and sends test packets to a reflector. After the test packets arrive at the reflector, the
reflector interchanges the source and destination addresses in the packets and loops the
packets to the initiator. The initiator counts the number of sent and received packets and
calculates indicators based on timestamps carried in the packets. After the general flow test is
complete, the initiator writes test results into a file and uploads the file onto an FTP or SFTP
server.
A general flow test measures the following counters:
• Throughput: maximum rate at which packets are sent without loss. The value is
expressed in kbit/s.
• Packet loss rate: percentage of discarded packets to all sent packets.
• Latency: consists of the bidirectional delay time and jitter calculated based on the
transmission and receipt timestamps carried in test packets. The transmission time in
each direction includes the time the forwarding devices process the test packet. The
value is expressed in microseconds.
These counters are calculated in separate tests. A counter must be specified before a test is
conducted.
On the network shown in Figure 2-57, UNI-A (User Network Interface A) is an initiator, and
UNI-B is a reflector. UNI-A and UNI-B conduct tests on the throughput, delay time, and
packet loss rate.

Figure 2-57 General flow test networking
[Figure: initiator (UNI-A) and reflector (UNI-B) exchanging looped traffic; the initiator pushes results to an FTP/SFTP server]
Throughput tests
NOTE
The packet encapsulation format and the percentage of valid payloads vary with the service scenario.
Therefore, the network throughput data obtained through general flow tests (RFC 2544 tests) differs.
This difference exists no matter whether the network throughput is measured using general flow tests or
any other test method.
Use an L3VPN scenario where two devices are connected through sub-interfaces as an example. The
RFC 2544 test rate is calculated based on the L1 rate, with both the inter-frame gap of Ethernet packets
(12 bytes by default) and the preamble (8 bytes by default) being considered. During L3VPN access,
two MPLS labels (4 bytes per label) need to be added, and a VLAN tag (4 bytes) needs to be added for
each public network sub-interface. Therefore, the scenario-affected theoretical network throughput can
be calculated using the following formula: (Test packet length + Interframe gap length + Preamble
length)/(Test packet length + Interframe gap length + Preamble length + Length of the two MPLS labels
added + Length of the VLAN tag added) x Link bandwidth. For example, if the test packet length is 64
bytes, the theoretical network throughput is calculated as follows: (64 + 12 + 8)/(64 + 12 + 8 + 8 + 4) x
Link bandwidth = 87.5% x Link bandwidth. The theoretical value is for reference only. In real-world
applications, the value may also be affected by other factors.
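The formula in the preceding note can be checked with a few lines of code. The function and parameter names are hypothetical. The default lengths (12-byte inter-frame gap, 8-byte preamble, two 4-byte MPLS labels, one 4-byte VLAN tag) are the values given in the note for the L3VPN sub-interface scenario.

```python
def theoretical_throughput_ratio(
    packet_len: int,
    inter_frame_gap: int = 12,
    preamble: int = 8,
    mpls_labels: int = 2,
    mpls_label_len: int = 4,
    vlan_tags: int = 1,
    vlan_tag_len: int = 4,
) -> float:
    """Return the theoretical throughput as a fraction of the link bandwidth."""
    # L1 length of the test packet as counted by the RFC 2544 test rate.
    l1_len = packet_len + inter_frame_gap + preamble
    # Extra bytes added on the public network side for each packet.
    added = mpls_labels * mpls_label_len + vlan_tags * vlan_tag_len
    return l1_len / (l1_len + added)


print(theoretical_throughput_ratio(64))  # prints 0.875, that is, 87.5%
```

Because the added bytes are fixed per packet, the ratio rises toward 100% as the test packet length grows. As the note states, the value is for reference only.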
• Lossless mode: also called the share mode on an interface. The interface allows both
RFC 2544 traffic and non-RFC 2544 traffic to share bandwidth so that non-RFC 2544
traffic can be transmitted properly without being discarded or interrupted.
NOTE
In lossless mode, when bandwidth congestion occurs, RFC 2544 traffic and non-RFC 2544 traffic
affect each other.
• Lossy mode: This mode supports exclusive port occupation and priority-based blocking.
The test will interrupt services.
– Exclusive port occupation: The port is exclusively occupied, and non-RFC 2544
traffic will be discarded, causing service interruptions.
– Priority-based blocking: Only service packets of a specified priority are blocked,
and those of other priorities can be properly forwarded.
Recommended test method

• Configure a coarse-grained test instance and set precision at the M level.
• Based on the test result, configure a refined test instance and set precision at the K level.
• Set an appropriate duration to provide sufficient time for a test to run and find an
appropriate rate to prevent test timeouts.
Test procedure
Throughput tests are conducted to test throughput values by sending test packets at rates of
the specified upper and lower rate thresholds or at rates in between. The difference between
the test result and actual throughput must be less than a specified precision value. The test
procedure is as follows:
1. An initiator sends test packets at a rate equal to the lower threshold. The network
bandwidth is acceptable if no packet is dropped within a specified period or the packet
loss rate is less than the configured packet loss rate. Then the test continues.
2. The initiator sends test packets at a rate equal to the upper threshold. The network
bandwidth is acceptable if no packet is dropped within a specified period or the packet
loss rate is less than the configured test failure percentage. Then the test continues.
3. The initiator changes rates to send test packets to find a maximum rate that is the final
throughput test result. In the previous step, if the actual packet loss rate is greater than
the configured packet loss rate, the initiator uses the bisection method to attempt to send
test packets at different rates between the upper and lower rate thresholds. This process
repeats until a maximum rate is found when the test result meets the throughput
precision, and the packet loss rate is less than the configured packet loss rate.
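The bisection search in step 3 can be sketched as follows. This is a simplified model, not the device implementation: `loss_at` is a hypothetical callback that stands in for running one trial at a given rate and returning the measured packet loss rate, and returning `None` when even the lower threshold fails is an assumption of this sketch.

```python
from typing import Callable, Optional


def find_throughput(
    loss_at: Callable[[float], float],
    lower: float,
    upper: float,
    precision: float,
    max_loss: float = 0.0,
) -> Optional[float]:
    """Search between the lower and upper rate thresholds for the throughput."""
    # Step 1: the lower threshold must be acceptable for the test to continue.
    if loss_at(lower) > max_loss:
        return None
    # Step 2: if the upper threshold is acceptable, it is the result.
    if loss_at(upper) <= max_loss:
        return upper
    # Step 3: bisect between the last acceptable and the first failing rate.
    good, bad = lower, upper
    while bad - good > precision:
        mid = (good + bad) / 2
        if loss_at(mid) <= max_loss:
            good = mid
        else:
            bad = mid
    return good


# Simulated link that starts dropping packets above 730 (rates in Mbit/s).
def simulated_loss(rate: float) -> float:
    return 0.0 if rate <= 730 else (rate - 730) / rate


print(find_throughput(simulated_loss, lower=100, upper=1000, precision=1))
```

Each iteration halves the search range, which is why a coarse first run followed by a refined run over a narrower range keeps the total test time short.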
Test end rules
• If a bandwidth that meets the requirements is found within the configured bandwidth
range, the test ends.
• A test times out after the configured duration expires. In this case, the test results record
the timeout and the tested bandwidth. In addition, because the packet loss ratio cannot be
calculated, the device considers all packets discarded.
Latency tests
Latency tests can only be conducted when background traffic is being transmitted. An
initiator sends background traffic at a specific rate and test packets at a specific interval to a
reflector. The initiator then calculates the bidirectional delay time and jitter based on the
transmission and receipt time.
Packet loss rate tests
An initiator sends test packets at a specific rate and interval to a reflector. Software collects
statistics about the sent and received packets every second. The initiator stops sending test
packets and counts the number of sent and received packets. The initiator then calculates the
packet loss rate based on the statistics.
Applications
A general flow test can be used in the following scenarios:
• Layer 2: native Ethernet scenario and L2VPN scenario, including Virtual Leased Line
(VLL) and Virtual Private LAN Service (VPLS) networking
• Layer 3: native IP scenario and L3VPN scenario
• IP gateway scenario

Figure 2-58 General flow test networking
[Figure: initiator (UNI-A) and reflector (UNI-B) exchanging looped traffic; the initiator pushes results to an FTP/SFTP server]
In both the Layer 2 and Layer 3 scenarios, a general flow test is performed between two UNIs
on the network shown in Figure 2-58. Before a test instance runs, the push function must be
enabled on the initiator, and the initiator must have a reachable route to an FTP or SFTP
server. The initiator sends test packets to the reflector. The reflector returns all test packets
received by a reflector interface or only returns packets matching a specific filter condition.
After the initiator receives the test packets, it collects statistics and yields test results based on
the statistics. The initiator writes the test results into a file and uploads the file onto an FTP or
SFTP server.
Figure 2-59 General flow test in the IP Gateway scenario
[Figure: reflector (UNI-A) and initiator (IP gateway) exchanging looped traffic through the outbound interface of the initiator; the initiator pushes results to an FTP/SFTP server]
On the network shown in Figure 2-59, a reflector functions as a switch. Layer 3 services on a
user-side CE are sent to an IP gateway (initiator) through a Layer 2 network. A general flow
test can be conducted in this scenario. The procedure is similar to that in the Layer 3 scenario.
Unlike the initiator in the Layer 3 scenario, the IP gateway cannot learn the MAC address of
the reflector or CE. The reflector simulates a user logging in to the CE and sends gratuitous
ARP packets to the IP gateway. The IP gateway can learn the MAC address carried in the
gratuitous ARP packets.
2.5.2.23 Ethernet Service Activation Test
NOTE
Among ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports the Ethernet
Service Activation Test function.
Background
The Ethernet service activation test is a technique that is provided for carriers to evaluate
Ethernet performance. Before rolling out services, carriers need to know the current network
performance, whether network configurations are correct, and whether the network
performance meets the service level agreement (SLA). The information facilitates future
business planning and service promotion. Therefore, a highly reliable and precise
performance test method is essential for the carriers to quickly evaluate their network
performance.
To address this issue, the Internet Engineering Task Force (IETF) published RFC 2544
"Benchmarking Methodology for Network Interconnect Devices", defining general flow
testing as a standard method of evaluating network performance. However, this type of test
has restrictions in real-world applications due to the following disadvantages:
• The results of a general flow test denote the network performance boundary only. They
cannot be used to evaluate whether a specified service meets the SLA.
• The test is unable to verify network configurations or provide performance counters,
such as the committed information rate (CIR), excess information rate (EIR), and color
mode (CM).
ITU-T Y.1564 (Ethernet Service Activation Test Methodology) was then published in 2011,
providing the test method of Ethernet service activation for carriers to evaluate Ethernet
performance. The values of performance counters in an Ethernet service activation test report
are different from those of the same performance counters in a general flow test report. Table
2-15 lists the differences. Compared with a general flow test, an Ethernet service activation
test overcomes the preceding disadvantages and better reflects the performance of network
resources leased to customers.

Table 2-15 Performance counters that the general flow and Ethernet service activation test
reports both provide
Performance Counter | General Flow Test | Ethernet Service Activation Test
Network throughput | Maximum transmission rate | Minimum, maximum, and average information rate
Delay | Minimum, maximum, and average delay | Minimum, maximum, and average frame transmission delay
Packet loss ratio | Packet loss ratio | Number of lost frames/frame loss ratio
Jitter | Minimum, maximum, and average jitter | Minimum, maximum, and average frame jitter
Related Concepts
Bandwidth profile
A bandwidth profile defines the bandwidth that a carrier assigns user services that need to
enter a carrier network and the priorities based on which the user services are processed.
A bandwidth profile restricts the transmission rate of service traffic using parameters, such as
CIR and EIR.
l CIR: Rate at which a frame relay network agrees to transfer information in normal
conditions, that is, the rate at which tokens are placed into the token bucket.
l EIR: Bandwidth for excess or burst traffic above the CIR. It equals the result of the
actual transmission rate without the safety rate.
During testing, service frames are marked green, yellow, or red based on the CIR and EIR.
Devices on the network forward green frames, place yellow frames into queues, and discard
red frames. Figure 2-60 shows the mappings between service transmission rates, CIR and
EIR, and colors.

Figure 2-60 Mappings between service transmission rates, CIR and EIR, and colors
[Figure: transmission rate plotted over the test period, with thresholds at the CIR, CIR + EIR, and 100% of link rate. Green frames are conformant to the CIR, yellow frames are conformant to the EIR, and red frames are non-conformant to either the CIR or EIR.]
NOTE
The mapping process is also called traffic measurement, which can be implemented by hardware using
the token bucket technique.
In addition to CIR and EIR, the CM parameter is used to control traffic measurement. The
CM enables carriers to measure and process service traffic by marking traffic priorities (such
as 802.1p and DSCP), rather than only based on the CIR and EIR.
The CM parameter is essential when the same service has applications with different
performance requirements. For example, voice traffic that requires a low packet loss ratio and
short delay needs to be marked green. However, TCP file transfer traffic that is insensitive to
transmission problems can be marked yellow.
Either of the following CMs can be used:
l Color-aware mode
Before entering a carrier network, high-priority traffic is marked green, while low-
priority traffic is marked yellow. A carrier network device processes traffic marked with
different colors based on the mapping between the CIR and EIR.
In this mode, user traffic must match the mapping between the CIR and EIR. Otherwise,
the traffic color will be changed by the carrier network. For example, if the rate of the
traffic marked green falls between the CIR and the sum of the CIR and EIR, the traffic
color is changed to yellow by the carrier network. If the rate of the traffic marked yellow
exceeds the sum of the CIR and EIR, the carrier network discards the traffic on UNIs.
l Color-blind mode
A carrier network device processes traffic based on the mapping between the CIR and
EIR in compliance with the first in first out (FIFO) rule, regardless of traffic colors.
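The color mapping described above can be sketched with two token buckets, one refilled at the CIR and one at the EIR. This is a minimal illustration under stated assumptions, not the device implementation: the class name TwoRateMarker, the burst sizes cbs_bytes and ebs_bytes, and the mark method are hypothetical names introduced here, because the source names only the CIR, EIR, and CM.

```python
from dataclasses import dataclass


@dataclass
class TwoRateMarker:
    """Hypothetical two-bucket marker; burst sizes are assumed parameters."""
    cir_bps: float          # committed information rate, bit/s
    eir_bps: float          # excess information rate, bit/s
    cbs_bytes: float        # assumed committed burst size
    ebs_bytes: float        # assumed excess burst size
    color_aware: bool = False

    def __post_init__(self):
        self.c_tokens = self.cbs_bytes   # both buckets start full
        self.e_tokens = self.ebs_bytes
        self.last = 0.0

    def mark(self, now: float, size: int, color: str = "green") -> str:
        # Refill both buckets for the time elapsed since the last frame.
        elapsed = now - self.last
        self.last = now
        self.c_tokens = min(self.cbs_bytes, self.c_tokens + elapsed * self.cir_bps / 8)
        self.e_tokens = min(self.ebs_bytes, self.e_tokens + elapsed * self.eir_bps / 8)
        # Color-blind mode ignores the incoming color; color-aware mode
        # lets only frames that arrive green draw from the CIR bucket.
        may_be_green = (not self.color_aware) or color == "green"
        if may_be_green and size <= self.c_tokens:
            self.c_tokens -= size
            return "green"
        if size <= self.e_tokens:
            self.e_tokens -= size
            return "yellow"
        return "red"
```

In this sketch, a frame that arrives green but exceeds the CIR bucket is re-marked yellow, and a frame that fits neither bucket is marked red, which mirrors the color-aware behavior described in the text.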

Service acceptance criteria (SAC)

The SAC defines a set of parameters to evaluate Ethernet network performance. The
parameters represent the minimum SLA requirements for network services provided by a
carrier.
The SAC is defined in ITU-T Y.1564 and consists of information rate (IR), frame loss ratio
(FLR), frame transfer delay (FTD), frame delay variation (FDV), and availability (AVAIL).
Test mode
An Ethernet service activation test can be conducted in inward or outward mode on the
initiator and reflector. In addition, the test results can be written to files and pushed to FTP/
SFTP servers.
l Inward mode
In Figure 2-61, the initiator resides on the tested network and simulates user access
traffic on user-to-network interface (UNI). The reflector can be a customer edge (CE)
that supports reflection (namely, loopback).
Figure 2-61 Inward mode
[Figure labels: FTP/SFTP server (push), customer networks, PE1 (initiator), PE2 (reflector), UNI A, UNI B.]
l Outward mode
In Figure 2-62, the initiator and reflector do not reside on the tested network. They are
external devices connected to NNI A or NNI B at one end of the network. The reflector
can be a CE that supports reflection.

Figure 2-62 Outward mode
[Figure labels: FTP/SFTP server (push), customer networks, PE1 (initiator), PE2 (reflector), UNI A, UNI B.]
Packet sending mode

Packets are sent in one-way mode in an Ethernet service activation test. An initiator sends one
or more test flows to the reflector. Upon receipt of these packets, the reflector loops packets
back to these interfaces, and the initiator calculates and displays test results.
One-way packet sending can be implemented as long as Y.1564 is enabled on the initiator and
reflector, regardless of whether their clock signals are synchronized. However, one-way
packet sending has the following deficiencies:
l Reflection must be enabled on the reflector. If the loopback function is enabled on an
interface of a reflector, the interface reflects all the received packets or the packets
matching the configured traffic attributes. Looping back packets adversely affects
services, and traffic that matches the configured traffic attributes may be discarded.
NOTE
Available traffic attributes are source MAC address, destination MAC address, source IP address,
destination IP address, source UDP port number, destination UDP port number, and VLAN ID.
l In a one-way test, the forward and reverse paths are different. Therefore, measurement
errors occur if configurations, including the link transmission rate, bandwidth profile,
quality of service (QoS), and route parameters, are inconsistent between the paths.
l The results of a burst test may be inaccurate, and the test may even lower the
performance of other services.
Due to the preceding deficiencies, the one-way packet sending mode cannot be used for strict
SLA performance verification. Use this mode only if you do not require high test precision,
testers are insufficient, or onsite test operations are difficult to perform.
Test Procedure
An Ethernet service activation test is conducted to check whether the transmission
performance of Ethernet frames meets the SLA. Figure 2-63 shows the two phases in an
Ethernet service activation test: configuration test and performance test. Before a general flow
test is conducted, the push function must be configured on an initiator, and the initiator must
have a reachable route to the FTP or SFTP server. After the general flow test is complete, the
initiator writes test results into a file and uploads the file to an FTP or SFTP server.
l Configuration test: Each service flow must be generated and tested separately to verify
the correctness of Ethernet service deployment.

l Performance test: Each service flow must be tested based on the configured CIR to
measure service forwarding quality. A performance test takes a longer period of time to
complete than a configuration test.
Figure 2-63 Ethernet service activation test
[Flowchart: set the test parameters and start the test. The configuration test runs first; if it passes, the performance test runs; if that also passes, the test is completed. A failed configuration test or performance test leads to fault locating.]
Configuration tests
Configuration tests include CIR, EIR, and traffic policing tests.
1. CIR test
CIR tests include simple and step CIR tests. Table 2-16 describes CIR test methods and
advantages.

Table 2-16 CIR test methods and advantages
Test Method: Simple CIR test
Description:
1. An initiator periodically sends test flows at the specified CIR.
2. A reflector loops the test flows back.
3. Upon receipt, the initiator calculates performance counters, including the FLR, FTD, FDV, and AVAIL.
4. If the calculated FLR, FTD, FDV, and AVAIL are within the configured SAC range, the test is successful, and the EIR test starts. If the calculated counters are out of the configured SAC range, the test fails, and the follow-up tests will not start.
Advantages: The test speed is fast, and the test process is short.

Test Method: Step CIR test
Description: As shown in Figure 2-60:
1. An initiator periodically sends test flows at 25%, 50%, 75%, and 100% of the configured CIR.
2. A reflector loops the test flows back.
3. The initiator receives test flows looped by the reflector and calculates the IR, FLR, FTD, FDV, and AVAIL at each rate.
4. If the calculated counters are within the configured SAC range, the test at the current rate is successful, and the next test can be conducted. If the calculated counters are out of the configured SAC range, the test fails, and the whole test stops.
5. If all tests are successful, the CIR test is successful, and the EIR test is conducted.
Advantages: A step CIR test provides more accurate network status analysis, but the test time is longer than that of a simple CIR test.
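The step CIR test can be sketched as a loop over the four rate steps with an SAC check at each step. This is only an illustration: the Counters and Sac classes, the step_cir_test function, and the measure callback (which stands in for sending test flows and reading the looped-back results) are hypothetical names, and the SAC is modeled here as simple upper or lower limits, which is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Counters:
    flr: float      # frame loss ratio, 0..1
    ftd_ms: float   # frame transfer delay
    fdv_ms: float   # frame delay variation
    avail: float    # availability, 0..1


@dataclass
class Sac:
    max_flr: float
    max_ftd_ms: float
    max_fdv_ms: float
    min_avail: float

    def accepts(self, c: Counters) -> bool:
        # Every counter must stay within its configured limit.
        return (c.flr <= self.max_flr and c.ftd_ms <= self.max_ftd_ms
                and c.fdv_ms <= self.max_fdv_ms and c.avail >= self.min_avail)


STEP_FRACTIONS = (0.25, 0.50, 0.75, 1.00)


def step_cir_test(cir_kbps: float, sac: Sac,
                  measure: Callable[[float], Counters]) -> Tuple[bool, List[float]]:
    """Return (passed, rates tested). Stops at the first failing step."""
    tested = []
    for fraction in STEP_FRACTIONS:
        rate = cir_kbps * fraction
        tested.append(rate)
        if not sac.accepts(measure(rate)):
            return False, tested     # the whole test stops on failure
    return True, tested              # the EIR test may follow
```

A simple CIR test corresponds to the same check performed once at 100% of the CIR.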
2. EIR test
EIR tests are conducted in either color-aware or color-blind mode. Table 2-17 describes
EIR test methods and advantages.

Table 2-17 EIR test methods and advantages
Test Method: Color-aware mode
Description:
1. An initiator periodically sends test flows marked green at the specified CIR and test flows marked yellow at the specified EIR.
2. A reflector loops the test flows back.
3. Upon receipt, the initiator calculates performance counters, including the FLR, FTD, FDV, and AVAIL.
4. If the FLR, FTD, FDV, and AVAIL are within the configured SAC range, the EIR test is successful, and a traffic policing test is then conducted. If the counters are out of the configured SAC range, the EIR test fails. The EIR test and the whole test stop.
NOTE
In this mode, only green test flows, not yellow ones, are verified.
Advantages: If the same service implements various applications, application-specific frames can be colored based on performance sensitivities.

Test Method: Color-blind mode
Description:
1. An initiator periodically sends test flows at a rate equal to the sum of the configured CIR and EIR.
2. A reflector loops the test flows back.
3. Upon receipt, the initiator calculates performance counters, including the IR, FLR, FTD, FDV, and AVAIL.
4. If the test results satisfy the following formula, the EIR test is successful, and a traffic policing test is then conducted:
CIR x (1 - FLRSAC) ≤ IR ≤ CIR + EIR
If the counters are out of the configured SAC range, the EIR test fails. The EIR test and the whole test stop.
NOTE
FLRSAC is the configured FLR, and the IR is the calculated IR.
Advantages: A carrier network device processes traffic in compliance with the FIFO rule, regardless of traffic colors.
NOTE
If the EIR is set to 0 kbit/s, the EIR test is not conducted, and a traffic policing test is performed.
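The color-blind pass criterion in Table 2-17 can be checked with a few lines of arithmetic. The function names below are hypothetical, and the sketch covers only the information-rate bound, not the other SAC counters.

```python
def eir_test_required(eir_kbps: float) -> bool:
    """The EIR test is skipped when the EIR is configured as 0 kbit/s."""
    return eir_kbps > 0


def color_blind_eir_passed(ir_kbps: float, cir_kbps: float,
                           eir_kbps: float, flr_sac: float) -> bool:
    """Check CIR x (1 - configured FLR) <= IR <= CIR + EIR."""
    lower = cir_kbps * (1 - flr_sac)
    upper = cir_kbps + eir_kbps
    return lower <= ir_kbps <= upper
```

For example, with a CIR of 10000 kbit/s, an EIR of 5000 kbit/s, and a configured FLR of 1%, a measured IR of 14000 kbit/s falls inside the allowed band, whereas 9800 kbit/s falls below the lower bound of about 9900 kbit/s.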
3. Traffic policing test
Traffic policing tests are conducted in either color-aware or color-blind mode.
Table 2-18 describes traffic policing test methods and advantages.

Table 2-18 Traffic policing test methods and advantages
Test Method: Color-aware mode
Description:
1. An initiator periodically sends test flows marked green at the specified CIR and test flows marked yellow at 125% of the specified EIR. If the EIR is less than 20% of the configured CIR, the initiator sends test flows marked green at 100% of the configured CIR and test flows marked yellow at the sum rate of 25% of the configured CIR and 100% of the configured EIR.
2. A reflector loops the test flows back.
3. Upon receipt, the initiator calculates performance counters, including the FLR, FTD, FDV, and AVAIL.
4. If the FLR, FTD, FDV, and AVAIL are within the configured SAC range, and the test results meet the following formula, the test is successful, and a performance test can be conducted:
IR ≤ CIR + EIR + M
If the counters are out of the configured SAC range, the traffic policing test fails. The traffic policing test and the whole test stop.
NOTE
In this mode, only green test flows, not yellow ones, are verified.
Advantages: Application-specific frames in the same test instance can be colored based on performance sensitivities.

Test Method: Color-blind mode
Description:
1. An initiator periodically sends test flows at the sum rate of the configured CIR and 125% of the configured EIR. If the configured EIR is less than 20% of the configured CIR, the initiator sends test flows at a sum rate of the configured EIR and 125% of the CIR.
2. A reflector loops the test flows back.
3. Upon receipt, the initiator calculates performance counters, including the IR, FLR, FTD, and % CIR.
4. If the test results satisfy the following formula, the test is successful, and a performance test can be conducted:
IR ≤ CIR + EIR + M
If the counters are out of the configured SAC range, the traffic policing test fails. The traffic policing test and the whole test stop.
NOTE
M is a factor used to strike a balance between traffic policing and the test time. M is an experience value and is not defined in Y.1564. It is set to (CIR + EIR) x 1% in this document.
Advantages: A carrier network device processes traffic in compliance with the FIFO rule, regardless of traffic colors.
NOTE
If the traffic policing test for NQA test flows is disabled, a performance test is conducted
immediately.
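The send rates and the pass criterion of the traffic policing test in Table 2-18 can be worked through as follows. This is a sketch of the arithmetic only; the function names and the dictionary keys are hypothetical, and M is taken as 1% of (CIR + EIR) as stated in the table note.

```python
def policing_rates(cir: float, eir: float, color_aware: bool) -> dict:
    """Return the rates at which the initiator sends test flows."""
    small_eir = eir < 0.20 * cir          # EIR below 20% of the CIR
    if color_aware:
        # Green flows at the CIR; yellow flows overshoot the EIR.
        yellow = 0.25 * cir + eir if small_eir else 1.25 * eir
        return {"green": cir, "yellow": yellow}
    # Color-blind mode sends a single aggregate flow.
    total = 1.25 * cir + eir if small_eir else cir + 1.25 * eir
    return {"total": total}


def policing_passed(ir: float, cir: float, eir: float) -> bool:
    """Check IR <= CIR + EIR + M, with M = 1% of (CIR + EIR)."""
    m = 0.01 * (cir + eir)
    return ir <= cir + eir + m
```

With a CIR of 10000 kbit/s and an EIR of 4000 kbit/s, the color-blind send rate is 15000 kbit/s, and the policer is considered effective if the received IR does not exceed 14140 kbit/s.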
Performance test
A device automatically starts a performance test only after the configuration tests, including
the CIR, EIR, and traffic policing tests, are complete.

In a performance test, an initiator simultaneously sends test flows for all service flows at the
specified CIR.
A reflector loops the test flows back. Upon receipt, the initiator calculates performance
counters, including the IR, FLR, FTD, FDV, and AVAIL for each service flow. If the counters
are within the configured SAC range, the performance test for a service flow is successful. If
the counters are out of the configured SAC range, the performance test for the service flow fails.
NOTE
During an Ethernet service activation test, if a master/slave switchover occurs in 1 to 1 mode and
an ISSU upgrade is performed, the test may fail and need to be restarted.
Usage Scenarios
An Ethernet service activation test applies to the following scenarios:
l Layer 2 scenarios: native Ethernet and L2VPN (such as a scenario where virtual leased
line (VLL) or virtual private LAN service (VPLS) services are created)
l Layer 3 scenarios: Native IP and L3VPN
l Virtual gateway scenario
NOTE
The L2VPN and L3VPN scenarios support only the inward mode.
Benefits
Ethernet service activation tests provide accurate test results that can reliably reflect the
performance of networks that carriers lease to customers. Therefore, Ethernet service
activation tests help carriers verify that network quality meets requirements before service
provisioning on networks when high-precision tests are not required, test devices are
insufficient, or onsite operations are difficult to perform.

NQA Network Quality Analysis
QoS Quality of Service
LSP Label Switched Path
FTP File Transfer Protocol
PWE3 Pseudo Wire Emulation Edge-to-Edge
MPLS Multiprotocol Label Switching
UDP User Datagram Protocol
TCP Transmission Control Protocol
MA Maintenance Association

MD Maintenance Domain
MEP Maintenance Association End Point
MP Maintenance Point
CCM Continuity Check Message
LBM Loopback Message
LBR Loopback Reply
VPLS Virtual Private LAN Service
2.6 Ping and Tracert
2.6.1 Introduction to Ping and Tracert
Definition
The ping command is a very common debugging tool for testing the accessibility of devices.
It uses a series of Internet Control Message Protocol (ICMP) Echo messages to determine:
l Whether the remote device is available

l Round-trip delay in communicating with the remote host
l Packet loss
The tracert command is used to discover the gateways that packets actually pass through
when traveling from the source host to the destination host.
Purpose
When a device is faulty, you can use the ping and tracert commands to check network
connectivity.
The ping command is used to test the network connectivity and the host accessibility. The
source host first sends an ICMP request message to the destination host, and then waits for an
ICMP reply message.
The tracert command is used to check the network connectivity and locate network faults.
2.6.2 Principles
2.6.2.1 Working of Ping

The ping command sends an ICMP Echo Request message to an address and then waits for a
reply. The ping is successful only if the Echo Request reaches the destination and the
destination returns an Echo Reply message to the source within a predetermined
time called the timeout. If the source does not receive the Echo Reply message within the
timeout, the source reports that the request timed out.
The ping command sets the identifier field in the ICMP message to the process ID of the
sending process. This allows replies to be matched to the correct ping process when multiple
ping processes are running on the local end simultaneously.
The ping command labels each ICMP Echo Request message with a sequence ID that starts
from 1 and is increased by 1. The number of ICMP Echo Request messages to be sent varies
with different systems. The default number is 5. The number of ICMP Echo Request
messages can also be set through commands. If the destination is reachable, the source can
receive five ICMP Echo Reply messages from the destination, with sequence numbers
corresponding to those of ICMP Echo Request messages.
If the TTL field is reduced to 0 during the message forwarding, the device that the message
reaches sends an ICMP timeout message to the source host, indicating that the destination
host is unreachable.
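The bookkeeping behind a ping result can be sketched without sending real packets: replies are matched to requests by sequence number, late replies count as lost, and the round-trip times are summarized. The function name summarize_ping, the dictionary layout, and the 2-second default timeout are assumptions made for this illustration.

```python
from statistics import mean


def summarize_ping(sent_at: dict, replies: dict, timeout_s: float = 2.0) -> dict:
    """sent_at maps sequence number -> send time (s);
    replies maps sequence number -> receive time (s).
    Assumes at least one request was sent."""
    rtts_ms = []
    for seq, t_sent in sent_at.items():
        t_recv = replies.get(seq)
        # A reply counts only if it arrives within the timeout.
        if t_recv is not None and t_recv - t_sent <= timeout_s:
            rtts_ms.append((t_recv - t_sent) * 1000)
    lost = len(sent_at) - len(rtts_ms)
    return {
        "sent": len(sent_at),
        "received": len(rtts_ms),
        "loss_pct": 100.0 * lost / len(sent_at),
        "rtt_ms": (min(rtts_ms), mean(rtts_ms), max(rtts_ms)) if rtts_ms else None,
    }
```

Calling the function with five requests, of which one receives no reply and one reply arrives after the timeout, reports three received packets and 40% packet loss.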
2.6.2.2 Working of Tracert

The source host first sends three UDP packets with the TTL field set to 1 to a remote device. A
destination port number greater than 32768 is selected randomly for these packets. Because the
TTL value is 1, the UDP packets expire as soon as they reach the first device on the path, and
this device responds with ICMP timeout messages indicating that the UDP packets have expired.
The source then sends another three UDP packets, each with the TTL value set to 2, which causes
the second device to return ICMP timeout messages. This process continues until the UDP packets
reach the destination.
Because these UDP packets try to access an invalid port on the destination host, the
destination returns ICMP port unreachable messages, which indicate that the port is unreachable
and that the tracert process is finished. The purpose is to record the source of each
ICMP timeout message to provide a trace of the path that the packets took to reach the destination.
The maximum TTL value set for the UDP packets can be 30. Each time a reply fails to be
received within the predetermined time, the UDP packets are displayed as expired at the
sending end. If UDP packets with the TTL value set to 30 expire, the destination is
unreachable and the tracert test fails. By default, if no reply to a UDP packet is received
within 5 seconds, a timeout message is displayed. The timeout value ranges from 0 to 65535, in
milliseconds.
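The TTL-stepping logic can be illustrated with a small simulation that needs no raw sockets. The send_probe function below stands in for the network and is entirely hypothetical: it takes a list of hop addresses ending with the destination, where None represents a hop that never answers. The path is assumed to be non-empty.

```python
def send_probe(path, ttl):
    """Simulated network. Returns (reply kind, replying address) or None."""
    hop = path[min(ttl, len(path)) - 1]   # device where the probe ends up
    if hop is None:
        return None                        # silent hop: probe times out
    kind = "port-unreachable" if ttl >= len(path) else "time-exceeded"
    return kind, hop


def tracert(path, max_ttl=30, probes_per_hop=3):
    """Return (trace, reached). trace lists (ttl, address or '*')."""
    trace = []
    for ttl in range(1, max_ttl + 1):
        answers = [send_probe(path, ttl) for _ in range(probes_per_hop)]
        replies = [a for a in answers if a is not None]
        trace.append((ttl, replies[0][1] if replies else "*"))
        # A port unreachable reply means the destination itself answered.
        if any(kind == "port-unreachable" for kind, _ in replies):
            return trace, True
    return trace, False                    # maximum TTL reached
```

Each loop iteration corresponds to one line of tracert output: three probes with the same TTL, followed by either the address of the device that reported the expiry or an asterisk when all probes time out.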
2.6.2.3 LSPV
Label Switched Path Verification (LSPV) is a mechanism that uses the MPLS ping and
traceroute (abbreviated as tracert) to detect LSP errors and locate faulty nodes.
MPLS tunnel technologies support multiple upper-layer protocols and services. Similar to the
IP ping and tracert, the MPLS ping and tracert are used to detect the connectivity of an LSP.
In MPLS, the control plane responsible for establishing LSPs cannot detect data forwarding
failures over LSPs, which makes network maintenance difficult.
The MPLS ping and tracert use MPLS Echo Request messages and MPLS Echo Reply
messages to detect the connectivity of an LSP. Both MPLS Echo Request and MPLS Echo
Reply messages are UDP packets using the well-known UDP port of 3503. The receiver
identifies MPLS Echo Request messages based on the port number. An MPLS Echo Request
message that carries FEC information is sent along the same LSP as common packets with the
same FEC to detect the connectivity of the LSP. MPLS Echo Request messages are transmitted to the
destination by using MPLS, whereas MPLS Echo Reply messages are transmitted to the
source by using IP. To prevent the egress from forwarding the received Echo Request message
to other nodes, the destination address in the IP header of the Echo Request message is set to
127.0.0.1/8 (the local loopback address), and the TTL value contained in the IP header is set
to 1.
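The header values named above can be expressed as a simple classification check. The function name and its parameters are hypothetical; the source states only that the receiver identifies Echo Request messages by the port number, so the additional address and TTL checks are an assumption added to show how the three values fit together.

```python
import ipaddress

MPLS_ECHO_UDP_PORT = 3503
LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def looks_like_mpls_echo_request(udp_dst_port: int, ip_dst: str, ip_ttl: int) -> bool:
    """Assumed check: well-known port, loopback destination, IP TTL of 1."""
    return (udp_dst_port == MPLS_ECHO_UDP_PORT
            and ipaddress.ip_address(ip_dst) in LOOPBACK_NET
            and ip_ttl == 1)
```

Because the destination address is in the loopback range and the IP TTL is 1, a node that removes the MPLS label cannot route the packet onward by IP, which is the effect the text describes.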
The VRP supports ping and tracert for the following link types:
l LDP LSP ping and tracert

– Ping and tracert of multiple LSPs in load-balancing on the first node
l P2P TE tunnel ping and tracert
– Ping and tracert of the hot-standby tunnel
l VLL PW ping and tracert
– PWE3 ping and tracert
– Multi-hop PW ping and tracert
– Inter-AS PW ping (inter-AS PW tracert is not supported)
l VPLS PW ping and tracert
– Martini VPLS ping and tracert
– Single-hop PW ping and tracert
– Inter-AS VPLS ping (inter-AS VPLS tracert is not supported)
l L3VPN LSP ping
P2P LDP LSP Ping and Tracert

A ping or tracert operation can be initiated by the ingress to the egress based on the specified
FEC, mask, and next hop to check the connectivity of the LDP LSP.
NOTE
If there are multiple LDP LSPs, the LDP LSP to be checked is determined by the next hop address.
TE Tunnel Ping and Tracert

A ping or tracert operation can be initiated by the ingress to the egress to check the
connectivity of a TE tunnel.
NOTE
If the TE tunnel is configured with a hot-standby LSP, the connectivity of the hot-standby LSP can be
detected by using a ping or tracert operation.
VLL PW Ping and Tracert

Based on VLL types, the VLL PW ping and tracert operations are classified into the following
types:
l PWE3 VLL PW ping

In a PWE3 VLL network, to implement the VLL PW ping, a PW must be configured
with a PW template enabled with the Virtual Circuit Connectivity Verification (VCCV)
capability. A PWE3 VLL ping is used to check the connectivity of a PW. This ping can
be performed in control word mode, label alert mode, or normal mode.

In a PWE3 VLL ping operation, an Echo Request message is first sent to the peer PE.
After receiving the message, the peer PE extracts and sends FEC information to the
L2VPN module to determine whether the peer PE is the egress. If so, the peer PE returns
an Echo Reply message.
NOTE
l If the reply mode is specified as 4, the label alert function must be enabled for the PW.
l If a multi-hop PW is detected in label alert mode, the PW Switching Point (SPE) sends the Echo
Request message to the L2VPN module. If the L2VPN module determines that the SPE is not the
egress, the SPE will forward the Echo Request message instead of returning an Echo Reply message.
l PWE3 VLL PW tracert
In a PWE3 VLL network, to implement the VLL PW tracert, a PW must be configured
with a PW template enabled with VCCV. A PWE3 VLL tracert can help you obtain
information about SPEs and Ps along the path that the message travels from the source to
the destination, check the connectivity of the PW, and locate the fault of a PW.
A PWE3 VLL tracert can be performed in control word mode, label alert mode, or TTL
mode. The default mode is label alert. The TTL mode and control word mode are
mutually exclusive.
The TTL value of the PW Tracert Request message is increased by 1 each time the
original device sends the request message. Each time the transit node (P node) receives
an Echo Request message with an expired TTL value, it sends the Echo Request message
to the LSPV module. The LSPV module responds with an Echo Reply message
containing information about the next hop of the node that sends the Echo Request
message.
l A PW tracert terminates when either of the following situations occurs:
– The PW Tracert Request message reaches the egress.
– The TTL value of the PW Tracert Request message reaches the upper threshold.
VPLS PW Ping and Tracert

l Martini VPLS PW ping and tracert
Martini VPLS PW ping and tracert can be performed only in label alert mode.
On a Hierarchical Virtual Private LAN Service (HVPLS) network, only single-hop PWs
can be detected.
NOTE
The PW ID is optional. If a PW ID is specified, the PW with the specified PW ID is detected. If no
PW ID is specified, the PW associated with the VSI ID is detected.
In the procedure of the Martini VPLS PW ping and tracert, the following functions are
performed by each type of node:
a. Ingress: The ingress obtains the forwarding token from the L2VPN module based
on the VSI name, peer address, and PW ID. If no PW ID is specified, the first PW is
detected by default. The ingress searches for the TunnelInfo at the control plane
based on the forwarding token, obtains the downstream information based on the
TunnelInfo, and then encapsulates the Request message.
b. Transit node: The transit node searches for the Next Hop Label Forwarding Entry
(NHLFE) and Incoming Label Map (ILM) based on the incoming label and then
obtains the downstream information based on the incoming label and index of the
inbound interface.

c. Egress: The egress delivers the incoming label and FEC TLV to the L2VPN
module. The L2VPN module determines whether the egress is the destination of the
packets. If so, the egress returns a Reply message.
d. The detection result or timeout information is displayed.
L3VPN LSP Ping

An L3VPN LSP ping can be initiated on a PE to a peer PE to detect the connectivity of the
LSPs established by using BGP.
The L3VPN LSP ping supports the following tunnels:
l LDP LSP
l TE tunnel
Specifying the First Node in the Ping Operation

When a device is faulty, the ping and tracert operations together can be performed to check
network connectivity. The ping operation is mainly used to test the connectivity of the
network and accessibility of the host. In a ping operation, the source sends an ICMP Request
message to the destination and the destination then returns an ICMP Response message to the
source.
On an MPLS network, if a fault occurs on the MPLS network and the control plane fails to
detect the fault, the source cannot successfully ping the destination. To identify whether the
fault occurs on the MPLS network or the IP network, you can specify the first node in the
ping operation. Subsequent ping packets will be forwarded based on IP, which can help you
quickly locate the fault.
2.6.2.4 CE Ping
CE ping is a tool used by a PE on the L2VPN network to identify whether the IP address of the
CE is online (reachable) by initiating an ARP request.
As shown in Figure 2-64, a ping operation is performed on PE1 to check whether CE1 is
reachable, and an ARP message requesting the MAC address corresponding to the IP address
of CE1 is sent from the AC interface on PE1. CE1 responds to the request with an ARP reply
message when the requested IP address is its own IP address. Upon receiving the ARP
reply message, PE1 displays that the IP address is reachable.
Figure 2-64 Networking diagram of configuring CE ping to detect the connectivity between the PE and CE on a VLL network
[Figure: CE1, PE1, PE2, and CE2 connected in sequence; the CE ping is performed between PE1 and CE1.]


Abbreviations
Abbreviations Full Spelling
MPLS Multiprotocol Label Switching
LDP Label Distribution Protocol
TE Traffic Engineering
VLL Virtual Leased Line
L3VPN Layer 3 Virtual Private Network
HVPLS Hierarchical Virtual Private LAN Service
CE Customer Edge
PE Provider Edge
UPE Underlayer PE
SPE Superstratum PE
PW Pseudo Wire
2.7 Fault Management
2.7.1 Introduction
Definition
Fault management (FM) dynamically manages and reports alarms generated on
devices in a centralized manner.
Purpose
With the rapid growth in network scales and complexity, more and more network
configurations and applied features are required. When a module on a device is faulty, a great
number of alarms may be generated on one or multiple devices. The alarms, however, may be
lost during sending to the network management device because of limited capability of
handling alarms on the devices or the network management system (NMS). As a result,
certain needed alarms cannot be displayed, which inconveniences network management.
FM introduces alarm classification and an alarm buffer.
l Alarm classification: Alarms can be classified into levels. (Default alarm classification is
enabled in the system, and you can modify alarm classification.) You can use alarm
classification to display the alarms of concern and suppress the display of alarms that
are not needed.
l Alarm buffer: Alarms or events of specified types can be saved on the device. (Default
types are set in the system, and you can modify alarm types.) The alarms saved on the
device can be displayed on the NMS through MIB interfaces. In addition, the device
provides the active alarm function, with which the NMS synchronizes the currently
active alarms in real time.
2.7.2 Principles
2.7.2.1 Fault Management

Fault management includes the following functions:
l A device reports alarms according to different alarm levels.
l The NMS obtains the time when the alarm is reported.
l The NMS obtains lost events or alarms.
l The NMS synchronizes current active alarms in real time.
l A device suppresses alarms.
Reporting Alarms According to Different Alarm Levels

The system sets an initial level and an initial type for each alarm, and the level can be
modified by a user.
Table 2-19 Mappings between alarm levels and severity levels
Alarm Level | Severity Level | Description
1 | Critical | The Critical severity level indicates that a service affecting condition has occurred and an immediate corrective action is required. Such a severity can be reported, for example, when a managed object becomes totally out of service and its capability must be restored.
2 | Major | The Major severity level indicates that a service affecting condition has developed and an urgent corrective action is required. Such a severity can be reported, for example, when there is a severe degradation in the capability of the managed object and its full capability must be restored.
3 | Minor | The Minor severity level indicates the existence of a non-service affecting fault condition and that corrective action should be taken in order to prevent a more serious (for example, service affecting) fault. Such a severity can be reported, for example, when the detected alarm condition is not currently degrading the capacity of the managed object.
4 | Warning | The Warning severity level indicates the detection of a potential or impending service affecting fault, before any significant effects have been felt. Action should be taken to further diagnose (if necessary) and correct the problem in order to prevent it from becoming a more serious service affecting fault.
Alarms can be classified into the following three types:

l Alarm
l Resume-alarm
l Event
A user can configure the level and type of an alarm.
l If the user focuses on certain types of alarms, he or she can set these types of alarms to
be of the highest level and configure filtering conditions. In this manner, the system
reports only these types of alarms to the NMS.
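Level-based reporting can be sketched as a filter over alarms with optional user overrides. The Severity enumeration mirrors Table 2-19; the function alarms_to_report, the level_overrides mapping, and the alarm names in the usage example are hypothetical and do not correspond to any device command or MIB object.

```python
from enum import IntEnum


class Severity(IntEnum):
    CRITICAL = 1
    MAJOR = 2
    MINOR = 3
    WARNING = 4


def alarms_to_report(alarms, max_level, level_overrides=None):
    """alarms: iterable of (name, default level).
    Reports alarms whose effective level is at or above max_level in
    severity (a lower number means a more severe alarm)."""
    overrides = level_overrides or {}
    reported = []
    for name, default_level in alarms:
        level = overrides.get(name, default_level)   # user-modified level wins
        if level <= max_level:
            reported.append((name, Severity(level)))
    return reported
```

Raising an alarm of interest to the highest level and filtering on that level, as the text suggests, is represented here by adding the alarm name to level_overrides with Severity.CRITICAL.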
Obtaining the Time When the Alarm Is Reported

According to the Simple Network Management Protocol (SNMP), an alarm reported to the NMS
carries its generation time. This generation time is a relative time, measured from
when the system is started to when the alarm is generated, so the user cannot view the UTC
time of alarm generation. To address this, the device can provide the NMS with the time at
which the reported alarm was generated.
Before sending an alarm, the system determines whether the alarm is destined for Huawei's
NMS. If so, a parameter DateAndTime is added to the alarm binding table to store the alarm
generation time. The NMS then obtains the alarm generation time by parsing this parameter.
NOTE
This function is applicable only to Huawei's NMS. For a third-party network management
system, the alarm binding table does not contain the parameter DateAndTime. Therefore, this function is
not supported.
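If the DateAndTime parameter follows the standard SNMPv2-TC textual convention of the same name (an assumption, since the source does not state the encoding), an NMS-side decoder could look like the sketch below. That convention encodes the value in 8 octets (year, month, day, hour, minutes, seconds, deci-seconds) or 11 octets (with a UTC direction sign, hours, and minutes appended). The function name is hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone


def decode_date_and_time(octets: bytes) -> datetime:
    """Decode an SNMPv2-TC DateAndTime octet string (8 or 11 octets)."""
    if len(octets) not in (8, 11):
        raise ValueError("DateAndTime must be 8 or 11 octets")
    # Year is a 2-octet big-endian integer; the other fields are 1 octet each.
    year, month, day, hour, minute, second, deci = struct.unpack(">H6B", octets[:8])
    tz = None
    if len(octets) == 11:
        sign = -1 if octets[8:9] == b"-" else 1
        tz = timezone(sign * timedelta(hours=octets[9], minutes=octets[10]))
    return datetime(year, month, day, hour, minute, second,
                    deci * 100_000, tzinfo=tz)
```

When the 11-octet form is used, the result is timezone-aware, so calling astimezone(timezone.utc) on it yields the UTC time of alarm generation.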
Obtaining Lost Events or Alarms

The alarms of the three types, namely, alarm, resume-alarm, and event, can be stored on a
device. The alarms of the two types, namely, alarm and resume-alarm, are stored in the alarm
queue; whereas the alarms of the type, event, are stored in the event queue.
After an alarm is sent to the FM module, the system determines whether the alarm needs to be
saved. If it needs to be saved, the system generates a copy of the alarm binding table and stores
the copy in the alarm queue or the event queue according to the alarm type. The system also
provides MIB interfaces, through which users can obtain alarms from a device.

NOTE
The alarm binding table obtained by the NMS is encoded in type-length-value (TLV) format;
the alarms obtained by the NMS through the MIB are encoded messages.
The binding table stored on the device uses a Huawei-proprietary data structure. Therefore, a
third-party NMS may not be able to correctly parse the alarms.
Synchronizing Current Active Alarms in Real Time

When a user receives a resume-alarm, the system matches the resume-alarm with the alarm
in the active alarm queue. If the matching is successful, the system deletes the alarm from the
active alarm queue and transfers the alarm and resume-alarm to the historical alarm queue.
Regardless of whether the alarm is matched, the system does not add the resume-alarm to the
active alarm queue.
The procedure for alarm matching is as follows:
1. A user receives an alarm and adds it to the active alarm queue.
2. The user receives a resume-alarm. The system then obtains the matching information and,
from it, the alarm to be matched and the related matching rules.
3. The system searches for the name of the alarm to be matched. If the alarm exists, the system
selects the matching rules related to this alarm. If the matching is successful, the system
deletes this alarm and refreshes the alarms in the active alarm queue.
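The matching procedure above can be sketched as follows. This is an illustration only: the class AlarmStore, its method names, and the use of a (name, source) tuple as the match key are assumptions, since the document does not describe the actual matching rules.

```python
class AlarmStore:
    """Toy model of the active and historical alarm queues (hypothetical names)."""

    def __init__(self):
        self.active = {}    # match key -> alarm record
        self.history = []   # matched (alarm, resume-alarm) pairs

    def on_alarm(self, key, alarm):
        """Step 1: add a received alarm to the active alarm queue."""
        self.active[key] = alarm

    def on_resume_alarm(self, key, resume_alarm):
        """Steps 2 and 3: look up the alarm to be matched and clear it.

        The resume-alarm itself is never added to the active queue,
        whether or not a matching alarm is found.
        """
        alarm = self.active.pop(key, None)
        if alarm is None:
            return False
        self.history.append((alarm, resume_alarm))
        return True
```

A dictionary keyed by the match key keeps the lookup in step 3 simple; a real implementation would apply the per-alarm matching rules instead of an exact key comparison.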
Terms
None.
Abbreviation
Abbreviation Full Spelling
FM Fault Management
2.8 Performance Management
2.8.1 Introduction
Definition
Performance management (PM) monitors and collects performance indexes in the system, such
as the CPU usage and data about received and sent packets. It also periodically collects data
about each performance index, provides current and historical performance statistics for user
query, and reports alarms based on user-defined performance thresholds.

Purpose
As the telecommunication industry develops, users place higher operation and maintenance
requirements on devices. PM is a key feature for improving the operation and maintenance
capability of devices. It provides current and historical statistics on all performance indexes,
which are used to judge the running status of the system, analyze system errors, and serve as a
basis for system configuration.
Performance trends can be analyzed based on performance data. For example, you can analyze
the growth trend and rate of network traffic over a month or more based on the daily peak and
bottom values of traffic.
Analysis of various performance data also provides a basis for optimizing network
configurations and expanding network capacity.
2.8.2 Principles
Performance management includes the threshold monitoring function and statistics function.
2.8.2.1 Statistics
To allow devices to periodically collect performance data, you can configure the statistics
function.
The statistics function contains different statistics tasks. Each statistics task is configured with
only one statistics instance type and one data collection period. You can set a data collection
period (5, 10, 15, 30, 60, or 1440 minutes), set a statistics instance and related indexes, and
set the interval at which the system generates statistics files (the value ranges from 1 to 16).
When a statistics task is running, the system collects the parameters of the monitored instance
and indexes in each period and calculates the collected values at the end of the period. The
system saves statistics data to a file after each file generation period (collection period ×
interval at which statistics files are generated).
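The file generation period is a simple product, as the following sketch shows. The function name and constant are hypothetical; the allowed values are the ones listed above.

```python
# Collection periods (in minutes) listed in the text.
ALLOWED_COLLECTION_PERIODS_MIN = (5, 10, 15, 30, 60, 1440)


def statistics_file_period_minutes(collection_period_min: int, file_interval: int) -> int:
    """Return how often a statistics file is generated, in minutes."""
    if collection_period_min not in ALLOWED_COLLECTION_PERIODS_MIN:
        raise ValueError(f"unsupported collection period: {collection_period_min}")
    if not 1 <= file_interval <= 16:
        raise ValueError("file generation interval must be in the range 1 to 16")
    return collection_period_min * file_interval
```

For example, a 15-minute collection period with an interval of 4 yields one statistics file every 60 minutes.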
To allow the system to send alarms when the system's performance data exceeds the
corresponding threshold, you can configure the threshold monitoring function. This function
monitors the system periodically. Within a certain interval, the system compares the indicator
value of a collected instance with the threshold value of the monitoring rules. If the indicator
value exceeds the range of the threshold value, an alarm is triggered. After an alarm is
triggered, the system keeps monitoring the data until the data falls within the specified range.
The statistics function can also suppress the alarm signals to prevent repeated sending of an alarm.
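The threshold check and alarm suppression described above can be sketched as follows. This is an assumption-laden illustration: the class name ThresholdMonitor, the low/high bounds, and the "cleared" notification are hypothetical and are not taken from the device implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdMonitor:
    """Compares each collected value with a threshold range (hypothetical model)."""

    low: float
    high: float
    alarm_active: bool = False

    def check(self, value: float) -> Optional[str]:
        out_of_range = value < self.low or value > self.high
        if out_of_range and not self.alarm_active:
            # First violation: trigger the alarm once.
            self.alarm_active = True
            return "alarm"
        if not out_of_range and self.alarm_active:
            # Value is back in range: stop monitoring the violation.
            self.alarm_active = False
            return "cleared"
        # Either still violating (suppressed) or still normal.
        return None
```

Tracking a single alarm_active flag is what prevents the same alarm from being sent again on every collection period while the value stays out of range.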
You can query statistics files and current or historical performance statistics using the NMS or
related commands, and clear the current performance statistics.
The system can upload statistics files to the performance management server for network
management. The uploading mode is passive: the system uploads the periodically generated
files only when instructed to do so by the NMS.
Term
None.


Acronyms and Abbreviations Full Name
PM Performance Management
2.9 PoE Features

NOTE
Among the ATN 910I series, only the ATN 910I-P supports the PoE function.
NOTE
Among the ATN 905 series, only the ATN 905A-P, ATN 905A-C, and ATN 905A-D support the PoE function.
2.9.1 Overview
Definition
Power over Ethernet (PoE) refers to supplying power over an Ethernet network. It is also called
Power over LAN (PoL) or active Ethernet.
Figure 2-65 Typical PoE system
[Figure: a PSE on the IPRAN, backed by a UPS, connected to PDs (Micro/Pico, camera, IP phone).]
A PoE system consists of the following:

l Power sourcing equipment (PSE): Refers to a PoE device that feeds power to a powered
device (PD) through an Ethernet. A PSE provides functions such as detection, analysis,
and intelligent power management.
l PD: Refers to a device powered by a PSE, that is, client equipment of a PoE system.
Usually, a PD can be an IP phone, network security camera, AP, PDA, or mobile phone
charger.
l PoE power supply: Supplies power to the whole PoE system. The power of the PoE
power supply determines the number of PDs connected to a PSE.

Purpose
As IP phones, network video monitoring, and wireless Ethernet networks are widely applied,
the demand for power supply over Ethernet has become urgent. In most situations, access
point devices need DC power supply, but they are often installed outdoors or on ceilings far
from the ground, where a suitable power socket is difficult to find nearby. Even if a suitable
power socket is available, the network administrator finds it hard to install the AC/DC
converter required by access point devices. On many large-scale LANs, administrators need to
manage multiple access point devices that require uniform power supply and management,
which makes power supply management difficult. The PoE function addresses this problem.
PoE technology is used on wired Ethernet and is most widely used on LANs. The PoE function
transmits power together with data to terminals over cables or transmits power without data
over idle lines. This technology provides power on 10Base-T, 100Base-TX, or 1000Base-T
Ethernet at a distance of up to 100 m. PoE can be used to effectively provide centralized power
for terminals such as IP phones, access points (APs), chargers of portable devices, POS
machines, cameras, and data collection devices. Terminals are provided with power when they
access the network. Therefore, indoor cabling of power supply is not required.
Benefits
l Power is easily and conveniently accessed, and the costs of power cables and cable routing
are saved.
l Uninterruptible power supplies (UPSs) can also be used to provide redundant power
supply to IP cameras, video servers, and IP phones, in order to prevent the devices from
being powered off.
2.9.2 Principle Description
2.9.2.1 Power Supply Procedure
PoE Power Supply Procedure

The PoE function is independent of the link status at the service layer. A PSE powers a PD
only after the PSE's interface is connected to the PD's interface.
Table 2-20 describes the PoE power supply procedure.
Table 2-20 PoE power supply procedure
Step | Item | Description
1 | Detecting the PD | On the PSE, the port where PoE is enabled initially outputs a low voltage until the PSE detects a PD (connected to the line terminal) that supports IEEE 802.3af or IEEE 802.3at.
2 | Negotiating the power supply capability with the PD | The PSE classifies the PD and negotiates the power supply capability with the PD by analyzing the detected feature resistance.
3 | Starting supplying power to the PD | During startup (a period usually shorter than 15 µs), the PSE raises the voltage supplied to the PD from a low voltage to a relatively steady voltage.
4 | Normally supplying power to the PD | After the voltage reaches a relatively steady level, the PSE starts to supply power to the PD.
5 | Powering off the PD | While supplying power to the PD, the PSE continuously detects the input current of the PD. When the input current of the PD is lower than the limit or increases sharply, the PSE powers off the PD and starts detecting the PD again. The input current is lower than the limit when the PD is removed. The input current increases sharply when the PD is disconnected from the PSE, the PD power is overloaded, the PD is short-circuited, or the PD power exceeds the power supply capability of the PSE.
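The five steps in Table 2-20 can be read as a small state machine. The following sketch is a simplified model under stated assumptions: the state names and the boolean inputs (pd_detected, negotiated, voltage_steady, current_fault) are hypothetical and stand in for the electrical measurements a real PSE performs.

```python
from enum import Enum


class PseState(Enum):
    DETECTING = 1     # step 1: low-voltage probing for a PD
    NEGOTIATING = 2   # step 2: classification and capability negotiation
    STARTING = 3      # step 3: voltage ramps up
    SUPPLYING = 4     # step 4: steady power supply


def next_state(state, *, pd_detected=False, negotiated=False,
               voltage_steady=False, current_fault=False):
    """Return the next PSE port state for one evaluation cycle."""
    if state is PseState.DETECTING:
        return PseState.NEGOTIATING if pd_detected else PseState.DETECTING
    if state is PseState.NEGOTIATING:
        return PseState.STARTING if negotiated else PseState.NEGOTIATING
    if state is PseState.STARTING:
        return PseState.SUPPLYING if voltage_steady else PseState.STARTING
    if state is PseState.SUPPLYING:
        # Step 5: a current below the limit or a sharp increase powers
        # off the PD and restarts detection.
        return PseState.DETECTING if current_fault else PseState.SUPPLYING
    raise ValueError(f"unknown state: {state}")
```

Step 5 has no state of its own here because powering off returns the port directly to detection, which matches the "starts detecting the PD again" wording in the table.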
2.9.2.2 Power Supply Modes
As defined in the IEEE standards, PSEs provide power for PDs and are classified into Midspan
(the PoE module is installed outside the device) and Endpoint (the PoE module is integrated
into the device) PSEs. The Endpoint PSE is compatible with 10Base-T, 100Base-TX, and
1000Base-T interfaces. The Endpoint PSE is more widely used than the Midspan PSE.
ATNs support only Endpoint PSEs.
Endpoint Networking Overview
Figure 2-66 Endpoint networking diagram
[Figure: the ATN, acting as the power sourcing equipment (PSE), connected to a PD.]
Endpoint PSEs can work in Alternative A (line pairs 1/2 and 3/6) or Alternative B (line pairs
4/5 and 7/8) power supply mode, depending on which copper line pairs carry power.
l Alternative A mode: Power is transmitted over pairs of lines that transmit data.
The PSE supplies power to PDs using twisted pairs 1/2 and 3/6. The DC power and data
frequency do not interfere with each other. Twisted pair 1/2 forms the positive (negative)
pole while twisted pair 3/6 forms the negative (positive) pole.
10BASE-T and 100BASE-TX interfaces use twisted pairs 1/2 and 3/6 to transmit data,
while the 1000BASE-T interface uses four twisted pairs to transmit data.
Figure 2-67 10BASE-T and 100BASE-TX interfaces using the alternative A power
supply mode
[Figure: PSE and PD connected pin to pin; pins 1/2 and 3/6 are labeled as data pairs, and pins 4/5 and 7/8 are unlabeled.]
Figure 2-68 1000BASE-T using the alternative A power supply mode
[Figure: PSE and PD connected pin to pin; all four pairs (1/2, 3/6, 4/5, and 7/8) are labeled as data pairs.]
l Alternative B mode: Power is transmitted over idle pairs of lines.

The PSE supplies power to PDs using twisted pairs 4/5 and 7/8. Twisted pair 4/5 forms
the positive pole while twisted pair 7/8 forms the negative pole.

Figure 2-69 10BASE-T and 100BASE-TX interfaces using the alternative B power
supply mode
[Figure: PSE and PD connected pin to pin; pins 1/2 and 3/6 are labeled as data pairs, and pins 4/5 and 7/8 are unlabeled.]
Figure 2-70 1000BASE-T using the alternative B power supply mode
[Figure: PSE and PD connected pin to pin; all four pairs (1/2, 3/6, 4/5, and 7/8) are labeled as data pairs.]
Generally, a standard PD supports both modes, whereas the PSE needs to support only one
mode. ATNs support only Alternative A.
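The relationship between power supply modes and data pairs can be summarized in a small lookup. The pin pairs come from the text above; the dictionary and function names are hypothetical.

```python
# Pin pairs that carry power in each mode.
POWER_PAIRS = {
    "A": ((1, 2), (3, 6)),
    "B": ((4, 5), (7, 8)),
}

# Pin pairs that carry data on each interface type.
DATA_PAIRS = {
    "10BASE-T": ((1, 2), (3, 6)),
    "100BASE-TX": ((1, 2), (3, 6)),
    "1000BASE-T": ((1, 2), (3, 6), (4, 5), (7, 8)),
}


def power_shares_data_pairs(mode: str, interface: str) -> bool:
    """Return True if the mode's power pairs also carry data on the interface."""
    return all(pair in DATA_PAIRS[interface] for pair in POWER_PAIRS[mode])
```

Note that on 1000BASE-T every pair carries data, so Alternative B also shares its pairs with data there; pairs 4/5 and 7/8 are idle only on 10BASE-T and 100BASE-TX.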
2.9.3 Applications
2.9.3.1 Typical Applications
Usually, terminal equipment (such as an IP phone, AP, or data collector) requires DC power
supply. However, such devices are usually installed in corridors or on ceilings, where suitable
power sockets are unavailable. On many large-scale local area networks (LANs), the
administrator needs to manage devices at multiple access positions simultaneously. These
devices require unified power supply, making power supply management difficult.

As shown in Figure 2-71, after power over Ethernet (PoE) is deployed, the PSE directly
supplies power to access devices (such as IP phones, APs, and other wireless LAN access
devices). This eliminates the need for external power supplies, decreases cable connections,
reduces costs, and simplifies management.
Figure 2-71 Typical PoE power supply system
[Figure: a PSE on the IPRAN, backed by a UPS, connected to PDs (Micro/Pico, camera, IP phone).]

Acronyms and Abbreviations Full Name
PoE Power over Ethernet
PSE Power sourcing equipment
PD Powered device
PoL Power over LAN
2.10 TWAMP
NOTE
Among ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports the TWAMP
function.

2.10.1 Introduction
Definition
The Two-Way Active Measurement Protocol (TWAMP) is a technology that measures the
round-trip performance of an IP network.
TWAMP uses the client/server mode.

l The client establishes, starts, and stops a TWAMP session, and generates and maintains
statistics about the IP network performance.
l The server responds to the client's request for establishing, starting, or stopping a
TWAMP session.
Purpose
As networks develop rapidly and applications become widespread, various services are
deployed to meet requirements in different scenarios. As a result, networks face increasingly
high requirements for statistics collection, and a tool that rapidly provides statistics about the
IP network performance is urgently needed.
Traditionally, network elements (NEs) themselves generate and maintain statistics about the
IP network performance. To display statistics about the performance of the entire network, a
network management system (NMS) is required to manage multiple NEs and collect statistics
about these NEs. However, there may be no NMS deployed or the NMS may be incapable of
collecting statistics.
TWAMP is therefore introduced. NEs themselves no longer need to generate or maintain
statistics about the IP network performance. The performance management system manages
only the TWAMP client and easily obtains statistics about the entire network.
TWAMP has the following advantages over the traditional tools that collect statistics about IP
network performance:
l Unlike network quality analysis (NQA), TWAMP has a unified test model and packet
format, facilitating deployment.
l IP Flow Performance Measurement (IP FPM) requires end-to-end devices to be
clock-synchronized when implementing statistical analysis, whereas TWAMP does not,
which makes TWAMP more widely applicable and easier to deploy.
Therefore, TWAMP applies to scenarios in which statistics about the IP network
performance must be obtained rapidly but need not be highly accurate.
Benefits
TWAMP brings the following benefits to carriers:
l TWAMP enables carriers to rapidly and flexibly obtain statistics about the performance
of the entire network when the NMS is incapable of collecting such statistics.
l TWAMP can be configured to collect statistics when the IP network does not support
clock synchronization.
2.10.2 Principles

2.10.2.1 TWAMP Implementation Principles
Implementation
The Two-Way Active Measurement Protocol (TWAMP) defines a method for measuring
round-trip IP network performance between two TWAMP-capable devices. Figure 2-72
shows how TWAMP is implemented.
l The performance management system instructs the control-client to establish a test
session with a specific TWAMP server.
l The control-client establishes and completes the test session.
l The performance management system collects and saves the statistics generated during the test.
Figure 2-72 TWAMP implementation
[Figure: the performance management system, the control-client, and the server across the IP network, with steps ①, ②, and ③ marked.]
TWAMP collects statistics about the delay, jitter, and packet loss rate.
l The delay and jitter are calculated based on timestamps. The session-sender sends a
probe carrying a sending timestamp T0, and the reflector replies with a response probe
carrying a receiving timestamp T1 and a responding timestamp T2. After receiving the
response probe, the session-sender records the receiving timestamp T3. The delay and
jitter during a single period are calculated based on the four timestamps.
l The packet loss rate is calculated based on the serial numbers (starting from 0) carried in
probes. The session-sender sends a probe with a serial number, and the reflector replies
with a response probe with the same serial number. Each time the session-sender sends a
probe or the reflector replies with a response probe, the serial number increases by 1.
The packet loss rate is calculated based on the two rows of serial numbers.
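The two calculations above can be sketched as follows. The delay formula is an assumption consistent with the description: the time the reflector spends between receiving the probe (T1) and responding (T2) is subtracted from the round trip observed by the session-sender. The function names are hypothetical.

```python
def round_trip_delay(t0, t1, t2, t3):
    """Two-way delay from the four timestamps of one probe.

    t0: probe sent by the session-sender
    t1: probe received by the reflector
    t2: response sent by the reflector
    t3: response received by the session-sender
    """
    return (t3 - t0) - (t2 - t1)


def packet_loss_rate(sent_seq, received_seq):
    """Fraction of sent sequence numbers with no matching response."""
    sent = set(sent_seq)
    if not sent:
        return 0.0
    lost = sent - set(received_seq)
    return len(lost) / len(sent)
```

Because T1 and T2 are both taken on the reflector and T0 and T3 are both taken on the session-sender, only differences on the same clock are used, so the two devices do not need synchronized clocks for this calculation.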
Intercommunication Model
TWAMP uses the client/server mode and defines four logical entities, as shown in Figure
2-73.
l Control-client: establishes, starts, and stops a test session and collects statistics.
l Session-sender: proactively sends probes for performance statistics after being notified
by the control-client.
l Server: responds to the control-client's request for establishing, starting, or stopping a
test session.

l Session-reflector: replies to the probes sent by the session-sender with response probes
after being notified by the server.
Figure 2-73 Typical TWAMP logical architecture
[Figure: the session-sender and session-reflector exchange TWAMP-Test packets; the control-client and server exchange TWAMP-Control messages.]
To facilitate implementation, TWAMP unifies the four logical entities, as shown in Figure
2-74. Control signals are exchanged between the control-client and server over a TCP
connection; probes are exchanged between the session-sender and session-reflector over
UDP. The control-client and server establish and start a test session. Once a test
session starts, the control-client and server notify the session-sender and session-reflector
respectively of the session information and allow the session-sender to send probes and the
session-reflector to respond to the probes.
Figure 2-74 Typical TWAMP implementation architecture
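[Figure: the control-client and session-sender on one side, the server and session-reflector on the other; TWAMP-Control runs between the control-client and server, and TWAMP-Test runs between the session-sender and session-reflector.]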
NOTE
On a live network, if a network element (NE) functions as a server and session-reflector alone, the NE
participates in TWAMP session establishment and probe exchanges but does not compile statistics. If a device
or tester functions as the control-client and session-sender, the device or tester proactively establishes a
TWAMP session for statistics collection. Users manage the control-client alone to rapidly obtain statistics
about the performance of the entire IP network.
2.10.2.2 TWAMP Implementation Process

A Two-Way Active Measurement Protocol (TWAMP) test covers the establishment of a
control session, the establishment and start of a test session, and the stop of the test session.
Establishment of a Control Session

A control session provides a basis for the establishment of a test session. Figure 2-75 shows
how a control session is established.
1. The server specifies a TCP port number (the default port number is 862), and the
control-client initiates a TCP connection.
2. The server replies with a Server-Greeting message to notify the control-client of the
server configurations.
3. After receiving the Server-Greeting message, the control-client sends a Set-Up-Response
message to the server to establish a control session.
4. The server verifies the Set-Up-Response message and replies with a Server-Start
message.
A control session is established between the control-client and the server.
Figure 2-75 Establishment of a control session

[Figure: message sequence between the control-client/session-sender and the server/session-reflector: Open TCP connection (port 862), Server-Greeting, Set-Up-Response, Server-Start.]
Establishment of a Test Session

After a control session is established, you can specify an IP address for the client and initiate a
test session on a UDP port, as shown in Figure 2-76.
1. The control-client sends a Request-TW-Session message carrying an IP address and a
UDP port number to the server through the TCP connection.
2. After receiving the Request-TW-Session message, the server establishes a test session
based on the IP address and UDP port number in the Request-TW-Session message and
replies to the control-client with an Accept-Session message.

A test session is established between the control-client and the server.
Figure 2-76 Establishment of a test session
[Figure: message sequence: Request-TW-Session, Accept-Session.]
Start of a Test Session

A test session is started based on a control session. After a test session is established, the
control-client can send a Start-Session message to the server. After receiving the Start-Session
message, the server instructs the control session to start all test sessions that are established
based on the control session. Figure 2-77 shows how a test session is started.
1. The control-client sends a Start-Session message to the server.

2. After receiving the Start-Session message, the server notifies the session-reflector of the
test session information to enable the session-reflector to respond to probes.
3. The session-reflector replies to the control-client with a Start-ACK message to start the
test sessions.
4. After receiving the Start-ACK message, the control-client notifies the session-sender of
the test session information to enable the session-sender to send probes.
Test sessions are started, and the session-reflector starts to respond to probes.
Figure 2-77 Start of a test session
[Figure: message sequence: Start-Session, Start-ACK.]
Stop of a Test Session

After statistics are collected, users can stop a test session. Figure 2-78 shows how a test
session is stopped.
1. The control-client sends a Stop-Session message to instruct the server to stop collecting
statistics.
2. After receiving the Stop-Session message, the server disables the session-reflector from
responding to probes in a test session.
The test session is stopped.
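The control messages described in this section follow a fixed order, which the sketch below captures. The message names are the ones used in this document; the sender labels, the list EXPECTED_SEQUENCE, and the function follows_documented_order are hypothetical, and the server and session-reflector are treated as one side.

```python
# (sender side, message) in the order described in this section.
EXPECTED_SEQUENCE = [
    ("server", "Server-Greeting"),
    ("control-client", "Set-Up-Response"),
    ("server", "Server-Start"),
    ("control-client", "Request-TW-Session"),
    ("server", "Accept-Session"),
    ("control-client", "Start-Session"),
    ("server", "Start-ACK"),
    ("control-client", "Stop-Session"),
]


def follows_documented_order(messages):
    """Return True if the observed messages are a prefix of the documented order."""
    names = [name for _, name in EXPECTED_SEQUENCE]
    observed = list(messages)
    return observed == names[:len(observed)]
```

A check like this could be applied to a captured control session to spot a skipped or reordered step, for example a Server-Start that arrives before any Set-Up-Response.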
Figure 2-78 Stop of a test session
[Figure: message sequence: Stop-Session.]
2.10.3 Applications
2.10.3.1 TWAMP Applications on an IP Network
As shown in Figure 2-79, ATN A, Router B, and Router C on an IP network function as the
servers in a TWAMP test. Router E functions as the control-client and specifies an IP address
to start collecting statistics. Router E sends statistics to the performance management system.
Users can compare statistics to measure the performance of network segments. For example,
to measure the performance of the IP network between ATN A and Router B, the
control-client initiates a TWAMP test for ATN A and Router B each. Users can check the
statistics about the performance of the IP network between ATN A and Router B by comparing
the two sets of statistics.
Figure 2-79 TWAMP applications on an IP network

[Figure: Router E (control-client) exchanges TWAMP statistics packets with ATN A, Router B, and Router C across the IP network and sends TWAMP statistics data to the performance management system.]
2.10.3.2 TWAMP Applications on an L3VPN

Statistics about the performance of different network segments can be collected based on the
control-client's location on a Layer 3 virtual private network (L3VPN).
l When the control-client is located on the private network side:
As shown in Figure 2-80, the customer edge (CE) functions as the control-client and
initiates a Two-Way Active Measurement Protocol (TWAMP) test for provider edge 1
(PE1) and provider edge 2 (PE2) each. By comparing the two sets of statistics, users can
check statistics about the user network interface to user network interface (UNI-UNI)
performance between PE1 and PE2.
Figure 2-80 TWAMP applications on an L3VPN with the control-client on the private
network side
[Figure: the CE (control-client) exchanges TWAMP statistics packets with PE1 (server) and PE2 (server) on the L3VPN and sends TWAMP statistics data to the performance management system.]
l When the control-client is located on the public network side:

As shown in Figure 2-81, PE1 functions as the control-client and initiates a TWAMP
test for the P and PE2 each. By comparing the two sets of statistics, users can check
statistics about the NNI-NNI performance between the P and PE2.

Figure 2-81 TWAMP applications on an L3VPN with the control-client on the public
network side
[Figure: PE1 (control-client) exchanges TWAMP statistics packets with the P and PE2 on the L3VPN and sends TWAMP statistics data to the performance management system.]

Terms
Table 2-21 Terms

Terms Description
Two-Way Active Measurement Protocol: An IP link performance monitoring technique used to
measure bidirectional link performance.

Acronym&Abbreviation Full Name
TWAMP Two-Way Active Measurement Protocol
2.11 TWAMP Light

NOTE
Among ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports the TWAMP Light
function.
2.11.1 Introduction
Definition
Two-Way Active Measurement Protocol (TWAMP) Light is a light version of TWAMP, which
is defined in RFC 5357. TWAMP Light measures the round-trip performance of an IP
network by using a simplified control protocol to establish test sessions.

Purpose
On conventional IP radio access networks (IP RANs), carriers urgently need a universal
tool that rapidly provides statistics about the IP network performance for operation,
administration and maintenance (OAM). Currently, Network Quality Analysis (NQA) and IP
Flow Performance Measurement (IP FPM) are mainly used. However, NQA does not allow
for intercommunication between a Huawei device and a non-Huawei device and requires
complex deployment, and IP FPM has high requirements on network devices and applies only
to a few scenarios. To resolve this problem, the Internet Engineering Task Force (IETF) IP
Performance Metrics (IPPM) working group defined a set of protocols, including TWAMP.
TWAMP, in its standard or light version, measures the round-trip performance of an IP
network.
As described in Table 2-22, TWAMP Light is simpler and easier to deploy than TWAMP.
Table 2-22 Comparison between TWAMP and TWAMP Light
Item | TWAMP | TWAMP Light
Capability of sending packets | No | Yes
Using a tester | Only a device that has high performance can function as the Control-Client and Session-Sender, which proactively starts a test session and collects and maintains test statistics. | No tester is needed. As the light version of TWAMP, TWAMP Light moves the control plane from the Responder to the Controller so that TWAMP control modules can be simply deployed on the Controller. Therefore, TWAMP Light greatly relaxes its requirements on the Responder performance, allowing the Responder to be rapidly deployed.
Benefits
TWAMP Light is an IP link detection technology and can be easily used to help users monitor
the network quality (delay, jitter, and packet loss rate).
2.11.2 Principles
2.11.2.1 Comparison Between TWAMP Light and TWAMP
Communication Models
The Two-Way Active Measurement Protocol (TWAMP) measures the round-trip performance
of an IP network and has two versions: standard version and light version (TWAMP Light).
TWAMP uses the client/server model and defines four logical entities:

l Control-Client: establishes, starts, and stops a test session and collects statistics.
l Session-Sender: sends probes for performance statistics after being notified by the
Control-Client.
l Server: responds to the Control-Client's request for establishing, starting, or stopping a
test session.
l Session-Reflector: replies to the probes sent by the Session-Sender with response probes
after being notified by the server.
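Figure 2-82 TWAMP architecture
[Figure: the Controller hosts the Control-Client and Session-Sender, and the Responder hosts the Server and Session-Reflector; TWAMP-Control runs between the Control-Client and Server, and TWAMP-Test runs between the Session-Sender and Session-Reflector.]
Figure 2-83 TWAMP Light architecture
[Figure: the Server, Control-Client, and Session-Sender are on one side and the Session-Reflector is on the other, connected by TWAMP-Test.]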
Compared with TWAMP (in Figure 2-82), TWAMP Light (in Figure 2-83) moves the control
plane (server) from the Responder to the Controller.
To be specific, TWAMP Light integrates the Control-Client, server, and Session-Sender on
the Controller and skips the process of establishing a control session. The TWAMP Light
Responder functions merely as the Session-Reflector.
Therefore, TWAMP Light simplifies the communication model of TWAMP and greatly
relaxes its requirements on the Responder performance, allowing the Responder to be rapidly
deployed. In addition, TWAMP Light supports plug-and-play.
The Controller and Responder function as follows:
l Controller: sends and receives packets over a test session, collects and calculates
performance statistics, and reports the statistics to the NMS.
l Responder: responds to the packets received over a test session.
NOTE
Different from TWAMP, TWAMP Light has parameters statically configured for a test session. You can
configure the IP address and UDP port number on the Responder using MIBs. After a test session is
created, TWAMP-Test packets are transmitted over the test session to help calculate the performance
statistics, such as the packet loss rate, delay, and jitter. Therefore, TWAMP Light does not need any
control protocol for parameter negotiation. TWAMP Light simplifies the working process of protocols
and is easier to deploy in real world situations.

2.11.2.2 Principles
Related Concepts
The Two-Way Active Measurement Protocol (TWAMP) Light consists of on-demand
measurement and proactive measurement.
l On-demand measurement works in a specified period after being started. It can be
performed once or periodically in the specified period.
l Proactive measurement works continuously after being started to collect statistics.
Principles of TWAMP Light

A TWAMP Light service must be established before TWAMP Light is implemented.
1. Establishing a TWAMP Light service
Figure 2-84 Establishing a TWAMP Light service

[Figure: the NMS, and the Controller and Responder connected across the network.]
The Controller and Responder are deployed as shown in Figure 2-84.

a. The Control-Client on the Controller creates a TWAMP Light test session, and the
Controller is specified as the Session-Sender.
b. The Responder is specified as the Session-Reflector.
c. The Session-Sender on the Controller starts the test session, and the Controller
sends TWAMP-Test packets to the Responder based on the configured packet
sending rate and packet template.
d. The Responder responds to the TWAMP-Test packets.
2. Collecting performance statistics
TWAMP Light defines two types of TWAMP-Test packets: Test-request packets and
Test-response packets.
– Test-request packets are sent from the Controller to the Responder.
– Test-response packets are sent by the Responder in reply to the Controller.
Figure 2-85 TWAMP Light implementation

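[Figure: the Session-Sender sends Test-request packets at t1 and t3; the Session-Reflector receives them at t1' and t3' and sends Test-response packets at t2' and t4'; the Session-Sender receives the responses at t2 and t4.]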
In Figure 2-85, TWAMP-Test packets function as probes and carry the IP address and
UDP port number that are predefined for the test session between the Controller and
Responder. The Controller sends a TWAMP-Test packet to the Responder, and the
Responder replies to it. The Controller collects TWAMP statistics as follows:
a. The Controller collects statistics about the two-way delay, jitter, and packet loss rate
based on the sequence numbers and timestamps carried in TWAMP-Test packets.
Delay
The delay is calculated based on timestamps. The Controller sends a probe carrying
a sending timestamp t1, and the Responder replies with a response probe carrying a
receiving timestamp t1' and a responding timestamp t2'. After receiving the
response probe, the Controller records the receiving timestamp t2. The delay during
a single period is calculated based on the four timestamps.
Delay1 = (t2 - t1) - (t2' - t1')
Jitter
The jitter is calculated based on two consecutive delays.
Using the preceding delay formula, the delay of the next probe is calculated as
follows: Delay2 = (t4 - t3) - (t4' - t3')
Jitter = |Delay2 - Delay1|
Packet Loss Rate
The packet loss rate is calculated based on the sequence numbers (starting from 0)
carried in probes. The Controller sends a probe with a sequence number, and the
Responder replies with a response probe with the same sequence number. Each time
the Controller sends a probe or the Responder replies with a response probe, the
sequence number increases by 1. The packet loss rate is calculated based on the two
rows of sequence numbers.
Packet loss rate = Number of lost packets/Total number of sent packets
b. The Controller collects performance statistics based on TWAMP-Test packets and
reports the statistics to the NMS using Performance Monitoring (PM) for proactive
measurement or using MIBs for on-demand measurement. The NMS provides
performance statistics for users.
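The three formulas above can be sketched as follows. This is a minimal illustration, not the device implementation: the Probe class, its field names, and the millisecond unit are assumptions made for the example, and jitter is computed here between consecutive answered probes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    """One TWAMP-Test exchange as seen by the Controller (hypothetical layout, times in ms)."""
    seq: int
    t_send: float             # Controller sending timestamp (t1)
    t_reflect_rx: float       # Responder receiving timestamp (t1')
    t_reflect_tx: float       # Responder responding timestamp (t2')
    t_recv: Optional[float]   # Controller receiving timestamp (t2); None if no response arrived


def two_way_delay(p: Probe) -> float:
    """Delay = (t2 - t1) - (t2' - t1'): round trip minus the time spent in the Responder."""
    return (p.t_recv - p.t_send) - (p.t_reflect_tx - p.t_reflect_rx)


def summarize(probes: List[Probe]) -> dict:
    """Return per-probe delays, jitter between consecutive delays, and the packet loss rate."""
    answered = [p for p in probes if p.t_recv is not None]
    delays = [two_way_delay(p) for p in answered]
    # Jitter = |Delay2 - Delay1| for each pair of consecutive delays.
    jitters = [abs(later - earlier) for earlier, later in zip(delays, delays[1:])]
    lost = len(probes) - len(answered)
    loss_rate = lost / len(probes) if probes else 0.0
    return {"delays": delays, "jitters": jitters, "loss_rate": loss_rate}
```

Subtracting the Responder dwell time (t2' - t1') means the Controller and Responder clocks do not need to be synchronized for the two-way delay to be meaningful.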
2.11.3 Applications
2.11.3.1 TWAMP Light Application on an L3VPN

On the network shown in Figure 2-86, the Router that functions as the Controller initiates a
TWAMP Light test to the ATN that resides on an L3VPN and functions as the Responder.
The Controller is deployed on the aggregation node, and the Responder is deployed on the last
hop of the link connecting to a base station. You can query the TWAMP Light test results
using commands on the Controller or Responder or through the NMS.
Figure 2-86 TWAMP Light application on an L3VPN
[Figure: The network connects the NMS, the Router (Controller), and the ATN (Responder).]
The Controller and Responder are routable.

Table 2-23 Acronyms and Abbreviations
Acronym and Abbreviation    Full Name
TWAMP                       Two-Way Active Measurement Protocol
TWAMP Light                 Two-Way Active Measurement Protocol Light

3 Reliability
About This Chapter
This chapter describes reliability in terms of its overview, principles, and applications.
3.1 VRRP
3.2 Bit-Error-Triggered Protection Switching
3.3 BFD
3.4 NSR Overview
Only devices with two main control boards (such as ATN 950Bs) support the ISSU feature. This
section describes how to implement NSR and related technologies.
3.5 Ethernet OAM
3.6 E-LMI
3.7 MPLS-TP OAM
3.8 ISSU Feature Description
Only devices with two main control boards (such as ATN 950Bs) support the ISSU feature.
3.1 VRRP
3.1.1 Introduction
Purpose
The Virtual Router Redundancy Protocol (VRRP) is a fault-tolerant protocol that groups
several ATNs into a virtual router. If the next hop ATN of a host fails, VRRP switches traffic
to another ATN, ensuring continuous and reliable communication.
The basic concepts related to VRRP are as follows:
l VRRP device: a router running VRRP, which may belong to one or multiple virtual
routers.

l Virtual router: an abstract device managed by VRRP, also called a VRRP backup group.
A virtual router functions as a default gateway on a shared local area network (LAN). A
virtual router is identified by a virtual router identifier and has a set of virtual IP
addresses.
l Virtual IP address: IP address of a virtual router. A virtual router is manually assigned
one or multiple virtual IP addresses.
l IP address owner: a VRRP device that uses a virtual router's IP address as an actual
interface address. When working normally, the VRRP device responds to packets
destined for the virtual IP address, such as ping packets and TCP packets.
l Virtual MAC address: a MAC address that is generated according to a virtual router ID.
A VRRP virtual router has a virtual MAC address in the format of
00-00-5E-00-01-{VRID}, and a VRRP6 virtual router has a virtual MAC address in the
format of 00-00-5E-00-02-{VRID}. A virtual router responds to Address Resolution
Protocol (ARP) requests using the virtual MAC address but not the interface's actual
MAC address.
l Primary IP address: an IP address selected from one of the physical interfaces' IP
addresses. It is usually the first configured IP address. The primary IP address functions
as the source IP address in VRRP multicast packets.
l Master Router (virtual router master): a VRRP device that forwards packets to the virtual
IP address and responds to ARP requests. When an IP address owner is available, it
usually functions as the master router.
l Backup Router (virtual router backup): a set of VRRP devices that do not forward
packets. If the master router fails, the backup routers will compete to be the new master
router.
l Preemption mode: a mode in which a backup router becomes the master router if the
backup router has a higher priority than the current master router.
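The virtual MAC address formats listed above can be illustrated with a short sketch. The function name virtual_mac is hypothetical, and the VRID range of 1 to 255 is taken from RFC 3768 rather than from this document.

```python
def virtual_mac(vrid: int, ipv6: bool = False) -> str:
    """Build the virtual MAC address for a virtual router ID.

    VRRP (IPv4) uses 00-00-5E-00-01-{VRID}; VRRP6 uses 00-00-5E-00-02-{VRID}.
    The VRID range check (1-255) is an assumption based on RFC 3768.
    """
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    family = 0x02 if ipv6 else 0x01
    return "00-00-5E-00-{:02X}-{:02X}".format(family, vrid)
```

For example, virtual router 10 maps to 00-00-5E-00-01-0A for VRRP and to 00-00-5E-00-02-0A for VRRP6.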
Description
As networks rapidly develop and applications become diversified, various value-added
services, such as Internet Protocol television (IPTV) and video conferencing, have become
widespread. Demands for network infrastructure reliability are increasing, especially in
nonstop network transmission.
Generally, hosts use one default gateway to communicate with external networks. If the
default gateway fails, communication between the hosts and external networks is interrupted.
System reliability can be improved using dynamic routing protocols (such as RIP and OSPF)
or ICMP Router Discovery Protocol (IRDP). However, this method requires complex
configurations and each host must support dynamic routing protocols.
VRRP resolves this issue by enabling several routers to be grouped into a virtual router, also
called a VRRP backup group. In normal circumstances, the master router in the VRRP backup
group functions as a default gateway and provides access services for users. If the master
router fails, VRRP elects a backup router from the VRRP backup group to provide access
services for users.
Benefits
VRRP offers the following benefits to carriers:
l Reliable transmission: A logical VRRP gateway on a multicast or broadcast local area
network (LAN), such as an Ethernet network, ensures reliable transmission over key
links. VRRP helps prevent service interruptions if a link to a physical VRRP gateway
fails.
l Flexible applications: A VRRP header is encapsulated into an IP packet. This
implementation allows the association between VRRP and various upper-layer protocols.
l Low network overheads: VRRP uses only VRRP Advertisement packets.
VRRP offers the following benefits to users:
l Simplified configurations: Users only need to specify a gateway address without
configuring routing protocols on their hosts.
l Improved user experience: Users are not aware of a single point of failure.
3.1.2 Principles
VRRP combines a group of routing devices on a LAN into a backup group that functions as a
virtual router. Hosts on the LAN only need to obtain the IP address of the virtual router rather
than the IP address of a specific device in the backup group. When the IP address of the
virtual router is configured as the default gateway for the hosts, the hosts can communicate
with an external network through the virtual gateway.
VRRP dynamically associates the virtual router with a physical device that transmits services.
When the device fails, another device is selected to transmit services. The switchover is
transparent to users, allowing the internal and external networks to communicate without
interruption.
Figure 3-1 Virtual router
[Figure: HostA, HostB, and HostC connect over Ethernet to ATN A (Master, 10.110.10.5),
ATN B (Backup, 10.110.10.6), and ATN C (Backup, 10.110.10.7), which connect to the
network. The virtual IP address is 10.110.10.1.]
As shown in Figure 3-1, the virtual router is implemented as follows:
l ATN A, ATN B, and ATN C form a VRRP backup group that functions as a virtual
router. The IP address of the virtual router is 10.110.10.1. The virtual IP address can be
specified or borrowed from an interface of a device in this VRRP backup group.

l The actual IP addresses of ATN A, ATN B, and ATN C are 10.110.10.5, 10.110.10.6, and
10.110.10.7, respectively.
l Hosts on a LAN only need to set the default route to 10.110.10.1 rather than a physical
interface address of a specific device.
Hosts communicate with external networks through this virtual gateway. The virtual router
functions as follows:
l The master device is selected according to device priorities:
– The device with a higher priority is selected as the master device.
– If two devices have the same priority and one of them is the master device, the
backup device will remain in the backup state. If the two devices with the same
priority compete to become the master device, the device with the larger interface
IP address will be selected as the master device.
l Other devices function as backup devices and track the status of the master device.
– The master device sends a VRRP multicast packet at intervals of
Advertisement_Interval to notify backup devices in the backup group that the
master device is working normally.
– In a VRRP group with one backup device, when the backup device does not receive
packets from the master device within the period of Master_Down_Interval, the
backup device transitions itself to become the master device. In a VRRP group with
multiple backup devices, when the backup devices do not receive packets from the
master device within the period of Master_Down_Interval, multiple backup devices
may become the master devices in a short period. The devices then compare the
priorities in the received VRRP packets with their local priorities, and the device
with the highest priority is selected as the master device. After a backup device
becomes the master device, it sends gratuitous ARP packets to update MAC entries
on the switches. User traffic is then switched to the master device. The entire
process is transparent to users.
The preceding analysis demonstrates that when using VRRP, hosts do not need to perform
additional operations and can communicate with external networks even when a device fails.
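The election rules above can be sketched as follows. This is an illustrative model only: the function names and the tuple layout are assumptions, and the Master_Down_Interval formula is the one defined in RFC 3768 (three advertisement intervals plus a priority-based skew time), which this document does not spell out.

```python
import ipaddress
from typing import List, Tuple


def elect_master(devices: List[Tuple[str, int, str]]) -> str:
    """Pick the master among competing devices given as (name, priority, interface IP).

    The highest priority wins; if priorities are equal, the larger interface
    IP address wins. This models devices competing from scratch, not the case
    where a master with equal priority already exists.
    """
    best = max(devices, key=lambda d: (d[1], int(ipaddress.ip_address(d[2]))))
    return best[0]


def master_down_interval(advertisement_interval: float, priority: int) -> float:
    """RFC 3768: Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time,
    where Skew_Time = (256 - Priority) / 256 seconds."""
    skew_time = (256 - priority) / 256
    return 3 * advertisement_interval + skew_time
```

Because the skew time shrinks as the priority grows, the backup device with the highest priority times out first, which keeps the period with multiple masters short.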
VRRP Packet Format

VRRP packets notify all VRRP devices associated with the same virtual router ID of the
priority and status of the master device.
VRRP packets are encapsulated in IP packets and sent to the IPv4 multicast address assigned
to VRRP. In the IP packet header, the source address is the primary IP address of the interface
sending the packet, but not the virtual address or secondary address. The destination multicast
address is 224.0.0.18. The TTL value is 255, and the protocol number is 112. Figure 3-2
shows the VRRP packet format.

Figure 3-2 VRRP packet format

 0     3 4     7 8            15 16           23 24           31
| Version | Type | Virtual Rtr ID | Priority | Count IP Addrs |
| Auth Type      | Adver Int      | Checksum                  |
| IP Address (1)                                              |
| ......                                                      |
| IP Address (n)                                              |
| Authentication Data (1)                                     |
| Authentication Data (2)                                     |
The descriptions of each field are as follows:

l Version: indicates the version number of the protocol. For VRRPv2, the value is 2.
l Type: indicates the type of VRRP Advertisement packets. The value is fixed at 1.
l Virtual Rtr ID: indicates the virtual router identifier.
l Priority: specifies the priority of the VRRP device that sends a VRRP packet in a VRRP
backup group.
l Count IP Addrs: indicates the number of virtual IP addresses contained in a VRRP
advertisement packet.
l Auth Type: indicates the authentication type in use. The authentication types
defined in the protocol are as follows:
– 0: No Authentication
– 1: Simple Text Password
– 2: IP Authentication Header
NOTE
Currently, the ATN supports the following authentication modes:

l Simple Text Password: Simple text authentication
l IP Authentication Header: MD5 authentication
l Adver Int: indicates the interval at which VRRP Advertisement packets are sent.
The default value is 1 second.
l Checksum: indicates the packet checksum.
l IP Address: indicates the virtual addresses of the virtual router. The number of addresses
is specified in the Count IP Addrs field.
l Authentication Data: indicates the authentication key. Currently, this field is used only
in simple authentication mode and MD5 authentication mode. In other authentication
modes, this field is 0.
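The field layout described above can be made concrete with a small parser. This is a sketch under stated assumptions: the field widths follow RFC 3768 (4-bit Version and Type, 8-bit Virtual Rtr ID, Priority, Count IP Addrs, Auth Type, and Adver Int, a 16-bit Checksum, and 8 bytes of Authentication Data), the function name parse_vrrpv2 is hypothetical, and the checksum is not verified.

```python
import socket
import struct


def parse_vrrpv2(payload: bytes) -> dict:
    """Decode a VRRPv2 Advertisement packet (the IP payload, protocol number 112)."""
    if len(payload) < 8:
        raise ValueError("truncated VRRP header")
    ver_type, vrid, priority, count, auth_type, adver_int, checksum = struct.unpack(
        "!BBBBBBH", payload[:8]
    )
    end = 8 + 4 * count  # each virtual IPv4 address occupies 4 bytes
    if len(payload) < end + 8:
        raise ValueError("truncated address list or authentication data")
    addresses = [socket.inet_ntoa(payload[i:i + 4]) for i in range(8, end, 4)]
    return {
        "version": ver_type >> 4,     # upper 4 bits of the first byte
        "type": ver_type & 0x0F,      # lower 4 bits; 1 = Advertisement
        "vrid": vrid,
        "priority": priority,
        "count_ip_addrs": count,
        "auth_type": auth_type,       # 0, 1, or 2 as listed above
        "adver_int": adver_int,       # seconds
        "checksum": checksum,         # not verified in this sketch
        "ip_addresses": addresses,
        "auth_data": payload[end:end + 8],
    }
```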
State Machine
VRRP defines three states: Initialize, Master, and Backup. Only the device in the Master state
can forward packets destined for the virtual IP address.
Figure 3-3 shows the VRRP state transition.

Figure 3-3 VRRP state transition
[Figure: transitions among the INITIALIZE, MASTER, and BACKUP states]
INITIALIZE to MASTER: Receives a Startup message with the priority 255
INITIALIZE to BACKUP: Receives a Startup message with the priority lower than 255
MASTER to INITIALIZE: Receives a Shutdown message
BACKUP to INITIALIZE: Receives a Shutdown message
MASTER to BACKUP: Receives a packet with higher priority
BACKUP to MASTER: MASTER_DOWN_TIMER times out
Initialize: An ATN is in the Initialize state when started. If a Startup message is received, the
ATN changes to the Backup state or the Master state. If the ATN is the IP address owner, it
changes to the Master state directly. In the Initialize state, the ATN does not process VRRP packets.
Master: In the Master state, the ATN performs the following:
l Sends the VRRP packets periodically.
l Responds to ARP requests for the virtual IP address with the virtual MAC address.
l Forwards IP packets in which the destination MAC address is the virtual MAC address.
l If the ATN is the virtual IP address owner, it accepts IP packets of which the destination
IP address is the virtual IP address. If the ATN is not the virtual IP address owner, it
discards these IP packets.
l Transitions to the Backup state if the priority in the received packet is greater than the
local priority.
l Transitions to the Initialize state when the interface is shut down.
Backup: In the Backup state, the ATN performs the following:
l Receives VRRP packets sent by the master and determines whether the master is working
properly.
l Does not respond to ARP requests with the virtual IP address.
l Discards IP packets in which the destination MAC address is the virtual MAC address.
l Discards IP packets in which the destination IP address is the virtual IP address.
l When receiving a packet carrying a priority lower than the local priority, the ATN
immediately switches to the Master state by default. If non-preemption is configured,
the ATN only resets the timer. If a preemption delay is configured, the ATN resets the
timer and switches to the Master state after the preemption delay expires. When
receiving a packet carrying a higher priority, the ATN resets the timer. When receiving
a packet carrying an equal priority, the ATN resets the timer and does not compare IP
addresses.
l Transitions to the Master state when MASTER_DOWN_TIMER times out.

l Transitions to the Initialize state when receiving an interface shutdown event.
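The state behavior described above can be summarized as a transition function. This is a simplified sketch: the event names are hypothetical labels, the priority value 255 for the IP address owner follows RFC 3768, and the preemption delay and timer handling are deliberately left out.

```python
from enum import Enum


class State(Enum):
    INITIALIZE = "Initialize"
    MASTER = "Master"
    BACKUP = "Backup"


OWNER_PRIORITY = 255  # assumption from RFC 3768: priority of the IP address owner


def next_state(state, event, local_priority, received_priority=None, preempt=True):
    """Return the next VRRP state for one event.

    event is one of the hypothetical labels "startup", "shutdown",
    "advertisement", or "master_down_timeout".
    """
    if event == "shutdown":
        return State.INITIALIZE
    if state is State.INITIALIZE:
        if event == "startup":
            return State.MASTER if local_priority == OWNER_PRIORITY else State.BACKUP
        return state  # VRRP packets are not processed in the Initialize state
    if state is State.MASTER:
        if event == "advertisement" and received_priority > local_priority:
            return State.BACKUP
        return state
    # Backup state
    if event == "master_down_timeout":
        return State.MASTER
    if event == "advertisement" and preempt and received_priority < local_priority:
        return State.MASTER  # immediate preemption; a preemption delay is not modelled
    return state  # higher or equal priority: only the timer would be reset
```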
3.1.2.1 Master/Backup Mode

In master/backup mode, VRRP provides the IP address backup feature. A virtual router must
be set up with a master device and multiple backup devices, forming a backup group.
l Normally, the master device transmits all services.

l When the master device fails, a backup device takes over the services.
3.1.2.2 VRRP Load Balancing

A device can function as a backup device in multiple VRRP backup groups. Load balancing is
performed among multiple virtual routers. In load balancing mode, multiple virtual routers
transmit services simultaneously; therefore, two or more backup groups must be set up.
The load balancing mode has the following characteristics:
l Each backup group consists of a master device and multiple backup devices.
l The master devices of backup groups can be different.
l A device can join multiple backup groups and obtain different priorities in each group.
Figure 3-4 VRRP in load balancing mode
[Figure: HostA, HostB, and HostC connect over Ethernet to ATN A (10.110.10.5,
Master/Backup), ATN B (10.110.10.6, Backup), and ATN C (10.110.10.7, Backup/Master),
which connect to the network. Backup group 1 uses virtual IP address 10.110.10.1, and
Backup group 2 uses virtual IP address 10.110.10.2.]
As shown in Figure 3-4, two backup groups are configured, that is, Backup group 1 and
Backup group 2.
l ATN A is the master in Backup group 1 and the backup in Backup group 2.

l ATN B is the backup in both Backup group 1 and Backup group 2.

l ATN C is the master in Backup group 2 and the backup in Backup group 1.
l Backup groups 1 and 2 are gateways for different hosts.
In this mode, backup groups load balance data traffic and back up each other.
3.1.2.3 VRRP Tracking Interface Status

VRRP can track the status of all interfaces. When a tracked interface goes Up or Down, the
device's priority automatically increases or decreases by a specified value. The order of device
priorities in the backup group changes, and the VRRP devices compete with each other to
become the master device.
A VRRP backup group tracks a maximum of eight interfaces in Increased mode or Reduced
mode.
l In Increased mode, when a tracked interface goes Down, the priority of the device in the
VRRP backup group increases by a specified value.
The Increased mode takes effect on both master and backup devices.
l In Reduced mode, when a tracked interface goes Down, the priority of the device in the
VRRP backup group decreases by a specified value.
The Reduced mode takes effect on both master and backup devices.
For information about the typical application environment, see the section "VRRP
Applications."
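The priority adjustment described above can be sketched as follows. The function name, the mode strings, and the interface names used with it are hypothetical, and clamping the result to the range 1 to 254 is an assumption based on the configurable priority range in RFC 3768.

```python
from typing import Dict, Set, Tuple

MAX_TRACKED_INTERFACES = 8  # limit stated above


def effective_priority(configured: int,
                       tracked: Dict[str, Tuple[str, int]],
                       down: Set[str]) -> int:
    """Compute a device's VRRP priority after interface tracking is applied.

    tracked maps an interface name to (mode, value), where mode is
    "increased" or "reduced". down is the set of tracked interfaces
    that are currently Down.
    """
    if len(tracked) > MAX_TRACKED_INTERFACES:
        raise ValueError("a VRRP backup group tracks at most eight interfaces")
    priority = configured
    for name, (mode, value) in tracked.items():
        if name in down:
            priority += value if mode == "increased" else -value
    # Assumption: keep the result inside the configurable range 1-254.
    return max(1, min(254, priority))
```

A typical use is a master with priority 120 that tracks its uplink in Reduced mode with a value of 30: when the uplink goes Down, the priority drops to 90, below a backup device at 100, and the backup device can take over.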
3.1.2.4 BFD for VRRP
Principles
Devices in a VRRP backup group exchange VRRP Advertisement packets to negotiate the
master/backup status and implement backup. If the link between devices in a VRRP backup
group fails, VRRP Advertisement packets cannot be exchanged to negotiate the
master/backup status. A backup device attempts to preempt the Master state after a period
three times as long as the interval at which VRRP Advertisement packets are sent. During
this period, user traffic is still forwarded to the master device, which results in user traffic loss.
Bidirectional Forwarding Detection (BFD) can rapidly detect faults in links or IP routes. BFD
for VRRP enables a master/backup VRRP switchover to be completed within 1 second,
preventing user traffic loss. A BFD session is established between the master and backup
devices in a VRRP backup group and is bound to the VRRP backup group. BFD immediately
detects communication faults in the VRRP backup group and instructs the VRRP backup
group to perform a master/backup switchover, minimizing service interruptions.
Related Concepts
Association between a VRRP backup group and a BFD session can be implemented in either
of the following modes:
l When an NPE is directly connected to a UPE, a VRRP backup group can be bound to a
common BFD session. If the BFD session detects a fault and goes Down, the BFD
module notifies the VRRP backup group of the status change. After receiving the
notification, the VRRP backup group changes VRRP priorities of devices and determines
whether to perform a master/backup VRRP switchover.

l When an NPE is connected to a UPE through another device, a VRRP backup group can
be bound to link and peer BFD sessions. If the BFD session detects a fault and goes
Down, the BFD module notifies the VRRP backup group of the status change. After
receiving the notification, the VRRP backup group directly performs a master/backup
switchover.
NOTE
In both association modes, the VRRP backup group can be bound to a static BFD session or a static
BFD session with automatically negotiated discriminators.
Implementation
The following sections describe how to associate a VRRP backup group with a BFD session
in different modes.
Association Between a VRRP Backup Group and a Common BFD Session
On the network shown in Figure 3-5, VRRP is enabled on NPE1 and NPE2. NPE1 functions
as the master device and NPE2 functions as the backup device. NPE1 is transmitting user
traffic. A common BFD session is established between NPE1 and NPE2. The VRRP backup
group tracks the status of the BFD session. If the BFD session detects a fault and goes Down,
the BFD module notifies the VRRP backup group of the status change. After receiving the
notification, the VRRP backup group changes VRRP priorities of devices and performs a
master/backup VRRP switchover.
If the BFD session detects a fault in a link between NPE1 and the UPE, the BFD session goes
Down. The BFD module notifies the VRRP backup group of the status change. After
receiving the notification, NPE2's VRRP priority increases to be higher than NPE1's VRRP
priority. NPE2 becomes the master device and takes over traffic. During this process, a rapid
master/backup VRRP switchover is performed.
Figure 3-5 Association between a VRRP backup group and a common BFD session
[Figure: NPE1 (Master) and NPE2 (Backup) run VRRP and connect to the UPE. A BFD session
is established between NPE1 and NPE2.]
Figure 3-5 shows a network on which a VRRP backup group tracks a common BFD session.
The devices are configured as follows:
l NPE1's VRRP priority is 120 and NPE1 is in the Master state in a VRRP backup group.
l NPE2's VRRP priority is 100 and NPE2 is in the Backup state in a VRRP backup group.
The immediate preemption mode is enabled on NPE2.
l On NPE2, the VRRP backup group is configured to track a common BFD session. If the
BFD session detects a fault and goes Down, NPE2 increases its VRRP priority by 40
after being notified.
Implementation is as follows:

1. When NPE1 works properly, NPE1 periodically sends VRRP Advertisement packets to
inform NPE2 that NPE1 works properly. NPE2 tracks the status of NPE1 and the BFD
session.
2. If a BFD session detects either of the following faults, the BFD session goes Down:
– Link or device fault between NPE1 and NPE2
NPE2 increases its VRRP priority to 140 (100 + 40), higher than NPE1's VRRP
priority. NPE2 preempts the Master state and sends gratuitous ARP packets to
update MAC addresses on downstream devices.
Before receiving a packet from NPE2, NPE1 retains the Master state, while NPE2
becomes the master device. Both NPE1 and NPE2 are in the Master state during a
short period of time. Using a trunk technique can prevent dual masters in a VRRP
backup group.
NPE1 then receives a VRRP Advertisement packet from NPE2, which has become the
master device. After detecting that the priority carried in the VRRP Advertisement
packet is higher than the local priority, NPE1 stops sending VRRP Advertisement
packets and enters the Backup state.
– NPE1 device fault
NPE2 increases its VRRP priority to 140 (100 + 40), higher than NPE1's VRRP
priority. NPE2 preempts the Master state and sends gratuitous ARP packets to
update MAC addresses on downstream devices.
3. After the fault is rectified, the BFD session goes Up.
NPE2 restores the priority value of 100. NPE2 retains the Master state and is still able to
send VRRP Advertisement packets.
After receiving the packets sent by NPE2, NPE1 detects that the priority carried in the
packets is lower than the local VRRP priority, and waits a specified period before
preempting the Master state. After restoring the Master state, NPE1 sends VRRP
Advertisement packets and gratuitous ARP packets.
After receiving VRRP Advertisement packets carrying a higher priority than the local
priority, NPE2 enters the Backup state.
4. NPE1 in the Master state forwards user traffic to networks and NPE2 is in the Backup
state.
The preceding process shows how BFD for VRRP differs from VRRP alone. When BFD for
VRRP is used and a fault occurs, the backup device immediately preempts the Master state
without waiting for a period three times the interval at which VRRP Advertisement packets
are sent. A master/backup VRRP switchover can be implemented in milliseconds.
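The priority arithmetic in this example, and the reason the switchover is faster, can be sketched as follows. The function names are hypothetical. The priorities (120, 100, and the increment of 40) come from the example above, while the BFD interval and detection multiplier are illustrative values, not defaults of the ATN.

```python
def npe2_priority(bfd_up: bool, base: int = 100, increase: int = 40) -> int:
    """NPE2 raises its priority by the configured value while the tracked BFD session is Down."""
    return base if bfd_up else base + increase


def npe2_preempts(bfd_up: bool, npe1_priority: int = 120) -> bool:
    """NPE2 (immediate preemption enabled) takes over only when its priority exceeds NPE1's."""
    return npe2_priority(bfd_up) > npe1_priority


def vrrp_only_detection_s(advertisement_interval_s: float = 1.0) -> float:
    """Without BFD, the backup waits about three advertisement intervals."""
    return 3 * advertisement_interval_s


def bfd_detection_s(interval_ms: int = 10, detect_multiplier: int = 3) -> float:
    """BFD declares a fault after detect_multiplier missed intervals (illustrative values)."""
    return interval_ms * detect_multiplier / 1000
```

With these illustrative values, the fault is detected in about 30 ms instead of about 3 seconds, and NPE2's priority of 140 exceeds NPE1's 120 as soon as the BFD session goes Down.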
Association Between a VRRP Backup Group and Link and Peer BFD Sessions
On the network shown in Figure 3-6, VRRP runs between two NPEs. A peer BFD session is set
up between the NPEs to detect link and device failures. A link BFD session is established
between each NPE and the UPE to detect link and device failures. When NPE2 detects that the
peer BFD session is Down while the Link2 BFD session is Up, NPE2's VRRP status
changes from Backup to Master, and NPE2 takes over traffic.

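Figure 3-6 Association between a VRRP backup group and link and peer BFD sessions
[Figure: A peer BFD session runs between NPE1 and NPE2. The Link1 BFD session runs
between NPE1 and the UPE, and the Link2 BFD session runs between NPE2 and the UPE.]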
Figure 3-6 shows the network on which a VRRP backup group tracks link and peer BFD
sessions.
l NPE1 and NPE2 run VRRP.
l A peer BFD session is established between NPEs through the UPE to detect link and
device failures.
l Link1 BFD session is established between the UPE and NPE1. Link2 BFD session is
established between the UPE and NPE2 to detect link and device failures.
Implementation is as follows:
1. When NPE1 works properly, NPE1 periodically sends VRRP Advertisement packets to
inform NPE2 that NPE1 works properly. NPE1 tracks the BFD session status. NPE2
tracks the status of NPE1 and the BFD session.
2. If a BFD session detects either of the following faults, the BFD session goes Down:
– Link1 or the UPE fails. Link1 BFD session and the peer BFD session go Down.
Link2 BFD session is Up.
NPE1's VRRP status directly becomes Initialize.
NPE2's VRRP status directly becomes Master.
– NPE1 fails. Link1 BFD session and the peer BFD session go Down. Link2 BFD
session is Up. NPE2's VRRP status becomes Master.
3. After a fault is rectified, the BFD sessions go Up, and the NPEs in the VRRP backup
group restore their VRRP status.
NOTE
A Link2 fault does not affect NPE1 status, and NPE1 continues to forward upstream traffic properly.
However, NPE2's VRRP status becomes Master if the peer BFD session and Link2 BFD session go
Down, and NPE2 detects the peer BFD session status change before detecting Link2 BFD session status
change. After NPE2 detects Link2 BFD session status change, NPE2's VRRP status enters Initialize.
Figure 3-7 shows the state machine for the association between a VRRP backup group and
link and peer BFD sessions.

Figure 3-7 State machine for the association between a VRRP backup group and link and
peer BFD sessions
[Figure: transitions among the INITIALIZE, MASTER, and BACKUP states]
INITIALIZE to MASTER: The link BFD session goes Up, and the VRRP priority value is 255
INITIALIZE to BACKUP: The link BFD session goes Up, and the VRRP priority value is not 255
MASTER to INITIALIZE: The link BFD session goes Down
BACKUP to INITIALIZE: The link BFD session goes Down
BACKUP to MASTER: The peer BFD session goes Down and the link BFD session goes Up
The preceding process shows that after link BFD for VRRP and peer BFD for VRRP are
used, the backup device can immediately preempt the Master state if a fault occurs. The
backup device does not need to wait for a period three times the interval at which VRRP
Advertisement packets are sent, or for its VRRP priority to change. A master/backup VRRP
switchover can be performed in milliseconds.
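The link and peer BFD behavior described in this section can be sketched as a single decision function. This is an interpretation of the text and of Figure 3-7, not the device logic: the function name and string state labels are hypothetical, and the detection-order race described in the NOTE above is not modelled.

```python
def vrrp_state(current: str, link_bfd_up: bool, peer_bfd_up: bool, priority: int) -> str:
    """Return the VRRP state of an NPE from its link and peer BFD session status.

    current is "Initialize", "Master", or "Backup".
    """
    if not link_bfd_up:
        # The NPE has lost its own link to the UPE and leaves the election.
        return "Initialize"
    if current == "Initialize":
        # The link is Up again: the IP address owner (priority 255) becomes Master.
        return "Master" if priority == 255 else "Backup"
    if current == "Backup" and not peer_bfd_up:
        # The peer is unreachable while the local link is healthy: take over directly.
        return "Master"
    return current
```

Unlike the common BFD session mode, no priority change is involved here; the combination of session states alone decides the switchover.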
Benefits
BFD for VRRP speeds up master/backup VRRP switchovers if faults occur.
3.1.2.5 Pinging the Virtual IP Address

Pinging the virtual IP addresses of VRRP backup groups facilitates monitoring of virtual
routers. However, this function may result in Internet Control Message Protocol (ICMP)
attacks. A command is provided for you to determine whether to enable or disable ping to a
virtual IP address.
3.1.2.6 VRRP Security

Different authentication modes and authentication keys can be set in VRRP packet headers
based on network security.
In a secure network, the default setting can be used. That is, the device does not authenticate
the sent or received VRRP packets. All received VRRP packets are considered as valid. In
this case, no authentication key needs to be set.
VRRP provides simple text authentication, HMAC-SHA256 authentication, and Message-
Digest Algorithm 5 (MD5) authentication for networks that are vulnerable to attacks.
3.1.2.7 VRRP Smooth Switching

After an active/standby switchover occurs on the master device, there is a period of time
before the new active main board (AMB) will work normally. This period of time varies
according to device and configuration. During this period, the master device cannot process
VRRP packets normally, and so the backup devices cannot receive VRRP multicast packets.
As a result, a backup device preempts to become the master device. Then the new master
device sends a gratuitous ARP packet to the virtual IP address of each virtual router to notify
the related bound modules of the status change. In preemption mode, if the original master
device has a higher priority, it can preempt to become the master device again after the
switchover. This causes the VRRP status to change twice, affecting service traffic.
To prevent service traffic forwarding from being affected during an AMB/SMB switchover,
VRRP devices must support VRRP smooth switching.
When the AMB and SMB on a device are working properly, the master device in a VRRP
backup group sends VRRP multicast packets at intervals of Advertisement_Interval. The
backup device determines whether the master device works properly based on the multicast
packets it receives.
During VRRP smooth switching, the master device cooperates with backup devices to ensure
smooth transmission of services.
l To perform VRRP smooth switching, the master device and backup devices must be
enabled to learn the interval at which VRRP packets are sent. After this function is
enabled:
– The master device does not learn the interval at which VRRP packets are sent or
check consistency of the intervals.
– When a backup device receives a VRRP packet from the master device, it checks
the interval in the VRRP packets. If the interval in the packet is different from the
interval configured on the device, the backup device changes its own interval to the
interval specified in the packet.
l ATN A is configured with VRRP smooth switching. After an AMB/SMB switchover
occurs and the new AMB starts, VRRP saves the currently configured interval, changes
the interval of the master VRRP backup group, and sends a VRRP switching packet
carrying the new interval to ATN B at the currently configured intervals.
l After receiving the VRRP packet, ATN B finds that the interval carried in the VRRP
packet is different from the locally configured interval. ATN B then changes the local
interval to the interval carried in the received VRRP packet.
l After smooth switching is complete, ATN A sends a VRRP Recovery packet carrying the
interval set before the AMB/SMB switchover. ATN B then learns the interval again.
When performing VRRP smooth switching, note the following:
l During VRRP smooth switching, the interval learning function takes precedence over the
preemption function. That is, when the interval carried in the received packet is different
from the current interval and the priority carried in the received packet is lower than the
current priority, VRRP first learns the interval and resets the timeout timer, and then
determines whether to preempt to become the master.
l VRRP smooth switching also depends on the system performance. If the system is very
busy after an AMB/SMB switchover occurs and cannot schedule operations of the VRRP
module, VRRP smooth switching cannot take effect.
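The interval learning behavior on a backup device can be sketched as follows. The class, its attribute names, and the interval values used with it are hypothetical; the ordering (learn the interval, reset the timer, then consider preemption) follows the note above.

```python
class BackupDevice:
    """Minimal model of interval learning on a VRRP backup device."""

    def __init__(self, configured_interval: float, priority: int,
                 learning_enabled: bool = True, preempt: bool = True):
        self.configured_interval = configured_interval
        self.interval = configured_interval   # interval currently in use
        self.priority = priority
        self.learning_enabled = learning_enabled
        self.preempt = preempt
        self.timer_resets = 0                 # stands in for the timeout timer

    def on_advertisement(self, advertised_interval: float, advertised_priority: int) -> bool:
        """Process one VRRP packet; return True if preemption should be considered."""
        # Step 1: interval learning takes precedence over preemption.
        if self.learning_enabled and advertised_interval != self.interval:
            self.interval = advertised_interval
        # Step 2: reset the timeout timer.
        self.timer_resets += 1
        # Step 3: only then decide whether this device may preempt.
        return self.preempt and advertised_priority < self.priority
```

For example, if ATN B runs with a 1-second interval and ATN A advertises a longer interval during its AMB/SMB switchover, ATN B adopts the longer interval and does not time out the master; when ATN A later advertises the original interval, ATN B learns it again.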
3.1.2.8 mVRRP
Principles
A UPE is usually dual-homed to two NPEs at the aggregation layer on a MAN. Multiple
VRRP backup groups can be configured on the two NPEs to transmit various types of
services. Each VRRP backup group maintains its own state machine, leading to transmission
of a lot of VRRP Advertisement packets between NPEs.
To help reduce bandwidth and CPU resource consumption during VRRP packet transmission,
a VRRP backup group can be configured as a Management Virtual Router Redundancy
Protocol (mVRRP) backup group. Other VRRP backup groups are bound to the mVRRP
backup group and become service VRRP backup groups. Only the mVRRP backup group, not
service VRRP backup groups, sends VRRP packets to negotiate the master/backup status. The
mVRRP backup group determines the master/backup status of the service VRRP backup
groups.
Related Concepts
An mVRRP backup group has all functions of a common VRRP backup group. Different from a
common VRRP backup group, an mVRRP backup group can be bound to other service VRRP
backup groups and determine the status of the service VRRP backup groups.
l An mVRRP backup group provides the following functions:

– When the mVRRP backup group functions as a gateway, it determines the master/
backup status of devices and transmits services. Before an mVRRP backup group is
configured, a common VRRP backup group must be configured and assigned a
virtual IP address. The common VRRP backup group can be configured as an
mVRRP backup group and the virtual IP address is a gateway IP address for users.
– When the mVRRP backup group does not function as a gateway, it only determines
the master/backup status of devices, and does not transmit services. In this situation,
the mVRRP backup group does not need a virtual IP address. This means that the
mVRRP backup group can be directly configured on an interface. This
configuration helps simplify maintenance.
l After common VRRP backup groups are added to an mVRRP backup group, they do not
need to send VRRP packets to determine the status. The mVRRP backup group sends
VRRP packets to determine its status and the status of all its bound service VRRP
backup groups. This reduces the bandwidth that VRRP packets use.
NOTE
An mVRRP backup group can be bound to a maximum of 127 VRRP backup groups, but cannot be
bound to another mVRRP backup group.
Benefits
This feature offers the following benefits:
l Management is simplified. The mVRRP backup group determines the master/backup
status of the service VRRP backup groups.
l CPU and bandwidth resource consumption is reduced. Service VRRP backup groups do
not send VRRP packets.
3.1.2.9 VRRPv3 Packet Format

Currently, VRRPv2 packets cannot be transmitted on an IPv6 network. VRRPv3 packets, by
comparison, can be transmitted on both IPv4 and IPv6 networks.
The VRRP protocol supports both VRRPv2 and VRRPv3 packets. VRRPv2 is defined in RFC
3768, and VRRPv3 is defined in RFC 5798. Both VRRPv2 and VRRPv3 are used to advertise
the priority and status of the master device to other devices in a backup group. Figure 3-8
shows the format of a VRRPv3 packet.
Figure 3-8 VRRPv3 packet format

Bits 0-3: Version | Bits 4-7: Type | Bits 8-15: Virtual Rtr ID | Bits 16-23: Priority | Bits 24-31: Count IPvX Addr
Bits 0-3: (rsvd) | Bits 4-15: Max Adver Int | Bits 16-31: Checksum
Following words: IPvX Address(es)
The meanings of the fields are as follows:

l Version: indicates the VRRP version number. For VRRPv3, the value is 3.
l Type: indicates the type of VRRP Advertisement packets. The value is fixed at 1.
l Virtual Rtr ID: indicates the ID of a VRRP backup group.
l Priority: indicates the priority of the master device that sends VRRP Advertisement
packets in a VRRP backup group.
l rsvd: indicates the reserved field, whose value must be set to 0.
l Count IPvX Addr: indicates the number of IP addresses carried in a VRRP Advertisement
packet.
l Max Adver Int: indicates the interval at which VRRP Advertisement packets are sent.
The value is expressed in centiseconds.
l Checksum: indicates the packet checksum.
l IPvX Address(es): indicates the virtual IPv4 or IPv6 address(es) of a VRRP backup
group.
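The field layout above can be illustrated with a short decoding sketch. This is a minimal example, not device code: the names Vrrpv3Header and parse_vrrpv3_header are hypothetical, the sample values are invented, and the checksum is read but not validated.

```python
import struct
from dataclasses import dataclass


@dataclass
class Vrrpv3Header:
    version: int
    type: int
    vrid: int
    priority: int
    addr_count: int
    max_adver_int_cs: int  # advertisement interval in centiseconds
    checksum: int


def parse_vrrpv3_header(data: bytes) -> Vrrpv3Header:
    """Decode the fixed 8-byte part of a VRRPv3 packet (checksum not verified)."""
    if len(data) < 8:
        raise ValueError("a VRRPv3 header needs at least 8 bytes")
    ver_type, vrid, priority, count, rsvd_adver, checksum = struct.unpack(
        "!BBBBHH", data[:8]
    )
    return Vrrpv3Header(
        version=ver_type >> 4,               # upper 4 bits
        type=ver_type & 0x0F,                # lower 4 bits
        vrid=vrid,
        priority=priority,
        addr_count=count,
        max_adver_int_cs=rsvd_adver & 0x0FFF,  # lower 12 bits; upper 4 are rsvd
        checksum=checksum,
    )


# Invented sample: version 3, type 1, VRID 10, priority 120, one address,
# advertisement interval 100 centiseconds (1 second).
sample = bytes([0x31, 10, 120, 1]) + struct.pack("!HH", 100, 0xABCD)
header = parse_vrrpv3_header(sample)
```

Because Max Adver Int is carried in centiseconds, a value of 100 corresponds to one second.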
The major differences between VRRPv2 and VRRPv3 are described as follows:
l Authentication functions are different. VRRPv3 does not support authentication, but
VRRPv2 does.
l The units of the interval at which Advertisement packets are sent are different. VRRPv3
supports the interval in centiseconds, but VRRPv2 supports the interval in seconds.
3.1.3 Applications

3.1.3.1 VRRP Tracking Interface Status
Figure 3-9 Networking diagram for VRRP tracking interface status
[Figure labels: Internet, ATN-A, ATN-B, VRRP, Switch]
Problem: VRRP cannot detect status changes on interfaces on which VRRP is not enabled.
If such an outbound interface becomes faulty, VRRP does not detect the fault, and
services are interrupted.
The configuration is as follows:
l VRRP is enabled to track specified interfaces.
l A VRRP backup group tracks an interface in Increased mode or Reduced mode.
l When the status of the interface tracked by VRRP changes, the VRRP backup group is
notified of the change and then increases or decreases the VRRP priority, which
determines whether a VRRP switchover occurs.
As shown in Figure 3-9, VRRP is enabled on ATN-A and ATN-B, and the priority of the
VRRP backup group on ATN-B is higher than that on ATN-A. ATN-B tracks GE 1/0/0, its
interface connected to the Internet, in Reduced mode. ATN-B functions as the master
device and forwards user traffic, as shown by the dotted lines in Figure 3-9. If GE 1/0/0
on ATN-B becomes faulty, the VRRP backup group that tracks it in Reduced mode decreases
its priority. ATN-A then preempts the Master role, receives user traffic, and sends the
traffic to the Internet.
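The priority adjustment described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the priority values (100 and 120), and the decrement of 30 are all hypothetical, and tie-breaking between equal priorities is ignored.

```python
def effective_priority(configured, tracked_up, mode, delta):
    """Return the VRRP priority after applying interface tracking.

    mode is "reduced" or "increased"; delta is the configured adjustment.
    The result is clamped to 1..254, the configurable VRRP priority range.
    """
    if tracked_up:
        return configured
    if mode == "reduced":
        value = configured - delta
    elif mode == "increased":
        value = configured + delta
    else:
        raise ValueError("mode must be 'reduced' or 'increased'")
    return max(1, min(254, value))


def elect_master(priorities):
    """Pick the device with the highest priority (ties are not handled here)."""
    return max(priorities, key=priorities.get)


# Hypothetical values: ATN-B is configured with the higher priority.
before = {
    "ATN-A": 100,
    "ATN-B": effective_priority(120, tracked_up=True, mode="reduced", delta=30),
}
after = {
    "ATN-A": 100,
    "ATN-B": effective_priority(120, tracked_up=False, mode="reduced", delta=30),
}
master_before = elect_master(before)
master_after = elect_master(after)
```

With these numbers, the tracked interface going Down lowers the priority of ATN-B from 120 to 90, which is below that of ATN-A, so ATN-A takes over when preemption is enabled.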

3.1.3.2 mVRRP
Figure 3-10 Typical mVRRP networking
[Figure labels: UPE, NPE1, NPE2, Master, Backup, mVRRP, Service VRRP]
Problem: A large number of VRRP packets are transmitted, wasting bandwidth and CPU
resources.
To solve this problem, configure mVRRP.
l An mVRRP backup group and multiple ordinary VRRP backup groups are set up on
NPE 1 and NPE 2. The ordinary VRRP backup groups are bound to the mVRRP backup
group and function as service VRRP backup groups.
l The UPE does not sense the mVRRP backup group and service VRRP backup groups.
As shown in Figure 3-10, when an mVRRP backup group on NPE 1 changes from the Master
state to the Backup or Initialize state, the mVRRP backup group requests all its bound service
VRRP backup groups to change their state to Backup. In this case, the mVRRP backup group
on NPE 2 changes from the Backup state to the Master state, and all service VRRP backup
groups bound to it also change their status to Master. When the mVRRP backup group and the
service backup groups change to the Master state, they broadcast gratuitous ARP packets to
switch user traffic to the new master backup groups.
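The way service VRRP backup groups follow the mVRRP backup group can be modeled with a small sketch. The class MvrrpGroup and the group names are hypothetical; the sketch only tracks state and does not model Advertisement or gratuitous ARP packets.

```python
MAX_BOUND_GROUPS = 127  # binding limit stated for an mVRRP backup group


class MvrrpGroup:
    """Toy model of an mVRRP backup group on one device."""

    STATES = ("Initialize", "Backup", "Master")

    def __init__(self, state="Initialize"):
        if state not in self.STATES:
            raise ValueError("unknown state: " + state)
        self.state = state
        self.service_states = {}  # service VRRP group name -> state

    def _follower_state(self):
        # Bound groups are Master only while the mVRRP group is Master.
        # Backup or Initialize on the mVRRP group puts them in Backup.
        return "Master" if self.state == "Master" else "Backup"

    def bind(self, service_name):
        if len(self.service_states) >= MAX_BOUND_GROUPS:
            raise ValueError("at most 127 VRRP backup groups can be bound")
        self.service_states[service_name] = self._follower_state()

    def set_state(self, new_state):
        if new_state not in self.STATES:
            raise ValueError("unknown state: " + new_state)
        self.state = new_state
        for name in self.service_states:
            self.service_states[name] = self._follower_state()


npe1 = MvrrpGroup("Master")
npe2 = MvrrpGroup("Backup")
for group in ("service-1", "service-2"):
    npe1.bind(group)
    npe2.bind(group)

# Switchover: NPE 1 leaves the Master state, NPE 2 takes over.
npe1.set_state("Initialize")
npe2.set_state("Master")
```

Only the mVRRP group changes state on its own; every bound service group derives its state from it, which is why the service groups do not need to exchange VRRP packets.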
3.1.4 Terms, Acronyms and Abbreviations

Acronym & Abbreviation   Full Name
ARP                      Address Resolution Protocol
BFD                      Bidirectional Forwarding Detection
L2VPN                    Layer 2 virtual private network
ME                       Metro Ethernet
mVRRP                    Management Virtual Router Redundancy Protocol
mVSI                     management virtual switching instance
mVPLS                    management virtual private LAN service
PW                       pseudo wire
QinQ                     802.1Q in 802.1Q
VRRP                     Virtual Router Redundancy Protocol
VSI                      virtual switching instance
3.2 Bit-Error-Triggered Protection Switching
3.2.1 Introduction to Bit-Error-Triggered Protection Switching

Definition
A bit error refers to the deviation between a bit that is sent and the bit that is received. Cyclic
redundancy checks (CRCs) are commonly used to detect bit errors. Bit errors caused by line
faults can be corrected by rectifying the associated link faults. Random bit errors caused by
optical fiber aging or optical signal jitter, however, are more difficult to correct. Bit-error-
triggered protection switching is a reliability mechanism that triggers protection switching
based on bit error events (bit error occurrence event or correction event) to minimize bit error
impact.
Purpose
The demand for network bandwidth is rapidly increasing as mobile services evolve from
narrowband voice services to integrated broadband services, including voice and streaming
media. Meeting the bandwidth demand with traditional bearer networks dramatically raises
carriers' operation costs. To tackle the challenges posed by this rapid broadband-oriented
development, carriers urgently need mobile bearer networks that feature flexibility, low costs,
and high efficiency. IP-based mobile bearer networks are an ideal choice. IP radio access
networks (RANs), a type of IP-based mobile bearer network, are increasingly widely used.
Traditional bearer networks minimize bit error impact either through retransmission or by
having one end send multiple copies of each packet, of which the other end accepts only
one. IP RANs have higher reliability requirements than traditional bearer networks when
carrying broadband services, and traditional fault detection mechanisms cannot trigger
protection switching based on random bit errors. As a result, bit errors may degrade or,
in extreme cases, even interrupt services on an IP RAN.

To solve this problem, configure bit-error-triggered protection switching.
NOTE
To prevent impacts on services, check whether protection links have sufficient bandwidth resources
before deploying bit-error-triggered protection switching.
Benefits
Bit-error-triggered protection switching offers the following benefits:
l Protects traffic against random bit errors, meeting high reliability requirements and
improving service quality.
l Enables devices to record bit error events. These records help carriers locate the nodes or
lines that have bit errors and take corrective measures promptly.
3.2.2 Principles
Table 3-1 describes the functions provided by bit-error-triggered protection switching.
Table 3-1 Functions provided by bit-error-triggered protection switching

Function: Interface-based bit error detection
Description: This function detects bit errors, calculates the bit error rate (BER), and
reports bit error events.
Usage scenario: This function is the foundation for bit-error-triggered protection
switching.

Function: Bit-error-triggered trunk update
Description: This function detects bit error events on member trunk interfaces and uses
bit error events to trigger trunk interfaces to update the availability status of member
interfaces.
Usage scenario: This function protects trunk interfaces against bit errors.

Function: Bit-error-triggered section switching
Description: This function triggers routes to re-converge after detecting bit error events
on interfaces, which in turn triggers traffic to switch from the faulty route to another
route.
Usage scenario: This function protects services transmitted over a Label Distribution
Protocol (LDP) label switched path (LSP) against bit errors.

Function: Bit-error-triggered route switching
Description: This function triggers link cost adjustment after detecting a bit error event
on an OSPF or IS-IS interface. The link cost adjustment then triggers traffic to switch
between the primary and secondary links.
Usage scenario: In a bit-error-triggered LDP LSP switching scenario, if the primary and
secondary links of the LDP LSP both encounter bit error events, traffic transmitted over
the LDP LSP is interrupted. To prevent this problem, configure bit-error-triggered route
switching, so that IS-IS or OSPF interfaces can divert traffic to links with lower BERs by
adjusting link costs. This implementation minimizes the impact of bit error events without
interrupting services.

Function: Bit-error-triggered tunnel switching
– Bit-error-triggered Resource Reservation Protocol-Traffic Engineering (RSVP-TE) tunnel
switching
Description: triggers traffic to switch between the primary and backup LSPs of an
RSVP-TE tunnel after detecting bit error events.
Usage scenario: In a scenario in which an RSVP-TE tunnel carries services and a traffic
engineering (TE) hot standby tunnel is configured for the RSVP-TE tunnel, bit-error-
triggered RSVP-TE tunnel switching can protect services transmitted over the RSVP-TE
tunnel against bit errors.
– Bit-error-triggered pseudo wire (PW) switching
Description: protects Layer 2 virtual private network (L2VPN) services against bit errors
by switching L2VPN services from the faulty PW to another PW.
Usage scenario: In a scenario in which an RSVP-TE tunnel carries a PW and PW redundancy
is configured, if bit-error-triggered RSVP-TE tunnel switching cannot protect L2VPN
services against bit errors, you can use bit-error-triggered PW switching to do so.
– Bit-error-triggered VPN route switching
Description: protects L3VPN services against bit errors by switching L3VPN services from
the faulty VPN route to another VPN route.
Usage scenario: In an H-VPN scenario in which an RSVP-TE tunnel carries L3VPN services,
if bit-error-triggered RSVP-TE tunnel switching cannot protect L3VPN services against bit
errors, you can use bit-error-triggered VPN route switching to do so.
Related Concepts
Bit-error-triggered protection switching involves the following concepts:
l Bit error: refers to the deviation between a bit that is sent and the bit that is received.
l BER: is the number of bit errors divided by the total number of transferred bits during a
studied time interval. The BER can be considered as an approximate estimate of the bit
error probability.
l Segment BER: is calculated based on the bit errors received by the inbound interface on
an LSP node.
l LSP BER: is calculated based on the BER of each segment on an LSP.
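The BER definition above amounts to a simple ratio. The following sketch, with hypothetical function names and invented sample bytes, counts differing bits between a sent and a received byte string and divides by the number of transferred bits. Real interfaces infer errors from CRC failures rather than by comparing payloads; the comparison here is only for illustration.

```python
def count_bit_errors(sent: bytes, received: bytes) -> int:
    """Count bit positions where the received data differs from the sent data."""
    if len(sent) != len(received):
        raise ValueError("both byte strings must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(sent, received))


def bit_error_rate(error_bits: int, total_bits: int) -> float:
    """BER = number of bit errors / total number of transferred bits."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    return error_bits / total_bits


sent = b"\x0f\xff"
received = b"\x0e\xff"  # one flipped bit in the first byte
errors = count_bit_errors(sent, received)
ber = bit_error_rate(errors, len(sent) * 8)
```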
Interface-based Bit Error Detection

An interface uses the cyclic redundancy check (CRC) algorithm to detect bit errors and
calculate the BER. When the BER detected exceeds the maximum BER allowed by an
interface, the interface considers that a bit error occurrence event has occurred and sends an
excessive BER message using Bidirectional Forwarding Detection (BFD). When the BER
detected falls below the maximum BER allowed by the interface, the interface considers that a
bit error correction event has occurred and sends a normalized BER message using BFD.
The switching type configured on an interface can be:

l Trigger-section: applies to bit-error-triggered section switching.

l Link-quality: applies to bit-error-triggered route switching.
l Trigger-LSP: applies to bit-error-triggered RSVP-TE tunnel or PW switching.
Bit-Error-Triggered Trunk Update

If a member interface of a trunk interface enabled with bit error detection detects a bit error
event, the member interface changes its own status no matter whether the configured
switching type is trigger-LSP or trigger-section. The change of the member interface status
then triggers the trunk interface to update the availability status of the member interface.
A trunk interface goes Down when the number of member interfaces in the Up state falls
below the configured lower threshold.
Bit-Error-Triggered Section Switching

If the switching type configured on an interface is trigger-section, the bit error event detected
by this interface triggers the interface to change its status, which in turn triggers routes to re-
converge. Bit-error-triggered section switching protects services transmitted over an LDP LSP
against bit errors.
To deploy bit-error-triggered section switching, configure bit error detection on the interfaces
along an LDP LSP and configure the switching type as trigger-section. If an interface along
an LDP LSP detects a bit error event, the interface triggers route re-convergence, which in
turn triggers the LDP LSP to switch to another LSP.
Bit-Error-Triggered Route Switching

In a bit-error-triggered LDP LSP switching scenario, bit error detection is enabled on
interfaces along the primary and secondary links of the LDP LSP and the switching type of
each interface is set to trigger-section. If the BER detected by an interface reaches or exceeds
the bit error alarm reporting threshold, the status of the interface changes to Down. If the
primary and secondary links of the LDP LSP both have Down interfaces, traffic transmitted
over the LDP LSP is interrupted. To prevent this problem, configure bit-error-triggered route
switching. Then, if the BER detected by an OSPF or IS-IS interface reaches or exceeds the
upper threshold for triggering link quality changes, the OSPF or IS-IS interface changes its
link quality to low instead of changing its status to Down. An IS-IS or OSPF interface adjusts
the link cost based on link quality as follows:
l If the BER detected by the OSPF or IS-IS interface reaches or exceeds the bit error alarm
reporting threshold, the OSPF or IS-IS interface increases the link cost, so that this link,
which has a higher BER, is not used by the optimal route.
l If the BER detected by the OSPF or IS-IS interface falls below the bit error alarm
clearing threshold, the OSPF or IS-IS interface restores the link cost, so that this link,
which has a lower BER, is used by the optimal route.
As a result, the LDP LSP always uses the link with a lower BER to transmit traffic,
minimizing the impact of bit errors on services.
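The cost adjustment can be sketched as a small state holder with two thresholds. The class name LinkQualityCost, the base cost, the penalty, and both threshold values are assumptions made for illustration; the source does not state how much the link cost is increased.

```python
class LinkQualityCost:
    """Track link quality from BER samples and derive the advertised cost."""

    def __init__(self, base_cost, penalty, alarm_threshold, clear_threshold):
        if clear_threshold > alarm_threshold:
            raise ValueError("clear threshold must not exceed alarm threshold")
        self.base_cost = base_cost
        self.penalty = penalty                  # hypothetical cost increase
        self.alarm_threshold = alarm_threshold  # bit error alarm reporting threshold
        self.clear_threshold = clear_threshold  # bit error alarm clearing threshold
        self.low_quality = False

    @property
    def cost(self):
        return self.base_cost + (self.penalty if self.low_quality else 0)

    def update(self, ber):
        """Apply one BER measurement and return the resulting link cost."""
        if ber >= self.alarm_threshold:
            self.low_quality = True    # raise the cost, link stays Up
        elif ber < self.clear_threshold:
            self.low_quality = False   # restore the original cost
        # Between the two thresholds the previous state is kept.
        return self.cost


link = LinkQualityCost(base_cost=10, penalty=1000,
                       alarm_threshold=1e-3, clear_threshold=1e-5)
costs = [link.update(ber) for ber in (1e-6, 2e-3, 1e-4, 1e-6)]
```

Keeping the previous state between the two thresholds avoids repeated cost changes when the BER hovers near a single value.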
NOTE
Bit-error-triggered route switching and section switching are mutually exclusive. Before you configure
bit-error-triggered route switching for an LDP LSP, ensure that bit-error-triggered section switching is
not configured.

Bit-Error-Triggered Tunnel Switching

Bit-error-triggered RSVP-TE tunnel switching
To deploy bit-error-triggered RSVP-TE tunnel switching, configure bit error detection on the
interfaces along an RSVP-TE tunnel, configure the switching type as trigger-LSP, and
configure bit-error-triggered protection switching on the tunnel interface. After an interface
along the primary or backup LSP of the RSVP-TE tunnel detects a bit error event, the RSVP-
TE tunnel enters the process shown in Figure 3-11.
Figure 3-11 Bit-error-triggered RSVP-TE tunnel switching

1. Each intermediary node sends the segment BER to the egress.
2. The egress calculates the LSP BER based on segment BERs.
3. The egress sends the LSP BER to the ingress.
4. The ingress determines LSP bit error status based on the LSP BER.
5. The RSVP-TE tunnel determines whether to perform a primary/backup LSP switchover.
Figure 3-12 shows how LSP bit error status is determined. If the BER of an LSP reaches or
exceeds the bit-error-triggered protection switching threshold of the RSVP-TE tunnel, the
LSP is in the excessive BER state. If the BER of the LSP is below the bit-error-triggered
protection switching threshold, the LSP is in the normalized BER state.
Figure 3-12 LSP bit error status schematic diagram

[Figure labels: BER; protection switching threshold; revertive switching threshold. Red
indicates the excessive BER state, and green indicates the normalized BER state.]
After the bit error status of the primary and backup LSPs is determined, the RSVP-TE
tunnel determines whether to perform a primary/backup LSP switchover based on the
following principles:
l If the primary and backup LSPs are both in the excessive BER state or both in the
normalized BER state, the RSVP-TE tunnel transmits traffic over the primary LSP.
l If one LSP is in the excessive BER state and the other LSP is in the normalized BER
state, the RSVP-TE tunnel transmits traffic over the latter one, no matter whether the
latter LSP is the primary or backup LSP.
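The selection principles above reduce to a short decision function. The function names and the sample threshold are hypothetical; the state classification follows the single-threshold rule given in the text.

```python
def lsp_state(ber, protection_threshold):
    """Classify an LSP from its BER using the protection switching threshold."""
    return "excessive" if ber >= protection_threshold else "normalized"


def select_lsp(primary_state, backup_state):
    """Return which LSP of the RSVP-TE tunnel carries traffic."""
    if primary_state == "excessive" and backup_state == "normalized":
        return "backup"
    # Both excessive, both normalized, or only the backup excessive.
    return "primary"


threshold = 1e-4  # hypothetical protection switching threshold
choice = select_lsp(lsp_state(5e-4, threshold), lsp_state(1e-6, threshold))
```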

Bit-Error-Triggered PW Switching
As shown in Figure 3-13, an RSVP-TE tunnel carries a PW. PW redundancy is configured to
provide service-level protection. If the RSVP-TE tunnel does not have a TE hot standby
tunnel or the primary and backup LSPs of the RSVP-TE tunnel are both in the excessive BER
state, bit-error-triggered RSVP-TE tunnel switching cannot protect traffic against bit errors.
To resolve this issue, you can configure bit-error-triggered PW switching.
Figure 3-13 Usage scenario of bit-error-triggered PW switching
[Figure labels: VPN Site, CE, UPE, SPE1, SPE2, NPE, PW1, PW2, Bypass PW, RSVP-TE Tunnel]
The principles for bit-error-triggered PW switching are as follows:

l When the tunnel carrying the primary PW enters the excessive BER state but the tunnel
carrying the secondary PW is in the normalized BER state, traffic switches to the
secondary PW.
l When the tunnel carrying the primary PW enters the normalized BER state, traffic
switches back to the primary PW.
l If the tunnels carrying the primary and secondary PWs are both in the excessive BER
state, traffic travels along the primary PW.
NOTE
The bit error status of the tunnel carrying the PW refers to the bit error status of the LSP that transmits
traffic in the tunnel.
You can configure a revertive switching policy to control revertive PW switching. When the tunnel
carrying the primary PW enters the normalized BER state, the revertive switching policy allows traffic
to immediately switch back to the primary PW, to switch back to the primary PW after a delay, or not to
switch back to the primary PW.
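The PW switching principles and the revertive switching policy can be combined in one sketch. The function name, the policy names ("immediate", "delayed", "none"), and the timing parameters are hypothetical. The sketch also assumes that the revertive policy alone governs the return to the primary PW once its tunnel is normalized, which the text does not state explicitly.

```python
def next_active_pw(active, primary_excessive, secondary_excessive,
                   policy="immediate", normalized_for=0.0, delay=0.0):
    """Return "primary" or "secondary" as the PW that should carry traffic.

    normalized_for: seconds the primary tunnel has been in the normalized
    BER state; delay: configured revertive delay in seconds.
    """
    if primary_excessive:
        # Switch away only if the secondary tunnel is healthy.
        return "primary" if secondary_excessive else "secondary"
    # From here on, the tunnel carrying the primary PW is normalized.
    if active == "primary" or policy == "immediate":
        return "primary"
    if policy == "delayed":
        return "primary" if normalized_for >= delay else "secondary"
    if policy == "none":
        return "secondary"
    raise ValueError("unknown revertive switching policy: " + policy)


switched = next_active_pw("primary", True, False)
held = next_active_pw("secondary", False, False,
                      policy="delayed", normalized_for=5, delay=30)
```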
Bit-Error-Triggered VPN Route Switching

Figure 3-14 shows a scenario in which RSVP-TE tunnels carry L3VPN services. The UPE is
dual-homed to SPE1 (master device) and SPE2 (backup device). On SPE1, the VPN route
destined for the UPE is iterated to the RSVP-TE tunnel between SPE1 and the UPE. In
normal circumstances, the UPE and NPE preferentially select the VPN routes advertised by
SPE1.
If the RSVP-TE tunnel from the UPE to SPE1 does not have a TE hot standby tunnel or its
primary and backup LSPs are both in the excessive BER state, bit-error-triggered RSVP-TE
tunnel switching cannot protect L3VPN services against bit errors. To resolve this issue,
configure bit-error-triggered VPN route switching on SPE1. After the configuration is
complete, SPE1 automatically reduces the priority of VPN routes advertised by itself if it
detects a bit error event. Then, the UPE and NPE preferentially select the VPN routes
advertised by SPE2. As a result, L3VPN services are transmitted over the links without bit
errors. After the RSVP-TE tunnel between the UPE and SPE1 recovers, SPE1 automatically
increases the priority of VPN routes advertised by itself, so that the UPE and NPE
preferentially select the VPN routes advertised by SPE1 again.
Figure 3-14 Usage scenario of bit-error-triggered VPN route switching

[Figure labels: VPN Site, CE, UPE, SPE1, SPE2, NPE, Backbone, RSVP-TE Tunnel]
3.2.3 Applications
3.2.3.1 Application of Bit-Error-Triggered Protection Switching in a Scenario in Which an RSVP-TE Tunnel Carries a PW
Networking Description
Figure 3-15 shows a typical IP radio access network (RAN) networking diagram. The IP
RAN uses a Resource Reservation Protocol-Traffic Engineering (RSVP-TE) tunnel to carry a
pseudo wire (PW). A traffic engineering (TE) hot standby tunnel is configured for the RSVP-
TE tunnel to provide link-level protection. PW redundancy is configured to provide service-
level protection.

Figure 3-15 IP RAN using an RSVP-TE tunnel to carry a PW

[Figure labels: NodeB, RNC, BNC, Access, Aggregation, TE Hotstandby, PW1 (primary),
PW2 (secondary), PW3, PW4, Bypass PW]
Feature Deployment
To meet the high reliability requirements of the IP RAN and better protect services against bit
errors, configure bit-error-triggered protection switching for both the RSVP-TE tunnel and
PWs.
To configure bit-error-triggered protection switching for the RSVP-TE tunnel, enable bit error
detection on the interfaces along the primary and backup LSPs, configure the switching type
as trigger-LSP, and configure bit error alarm thresholds. Then, enable bit-error-triggered
protection switching on the tunnel interface and set the bit-error-triggered protection
switching threshold and bit-error-triggered revertive switching threshold.
After you configure bit-error-triggered protection switching for the RSVP-TE tunnel,
configure bit-error-triggered protection switching for the PW carried over the RSVP-TE
tunnel and the backup PW.
3.2.3.2 Application of Bit-Error-Triggered Protection Switching in a Scenario in Which an LDP LSP Carries a PW
Figure 3-16 shows a typical IP radio access network (RAN) networking diagram. The IP
RAN uses a Label Distribution Protocol (LDP) label switched path (LSP) to carry a pseudo
wire (PW). A protection mechanism, such as LDP fast reroute (FRR) or LDP-Interior
Gateway Protocol (IGP) synchronization, is used to provide link-level protection. PW
redundancy is configured to provide service-level protection.

Figure 3-16 IP RAN using an LDP LSP to carry L2VPN services

[Figure labels: NodeB, RNC, BNC, Access, Aggregation, LDP LSP, PW1 (primary),
PW2 (secondary), PW3, PW4, Bypass PW]
Feature Deployment
To meet the high reliability requirements of the IP RAN and protect services against bit
errors, configure bit-error-triggered protection switching for the LDP LSP. To do so, enable
bit error detection on the interfaces along the LDP LSP, configure the switching type as
trigger-section, and configure bit error alarm thresholds. After an interface along the LDP
LSP detects a bit error event, the interface updates its own status and triggers route re-
convergence, which in turn triggers LDP LSP or PW switching (or revertive LDP LSP or PW
switching).
3.2.3.3 Application of Bit-Error-Triggered Protection Switching on Trunk Interfaces
Figure 3-17 shows a trunk interface networking diagram.
Figure 3-17 Trunk interface networking diagram

Trunk
Feature Deployment
To improve trunk reliability, you can configure bit error detection for trunk interfaces. To do
so, enable bit error detection on each member interface and then on the trunk interface itself.
After bit error detection is configured for a trunk interface, a member interface changes its
status when detecting a bit error event, no matter whether the configured switching type is
trigger-LSP or trigger-section. Then, the trunk interface updates the availability status of the
member interface:
l If the status of the member interface changes from Up to Down, the trunk interface
disables the member interface from forwarding traffic.
l If the status of the member interface changes from Down to Up, the trunk interface
enables the member interface to forward traffic.

A trunk interface goes Down when the number of member interfaces in the Up state falls
below the configured lower threshold.
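The trunk behavior described above can be sketched as follows. The function name, the interface names, and the threshold value are hypothetical.

```python
def trunk_status(member_status, lower_threshold):
    """Derive the trunk state and its forwarding members.

    member_status maps a member interface name to "Up" or "Down".
    The trunk goes Down when fewer members than lower_threshold are Up.
    """
    forwarding = sorted(name for name, state in member_status.items()
                        if state == "Up")
    state = "Down" if len(forwarding) < lower_threshold else "Up"
    return state, forwarding


members = {"GE0/2/0": "Up", "GE0/2/1": "Up", "GE0/2/2": "Up"}
state_ok, fwd_ok = trunk_status(members, lower_threshold=2)

# A bit error event takes two members Down.
members["GE0/2/1"] = "Down"
members["GE0/2/2"] = "Down"
state_bad, fwd_bad = trunk_status(members, lower_threshold=2)
```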

Terms
Term        Description
Bit error   A bit error refers to the deviation between a bit that is sent and the bit
            that is received. Cyclic redundancy checks (CRCs) are commonly used to
            detect bit errors.

Acronym & Abbreviation   Full Name
CRC                      cyclic redundancy check
PW                       pseudo wire
3.3 BFD
3.3.1 Overview
Purpose
Bidirectional forwarding detection (BFD) rapidly monitors communications faults between
systems and notifies upper-layer applications of those faults.
Description
To minimize the impact of a fault on services and improve network availability, a network
device must rapidly detect communications faults between adjacent devices so that the upper
layer protocol can resolve the issue and recover services.
The existing detection mechanisms are as follows:
l Hardware detection: For example, Synchronous Digital Hierarchy (SDH) alarms are
used to detect link faults. Hardware detection can detect a fault quickly; however, not
all media support this mechanism.
l Slow Hello: Usually refers to the Hello mechanism used by a routing protocol. The slow
Hello mechanism takes seconds to detect a fault, which is too slow in many cases. For
example, in high-speed gigabit data transmission, a detection time of more than one
second results in a large amount of data loss, and delay-sensitive services, such as
voice, cannot tolerate a delay of more than one second.
l Other detection mechanisms: Different protocols or manufacturers may provide their
own proprietary detection mechanisms; however, proprietary detection mechanisms
are difficult to deploy across different systems.
BFD has been developed to supplement other detection mechanisms.
BFD provides the following features:
l Low-cost fast fault detection for channels between adjacent forwarding engines. Faults
can be detected on interfaces, data links, and forwarding engines.
l A single mechanism capable of real-time detection over any media, at any protocol layer.
3.3.2 Key Concepts

BFD detects communications faults between forwarding engines, specifically the connectivity
of a data protocol on a path between systems. The path can be a physical link, a logical link,
or a tunnel.
BFD can be regarded as a service provided by the system.
l Upper layer applications provide BFD with parameters, such as the detection address and
the detection time.
l BFD creates, deletes, or modifies a BFD session according to this information and
notifies the upper layer applications of the session status.
BFD offers the following features:
l Low-cost, fast detection of path faults between adjacent forwarding engines

l A single mechanism capable of detection over any media, at any protocol layer,
facilitating an integrated detection mechanism.
The following sections describe basic BFD concepts, including the BFD detection
mechanism, detected link types, BFD session modes, and session management.
BFD Detection Mechanism

In the BFD detection mechanism, two systems set up a BFD session, and periodically send
BFD control packets along the path between them. If one system does not receive BFD
control packets within a specified period, the system concludes that a fault has occurred on
the path.
BFD control packets are encapsulated in UDP packets. In the initial phase of a BFD session,
both systems negotiate with each other using parameters in BFD control packets, such as
discriminators, expected minimum intervals for sending and receiving BFD control packets,
and local BFD session status. When negotiations are successful, the two systems send BFD
control packets to each other at the negotiated intervals.
To meet fast detection requirements, the BFD draft specifies that the intervals for sending
and receiving BFD control packets are expressed in microseconds. However, due to limited
processing capabilities, BFD-enabled devices of most manufacturers can process BFD
control packets only at millisecond granularity. Therefore, the configured interval is
expressed in milliseconds and is converted to microseconds during internal processing.
The minimum detection time that the ATN supports is 10 milliseconds.
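The unit conversion and the resulting timers can be sketched as follows. The text does not spell out the negotiation rule, so this example follows the asynchronous-mode rules in RFC 5880 as an assumption; the function names and all interval values are hypothetical.

```python
def ms_to_us(milliseconds):
    """Convert a configured interval in milliseconds to microseconds."""
    return milliseconds * 1000


def negotiated_tx_interval_us(local_desired_min_tx_us, remote_required_min_rx_us):
    """Interval at which the local system sends BFD control packets."""
    return max(local_desired_min_tx_us, remote_required_min_rx_us)


def detection_time_us(remote_detect_mult, local_required_min_rx_us,
                      remote_desired_min_tx_us):
    """Time without received packets after which the session is declared Down."""
    return remote_detect_mult * max(local_required_min_rx_us,
                                    remote_desired_min_tx_us)


# Hypothetical configuration: local intervals 100 ms, remote intervals 50 ms,
# remote detection multiplier 3.
tx_us = negotiated_tx_interval_us(ms_to_us(100), ms_to_us(50))
detect_us = detection_time_us(3, ms_to_us(100), ms_to_us(50))
```

In this example the slower side dictates the pace: packets are sent every 100 ms, and the session is declared Down after 300 ms without a received packet.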

BFD provides the following detection modes:

l Asynchronous mode: This is the main mode. Two systems periodically send BFD control
packets to each other. If one system fails to receive several consecutive packets, the
BFD session is considered Down.
l Query mode: If a system has many BFD sessions, periodically sending BFD control
packets can consume significant system resources. To prevent this, you can use query
mode. In query mode, after a BFD session is set up, the system does not periodically
send BFD control packets, but detects connectivity through another mechanism (such
as the Hello mechanism of a routing protocol or the hardware detection mechanism),
reducing the system resources used by the BFD session.
The Echo function is an auxiliary function of the two modes. When the Echo function is
activated, the local system sends a BFD control packet, and the remote system sends the
packet back through the forwarding channel. If several consecutive Echo packets are not
received, the BFD session is declared Down. The Echo function can work in asynchronous
or query mode.
Types of Links That Can Be Detected by BFD
Table 3-2 Types of links detected by BFD

Link type: IP links
Classification:
l Layer 3 physical interfaces
l Ethernet sub-interfaces (including Eth-Trunk sub-interfaces)
Description: If a physical Ethernet interface has multiple sub-interfaces, BFD sessions
can be separately established on the physical Ethernet interface and its sub-interfaces.

Link type: IP-Trunks
Classification:
l IP-Trunk links
l IP-Trunk member links
Description: Separate BFD sessions can be established to detect link faults on an
IP-Trunk and its member interfaces at the same time.

Link type: Eth-Trunks
Classification:
l Layer 2 Eth-Trunk links
l Layer 2 Eth-Trunk member links
l Layer 3 Eth-Trunk links
l Layer 3 Eth-Trunk member links
Description: Separate BFD sessions can be established to detect link faults on an
Eth-Trunk and its member interfaces at the same time.

Link type: VLANIF
Classification:
l VLAN Ethernet member links
l VLANIF interfaces
Description: Separate BFD sessions can be established to detect link faults on a VLANIF
interface and its member interfaces at the same time.

Link type: MPLS LSPs
Classification:
l In static mode, BFD can detect the following types of LSPs:
– Static LSPs
– LDP LSPs
– TE tunnels, static CR-LSPs bound to tunnels, and RSVP CR-LSPs
l In dynamic mode, BFD can detect the following types of LSPs:
– LDP LSPs
– Static CR-LSPs bound to tunnels and RSVP CR-LSPs
– LDP Tunnel
Description:
l BFD can detect a TE tunnel that uses CR-Static or RSVP-TE as its signaling protocol
and detect the primary LSP bound to the TE tunnel.
l A dynamic BFD session cannot detect the entire TE tunnel.

Link type: PWs
Classification:
l SS PWs
l MS PWs
l BGP PWs
Description: -
l IP links
In the ATN, BFD in either single-hop detection mode or multi-hop detection mode can
monitor the following IP links:
– Layer 3 physical interfaces
– Ethernet sub-interfaces (including Eth-Trunk sub-interfaces)
– MLPPP
When a physical Ethernet interface has several sub-interfaces, BFD sessions can be
established on the physical Ethernet interface and each of its sub-interfaces.
l Eth-Trunk
– Layer 2 Eth-Trunk links
– Layer 3 Eth-Trunk links
l VLANIF
– VLAN Ethernet member links
– VLAN Ethernet sub-interfaces
– VLANIF interfaces
BFD sessions used to detect a VLANIF interface and VLAN member interfaces are
independent from each other and can detect these interfaces at the same time.
l MPLS LSP
To detect Multiprotocol Label Switching label switched path (MPLS LSP) connectivity,
BFD session negotiation is performed in the following modes:
– Static: BFD session negotiation is performed through the manually configured local
discriminator and remote discriminator of a BFD session.
– Dynamic: BFD session negotiation is performed through the BFD discriminator
type-length-value (TLV) in an LSP ping packet.
LSP types detected by static BFD sessions are as follows:
– Static LSPs
– Label Distribution Protocol (LDP) LSPs
– Traffic Engineering (TE) tunnels, and the static constraint-based routed LSPs (CR-LSPs)
and Resource Reservation Protocol (RSVP) CR-LSPs that are bound to the tunnels
BFD can detect a TE tunnel that uses the CR-Static or RSVP-TE signaling protocol, and
detect the primary LSP bound to the TE tunnel.
The types of LSPs detected by dynamic BFD sessions are as follows:
– LDP LSPs
– Static CR-LSPs and RSVP CR-LSPs that are bound to TE tunnels
A dynamic BFD session cannot detect the entire TE tunnel.
l PWs
BFD for PW can be configured in two modes, that is, static mode (the discriminator is
configured manually) and dynamic mode.
The types of PWs that BFD can detect are as follows:
– Single-segment PWs
– Multi-segment PWs
BFD Session Modes

A BFD session can be set up in static mode or dynamic mode, as described in Table 3-3.
BFD differentiates sessions by the My Discriminator and Your Discriminator fields in control
packets. The main difference between static and dynamic session establishment is how the
values of My Discriminator and Your Discriminator are obtained.
Table 3-3 BFD session establishment modes

BFD Session Establishment Mode: Static mode
Description: BFD session parameters, such as the local and remote discriminators, are
manually configured and delivered for BFD session establishment.
NOTE
In static mode, configure unique local and remote discriminators for each BFD session. This
prevents incorrect discriminators from affecting BFD sessions that have correct
discriminators and prevents BFD sessions from alternating between Up and Down.

BFD Session Establishment Mode: Dynamic mode
Description: When a BFD session is dynamically established, the system processes the local
and remote discriminators as follows:
l Dynamically allocates the local discriminator. When a system triggers the dynamic
establishment of a BFD session, the system allocates a dynamic discriminator as the local
discriminator of the BFD session. Then, the system sends a BFD control packet with Your
Discriminator set to 0 to the peer for session negotiation.
l Automatically learns the remote discriminator. The local end of a BFD session sends a BFD
control packet with Your Discriminator set to 0 to the remote end. After the remote end
receives the packet, it checks whether the value of Your Discriminator in this packet is the
same as the value of its My Discriminator. If the value of Your Discriminator matches that
of My Discriminator, the remote end learns the value of My Discriminator of the local end
and obtains its Your Discriminator.
BFD Session Management

A BFD session has the following states:
l Down: indicates that the BFD session is in the Down state or has just been set up.
l Init: indicates that the local system can communicate with the remote system, and the
local system expects a BFD session to go Up.
l Up: indicates that the BFD session is set up successfully.
l AdminDown: indicates that the BFD session is in the administratively Down state.
The session status is conveyed in the State field of a BFD control packet. The system changes
the session status based on the local session status and the received session status of the peer.
When a BFD session is to be set up or deleted, the BFD state machine implements a three-
way handshake to ensure that both systems are aware of the status change.
Figure 3-18 shows the state transition process in establishment of a BFD session.

Figure 3-18 BFD session state transition
(Diagram: ATN A and ATN B exchange BFD control packets whose State field is Down, then Init,
and then Up. Each device moves from Down to Init and then from Init to Up.)
1. BFD configured on both ATN A and ATN B independently starts state machines. The
initial status of BFD state machines is Down. ATN A and ATN B send BFD control
packets with the State field set to Down. If BFD sessions are established in static mode,
the value of Your Discriminator in BFD control packets is manually specified. If BFD
sessions are established in dynamic mode, the value of Your Discriminator is set to 0.
2. After receiving a BFD control packet with the State field set to Down, ATN B switches
the session status to Init and sends a BFD control packet with the State field set to Init.
NOTE
After the local BFD session status of ATN B changes to Init, ATN B no longer processes the
received BFD control packets with the State field set to Down.
3. The BFD session status change of ATN A is the same as that of ATN B.
4. After receiving a BFD control packet with the State field set to Init, ATN B changes the
local session status to Up.
5. The BFD session status change of ATN A is the same as that of ATN B.
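The handshake steps above can be sketched as a small state machine. This is a minimal illustration only, not the device implementation: the class and function names are hypothetical, and the sketch deliberately omits timers, discriminators, and the AdminDown state.

```python
from enum import Enum


class State(Enum):
    DOWN = "Down"
    INIT = "Init"
    UP = "Up"


class BfdSession:
    """Toy BFD session that tracks only the local session state."""

    def __init__(self, name):
        self.name = name
        self.state = State.DOWN  # the initial status of the state machine is Down

    def receive(self, peer_state):
        """Update the local state from the State field of a received packet."""
        if self.state is State.DOWN:
            if peer_state is State.DOWN:
                self.state = State.INIT
            elif peer_state is State.INIT:
                self.state = State.UP
        elif self.state is State.INIT:
            # In Init, packets with the State field set to Down are ignored.
            if peer_state in (State.INIT, State.UP):
                self.state = State.UP
        elif self.state is State.UP:
            if peer_state is State.DOWN:
                self.state = State.DOWN  # the peer reports that the session is Down
        return self.state


def handshake(a, b):
    """Exchange packets until both ends are Up; return the list of state pairs."""
    trace = [(a.state, b.state)]
    while not (a.state is State.UP and b.state is State.UP):
        sent_a, sent_b = a.state, b.state  # State fields sent in this round
        a.receive(sent_b)
        b.receive(sent_a)
        trace.append((a.state, b.state))
    return trace
```

Calling `handshake(BfdSession("ATN A"), BfdSession("ATN B"))` walks both ends through Down, Init, and Up in two packet exchanges, which mirrors the sequence in Figure 3-18.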
3.3.2.1 BFD for IP

A BFD session is established on an IP link to quickly detect faults.
BFD can detect single-hop and multi-hop IP links.
l Single-hop BFD detects IP route connectivity between directly-connected systems. The
single hop refers to an IP hop. Between these two systems, only one BFD session can be
set up for a specified data protocol on an interface.
l Multi-hop BFD detects any paths between systems. A path may span multiple hops or
may partially overlap.
BFD for IP Applications

Example 1
Figure 3-19 shows a single-hop BFD session detecting a path between directly-connected
devices. The BFD session is bound to the outgoing interface.
Figure 3-19 Single-hop BFD for IP
(Diagram: a BFD session between the directly connected devices ATN and CX-B.)
Example 2
Figure 3-20 shows a multi-hop BFD session detecting a path between ATN and CX-C.
The BFD session is bound to the peer IP address but not the outgoing interface.
Figure 3-20 Multi-hop BFD for IP
(Diagram: a BFD session between ATN and CX-C, with CX-B on the path between them.)
3.3.2.2 BFD for PIS

BFD for process interface status (PIS) is a simple mechanism, in which the BFD session is
associated with the interface status. This improves the sensitivity of interfaces when detecting
link faults and minimizes the impact of faults on non-directly-connected links.
In BFD for PIS, after detecting a link fault, a BFD session immediately sends a Down
message to the corresponding interface. Then, the interface enters the BFD Down state, which
matches the link protocol Down state. An interface in BFD Down state processes only BFD
packets, so the interface can quickly detect link faults.
To configure BFD for PIS, configure a multicast BFD session and associate it with an
interface. In BFD for PIS, BFD packet forwarding is independent of the IP attributes on the
interface.
BFD for PIS Applications
Figure 3-21 BFD for PIS networking diagram
(Diagram: a BFD session between ATN and CX-B.)
As shown in Figure 3-21, a BFD session is established between ATN and CX-B. The BFD session
sends packets destined for the default multicast IP address through GE 1/0/0 to detect the
single-hop link. After BFD for PIS is enabled, when BFD detects a link fault, the BFD session
sends a Down message to the corresponding interface, and the interface then enters the BFD
Down state.

3.3.2.3 BFD for TTL

As defined in draft-ietf-bfd-multihop-04, single-hop BFD sessions use port 3784 as the
destination port and multi-hop BFD sessions use port 4784 as the destination port. In
compliance with this draft, BFD for TTL supports port 4784 as the destination port for
multi-hop BFD packets.
To interwork with devices running earlier versions, port 3784 can be used as the destination
port for both single-hop and multi-hop BFD packets. In this case, the TTL value carried in
received packets is used to distinguish single-hop sessions from multi-hop sessions. By
default, the destination port for both single-hop and multi-hop BFD packets is port 3784.
BFD control packets are encapsulated in UDP packets, with a source port in the range of
49152 to 65535 and destination port 3784 or 4784.
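The port and TTL rules above can be sketched as a small classifier. This is an illustrative sketch only: the function names are hypothetical, and the TTL threshold is an assumption. This document does not state which TTL values mark a single-hop packet; the sketch assumes that single-hop packets arrive with TTL 255, as in the generalized TTL security mechanism used by single-hop BFD.

```python
SINGLE_HOP_PORT = 3784
MULTI_HOP_PORT = 4784
SOURCE_PORT_RANGE = range(49152, 65536)  # 49152 to 65535 inclusive

# Assumption (not stated in this document): a single-hop packet arrives with TTL 255.
SINGLE_HOP_TTL = 255


def is_valid_source_port(port):
    """Check that the UDP source port lies in the range used by BFD control packets."""
    return port in SOURCE_PORT_RANGE


def classify_session(dst_port, ttl):
    """Classify a received BFD control packet as single-hop or multi-hop."""
    if dst_port == MULTI_HOP_PORT:
        return "multi-hop"
    if dst_port == SINGLE_HOP_PORT:
        # Port 3784 may carry both session types, so fall back to the received TTL.
        return "single-hop" if ttl == SINGLE_HOP_TTL else "multi-hop"
    raise ValueError(f"not a BFD control port: {dst_port}")
```

For example, a packet to port 3784 with TTL 255 is treated as single-hop, whereas a packet to port 3784 with a lower TTL is treated as multi-hop.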
Application Environment
Typical Application 1
Figure 3-22 shows that a BFD session detects a single-hop path between devices and the BFD
session is bound to the outgoing interface.
Figure 3-22 Networking diagram for single-hop BFD for IP
(Diagram: a BFD session between the directly connected devices ATN and CX-B.)
Typical Application 2
Figure 3-23 shows that a BFD session detects a multi-hop path between ATN and CX-C and
the BFD session is bound to the peer IP address but not the outgoing interface.
Figure 3-23 Networking diagram of multi-hop BFD for IP
(Diagram: a BFD session between ATN and CX-C, with CX-B on the path between them.)

3.3.2.4 Introduction to BFDv6

Bidirectional forwarding detection (BFD) for IPv6, also called BFDv6, has similar functions
to BFD for IPv4 (BFDv4). Both quickly detect a communication failure between systems and
notify upper-layer applications of the failure.
Comparisons between BFDv6 and BFDv4

The following table compares BFDv6 and BFDv4 in terms of the functions they support.
Features BFDv6 BFDv4
IP link Supported Supported
Static route Supported Supported
OSPF Not supported Supported
OSPFv3 Supported Not supported
BGP Supported Supported
IS-IS Supported Supported
PST Not supported Supported
PIS Not supported Supported
PW Not supported Supported
TE Not supported Supported
LSP Not supported Supported
DHCP Not supported Supported
BFDv6 and Routing Protocol Association

A link failure or a change of the network topology may lead to rerouting. To improve the
availability of the network, routing protocols must be able to converge in as little time as
possible once a fault occurs. One feasible solution is to implement faster failure sensing and
notification.
With BFDv6 and routing protocol association, after a new neighbor relationship is set up
based on the routing protocol, a BFDv6 session is dynamically established to detect the link
between the neighbors. After detecting a link failure, BFDv6 notifies the routing protocol of
the failure. In this manner, faster convergence is achieved. If the neighbor relationship is
Down, the BFDv6 session is deleted dynamically.
Currently, the following protocols can be associated with BFDv6:

l OSPFv3
l BGP4+
l IS-IS

BFDv6 for Static Routes

Static routes do not provide a detection mechanism. This means that when a fault occurs on a
network, administrator intervention is required.
After BFDv6 is associated with static routes, a BFDv6 session detects the status of an IPv6
static route on a public network. The routing management module judges whether a static
route is available according to the associated BFD session status.
Types of Links That Can Be Detected by BFDv6

BFDv6 can detect IPv6 links. Similar to BFDv4, BFDv6 supports both single-hop sessions
and multi-hop sessions.
BFDv6 supports the following types of interfaces:
l Layer 3 physical interfaces
l Ethernet sub-interfaces (including Eth-Trunk sub-interfaces)
l Eth-Trunk interfaces
l VLANIF interfaces
3.3.3 Application Environment
3.3.3.1 BFD for USR

BFD for Unicast Static Route (USR) is used to detect IPv4 USRs. After a BFD session is
bound to an IPv4 USR, link failures can be detected more quickly.
Unlike dynamic routing protocols, USRs do not have a detection mechanism. If a fault occurs
on a network, an administrator needs to handle it manually. In BFD for USR, BFD sessions
are bound to IPv4 USRs in a public network and are used to detect the link status of the IPv4
USR.
Each BFD session is bound to a single IPv4 USR. When a BFD session detects a fault (for
example, the link changes from Up to Down) on a link of the USR, BFD reports the fault to
the routing management module (RM). The RM then sets the USR to "inactive", indicating that
the route is unavailable, and deletes it from the IP routing table.
When the BFD session bound to the USR is successfully set up or the link of the USR
recovers from the fault (that is, the link changes from Down to Up), BFD reports the event to
the RM. The RM then sets the USR to "active", indicating that the route is available, and
adds it to the IP routing table.
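The interaction between BFD and the RM described above can be sketched as follows. This is a simplified model under stated assumptions: the class, its method names, the session name, and the documentation-range addresses are all hypothetical and do not correspond to device commands or internal interfaces.

```python
class RoutingManager:
    """Toy routing management module (RM) for BFD-tracked static routes."""

    def __init__(self):
        self.bindings = {}       # BFD session name -> (prefix, next hop)
        self.routing_table = {}  # prefix -> next hop; holds active routes only

    def bind(self, session, prefix, next_hop):
        """Bind one BFD session to exactly one static route."""
        if session in self.bindings:
            raise ValueError(f"session {session} is already bound to a route")
        self.bindings[session] = (prefix, next_hop)

    def on_bfd_event(self, session, state):
        """Activate or deactivate the bound route when BFD reports Up or Down."""
        prefix, next_hop = self.bindings[session]
        if state == "Up":
            self.routing_table[prefix] = next_hop  # route becomes active
        elif state == "Down":
            self.routing_table.pop(prefix, None)   # route becomes inactive

    def is_active(self, session):
        """Return True if the route bound to the session is in the routing table."""
        prefix, _ = self.bindings[session]
        return prefix in self.routing_table


rm = RoutingManager()
rm.bind("bfd-usr-1", "192.0.2.0/24", "198.51.100.1")
rm.on_bfd_event("bfd-usr-1", "Up")    # session set up: route is added
rm.on_bfd_event("bfd-usr-1", "Down")  # fault detected: route is deleted
```

Keeping only active routes in the table reflects the behavior described above, in which an inactive USR is removed from the IP routing table rather than merely flagged.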
3.3.3.2 BFD for OSPF

A link fault or change in topology may lead to rerouting in a network. Quick convergence of a
routing protocol is important for improving network availability. A feasible solution is to
detect the fault quickly and immediately notify the routing protocol of the fault.
In BFD for OSPF, OSPF is associated with a BFD session. The BFD session quickly detects a
link fault and notifies OSPF of the fault. In this manner, OSPF speeds up responses to changes
in network topology.
Table 3-4 shows convergence speed statistics when OSPF is and is not associated with a BFD
session.

Table 3-4 OSPF convergence speed statistics

Associated with BFD | Link Fault Detection Mechanism | Convergence Speed
No | OSPF Hello keepalive timer timeout | Within seconds
Yes | BFD session in the Down state | Within milliseconds
3.3.3.3 BFD for IS-IS

Generally, the interval at which the Intermediate System to Intermediate System (IS-IS)
protocol sends Hello messages is 10 seconds. If a device does not receive any Hello message
from its neighbor within three Hello intervals, the device deletes the neighbor. Therefore, it
takes a device up to 30 seconds (three 10-second Hello intervals) to detect that a neighbor is
Down. This leads to the loss of a large number of packets in a high-speed network.
In BFD for IS-IS, the establishment of a BFD session is dynamically triggered by IS-IS
rather than configured manually. When detecting a fault, the BFD session notifies IS-IS of the
fault through the Routing Management Module (RM). IS-IS processes the neighbor-Down event,
quickly sends the link state PDU (LSP), and performs the partial route calculation (PRC).
In this manner, IS-IS routes converge quickly.
The BFD fault detection interval is at the millisecond level. Instead of replacing the IS-IS
Hello mechanism, BFD works with IS-IS to detect the adjacency fault more quickly. In
addition, BFD instructs IS-IS to recalculate routes, ensuring correct packet forwarding.
The RM allows IS-IS and BFD to interact with each other. Through the RM, IS-IS instructs
BFD to dynamically set up or delete BFD sessions. The BFD event messages are also
delivered to IS-IS through the RM.
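The difference between the two detection times can be made concrete with a short calculation. The IS-IS values (10-second Hello interval, three intervals) come from the text above. The BFD formula is the standard asynchronous-mode rule from the BFD base specification, which this document does not spell out, and the BFD intervals used in the example are hypothetical values rather than device defaults.

```python
def isis_hold_time_s(hello_interval_s=10, hello_multiplier=3):
    """Time after which IS-IS deletes a neighbor that has stopped sending Hellos."""
    return hello_interval_s * hello_multiplier


def bfd_detection_time_ms(local_min_rx_ms, remote_min_tx_ms, remote_detect_mult):
    """Asynchronous-mode detection time as seen by the local system.

    The remote detect multiplier is applied to the negotiated interval, which is
    the larger of the local receive interval and the remote transmit interval.
    """
    return remote_detect_mult * max(local_min_rx_ms, remote_min_tx_ms)


# Hypothetical BFD timers: 10 ms intervals on both ends, detect multiplier 3.
isis_ms = isis_hold_time_s() * 1000
bfd_ms = bfd_detection_time_ms(10, 10, 3)
print(f"IS-IS Hello timeout: {isis_ms} ms, BFD detection time: {bfd_ms} ms")
```

With these assumed timers, BFD detects the failure three orders of magnitude sooner than the Hello timeout, which is why BFD complements rather than replaces the Hello mechanism.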
BFD for IS-IS Applications
Figure 3-24 BFD for IS-IS networking diagram
(Diagram: ATN-A, ATN-B, and ATN-C connected in sequence, with BFD sessions between them.)
After BFD is enabled on ATN-A, ATN-B, and ATN-C, the BFD session can quickly
detect faults on the link between ATN-A and ATN-B, and notify IS-IS through the RM.
Then, IS-IS sets the neighbor status to Down to trigger the IS-IS topology calculation. In
addition, IS-IS updates LSPs to ensure that ATN-C (ATN-B's neighbor) can receive the
updated LSPs from ATN-B in time. This implements fast network topology convergence.
3.3.3.4 BFD for BGP

The Border Gateway Protocol (BGP) periodically sends Keepalive messages to its peer to
monitor the neighbor status. This detection process lasts more than 1 second. When the data is
transmitted at gigabit rates, a large amount of data will be discarded, which cannot meet the
requirement for carrier-class reliability.
BFD for BGP was developed to compensate for this shortcoming. The BFD session can quickly
detect a fault on a link between BGP peers and notify BGP, ensuring fast convergence.
NOTE
By default, a multi-hop BGP session is established between Huawei devices that set up an IBGP
peer relationship. A BFD for IGP session and a BFD for IBGP session cannot both be set up
between a Huawei device and a non-Huawei device that by default sets up a single-hop BGP
session with its peer. In such a situation, setting up only a BFD for IGP session or a BFD for
IBGP session between the Huawei and non-Huawei devices is recommended.
BFD for BGP Applications
Figure 3-25 BFD for BGP networking
(Diagram: ATN-A in AS 100 and ATN-B in AS 200 are connected through EBGP, with a BFD
session between them.)
As shown in Figure 3-25, ATN-A belongs to AS 100 and ATN-B belongs to AS 200. ATN-A and
ATN-B are directly connected and run the External Border Gateway Protocol (EBGP). A BFD
session is established to detect the BGP neighbor relationship between ATN-A and ATN-B.
When the link between ATN-A and ATN-B is faulty, the BFD session can quickly detect the
fault and notify BGP.
3.3.3.5 BFD for LSP

A BFD session established on an LSP quickly detects faults on the LSP to provide end-to-end
protection for the LSP.
A BFD session can detect faults on the data plane of an MPLS LSP. When a BFD session
detects faults on a unidirectional LSP, the reverse path can be an IP link, an LSP, or a TE
tunnel.
To detect MPLS LSP connectivity, a BFD session is negotiated in the following modes:

l Static configuration: The negotiation of a BFD session is performed using the local
discriminator and remote discriminator that are configured manually.
l Dynamic establishment: The negotiation of a BFD session is performed using the BFD
discriminator TLV in LSP ping packets.
BFD detects the following types of LSPs:
l Static LSP
l LDP LSP
l Static CR-LSP
l Dynamic CR-LSP
BFD uses the asynchronous mode to detect LSP connectivity. That is, the ingress and the egress
periodically send BFD packets to each other. If the ingress or the egress does not receive BFD
packets from the other within the detection period, the LSP is considered Down and BFD
sends an LSP Down message to the LSP management module (LSPM).
BFD for LSP Applications
Figure 3-26 BFD for LDP LSP networking diagram
(Diagram: nodes PE1, P1, PE2, PE3, and CE2, with a BFD session on the LDP LSP from PE1
to PE2.)
As shown in Figure 3-26, only traffic from PE1 to CE2 is involved in BFD for LSP. When a
fault occurs on the link between PE1 and P1, PE1 can detect the fault through the interface,
and BFD for LDP LSP does not need to be configured. When a fault occurs on the link
between P1 and PE2, PE1 cannot detect the fault through the interface, and BFD for LDP LSP
must be configured to perform fast detection.
An LDP LSP destined for PE2 is set up on PE1. BFD for LDP LSP is enabled and a BFD
session is set up. Policies of Virtual Private Network fast reroute (VPN FRR) are configured
on PE1, and the path from PE1 to PE3 is configured as the protection path.
When a fault occurs on the link between PE1 and P1 or between P1 and PE2, PE1 quickly
detects the fault and triggers VPN FRR switching. Then, traffic sent to CE2 is switched to the
protection path from PE1 to PE3.

3.3.3.6 BFD for PST

When a BFD session detects a fault, the interface status in the port state table (PST) is
changed, which triggers FRR switchover. BFD for PST is applicable to the single-hop BFD
session that is bound to an interface.
BFD for PST is widely used in different types of FRR. In the BFD session that is bound to an
interface, the BFD session is associated with the PST of the interface. After the BFD session
detects that the link is Down, the corresponding bit in the PST of the interface is set to Down,
which triggers FRR switchover.
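The association between a BFD session and the PST can be illustrated with a minimal model. The class, the dictionary-based path records, the next-hop addresses, and the second interface name are hypothetical; the sketch only shows the idea that forwarding consults a per-interface state bit that BFD updates.

```python
class PortStateTable:
    """Toy port state table (PST): one Up/Down bit per interface."""

    def __init__(self, interfaces):
        self._up = {name: True for name in interfaces}

    def on_bfd_state(self, interface, session_up):
        """Called when the BFD session bound to the interface changes state."""
        self._up[interface] = session_up

    def is_up(self, interface):
        return self._up[interface]


def select_path(pst, primary, backup):
    """FRR selection: use the primary path unless its interface bit is Down."""
    return primary if pst.is_up(primary["interface"]) else backup


pst = PortStateTable(["GE 1/0/0", "GE 1/0/1"])
primary = {"interface": "GE 1/0/0", "next_hop": "192.0.2.1"}
backup = {"interface": "GE 1/0/1", "next_hop": "192.0.2.5"}

before = select_path(pst, primary, backup)
pst.on_bfd_state("GE 1/0/0", session_up=False)  # BFD detects that the link is Down
after = select_path(pst, primary, backup)
```

Because the switchover decision reads only the PST bit, no route recalculation is needed at the moment of failure, which is what makes this association suitable for FRR.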
3.3.3.7 BFD for TE

BFD for TE is an MPLS TE end-to-end fast detection mechanism used to quickly detect faults
along the link through which an MPLS TE tunnel passes.
Traditional detection mechanisms, including RSVP Hello and RSVP Srefresh (summary
refresh) mechanisms, detect faults at slow speeds. BFD, however, uses a fast packet
transmission mode and quickly detects faults in MPLS TE tunnels, triggering fast switchover
of services.
BFD detects faults on the following types of TE tunnels:
l Static BFD for CR-LSP
A BFD session detects faults on a CR-LSP. The BFD session needs to be configured
manually.
l Static BFD for TE
A BFD session detects the entire TE tunnel and triggers traffic switchover for
applications, such as VPN FRR.
l Dynamic BFD for CR-LSP
Dynamic BFD for CR-LSP functions the same as static BFD for CR-LSP, except that
BFD session establishment is dynamically triggered.
l Dynamic BFD for RSVP
BFD for RSVP is used to detect RSVP neighbor relationships. When a Layer 2 device
exists between adjacent RSVP nodes, the nodes can use only the Hello mechanism to
detect link faults, which takes seconds. This results in the loss of a large amount of
data. BFD for RSVP, on the other hand, detects faults between adjacent RSVP nodes
within milliseconds. BFD for RSVP is used on TE FRR networks where Layer 2 devices
exist between the point of local repair (PLR) and its RSVP neighbor on the primary path.
The main difference between BFD for TE and BFD for CR-LSP is the object that BFD
notifies when a fault occurs. In BFD for TE, BFD notifies applications, such as VPNs, of faults
and triggers a traffic switchover between different tunnel interfaces. In BFD for CR-LSP, BFD
notifies TE tunnels of faults and triggers a traffic switchover between different CR-LSPs in
the same TE tunnel.
In BFD for LSP, BFD is bound to an LSP, and a BFD session is set up between the ingress
and egress of the LSP. A BFD packet is sent by the ingress and forwarded to the egress, and
then the egress responds to the BFD packet. In this manner, the BFD session on the ingress
can quickly detect a fault on the path through which the LSP passes.
After a link fault is detected, BFD notifies the LSP management module. Then, traffic is
switched to the backup LSP and a new BFD session is set up between the ingress and the
egress of the backup LSP.

Figure 3-27 BFD for LSP networking
(Diagram: two panels, Before Switchover and After Switchover, showing the primary LSP, the
backup LSP, and the BFD session.)
As shown in Figure 3-27, a BFD session detects faults on the link through which the primary
LSP passes. When a fault occurs on the link of the primary LSP, the BFD session on the
ingress notifies the LSPM of the fault. Then, the ingress switches traffic to the backup LSP,
and a new BFD session is set up along the link through which the backup LSP passes to
detect the link status.
BFD for TE Applications

Figure 3-28 shows the networking of BFD for TE, with hot standby and tunnel protection.

Figure 3-28 BFD for TE networking
(Diagram: R1 and R2 are connected through P1, P2, and P3. Legend: Primary Tunnel, Backup
Tunnel, Primary LSP, Backup LSP.)
l Switchover between the primary tunnel and the hot-standby LSP

As shown in Figure 3-28, a primary tunnel and a hot-standby tunnel are set up between
R1->P1->R2. A BFD session is set up from R1 to R2 to detect faults on the primary
tunnel. When a fault occurs on the primary LSP, BFD quickly notifies R1. After receiving
the fault information, R1 switches traffic to the hot-standby LSP to ensure normal traffic
transmission.
l Switchover between the primary tunnel and the backup tunnel
As shown in Figure 3-28, the primary tunnel is established along the path R1->P2->R2,
and the backup tunnel is established along the path R1->P3->R2. A BFD session is set
up along the path R1->P2->R2 to detect faults on the primary tunnel. When a fault
occurs on the primary tunnel, BFD quickly notifies R1 of the fault. After receiving the
fault information, R1 switches traffic to the backup tunnel to ensure normal traffic
transmission.
3.3.3.8 BFD for PW

BFD for pseudo wire (PW) is a mechanism used to rapidly detect and report faults in an
L2VPN.
L2VPN can use BFD for PW to rapidly detect faults on tunnels or PWs between two PEs and
trigger service switchover in case of a fault, reducing the impact of link failures on services.
BFD for PW is classified into the following types:
l Static BFD for PW: performed in either TTL or non-TTL mode.

NOTE
The TTL mode indicates that the TTL value is variable (automatically calculated or manually set);
the non-TTL mode indicates that the TTL value is fixed at 255.
– BFD for PW in TTL mode: BFD packets are encapsulated into PW packets and
transmitted over a PW regardless of whether the PW is in control word mode or
non-control word mode.

n BFD for single-segment PW: A BFD session is used to detect an end-to-end
single-segment PW. Based on the destination IP address and TTL value (the
TTL value for a single-segment PW is 1) carried in BFD packets, the source
and destination of a PW perform the BFD session negotiation and exchange
BFD packets to rapidly detect the PW.
n BFD for multi-segment PW: A BFD session is used to detect a multi-segment
PW. It is required that an IP address be specified as the destination address of
the multi-segment PW to be detected. BFD packets can reach the destination
after traversing multiple SPEs regardless of whether the PW is in control word
mode or non-control word mode.
n BFD for unicast VPLS: BFD can detect a single-segment PW or a multi-
segment PW formed by connecting a VPLS PW to a VLL PW. Different from
BFD for multicast VPLS, BFD for unicast VPLS can detect any PW on a
VPLS network.
– BFD for PW in non-TTL mode: BFD packets are encapsulated into PW packets and
transmitted over a PW. This mode requires that the PW work in control word mode
and distinguish control packets from data packets based on the control word field in
the packet. Currently, BFD for PW in non-TTL mode is used to detect an end-to-
end single-segment PW.
l Dynamic BFD for PW: It can be performed only in non-TTL mode. Dynamic BFD for
PW requires that the PW be enabled with the control word function. The status change of
a PW (Up or Down) will trigger the establishment or deletion of a BFD session. If a PW
to be detected goes Up, the local device informs the BFD module of its neighbor
information and detection parameters for establishing a BFD session to detect the PW.
Fault Detection and Notification Mechanism
Figure 3-29 Networking diagram for the AC fault detection and notification mechanism
(Diagram: Node B (CE1) connects to PE1, and RNC (CE2) connects to PE2. Labels: AC fault,
OAM detection, OAM mapping, BFD, OAM mapping, OAM notification.)
On the network shown in Figure 3-29, if the AC interface connecting CE1 to PE1 becomes
faulty:
1. AC OAM on PE1 detects the AC interface failure.

2. PE1 finds the PW mapped to the AC interface based on OAM mappings.
3. The BFD session on PE1 transparently sends the OAM notification message to PE2.

4. After PE2 receives the OAM notification message, if a secondary PW exists between
PE1 and PE2, traffic switches to the secondary PW; if no secondary PW exists between
PE1 and PE2, PE2 sends the message to CE2 through the AC interface found on the
basis of OAM mappings.
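The four steps above can be sketched as two small functions, one per PE. This is a hedged illustration: the function names, the message format, and the interface and PW identifiers are hypothetical, and the OAM mappings are modeled as plain dictionaries.

```python
def pe1_on_ac_fault(ac_interface, ac_to_pw):
    """Steps 1 to 3 on PE1: map the faulty AC interface to its PW and build
    the notification that the BFD session carries to PE2."""
    pw = ac_to_pw[ac_interface]  # OAM mapping: AC interface -> PW
    return {"pw": pw, "event": "AC fault"}


def pe2_on_notification(message, secondary_pw, pw_to_ac):
    """Step 4 on PE2: switch to the secondary PW if one exists; otherwise
    notify the CE through the AC interface mapped to the PW."""
    if secondary_pw is not None:
        return ("switch-to-secondary-pw", secondary_pw)
    return ("notify-ce", pw_to_ac[message["pw"]])  # OAM mapping: PW -> AC interface


# Hypothetical identifiers for the mappings on PE1 and PE2.
message = pe1_on_ac_fault("ac-to-ce1", {"ac-to-ce1": "pw-primary"})
with_backup = pe2_on_notification(message, "pw-secondary", {"pw-primary": "ac-to-ce2"})
without_backup = pe2_on_notification(message, None, {"pw-primary": "ac-to-ce2"})
```

In the first call PE2 switches traffic to the secondary PW, and in the second call it forwards the notification toward CE2 over the mapped AC interface.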
Applications
As shown in Figure 3-30, the link UPE1-> SPE1-> UPE2 is the primary PW and the link
UPE1-> SPE2-> UPE2 is the secondary PW. A BFD session is established between UPE1 and
UPE2 to detect multi-segment PWs from UPE1 to UPE2. If the BFD session detects a fault in
the primary PW between UPE1 and UPE2, traffic is rerouted from the primary PW to the
secondary PW.
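Figure 3-30 Networking diagram for the configuration of a static BFD session to detect the
multi-segment PW
(Diagram: UPE1 connects to UPE2 through SPE1 and through SPE2. Each path consists of a
VPLS (VLL) segment and a VLL segment. A BFD session runs between UPE1 and UPE2.)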
Abbreviation
IS-IS Intermediate System to Intermediate System
VC Virtual Circuit
AC Attachment Circuit
PE Provider Edge Router
CE Customer Edge Router
OSPF Open Shortest Path First
TE Traffic Engineering
CSPF Constrained Shortest Path First
PW Pseudo Wire
MPLS Multiprotocol Label Switching
3.4 NSR Overview

Only devices with two main control boards (such as ATN 950Bs) support the ISSU feature. This
section describes how to implement NSR and related technologies.
3.4.1 Introduction
NSR is a reliability technology that maintains the neighbor relationships of a device
during the active/standby switchover of main control boards on the device.
Non-Stop Forwarding (NSF) and Non-Stop Routing (NSR) are two solutions to High
Availability (HA).
l NSF: ensures that forwarding services are not interrupted during the active/standby
switchover of main control boards by using the protocol-specific GR mechanism.
– When a fault occurs in the system, forwarding services are not interrupted during
the active/standby switchover of main control boards.
– After the device recovers, it can re-establish neighbor relationships with other
devices, and then rebuild the routing table based on the information obtained from
its neighbors.
For details about the GR configuration, see the chapter "GR Configuration" in the
Configuration Guide - Reliability.
l NSR: ensures that route processing is not interrupted on the control plane and the
forwarding plane during the active/standby switchover of main control boards by using
the backup mechanism of a related protocol.
During the active/standby switchover of main control boards on a device, the route
processing is not interrupted because of the following factors:
– No neighbor or topology information is lost.
– No neighbor relationship goes Down.
The advantages of NSR are as follows:
– NSR on the local device does not depend on or affect the remote device. Therefore,
the local and remote devices can communicate properly.
– The route convergence speed of NSR is higher than that of NSF.

Table 3-5 Comparison between NSR and NSF

Feature: NSF
Advantage:
l Less information needs to be backed up from the active main control board to the standby
main control board. Only the following information needs to be backed up:
– Changes of configuration information
– Interface status
– Information about neighbors, protocol status, sending and receiving of packets, routes,
and topologies
l During the active/standby switchover of main control boards, the forwarding plane can
still provide forwarding services.
– During the GR process of a protocol, topologies and routes may change, causing
forwarding errors. This occurs with a very low probability.
Disadvantage:
l Interworking problem: Different vendors may use different methods to implement NSF due
to immature standards.
l Depending on neighbor status: A device needs the help of its neighbor to complete GR. If
all devices become faulty at the same time due to a software defect, NSF cannot work
properly.
l Low convergence speed: If an NSF-enabled device becomes faulty and does not restart, the
remote device mistakenly considers that the NSF-enabled device will restart after a while
and does not delete the routing information learned from it until the set recovery time
expires. Therefore, routing information may be incorrect during this period of time.

Feature: NSR
Advantage:
l When the control plane becomes faulty, the forwarding plane can still provide forwarding
services.
l During the active/standby switchover of main control boards, the route processing is not
interrupted because of the following factors:
– No neighbor or topology information is lost.
– No neighbor relationship goes Down.
l The active/standby switchover of main control boards is relevant to only the local device.
– The active/standby switchover of main control boards on the local device does not
depend on or affect the remote device. Therefore, the local device and the remote device
can communicate properly.
– The route convergence speed of NSR is higher than that of NSF.
Disadvantage:
l More bandwidth is required and more system resources are consumed.
NSR and GR
A device that performs active/standby switchovers of main control boards supports two HA
protection mechanisms: NSR and GR. NSR and GR are mutually exclusive for a specific
protocol. However, after NSR is deployed on a device, the device can still be configured as a
GR helper to help its neighbors complete GR, which improves service reliability on all nodes
of a network.
3.4.2 NSR Features Supported by the ATN

This section lists the protocols that support NSR.
Currently, the protocols that support NSR are as follows:
l IS-IS
l OSPF
l BGP
l IPv4 L3VPN
l RSVP
l LDP

l BFD
3.5 Ethernet OAM
3.5.1 Introduction
Definition
Ethernet operation, administration and maintenance (OAM) is for use on Ethernet networks.
Ethernet OAM provides the following functions:
l Fault management
– Ethernet OAM enables a device to send detection packets, either on demand or
periodically, to monitor network connectivity.
– Ethernet OAM uses methods similar to Packet Internet Groper (PING) and
traceroute used on IP networks to diagnose faults on Ethernet networks.
– Ethernet OAM can work with a protection switching protocol to trigger a device or
link switchover if a connectivity fault is detected. Switchovers help networks
achieve carrier-class reliability by keeping network interruptions to 50 milliseconds
or less.
l Performance management
Performance management is usually implemented at the attachment circuit (AC)
interface and measures the packet loss ratio, delay, and jitter during packet transmission.
It also collects statistics on various types of traffic. By using performance management
tools in a network management system (NMS), carriers can monitor the network running
status, diagnose faults, and check whether the network forwarding capability complies
with the service level agreement (SLA) that has been signed with users.
Ethernet OAM effectively improves the manageability and maintainability of Ethernet
networks and ensures network stability.
Purpose
Since its appearance, Ethernet has gradually become the major local area network (LAN)
technology owing to its easy implementation and low costs. With the application of gigabit
Ethernet (GE) and 10 gigabit Ethernet (10GE) technologies in recent years, Ethernet has been
applied in metropolitan area networks (MANs) and wide area networks (WANs).
Ethernet was originally developed for LANs, which have lower requirements for
reliability and stability than MANs and WANs. As a result, Ethernet lacks a native OAM
mechanism, which hinders its use on ISP networks. Implementing OAM on Ethernet has
therefore become a trend.
3.5.2 Principles
Ethernet OAM is classified as link- or network-level Ethernet OAM.

Figure 3-31 Typical MAN networking
[Figure labels: Node B, ATN, CE, UPE, PE-AGG, CX600, BRAS, SOHO, commercial centre, residential area, IP/MPLS, intranet core. Coverage shown: EFM OAM (802.3ah) on the Ethernet in the first mile; Ethernet CFM (802.1ag) on the access and convergence layers of the MAN; backbone network.]
Link-Level Ethernet OAM

Link-level Ethernet OAM technologies, such as Ethernet in the first mile OAM (EFM OAM)
defined in IEEE 802.3ah, provide functions including link connectivity detection, link fault
monitoring, remote fault notification, and remote loopback for two directly connected
devices. On the network shown in Figure 3-31, link-level Ethernet OAM is applied between
customer edges (CEs) and provider edges (PEs), and ensures reliable and stable connections
between customer and provider networks.
A CE is an edge device of a customer network and connects the customer network to a
provider network. Different from a CE, a PE is an edge device of a provider network and
connects the provider network to a customer network.
EFM OAM provides point-to-point fault detection on the link between two directly connected
devices.

Network-Level Ethernet OAM

Ethernet connectivity fault management (Ethernet CFM), which complies with IEEE 802.1ag,
is a network-level Ethernet OAM technology that provides functions including end-to-end
(E2E) connectivity fault detection, fault notification, fault acknowledgement, and fault
locating. Ethernet CFM can monitor network connectivity and locate connectivity faults, and
can work with protection switching techniques to improve network reliability.
OAM Fault Association

OAM fault association is used to transmit fault information between detection protocols, such
as between EFM OAM and Ethernet CFM or between Ethernet OAM and Bidirectional
Forwarding Detection (BFD). Along with wider applications of detection protocols, such as
Ethernet OAM, BFD, and MPLS OAM, the OAM fault association module will be applied in
more scenarios.
3.5.2.1 EFM OAM

EFM OAM provides peer discovery, link monitoring, fault notification, and remote loopback
functions.
Peer Discovery
The EFM OAM working mode is an attribute of the interface on which EFM OAM is
enabled. EFM OAM has two working modes: active mode and passive mode. The default
EFM OAM working mode of an interface is the active mode.
Before configuring EFM OAM on an interface, configure a working mode for the interface:
l If the active mode is configured, the interface initiates the peer discovery process. When
EFM OAM is enabled and the interface initiates the peer discovery process, the interface
and its peer interface enter the EFM OAM discovery phase.
l If the passive mode is configured, the interface does not initiate the peer discovery
process. If both interfaces work in passive mode, neither of them initiates discovery,
so no EFM OAM session can be negotiated. In addition, interfaces in passive mode
cannot initiate remote loopback requests or variable requests.
Figure 3-32 Peer discovery
[Figure: Interface 1 (active) initiates OAM discovery by sending an OAM PDU to interface 2 (passive); interface 2 responds to OAM discovery with an OAM PDU.]
On the network shown in Figure 3-32, the EFM OAM working modes of interfaces 1 and 2
are active and passive, respectively. After EFM OAM is enabled on interface 1, the peer
discovery process is as follows:
1. Interface 1 sends an OAM protocol data unit (OAM PDU) to interface 2. This OAM
PDU carries the EFM OAM configuration of interface 1.
2. After receiving the OAM PDU, interface 2 compares its EFM OAM configuration with
that of interface 1 and then responds with an OAM PDU. The OAM PDU sent from
interface 2 to interface 1 carries not only the EFM OAM configurations of both
interfaces 1 and 2, but also the Flags field, which indicates whether interface 2 is
satisfied with the EFM OAM configuration of interface 1.
Figure 3-33 shows the OAM PDU format.
Figure 3-33 OAM PDU format
OAM PDU fields, in order:
l Destination MAC = 01-80-C2-00-00-02
l Source MAC
l Slow protocol type = 88-09
l Subtype = 03
l Flags
l Code
l Data/Pad (carries the local TLV, the remote TLV, and further TLVs)
l Frame Check Sequence
Fields of a TLV carried in the Data/Pad field:
l TLV Type
l TLV Length
l OAM Version Number
l OAM Revision Number
l State Field
l OAM Configuration
l OAM PDU configuration
l OUI
l Vendor Specific Info
Figure 3-34 Description of the OAM configuration
Name: OAM Configuration
Bit | Field | Description
7:5 | Reserved | Set to 0 in the local information TLV.
4 | Variable retrieval | 1 = DTE supports sending OAM PDUs in response to variable requests. 0 = DTE does not support sending OAM PDUs in response to variable requests.
3 | Link event | 1 = DTE supports parsing link events. 0 = DTE does not support parsing link events.
2 | OAM remote loopback | 1 = DTE can be configured with OAM remote loopback. 0 = DTE cannot be configured with OAM remote loopback.
1 | Sending OAM PDUs | 1 = DTE can send OAM PDUs when the receive link is not working. 0 = DTE cannot send OAM PDUs when the receive link is not working.
0 | OAM mode | 1 = DTE is configured to work in active mode. 0 = DTE is configured to work in passive mode.
3. After receiving the OAM PDU from interface 2, interface 1 compares its EFM OAM
configuration with that of interface 2 to check whether their configurations match.
After the preceding process is complete, interfaces 1 and 2 enter the Detect state if their EFM
OAM configurations match. In the Detect state, the two interfaces periodically send OAM
PDUs to maintain their neighbor relationship. If their EFM OAM configurations do not
match, the two interfaces remain in the Discovery state and keep sending OAM PDUs for
status negotiation until the negotiation is successful or EFM is disabled on either or both of
the interfaces.
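The bit layout of the OAM configuration field in Figure 3-34 can be illustrated with a short decoder. This is a minimal sketch, not device code: the names OamConfig, decode_oam_config, and discovery_can_start are hypothetical, and the only behavior assumed is what the text states (bit positions 0 to 4, and that a passive interface never initiates discovery).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OamConfig:
    active_mode: bool        # bit 0: OAM mode (1 = active, 0 = passive)
    sends_oam_pdus: bool     # bit 1: sending OAM PDUs
    remote_loopback: bool    # bit 2: OAM remote loopback
    link_events: bool        # bit 3: link event
    variable_response: bool  # bit 4: responses to variable requests


def decode_oam_config(octet: int) -> OamConfig:
    """Decode the one-octet OAM configuration field (bits 7:5 are reserved)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("the OAM configuration field is one octet")
    return OamConfig(
        active_mode=bool(octet & 0x01),
        sends_oam_pdus=bool(octet & 0x02),
        remote_loopback=bool(octet & 0x04),
        link_events=bool(octet & 0x08),
        variable_response=bool(octet & 0x10),
    )


def discovery_can_start(local: OamConfig, peer: OamConfig) -> bool:
    """Discovery starts only if at least one side is in active mode."""
    return local.active_mode or peer.active_mode
```

For example, decoding 0x0D yields an active-mode interface that supports remote loopback and link events, and two interfaces decoded from 0x0C (both passive) cannot start discovery.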
Link Monitoring
After link monitoring is configured, the system queries physical-layer statistics from the
interface management module to check the link quality of an interface. If, within a specified
period, the number of errored frames, errored codes, or errored frame seconds detected on
an interface reaches or exceeds a specified threshold, the link on which the interface resides is
considered faulty. The local device generates an alarm, reports the alarm to an NMS, and sends
an OAM PDU to notify the remote device of the link fault. An errored frame second is a 1-second
interval during which at least one errored frame is detected.
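The errored frame second check can be sketched as follows. This is an illustrative sketch only: it assumes the per-second errored frame counts for one monitoring period are already available as a list, and the function names and the threshold parameter are hypothetical.

```python
from typing import Sequence


def errored_frame_seconds(errored_frames_per_second: Sequence[int]) -> int:
    """Count the 1-second intervals that contain at least one errored frame."""
    return sum(1 for count in errored_frames_per_second if count > 0)


def link_is_faulty(errored_frames_per_second: Sequence[int], threshold: int) -> bool:
    """The link is considered faulty when the count reaches or exceeds the threshold."""
    return errored_frame_seconds(errored_frames_per_second) >= threshold
```

With the samples [0, 3, 0, 1, 0], two of the five seconds are errored frame seconds, so a threshold of 2 reports a fault and a threshold of 3 does not.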
Fault Notification
Faults that can be reported include protocol packet timeout, physical link faults, and OAM
module transmission faults.
l If a protocol packet times out or a physical link fails, the fault event is logged and
reported to an NMS.
l If a transmission fault occurs on the OAM module, the fault event is logged and reported
to an NMS.
If a reverse link is reachable, an OAM PDU is sent to notify the peer of the fault. After
receiving the OAM PDU, the peer logs and reports the fault event to an NMS.
l If the EFM OAM module is associated with other modules, such as BFD or Ethernet CFM,
the OAM fault association module notifies the associated modules of the fault.
Remote Loopback
In Figure 3-35, when the local interface sends non-OAM PDUs to the remote interface, the
remote interface sends the non-OAM PDUs back to the local interface instead of forwarding
them to their destination addresses. This is called remote loopback.
Remote loopback can be used to locate link faults and test link quality. In remote loopback
mode, the local interface sends test packets to the remote interface. The local device then
calculates communication quality parameters (such as the packet loss ratio) of the current link
based on the number of packets sent and received.

Figure 3-35 Remote loopback
[Figure: Interface 1 (active) sends non-OAM PDUs to interface 2 (passive), and the data flow is looped back to interface 1.]
Only interfaces in active mode can initiate remote loopback. Remote loopback can be enabled
on an interface only when the interface is in active mode and both this interface and its remote
peer are in the Detect state. The remote loopback process is as follows:
1. The local interface sends a loopback request to the remote interface and waits for a reply.
2. After receiving the loopback request from the local interface, the remote interface sends
a loopback reply to the local interface and enters the remote loopback state.
3. If the local interface receives the loopback reply within 2 seconds, it enters the remote
loopback state. If the local interface does not receive a loopback reply within 2 seconds,
it retransmits a loopback request to the remote interface. An interface can retransmit a
loopback request a maximum of three times.
To stop remote loopback, the local interface sends the remote interface a message for
disabling remote loopback. After receiving this message, the remote interface exits the
loopback state.
To prevent service interruptions caused by users forgetting to stop remote loopback, remote
loopback is automatically disabled after a timeout period. This timeout period is configurable.
After remote loopback times out, the local interface automatically sends the remote interface a
message to disable remote loopback.
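The request and retransmission behavior described above can be sketched as a small loop. This is a sketch under stated assumptions: send_request and wait_for_reply are hypothetical callbacks supplied by the caller, and only the 2-second reply timeout and the limit of three retransmissions come from the text.

```python
from typing import Callable, Tuple

REPLY_TIMEOUT_S = 2.0      # wait this long for a loopback reply
MAX_RETRANSMISSIONS = 3    # retransmissions allowed after the first request


def request_remote_loopback(
    send_request: Callable[[], None],
    wait_for_reply: Callable[[float], bool],
) -> Tuple[bool, int]:
    """Return (entered_loopback, requests_sent).

    wait_for_reply(timeout) must return True if a loopback reply arrived
    within the timeout, and False otherwise.
    """
    requests_sent = 0
    for _ in range(1 + MAX_RETRANSMISSIONS):
        send_request()
        requests_sent += 1
        if wait_for_reply(REPLY_TIMEOUT_S):
            return True, requests_sent
    return False, requests_sent
```

If no reply ever arrives, the sketch sends four requests in total (the original request plus three retransmissions) and then gives up.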
Single-fiber Fault Detection

Optical interfaces work in full-duplex mode and therefore consider themselves Up as long as
they receive packets, even if they fail to send packets. This causes the working status of the
interfaces to be inconsistent with the physical interface status.
As shown in Figure 3-36, optical interface A is directly connected to optical interface B. If
line 2 fails, optical interface B cannot receive packets and sets its physical status to Down.
Optical interface A can receive packets from optical interface B over line 1 and therefore
considers its physical status Up. If optical interface A sends packets to optical interface B, a
service interruption occurs because optical interface B cannot receive the packets.
Figure 3-36 Single-fiber fault detection
[Figure: Optical module A and optical module B are connected by two lines, line 1 and line 2.]
Single-fiber fault detection resolves the preceding issue.

If EFM detects a fault on an interface that is associated with EFM, the association function
enables the interface to go Down. The modules for Layer 2 and Layer 3 services can detect
the interface status change and trigger a service switchover. The working status and physical
status of the interface remain consistent, preventing a service interruption. After the fault is
rectified and EFM negotiation succeeds, the interface goes Up and services switch back.
Single-fiber fault detection prevents inconsistency between an interface's working status and
physical status and allows service modules to properly detect interface status changes.
3.5.2.2 Ethernet CFM

Ethernet CFM is a network-level Ethernet OAM technology that provides functions including
E2E connectivity fault detection, fault notification, fault acknowledgement, and fault locating.
Ethernet CFM can monitor network connectivity and locate connectivity faults, and can work
with protection switching techniques to improve network reliability.
Currently, IEEE 802.1ag has two versions: IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007.
Table 3-6 describes the differences between the two versions.
Table 3-6 Differences between IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007
Item | IEEE 802.1ag Draft 7 | IEEE Std 802.1ag-2007 | Remarks
Maintenance domain (MD) | Supported | Supported | The features and configurations supported by IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 are the same.
Default MD | Not supported | Supported | -
Maintenance association (MA) | Supported | Supported | The features and configurations supported by IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 are the same.
Maintenance association end point (MEP) | Supported | Supported | The features and configurations supported by IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 are the same.
Remote maintenance association end point (RMEP) | Supported | Supported | The features and configurations supported by IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 are the same.
Maintenance association intermediate point (MIP) | Supported | Supported | Both IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 support the same types of MIP creation rules: default, explicit, and none. The differences in MIP creation rules are as follows: in IEEE 802.1ag Draft 7, MIPs are created based on interfaces; in IEEE Std 802.1ag-2007, MIPs are created based on a configured MD or default MD.
Maintenance point (MP) | Supported | Supported | The features and configurations supported by IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 are the same.
Because IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 define packets in different
formats, only one version can be used if Ethernet CFM is required.
Basic Concepts
l MD
An MD is a network or a part of a network for which connectivity is managed by CFM.
Devices in an MD are managed by an Internet service provider (ISP) or carrier.
Each MD has a level, which ranges from 0 to 7. A larger value indicates a higher level.
802.1ag packets in low-level MDs cannot pass through high-level MDs, whereas 802.1ag
packets in high-level MDs can pass through low-level MDs.

In practice, a nested MD can monitor connectivity of the higher-level MD into which it
is nested. Level settings allow 802.1ag packets to transparently travel through a nested
MD.
For example, on the network shown in Figure 3-37, MD2 with a level set to 3 is nested
in MD1 with a level set to 6. 802.1ag packets must transparently pass through MD2 to
monitor MD1 connectivity. The level setting allows 802.1ag packets to pass through
MD2 to monitor MD1 connectivity, but prevents 802.1ag packets that monitor MD2
connectivity from passing through MD1.
Figure 3-37 MDs at different levels
[Figure: MD2 (Level=3) is nested in MD1 (Level=6).]
l Default MD
Each device can be configured with a single default MD with the highest priority
according to IEEE Std 802.1ag-2007. The default MD allows a high-level MD to detect
the internal topology of a low-level MD.
As shown in Figure 3-38, in the scenarios of MD nesting, devices with high-level MDs
configured may be the edge and intermediate devices of low-level MDs. When 802.1ag
packets in high-level MDs pass through low-level MDs, the packets are transparently
transmitted. If no default MD is configured and the internal topologies of low-level MDs
need to be detected, devices in low-level MDs must create MIPs with specified
priorities on specified interfaces to reply to devices in high-level MDs with loopback
reply (LBR) or linktrace reply (LTR) messages.
Figure 3-38 Default MD
[Figure: MD2 (Level=3) is nested in MD1 (Level=6); a MIP is marked in the figure.]
If default MDs with the same level as high-level MDs are configured on devices in low-
level MDs, MIPs are created based on default MDs to reply to requests sent by devices
in high-level MDs. CFM detects topology changes and monitors the connectivity of both
high- and low-level MDs.
The default MD must have a higher level than all MDs to which MEPs configured on the
local device belong. In addition, the default MD must have the same level as a high-level
MD. The default MD is used to transmit high-level continuity check messages (CCMs)
and create MIPs to send LTR messages.
IEEE Std 802.1ag-2007 states that one default MD can be configured on each device and
associated with multiple virtual local area networks (VLANs). VLAN interfaces can
automatically create MIPs based on default MDs.
NOTE
On a device with a default MD configured, the VLAN that has been associated with the default
MD must not be associated with an MA.
l MA
An MA is a part of an MD. An MD can be divided into one or more MAs. Ethernet CFM
detects connectivity faults in each MA.
On a provider network, a VLAN is generally mapped to a service instance (SI). MA
division helps detect connectivity faults on networks where an SI is transmitted.
The level of an MA is the level of the MD to which the MA belongs.
l MEP
As shown in Figure 3-39, a MEP is located at the edge of an MA.
A MEP is configured on an interface. The level of a MEP is the level of the MD to which
the MEP belongs.
Figure 3-39 MEPs and MIPs
[Figure: An MA with MEPs and MIPs marked.]
A MEP configured on an Ethernet CFM-enabled device is called a local MEP, whereas
MEPs configured on other devices in the same MA are called RMEPs.
MEPs are classified as inward- or outward-facing MEPs. An inward-facing MEP sends
802.1ag packets through all the interfaces (except the interface on which the MEP
resides) in the VLAN associated with an MA. That is, an inward-facing MEP broadcasts
802.1ag packets in the VLAN associated with an MA. An outward-facing MEP sends
802.1ag packets through the interface on which the MEP resides.

l MIP
As shown in Figure 3-39, a MIP is located within an MA.
A MIP is automatically created on an interface based on a specific rule. Table 3-7
describes the differences between IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 in
creating MIPs.
Table 3-7 Differences between IEEE 802.1ag Draft 7 and IEEE Std 802.1ag-2007 in creating MIPs
Rule for Creating a MIP | IEEE 802.1ag Draft 7 | IEEE Std 802.1ag-2007
Default | MIPs can be created on an interface without a higher-level MEP or a lower-level MIP. | MIPs can be created on an interface, to which a specified MD or the default MD belongs, without a higher-level MEP or a lower-level MIP.
Explicit | MIPs can be created on an interface with a lower-level MEP but without a higher-level MEP or a lower-level MIP. In the explicit rule, MIPs can be created only after a lower-level MEP has been configured on an interface. | MIPs can be created on an interface, to which a specified MD or the default MD belongs, with a lower-level MEP but without a higher-level MEP or a lower-level MIP. In the explicit rule, MIPs can be created only after a lower-level MEP has been configured on an interface.
None | MIPs cannot be created automatically. | MIPs cannot be created automatically.
The level of a MIP is determined by a creation rule and the level of the MD for creating
the MIP.
On the network shown in Figure 3-40, MD1 to MD5 are nested in MD7, and MD2 to
MD5 are nested in MD1. MD7 has a higher level than MD1 to MD5, and MD1 has a
higher level than MD2 to MD5. Multiple MEPs are created on ATN A in MD1, and the
MEPs belong to MDs at different levels.

Figure 3-40 MIP creation based on IEEE Std 802.1ag-2007
[Figure labels: MD7 (Level=7), MD1 (Level=6), MD2 (Level=5), MD3 (Level=4), MD4 (Level=3), MD5 (Level=2), VLAN1, VLAN2, and ATN A. On ATN A, MEP2 of MA2 is at level 5 and MEP3 of MA3 is at level 4.]
A default rule is configured on ATN A to create a MIP in MD1. The process for creating
a MIP is as follows:
a. ATN A compares MEP levels and finds the MEP at level 5, the highest level. The
level of a MEP is determined by the level of the MD to which the MEP belongs.
b. ATN A selects the MD at level 6, which is higher than the MEP at level 5.
c. ATN A creates a MIP at level 6.
If MDs at level 6 or higher do not exist, no MIP can be created.
If a MIP at level 1 already exists on ATN A, a MIP at level 6 cannot be created.
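Steps a to c and the two restrictions that follow them can be sketched as a level-selection function. This is a simplified illustration, not the device's algorithm: the function name and parameters are hypothetical, and the choice of the lowest qualifying MD level when several MDs qualify is an assumption made for the sketch.

```python
from typing import Iterable, Optional


def default_rule_mip_level(
    md_levels: Iterable[int],
    mep_levels: Iterable[int],
    existing_mip_levels: Iterable[int] = (),
) -> Optional[int]:
    """Return the level of the MIP to create, or None if no MIP can be created."""
    # Step a: find the highest MEP level on the device (-1 if there is no MEP).
    highest_mep = max(mep_levels, default=-1)
    # Step b: keep only MDs whose level is higher than every MEP level.
    candidates = sorted(level for level in md_levels if level > highest_mep)
    if not candidates:
        return None  # no MD at a high enough level exists
    level = candidates[0]  # assumption: pick the lowest qualifying level
    # A MIP that already exists at a lower level blocks creation.
    if any(mip < level for mip in existing_mip_levels):
        return None
    # Step c: the MIP is created at the selected MD level.
    return level
```

With an MD at level 6 and MEPs at levels 5 and 4, the sketch returns 6. It returns None when only MDs at level 5 or lower exist, or when a MIP at level 1 already exists.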
l MP
MEPs and MIPs are called MPs.
Continuity Check
Ethernet CFM enables MEPs to periodically send continuity check messages (CCMs) to one
another to check the continuity between them. This check is called continuity check (CC).

Figure 3-41 CC
[Figure: MEP1, MEP2, and MEP3 belong to the same MA. The CCMs sent by MEP1, MEP2, and MEP3 are shown as separate flows.]
l CCM generation
A MEP generates and sends CCMs. MEP1, MEP2, and MEP3 are in the same MA on
the network shown in Figure 3-41. After the function of sending CCMs is enabled,
MEP1, MEP2, and MEP3 send multicast CCMs to one another at the same interval.
Each CCM carries a level equal to the MEP level.
l MEP database establishment
Each Ethernet CFM-enabled device has a MEP database. A MEP database records
information about the local MEP and RMEPs in the same MA. The local MEP and
RMEPs are manually configured, and their information is automatically recorded in the
MEP database.
l Fault identification
If a MEP does not receive CCMs from its RMEP within a period that is 3.5 times the
interval at which CCMs are sent, the MEP considers the path to the RMEP faulty. If
OAM fault association is configured, the OAM module triggers the associated module to
react or triggers a switchover.
l CCM termination
A MEP terminates CCMs. If a MEP receives a CCM carrying a level higher than the
local level, it forwards this CCM. If the MEP receives a CCM carrying a level lower than
or equal to the local level, it does not forward this CCM, which ensures that CCMs in a
lower-level maintenance domain (MD) are not sent to a higher-level MD.
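The fault identification and CCM termination rules above reduce to two comparisons, sketched here. The function names and the use of seconds as the time unit are assumptions for illustration; the 3.5 multiplier and the level comparison come from the text.

```python
CCM_LOSS_MULTIPLIER = 3.5  # a path is faulty after 3.5 CCM intervals of silence


def rmep_path_faulty(last_ccm_rx_s: float, now_s: float, ccm_interval_s: float) -> bool:
    """True if no CCM has arrived from the RMEP for 3.5 times the CCM interval."""
    return (now_s - last_ccm_rx_s) >= CCM_LOSS_MULTIPLIER * ccm_interval_s


def mep_forwards_ccm(ccm_level: int, local_level: int) -> bool:
    """A MEP forwards only CCMs whose level is higher than its own level."""
    return ccm_level > local_level
```

For a 1-second CCM interval, a MEP that last heard from its RMEP 4 seconds ago considers the path faulty. A MEP at level 3 forwards a level-6 CCM but terminates CCMs at level 3 or lower.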
802.1ag MAC Ping

Similar to a ping operation, 802.1ag MAC ping checks whether a destination device is
reachable by sending test packets and receiving response packets. 802.1ag MAC ping, which
is triggered by a command, can be used to check whether a fault occurs between a device and
its destination device.

A MEP initiates an 802.1ag MAC ping test to monitor a path to a MEP or MIP destination
address. These nodes have the same level and they can share an MA or be in different MAs.
Figure 3-42 802.1ag MAC ping principle
[Figure labels: ATN A, ATN B, MEP1, MEP2, and MEP3. The LBM data flow and the LBR data flow are shown between MEP1 and MEP2.]
On the network shown in Figure 3-42, MEP1 initiates an 802.1ag MAC ping operation to
MEP2. The process is as follows:
1. MEP1 sends a loopback message (LBM) to MEP2. The LBM must carry either the host
MAC address or MEP ID of MEP2.
2. After receiving the LBM, MEP2 responds with a loopback reply (LBR). MEP1
calculates the round-trip time of the ping operation to analyze network performance.
Within a specified timeout period, if MEP1 does not receive an LBR from MEP2, MEP1
considers MEP2 unreachable; if MEP1 receives an LBR from MEP2, MEP1 calculates
the delay from MEP1 to MEP2 based on the timestamp carried in the LBR. In addition,
MEP1 can measure the frame loss ratio based on the difference between the number of
LBMs and the number of LBRs.
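The reachability and frame loss ratio checks described above can be sketched as a small summary function. The function name, the returned dictionary keys, and the representation of delays in milliseconds are hypothetical; the sketch assumes that only LBRs received within the timeout period are passed in.

```python
from typing import Dict, Optional, Sequence


def mac_ping_summary(lbms_sent: int, lbr_delays_ms: Sequence[float]) -> Dict[str, Optional[float]]:
    """Summarize a MAC ping run from the LBM count and the delays of received LBRs."""
    if lbms_sent <= 0:
        raise ValueError("at least one LBM must be sent")
    received = len(lbr_delays_ms)
    return {
        # The destination is unreachable if no LBR arrived in time.
        "reachable": received > 0,
        # Loss ratio from the difference between LBMs sent and LBRs received.
        "loss_ratio": (lbms_sent - received) / lbms_sent,
        "avg_delay_ms": sum(lbr_delays_ms) / received if received else None,
    }
```

For five LBMs and four LBRs, the loss ratio is (5 - 4) / 5.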
802.1ag MAC Trace

Similar to traceroute or tracert, 802.1ag MAC trace works by sending test packets and waiting
for a reply to test the path between a local device and a destination device and to locate faults.
A MEP initiates an 802.1ag MAC trace test to monitor a path to a MEP or MIP destination
address. These nodes have the same level and they can share an MA or be in different MAs.

Figure 3-43 802.1ag MAC trace principle
[Figure: MEP1 sends an LTM toward MEP2 through MIP1 and MIP2. MIP1, MIP2, and MEP2 each return an LTR to MEP1.]
On the network shown in Figure 3-43, MEP1 initiates an 802.1ag MAC trace operation to
MEP2. The process is as follows:
1. MEP1 sends MEP2 a linktrace message (LTM) carrying a time to live (TTL) value and
the MAC address of the destination MEP2.
2. After the LTM arrives at MIP1, MIP1 reduces the TTL value in the LTM by 1 and
forwards the LTM if the TTL value is not zero. MIP1 then replies with an LTR to MEP1.
The LTR carries forwarding information and the TTL value carried in the received LTM.
3. After the LTM reaches MIP2 and MEP2, the process described above for MIP1 is
repeated for MIP2 and MEP2. In addition, MEP2 finds that its MAC address is the
destination address carried in the LTM and therefore does not forward the LTM.
4. The LTRs from MIP1, MIP2, and MEP2 provide MEP1 with information about the
forwarding path between MEP1 and MEP2.
If a fault occurs on the path between MEP1 and MEP2, MEP2 or a MIP cannot receive
the LTM or reply with an LTR. MEP1 can locate the faulty node based on such a
response failure. For example, if the link between MEP1 and MIP2 works properly, but
the link between MIP2 and MEP2 is faulty, MEP1 can receive LTRs from MIP1 and
MIP2 but fails to receive an LTR from MEP2. MEP1 then considers the path between
MIP2 and MEP2 faulty.
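The fault-locating reasoning in the example above can be sketched as follows. This is an illustration only: it assumes the expected hop order is known in advance, which is a simplification, and the function and parameter names are hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple


def locate_faulty_segment(
    source: str,
    expected_path: Sequence[str],
    ltr_senders: Set[str],
) -> Optional[Tuple[str, str]]:
    """Return the (last responding node, first silent node) pair, or None if all replied."""
    last_ok = source
    for node in expected_path:
        if node not in ltr_senders:
            # The first node that did not return an LTR bounds the faulty segment.
            return last_ok, node
        last_ok = node
    return None
```

With the path MIP1, MIP2, MEP2 and LTRs received only from MIP1 and MIP2, the sketch reports the segment between MIP2 and MEP2.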
3.5.2.3 Basic Y.1731 Functions
Function Overview
Y.1731 can manage fault information and monitor performance.
l Fault management functions include continuity check (CC), loopback (LB), and linktrace
(LT). The principles of Y.1731 fault management are the same as those of CFM fault
management.
l Performance monitoring functions include single- and dual-ended frame loss
measurement, one- and two-way frame delay measurement, single-ended synthetic loss
measurement (SLM), and alarm indication signal (AIS). These functions apply to virtual
private LAN service (VPLS) networks, virtual leased line (VLL) networks, and virtual
local area networks (VLANs).

Table 3-8 Y.1731 functions
Function: Single-ended frame loss measurement
Function Description: Collects frame loss statistics to assess the quality of links between MEPs, independent of continuity check.
Function: Dual-ended frame loss measurement
Function Description: Collects frame loss statistics to assess link quality on CFM CC-enabled devices.
Principles (both frame loss measurement functions): To collect frame loss statistics, select either single- or dual-ended frame loss measurement:
l Dual-ended frame loss measurement provides more accurate results than the single-ended method. The interval between dual-ended frame loss measurements varies with the interval between CCM transmissions. The CCM transmission interval is shorter than the interval between single-ended frame loss measurements. The dual-ended method allows for a short interval between dual-ended frame loss measurements.
l Single-ended frame loss measurement can be used to minimize the impact of many CCMs on the network.
Function: One-way frame delay measurement
Function Description: Measures the network delay time on a unidirectional link between MEPs.
Function: Two-way frame delay measurement
Function Description: Measures the network delay time on a bidirectional link between MEPs.
Principles (both frame delay measurement functions): To measure the link delay time, select either one- or two-way frame delay measurement:
l One-way frame delay measurement can be used to measure the delay time on a unidirectional link between a MEP and its RMEP. The MEP must synchronize its time with its RMEP.
l Two-way frame delay measurement can be used to measure the delay time on a bidirectional link between a MEP and its RMEP. The MEP does not need to synchronize its time with its RMEP.
Function: AIS
Function Description: Detects server-layer faults and suppresses alarms, minimizing the impact on network management systems (NMSs).
Principles: AIS is used to suppress local alarms in scenarios where faults must be rapidly detected.
Function: Ethernet test (ETH-test)
Function Description: Measures the bandwidth throughput and code errors on links.
Principles: An ETH-test can measure the link bandwidth throughput and code errors on a newly established link. After the carrier leases this link to a user, the user also conducts an ETH-test to measure the link bandwidth throughput and code errors.
ETH-LM
Ethernet frame loss measurement (ETH-LM) enables a local MEP and its RMEP to exchange
ETH-LM frames to collect frame loss statistics on E2E links. ETH-LM modes are classified
as near-end ETH-LM or far-end ETH-LM.
Near-end ETH-LM applies to an inbound interface, and far-end ETH-LM applies to an
outbound interface on a MEP. ETH-LM counts the number of errored frame seconds to
determine the duration when a link is unavailable.
ETH-LM supports the following methods:

l Single-ended frame loss measurement
This method measures frame loss proactively or on demand.
– The on-demand measurement collects single-ended frame loss statistics once or
a specified number of times for diagnosis.
– The proactive measurement collects single-ended frame loss statistics periodically.
A local MEP sends a loss measurement message (LMM) carrying an ETH-LM request to
its RMEP. After receiving the request, the RMEP replies with a loss measurement reply
(LMR) carrying an ETH-LM response. Figure 3-44 illustrates the procedure for single-
ended frame loss measurement.
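Figure 3-44 Single-ended frame loss measurement
[Figure: CE1, CE2, and CE3 on one side and CE4, CE5, and CE6 on the other side are connected through PE1, P, and PE2. Y.1731 runs between PE1 and PE2, which exchange ETH-LMM and ETH-LMR messages.]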
After single-ended frame loss measurement is enabled, a MEP on provider edge PE1
sends an RMEP on PE2 an ETH-LMM containing an ETH-LM request. The MEP then
receives an ETH-LMR message containing an ETH-LM response from the RMEP on
PE2. The ETH-LMM carries TxFCf, which is the value of the local transmit counter
TxFCl at the time when the message is sent by the local MEP. After receiving the ETH-
LMM, PE2 replies with an ETH-LMR message, containing the following information:
– TxFCf: copied from the ETH-LMM
– RxFCf: value of the local counter RxFCl at the time of ETH-LMM reception
– TxFCb: value of the local counter TxFCl at the time of ETH-LMR transmission
After receiving the ETH-LMR message, PE1 measures near- and far-end frame loss
based on the following values:

– The TxFCf, RxFCf, and TxFCb values in the received ETH-LMR message and the
value of the local counter RxFCl at the time when this ETH-LMR message was
received. These values are represented as TxFCf[tc], RxFCf[tc], TxFCb[tc], and
RxFCl[tc].
tc is the time when this ETH-LMR message was received.
– The TxFCf, RxFCf, and TxFCb values in the previously received ETH-LMR message
and the value of the local counter RxFCl at the time when that message was
received. These values are represented as TxFCf[tp], RxFCf[tp], TxFCb[tp], and
RxFCl[tp].
tp is the time when the previous ETH-LMR message was received.
Far-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCf[tc] - RxFCf[tp]|
Near-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCl[tc] - RxFCl[tp]|
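The two formulas above can be applied directly to a pair of consecutive ETH-LMR samples. The sketch below is illustrative: the LmrSample type and its field names are hypothetical, and counter wraparound is not handled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LmrSample:
    tx_fcf: int  # TxFCf carried in the ETH-LMR
    rx_fcf: int  # RxFCf carried in the ETH-LMR
    tx_fcb: int  # TxFCb carried in the ETH-LMR
    rx_fcl: int  # local counter RxFCl when the ETH-LMR was received


def single_ended_frame_loss(prev: LmrSample, cur: LmrSample) -> Tuple[int, int]:
    """Return (far_end_loss, near_end_loss) between times tp (prev) and tc (cur)."""
    far_end = abs(cur.tx_fcf - prev.tx_fcf) - abs(cur.rx_fcf - prev.rx_fcf)
    near_end = abs(cur.tx_fcb - prev.tx_fcb) - abs(cur.rx_fcl - prev.rx_fcl)
    return far_end, near_end
```

For example, if TxFCf advances by 1000 while RxFCf advances by 995, the far-end loss is 5 frames; if TxFCb advances by 400 while RxFCl advances by 399, the near-end loss is 1 frame.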
Service packets are prioritized based on 802.1p priorities and are transmitted using
different policies. Traffic passing through a provider (P) device on the network shown in
Figure 3-45 carries 802.1p priority values of 1 and 2.
Single-ended frame loss measurement is enabled on PE1 to send traffic with the priority
value of 1 to measure frame loss on a link between PE1 and PE2. Traffic with the
priority value of 2 is also sent. After receiving traffic with the priority values of 1 and 2,
the P forwards the higher-priority traffic first, delaying the arrival of traffic with the
priority value of 1 at PE2. As a result, the measured frame loss ratio is not accurate.
802.1p priority-based single-ended frame loss measurement can be enabled to obtain
accurate results.
Figure 3-45 802.1p priority-based single-ended frame loss measurement
[Figure: CE1 (user network), PE1, P, PE2, and CE2 (user network) in a chain; labels: Y.1731, Priority 1, Priority 2.]
l Dual-ended frame loss measurement

This method measures frame loss periodically, implementing error management. Each
MEP sends its RMEP a dual-ended ETH-LM message. After receiving an ETH-LM
message, a MEP collects near- and far-end frame loss statistics and does not forward the
ETH-LM message. Figure 3-46 illustrates the procedure for dual-ended frame loss
measurement.

Figure 3-46 Dual-ended frame loss measurement
[Figure: devices CE1 to CE6, PE1, P, and PE2; ETH-CCMs are sent in both directions; label: Y.1731.]
After dual-ended frame loss measurement is configured, each MEP periodically sends a
CCM carrying a request to its RMEP. After receiving the CCM, an RMEP collects near-
and far-end frame loss statistics and does not forward the message. The CCM contains
the following information:
– TxFCf: value of the local counter TxFCl at the time of CCM transmission
– RxFCb: value of the local counter RxFCl at the time of the reception of the last
CCM
– TxFCb: value of TxFCf in the last received CCM
PE1 uses received information to measure near- and far-end frame loss based on the
following values:
– Received CCM's TxFCf, RxFCb, and TxFCb values and the value of the local counter RxFCl
at the time when this CCM was received. These values are represented as
TxFCf[tc], RxFCb[tc], TxFCb[tc], and RxFCl[tc].
tc is the time when this CCM was received.
– Previously received CCM's TxFCf, RxFCb, and TxFCb values and the value of the local
counter RxFCl at the time when the previous CCM was received. These values are
represented as TxFCf[tp], RxFCb[tp], TxFCb[tp], and RxFCl[tp].
tp is the time when the previous CCM was received.
Far-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCb[tc] - RxFCb[tp]|
Near-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCl[tc] - RxFCl[tp]|
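As an illustration of the dual-ended calculation, the sketch below applies the counter differences to a short series of received CCMs. The dictionary layout and the counter values are assumptions made for the example. The near-end term uses the difference of TxFCf between the current and previous CCM, and the far-end term uses the differences of TxFCb and RxFCb.

```python
def dual_ended_loss(tp: dict, tc: dict) -> tuple:
    """Return (far_end_loss, near_end_loss) between the previous and current CCM."""
    far_end = abs(tc["TxFCb"] - tp["TxFCb"]) - abs(tc["RxFCb"] - tp["RxFCb"])
    near_end = abs(tc["TxFCf"] - tp["TxFCf"]) - abs(tc["RxFCl"] - tp["RxFCl"])
    return far_end, near_end


# Hypothetical counters recorded on PE1 for three consecutive CCMs.
ccm_samples = [
    {"TxFCf": 100, "RxFCb": 80, "TxFCb": 90, "RxFCl": 98},
    {"TxFCf": 300, "RxFCb": 277, "TxFCb": 290, "RxFCl": 296},
    {"TxFCf": 500, "RxFCb": 477, "TxFCb": 490, "RxFCl": 496},
]

# One result per pair of consecutive CCMs.
losses = [dual_ended_loss(a, b) for a, b in zip(ccm_samples, ccm_samples[1:])]
print(losses)  # [(3, 2), (0, 0)]
```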
Single-ended ETH-SLM
SLM measures frame loss using synthetic frames instead of data traffic. When implementing
SLM, the local MEP exchanges frames containing ETH-SLM information with one or more
RMEPs.
Figure 3-47 demonstrates the process of single-ended SLM:

1. The local MEP sends ETH-SLM request frames to the RMEPs.
2. After receiving the ETH-SLM request frames, the RMEPs send ETH-SLM reply frames
to the local MEP.

A frame with the single-ended ETH-SLM request information is called an SLM, and a frame
with the single-ended ETH-SLM reply information is called an SLR. SLM frames carry SLM
protocol data units (PDUs), and SLR frames carry SLR PDUs.
Single-ended SLM and single-ended frame LM are differentiated as follows: On the point-to-
multipoint network shown in Figure 3-47, inward MEPs are configured on PE1's and PE3's
interfaces, and single-ended frame LM is performed on the PE1-PE3 link. Traffic coming
through PE1's interface is destined for both PE2 and PE3, and single-ended frame LM will
collect frame loss statistics for all traffic, including the PE1-to-PE2 traffic. As a result, the
collected statistics are not accurate. Unlike single-ended frame LM, single-ended SLM
collects frame loss statistics only for the PE1-to-PE3 traffic, which is more accurate.
Figure 3-47 Single-ended SLM
[Figure: devices CE1, PE1, PE2, CE2, PE3, and CE3, with each CE in a user network; SLM and SLR frames are exchanged between PE1 and PE3.]
When implementing single-ended SLM, PE1 sends SLM frames to PE3 and receives SLR
frames from PE3. SLM frames contain TxFCf, the value of TxFCl (frame transmission
counter), indicating the frame count at the transmit time. SLR frames contain the following
information:
l TxFCf: value of TxFCl (frame transmission counter) indicating the frame count on PE1
upon the SLM transmission
l TxFCb: value of RxFCl (frame receive counter) indicating the frame count on PE3 upon
the SLR transmission
After receiving the last SLR frame during a measurement period, a MEP on PE1 measures the
near-end and far-end frame loss based on the following values:
l Last received SLR's TxFCf and TxFCb, and value of RxFCl (frame receive counter)
indicating the frame count on PE1 upon the SLR reception. These values are represented
as TxFCf[tc], TxFCb[tc], and RxFCl[tc].
tc indicates the time when the last SLR frame was received during the measurement
period.
l Previously received SLR's TxFCf and TxFCb, and value of RxFCl (frame receive
counter) indicating the frame count on PE1 upon the SLR reception. These values are
represented as TxFCf[tp], TxFCb[tp], and RxFCl[tp].
tp indicates the time when the last SLR frame was received during the previous
measurement period.
Far-end frame loss = |TxFCf[tc] – TxFCf[tp]| – |TxFCb[tc] – TxFCb[tp]|
Near-end frame loss = |TxFCb[tc] – TxFCb[tp]| – |RxFCl[tc] – RxFCl[tp]|
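To show how the synthetic-frame counters evolve, the following sketch simulates one measurement period between PE1 and PE3. It is a simplified model under stated assumptions: the function names, the sets of lost frames, and the zero baseline for the previous period are all hypothetical, and the near-end term uses PE1's local counter RxFCl.

```python
def simulate_slm_period(n_frames, lost_slm, lost_slr):
    """Simulate one period; return (TxFCf, TxFCb, RxFCl) seen with the last SLR on PE1."""
    tx_fcl_pe1 = rx_fcl_pe3 = rx_fcl_pe1 = 0
    last_slr = (0, 0, 0)
    for seq in range(n_frames):
        tx_fcl_pe1 += 1              # PE1 sends an SLM carrying TxFCf
        tx_fcf = tx_fcl_pe1
        if seq in lost_slm:          # SLM lost on the way to PE3
            continue
        rx_fcl_pe3 += 1              # PE3 counts the received SLM
        tx_fcb = rx_fcl_pe3          # PE3 returns that count as TxFCb in the SLR
        if seq in lost_slr:          # SLR lost on the way back to PE1
            continue
        rx_fcl_pe1 += 1              # PE1 counts the received SLR
        last_slr = (tx_fcf, tx_fcb, rx_fcl_pe1)
    return last_slr


def slm_loss(tp, tc):
    """Return (far_end_loss, near_end_loss) from (TxFCf, TxFCb, RxFCl) tuples."""
    far_end = abs(tc[0] - tp[0]) - abs(tc[1] - tp[1])
    near_end = abs(tc[1] - tp[1]) - abs(tc[2] - tp[2])
    return far_end, near_end


start = (0, 0, 0)
end = simulate_slm_period(10, lost_slm={2, 5}, lost_slr={7})
print(end)                   # (10, 8, 7)
print(slm_loss(start, end))  # (2, 1)
```

Two SLM frames are lost toward PE3 (far-end loss of 2) and one SLR frame is lost toward PE1 (near-end loss of 1), which matches the sets passed to the simulation.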

On a network, each packet carries the IEEE 802.1p field, indicating its priority. According to
packet priority, different QoS policies will be applied. On the network shown in Figure 3-48,
the PE1-to-PE3 traffic has two priorities: 1 and 2, as indicated by the IEEE 802.1p field.
When implementing single-ended SLM for traffic over the PE1-PE3 link, PE1 sends SLM
frames with varied priorities and checks the frame loss. Based on the check result, the
network administrator can adjust the QoS policy for the link.
Figure 3-48 Single-ended SLM based on different 802.1p priorities
[Figure: devices CE1, PE1, PE2, CE2, PE3, and CE3, with each CE in a user network; labels: Y.1731, MEP, Priority 1, Priority 2.]
ETH-DM
Delay measurement (DM) measures the delay time and delay variation. A MEP sends its
RMEP a message carrying ETH-DM information and then receives a response message
carrying ETH-DM information from its RMEP.
ETH-DM supports the following modes:
l One-way frame delay measurement
A MEP sends its RMEP a 1DM message carrying one-way ETH-DM information. After
receiving this message, the RMEP measures the one-way frame delay or delay variation.
The one-way frame delay can be measured only after the MEP synchronizes its time
with its RMEP. The delay variation can be measured regardless of whether the time is
synchronized. If the time is synchronized, both the one-way frame delay and the delay
variation can be measured. If the time is not synchronized, only the one-way delay
variation can be measured.
One-way frame delay measurement can be implemented in either of the following
modes:
– The on-demand mode computes the one-way frame delay once or a
specified number of times for diagnosis.
– The proactive mode computes the one-way frame delay periodically.
Figure 3-49 illustrates the procedure for one-way delay measurement.

Figure 3-49 One-way delay measurement
[Figure: devices CE1 to CE6, PE1, P, and PE2; labels: 1DM PDU, Y.1731.]
One-way frame delay measurement is implemented on an E2E link between a local MEP
and its RMEP. The local MEP sends 1DMs to the RMEP, and the RMEP performs the
measurement. After one-way frame delay measurement is configured, a MEP periodically
sends 1DMs carrying TxTimeStampf (the time when the 1DM was sent). After receiving
the 1DM, the RMEP parses TxTimeStampf and compares this value with RxTimef (the
time when the 1DM was received). The RMEP calculates the one-way frame delay
based on these values using the following equation:
Frame delay = RxTimef - TxTimeStampf
The frame delay value can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
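The effect of time synchronization on these two values can be illustrated with a short calculation. In this sketch the timestamps are made-up integers in microseconds, and the 500 us clock offset on the RMEP is an assumption chosen for the example.

```python
def one_way_delays(samples):
    """samples: list of (TxTimeStampf, RxTimef) pairs, one per received 1DM."""
    return [rx_time_f - tx_timestamp_f for tx_timestamp_f, rx_time_f in samples]


def delay_variations(delays):
    """Absolute difference between each pair of consecutive delays."""
    return [abs(later - earlier) for earlier, later in zip(delays, delays[1:])]


# Hypothetical timestamps with synchronized clocks.
synced = [(0, 120), (1000, 1135), (2000, 2110)]
# Same frames, but the RMEP clock runs 500 us ahead of the MEP clock.
offset = 500
unsynced = [(tx, rx + offset) for tx, rx in synced]

print(one_way_delays(synced))                      # [120, 135, 110]
print(one_way_delays(unsynced))                    # [620, 635, 610]
print(delay_variations(one_way_delays(synced)))    # [15, 25]
print(delay_variations(one_way_delays(unsynced)))  # [15, 25]
```

The constant clock offset distorts every delay value but cancels out of the delay variation, which is why the variation can be measured without time synchronization.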
802.1p priorities carried in service packets prioritize services. Traffic passing through a P
on the network shown in Figure 3-50 carries 802.1p priority values of 1 and 2.
One-way delay measurement is enabled on PE1, which sends traffic with the priority value of
1 to measure the frame delay on the link between PE1 and PE2. Traffic with the priority
value of 2 is also sent. After receiving traffic with the priority values of 1 and 2, the P
forwards the traffic with the higher priority first, delaying the arrival of traffic with the priority
value of 1 at PE2. As a result, the frame delay calculated on PE2 is not accurate.
802.1p priority-based one-way frame delay measurement can be enabled to obtain
accurate results.

Figure 3-50 802.1p priority-based one-way frame delay measurement
[Figure: CE1 (user network), PE1, P, PE2, and CE2 (user network) in a chain; labels: 1DM PDU, Y.1731, Priority 1, Priority 2.]
l Two-way frame delay measurement

A MEP sends its RMEP a delay measurement message (DMM) carrying an ETH-DM
request. After receiving the DMM, the RMEP sends the MEP a delay measurement reply
(DMR) carrying an ETH-DM response.
Two-way frame delay measurement can be implemented in either of the following
modes:
– The on-demand mode computes the two-way frame delay once for
diagnosis.
– The proactive mode computes the two-way frame delay periodically.
Figure 3-51 illustrates the procedure for two-way delay measurement.
Figure 3-51 Two-way frame delay measurement
[Figure: devices CE1 to CE6, PE1, P, and PE2; labels: DMM, DMR, Y.1731.]
In two-way frame delay measurement, a local MEP sends a delay measurement message
(DMM) to its RMEP and then receives a DMR from the RMEP.
After two-way frame delay measurement is configured, a MEP periodically sends
DMMs carrying TxTimeStampf (the time when the DMM was sent). After receiving the
DMM, the RMEP replies with a DMR message. This message carries RxTimeStampf
(the time when the DMM was received) and TxTimeStampb (the time when the DMR
was sent). The value in every field of the DMM is copied to the DMR, with the
exception that the source and destination MAC addresses are interchanged. Upon
receipt of the DMR message, the MEP records RxTimeb (the time when the DMR was
received) and calculates the two-way frame delay using the following equation:
Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)
The frame delay value can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
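A short worked example of the two-way equation follows. The timestamps are invented integers in microseconds, and the two remote clock offsets are assumptions used to show that the RMEP clock does not need to be synchronized with the MEP clock.

```python
def two_way_delay(tx_timestamp_f, rx_timestamp_f, tx_timestamp_b, rx_time_b):
    """(RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)."""
    return (rx_time_b - tx_timestamp_f) - (tx_timestamp_b - rx_timestamp_f)


# MEP clock: DMM sent at 1000, DMR received at 1290 (290 us round trip).
# RMEP clock runs 4200 us ahead: DMM received at 5320, DMR sent at 5370
# (50 us of processing time on the RMEP).
delay_offset_clock = two_way_delay(1000, 5320, 5370, 1290)

# Same exchange with an RMEP clock that is only 200 us ahead.
delay_other_clock = two_way_delay(1000, 1320, 1370, 1290)

print(delay_offset_clock, delay_other_clock)  # 240 240
```

Only the difference between the two remote timestamps enters the result, so the remote processing time is subtracted and any constant clock offset on the RMEP cancels out.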
802.1p priorities carried in service packets prioritize services. Traffic passing through a P
on the network shown in Figure 3-52 carries 802.1p priority values of 1 and 2.
Two-way delay measurement is enabled on PE1, which sends traffic with the priority value of
1 to measure the frame delay on the link between PE1 and PE2. Traffic with the priority
value of 2 is also sent. After receiving traffic with the priority values of 1 and 2, the P
forwards the traffic with the higher priority first, delaying the arrival of traffic with the priority
value of 1 at PE2. As a result, the calculated frame delay is not accurate.
802.1p priority-based two-way frame delay measurement can be enabled to obtain
accurate results.
Figure 3-52 802.1p priority-based two-way frame delay measurement
[Figure: CE1 (user network), PE1, P, PE2, and CE2 (user network) in a chain; labels: DMM, DMR, Y.1731, Priority 1, Priority 2.]
AIS
AIS is a protocol used to transmit fault information.
On the user network shown in Figure 3-53, a MEP is configured in MD1 (level 6) on the
access interface of each customer edge, CE1 and CE2. On the carrier network, a MEP is
configured in MD2 (level 3) on the access interface of each of PE1 and PE2.
l If CFM detects a fault in the link between AIS-enabled PEs, CFM sends AIS PDUs to
CEs. After receiving the AIS PDUs, the CEs suppress alarms, minimizing the impact of
a large number of alarms on the network management system (NMS).

l After the link between the PEs recovers, the PEs stop sending AIS PDUs. If the CEs
receive no AIS PDUs within a period 3.5 times the interval at which AIS PDUs are sent,
the CEs cancel the alarm suppression function.
Figure 3-53 AIS principles
[Figure: CE1, PE1, PE2, and CE2 in a chain; CE-PE links use VLAN/QinQ and the PE1-PE2 link uses VLL/VPLS/VLAN; AIS packets are sent toward CE1 and CE2; labels: MD2 Level 3, MD1 Level 6.]
Usage Scenario
Y.1731 applies to virtual leased lines (VLLs), virtual private LAN service (VPLS)
connections, and virtual local area networks (VLANs). AIS and multicast loopback
applications are the same in these scenarios. Different Y.1731 statistical functions are
supported in specific scenarios. The following example illustrates Y.1731 statistical functions
in different scenarios on the network shown in Figure 3-54.
Figure 3-54 Y.1731 scenario
[Figure: CE, PE1, P, PE2, and PE3 in a chain; labels: Services Access, Metro, Core, and Y.1731 on three segments.]
l Y.1731 statistical functions in the VLL scenario

A VLL is configured between PE1 and PE2 on the network shown in Figure 3-54.
Y.1731 can be used on an attachment circuit (AC) between a CE and PE1 and on a pseudo
wire (PW) between PE1 and PE2. Figure 3-55 lists supported Y.1731 statistical
functions.

Figure 3-55 Y.1731 statistical functions in the VLL scenario
Function | Statistics | AC Side, Outward MEP | AC Side, Inward MEP | PW Side, Outward MEP | PW Side, Inward MEP
One-way delay | Statistics based on demands | Y | Y | Y | Y
One-way delay | Periodic statistics, 802.1p priority-based | N | Y | Y | Y
One-way delay | Periodic statistics, not 802.1p priority-based | Y | Y | Y | Y
Two-way delay | Statistics based on demands | Y | Y | Y | Y
Two-way delay | Periodic statistics, 802.1p priority-based | N | Y | Y | Y
Two-way delay | Periodic statistics, not 802.1p priority-based | Y | Y | Y | Y
Single-ended frame loss | Statistics based on demands | Y | Y | Y | Y
Single-ended frame loss | Periodic statistics, 802.1p priority-based | N | Y | Y | Y
Single-ended frame loss | Periodic statistics, not 802.1p priority-based | Y | Y | Y | Y
Dual-ended frame loss | Periodic statistics | Y | Y | N | N
l Y.1731 statistical functions in the VPLS scenario

A VPLS connection is configured between PE1 and PE2 on the network shown in
Figure 3-54. Y.1731 can be used on the AC link between a CE and PE1 and on a PW
between PE1 and PE2. Figure 3-56 lists supported Y.1731 statistical functions.
Figure 3-56 Y.1731 statistical functions in the VPLS scenario
Function | Statistics | AC Side, Outward MEP | AC Side, Inward MEP | PW Side, Outward MEP | PW Side, Inward MEP
One-way delay | Statistics based on demands | Y | N | Y | Y
One-way delay | Periodic statistics, 802.1p priority-based | N | N | Y | Y
One-way delay | Periodic statistics, not 802.1p priority-based | N | N | Y | Y
Two-way delay | Statistics based on demands | Y | N | Y | Y
Two-way delay | Periodic statistics, 802.1p priority-based | N | N | Y | Y
Single-ended frame loss | Statistics based on demands | Y | N | Y | Y
Single-ended frame loss | Periodic statistics, 802.1p priority-based | N | N | Y | Y
Dual-ended frame loss | Periodic statistics | Y | N | N | N
l Y.1731 statistical functions in the VLAN scenario

A VLAN is configured between a CE and PE1 on the network shown in Figure 3-54.
Y.1731 can be used on the link between the CE and PE1. Figure 3-57 lists supported
Y.1731 statistical functions.

Figure 3-57 Y.1731 statistical functions in the VLAN scenario
Function | Statistics | Outward MEP | Inward MEP
One-way delay | Statistics based on demands | Y | Y
One-way delay | Periodic statistics, 802.1p priority-based | N | N
One-way delay | Periodic statistics, not 802.1p priority-based | Y | Y
Two-way delay | Statistics based on demands | Y | Y
Two-way delay | Periodic statistics, 802.1p priority-based | N | N
Two-way delay | Periodic statistics, not 802.1p priority-based | Y | Y
Single-ended frame loss | Statistics based on demands | Y | Y
Single-ended frame loss | Periodic statistics, 802.1p priority-based | N | N
Single-ended frame loss | Periodic statistics, not 802.1p priority-based | Y | Y
Dual-ended frame loss | Periodic statistics | Y | Y
Ethernet Test Signal Test

ETH-test is short for Ethernet test signal test. This test measures the bandwidth throughput
and code errors.
l Maximum bandwidth: Test packets are sent at a specified rate. After a specified period
of time, a node collects the number of sent packets and the number of received packets.
If the number of sent packets is greater than the number of received packets, packets
were discarded. Test packets are then sent at a lower rate, and after the specified period
of time the node again collects the number of sent packets and the number of received
packets. This process repeats until no packets are discarded. The rate at which test
packets are transmitted without being discarded is the throughput of the link.
l Code errors: Test packets carry the CRC code in the Test TLV field. One of the following
codes can be specified for test packets:
– Null signal without CRC-32: Packets carrying all 0s.
– Null signal with CRC-32
– PRBS 2-31-1 without CRC-32: Pseudo Random Binary Sequence (PRBS) is used to
simulate a white noise scenario.
– PRBS 2-31-1 with CRC-32
A node sends ETH-test packets carrying calculated CRC code to its peer node. Upon
receipt, the peer node re-calculates a CRC code. If the received CRC code is the same as
the calculated CRC code, no code error occurs. If the two codes are different, a code
error occurs.
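The code error check can be sketched with the CRC-32 function from the Python standard library. The payload layout used here (test pattern followed by a 4-byte CRC) is a simplification assumed for the example and is not the exact Test TLV encoding; the helper names are hypothetical.

```python
import struct
import zlib


def build_test_payload(pattern: bytes) -> bytes:
    """Sender side: append the CRC-32 of the test pattern (simplified layout)."""
    return pattern + struct.pack("!I", zlib.crc32(pattern))


def has_code_error(payload: bytes) -> bool:
    """Receiver side: recalculate the CRC-32 and compare it with the received one."""
    pattern = payload[:-4]
    (received_crc,) = struct.unpack("!I", payload[-4:])
    return zlib.crc32(pattern) != received_crc


# Null signal with CRC-32: a pattern of all 0s.
good = build_test_payload(bytes(64))

# Flip one bit in transit to emulate a code error.
damaged = bytearray(good)
damaged[10] ^= 0x01

print(has_code_error(good))            # False
print(has_code_error(bytes(damaged)))  # True
```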
An ETH-test can be conducted without interrupting data flows. ETH-test frames share
bandwidth with the data flows.
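The maximum bandwidth procedure is a step-down search over the sending rate. The sketch below models it against a simulated link. The function names, the 70 Mbit/s capacity, and the fixed step size are assumptions for illustration; a real test would obtain the sent and received counts from the devices.

```python
def find_throughput(measure, start_rate, step):
    """Lower the rate until a test run shows no discarded packets."""
    rate = start_rate
    while rate > 0:
        sent, received = measure(rate)
        if received >= sent:   # nothing was discarded at this rate
            return rate
        rate -= step           # packets were discarded: retry at a lower rate
    return 0


def simulated_link(rate, capacity=70, duration_s=10):
    """Hypothetical link that discards everything above its capacity."""
    sent = rate * duration_s
    received = min(rate, capacity) * duration_s
    return sent, received


print(find_throughput(simulated_link, start_rate=100, step=10))  # 70
```

A smaller step gives a more precise result at the cost of more test runs.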

3.5.2.4 OAM Fault Association
Fault Association Between Ethernet OAM and BFD

When detecting a connectivity fault, the OAM module on a PE sends information about the
fault to a peer PE through the bound BFD session. Similarly, when BFD detects a fault, BFD
sends information about the fault to Ethernet OAM through the OAM module.
Fault association between Ethernet OAM (EFM OAM or Ethernet CFM) and BFD applies
when BFD is enabled on PEs and Ethernet OAM is enabled on a CE and PE. After the
association is configured, the CEs at both ends of a tunnel can learn whether a connectivity
fault exists between the CE and PE of a peer network.
Figure 3-58 Fault association between Ethernet OAM and BFD
[Figure: CE1 and CE2 are each dual-homed to PEs (PE1 to PE4) over link1 to link4; PW1 and PW2 connect the PEs; labels: ETH OAM on the CE-PE links, BFD on the PWs.]
On the network shown in Figure 3-58:

l PW1 and PW2 are independent of each other, and BFD is used to detect faults on them.
l Each CE is dual-homed to two PEs, and Ethernet OAM is used to detect faults between a
CE and a PE.
Association Between EFM OAM and an Interface

On the network shown in Figure 3-59, when an interface with EFM OAM enabled detects a
link connectivity fault, the interface forwards only EFM protocol packets. As a result, Layer 2
and Layer 3 services are blocked. Therefore, associating EFM OAM with an interface may
interrupt services. After detecting link connectivity recovery, EFM OAM resumes packet
forwarding and unblocks Layer 2 and Layer 3 services on the interface.

Figure 3-59 Association between EFM OAM and an interface
[Figure: ATN and CX-B connected by a link running EFM OAM; label: the interface associated with EFM OAM.]
Association Between Ethernet CFM and an Interface

When a MEP detects a connectivity fault between the MEP and an RMEP within the same
MA, the OAM module blocks and then unblocks the interface on which the MEP resides so
that other modules can detect the fault. This process is called the association between Ethernet
CFM and an interface.
The association between Ethernet CFM and an interface is used to detect faults on the active
link in a static LACP link aggregation group or in a manually configured 1:1 active/standby
link aggregation group and then trigger a protection switchover.
NOTE
l For ATN 950Bs, only ATN 950Bs with AND2CXPB/AND2CXPEs configured support the
association between Ethernet CFM and an interface.
l Only the ATN 950B can detect faults on the active link in a manually configured 1:1 active/standby
link aggregation group.
Association Between Ethernet OAM and Clearing ARP Entries

When a MEP detects a connectivity fault between the MEP and an RMEP within the same
MA, the OAM module clears all related ARP entries on the interface on which the MEP
resides so that the interface can learn ARP entries again. This process is called the association
between CFM OAM and clearing ARP entries.
When an interface running EFM OAM detects a connectivity fault between itself and its peer
interface, the OAM module clears all related ARP entries on the interface and ARP entries in
the specified VLAN so that the interface can relearn ARP entries. This process is called the
association between EFM OAM and clearing ARP entries.

Figure 3-60 Association between Ethernet OAM and clearing ARP entries
[Figure: Node B, ATN, L2VPN, PE2a (master), PE2b (backup), and RNC; label: ETH OAM.]
As shown in Figure 3-60:

l RNC is dual-homed to PE2a and PE2b. The connected interfaces are configured with the
association between Ethernet OAM and clearing ARP entries.
l PEs are on an L2VPN.
When detecting a link fault, Ethernet OAM triggers the faulty interface to clear the
corresponding ARP entries so that the RNC can relearn ARP entries from the standby link. In
addition, traffic is switched to the standby link.
3.5.2.5 OAM-based Security

A large number of alarms may be generated and cleared on an unstable network enabled with
CC. These alarms consume system resources and deteriorate system performance. You can
configure the anti-jitter function and alarm suppression to minimize the number of alarms.
Anti-Jitter
NOTE
Anti-jitter is supported only on devices that comply with IEEE Standard 802.1ag-2007.
Anti-jitter involves the following concepts:

l RMEP activation time
The RMEP activation time is used to prevent false alarms. A local MEP with the ability
to receive CCMs can accept CCMs only after the RMEP activation time elapses. After
the configured RMEP activation time elapses, the local MEP considers the path to the
RMEP faulty and reports a connectivity fault alarm if the local MEP does not receive
CCMs from its RMEP within a period that is 3.5 times the interval at which CCMs are
sent.

l Alarm and clear alarm anti-jitter times

– If a MEP detects a connectivity fault, the MEP:
n Sends an alarm to the NMS after the alarm anti-jitter time elapses.
n Does not send an alarm if the fault is rectified before the alarm anti-jitter time
elapses.
– If a MEP detects a connectivity fault and sends an alarm, the MEP:
n Sends a clear alarm if the fault is rectified within the clear alarm anti-jitter
time.
n Does not send a clear alarm if the fault is not rectified within the clear alarm
anti-jitter time.
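The alarm anti-jitter time acts as a hold-off timer on alarm reporting. The following sketch models only that timer (not the clear alarm anti-jitter time). The function name and the fault intervals are hypothetical, and the choice to treat a fault lasting exactly as long as the timer as not reported is an assumption.

```python
def alarms_sent_to_nms(faults, alarm_anti_jitter_s):
    """faults: list of (detected_at, rectified_at) times in seconds.

    Returns the times at which alarms are sent to the NMS.
    """
    reported = []
    for detected_at, rectified_at in faults:
        # A fault rectified before the timer elapses produces no alarm.
        if rectified_at - detected_at > alarm_anti_jitter_s:
            reported.append(detected_at + alarm_anti_jitter_s)
    return reported


# Two short flaps (2 s and 4 s) and one persistent fault (20 s).
faults = [(0, 2), (10, 30), (40, 44)]
print(alarms_sent_to_nms(faults, alarm_anti_jitter_s=5))  # [15]
```

Only the persistent fault reaches the NMS, five seconds after it was detected, while the two short flaps are filtered out.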
Alarm Suppression
If different types of faults trigger more than one alarm, CFM alarm suppression allows only
the alarm with the highest level to be sent to an NMS. If alarms remain after the highest-level
alarm is cleared, the alarm with the next highest level is sent to the NMS. The process repeats
until all alarms are cleared.
The principles of CFM alarm suppression are:
l High-level alarms require immediate troubleshooting.
l A single fault may trigger alarms with different levels. After the highest-level alarm is
cleared, alarms with lower levels may also be cleared.
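The suppression behavior can be modeled as reporting only the highest-level active alarm. In this sketch the class name, the alarm names, and the convention that a larger number means a higher level are all assumptions made for the example.

```python
class AlarmSuppressor:
    """Keeps all active alarms but exposes only the highest-level one."""

    def __init__(self):
        self._active = {}  # alarm name -> level (larger number = higher level)

    def raise_alarm(self, name, level):
        self._active[name] = level

    def clear_alarm(self, name):
        self._active.pop(name, None)

    def reported(self):
        """Return the alarm currently sent to the NMS, or None if none is active."""
        if not self._active:
            return None
        return max(self._active, key=self._active.get)


mep = AlarmSuppressor()
mep.raise_alarm("alarm_low", 1)
mep.raise_alarm("alarm_high", 3)
mep.raise_alarm("alarm_mid", 2)
first = mep.reported()

mep.clear_alarm("alarm_high")
second = mep.reported()

print(first, second)  # alarm_high alarm_mid
```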
3.5.2.6 Comparison Between Protocols
Table 3-9 Comparison between EFM OAM and Ethernet CFM

Protocol | Description
EFM OAM | EFM OAM provides continuity check, fault monitoring, fault notification, and remote loopback functions for the link between directly connected devices. EFM OAM is used on links between customer edges (CEs) and user-end provider edges (UPEs) on a metropolitan area network (MAN) shown in Figure 3-61. EFM OAM helps maintain the reliability and stability of connections between a user network and a provider network.
Ethernet CFM | Ethernet CFM provides E2E connectivity fault detection, fault notification, fault acknowledgement, and fault locating functions. Ethernet CFM is used at the access and aggregation layers of the MAN shown in Figure 3-61 to monitor network-wide connectivity and detect connectivity faults. Ethernet CFM can work with a protection switching technology to improve network reliability.

Figure 3-61 Typical MAN networking
[Figure: labels: Node B, ATN, CE, UPE, PE-AGG, BRAS, SOHO, CX, IP/MPLS, Intranet core, Commercial centre, Residential area; EFM OAM (802.3ah): Ethernet in the first mile; Ethernet CFM (802.1ag): access convergence layer on the MAN; Backbone network.]

3.5.3 Applications
Fault and Performance Detection on E-Line Services
Figure 3-62 E-Line services on PWs
[Figure: Node Bs and ATNs connected over PW/PBT across the metro network to the RNC; legend: MA1, MA2, and MA3, and the MEPs of MA1, MA2, and MA3.]
Defined in MEF 6, VLL services refer to Ethernet line services based on point-to-point
Ethernet virtual connections.
As shown in Figure 3-62, PWs are set up using the MPLS technology. From the perspective
of the whole service channel, a tunnel can be considered as one hop. On the metropolitan
network, Ethernet OAM is used to set up an MD from the local UPE interface that accesses a
CPE to the remote UPE interface that accesses a CPE. MAs are set up for specific user
services, and a MEP in inward mode is created on each UPE interface that accesses a CPE. In
this manner, faults on PWs can be detected and services transmitted through PWs can be
protected (MA2).

Fault and Performance Detection on E-LAN Services
Figure 3-63 Point-to-Multipoint (P2MP) Ethernet
[Figure: Node Bs (CPE) and the RNC (CPE) attach to ATNs (UPE) on a metro network running MSTP/RRPP/RPR; legend: 3AH, 1AG.]
Figure 3-64 E-LAN services provided through VPLS
[Figure: Node Bs (CPE) and the RNC (CPE) attach to ATNs (UPE) on a metro network running VPLS; legend: 3AH, 1AG.]
Defined in MEF 6, E-LAN services provide P2MP connections for users. E-LAN services can
be implemented using technologies such as pure Ethernet, VPLS, and QinQ. Each node on a
network must learn MAC addresses.
UPEs implement fault detection on a network of multiple UPEs through 802.1ag by sending
multicast CC packets. Delay or jitter detection can be implemented only between two
specific UPEs. Fault detection between the CPE and UPE uses 802.3ah, and no P2MP issues
are involved.
Fault Detection on E-tree Services

E-tree, which is still an MEF draft, provides P2MP services for users. Most E-tree services are
multicast services that feature single-direction traffic, multicast addresses, and the possibility of
inter-VLAN replication. For E-tree services, fault detection is performed on the leaf node for
the link from the root to the leaf, and the leaf node is dual-homing protected.
l Fault detection on the single-direction P2MP network:
As shown in Figure 3-65, no inter-VLAN replication is applied on such a network. Each
leaf node is concerned only with the status of the link to the root node, not with the
status of the links to other leaf nodes. The root node does not need to monitor the status
of the link to any leaf node. Therefore, you can set up only one MA and configure all leaf
nodes and the root node as MEPs. On the root node, you can specify all leaf nodes as
remote MEPs and enable the sending of CC packets. On all leaf nodes,
you can specify only the root node as the remote MEP and enable the receiving of CC
alarms. After the configuration, you can perform OAM detection on the network.
Figure 3-65 Unidirectional P2MP E-tree services
[Figure: a multicast source on the Metro/IP/MPLS core connects through a bridge to Leaf1, Leaf2, and Leaf3.]
l Fault detection on the multi-VLAN single-direction P2MP network:

The intermediate bridge implements inter-VLAN replication. In this case, you need to set
up only VLAN group-based MAs in the scenario of fault detection on the single-
direction P2MP network. You can also perform fault detection segment by segment and
transmit the fault between MAs.
l Fault detection on the bidirectional P2MP network:
As shown in Figure 3-66, a leaf node may communicate with the root node. Besides, the
leaf node may communicate with another leaf node; during the communication, the root
node transfers information for both leaf nodes. On such a network, MAs can be set up
between the root node and leaf nodes. In addition, you need to associate the MEPs on the
root node for transmitting alarms.

Figure 3-66 Bidirectional P2MP E-tree services
[Figure: a multicast source on the Metro/IP/MPLS core connects through a bridge to Leaf1, Leaf2, and Leaf3; legend: MEP in MA1, MEP in MA2, MEP in MA3.]
Fault Detection and Protection Switchover of Link Aggregation

l Fault detection and protection switchover of the static LACP link aggregation group
It takes LACP at least 3s to detect a link fault, which cannot meet the carrier-class
requirement of protection switchover within 50 ms. On a network
requiring high reliability, you can configure Ethernet CFM on the interfaces at both ends of
a member link in the LACP aggregation group and configure the association between
Ethernet CFM and an interface. With fast connectivity fault detection by Ethernet CFM and
the association function, protection switchover of the static LACP aggregation
group is implemented within 50 ms, meeting the requirement of carrier-class networks.
As shown in Figure 3-67, ATN and CX-B are configured with a static LACP link
aggregation group.
Figure 3-67 Fault detection and protection switchover of the LACP static link aggregation
group
[Figure: ATN interfaces GE0/2/1, GE0/2/2, and GE0/2/3 connect to CX-B interfaces GE1/0/1, GE1/0/2, and GE1/0/3 over Link1, Link2, and Link3 in an aggregation group in static LACP mode running Ethernet CFM; legend: active link, inactive link, MEPs in MA1, MEPs in MA2, MEPs in MA3.]
You can configure Ethernet CFM on ATN and CX-B and configure MEPs on all the member
interfaces of the aggregation group. MEPs on the interfaces of the same link are configured
within the same MA. MEPs on the interfaces of different links are configured within different
MAs. MEPs on all the interfaces belong to the same MD. You can detect the link connectivity
by exchanging CCMs between MEPs of the same link. You can then associate Ethernet CFM
with the interfaces.
When a connectivity fault occurs on Link1, the OAM modules on ATN and CX-B block and
then unblock their GE0/2/1 and GE1/0/1 interfaces respectively. In this manner, the LACP
module senses the connectivity fault on Link1 and switches the service data from Link1 to the
inactive Link3.
Terms
None

Acronym and Abbreviation | Full Name
CCM continuity check message
CFM connectivity fault management
EFM Ethernet in the First Mile
LBM loopback message
LBR loopback reply
LTM linktrace message
LTR linktrace reply
MA maintenance association
MD maintenance domain
MEP maintenance association end point
MIP maintenance association intermediate point
OAM operation, administration and maintenance
PBT Provider backbone transport
3.6 E-LMI

3.6.1 Introduction
Definition
Ethernet Local Management Interface (E-LMI) is an operation, administration and
maintenance (OAM) protocol defined by the Metro Ethernet Forum (MEF) 16 Technical
Specification. E-LMI runs on the user-to-network interface (UNI) link between a provider
edge (PE) and a customer edge (CE), enabling the PE to notify the CE of the connectivity
status and Ethernet service configuration parameters available for the UNI on the CE side.
Figure 3-68 shows where E-LMI is deployed on a network.
Figure 3-68 E-LMI on a network
[Figure: CE1 (user network), PE1, PE2, and CE2 (user network) in a chain; E-LMI runs on each CE-PE UNI link; the PE1-PE2 segment is labeled VLAN/Local CCC.]
Purpose
E-LMI enables the PE to implement the following functions:
l Notifies the addition or deletion of an Ethernet virtual connection (EVC) to the CE.
l Notifies EVC status (Active, Not Active, or Partially Active) to the CE.
l Communicates UNI and EVC attributes to the CE.
3.6.2 Principles
E-LMI enables a CE to request and receive status and service attributes from a PE so that the
CE can be automatically configured to implement Metro Ethernet services. Figure 3-69
shows E-LMI networking.
Figure 3-69 E-LMI networking
[Figure: CE1, PE1, PE2, and CE2 in a chain; E-LMI runs on each CE-PE UNI link; the PE1-PE2 segment is labeled VLAN.]

EVC
An EVC associates two or more UNIs and is classified as point-to-point or multipoint-to-
multipoint. Figure 3-70 shows EVC classification.
Figure 3-70 EVC classification
[Figure: a point-to-point EVC and a multipoint-to-multipoint EVC.]
An EVC has the following states:

l Active
All channels are Up.
l Partially Active
Some channels are Down. This state applies to a multipoint-to-multipoint EVC.
l Not Active
All channels are Down.
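The three states follow directly from the Up/Down status of the channels in an EVC. A minimal sketch, assuming the channel status is available as a list of booleans (the function name and this representation are hypothetical):

```python
def evc_state(channels_up):
    """Map per-channel status (True = Up) to an EVC state name."""
    if not channels_up:
        raise ValueError("an EVC needs at least one channel")
    if all(channels_up):
        return "Active"
    if any(channels_up):
        return "Partially Active"
    return "Not Active"


print(evc_state([True, True, True]))    # Active
print(evc_state([True, False, True]))   # Partially Active
print(evc_state([False, False]))        # Not Active
```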
EVCs support VLAN services and can be mapped to CE-VLANs. The mapping type is
determined by the number of EVCs that the PE UNI needs to support and the number of CE-
VLANs. Currently, the mapping type must be Bundling. A PE UNI can support multiple
EVCs, and one or more VLANs can be mapped to an EVC on the same UNI. A VLAN can be
mapped to only one EVC.
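The bundling rule (several VLANs per EVC, but each VLAN in only one EVC) can be checked when a mapping is built. The sketch below is illustrative; the function name, the EVC names, and the VLAN IDs are hypothetical.

```python
def build_vlan_map(evc_to_vlans):
    """Invert an EVC-to-VLANs mapping for one UNI, rejecting reused VLANs."""
    vlan_to_evc = {}
    for evc, vlans in evc_to_vlans.items():
        for vlan in sorted(vlans):
            if vlan in vlan_to_evc:
                raise ValueError(
                    f"VLAN {vlan} is already mapped to {vlan_to_evc[vlan]}"
                )
            vlan_to_evc[vlan] = evc
    return vlan_to_evc


# Two VLANs bundled into evc1 and one VLAN mapped to evc2 on the same UNI.
valid = build_vlan_map({"evc1": {10, 20}, "evc2": {30}})
print(valid)  # {10: 'evc1', 20: 'evc1', 30: 'evc2'}

# Mapping VLAN 10 to a second EVC violates the rule and raises ValueError.
try:
    build_vlan_map({"evc1": {10}, "evc2": {10}})
except ValueError as err:
    print(err)  # VLAN 10 is already mapped to evc1
```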
EVC status can be detected using IEEE 802.1ag.
E-LMI Messages
Table 3-10 describes E-LMI messages.

Table 3-10 E-LMI messages

Message Sent in CE Mode | Function | Message Sent in PE Mode | Function
STATUS ENQUIRY message with a report type of Full Status | The CE sends this message to the PE to request the status of the UNI and all EVCs. | STATUS message with a report type of Full Status | The PE returns this message carrying the status of the UNI and all EVCs to the CE.
STATUS ENQUIRY message with a report type of Full Status Continued | The CE sends this message to the PE if the status information cannot be sent in a single Ethernet frame. | STATUS message with a report type of Full Status Continued | The PE responds to the CE with this message after receiving a STATUS ENQUIRY message with a report type of Full Status Continued.
STATUS ENQUIRY message with a report type of E-LMI Check | The CE uses this message to determine whether E-LMI is operational. | STATUS message with a report type of E-LMI Check | The PE uses this message to determine whether E-LMI is operational.
– | – | STATUS message with a report type of Single EVC Asynchronous Status | The PE sends this message to the CE when detecting an EVC status change.
As shown in Figure 3-71, after receiving a STATUS ENQUIRY message with a report type of
Full Status from the CE, the PE checks whether the status information to be returned fits in
a single Ethernet frame. If it does not, the PE returns a STATUS message with a report type
of Full Status Continued. After receiving the message, the CE returns a STATUS ENQUIRY
message with a report type of Full Status Continued. This process continues until the
remaining status information fits in a single Ethernet frame. The PE then sends it in a
STATUS message with a report type of Full Status.

Figure 3-71 Message exchange between the CE and PE

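CE -> PE: STATUS ENQUIRY (Full Status)
PE -> CE: STATUS (Full Status Continued)
CE -> PE: STATUS ENQUIRY (Full Status Continued)
PE -> CE: STATUS (Full Status Continued)
...
CE -> PE: STATUS ENQUIRY (Full Status Continued)
PE -> CE: STATUS (Full Status)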
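The exchange can be sketched as a small simulation. This is a simplified model, not the device implementation: the frame limit is expressed as a number of EVC records per frame (`per_frame`) rather than a byte length, and the class and function names are hypothetical.

```python
FULL = "Full Status"
CONTINUED = "Full Status Continued"


class Pe:
    """PE side: returns EVC status records, at most per_frame per STATUS message."""

    def __init__(self, evc_records, per_frame):
        if per_frame < 1:
            raise ValueError("per_frame must be at least 1")
        self._records = list(evc_records)
        self._per_frame = per_frame
        self._next = 0

    def on_status_enquiry(self, report_type):
        if report_type == FULL:
            self._next = 0  # a Full Status enquiry starts a new report
        chunk = self._records[self._next:self._next + self._per_frame]
        self._next += len(chunk)
        more = self._next < len(self._records)
        return (CONTINUED if more else FULL), chunk


def ce_poll(pe):
    """CE side: keep enquiring until the PE answers with Full Status."""
    records, log = [], []
    enquiry = FULL
    while True:
        log.append(("CE->PE", "STATUS ENQUIRY", enquiry))
        reply, chunk = pe.on_status_enquiry(enquiry)
        log.append(("PE->CE", "STATUS", reply))
        records.extend(chunk)
        if reply == FULL:
            return records, log
        enquiry = CONTINUED
```

With five EVC records and two records per frame, the CE sends three enquiries and the last STATUS message carries the Full Status report type.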
E-LMI Working Modes

An E-LMI-capable device can work in PE or CE mode.
l When working in PE mode, the device obtains EVC and UNI status information and
sends the information to the CE.
l When working in CE mode, the device requests or receives data from the PE.
Table 3-11 and Table 3-12 describe the counters and timers, respectively, involved in implementing E-LMI.
Table 3-11 E-LMI counters

Counter | Name | Description | Function
N391 | Polling counter | Full Status polling count. | Determines the number of polling cycles between Full Status exchanges.
N393 | Status counter | Count of consecutive errors. If no response message is received or error messages are received within a period specified by T391 (for the CE) or T392 (for the PE), an error is recorded. | Determines whether E-LMI is operational.

Table 3-12 E-LMI timers

Timer | Name | Start Condition | Stop Condition | Actions Taken upon Timer Expiration
T391 | Polling timer (PT) | A STATUS ENQUIRY message with a report type of Full Status or Full Status Continued is transmitted. | - | A STATUS ENQUIRY message with a report type of Full Status or Full Status Continued is transmitted and an error is recorded, if no STATUS message with a report type of Full Status or Full Status Continued is received.
T392 | Polling verification timer (PVT) | A STATUS message with a report type of Full Status or Full Status Continued is transmitted. | A STATUS ENQUIRY message with a report type of Full Status or Full Status Continued is received. | An error is recorded.
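The interaction between the counters and the polling timer on the CE side can be sketched as follows. This is a simplified model under stated assumptions: the device is treated as not operational once the number of consecutive errors reaches N393 and as operational again after one valid response, and polls other than the Full Status poll are assumed to be E-LMI Check polls. The class `CePoller` and the values passed to it are illustrative, not product defaults.

```python
class CePoller:
    """CE-side view of N391 (polling counter) and N393 (status counter)."""

    def __init__(self, n391, n393):
        self.n391 = n391          # polling cycles between Full Status exchanges
        self.n393 = n393          # consecutive-error threshold (assumed semantics)
        self.polls = 0
        self.errors = 0           # count of consecutive errors
        self.operational = True

    def next_report_type(self):
        """Choose the report type for the enquiry sent in this polling cycle."""
        report = "Full Status" if self.polls % self.n391 == 0 else "E-LMI Check"
        self.polls += 1
        return report

    def on_t391_expiry(self, valid_status_received):
        """Record the outcome of the polling cycle that just ended."""
        if valid_status_received:
            self.errors = 0
            self.operational = True
        else:
            self.errors += 1      # no response or an error message: record an error
            if self.errors >= self.n393:
                self.operational = False
        return self.operational
```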

Acronym & Abbreviation    Full Name
E-LMI Ethernet Local Management Interface
EVC Ethernet virtual connection
MEF Metro Ethernet Forum
PVT polling verification timer
UNI user-to-network interface

3.7 MPLS-TP OAM

3.7.1 Introduction to MPLS-TP OAM
Definition
Multiprotocol Label Switching Transport Profile (MPLS-TP) is a transport technique
that integrates MPLS packet switching with traditional transport network features. MPLS-TP
networks are poised to replace traditional transport networks in the future. MPLS-TP
operation, administration and maintenance (MPLS-TP OAM) works on the MPLS-TP client
layer and server layer. It can effectively detect, identify, and locate faults in the client layer
and quickly switch traffic when links or nodes become defective. OAM is an important part of
any plan to reduce network maintenance expenditures.
Purpose
Both networks and services are part of an ongoing process of transformation and integration.
New services like triple play services, Next Generation Network (NGN) services, carrier
Ethernet services, and Fiber-to-the-x (FTTx) services are constantly emerging from this
process. Such services demand more investment and have higher OAM costs. They require
state-of-the-art QoS, full service access, and high levels of scalability, reliability, and
manageability of transport networks. Traditional transport network technologies, such as
Multi-Service Transfer Platform (MSTP), Synchronous Digital Hierarchy (SDH), or
Wavelength Division Multiplexing (WDM) cannot meet these requirements because they lack
a control plane. Unlike traditional technologies, MPLS-TP does meet these requirements
because it can be used on next-generation transport networks that can process data packets, as
well as on traditional transport networks.
Because traditional transport networks like SDH or Optical Transport Network (OTN) networks
have high reliability and maintenance benchmarks, MPLS-TP must provide powerful OAM
capabilities. MPLS-TP OAM provides the following functions:
l Fault management
l Performance monitoring
l Protection switching
3.7.2 Principles
3.7.2.1 MPLS-TP OAM Functional Components
MPLS-TP OAM uses a range of functional components that have been defined in ITU-T Y.1731.
These definitions are widely used by IETF MPLS-TP work teams. The functional
components mentioned below are defined in the majority of OAM-associated MPLS-TP
standards documentation.
ME and MEG
MPLS-TP OAM functions are performed for maintenance entities (MEs). An ME consists of
a pair of maintenance entity group end points (MEPs) at either end of a transport path as well
as maintenance entity group intermediate points (MIPs) located between the two MEPs. OAM
operates between two MEPs along a transport path. The path can be either a P2P transport

path, such as a pseudo wire (PW) or a point-to-point label switched path (P2P LSP), or a
point-to-multipoint (P2MP) path, for example, a P2MP LSP.
One or more MEs that use a transport path form a maintenance entity group (MEG).
Figure 3-72 illustrates either a P2P LSP or a PW. If the figure shows a P2P LSP, A and D are
label edge routers (LERs) and B and C are label switching routers (LSRs). If it illustrates a
multi-segment pseudo wire (MS-PW), A and D are terminating provider edges (T-PEs), and
B and C are switching provider edges (S-PEs). A and D are MEPs and B and C are MIPs. The
equipment in the diagram can be connected by a physical link, an LSP, or a sub-layer
transport path.
Figure 3-72 MEG over a P2P path

[Figure: a P2P path through nodes A, B, C, and D; device labels in the original figure: NodeB, ATNA, CX-B, RNC.]
MEP
A MEP is the source or sink node of a MEG. The MEP can only be an LER on an MPLS-TP
LSP, or a T-PE on an MPLS-TP PW.
On a P2P LSP shown in Figure 3-73, only PE1 and PE2 are MEPs.
Figure 3-73 MEPs

[Figure: CE1 - PE1 - ASBR1 - ASBR2 - PE2 - CE2; an end-to-end LSP runs between PE1 and PE2, which are the maintenance end points.]
3.7.2.2 Continuity Check

Continuity check (CC) and connectivity verification (CV) are both MPLS-TP functions. CC is
used to check loss of continuity (LOC) between two MEPs in a MEG. CV monitors
connectivity between two MEPs within one MEG or in different MEGs. In real world
situations, CC and CV are usually used together. Although they serve very different purposes,
the two functions are integrated in the ATN.
CC
CC is a pro-active OAM operation. It detects LOC faults between any two MEPs that are part
of a MEG. A MEP sends CC messages (CCMs) to a remote MEP (RMEP) at specified intervals. If
the RMEP does not receive a CCM for a period of 3.5 times the specified interval, it
considers the connection between the two MEPs to be faulty. The RMEP then reports
an alarm and enters the Down state, which triggers automatic protection switching (APS) on
both MEPs. After receiving a CCM from the MEP again, the RMEP clears the alarm and exits the
Down state.
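The LOC detection rule can be sketched as a simple timeout check. This is a minimal illustration; the class `RemoteMepMonitor` and the explicit time arguments are assumptions made so that the example is self-contained, and alarm reporting and APS are reduced to a single `down` flag.

```python
class RemoteMepMonitor:
    """Tracks CCM reception on the RMEP and flags loss of continuity (LOC)."""

    LOC_MULTIPLIER = 3.5  # LOC is declared after 3.5 CCM intervals without a CCM

    def __init__(self, ccm_interval_s, now_s=0.0):
        self.timeout_s = self.LOC_MULTIPLIER * ccm_interval_s
        self.last_ccm_s = now_s
        self.down = False

    def on_ccm(self, now_s):
        """A CCM arrived: clear the alarm and leave the Down state."""
        self.last_ccm_s = now_s
        self.down = False

    def poll(self, now_s):
        """Check the timeout; returns True while the RMEP is in the Down state."""
        if now_s - self.last_ccm_s >= self.timeout_s:
            self.down = True
        return self.down
```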
CV
CV is also a pro-active OAM operation. It enables a MEP to report alarms when packets are
received unexpectedly or in error. For example, if a CV-enabled MEP receives a packet from
an LSP and finds that this packet has been transmitted in error by the LSP, the MEP will
report an alarm indicating a forwarding error.
3.7.2.3 Loss Measurement

Loss measurement (LM) is an MPLS-TP performance monitoring function that is implemented
between two MEPs on a PW, an LSP, or a section. LM statistics include near-end and far-end
frame loss statistics. Near-end frame loss statistics record the loss of frames sent from the
RMEP to the local MEP; far-end frame loss statistics record the loss of frames sent from the
local MEP to the RMEP. To collect frame loss statistics for frames going in both directions,
each MEP must have both of these counters:
l TxFCl: records the number of sent frames.
l RxFCl: records the number of received frames.
Figure 3-74 Networking diagram for loss measurement

[Figure: CE1 - PE1 - MPLS-TP - PE2 - CE2, with PE1 and PE2 as maintenance end points. Dual-ended ETH-LM: CCM carrying TxFCf, RxFCb, and TxFCb. Single-ended ETH-LM: LMM carrying TxFCf; LMR carrying TxFCf, RxFCb, and TxFCb.]
Figure 3-74 shows single-ended and dual-ended LM processes. Dual-ended LM works only
in pro-active monitoring mode. Two MEPs in this mode periodically send CCMs carrying the
following information:
l TxFCf: the local TxFCl value recorded when a CCM is sent.
l RxFCb: the local RxFCl value recorded when a CCM is received.
l TxFCb: the TxFCf value carried in a received CCM. This TxFCb value is the RMEP's
TxFCl.
After receiving CCMs carrying frame count information, both MEPs use formulas shown in
Figure 3-75 to implement near-end and far-end LM.

Figure 3-75 Dual-ended LM formulas
Frame Loss far-end = | TxFCb[tc] – TxFCb[tp] | - | RxFCb[tc] – RxFCb[tp] |

Frame Loss near-end = | TxFCf[tc] – TxFCf[tp] | - | RxFCl[tc] – RxFCl[tp] |
TxFCf[tc], RxFCb[tc], and TxFCb[tc] are the TxFCf, RxFCb, and TxFCb values carried in
the most recently received CCM. RxFCl[tc] is the local RxFCl value recorded when this CCM
is received, and tc is the time when this CCM is received.
TxFCf[tp], RxFCb[tp], and TxFCb[tp] are the TxFCf, RxFCb, and TxFCb values carried in
the previously received CCM. RxFCl[tp] is the local RxFCl value recorded when the previous
CCM was received, and tp is the time when the previous CCM was received.
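The dual-ended formulas in Figure 3-75 can be worked through with a short sketch. It assumes that RxFCl is the receiving MEP's own counter sampled when each CCM arrives; the `CcmSample` type and the counter values in the usage note are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CcmSample:
    tx_fcf: int  # TxFCf carried in the received CCM
    rx_fcb: int  # RxFCb carried in the received CCM
    tx_fcb: int  # TxFCb carried in the received CCM
    rx_fcl: int  # local RxFCl sampled when the CCM was received (assumption)


def dual_ended_loss(prev, cur):
    """Return (far_end_loss, near_end_loss) between CCMs received at tp and tc."""
    far_end = abs(cur.tx_fcb - prev.tx_fcb) - abs(cur.rx_fcb - prev.rx_fcb)
    near_end = abs(cur.tx_fcf - prev.tx_fcf) - abs(cur.rx_fcl - prev.rx_fcl)
    return far_end, near_end
```

For example, with `prev = CcmSample(1000, 900, 950, 990)` and `cur = CcmSample(1100, 998, 1050, 1087)`, the far-end loss is 100 - 98 = 2 frames and the near-end loss is 100 - 97 = 3 frames.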
Single-ended LM usually works in on-demand mode. In this mode, only the MEP periodically
sends a loss measurement message (LMM); the RMEP does not. LMMs carry the following
information to the RMEP:
l TxFCf: the local TxFCl value recorded when the LMM is sent.
After receiving an LMM, the RMEP returns a loss measurement reply (LMR) carrying the
following information:
l TxFCf: the TxFCf value carried in the LMM.
l RxFCf: the local RxFCl value recorded when the LMM was received.
l TxFCb: the local TxFCl value recorded when the LMR is sent.
After receiving an LMR, the local MEP uses the equations shown in Figure 3-76 to calculate
near-end and far-end frame loss measurement.
Figure 3-76 Single-ended LM calculation
Frame Loss far-end = | TxFCf[tc] – TxFCf[tp] | - | RxFCf[tc] – RxFCf[tp] |

Frame Loss near-end = | TxFCb[tc] – TxFCb[tp] | - | RxFCl[tc] – RxFCl[tp] |
TxFCf[tc], RxFCf[tc], and TxFCb[tc] are the TxFCf, RxFCf, and TxFCb values carried in the
most recently received LMR. RxFCl[tc] is the local RxFCl value recorded when the LMR is
received, and tc is the time when the LMR is received.
TxFCf[tp], RxFCf[tp], and TxFCb[tp] are the TxFCf, RxFCf, and TxFCb values carried in the
previously received LMR. RxFCl[tp] is the local RxFCl value recorded when the previous LMR
was received, and tp is the time when the previous LMR was received.
3.7.2.4 Delay Measurement

Delay measurement (DM) is an MPLS-TP performance monitoring function. MPLS-TP OAM
makes on-demand DM available at all times. Delay variation (jitter) based on delay statistics
can also be determined.
As shown in Figure 3-77, DM is performed between two MEPs. DM implementation can be
either one-way or two-way.
Figure 3-77 Schematic diagram for frame delay measurement

[Figure: CE1 - PE1 - MPLS-TP - PE2 - CE2, with PE1 and PE2 as maintenance end points. One-way ETH-DM: 1DM carrying TxTimeStampf. Two-way ETH-DM: DMM carrying TxTimeStampf; DMR carrying TxTimeStampf.]
l A MEP performs one-way frame delay measurement by periodically sending delay
measurement (DM) frames carrying the TxTimeStampf value (the time when the DM
frame was sent) to its RMEP. After receiving a DM frame, the RMEP subtracts the
TxTimeStampf value from the receipt time to calculate the delay:
Frame delay = RxTimef - TxTimeStampf (RxTimef is the time when the DM frame is
received.)
Note that MEPs that perform one-way DM must have synchronized clocks. If they do not,
the frame delay measurement is inaccurate, and only the jitter can be measured.
l A MEP performs two-way DM by periodically sending delay measurement messages
(DMMs) carrying the TxTimeStampf value (the time when the DMM was sent). After
receiving a DMM, the RMEP returns a DM reply (DMR) that carries the same
TxTimeStampf value. After receiving the DMR, the MEP subtracts the TxTimeStampf
value from the receipt time to calculate the delay:
Frame delay = RxTimeb - TxTimeStampf (RxTimeb is the time when the DMR is
received.)
To obtain a more accurate result, the DMR can also carry RxTimeStampf, the time when
the DMM was received, and TxTimeStampb, the time when the DMR was sent. After
receiving such a DMR, the MEP uses this equation to calculate the frame delay, which
excludes the processing time on the RMEP:
Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)
Two-way DM supports delay and jitter measurement even if the two MEPs do not have
synchronized clocks. The frame delay is the round-trip delay. If both MEPs have
synchronized clocks, the delay in each direction can also be calculated with the
following equations; the two values add up to the round-trip delay:
– Transmission delay from a MEP to its RMEP = RxTimeStampf - TxTimeStampf
– Transmission delay from an RMEP to a MEP = RxTimeb - TxTimeStampb
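The two-way DM equations can be checked with a short sketch. It takes RxTimeb as the time at which the MEP receives the reply, uses integer microseconds, and the function names and timestamp values are illustrative only.

```python
def two_way_delay(tx_timestamp_f, rx_time_b, rx_timestamp_f=None, tx_timestamp_b=None):
    """Round-trip frame delay measured by the MEP that sent the DMM.

    If the reply carries RxTimeStampf and TxTimeStampb, the processing time
    on the RMEP is subtracted from the result.
    """
    delay = rx_time_b - tx_timestamp_f
    if rx_timestamp_f is not None and tx_timestamp_b is not None:
        delay -= tx_timestamp_b - rx_timestamp_f
    return delay


def per_direction_delays(tx_timestamp_f, rx_timestamp_f, tx_timestamp_b, rx_time_b):
    """Delay in each direction; meaningful only if both clocks are synchronized."""
    mep_to_rmep = rx_timestamp_f - tx_timestamp_f
    rmep_to_mep = rx_time_b - tx_timestamp_b
    return mep_to_rmep, rmep_to_mep
```

For a DMM sent at 1000, received at 1400, answered at 1450, and a reply received at 1800, the uncorrected delay is 800, the corrected round-trip delay is 750, and the per-direction delays are 400 and 350, which add up to 750.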
3.7.3 Application
3.7.3.1 MPLS-TP OAM over an IP RAN in the Layer 2 to Edge Scenario

Figure 3-78 shows an IP Radio Access Network (RAN) in the Layer 2 to edge scenario that
uses Time Division Multiplexing (TDM), Asynchronous Transfer Mode (ATM), or Ethernet.
An MPLS-TP network can be used to directly connect Base Transceiver Stations (BTSs) to
Base Station Controllers (BSCs), or NodeBs to Radio Network Controllers (RNCs).
An MPLS-TP network uses Pseudo Wire Emulation Edge to Edge (PWE3) to transmit TDM,
ATM, or Ethernet services.
Figure 3-78 Networking diagram for MPLS-TP OAM over an IP RAN in the Layer 2 to edge
scenario
[Figure: BTSs/NodeBs connect over FE/GE, N*E1, or IMA E1 links to an MPLS-TP network, which connects over GE or STM-1 links to the RNC/BSC.]
MPLS-TP OAM is used for MPLS-TP operation and maintenance. MPLS-TP OAM can
effectively detect, identify, and locate faults in the client layer and quickly switch traffic when
links or nodes become defective. This reduces network maintenance expenditures.
3.7.4 Acronyms and Abbreviations

AIS Alarm Indication Signal
CC Continuity Check
CV Connectivity Verification
DM Delay Measurement
LM Loss Measurement
MEP maintenance entity group end point
MIP maintenance entity group intermediate point
MPLS-TP Multiprotocol Label Switching Transport Profile
OAM operation, administration and maintenance
RDI Remote Defect Indication
3.8 ISSU Feature Description

Only devices with two main control boards (such as ATN 950Bs) support the ISSU feature.
Restarting devices is required during software upgrade and maintenance on most networks.
However, the restart interrupts services and deteriorates system performance.
One solution to the problem is to establish equal-cost multi-path (ECMP) routes and switch
service traffic to a backup path during a software upgrade. In this situation, network
configurations must be modified, which increases the failure probability and prolongs the
upgrade period. In addition, services may be interrupted if the backup path becomes
congested after the service switchover.
In-Service Software Upgrade (ISSU) is a convenient and rapid software upgrade mode that
upgrades the system software on a device. ISSU is used for planned software upgrade of
network devices. It effectively shortens the service interruption period and improves service
reliability.
NOTE
Users cannot specify the source and target versions at random for ISSU. The two versions need to be
planned and developed in advance. For details about ISSU target versions supported by the current
version, contact Huawei R&D engineers.
3.8.1 Introduction
Definition
In-Service Software Upgrade (ISSU) is an upgrade mode in which services are not affected.
ISSU reduces service interruption time greatly and enhances network availability.
Purpose
The traditional software upgrade mode interrupts services running on the device for a long
time, which reduces the benefits to operators. Online software patching can upgrade some
software modules to rectify defects while the device is running; however, patching is
restricted in most situations. As a result, a full image software upgrade is required.
In addition, a traditional software upgrade is usually carried out around midnight to reduce
the impact on services, and it imposes strict requirements on upgrade operations. If the
upgrade is not finished within the specified period, the device must be rolled back to the
previous version, and the software has to be upgraded again later. As a result, new services
cannot be provided to users or defects cannot be rectified in time. The time limit also
increases the probability of manual operation failures and the cost of human resources and
management.

Besides choosing a proper upgrade time, traditional upgrade methods reduce service
interruption time by establishing multiple equal-cost paths or backup paths to which
services are switched. As a result, network configurations must be modified, which
increases the failure probability and prolongs the upgrade period. In addition, traffic may
be interrupted because some backup paths are too congested to carry the traffic after the
service switchover, as shown in Figure 3-79.
Figure 3-79 Traditional upgrade
This method takes time and resources to switch services to the backup path. If the network
does not have backup paths, this method does not take effect. This method does not apply to
the cluster chassis which is a trend of network development.
ISSU greatly reduces the impact of software upgrade on services, improves customer
satisfaction, enhances product competitiveness, and makes it easier for Huawei engineers
to maintain devices. With ISSU, ideally no services are interrupted, and software can be
upgraded at any time without switching services, which simplifies software upgrade, as
shown in Figure 3-80.
Figure 3-80 ISSU
Benefits
l Benefits Brought to Operators
ISSU reduces service interruption time to seconds. In addition, ISSU can be
implemented without switching services, so it requires no backup links and
reduces the maintenance cost of software upgrades.
l Benefits Brought to Users
In ISSU, users' services are seldom affected and thus customer satisfaction is improved.
3.8.2 Principles
3.8.2.1 ISSU Principle
ISSU implements automatic software upgrade without interrupting services. Its principle is as
follows:
l ISSU restarts the SMB based on the new version. The new processes running on the SMB
after the restart form the new forwarding plane and Active Management Plane (AMP).
l Data synchronization and configuration restoration are performed between the new AMB
and old AMB.
l The new AMP and forwarding plane replace the old ones to implement ISSU.

Figure 3-81 Implementation principle of ISSU
NOTE
In Figure 3-81, A is the old version and B is the new version.
ISSU is implemented as follows:
1. Perform the ISSU check.

– Upload the new version file to the CF cards of the AMB and SMB.
– Start ISSU check, including the system status (whether the AMB and SMB work
normally), memory status of each board, hardware compatibility, and feature
compatibility. During the ISSU check, the SMB is restarted to check feature
specifications.
– After the ISSU check is complete, system upgrade type and system maximum down
time are displayed.
NOTE
The tool to check compatibility on the host side and the precheck command are provided. The
ISSU precheck does not cause the SMB to restart and has little impact on the system, and thus it
can be used in any non-ISSU phases. It helps you determine whether the system is available for
ISSU.

2. Start the Standby Management Plane (SMP) based on the new version and synchronize
data.
– The AMB generates a configuration file of the new version and synchronizes it to
the SMB (AMB of the new AMP).
– The AMB of the new AMP restores configurations.
– The data is dynamically synchronized between the new and old AMPs.
3. Perform the AMP/SMP switchover.
– After data backup is complete, AMP/SMP switchover can be performed. The new
AMP (namely the original SMP) takes over the whole host system, including data
forwarding plane.
– The incremental updates of service entries and data smoothing of the new AMP are
complete.
4. Update the new SMP.
– Restart the new SMP based on the new version to complete ISSU.
3.8.3 Typical Applications
Devices that support ISSU can be upgraded through ISSU. Check the version compatibility
using a version comparison tool or commands. The proper ISSU mode is then selected
automatically based on the result. As shown in Figure 3-82, the device marked in yellow is
to be upgraded. With ISSU, you can upgrade the device in place instead of switching
services to multiple equal-cost or backup paths.
Figure 3-82 ISSU
Terms
NA.

Abbreviations
ISSU In-Service Software Upgrade

4 Interface Management
About This Chapter
This document provides an overview of interface management together with an explanation of
its applications and principles.
4.1 Logical Interface

4.2 Transmission Alarm Customization and Suppression
4.3 Interface Alarm Inversion
4.1 Logical Interface
4.1.1 Introduction
Definition
A logical interface is a virtualized interface and does not physically exist. It must be manually
configured to exchange data. Logical interfaces include:
l Sub-interface
l Trunk interface
l VLANIF interface
l Virtual Ethernet (VE) interface
l Loopback interface
l Null0 interface
l Tunnel interface
Purpose
l Sub-interface
For point-to-point communication, assigning one IP address per physical interface
generally meets requirements. However, if the link layer of an interface supports
multiplexing of multiple connections, one primary IP address on the local interface is
insufficient.
To resolve this problem, multiple sub-interfaces can be created on the local interface to
establish virtual connections to the remote networks. A remote network can then
communicate with one of the local sub-interfaces and consequently communicate with
the local network.
ATN devices allow multiple sub-interfaces to be configured on one physical interface.
l Trunk interface
A trunk interface is a group of physical interfaces that are bundled into a logical
interface. The physical interfaces are member interfaces. Trunk interfaces have the
following advantages:
– Provide increased bandwidth. The bandwidth of a trunk interface is the total
bandwidth of all member interfaces.
– Enhance reliability. When the physical link of a member interface fails, traffic on
the link is switched to another member link, improving transmission reliability.
– Support load balancing. After load balancing is implemented on trunk member
interfaces, traffic is distributed among member interfaces, preventing network
congestion that occurs when all traffic is transmitted over one path.
l VLANIF interface
A VLANIF interface is a Layer 3 VLAN interface that is configured to implement inter-
VLAN communication.
A VLAN is a broadcast domain that allows only intra-VLAN communication. Hosts in
different VLANs require IP routes to communicate with each other. To create IP routes, a
VLANIF interface must be configured on a switch that supports IP routing. If such
switch does not exist, you must create VLANIF interfaces on ATN devices.
l VE interface
A VE interface provides functions similar to those of a physical Ethernet interface.
l Loopback interface
A loopback interface always remains Up after being created and allows a device to loop
back packets to itself. Loopback interfaces are often configured to enhance reliability.
The IP address of a loopback interface is mainly used as follows:
– Specified as the source address of packets to improve network reliability.
– Used to control an access interface and filter logs to simplify information
displaying.
Generally, BGP uses the optimal local IP address destined for neighbors for TCP
connections. If the interface with the optimal local IP address fails, BGP cannot establish
TCP connections. In the Interior BGP (IBGP) topology, there are often multiple links to
the same neighbor. In this case, using a loopback interface address for the BGP connection
on the local ATN keeps the connection from depending on a single physical interface.
l Null0 interface
A Null0 interface neither forwards nor receives packets. Instead, it directly discards all
packets sent to it. A Null0 interface is used to prevent loops and discard unwanted traffic
for ATN devices.
l Tunnel interface
A tunnel is a virtual point-to-point channel. A tunnel interface encapsulates datagrams at
one end of a tunnel and decapsulates datagrams at the other end.

4.1.2 Principles
4.1.2.1 Trunk Interface
Restrictions on Trunk Interfaces

Trunk interfaces are subject to the following restrictions:
l Member interfaces must have consistent parameters on the two ends of a trunk link.
These parameters include:
– Interface quantity
– Interface transmission rate
– Interface duplex mode
l Data sequence must be guaranteed.
A data flow is a group of packets with the same source and destination MAC addresses,
source and destination IP addresses, and source and destination port numbers. For
example, the Telnet or FTP connection between two devices is a data flow.
Before trunking is configured, data flow frames can reach their destination in the correct
order because only one physical connection exists between two devices. However, after
trunking is configured, frames may reach the destination in an incorrect order, because
packets are transmitted over multiple links.
To prevent frames from arriving out of order, a packet forwarding mechanism is configured
so that all packets of a data flow use the same member link.
After the packet forwarding mechanism is configured, packets are transmitted in any of the
following manners:
l Packets with the same source and destination MAC addresses are transmitted over the
same physical link.
l Packets with the same source and destination IP addresses are transmitted over the same
physical link.
l Packets with the same source and destination IP addresses, source and destination
TCP/UDP port numbers, and IP protocol types are transmitted over the same physical
link.
Trunk Forwarding Principles

As shown in Figure 4-1, the trunk layer belongs to the data link layer and lies between the
MAC sub-layer and physical layer.
Figure 4-1 Trunk interface in the Ethernet protocol stack
[Figure: protocol stack from top to bottom. Data link layer: LLC, MAC, Trunk. Physical layer: PHY.]
The MAC sub-layer regards trunk interfaces as physical interfaces and delivers frames
directly to trunk interfaces.
The trunk module maintains a trunk forwarding table that contains the following two items:
l Key value
Key values are calculated using a hash algorithm based on packets' MAC or IP
addresses.
l Interface number
The number of entries in a trunk forwarding table is the same as the number of bundled
member interfaces. For example, if interfaces 3, 4, 5, and 6 are bundled into a trunk
interface, a trunk forwarding table contains four entries, as shown in the following
figure.
Figure 4-2 Example of a trunk forwarding table
KEY 0 1 2 3
PORT 3 4 5 6
The trunk module forwards frames based on the trunk forwarding table in the following way:
1. The trunk module receives a frame from the MAC sub-layer and extracts the source
or destination MAC or IP address.
2. The trunk module uses a hash algorithm to calculate the key value.
3. The trunk module searches for the interface number in the trunk forwarding table based
on the key value.
4. The trunk module sends the frame through the corresponding interface.
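The four steps can be sketched with the member interfaces from Figure 4-2. The document does not specify the hash algorithm, so CRC32 from the Python standard library is used here purely as a stand-in; the function names are illustrative.

```python
import zlib


def build_trunk_table(member_ports):
    """One entry per member interface: key value -> interface number."""
    return {key: port for key, port in enumerate(member_ports)}


def flow_key(src_mac, dst_mac, table_size):
    """Hash the extracted addresses to a key value (CRC32 is an assumed stand-in)."""
    digest = zlib.crc32(f"{src_mac}-{dst_mac}".encode())
    return digest % table_size


def select_port(table, src_mac, dst_mac):
    """Look up the outgoing member interface for a frame."""
    return table[flow_key(src_mac, dst_mac, len(table))]
```

Because the key depends only on the addresses, all frames of one flow leave through the same member interface, which preserves their order.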
For more trunk details, see ATN Multi-service Access Equipment Feature Description -
LAN Access and MAN Access.
4.1.2.2 VLANIF Interface

For details about VLANIF interfaces, see Inter-VLAN Communication.
4.1.3 Applications
Eth-Trunk Interface
An Eth-Trunk interface aggregates the bandwidth of its member interfaces. As shown in the
following figure, an Eth-Trunk interface is created on ATN A and ATN B, and two full-duplex
GE interfaces are added to the Eth-Trunk interface. The bandwidth of the Eth-Trunk link
totals the bandwidths of the two GE interfaces.
An Eth-Trunk interface improves traffic transmission reliability. If one member link fails,
traffic is switched to the other member link.
An Eth-Trunk interface supports load balancing on its member interfaces to prevent network
congestion.

Figure 4-3 Eth-Trunk networking diagram
[Figure: ATN A (Eth-Trunk1, 10.1.1.1/24) and ATN B (Eth-Trunk1, 10.1.1.2/24) are connected by two member links, GE1 and GE2.]
VLANIF Interface
VLANIF interfaces are used for inter-VLAN communication. In Figure 4-4, if hosts in
VLAN 2 need to communicate with hosts in VLAN 3, VLANIF interfaces must be created for
VLAN 2 and VLAN 3 on the ATN device.
Figure 4-4 Inter-VLAN communication through VLANIF interfaces
[Figure: an ATN with one VLANIF interface for VLAN 2 and one VLANIF interface for VLAN 3.]
Loopback Interface
l Reliability improvement
– Application in IP address unnumbered
When an interface uses an IP address only for a short period, the interface can
borrow the IP address of a loopback interface to save IP address
resources and maintain interface stability.
– Application in router IDs
Some dynamic routing protocols require that ATNs have router IDs. A router ID
uniquely identifies an ATN in an autonomous system (AS).
For example, if no router ID is configured for OSPF or BGP on a device, the
device selects the largest of the local interface IP addresses as the router ID.
If a physical interface's IP address is selected, the system does not
reselect another router ID until the selected IP address is deleted.
If a router ID needs to be configured for an ATN device, a loopback interface's IP
address can be configured as the router ID, because the loopback interface never
goes Down.
– Application in BGP
To prevent BGP sessions from being affected by physical interface faults, a
loopback interface can be configured as the source interface that sends BGP
packets.
Before configuring a loopback interface as the source interface, note the following:
The loopback interface address of the BGP peer must be reachable.
In the case of an EBGP connection, EBGP must be allowed to establish peer
relationships through the indirectly connected interfaces.
– Application in MPLS LDP
In MPLS LDP, a loopback interface address is often used as the transmission
address to maintain network stability. This IP address may be a public network
address.
l Information filtering
– Application in SNMP
To ensure SNMP server security, a loopback interface address can be specified as
the source IP address of trap messages.
The system allows only the packets from the loopback interface address to access
the SNMP interface. This mechanism protects the SNMP management system and
facilitates trap message reading and writing.
– Application in NTP
The Network Time Protocol (NTP) synchronizes the time of all devices. To ensure
NTP security, a loopback interface address can be specified as the source address of
NTP packets sent from the local ATN.
The system allows only the packets from the loopback interface address to access
the NTP interface, thereby filtering packets to protect the NTP system.
– Application in information recording
Before recording outgoing network traffic, a loopback interface address can be
specified as the source IP address of the outgoing network traffic.
Only packets from the loopback interface address are recorded.
– Application in security
To rapidly locate the source of logs on a user log server, a loopback interface
address can be configured as the logs' source IP address.
– Application in HWTACACS
To ensure Huawei Terminal Access Controller Access Control System
(HWTACACS) server security, a loopback interface address can be specified as the
source address of packets sent from the local ATN.
The system allows only the packets sent from the loopback interface address to
access the HWTACACS server, thereby facilitating log reading and writing.
– Application in RADIUS authentication
To ensure RADIUS server security, a loopback interface address can be specified as
the source IP address of the packets sent from an ATN device.

The system allows only the packets sent from the loopback interface address to
access the RADIUS server, thereby facilitating log reading and writing.
Tunnel Interface
Tunnels such as GRE tunnels use tunnel interfaces to forward packets. Tunnel interfaces are
virtual interfaces that must be created before these tunnels can be used.
The source and destination addresses of a tunnel uniquely identify a tunnel. The source tunnel
address of the local end is the destination address of the remote end. Conversely, the
destination tunnel address of the local end is the source address of the remote end.
Tunnel interfaces can be configured with different encapsulation modes as required.
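The mirrored source and destination relationship described above can be checked with a small sketch. This is only an illustration: the `TunnelEnd` type, the `is_matching_pair` function, and the addresses (taken from the documentation address ranges) are assumptions, not part of the ATN software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelEnd:
    """One end of a tunnel, identified by its source and destination address."""
    source: str
    destination: str


def is_matching_pair(local: TunnelEnd, remote: TunnelEnd) -> bool:
    """Return True if each end's source is the other end's destination."""
    return local.source == remote.destination and local.destination == remote.source


# Hypothetical addresses used only for illustration.
local_end = TunnelEnd(source="192.0.2.1", destination="198.51.100.1")
remote_end = TunnelEnd(source="198.51.100.1", destination="192.0.2.1")
print(is_matching_pair(local_end, remote_end))  # True
```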
Terms
Term                        Definition
logical interface           A virtual interface that can be used to exchange data.
loopback interface          A logical interface that can be created by users or automatically
                            created by the system.
                            The loopback interface automatically created by the system has a
                            fixed IP address of 127.0.0.1/8.
                            Loopback interfaces are always Up.
virtual template            A template that defines common attributes of VA interfaces and is
                            used to create VA interfaces.
virtual Ethernet interface  A logical interface with Ethernet interface features.
sub-interface               A logical interface that can be created on a physical interface.
                            Multiple sub-interfaces can be created on the same physical interface.

Abbreviation
AAL ATM Adaptation Layer
ATM Asynchronous Transfer Mode
FR Frame Relay
GE Gigabit Ethernet
GRE Generic Routing Encapsulation
IBGP Internal Border Gateway Protocol

IP Internet Protocol
IPoA Internet Protocol over ATM
IPoE IP over Ethernet
IPoEoA IP over Ethernet over AAL5
IPX Internet Packet Exchange
ISDN integrated services digital network
ISP Internet Service Provider
MFR Multiple Frame Relay
MIB management information base
MODEM Modulator Demodulator
MP Multilink PPP
MTU maximum transmission unit
PPP Point-to-Point Protocol
PPPoA PPP over AAL
PPPoE PPP over Ethernet
PRI primary rate interface
PSE packet switch exchange
PVC permanent virtual circuit
QoS quality of service
TCP Transmission Control Protocol
VA virtual access
VE virtual Ethernet
VPN virtual private network
4.2 Transmission Alarm Customization and Suppression

4.2.1 Introduction
Definition
IP network reliability requirements continue to increase for carrier-class networks. Therefore,
network devices must be capable of rapidly detecting faults. However, if fast detection is
enabled on an interface, the physical status of the interface frequently switches between Up
and Down because alarm reporting is also faster. This frequent switching is called network
flapping.
l The transmission alarm customization function allows you to specify alarms that can
cause an interface to change its physical status.
l The transmission alarm suppression function allows you to suppress network flapping by
setting a series of thresholds.
Purpose
Transmission alarm customization allows you to filter out unwanted alarms, and transmission
alarm suppression enables you to set thresholds on customized alarms. Together, they allow
devices to ignore alarm burrs (also called alarm chattering) generated during transmission
link protection and prevent frequent network flapping.
On the backbone network or Metropolitan Area Network (MAN), IP devices need to
cooperate with transmission devices. Common transmission devices include Synchronous
Digital Hierarchy (SDH), Synchronous Optical Network (SONET), and Wavelength Division
Multiplexing (WDM) devices.
If the transmission device fails, the IP device receives alarms. Then, the transmission device
protects the link and notifies the IP device that alarms are cleared. The entire process in which
the alarms are generated and then cleared generally lasts from 50 ms to 200 ms. For the IP
device, the process is generally regarded as a burr of a corresponding duration.
The IP device is expected to ignore such burrs. That is, when the transmission device
maintains or protects the link, the IP device uses alarm suppression to prevent route flapping,
ensuring that the network continues to run stably. Transmission alarms can be customized,
which minimizes the impact of alarms on the physical status of the interface. Transmission
alarm suppression can efficiently filter and suppress alarm signals to prevent interface
flapping.
4.2.2 Principles
Alarm Burr
An alarm burr, also called alarm chattering, is a process in which alarm generation and
clearance signals are received in a short period (the period varies with specific usage
scenarios, devices, or service types).
For example, if a loss of signal (LOS) alarm is cleared 50 ms after it is generated, the process
from the alarm generation to clearance is an alarm burr.
Alarm Flapping
Alarm flapping is a process in which an alarm is repeatedly generated and cleared in a short
period (the period varies with specific usage scenarios, devices, or service types).
For example, if an LOS alarm is generated and cleared 10 times in 1 s, alarm flapping occurs.

Key Parameters in Flapping Suppression

l figure of merit: stability value of an alarm. A larger value indicates a less stable alarm.
l penalty: penalty value. Each time an interface receives an alarm generation signal,
figure of merit increases by a fixed penalty value. Each time an interface receives an
alarm clearance signal, figure of merit decreases exponentially.
l suppress: alarm suppression threshold. When figure of merit exceeds this threshold,
alarms are suppressed. suppress must be smaller than ceiling and larger than reuse.
l ceiling: maximum value of figure of merit. When an alarm is repeatedly generated and
cleared in a short period, figure of merit significantly increases and takes a long time to
return to reuse. To avoid long delays returning to reuse, ceiling is set to limit the
maximum value of figure of merit. figure of merit does not increase when it reaches
ceiling. ceiling must be greater than suppress.
l reuse: alarm reuse threshold. When figure of merit falls below reuse, alarms are no longer
suppressed. reuse must be smaller than suppress.
l half-time: time that the stability value (figure of merit) of suppressed alarms takes to
decrease to half.
l decay-ok: time that the stability value (figure of merit) takes to decrease to half when
an alarm clearance signal is received.
l decay-ng: time that the stability value (figure of merit) takes to decrease to half when
an alarm generation signal is received.
4.2.2.2 Transmission Alarm Processing

Transmission alarms are processed as follows:
1. After a transmission device generates an alarm, it determines whether to report the alarm
to its connected IP device based on the alarm type.
– If the alarm type is b3tca, sdbere, or sfbere, the device determines whether the
alarm threshold is reached.
If the threshold is reached, the device reports the alarm to the IP devices for
processing.
Otherwise, the device ignores the alarms.
– All other alarm types are directly reported to the IP device for processing.
2. If alarms are configured to be recorded to logs, the alarms are recorded after being
generated.
3. The IP device determines whether to change the physical status of the interface based on
customized alarm types.
– If no alarm types are customized to affect the physical status of the interface, alarms
of these types are ignored. The physical status of the interface remains unchanged.
– If an alarm type is customized to affect the physical status of the interface, the
alarm is processed based on the transmission alarm customization mechanism.
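The decision flow in the steps above can be sketched as follows. This is a simplified model, not the device's implementation: the function names and the set of customized alarm types are assumptions, while the three threshold-based alarm types come from the text.

```python
# Alarm types that are reported only after their threshold is reached (from the text).
THRESHOLD_ALARMS = {"b3tca", "sdbere", "sfbere"}


def is_reported(alarm_type: str, threshold_reached: bool = False) -> bool:
    """Step 1: decide whether the transmission device reports the alarm."""
    if alarm_type in THRESHOLD_ALARMS:
        return threshold_reached
    return True  # all other alarm types are reported directly


def may_change_interface_status(alarm_type: str, customized_types: set,
                                threshold_reached: bool = False) -> bool:
    """Step 3: only customized alarm types can affect the physical status."""
    return is_reported(alarm_type, threshold_reached) and alarm_type in customized_types


# Hypothetical customization: only LOS and sdbere may affect the interface.
customized = {"los", "sdbere"}
print(may_change_interface_status("los", customized))           # True
print(may_change_interface_status("sdbere", customized))        # False
print(may_change_interface_status("sdbere", customized, True))  # True
print(may_change_interface_status("b3tca", customized, True))   # False
```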
Transmission Alarm Customization

When a transmission device reports alarm signals to an IP device, the IP device determines
whether to change the physical status of its interface based on the transmission alarm
customization function.

l If a certain type of alarms is customized to affect the interface status but transmission
alarm filtering or suppression is not configured:
– The physical status of the interface changes to Down if such an alarm is generated.
– The physical status of the interface changes to Up if such an alarm is cleared.
l If a certain type of alarms is customized to affect the interface status and transmission
alarm filtering or suppression is configured, the IP device processes the alarm according
to the filtering mechanism or suppression parameters.
(Optional) Transmission Alarm Filtering

Transmission alarm filtering enables an IP device to determine whether an alarm signal is a
burr.
If the interval between an alarm signal generation and clearance is less than the filtering timer
value, this alarm signal is considered a burr.
l If the alarm signal is a burr, it is ignored. The physical status of the interface remains
unchanged.
l If the alarm signal is not a burr:
– The physical status of the interface changes to Down if the signal is an alarm
generation signal.
– The physical status of the interface changes to Up if the signal is an alarm clearance
signal that is not suppressed.
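The burr check used by alarm filtering can be expressed in a few lines. This is a minimal sketch: the function name `is_burr` and the 100 ms timer value are assumptions chosen for illustration.

```python
def is_burr(generated_ms: float, cleared_ms: float, filter_timer_ms: float) -> bool:
    """Return True if the alarm was cleared before the filtering timer expired."""
    return (cleared_ms - generated_ms) < filter_timer_ms


# An alarm cleared 50 ms after generation, checked against a hypothetical 100 ms timer.
print(is_burr(0, 50, 100))   # True: treated as a burr, interface status unchanged
print(is_burr(0, 250, 100))  # False: a real alarm, interface status may change
```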
(Optional) Transmission Alarm Suppression

Transmission alarm suppression enables an IP device to determine how to process an alarm
signal.
l When an alarm's figure of merit is smaller than suppress:
– If no alarm generation or clearance signal is received, figure of merit decreases
with time.
– If an alarm generation signal is received, the physical status of the interface changes
to Down, and figure of merit increases by the penalty value.
– If an alarm clearance signal is received, the physical status of the interface changes
to Up. figure of merit decreases exponentially.
l When an alarm's figure of merit reaches suppress, this alarm is suppressed. The
generation or clearance signal of this alarm does not affect the physical status of the
interface.
l When an alarm is frequently generated, figure of merit reaches ceiling. figure of merit
then stops increasing, even if new alarm signals arrive. If no alarm signals arrive, figure
of merit decreases with time.
l When an alarm's figure of merit decreases to reuse, this alarm is free from suppression.
After the alarm is free from suppression, the process repeats if this alarm is generated again.
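The suppression behavior described above can be modeled with a short simulation. This is a sketch under stated assumptions, not the device's algorithm: the class name, all parameter values, and the use of a single half-life for decay (instead of separate decay-ok and decay-ng values) are assumptions. It also assumes that a signal is evaluated against the suppression state before the penalty is added.

```python
from dataclasses import dataclass


@dataclass
class FlapDamper:
    penalty: float = 1000.0      # hypothetical values; reuse < suppress < ceiling
    suppress: float = 2000.0
    reuse: float = 750.0
    ceiling: float = 6000.0
    half_life_s: float = 1.0     # assumption: one half-life for all decay
    figure_of_merit: float = 0.0
    suppressed: bool = False
    last_update_s: float = 0.0

    def __post_init__(self) -> None:
        if not (self.reuse < self.suppress < self.ceiling):
            raise ValueError("require reuse < suppress < ceiling")

    def _decay(self, now_s: float) -> None:
        """Decay figure of merit exponentially and lift suppression at reuse."""
        elapsed = max(0.0, now_s - self.last_update_s)
        self.figure_of_merit *= 0.5 ** (elapsed / self.half_life_s)
        self.last_update_s = now_s
        if self.suppressed and self.figure_of_merit <= self.reuse:
            self.suppressed = False

    def on_alarm(self, now_s: float) -> bool:
        """Handle an alarm generation signal; True means the interface goes Down."""
        self._decay(now_s)
        affects_interface = not self.suppressed
        # Add the penalty, but never exceed ceiling.
        self.figure_of_merit = min(self.figure_of_merit + self.penalty, self.ceiling)
        if self.figure_of_merit >= self.suppress:
            self.suppressed = True
        return affects_interface

    def on_clear(self, now_s: float) -> bool:
        """Handle an alarm clearance signal; True means the interface goes Up."""
        self._decay(now_s)
        return not self.suppressed


damper = FlapDamper()
print([damper.on_alarm(t) for t in (0.0, 0.1, 0.2, 0.3)])  # [True, True, True, False]
print(damper.on_alarm(10.0))  # True: figure of merit decayed below reuse
```

With these values, the third alarm pushes figure of merit past suppress, so the fourth alarm no longer affects the interface. After a quiet period the value decays below reuse and alarms take effect again.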

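Figure 4-5 Alarm suppression attenuation
(Figure: curve of figure of merit over time, with the ceiling, suppress, and reuse thresholds
on the vertical axis and the time points t1 to t5 on the horizontal axis.)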
Figure 4-5 shows the correlation between a transmission device sending alarm generation
signals and how figure of merit increases and decreases.
1. At t1 and t2, figure of merit is smaller than suppress. Therefore, alarm signals
generated at t1 and t2 affect the physical status of the interface, and the physical status of
the interface changes to Down.
2. At t3, figure of merit exceeds suppress, and the alarm is suppressed. The physical status
of the interface is not affected, even if new alarm signals arrive.
3. At t4, figure of merit reaches ceiling. If new alarm signals arrive, figure of merit is
recalculated but does not exceed ceiling.
4. At t5, figure of merit falls below reuse, and the alarm is free from suppression.
Terms
None

Abbreviation
SDH synchronous digital hierarchy
SONET synchronous optical network

VRP versatile routing platform
4.3 Interface Alarm Inversion
4.3.1 Introduction
Definition
Alarm inversion inverts the state of LOS alarms when they are generated on physical
interfaces during device deployment.
Purpose
Alarm inversion inverts the reported state of a generated LOS alarm. This
function is useful during device deployment when a device's physical interfaces have services
configured but are not yet connected to any cables. In this scenario, without alarm inversion
enabled, the device reports LOS alarms to the NMS. With alarm inversion enabled, the
device does not report these LOS alarms. After the cause of the LOS alarms is cleared, alarm
inversion is automatically disabled so that subsequent LOS alarms can be reported. Alarm
inversion does not affect network monitoring.
4.3.2 Principles
4.3.2.1 Alarm Inversion Mode

Two alarm inversion modes are available on an ATN device's physical interfaces:
non-inversion and inversion (automatic recovery). Table 4-1 describes how LOS alarms are
processed in the two modes.
Table 4-1 LOS alarm processing in alarm inversion modes

Inversion Mode        Processing Condition                    Processing Result
Non-inversion         The ATN device is in the normal alarm   None
                      monitoring status (default status).
Non-inversion         The ATN device's global alarm           An error is returned if alarm
                      inversion mode is set to                inversion is enabled on a physical
                      non-inversion.                          interface.
Inversion (automatic  The alarm inversion status is set to    l The setting fails if no LOS alarms
recovery)             Enable on the ATN device's physical       are generated on physical interfaces.
                      interfaces.                             l If physical interfaces have LOS
                                                                alarms, the LOS alarms are suppressed.
                                                              NOTE
                                                              After the cause of LOS alarms is
                                                              cleared, alarm inversion is
                                                              automatically disabled on the
                                                              physical interfaces.
Inversion (automatic  The alarm inversion status is set to    The status of the LOS alarms reported
recovery)             Disable on the ATN device's physical    by the physical interfaces is the
                      interfaces.                             actual status of the LOS alarms.
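The per-interface behavior in Table 4-1 can be summarized as a small state model. This is an illustrative sketch only: the class and method names are assumptions, and the model covers a single physical interface in inversion (automatic recovery) mode.

```python
class PortAlarmInversion:
    """Simplified model of LOS alarm inversion on one physical interface."""

    def __init__(self) -> None:
        self.los_present = False
        self.inversion_enabled = False

    def set_los(self, present: bool) -> None:
        """Record whether the cause of the LOS alarm currently exists."""
        self.los_present = present
        if not present:
            # Automatic recovery: inversion is disabled once the cause is cleared.
            self.inversion_enabled = False

    def enable_inversion(self) -> bool:
        """Try to enable inversion; the setting fails if no LOS alarm exists."""
        if not self.los_present:
            return False
        self.inversion_enabled = True
        return True

    def reported_los(self) -> bool:
        """Return True if a LOS alarm is reported for this interface."""
        return self.los_present and not self.inversion_enabled


port = PortAlarmInversion()
print(port.enable_inversion())  # False: no LOS alarm, so the setting fails
port.set_los(True)              # cable not yet connected
print(port.enable_inversion())  # True
print(port.reported_los())      # False: the LOS alarm is suppressed
port.set_los(False)             # cable connected, inversion disabled automatically
port.set_los(True)              # a later fault
print(port.reported_los())      # True: the alarm is reported again
```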

5 LAN Access and MAN Access
About This Chapter
This chapter describes LAN access and MAN access in terms of overview, principles, and
applications.
5.1 Ethernet
5.2 VLAN
5.3 Trunk
5.4 STP/RSTP/MSTP
5.5 QinQ
5.6 RRPP
5.7 LLDP
5.8 Transparent Transmission of Layer 2 Protocol Packets
5.9 ERPS (G.8032)
5.10 Automatic Link Discovery
5.1 Ethernet
5.1.1 Introduction to Ethernet
Definition
The Ethernet technology originates from an experimental network with the purpose of
connecting multiple PCs at the speed of 3 Mbit/s. In general, Ethernet refers to a standard for
10 Mbit/s Ethernet networks. The Digital Equipment Corporation (DEC), Intel, and Xerox
(DIX) joined efforts to develop and then issued the standard in 1982. The IEEE 802.3
standard is developed on the basis of the Ethernet standard, and is compatible with it.

In TCP/IP, the encapsulation format of IP packets of the Ethernet is defined in RFC 894, and
that of the IEEE 802.3 network is defined in RFC 1042. Currently, the most commonly-used
encapsulation format is that defined in RFC 894, which is called Ethernet_II or Ethernet DIX.
NOTE
To distinguish Ethernet frames of those two types, in this document, Ethernet frames defined in RFC
894 are called Ethernet_II frames; Ethernet frames defined in RFC 1042 are called IEEE 802.3 frames.
Purpose
Ethernet is a universal communication protocol standard used for local area networks (LANs).
This standard defines the cable type and signal processing method used for LANs.
Ethernet networks are broadcast networks established based on the Carrier Sense Multiple
Access with Collision Detection (CSMA/CD) mechanism. Collisions restrict Ethernet
performance. Early Ethernet devices such as hubs work at the physical layer, and cannot
confine collisions to a particular scope. This restricts network performance improvement.
Working at the data link layer, switches are able to confine collisions to a particular scope.
Therefore, switches help improve Ethernet performance and gradually replace hubs to become
mainstream Ethernet devices. Switches, however, do not restrict broadcast traffic on the
Ethernet. This affects Ethernet performance. To resolve this problem, divide a LAN into
virtual local area networks (VLANs) on switches or use Layer 3 switches.
As a simple, cheap, and easy-to-implement LAN technology, Ethernet has become the
mainstream in the industry. The development of Fast Ethernet (FE) and Gigabit Ethernet
(GE), which provide higher Ethernet performance, helps Ethernet become the most promising
network technology.
5.1.2 Principles
5.1.2.1 Physical Layer of the Ethernet
Introduction to Ethernet Cable Standards

Currently, the well-developed Ethernet cabling standards are as follows:
l 10BASE-2
l 10BASE-5
l 10BASE-T
l 10BASE-F
l 100BASE-T4
l 100BASE-TX
l 100BASE-FX
l 1000BASE-SX
l 1000BASE-LX
l 1000BASE-CX
l 1000BASE-TX
In the preceding cabling standards, 10, 100, and 1000 stand for the transmission rate (the unit
is Mbit/s), and BASE represents baseband.

l 10M Ethernet cable standards

Table 5-1 shows the 10M Ethernet cabling standards defined in IEEE 802.3.
Table 5-1 10M Ethernet cable standards

Name       Cable                Maximum Transmission Distance
10BASE-5   Thick coaxial cable  500 m
10BASE-2   Thin coaxial cable   185 m (nominally 200 m)
10BASE-T   Twisted pair cable   100 m
10BASE-F   Fiber                2000 m
NOTE
The fatal defect of the coaxial cable is the fact that devices on the cable are connected in series and
therefore a single node failure can cause the breakdown of the entire network. As the physical
standards of coaxial cables, 10BASE-2 and 10BASE-5 have fallen into disuse.
l 100M Ethernet cable standards
The 100M Ethernet is also called Fast Ethernet (FE). Compared with the 10M Ethernet,
the 100M Ethernet has a higher transmission rate at the physical layer, but the two do not
differ at the data link layer.
Table 5-2 lists the 100M Ethernet cable standards.
Table 5-2 100M Ethernet cable standards

Name        Cable                                      Maximum Transmission Distance
100Base-T4  Four pairs of Category 3 twisted pair      100 m
            cables
100Base-TX  Two pairs of Category 5 twisted pair       100 m
            cables
100Base-FX  Single-mode fiber or multi-mode fiber      2000 m
Both the 10Base-T and 100Base-TX are applied to Category 5 twisted pair cables. They
have different transmission rates. The 10Base-T transmits data at a rate of 10 Mbit/s
whereas the 100Base-TX transmits data at 100 Mbit/s.
The 100Base-T4 is rarely adopted now.
l Gigabit Ethernet cable standards
The Gigabit Ethernet is developed on the basis of the Ethernet standard defined in IEEE
802.3. Based on the Ethernet protocol, the transmission rate of the FE is increased by 10
times and reaches 1 Gbit/s. Table 5-3 lists the Gigabit Ethernet cable standards.

Table 5-3 Gigabit Ethernet cable standards

Interface Name  Cable                                   Maximum Transmission Distance
1000Base-LX     Single-mode fiber or multi-mode fiber   316 m
1000Base-SX     Multi-mode fiber                        316 m
1000Base-CX     Balanced twisted pair copper wire       25 m
                cable
1000Base-TX     Category 5 twisted pair cable           100 m
Using the Gigabit Ethernet technology, you can upgrade the existing Fast Ethernet from
100 Mbit/s to 1000 Mbit/s.
The physical layer of Gigabit Ethernet uses 8B10B coding. In traditional Ethernet
technology, the data link layer delivers 8-bit data sets to the physical layer. After
processing, the physical layer sends the data sets out, still as 8-bit sets.
Gigabit Ethernet over optical fibers is different: the physical
layer maps the 8-bit data sets received from the data link layer to 10-bit data sets and
then sends them out.
l 10GE cable standards
IEEE 802.3ae is the 10GE cable standard. 10GE uses only optical fiber and works only in
full-duplex mode.
10GE deployment is under way, and 10GE is expected to be widely deployed in the future.
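As a worked example of the 8B10B coding mentioned above for Gigabit Ethernet over optical fibers: because every 8 data bits are sent as 10 code bits, the signaling rate on the line is higher than the data rate. The helper below is a hypothetical illustration of that arithmetic.

```python
def line_rate_mbaud(data_rate_mbps: float, data_bits: int = 8, code_bits: int = 10) -> float:
    """Return the line signaling rate needed to carry the given data rate."""
    return data_rate_mbps * code_bits / data_bits


# 1000 Mbit/s of data with 8B10B coding needs 1250 Mbaud on the line.
print(line_rate_mbaud(1000))  # 1250.0
```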
CSMA/CD
l Concept of CSMA/CD
The Ethernet network was originally designed to connect computers and other digital
devices on a shared physical line. The computers and digital devices can access the
shared line only in half-duplex mode. Therefore, a mechanism of collision detection and
avoidance is required to prevent multiple devices from contending for the line. Carrier
Sense Multiple Access with Collision Detection (CSMA/CD) is therefore introduced.
The concept of CSMA/CD is described as follows:
– CS: carrier sense
Before transmitting data, a station monitors the line to check whether the line is
idle. In this manner, chances of collision are decreased.
– MA: multiple access
The data sent by a station can be received by multiple stations.
– CD: collision detection
If two stations transmit electrical signals at the same time, the signals are
superimposed, so the voltage amplitude is double the normal amplitude.
This indicates a collision.
After sensing the collision, the stations stop transmission and resume the
transmission after a random delay.

l Working process of CSMA/CD

CSMA/CD works as follows:
a. A station continuously detects whether the shared line is idle.
– If the line is idle, the station sends data.
– If the line is in use, the station waits until the line becomes idle.
b. If two stations send data at the same time, a conflict occurs on the line, and the
signal becomes unstable.
c. After detecting the instability, the station immediately stops sending data.
d. The station sends a series of jamming pulses to inform other stations, especially the
station that sent data at the same time, that a conflict has occurred on the line.
After waiting for a random period of time, the station resumes the data
transmission.
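The text above says only that a station waits for a random period after a conflict. One common way to choose that period is the truncated binary exponential backoff defined in IEEE 802.3, which this document does not describe. The sketch below illustrates that rule; the function name and the injectable random generator are assumptions.

```python
import random

SLOT_TIME_BITS = 512   # one slot time is 512 bit times (64 bytes)
BACKOFF_LIMIT = 10     # the exponent stops growing after 10 collisions
ATTEMPT_LIMIT = 16     # the frame is dropped after 16 failed attempts


def backoff_slots(collision_count: int, rng=random) -> int:
    """Return how many slot times to wait after the n-th consecutive collision."""
    if collision_count < 1:
        raise ValueError("collision_count must be at least 1")
    if collision_count >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame is dropped")
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform integer in [0, 2**k - 1]


# After the third collision the wait is between 0 and 7 slot times.
wait = backoff_slots(3)
print(0 <= wait <= 7)  # True
```

The waiting range doubles with each consecutive collision, so stations that keep colliding spread their retries over a longer interval.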
Minimum Frame Length and Maximum Transmission Distance

l Minimum frame length
Due to the limitation of the CSMA/CD algorithm, an Ethernet frame cannot be shorter
than a certain length. On the Ethernet, the minimum frame length is 64 bytes, which is
determined jointly by the maximum transmission distance and the collision detection
mechanism.
The minimum frame length prevents the situation where station A finishes
sending the last bit of a frame before the first bit arrives at a distant station B.
In that situation, station B considers the line idle and begins to send data, causing a
conflict that station A cannot detect because it has already stopped sending.
The upper layer protocol must guarantee that the Data field contains at least 46 bytes.
The 46-byte Data field plus the 14-byte Ethernet frame header and the 4-byte check code
at the frame tail equals the minimum frame length of 64 bytes. If the Data field is shorter
than 46 bytes, the upper layer must pad the field.
The upper limit of the Data field is 1500 bytes, a value determined by memory
cost and the buffer size of low-cost LAN controllers in 1979.
l Maximum transmission distance
The maximum transmission distance is decided by the factors such as line quality and
signal attenuation.
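The minimum frame length arithmetic described above can be checked with a short sketch. The constants come from the text; the function names and the use of zero bytes for padding are assumptions for illustration.

```python
HEADER_BYTES = 14      # destination MAC (6) + source MAC (6) + type/length (2)
FCS_BYTES = 4          # check code at the frame tail
MIN_DATA_BYTES = 46
MAX_DATA_BYTES = 1500


def pad_data(data: bytes) -> bytes:
    """Pad the Data field to the 46-byte minimum (zero padding is an assumption)."""
    if len(data) > MAX_DATA_BYTES:
        raise ValueError("Data field exceeds 1500 bytes")
    return data.ljust(MIN_DATA_BYTES, b"\x00")


def frame_length(data: bytes) -> int:
    """Return the frame length in bytes: header + padded Data field + check code."""
    return HEADER_BYTES + len(pad_data(data)) + FCS_BYTES


print(frame_length(b"hi"))         # 64: the minimum frame length
print(frame_length(bytes(1500)))   # 1518: the maximum frame length
# Time to send a minimum-length frame at 10 Mbit/s, in microseconds.
print(frame_length(b"") * 8 / 10)  # 51.2
```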
Duplex Modes of the Ethernet

The physical layer of an Ethernet can work in either half-duplex or full-duplex mode.
l Half-duplex mode
The half-duplex mode has the following features:
– Receiving data or sending data takes place in only one direction at a time.
– The CSMA/CD mechanism is adopted.
– The transmission distance is limited.
Hubs work in half-duplex mode.
l Full-duplex mode

After Layer 2 switches replaced Hubs in networking, the shared Ethernet became the
switched Ethernet, and the half-duplex mode was replaced by the full-duplex mode. As a
result, the transmission rate increased drastically, and the maximum throughput
reaches twice the transmission rate.
The full-duplex mode eliminates conflicts. CSMA/CD,
therefore, is no longer needed on full-duplex Ethernet links.
The full-duplex mode has the following features:
– Transmitting data and receiving data can take place simultaneously.
– The maximum throughput doubles the transmission rate.
– This mode does not have the limitation on the transmission distance.
Except for Hubs, the network cards, Layer 2 devices, and Layer 3 devices produced in
the last 10 years all support the full-duplex mode.
To realize the full-duplex mode, the hardware requirements are as follows:
– Full-duplex network cards and chips
– Physical media over which sending and receiving frames are separated
– Point-to-point connection
Auto-Negotiation of the Ethernet

l Purpose of auto-negotiation
Early Ethernet used the 10 Mbit/s half-duplex mode, so mechanisms such
as CSMA/CD were required to guarantee system stability. As technology developed,
the full-duplex mode and 100M Ethernet emerged, greatly improving Ethernet
performance. This raised a new problem: how to achieve compatibility
between the earlier Ethernet and the newly constructed Ethernet.
The auto-negotiation technology is therefore introduced. In auto-negotiation, the devices
on two ends of a link can choose the same operation parameters by exchanging
information. The main parameters to be negotiated are mode (half-duplex or full-
duplex), speed, and flow control. After the negotiation succeeds, the devices on two ends
operate in the negotiated mode and rate.
The auto-negotiation of duplex is defined in the following standards:
– 100M Ethernet standard: IEEE 802.3u
In IEEE 802.3u, the auto-negotiation is defined as an optional function.
– Gigabit Ethernet standard: IEEE 802.3z
In IEEE 802.3z, the auto-negotiation is defined as a mandatory and default function.
l Principle of auto-negotiation
Auto-negotiation is an Ethernet procedure by which two connected devices choose
common transmission parameters. It allows a network device to transmit its supported
operating modes to the peer and receive the peer's operating modes. In this
process, the connected devices first share their capabilities regarding these parameters
and then choose the highest-performance transmission mode they both support.
When no data is transmitted over twisted pair cables on an Ethernet network, pulses of
high frequency are transmitted at an interval of 16 ms to form Normal Link Pulse (NLP)
to maintain the connections at the link layer. Some pulses of high frequency can be
inserted in the NLP to form Fast Link Pulse (FLP) to transmit more information, as
shown in Figure 5-1. The basic mechanism of auto-negotiation is to encapsulate the
negotiation information into FLP.

Figure 5-1 Schematic diagram of pulse insertion
(Figure: link pulses sent at 16 ms intervals, with labels "16ms" and "1ms" and the note
"16 small pulses are inserted into every pulse".)
Similar to an Ethernet network that uses twisted pair cables, an Ethernet network that
uses optical modules and optical fibers also implements auto-negotiation by sending
code streams. These code streams are called Configuration (C) code streams. Unlike
electrical interfaces, optical interfaces generally do not negotiate transmission
rates, and they work in full-duplex mode. Therefore, only flow control parameters are
negotiated.
Auto-negotiation priorities of the Ethernet duplex link are listed as follows in a
descending order:
– 1000M full-duplex
– 1000M half-duplex
– 100M full-duplex
– 100M half-duplex
– 10M full-duplex
– 10M half-duplex
If auto-negotiation succeeds, the Ethernet card activates the link. Then, data can be
transmitted on the link. If auto-negotiation fails, the link is unavailable.
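The priority list above can be read as a simple selection rule: the link uses the highest-priority mode that both ends advertise. The sketch below illustrates this; the function name and the capability lists are assumptions, and real negotiation is carried out by the physical-layer chips.

```python
# Modes in descending priority, as listed above.
PRIORITY = [
    "1000M full-duplex",
    "1000M half-duplex",
    "100M full-duplex",
    "100M half-duplex",
    "10M full-duplex",
    "10M half-duplex",
]


def negotiate(local_modes, remote_modes):
    """Return the highest-priority mode both ends support, or None on failure."""
    common = set(local_modes) & set(remote_modes)
    for mode in PRIORITY:
        if mode in common:
            return mode
    return None  # no common mode: negotiation fails and the link is unavailable


ge_modes = PRIORITY        # hypothetical GE electrical interface
fe_modes = PRIORITY[2:]    # hypothetical FE electrical interface
print(negotiate(ge_modes, fe_modes))                          # 100M full-duplex
print(negotiate(["10M half-duplex"], ["100M full-duplex"]))   # None
```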
Auto-negotiation is implemented based on the chip design at the physical layer. As
defined in IEEE 802.3, auto-negotiation is implemented in any of the following cases:
– A faulty link recovers.
– A device is re-powered on.
– Either of two connected devices resets.
– A renegotiation request packet is received.
In other cases, two connected devices do not always send auto-negotiation code streams.
Auto-negotiation does not use special packets or bring additional protocol costs.
l Auto-negotiation rules for interfaces
Two connected interfaces can communicate with each other only when they are in the
same working mode.
– If both interfaces work in the same non-auto-negotiation mode, the interfaces can
communicate.
– If both interfaces work in auto-negotiation mode, the interfaces can communicate
through negotiation. The negotiated working mode depends on the interface with
lower capability (specifically, if one interface works in full-duplex mode and the
other interface works in half-duplex mode, the negotiated working mode is half-
duplex). The auto-negotiation function also allows the interfaces to negotiate about
the traffic control function.

– If a local interface works in auto-negotiation mode and the remote interface works
in a non-auto-negotiation mode, the negotiated working mode of the local interface
depends on the working mode of the remote interface.
Table 5-4 describes the auto-negotiation rules for interfaces of the same type.
Table 5-4 Auto-negotiation rules for interfaces of the same type (the local interface
works in auto-negotiation mode)

Interface Type           Working Mode of the Remote Interface  Auto-negotiation Result
FE electrical interface  10M half-duplex                       10M half-duplex
FE electrical interface  10M full-duplex                       10M half-duplex
FE electrical interface  100M half-duplex                      100M half-duplex
FE electrical interface  100M full-duplex                      100M half-duplex
GE electrical interface  FE auto-negotiation                   100M full-duplex
GE electrical interface  10M half-duplex                       10M half-duplex
GE electrical interface  10M full-duplex                       10M half-duplex
GE electrical interface  100M half-duplex                      100M half-duplex
GE electrical interface  100M full-duplex                      100M half-duplex
GE electrical interface  1000M full-duplex                     1000M full-duplex

Description (applies to both interface types): If the remote interface works in 10M
full-duplex or 100M full-duplex mode, the working modes of the two interfaces are
different after auto-negotiation, and packets may be dropped. Therefore, if the remote
interface works in 10M full-duplex or 100M full-duplex mode, configure the local
interface to work in the same mode.
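The rows of Table 5-4 in which the remote interface works in a fixed mode follow one pattern: the local interface adopts the remote speed, and at 10M or 100M it falls back to half-duplex. The sketch below encodes only those rows. The function names are assumptions, and combinations that the table does not list are rejected.

```python
from typing import Tuple


def autoneg_result(remote_speed_mbps: int, remote_duplex: str) -> Tuple[int, str]:
    """Return the (speed, duplex) the local auto-negotiating interface ends up with."""
    if remote_duplex not in ("half", "full"):
        raise ValueError("duplex must be 'half' or 'full'")
    if remote_speed_mbps in (10, 100):
        return (remote_speed_mbps, "half")
    if remote_speed_mbps == 1000 and remote_duplex == "full":
        return (1000, "full")
    raise ValueError("combination not listed in Table 5-4")


def has_duplex_mismatch(remote_speed_mbps: int, remote_duplex: str) -> bool:
    """Return True if the two ends work in different duplex modes."""
    return autoneg_result(remote_speed_mbps, remote_duplex)[1] != remote_duplex


print(autoneg_result(100, "full"))       # (100, 'half')
print(has_duplex_mismatch(100, "full"))  # True: packets may be dropped
print(has_duplex_mismatch(10, "half"))   # False
```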

Table 5-5 describes the auto-negotiation rules for interfaces of different types.
Table 5-5 Auto-negotiation rules for interfaces of different types
(Interface type: an FE electrical interface connecting to a GE electrical interface)

Working Mode of an FE   Working Mode of a GE   Auto-negotiation
Electrical Interface    Electrical Interface   Result
10M half-duplex         Auto-negotiation       10M half-duplex
10M full-duplex         Auto-negotiation       10M half-duplex
100M half-duplex        Auto-negotiation       100M half-duplex
100M full-duplex        Auto-negotiation       100M half-duplex
Auto-negotiation        10M half-duplex        10M half-duplex
Auto-negotiation        10M full-duplex        10M half-duplex
Auto-negotiation        100M half-duplex       100M half-duplex
Auto-negotiation        100M full-duplex       100M half-duplex
Auto-negotiation        1000M full-duplex      Failure

Description (GE electrical interface in auto-negotiation mode): If the FE electrical
interface works in 10M full-duplex or 100M full-duplex mode and the GE electrical
interface works in auto-negotiation mode, the working modes of the two interfaces are
different after auto-negotiation, and packets may be dropped. Therefore, if the FE
electrical interface works in 10M full-duplex or 100M full-duplex mode, configure the GE
electrical interface to work in the same mode.
Description (FE electrical interface in auto-negotiation mode): If the FE electrical
interface works in auto-negotiation mode and the GE electrical interface works in 10M
full-duplex or 100M full-duplex mode, the working modes of the two interfaces are
different after auto-negotiation, and packets may be dropped. Therefore, if the GE
electrical interface works in 10M full-duplex or 100M full-duplex mode, configure the FE
electrical interface to work in the same mode.
Do not configure the GE electrical interface to work in 1000M full-duplex mode. If you
configure the GE electrical interface to work in this mode, auto-negotiation fails.

According to the auto-negotiation rules described in Table 5-4 and Table 5-5, if an
interface works in auto-negotiation mode and the connected interface works in a
non-auto-negotiation mode, packets may be dropped or auto-negotiation may fail. It
is recommended that you configure two connected interfaces to work in the same
mode to ensure that they can communicate properly.
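The rows of Table 5-5 can be summarized in a few lines of code. The following Python sketch is
only an illustration of the table, not device software: the function names, the AUTO constant,
and the (speed, duplex) tuples are assumptions made for this example. It covers only the case
in Table 5-5 where one end works in auto-negotiation mode and the other end works in a fixed
mode.

```python
# Illustrative model of Table 5-5 (FE electrical interface <-> GE electrical interface).
# All names here are hypothetical; they are not part of any device API.
AUTO = "auto"


def negotiate(fe_mode, ge_mode):
    """Return the mode selected by the auto-negotiating end, or "failure".

    A fixed mode is a (speed_in_mbit, duplex) tuple such as (100, "full").
    Exactly one end must be AUTO, as in every row of Table 5-5.
    """
    if (fe_mode == AUTO) == (ge_mode == AUTO):
        raise ValueError("exactly one end must work in auto-negotiation mode")
    speed, _duplex = ge_mode if fe_mode == AUTO else fe_mode
    if speed == 1000:
        # An FE interface cannot work at 1000M, so auto-negotiation fails.
        return "failure"
    # The auto-negotiating end follows the peer's rate but selects half duplex.
    return (speed, "half")


def modes_differ(fe_mode, ge_mode):
    """True if the two ends end up in different modes (packets may be dropped)."""
    fixed = ge_mode if fe_mode == AUTO else fe_mode
    result = negotiate(fe_mode, ge_mode)
    return result == "failure" or result != fixed
```

For example, modes_differ((100, "full"), AUTO) is true, which is why the table recommends
configuring both ends to work in the same mode.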
FE and higher-rate optical interfaces support only the full-duplex mode. On GE interfaces,
auto-negotiation is enabled to negotiate flow control. When devices are directly connected
using GE optical interfaces, auto-negotiation is also enabled on the optical interfaces to
detect unidirectional optical fiber faults. If one of the two optical fibers is faulty, the
fault information is synchronized to both ends through auto-negotiation, and the interfaces on
both ends go Down. After the fault is rectified, the interfaces go Up again through
auto-negotiation.
HUB
• Hub principle
When terminals are connected using twisted pair cables, a convergence device called a Hub is
required. A Hub operates at the physical layer and interconnects devices. Figure 5-2 shows a
Hub operation model.
Figure 5-2 Hub operation model
(Figure: two terminals, each with all seven layers from the application layer down to the
physical layer, are connected through a Hub that operates only at the physical layer.)
The appearance of a Hub is a box with multiple interfaces. Each interface can connect to
a terminal. Therefore, multiple devices can be connected through a Hub to form a star
topology.
NOTE
Note that although the topology is physically a star shape, the Hub uses the bus and CSMA/CD
technologies.

Figure 5-3 Hub operation principle
(Figure: a signal received on port 1 (IN) is sent out of ports 2, 3, 4, and 5 (OUT).)
• According to the supported interfaces, Hubs can be classified into the following two
types:
– Category-I Hub: provides physical interfaces of only one type.
For example, a Category-I Hub provides only Category-5 twisted pair interfaces,
Category-3 twisted pair interfaces, or optical fiber interfaces.
– Category-II Hub: provides interfaces of different types. For example, a Category-II
Hub can provide both Category-5 twisted pair interfaces and optical fiber interfaces.
The two types do not differ in internal operation mode; however, they are used in
different scenarios because they provide different types of interfaces. In practice,
Category-I Hubs are commonly used.
5.1.2.2 Data Link Layer of the Ethernet
Hierarchical Structure of the Data Link Layer

In the Ethernet, the access mode depends on the duplex mode:
• In half-duplex mode, CSMA/CD is used.
• In full-duplex mode, data is sent without detecting whether the line is idle.
Duplex mode, either half or full, refers to the operation mode of the physical layer. Access
mode refers to the access method of the data link layer. In the Ethernet, the data link layer
and physical layer are therefore associated: different operation modes require different
access modes. This brings about some inconvenience to the design and application of the
Ethernet.
To remove this dependency, some organizations and vendors proposed dividing the data link
layer into two sub-layers: the Media Access Control (MAC) sub-layer and the Logical Link
Control (LLC) sub-layer. Different physical layers then correspond to different MAC
sub-layers, and the LLC sub-layer becomes totally independent of the physical layer, as shown
in Figure 5-4.
Figure 5-4 Hierarchical structure of the data link layer of the Ethernet
(Figure: the data link layer, located between the network layer and the physical layer,
consists of the LLC sub-layer on top and the MAC sub-layer below.)

Functions of the MAC sub-layer

The MAC sub-layer is responsible for the following:
• Providing access to physical links.
The MAC sub-layer is associated with the physical layer. That is, different MAC sub-layers
provide access to different physical layers.
In the Ethernet, two types of MAC sub-layers exist:
– Half-duplex MAC: provides access to the physical layer in half-duplex mode.
– Full-duplex MAC: provides access to the physical layer in full-duplex mode.
Both types of MAC are integrated in a network interface card. After the network
interface card is initialized, auto-negotiation is performed to choose an operation mode,
and then a MAC is chosen according to the operation mode.
• Identifying stations at the data link layer.
The MAC sub-layer uses a unique MAC address to identify each station.
MAC addresses are managed by the Institute of Electrical and Electronics Engineers (IEEE)
and allocated in blocks. An organization, generally a vendor, obtains a unique address
block from IEEE. The address block is called the Organizationally Unique Identifier
(OUI). Using the OUI, the organization can allocate addresses to 16777216 (2^24) devices.
A MAC address consists of 48 bits, which are generally represented in 12-digit dotted
hexadecimal notation. For example, the 48-bit MAC address
000000001110000011111100001110011000000000110100 is generally represented by
00e0.fc39.8034.
The first 6 digits in dotted hexadecimal notation stand for the OUI; the last 6 digits are
allocated by the vendor. For example, in 00e0.fc39.8034, 00e0.fc is the OUI allocated by
IEEE to Huawei; 39.8034 is the address number allocated by Huawei.
The second bit of a MAC address indicates whether the address is globally unique or
locally unique. The Ethernet uses globally unique MAC addresses.
MAC addresses are divided into the following types:
– Physical MAC address
A physical MAC address is burned into hardware (such as a network interface card)
and is used to uniquely identify a terminal on the Ethernet.
– Broadcast MAC address
A broadcast MAC address indicates all the terminals on a network.
The 48 bits of a broadcast MAC address are all 1s, that is, ffff.ffff.ffff.
• Transmitting data over the data link layer.
After receiving data from the LLC sub-layer, the MAC sub-layer adds the MAC addresses and
control information to the data, and then transmits the data to the physical link. In the
process, the MAC sub-layer provides other functions such as error checking.
Data transmission at the data link layer is as follows:
a. The upper layer delivers data to the MAC sub-layer.
b. The MAC sub-layer stores the data in the buffer.
c. The MAC sub-layer adds the destination MAC address and source MAC address to
the data, calculates the length of the data frame, and forms an Ethernet frame.
d. The Ethernet frame is sent to the peer according to the destination MAC address.
e. The peer compares the destination MAC address with entries in the MAC address
table.
▪ If an entry is matched, the frame is accepted.
▪ If no entry is matched, the frame is discarded.
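The MAC address notation and the accept-or-discard check in step e can be illustrated with a
short Python sketch. The helper names below are hypothetical and chosen for this example only.
Accepting broadcast frames in accepts() is an assumption that follows from the definition of
the broadcast MAC address; it is not spelled out in step e.

```python
def bits_to_dotted_hex(bits):
    """Convert a 48-character bit string to 12-digit dotted hexadecimal notation."""
    if len(bits) != 48 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 48 binary digits")
    digits = format(int(bits, 2), "012x")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def split_mac(mac):
    """Split a dotted-hex MAC address into (OUI, vendor-allocated part)."""
    digits = mac.replace(".", "").lower()
    if len(digits) != 12:
        raise ValueError("expected 12 hexadecimal digits")
    int(digits, 16)  # raises ValueError if a digit is not hexadecimal
    return digits[:6], digits[6:]


def is_broadcast(mac):
    """A broadcast MAC address has all 48 bits set to 1."""
    return mac.replace(".", "").lower() == "f" * 12


def accepts(known_macs, dmac):
    """Step e: accept the frame if the destination MAC matches a known entry.

    Broadcast frames are also accepted here (assumption, see the lead-in).
    """
    return dmac.lower() in known_macs or is_broadcast(dmac)


# Number of addresses available within one OUI: the last 24 bits are vendor-allocated.
ADDRESSES_PER_OUI = 2 ** 24
```

With these helpers, the 48-bit example address above converts to 00e0.fc39.8034, and
split_mac("00e0.fc39.8034") separates the OUI 00e0fc from the vendor-allocated part 398034.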
Frame Structure of the Ethernet

• Format of an Ethernet_II frame
Figure 5-5 Format of an Ethernet_II frame
DMAC       SMAC       Type       Data               CRC
6 bytes    6 bytes    2 bytes    46-1500 bytes      4 bytes
The fields of the Ethernet_II frame are described as follows:

– DMAC
It indicates the destination MAC address. DMAC specifies the receiver of the
frame.
– SMAC
It indicates the source MAC address. SMAC specifies the station that sends the
frame.
– Type
The 2-byte Type field identifies the upper layer protocol of the Data field. The
receiver determines the meaning of the Data field according to the Type field.
On the Ethernet, multiple protocols can coexist on a LAN. The hexadecimal values
in the Type field of an Ethernet_II frame stand for different protocols.
▪ Frames with the Type field value being 0800 are IP frames.
▪ Frames with the Type field value being 0806 are Address Resolution Protocol
(ARP) frames.
▪ Frames with the Type field value being 8035 are Reverse Address Resolution
Protocol (RARP) frames.
▪ Frames with the Type field value being 8137 are Internetwork Packet
Exchange (IPX) and Sequenced Packet Exchange (SPX) frames.
▪ Frames with the Type field value being 8847 are Multiprotocol Label Switching
(MPLS) frames.
– Data
The minimum length of the Data field is 46 bytes, which guarantees that the frame
is at least 64 bytes in length. The 46-byte Data field is required even if you attempt
to transmit only 1-byte information.
If the payload of the Data field is less than 46 bytes, the Data field must be padded
to 46 bytes.
The maximum length of the Data field is 1500 bytes.
– CRC
The Cyclic Redundancy Check (CRC) field provides an error detection mechanism.
Each sending device calculates a CRC code over the DMAC, SMAC, Type, and Data
fields. The CRC code is then filled into the 4-byte CRC field.
• Format of an IEEE 802.3 frame
Figure 5-6 Format of an IEEE 802.3 frame
DMAC    SMAC    Length    LLC                           SNAP                   Data    CRC
                          DSAP      SSAP      Control   Org Code    Type
                          1 byte    1 byte    1 byte    3 bytes     2 bytes
As shown in Figure 5-6, the format of an IEEE 802.3 frame is similar to that of an
Ethernet_II frame except that in an IEEE 802.3 frame, the Type field is changed to the
Length field, and the LLC field and the Sub-Network Access Protocol (SNAP) field
occupy 8 bytes of the Data field.
– Length
The Length field specifies the number of bytes of the Data field.
– LLC
The LLC field consists of three sub-fields: Destination Service Access Point
(DSAP), Source Service Access Point (SSAP), and Control.
– SNAP
The SNAP field consists of the Org Code field and the Type field. Three bytes in
the Org Code field are all 0s. The Type field functions the same as the Type field in
Ethernet_II frames.
For descriptions about other fields, see the relevant description of Ethernet_II frames.
Based on the values of DSAP and SSAP, IEEE 802.3 frames can be divided into the
following types:
– If DSAP and SSAP are both 0xff, the IEEE 802.3 frame is a NetWare-Ethernet frame
that bears NetWare data.
– If DSAP and SSAP are both 0xaa, the IEEE 802.3 frame is an Ethernet_SNAP frame.
Ethernet_SNAP frames can encapsulate data of multiple protocols. SNAP can be
considered an extension of the Ethernet protocol. SNAP allows vendors to define
their own Ethernet transmission protocols.
The Ethernet_SNAP standard is defined by IEEE 802.1 to guarantee
interoperability between IEEE 802.3 LANs and Ethernet networks.
– Other values of DSAP and SSAP indicate IEEE 802.3 frames.
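The frame layouts described above can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not a complete implementation: the function names are
hypothetical, zero bytes are used as padding, and the FCS is computed with zlib.crc32, which
uses the same CRC-32 polynomial as Ethernet. The rule that a Type/Length value of 0x0600 or
greater denotes an Ethernet_II Type field comes from IEEE 802.3, not from the text above.

```python
import struct
import zlib

MIN_DATA = 46     # minimum length of the Data field, in bytes
MAX_DATA = 1500   # maximum length of the Data field, in bytes


def mac_bytes(mac):
    """Convert dotted-hex notation such as 00e0.fc39.8034 to 6 raw bytes."""
    return bytes.fromhex(mac.replace(".", ""))


def build_ethernet_ii(dmac, smac, eth_type, payload):
    """Build DMAC + SMAC + Type + Data + CRC for an Ethernet_II frame."""
    if len(payload) > MAX_DATA:
        raise ValueError("the Data field is limited to 1500 bytes")
    data = payload.ljust(MIN_DATA, b"\x00")  # pad short payloads to 46 bytes
    body = mac_bytes(dmac) + mac_bytes(smac) + struct.pack("!H", eth_type) + data
    # CRC over DMAC, SMAC, Type, and Data, appended least significant byte first.
    return body + struct.pack("<I", zlib.crc32(body))


def classify(frame):
    """Classify a frame by its Type/Length field and, if present, DSAP/SSAP."""
    (type_or_length,) = struct.unpack("!H", frame[12:14])
    if type_or_length >= 0x0600:  # IEEE 802.3 rule: this value is a Type, not a Length
        return "Ethernet_II"
    dsap, ssap = frame[14], frame[15]
    if dsap == ssap == 0xFF:
        return "NetWare-Ethernet"
    if dsap == ssap == 0xAA:
        return "Ethernet_SNAP"
    return "IEEE 802.3"


# A 1-byte ARP payload still yields a 64-byte frame because of padding.
arp_frame = build_ethernet_ii("ffff.ffff.ffff", "00e0.fc39.8034", 0x0806, b"\x01")
```

The example frame is 64 bytes long (14-byte header, 46-byte padded Data field, 4-byte CRC),
which matches the minimum frame length stated for the Data field.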
LLC Sub-layer
As described, the MAC sub-layer supports two types of frames: IEEE 802.3 frames and
Ethernet_II frames. In an Ethernet_II frame, the Type field identifies the upper layer
protocol. Therefore, on a device that uses Ethernet_II frames, only the MAC sub-layer is
required, and the LLC sub-layer does not need to be implemented.
In an IEEE 802.3 frame, besides the traditional services of the data link layer, the LLC
sub-layer defines additional useful features. All these features are provided by the DSAP,
SSAP, and Control sub-fields.
The LLC sub-layer provides the following three types of point-to-point services:
• Connectionless service
Currently, the Ethernet implements this service.
• Connection-oriented service
A connection is set up before data is transmitted. The reliability of the data is
guaranteed during the transmission.
• Connectionless data transmission with acknowledgement
No connection is required before data transmission. An acknowledgement
mechanism is adopted to improve reliability.
The following is an example that describes the applications of SSAP and DSAP. Assume that
terminals A and B use connection-oriented services. Data is transmitted in the following
process:
1. A sends a frame to B to request the establishment of a connection with B.
2. After receiving the frame, if B has enough resources, B returns an acknowledgement
message that contains a Service Access Point (SAP). The SAP identifies the connection
required by A.
3. After receiving the acknowledgement message, A knows that B has set up a local
connection with A. After creating a SAP, A sends a message containing the SAP to B.
The connection is set up.
4. The LLC sub-layer of A encapsulates the data into a frame. The DSAP field is filled in
with the SAP sent by B; the SSAP field is filled in with the SAP created by A. Then the
LLC sub-layer sends the data to the MAC sub-layer of A.
5. The MAC sub-layer of A adds the MAC addresses and the Length field to the frame, and
then sends the frame to the physical layer.
6. After the frame is received at the MAC sub-layer of B, the frame is transmitted to the
LLC sub-layer. The LLC sub-layer figures out the connection to which the frame belongs
according to the DSAP field.
7. After checking and acknowledging the frame based on the connection type, the LLC sub-
layer of B transmits the frame to the upper layer.
8. After the data transmission is complete, A sends a frame to instruct B to release the
connection. The communication then ends.
5.1.3 Applications
5.1.3.1 Computer Interconnection

Computer interconnection is the primary purpose and the major application of the Ethernet
technology.
Initially, many computers were connected using coaxial cables to access shared
directories or a file server located on the local network segment. All the computers,
whether servers or hosts, were equal on the network.
This structure, however, cannot keep up with the development of applications. Currently, most
traffic flows between clients and servers, and this traffic model inevitably creates a
bottleneck at the servers.
After the full-duplex Ethernet technology and Ethernet switches were introduced, servers
were connected to high-speed interfaces (100 Mbit/s) on Ethernet switches, and clients were
connected to low-speed interfaces on Ethernet switches. This alleviates the traffic
bottleneck. Modern operating systems provide distributed services and database services.
Servers based on these operating systems communicate with clients and other servers for data
synchronization. 100M FE cannot meet the resulting bandwidth requirement; therefore, the
1000M Ethernet technology emerged.
5.1.3.2 Interconnection Between High-Speed Network Devices

With the development of the Internet, bandwidth between some traditional network devices
cannot meet the transmission requirements. As a higher-speed and more efficient technology,
1000M Ethernet is the first choice to solve the problem. 100M FE can also solve this
problem because, after being aggregated, 100M FE links can form FE channels whose
speed ranges from 100 Mbit/s to 1000 Mbit/s.
5.1.3.3 Means to Access MANs

Accessing the Metropolitan Area Network (MAN) to go online, download files, and view
Video on Demand (VoD) programs has become more and more popular. The Ethernet
technology is used as the means to access MANs because most computers support Ethernet
network interface cards. Therefore, users can go online without changing software and
hardware configurations.
Terms
Term              Description
10Base-T          Defined in IEEE 802.3i, it is an Ethernet specification that uses twisted
                  pair cables with a maximum length of 100 meters (328.08 ft.) per network
                  segment at 10 Mbit/s.
100Base-T         Defined in IEEE 802.3u, it is a Fast Ethernet specification that uses
                  twisted pair cables with a maximum length of 100 meters (328.08 ft.) per
                  network segment at 100 Mbit/s.
1000Base-T        Defined in IEEE 802.3ab, it is an Ethernet specification that uses twisted
                  pair cables with a maximum length of 100 meters (328.08 ft.) per network
                  segment at 1000 Mbit/s.
Ethernet          Created by Xerox and developed by Xerox, Intel, and Digital Equipment
                  Corporation (DEC), it is a baseband LAN specification that uses CSMA/CD
                  and transmits data over various cables at 10 Mbit/s. Ethernet-related
                  standards are defined in the IEEE 802.3 series.
Ethernet_II       It is an encapsulation format of Ethernet frames, which is the standard
                  ARPA Ethernet Version 2.0 encapsulation that uses a 16-bit protocol type
                  code.
Ethernet_SNAP     It is an encapsulation format of Ethernet frames. As specified in RFC
                  1042, it allows Ethernet frames to be transmitted through IEEE 802.2
                  media.
FE                It is short for Fast Ethernet. Complying with IEEE 802.3u, it is an
                  extension and enhancement of the traditional media-sharing Ethernet
                  standard and allows data to be transmitted at 100 Mbit/s.
Full-duplex       The full-duplex mode is an operation mode of Ethernet interfaces. In
                  full-duplex mode, interfaces on both ends can send and receive data at
                  the same time without interruption.
GE                It is short for Gigabit Ethernet. Complying with IEEE 802.3z, GE is
                  compatible with 10M Ethernet and 100M Ethernet (FE).
Half-duplex       The half-duplex mode is an operation mode of Ethernet interfaces. In
                  half-duplex mode, an interface can either receive or send data at a given
                  time, but not both.
MAC               It is short for Media Access Control. At the data link layer of the OSI
                  model, the MAC sub-layer is adjacent to the physical layer.
Auto-negotiation  Through auto-negotiation, devices on both ends of a physical link
                  exchange information to automatically select an operation mode. In
                  auto-negotiation, the duplex mode and operation rate are negotiated.
                  Once the negotiation result is approved, the operation mode is fixed
                  until the device is restarted or the cable is removed.

Acronym & Abbreviation    Full Name
CSMA/CD                   Carrier Sense Multiple Access with Collision Detection
GE                        Gigabit Ethernet
MAC                       Media Access Control
5.2 VLAN
5.2.1 Introduction
Definition
The Virtual Local Area Network (VLAN) technology logically divides a physical LAN into
multiple VLANs, each of which is a broadcast domain. Hosts can communicate directly only
within the same VLAN, which improves network security.
Purpose
The traditional LAN technology uses the bus structure and has the following shortcomings:
• Conflicts occur if multiple nodes send messages simultaneously.
• Messages are broadcast to all nodes.
• Networks have security risks because all hosts in a LAN share the same transmission
channel.
To overcome these shortcomings, bridges and Layer 2 switches are used to effectively isolate
collision domains.
However, bridges and Layer 2 switches cannot address the network security issues caused by
broadcast domains.
NOTE
In this document, a Layer 2 switch is referred to as a switch.
To reduce broadcast traffic, broadcast traffic must be isolated among hosts that do not
need to communicate with each other. The ATN can select routes based on IP addresses and
effectively reduce broadcast traffic between two connected network segments. However, this
solution is costly. VLANs, that is, multiple logical LANs built on one physical LAN, were
therefore developed.
Only hosts in the same VLAN can communicate with each other. Broadcast packets are
therefore confined within each VLAN, and network security is also enhanced.
For example, it is costly for different companies in the same building to build their own
LANs. If these companies share the same LAN in the building, there may be security
problems.
To address these problems, these companies can use the VLAN technology.
Figure 5-7 Typical VLAN application

(Figure: three switches, each connecting hosts that belong to VLAN-A, VLAN-B, and VLAN-C.)
Figure 5-7 shows a networking diagram for a typical VLAN application. Three switches are
placed at different locations (for example, different floors of a building). Each switch
connects to three hosts that belong to different VLANs. Each VLAN can be used by a
different company. In the diagram, a dotted box indicates a VLAN.
5.2.2 Principles
VLAN Frame Format

IEEE 802.1Q defines a VLAN frame by adding a 4-byte 802.1Q tag between the source MAC
address field and the Length/Type field in an Ethernet frame, as shown in Figure 5-8.

Figure 5-8 VLAN frame format defined in IEEE 802.1Q

Destination    Source     802.1Q     Length     Data             FCS
address        address    Tag        /Type                       (CRC-32)
6 bytes        6 bytes    4 bytes    2 bytes    42-1500 bytes    4 bytes

802.1Q Tag:    Type       PRI        CFI        VID
               2 bytes    3 bits     1 bit      12 bits
An 802.1Q tag contains four fields:

• Type
The 2-byte Type field indicates a frame type. If the value of the field is 0x8100, it
indicates an 802.1Q frame. If a device that does not support 802.1Q frames receives an
802.1Q frame, it discards the frame.
• PRI
The 3-bit Priority field indicates a frame priority. The value of the field ranges from 0 to
7. A greater PRI value indicates a higher frame priority. If a switch is congested, it
preferentially sends frames with a higher priority.
• CFI
The 1-bit Canonical Format Indicator (CFI) field indicates whether a MAC address is in
the canonical format. If the CFI field value is 0, the MAC address is in canonical format.
If the CFI field value is 1, the MAC address is not in canonical format. This field is
mainly used to differentiate among Ethernet frames, Fiber Distributed Data Interface
(FDDI) frames, and token ring frames. The CFI field value in an Ethernet frame is 0.
• VID
The 12-bit VLAN ID (VID) field indicates the VLAN to which the frame belongs. In the
ATN, the VLAN ID ranges from 0 to 4095. Note that 0 and 4095 are reserved VLAN
IDs and unavailable to users.
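The bit layout of the 802.1Q tag can be made concrete with a short Python sketch. The function
names below are hypothetical and used only for illustration. The sketch packs the four fields
into 4 bytes in network byte order and inserts the tag after the source MAC address, as
described above.

```python
import struct

TPID = 0x8100  # value of the Type field that identifies an 802.1Q frame


def pack_tag(pri, cfi, vid):
    """Pack PRI (3 bits), CFI (1 bit), and VID (12 bits) behind the 2-byte Type field."""
    if not 0 <= pri <= 7:
        raise ValueError("PRI ranges from 0 to 7")
    if cfi not in (0, 1):
        raise ValueError("CFI is a single bit")
    if not 0 <= vid <= 4095:
        raise ValueError("VID ranges from 0 to 4095")
    tci = (pri << 13) | (cfi << 12) | vid
    return struct.pack("!HH", TPID, tci)


def unpack_tag(tag):
    """Return (pri, cfi, vid) from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 0x1, tci & 0x0FFF


def is_user_vlan(vid):
    """VLAN IDs 0 and 4095 are reserved, so users can configure 1 to 4094."""
    return 1 <= vid <= 4094


def insert_tag(frame, tag):
    """Insert the tag between the source MAC address (bytes 6-11) and Length/Type."""
    return frame[:12] + tag + frame[12:]
```

For example, pack_tag(5, 0, 100) produces the bytes 81 00 a0 64: the Type value 0x8100
followed by priority 5, CFI 0, and VLAN ID 100.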
Link Types
VLAN links can be classified into the following types:
• Access link: a link connecting a host and a switch. In Figure 5-9, the links between PCs
and switches are all access links.
• Trunk link: a link connecting switches. In Figure 5-9, the links between switches are
trunk links. Frames transmitted over trunk links carry VLAN tags.

Figure 5-9 Link types network diagram
(Figure: hosts in VLAN 2 and VLAN 3 connect to switches over access links; the switches
connect to each other over trunk links.)
Port Types
Some ports of a device can identify VLAN frames defined by IEEE 802.1Q, whereas others
cannot. Ports can be classified into three types based on whether they can identify VLAN
frames:
• Access port
An access port connects a switch to a host over an access link, as shown in Figure 5-9.
An access port has the following features:
– Directly discards frames with VLAN tags.
– Adds a tag with the PVID to each received untagged frame.
– Removes the tag from a frame before it sends the frame.
• Trunk port
As shown in Figure 5-9, a trunk port connects a switch to another switch over a trunk
link. A trunk port has the following features:
– Allows tagged frames from multiple VLANs to pass.
– Directly discards untagged frames.
– Directly sends tagged frames.
• Hybrid port
As shown in Figure 5-10, a hybrid port connects a switch to either a host over an access
link or another switch over a trunk link. A hybrid port allows frames from multiple
VLANs to pass and removes tags from outgoing VLAN frames by default.

Figure 5-10 Port types
(Figure legend: hybrid port, access link, trunk link.)
For details on QinQ, see QinQ.
Default VLAN
A default VLAN can be configured on access, trunk, and hybrid ports. However, the
meanings of default VLAN vary with port types.
• Default VLAN of an access port
– If an access port receives a tagged frame, it discards the frame.
– Before an access port sends a frame with a tag whose VID is the same as the PVID,
it removes the VLAN tag from the frame. Frames sent by an access port to a peer
device never carry VLAN tags.
• Default VLAN of a hybrid port
– If a hybrid port receives an untagged frame, it adds a VLAN tag to the frame and
sets the VID in the tag to the PVID.
– When a hybrid port receives a tagged frame:
▪ If the frame's VLAN ID is permitted by the port, the port accepts the tagged
frame.
▪ If the frame's VLAN ID is denied by the port, the port discards the tagged
frame.
– When a hybrid port sends a frame:
▪ If the frame's VLAN ID is permitted by the port, the port directly transmits the
frame.
▪ If the frame's VLAN ID is the same as the PVID, the port strips the VLAN tag
and sends the frame out.
5.2.2.2 VLAN Communication Principles
Basic Principles
To improve frame processing efficiency, a switch processes all frames internally as tagged
frames. If an untagged frame enters a switch port that has a default VLAN configured, the
port adds a VLAN tag whose VID is the same as the PVID to the frame. If a tagged frame
enters a switch port, the port does not add any tag to the frame, even if the port has a
default VLAN configured.
The switch processes frames differently according to port types. The following table
describes how ports of different types process frames.
Port type: Access
  Processing a received untagged frame: Accepts the untagged frame and adds a tag with the
  default VLAN ID to the frame.
  Processing a received tagged frame: Discards the frame.
  Processing a frame to be sent: Removes the tag that contains the PVID and sends the frame
  out.

Port type: Trunk
  Processing a received untagged frame: Discards the frame.
  Processing a received tagged frame:
  • Accepts the tagged frame if the frame's VLAN ID is permitted by the port.
  • Discards the tagged frame if the frame's VLAN ID is denied by the port.
  Processing a frame to be sent: If the frame's VLAN ID is permitted by the port, the port
  transmits the frame. Otherwise, the port discards the frame.

Port type: Hybrid
  Processing a received untagged frame: Accepts the untagged frame and adds a tag with the
  PVID to the frame.
  Processing a received tagged frame:
  • Accepts the tagged frame if the frame's VLAN ID is permitted by the port.
  • Discards the tagged frame if the VLAN ID carried in the frame is denied by the port.
  Processing a frame to be sent:
  • If the frame's VLAN ID is permitted by the port, the port directly transmits the frame.
  • If the frame's VLAN ID is the same as the PVID, the port strips the PVID tag and sends
    the frame out, as specified by the default VLAN configuration.

Port type: QinQ
  A QinQ port adds a tag to each single-tagged frame and supports a maximum of 4094 x 4094
  VLAN tags, which meets the requirement of a Metropolitan Area Network (MAN) on the number
  of VLANs.
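The per-port rules in the preceding table can be expressed as two small functions, one for
received frames and one for frames to be sent. The following Python sketch is a simplified
model under stated assumptions, not the device's forwarding logic: the Port class and function
names are hypothetical, QinQ ports are omitted, and a hybrid port accepts a tagged frame when
its VLAN ID is permitted by the port, following the permit/deny wording used for hybrid ports
under Default VLAN. Cases that the table does not describe are treated as discards and are
marked in the comments.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    kind: str                                   # "access", "trunk", or "hybrid"
    pvid: int = 1                               # default VLAN ID of the port
    allowed: set = field(default_factory=set)   # VLAN IDs permitted on trunk/hybrid ports


def receive(port, vid):
    """Return the VLAN ID a received frame is processed in, or None if discarded.

    vid is None for an untagged frame.
    """
    if vid is None:
        # Access and hybrid ports tag the frame with the PVID; trunk ports discard it.
        return None if port.kind == "trunk" else port.pvid
    if port.kind == "access":
        return None                             # access ports discard tagged frames
    return vid if vid in port.allowed else None


def send(port, vid):
    """Return ("untagged", None), ("tagged", vid), or None if the frame is not sent."""
    if port.kind == "access":
        # Assumption: frames of other VLANs are never sent out of an access port.
        return ("untagged", None) if vid == port.pvid else None
    if port.kind == "hybrid" and vid == port.pvid:
        return ("untagged", None)               # strip the PVID tag
    if vid in port.allowed:
        return ("tagged", vid)
    return None                                 # for hybrid ports this case is an assumption
```

For example, a trunk port that permits VLAN 2 and VLAN 3 returns None from receive() for an
untagged frame and ("tagged", 3) from send() for a frame in VLAN 3.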

Principles of Intra-VLAN Communication Across Switches

Hosts within the same VLAN sometimes connect to different switches. In this situation, ports
of different switches must recognize and send packets that belong to the same VLAN.
To meet this requirement, a trunk link is used.
A trunk link has the following functions:
• Relay function
A trunk link can transparently transmit VLAN packets from a switch to another
interconnected switch.
• Backbone function
A trunk link can transmit packets belonging to multiple VLANs.
Figure 5-11 Trunk link communication
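(Figure: Host A connects to GE1 on ATNA, and Host B connects to GE4 on ATNB. GE2 on ATNA and
GE3 on ATNB are connected by a trunk link that carries VLAN 2 and VLAN 3.)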
In Figure 5-11, the trunk link between ATNA and ATNB must support both communication
within VLAN 2 and communication within VLAN 3. To implement this, the ports at both
ends of the trunk link must be configured to belong to both VLANs. Specifically, GE2 on
ATNA and GE3 on ATNB must belong to both VLAN 2 and VLAN 3.
Host A sends a frame to Host B in the following process:
1. Host A sends the frame to GE1 on ATNA.
2. GE1 adds a tag with a VLAN ID of 2 to the frame. 2 is the ID of the VLAN to
which GE1 belongs.
3. ATNA sends the frame to all its interfaces that belong to VLAN 2, except GE1.
4. GE2 sends the frame to ATNB.
5. After receiving the frame on GE3, ATNB finds that the frame belongs to VLAN 2 and sends
the frame to its interfaces that belong to VLAN 2 (except GE3).
6. GE4 sends the frame to Host B.
Communication within VLAN 3 is similar and is omitted here.
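The forwarding process above can be traced with a small simulation. The following Python
sketch is an illustration only. The Switch class is hypothetical, port names follow Figure
5-11 (GE1 and GE2 on ATNA, GE3 and GE4 on ATNB), MAC address learning is omitted so that every
frame is flooded within its VLAN, and the hosts in VLAN 3 are left out because their ports are
not named in the text.

```python
class Switch:
    """A switch that floods a frame to every other port in the frame's VLAN."""

    def __init__(self, name, port_vlans):
        self.name = name
        self.port_vlans = port_vlans   # port name -> set of VLAN IDs the port belongs to
        self.links = {}                # port name -> (peer switch, peer port) or host name

    def flood(self, in_port, vid, path):
        for port, vlans in self.port_vlans.items():
            if port == in_port or vid not in vlans:
                continue               # never send back out of the receiving port
            peer = self.links.get(port)
            if isinstance(peer, tuple):
                peer_switch, peer_port = peer
                path.append(f"{self.name}.{port} -> {peer_switch.name}.{peer_port}")
                peer_switch.flood(peer_port, vid, path)
            elif peer is not None:
                path.append(f"{self.name}.{port} -> {peer}")
        return path


# The ports at both ends of the trunk link belong to VLAN 2 and VLAN 3.
atna = Switch("ATNA", {"GE1": {2}, "GE2": {2, 3}})
atnb = Switch("ATNB", {"GE3": {2, 3}, "GE4": {2}})
atna.links = {"GE2": (atnb, "GE3")}
atnb.links = {"GE4": "Host B"}

# Host A's frame enters GE1 on ATNA and is tagged with VLAN ID 2.
path = atna.flood("GE1", 2, [])
```

After the call, path lists the two hops taken by the frame: from GE2 on ATNA to GE3 on ATNB,
and from GE4 on ATNB to Host B.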

Inter-VLAN Communication
After VLANs are configured, hosts in different VLANs cannot directly communicate with
each other at Layer 2. To implement communication between VLANs, you must create routes
between these VLANs. Inter-VLAN communication can be implemented in either of the following
ways:
• Layer 2 switch + router
On the network shown in Figure 5-12, a switched Ethernet interface on a Layer 2 switch
connects to a routed Ethernet interface on a router for inter-VLAN communication.
Figure 5-12 Inter-VLAN communication implemented through Layer 2 switch + Router
(Figure: a router with sub-interfaces connects over a VLAN trunk to a Layer 2 switch, whose
access ports connect hosts in VLAN 2 and VLAN 3.)
If VLAN 2 and VLAN 3 are configured on the switch, to enable VLAN 2 to communicate with
VLAN 3, you must create two sub-interfaces corresponding to VLAN 2 and VLAN 3 on the
Ethernet interface of the router connected to the switch.
Then you must enable 802.1Q encapsulation and assign IP addresses to the sub-interfaces.
On the switch, you must configure the switched Ethernet port that connects to the
router as a trunk or hybrid port and allow frames of VLAN 2 and VLAN 3 to pass.
The Layer 2 switch + router mode has the following shortcomings:
– Multiple devices are needed, and the networking is complex.
– Inter-VLAN communication is implemented through a router, which is expensive
and has a low transmission rate.
• Layer 3 switch
Layer 3 switching combines routing and switching techniques to implement routing
on a switch, improving the overall performance of the network. After sending the first
data flow based on a routing table, a Layer 3 switch generates a mapping table that
records the mapping between the MAC address and IP address for this data flow. If
the switch needs to send the same data flow again, it forwards the data flow at Layer 2
based on the mapping table instead of routing it at Layer 3. In this manner, delays on
the network caused by route selection are eliminated, and data forwarding efficiency is
improved.
To allow the first data flow to be correctly forwarded based on the routing table, the
routing table must contain correct routing entries. A Layer 3 interface and a routing
protocol must therefore be configured on the Layer 3 switch. VLANIF interfaces are
introduced for this purpose.
A VLANIF interface is a Layer 3 logical interface, which can be configured on either a
Layer 3 switch or a router.
As shown in Figure 5-13, two VLANs, VLAN 2 and VLAN 3, are configured on the
switch. To implement communication between the two VLANs, create two VLANIF
interfaces on the switch, assign IP addresses to them, and configure routes for them.
Figure 5-13 Inter-VLAN communication implemented through Layer 3 switch
(Figure: a Layer 3 switch with one VLANIF interface for VLAN 2 and one VLANIF interface for
VLAN 3.)
Layer 3 switching addresses the shortcomings of the Layer 2 switch + router scheme
and implements faster traffic forwarding at a lower cost. Nevertheless, Layer 3
switching has the following shortcomings:
– The Layer 3 switch scheme applies only to networks in which almost all interfaces
are Ethernet interfaces.
– Layer 3 switching applies only to networks with stable routes and few changes
in the network topology.
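The "route the first flow, then switch the rest" behavior of a Layer 3 switch can be sketched
as a cache in front of the routing lookup. The following Python sketch is a conceptual
illustration only: the class, its attributes, and the sample addresses are hypothetical, the
routing table is reduced to a dictionary of host routes, and cache aging is not modeled.

```python
class Layer3Switch:
    """Route the first packet of a flow, then forward later packets from a mapping table."""

    def __init__(self, routes, neighbors):
        self.routes = routes          # destination IP -> next-hop IP (simplified host routes)
        self.neighbors = neighbors    # next-hop IP -> MAC address
        self.mapping = {}             # destination IP -> MAC address, filled by the first flow
        self.route_lookups = 0        # counts how often the routing table is consulted

    def forward(self, dst_ip):
        """Return the destination MAC address used to forward a packet to dst_ip."""
        mac = self.mapping.get(dst_ip)
        if mac is None:
            # First data flow: Layer 3 lookup in the routing table.
            self.route_lookups += 1
            next_hop = self.routes[dst_ip]
            mac = self.neighbors[next_hop]
            self.mapping[dst_ip] = mac
        # Later flows to the same destination skip route selection.
        return mac


switch = Layer3Switch(
    routes={"10.0.3.5": "10.0.3.5"},
    neighbors={"10.0.3.5": "00e0.fc39.8034"},
)
first = switch.forward("10.0.3.5")
second = switch.forward("10.0.3.5")
```

Both calls return the same MAC address, but only the first one consults the routing table,
so switch.route_lookups is 1 afterwards.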
5.2.2.3 VLAN Aggregation
Background
VLANs are widely used on switching networks because they allow flexible control of broadcast
domains and are convenient to deploy. On a Layer 3 switch, inter-VLAN communication is
implemented by configuring a VLANIF interface (a logical Layer 3 interface) for each VLAN
and assigning an IP address to each VLANIF interface. Because each VLAN then needs its own
subnet, this wastes IP addresses. Figure 5-14 shows a typical VLAN division on the device.

Figure 5-14 VLAN division
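(Figure: a Layer 3 switch with three VLANIF interfaces: VLANIF2 (1.1.1.1) for VLAN 2 on
subnet 1.1.1.0/28, VLANIF3 (1.1.1.17) for VLAN 3 on subnet 1.1.1.16/29, and VLANIF4
(1.1.1.25) for VLAN 4 on subnet 1.1.1.24/30.)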
Table 5-6 Example of host address assignment in a typical VLAN

VLAN   Subnet         Gateway     Number of    Number of    Practical
                      Address     Available    Available    Requirements
                                  Addresses    Hosts
2      1.1.1.0/28     1.1.1.1     14           13           10
3      1.1.1.16/29    1.1.1.17    6            5            5
4      1.1.1.24/30    1.1.1.25    2            1            1
In Table 5-6, VLAN 2 requires 10 host addresses. Subnet 1.1.1.0/28, which has a 28-bit mask,
is assigned to VLAN 2. 1.1.1.0 is the subnet address, and 1.1.1.15 is the directed broadcast
address. These two addresses cannot be used as host addresses. In addition, 1.1.1.1, the
default gateway address of the subnet, cannot be used as a host address. The other 13
addresses, ranging from 1.1.1.2 to 1.1.1.14, can be used by hosts. In this way, although
VLAN 2 needs only 10 addresses, 13 host addresses are assigned to it according to the
division of the subnet.
VLAN 3 requires five host addresses, so subnet 1.1.1.16/29 with a 29-bit mask is assigned
to VLAN 3. VLAN 4 requires only one address, so subnet 1.1.1.24/30 with a 30-bit mask is
assigned to VLAN 4.
In the preceding example, 16 (10+5+1) addresses are required for all the VLANs. However, 28
(16+8+4) addresses are consumed according to the common VLAN addressing mode even if
the optimal scheme is used. Therefore, nearly half of the addresses are wasted. In addition,
if VLAN 2 is later accessed by only three hosts instead of ten, the extra addresses are also
wasted.
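The address arithmetic in Table 5-6 can be checked with Python's standard ipaddress module.
The sketch below is only a worked example: the plan dictionary and the summarize() function
are names introduced for illustration, and the gateway is assumed to be the first usable
address of each subnet, as in the table.

```python
import ipaddress

# VLAN ID -> (subnet, number of hosts actually required), taken from Table 5-6.
plan = {
    2: ("1.1.1.0/28", 10),
    3: ("1.1.1.16/29", 5),
    4: ("1.1.1.24/30", 1),
}


def summarize(subnet):
    """Count the addresses of a subnet as in Table 5-6."""
    net = ipaddress.ip_network(subnet)
    usable = list(net.hosts())    # excludes the subnet and directed broadcast addresses
    return {
        "total": net.num_addresses,
        "available": len(usable),
        "gateway": str(usable[0]),   # assumption: the gateway takes the first usable address
        "hosts": len(usable) - 1,    # usable addresses left after the gateway is reserved
    }


rows = {vlan: summarize(subnet) for vlan, (subnet, _need) in plan.items()}
consumed = sum(row["total"] for row in rows.values())    # addresses taken from the block
required = sum(need for _subnet, need in plan.values())  # addresses the hosts actually need
```

The results reproduce the table: VLAN 2 has 14 available addresses and 13 available host
addresses, and the three subnets consume 28 addresses for 16 required hosts.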
This division is also inconvenient for future network upgrade and expansion. Suppose that
VLAN 4 needs two additional hosts, the IP addresses already assigned must not change, and the
addresses following subnet 1.1.1.24/30 have been assigned to others. In this case, a new subnet
with a 29-bit mask and a new VLAN must be assigned to the new hosts of VLAN 4's customer.
As a result, the customer has only three hosts, but they are assigned to two different subnets
in separate VLANs, which complicates network management.
In the preceding example, several IP addresses are used as subnet addresses, directed
broadcast addresses of subnets, and default gateway addresses of subnets. These IP addresses
cannot be used as host addresses in the VLANs. VLAN aggregation is
used to eliminate this limitation on address assignment.
Principles
VLAN aggregation, also known as a super-VLAN, partitions broadcast domains by using
multiple VLANs in a physical network so different VLANs can belong to the same subnet. In
VLAN aggregation, two basic concepts are involved, super VLAN and sub-VLAN.
l Super VLAN: Super VLANs differ from common VLANs. In super VLANs, only Layer
3 interfaces are created and physical ports are not contained. The super VLAN can be
regarded as a logical Layer 3 collection of many sub-VLANs.
l Sub-VLAN: Sub-VLANs are used to isolate broadcast domains. In sub-VLANs, only
physical ports are contained, and Layer 3 VLAN interfaces cannot be created. The Layer
3 switching with the external network is implemented through the super VLAN Layer 3
interface.
A super VLAN can contain one or more sub-VLANs each with different broadcast domains.
The sub-VLAN does not occupy an independent subnet segment. In the same super VLAN, IP
addresses of hosts belong to the super VLAN's subnet segment, regardless of the mapping
between hosts and sub-VLANs.
The same Layer 3 interface is shared by sub-VLANs. Some subnet IDs, default gateway
addresses of the subnet, and directed broadcast addresses of the subnet are saved. In addition,
different broadcast domains can use the unused addresses in the same subnet segment. As a
result, subnet differences are eliminated, addressing becomes flexible and previously wasted
addresses can be used.
The following uses the example in Table 5-6 to explain the implementation principle. Suppose
that user demands are unchanged: VLAN 2 requires 10 host addresses, VLAN 3 requires 5 host
addresses, and VLAN 4 requires 1 host address.
Create VLAN 10 and configure it as a super VLAN. Then assign subnet
1.1.1.0/24, with a 24-bit mask, to VLAN 10, where 1.1.1.0 is the subnet ID and
1.1.1.1 is the gateway address of the subnet, as shown in Figure 5-15. The address assignment
of sub-VLANs VLAN 2, VLAN 3, and VLAN 4 is shown in Table 5-7.

Figure 5-15 VLAN aggregation
Super VLAN 10 (VLANIF10: 1.1.1.1/24) contains three sub-VLANs:
Sub-VLAN 2: host IP addresses 1.1.1.2-1.1.1.11
Sub-VLAN 3: host IP addresses 1.1.1.12-1.1.1.16
Sub-VLAN 4: host IP address 1.1.1.17
Table 5-7 Example of host address assignment in VLAN aggregation mode

VLAN  Subnet      Gateway Address  Number of Available Addresses  Available Addresses  Address Requirements
2     1.1.1.0/24  1.1.1.1          10                             1.1.1.2-1.1.1.11     10
3     1.1.1.0/24  1.1.1.1          5                              1.1.1.12-1.1.1.16    5
4     1.1.1.0/24  1.1.1.1          1                              1.1.1.17             1
In VLAN aggregation implementation, sub-VLANs are not divided according to the previous
subnet border. Instead, their addresses are flexibly assigned in the super VLAN's subnet
according to the required number of hosts.
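The flexible assignment described above can be sketched as a sequential allocation from the super VLAN's subnet. This is only an illustration, using Python's standard ipaddress module. The function name allocate and the strategy of handing out consecutive addresses are assumptions chosen to reproduce Table 5-7, not a description of how the device assigns addresses.

```python
import ipaddress

def allocate(super_subnet, gateway, demands):
    """Hand out consecutive host addresses from the super VLAN subnet.
    demands maps sub-VLAN ID -> number of hosts (insertion order is kept)."""
    net = ipaddress.ip_network(super_subnet)
    gw = ipaddress.ip_address(gateway)
    # hosts() already skips the subnet ID and the directed broadcast address
    free = (ip for ip in net.hosts() if ip != gw)
    return {vlan: [next(free) for _ in range(count)] for vlan, count in demands.items()}

ranges = allocate("1.1.1.0/24", "1.1.1.1", {2: 10, 3: 5, 4: 1})
for vlan, ips in ranges.items():
    print(f"sub-VLAN {vlan}: {ips[0]} - {ips[-1]} ({len(ips)} addresses)")
```

Because the sub-VLANs draw from one shared pool, no address is lost to per-VLAN subnet IDs, gateways, or broadcast addresses.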
Table 5-7 shows that VLAN 2, VLAN 3, and VLAN 4 share a subnet (1.1.1.0/24), a default
gateway address of the subnet (1.1.1.1), and a directed broadcast address of the subnet
(1.1.1.255). In this manner, the former subnet IDs (1.1.1.16 and 1.1.1.24), the former default
gateway addresses (1.1.1.17 and 1.1.1.25), and the former directed broadcast addresses
(1.1.1.15, 1.1.1.23, and 1.1.1.27) can be used as host IP addresses.
In total, 16 addresses (10 + 5 + 1 = 16) are required for the three VLANs, and exactly
16 addresses (1.1.1.2 to 1.1.1.17) in this subnet are assigned to them. A total
of 19 IP addresses are used, that is, the 16 host addresses together with the subnet ID
(1.1.1.0), the default gateway of the subnet (1.1.1.1), and the directed broadcast address of the
subnet (1.1.1.255). The remaining 237 addresses in the network segment (256 - 19 = 237) are
available to hosts in any sub-VLAN.
Communications Between VLANs

l Introduction
VLAN aggregation ensures that different VLANs use the IP addresses in the same
subnet segment. However, this leads to the problem of Layer 3 forwarding between sub-
VLANs.
In common VLAN mode, the hosts of different VLANs can communicate with each
other based on Layer 3 forwarding through their respective gateways. In VLAN
aggregation mode, however, hosts in a super VLAN use IP addresses in the same
network segment and share the same gateway address. Because hosts in different
sub-VLANs belong to the same subnet, they try to reach each other through Layer 2
forwarding instead of Layer 3 forwarding through a gateway. Sub-VLANs, however, are
isolated from each other at Layer 2, so these hosts cannot communicate directly. To
resolve this issue, the ATN uses ARP proxy to enable communication between sub-VLANs.
NOTE
For ARP proxy details, see "ARP" in the Feature Description - IP Services.
l Layer 3 Communications Between Different sub-VLANs
As shown in Figure 5-16, the super VLAN, VLAN 10, contains the sub-VLANs VLAN
2 and VLAN 3.
Figure 5-16 Layer 3 communication between different sub-VLANs based on ARP proxy
Super VLAN 10 (VLANIF10: 1.1.1.1/24) contains sub-VLAN 2 with Host A (1.1.1.2/24) and
sub-VLAN 3 with Host B (1.1.1.3/24).
If Host A's ARP table has no corresponding entry for Host B and the gateway between
sub-VLANs is enabled with the ARP proxy, communication between Host A in VLAN 2
and Host B in VLAN 3 proceeds as follows:
a. After comparing the IP address of Host B 1.1.1.3 with its IP address, Host A finds
that both IP addresses are in the same network segment 1.1.1.0/24, and its ARP
table has no entry corresponding to Host B.

b. Host A initiates an ARP broadcast to request Host B's MAC address.
c. Host B is not in the broadcast domain of VLAN 2, and cannot receive the ARP
request.
d. Because ARP proxy between sub-VLANs is enabled on the gateway, after receiving
Host A's ARP request, the gateway finds that Host B's IP address 1.1.1.3
belongs to the network segment of a directly connected interface. The gateway then
initiates an ARP broadcast to all other sub-VLAN interfaces to request Host B's MAC address.
e. After receiving the ARP request, Host B sends an ARP response.
f. After receiving Host B's ARP response, the gateway replies to Host A with the
gateway's own MAC address.
g. The ARP tables in both the gateway and Host A have entries corresponding to Host
B.
h. To send packets to Host B, Host A initially sends packets to the gateway, and then
the gateway carries out Layer 3 forwarding.
The process that Host B uses to send packets to Host A functions in the same way.
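The exchange above can be modeled with a toy simulation. It is a sketch, not device behavior: the MAC addresses, the sub_vlans table, and the function name arp_request are all invented for illustration. It shows that, with ARP proxy enabled, the requester learns the gateway's MAC address for a host in another sub-VLAN, so subsequent packets are sent to the gateway for Layer 3 forwarding.

```python
GATEWAY_MAC = "00e0-fc00-0001"  # hypothetical MAC address of VLANIF10

# sub-VLAN ID -> {host IP: host MAC}; all values are made up for illustration
sub_vlans = {
    2: {"1.1.1.2": "00e0-fc00-000a"},  # Host A
    3: {"1.1.1.3": "00e0-fc00-000b"},  # Host B
}

def arp_request(src_vlan, target_ip, proxy_enabled=True):
    """Return the MAC address the requester learns for target_ip, or None."""
    # Steps b-c: the broadcast reaches only hosts in the sender's own sub-VLAN.
    if target_ip in sub_vlans[src_vlan]:
        return sub_vlans[src_vlan][target_ip]
    if not proxy_enabled:
        return None
    # Steps d-e: the gateway repeats the request in the other sub-VLANs.
    for vlan, hosts in sub_vlans.items():
        if vlan != src_vlan and target_ip in hosts:
            # Step f: the gateway answers the requester with its own MAC address.
            return GATEWAY_MAC
    return None

print(arp_request(2, "1.1.1.3"))                       # resolved through the proxy
print(arp_request(2, "1.1.1.3", proxy_enabled=False))  # no answer without the proxy
```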
l Layer 2 communication between a sub-VLAN and an external network
In Figure 5-17, in port-based Layer 2 VLAN communication, the received or sent
frames are not tagged with the super VLAN ID.
Figure 5-17 Layer 2 communication between a sub-VLAN and an external network
ATN2 (GE1, trunk allowing all VLANs) connects to GE3 of ATN1 (trunk allowing all VLANs).
ATN1 hosts super VLAN 10 (VLANIF10: 1.1.1.1/24) with sub-VLAN 2 on GE1 (Host A, 1.1.1.2/24)
and sub-VLAN 3 on GE2 (Host B, 1.1.1.3/24).
Host A sends a frame to ATN1 through GE1. Upon receipt, ATN1 adds a VLAN
tag with VLAN ID 2 to the frame. ATN1 does not change the VLAN ID to that of VLAN 10,
even though VLAN 2 is a sub-VLAN of VLAN 10. After passing through GE3 (a
trunk port), the frame still carries VLAN 2's ID.
That is, ATN1 itself does not send frames tagged with VLAN 10.

A super VLAN has no physical port. This restriction is enforced as follows:
– If you configure the super VLAN and then the trunk interface, the frames of the super
VLAN are filtered out automatically according to the VLAN range set on the trunk
interface.
In Figure 5-17, no frame of super VLAN 10 passes through GE3 on ATN1,
even though the interface allows frames from all VLANs to pass through.
– If you configure the trunk interface first and allow all VLANs to pass
through, the super VLAN cannot be configured on ATN1 afterwards, because a
VLAN that contains physical ports cannot be configured as a super VLAN, and the trunk
interface allows only the frames tagged with VLAN IDs to pass through.
On ATN1, the valid VLANs are VLAN 2 and VLAN 3, and all frames are
forwarded in these VLANs.
l Layer 3 communication between a sub-VLAN and an external network
Figure 5-18 Layer 3 communication between a sub-VLAN and an external network
ATN2: VLANIF20 1.1.3.1/24 (toward Host C, 1.1.3.2/24) and VLANIF10 1.1.2.2/24.
ATN1: VLANIF10 1.1.2.1/24 on GE3, and super VLAN 4 (VLANIF4: 1.1.1.1/24) with sub-VLAN 2 on
GE1 (Host A, 1.1.1.2/24) and sub-VLAN 3 on GE2 (Host B, 1.1.1.3/24).
In Figure 5-18, ATN1 is configured with super VLAN 4, sub-VLAN 2, sub-VLAN 3,
and a common VLAN 10. ATN2 is configured with two common VLANs, VLAN 10
and VLAN 20. Suppose that ATN1 is configured with a route to network segment
1.1.3.0/24, and ATN2 is configured with a route to network segment 1.1.1.0/24.
Host A in sub-VLAN 2, which belongs to super VLAN 4, accesses Host C connected to
ATN2 as follows:
a. By comparing the IP address of Host C (1.1.3.2) with its own IP address, Host A
determines that Host C is not in its network segment 1.1.1.0/24.

b. Host A initiates an ARP broadcast to its gateway, requesting the gateway's MAC
address.
c. After receiving the ARP request, ATN1 identifies the mapping between the
sub-VLAN and the super VLAN, and sends an ARP response to Host A through
sub-VLAN 2. The source MAC address in the ARP response packet is the MAC
address of VLANIF4 for super VLAN 4.
d. Host A learns the gateway's MAC address.
e. Host A sends the packet to the gateway, with the destination MAC address being the
MAC address of VLANIF4 for super VLAN 4 and the destination IP address being
1.1.3.2.
f. After receiving the packet, ATN1 performs Layer 3 forwarding and sends the
packet to ATN2, with 1.1.2.2 as the next hop address and VLANIF10 as the outgoing
interface.
g. After receiving the packet, ATN2 performs Layer 3 forwarding and sends the
packet to Host C through the directly connected interface VLANIF20.
h. The response packet from Host C reaches ATN1 after ATN2 carries out Layer
3 forwarding.
i. After receiving the packet, ATN1 performs Layer 3 forwarding and sends the
packet to Host A through the super VLAN.
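Two decisions in the steps above lend themselves to a short sketch: the host's choice of which address to resolve through ARP (step a), and the route lookup on the device that holds the super VLAN (step f). The code uses Python's standard ipaddress module. The function names arp_target and lookup, and the exact contents of the routes list, are assumptions derived from the addresses given for Figure 5-18.

```python
import ipaddress

def arp_target(host_if, dst_ip, gateway):
    """Return the IP address the host must resolve through ARP:
    the destination itself if it is on-link, otherwise the gateway."""
    iface = ipaddress.ip_interface(host_if)
    return dst_ip if ipaddress.ip_address(dst_ip) in iface.network else gateway

def lookup(routes, dst_ip):
    """Longest-prefix match over (prefix, next_hop, out_interface) tuples."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [r for r in routes if dst in ipaddress.ip_network(r[0])]
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen, default=None)

# Assumed routing table of the device that holds super VLAN 4
routes = [
    ("1.1.1.0/24", None, "VLANIF4"),        # directly connected super VLAN subnet
    ("1.1.2.0/24", None, "VLANIF10"),       # directly connected link to the peer device
    ("1.1.3.0/24", "1.1.2.2", "VLANIF10"),  # route to Host C's network segment
]

print(arp_target("1.1.1.2/24", "1.1.3.2", "1.1.1.1"))  # Host A resolves its gateway
print(lookup(routes, "1.1.3.2"))                        # next hop 1.1.2.2 via VLANIF10
```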
5.2.2.4 VLAN Mapping

VLAN mapping, also known as VLAN translation, converts between user VLAN IDs and ISP
VLAN IDs.
VLAN mapping is implemented after packets are received on an inbound interface and before
the packets are forwarded by an outbound interface.
l After VLAN mapping is configured on an interface, the interface replaces the VLAN tag
of a local VLAN frame with an external VLAN tag before sending the VLAN frame out.
l When receiving the VLAN frame, the interface replaces the VLAN tag of a received
VLAN frame with the local VLAN tag.
This implements inter-VLAN communication.
In Figure 5-19, VLAN mapping between VLAN 2 and VLAN 3 is configured on GE1. When
GE1 sends frames from VLAN 2 to VLAN 3, it replaces VLAN ID 2 with VLAN ID 3. When
GE1 sends frames from VLAN 3 to VLAN 2, it replaces VLAN ID 3 with VLAN ID 2. In this
manner, devices in VLAN 2 and VLAN 3 can communicate with each other.
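A minimal sketch of this 1 to 1 translation follows. Frames are modeled as plain dictionaries, and the names MAPPING, REVERSE, on_send, and on_receive are invented for illustration. The sketch assumes that VLAN 2 is the local VLAN and VLAN 3 the external VLAN on GE1, and that frames in unmapped VLANs pass through unchanged.

```python
MAPPING = {2: 3}  # local VLAN ID -> external VLAN ID on GE1 (assumed direction)
REVERSE = {ext: loc for loc, ext in MAPPING.items()}

def on_send(frame):
    """Replace the local VLAN ID with the external one before transmission."""
    return {**frame, "vlan": MAPPING.get(frame["vlan"], frame["vlan"])}

def on_receive(frame):
    """Replace the external VLAN ID with the local one on reception."""
    return {**frame, "vlan": REVERSE.get(frame["vlan"], frame["vlan"])}

out = on_send({"vlan": 2, "payload": "ping"})
back = on_receive({"vlan": 3, "payload": "pong"})
print(out["vlan"], back["vlan"])
```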

Figure 5-19 VLAN mapping
ATN A (VLAN 2 side, 172.16.0.1/16) and ATN B (VLAN 3 side, 172.16.0.7/16) are connected
through GE1, where VLAN tags 2 and 3 are exchanged.
If devices in two VLANs need to communicate through VLAN mapping, the IP addresses of
these devices must be on the same network segment. Otherwise, VLAN mapping does not
take effect, as the devices communicate through Layer 3 routes.
Currently, the ATN supports the following VLAN mapping modes:

l 1 to 1 VLAN mapping
When a main interface configured with VLAN mapping receives a single-tagged frame,
it maps the tag of the frame to a specified tag.
5.2.2.5 Flexible Service Access Through Sub-interfaces of Various Types
Background
On an ME network, commonly, users and services are first identified based on single VLAN
tags or double VLAN tags carried in packets and then access different VPNs through sub-
interfaces. In some special scenarios where the access device does not support QinQ or a
VLAN tag is used in different services, different services cannot be distributed to different
VSIs or VPN instances.
As shown in Figure 5-20, the High Speed Internet (HSI), Voice over IP (VoIP), and Internet
Protocol Television (IPTV) services belong to VLAN 10 and are converged to the CSG; the
CSG is connected to the UPE through L2VPNs.
If the CSG does not support QinQ, it cannot differentiate the received HSI, VoIP, and IPTV
services for transmitting them through different PWs. In this case, you can configure the CSG
to resolve the 802.1p priorities. Then, the UPE can transmit different packets through different
PWs based on the 802.1p priorities of the packets.
In a similar manner, if the CSG is connected to UPE through L3VPNs, the UPE can transmit
different services through different VPN instances based on the 802.1p priorities of the
packets.

Figure 5-20 Networking diagram of multiple services belonging to the same VLAN
Elements shown: HSI, VoIP, and IPTV services in VLAN 10; CSG; PW1 and PW2; PE; SR; BRAS;
BTV/VOD video platform; Internet; data flow 1 and data flow 2.
Basic Concepts
The sub-interfaces are classified as shown in Table 5-8 based on service identification
policies configured on them.
Table 5-8 Different types of sub-interfaces

Sub-interface type: VLAN sub-interface
Concept: A sub-interface encapsulated with a VLAN ID.
Application: Sub-interfaces on different main interfaces can be encapsulated with the same
VLAN ID. VLAN sub-interfaces are bound to VSIs/VLLs or VPN instances to access L2VPNs or
L3VPNs.
NOTE
Different sub-interfaces on the same main interface cannot be encapsulated with the same
VLAN ID.

Sub-interface type: Default sub-interface
Concept: A sub-interface that supports VLAN+default. Here, the VLAN can be a single VLAN or
a VLAN range. If a single VLAN is specified, it is a Dot1q sub-interface; if a VLAN range is
specified, it can be a sub-interface for Dot1q VLAN tag termination or a QinQ stacking
sub-interface. A default sub-interface receives tagged packets with no 802.1p
priorities/EthType values.
Application: A VLAN+default-enabled sub-interface identifies packets based on their VLAN IDs
without 802.1p priorities/EthType values.

Sub-interface type: 802.1p sub-interface
Concept: When functioning as a common Dot1q sub-interface, it supports a single VLAN. When
functioning as a Dot1q termination sub-interface or QinQ stacking sub-interface, it supports
the VLAN+802.1p matching policy; in other words, each packet received by the sub-interface
carries a VLAN tag and an 802.1p priority.
Application: An access device on an ME network distinguishes services using VLANs and 802.1p
priorities.
After the VLAN+802.1p matching policy is configured for such a sub-interface, the
sub-interface automatically resolves the VLAN tag and 802.1p value when receiving a user
packet.
If the VLAN tag and 802.1p value of a packet match the specified matching policy, the packet
is sent to the VLL/VSI or VPN instance corresponding to the sub-interface. If the matching
fails but there is a default sub-interface, the packet is sent to the VLL/VSI or VPN instance
corresponding to the default sub-interface.
If none of the preceding conditions is met, the packet is dropped.
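The matching order described for the 802.1p sub-interface can be sketched as a small classifier. It assumes that a packet matching a VLAN+802.1p policy goes to that sub-interface's instance, that a non-matching packet falls back to the default sub-interface of its VLAN if one exists, and that it is dropped otherwise. The instance names and priority values below are hypothetical and are not taken from any configuration.

```python
def classify(vlan, priority, policies, default=None):
    """policies maps (VLAN ID, 802.1p priority) -> service instance name.
    default maps VLAN ID -> instance of the default sub-interface, if any.
    Returns the instance name, or None when the packet is dropped."""
    if (vlan, priority) in policies:
        return policies[(vlan, priority)]
    if default and vlan in default:
        return default[vlan]
    return None

# Hypothetical policy for the scenario in Figure 5-20: three services share VLAN 10
policies = {(10, 0): "vsi-hsi", (10, 5): "vsi-voip", (10, 4): "vsi-iptv"}

print(classify(10, 5, policies))                       # matched by VLAN+802.1p
print(classify(10, 7, policies, {10: "vsi-default"}))  # falls back to the default
print(classify(10, 7, policies))                       # dropped
```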
VLAN+802.1p-based L2VPN Access

l Access of a non-IP station
As shown in Figure 5-21, a CSG accesses a non-IP station. The CSG and BTS are
connected through TDM or ATM physical links. Then, the CSG differentiates service
types (voice, data, or signal) based on timeslots in TDM or PVCs in ATM.
Figure 5-21 Networking diagram of the access of a non-IP station to an L2VPN
Elements shown: BTS (signal, voice, management, and data services), CSG, VSIs (one per
service), PE2, PE3, PE4, and BSC. Link types: ATM/TDM on the access sides, Ethernet over VSI
in between, with PWE3 across the network.
The following takes the ATM service as an example.

To ensure that ATM packets reach the remote server, PWE3 must be deployed
between the CSG and PE to implement ATM cell relay.
PWE3 interconnects traditional network resources, such as ATM network resources,
through a packet switched network (PSN) and emulates traditional services over the
PSN. When the traditional services traverse the PSN, the emulation is transparent to
end users. In this manner, the investment of
users and carriers is protected during network consolidation and construction. After a
P2P tunnel is set up on the public or private PSN, emulated Layer 2 services (data
packets, cells, and bit flows) can traverse the PSN. PWE3 emulates
the original services between the PW-connected CSG and PE as faithfully as possible.
NOTE
For details on ATM cell relay implemented through PWE3, refer to the chapter "PWE3" in the
Feature Description - VPN.
In the access of a non-IP station to an L2VPN, the process of transmitting ATM packets
is as follows:
a. The NodeB differentiates service types (voice, data, or signal). The NodeB
encapsulates the IP packets as follows:
n Encapsulates different users with different VLAN IDs.
n Encapsulates different services with different 802.1p priorities.
n Encapsulates different services of the same user with the same VLAN ID but
different 802.1p priorities.
n Encapsulates different services of different users with different VLAN IDs but
the same or different 802.1p priorities.
After the NodeB encapsulates packets with different VLAN IDs and 802.1p
priorities, it encapsulates the packets with tunnel labels and PW labels for
implementing ATM cell relay. Then, the ATM services can be transparently
transmitted to the CSG through label switching. Figure 5-22 shows the format of a
packet on the outbound interface of the CSG.
Figure 5-22 Format of a packet on the outbound interface of the CSG
Ethernet header (DA | SA | VLAN ID+802.1p | ......) | Tunnel label | PW label | ...... | DATA
b. After CSG receives the ATM services, the 802.1p sub-interface on CSG resolves
the packets to obtain their VLAN IDs and 802.1p priorities. The packets then access
different VSIs through priority mapping. In this manner, different services are
transmitted to PE2 through different VSIs. Figure 5-23 shows the format of a
packet transmitted between CSG and PE.
Figure 5-23 Format of a packet transmitted between PEs
Ethernet header | MPLS label | VSI label | Ethernet header | Tunnel label | PW label | ...... | DATA
c. Upon receiving the packets, the PE2 decapsulates the packets to obtain the original
ATM packets and then sends the original ATM packets to the BSC.
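The VLAN ID and 802.1p priority that the sub-interface resolves are both carried in the 4-byte IEEE 802.1Q tag: a TPID of 0x8100 followed by a 16-bit field that holds the 3-bit priority, a 1-bit drop eligible indicator, and the 12-bit VLAN ID. The sketch below builds and parses such a tag with Python's standard struct module. The function names build_tag and parse_tag are invented for illustration, and the example values (VLAN 10, priority 5) are arbitrary.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier of a customer VLAN tag

def build_tag(vlan_id, priority, dei=0):
    """Pack TPID and TCI (priority: 3 bits, DEI: 1 bit, VLAN ID: 12 bits)."""
    if not 0 <= vlan_id <= 0xFFF or not 0 <= priority <= 7 or dei not in (0, 1):
        raise ValueError("field out of range")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)

def parse_tag(tag):
    """Return (VLAN ID, 802.1p priority) from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF, tci >> 13

tag = build_tag(vlan_id=10, priority=5)
print(tag.hex(), parse_tag(tag))
```

Services of the same user would share the VLAN ID in such a tag and differ only in the priority bits.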
l Access of an IP station
As shown in Figure 5-24, when a CSG accesses an IP station, PWE3 is not required on
the CSG and MASG. After the CSG receives IP packets, it performs the following:
b. After CSG receives the packets, its 802.1p sub-interface resolves the packets to
obtain their VLAN IDs and 802.1p priorities. The packets then access different
VSIs through priority mapping. In this manner, different services are transmitted to
PE2 through different VSIs.
c. The PE2 then transmits the packets to the BSC.
Figure 5-24 Networking diagram of the access of an IP station to an L2VPN
Elements shown: BTS (signal, voice, management, and data services), CSG, PE2, PE3, PE4,
MASG, and BSC. Packets enter different VSIs on the CSG based on VLAN+802.1p (one VSI per
service). Link types: Ethernet on the access sides, Ethernet over VSI in between.
NOTE
l A PE connected to a CSG does not need to be configured in a different manner, irrespective of
whether the CSG is connected to an IP station or a non-IP station.
l Huawei high-end ATNs can function as PEs. In this scenario, only the configurations of the CSG
are mentioned. For detailed configurations of other devices, refer to the related configuration guides.
l For details on L2VPNs, refer to the chapters "VLL", "PWE3", and "VPLS" in the Feature
Description - VPN.
VLAN+802.1p-based L3VPN Access

l Access of a non-IP station

As shown in Figure 5-25, a CSG accesses a non-IP station. NodeB and the CSG, and the
MASG and RNC are connected through TDM or ATM physical links. Then, the CSG
differentiates service types (voice, data, or signal) based on timeslots in TDM or PVCs in
ATM.
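Figure 5-25 Networking diagram of the access of a non-IP station to an L3VPN
Elements shown: NodeB (signal, voice, management, and data services), CSG, L3VPN, PE2, and
RNC, with a PW across the network. Services are identified by VLAN+802.1p. Link types:
ATM/TDM on the access sides, Ethernet over VPN in between.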
The following takes the ATM service as an example.

To ensure that ATM packets reach the remote server, PWE3 must be deployed
between the CSG and MASG to implement ATM cell relay.
PWE3 interconnects traditional network resources, such as ATM network resources, through
a packet switched network (PSN) and emulates traditional services over the PSN. When the
traditional services traverse the PSN, the emulation is transparent to end
users. In this manner, the investment of users and carriers is
protected during network consolidation and construction. After a P2P tunnel is set up on
the public or private PSN, emulated Layer 2 services (data packets, cells, and bit flows)
can traverse the PSN. PWE3 emulates the original services between
the PW-connected CSG and MASG as faithfully as possible.
NOTE
For details on ATM cell relay implemented through PWE3, refer to the chapter "PWE3" in the
Feature Description - VPN.
In the access of a non-IP station to an L3VPN, the process of transmitting ATM packets
is as follows:
After the NodeB encapsulates packets with different VLAN IDs and 802.1p
priorities, it encapsulates the packets with tunnel labels and PW labels for
implementing ATM cell relay. Then, the ATM services can be transparently
transmitted to the CSG through label switching. Figure 5-26 shows the format of a
packet on the outbound interface of the CSG.

Figure 5-26 Format of a packet on the outbound interface of the CSG
Ethernet header (DA | SA | VLAN ID+802.1p | ......) | Tunnel label | PW label | ...... | DATA
b. After CSG receives the ATM services, the 802.1p sub-interface on CSG resolves
the packets to obtain their VLAN IDs and 802.1p priorities. The packets then access
different VPN instances through priority mapping. In this manner, different services
are transmitted to PE2 through different VPN instances. Figure 5-27 shows the
format of a packet transmitted between the CSG and PE2.
Figure 5-27 Format of a packet transmitted between PEs
Ethernet header | LSP label | L3VPN label | Ethernet header | Tunnel label | PW label | ...... | DATA
c. Upon receiving the packets, the PE2 decapsulates the packets to obtain the original
ATM packets and then sends the original ATM packets to the RNC.
l Access of an IP station
As shown in Figure 5-28, when a CSG accesses an IP station, PWE3 is not required on
the CSG and MASG. After the CSG receives IP packets, it performs the following:
b. After the CSG receives the packets, its 802.1p sub-interface resolves the packets to
obtain their VLAN IDs and 802.1p priorities. The packets then access different
VPN instances through priority mapping. In this manner, different services are
transmitted to PE2 through different VPN instances.
c. After PE2 receives the packets, it sends the packets to the RNC.
Figure 5-28 Networking diagram of the access of an IP station to an L3VPN
Elements shown: NodeB (signal, voice, management, and data services), CSG, L3VPN, PE2, and
RNC. Services are identified by VLAN+802.1p.
NOTE
l A PE connected to a CSG does not need to be configured in a different manner, irrespective of
whether the CSG is connected to an IP station or a non-IP station.
l Huawei high-end ATNs can function as PEs. In this scenario, only the configurations of the CSG
are mentioned here. For detailed configurations of other devices, refer to the related
configuration guides.
l For details on L3VPNs, refer to the chapter "BGP/MPLS IP VPN" in the Feature Description -
VPN.
5.2.3 Application
Port-Based VLAN Division
Figure 5-29 Networking diagram of port-based VLAN division
Elements shown: router, trunk link, Company A, Company B, and Company C.

Different companies residing on the same business premises may need to isolate their service
data. Therefore, according to the port requirements of each company, VLANs are created on the
core switch of the business premises, and the ports of each company are assigned to the
corresponding VLAN. In effect, each company has its own "virtual switch" or
"virtual workstation".
Application of VLAN Trunk
Figure 5-30 Networking diagram of VLAN trunk application
An ATN and a CX are connected by a trunk link. NodeBs in VLAN 2, VLAN 3, and VLAN 4 attach
to the ATN; RNC A, RNC B, and RNC C in VLAN 2, VLAN 3, and VLAN 4 attach to the CX.
As shown in Figure 5-30, a trunk link can be used to connect NodeBs and RNCs in different
VLANs. In this manner, data of different NodeBs is isolated, and each NodeB can communicate
with the RNC in the same VLAN.

Application of Inter-VLAN Communication

Inter-VLAN communication can be classified into the following two types:
l Multiple VLANs belong to the same Layer 3 device.
Figure 5-31 Networking diagram of communications between multiple VLANs on the same
Layer 3 device
A CX600 is connected over a trunk link to an ATN, to which three NodeBs attach.

As shown in Figure 5-31, if VLAN 2, VLAN 3, and VLAN 4 only belong to CX, these
VLANs are not VLANs across different switches. In such a situation, you can configure a
VLANIF interface for each VLAN on CX to implement the communications between these
VLANs.
The Layer 3 device shown in Figure 5-31 can be an ATN or a Layer 3 switch.
l Multiple VLANs belong to different Layer 3 devices.
Figure 5-32 Networking diagram of communications between multiple VLANs on different
Layer 3 devices
CX-A connects over a trunk link to an ATN serving NodeBs in VLAN 2, VLAN 3, and VLAN 4.
CX-B connects over a trunk link to a switch serving RNC A, RNC B, and RNC C in VLAN 2,
VLAN 3, and VLAN 4.
As shown in Figure 5-32, VLAN 2, VLAN 3, and VLAN 4 span different devices. In this
situation, you can configure a VLANIF interface for each VLAN on both CX-A and CX-B,
and then configure a static route or run a routing protocol between CX-A and CX-B.
The Layer 3 device shown in Figure 5-32 can be an ATN or a Layer 3 switch.

Abbreviation
VLAN virtual local area network
PVID port default VLAN ID
5.3 Trunk
5.3.1 Introduction
Definition
Trunking bundles multiple physical interfaces into a single logical interface, which is called a
trunk interface. The bundled physical interfaces are member interfaces.
Trunking increases bandwidth, enhances reliability, and supports load balancing.
Purpose
Before trunking is used, the transmission rate between two network devices connected by a
fast Ethernet twisted pair cable is limited to 100 Mbit/s. To provide a higher transmission rate,
the twisted pair cable has to be replaced with a gigabit optical fiber, or the existing network
has to be upgraded to a Gigabit Ethernet network. These solutions are costly and not suitable
for small-and-medium size enterprises or institutions.
To provide an economical solution, trunking is used to provide a higher bandwidth by
bundling multiple interfaces. For example, three 100 Mbit/s full-duplex interfaces can be
bundled together to provide a maximum bandwidth of 300 Mbit/s.
5.3.2 Principles
5.3.2.1 Basic Principles

The member links of a trunk link can be configured with different weights to carry out load
balancing, which helps ensure connection reliability and greater bandwidth.
Additionally, a trunk interface can be configured to support routing protocols and services.
Figure 5-33 shows a simple Eth-Trunk example in which two devices are directly connected
through three GE interfaces. These three interfaces are bundled into an Eth-Trunk interface at
both ends of the trunk link. In this way, bandwidth is increased, and reliability is improved.

Figure 5-33 Eth-Trunk network diagram
Two devices are directly connected by an Eth-Trunk link that bundles three GE interfaces.
A trunk link can be considered a direct point-to-point link. The devices on both ends can be
two ATNs, two switches, or an ATN on one end and a switch on the other end.
Trunking offers the following advantages over standard connections:
l Load balancing
Load balancing can be implemented on a trunk interface. For example, on an Eth-Trunk
interface, you can configure weights for member links to carry out load balancing.
l Higher reliability
When the physical link of a member interface fails, the traffic on the member link is
switched to another member link, ensuring uninterrupted service on the trunk link.
l Increased bandwidth
The bandwidth of a trunk interface equals the sum of the bandwidth of all member
interfaces.
Table 5-9 shows the Eth-Trunk link aggregation modes that the ATN supports.
Table 5-9 Eth-Trunk link aggregation modes

Link aggregation mode: Manual load balancing mode
Network requirements: If the device on one end of an Eth-Trunk link does not support the Link
Aggregation Control Protocol (LACP), you can create an Eth-Trunk interface in manual load
balancing mode on the ATN and add multiple interfaces to the Eth-Trunk interface to increase
bandwidth and enhance transmission reliability.
Description: Manual load balancing is a basic link aggregation mode, in which you must
manually create the Eth-Trunk interface, add interfaces to the Eth-Trunk interface, and
specify active member interfaces. LACP is not involved.
In manual load balancing mode, all active member interfaces forward data and perform load
balancing. Traffic can be evenly balanced among all member interfaces. Alternatively, you can
set a weight for each member interface to implement uneven load balancing; in this case, an
interface with a greater weight transmits a larger volume of traffic. If an active link in
the link aggregation group fails, traffic is balanced among the remaining active links.

Link aggregation mode: Static LACP mode
Network requirements: If devices on both ends of an Eth-Trunk link support LACP, you can
create an Eth-Trunk interface in static LACP mode on the ATN. In this mode, both load
balancing and backup can be implemented.
Description: In static LACP mode, you must manually create an Eth-Trunk interface and add
interfaces to the Eth-Trunk interface. Unlike in manual load balancing mode, active member
interfaces are selected by exchanging LACP data units (LACPDUs). That is, when a group of
interfaces is added to an Eth-Trunk interface, the devices at both ends determine active and
inactive interfaces by sending LACPDUs to each other.
5.3.2.2 Restrictions on Trunk Interfaces

As a logical interface that bundles multiple physical interfaces and relays upper-layer data, a
trunk interface must comply with the following rules:
l Parameters of the physical interfaces (member interfaces) on both ends of the trunk link
must be consistent. These parameters include:
– Number of physical interfaces
– Transmission rates of the physical interfaces
– Duplex modes of the physical interfaces
l Data sequence must be guaranteed.
A data flow can be considered as a group of frames with the same MAC address and IP
address. For example, the Telnet or FTP connection between two devices can be
considered as a data flow.
If a trunk interface is not configured, frames that belong to a data flow can reach their
destination in the correct order because the data flow is transmitted over only one physical
link. When a trunk interface is configured, multiple physical links are bound to the same
trunk link, and frames are transmitted along these physical links. If the first frame is
transmitted over a physical link, and the second frame is transmitted over another
physical link, the second frame may reach the destination earlier than the first frame.
To prevent frame disorder, a frame forwarding mechanism is used to ensure that frames
in the same data flow reach the destination in the correct sequence. This mechanism
differentiates data flows based on their MAC addresses or IP addresses. In this manner,
frames belonging to the same data flow are transmitted over the same physical link.
After the frame forwarding mechanism is used, frames are transmitted in the following
manners:
– Frames with the same source + destination MAC addresses are transmitted over the
same physical link.
– Frames with the same source + destination IP addresses are transmitted over the
same physical link.

– Frames with the same source and destination IP addresses, source and destination
TCP/UDP port numbers, and IP protocol types are transmitted over the same
physical link.
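The three forwarding manners above can be sketched as follows. This is a minimal illustration, not the ATN's actual algorithm: the function names, the frame dictionary layout, the mode names, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib

def flow_key(frame, mode):
    """Build the byte string that identifies a data flow in the given mode."""
    if mode == "mac":        # source + destination MAC addresses
        fields = (frame["src_mac"], frame["dst_mac"])
    elif mode == "ip":       # source + destination IP addresses
        fields = (frame["src_ip"], frame["dst_ip"])
    elif mode == "l4":       # IP addresses, TCP/UDP ports, and IP protocol type
        fields = (frame["src_ip"], frame["dst_ip"],
                  frame["src_port"], frame["dst_port"], frame["proto"])
    else:
        raise ValueError(f"unknown mode: {mode}")
    return "|".join(str(f) for f in fields).encode()

def select_link(frame, mode, links):
    """Map a frame to one member link; equal flow keys always map to the same link."""
    # CRC32 is only a stand-in for whatever hash the hardware uses.
    return links[zlib.crc32(flow_key(frame, mode)) % len(links)]
```

Because the link is derived only from the flow key, every frame of a flow follows the same physical link, which is what preserves the frame sequence.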
5.3.2.3 Trunk Interface Classification and Features
Classification
Trunk interfaces can be classified into two types: Eth-Trunk and IP-Trunk.
l An Eth-Trunk interface is composed of Ethernet interfaces.
The ATN supports only Eth-Trunk.
Features
Eth-Trunk interfaces configured on the ATN support the following features:
l IP address assignment and allowing each trunk member interface to "borrow" a trunk
interface's IP address
l Layer 2 forwarding, MPLS forwarding, and Layer 3 forwarding (unicast)
l Hash algorithm-based load balancing
l QoS
l VPN instance binding
l Hot backup and hot swapping
l Addition of interfaces on different boards to a single trunk interface
Maximum/Minimum Number of Up Member Links

The number of Up member links affects the status and bandwidth of a trunk interface. To
ensure that the trunk interface functions properly and is less affected by changes in member
link status, set the following thresholds:
l Minimum number of Up member links

When the number of Up member links falls below this threshold, the trunk interface goes
Down. This guarantees the trunk interface a minimum available bandwidth.
For example, if the trunk interface is required to provide a minimum bandwidth of 2
Gbit/s and each member link's bandwidth is 1 Gbit/s, the minimum number of Up
member links must be set to 2 or a larger value.
l Maximum number of Up member links
When the number of Up member links reaches this threshold, the bandwidth of the trunk
interface will not increase any further even if more member links go Up. This improves
network reliability while guaranteeing sufficient bandwidth.
For example, 8 trouble-free member links are bundled into a trunk link, each with a
bandwidth of 1 Gbit/s. The trunk link, however, only needs to provide a maximum
bandwidth of 5 Gbit/s. To meet this requirement, set the maximum number of Up
member links to 5 or a larger value. The unselected Up links automatically enter backup
status, improving reliability.

NOTE
The maximum number of Up member links can be configured only for Eth-Trunk interfaces in
static LACP mode.
The maximum number of Up member links is used to control the number of member links in the
Up state. After this number is reached, additional Up member links are forcibly set to Down.
In Layer 2 mode, the transmission rate of an Eth-Trunk interface is determined by the
following conditions:
l Maximum number of Up member links
l Number of Up member interfaces
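The effect of the two thresholds can be illustrated with a short calculation. The function below is a sketch under simplifying assumptions: all member links have the same bandwidth, and the function name and parameters are hypothetical.

```python
def trunk_state(up_links, link_gbps, min_up=1, max_up=None):
    """Return (state, active_links, bandwidth_gbps) for a trunk interface."""
    if up_links < min_up:
        # Fewer Up member links than the lower threshold: the trunk goes Down.
        return ("Down", 0, 0)
    # Up member links beyond the upper threshold do not add bandwidth.
    active = up_links if max_up is None else min(up_links, max_up)
    return ("Up", active, active * link_gbps)

print(trunk_state(1, 1, min_up=2))   # ('Down', 0, 0)
print(trunk_state(8, 1, max_up=5))   # ('Up', 5, 5)
```

The two calls mirror the examples in the text: a 2 Gbit/s minimum over 1 Gbit/s links, and eight healthy links capped at five.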
Load Balancing Performed Among a Trunk Interface's Member Interfaces

Different weights can be configured for member interfaces of a trunk interface to carry out
load balancing.
Load balancing can be classified into the following types:
l Per-flow load balancing
Per-flow load balancing differentiates data flows based on each packet's MAC or IP
address and transmits packets belonging to the same data flow along a single member
link.
This load balancing mode guarantees ordered transmission, but cannot guarantee
efficient bandwidth usage.
l Per-packet load balancing
Per-packet load balancing takes each packet, rather than each data flow, as a transmission
unit and distributes packets among different member links.
This load balancing mode guarantees efficient bandwidth usage, but not ordered transmission.
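The difference between the two types can be seen by distributing the packets of one flow over two member links. This sketch assumes round-robin distribution for the per-packet type and CRC32 hashing for the per-flow type; the document does not specify either, and the names are hypothetical.

```python
import itertools
import zlib

def per_flow(packets, links):
    """Assign each packet to a link chosen by hashing its flow identifier."""
    return [links[zlib.crc32(p["flow"].encode()) % len(links)] for p in packets]

def per_packet(packets, links):
    """Assign packets to links in round-robin order, ignoring flow membership."""
    rr = itertools.cycle(links)
    return [next(rr) for _ in packets]

packets = [{"flow": "A", "seq": i} for i in range(4)]
links = ["link0", "link1"]
print(per_packet(packets, links))   # ['link0', 'link1', 'link0', 'link1']
print(len(set(per_flow(packets, links))))   # 1
```

Per-packet distribution uses both links evenly but splits the flow, so frames can overtake each other. Per-flow distribution keeps the flow on one link and leaves the other link idle for this traffic.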
Trunk Member Interface Backup

To improve a trunk interface's reliability, a member interface in the Up state can be configured
as a backup of another member interface.
If a member interface fails, its backup interface (on the same trunk interface) takes over traffic
transmitted along the faulty member interface. Trunk member interface backup is also called
trunk fast switchover.
NOTE
Trunk member interface backup can only be configured for Eth-Trunk interfaces in static LACP mode.
5.3.2.4 Trunk Forwarding Principles

As shown in Figure 5-34, the trunk layer lies between the MAC sub-layer and the physical
layer and belongs to the data link layer.

Figure 5-34 Trunk interface in the Ethernet protocol stack
Data link layer: LLC, MAC, Trunk (from top to bottom)
Physical layer: PHY
The MAC sub-layer regards trunk interfaces as physical interfaces and delivers frames
directly to the trunk interfaces.
The trunk module maintains a trunk forwarding table that contains the following two items:
l HASH-KEY value
Key values are calculated using the hash algorithm based on packets' MAC or IP
addresses.
l Interface number
Figure 5-35 Example of a trunk forwarding table
KEY 0 1 2 3
PORT 3 4 5 6
The trunk module forwards a frame based on the trunk forwarding table in the following way:
1. The trunk module receives a frame from the MAC sub-layer and extracts its source MAC
address/IP address or destination MAC address/IP address.
2. The trunk module uses the hash algorithm to calculate the HASH-KEY value.
3. Based on the HASH-KEY value, the trunk module searches the trunk forwarding table
for the interface number and sends the frame from the corresponding interface.
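The three steps can be sketched with the table from Figure 5-35. CRC32 is used here only as a stand-in for the hash algorithm, which the document does not name, and the function names are hypothetical.

```python
import zlib

# Trunk forwarding table from Figure 5-35: HASH-KEY value -> interface number.
FORWARDING_TABLE = {0: 3, 1: 4, 2: 5, 3: 6}

def hash_key(src_mac, dst_mac):
    """Step 2: reduce the extracted addresses to a HASH-KEY value (0 to 3)."""
    return zlib.crc32(f"{src_mac}|{dst_mac}".encode()) % len(FORWARDING_TABLE)

def egress_port(src_mac, dst_mac):
    """Step 3: look up the interface number for the frame's HASH-KEY value."""
    return FORWARDING_TABLE[hash_key(src_mac, dst_mac)]
```

Calling egress_port() twice with the same address pair returns the same interface number, so frames of one flow always leave through the same member interface.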
5.3.2.5 Inter-Board Trunk

Inter-board trunk provides the following mechanisms for user access:
l User information backup

When the status of user connections changes, user information such as user entries, user
configurations, link layer state machine, user real-time accounting, and traffic statistics,
is backed up on the boards where member interfaces of a trunk interface reside. This
ensures that user traffic is rapidly switched to a backup link when a member board
becomes faulty.
l Load balancing of user traffic
When users go online, the access control plane distinguishes user traffic based on the
HQoS configuration and status of the online users, and balances traffic among member
interfaces. Load balancing can be based on either of two principles: minimizing the number of
users on each member interface or maximizing the remaining bandwidth of each member
interface.
The preceding mechanisms allow user traffic to be quickly switched if the status of the
member interfaces changes, preventing traffic loss.
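The two load balancing principles for user traffic can be sketched as a selection function that runs when a user goes online. The data layout, policy names, and interface names are hypothetical; the actual access control plane also takes the HQoS configuration into account, which this sketch omits.

```python
def pick_member(members, policy):
    """Choose the member interface for a new user under the given policy."""
    if policy == "fewest_users":
        return min(members, key=lambda m: m["users"])["name"]
    if policy == "most_remaining_bandwidth":
        return max(members,
                   key=lambda m: m["capacity_mbps"] - m["used_mbps"])["name"]
    raise ValueError(f"unknown policy: {policy}")

members = [
    {"name": "member-A", "users": 10, "capacity_mbps": 1000, "used_mbps": 900},
    {"name": "member-B", "users": 12, "capacity_mbps": 1000, "used_mbps": 400},
]
print(pick_member(members, "fewest_users"))              # member-A
print(pick_member(members, "most_remaining_bandwidth"))  # member-B
```

The example shows that the two principles can choose different member interfaces for the same set of users.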
5.3.2.6 LACP
Introduction to Link Aggregation

With the wide application of Ethernet technology on metropolitan area networks (MANs) and
wide area networks (WANs), carriers have an increasing requirement on the bandwidth and
reliability of Ethernet backbone links. To obtain higher bandwidth, the conventional solution
is to replace the existing interface boards with boards of higher capacity or install devices
which support higher-capacity interface boards. However, this solution is costly and
inflexible. To provide an economical and convenient solution, link aggregation is introduced.
Link aggregation increases link bandwidth by bundling a group of physical interfaces into a
single logical interface without the need to upgrade hardware. In addition, link aggregation
implements link backup for higher transmission reliability.
Trunking, as a link aggregation technique, helps increase bandwidth by bundling multiple
physical interfaces into a single trunk interface. However, trunking can only detect link
disconnections, not link layer faults or link misconnections. The Link Aggregation Control
Protocol (LACP) is therefore used to improve trunk fault tolerance, provide M:N backup for
the trunk, and improve reliability.
LACP provides a standard negotiation mechanism for devices to automatically aggregate
multiple links according to their configurations and enables the aggregated link to transmit
and receive data. After an aggregated link is formed, LACP maintains the link status and
implements dynamic link aggregation and deaggregation.
As shown in Figure 5-36, a trunk link is established between ATN A and CX-B. Four full-
duplex GE interfaces on ATN A are bundled into a trunk interface and connected to the
corresponding interfaces on CX-B. One of the GE interfaces, however, is incorrectly
connected to the interface on CX-C. The trunk interface cannot effectively detect the fault and
still sends data to CX-C.
If LACP is enabled on ATN A, CX-B, and CX-C, and ATN A is configured with an LACP
priority higher than that of CX-B, after the LACP negotiation, data can be correctly sent from
ATN A to CX-B.
Figure 5-36 Incorrect trunk connection network diagram
(Diagram: ATN A connects to CX-B over a trunk link; one member link is incorrectly
connected to CX-C.)

Basic Concepts
l Link aggregation
Link aggregation bundles a group of physical interfaces into a logical interface to
increase bandwidth and improve reliability.
l Link aggregation group
A link aggregation group (LAG), also called a trunk link, is a logical link formed by
bundling several physical links.
If all bundled links are Ethernet links, the LAG is called an Ethernet LAG or an Eth-
Trunk link. The LAG interface is called an Eth-Trunk interface, and Ethernet interfaces
that constitute an Eth-Trunk interface are called member interfaces.
An Eth-Trunk interface can be considered as a single Ethernet interface. The only
difference is that an Eth-Trunk interface must select one or more member Ethernet
interfaces before forwarding data. You can configure features on an Eth-Trunk interface
the same way as on a single Ethernet interface, except for some features that take effect
only on physical Ethernet interfaces.
NOTE
An Eth-Trunk member interface cannot be added to another Eth-Trunk interface.
l Active and inactive interfaces
Member interfaces can be active or inactive. Active interfaces forward data, whereas
inactive interfaces cannot.
Links connected to active interfaces are called active links, and links connected to
inactive interfaces are called inactive links.
To enhance link reliability, a backup link is used. Interfaces on the two ends of the
backup link are inactive. The inactive interfaces become active only when the active
interfaces fail.
l Maximum number of active member interfaces
If the number of active member interfaces reaches this threshold, additional member
interfaces cannot become active.
l Minimum number of active member interfaces
The minimum number of active member interfaces is specified to ensure Eth-Trunk
interface bandwidth. This threshold prevents data loss during transmission when the
number of active interfaces is insufficient.
If the number of active member interfaces falls below this threshold, the Eth-Trunk
interface goes Down, and all member interfaces of the Eth-Trunk interface stop
forwarding data.
l System LACP priority
A system LACP priority is set to prioritize the devices at both ends. In static LACP
mode, the active interfaces selected by devices must be consistent on both ends;
otherwise, the LAG cannot be set up. To ensure consistency, you can set a higher system
LACP priority for one end. Then, the other end selects the active interfaces based on the
end with a higher system LACP priority.
A smaller system LACP priority value indicates a higher system LACP priority. The
default system LACP priority value is 32768.
l Interface LACP priority
An interface LACP priority determines the likelihood that an interface can be selected as
an active interface. Interfaces with higher priorities are selected as active interfaces. A
smaller interface LACP priority value indicates a higher interface LACP priority.

l M:N backup
Link aggregation in static LACP mode uses LACPDUs to negotiate active link selection. This
mode is also called M:N mode where M indicates the number of active links and N
indicates the number of backup links. This mode improves link reliability and allows
load balancing to be performed across M active links.
In Figure 5-37, M+N links with the same attributes (in the same LAG) are set up
between two devices. When data is transmitted over the aggregated link, load balancing
is performed on the M active links; no data is transmitted over the N backup links.
Therefore, the actual bandwidth of the aggregated link is the sum of the M links'
bandwidth, and the maximum bandwidth of the aggregated link is the sum of the M+N
links' bandwidth.
If one of the M links fails, LACP selects a link from the N backup links to take over the
traffic. In such a situation, the actual bandwidth of the aggregated link is still the sum of
M links' bandwidth, but the maximum bandwidth of the aggregated link becomes the
sum of the M+N-1 links' bandwidth.
Figure 5-37 M:N backup
(Diagram: Eth-Trunk 1 on ATN A connects to Eth-Trunk 1 on CX-B over an Eth-Trunk link
that consists of primary links and backup links.)
M:N backup applies when the bandwidth of M links needs to be provided and link
redundancy is required. If an active link fails, the system automatically selects the
backup link with the highest priority and adds it to the current LAG.
If no backup link is available and the number of Up member links is less than the lower
threshold for the number of Up links, the device shuts down the trunk interface.
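The bandwidth figures in the M:N description can be checked with a short calculation. This sketch assumes that all links have the same bandwidth and ignores the lower threshold for the number of Up links; the function name and parameters are hypothetical.

```python
def mn_bandwidth(m, n, link_gbps, failed=0):
    """Return (actual, maximum) bandwidth in Gbit/s after `failed` link failures."""
    healthy = m + n - failed
    if healthy < 0:
        raise ValueError("more failures than links")
    # Backup links replace failed active links until no backup link remains.
    active = min(m, healthy)
    return (active * link_gbps, healthy * link_gbps)

print(mn_bandwidth(3, 2, 1))             # (3, 5)
print(mn_bandwidth(3, 2, 1, failed=1))   # (3, 4)
print(mn_bandwidth(3, 2, 1, failed=3))   # (2, 2)
```

With M = 3 and N = 2, one failure leaves the actual bandwidth at the sum of M links while the maximum drops to M+N-1 links. The actual bandwidth decreases only after all backup links are used.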
Link Aggregation Mode

Link aggregation works in either of the following modes:
l Manual load balancing mode

Manual load balancing is a basic link aggregation mode. In this mode, you must
manually create a trunk interface and add member interfaces to the trunk interface
without the help of LACP.
Then all member interfaces can forward data and perform load balancing, either evenly
or unevenly depending on the weight of each active interface.
If an active link of the LAG fails, the load balancing is implemented among the
remaining active links.
l Manual 1:1 master/backup mode
In 1:1 master/backup mode, a LAG contains only two member interfaces: the master
interface and the backup interface.
In manual mode, you must manually set up an Eth-Trunk and add an interface to the Eth-
Trunk. You must also manually configure member interfaces to be in the active state.

The manual 1:1 master/backup mode is used when the peer device does not support
LACP.
l Static LACP mode
In static LACP mode, you must also manually create a trunk interface and add member
interfaces to it. Compared with link aggregation in manual load balancing mode, active
interfaces in static LACP mode are selected through exchange of Link Aggregation
Control Protocol Data Units (LACPDUs). To be specific, when a group of interfaces are
added into a trunk interface, the status of each member interface (active or inactive)
depends on LACP negotiation.
Table 5-10 shows a comparison between manual load balancing and static LACP modes.
Table 5-10 Comparison between the manual load balancing and static LACP modes
Difference/Similarity   Manual Load Balancing Mode          Static LACP Mode
Difference              LACP is disabled.                   LACP is enabled.
                        Does not check whether interfaces   LACP checks whether interfaces can
                        can be aggregated.                  be aggregated.
Similarity              A LAG is created and deleted manually.
                        Member links are added and deleted manually.
Link Aggregation in Manual Load Balancing Mode

For a LAG in manual load balancing mode, you can manually add multiple interfaces to the
LAG, and all added interfaces will forward data and perform load balancing. This mode is
mainly used when high bandwidth is required and the device at one end does not support
LACP. In Figure 5-38, ATN A supports LACP, whereas CX-B does not.
Figure 5-38 Link aggregation in manual load balancing mode
(Diagram: Eth-Trunk 1 on ATN A connects to Eth-Trunk 1 on CX-B over an Eth-Trunk link.)
In this mode, load balancing is implemented among all member interfaces. The ATN supports
the following load balancing types:
l Load balancing based on IP addresses
l Load balancing based on MAC addresses
The first two types apply to per-destination load balancing, and the third type applies to
per-packet load balancing.

NOTE
In manual load balancing mode, member interfaces on different boards can be added into the same Eth-
Trunk interface.
Link Aggregation in Static LACP Mode

LACP, specified in IEEE 802.3ad, implements dynamic link aggregation and deaggregation
by having both ends exchange LACPDUs.
After member interfaces are added to a trunk interface in static LACP mode, each end sends
LACPDUs to inform its peer of its system priority, MAC address, member interface priorities,
interface numbers, and keys. After the peer receives the information, the peer compares this
information with stored information and selects interfaces that can be aggregated. Devices at
both ends then determine active interfaces.
Figure 5-39 shows the fields in an LACPDU.

Figure 5-39 LACPDU
Destination Address
Source Address
Length/Type
Subtype=LACP
Version Number
TLV_type=Actor Information
Actor_Information_Length=20
Actor_Port
Actor_State
Actor_System_Priority
Actor_System
Actor_Key
Actor_Port_Priority
Reserved
TLV_type=Partner Information
Partner_Information_Length=20
Partner_Port
Partner_State
Partner_System_Priority
Partner_System
Partner_Key
Partner_Port_Priority
Reserved
TLV_type=Collector Information
Collector_Information_Length=16
CollectorMaxDelay
Reserved
TLV_type=Terminator
Terminator_Length=0
Reserved
FCS
The main fields are described as follows:

l Actor_Port/Partner_Port: interface of the Actor or Partner.
l Actor_State/Partner_State: status of the Actor or Partner.
l Actor_System_Priority/Partner_System_Priority: system priority of the Actor or Partner.
l Actor_System/Partner_System: system ID of the Actor or Partner.
l Actor_Key/Partner_Key: operational Key of the Actor or Partner.
l Actor_Port_Priority/Partner_Port_Priority: interface priority of the Actor or Partner.
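The fields can be packed into the 110-octet LACPDU body with Python's struct module. The octet order and field widths below follow the LACPDU structure defined in IEEE 802.3ad and are stated from the standard, not from this document. The function name, dictionary keys, and sample values are hypothetical, and the Ethernet header and FCS are omitted.

```python
import struct

# One Actor/Partner TLV (20 octets): type, length, system priority,
# system ID (MAC), key, port priority, port number, state, 3 reserved octets.
INFO_TLV = "BBH6sHHHB3x"
LACPDU = ">BB" + INFO_TLV + INFO_TLV + "BBH12x" + "BB50x"

def build_lacpdu(actor, partner, collector_max_delay=0):
    """Pack the LACPDU body (everything between Length/Type and the FCS)."""
    def info(tlv_type, e):
        return (tlv_type, 20, e["system_priority"], e["system"], e["key"],
                e["port_priority"], e["port"], e["state"])
    return struct.pack(LACPDU, 1, 1,                  # Subtype=LACP, Version
                       *info(1, actor), *info(2, partner),
                       3, 16, collector_max_delay,    # Collector TLV
                       0, 0)                          # Terminator TLV

actor = {"system_priority": 32768, "system": bytes.fromhex("00e0fc000001"),
         "key": 1, "port_priority": 9, "port": 1, "state": 0x3D}
partner = dict(actor, system=bytes.fromhex("00e0fc000002"))
print(len(build_lacpdu(actor, partner)))   # 110
```

The length fields 20, 20, 16, and 0 match the values shown in Figure 5-39.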
l The process of setting up an Eth-Trunk link in static LACP mode is as follows:

a. Devices at both ends send LACPDUs to each other.

In Figure 5-40, after you manually create an Eth-Trunk link in static LACP mode
on CX-A and ATN B and add member interfaces to the Eth-Trunk, the member
interfaces are enabled with LACP, and devices at both ends can send LACPDUs to
each other.
Figure 5-40 LACPDUs sending in static LACP mode
(Diagram: CX-A and ATN B send LACPDUs to each other.)
b. Devices at both ends determine the Actor based on the system LACP priority and
system ID.
In Figure 5-41, devices at both ends receive LACPDUs from each other. Use ATN
B as an example. When ATN B receives LACPDUs from CX-A, ATN B checks and
records information about CX-A and compares system priorities. If the system
priority of CX-A is higher than that of ATN B, CX-A acts as the Actor and ATN B
selects the active interfaces based on the priorities of the corresponding interfaces
on CX-A. In this manner, active interfaces of both devices are determined.
Figure 5-41 Actor selection in static LACP mode
(Diagram: CX-A is the device with the higher system priority, and ATN B is the device
with the lower system priority. The Actor determines the active links.)
c. Devices at both ends determine active interfaces based on the Actor's LACP
priorities and interface IDs.
In Figure 5-42, after devices at both ends determine the Actor, they select active
interfaces according to the priorities of the Actor's interfaces.
Then active interfaces are selected, active links in the LAG are specified, and load
balancing is implemented across these active links.

Figure 5-42 Active interface selection in static LACP mode
(Diagram: CX-A has system priority 10 and interface priorities 1, 2, and 3. ATN B has
system priority 11 and interface priorities 3, 2, and 1 on the corresponding interfaces.
The devices compare system priorities and determine that CX-A is the Actor, and then
select active interfaces according to the Actor's interface priorities.)
l Switchover between active links and inactive links

In static LACP mode, a link switchover in the LAG is triggered if a device at one end
detects one of the following events:
– An active link goes Down.
– Ethernet OAM detects a link fault.
– LACP detects a link fault.
– An active interface becomes unavailable.
– If LACP preemption is enabled, a backup interface's priority becomes higher than
that of an active interface.
If any of the preceding triggering conditions is met, a link switchover is performed as
follows:
a. The faulty link is disabled.
b. The backup link with the highest priority is selected to replace the faulty active link.
c. The backup link with the highest priority becomes the active link and begins
forwarding data. The link switchover is complete.
l LACP preemption
After LACP preemption is enabled, interfaces with higher priorities in a LAG function as
active interfaces.
In Figure 5-43, GE 1, GE 2, and GE 3 are member interfaces of Eth-Trunk 1. The upper
threshold for the number of active interfaces is 2. LACP priorities of GE 1 and GE 2 are
set to 9 and 10 respectively. The LACP priority of GE 3 is the default value. When the
LACP negotiation is complete, GE 1 and GE 2 are selected as active interfaces because
their LACP priorities are higher, and GE 3 is selected as the backup interface.

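Figure 5-43 LACP preemption
(Diagram: GE 1, GE 2, and GE 3 connect ATN A and CX-B in one Eth-Trunk. GE 1 has
LACP priority 9 and GE 2 has LACP priority 10 at both ends. GE 1 and GE 2 are primary
links, and GE 3 is the backup link.)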
LACP preemption is typically enabled in the following situations:

– GE 1 fails and then recovers. When GE 1 fails, GE 3 replaces it. After GE 1
recovers, if LACP preemption is not enabled on Eth-Trunk 1, GE 1 remains a
backup interface; if LACP preemption is enabled on Eth-Trunk 1, GE 1
becomes an active interface again and GE 3 becomes the backup interface.
– If LACP preemption is enabled and GE 3 needs to replace GE 1 or GE 2 as
an active interface, you can set the LACP priority value of GE 3 to 8 or a
smaller value. If LACP preemption is not enabled, the system neither re-selects
nor switches the active interfaces when the priority of a backup interface
becomes higher than that of an active interface.
l LACP preemption delay
After LACP preemption is enabled, the backup link waits for a period of time before
switching to active status. This period is an LACP preemption delay. The LACP
preemption delay can be configured within a range of 10 to 180 seconds. The default
delay is 30 seconds.
An LACP preemption delay is set to prevent unstable data transmission along Eth-Trunk
links caused by frequent status changes in member links.
As shown in Figure 5-43, GE 1 becomes inactive because of a link failure. After a
period, the link recovers. If LACP preemption is enabled and the period is shorter than
the LACP preemption delay, GE 1 resumes as the active interface after the LACP
preemption delay, and no status change of any backup interface occurs.
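Actor election, active interface selection, and the effect of LACP preemption can be sketched as follows. This is a simplified model, not the ATN implementation: the data layout and function names are hypothetical, the tie-break on the smaller system ID and the default interface priority value of 32768 are assumptions, and the preemption delay timer is not modeled.

```python
DEFAULT_PRIORITY = 32768  # assumed default interface LACP priority value

def elect_actor(a, b):
    """Return the device with the smaller (system priority value, system ID)."""
    return min(a, b, key=lambda d: (d["system_priority"], d["system_id"]))

def select_active(ports, max_active):
    """Pick up to max_active Up interfaces by (priority value, interface ID)."""
    up = [p for p in ports if p["up"]]
    ranked = sorted(up, key=lambda p: (p["priority"], p["id"]))
    return [p["id"] for p in ranked[:max_active]]

def reselect(current_active, ports, max_active, preempt):
    """Re-run selection after a status change, honoring LACP preemption."""
    best = select_active(ports, max_active)
    if preempt:
        return best
    up_ids = {p["id"] for p in ports if p["up"]}
    kept = [i for i in current_active if i in up_ids]   # keep working links
    fill = [i for i in best if i not in kept]           # replace failed links
    return kept + fill[:max(0, max_active - len(kept))]

def ports(ge1_up=True):
    """Member interfaces of Eth-Trunk 1 in Figure 5-43."""
    return [{"id": 1, "priority": 9, "up": ge1_up},
            {"id": 2, "priority": 10, "up": True},
            {"id": 3, "priority": DEFAULT_PRIORITY, "up": True}]

active = select_active(ports(), 2)                            # [1, 2]
active = reselect(active, ports(ge1_up=False), 2, False)      # [2, 3]
print(reselect(active, ports(), 2, preempt=False))            # [2, 3]
print(reselect(active, ports(), 2, preempt=True))             # [1, 2]
```

The last two calls reproduce the GE 1 recovery case: without preemption GE 1 stays a backup interface, and with preemption it replaces GE 3.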
ETH-OAM (802.1ag) and BFD

Using LACP to detect link faults is time-consuming and has low reliability. To rapidly detect
member link status for a faster active/backup link switchover or load rebalancing, ETH-OAM
(802.1ag) or BFD can be used, which ensures a link recovery time at the 50 ms level.
5.3.2.7 E-Trunk
Enhanced Trunk (E-Trunk) controls and implements link aggregation among multiple devices.
E-Trunk implements device-level link reliability, instead of board-level link reliability.
NOTE
Only the ATN 950B (AND2CXPB/AND2CXPE) supports E-Trunk.
Basic Concepts
l E-Trunk ID
An E-Trunk ID is an integer that uniquely identifies an E-Trunk link.
l Eth-Trunk ID
An Eth-Trunk ID is an integer that uniquely identifies an Eth-Trunk link.

l E-Trunk priority
E-Trunk priorities determine the master/backup status of two devices in an aggregation
group. In Figure 5-44, PE1 has a higher E-Trunk priority than PE2. Therefore PE1 is the
master device, and PE2 is the backup device. A smaller value indicates a higher E-Trunk
priority.
l E-Trunk system ID
E-Trunk system IDs determine the master/backup status of two devices in an aggregation
group when they have the same E-Trunk priority. An E-Trunk system ID is the MAC
address of an Ethernet interface on a main control board. The device with a smaller
MAC address has a higher priority.
l LACP E-Trunk system priority
The Link Aggregation Control Protocol (LACP) system priority of an E-Trunk member
interface (Eth-Trunk interface) is called LACP E-Trunk system priority.
If an E-Trunk consists of Eth-Trunk interfaces in static LACP mode, LACP E-Trunk
system priorities determine the LACP Actor from the two ends of an Eth-Trunk link. The
end with a smaller LACP E-Trunk system priority value functions as the Actor. The
Actor selects active interfaces from its local Eth-Trunk member interfaces, and then the
other end selects its local Eth-Trunk member interfaces that are directly connected to the
active interfaces of the Actor as active interfaces.
NOTE
l LACP E-Trunk system priorities apply to E-Trunks that consist of Eth-Trunk interfaces in
static LACP mode.
l LACP system priorities apply to Eth-Trunk interfaces in static LACP mode.
l LACP E-Trunk system priorities and LACP system priorities are configurable. If both of them
are configured and Eth-Trunk interfaces in static LACP mode are added to an E-Trunk, the
LACP E-Trunk system priority is used.
l LACP E-Trunk system ID
The LACP system ID of an E-Trunk member interface (Eth-Trunk interface) is called
LACP E-Trunk system ID.
If two devices have the same LACP E-Trunk system priority, the device with the smaller
LACP E-Trunk system ID has a higher priority.
NOTE
l LACP E-Trunk system IDs apply to E-Trunks that consist of Eth-Trunk interfaces in static
LACP mode.
l LACP system IDs apply to Eth-Trunk interfaces in static LACP mode.
l LACP E-Trunk system IDs are configurable, whereas LACP system IDs are not because
LACP system IDs are the MAC addresses of the Ethernet interfaces on main control boards.
In E-Trunk, to enable a CE to consider the remote PEs as a single device, you must
configure the same LACP E-Trunk system priority and system ID for the PEs. In Figure
5-44, the LACP E-Trunk system ID is in the format of a MAC address.
l Eth-Trunk working mode
The Eth-Trunk working mode is the mode in which the Eth-Trunk interfaces that are members
of an E-Trunk work. Eth-Trunk interfaces that are added to an E-Trunk can work in any of the
following modes:
– Automatic
– Forcible master
– Forcible backup

l Timeout period
Normally, the master and backup devices in an E-Trunk periodically exchange Hello
messages. If the backup device does not receive any Hello message within the timeout
period, it becomes the master device.
The timeout period is obtained through this formula: Timeout period = Interval at which
Hello messages are sent x Time multiplier.
If the time multiplier is 3, the backup device becomes the master device if it does not
receive any Hello message within three consecutive sending intervals.
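The E-Trunk priority, E-Trunk system ID, and timeout period concepts can be combined into a short sketch. The function names, the dictionary layout, and the sample values (a 1-second Hello interval and the MAC addresses) are hypothetical; only the comparison rules and the timeout formula come from the descriptions above.

```python
def timeout_period(hello_interval_s, multiplier):
    """Timeout period = interval at which Hello messages are sent x time multiplier."""
    return hello_interval_s * multiplier

def elect_master(pe_a, pe_b):
    """Smaller priority value wins; equal priorities use the smaller system ID."""
    return min(pe_a, pe_b, key=lambda pe: (pe["priority"], pe["system_id"]))["name"]

def backup_takes_over(seconds_since_last_hello, hello_interval_s, multiplier):
    """The backup becomes master once no Hello arrives within the timeout period."""
    return seconds_since_last_hello >= timeout_period(hello_interval_s, multiplier)

pe1 = {"name": "PE1", "priority": 10, "system_id": "00e0-fc00-0001"}
pe2 = {"name": "PE2", "priority": 20, "system_id": "00e0-fc00-0002"}
print(elect_master(pe1, pe2))        # PE1
print(timeout_period(1, 3))          # 3
print(backup_takes_over(2, 1, 3))    # False
```

System IDs are compared as strings here, which works only if both are written in the same format and letter case.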
E-Trunk Working Principle

Eth-Trunk interfaces in static LACP mode and manual load balancing mode can be added to
an E-Trunk. Their working principles are different and the E-Trunk working process is
described as follows:
l Eth-Trunk interfaces in static LACP mode are added to an E-Trunk.
– Master/backup status negotiation
As shown in Figure 5-44, the CE is directly connected to PE1 and PE2, and E-
Trunk runs between PE1 and PE2.
Figure 5-44 E-Trunk
(Diagram: the CE, using Eth-Trunk 1, is dual-homed to Eth-Trunk 10 on PE1 and Eth-Trunk
10 on PE2. E-Trunk 1 runs between PE1 and PE2.)
NOTE
When you configure IP addresses for Eth-Trunk interfaces connecting the CE and PEs to
transmit Layer 3 services, the PE Eth-Trunk interface configurations must meet the
following requirements:
l The Eth-Trunk interfaces must have IP addresses residing on the same network
segment.
In most cases, the master device advertises the direct route to its Eth-Trunk interface,
and the backup device does not. After a master/backup device switchover is complete,
the new master device (former backup device) advertises the direct route to its Eth-
Trunk interface.
l The same MAC address must be configured for the PE Eth-Trunk interfaces.
This prevents the CE from updating its ARP entries for a long time when a master/
backup device switchover is performed and therefore ensures uninterrupted service
forwarding.
There are few scenarios for configuring IP addresses for Eth-Trunk interfaces connecting the
CE and PEs to transmit Layer 3 services and then adding the PE Eth-Trunk interfaces to an
E-Trunk.

n PE end
The same Eth-Trunk and E-Trunk interfaces are created on PE1 and PE2. In
addition, the Eth-Trunk interfaces are added to the E-Trunk group.
n CE end
Eth-Trunk interfaces in static LACP mode are configured on the CE. By using
the Eth-Trunk interfaces, the CE is connected to PE1 and PE2.
The E-Trunk group is invisible to the CE.
i. E-Trunk master/backup status
PE1 and PE2 negotiate the E-Trunk master/backup status by exchanging E-
Trunk packets. Normally, after the negotiation one PE functions as the master
and the other as the backup.
The master/backup status of a PE depends on the E-Trunk priority and
E-Trunk system ID carried in E-Trunk packets. The smaller the E-Trunk priority
value, the higher the E-Trunk priority. The PE with the higher E-Trunk priority
functions as the master. If the E-Trunk priorities of the PEs are the same, the
PE with the smaller E-Trunk system ID functions as the master device.
ii. Master/backup status of a member Eth-Trunk interface in the E-Trunk group
The master/backup status of a member Eth-Trunk interface in the E-Trunk
group is determined by its E-Trunk status and the peer Eth-Trunk interface
status.
As shown in Figure 5-44, PE1 and PE2 are on the two ends of the E-Trunk
link. PE1 is considered as the local end and PE2 as the peer end.
The status of each member Eth-Trunk interface in the E-Trunk group is
determined, as shown in Table 5-11.
Table 5-11 Master/backup status of an E-Trunk group and its member Eth-Trunk interfaces
Status of the    Working Mode of the    Status of the Peer    Status of the Local
Local E-Trunk    Local Eth-Trunk        Eth-Trunk Interface   Eth-Trunk Interface
                 Interface
-                Forcible master        -                     Master
-                Forcible backup        -                     Backup
Master           Automatic              Down                  Master
Backup           Automatic              Down                  Master
Backup           Automatic              Up                    Backup
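The rows of Table 5-11 can be expressed as a lookup. The function name and the lowercase mode strings are hypothetical, and the function returns None for any combination that the table does not list.

```python
def local_eth_trunk_status(e_trunk_status, working_mode, peer_eth_trunk_status):
    """Return "Master" or "Backup" per Table 5-11, or None if the row is not listed."""
    if working_mode == "forcible master":
        return "Master"
    if working_mode == "forcible backup":
        return "Backup"
    if working_mode != "automatic":
        return None
    # Rows for the automatic mode: (local E-Trunk status, peer Eth-Trunk status).
    table = {
        ("Master", "Down"): "Master",
        ("Backup", "Down"): "Master",
        ("Backup", "Up"): "Backup",
    }
    return table.get((e_trunk_status, peer_eth_trunk_status))

print(local_eth_trunk_status("Backup", "automatic", "Down"))   # Master
```

In the forcible modes, the E-Trunk status and the peer Eth-Trunk interface status are ignored, which corresponds to the "-" entries in the table.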
In normal situations:
○ If PE1 functions as the master, Eth-Trunk 10 of PE1 functions as the
master, and its link status is Up.
○ If PE2 functions as the backup, Eth-Trunk 10 of PE2 functions as the
backup, and its link status is Down.

If the link between the CE and PE1 fails, the following situations occur:
1) PE1 sends an E-Trunk packet containing information about the faulty
Eth-Trunk 10 of PE1 to PE2.
2) After receiving the E-Trunk packet, PE2 finds that Eth-Trunk 10 on the
peer is faulty. Then, the status of Eth-Trunk 10 on PE2 becomes master.
Through the LACP negotiation, the status of Eth-Trunk 10 on PE2
becomes Up.
The Eth-Trunk status on PE2 becomes Up, and traffic of the CE is
forwarded through PE2. In this way, traffic destined for the peer CE is
protected.
If PE1 is faulty, the following situations occur:
1) If the PEs are configured with BFD, PE2 detects that the BFD session
status becomes Down and then functions as the master, and Eth-Trunk 10 of
PE2 functions as the master.
2) If the PEs are not configured with BFD, PE2 will not receive any E-Trunk
packet from PE1 before its timeout period runs out, after which PE2 will
function as the master and Eth-Trunk 10 of PE2 will function as the
master.
Through the LACP negotiation, the status of Eth-Trunk 10 on PE2
becomes Up. The traffic of the CE is forwarded through PE2. In this way,
traffic destined for the peer CE is protected.
– Sending and receiving of E-Trunk packets
E-Trunk packets carrying the source IP address and port number configured on the
local end are sent through UDP. Factors triggering the sending of E-Trunk packets
are as follows:
n The sending timer times out.
n The configurations change. For example, the E-Trunk priority, packet sending
period, timeout period multiplier, addition/deletion of a member Eth-Trunk
interface, or source/destination IP address of the E-Trunk group changes.
n A member Eth-Trunk interface fails or recovers.
E-Trunk packets contain the timeout period to be used as the timeout period for the
peer.
– BFD fast detection
A device cannot quickly detect a fault on its peer based on the timeout period of
received packets. In this case, BFD can be configured on the device. The peer end
needs to be configured with an IP address. After a BFD session is established to
detect whether the route destined for the peer is reachable, E-Trunk can sense any
fault detected by BFD.
– Switchback mechanism
The local device is in master state. In such a situation, if the physical status of the
Eth-Trunk interface on the local device goes Down or the local device fails, the peer
device becomes the master and the physical status of the member Eth-Trunk
interface becomes Up.
When the local end recovers, the local end needs to function as the master.
Therefore, the local Eth-Trunk interface enters the LACP negotiation state. After
being informed by LACP that the negotiation ability is Up, the local device starts
the switchback delay timer. After the switchback delay timer times out, the local
Eth-Trunk interface becomes the master. After LACP negotiation, the Eth-Trunk
l Eth-Trunk interfaces in manual load balancing mode are added to an E-Trunk.
– Master/backup status negotiation
As shown in Figure 5-45, the CE is directly connected to PE1 and PE2, and E-
Trunk runs between PE1 and PE2.
Figure 5-45 E-Trunk
[Figure: the CE is connected to PE1 and PE2 through Eth-Trunk 10 on each PE; E-Trunk 1 runs
between PE1 and PE2.]
n PE end
The same Eth-Trunk and E-Trunk interfaces are created on PE1 and PE2. In
addition, the Eth-Trunk interfaces are added to the E-Trunk group.
i. E-Trunk master/backup status
PE1 and PE2 negotiate the E-Trunk master/backup status by exchanging
E-Trunk packets. Normally, after the negotiation, one PE functions as the
master and the other as the backup.
The master/backup status of a PE depends on the E-Trunk priority and
E-Trunk system ID carried in E-Trunk packets. The smaller the E-Trunk
priority value, the higher the E-Trunk priority. The PE with the higher
E-Trunk priority functions as the master. If the E-Trunk priorities of the
PEs are the same, the PE with the smaller E-Trunk system ID functions as
the master.
ii. Master/backup status of a member Eth-Trunk interface in the E-Trunk group
The master/backup status of a member Eth-Trunk interface in the E-Trunk
group is determined by its E-Trunk status and the peer Eth-Trunk interface
status.
As shown in Figure 5-45, PE1 and PE2 are on the two ends of the E-Trunk
link. PE1 is considered as the local end and PE2 as the peer end.
The status of each member Eth-Trunk interface in the E-Trunk group is
determined, as shown in Table 5-12.

Table 5-12 Master/backup status of an E-Trunk group and its member Eth-Trunk interfaces
Status of the    Working Mode of the    Status of the Peer     Status of the Local
Local E-Trunk    Local Eth-Trunk        Eth-Trunk Interface    Eth-Trunk Interface
                 Interface
-                Forcible master        -                      Master
-                Forcible backup        -                      Backup
Master           Automatic              Down                   Master
Backup           Automatic              Down                   Master
Backup           Automatic              Up                     Backup
In normal situations:
○ If PE1 functions as the master, Eth-Trunk 10 of PE1 functions as the
master, and its link status is Up.
○ If PE2 functions as the backup, Eth-Trunk 10 of PE2 functions as the
backup, and its link status is Down.
If the link between the CE and PE1 fails, the following situations occur:
1) PE1 sends an E-Trunk packet containing information about the faulty
Eth-Trunk 10 of PE1 to PE2.
2) After receiving the E-Trunk packet, PE2 finds that Eth-Trunk 10 on the
peer is faulty. Then, the status of Eth-Trunk 10 on PE2 becomes master.
Through the Trunk negotiation, the status of Eth-Trunk 10 on PE2
becomes Up.
The Eth-Trunk status on PE2 becomes Up, and traffic of the CE is
forwarded through PE2. In this way, traffic destined for the peer CE is
protected.
If PE1 is faulty, the following situations occur:
1) If BFD is configured on the PEs, PE2 detects that the BFD session
status becomes Down. PE2 then functions as the master, and Eth-Trunk 10
of PE2 functions as the master.
2) If the PEs are not configured with BFD, PE2 will not receive any E-Trunk
packet from PE1 before its timeout period runs out, after which PE2 will
function as the master and Eth-Trunk 10 of PE2 will function as the
master.
Through the Trunk negotiation, the status of Eth-Trunk 10 on PE2
becomes Up. The traffic of the CE is forwarded through PE2. In this way,
traffic destined for the peer CE is protected.
– Sending and receiving of E-Trunk packets
E-Trunk packets carrying the source IP address and port number configured on the
local end are sent through UDP. Factors triggering the sending of E-Trunk packets
are as follows:

n The sending timer times out.
n The configurations change. For example, the E-Trunk priority, packet sending
period, timeout period multiplier, addition/deletion of a member Eth-Trunk
interface, or source/destination IP address of the E-Trunk group changes.
n A member Eth-Trunk interface fails or recovers.
E-Trunk packets contain the timeout period to be used as the timeout period for the
peer.
– BFD fast detection
A device cannot quickly detect a fault on its peer based on the timeout period of
received packets. In this case, BFD can be configured on the device. The peer end
needs to be configured with an IP address. After a BFD session is established to
detect whether the route destined for the peer is reachable, E-Trunk can sense any
fault detected by BFD.
– Switchback mechanism
The local device is in master state. In such a situation, if the physical status of the
Eth-Trunk interface on the local device goes Down or the local device fails, the peer
device becomes the master and the physical status of the member Eth-Trunk
interface becomes Up.
When the local end recovers, the local end needs to function as the master.
Therefore, the local Eth-Trunk interface enters the Trunk negotiation state. After
being informed by Trunk that the negotiation ability is Up, the local device starts
the switchback delay timer. After the switchback delay timer times out, the local
Eth-Trunk interface becomes the master. After Trunk negotiation, the Eth-Trunk
E-Trunk Restrictions
To improve the reliability of CE and PE links, and to ensure that traffic can be automatically
switched between these links, the configurations on both ends of the E-Trunk link must be
consistent. Use the networking in Figure 5-44 as an example.
l The Eth-Trunk link directly connecting PE1 to the CE and the Eth-Trunk link directly
connecting PE2 to the CE must be configured with the same working rate and duplex
mode. This ensures that both Eth-Trunk interfaces have the same key and join the same
E-Trunk group.
l Proper IP addresses must be specified for the two PEs to ensure Layer 3 connectivity.
The address of the local PE is the peer address of the peer PE, and the address of the peer
PE is the peer address of the local PE. Using loopback interface addresses as the PE
addresses is recommended.
l The two PEs must be configured with the same security key (if necessary).
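The master/backup rules described in this section (E-Trunk priority, E-Trunk system ID, and
Table 5-12) can be sketched as follows. This is only an illustration: the names ETrunkPeer,
etrunk_role, and member_role are hypothetical, and the system ID is compared as a plain string,
which is an assumption rather than the device's actual comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ETrunkPeer:
    """Values one PE advertises in E-Trunk packets (illustrative names)."""
    priority: int   # smaller value means higher priority
    system_id: str  # tie-breaker; compared as a plain string here (assumption)


def etrunk_role(local: ETrunkPeer, peer: ETrunkPeer) -> str:
    """Return the E-Trunk role of the local PE: 'master' or 'backup'."""
    local_key = (local.priority, local.system_id)
    peer_key = (peer.priority, peer.system_id)
    return "master" if local_key < peer_key else "backup"


# Rows of Table 5-12 for the automatic working mode:
# (local E-Trunk status, peer Eth-Trunk status) -> local Eth-Trunk status
_AUTOMATIC_ROWS = {
    ("master", "down"): "master",
    ("backup", "down"): "master",
    ("backup", "up"): "backup",
}


def member_role(etrunk_status: str, mode: str, peer_eth_trunk: str) -> Optional[str]:
    """Look up the local member Eth-Trunk status as listed in Table 5-12.

    Returns None for a combination that the table does not list.
    """
    if mode == "forcible master":
        return "master"
    if mode == "forcible backup":
        return "backup"
    return _AUTOMATIC_ROWS.get((etrunk_status, peer_eth_trunk))


pe1 = ETrunkPeer(priority=10, system_id="00e0-fc00-0001")
pe2 = ETrunkPeer(priority=10, system_id="00e0-fc00-0002")
print(etrunk_role(pe1, pe2))                       # equal priority: smaller system ID wins
print(member_role("backup", "automatic", "down"))  # backup PE takes over when the peer is Down
```

The lookup table deliberately returns nothing for combinations that Table 5-12 does not list,
instead of guessing the behavior.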
5.3.3 Usage Scenario
5.3.3.1 Eth-Trunk
In Figure 5-46, an Eth-Trunk link is established between ATN A and CX-B, and two
full-duplex GE interfaces are added to the Eth-Trunk interface. The total bandwidth of the
Eth-Trunk interface is twice that of each GE interface.

Figure 5-46 Networking diagram of the Eth-Trunk
[Figure: ATN A (Eth-Trunk1, 10.1.1.1/24) is connected to CX-B (Eth-Trunk1, 10.1.1.2/24).]
An Eth-Trunk interface improves link reliability. If one Eth-Trunk member link fails, traffic
switches to the other member link.
An Eth-Trunk interface also alleviates network congestion, because the Eth-Trunk interface
balances its traffic between two member links.
5.3.3.2 Link Aggregation Group

As shown in Figure 5-47, traffic of different services is sent to the core network through the
UPE and PE-AGG. Different services have varied priorities. To ensure the bandwidth and
reliability of the links between the UPE and the PE-AGG, a link aggregation group, Eth-
Trunk 1, is established.
Figure 5-47 Networking diagram of the link aggregation group
[Figure: the UPE is connected to the PE-AGG through Eth-Trunk 1; the PE-AGG connects to the
core network.]
You can set a working mode for the Eth-Trunk interfaces as required:
l If devices at both ends of the Eth-Trunk link support LACP, set the Eth-Trunk interfaces
to work in static LACP mode.
l If the device at either end of the Eth-Trunk does not support LACP, set the Eth-Trunk
interfaces to work in manual load balancing mode.
After Eth-Trunk 1 is created, you can implement QoS on the logical interface in the same way
as on a common interface.
You can also implement traffic shaping, congestion management, and congestion avoidance
for outgoing traffic on Eth-Trunk 1 on both the UPE and PE-AGG. These configurations ensure
that high-priority packets are sent preferentially.

5.3.3.3 E-Trunk Application in Dual-homing Networking

NOTE
Only the ATN 950B (AND2CXPB/AND2CXPE) supports this command.
Service Overview
Eth-Trunk implements link reliability between single devices. However, if a device fails, Eth-
Trunk ceases to take effect.
To improve network reliability, carriers introduced the device redundancy method that
requires master and backup devices. If the master device or primary link fails, the backup
device can take over user services. In this situation, another device must be dual-homed to the
master and backup devices, and inter-device link reliability must be ensured.
In dual-homing networking, Virtual Router Redundancy Protocol (VRRP) can be used to
ensure device-level reliability, and Eth-Trunk can be used to ensure link reliability. In some
cases, however, traffic cannot be switched to the backup device and secondary link
simultaneously if the master device or primary link fails. As a result, traffic is interrupted. To
address this issue, use Enhanced Trunk (E-Trunk) to implement both device-level and link
reliability.
In Figure 5-48, the customer edge (CE) is dual-homed to the virtual private LAN service
(VPLS) network, and Eth-Trunk is deployed on the CE and provider edges (PEs) to
implement link reliability.
In normal situations, the CE communicates with remote devices on the VPLS network
through PE1. If PE1 or the link between the CE and PE1 fails, the CE cannot communicate
with PE1. To ensure that services are not interrupted, deploy E-Trunk on PE1 and PE2. If PE1
or the link between the CE and PE1 fails, traffic is switched to PE2. The CE then continues to
communicate with remote devices on the VPLS network through PE2. If PE1 or the link
between the CE and PE1 recovers, traffic is switched back to PE1. E-Trunk provides backup
between Eth-Trunk links of the PEs, improving device-level reliability.
Figure 5-48 E-Trunk dual-homing networking
[Figure: the CE is dual-homed to PE1 and PE2 through Eth-Trunk 10; E-Trunk 1 runs between PE1
and PE2; P, PE3, and PE4 are the other devices on the VPLS network.]

Terms
Term      Definition
LA        Link aggregation. A method of bundling a group of physical interfaces to a
          logical interface to increase bandwidth.
LACP      Link Aggregation Control Protocol. A protocol that provides a standard
          negotiation method for devices to exchange data.
BFD       Bidirectional Forwarding Detection. A common fault detecting mechanism
          for quickly detecting and tracking the connectivity of network links or IP
          routing. To improve network performance, adjacent systems must be able to
          quickly detect a communication failure and set up a backup channel to
          resume communication.
ETH-OAM   Ethernet operation administration maintenance. A set of Ethernet
          management functions that cover fault detection, notification, location, and
          repair.

Abbreviation
LACP      Link Aggregation Control Protocol
LA        link aggregation
ETH-OAM   Ethernet operation administration maintenance
5.4 STP/RSTP/MSTP
5.4.1 Introduction
Definition
Redundant links are generally used on an Ethernet switching network to provide link backup
and enhance network reliability. However, the use of redundant links may produce loops,
causing broadcast storms and rendering the MAC address table unstable. As a result,
communication quality deteriorates, and communication services may be interrupted. The
Spanning Tree Protocol (STP) solves this problem.
STP has a narrow sense and a broad sense:
l STP, in a narrow sense, refers to only the STP protocol defined in IEEE 802.1D.

l STP, in a broad sense, refers to the STP protocol defined in IEEE 802.1D, the Rapid
Spanning Tree Protocol (RSTP) defined in IEEE 802.1W, and the Multiple Spanning
Tree Protocol (MSTP) defined in IEEE 802.1S.
The following spanning tree protocols are defined:
l STP
IEEE 802.1D, issued in 1998, defines STP.
STP, a management protocol at the data link layer, is used to detect and prevent loops on
a Layer 2 network. STP blocks redundant links on a Layer 2 network and trims a
network into a loop-free tree topology.
The STP topology, however, converges at a slow speed. Even an edge port cannot be
changed to the Forwarding state until twice the amount of time specified by the Forward
Delay timer elapses. The default time specified by the forward delay timer is 15 seconds.
l RSTP
IEEE 802.1W, issued in 2001, defines RSTP.
RSTP, as an enhancement of STP, achieves fast convergence of the network topology.
Both RSTP and STP have one defect: All the Virtual Local Area Networks (VLANs) in a
LAN share the same spanning tree. As a result, data traffic from different VLANs cannot
be balanced. Even worse, packets in some VLANs cannot be forwarded.
RSTP is backward compatible with STP and can be used together with STP on a
network.
l MSTP
IEEE 802.1S, issued in 2002, defines MSTP.
MSTP defines a VLAN mapping table in which VLANs are associated with multiple
spanning tree instances (MSTIs). In addition, MSTP divides a switching network into
multiple regions, each of which has multiple independent MSTIs. In this manner, the
entire network is trimmed into a loop-free tree topology, and replication and circular
propagation of packets and broadcast storms are prevented on the network. MSTP also
provides multiple redundant paths to load-balance VLAN traffic.
Table 5-13 provides a comparison between these spanning tree protocols.
Table 5-13 Comparison between spanning tree protocols
Spanning Tree  Characteristics                          Usage Scenario
Protocol
STP            In an STP region, a loop-free tree is    STP or RSTP is used in a scenario where
               generated. Broadcast storms are          all VLANs share one spanning tree. In
               prevented, and redundancy is achieved.   this situation, users or services do not
                                                        need to be differentiated.
RSTP           l In an RSTP region, a loop-free tree    STP or RSTP is used in a scenario where
                 is generated. Broadcast storms are     all VLANs share one spanning tree. In
                 prevented, and redundancy is           this situation, users or services do not
                 achieved.                              need to be differentiated.
               l RSTP allows fast convergence of the
                 network topology.
MSTP           l In an MSTP region, a loop-free tree    MSTP is used in a scenario where traffic
                 is generated. Broadcast storms are     in different VLANs is forwarded through
                 prevented, and redundancy is           different spanning trees that are
                 achieved.                              independent of each other to implement
               l MSTP achieves fast convergence of      load balancing. In this situation, users
                 the network topology.                  or services are distinguished by using
               l MSTP implements load balancing         VLANs.
                 among VLANs. Traffic from different
                 VLANs is transmitted along
                 different paths.
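The VLAN mapping table that MSTP uses to associate VLANs with MSTIs can be pictured with a
small sketch. The mapping values and the function name msti_for_vlan are hypothetical, and
treating instance 0 as the home of unmapped VLANs is stated here as an assumption about
common MSTP behavior, not something this section specifies.

```python
# Hypothetical VLAN mapping table: MSTI ID -> VLAN IDs carried by that instance.
VLAN_MAPPING = {
    1: range(1, 101),    # VLANs 1-100 follow the spanning tree of MSTI 1
    2: range(101, 201),  # VLANs 101-200 follow the spanning tree of MSTI 2
}


def msti_for_vlan(vlan_id: int, mapping=VLAN_MAPPING, default_instance: int = 0) -> int:
    """Return the MSTI that carries the given VLAN.

    VLANs absent from the mapping fall back to default_instance (assumed to be 0).
    """
    for msti, vlans in mapping.items():
        if vlan_id in vlans:
            return msti
    return default_instance


for vlan in (50, 150, 300):
    print(f"VLAN {vlan} -> MSTI {msti_for_vlan(vlan)}")
```

Because each MSTI computes its own tree, VLAN 50 and VLAN 150 in this example can be
forwarded along different paths, which is the load-balancing effect listed in Table 5-13.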
Purpose
After a spanning tree protocol is configured on an Ethernet switching network, it calculates
the network topology and implements the following functions to remove network loops:
l Loop cut-off: The potential loops on the network are cut off by blocking redundant links.
l Link redundancy: If an active path becomes faulty, a redundant link can be activated to
ensure network connectivity.
5.4.2 Principles of STP/RSTP
5.4.2.1 Background
STP is used to prevent loops in the LAN. The switching devices running STP discover loops
on the network by exchanging information with one another, and block certain interfaces to
cut off loops. Along with the growth of the LAN scale, STP has become an important
protocol for the LAN.

Figure 5-49 Networking diagram for a typical LAN
[Figure: Host A attaches to port1 of S1 and S2, and Host B attaches to port2 of S1 and S2;
numbered arrows (1 to 5) mark the data flow around the resulting loop.]
On the network shown in Figure 5-49, the following situations may occur:
l Broadcast storms render the network unavailable.
Loops lead to broadcast storms. In Figure 5-49, STP is not enabled on
the switches S1 and S2. If Host A broadcasts a request, the request is received by port 1
and forwarded by port 2 on both S1 and S2. S1's port 2 then receives the request from
S2's port 2 and forwards the request from S1's port 1. Similarly, S2's port 2 receives the
request from S1's port 2 and forwards the request from S2's port 1. As such transmission
repeats, resources on the entire network are exhausted, and the network becomes
unavailable.
l Flapping of MAC address tables damages MAC address entries.
As shown in Figure 5-49, even the MAC address entry updates triggered by the receipt of
unicast packets can damage the MAC address table.
Assume that no broadcast storm occurs on the network. Host A unicasts a packet to Host
B. If Host B is temporarily removed from the network at this time, the MAC address
entries of Host B on S1 and S2 are deleted. The packet unicast by Host A to Host B is
received by port 1 on S1. S1, however, does not have associated MAC address entries.
Therefore, the unicast packet is forwarded to port 1 and port 2. Then, port 2 on S2
receives the unicast packet from port 2 on S1 and sends it out through port 1. As such
transmission repeats, port 1 and port 2 on S1 and S2 continuously receive unicast packets
from Host A. Therefore, S1 and S2 modify the MAC address entries continuously,
causing the MAC address table to flap. As a result, MAC address entries are damaged.

Basic Design
STP runs at the data link layer. The devices running STP discover loops on the network by
exchanging information with each other and trim the ring topology into a loop-free tree
topology by blocking a certain interface. In this manner, replication and circular propagation
of packets are prevented on the network. In addition, STP prevents the processing
performance of network devices from deteriorating.
The devices running STP usually communicate with each other by exchanging configuration
Bridge Protocol Data Units (configuration BPDUs). BPDUs are classified into two types:
l Configuration BPDU: used to calculate a spanning tree and maintain the spanning tree
topology.
l Topology Change Notification BPDU (TCN BPDU): used by a downstream device to inform
upstream devices of a topology change.
NOTE
Configuration BPDUs contain sufficient information for devices to calculate the spanning tree. They
contain the following information:
l Root bridge ID: is composed of a root bridge priority and the root bridge's MAC address. Each
STP network has only one root bridge.
l Cost of the root path: indicates the cost of the shortest path to the root bridge.
l ID of a designated bridge: is composed of a bridge priority and a MAC address.
l ID of a designated port: is composed of a port priority and a port name.
l Message Age: sets the lifetime of a BPDU on the network.
l Max Age: sets the maximum time a BPDU is saved.
l Hello Time: sets the interval at which BPDUs are sent.
l Forward Delay: indicates the time interface status transition takes.
One Root Bridge

A tree topology must have a root. Therefore, the root bridge is introduced by STP.
There is only one root bridge on the entire STP-capable network. The root bridge is the
logical center of but is not necessarily at the physical center of the entire network. The root
bridge changes dynamically with the network topology.
After the network converges, the root bridge generates and sends out configuration BPDUs at
specific intervals. The other devices only forward these configuration BPDUs, which
advertises topology changes and keeps the network stable.
Two Types of Measurements

The spanning tree is calculated based on two types of measurements: ID and path cost.
l ID
IDs are classified into Bridge IDs (BIDs) and Port IDs (PIDs).
– BID
IEEE 802.1D defines that a BID is composed of a 16-bit bridge priority and a
bridge MAC address. The bridge priority occupies the leftmost 16 bits and the
MAC address occupies the rightmost 48 bits.

On an STP-capable network, the device with the smallest BID is selected to be the
root bridge.
– PID
The PID is composed of a 4-bit port priority and a 12-bit port number. The port
priority occupies the leftmost 4 bits and the port number occupies the remaining
12 bits on the right.
The PID is used to select the designated port.
NOTE
The port priority affects the role of a port in a specified spanning tree instance. For details,
see 5.4.2.4 STP Topology Calculation.
l Path cost
The path cost is a port variable and is used to select a link. STP calculates the path cost
to select a robust link and blocks redundant links to trim the network into a loop-free tree
topology.
On an STP-capable network, the accumulative cost of the path from a certain port to the
root bridge is the sum of the costs of all the segment paths into which the path is
separated by the ports on the transit bridges.
Table 5-14 shows the path costs defined in IEEE 802.1t. Different device manufacturers
use different path cost standards.
Table 5-14 List of path costs
Port Speed   Port Mode                  STP Path Cost (Recommended Value)
                                        802.1D-1998   802.1T       legacy
0            -                          65535         200000000    200,000
10 Mbps      Half-Duplex                100           2000000      2,000
             Full-Duplex                99            1999999      1,999
             Aggregated Link 2 Ports    95            1000000      1,800
100 Mbps     Half-Duplex                19            200000       200
             Full-Duplex                18            199999       199
1000 Mbps    Full-Duplex                4             20000        20
10 Gbps      Full-Duplex                2             2000         2
NOTE
The rate of an aggregated link is the sum of the rates of all Up member links in the aggregated
group.
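The BID and PID layouts described in this section can be illustrated with a short sketch.
The function names make_bid and make_pid and the sample MAC addresses are hypothetical. The
port priority is taken directly as the 4-bit field value, which is a simplification: the
value configured on a device may be scaled differently.

```python
def make_bid(priority: int, mac: str) -> int:
    """Pack a 16-bit bridge priority (leftmost) and a 48-bit MAC into a 64-bit BID."""
    if not 0 <= priority <= 0xFFFF:
        raise ValueError("bridge priority must fit in 16 bits")
    mac_value = int(mac.replace("-", "").replace(":", ""), 16)
    if mac_value >> 48:
        raise ValueError("MAC address must fit in 48 bits")
    return (priority << 48) | mac_value


def make_pid(port_priority: int, port_number: int) -> int:
    """Pack a 4-bit port priority (leftmost) and a 12-bit port number into a 16-bit PID."""
    if not 0 <= port_priority <= 0xF:
        raise ValueError("port priority must fit in 4 bits")
    if not 0 <= port_number <= 0xFFF:
        raise ValueError("port number must fit in 12 bits")
    return (port_priority << 12) | port_number


# Two hypothetical bridges: the lower priority value gives the smaller BID,
# so bridge_b would be selected as the root bridge.
bridge_a = make_bid(32768, "00-e0-fc-00-00-01")
bridge_b = make_bid(4096, "00-e0-fc-00-00-02")
root = min(bridge_a, bridge_b)
print(hex(root))
print(hex(make_pid(8, 1)))
```

Because the priority occupies the leftmost bits, comparing two BIDs as integers compares
priorities first and falls back to the MAC address only when the priorities are equal.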
Three Elements
There are generally three elements used when a ring topology is to be trimmed into a tree
topology: root bridge, root port, and designated port. Figure 5-50 shows the three elements.

Figure 5-50 STP network architecture
[Figure: four switches, S1 (the root bridge), S2, S3, and S4, each with ports A and B that are
labeled with a path cost (PC) and a root path cost (RPC). The legend marks root ports,
designated ports, and blocked ports.]
PC: path cost
RPC: root path cost
l Root bridge
The root bridge is the bridge with the smallest BID. The smallest BID is discovered by
exchanging configuration BPDUs.
l Root port
The root port is the port with the smallest root path cost to the root bridge. The root port
is determined based on the path cost. Among all STP-capable ports on a network bridge,
the port with the smallest root path cost is the root port. There is only one root port on an
STP-capable device, but there is no root port on the root bridge.
l Designated port
For description of the designated bridge and designated port, see Table 5-15.
Table 5-15 Description of the designated bridge and designated port
Object   Designated Bridge                       Designated Port
Device   Device that forwards configuration      Designated bridge port that forwards
         BPDUs to a directly connected device    configuration BPDUs to a device
LAN      Device that forwards configuration      Designated bridge port that forwards
         BPDUs to a network segment              configuration BPDUs to a network segment
As shown in Figure 5-51, AP1 and AP2 reside on S1; BP1 and BP2 reside on S2; CP1
and CP2 reside on S3.
– S1 sends configuration BPDUs to S2 through AP1. S1 is the designated bridge of
S2, and AP1 on S1 is the designated port.
– Two devices, S2 and S3, are connected to the LAN. If S2 is responsible for
forwarding configuration BPDUs to the LAN, S2 is the designated bridge of the
LAN and BP2 on S2 is the designated port.
Figure 5-51 Networking diagram of the designated bridge and designated port
[Figure: S1 connects to S2 and S3 through AP1 and AP2; BP1 on S2 and CP1 on S3 face S1; BP2 on
S2 and CP2 on S3 connect to the LAN.]
After the root bridge, root port, and designated port are selected successfully, the entire tree
topology is set up. If the topology is stable, only the root port and the designated port forward
traffic. All the other ports are in the Blocking state and receive only STP protocol packets
instead of forwarding user traffic.
Four Comparison Principles

STP has four comparison principles that form a BPDU priority vector {root BID, total path
costs, sender BID, port ID}.
Table 5-16 shows the port information that is carried in the configuration BPDUs.

Table 5-16 Four important fields
Field            Brief Description
Root BID         Each STP-capable network has only one root bridge.
Root path cost   Cost of the path from the port sending configuration BPDUs to the root bridge.
Sender BID       BID of the device sending configuration BPDUs.
Port ID          PID of the port sending configuration BPDUs.
After a device on the STP-capable network receives configuration BPDUs, it compares the
fields shown in Table 5-16 with those of its own configuration BPDUs. The four
comparison principles are as follows:
NOTE
During the STP calculation, the smaller the value, the higher the priority.
l Smallest BID: used to select the root bridge. Devices running STP select the smallest
BID as the root BID shown in Table 5-16.
l Smallest root path cost: used to select the root port on a non-root bridge. On the root
bridge, the root path cost of each port is 0.
l Smallest sender BID: used to select the root port when two ports on a device running STP
have the same root path cost. The port that receives BPDUs with the smaller sender BID is
selected as the root port in STP calculation. Assume that the BID of S2 is smaller than
that of S3 in Figure 5-50. If the path costs in the BPDUs received by port A and port B
on S4 are the same, port B becomes the root port.
l Smallest PID: used to block the port with a greater PID but not the port with a smaller
PID when the ports have the same path cost. The PIDs are compared in the scenario
shown in Figure 5-52. The PID of port A on S1 is smaller than that of port B. In the
BPDUs that are received on port A and port B, the path costs and BIDs of the sending
devices are the same. Therefore, port B with a greater PID is blocked to cut off loops.
Figure 5-52 Topology to which PID comparison is applied
[Figure: S1 is connected to S2 through port A and port B; the legend marks the designated
port and the blocked port.]
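The four comparison principles amount to a field-by-field comparison of the priority vector
{root BID, root path cost, sender BID, port ID}, in which the smaller value wins at the first
field that differs. A minimal sketch, assuming hypothetical names (PriorityVector,
best_vector) and made-up numeric IDs:

```python
from typing import Iterable, NamedTuple


class PriorityVector(NamedTuple):
    """Fields of Table 5-16, in comparison order."""
    root_bid: int
    root_path_cost: int
    sender_bid: int
    port_id: int


def best_vector(vectors: Iterable[PriorityVector]) -> PriorityVector:
    """Return the superior vector: tuples compare field by field, smaller is better."""
    return min(vectors)


# Two BPDUs that advertise the same root and the same root path cost.
# The values are invented for illustration only.
from_low_bid = PriorityVector(root_bid=0x1000, root_path_cost=200,
                              sender_bid=0x2000, port_id=0x8001)
from_high_bid = PriorityVector(root_bid=0x1000, root_path_cost=200,
                               sender_bid=0x3000, port_id=0x8001)

winner = best_vector([from_high_bid, from_low_bid])
print(winner.sender_bid == 0x2000)  # the BPDU from the smaller sender BID is preferred
```

The sketch compares the vectors exactly as received. A real implementation also adds the
path cost of the receiving port before comparing root path costs.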

Five Port States

Table 5-17 shows the port states of an STP-capable device.
Table 5-17 Port states
Port State   Purpose                                   Description
Forwarding   A port in the Forwarding state forwards   Only the root port and designated port
             user traffic and BPDUs.                   can enter the Forwarding state.
Learning     When a device has a port in the           This is a transitional state, which is
             Learning state, the device creates a      designed to prevent temporary loops.
             MAC address table based on the
             received user traffic but does not
             forward user traffic.
Listening    All ports are in the Listening state      This is a transitional state.
             when STP calculation is being
             implemented to determine port roles.
Blocking     A port in the Blocking state receives     This is the final state of a blocked
             only BPDUs and does not forward user      port.
             traffic.
Disabled     A port in the Disabled state does not     The port is Down.
             forward BPDUs or user traffic.
Figure 5-53 shows the process of the state transition of a port.

Figure 5-53 State transition of a port
[Figure: transitions among the Disabled or Down, Blocking, Listening, Learning, and Forwarding
states, labeled with the event numbers 1 to 5 listed below.]
1. The port is initialized or enabled.
2. The port is blocked or the link fails.
3. The port is selected as the root or designated port.
4. The port is no longer the root or designated port.
5. The forwarding delay timer expires.
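The five events listed above can be modeled as a small state machine. The arrows of the
original figure are not reproduced here, so the transition table below is an assumption
based on the conventional STP port state diagram: event 2 is treated as leading to the
Disabled state from any state, and event 4 as leading back to Blocking. All names are
hypothetical.

```python
from enum import Enum


class PortState(Enum):
    DISABLED = "Disabled"
    BLOCKING = "Blocking"
    LISTENING = "Listening"
    LEARNING = "Learning"
    FORWARDING = "Forwarding"


class Event(Enum):
    ENABLED = 1                # 1. The port is initialized or enabled.
    BLOCKED_OR_LINK_DOWN = 2   # 2. The port is blocked or the link fails.
    SELECTED = 3               # 3. The port is selected as the root or designated port.
    DESELECTED = 4             # 4. The port is no longer the root or designated port.
    FORWARD_DELAY_EXPIRED = 5  # 5. The forwarding delay timer expires.


# Assumed transitions; any pair that is not listed leaves the state unchanged.
TRANSITIONS = {
    (PortState.DISABLED, Event.ENABLED): PortState.BLOCKING,
    (PortState.BLOCKING, Event.SELECTED): PortState.LISTENING,
    (PortState.LISTENING, Event.FORWARD_DELAY_EXPIRED): PortState.LEARNING,
    (PortState.LEARNING, Event.FORWARD_DELAY_EXPIRED): PortState.FORWARDING,
    (PortState.LISTENING, Event.DESELECTED): PortState.BLOCKING,
    (PortState.LEARNING, Event.DESELECTED): PortState.BLOCKING,
    (PortState.FORWARDING, Event.DESELECTED): PortState.BLOCKING,
}


def next_state(state: PortState, event: Event) -> PortState:
    """Apply one event to a port state."""
    if event is Event.BLOCKED_OR_LINK_DOWN:
        return PortState.DISABLED  # assumption: reachable from every state
    return TRANSITIONS.get((state, event), state)


state = PortState.DISABLED
for event in (Event.ENABLED, Event.SELECTED,
              Event.FORWARD_DELAY_EXPIRED, Event.FORWARD_DELAY_EXPIRED):
    state = next_state(state, event)
    print(event.name, "->", state.value)
```

The walk-through shows why a newly selected port needs two forward delay expirations
(Listening to Learning, then Learning to Forwarding) before it forwards user traffic.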
NOTICE
A Huawei datacom device uses MSTP by default. After a device transitions from the MSTP
mode to the STP mode, its STP-capable port supports the same port states as those supported
by an MSTP-capable port, including the Forwarding, Learning, and Discarding states. For
details, see Table 5-18.
Table 5-18 Port status
Port Status  Description
Forwarding   A port in the Forwarding state can send and receive BPDUs as well as forward
             user traffic.
Learning     A port in the Learning state learns MAC addresses from user traffic to
             construct a MAC address table.
             In the Learning state, the port can send and receive BPDUs, but not forward
             user traffic.
Discarding   A port in the Discarding state can only receive BPDUs.
The following parameters affect the STP-capable port states and convergence.
l Hello time
The Hello timer specifies the interval at which an STP-capable device sends
configuration BPDUs to detect link faults.
When the network topology is stable, a change to the interval takes effect
only after a new root bridge takes over. The new root bridge adds certain fields to
BPDUs to inform non-root bridges of the change in the interval. After the topology
changes, TCN BPDUs are sent. The Hello time does not apply to the transmission of TCN
BPDUs.
l Forward Delay time
The Forward Delay timer specifies the delay for interface status transition. When a link
fault occurs, STP recalculation is performed, causing the structure of the spanning tree to
change. The configuration BPDUs generated during STP recalculation cannot be
immediately transmitted over the entire network. If the root port and designated port
forward data immediately after being selected, transient loops may occur. Therefore, an
interface status transition mechanism is introduced by STP. The newly selected root port
and designated port do not forward data until an amount of time equal to twice the
forward delay has passed. In this manner, the newly generated BPDUs can be transmitted
over the network before the newly selected root port and designated port forward data,
which prevents transient loops.
NOTE
The Forward Delay timer specifies how long a port stays in each of the Listening and Learning
states. A port in the Listening or Learning state does not forward user traffic, which is key
to preventing transient loops.
l Max Age time
The Max Age time specifies the aging time of BPDUs. The Max Age time can be
manually configured on the root bridge.
Configuration BPDUs are transmitted over the entire network, ensuring a unique Max
Age value. After a non-root bridge running STP receives a configuration BPDU, the
non-root bridge compares the Message Age value with the Max Age value in the
received configuration BPDU.
– If the Message Age value is smaller than or equal to the Max Age value, the non-
root bridge forwards the configuration BPDU.
– If the Message Age value is larger than the Max Age value, the configuration
BPDU ages and the non-root bridge discards it directly. In this case, the network size
is considered too large and the non-root bridge disconnects from the root bridge.
NOTE
If the configuration BPDU is sent from the root bridge, the value of Message Age is 0. Otherwise,
the value of Message Age indicates the total time during which a BPDU is sent from the root
bridge to the local bridge, including the delay in transmission. In real world situations, each time a
configuration BPDU passes through a bridge, the value of Message Age increases by 1.
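The timer rules above reduce to two small calculations, sketched below. The function names
are hypothetical. The 15-second default comes from the forward delay value stated earlier in
this section, and the Max Age value of 20 in the usage lines is only an example value.

```python
def time_to_forwarding(forward_delay: float = 15.0) -> float:
    """Seconds a newly selected port waits before forwarding: twice the forward delay."""
    return 2 * forward_delay


def message_age_after(hops_from_root: int) -> int:
    """Message Age seen after a BPDU has passed through the given number of bridges.

    The root bridge sends Message Age 0, and each bridge on the way adds 1.
    """
    if hops_from_root < 0:
        raise ValueError("hop count cannot be negative")
    return hops_from_root


def should_forward_bpdu(message_age: int, max_age: int) -> bool:
    """A non-root bridge forwards the BPDU only if Message Age <= Max Age."""
    return message_age <= max_age


print(time_to_forwarding())                            # 30.0 with the default forward delay
print(should_forward_bpdu(message_age_after(3), 20))   # within Max Age: forwarded
print(should_forward_bpdu(message_age_after(21), 20))  # beyond Max Age: discarded
```

The last call illustrates why an overly large network diameter causes distant bridges to
discard BPDUs and lose contact with the root bridge.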

5.4.2.3 BPDU Format

The BID, path cost, and PID that are described in the previous sections are all carried in
BPDUs.
l Configuration BPDUs are heartbeat packets. STP-enabled designated ports send BPDUs
at intervals specified by the Hello timer.
l Topology Change Notification (TCN) BPDUs are sent only after the device detects
network topology changes.
A BPDU is encapsulated into an Ethernet frame. In an Ethernet frame, the destination MAC
address is the multicast MAC address 01-80-C2-00-00-00; the value of the Length/Type field
is the length of MAC data; in the Logical Link Control (LLC) header, as defined in the IEEE
standard, the values of Destination Service Access Point (DSAP) and Source Service Access
Point (SSAP) are 0x42 and the value of Control is 0x03; the BPDU header follows the LLC
header. Figure 5-54 shows the format of an Ethernet frame.
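Figure 5-54 Format of an Ethernet frame
[Figure: frame fields DMAC, SMAC, Length, LLC, SNAP, Data, and CRC. The LLC header consists of
DSAP (1 byte), SSAP (1 byte), and Control (1 byte); the SNAP header consists of org code
(3 bytes) and Type (2 bytes).]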
Configuration BPDU
Configuration BPDUs are most commonly used.
During initialization, each bridge actively sends configuration BPDUs. After the network
topology becomes stable, only the root bridge actively sends configuration BPDUs. Other
bridges send configuration BPDUs only after receiving configuration BPDUs from upstream
devices. A configuration BPDU is at least 35 bytes long, including the parameters such as the
BID, path cost, and PID. A BPDU is discarded if both the sender BID and Port ID field values
are the same as those of the local port. Otherwise, the BPDU is processed. In this manner,
BPDUs containing the same information as that of the local port are not processed.
Table 5-19 shows the format of a BPDU.
Table 5-19 BPDU format
Field                        Bytes  Description
Protocol Identifier          2      Always 0
Protocol Version Identifier  1      Always 0
BPDU Type                    1      Indicates the type of a BPDU. The value is one of the
                                    following:
                                    l 0x00: Configuration BPDU
                                    l 0x80: TCN BPDU
Flags                        1      Indicates whether the network topology is changed.
                                    l The rightmost bit is the Topology Change (TC) flag.
                                    l The leftmost bit is the Topology Change
                                      Acknowledgement (TCA) flag.
Root Identifier              8      Indicates the Bridge ID (BID) of the current root bridge.
Root Path Cost               4      Indicates the cumulative cost of all links to the root bridge.
Bridge Identifier            8      Indicates the BID of the bridge sending a BPDU.
Port Identifier              2      Indicates the ID of the port sending a BPDU.
Message Age                  2      Records the time since the root bridge originally generated
                                    the information that a BPDU is derived from.
                                    If the configuration BPDU is sent from the root bridge, the
                                    value of Message Age is 0. Otherwise, the value of Message
                                    Age indicates the total time during which a BPDU is sent
                                    from the root bridge to the local bridge, including the delay
                                    in transmission. In real world situations, each time a
                                    configuration BPDU passes through a bridge, the value of
                                    Message Age increases by 1.
Max Age                      2      Indicates the maximum time that a BPDU is saved.
Hello Time                   2      Indicates the interval at which BPDUs are sent.
Forward Delay                2      Indicates the time spent in the Listening and Learning states.
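The field layout in Table 5-19 can be decoded as shown in the following Python sketch. It is illustrative only: the names ConfigBpdu and parse_config_bpdu are hypothetical, the sample values are invented, and the conversion of timer fields from units of 1/256 second is taken from IEEE 802.1D rather than from this document.

```python
import struct
from dataclasses import dataclass

# Field order and sizes follow Table 5-19 (35 bytes in total, big-endian).
CONFIG_BPDU = struct.Struct(">HBBB8sI8sHHHHH")

TC_FLAG = 0x01   # rightmost bit of the Flags field
TCA_FLAG = 0x80  # leftmost bit of the Flags field


@dataclass
class ConfigBpdu:
    root_id: bytes
    root_path_cost: int
    bridge_id: bytes
    port_id: int
    tc: bool
    tca: bool
    message_age: float
    max_age: float
    hello_time: float
    forward_delay: float


def parse_config_bpdu(data: bytes) -> ConfigBpdu:
    """Decode the BPDU payload that follows the LLC header."""
    if len(data) < CONFIG_BPDU.size:
        raise ValueError("a configuration BPDU is at least 35 bytes long")
    (proto, version, bpdu_type, flags, root_id, cost, bridge_id, port_id,
     msg_age, max_age, hello, fwd_delay) = CONFIG_BPDU.unpack_from(data)
    if (proto, version, bpdu_type) != (0, 0, 0x00):
        raise ValueError("not an STP configuration BPDU")
    # Assumption (IEEE 802.1D): timer fields are encoded in 1/256 second.
    return ConfigBpdu(root_id, cost, bridge_id, port_id,
                      bool(flags & TC_FLAG), bool(flags & TCA_FLAG),
                      msg_age / 256, max_age / 256, hello / 256, fwd_delay / 256)


# Hypothetical sample: an 8-byte BID is a 2-byte priority plus a 6-byte MAC.
root_id = struct.pack(">H", 32768) + bytes.fromhex("00000c123456")
bridge_id = struct.pack(">H", 32768) + bytes.fromhex("00000c123457")
sample = CONFIG_BPDU.pack(0, 0, 0x00, TC_FLAG, root_id, 19, bridge_id, 0x8003,
                          1 * 256, 20 * 256, 2 * 256, 15 * 256)
bpdu = parse_config_bpdu(sample)
print(bpdu.root_path_cost, bpdu.tc, bpdu.tca, bpdu.max_age)
```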
Figure 5-55 shows the Flags field. Only the leftmost and rightmost bits are used in STP.

Figure 5-55 Format of the Flags field
Bit 7: TCA (Topology Change Acknowledgment flag)
Bits 6 to 1: Reserved
Bit 0: TC (Topology Change flag)
A configuration BPDU is generated in one of the following scenarios:

l Once the ports are enabled with STP, the designated ports send configuration BPDUs at
intervals specified by the Hello timer.
l When a root port receives configuration BPDUs, the device where the root port resides
sends a copy of the configuration BPDUs through its own designated ports.
l When receiving a configuration BPDU with a lower priority, a designated port
immediately sends its own configuration BPDUs to the downstream device.
TCN BPDU
The contents of TCN BPDUs are quite simple, including only three fields: Protocol ID,
Version, and Type, as shown in Table 5-19. The value of the Type field is 0x80, and a TCN
BPDU is four bytes in length.
TCN BPDUs are transmitted by each device to its upstream device to notify the upstream
device of changes in the downstream topology, until they reach the root bridge. A TCN BPDU
is generated in one of the following scenarios:
l A port enters the Forwarding state.
l A designated port receives TCN BPDUs, and the device sends a copy toward the root bridge.
5.4.2.4 STP Topology Calculation
Initialization of the Spanning Tree

After all devices on the network are enabled with STP, each device considers itself the root
bridge. Each device only transmits and receives BPDUs but does not forward user traffic. All
ports are in the Listening state. After exchanging configuration BPDUs, all devices participate
in the selection of the root bridge, root port, and designated port.
1. Root bridge selection

As shown in Figure 5-56, each quadruple enclosed in {} is an ordered
vector: root BID (S1_MAC and S2_MAC indicate the Bridge IDs (BIDs) of the two
devices), root path cost, sender BID, and Port ID. Configuration BPDUs are sent at
intervals set by the Hello timer. By default, the interval is 2 seconds.

NOTE
As each bridge considers itself the root bridge, the value of the root BID field in the BPDU sent by
each port is recorded as its BID; the value of the Root Path Cost field is the cumulative cost of all
links to the root bridge; the sender BID is the ID of the local bridge; the Port ID is the PID of the
local bridge port that sends the BPDU.
Figure 5-56 Exchange of initialization messages

{S1_MAC,0,S1_MAC,A_PID}
A B
S1 {S2_MAC,0,S2_MAC,B_PID} S2
Once a port receives a BPDU with a higher priority than its own, the port extracts
certain information from the BPDU and updates its own information accordingly.
After saving the updated information, the port immediately stops sending its own
BPDUs.
When sending a BPDU, each device fills in the Sender BID field with its own BID.
When a device considers itself the root bridge, the device fills in the Root BID field with
its own BID. As shown in Figure 5-56, Port B on S2 receives a BPDU with a higher
priority from S1, and therefore considers S1 the root bridge. When another port on S2
sends a BPDU, the port fills in its Root BID field with S1_BID. The preceding
intercommunication is repeatedly performed between two devices until all devices
consider the same device as the root bridge. This indicates that the root bridge is
selected. Figure 5-57 shows the root bridge selection.
Figure 5-57 Diagram of root bridge selection
S1: Priority 32768, MAC 0000-0C12-3456 (root bridge)
S2: Priority 32768, MAC 0000-0C12-3458
S3: Priority 32768, MAC 0000-0C12-3457
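The outcome in Figure 5-57 can be reproduced with a short Python sketch. It assumes that a BID is compared as the pair (priority, MAC address) and that the numerically smallest BID wins; the helper name bid_key is hypothetical.

```python
def bid_key(priority: int, mac: str) -> tuple:
    """Turn a BID into a tuple that sorts by priority first, then by MAC address."""
    return (priority, int(mac.replace("-", ""), 16))


# Values taken from Figure 5-57.
bridges = {
    "S1": bid_key(32768, "0000-0C12-3456"),
    "S2": bid_key(32768, "0000-0C12-3458"),
    "S3": bid_key(32768, "0000-0C12-3457"),
}

# The bridge with the smallest BID becomes the root bridge.
root_bridge = min(bridges, key=bridges.get)
print(root_bridge)
```

Because all three priorities are equal, the MAC address decides the election in this example.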
2. Root port selection

Each non-root bridge selects exactly one root port.
After the root bridge has been selected, each bridge determines the cost of each possible
path from itself to the root bridge. From these paths, it picks the one with the smallest cost
(the least-cost path). The port connecting to that path becomes the root port of the bridge.
Figure 5-58 shows the root port selection.
NOTE
In the Root Path Cost algorithm, after a port receives a BPDU, the port extracts the value of the
Root Path Cost field and adds its own path cost to that value to obtain the root
path cost. The path cost of a port covers only the directly connected link. The cost can be
manually configured on a port. If the root path costs on two or more ports are the same, the port
that receives the BPDU with the smallest sender BID value is selected as the root port.
Figure 5-58 Diagram of root port selection

[Figure: S1 (root bridge, port1 and port2), S2 (port5 and port6), and S3 (port3 and port4)
are interconnected; each of the three links has a cost of 19. Legend: root port.]
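The root port calculation described in the preceding NOTE can be sketched as follows. The Candidate type and select_root_port function are hypothetical names. The example assumes that S3 receives BPDUs on port4 directly from the root bridge S1 and on port3 from S2, which is one reading of Figure 5-58.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    local_port: str              # port on which the BPDU was received
    port_cost: int               # path cost of the local port
    advertised_rpc: int          # Root Path Cost carried in the received BPDU
    sender_bid: Tuple[int, int]  # (priority, MAC address) of the sending bridge


def select_root_port(candidates: List[Candidate]) -> str:
    """Pick the port with the smallest root path cost; break ties on sender BID."""
    best = min(candidates,
               key=lambda c: (c.advertised_rpc + c.port_cost, c.sender_bid))
    return best.local_port


S1 = (32768, 0x00000C123456)
S2 = (32768, 0x00000C123458)

# Assumed view from S3: port4 faces S1 (the root bridge), port3 faces S2.
s3_candidates = [
    Candidate("port3", port_cost=19, advertised_rpc=19, sender_bid=S2),
    Candidate("port4", port_cost=19, advertised_rpc=0, sender_bid=S1),
]
print(select_root_port(s3_candidates))
```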
3. Selection of a designated port

A port that discards lower-priority BPDUs received from other ports, whether on the
local device or other devices on the network segment, is called a designated port on the
network segment. As shown in Figure 5-56, assume that the MAC address of S1 is
smaller than that of S2. Port A on S1 is selected as a designated port. The device where a
designated port resides is called a designated bridge on the network segment. In Figure
5-56, S1 is a designated bridge on the network segment.
After the network convergence is implemented, only the designated port and root port
are in the Forwarding state. The other ports are in the Blocking state. They do not
forward user traffic.
Ports on the root bridge are all designated ports unless loops occur on the root bridge.
Figure 5-59 shows the designated port selection.

Figure 5-59 Diagram of designated port selection

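[Figure: S1 (root bridge, port1 and port2), S2 (port5 and port6), and S3 (port3 and port4)
are interconnected; each link has a cost of 19.]
BPDU labeled at port6 on S2: BID 32768.0000-0C12-3456, RPC 38, SBID 32768.0000-0C12-3458, PID 0806
BPDU labeled at port3 on S3: BID 32768.0000-0C12-3456, RPC 38, SBID 32768.0000-0C12-3457, PID 0803
Legend: root port, designated port, blocked port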
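The designated port on a segment can be found by comparing the ordered vectors {root BID, root path cost, sender BID, Port ID} field by field, where a smaller value means a higher priority. The following Python sketch applies this to the two BPDUs labeled in Figure 5-59. The values are copied from the figure as printed; the dictionary layout and function name are assumptions for illustration.

```python
ROOT = (32768, 0x00000C123456)

# Ordered vectors as labeled in Figure 5-59: (root BID, RPC, sender BID, PID).
segment_bpdus = {
    "S2 port6": (ROOT, 38, (32768, 0x00000C123458), 0x0806),
    "S3 port3": (ROOT, 38, (32768, 0x00000C123457), 0x0803),
}


def designated_port(bpdus: dict) -> str:
    """Tuples compare element by element, so min() returns the superior BPDU."""
    return min(bpdus, key=bpdus.get)


winner = designated_port(segment_bpdus)
print(winner)
```

Root BID and RPC are equal here, so the smaller sender BID decides the result and the other port on the segment is blocked.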
After the Topology Becomes Stable

After the topology becomes stable, the root bridge still sends configuration BPDUs at
intervals set by the Hello timer. Each non-root bridge forwards the received configuration
BPDUs by using its designated port. If the priority of the received BPDU is higher than that
on the non-root bridge, the non-root bridge updates its own BPDU based on the information
carried in the received BPDU.
STP Topology Changes

Figure 5-60 shows the packet transmission process after the STP topology changes.

Figure 5-60 Diagram of packet transmission after the topology changes
[Figure: sequence of BPDUs exchanged among Non-Root Bridge A, Non-Root Bridge B, and the
Root Bridge]
Step   BPDU Type   TCA   TC
1, 2   TCN         N/A   N/A
3      Config      1     0
4, 5   TCN         N/A   N/A
6      Config      1     1
       Config      0     1
7      Config      0     1
       Config      0     1
The TC flag remains set for a total of Max Age + Forward Delay seconds (35 seconds by default).
1. After the network topology changes, a downstream device continuously sends Topology
Change Notification (TCN) BPDUs to an upstream device.
2. After the upstream device receives TCN BPDUs from the downstream device, only the
designated port processes them. The other ports may receive TCN BPDUs but do not
process them.
3. The upstream device sets the Topology Change Acknowledgment (TCA) bit of the Flags
field in the configuration BPDUs to 1 and returns the configuration BPDUs to instruct
the downstream device to stop sending TCN BPDUs.
4. The upstream device sends a copy of the TCN BPDUs to the root bridge.
5. Steps 1, 2, 3 and 4 are repeated until the root bridge receives the TCN BPDUs.
6. The root bridge sets the Topology Change (TC) bit of the Flags field in the configuration
BPDUs to 1 to instruct the downstream device to delete MAC address entries.
NOTE
l TCN BPDUs are used to inform the upstream device and root bridge of topology changes.
l Configuration BPDUs with the TCA bit being set to 1 are used by the upstream device to inform the
downstream device that the topology changes are known and instruct the downstream device to stop
sending TCN BPDUs.
l Configuration BPDUs with the TC bit being set to 1 are used by the upstream device to inform the
downstream device of topology changes and instruct the downstream device to delete MAC address
entries. In this manner, fast network convergence is achieved.
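The notification sequence above can be traced with a small Python simulation. This is a simplified sketch: the function name is hypothetical, each event is modeled as (sender, receiver, BPDU type, TCA, TC), and the defaults of 20 seconds for Max Age and 15 seconds for Forward Delay are the IEEE 802.1D defaults that add up to the 35 seconds shown in Figure 5-60.

```python
def propagate_tcn(path_to_root, max_age=20, forward_delay=15):
    """Trace TCN BPDUs upstream and the acknowledging configuration BPDUs.

    path_to_root lists the bridges from the one that detected the change up to
    the root bridge. Returns the event list and how long the TC flag stays set.
    """
    events = []
    root = path_to_root[-1]
    for downstream, upstream in zip(path_to_root, path_to_root[1:]):
        # Steps 1 and 4: the downstream device sends a TCN BPDU upstream.
        events.append((downstream, upstream, "TCN", None, None))
        # Step 3: the upstream device acknowledges with TCA = 1.
        # Step 6: the root bridge also sets TC = 1.
        tc = 1 if upstream == root else 0
        events.append((upstream, downstream, "Config", 1, tc))
    return events, max_age + forward_delay


events, tc_seconds = propagate_tcn(["Bridge A", "Bridge B", "Root"])
for event in events:
    print(event)
print(tc_seconds)
```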
Figure 5-59 is used as an example to show how the network topology converges when the
root bridge or designated port of the root bridge becomes faulty.
l The root bridge becomes faulty.

Figure 5-61 Diagram of topology changes in the case of a faulty root bridge
[Figure: S1 (root bridge, port1 and port2), S2 (ports A and B, port5 and port6), and S3
(port3 and port4); S3 is also labeled as a root bridge. Legend: root port, designated port.]
As shown in Figure 5-61, when the root bridge becomes faulty, S2 and S3 exchange
configuration BPDUs to reselect the root bridge.
l The designated port of the root bridge becomes faulty.

Figure 5-62 Diagram of topology changes in the case of a faulty designated port on the
root bridge
[Figure: S1 (root bridge, port1 and port2), S2 (ports A and B, port5 and port6), and S3
(port3 and port4). Legend: root port, designated port.]
As shown in Figure 5-62, the designated port of the root bridge, port1, becomes faulty.
S2 and S3 exchange configuration BPDUs, and port6 is selected as the root port.
In addition, port6 sends TCN BPDUs after entering the Forwarding state. Once the root
bridge receives the TCN BPDUs, it sends TC-BPDUs to instruct the downstream
devices to delete MAC address entries.
5.4.2.5 Evolution from STP to RSTP

In 2001, IEEE 802.1w was published to introduce an extension of the Spanning Tree Protocol
(STP), namely, Rapid Spanning Tree Protocol (RSTP). RSTP is developed based on STP but
outperforms STP.
Disadvantages of STP
STP ensures a loop-free network but has a slow network topology convergence speed, leading
to service deterioration. If the network topology changes frequently, the connections on the
STP-capable network are frequently torn down, causing frequent service interruption. Users
can hardly tolerate such a situation.
Disadvantages of STP are as follows:
l Port states and port roles are not clearly distinguished, which makes STP harder for
beginners to learn and deploy.
A network protocol that clearly defines and distinguishes different situations is likely to
outperform the others.
– Ports in the Listening, Learning, and Blocking states do not forward user traffic, so
the three states look identical to users.
– From the perspective of use and configuration, the essential difference between
ports lies in their roles rather than their states.
The root port and designated port can both be in the Listening state or both be in the
Forwarding state.
l The STP algorithm determines topology changes after the time set by the timer expires,
which slows down network convergence.
l In a stable network topology, the STP algorithm relies on the root bridge to drive BPDU
transmission. After the root bridge sends configuration BPDUs, other devices forward them
hop by hop until all bridges on the network receive the configuration BPDUs.
This also slows down topology convergence.
Advantages of RSTP over STP

To make up for STP disadvantages, RSTP merges three port states into one, introduces two
port roles, and distinguishes port attributes based on port states and roles to describe ports
more accurately. This makes the protocol easier for beginners to learn and speeds up topology
convergence.
l More port roles are defined to simplify the understanding and deployment of STP.

Figure 5-63 Diagram of port roles
[Figure: two example topologies, each with S1 as the root bridge and S2 and S3 as non-root
bridges. Ports are marked as root port, designated port, alternate port, or backup port.]
As shown in Figure 5-63, RSTP defines four port roles: root port, designated port,
alternate port, and backup port.
The functions of the root port and designated port are the same as those defined in STP.
The alternate port and backup port are described as follows:

– From the perspective of configuration BPDU transmission:
n An alternate port is blocked after learning the configuration BPDUs sent by
other bridges.
n A backup port is blocked after learning the configuration BPDUs sent by its own
device.
– From the perspective of user traffic:
n An alternate port backs up the root port and provides an alternate path from the
designated bridge to the root bridge.
n A backup port backs up the designated port and provides an alternate path
from the root bridge to the related network segment.
After all RSTP-capable ports are assigned roles, topology convergence is
completed.
l Port states are redefined in RSTP.
Port states are simplified from five types to three types. Based on whether a port
forwards user traffic and learns MAC addresses, the port is in one of the following states:
– If a port neither forwards user traffic nor learns MAC addresses, the port is in the
Discarding state.
– If a port does not forward user traffic but learns MAC addresses, the port is in the
Learning state.
– If a port forwards user traffic and learns MAC addresses, the port is in the
Forwarding state.
Table 5-20 shows the comparison between port states in STP and RSTP.
NOTE
Port status and port roles are not necessarily related. Table 5-20 lists states of ports with different
roles.
Table 5-20 Comparison between states of STP ports and RSTP ports with different roles
STP Port State   RSTP Port State   Port Role
Forwarding       Forwarding        Root port or designated port
Learning         Learning          Root port or designated port
Listening        Discarding        Root port or designated port
Blocking         Discarding        Alternate port or backup port
Disabled         Discarding        Disabled port
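The state mapping in Table 5-20 and the definitions of the three RSTP states can be captured in two lookup tables. The following Python sketch is only an illustration; the names are hypothetical.

```python
# STP port state -> RSTP port state, as listed in Table 5-20.
STP_TO_RSTP = {
    "Forwarding": "Forwarding",
    "Learning": "Learning",
    "Listening": "Discarding",
    "Blocking": "Discarding",
    "Disabled": "Discarding",
}

# RSTP port state -> (forwards user traffic, learns MAC addresses).
RSTP_BEHAVIOR = {
    "Discarding": (False, False),
    "Learning": (False, True),
    "Forwarding": (True, True),
}


def rstp_behavior(stp_state: str) -> tuple:
    """Return (forwards, learns) for the RSTP state matching an STP state."""
    return RSTP_BEHAVIOR[STP_TO_RSTP[stp_state]]


print(rstp_behavior("Listening"))
print(rstp_behavior("Learning"))
```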
l Configuration BPDUs are defined differently in RSTP. Port roles are described by using
the Flags field defined in STP.
Compared with STP, RSTP slightly redefines the format of configuration BPDUs.
– The value of the Type field is no longer 0 but 2. Therefore, an STP-capable
device always discards the configuration BPDUs sent by an RSTP-capable device.
– The 6 bits in the middle of the Flags field, which are reserved in STP, are used in
RSTP. Such a configuration BPDU is called a Rapid Spanning Tree (RST) BPDU, as
shown in Figure 5-64.

Figure 5-64 Format of the Flags field in an RST BPDU
Bit 7: TCA (Topology Change Acknowledgment flag)
Bit 6: Agreement
Bit 5: Forwarding
Bit 4: Learning
Bits 3 and 2: Port role
Bit 1: Proposal
Bit 0: TC (Topology Change flag)
Port role values:
00 Unknown
01 Alternate/Backup port
10 Root port
11 Designated port
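The bit layout in Figure 5-64 can be decoded with simple masks. The following Python sketch is illustrative; the function name and the two sample flag values are assumptions chosen for the example.

```python
PORT_ROLES = {
    0b00: "Unknown",
    0b01: "Alternate/Backup port",
    0b10: "Root port",
    0b11: "Designated port",
}


def decode_rst_flags(flags: int) -> dict:
    """Split the Flags byte of an RST BPDU into its fields (Bit0 is rightmost)."""
    return {
        "tc": bool(flags & 0x01),
        "proposal": bool(flags & 0x02),
        "port_role": PORT_ROLES[(flags >> 2) & 0x03],
        "learning": bool(flags & 0x10),
        "forwarding": bool(flags & 0x20),
        "agreement": bool(flags & 0x40),
        "tca": bool(flags & 0x80),
    }


# 0x0E: Proposal set, port role 11 (designated port).
print(decode_rst_flags(0x0E))
# 0x78: Agreement, Forwarding, and Learning set, port role 10 (root port).
print(decode_rst_flags(0x78))
```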
l Configuration BPDUs are processed in a different manner.

– Transmission of configuration BPDUs
In STP, after the topology becomes stable, the root bridge sends configuration
BPDUs at an interval set by the Hello timer. A non-root bridge does not send
configuration BPDUs until it receives configuration BPDUs sent from the upstream
device. This renders the STP calculation complicated and time-consuming. In
RSTP, after the topology becomes stable, a non-root bridge sends configuration
BPDUs at Hello intervals, regardless of whether it has received the configuration
BPDUs sent from the root bridge. Such operations are implemented on each device
independently.
– BPDU timeout period
In STP, a device has to wait a Max Age period before determining a negotiation
failure. In RSTP, if a port does not receive configuration BPDUs sent from the
upstream device for three consecutive Hello intervals, the negotiation between the
local device and its peer fails.
– Processing of inferior BPDUs
In RSTP, when a port receives an RST BPDU from the upstream designated bridge,
the port compares the received RST BPDU with its own RST BPDU.
If its own RST BPDU is superior to the received one, the port discards the received
RST BPDU and immediately responds to the upstream device with its own RST
BPDU. After receiving the RST BPDU, the upstream device updates its own RST
BPDU based on the corresponding fields in the received RST BPDU.
In this manner, RSTP processes inferior BPDUs more rapidly, independent of any
timer that is used in STP.
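The difference in BPDU timeout periods can be made concrete with a short calculation. The following Python sketch uses the default Hello interval of 2 seconds mentioned earlier and assumes a Max Age of 20 seconds (the IEEE 802.1D default); the function name is hypothetical.

```python
def bpdu_timeout(mode: str, hello_time: float = 2.0, max_age: float = 20.0) -> float:
    """Seconds without BPDUs from upstream before the neighbor is considered lost."""
    if mode == "stp":
        return max_age          # STP waits a full Max Age period
    if mode == "rstp":
        return 3 * hello_time   # RSTP waits three consecutive Hello intervals
    raise ValueError(f"unknown mode: {mode}")


print(bpdu_timeout("stp"), bpdu_timeout("rstp"))
```

With these values, RSTP detects the failure after 6 seconds instead of 20.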
l Rapid convergence
– Proposal/agreement mechanism
When a port is selected as a designated port, in STP, the port does not enter the
Forwarding state until a Forward Delay period expires; in RSTP, the port enters the
Discarding state, and then the proposal/agreement mechanism allows the port to
immediately enter the Forwarding state. The proposal/agreement mechanism must
be applied on the P2P links in full-duplex mode.
For details, see 5.4.2.6 Details About RSTP.
– Fast switchover of the root port
If the root port fails, the most superior alternate port on the device becomes the
root port and enters the Forwarding state. This is because there must be a path from
the root bridge to a designated port on the network segment connecting to the
alternate port.
When the port role changes, the network topology will change accordingly. For
details, see 5.4.2.6 Details About RSTP.
– Edge ports
In RSTP, a designated port on the network edge is called an edge port. An edge port
directly connects to a terminal and does not connect to any other switching devices.
An edge port does not receive configuration BPDUs, and therefore does not
participate in the RSTP calculation. It can directly change from the Disabled state to
the Forwarding state without any delay, just like an STP-incapable port. If an edge
port receives bogus configuration BPDUs from attackers, it is deprived of the edge
port attributes and becomes a common STP port. The STP calculation is
implemented again, causing network flapping.
l Protection functions
Table 5-21 shows protection functions provided by RSTP.
Table 5-21 Protection functions

Protection Function: BPDU protection
Scenario: On a switching device, ports that are directly connected to a user terminal such as
a PC or file server are configured as edge ports.
Usually, no RST BPDU will be sent to edge ports. If a switching device receives bogus RST
BPDUs on an edge port, the switching device automatically sets the edge port to a non-edge
port, and performs STP calculation again. This causes network flapping.
Principle: After BPDU protection is enabled on a switching device, if an edge port receives an
RST BPDU, the switching device shuts down the edge port without removing its edge port
attributes, and notifies the NMS of the shutdown event. The edge port can be started only by
the network administrator.
To allow an edge port to automatically start after being shut down, you can configure the auto
recovery function and set the delay on the port. In this manner, an edge port starts
automatically after the set delay. If the edge port receives RST BPDUs again, the edge port
will again be shut down.
NOTE
The smaller the delay is set, the sooner the edge port becomes Up, and the more frequently the
edge port alternates between Up and Down. The larger the delay is set, the later the edge port
becomes Up, and the longer the service interruption lasts.

Protection Function: Root protection
Scenario: Due to incorrect configurations or malicious attacks on the network, the root bridge
may receive RST BPDUs with a higher priority. Consequently, the valid root bridge is no
longer able to serve as the root bridge, and the network topology incorrectly changes. This
also causes the traffic that should be transmitted over high-speed links to be transmitted over
low-speed links, leading to network congestion.
Principle: If a designated port is enabled with the root protection function, the port role
cannot be changed. Once a designated port that is enabled with root protection receives RST
BPDUs with a higher priority, the port enters the Discarding state and does not forward
packets. If the port does not receive any RST BPDUs with a higher priority before a period
(generally two Forward Delay periods) expires, the port automatically enters the Forwarding
state.
NOTE
Root protection can take effect on only designated ports.

Protection Function: Loop protection
Scenario: On an RSTP-capable network, the switching device maintains the status of the root
port and blocked ports by continually receiving BPDUs from the upstream switching device.
If ports cannot receive BPDUs from the upstream switching device due to link congestion or
unidirectional link failures, the switching device re-selects a root port. Then, the previous
root port becomes a designated port and the blocked ports change to the Forwarding state. As
a result, loops may occur on the network.
Principle: After loop protection is configured, if the root port or alternate port does not
receive RST BPDUs from the upstream switching device for a long time, the switching device
notifies the NMS that the port enters the Discarding state. The blocked port remains in the
Blocked state and does not forward packets. This prevents loops on the network. The root
port or alternate port restores the Forwarding state after receiving new RST BPDUs.
NOTE
Loop protection can take effect on only the root port and alternate ports.

Protection Function: TC-BPDU attack defense
Scenario: After receiving TC-BPDUs, a switching device will delete its MAC entries and ARP
entries. If an attacker sends bogus TC-BPDUs, a switching device receives a large number of
TC-BPDUs within a short period, and busies itself deleting its MAC entries and ARP entries.
As a result, the switching device is heavily burdened, rendering the network rather unstable.
Principle: After the TC-BPDU attack defense is enabled, the number of times that TC-BPDUs
are processed by the switching device within a given time period is configurable. If the
number of TC-BPDUs that the switching device receives within the given time exceeds the
specified threshold, the switching device processes TC-BPDUs only for the specified number
of times. Excess TC-BPDUs are processed together only once after the specified period
expires. In this manner, the switching device is prevented from frequently deleting its MAC
entries and ARP entries, and therefore is protected against overload.
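The TC-BPDU attack defense principle can be sketched as a simple rate limiter. The following Python function is a model under stated assumptions, not the device implementation: the function name, the threshold of 2, and the 2-second period are hypothetical, and a period is assumed to start when the first TC-BPDU arrives.

```python
def flush_times(arrivals, threshold, period):
    """Return the times at which MAC and ARP entries are deleted.

    Up to `threshold` TC-BPDUs per period trigger an immediate deletion.
    Excess TC-BPDUs are handled together once, when the period expires.
    """
    flushes = []
    window_start = None
    count = 0
    pending = False
    for t in sorted(arrivals):
        if window_start is not None and t - window_start >= period:
            if pending:
                flushes.append(window_start + period)  # deferred deletion
            window_start = None
        if window_start is None:
            window_start, count, pending = t, 0, False
        count += 1
        if count <= threshold:
            flushes.append(t)
        else:
            pending = True
    if pending:
        flushes.append(window_start + period)
    return flushes


# Five TC-BPDUs in quick succession cause only three deletions.
print(flush_times([0, 0.1, 0.2, 0.3, 0.4], threshold=2, period=2))
```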
5.4.2.6 Details About RSTP
P/A Mechanism
The Proposal/Agreement (P/A) mechanism helps a designated port to enter the Forwarding
state as soon as possible. As shown in Figure 5-65, the P/A negotiation is performed based on
the following port variables:
Figure 5-65 BPDU exchange during the P/A negotiation
[Figure: an upstream device (designated port) and a downstream device (root port)]
1. The upstream device sends a proposal so that the port can rapidly enter the Forwarding
state.
2. On the downstream device, the root port blocks the other non-edge ports, enters the
Forwarding state, and then sends an agreement back to the upstream device.
3. The designated port on the upstream device enters the Forwarding state.

1. proposing: When a port is in the Discarding or Learning state, this variable is set to 1.
Additionally, a Rapid Spanning Tree (RST) BPDU with the Proposal field being 1 is sent
to the downstream switching device.
2. proposed: After a port receives an RST BPDU with the Proposal field being 1 from the
designated port on the peer device, this variable is set to 1, urging the designated port on
this network segment to enter the Forwarding state.
3. sync: After the proposed variable is set to 1, the root port receiving the proposal sets the
sync variable to 1 for the other ports on the same device; a non-edge port receiving the
proposal enters the Discarding state.
4. synced: After a port enters the Discarding state, it sets its synced variable to 1 in the
following manner: If this port is the alternate, backup, or edge port, it will immediately
set its synced variable to 1. If this port is the root port, it will monitor the synced
variables of the other ports. After the synced variables of all the other ports are set to 1,
the root port sets its synced variable to 1, and sends an RST BPDU with the Agreement
field being 1.
5. agreed: After the designated port receives an RST BPDU with the Agreement field being
1 and the port role field indicating the root port, this variable is set to 1. Once the agreed
variable is set to 1, this designated port immediately enters the Forwarding state.
Figure 5-66 Schematic diagram for the P/A negotiation
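[Figure: p0 on S1 connects to p1 on S2. S1 sends a proposal (1), S2 synchronizes its ports
(2), and S2 returns an agreement (3).]
Effect of sync on the ports of S2:
p2 (alternate port): the port status remains unchanged.
p3 (designated port): the port is blocked.
p4 (edge port): the port status remains unchanged.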
As shown in Figure 5-66, a new link is established between the root bridge S1 and S2. On
S2, p2 is an alternate port; p3 is a designated port in the Forwarding state; p4 is an edge port.
The P/A mechanism works in the following process:

1. p0 and p1 become designated ports and send RST BPDUs.

2. After receiving an RST BPDU with a higher priority, p1 realizes that it will become a
root port but not a designated port, and therefore it stops sending RST BPDUs.
3. p0 enters the Discarding state, and sends RST BPDUs with the Proposal field being 1.
4. After receiving an RST BPDU with the Proposal field being 1, S2 sets the sync variable
to 1 for all its ports.
5. As p2 has been blocked, its status remains unchanged; p4 is an edge port and therefore
does not participate in calculation. Therefore, only the non-edge designated port p3
needs to be blocked.
6. After the synced variables of p2, p3, and p4 are set to 1, the synced variable of the
root port p1 is set to 1, and p1 sends an RST BPDU with the Agreement field being 1
to S1. Except for the Agreement field, which is set to 1, and the Proposal field, which
is set to 0, the RST BPDU is the same as the one that was received.
7. After receiving this RST BPDU, S1 identifies it as a reply to the proposal that it just sent,
and therefore p0 immediately enters the Forwarding state.
This P/A negotiation process finishes, and S2 continues to perform the P/A negotiation with
its downstream device.
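The synchronization on S2 in the preceding steps can be modeled with a few lines of Python. This is a simplified sketch under stated assumptions: the port table layout and function names are hypothetical, and the BPDU exchange itself is reduced to a return value indicating whether the agreement is sent.

```python
def apply_sync(ports: dict) -> list:
    """Set synced on every non-root port; block non-edge ports that still forward."""
    blocked = []
    for name, port in ports.items():
        if port["role"] == "root":
            continue
        if not port["edge"] and port["state"] != "Discarding":
            port["state"] = "Discarding"
            blocked.append(name)
        port["synced"] = True
    return blocked


def handle_proposal(ports: dict, root_port: str):
    """Process a received proposal; return (blocked ports, agreement sent)."""
    blocked = apply_sync(ports)
    others_synced = all(p["synced"] for n, p in ports.items() if n != root_port)
    if others_synced:
        ports[root_port]["synced"] = True
        ports[root_port]["state"] = "Forwarding"
    return blocked, others_synced


# Ports of S2 as described for Figure 5-66.
s2_ports = {
    "p1": {"role": "root", "edge": False, "state": "Discarding", "synced": False},
    "p2": {"role": "alternate", "edge": False, "state": "Discarding", "synced": False},
    "p3": {"role": "designated", "edge": False, "state": "Forwarding", "synced": False},
    "p4": {"role": "designated", "edge": True, "state": "Forwarding", "synced": False},
}
blocked, agreement_sent = handle_proposal(s2_ports, "p1")
print(blocked, agreement_sent)
```

Only p3 is blocked. Once S1 receives the agreement, p0 can enter the Forwarding state without waiting for a timer.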
Theoretically, STP can quickly select a designated port. To prevent loops, however, STP has
to wait long enough to determine the status of all ports on the network, so a port can enter
the Forwarding state only after at least one Forward Delay period. RSTP eliminates this
bottleneck by blocking non-root ports to prevent loops. By using the P/A mechanism, the
upstream port can rapidly enter the Forwarding state.
NOTE
To use the P/A mechanism, ensure that the link between the two devices is a P2P link in full-duplex
mode. If the P/A negotiation fails, the designated port is selected through STP negotiation,
which completes after the Forward Delay timer expires twice.
RSTP Topology Change

In RSTP, if a non-edge port changes to the Forwarding state, the topology changes.
After a switching device detects the topology change (TC), it performs the following
procedures:
l Start a TC While Timer for every non-edge port. The TC While Timer value is twice the
Hello Timer value.
All MAC addresses learned by the ports whose status changes are cleared before the
timer expires.
These ports send RST BPDUs with the TC field being 1. Once the TC While Timer
expires, they stop sending the RST BPDUs.
l After another switching device receives the RST BPDU, it clears the MAC addresses
learned by all ports excluding the one that receives the RST BPDU. The device then
starts a TC While Timer for all non-edge ports and the root port, the same as the
preceding process.
In this manner, RST BPDUs flood the network.
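The handling of a received RST BPDU with the TC field being 1 can be sketched as follows. The Python function and its data layout are hypothetical; the default Hello Timer value of 2 seconds is the one given earlier in this section.

```python
def handle_rst_tc(mac_table: dict, rx_port: str, non_edge_ports: list,
                  hello_time: int = 2) -> dict:
    """Clear learned MAC addresses and start TC While Timers.

    MAC addresses are cleared on all ports except the receiving port.
    Returns the TC While Timer value started for each non-edge port.
    """
    for port, macs in mac_table.items():
        if port != rx_port:
            macs.clear()
    tc_while = 2 * hello_time  # the TC While Timer is twice the Hello Timer
    return {port: tc_while for port in non_edge_ports}


mac_table = {"p1": {"00e0-fc00-0001"}, "p2": {"00e0-fc00-0002"}, "p3": {"00e0-fc00-0003"}}
timers = handle_rst_tc(mac_table, rx_port="p1", non_edge_ports=["p1", "p2"])
print(mac_table, timers)
```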
Interoperability Between RSTP and STP

When RSTP switches to STP, RSTP loses its advantages such as fast convergence.

On a network where both STP-capable and RSTP-capable devices are deployed, STP-capable
devices ignore RST BPDUs; if a port on an RSTP-capable device receives a configuration
BPDU from an STP-capable device, the port switches to the STP mode after two Hello
intervals and starts to send configuration BPDUs. In this manner, RSTP and STP are
interoperable.
After STP-capable devices are removed, Huawei RSTP-capable datacom devices can switch
back to the RSTP mode.
5.4.3 MSTP Principles
5.4.3.1 MSTP Background

RSTP, an enhancement to STP, implements fast convergence of the network topology. RSTP
and STP share a defect: all VLANs on a LAN use one spanning tree, and therefore
VLAN-based load balancing cannot be performed. Once a link is blocked, it no longer
transmits traffic, which wastes bandwidth and may prevent packets of certain VLANs from
being forwarded.
Figure 5-67 STP/RSTP defect

[Figure: switching devices S1 to S6 with HostA (VLAN2), HostB, HostC (VLAN3), and HostD
attached. Each link is labeled with the VLANs it permits (VLAN2, VLAN3). The spanning tree
is rooted at S6.]
On the network shown in Figure 5-67, STP or RSTP is enabled. The broken line shows the
spanning tree. S6 is the root switching device. The links between S1 and S4 and between S2
and S5 are blocked. VLAN packets are transmitted by using the corresponding links marked
with "VLAN2" or "VLAN3."
Host A and Host B belong to VLAN 2 but they cannot communicate with each other because
the link between S2 and S5 is blocked and the link between S3 and S6 denies packets from
VLAN 2.
To fix the defect of STP and RSTP, the IEEE released 802.1s in 2002, defining the Multiple
Spanning Tree Protocol (MSTP). MSTP implements fast convergence and provides multiple
paths to load balance VLAN traffic.

MSTP divides a switching network into multiple regions, each of which has multiple
spanning trees that are independent of each other. Each spanning tree is called a multiple
spanning tree instance (MSTI) and each region is called a multiple spanning tree (MST)
region.
NOTE
An instance is a collection of VLANs. Binding multiple VLANs to an instance saves communication
costs and reduces resource usage. The topology of each MSTI is calculated independent of one another,
and traffic can be balanced among MSTIs. Multiple VLANs that have the same topology can be mapped
to one instance. The forwarding status of the VLANs for a port is determined by the port status in the
MSTI.
Figure 5-68 Multiple spanning trees in an MST region

[Figure: switching devices S1 to S6 with HostA, HostB, HostC, and HostD attached. Links are
labeled with the VLAN (VLAN2 or VLAN3) whose spanning tree uses them.]
As shown in Figure 5-68, MSTP maps VLANs to MSTIs in the VLAN mapping table. Each
VLAN can be mapped to only one MSTI. This means that traffic of a VLAN can be
transmitted in only one MSTI. An MSTI, however, can correspond to multiple VLANs.
Two spanning trees are calculated:
l MSTI 1 uses S4 as the root switching device to forward packets of VLAN 2.
l MSTI 2 uses S6 as the root switching device to forward packets of VLAN 3.
In this manner, devices within the same VLAN can communicate with each other; packets of
different VLANs are load balanced along different paths.
5.4.3.2 Basic MSTP Concepts
MSTP Network Hierarchy

As shown in Figure 5-69, an MSTP network consists of one or more MST regions. Each
MST region contains one or more Multiple Spanning Tree Instances (MSTIs). An MSTI is a
tree network consisting of switching devices running STP, RSTP, or MSTP.

Figure 5-69 MSTP network hierarchy
[Figure: an MSTP network made up of several MST regions; each MST region contains MSTIs such as MSTI 0, MSTI 1, and MSTI 2.]
An MST region contains multiple switching devices and network segments between them.
The switching devices of one MST region have the following characteristics:
l MSTP-enabled
l Same region name
l Same VLAN-MSTI mappings
l Same MSTP revision level
A LAN can comprise several MST regions that are directly or indirectly connected. Multiple
switching devices can be grouped into an MST region by using MSTP configuration
commands.
As shown in Figure 5-70, the MST region D0 contains the switching devices S1, S2, S3, and
S4, and has three MSTIs.

Figure 5-70 MST region
[Figure: MST region D0 contains S1, S2, S3, and S4; port AP1 is on S1, the master bridge. Root switches: S3 for MSTI 1, S2 for MSTI 2, and S1 for MSTI 0 (IST). VLAN mappings: VLAN 1 to MSTI 1; VLAN 2 and VLAN 3 to MSTI 2; other VLANs to MSTI 0.]
VLAN Mapping Table

The VLAN mapping table is an attribute of the MST region. It describes mappings between
VLANs and MSTIs.
As shown in Figure 5-70, the mappings in the VLAN mapping table of the MST region D0
are as follows:
l VLAN 1 is mapped to MSTI 1.
l VLAN 2 and VLAN 3 are mapped to MSTI 2.
l Other VLANs are mapped to MSTI 0.
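The mapping table of region D0 can be pictured as a simple lookup with MSTI 0 as the fallback. The following is a minimal sketch, not device code; the names VLAN_TO_MSTI, msti_for_vlan, and vlans_in_msti are hypothetical and chosen only for illustration.

```python
# VLAN-to-MSTI mapping of MST region D0 as listed above.
# Any VLAN that is not listed falls back to MSTI 0.
VLAN_TO_MSTI = {1: 1, 2: 2, 3: 2}


def msti_for_vlan(vlan_id: int) -> int:
    """Return the single MSTI that carries the given VLAN."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"invalid VLAN ID: {vlan_id}")
    return VLAN_TO_MSTI.get(vlan_id, 0)


def vlans_in_msti(msti_id: int, vlan_ids) -> list:
    """Return the VLANs from vlan_ids that are mapped to msti_id."""
    return [v for v in vlan_ids if msti_for_vlan(v) == msti_id]
```

Because the lookup returns exactly one value per VLAN, a VLAN can never belong to two MSTIs, while one MSTI (here MSTI 2) can carry several VLANs.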
Regional Root
Regional roots are classified into Internal Spanning Tree (IST) regional roots and MSTI
regional roots. In regions B0, C0, and D0 on the network shown in Figure 5-72, the switching
devices closest to the Common and Internal Spanning Tree (CIST) root are IST regional roots.
An MST region can contain multiple spanning trees, each called an MSTI. An MSTI regional
root is the root of the MSTI. On the network shown in Figure 5-71, each MSTI has its own
regional root.

Figure 5-71 MSTI
[Figure: an MST region whose links carry combinations of VLAN 10, VLAN 20, and VLAN 30, shown next to three MSTIs, each with its own root; legend: links of each MSTI and links blocked by the protocol.]
MSTIs are independent of each other. An MSTI can correspond to one or more VLANs, but a
VLAN can be mapped to only one MSTI.
Master Bridge
The master bridge is the IST master, which is the switching device closest to the CIST root in
a region, for example, S1 shown in Figure 5-70.
If the CIST root is in an MST region, the CIST root is the master bridge of the region.

CIST Root
Figure 5-72 MSTP network
[Figure: MST regions A0, B0, C0, and D0; the CIST root is in A0, and B0, C0, and D0 each have a region root; legend: IST and CST.]
On the network shown in Figure 5-72, the CIST root is the root bridge of the CIST. The CIST
root is a device in A0.
CST
A Common Spanning Tree (CST) connects all the MST regions on a switching network.
If each MST region is considered a node, the CST is calculated by using STP or RSTP based
on all the nodes.
As shown in Figure 5-72, the MST regions are connected to form a CST.
IST
An IST resides within an MST region.
An IST is a special MSTI with the MSTI ID being 0, called MSTI 0.
An IST is a segment of the CIST in an MST region.
As shown in Figure 5-72, the switching devices in an MST region are connected to form an
IST.

CIST
A CIST, calculated by using STP or RSTP, connects all the switching devices on a switching
network.
As shown in Figure 5-72, the ISTs and the CST form a complete spanning tree, the CIST.
SST
A Single Spanning Tree (SST) is formed in either of the following situations:
l A switching device running STP or RSTP belongs to only one spanning tree.
l An MST region has only one switching device.
As shown in Figure 5-72, the switching device in B0 forms an SST.
Port Role
Based on RSTP, MSTP has two additional port types. MSTP ports can be root ports,
designated ports, alternate ports, backup ports, edge ports, master ports, and regional edge
ports.
The functions of root ports, designated ports, alternate ports, and backup ports have been
defined in RSTP. Table 5-22 lists all port roles in MSTP.
NOTE
Except edge ports, all ports participate in MSTP calculation.

A port can play different roles in different spanning tree instances.
Table 5-22 Port roles
Port Role           Description
Root port           A root port is the non-root bridge port closest to the root bridge. Root bridges
                    do not have root ports.
                    Root ports are responsible for sending data to root bridges.
                    As shown in Figure 5-73, S1 is the root; CP1 is the root port on S3; BP1 is the
                    root port on S2; DP1 is the root port on S4.
Designated port     The designated port on a switching device forwards bridge protocol data units
                    (BPDUs) to the downstream switching device.
                    As shown in Figure 5-73, AP2 and AP3 are designated ports on S1; BP2 is a
                    designated port on S2; CP2 is a designated port on S3.
Alternate port      l An alternate port is blocked after it receives a BPDU sent by another
                      switching device.
                    l An alternate port provides an alternate path to the root bridge. This path is
                      different from the path through the root port.
                    As shown in Figure 5-73, DP4 and AP4 are alternate ports.
Backup port         l A backup port is blocked after it receives a BPDU sent by its own
                      switching device.
                    l A backup port provides a redundant path to a segment and is the backup for
                      the designated port.
                    As shown in Figure 5-73, CP3 is the backup port.
Master port         A master port is on the shortest path connecting MST regions to the CIST root.
                    BPDUs of an MST region are sent to the CIST root through the master port.
                    Master ports are special regional edge ports, functioning as root ports on ISTs
                    or CISTs and master ports in instances.
                    As shown in Figure 5-73, S1, S2, S3, and S4 form an MST region. AP1 on S1,
                    being the nearest port in the region to the CIST root, is the master port.
Regional edge port  A regional edge port is located at the edge of an MST region and connects to
                    another MST region or an SST.
                    During MSTP calculation, the roles of a regional edge port in the MSTI and
                    the CIST instance are the same. If the regional edge port is the master port in
                    the CIST instance, it is the master port in all the MSTIs in the region.
                    As shown in Figure 5-73, ports AP1, DP2, and DP3 in the MST region are
                    directly connected to other regions, and therefore they are all regional edge
                    ports of the MST region.
                    As shown in Figure 5-73, AP1 is a regional edge port and a master port in the
                    CIST. Therefore, AP1 is the master port in every MSTI in the MST region.
Edge port           An edge port is located at the edge of an MST region and does not connect to
                    any switching device.
                    Generally, edge ports are directly connected to terminals.
                    After MSTP is enabled on a port, edge-port detection starts automatically.
                    If the port receives no BPDUs within (2 x Hello Timer + 1) seconds,
                    the port is set to an edge port. Otherwise, the port is set to a non-edge port.
                    As shown in Figure 5-73, BP3 is an edge port.
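The edge-port detection rule in the last row of Table 5-22 is a simple timer check. The sketch below works through the arithmetic under stated assumptions: the function names are hypothetical, the Hello Timer default of 2 seconds is the one given for the Hello Time field in this section, and a BPDU that arrives exactly at the end of the window is assumed to count as received.

```python
def edge_detection_window(hello_time: int) -> int:
    """Seconds a port waits for BPDUs before it is set to an edge port."""
    return 2 * hello_time + 1


def classify_port(bpdu_arrival_times, hello_time: int = 2) -> str:
    """Classify a port from BPDU arrival times, in seconds after MSTP was enabled."""
    window = edge_detection_window(hello_time)
    received_in_window = any(t <= window for t in bpdu_arrival_times)
    return "non-edge" if received_in_window else "edge"
```

With the default Hello Timer of 2 seconds, the detection window is 2 x 2 + 1 = 5 seconds.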

Figure 5-73 Port roles
[Figure: an MST region containing S1 (root bridge), S2, S3, and S4, and a PC; ports are named AP1 to AP4, BP1 to BP3, CP1 to CP3, and DP1 to DP4. Legend: root port, designated port, alternate port, backup port, regional edge port, master port, edge port.]
MSTP Port Status

Table 5-23 lists the MSTP port status, which is the same as the RSTP port status.
Table 5-23 Port status
Port Status   Description
Forwarding    A port in the Forwarding state can send and receive BPDUs as well as forward
              user traffic.
Learning      A port in the Learning state learns MAC addresses from user traffic to
              construct a MAC address table.
              In the Learning state, the port can send and receive BPDUs, but not forward
              user traffic.
Discarding    A port in the Discarding state can only receive BPDUs.
There is no necessary link between the port status and the port role. Table 5-24 lists the
relationships between port roles and port status.

Table 5-24 Relationships between port roles and port status
Port Status   Root Port/Master Port   Designated Port   Regional Edge Port   Alternate Port   Backup Port
Forwarding    Yes                     Yes               Yes                  No               No
Learning      Yes                     Yes               Yes                  No               No
Discarding    Yes                     Yes               Yes                  Yes              Yes
Yes: The port supports this status.
No: The port does not support this status.
5.4.3.3 MST BPDUs

MSTP calculates spanning trees on the basis of Multiple Spanning Tree Bridge Protocol Data
Units (MST BPDUs). By exchanging MST BPDUs, switching devices compute spanning tree
topologies, maintain network topologies, and convey topology changes.
Table 5-25 shows differences between Topology Change Notification Bridge Protocol Data
Units (TCN BPDUs), configuration BPDUs defined by STP, Rapid Spanning Tree Bridge
Protocol Data Units (RST BPDUs) defined by RSTP, and MST BPDUs defined by MSTP.
Table 5-25 Differences between BPDUs

Version Type Name
0 0x00 Configuration BPDU
0 0x80 TCN BPDU
2 0x02 RST BPDU
3 0x02 MST BPDU
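Table 5-25 amounts to a lookup keyed on the protocol version and the BPDU type. The sketch below shows one way to read those two octets from a raw BPDU; it assumes the BPDU starts at the Protocol Identifier (octets 1-2), so the version is octet 3 and the type is octet 4. The names BPDU_NAMES and bpdu_name are hypothetical.

```python
# (protocol version, BPDU type) -> BPDU name, as listed in Table 5-25.
BPDU_NAMES = {
    (0, 0x00): "Configuration BPDU",
    (0, 0x80): "TCN BPDU",
    (2, 0x02): "RST BPDU",
    (3, 0x02): "MST BPDU",
}


def bpdu_name(bpdu: bytes) -> str:
    """Name a BPDU from its Protocol Version Identifier and BPDU Type octets."""
    if len(bpdu) < 4:
        raise ValueError("a BPDU needs at least 4 octets")
    version, bpdu_type = bpdu[2], bpdu[3]
    return BPDU_NAMES.get((version, bpdu_type), "unknown")
```

The type value 0x02 alone does not distinguish an RST BPDU from an MST BPDU; the version octet is what separates them.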
MST BPDU Format

Figure 5-74 shows the MST BPDU format.

Figure 5-74 MST BPDU format
Field                                         Octet
Protocol Identifier                           1-2
Protocol Version Identifier                   3
BPDU Type                                     4
CIST Flags                                    5
CIST Root Identifier                          6-13
CIST External Path Cost                       14-17
CIST Regional Root Identifier                 18-25
CIST Port Identifier                          26-27
Message Age                                   28-29
Max Age                                       30-31
Hello Time                                    32-33
Forward Delay                                 34-35
Version 1 Length = 0                          36
Version 3 Length                              37-38
MST Configuration Identifier                  39-89
CIST Internal Root Path Cost                  90-93
CIST Bridge Identifier                        94-101
CIST Remaining Hops                           102
MSTI Configuration Messages (may be absent)   103 to 39 + Version 3 Length
Octets 37 and later are the MST special fields.
The first 36 bytes of an intra-region or inter-region MST BPDU are the same as those of an
RST BPDU.
Fields from the 37th byte of an MST BPDU are MSTP-specific. The field MSTI
Configuration Messages consists of configuration messages of Multiple Spanning Tree
Instances (MSTIs).
Table 5-26 lists the major information carried in an MST BPDU.

Table 5-26 Major information carried in an MST BPDU
Field                          Byte  Description
Protocol Identifier            2     Indicates the protocol identifier.
Protocol Version Identifier    1     Indicates the protocol version identifier. 0 indicates
                                     STP; 2 indicates RSTP; 3 indicates MSTP.
BPDU Type                      1     Indicates the BPDU type:
                                     l 0x00: Configuration BPDU for STP
                                     l 0x80: TCN BPDU for STP
                                     l 0x02: RST BPDU or MST BPDU
CIST Flags                     1     Indicates the CIST flags.
CIST Root Identifier           8     Indicates the CIST root switching device ID.
CIST External Path Cost        4     Indicates the total path costs from the MST region
                                     where the switching device resides to the MST region
                                     where the CIST root switching device resides. This
                                     value is calculated based on link bandwidth.
CIST Regional Root Identifier  8     Indicates the ID of the regional root switching device
                                     on the CIST, that is, the IST master ID. If the root is in
                                     this region, the CIST Regional Root Identifier is the
                                     same as the CIST Root Identifier.
CIST Port Identifier           2     Indicates the ID of the designated port in the IST.
Message Age                    2     Indicates the lifecycle of the BPDU.
Max Age                        2     Indicates the maximum lifecycle of the BPDU. If the
                                     Max Age timer expires, it is considered that the link to
                                     the root fails.
Hello Time                     2     Indicates the Hello timer value. The default value is 2
                                     seconds.
Forward Delay                  2     Indicates the forwarding delay timer. The default value
                                     is 15 seconds.
Version 1 Length               1     Indicates the BPDUv1 length, which is fixed to 0.
Version 3 Length               2     Indicates the BPDUv3 length.
MST Configuration Identifier   51    Indicates the MST regional label information, which
                                     includes four fields shown in Figure 5-75.
                                     Interconnected switching devices that are configured
                                     with the same MST configuration identifier belong to
                                     one region. For details about these four fields, see
                                     Table 5-27.
CIST Internal Root Path Cost   4     Indicates the total path costs from the local port to the
                                     IST master. This value is calculated based on link
                                     bandwidth.
CIST Bridge Identifier         8     Indicates the ID of the designated switching device on
                                     the CIST.
CIST Remaining Hops            1     Indicates the remaining hops of the BPDU in the CIST.
MSTI Configuration Messages    16    Indicates the MSTI configuration information. Each
(may be absent)                      MSTI configuration message uses 16 bytes, and
                                     therefore this field has N x 16 bytes in the case of N
                                     MSTIs. Figure 5-76 shows the structure of a single
                                     MSTI configuration message. Table 5-28 describes
                                     every sub-field.
Figure 5-75 shows the sub-fields in the MST Configuration Identifier field.
Figure 5-75 MST Configuration Identifier
Octet
Configuration Identifier Format Selector 39
Configuration Name 40-71
Revision Level 72-73
Configuration Digest 74-89
Table 5-27 describes the sub-fields in the MST Configuration Identifier field.
Table 5-27 Description of sub-fields in the MST Configuration Identifier field
Sub-field                                 Byte  Description
Configuration Identifier Format Selector  1     The value is 0.
Configuration Name                        32    Indicates the regional name. The value is a 32-byte
                                                string.
Revision Level                            2     The value is a 2-byte non-negative integer.
Configuration Digest                      16    Indicates a 16-byte digest computed from the
                                                mappings between VLANs and instances in the
                                                region by using the HMAC-MD5 algorithm.
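The Configuration Digest lets two devices compare their VLAN-to-instance mappings by exchanging 16 bytes instead of the whole table. The sketch below illustrates the idea with Python's standard hmac module. Two details are assumptions that are not stated in this document and should be checked against IEEE 802.1Q before any real use: the layout of the hashed table (4096 two-byte entries, one per VLAN ID) and the value of DIGEST_KEY. The function names are hypothetical.

```python
import hashlib
import hmac

# Assumption: signature key as recalled from IEEE 802.1Q; verify before relying on it.
DIGEST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")


def configuration_digest(vlan_to_msti: dict) -> bytes:
    """Return a 16-byte HMAC-MD5 digest of a VLAN-to-MSTI mapping table."""
    # Assumed layout: one 2-byte MSTI ID per VLAN ID 0..4095; unmapped VLANs stay 0.
    table = bytearray(4096 * 2)
    for vlan_id, msti_id in vlan_to_msti.items():
        if not 1 <= vlan_id <= 4094:
            raise ValueError(f"invalid VLAN ID: {vlan_id}")
        table[2 * vlan_id:2 * vlan_id + 2] = msti_id.to_bytes(2, "big")
    return hmac.new(DIGEST_KEY, bytes(table), hashlib.md5).digest()


def same_region(a, b) -> bool:
    """Compare two (region name, revision level, VLAN-to-MSTI mapping) tuples."""
    return (a[0], a[1], configuration_digest(a[2])) == (
        b[0], b[1], configuration_digest(b[2]))
```

Two devices that agree on the region name and revision level but differ in a single VLAN mapping produce different digests, so they end up in different MST regions.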
Figure 5-76 shows the sub-fields in the MSTI Configuration Messages field.
Figure 5-76 MSTI Configuration Messages

Octet
MSTI Flags 1
MSTI Regional Root Identifier 2-9
MSTI Internal Root Path Cost 10-13
MSTI Bridge Priority 14
MSTI Port Priority 15
MSTI Remaining Hops 16
Table 5-28 describes the sub-fields in the MSTI Configuration Messages field.
Table 5-28 Description of sub-fields in the MSTI Configuration Messages field
Sub-field                      Byte  Description
MSTI Flags                     1     Indicates the MSTI flags.
MSTI Regional Root Identifier  8     Indicates the MSTI regional root switching device ID.
MSTI Internal Root Path Cost   4     Indicates the total path costs from the local port to the
                                     MSTI regional root switching device. This value is
                                     calculated based on link bandwidth.
MSTI Bridge Priority           1     Indicates the priority value of the designated switching
                                     device in the MSTI.
MSTI Port Priority             1     Indicates the priority value of the designated port in the
                                     MSTI.
MSTI Remaining Hops            1     Indicates the remaining hops of the BPDU in the MSTI.

Configurable MST BPDU Format

Currently, there are two MST BPDU formats:
l dot1s: BPDU format defined in IEEE 802.1s.

l legacy: private BPDU format.
If a port transmits only dot1s or only legacy BPDUs by default, the user needs to identify the
format of BPDUs sent by the peer, and then run a command to configure the port to support
the peer BPDU format. If the configuration is incorrect, MSTP calculation may be wrong and
a loop may occur.
By using the stp compliance command, you can configure a port on a Huawei datacom
device to automatically adjust the MST BPDU format. With this function, the port
automatically adopts the peer BPDU format. The following MST BPDU formats are
supported by Huawei datacom devices:
l auto
l dot1s
l legacy
In addition to dot1s and legacy formats, the auto mode allows a port to automatically switch
to the BPDU format used by the peer based on BPDUs received from the peer. In this manner,
the two ports use the same BPDU format. In auto mode, a port uses the dot1s BPDU format
by default, and keeps pace with the peer after receiving BPDUs from the peer.
Configurable Maximum Number of BPDUs Sent by a Port at a Hello Interval

BPDUs are sent at Hello intervals to maintain the spanning tree. If a switching device does
not receive any BPDU during a certain period of time, the spanning tree will be re-calculated.
After a switching device becomes the root, it sends BPDUs at Hello intervals. Non-root
switching devices adopt the Hello Time value set for the root.
Huawei datacom devices allow the maximum number of BPDUs sent by a port at a Hello
interval to be configured as needed.
The greater the configured maximum, the more BPDUs a port can send at a Hello interval.
Setting the maximum to a proper value limits the number of BPDUs sent by a port at a Hello
interval. This helps prevent network topology flapping and avoid excessive use of bandwidth
resources by BPDUs.
5.4.3.4 MSTP Topology Calculation
MSTP Principle
In MSTP, the entire Layer 2 network is divided into multiple MST regions, which are
interconnected by a single Common Spanning Tree (CST). In an MST region, multiple
spanning trees are calculated, each of which is called a Multiple Spanning Tree Instance
(MSTI). Among these MSTIs, MSTI 0 is also known as the internal spanning tree (IST). Like
STP, MSTP uses configuration messages to calculate spanning trees, but the configuration
messages are MSTP-specific.

Vectors
Both MSTIs and the Common and Internal Spanning Tree (CIST) are calculated based on
vectors, which are carried in Multiple Spanning Tree (MST) BPDUs. Therefore, switching
devices exchange MST BPDUs to calculate MSTIs and the CIST.
l Vectors are described as follows:

– The following vectors participate in the CIST calculation:
{root ID, external root path cost, region root ID, internal root path cost, designated
switching device ID, designated port ID, receiving port ID}
– The following vectors participate in the MSTI calculation:
{regional root ID, internal root path cost, designated switching device ID,
designated port ID, receiving port ID}
The priorities of the vectors in the braces are in descending order from left to right.
Table 5-29 describes the vectors.
Table 5-29 Vector description
Vector Name                     Description
Root ID                         Identifies the root switching device for the CIST. The root
                                identifier consists of the priority value (16 bits) and MAC address
                                (48 bits).
External root path cost (ERPC)  Indicates the path cost from a CIST regional root to the root.
                                ERPCs saved on all switching devices in an MST region are the
                                same. If the CIST root is in an MST region, ERPCs saved on all
                                switching devices in the MST region are 0s.
Regional root ID                Identifies the MSTI regional root. The regional root ID consists
                                of the priority value (16 bits) and MAC address (48 bits).
Internal root path cost (IRPC)  Indicates the path cost from the local bridge to the regional root.
                                The IRPC saved on a regional edge port is greater than the IRPC
                                saved on a non-regional edge port.
Designated switching device ID  Identifies the nearest upstream bridge on the path from the local
                                bridge to the regional root. If the local bridge is the root or the
                                regional root, this ID is the local bridge ID.
Designated port ID              Identifies the port on the designated switching device connected
                                to the root port on the local bridge. The port ID consists of the
                                priority value (4 bits) and port number (12 bits). The priority
                                value must be a multiple of 16.
Receiving port ID               Identifies the port receiving the BPDU. The port ID consists of
                                the priority value (4 bits) and port number (12 bits). The priority
                                value must be a multiple of 16.
l The vector comparison principle is as follows:

For a vector, the smaller the priority value, the higher the priority.
Vectors are compared based on the following rules:

a. Compare the IDs of the roots.

b. If the IDs of the roots are the same, compare ERPCs.
c. If ERPCs are the same, compare the IDs of regional roots.
d. If the IDs of regional roots are the same, compare IRPCs.
e. If IRPCs are the same, compare the IDs of designated switching devices.
f. If the IDs of designated switching devices are the same, compare the IDs of
designated ports.
g. If the IDs of designated ports are the same, compare the IDs of receiving ports.
If the priority of a vector carried in the configuration message of a BPDU received by a
port is higher than the priority of the vector in the configuration message saved on the
port, the port replaces the saved configuration message with the received one. In
addition, the port updates the global configuration message saved on the device. If the
priority of a vector carried in the configuration message of a BPDU received on a port is
equal to or lower than the priority of the vector in the configuration message saved on
the port, the port discards the BPDU.
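The comparison rules a through g are a field-by-field comparison in which the first differing field decides and the smaller value wins. The sketch below models a CIST priority vector as a Python named tuple, whose built-in ordering behaves in exactly this way. The class and function names are hypothetical, and IDs are represented as plain integers for simplicity.

```python
from typing import NamedTuple


class CistVector(NamedTuple):
    """CIST priority vector; fields are ordered from most to least significant."""
    root_id: int
    external_root_path_cost: int
    regional_root_id: int
    internal_root_path_cost: int
    designated_bridge_id: int
    designated_port_id: int
    receiving_port_id: int


def bridge_id(priority: int, mac: int) -> int:
    """Pack a 16-bit priority value and a 48-bit MAC address into one ID."""
    return (priority << 48) | mac


def is_superior(received: CistVector, saved: CistVector) -> bool:
    """Return True if the received vector has a higher priority (a smaller value)."""
    return received < saved


def process_bpdu(received: CistVector, saved: CistVector) -> CistVector:
    """Keep the received vector only if it is superior; otherwise discard it."""
    return received if is_superior(received, saved) else saved
```

An MSTI vector can be modeled the same way by dropping the first two fields, because the MSTI calculation starts the comparison at the regional root ID.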
CIST Calculation
After completing the configuration message comparison, the switching device with the
highest priority on the entire network is selected as the CIST root. MSTP calculates an IST
for each MST region, and computes a CST to interconnect MST regions. On the CST, each
MST region is considered a switching device. The CST and ISTs constitute a CIST for the
entire network.
MSTI Calculation
In an MST region, MSTP calculates an MSTI for each VLAN based on mappings between
VLANs and MSTIs. Each MSTI is calculated independently. The calculation process is
similar to the process for STP to calculate a spanning tree. For details, see 5.4.2.4 STP
Topology Calculation.
MSTIs have the following characteristics:
l The spanning tree is calculated independently for each MSTI, and spanning trees of
MSTIs are independent of each other.
l MSTP calculates the spanning tree for an MSTI in the manner similar to STP.
l Spanning trees of MSTIs can have different roots and topologies.
l Each MSTI sends BPDUs in its spanning tree.
l The topology of each MSTI is configured by using commands.
l A port can be configured with different parameters for different MSTIs.
l A port can play different roles or have different status in different MSTIs.
On an MSTP-aware network, a VLAN packet is forwarded along the following paths:
l MSTI in an MST region
l CST among MST regions
MSTP Responding to Topology Changes

MSTP topology changes are processed in the manner similar to that in RSTP. For details
about how RSTP processes topology changes, see 5.4.2.6 Details About RSTP.

5.4.3.5 MSTP Fast Convergence

MSTP supports both ordinary and enhanced Proposal/Agreement (P/A) mechanisms:
l Ordinary P/A
The ordinary P/A mechanism supported by MSTP is implemented in the same manner as
that supported by RSTP. For details about the P/A mechanism supported by RSTP, see
5.4.2.6 Details About RSTP.
l Enhanced P/A
Figure 5-77 Enhanced P/A mechanism
[Figure: message exchange between the upstream device (designated port) and the downstream device (root port). The upstream device sends a proposal so that the port can rapidly enter the Forwarding state; the root port blocks all the other non-edge ports; the upstream device sends an agreement and the root port enters the Forwarding state; the downstream device sends an agreement and the designated port enters the Forwarding state.]
As shown in Figure 5-77, in MSTP, the P/A mechanism works as follows:

a. The upstream device sends a proposal to the downstream device, indicating that the
port connecting to the downstream device wants to enter the Forwarding state as
soon as possible. After receiving this BPDU, the downstream device sets its port
connecting to the upstream device to the root port, and blocks all non-edge ports.
b. The upstream device continues to send an agreement. After receiving this BPDU,
the root port enters the Forwarding state.
c. The downstream device replies with an agreement. After receiving this BPDU, the
upstream device sets its port connecting to the downstream device to the designated
port, and the port enters the Forwarding state.
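Steps a through c can be replayed as a small state walk-through. The sketch below is only an illustration of the message order described above, not the protocol state machine: the function name, the port names, and the assumed starting states (the new link is Discarding on both ends and all other downstream ports are Forwarding) are hypothetical.

```python
def enhanced_pa(downstream_ports: dict, uplink: str):
    """Replay the three enhanced P/A steps.

    downstream_ports maps a port name to True if the port is an edge port.
    uplink names the downstream port that faces the upstream device.
    Returns (message log, downstream port states, upstream port state).
    """
    states = {name: "Forwarding" for name in downstream_ports}
    states[uplink] = "Discarding"      # assumed starting state of the new link
    upstream_state = "Discarding"
    log = []

    # Step a: proposal; the downstream device selects its root port and
    # blocks its other non-edge ports.
    log.append("upstream -> downstream: proposal")
    for name, is_edge in downstream_ports.items():
        if name != uplink and not is_edge:
            states[name] = "Discarding"

    # Step b: the upstream device sends an agreement; the root port forwards.
    log.append("upstream -> downstream: agreement")
    states[uplink] = "Forwarding"

    # Step c: the downstream device replies with an agreement; the upstream
    # port becomes the designated port and forwards.
    log.append("downstream -> upstream: agreement")
    upstream_state = "Forwarding"
    return log, states, upstream_state
```

Edge ports stay in the Forwarding state throughout, which is why only non-edge ports are touched in step a.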
By default, Huawei datacom devices use the enhanced P/A mechanism. If a Huawei datacom
device needs to communicate with a non-Huawei device that uses the ordinary P/A
mechanism, run the stp no-agreement-check command to configure the Huawei device to
use the ordinary P/A mechanism. In this manner, these two devices can communicate with
each other.

5.4.4 Applications
STP Application
On a complex network, network designers tend to deploy multiple physical links between two
devices, one of which is the master link and the others are backup links. Loops are likely,
or even bound, to occur in such a situation.
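Figure 5-78 STP application
[Figure: PE1 and PE2 connect to the network; CE1 and CE2 connect to the PEs and to PC1 and PC2; STP runs among the CEs and PEs. Labels: root bridge, blocked port.]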
On the network shown in Figure 5-78, after a CE and a PE running STP discover loops on the
network by exchanging information with each other, they trim the ring topology into a loop-
free tree topology by blocking a certain port. In this manner, replication and circular
propagation of packets are prevented on the network, and the switching devices are released
from processing duplicated packets, thereby improving their processing performance.

MSTP Application
Figure 5-79 MSTP application
[Figure: ATNA, ATNB, ATNC, and ATND form one MST region; the links between them are labeled with the VLANs they permit: all VLANs, VLAN 10&20, VLAN 20&30, and VLAN 20&40.]
In Figure 5-79, MSTP can be configured to use different spanning tree instances to forward
packets in different VLANs. The detailed configurations are as follows:
l Configure all switches on the network to belong to the same MST region.
l Configure VLAN 10 packets to be forwarded within MSTI 1; VLAN 30 packets within
MSTI 3; VLAN 40 packets within MSTI 4; VLAN 20 packets within MSTI 0.
BPDU Tunneling
The BPDU tunneling technology allows a user's networks located in different areas to
transparently transmit BPDUs on a specified VLAN VPN within a carrier's network. In this
manner, all devices on the user's networks can calculate the spanning tree. The user's
networks and the operator's networks have their own independent spanning trees.
As shown in Figure 5-80, the upper part is the operator's network, which contains the packet
ingress/egress devices; the lower part is the user's network, which consists of user's
network A and user's network B.
You can configure the packet ingress device to replace the original destination MAC address
of a BPDU with a MAC address in a special format and the packet egress device to replace
the MAC address in a special format with the original MAC address. In this manner, the
BPDU is transparently transmitted.
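The MAC address replacement at the ingress and egress devices can be sketched as a pair of inverse functions. In the sketch below, 01-80-C2-00-00-00 is the standard destination MAC address of STP BPDUs; TUNNEL_MAC is a hypothetical placeholder for the special-format address, because this document does not give its value, and the function names are likewise illustrative.

```python
# Standard destination MAC address of STP BPDUs (bridge group address).
STP_BPDU_MAC = bytes.fromhex("0180c2000000")
# Hypothetical placeholder for the special-format MAC address used in the tunnel.
TUNNEL_MAC = bytes.fromhex("030000000001")


def tunnel_ingress(frame: bytes) -> bytes:
    """On the ingress device, replace the BPDU destination MAC with the tunnel MAC."""
    if frame[:6] == STP_BPDU_MAC:
        return TUNNEL_MAC + frame[6:]
    return frame


def tunnel_egress(frame: bytes) -> bytes:
    """On the egress device, restore the original BPDU destination MAC."""
    if frame[:6] == TUNNEL_MAC:
        return STP_BPDU_MAC + frame[6:]
    return frame
```

Inside the operator's network the frame no longer carries the BPDU destination address, so the operator's devices forward it as ordinary data instead of processing it in their own spanning tree.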

Figure 5-80 BPDU tunneling
[Figure: the operator's network, with a packet ingress/egress device at each edge, connects user's network A and user's network B.]

Terms
Term Explanation
STP Spanning Tree Protocol. A protocol used in a local area network (LAN) to
eliminate loops. STP-capable devices exchange protocol packets to
discover loops in the network and block redundant interfaces to eliminate
loops.
RSTP Rapid Spanning Tree Protocol. A protocol defined in IEEE 802.1w.
RSTP is a supplement to STP and implements faster convergence than
STP.
MSTP Multiple Spanning Tree Protocol. A spanning tree protocol that is defined in
IEEE 802.1s and introduces the concepts of region and instance. MSTP
divides a large network into regions and creates multiple spanning tree
instances (MSTIs), which are mapped to virtual LANs (VLANs). Network
bridges exchange bridge protocol data units (BPDUs) carrying
information about regions and instances to know which MSTIs they
belong to. Multi-instance RSTP runs within regions, whereas RSTP-
compatible protocols run between regions.
VLAN Virtual local area network. A logical switched network that is constructed
across different network segments by using network management
software. A VLAN forms a logical subnet (broadcast domain). One
VLAN can include multiple network devices.


Abbreviation
STP Spanning Tree Protocol
RSTP Rapid Spanning Tree Protocol
MSTP Multiple Spanning Tree Protocol
BPDU Bridge Protocol Data Unit
CIST Common and Internal Spanning Tree
CST Common Spanning Tree
IST Internal Spanning Tree
SST Single Spanning Tree
MST Multiple Spanning Tree
MSTI Multiple Spanning Tree Instance
TCN Topology Change Notification
5.5 QinQ
5.5.1 Introduction
Definition
802.1Q-in-802.1Q (QinQ) adds another IEEE 802.1Q tag to 802.1Q tagged packets entering
the network. QinQ expands the VLAN space by tagging the tagged packets. It allows services
in a private VLAN to be transparently transmitted over a public network.
Purpose
The 12-bit VLAN ID in the tag defined in IEEE 802.1Q identifies a maximum of 4096 VLANs,
which is not enough to isolate and identify the users on the growing metro Ethernet (ME)
network. QinQ was therefore developed to expand the VLAN space by adding another 802.1Q
tag to an 802.1Q tagged packet. In this way, the number of VLANs increases to 4096 x 4096.
In addition to expanding VLAN space, QinQ is applied in other scenarios with the
development of the ME network and carriers' requirements on refined operation. The outer
and inner VLAN tags can be used to differentiate users from services. For example, the inner
tag represents a user, whereas the outer tag represents a service. Moreover, QinQ functions as
a simple and practical VPN technology by transparently transmitting private VLAN services
over a public network. It extends services of a core MPLS VPN to the ME network and
implements an end-to-end VPN.

When QinQ packets that carry double tags traverse the Internet Service Provider (ISP)
network, the inner tag is transmitted transparently.
Because QinQ is easy to use, it has been widely applied on ISP networks. For
example, it is used by multiple services on the metro Ethernet. The introduction of selective
QinQ (VLAN stacking) has made QinQ more popular among ISPs. As the metro Ethernet
develops, different vendors propose their own metro Ethernet solutions. With its
simplicity and flexibility, QinQ plays an important role in these solutions.
5.5.2 Principles
5.5.2.1 Principles
802.1Q-in-802.1Q (QinQ) expands the VLAN space by adding another 802.1Q tag to a
tagged 802.1Q packet. To keep pace with ME network development, QinQ has diversified
in its encapsulation and termination modes and is increasingly used for refined service
operation.
QinQ Packet Format

A QinQ packet has a fixed format. In the packet, another 802.1Q tag is added before an
existing 802.1Q tag. A QinQ packet is 4 bytes longer than an 802.1Q packet.
Figure 5-81 QinQ packet format
802.1Q encapsulation:
DA (6 bytes) | SA (6 bytes) | ETYPE (2 bytes) | TAG (2 bytes) | LEN/ETYPE (2 bytes) | DATA (46 to 1500 bytes) | FCS (4 bytes)
QinQ encapsulation:
DA (6 bytes) | SA (6 bytes) | ETYPE (2 bytes) | TAG (2 bytes) | ETYPE (2 bytes) | TAG (2 bytes) | LEN/ETYPE (2 bytes) | DATA (42 to 1500 bytes) | FCS (4 bytes)
In the figure, ETYPE is 0x8100 and TAG consists of the Priority, CFI, and VLAN ID fields.
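The tag layout in Figure 5-81 can be made concrete by building the bytes by hand. The sketch below constructs a 4-byte 802.1Q tag (ETYPE 0x8100 followed by the Priority, CFI, and VLAN ID bits) and inserts an outer tag in front of an existing one. The function names are hypothetical, frames are handled without the FCS, and the outer tag is assumed to use the same ETYPE value 0x8100 that the figure shows.

```python
import struct

TPID_8021Q = 0x8100  # ETYPE value shown in Figure 5-81


def vlan_tag(vlan_id: int, priority: int = 0, cfi: int = 0) -> bytes:
    """Build a 4-byte tag: 2-byte ETYPE, then Priority (3 bits), CFI (1 bit), VLAN ID (12 bits)."""
    if not 0 <= vlan_id <= 0xFFF or not 0 <= priority <= 7 or cfi not in (0, 1):
        raise ValueError("tag field out of range")
    tci = (priority << 13) | (cfi << 12) | vlan_id
    return struct.pack(">HH", TPID_8021Q, tci)


def add_outer_tag(frame: bytes, outer_vlan: int) -> bytes:
    """Insert an outer tag after DA and SA (the first 12 bytes) of a tagged frame."""
    return frame[:12] + vlan_tag(outer_vlan) + frame[12:]


def read_tags(frame: bytes) -> list:
    """Return the VLAN IDs of consecutive 0x8100 tags, outermost first."""
    vlans, offset = [], 12
    while len(frame) >= offset + 4:
        tpid, tci = struct.unpack_from(">HH", frame, offset)
        if tpid != TPID_8021Q:
            break
        vlans.append(tci & 0x0FFF)
        offset += 4
    return vlans
```

Adding the outer tag lengthens the frame by exactly 4 bytes, and two 12-bit VLAN IDs together give the 4096 x 4096 combinations mentioned earlier.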
QinQ Encapsulation
QinQ encapsulation adds an 802.1Q tag to a tagged 802.1Q packet on UPE interfaces
connecting to users. Sometimes, QinQ encapsulation can also be performed on a routing
sub-interface.
l QinQ encapsulation on a routing sub-interface
In general, QinQ encapsulation is performed on a switched port. In special situations,
however, QinQ encapsulation can be performed on a routing sub-interface.

When data needs to be transmitted transparently over the MPLS/IP core network by
PWE3/VLL/VSI, the routing sub-interface on the NPE encapsulates packets with an
outer VLAN ID based on the user VLAN ID and accesses VLL/PWE3 through the outer
VLAN. This sub-interface is also called a QinQ stacking sub-interface.
This encapsulation mode is traffic-based. QinQ stacking sub-interfaces support only
L2VPN (PWE3/VLL/VPLS) and do not support Layer 3 forwarding.
Sub-interface for Dot1q/QinQ VLAN Tag Termination

QinQ termination is a process in which the single or double VLAN tags of packets are
identified and then removed.
When QinQ is applied on the MPLS/IP core network, different termination modes are used.
QinQ termination is usually conducted on sub-interfaces. A termination sub-interface can be
either of the following:
l sub-interface for dot1q VLAN tag termination: A single VLAN tag is stripped.
l sub-interface for QinQ VLAN tag termination: Double VLAN tags are stripped.
5.5.2.2 VLAN Stacking

VLAN stacking is a Layer 2 technology that encapsulates different outer VLAN tags for
different VLANs.
In a carrier access network, user packets usually need to be differentiated according to the
users' applications, access points, or access devices. VLAN stacking differentiates packets
by adding different outer VLAN tags to user packets according to the inner VLAN tags,
IP addresses, or MAC addresses of these packets.
The VLAN stacking port has the following features:
l The VLAN stacking port can be configured with multiple outer VLAN tags. Then
different outer VLAN tags can be assigned to different VLAN frames.
l The VLAN stacking port can add outer VLAN tags to the received frames, and strip the
outer VLAN tags from the sent frames.
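The two port features listed above can be modeled with a short Python sketch. The class name StackingPort, the rule format, and the VLAN values are assumptions chosen for illustration; only matching on the inner VLAN tag is shown.

```python
class StackingPort:
    """Toy model of a VLAN stacking port keyed on the inner VLAN ID."""

    def __init__(self, rules):
        # rules: list of ((low, high), outer_vlan), checked in order
        self.rules = rules

    def on_receive(self, vlan_tags):
        """Add the outer tag configured for the frame's VLAN, if any."""
        inner = vlan_tags[0]
        for (low, high), outer in self.rules:
            if low <= inner <= high:
                return [outer] + vlan_tags
        return list(vlan_tags)  # no rule matched: leave the frame unchanged

    def on_send(self, vlan_tags):
        """Strip the outer tag from a double-tagged frame before sending."""
        return vlan_tags[1:] if len(vlan_tags) > 1 else list(vlan_tags)


port = StackingPort([((1, 200), 1000), ((201, 400), 2000)])
stacked = port.on_receive([150])   # outermost tag first
restored = port.on_send(stacked)
```

Tag lists are ordered with the outermost tag first, so adding and stripping the outer tag are simple list operations.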
5.5.2.3 QinQ Mapping
Principle of QinQ Mapping

QinQ mapping is performed after a frame is received by an inbound interface and before the
frame is forwarded by an outbound interface.
l When sending a local VLAN frame to the external VLAN, the sub-interface replaces the
VLAN tag of the frame with the VLAN tag of the external VLAN.
l When receiving an external VLAN frame, the port replaces the VLAN tag of the frame
with the VLAN tag of the local VLAN.
In actual networking applications, QinQ mapping can be used to map the C-VLAN tag to the
S-VLAN tag so that different C-VLAN tags are shielded.
QinQ mapping is often deployed on edge devices of an ME network to map the C-VLAN tag
carried in a frame to the S-VLAN tag before the frame is transmitted on the public network.
QinQ mapping can be applied to, but is not limited to, the following scenarios:

l The VLAN IDs deployed at new sites and old sites conflict, but new sites need to
communicate with old sites.
l The VLAN ID planning of each site on the public network is different. As a result, the
VLAN IDs conflict. The sites, however, do not need to communicate.
l The VLAN IDs on both ends of the public network are different.
Currently, the ATN supports the following mapping modes:

l 1 to 2 QinQ mapping
When a sub-interface of the device configured with QinQ mapping receives a single-
tagged frame, it maps the tag of the frame to the specified double tags.
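A minimal Python sketch of 1 to 2 QinQ mapping follows. The function name map_1_to_2 and the mapping table are hypothetical; frames are represented only by their VLAN tag list, outermost tag first.

```python
def map_1_to_2(vlan_tags, mapping):
    """Map a single-tagged frame to the configured double tags.

    mapping: {single_vlan: (outer_vlan, inner_vlan)}
    Frames that are not single-tagged, or have no entry, are left unchanged.
    """
    if len(vlan_tags) == 1 and vlan_tags[0] in mapping:
        outer, inner = mapping[vlan_tags[0]]
        return [outer, inner]
    return list(vlan_tags)


mapping = {10: (1000, 100)}  # illustrative values only
mapped = map_1_to_2([10], mapping)
```

Only the tags change; the payload of the frame is not touched by the mapping.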
5.5.2.4 IP Forwarding on Termination Sub-interfaces

On the network shown in Figure 5-82 and Figure 5-83, when the NPE at the edge of the
Multiprotocol Label Switching (MPLS)/IP core network acts as a gateway of users,
termination sub-interfaces must support IP forwarding.
IP forwarding can be configured on a sub-interface for dot1q VLAN tag termination or sub-
interface for QinQ VLAN tag termination, depending on whether the user packets received by
the NPE carry one or two VLAN tags:
l If the user packets contain one tag, the sub-interface for dot1q VLAN tag termination
must support IP forwarding.
l If the user packets contain double tags, the sub-interface for QinQ VLAN tag termination
must support IP forwarding.
IP Forwarding on a Sub-interface for Dot1q VLAN Tag Termination
Figure 5-82 IP forwarding on a sub-interface for dot1q VLAN tag termination
[Figure labels: user packets marked IP 100 and IP 200; dot1q termination; PE; ISP MPLS/IP network]
The sub-interface for dot1q VLAN tag termination first identifies the outer VLAN tag and
then generates an ARP entry containing the IP address, MAC address, and outer VLAN tag.

l For the upstream traffic, the termination sub-interface strips the MAC address and the
outer VLAN tag, and searches the routing table to perform Layer 3 forwarding based on
the destination IP address.
l For the downstream traffic, the termination sub-interface encapsulates IP packets with
the MAC address and outer VLAN tag according to ARP entries and then sends IP
packets to the target user.
IP Forwarding on a Sub-interface for QinQ VLAN Tag Termination
Figure 5-83 IP forwarding on a sub-interface for QinQ VLAN tag termination
[Figure labels: user packets marked IP 200 1000 and IP 300 1000; QinQ termination; NPE; PE; ISP MPLS/IP network]
The sub-interface for QinQ VLAN tag termination first identifies double VLAN tags and then
generates an ARP entry containing the IP address, MAC address, and double VLAN tags.
l For the upstream traffic, the termination sub-interface strips the MAC address and
double VLAN tags, and searches the routing table to perform Layer 3 forwarding based
on the destination IP address.
l For the downstream traffic, the termination sub-interface encapsulates IP packets with
the MAC address and double VLAN tags according to ARP entries and then sends IP
packets to the target user.
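The upstream and downstream handling described above can be sketched in Python. This is a simplified model: the class and method names are hypothetical, frames are plain dictionaries, and the ARP entry is learned from the first upstream frame rather than from a real ARP exchange.

```python
class TerminationSubInterface:
    """Toy model of a sub-interface for QinQ VLAN tag termination."""

    def __init__(self):
        # user IP -> (user MAC, (outer VLAN, inner VLAN))
        self.arp_table = {}

    def upstream(self, frame):
        """Learn the sender, strip MAC and VLAN tags, return the bare IP packet."""
        self.arp_table[frame["src_ip"]] = (frame["src_mac"], tuple(frame["vlan_tags"]))
        return {"src_ip": frame["src_ip"], "dst_ip": frame["dst_ip"]}

    def downstream(self, packet):
        """Re-encapsulate an IP packet using the ARP entry of its destination."""
        entry = self.arp_table.get(packet["dst_ip"])
        if entry is None:
            return None  # no ARP entry yet; a real device would resolve it first
        mac, vlan_tags = entry
        return {"dst_mac": mac, "vlan_tags": list(vlan_tags), **packet}


sub_if = TerminationSubInterface()
ip_packet = sub_if.upstream({
    "src_mac": "00-11-22-33-44-55", "vlan_tags": [1000, 200],
    "src_ip": "10.1.1.1", "dst_ip": "192.0.2.9",
})
reply = sub_if.downstream({"src_ip": "192.0.2.9", "dst_ip": "10.1.1.1"})
```

For a sub-interface for dot1q VLAN tag termination, the same model applies with a single-element tag list.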
5.5.2.5 Proxy ARP on a VLAN Tag Termination Sub-interface

On the network shown in Figure 5-84, a VLAN tag termination sub-interface allows a VLAN
range to access the same network segment. Users on the same network segment but in
different VLANs fail to communicate at Layer 2 without the help of IP forwarding. Therefore,
the VLAN tag termination sub-interface must support proxy ARP.
Proxy ARP can be configured for a sub-interface for dot1q VLAN tag termination or sub-
interface for QinQ VLAN tag termination, depending on whether the user packet received by
a PE contains one or two VLAN tags.
l If the user packet contains one tag, the sub-interface that has proxy ARP configured is a
sub-interface for Dot1q VLAN tag termination.

l If the user packet contains double tags, the sub-interface that has proxy ARP configured
is a sub-interface for QinQ VLAN tag termination.
Proxy ARP on a Sub-interface for Dot1q VLAN Tag Termination

On the network shown in Figure 5-84, PC1 and PC2 belong to VLAN 100; PC3 belongs to
VLAN 200; Switch 1 is a Layer 2 switch, which allows any VLAN packet to pass; PC1, PC2,
and PC3 are on the same network segment.
When PC1 and PC3 want to communicate with each other, PC1 sends an ARP request to PC3.
However, because PC1 and PC3 are in different VLANs, PC3 does not receive the ARP
request from PC1.
To resolve this problem, configure proxy ARP on the sub-interface for dot1q VLAN tag
termination.
Figure 5-84 Proxy ARP on a sub-interface for dot1q VLAN tag termination
[Figure labels: PE with a dot1q termination sub-interface (10.1.1.254/24); Switch1; Switch2 (VLAN 100) and Switch3 (VLAN 200); PC1 10.1.1.1/24, PC2 10.1.1.2/24, PC3 10.1.1.3/24]
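The proxy ARP decision for this topology can be sketched in Python. The host table reuses the addresses and VLANs from Figure 5-84; the PE MAC address and the function name proxy_arp_reply are assumptions, and the rule shown (answer only when sender and target are on the same segment but in different VLANs) is a simplification of device behavior.

```python
import ipaddress

PE_MAC = "00-e0-fc-00-00-01"  # hypothetical MAC of the termination sub-interface
SUBNET = ipaddress.ip_network("10.1.1.0/24")
HOST_VLANS = {"10.1.1.1": 100, "10.1.1.2": 100, "10.1.1.3": 200}


def proxy_arp_reply(sender_ip, target_ip):
    """Return the MAC the PE answers with, or None if the PE stays silent."""
    if ipaddress.ip_address(target_ip) not in SUBNET:
        return None
    sender_vlan = HOST_VLANS.get(sender_ip)
    target_vlan = HOST_VLANS.get(target_ip)
    if sender_vlan is None or target_vlan is None:
        return None
    if sender_vlan == target_vlan:
        return None  # same VLAN: the hosts resolve each other at Layer 2
    return PE_MAC    # different VLANs: the PE answers on behalf of the target


pc1_to_pc3 = proxy_arp_reply("10.1.1.1", "10.1.1.3")
pc1_to_pc2 = proxy_arp_reply("10.1.1.1", "10.1.1.2")
```

After PC1 learns the PE's MAC address for 10.1.1.3, its traffic to PC3 is sent to the PE, which forwards it at Layer 3 into VLAN 200.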
Proxy ARP on a Sub-interface for QinQ VLAN Tag Termination

A sub-interface for QinQ VLAN tag termination allows a VLAN range to access the same
network segment. Users on the same network segment but in different VLANs fail to
communicate at Layer 2 without the help of IP forwarding. Therefore, the sub-interface for
QinQ VLAN tag termination must support proxy ARP.
On the network shown in Figure 5-85, PC1, PC2, and PC3 are on the same network segment.
PC1 and PC2 belong to VLAN 100; PC3 belongs to VLAN 200. Switch 1 has selective QinQ
enabled and attaches outer VLAN tag 1000 to the packets from Switch 2 and Switch 3 to the
PE.
When PC1 and PC3 want to communicate with each other, PC1 sends an ARP request to PC3.
However, because PC1 and PC3 are in different VLANs, PC3 does not receive the ARP
request from PC1.
To resolve this problem, enable proxy ARP on the sub-interface for QinQ VLAN tag
termination.

Figure 5-85 Proxy ARP on a sub-interface for QinQ VLAN tag termination
[Figure labels: PE with a QinQ termination sub-interface (10.1.1.254/24); outer VLAN 1000; Switch1; Switch2 (VLAN 100) and Switch3 (VLAN 200); PC1 10.1.1.1/24, PC2 10.1.1.2/24, PC3 10.1.1.3/24]
5.5.2.6 L3VPN Access Through a Termination Sub-interface

On the network shown in Figure 5-86, Layer 3 virtual private network (L3VPN) functions are
configured on termination sub-interfaces.
l If a user packet carries one VLAN tag, configure a sub-interface for dot1q VLAN tag
termination for L3VPN access.
l If a user packet carries double VLAN tags, configure a sub-interface for QinQ VLAN
tag termination for L3VPN access.
L3VPN Access Through a Sub-interface for Dot1q VLAN Tag Termination

Figure 5-86 shows a typical networking for L3VPN access through a sub-interface for dot1q
VLAN tag termination.
A user packet carries one VLAN tag. On the CSG, a sub-interface for dot1q VLAN tag
termination is configured with an outer VLAN tag specified. The sub-interface for dot1q
VLAN tag termination is bound to a VPN instance according to the outer VLAN tag.
When the CSG receives the user packet, it terminates the packet's VLAN tag for L3VPN
access.

Figure 5-86 L3VPN access through a sub-interface for dot1q VLAN tag termination
[Figure labels: VPN1 sites in VLAN 100, VLAN 200, and VLAN 300; Switch1 to Switch3; DSLAM1 to DSLAM4; CE1 to CE4; PE1 to PE3; dot1q termination; ISP backbone network; user packets marked IP 100, IP 200, IP 300, and IP xxx]
L3VPN Access Through a Sub-interface for QinQ VLAN Tag Termination

Figure 5-87 shows a typical networking for L3VPN access through a sub-interface for QinQ
VLAN tag termination.
A user packet carries double VLAN tags: one is added by the DSLAM based on user type,
and the other is added by the CE based on the service type. On the CSG, the sub-interface for
QinQ VLAN tag termination is configured with inner and outer VLAN tags specified. The
sub-interface for QinQ VLAN tag termination is bound to a VPN instance according to the
double VLAN tags.
When the CSG receives the user packet, it strips off the double VLAN tags for L3VPN
access. When the CSG sends a data packet to the user terminal, it must add correct outer and
inner VLAN tags to the packet.
When the CSG is terminating the double tags of a user packet, the CSG must perform ARP
learning based on double VLAN tags of the user packet.

Figure 5-87 L3VPN access through a sub-interface for QinQ VLAN tag termination
[Figure labels: VPN1 sites in VLAN 100, VLAN 200, and VLAN 300; Switch1 to Switch3; DSLAM1 to DSLAM4; CE1 to CE4; PE1 to PE3; QinQ termination; ISP backbone network; user packets marked IP 100, IP 200, IP 300, and IP xxx 1000]

5.5.2.7 PWE3/VLL Access Through a Termination Sub-interface

NOTE
Among ATN 950B series, only ATN 950B(AND2CXPB/AND2CXPE) supports Layer 2 protocol transparent
transmission.
Sub-interface for QinQ VLAN tag termination access to Pseudo-Wire Emulation Edge to
Edge (PWE3)/VLL means that the sub-interface for QinQ VLAN tag termination is
configured with PWE3/VLL functions. After a range of inner and outer VLAN tags are
configured on the sub-interface for QinQ VLAN tag termination of the PE, users within the
VLAN tag range are allowed to access PWE3/VLL. The packet that carries double tags is
transparently transmitted to the remote end as Layer 2 data for identification and
authentication. The remote end is often a Broadband Remote Access Server (BRAS).
Figure 5-88 shows a typical networking for PWE3/VLL access through the sub-interfaces for
QinQ VLAN tag termination.

Figure 5-88 PWE3/VLL access through the sub-interfaces for QinQ VLAN tag termination
[Figure labels: DSLAM1, DSLAM2, Switch; CE1, CE2; PE1, PE2; BRAS; ISP network; user packets marked IP 100, IP 200, IP 300, and IP xxx 1000]
5.5.2.8 VPLS Access Through a Termination Sub-interface

Virtual Private LAN Service (VPLS) access through a termination sub-interface means that
VPLS functions are configured on the termination sub-interface. By configuring the range of
double VLAN tags on the sub-interface for QinQ VLAN tag termination of the PE, the local
Virtual Switching Instance (VSI) can communicate with the remote VSI. VPLS access is
often used for communication between QinQ users of Layer 2 enterprise networks.
On a VPLS network, one Virtual Circuit (VC) link connects only a user's two VLANs that are
distributed in different places. If the user wants to connect multiple VLANs distributed in
different places, multiple VCs are required.
Because a termination sub-interface supports a VLAN range, configuring VPLS access through a
termination sub-interface allows one VC to connect users in the VLAN range. Traffic of all the
VLANs in the specified range is transmitted over this VC, which greatly reduces the VC
resources used on the public network and the configuration workload. In addition, users can
plan their own VLANs, irrespective of the Internet Service Provider's (ISP's) VLANs.
VPLS access can be configured for a sub-interface for dot1q VLAN tag termination or sub-
interface for QinQ VLAN tag termination, depending on whether the user packet received by
a PE contains one or two VLAN tags.
l If a user packet carries one VLAN tag, configure a sub-interface for dot1q VLAN tag
termination for VPLS access.
l If a user packet carries double VLAN tags, configure a sub-interface for QinQ VLAN
tag termination for VPLS access.
VPLS Access Through a Sub-interface for Dot1q VLAN Tag Termination

Figure 5-89 shows a typical networking for VPLS access through a sub-interface for dot1q
VLAN tag termination.
Figure 5-89 VPLS access through a sub-interface for dot1q VLAN tag termination
[Figure labels: CE1, CE2, CE3; PE1, PE2, PE3, each with VSI1; ISP network; dot1q termination on PE1 and PE2; user packets marked IP 100, IP 200, IP 3000, and IP xxx]
VPLS supports point-to-multipoint (P2MP) connections and forwards data by learning MAC
addresses. VPLS access through a sub-interface for dot1q VLAN tag termination therefore
relies on MAC address learning based on a single VLAN tag. Note that there are no
restrictions on VLAN tags for VPLS access.
VPLS Access Through a Sub-interface for QinQ VLAN Tag Termination

Figure 5-90 shows a typical networking for VPLS access through a sub-interface for QinQ
VLAN tag termination.
Figure 5-90 VPLS access through a sub-interface for QinQ VLAN tag termination
[Figure labels: CE1, CE2, CE3; PE1, PE2, PE3, each with VSI1; ISP network; QinQ termination on PE1 and PE2; user packets marked IP 100, IP 200, IP 3000, and IP xxx 1000]
VPLS supports point-to-multipoint (P2MP) connections and forwards data by learning MAC
addresses. VPLS access through a sub-interface for QinQ VLAN tag termination therefore
relies on MAC address learning based on double VLAN tags. Note that there are no
restrictions on VLAN tags for VPLS access.
5.5.2.9 PWE3 or VLL Access Through QinQ Stacking Sub-interfaces

To allow multiple users to access the network through the same physical interface, you can
use the VLAN-based QinQ stacking function on different sub-interfaces. This function
requires that the CE-VLANs on PE1 and PE2 be the same.
On the network shown in Figure 5-91, stacking is performed on sub-interfaces for packets
whose VLAN IDs range from 1 to 200. Outer VLAN tags of the ISP network are added to these
packets, which are then sent to the VLL or PWE3.

Figure 5-91 PWE3 or VLL access through QinQ stacking sub-interfaces
[Figure labels: CE1, PE1, ISP network, PE2, CE2; inner VLAN IDs 1 to 200 with outer VLAN ID 1000 on each side]
5.5.2.10 VPLS Access Through QinQ Stacking Sub-interfaces

To access an Internet Service Provider (ISP) network over a virtual private LAN service
(VPLS) network, you can configure QinQ stacking on a routing sub-interface. On the
network shown in Figure 5-92, the sub-interfaces perform stacking on packets whose
VLAN tags range from 1 to 200 and add outer VLAN tags of the ISP network to these
packets. The sub-interfaces then access the VPLS network after being bound to a VSI.

Figure 5-92 VPLS access through QinQ stacking sub-interfaces
[Figure labels: CE1, CE2, CE3; PE1, PE2, PE3, each with VSI1; IP/MPLS network; inner VLAN IDs 1 to 200 with outer VLAN ID 1000 at each site]
5.5.3 Applications

5.5.3.1 Public User Services on an ME Network
Figure 5-93 QinQ on an ME network
[Figure labels: core network; two NPEs running VRRP; Metro Ethernet; UPE; outer VLANs 1000/1001, 2000/2001, and 3000/3001 paired with inner VLANs 1XX, 3XX, and 5XX; user VLANs 101, 301, and 501; PVC101, PVC301, and PVC501; HSI, VoIP, and IPTV services]
On the network shown in Figure 5-93, DSLAMs support multiple Permanent Virtual Channel
(PVC) access. One user uses multiple services, such as HSI, IPTV and VoIP.
PVCs are used to carry services that are assigned different VLAN ID ranges. Table 5-30
lists the VLAN ID ranges for each service.
Table 5-30 Mapping between services and VLAN IDs

Service Name    Full Name                       Range of VLAN IDs
HSI             high-speed Internet             101 to 300
VoIP            voice over IP                   301 to 500
IPTV            Internet Protocol television    501 to 700
If a user needs to use the VoIP service, user VoIP packets are sent to a DSLAM over a
specified PVC and assigned with VLAN ID 301. When the packets reach the UPE, an outer
VLAN ID (for example, 2000) is added to the packets. The inner VLAN ID (301) represents
the user, and the outer VLAN ID (2000) represents the VoIP service (the DSLAM location
can also be marked if different DSLAMs add different VLAN tags to packets). The UPE then
sends the VoIP packets to the NPE where the double VLAN tags are terminated. Then, the
NPE sends the packets to an IP core network or a Virtual Private Network (VPN).

HSI and IPTV services are processed in the same way. The difference is that QinQ
termination of HSI services is implemented on the BRAS.
The NPE can also perform HQoS scheduling based on the double tags and generate a
Dynamic Host Configuration Protocol (DHCP) binding table to avoid network attacks. In
addition, the NPE can implement DHCP authentication based on the double tags or other
information and enable QinQ Virtual Router Redundancy Protocol (VRRP) to ensure reliable
service access.
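As a rough illustration of how the double tags in this scenario can be interpreted, the following Python sketch looks up the service from the inner VLAN ID by using the ranges in Table 5-30. The function name describe_packet is hypothetical, and the outer VLAN ID 2000 is the example value used in the text.

```python
SERVICE_RANGES = {  # inner VLAN ID ranges from Table 5-30
    "HSI": (101, 300),
    "VoIP": (301, 500),
    "IPTV": (501, 700),
}


def describe_packet(outer_vlan, inner_vlan):
    """Return the service implied by the inner VLAN ID, with both tags."""
    for service, (low, high) in SERVICE_RANGES.items():
        if low <= inner_vlan <= high:
            return {"service": service, "user_vlan": inner_vlan, "outer_vlan": outer_vlan}
    return None  # inner VLAN ID is outside all planned ranges


voip = describe_packet(2000, 301)
```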
5.5.3.2 Enterprise User Communication Through Private Lines

As shown in Figure 5-94, an enterprise has two sites in different places. Each site has three
networks: finance, sales and others. To ensure network security, users of different networks
cannot communicate with each other.
Figure 5-94 Enterprise user communication through private lines

ME VPLS ME
UPE NPE NPE UPE

Finance
Others Finance Others
Marketing
VLAN200 Marketing
VLAN200
The carrier uses VPLS on the MPLS/IP core network and QinQ on the ME network. Each site
is configured with three VLANs: VLANs 100, 200, and 300 represent the finance, sales, and
other departments, respectively. The UPE encapsulates packets with outer VLAN 1000
(different UPEs can add different outer VLAN tags). The Virtual Switching Instance
(VSI) on the NPE is in symmetry mode. With this configuration, only users of the same
VLAN at different sites can communicate with each other.


Terms
Term                             Description
QinQ interface                   An interface that can process VLAN frames with a single tag (dot1q
                                 termination or VLAN mapping) or with double tags (QinQ termination
                                 or VLAN stacking).
QinQ termination sub-interface   An interface that can identify single or double VLAN tags and then
                                 strips the tags or sends the packets according to the subsequent
                                 forwarding.

Abbreviation
DHCP Dynamic Host Configuration Protocol
HSI high-speed Internet
IPTV Internet Protocol television
PVC permanent virtual connection
PWE3 pseudo wire emulation edge-to-edge
QinQ 802.1Q in 802.1Q
QinQ Termination 802.1Q in 802.1Q termination
Selective QinQ Selective 802.1Q in 802.1Q
VLAN Stacking virtual local area network stacking
VLL virtual leased line
VoIP voice over IP
VPLS virtual private LAN service
VSI virtual switch instance

5.6 RRPP
NOTE
Only ATN 910/ATN 910B/ATN 910I/ATN 950B support this feature.
5.6.1 Introduction
Definition
The Rapid Ring Protection Protocol (RRPP) is a Huawei-specific link layer protocol that
prevents loops on an Ethernet ring network. RRPP enables devices to detect loops by
exchanging RRPP packets and to eliminate loops by blocking certain interfaces.
Purpose
Most MANs and LANs use ring networks to achieve high reliability. If a backup link is
deployed, the failure of any node or link on the ring does not affect services.
When a device or link fails, it takes time for traffic to switch to a backup device or
link. To reduce convergence time and remove the impact of network scale on convergence
time, Huawei developed RRPP. Compared with other Ethernet ring technologies, RRPP has
the following features (see also Table 5-31):
l The convergence speed is rapid: The convergence time of Layer 2 traffic is less than 50
milliseconds (ms) and that of Layer 3 traffic is less than 200 ms.
l Convergence time is irrelevant to the number of nodes on the ring network. Therefore,
RRPP can be applied to a network with a great diameter.
l RRPP can prevent broadcast storms caused by loops when an Ethernet ring network is
complete.
l On an Ethernet ring network, when a link is torn down, a backup link immediately starts
to resume the normal communication between nodes.
l The cost is low.
Table 5-31 compares RRPP with other ring network protocols.
Table 5-31 Comparison of RRPP with other ring network protocols
Ring Network Protocol: Token ring
Description: Token ring is the first ring technology on data communication networks and
adopts a single-direction ring structure based on MAC layer protocols.
Disadvantage: Token ring does not have a self-healing capability. Token ring is of low
speed, so it applies only to local area networks (LANs).

Ring Network Protocol: Fiber Distributed Data Interface (FDDI)
Description: FDDI is an enhancement of token ring and adopts the double-ring structure.
FDDI uses a token to control a ring network. FDDI uses fibers to transmit data, providing
higher performance and efficiency compared with token ring.
Disadvantage: FDDI does not have a self-healing capability. FDDI does not consider
bandwidth consumption because FDDI uses the source address stripping technique.

Ring Network Protocol: Synchronous Digital Hierarchy/Synchronous Optical Network (SDH/SONET)
Description: SDH/SONET is a widely applied ring technology that supports both single ring
and double rings. SDH/SONET features high reliability and provides Automatic Protection
Switching (APS), which is a mechanism of automatic fault recovery.
On an SDH/SONET, the bandwidth between two nodes is fixed, which is determined by the
point-to-point (P2P) structure and the circuit switching design. The bandwidth cannot
adapt itself to changing traffic, which impedes the efficient utilization of bandwidth.
Therefore, the SDH/SONET technology cannot fully meet the bandwidth requirements of
burst IP traffic.
Disadvantage: Broadcast packets and multicast packets on the SDH/SONET are transmitted
as unicast packets. The bandwidth is therefore severely wasted. The APS feature requires
a maximum of 50% redundancy bandwidth, which makes a flexible selection mechanism
impossible.

Ring Network Protocol: Spanning Tree Protocol (STP)/Rapid Spanning Tree Protocol (RSTP)/
Multiple Spanning Tree Protocol (MSTP)
Description: On an MSTP network, a loop-free tree is formed. Therefore, broadcast storms
are eliminated, and backup is implemented.
Multiple spanning trees carry out load balancing among VLANs. In this case, traffic of
different VLANs is transmitted along different paths.
STP/RSTP/MSTP is a protocol with the automatic calculation function and supports any
topology.
Disadvantage: The convergence time is affected by the network topology.

Ring Network Protocol: RRPP
Description: RRPP features fast convergence. The convergence time is irrelevant to the
number of nodes on a ring network.
Disadvantage: RRPP is a Huawei-specific protocol and can be enabled only on networks
deployed with pure Huawei devices.
5.6.2 Principles

Ethernet devices can be configured as nodes with different roles on an RRPP ring. RRPP ring
nodes exchange and process RRPP packets to detect the status of the ring network and
communicate any topology changes throughout the network. The master node on the ring
blocks or unblocks the secondary port depending on the status of the ring network. If a device
or link on the ring network fails, the backup link is immediately enabled to restore
communication; when the ring is complete, the secondary port is blocked to eliminate loops.
RRPP Network Architecture

As shown in Figure 5-95, an RRPP domain consists of the following elements.
Figure 5-95 RRPP domain
RRPP Domain
Master Node
CX-B
Edge Node
RRPP Sub-Ring 1 Transit Node
CX-A
RRPP Major-Ring
Assistant Node CX-C Master Node
Master Node Transit Node

RRPP Sub-Ring 2

l RRPP domain
Each RRPP domain is uniquely identified by an integer ID.
An RRPP domain consists of a group of switches that are mutually connected and
configured with the same domain ID and control VLAN.
Network elements that form an RRPP domain are as follows:
– RRPP major ring
– RRPP sub-ring
– Control VLAN
– Master node
– Transit node
– Edge node
– Assistant edge node
– Common port
– Edge port
– Primary port
– Secondary port
l RRPP ring
Physically, an RRPP ring corresponds to an Ethernet ring topology. Each RRPP ring is a
part of an RRPP domain.
An RRPP domain can comprise a single RRPP ring or a major ring plus multiple sub-
rings.
Sub-ring protocol packets are transmitted through the major ring as data packets; major
ring protocol packets are transmitted only within the major ring.
NOTE
An RRPP domain can only have one RRPP major ring.
l Control VLAN
In an RRPP domain, a control VLAN is used to transmit only RRPP packets, whereas a
data VLAN is used to transmit data packets. A data VLAN can contain both the RRPP
and non-RRPP ports.
Each RRPP domain is configured with two control VLANs: major control VLAN and
sub-control VLAN. A major control VLAN belongs to a major ring, whereas a sub-
control VLAN belongs to a sub-ring. During the configuration, you need only to specify
a major control VLAN, and the system will automatically take the VLAN with an ID 1
greater than the major control VLAN ID as a sub-control VLAN.
Protocol packets of the major ring are transmitted in the major control VLAN; protocol
packets of the sub-ring are transmitted in the sub-control VLAN. Interfaces of both the
major control VLAN and the sub-control VLAN cannot be configured with VLANIF
interfaces.
On each switch, the port connected to an RRPP ring network belongs to a control
VLAN.
l Node type
Each device on an Ethernet ring is a node. Nodes on the RRPP ring are classified into
the following types:
– Master node

The master node determines how to handle topology changes. Each RRPP ring must
have only one master node.
Any device on the Ethernet ring can serve as the master node.
The status of the master node can be either Complete or Failed.
When all links on the ring network are in the Up state and the master node can
receive Hello packets sent by itself over the secondary port, the master node is in
the Complete state.
The status of the master node represents the status of the RRPP ring. When the
master node is in the Complete state, the RRPP ring is also in the Complete state. In
this situation, the master node blocks the secondary port to prevent data packets
from forming broadcast loops on the ring topology. After being blocked, the
secondary port can receive RRPP protocol packets, but cannot transmit data
packets.
When a link on the ring network is in the Down state, the master node is in the
Failed state. In this situation, the master node unblocks the secondary port to ensure
uninterrupted communication between nodes on the ring network.
– Transit node
On an RRPP ring, all nodes except the master node are transit nodes. Each transit
node monitors the status of its directly connected RRPP link and notifies the master
node of any changes in link status.
The status of transit nodes can be Link-Up, Link-Down, or Preforwarding.
When both the primary and secondary ports of a transit node are in the Up state, the
transit node is in the Link-Up state.
When either the primary port or secondary port of a transit node is in the Down
state, the transit node is in the Link-Down state.
When either the primary port or secondary port of a transit node is in the Blocked
state, the transit node is in the Preforwarding state.
As shown in Figure 5-96, when a transit node in the Link-Up state detects that the
link of the primary port or secondary port turns Down, the transit node switches to
the Link-Down state and sends a Link-Down packet to notify the master node.
The transit node never directly switches back to the Link-Up state from the Link-
Down state. When the link on a port of the transit node in the Link-Down state turns
Up and the primary port and secondary port return to the Up state, the transit node
switches to the Preforwarding state and blocks the recovered port. When the
primary and secondary ports go Up, the master node does not immediately detect
the change, and the secondary port therefore remains unblocked. If the transit node
immediately switches back to the Link-Up state, broadcast loops formed by data
packets occur on the ring network. Therefore, the transit node first enters
Preforwarding from the Link-Down state.
When a port on the transit node in the Preforwarding state goes Down, the transit
node enters the Link-Down state. When an interface on the transit node in the
Preforwarding state goes Up and the transit node receives a COMPLETE-FLUSH-
FDB packet from the master node, the transit node enters the Link-Up state. If the
COMPLETE-FLUSH-FDB packet is lost during the transmission, RRPP provides a
backup mechanism to unblock temporarily blocked ports and trigger the state
transition. Specifically, the transit node automatically changes to Link-Up and
unblocks the temporarily blocked port.

Figure 5-96 Transit node state transition

[Figure labels: states Link-Up, Link-Down, and Preforwarding; transition conditions: primary port up and secondary port up; primary port down or secondary port down; primary port blocked or secondary port blocked; primary port up or secondary port up; received COMPLETE-FLUSH-FDB]
– Edge node and assistant edge node

A switch can serve as an edge node or assistant edge node on the sub-ring, and as a
transit node on the major ring.
n Edge node
On an RRPP sub-ring, either of the two nodes crossed with the major ring can
be specified as an edge node.
Each sub-ring must have only one edge node.
n Assistant edge node
On an RRPP sub-ring, if one of the two nodes crossed with the major ring is
specified as an edge node, the other node is the assistant edge node.
Each sub-ring must have only one assistant edge node.
Edge nodes and assistant edge nodes are special transit nodes. Therefore, they have
the same three states as transit nodes, but with slightly different meanings.
If an edge port is in the Up state, the edge node or assistant edge node is actually in
the Link-Up state.
If an edge port is in the Down state, the edge node or assistant edge node is actually
in the Link-Down state.
If an edge port is in the Blocked state, the edge node or assistant edge node is
actually in the Preforwarding state.
The state transition of an edge node or assistant edge node is similar to that of a
transit node. The difference is that if the link status change of an edge node or
assistant edge node causes the state transition, only the status of the edge port
changes.
l Port role
– Primary port and secondary port
On both the master node and transit node, one of the two ports connected to the
Ethernet ring is the primary port, and the other is the secondary port. You can
specify roles for the two ports.

The primary and secondary ports of the master node perform different functions.
The master node sends a Hello packet from the primary port. If the secondary port
can receive this packet, the RRPP ring of the node is complete. The master node
must block the secondary port to prevent a data loop.
However, if the packet is not received within the specified period, the RRPP ring is
faulty. The master node must unblock the secondary port to guarantee normal
communication between nodes on the ring.
If the secondary port on the master node of the major ring is blocked, both the data
packets and protocol packets of the sub-rings are prevented from passing through
the port. When the secondary port is unblocked, both the data packets and protocol
packets of the sub-rings are permitted to pass through the port. On the major ring,
protocol packets of the sub-rings are processed as data packets.
The primary and secondary ports of the transit node provide the same function.
– Common port and edge port
On an edge node or an assistant edge node, the port shared by the sub-ring and
major ring is called the common port. The port only on the sub-ring is called the
edge port.
A common port is regarded as a port on the major ring and belongs to both the
major control VLAN and the sub-control VLAN. The RRPP port on the sub-ring
only belongs to the sub-control VLAN. The major ring is regarded as a logical node
of the sub-ring, and packets of the sub-ring are transparently transmitted through the
major ring. However, the packets of the major ring are transmitted only in the major
ring.
RRPP Packets
Table 5-32 RRPP packet types
Packet Type          Description
HEALTH (HELLO)       A packet sent from the master node to detect whether a loop exists on
                     a network.
LINK-DOWN            A packet sent from a transit, edge, or assistant edge node to notify the
                     master node that a port has gone Down and the loop has disappeared.
COMMON-FLUSH-FDB     A packet sent from the master node to instruct the transit, edge, or
                     assistant edge node to update its MAC address forwarding table, ARP
                     entries, and ND entries.
COMPLETE-FLUSH-FDB   A packet sent from the master node to instruct the transit, edge, or
                     assistant edge node to update its MAC address forwarding table, ARP
                     entries, and ND entries.
                     In addition, this packet instructs the transit node to unblock the
                     temporarily blocked ports.
EDGE-HELLO           A packet sent from an edge port of a sub-ring and received by an
                     assistant edge port on the same sub-ring. The packet is used to check
                     the completeness of the major ring in the domain where the sub-ring
                     is located.
MAJOR-FAULT          A packet sent from an assistant edge node to notify the edge node that
                     the major ring in the domain fails if the assistant edge node does not
                     receive the Edge-Hello packet from the edge port within a specified
                     period.
Figure 5-97 RRPP packet format
0 7 8 15 16 23 24 31 32 39 40 47
Destination MAC address (6 bytes)
Source MAC address (6 bytes)
EtherType PRI VLAN ID Frame Length
DSAP/SSAP CONTROL OUI = 0x00e02b
0x00bb 0x99 0x0b RRPP Length
RRPP_VER RRPP TYPE Domain ID Ring ID
0x0000 SYSTEM_MAC_ADDR (6 bytes)
HELLO_TIMER FAIL_TIMER
0x00 LEVEL HELLO_SEQ 0x0000
RESERVED(0x000000000000)
RESERVED(0x000000000000)
RESERVED(0x000000000000)
RESERVED(0x000000000000)
RESERVED(0x000000000000)
RESERVED(0x000000000000)
The following describes the fields in an RRPP packet:
l Destination MAC Address: indicates the destination MAC address of an RRPP packet.
This field occupies 48 bits.
l Source MAC Address: indicates the source MAC address of an RRPP packet. This field
occupies 48 bits and is the bridge MAC address of a device.
l EtherType: indicates the encapsulation type. This field occupies 16 bits and has a fixed
value of 0x8100 for tagged encapsulation.
l PRI: indicates the priority of Class of Service (COS). This field occupies 4 bits and has a
fixed value of 0xe.
l VLAN ID: indicates the ID of a VLAN to which the packet belongs. This field occupies
12 bits.
l Frame Length: indicates the length of the Ethernet frame. This field occupies 16 bits and
has a fixed value of 0x0048.

l DSAP/SSAP: indicates the destination service access point/source service access point.
This field occupies 16 bits and has a fixed value of 0xaaaa.
l CONTROL: an 8 bit field of no significance. This field has a fixed value of 0x03.
l OUI: a 24 bit field of no significance. This field has a fixed value of 0x00e02b.
l RRPP_LENGTH: indicates the length of an RRPP data unit. This field occupies 16 bits
and has a fixed value of 0x0040.
l RRPP_VERS: indicates the version of an RRPP packet. This field occupies 8 bits, and
the current version is 0x01.
l RRPP TYPE: indicates the type of an RRPP packet. This field occupies 8 bits.
– HEALTH = 0x05
– COMPLETE-FLUSH-FDB = 0x06
– COMMON-FLUSH-FDB = 0x07
– LINK-DOWN = 0x08
– EDGE-HELLO = 0x0a
– MAJOR-FAULT= 0x0b
l DOMAIN_ID: indicates the ID of the RRPP domain to which the packet belongs. This
field occupies 16 bits.
l RING_ID: indicates the ID of the RRPP ring to which the packet belongs. This field
occupies 16 bits.
l SYSTEM_MAC_ADDR: indicates the bridge MAC address from which the packet is
sent. This field occupies 48 bits.
l HELLO_TIMER: indicates the timeout period of the Hello timer on the node that sends
the packet, in seconds. This field occupies 16 bits.
l FAIL_TIMER: indicates the timeout period of the Fail timer on the node that sends the
packet, in seconds. This field occupies 16 bits.
l LEVEL: indicates the level of the RRPP ring to which the packet belongs. This field
occupies 8 bits.
l HELLO_SEQ: indicates the sequence number of the Hello packet. This field occupies 16
bits.
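The field list above corresponds to a fixed 90-byte layout (15 rows of 6 bytes in Figure 5-97). The following Python sketch packs and unpacks that layout with the standard struct module. The function names, dictionary keys, and the MAC addresses in the usage note are illustrative assumptions, and the sketch assumes that SYSTEM_MAC_ADDR carries the same bridge MAC address as the source MAC field.

```python
import struct

# One struct item per field in Figure 5-97, in network byte order.
RRPP_FORMAT = "!6s6sHHHHB3sHBBHBBHHH6sHHBBHH36s"

RRPP_TYPES = {
    0x05: "HEALTH",
    0x06: "COMPLETE-FLUSH-FDB",
    0x07: "COMMON-FLUSH-FDB",
    0x08: "LINK-DOWN",
    0x0A: "EDGE-HELLO",
    0x0B: "MAJOR-FAULT",
}


def build_rrpp(dst, src, vlan, rrpp_type, domain, ring, hello, fail, level, seq):
    """Pack an RRPP frame using the fixed values listed in the field description."""
    tci = (0xE << 12) | (vlan & 0x0FFF)  # 4-bit PRI (0xe) followed by 12-bit VLAN ID
    return struct.pack(
        RRPP_FORMAT,
        dst, src,
        0x8100, tci, 0x0048,                    # EtherType, PRI/VLAN ID, Frame Length
        0xAAAA, 0x03, bytes.fromhex("00e02b"),  # DSAP/SSAP, CONTROL, OUI
        0x00BB, 0x99, 0x0B, 0x0040,             # fixed bytes from the figure, RRPP_LENGTH
        0x01, rrpp_type, domain, ring,          # version, type, domain ID, ring ID
        0x0000, src,                            # padding, SYSTEM_MAC_ADDR (assumed = src)
        hello, fail,                            # HELLO_TIMER, FAIL_TIMER in seconds
        0x00, level, seq, 0x0000,               # padding, LEVEL, HELLO_SEQ, padding
        bytes(36),                              # six reserved 6-byte rows
    )


def parse_rrpp(frame):
    """Unpack an RRPP frame into a dictionary of the variable fields."""
    f = struct.unpack(RRPP_FORMAT, frame)
    return {
        "vlan_id": f[3] & 0x0FFF,
        "type": RRPP_TYPES.get(f[13], "UNKNOWN"),
        "domain_id": f[14],
        "ring_id": f[15],
        "system_mac": "-".join(f"{b:02x}" for b in f[17]),
        "hello_timer": f[18],
        "fail_timer": f[19],
        "level": f[21],
        "hello_seq": f[22],
    }
```

For example, `parse_rrpp(build_rrpp(bytes(6), bytes.fromhex("00e0fc000001"), 100, 0x05, 1, 1, 1, 3, 0, 7))` returns a dictionary whose "type" entry is "HEALTH". The all-zero destination MAC and the source MAC are placeholders chosen only for the example.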
5.6.2.2 RRPP Implementation
Polling Mechanism
l Hello timer and fail timer
When RRPP detects the link status of the Ethernet ring through the Polling mechanism,
the master node sends Hello packets according to the Hello timer and checks whether the
secondary port receives Hello packets within a set period according to the Fail timer.
Then, the master node determines whether to unblock the secondary port.
– The value of the Hello timer specifies the interval at which the master node sends
Hello packets from the primary port.
– The value of the Fail timer specifies the maximum delay between the primary
port sending a Hello packet and the secondary port receiving it.
When the link is faulty, RRPP fast convergence enables the transit node in the Link-Up
state to immediately instruct the master node, through the Link-Down packet, to unblock
the secondary port. When the link recovers, the master node instructs the transit node,
through the COMPLETE-FLUSH-FDB packet, to unblock the temporarily blocked port,
irrespective of the values of the Hello and Fail timers.
The Fail timer on the transit node sets the time after which the temporarily blocked port
is unblocked if no COMPLETE-FLUSH-FDB packet is received.
l Process of the polling mechanism
The polling mechanism is used by the master node on an RRPP ring to detect the
network status. The process of the polling mechanism is as follows:
a. The master node periodically sends Hello packets from the primary port based on
the Hello timer.
b. Hello packets are transmitted through all transit nodes on the ring.
n If the secondary port of the master node receives the Hello packet before the
Fail timer expires, the master node believes that the ring is in the Complete
state.
n If the secondary port of the master node does not receive the Hello packet
before the Fail timer expires, the master node believes that the ring is in the
Failed state.
When the secondary port on the master node in the Failed state receives the Hello
packet, the master node performs the following operations.
n Goes to the Complete state.
n Blocks the secondary port.
n Refreshes the FDB.
n Sends packets from the primary port to instruct all transit nodes to unblock
temporarily blocked ports and refresh FDBs.
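The polling behavior of the master node can be sketched as a small state machine. This is a minimal illustration, not the device implementation: the class name, the method names, the default timer values, and the use of a `sent` list to record outgoing packets are all assumptions made for the example.

```python
from enum import Enum


class RingState(Enum):
    COMPLETE = "Complete"
    FAILED = "Failed"


class MasterNode:
    """Tracks ring state from Hello packets seen on the secondary port."""

    def __init__(self, hello_timer=1, fail_timer=3):
        self.hello_timer = hello_timer    # seconds between Hello packets (assumed value)
        self.fail_timer = fail_timer      # seconds to wait for a Hello (assumed value)
        self.state = RingState.COMPLETE
        self.secondary_blocked = True     # blocked while the ring is Complete
        self.since_last_hello = 0
        self.sent = []                    # control packets the master would send

    def tick(self, seconds=1):
        """Advance time; the ring is Failed if no Hello arrives before the Fail timer expires."""
        self.since_last_hello += seconds
        if self.state is RingState.COMPLETE and self.since_last_hello >= self.fail_timer:
            self.state = RingState.FAILED
            self.secondary_blocked = False
            self.sent.append("COMMON-FLUSH-FDB")

    def on_hello_at_secondary(self):
        """A Hello packet sent from the primary port reached the secondary port."""
        self.since_last_hello = 0
        if self.state is RingState.FAILED:
            self.state = RingState.COMPLETE
            self.secondary_blocked = True
            self.sent.append("COMPLETE-FLUSH-FDB")
```

With `fail_timer=3`, calling `tick(3)` without an intervening Hello moves the node to the Failed state and unblocks the secondary port; the next `on_hello_at_secondary()` call returns it to the Complete state.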
Link Fault Notification Mechanism

Devices serving as master nodes on an RRPP ring need to quickly detect link status changes on
the ring. Master nodes can quickly detect faults only on their directly connected links. To
detect faults on indirectly connected links, master nodes require the link fault notification
mechanism. When links recover, a link recovery mechanism is used to notify the master
nodes.
l Link fault notification mechanism

The link fault notification mechanism works as follows:
a. If a link on an RRPP ring fails, the port directly connected to the link becomes
Down.
b. The transit node immediately sends a Link-Down packet to the master node to
report the change of the link status.
c. When receiving the Link-Down packet, the master node knows that the ring has failed,
immediately unblocks the secondary port, and sends the COMMON-FLUSH-FDB packet to
notify other transit nodes to refresh FDBs.
d. After other transit nodes refresh their FDBs, the data stream is switched to the
normal link.
As shown in Figure 5-98, if a link on the ring goes Down, the two directly-connected
devices immediately send LINK-DOWN packets from the other ports to the master node.
The master node unblocks the blocked secondary port after receiving the LINK-DOWN
packets.

l Link recovery mechanism

The link recovery mechanism works as follows:
a. If the faulty link is recovered, the port of the transit node changes to the Up state.
b. The transit node temporarily blocks the recovered port. However, the Hello packet
sent from the master node can pass through the temporarily blocked port.
c. After the secondary port on the master node receives the Hello packet, the master
node considers that the ring recovers to the healthy state.
d. The master node blocks the secondary port and sends packets to instruct all transit
nodes to unblock temporarily blocked ports and refresh FDBs.
In Figure 5-98, when the faulty link is recovered, the transit nodes at the two ends
temporarily block the recovered ports. However, the Hello packet sent from the master
node can pass through the temporarily blocked ports. After the secondary port on the
master node receives the Hello packet, the master node considers that the ring has recovered
to the healthy state. Then, the master node blocks the secondary port and sends the
COMPLETE-FLUSH-FDB packet to instruct transit nodes to unblock the temporarily
blocked ports and refresh FDBs. The data stream is then switched back from the backup
link to the normal path, and the ring returns to the normal state.
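Figure 5-98 Link recovery
[Figure: an RRPP ring with a master node and two transit nodes. The figure marks the link failure, the LINK-DOWN packets, the data packets, and the secondary port of the master node changing from Block to Forward. P = primary port, S = secondary port.]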
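The fault notification and recovery steps above can be sketched from the transit node's point of view. The state names (Link-Up, Link-Down, Preforwarding) come from this document; the class, its method names, and the `sent` and `fdb_flushes` bookkeeping are hypothetical and exist only to make the transitions visible.

```python
class TransitNode:
    """Minimal model of a transit node reacting to port and protocol events."""

    def __init__(self):
        self.state = "Link-Up"
        self.blocked_port = None   # temporarily blocked port, if any
        self.sent = []             # packets sent toward the master node
        self.fdb_flushes = 0       # number of FDB refreshes performed

    def on_port_down(self, port):
        # A directly connected link failed: report it immediately.
        self.state = "Link-Down"
        self.sent.append("LINK-DOWN")

    def on_port_up(self, port):
        # The link recovered: block the port until the master confirms the ring.
        self.state = "Preforwarding"
        self.blocked_port = port

    def on_common_flush_fdb(self):
        self.fdb_flushes += 1

    def on_complete_flush_fdb(self):
        self.fdb_flushes += 1
        self._unblock()

    def on_fail_timer_expired(self):
        # Fallback when the COMPLETE-FLUSH-FDB packet never arrives.
        self._unblock()

    def _unblock(self):
        if self.state == "Preforwarding":
            self.state = "Link-Up"
            self.blocked_port = None
```

A node that sees `on_port_down`, then `on_port_up`, then `on_complete_flush_fdb` ends in the Link-Up state with no blocked port, which matches the recovery sequence described above.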
Checking the Channel Status of the Sub-Ring Protocol Packets on the Major
Ring
On a network where multiple sub-rings intersect the major ring, check the channel status of
the sub-ring protocol packets on the major ring to prevent loops among sub-rings after the
master nodes on the sub-rings unblock their secondary ports.
In Figure 5-99, if the common link between the major ring and sub-ring is faulty and at least
one non-common link is faulty, the master node of each sub-ring unblocks its secondary port
("S" in the figure) because the secondary port no longer receives the Hello packet. Broadcast
loops (blue dashed lines in the figure) may occur between sub-rings. To prevent loops, check
the channel status of the sub-ring protocol packets on the major ring.

Figure 5-99 Loop formation between sub-rings
[Figure: a major ring (master, transit, edge, and assistant edge nodes) intersected by Sub Ring 1 and Sub Ring 2, each with its own sub-ring master node. The figure marks the blocked ports, the EDGE-HELLO and MAJOR_FAULT packets, and the loop that can form if the edge ports are not blocked. P = primary port, S = secondary port.]
The mechanism works as follows:

1. Check the channel status of sub-ring protocol packets on the major ring.
The edge nodes on a sub-ring periodically send Edge-Hello packets to the major ring
through two RRPP ports on the major ring. The Edge-Hello packets pass through all the
nodes on the ring before reaching the assistant edge node. After receiving the Edge-Hello
packets, the assistant edge node does not forward the packets.
As shown in Figure 5-100, the edge node in the major ring sends Edge-Hello packets to
the major ring through two RRPP ports.
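Figure 5-100 Sending Edge-Hello packets from an edge node
[Figure: a major ring and a sub-ring that share an edge node and an assistant edge node. The figure marks the EDGE-HELLO packets, the data packets, and the blocked secondary ports of the master nodes. P = primary port, S = secondary port.]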
If the assistant edge node receives the Edge-Hello packets within the specified period,
the packet channel is normal. Otherwise, the channel is faulty.
2. The channel breaks off and the edge node blocks the edge port.
After the assistant edge node detects that the channel for sub-ring protocol packets
breaks off, the assistant edge node immediately sends the Major-Fault packet to the edge
node through the sub-ring. After receiving the Major-Fault packet, the edge node blocks
its edge port.
As shown in Figure 5-101, the assistant edge node sends the Major-Fault packet to the
edge node through the sub-ring.
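Figure 5-101 Blocking the edge port in response to the Major-Fault packet received on
edge node
[Figure: the same major ring and sub-ring. The figure marks the MAJOR-FAULT packet, the data packets, the blocked edge port on the edge node, and a master node secondary port that is still blocked. P = primary port, S = secondary port.]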
3. The master node of the sub-ring unblocks the secondary port after the Fail timer
expires.
After the edge node blocks its edge port, the channel for sub-ring protocol packets remains
broken because of the failure in the major ring. Therefore, the master node of the sub-ring
cannot receive the Hello packet within the specified period. The master node
changes to the Failed state and unblocks the secondary port.
As shown in Figure 5-102, the edge node blocks its edge port. The master node of the
sub-ring unblocks the secondary port that is blocked in Figure 5-101.

Figure 5-102 Sub-ring failed due to the blocked channel on the major ring
[Figure: the same major ring and sub-ring. The edge port on the edge node is blocked and no master node secondary port is marked as blocked; data packets are shown. P = primary port, S = secondary port.]
4. The channel for the sub-ring protocol packets is recovered.

In Figure 5-103, after the links on the major ring are restored, the communication
between the edge node and assistant edge node recovers. Then, the channel for the sub-
ring protocol packets is recovered. The secondary port on the sub-ring can receive the
Hello packets sent from the master node. Then, the master node goes Complete and
blocks the secondary port.
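Figure 5-103 Recovery of the channel of the sub-ring protocol packets
[Figure: the same major ring and sub-ring. The figure marks the HELLO packets, the blocked edge port on the edge node, and a blocked master node secondary port. P = primary port, S = secondary port.]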
In Figure 5-104, the master node on the sub-ring sends the COMPLETE-FLUSH-FDB
packets. After receiving the packets, the edge node unblocks the edge port.

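Figure 5-104 Unblocking the edge port on the edge node of the sub-ring
[Figure: the same major ring and sub-ring. The figure marks the COMPLETE-FLUSH-FDB packets and a blocked master node secondary port; the edge port is no longer blocked. P = primary port, S = secondary port.]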
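The channel check in steps 1 to 4 can be sketched with two cooperating objects. This is an illustrative model only: the class and method names and the `edge_hello_timeout` parameter (standing in for the "specified period") are assumptions, and packets are represented by plain strings.

```python
class AssistantEdgeNode:
    """Watches for Edge-Hello packets arriving over the major ring."""

    def __init__(self, edge_hello_timeout):
        self.edge_hello_timeout = edge_hello_timeout  # hypothetical name for the period
        self.elapsed = 0
        self.channel_ok = True

    def on_edge_hello(self):
        self.elapsed = 0
        self.channel_ok = True

    def tick(self, seconds):
        """Return "MAJOR-FAULT" once when the channel is first detected as broken."""
        self.elapsed += seconds
        if self.channel_ok and self.elapsed >= self.edge_hello_timeout:
            self.channel_ok = False
            return "MAJOR-FAULT"
        return None


class EdgeNode:
    """Blocks or unblocks its edge port based on packets from the sub-ring."""

    def __init__(self):
        self.edge_port_blocked = False

    def on_packet(self, packet_type):
        if packet_type == "MAJOR-FAULT":
            self.edge_port_blocked = True
        elif packet_type == "COMPLETE-FLUSH-FDB":
            self.edge_port_blocked = False
```

In this model the assistant edge node emits a single MAJOR-FAULT notification when the timeout is reached, and the edge node keeps its edge port blocked until the sub-ring master node sends COMPLETE-FLUSH-FDB.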
5.6.2.3 RRPP Running Principles
Single RRPP Ring

The following example describes the RRPP operation and topology convergence when the ring
status changes from Complete to Failed and then returns to Complete.
1. The ring is in the Complete state.
Figure 5-105 RRPP ring in the Complete state
[Figure: a single RRPP ring. The secondary port of the master node is blocked; HELLO packets and data packets are shown. P = primary port, S = secondary port.]
In Figure 5-105,
– If all links on the ring are in the Up state, the RRPP ring is in the Complete state.
The status of the master node reflects the health status of the ring.
– When the ring is in the Complete state, the master node blocks its secondary port to
prevent a broadcast loop.

– The master node periodically sends a Hello packet from the primary port. The Hello
packet is transmitted through all transit nodes and reaches the secondary port of the
master node.
2. A link fails.
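Figure 5-106 Transit node link fault reporting
[Figure: a single RRPP ring with a link failure. LINK DOWN packets travel toward the master node, whose secondary port is still blocked; data packets are shown. P = primary port, S = secondary port.]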
In Figure 5-106,
– When a link on an RRPP port of the transit node is faulty, the node sends a Link-
Down packet to inform the master node.
– When the master node receives the Link-Down packet, it changes its status from
Complete to Failed and unblocks the secondary port.
If the Link-Down packet is lost during transmission, the master node relies on the
Polling mechanism to restore communication between nodes. If the secondary port
of the master node does not receive a Hello packet from the primary port before
the Fail timer expires, the master node also discovers the fault and unblocks the
secondary port.
– When the network topology changes, the master node updates the forwarding
database (FDB) to ensure that packets can be sent to the correct destination. In
addition, the master node sends a COMMON-FLUSH-FDB packet from the
primary port and secondary port to instruct all transit nodes to update FDBs, as
shown in Figure 5-107.
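Figure 5-107 Master node changing to the Failed state
[Figure: a single RRPP ring with a link failure. The secondary port of the master node is no longer blocked; COMMON-FLUSH-FDB packets and data packets are shown. P = primary port, S = secondary port.]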
3. The fault is rectified.

– When an RRPP port of the transit node recovers, the transit node enters the
Preforwarding state and blocks the recovered port.
– The master node periodically sends Hello packets through the primary port. When
all the faulty links on the ring recover, the secondary port of the master node
receives the Hello packet.
– When the master node receives the Hello packet, it changes to the Complete state
and blocks the secondary port.
– The master node sends the COMPLETE-FLUSH-FDB packet through the primary
port to request all transit nodes to update their FDBs.
If the COMPLETE-FLUSH-FDB packet is lost during transmission, a standby
mechanism unblocks the temporarily blocked ports of transit nodes. If a transit node is in
the Preforwarding state and has not received a COMPLETE-FLUSH-FDB packet
from the master node before the Fail timer expires, the temporarily blocked port is
unblocked.
After receiving the COMPLETE-FLUSH-FDB packet, the transit node changes to the
Link-Up state, unblocks the temporarily blocked ports, and updates FDBs, as shown in
Figure 5-108.
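Figure 5-108 Ring recovery
[Figure: a single RRPP ring after recovery. The secondary port of the master node is blocked again; COMPLETE-FLUSH-FDB packets and data packets are shown. P = primary port, S = secondary port.]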
Multiple RRPP Rings

The operation process for multiple rings is similar to that for a single ring. The main
difference is that the channel status of the sub-ring protocol packets is checked on the major ring.
For details, see "Checking the Channel Status of the Sub-Ring Protocol Packets on the Major
Ring."
Another difference is that, in the case of multiple rings, when nodes on the sub-ring receive
COMMON-FLUSH-FDB or COMPLETE-FLUSH-FDB packets from the sub-ring, they
update FDBs to relearn the address entries. Then, re-routing is performed for the data traffic.
The transit node on the major ring unblocks temporarily blocked ports only after it receives a
COMPLETE-FLUSH-FDB packet sent from the major ring instead of the sub-ring.
5.6.3 Applications
Single RRPP Ring
Figure 5-109 shows the networking of a single RRPP ring. Normally, data flows
along the path Transit 1 -> Transit 2 -> Master. If the link between Transit 1 and Transit 2
fails, the path of the data flow on the RRPP ring changes.

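Figure 5-109 Single RRPP ring
[Figure: an RRPP domain with a master node (MSE/NPE, connected to the IP/MPLS core) and three transit nodes (Transit 1, Transit 2, Transit 3), each serving a NodeB. The secondary port of the master node is blocked; data packets are shown. P = primary port, S = secondary port.]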
l Fast switchover of Layer 2 services

After being notified of faults on the link between Transit 1 and Transit 2, the master node
immediately unblocks the secondary port.
At this time, the network topology has changed, and the original MAC address table of each
node can no longer correctly guide Layer 2 forwarding. Therefore, Layer 2 traffic is interrupted.
After unblocking the secondary port, the master node immediately requires other nodes
on the ring to relearn MAC address entries. The Layer 2 traffic on the RRPP ring is
switched onto the path of Transit 1 -> Transit 3 -> Master.
l Fast switchover of Layer 3 services
After being notified of faults on the link between Transit 1 and Transit 2, the master node
immediately unblocks the secondary port.
At this time, the network topology has changed, and the original ARP and FIB entries of each
node can no longer correctly guide Layer 3 forwarding. After unblocking the secondary port, the
master node immediately requires other nodes on the ring to relearn MAC address
entries and update ARP entries. The Layer 3 traffic on the RRPP ring is switched onto the
path of Transit 1 -> Transit 3 -> Master.
Tangent RRPP Rings

Generally, the metro Ethernet uses two-layer rings:
l One layer is the aggregation layer between the aggregation devices PE-AGGs, for
example, RRPP Domain 1 in Figure 5-110.
l The other layer is the access layer between PE-AGGs and UPEs, such as RRPP Domain
2 and RRPP Domain 3 in Figure 5-110.
As shown in Figure 5-110, in this networking, tangent RRPP rings can be adopted. That is,
the aggregation layer is the RRPP major ring; the access layer is the RRPP sub-ring.

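Figure 5-110 Application of the tangent RRPP rings
[Figure: RRPP Domain 1 is the aggregation ring formed by PE-AGGs (a master node, Transit 1, and Transit 2) and connected through an NPE to the IP/MPLS core; its secondary port is blocked. RRPP Domain 2 and RRPP Domain 3 are access rings formed by UPEs and a PE-AGG, each with its own master node. LANSwitch, CE, DSLAM, and UMG devices attach to the access layer. P = primary port, S = secondary port.]
DSLAM: digital subscriber line access multiplexer
UMG: universal media gateway
UPE: underlayer provider edge
PE: provider edge
PE-AGG: PE-aggregation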
Two tangent rings cannot belong to the same RRPP domain. The tangency point is
configured in both domains. The master node on a ring can be the tangency point.
For multiple tangent RRPP rings, the failure in one ring does not affect other domains. The
convergence process of RRPP rings in a domain is the same as that of a single ring.
Association Between RRPP and STP

As shown in Figure 5-111, RRPP can be deployed on an Ethernet network enabled with STP/
RSTP/MSTP only in tangent mode. On an Ethernet network enabled with STP/RSTP/MSTP,
any RRPP node can be selected as the master node of the RRPP ring.

Figure 5-111 Association between RRPP and STP
[Figure: an RRPP ring formed by UPEs and PE-AGGs (one node is the master) that is tangent to an STP network formed by PE-AGGs and NPEs.]
UPE: underlayer provider edge
NPE: network provider edge
PE-AGG: PE-aggregation
CE Dual-homing to a VPLS Network

In Figure 5-112, CE1 accesses PE1 and PE3 through Layer 2 ports. PE1 and PE3 connect at
Layer 2. A Layer 2 ring network forms across CE1, PE1, and PE3.
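Figure 5-112 CE Dual-homing to a VPLS network
[Figure: CE 1 is dual-homed to PE1 (master) and PE3, forming an RRPP ring. CE 2 is dual-homed to PE 2 and PE 4 and does not support local switching. PE1, PE 2, PE3, and PE 4 are connected through a VPLS network.]
CE: customer edge
PE: provider edge
VPLS: virtual private LAN service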
CE2 accesses PE2 and PE4 through Layer 2 interfaces. PE2 and PE4 connect at Layer 2. CE2
supports port isolation within a VLAN and does not support local switching. Therefore, CE2,
PE2, and PE4 cannot form a ring.
PE1, PE2, PE3, and PE4 set up a VPLS network. Therefore, a ring CE1 - PE1 - PE3 - CE1
forms. Enabling RRPP on CE1, PE1, and PE3 eliminates the loop.


Term
Term Description
RRPP Rapid Ring Protection Protocol. A link layer protocol specially used to prevent
loops on an Ethernet ring network. Devices running RRPP discover loops on the
network by exchanging information with each other, and block certain interfaces
to eliminate loops.
MSTP Multi-Spanning Tree Protocol. A spanning tree protocol defined in IEEE 802.1s. It
introduces concepts of region and instance. Based on different requirements,
MSTP divides a big network into regions where multiple spanning tree instances
(MSTIs) are created. These MSTIs are mapped to virtual LANs (VLANs), and
bridge protocol data units (BPDUs) are transmitted between network bridges.
VPLS Virtual Private LAN Service. A type of point-to-multipoint service used in public
networks. VPLS ensures that isolated user sites can be connected through
MAN/WAN and two sites can communicate as if they were in a LAN.
FDB Forwarding database. A database that includes entries for guiding multicast data
forwarding. FDBs can be layer 2 or layer 3. The layer 2 FDB refers to the MAC
table, which provides information about the MAC address and outbound interface
and directs layer 2 forwarding. The layer 3 FDB refers to the ARP table, which
provides information about the IP address and outbound interface and directs layer
3 forwarding.

Abbreviation
5.7 LLDP
5.7.1 Introduction
Definition
The Link Layer Discovery Protocol (LLDP) is a Layer 2 discovery protocol defined in the
IEEE 802.1AB standard. Each LLDP interface stores local status information in the standard
Simple Network Management Protocol (SNMP) Management Information Base (MIB). A
device can send local status information and status updates to its neighbors. The neighbors
store the received information in the standard SNMP MIB. The Network Management System
(NMS) can search the MIB for link layer information.
Purpose
The Ethernet technology is widely used on the Local Area Network (LAN) and Metropolitan
Area Network (MAN). Network scale expansion requires enhanced Ethernet network
management capabilities, such as the capabilities to automatically obtain the topology of
interconnected devices and solve configuration conflicts between different devices.
Currently, the NMS uses the automated discovery function to trace topology changes. Most
NMSs can only analyze topologies up to the network layer. The information obtained by these
NMSs concerns only basic events such as adding or deleting devices. The NMSs cannot
obtain information about interfaces through which a device connects to other devices. That is
to say, the NMSs cannot locate a device position or describe the current network topology.
LLDP is introduced to address these problems. LLDP provides information about device
positions and the interfaces through which one device connects to other devices. In addition,
LLDP discovers the paths between NEs, such as a client, device, application server, and
network server.
Benefits
LLDP improves O&M efficiency by allowing an NMS to rapidly obtain Layer 2 network
topologies and topology changes.
5.7.2 Principles

Each LLDP interface has a standard SNMP MIB. An SNMP MIB stores local status
information, including the chassis ID, interface ID, and management address. A device can
send its status information and status updates to its neighbors as required. Neighbors store the
received information in the standard SNMP MIB so that the NMS can extract the network
topology information.
The NMS collects topology information from MIBs of all the managed devices and
determines the current link layer topology.
LLDP Packets
LLDP packets are Ethernet packets encapsulated with LLDP data units (LLDPDUs). LLDP
packets support two encapsulation modes: Ethernet II and Subnetwork Access Protocol
(SNAP). Currently, the versatile routing platform (ATN) supports the Ethernet II
encapsulation mode. Figure 5-113 shows the format of an Ethernet II LLDP packet.
Figure 5-113 Ethernet II LLDP packet format
Destination MAC address  Source MAC address  Type     Data        FCS
6 bytes                  6 bytes             2 bytes  1500 bytes  4 bytes
Table 5-33 describes the fields in an Ethernet II LLDP packet.
Table 5-33 Fields in an Ethernet II LLDP packet
Field                    Description
Destination MAC address  A fixed multicast MAC address 0x0180-C200-000E
Source MAC address       An interface MAC address or a bridge MAC address for a device
                         (The interface MAC address takes precedence over the bridge
                         MAC address.)
Type                     Type of an LLDP packet, fixed at 0x88CC
LLDPDU                   Payload of an LLDP packet
FCS                      Frame check sequence
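As an illustration of the fields in Table 5-33, the following Python sketch assembles the header of an Ethernet II LLDP packet around an LLDPDU payload and recognizes such a packet on receipt. The function names are hypothetical, the source MAC in the usage note is a placeholder, and the FCS is deliberately left out on the assumption that the sending hardware appends it.

```python
import struct

LLDP_DST_MAC = bytes.fromhex("0180c200000e")  # fixed multicast address from Table 5-33
LLDP_ETHERTYPE = 0x88CC                       # fixed Type value from Table 5-33


def build_lldp_frame(src_mac: bytes, lldpdu: bytes) -> bytes:
    """Return destination MAC + source MAC + Type + LLDPDU (no FCS)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC address must be 6 bytes")
    if len(lldpdu) > 1500:
        raise ValueError("data field is limited to 1500 bytes")
    return LLDP_DST_MAC + src_mac + struct.pack("!H", LLDP_ETHERTYPE) + lldpdu


def is_lldp_frame(frame: bytes) -> bool:
    """Check the destination MAC address and Type fields of a received frame."""
    if len(frame) < 14:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    return frame[:6] == LLDP_DST_MAC and ethertype == LLDP_ETHERTYPE
```

For example, `build_lldp_frame(bytes.fromhex("00e0fc000001"), b"")` yields a 14-byte header for which `is_lldp_frame` returns True.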
LLDPDU
An LLDPDU is a data unit encapsulated in the data field of an LLDP packet.
A device encapsulates local device information in type-length-value (TLV) format and

combines several TLVs in an LLDPDU for transmission. Various TLVs can be combined as
required to form an LLDPDU. A device uses these TLVs to advertise its status and learn the
status of neighboring devices.
Figure 5-114 shows the LLDPDU format.
Figure 5-114 LLDPDU format
Chassis ID TLV | Port ID TLV | Time to Live TLV | Optional TLV | ... | Optional TLV | End of LLDPDU TLV
LLDP requires that each LLDPDU carry a maximum of 28 types of TLVs and that each
LLDPDU start with Chassis ID TLV, Port ID TLV, and Time to Live TLV, and end with
End of LLDPDU TLV. Other types of TLVs are optional.
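The ordering rule above can be checked mechanically. The sketch below validates a sequence of TLV type values against it, using the type values from Table 5-34 (Chassis ID = 1, Port ID = 2, Time to Live = 3, End of LLDPDU = 0). The function name is illustrative, and treating an early End of LLDPDU TLV as invalid is an assumption of this sketch.

```python
CHASSIS_ID, PORT_ID, TIME_TO_LIVE, END_OF_LLDPDU = 1, 2, 3, 0


def has_valid_tlv_order(tlv_types):
    """Return True if the TLV types start with 1, 2, 3 and end with a single 0."""
    types = list(tlv_types)
    if len(types) < 4:
        return False
    starts_ok = types[:3] == [CHASSIS_ID, PORT_ID, TIME_TO_LIVE]
    ends_ok = types[-1] == END_OF_LLDPDU
    no_early_end = END_OF_LLDPDU not in types[:-1]
    return starts_ok and ends_ok and no_early_end
```

For example, the sequence 1, 2, 3, 5, 8, 0 (mandatory TLVs, System Name, Management Address, End of LLDPDU) passes the check, whereas 2, 1, 3, 0 does not.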
TLV
A TLV is the smallest unit of an LLDPDU and indicates an object's type, length, and value.
For example, a device ID is carried in Chassis ID TLV, interface ID in Port ID TLV, and
network management address in Management Address TLV.
LLDPDUs can encapsulate basic TLVs.
l Basic TLVs: basis for network device management.

Table 5-34 Basic TLVs
TLV Name                 TLV Type  Description                          Mandatory
End of LLDPDU TLV        0         End of an LLDPDU                     Yes
Chassis ID TLV           1         Bridge MAC address of the transmit   Yes
                                   device
Port ID TLV              2         Transmit interface of a device       Yes
Time To Live TLV         3         Lifetime of local device             Yes
                                   information on neighboring devices
Port Description TLV     4         String of characters describing an   No
                                   Ethernet interface
System Name TLV          5         Device name                          No
System Description TLV   6         System description                   No
System Capabilities TLV  7         Primary functions of the system and  No
                                   whether these primary functions are
                                   enabled
Management Address TLV   8         Management address                   No
Reserved                 9-126     Reserved for special use             No
Figure 5-115 shows the TLV format.
Figure 5-115 TLV format
TLV type  TLV information string length  TLV information string
7 bits    9 bits                         0 - 511 bytes
The TLV type and TLV information string length fields together form the TLV header.
l TLV type indicates the type of a TLV. This field occupies seven bits. Each TLV type has
a unique value. For example, the value of End of LLDPDU TLV is 0 and the value of
Chassis ID TLV is 1.
l TLV information string length indicates the length of the TLV information. This field
occupies 9 bits.
l TLV information string indicates TLV information. This field occupies a maximum of
511 bytes.
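The 7-bit type and 9-bit length fit together in a 16-bit TLV header. The following Python sketch shows one way to encode and decode TLVs with that header; the function names are illustrative, and stopping at the End of LLDPDU TLV during decoding is a choice made for this example.

```python
import struct


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Build a TLV: 7-bit type and 9-bit length in two bytes, then the value."""
    if not 0 <= tlv_type <= 127:
        raise ValueError("TLV type must fit in 7 bits")
    if len(value) > 511:
        raise ValueError("TLV information string is limited to 511 bytes")
    header = (tlv_type << 9) | len(value)
    return struct.pack("!H", header) + value


def decode_tlvs(data: bytes):
    """Split an LLDPDU into (type, value) pairs, stopping at End of LLDPDU (type 0)."""
    tlvs = []
    offset = 0
    while offset + 2 <= len(data):
        (header,) = struct.unpack_from("!H", data, offset)
        tlv_type, length = header >> 9, header & 0x1FF
        value = data[offset + 2:offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV")
        tlvs.append((tlv_type, value))
        offset += 2 + length
        if tlv_type == 0:
            break
    return tlvs
```

For example, a Time To Live TLV (type 3) carrying the 2-byte value 120 is encoded as the four bytes 06 02 00 78: the header 0x0602 is type 3 shifted left by 9 bits, combined with length 2.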

LLDP Topology Discovery Between Directly Connected Neighbors

Figure 5-116 shows how LLDP discovers a network topology.
Figure 5-116 LLDP topology discovery between directly connected neighbors
[Figure: the NMS exchanges SNMP packets with ATN A and ATN B; ATN A and ATN B exchange LLDPDU frames over their direct link.]
When LLDP is enabled on both ATNA and ATNB, LLDP discovers a network topology as
follows:
1. ATNA sends its status information to ATNB using LLDPDUs.
2. ATNB analyzes the received LLDPDUs and stores the analysis result in its LLDP remote
system MIB so that the NMS can extract the network topology information.
3. ATNB also sends its status information to ATNA.
4. ATNA analyzes the received LLDPDUs and stores the analysis result in its LLDP remote
system MIB so that the NMS can extract the network topology information.
5. The NMS extracts local information and neighbor information from ATNA and ATNB.
The NMS then analyzes the information and determines the network topology.
LLDP Topology Discovery Between Indirectly Connected Neighbors

Figure 5-117 shows how LLDP discovers the network topology when two neighbors are
connected over an intermediate device.

Figure 5-117 LLDP topology discovery between indirectly connected neighbors
[Figure: the NMS exchanges SNMP packets with ATN A and ATN B; ATN A and ATN B exchange LLDPDU frames through a tunnel across the ISP network.]
LLDP discovers a network topology as follows:
1. ATNA sends LLDP multicast packets with a packet type of 0x88CC and a MAC address
of 01-80-C2-00-00-0E. The LLDP packets are transparently transmitted to ATNB
through a tunnel on the Internet Service Provider (ISP) network.
2. After receiving the LLDP packets, ATNB checks the packet type and determines that
these LLDP packets can be processed. ATNB then further analyzes the LLDP packets
and stores the analysis result in its LLDP remote system MIB so that the NMS can
extract the network topology information.
3. ATNB sends LLDP packets in the same manner as ATNA. ATNA also analyzes the
LLDP packets sent from ATNB and stores the analysis result in its LLDP remote system
MIB so that the NMS can obtain the network topology information.
4. The NMS locates ATNA and ATNB based on the management addresses and obtains the
topology information for analysis.
NOTE
To implement LLDP topology discovery between indirectly connected neighbors, a tunnel must have
been established between ATNA and ATNB on the ISP network for transparent transmission of LLDP
packets.

5.7.2.2 LLDP Parameters

During topology discovery, adjusting LLDP parameters properly helps discover the topology
efficiently and reduces resource waste. LLDP parameters include:
l Interval for sending LLDP packets

l Delay in sending LLDP packets
l Multiplier of the hold time of local information on neighbors
l Delay in re-enabling LLDP on an interface
l Delay in sending LLDP traps
Interval for Sending LLDP Packets

When a device's status remains unchanged, the device sends LLDP packets to neighbors at a
specific interval.
After the interval is set on a device, all LLDP interfaces on the device also send LLDP
packets to neighbors at this interval. The time at which these interfaces begin to send LLDP
packets can be different.
The interval for sending LLDP packets determines the network topology discovery speed and
can be adjusted according to the network load:
l A larger value reduces the frequency at which LLDP packets are exchanged, and
therefore conserves system resources. However, if the value is too large, a device may
fail to efficiently notify neighbors of its status, affecting network topology change
discovery.
l A smaller value increases the frequency at which a local device sends its status
information to its neighbors, helping the NMS to efficiently discover network topology
changes. However, if the value is too small, LLDP packets are exchanged too frequently,
increasing the system burden and wasting resources.
Delay in Sending LLDP Packets

Delay in sending LLDP packets refers to the minimum delay between successive LLDP
packet transmissions. After the delay in sending LLDP packets is set on a device, all LLDP
interfaces on the device take this value as the minimum delay in sending LLDP packets to
neighbors. The time at which these interfaces begin to send LLDP packets can be different.
When the status of a device changes frequently, increase the delay to reduce the frequency at
which LLDP packets are sent to neighbors.
The delay in sending LLDP packets must be adjusted according to the network load:
l A larger value reduces the frequency at which LLDP packets are exchanged, and
therefore conserves system resources. However, if the value is too large, a device may
fail to efficiently notify neighbors of its status, affecting network topology change
discovery.
l A smaller value increases the frequency at which a local device sends its status
information to its neighbors, helping the NMS to efficiently discover network topology
changes. However, if the value is too small, LLDP packets are exchanged too frequently,
increasing the system burden and wasting resources.

l The interval at which LLDP packets are sent must be greater than or equal to four times
the delay in sending LLDP packets. Figure 5-118 shows the relationship between the
interval and delay for sending LLDP packets.
Figure 5-118 Relationship between the interval and delay in sending LLDP packets
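[Flowchart:
1. An LLDP frame is sent and the Interval timer is triggered.
2. If local status information does not change within the interval, the Interval timer
continues until it times out.
3. If local status information changes within the interval, LLDP frames are sent and the
Interval timer and Delay timer are triggered again.
4. When the Delay timer times out, the device checks whether local status information has
changed. If yes, LLDP frames are sent again as in step 3. If no, the Interval timer
continues until it times out.
The flowchart marks the sending points B, C, and D.]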
int: is short for interval, indicating the interval for sending LLDP frames.
del: is short for delay, indicating the delay in sending LLDP frames.
A, B, C, and D refer to different time points at which LLDP frames are sent.
Figure 5-119 shows the process of sending LLDP packets at different time points.

Figure 5-119 Sending LLDP packets at different time points
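[Timeline: sending points A, B, C, and D. The segments between sending points are labeled
interval; the segments leading to C and D are also labeled delay.
Legend: LLDP frames sent after the interval times out; LLDP frames sent after local status
information changes.]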
A: The first LLDP packet is sent.

B: When local status information does not change within the interval, LLDP packets are sent
after the interval times out.
C: When local status information changes within the interval, LLDP packets are sent, and the
Interval timer and Delay timer are triggered.
D: After the Interval timer and Delay timer are triggered, if local status information changes
within the delay period, LLDP packets are sent again after the delay times out, and the
Interval timer and Delay timer are triggered again.
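The timer behavior at points A through D can be approximated with a small Python simulation.
This is a sketch under stated assumptions, not the device's implementation: the function
name lldp_send_times is hypothetical, time is measured in seconds from the first frame, a
status change triggers a frame as soon as the delay since the previous frame has elapsed,
and several changes inside one delay period are reported by a single frame.

```python
def lldp_send_times(interval, delay, change_times, horizon):
    """Return the times at which LLDP frames are sent, up to `horizon`.

    Assumed model: the first frame goes out at t=0 (point A). Without a status
    change the next frame follows one interval later (point B). A status change
    triggers a frame immediately (point C), but never sooner than `delay`
    after the previous frame (point D).
    """
    if interval <= 0 or delay < 0:
        raise ValueError("interval must be positive and delay non-negative")
    if interval < 4 * delay:
        raise ValueError("interval must be at least four times the delay")

    changes = sorted(change_times)
    sends = [0]
    i = 0
    while True:
        last = sends[-1]
        # Changes at or before the last frame were already reported by it.
        while i < len(changes) and changes[i] <= last:
            i += 1
        periodic = last + interval
        if i < len(changes) and changes[i] < periodic:
            next_send = max(changes[i], last + delay)
        else:
            next_send = periodic
        if next_send > horizon:
            break
        sends.append(next_send)
    return sends


# Illustrative values only: interval 30 s, delay 2 s, changes at 40 s, 41 s, and 100 s.
print(lldp_send_times(30, 2, [40, 41, 100], 130))
# prints [0, 30, 40, 42, 72, 100, 130]
```

In this run, the change at 41 s falls inside the delay period that started at 40 s, so it
is reported at 42 s, which corresponds to point D.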
Multiplier of the Hold Time of Local Information on Neighbors

The multiplier of the hold time of local information on neighbors (hold multiplier) is used to
calculate the hold time of LLDP packets on neighbors. After receiving LLDP packets, devices
use the hold time to update the aging time of neighbor information.
The formula for calculating the hold time (TTL) from the hold multiplier is as follows:
TTL = Min [65535, (interval * hold multiplier)]
l Time to Live (TTL): specifies the hold time of local information on neighbors. The value
is the smaller one between 65535 and the value of interval * hold multiplier.
l interval: specifies the interval at which LLDP packets are sent to neighbors.
l hold multiplier: specifies the multiplier of the hold time of local information on
neighbors.
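The formula can be checked with a short Python sketch. The function name lldp_ttl and the
sample values are illustrative assumptions; only the formula itself comes from the text.

```python
MAX_TTL = 65535  # upper bound given in the formula


def lldp_ttl(interval: int, hold_multiplier: int) -> int:
    """TTL = Min[65535, interval * hold multiplier]."""
    return min(MAX_TTL, interval * hold_multiplier)


print(lldp_ttl(30, 4))       # prints 120
print(lldp_ttl(32768, 10))   # prints 65535 (capped)
```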
Delay in Re-enabling LLDP on an Interface

The delay in re-enabling LLDP on an interface is the hold-off time before LLDP is re-enabled
on an interface where it was disabled. If the LLDP status of an interface changes frequently,
set the delay in re-enabling LLDP on the interface to suppress topology flapping on
neighbors.

Delay in Sending LLDP Traps

The delay in sending LLDP traps is the minimum delay for a device to send LLDP traps to the
NMS when the LLDP trap function is enabled and information in the LLDP remote system
MIB frequently changes. The delay in sending LLDP traps applies only to the traps generated
in the following scenarios:
l Adding neighbors
l Deleting neighbors
l Aging neighbors
l Discarding neighbors
After the delay in sending LLDP traps is set on a device, all LLDP interfaces on the device
take this value as the minimum delay in sending the traps to the NMS. The time at which
these interfaces begin to send the traps can be different. When neighbor information changes
frequently, you can increase the delay to reduce the frequency at which traps are sent to the
NMS, thereby suppressing topology flapping.
5.7.2.3 LLDP Implementation
Operation Mode
The ATN allows LLDP to work in duplex operation mode. Specifically, the ATN can send and
receive LLDP packets at the same time.
Enabling LLDP Globally

Enabling LLDP on the ATN allows neighbors to exchange status information. LLDP must be
globally enabled on a device before any other LLDP-related operations can be performed, such
as enabling or disabling LLDP on an interface, configuring LLDP parameters, and setting an
LLDP management address.
Enabling LLDP on an Interface

After LLDP is enabled globally on the ATN, all Ethernet physical interfaces are automatically
enabled with LLDP. You can disable and then re-enable LLDP on an interface as required.
Setting an LLDP Management Address

Setting an LLDP management address enables the NMS to identify devices for management.
LLDP must be globally enabled on the ATN before an LLDP management address is set.
The LLDP management address must be an existing unicast IP address not reserved for the
system. If a specified IP address is invalid or no LLDP management address is configured, the
system selects an IP address in the following sequence: first selects the IP address of the
loopback interface, then the address of a VLANIF interface, and finally the smallest IP
address in the IP address table. If none of the preceding addresses are available, the system
takes the bridge MAC address as the LLDP management address.
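The selection order described above can be sketched in Python as follows. This is only an
illustration: the function name, its parameters, and the validity test built on the standard
ipaddress module are assumptions, and the device's actual check for addresses "reserved for
the system" may differ. IPv4 addresses are assumed throughout.

```python
import ipaddress


def select_management_address(configured, loopback_ips, vlanif_ips, all_ips, bridge_mac):
    """Pick an LLDP management address using the fallback order in the text."""

    def usable(address):
        # Approximation of "existing unicast IP address not reserved for the system".
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            return False
        if ip.is_multicast or ip.is_unspecified or ip.is_loopback or ip.is_reserved:
            return False
        return address in all_ips          # must exist on the device

    if configured is not None and usable(configured):
        return configured
    if loopback_ips:                       # 1. IP address of the loopback interface
        return loopback_ips[0]
    if vlanif_ips:                         # 2. IP address of a VLANIF interface
        return vlanif_ips[0]
    if all_ips:                            # 3. smallest IP address in the table
        return min(all_ips, key=ipaddress.ip_address)
    return bridge_mac                      # 4. bridge MAC address as a last resort
```

Comparing parsed addresses rather than strings matters in step 3: as text, "10.10.10.10"
sorts before "10.10.10.9", but numerically it is the larger address.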
Enabling the LLDP Trap Function

After the LLDP trap function is enabled on a device, the device can send traps to instruct the
NMS to update topology information in the following situations:

l LLDP is globally enabled or disabled on a device.

l The LLDP management address is changed.
l Neighbor information changes.
5.7.3 Applications
LLDP Application on the Carrier's Network

In Figure 5-120, available links exist between ATN A and ATN B, and between ATN A and
ATN C. Available links also exist between ATN A and the NMS, and between ATN C and the
NMS. LLDP is configured on ATN A, ATN B, and ATN C, which exchange LLDP packets
through the available links to obtain the status of each other. In addition, the NMS can
use the LLDP management address to locate ATN A and ATN C and obtain network topology
information.
Figure 5-120 LLDP configurations on a network where an interface has multiple neighbors
[Figure: an NMS, devices CX-D, CX-F, and Router E, and ATNA (10.10.10.1), ATNB (10.10.10.2),
and ATNC (10.10.10.3). Legend: SNMP packets; LLDPDU; interfaces enabled with LLDP.]
NMS Network Management System

Abbreviation
LLDP Link Layer Discovery Protocol
LLDPDU LLDP data unit
MIB management information base
5.8 Transparent Transmission of Layer 2 Protocol Packets

NOTE
Among the ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports Layer 2 protocol
transparent transmission.
5.8.1 Introduction to Transparent Transmission of Layer 2 Protocol Packets
Definition
Transparent transmission of Layer 2 protocol packets indicates that the packets of standard
protocols such as Spanning Tree Protocol (STP), Link Aggregation Control Protocol (LACP),
HUAWEI Group Management Protocol (HGMP), and user-defined protocols are transparently
transmitted on a Layer 2 network through Layer 2 tunneling technologies.
NOTE
Layer 2 networks involved in Layer 2 protocol tunneling refer to networks constructed by Layer 2
interfaces but not Layer 2 virtual private networks (L2VPNs).
Purpose
Transparent transmission of Layer 2 protocol packets is a technology used to transparently
transmit the protocol packets of users over the ISP network. On the ingress of the ISP
network, protocol packets sent by users are forwarded to the ISP network after their multicast
destination MAC addresses are replaced; on the egress of the ISP network, the multicast
destination MAC addresses of the protocol packets are restored to the original ones.
5.8.2 Principles
5.8.2.1 Basic Concepts of Transparent Transmission of Layer 2 Protocol Packets
Background of Transparent Transmission of Layer 2 Protocol Packets

Some Layer 2 protocols, such as Multiple Spanning Tree Protocol (MSTP), HUAWEI Group
Management Protocol (HGMP), and Link Aggregation Control Protocol (LACP) running
between user networks, need to traverse the Internet Service Provider (ISP) network to
perform Layer 2 protocol calculation.
As shown in Figure 5-121, a certain Layer 2 protocol such as MSTP is running in user
network1 and user network2. The Layer 2 protocol packets in user network1 must traverse the
ISP network to reach user network2 to perform Spanning Tree Protocol (STP) calculation.

Generally, the destination MAC addresses of Layer 2 protocol packets are the same. For
example, the MSTP packets are BPDUs, of which the destination MAC address is
0180-C200-0000. Therefore, when a Layer 2 protocol packet reaches a PE on the ISP network,
the PE sends the protocol packet to the CPU to perform STP calculation, without identifying
whether the protocol packet comes from a user network or the ISP network.
In this case, devices in user network1 perform STP calculation together with PE1 rather than
devices in user network2. As a result, the Layer 2 protocol packets in user network1 cannot
traverse the ISP network to reach user network2.
Figure 5-121 Transparent transmission of Layer 2 protocol packets in the ISP network
ISP
network
PE1 PE2
CE1 CE2
User User
network1 network2
To address the preceding problem, you can configure transparent transmission of Layer 2
protocol packets. Currently, the Huawei devices support the transparent transmission of
packets of the following Layer 2 protocols:
l Cisco Discovery Protocol (CDP)
l Device Link Detection Protocol (DLDP)
l Dynamic Trunking Protocol (DTP)
l Ethernet Operation, Administration, and Maintenance 802.3ah (EOAM3ah)
l Generic Multicast Registration Protocol (GMRP)
l Generic VLAN Registration Protocol (GVRP)
l HUAWEI Group Management Protocol (HGMP)
l Link Aggregation Control Protocol (LACP)
l Link Layer Discovery Protocol (LLDP)
l Port Aggregation Protocol (PAGP)
l Per VLAN Spanning Tree Plus (PVST+)
l Spanning Tree Protocol (STP)
l Unidirectional Link Detection (UDLD)
l VLAN Trunking Protocol (VTP)
l User-defined protocols
If Layer 2 protocol packets need to be transparently transmitted on the ISP network, the
following conditions must be met during the transmission process:

l Each site of a user network can receive the Layer 2 protocol packets from other sites.
l The Layer 2 protocol packets of a user network cannot be processed by the CPUs of the
devices on the ISP network.
l Layer 2 protocol packets of different user networks must be isolated from each other so
that they do not affect each other.
Transparent transmission of Layer 2 protocol packets can prevent the Layer 2 protocol
packets of different user networks from interfering with each other, which cannot be
achieved by the previous technologies.
Bridge Protocol Data Unit (BPDU)

BPDUs are common Layer 2 protocol packets. For example, STP and HGMP use BPDUs as
protocol packets. The BPDUs are special protocol packets that are multicast between Layer 2
switches. The encapsulation of BPDUs conforms to IEEE 802.3 and the encapsulation format
is shown in Figure 5-122. BPDUs of various protocols are multicast with different destination
MAC addresses.
Figure 5-122 Format of a BPDU

0        7        15        23        31
Destination address (0180-C200-0000)
Source address
Length
BPDU Data
A BPDU consists of the following fields:

l Destination Address: is of 6 bytes and indicates the destination MAC address.
l Source Address: is of 6 bytes and indicates the source MAC address.
l Length: is of 2 bytes and indicates the length of the BPDU.
l BPDU Data: indicates the contents of the BPDU.
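The field layout above can be read with Python's struct module. The sketch below is an
illustration that assumes an untagged frame held as raw bytes; the function names are
hypothetical.

```python
import struct


def format_mac(mac: bytes) -> str:
    """Format six bytes in the 0180-C200-0000 style used in Figure 5-122."""
    digits = mac.hex().upper()
    return "-".join(digits[i:i + 4] for i in (0, 4, 8))


def parse_bpdu_frame(frame: bytes) -> dict:
    """Split a frame into the four fields listed above."""
    if len(frame) < 14:
        raise ValueError("frame shorter than the 14-byte header")
    # 6-byte destination, 6-byte source, 2-byte length, network byte order.
    destination, source, length = struct.unpack("!6s6sH", frame[:14])
    return {
        "destination": format_mac(destination),
        "source": format_mac(source),
        "length": length,
        "data": frame[14:14 + length],
    }
```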
Transparent transmission of Layer 2 protocol packets provides a BPDU tunnel for BPDUs.
BPDU tunneling is a Layer 2 tunneling technology that enables the provider network to
transparently transmit BPDUs from customer networks at different locations. In this manner,
mutual interference between the customer networks and the provider network is prevented.
5.8.2.2 Principles of Transparent Transmission of Layer 2 Protocol Packets

Layer 2 protocol packets are transparently transmitted based on the following principles:
l On the ingress PE of the ISP network, the destination multicast MAC address of a Layer
2 protocol packet is replaced with a specified multicast MAC address.
l The devices on the ISP network determine whether to add an outer VLAN tag to the
protocol packet according to the configured transparent transmission mode.
l When the Layer 2 protocol packet reaches the egress, the destination multicast MAC
address of the Layer 2 protocol packet is restored to the standard destination multicast
MAC address according to the mapping between the specific destination multicast MAC
address configured on the device and the Layer 2 protocol. In addition, the egress
determines whether to remove the outer VLAN tag according to the configured
transparent transmission mode, and then forwards the protocol packet to the UPE.
The Huawei devices support the following transparent transmission modes of Layer 2
protocol packets in different application scenarios:
l Interface-based transparent transmission of Layer 2 protocol packets

l VLAN-based transparent transmission of Layer 2 protocol packets
l QinQ-based transparent transmission of Layer 2 protocol packets
l Hybrid VLAN-based transparent transmission of Layer 2 protocol packets
Interface-based Transparent Transmission of Layer 2 Protocol Packets
Figure 5-123 Interface-based transparent transmission of Layer 2 protocol packets
[Figure: PE1, PE2, and PE3 form the ISP network. User networks LAN-A (VLAN 300, MSTP) and
LAN-B (VLAN 200, MSTP) attach to the PEs through port-based access. BPDU Tunnel 200 and
BPDU Tunnel 300 cross the ISP network.]
As shown in Figure 5-123, each interface on a PE connects to one user network. The user
networks belong to different LANs, that is, LAN-A and LAN-B. BPDUs sent from user
networks to the PE are untagged. The PE, however, needs to identify the LAN from which
the BPDUs come. BPDUs of a user network in LAN-A must be sent to other user networks in
LAN-A rather than the user networks in LAN-B. In addition, BPDUs must not be processed
by PEs.
In this application scenario, the following processing methods are available:

l Change the default multicast MAC address of the Layer 2 BPDU that can be identified
by the devices on the ISP network into another multicast MAC address.
a. Set the role of the ingress device on the ISP network to provider. As a result, the
destination MAC address of the BPDUs sent by the devices on the ISP network is
01-80-C2-00-00-08 instead of the original 01-80-C2-00-00-00.
b. Set the role of all devices in a user network to customer. As a result, the destination
MAC address of the BPDUs sent by the user network is still 01-80-C2-00-00-00.
By default, a device is configured as a customer device on the network.
c. On the device of the ISP network, add the interfaces that connect to the same user
network to the same VLAN. After receiving the Layer 2 protocol packet from the
user network, the device on the ISP network adds the default VLAN ID of the
interface to the packet.
d. The devices (of the provider type) on the ISP network do not take the BPDU as the
Layer 2 BPDU and do not send the BPDU to the CPU for processing. Instead, the
devices select a corresponding Layer 2 tunnel according to the default VLAN ID of
the interface to forward the BPDU.
e. The BPDU is normally forwarded by the devices on the ISP network and normally
traverses the ISP network.
f. When reaching the egress on the ISP network, the Layer 2 BPDU is forwarded to
the UPE without being changed.
l Replace the original multicast MAC address of the Layer 2 BPDU with a specified
multicast MAC address.
NOTE
This method applies to all types of transparent transmission of Layer 2 protocol packets.
a. After receiving and identifying the Layer 2 protocol packet (such as a BPDU of the
STP protocol) from the user network, the device on the ISP network adds the
default VLAN ID of the interface to the Layer 2 protocol packet.
b. According to the mapping between the specified destination multicast MAC address
and the Layer 2 protocol, the device on the ISP network changes the standard
destination multicast MAC address of the Layer 2 BPDU into the specified
destination multicast MAC address.
c. The Layer 2 BPDU is normally forwarded by the devices on the ISP network,
therefore successfully traversing the ISP network.
d. When the Layer 2 BPDU reaches the egress, the egress restores the destination
multicast MAC address to the standard destination multicast MAC address of the
Layer 2 BPDU according to the mapping between the special destination multicast
MAC addresses and Layer 2 protocols, and then forwards the BPDU to the UPE.
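The second method, replacing the destination MAC address at the ingress and restoring it at
the egress, can be sketched in Python. This is a simplified model rather than device code:
frames are plain dictionaries, the function names are hypothetical, and the values in
TUNNEL_MAC are placeholders chosen for illustration (the specified multicast MAC address is
whatever the operator configures). The standard addresses are the ones quoted in this
document.

```python
# Standard destination multicast MAC addresses quoted in this document.
STANDARD_MAC = {
    "stp": "01-80-C2-00-00-00",
    "lldp": "01-80-C2-00-00-0E",
}
# Hypothetical "specified" multicast MAC addresses, one per protocol.
TUNNEL_MAC = {
    "stp": "01-00-00-00-00-10",
    "lldp": "01-00-00-00-00-11",
}
PROTOCOL_BY_STANDARD_MAC = {mac: name for name, mac in STANDARD_MAC.items()}
PROTOCOL_BY_TUNNEL_MAC = {mac: name for name, mac in TUNNEL_MAC.items()}


def ingress_rewrite(frame: dict, port_default_vlan: int) -> dict:
    """Steps a and b: add the port's default VLAN ID and replace the MAC."""
    protocol = PROTOCOL_BY_STANDARD_MAC.get(frame["dst"])
    if protocol is None:
        return frame                      # not a protocol packet to tunnel
    return {**frame, "vlan": port_default_vlan, "dst": TUNNEL_MAC[protocol]}


def egress_restore(frame: dict) -> dict:
    """Step d: restore the standard destination multicast MAC address."""
    protocol = PROTOCOL_BY_TUNNEL_MAC.get(frame["dst"])
    if protocol is None:
        return frame
    return {**frame, "dst": STANDARD_MAC[protocol]}
```

Because devices inside the ISP network see only the replaced address, they forward the
frame as ordinary Layer 2 traffic instead of sending it to the CPU. VLAN tag handling at
the egress is left out of this sketch.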

VLAN-based Transparent Transmission of Layer 2 Protocol Packets
Figure 5-124 VLAN-based transparent transmission of Layer 2 protocol packets
[Figure: PE 1, PE 2, and PE 3 form the ISP network and carry a BPDU tunnel. User networks
LAN-A and LAN-B, both running MSTP, attach to the PEs. Labels: CE-VLAN 100; Trunk 100-200.]
In most cases, a PE serves as a convergence device. As shown in Figure 5-124, the
convergence interface on PE1 receives Layer 2 protocol packets from LAN-A and LAN-B. To
differentiate BPDUs from two LANs, BPDUs sent from the CE to the PE must be tagged. The
VLAN ID of a BPDU from LAN-A is 200 and the VLAN ID of a BPDU from LAN-B is 100.
Currently, some Layer 2 protocol packets, such as protocol packets of a spanning tree
protocol, need to carry VLAN tags. When receiving Layer 2 protocol packets with VLAN tags, a
device on the ISP network considers them as invalid protocol packets and discards them. To
avoid this problem, you can configure VLAN-based transparent transmission of Layer 2
protocol packets on the devices on the ISP network. In this manner, the Layer 2 protocol
packets can traverse the ISP network through Layer 2 tunnels.
Similar to the interface-based transparent transmission of Layer 2 protocol packets, there are
two processing methods in this application scenario:
l Change the default multicast MAC address of the Layer 2 protocol packet that can be
identified by the devices on the ISP network into another multicast MAC address.
a. Set the role of the ingress device on the ISP network to provider. As a result, the
destination MAC address of the BPDUs sent by the devices on the ISP network is
01-80-C2-00-00-08 instead of the original 01-80-C2-00-00-00.
b. Set the role of all devices in a user network to customer. As a result, the destination
MAC address of the BPDUs sent by the user network is still 01-80-C2-00-00-00.
c. Set specific VLAN IDs for the Layer 2 protocol packets that are sent from user
networks to the ISP network.
d. Configure the devices on the ISP network to identify the Layer 2 protocol packets
with VLAN IDs and allow the packets to pass through.
e. The devices (of the provider type) on the ISP network do not take the packet as the
BPDU and do not send the packet to the CPU for processing. Instead, the devices
select a corresponding Layer 2 tunnel to forward the packet according to the VLAN
IDs with which the packets are allowed to pass through.
f. The Layer 2 protocol packet is transmitted as an ordinary Layer 2 packet by the
devices on the ISP network, therefore successfully traversing the ISP network.
g. When reaching the egress on the ISP network, the Layer 2 protocol packet is
forwarded to the CE without being changed.
l Replace the original multicast MAC address of the Layer 2 protocol packet with a
specified multicast MAC address.
NOTE
This method applies to transparent transmission of all types of Layer 2 protocol packets.
a. Set specific VLAN IDs for the Layer 2 protocol packets that are sent from user
networks to the ISP network.
b. Configure the devices on the ISP network to identify the Layer 2 protocol packets
with VLAN IDs and allow the packets to pass through.
c. According to the mapping between the specified destination multicast MAC address
and the Layer 2 protocol, the device on the ISP network changes the standard
destination multicast MAC address of the Layer 2 protocol packet into the specified
destination multicast MAC address.
d. After the MAC address is changed, the Layer 2 protocol packet is transmitted as an
ordinary Layer 2 packet by the devices on the ISP network, therefore successfully
traversing the ISP network.
e. When the Layer 2 protocol packet reaches the egress, the egress restores the
destination multicast MAC address to the standard destination multicast MAC
address according to the mapping between the specified destination multicast MAC
addresses and Layer 2 protocols, and then forwards the Layer 2 protocol packet to
the CE.
QinQ-based Transparent Transmission of Layer 2 Protocol Packets

l QinQ overview
The QinQ protocol is a Layer 2 tunneling protocol based on the IEEE 802.1Q
technology. The QinQ technology improves the utilization of VLANs by adding another
802.1Q tag. In this manner, services in the private VLAN can be transparently
transmitted on the public network. The packet transmitted on the ISP network carries
double 802.1Q tags (a public VLAN tag and a private VLAN tag), that is,
802.1Q-in-802.1Q. It is also called the QinQ protocol.
Figure 5-125 shows the format of a QinQ packet. Compared with the 802.1Q packet, the
QinQ packet has an additional tag inserted after the source address (SA). This tag is known
as the outer tag or public tag, used for carrying the VLAN ID of a public network. The inner
tag is usually known as the private tag, used for carrying the VLAN ID of a private network.
NOTE
The QinQ function configured on a Layer 2 interface is also called VLAN stacking.
Figure 5-125 802.1Q Encapsulation and QinQ Encapsulation

802.1Q Encapsulation
DA       SA       ETYPE    TAG      LEN/ETYPE  DATA                 FCS
6 Bytes  6 Bytes  2 Bytes  2 Bytes  2 Bytes    42 Bytes~1500 Bytes  4 Bytes

QinQ Encapsulation
DA       SA       ETYPE    TAG      ETYPE    TAG      LEN/ETYPE  DATA                 FCS
6 Bytes  6 Bytes  2 Bytes  2 Bytes  2 Bytes  2 Bytes  2 Bytes    38 Bytes~1500 Bytes  4 Bytes

ETYPE: 0x8100
TAG: Priority, CFI, VLAN ID
l QinQ-based transparent transmission of Layer 2 protocol packets
Figure 5-126 QinQ-based transparent transmission of Layer 2 protocol packets
[Figure: PE 1 and PE 2 on the ISP network carry two BPDU tunnels between user networks
LAN-A and LAN-B, both running MSTP. Labels: CE-VLAN 100; PE-VLAN20:CE-VLAN 100~199.]
If Layer 2 protocol packets are still transmitted transparently in VLAN-based mode
when many user networks are connected to the ISP network, a large number of VLAN
IDs of the ISP network are required. This may result in insufficient VLAN ID resources.
In this case, you can configure the QinQ function to forward Layer 2 protocol packets.

As shown in Figure 5-126, the convergence interfaces on the PEs are configured with
the function of QinQ-based transparent transmission of Layer 2 protocol packets. Then,
the PEs add different outer tags to the packets from different user networks.
In this application scenario, the following processing methods are available:
– Change the default multicast MAC address of the Layer 2 BPDU that can be
identified by the devices on the ISP network into another multicast MAC address.
i. Set the role of the ingress device on the ISP network to provider. As a result,
the destination MAC address of the BPDUs sent by the devices on the ISP
network is 01-80-C2-00-00-08 instead of the original 01-80-C2-00-00-00.
ii. Set the role of all devices in a user network to customer. As a result, the
destination MAC address of the BPDUs sent by the user network is still
01-80-C2-00-00-00.
iii. Set specific VLAN IDs for the Layer 2 protocol packets that are sent from user
networks to the ingress device on the ISP network.
iv. Configure transparent transmission of Layer 2 protocol packets and the QinQ
function on the interfaces of the ingress on the ISP network.
v. According to the user VLAN IDs, the ingress on the ISP network allocates
different outer tags, that is, the public VLAN IDs, to the Layer 2 protocol
packets.
vi. The ingress on the ISP network selects different Layer 2 tunnels according to
different outer tags. Then, the Layer 2 protocol packets are transmitted as
ordinary Layer 2 packets by the devices on the ISP network.
vii. The Layer 2 protocol packet is transmitted as an ordinary Layer 2 packet by
the devices on the ISP network, therefore successfully traversing the ISP
network.
viii. The egress removes the outer tags and forwards the Layer 2 protocol packets
to the corresponding user networks according to the inner tags.
As shown in Figure 5-126, after receiving the BPDUs with the tags ranging from
100 to 199, the PEs label the BPDUs with the outer tag 20, and then forward the
BPDUs in the ISP network; after receiving the BPDUs with the tags ranging from
200 to 299, the PEs label the BPDUs with the outer tag 30, and then forward the
BPDUs in the ISP network. In this way, the BPDUs of different user networks can
be transparently transmitted in the ISP network; moreover, fewer VLAN IDs are
occupied.
– Replace the original multicast MAC address of the Layer 2 protocol packet with a
specified multicast MAC address.
i. Set specific VLAN IDs for the Layer 2 protocol packets that are sent from user
networks to the ingress device on the ISP network.
ii. Configure transparent transmission of Layer 2 protocol packets and the QinQ
function on the interfaces of the ingress on the ISP network.
iii. According to the user VLAN IDs, the ingress on the ISP network allocates
different outer tags, that is, the public VLAN IDs, to the Layer 2 protocol
packets.
iv. The ingress on the ISP network selects different Layer 2 tunnels according to
different outer tags. Then, the Layer 2 protocol packets are transmitted as
ordinary Layer 2 packets by the devices on the ISP network.
v. Configure transparent transmission of Layer 2 protocol packets and the QinQ
function on the interfaces of the egress on the ISP network.
vi. The egress removes the outer tags and forwards the Layer 2 protocol packets
to the corresponding user networks according to the inner tags.
As shown in Figure 5-126, after receiving a Layer 2 protocol packet from a VLAN
with the ID ranging from 100 to 199, PE1 adds VLAN ID 20 as an outer VLAN ID
to the packet, and forwards the packet on the ISP network through a Layer 2 tunnel.
After receiving a Layer 2 protocol packet from a VLAN with the ID ranging from
200 to 299, PE1 adds VLAN ID 30 as an outer VLAN ID to the packet, and
forwards the packet on the ISP network through a Layer 2 tunnel. In this manner,
Layer 2 protocol packets from different user networks can be transparently
transmitted on the ISP network, and VLAN ID resources of the operator can be
saved.
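The outer-tag handling in the example above (inner VLANs 100 to 199 mapped to outer VLAN 20,
inner VLANs 200 to 299 mapped to outer VLAN 30) can be sketched in Python. The code is an
illustration under assumptions: frames are raw bytes that already carry one 802.1Q tag, both
tags use the 0x8100 type shown in Figure 5-125, priority and CFI bits of the outer tag are
left at zero, and all function names are hypothetical.

```python
import struct

TPID = 0x8100  # ETYPE value shown in Figure 5-125
# Inner (CE) VLAN ranges and the outer (PE) VLAN assigned to each, from the example.
OUTER_VLAN_BY_INNER_RANGE = [
    (range(100, 200), 20),
    (range(200, 300), 30),
]


def inner_vlan_id(frame: bytes) -> int:
    """Read the VLAN ID from the tag that follows the source address."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID:
        raise ValueError("frame does not carry an 802.1Q tag")
    return tci & 0x0FFF                   # low 12 bits hold the VLAN ID


def push_outer_tag(frame: bytes) -> bytes:
    """Ingress: insert the outer tag between the SA and the inner tag."""
    vlan_id = inner_vlan_id(frame)
    for inner_range, outer_vlan in OUTER_VLAN_BY_INNER_RANGE:
        if vlan_id in inner_range:
            outer_tag = struct.pack("!HH", TPID, outer_vlan)
            return frame[:12] + outer_tag + frame[12:]
    raise ValueError(f"no outer VLAN configured for inner VLAN {vlan_id}")


def pop_outer_tag(frame: bytes) -> bytes:
    """Egress: remove the outer tag so that only the inner tag remains."""
    return frame[:12] + frame[16:]
```

With this mapping, the ISP network needs only two VLAN IDs to carry 200 customer VLANs,
which is the saving the text describes.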
Hybrid VLAN-based Transparent Transmission of Layer 2 Protocol Packets
Figure 5-127 Hybrid VLAN-based transparent transmission of Layer 2 protocol packets

[Figure: PE1, PE2, and PE3 form the ISP network. CE1, CE2, and CE3 attach user networks
LAN-A, LAN-B, LAN-C, and LAN-D to the PEs. Links are labeled VLAN2 and VLAN3.]
As shown in Figure 5-127, PE1, PE2, and PE3 are connected to construct a Layer 2 network.
VLAN 2 and VLAN 3 are created in user networks LAN-A and LAN-C and in user networks
LAN-B and LAN-D, respectively. Layer 2 protocol packets tagged with VLAN 2 and VLAN 3
are sent from LAN-A and LAN-B, and then forwarded by CE1, CE2, and CE3. In addition, an
untagged standard Layer 2 protocol, such as the Link Layer Discovery Protocol (LLDP),
needs to run between CE1, CE2, and CE3.
In this scenario, PEs may receive Layer 2 protocol packets with VLAN IDs and without
VLAN IDs. In this case, you can configure hybrid VLAN-based transparent transmission of
Layer 2 protocol packets on the PEs of the ISP network to enable the PEs to transparently
transmit Layer 2 protocol packets with VLAN tags and without VLAN tags.
NOTE
Hybrid VLAN-based Layer 2 protocol tunneling functions as a combination of interface-based and
VLAN-based Layer 2 protocol tunneling. For details about the tunneling process, see Interface-based
Transparent Transmission of Layer 2 Protocol Packets and VLAN-based Transparent
Transmission of Layer 2 Protocol Packets.
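The way a PE in hybrid mode might choose between the two behaviors can be illustrated with
a few lines of Python. This is a sketch under assumptions: the frame is raw bytes, a tagged
frame is recognized by the 0x8100 type after the source address, and the function name and
return strings are hypothetical labels rather than device terminology.

```python
TPID_8021Q = 0x8100


def tunneling_mode(frame: bytes) -> str:
    """Tagged protocol packets follow VLAN-based handling, untagged ones interface-based."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    ethertype = int.from_bytes(frame[12:14], "big")
    return "vlan-based" if ethertype == TPID_8021Q else "interface-based"
```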
5.8.3 Applications

5.8.3.1 Interface-based Transparent Transmission of Layer 2 Protocol Packets

As shown in Figure 5-128, PEs on the Layer 2 switching network can transparently transmit
Layer 2 Control Protocol packets from access users.
Figure 5-128 Interface-based transparent transmission of Layer 2 control protocol packets on
a Layer 2 network
[Figure: PE1, PE2, and PE3 form the ISP network. User networks LAN-A (VLAN 300, MSTP) and
LAN-B (VLAN 200, MSTP) attach to the PEs through port-based access. BPDU Tunnel 200 and
BPDU Tunnel 300 cross the ISP network.]
PE1, PE2, and PE3 are connected to construct a Layer 2 switching network, and access
LAN-A and LAN-B through different interfaces. Each LAN runs a Layer 2 control protocol.
Here, STP is taken as an example.
The process of transparently transmitting Layer 2 control protocol packets is as follows:
l The type of the Layer 2 control protocol packets that need to be transparently transmitted
is set on the interfaces that connect PE1, PE2, and PE3 to CEs, and the original multicast
MAC address of Layer 2 protocol packets from user networks is replaced with a
specified multicast MAC address.
l After identifying that the packets received from CEs are Layer 2 control protocol
packets, PE1 replaces the original multicast MAC address of the packets with the
specified multicast MAC address according to the configured mapping, and then
forwards the packets. The packets whose multicast MAC address is replaced with the
specified multicast MAC address are forwarded as common Layer 2 packets on the ISP
network.
l When the packets reach PE2, PE2 restores the multicast MAC address of the packets to
the standard multicast MAC address according to the configured mapping between
multicast MAC addresses and Layer 2 control protocol packets, and then forwards the
packets to the corresponding CE, completing transparent transmission of Layer 2
protocol packets.

5.8.3.2 VLAN-based Transparent Transmission of Layer 2 Protocol Packets

As shown in Figure 5-129, Layer 2 control protocol packets with a VLAN tag need to be
transparently transmitted. Therefore, the devices in VLAN 100 and VLAN 200 are required to
transparently transmit the Layer 2 protocol packets.
Figure 5-129 VLAN-based transparent transmission of Layer 2 control protocol packets on a
Layer 2 network
[Figure: PE 1, PE 2, and PE 3 form the ISP network and carry a BPDU tunnel. User networks
LAN-A and LAN-B, both running MSTP, attach to the PEs. Labels: CE-VLAN 100; Trunk 100-200.]
PE1, PE2, and PE3 are connected to construct a Layer 2 ISP network. CEs add one tag to
Layer 2 control protocol packets from user networks and then send them to the PEs. The
packets received by PEs have only one tag.
l VLAN-based transparent transmission of Layer 2 control protocol packets is configured
on the interfaces that connect PE1, PE2, and PE3 to CEs.
l After identifying that the packets received from CEs are Layer 2 control protocol
packets, PE1 replaces the original multicast MAC address of the packets with the
specified multicast MAC address according to the configured mapping, and then
forwards the packets. The packets whose multicast MAC address is replaced with the
specified multicast MAC address are forwarded as common Layer 2 VLAN packets on
the ISP network.
l When the packets reach PE2, PE2 restores the multicast MAC address of the packets to
the standard multicast MAC address according to the configured mapping between
multicast MAC addresses and Layer 2 control protocol packets, and then forwards the
packets to the corresponding CE, completing transparent transmission of Layer 2
protocol packets.
5.8.3.3 QinQ-based Transparent Transmission of Layer 2 Protocol Packets

As shown in Figure 5-130, when the edge devices on the ISP are connected to a large number
of VLAN users, you can configure QinQ-based transparent transmission of Layer 2 control
protocol packets on the devices to save VLAN resources.
Figure 5-130 QinQ-based transparent transmission of Layer 2 control protocol packets
[Figure: PE1 and PE2 on the ISP network connect the MSTP sites of LAN-A and LAN-B through BPDU tunnels; the labels show CE-VLAN 100 on both sides and the mapping PE-VLAN 30: CE-VLAN 200~299.]
PE1 and PE2 are connected to construct a Layer 2 switching network. VLAN 20 and VLAN
30 are configured on the PEs. CEs send tagged Layer 2 control protocol packets (VLAN ID
being 100 or 200) to the PEs. QinQ is configured on the interfaces that connect PE1 and PE2
to CEs.
l Set specific VLAN IDs for the Layer 2 protocol packets that are sent from user networks
to the ISP network.
l Configure transparent transmission of Layer 2 protocol packets and the QinQ function on
the interfaces of the ingress on the ISP network.
l According to the user VLAN IDs, the ingress on the ISP network allocates different
outer tags, that is, the public VLAN IDs, to the Layer 2 protocol packets.
l The ingress on the ISP network selects different Layer 2 tunnels according to different
outer tags. Then, the Layer 2 protocol packets are transmitted as ordinary Layer 2 packets
by the devices on the ISP network.
l Configure transparent transmission of Layer 2 protocol packets and the QinQ function on
the interfaces of the egress on the ISP network.
l The egress removes the outer tags and forwards the Layer 2 protocol packets to the
corresponding user networks according to the inner tags.
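The outer-tag handling described above can be sketched as follows. The mapping table is an assumption made for illustration: Figure 5-130 shows CE-VLAN 200~299 mapped to PE-VLAN 30, and the pairing of CE-VLAN 100 to 199 with PE-VLAN 20 is a guess based on the VLANs named in the text. `push_outer_tag` and `pop_outer_tag` are hypothetical helper names, and a tag stack is modeled as a list with the outermost tag first.

```python
# Assumed mapping from user (inner) VLAN ranges to public (outer) VLAN IDs.
OUTER_TAG_RULES = [
    (range(100, 200), 20),  # CE-VLAN 100-199 -> PE-VLAN 20 (assumption)
    (range(200, 300), 30),  # CE-VLAN 200-299 -> PE-VLAN 30 (from the figure)
]


def push_outer_tag(tags):
    """Ingress PE: allocate the outer tag according to the user VLAN ID."""
    inner = tags[0]
    for ce_range, pe_vlan in OUTER_TAG_RULES:
        if inner in ce_range:
            return [pe_vlan] + tags
    raise ValueError(f"no outer tag configured for CE-VLAN {inner}")


def pop_outer_tag(tags):
    """Egress PE: remove the outer tag; the inner tag selects the user network."""
    return tags[1:]


double_tagged = push_outer_tag([250])        # [30, 250] inside the ISP network
delivered = pop_outer_tag(double_tagged)     # [250] toward the user network
```

Many user VLANs share one public VLAN in this scheme, which is how QinQ saves VLAN resources on the ISP network.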
5.8.3.4 Hybrid VLAN-based Transparent Transmission of Layer 2 Protocol Packets

As shown in Figure 5-131, when the devices on the ISP network are connected to VLAN
users, the devices can transparently transmit tagged Layer 2 control protocol packets and
untagged Layer 2 control protocol packets.
Figure 5-131 Hybrid VLAN-based transparent transmission of Layer 2 control protocol packets on a Layer 2 network
[Figure: CE1, CE2, and CE3 connect LAN-A, LAN-B, LAN-C, and LAN-D to PE1, PE2, and PE3 on the ISP network; the links are labeled VLAN 2 and VLAN 3.]
PE1, PE2, and PE3 are connected to construct a Layer 2 switching network. LAN-A, LAN-B,
LAN-C, and LAN-D form different Layer 2 domains in VLAN 2 and VLAN 3 and send
tagged Layer 2 control protocol packets. CE1, CE2, and CE3 forward tagged Layer 2 control
protocol packets (VLAN ID being 2 or 3) and standard untagged Layer 2 control protocol
packets.
l The default VLAN and the dot1q tunnel attribute are configured on the interfaces that
connect PE1, PE2, and PE3 to CEs. In addition, interface-based transparent transmission
of Layer 2 control protocol packets is configured on these interfaces.
l After receiving Layer 2 control protocol packets (tagged or untagged), PE1 replaces the
original multicast MAC address of the packets with a specified multicast MAC address
according to the configured mapping, and then adds an outer VLAN tag with the default
VLAN ID to the packets before forwarding them. The packets whose multicast MAC
address is replaced with the specified multicast MAC address are forwarded as common
Layer 2 VLAN packets on the ISP network.
l When the packets reach PE2 and PE3, PE2 and PE3 restore the multicast MAC address
of the packets to the standard multicast MAC address according to the configured
mapping between multicast MAC addresses and Layer 2 control protocol packets, and
then remove the outer VLAN tag from the packets and forward the packets to the
corresponding CEs, completing transparent transmission of Layer 2 protocol packets.


Acronym&Abbreviation Full Spelling
BPDU Bridge Protocol Data Unit
LACP Link Aggregation Control Protocol
LLDP Link Layer Discovery Protocol
GMRP Generic Multicast Registration Protocol
GVRP Generic VLAN Registration Protocol
HGMP HUAWEI Group Management Protocol
EFM Ethernet in the First Mile
VTP VLAN Trunk Protocol
UDLD Unidirectional Link Detection
PAGP Port Aggregation Protocol
CDP Cisco Discovery Protocol
5.9 ERPS (G.8032)

5.9.1 Overview
Definition
Ethernet Ring Protection Switching (ERPS) is a Layer 2 protocol defined by the International
Telecommunication Union - Telecommunication Standardization Sector (ITU-T) to prevent
loops. As the standard number is ITU-T G.8032/Y.1344, ERPS is also called G.8032. ERPS
defines Ring Auto Protection Switching (R-APS) Protocol Data Units (PDUs) and protection
switching mechanisms.
ERPSv1 and ERPSv2 are currently available. ERPSv2, fully compatible with ERPSv1,
provides the following enhanced functions:
l Supports multi-ring topologies, such as intersecting rings.
l Allows sub-rings to use either virtual channels (VCs) or non-virtual channels (NVCs) to
transmit R-APS PDUs.
l Supports two manual port blocking modes: forced switch (FS) and manual switch (MS).
l Supports both revertive and non-revertive switching.

Purpose
Redundant links are generally used on an Ethernet switching network to provide link backup
for higher network reliability. However, redundant links may produce loops, causing
broadcast storms and reducing the stability of MAC address tables. As a result, the
communication quality deteriorates, and communication services may be interrupted. To
resolve these problems, ring network protocols must be used to prevent loops.
Table 5-35 compares various ring network protocols.
Table 5-35 Ring network protocol comparison

Ring Network Protocol: Rapid Ring Protection Protocol (RRPP)
Advantage: Fast convergence, meeting carrier-class reliability requirements.
Disadvantage:
l Supports only level-1 subrings in ring networking.
l Is a proprietary protocol that cannot be used for communication between Huawei and
non-Huawei devices.

Ring Network Protocol: Spanning Tree Protocol (STP)/Rapid Spanning Tree Protocol (RSTP)/
Multiple Spanning Tree Protocol (MSTP)
Advantage:
l Applies to all Layer 2 networks.
l Is a standard IEEE protocol that allows Huawei devices to communicate with non-Huawei
devices.
Disadvantage: Converges slowly on large-scale networks and cannot meet carrier-class
reliability requirements.

Ring Network Protocol: ERPS
Advantage:
l Features fast convergence, meeting carrier-class reliability requirements.
l Is a standard ITU-T protocol that allows Huawei devices to communicate with non-Huawei
devices.
l Supports single and multi-ring topologies in ERPSv2.
Disadvantage: Requires complex manual configuration of many functions.
Benefits
ERPS offers the following benefits:
l Protects services and prevents broadcast storms on ring networks.

l Meets carrier-class reliability requirements for network convergence.
l Allows communication between Huawei and non-Huawei devices on ring networks.
5.9.2 Principles


Ethernet Ring Protection Switching (ERPS) is a protocol used to block specified ports to
prevent loops at the link layer of an Ethernet network.
As shown in Figure 5-132, ATN A through ATN D constitute a ring and are dual-homed to an
upper-layer network. This access mode will cause a loop on the entire network. To ensure link
connectivity, ERPS is used to prevent loops.
Figure 5-132 ERPS single-ring networking
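[Figure: ATN A through ATN D form an ERPS ring below NPE1 and NPE2, with a CE attached to the ring; the legend marks the RPL, the RPL owner port, and the RPL neighbor port.]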
Figure 5-132 shows a typical ERPS single-ring network. The following describes ERPS
based on this networking:
ERPS Ring
An ERPS ring consists of interconnected ATN devices that have the same control VLAN. A
ring is a basic ERPS unit.
ERPSv1 supports only major rings (closed). ERPSv2 supports both major rings and sub-rings
(open). Major rings can be reconfigured as sub-rings.

Node
A node is an ATN added to an ERPS ring. A node can have a maximum of two ports added to
the same ERPS ring.
Port Role
ERPS defines three port roles: ring protection link (RPL) owner port, RPL neighbor port
(only in ERPSv2), and ordinary port.
l RPL owner port
An RPL owner port is a ring port responsible for blocking traffic over the RPL to prevent
loops. An ERPS ring has only one RPL owner port.
When the node on which the RPL owner port resides receives an R-APS PDU indicating
the failure of a link or node on the ring, it unblocks the RPL owner port to allow the port
to send and receive traffic. This process ensures that traffic is not interrupted.
l RPL neighbor port
An RPL neighbor port is a ring port directly connected to an RPL owner port and is used
to reduce the number of times that filtering database (FDB) entries are refreshed.
RPL owner and neighbor ports are both blocked under normal conditions to prevent
loops.
If an ERPS ring fails, both RPL owner and neighbor ports are unblocked.
l Ordinary port
Ordinary ports are ring ports other than the RPL owner and neighbor ports.
An ordinary port monitors the status of the directly connected ERPS link and sends R-
APS PDUs to inform the other ports if the link status changes.
Port Status
On an ERPS ring, an ERPS-enabled port can be in either of the following states:
l Forwarding: The port forwards user traffic and sends and receives R-APS PDUs.
l Discarding: The port only sends R-APS PDUs.
Control VLAN
A control VLAN is used to transmit R-APS PDUs for an ERPS ring.
Each ERPS ring must be configured with a control VLAN. After a port is added to an ERPS
ring that has a control VLAN configured, the port is added to the control VLAN
automatically.
Different ERPS rings cannot be configured with the same control VLAN ID.
Unlike control VLANs, data VLANs are used to transmit data packets.
ERP Instance
On an ATN running ERPS, the VLAN in which R-APS PDUs and data packets are transmitted
must be mapped to an Ethernet Ring Protection (ERP) instance so that ERPS forwards or
blocks the VLAN packets based on blocking rules. Otherwise, VLAN packets may cause
broadcast storms on the ring network and render the network unavailable.

Timer
ERPS defines four timers: guard timer, wait to restore (WTR) timer, hold-off timer, and wait
to block (WTB) timer (only in ERPSv2).
l Guard timer
After a faulty link or node recovers or a clear operation is executed, the nodes on the two
ends of the link or the recovered node send R-APS No Request (NR) messages to
inform the other nodes of the link or node recovery and start a guard timer. To avoid
acting on out-of-date R-APS Signal Fail (SF) messages, each involved node does not
process any R-APS PDUs before the timer expires. After the timer expires, if the
involved node still receives an R-APS (SF) message, the local port enters the Forwarding
state. (An R-APS (SF) message is sent by a node to other nodes after the node detects
that one of its ring ports is Down.)
l WTR timer
If the RPL owner port is unblocked due to a link or node failure, the involved port may
not go Up immediately after the link or node recovers. To prevent the RPL owner port
from alternating between Up and Down, the node on which the RPL owner port resides
starts a WTR timer after receiving an R-APS No Request (NR) message. If the node
receives an R-APS (SF) message before the timer expires, it terminates the WTR timer.
If the node does not receive any R-APS (SF) message before the timer expires, it blocks
the RPL owner port when the timer expires and sends an R-APS NR, RPL Blocked (NR,
RB) message. After receiving this R-APS (NR, RB) message, the nodes set their
recovered ports on the ring to the Forwarding state.
l Hold-off timer
Protection switching sequence requirements vary for Layer 2 networks running ERPS.
For example, in a multi-layer service application, if a server fails, it will require a certain
period of time to recover. No protection switching is performed immediately after the
server fails, and the client does not detect the failure during this period. A hold-off timer
can be set to meet this requirement. If a fault occurs, the fault is not immediately
reported to ERPS. Instead, the hold-off timer starts. If the fault persists after the timer
expires, the fault is reported to ERPS.
l WTB timer
The WTB timer starts after an FS or MS operation is performed. When multiple nodes
on an ERPS ring are in the FS or MS state, the clear operation takes effect only after the
WTB timer expires. This ensures that the RPL owner port will not be blocked
immediately.
The WTB timer value cannot be configured. Its value is the guard timer value plus 5.
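The WTR timer behavior on the node that holds the RPL owner port can be sketched as a small state machine. This is an illustration under stated assumptions, not the device implementation: `RplOwner` is a hypothetical class, time is advanced explicitly with `tick`, R-APS messages are plain strings, and the 300-second WTR value in the usage lines is an example value only.

```python
class RplOwner:
    """Node holding the RPL owner port, in revertive mode (illustrative only)."""

    def __init__(self, wtr_seconds):
        self.wtr_seconds = wtr_seconds
        self.rpl_blocked = False    # unblocked earlier because of a ring failure
        self.wtr_remaining = None   # None means the WTR timer is not running
        self.sent = []              # R-APS messages sent, kept for inspection

    def receive(self, message):
        if message == "NR" and not self.rpl_blocked and self.wtr_remaining is None:
            self.wtr_remaining = self.wtr_seconds  # recovery reported: start WTR
        elif message == "SF":
            self.wtr_remaining = None              # failure again: terminate WTR

    def tick(self, seconds):
        if self.wtr_remaining is None:
            return
        self.wtr_remaining -= seconds
        if self.wtr_remaining <= 0:
            # WTR expired without a new SF: block the RPL and announce it.
            self.wtr_remaining = None
            self.rpl_blocked = True
            self.sent.append("NR,RB")


owner = RplOwner(wtr_seconds=300)
owner.receive("NR")   # link recovered, WTR starts
owner.tick(100)
owner.receive("SF")   # link flapped, WTR is terminated
owner.tick(300)       # nothing happens, the RPL stays unblocked
owner.receive("NR")
owner.tick(300)       # WTR expires, the RPL is blocked again
```

Terminating the timer on every new SF message is what keeps a flapping link from repeatedly blocking and unblocking the RPL owner port.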
Revertive and Non-revertive Switching

After link faults are rectified, whether to re-block the RPL owner port depends on the
switching mode.
l In revertive switching, the RPL owner port is re-blocked after the wait to restore (WTR)
timer expires, and the traffic channel is blocked on the RPL.
l In non-revertive switching, the traffic channel continues to use the RPL.
ERPSv1 supports only revertive switching. ERPSv2 supports both revertive and non-revertive
switching.

Port Blocking Modes

ERPSv2 supports manual port blocking.
If the RPL has high bandwidth, blocking a low-bandwidth link and unblocking the RPL allow
traffic to use the RPL and have more bandwidth. ERPS supports two manual port blocking
modes: forced switch (FS) and manual switch (MS).
l FS: forcibly blocks a port immediately after FS is configured, irrespective of whether
link failures have occurred.
l MS: forcibly blocks a port when link failures and FS conditions are absent.
In addition to FS and MS operations, ERPS also supports the clear operation. The clear
operation has the following functions:
l Clears an existing FS or MS operation.
l Triggers revertive switching before the WTR or wait to block (WTB) timer expires in the
case of revertive operations.
l Triggers revertive switching in the case of non-revertive operations.
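The difference between the two manual blocking modes can be expressed as a simple acceptance check. This is a minimal sketch of the rules stated above; `command_takes_effect` and its parameters are hypothetical names, and the ring state is reduced to three booleans.

```python
def command_takes_effect(command, link_failed=False, fs_active=False, ms_active=False):
    """Return True if a manual blocking command is accepted on the ring."""
    if command == "FS":
        # FS blocks the port immediately, irrespective of link failures.
        return True
    if command == "MS":
        # MS is accepted only when no link failure, FS, or other MS is present.
        return not (link_failed or fs_active or ms_active)
    raise ValueError(f"unknown command: {command}")


fs_during_failure = command_takes_effect("FS", link_failed=True)
ms_during_failure = command_takes_effect("MS", link_failed=True)
ms_on_healthy_ring = command_takes_effect("MS")
```

In this sketch an MS request is simply refused while a higher-priority condition exists, whereas an FS request always wins.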
R-APS PDU Transmission Mode on Sub-rings

ERPSv2 supports single and multi-ring topologies. In multi-ring topologies, sub-rings use
either R-APS virtual channels (VCs) or non-virtual channels (NVCs) to transmit R-APS PDUs.
l Uses VCs: R-APS PDUs on sub-rings are transmitted to the major ring through
interconnection nodes. The RPL owner port of a sub-ring blocks both R-APS PDUs and
data traffic.
l Uses NVCs: R-APS PDUs on sub-rings are terminated on the interconnection nodes. The
RPL owner port blocks data traffic but not R-APS PDUs on each sub-ring.
On the network shown in Figure 5-133, a major ring is interconnected with two sub-rings.
The sub-ring on the left has a VC, whereas the sub-ring on the right has an NVC.
Figure 5-133 Interconnected rings with a VC or NVC
[Figure: a major ring interconnected with two sub-rings, one with and one without an R-APS virtual channel; the legend marks Ethernet ring nodes, interconnection nodes, RPL owner interfaces, and the R-APS virtual channel.]

By default, sub-rings use NVCs to transmit R-APS PDUs, except for the scenario shown in
Figure 5-134.
NOTE
When sub-ring links are not contiguous, VCs must be used. On the network shown in Figure 5-134,
links b and d belong to major rings 1 and 2, respectively; links a and c belong to the sub-ring. Because
links a and c are not contiguous, they cannot detect the status change between each other. Therefore,
VCs must be used for R-APS PDU transmission.
Figure 5-134 VC application networking
[Figure: a sub-ring with a virtual channel connected between major ring 1 (link b) and major ring 2 (link d); the legend marks Ethernet ring nodes, interconnection nodes, RPL owner interfaces, and the R-APS virtual channel.]
Table 5-36 lists the advantages and disadvantages of R-APS PDU transmission modes on
sub-rings with VCs or NVCs.

Table 5-36 Comparison between R-APS PDU transmission modes on sub-rings with VCs or NVCs
R-APS PDU Transmission Mode on Sub-rings: Using VCs
Advantage: Applicable when sub-ring links are not contiguous.
Disadvantage: Requires VC resource reservation and control VLAN assignment from adjacent
rings.

R-APS PDU Transmission Mode on Sub-rings: Using NVCs
Advantage: Does not require resource reservation or control VLAN assignment from adjacent
rings.
Disadvantage: Inapplicable when sub-ring links are not contiguous.
5.9.2.2 R-APS PDU Format

Ethernet Ring Protection Switching (ERPS) protocol packets are called R-APS PDUs. Ring
Auto Protection Switching (R-APS) Protocol Data Units (PDUs) are transmitted on ERPS
rings to convey ERPS ring information. Figure 5-135 shows the basic R-APS PDU format.
Figure 5-135 Basic R-APS PDU format

Octets 1 to 4: MEL (3 bits), Version (5 bits), OpCode (R-APS = 40), Flags (0), TLV Offset (32)
Octets 5 to 36: R-APS Specific Information (32 octets)
Octet 37 onward: optional TLVs, if any
Last octet: End TLV (0)
Table 5-37 describes the fields in an R-APS PDU.
Table 5-37 R-APS PDU field description

Field Name (Length): Description
MEL (3 bits): Identifies the maintenance entity group (MEG) level of the R-APS PDU.
Version (5 bits):
l 0x00: used in ERPSv1.
l 0x01: used in ERPSv2.
OpCode (8 bits): Indicates an R-APS PDU. The value of this field is 0x28.
Flags (8 bits): The value of this field is 0x00. This field should be ignored upon reception.
TLV Offset (8 bits): The value of this field is 0x20.
R-APS Specific Information (32 x 8 bits): Carries R-APS ring information and is the core
of an R-APS PDU. Some sub-fields of this field have different meanings in ERPSv1 and
ERPSv2. Figure 5-136 shows the R-APS Specific Information field format in ERPSv1. Figure
5-137 shows the R-APS Specific Information field format in ERPSv2.
TLV (length not limited): Describes information to be loaded. The end TLV value is 0x00.
Figure 5-136 R-APS Specific Information field format in ERPSv1

[Figure: the first octet carries Request/State (4 bits) and Reserved 1 (4 bits); the second octet is Status, made up of RB, DNF, and Status Reserved; Node ID (6 octets) and Reserved 2 (24 octets) follow.]
Figure 5-137 R-APS Specific Information field format in ERPSv2

[Figure: the first octet carries Request/State (4 bits) and Sub-code (4 bits); the second octet is Status, made up of RB, DNF, BPR, and Status Reserved; Node ID (6 octets) and Reserved 2 (24 octets) follow.]

Table 5-38 describes sub-fields in the R-APS Specific Information field.
Table 5-38 Sub-fields in the R-APS Specific Information field

Sub-Field Name (Length): Description
Request/State (4 bits): Indicates that this R-APS PDU is a request or state PDU. The
value can be:
l 1101: forced switch (FS)
l 1110: Event
l 1011: signal fail (SF)
l 0111: manual switch (MS)
l 0000: no request (NR)
l Others: reserved
Reserved 1 or Sub-code (4 bits): Reserved 1 is used in ERPSv1 for message reply or
protection identifier. Sub-code is used in ERPSv2 with its value determined by the
Request/State field value:
l If the Request/State field value is 1110, the Sub-code value is 0000, meaning Flush
Request.
l If the Request/State field value is any value other than 1110, the Sub-code value is
0000 and ignored upon reception.
Status (8 bits): Includes the following status information:
l RPL Blocked (RB) (1 bit): If the value is 1, the RPL owner port is blocked; if the value
is 0, the RPL owner port is unblocked. The nodes without the RPL owner port set this
sub-field to 0 when sending an R-APS PDU.
l Do Not Flush (DNF) (1 bit): If the value is 1, an FDB flush should not be triggered by
the reception of the R-APS PDU; if the value is 0, an FDB flush may be triggered by
the reception of the R-APS PDU.
l Blocked port reference (BPR) (1 bit): If the value is 0, ring link 0 is blocked; if the
value is 1, ring link 1 is blocked. BPR is valid only in ERPSv2.
l Status Reserved (5 bits): This sub-field is reserved for future specification and should
be ignored upon reception. This sub-field should be encoded as all 0s in transmission.
Node ID (6 x 8 bits): Identifies the MAC address of a node on the ERPS ring. It is
informational and does not affect protection switching on the ERPS ring.
Reserved 2 (24 x 8 bits): Reserved for future extension and should be ignored upon
reception. Currently, this sub-field should be encoded as all 0s in transmission.
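The field layout in Table 5-37 and Table 5-38 can be illustrated by packing an R-APS PDU by hand. This is a sketch under stated assumptions: `build_raps_pdu` is a hypothetical helper, the bit positions follow the left-to-right order of the figures (MEL and Request/State in the high-order bits of their octets; RB, DNF, and BPR as the three most significant bits of Status), and the MEL value and node MAC address in the usage lines are example values. No optional TLVs are added.

```python
import struct

OPCODE_RAPS = 0x28   # 40 decimal
TLV_OFFSET = 0x20    # 32 octets of R-APS Specific Information
REQUEST_STATE = {"FS": 0b1101, "EVENT": 0b1110, "SF": 0b1011, "MS": 0b0111, "NR": 0b0000}


def build_raps_pdu(mel, version, request, node_id, rb=False, dnf=False, bpr=0):
    """Pack the common header, the R-APS specific information, and the End TLV."""
    first_octet = ((mel & 0x07) << 5) | (version & 0x1F)
    header = struct.pack("!BBBB", first_octet, OPCODE_RAPS, 0x00, TLV_OFFSET)
    # Request/State in the upper 4 bits; Reserved 1 / Sub-code left as 0000.
    request_octet = REQUEST_STATE[request] << 4
    # RB, DNF, BPR, then 5 reserved bits encoded as 0.
    status = (int(rb) << 7) | (int(dnf) << 6) | ((bpr & 1) << 5)
    info = struct.pack("!BB6s24s", request_octet, status, node_id, bytes(24))
    return header + info + b"\x00"   # End TLV


node_mac = bytes.fromhex("001122334455")   # example node ID
nr_rb = build_raps_pdu(mel=7, version=1, request="NR", node_id=node_mac, rb=True)
sf = build_raps_pdu(mel=7, version=1, request="SF", node_id=node_mac)
```

With no optional TLVs the PDU is 37 octets long: 4 header octets, 32 octets of R-APS specific information, and the End TLV in octet 37, which matches the layout in Figure 5-135.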
5.9.2.3 ERPS Single Ring Principles

ERPS is a standard ring protocol used to prevent loops on ERPS rings at the Ethernet link
layer. An ATN can have a maximum of two ports added to the same ERPS ring.
To prevent loops on an ERPS ring, you can enable a loop-breaking mechanism to block the
ring protection link (RPL) owner port to eliminate loops. If a link on the ring network fails,
the ERPS-enabled ATN immediately unblocks the blocked port and performs link switching
to restore communication between nodes on the ring network.
This section describes how ERPS is implemented on a single ring when links are normal,
when a link fails, and when the link recovers.
Links Are Normal

On the network shown in Figure 5-138, ATN A through ATN E constitute a ring network, and
they can communicate with each other.
1. To prevent loops, ERPS blocks the RPL owner port and also the RPL neighbor port (if
any is configured). All other ports can transmit service traffic.
2. The RPL owner port sends R-APS (NR) messages to all other nodes on the ring at an
interval of 5s, indicating that ERPS links are normal.

Figure 5-138 ERPS single ring networking (links are normal)
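[Figure: ATN A through ATN E form an ERPS ring below NPE1 and NPE2, with a CE attached to the ring; the RPL owner port is on ATN C, and the legend marks the blocked interface and the data flow.]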
A Link Fails
As shown in Figure 5-139, if the link between ATN D and ATN E fails, the ERPS protection
switching mechanism is triggered. The ports on both ends of the faulty link are blocked, and
the RPL owner port and RPL neighbor port are unblocked to send and receive traffic. This
mechanism ensures that traffic is not interrupted. The process is as follows:
1. After ATN D and ATN E detect the link fault, they block their ports on the faulty link
and perform a Filtering Database (FDB) flush.
2. ATN D and ATN E send three consecutive R-APS Signal Fail (SF) messages to the other
nodes and then send one R-APS (SF) message at an interval of 5s afterwards.
3. After receiving an R-APS (SF) message, the other nodes perform an FDB flush. ATN C,
on which the RPL owner port resides, and ATN B, on which the RPL neighbor port
resides, unblock the RPL owner port and the RPL neighbor port, respectively, and
perform an FDB flush.
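The effect of protection switching on the ring topology can be checked with a small model. This sketch assumes the ring order A-B-C-D-E-A and takes the RPL to be the link between ATN B and ATN C, because the text places the RPL neighbor port on ATN B and the RPL owner port on ATN C. `blocked_links` and `is_loop_free_and_connected` are hypothetical helpers, and R-APS signaling is not modeled.

```python
RING = ["A", "B", "C", "D", "E"]   # assumed ring order A-B-C-D-E-A
LINKS = [frozenset((RING[i], RING[(i + 1) % len(RING)])) for i in range(len(RING))]
RPL = frozenset(("B", "C"))        # link between the RPL neighbor and RPL owner


def blocked_links(failed=None):
    """Normal state: only the RPL is blocked. After a failure: only the failed link."""
    return {RPL} if failed is None else {failed}


def is_loop_free_and_connected(blocked):
    """True if the unblocked links connect all nodes without forming a loop."""
    active = [link for link in LINKS if link not in blocked]
    # A connected graph with n nodes and n - 1 links is a tree, so it has no loop.
    if len(active) != len(RING) - 1:
        return False
    reached, frontier = {"A"}, ["A"]
    while frontier:
        node = frontier.pop()
        for link in active:
            if node in link:
                (other,) = link - {node}
                if other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return reached == set(RING)


normal = is_loop_free_and_connected(blocked_links())
after_failure = is_loop_free_and_connected(blocked_links(frozenset(("D", "E"))))
unprotected = is_loop_free_and_connected(set())   # no port blocked: a loop exists
```

In both the normal state and the failure state exactly one ring link is out of service, so every node stays reachable and no loop forms.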

Figure 5-139 ERPS single ring networking (unblocking the RPL owner port and RPL
neighbor port if a link fails)
[Figure: the ring of ATN A through ATN E below NPE1 and NPE2 with a failed link; the RPL owner port is on ATN C, and the legend marks the failed link, the blocked interfaces, and the data flow.]
The Link Recovers

After the link fault is rectified, either of two situations may occur:
l If the ERPS ring uses revertive switching, the RPL owner port is blocked again, and the
link that has recovered is used to forward traffic.
l If the ERPS ring uses non-revertive switching, the RPL remains unblocked, and the link
that has recovered remains blocked.
The following example uses revertive switching to describe the process after the link
recovers.
1. After the link between ATN D and ATN E recovers, ATN D and ATN E start a guard
timer to avoid acting on out-of-date R-APS PDUs. The two ATN devices do not process
any R-APS PDUs before the timer expires. At the same time, ATN D and ATN E send
R-APS (NR) messages to the other nodes.
2. After receiving an R-APS (NR) message, ATN C on which the RPL owner port resides
starts the wait to restore (WTR) timer. After the WTR timer expires, ATN C blocks the
RPL owner port and sends R-APS (NR, RB) messages.

3. After receiving an R-APS (NR, RB) message, ATN D and ATN E unblock the ports at
the two ends of the link that has recovered, stop sending R-APS (NR) messages, and
perform an FDB flush. The other nodes also perform an FDB flush after receiving an
R-APS (NR, RB) message.
Protection Switching
l Forced switch
On the network shown in Figure 5-140, ATN A through ATN E on the ERPS ring can
communicate with each other. A forced switch (FS) operation is performed on ATN E's
port that connects to ATN D, and this port is blocked. Then the RPL owner
port and RPL neighbor port are unblocked to send and receive traffic. This mechanism
ensures that traffic is not interrupted. The process is as follows:
a. After ATN E's port that connects to ATN D is forcibly blocked, ATN E performs
an FDB flush.
b. ATN E sends three consecutive R-APS (FS) messages to the other nodes and then
sends one R-APS (FS) message at an interval of 5s afterwards.
c. After receiving an R-APS (FS) message, the other nodes perform an FDB flush.
ATN C, on which the RPL owner port resides, and ATN B, on which the RPL
neighbor port resides, unblock the RPL owner port and the RPL neighbor
port, respectively, and perform an FDB flush.

Figure 5-140 Layer 2 ERPS ring networking (blocking a port by FS)
[Figure: the ring of ATN A through ATN E below NPE1 and NPE2, with a CE attached to the ring; the RPL owner port is on ATN C, and the legend marks the blocked interface and the data flow.]
l Clear
After a clear operation is performed on ATN E, the port that is forcibly blocked by FS
sends R-APS (NR) messages to all other ports on the ERPS ring.
– If the ERPS ring uses revertive switching, the RPL owner port starts the wait to
block (WTB) timer after receiving an R-APS (NR) message. After the WTB timer
expires, the FS operation is cleared. The RPL owner port is then blocked, and the
blocked port on ATN E is unblocked. If you perform a clear operation on ATN C
(on which the RPL owner port resides) before the WTB timer expires, the RPL
owner port is immediately blocked, and the blocked port on ATN E is unblocked.
– If the ERPS ring uses non-revertive switching and you want to block the RPL
owner port, perform a clear operation on ATN C (on which the RPL owner port
resides).
l Manual switch
Compared with an FS operation, a manual switch (MS) operation triggers protection
switching in a similar way except that an MS operation does not take effect in FS, MS,
or link failure conditions.

5.9.2.4 ERPS Multi-ring Principles

Ethernet Ring Protection Switching version 1 (ERPSv1) supports only single ring topology,
whereas ERPSv2 supports single and multi-ring topologies.
A multi-ring network consists of one or more major rings and sub-rings. A sub-ring can have
a virtual channel (VC) or non-virtual channel (NVC), depending on whether Ring Auto
Protection Switching (R-APS) Protocol Data Units (PDUs) on the sub-ring will be transmitted
to a major ring.
This section describes how ERPS is implemented on a multi-ring network with sub-rings that
have NVCs when links are normal, when a link fails, and when the link recovers.
Links Are Normal

On the multi-ring network shown in Figure 5-141, ATN A through ATN E constitute a major
ring; ATN B, ATN C, and ATN F constitute sub-ring 1, and ATN C, ATN D, and ATN G
constitute sub-ring 2. The devices on each ring can communicate with each other.
1. To prevent loops, each ring blocks its ring protection link (RPL) owner port. All other
ports can transmit service traffic.
2. The RPL owner port on each ring sends R-APS (NR) messages to all other nodes on the
same ring at an interval of 5s. The R-APS (NR) messages on the major ring are
transmitted only on this ring. The R-APS (NR) messages on each sub-ring are terminated
on the interconnection nodes and therefore are not transmitted to the major ring.
Traffic between PC1 and the upper-layer network travels along the path PC1 <-> ATN F <->
ATN B <-> ATN A <-> NPE1; traffic between PC2 and the upper-layer network travels
along the path PC2 <-> ATN G <-> ATN D <-> ATN E <-> NPE2.

Figure 5-141 ERPS multi-ring networking (links are normal)
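[Figure: ATN A through ATN E form the major ring below NPE1 and NPE2; ATN B, ATN C, and ATN F form sub-ring 1, with PC1 attached to ATN F; ATN C, ATN D, and ATN G form sub-ring 2, with PC2 attached to ATN G; the legend marks the RPL, the RPL owner ports, and the data flow.]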
A Link Fails
As shown in Figure 5-142, if the link between ATN D and ATN G fails, the ERPS protection
switching mechanism is triggered. The ports on both ends of the faulty link are blocked, and
the RPL owner port on sub-ring 2 is unblocked to send and receive traffic. In this situation,
traffic from PC1 still travels along the original path. ATN C and ATN D inform the other
nodes on the major ring of the topology change so that traffic from PC2 is also not
interrupted. Traffic between PC2 and the upper-layer network travels along the path PC2 <->
ATN G <-> ATN C <-> ATN B <-> ATN A <-> ATN E <-> NPE2. The process is as
follows:
1. After ATN D and ATN G detect the link fault, they block their ports on the faulty link
and perform a Filtering Database (FDB) flush.
2. ATN G sends three consecutive R-APS (SF) messages to the other nodes and then sends
one R-APS (SF) message at an interval of 5s afterwards.
3. ATN G then unblocks the RPL owner port and performs an FDB flush.
4. After the interconnection node ATN C receives an R-APS (SF) message, it performs an
FDB flush. ATN C and ATN D then send R-APS Event messages within the major ring
to notify the major ring of the topology change in sub-ring 2.
5. After receiving an R-APS Event message, the other nodes on the major ring perform an
FDB flush.
Then traffic from PC2 is switched to a normal link.

Figure 5-142 ERPS multi-ring networking (unblocking the RPL owner port if a link fails)
[Figure: the multi-ring network of Figure 5-141 after a link failure; the legend marks the RPL, the blocked interfaces, and the data flow.]
The Link Recovers

After the link fault is rectified, either of two situations may occur:
l If the ERPS ring uses revertive switching, the RPL owner port is blocked again, and the
link that has recovered is used to forward traffic.
l If the ERPS ring uses non-revertive switching, the RPL remains unblocked, and the link
that has recovered remains blocked.

The following example uses revertive switching to describe the process after the link
recovers.
1. After the link between ATN D and ATN G recovers, ATN D and ATN G start a guard
timer to avoid acting on out-of-date R-APS PDUs. The two ATN devices do not process
any R-APS PDUs before the timer expires. Then ATN D and ATN G send R-APS (NR)
messages within sub-ring 2.
2. ATN G on which the RPL owner port resides starts the wait to restore (WTR) timer.
After the WTR timer expires, ATN G blocks the RPL owner port and unblocks its port
on the link that has recovered and then sends R-APS (NR, RB) messages within sub-ring
2.
3. After receiving an R-APS (NR, RB) message from ATN G, ATN D unblocks its port on
the recovered link, stops sending R-APS (NR) messages, and performs an FDB flush.
ATN C also performs an FDB flush.
4. ATN C and ATN D, the interconnection nodes, then send R-APS Event messages within
the major ring to notify the link recovery of sub-ring 2.
5. After receiving an R-APS Event message, the other nodes on the major ring perform an
FDB flush.
Then traffic changes to the normal state, as shown in Figure 5-141.
5.9.2.5 ERPS Multi-instance

On a common ERPS network, a physical ring can be configured with a single ERPS ring, and
a single blocked port can be specified on the ring. If the ERPS ring is complete, the blocked
port prevents all user packets from passing through. As a result, all user packets travel
along a single path over the ERPS ring, and the link on the blocked port stays idle,
wasting bandwidth.
The ERPS multi-instance allows two logical ERPS rings on a physical ring. On the ERPS ring
shown in Figure 5-143, all ATN devices, ports, and control VLANs work based on basic
ERPS rules. A physical ring has two blocked ports. Each blocked port verifies the
completeness of the physical ring and blocks or forwards data without affecting others.
One or two ERPS rings can be configured over a physical ring. Each ERPS ring is configured
with an ERPS protected instance. Each protected instance represents a range of VLANs. The
topology calculated for a specific ERPS ring does not apply to another ERPS ring and does
not affect other rings.
With a specific protected instance for each ERPS ring, a blocked port takes effect only on
VLANs of a specific ERPS ring. Different VLANs can use separate paths, implementing
traffic load balancing and link backup.
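The per-VLAN effect of a blocked port can be illustrated with a small lookup. This is a sketch under assumptions: the VLAN ranges are the ones printed in Figure 5-143, but which range belongs to which ring and which port each ring blocks are placeholders chosen for the example.

```python
# Hypothetical ring table: one protected VLAN range and one blocked port per ERPS ring.
RINGS = {
    "ring1": {"vlans": range(100, 201), "blocked_port": "P1"},
    "ring2": {"vlans": range(300, 401), "blocked_port": "P2"},
}


def ring_for_vlan(vlan: int):
    """Return the name of the ERPS ring whose protected instance covers the VLAN."""
    for name, ring in RINGS.items():
        if vlan in ring["vlans"]:
            return name
    return None


def is_blocked(port: str, vlan: int) -> bool:
    """A port blocks a VLAN only if it is the blocked port of that VLAN's ring."""
    name = ring_for_vlan(vlan)
    return name is not None and RINGS[name]["blocked_port"] == port
```

Under these assumptions, P1 drops VLAN 150 but forwards VLAN 350, so the two VLAN ranges take different paths around the same physical ring.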

Figure 5-143 Networking diagram for the ERPS multi-instance
[Figure: four ATN devices form one physical ERPS ring below NPE1 and NPE2; CE1 (VLAN 100~200) and CE2 (VLAN 300~400) attach to the ring; ports P1 and P2 are marked. Legend: ERPS ring1, ERPS ring2, Blocked Interface1, Blocked Interface2, Data Flow1, Data Flow2.]
5.9.2.6 Association Between ERPS and Ethernet CFM

When a transmission device connected to an Ethernet Ring Protection Switching (ERPS)
ring fails, ERPS, which has no automatic link detection mechanism, cannot detect the
failure quickly. Convergence slows down, and services may even be interrupted.
To resolve this problem, ERPS can be associated with Ethernet
connectivity fault management (CFM).

After Ethernet CFM is deployed on ERPS nodes connecting to transmission devices and
detects a transmission link failure, Ethernet CFM informs the ERPS ring of the failure so that
ERPS can perform fast protection switching.
NOTE
Currently, ERPS can be associated only with outward-facing MEPs.
On the network shown in Figure 5-144, ATN A, ATN B, and ATN C form an ERPS ring.
Three relay nodes exist between ATN A and ATN C. Ethernet CFM is configured on ATN A
and ATN C. Interface1 on ATN A is associated with Interface1 on Relay1, and Interface1 on
ATN C is associated with Interface1 on Relay3.
In normal situations, the RPL owner port sends R-APS (NR) messages to all other nodes on
the ring at an interval of 5s, indicating that ERPS links are normal.
Figure 5-144 ERPS ring over transmission links (links are normal)
[Figure: ATN A, ATN B, and ATN C form the ERPS ring; Relay1, Relay2, and Relay3 lie between ATN A and ATN C and are joined through Interface1 ports. The RPL owner port is marked. Legend: data flow.]
If Relay2 fails, ATN A and ATN C detect the Ethernet CFM failure, block their Interface1,
send R-APS (SF) messages through their respective interfaces connected to ATN B, and then
perform a Filtering Database (FDB) flush.
After receiving an R-APS (SF) message, ATN B unblocks the RPL owner port and performs
an FDB flush. Figure 5-145 shows the networking after Relay2 fails.

Figure 5-145 ERPS ring over transmission links (Relay2 fails)
[Figure: same topology as Figure 5-144 with ATN A, ATN B, ATN C, Relay1, Relay2, and Relay3. Legend: blocked interface, data flow.]
After Relay2 recovers, ATN B in revertive switching mode re-blocks the RPL owner port and
sends R-APS (NR, RB) messages.
After ATN A and ATN C receive an R-APS (NR, RB) message, ATN A and ATN C unblock
their blocked Interface1 and perform an FDB flush so that traffic changes to the normal state,
as shown in Figure 5-144.
5.9.3 Applications
5.9.3.1 ERPS Layer 2 Transparent Transmission

Generally, redundant links are used on an Ethernet switching network to provide link backup
and enhance network reliability. The use of redundant links, however, may produce loops,
causing broadcast storms and rendering the MAC address table unstable. As a result, the
communication quality deteriorates, and communication services may be interrupted.
To prevent loops caused by redundant links, enable ERPS on the nodes of the ring network.
ERPS is a Layer 2 loop-breaking protocol defined by the ITU-T. It converges quickly,
completing protection switching within 50 ms.
As shown in Figure 5-146, ATN A through ATN E constitute an aggregation ring that
provides Layer 2 aggregation services and accesses a Layer 3 network for service processing.
The aggregation ring runs ERPS, providing protection switching for Layer 2 redundant links.
VLANIF interfaces are configured on ATN A and ATN B for Layer 3 access. In addition,
VRRP is configured on the VLANIF interfaces to function as the virtual gateway, and peer
BFD is enabled for fast fault detection and then fast VRRP switching.
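Figure 5-146 ERPS single-ring networking
[Figure: ATN A through ATN E form the ERPS ring below NPE1 and NPE2; VRRP and peer BFD are marked at the top of the ring; CE1, CE2, and CE3 attach to the ring. The RPL and the RPL owner are marked. Legend: blocked port, Data Flow1, Data Flow2, Data Flow3.]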

Terms
Term Description
FDB Forwarding database, including entries for guiding data forwarding. There are
Layer 2 FDB and Layer 3 FDB. The Layer 2 FDB refers to the MAC table,
which provides information about MAC addresses and outbound interfaces and
guides Layer 2 forwarding. The Layer 3 FDB refers to the ARP table, which
provides information about IP addresses and outbound interfaces and guides
Layer 3 forwarding.
MSTP The Multiple Spanning Tree Protocol (MSTP) is a spanning tree protocol
defined in IEEE 802.1s. MSTP uses the concepts of region and instance. Based
on different requirements, MSTP divides a large network into regions where
instances are created. These instances are mapped to VLANs. BPDUs with
region and instance information are transmitted between bridges. A bridge
determines which region it belongs to based on the information carried in
BPDUs.
RRPP The Rapid Ring Protection Protocol (RRPP) is a link layer protocol specially
used to prevent loops on an Ethernet ring network. Devices running RRPP detect
loops on the network by exchanging information with each other, and block
certain interfaces to eliminate loops.
RSTP The Rapid Spanning Tree Protocol (RSTP) is defined in IEEE 802.1w, released
in 2001. RSTP amends and supplements STP to implement rapid convergence.
STP The Spanning Tree Protocol (STP) is defined in IEEE 802.1d released in 1998.
This protocol is used to eliminate loops on a LAN. The ATN devices running
STP detect loops on the network by exchanging information with each other, and
block specified interfaces to eliminate loops.

Abbreviation
APS Automatic Protection Switching
ERPS Ethernet Ring Protection Switching
FS forced switch
MEL maintenance entity group level
MS manual switch
NR No Request
NR, RB No Request, RPL Blocked
R-APS Ring Automatic Protection Switching
RPL ring protection link
SF Signal Fail
WTB wait to block
WTR wait to restore
5.10 Automatic Link Discovery
5.10.1 Introduction
Definition
The Automatic Link Discovery Protocol (ALDP) is a Huawei proprietary feature used by the
ATN to discover neighbors at the link layer. This protocol allows the Network Management
System (NMS) to use the Simple Network Management Protocol (SNMP) and Management
Information Base (MIB) to initiate a link detection process. After receiving the set operation
delivered through the MIB, the ATN sends Link Detect packets to its neighboring devices. Upon
receiving the Link Detect packets, the neighboring devices respond with Link Reply packets
to the ATN. The ATN then saves the neighbor information. The NMS can use the MIB to view
the neighbor information on the ATN and calculate the network topology based on the
obtained neighbor information.
Purpose
At present, many NMSs use the Automated Discovery function to trace topology changes.
The function only allows the NMS to calculate the topology at the network layer and
determine to which subnet a device belongs. As a result, the topology discovery result only
shows basic topology information such as device addition or deletion, but not detailed
topology information such as interfaces through which devices are connected, device
locations, and network topology status.
To discover detailed topology information at the link layer, including interface connection
information on devices, the automatic link discovery protocol is introduced.
Benefits
This feature improves carriers' network sensitivity to topology changes and operating
efficiency.
5.10.2 Principles


1. The NMS requests automatic optical fiber discovery on an interface of ATN A. After
receiving the request, ATN A generates one or multiple Link Detect packets and sends
them to ATN B.
2. ATN B encapsulates its link information into Link Reply packets and sends them back to
ATN A.
3. ATN A saves the link information about ATN B as a neighbor information record.
4. The NMS queries link discovery results, and ATN A returns the results to the NMS.
5. The NMS collects topology information from all its managed devices' MIBs and
calculates the entire network topology.
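Steps 1 through 3 can be sketched as a small simulation. This is only an illustration: the class, the dictionary keys, and the interface names are hypothetical and do not reflect the on-wire format or the MIB objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbor:
    """One neighbor information record saved by the discovering device."""
    local_ne: str
    local_if: str
    remote_ne: str
    remote_if: str


class Device:
    def __init__(self, ne_id: str):
        self.ne_id = ne_id
        self.links = {}      # local interface -> (peer device, peer interface)
        self.neighbors = []  # saved neighbor records

    def connect(self, local_if: str, peer: "Device", peer_if: str):
        self.links[local_if] = (peer, peer_if)
        peer.links[peer_if] = (self, local_if)

    def discover(self, local_if: str) -> Neighbor:
        """Send a Link Detect, receive the Link Reply, and save the record."""
        peer, peer_if = self.links[local_if]
        detect = {"send_ne": self.ne_id, "send_if": local_if}
        reply = peer.handle_detect(detect, peer_if)
        record = Neighbor(self.ne_id, local_if, reply["send_ne"], reply["send_if"])
        self.neighbors.append(record)
        return record

    def handle_detect(self, detect: dict, in_if: str) -> dict:
        # The reply echoes the received link info and adds the replier's own.
        return {"recv_ne": detect["send_ne"], "recv_if": detect["send_if"],
                "send_ne": self.ne_id, "send_if": in_if}
```

An NMS-side query (step 4) would then simply read `neighbors` from each device and join the records into a topology.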
Automatic Link Discovery Between Directly Connected Neighbors

Automatic link discovery between directly connected neighbors is implemented as follows:
Figure 5-147 Automatic link discovery between directly connected neighbors
[Figure: NMS, ATN A, ATN B, and PTN. Legend: SNMP packets, Link Detect packets.]
Automatic Link Discovery Between Indirectly Connected Neighbors

The implementation of automatic link discovery between indirectly connected neighbors is
similar to that of automatic link discovery between directly connected neighbors. The only
difference is that automatic link discovery is not enabled on transit nodes, so the transit
nodes transparently transmit Link Detect and Link Reply packets.

Figure 5-148 Link automatic discovery between indirectly connected neighbors
[Figure: NMS, ATN A, and ATN B; ATN A and ATN B are connected through a tunnel across the network. Legend: SNMP packets, Link Detect packets.]
5.10.2.2 Automatic Link Discovery Packets
Packet Format
Figure 5-149 and Figure 5-150 show formats of automatic link discovery packets.
Figure 5-149 Format of an automatic link discovery packet on an Ethernet physical link
Field        Length    Value
DA           6 bytes   0xff-ff-ff-ff-ff-ff
SA           6 bytes
Type         2 bytes   0x0000
Flag         20 bytes  "Huawei Link Search"
Information  *
CRC          4 bytes
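The field layout for the Ethernet physical link can be sketched with Python's `struct` module. Several details here are assumptions made for illustration: the Flag string is padded to 20 bytes with NUL bytes, the Information field is treated as opaque bytes, and `zlib.crc32` stands in for the 4-byte CRC.

```python
import struct
import zlib

BROADCAST = bytes.fromhex("ffffffffffff")
# 20-byte Flag field; the NUL padding is an assumption, not taken from the document.
FLAG = b"Huawei Link Search".ljust(20, b"\x00")


def build_frame(src_mac: bytes, information: bytes) -> bytes:
    """Assemble DA | SA | Type | Flag | Information | CRC."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    body = BROADCAST + src_mac + struct.pack("!H", 0x0000) + FLAG + information
    crc = zlib.crc32(body) & 0xFFFFFFFF  # stand-in checksum for the CRC field
    return body + struct.pack("<I", crc)


def parse_frame(frame: bytes) -> dict:
    """Split a frame built by build_frame back into its fields."""
    (eth_type,) = struct.unpack("!H", frame[12:14])
    return {
        "da": frame[0:6],
        "sa": frame[6:12],
        "type": eth_type,
        "flag": frame[14:34],
        "information": frame[34:-4],
        "crc": frame[-4:],
    }
```

The fixed part of the frame is 6 + 6 + 2 + 20 bytes, so the Information field always starts at offset 34 in this sketch.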
Figure 5-150 Format of an automatic link discovery packet on an Ethernet sub-interface link
DA | SA | 0x8100 | VLAN | Type | Flag | Information | CRC
The figure also contains a field labeled "Hardware type + protocol type + hardware address length + protocol address length".

Packet Type
Automatic link discovery packets are classified into two types:
l Link Detect packet: generated by the ATN at the request of the NMS to discover links.
The TLV field in a Link Detect packet is Send Link Info SubTLV.
l Link Reply packet: a response to a Link Detect packet. The TLV fields in a Link
Reply packet are Recv Link Info SubTLV and Send Link Info SubTLV.
5.10.2.3 Neighbor Information Parameters

Neighbor information on all types of interfaces contains the following parameters:
l Local network element ID
l Local interface number
l Remote network element ID
l Remote interface number
Neighbor information on sub-interfaces also contains the following parameters:
l Local VLAN ID
l Remote VLAN ID
5.10.2.4 Implementation of Automatic Link Discovery

l Automatic link discovery on the ATN can be implemented based on interfaces or the
entire device.
l Automatic link discovery allows the NMS to use the MIB to collect topology
information.
l Automatic link discovery allows neighbor information to be cleared from an interface,
sub-interfaces of a specific interface, and the entire ATN.
5.10.3 Applications
The automatic link discovery function allows the device to obtain neighbor information at the
link layer, expanding the network scale managed by the NMS and providing network
administrators with detailed network topology information.
In Figure 5-151, the network on which automatic link discovery is enabled can be a VLAN
network or a network that traverses third-party SDH devices. A network administrator can click
a link on the NMS to acquire information about the link and its connected network elements.

Figure 5-151 Usage scenario of automatic link discovery
[Figure: NMS, ATN A, ATN B, and PTN.]

6 WAN Access
About This Chapter
This chapter describes the WAN features in terms of their overview, principles, and
applications.
6.1 ATM IMA

6.2 PPP and MP
6.3 CES
6.4 BER Measurement
6.5 APS
This document describes principles and applications of the Automatic Protection Switching
(APS) feature.
6.6 xDSL
6.7 GPON
Gigabit-capable passive optical network (GPON) is a PON technology standardized by ITU-T
Recommendation G.984.x. GPON devices support high-bandwidth transmission, thereby
addressing the bandwidth bottleneck in twisted-pair access and meeting user demands on
high-bandwidth services.
6.1 ATM IMA
6.1.1 Introduction
Definition
IMA is the acronym of Inverse Multiplexing for ATM. The general idea of IMA is that the
sender schedules and distributes a high-speed ATM cell stream to multiple low-speed physical
links for transmission, and then the receiver schedules and reassembles the stream fragments
into one cell stream and submits the cell stream to the ATM layer. In this manner, bandwidths
are multiplexed flexibly, improving the efficiency of bandwidth usage.

Based on ATM circuits on PSNs, Asynchronous Transfer Mode over Packet Switching
Networks (ATMoPSN) is a type of PWE3 service emulation. ATMoPSN emulates ATM
services over a PSN such as an MPLS or Ethernet network, and transparently transmits ATM
services over the PSN. ATM cells can be encapsulated in the following modes: 1-to-1 VPC, 1-
to-1 VCC, N-to-1 VPC, and N-to-1 VCC.
Purpose
Currently, on mobile carriers' networks, a great number of ATM switches are deployed at
convergence points to converge ports and bandwidths for the ATM and IMA interfaces of base
stations. With the changes in the entire industry chain, ATM switches are showing
disadvantages in terms of costs and scalability.
Along with the trend of All-IP on core networks and increasing use of the Ethernet technology
on access-layer devices, the Ethernet plus IP solution has become more appealing to
customers than conventional service access and bearing solutions, in terms of both costs and
resource usage. Therefore, for service providers and users, the provision and bearing of ATM
services need to be shifted to PSNs. ATMoPSN is a well-developed solution to meet this
need.
Benefits
This feature offers the following benefits to carriers:
l Construction and maintenance of networks will cost less.
l Networks can be expanded flexibly and bandwidth usage is more efficient.
This feature offers the following benefits to users:
None
6.1.2 Principles
Basic IMA Concepts
l ICP cell
ICP is short for IMA Control Protocol. ICP cells are a type of IMA negotiation cells,
used mainly to synchronize frames and transmit control information (such as the IMA
version, IMA frame length, and peer mode) between communicating devices. The offset
of ICP cells in IMA frames on a link is fixed. Like common cells, ICP cells consist of a
5-byte header and 48-byte payload.
l Filler cell
In the ATM model without an IMA sub-layer, decoupling of cell rates is implemented by
Idle cells at the Transmission Convergence (TC) sub-layer. After the IMA sub-layer is
adopted, decoupling of cell rates can no longer be implemented at the TC sub-layer due
to frame synchronization. Therefore, Filler cells are defined at the IMA sub-layer to
implement decoupling of cell rates. If there is no ATM cell to be sent, the sender sends
Filler cells so that the physical layer transmits cells at a fixed rate.
l Minimum number of active links
It refers to the minimum number of active links that are required for the IMA group
to enter the Operational state. Link faults may cause the number of active links of an
IMA group in the Operational state to fall below the configured minimum value. As
a result, the IMA group status changes and IMA may go Down. Two communicating
devices can be configured with different minimum numbers of active links, but each
device must have at least its configured minimum number of active links to
send ATM cells properly.
l Differential delay
Links in an IMA group may have different delays and jitters. If the difference between
the greatest phase and the smallest phase in an IMA group exceeds the configured
differential delay, the IMA group removes the link with the longest delay from the
cyclical sending queue and informs the peer that the link is unavailable by sending
IMA Control Protocol (ICP) cells. Through negotiation between the two ends of a link,
the link becomes active and then rejoins the cyclical sending queue of the IMA group.
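The cyclical sending queue, Filler cells, and the differential delay check can be sketched as follows. This is a simplified illustration, not the IMA frame format: cells are plain strings, Filler cells only pad the last round, ICP cells are omitted, and repeating the removal until the spread fits is an assumption.

```python
FILLER = "FILLER"  # stand-in for an IMA Filler cell


def distribute(cells, num_links):
    """Spread a cell stream over the links in round-robin order."""
    links = [[] for _ in range(num_links)]
    for i, cell in enumerate(cells):
        links[i % num_links].append(cell)
    longest = max(len(queue) for queue in links)
    for queue in links:
        # Pad with Filler cells so that every link carries the same number of cells.
        queue.extend([FILLER] * (longest - len(queue)))
    return links


def reassemble(links):
    """Read the links in the same cyclic order and drop Filler cells."""
    stream = []
    for one_round in zip(*links):
        stream.extend(cell for cell in one_round if cell != FILLER)
    return stream


def prune_slow_links(delays_ms, max_diff_ms):
    """Return the links removed so that the delay spread fits the configured limit."""
    active = dict(delays_ms)
    removed = []
    while len(active) > 1 and max(active.values()) - min(active.values()) > max_diff_ms:
        slowest = max(active, key=active.get)
        removed.append(slowest)
        del active[slowest]
    return removed
```

For example, seven cells over three links leave two links one cell short, so each of those links carries one Filler cell in the last round.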
ATMoPSN
Figure 6-1 shows a reference model of ATM cell transport.
Figure 6-1 Model of ATM cell transport
[Figure: ATN, PE1, PE2, and RNC. Labels: ATM service, PSN tunnel, pseudowire (PW1, PW2), emulated service.]
Layer 2 service emulation attempts to emulate original ATM services between two PEs
connected through PWs that are set up to transmit packets, cells, and bit streams over public
networks or PSNs.
The outer tag (called PSN Label) identifies a PSN tunnel; the inner tag (called PW Header)
identifies a PW; ATM cells that are used for Layer 2 connections are the payload of PWs.
ATM cell transport involves three levels (port, VP, and VC), four encapsulation modes
(N-to-1, 1-to-1, AAL5-PDU, and AAL5-SDU), and two transparent transport modes (cell and
frame). They are applicable to different scenarios.
Currently, only the N-to-1 VPC, N-to-1 VCC, 1-to-1 VPC, and 1-to-1 VCC encapsulation modes
of AAL0 cells are supported.
l N-to-1 VPC
In N-to-1 VPC ATM cell transport, a PW transmits cells of multiple ATM VPCs. A
tunnel packet carries both the VPI and the VCI information, as shown in Figure 6-2.

Figure 6-2 Format of a tunnel packet in N-to-1 VPC mode

Control word (optional)
VPI | VCI | PTI | C
ATM payload (48 bytes)
VPI | VCI | PTI | C
l N-to-1 VCC
In N-to-1 VCC ATM cell transport, a PW transmits cells of multiple ATM VCCs. A
tunnel packet carries both the VPI and the VCI information. ATM cell transport through
PWs must support the N-to-1 VCC mode. In this mode, multiple VCs can be set up
between a PE and a CE. Data transmission on VCs is independent of each other.
In N-to-1 VCC ATM cell transport, multiple VCs of different service boards can be
mapped to a PW, as shown in Figure 6-3.
Figure 6-3 Format of a tunnel packet in N-to-1 VCC mode

Control word (optional)
VPI | VCI | PTI | C
VPI | VCI | PTI | C
l 1-to-1 VPC

In 1-to-1 VPC ATM cell transport, one PW transmits cells of one ATM VPC. A tunnel
packet carries the VCI information but not the VPI information, as shown in Figure 6-4.
Figure 6-4 Format of a tunnel packet in 1-to-1 VPC mode

PSN transport header (as required)
Pseudowire header
0000 | Resvd | Optional sequence number | M | V | Res | PTI | C
VCI
ATM cell payload (48 bytes)
M | V | Res | PTI | C | VCI
VCI
l 1-to-1 VCC
In 1-to-1 VCC ATM cell transport, one PW transmits cells of one ATM VCC. A tunnel
packet does not carry the VPI or VCI information, as shown in Figure 6-5.
Figure 6-5 Format of a tunnel packet in 1-to-1 VCC mode

PSN transport header (as required)
Pseudowire header
0000 | Resvd | Optional sequence number | M | V | Res | PTI | C
M | V | Res | PTI | C

l M (transport mode) bit

Bit (M) of the control byte indicates whether the packet contains an ATM cell or a frame
payload. If set to 0, the packet contains an ATM cell. If set to 1, the PDU contains an
AAL5 payload.
l V (VCI present) bit
Bit (V) of the control byte indicates whether the VCI field is present in the packet. If set
to 1, the VCI field is present for the cell. If set to 0, no VCI field is present. In the case of
a VCC, the VCI field is not required. For VPC, the VCI field is required and is
transmitted with each cell.
l Reserved bits
The reserved bits should be set to 0 at the transmitter and ignored upon reception.
l PTI Bits
The 3-bit Payload Type Identifier (PTI) incorporates ATM Layer PTI coding of the cell.
These bits are set to the value of the PTI of the encapsulated ATM cell.
l C (CLP) Bit
The Cell Loss Priority (CLP) field indicates CLP value of the encapsulated cell.
l VCI Bits
The 16-bit Virtual Circuit Identifier (VCI) incorporates ATM Layer VCI value of the
cell.
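The per-cell headers described above can be sketched with bit operations. The function names are hypothetical, and the bit widths not stated in this section (12-bit VPI, 2 reserved bits, 1-bit M, V, and C) are assumptions that follow the layout in RFC 4717.

```python
import struct


def pack_n_to_1_cell(vpi: int, vci: int, pti: int, clp: int, payload: bytes) -> bytes:
    """N-to-1 mode: a 4-byte VPI | VCI | PTI | C header followed by the 48-byte payload."""
    if len(payload) != 48:
        raise ValueError("an ATM cell payload is 48 bytes")
    if not (0 <= vpi < 4096 and 0 <= vci < 65536 and 0 <= pti < 8 and clp in (0, 1)):
        raise ValueError("field out of range")
    word = (vpi << 20) | (vci << 4) | (pti << 1) | clp
    return struct.pack("!I", word) + payload


def pack_1_to_1_control(m: int, v: int, pti: int, clp: int, vci: int = 0) -> bytes:
    """1-to-1 mode: control byte M | V | Res | PTI | C, plus the VCI when V is 1."""
    byte = (m << 7) | (v << 6) | (pti << 1) | clp  # the two reserved bits stay 0
    out = bytes([byte])
    if v:
        out += struct.pack("!H", vci)  # 16-bit VCI, present in VPC mode only
    return out


def parse_1_to_1_control(byte: int) -> dict:
    """Decode the control byte of a 1-to-1 encapsulated cell."""
    return {
        "M": (byte >> 7) & 1,
        "V": (byte >> 6) & 1,
        "PTI": (byte >> 1) & 0b111,
        "C": byte & 1,
    }
```

In 1-to-1 VCC mode the V bit is 0 and the control byte stands alone; in 1-to-1 VPC mode the V bit is 1 and the two VCI bytes follow it.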
6.1.3 Applications
Applicable Scenario 1
Figure 6-6 Networking diagram of applicable scenario 1
[Figure: co-located 2G, 2.5G, and 3G base stations connect to PE1 using ATM over E1; PE1 and PE2 are connected by ATM over PSN across the packet switched network; PE2 connects to the 2G BSC, 2.5G BSC, and 3G RNC using ATM over E1 or E3/OC3.]
Scenario description

As shown in Figure 6-6, after ATM services from the NodeB are converged at the E1 interface
on PE1, ATM cells are encapsulated into PSN packets that can be transmitted over PSNs.
After arriving at the downstream PE2, the PSN packets are decapsulated into the original ATM
cells and then the ATM cells are sent to the RNC.
Advantages of the solution
In this solution, services of multiple types are converged at a PE on a PSN. This improves the
efficiency of current network resources, reduces Plesiochronous Digital Hierarchy (PDH)
VLLs, and facilitates the deployment of new sites as well as the maintenance and
management of multiple services.
Figure 6-7 Networking diagram of applicable scenario 2
[Figure: NodeBs attach over N*E1 (ATM IMA) links; PWE3 ATM transparent cell transport crosses MPLS over Metro Ethernet to the CX600, which attaches to the RNC over N*E1 (ATM IMA). The Iub interface is marked.]
Deploying ATN on a Metro Ethernet-based MPLS network, as shown in Figure 6-7, can
solve the problem of bandwidth statistical multiplexing. In this scheme, a NodeB is connected
to the ATN that provides an E1 IMA interface. After the ATN receives cells on the IMA
interface, it transparently transmits the high-speed ATM cell stream through ATM PWE3 to
the CX at the RNC side. Then, the CX at the RNC side divides the high-speed ATM cell
stream into N segments, and sends each segment along a low-speed E1 link to the RNC.
In this solution, MPLS networks are used to implement bandwidth multiplexing, reducing
costs on network construction and maintenance.

Terms
None

Acronym & Abbreviation Full Name
FMC Fixed-Mobile Convergence
IMA Inverse Multiplexing for ATM
AN Access Node
PSN Packet Switched Network
IP RAN IP Radio Access Network
PWE3 Pseudo-Wire Emulation Edge-to-Edge
PW Pseudo Wire
6.2 PPP and MP

6.2.1 Introduction
Definition
The Point-to-Point Protocol (PPP) is a link layer protocol.
PPP consists of the following types of protocols:
l Link Control Protocol (LCP): creates, monitors, and tears down PPP links.
l Network Control Protocol (NCP) suite: negotiates the format and type of packets
transmitted on data links.
l Extended PPP suite (such as PPPoE): extends PPP functions.
In the event that a single synchronous serial interface cannot meet the bandwidth requirement,
you can use the Multilink PPP (MP) to bundle multiple synchronous serial interfaces to form
a logical interface to meet the bandwidth requirement.
Purpose
PPP transmits data between two peers over full-duplex synchronous or asynchronous links. In
addition, PPP provides authentication mechanisms.

6.2.2 Principles
6.2.2.1 Process of Establishing a PPP Connection

A PPP connection is established after a series of negotiations.
Figure 6-8 Flowchart of establishing a PPP connection
[Figure: phases Dead, Establish, Authenticate, Network, and Terminate, connected by the events UP, OPENED, SUCCESS/NONE, FAIL, and DOWN.]
In the process of configuring, maintaining, and terminating the point-to-point (P2P) link, the
P2P link goes through several distinct phases which are specified in Figure 6-8:
1. Link Dead
Setup of a PPP link begins and ends with the Link Dead phase.
After the communicating devices on both ends detect that a physical link is activated
(generally, carrier signals are detected on the link), the devices enter the Link
Establishment phase.
2. Link Establishment
In this phase, the LCP negotiation is performed. The negotiation involves options such
as: Maximum Receive Unit (MRU), authentication mode, magic number, and
asynchronous character mapping.
If the LCP negotiation fails, both ends return to the Link Dead phase. If the LCP
negotiation succeeds, LCP changes to an Open state, indicating that the lower-layer link
has been established and the devices enter the next phase. If authentication is configured,
the devices enter the Authentication phase; if authentication is not configured, the
devices enter the Network-Layer Protocol phase.
3. Network-Layer Protocol
Once PPP has completed the previous phases, each network-layer protocol (such as IP,
IPX, or AppleTalk) must be separately configured by the appropriate Network Control
Protocol (NCP). After an NCP enters the Open state, PPP will carry the corresponding
network-layer protocol packets.
If one device receives a Configure-Request packet in this phase, both devices return to
the Link Establishment phase.
4. Link Termination
PPP can terminate the link at any time. This might happen because of the loss of carrier
signal, authentication failure, link quality failure, the expiration of an idle-period timer,
or the administrative closing of the link.
LCP is used to close the link through the exchange of Terminate packets. When the link
is closing, PPP informs the network-layer protocols so that they may take appropriate
action. After the exchange of Terminate packets, the implementation should signal the
physical-layer to disconnect in order to enforce the termination of the link.
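The phases above can be sketched as a transition table. The phase and event names follow the labels in Figure 6-8; `CLOSING` and `CONFIGURE_REQUEST` are hypothetical event names added for the termination and renegotiation cases, and the case without authentication is modeled as a `NONE` event that passes straight through the Authenticate phase.

```python
# (current phase, event) -> next phase
TRANSITIONS = {
    ("Dead", "UP"): "Establish",
    ("Establish", "OPENED"): "Authenticate",
    ("Establish", "FAIL"): "Dead",
    ("Authenticate", "SUCCESS"): "Network",
    ("Authenticate", "NONE"): "Network",
    ("Authenticate", "FAIL"): "Terminate",
    ("Network", "CONFIGURE_REQUEST"): "Establish",
    ("Network", "CLOSING"): "Terminate",
    ("Terminate", "DOWN"): "Dead",
}


def run(events, phase="Dead"):
    """Return the list of phases visited while applying the events in order."""
    trace = [phase]
    for event in events:
        phase = TRANSITIONS[(phase, event)]  # KeyError marks an invalid transition
        trace.append(phase)
    return trace
```

A link without authentication that is later closed administratively would visit Dead, Establish, Authenticate, Network, Terminate, and Dead again.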

6.2.2.2 Process of Establishing an MP Connection

Certain MP options such as the maximum received reconstructed unit (MRRU) and endpoint
discriminator are negotiated in the process of LCP negotiation.
MP negotiation involves LCP negotiation and NCP negotiation in sequence.
l LCP negotiation: Both ends begin with LCP negotiation. In addition to negotiating
general LCP parameters, both ends need to confirm that the peer interface is working in
MP mode. If the two ends are not in this mode, LCP negotiation will fail.
l NCP negotiation: NCP negotiation is performed based on the NCP parameters of the
MP-Group interface or on a specified virtual template (VT). NCP parameters configured
on the physical interface do not take effect.
After NCP negotiation succeeds, an MP link is established.
When configuring an MP-Group interface, ensure that the interfaces to be added to the MP-
Group interface are on the same card in the same slot.
When MP bundling is implemented using a global-MP-group interface, only trunk-serial
interfaces can be added to the global-MP-group interface.
During MP negotiation, the ATN requires a negotiated endpoint discriminator by default. If
the peer device does not send the endpoint discriminator, MP negotiation will fail. In this
case, you can configure the local device so that it does not negotiate the endpoint
discriminator with the peer.
6.2.2.3 PPP and MP Features Supported by the ATN

The ATN supports the configuration of PPP on serial interfaces to implement the following
functions:
l Supporting MRU negotiation
The ATN supports the configuration of MP connections in MP-Group mode. The following
types of interfaces on the ATN support MP bundling:
l Synchronous serial interfaces
The ATN does not support inter-board or inter-subcard MP bundling. Therefore, all the
interfaces to be added to the same MP-Group interface must be on the same subcard in the
same slot. The two ends must add the same number of interfaces to an MP-Group.
In the case of the AND1MD1A/AND1MD1B/AND2MD1A/AND2MD1B, interfaces among the first
16 E1 interfaces and interfaces among the last 16 E1 interfaces cannot be added to the
same MP-Group.
When multiple interfaces on one end are bound to an MP-Group, their directly connected
interfaces on the other end must be bound to the same MP-Group.
6.2.3 Applications
None

Terms
None


MP Multilink PPP
LCP Link Control Protocol
NCP Network Control Protocol
MRU Maximum Receive Unit
RTP Real-Time Transport Protocol
CCP Compression Control Protocol
6.3 CES
6.3.1 Introduction
Definition
l TDM
Time Division Multiplexing (TDM) divides a channel by time, samples voice signals,
and places each sampled signal in a fixed interval, called a timeslot, in time
sequence. In this way, multiple signals can be combined through TDM into one
high-rate composite digital signal (group signal) with a defined structure. Each
signal is transmitted independently.
Figure 6-9 Multiplexing and demultiplexing for TDM
Traditional Transmission Mode

After processed by Pulse Code Modulation (PCM), voice signals, together with other
digital signals, are transmitted through Plesiochronous Digital Hierarchy (PDH) or
Synchronous Digital Hierarchy (SDH) connections by using the TDM technology.
Generally speaking, PDH/SDH services are called TDM services.
Service System
TDM services are classified by transmission mode as follows:

– In the PDH system, E1, T1, E3, and T3 are usually used.
– In the SDH system, the STM-1, STM-4, and STM-16 are usually used.
Clock Synchronization
TDM services require clock synchronization. One of the two communicating parties
takes the clock of the other as the source; that is, the device functioning as the Data
Circuit-terminating Equipment (DCE) outputs clock signals to the device functioning as
the Data Terminal Equipment (DTE). If the clock mode is incorrect or the clock is faulty,
bit errors occur or synchronization fails.
The synchronization clock signals for TDM services are extracted from the physical
layer. The 2.048 MHz synchronization clock signals for E1 are extracted from the line
code. The transmission adopts HDB3 or AMI coding that carries timing information.
Therefore, devices can extract clock signals from these two types of codes.
l PWE3
Pseudo Wire Emulation Edge-to-Edge (PWE3) is a mechanism to emulate core features
of the telecom service through PSN, such as a T1 leased line or frame relay (FR). The
PW technology is used to carry emulated services from one PE to another PE or multiple
PEs through a PSN. It adopts a tunnel (IP/MPLS) on the PSN to emulate multiple
services, such as HDLC, PPP, TDM, and Ethernet.
PSN can transmit Protocol Data Units (PDUs) of multiple services. Interoperability and
conversion between the services are not required. Tunnels used for PWE3 are called
pseudo wires (PW). PW data traffic is invisible for the core network. The core network
transparently transmits CE services.
Figure 6-10 PWE3 framework
[Figure: CE2, PE2, PSN, PE1, and CE1 in a row; an AC joins each CE to its PE, and the PW runs between the PEs inside a PSN tunnel.]
l TDMoPSN
Based on TDM circuits on a PSN, TDM Circuits over Packet Switching Networks
(TDMoPSN) is a kind of PWE3 service emulation. TDMoPSN emulates TDM services
over a PSN such as an MPLS or Ethernet network, thereby transparently transmitting
TDM services over the PSN. TDMoPSN is mainly implemented by means of two
protocols: Structure-Agnostic TDM over Packet (SAToP) and Structure-Aware TDM
Circuit Emulation Service over Packet Switched Network (CESoPSN).
l IP RAN
IP RAN is a technology that mobile carriers use to carry wireless services over the IP
network. IP RAN scenarios are complex because different base stations (BSs), interface
technologies, and access and convergence scenarios are involved.
– 2G/2.5G/3G/LTE, traditional BSs/IP BSs, GSM/CDMA, TDM/ATM/IP (interface
technologies) are involved.
– Varying with the BS type, distribution model, network environment, and evolution
process, the convergence modes include microwave, MSTP, DSL, PON, and Fiber.
You can converge services on BSs directly to the MAN UPE or through
convergence gateways (with functions of BS convergence, compression
optimization, packet gateway, and offload).

– Reliability, security, QoS and operation and maintenance (OM) are considered in IP
RAN scenarios. In some IP RAN scenarios, transmission efficiency is concerned.
Purpose
TDMoPSN is a mature solution of this kind. It is used to access and bear TDM services on
the PSN. TDMoPSN is mainly applied to IP RAN to carry wireless services and to carry
fixed network services between MSAN devices.
Benefits
The TDMoPSN feature offers the following benefits to carriers:
l Saves rent for expensive TDM leased lines.
l Facilitates smooth evolution of the network.
l Simplifies network operations and reduces maintenance cost.
l Binds only the useful time slots into packets to improve the resource utilization.
The TDMoPSN feature offers the following benefits to users:
Enterprises that access the network for voice services no longer need to pay fixed network
operators expensive rent for leased lines.
6.3.2 Principles
TDMoPSN
A TDMoPSN packet, as defined by RFC 4553 (Structure-Agnostic Time Division Multiplexing
over Packet), includes the Ethernet header, the TDMoPSN packet (CESoPSN or SAToP
packet), and the FCS.
Figure 6-11 Structure of a TDMoPSN packet
TDMoPSN frame: Ethernet header | IP/UDP or MPLS header | Control word | TDM payload | FCS
l SAToP
The Structure-Agnostic TDM over Packet (SAToP) function emulates low-rate PDH circuit
services.
SAToP is used to carry E1 services in unframed (non-structured) mode. It segments and
encapsulates the serial data streams of TDM services, and then transmits the encapsulated
packets over a PW. Among TDM circuit emulation schemes, SAToP is the simplest method
for transparently transmitting low-rate PDH services.

Figure 6-12 SAToP
(Diagram: a DS1/T1 stream with framing bit F and channels Ch1 to Ch24 enters the TDMoIP
IWF, which carries it unchanged as the payload behind the IP/UDP/RTP/PW header. The T1
is treated as a bit stream that is transported over the PSN.)
Features of non-structured transmission mode are as follows:

– The mode does not need to protect the integrity of the TDM structure, nor does it need
to interpret or manipulate individual channels.
– It is suitable for PSNs with higher transmission performance.
– It needs neither to distinguish channels nor to intervene in TDM signaling.
l CESoPSN
Figure 6-13 CESoPSN
(Diagram: DS1/T1 streams, each with framing bit F and channels Ch1 to Ch24, enter the
TDMoIP IWF. The payload behind the IP/UDP/RTP/PW header carries the channels of several
frames without the framing bit. Channels may come from any T1 stream, and the IWF may
provide a cross-connect function.)
Features of the structured transmission mode are as follows:

– When services are carried on the PSN, the TDM structure must be protected
explicitly.
– Structure-aware transmission can be applied to PSNs with poor network performance,
where it makes the transmission more reliable.

Other Key Technologies

l Jitter Buffer
After traversing the MPLS network, PW packets may reach the egress PE at irregular
intervals or out of order. Therefore, the egress PE uses a jitter buffer to smooth the
arrival intervals of PW packets and reconstruct the TDM service flow.
A larger jitter buffer can tolerate greater jitter in the packet transmission interval on
the network, but it causes a longer delay in reconstructing the TDM service data stream.
Users can configure the jitter buffer depth to suit different delay and jitter conditions.
l Analysis of data delay
Most TDM services are voice services and therefore require short delay. As mentioned in
ITU-T G.111 (A.4.4.1 Note 3), when the delay reaches 24 ms, the user can perceive echo
in the voice service.
In typical scenarios, the TDMoPSN processing delay is calculated as follows:
TDMoPSN processing delay = Hardware processing delay + Jitter buffer depth +
Encapsulation time + Network delay
Where:
– Hardware processing delay: fixed
– Jitter buffer depth: manually configurable
– Encapsulation time: 0.125 ms x Number of encapsulated frames
– Network delay: delay between the two PEs on the network
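The delay formula above can be sketched as a short calculation. The function name and the input values below are hypothetical and serve only to illustrate the arithmetic; they are not measured values for any device.

```python
E1_FRAME_PERIOD_MS = 0.125  # one E1 frame lasts 1/8000 s


def tdmopsn_delay_ms(hardware_ms, jitter_buffer_ms, frames_per_packet, network_ms):
    """Sum the four delay components of the formula (all values in ms)."""
    encapsulation_ms = E1_FRAME_PERIOD_MS * frames_per_packet
    return hardware_ms + jitter_buffer_ms + encapsulation_ms + network_ms


# Hypothetical inputs: 0.5 ms hardware delay, 3 ms jitter buffer,
# 8 frames per packet, 5 ms network delay between the PEs.
total_ms = tdmopsn_delay_ms(0.5, 3.0, 8, 5.0)
print(total_ms)         # 9.5
print(total_ms < 24.0)  # True: below the 24 ms echo threshold cited above
```

With these inputs, the encapsulation time contributes 1 ms, so the jitter buffer depth and the network delay dominate the budget.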
l Clock synchronization
TDM services are fixed-rate data services and therefore require clock synchronization
of input/output services between the upstream device and the downstream device.
Traditional TDM services can synchronize clocks through a physical link, but TDMoPSN
services are carried on a PSN. As a result, TDM services lose their synchronization
clock signals when they reach a downstream PE.
Downstream PEs can synchronize clocks in either of the following modes:
– External BITS clock
– Packet recovery clock
Downstream PEs can use an algorithm to extract clock signals from received PWE3
packets. The extracted clock is called the packet recovery clock. Two algorithms
with different implementation methods are available: Adaptive Clock Recovery
(ACR) and Differential Clock Recovery (DCR).
l QoS processing
TDM services require low delay, low jitter, and fixed bandwidth. Therefore, they must be
given high QoS and forwarding priority.
6.3.2.2 IP RAN Implementation on the Device

SAToP and CESoPSN are two protocol standards of TDM PWE3. They differ in that SAToP is
insensitive to the E1 frame structure and packs the whole E1 frame in a PW, whereas
CESoPSN is sensitive to the E1 frame structure and packs E1 data by timeslot and tunnel.
Unframed E1 frames are packed through SAToP, which simplifies configuration, and framed
E1 frames are packed through CESoPSN.
TDMoPSN services on the ATN are encapsulated through MPLS. The CESoPSN
encapsulation structure complies with draft-ietf-pwe3-cesopsn-07, and the SAToP
encapsulation structure complies with RFC 4553 (Structure-Agnostic Time Division
Multiplexing over Packet).
CESoPSN implementation
CESoPSN services are encapsulated through MPLS, with the structure defined by
Recommendation draft-ietf-pwe3-cesopsn-07 as shown in Figure 6-14.
Figure 6-14 CESoPSN
(Layout, top to bottom, 32 bits wide)
MPLS label stack
CESoPSN control word
Fixed RTP header (optional)
Packetized TDM data (payload)
l MPLS Label
The specified PSN header includes the data required for forwarding packets from the PSN
border gateway to the TDM border gateway.
PWs are distinguished by PW labels that are carried on the specified layer of the PSN.
Because TDM is bidirectional, the two PWs in opposite directions must be correlated.
l PW Control Word
The structure of the CESoPSN control word is defined by Recommendation draft-ietf-
pwe3-cesopsn-07 as shown in Figure 6-15.
Figure 6-15 PW Control Word
Bits 0-3: 0 0 0 0 | L (1 bit) | R (1 bit) | M (2 bits) | FRG (2 bits) | LEN (6 bits) | Sequence number (16 bits)
The fields of the PW control word on the ATN are filled as follows:

– Bit 0 to bit 3: always set to 0.
– L bit (1 bit), R bit (1 bit), and M bits (2 bits): used to transparently transmit
alarms and to indicate that an upstream PE has detected a severe alarm on the CE
or AC side.
– FRG (2 bits): always set to 0.
– Length (6 bits): length of the TDMoPSN packet (control word and payload) when
padding is used to meet the minimum transmission unit requirement on the PSN.
When the TDMoPSN packet is longer than 64 bytes, this field is set to all 0s.
– Sequence number (16 bits): used for PW sequencing and for detecting discarded
and disordered packets. The sequence number occupies a 16-bit unsigned circular
space. It starts from 0 by default and starts from 1 after the
tdm-sequence-number value is configured.
l Optional RTP
An RTP header can carry timestamp information to a remote device to support a packet
recovery clock such as DCR. The packet recovery clock is not discussed in this
document. In addition, packets transmitted to some devices must include the RTP
header. In other situations, omitting the RTP header is recommended to save bandwidth.
The RTP header is not added by default, but you can configure it to be added to packets.
The RTP configurations of the PEs on both sides must be the same; otherwise, the two PEs
cannot communicate with each other.
Figure 6-16 RTP header
(Layout, 32 bits per row)
V=2 | P | X | CC | M | PT | sequence number
timestamp
synchronization source (SSRC) identifier
contributing source (CSRC) identifiers
...
The padding method for the RTP header on the ATN is to keep the sequence number (16
bits) consistent with the PW control word and pad other bits with 0s.
l TDM Payload
The length of the TDM payload, in bytes, is the number of encapsulated frames multiplied
by the number of timeslots bound to the PW. When the whole PW packet is shorter than
64 bytes, fixed padding bits are added to meet the requirements of Ethernet
transmission.
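The control word layout described above can be sketched as a bit-packing routine. This is a minimal illustration, not the ATN implementation: the function name is hypothetical, and the treatment of a packet of exactly 64 bytes (LEN set to 0, because a 6-bit field cannot hold 64) is an assumption.

```python
import struct


def pack_cesopsn_control_word(l_bit, r_bit, m_bits, seq, packet_len):
    """Pack 0000 | L | R | M(2) | FRG(2) | LEN(6) | SEQ(16) into 4 bytes.

    packet_len is the length of control word plus payload in bytes.
    """
    # Assumption: LEN carries the length only for packets shorter than
    # 64 bytes (the ones that need padding); otherwise it is all 0s.
    length = packet_len if packet_len < 64 else 0
    word = (l_bit & 0x1) << 27        # bit 4 counted from the MSB
    word |= (r_bit & 0x1) << 26       # bit 5
    word |= (m_bits & 0x3) << 24      # bits 6-7
    word |= 0 << 22                   # FRG, bits 8-9, always 0
    word |= (length & 0x3F) << 16     # LEN, bits 10-15
    word |= seq & 0xFFFF              # sequence number, bits 16-31
    return struct.pack("!I", word)    # network byte order


# Hypothetical PW: 8 frames per packet, 4 timeslots bound to the PW.
payload_len = 8 * 4                   # 32 bytes of TDM payload
packet_len = 4 + payload_len          # control word + payload = 36 bytes
print(pack_cesopsn_control_word(0, 0, 0, 1, packet_len).hex())  # 00240001
```

Here LEN is 36 (0x24) because the packet is shorter than 64 bytes. With 31 timeslots bound to the PW, the same 8 frames give a 252-byte packet and LEN is 0.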

SAToP implementation
SAToP services are encapsulated through MPLS, with the structure defined by RFC 4553
(Structure-Agnostic Time Division Multiplexing over Packet), as shown in Figure 6-17.
Figure 6-17 SAToP
(Layout, top to bottom, 32 bits wide)
MPLS label stack
SAToP control word
Fixed RTP header (optional)
TDM data (payload)
l MPLS Label
The MPLS label for SAToP is the same as the MPLS label for CESoPSN.
l PW Control Word
The structure of the SAToP control word is defined by RFC 4553 (Structure-Agnostic
Time Division Multiplexing over Packet), as shown in Figure 6-18.
Figure 6-18 PW Control Word
Bits 0-3: 0 0 0 0 | L (1 bit) | R (1 bit) | RSV (2 bits) | FRG (2 bits) | LEN (6 bits) | Sequence number (16 bits)
The fields of the PW control word on the ATN are filled as follows:

– Bit 0 to bit 3: always set to 0.
– L bit (1 bit) and R bit (1 bit): used to transparently transmit alarms and to
indicate that an upstream PE has detected a severe alarm on the CE or AC side.
– RSV (2 bits) and FRG (2 bits): always set to 0.
– Length (6 bits): length of the TDMoPSN packet (control word and payload) when
padding is used to meet the minimum transmission unit requirement on the PSN.
When the TDMoPSN packet is longer than 64 bytes, this field is set to all 0s.
– Sequence number (16 bits): used for PW sequencing and for detecting discarded
and disordered packets. The sequence number occupies a 16-bit unsigned circular
space. The initial value of the sequence number is random.
l Optional RTP
The optional RTP for SAToP is the same as the optional RTP for CESoPSN.
l TDM Payload
The length of the TDM payload, in bytes, is the number of encapsulated frames multiplied
by 32. When the whole PW packet is shorter than 64 bytes, fixed padding bits are added
to meet the requirements of Ethernet transmission.
Timeslot 0 Transparent Transmission

When the E1 frame uses the CRC4 multiframe structure, bits Sa4 to Sa8 in timeslot 0 of the
E1 frame are used to transmit signaling defined by the operator.
If timeslot 0 is configured on both sides of the PSN, timeslot 0 in the upstream direction
is processed in the same way as the data tunnel: it is packed as a PW on its own or bound
with other timeslots into a PW. In the downstream direction, the framer is configured to
transparently transmit the Sa bits. The Sa bits use the data received from the network,
and the other bits in timeslot 0 are generated locally.
Statistics of Alarms and Error Codes

l E1
Alarms: LOS, LOF, RDI, and AIS in framed mode; LOS and AIS in unframed mode.
Statistics: number of missing packets, number of jitter buffer overruns, number of jitter
buffer underruns, number of misordered packets, number of malformed packets, number of
misconnected packets, number of errored seconds, number of severely errored seconds, and
number of unavailable seconds.
Implementation Procedures
E1 frames are transmitted at 8000 frames/second, and each frame contains 32 bytes. An E1
frame consists of 32 timeslots, each of which corresponds to one of the 32 bytes. In
CESoPSN mode, for example, timeslot 0 (byte 0 of the 32 bytes) serves as the frame header;
it cannot carry data and is used for special processing. The other 31 timeslots correspond
to bytes 1 to 31 of each E1 frame. In SAToP mode, no frame header is used and an E1 frame
consists of 32 bytes.
As shown in Figure 6-19, the following implementation procedure runs from CE1 through PE1
and PE2 to CE2. In the direction of TDM transparent transmission from CE1 to PE1, in
CESoPSN mode, PE1 encapsulates bytes 1 to 31 (the payload) of each E1 frame received from
CE1 in a PW packet. In SAToP mode, PE1 takes 256 bits (32 x 8 = 256 bits) from the bit
stream as the payload and encapsulates them in a PW packet. Because the frequency of E1
frames is fixed, PE1 receives data (31 bytes or 256 bits) from CE1 at a fixed frequency and
continuously encapsulates the data in the PW packet. When the number of encapsulated
frames reaches the preconfigured number, the whole PW packet is sent to the PSN.
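The packetization arithmetic above can be illustrated with a small calculation. The function names and the choice of 8 frames per packet are hypothetical; the constants (8000 frames/second, 32 bytes per frame, 31 data timeslots in CESoPSN mode) come from the text.

```python
E1_FRAMES_PER_SECOND = 8000
E1_BYTES_PER_FRAME = 32


def payload_bytes(mode, frames_per_packet, timeslots=31):
    """TDM payload size of one PW packet, in bytes."""
    if mode == "satop":
        # SAToP carries the whole 32-byte frame, with no frame header.
        return frames_per_packet * E1_BYTES_PER_FRAME
    if mode == "cesopsn":
        # CESoPSN carries only the timeslots bound to the PW.
        return frames_per_packet * timeslots
    raise ValueError("mode must be 'satop' or 'cesopsn'")


def packets_per_second(frames_per_packet):
    """PW packets sent per second for one E1."""
    return E1_FRAMES_PER_SECOND / frames_per_packet


def packetization_delay_ms(frames_per_packet):
    """Time needed to collect the frames of one packet, in ms."""
    return frames_per_packet * 1000 / E1_FRAMES_PER_SECOND


print(payload_bytes("satop", 8))     # 256
print(payload_bytes("cesopsn", 8))   # 248
print(packets_per_second(8))         # 1000.0
print(packetization_delay_ms(8))     # 1.0
```

More frames per packet reduce the header overhead and the packet rate, at the cost of a longer packetization delay.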
In the encapsulation structure of a PW packet, the control word is mandatory. The L bit,
R bit, and sequence number field deserve particular attention. The L bit and R bit carry
alarm information. They are used when E1 frame data received by PE1 is transparently
transmitted in a PW to an E1 interface of PE2 and PE1 needs to transmit alarm information
(such as AIS and RDI) from CE1 to the remote device. PE1 reports the received alarm
information (AIS/RDI) to the control plane. The control plane modifies the L bit and R
bit in the control word of the PW packet and then sends them with the E1 frame data to PE2.
The sequence number is used to detect PW packets that are discarded or disordered
during forwarding on the PSN. Every time PE1 sends a PW packet, the sequence number
increases by 1.
The downstream traffic goes from PE2 to CE2. After receiving a PW packet from the PSN,
PE2 caches the PW packet in one of several buffers, selected by a mask applied to the
sequence number. For example, if the sequence number is 16 bits and 256 buffers are
configured for caching, the lowest 8 bits of the 16-bit sequence number are used as the
buffer address. When the sequence numbers of the received PW packets are consecutive and
the cached data reaches the configured jitter buffer threshold, the PW packets are
unpacked and then sent. For example, suppose that 8 frames are encapsulated in a packet.
At 8000 frames/second, 8 frames take 1 ms. If the jitter buffer is configured to 3 ms,
PW packets are not sent until 3 packets have been cached.
If the PW packet corresponding to a sequence number is not received, an idle code (its
payload is configurable) is sent.
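The buffering behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not the device's algorithm: the class name, the playout trigger (start once the threshold number of packets is cached), and the assumption that the first packet received has the lowest sequence number are all illustrative.

```python
class JitterBuffer:
    """Simplified jitter buffer indexed by the low bits of the sequence number."""

    def __init__(self, slots=256, threshold_packets=3, payload_size=8, idle_byte=0xFF):
        self.slots = [None] * slots
        self.mask = slots - 1                    # slots must be a power of two
        self.threshold = threshold_packets       # e.g. 3 ms / 1 ms per packet
        self.idle = bytes([idle_byte]) * payload_size  # configurable idle code
        self.next_seq = None
        self.cached = 0
        self.playing = False

    def push(self, seq, payload):
        """Cache a received PW packet in the slot selected by the mask."""
        if self.next_seq is None:
            self.next_seq = seq                  # assumption: first packet is the oldest
        index = seq & self.mask                  # lowest 8 bits when slots == 256
        if self.slots[index] is None:
            self.cached += 1
        self.slots[index] = payload
        if self.cached >= self.threshold:
            self.playing = True

    def pop(self):
        """Called once per packet interval; returns the payload to play out."""
        if not self.playing:
            return None                          # still filling the buffer
        index = self.next_seq & self.mask
        payload = self.slots[index]
        self.slots[index] = None
        self.next_seq = (self.next_seq + 1) & 0xFFFF   # 16-bit circular space
        if payload is None:
            return self.idle                     # packet missing: send idle code
        self.cached -= 1
        return payload


jb = JitterBuffer(threshold_packets=3, payload_size=2)
jb.push(10, b"aa")
print(jb.pop())          # None: threshold not reached yet
jb.push(12, b"cc")       # arrives out of order
jb.push(11, b"bb")
print(jb.pop(), jb.pop(), jb.pop())  # b'aa' b'bb' b'cc'
print(jb.pop())          # b'\xff\xff': packet 13 was never received
```

Indexing by the masked sequence number restores the packet order without sorting, because each packet lands directly in its playout slot.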
Before the PW packet is parsed and the sequence number is processed, the L bit and R bit
must be processed. The L bit and R bit carry the alarm information sent to PE2. After the
payload is extracted from the PW packet, it is sent to CE2 at the same frequency as that
of CE1, with 31 bytes or 256 bits in each frame; otherwise, PE2 overruns or underruns.
Therefore, clock synchronization (frequency synchronization) is required between the CE1
clock and the PE2 clock in TDM transparent transmission.
Alarm Transparent Transmission

Before PWE3 is applied, CEs are directly connected by cables or fibers, so alarms
generated on CE1 can be directly detected by CE2. After PWE3 is applied, CE2 cannot
directly detect alarms generated on CE1 because the PWE3 tunnel between the CEs does not
have the circuit features of TDM services. To emulate the circuit more faithfully, alarm
transparent transmission is used.
Figure 6-19 Alarm transparent transmission
(Topology: CE1 - AC - PE1 - PW over a PSN tunnel - PE2 - AC - CE2. The interface on the
AC side of PE1 detects the alarm; transmission is controlled by the PW control word; the
alarm is restored; the alarm is detected.)
As shown in Figure 6-19, it is assumed that data is transmitted from CE1 to CE2. Alarm
transparent transmission is the process of transmitting E1/T1 alarms detected on PE1 to
downstream PE2 through the PW control word, restoring the E1/T1 alarms, and then
transmitting them to CE2. The same applies in the reverse direction.
The types of alarms that can be transparently transmitted are AIS and RDI. The PW
control word bits involved are the L bit, R bit, and M bits.

Other Features
Both the non-slotted TDM interface (SAToP transparent transmission) and the slotted TDM
interface (CESoPSN transparent transmission) can be created.
The serial port supports encapsulation of packets through multiple protocols such as TDM,
ATM, and PPP.
Both dynamic and static PWs are supported.
6.3.3 Applications
Figure 6-20 Applicable Scenario 1
(Topology: on the access side, NodeBs connect to PE1 through ATM over E1 and TDM over E1
links. PE1 and PE2 are connected through a packet switched network. On the RNC side, PE2
connects to the RNCs through ATM over E1/E3/OC3 and TDM over E1/E3/OC3 links.)
After TDM services from 2G base stations are converged on the E1 interface on PE1, the
TDM data is encapsulated into PSN packets that can be transmitted on the PSN. After
reaching downstream PE2, the PSN packets are decapsulated to restore the original TDM
data, which is then sent to the 2G convergence device.
In this solution, multiple types of services are converged at a PE on the PSN. The solution
effectively saves original network resources, uses fewer PDH VLLs, and facilitates site
deployment and the maintenance and administration of multiple services.
6.4 BER Measurement

6.4.1 Introduction to BER Measurement
Definition
The bit error rate (BER) is the proportion of error bits to the total bits received by a digital
communication system within a specific period of time. The BER is a key counter for
measuring communication quality. The smaller the BER, the higher the communication
quality.

An interface implements BER measurement as follows:

1. Uses the pseudo random binary sequence (PRBS) technique to generate a PRBS bit
sequence.
2. Sends and receives bits in the PRBS bit sequence.
3. Calculates the proportion of error bits to the total bits to obtain the BER for measuring
link quality.
Purpose
BER measurement uses PRBS bit sequences sent over an entire link to monitor link
connectivity and quality.
Benefits
BER measurement brings the following benefits to operators:
l Monitors link quality during network cutover and helps identify potential risks,
improving the cutover success ratio and minimizing user complaints about operator
network issues.
l Helps speed up service deployment and cutover on a network, shortening the service
launch period.
6.4.2 Principles of E1 BER Measurement

6.4.2.1 Basic principle of BER Measurement
PRBS bit sequence

An interface that supports BER measurement uses the PRBS technique to generate, send, and
receive a PRBS bit sequence. The interface then calculates the proportion of error bits to the
total bits to obtain the BER for measuring link quality.
A PRBS is a pseudo random binary sequence of N bits.
l PRBS bit sequence generation
A PRBS bit sequence is generated by a shift register built from flip-flops, with
feedback defined by a polynomial. The polynomial varies according to the length of
the sequence.
l PRBS bit sequence measurement
As shown in Figure 6-21, R1 sends a PRBS bit sequence through the TX end of an
interface to R2. After receiving the bits, R2 forwards them to the RX end of the interface
of R1. Once the bits are received at R1, the number of received bits is subtracted from
the number of sent bits to give the number of error bits.
l BER calculation
The BER is calculated using the following equation:
BER = Number of error bits / (Interface rate x Test period)
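Figure 6-21 PRBS bit sequence measurement
(Topology: the PRBS generator on R1 sends bits from its TX end to the RX end of R2; R2
returns them from its TX end to the RX end of R1.)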
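The generation, loopback, and calculation steps above can be sketched in a few lines. The sketch makes several assumptions that the text does not state: it uses the common PRBS-15 polynomial x^15 + x^14 + 1, it counts errors by comparing the looped-back bits with the sent bits position by position, and the function names and the two injected errors are hypothetical.

```python
def prbs15(n_bits, seed=0x7FFF):
    """Generate n_bits of a PRBS-15 sequence (x^15 + x^14 + 1) with a 15-bit LFSR."""
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 14) ^ (state >> 13)) & 1   # feedback taps
        state = ((state << 1) | new_bit) & 0x7FFF
        bits.append(new_bit)
    return bits


def count_bit_errors(sent, received):
    """Count positions where the looped-back bit differs from the sent bit."""
    return sum(1 for s, r in zip(sent, received) if s != r)


def bit_error_rate(error_bits, interface_rate_bps, test_period_s):
    """BER = Number of error bits / (Interface rate x Test period)."""
    return error_bits / (interface_rate_bps * test_period_s)


sent = prbs15(2048)
received = list(sent)
received[100] ^= 1            # inject two hypothetical bit errors
received[900] ^= 1
errors = count_bit_errors(sent, received)
print(errors)                 # 2
# One-second test on an E1 interface (2.048 Mbit/s)
print(bit_error_rate(errors, 2_048_000, 1))
```

Because the sequence is deterministic for a given polynomial and seed, the receiving side can regenerate the expected bits locally instead of storing everything that was sent.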

6.4.3 Application
Typical Application on an IP RAN

On an IP RAN, normal service communication may be affected by link deterioration or
incorrect connections. E1 bit error rate detection can be used to check the E1 link
quality, as shown in Figure 6-22.
Figure 6-22 E1 transparent transmission on an IP RAN
(Topology: NodeB - E1 - ATN - MPLS/IP core - CX - E1 - RNC)
The ATN can detect the bit error rate on E1 links between itself and a NodeB or RNC.
Terms
Term Description
BER A key counter for measuring communication quality. The BER is the proportion of
error bits to the total bits received by a digital communication system within a
specific period of time.
E1 Recommended by the International Telecommunication Union Telecommunication
Standardization Sector (ITU-T). An E1 interface with a transmission rate of
2.048 Mbit/s complies with Recommendations G.703 and G.704.
PRBS Generates random data.

Acronym & Abbreviation Full Name
BER bit error rate
PRBS pseudo random binary sequence

6.5 APS
This document describes principles and applications of the Automatic Protection Switching
(APS) feature.
NOTE
Only ATN 950B supports the Automatic Protection Switching (APS) feature.
6.5.1 Introduction to APS
Definition
Automatic Protection Switching (APS) is a mechanism of using a protection interface on the
Synchronous Digital Hierarchy (SDH) network as the backup for a working interface. When
APS detects a fault on the working link, a switchover request conveyed by the K1 and K2
bytes of the Multiplex Section Overhead (MSOH) on the protection link is sent to the peer
device. Upon receiving the switchover request, the peer device returns a reply and performs
the switchover action.
Purpose
APS is an inherent feature of the SDH network. In a mobile bearer network, the ATN needs to
be connected to the add/drop multiplexer (ADM) on the SDH network or to the RNC. The
previous protection mechanisms on the ATN, however, cannot protect the link between the
ATN and the ADM or RNC. The APS feature, which is supported on the ATN, ADM, and RNC,
is therefore introduced to meet the link protection requirement.
Benefits
The APS feature brings remarkable benefits to operators:
l Saving human resources by minimizing manual intervention.

l Improving the reliability of network transmission with minimum service interruption
duration.
l Increasing the success rate of user access with high network reliability.
6.5.2 Principles
6.5.2.1 Basic APS Principles

APS is defined by ITU-T G.783 and ITU-T G.841. As an inherent feature of the SDH network,
APS uses a protection interface on the SDH network as the backup for a working interface.
The SDH network itself is equipped with APS. Therefore, APS is required only on the router
that connects to the ADM on the SDH network, to control the switchover of links between
the router and the ADM. When two routers are directly connected, APS is required at both
ends of the connection.

Transmission of an APS Request

The K1 and K2 bytes in the MSOH of the SDH frame indicate the current status of the
APS connection and convey any requests for switch action. The meanings of the bits in the
K1 and K2 bytes are as follows:
K1 byte coding
l Bits 5 to 8 indicate the link number. The value 0 indicates the protection link; 1 to 14
indicate the working link (in 1+1 mode, the working link is always signified by 1); 15
indicates the extra traffic link (in 1:N mode only).
l Bits 1 to 4 indicate the type of the request. For details, see Table 6-1.
Table 6-1 Bits 1 to 4 in the K1 byte

Bits (1 to 4) Condition, state, or external request
1111 Lockout of protection
1110 Forced switch
1101 Signal fail high priority (not available in 1 + 1 mode)
1100 Signal fail low priority
1011 Signal degrade high priority (not available in 1 + 1 mode)
1010 Signal degrade low priority
1001 Unused
1000 Manual switch
0111 Unused
0110 Wait-to-restore
0101 Unused
0100 Exercise
0011 Unused
0010 Reverse request (only in bidirectional APS mode)
0001 Do not revert (only in unidirectional APS mode)
0000 No request
The priority of 0000 is the lowest, and the priority of 1111 is the highest.
K2 byte coding
l Bits 1 to 4 indicate the link number. The link is defined with the same syntax as K1 Bits
5 to 8.
l Bit 5 indicates the protection mode: 1 indicates the 1:N mode and 0 indicates the 1+1
mode.

l Bits 6 to 8 indicate the operation mode or operation code. For details, see Table 6-2.
Table 6-2 Bits 6 to 8 in the K2 byte

Bits (6 to 8) APS Operation Mode or Other Line Information
111 Line Alarm Indication Signal (AIS-L)
110 Line Remote Defect Indication (RDI-L)
101 Bidirectional APS mode
100 Unidirectional APS mode
Others Reserved for future use
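The K1 and K2 coding in Table 6-1 and Table 6-2 can be illustrated with a small decoder. The sketch assumes the usual SDH convention that bit 1 is the most significant bit of the byte; the function names and the sample byte values are hypothetical.

```python
K1_REQUESTS = {
    0b1111: "Lockout of protection",
    0b1110: "Forced switch",
    0b1101: "Signal fail high priority",
    0b1100: "Signal fail low priority",
    0b1011: "Signal degrade high priority",
    0b1010: "Signal degrade low priority",
    0b1000: "Manual switch",
    0b0110: "Wait-to-restore",
    0b0100: "Exercise",
    0b0010: "Reverse request",
    0b0001: "Do not revert",
    0b0000: "No request",
}

K2_MODES = {
    0b111: "AIS-L",
    0b110: "RDI-L",
    0b101: "Bidirectional APS mode",
    0b100: "Unidirectional APS mode",
}


def decode_k1(k1):
    """Return (request, link number) from a K1 byte."""
    request = K1_REQUESTS.get((k1 >> 4) & 0xF, "Unused")   # bits 1 to 4
    link = k1 & 0xF                                        # bits 5 to 8
    return request, link


def decode_k2(k2):
    """Return (link number, architecture, mode) from a K2 byte."""
    link = (k2 >> 4) & 0xF                                 # bits 1 to 4
    architecture = "1:N" if (k2 >> 3) & 0x1 else "1+1"     # bit 5
    mode = K2_MODES.get(k2 & 0x7, "Reserved")              # bits 6 to 8
    return link, architecture, mode


print(decode_k1(0xC1))  # ('Signal fail low priority', 1)
print(decode_k2(0x15))  # (1, '1+1', 'Bidirectional APS mode')
```

In the sample, K1 = 0xC1 reports a low-priority signal fail on working link 1, and K2 = 0x15 indicates link 1 in 1+1 bidirectional mode.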
APS Modes
l According to the protection architecture, APS modes can be classified into 1 + 1 mode
and 1:N mode.
– In 1 + 1 mode, a protection link is paired with each working link. Normally, the
sender periodically sends the signal payload to both the working and protection
links (this process is called bridging), and the receiver obtains the signal payload
from the working link unless the working link becomes unavailable. In most cases,
the switchover action is performed only on the receiver and the negotiation between
the sender and receiver based on K1 and K2 bytes is not required.
In APS 1 + 1 mode, the time taken for the switchover is short and the switch
reliability is high; however, the link usage is as low as 50%. Figure 6-23 shows
detailed switchover procedures in APS 1 + 1 mode.
Figure 6-23 1+1
(Diagram: the near end (source) sends the signal over both the working link and the
protection link to the remote end (destination). Normal condition: one signal is chosen
per pair. Failure condition: the "best" signal is chosen.)

– In 1:N mode, one protection link can protect up to N (1 to 14) working links. In
normal conditions, the protection link transmits only additional services. When a
working link becomes unavailable, the sender bridges the data on that working link
to the protection link, and the receiver obtains the data from the protection
link. The bridged data has a higher priority than the data being transmitted on the
protection link. Therefore, the protection link stops its original transmission task
and preferentially transmits the bridged data, as shown in Figure 6-24.
Figure 6-24 1:N
(Diagram: working links and one protection link run between the near end (source) and
the remote end (destination). Normal condition: the protection channel is empty. Failure
condition: the protection channel carries the traffic of the failed link.)
When multiple working links become unavailable, only the data on the working link
with the highest priority is switched to the protection link. Data on other working
links is discarded.
When N is 1, the APS mode is 1:1.
In 1:N mode, both the sender and receiver perform the switchover action after the
negotiation based on K1 and K2 bytes. Compared with 1 + 1 mode, 1:N mode
features higher link usage but lower reliability.
l According to whether traffic is switched back, APS modes can be classified into
revertive mode and non-revertive mode.
In revertive mode, data is switched back from the protection link to the working link
after the working link becomes available and remains stable for several minutes. In
non-revertive mode, data is not switched back from the protection link to the working
link after the working link becomes available. The APS 1 + 1 mode can be revertive or
non-revertive; the default mode is non-revertive. The APS 1:1 mode can be revertive
or non-revertive; the default mode is revertive.
l According to the switchover mode in the event of link failure, APS modes can be
classified into unidirectional mode and bidirectional mode.
– In unidirectional mode, only the receiver detects the link failure and performs the
switchover action. After the switchover, the sender and receiver select different
links to receive traffic.

– In bidirectional APS mode, the receiver detects the fault, but both the receiver and
sender perform the switchover action after the negotiation based on K bytes. After
the switchover, the receiver and sender select the same link to send or receive data.
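For 1:N mode, the rule that only the working link with the highest priority is switched to the protection link can be sketched as a simple selection. The sketch assumes that priority follows the numeric order of the K1 request code, as stated after Table 6-1, and that ties go to the lowest link number; the tie-break rule and the function name are assumptions made for illustration.

```python
def select_link_for_protection(requests):
    """Pick the working link whose traffic is switched to the protection link.

    requests maps a working link number (1 to 14) to its 4-bit K1 request code.
    Returns None when no link has an active request.
    """
    active = {link: code for link, code in requests.items() if code != 0b0000}
    if not active:
        return None
    # Highest request code wins; the lowest link number breaks ties (assumption).
    return min(active, key=lambda link: (-active[link], link))


# Hypothetical state: link 1 has signal degrade (low priority),
# links 2 and 3 both have signal fail (low priority).
requests = {1: 0b1010, 2: 0b1100, 3: 0b1100, 4: 0b0000}
print(select_link_for_protection(requests))  # 2
```

In this example the data on links 1 and 3 is not carried by the protection link, which matches the statement that data on the other failed working links is discarded.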
6.5.2.2 Implementation of APS
To guarantee the normal transmission of services (above Layer 2) after the APS switchover,
physical interfaces are added to a trunk interface, such as the CPOS-Trunk interface (Trunk-
Serial and Global-MP-Group). Service attributes are configured on the trunk interface to
transparently transmit services above Layer 2. However, physical attributes are configured on
the physical interfaces.
6.5.3 Applications
On the following mobile bearer network, NodeBs connect to an MSTP network through E1
lines. After processing services, the MSTP network sends the services to the ATN device
through CPOS interfaces. To improve service reliability, an APS group can be created, and the
two CPOS interfaces can be added to the APS group. If the working link fails, the ATN
device automatically receives traffic from the protection link, thereby ensuring service
reliability.
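Figure 6-25 STM-1 APS
(Topology: NodeB1 and NodeB2 connect to the MSTP network through E1 lines. The MSTP
network connects to the ATN through two STM-1 links that terminate on interfaces
Cpos0/2/1 and Cpos0/2/2. The ATN connects toward the RNC through GE interfaces, labeled
GE1/3/3 and GE0/3/3 in the figure.)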

Acronym & Abbreviation Full Name
APS automatic protection switching

6.6 xDSL
NOTE
Only ATN 905/ATN 910 supports the xDSL feature.
6.6.1 Introduction
Definition
Digital subscriber line (DSL) provides digital connections over telephone lines without
affecting the plain old telephone service (POTS).
xDSL refers to a family of modulation and demodulation DSL technologies. xDSL uses a
high frequency (over 4 kHz) digital compression mechanism to provide high-speed broadband
network access service. Because the frequency band for xDSL is higher than that for voice
signals, telephone lines can transmit both data and voice signals without one affecting the
other.
The ATN supports ADSL2+, SHDSL, and VDSL2.
l Asymmetric digital subscriber line (ADSL) is an asymmetric transmission technology
used to transmit high-speed data over twisted pair cables. ADSL2+ is an extension of
ADSL. The maximum downstream and upstream transmission rates of ADSL2+ are up
to 24 Mbit/s and 2.5 Mbit/s, respectively. The maximum transmission distance of
ADSL2+ is 6.5 km.
l Very-high-speed digital subscriber line 2 (VDSL2) is developed based on ITU-T
Recommendation G.993.2 and is an extension of VDSL1, which is developed based on
ITU-T Recommendation G.993.1. VDSL2 is designed to be compatible with ADSL,
ADSL2, and ADSL2+, but not with the less-common VDSL1.
l Single-pair high-speed digital subscriber line (SHDSL) is defined in ITU-T G.991.2 and
is a new symmetrical digital subscriber line technology developed based on HDSL,
SDSL, and HDSL2. SHDSL uses the Trellis coded pulse amplitude modulation (TC-
PAM) technology to transfer high-speed data over common twisted pairs.
Ethernet in the first mile (EFM) combines the technical advantages of SHDSL and Ethernet,
and provisions the POTS and high-speed Internet access services over common twisted pairs,
while addressing user demands for high definition (HD) TV and video on demand (VOD).
EFM is ideal for providing "last mile" access to residential areas.
Purpose
Obtaining cost-effective broadband resources is a challenge for carriers building mobile
backhaul networks. Fixed-line networks are a viable option for mobile backhaul, but terminal
modems on these networks are difficult to manage and their reliability cannot be guaranteed.
The xDSL physical interface cards (PICs) for ATNs integrate the traditional modems for easy
management and high reliability. xDSL PICs and other PICs designated for ATNs provide
mobile backhaul solutions for carriers.

Benefits
This feature helps carriers reduce investment costs in building 3G networks by using the high
bandwidth provided by the legacy copper cable fixed network. Compared with traditional
modems, xDSL also ensures higher reliability and easier management.
6.6.2 Principles
6.6.2.1 Packet Encapsulation Mode

Figure 6-26 shows the packet encapsulation modes.
l In ATM mode, the xDSL PIC encapsulates packets in AAL5 format, converts them into
ATM cells, processes the ATM cells at the physical layer, and then transmits them.
l In EFM mode, the xDSL PIC encapsulates packets in EFM format, processes them at the
physical layer, and then transmits them.
l In IMA mode, the xDSL PIC encapsulates packets in AAL5 format, converts them into
IMA cells, processes the IMA cells at the physical layer, and then transmits them.
Figure 6-26 xDSL packet encapsulation modes
(Diagram: four protocol stacks. All four share the upper layers ATM / PWE3 / PW label /
LSP label / Ethernet. The lower layers are AAL5 over ATM over ADSL/SHDSL, PTM over VDSL,
EFM over SHDSL, and AAL5 over IMA over SHDSL.)
6.6.2.2 Principles of xDSL on the ATN

An ATN provides virtual Ethernet (VE) interfaces to support xDSL.
xDSL uses three types of interfaces:
l xDSL interface: a physical interface on an xDSL PIC. You can disable or enable an
xDSL interface in the xDSL interface view, and configure an xDSL interface in a VE
interface view.
l DSL-group interface: a link-layer logical interface. You can set link-layer attributes in a
DSL-group interface view.
l VE interface: a Layer 3 logical interface used only on the NNI side to carry ETHoA
services. You can also run the portswitch command to switch a VE interface to a Layer
2 interface.
Before configuring xDSL services, create a VE interface and a DSL-group interface. Then,
enter the DSL-group interface view and configure link-layer attributes for the xDSL PIC.
Bind the VE interface to the DSL-group interface, and bind the DSL-group interface to the
xDSL interfaces. After the bindings are established, configure xDSL services on the VE
interface. The services are then carried over the xDSL interfaces.
In the transmit direction, an xDSL PIC receives Ethernet packets, processes them, and then
transmits them. In the receive direction, an xDSL PIC receives and processes service packets,
and then converts them into Ethernet packets for the ATN to further process.
6.6.3 Applications
6.6.3.1 Ethernet-based xDSL Service Forwarding in the Offload Solution

Figure 6-27 shows the offload scenario for Ethernet-based xDSL service forwarding.
Figure 6-27 Offload scenario for Ethernet-based xDSL service forwarding

[Figure: Node B (E1, ATM/IMA) - ATN 910/950 - DSLAM - wholesale xDSL service network -
CX600 (STM-1, ATM) - RNC. Between the ATN and the CX600, ATM is carried over PWE3, PW Label,
LSP Label, and Ethernet. On the xDSL line, Ethernet runs over AAL5 and ATM/IMA (ATM/IMA
mode), PTM (VDSL, PTM mode), or EFM (SHDSL, EFM mode). The figure labels an HSDPA flow
through the wholesale xDSL service network and an R99 flow over ATM (IMA, STM-1).]
In the offload scenario for ETH-based service forwarding, data services (HSDPA flow) are
carried over the Layer 2 Ethernet switching network (wholesale xDSL service network). An
MPLS tunnel must be set up between the ATN and CX devices to carry the PW.
In the ATN-to-CX direction:
l The ATN encapsulates the ATM cells into Ethernet frames. Then, the xDSL PIC
performs EFM/PTM encapsulation or performs AAL5 adaptation and ATM
encapsulation.
l The DSLAM terminates the xDSL and ATM encapsulation and forwards the Ethernet
frames into the Layer 2 Ethernet switching network.
l The CX receives the Ethernet frames and decapsulates them into ATM cells.
xDSL PICs work in different working modes:

l AVD8A works only in ATM mode.
l AVD8B works only in EFM mode.
l SHD4 works in either ATM or EFM mode.
l SHD4I works only in IMA mode.
l ATN 905A-V works only in EFM mode.
l ATN 905-V works in either ATM or EFM mode.
6.6.3.2 IP-based xDSL Service Forwarding for the Offload Solution

Figure 6-28 shows the offload scenario for IP-based xDSL service forwarding.
Figure 6-28 Offload scenario for IP-based xDSL service forwarding

[Figure: Node B (E1, ATM/IMA) - ATN 910/950 - DSLAM - wholesale xDSL service network -
CX600 (STM-1, ATM) - RNC. Between the ATN and the CX600, ATM is carried over PWE3, PW Label,
GRE, IP, and Ethernet. On the xDSL line, Ethernet runs over AAL5 and ATM (ATM/IMA mode),
PTM (VDSL, PTM mode), or EFM (SHDSL, EFM mode). The figure labels an HSDPA flow through the
wholesale xDSL service network and an R99 flow over ATM (IMA, STM-1).]
In the offload scenario for IP-based xDSL service forwarding, data services (HSDPA flow)
are carried over the IP switching network (wholesale xDSL service network). A GRE tunnel
must be set up between the ATN and CX to carry the PW. GRE in this figure indicates that IP
packets are carried by the GRE tunnel.
In the ATN-to-CX direction:
l The ATN encapsulates the ATM cells into Ethernet frames. Then, the xDSL PIC
performs EFM encapsulation or performs AAL5 adaptation and ATM encapsulation.
l The DSLAM terminates the xDSL and ATM encapsulation and forwards the packets over
Layer 3 to the CX.
l The CX receives the packets and decapsulates them into ATM cells.

xDSL PICs work in different working modes:
l AVD8A works only in ATM mode.
l AVD8B works only in EFM mode.
l SHD4 works in either ATM or EFM mode.
l SHD4I works only in IMA mode.
l ATN 905A-V works only in EFM mode.
l ATN 905-V works in either ATM or EFM mode.

Terms
None

Abbreviation
DSL digital subscriber line
xDSL x digital subscriber line
ADSL asymmetric digital subscriber line
VDSL very-high-data-rate digital subscriber line
SHDSL single-pair high-speed digital subscriber line
DSLAM digital subscriber line access multiplexer
EFM Ethernet in the first mile
ATM asynchronous transfer mode
PTM packet transfer mode
IMA inverse multiplexing over ATM
6.7 GPON
Gigabit-capable passive optical network (GPON) is a PON technology standardized by ITU-T
Recommendation G.984.x. GPON devices support high-bandwidth transmission, thereby
addressing the bandwidth bottleneck in twisted-pair access and meeting user demands on
high-bandwidth services.

6.7.1 Overview
With the wide use of broadband services and fiber-in and copper-out development, carriers
require a longer transmission reach, higher bandwidth and reliability, and lower operating
expense (OPEX) on services. GPON meets the requirements by providing:
l A longer transmission reach: Optical fibers are used for transmission, providing a
coverage radius of 20 km for the access layer.
l A higher bandwidth: The maximum downstream and upstream bandwidths are 2.5 Gbit/s
and 1.25 Gbit/s, respectively, on each PON port.
l Quality of service (QoS) for all services: A GPON carries GPON encapsulation mode
(GEM) frames to ensure better QoS.
l Optical splitters: An optical splitter splits a single optical fiber into multiple optical
fibers, allowing a single optical fiber from the central office (CO) to feed multiple users.
Optical splitters conserve optical fiber resources, reduce the number of optical and
electrical devices in the CO, and reduce the OPEX.
A PON is a point-to-multipoint (P2MP) network and consists of three parts, as shown in
Figure 6-29.
Figure 6-29 PON network

[Figure: an OLT connected through a passive optical splitter to multiple ONUs, together
forming the passive optical network.]
l The optical line terminal (OLT) implements a PON protocol and aggregates PON traffic,
and is located at the CO.
l Optical network units (ONUs)/Optical network terminals (ONTs) are located on the user
side to provide various ports for connecting to user terminals. The OLT and ONUs are
connected through a passive optical distribution network (ODN) for communication.
l The ODN is composed of passive optical components, such as optical fibers and one or
more passive optical splitters (POSs). The ODN provides highly reliable optical channels
between the OLT and ONUs.
NOTE
A passive ODN does not require active optical amplifiers or regenerators, saving the costs associated
with maintaining outdoor active devices.

6.7.2 Introduction
GPON System
Gigabit-capable passive optical network (GPON) is a mainstream PON technology that
provides gigabit access speeds. Other PON technologies include Ethernet passive optical
network (EPON) and broadband passive optical network (BPON). BPON uses ATM
encapsulation for carrying ATM services (however, as ATM becomes obsolescent, BPON
usage is shrinking).
Figure 6-30 shows the components involved in a GPON network.
Figure 6-30 GPON network
[Figure: an OLT connected to an ONU through the ODN. Downstream wavelength: 1490 nm.
Upstream wavelength: 1310 nm.]
Main features:
l On a GPON network, an OLT connects to multiple ONUs by way of an optical splitter
that splits the optical fiber connection from the OLT into multiple optical fibers that
connect to the ONUs. The GPON network wavelength used for transmission in the
upstream direction is 1310 nm, and that used in the downstream direction is 1490 nm.
l Wavelength division multiplexing (WDM) is used to transmit data over an ODN: data is
broadcast in the downstream direction, and time division multiple access (TDMA) is
used in the upstream direction.
GPON Downstream Transmission

In the downstream direction, an OLT broadcasts data to all ONUs. Each ONU accepts data
belonging only to itself, based on ONU IDs, and discards all other data. Figure 6-31 shows
the details.

Figure 6-31 GPON downstream communication
[Figure: the OLT broadcasts downstream data through the splitter to ONU1, ONU2, and ONU3.]
GPON Upstream Transmission

In the upstream direction, data is sent using time division multiple access (TDMA). Each
ONU sends data to the OLT only in the timeslots pre-allocated by the OLT, which ensures
correct data sequence and prevents upstream data conflicts. Signals from the ONUs are
coupled by an optical splitter. Figure 6-32 shows the details.
Figure 6-32 GPON upstream communication
[Figure: ONU1, ONU2, and ONU3 send upstream data through the splitter to the OLT.]
6.7.3 GPON Principles
GEM Frame
On a gigabit-capable passive optical network (GPON), a GPON encapsulation mode (GEM)
frame is the smallest service-carrying unit and the basic encapsulation structure. All service
streams are encapsulated into GEM frames, transmitted over GPON media, and identified by
GEM ports. Each GEM port is identified by a unique port ID that is globally allocated by an
OLT. Similar to the virtual path identifier (VPI)/virtual channel identifier (VCI) in an
asynchronous transfer mode (ATM) virtual connection, a GEM port identifies a virtual service
channel that transmits service streams between the OLT and an ONU.
Figure 6-33 shows the GEM frame structure.
Figure 6-33 GEM frame structure
A GEM header consists of payload length indicator (PLI), Port ID, payload type indicator
(PTI), and header error check (HEC), and is used to differentiate data of different GEM ports.
l PLI: identifies the length of the data payload.
l Port ID: uniquely identifies a GEM port.
l PTI: identifies the type and status of the data that is being transmitted. For example, the
PTI value can indicate whether an operation, administration and maintenance (OAM)
message is being transmitted or whether data transmission is complete.
l HEC: provides error detection and correction for the GEM header to ensure transmission
quality.
l Fragment payload: carries the payload of a frame fragment. It follows the header.
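The header layout above can be sketched in a few lines of Python. The field widths used here (PLI 12 bits, Port ID 12 bits, PTI 3 bits, HEC 13 bits, 5 bytes in total) are not stated in this section; they are taken from ITU-T G.984.3 and should be read as an assumption. The HEC value is passed in rather than computed, and the function names are hypothetical.

```python
# Sketch only: field widths are an assumption based on ITU-T G.984.3
# (PLI 12 bits, Port ID 12 bits, PTI 3 bits, HEC 13 bits = 40 bits).

def pack_gem_header(pli: int, port_id: int, pti: int, hec: int = 0) -> bytes:
    """Pack the four GEM header fields into 5 bytes, most significant bit first."""
    if not (0 <= pli < 1 << 12 and 0 <= port_id < 1 << 12):
        raise ValueError("PLI and Port ID must each fit in 12 bits")
    if not (0 <= pti < 1 << 3 and 0 <= hec < 1 << 13):
        raise ValueError("PTI must fit in 3 bits and HEC in 13 bits")
    word = (pli << 28) | (port_id << 16) | (pti << 13) | hec
    return word.to_bytes(5, "big")


def unpack_gem_header(header: bytes) -> dict:
    """Split a 5-byte GEM header back into its fields."""
    if len(header) != 5:
        raise ValueError("a GEM header is 5 bytes long in this sketch")
    word = int.from_bytes(header, "big")
    return {
        "pli": word >> 28,                # payload length indicator
        "port_id": (word >> 16) & 0xFFF,  # GEM port identifier
        "pti": (word >> 13) & 0x7,        # payload type indicator
        "hec": word & 0x1FFF,             # header error check (not computed here)
    }
```

A receiver would use the Port ID to select the GEM port and the PLI to find where the payload ends.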
Figure 6-34 shows the mapping between an Ethernet frame and GEM frame.
Figure 6-34 Mapping between an Ethernet frame and GEM frame

l The GPON system parses Ethernet frames and maps Ethernet data into GEM payloads
for transmission.
l GEM frames automatically encapsulate header information.
l The mapping format is clear and widely compatible.
T-CONT
A transmission container (T-CONT) is a carrier and basic control unit of upstream service
streams in the GPON system. Each T-CONT is identified by an Alloc-ID, which is allocated
by a GPON port of the OLT. All GEM ports are mapped to T-CONTs. T-CONTs then transmit
upstream service streams to an OLT through dynamic bandwidth allocation (DBA)
scheduling.
[Figure: on the ONU, several GEM ports map to each T-CONT, and the T-CONTs carry the upstream
service streams to the OLT.]
T-CONTs are divided into five types, which can be selected based on the upstream service
streams during scheduling. Each T-CONT type has its own quality of service (QoS) feature.
Table 6-3 lists the T-CONT types. Type 1 through Type 5 represent fixed, assured, non-
assured, best-effort, and hybrid modes, respectively.
Table 6-3 T-CONT types

Bandwidth Type      Type 1   Type 2   Type 3   Type 4   Type 5
Fixed Bandwidth     X        –        –        –        X
Assured Bandwidth   –        Y        Y        –        Y
Maximum Bandwidth   Z=X      Z=Y      Z>Y      Z        Z≥X+Y
Description
l Type 1: The fixed bandwidth is reserved for specific ONUs or specific services on ONUs.
It cannot be used by other ONUs even if no upstream service streams are carried on the
specific ONUs. Applies to TDM or VoIP services that are sensitive to service quality.
l Type 2: The assured bandwidth is available at any time required by an ONU. When the
bandwidth required by the service streams on the ONU is smaller than the assured
bandwidth, the system can use the DBA mechanism to allocate the remaining bandwidth to
services on other ONUs. Because DBA is required, this type provides lower real-time
performance compared with the fixed bandwidth.
l Type 3: This type is the combination of the assured bandwidth and maximum bandwidth.
The system assures some bandwidth for users and allows users to preempt bandwidth.
However, the total used bandwidth cannot exceed the maximum configured bandwidth.
Applies to VoIP services.
l Type 4: This type is the maximum bandwidth that can be used by an ONU. It applies to
IPTV and high-speed Internet services.
l Type 5: This type is the combination of the fixed, assured, and maximum bandwidths. It
supports the following functions:
– Reserves bandwidth that cannot be preempted for users.
– Provides bandwidth to an ONU when required.
– Allows users to preempt some bandwidth. (The total used bandwidth cannot exceed the
maximum configured bandwidth.)
NOTE
In Table 6-3, X indicates the fixed bandwidth value, Y indicates the assured bandwidth value, Z indicates the
maximum bandwidth value, and a hyphen (-) indicates not involved.
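The relationships in Table 6-3 can be written as a small consistency check. This is an illustrative sketch, not a device feature: the function name is hypothetical, and treating a hyphen in the table as a value of 0 is an assumption made for the example.

```python
# Sketch: check X (fixed), Y (assured), Z (maximum) against Table 6-3.
# Assumption: a hyphen ("not involved") is modeled as 0.

def tcont_profile_is_valid(tcont_type: int, fixed: int = 0,
                           assured: int = 0, maximum: int = 0) -> bool:
    x, y, z = fixed, assured, maximum
    if tcont_type == 1:   # fixed only, Z = X
        return x > 0 and y == 0 and z == x
    if tcont_type == 2:   # assured only, Z = Y
        return x == 0 and y > 0 and z == y
    if tcont_type == 3:   # assured plus non-assured, Z > Y
        return x == 0 and y > 0 and z > y
    if tcont_type == 4:   # best effort, only Z is set
        return x == 0 and y == 0 and z > 0
    if tcont_type == 5:   # hybrid, Z >= X + Y
        return x > 0 and y > 0 and z >= x + y
    raise ValueError("T-CONT type must be 1 through 5")
```

For example, a Type 5 profile with X = 5, Y = 10, and Z = 15 passes the check, whereas the same profile with Z = 14 does not.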
6.7.3.2 Service Multiplexing Principles

GPON encapsulation mode (GEM) ports and transmission containers (T-CONTs) divide a
passive optical network (PON) into virtual connections for service multiplexing.
l Each GEM port can carry one or more types of service streams. For GEM ports carrying
service streams, each GEM port must be mapped to a T-CONT before upstream service
scheduling. Each optical network unit (ONU) supports multiple T-CONTs that can
transmit different types of services.
l A T-CONT can be bound to one or more GEM ports. On the optical line terminal (OLT),
GEM ports are demodulated from the T-CONT, and service streams are demodulated
from the GEM port payload for further processing.
Service mapping
l In the downstream direction, the GPON service processing unit encapsulates all service
streams into GEM ports and broadcasts the streams to all ONUs connected to the OLT's
GPON port. Each ONU filters data according to GEM port IDs and accepts only its own
services. Then, each ONU decapsulates service streams from the GEM port and sends
them to the user terminal through an ONU service port. Figure 6-35 shows GPON
service mapping in the downstream direction.
Figure 6-35 GPON downstream service mapping

[Figure: the OLT sends GEM ports through its GPON interface (IFgpon). Each of ONU 1 through
ONU N applies a GEM port filter behind its own IFgpon.]
l In the upstream direction, ONUs map service streams to GEM ports and then to different
types of T-CONTs. After services are transmitted to an OLT, the T-CONT demodulates
GEM ports and sends them to the GPON MAC chip. The MAC chip demodulates
service streams in the GEM port payload and then sends them to a service processing
unit. Figure 6-36 shows GPON service mapping in the upstream direction.
Figure 6-36 GPON upstream service mapping

[Figure: on each of ONU 1 through ONU N, GEM ports map to T-CONTs, which are sent through
the ONU's IFgpon to the IFgpon of the OLT.]
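The downstream filtering and upstream mapping described in this section can be modeled roughly as follows. The class and method names are hypothetical, frames are reduced to (GEM port ID, payload) pairs, and the numeric IDs are made up for the example.

```python
from collections import defaultdict

# Sketch: an ONU that filters downstream frames by GEM port ID and
# groups upstream frames by the T-CONT (Alloc-ID) each GEM port maps to.

class Onu:
    def __init__(self, onu_id, gem_to_tcont):
        self.onu_id = onu_id
        self.gem_to_tcont = dict(gem_to_tcont)  # GEM port ID -> Alloc-ID

    def receive_downstream(self, frames):
        """Keep only payloads whose GEM port ID belongs to this ONU."""
        return [payload for gem_port, payload in frames
                if gem_port in self.gem_to_tcont]

    def queue_upstream(self, frames):
        """Group upstream frames by the T-CONT their GEM port is mapped to."""
        queues = defaultdict(list)
        for gem_port, payload in frames:
            queues[self.gem_to_tcont[gem_port]].append((gem_port, payload))
        return dict(queues)
```

With `Onu(1, {100: 1, 101: 1, 102: 2})`, a broadcast containing GEM ports 100, 200, and 102 yields only the payloads for ports 100 and 102, and upstream frames for ports 100 and 101 share T-CONT 1.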
6.7.3.3 GPON Frame Structure

Figure 6-37 shows the gigabit-capable passive optical network (GPON) frame structure.
l A GPON downstream frame has a fixed duration of 125 µs and comprises Physical
Control Block downstream (PCBd) and payload. PCBd mainly consists of the GTC
header and upstream bandwidth map (BWmap). The GTC header is used for frame
delimitation, clock synchronization, and forward error correction (FEC). The BWmap is
used for notifying each optical network unit (ONU) of its upstream bandwidth allocation,
determining the upstream start and end timeslots of the transmission container (T-CONT)
corresponding to each ONU, and ensuring that all ONUs send data in timeslots specified
by an optical line terminal (OLT).
l A GPON upstream frame also has a fixed duration of 125 µs. In the upstream direction,
services are scheduled over T-CONTs in time division multiple access (TDMA) mode.
All ONUs connected to a GPON port share the upstream bandwidth and send their data
upstream in their own timeslots according to the BWmap requirements. Each ONU also
reports the status of data to be sent to the OLT through upstream frames. The OLT then
uses DBA to allocate upstream timeslots to ONUs and periodically updates the timeslots.

Figure 6-37 GPON frame structure
[Figure: downstream and upstream framing. Each 125 µs downstream frame consists of the
Physical Control Block downstream (PCBd), which contains the upstream bandwidth map, and the
payload. The example bandwidth map in the figure is:
Alloc-ID   Start   End   T-CONT
1          100     200   T-CONT 1 (ONT 1)
x          300     500   T-CONT x (ONT 2)
y          501     650   T-CONT y (ONT 2)
In the upstream frame, ONT 1 sends in slots 100 to 200 and ONT 2 sends in slots 300 to 500
and 501 to 650. After a guard time, a burst starts with the PLOu (preamble, delimiter, BIP,
ONU-ID, Ind) and the PLOAMu, followed by a DBRu and a payload for each allocation interval
(DBRu x, payload x, DBRu y, payload y).]
Each upstream frame contains the content carried by one or more T-CONTs. The BWmap in
each downstream frame identifies the transmission start time and end time for each T-CONT.
When an ONU takes over the PON media access right from another ONU, it must first send
physical layer overhead upstream (PLOu) data. If an ONU is allocated two consecutive
Alloc-IDs (the end time of one Alloc-ID is smaller by 1 than the start time of the other
Alloc-ID), the ONU does not send the PLOu for the second Alloc-ID.
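The BWmap rules in this section can be illustrated with a short sketch that uses the example values from Figure 6-37. The data structure and function names are hypothetical, and the slot numbers are treated as plain integers.

```python
from typing import NamedTuple

class Allocation(NamedTuple):
    alloc_id: str
    onu: str
    start: int
    end: int

def bwmap_has_no_overlap(bwmap):
    """Return True if every allocation is well formed and none overlap."""
    ordered = sorted(bwmap, key=lambda a: a.start)
    well_formed = all(a.start <= a.end for a in ordered)
    disjoint = all(p.end < n.start for p, n in zip(ordered, ordered[1:]))
    return well_formed and disjoint

def plou_required(bwmap):
    """Map each Alloc-ID to whether its transmission starts with a PLOu.

    The PLOu is skipped when the same ONU holds the directly preceding
    allocation and that allocation ends one slot before this one starts.
    """
    result, prev = {}, None
    for a in sorted(bwmap, key=lambda a: a.start):
        contiguous = (prev is not None and prev.onu == a.onu
                      and prev.end + 1 == a.start)
        result[a.alloc_id] = not contiguous
        prev = a
    return result

EXAMPLE_BWMAP = [
    Allocation("1", "ONT 1", 100, 200),
    Allocation("x", "ONT 2", 300, 500),
    Allocation("y", "ONT 2", 501, 650),
]
```

For the example map, Alloc-IDs 1 and x start with a PLOu, and Alloc-ID y does not because it directly follows Alloc-ID x on the same ONT.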
Upstream GPON Frame
An upstream GPON frame consists of the physical layer overhead upstream (PLOu), physical
layer operations, administration, and management upstream (PLOAMu), dynamic bandwidth
report upstream (DBRu), and payload fields. These fields are described as follows:
l PLOu: used for frame alignment, synchronization, and identification for an ONU.
l PLOAMu: used for reporting ONU management messages, including maintenance and
management status. This field must be negotiated and may or may not be carried in a
frame.
l DBRu: used for reporting the T-CONT status to apply for bandwidth next time and for
allocating dynamic bandwidths. This field must be negotiated and may or may not be
carried in a frame.
l Payload: can be a DBA status report or data frame. If this field is a data frame, this field
consists of a GEM header and frames.
Downstream GPON Frame
GPON uses TDMA for upstream transmission. If multiple ONUs transmit data upstream
concurrently, transmission conflicts occur. To prevent conflicts, an OLT sends a notification
through the downstream frame, informing each ONU of its timeslot for upstream
transmission.
The OLT broadcasts PCBd to all ONUs. Each ONU receives the entire PCBd and performs
operations based on the information contained in the PCBd.
Figure 6-38 shows the PCBd structure.
Figure 6-38 PCBd structure
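[Figure: a downstream frame consists of the PCBd followed by the payload. The PCBd fields are:
Field        Length
PSync        4 bytes
Ident        4 bytes
PLOAMd       13 bytes
BIP          1 byte
PLend        4 bytes
PLend        4 bytes (second copy)
US BW Map    N x 8 bytes
Each BIP covers the bytes from the previous BIP field up to this BIP field.]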
PCBd contains PSync, Ident, PLOAMd, BIP, PLend, and US BW Map fields. These fields are
described as follows:
l PSync: used by ONUs to identify the start of each frame.
l Ident: used for indicating the position of a frame within a sequence of frames.
l PLOAMd: used for carrying management messages, including maintenance and
management status, to ONUs. This field must be negotiated and may or may not be carried
in a frame.
l BIP: used for performing a parity check on all bytes between two BIP fields (excluding
the preamble and delimiter) to monitor bit errors.
l PLend: used for specifying the length of the BWmap field.
l US BW Map: used by the OLT for sending the upstream bandwidth mapping to each T-
CONT. The BWmap specifies the start and end times for each T-CONT in transmitting
data.
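The PCBd layout can be sketched as a simple byte-slicing parser that uses the field lengths from Figure 6-38. Reading the number of 8-byte BWmap entries from the upper 12 bits of PLend is an assumption based on ITU-T G.984.3 and is not stated in this section. The function name and dictionary keys are hypothetical.

```python
# Sketch: split a downstream frame into PCBd fields and payload.
# Field lengths follow Figure 6-38. Assumption (ITU-T G.984.3): the upper
# 12 bits of PLend give the number of 8-byte BWmap entries.

FIXED_LEN = 4 + 4 + 13 + 1 + 4 + 4  # bytes before the US BW Map

def parse_pcbd(frame: bytes) -> dict:
    if len(frame) < FIXED_LEN:
        raise ValueError("frame is shorter than the fixed PCBd fields")
    fields = {
        "psync": frame[0:4],
        "ident": frame[4:8],
        "ploamd": frame[8:21],
        "bip": frame[21:22],
        "plend": frame[22:26],
        "plend_copy": frame[26:30],
    }
    entries = int.from_bytes(fields["plend"], "big") >> 20
    end = FIXED_LEN + 8 * entries
    if len(frame) < end:
        raise ValueError("frame is shorter than the announced BWmap")
    fields["us_bw_map"] = [frame[i:i + 8] for i in range(FIXED_LEN, end, 8)]
    fields["payload"] = frame[end:]
    return fields
```

An ONU would scan the BWmap entries for its own Alloc-IDs to learn its upstream start and end times.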
6.7.4 Key GPON Technologies
6.7.4.1 Ranging
The logical distances from optical network units (ONUs) to an optical line terminal (OLT)
vary. The round trip delays (RTDs) between an OLT and ONUs also vary depending on time
and environment. Therefore, collisions may occur when ONUs send data in TDMA mode
(in this mode, only one of the ONUs connecting to a PON port sends data at any given
moment), as shown in Figure 6-39.

Figure 6-39 Cell transmission without ranging
[Figure: cells sent by ONU1, ONU2, and ONU3 through the splitter collide on the way to the OLT.]
Ranging helps prevent such collisions and is enabled when an ONU registers for the first time.
In the ranging process, the OLT measures the RTD and calculates the equalization delay
(EqD) of each ONU to ensure that the Teqd value (which is equal to RTD plus EqD) is the
same for all ONUs connected to the same PON port. As a result, the logical distance from
each ONU to the OLT is the same, preventing collisions during upstream transmission.
Figure 6-40 Cell transmission with ranging
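[Figure: based on ranging, ONU1, ONU2, and ONU3 apply delays Td1, Td2, and Td3 so that their
cells arrive at the OLT through the splitter without colliding.]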
NOTE
In the ranging process, the OLT must open a window and pause upstream transmission channels of other
ONUs.
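The relationship Teqd = RTD + EqD can be turned into a short calculation. In this sketch the function name is hypothetical, the delays are example values in microseconds, and choosing the largest measured RTD as the default Teqd is an assumption made for illustration.

```python
# Sketch: derive each ONU's equalization delay so that RTD + EqD is the
# same (Teqd) for every ONU on a PON port.

def equalization_delays(rtd, teqd=None):
    """rtd maps ONU name -> measured round trip delay; returns ONU -> EqD."""
    if teqd is None:
        teqd = max(rtd.values())  # assumption: the farthest ONU gets EqD = 0
    if teqd < max(rtd.values()):
        raise ValueError("Teqd must not be smaller than the largest RTD")
    return {onu: teqd - delay for onu, delay in rtd.items()}
```

For RTDs of 50, 120, and 200 µs, the EqDs are 150, 80, and 0 µs, so all three ONUs appear to be at the same logical distance from the OLT.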
6.7.4.2 Burst Optical/Electrical Technology

Time division multiple access (TDMA) is used in the upstream direction on a gigabit-capable
passive optical network (GPON). An optical network unit (ONU) transmits data only within
its allocated timeslots. During the timeslots that are not allocated to an ONU, the ONU
disables its optical transceiver to prevent interference with other ONUs. Therefore, the optical
line terminal (OLT) receives upstream data from each ONU in bursts, based on the timeslots.
Accordingly, OLT- and ONU-side optical modules must support the burst receive and burst
transmit functions, respectively, to ensure normal running of the GPON system. In the
downstream direction, however, all data is broadcast continuously to ONUs, and therefore the
burst functions do not apply in this direction. Figure 6-41 shows the burst transmit function
supported by ONU-side optical modules, and Figure 6-42 shows the burst receive function
supported by OLT-side optical modules.
Figure 6-41 Burst transmit function supported by ONU-side optical modules
[Figure: ONU1, ONU2, and ONU3 use burst-transmit modules toward the OLT, and the OLT uses a
continuous-transmit module.]
Ranging can be implemented to prevent cells transmitted by different ONUs from conflicting
with each other on the OLT. However, the ranging accuracy is ±1 bit, and the cells
transmitted by different ONUs are separated by a protection time of only several bits (not a
multiple of 1 bit). If the ONU-side optical modules do not support the burst transmit
function, the transmitted signals overlap and distortion occurs.
Figure 6-42 Burst receive function supported by OLT-side optical modules
Main features:
l The distance from each ONU to the OLT varies and therefore the optical signal
attenuation varies for each ONU. As a result, the power and level of the signals that the
OLT receives may differ from timeslot to timeslot.
l If the OLT-side optical modules do not support the burst receive function, the OLT may
restore signals incorrectly, because only signals whose level is higher than the level
threshold are valid and signals whose level is lower than the threshold cannot be restored.

6.7.4.3 DBA
In the GPON system, an OLT controls an ONU's upstream data traffic by sending
authorization signals to the ONU. PON requires an effective TDMA mechanism to control the
upstream traffic so that data packets from multiple ONUs do not collide in upstream
transmission. However, such a mechanism requires quality of service (QoS) management in
an optical distribution network (ODN). Because the ODN is a passive network, this
management either cannot be implemented or severely decreases efficiency. To resolve this
problem, ITU-T Recommendation G.984.3 defines the dynamic bandwidth allocation (DBA)
protocol for managing upstream PON traffic.
DBA brings the following benefits:
l Improved upstream bandwidth usage on a PON port
l More users on a PON port
l Higher bandwidths for services that have burst requirements
Figure 6-43 shows DBA principles.
Figure 6-43 DBA principles
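[Figure: on the control plane, the ONU sends DBA reports to the DBA algorithm logic on the
OLT, which returns a BW Map. On the data plane, the ONU scheduler sends T-CONT data in the
assigned time slots.]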
l The OLT controls the upstream traffic by allocating data authorization to each
transmission container (T-CONT) inside the ONU.
l The ONUs report their data status to the OLT. After receiving the reports, the OLT uses
DBA to periodically update the bandwidth allocation according to the status of the data
waiting to be sent on each ONU, and notifies all ONUs of the updates through a downstream
frame.
l Each ONU dynamically adjusts its upstream bandwidth according to the allocated
bandwidth.
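The report-and-grant cycle described above can be illustrated with a deliberately simple allocator. This is not the algorithm defined in ITU-T G.984.3 or the one implemented on the device. It is a sketch that assumes each T-CONT has fixed, assured, and maximum bandwidth values plus a reported demand, all in the same arbitrary unit, and that leftover capacity is handed out in dictionary order.

```python
# Illustrative DBA sketch (assumed policy, hypothetical names):
# 1. fixed bandwidth is always granted;
# 2. assured bandwidth is granted as far as the report asks for it;
# 3. leftover capacity is shared in order, up to each T-CONT's maximum.

def dba_allocate(capacity, tconts):
    """tconts maps name -> dict(fixed, assured, maximum, demand)."""
    grants = {name: t["fixed"] for name, t in tconts.items()}
    remaining = capacity - sum(grants.values())
    if remaining < 0:
        raise ValueError("fixed bandwidth exceeds the port capacity")

    for name, t in tconts.items():
        want = min(t["assured"], max(t["demand"] - grants[name], 0))
        give = min(want, remaining)
        grants[name] += give
        remaining -= give

    for name, t in tconts.items():
        want = min(t["maximum"], t["demand"]) - grants[name]
        give = max(min(want, remaining), 0)
        grants[name] += give
        remaining -= give
    return grants
```

Running the allocator again with new reports models the periodic update: a T-CONT that stops reporting demand keeps only its fixed bandwidth, and the rest is released to the others.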

NOTE
Bandwidth can also be allocated in static mode, which is also called fixed mode. In this mode, an OLT
periodically allocates a fixed bandwidth to each ONU based on the ONU's service level agreement (SLA),
bandwidth, and delay indicators.
l In static mode, an OLT uses a polling mechanism. The bandwidths allocated to ONUs may vary but the
bandwidth allocated to each ONU is the same in each polling period. The bandwidth guarantee depends
on an ONU's SLA but not on its upstream service traffic. An ONU is allocated a fixed bandwidth,
regardless of whether it is carrying upstream services.
l Static allocation mode is simple and applies to services that require a fixed bandwidth, such as TDM.
However, this mode does not suit IP services that have burst requirements on bandwidth. If this
mode is applied to IP services, the upstream bandwidth may fail to meet the upstream service
transmission requirement.
6.7.4.4 FEC
Forward error correction (FEC) detects and corrects bit errors by allowing the transmit end to
encode redundant signals and the receive end to decode the signals based on specific rules.
Common FEC codes include Hamming codes, Reed-Solomon (RS) codes, and convolutional
codes. "Forward" in FEC means error correction is unidirectional, and no error feedback is
provided.
GPON uses RS(255,239) codes in which the codeword is 255 bytes long, consisting of 239
data bytes followed by 16 overhead bytes. RS(255,239) complies with ITU-T G.984.3. The
FEC algorithm reduces the bit error rate (BER) from 10^-3 to 10^-12 for GPON lines. However,
due to the overhead caused by multi-frame tail fragments, the bandwidth throughput of the
GPON system with FEC enabled is about 90% of that with FEC disabled. Figure 6-44 shows FEC
principles.
Figure 6-44 FEC principles
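The RS(255,239) figures quoted above can be checked with a little arithmetic. The sketch below uses the standard Reed-Solomon property that a code with n - k parity symbols corrects up to (n - k) / 2 symbol errors per codeword. The function name is hypothetical.

```python
# Sketch: basic parameters of a Reed-Solomon code RS(n, k) over bytes.

def rs_parameters(n: int, k: int) -> dict:
    parity = n - k
    return {
        "parity_bytes": parity,            # overhead bytes per codeword
        "correctable_bytes": parity // 2,  # byte errors fixable per codeword
        "code_rate": k / n,                # share of the codeword carrying data
        "overhead": parity / n,            # share of the codeword used for parity
    }
```

For RS(255,239) this gives 16 parity bytes, up to 8 correctable bytes per codeword, and a code rate of about 93.7%. The lower throughput figure of about 90% stated in the text also includes the additional overhead that the text attributes to tail fragments.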
FEC has the following features:

l Does not require data retransmission and provides a high real-time efficiency.
l Enables channels to be more noise tolerant, but requires additional bandwidth. (Users
must achieve a trade-off between the transmission quality and bandwidth usage.)
FEC applies to the following services:
l Services capable of detecting and correcting errors after arriving at the receive end.
l Services transmitted on a network that has a poor quality. For example, FEC can be
enabled when the transmission distance from an OLT to an ONT is excessive or the
transmission line is of poor quality, which results in insufficient optical power budget or
high BERs.

l Services that require low delays.
6.7.4.5 Line Encryption

In GPON systems, upstream and downstream data is transmitted using different wavelengths,
and upstream data travels only toward the OLT. Therefore, one ONU cannot intercept the
upstream data of other ONUs. However, downstream data is broadcast to all ONUs, which
creates a risk that an unauthorized user intercepts the downstream data of some or all ONUs.
Line encryption is required to eliminate this risk of data theft. Figure 6-45 shows the line
encryption process.
Figure 6-45 Line encryption process
Encryption Algorithm
The encryption algorithm uses the advanced encryption standard (AES). Also known as the
Rijndael algorithm, AES is a block cipher-based standard described in documents published
by the National Institute of Standards and Technology (NIST). AES replaces the original data
encryption standard (DES) and has been used worldwide after being analyzed by multiple
institutes. The GPON system uses the AES-128 encryption algorithm in counter (CTR) mode.
In this mode, the AES-128 encryption algorithm generates a stream of 16-byte pseudo-random
cipher blocks that is combined with the input plaintext in an exclusive OR operation to
produce the ciphertext. To regenerate the plaintext, the ciphertext is combined in an
exclusive OR operation with the same pseudo-random cipher block stream. The AES key
length is fixed at 128 bits.
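The exclusive OR structure of counter mode can be demonstrated without an AES library. The sketch below is not AES: it substitutes a SHA-256 hash of the key and counter, truncated to 16 bytes, for the AES-128 block operation, purely to show that encrypting and decrypting are the same XOR with the same keystream. All names and the counter value are hypothetical.

```python
import hashlib

BLOCK = 16  # bytes per cipher block, matching the AES block size

def keystream(key: bytes, counter0: int, length: int) -> bytes:
    """Stand-in for AES-128 in CTR mode: one 16-byte block per counter value."""
    out = bytearray()
    counter = counter0
    while len(out) < length:
        # NOT AES: a hash is used here only to produce pseudo-random blocks.
        out += hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:BLOCK]
        counter += 1
    return bytes(out[:length])

def ctr_xor(key: bytes, counter0: int, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    stream = keystream(key, counter0, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))
```

Calling `ctr_xor` twice with the same key and starting counter returns the original data, which is why both ends must agree on the key and on the frame from which it applies.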
Key Change
1. An OLT initiates a key change request to an ONU. The ONU responds to the request and
sends a new key to the OLT.
2. After receiving the new key, the OLT replaces the existing key with the new one and
uses the new key to encrypt data.
3. The OLT sends the frame number that uses the new key to the ONU.
4. The ONU receives the frame number and changes the verification key on data frames.
NOTE
l Because the length of a physical layer OAM (PLOAM) message is limited, the ONU sends the key to the
OLT in two pieces. For redundancy, the key is sent three times. If the OLT does not receive
either part of the key after the three sending attempts, the OLT re-initiates a key replacement request to
the ONU. If the key transmission fails three times, the OLT declares a loss of key synchronization (LOKi)
and deactivates the ONU.
l The OLT delivers a command three times to instruct the ONU to use the frame number of the new key.
The ONU switches the verification key on data frames once it receives the command.
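The retry behavior in this note can be modeled as follows. This is a simplified sketch under stated assumptions: the key is split into two equal halves, each half is sent three times per request, a request succeeds if at least one copy of each half arrives, and message loss is supplied by a caller-provided function. None of these names come from the product.

```python
# Sketch of the key transfer retries (hypothetical names and loss model).

def send_key(key: bytes, delivered):
    """delivered(fragment_index, copy_index) -> True if that message arrives."""
    half = len(key) // 2
    fragments = [key[:half], key[half:]]
    received = {}
    for index, fragment in enumerate(fragments):
        for copy in range(3):  # each piece is sent three times
            if delivered(index, copy):
                received[index] = fragment
    if len(received) == 2:
        return received[0] + received[1]
    return None

def key_exchange(key: bytes, attempts):
    """attempts: one delivery function per key request, at most three are used."""
    for delivered in attempts[:3]:
        result = send_key(key, delivered)
        if result is not None:
            return "key installed", result
    return "LOKi: ONU deactivated", None
```

A first request that loses every copy of the second piece, followed by a request in which only the last copy of each piece arrives, still installs the key. Three failed requests in a row end in LOKi.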

Features and Applications

l The line encryption algorithms used in the GPON system neither increase overheads nor
decrease bandwidth usage.
l The line encryption algorithms will not prolong transmission delays.
l Enable line encryption in scenarios that require high security.
6.7.5 GPON Terminal Authentication and Management

GPON terminal authentication allows an OLT to authenticate an ONU according to the ONU-
reported authentication information. Only authenticated ONUs can access the GPON system.
After an ONU passes authentication and goes online, data can be transmitted between the
ONU and OLT.
6.7.5.1 GPON Terminal Authentication (an ONU Not Preconfigured)

Figure 6-46 shows the authentication process of an ONU that is not preconfigured.
1. The OLT sends a serial number (SN) request to the ONU.
2. The ONU responds to the SN request message.
3. Upon receipt of the SN response, the OLT assigns a temporary ONU ID to the ONU.
4. After the ONU enters the operation state, the OLT sends a password request to the ONU.
The ONU then responds with a password that is not configured on the OLT.
– If the automatic discovery function is not enabled on the PON port to which the
ONU is connected, the OLT sends a deregister message to the ONU. Upon
receiving this message, the ONU sends a register request message to the OLT.
– If the automatic discovery function is enabled on the PON port to which the ONU is
connected, the port reports an alarm to the command line interface (CLI) or network
management system (NMS), indicating that the ONU is automatically discovered.
The ONU can go online only after being confirmed.

Figure 6-46 Authentication process of an ONU that is not preconfigured

[Figure: message sequence between the ONU and the OLT.
ONU states, in order: O1 (Initial state), O2 (Standby state), O3 (Serial number state),
O4 (Ranging state), O5 (Operation state).
Messages, in order: DS frame with valid PSync, Upstream_Overhead PLOAM, SN_Request (BWMap),
Serial_Number_ONU PLOAM, Assign ONU_ID, ranging request, ranging response, ranging time,
request password, password.
Notes in the figure: The OLT assigns a temporary ONU ID when the SN is not configured on the
OLT. The OLT sends a deregister message to the ONU when the password is not configured on
the OLT and automatic discovery is not enabled on the PON port. The ONU then returns to the
O2 state.]
6.7.5.2 GPON Terminal Authentication (an ONU Preconfigured)

A preconfigured optical network unit (ONU) can be authenticated in three modes: SN,
SN+password, and password.
SN Authentication
In SN authentication, the OLT matches only the ONU SN. Figure 6-47 shows the SN
authentication process.
NOTE
Currently, the ATN only supports SN authentication.

Figure 6-47 SN authentication

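[Figure: message sequence between the ONU and the OLT.
ONU states, in order: O1 (Initial state), O2 (Standby state), O3 (Serial number state).
Messages, in order: SN_Request (BWMap), Assign ONU_ID (after the SN is matched), ranging
request, ranging time. The ONU and the OLT then enter the normal state.]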
• After receiving an SN response message from an ONU, the OLT checks whether another
ONU with the same SN is online. If such an ONU is online, the OLT reports an SN
conflict alarm to the command line interface (CLI) or network management system
(NMS). Otherwise, the OLT directly assigns an ONU ID to the ONU.
• After the ONU enters the operation state, the OLT does not send a password request to
this ONU. Instead, the OLT automatically configures a GPON encapsulation method
(GEM) port for the ONU to carry optical network terminal management and control
interface (OMCI) messages, and allows the ONU to go online. The GEM port must have
the same ID as the ONU ID. After the ONU goes online, the OLT reports an ONU online
alarm to the CLI or NMS.
SN+Password Authentication
In SN+password authentication, the OLT matches both the ONU SN and password. Figure
6-48 shows the SN+password authentication process.

Figure 6-48 SN+password authentication
(Sequence diagram between the ONU and the OLT. ONU states shown: O1 initial, O2 standby,
O5 operation. Messages and events shown, in order:)
1. SN_Request (BWMap)
2. The SN is matched.
3. Assign ONU_ID
4. Ranging request
5. Ranging time
6. Request password
7. Password
8. The password is matched.
• After receiving an SN response message from an ONU, the OLT checks whether another
ONU with the same SN is online. If such an ONU is online, the OLT reports an SN
conflict alarm to the CLI or NMS. Otherwise, the OLT directly assigns an ONU ID to the
ONU.
• After the ONU enters the operation state, the OLT sends a password request to the ONU.
After the ONU responds with a password, the OLT compares the password with the local
password. If the two passwords are the same, the OLT directly configures a GEM port
for the ONU to carry OMCI messages, allows the ONU to go online, and reports an
ONU online alarm to the CLI or NMS. If the two passwords are different, the OLT
reports a password error alarm to the CLI or NMS. The OLT does not report an ONU
automatic discovery message even if the ONU automatic discovery function is enabled
on the PON port. Instead, the OLT sends the Deactivate_ONU-ID PLOAM message to
deregister the ONU.
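The decisions described for SN and SN+password authentication can be sketched as follows. This is only an illustrative model, not the OLT implementation: the PreconfiguredOnu record, the authenticate function, the action strings, and the sample serial number are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class PreconfiguredOnu:
    """Hypothetical OLT-side record for one preconfigured ONU."""
    sn: str
    password: Optional[str] = None  # None models plain SN authentication


def authenticate(entry: PreconfiguredOnu, online_sns: Set[str],
                 sn: str, password: Optional[str] = None) -> List[str]:
    """Return the OLT actions, in order, for one authentication attempt."""
    # An ONU with the same SN is already online: report a conflict.
    if sn in online_sns:
        return ["report SN conflict alarm"]
    # SN unknown to the OLT: outside the preconfigured case modeled here.
    if sn != entry.sn:
        return ["handle as an ONU that is not preconfigured"]
    actions = ["assign ONU ID"]
    if entry.password is None:
        # SN mode: no password request is sent.
        actions += ["configure GEM port for OMCI", "report ONU online alarm"]
    elif password == entry.password:
        # SN+password mode, password matches the local password.
        actions += ["request password", "configure GEM port for OMCI",
                    "report ONU online alarm"]
    else:
        # SN+password mode, mismatch: alarm, then deregister the ONU.
        actions += ["request password", "report password error alarm",
                    "send Deactivate_ONU-ID PLOAM"]
    return actions
```

Returning the actions as a list keeps the sketch easy to compare with the two bullet lists above.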
Password Authentication
In password authentication, an ONU that has password authentication configured connects to
a PON port. If the OLT determines that the ONU SN or password conflicts with that of an
online ONU, the OLT deregisters the ONU to be authenticated, protecting the online ONU
from being affected. Password authentication is available in two modes: once-on and
always-on.
The once-on mode applies to the following scenario: A carrier allocates a password to a user
and requires the user to go online within a specified time. After going online, the user cannot
change the ONU. To change the ONU, the user must notify the carrier. In once-on mode, the
aging time is configurable. After the aging time is set, the ONU must register with the OLT
and go online within the preset aging time. Otherwise, the ONU is not allowed to register with
the OLT or go online. Once the ONU is authenticated, its SN cannot be changed.
For the once-on mode:
• Only the initial authentication of an ONU is performed by password, as shown in Figure
6-49.
• In subsequent authentications, the ONU can be authenticated in SN or SN+password
mode according to the CLI configuration, as shown in Figure 6-47 or Figure 6-48.
Figure 6-49 Initial ONU authentication in once-on mode
(Sequence diagram between the ONU and the OLT. ONU states shown: O1 initial, O2 standby,
O3 serial number, O5 operation. Messages and events shown, in order:)
1. SN_Request (BWMap)
2. For the ONU that goes online for the first time, the OLT records the ONU SN.
3. Assign ONU_ID
4. Ranging request
5. Ranging time
6. Request password
7. Password
8. The password is matched.
In once-on mode, before the ONU registration times out or before the ONU successfully
registers with the OLT for the first time, the ONU discovery status is ON. Only the ONU
whose discovery status is ON is allowed to register with the OLT and go online. After the
ONU registration times out or after the ONU successfully registers with the OLT for the first
time, the OLT sets the ONU discovery status to OFF.
• The ONU whose registration times out is not allowed to register with the OLT or go
online. The registration timeout flag of the ONU needs to be reset at the central office
(CO), and then the ONU can go online.
• An ONU that successfully registers for the first time is allowed to register and go online
again.
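The once-on behavior above can be modeled with a small state object. This is a minimal sketch under stated assumptions: the OnceOnOnu class and its fields are hypothetical, time is a plain number of seconds, subsequent authentications are assumed to use SN mode, and resetting the timeout flag is assumed to start a new aging period.

```python
class OnceOnOnu:
    """Hypothetical model of one ONU entry in once-on mode."""

    def __init__(self, password, aging_time_s, created_at=0.0):
        self.password = password
        self.deadline = created_at + aging_time_s
        self.discovery_on = True   # ON until first success or timeout
        self.timed_out = False
        self.bound_sn = None       # SN recorded at first registration

    def try_register(self, sn, password, now):
        """Return True if the ONU is allowed to register and go online."""
        if self.bound_sn is not None:
            # Already registered once: the SN can no longer change.
            # Assumption: later authentications use SN mode.
            return sn == self.bound_sn
        if now > self.deadline:
            # Registration timed out: discovery status goes OFF.
            self.discovery_on = False
            self.timed_out = True
            return False
        if not self.discovery_on or password != self.password:
            return False
        # First successful registration: record the SN, discovery goes OFF.
        self.bound_sn = sn
        self.discovery_on = False
        return True

    def reset_timeout(self, now, aging_time_s):
        """Model the reset performed at the central office (CO)."""
        self.timed_out = False
        self.discovery_on = True
        self.deadline = now + aging_time_s  # assumption: new aging period
```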
The always-on mode applies to the following scenario: A carrier allocates a password to a
user, and the user can use different ONUs with this password and different SNs. The user can
change the ONU without notifying the carrier. In always-on mode, no restriction is set on the
time when the user goes online.
• An ONU is authenticated in password mode when it goes online for the first time. After
the ONU passes the password authentication and goes online successfully, the OLT
generates an SN+password binding entry according to the ONU SN and password.
Figure 6-50 shows the authentication process.
• If the ONU is not going online for the first time, one of the following situations applies:
– If the SN and password of the ONU are the same as those of the ONU that went
online for the first time, the ONU is authenticated in SN+password mode. Figure
6-48 shows the authentication process.
– If the user replaces the ONU with another ONU that has the same password but a
different SN, the new ONU is authenticated in password mode. After this ONU
passes authentication and goes online successfully, the original SN+password
binding entry is updated. Figure 6-50 shows the authentication process.
Figure 6-50 ONU authentication in always-on mode
(Sequence diagram between the ONU and the OLT. ONU states shown: O1 initial, O2 standby,
O5 operation. Messages and events shown, in order:)
1. SN_Request (BWMap)
2. Assign ONU_ID
3. Ranging request
4. Ranging time
5. Request password
6. Password
7. The password is matched.
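The always-on rules can be illustrated with a short sketch of the SN+password binding entry. The AlwaysOnEntry class and the returned mode strings are hypothetical names chosen for this example, and the handling of a wrong password is reduced to a simple rejection.

```python
class AlwaysOnEntry:
    """Hypothetical model of one always-on password entry on the OLT."""

    def __init__(self, password):
        self.password = password
        self.bound_sn = None  # SN half of the SN+password binding entry

    def authenticate(self, sn, password):
        """Return the authentication mode used, or 'rejected'."""
        if password != self.password:
            return "rejected"
        if self.bound_sn == sn:
            # Same SN and password as the recorded binding.
            return "sn+password"
        # First time online, or the ONU was replaced by one with a
        # different SN: authenticate by password, then create or update
        # the binding entry.
        self.bound_sn = sn
        return "password"
```

For example, an ONU with SN "A" is authenticated in password mode the first time and in SN+password mode afterwards. Replacing it with an ONU with SN "B" and the same password triggers password mode again and updates the binding.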
6.7.6 Networking Applications (FTTx)

GPON adopts the passive optical transmission technology and is mainly applicable to
scenarios such as fiber to the mobile base station (FTTM), fiber to the office (FTTO), fiber
to the building (FTTB), fiber to the curb (FTTC), fiber to the WLAN (FTTW), and fiber to
the home (FTTH) to provide the following services (as shown in Figure 6-51):
• Voice
• Data
• Video
• Leased line
• Distributed service
Figure 6-51 FTTx network applications
(Network diagram. Labels recovered from the figure: iManager U2000, multicast server,
NGN/IMS, SDH/Metro, OLT, splitter, ONU; FTTM with BTS and Node B; FTTO with PBX,
enterprise router, and enterprise HQ; FTTB/FTTC; link types GE/10GE, STM-1/E1, and E1/GE.)
Terms
None

Abbreviation
FTTB fiber to the building
FTTC fiber to the curb
FTTH fiber to the home
FTTM fiber to the mobile base station
GPON gigabit-capable passive optical network
IMS IP multimedia subsystem
NGN next generation network
ODN optical distribution network
OLT optical line terminal
ONT optical network terminal
ONU optical network unit
PBX private branch exchange
PLMN public land mobile network
PON passive optical network
POTS plain old telephone service

7 IP Services
About This Chapter
This document describes the IP services in terms of the overview, principle, and applications.
7.1 IP Addressing
This chapter provides an introduction to Internet Protocol (IP) addressing, the principles of IP
addresses, and IP applications.
7.2 ARP
7.3 ACL
7.4 IPv4
7.5 IP Unicast Policy-Based Routing
7.6 IPv6
7.1 IP Addressing
This chapter provides an introduction to Internet Protocol (IP) addressing, the principles of IP
addresses, and IP applications.
7.1.1 Introduction to IP Addresses

An Internet Protocol (IP) address is a numerical label that is assigned to devices on a
computer network, which uses the Internet Protocol for communication between its nodes.
The architecture of an IP address is based on the network structure. An IP address consists of
the network ID and host ID.
You need to allocate IP addresses for the hosts on an IP network. To connect a computer to the
Internet, you need to apply to the Internet Service Provider (ISP) for an IP address.
An IP address is a 32-bit binary number. To help users recognize and remember IP
addresses, IP addresses are expressed in dotted decimal notation. In dotted decimal notation,
an IP address consists of four decimal integers separated by dots. Each decimal integer
corresponds to a byte. For example, the binary IP address of Host A is 00001010 00000001
00000001 00000010; the dotted decimal IP address of Host A is 10.1.1.2.
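The conversion between the binary form and dotted decimal notation can be sketched as follows. The function names are illustrative only and are not part of any ATN software.

```python
def binary_to_dotted(bits: str) -> str:
    """Convert four space-separated 8-bit groups to dotted decimal."""
    octets = bits.split()
    if len(octets) != 4 or any(len(o) != 8 for o in octets):
        raise ValueError("expected four 8-bit groups")
    # Each 8-bit group is one byte, written as a decimal integer.
    return ".".join(str(int(o, 2)) for o in octets)


def dotted_to_binary(address: str) -> str:
    """Convert dotted decimal notation to four 8-bit binary groups."""
    return " ".join(f"{int(part):08b}" for part in address.split("."))


host_a = binary_to_dotted("00001010 00000001 00000001 00000010")
print(host_a)  # 10.1.1.2
```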
An IP address consists of the following fields:
• Network ID field (net-id): It is used to distinguish networks. The leading bits of the
net-id are called the class field (or class bits). These bits are used to distinguish the IP
address class.
• Host ID field (host-id): It is used to distinguish different hosts on the network.
The network ID field identifies a network, and the host ID field identifies a connection of the
network device on the network. If multiple network devices have the same network ID, they
reside on the same network regardless of their physical locations.
7.1.2 Principles
This section describes the classification and characteristics of IP addresses, as well as private
and special IP addresses.
7.1.2.1 Classes of IP Addresses

As shown in Figure 7-1, IP addresses are divided into five classes to facilitate IP address
management and networking.
You can determine the class of an IP address depending on the first bits of the network ID
field. This is the simplest method to distinguish each class of addresses.
Figure 7-1 Five classes of IP addresses

0 7 15 23 31
A 0 Net-id Host-id
B 1 0 Net-id Host-id
C 1 1 0 Net-id Host-id
D 1 1 1 0 Multicast-address
E 1 1 1 1 Reserved
Most IP addresses in use belong to Class A, Class B, or Class C. Class D IP addresses are
multicast addresses, and Class E IP addresses are reserved. For details, refer to RFC 1166
(Internet Numbers).
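Determining the class from the leading bits, as Figure 7-1 describes, can be sketched as follows. The address_class function is a hypothetical helper written for this example.

```python
def address_class(address: str) -> str:
    """Return the class (A to E) of a dotted decimal IPv4 address."""
    first_octet = int(address.split(".")[0])
    if first_octet & 0b10000000 == 0b00000000:   # leading bit  0
        return "A"
    if first_octet & 0b11000000 == 0b10000000:   # leading bits 10
        return "B"
    if first_octet & 0b11100000 == 0b11000000:   # leading bits 110
        return "C"
    if first_octet & 0b11110000 == 0b11100000:   # leading bits 1110
        return "D"
    return "E"                                   # leading bits 1111
```

Only the first octet is inspected, because the class bits occupy at most the first four bits of the address.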
Certain IP addresses are reserved for special uses. Table 7-1 lists the ranges of IP addresses
for all five classes.

Table 7-1 IP address classes and ranges
Class A
  Address range: 0.0.0.0 to 127.255.255.255
  Available IP network range: 1.0.0.0 to 126.0.0.0
  Description: The IP addresses with all-0 host IDs are network addresses, and they are
  used for network routing. The IP addresses with all-1 host IDs are broadcast addresses.
  A packet with such an IP address is broadcast to all hosts on the network. The IP
  address 0.0.0.0 is only used for temporary communication after the system in DHCP
  mode starts up. It cannot be an effective destination address. The IP address with an
  all-0 network ID represents the current host. This IP address allows the host to use the
  current network even if it does not know its network ID. The IP addresses with the
  network ID in the 127.X.Y.Z format are reserved for loopback test. The packets destined
  for these addresses will not be sent to the network. These packets are processed
  internally and considered as input packets.
Class B
  Address range: 128.0.0.0 to [...]
  Available IP network range: 128.1.0.0 to [...]
  Description: The IP addresses with all-0 host IDs are [...] hosts on the network.
Class C
  Address range: 192.0.0.0 to [...]
  Available IP network range: 192.0.1.0 to [...]
  Description: The IP addresses with all-0 host IDs are [...] hosts on the network.
Class D
  Address range: 224.0.0.0 to 239.255.255.255
  Available IP network range: None
  Description: The addresses of Class D are multicast addresses.
Class E
  Address range: 240.0.0.0 to 255.255.255.255
  Available IP network range: None
  Description: The addresses are reserved for future use. The IP address 255.255.255.255
  is used as a LAN broadcast address.
7.1.2.2 Characteristics of IP Addresses

The major characteristics of IP addresses are as follows:
• IP addresses have a non-hierarchical structure that is different from the structure of
telephone numbers. That is, IP addresses do not show any geographical information
about the host position. The network ID field determines which network a host belongs
to.
• When a host is connected to two networks, the host must have two IP addresses with
different net-IDs. This host is called a multi-homed host. Each interface on a host has an
IP address. Therefore, a multi-interface host has multiple IP addresses.
• According to the Internet concept, different LANs connected through repeaters or
bridges are in the same network. Therefore, these LANs have the same net-ID.
• For IP addresses, all networks assigned with net-IDs are equal (regardless of whether it
is a small LAN or a large WAN).
7.1.2.3 Special IP Addresses

In real world applications, some special IP addresses are used. Table 7-2 shows the ranges
and description of these IP addresses.
Table 7-2 Special IP addresses
Net ID   Subnet ID   Host ID     Used as Source   Used as Destination   Description
                                 Address          Address
All 0s   -           All 0s      Ok               Never                 Applies to hosts on a network.
All 0s   -           Host ID     Ok               Never                 Applies to a specified host on a network.
127      -           Any value   Ok               Ok                    Used as a loopback address.
All 1s   -           All 1s      Never            Ok                    Applies to restricted broadcast (never forwarded).
net-id   -           All 1s      Never            Ok                    Used to send broadcast packets to the network specified by the net-id.
net-id   subnet-id   All 1s      Never            Ok                    Used to send broadcast packets to the subnets specified by the net-id and subnet-id.
net-id   All 1s      All 1s      Never            Ok                    Used to send broadcast packets to all subnets specified by the net-id.
NOTE
In Table 7-2, net-id and subnet-id indicate the fields that are neither all zero bits nor all one bits.

7.1.2.4 Private IP Addresses

Private addresses are used to resolve the problem of IP address shortage. Private addresses
are the IP addresses of internal networks or hosts. These addresses can be used only within
an internal network, not on a public network. RFC 1918 describes three IP address
network segments reserved for private networks.
Table 7-3 shows the private network addresses reserved by the Internet Assigned Numbers
Authority (IANA).
Table 7-3 Private IP Addresses

Network IP Address Range
A From 10.0.0.0 to 10.255.255.255
B From 172.16.0.0 to 172.31.255.255
C From 192.168.0.0 to 192.168.255.255
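A membership check against the three ranges in Table 7-3 can be sketched with the Python standard library ipaddress module. The is_rfc1918 helper is a name introduced for this example. The prefix lengths /8, /12, and /16 are the CIDR equivalents of the listed ranges.

```python
import ipaddress

# The three private ranges from Table 7-3, written as CIDR prefixes.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 to 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 to 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 to 192.168.255.255
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address lies in one of the private ranges."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in PRIVATE_NETWORKS)
```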
7.1.3 Applications
This section describes applications of IP addresses.
7.1.3.1 Subnetting
The network part of an IP address is called the network address. The network address
identifies a unique network segment. A network administrator can divide a network address
into subnets so that broadcast packets are transmitted within a single subnet.
From the perspective of address allocation, subnets are supplements to network addresses.
Only the net-id is assigned so that IP addresses can be used flexibly when an enterprise
applies for IP addresses. The specific host-ids are assigned by the enterprise as long as there is
no repetition of host IDs in the Intranet.
When hosts are widely scattered on a network, you can divide the internal host-ids into many
subnets. Through the subnet classification, the entire network can be divided into smaller
networks.
Subnets on an enterprise network are invisible outside the enterprise. When an external packet
enters the enterprise network, the internal devices select the routes based on the subnet ID.
The devices then forward the packet to the destination host.
Figure 7-2 shows the subnetting of a Class B IP address. The subnet mask consists of a string
of continuous 1s and 0s. The 1s correspond to the net ID field and the subnet ID field. The 0s
correspond to the host ID field.
Figure 7-2 Subnetting of a Class B address

7 15 21 31
Class B
Net-id Host-id
address
Mask 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Subnet Net-id Subnet-id Host-id
Mask 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0

After performing an AND operation on the 32-bit IP address and the corresponding subnet
mask, you can get the net ID of an IP address. If the IP address is 10.1.1.2 and the subnet
mask is 255.255.0.0, you can get 10.1.0.0 as the network address after performing the AND
operation on the IP address and the corresponding subnet mask.
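The AND operation described above can be sketched as follows. The network_address function is an illustrative helper that uses only the standard ipaddress module to convert between dotted decimal notation and 32-bit integers.

```python
import ipaddress


def network_address(address: str, mask: str) -> str:
    """Bitwise-AND a 32-bit IPv4 address with its subnet mask."""
    ip_bits = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_bits & mask_bits))


print(network_address("10.1.1.2", "255.255.0.0"))  # 10.1.0.0
```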
Subnetting reduces the number of IP addresses available for hosts. For example, a Class B
network originally can accommodate 65534 host IDs (2^16 - 2). After a 6-bit subnet field is
allocated, there can be a maximum of 64 subnets. Each subnet has a 10-bit host ID, which
means each subnet has a maximum of 1022 host IDs (2^10 - 2, excluding the host IDs with
all 1s and all 0s). Therefore, there are 65408 (64 x 1022 = 65408) host IDs, 126 fewer than
the number of host IDs before subnetting.
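The arithmetic in this example can be checked with a few lines of code. The usable_host_ids helper is a name introduced here for illustration.

```python
def usable_host_ids(host_bits: int) -> int:
    """Host IDs per network, excluding the all-0s and all-1s host IDs."""
    return 2 ** host_bits - 2


before = usable_host_ids(16)        # Class B without subnetting: 65534
subnets = 2 ** 6                    # 6-bit subnet field: 64 subnets
per_subnet = usable_host_ids(10)    # 10-bit host ID: 1022 per subnet
after = subnets * per_subnet        # 64 x 1022 = 65408

print(before, after, before - after)  # 65534 65408 126
```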
If an enterprise does not divide its network into subnets, the subnet mask is the default value.
The number of 1s in the subnet mask indicates the net ID length. Therefore, the default values
of the subnet mask for Class A, Class B and Class C IP addresses are 255.0.0.0, 255.255.0.0,
and 255.255.255.0 respectively.
During subnetting and IP address planning, consider the following rules to implement
reasonable and efficient network planning:
Hierarchy
To divide a network hierarchically, consider geographic and service factors so that subnetting
follows the network hierarchy in top-down mode. In this manner, networks are effectively
managed and routing tables are simplified. In most cases:
• A network consisting of a backbone network and a MAN is divided into flattened
subnets.
• An administrative network is divided into multi-level subnets.
Consistency
Consecutive addresses facilitate route aggregation on a hierarchical network, which greatly
reduces the number of routing table entries and improves route lookup efficiency. When
allocating IP addresses, note the following issues:
• Allocate consecutive IP addresses to each area.
• Allocate consecutive IP addresses to devices that have the same service and function.
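The benefit of consecutive allocation for route aggregation can be illustrated with the standard ipaddress module. The aggregate helper and the example prefixes are assumptions made for this sketch and do not come from any ATN configuration.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse a list of prefixes into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Four consecutive /24 subnets in one area collapse into a single route.
consecutive = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
print(aggregate(consecutive))  # ['10.1.0.0/22']

# Scattered /24 subnets cannot be aggregated and stay as four routes.
scattered = ["10.1.0.0/24", "10.1.2.0/24", "10.1.5.0/24", "10.1.9.0/24"]
print(len(aggregate(scattered)))  # 4
```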
Expansibility
When you allocate addresses, reserve certain addresses in each hierarchy. In this manner,
consecutive addresses can be allocated to an expanded network, implementing long-term
network planning.
A backbone network must have enough consecutive addresses for independent ASs and
further network expansion.
Efficiency
When planning subnets, fully use address resources as follows to ensure that there are
sufficient IP addresses for hosts:
• Use variable-length subnet masking (VLSM) to fully and properly use address resources.
• Consider the routing mechanism of networks to fully use the IP address spaces that
have been allocated for better IP address utilization.
Service-oriented
Devices that have similar functions should be allocated IP addresses of the same type. IP
address allocation complies with the following rules:
• High-end ATNs, IP telephony gateways, IP telephony gatekeepers, Internet servers,
firewalls, and edge or access ATNs should be allocated public network IP addresses.
• Loopback interfaces, which are device management interfaces, should be allocated a
section of consecutive IP addresses with 32-bit masks.
• Interfaces connecting devices should be allocated a section of consecutive IP addresses
with 30-bit masks.
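The effect of 30-bit and 32-bit masks can be sketched with the standard ipaddress module. The address blocks below are hypothetical planning values chosen only for illustration.

```python
import ipaddress

# Hypothetical block reserved for device interconnection links.
link_block = ipaddress.ip_network("10.0.0.0/28")

# Carve the block into consecutive subnets with 30-bit masks.
link_subnets = list(link_block.subnets(new_prefix=30))
first_link = link_subnets[0]

# A 30-bit mask leaves exactly two usable addresses, one per link end.
link_ends = [str(h) for h in first_link.hosts()]

# A loopback address with a 32-bit mask covers a single address.
loopback = ipaddress.ip_network("10.255.0.1/32")

print(len(link_subnets), link_ends, loopback.num_addresses)
```

A 30-bit mask wastes no host addresses on a point-to-point link, and a 32-bit mask lets consecutive loopback addresses be packed without gaps.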
7.1.3.2 IP Address Allocation

Users can access the Internet only with a valid IP address. Therefore, an access device must
support unified IP address allocation and management. Three IP address allocation modes are
available.
Manually Allocating IP Addresses

You can directly configure an IP address on your computer. This mode is suitable for the
server used for certain purposes or for users with special requirements, such as Web servers
and ATNs. To prevent an IP address from being spoofed, configure IP/VLAN or IP/PVC
binding on the access server.
Allocating IP Addresses Using DHCP

The Dynamic Host Configuration Protocol (DHCP) works in client/server mode. The
administrator sets an IP address range on a DHCP server. With DHCP, a client can apply for
configuration information from a DHCP server, including the assigned IP address, the subnet
mask, and the default gateway. The DHCP server replies with the configuration information
based on a certain policy.
7.1.3.3 IP Address Unnumbered

When an interface has no IP address, it cannot generate routes or forward packets. The IP
address unnumbered mechanism allows an interface without an IP address to borrow an IP
address from another interface.
The IP address unnumbered mechanism conserves IP addresses. When an interface is used

occasionally, this interface can be configured with an unnumbered IP address. In this manner,
it does not occupy an exclusive IP address all the time.
7.1.3.4 IP Address Resolution

A device that connects to multiple networks has the IP addresses of multiple networks. These
IP addresses cannot be used directly for communication, for the following reasons:
• An IP address is the network layer address of a host. Before a datagram is transmitted to
the destination host, the physical address of the host must be obtained. Therefore, the IP
address must be first resolved into a physical address.
• A host name is easier to remember than an IP address. Therefore, the host
name needs to be resolved into an IP address.
Figure 7-3 shows the relationship between a host name, an IP address, and a physical address.
On the Ethernet, the physical address of a host refers to the MAC address. The DNS server
resolves a host name into an IP address. The Address Resolution Protocol (ARP) resolves the
IP address into a MAC address.
Figure 7-3 Relationship between the host name, IP address, and physical address
DNS:Hostname->IP
Source DNS Server

Hostname:HostA
IP:209.0.0.5/24
ARP:IP->MAC
Destination
Hostname:HostB
IP:209.0.0.6/24
MAC:0800-2B00-EE0A
7.1.3.5 IP Address Overlapping in the VPN Instance
VPN Instance
The concept of a VPN instance was first introduced by BGP/MPLS VPN to isolate VPN routes
from public routes and to isolate the routes of different VPNs from each other.
VPN instances can also be used in non-BGP/MPLS VPN network environments. By using
VPN instances, you can create several independent virtual devices on the same device, and
the routes of these virtual devices are isolated from each other.
In the ATN, various software features support VPN instances. These features can be bound
to different VPN instances to implement multi-instance functions, such as multi-instances
of various routing protocols (RIP multi-instances, OSPF multi-instances, ISIS multi-instances,
and BGP multi-instances).


Terms
Term: Description
dotted decimal notation: A format of IP address. IP addresses in this format are separated
into four parts by a dot "." with each part in the decimal numeral format.
IP address unnumbered: An interface borrowing the IP address from another interface when
it is not configured with an IP address.
private IP address: An IP address allocated to an interior network or host. It cannot be
allocated to a public network.
subnet mask: A subnet mask is represented by 32-bit binary digits. It is used to
identify the network number of an IP address.
7.2 ARP
For the ATN 905A-P that functions as a small cell dock, if the destination IP address in ARP
reply packets received by an interface is not the interface's IP address, ARP entries are not
learned or updated.
7.2.1 Introduction to ARP

Definition
The Address Resolution Protocol (ARP) is used to resolve IP addresses into MAC addresses.
Purpose
If two hosts need to communicate, the network-layer address (IP address) of the receiver must
be available to the sender. Because IP datagrams must be encapsulated into frames before they
can be transmitted over the physical network, the physical address (MAC address) of the
receiver must also be available to the sender. The sender must map the IP address of the
receiver to the receiver's MAC address, so that IP datagrams can be successfully transmitted.
ARP provides a mechanism for mapping IP addresses to MAC addresses.
Function Overview
In addition to the previous function, ARP has other features, as described in Table 7-4.

Table 7-4 ARP features

Feature: 7.2.2.2 Dynamic ARP
  Description: Devices dynamically learn and update the mapping between IP addresses and
  MAC addresses by using ARP messages. The mapping does not need to be configured
  manually.
  Usage scenario: Real-time communication is a priority, or network resources are
  insufficient.

Feature: 7.2.2.3 Static ARP
  Description: The mapping between IP addresses and MAC addresses is manually created
  and cannot be dynamically modified.
  Usage scenario: Communication security is a priority, and network resources are
  sufficient.

Feature: 7.2.2.5 Gratuitous ARP
  Description: Gratuitous ARP is used to check whether the IP address of a device conflicts
  with the IP address of the local host, to notify other devices in the same network segment
  of the new MAC address after the network adapter on the local host is replaced, or to
  notify master/slave switchovers in a Virtual Router Redundancy Protocol (VRRP) backup
  group.
  Usage scenario: To identify IP address conflicts, so that network deployment can be
  promptly adjusted.

Feature: 7.2.2.6 Proxy ARP
  Description: If a source host and a destination host are on the same network segment, the
  source host sends an ARP request packet that carries the destination host's IP address. If
  a gateway on which proxy ARP is enabled receives the packet, the gateway sends its own
  MAC address to the source host.
  Usage scenario:
  • If two hosts on different subnets of a network segment want to communicate, routed
  proxy ARP must be enabled on the gateway device.
  • If two isolated hosts in the same virtual local area network (VLAN) want to
  communicate, intra-VLAN proxy ARP must be enabled on the gateway device.
  • If two hosts in different VLANs want to implement Layer 2 communication,
  inter-VLAN proxy ARP must be enabled on the gateway device.

Feature: 7.2.2.7 ARP-Ping
  Description: Before configuring an IP address or MAC address for a device, check whether
  the address is being used by another device on the network by sending ARP messages.
  Usage scenario: To prevent address conflict, use ARP-Ping to check whether an address is
  being used by another device on the network before configuring the address as the IP
  address of a device or as the MAC address of an interface on a device.

Feature: 7.2.2.8 IP Address Conflict Detection
  Description: IP address conflict detection helps users quickly locate and modify IP
  address conflicts to ensure stability and security of user services.
  Usage scenario: IP address conflict detection is applicable to Ethernet LANs, and it helps
  users quickly locate and modify the conflicted IP addresses and instruct users to properly
  configure and manage the IP addresses of devices on a network.

Feature: 7.2.2.9 ARP Security
  Description: Security and stability of network devices are ensured by filtering out
  untrusted ARP messages and enabling timestamp suppression for ARP messages.
  Usage scenario: To ensure stable, real-time, and secure communication, the customer can
  choose to deploy ARP security based on actual requirements.
Benefits
ARP implements mapping between IP addresses at the network layer and MAC addresses at
the link layer on the Ethernet network. It is the basis for Ethernet communication.
7.2.2 Principles
7.2.2.1 Basic ARP Principles
Related Concepts
l Address Resolution Protocol (ARP) messages
ARP messages include Request messages and Reply messages. Figure 7-4 shows the
ARP message format.

Figure 7-4 ARP message format

0 15 23 31
Ethernet Address of destination(0-31)
Ethernet Address of destination(32-47) Ethernet Address of sender(0-15)
Ethernet Address of sender(16-47)
Frame Type Hardware Type
Protocol Type Hardware Length Protocol Length
OP Ethernet Address of sender(0-15)
Ethernet Address of sender(16-47)
IP Address of sender
Ethernet Address of destination(0-31)
Ethernet Address of destination(32-47) IP Address of destination(0-15)
IP Address of destination(16-31)
NOTE
The Ethernet Address of destination contains a total of 48 bits. Ethernet Address of destination
(0-31) indicates the first 32 bits of the Ethernet Address of destination field and Ethernet Address of
destination (32-47) indicates the last 16 bits of the Ethernet Address of destination field.
An ARP message consists of 42 bytes. The first 14 bytes indicate the Ethernet frame
header, and the last 28 bytes are the content of the ARP Request or Reply message.
Table 7-5 describes the fields in an ARP message.
Table 7-5 Description of fields in an ARP message
Field (length): Description
Ethernet address of destination (48 bits): Ethernet destination MAC address. This field in an
ARP Request message is the broadcast MAC address, with a value of 0xFF-FF-FF-FF-FF-FF.
Ethernet address of sender (48 bits): Ethernet source MAC address.
Frame type (16 bits): Frame type. For an ARP Request or Reply message, the value of this
field is 0x0806.
Hardware type (16 bits): Type of the hardware address. For an Ethernet network, the value
of this field is 1.
Protocol type (16 bits): Type of the protocol address to be mapped by the sender. For an IP
address, the value of this field is 0x0800.
Hardware length (8 bits): Length of the hardware address. For an ARP Request or Reply
message, the value of this field is 6.
Protocol length (8 bits): Length of the protocol address. For an ARP Request or Reply
message, the value of this field is 4.
OP (16 bits): Operation type. The values are as follows:
• 1: ARP requesting
• 2: ARP replying
• 3: RARP requesting
• 4: RARP replying
Ethernet address of sender (48 bits): Source MAC address. The value of this field is the same
as the Ethernet source MAC address in the Ethernet frame header.
IP address of sender (32 bits): Source IP address.
Ethernet address of destination (48 bits): Destination MAC address. The value of this field in
an ARP Request message is 0x00-00-00-00-00-00.
IP address of destination (32 bits): Destination IP address.
l ARP table
If a host broadcasts an ARP Request message before it sends every IP datagram, the
communication traffic on the network will greatly increase. Furthermore, all other hosts
on the network have to receive and process the ARP Request messages, which lowers
network efficiency. To solve this problem, an ARP table is maintained on each host to
ensure efficient ARP operations. An ARP table contains the latest mapping between IP
addresses and MAC addresses. The mapping between an IP address and a MAC address
is called an ARP entry.
ARP entries can be classified as dynamic or static.
– Dynamic ARP entries are automatically generated and maintained by using ARP
messages. Dynamic ARP entries can be aged and overwritten by static ARP entries.
– Static ARP entries are manually configured and maintained by a network
administrator. Static ARP entries can neither be aged nor be overwritten by dynamic
ARP entries.
Before sending IP datagrams, a host searches the ARP table for the MAC address
corresponding to the destination IP address.
– If the ARP table contains the corresponding MAC address, the host directly sends
the IP datagrams to the MAC address instead of sending an ARP Request message.
– If the ARP table does not contain the corresponding MAC address, the host
broadcasts an ARP Request message to request the MAC address of the destination
host.
l Reverse Address Resolution Protocol (RARP)
If only the MAC address of a host is available, its IP address can also be obtained using
RARP messages.
You need to establish the mapping between MAC addresses and IP addresses on a
gateway. When a new host must be configured, the RARP client requests the host's IP
address from the RARP server on the gateway.
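The 42-byte ARP Request layout described in Table 7-5 can be sketched with the standard struct module. The helper names and the example addresses are hypothetical. The sketch only builds the bytes and does not send anything on a network.

```python
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert a MAC such as 00-11-22-33-44-55 to 6 raw bytes."""
    return bytes.fromhex(mac.replace("-", "").replace(":", ""))


def ip_to_bytes(ip: str) -> bytes:
    """Convert dotted decimal notation to 4 raw bytes."""
    return bytes(int(part) for part in ip.split("."))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame that carries an ARP Request message."""
    # 14-byte Ethernet header: broadcast destination, sender, type 0x0806.
    header = struct.pack("!6s6sH", b"\xff" * 6, mac_to_bytes(sender_mac), 0x0806)
    # 28-byte ARP body: hardware type 1, protocol type 0x0800,
    # hardware length 6, protocol length 4, OP 1 (ARP requesting).
    body = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 1,
        mac_to_bytes(sender_mac), ip_to_bytes(sender_ip),
        b"\x00" * 6,               # destination MAC is all 0s in a request
        ip_to_bytes(target_ip),
    )
    return header + body


frame = build_arp_request("00-11-22-33-44-55", "10.10.10.1", "10.10.10.2")
print(len(frame))  # 42
```

The "!" prefix selects network byte order without padding, so the packed sizes match the field lengths in the table exactly.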

Implementation
l ARP implementation within a network segment
Figure 7-5 shows how ARP is implemented within a network segment, by using IP
datagram transmission from Host A to Host B as an example.
NOTE
The numbers in the following figures correspond to the steps described below the figures.
Figure 7-5 ARP implementation between Host A and Host B in the same network
segment
PE
Port1 Port2
2
CE1 2 CE2
1
4 3
5
HostA HostB HostC HostD

IP address: 10.10.10.1/24 IP address: 10.10.10.2/24
MAC address: 1-1-1 MAC address :2-2-2
ARP Request message
ARP Reply message
IP datagram
a. Host A searches its ARP table and does not find the mapping between the IP
address and MAC address of Host B. Host A then sends an ARP Request message
to request the MAC address of Host B. In this ARP Request message, the source IP
address and source MAC address are respectively the IP address and MAC address
of Host A, the destination IP address and destination MAC address are respectively
the IP address of Host B and 00-00-00-00-00-00, and the Ethernet source MAC
address and Ethernet destination MAC address are respectively the MAC address of
Host A and the broadcast MAC address.
b. After receiving the ARP Request message, CE1 broadcasts it in the network
segment.
c. After receiving the ARP Request message, Host B adds the MAC address of Host A
into its ARP table and sends an ARP Reply message to Host A. In this ARP Reply
message, the source IP and MAC addresses are respectively the IP and MAC
addresses of Host B, the destination IP and MAC addresses are respectively the IP
and MAC addresses of Host A, and the Ethernet source and destination MAC
addresses are respectively the MAC addresses of Host B and Host A.

NOTE
The destination IP address in the ARP Request message is not the IP address of PE. Therefore, PE
discards the received ARP Request message.
d. After receiving the ARP Reply message, CE1 forwards it to Host A.
e. After receiving the ARP Reply message, Host A adds the MAC address of Host B
into its ARP table and sends the IP datagrams to Host B.
l ARP implementation between different network segments
NOTE
ARP messages are Layer 2 messages. Therefore, ARP is applicable only to devices on the same
network segment. If two hosts in different network segments need to communicate, the source host
sends IP datagrams to the default gateway, and then the default gateway forwards the IP datagrams
to the destination host. ARP implementation between different network segments involves
separate ARP implementation within network segments. In this manner, hosts in different network
segments can communicate.
Figure 7-6 and Figure 7-7 show how ARP is implemented between different network
segments, by using IP datagram transmission from Host A to Host C as an example.
Figure 7-6 shows the ARP implementation between Host A and PE in the same network
segment. The ARP implementation enables Host A to send IP datagrams to PE.
Figure 7-6 ARP implementation between Host A and PE in the same network segment
[Figure: PE connects to CE1 through Port1 (IP address: 10.10.10.3/24, MAC
address: 3-3-3) and to CE2 through Port2 (IP address: 10.10.11.1/24, MAC
address: 4-4-4). Arrows numbered 1 to 5 mark the paths of the ARP Request
message, the ARP Reply message, and the IP datagram.]
a. Host A searches its ARP table and does not find the mapping between the IP
address and MAC address of port 1 on PE. PE is the default gateway through
which Host A reaches Host C. Host A then sends an ARP Request message to
request the MAC address of port 1 on PE. In this ARP Request message, the
source IP and MAC addresses are respectively the IP and MAC addresses of
Host A, the destination IP and MAC addresses are respectively the IP address
of port 1 on PE and 00-00-00-00-00-00, and the Ethernet source and destination
MAC addresses are respectively the MAC address of Host A and the broadcast
MAC address.
b. After receiving the ARP Request message, CE1 broadcasts it in the network
segment.
c. After receiving the ARP Request message, PE adds the MAC address of Host A
into its ARP table and sends an ARP Reply message to Host A. In this ARP Reply
message, the source IP and MAC addresses are respectively the IP and MAC
addresses of port 1 on PE, the destination IP and MAC addresses are respectively
the IP and MAC addresses of Host A, and the Ethernet source and destination MAC
addresses are respectively the MAC address of port 1 on PE and the MAC address
of Host A.
NOTE
The destination IP address in the ARP Request message is not the IP address of Host B.
Consequently, Host B discards the received ARP Request message.
d. After receiving the ARP Reply message, CE1 forwards it to Host A.
e. After receiving the ARP Reply message, Host A adds the MAC address of port 1 on
PE into its ARP table and sends the IP datagrams to PE.
Figure 7-7 shows the ARP implementation between PE and Host C in the same network
segment. The ARP implementation enables PE to send the IP datagrams to Host C.
Figure 7-7 ARP implementation between PE and Host C in the same network segment
[Figure: PE connects to CE1 through Port1 (IP address: 10.10.10.3/24) and to
CE2 through Port2 (IP address: 10.10.11.1/24). The routing table on PE
contains one entry: Destination 10.10.11.0/24, Nexthop 10.10.11.1, Interface
Port2. Arrows numbered 1 to 5 mark the paths of the ARP Request message, the
ARP Reply message, and the IP datagram.]

PE queries its routing table and sends the IP datagrams from port 1 to port 2.
a. PE searches its ARP table and does not find the mapping between the IP address
and MAC address of Host C. Then, PE sends an ARP Request message to request
the MAC address of Host C. In this ARP Request message, the source IP and MAC
addresses are respectively the IP and MAC addresses of port 2 on PE, the
destination IP and MAC addresses are respectively the IP address of Host C and
00-00-00-00-00-00, and the Ethernet source and destination MAC addresses are
respectively the MAC address of port 2 on PE and the broadcast MAC address.
b. After receiving the ARP Request message, CE2 broadcasts it in the network
segment.
c. After receiving the ARP Request message, Host C adds the MAC address of port 2
on PE into its ARP table and sends an ARP Reply message to PE. In this ARP
Reply message, the source IP and MAC addresses are respectively the IP and MAC
addresses of Host C, the destination IP and MAC addresses are respectively the IP
and MAC addresses of port 2 on PE, and the Ethernet source and destination MAC
addresses are respectively the MAC address of Host C and the MAC address of port
2 on PE.
NOTE
The destination IP address in the ARP Request message is not the IP address of Host D.
Consequently, Host D discards the received ARP Request message.
d. After receiving the ARP Reply message, CE2 forwards it to PE.
e. After receiving the ARP Reply message, PE adds the MAC address of Host C into
its ARP table and sends the IP datagrams to Host C.
So far, the IP datagram transmission from Host A to Host C is complete.
NOTE
1. ARP Request messages are broadcast, whereas ARP Reply messages are unicast.
2. In the ARP implementation, the switches CE1 and CE2 transparently forward IP datagrams and do
not modify them.
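The field values described in steps a and c can be made concrete by building the two frames byte by byte. The sketch below packs an Ethernet header followed by the standard 28-byte ARP body for IPv4 over Ethernet. The helper names and the two MAC addresses are assumptions made for illustration (the figures only give the shorthand 1-1-1 and 2-2-2).

```python
import socket
import struct

BROADCAST_MAC = bytes.fromhex("ffffffffffff")
ZERO_MAC = bytes(6)            # 00-00-00-00-00-00, used as the unknown target MAC
ETHERTYPE_ARP = 0x0806
OP_REQUEST, OP_REPLY = 1, 2


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa-bb-cc-dd-ee-ff' or 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace("-", "").replace(":", ""))


def build_arp_frame(op, eth_dst, sender_mac, sender_ip, target_mac, target_ip):
    """Return an Ethernet frame carrying one ARP message."""
    eth_header = struct.pack("!6s6sH", eth_dst, sender_mac, ETHERTYPE_ARP)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,          # hardware type: Ethernet
        0x0800,     # protocol type: IPv4
        6,          # hardware address length
        4,          # protocol address length
        op,         # 1 = request, 2 = reply
        sender_mac, socket.inet_aton(sender_ip),
        target_mac, socket.inet_aton(target_ip),
    )
    return eth_header + arp_body


# Hypothetical MAC addresses standing in for Host A and Host B.
host_a_mac = mac_bytes("00-00-00-00-01-01")
host_b_mac = mac_bytes("00-00-00-00-02-02")

# Step a: broadcast request from Host A, target MAC all zeros.
request = build_arp_frame(OP_REQUEST, BROADCAST_MAC,
                          host_a_mac, "10.10.10.1", ZERO_MAC, "10.10.10.2")
# Step c: unicast reply from Host B back to Host A.
reply = build_arp_frame(OP_REPLY, host_a_mac,
                        host_b_mac, "10.10.10.2", host_a_mac, "10.10.10.1")
```

The request is addressed to the broadcast MAC address at the Ethernet layer, whereas the reply is addressed to Host A only, which matches the first point of the NOTE above.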
7.2.2.2 Dynamic ARP
Definition
Dynamic Address Resolution Protocol (ARP) means that devices dynamically learn and
update the mapping between IP addresses and MAC addresses by using ARP messages. You
do not need to manually configure the mapping.
Related Concepts
l Dynamic ARP aging mechanism
The dynamic ARP aging mechanism enables an ARP entry that is not used in a specified
period to be automatically deleted. By deleting seldom-used ARP entries, the dynamic
ARP aging mechanism reduces the storage space that ARP tables occupy and speeds up
ARP table queries.
Table 7-6 describes concepts related to the dynamic ARP aging mechanism.

Table 7-6 Concepts related to the dynamic ARP aging mechanism

Concept: Aging probe mode
Description: Before a dynamic ARP entry on a device is aged, the device sends
ARP aging probe messages to the other devices in the same network segment. An
ARP aging probe message can be a unicast or broadcast message. By default, a
device sends the last ARP aging probe message in broadcast mode and the
remaining ARP aging probe messages in unicast mode.
Usage Scenario:
l If the IP address of the peer device remains unchanged but its MAC address
changes frequently, it is recommended that you configure ARP aging probe
messages to be broadcast.
l If the MAC address of the peer device remains unchanged, network bandwidth
resources are insufficient, and the aging time of ARP entries is set to a
small value, it is recommended that you configure ARP aging probe messages to
be unicast.

Concept: Aging time
Description: A dynamic ARP entry has a life cycle. If a dynamic ARP entry is
not updated before its life cycle ends, this dynamic ARP entry is deleted from
the ARP table. The life cycle is called aging time.
Usage Scenario: Two interconnected devices can learn the mapping between their
respective IP and MAC addresses using ARP and can save the mapping in their
ARP tables. Then, the two devices can communicate by using the ARP entries.
When the peer device becomes faulty, or the network adapter of the peer device
is replaced but the local device does not receive any status change
information about the peer device, the local device continues sending IP
datagrams to the peer device. As a result, network traffic is interrupted
because the ARP table of the local device is not updated in time. To reduce
the risk of network traffic interruption, an aging timer can be set for each
ARP entry. After the aging timer of a dynamic ARP entry expires, the entry is
automatically deleted.

Concept: Number of aging probe attempts
Description: Before a dynamic ARP entry is aged, the local device sends ARP
aging probe messages to the peer device. If the local device does not receive
an ARP Reply message after the number of aging probe attempts reaches the
specified number, the dynamic ARP entry is deleted.
Usage Scenario: The ARP aging timer can help reduce the risk of network
traffic interruptions that occur because an ARP table is not updated quickly
enough, but cannot eliminate problems due to delays. Specifically, if the
length of a dynamic ARP entry aging timer is N seconds, the local device can
detect the status change of the peer device after N seconds. During the N
seconds, the ARP table of the local device is not updated. If the number of
aging probe attempts is specified, the local device can obtain the status
change information about the peer device and update its ARP table.
Enhanced Functions
Layer 2 topology probe
With the Layer 2 topology probe function, the aging time of all ARP entries corresponding to
the VLAN to which a Layer 2 interface belongs is set to 0 when the status of the Layer 2
interface changes from Down to Up. Then, the device resends ARP probe messages to update
all the ARP entries.
If a non-Huawei device is interconnected with a Huawei device, the non-Huawei device does
not respond to an ARP aging probe message with the destination MAC address as the
broadcast MAC address if the ARP table of the non-Huawei device contains the mapping
between the IP address and MAC address of the Huawei device. Then, the Huawei device
considers that the link to the non-Huawei device is in the Down state and deletes the mapping
between the IP address and MAC address of the non-Huawei device. Therefore, if a non-
Huawei device is interconnected with a Huawei device, configure the Huawei device to
unicast ARP aging probe messages to the non-Huawei device.
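The Layer 2 topology probe behavior can be pictured with a small sketch. It assumes a simple in-memory list of dynamic ARP entries. The names `DynamicArpEntry` and `on_l2_interface_up` and the aging value of 1200 seconds are hypothetical and serve only to illustrate the rule that entries in the affected VLAN have their aging time set to 0 and are then probed again.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DynamicArpEntry:
    ip: str
    vlan: int
    remaining_aging_s: int  # seconds left before the entry is aged


def on_l2_interface_up(entries: List[DynamicArpEntry], interface_vlan: int) -> List[str]:
    """Handle a Down-to-Up change of a Layer 2 interface in interface_vlan.

    Sets the aging time of every entry in that VLAN to 0 and returns the
    IP addresses that must be probed again.
    """
    to_probe = []
    for entry in entries:
        if entry.vlan == interface_vlan:
            entry.remaining_aging_s = 0
            to_probe.append(entry.ip)
    return to_probe


entries = [
    DynamicArpEntry("10.1.1.2", vlan=10, remaining_aging_s=1200),
    DynamicArpEntry("10.1.2.2", vlan=20, remaining_aging_s=1200),
]
probed = on_l2_interface_up(entries, interface_vlan=10)
```

Only the entry in VLAN 10 is reset and probed. The entry in VLAN 20 keeps its remaining aging time.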
Implementation
Devices dynamically learn and update the mapping between IP addresses and MAC addresses
by using ARP messages. The process involves the creation, update, and aging of dynamic
ARP entries.
l Creating and updating dynamic ARP entries
If an ARP message received by a device meets any of the following conditions, the
system automatically creates or updates the corresponding ARP entry:
– The source IP address of the ARP message is in the same network segment as the IP
addresses of inbound interfaces. The destination IP address of the ARP message is
the IP address of the interface on the device.
– The source IP address of the ARP message is in the same network segment as the IP
addresses of inbound interfaces. The destination IP address of the ARP message is
the virtual IP address of the Virtual Router Redundancy Protocol (VRRP) backup
group configured on the interface on the device.
– The source IP address of the ARP message is in the same network segment as the IP
addresses of inbound interfaces, which are virtual Ethernet interfaces applied in the
IP over Ethernet over AAL5 (IPoEoA) service.
l Aging dynamic ARP entries
After the aging timer of a dynamic ARP entry on a device expires, the device sends ARP
aging probe messages to the peer device. If the device does not receive an ARP Reply
message after the number of aging probe attempts reaches the specified number, the
dynamic ARP entry is aged.
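The aging procedure can be sketched as a short simulation. It uses the default probe mode described in Table 7-6 (unicast probes, with the last probe sent in broadcast mode). The function names and the "refreshed" outcome label are assumptions made for illustration. The source only states that the entry is aged when no ARP Reply message arrives.

```python
from typing import List, Optional, Tuple


def probe_modes(attempts: int) -> List[str]:
    """Default behavior: every probe is unicast except the last, which is broadcast."""
    if attempts <= 0:
        return []
    return ["unicast"] * (attempts - 1) + ["broadcast"]


def age_entry(attempts: int, reply_after: Optional[int]) -> Tuple[str, List[str]]:
    """Simulate what happens once the aging timer of a dynamic entry expires.

    reply_after is the 1-based number of the probe that the peer answers,
    or None if the peer never answers.
    Returns the outcome and the list of probe modes actually sent.
    """
    sent: List[str] = []
    for number, mode in enumerate(probe_modes(attempts), start=1):
        sent.append(mode)
        if reply_after is not None and number == reply_after:
            return "refreshed", sent
    return "deleted", sent
```

For example, with three configured attempts and a silent peer, the device sends two unicast probes and one broadcast probe and then deletes the entry.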
Usage Scenarios
Dynamic ARP is applicable to a network with a complex topology, sufficient bandwidth
resources, and a high requirement for real-time communication.
Benefits
Dynamic ARP entries are dynamically created and updated using ARP messages. They do not
need to be manually maintained, greatly reducing maintenance workload.
7.2.2.3 Static ARP
Definition
Static Address Resolution Protocol (ARP) means that the mapping between IP addresses and
MAC addresses is manually created by a network administrator.
Principles
Static ARP and dynamic ARP differ in ARP entry creation and maintenance methods.
Dynamic ARP entries are automatically created and maintained using ARP messages,
whereas static ARP entries are manually configured and maintained by a network
administrator. The advantages and disadvantages of dynamic ARP and static ARP are as
follows:
l Dynamic ARP
– Advantages: Dynamic ARP entries do not need to be manually configured and
maintained. When a device becomes faulty or the network adapter on a host is
frequently replaced, the ARP entry can be updated in real time. Maintenance
workload is greatly reduced.
– Disadvantages:
n Dynamic ARP entries can be aged and overwritten by new dynamic ARP
entries. This affects the stability and security of network communications.
n The execution of dynamic ARP consumes some network resources, which may
affect user services. Therefore, dynamic ARP is not applicable to a network
with insufficient bandwidth resources.
l Static ARP
– Advantages:
n Static ARP entries are neither aged nor overwritten by dynamic ARP entries.
This ensures the reliability of network communications.
n Static ARP configuration binds IP addresses and MAC addresses. This
prevents network attackers from modifying ARP entries by using ARP
messages, ensuring the security of network communications.
n Static ARP configuration replaces dynamic ARP execution, reducing the
network resource consumption.
– Disadvantages: A network administrator must manually configure static ARP
entries. Therefore, maintenance workload is heavy if the network structure
frequently changes.
To ensure the stability and security of network communications, you can deploy static ARP
based on actual requirements and network resources.
Static ARP can implement the following functions:
l Certain IP addresses can be bound to the MAC address of a specified gateway by
configuring static ARP entries. Then, IP datagrams destined for these IP addresses must
be forwarded by the specified gateway.
l The destination IP addresses of certain IP datagrams sent by a specified host can be
bound to a nonexistent MAC address, helping filter out unnecessary IP datagrams.
Related Concepts
Static ARP entries are classified into short and long entries.
l Short static ARP entries
Short static ARP entries cannot be used to forward messages directly. The
device first sends ARP Request messages. If the source IP and MAC addresses of
a received ARP Reply message are the same as the configured IP and MAC
addresses, the interface that receives the ARP Reply message is added to the
static ARP entry. The device can then use this interface to forward messages
directly.
NOTE
If a MAC address is configured for multiple interfaces, the short static ARP entry in which the
MAC address exists cannot be updated by users.
l Long static ARP entries
When configuring long static ARP entries, configure IP and MAC addresses as well as
the VLAN and outbound interface through which devices send messages based on the
ARP entries. Long static ARP entries are used to forward messages directly.
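The difference between short and long static ARP entries can be sketched with one small data structure. The class `StaticArpEntry`, its methods, the MAC string, and the interface name `GE0/2/0` as used here are illustrative assumptions. The sketch only models the rule that a short entry gains its outbound interface from a matching ARP Reply message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StaticArpEntry:
    ip: str
    mac: str
    vlan: Optional[int] = None        # configured only for long entries
    interface: Optional[str] = None   # configured for long entries, learned for short ones

    def can_forward(self) -> bool:
        """An entry can forward directly once its outbound interface is known."""
        return self.interface is not None

    def on_arp_reply(self, src_ip: str, src_mac: str, in_interface: str) -> bool:
        """Complete a short entry when a reply matches the configured IP and MAC."""
        if src_ip == self.ip and src_mac == self.mac:
            self.interface = in_interface
            return True
        return False


# Long entry: IP, MAC, VLAN, and outbound interface are all configured.
long_entry = StaticArpEntry("10.1.1.2", "0001-0001-0001", vlan=10, interface="GE0/2/0")

# Short entry: only the IP and MAC addresses are configured.
short_entry = StaticArpEntry("10.1.1.3", "0002-0002-0002")
usable_before = short_entry.can_forward()
mismatch = short_entry.on_arp_reply("10.1.1.3", "0009-0009-0009", "GE0/2/0")
match = short_entry.on_arp_reply("10.1.1.3", "0002-0002-0002", "GE0/2/0")
```

A reply whose MAC address differs from the configured one is ignored, so the binding set by the administrator is preserved while the access interface is still learned dynamically.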
Usage Scenarios
l Static ARP is applicable to a network with a simple topology and high stability.
l Static ARP is applicable to a network where information security is of high priority, such
as a governmental network or military network.
NOTE
Short static ARP entries mainly apply to a scenario in which network administrators want to bind users'
IP and MAC addresses but users' access interfaces can change.
Benefits
Static ARP ensures the communication security. If a static ARP entry is configured on a
device, the device can communicate with the peer device using only the specified MAC
address. Network attackers cannot modify the mapping between the IP address and MAC
address by using ARP messages, ensuring normal communications between the two devices.

7.2.2.4 ARP Automatic Scanning and Fixed ARP
Background
Static ARP protects a network against ARP spoofing attacks. However, network
administrators must configure static ARP entries, which can be time-consuming and
laborious, and errors may occur during the configuration. ARP automatic scanning and fixed
ARP solve this problem, while ensuring reliable and secure network operations.
Related Concepts
ARP automatic scanning: A device automatically sends ARP Request packets to all its
neighbor devices on a local area network (LAN) to obtain the MAC addresses of the neighbor
devices and generate dynamic ARP entries.
Fixed ARP: The device converts the generated dynamic ARP entries to static ARP entries.
ARP automatic scanning is generally used with fixed ARP. A device uses ARP automatic
scanning to generate dynamic ARP entries and uses fixed ARP to convert these dynamic ARP
entries to static ARP entries. These features prevent network attackers from modifying ARP
entries to attack the network.
Implementation
Figure 7-8 shows a network that implements ARP automatic scanning and fixed ARP.
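Figure 7-8 ARP automatic scanning and fixed ARP
[Figure: The PE connects the LAN to the Internet. CE1 and CE2 connect the
hosts to the PE. The PE sends ARP Scan packets to the hosts, and the hosts
return ARP Reply packets.]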
Host A, Host B, Host C, and Host D on a LAN communicate with the Internet through a
provider edge (PE) on the network shown in Figure 7-8. The implementation of ARP
automatic scanning and fixed ARP is as follows:

1. After the PE is configured with ARP automatic scanning, the PE sends ARP Request
packets to each host to learn their MAC addresses and generate dynamic ARP entries.
2. After the PE is configured with fixed ARP, the PE converts the generated dynamic ARP
entries to static ARP entries.
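The two steps above can be sketched as a scan followed by a conversion. This is a simplified model: the functions `arp_scan` and `fix_arp`, the idea of probing every host address in a subnet, and the `responders` dictionary that stands in for hosts answering ARP Request packets are all assumptions made for illustration.

```python
import ipaddress
from typing import Callable, Dict, Optional


def arp_scan(network: str, probe: Callable[[str], Optional[str]]) -> Dict[str, dict]:
    """Probe each host address in the network and record answers as dynamic entries.

    probe(ip) returns the MAC address from the ARP Reply packet, or None.
    """
    table: Dict[str, dict] = {}
    for host in ipaddress.ip_network(network).hosts():
        mac = probe(str(host))
        if mac is not None:
            table[str(host)] = {"mac": mac, "type": "dynamic"}
    return table


def fix_arp(table: Dict[str, dict]) -> Dict[str, dict]:
    """Convert every dynamic entry to a static entry (fixed ARP)."""
    for entry in table.values():
        if entry["type"] == "dynamic":
            entry["type"] = "static"
    return table


# Hypothetical hosts that answer the scan.
responders = {"10.1.1.2": "0001-0001-0001", "10.1.1.5": "0002-0002-0002"}

scanned = arp_scan("10.1.1.0/29", responders.get)
types_after_scan = sorted(e["type"] for e in scanned.values())
fixed = fix_arp(scanned)
```

After `fix_arp` runs, the learned mappings can no longer be overwritten by later ARP messages, which is the property that protects against ARP spoofing.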
Usage Scenario
ARP automatic scanning and fixed ARP apply to small-sized LANs.
Benefits
ARP automatic scanning and fixed ARP rapidly configure static ARP entries to maintain
reliable and secure network communications.
7.2.2.5 Gratuitous ARP
Principles
To ensure the stability and reliability of network communication, a device can broadcast
gratuitous Address Resolution Protocol (ARP) messages to notify the other devices in the
same network segment of its address information in the following scenarios:
l You need to check whether the IP address of a device conflicts with the IP address of
another device in the same network segment. The IP address of each device must be
unique to ensure the stability of network communication.
l When the MAC address of a host changes because its network adapter is replaced, the
host must quickly notify other devices in the same network segment of the MAC address
change before the ARP entry is aged. This ensures the reliability of network
communication.
l When a master/backup switchover occurs in a Virtual Router Redundancy Protocol
(VRRP) backup group, the new master router needs to notify other devices in the same
network segment of its status change.
Related Concepts
Gratuitous ARP message
A gratuitous ARP message is a special ARP message. The source and destination IP addresses
in a gratuitous ARP message are the IP addresses of the sender.
Implementation
l If a device finds that the source IP address in a received gratuitous ARP message is the
same as its own IP address, the device sends a gratuitous ARP message to notify the
sender of the address conflict.
l If a device finds that the source IP address in a received gratuitous ARP message is
different from its own IP address, the device maintains the corresponding ARP entry
based on the information (such as the sender's IP address and MAC address) carried in
the gratuitous ARP message.
Figure 7-9 shows how gratuitous ARP is implemented.

Figure 7-9 Gratuitous ARP implementation
[Figure: PE1 and PE2 are connected through CE. An ARP Request message is sent
first, and gratuitous ARP messages are then exchanged repeatedly between the
two devices.]
As shown in Figure 7-9, the IP address of port 1 on PE1 is 10.1.1.1, and the IP address of port
2 on PE2 is 10.1.1.1.
1. Port 1 broadcasts an ARP Request message. Port 2 receives the ARP Request message
and finds that the source IP address in the message conflicts with its own IP address.
Then, port 2 performs the following operations:
a. Port 2 sends a gratuitous ARP message to notify port 1 of its IP address.
b. A conflict node is generated on the conflict link of port 2. Then, port 2 sends
gratuitous ARP messages to port 1 at an interval of 5 seconds.
2. Port 1 receives the gratuitous ARP messages from port 2 and finds that the source IP
address in the message conflicts with its own IP address. Then, port 1 performs the
following operations:
a. Port 1 sends a gratuitous ARP message to notify port 2 of its IP address.
b. A conflict node is generated on the conflict link of port 1. Then, port 1 sends
gratuitous ARP messages to port 2 at an interval of 5 seconds.
Port 1 and port 2 send gratuitous ARP messages to each other at an interval of 5 seconds until
the address conflict is rectified.
If one port does not receive a gratuitous ARP message from the other port within 8 seconds,
the port considers that the address conflict has been rectified. The port deletes the conflict
node on its conflict link and stops sending gratuitous ARP messages to the other port.
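The conflict handling above can be sketched as a small state holder per port. The 5-second sending interval and the 8-second clearing timeout come from the text. The class names, method names, and the use of a plain number for the current time are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional

SEND_INTERVAL_S = 5   # interval between gratuitous ARP messages during a conflict
CLEAR_TIMEOUT_S = 8   # silence after which the conflict is considered rectified


@dataclass
class ConflictNode:
    last_heard_s: float  # time at which the conflicting message was last received


class Port:
    def __init__(self, ip: str) -> None:
        self.ip = ip
        self.conflict: Optional[ConflictNode] = None

    def on_arp(self, sender_ip: str, now_s: float) -> bool:
        """Process a received ARP message. Returns True if it reveals a conflict."""
        if sender_ip != self.ip:
            return False
        # Create or refresh the conflict node.
        self.conflict = ConflictNode(last_heard_s=now_s)
        return True

    def should_send_gratuitous(self, now_s: float) -> bool:
        """Called every SEND_INTERVAL_S seconds while a conflict node may exist."""
        if self.conflict is None:
            return False
        if now_s - self.conflict.last_heard_s >= CLEAR_TIMEOUT_S:
            self.conflict = None  # conflict rectified: delete the node, stop sending
            return False
        return True


port2 = Port("10.1.1.1")
detected = port2.on_arp("10.1.1.1", now_s=0)
send_at_5 = port2.should_send_gratuitous(now_s=5)
send_at_10 = port2.should_send_gratuitous(now_s=10)
```

As long as the peer keeps sending messages with the conflicting address, `on_arp` refreshes the conflict node and the port keeps answering. Once the peer is silent for the timeout, the node is removed.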
Functions
Gratuitous ARP implements the following functions:
l To check for IP address conflicts on the network, send a gratuitous ARP message from
a device. If the device receives a gratuitous ARP message from another device, the IP
addresses of the two devices conflict.
l When the MAC address of a host changes because its network adapter is replaced, the
host sends a gratuitous ARP message to notify other devices of the MAC address change
before the ARP entry is aged. This ensures the reliability of network communication.
After receiving the gratuitous ARP message, other devices maintain the corresponding
ARP entry in their ARP tables based on the address information carried in the message.
l When a master/backup switchover occurs in the VRRP backup group, the new master
router sends a gratuitous ARP message to notify other devices on the network of its
status change.
Benefits
Gratuitous ARP reveals address conflicts on a network and allows the ARP tables of
devices to be quickly updated. This ensures the stability and reliability of network
communication.
7.2.2.6 Proxy ARP
Principles
The Address Resolution Protocol (ARP) is applicable only to devices on the same physical
network. When a device on a physical network needs to send IP datagrams to another physical
network, the gateway needs to query the routing table to implement communication between
the two networks. However, routing table query consumes system resources and can affect
other services. To resolve the problem, you can deploy proxy ARP on an intermediary device.
The proxy ARP feature helps reduce system resource consumption caused by routing table
queries and improve the efficiency of system processing.
Implementation
l Routed proxy ARP
A large network of a company is usually divided into multiple subnets to facilitate
management. The routing information of a host in a subnet can be modified so that IP
datagrams sent from this host to another subnet are first sent to the gateway and then
to the other subnet. With this solution, however, devices are hard to manage and
maintain. Deploying proxy ARP on the gateway effectively resolves the management and
maintenance problems caused by network division.
Figure 7-10 shows how proxy ARP is implemented using the communication between
Host A and Host B as an example.

Figure 7-10 Typical networking diagram for routed proxy ARP
[Figure: Host A and Host B are connected through PE. Port1 on PE has IP
address 10.10.10.2/24 and MAC address 2-2-2. Arrows numbered 1 to 3 mark the
three messages listed below.]
Message               Destination IP address   Destination MAC address
ARP Request message   10.10.11.1               FF-FF-FF
ARP Reply message     10.10.10.1               1-1-1
IP datagram           10.10.11.1               2-2-2
a. Host A sends an ARP Request message to request the MAC address of Host B.
b. After receiving the ARP Request message, PE checks the destination IP address of
the message and finds that the requested MAC address is not its MAC address. PE
then checks whether there are routes to Host B.
n If there are routes to Host B, PE checks whether routed proxy ARP is enabled
on it.
○ If routed proxy ARP is enabled on PE, PE sends the MAC address of its
port 1 to Host A.
○ If routed proxy ARP is not enabled on PE, PE discards the ARP Request
message sent by Host A.
n If there are no routes to Host B, PE discards the ARP Request message sent by
Host A.
c. After learning the MAC address of port 1, Host A sends IP datagrams to PE based
on this MAC address.
After receiving the IP datagrams, PE forwards them to Host B.
l Proxy ARP within a VLAN
Figure 7-11 shows how proxy ARP is implemented within a VLAN by using the
communication between Host A and Host C as an example.

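Figure 7-11 Typical networking diagram for proxy ARP within a VLAN
[Figure: Host A, Host B, and Host C are connected to CE in VLAN 4. Interface
isolation is deployed on CE. VLANIF 4 on CE has IP address 10.10.10.4/24 and
MAC address 4-4-4. Arrows numbered 1 to 3 mark the three messages listed
below.]
Message               Destination IP address   Destination MAC address
ARP Request message   10.10.10.3               FF-FF-FF
ARP Reply message     10.10.10.1               1-1-1
IP datagram           10.10.10.3               4-4-4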
Host A, Host B, and Host C belong to the same VLAN. Port isolation is configured on
CE. Therefore, Host A and Host C cannot communicate at Layer 2. You can configure a
VLANIF interface on CE and enable proxy ARP within a VLAN to implement
communication between Host A and Host C.
a. Host A sends an ARP Request message to request the MAC address of Host C.
b. After receiving the ARP Request message, CE checks the destination IP address of
the message and finds that the requested MAC address is not the MAC address of
its VLANIF 4. Then, CE searches its ARP table for the ARP entry indicating the
mapping between the IP address and MAC address of Host C.
n If CE finds this ARP entry in its ARP table, CE checks whether proxy ARP
within a VLAN is enabled on it.
○ If proxy ARP within a VLAN is enabled on CE, CE sends the MAC
address of its VLANIF 4 to Host A.
○ If proxy ARP within a VLAN is not enabled on CE, CE discards the ARP
Request message sent by Host A.
n If CE does not find this ARP entry in its ARP table, CE discards the ARP
Request message sent by Host A and checks whether proxy ARP within a
VLAN is enabled on it.
○ If proxy ARP within a VLAN is enabled on CE, CE sends the ARP
Request message to Host C. After CE receives an ARP Reply message
from Host C, an ARP entry indicating the mapping between the IP
address and MAC address of Host C is generated in the ARP table.
○ If proxy ARP within a VLAN is not enabled on CE, CE does not perform
any operations.
c. After learning the MAC address of VLANIF 4, Host A sends IP datagrams to CE
based on this MAC address.
After receiving the IP datagrams, CE forwards them to Host C.
l Proxy ARP between VLANs
Figure 7-12 shows how proxy ARP is implemented between VLANs by using the
communication between Host A and Host B as an example.
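Figure 7-12 Typical networking diagram for proxy ARP between VLANs
[Figure: Host A in sub-VLAN 3 and Host B in sub-VLAN 2 are connected to PE.
Both sub-VLANs belong to super-VLAN 4. VLANIF 4 on PE has IP address
10.10.10.3/24 and MAC address 3-3-3. Arrows numbered 1 to 3 mark the three
messages listed below.]
Message               Destination IP address   Destination MAC address
ARP Request message   10.10.10.2               FF-FF-FF
ARP Reply message     10.10.10.1               1-1-1
IP datagram           10.10.10.2               3-3-3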
Host A belongs to VLAN 3, whereas Host B belongs to VLAN 2. Therefore, Host A
cannot communicate with Host B. You can configure a VLANIF interface on PE and
enable proxy ARP between VLANs on super-VLAN 4 to implement communication
between Host A and Host B.
a. Host A sends an ARP Request message to request the MAC address of Host B.
b. After receiving the ARP Request message, PE checks the destination IP address of
the message and finds that the requested MAC address is not the MAC address of
its VLANIF 4. Then, PE searches its ARP table for the ARP entry indicating the
mapping between the IP address and MAC address of Host B. The ARP entries
include dynamically learned and statically configured ARP entries.
n If PE finds this ARP entry in its ARP table, PE checks whether proxy ARP
between VLANs is enabled on it.
○ If proxy ARP between VLANs is enabled on PE, PE sends the MAC
address of its VLANIF 4 to Host A.
○ If proxy ARP between VLANs is not enabled on PE, PE discards the
ARP Request message sent by Host A.
n If PE does not find this ARP entry in its ARP table, PE discards the ARP
Request message sent by Host A and checks whether proxy ARP between
VLANs is enabled on it.
○ If proxy ARP between VLANs is enabled on PE, PE sends the ARP
Request message to Host B. After PE receives an ARP Reply message
from Host B, an ARP entry indicating the mapping between the IP
address and MAC address of Host B is generated in the ARP table.
○ If proxy ARP between VLANs is not enabled on PE, PE does not perform
any operations.
c. After learning the MAC address of VLANIF 4, Host A sends IP datagrams to PE
based on this MAC address.
After receiving the IP datagrams, PE forwards them to Host B.
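The decision steps for the three proxy ARP types can be summarized in two small functions. They apply once the device has found that the requested address is not its own. The function names and the action strings are illustrative assumptions, and the logic simply mirrors the branches listed in step b of each procedure.

```python
from typing import List


def routed_proxy_arp(has_route: bool, proxy_enabled: bool) -> str:
    """Routed proxy ARP: PE answers only if it has a route and the feature is enabled."""
    if has_route and proxy_enabled:
        return "reply with port MAC"
    return "discard"


def vlan_proxy_arp(entry_known: bool, proxy_enabled: bool) -> List[str]:
    """Proxy ARP within a VLAN or between VLANs.

    entry_known tells whether the ARP table already maps the target IP address.
    Returns the actions taken for one received ARP Request message, in order.
    """
    if entry_known:
        return ["reply with VLANIF MAC"] if proxy_enabled else ["discard"]
    # Unknown target: the request is discarded, and the device resolves the
    # target itself only when proxy ARP is enabled.
    actions = ["discard"]
    if proxy_enabled:
        actions.append("send ARP Request to target")
    return actions
```

In the unknown-target case, the first request from the source host goes unanswered. Once the device has learned the target's ARP entry, a later request from the source host takes the `entry_known` branch and is answered with the VLANIF MAC address.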
Usage Scenarios
Table 7-7 describes the usage scenarios for the three types of proxy ARP.
Table 7-7 Proxy ARP usage scenarios

Proxy ARP Type: Routed proxy ARP
Usage Scenario: Two hosts that need to communicate belong to the same network
segment but are located on different physical networks.

Proxy ARP Type: Proxy ARP within a VLAN
Usage Scenario: Two hosts that need to communicate belong to the same network
segment and to the same VLAN, in which user isolation is configured.

Proxy ARP Type: Proxy ARP between VLANs
Usage Scenario: Two hosts that need to communicate belong to the same network
segment but different VLANs.
NOTE
In the VLAN aggregation scenario, proxy ARP between VLANs can be enabled on
the VLANIF interface corresponding to the super-VLAN to implement
communication between sub-VLANs.

Benefits
l Proxy ARP lets a host on a network treat the destination host as if it were in the
same network segment. In this manner, the details of the physical network are
hidden, and the division of the network into subnets is transparent to hosts.
l All processing related to proxy ARP is performed on a gateway, with no configuration
needed on the hosts connected to it. In addition, proxy ARP affects only the ARP tables
on hosts and does not affect the ARP table and routing table on a gateway.
7.2.2.7 ARP-Ping
Principles
ARP-Ping includes ARP-Ping IP and ARP-Ping MAC, and is used to maintain a network on
which Layer 2 features are deployed (ARP refers to Address Resolution Protocol).
l ARP-Ping IP
Before configuring an IP address for a device, check whether the IP address is being
used by another device. Generally, the ping command can be used to check whether an
IP address is being used. However, if a firewall is configured for the device using the IP
address, and the firewall is configured not to respond to ping messages, you may
mistakenly believe that the IP address is not being used. To solve this problem, use the
ARP-Ping IP feature. ARP messages are Layer 2 protocol messages and, in most cases,
can pass through a firewall configured not to respond to ping messages.
l ARP-Ping MAC
The host's MAC address is the fixed address of the network adapter on the host. It does
not normally need to be configured manually; however, there are exceptions. For
example, if a device has multiple interfaces and the manufacturer does not specify MAC
addresses for these interfaces, the MAC addresses must be configured, or a virtual MAC
address must be configured for a Virtual Router Redundancy Protocol (VRRP) backup
group. Before configuring a MAC address, use the ARP-Ping MAC feature to check
whether the MAC address is being used by another device.
Related Concepts
l ARP-Ping IP
A device obtains the specified IP address and outbound interface number from the
configuration management plane, saves them to the buffer, constructs an ARP Request
message, and broadcasts the message on the outbound interface. If the device does not
receive an ARP Reply message within a specified period, the device displays a message
indicating that the IP address is not being used by another device. If the device
receives an ARP Reply message, it waits until the specified timeout expires and then
compares the source IP address in the ARP Reply message with the IP address stored in
the buffer. If the two IP addresses are the same, the device displays the source MAC
address in the ARP Reply message and displays a message indicating that the IP
address is being used by another device.
l ARP-Ping MAC
The ARP-Ping MAC process is similar to the ping process. The difference is that ARP-Ping
MAC is applicable only to directly connected Ethernet LANs or Layer 2 Ethernet virtual
private networks (VPNs). A device obtains the specified MAC address and outbound
interface number (optional) from the configuration management plane, constructs an
Internet Control Message Protocol (ICMP) Echo Request message, and broadcasts the
message on the outbound interface. If the device does not receive an ICMP Echo Reply
message within a specified period, the device displays a message indicating that the
MAC address is not being used by another device. If the device receives an ICMP Echo
Reply message within a specified period, the device compares the source MAC address
in the message with the MAC address stored on the device. If the two MAC addresses
are the same, the device displays the source IP address in the ICMP Echo Reply message
and displays a message indicating that the MAC address is being used by another device.
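The ARP Request message that ARP-Ping IP broadcasts can be sketched in Python as follows. This is only an illustration of the standard 28-byte ARP payload for IPv4 over Ethernet. The function name build_arp_request and the sender MAC address are assumptions made for the example and are not part of the device software.

```python
import ipaddress
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build the 28-byte ARP payload of a Request for target_ip (IPv4 over Ethernet)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                        # hardware type: Ethernet
        0x0800,                                   # protocol type: IPv4
        6,                                        # hardware address length
        4,                                        # protocol address length
        1,                                        # operation: 1 = Request
        sender_mac,                               # sender MAC address
        ipaddress.IPv4Address(sender_ip).packed,  # sender IP address
        b"\x00" * 6,                              # target MAC address: unknown
        ipaddress.IPv4Address(target_ip).packed,  # target IP address being probed
    )


# Hypothetical sender MAC; the IP addresses follow the 10.1.1.0/24 example.
payload = build_arp_request(bytes.fromhex("00e0fc123456"), "10.1.1.1", "10.1.1.2")
print(len(payload), payload.hex())
```

On the wire this payload is carried in an Ethernet frame addressed to the broadcast MAC address, which is why every host on the LAN sees the request.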
Implementation
l ARP-Ping IP implementation
Figure 7-13 ARP-Ping IP implementation

(Figure: ATN A, with interface GE0/2/0 at 10.1.1.1/24, and Host A at 10.1.1.2/32 are attached to Ethernet A. ATN A sends an ARP Request message, and Host A returns an ARP Reply message.)
As shown in Figure 7-13, ATN A can use ARP-Ping IP to check whether the IP address
10.1.1.2 is being used. ATN A receives an ARP Reply message from Host A, whose IP
address is 10.1.1.2. After the specified timeout expires, ATN A displays the MAC
address of Host A along with a message indicating that the IP address is in use by
another host.
The ARP-Ping IP implementation process is as follows:
a. After the IP address 10.1.1.2 is specified using a command line on ATN A, ATN A
broadcasts an ARP Request message and starts a timer for ARP Reply messages.
b. After receiving the ARP Request message, Host A on the same LAN finds that the
destination IP address in the message is the same as its own IP address and sends an
ARP Reply message to ATN A.
c. After ATN A receives the ARP Reply message and the specified timeout expires,
ATN A compares the source IP address in the ARP Reply message with the IP
address stored on ATN A.
n If the two IP addresses are the same, ATN A displays the source MAC address
in the message and displays a message indicating that the IP address is being
used by another host. Meanwhile, ATN A stops the timer for ARP Reply
messages.
n If the two IP addresses are different, ATN A discards the ARP Reply message
and displays a message indicating that the IP address is not being used by
another host.
If ATN A does not receive any ARP Reply messages before the ARP Reply
message timer expires, it displays a message indicating that the IP address is not
being used by another host.
l ARP-Ping MAC implementation
Figure 7-14 ARP-Ping MAC implementation

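(Figure: ATN A, with interface GE0/2/0 at 10.1.1.1/24, and Host A, with MAC address 0013-46E7-2EF5, are attached to Ethernet A. ATN A sends an ICMP Echo Request message, and Host A returns an ICMP Echo Reply message.)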
As shown in Figure 7-14, ATN A can use ARP-Ping MAC to check whether the MAC
address 0013-46E7-2EF5 is being used by another host. After receiving ICMP Echo
Reply messages from all the hosts on the network, ATN A displays the IP address of the
host with a MAC address of 0013-46E7-2EF5 and displays a message indicating that the
MAC address is being used by another host.
The ARP-Ping MAC implementation process is as follows:
a. After the MAC address 0013-46E7-2EF5 is specified using a command line on
ATN A, ATN A broadcasts an ICMP Echo Request message and starts a timer for
ICMP Echo Reply messages.
b. After receiving the ICMP Echo Request message, all the other hosts on the same
LAN send ICMP Echo Reply messages to ATN A.
c. If ATN A receives an ICMP Echo Reply message from a host, ATN A compares the
source MAC address in the message with the MAC address in the command line.
n If the two MAC addresses are the same, ATN A displays the source IP address
in the ICMP Echo Reply message and displays a message indicating that the
MAC address is being used by another host. Meanwhile, ATN A stops the
timer for ICMP Echo Reply messages.
n If the two MAC addresses are different, ATN A discards the ICMP Echo Reply
message and displays a message indicating that the MAC address is not being
used by another host.
If ATN A does not receive any ICMP Echo Reply messages before the ICMP Echo
Reply message timer expires, it displays a message indicating that the MAC address
is not being used by another host.
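The comparison steps of both procedures can be sketched as follows. This is a simplified model that takes the replies collected before the timer expires as plain tuples instead of reading them from a network. The function names and the returned message strings are assumptions for illustration and do not reproduce actual device output.

```python
def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and strip common separators."""
    return mac.lower().replace("-", "").replace(":", "").replace(".", "")


def arp_ping_ip(target_ip, replies):
    """replies: (source_ip, source_mac) pairs taken from ARP Reply messages."""
    for source_ip, source_mac in replies:
        if source_ip == target_ip:
            return f"IP address {target_ip} is used by {source_mac}"
    # No reply at all, or no reply with a matching source IP address.
    return f"IP address {target_ip} is not used"


def arp_ping_mac(target_mac, replies):
    """replies: (source_mac, source_ip) pairs taken from ICMP Echo Reply messages."""
    wanted = normalize_mac(target_mac)
    for source_mac, source_ip in replies:
        if normalize_mac(source_mac) == wanted:
            return f"MAC address {target_mac} is used by {source_ip}"
    return f"MAC address {target_mac} is not used"


print(arp_ping_ip("10.1.1.2", [("10.1.1.2", "0013-46e7-2ef5")]))
print(arp_ping_mac("0013-46E7-2EF5", [("0013-46e7-2ef5", "10.1.1.2")]))
```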

Usage Scenarios
ARP-Ping is applicable to directly connected Ethernet LANs or Layer 2 Ethernet VPNs.
Benefits
ARP-Ping checks whether an IP address or MAC address to be configured is being used by
another device, preventing address conflict.
7.2.2.8 IP Address Conflict Detection
Background
The occurrence of an IP address conflict causes route flapping and traffic interruptions,
affecting user services. IP address conflicts are often caused by incorrect networking or
configurations. Users expect that devices can automatically detect IP address conflicts on a
network and immediately notify users of conflict reasons, so that they can rapidly resolve
such conflicts and minimize impact on services.
IP address conflict detection helps users quickly locate and change conflicting IP
addresses, and guides users in properly configuring and managing the IP addresses of
devices on a network.
Implementation
IP address conflict detection can be classified into active and passive detection, and their
differences are as follows:
l Active detection
When the protocol status of an interface on a device changes to Up, the device actively
sends gratuitous ARP packets to detect possible IP address conflicts. For the detailed
detection procedure, see 7.2.2.5 Gratuitous ARP.
l Passive detection
When a device receives ARP packets that are not gratuitous ARP packets, it checks the
IP addresses carried by the ARP packets. The device concludes that IP address conflicts
exist on the network if any of the following conditions are met:
– The source IP address in an ARP packet is the same as the IP address of the
inbound interface that receives the ARP packet, but the source MAC address in the
ARP packet is different from the MAC address of the inbound interface.
– The source IP address in an ARP packet is the same as the IP address in an existing
ARP entry, but the source MAC address is different from the MAC address in the
ARP entry.
– The source IP address in an ARP packet is different from the CE IP address
configured on the inbound interface that connects to the CE, or the source MAC
address is different from the CE MAC address configured on the inbound interface
that connects to the CE.
– The source IP address in an ARP packet is 0.0.0.0 (probe ARP packet), the
destination IP address is the same as the IP address of the inbound interface that
receives the ARP packet, but the source MAC address in the ARP packet is
different from the MAC address of the inbound interface.
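The four passive-detection conditions can be sketched as a single check. The following Python fragment is a minimal model, not the device implementation: the ArpPacket and Interface types, the ce_binding field, and the returned reason strings are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ArpPacket:
    src_ip: str
    src_mac: str
    dst_ip: str


@dataclass
class Interface:
    ip: str
    mac: str
    ce_binding: Optional[Tuple[str, str]] = None  # (CE IP, CE MAC) if configured


def detect_conflict(pkt: ArpPacket, iface: Interface, arp_table: Dict[str, str]):
    """Return a reason string if the packet indicates a conflict, else None."""
    # Condition 1: someone else claims the interface's own IP address.
    if pkt.src_ip == iface.ip and pkt.src_mac != iface.mac:
        return "source IP equals interface IP"
    # Condition 2: the source IP is already known with a different MAC.
    known_mac = arp_table.get(pkt.src_ip)
    if known_mac is not None and known_mac != pkt.src_mac:
        return "source IP equals an ARP entry with another MAC"
    # Condition 3: the packet does not match the configured CE IP/MAC.
    if iface.ce_binding is not None and (pkt.src_ip, pkt.src_mac) != iface.ce_binding:
        return "packet does not match the configured CE IP/MAC"
    # Condition 4: probe ARP packet (source 0.0.0.0) for the interface IP.
    if pkt.src_ip == "0.0.0.0" and pkt.dst_ip == iface.ip and pkt.src_mac != iface.mac:
        return "probe for the interface IP from another MAC"
    return None


iface = Interface(ip="10.1.1.1", mac="00e0-fc00-0001")
print(detect_conflict(ArpPacket("10.1.1.1", "00e0-fc00-0002", "10.1.1.9"), iface, {}))
```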

Usage Scenario
IP address conflict detection is applicable to Ethernet LANs.
Benefits
IP address conflict detection helps users quickly locate and resolve IP address conflicts to
ensure stability and security of user services.
7.2.2.9 ARP Security
Principles
The Address Resolution Protocol (ARP) is simple and easy to implement. It is the basis for
Ethernet communication. However, ARP does not provide any security mechanisms.
Attackers can modify ARP entries by transmitting pseudo ARP messages to attack the
network. ARP attacks and ARP viruses pose a serious threat to LAN security. Network
devices must be able to utilize various technologies to effectively detect and avoid ARP
attacks.
ARP security ensures the security and robustness of network devices by filtering out untrusted
ARP messages and enabling timestamp suppression on certain ARP messages.
Related Concepts
ARP Miss message
An ARP Miss message is reported by a device to the upper-layer software when the device
fails to find a matched ARP entry for IP datagram forwarding. After receiving the ARP Miss
message, the upper-layer software generates a fake ARP entry and sends it to the device. The
upper-layer software then sends an ARP Request message to request the destination MAC
address. After receiving an ARP Reply message, the upper-layer software learns address
information in the message and sends the real ARP entry to the device to replace the fake
ARP entry. The device can then forward IP datagrams.
A dynamic fake ARP entry has an aging time.
l Before the aging time elapses, the device does not send further ARP Miss messages to
the upper-layer software.
l After the aging time elapses, the dynamic fake ARP entry is deleted. If the device still
cannot find the matched ARP entry, the device sends another ARP Miss message to the
upper-layer software.
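The effect of the fake ARP entry on ARP Miss reporting can be sketched as follows. This is a simplified model with an explicit clock value passed by the caller. The class name, method names, and returned strings are assumptions for illustration, and how packets are handled while a fake entry exists is not modeled.

```python
class ArpMissHandler:
    """Models ARP Miss reporting with a dynamic fake ARP entry per destination."""

    def __init__(self, fake_aging_time):
        self.fake_aging_time = fake_aging_time
        self.fake_expiry = {}   # destination IP -> time at which the fake entry ages out
        self.real = {}          # destination IP -> learned MAC address
        self.miss_reported = 0  # number of ARP Miss messages sent to upper-layer software

    def forward(self, dst_ip, now):
        if dst_ip in self.real:
            return "forwarded"
        expiry = self.fake_expiry.get(dst_ip)
        if expiry is not None and now < expiry:
            # A fake entry still exists, so no new ARP Miss message is reported.
            return "suppressed"
        # No entry, or the fake entry has aged out: report an ARP Miss message
        # and install a new fake entry.
        self.miss_reported += 1
        self.fake_expiry[dst_ip] = now + self.fake_aging_time
        return "ARP Miss reported"

    def on_arp_reply(self, ip, mac):
        # The real entry replaces the fake entry.
        self.fake_expiry.pop(ip, None)
        self.real[ip] = mac


handler = ArpMissHandler(fake_aging_time=5)
print(handler.forward("10.1.1.9", now=0), handler.forward("10.1.1.9", now=3))
```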
Implementation
Table 7-8 shows how ARP security is implemented.

Table 7-8 ARP security implementation

ARP Security Feature: ARP message suppression
Implementation: If a device receives multiple ARP messages with the same source IP
address in a specific period, the device notifies the source device that the ARP messages
have been received but does not update the ARP table.
NOTE
If ARP message suppression is enabled on all interfaces, the ARP entries of interfaces that do not
encounter ARP attacks cannot be correctly updated. ARP message suppression is applicable only to
VLANIF and Eth-Trunk interfaces. By default, ARP message suppression is always available to
VLANIF interfaces and can be enabled on Eth-Trunk interfaces based on requirements.
Usage Scenario: In the case of special networking or ARP attacks, a device may receive
multiple ARP messages with the same source IP address in a specific period.
Consequently, the device needs to repetitively update the ARP entry, which affects the
processing of other services.

ARP Security Feature: Strict ARP entry learning
Implementation: A device learns only address information carried in the ARP Reply
messages corresponding to the ARP Request messages sent by the device. The device
does not learn address information carried in the ARP Request messages sent from other
devices. Strict ARP entry learning prevents attacks from most ARP Request messages.
Usage Scenario: A network encounters a great many ARP attacks. Attackers send a large
number of fake ARP Request and Reply messages to attack network devices, resulting in
ARP entry overflow or ARP denial of service (DoS).

ARP Security Feature: Interface-based ARP entry restriction
Implementation: The number of ARP entries that an interface can learn is restricted,
effectively preventing ARP entry overflow and ensuring ARP entry security.
Usage Scenario: An unauthorized user sends a large number of ARP messages to a
device. This results in the device having to learn a large number of ARP entries in a short
period of time, causing ARP entry overflow. As a result, authorized users cannot use the
network as normal.

ARP Security Feature: Timestamp suppression on ARP messages
Implementation: A device counts received ARP messages. If the number of ARP
messages received in a specified period exceeds the threshold, the device does not
process excess ARP messages.
NOTE
Currently, timestamp suppression on ARP messages can be performed based only on source IP
addresses or destination IP addresses.
Usage Scenario: An unauthorized user sends a large number of ARP messages to a
device. Many resources are diverted into processing these ARP messages, and the
processing of other services is affected.

ARP Security Feature: Timestamp suppression on ARP Miss messages
Implementation: A device counts received ARP Miss messages. If the number of ARP
Miss messages received in a specified period exceeds the threshold, the device does not
process excess ARP Miss messages.
NOTE
Currently, timestamp suppression on ARP Miss messages can be performed based only on source
IP addresses.
Usage Scenario: Unauthorized users use specific tools to send a large number of ARP
messages to hosts in the local network segment or other network segments. Many ARP
Miss messages are generated because MAC addresses corresponding to the destination
IP addresses do not exist. Devices have to spend a lot of resources processing these ARP
Miss messages, and the processing of other services is affected.
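Timestamp suppression can be pictured as a per-source counter that is compared against a threshold within a period. The following Python sketch uses a fixed counting window per source IP address, which is an assumption: the text above states only that messages beyond the threshold in a specified period are not processed. The class and parameter names are hypothetical.

```python
from collections import defaultdict


class TimestampSuppressor:
    """Allows at most `threshold` messages per source IP address in each period."""

    def __init__(self, threshold, period):
        self.threshold = threshold
        self.period = period
        self.window_start = {}          # source IP -> start time of current window
        self.count = defaultdict(int)   # source IP -> messages seen in current window

    def allow(self, source_ip, now):
        start = self.window_start.get(source_ip)
        if start is None or now - start >= self.period:
            # Begin a new counting window for this source.
            self.window_start[source_ip] = now
            self.count[source_ip] = 0
        self.count[source_ip] += 1
        return self.count[source_ip] <= self.threshold


suppressor = TimestampSuppressor(threshold=2, period=1.0)
print([suppressor.allow("10.1.1.2", t) for t in (0.0, 0.1, 0.2)])
```

Counting per source keeps one misbehaving sender from exhausting the budget of other senders, which matches the source-IP-based granularity described in the notes above.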
Enhanced Functions
ARP security not only provides solutions to various attacks, but also sends alarms when
potential attack behaviors are encountered.
The sending of alarms for potential attack behaviors is an enhancement of timestamp
suppression on ARP messages and ARP Miss messages. If timestamp suppression and alarm
sending for potential attack behaviors are enabled, a device sends alarms and generates logs
for ARP messages discarded due to timestamp suppression. These logs include information
such as the source and destination IP addresses of the discarded ARP messages, virtual
private network (VPN) instances, and numbers of the physical interfaces receiving the ARP
messages.
Usage Scenarios
ARP security is deployed at the access layer and aggregation layer.
l After ARP security is deployed at the edge of the access layer, a device can learn only
address information carried in the ARP Reply messages corresponding to the ARP
Request messages sent by the device. This mechanism prevents attacks from most ARP
Request messages.
l After ARP security is deployed at the edge of the aggregation layer, many untrusted ARP
messages are filtered out and timestamp suppression is performed on certain ARP
messages. This mechanism ensures security and stability of core network devices.
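Strict ARP entry learning, as used at the access layer, can be sketched as follows. The model learns an address only from an ARP Reply message that answers a request the device itself sent. The class and method names are assumptions for illustration.

```python
class StrictArpLearner:
    """Learns ARP entries only from replies to the device's own requests."""

    def __init__(self):
        self.pending = set()   # IP addresses for which a request is outstanding
        self.table = {}        # learned IP address -> MAC address

    def send_request(self, target_ip):
        self.pending.add(target_ip)

    def receive(self, opcode, source_ip, source_mac):
        """Return True if the message caused an entry to be learned."""
        if opcode == "reply" and source_ip in self.pending:
            self.pending.discard(source_ip)
            self.table[source_ip] = source_mac
            return True
        # Requests from other devices and unsolicited replies are not learned.
        return False


learner = StrictArpLearner()
learner.send_request("10.1.1.2")
print(learner.receive("request", "10.1.1.7", "0000-0000-0007"))
print(learner.receive("reply", "10.1.1.2", "0013-46e7-2ef5"))
```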
Benefits
ARP security ensures the reliability of network communication, and the security and
robustness of network devices.
7.2.3 Applications

7.2.3.1 Application of Static ARP
As shown in Figure 7-15, the intranet of an organization communicates with the Internet by
using the gateway PE. You can deploy static Address Resolution Protocol (ARP) to prevent
network attackers from obtaining private information by modifying ARP entries on PE.
Figure 7-15 Typical networking diagram for static ARP

(Figure: networking that includes an attacker, the Internet, the gateway PE, and CE1 and CE2.)
l Before static ARP is deployed, PE dynamically learns and updates ARP entries using
ARP messages. However, dynamic ARP entries can be aged and overwritten by new
dynamic ARP entries. Therefore, network attackers can send fake ARP messages to
modify ARP entries on PE to obtain the private information of the organization.
l After static ARP is deployed, ARP entries on PE are manually configured and
maintained by a network administrator. Static ARP entries are neither aged nor
overwritten by dynamic ARP entries. Therefore, deploying static ARP can prevent
network attackers from sending pseudo ARP messages to modify ARP entries on PE,
and information security is ensured.
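The difference between the two cases can be sketched with a small ARP table model in which a static entry cannot be overwritten by a dynamically learned one. The class and method names, and the example addresses, are hypothetical.

```python
class ArpTable:
    """ARP table in which static entries take precedence over dynamic learning."""

    def __init__(self):
        self.entries = {}  # IP address -> (MAC address, "static" or "dynamic")

    def add_static(self, ip, mac):
        self.entries[ip] = (mac, "static")

    def learn_dynamic(self, ip, mac):
        """Return True if the entry was learned, False if a static entry blocked it."""
        current = self.entries.get(ip)
        if current is not None and current[1] == "static":
            return False
        self.entries[ip] = (mac, "dynamic")
        return True

    def lookup(self, ip):
        entry = self.entries.get(ip)
        return entry[0] if entry else None


table = ArpTable()
table.add_static("10.2.2.2", "00e0-fc00-0001")
# A forged ARP message tries to rebind the address and is ignored.
print(table.learn_dynamic("10.2.2.2", "dead-beef-0001"), table.lookup("10.2.2.2"))
```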
Feature Deployment
Deploy static ARP on PE to set up fixed mapping between IP addresses and MAC addresses
of hosts on the intranet. This can prevent network attackers from sending pseudo ARP
messages to modify ARP entries on PE, ensuring the stability and security of network
communication and minimizing the risk of private information being stolen.
7.2.3.2 Application of Proxy ARP Within a VLAN
As shown in Figure 7-16, to simplify management, communication isolation is
implemented for various departments on the intranet of a company. For example, although
Host A of the president's office, Host B of the R&D department, and Host C of the financial
department belong to the same VLAN, they cannot communicate at Layer 2. However, the
business requires that the president's office communicate with the financial department. To
permit this, you can enable proxy Address Resolution Protocol (ARP) within a VLAN on CE
so that Host A can communicate with Host C.
l Before proxy ARP within a VLAN is enabled, if Host A sends an ARP Request message
to request the MAC address of Host C, the message cannot be broadcast to hosts of the
R&D department and financial department due to interface isolation configured on CE.
Therefore, Host A can never learn the MAC address of Host C and cannot communicate
with Host C.
l After proxy ARP within a VLAN is enabled, CE does not discard an ARP Request
message sent from Host A although the destination IP address in the message is not the
IP address of CE. Instead, CE sends the MAC address of its VLANIF 4 to Host A. Then,
Host A sends IP datagrams to this MAC address.
Figure 7-16 Typical networking diagram for proxy ARP within a VLAN
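(Figure: Host A in the president's office, Host B in the R&D department, and Host C in the financial department belong to VLAN 4 and connect to CE, on which interface isolation is deployed. VLANIF 4 on CE has IP address 10.10.10.4/24 and MAC address 4-4-4. The figure traces the ARP Request message, the ARP Reply message, and the IP datagram in numbered steps.)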
Feature Deployment
Configure VLANIF 4, which is a Layer 3 interface, on CE, and enable proxy ARP within a
VLAN on VLANIF 4. After the deployment, CE sends the MAC address of its VLANIF 4 to
Host A when receiving a request for the MAC address of Host C from Host A. Host A then
sends IP datagrams to CE, which forwards the IP datagrams to Host C. Consequently, the
communication between Host A and Host C is implemented.
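The reply decision on CE can be sketched as follows. The VLANIF 4 address and MAC address come from Figure 7-16. The Host C address 10.10.10.3, the function name, and the check that the requested address lies in the VLANIF subnet are assumptions made for the example.

```python
import ipaddress


def proxy_arp_reply(target_ip, vlanif_ip, vlanif_network, vlanif_mac, proxy_enabled):
    """Return the MAC address CE answers with, or None if CE does not answer."""
    if target_ip == vlanif_ip:
        # Ordinary ARP: the request is for CE's own address.
        return vlanif_mac
    in_subnet = ipaddress.ip_address(target_ip) in ipaddress.ip_network(vlanif_network)
    if proxy_enabled and in_subnet:
        # Proxy ARP within a VLAN: CE answers on behalf of the isolated host.
        return vlanif_mac
    return None


# Host A asks for the MAC address of Host C (hypothetical address 10.10.10.3).
print(proxy_arp_reply("10.10.10.3", "10.10.10.4", "10.10.10.0/24", "4-4-4", True))
print(proxy_arp_reply("10.10.10.3", "10.10.10.4", "10.10.10.0/24", "4-4-4", False))
```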


Acronym & Abbreviation    Full Name
IPoEoA IP over Ethernet over AAL5
NAT network address translation
RARP Reverse Address Resolution Protocol
VE virtual Ethernet
7.3 ACL
7.3.1 Introduction to the ACL

Definition
An Access Control List (ACL) is a set of sequential rules. These rules are described based on
the source address, destination address, and port number of a packet. The ACL filters packets
according to the specified rules. With the rules applied to a device, the device permits or
denies the packets according to the rules. For example, you can set a rule in the ACL to
prevent any user terminal from logging in to a device through Telnet, or to allow every user
terminal to send emails to a device through the Simple Mail Transfer Protocol (SMTP).
NOTE
In this document, if an ACL function supports both IPv4 and IPv6, the implementation of this ACL
function is the same for IPv4 and IPv6 unless otherwise specified. For ACL function support for IPv4
and IPv6 and implementation differences between IPv4 and IPv6, see Appendix.
Purpose
ACLs are used to ensure reliable data transmission between devices on a network by
performing the following:
l Defend the network against various attacks, such as attacks by using IP, Transmission
Control Protocol (TCP), or Internet Control Message Protocol (ICMP) packets.
l Control network access. For example, ACLs can be used to control enterprise network
user access to external networks, to specify the specific network resources accessible to
users, and to define the time ranges in which users can access networks.
l Limit network traffic and improve network performance. For example, ACLs can be
used to limit the bandwidth for upstream and downstream traffic and to apply charging
rules to user-requested bandwidth, thereby achieving efficient utilization of network
resources.
Benefits
ACL rules are used to classify packets. After ACL rules are applied to a device, the device
permits or denies packets based on them. The use of ACL rules therefore greatly improves
network security.
NOTE
An ACL is a set of rules. It identifies a type of packet but does not filter packets. Other ACL-associated
functions are used to filter identified packets.
7.3.2 Principles
An ACL manages all rules configured by users and provides a rule-matching algorithm for
services. Services can then permit or deny packets according to the matched rule.
Management of ACL
As a group of rules, each ACL can store multiple rules. If adding an ACL group or rule
would exceed the supported number, the system displays a configuration failure message.
Rule Matching in an ACL

If an ACL exists and contains a rule whose matching conditions a packet meets, the packet
is considered to match the ACL, regardless of whether the rule permits or denies the packet.
If no ACL exists, the ACL contains no rules, or none of the rules in the ACL meets the
matching conditions, the packet is considered not to match the ACL.
Process of ACL Rule Matching

1. Check whether the user configures an ACL.
2. According to the ACL configuration:
– If the ACL exists, a packet needs to be checked according to ACL rules. If the
packet matches a rule, the ACL notifies the service of the behavior defined in the
rule, and the packet stops matching the remaining rules.
– If the ACL exists and the service matches only some of the options such as source
address, destination address, TCP source port and destination port, and ICMP
protocol, a packet must match all ACLs according to service requirements. The first
rule that the packet matches is notified to the service, and the packet does not
continue to match the remaining rules.
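The first-match behavior described above can be sketched as follows. This minimal Python model matches only on the source address and expresses it in CIDR form for brevity. The Rule type and the match_acl function are hypothetical names, not a device API.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: int
    action: str   # "permit" or "deny"
    source: str   # source network in CIDR form


def match_acl(rules, source_ip):
    """Return (rule_id, action) of the first matching rule, or None if no rule matches."""
    if not rules:
        return None   # no ACL, or an ACL without rules: the packet does not match
    address = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.rule_id):
        if address in ipaddress.ip_network(rule.source):
            return rule.rule_id, rule.action   # stop at the first match
    return None


rules = [Rule(10, "permit", "10.1.0.0/16"), Rule(5, "deny", "10.1.1.0/24")]
print(match_acl(rules, "10.1.1.7"), match_acl(rules, "10.1.2.7"))
```

The service that references the ACL decides what to do with the returned behavior and with packets for which no rule matches.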
Matching Order of Rules in an ACL

Rule display order determines matching order. That is, the system matches a packet against
the rules in an ACL starting from the first rule and stops at the first rule that the packet
matches. Therefore, rules at the front of an ACL are more likely to be matched.
The rule order is determined by two factors: rule ID and rule matching order.
There are two rule matching orders, namely, configuration order and automatic order.

l Configuration order indicates that ACL rules are matched according to their
configuration order. Users can configure rule IDs, or the system automatically generates
rule IDs according to an ACL step. The ACL step enables users to easily maintain or add
rules. For example, the ACL step is 5 by default. If a user does not configure a
rule ID, the system automatically generates rule ID 5 for the first rule. If the user
later wants to add a rule before rule 5, the user only needs to specify a rule ID
smaller than 5. After rearrangement, the new rule becomes the first rule.
l In automatic order, the system automatically assigns rule IDs and places the most
precise rule first according to the depth-first principle. Precision is determined by
comparing address wildcards: the smaller the wildcard, the smaller the specified
host range.
For example, 129.102.1.1 0.0.0.0 specifies the host at 129.102.1.1, and 129.102.1.1
0.0.0.255 specifies the network segment from 129.102.1.0 to 129.102.1.255. In
this case, the former rule, which specifies a smaller host range, is placed before the
latter in an ACL. The detailed standards are as follows:
– The clauses of basic ACL rules are ordered as follows:
n The clause carrying VPN instance information is ordered first.
n If the VPN instance information is the same, the clause with a smaller range of
source IP addresses is ordered first.
n If the ranges of source IP addresses are the same, the clause configured first is
ordered first.
– The any clauses of interface-based ACL rules are ordered last, and the other clauses
are ordered in the configuration sequence.
– The clauses of advanced ACL rules are ordered as follows:
n The clause carrying VPN instance information is ordered first.
n If the VPN instance information is the same, the clause with IPv4 protocols is
ordered first.
n If the protocol information is the same, the clause with a smaller range of
source IP addresses is ordered first.
n If the range of source IP addresses is the same, the clause with a smaller range
of destination IP addresses is ordered first.
n If the range of destination IP addresses is the same, the clause with a smaller
range of TCP/UDP port numbers is ordered first.
n If the ranges of TCP/UDP port numbers are the same, the clause configured
first is ordered first.
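Wildcard matching and the depth-first ordering of basic ACL rules can be sketched as follows. A wildcard bit of 1 means the corresponding address bit is ignored. The ordering function simplifies the VPN instance comparison to "rules with a VPN instance come first", and all function and field names are assumptions for illustration.

```python
import ipaddress


def wildcard_match(address, base, wildcard):
    """True if address equals base in every bit where the wildcard bit is 0."""
    a = int(ipaddress.IPv4Address(address))
    b = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)


def wildcard_size(wildcard):
    """Number of addresses covered by a wildcard (2 to the power of its 1 bits)."""
    return 2 ** bin(int(ipaddress.IPv4Address(wildcard))).count("1")


def order_basic_rules(rules):
    """Order rules depth first: VPN instance, then smaller range, then configuration order."""
    indexed = list(enumerate(rules))
    indexed.sort(key=lambda item: (item[1]["vpn"] is None,
                                   wildcard_size(item[1]["wildcard"]),
                                   item[0]))
    return [rule for _, rule in indexed]


rules = [
    {"name": "segment", "vpn": None, "wildcard": "0.0.0.255"},
    {"name": "host", "vpn": None, "wildcard": "0.0.0.0"},
]
print(wildcard_match("129.102.1.77", "129.102.1.1", "0.0.0.255"))
print([rule["name"] for rule in order_basic_rules(rules)])
```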
A rule is identified by a rule ID, which is configured by a user or generated by the system
according to the ACL step. All rules in an ACL are arranged in ascending order of rule IDs.
Automatically generated rule IDs are separated by a gap whose size depends on the ACL
step. For example, if the ACL step is 5, consecutive generated rule IDs differ by 5: 5, 10,
15, and so on. If the ACL step is 2, the rule IDs automatically generated by the system
start from 2. Because the first generated ID is not the smallest possible ID, the user can
add a rule before the first rule.
l Configuration order
– If rule IDs are not specified, the system automatically assigns rule IDs according to
the ACL step and the configuration order of rules. For example, the user configures
three rules without rule IDs. If the ACL step is 5, the system assigns rule IDs 5, 10,
and 15 to the three rules according to the configuration order.

– If rule IDs are specified, rules are arranged according to their rule IDs. For example,
rule IDs are 5, 10, and 15. If a rule ID, 3, is specified for a new ACL rule, the order
of the rules is 3, 5, 10, and 15. It can be considered that a new rule is added before
rule 5.
Therefore, in the case of the configuration order, the system performs rule matching
according to the configuration order of rules. In essence, the system performs rule
matching in ascending order of the rule IDs. In this manner, a new rule may be matched
earlier.
l Automatic order
In the case of the automatic order, the user cannot specify rule IDs. Instead, the system
automatically assigns rule IDs according to the principle of depth first. In addition, the
user cannot add a new rule. The rule that specifies a smaller packet range obtains a
smaller rule ID. The system performs rule matching in ascending order of rule IDs.
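Rule ID assignment in configuration order can be sketched as follows. The sketch assumes that a rule configured without an ID receives the next multiple of the ACL step above the highest existing ID, which reproduces the 5, 10, 15 example but is otherwise an assumption. Configuring an existing ID simply replaces the stored rule here, which simplifies the field-by-field modification that the device performs.

```python
class Acl:
    """Minimal model of rule ID handling in configuration order."""

    def __init__(self, step=5):
        self.step = step
        self.rules = {}   # rule ID -> rule text

    def add_rule(self, text, rule_id=None):
        if rule_id is None:
            highest = max(self.rules, default=0)
            # Next multiple of the step above the highest existing rule ID.
            rule_id = (highest // self.step + 1) * self.step
        self.rules[rule_id] = text   # an existing ID is treated as a modification
        return rule_id

    def ordered(self):
        """Rules in matching order, that is, ascending order of rule IDs."""
        return [(rule_id, self.rules[rule_id]) for rule_id in sorted(self.rules)]


acl = Acl(step=5)
for text in ("permit source 10.1.1.0 0.0.0.255", "deny source 10.1.0.0 0.0.255.255", "permit"):
    acl.add_rule(text)
acl.add_rule("deny source 10.1.1.1 0", rule_id=3)   # inserted before rule 5
print([rule_id for rule_id, _ in acl.ordered()])
```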
NOTE
When an ACL rule is configured:

l If the rule does not have an ID, the system checks whether the configured rule is the same as
any of the existing rules. If it is different from any of the existing rules, the rule is regarded as
a new one.
l If the rule has an ID that does not conflict with any of the existing rule IDs, the system checks
whether the configured rule is the same as any of the existing rules. If it is different from any
of the existing rules, the rule is regarded as a new one.
l If the rule has an ID that is the same as the ID of an existing rule, the configured rule is
regarded as a modification of the existing rule. Then, the system modifies the corresponding
fields in the existing rule according to the configured rule.
To restore the default value of a field in an ACL rule, use the undo command with a rule ID and
the field to be restored. For example, use the undo rule 5 source command to restore the default
value of the source field in ACL rule 5.
7.3.2.1 Differences Between ACL4 and ACL6

ACL4 and ACL6 are the same in the basic principle except for some slight differences. The
differences are listed in Table 7-9.
Table 7-9 Differences between ACL4 and ACL6

ACL4: MIBs are supported.
ACL6: MIBs are not supported.
7.3.3 Applications
Application of ACLs in Route Filtering
ACLs can be applied in various dynamic routing protocols to filter the advertised and
received routes.

Figure 7-17 Application of ACLs in route filtering

(Figure: ATN A, ATN B, ATN C, and ATN D run OSPF. ATN A connects to the Internet, and the routes 172.1.17.0/24, 172.1.18.0/24, 172.1.19.0/24, and 172.1.20.0/24 are shown at ATN A.)
As shown in Figure 7-17, in a network running the Open Shortest Path First (OSPF) protocol,
ATN A receives routes from the Internet, and provides part of the Internet routes for ATN B.
An ACL is configured on ATN A and applied in OSPF to control the advertisement and
receiving of routes.
l ATN A provides routes 172.1.17.0/24, 172.1.18.0/24, and 172.1.19.0/24 for ATN B.
l ATN C accepts only the route 172.1.18.0/24.
l ATN D accepts all the routes provided by ATN B.
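The filtering in this example can be sketched as follows. The sketch treats the ACL as a list of permitted prefixes that must match a route exactly, which is a simplification, and the function name filter_routes is hypothetical.

```python
import ipaddress


def filter_routes(routes, permitted):
    """Keep only the routes whose prefix is in the permitted list."""
    allowed = [ipaddress.ip_network(prefix) for prefix in permitted]
    return [route for route in routes if ipaddress.ip_network(route) in allowed]


internet_routes = ["172.1.17.0/24", "172.1.18.0/24", "172.1.19.0/24", "172.1.20.0/24"]

# Routes that ATN A provides for ATN B.
to_atn_b = filter_routes(internet_routes, ["172.1.17.0/24", "172.1.18.0/24", "172.1.19.0/24"])
# Routes that ATN C accepts.
at_atn_c = filter_routes(to_atn_b, ["172.1.18.0/24"])
print(to_atn_b, at_atn_c)
```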
Application of ACLs in QoS

ACLs can be used in Quality of Service (QoS) to process packets with certain attributes.

Figure 7-18 Application of ACLs in QoS
(Figure: Networks A, B, C, D, and E are connected through ATN A. Legend: packets from Network A; packets from Network A on which QoS processing is performed; packets from Network B, C, and D.)
As shown in Figure 7-18, an ACL is configured on ATN A to identify all packets from
Network A. Then, the ACL is applied to the QoS policy. In this manner, all the packets from
Network A are forwarded only after ATN A performs QoS processing. The packets from other
networks, however, are forwarded normally, because they do not match the ACL.

Terms
Term Description
Interface-based ACL: An interface-based ACL can define rules based on the interface that
receives packets.
Basic ACL: A basic ACL can define ACL rules based on only source addresses.
Advanced ACL: An advanced ACL can define ACL rules based on source addresses,
destination addresses, protocol type, and protocol-specific fields such as the TCP source or
destination port and the ICMP message type and code.
Ethernet frame header-based ACL: An Ethernet frame header-based ACL can define rules
to filter packets based on the source MAC address, destination MAC address, or protocol
type of Ethernet frames.
Abbreviations
ACL Access Control List
7.3.5 Appendix
ACL Type: Interface-based ACL
Support for IPv4: Yes
Support for IPv6: Yes
Implementation Difference: -

ACL Type: Basic ACL
Support for IPv4: Yes
Support for IPv6: Yes
Implementation Difference:
l Basic ACL4 supports the matching for non-fragments, first fragments, and subsequent
fragments. Basic ACL6 supports only subsequent fragments.
l Basic ACL4 supports the configuration of a wildcard (inverse mask) for IPv4
addresses. Basic ACL6 supports the configuration of a wildcard and mask length for
IPv6 addresses.

ACL Type: Advanced ACL
Support for IPv4: Yes
Support for IPv6: Yes
Implementation Difference:
l In comparison with advanced ACL4, advanced ACL6 supports IPv6, ICMPv6,
IPv6-AH, and IPv6-ESP, but not IP, ICMP, and IGMP.
l In comparison with advanced ACL4, advanced ACL6 does not support the TCP flag.
l Advanced ACL4 supports the configuration of a wildcard (inverse mask) for IPv4
addresses. Advanced ACL6 supports the configuration of a wildcard and mask length
for IPv6 addresses.
l Advanced ACL4 supports the matching for non-fragments, first fragments, and
subsequent fragments. Advanced ACL6 supports only subsequent fragments.
l The DSCP item of advanced ACL4 indicates the 6-bit DSCP field in an IPv4 packet as
defined in RFC 2474. The ToS item of advanced ACL6 indicates the leftmost six bits
of the TC field in an IPv6 packet.
l The Precedence item of advanced ACL4 indicates the high-order three bits of the ToS
field in an IPv4 packet as defined in RFC 791. The ToS item of advanced ACL6
indicates the leftmost three bits of the TC field in an IPv6 packet.
l The ToS item of advanced ACL4 indicates the 4-bit ToS field in an IPv4 packet as
defined in RFC 1349. The ToS item of advanced ACL6 indicates the leftmost four to
seven bits of the TC field in an IPv6 packet.

ACL Type: Ethernet frame header-based ACL
Support for IPv4: Yes
Support for IPv6: No
Implementation Difference: -

ACL Type: MPLS-based ACL
Support for IPv4: Yes
Support for IPv6: No
Implementation Difference: -
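The bit positions named for the DSCP, Precedence, and ToS items can be illustrated with a few shifts on the 8-bit IPv4 ToS octet (the same positions apply to the IPv6 TC octet). The function names are chosen for the example only.

```python
def dscp(octet):
    """Leftmost six bits of the octet (RFC 2474)."""
    return (octet >> 2) & 0x3F


def precedence(octet):
    """High-order three bits of the octet (RFC 791)."""
    return (octet >> 5) & 0x07


def tos_bits(octet):
    """The 4-bit ToS field that follows the precedence bits (RFC 1349)."""
    return (octet >> 1) & 0x0F


octet = 0xB8   # binary 1011 1000
print(dscp(octet), precedence(octet), tos_bits(octet))
```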

7.4 IPv4
7.4.1 Introduction to IPv4
Definition
At the core of the TCP/IP protocol suite, Internet Protocol Version 4 (IPv4) works at the
Internet layer in the TCP/IP model. This layer corresponds to the network layer in the OSI
model. At the IP layer, information is divided into data units, and address and control
information is added to allow datagrams to be routed.
IP provides unreliable and connectionless data transmission services. Unreliable transmission
means that IP does not ensure that IP datagrams successfully arrive at the destination. IP only
provides best-effort delivery. If an error occurs, for example, when a router exhausts its
buffer, IP discards the excess datagrams and sends ICMP messages to the source. The
upper-layer protocols, such as TCP, are responsible for resolving reliability issues.
Connectionless transmission means that IP does not maintain status information for
subsequent datagrams. Every datagram is processed independently, meaning that IP
datagrams may not be received in the same order they are sent. If a source sends two
consecutive datagrams A and B in sequence to the same destination, each datagram is
possibly routed over a different path to the destination, and therefore B may arrive ahead of
A.
Purpose
IPv4 shields the differences at the link layer and provides the upper layer with services based
on a uniform standard of transmission on the network layer.
7.4.2 Principles
7.4.2.1 Principle of TCP

The Transmission Control Protocol (TCP), defined in RFC 793, provides end-to-end,
connection-oriented, reliable, full-duplex services between hosts and supports multiple
network application programs. TCP assumes that the lower layer provides only an unreliable
datagram service, which is why TCP can run over networks composed of different types of
hardware.
Figure 7-19 shows the position of TCP in the hierarchical architecture. Below it is the IP
protocol. TCP transmits data of different sizes based on the services provided by IP. IP
fragments and reassembles data and then transmits packets on different networks.

Figure 7-19 Hierarchical architecture
(Network layers, from top to bottom: higher layer, TCP, IP, transport network)
In the OSI reference model, TCP works at the transport layer, between upper-layer application
programs and the lower-layer IP protocol.
TCP can transmit data to upper-layer application programs asynchronously. Assume that the
lower-layer interface is the IP protocol interface. To implement reliable, connection-oriented
data transmission over unreliable networks, TCP must provide the following:
l Reliability and flow control
l Multiple interfaces for upper-layer application programs
l Data for multiple application programs
l Connection
l Secure communication
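Figure 7-20 shows how a TCP connection is set up and terminated.
Figure 7-20 Establishment and closing of a TCP connection
(Sequence diagram between a client and a server, in three phases:
Establish a connection: Both sides invoke Socket. The server invokes bind/listen and then
blocks in accept. The client invokes Connect, which sends SYN. The server replies with
SYN|ACK, and the client replies with ACK. Connect and accept then return.
Transmit data: One side invokes Send while the peer is blocked in recv. When the data arrives,
recv returns. Data is acknowledged with ACK, which can be carried together with data in the
reverse direction (Data|ACK).
Close the connection: The client invokes Close, which sends FIN. The server replies with ACK,
and its blocked recv call returns 0. The server then invokes Close, which sends FIN, and the
client replies with ACK.)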
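The call sequence in Figure 7-20 can be reproduced with a minimal Python sketch that uses the standard socket module on the loopback address. This illustrates the generic socket calls, not the ATN implementation. The function names, the "ping"/"pong" payloads, and the use of port 0 (the operating system picks a free port) are assumptions made for the example.

```python
import socket
import threading


def run_server(listener: socket.socket, log: list) -> None:
    """Server side: accept, recv, send, then recv until the client closes."""
    conn, _peer = listener.accept()      # blocks until a connection is established
    with conn:
        request = conn.recv(1024)        # blocks until the client sends data
        log.append(("server received", request))
        conn.sendall(b"pong")            # reply to the client
        eof = conn.recv(1024)            # client has closed: recv returns b"" (length 0)
        log.append(("server eof", eof))
    # Leaving the with-block closes the server side of the connection.


def tcp_round_trip() -> list:
    """Run one connect / send / recv / close cycle and return the event log."""
    log: list = []
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: assumption, let the OS choose
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=run_server, args=(listener, log))
    server.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))  # triggers the SYN, SYN|ACK, ACK exchange
    client.sendall(b"ping")
    reply = client.recv(1024)            # blocks until the server replies
    log.append(("client received", reply))
    client.close()                       # client sends FIN
    server.join()
    listener.close()
    return log


if __name__ == "__main__":
    for event in tcp_round_trip():
        print(event)
```

The last log entry shows the behavior labeled "recv return 0" in the figure: after the peer closes, recv returns an empty byte string instead of blocking.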

7.4.2.2 Principle of UDP

The User Datagram Protocol (UDP) is a transport protocol used to exchange datagrams on a
network. UDP assumes IP as its lower-layer protocol and provides the simplest mechanism for
sending messages to another user program. UDP is transaction-oriented and does not guarantee
delivery or protect against duplication. If application programs require reliable data
transmission, TCP should be used. Figure 7-21 shows the format of a UDP datagram.
Figure 7-21 Format of a UDP datagram
(Header layout, 32 bits per row, followed by the user data)
Bits 0 to 15 Bits 16 to 31
Source port Destination port
Length Checksum
Data ...
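The header layout in Figure 7-21 maps directly onto four 16-bit fields in network byte order. The following Python sketch builds and parses such a header. The function names are hypothetical, and the checksum is left at 0 here rather than computed, which is a simplification of the example.

```python
import struct

# Source port, destination port, length, checksum: four unsigned 16-bit fields.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes,
                       checksum: int = 0) -> bytes:
    """Prepend an 8-byte UDP header; Length covers the header plus the data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram: bytes) -> dict:
    """Split a UDP datagram into its header fields and user data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "length": length,
        "checksum": checksum,
        "data": datagram[UDP_HEADER.size:length],
    }


datagram = build_udp_datagram(5000, 53, b"hello")
print(parse_udp_datagram(datagram))
```

Because the Length field includes the 8-byte header, the smallest valid value is 8.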
7.4.2.3 Principle of Raw IP

Raw IP fills in only a limited number of fields in the IP header itself, but it allows
application programs to provide their own IP headers. Like UDP, raw IP is unreliable, so it
cannot be determined whether raw IP datagrams reach their destinations. Raw IP is also
connectionless, that is, no connection needs to be set up between hosts before transmission.
Unlike UDP, raw IP allows application programs to operate on the IP layer directly through
the socket, which facilitates direct interaction with the lower layer.
7.4.2.4 Principle of the Socket

The socket is a group of Application Programming Interfaces (APIs). It functions between the
transport layer and the application layer, shielding the differences of the transport layer and
providing uniform APIs for the application layer. The application layer does not need to know
the details of TCP/IP. Instead, it directly invokes the functions of the socket to complete the
data transmission on IP networks. Figure 7-22 shows the position of the socket in the TCP/IP
stack.

Figure 7-22 Socket in the hierarchical model
(Layers, from top to bottom: application layer, socket API, transport layer (TCP, UDP, and raw
IP), network layer, data link layer, physical layer)
Four types of sockets are supported, shielding the differences at the transport layer:
l TCP-based socket: ensures reliable transmission of data streams to the application layer.
l UDP-based socket: provides connectionless and unreliable data transmission to the
application layer. Such transmission, however, can provide packet boundaries.
l Raw IP-based socket: also called the raw socket. Similar to the UDP-based socket, the
raw IP-based socket provides connectionless and unreliable data transmission and packet
boundaries. It allows application programs to directly access the network layer.
l Link-layer-based socket: provided for the Intermediate System-to-Intermediate System
(IS-IS) routing protocol. The link-layer-based socket allows IS-IS to directly access the
link layer.
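The packet boundaries mentioned for the UDP-based socket can be observed with a minimal Python sketch that uses the standard socket module on the loopback address. The function name and payloads are assumptions made for the example. Raw IP-based and link-layer-based sockets are not shown because they normally require elevated privileges.

```python
import socket


def udp_boundaries() -> list:
    """Send two datagrams and read them back one message at a time."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # port 0: assumption, let the OS choose
    receiver.settimeout(2.0)          # avoid blocking forever if a datagram is lost
    address = receiver.getsockname()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"first", address)
    sender.sendto(b"second", address)

    # Each recvfrom call returns exactly one datagram, never a merged stream.
    messages = [receiver.recvfrom(1024)[0] for _ in range(2)]
    sender.close()
    receiver.close()
    return messages


if __name__ == "__main__":
    print(udp_boundaries())
```

A TCP-based socket (SOCK_STREAM) gives no such guarantee: two send calls may be delivered to the receiver as a single chunk of bytes.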
7.4.3 Applications
ICMP Message Sending Control

Normally, ICMP Host Unreachable and Redirection messages are sent correctly. When network
traffic is heavy and a large number of errors occur, however, a routing device needs to send
many ICMP messages. In addition, attackers often attack the network by sending ICMP error
messages, which may cause a vicious circle and worsen network congestion. For example,
sending a large number of Redirection messages may cause frequent route changes.
The ATN supports control over ICMP message sending on outbound interfaces. You can run
commands to enable or disable the sending of ICMP Host Unreachable or Redirection
messages. If sending is disabled, the system no longer sends the two types of messages,
which reduces the traffic burden and protects the network from malicious attacks.
Terms
None
Abbreviations
7.5 IP Unicast Policy-Based Routing
7.5.1 Introduction
Definition
Policy-based routing (PBR) is a mechanism used to make routing decisions based on user-
defined routing policies. This differs from the routing mechanism based on destination
addresses of IP packets.
NOTE
Policy-based routing and routing policy are two different terms.

l PBR applies to data packets. PBR provides a means to route or forward data packets flexibly, in
accordance with predefined policies instead of following the routes in the existing routing table.
l Routing policies apply to routing information. They control how routing protocols generate,
advertise, and select the routes in the routing table, by changing route parameters, by
using control modes, or based on rules.
For details on routing policies, see "IP Routing" in Feature Description.
Purpose
Traditionally, packets are forwarded based on destination addresses in routing tables
constructed based on routing protocols. This mechanism allows routers to route packets based
on only destination addresses of packets. This routing mode meets requirements for data
forwarding but does not support differentiated services. IP PBR allows network
administrators to select forwarding paths based on packet attributes, such as destination
addresses, source addresses, and packet sizes.
Benefits
l This feature improves flexibility of route selection.

l This feature improves control of route selection.
7.5.2 Principles
Related Concepts
Policy-based routing (PBR) can be categorized into the following types:
l Interface PBR: applies to received packets instead of locally sent packets (such as ping
packets).
l Local PBR: applies to locally sent packets instead of received packets.
Implementation
PBR is implemented in the following steps:
1. Specify packets suitable for PBR.
2. Specify routes for these packets.
ATN PBR allows routers to flexibly select routes according to access control list (ACL)-based
packet filtering results, addresses, and packet sizes. ACL-based packet filtering allows
routers to classify packets based on source and destination addresses, protocols, port
numbers, priorities, types of service (ToS), time ranges, and virtual private networks
(VPNs). The routers then forward these packets along different routes.
The forwarding procedure is as follows:
If PBR has been configured, a router first checks whether packets match any PBR nodes when
sending or forwarding packets.
l If the router finds matched PBR nodes, it performs the following steps to send or forward
packets:
a. The router sets priorities for packets based on the predefined priority rules to
differentiate services based on priorities. After priorities are set, the process goes to
step b.
b. The router checks whether sending interfaces are configured for the matched PBR
nodes.
n If yes, the router sends packets through these sending interfaces.
n If no, the process goes to step c.
c. The router checks whether next hops are configured for matched PBR nodes.
NOTE
Multiple next hops can be configured for a PBR node for load balancing.
n If yes, the router sends packets to next hops.
n If no, the router follows the normal procedure for sending packets by searching
routes based on destination addresses of packets. If no route is available, the
process goes to step d.
d. The router checks whether a default sending interface has been configured.
n If yes, the router sends packets to the default sending interface.
n If no, the process goes to step e.
e. The router checks whether default next hops are configured for matched PBR
nodes.
n If yes, the router sends packets to default next hops.
n If no, the process goes to step f.
f. The router discards packets and generates ICMP_UNREACH messages.
l If the router finds no matched PBR node, it follows the normal procedure for sending
packets by searching routes based on destination addresses of packets.
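The decision order described above can be summarized in a short Python sketch. It is an illustrative model only, not the ATN implementation. The PbrNode class, its field names, and the returned action labels are hypothetical. The route argument stands for the result of the normal destination-based lookup (None if no route is available). Priority setting is omitted because it does not affect path selection.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PbrNode:
    """Hypothetical model of the actions configured on a matched PBR node."""
    out_interface: Optional[str] = None
    next_hops: List[str] = field(default_factory=list)
    default_out_interface: Optional[str] = None
    default_next_hops: List[str] = field(default_factory=list)


def pbr_decision(node: Optional[PbrNode], route: Optional[str]) -> Tuple[str, object]:
    """Return (action, target) for one packet, following the documented order."""
    if node is None:
        # No matched PBR node: normal destination-based forwarding.
        return ("normal-lookup", route)
    if node.out_interface:
        return ("out-interface", node.out_interface)
    if node.next_hops:
        # Several next hops may be configured for load balancing.
        return ("next-hops", node.next_hops)
    if route is not None:
        return ("normal-lookup", route)
    if node.default_out_interface:
        return ("default-out-interface", node.default_out_interface)
    if node.default_next_hops:
        return ("default-next-hops", node.default_next_hops)
    return ("discard", "ICMP_UNREACH")


print(pbr_decision(PbrNode(next_hops=["10.1.1.2"]), route="10.2.2.2"))
print(pbr_decision(PbrNode(default_next_hops=["10.3.3.2"]), route=None))
print(pbr_decision(PbrNode(), route=None))
```

The sketch makes the precedence explicit: the sending interface and next hops override the routing table, whereas the default sending interface and default next hops are used only when the routing table has no matching route.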
Usage Scenarios
PBR can be used for:
l Security: PBR can be configured to filter the IP address of a network attacker and
prevent routers from forwarding data flows from this IP address.
l Load balancing: When multiple paths to an Internet service provider (ISP) network are
available, network administrators use PBR to distribute traffic based on route bandwidths
to achieve load balancing.
l Routing based on source addresses: When a network provides two lines with different
rates to access the Internet, network administrators use PBR to ensure that users with
high priorities use the line with the higher rate and common users use the line with the
lower rate.
l Routing based on service classes: Data packets have different service requirements for
transmission rate, throughput, and reliability. PBR allows routers to route data packets
based on the network status. For example, routers use large-bandwidth lines for voice
and video services and small-bandwidth lines for data services.
Benefits
Different from traditional routing protocols, PBR allows network administrators to control
packet forwarding and storage more effectively and flexibly. For example, when packets have
the same destination address, PBR allows routers to select routes based on source addresses of
packets.
7.5.3 Applications
Service Overview
As shown in Figure 7-23, the internal network is connected to the Internet through a router.
The router provides multiple interfaces to connect to the Internet.
l To ensure that a certain type of packet is forwarded through a specified interface,
configure IP unicast policy-based routing (PBR) for the interface.
l To apply PBR to packets that are generated on a router, configure local PBR.

Figure 7-23 Networking diagram of IP unicast policy-based routing
(PC1, PC2, and PC3 on the internal network 10.110.0.0 connect to Port3 of RouterA. Port1 and
Port2 of RouterA connect to the Internet.)
Feature Deployment
l Routing based on source addresses: When a network provides two lines with different
rates to access the Internet, network administrators use PBR to ensure that users with
high priorities use the line with the higher rate and common users use the line with the
lower rate. A PBR node, which defines routing rules and actions, is configured on ATN A
(RouterA in Figure 7-23). For example, PBR is enabled on Ethernet port 3 so that ATN A
sends all packets received on port 3 from PC1 (10.110.0.11/24) through port 2 and sends
other packets based on their destination addresses.
l Routing based on service classes: Data packets have different service requirements for
transmission rate, throughput, and reliability. PBR allows routers to route data packets
differently based on the network status, for example, to use large-bandwidth lines for
voice and video services and small-bandwidth lines for data services. If the bandwidth
of the line connected to port 1 is larger than that of the line connected to port 2, PBR
can be configured on port 3 of ATN A so that ATN A sends voice and video services
through port 1 and data services through port 2.
Terms
Term Description
PBR In computer networking, policy-based routing (PBR) is a technique used to make routing decisions based on policies set by the network administrator.
7.6 IPv6

7.6.1 Introduction to IPv6

Definition
Internet Protocol Version 6 (IPv6), also called IP Next Generation (IPng), is the
second-generation standard network layer protocol. As a set of specifications defined by
the Internet Engineering Task Force (IETF), IPv6 is the upgraded version of Internet Protocol
Version 4 (IPv4). The most obvious difference between IPv6 and IPv4 is that IP addresses are
lengthened from 32 bits to 128 bits. IPv6 features a simplified header format, sufficient
address space, a hierarchical address structure, flexible extension headers, and an enhanced
neighbor discovery (ND) mechanism. These features make IPv6 competitive in the future market.
ATN supports IPv6 on the following interfaces:
l Ethernet interfaces and sub-interfaces
l Tunnel interfaces
l Loopback interfaces
l VLANIF interfaces
Purpose
The IPv4-based Internet has achieved great success, and IP technology is now widely
applied. With the rapid development of the Internet, however, deficiencies in IPv4 have become
increasingly obvious in the following aspects:
l The IPv4 address space is insufficient.
An IPv4 address is 32 bits long. In theory, a maximum of about 4.3 billion (2^32)
addresses can be provided. In practice, fewer than 4.3 billion addresses are
available because of the way addresses are allocated. In addition, IPv4 address resources
are allocated unevenly. Address resources of the USA occupy almost half of the global
address space; Europe has fewer address resources than the USA; and the Asia-Pacific
region has far fewer. The development of mobile IP and broadband technology requires
more IP addresses. Consequently, limited IPv4 address resources directly restrict the
further development of IP technology.
Classless Inter-Domain Routing (CIDR) and Network Address Translation (NAT) are two
representative solutions to the IPv4 address shortage. CIDR and NAT, however, have their
own disadvantages and unsolved problems, which has promoted the development of IPv6.
l The backbone device maintains too many routing entries.
Many discontinuous IPv4 addresses are allocated because of the problems in the initial
IPv4 address allocation planning. As a result, routes cannot be aggregated effectively.
The increasingly large routing table consumes a lot of memory and degrades forwarding
efficiency. Consequently, device manufacturers have to upgrade products to improve
route lookup and forwarding performance.
l Address autoconfiguration and readdressing cannot be performed easily.
An IPv4 address occupies only 32 bits and IP addresses are allocated unevenly.
Consequently, IP addresses need to be reallocated during network expansion or network
replanning. The workload for maintenance is heavy.
l Security cannot be well guaranteed.
As the Internet develops, security problems become more serious. The IPv4 design does
not fully consider security, so the original framework cannot ensure end-to-end security.

IPv6 provides end-to-end security by using IP security (IPsec) as a standard extension
header.
IPv6 radically solves the problem of IP address shortage. Moreover, IPv6 is easy to deploy,
is compatible with various applications, and allows IPv4 networks to transition easily to
IPv6 networks. With these advantages over IPv4, IPv6 is developing rapidly.
7.6.2 Principles
Basic functions of IPv6 include IPv6 neighbor discovery and IPv6 path MTU (PMTU)
discovery. Neighbor discovery and PMTU discovery are implemented through Internet
Control Message Protocol for IPv6 (ICMPv6) messages.
7.6.2.1 IPv6 Header Format

This section describes the IPv6 header format.
Figure 7-24 shows the IPv6 packet format.
Figure 7-24 IPv6 packet format
An IPv6 packet is generally composed of the following components:

l Fixed header: contains basic packet forwarding information. Routers use the information
in fixed headers to forward most packets.
l Extension header: contains extended packet forwarding information. Use of this header
is optional. Not all packets contain extension headers nor do all routers need to process
them. Generally, only destination routers or hosts process extension headers.
l Upper-layer protocol data unit: is generally composed of an upper-layer protocol header
and a valid payload. The IPv6 upper-layer protocol data unit is the same as the unit in an
IPv4 packet.

Fixed Header
Figure 7-25 Fixed header format
(IPv6 header layout, 32 bits per row: Version, Traffic class, and Flow label in the first row;
Payload length, Next header, and Hop limit in the second row; followed by the Source address
and then the Destination address)
Table 7-10 Description of the fixed header
Field Length Description
Version 4 bits Internet Protocol version number. The value is 6.
Traffic Class 8 bits The function of this field is similar to that of the Type of Service (ToS) field in an IPv4 header. This field specifies the class or priority of an IPv6 data packet. RFC 2460 does not define the values of this field. RFC 2474 defines the DS field to replace the Traffic Class field.
Flow Label 20 bits Used by a source to label sequences of packets for which the source requests special handling by IPv6 routers. Details for use of this field are not defined. Generally, a flow can be identified based on the source IPv6 address, destination IPv6 address, and flow label.
Payload Length 16 bits Length of the IPv6 payload, in bytes. The payload consists of the extension headers and the upper-layer protocol data unit that follow the IPv6 header. The maximum value is 65535. If the payload is longer than 65535 bytes, this field is set to 0 and the length is carried in the Jumbo Payload option of the Hop-by-Hop Options header.
Next Header 8 bits Type of header immediately following the IPv6 header. The value of the Next Header field varies with extension headers as follows:
l 0: Hop-by-Hop Options header
l 43: Routing header
l 44: Fragment header
l 50: Encapsulating Security Payload
l 51: Authentication header
l 59: No Next Header
l 60: Destination Options header
Hop Limit 8 bits Replaces the IPv4 Time to Live field. This field defines the maximum number of hops a packet can pass through. The value is decreased by 1 by each node that forwards the packet. The packet is discarded if Hop Limit is decreased to zero.
Source Address 128 bits IPv6 address of the sending node.
Destination Address 128 bits IPv6 address of the destination node.
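The field widths in Table 7-10 add up to a 40-byte fixed header. The following Python sketch decodes such a header with the standard struct and ipaddress modules. The function name and dictionary keys are hypothetical and are chosen only for this example.

```python
import ipaddress
import struct


def parse_ipv6_fixed_header(packet: bytes) -> dict:
    """Decode the 40-byte IPv6 fixed header at the start of a packet."""
    if len(packet) < 40:
        raise ValueError("an IPv6 fixed header is 40 bytes long")
    # First 32 bits: Version (4) + Traffic Class (8) + Flow Label (20).
    first_word, payload_length, next_header, hop_limit = struct.unpack_from(
        "!IHBB", packet
    )
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_length,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": ipaddress.IPv6Address(packet[8:24]),
        "destination": ipaddress.IPv6Address(packet[24:40]),
    }


# Build a sample header: version 6, traffic class 0xB8, flow label 0x12345,
# 20-byte payload, Next Header 59 (No Next Header), Hop Limit 64.
sample = struct.pack("!IHBB", (6 << 28) | (0xB8 << 20) | 0x12345, 20, 59, 64)
sample += ipaddress.IPv6Address("2001:db8::1").packed
sample += ipaddress.IPv6Address("2001:db8::2").packed
print(parse_ipv6_fixed_header(sample))
```

The addresses in the sample come from the 2001:db8::/32 documentation range and do not refer to real hosts.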
Extension Header
Optional Internet layer information is encoded in separate headers to reduce IPv6 packet
processing costs and to limit the bandwidth needed for IPv6 headers. Extension headers are
classified as follows:
l Hop-by-Hop Options header
This header specifies forwarding parameters for each hop on the path of a packet.
Every intermediate node on the path needs to read and process this header. It is identified
by the Next Header value 0 in the IPv6 header.
Figure 7-26 Hop-by-Hop Options header
0 7 15 31
Next Header Hdr Ext Len
Options
– Next Header: 8 bits. It identifies the type of header immediately following the Hop-
by-Hop Options header. Its functions are the same as the Next Header field in a
fixed header. It is included in all options headers.

– Hdr Ext Len: 8 bits. It indicates the length of the Hop-by-Hop Options header in 8-byte
units, not including the first 8 bytes.
– Options: a combination of fields. This is used to describe a data forwarding feature
or to pad the Hop-by-Hop Options header. A Hop-by-Hop Options header can
contain one or more Options fields. The Options field that describes data
forwarding is essential for the Hop-by-Hop Options header. Figure 7-27 shows the
Options field format.
Figure 7-27 Options field
Option type Opt Data length Option Data
The Options field is used in the Destination Options header as well as in the Hop-
by-Hop Options header. Each option is encoded in the type-length-value (TLV)
format.
n Option Type: 8 bits. It identifies the option type and specifies the method used
by relevant nodes to process this field.
n Opt Data Length: 8 bits. It indicates the length of the Option Data field for this
option, not including the Option Type and Opt Data Length fields.
n Option Data: variable-length field. It contains data specific to this Option
Type.
l Destination Options header
The format of the Destination Options header is similar to that of the Hop-by-Hop Options
header shown in Figure 7-26, except that the Destination Options header is identified by
the value 60 in the Next Header field of the preceding header. The Destination Options
header is the only type of header that can occur twice in a packet, once before a Routing
header and once before the upper-layer header. When the Destination Options header is
before a Routing header, it is processed by the nodes in the address list contained in the
Routing header. When the Destination Options header is before the upper-layer header, it
is processed by the destination device. Options that need to be processed by all nodes on a
specified forwarding path are placed before the Routing header, whereas options that need
to be processed only by the destination device are placed before the upper-layer header.
l Routing header
The Routing header is used to specify the intermediate nodes that a packet must pass
through. Figure 7-28 shows the format of the Routing header.
Figure 7-28 Routing header

0 7 15 31
Next Header Hdr Ext Len Routing Type Segments Left
Type-specific data
The Next Header and Hdr Ext Len fields mean the same things in a Routing header as
they do in a Hop-by-Hop Options header. The Routing header is identified by the value
43 in the Next Header field of the preceding header. The other Routing header fields are
as follows:

– Routing Type: 8 bits. It identifies the type-specific data. At present, RFC 2460 has
defined only Routing Type=0.
– Segments Left: 8 bits. It indicates the number of route segments remaining. This
refers to the number of listed intermediate nodes still to be visited before the
destination is reached.
– Type-specific data: Variable-length field. The Routing Type determines the format
of this field. Type-specific data for Routing Type=0 defined by RFC 2460 is the IP
addresses of intermediate nodes to be visited.
l Fragment header
When packet size exceeds the Maximum Transmission Unit (MTU), the packet needs to
be fragmented. Fragments are identified by the Fragment header. Unlike IPv4,
fragmentation in IPv6 is performed only by source nodes, not by routers along the path a
packet traverses. Figure 7-29 shows the format of the Fragment header.
Figure 7-29 Fragment header

0 7 15 31
Next Header Reserved Fragment Offset Res M
Identification
Fragment header fields are as follows:

– Next Header: 8 bits. It identifies the initial header type of the original packet part
that can be fragmented. The Fragment header itself is identified by the value 44 in
the Next Header field of the preceding header.
– Reserved: 8 bits. It is initialized to zero for transmission and ignored on reception.
– Fragment Offset: 13 bits. It indicates the offset, in 8-byte units, of the data that
follows this header, relative to the start of the original packet part that can be
fragmented.
– Res: 2 bits. It is initialized to zero for transmission and ignored on reception.
– M: 1 bit. 1 indicates there are more fragments coming; 0 indicates this is the last
fragment.
– Identification: 32 bits. The source node generates an Identification value for every
packet that is to be fragmented. At any given time, all fragmented packets with the
same source and destination addresses must have different Identification values.
The fragments can be reassembled at the receiver based on the Identification. The
Identification value is a simple, 32-bit, "wrap-around" counter, that is incremented
by 1 each time a packet must be fragmented.
l Authentication header
An Authentication header provides authentication services and is used as an IP security
measure. The Authentication header is identified by the value 51 in the Next Header field
of the preceding header. Figure 7-30 shows the format of this extension header.

Figure 7-30 Authentication header

0 7 15 31
Next Header Payload Len RESERVED
Security Parameters Index(SPI)
Sequence Number Field
Authentication Data(variable)
Authentication header fields are as follows:

– Payload Len: 8 bits. It indicates the length of the Authentication header in 32-bit
words, minus 2.
– Reserved: 16 bits. It is initialized to zero for transmission and ignored on reception.
– Security Parameters Index (SPI): an arbitrary 32-bit value. It is used in combination
with the destination IP address and the security protocol (AH) to uniquely identify
the security association of data packets. The value 0 is reserved for local use.
Values 1 to 255 are reserved by IANA.
– Sequence Number: 32 bits. The counter is initialized to zero when a security
association is established and is incremented by 1 each time a packet is sent.
– Authentication Data: variable length. It carries the integrity check value used to
verify that a packet has not been altered. The number of bits in this field varies so
that the total Authentication header length is a multiple of 32 bits (IPv4) or 64 bits
(IPv6).
l Encapsulating Security Payload (ESP)
Encapsulating Security Payload headers are used as an IP security measure. The ESP
header is identified by the value 50 in the Next Header field of the preceding header. ESP
headers are similar to Authentication headers. Both IPv4 and IPv6 packets can carry this
header, which is usually called IPsec ESP. For the meanings of the fields that also appear
in the Authentication header, see the Authentication header.
Figure 7-31 Encapsulating Security Payload

0 7 15 31
Security Parameters Index(SPI)
Sequence Number
Payload Data* (variable)
Padding (0-255 bytes)
Pad Length Next Header
Authentication Data (variable)

NOTE
When using extension headers, note the following:
l When more than one extension header is used in the same packet, it is recommended that those
headers appear in the previously mentioned order.
l Not all extension headers need to be checked and processed by intermediate nodes. When an
intermediate node forwards a packet, it determines whether to process extension headers carried in
the packet on the basis of the Next Header field value in the fixed header. Packets have only one of
each type of extension header, with the exception of the Destination Options header. This header
may occur twice, once before a Routing header and once before the upper-layer header.
l The value 59 in the Next Header field of an IPv6 header or extension header indicates that there is
nothing following that header. Even if the Payload Length field indicates that there are more bytes
behind that header, those bytes must be ignored, and passed on unchanged if the packet is forwarded.
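The way the Next Header fields chain the extension headers together can be illustrated with a minimal Python sketch. It walks the chain that follows the fixed header and decodes the Fragment header fields. The function names are hypothetical. The length rules assumed here are: Hdr Ext Len counts 8-byte units beyond the first 8 bytes, the Fragment header is always 8 bytes, and the Authentication header length is (Payload Len + 2) 32-bit words. The walk stops at an ESP header because the data behind it is normally encrypted.

```python
import struct

EXTENSION_HEADERS = {
    0: "Hop-by-Hop Options",
    43: "Routing",
    44: "Fragment",
    50: "ESP",
    51: "Authentication",
    60: "Destination Options",
}


def walk_extension_headers(first_next_header: int, payload: bytes):
    """Return (names of the extension headers found, last Next Header value)."""
    chain = []
    next_header, offset = first_next_header, 0
    while next_header in EXTENSION_HEADERS:
        chain.append(EXTENSION_HEADERS[next_header])
        if next_header == 50:
            break  # ESP: the rest of the packet cannot be parsed here
        if offset + 8 > len(payload):
            raise ValueError("truncated extension header")
        following, length_field = payload[offset], payload[offset + 1]
        if next_header == 44:
            size = 8                        # the Fragment header has a fixed size
        elif next_header == 51:
            size = (length_field + 2) * 4   # AH length in 32-bit words, minus 2
        else:
            size = (length_field + 1) * 8   # Hdr Ext Len in 8-byte units
        next_header, offset = following, offset + size
    return chain, next_header


def parse_fragment_header(header: bytes) -> dict:
    """Decode the 8-byte Fragment header."""
    next_header, _reserved, offset_flags, identification = struct.unpack(
        "!BBHI", header[:8]
    )
    return {
        "next_header": next_header,
        "fragment_offset_bytes": (offset_flags >> 3) * 8,  # 13-bit offset, 8-byte units
        "more_fragments": bool(offset_flags & 0x1),        # M flag
        "identification": identification,
    }


# Hop-by-Hop Options (Next Header 43) followed by a Routing header (Next Header 6, TCP).
payload = bytes([43, 0]) + bytes(6) + bytes([6, 0, 0, 0]) + bytes(4)
print(walk_extension_headers(0, payload))
```

A final value of 59 means that nothing follows the last header. Any other value that is not an extension header type identifies the upper-layer protocol.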
7.6.2.2 IPv6 Addresses
IPv6 Address Formats

A 128-bit IPv6 address is expressed in either of the following formats:
l X:X:X:X:X:X:X:X
– An IPv6 address is divided into eight groups, separated by colons. Each group (an
X) is a 16-bit value written as four hexadecimal digits (0 to 9 and A to F).
For example, 2031:0000:130F:0000:0000:09C0:876A:130B is an IPv6 address.
For convenience, leading zeros in each group can be omitted, and a group containing
all 0s is written as a single 0. The example address can be written as
2031:0:130F:0:0:9C0:876A:130B.
– Two or more consecutive groups of 0s can be replaced with a pair of colons (::),
which shortens the IPv6 address. The example address can also be written as
2031:0:130F::9C0:876A:130B.
An IPv6 address can contain only one pair of colons (::). If an IPv6 address
contained more than one pair of colons, a computer could not restore the compressed
address to the original 128-bit address because it could not determine how many
groups of zeros each pair represents.
l X:X:X:X:X:X:d.d.d.d
Each "X" is 16 bits long and consists of four hexadecimal digits. Each "d" is 8 bits long
and is represented by a decimal number. "d.d.d.d" represents an IPv4 address. The
following addresses are expressed in this format:
– 0:0:0:0:0:0:IPv4-address: an IPv4-compatible IPv6 address. The most significant
96 bits are 0s, followed by a 32-bit IPv4 address. The IPv4 address must be
reachable on an IPv4 network and must be a unicast address, not a multicast
address, a broadcast address, a loopback address, or an unspecified address
(0.0.0.0, for example).
An IPv4-compatible IPv6 address is used to configure an IPv6 over IPv4 tunnel.
– 0:0:0:0:0:FFFF:IPv4-address: an IPv4-mapped IPv6 address that is mapped to an
IPv4 address of an IPv4 node. This address type is used to represent the address of
an IPv4 node as an IPv6 address.
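The notation rules above can be checked with Python's standard ipaddress module, which prints hexadecimal digits in lowercase. The helper names below are hypothetical. The address 192.0.2.1 is taken from the IPv4 documentation range and is only an example.

```python
import ipaddress


def notations(text: str):
    """Return the full and the shortest textual form of an IPv6 address."""
    address = ipaddress.IPv6Address(text)
    return address.exploded, address.compressed


def is_valid_ipv6(text: str) -> bool:
    """Report whether the text can be restored to a 128-bit address."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ValueError:
        return False


full, short = notations("2031:0000:130F:0000:0000:09C0:876A:130B")
print(full)   # 2031:0000:130f:0000:0000:09c0:876a:130b
print(short)  # 2031:0:130f::9c0:876a:130b

# More than one "::" is ambiguous and is rejected.
print(is_valid_ipv6("2031::130F::9C0"))  # False

# X:X:X:X:X:X:d.d.d.d format: an IPv4-mapped IPv6 address.
mapped = ipaddress.IPv6Address("0:0:0:0:0:FFFF:192.0.2.1")
print(mapped.ipv4_mapped)  # 192.0.2.1
```

Note that only the longest run of zero groups is replaced with the pair of colons, so the single zero group after 2031 remains as 0.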

Structure of an IPv6 Address

An IPv6 address is divided into two parts:
l Network prefix: equivalent to the network ID of an IPv4 address.
l Interface ID: equivalent to the host ID in an IPv4 address. The interface ID length is as
follows:
Interface ID length = 128 bits – n bits, where n is the length of the network prefix
Figure 7-32 illustrates the structure of the IPv6 address
2001:A304:6101:1:0000:E0:F726:4E58/64.
Figure 7-32 Structure of IPv6 address 2001:A304:6101:1:0000:E0:F726:4E58/64
Network prefix Interface ID
64 bits 64 bits
2001:A304:6101:0001 0000:00E0:F726:4E58
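The split shown in Figure 7-32 can be reproduced with a few lines of Python based on the standard ipaddress module. The function name split_ipv6 is hypothetical and serves only as an illustration of the length formula.

```python
import ipaddress


def split_ipv6(interface_text: str):
    """Return (network prefix, prefix length, interface ID length, interface ID)."""
    interface = ipaddress.IPv6Interface(interface_text)
    prefix_length = interface.network.prefixlen
    interface_id_length = 128 - prefix_length          # 128 bits minus n bits
    interface_id = int(interface.ip) & ((1 << interface_id_length) - 1)
    return interface.network, prefix_length, interface_id_length, interface_id


network, n, id_bits, interface_id = split_ipv6("2001:A304:6101:1:0000:E0:F726:4E58/64")
print(network)                  # 2001:a304:6101:1::/64
print(n, id_bits)               # 64 64
print(f"{interface_id:016x}")   # 000000e0f7264e58
```

The printed interface ID corresponds to the groups 0000:00E0:F726:4E58 in the figure.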
IPv6 Address Classification

IPv6 addresses are classified as unicast, anycast, and multicast addresses.
l Unicast address: uniquely identifies an interface. An IPv6 unicast address is similar to an
IPv4 unicast address. Packets bound for a unicast address are transmitted to an interface
uniquely identified by the unicast address.
Unicast addresses can be classified into the following categories, as shown in Table
7-11.
Table 7-11 Types of IPv6 unicast addresses

Address Type Binary Prefix IPv6 Prefix Identifier
Link-local unicast address 1111111010 FE80::/10
Loopback address 00...1 (128 bits) ::1/128
Unspecified address 00...0 (128 bits) ::/128
Unique-local address 1111110 FC00::/7
Global unicast address Everything else -
The meanings of each type of address are as follows:

– A link-local unicast address is used by a neighbor discovery protocol or by nodes
on a local link to perform stateless address autoconfiguration. Packets with a
link-local IPv6 unicast address as a source or destination address are forwarded
only on a local link. A link-local IPv6 unicast address can be automatically
configured on any interface using a link-local prefix FE80::/10 (1111 1110 10 in
binary) and an EUI-64 interface ID.
Figure 7-33 Link-local IPv6 unicast address structure
10 bits 54 bits 64 bits

1111111010 0 Interface ID
– A unique-local unicast address identifies a single site and has a globally unique
prefix. Sites use unique-local unicast addresses to establish private connections,
without incurring address conflicts. Even if routes destined for unique-local unicast
addresses leak, the routes do not conflict with Internet routes. Upper-layer
applications treat unique-local unicast addresses like global unicast addresses.
Figure 7-34 shows the unique-local unicast address structure. The address contains
the following fields:
n 1111110: the prefix of a unique-local unicast address (FC00::/7).
n L: a 1-bit field. The value can be:
○ 1: The address is used locally.
○ 0: The address is reserved for future use.
n Global ID: a 40-bit global identifier that is a pseudo random number.
n Subnet ID: a 16-bit subnet identifier that identifies a subnet within a site.
n Interface ID: a 64-bit identifier that identifies an interface.
Figure 7-34 Unique-local unicast address structure
7 bits 1 bit 40 bits 16 bits 64 bits
1111110 L Global ID Subnet ID Interface ID
– Loopback address: functions similarly to the IPv4 loopback address 127.0.0.1. A node
uses the loopback address to send an IPv6 packet to itself. The loopback address is
0:0:0:0:0:0:0:1, abbreviated as ::1, and cannot be assigned to any interface.
– Unspecified address (::): used in the Source Address field of an IPv6 packet sent by
an initializing host before the host obtains an address. For example, a Neighbor
Solicitation (NS) message sent for Duplicate Address Detection (DAD) carries the
unspecified address in the Source Address field. The unspecified address cannot be
assigned to any node or used as a destination address.
– Global unicast address: equivalent to an IPv4 public address. Internet service
providers (ISPs) use global unicast addresses to aggregate links. The structure of a
global unicast address enables route prefix aggregation, which minimizes the
number of global routing entries. A global unicast address consists of a 48-bit
routing prefix, a 16-bit subnet ID, and a 64-bit interface ID. Figure 7-35 shows the
global unicast address structure. The address contains the following fields:
n Global routing prefix: with three left-most bits of 001. When an ISP assigns a
global routing prefix to an organization, the global routing prefix must have at
least 48 bits.
n Subnet ID: identifies a subnet within a site.
n Interface ID: uniquely identifies an interface.
Figure 7-35 Global unicast address structure
n bits 64-n bits 64 bits

001 Global prefix Subnet ID Interface ID
l Anycast address: identifies a group of interfaces on different nodes. Packets bound for an
anycast address reach the interface that is nearest to the source node among interfaces in
the interface group identified by the anycast address. A routing protocol determines the
shortest path.
Applicable environment: When a mobile host needs to communicate with the mobile
agent on the home subnet, it uses the anycast address of the device of the subnet.
Specifications of addresses: Anycast addresses do not have an independent address space.
They can use the format of any unicast address. Because the two cannot be told apart by
format, an address must be explicitly configured as an anycast address on the interface
to differentiate it from a unicast address.
l Multicast address: identifies a group of interfaces on different nodes. A multicast IPv6
address is similar to an IPv4 multicast address. Packets bound for a specified multicast
address reach all interfaces identified by the multicast address. Figure 7-36 shows the
multicast address structure. The address contains the following fields:
– 11111111: a binary number that identifies a multicast address.
– Flags: a 4-bit field that carries flags. The lowest-order bit, the T flag, can be:
n 0: permanent multicast address
n 1: transient or dynamic multicast address
– Scope: a 4-bit field that identifies the usage scope of a multicast address. Some
meaningful scope values are as follows:
n 1: Interface-Local scope. A multicast address is locally used on a node.
n 2: Link-Local scope. A multicast address is locally used on a link.
n 4: Admin-Local scope. A multicast address is locally used for management.
n 5: Site-Local scope. A multicast address is locally used within a site.
n 8: Organization-Local scope. A multicast address is locally used by an
organization.
n E: Global scope. A multicast address is globally used.
– Group ID: a 112-bit field that identifies a multicast group. The multicast group can
be permanent or transient within a specified scope.
Figure 7-36 Multicast address structure
8 bits 4 bits 4 bits 112 bits

11111111 flags scope Group ID

Although no IPv6 broadcast addresses exist, IPv6 multicast addresses provide broadcast
address functions.
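The prefixes in Table 7-11 and the multicast fields in Figure 7-36 can be checked with a few lines of code. The following is a minimal sketch that uses the Python standard ipaddress module. The function names classify and multicast_fields are hypothetical and are not part of any ATN software; the sketch treats every unicast address outside the listed prefixes as a global unicast address, as the table does.

```python
import ipaddress

# Prefixes from Table 7-11, plus the multicast prefix (binary 11111111).
LINK_LOCAL = ipaddress.ip_network("fe80::/10")
UNIQUE_LOCAL = ipaddress.ip_network("fc00::/7")
MULTICAST = ipaddress.ip_network("ff00::/8")

# Scope values listed for the multicast Scope field.
MULTICAST_SCOPES = {
    0x1: "interface-local",
    0x2: "link-local",
    0x4: "admin-local",
    0x5: "site-local",
    0x8: "organization-local",
    0xE: "global",
}


def classify(text):
    """Return the address type for an IPv6 address given as text."""
    addr = ipaddress.IPv6Address(text)
    if addr == ipaddress.IPv6Address("::"):
        return "unspecified"
    if addr == ipaddress.IPv6Address("::1"):
        return "loopback"
    if addr in LINK_LOCAL:
        return "link-local unicast"
    if addr in UNIQUE_LOCAL:
        return "unique-local unicast"
    if addr in MULTICAST:
        return "multicast"
    return "global unicast"  # "Everything else" in Table 7-11


def multicast_fields(text):
    """Split a multicast address into the T flag, scope, and group ID."""
    value = int(ipaddress.IPv6Address(text))
    flags = (value >> 116) & 0xF   # 4 bits after the leading 11111111
    scope = (value >> 112) & 0xF   # next 4 bits
    group_id = value & ((1 << 112) - 1)
    return {
        "transient": bool(flags & 0x1),  # T flag is the lowest-order flag bit
        "scope": MULTICAST_SCOPES.get(scope, "other"),
        "group_id": group_id,
    }
```

For example, classify("FE80::1") returns "link-local unicast", and multicast_fields("FF02::1") reports a permanent address with link-local scope and group ID 1.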
Interface ID in IEEE EUI-64 Format

A 64-bit interface ID in an IPv6 address identifies a unique interface on a link. The interface
ID is derived from a 48–bit MAC address. The process for converting a MAC address into an
EUI-64 interface ID is as follows:
1. The hexadecimal number FFFE (1111 1111 1111 1110 in binary) is inserted in the middle
of a MAC address.
2. The U/L bit (the seventh most significant bit of the first byte) is set to 1.
3. An EUI-64 interface ID is obtained.
Figure 7-37 shows the process for converting a MAC address into an EUI-64 interface ID.
Figure 7-37 Converting a MAC address into an EUI-64 interface ID

MAC:         0012:3400:ABCD
Binary:      00000000 00010010 00110100 00000000 10101011 11001101
Insert FFFE: 00000000 00010010 00110100 11111111 11111110 00000000 10101011 11001101
Set U/L bit: 00000010 00010010 00110100 11111111 11111110 00000000 10101011 11001101
EUI-64:      0212:34FF:FE00:ABCD
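The conversion steps can be sketched in code as follows. This is an illustrative example only: the function names mac_to_eui64 and link_local_from_mac are hypothetical, and the U/L bit is set to 1 as described in step 2 above.

```python
import ipaddress


def mac_to_eui64(mac):
    """Convert a 48-bit MAC address (text) into an 8-byte EUI-64 interface ID."""
    hex_digits = "".join(ch for ch in mac if ch not in ":-.")
    if len(hex_digits) != 12:
        raise ValueError("a MAC address must contain 12 hexadecimal digits")
    octets = bytearray(bytes.fromhex(hex_digits))
    octets[0] |= 0x02  # step 2: set the U/L bit (seventh bit of the first byte)
    # step 1: insert FFFE between the third and fourth bytes
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])


def format_eui64(interface_id):
    """Format an 8-byte interface ID as four colon-separated groups."""
    text = interface_id.hex().upper()
    return ":".join(text[i:i + 4] for i in range(0, 16, 4))


def link_local_from_mac(mac):
    """Combine the FE80::/10 prefix, 54 zero bits, and the EUI-64 interface ID."""
    packed = b"\xfe\x80" + bytes(6) + mac_to_eui64(mac)
    return ipaddress.IPv6Address(packed)
```

With the MAC address from Figure 7-37, format_eui64(mac_to_eui64("0012:3400:ABCD")) yields 0212:34FF:FE00:ABCD, and link_local_from_mac gives the matching link-local address.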
7.6.2.3 Features of IPv6

l Hierarchical address structure
The IPv6 hierarchical address structure facilitates route search, reduces the size of the
IPv6 routing table through route aggregation, and improves the forwarding efficiency of
devices.
l Automatic address configuration
To simplify the host configuration process, IPv6 supports stateful address
autoconfiguration and stateless address autoconfiguration.
– In the case of stateful address autoconfiguration, a host obtains the address and
configuration through a server.
– In the case of stateless address autoconfiguration, a host automatically configures
an IPv6 address that contains the prefix advertised by the local device and interface
ID of the host. If no device exists on the link, the host can only configure the link-
local address automatically to interwork with local nodes.

l Supporting QoS
In an IPv6 header, a new field, namely, the Flow Label field, specifies how to identify
and process traffic. The Flow Label field identifies a flow and allows a device to
recognize packets in a flow and to provide special processing.
QoS is guaranteed for even the packets encrypted with IPSec because the IPv6 header
can identify different types of flows.
l Built-in security
Adopting IPSec as the standard extension header, IPv6 provides end-to-end security.
This provides specifications for ensuring network security, and improves interoperation
between different IPv6 applications.
l Flexible extension header
An IPv4 header supports at most 40 bytes of options, whereas the size of the IPv6 extension
headers is limited only by the IPv6 packet size.
IPv6 introduces multiple extension headers to replace the Options field in the IPv4
header. This improves the packet processing efficiency, enhances IPv6 flexibility, and
provides better scalability for the IP protocol. Figure 7-38 shows an IPv6 extension
header.
Figure 7-38 IPv6 extension header
IPv6 header | IPv6 data
IPv6 header | Fragment extension header | IPv6 data
IPv6 header | Routing extension header | Destination extension header | IPv6 data
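When multiple extension headers are used in the same packet, the headers must be listed in
the following order:
1. IPv6 basic header
2. Hop-by-Hop Options header
3. Destination Options header
4. Routing header
5. Fragment header
6. Authentication header
7. Encapsulating Security Payload header
8. Destination Options header
9. Upper-layer header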
Not all extension headers need to be examined and processed by devices. When forwarding
packets, a device determines whether to process the extension headers based on the Next
Header value in the IPv6 basic header.
The destination options extension header can appear twice in a packet: once before the
routing extension header and once before the upper-layer header. Each of the other
extension headers appears only once.
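The ordering rule can be expressed as a small check. The sketch below is illustrative only. It assumes the extension header order recommended in the IPv6 specification (hop-by-hop, destination options, routing, fragment, authentication, ESP, destination options); the header names and the function follows_recommended_order are hypothetical.

```python
# Assumed order of extension headers; "destination" appears twice because the
# destination options header may precede the routing header and may also
# precede the upper-layer header.
ORDER = [
    "hop-by-hop",
    "destination",
    "routing",
    "fragment",
    "authentication",
    "esp",
    "destination",
]


def follows_recommended_order(chain):
    """Return True if the extension headers in chain respect ORDER."""
    position = 0
    for name in chain:
        # Advance to the next slot in ORDER that matches this header.
        while position < len(ORDER) and ORDER[position] != name:
            position += 1
        if position == len(ORDER):
            return False  # header is out of order, repeated, or unknown
        position += 1
    return True
```

A chain such as ["routing", "destination"] passes the check, whereas ["fragment", "routing"] does not.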

7.6.2.4 ICMPv6
As one base protocol of IPv6, Internet Control Message Protocol for IPv6 (ICMPv6)
generates error messages and informational messages, which are used by IPv6 nodes to report
errors and information generated during packet processing. Figure 7-39 shows the format of
an ICMPv6 message.
Figure 7-39 Format of an ICMPv6 message
0 7 15 23 31
Type (1) Code (1) Checksum (2)
Packet Content
......
The meaning of each field in an ICMPv6 message is as follows:

l Type field: indicates the message type. The values from 0 to 127 indicate the error
message type, and values from 128 to 255 indicate the informational message type.
l Code field: indicates the specific message type.
l Checksum field: indicates the checksum of an ICMPv6 message.
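The field layout in Figure 7-39 can be illustrated with a short parser. This is a minimal sketch that only splits the 4-byte header and classifies the message by its Type value; it does not verify the checksum. The function name parse_icmpv6_header is hypothetical.

```python
import struct


def parse_icmpv6_header(message):
    """Split an ICMPv6 message into Type, Code, Checksum, and the remaining content."""
    if len(message) < 4:
        raise ValueError("an ICMPv6 message has at least a 4-byte header")
    # Type (1 byte), Code (1 byte), Checksum (2 bytes), network byte order
    msg_type, code, checksum = struct.unpack("!BBH", message[:4])
    # Types 0 to 127 are error messages; 128 to 255 are informational messages.
    kind = "error" if msg_type <= 127 else "informational"
    return {
        "type": msg_type,
        "code": code,
        "checksum": checksum,
        "kind": kind,
        "content": message[4:],
    }
```

For instance, a message whose first byte is 128 (Echo Request) is reported as informational, and a message whose first byte is 1 (Destination Unreachable) is reported as an error message.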
Classification of ICMPv6 Error Messages

l Destination Unreachable message
When an IPv6 node forwards IPv6 packets, if it detects that the destination address of the
packets is unreachable, it sends an ICMPv6 Destination Unreachable message to the
source node of the packets. Specific causes for the error message are carried in the
message. Destination Unreachable messages are classified into the following types:
– No route to destination
– Address Unreachable
– Port Unreachable
l Packet Too Big message
When an IPv6 node forwards IPv6 packets, if it detects that the size of the packets
exceeds the MTU of the outbound interface, it sends an ICMPv6 Packet Too Big
message to the source node of the packets. The MTU of the outbound interface is
carried in the message. Path MTU discovery is implemented based on Packet Too Big
messages.
l Time Exceeded message
During the transmission of IPv6 packets, when a device receives a packet with the hop
limit being 0 or a device reduces the hop limit to 0, it sends an ICMPv6 Time Exceeded
message to the source node of the packets. During the processing of a packet to be
fragmented and reassembled, an ICMPv6 Time Exceeded message is also generated
when the reassembly time is longer than the specified period.
l Parameter Problem message
When a destination node receives an IPv6 packet, it checks the validity of the packet. If
it detects the following errors, it sends an ICMPv6 Parameter Problem message to the
source node of the packet:

– A field in the IPv6 basic header or extension header is incorrect.
– The Next Header field in the IPv6 basic header or extension header cannot be identified.
– Unknown options exist in the extension header.
Classification of ICMPv6 Informational Messages

ICMPv6 informational messages are classified into Echo Request messages and Echo Reply
messages. ICMPv6 messages can be used for network fault diagnosis, path MTU discovery,
and neighbor discovery. To check whether two nodes can interwork, one node sends an
Echo Request message, and the node that receives it returns an Echo Reply message to
the source node. A received Echo Reply message confirms that packets can be transmitted
between the two nodes.
7.6.2.5 Neighbor Discovery

Neighbor discovery (ND) is a group of messages and processes that identify relationships
between neighboring nodes. IPv6 ND provides the functions of the Address Resolution
Protocol (ARP), Internet Control Message Protocol (ICMP) router discovery, and ICMP
Redirect of IPv4, as well as additional functions.
l ND Packet Format
l Router Discovery
l Default Router Priority and Route Information
l Duplicate Address Detection
l Neighbor Discovery
ND Packet Format
After being configured with an IPv6 address, a node checks whether this address is available
and does not conflict with other addresses. When a node is a host, an ATN needs to notify the
host of the optimal next-hop address for packets to a destination. When a node is an ATN, it
needs to advertise its address, address prefix, and other configuration parameters to instruct
hosts to configure parameters. When forwarding IPv6 packets, a node needs to know the
link-layer addresses and check reachability of neighboring nodes. IPv6 ND provides five types of
ICMPv6 messages:
l Router Solicitation (RS): After startup, a host sends an RS message to a device and waits
for the device to respond with a Router Advertisement (RA) message. Figure 7-40
shows the RS message format.
Figure 7-40 RS message format

0 7 15 23 31
Type Code Checksum
Reserved
Options
An RS message contains the following fields:

– Type: The value is 133.

– Code: The value is 0.

– Checksum: 16-bit ICMP checksum.
– Reserved: This field must be initialized to 0 on the transmit end and be ignored on
the receive end.
– Options: An RS message only contains the Source Link-Layer Address option. This
option contains the link-layer address of the sender. If an unspecified IPv6 source
address is used, an RS message cannot contain the Source Link-Layer Address
option.
l Router Advertisement (RA): A device periodically advertises RA messages that contain
prefixes and flag bits. Figure 7-41 shows the RA message format.
Figure 7-41 RA message format

0 7 15 31
Type Code Checksum
Cur Hop Limit M O H Prf P Rsv Router Lifetime
Reachable Time
Retrans Timer
Options
An RA message contains the following fields:

– Cur Hop Limit: an 8-bit unsigned integer that specifies the default value that hosts
place in the Hop Limit field of the IPv6 header of the packets they send, as defined
in RFC 2461. Value 0 means that a router does not use this field.
– M: a 1-bit Managed Address Configuration flag.
n 0: stateless address allocation. A host obtains an IPv6 address using a stateless
protocol, for example, ND.
n 1: stateful address allocation. A host obtains an IPv6 address using a stateful
protocol, for example, DHCPv6.
– O: a 1-bit Other Stateful Configuration flag.
n 0: A host obtains IPv6 configurations (except an IPv6 address) using a
stateless protocol, for example, ND.
n 1: A host obtains IPv6 configurations (except an IPv6 address) using a stateful
protocol, for example, DHCPv6. The IPv6 configurations include DNS and
Session Initiation Protocol (SIP) server addresses.
NOTE
RFC 4861 states that if the M flag is set to 1, the O flag is redundant and can be ignored, because DHCPv6 returns all available configuration information.
– H: a 1-bit Home Agent flag defined for mobile IPv6.
n 0: The router sending the RA message does not function as a home agent for
mobile nodes.
n 1: The router sending the RA message functions as a home agent for
mobile nodes.
– Prf: a 2-bit Default Router Preference flag. The Prf value of a router that sends the
RA message is used as the priority of the default router for hosts.
– P: a 1-bit Proxy flag. Its value can be:
n 0: disables ND proxy.
n 1: enables ND proxy.
– Rsv: This field must be initialized to 0 on the transmit end and be ignored on the
receive end.
– Router Lifetime: a 16-bit field that indicates the lifetime (in seconds) of a default
router. The lifetime of a router that sends the RA message is used as the lifetime of
the default router for hosts. The default value is 30 minutes, and the maximum
value is 18.2 hours. Value 0 indicates that the router sending the RA message does
not function as the default router. The other information carried in the RA message
still takes effect.
– Reachable Time: a 32-bit field that indicates a period of time (in milliseconds),
during which a router considers its neighbor reachable after having received a
reachability confirmation. A router sends an RA message through an interface to
enable all nodes on a link connected to the interface to use the same reachable time.
The value can be set. The default value is 0 in an RA message. Value 0 means that a
router does not use this field.
– Retrans Timer: a 32-bit retransmission field that indicates the interval at which NS
messages are resent. The Retrans Timer value is used during neighbor
unreachability detection and address resolution. The value can be set. The default
value is 0 in an RA message. Value 0 means that a router does not use this field.
– Options:
n Source link-layer option: only used on link layers that have addresses. A router
can omit this option to enable inbound load sharing among multiple link-layer
addresses.
n MTU option: sent on links that have a variable MTU.
n Prefix Information option: specifies one or more prefixes for address
autoconfiguration.
n Advertisement Interval option: interval (in milliseconds) at which RA
messages are sent. This option is used for mobile IPv6.
n Home Agent option: used for mobile IPv6.
n Route Information option: used by a host to generate routes that are more specific than the default route.
l Redirect: When a device finds that the inbound interface and outbound interface of a
packet are the same, the device can send Redirect messages to instruct the host that sends
the packet to choose a better next hop. Figure 7-42 shows the Redirect message format.
Figure 7-42 Redirect message format

0 7 15 31
Type Code Checksum
Reserved
Target Address
Destination Address
Options

A Redirect message contains the following fields:

– Reserved: This field must be initialized to 0 on the transmit end and be ignored on
the receive end.
– Target Address: a 128-bit next-hop address:
n If the better next hop is a router, that is, the destination is not on the local
link, the Target Address field must be set to the link-local address of the router.
n If the destination is a host on the local link, the Target Address field must be
set to the destination address.
– Destination Address: the 128-bit destination address carried in the IPv6 header of
the packet that triggers the Redirect message.
– Options:
n Target link-layer address option: new next-hop link-layer address.
n Redirected header option: contains the content of the IP packet that triggers the
sending of the Redirect message. The size of a Redirect message with this
option cannot exceed 1280 bytes.
l Neighbor Solicitation (NS): used by an IPv6 node to obtain the link-layer address of its
neighbor, check whether the neighbor is reachable, and perform duplicate address
detection. Figure 7-43 shows the NS message format.
Figure 7-43 NS message format

0 7 15 31
Type Code Checksum
Reserved
Target Address
Options
An NS message contains the following fields:

– Reserved: This field must be initialized to 0 on the transmit end and be ignored on
the receive end.
– Target Address: the 128-bit address of the target node in the solicitation. A link-local or
global unicast address can be used as a target address. A multicast address cannot.
– Options: An NS message can contain only the Source Link-Layer Address option, which
carries the link-layer address of the sender. If the unspecified address is used as the
IPv6 source address, the NS message cannot contain the Source Link-Layer Address option.
l Neighbor Advertisement (NA): After receiving an NS message, an IPv6 node responds
with an NA message. The IPv6 node proactively sends NA messages when link-layer
information changes. Figure 7-44 shows the NA message format.

Figure 7-44 NA message format

0 7 15 31
Type Code Checksum
R S O Reserved
Target Address
Options
An NA message contains the following fields:

– R: a 1-bit Router flag which identifies the role of the sender.
n 0: host
n 1: router
In the NUD scenario, a router that receives the NA message from its neighbor
router checks whether the neighbor router becomes a host based on the R flag.
– S: a 1-bit Solicited flag:
n 0: The NA message is not sent in response to an NS message.
n 1: The NA message is sent in response to an NS message.
During NUD, the S field is used to confirm whether a neighbor is reachable.
n 0: The NA message does not confirm reachability.
n 1: The NA message confirms that the neighbor is reachable.
The S field must be set to 0 in a multicast advertisement or an unsolicited unicast
advertisement. For example, if an NS message used for DAD is sent to a solicited-node
multicast address, and a node that receives the NS message detects an address
conflict, the node replies with an NA message with the S field set to 0.
– O: a 1-bit Override flag.
n 0: The receiving node uses the target link-layer address option to update the
cached neighbor entry only if the entry does not contain a link-layer address.
n 1: The receiving node uses the target link-layer address option to update the
cached neighbor entry, regardless of the cached link-layer address.
If the target address of an NS message is an anycast address or a proxy
advertisement is solicited, the O field must be set to 0. In other situations, for
example, the DAD scenario, the O field must be set to 1.
– Reserved: This field must be initialized to 0 on the transmit end and be ignored on
the receive end.
– Target Address: a 128-bit address.
n If an NA message is sent in response to an NS message, the Target Address
field is equal to the target address carried in the NS message.
n If an NA message is not a response to the NS message, the Target Address
field is equal to the IP address of a node with a changed link-layer address.

A multicast address cannot be used as a target address.
– Options: An NA message can contain only the target link-layer address option, which
carries the link-layer address of the sender.
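The NA flags described above can be decoded as shown in the following sketch. The example is illustrative only. It assumes the layout in Figure 7-44 (a 4-byte ICMPv6 header, a flags byte whose three highest-order bits are R, S, and O, and the Target Address starting at byte 8). Type value 137 for Redirect messages is taken from the ND specification and is not stated in this section. The names ND_TYPES and parse_na are hypothetical.

```python
import ipaddress
import struct

# ICMPv6 type values of the five ND messages (137 is an assumption from the
# ND specification; the other values appear in this section).
ND_TYPES = {133: "RS", 134: "RA", 135: "NS", 136: "NA", 137: "Redirect"}


def parse_na(message):
    """Decode the R, S, and O flags and the Target Address of an NA message."""
    if len(message) < 24:
        raise ValueError("an NA message has at least 24 bytes")
    msg_type, _code, _checksum, flags = struct.unpack("!BBHB", message[:5])
    if ND_TYPES.get(msg_type) != "NA":
        raise ValueError("not an NA message")
    return {
        "router": bool(flags & 0x80),     # R flag
        "solicited": bool(flags & 0x40),  # S flag
        "override": bool(flags & 0x20),   # O flag
        "target": str(ipaddress.IPv6Address(message[8:24])),
    }
```

A solicited NA message sent by a host, for example, carries R = 0 and S = 1.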
Router Discovery
Router discovery is used to locate a neighboring device and learn the address prefix and
configuration parameters related to address autoconfiguration. IPv6 router discovery is
implemented based on the following messages:
l Router Solicitation (RS) message
When a host is not configured with a unicast address, for example, when the system is
just started, it sends an RS message. An RS message helps the host rapidly perform
address autoconfiguration without waiting for the RA message periodically sent by an
IPv6 device. An RS message is an ICMPv6 message with type 133.
l Router Advertisement (RA) message
Interfaces on each IPv6 device periodically send RA messages only when they are
enabled to send IPv6 RA messages. After receiving an RS message from a host on
the local link, a device responds with an RA message. An RA message is sent to the
all-nodes multicast address (FF02::1) or to the IPv6 unicast address of the node that sends
the RS message. An RA message is an ICMPv6 message of type 134 and contains
the following information:
– Whether to use address autoconfiguration.
– Supported autoconfiguration type: stateless or stateful.
– One or multiple on-link prefixes. On-link nodes can perform address
autoconfiguration using these address prefixes.
– Lifetime of the advertised on-link prefixes
– Whether the device that sends an RA message can be used as a default device. If
yes, the lifetime, expressed in seconds, of the default device is also included.
– Other information about the host, such as the hop limit and the MTU that specifies
the maximum size of the packet initiated by a host.
After an IPv6 node on the local link receives the RA message, it extracts the preceding
information to obtain the updated default device list, prefix list, and other configurations.
Address Autoconfiguration
A router sends RA messages with the M field to instruct a host how to perform address
autoconfiguration. A host selects an address configuration mode based on the M flag in an RA
message shown in Figure 7-41. The configuration modes include stateless and stateful
address configuration.
l If the M field is set to 0, stateless address allocation is used. The host does not need to be
additionally configured, the router needs a few configurations, and no server is needed.
After a host receives an RA message, it uses prefix information in the message and local
interface ID to automatically calculate an IPv6 address. The host also sets the default
router according to the default router information in the message. Stateless address
allocation only applies to hosts, not routers.
l If the M field is set to 1, stateful address allocation is used. A server, for example, a
DHCPv6 server, assigns a host an IPv6 address. The server maintains a database that
contains the host information and configured addresses. Stateful address allocation
allows hosts to obtain IPv6 addresses from a server.

Hosts can select the mode for configuring other information, such as DNS and SIP server
address based on the O field carried in the RA messages:
l If the O field is set to 0, the host obtains IPv6 settings (except an IPv6 address) using a
stateless protocol, for example, ND.
l If the O field is set to 1, the host obtains IPv6 settings (except an IPv6 address) using a
stateful protocol, for example, DHCPv6.
NOTE
RFC 4861 states that if the M flag is set to 1, the O flag is redundant and can be ignored, because DHCPv6 returns all available configuration information.
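The selection logic can be sketched as follows. This is a minimal illustration, not the host implementation. It assumes the RA layout in Figure 7-41, in which the M and O flags are the two highest-order bits of the byte that follows Cur Hop Limit. The function names parse_ra_fixed_part, config_modes, and slaac_address are hypothetical, and the sketch assumes a 64-bit advertised prefix.

```python
import ipaddress
import struct


def parse_ra_fixed_part(message):
    """Decode the 16-byte fixed part of an RA message."""
    if len(message) < 16:
        raise ValueError("an RA message has at least 16 bytes")
    (msg_type, _code, _checksum, cur_hop_limit, flags,
     router_lifetime, reachable_ms, retrans_ms) = struct.unpack(
        "!BBHBBHII", message[:16])
    if msg_type != 134:
        raise ValueError("not an RA message")
    return {
        "cur_hop_limit": cur_hop_limit,
        "managed": bool(flags & 0x80),  # M flag
        "other": bool(flags & 0x40),    # O flag
        "router_lifetime": router_lifetime,
        "reachable_ms": reachable_ms,
        "retrans_ms": retrans_ms,
    }


def config_modes(managed, other):
    """Return (address mode, other-settings mode) chosen from the M and O flags."""
    address_mode = "stateful" if managed else "stateless"
    # With M set, a stateful protocol also supplies the other settings.
    other_mode = "stateful" if (managed or other) else "stateless"
    return address_mode, other_mode


def slaac_address(prefix, interface_id):
    """Combine an advertised /64 prefix with a 64-bit interface ID."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch assumes a 64-bit prefix")
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)
```

For example, with M = 0 and the documentation prefix 2001:db8:1:2::/64, a host whose interface ID is 0212:34FF:FE00:ABCD would form the address 2001:db8:1:2:212:34ff:fe00:abcd.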
Default Router Priority and Route Information

Two fields are defined in an RA message: the default router priority and the route
information. The two fields help a host select a suitable ATN to forward packets.
If there are multiple ATNs on the links connected to a host, the host needs to select suitable
ATNs based on different destination addresses of the packets to be forwarded. Each ATN
advertises its default router priority and specific route information to the host so that the host
can enhance its own capability of selecting suitable forwarding ATNs based on different IP
addresses of the packets to be forwarded.
After receiving an RA message that contains route information, the host updates its own
routing table. Before sending packets to other devices, the host can search the updated route
information to select a suitable route to forward the packets.
After receiving an RA message that contains the default router priority, the host updates its
own default router list. If the host does not have any route to select when sending packets to
other devices, the host will search the updated router list for the ATN with the highest priority.
If the ATN with the highest priority becomes faulty, the host selects another ATN in
descending order of priority.
Duplicate Address Detection

Duplicate address detection (DAD) is a detection mechanism used to determine whether an
IPv6 address is available. ND performs DAD before a unicast IPv6 address of an
interface takes effect. The unicast IPv6 address that is to take effect is called a tentative
address. ND sends NS messages for the tentative address to detect whether the address is
already in use. The target address in an NS message is set to the tentative address. The
source IPv6 address in the IPv6 header is the unspecified address. The destination address
is set to the solicited-node multicast address that corresponds to the target address.
Figure 7-45 illustrates the DAD process.

Figure 7-45 DAD process

Nodes: HostA, HostB, HostC
HostA tentative IP: FEC0::2:260:8FF:FE52:F9D8
Node that already uses the address: MAC 00-60-08-52-F9-D8, IP FEC0::2:260:8FF:FE52:F9D8

NS (multicast): Neighbor Solicitation
Destination MAC: 33-33-FF-52-F9-D8
Source Addr: ::
Destination Addr: FF02::1:FF52:F9D8
Target Addr: FEC0::2:260:8FF:FE52:F9D8

NA (multicast): Neighbor Advertisement
Destination MAC: 33-33-00-00-00-01
Source Addr: FEC0::2:260:8FF:FE52:F9D8
Destination Addr: FF02::1
Target Addr: FEC0::2:260:8FF:FE52:F9D8
Target Link-Layer Addr: 00-60-08-52-F9-D8
The DAD process is as follows:

1. When a node is configured with an IPv6 address, it immediately sends an NS message to
check whether this address is used by another neighboring node.
2. After receiving the NS message, a neighboring node checks whether the same IPv6
address exists. If the same IPv6 address exists, the neighboring node sends an NA
message with the IPv6 address to the source node.
3. After the source node receives the NA message, it considers that this IPv6 address is
used by a neighbor and does not use this IPv6 address. If the source node does not
receive the NA message, it considers the configured IPv6 address available and uses it.
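The destination addresses used in Figure 7-45 can be derived from the tentative address. The following sketch is illustrative only. It assumes that the solicited-node multicast address is formed by appending the last 24 bits of the target address to FF02::1:FF00:0/104, and that the multicast MAC address is 33-33 followed by the last 32 bits of the IPv6 multicast address, which matches the values shown in the figure. The function names are hypothetical.

```python
import ipaddress

SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_multicast(address):
    """Return the solicited-node multicast address for a unicast address."""
    low24 = int(ipaddress.IPv6Address(address)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)


def multicast_mac(group):
    """Map an IPv6 multicast address to its Ethernet multicast MAC address."""
    low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return "-".join(f"{octet:02X}" for octet in octets)


def dad_ns_addresses(tentative):
    """Return the IPv6 source, destination, and target of a DAD NS message."""
    return {
        "source": "::",  # unspecified address
        "destination": str(solicited_node_multicast(tentative)),
        "target": str(ipaddress.IPv6Address(tentative)),
    }
```

For the tentative address FEC0::2:260:8FF:FE52:F9D8, the NS message is sent to FF02::1:FF52:F9D8 with destination MAC address 33-33-FF-52-F9-D8.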
Neighbor Discovery
Similar to IPv4 ARP, IPv6 ND resolves the addresses of neighbors and monitors the
reachability of neighbors based on NS and NA messages.
When a node needs to obtain the link-layer address of another node on the same local link, it
sends an NS message of type 135. The NS message is similar to an IPv4 ARP Request
message but is destined for a solicited-node multicast address instead of a broadcast address.
Only nodes whose addresses have the same last 24 bits as the multicast address receive the
NS message. This reduces the number of nodes that must process the message and helps
minimize the possibility of broadcast storms. The destination node fills in its link-layer
address in the NA message that it returns.

Figure 7-46 Neighbor address resolution

Nodes: HostA, HostB, HostC
HostA: MAC 00-10-5A-AA-20-A2, IP FE80::210:5AFF:FEAA:20A2
Target node: MAC 00-60-97-02-6E-A5, IP FE80::260:97FF:FE02:6EA5

NS (multicast): Neighbor Solicitation
Destination MAC: 33-33-FF-02-6E-A5
Source Addr: FE80::210:5AFF:FEAA:20A2
Destination Addr: FF02::1:FF02:6EA5
Target Addr: FE80::260:97FF:FE02:6EA5
Source Link-Layer Addr: 00-10-5A-AA-20-A2

NA (unicast): Neighbor Advertisement
Destination MAC: 00-10-5A-AA-20-A2
Source Addr: FE80::260:97FF:FE02:6EA5
Destination Addr: FE80::210:5AFF:FEAA:20A2
Target Addr: FE80::260:97FF:FE02:6EA5
Target Link-Layer Addr: 00-60-97-02-6E-A5
An NS message is also used to monitor the reachability of a neighbor when the link-layer
address of the neighbor is known. After receiving an NS message, a destination node responds
with an NA message of type 136 on the local link. After receiving the NA message, the source
node can communicate with the destination node. When the link-layer address of a node on
the local link changes, the node proactively sends an NA message.
IPv6 Security Neighbor Discovery

IPv6 neighbor discovery (ND) is used to verify the reachability of neighbors on the local link,
which is critical to network security. In recent years, network security issues have become an
increasingly relevant concern, and threats related to the ND protocol have been examined and
discussed at length. RFC 3756 defines key security threats to network access using ND, and
in response, RFC 3971 defines SEcure Neighbor Discovery (SEND) and provides two SEND
mechanisms: SEND cryptographically generated address (CGA) and SEND authorization
delegation discovery (ADD). SEND CGA is able to counter most network attacks. To counter
the malicious last-hop router attacks and bogus address prefix attacks, SEND ADD needs to
be deployed.
l ND-specific Security Threats and Mechanisms
Table 7-12 outlines security threats to the ND protocol as well as the security
mechanisms designed to address these threats.

Table 7-12 ND-specific security threats and mechanisms

Sec Secur Attack Principle Defense Principle
urit ity
y Mech
Thr anis
eat m
NS/ SEND An attacker sends a legitimate The key reason why such an attack
NA CGA node (host or ATN) a neighbor is launched during ND address
spoo solicitation (NS) message that resolution is that legitimate nodes
fing contains a bogus source link- fail to determine the IPv6
layer address option or a addresses and link-layer addresses
neighbor advertisement (NA) as well as the bindings between
message that contains a bogus them. Therefore, legitimate nodes
target link-layer address incorrectly receive NS or NA
option. NS/NA spoofing messages sent from the attacker. In
causes messages for the response to this attack, SEND
legitimate node to be sent to CGA combines a CGA address, a
the bogus address. CGA option, and an RSA option to
authenticate the validity of the
source address carried in an ND
message.
Neig SEND An attacker repeatedly sends SEND counters this attack by

hbor CGA falsified NA messages in requiring the NA message in
unre response to an NS message response to an NS message to
acha sent by a legitimate node include an RSA Signature option
bility during NUD. As a as a proof of authorization to use
detec consequence, the legitimate the interface identifier in the
tion node fails to detect neighbor address being tested. If these
(NU unreachability. The prerequisites are not met, the node
D) consequences of this attack performing NUD discards NA
failu depend on why the neighbor messages.
re became unreachable and how
the legitimate node would
behave if it knew that the
neighbor has become
unreachable.
Dupl SEND An attacker responds to every SEND counters this attack by

icate CGA DAD attempt made by a host requiring the NA message in
addr that accesses the network. If response to an NS message to
ess the attacker claims the address, include an RSA Signature option
detec then the host will never obtain as a proof of authorization to use
tion an address. the interface identifier in the
(DA address being tested. If these
D) prerequisites are not met, the node
attac performing DAD discards NA
ks messages.

Equipment

urit ity
y Mech
Thr anis
eat m
Spoo SEND An attacker uses the link-layer SEND counters this attack by
fed CGA address of the current first-hop requiring a Redirect message to
Redi router to send a Redirect contain an RSA Signature option.
rect message to a legitimate host. The RSA signature is calculated
mess The legitimate host accepts this using the public key of the
age message because the host legitimate host. All messages that
mistakenly considers that the fail to pass the RSA signature-
message came from the first- based authentication are discarded.
hop router.
Replay attacks | SEND CGA | An attacker captures valid messages and replays them. That is, even if Neighbor Discovery Protocol (NDP) messages are cryptographically protected so that their contents cannot be forged, they are still prone to replay attacks. | SEND protects solicited exchanges against replay by including a Nonce option in solicitation messages (such as NS/RS messages) and requiring response messages (such as NA/RA messages) to include a matching Nonce option. SEND protects unsolicited messages (such as NA/RA/Redirect messages) against replay by including a Timestamp option.
Malicious last-hop router | SEND ADD | An attacker multicasts bogus RA messages or unicasts bogus RA messages in response to multicast RS messages to a host attempting to discover a last-hop router. If the host selects the attacker as its default router, the attacker is able to intercept all messages exchanged between the host and its destination and insert new contents into messages. This attack is also called a man-in-the-middle attack. | NDP allows a node to perform autoconfiguration based on information learned shortly after connecting to a new link. It would be particularly difficult for a node to distinguish between valid and invalid sources of router information, because the node needs this information before communicating with nodes outside of the link. As the newly-connected node cannot communicate off-link, it cannot be responsible for searching for information to help validate routers. However, given a certificate path, the node can check another device's search results and conclude that a particular message comes from an authorized source.

Bogus address prefix | SEND ADD | An attacker sends a falsified RA message specifying that some prefixes are on-link. If a prefix is on-link, a host will not send any packets that contain this prefix to the router. Instead, the host will send NS messages to attempt address resolution, but the NS messages will not result in a response, denying services to the attacked host. | The security mechanism designed for this attack is similar to that of the last-hop router attack. SEND requires that routers be certified. This certification is jointly implemented by hosts and routers. Specifically, a host must be configured with a trust anchor to which a router has a certification path before the host selects the router as its default router.
l Basic Concepts
Table 7-13 SEND options and messages

SEND Mechanism | Options and Messages | Description
CGA | CGA address | The interface identifier of a CGA address is generated using a one-way hash function from the sender's public key and some additional parameters.
CGA | CGA option | The CGA option includes the sender's amendment value and public key. The receiver can use the CGA option to verify the sender's CGA.
CGA | RSA Signature option | The RSA Signature option includes the hash value of the sender's public key and the digital signature constructed using the sender's private key and ND messages. The receiver uses the RSA Signature option to verify the integrity of ND messages and authenticate the identity of the sender.
CGA | Timestamp option | The value of the Timestamp option is a 64-bit unsigned integer, indicating the number of seconds since January 1, 1970, 00:00 (UTC). The receiver needs to use the Timestamp option to ensure that the last received packet has the latest timestamp, protecting unsolicited advertisement and redirect messages from being replayed.

CGA | Nonce option | The Nonce option contains a random number selected by the sender of a solicitation message. For example, if an NS message carries the Nonce option, the sender considers the NA message in response to the NS message valid only when the NA message carries a matching Nonce option. In this manner, the Nonce option prevents replay attacks launched during exchange of request and response messages.
ADD | Certification Path Solicitation (CPS) message | A CPS message is sent by a host when it wishes to request a certificate path between an ATN and one of the host's trust anchors. When a router advertisement (RA) message has been received with a public key that is not available from a certificate in the host's cache, or when there is no certification path to one of the host's trust anchors, the host will send a CPS message carrying the Trust Anchor option to search for the certification path. Currently, the Options field in a CPS message includes one or more Trust Anchor options.
ADD | Certification Path Advertisement (CPA) message | A CPA message is sent by an ATN to a host to advertise the requested certificate. The Options field in a CPA message includes a Certificate option and zero or more Trust Anchor options.
ADD | Trust Anchor option | The Trust Anchor option identifies a trust anchor for which a given certification path should be constructed.
ADD | Certificate option | The Certificate option is included only in a CPA message and carries the certificate contents requested by a CPS message.
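The CGA address entry in Table 7-13 can be illustrated with a minimal Python sketch. It assumes the SHA-1 based derivation commonly described for CGAs with a security level of 0 (no brute-force search for the modifier), omits extension fields, and uses an arbitrary byte string in place of a real encoded public key. The function names are hypothetical and are not part of any ATN software.

```python
import hashlib
import ipaddress


def cga_interface_id(modifier: bytes, subnet_prefix: bytes,
                     collision_count: int, public_key: bytes) -> bytes:
    """Derive a 64-bit interface identifier (assumption: Sec = 0, no extensions)."""
    if len(modifier) != 16 or len(subnet_prefix) != 8:
        raise ValueError("modifier must be 16 bytes and subnet prefix 8 bytes")
    if collision_count not in (0, 1, 2):
        raise ValueError("collision count must be 0, 1, or 2")
    data = modifier + subnet_prefix + bytes([collision_count]) + public_key
    iid = bytearray(hashlib.sha1(data).digest()[:8])
    # Clear the three leftmost bits (Sec = 0) and the "u" and "g" bits.
    iid[0] &= 0x1C
    return bytes(iid)


def cga_address(subnet_prefix: bytes, interface_id: bytes) -> ipaddress.IPv6Address:
    """Combine the 64-bit prefix and the derived interface identifier."""
    return ipaddress.IPv6Address(subnet_prefix + interface_id)


def verify_cga(address: ipaddress.IPv6Address, modifier: bytes,
               collision_count: int, public_key: bytes) -> bool:
    """Receiver side: recompute the identifier from the CGA option contents."""
    packed = address.packed
    expected = cga_interface_id(modifier, packed[:8], collision_count, public_key)
    return packed[8:] == expected
```

A receiver that obtains the modifier and public key from the CGA option can call verify_cga() on the source address. A node that does not hold the matching key pair cannot produce a valid RSA signature for that address.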
l Deployment Model
– SEND deployment with no public key infrastructure (PKI)
Figure 7-47 SEND deployment with no PKI
[Figure: Host A, Host B, and Device exchange SEND-NS(SLLA, CGA, Nonce, (Timestamp), RSA) and SEND-NA(SLLA, CGA, Nonce, (Timestamp), RSA) messages.]
In this deployment model, an ND message uses a CGA address as its IPv6 source address
and carries the CGA, RSA Signature, Timestamp, and Nonce options. A host or ATN
verifies message integrity, message source, and message authenticity by checking the
CGA address and these options.
This deployment model counters only CGA-related attacks.
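The replay checks that rely on the Nonce and Timestamp options can be sketched as follows. This is a simplified illustration, not the device implementation: the 300-second acceptance window, the per-sender cache, and all names are assumptions made for the example.

```python
import os
import time
from typing import Dict, Optional


def new_nonce(length: int = 6) -> bytes:
    """Random value placed in the Nonce option of a solicitation."""
    return os.urandom(length)


def response_matches(solicit_nonce: bytes, advert_nonce: Optional[bytes]) -> bool:
    """A solicited response is accepted only if it echoes the same nonce."""
    return advert_nonce is not None and advert_nonce == solicit_nonce


class ReplayGuard:
    """Timestamp check for unsolicited messages (assumed policy values)."""

    def __init__(self, max_delta: float = 300.0) -> None:
        self.max_delta = max_delta
        self.last_timestamp: Dict[str, float] = {}

    def accept_unsolicited(self, sender: str, timestamp: float,
                           now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Reject timestamps too far from the local clock.
        if abs(now - timestamp) > self.max_delta:
            return False
        # Reject anything not newer than the last accepted message.
        if timestamp <= self.last_timestamp.get(sender, float("-inf")):
            return False
        self.last_timestamp[sender] = timestamp
        return True
```

A captured advertisement that is replayed later fails the second check, because its timestamp is not newer than the one already recorded for that sender.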
– SEND deployment with PKI
This deployment model is used when SEND ADD needs to be implemented. Some
PKI related concepts are described as follows:
n Digital certificate: It is an important component of PKI. A digital certificate
(also known as a certificate) is an electronic document that is issued by a
certificate authority (CA) and used to prove ownership of a public key. Such a
certificate includes information about the key, information about its owner's
identity, and the digital signature of an entity that has verified the certificate's
contents. A digital certificate provides a technical guarantee for a digital
signature. Generally, mainstream digital certificates comply with the X.509 v3
standard, providing a public key of the entity being certified and allowing
verification of the entity's identity. In a typical PKI scheme, two certificates
and two key pairs are offered for each entity, with one certificate for digital
signature and the other certificate for encryption. Theoretically, the encryption
certificate cannot be used to certify a digital signature.
n CA: It serves as PKI's brain. A CA is an authoritative third party that is trusted
to sign digital certificates in a fair manner. In the case of large-scale PKI where
multiple users need to be certified and issued a certificate, the use of only one
CA may lead to an overload condition. This situation requires layered CA
deployment. That is, a trust chain needs to be established from top to bottom
for all CAs, with lower-layer CAs trusting upper-layer CAs. The certificates
for lower-layer CAs are issued and certified by upper-layer CAs.
n Trust anchor: To verify a user certificate, the verifier must first obtain the public key
of the CA that issued the certificate and then use it to check the CA's signature on the
certificate. Each CA certificate is in turn certified by an upper-layer CA,
forming a certificate path. A certificate path ends at a trust point. The trust
point, also called a trust anchor, is typically the root CA holding a self-signed
certificate or a trusted intermediate CA. Verification of a certificate path generally
starts from the trust anchor.
n Certificate revocation list (CRL): In the event of a private key leak, user
service interruption, or a change in user identity, user information, or public
key, the certificate must be revoked to cancel the binding
between the public key and the user identity. In PKI schemes, routers handle
certificate revocation by applying to the CA server administrator for a CRL in
an out-of-band manner, such as by phone or email.
n IP address extension: The ATN can advertise a combination of certified
prefixes (contained in the IP address extension of a certificate) and uncertified
prefixes (not contained in the IP address extension of a certificate). Uncertified
prefixes, which are considered insecure, may be discarded by SEND nodes
that do not interact with non-SEND nodes. Certified prefixes can be classified
as constrained prefixes or unconstrained prefixes.
○ Constrained prefix: If a network administrator wants to constrain the
prefixes to be routed by an ATN, the ATN needs to be configured with a
certificate having the constrained prefixes listed in the IP address
extension. The ATN can then advertise only the prefixes that are within
the prefix range specified in the certificate.
○ Unconstrained prefix: If the IP address extension in the certificate for an
ATN is missing or is the null prefix (::/0), the prefixes that the ATN
advertises are said to be unconstrained. That is, the ATN is allowed to
advertise any prefix.
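The difference between constrained and unconstrained prefixes can be shown with a short Python sketch. The function name and the representation of the IP address extension as a list of prefix strings are assumptions made for illustration only.

```python
import ipaddress
from typing import List


def may_advertise(prefix: str, certified_prefixes: List[str]) -> bool:
    """Return True if the prefix falls within the certificate's IP address extension.

    An empty list models a certificate without the extension (unconstrained).
    """
    advertised = ipaddress.IPv6Network(prefix)
    if not certified_prefixes:
        return True
    return any(advertised.subnet_of(ipaddress.IPv6Network(certified))
               for certified in certified_prefixes)
```

With a certificate that lists 2001:db8::/32, the prefix 2001:db8:1::/64 is accepted and 2001:db9::/64 is not. With ::/0 or no extension, every prefix is accepted.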
Figure 7-48 SEND deployment with PKI
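[Figure: The CA hierarchy CA (C0), CA (C1), and CA (C2), together with the CRL and the trust anchors (TA), is off-link. Device (CR), Host A, and Host B are on-link. Legend: CA = Certificate Authority, TA = Trust Anchor, CRL = Certificate Revocation List.
Message flow between the host and the device:
1. RS(SLLA, CGA, Nonce, (Timestamp), RSA)
2. RA(SLLA, CGA, Nonce, (Timestamp), RSA)
3. CPS(Trust Anchor)
4. CPA(Trust Anchor, Certificate(C1)), CPA(Trust Anchor, Certificate(C2)), CPA(Trust Anchor, Certificate(CR))
5. Certificate verification, signature verification, and prefix verification]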
As shown in Figure 7-48, when SEND ADD is deployed with PKI, both the host
and ATN use trust anchors. The ATN is certified using CA2, with CA2 certified
using CA1 and CA1 certified using CA0. CA0 is trusted by the host.
SEND ADD includes offline preparation and online operation.
n Offline preparation:
○ Certificate C0 and a CRL need to be designated for and loaded to the
host.
○ The certification path (from CR, C2, C1, to C0) with C0 as the trust
anchor certificate needs to be designated for and loaded to the ATN.
n Online operation:
1) The host multicasts an RS message to all devices on the network. The RS
message must carry a CGA address and the CGA, RSA Signature,
Timestamp, and Nonce options.
2) Upon receiving the RS message, the ATN uses the CGA algorithm to
check the validity of the RS message as well as the Timestamp and Nonce
options. If the check fails, the ATN discards the RS message. If the check
succeeds, the ATN replies with an RA message that carries a CGA
address and the CGA, RSA Signature, Timestamp, and Nonce options.
3) Upon receiving the RA message, the host uses the CGA algorithm to
check the validity of the RA message as well as the Timestamp and
Nonce options. If the check fails, the host discards the RA message. If the
check succeeds, the host sends the ATN a CPS message that carries the
Trust Anchor option, requesting a certification path to the trust anchor
(C0).
NOTE
The host does not send a CPS message if it already holds a certification path from
the ATN to one of the host's trust anchors.
4) Upon receiving the CPS message, if the ATN fails to locate the
certification path to the requested trust anchor contained in the CPS
message, the ATN replies with a CPA message that carries a Trust Anchor
option but no Certificate option. If the ATN successfully locates the
desired certification path, the ATN replies with a CPA message that
carries both the Certificate and Trust Anchor options. If the
certification path contains multiple certificates, the ATN replies with
multiple CPA messages, with each message carrying the contents of a
single certificate. The CPA messages are sent in path order, starting with
the certificate issued by the trust anchor (C0): C1, C2, and then CR.
NOTE
The CPA message does not need to contain information about a trust anchor
certificate because the trust anchor certificate (C0) has been loaded to the host.
5) Upon receiving the CPA message, the host performs validity checks in the
following order:
1) Certificate authentication:
The host checks the certificates carried in the CPA message from the
trust anchor certificate (C0) to the ATN certificate (CR). If any
certificate is detected to be missing, the authentication fails, and the
host discards the RA message.
If the host is not yet connected, it cannot perform an online
CRL check and therefore cannot determine whether the ATN
certificate (CR) is valid. The certificates that pass the authentication
are considered temporary ones until the certificate authentication is
complete. Once the host becomes connected, the host must
immediately perform a CRL check on the ATN certificate (CR). If the
CRL check fails, the host instantly stops using the ATN as the
default router and selects another ATN to take its place.
NOTE
The host caches the certificates that passed the authentication so that no
more CPS messages will be sent upon receiving an RA message. The
cached certificates require periodic CRL checks to ensure that they remain valid.
2) Signature authentication:
If certificate authentication succeeds, the host uses the public key
carried in the ATN certificate (CR) to certify the digital signature
contained in the RSA Signature option of the RA message. If
signature authentication fails, the host discards the RA message.
3) (Optional) Prefix authentication:
If signature verification succeeds and the ATN certificate (CR)
contains the IP address extension, the host authenticates the prefix
carried in the IP address extension.
○ If stateful address autoconfiguration is performed, the host
authenticates the prefix provided by the DHCP server rather
than the prefix carried in the RA message.
○ If stateless address autoconfiguration is performed, the host
authenticates the prefix (or prefix range) carried in an RA
message sent by the ATN.
NOTE
If none of the prefixes carried in the RA message are within the
prefix range specified in the IP address extension contained in the
ATN certificate (CR), the host stops using that ATN as the default
router and selects another ATN to take its place.
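The certificate authentication step walks the certification path from the trust anchor down to the ATN certificate. The sketch below models only the chain and revocation logic. It deliberately omits cryptographic signature verification, and the Cert class, field names, and revoked set are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Cert:
    subject: str   # entity the certificate was issued to
    issuer: str    # CA that issued the certificate


def validate_path(trust_anchor: str, path: List[Cert],
                  revoked: FrozenSet[str] = frozenset()) -> bool:
    """Check a path ordered from the trust anchor's child down to the router.

    Fails if any certificate in the chain is missing or listed in the CRL.
    """
    if not path:
        return False
    expected_issuer = trust_anchor
    for cert in path:
        if cert.issuer != expected_issuer or cert.subject in revoked:
            return False
        expected_issuer = cert.subject
    return True
```

For the topology in Figure 7-48, the path would be C1 (issued by C0), C2 (issued by C1), and CR (issued by C2). Dropping C2 from the list makes the check fail, which corresponds to the case in which the host discards the RA message.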
7.6.2.6 Path MTU
Introduction to the MTU

IPv6 packets cannot be fragmented by transit nodes. If a packet is longer than the path MTU
(PMTU), a transit node discards it, and the source node has to retransmit the data in smaller
packets. This reduces transmission efficiency. If the source node instead always uses the
minimum IPv6 MTU of 1280 bytes as the maximum fragment length, the PMTU is in most
cases greater than this minimum, so the fragments sent by the node are smaller than the
path allows. As a result, network resources are wasted. The PMTU
discovery protocol is introduced to solve this problem.
Principle of the Path MTU

PMTU discovery is the process of dynamically discovering the largest IPv6 MTU that can be
used on the path from the source to the destination. When an IPv6 node sends a large amount
of data to another node, the data is transmitted as a series of IPv6 fragments. When these
fragments are of the maximum length that can be transmitted successfully between the source
node and destination node, the fragment length is considered optimal and is called the PMTU.
A source node initially assumes that the PMTU of a path is the known IPv6 MTU of the first
hop on the path. If a packet sent along this path is too large to be forwarded, the transit
node discards the packet and returns an ICMPv6 Packet Too Big message to the source
node. The source node then sets the PMTU of the path according to the IPv6 MTU in the
message.
When the PMTU learned by the node is smaller than or equal to the actual PMTU, the PMTU
discovery process is complete. Before the PMTU discovery process is complete, ICMPv6
Packet Too Big messages may be sent and received repeatedly because links with smaller
IPv6 MTUs may be found farther along the path.
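The discovery loop can be simulated with a few lines of Python. This is a sketch under simplifying assumptions: the path is modeled as a list of link MTUs, every oversized packet triggers exactly one too-big message, and no messages are lost. The function name is hypothetical.

```python
from typing import List, Tuple


def discover_pmtu(link_mtus: List[int]) -> Tuple[int, int]:
    """Return (PMTU, number of too-big messages received by the source)."""
    if not link_mtus:
        raise ValueError("the path needs at least one link")
    pmtu = link_mtus[0]          # start with the MTU of the first hop
    too_big_messages = 0
    while True:
        for mtu in link_mtus:
            if pmtu > mtu:
                # The transit node drops the packet and reports its MTU.
                pmtu = mtu
                too_big_messages += 1
                break
        else:
            # The packet crossed every link, so discovery is complete.
            return pmtu, too_big_messages
```

For a path with link MTUs of 1500, 1400, 1500, and 1300 bytes, the source first lowers its estimate to 1400 and then to 1300, so two too-big messages are needed before discovery completes.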
7.6.2.7 Dual Protocol Stacks

An IPv6 node that retains the complete IPv4 protocol stack to be compatible with IPv4 is a
dual-stack node. Figure 7-49 shows the structure of a single protocol stack and dual protocol
stacks.
Figure 7-49 Structure of a single protocol stack and dual protocol stacks in Ethernet
[Figure: IPv4 stack: IPv4 application; TCP and UDP; IPv4 (Protocol ID 0x0800); Ethernet. Dual stack: IPv4/IPv6 application; TCP and UDP; IPv4 (Protocol ID 0x0800) and IPv6 (Protocol ID 0x86DD); Ethernet.]
Dual protocol stacks have the following advantages:
l Multiple link protocols support dual protocol stacks.
Multiple link protocols, such as Ethernet, support dual protocol stacks. In Figure 7-49,
the link protocol is Ethernet. In an Ethernet frame, a Protocol ID field of 0x0800
indicates that the frame carries an IPv4 packet for the network layer, and a value of
0x86DD indicates that it carries an IPv6 packet.
l Multiple applications support dual protocol stacks.
Multiple applications, such as DNS, FTP, and Telnet, support dual protocol stacks.
An upper-layer application such as DNS can use TCP or UDP as the transport
layer protocol, and prefers the IPv6 protocol stack to the IPv4 protocol stack as
the network layer protocol.
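How a dual-stack node hands a received frame to the right network layer can be sketched as follows. The example assumes an untagged Ethernet II frame, in which the Protocol ID (EtherType) occupies bytes 12 and 13. The function name is hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD


def network_protocol(frame: bytes) -> str:
    """Classify an untagged Ethernet II frame by its Protocol ID field."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: Protocol ID.
    (protocol_id,) = struct.unpack("!H", frame[12:14])
    if protocol_id == ETHERTYPE_IPV4:
        return "IPv4"
    if protocol_id == ETHERTYPE_IPV6:
        return "IPv6"
    return "unknown"
```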
7.6.2.8 TCP6
Transmission Control Protocol Version 6 (TCP6) provides a mechanism to establish virtual
circuits between processes of two endpoints. A TCP6 virtual circuit is similar to a
full-duplex circuit that transmits data between systems. Because it provides reliable data
transmission between processes, TCP6 is called a reliable protocol. TCP6 also provides a
mechanism to optimize the transmission performance according to the network status. When
all the data can be received and acknowledged, the transmission rate increases gradually.
When acknowledgement packets are delayed, however, the sending host decreases its
sending rate.
TCP6 is generally used in interactive applications, such as the Web. However, certain errors in
data receiving affect the normal operation of devices. TCP6 establishes virtual circuits by
using the three-way handshake mechanism and tears them down through the
four-way handshake. TCP6 connections provide checksums and reliability functions,
but these functions increase overhead. As a result, TCP6 is less efficient than User
Datagram Protocol Version 6 (UDP6).
Figure 7-50 shows the establishment and removal of a TCP6 connection.
Figure 7-50 Establishment and removal of a TCP6 connection
[Figure: Socket call sequence between a client and a server.
Set up a connection: both sides call socket and receive its return value. The server calls bind and listen, then calls accept and waits. The client calls connect (SYN), the server answers (SYN|ACK), and the client acknowledges (ACK). The client receives the return value of connect, and the server receives the return value of accept.
Data transmission: the client calls recv and waits. The server calls send (Data), and the client receives the return value of recv. The server calls recv and waits. The client calls send (Data|ACK), and the server receives the return value of recv (ACK).
Close the connection: the client calls close (FIN) while the server waits in recv. The server acknowledges (ACK) and receives the return value 0 of recv. The server then calls close (FIN), and the client acknowledges (ACK).]
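The call sequence in Figure 7-50 can be reproduced with the standard Python socket module. This is a minimal sketch that assumes the IPv6 loopback address ::1 is available on the machine running it. The function name and payloads are hypothetical.

```python
import socket
import threading
from typing import List, Tuple


def tcp6_exchange(host: str = "::1") -> Tuple[bytes, List[bytes]]:
    """Run one connect / send / recv / close cycle over TCP on IPv6."""
    server = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    server_log: List[bytes] = []
    try:
        server.bind((host, 0))            # port 0: the OS picks a free port
        server.listen(1)
        port = server.getsockname()[1]

        def serve() -> None:
            conn, _peer = server.accept()  # returns after the handshake
            with conn:
                conn.sendall(b"hello from server")
                server_log.append(conn.recv(1024))  # data from the client
                server_log.append(conn.recv(1024))  # b"" after the client's FIN

        worker = threading.Thread(target=serve, daemon=True)
        worker.start()
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as client:
            client.connect((host, port))   # SYN, SYN|ACK, ACK
            greeting = client.recv(1024)
            client.sendall(b"hello from client")
        # Leaving the "with" block closes the client socket (FIN).
        worker.join()
    finally:
        server.close()
    return greeting, server_log
```

The empty byte string at the end of the server log corresponds to the return value 0 of recv in the figure: it tells the server that the client has closed its side of the connection.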
7.6.2.9 UDP6
User Datagram Protocol Version 6 (UDP6) is a computer communication protocol used to
exchange packets on a network. UDP6 has the following characteristics:
l UDP6 uses only source and destination information and is mainly used in simple
request/response structures.
l UDP6 is unreliable, so it cannot be determined whether UDP6 datagrams reach their
destinations.
l UDP6 is connectionless. That is, no virtual circuits are required during data transmission
between hosts.
The connectionless feature of UDP6 enables UDP6 to send data to multicast addresses. This
is different from TCP6, which requires specific source and destination addresses.

7.6.2.10 RawIP6
RawIP6 fills only a limited number of fields in the IPv6 header, and it allows application
programs to provide their own IPv6 headers.
RawIP6 is similar to UDP6 in the following aspects:
l RawIP6 is unreliable, so it cannot be determined whether RawIP6 datagrams reach their
destinations.
l RawIP6 is connectionless. That is, no virtual circuits are required during data
transmission between hosts.
Unlike UDP6, RawIP6 allows application programs to directly operate the IP layer through
the socket. This facilitates the direct interactions with the lower layer.
7.6.3 Applications
ND Proxy Within a VLAN

On an IPv6 network, if two users are in the same VLAN but user isolation is configured for
the VLAN, to enable the two users to communicate with each other, you need to enable ND
proxy within a VLAN on the VLANIF interface.
As shown in Figure 7-51, PC1 and PC2 belong to VLAN 1. Interface isolation in a VLAN is
configured on S1 and therefore PC1 and PC2 cannot communicate. In such a case, you can
configure ND proxy within a VLAN on VLANIF1 of the ATN so that PC1 can communicate
with PC2.
Figure 7-51 Networking diagram of ND proxy within a VLAN
[Figure: PC 1 and PC 2 in VLAN 1 connect to Switch1, which connects to VLANIF1 on the Router.]
If PC1 needs to communicate with PC2, PC1 first sends an NS packet to query the MAC
address of PC2. The NS packet, however, cannot reach PC2 directly because interface
isolation is configured on S1. The routing device is therefore responsible for forwarding
the NS packet to PC2. On the routing device, the MAC address carried in the NS packet is
changed to the MAC address of VLANIF1. PC2 then returns an NA packet to PC1. After
receiving the NA packet, the routing device generates an ND entry for PC2 and related
routing entries, changes the MAC address carried in the NA packet to its own MAC address,
and forwards the NA packet to PC1. In this manner, the MAC address of PC2 learnt by PC1 is
actually the MAC address of the routing device.
PC1 then encapsulates packets based on the learnt ND entries and sends the packets to the
ATN, and the ATN forwards the packets to PC2 based on the learnt route.
ND Proxy Between VLANs

In the scenario where two users belong to different VLANs, if the two users attempt to
communicate, you need to enable ND proxy between VLANs on the VLANIF interface of the
Aggregate-VLAN.
As shown in Figure 7-52, PC1 and PC2 belong to VLAN 1 (Access-VLAN) and VLAN 2
(Access-VLAN) respectively and are connected to the routing device through S1 and S2
separately. VLAN 1 and VLAN 2 both belong to VLAN 3 (Aggregate-VLAN). In such a
case, you can configure ND proxy between VLANs on VLANIF3 of the ATN so that PC1 can
communicate with PC2.
Figure 7-52 Networking diagram of ND proxy between VLANs
[Figure: ATN A connects through GE0/2/1 to Switch1 (VLAN 1) and through GE0/2/2 to Switch2 (VLAN 2). VLAN 1 and VLAN 2 belong to VLAN 3.]
If PC1 needs to communicate with PC2, PC1 first sends an NS packet to query the MAC
address of PC2. The routing device then changes the MAC address carried in the NS packet to
the MAC address of VLANIF3 on the routing device. PC2 then returns an NA packet to PC1.
After receiving the NA packet, the routing device generates an ND entry for PC2 and related
routing entries, changes the MAC address carried in the NA packet to its own MAC address,
and forwards the NA packet to PC1. In this manner, the MAC address learnt by PC1 is
actually the MAC address of the routing device.
PC1 then encapsulates packets based on the learnt ND entries and sends the packets to the
routing device, and the routing device forwards the packets to PC2 based on the learnt route.


Terms
Term Explanation
IPv6 Internet Protocol Version 6, which is also called IP Next Generation.
ND Neighbor discovery, which is used during the forwarding of IPv6 packets for
duplicate address detection, neighbor address resolution, and neighbor
reachability detection. Additionally, ND is a set of protocols and processes for
host address configuration. In ND, different ICMPv6 messages are used for
router discovery and neighbor discovery.
ICMPv6 Internet Control Message Protocol Version 6, which is a base protocol of IPv6
and generates error messages and informational messages used by IPv6 nodes
to report errors and information generated during packet processing.
PMTU Path MTU, which is the MTU supported on a specific path and is discovered by using
ICMPv6 Packet Too Big messages.
Abbreviations
ICMPv6 Internet Control Message Protocol Version 6
ND Neighbor Discovery
RS Router Solicitation
RA Router Advertisement
NS Neighbor Solicitation
NA Neighbor Advertisement
PMTU Path MTU
IPv6 Internet Protocol Version 6
IPng IP Next Generation
TCP6 Transmission Control Protocol 6
UDP6 User Datagram Protocol 6
RawIP6 Raw IP6

8 IP Routing
About This Chapter
This chapter describes IP routing in terms of its overview, principles, and applications.
8.1 IP Routing Overview

8.2 Static Routes
8.3 RIP
8.4 RIPng
8.5 IS-IS
8.6 OSPF
8.7 OSPFv3
8.8 BGP
8.9 Routing Policies
8.10 Appendix List of Port Numbers of Common Protocols
8.1 IP Routing Overview
8.1.1 Introduction to IP Routing
Definition
Routing is the basic element of data communication networks. Routing information guides
data packet forwarding. IP routing refers to the process of relaying and forwarding packets.
8.1.2 Principles

8.1.2.1 Routers
On the Internet, network connecting devices control traffic and ensure the quality of data
transmission on the network. Common network connecting devices include hubs, bridges,
switches, and routers.
As a typical network connection device, a router is used to select routes and forward packets.
According to the destination address in the received packet, a router selects a proper path,
which may consist of one or multiple hops, and sends the packet to the next router. The last
router is responsible for sending the packet to the destination host. In addition, the router can
select an optimal path to transmit data.
The hop count from a router to its directly connected network is zero, and the hop count to a
network reached through one other router is one. The hop counts of other routes can be
deduced by analogy. If a router is connected to another router through a network, that is, a
network segment exists between the two routers, the two routers are considered adjacent
routers on the Internet. This connection between routers is independent of the physical links
that constitute each network segment.
In Figure 8-1, to get from Host A to Host C, a packet needs to go through three networks and
two routers. The bold arrows indicate network segments.
Figure 8-1 Hop count and network segments
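[Figure: Host A, Host B, and Host C are attached to networks interconnected by routers. Bold arrows mark the network segments on the path from Host A to Host C.]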
The size of networks may vary, and the length of each network segment may also vary. For
this reason, the number of network segments is multiplied by a weighting coefficient when
the actual length of a path is measured.
The path with the fewest network segments is not always the ideal path. For
example, routing through three high-speed LAN network segments is probably much faster
than routing through two low-speed WAN network segments.
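The effect of weighting can be shown with a small calculation. The weights below are invented for illustration (a LAN segment counts as 1 and a WAN segment as 5), and the function name is hypothetical.

```python
from typing import Dict, List

# Assumed weighting coefficients per segment type (illustrative values only).
SEGMENT_WEIGHTS: Dict[str, int] = {"lan": 1, "wan": 5}


def weighted_path_length(segments: List[str]) -> int:
    """Sum the weighting coefficient of every network segment on the path."""
    return sum(SEGMENT_WEIGHTS[segment] for segment in segments)
```

With these weights, a path of three LAN segments has a weighted length of 3, whereas a path of two WAN segments has a weighted length of 10, so the path with more segments is preferred.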

8.1.2.2 Routing Protocols

The main functions of a router are route selection and packet forwarding. The rules that ATNs
use to discover routes and guide the forwarding of packets are called routing protocols.
Routing protocols are classified into link-state protocols and distance-vector protocols
according to the type of algorithm they use. The protocols determine what information is
stored in the routing table and the Forwarding Information Base (FIB) table.
8.1.2.3 Routing Table and FIB Table

Each ATN maintains at least one routing table and one FIB table. An ATN uses the routing
table to select routes, and uses the FIB table to guide packet forwarding.
l Routes discovered by the various routing protocols are stored in the routing table. The
routes in the routing table are divided, according to their sources, into the following
types:
– Directly connected route or interface route: is the route discovered by the link layer
protocols.
– Static route: is the route manually configured by the network administrator.
– Dynamic route: is the route discovered by dynamic routing protocols.
l Each entry in the FIB table contains the physical or logical interface through which a
packet is sent to a network segment or host to reach the next ATN. An entry also
indicates whether the packet can be sent directly to a destination host in a directly
connected network.
Routing Table
Each ATN maintains a protocol routing table for each routing protocol and a local core
routing table (or routing management table).
l Protocol routing table
A protocol routing table stores the routing information discovered by the protocol.
A routing protocol can import and advertise the routes that are discovered by other
protocols. For example, if an ATN that runs the Open Shortest Path First (OSPF) protocol
needs to use OSPF to advertise direct routes, static routes, or Intermediate System to
Intermediate System (IS-IS) routes, the ATN must import the routes into the OSPF
routing table.
l Local core routing table
An ATN uses the local core routing table to store protocol routes and preferred routes. The
ATN then sends the preferred routes to the FIB table to guide packet forwarding.
The ATN selects routes according to the priorities of protocols and costs stored in the
routing table. To view the local core routing table of an ATN, run the display ip
routing-table command.
NOTE
An ATN that supports Layer 3 Virtual Private Network (L3VPN) maintains a local core routing
table for each VPN instance.
Routing Table Contents

When you run the display ip routing-table command on the ATN, you can view the brief
version of routing table contents.
A routing table contains the following key data for each route:
l Destination address: is used to identify the destination IP address or the destination
network address of an IP packet.
l Network mask: is combined with the destination address to identify the address of the
network segment where the destination host or ATN resides.
– The network address of the destination host or ATN is obtained through the "AND"
operation on the destination address and network mask. For example, if the
destination address is 1.1.1.1 and the mask is 255.255.255.0, the address of the
network where the host or ATN resides is 1.1.1.0.
– The network mask is composed of several consecutive 1s. These 1s can be
expressed in either the dotted decimal notation or the number of consecutive 1s in
the mask. For example, the network mask can be expressed either as 255.255.255.0
or 24.
l Proto: indicates the protocol through which routes are learned.
l Pre: indicates the preference of a route added to the IP routing table. Multiple routes
with different next hops and outgoing interfaces may exist to the same destination. These
routes may be discovered by different routing protocols or be
manually configured static routes. The router selects the route with the highest
preference (the smallest value) as the optimal route. For more information on the
preference of each protocol, see Table 8-1.
l Cost: indicates the route cost. When multiple routes to the same destination have the
same preference, the route with the lowest cost is selected as the optimal route.
NOTE
The Preference value is used to compare the preferences of various routing protocols, while the
Cost value is used to compare the preferences of different routes of the same routing protocol.
l NextHop: indicates the IP address of the next device that an IP packet passes through.
l Interface: indicates the outgoing interface through which an IP packet is forwarded.
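The way the Pre and Cost fields combine during route selection can be sketched in Python. The Route class, the sample routes, and the preference value 60 used for the static route are assumptions made for illustration. The OSPF preference of 10 matches the sample routing table shown later in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Route:
    destination: str
    proto: str
    pre: int       # preference: a smaller value means a higher preference
    cost: int
    next_hop: str


def select_optimal(candidates: List[Route]) -> Route:
    """Pick the route with the smallest Pre value, then the lowest Cost."""
    if not candidates:
        raise ValueError("no candidate routes")
    return min(candidates, key=lambda route: (route.pre, route.cost))
```

Given one static route (Pre 60) and two OSPF routes (Pre 10) with costs 20 and 5 to the same destination, the function returns the OSPF route with cost 5.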
The routes are divided according to the destination of the packet into the following types:
l Subnet route: The destination is a subnet.
l Host route: The destination is a host.
In addition, based on whether the router is directly connected to the network in which the
destination resides, a route is one of the following connection types:
l Direct route: The ATN is directly connected to the destination network.
l Indirect route: The ATN is not directly connected to the destination network.
To reduce the number of entries in the routing table, you can set a default route. All packets
that fail to match entries in the routing table are forwarded through this default route. For
example, the first route listed in the preceding routing table, with the destination address of
0.0.0.0/0, is a default route.
As shown in Figure 8-2, ATN A is connected to three networks, so it has three IP addresses
and three physical interfaces. Figure 8-2 also shows the routing table of ATN A.

Figure 8-2 Schematic diagram of routing table
[Figure: ATN A connects to ATN B (network 11.0.0.0/8) through GE0/2/0 (1.1.1.1/24, peer
1.1.1.2/24), to ATN C (network 12.0.0.0/8) through GE0/2/1 (2.2.2.1/24, peer 2.2.2.2/24), and
to ATN D (network 13.0.0.0/8) through GE0/2/2 (3.3.3.1/24, peer 3.3.3.2/24).]
Routing table of ATN A:
Destination    Nexthop    Interface
11.0.0.0/8     1.1.1.2    GE0/2/0
12.0.0.0/8     2.2.2.2    GE0/2/1
13.0.0.0/8     3.3.3.2    GE0/2/2
Matching with FIB Table

After route selection, the ATN sends the active routes in the routing table to the FIB table.
When the ATN receives a packet, it searches the FIB table for the optimal route to forward the
packet.
The ATN performs the "AND" operation on the destination address in the packet and the
network mask of each entry in the FIB table. The ATN then compares the result of the "AND"
operation with the destination address of that entry to find a match. Among all matching
entries, the ATN forwards the packet according to the best, or "longest", match, that is, the
entry with the longest mask.
As an example, a certain ATN has the following brief routing table:

Routing Tables:
Destination/Mask    Proto   Pre  Cost  Flags  NextHop        Interface
1.1.1.1/32          OSPF    10   1     D      192.168.2.1    Ethernet0/2/0
2.2.2.2/32          Direct  0    0     D      127.0.0.1      LoopBack0
3.3.3.3/32          OSPF    10   2     D      192.168.22.2   GigabitEthernet0/2/1
4.4.4.0/24          Direct  0    0     D      4.4.4.4        Ethernet0/2/2
4.4.4.4/32          Direct  0    0     D      127.0.0.1      Ethernet0/2/2
4.4.4.255/32        Direct  0    0     D      127.0.0.1      Ethernet0/2/2
5.5.5.5/32          OSPF    10   1     D      192.168.22.2   GigabitEthernet0/2/1
NOTE
The complete routing table contains active routes and inactive routes. The brief routing table contains
only active routes. To view the complete routing table, run the display ip routing-table verbose
command.
After receiving a packet that carries the destination address 9.1.2.1, the ATN searches the
following table:
FIB Table:
Total number of Routes : 5
Destination/Mask    Nexthop        Flag   TimeStamp  Interface  TunnelID
9.1.2.1/32          192.168.22.2   DGHUT  t[11687]   GE0/2/1    0xa
192.168.7.255/32    127.0.0.1      HU     t[11637]   InLoop0    0x0
192.168.7.2/32      127.0.0.1      HU     t[11637]   InLoop0    0x0
1.1.1.1/32          192.168.2.1    DGHUT  t[288]     Eth0/2/0   0x7
4.4.4.255/32        127.0.0.1      HU     t[213]     InLoop0    0x0
The ATN chooses the 9.1.2.1/32 entry because it is the longest match, and then forwards the
packet through GE0/2/1, the outbound interface of that entry.
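The matching procedure described above can be sketched in a few lines of Python. This is only an illustration of the "AND" comparison and the longest-match rule, not the ATN's forwarding implementation; the function name longest_match and the tuple layout of the entries are assumptions made for this example, and the entries are copied from the FIB table above.

```python
import ipaddress

# FIB entries copied from the example above: (prefix, next hop, interface).
FIB = [
    ("9.1.2.1/32", "192.168.22.2", "GE0/2/1"),
    ("192.168.7.255/32", "127.0.0.1", "InLoop0"),
    ("192.168.7.2/32", "127.0.0.1", "InLoop0"),
    ("1.1.1.1/32", "192.168.2.1", "Eth0/2/0"),
    ("4.4.4.255/32", "127.0.0.1", "InLoop0"),
]


def longest_match(destination, fib):
    """Return the FIB entry with the longest matching prefix, or None."""
    dest = int(ipaddress.IPv4Address(destination))
    best = None
    best_len = -1
    for prefix, next_hop, interface in fib:
        network = ipaddress.IPv4Network(prefix)
        mask = int(network.netmask)
        # AND the destination with the entry's mask, then compare the result
        # with the entry's own network address.
        matches = (dest & mask) == int(network.network_address)
        if matches and network.prefixlen > best_len:
            best = (prefix, next_hop, interface)
            best_len = network.prefixlen
    return best
```

Calling longest_match("9.1.2.1", FIB) selects the 9.1.2.1/32 entry. If a default route 0.0.0.0/0 were present in the list, it would match every destination but would be chosen only when no longer prefix matches.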
8.1.2.4 Route Iteration

Routes can be used to forward traffic only when they have directly connected next hops.
However, this condition may not be met when routes are generated. Therefore, the system
needs to search for the directly connected next hops and corresponding outbound interfaces,
and this process is called route iteration. In most cases, BGP routes, static routes, and UNRs
do not have directly connected next hops, and route iteration is required.
For example, the next hop IP address of a BGP route is the IP address of a non-directly
connected peer's loopback interface, and therefore, the BGP route needs to be iterated.
Specifically, the system searches the IP routing table for a direct route (IGP route in most
cases) that is destined for the next hop IP address of the BGP route and then adds the next hop
IP address and outbound interface of the IGP route to the IP routing table to generate a FIB
entry.
The next hop IP address of a BGP VPN route is the IP address of a non-directly connected
PE's loopback interface, and the BGP route needs to be iterated to a tunnel. Specifically, the
system searches the tunnel list for a tunnel that is destined for this loopback IP address and
then adds the tunnel information to the routing table to generate a FIB entry.
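The public-network case can be illustrated with a minimal Python sketch. It assumes a simplified routing table in which a route either is direct (no next hop, known outbound interface) or points to a next hop that must itself be looked up. The table contents, addresses, and the names lookup and iterate are hypothetical and chosen only for this example. Tunnel iteration is not modeled.

```python
import ipaddress

# Hypothetical routing table: prefix -> (protocol, next hop, outbound interface).
# A next hop of None marks a directly connected network.
ROUTES = {
    "192.168.1.0/24": ("Direct", None, "GE0/2/0"),
    "10.0.0.9/32": ("OSPF", "192.168.1.2", None),
    "172.16.0.0/16": ("BGP", "10.0.0.9", None),
}


def lookup(address, routes):
    """Longest-match lookup; returns the matching prefix string or None."""
    addr = ipaddress.IPv4Address(address)
    matches = [
        ipaddress.IPv4Network(prefix)
        for prefix in routes
        if addr in ipaddress.IPv4Network(prefix)
    ]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


def iterate(prefix, routes, max_depth=8):
    """Resolve a route to a directly connected next hop and outbound interface."""
    _, next_hop, interface = routes[prefix]
    if next_hop is None:
        return None, interface  # already a direct route
    for _ in range(max_depth):
        covering = lookup(next_hop, routes)
        if covering is None:
            return None  # next hop unreachable, so the route cannot be used
        _, inner_next_hop, inner_interface = routes[covering]
        if inner_next_hop is None:
            # The covering route is direct: next_hop sits on a connected network.
            return next_hop, inner_interface
        next_hop = inner_next_hop
    return None  # give up on overly deep or looping dependencies
```

With this table, iterate("172.16.0.0/16", ROUTES) first resolves the BGP next hop 10.0.0.9 through the OSPF route and then finds that 192.168.1.2 is on the directly connected network reached through GE0/2/0.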
8.1.2.5 Static Routes and Dynamic Routes

The ATN supports static routes as well as dynamic routing protocols, such as RIP, OSPF,
IS-IS, and BGP.
Static routes can be easily configured on a system and have lower system requirements. Static
routes are applicable to simple, stable, and small-scale networks. Static routes, however,
cannot automatically adapt to changes in the network topology, so they must be manually
reconfigured when the topology changes.
On the other hand, dynamic routing protocols use routing algorithms to automatically adapt to
changes in network topology. Dynamic routes are applicable to the network that is equipped
with Layer 3 devices. The dynamic route configuration, however, has a higher requirement
(such as large memory capacity) for system performance and occupies more network
resources.
8.1.2.6 Classification of Dynamic Routing Protocols

Dynamic routing protocols are classified according to the following factors:
Range of Functions
Routing protocols are classified according to the application range:
l Interior Gateway Protocol (IGP): runs inside an AS, such as RIP, OSPF, and IS-IS.
l Exterior Gateway Protocol (EGP): runs between different ASs, such as BGP.
Algorithm
Routing protocols are classified according to the type of algorithm they use:
l Distance-Vector Routing Protocol: includes RIP and BGP (BGP is also called Path-
Vector).
l Link-State Routing Protocol: includes OSPF and IS-IS.
The algorithms differ mainly in their methods of route discovery and route calculation.
Destination Addresses
Routing protocols are classified by the following types of destination addresses:
l Unicast routing protocol: includes RIP, OSPF, BGP, and IS-IS.
Static routes and dynamic routes discovered by the routing protocol are managed in the ATN.
All these routes can be shared among different routing protocols to implement
Readvertisement of Routing Information.
8.1.2.7 Routing Protocols and Route Preferences
Route Preferences
Routing protocols (including the static route) can learn different routes to the same
destination, but not all routes are optimal. At any one time, the optimal route to a
destination is determined by only one routing protocol. To select the optimal route, each
routing protocol (including the static route) is configured with a preference (the smaller the
value, the higher the preference). When multiple routing information sources coexist, the
route with the highest preference is selected as the optimal route. Table 8-1 lists the routing
protocols and the default preferences of routes found by each protocol.
In Table 8-1, 0 indicates the direct route, and 255 indicates any route learned from unreliable
sources.

Table 8-1 Routing protocols and their default preferences

Routing Protocol or Route Type Route Preference
DIRECT 0
OSPF 10
IS-IS 15
STATIC 60
User Network Route (UNR)
l Dynamic Host Configuration Protocol (DHCP): 60
l AAA-Download: 60
l IP Pool: 61
l Frame: 62
l Host: 63
l Network Address Translation (NAT): 64
l Dual-Stack Lite (DS-Lite): 64
l IP Security (IPSec): 65
l Next Hop Resolution Protocol (NHRP): 65
l Point-to-Point Protocol over Ethernet (PPPoE): 65
l Secure Sockets Layer Virtual Private Network (SSL VPN): 66
RIP 100
OSPF AS-External (ASE) 150
OSPF Not-So-Stubby Area (NSSA) 150
IBGP 255
EBGP 255
You can manually configure the preference of every routing protocol except direct routes. In
addition, each static route can be configured with its own preference.
The ATN also defines the external preference and internal preference. External preference is
the preference set by a user for each routing protocol. Table 8-1 shows the default external
preference.
If different routing protocols are configured with the same preference, the system determines
which routes discovered by these routing protocols become the preferred routes through an
internal preference. Table 8-2 shows the internal preferences of routing protocols.

Table 8-2 Internal preferences of routing protocols

Routing Protocol or Route Type Route Preference
DIRECT 0
OSPF 10
IS-IS Level-1 15
IS-IS Level-2 18
STATIC 60
UNR 65
RIP 100
OSPF ASE 150
OSPF NSSA 150
IBGP 200
EBGP 20
For example, two routes, an OSPF route and a static route, can reach the destination
10.1.1.0/24, and the preferences of both routes are set to 5. In this case, the ATN determines
the optimal route according to the internal preferences listed in Table 8-2. The internal
preference of OSPF (value 10) is higher than that of the static route (value 60).
Therefore, the system selects the route discovered by OSPF as the optimal route.
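The two-level comparison can be sketched as follows. This is a minimal illustration that assumes the selection can be modeled as comparing the configured (external) preference first and the internal preference from Table 8-2 second. The function name select_optimal and the tuple layout are assumptions made for this example.

```python
# Internal preferences from Table 8-2 (smaller value = higher preference).
INTERNAL_PREFERENCE = {
    "DIRECT": 0,
    "OSPF": 10,
    "IS-IS Level-1": 15,
    "IS-IS Level-2": 18,
    "STATIC": 60,
    "UNR": 65,
    "RIP": 100,
    "OSPF ASE": 150,
    "OSPF NSSA": 150,
    "IBGP": 200,
    "EBGP": 20,
}


def select_optimal(candidates):
    """Pick the optimal route for one destination.

    candidates is a list of (protocol, external_preference) tuples.
    The external preference is compared first; the internal preference
    breaks ties between protocols configured with the same value.
    """
    return min(candidates, key=lambda c: (c[1], INTERNAL_PREFERENCE[c[0]]))
```

For the example above, select_optimal([("STATIC", 5), ("OSPF", 5)]) picks the OSPF route, because both external preferences are 5 and OSPF has the smaller internal value.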
8.1.2.8 Priority-based Route Convergence
Definition
Priority-based route convergence, which provides faster convergence of routes for key
services, is an important technology to improve network reliability.
Routes can be set with different convergence priorities, such as critical, high, medium, and
low. The system performs route convergence based on the convergence priorities and a
convergence rule. In other words, the system schedules the convergence of routes with
different convergence priorities in proportion to a weighting scheme.
Purpose
With the integration of network services, the services must be differentiated. As required by
operators, the routes for key services, such as Voice over IP (VoIP) and video conferencing,
should converge as fast as possible, while the routes for common services can converge
relatively slowly. To improve network reliability, the system converges routes in a manner
based on their convergence priorities.
Principle
Table 8-3 shows the default convergence priorities of public routes. The routing protocols
first compute and deliver routes of high convergence priorities to the system. By default, the
system converges routes according to the scheduling weight values assigned to the
convergence priorities in the proportions of critical:high:medium:low = 8:4:2:1. You can
reconfigure the scheduling weight values as required.
Table 8-3 Default convergence priorities of public routes

Routing Protocol or Route Type Convergence Priority
Direct High
Static Medium
32-bit host routes of OSPF and IS-IS Medium
OSPF route (except 32-bit host routes) Low
IS-IS route (except 32-bit host routes) Low
RIP Low
BGP Low
NOTE
For private routes, only 32-bit host routes of OSPF and IS-IS can be identified as medium, and
all other routes are identified as low.
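The effect of the 8:4:2:1 weights can be pictured with a weighted round-robin sketch. The document does not specify the scheduling algorithm that the system actually uses, so the following Python code is only one plausible interpretation; the function name schedule and the queue layout are assumptions made for this example.

```python
from collections import deque

# Default scheduling weights stated above.
WEIGHTS = {"critical": 8, "high": 4, "medium": 2, "low": 1}


def schedule(queues, weights=WEIGHTS):
    """Yield (priority, route) pairs in weighted round-robin order.

    queues maps a convergence priority to the list of routes waiting at
    that priority. In each round, up to `weight` routes are taken from
    each priority, highest priority first.
    """
    pending = {priority: deque(queues.get(priority, [])) for priority in weights}
    while any(pending.values()):
        for priority, weight in weights.items():
            queue = pending[priority]
            for _ in range(min(weight, len(queue))):
                yield priority, queue.popleft()
```

With five routes waiting at high priority and two at low priority, the first round processes four high-priority routes and one low-priority route, and the second round processes the remaining ones.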
8.1.2.9 Load Balancing and Route Backup
Load Balancing
The ATN supports the multi-route model (multiple routes with the same destination and
priority). Routes discovered by one routing protocol with the same destination and cost can
load-balance traffic. In each routing protocol view, you can run the maximum
load-balancing number command to configure the number of routes for load balancing. The
ATN adopts per-flow load balancing.
l Per-flow load balancing
After per-flow load balancing is configured, the ATN forwards packets based on the
quintuple (the source address, destination address, source port, destination port, and
protocol in the packets). When the quintuple is the same, the ATN always chooses the
next hop address that is the same as the last one to send packets. Figure 8-3 shows the
networking for per-flow load balancing.

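Figure 8-3 Networking for per-flow load balancing
[Figure: ATN-A has two paths toward RouterD, one through GE0/2/0 and RouterB and one
through GE0/2/4 and RouterC. Packets P1 to P6 destined for 10.1.1.0/24 follow one path, and
packets P1 to P6 destined for 10.2.1.0/24 follow the other.]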
ATN-A needs to forward packets to 10.1.1.0/24 and 10.2.1.0/24. Based on per-flow load
balancing, packets of the same flow are transmitted along the same path. The process for
ATN-A to forward packets is as follows:
– The first packet P1 to 10.1.1.0/24 is forwarded through GE 0/2/0, and all
subsequent packets to 10.1.1.0/24 are forwarded through the same interface.
– The first packet P1 to 10.2.1.0/24 is forwarded through GE 0/2/4, and all
subsequent packets to 10.2.1.0/24 are forwarded through the same interface.
Currently, the protocols that support load balancing are RIP, OSPF, BGP, and IS-IS. In
addition, static routes support load balancing.
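Per-flow load balancing can be pictured as hashing the quintuple and using the result to pick one of the equal-cost next hops. The hash function used by the ATN is not described in this document, so the SHA-256 digest below is purely an assumption, as are the function name pick_next_hop and the sample addresses.

```python
import hashlib


def pick_next_hop(src_ip, dst_ip, src_port, dst_port, protocol, next_hops):
    """Map a quintuple to one of the equal-cost next hops.

    The same quintuple always yields the same next hop, so all packets
    of a flow follow the same path.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]
```

For example, repeated calls such as pick_next_hop("10.0.0.1", "10.1.1.5", 1024, 80, "tcp", ["GE0/2/0", "GE0/2/4"]) always return the same interface, while a flow with a different quintuple may be mapped to the other interface.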
8.1.2.10 Principle of IP FRR
IP FRR Overview
FRR is a mechanism in which a fault detected at the physical layer or data link layer is
reported to the upper-layer routing system, and a backup link is immediately used to forward
packets.
Background of IP FRR
On traditional IP networks, when a fault occurs at the lower layer of the forwarding link, the
visible evidence is that the physical interface on the ATN becomes Down. After the ATN
detects the fault, it informs the upper layer routing system to recalculate routes and then
update routing information. Usually, it takes the routing system several seconds to re-select an
available route.
For services that require a low delay and low packet loss ratio, a convergence time of
several seconds is intolerable because it may lead to service interruption. For example, Voice
over Internet Protocol (VoIP) services can tolerate interruptions of only milliseconds. IP FRR
ensures that the forwarding system swiftly detects such a fault and then takes measures to
restore services as soon as possible.

Classification and Implementation of IP FRR

IP FRR, which is designed for routes on IP networks, is classified into IP FRR for the public
network and IP FRR for the private network.
l IP FRR for the public network: protects ATNs of the public network.
l IP FRR for the private network: protects customer edges (CEs).
IP FRR is implemented as follows:
1. If the primary link is available, you can configure IP FRR by using a routing policy to
provide the forwarding information of the backup route for the forwarding engine.
2. If the forwarding engine is notified of a link fault, the engine uses the backup link to
forward traffic before the routes on the control plane converge.
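The two steps above can be sketched as a forwarding entry that is installed with both primary and backup forwarding information and that switches locally when a fault is reported. This is a simplified model for illustration only; the class FrrEntry, the function on_link_fault, and the addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrrEntry:
    """Forwarding entry that carries a pre-installed backup (step 1)."""
    prefix: str
    primary: tuple  # (next hop, outbound interface)
    backup: tuple   # (next hop, outbound interface)
    primary_up: bool = True

    def forwarding_info(self):
        """Return the next hop and interface currently used for forwarding."""
        return self.primary if self.primary_up else self.backup


def on_link_fault(entries, failed_interface):
    """Step 2: switch affected entries to the backup link immediately,
    without waiting for the control plane to converge."""
    for entry in entries:
        if entry.primary[1] == failed_interface:
            entry.primary_up = False
```

In this model, the cost of the switchover is a flag change per entry, which is why traffic can move to the backup link before route recalculation finishes.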
Comparison Between IP FRR and VPN FRR
Table 8-4 Comparison Between IP FRR and VPN FRR
Item Description
IP FRR IP FRR is suitable for IP services that require a low delay and low packet
loss ratio.
l Protects the public network and CEs.
l Implements FRR through a backup route.
VPN FRR VPN FRR is suitable for services that require a low delay and low packet
loss ratio on VPNs.
l Protects provider edges (PEs).
l Implements FRR through a backup tunnel.
8.1.2.11 Re-advertisement of Routing Information

The algorithm of each routing protocol is distinct. Different protocols may discover different
routes. If multiple protocols are run in a large-scale network, the protocols need to re-
advertise the routes they discover.
On the ATN, the routes discovered by one protocol can be imported to the routing table of
another protocol. Each protocol has its own mechanism to import routes. For details, refer to
the chapter "Routing Policy".
8.1.2.12 Indirect Next Hop
Definition
Indirect next hop changes the direct association between route prefixes and the next hop
into an indirect association. Next hop information can then be refreshed independently, the
prefixes that share a next hop do not need to be refreshed one by one, and route convergence
is sped up.

Purpose
In scenarios that require route iteration, Forwarding Information Base (FIB) entries can be
refreshed quickly when IGP routes or tunnels are switched. This implements fast traffic
convergence and reduces the impact on services.
Mapping Between the Route Prefix and the Next Hop

The mapping between the route prefix and the next hop is the basis of indirect next hop. To
meet the requirements of route iteration and tunnel iteration in different scenarios, next hop
information involves the address family, the original next hop address, or the tunnel policy.
The system assigns an index to the information about each next hop, performs route iteration,
notifies the routing protocol of the iteration result, and then distributes FIB entries.
On-Demand Route Iteration

On-demand route iteration indicates that when a dependent route is changed, only the next
hop related to the dependent route is re-iterated. If the destination address of a route is the
original next hop address or network segment address of next hop information, route changes
affect the iteration result of next hop information. Otherwise, route changes do not affect next
hop information. Therefore, when a route changes, you can re-iterate only the related next hop
by judging the destination address of the route. For example, if the original next hop address
of the route 2.2.2.2/32 is 1.1.1.1, the route that the original next hop 1.1.1.1 depends on may
be 1.1.1.1/32 or 1.1.0.0/16. If the route 1.1.1.1/32 or 1.1.0.0/16 changes, the iteration result of
the original next hop 1.1.1.1 is affected.
With respect to tunnel iteration, when a tunnel alternates between up and down, you just need
to re-iterate the next hop information whose next hop address is the same as the destination
address of the tunnel.
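The check for route changes can be sketched as a simple containment test: a changed route affects a next hop only if its destination prefix covers the original next hop address. The function name next_hops_to_reiterate is an assumption made for this example, and the addresses come from the example above.

```python
import ipaddress


def next_hops_to_reiterate(changed_prefix, original_next_hops):
    """Return the original next hops whose iteration result may be
    affected by a change to the route `changed_prefix`."""
    network = ipaddress.IPv4Network(changed_prefix)
    return [
        next_hop
        for next_hop in original_next_hops
        if ipaddress.IPv4Address(next_hop) in network
    ]
```

A change to 1.1.1.1/32 or 1.1.0.0/16 returns the original next hop 1.1.1.1 for re-iteration, whereas a change to an unrelated route such as 5.5.5.0/24 returns an empty list and triggers no work.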
Iteration Policy
An iteration policy is used to control the iteration result of the next hop to meet the
requirements of different application scenarios. In route iteration, iteration behaviors do not
need to be controlled by the iteration policy. Instead, iteration behaviors only need to comply
with the longest matching rule. The iteration policy needs to be applied only when VPN routes
are iterated to tunnels.
By default, the system selects LSPs for a VPN. If other types of tunnels are required, you
need to configure a tunnel policy and bind the tunnel policy to a tunnel. After a tunnel policy
is applied, the system adopts the tunnel bound in the tunnel policy or selects a tunnel
according to the priorities of different types of tunnels.
Refreshment of Indirect Next Hop

On the forwarding plane, public network routes are forwarded based on the next hop and
outbound interface while VPN routes are forwarded based on the public network tunnel in
addition to the next hop and outbound interface. Before indirect next hop is adopted,
forwarding information, including the next hop, outbound interface, and the tunnel token,
needs to be added into the FIB entry by using the route prefix. In this manner, the route
convergence speed is relevant to the number of route prefixes. After indirect next hop is
adopted, many route prefixes correspond to a shared next hop. Forwarding information is
added into the FIB entry by using the next hop, and the traffic with the relevant route prefixes
can be switched simultaneously. Therefore, the route convergence speed becomes faster.

Figure 8-4 Schematic diagram before indirect next hop is adopted
[Figure: each prefix (Prefix 1 to Prefix N) maps to its own next hop (Nexthop 1 to Nexthop N)
and its own forwarding information (Forwarding Information 1 to Forwarding Information N).]
As shown in Figure 8-4, before indirect next hop is adopted, prefixes are totally independent,
each corresponding to its next hop and forwarding information. When a dependent route
changes, the next hop corresponding to each prefix is iterated and forwarding information is
updated based on the prefix. In this case, the convergence speed is related to the number of
prefixes.
In practice, the prefixes learned from one BGP neighbor share the same next hop and the same
forwarding information, so their forwarding information is refreshed in the same way.
Figure 8-5 Schematic diagram after indirect next hop is adopted
[Figure: all prefixes (Prefix 1 to Prefix N) map to one shared next hop (Nexthop), which
maps to one set of forwarding information (Forwarding Information).]
As shown in Figure 8-5, after indirect next hop is adopted, prefixes of a BGP neighbor share
a next hop. When a dependent route changes, only the shared next hop is iterated and
forwarding information is updated based on the next hop. In this case, traffic of all prefixes
can be converged at one time. The convergence speed is independent of the number of prefixes.
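The difference between the two figures can be modeled as a data-structure choice: instead of copying forwarding information into every prefix entry, each entry holds a reference to one shared next-hop object. The following Python sketch is illustrative only; the names SharedNextHop, build_fib, and refresh, as well as the addresses, are hypothetical.

```python
class SharedNextHop:
    """Next-hop object shared by all prefixes that have the same
    original next hop."""

    def __init__(self, original_next_hop, forwarding_info):
        self.original_next_hop = original_next_hop
        # Iteration result: (directly connected next hop, outbound interface).
        self.forwarding_info = forwarding_info


def build_fib(prefixes, shared_next_hop):
    """Associate every prefix with the same next-hop object."""
    return {prefix: shared_next_hop for prefix in prefixes}


def refresh(shared_next_hop, new_forwarding_info):
    """Re-iterate once; every prefix that references the object follows."""
    shared_next_hop.forwarding_info = new_forwarding_info
```

After build_fib associates a set of prefixes with one SharedNextHop object, a single refresh call changes the forwarding information seen by all of them, which mirrors the statement that the convergence speed no longer depends on the number of prefixes.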
Comparison Between Route Iteration and Tunnel Iteration

Comparison between route iteration and tunnel iteration is shown in Table 8-5.

Table 8-5 Comparison between route iteration and tunnel iteration

Iteration Type Description
Route iteration
l Iterates BGP public routes.
l Is triggered by route changes.
l Supports next-hop iteration based on the specified routing policy.
Tunnel iteration
l Iterates BGP VPN routes.
l Is triggered by tunnel changes or tunnel policy changes.
l Allows iteration behaviors to be controlled through the tunnel policy to meet the
requirements of different application scenarios.
8.1.2.13 Default Routes

Default routes are special routes. Generally, administrators can manually configure default
static routes. Default routes can also be generated through dynamic routing protocols such as
OSPF and IS-IS.
Default routes are used only when packets to be forwarded have no matching routing entry in
a routing table. In the routing table, a default route is the route to the network 0.0.0.0 (with the
mask also being 0.0.0.0). You can check whether the default route is configured by using the
display ip routing-table command.
If the destination address of a packet does not match any entry in the routing table, the packet
is sent through a default route. If no default route exists and the destination address of the
packet does not match any entry in the routing table, the packet is discarded. An Internet
Control Message Protocol (ICMP) packet is then sent, informing the originating host that the
destination host or network is unreachable.
8.1.3 Applications
8.1.3.1 Typical Application of IP FRR

As shown in Figure 8-6, IP FRR is configured to improve network reliability. CE1 is dual-
homed to PE1 and PE2. CE1 is configured with two outbound interfaces and two next hops.
That is, link B functions as the backup of link A. When link A fails, traffic can be rapidly
switched to link B.

Figure 8-6 Configuring the IP FRR function
[Figure: NodeB connects to CE1. CE1 is dual-homed to PE1 over Link A and to PE2 over Link B,
with IP forwarding on both links.]
8.1.3.2 Typical Application of Indirect Next Hop
Indirect Next Hop Enabled When IBGP Routes Are Iterated to an IGP Route
Figure 8-7 Networking diagram of IBGP route iteration
[Figure: in AS100, ATN-A and ATN-D are IBGP peers. They are connected by two IGP paths, one
through ATN-B and one through ATN-C.]
As shown in Figure 8-7, ATN-A and ATN-D establish an IBGP neighbor relationship. To
refresh Forwarding Information Base (FIB) entries and guide the packet forwarding, the real
outbound interface and the directly connected next hop must be identified based on the
original IBGP next hop. Note that the next hop of an IBGP route cannot be used to guide
packet forwarding, because the IBGP neighbor relationship is generally established through
two loopback interfaces, and the next hop is not directly reachable.
ATN-D receives 4000 routes from ATN-A. These routes have the same original BGP
next hop. After being iterated, these routes eventually follow the same IGP path (A->B->D).
When the IGP path (A->B->D) fails, these IBGP routes do not need to be iterated separately,
and the relevant FIB entries do not need to be refreshed one by one. Only the shared
next hop needs to be iterated and refreshed. Consequently, these IBGP routes can converge to
the path (A->C->D) at one time on the forwarding plane. Therefore, convergence time is related
only to the number of next hops, and sub-second convergence that is independent of the number
of prefixes is implemented.
If ATN-A and ATN-D establish a multi-hop EBGP neighbor relationship, the convergence
procedure is the same as the previous procedure. Next hop separation also applies to multi-
hop EBGP route iteration.
Indirect Next Hop Enabled When VPN Routes Are Iterated to a Tunnel
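Figure 8-8 Networking diagram of VPN route iteration
[Figure: CE1 (AS65001) connects to PE1, and CE2 (AS65002) connects to PE2. In AS100, PE1 and
PE2 are connected by Tunnel1 through P1 and by Tunnel2 through P2.]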
As shown in Figure 8-8, PE1 and PE2 establish a neighbor relationship and PE2 receives
4000 routes from PE1. These routes have the same original BGP next hop. After being
iterated, these private routes eventually follow the same public network tunnel, namely, tunnel
1. When tunnel 1 fails, these routes do not need to be iterated separately, and the FIB entries
do not need to be refreshed one by one. Only the shared next hop needs to be iterated, and
only the relevant FIB entries need to be refreshed. Consequently, these VPN routes can
converge to tunnel 2 at one time on the forwarding plane. Therefore, convergence time is
related only to the number of next hops, and sub-second convergence that is independent of the
number of prefixes is implemented.

Terms
Term Description
ARP Vlink direct routes
IP packets are forwarded through a specified physical interface. IP packets cannot be
forwarded through a VLANIF interface, because a VLANIF interface is a logical interface
with several physical interfaces as its member interfaces. If an IPv4 packet reaches a
VLANIF interface, the device obtains information about the physical interface using ARP and
generates the relevant routing entry. The route recorded in the routing entry is called an
ARP Vlink direct route.
FRR
FRR is applicable to services that are very sensitive to packet loss and delay. When a fault
is detected at the lower layer, the lower layer informs the upper layer routing system of the
fault. Then, the routing system forwards packets through a backup link. In this manner, the
impact of the link fault on services is minimized.
NDP Vlink direct routes
IP packets are forwarded through a specified physical interface. IP packets cannot be
forwarded through a VLANIF interface, because a VLANIF interface is a logical interface
with several physical interfaces as its member interfaces. If an IPv6 packet reaches a
VLANIF interface, the device obtains information about the physical interface using the
neighbor discovery protocol (NDP) and generates the relevant routing entry. The route
recorded in the routing entry is called an NDP Vlink direct route.
UNR
When a user goes online through a Layer 2 device, such as a switch, but there is no
available Layer 3 interface and the user is assigned an IP address, no dynamic routing
protocol can be used. To enable devices to use IP routes to forward the traffic of this user,
use the Huawei User Network Route (UNR) technology to assign a route to forward the traffic
of the user.
Abbreviations
Abbreviation Full Name
BGP Border Gateway Protocol
CE Customer Edge
FIB Forwarding Information Base
IGP Interior Gateway Protocol
IS-IS Intermediate System to Intermediate System
NDP Neighbor Discovery Protocol
PE Provider Edge
RIP Routing Information Protocol
RM Route Management
Vlink Virtual Link
VoIP Voice over IP
VRP Versatile Routing Platform
8.2 Static Routes
8.2.1 Introduction to Static Routes
Definition
Static routes need to be manually configured by the administrator.
Purpose
On a simple network, the administrator just needs to configure static routes so that the
network can run properly. Properly configuring and using static routes can improve network
performance and guarantee the required bandwidth for important applications.
8.2.2 Principles
8.2.2.1 Components of Static Routes

On the ATN, you can run the ip route-static command to configure a static route, which
consists of the following:
l Destination Address and Mask
l Outbound Interface and Next-Hop Address
Destination Address and Mask

In the ip route-static command, the IPv4 address is expressed in dotted decimal notation. The
mask is expressed in dotted decimal notation or represented by the mask length (the number
of consecutive 1s in the mask).
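The two mask notations are interchangeable, which can be checked with Python's standard ipaddress module. The sketch below is only an illustration of the notation, not of the ip route-static command itself; the helper names mask_to_length, length_to_mask, and network_address are assumptions made for this example.

```python
import ipaddress


def mask_to_length(mask):
    """Convert a dotted decimal mask such as 255.255.255.0 to a mask length."""
    return ipaddress.IPv4Network(f"0.0.0.0/{mask}").prefixlen


def length_to_mask(length):
    """Convert a mask length such as 24 to dotted decimal notation."""
    return str(ipaddress.IPv4Network(f"0.0.0.0/{length}").netmask)


def network_address(address, mask):
    """Return the network address for an address and a mask given in
    either notation."""
    network = ipaddress.IPv4Network(f"{address}/{mask}", strict=False)
    return str(network.network_address)
```

For example, mask_to_length("255.255.255.0") gives 24, and network_address("1.1.1.1", "255.255.255.0") gives 1.1.1.0, which matches the network address example given earlier for the routing table.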
Outbound Interface and Next-Hop Address

When you configure a static route, you can specify an outbound interface, a next-hop address,
or both as required.
Actually, each routing entry requires a next-hop address. Before sending a packet, a device
needs to use the longest match rule to search its routing table for the route that matches the
destination address in the packet. The device can find the associated link layer address only
after the next-hop address of the packet is specified.

NOTE
If the next hop IP address manually specified for a static route changes, the device on which the static
route is configured is unaware of the change. As a result, traffic fails to be forwarded along the static
route. To address this problem, associate the static route with DHCP so that the static route can obtain
the next hop IP address dynamically.
l If the next hop IP address obtained using DHCP changes, the static route updates it.
l If no next hop IP address can be obtained using DHCP, the static route is invalid.
When you specify an outbound interface, note the following:
l For a Point-to-Point (P2P) interface, specifying the outbound interface implicitly
specifies the next-hop address. That is, the address of the remote interface connected to
this interface is the next-hop address. For example, when an MP-group interface is
encapsulated with the Point-to-Point Protocol (PPP) and obtains the remote IP address
through PPP negotiation, you need to specify only the outbound interface rather than the
next-hop address.
l Non-Broadcast Multiple-Access (NBMA) interfaces (such as an ATM interface) are
applicable to Point-to-Multipoint (P2MP) networks. IP routes and the mappings between
IP addresses and link layer addresses are required. Therefore, you need to configure
next-hop addresses.
l When configuring static routes, do not specify only an Ethernet interface or a VT
interface as the outbound interface. An Ethernet interface is a broadcast interface, and a
VT interface can be associated with several virtual access (VA) interfaces. If an Ethernet
or VT interface is specified as the outbound interface, a unique next hop cannot be
determined because multiple next hops exist. In actual applications, to specify a broadcast
interface (such as an Ethernet interface) or a VT interface as the outbound interface, you
are advised to specify the associated next-hop address as well.
8.2.2.2 Applications of Static Routes

As shown in Figure 8-9, the network topology of static routes is simple, and network
communication can be implemented through static routes. In this application, you must
specify an address for each physical network, identify indirectly connected physical networks
for each ATN, and configure static routes for the indirectly connected physical networks.
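Figure 8-9 Static routes networking
[Figure: ATN A, ATN B, and ATN C are connected in sequence, and the physical networks are
numbered as network segments 1 to 5.]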

In Figure 8-9, static routes to network segments 3, 4, and 5 need to be configured on ATN A;
static routes to network segments 1 and 5 need to be configured on ATN B; and static routes
to network segments 1, 2, and 3 need to be configured on ATN C.
Default Static Route

When you run the ip route-static command to configure a static route, if the destination
address and the mask are both set to all 0s (0.0.0.0 0.0.0.0), a default route is configured.
A default route simplifies the network configuration.
In Figure 8-9, because the next hop of the packets sent by ATN A to network segments 3, 4,
and 5 is ATN B, a default route can be configured on ATN A to replace the three static routes
destined for network segments 3, 4, and 5 in the preceding example. Similarly, only a default
route from ATN C to ATN B needs to be configured to replace the three static routes destined
for network segments 1, 2, and 3 in the example.
Floating Static Routes

Static routes can be configured with different preferences so that routing management policies
can be flexibly applied. Different preferences specified for multiple routes to the same
destination can be used to implement route backup.
As shown in Figure 8-10, there are two static routes from ATN A to ATN C. Normally, only the
static route whose next hop is ATN B is in the Active state in the routing table, because this
route has a higher preference (a smaller preference value). The other static route, whose next
hop is ATN D, functions as a backup route. The backup route is activated to forward data only
when the primary link becomes faulty. After the primary link recovers, the static route whose
next hop is ATN B becomes active again and forwards data. The backup route is therefore also
called a floating static route. The floating static route cannot take over if the fault occurs
on the link between ATN B and ATN C, which is not directly connected to ATN A.
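The selection behavior of default and floating static routes can be sketched in a few lines of Python. This is a minimal illustration, not the ATN implementation. The StaticRoute class, the select_route function, the link_up flag, and the 10.5.0.0/16 prefix are assumptions introduced for the example. The sketch assumes that the longest matching prefix wins and that, among routes to the same destination, the route with the smallest preference value is active.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class StaticRoute:
    prefix: str           # destination network; "0.0.0.0/0" is a default route
    next_hop: str         # label of the next-hop device (hypothetical names)
    preference: int       # a smaller value means a higher preference
    link_up: bool = True  # whether the directly connected link to the next hop is usable


def select_route(routes, destination):
    """Return the active route for a destination address, or None if no route matches."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        r for r in routes
        if r.link_up and addr in ipaddress.ip_network(r.prefix)
    ]
    if not candidates:
        return None
    # Longest prefix first, then the smallest preference value.
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, r.preference),
    )


# Example table on ATN A (addresses are illustrative only).
example_routes = [
    StaticRoute("10.5.0.0/16", "ATN B", preference=60),   # primary route
    StaticRoute("10.5.0.0/16", "ATN D", preference=100),  # floating static route
    StaticRoute("0.0.0.0/0", "ATN B", preference=60),     # default route
]
```

With both links up, select_route(example_routes, "10.5.1.1") returns the route through ATN B. If link_up of that route is set to False, the route through ATN D is returned instead. The link_up flag models only a fault that ATN A can detect locally.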
Figure 8-10 Floating static routes
ATN B
Preference=60
Preference=100
ATN A ATN C
ATN D

8.2.2.3 Functions of Static Routes
IPv4 Static Routes

The ATN supports common static routes and the static routes associated with VPN instances.
The static routes associated with VPN instances are used to manage VPN routes. For details
on VPN instances, see the Feature Description - VPN.
8.2.2.4 BFD for Static Routes

Unlike dynamic routing protocols, static routes do not have a detection mechanism. When a
fault occurs on the network, the administrator needs to handle it. To assist the administrator,
Bidirectional Forwarding Detection (BFD) for static routes can be introduced into the network
to bind a static route to a BFD session. Then the BFD session can detect the status of the link
where the static route resides.
After BFD for static routes is configured, each static route can be associated with a BFD
session. In addition to route selection rules, whether a static route can be selected as the
optimal route is subject to BFD session status.
l If the BFD session associated with a static route detects a link failure and goes Down,
the BFD session reports the link failure to the system. The system then deletes the
static route from the IP routing table.
l If the BFD session associated with a static route detects that the faulty link has
recovered and goes Up, the BFD session reports the fault recovery to the system. The
system then adds the static route to the IP routing table again.
l By default, a static route can still be selected even though the BFD session associated
with it is AdminDown (triggered by the shutdown command run either locally or
remotely). If a device is restarted, the BFD session needs to be re-negotiated. In this
case, whether the static route associated with the BFD session can be selected as the
optimal route is subject to the re-negotiated BFD session status.
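The three rules above can be summarized as a small decision function. The following Python sketch is an assumption-based illustration only. The BfdState enumeration, the route_selectable function, and the allow_admin_down parameter are hypothetical names. The parameter models the default behavior in which a route remains selectable while its BFD session is AdminDown.

```python
from enum import Enum


class BfdState(Enum):
    UP = "Up"
    DOWN = "Down"
    ADMIN_DOWN = "AdminDown"


def route_selectable(bfd_state, allow_admin_down=True):
    """Return True if a static route bound to a BFD session may be selected.

    allow_admin_down is a hypothetical switch that models the default
    described in the text: AdminDown does not block route selection.
    """
    if bfd_state is BfdState.UP:
        return True
    if bfd_state is BfdState.ADMIN_DOWN:
        return allow_admin_down
    return False  # Down: the route is removed from the IP routing table
```

The normal route selection rules still apply to any route for which this function returns True.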
BFD for static routes has two modes:
l Single-hop detection
For a non-iterated static route, the configured outbound interface and next-hop address
provide the information about the directly connected next hop. In this case, the outbound
interface bound to the BFD session is the outbound interface of the static route, and the
peer address is the next-hop address of the static route.
l Multi-hop detection
For an iterated static route, only the next-hop address is configured. Therefore, the
directly connected next-hop and outbound interface need to be iterated. In this case, the
peer address of the BFD session is the original next-hop address of the static route, and
the outbound interface is not specified. Generally, the original next hop to be iterated is
an indirect next hop. Therefore, multi-hop detection is performed on the static routes that
support route iteration.
NOTE
If the next hop of a route is not directly reachable, the route cannot be used for packet forwarding. Based
on information about the current next hop of this route, the system will calculate an actual outbound
interface and an actual next hop. This process is called route iteration. In the display ip routing-table
command output, if the Flags value of a route includes R, the route is an iterated route. Otherwise,
the route is not an iterated route.

NOTE
For details on BFD, see the Feature Description - Reliability.
Terms
Term Description
FRR Fast Reroute is applicable to the services that are very sensitive to packet loss
and delay. After FRR is configured, when a fault is detected at the lower layer,
the fault is reported to the upper-layer routing system. Then, packets are
forwarded through a backup link. Therefore, the impact of link faults on the
carried services is minimized.
Abbreviations
Abbreviation Full Name
RM Route Management
8.3 RIP
8.3.1 Introduction
Definition
Routing Information Protocol (RIP) is a simple Interior Gateway Protocol (IGP). RIP is used
in small-scale networks, such as campus networks and simple regional networks.
As a distance-vector routing protocol, RIP exchanges routing information using User
Datagram Protocol (UDP) packets and port number 520.
RIP employs the hop count as the metric to measure the distance to the destination. In RIP, by
default, the hop count from a router to its directly connected network is 0, and the hop count
from a router to a network that is reachable through another router is 1, and so on. That is, the
hop count equals the number of routers along the path from the local network to the
destination network. To speed up the convergence, RIP defines the hop count as an integer
ranging from 0 to 15. A hop count greater than or equal to 16 is considered infinite, indicating
that the destination network or host is unreachable. Due to the hop limit, RIP is not applicable
to large-scale networks.
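The hop-count rule can be illustrated with a simplified distance-vector update step. The following Python sketch is not the ATN implementation. The process_update function and the table layout are assumptions made for the example. It follows the counting used in this section: a directly connected network has hop count 0, each additional router adds 1, and 16 means unreachable.

```python
INFINITY = 16  # a hop count of 16 or more means "unreachable"


def process_update(table, neighbor, advertised):
    """Merge the routes advertised by a neighbor into the local table.

    table maps destination -> (metric, next_hop).
    advertised maps destination -> metric as seen by the neighbor.
    """
    for dest, metric in advertised.items():
        new_metric = min(metric + 1, INFINITY)  # one more router on the path
        current = table.get(dest)
        if current is None:
            # Learn a new destination only if it is reachable.
            if new_metric < INFINITY:
                table[dest] = (new_metric, neighbor)
        elif current[1] == neighbor or new_metric < current[0]:
            # Always accept news from the current next hop;
            # otherwise accept only a strictly better path.
            table[dest] = (new_metric, neighbor)
    return table
```

Because the metric is capped at 16, a route that is repeatedly re-advertised in a loop reaches the unreachable value after a bounded number of exchanges, which is why the hop limit speeds up convergence.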

RIP supports split horizon, poison reverse, and triggered update, which improves performance
and prevents routing loops.
Purpose
As the earliest IGP, RIP is used in small- and medium-sized networks. Its implementation is
simple, and the configuration and maintenance of RIP are easier than those of Open Shortest
Path First (OSPF) and Intermediate System-to-Intermediate System (IS-IS). Therefore, RIP is
widely used on live networks.
8.3.2 Principles
RIP is based on the Distance-Vector (DV) algorithm. It exchanges routing information in User
Datagram Protocol (UDP) packets. RIP uses timers to guarantee advertisement, update, and aging of
routing information. However, design defects in RIP may cause routing loops. Therefore, split
horizon, poison reverse, and triggered update were introduced into RIP to prevent routing
loops.
In addition, RIP periodically advertises its routing table to neighbors, and route
summarization was introduced to reduce the size of the routing table.
8.3.2.1 RIP-1
RIP Version 1 (RIP-1) is a classful routing protocol, and its protocol packets can only be
broadcast. Figure 8-11 shows the packet format. A RIP packet can carry a maximum of 25
entries. RIP is based on UDP, and RIP-1 data packets cannot be longer than 512 bytes.
Because RIP-1 packets do not carry any mask information, RIP-1 can identify only the routes
to natural network segments, such as Class A, Class B, and Class C. Therefore, RIP-1 does
not support route aggregation or discontinuous subnets.
Figure 8-11 RIP-1 packet format
Bit offsets: 0, 7, 15, 31
Header: Command, Version, Must be zero
Route entries (each entry):
Address family identifier, Must be zero
IP address
Must be zero
Must be zero
Metric
8.3.2.2 RIP-2
RIP Version 2 (RIP-2) is a classless routing protocol. Figure 8-12 shows the format of a
RIP-2 packet.

Figure 8-12 RIP-2 packet format
Bit offsets: 0, 7, 15, 31
Header: Command, Version, Must be zero
Route entries (each entry):
Address Family Identifier, Route Tag
IP Address
Subnet Mask
Next Hop
Metric
Compared with RIP-1, RIP-2 has the following advantages:
l Supports external route tags and flexibly controls routes based on the tag using a routing
policy.
l Supports route summarization and Classless Inter-domain Routing (CIDR) because
RIP-2 packets carry mask information.
l Supports next hop specification so that the optimal next hop address can be specified on
the broadcast network.
l Uses multicast to send update packets. Only RIP-2 routers receive these protocol
packets, which reduces resource consumption.
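The RIP-2 packet layout in Figure 8-12 can be expressed with fixed-size structures. The following Python sketch is illustrative only. The function name pack_rip2_response is an assumption, and the command value 2 (response), version value 2, and address family identifier 2 (IP) are taken from RFC 2453 rather than from this document. The sketch shows why 25 entries fit within the 512-byte limit: a 4-byte header plus 25 entries of 20 bytes is 504 bytes.

```python
import ipaddress
import struct

RIP_HEADER = struct.Struct("!BBH")         # command, version, must be zero
RIP2_ENTRY = struct.Struct("!HH4s4s4sI")   # AFI, route tag, IP address, subnet mask, next hop, metric
MAX_ENTRIES = 25                           # maximum number of entries in one RIP packet


def pack_rip2_response(routes):
    """Build a RIP-2 response from (prefix, next_hop, metric, route_tag) tuples."""
    if len(routes) > MAX_ENTRIES:
        raise ValueError("a RIP packet carries at most 25 entries")
    packet = RIP_HEADER.pack(2, 2, 0)  # command 2 = response, version 2 (RFC 2453)
    for prefix, next_hop, metric, route_tag in routes:
        net = ipaddress.IPv4Network(prefix)
        packet += RIP2_ENTRY.pack(
            2,                                        # address family identifier for IP
            route_tag,
            net.network_address.packed,
            net.netmask.packed,                       # the mask that RIP-1 cannot carry
            ipaddress.IPv4Address(next_hop).packed,
            metric,
        )
    return packet
```

A RIP-1 entry uses the same 20-byte size, but the route tag, subnet mask, and next hop positions are "must be zero" fields.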
8.3.2.3 Timers
RIP uses the following three timers:
l Update timer: The update timer periodically triggers update packet transmission. By
default, the interval at which update packets are sent is 30s.
l Age timer: If a RIP device does not receive any packets from its neighbor to update a
route before the route expires, the RIP device considers the route unreachable. By
default, the age timer interval is 180s.
l Garbage-Collect timer: If a route becomes invalid after the age timer expires, the route is
placed into a garbage queue instead of being immediately deleted from the RIP routing
table. If an Update packet of a route is received before the garbage-collect timer expires,
the route is placed back into the age queue. If no Update packet of a route is received
before the garbage-collect timer expires, the route is deleted from the RIP routing table.
The relationship between the timers is as follows:
The advertisement of RIP routing updates is triggered by the update timer at a default interval
of 30 seconds. Each entry is associated with the age timer and garbage-collect timer. After a
route is learned from a neighbor, it is added to the routing table, and the age timer is started. If
no update packet is received from the neighbor within 180s, the cost of the route is set to 16
(indicating that the route is unreachable). At the same time, the garbage-collect timer is
started. If no update packet is received within 120 seconds, the entry is deleted after the
garbage-collect timer expires.
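The relationship between the age timer and the garbage-collect timer can be sketched as a function of the time elapsed since the last update for a route. This Python sketch uses the default values given above. The function name route_state, the state labels, and the handling of the exact boundary instants are assumptions made for illustration.

```python
UPDATE_INTERVAL = 30            # seconds between periodic updates
AGE_TIMEOUT = 180               # seconds without an update before a route is unreachable
GARBAGE_COLLECT_TIMEOUT = 120   # further seconds before the route is deleted
INFINITY = 16                   # metric that marks a route as unreachable


def route_state(seconds_since_last_update, metric):
    """Return (state, advertised_metric) for a route learned from a neighbor."""
    if seconds_since_last_update < AGE_TIMEOUT:
        return ("valid", metric)
    if seconds_since_last_update < AGE_TIMEOUT + GARBAGE_COLLECT_TIMEOUT:
        # Age timer expired: the route is kept but advertised as unreachable.
        return ("garbage-collect", INFINITY)
    # Garbage-collect timer expired: the route is removed from the RIP routing table.
    return ("deleted", None)
```

An update received in either of the first two states resets the elapsed time to 0, which returns the route to the valid state.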

8.3.2.4 Split Horizon

Split horizon prevents a RIP-enabled interface from sending back the routes it learns, which
reduces bandwidth consumption and prevents routing loops.
Figure 8-13 Networking for split horizon
10.0.0.0/8
ATN A ATN B
10.0.0.0/8
In Figure 8-13, ATN B sends a route to 10.0.0.0 to ATN A, and ATN A does not send the
route back to ATN B.
8.3.2.5 Poison Reverse

Poison reverse allows a RIP-enabled interface to set the metric of the route that it learns from
a neighbor to 16 (indicating that the route is unreachable) and then send the route back. After
receiving this route, the neighbor deletes the useless route from its routing table, which
prevents loops.
Figure 8-14 Schematic diagram of poison reverse
10.0.0.0/8
cost=16
ATN A ATN B
cost=1
10.0.0.0/8
On the network shown in Figure 8-14, if poison reverse is not configured, ATN B sends ATN
A a route that was learned from ATN A. The metric of the route from ATN A to network
10.0.0.0 is 1. If the route from ATN A to network 10.0.0.0 becomes unreachable and ATN B
keeps sending ATN A the route to network 10.0.0.0 because ATN B has not received a route
update packet from ATN A, a routing loop occurs.
If poison reverse is configured, ATN A informs ATN B that the route received from ATN B is
unreachable. ATN B therefore does not learn the route back from ATN A, which prevents
routing loops.
If both split horizon and poison reverse are configured, only poison reverse takes effect.
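Split horizon and poison reverse differ only in what is advertised back to the neighbor from which a route was learned. The following Python sketch shows that difference under stated assumptions: the build_update function and the table layout are hypothetical, and poison reverse is given precedence when both options are enabled, as described above.

```python
INFINITY = 16  # metric that marks a route as unreachable


def build_update(table, out_neighbor, split_horizon=True, poison_reverse=False):
    """Return {destination: metric} to advertise to out_neighbor.

    table maps destination -> (metric, learned_from); learned_from is None
    for directly connected networks.
    """
    update = {}
    for dest, (metric, learned_from) in table.items():
        if learned_from == out_neighbor:
            if poison_reverse:
                update[dest] = INFINITY  # send the route back as unreachable
            elif split_horizon:
                continue                 # do not send the route back at all
            else:
                update[dest] = metric    # no protection: the route is echoed back
        else:
            update[dest] = metric
    return update
```

Split horizon produces smaller update packets, whereas poison reverse explicitly tells the neighbor that the path back through the local device is unusable.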

8.3.2.6 Triggered Update

If the local routing information changes, triggered update allows the local router to
immediately notify its neighbors of the changes through triggered update packets, which
speeds up network convergence.
Figure 8-15 Networking for triggered update
Figure labels: ATN A, ATN B, and ATN C; interfaces E0, S0, and S1; networks 10.1.0.0,
10.2.0.0, 10.3.0.0, and 10.4.0.0. Each device carries the annotation "The network to
10.4.0.0 fails."
In the networking shown in Figure 8-15, when network 10.4.0.0 becomes unreachable, ATN
C learns the information first. By default, a RIP-enabled device sends routing updates to its
neighbors every 30s. If ATN C receives an update message from ATN B while ATN C is still
waiting to send its own route update message, ATN C learns an incorrect route to 10.4.0.0
from ATN B. In this case, the next hops of the routes from ATN B and ATN C to 10.4.0.0 are
ATN C and ATN B, respectively, which results in a routing loop. If ATN C sends an Update
packet to ATN B immediately after it detects the network failure, the routing table of ATN B
is updated in time, which prevents routing loops.
In addition, if the next hop of a route becomes unavailable due to a link failure, the local
device sets the cost of the route to 16 and then advertises the route immediately to its
neighbors. This process is called route poisoning.
8.3.2.7 Route Summarization

Route summarization allows routes to different subnets of the same natural network segment
to be summarized into a single route to that network segment before the route is
advertised to other network segments. RIP-1 packets do not carry mask information, and
therefore RIP-1 can advertise only routes with natural masks. RIP-2 supports route
summarization because RIP-2 packets carry mask information. Therefore, RIP-2 supports
subnetting.
In RIP-2, route summarization reduces the size of the routing table and improves the
extensibility and efficiency of a large-scale network.
Route summarization is classified as follows:
l Process-based classful summarization
For example, a RIP process summarizes the route 10.1.1.0/24 with metric 2 and the route
10.2.2.0/24 with metric 3 into the route 10.0.0.0/8 with metric 2.
l Interface-based summarization
Users can specify a summary address.
For example, users can configure a RIP-enabled interface to summarize the route
10.1.1.0/24 with metric 2 and the route 10.1.2.0/24 with metric 3 into the route 10.1.0.0/16
with metric 2.
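Both summarization types can be sketched with the Python standard library. This is an illustration under assumptions, not the ATN implementation: the function names classful_network and summarize are hypothetical, classful_network handles only Class A, B, and C unicast addresses, and the summary route is assumed to take the smallest metric of the routes it covers, as in the examples above. A route is summarized only if it falls inside the summary address, so the sketch assumes subnets 10.1.1.0/24 and 10.1.2.0/24 for the 10.1.0.0/16 case.

```python
import ipaddress


def classful_network(prefix):
    """Return the natural (Class A, B, or C) network that contains an IPv4 prefix."""
    net = ipaddress.ip_network(prefix)
    first_octet = int(net.network_address) >> 24
    if first_octet < 128:
        length = 8    # Class A
    elif first_octet < 192:
        length = 16   # Class B
    else:
        length = 24   # Class C (Class D and E are not handled in this sketch)
    return str(net.supernet(new_prefix=length))


def summarize(routes, summary):
    """Summarize (prefix, metric) routes that fall inside the summary address.

    Returns (summary, smallest_metric), or None if no route is covered.
    """
    summary_net = ipaddress.ip_network(summary)
    metrics = [
        metric for prefix, metric in routes
        if ipaddress.ip_network(prefix).subnet_of(summary_net)
    ]
    return (str(summary_net), min(metrics)) if metrics else None
```

For example, summarize([("10.1.1.0/24", 2), ("10.2.2.0/24", 3)], classful_network("10.1.1.0/24")) yields the route 10.0.0.0/8 with metric 2.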
8.3.2.8 Multi-process and Multi-instance

RIP supports multi-process and multi-instance, which simplifies network management and
improves service control efficiency. Multi-process allows a set of interfaces to be associated
with a specific RIP process, which ensures that the specific RIP process performs all the
protocol operations only on this set of interfaces. Therefore, multiple RIP processes can run
on one router, and each process manages a unique set of interfaces. In addition, the routing
data of each RIP process is independent; however, processes can import routes from each
other.
On routers that support VPN, each RIP process is associated with a specific VPN instance.
Therefore, all the interfaces associated with the RIP process need to be associated with the
RIP process-related VPN instance.
8.3.2.9 Hot Standby

NOTE
Only the ATN 950B supports this function.
Routers with a distributed architecture support the RIP Hot Standby (HSB) feature. RIP backs
up data from the Active Main Board (AMB) to the Standby Main Board (SMB). Whenever
the AMB fails, the SMB becomes active. Therefore, RIP is not affected.
RIP backs up only RIP configurations. During a Graceful Restart (GR), a RIP-enabled device
resends a routing request to neighbors to synchronize the route database.
NOTE
The ATN can only be used as a GR Helper, not GR Restarter.


Terms
Term Definition
Poison reverse Poison reverse allows a RIP-enabled interface to set the metric of the
route that it learns from a neighbor to 16 (indicating that the route is
unreachable) and then send the route back. After receiving this route, the
neighbor deletes the useless route from its routing table, which prevents
loops.
Split horizon Split horizon prevents a RIP-enabled interface from sending back the
routes it learns, which reduces bandwidth consumption and prevents
routing loops.

Abbreviation
8.4 RIPng
8.4.1 Introduction
Definition
RIP next generation (RIPng) is an extension to RIP Version 2 (RIPv2) on IPv6 networks.
Most RIP concepts apply to RIPng.
RIPng is a distance-vector routing protocol, which measures the distance (metric or cost) to
the destination host by the hop count. In RIPng, the hop count from a device to its directly
connected network is 0, and the hop count from a device to a network that is reachable
through another device is 1. When the hop count is greater than or equal to 16, the destination
network or host is considered unreachable.
To be applied on IPv6 networks, RIPng makes the following changes to RIP:
l UDP port number: RIPng uses UDP port number 521 to send and receive routing
information.
l Multicast address: RIPng uses FF02::9 as the link-local multicast address of a RIPng
device.
l Prefix length: RIPng destination addresses are 128-bit IPv6 addresses, and the prefix
length takes the place of the mask length.
l Next hop address: RIPng uses a 128-bit IPv6 address.
l Source address: RIPng uses a link-local address (FE80::/10) as the source address to
send RIPng update packets.

Purpose
RIPng is an extension to RIP for support of IPv6.
8.4.2 Principles
RIPng is an extension to RIPv2 on IPv6 networks and uses the same timers as RIPv2. RIPng
supports split horizon, poison reverse, and triggered update, which prevents routing loops.
8.4.2.1 RIPng Packet Format

A RIPng packet is composed of a header and several route table entries (RTEs). In a RIPng
packet, the maximum number of RTEs is determined by the maximum transmission unit
(MTU) of an interface.
Figure 8-16 shows the basic format of a RIPng packet.
Figure 8-16 RIPng packet format

0 7 15 31
Command Version Must be zero
Route table entry 1 (20 octets)
---------
Route table entry N (20 octets)
A RIPng packet contains two types of RTEs:
l Next hop RTE: It defines the IPv6 address of the next hop and is located before a group
of IPv6-prefix RTEs that have the same next hop.
l IPv6-prefix RTE: It describes the destination IPv6 address and the cost in the RIPng
routing table and is located after a next hop RTE. A next hop RTE can be followed by
multiple different IPv6-prefix RTEs.
Figure 8-17 shows the format of the next-hop RTE.

Figure 8-17 Format of the next hop RTE

0 7 15 31
IPv6 next hop address (16 octets)
Must be zero Must be zero 0xFF
Figure 8-18 shows the format of the IPv6-prefix RTE.
Figure 8-18 Format of the IPv6-prefix RTE

0 7 15 31
IPv6 prefix (16 octets)
Route tag Prefixlen Metric
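The RTE formats in Figure 8-17 and Figure 8-18 can be expressed as fixed 20-octet structures. The following Python sketch is illustrative only. The function names are assumptions, the command value 2 (response) and version value 1 come from RFC 2080 rather than from this document, and the MTU calculation assumes a 40-octet IPv6 header with no extension headers and an 8-octet UDP header.

```python
import ipaddress
import struct

RIPNG_HEADER = struct.Struct("!BBH")   # command, version, must be zero
RIPNG_RTE = struct.Struct("!16sHBB")   # IPv6 prefix, route tag, prefix length, metric


def max_rtes(mtu, ipv6_header_len=40, udp_header_len=8):
    """Return how many 20-octet RTEs fit into one RIPng packet for a given MTU."""
    return (mtu - ipv6_header_len - udp_header_len - RIPNG_HEADER.size) // RIPNG_RTE.size


def pack_prefix_rte(prefix, metric, route_tag=0):
    """Encode an IPv6-prefix RTE."""
    net = ipaddress.IPv6Network(prefix)
    return RIPNG_RTE.pack(net.network_address.packed, route_tag, net.prefixlen, metric)


def pack_next_hop_rte(next_hop):
    """Encode a next hop RTE: zero route tag and prefix length, 0xFF in the metric field."""
    return RIPNG_RTE.pack(ipaddress.IPv6Address(next_hop).packed, 0, 0, 0xFF)


def pack_response(rtes):
    """Prefix already encoded RTEs with a RIPng response header."""
    return RIPNG_HEADER.pack(2, 1, 0) + b"".join(rtes)
```

Under these assumptions, an interface with an MTU of 1500 bytes can carry 72 RTEs in one packet. A next hop RTE is placed before the group of IPv6-prefix RTEs that share that next hop.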
8.4.2.2 Timers
RIPng uses the following three timers:
l Update timer: This timer periodically triggers update packet transmission. By default, the
interval at which update packets are sent is 30s. This timer is used to synchronize RIPng
routes on the network.
l Age timer: If a RIPng device does not receive any update packet from its neighbor
before a route expires, the RIPng device considers the route to its neighbor unreachable.
l Garbage-collect timer: If no packet is received to update an unreachable route after the
Age timer expires, this route is deleted from the RIPng routing table.
The relationship between these timers is as follows:
The advertisement of RIPng routing updates is periodically triggered by the update timer with
default value 30 seconds. Each routing entry is associated with the age timer and garbage-
collect timer. Each time a route is learned and added to the routing table, the age timer is
started. If no update packet is received from the neighbor within 180 seconds, the metric of
the route is set to 16, and the garbage-collect timer is started. If no update packet is received
within 120 seconds, the route is deleted after the garbage-collect timer expires.
8.4.2.3 Split Horizon

Split horizon prevents a RIPng-enabled interface from sending back the routes it learns, which
reduces bandwidth consumption and prevents routing loops.

Figure 8-19 Networking for split horizon
2001:DB8:1::/64
ATN A ATN B
2001:DB8:1::/64
On the network shown in Figure 8-19, after ATN B sends a route to network 2001:DB8:1::/64
to ATN A, ATN A does not send the route back to ATN B.
8.4.2.4 Poison Reverse

Poison reverse allows a RIPng-enabled interface to set the metric of the route that it learns
from a neighbor to 16 (indicating that the route is unreachable) and then send the route back.
After receiving this route, the neighbor deletes the useless route from its routing table, which
prevents loops.
Figure 8-20 Networking for poison reverse
2001:DB8:1::/64
metric=16
ATN A ATN B
2001:DB8:1::/64
metric=1
As shown in Figure 8-20, if poison reverse is not configured, ATN B sends ATN A a route
that was learned from ATN A. The metric of the route from ATN A to network
2001:DB8:1::/64 is 1. When the route from ATN A to network 2001:DB8:1::/64 becomes
unreachable and ATN B does not receive an update packet from ATN A and keeps sending
ATN A the route from ATN A to network 2001:DB8:1::/64, a routing loop occurs.
With poison reverse, after receiving a route from ATN B, ATN A sends ATN B a message that
the route is unreachable. ATN B therefore does not learn the route back from ATN A, which
prevents routing loops.
If both poison reverse and split horizon are configured, only poison reverse takes effect.
8.4.2.5 Triggered Update

Triggered update allows a device to advertise the routing information changes immediately,
which speeds up network convergence.

Figure 8-21 Networking for triggered update

Figure labels: ATN A, ATN B, and ATN C; interfaces E0, S0, and S1; networks 2001:DB8:11::,
2001:DB8:12::, 2001:DB8:13::, and 2001:DB8:1::. Each device carries the annotation "The
network to 2001:DB8:1:: fails."
On the network shown in Figure 8-21, if network 2001:DB8:1:: is unreachable, ATN C learns
the information first. By default, a RIPng-enabled device sends Update packets to its
neighbors every 30 seconds. If ATN C receives an Update packet from ATN B within 30s
when ATN C is still waiting to send update packets, ATN C learns the incorrect route to
network 2001:DB8:1:: from ATN B. In this case, the next hops of the routes from ATN B and
ATN C to network 2001:DB8:1:: are ATN C and ATN B, respectively, which results in a
routing loop. If ATN C sends an Update packet to ATN B immediately after it detects a
network fault, ATN B can rapidly update its routing table, which prevents routing loops.
In addition, if the next hop of a route becomes unavailable due to a link failure, the local ATN
sets the metric of the route to 16 and then advertises the route immediately to its neighbors.
This process is called route poisoning.
8.4.2.6 Route Summarization

RIPng route summarization is implemented on interfaces by summarizing the routes to be
advertised by the interfaces according to the longest match rule.
RIPng route summarization reduces the size of the routing table and improves the
extensibility and efficiency of a large-scale network.
Route summarization is implemented as follows:
For example, two routes (2001:DB8:11::24 with metric 2 and 2001:DB8:12::34 with metric 3)
are available on an interface, and the summary route configured on the interface is
2001:DB8::/32. In this situation, the finally advertised route is 2001:DB8::/32 with metric 2.
8.4.2.7 Multi-process
RIPng supports multi-process and multi-instance, which simplifies network management and
improves service control efficiency. Multi-process allows a set of interfaces to be associated
with a specific RIPng process, which ensures that the specific RIPng process performs all the
protocol operations only on this set of interfaces. Therefore, multiple RIPng processes can run
on one router, and each process manages a unique set of interfaces. In addition, the routing
data of each RIPng process is independent; however, processes can import routes from each
other.
8.4.2.8 Hot Standby

NOTE
Routers with a distributed architecture support RIPng Hot Standby (HSB). In the RIPng HSB
process, RIPng backs up RIPng configuration from the Active Main Board (AMB) to the
Standby Main Board (SMB). Whenever the AMB fails, the SMB becomes active.
After the SMB is activated, RIPng resends a request to neighbors to synchronize the route
database. Therefore, RIPng is not affected.
NOTE
The ATN can function as a GR helper, not a GR restarter.

Terms
Term Definition
Poison reverse Poison reverse allows a RIPng-enabled interface to set the metric of the
route that it learns from a neighbor to 16 (indicating that the route is
unreachable) and then send the route back. After receiving this route, the
neighbor deletes the useless route from its routing table, which prevents
loops.
Split horizon Split horizon prevents a RIPng-enabled interface from sending back the
routes it learns, which reduces bandwidth consumption and prevents
routing loops.

Abbreviation
RIPng RIP next generation
8.5 IS-IS

8.5.1 Introduction to IS-IS
Definition
Intermediate System to Intermediate System (IS-IS) is a link-state routing protocol that uses
the shortest path first (SPF) algorithm to calculate routes. IS-IS is an Interior Gateway
Protocol (IGP) and is used within an autonomous system (AS).
IS-IS was initially designed by the International Organization for Standardization (ISO) for its
Connectionless Network Protocol (CLNP).
To support IP routing, the Internet Engineering Task Force (IETF) extended and modified
IS-IS in RFC 1195. This modification enables IS-IS to be applied to TCP/IP and Open Systems
Interconnection (OSI) environments. This type of IS-IS is called Integrated IS-IS or Dual
IS-IS.
The term IS-IS used in this document refers to Integrated IS-IS, unless otherwise stated.
NOTE
If IS-IS IPv4 and IS-IS IPv6 implement a feature in the same way, details are not provided in this
chapter. For details about the implementation differences, see the appendix 8.5.4 Appendixes.
Purpose
The United States Government Open Systems Interconnection Profile (GOSIP) assumed that
TCP/IP was an interim protocol suite that would eventually be replaced by the OSI suite.
All routing protocols except IS-IS support TCP/IP only. IS-IS can apply to both TCP/IP and
OSI networks and supports dynamic routing information exchange on an IP network.
IS-IS implements fast network convergence by discovering, advertising, and calculating
routes in an AS. IS-IS and Open Shortest Path First (OSPF) have distinct differences though
both are IGPs, as shown in Table 8-6.
Table 8-6 Differences between IS-IS and OSPF
Item | IS-IS | OSPF
Protocol type | Link layer protocol | IP layer protocol
Non-IP protocols are supported | Yes | No
Applicable scope | Large Internet service provider (ISP) network | Enterprise network and ISP network
Complexity | Generates a small number of link state protocol data units (LSPs) and uses only one area. | Generates a considerable number of link-state advertisements (LSAs) and uses multiple areas.
Scalability | Supports a large single area. | Supports multiple areas on a large network.

Benefits
IS-IS has become a scalable, powerful, and easy-to-use IGP after many years of development.
It has the following advantages:
l Implements routing in a routing domain.
l Supports fast network convergence when a fault occurs on a network.
l Provides loop-free routes.
l Improves network stability.
l Supports network scalability.
l Improves network resource usage.
These advantages have led carriers to deploy IS-IS widely on live networks to guarantee
network stability, security, and scalability.
8.5.2 Principles
8.5.2.1 Basic Concepts of IS-IS
Development of IS-IS
CLNP is a Layer 3 protocol in the OSI model proposed by the ISO. IS-IS was initially designed
by the ISO and is used as a routing protocol based on CLNP addressing.
Figure 8-22 OSI model

OSI reference model layers and the corresponding OSI protocol suite:
Application: ASEs (CMIP, DS, FTAM, MHS, VTP); ACSE, ROSE, RTSE, CCRSE, ...
Presentation: Presentation Service/Presentation Protocol
Session: Session Service/Session Protocol
Transport: TP0, TP1, TP2, TP3, TP4
Network: CONP/CMNS, CLNP/CLNS, IS-IS, ES-IS
Data Link: IEEE 802.2, IEEE 802.3, IEEE 802.5 Token Ring, FDDI, X.25
Physical: IEEE 802.3 hardware, Token Ring hardware, FDDI hardware, X.25 hardware
OSI adopts systemized (or hierarchical) addressing. The services at the transport layer in OSI
can be addressed through the Network Service Access Point (NSAP).

OSI uses the following terms:

l CLNS: Connectionless Network Service
l CLNP: Connectionless Network Protocol
l CMNS: Connection-Mode Network Service
l CONP: Connection-Oriented Network Protocol
OSI implements CLNS through CLNP, and CMNS through CONP.
CLNS is implemented through the following protocols:
l CLNP: similar to the IP protocol in TCP/IP
l IS-IS: routing protocol between intermediate systems
l ES-IS: protocol used between a host system and an intermediate system, similar to ARP
and ICMP in IP
Table 8-7 Concepts in OSI and IP

Abbreviation | Concept in OSI | Concept in IP
IS | Intermediate System | Router
ES | End System | Host
DIS | Designated Intermediate System | Designated Router (DR) in OSPF
SysID | System ID | Router ID in OSPF
PDU | Protocol Data Unit | IP packet
LSP | Link State Protocol data unit | OSPF LSA
NSAP | Network Service Access Point | IP address
With the popularity of TCP/IP, the IETF extended and modified IS-IS in RFC 1195 to support
IP routing. This enables IS-IS to be applied to TCP/IP and OSI environments. This type of
IS-IS is called Integrated IS-IS or Dual IS-IS.
Address Structure of IS-IS

In OSI, the NSAP is used to locate resources. The ISO has adopted the NSAP address
structure shown in Figure 8-23. An NSAP is composed of the Initial Domain Part (IDP) and
the Domain Specific Part (DSP). IDP is the counterpart of network ID in an IP address, and
DSP is the counterpart of the subnet number and host address in an IP address.
As defined by the ISO, the IDP consists of the Authority and Format Identifier (AFI) and the
Initial Domain Identifier (IDI). The AFI specifies the address assignment mechanism and
address format; the IDI identifies a domain.
The DSP consists of the High Order DSP (HODSP), system ID, and NSAP Selector (SEL).
The HODSP is used to divide areas, the system ID identifies a host, and the SEL indicates the
service type.

The lengths of the IDP and the DSP are variable. The length of the NSAP varies from 8 bytes
to 20 bytes.
Figure 8-23 IS-IS address structure
IDP DSP
AFI IDI High Order DSP System ID SEL(1 octet)
Area Address
The components in the address structure are described as follows:

l Area address
An IDP and HODSP of the DSP can identify a routing domain and the areas in a routing
domain; therefore, the combination of the IDP and HODSP is referred to as an area
address, equivalent to an area number in OSPF. An area address uniquely identifies
an area in a routing domain. The area addresses of routers in the same Level-1 area must
be the same, whereas the area addresses of routers in the Level-2 area can be different.
In general, a router needs only one area address, and the area address of
all nodes in an area must be the same. To support seamless combination, division, and
transformation of areas, a maximum of three area addresses can be configured for an
IS-IS process on a device.
l System ID
A system ID uniquely identifies a host or a device in an area. On the device, the system
ID length is 48 bits (6 bytes).
In actual applications, a router ID corresponds to a system ID. If a device uses the IP
address of Loopback 0 (168.10.1.1) as its router ID, its system ID used in IS-IS can be
obtained in the following steps:
– Extend each part of the IP address 168.10.1.1 to 3 digits by adding 0s to the front
of any part that is shorter than 3 digits.
– Divide the extended address 168.010.001.001 into three parts, with each part
consisting of 4 decimal digits.
– The reconstructed 1680.1000.1001 is the system ID.
You can specify a system ID in many ways. You need to ensure that the system ID
uniquely identifies a host or a router.
l SEL
The role of an SEL (also referred to as NSAP Selector or N-SEL) is similar to that of the
"protocol identifier" of IP. A transport protocol matches an SEL. The SEL is 00 in IP.
l NET
A Network Entity Title (NET) indicates the network layer information of an IS itself. It
does not contain the transport layer information (SEL = 0). A NET can be regarded as a
special NSAP. The length of the NET field is the same as that of an NSAP, varying from
8 bytes to 20 bytes. When configuring IS-IS on a device, you can configure only a NET
instead of an NSAP.
In general, an IS-IS process is configured with only one NET. When areas need to be
redefined, for example, areas need to be combined or an area needs to be divided into
sub-areas, you can configure multiple NETs.
An IS-IS process can be configured with a maximum of three area addresses; therefore, a
maximum of three NETs can be configured. When configuring multiple NETs, ensure
that their system IDs are the same.
For example, in NET ab.cdef.1234.5678.9abc.00, the area address is ab.cdef, the system ID is
1234.5678.9abc, and the SEL is 00.
NOTE
The routers in the same area must have the same area address.
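The address rules above can be checked with a short script. The following Python sketch is illustrative only. The function names system_id_from_router_id and parse_net are assumptions. It covers the two worked examples in this section: deriving a system ID from the router ID 168.10.1.1, and splitting a NET into its area address, system ID, and SEL. It assumes a 6-byte system ID and a 1-byte SEL, and it returns the area address as a plain hexadecimal string without dots.

```python
def system_id_from_router_id(router_id):
    """Convert a dotted-decimal router ID into a system ID (e.g. 1680.1000.1001)."""
    octets = router_id.split(".")
    if len(octets) != 4:
        raise ValueError("router ID must have four decimal parts")
    # Pad each part to 3 digits, then regroup the 12 digits into 3 groups of 4.
    digits = "".join(f"{int(octet):03d}" for octet in octets)
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def parse_net(net):
    """Split a NET into (area_address, system_id, sel).

    The last byte is the SEL, the 6 bytes before it are the system ID,
    and the remaining leading bytes form the area address.
    """
    hex_digits = net.replace(".", "")
    if len(hex_digits) % 2 or not 16 <= len(hex_digits) <= 40:
        raise ValueError("a NET is 8 to 20 bytes long")
    int(hex_digits, 16)  # raises ValueError if a non-hexadecimal digit is present
    area = hex_digits[:-14]
    sysid = hex_digits[-14:-2]
    sel = hex_digits[-2:]
    return area, ".".join(sysid[i:i + 4] for i in range(0, 12, 4)), sel
```

When multiple NETs are configured for one IS-IS process, the system ID returned by parse_net must be identical for all of them; only the area address differs.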
IS-IS PDU Format

The types of PDUs for IS-IS include Hello, LSPs, CSNPs, and PSNPs.
Table 8-8 PDU types

Type Value   PDU Type                                  Name
15           Level-1 LAN IS-IS Hello PDU               L1 LAN IIH
16           Level-2 LAN IS-IS Hello PDU               L2 LAN IIH
17           Point-to-Point IS-IS Hello PDU            P2P IIH
18           Level-1 Link State PDU                    L1 LSP
20           Level-2 Link State PDU                    L2 LSP
24           Level-1 Complete Sequence Numbers PDU     L1 CSNP
25           Level-2 Complete Sequence Numbers PDU     L2 CSNP
26           Level-1 Partial Sequence Numbers PDU      L1 PSNP
27           Level-2 Partial Sequence Numbers PDU      L2 PSNP
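As a minimal sketch, the type values in Table 8-8 can be mapped to names as follows. The PDU Type octet also carries three reserved bits (shown as R R R in the packet format figures), so the sketch masks them off before the lookup. The names PDU_TYPES and pdu_type_name are illustrative only.

```python
# Type values and names taken from Table 8-8.
PDU_TYPES = {
    15: "L1 LAN IIH",
    16: "L2 LAN IIH",
    17: "P2P IIH",
    18: "L1 LSP",
    20: "L2 LSP",
    24: "L1 CSNP",
    25: "L2 CSNP",
    26: "L1 PSNP",
    27: "L2 PSNP",
}


def pdu_type_name(type_octet):
    """Return the PDU name for the octet that holds R R R + PDU Type."""
    # The three high-order bits are reserved; the low five bits carry the type.
    value = type_octet & 0x1F
    return PDU_TYPES.get(value, f"unknown ({value})")


print(pdu_type_name(16))
```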
l Hello packet format

Hello packets, also called the IS-to-IS Hello PDUs (IIH), are used to set up and maintain
neighbor relationships. Among them, Level-1 LAN IIHs are applied to the Level-1
routers on broadcast LANs; Level-2 LAN IIHs are applied to the Level-2 routers on
broadcast LANs; and P2P IIHs are applied to non-broadcast networks. Hello packets on
different networks have different formats.
Figure 8-24 shows the format of a Hello packet on a broadcast network (the part
highlighted in blue is the common header).

Figure 8-24 Format of a Level-1 or Level-2 LAN IIH Hello packet

No. of Octets
Intradomain Routeing Protocol Discriminator 1
Length Indicator 1
Version/Protocol ID Extension 1
ID Length 1
R R R PDU Type 1
Version 1
Reserved 1
Maximum Area Address 1
Reserved/Circuit Type 1
Source ID ID Length
Holding Time 2
PDU Length 2
R Priority 1
LAN ID ID Length+1
Variable Length Fields
Figure 8-25 shows the format of a Hello packet on a P2P network.
Figure 8-25 Format of a P2P IIH Hello packet

No. of Octets
Length Indicator 1
ID Length 1
R R R PDU Type 1
Version 1
Reserved 1
Reserved/Circuit Type 1
Source ID ID Length
Holding Time 2
PDU Length 2
Local Circuit ID 1

As shown in Figure 8-25, most fields in a P2P IIH are the same as those in a LAN IIH.
The P2P IIH does not have the priority and LAN ID fields, but has a local circuit ID
field. The local circuit ID indicates the local link ID.
l LSP packet format
LSPs are used to exchange link-state information. There are two types of LSPs: Level-1
LSPs and Level-2 LSPs. Level-1 IS-IS transmits Level-1 LSPs; Level-2 IS-IS transmits
Level-2 LSPs; and Level-1-2 IS-IS can transmit both Level-1 and Level-2 LSPs.
Level-1 and Level-2 LSPs have the same format, as shown in Figure 8-26.
Figure 8-26 Format of a Level-1 or Level-2 LSP

No. of Octets
Length Indicator 1
ID Length 1
R R R PDU Type 1
Version 1
Reserved 1
PDU Length 2
Remaining Lifetime 2
LSP ID ID Length+2
Sequence Number 4
Checksum 2
P ATT OL IS Type 1
The main fields of a Level-1 or Level-2 LSP are described as follows:

– OL: LSDB overload
LSPs with the overload bit set are still flooded on the network, but other routers do
not use them to calculate routes that pass through the overloaded router. Specifically,
after the overload bit is set on a router, other routers exclude that router as a transit
node in SPF calculation but still calculate routes to its directly connected networks.
In Figure 8-27, packets from ATN A to ATN C are all forwarded by ATN B. If the
OL field is set to 1 on ATN B, ATN A considers the LSDB of ATN B incomplete.
ATN A then forwards the packets to ATN C through ATN D and ATN E, but the
packets to the destination that is directly connected to ATN B are forwarded
normally.

Figure 8-27 Networking for LSDB overload
(Topology diagram: ATN A reaches ATN C either through ATN B, which has the overload bit
set, or through ATN D and ATN E.)
– IS Type: type of IS-IS that generates the LSP

The IS Type specifies the IS-IS type. 01 indicates Level-1, and 11 indicates Level-2.
l SNP Format
Sequence Number PDUs (SNPs) describe the LSPs in all or part of the databases to
synchronize and maintain all LSDBs.
SNPs include complete SNPs (CSNPs) and partial SNPs (PSNPs). They are further
divided into the Level-1 CSNP, Level-2 CSNP, Level-1 PSNP, and Level-2 PSNP.
A CSNP contains the summary of all LSPs in an LSDB. This maintains LSDB
synchronization between neighboring routers. On a broadcast network, the DIS
periodically sends CSNPs. The default interval at which CSNPs are sent is 10 seconds.
On a point-to-point link, CSNPs are sent only when the neighbor relationship is
established for the first time.
Figure 8-28 shows the CSNP packet format.
Figure 8-28 Format of a Level-1 or Level-2 CSNP

No. of Octets
Length Indicator 1
ID Length 1
R R R PDU Type 1
Version 1
Reserved 1
PDU Length 2
Source ID ID Length+1
Start LSP ID ID Length+2
End LSP ID ID Length+2

The main fields are described as follows:

– Source ID: system ID of the router that sends the SNP
– Start LSP ID: ID of the first LSP in the CSNP
– End LSP ID: ID of the last LSP in the CSNP
A PSNP lists only the sequence numbers of recently received LSPs, so a single PSNP can
acknowledge multiple LSPs. If the local LSDB is out of date, a router also uses a PSNP
to request the missing or newer LSPs from a neighbor.
Figure 8-29 shows the PSNP packet format.
Figure 8-29 Format of a Level-1 or Level-2 PSNP

No. of Octets
Length Indicator 1
ID Length 1
R R R PDU Type 1
Version 1
Reserved 1
PDU Length 2
Source ID ID Length+1
l CLV
The variable-length fields in a PDU consist of multiple Code-Length-Value (CLV) triplets.
A CLV is also called a Type-Length-Value (TLV). Figure 8-30 shows the CLV format.
Figure 8-30 CLV format

No. of Octets
Code 1
Length 1
Value Length
CLVs vary with PDU types, as shown in Table 8-9.

Table 8-9 PDU types and CLV names

CLV Code   Name                                        Applied PDU Type
1          Area Addresses                              IIH and LSP
2          IS Neighbors (LSP)                          LSP
4          Partition Designated Level2 IS              L2 LSP
6          IS Neighbors (MAC Address)                  LAN IIH
7          IS Neighbors (SNPA Address)                 LAN IIH
8          Padding                                     IIH
9          LSP Entries                                 SNP
10         Authentication Information                  IIH, LSP, and SNP
128        IP Internal Reachability Information        LSP
129        Protocols Supported                         IIH and LSP
130        IP External Reachability Information        L2 LSP
131        Inter-Domain Routing Protocol Information   L2 LSP
132        IP Interface Address                        IIH and LSP
The CLVs with codes ranging from 1 to 10 are defined in ISO 10589 (CLV 3 and CLV 5
are not listed in the table), and the other CLVs are defined in RFC 1195.
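The Code/Length/Value layout in Figure 8-30 lends itself to a simple walk over the variable-length part of a PDU. The following Python sketch is illustrative only: parse_clvs is a hypothetical helper, and the sample bytes are made up for the example (an Area Addresses CLV followed by a Protocols Supported CLV).

```python
def parse_clvs(data):
    """Return a list of (code, value) pairs from a variable-length field area."""
    clvs = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated CLV header")
        # One octet of Code, one octet of Length, then Length octets of Value.
        code = data[offset]
        length = data[offset + 1]
        value = data[offset + 2:offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated CLV value")
        clvs.append((code, value))
        offset += 2 + length
    return clvs


# Made-up sample: code 1 (Area Addresses) with 4 value octets,
# then code 129 (Protocols Supported) with 1 value octet.
sample = bytes([1, 4, 3, 0x49, 0x00, 0x01, 129, 1, 0xCC])
for code, value in parse_clvs(sample):
    print(code, value.hex())
```

A receiver that does not recognize a code can skip the CLV by using its Length octet, which is what allows new CLVs to be added without changing the PDU header.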
IS-IS Areas
l Two-Level structure
To support large-scale routing networks, IS-IS adopts a two-level structure in a routing
domain. A large domain can be divided into areas. In general, Level-1 routers are located
in an area, Level-2 routers are located between areas, and Level-1-2 routers are located
between Level-1 and Level-2 routers.
l Level-1 router
A Level-1 router manages intra-area routing. It establishes neighbor relationships with
only the Level-1 and Level-1-2 routers in the same area and maintains a Level-1 LSDB.
The LSDB contains routing information in the local area. A packet to a destination
beyond this area is forwarded to the nearest Level-1-2 router.
l Level-2 router
A Level-2 router manages inter-area routing. It can establish neighbor relationships with
Level-2 routers or Level-1-2 routers in other areas. It maintains a Level-2 LSDB which
contains inter-area routing information.
All Level-2 routers form the backbone network of the routing domain. They are
responsible for communications between areas. The Level-2 routers in the routing
domain must be contiguous to ensure the continuity of the backbone network. Only

Level-2 routers can exchange data packets or routing information with routers beyond
the area.
l Level-1-2 router
A router that belongs to both a Level-1 area and a Level-2 area is called a Level-1-2
router. It can establish Level-1 neighbor relationships with Level-1 routers and Level-1-2
routers in the same area. It can also establish Level-2 neighbor relationships with Level-2
routers and Level-1-2 routers in other areas. Level-1 routers can be connected to other
areas only through Level-1-2 routers.
A Level-1-2 device maintains two LSDBs: a Level-1 LSDB and a Level-2 LSDB. The
Level-1 LSDB is used for intra-area routing, whereas the Level-2 LSDB is used for
inter-area routing.
NOTE
Level-1 routers in different areas cannot establish neighbor relationships. Level-2 routers can
establish neighbor relationships with each other, regardless of the areas to which the Level-2
routers belong.
l Interface level
A Level-1-2 device may need to establish only a Level-1 adjacency with a neighbor and
establish only a Level-2 adjacency with another neighbor. In this case, you can set the
level of an interface to control the setting of adjacencies on the interface. Specifically,
only Level-1 adjacencies can be established on a Level-1 interface, and only Level-2
adjacencies can be established on a Level-2 interface.
Figure 8-31 shows a network that runs IS-IS. The network is similar to an OSPF network
with multiple areas. The entire backbone area contains all routers in area 1 and Level-1-2
routers in other areas.

Figure 8-31 IS-IS topology I
(Topology diagram: Area 1 is the backbone and contains Level-2 routers. Areas 2 through 5
contain Level-1 routers and attach to the backbone through Level-1-2 routers.)
Figure 8-32 shows another type of IS-IS topology. All the contiguous Level-1-2 and Level-2
routers form the backbone area of IS-IS. In this topology, Level-2 routers belong to different
areas, and Level-1-2 routers also belong to different areas. No area is specifically defined as
the backbone area.
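Figure 8-32 IS-IS topology II
(Topology diagram: Areas 1 through 4 contain a mix of Level-1, Level-2, and Level-1-2
routers. The Level-2 and Level-1-2 routers in the different areas are contiguous.)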

NOTE
The IS-IS backbone network does not refer to a specific area.
For OSPF, inter-area routes are forwarded by the backbone area, and the SPF algorithm is
used only in the same area. For IS-IS, both Level-1 and Level-2 routes are calculated through
the SPF algorithm to generate the Shortest Path Tree (SPT).
IS-IS Network Types

IS-IS supports only two types of networks, classified by the type of physical link:
l Broadcast links: such as Ethernet and Token-Ring
l Point-to-point links: such as PPP and HDLC
On a Non-Broadcast Multi-Access (NBMA) network such as ATM, sub-interfaces must be used
and their network type must be set to P2P. IS-IS cannot run on Point-to-Multipoint (P2MP)
networks.
DIS and Pseudo Node

On broadcast networks, IS-IS needs to elect a Designated Intermediate System (DIS) from all
the routers.
The Level-1 and Level-2 DISs are elected separately. You can configure different priorities
for DISs of different levels. The router with the highest priority is elected as the DIS. If there
are multiple routers with the same highest priority on a broadcast network, the one with the
largest MAC address is elected. The DISs of different levels can be the same router or
different routers.
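The election rule above (highest priority first, then largest MAC address) can be sketched as follows. This is a simplified illustration for one level on one broadcast network; LanRouter, mac_to_int, and elect_dis are hypothetical names, and the MAC addresses are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanRouter:
    name: str
    priority: int  # DIS priority advertised in LAN IIHs for this level
    mac: str       # interface MAC address, for example "00e0-fc00-0001"


def mac_to_int(mac):
    """Turn a MAC address string into an integer so that MACs can be compared."""
    return int(mac.replace("-", "").replace(":", "").replace(".", ""), 16)


def elect_dis(routers):
    """Pick the DIS: highest priority wins, ties go to the largest MAC address."""
    return max(routers, key=lambda r: (r.priority, mac_to_int(r.mac)))


lan = [
    LanRouter("ATN A", 64, "00e0-fc00-0001"),
    LanRouter("ATN B", 64, "00e0-fc00-0002"),
    LanRouter("ATN C", 0, "00e0-fc00-00ff"),
]
print(elect_dis(lan).name)
```

Because Level-1 and Level-2 DISs are elected separately, the function would be called once per level with the priorities configured for that level.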
Unlike DR election in OSPF, the DIS election in IS-IS has the following features:
l The router with the priority 0 also takes part in the DIS election.
l When a new router that meets the requirements of being a DIS joins a broadcast network,
the router is selected as the new DIS, and the original pseudonode is deleted. This causes
LSP flooding.
On IS-IS broadcast networks, the routers (including non-DIS routers) of the same level in a
network segment set up adjacencies, which is different from the implementation on OSPF
networks. Figure 8-33 shows the networking for adjacencies.
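Figure 8-33 DISs and adjacencies on IS-IS broadcast networks
(Topology diagram: Level-1-2 routers on one broadcast segment form Level-1 adjacencies
and Level-2 adjacencies with each other. One router is the L1 DIS and another is the
L2 DIS.)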

A DIS is used to create and update pseudo nodes. It also generates LSPs of the pseudo nodes.
The LSPs describe the available routers on the network.
A pseudo node represents the broadcast network itself as a virtual node; it is not a
real router. In IS-IS, a pseudo node is identified by the system ID of the DIS and a
1-byte non-zero Circuit ID.
With pseudo nodes, the network topology is simplified, and LSPs are shortened. When the
network changes, the number of generated LSPs is reduced. Therefore, the SPF consumes
fewer resources.
NOTE
On IS-IS broadcast networks, although all the routers set up adjacencies with each other, the LSDBs are
synchronized by the DISs.
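To illustrate why pseudo nodes shorten LSPs, the following sketch counts neighbor entries on a broadcast segment with n routers. It assumes the usual model in which each router advertises only the pseudo node as its neighbor on the segment, and the pseudo node LSP lists every router. The function names and the dotted notation for the pseudo node ID are illustrative assumptions.

```python
def pseudonode_id(dis_system_id, circuit_id):
    """Build a pseudo node ID from the DIS system ID and a non-zero Circuit ID."""
    if not 1 <= circuit_id <= 255:
        raise ValueError("Circuit ID must fit in 1 byte and must not be 0")
    return f"{dis_system_id}.{circuit_id:02x}"


def entries_full_mesh(n):
    """Neighbor entries if each of n routers listed every other router."""
    return n * (n - 1)


def entries_with_pseudonode(n):
    """Each router lists the pseudo node once; the pseudo node lists n routers."""
    return 2 * n


print(pseudonode_id("2222.2222.2222", 1))
for n in (4, 10, 50):
    print(n, entries_full_mesh(n), entries_with_pseudonode(n))
```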
Establishment of IS-IS Neighbor Relationship

Two IS-IS routers need to establish a neighbor relationship before exchanging protocol
packets for routing. On different networks, the modes for establishing IS-IS neighbors are
different.
l Establishment of a neighbor relationship on a broadcast link
Figure 8-34 shows the networking for establishing a neighbor relationship between
ATN A and ATN B.
Figure 8-34 Networking for a broadcast link
(Topology diagram: ATN A, ATN B, ATN C, and ATN D are attached to the same broadcast
network.)
ATN A, ATN B, ATN C, and ATN D are Level-2 routers. ATN A is newly added to the
broadcast network. The process of establishing the neighbor relationship between ATN A
and ATN C or between ATN A and ATN D is similar to that between ATN A and ATN
B.
Figure 8-35 shows the process of establishing the neighbor relationship between ATN A
and ATN B.

Figure 8-35 Establishing a neighbor relationship on a broadcast link
(Sequence diagram between ATN A, system ID 1111.1111.1111, and ATN B, system ID
2222.2222.2222:
1. ATN A sends an L2 LAN IIH (sys id: 1111.1111.1111, neighbor: null). ATN B marks
neighbor ATN A as initialized.
2. ATN B sends an L2 LAN IIH (sys id: 2222.2222.2222, neighbor: ATN A). ATN A marks
neighbor ATN B as established.
3. ATN A sends an L2 LAN IIH (sys id: 1111.1111.1111, neighbor: ATN B). ATN B marks
neighbor ATN A as established.
4. Both routers continue to exchange L2 LAN IIHs.)
ATN A broadcasts a Level-2 LAN IIH with an empty neighbor list. After receiving the
PDU, ATN B sets its neighbor status with ATN A to Initial and responds with a Level-2
LAN IIH that lists ATN A as a neighbor. On receiving this IIH, ATN A sets its neighbor
status with ATN B to Up and sends a Level-2 LAN IIH that lists ATN B as a neighbor. On
receiving that IIH, ATN B sets its neighbor status with ATN A to Up.
The network is a broadcast network, so a DIS needs to be elected. After the neighbor
relationship is established, the routers wait for two Hello intervals before electing
the DIS. The IIH packets exchanged by the routers contain the Priority field. The
router with the highest priority is elected as the DIS. If the routers have the same
priority, the router with the largest interface MAC address is elected as the DIS.
l Establishment of a neighbor relationship on a P2P link
The establishment of a neighbor relationship on a P2P link is different from that on a
broadcast link. On a P2P link, the establishment of a neighbor relationship can be
conducted in 2-way or 3-way handshake mode.
– 2-way mode
A router sets the neighbor relationship to Up as soon as it receives an IS-IS Hello
packet from the peer, without confirming that the peer receives its own Hello packets.
– 3-way mode
A neighbor relationship is established after IS-IS Hello PDUs are exchanged three
times, which is similar to the establishment of a neighbor relationship on a
broadcast link.
NOTE
For details on the IS-IS 3-way handshake mechanism, see the IS-IS 3-Way Handshake chapter.
Basic rules for establishing an IS-IS neighbor relationship are as follows:
l Only neighboring routers of the same level can set up the neighbor relationship with each
other.
l For Level-1 routers, their area IDs must be the same.
l Routers must be on the same network segment.

Network types of IS-IS interfaces on both ends of a link must be consistent. Otherwise, a
neighbor relationship cannot be established. By simulating Ethernet interfaces as P2P
interfaces, you can establish a neighbor relationship on a P2P link.
IS-IS runs directly over the data link layer and was initially designed for CLNP, so in
principle the establishment of an IS-IS neighbor relationship does not depend on IP
addresses. On this device, however, IS-IS is used only for IP routing, so IS-IS checks
the IP address of its neighbor. If secondary IP addresses are assigned to the interfaces,
the routers can still set up the IS-IS neighbor relationship as long as either the
primary IP addresses or the secondary IP addresses are on the same network segment.
When IP unnumbered is not configured, the neighbor relationship cannot be set up if the
IP address of the neighbor and the IP address of the local interface that receives the
Hello packets are not on the same network segment. This check prevents adjacencies over
which IP packets could not be forwarded.
The neighbor relationship can be set up if you prevent the device from checking the IP
addresses contained in received Hello PDUs.
l For P2P interfaces, you can prevent them from checking IP addresses.
l For Ethernet interfaces, simulate them as P2P interfaces and then prevent them from
checking IP addresses.
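The neighbor establishment rules in this section can be summarized in a small check. The sketch below is a simplification under stated assumptions: Endpoint, common_levels, and can_form_adjacency are hypothetical names, only one IP address per interface is modeled, and the check_ip flag stands in for disabling the IP address check (which the text allows only on P2P interfaces).

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Endpoint:
    level: str                        # "L1", "L2" or "L1/2"
    area: str                         # area address, for example "49.0001"
    interface_ip: str                 # for example "10.1.1.1/24"
    network_type: str = "broadcast"   # "broadcast" or "p2p"


def common_levels(a, b):
    """Return the set of levels (1 and/or 2) at which a and b can be neighbors."""
    levels = {"L1": {1}, "L2": {2}, "L1/2": {1, 2}}
    shared = levels[a.level] & levels[b.level]
    if a.area != b.area:
        # Level-1 neighbors must be in the same area.
        shared.discard(1)
    return shared


def can_form_adjacency(a, b, check_ip=True):
    if a.network_type != b.network_type:
        return False      # network types on both ends must be consistent
    if not common_levels(a, b):
        return False      # no level in common
    if check_ip:
        net_a = ipaddress.ip_interface(a.interface_ip).network
        net_b = ipaddress.ip_interface(b.interface_ip).network
        if net_a != net_b:
            return False  # not on the same network segment
    return True


r1 = Endpoint("L1", "49.0001", "10.1.1.1/24")
r2 = Endpoint("L1/2", "49.0001", "10.1.1.2/24")
r3 = Endpoint("L1", "49.0002", "10.1.1.3/24")
print(can_form_adjacency(r1, r2), can_form_adjacency(r1, r3))
```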
Process of Exchanging IS-IS LSPs

l LSP flooding
LSP flooding is a process in which a device generates an LSP and sends it to its
neighbors, and the neighbors send the received LSP to their neighbors except the devices
from which the LSP is received. In this manner, the LSP is flooded among the devices of
the same level. Through the flooding, each device of the same level has the same LSP
information and keeps a synchronized LSDB.
Each LSP has a 4-byte sequence number. When a device is started, the sequence number
of the first LSP sent by the device is 1. When a new LSP is generated, the sequence
number of the LSP is the sequence number of the previous LSP plus 1. The greater the
sequence number, the newer the LSP.
l Causes of LSP generation
All routers in the IS-IS routing domain can generate LSPs. The following events trigger
the generation of a new LSP:
– A neighbor goes Up or Down.
– An associated interface goes Up or Down.
– Imported IP routes change.
– Inter-area IP routes change.
– A new metric value is configured for an interface.
– Periodical updates occur.
l Processing of a new LSP received from a neighbor
a. The device adds the LSP to the LSDB and marks it as flooding.
b. The device sends the LSP to all interfaces except the one that received the LSP.
c. The neighbors flood the LSP to their neighbors.
l Synchronizing LSDBs between a newly added router and the DIS

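Figure 8-36 Process of updating LSDBs on a broadcast link
(Sequence diagram between ATN A, ATN B (DIS), and ATN C:
1. ATN C sends the LSP ATN-C.00-00.
2. The DIS sends a CSNP that lists ATN-A.00-00, ATN-B.00-00, ATN-B.01-00, and
ATN-C.00-00.
3. ATN C sends a PSNP that requests ATN-A.00-00, ATN-B.00-00, and ATN-B.01-00.
4. The DIS sends the LSPs ATN-A.00-00, ATN-B.00-00, and ATN-B.01-00.)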
– Newly added ATN C sends Hello packets to establish neighbor relationships with
the other routers in the broadcast domain. For details, see "Establishment of a
neighbor relationship on a broadcast link."
– After setting up the neighbor relationships with other routers, ATN C sends its LSP
to the following multicast addresses after the LSP timer expires:
Level-1: 01-80-C2-00-00-14
Level-2: 01-80-C2-00-00-15
Then, all neighbors on the network can receive the LSP.
– The DIS on the network segment adds the LSP received from ATN C to its LSDB.
After the CSNP timer expires, the DIS sends CSNPs to synchronize the LSDBs on
the network. By default, CSNPs are sent at an interval of 10 seconds.
– After receiving the CSNPs from the DIS, ATN C checks its LSDB and sends a
PSNP to request the LSPs it does not have.
– After receiving the PSNP, the DIS sends the required LSPs to synchronize LSDBs.
l Process of updating the LSDB of the DIS
– When the DIS receives an LSP, it searches the LSDB for related records. If the DIS
does not find the LSP in its LSDB, the DIS adds the LSP to its LSDB and
broadcasts the new LSDB.
– If the sequence number of the received LSP is greater than that of the local LSP in
the LSDB, the DIS replaces the local LSP with the received LSP in the LSDB, and
broadcasts the new LSDB.

– If the sequence number of the received LSP is less than that of the local LSP in the
LSDB, the DIS sends the local LSP to the inbound interface.
– If the sequence number of the received LSP is equal to that of the local LSP in the
LSDB, the DIS checks whether the Remaining Lifetime of the received LSP is 0. If
the Remaining Lifetime of the received LSP is not 0 and the Remaining Lifetime of
the local LSP in the LSDB is 0, the DIS replaces the local LSP with the received
LSP and broadcasts the new LSDB. If the Remaining Lifetime of the received LSP
is 0 and the Remaining Lifetime of the local LSP in the LSDB is not 0, the DIS
sends the local LSP in the LSDB to the inbound interface.
– If the sequence numbers of the received LSP and local LSP in the LSDB are the
same, and the Remaining Lifetimes of the two LSPs are not 0, the DIS compares the
checksum of the two LSPs. If the checksum of the received LSP is greater than that
of the local LSP in the LSDB, the DIS replaces the local LSP with the received LSP
and broadcasts the new LSDB. If the checksum of the received LSP is less than that
of the local LSP in the LSDB, the DIS sends the local LSP in the LSDB to the
inbound interface.
– If the sequence numbers of the received LSP and local LSP in the LSDB are the
same, the Remaining Lifetimes of the two LSPs are not 0, and the checksums of the
two LSPs are the same, the DIS does not forward the received LSP.
l Synchronizing the LSDB on a P2P link
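Figure 8-37 Process of updating the LSDB on a P2P link
(Sequence diagram between ATN A and ATN B over a PPP link:
1. ATN A sends the LSP ATN-A.00-00.
2. ATN B acknowledges it with a PSNP for ATN-A.00-00.
3. If the retransmission timer on ATN A times out before the PSNP arrives, ATN A resends
the LSP ATN-A.00-00.
4. ATN B sends the PSNP for ATN-A.00-00 as the response packet.)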
a. When the neighbor relationship is set up for the first time, a router sends a CSNP to
its neighbor. If the LSDB of the neighbor is not synchronized with the CSNP, the
neighbor sends a PSNP to request the LSPs that it lacks.
b. The router sends the required LSP to the neighbor and starts the LSP retransmission
timer. The router then waits for a PSNP from the neighbor as an acknowledgement
of receiving the LSP.
c. If the router does not receive the PSNP from the neighbor after the LSP
retransmission timer expires, it resends the LSP.

NOTE
A PSNP on a P2P link is used as follows:

l An Ack packet to acknowledge the received LSP.
l A request packet to acquire LSPs.
l Updating the LSDB on a P2P link
– If the sequence number of the received LSP is greater than that of the local LSP in
the LSDB, the device adds the received LSP to its LSDB and then sends a PSNP to
acknowledge the received LSP. At last, the device sends the LSP to all its neighbors
except the neighbor that sent the LSP.
– If the sequence number of the received LSP is less than that of the local LSP in the
LSDB, the device directly sends the local LSP to the neighbor and waits for a PSNP
from the neighbor.
– If the sequence number of the received LSP is equal to that of the local LSP in the
LSDB, the device checks whether the Remaining Lifetime of the received LSP is 0.
If the Remaining Lifetime of the received LSP is not 0, and the Remaining Lifetime
of the local LSP in the LSDB is 0, the device adds the received LSP to its LSDB,
sends a PSNP to acknowledge the received LSP, and then sends the received LSP to
all its neighbors except the neighbor that sent the LSP. If the Remaining Lifetime of
the received LSP is 0, and the Remaining Lifetime of the corresponding LSP in the
LSDB is not 0, the device directly sends the local LSP to the neighbor and waits for
a PSNP from the neighbor.
– If the sequence numbers of the received LSP and local LSP in the LSDB are the
same, and the Remaining Lifetimes of the two LSPs are not 0, the device compares
the checksums of the two LSPs. If the checksum of the received LSP is greater than
that of the local LSP in the LSDB, the device adds the received LSP to its LSDB,
sends a PSNP to acknowledge the received LSP, and then sends the received LSP to
all its neighbors except the neighbor that sent the LSP. If the checksum of the
received LSP is less than that of the local LSP in the LSDB, the device directly
sends the local LSP to the neighbor and waits for a PSNP from the neighbor.
– If the sequence numbers of the received LSP and local LSP in the LSDB are the
same, the Remaining Lifetimes of the two LSPs are not 0, and the checksums of the
two LSPs are the same, the device does not forward the received LSP.
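The comparison rules above (sequence number first, then Remaining Lifetime, then checksum) apply to both the DIS and P2P cases and can be sketched as one decision function. The names LspSummary and compare_lsp are hypothetical. The function only returns the decision; the follow-up actions (acknowledging with a PSNP, flooding to other neighbors) are left out. The case in which both copies have a Remaining Lifetime of 0 is not covered by the text, so the sketch simply ignores the received LSP in that case, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LspSummary:
    sequence: int
    remaining_lifetime: int
    checksum: int


def compare_lsp(received: LspSummary, local: Optional[LspSummary]) -> str:
    """Return 'install', 'send_local' or 'ignore' for a received LSP."""
    if local is None:
        return "install"            # no copy in the LSDB yet
    if received.sequence != local.sequence:
        # The greater sequence number identifies the newer LSP.
        return "install" if received.sequence > local.sequence else "send_local"
    recv_expired = received.remaining_lifetime == 0
    local_expired = local.remaining_lifetime == 0
    if recv_expired and local_expired:
        return "ignore"             # assumption: case not described in the text
    if local_expired:
        return "install"            # received copy is alive, local copy is not
    if recv_expired:
        return "send_local"         # local copy is alive, received copy is not
    if received.checksum != local.checksum:
        return "install" if received.checksum > local.checksum else "send_local"
    return "ignore"                 # identical copies: nothing to forward


print(compare_lsp(LspSummary(5, 1200, 0x1A2B), LspSummary(4, 1100, 0x1A2B)))
```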
8.5.2.2 IS-IS Multi-instance and Multi-process

For the routers that support VPNs, each IS-IS process can be associated with a specific VPN
instance. Therefore, multiple IS-IS processes can be created, with each process associated
with one VPN instance.
l IS-IS multi-instance allows multiple IS-IS instances to be configured on the same router.
l IS-IS multi-process allows multiple IS-IS processes to be created on the same VPN or
public network.
– The multi-process feature allows a set of interfaces to be associated with a specific
IS-IS process. This ensures that the specific IS-IS process performs all the protocol
operations only on the set of interfaces. Therefore, multiple IS-IS processes can run
on one router, and each process is responsible for a unique set of interfaces.
– Multiple IS-IS processes share the same RM routing table, and multiple IS-IS
instances use the RM routing tables of VPNs. Each VPN has its own RM routing
table.

– When an IS-IS process is created, it can be associated with a VPN instance. Then,
the IS-IS process belongs to the VPN instance and processes events only in the
VPN instance. If the VPN instance is deleted, the IS-IS process is also deleted.
For easy management and effective control, IS-IS supports multi-process and multi-instance.
In the scenario where IS-IS is applied to users on private networks, after a VPN is created,
interfaces bound to the VPN and routes in the VPN are isolated from other VPNs and public
network data. In this case, you can adopt IS-IS multi-instance to deploy IS-IS in the VPN.
For the routers that support the VPN, each IS-IS process is associated with a specific VPN
instance. All the interfaces attached to an IS-IS process, therefore, must be associated with the
VPN instance with which this IS-IS process is associated.
At present, VPN instances are maintained by the VPN module. IS-IS multi-instance is
implemented by associating an IS-IS process with a VPN instance when creating the IS-IS
process.
When configuring IS-IS multi-instance and multi-process, note the following:

l An IS-IS instance must be associated with a VPN instance when the IS-IS instance is
created. If an IS-IS instance is not associated with a VPN instance when the IS-IS
instance is created, the IS-IS instance cannot be bound to any VPN instance later.
l An IS-IS process that is already associated with a VPN instance cannot be associated
with another VPN instance.
l An IS-IS process can be associated with only one VPN instance of a single protocol type
such as IPv4.
l Multiple IS-IS processes can be associated with one VPN instance.
l The interfaces to be enabled with IS-IS multi-instance must be associated with the VPN
instance to which the IS-IS instance is bound.
l The IS-IS process associated with a VPN instance belongs to the VPN. Therefore, if the
VPN instance is deleted, the IS-IS process is deleted with it.
l VPN instances cannot import routes from each other.
8.5.2.3 IS-IS Route Leaking

With the route leaking function, Level-1-2 IS-IS advertises to the specified Level-1 areas the
known routing information about other Level-1 and Level-2 areas.
In most cases, intra-area routes are managed by Level-1 devices. All Level-2 and Level-1-2
devices form a contiguous backbone area. A Level-1 area can only be connected to the
backbone area, not to another Level-1 area.
The routing information of a Level-1 area is advertised to a Level-2 area through a Level-1-2
device; therefore, Level-1-2 and Level-2 devices know the routing information of the entire
IS-IS domain. A Level-2 device, by default, does not inform a Level-1 area of the learned
routing information of either the backbone area or other Level-1 areas. Therefore, the Level-1
devices do not know the routing information beyond the area. As a result, the Level-1 devices
cannot select the optimal route to the destination beyond the area.
In IS-IS route leaking, you can define access control lists (ACLs), routing policies, and tags
on Level-1-2 routers so that these routers select eligible routes about other Level-1 areas and
the backbone area. The Level-1-2 routers can then advertise to their Level-1 areas these
eligible routes.

Figure 8-38 Networking for route leaking
(Topology diagram: ATN A and ATN B (Level-1) and ATN C and ATN D (Level-1-2) are in
Area 10. ATN E and ATN F (Level-2) are in Area 20. Links: ATN A to ATN C on 1.1.1.0/24,
ATN A to ATN B on 2.2.2.0/24, ATN B to ATN D on 3.3.3.0/24, ATN C to ATN E on
4.4.4.0/24, ATN D to ATN E on 5.5.5.0/24, and ATN E to ATN F on 6.6.6.0/24. One link is
labeled cost 50 and the other links are labeled cost 10.)
l In the figure, ATN A, ATN B, ATN C, and ATN D belong to area 10. ATN A and ATN B
are Level-1 routers; ATN C and ATN D are Level-1-2 routers.
l ATN E and ATN F are Level-2 routers and belong to area 20.
The optimal route for ATN A to send a packet to ATN F is ATN A -> ATN B -> ATN D ->
ATN E -> ATN F, which has a cost of 40. However, the selected route that the packet traverses
is ATN A -> ATN C -> ATN E -> ATN F, which has a cost of 70. This route is not the optimal
route from ATN A to ATN F.
Because ATN A does not detect the routes outside the local area, ATN A sends the packets to
other network segments through the default route generated by the nearest Level-1-2 router.
To ensure that the optimal route is selected, you can enable route leaking on the Level-1-2
routers ATN C and ATN D.
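The two path costs quoted above (70 without route leaking, 40 with it) can be reproduced with a small shortest-path sketch. The link costs below are an assumption inferred from those totals: the link between ATN C and ATN E has a cost of 50 and every other link has a cost of 10. Level-1 and Level-2 topologies are modeled as separate graphs, and all names in the code are illustrative.

```python
import heapq


def undirected(links):
    """Build an adjacency map from {(a, b): cost} link entries."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest(graph, src):
    """Dijkstra: return the cost from src to every reachable node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


# Assumed costs (see the lead-in). Level-1 links are inside Area 10.
L1_LINKS = {("A", "B"): 10, ("B", "D"): 10, ("A", "C"): 10}
L2_LINKS = {("C", "E"): 50, ("D", "E"): 10, ("E", "F"): 10}
BORDERS = ["C", "D"]  # the Level-1-2 routers


def path_cost(src, dst, leaking):
    l1 = shortest(undirected(L1_LINKS), src)
    l2_to_dst = {r: shortest(undirected(L2_LINKS), r)[dst] for r in BORDERS}
    if leaking:
        # Leaked routes let the Level-1 router compare the full end-to-end cost.
        return min(l1[r] + l2_to_dst[r] for r in BORDERS)
    # Without leaking, traffic follows the default route to the nearest L1-2 router.
    nearest = min(BORDERS, key=lambda r: l1[r])
    return l1[nearest] + l2_to_dst[nearest]


print(path_cost("A", "F", leaking=False))
print(path_cost("A", "F", leaking=True))
```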
8.5.2.4 IS-IS Fast Convergence

IS-IS fast convergence is an extended feature of IS-IS implemented to speed up route
convergence. It includes the following concepts:
l Incremental SPF (I-SPF): recalculates only the routes of the changed nodes rather than
all the nodes when the network topology changes, which speeds up route calculation.
l Partial Route Calculation (PRC): calculates only the changed routes when the routes on
the network change.
l LSP fast flooding: speeds up LSP flooding.
l Intelligent timer: is applicable to LSP generation and SPF calculation.
The first timeout period of the timer is fixed. If an event triggers the timer before the set
timer expires, the next timeout period increases.
I-SPF
In ISO 10589, the Dijkstra algorithm is used to calculate routes. When a node changes on the
network, this algorithm is used to recalculate all routes. The calculation takes a long time and
consumes too many CPU resources, which affects the convergence speed.

I-SPF improves the algorithm. Except for the first time, only the nodes that have changed
rather than all nodes on the network are calculated. The SPT generated using I-SPF is the
same as that generated using the Dijkstra algorithm. This significantly decreases CPU usage
and speeds up network convergence.
PRC
Similar to I-SPF, PRC calculates only the changed routes, but it does not calculate the shortest
path. It updates routes based on the SPT calculated by I-SPF.
In route calculation, a leaf represents a route, and a node represents a router. If the SPT
changes after I-SPF calculation, PRC calculates all the leaves only on the changed node. If the
SPT remains unchanged, PRC processes only the changed leaves.
For example, if IS-IS is enabled on an interface of a node, the SPT calculated by I-SPF
remains unchanged. PRC updates only the routes of this interface, consuming less CPU
resources.
PRC working with I-SPF further improves network convergence performance and replaces
the SPF algorithm.
NOTE
Only I-SPF and PRC are used to calculate IS-IS routes.
LSP Fast Flooding

When IS-IS receives new LSPs from other routers, it updates the LSPs in the LSDB and
periodically floods out the updated LSPs based on a timer.
With LSP fast flooding, when a device receives newer LSPs, it floods them, up to a
specified number, before calculating routes, which speeds up network convergence.
Intelligent Timer
Although the route calculation algorithm is improved, the long interval for triggering route
calculation affects the convergence speed. Frequent network changes also consume too many
CPU resources. The SPF intelligent timer addresses these problems.
In most cases, an IS-IS network running normally is stable. Frequent changes on a network
are rather rare, and IS-IS does not calculate routes frequently. Therefore, a short period
(within milliseconds) can be configured as the first interval for route calculation. If the
network topology changes frequently, the interval set by the intelligent timer increases with
the number of calculations, which reduces CPU consumption.
The LSP generation intelligent timer is similar to the SPF intelligent timer. When the LSP
generation intelligent timer expires, the system generates a new LSP based on the current
topology. The original mechanism uses a timer with fixed intervals, which results in slow
convergence and high CPU consumption. Therefore, the LSP generation timer is designed as
an intelligent timer to respond to emergencies (for example, the interface goes Up or Down)
quickly and speed up network convergence. In addition, when the network changes
frequently, the interval for the intelligent timer becomes longer to reduce CPU consumption.
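The behavior of an intelligent timer can be sketched as a simple back-off. The text only states that the first interval is short and that later intervals grow while events keep arriving; the doubling rule, the cap, and the default values below are assumptions made for illustration, and IntelligentTimer is a hypothetical name.

```python
class IntelligentTimer:
    """Back-off timer: short first delay, longer delays while events keep coming."""

    def __init__(self, initial_ms=50, max_ms=5000):
        self.initial_ms = initial_ms
        self.max_ms = max_ms
        self.current_ms = initial_ms
        self.quiet_until = None  # end of the interval started by the last event

    def schedule(self, now_ms):
        """Return the delay (ms) to wait before reacting to an event at now_ms."""
        if self.quiet_until is not None and now_ms < self.quiet_until:
            # Another event arrived before the previous interval ran out: back off.
            self.current_ms = min(self.current_ms * 2, self.max_ms)
        else:
            # The network has been quiet: return to the short first interval.
            self.current_ms = self.initial_ms
        self.quiet_until = now_ms + self.current_ms
        return self.current_ms


timer = IntelligentTimer(initial_ms=50, max_ms=400)
for t in (0, 10, 20, 30, 40, 10000):
    print(t, timer.schedule(t))
```

A stable network therefore reacts to an isolated change after the short first interval, while a flapping link pushes the interval toward the maximum and limits CPU usage.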

8.5.2.5 Priority-based IS-IS Convergence

Priority-based IS-IS convergence allows specified routes (such as routes that match the
specified IP prefix list) to converge first when a great number of routes are available.
Different routes can be set with different convergence priorities.
You can assign the highest convergence priority to routes for key services so that these routes
converge first. This decreases the impact on key services and improves network reliability.
8.5.2.6 IS-IS LSP Fragment Extension

When the LSPs to be advertised by IS-IS contain much information, they are advertised in
multiple LSP fragments of the same system. The IS-IS LSP fragment extension attribute
allows an IS-IS router to generate more LSP fragments and carry more IS-IS information.
As defined in RFC 3786, virtual system IDs can be configured, and virtual LSPs that carry
routing information can be generated for IS-IS.
Terms
l Originating system: is a router that runs the IS-IS protocol. A single IS-IS process can
advertise its LSPs as if it were multiple "virtual" routers; the originating system
refers to the real IS-IS process.
l Normal system ID: is the system ID of the originating system.
l Additional system ID: assigned by network administrators, is used to generate additional
or extended LSP fragments. Up to 256 additional or extended LSP fragments can be
generated. Like a normal system ID, an additional system ID must be unique in the
routing domain.
l Virtual system: identified by an additional system ID, is used to generate extended LSP
fragments. These fragments carry the additional system IDs in their LSP IDs.
Principles
IS-IS LSP fragments are identified by the LSP Number field in their LSP IDs. The LSP
Number field is 1 byte, so an IS-IS process can generate a maximum of 256 fragments. With
1497-byte LSPs, these 256 fragments together can carry about 30,000 routes. With fragment
extension, more information can be carried.
With additional system IDs (up to 50 virtual systems), an IS-IS process can generate a
maximum of 13056 LSP fragments.
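The figure of 13056 follows from the 1-byte LSP Number field: the originating system and each virtual system can each generate 256 fragments. The following sketch reproduces the arithmetic; the function and constant names are illustrative only.

```python
FRAGMENTS_PER_SYSTEM_ID = 256  # the LSP Number field is 1 byte (values 0 to 255)
MAX_VIRTUAL_SYSTEMS = 50       # limit stated in the text above


def max_lsp_fragments(virtual_systems):
    """Fragments available to one IS-IS process with the given virtual systems."""
    if not 0 <= virtual_systems <= MAX_VIRTUAL_SYSTEMS:
        raise ValueError("virtual_systems must be between 0 and 50")
    # One originating system plus the configured virtual systems.
    return (1 + virtual_systems) * FRAGMENTS_PER_SYSTEM_ID


print(max_lsp_fragments(0))
print(max_lsp_fragments(50))
```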
When a virtual system and fragment extension are configured, an IS-IS router adds the
contents that cannot be contained in its LSPs to the LSPs of the virtual system and notifies
other routers of the relationship between the virtual system and itself through a special TLV in
the LSPs.
IS Alias ID TLV
A special TLV, IS Alias ID TLV, is defined in RFC 3786.

Table 8-10 IS Alias ID TLV
Field              Length          Description
Type               1 byte          TLV type. If the value is 24, it indicates the IS Alias ID TLV.
Length             1 byte          TLV length.
System ID          6 bytes         System ID.
Pseudonode number  1 byte          Pseudonode number.
sub-TLVs length    1 byte          Sub-TLVs length.
sub-TLVs           0 to 247 bytes  Sub-TLVs.
Regardless of the operation mode, the originating system and virtual system send the LSPs
with fragment number 0 carrying the IS Alias ID TLV to indicate the originating system.
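The layout in Table 8-10 can be illustrated with a small encoder and decoder. This is a sketch based only on the field sizes listed in the table; the function names are hypothetical, and sub-TLV contents are treated as opaque bytes.

```python
import struct

IS_ALIAS_ID_TLV_TYPE = 24
MAX_SUB_TLVS_LEN = 247


def encode_is_alias_id(system_id, pseudonode=0, sub_tlvs=b""):
    """Build an IS Alias ID TLV from the fields in Table 8-10."""
    if len(system_id) != 6:
        raise ValueError("system ID must be 6 bytes")
    if len(sub_tlvs) > MAX_SUB_TLVS_LEN:
        raise ValueError("sub-TLVs must not exceed 247 bytes")
    value = system_id + struct.pack("!BB", pseudonode, len(sub_tlvs)) + sub_tlvs
    return struct.pack("!BB", IS_ALIAS_ID_TLV_TYPE, len(value)) + value


def decode_is_alias_id(tlv):
    """Parse an IS Alias ID TLV and return its fields as a dict."""
    tlv_type, length = struct.unpack_from("!BB", tlv, 0)
    value = tlv[2:2 + length]
    if tlv_type != IS_ALIAS_ID_TLV_TYPE or len(value) != length or length < 8:
        raise ValueError("not a valid IS Alias ID TLV")
    sub_len = value[7]
    return {
        "system_id": value[:6],
        "pseudonode": value[6],
        "sub_tlvs": value[8:8 + sub_len],
    }


# Hypothetical system ID of the originating system.
tlv = encode_is_alias_id(bytes.fromhex("aaaaeeee1234"))
print(tlv.hex())
print(decode_is_alias_id(tlv))
```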
Operation Modes
The following figure shows the networking for LSP fragment extension, which can be run in
two different modes.
Figure 8-39 Networking for LSP fragment extension
(Figure: ATN B is connected to ATN A; ATN A1 and ATN A2 are virtual systems of ATN A.)
The IS-IS router can run the LSP fragment extension feature in the following modes:
l Mode-1: is used when some routers on the network do not support the LSP fragment
extension.
In this mode, virtual systems participate in the SPF calculation. The originating system
advertises LSPs that contain information about the links to each virtual system.
Similarly, each virtual system advertises LSPs that contain information about the links to
the originating system. In this manner, the virtual systems function the same as the
physical devices connected to the originating system on the network.
Mode-1 is a transitional mode for earlier IS-IS versions that do not support fragment
extension. In the earlier versions, IS-IS cannot identify Alias ID TLVs. Therefore, the
LSP sent by a virtual system must resemble a common IS-IS LSP.
The LSP sent by a virtual system contains the same area address and overload bit as
those in the common LSP. If the LSPs sent by a virtual system contain TLVs specified in
other features, the TLVs must be the same as those in common LSPs.
LSPs sent by a virtual system carry information of the neighbor (the originating system),
and the carried cost is the maximum value minus 1. LSPs sent by the originating system
carry information of the neighbor (the virtual system), and the carried cost is 0. This
mechanism ensures that the virtual system is a node downstream of the originating
system when other devices calculate routes.
In Figure 8-39, ATN B does not support the LSP fragment extension; ATN A supports
the LSP fragment extension in mode-1, and ATN A1 and ATN A2 are virtual systems of
ATN A. In this example, ATN A1 and ATN A2 send LSPs carrying routing information
of ATN A. After receiving LSPs from ATN A, ATN A1, and ATN A2, ATN B considers
there to be three devices at the peer end and calculates routes normally. Because the cost
of the route from ATN A to ATN A1 and the cost of the route from ATN A to ATN A2
are both 0, the cost of the route from ATN B to ATN A is equal to that of the route from
ATN B to ATN A1.
l Mode-2: is used when all routers on the network support LSP fragment extension.
In this mode, virtual systems do not participate in the SPF calculation. All routers on the
network detect that the LSPs generated by the virtual systems belong to the originating
system.
Working in mode-2, IS-IS identifies IS Alias ID TLVs, which are used to calculate the
SPT and routes.
In Figure 8-39, ATN B supports LSP fragment extension; ATN A supports the LSP
fragment extension in mode-2; and ATN A1 and ATN A2 send LSPs carrying routing
information of ATN A. When receiving LSPs from ATN A1 and ATN A2, ATN B
obtains the IS Alias ID TLV and learns that the originating system of ATN A1 and ATN A2
is ATN A. ATN B considers information advertised by ATN A1 and ATN A2 to be about
ATN A.
A device that supports LSP fragment extension can resolve LSPs regardless of the mode in
which they were generated. A device that does not support LSP fragment extension can
resolve only LSPs generated in mode-1.
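The mode-1 cost rule (cost 0 from the originating system to a virtual system, and the maximum value minus 1 in the reverse direction) can be checked with a small shortest-path sketch. The link cost of 10 between ATN B and ATN A and the maximum cost of 63 are assumptions chosen for illustration; only the node names come from Figure 8-39.

```python
import heapq

MAX_COST = 63  # assumed maximum link cost; the real value depends on the cost style


def shortest_costs(graph, source):
    """Plain Dijkstra: return the lowest cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


graph = {
    "ATN B": {"ATN A": 10},                       # assumed physical link cost
    "ATN A": {"ATN B": 10, "ATN A1": 0, "ATN A2": 0},
    "ATN A1": {"ATN A": MAX_COST - 1},            # virtual system to originating system
    "ATN A2": {"ATN A": MAX_COST - 1},
}
costs = shortest_costs(graph, "ATN B")
print(costs)
```

Because the virtual systems are reachable only through ATN A at cost 0, ATN B sees them at the same cost as ATN A, and the high reverse cost keeps them downstream of the originating system.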
Table 8-11 Comparison between LSP fragment extension mode-1 and mode-2
LSP Field                Carried in Mode-1   Carried in Mode-2
IS Alias ID              Yes                 Yes
Area                     Yes                 No
Overload bit             Yes                 Yes
IS NBR/IS EXTENDED NBR   Yes                 No
Routing                  Yes                 Yes
ATT bits                 Yes, with value 0   Yes, with value 0
P bit                    Yes, with value 0   Yes, with value 0
Process
After LSP fragment extension is configured, if information is lost because LSPs overflow, the
system restarts the IS-IS process. After being restarted, the originating system loads as much
routing information as possible. Any excessive information beyond the forwarding capability
of the system is added to the LSPs of the virtual systems for transmission.
Usage Scenario
NOTE
If there are non-Huawei devices on the network, LSP fragment extension must be set to mode-1.
Otherwise, these devices cannot identify LSPs.
You are advised to configure LSP fragment extension and virtual systems before setting up
IS-IS neighbors or importing routes. If you set up IS-IS neighbors or import routes first
and the information to be carried exceeds the capacity of 256 fragments, you
must restart the IS-IS process for the configurations to take effect.
8.5.2.7 IS-IS Administrative Tag

Administrative tags carry information about IP address prefixes and control advertisement of
IP prefixes in the IS-IS domain. They are used to control the import of routes of different
levels and areas, and to control different routing protocols and IS-IS multi-instances running
on the same router.
The value of an administrative tag is associated with certain attributes. If the cost-style is
wide, wide-compatible or compatible and the prefix of the reachable IP address to be
advertised by IS-IS has these attributes, IS-IS adds the administrative tag to the TLV in the
prefix. The tag is flooded with the prefix throughout the routing domain.
8.5.2.8 Dynamic Hostname Exchange

The dynamic hostname exchange mechanism provides a mapping from the hostname to
system ID for IS-IS routers.
On an IS-IS router without hostname exchange, information about IS-IS neighbors and
LSDBs is represented by a system ID with 12 hexadecimal digits, for example,
aaaa.eeee.1234. This representation is complicated and not easy to use.
To easily maintain and manage IS-IS networks, the dynamic hostname exchange mechanism
was introduced.
Dynamic hostname information is advertised in the form of a dynamic hostname TLV (type
137) in LSPs. The dynamic hostname exchange mechanism also provides a service to
associate a host name with the Designated IS (DIS) on a broadcast network. Then, this
mechanism advertises this association through LSPs in the form of a dynamic hostname TLV.

On the ATN, routers with IS-IS dynamic hostname mapping enabled add the Dynamic
Hostname TLV (TLV type 137) that records the local host name to the LSPs they generate
before sending the LSPs.
Dynamic Hostname TLV (TLV type 137) includes the following fields:
l Type: indicates the TLV type. The value 137 identifies the Dynamic Hostname TLV.
l Length: indicates the total length of the value field.
l Value: indicates a string of 1 to 255 characters.
The Dynamic Hostname TLV is optional and can be inserted anywhere in an LSP. The
hostname value cannot be null. The sending router determines whether to add the TLV to the
LSPs it sends. The receiving router determines whether to ignore the TLV or to use it to
update its mapping table.
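The Type, Length, and Value fields described above can be illustrated as follows. This is a sketch only: the function names are hypothetical, the hostname is assumed to be ASCII, and the input to find_hostname is assumed to be the concatenated TLVs of one LSP without the LSP header.

```python
DYNAMIC_HOSTNAME_TLV_TYPE = 137


def encode_hostname_tlv(hostname):
    """Build a Dynamic Hostname TLV; the value must be 1 to 255 bytes."""
    value = hostname.encode("ascii")
    if not 1 <= len(value) <= 255:
        raise ValueError("hostname must be 1 to 255 characters")
    return bytes([DYNAMIC_HOSTNAME_TLV_TYPE, len(value)]) + value


def find_hostname(tlvs):
    """Walk a TLV sequence and return the hostname, or None if the TLV is absent."""
    offset = 0
    while offset + 2 <= len(tlvs):
        tlv_type, length = tlvs[offset], tlvs[offset + 1]
        value = tlvs[offset + 2:offset + 2 + length]
        if len(value) < length:
            break  # truncated TLV, stop parsing
        if tlv_type == DYNAMIC_HOSTNAME_TLV_TYPE and length >= 1:
            return value.decode("ascii", errors="replace")
        offset += 2 + length  # the TLV may appear anywhere, so skip the others
    return None


other_tlv = bytes([1, 2, 0x49, 0x00])              # some unrelated TLV
lsp_tlvs = other_tlv + encode_hostname_tlv("ATN-A")  # hypothetical hostname
print(find_hostname(lsp_tlvs))
```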
Implementation
l Matching rules
The dynamic hostname mechanism abides by the longest matching rule. System ID+NSEL
is compared first. If that does not match, the system ID is then compared.
l Dynamic hostname transmission
The dynamic hostname can be carried by the original LSP only.
l DIS dynamic hostname transmission
The DIS dynamic hostname is transmitted through the LSPs generated by the DIS.
l Dynamic hostname priority
The dynamic hostname takes precedence over the static hostname. When both dynamic
and static hostnames are configured, the dynamic hostname replaces the static hostname.
l Dynamic hostname configuration and resolution
A configured dynamic hostname can contain a maximum of 64 bytes, and received
hostname content of a maximum of 255 bytes can be resolved.
Usage Scenario
In maintenance and management, the hostname is easier to identify and retain than the system
ID. After a hostname is configured, it rather than the system ID is displayed when you view
information about IS-IS on the router.
The hostname exchange mechanism implemented on the ATN includes dynamic and static
hostname mapping. The system ID is replaced by the hostname in the following cases:
l When an IS-IS neighbor is displayed, the system ID of the IS-IS neighbor is replaced by
the dynamic hostname. If the IS-IS neighbor is the DIS, the system ID of the DIS is
replaced by the dynamic hostname of the neighbor.
l When an LSP in the IS-IS LSDB is displayed, the system ID in the LSP ID is replaced
by the dynamic hostname of the router that advertises the LSP.
l When details about the IS-IS LSDB are displayed, the Host Name field is included for
the LSP generated by the router where dynamic hostname exchange is enabled; the
system ID is replaced by the dynamic hostname of the IS-IS neighbor.

8.5.2.9 IS-IS HA
NOTE
IS-IS HA includes hot standby, data backup, command line backup, batch backup, and real-
time backup.
IS-IS backs up data from the Active Main Board (AMB) to the Standby Main Board (SMB).
If the AMB fails, the SMB becomes active and takes traffic over from the AMB. IS-IS,
therefore, can keep working normally.
Basic Concepts
l Data backup
It indicates backup of data of processes and interfaces.
l Command line backup
If the AMB processes command lines successfully, it sends them to the SMB for
processing. If the AMB fails to process the command lines, it logs that the command
lines fail to take effect and does not send them to the SMB for processing. If the SMB
fails to process the command lines, the failure is recorded in a log.
Hot Standby
The IS-IS Hot Standby (HSB) feature is supported by devices.
IS-IS HSB allows IS-IS configurations on the AMB and those on the SMB to be consistent.
When an AMB/SMB switchover occurs, IS-IS on the new AMB performs GR. The new AMB
sends requests to neighbors to reestablish neighbor relationships and synchronize the LSDB.
Traffic, therefore, is not affected.
Batch Backup
l Backing up data in batches
When the SMB is installed, all data of the AMB is backed up to the SMB. No
configuration can be changed during batch backup.
l Backing up command lines in batches
When the SMB is installed, all configurations of the AMB are backed up to the SMB.
No configuration can be changed during batch backup.
Real-time Backup
l Real-time backup of data
It indicates real-time backup of changed data of processes and interfaces to the SMB.
l Real-time backup of command lines
It indicates that command lines that were run successfully on the AMB are backed up to
the SMB.

8.5.2.10 IS-IS 3-way Handshake

IS-IS introduces the 3-way handshake mechanism on P2P links to ensure a reliable data link
layer.
Based on ISO 10589, the IS-IS 2-way handshake mechanism uses Hello packets to set up P2P
adjacencies between neighboring devices. When a device receives a Hello packet from the
other end, it considers the other end Up and sets up an adjacency with it. However, this
mechanism has some serious shortcomings.
When two or more links exist between two routers, an adjacency can still be set up when one
link is Down and the other is Up in the same direction. As a result, the router that does not
detect the fault on the faulty link still forwards packets through this link, causing a forwarding
failure.
The 3-way handshake mechanism solves the problem on P2P links. In 3-way handshake
mode, the router considers a neighbor Up and sets up an adjacency with it only after
confirming that the neighbor has received the packet that the router sends.
In addition, the 3-way handshake mechanism uses the 32-bit Extended Local Circuit ID field,
which replaces the original 8-bit Local Circuit ID field and removes the limit of only 255
P2P links.
NOTE
By default, the IS-IS 3-way handshake mechanism is implemented on P2P links, as defined in RFC
3373.
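The core idea of the 3-way handshake can be sketched as follows: an adjacency goes Up only when the Hello packet received from the neighbor reports the local router as the system it hears. This is a simplified illustration, not the complete procedure defined in RFC 3373, which takes further inputs into account. The function name and the example system IDs are hypothetical.

```python
from enum import Enum


class AdjState(Enum):
    DOWN = "Down"
    INITIALIZING = "Initializing"
    UP = "Up"


def on_p2p_hello(local_system_id, reported_neighbor_id):
    """Return the adjacency state after receiving a P2P Hello packet.

    reported_neighbor_id is the system ID that the sender says it hears on
    this link, or None if the sender has not heard any neighbor yet.
    """
    if reported_neighbor_id is None:
        # The neighbor is alive but has not confirmed receipt of our Hello.
        return AdjState.INITIALIZING
    if reported_neighbor_id == local_system_id:
        # The neighbor confirms that it received our Hello: both directions work.
        return AdjState.UP
    # The neighbor hears someone else on this link.
    return AdjState.DOWN


print(on_p2p_hello("aaaa.eeee.1234", None))
print(on_p2p_hello("aaaa.eeee.1234", "aaaa.eeee.1234"))
```

With the 2-way mechanism, the first call would already result in an Up adjacency, which is the shortcoming described above.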
8.5.2.11 IS-IS GR
IS-IS Graceful Restart (GR) is a high availability (HA) technology and ensures non-stop
forwarding.
Because IS-IS is a link state routing protocol, all routers in an area must maintain the same
network topology and share the same LSDB.
After a master/slave switchover, no neighbor information is stored on the restarted router.
Therefore, the first Hello packets sent by the router after restart do not contain the neighbor
list. After receiving the Hello packets, the neighbor checks the 2-way neighbor relationship
and detects that it is not in the neighbor list of the Hello packets sent by the router. Then the
neighbor relationship is interrupted.
The neighbor then generates new LSPs and floods the topology changes to all other routers in
the area. The routers recalculate routes, which leads to a routing interruption or even a routing
loop.
Because no LSDB is stored on the restarted router, the router needs to synchronize its LSDB
with those of its neighbors.
When IS-IS restarts without GR, IS-IS neighbor relationships are reset, and LSPs are
regenerated and flooded. This triggers the SPF calculation in the entire area, which causes
route flapping and forwarding interruptions in the area.
The IETF defines IS-IS GR in RFC 3847. Protocol restarts are processed for both reserved
and unreserved FIB entries, preventing route flapping and traffic forwarding interruptions
caused by the restarts.
When a router fails, neighbors at the routing protocol layer detect that their neighbor
relationships are Down and then become Up again after a period. This is neighbor relationship
flapping, which may cause route flapping, black-hole routes, or routing loops on the restarted
router, decreasing network reliability. To address this problem, GR was introduced.
Basic Concepts
IS-IS GR involves two roles: GR restarter and GR helper.
l GR restarter: is the router that restarts in GR mode.

l GR helper: is another GR router that helps the restarter to complete the GR process. The
GR restarter has the capability of the GR helper.
NOTE
By default, the device supports the GR helper.
To implement GR, IS-IS introduces the restart Type-Length-Value (TLV), T1 timer, T2 timer,
and T3 timer.
Restart TLV
The restart TLV is an extended part of an IS-to-IS Hello (IIH) PDU. All IIH packets of the
router that supports IS-IS GR contain the restart TLV that carries the parameters for protocol
restarts. Figure 8-40 shows the format of the restart TLV.
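Figure 8-40 Restart TLV
  0  1  2  3  4  5  6  7
+-------------------------------+
| Type (211)                    |
+-------------------------------+
| Length (1 to 9)               |
+-------------------------------+
| Reserved       | SA | RA | RR |
+-------------------------------+
| Remaining Time                |
+-------------------------------+
| Restarting Neighbor System ID |
+-------------------------------+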
Table 8-12 describes the fields of the restart TLV.
Table 8-12 Restart TLV fields
Field           Length   Description
Type            1 byte   TLV type. Type value 211 indicates the restart TLV.
Length          1 byte   Length of value in the TLV.
RR              1 bit    Restart request bit. A router sends an RR packet to notify
                         neighbors of its restarting or starting and to require the
                         neighbors to retain the current IS-IS adjacency and return
                         CSNPs.
RA              1 bit    Restart acknowledgement bit. A router sends an RA packet
                         to respond to RR packets.
SA              1 bit    Suppress adjacency advertisement bit. The restarter uses an
                         SA packet to require its neighbors to suppress the broadcast
                         of their neighbor relationships to prevent routing loops.
Remaining Time  2 bytes  Time during which the neighbor retains the adjacency, in
                         seconds. The length of the field is 2 bytes. When RA is set,
                         the value is mandatory.
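The restart TLV layout can be illustrated with a small encoder and decoder. This sketch assumes that RR is the least significant bit of the flags byte, followed by RA and SA, as the field order in Figure 8-40 suggests. The function names are hypothetical.

```python
import struct

RESTART_TLV_TYPE = 211
RR, RA, SA = 0x01, 0x02, 0x04  # assumed bit positions within the flags byte


def encode_restart_tlv(rr=False, ra=False, sa=False,
                       remaining_time=None, neighbor_id=None):
    """Build a restart TLV (value length 1 to 9 bytes)."""
    if ra and remaining_time is None:
        raise ValueError("Remaining Time is mandatory when RA is set")
    if neighbor_id is not None and (remaining_time is None or len(neighbor_id) != 6):
        raise ValueError("neighbor ID needs Remaining Time and must be 6 bytes")
    flags = (RR if rr else 0) | (RA if ra else 0) | (SA if sa else 0)
    value = bytes([flags])
    if remaining_time is not None:
        value += struct.pack("!H", remaining_time)
    if neighbor_id is not None:
        value += neighbor_id
    return bytes([RESTART_TLV_TYPE, len(value)]) + value


def decode_restart_tlv(tlv):
    """Parse a restart TLV and return its fields as a dict."""
    if len(tlv) < 3 or tlv[0] != RESTART_TLV_TYPE:
        raise ValueError("not a restart TLV")
    length = tlv[1]
    value = tlv[2:2 + length]
    if len(value) != length or not 1 <= length <= 9:
        raise ValueError("invalid restart TLV length")
    flags = value[0]
    fields = {"rr": bool(flags & RR), "ra": bool(flags & RA),
              "sa": bool(flags & SA), "remaining_time": None, "neighbor_id": None}
    if length >= 3:
        fields["remaining_time"] = struct.unpack("!H", value[1:3])[0]
    if length == 9:
        fields["neighbor_id"] = value[3:9]
    return fields


request = encode_restart_tlv(rr=True)                 # sent by the GR restarter
ack = encode_restart_tlv(ra=True, remaining_time=30)  # reply from the GR helper
print(request.hex(), decode_restart_tlv(ack))
```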
Timers
IS-IS GR has three timers: T1, T2, and T3.
l T1
Any interface enabled with IS-IS GR maintains a T1 timer. On a Level-1-2 router,
broadcast interfaces maintain a T1 timer for Level-1 and Level-2 neighbor relationships.
If the GR restarter has already sent an IIH packet with RR being set but does not receive
any IIH packet that carries the restart TLV and the RA set from the GR helper even after
the T1 timer expires, the GR restarter resets the T1 timer and continues to send the
restart TLV.
If the ACK packet is received or the T1 timer expires three times, the T1 timer is
disabled. The default value of a T1 timer is 3 seconds.
l T2
Level-1 and Level-2 LSDBs maintain independent T2 timers.
The value of the T2 timer indicates the longest time during which the system waits for
the LSDB synchronization. The default value is 60 seconds.
l T3
The entire system maintains a T3 timer.
T3 indicates the maximum time that a whole GR process is allowed to last.
If the T3 timer expires, GR fails.
The initial value of the T3 timer is 65535 seconds. After the IIH packets with RA set are
received from neighbors, the T3 timer uses the smallest value of the Remaining Time
field in the IIH packets.
The T3 timer only applies when devices are restarted.
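The T1 and T3 rules above can be summarized in a short sketch. The constants come from the text; the function names are illustrative only.

```python
T1_DEFAULT_SECONDS = 3
T1_MAX_EXPIRIES = 3
T3_INITIAL_SECONDS = 65535


def t3_value(remaining_times):
    """T3 after IIH packets with RA set: the smallest Remaining Time received."""
    return min([T3_INITIAL_SECONDS, *remaining_times])


def t1_is_disabled(ack_received, expiries):
    """T1 is disabled once an ACK arrives or after it has expired three times."""
    return ack_received or expiries >= T1_MAX_EXPIRIES


print(t3_value([]))            # no RA received yet: initial value
print(t3_value([30, 27, 40]))  # smallest Remaining Time wins
print(t1_is_disabled(False, 2), t1_is_disabled(False, 3))
```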
Session Mechanism of IS-IS GR

GR triggered by a master/slave switchover or an IS-IS process restart is referred to as
restarting, during which FIB entries remain unchanged. GR triggered by a router restart is
referred to as starting, during which FIB entries are updated.

The following describes the process of IS-IS GR in restarting and starting modes:
IS-IS Restarting
Figure 8-41 shows the process of IS-IS restarting.
Figure 8-41 IS-IS restarting
GR Restarter                                               GR Helper
Active/standby switchover
Start T1, T2, and T3 timers
    ---- IIH (Restart TLV, RR=1, RA=0, SA=0) ---->
    <--- IIH (Restart TLV, RR=0, RA=1, SA=0) -----
Reset T3 timer
    <--- CSNP ------------------------------------
Delete T1 timer
    <--- LSPs ------------------------------------
Delete T2 timer
Delete T3 timer and update the FIB table
    ---- Flood LSPs ------------------------------>        Update the FIB table
1. After performing a protocol restart, the GR restarter performs the following actions:
– Starts T1, T2, and T3 timers.
– Sends IIH packets that contain the restart TLV from all interfaces. In these packets,
RR is set to 1, and RA and SA are set to 0.
2. After receiving an IIH packet, the GR helper performs the following actions:
– Maintains the neighbor relationship and updates the current Holdtime.
– Replies with an IIH packet containing the restart TLV. In the packet, RR is set to 0,
RA is set to 1, and the value of the Remaining Time field indicates the time remaining
before the Holdtime expires.
– Sends CSNPs and all LSPs to the GR restarter.
NOTE
l On a P2P link, a neighbor must send CSNPs.
l On a LAN link, only the neighbor that is the DIS sends CSNPs. If the DIS is restarted, a
temporary DIS is elected from the other routers on the LAN.
If the GR helper does not support GR, it ignores the restart TLV and resets the adjacency
with the GR restarter according to the normal processing of IS-IS.

3. After the GR restarter receives the IIH response packet, in which RR is set to 0 and RA
is set to 1, from the neighbor, it performs the following actions:
– Compares the current value of the T3 timer with the value of the Remaining Time
field in the packet. The smaller value is used as the value of the T3 timer.
– Deletes the T1 timer maintained by the interface that receives the ACK packet and
CSNPs.
– If the interface does not receive the ACK packet or CSNPs, the GR restarter
repeatedly resets the T1 timer and resends the IIH packet that contains the restart
TLV. If the number of timeouts of the T1 timer exceeds the threshold value, the GR
restarter deletes the T1 timer and initiates the normal IS-IS processing to complete
LSDB synchronization.
4. After the GR restarter deletes the T1 timers on all interfaces, the synchronization with all
neighbors is complete when the CSNP list is cleared and all LSPs are collected. The T2
timer is then deleted.
5. After the T2 timer is deleted, LSDBs of the corresponding level are synchronized.
– In the case of a Level-1 or Level-2 router, SPF calculation is triggered.
– In the case of a Level-1-2 router, it determines whether the T2 timer of the other
level is also deleted. If both T2 timers are deleted, SPF calculation is triggered.
Otherwise, the router waits for the T2 timer of the other level to expire.
6. After all T2 timers are deleted, the GR restarter deletes the T3 timer and updates FIB
entries. The GR restarter re-generates the LSPs of each level and floods them. During
LSDB synchronization, the GR restarter deletes the LSPs generated before the restarting.
7. At this point, the IS-IS restarting of the GR restarter is complete.
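The timer bookkeeping in the preceding steps can be sketched as a small state holder. This is a simplified model under stated assumptions: the class and method names and the interface names if-1 and if-2 are hypothetical, and T1 retries, T3 expiry, and the actual packet exchange are left out.

```python
class GrRestarter:
    """Tracks which GR timers are still running on a restarting router."""

    def __init__(self, interfaces, levels):
        self.t1_running = set(interfaces)  # one T1 timer per interface
        self.t2_running = set(levels)      # one T2 timer per LSDB level
        self.t3_running = True             # one T3 timer for the whole system
        self.spf_triggered = False

    def on_ack_and_csnp(self, interface):
        """Step 3: the ACK and CSNPs arrived, so delete T1 for the interface."""
        self.t1_running.discard(interface)

    def on_lsdb_synchronized(self, level):
        """Steps 4 to 6: delete T2 for a level once all T1 timers are deleted."""
        if self.t1_running:
            return False  # some interfaces are still waiting for ACK and CSNPs
        self.t2_running.discard(level)
        if not self.t2_running:
            # All levels are synchronized: run SPF, delete T3, update the FIB.
            self.spf_triggered = True
            self.t3_running = False
        return True


restarter = GrRestarter(interfaces=["if-1", "if-2"], levels=[1, 2])
restarter.on_ack_and_csnp("if-1")
restarter.on_ack_and_csnp("if-2")
restarter.on_lsdb_synchronized(1)
print(restarter.spf_triggered)  # Level-2 is still pending on a Level-1-2 router
restarter.on_lsdb_synchronized(2)
print(restarter.spf_triggered, restarter.t3_running)
```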
IS-IS Starting
The starting device does not retain FIB entries. Before it starts, the starting device needs to
reset its adjacencies that are Up with its neighbors and suppress the neighbors from
advertising the adjacencies. The IS-IS starting process is different from the IS-IS restarting
process, as shown in Figure 8-42.

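Figure 8-42 IS-IS starting
GR Restarter                                               GR Helper
Starting
Start T2 timer for various LSDBs
    ---- IIH (Restart TLV, RR=0, RA=0, SA=1) ---->
    <--- Reestablish the adjacency relationship -->
Start T1 timer
    <--- CSNP ------------------------------------
Delete T1 timer
    <--- LSPs ------------------------------------
Delete T2 timer
Update the FIB table
    ---- Flood LSPs ------------------------------>        Update the FIB table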
1. After the GR restarter is started, it performs the following actions:
– Starts the T2 timer for the synchronization of LSDBs of each level.
– Sends IIH packets that contain the restart TLV from all interfaces.
RR in the packet is set to 0, which indicates that the router is starting.
SA in the packet is set to 1, which requests each neighbor to suppress the
advertisement of its adjacency with the router until the neighbor receives an IIH
packet in which SA is set to 0.
2. After the neighbor receives the IIH packet that carries the restart TLV, it performs the
following actions depending on whether GR is supported:
– GR is supported.
Re-initiates the adjacency.
Deletes the description of the adjacency with the GR restarter from the LSP to be
sent. The neighbor also ignores the link connected to the GR restarter when
performing SPF calculation until it receives an IIH packet in which SA is set to 0.
– GR is not supported.
Ignores the restart TLV and resets the adjacency with the GR restarter.
Replies with an IIH packet that does not contain the restart TLV. The neighbor then
initiates the normal IS-IS processing. In this case, the neighbor does not suppress
the advertisement of the adjacency with the GR restarter. On a P2P link, the
neighbor also sends a CSNP.
3. After the adjacency is re-initiated, the GR restarter re-establishes the adjacency with the
neighbors on each interface. When an adjacency set on an interface is Up, the GR
restarter starts the T1 timer for the interface.
4. After the T1 timer expires, the GR restarter sends an IIH packet in which both RR and
SA are set to 1.
5. After the neighbor receives the IIH packet, it replies with an IIH packet, in which RR is
set to 0 and RA is set to 1, and sends a CSNP.
6. After the GR restarter receives the IIH ACK packet and CSNP from the neighbor, it
deletes the T1 timer.
If the GR restarter does not receive the IIH packet or CSNP, it repeatedly resets the T1
timer and resends the IIH packet in which RR and SA are set to 1. If the number of
timeouts of the T1 timer exceeds the threshold value, the GR restarter deletes the T1
timer and initiates the normal IS-IS processing to complete LSDB synchronization.
7. After receiving the CSNP from the helper, the GR restarter synchronizes the LSDB.
8. After the LSDB of this level is synchronized, the T2 timer is deleted.
9. After all T2 timers are deleted, the SPF calculation is started, and LSPs are regenerated
and flooded.
10. At this point, the IS-IS starting of the GR restarter is complete.
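The neighbor behavior in step 2 (suppressing the adjacency while SA is set) can be sketched as follows. The class name, method names, and neighbor names are hypothetical; the sketch models only which adjacencies a GR-capable neighbor would describe in its LSPs.

```python
class GrHelper:
    """Models which adjacencies a GR-capable neighbor advertises in its LSPs."""

    def __init__(self, adjacencies):
        self.adjacencies = set(adjacencies)
        self.suppressed = set()

    def on_iih(self, neighbor, sa):
        """Process the SA bit of a restart TLV received from a neighbor."""
        if sa:
            self.suppressed.add(neighbor)      # stop describing this adjacency
        else:
            self.suppressed.discard(neighbor)  # SA cleared: advertise it again

    def advertised_adjacencies(self):
        """Adjacencies described in LSPs and considered in SPF calculation."""
        return sorted(self.adjacencies - self.suppressed)


helper = GrHelper(["starting-router", "other-router"])
helper.on_iih("starting-router", sa=True)
print(helper.advertised_adjacencies())
helper.on_iih("starting-router", sa=False)
print(helper.advertised_adjacencies())
```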
Usage Scenario
GR is typically applied to PEs, especially single-point PEs. It prevents service
interruptions caused by a single point of failure on a PE or by master/slave control board
switchovers performed for maintenance operations, such as software upgrades. GR ensures
non-stop forwarding of key services. Figure 8-43 shows the networking for the application
of GR.

Figure 8-43 GR on a carrier network
(Figure: CE-1 through CE-4 in VPN A and VPN B connect to PE1 through PE4. The PEs run
IS-IS Level-2 and form an IBGP full mesh in AS 100.)
NOTE
NSF is deployed on PE2 to prevent single points of failure on PE2 (IS-IS GR and LDP GR run
on PE2).
IS-IS GR or LDP GR is run on the PEs and on the Ps. The MPU/SRUs on the PEs and Ps work
in backup mode.
8.5.2.12 IS-IS for IPv6

The draft-ietf-isis-ipv6 released by the IETF defines two new TLVs that can support IPv6
routes and a new Network Layer Protocol Identifier (NLPID), which ensures that IS-IS can
process and calculate IPv6 routes.
The two new TLVs are as follows:
l IPv6 Reachability
The IPv6 Reachability TLV indicates the reachability of a network by specifying the
route prefix and cost. The type value is 236 (0xEC).
l IPv6 Interface Address
The IPv6 Interface Address TLV is similar to the IP interface address TLV of IPv4 in
function, except that it changes the original 32-bit IPv4 address to a 128-bit IPv6
address. The type value is 232 (0xE8).
The NLPID is an 8-bit field that identifies network layer protocol packets. The NLPID of
IPv6 is 142 (0x8E). If an IS-IS router supports IPv6, it advertises routing information through
the NLPID value.

8.5.2.13 IS-IS TE
IS-IS Traffic Engineering (TE) is an extension of IS-IS to support MPLS TE.
IS-IS TE supports the establishment and maintenance of Constraint-based Routed Label
Switched Paths (CR-LSPs) by MPLS.
To establish CR-LSPs, MPLS needs to learn the traffic attributes of all the links in the local
area. MPLS can acquire the TE information of the links through IS-IS.
Traditional routers select the shortest path as the primary route regardless of other factors,
such as bandwidth, even when the path is congested.
Figure 8-44 Networking with IS-IS routing defects
(Figure: ATN A and ATN H connect to ATN B. ATN B reaches ATN E either through ATN C
and ATN D or through ATN F, ATN G, and ATN D.)
In Figure 8-44, all the links have the same cost. The shortest path from ATN A/ATN H to
ATN E is ATN A/ATN H -> ATN B -> ATN C -> ATN D -> ATN E. Data is forwarded along
this shortest path. The path ATN A/ATN H -> ATN B -> ATN C -> ATN D -> ATN E may be
congested, and the path ATN A/ATN H -> ATN B -> ATN F -> ATN G -> ATN D -> ATN E
may be idle.
To solve the preceding problem, the cost of the path ATN B-ATN C can be increased so that
the traffic is switched to the path ATN A/ATN H -> ATN B -> ATN F -> ATN G -> ATN D ->
ATN E.
This method eliminates the congestion on the path ATN A/ATN H -> ATN B -> ATN C ->
ATN D -> ATN E; however, the other path ATN A/ATN H -> ATN B -> ATN F -> ATN G ->
ATN D -> ATN E may then become congested. In addition, on networks with complicated
topologies, changing the cost of one link may affect multiple routes.
As an overlay model, MPLS can set up a virtual topology over the physical network topology
and map traffic to this virtual topology, effectively combining MPLS and TE technology into
MPLS TE.
MPLS TE can resolve network congestion problems by allowing carriers to precisely control
the path through which traffic passes so that traffic does not pass through congested
nodes. Meanwhile, MPLS TE can reserve resources during the establishment of LSPs to
ensure service quality.

To ensure continuity of services, MPLS TE provides the CR-LSP backup and fast reroute
(FRR) mechanisms. If a link fault occurs, traffic can be switched immediately. Through
MPLS TE, service providers (SPs) can fully utilize the current network resources to provide
diverse services, optimize network resources, and methodically manage the network.
To accomplish the preceding tasks, MPLS TE needs to learn TE information about all devices
on the network. MPLS TE itself lacks a mechanism by which each device floods its TE
information throughout the entire network for TE information synchronization, but IS-IS
provides such a mechanism. Therefore, MPLS TE can advertise and synchronize TE
information with the help of IS-IS. To support MPLS TE, IS-IS needs to be extended.
In brief, IS-IS TE collects TE information on IS-IS networks and then transmits the TE
information to the CSPF module.
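The way a CSPF module could use the collected TE information can be sketched on the topology of Figure 8-44. This is a minimal illustration, not the CSPF implementation on the ATN: links that cannot offer the requested bandwidth are pruned, and a shortest-path search runs on the remaining links. The costs, the bandwidth values, and the function name are assumptions.

```python
import heapq


def cspf(links, source, dest, required_bw):
    """Return (cost, path) of the cheapest path whose links all offer
    at least required_bw, or None if no such path exists."""
    graph = {}
    for (a, b), attrs in links.items():
        if attrs["bw"] < required_bw:
            continue  # constraint check: prune links with too little bandwidth
        graph.setdefault(a, []).append((b, attrs["cost"]))
        graph.setdefault(b, []).append((a, attrs["cost"]))
    heap = [(0, [source])]
    done = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == dest:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, link_cost in graph.get(node, []):
            if nxt not in done:
                heapq.heappush(heap, (cost + link_cost, path + [nxt]))
    return None


# All links have the same cost; the available bandwidth values are assumed.
pairs = [("ATN A", "ATN B"), ("ATN H", "ATN B"), ("ATN B", "ATN C"),
         ("ATN C", "ATN D"), ("ATN D", "ATN E"), ("ATN B", "ATN F"),
         ("ATN F", "ATN G"), ("ATN G", "ATN D")]
links = {pair: {"cost": 10, "bw": 100} for pair in pairs}
links[("ATN B", "ATN C")]["bw"] = 20  # congested link with little bandwidth left

print(cspf(links, "ATN A", "ATN E", required_bw=0))   # plain shortest path
print(cspf(links, "ATN A", "ATN E", required_bw=50))  # avoids ATN B - ATN C
```

Unlike raising the cost of the link between ATN B and ATN C, this affects only the tunnel being calculated and leaves other routes unchanged.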
Basic Principle
As specified in RFC 5305 and RFC 4205, IS-IS TE defines new TLVs and sub-TLVs in IS-IS
LSPs to carry TE information, floods, synchronizes, and resolves TE information, and
transmits the resolved TE information to the CSPF module. IS-IS TE plays the role of a porter
in MPLS TE. Figure 8-45 shows the relationships between IS-IS TE, MPLS TE, and CSPF.
Figure 8-45 Relationships between MPLS TE, CSPF, and IS-IS TE
(Figure: block diagram of three modules: MPLS TE (TE management), CSPF (calculating), and
IS-IS TE (TE flooding and TE collecting), linked by "Advertising" and "Feedback and
Adjust" flows.)
To carry TE information in LSPs, IS-IS TE defines the following TLVs in RFC 5305:
l Extended IS reachability TLV
The Extended IS reachability TLV replaces the IS reachability TLV and extends the TLV
format using sub-TLVs. The implementation of sub-TLVs in TLVs is the same as that of
TLVs in LSPs. Sub-TLVs are used to carry TE information configured on physical
interfaces.
NOTE
Currently, all sub-TLVs defined in RFC 5305 and sub-TLV type 22 defined in RFC 4124 are
supported.

Table 8-13 Sub-TLVs defined in Extended IS reachability TLV
Name                                Type   Length (Byte)
Administrative Group                3      4
IPv4 Interface Address              6      4
IPv4 Neighbour Address              8      4
Maximum Link Bandwidth              9      4
Maximum Reserved Link Bandwidth     10     4
Unreserved Bandwidth                11     32
Traffic Engineering Default Metric  18     3
Bandwidth Constraints sub-TLV       22     36
l Traffic Engineering router ID TLV

The type of this TLV is 134, and this TLV carries a 4-byte router ID (MPLS LSR-ID). In
MPLS TE, each device has a unique router ID.
l Extended IP reachability TLV
The Extended IP reachability TLV replaces the IP reachability TLV and carries routing
information. It extends the length of the route cost field to 4 bytes and carries sub-TLVs.
l Shared Risk Link Group TLV
The type of this TLV is 138. It carries information about shared risk link groups
(SRLGs). This TLV can carry multiple SRLG values, each of which is a 4-byte
positive integer.
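The sub-TLV layout described above can be sketched as a small parser. This is a minimal illustration, assuming each sub-TLV is encoded as a 1-byte type, a 1-byte length, and then the value (the same layout as TLVs in LSPs); the function name parse_sub_tlvs and the sample bytes are hypothetical.

```python
# Sub-TLV types from Table 8-13 (names as listed in the table).
SUB_TLV_NAMES = {
    3: "Administrative Group",
    6: "IPv4 Interface Address",
    8: "IPv4 Neighbour Address",
    9: "Maximum Link Bandwidth",
    10: "Maximum Reserved Link Bandwidth",
    11: "Unreserved Bandwidth",
    18: "Traffic Engineering Default Metric",
    22: "Bandwidth Constraints sub-TLV",
}


def parse_sub_tlvs(data: bytes):
    """Walk a byte string of sub-TLVs: 1-byte type, 1-byte length, value."""
    result = []
    pos = 0
    while pos + 2 <= len(data):
        sub_type, length = data[pos], data[pos + 1]
        value = data[pos + 2:pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-TLV")
        result.append((sub_type, SUB_TLV_NAMES.get(sub_type, "unknown"), value))
        pos += 2 + length
    return result


# Hypothetical sample: Administrative Group (type 3, 4 bytes) followed by
# Traffic Engineering Default Metric (type 18, 3 bytes).
sample = bytes([3, 4, 0, 0, 0, 1]) + bytes([18, 3, 0, 0, 10])
parsed = parse_sub_tlvs(sample)
```

Unknown types are kept rather than rejected, which mirrors the extensibility that TLV encoding is meant to provide.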
IS-IS TE is implemented in two processes.
l Responding to MPLS TE configurations
IS-IS TE functions only after MPLS TE is enabled.
IS-IS TE updates the TE information in IS-IS LSPs based on MPLS TE configurations.
IS-IS TE transmits MPLS TE configurations to the CSPF module.
l Processing TE information in LSPs.
IS-IS TE extracts TE information from IS-IS LSPs and transmits the TE information to
the CSPF module.
Usage Scenario
IS-IS TE helps MPLS TE set up TE tunnels. In Figure 8-46, a TE tunnel is set up between
ATN A and ATN C.

Figure 8-46 Networking for IS-IS TE
[Figure: ATN A, ATN B, ATN C, and ATN D, with a TE tunnel from ATN A to ATN C.]
The configuration requirements are as follows:

l Enable MPLS TE on ATN A and enable MPLS TE CSPF to calculate the path.
l Enable MPLS TE on ATN B, ATN C, and ATN D.
l Run IS-IS on ATN A, ATN B, ATN C, and ATN D for communication between the
routers and enable IS-IS TE on each router.
After the configuration, IS-IS on ATN A, ATN B, ATN C, and ATN D sends LSPs carrying
TE information configured on each router. ATN A then obtains the TE information of ATN B,
ATN C, and ATN D from the received LSPs. The CSPF module can calculate the path
required by the TE tunnel based on the TE information on the entire network.
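The path calculation that CSPF performs on the collected TE information can be sketched as follows. This is a minimal illustration, not the device's algorithm: it assumes a single bandwidth constraint, prunes links that cannot satisfy it, and then runs a shortest-path search. The function name cspf, the link data, and the bandwidth values are hypothetical.

```python
import heapq


def cspf(links, src, dst, required_bw):
    """Shortest path over links whose unreserved bandwidth meets the demand.

    links: {(u, v): (te_metric, unreserved_bw)}, treated as unidirectional.
    Returns (cost, path), or None if no path satisfies the constraint.
    """
    adj = {}
    for (u, v), (metric, bw) in links.items():
        if bw >= required_bw:  # constraint step: drop unsuitable links
            adj.setdefault(u, []).append((v, metric))
    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, metric in adj.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + metric, nxt, path + [nxt]))
    return None


# Hypothetical TE database as ATN A might assemble it from received LSPs.
ted = {
    ("A", "B"): (10, 100), ("B", "C"): (10, 20),
    ("A", "D"): (10, 100), ("D", "C"): (15, 100),
}
low_demand = cspf(ted, "A", "C", 10)    # cheapest path is usable
high_demand = cspf(ted, "A", "C", 50)   # link B -> C is pruned
```

With the higher demand, the cheaper path through B is excluded because one of its links lacks bandwidth, so the search falls back to the path through D.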
8.5.2.14 IS-IS Shortcut (AA) and Advertise (FA)

IS-IS Shortcut (AA) and IS-IS Advertise (FA) calculate routes through TE tunnel interfaces.
The features are not clearly defined in RFC 3906. TE tunnel interfaces are used as outbound
interfaces of routes; therefore, packets are transmitted through MPLS instead of IP.
Compared with IP forwarding, which is best-effort, MPLS forwarding provides guarantees for
the traffic transmitted through a specific route. When IS-IS Shortcut (AA) or IS-IS Advertise
(FA) is configured, TE tunnel interfaces participate in route calculation and are used as the
outbound interfaces of specific routes, so MPLS forwarding is implemented for these routes.

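Figure 8-47 Principles of IS-IS Shortcut (AA) and Advertise (FA)
[Figure: ATN E, ATN A, ATN B, ATN C, and ATN D are connected in a chain, and each link has a cost of 10. ATN T is attached through a link with a cost of 5. A TE tunnel (FA) with a cost of 10 is also shown.]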
IS-IS Shortcut (AA)

If the TE tunnel does not participate in IS-IS route calculation, packets from ATN A to ATN T
pass through ATN B. Therefore, the interface that connects ATN A to ATN B is the outbound
interface.
If packets from ATN A to ATN C need to travel through the TE tunnel, you can enable IS-IS
Shortcut (AA) and the IS-IS process on the TE tunnel interfaces. Then, ATN A considers the
cost of the path to ATN C as 10 and then selects the tunnel interface as the outbound interface.
IS-IS Shortcut (AA) applies only to the local interface and functions unidirectionally.
l IS-IS Shortcut (AA) applies only to the local interface.

ATN A does not advertise that it has a direct route to ATN C, nor that the cost of
the route from ATN A to ATN T is 15. Therefore, ATN E does not know that packets
from ATN A can be transmitted to ATN C through the TE tunnel and still considers
the cost of the route from ATN E to ATN T through ATN A to be 35 (10+10+10+5).
l IS-IS Shortcut (AA) functions unidirectionally.
Generally, IS-IS considers a link available only after bidirectional detection is
implemented on the link. If ATN B does not consider ATN A as its neighbor, ATN A
then does not consider the link to ATN B available.
IS-IS Shortcut (AA) applies only to the local interface. As a result, bidirectional
detection cannot be implemented. If a unidirectional tunnel works normally, the link is
available.
IS-IS Shortcut (AA) does not affect the original structure of the IS-IS SPT, regardless of
whether a TE tunnel exists. Apart from the link from ATN A to ATN B, and that from ATN B
to ATN C, a link marked with an S from ATN A to ATN C is added. S is short for Shortcut.
The link marked with an S participates in route calculation.
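The local-only effect of the shortcut link can be checked with a short SPF sketch. It assumes the topology implied by the costs quoted above (a chain E, A, B, C, D with link costs of 10, and ATN T attached to ATN C with a cost of 5) and a tunnel cost of 10; the helper names spf and bidir are hypothetical.

```python
import heapq


def spf(adj, src):
    """Plain Dijkstra; adj maps node -> {neighbor: cost}."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, c in adj.get(u, {}).items():
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return dist


def bidir(edges):
    """Build a bidirectional adjacency map from (u, v, cost) tuples."""
    adj = {}
    for u, v, c in edges:
        adj.setdefault(u, {})[v] = c
        adj.setdefault(v, {})[u] = c
    return adj


# Assumed physical topology, inferred from the costs quoted in the text.
physical = bidir([("E", "A", 10), ("A", "B", 10), ("B", "C", 10),
                  ("C", "D", 10), ("C", "T", 5)])

# ATN A's local view: the physical links plus the unidirectional link S (A -> C).
view_a = {node: dict(nbrs) for node, nbrs in physical.items()}
view_a["A"]["C"] = 10

cost_a_to_t = spf(view_a, "A")["T"]    # ATN A uses the shortcut
cost_e_to_t = spf(physical, "E")["T"]  # ATN E never learns of the shortcut
```

Only ATN A's own calculation includes the extra link, so the view of every other router is unchanged.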
IS-IS Shortcut (AA) provides two types of metrics.
l Absolute metric
An absolute metric indicates that the metric of TE tunnels in IS-IS is fixed.

l Relative metric
A relative metric indicates that the metric of TE tunnels in IS-IS is relative. The route
cost is the physical link cost plus the relative metric.
In Figure 8-47, if the relative metric is set to 1, the cost of the path from ATN A to ATN C
through the TE tunnel is 21 (10+10+1).
If the relative metric is set to 0, the TE tunnel and physical link have the same cost on the
outbound interface. If the relative metric is less than 0, the TE tunnel interface is preferred as
the outbound interface.
The metric of IS-IS Shortcut (AA) takes precedence over the IS-IS cost. If the metric of IS-IS
Shortcut (AA) is not configured, IS-IS uses the IS-IS cost of the TE tunnel interface. If the
metric of IS-IS Shortcut (AA) is configured, IS-IS uses the configured value.
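The metric rules above reduce to a small calculation. The sketch below is illustrative only; the function name tunnel_cost and its parameters are hypothetical, and the physical path cost of 20 corresponds to the two cost-10 links between ATN A and ATN C.

```python
def tunnel_cost(physical_cost, mode=None, value=0, interface_cost=None):
    """Cost that IS-IS assigns to a path through a TE tunnel.

    mode="absolute": the configured value is used as a fixed cost.
    mode="relative": the physical path cost plus the configured value.
    mode=None: no shortcut metric configured, so the IS-IS cost of the
               TE tunnel interface is used.
    """
    if mode == "absolute":
        return value
    if mode == "relative":
        return physical_cost + value
    if mode is None:
        return interface_cost
    raise ValueError(f"unknown metric mode: {mode}")


relative_plus_one = tunnel_cost(20, "relative", 1)   # 10 + 10 + 1
relative_zero = tunnel_cost(20, "relative", 0)       # same as the physical path
relative_minus_one = tunnel_cost(20, "relative", -1) # tunnel is preferred
```

A negative relative metric makes the tunnel cheaper than the physical path, which is why the TE tunnel interface is then preferred as the outbound interface.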
IS-IS Advertise (FA)

Similar to IS-IS Shortcut (AA), IS-IS Advertise (FA) also calculates routes through TE tunnel
interfaces. Currently, no RFCs or drafts describe this feature in detail.
The algorithm of IS-IS Advertise (FA) is the same as that of IS-IS Shortcut (AA).
The differences between IS-IS Advertise (FA) and IS-IS Shortcut (AA) are listed as follows:
l IS-IS Advertise (FA) advertises the TE tunnel information to other ISs, whereas IS-IS
Shortcut (AA) does not.
In Figure 8-47, if the TE tunnel is enabled with IS-IS Advertise (FA), ATN A advertises
information indicating that ATN C is its neighbor. The neighbor information is carried in
TLV type 22 with no sub-TLVs. That is, no TE information is carried. If the TE tunnel is
enabled with IS-IS Shortcut (AA), ATN A does not advertise such information.
l IS-IS Advertise (FA) functions only when bidirectional TE tunnels are configured.
If the TE tunnel is enabled with IS-IS Advertise (FA), ATN C must advertise information
indicating that ATN A is its neighbor. Then, the TE tunnel interface can be used by ATN
A to forward traffic. If the TE tunnel is enabled with IS-IS Shortcut (AA), ATN A does
not check whether ATN C is its neighbor.
l IS-IS Advertise (FA) affects the SPTs of other routers.
If the TE tunnel is enabled with IS-IS Advertise (FA), ATN A advertises the message
that ATN C is a neighbor of ATN A to other routers on the network. Other routers then
consider ATN C a neighbor of ATN A and add ATN C to the SPT without marking it
with an S.
l IS-IS Advertise (FA) does not support the relative metric.
IS-IS Advertise (FA) functions on the entire network. Therefore, note the following
points when deploying IS-IS Advertise (FA):
– TE tunnels enabled with IS-IS Advertise (FA) should preferably be bidirectional.
– In Figure 8-47, you must enable IS-IS Advertise (FA) on the TE tunnel from ATN
C to ATN A so that the TE tunnel from ATN A to ATN C is available.
– If there are P2P neighbors between the two devices enabled with IS-IS Advertise
(FA), a unidirectional TE tunnel is also available.
– In Figure 8-47, if the TE tunnel from ATN A to ATN B is enabled with IS-IS
Advertise (FA), and ATN A and ATN B are connected through networks other than
Ethernet, this TE tunnel is available. In this case, the physical link from ATN B to
ATN A functions as a TE tunnel enabled with IS-IS Advertise (FA) in the other
direction.
8.5.2.15 IS-IS Wide Metric

A small range of metrics cannot meet the requirements of large-scale networks.
In the earlier ISO 10589, the largest metric of an interface is 63. TLV type 128 and TLV type
130 contain information about routes, and TLV type 2 contains information about IS-IS
neighbors.
As defined in RFC 3784, with IS-IS wide metric, the largest metric of an interface is extended
to 16777215, and the largest metric of a route is 4261412864. With IS-IS wide metric
enabled, TLV type 135 contains information about routes; TLV type 22 contains information
about IS-IS neighbors.
l The following TLVs are used in narrow mode:
– IP Internal Reachability: carries routes within an area.
– IP External Reachability: carries routes outside an area.
– IS Neighbors: carries information about neighbors.
l The following TLVs are used in wide mode:
– Extended IP Reachability TLV: replaces the earlier IP Reachability TLV and carries
information about routes. This TLV expands the range of the route cost to 4 bytes
and carries sub-TLVs.
– IS Extended Neighbors TLV: carries information about neighbors.
NOTE
IS-IS in wide mode and IS-IS in narrow mode cannot communicate. If IS-IS in wide mode and IS-IS in
narrow mode need to communicate, you must change the mode to enable all routers on the network to
receive packets sent by other routers.
Table 8-14 Metric style carried in received and sent packets under different metric style
configurations
Configured Metric Style   Metric Style Carried in Received Packets   Metric Style Carried in Sent Packets
Narrow                    Narrow                                     Narrow
Narrow-compatible         Narrow and wide                            Narrow
Compatible                Narrow and wide                            Narrow and wide
Wide-compatible           Narrow and wide                            Wide
Wide                      Wide                                       Wide
When the metric style is set to compatible, IS-IS sends the information both in narrow and
wide modes.
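Table 8-14 can be expressed as a lookup, which makes it easy to check whether two configured styles can exchange routing information. This is a sketch under the assumption that two routers interoperate only if each one sends at least one style that the other accepts; the names METRIC_STYLES, can_understand, and interoperable are hypothetical.

```python
# Configured style -> (styles accepted in received packets, styles sent).
METRIC_STYLES = {
    "narrow": ({"narrow"}, {"narrow"}),
    "narrow-compatible": ({"narrow", "wide"}, {"narrow"}),
    "compatible": ({"narrow", "wide"}, {"narrow", "wide"}),
    "wide-compatible": ({"narrow", "wide"}, {"wide"}),
    "wide": ({"wide"}, {"wide"}),
}


def can_understand(sender_style, receiver_style):
    """True if the receiver accepts at least one style the sender emits."""
    _, sent = METRIC_STYLES[sender_style]
    received, _ = METRIC_STYLES[receiver_style]
    return bool(sent & received)


def interoperable(style_a, style_b):
    """True if each side can process what the other side sends."""
    return can_understand(style_a, style_b) and can_understand(style_b, style_a)
```

For example, a narrow router and a wide router fail the check in both directions, whereas a narrow router and a narrow-compatible router pass it.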

Process
NOTICE
Once the metric style is changed, the IS-IS process restarts. Therefore, exercise caution when
changing the metric style.
l If the metric style carried in sent packets is changed from narrow to wide:
The information previously carried by TLV type 128, TLV type 130, and TLV type 2 is
now carried by TLV type 135 and TLV type 22.
l If the metric style carried in sent packets is changed from wide to narrow:
The information previously carried by TLV type 135 and TLV type 22 is now carried by
TLV type 128, TLV type 130, and TLV type 2.
l If the metric style carried in sent packets is changed from narrow or wide to narrow and
wide:
The information previously carried in narrow or wide mode is now carried by TLV type
128, TLV type 130, TLV type 2, TLV type 135, and TLV type 22.
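The three cases above follow one rule: the set of TLV types in sent packets depends only on which styles are being sent. A minimal sketch, using the TLV types and metric limits stated in this section (the function and constant names are hypothetical):

```python
NARROW_TLVS = {
    128: "IP Internal Reachability",
    130: "IP External Reachability",
    2: "IS Neighbors",
}
WIDE_TLVS = {
    135: "Extended IP Reachability",
    22: "IS Extended Neighbors",
}

MAX_NARROW_INTERFACE_METRIC = 63
MAX_WIDE_INTERFACE_METRIC = 16777215   # 2**24 - 1
MAX_WIDE_ROUTE_METRIC = 4261412864     # 0xFE000000


def tlvs_sent(sent_styles):
    """Return the sorted TLV types carried for the given sent styles."""
    types = set()
    if "narrow" in sent_styles:
        types.update(NARROW_TLVS)
    if "wide" in sent_styles:
        types.update(WIDE_TLVS)
    return sorted(types)


narrow_only = tlvs_sent({"narrow"})
wide_only = tlvs_sent({"wide"})
both = tlvs_sent({"narrow", "wide"})
```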
8.5.2.16 IS-IS LDP Synchronization

In the networking where primary and backup LSPs are used, if the primary LSP fails, traffic is
switched to the backup LSP. If the primary LSP recovers, traffic is switched back to the
primary LSP.
Because IGP route convergence completes before an LDP session is established, the original
LSP is deleted before the new LSP is established. As a result, LSP traffic is interrupted for a
long period of time.
In Figure 8-48, the primary LSP uses the path PE1 -> P1 -> P2 -> P3 -> PE2, and the backup
LSP uses the path PE1->P1->P4->P3->PE2.
Figure 8-48 Networking for IS-IS LDP synchronization
[Figure: PE1, P1, P3, and PE2 are connected in a line; P2 and P4 each connect P1 and P3.]

IS-IS LDP synchronization on P1 and P2 can shorten the traffic interruption during traffic
switchback from the backup LSP to the primary LSP.
To prevent packet loss during traffic switchback, LDP-IGP synchronization delays switchback
of the traffic forwarded by IGP routes until LDP sessions are established. That is, before an
LSP is set up, the original LSP is not deleted and is still used to forward traffic.
Figure 8-49 LDP-IGP synchronization
[Figure: state machine with the states Init, Hold Down, Hold max cost, Hold Timer Expired, and Sync Achieved; the transitions are labeled with the event numbers described below.]
The numbers in Figure 8-49 are described as follows:

l 1: The interface is Up.
l 2: The LDP session is Down.
l 3: The interface is Down.
l 4: The LDP session is Up.
l 5: The LSP is unreachable, or the Hold Down timer expires.
l 6: The Hold max cost timer expires.
l State descriptions:
– Init: indicates the initial LDP-IGP synchronization state.
– Holdtimeout: indicates that the Hold max cost timer has expired on an interface.
– HoldDown: indicates the holddown state of an interface. In the HoldDown state, an
interface suppresses the receiving and sending of Hello packets.
– HoldMaxCost: indicates that an interface advertises the maximum cost.
– Sync Achieved (Achieve): indicates that LDP-IGP synchronization has been achieved.
l State transition descriptions:

– If an interface is in the Init state and the LDP session is Down, the interface changes
to the HoldDown state when it receives a message indicating that the interface is
Up.
– If an interface is in the Init state and the LDP session is Up, the interface changes to
the Achieve state when it receives a message indicating that the interface is Up.
– If an interface is in the Holdtimeout state, the interface changes to the Init state
when it receives a message indicating that the interface is Down.
– If an interface is in the Holdtimeout state, the interface changes to the Achieve state
when it receives a message indicating that the LDP session is Up.
– If an interface is in the HoldMaxCost state, the interface changes to the Achieve
state when it receives a message indicating that the LDP session is Up.
– If an interface is in the HoldMaxCost state, the interface changes to the Init state
when it receives a message indicating that the interface is Down.
– If the HoldMaxCost timer expires, an interface changes to the Holdtimeout state
when it does not receive a message indicating that the LDP session is Up.
– If an interface in the HoldDown state receives a message indicating that the LDP
session is Up, the interface state changes to Achieve.
– If an interface in the HoldDown state receives a message indicating that the
interface is Down, the interface state changes to Init.
– If an interface in the HoldDown state receives a message indicating that the Hold
Down timer expires, the interface state changes to HoldMaxCost.
– If an interface in the Achieve state receives a message indicating that the LDP
session is Down, the interface state changes to HoldMaxCost.
– If an interface in the Achieve state receives a message indicating that the interface
is Down, the interface state changes to Init.
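The state transitions can be summarized as a lookup table. The sketch below is an illustration rather than the device's implementation. It assumes that every transition into the Achieve state is triggered by the LDP session going Up (event 4 in Figure 8-49) and that unlisted state/event pairs leave the state unchanged; the event names are hypothetical.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("Init", "if_up_ldp_down"): "HoldDown",
    ("Init", "if_up_ldp_up"): "Achieve",
    ("HoldDown", "ldp_up"): "Achieve",
    ("HoldDown", "if_down"): "Init",
    ("HoldDown", "hold_down_timer_expired"): "HoldMaxCost",
    ("HoldMaxCost", "ldp_up"): "Achieve",
    ("HoldMaxCost", "if_down"): "Init",
    ("HoldMaxCost", "hold_max_cost_timer_expired"): "Holdtimeout",
    ("Holdtimeout", "ldp_up"): "Achieve",
    ("Holdtimeout", "if_down"): "Init",
    ("Achieve", "ldp_down"): "HoldMaxCost",
    ("Achieve", "if_down"): "Init",
}


def step(state, event):
    """Return the next state; unlisted pairs keep the current state."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="Init"):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state


# Interface comes Up before LDP: hold down, then advertise the maximum cost,
# then reach synchronization once the LDP session is Up.
slow_ldp = run(["if_up_ldp_down", "hold_down_timer_expired", "ldp_up"])
# LDP session fails after synchronization was achieved.
ldp_failure = run(["if_up_ldp_up", "ldp_down"])
```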
Usage Scenario
In the networking shown in Figure 8-48, LDP-IGP synchronization can be configured to
prevent packet loss during traffic switchback from the backup LSP to the primary LSP.
8.5.2.17 BFD for IS-IS

BFD functions as a simple "Hello" protocol. In many aspects, it is similar to the adjacency
test of a routing protocol.
Two systems periodically send BFD packets on the path between them. If one system does not
receive any BFD packets from its peer within the detection period, the system detects that the
bidirectional path to its peer is faulty. Under some conditions, systems need to negotiate the
sending and receiving rates to reduce the load.
BFD is classified into static BFD and dynamic BFD.
NOTE
BFD uses the local discriminator and remote discriminator to differentiate multiple BFD sessions
between the same pair of systems.
l Static BFD
In static BFD, BFD session parameters including local and remote discriminators are set
using commands, and the requests for establishing BFD sessions are manually delivered.

l Dynamic BFD (including BFD for IPv4)

In dynamic BFD, the establishment of BFD sessions is triggered by routing protocols.
The local discriminator is dynamically assigned, and the remote discriminator is learned
by a routing protocol.
In BFD for IS-IS, the establishment of a BFD session is dynamically triggered by IS-IS
instead of being performed manually by an administrator. When detecting a fault, BFD
notifies IS-IS of the fault through the RM module. IS-IS then sets the status of the associated
neighbor relationship to Down, immediately advertises the changed Link State PDU (LSP),
and performs incremental SPF. In this manner, fast route convergence is implemented.
Generally, the interval for sending Hello packets is set to 10s. The interval for advertising that
a neighbor is Down, that is, the Holddown time for keeping the neighbor relationship, is three
times the interval for sending Hello packets. If a router does not receive any Hello packet
from its neighbor within the Holddown time, the router deletes the associated neighbor
relationship.
With the Hello mechanism alone, a router can detect a neighbor fault only within seconds.
As a result, a large number of packets may be lost on a high-speed network.
To solve the problem, BFD provides link fault detection featuring light load and high speed
(in milliseconds).
BFD can provide millisecond-level fault detection. BFD does not take the place of the Hello
mechanism of IS-IS, but works with IS-IS to more quickly detect the faults that occur on
neighboring devices or links, and instructs IS-IS to recalculate routes to correctly guide packet
forwarding.
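The difference between the two detection mechanisms is simple arithmetic. The sketch below uses the Hello values given above (10s interval, Holddown time of three intervals). The BFD interval, the detection multiplier, and the packet rate are hypothetical illustration values, and the BFD detection time is assumed to be the receive interval multiplied by the detection multiplier.

```python
def hello_detection_s(hello_interval_s=10, multiplier=3):
    """Holddown time of the Hello mechanism, in seconds."""
    return hello_interval_s * multiplier


def bfd_detection_ms(rx_interval_ms, detect_multiplier):
    """BFD detection time, in milliseconds (assumed: interval x multiplier)."""
    return rx_interval_ms * detect_multiplier


def packets_lost(rate_pps, detection_ms):
    """Packets sent into a failed link before the fault is detected."""
    return rate_pps * detection_ms // 1000


hello_ms = hello_detection_s() * 1000   # 30 s expressed in milliseconds
bfd_ms = bfd_detection_ms(10, 3)        # hypothetical 10 ms interval, multiplier 3

lost_with_hello = packets_lost(100_000, hello_ms)
lost_with_bfd = packets_lost(100_000, bfd_ms)
```

At the assumed rate of 100,000 packets per second, the loss drops from millions of packets to a few thousand.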
Static BFD
In static BFD, BFD session parameters including local and remote discriminators are set using
commands, and the requests for establishing BFD sessions are manually delivered.
In this mode, the creation and deletion of BFD sessions must also be triggered manually,
which is inflexible and prone to configuration errors. For example, if the local
discriminator or remote discriminator is incorrectly configured, the BFD session does not
function properly.
Dynamic BFD
Dynamic BFD is more flexible than static BFD. In dynamic BFD, routing protocols trigger
the establishment of BFD sessions. The establishment of a BFD-for-IPv4 session is triggered
by IS-IS when an IPv4 neighbor relationship is set up.
When setting up a new neighbor relationship, IS-IS sends the neighbor parameters and
detection parameters (including the source and destination IP addresses) to BFD. BFD then
sets up a session according to the received parameters.
The RM module provides related services for association with the BFD module for IS-IS.
Through RM, IS-IS prompts BFD to set up or tear down BFD sessions by sending notification
messages. In addition, BFD events are transmitted to IS-IS through RM.
Establishment and Deletion of BFD Sessions

l Conditions for setting up a BFD session

– Basic IS-IS functions are configured on each router and IS-IS is enabled on the
interfaces of the routers.
– BFD is enabled on each router, and BFD for IPv4 is enabled on interfaces or
processes of the routers.
– BFD for IPv4 is enabled on interfaces or processes, and the status of the
neighboring router is Up (the DIS must be elected on a broadcast network).
l Process of setting up a BFD session
– P2P network
After the conditions for setting up a BFD session are satisfied, IS-IS instructs BFD
through RM to directly set up a BFD session between neighbors.
– Broadcast network
After the conditions for establishing BFD sessions are met, and the DIS is elected,
IS-IS instructs BFD through RM to establish a BFD session between the DIS and
each router. No BFD session is established between non-DISs.
On a broadcast network, the routers (including non-DIS routers) of the same level on the
same network segment can set up neighbor relationships. In the implementation of IS-IS
BFD, however, BFD sessions are set up between the DIS and non-DIS devices rather
than between non-DISs. On a P2P network, BFD sessions are directly set up between
neighbors.
If a Level-1-2 neighbor relationship is set up between two routers on a link, IS-IS sets up
two BFD sessions for the Level-1 and Level-2 neighbors on a broadcast network, but
sets up only one BFD session on a P2P network.
l Conditions for tearing down a BFD session
– P2P network
When a neighbor relationship that was set up on P2P interfaces by IS-IS is down
(that is, the neighbor relationship is not in the Up state) or when the IP protocol type
of a neighbor is deleted, IS-IS tears down the BFD session.
– Broadcast network
When a neighbor relationship that was set up on broadcast interfaces by IS-IS is torn
down (that is, the neighbor relationship is not in the Up state), when the IP protocol
type of a neighbor is deleted, or when the DIS is re-elected, IS-IS tears down the
BFD session.
When the configurations of a dynamically established BFD session are deleted or BFD
for IS-IS is disabled on an interface, all BFD sessions that correspond to the neighbor
relationships on the interface (between devices, or between devices and the DIS) are deleted.
After dynamic BFD is globally disabled in an IS-IS process, the BFD sessions on all the
interfaces in this IS-IS process are deleted.
NOTE
BFD detects only one-hop links between IS-IS neighbors, because IS-IS establishes only one-hop
neighbor relationships.
l Response to the Down event of a BFD session
When detecting a link failure, BFD generates a Down event, and then notifies RM of the
event. RM then instructs IS-IS to delete the neighbor relationship. IS-IS recalculates
routes to speed up route convergence on the entire network. After BFD for IPv4 informs
IS-IS of the link failure, IS-IS changes only the IPv4 route.
When a router and its neighbor are Level-1-2 routers, they set up two neighbor
relationships, that is, the Level-1 neighbor relationship and the Level-2 neighbor
relationship. Then, IS-IS sets up two BFD sessions for the Level-1 neighbor relationship
and Level-2 neighbor relationship. In this case, the RM module deletes the neighbor
relationship of a specific level.
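The session setup rules described above (sessions between the DIS and each other router on a broadcast network, one session per level on a broadcast network, and a single session on a P2P network) can be sketched as follows. The function name bfd_sessions is hypothetical, and the sketch assumes for simplicity that the same router is the DIS at every level.

```python
def bfd_sessions(network_type, routers, dis=None, levels=("Level-1",)):
    """List the BFD sessions that IS-IS would request, as (a, b, level) tuples."""
    if network_type == "p2p":
        a, b = routers
        # One session on a P2P link, even for a Level-1-2 neighbor relationship.
        return [(a, b, "/".join(levels))]
    if network_type == "broadcast":
        if dis not in routers:
            raise ValueError("the DIS must be one of the routers on the segment")
        # Sessions only between the DIS and each non-DIS router, per level.
        return [(dis, r, level) for level in levels for r in routers if r != dis]
    raise ValueError(f"unknown network type: {network_type}")


segment = ["ATN A", "ATN B", "ATN C", "ATN D"]
single_level = bfd_sessions("broadcast", segment, dis="ATN A")
both_levels = bfd_sessions("broadcast", segment, dis="ATN A",
                           levels=("Level-1", "Level-2"))
p2p = bfd_sessions("p2p", ["ATN A", "ATN B"], levels=("Level-1", "Level-2"))
```

No tuple pairs two non-DIS routers, which matches the rule that BFD sessions are not established between non-DISs.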
Applicable Environment
NOTICE
You must configure BFD according to the actual network environment. If timer parameters
are set improperly, network flapping may occur.
BFD for IS-IS can quickly sense link changes to implement fast route convergence.
Figure 8-50 Networking for BFD for IS-IS
[Figure: ATN A, a switch, ATN B, and ATN C, with a primary path and a backup path marked.]
BFD for IS-IS configuration requirements are as follows:

l Enable IS-IS on the routers, as shown in Figure 8-50.
l Enable BFD globally.
l Enable BFD for IS-IS on ATN A and ATN B.
When the link between ATN A and ATN B becomes faulty, BFD can quickly detect the fault
and notify IS-IS. IS-IS then changes the neighbor relationship on the interface to Down and
deletes the IP protocol type to which the neighbor relationship corresponds, which triggers
route calculation. In addition, IS-IS updates LSPs so that neighbors such as ATN C can
receive updated LSPs from ATN B.
8.5.2.18 IS-IS Auto FRR

IS-IS Auto Fast Reroute (FRR) is a type of dynamic IP FRR. In IS-IS Auto FRR, an IGP pre-
computes a backup link for a primary link based on the LSDBs on the entire network and
stores it in the forwarding table, and switches traffic to the backup link if the primary link
fails. In this manner, the failure recovery time can be within 50 ms.
Complying with RFC 5286 (Basic Specification for IP Fast Reroute Loop-Free Alternates),
IS-IS Auto FRR protects traffic when links or nodes become faulty.

Background
With the development of networks, services such as Voice over IP (VoIP) and online video
require high-quality real-time transmission. Nevertheless, if an IS-IS link fault
occurs, traffic can be switched to a new link only after the processes, including fault
detection, LSP update, LSP flooding, route calculation, and FIB entry delivery, are complete.
As a result, it takes much more than 50 ms to rectify the fault, which cannot meet the
requirement for real-time transmission services on the network.
Implementation Principle
IS-IS Auto FRR pre-computes a backup link by using the Loop-Free Alternate (LFA)
algorithm, and then adds the backup link and the primary link to the forwarding table. In the
case of an IS-IS network failure, IS-IS Auto FRR can fast switch traffic to the backup link
before routes on the control plane converge. This ensures normal transmission of traffic and
improves the reliability of the IS-IS network.
The backup link is calculated through the LFA algorithm. With the neighbor that can provide
the backup link being the root, the shortest path to the destination node is calculated by a
device through the SPF algorithm. Then, the loop-free backup link is calculated according to
the inequality defined in RFC 5286.
IS-IS Auto FRR can filter backup routes that need to be added to the IP routing table. Only
the backup routes matching the filtering policy are added to the IP routing table. In this
manner, users can flexibly control the addition of IS-IS backup routes to the IP routing table.
In the scenario where a BFD session is bound to IS-IS Auto FRR, when BFD detects a link
fault on an interface, the BFD session goes Down, triggering FRR on the interface. After that,
the traffic is switched from the faulty link to the backup link, which protects the traffic.
IS-IS Auto FRR supports the following types of TE links:
l IP protecting TE
As shown in Figure 8-51, the TE tunnel has the smallest IS-IS cost among the paths
from ATN S to ATN D. Therefore, ATN S selects the TE tunnel as the primary path to
ATN D. The path ATN S->ATN N->ATN D has the second smallest cost. According to
the LFA algorithm, ATN S selects the path ATN S->ATN N->ATN D as the backup path.
The outbound interface of the backup path is the physical interface that connects ATN S
to ATN N.
NOTE
If the outbound interface of the backup link is the actual outbound interface of the TE tunnel, IP
protecting TE fails.

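Figure 8-51 IP protecting TE
[Figure: ATN S, ATN D, and ATN N. TE tunnel from ATN S to ATN D: IS-IS cost = 1. Direct link between ATN S and ATN D: IS-IS cost = 13. The two links through ATN N: IS-IS cost = 1 and IS-IS cost = 10. Legend: traffic in normal cases; traffic in case of failure.]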
l TE protecting IP
As shown in Figure 8-52, the physical path ATN S-->ATN N-->ATN D has the smallest
IS-IS metric among the paths from ATN S to ATN D. Therefore, ATN S prefers the path
ATN S-->ATN N-->ATN D as the primary path from ATN S to ATN D. The IS-IS cost of
the TE tunnel is 12, and the explicit path of the TE tunnel is the direct link from ATN S
to ATN D. The IS-IS metric of the direct link from ATN S to ATN D is 13, which is
greater than the IS-IS metric of the TE tunnel. Therefore, IS-IS selects the TE tunnel as
the backup path. TE protecting IP is implemented.

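Figure 8-52 TE protecting IP
[Figure: ATN S, ATN D, and ATN N. TE tunnel from ATN S to ATN D: IS-IS cost = 12. Direct link between ATN S and ATN D: IS-IS cost = 13. The two links through ATN N: IS-IS cost = 1 and IS-IS cost = 10. Legend: traffic in normal cases; traffic in case of failure.]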
IS-IS Auto FRR traffic protection is classified into link protection and link-node dual
protection. Distance_opt(X, Y) indicates the shortest path between node X and node Y.
Link protection: indicates that the object to be protected is the traffic passing through an IS-IS
Auto FRR-enabled link. The link cost must satisfy the inequality: Distance_opt(N, D) <
Distance_opt(N, S) + Distance_opt(S, D). In the inequality, S indicates the source node of
traffic, N indicates a node on the backup link, and D indicates the destination node of traffic.
As shown in Figure 8-53, traffic is forwarded from ATN S to ATN D. The link cost satisfies
the link protection inequality. When the primary link fails, ATN S switches traffic to the
backup link from ATN S to ATN N so that the traffic can be further transmitted along
downstream paths. This ensures that the traffic interruption period is less than 50 ms.
Figure 8-53 IS-IS Auto FRR link protection
[Figure: ATN S, ATN D, and ATN N form a triangle; each of the three links has a cost of 10.]

Link-node dual protection: Figure 8-54 shows link-node dual protection of IS-IS Auto FRR.
Node protection takes precedence over link protection.
Link-node dual protection must satisfy the following conditions:
l The link cost must satisfy the inequality: Distance_opt(N, D) < Distance_opt(N, S) +
Distance_opt(S, D).
l The interface cost of the router must satisfy the inequality: Distance_opt(N, D) <
Distance_opt(N, E) + Distance_opt(E, D).
S indicates the source node of traffic; E indicates the faulty node; N indicates the node on the
backup link; D indicates the destination node of traffic.
Figure 8-54 IS-IS Auto FRR link-node dual protection
[Figure: ATN S, ATN E, ATN D, and ATN N. The two links through ATN E have costs of 5 and 10; the two links through ATN N each have a cost of 10.]
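The two inequalities can be checked directly against a topology. The sketch below is a minimal illustration, not the device's LFA implementation. The triangle uses the costs of Figure 8-53. For the dual-protection case, the assignment of the costs 5 and 10 to the links ATN S to ATN E and ATN E to ATN D is an assumption, and the helper names are hypothetical.

```python
import heapq


def distance_opt(adj, src, dst):
    """Shortest-path cost between two nodes (Dijkstra)."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue
        for v, c in adj[u].items():
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return float("inf")


def link_protecting(adj, s, n, d):
    """Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)."""
    return distance_opt(adj, n, d) < distance_opt(adj, n, s) + distance_opt(adj, s, d)


def node_protecting(adj, n, e, d):
    """Distance_opt(N, D) < Distance_opt(N, E) + Distance_opt(E, D)."""
    return distance_opt(adj, n, d) < distance_opt(adj, n, e) + distance_opt(adj, e, d)


def bidir(edges):
    adj = {}
    for u, v, c in edges:
        adj.setdefault(u, {})[v] = c
        adj.setdefault(v, {})[u] = c
    return adj


# Figure 8-53: triangle with all link costs equal to 10.
triangle = bidir([("S", "D", 10), ("S", "N", 10), ("N", "D", 10)])
# Figure 8-54 (cost placement assumed): primary path S -> E -> D.
dual = bidir([("S", "E", 5), ("E", "D", 10), ("S", "N", 10), ("N", "D", 10)])

link_ok = link_protecting(triangle, "S", "N", "D")   # 10 < 10 + 10
dual_link_ok = link_protecting(dual, "S", "N", "D")  # 10 < 10 + 15
dual_node_ok = node_protecting(dual, "N", "E", "D")  # 10 < 15 + 10
```

A neighbor that fails the first inequality would send traffic back through ATN S, so it cannot be used as a loop-free backup next hop.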
8.5.2.19 IS-IS Authentication

IS-IS authentication protects IS-IS packets by adding an authentication field to the packets,
which improves network security. When a local router receives IS-IS packets from a remote
router, the local router discards the packets if the authentication passwords carried in these
packets do not match the local one, which protects the local device from potential attacks.
Based on packet types, the authentication is classified as follows:
l Area authentication
Area authentication is configured in the IS-IS process view to authenticate Level-1
CSNPs, PSNPs, and LSPs.
l Routing domain authentication
Routing domain authentication is configured in the IS-IS process view to authenticate
Level-2 CSNPs, PSNPs, and LSPs.
l Interface authentication
Interface authentication is configured in the interface view to authenticate Level-1 and
Level-2 Hello packets.

Based on the authentication modes of packets, authentication is classified into the following
types:
l Simple authentication
The authenticated party directly adds the configured password to packets for
authentication. This authentication mode provides the lowest password security. Because
this imposes security risks, the MD5 authentication was introduced.
l MD5 authentication
In MD5 authentication, passwords are encrypted through the MD5 algorithm before they
are added to packets. This improves the security of passwords.
l Keychain authentication
Keychain authentication further improves network security by using a configurable key
chain that changes over time.
IS-IS provides a TLV to carry authentication information. The TLV components are as
follows:
l Type (of the authentication packets): is defined by ISO as 10, with a length of 1 byte.
l Length: indicates the length of the authentication TLV, which is 1 byte.
l Value: indicates the authentication information, including authentication type and
password, which ranges from 1 to 254 bytes.
– Type 0 is reserved.
– Type 1 indicates simple authentication.
– Type 54 indicates MD5 authentication.
– Type 255 indicates private routing domain authentication.
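The authentication TLV layout can be sketched as an encoder. This is an illustration only. The simple mode follows the description above directly. For the MD5 mode, the sketch assumes an HMAC-MD5 digest computed over the PDU bytes in the manner of RFC 5304, which the text above does not spell out, and it omits the handling of the digest field inside the PDU. All function names are hypothetical.

```python
import hmac

AUTH_TLV_TYPE = 10  # TLV type defined by ISO
AUTH_SIMPLE = 1     # simple authentication
AUTH_MD5 = 54       # MD5 authentication


def encode_auth_tlv(auth_type: int, auth_data: bytes) -> bytes:
    """Type (1 byte) + Length (1 byte) + Value (authentication type + data)."""
    value = bytes([auth_type]) + auth_data
    if not 1 <= len(value) <= 254:
        raise ValueError("authentication value must be 1 to 254 bytes")
    return bytes([AUTH_TLV_TYPE, len(value)]) + value


def simple_auth_tlv(password: bytes) -> bytes:
    """Simple authentication: the password is carried as configured."""
    return encode_auth_tlv(AUTH_SIMPLE, password)


def md5_auth_tlv(key: bytes, pdu: bytes) -> bytes:
    """MD5 authentication (assumed HMAC-MD5 over the PDU bytes)."""
    digest = hmac.new(key, pdu, "md5").digest()
    return encode_auth_tlv(AUTH_MD5, digest)


def verify_simple(tlv: bytes, password: bytes) -> bool:
    """Check a received simple-authentication TLV against the local password."""
    if len(tlv) < 3 or tlv[0] != AUTH_TLV_TYPE or tlv[2] != AUTH_SIMPLE:
        return False
    return hmac.compare_digest(tlv[3:2 + tlv[1]], password)


simple_tlv = simple_auth_tlv(b"abc")
md5_tlv = md5_auth_tlv(b"key", b"example-pdu-bytes")
```

In the simple mode the password is visible in the packet, which is the security risk that the digest-based mode avoids.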
The authentication password is saved in the following modes:
l The authentication password for IIH packets is saved on interfaces for interface
authentication.
l The authentication password for Level-1 LSPs and SNPs is saved in the IS-IS process
for area authentication.
l The authentication password for Level-2 LSPs and SNPs is saved in the IS-IS process
for routing domain authentication.
Interface authentication can be classified as follows:
l A router sends authentication packets with the authentication TLV and verifies the
authentication information of the packets it receives.
l A router sends authentication packets with the authentication TLV but does not verify the
authentication information of the packets it receives.
For area authentication and routing domain authentication, you can enable a router to
authenticate SNPs and LSPs separately in the following ways:
l A router sends LSPs and SNPs that carry the authentication TLV and verifies the
authentication information of received LSPs and SNPs.
l A router sends LSPs that carry the authentication TLV and verifies the authentication
information of received LSPs. The router sends SNPs that carry the authentication TLV
but does not verify the authentication information of received SNPs.

l A router sends LSPs that carry the authentication TLV and verifies the authentication
information of received LSPs. The router sends SNPs without the authentication TLV
and does not verify the authentication information of received SNPs.
l A router sends LSPs and SNPs that carry the authentication TLV but does not verify the
authentication information of received LSPs and SNPs.
Usage Scenario
Figure 8-55 Networking for IS-IS authentication on a broadcast network
[Figure: ATN A, ATN B, ATN C, ATN D, and ATN E on the same broadcast network.]
The requirements for IS-IS authentication on a broadcast network are as follows:
l IS-IS neighbor relationships can be set up between multiple routers on the same network
only when interface authentication is configured in the same manner on all the routers.
l When multiple routers are in the same area, you must configure area authentication the
same way on all the routers to ensure synchronization of their Level-1 LSDBs.
l When Level-2 neighbor relationships are set up between multiple routers, you must
configure routing domain authentication the same way on all the routers to ensure the
synchronization of their Level-2 LSDBs.
Terms
Term    Description
TLV    Type-Length-Value, also called Code-Length-Value (CLV). TLV encoding is efficient and highly extensible.
T indicates the type; different types are defined through different values.
L indicates the total length of the value field.
V indicates the actual data of the TLV and is the most important part.
New TLVs can be added to support new features, which makes TLV encoding
flexible in describing the information carried in packets.

LSP Link State Protocol Data Unit. It broadcasts link states in the area and contains all
information about a router. The information includes IS-IS neighbors, IP address
prefix, the ES it is connected to, and the area address. LSPs are classified as
Level-1 LSPs or Level-2 LSPs. A router generates one Level-1 LSP and one
Level-2 LSP with fragments included.
CSNP Complete Sequence Numbers Protocol Data Unit. It contains brief information
about the local LSDB and is used to synchronize the LSDBs of neighbors. CSNPs
are sent and resolved at different levels.
DIS Designated Intermediate System
Pseudonode    A virtual node that is used to simulate a broadcast network. It is generated by the
DIS and sets up neighbor relationships with all routers on the broadcast network.
PE Provider Edge
CE Customer Edge
NSR Non-Stop Routing

Abbreviation
IGP Interior Gateway Protocol
LSP Link State Protocol Data Unit
CSNP Complete Sequence Numbers Protocol Data Unit
SNP Sequence Number PDU
DIS Designated Intermediate System
TLV Type-Length-Value
SPF Shortest Path First
MI Multiple Instance
MT Multi-topology
Local-MT Local Multicast-Topology
URT Unicast Routing Table
MIGP IGP Routing Table for Multicast
GR Graceful Restart

RM Routing Management
VPN Virtual Private Network
CSPF Constraint-based Shortest Path First
PE Provider Edge
CE Customer Edge
RIB Routing Information Base
8.5.4 Appendixes
Feature    Supported by IPv4    Supported by IPv6    Differences
IS-IS TE    Yes    No    This feature applies only to IPv4.
8.6 OSPF
8.6.1 Introduction
Definition
Open Shortest Path First (OSPF), developed by the Internet Engineering Task Force (IETF), is
a link-state Interior Gateway Protocol (IGP).
At present, OSPF Version 2, defined in RFC 2328, is intended for IPv4. OSPF stated in this
document refers to OSPFv2, unless otherwise stated.
Purpose
Before the emergence of OSPF, the Routing Information Protocol (RIP) was widely used as
an IGP on networks.
RIP is a distance vector algorithm-based routing protocol. Because RIP converges slowly,
is prone to routing loops, and scales poorly, it is gradually being replaced by OSPF.

As a link-state protocol, OSPF can solve many problems encountered by RIP. Additionally,
OSPF has the following advantages:
l Receives and sends packets in multicast mode, which reduces the load on devices that do
not run OSPF.
l Supports Classless Interdomain Routing (CIDR).
l Supports load balancing among equal-cost routes.
l Supports packet authentication.
With the preceding advantages, OSPF is widely accepted and used as an IGP.
8.6.2 Principles
8.6.2.1 Fundamentals of OSPF

OSPF provides the following functions:
l Divides an Autonomous System (AS) into a single area or multiple logical areas.
l Sends Link State Advertisements (LSA) to advertise routes.
l Synchronizes routing information by exchanging OSPF packets between routers in
OSPF areas.
l Encapsulates the OSPF packets in IP packets and sends the packets in unicast or
multicast mode.
l Enabling the feature on an OSPF interface is supported to allow users to manage OSPF
using NMS.
l The same-router-ID detection and recovery function is supported. After OSPF detects
same router IDs, it selects a new router ID to avoid route flapping.
OSPF Packet Type
Table 8-15 OSPF packet types

Packet Type    Function
Hello    Hello packets are sent periodically to discover and maintain OSPF neighbor relationships.
Database Description (DD)    DD packets carry brief information about the local Link State Database (LSDB) and are used to synchronize the LSDBs of two routers.
Link State Request (LSR)    LSR packets are used to request the required LSAs from neighbors. LSR packets are sent only after DD packets have been exchanged successfully.
Link State Update (LSU)    LSU packets are used to send the required LSAs to neighbors.
Link State Acknowledgment (LSAck)    LSAck packets are used to acknowledge the received LSAs.

LSA Type
Table 8-16 OSPF LSA types

LSA Type    Function
Router-LSA (Type1)    Describes the link status and link cost of the ATN. Generated by each ATN and advertised in the area to which the ATN belongs.
Network-LSA (Type2)    Describes the link status of all routers in the local network segment. Generated by a designated router (DR) and advertised in the area to which the DR belongs.
Network-summary-LSA (Type3)    Describes the routes in a network segment and advertises the routes to the related non-totally STUB or NSSA area.
ASBR-summary-LSA (Type4)    Describes routes to an Autonomous System Boundary Router (ASBR). Generated by an ABR and advertised in the related areas, except the area to which the ASBR belongs.
AS-external-LSA (Type5)    Describes routes to a destination outside the AS. Generated by an ASBR and advertised in all areas, except stub areas and Not-So-Stubby Areas (NSSA).
NSSA-LSA (Type7)    Describes routes to a destination outside the AS. Generated by an ASBR and advertised in NSSAs only.
Opaque-LSA (Type9/Type10/Type11)    Provides a general mechanism for OSPF extension:
l Type9 LSAs are advertised in the network segment where interfaces reside. Grace LSAs used to support GR are one example of Type9 LSAs.
l Type10 LSAs are advertised in an area. LSAs used to support TE are one example of Type10 LSAs.
l Type11 LSAs are advertised in an AS. Currently, no application examples of Type11 LSAs exist.
Router Type
Figure 8-56 illustrates the common types of routers in OSPF.

Figure 8-56 Router types
(Figure: an OSPF AS made up of Area 0 through Area 4, showing an internal router, a backbone router, an ABR, and an ASBR that connects to an IS-IS network.)
Table 8-17 OSPF router types

Router Type    Description
Internal router    All interfaces of an internal router belong to the same OSPF area.
Area Border Router (ABR)    An ABR can belong to two or more areas; one of the areas must be a backbone area. An ABR is used to connect the backbone area and non-backbone areas. It can be physically or logically connected to the backbone area.
Backbone router    At least one interface on a backbone router belongs to the backbone area. All ABRs and internal routers in Area 0 are backbone routers.
AS Boundary Router (ASBR)    An ASBR exchanges routing information with other ASs. An ASBR does not have to reside at the boundary of an AS. It can be an internal router or an ABR.
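The definitions in Table 8-17 overlap: a single router can hold several roles at once. The following Python sketch derives the roles from the areas that a router's interfaces belong to. It is a simplification under stated assumptions: the function and parameter names are hypothetical, and an ABR that reaches the backbone only logically (through a virtual link) is not modeled.

```python
BACKBONE_AREA = 0


def router_roles(interface_areas, imports_external_routes=False):
    """Return the set of OSPF roles implied by a router's interface areas.

    interface_areas: iterable with the area ID of each OSPF interface.
    imports_external_routes: True if the router exchanges routes with other ASs.
    """
    areas = set(interface_areas)
    if not areas:
        raise ValueError("router has no OSPF interfaces")
    roles = set()
    if len(areas) == 1:
        roles.add("internal")   # all interfaces are in the same area
    if len(areas) >= 2 and BACKBONE_AREA in areas:
        roles.add("abr")        # two or more areas, one of them the backbone
    if BACKBONE_AREA in areas:
        roles.add("backbone")   # at least one interface in Area 0
    if imports_external_routes:
        roles.add("asbr")       # independent of the router's position in the AS
    return roles
```

A router with all interfaces in Area 0 is both an internal router and a backbone router, and a router inside Area 1 that imports external routes is both an internal router and an ASBR, matching the statement that an ASBR does not have to reside at the AS boundary.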
OSPF Route Type

Inter-area routes and intra-area routes define the network structure of an AS. External routes
define how to select a route to a destination outside an AS. OSPF classifies the imported AS
external routes into Type1 and Type2 external routes.
Table 8-18 lists route types in descending order of priority.

Table 8-18 OSPF route types

Route Description
Intra area Intra-area routes
Inter area Inter-area routes
Type1 external route Because of the high reliability of Type1 external routes,
their calculated cost is comparable to that of AS internal routes.
In other words, the cost of a Type1 external route equals
the cost of the route from the router to the corresponding
ASBR plus the cost of the route from the ASBR to the
destination.
Type2 external route Because of the low reliability of Type2 external routes,
their costs are considered to be greater than the cost of
any internal path to an ASBR.
The cost of a Type2 external route equals the cost of the
route from the ASBR to the destination.
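The cost rules and the priority order in Table 8-18 can be sketched as follows. This is a minimal Python illustration, not the device's route selection code; the class, field, and function names are hypothetical. As an added assumption based on RFC 2328, Type2 routes with equal external cost are tie-broken by the internal cost to the ASBR.

```python
from dataclasses import dataclass

# Lower value means higher priority, following the order of Table 8-18.
PRIORITY = {
    "intra-area": 0,
    "inter-area": 1,
    "type1-external": 2,
    "type2-external": 3,
}


@dataclass(frozen=True)
class Candidate:
    route_type: str
    internal_cost: int = 0  # cost of an intra-area or inter-area route
    cost_to_asbr: int = 0   # cost from this router to the ASBR (external routes)
    external_cost: int = 0  # cost from the ASBR to the destination


def route_cost(c: Candidate) -> int:
    """Cost of a route according to its type."""
    if c.route_type == "type1-external":
        return c.cost_to_asbr + c.external_cost
    if c.route_type == "type2-external":
        return c.external_cost  # the internal path to the ASBR is not added
    return c.internal_cost


def best_route(candidates):
    """Pick the preferred route: type priority first, then cost."""
    return min(
        candidates,
        key=lambda c: (PRIORITY[c.route_type], route_cost(c), c.cost_to_asbr),
    )
```

With a Type1 route of cost 10 + 20 = 30 and a Type2 route of cost 5 to the same destination, the Type1 route is still selected, because route type is compared before cost.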
Area Type
Table 8-19 OSPF area types

Area Function
Totally stub area Allows Type3 default routes that are advertised by an ABR, and
denies inter-area routes and the routes outside an AS.
Stub area Allows inter-area routes, unlike a totally stub area.
NSSA area Imports routes from outside an AS, unlike a stub area. An ASBR
advertises Type7 LSAs in the local area.
Totally NSSA Denies inter-area routes, unlike an NSSA.
OSPF Network Type

OSPF classifies networks into the types listed in Table 8-20 based on the link layer
protocol.

Table 8-20 OSPF network types
Network Type    Description
Broadcast    If the link layer protocol is Ethernet or Fiber Distributed Data Interface (FDDI), OSPF defaults the network type to broadcast.
In broadcast networks:
l Hello and LSAck packets are transmitted in multicast mode. LSU packets are first transmitted in multicast mode and retransmitted in unicast mode. The address 224.0.0.5 is the reserved IP multicast address of the OSPF router, and the address 224.0.0.6 is the reserved IP multicast address of the OSPF DR.
l DD packets and LSR packets are transmitted in unicast mode.
Non-Broadcast Multiple Access (NBMA)    If the link layer protocol is frame relay (FR), ATM, or X.25, OSPF defaults the network type to NBMA.
In NBMA networks, protocol packets, such as Hello, DD, LSR, LSU, and LSAck packets, are transmitted in unicast mode.
Point-to-Multipoint (P2MP)    Regardless of the link layer protocol, OSPF does not default the network type to P2MP. A P2MP network must be forcibly changed from other network types. The common practice is to change a non-fully connected NBMA network to a P2MP network.
In P2MP networks:
l Hello packets are transmitted in multicast mode through the multicast address 224.0.0.5.
l Other protocol packets, such as DD, LSR, LSU, and LSAck packets, are transmitted in unicast mode.
Point-to-point (P2P)    If the link layer protocol is PPP, HDLC, or LAPB, OSPF defaults the network type to P2P.
In P2P networks:
l Protocol packets, such as Hello, DD, LSR, LSU, and LSAck packets, are transmitted in multicast mode through the multicast address 224.0.0.5.
l LSU packets are retransmitted in unicast mode.
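Table 8-20 can be read as two lookups: link layer protocol to default network type, and network type to the delivery mode used for the first transmission of each packet type. The following Python sketch encodes only what the table states; the dictionary keys and the function name are hypothetical labels, and retransmissions (unicast for LSU packets) are not modeled.

```python
ALL_OSPF_ROUTERS = "224.0.0.5"  # reserved multicast address of OSPF routers
ALL_OSPF_DRS = "224.0.0.6"      # reserved multicast address of OSPF DRs

PACKETS = ("hello", "dd", "lsr", "lsu", "lsack")

# Default network type chosen from the link layer protocol.
DEFAULT_NETWORK_TYPE = {
    "ethernet": "broadcast", "fddi": "broadcast",
    "frame-relay": "nbma", "atm": "nbma", "x25": "nbma",
    "ppp": "p2p", "hdlc": "p2p", "lapb": "p2p",
}

# Delivery mode of the first transmission, per network type and packet type.
FIRST_TRANSMISSION = {
    "broadcast": {"hello": "multicast", "dd": "unicast", "lsr": "unicast",
                  "lsu": "multicast", "lsack": "multicast"},
    "nbma": dict.fromkeys(PACKETS, "unicast"),
    "p2mp": {"hello": "multicast", "dd": "unicast", "lsr": "unicast",
             "lsu": "unicast", "lsack": "unicast"},
    "p2p": dict.fromkeys(PACKETS, "multicast"),
}


def delivery_mode(link_layer, packet, forced_type=None):
    """Return 'multicast' or 'unicast' for a packet type on a given link.

    forced_type models a manually configured network type, which is the
    only way to obtain P2MP because it is never a default.
    """
    network_type = forced_type or DEFAULT_NETWORK_TYPE[link_layer]
    return FIRST_TRANSMISSION[network_type][packet]
```

For instance, `delivery_mode("frame-relay", "hello")` yields unicast, while the same link forced to P2MP yields multicast for Hello packets.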
Stub Area
A stub area is a special area where ABRs do not flood the received external routes. In a stub
area, the size of the routing table of routers and routing information in transmission are
greatly reduced.
Configuring a stub area in a network is optional. Not all areas can be configured as stub areas.
Generally, a stub area is a non-backbone area with only one ABR and is located at the AS
boundary.

To ensure the reachability of a destination outside an AS, the ABR in a stub area generates a
default route and advertises it to non-ABRs in the stub area.
When you configure a stub area, note the following:
l The backbone area cannot be configured as a stub area.
l If an area needs to be configured as a stub area, use the stub command to configure all
the routers in this area.
l An ASBR cannot exist in a stub area. That is, external routes are not flooded in the stub
area.
l A virtual link cannot pass through a stub area.
OSPF Route Aggregation

Route aggregation occurs when routes with the same prefix are aggregated into one route and
the aggregated route is advertised in other areas.
After route aggregation, the route information can be reduced. Consequently, the size of
routing tables is reduced, which improves the performance of routers.
Route aggregation can be carried out in the following ways:
l ABR aggregation
When an ABR transmits routing information to other areas, the router originates Type3
LSAs per network segment. If any consecutive segments exist in this area, you can run
the related command to aggregate these segments into one segment. An ABR sends only
one aggregated LSA. Any LSA that belongs to the aggregated network segment
specified by the command is not transmitted separately.
l ASBR aggregation
After route aggregation is enabled, if the local router is an ASBR, it aggregates the
imported Type5 LSAs within the aggregated address range. After an NSSA is
configured, the ASBR aggregates the imported Type7 LSAs within the aggregated
address range.
If the local router is both an ABR and ASBR, it aggregates the Type5 LSAs that are
transformed from Type7 LSAs.
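The effect of ABR aggregation can be sketched with Python's standard ipaddress module. The function name is hypothetical, the example assumes IPv4 prefixes, and it assumes that the aggregate is advertised only when at least one network segment falls within the configured range.

```python
import ipaddress


def advertised_prefixes(area_networks, aggregate):
    """Return the prefixes an ABR would advertise into other areas.

    area_networks: prefixes of the network segments in the area.
    aggregate: the configured aggregated network segment.
    """
    agg = ipaddress.ip_network(aggregate)
    nets = [ipaddress.ip_network(n) for n in area_networks]
    covered = [n for n in nets if n.subnet_of(agg)]
    result = [n for n in nets if not n.subnet_of(agg)]
    if covered:
        # Segments inside the range are replaced by one aggregated route.
        result.append(agg)
    return [str(n) for n in sorted(result)]
```

With the segments 10.1.0.0/24, 10.1.1.0/24, and 10.1.2.0/24 and the aggregate 10.1.0.0/22, only 10.1.0.0/22 is advertised for them; a segment outside the range, such as 192.168.5.0/24, is still advertised separately.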
OSPF Default Route

A default route is the route whose destination address and mask are all 0s. When a router does
not have exact matching routes, it can forward packets through default routes.
OSPF default routes are applicable to the following situations:
l An ABR advertises the default Type3 summary-LSAs to instruct intra-area routers to
forward packets to other areas.
l An ASBR advertises default Type5 ASE LSAs or Type7 NSSA LSAs to instruct intra-
AS routers to forward packets to other ASs.
The principles for advertising OSPF LSAs that describe default routes are as follows:
l An OSPF router advertises an LSA that describes a default route only when an interface
on the OSPF router is connected to a network outside an area.
l If an OSPF router has already advertised an LSA that describes a default route, the OSPF
router no longer learns LSAs of the same type advertised by other routers. When
calculating routes, the router uses the default-route LSA in its LSDB that it originated
itself, not an LSA of the same type advertised by another router.
l If advertising an LSA that describes a default route depends on another route, that
route cannot be one in the local routing domain. That is, it cannot be a route learned
by the local OSPF process. An external default route is meant to guide forwarding out
of the local OSPF routing domain, but the next hops of routes in the local OSPF routing
domain are inside that domain and therefore cannot forward packets out of it.
l The router checks whether there is any peer with the state of full in area 0 before
advertising the default route. The router advertises the default route only when there are
such peers because if there is no such peer, the backbone area cannot forward packets
and advertising the default route is meaningless.
Table 8-21 shows the advertisement of default routes in different areas.
Table 8-21 Principles for advertising area-specific default routes

Area Type    Advertising Principles
Common area    By default, no default route is generated in a common area, even if a default route exists in the common area.
After a default route is generated by another process, the default route must be advertised within an entire OSPF AS. To help OSPF generate a default route, you need to run a command on an ASBR. After the configuration, a default ASE LSA (Type5 LSA) is generated and advertised in the entire OSPF AS.
Stub area    AS external routes in Type5 LSAs cannot be advertised in a stub area.
Routers in the stub area must learn AS external routes from an ABR. The ABR automatically generates a default summary-LSA (Type3 LSA) and advertises it in the entire stub area. Routers in the stub area can obtain reachable AS external routes through the ABR.
Totally stub area    AS external routes in Type5 LSAs or inter-area routes in Type3 LSAs cannot be advertised in a totally stub area.
Routers in the totally stub area have to learn AS external routes and routes to other areas through an ABR. To help OSPF generate a default route, you need to configure a totally stub area. After the totally stub area is configured, an ABR automatically generates a default summary-LSA (Type3 LSA) and advertises it to the entire totally stub area. Routers in the totally stub area can obtain reachable AS external routes and routes to other areas through the ABR.

NSSA area    A small number of AS external routes can be imported into an NSSA through the ASBR in the NSSA. ASE LSAs (Type5 LSAs) from other areas cannot be advertised in the NSSA, so other external routes must be reached through other areas. A default route can be generated in either of the following ways:
l The ABR automatically generates a default NSSA LSA (Type7 LSA) and advertises it in the entire NSSA. A small number of AS external routes are obtained through the ASBR in the NSSA, and the other routes are obtained through the ABR in the NSSA, which connects to ASBRs in other areas.
l You run commands on the ASBR so that the ASBR generates a default NSSA LSA (Type7 LSA) and advertises it to the entire NSSA. This way, external routes can be received through the ASBR in the NSSA.
A Type7 LSA that describes a default route is neither translated into a Type5 LSA that describes a default route on an ABR nor advertised in the entire OSPF routing domain.
Totally NSSA area    External routes in ASE LSAs (Type5 LSAs) to other areas or inter-area routes in Type3 LSAs cannot be advertised in a totally NSSA.
Routers in the totally NSSA learn routes to other areas from an ABR. You can configure a totally NSSA so that an ABR automatically generates a default Type7 LSA and advertises it to the entire totally NSSA. In this manner, routes to external areas and inter-area routes can be advertised in the totally NSSA through the ABR.
OSPF Route Filtering

By default, OSPF does not filter routes. OSPF supports the filtering of routes through routing
policies.
OSPF routing policies include access control lists (ACLs), IP prefix lists, and route-policies.
For details about these policies, see the section "Routing Policy" in Feature Description - IP
Routing.
OSPF route filtering is applicable in the following situations:
l Import of routes
OSPF imports the routes that are learned by other protocols. You can configure routing
policies to filter the routes so that OSPF imports only eligible routes.
l Advertisement of imported routes
OSPF advertises the imported routes to neighbors.
You can configure rules to filter the routing information to be advertised to neighbors.
The filtering rules take effect only when being configured on ASBRs because only the
ASBRs can import routes.
l Learning of routes
You can configure rules, by which OSPF filters the received intra-area, inter-area, and
AS external routes.

The filtering action determines whether to add routing entries to the routing table. That
is, only the routes that pass the filtering are added to the local routing table. All the
routes, however, can still be advertised from the OSPF routing table.
l Learning of inter-area LSAs
You can use a command to configure ABRs to filter the incoming summary-LSAs of the
local area. This configuration takes effect only on ABRs, because only ABRs can
advertise summary-LSAs.
l Advertisement of inter-area LSAs
You can configure ABRs to filter the outgoing summary-LSAs of the local area through
a command. This configuration takes effect only on ABRs.
Table 8-22 Differences between inter-area LSA learning and route learning
Inter-area LSA Learning    Route Learning
Filters the incoming LSAs of an area directly.    Filters only the calculated routes in LSAs to determine whether these routes are added to the local routing table.
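The route learning case described above, in which filtering decides only what is added to the local routing table, can be sketched as follows. This Python example is illustrative: the rule format and function names are hypothetical, and the behaviors "first matching rule wins" and "unmatched routes are denied" are assumptions of the sketch rather than statements from this document.

```python
import ipaddress


def permitted(prefix, rules):
    """Return True if the prefix passes the filter.

    rules: ordered list of (action, prefix) pairs, where action is
    'permit' or 'deny'. The first rule whose prefix covers the route
    decides; a route that matches no rule is denied (assumption).
    """
    net = ipaddress.ip_network(prefix)
    for action, rule_prefix in rules:
        if net.subnet_of(ipaddress.ip_network(rule_prefix)):
            return action == "permit"
    return False


def install_routes(calculated_routes, rules):
    """Return the routes added to the local routing table.

    The input list is left untouched, mirroring the fact that all routes
    remain in the OSPF routing table and can still be advertised.
    """
    return [p for p in calculated_routes if permitted(p, rules)]
```

With the rules deny 10.2.0.0/16 followed by permit 10.0.0.0/8, the calculated route 10.1.1.0/24 is installed, while 10.2.3.0/24 and 172.16.0.0/16 are kept out of the local routing table.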
OSPF Virtual Link

A virtual link is a logical channel established between two ABRs through a non-backbone
area.
l A virtual link must be configured on both ends of the link; otherwise, it does not take
effect.
l A transit area provides an internal route of a non-backbone area for both ends of the
virtual link.
According to RFC 2328, during the deployment of OSPF, all non-backbone areas need to be
connected to the backbone area. Otherwise, some areas will be unreachable.
As shown in Figure 8-57, Area 2 is not connected to the backbone area (Area 0), and ATN A
is not an ABR. Therefore, ATN A does not advertise routing information of Network 1 in
Area 0. As a result, ATN B does not have the route to Network 1.
Figure 8-57 Non-Backbone area without connection to backbone area
(Figure: Area 0, Area 1, and Area 2, with an ABR, ATN A, ATN B, and Network 1; Area 2 has no connection to Area 0.)

In actual deployments, various limitations mean that physical connectivity between
non-backbone areas and the backbone area cannot always be ensured. To solve this problem,
you can configure OSPF virtual links.
A virtual link is similar to a P2P connection between two ABRs. As with physical
interfaces, you can configure parameters, such as the interval for sending Hello packets,
on the interfaces at both ends of the virtual link.
Figure 8-58 OSPF virtual link
(Figure: two ABRs connected by a virtual link through Area 1, the transit area, which links Area 2 to Area 0.)
As shown in Figure 8-58, OSPF packets transmitted between two ABRs are only forwarded
by the OSPF routers that reside between the two ABRs. These routers detect that they are not
the destinations of the packets, and forward the packets as common IP packets.
OSPF Multi-process
OSPF supports multiple processes. Multiple OSPF processes can run on the same router
independently. Route interaction between different OSPF processes is similar to route
interaction between different routing protocols.
An interface on a router can belong to only one OSPF process.
A typical application of OSPF multi-process is to run OSPF between PEs and CEs in the VPN
where OSPF is also adopted in the backbone network. On the PEs, the two OSPF processes
are independent of each other.
8.6.2.2 OSPF GR
More and more routers use technologies that separate the control plane from the forwarding
plane. With such technologies, when the network topology remains stable, a restart of the
control plane does not affect the forwarding plane, and the forwarding plane can still forward
data properly, which ensures non-stop service forwarding.
Graceful restart (GR) is such a technology. It ensures that the forwarding plane keeps
forwarding data even if a restart occurs, and the actions on the control plane, such as re-
establishment of neighbor relationships and route calculation, do not affect the forwarding
plane. GR prevents service interruptions caused by route flapping, improving network
reliability.
Basic Concepts
GR is one of the high availability (HA) technologies, which comprise a set of comprehensive
technologies, such as fault-tolerant redundancy, link protection, faulty node recovery, and
traffic engineering. As a fault-tolerant redundancy technology, GR ensures normal traffic
forwarding and non-stop forwarding of key services during the restart of routing protocols,
master/slave control board switchover, and system upgrade.
Unless otherwise stated, GR described in this section refers to the GR technology defined in
RFC 3623.
GR-related concepts are described as follows:
l Grace-LSA
OSPF implements GR by flooding grace LSAs. Grace LSAs are used to inform a
neighbor of the GR time, cause, and interface address when the GR starts and ends.
l Role of a router during GR:
NOTE
The ATN device can function as a GR helper, but not as a GR restarter.
– Restarter: is the router that restarts. The restarter can be configured to support
totally GR or partly GR.
– Helper: is the router that helps the restarter. The helper can be configured to support
planned GR or unplanned GR or to selectively support GR through configured
policies.
l Causes of GR:
– Unknown: GR is triggered for an unknown reason.
– Software restart: GR is triggered by command execution.
– Software reload/upgrade: GR is triggered by a software restart or upgrade.
– Switch to redundant control processor: GR is triggered by an unexpected master/
slave control board switchover.
l GR period
The GR period cannot exceed 1800 seconds. OSPF routers can exit from GR regardless of
whether GR succeeds or fails, without waiting for GR to expire.
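The information carried in a grace LSA (GR period, cause, and interface address) can be sketched as a sequence of TLVs. The following Python example is illustrative only. The TLV type codes (1, 2, 3), the 2-byte type and length fields, and the numeric reason codes are taken from RFC 3623 rather than from this document, and the function names are hypothetical. The LSA header is not built.

```python
import ipaddress
import struct
from enum import IntEnum


class RestartReason(IntEnum):
    """Causes of GR, numbered as in RFC 3623."""
    UNKNOWN = 0
    SOFTWARE_RESTART = 1
    SOFTWARE_RELOAD_UPGRADE = 2
    SWITCH_TO_REDUNDANT_CONTROL_PROCESSOR = 3


MAX_GRACE_PERIOD = 1800  # seconds; the GR period cannot exceed this value


def _tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV and pad the value to a 4-byte boundary."""
    padding = b"\x00" * ((-len(value)) % 4)
    return struct.pack("!HH", tlv_type, len(value)) + value + padding


def grace_lsa_body(period: int, reason: int, interface_ip: str) -> bytes:
    """Build the TLV body of a grace LSA (header excluded)."""
    if not 0 < period <= MAX_GRACE_PERIOD:
        raise ValueError("GR period must be between 1 and 1800 seconds")
    reason = RestartReason(reason)  # raises ValueError for unknown codes
    return (
        _tlv(1, struct.pack("!I", period))                      # grace period
        + _tlv(2, bytes([int(reason)]))                          # restart reason
        + _tlv(3, ipaddress.IPv4Address(interface_ip).packed)    # interface address
    )
```

Each of the three TLVs occupies 8 bytes after padding, so the body is 24 bytes long. A period greater than 1800 seconds is rejected.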
Classification of OSPF GR
Classification based on GR status:
l Totally GR: indicates that if a neighbor of a router does not support GR, the router exits
from GR.
l Partly GR: indicates that if a neighbor does not support GR, only the interface associated
with this neighbor exits from GR, whereas the other interfaces perform GR normally.
Classification based on the GR implementation mode:
l Planned GR: indicates that a router restarts or performs the master/slave control board
switchover because of command execution. The restarter sends a grace LSA before the
restart or master/slave control board switchover.
l Unplanned GR: indicates that a router restarts or performs a master/slave control board
switchover because of faults. A router performs the master/slave control board
switchover, without sending a grace LSA, and then enters GR after the slave control
board goes Up.

GR Process
l A router starts GR.
In planned GR mode, after a master/slave control board switchover is triggered because
of command execution, the restarter sends a grace LSA to all neighbors to notify them of
the start, period, and cause of GR, and then performs the master/slave control board
switchover.
In unplanned GR mode, the restarter does not send a grace LSA before the switchover.
Instead, it sends a grace LSA immediately after the slave board goes Up, informing
neighbors of the start, period, and cause of GR. The restarter then sends the grace LSA
to each neighbor five times to ensure that neighbors receive it.
This implementation is vendor-specific and is not defined by the OSPF standard.
The restarter sends a grace LSA to notify neighbors that it enters GR. During GR,
neighbors retain neighbor relationships with the restarter so that other routers are not
aware of the switchover of the restarter.
l The router implements GR.
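Figure 8-59 OSPF GR process
(Figure: message exchange between ATN-A, the restarter, and ATN-B, the helper.)
1. Before the active/standby switchover, the restarter sends a Grace-LSA. The helper enters helper mode and returns an LSAck packet for the received LSA.
2. The restarter performs the switchover. After finishing the switchover, it enters GR and sends Grace-LSAs. The helper updates the GR period for the received Grace-LSAs.
3. The two routers send Hello packets, negotiate, exchange DD packets, and synchronize their LSDBs until the neighbor relationship reaches the Full state.
4. The restarter flushes the Grace-LSA, exits GR successfully, calculates routes, and generates LSAs. The helper exits helper mode successfully and generates a Router-LSA.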
l The router exits from GR.
Table 8-23 Reasons why a router exits GR
Execution of GR: GR succeeds.
Restarter: Before GR expires, the restarter re-establishes neighbor relationships with all the neighbors that it had before the master/slave control board switchover.
Helper: After the helper receives the grace LSA with the Age being 3600s from the restarter, their neighbor relationship enters the Full state.

Execution of GR: GR fails.
Restarter:
l GR expires, and neighbor relationships do not recover completely.
l The router LSA or network LSA sent by the helper causes the restarter to fail to perform bidirectional check.
l The status of the restarter's interface changes.
l The restarter receives the one-way Hello packet from the helper.
l The restarter receives the grace LSA that is generated by another router on the same network segment. Only one router can perform GR on the same network segment.
l On the same network segment, neighbors of the restarter have different DRs or BDRs because of topology changes.
Helper:
l The helper does not receive the grace LSA from the restarter before the neighbor relationship expires.
l The status of the helper's interface changes.
l The helper receives the LSA that is inconsistent with that in the local LSDB from another router. To prevent this problem, the helper can be configured not to perform strict LSA check.
l The helper receives grace LSAs from two routers on the same network segment at the same time.
l The neighbor relationships between the helper and other neighbors change.
Comparison Between GR Mode and Non-GR Mode
Table 8-24 Comparison of master/slave control board switchovers in the GR mode and non-
GR mode
Switchover in Non-GR Mode:
l OSPF neighbor relationships are re-established.
l Routes are recalculated.
l FIB entries change.
l The entire network detects route changes, and routes flap for a short period.
l Packets are lost during forwarding, and services are interrupted.
Switchover in GR Mode:
l OSPF neighbor relationships are re-established.
l Routes are recalculated.
l FIB entries remain unchanged.
l Except for neighbors of the device where master/slave control board switchover occurs, other routers do not detect route changes.
l No packets are lost, and services are not affected.
8.6.2.3 OSPF TE
OSPF Traffic Engineering (TE) is an OSPF extension that supports MPLS TE and is used to
establish and maintain TE Label Switched Paths (LSPs). In the MPLS TE architecture
described in "MPLS Feature Description", OSPF functions as the information advertising
component, responsible for collecting and advertising MPLS TE information.
In addition to the network topology, TE also needs to know network constraints, such as the
bandwidth, TE metric, administrative group, and affinity attribute. Current OSPF functions,
however, cannot meet these requirements. Therefore, OSPF is extended with a new type of
LSA to advertise network constraints. Based on the network constraints, the
Constraint-based Shortest Path First (CSPF) algorithm can calculate a path that satisfies
certain constraints.
Figure 8-60 Function of OSPF in the MPLS TE architecture
(Figure: the MPLS TE components: IGP route selection, the signaling component that sets up LSPs, information advertisement in which flooded LSAs populate the TEDB, and the packet forwarding component that handles incoming and outgoing packets.)
Function of OSPF in the MPLS TE Architecture

In the MPLS TE architecture, OSPF functions as the information advertising component:
l Collects related information about TE.

l Floods TE information between ATNs in the same area.
l Uses the collected TE information to form the TE database (TEDB) and provides it for
CSPF to calculate routes.
OSPF does not care what the specific information is or how MPLS uses the information.
TE LSA
OSPF uses a new type of LSAs, namely, Type10 opaque LSAs, to collect and advertise TE
information. This type of LSA contains the link status information required by TE, including
the maximum link bandwidth, maximum reservable bandwidth, current reserved bandwidth,
and link color. Type10 opaque LSAs synchronize link status information among ATNs in an
area through the OSPF flooding mechanism. In this manner, a uniform TEDB is formed for
route calculation.
Figure 8-61 shows the typical format of the TE LSA.

Figure 8-61 TE LSA format
0 15 23 31
LS age Options LS type = 10
Opq type = 1 Opaque ID
Advertising router
LS sequence number
LS checksum length = 132
TLV type = 1 TLV length = 4
Router address
TLV type = 2 TLV length = 100
Sub-TLV type = 1 Sub-TLV length = 1
Link type = 1 Padding
External route tag
Link ID
Sub-TLV type = 3 Sub-TLV length = 4N
Local IP address
Sub-TLV type = 4 Sub-TLV length = 4N
Remote IP address

TE metric
Maximum bandwidth
Maximum reservable bandwidth
Unreserved bandwidth-priority 0
...
Administrative group
The TE LSA uses the TLV format to carry the needed information. At present, two types of
TLVs are defined as follows:
l Router address TLV: uniquely identifies an MPLS node. In CSPF, this is known as the
router ID.

l Link TLV: carries the attributes of a link enabled with MPLS TE. Table 8-25 shows the
sub-TLVs that can be carried in the Link TLV.
Table 8-25 Sub-TLVs that can be carried in the Link TLV
Sub-TLV | Description
Type 1: Link Type (the Value field is 1 byte long) | Link type: 1 indicates a point-to-point link; 2 indicates a multi-access link. The Value field is followed by three bytes of padding.
Type 2: Link ID (the Value field is 4 bytes long) | Link ID, in the format of an IP address. For a point-to-point link, this field indicates the OSPF router ID of the neighbor. For a multi-access link, this field indicates the interface IP address of the designated router (DR).
Type 3: Local IP Address (the Value field is 4N bytes long) | IP address of the local interface. The field can carry the IP addresses of several local interfaces. Each IP address occupies 4 bytes.
Type 4: Remote IP Address (the Value field is 4N bytes long) | IP address of the remote interface. The field can carry the IP addresses of several remote interfaces. Each IP address occupies 4 bytes. For a point-to-point link, this field is set to the remote IP address. For a multi-access link, this field can be set to 0.0.0.0 or omitted.
Type 5: Traffic Engineering Metric (the Value field is 4 bytes long) | TE metric configured on a TE link. The data format is ULONG.
Type 6: Maximum Bandwidth (the Value field is 4 bytes long) | Maximum bandwidth of a link. The data format is a 4-byte floating-point number.
Type 7: Maximum Reservable Bandwidth (the Value field is 4 bytes long) | Maximum reservable bandwidth of a link. The data format is a 4-byte floating-point number.
Type 8: Unreserved Bandwidth (the Value field is 32 bytes long) | Reservable bandwidth at each of the eight priorities of a link. Each priority is a 4-byte floating-point number.
Type 9: Administrative Group (the Value field is 4 bytes long) | Administrative group attribute.
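The sub-TLV layout in Table 8-25 can be illustrated with a minimal parsing sketch. It assumes the generic TE TLV encoding (a 2-byte type, a 2-byte length, and a value padded to a 4-byte boundary), which matches the three padding bytes described for the Type 1 sub-TLV. The function name parse_sub_tlvs and the sample bytes are hypothetical and are not taken from the ATN software.

```python
import struct

def parse_sub_tlvs(data: bytes) -> list[tuple[int, bytes]]:
    """Split the Value field of a Link TLV into (sub-TLV type, value) pairs."""
    sub_tlvs = []
    offset = 0
    while offset + 4 <= len(data):
        # Assumed header: 2-byte type followed by 2-byte length, network byte order.
        sub_type, length = struct.unpack_from("!HH", data, offset)
        value = data[offset + 4 : offset + 4 + length]
        sub_tlvs.append((sub_type, value))
        # Skip the header and the value, rounded up to a 4-byte boundary.
        offset += 4 + ((length + 3) // 4) * 4
    return sub_tlvs

# Hypothetical sample: Link Type = 1 (point-to-point) and a TE metric of 10.
sample = (
    struct.pack("!HHB3x", 1, 1, 1)   # Type 1, length 1, value 1, 3 padding bytes
    + struct.pack("!HHI", 5, 4, 10)  # Type 5, length 4, TE metric 10
)
parsed = parse_sub_tlvs(sample)
```

Because the length field counts only the value and not the padding, the parser has to round up itself before moving to the next sub-TLV.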

Interaction between OSPF TE and CSPF

OSPF collects TE information in an area by using Type10 LSAs, including the bandwidth,
priority, and link metric. After processing the TE information, OSPF provides it for CSPF to
calculate routes.
IGP Shortcut and Forwarding Adjacency

OSPF supports IGP shortcut and forwarding adjacency. The two features allow OSPF to use a
tunnel interface as an outgoing interface to reach a destination.
Differences between IGP shortcut and forwarding adjacency are as follows:
l An ATN enabled with IGP shortcut uses a tunnel interface as an outgoing interface, but it
does not advertise the tunnel interface to neighbors. Therefore, other ATNs cannot use
this tunnel.
l An ATN enabled with forwarding adjacency uses a tunnel interface as an outgoing
interface, and advertises the tunnel interface to neighbors. Therefore, other ATNs can use
this tunnel.
l IGP shortcut is unidirectional and needs to be configured only on the ATN that uses IGP
shortcut.
l Forwarding adjacency is bidirectional and needs to be configured on the ATNs on both
ends of the tunnel.
8.6.2.4 OSPF VPN
Definition
As an extension of OSPF, OSPF VPN multi-instance enables Provider Edges (PEs) and
Customer Edges (CEs) in VPNs to run OSPF for interworking and use OSPF to learn and
advertise routes.
Purpose
OSPF is a widely used IGP and, in most cases, already runs within VPN sites. If OSPF also
runs between PEs and CEs, and PEs use OSPF to advertise VPN routes to CEs, no other routing
protocol needs to be configured on CEs for interworking with PEs, which simplifies
management and configuration of CEs.
Running OSPF Between PEs and CEs

In BGP/MPLS VPN, Multi-Protocol BGP (MP-BGP) is used to transmit routing information
between PEs, while OSPF is used to learn and advertise routes between PEs and CEs.
Running OSPF between PEs and CEs has the following benefits:
l OSPF is used in a site to learn routes. Running OSPF between PEs and CEs reduces the
number of the protocol types that CEs must support.
l Similarly, running OSPF both in a site and between PEs and CEs simplifies the work of
network administrators and reduces the number of protocols that network administrators
must be familiar with.
l When a network using OSPF but not VPN on the backbone network begins to use BGP/
MPLS VPN, running OSPF between PEs and CEs facilitates the transition.

In Figure 8-62, CE1, CE3, and CE4 belong to VPN 1, and the numbers following OSPF refer
to the process IDs of the multiple OSPF instances running on PEs.
Figure 8-62 Networking with OSPF running between PEs and CEs
(Topology diagram: CE1 in Site1 (VPN1) and CE2 in Site2 (VPN2) connect to PE1. CE3 in Site3
(VPN1) and CE4 in Site4 (VPN1) connect to PE2. PE1 and PE2 are connected across the MPLS VPN
backbone. PE1 runs OSPF 100 for VPN1 and OSPF 200 for VPN2. PE2 runs OSPF 100 and OSPF 200
for VPN1.)
The routes that PE1 receives from CE1 are advertised to CE3 and CE4 as follows:
1. PE1 imports OSPF routes of CE1 into BGP and converts them to BGP VPNv4 routes.
2. PE1 uses MP-BGP to advertise the BGP VPNv4 routes to PE2.
3. PE2 imports the BGP VPNv4 routes into OSPF and then advertises these routes to CE3
and CE4.
The process of advertising routes of CE4 or CE3 to CE1 is the same as the preceding process.
Configuring OSPF Areas Between PEs and CEs

OSPF areas between PEs and CEs can be non-backbone or backbone areas (area 0), and PEs
can function only as ABRs.
NOTE
By default, when OSPF is configured in VPN instances, the device is considered as an ASBR for each
area. For this reason, if two OSPF non-backbone areas (for example area 2 and area 4) are connected by
two parallel links, routes are imported between the two areas. Specifically, the routes in area 2 are
advertised to area 4, and the routes in area 4 are advertised to area 2. As a result, the two areas have the
same routing table. To prevent this problem, use one of the following methods:
l Filter LSAs or routes using a route-filter, route-policy, or any other filters available.
l Configure two VPN instances on each link, deploy import/export extended communities for
communication, and configure two OSPF processes.
In the extended application of OSPF VPN, the MPLS VPN backbone network is considered
Area 0. OSPF requires that Area 0 be contiguous. Therefore, Area 0 of all VPN sites must be
connected to the MPLS VPN backbone network. If a VPN site has Area 0, the PEs that CEs
access must be connected to the backbone area of this VPN site through Area 0. In this
scenario, a virtual link can also be deployed between the PEs and the backbone area. Figure
8-63 shows the networking for configuring OSPF areas between PEs and CEs.
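Figure 8-63 Configuring OSPF areas between PEs and CEs
(Topology diagram: PE1 and PE2 are connected across the VPN backbone, which is treated as
Area 0. CE1 in Site1 connects to PE1 through Area 1, and a virtual link joins PE1 to Area 0
in Site1. CE2 in Site2 connects to PE2 through Area 0.)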
In Figure 8-63, a non-backbone area (Area 1) is configured between PE1 and CE1, and a
backbone area (Area 0) is configured in Site 1. Then, the backbone area in Site 1 is separated
from the VPN backbone area. To ensure that the backbone areas are contiguous, a virtual link
is configured between PE1 and CE1.
OSPF Domain ID
If inter-area routes are advertised between local and remote OSPF areas, these areas are
considered to be in the same OSPF domain.
l Domain IDs identify domains.

l Each OSPF domain has one or multiple domain IDs. If more than one domain ID is
available, one of the domain IDs is a primary ID, and the others are secondary IDs.
l If an OSPF instance does not have a specific domain ID, its ID is considered as null.
Before advertising the remote routes sent by BGP to CEs, PEs need to determine the type of
OSPF route (Type 3 or Type 5) to be advertised to CEs based on domain IDs.
l If local domain IDs are the same as or compatible with remote domain IDs in BGP
routes, PEs advertise Type 3 routes.
l If local domain IDs are different from or incompatible with remote domain IDs in BGP
routes, PEs advertise Type 5 routes.
Table 8-26 Domain ID relationships and corresponding generated routes
Relationship Between Local and Remote Domain IDs | Same or Different | Type of the Generated Route
Both domain IDs are null. | Same | Inter-area route
The remote domain ID is the same as the local primary domain ID or one of the local secondary domain IDs. | Same | Inter-area route
The remote domain ID is different from the local primary domain ID or any of the local secondary domain IDs. | Different | If the local area is a non-NSSA area, external routes are generated. If the local area is an NSSA, NSSA routes are generated.
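The decision in Table 8-26 can be sketched as a small function. This is an illustration only: the function name generated_route_type and its parameters are hypothetical, and the handling of a null ID on only one side (treated here as "different") is an assumption, because the table does not cover that case.

```python
from typing import Optional

def generated_route_type(remote_id: Optional[str],
                         local_ids: list[str],
                         local_area_is_nssa: bool) -> str:
    """Pick the route type a PE generates for a route received through BGP.

    remote_id is None when the BGP route carries no domain ID.
    local_ids holds the primary and secondary domain IDs (empty means null).
    """
    both_null = remote_id is None and not local_ids
    matches_local = remote_id is not None and remote_id in local_ids
    if both_null or matches_local:
        return "inter-area (Type 3)"
    # Different domain IDs: the result depends on the type of the local area.
    return "NSSA (Type 7)" if local_area_is_nssa else "external (Type 5)"
```

For example, a remote domain ID that equals any secondary local ID still yields an inter-area route, so a PE with several configured IDs is compatible with each of them.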
Routing Loop Prevention

Routing loops may occur between PEs and CEs when OSPF and BGP learn routes from each
other.
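Figure 8-64 Networking for OSPF VPN routing loops
(Topology diagram: CE1 in vpn1 site1 is connected to both PE1 and PE2, which attach to the
VPN backbone. The route under discussion is 10.1.1.1/32.)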
In Figure 8-64, on PE1, OSPF imports a BGP route destined for 10.1.1.1/32 and then
generates and advertises a Type 5 or Type 7 LSA to CE1. Then, CE1 learns an OSPF route
with 10.1.1.1/32 as the destination address and PE1 as the next hop and advertises the route to
PE2. Therefore, PE2 learns an OSPF route with 10.1.1.1/32 as the destination address and
CE1 as the next hop.
Similarly, CE1 also learns an OSPF route with 10.1.1.1/32 as the destination address and PE2
as the next hop. PE1 learns an OSPF route with 10.1.1.1/32 as the destination address and
CE1 as the next hop.
As a result, CE1 has two equal-cost routes with PE1 and PE2 as next hops, respectively, and
the next hops of the routes from PE1 and PE2 to 10.1.1.1/32 are CE1, which leads to a routing
loop.
In addition, the priority of an OSPF route is higher than that of a BGP route. Therefore, on
PE1 and PE2, BGP routes to 10.1.1.1/32 are replaced with the OSPF route, and the OSPF
route with 10.1.1.1/32 as the destination address and CE1 as the next hop is active in the
routing tables of PE1 and PE2.
The BGP route is inactive, and therefore, the LSA generated when this route is imported by
OSPF is deleted, which causes the OSPF route to be withdrawn. As a result, no OSPF route
exists in the routing table, and the BGP route becomes active again. This cycle causes route
flapping.
OSPF VPN provides solutions to this problem, as described in Table 8-27.
Table 8-27 Routing loop prevention measures
Feature | Definition | Function
DN-bit | It is a flag bit used by OSPF multi-instance processes to prevent routing loops. | After OSPF multi-instance is configured on the ATN (a PE or an MCE), the ATN sets the DN-bit of generated Type 3, Type 5, or Type 7 LSAs to 1 and retains the DN-bit (0) of other LSAs. When calculating routes, the OSPF multi-instance process on the ATN ignores LSAs with DN-bit 1, which prevents the ATN from receiving the LSAs that are advertised by itself. In Figure 8-64, PE1 generates Type 3, Type 5, or Type 7 LSAs, sets their DN-bit to 1, and advertises these LSAs to CE1. After receiving these LSAs, CE1 forwards them to PE2. Upon reception of these LSAs, PE2 ignores them and does not forward them back to PE1, which prevents a routing loop.
VPN route tag | The VPN route tag is carried in the Type 5 or Type 7 LSA generated by PEs based on the received BGP private route. It is not carried in BGP extended community attributes. The VPN route tag is valid only on PEs that receive BGP routes and generate OSPF LSAs. | When a PE detects that the VPN route tag in the incoming LSA is the same as that in the local LSA, the PE ignores this LSA, which prevents routing loops.
Default route | It is a route whose destination IP address and mask are both 0s. | PEs do not calculate default routes. Default routes are used to forward traffic from CEs or sites where CEs reside to the VPN backbone network.
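The DN-bit rule in Table 8-27 can be sketched as follows. The Lsa class and both function names are hypothetical and model only the two behaviors the table describes for a PE: setting the DN-bit on LSAs it generates from BGP routes, and skipping LSAs whose DN-bit is 1 during route calculation.

```python
from dataclasses import dataclass

@dataclass
class Lsa:
    ls_type: int          # 3, 5, or 7 for the LSAs discussed in Table 8-27
    prefix: str
    dn_bit: bool = False  # other LSAs keep the DN-bit at 0

def originate_from_bgp(ls_type: int, prefix: str) -> Lsa:
    """A PE sets the DN-bit on LSAs it generates from imported BGP routes."""
    return Lsa(ls_type, prefix, dn_bit=True)

def usable_for_route_calculation(lsa: Lsa) -> bool:
    """A PE ignores LSAs whose DN-bit is 1 when it calculates routes."""
    return not lsa.dn_bit

# PE1 generates the LSA; PE2 later receives it through CE1 and skips it.
lsa_from_pe1 = originate_from_bgp(5, "10.1.1.1/32")
pe2_uses_it = usable_for_route_calculation(lsa_from_pe1)
```

Because PE2 never installs an OSPF route from this LSA, its BGP route to 10.1.1.1/32 stays active and the flapping cycle described earlier does not start.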
Sham Link
OSPF sham links are unnumbered P2P links between two PEs over an MPLS VPN backbone
network.
Generally, BGP extended community attributes carry routing information over the MPLS
VPN backbone between BGP peers. OSPF running on the remote PE can use this routing
information to generate inter-area routes that are advertised from PEs to CEs.
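Figure 8-65 OSPF sham link
(Topology diagram: PE1 and PE2 are connected by a sham link across the MPLS VPN backbone.
CE12 in site1 and CE22 in site3, both in VPN1, connect to the PEs through OSPF 200 in
Area 1 and are also connected to each other by a backdoor link.)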
In Figure 8-65, if an intra-area OSPF link (a backdoor link) exists between the network
segments of the local and remote CEs, routes that pass through this intra-area link have a
higher priority than the inter-area routes that pass through the MPLS VPN backbone network.
As a result, VPN traffic is always forwarded through the intra-area link instead of the
backbone network. To prevent this problem, an OSPF sham link can be established between
the PEs so that the routes that pass through the MPLS VPN backbone network also become
OSPF intra-area routes and can take precedence.
l A sham link is a link between two VPN instances. Each VPN instance contains the
address of an end-point of a sham link. The address is a loopback address with the 32-bit
mask in the VPN address space on the PE.

l After a sham link is established between two PEs, the PEs become neighbors on the
sham link and exchange routing information.
l A sham link functions as a P2P link within an area. Users can select a route from the
sham link and intra-area route link by adjusting the metric.
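The effect of a sham link on route selection can be shown with a short sketch. It relies on the standard OSPF preference of intra-area routes over inter-area routes regardless of cost. The dictionary layout, the best_route helper, and the cost values are hypothetical.

```python
# Lower rank wins; cost is compared only between routes of the same type.
PATH_TYPE_RANK = {"intra-area": 0, "inter-area": 1, "external": 2}

def best_route(routes: list[dict]) -> dict:
    """Pick the preferred route: by path type first, then by lowest cost."""
    return min(routes, key=lambda r: (PATH_TYPE_RANK[r["type"]], r["cost"]))

backdoor = {"via": "backdoor", "type": "intra-area", "cost": 100}

# Without a sham link, the backbone path is an inter-area route and loses
# even though its cost is lower.
without_sham = best_route([backdoor, {"via": "backbone", "type": "inter-area", "cost": 10}])

# With a sham link, the backbone path is also intra-area, so the metric decides.
with_sham = best_route([backdoor, {"via": "backbone", "type": "intra-area", "cost": 10}])
```

Once both paths are intra-area, raising or lowering the sham link metric is enough to choose between the backbone and the backdoor link.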
Multi-VPN-Instance CE
OSPF multi-instance generally runs on PEs. The routers that run OSPF multi-instance within
user LANs are called Multi-VPN-Instance CEs (MCEs).
Compared with OSPF multi-instance running on PEs, MCEs have the following
characteristics:
l Do not need to support OSPF-BGP association.
l Establish one OSPF instance for each service. Different virtual CEs transmit different
services, which ensures LAN security at a low cost.
l Implement OSPF multi-instance on a CE. The key to implementing MCEs is to disable
loop detection and calculate routes directly. MCEs also need to use the received LSAs
with the DN-bit set to 1 for route calculation.
8.6.2.5 OSPF NSSA
Definition
OSPF Not-So-Stubby Areas (NSSAs) are a new type of OSPF area.
Derived from stub areas, NSSAs resemble stub areas in many ways. The difference between
NSSAs and stub areas is that NSSAs can import and flood AS external routes to the entire
OSPF AS, without learning external routes in other areas of the OSPF network.
Purpose
As defined in OSPF, stub areas cannot import external routes. This prevents a large number of
external routes from consuming the bandwidth and storage resources of the ATNs in stub
areas. Stub areas therefore cannot meet the requirements of scenarios in which external
routes need to be imported while the resource consumption caused by external routes is
still avoided. NSSAs are introduced to address such scenarios.
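Figure 8-66 NSSA
(Topology diagram: ASBRs in Area 2 and in the NSSA (Area 1) import RIP routes. Both areas
connect to Area 0 through ABRs. External routes are carried as Type7 LSAs inside the NSSA
and as Type5 LSAs in the other areas.)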
Type7 LSA
l Type7 LSAs are a new type of LSA that was introduced to support NSSAs and describe
the imported external routes.

l Type7 LSAs are generated by the ASBRs of NSSAs and flooded only in the NSSAs
where ASBRs reside.
l When receiving Type7 LSAs, the ABRs of NSSAs selectively translate Type7 LSAs to
Type5 LSAs so that external routes can be advertised in other areas of the OSPF
network.
l Default routes can also be expressed through Type7 LSAs so that traffic can be
forwarded to other ASs.
N-bit
All ATNs in an area must be configured with the same area type. In OSPF, the N-bit is carried
in a Hello packet to indicate that an ATN supports NSSAs. OSPF neighbor relationships cannot
be established between ATNs with different area types.
Going against RFC 1587, some manufacturers also set the N-bit in OSPF Database
Description (DD) packets. Huawei devices can be configured to be compatible with the
devices of these manufacturers for interworking.
Translating Type7 LSAs to Type5 LSAs

To advertise the external routes imported by NSSAs in other areas, you need to translate
Type7 LSAs to Type5 LSAs, so the external routes can be advertised in the entire OSPF
network.
l The Propagate bit (P-bit) notifies an ATN whether Type7 LSAs need to be translated.
l The ABR with the largest Router ID in an NSSA translates Type7 LSAs to Type5 LSAs.
l Only the Type7 LSAs with the set P-bit and forwarding address other than 0 are
translated to Type5 LSAs.
NOTE
The forwarding address (FA) is the address to which packets destined for a specific
destination address are to be forwarded.
The loopback interface address in an area is preferentially selected as the FA. If no loopback
interface exists, the address of the interface that is Up and has the largest logical index in the area
is selected as the FA.
l Default Type7 LSAs that meet the preceding conditions can also be translated.
l Type7 LSAs generated by ABRs are not set with the P-bit.
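The translation rules above can be sketched in a few lines. The function names are hypothetical. Router IDs are compared numerically as 32-bit values, which is an assumption about how "largest" is evaluated, and the sketch covers only the P-bit and FA conditions listed here.

```python
import ipaddress

def elect_translator(abr_router_ids: list[str]) -> str:
    """Return the ABR with the largest router ID, compared as a 32-bit number."""
    return max(abr_router_ids, key=lambda rid: int(ipaddress.IPv4Address(rid)))

def should_translate(p_bit: bool, forwarding_address: str) -> bool:
    """Translate a Type7 LSA only if the P-bit is set and the FA is not 0."""
    return p_bit and forwarding_address != "0.0.0.0"

translator = elect_translator(["10.0.0.2", "9.255.255.255", "10.0.0.10"])
```

Comparing the IDs as numbers matters: a plain string comparison would rank "9.255.255.255" above "10.0.0.10".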
Preventing Loops Caused by Default Routes

There may be multiple ABRs in an NSSA. To prevent routing loops, ABRs do not calculate
the default routes advertised by the peer.
8.6.2.6 BFD for OSPF
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects connectivity of a data protocol on the same path between two
systems. The path can be a physical link, logical link, or tunnel.

In BFD for OSPF, a BFD session is associated with OSPF. The BFD session immediately
detects a link fault and then notifies OSPF of the fault. This speeds up the OSPF's response to
the change of the network topology.
Purpose
A link fault or topology change can cause ATNs to recalculate routes. The convergence of
routing protocols must be sped up to improve network performance.
Because link faults are unavoidable, a feasible solution is required to detect faults faster and
notify the faults to routing protocols immediately. If BFD is associated with routing protocols,
when a link fault occurs, BFD can speed up the convergence of routing protocols.
Table 8-28 BFD for OSPF
Associated with BFD or Not | Link Fault Detection Mechanism | Convergence Speed
Not associated | An OSPF Dead timer expires. By default, the timeout period is 40s. | Seconds level
Associated | A BFD session goes Down. | Milliseconds level
Principle
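Figure 8-67 BFD for OSPF
(Topology diagram, all devices in Area 0:
ATN A GE0/2/1 (3.3.3.1/24) connects to ATN B GE0/2/1 (3.3.3.2/24).
ATN A GE0/2/0 (1.1.1.1/24) connects to ATN C GE0/2/0 (1.1.1.2/24).
ATN C GE0/2/1 (2.2.2.1/24) connects to ATN B GE0/2/0 (2.2.2.2/24).
ATN B GE0/2/2 has the address 172.16.1.1/24.)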
The principle of BFD for OSPF is shown in Figure 8-67.

1. OSPF neighbor relationships are established between the three ATNs.
2. When a neighbor relationship becomes Full, this triggers BFD to establish a BFD
session.

3. The outbound interface on ATN A connected to ATN B is GE 0/2/1. If the link fails,
BFD detects the fault and notifies ATN A.
4. ATN A processes the neighbor Down event and recalculates routes. After the calculation,
the outbound interface is GE 0/2/0, and traffic passes through ATN C to reach ATN B.
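The recalculation in step 4 can be illustrated with a small shortest-path sketch on the three-node topology. The link costs (all 1) and the shortest_paths helper are assumptions made for the example. The sketch models only the route recalculation, not BFD itself.

```python
import heapq

def shortest_paths(graph: dict, source: str) -> dict:
    """Return {node: (cost, first_hop)} computed with Dijkstra's algorithm."""
    best = {source: (0, None)}
    queue = [(0, source, None)]
    while queue:
        cost, node, first_hop = heapq.heappop(queue)
        if cost > best[node][0]:
            continue  # stale queue entry
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            hop = neighbor if node == source else first_hop
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(queue, (new_cost, neighbor, hop))
    return best

# Assumed cost of 1 on every link between ATN A, ATN B, and ATN C.
graph = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "C": 1}, "C": {"A": 1, "B": 1}}
before = shortest_paths(graph, "A")["B"]

# BFD reports the A-B link Down: remove it and recalculate.
del graph["A"]["B"]
del graph["B"]["A"]
after = shortest_paths(graph, "A")["B"]
```

Before the fault, ATN A reaches ATN B directly. After it, the first hop becomes ATN C, which corresponds to the switch from GE 0/2/1 to GE 0/2/0.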
8.6.2.7 OSPF GTSM
Definition
Generalized TTL security mechanism (GTSM) is a mechanism that protects services over the
IP layer by checking whether the TTL value in an IP packet header is within a pre-defined
range.
Purpose
On networks, attackers may simulate OSPF packets and keep sending them to a device. After
receiving these packets, the device directly sends them to the control plane for processing
without checking their validity if the packets are destined for the device. As a result, the
control plane is busy processing these packets, resulting in high CPU usage.
GTSM is used to protect the TCP/IP-based control plane against CPU-utilization attacks, such
as CPU-overload attacks.
Principles
GTSM-enabled devices check the TTL value in each received packet based on a configured
policy. The packets that fail to match the policy are discarded or sent to the control plane,
which prevents the devices from possible CPU-utilization attacks. A GTSM policy involves
the following items:
l Source address of the IP packet sent to the local router

l VPN instance to which the packet belongs
l Protocol number of the IP packet (89 for OSPF; 6 for BGP)
l Source port number and destination port number of protocols that run over TCP or UDP
l Valid TTL range
GTSM is implemented as follows:
l For directly connected OSPF neighbors, the TTL value of the protocol packets to be sent
is set to 255.
l For multi-hop neighbors, a reasonable TTL range is defined.
GTSM is applicable as follows:
l GTSM takes effect on unicast packets rather than multicast packets. This is because the
TTL value of multicast packets can only be 255, and therefore GTSM is not needed to
protect against multicast packets.
l GTSM does not support tunnel-based neighbors.
l GTSM is applicable to Non-Broadcast Multi-Access (NBMA) networks, virtual links,
and sham links.
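The TTL check can be sketched as a single comparison. The sketch assumes that senders set the TTL to 255 and that the valid range for a neighbor at most max_hops hops away is [255 - max_hops + 1, 255]. The function name and the max_hops parameter are hypothetical.

```python
def gtsm_accept(ttl: int, max_hops: int = 1) -> bool:
    """Accept a packet only if its TTL shows it crossed at most max_hops hops."""
    lowest_valid_ttl = 255 - max_hops + 1
    return lowest_valid_ttl <= ttl <= 255

# Directly connected neighbor: only TTL 255 is valid.
direct_ok = gtsm_accept(255)
direct_spoofed = gtsm_accept(254)

# Multi-hop neighbor up to three hops away: TTL 253 to 255 is valid.
multi_hop_ok = gtsm_accept(253, max_hops=3)
```

A remote attacker cannot deliver a packet with a TTL inside this range, because every router on the path decrements the TTL.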

8.6.2.8 OSPF Smart-discover
Definition
Generally, routers periodically send Hello packets through OSPF interfaces. Specifically, a
router uses a Hello timer to control the interval at which Hello packets are sent and can send
Hello packets again only after the Hello timer expires. Therefore, sending Hello packets at a
fixed interval slows down the establishment of OSPF neighbor relationships.
Smart-discover speeds up the establishment of OSPF neighbor relationships in specific
scenarios.
Table 8-29 Processing differences with and without Smart-discover
With or Without Smart-discover | Processing
Without Smart-discover | Hello packets are sent only when the Hello timer expires. Neighbors keep waiting to receive Hello packets within the Hello interval.
With Smart-discover | Hello packets are sent directly regardless of whether the Hello timer expires. Neighbors can receive packets and change the state immediately.
Principles
In the following scenarios, Smart-discover-enabled interfaces can send Hello packets to
neighbors regardless of whether the Hello timer expires:
l The neighbor status becomes two-way for the first time.

l The neighbor status changes from two-way or a higher state to Init.
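The two trigger conditions can be written as a small predicate. The state list follows the usual OSPF neighbor state order. The function name and the reached_two_way_before flag are hypothetical.

```python
# OSPF neighbor states in ascending order.
STATES = ["Down", "Attempt", "Init", "2-Way", "ExStart", "Exchange", "Loading", "Full"]

def send_hello_now(old_state: str, new_state: str, reached_two_way_before: bool) -> bool:
    """Return True if a Hello packet is sent without waiting for the Hello timer."""
    first_two_way = new_state == "2-Way" and not reached_two_way_before
    dropped_to_init = (
        STATES.index(old_state) >= STATES.index("2-Way") and new_state == "Init"
    )
    return first_two_way or dropped_to_init
```

In all other transitions the interface keeps to the Hello timer, so the feature adds packets only at the moments when a quick reply shortens neighbor establishment.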
8.6.2.9 OSPF-BGP Synchronization
Definition
When a new router is deployed on the network or a router is restarted, network traffic may be
lost during BGP route convergence because IGP routes converge more quickly than BGP
routes.
OSPF-BGP synchronization can address this problem.
Purpose
If a backup link exists, during traffic switchback, BGP traffic may be lost because BGP routes
converge more slowly than OSPF routes do.

In Figure 8-68, ATN A, ATN B, ATN C, and ATN D run OSPF and are IBGP peers. ATN C is
the backup device of ATN B. When the network is stable, BGP and OSPF routes converge
completely on the devices.
In most cases, traffic from ATN A to 10.3.1.0/30 passes through ATN B. If ATN B fails,
traffic is switched to ATN C. After ATN B recovers, traffic is switched back to ATN B.
However, convergence of OSPF routes is complete while BGP route convergence is still
going on at this point. As a result, ATN B does not have the route to 10.3.1.0/30.
When packets from ATN A to 10.3.1.0/30 reach ATN B, ATN B discards them because ATN
B has no route to 10.3.1.0/30.
Figure 8-68 Networking for OSPF-BGP synchronization
(Topology diagram: ATN-A, ATN-B, ATN-C, and ATN-D are IBGP peers in AS 10. ATN-E and ATN-F
are also shown, together with AS 20 and an EBGP connection. The interface addresses shown
belong to the subnets 10.1.1.0/30, 10.1.2.0/30, 10.1.3.0/30, 10.1.4.0/30, 10.2.1.0/30, and
10.3.1.0/30.)
Principles
In Figure 8-68, OSPF-BGP synchronization is enabled on ATN B. In this situation, until
BGP route convergence on ATN B is complete, ATN A continues to forward traffic over the
backup link through ATN C instead of forwarding it to ATN B.
The router enabled with OSPF-BGP synchronization remains as a stub router within the set
synchronization period. During this period, the link cost in the LSA advertised by the router is
the maximum value (65535), instructing other OSPF devices not to use it as a transit router
for data forwarding.
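The stub router behavior can be sketched as follows. The function name and the interface names are hypothetical. The only value taken from the text is the maximum link cost of 65535.

```python
MAX_LINK_COST = 65535  # maximum value advertised during the synchronization period

def advertised_costs(configured_costs: dict[str, int], synchronizing: bool) -> dict[str, int]:
    """Return the link costs a router places in its LSA."""
    if synchronizing:
        # Stub router: every link is advertised at the maximum cost.
        return {link: MAX_LINK_COST for link in configured_costs}
    return dict(configured_costs)

links = {"GE0/2/0": 10, "GE0/2/1": 20}   # hypothetical interfaces and costs
during_sync = advertised_costs(links, synchronizing=True)
after_sync = advertised_costs(links, synchronizing=False)
```

Other routers still see the links, so directly attached destinations stay reachable, but any alternative path with a lower total cost is preferred for transit traffic.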
8.6.2.10 OSPF-LDP Synchronization
Background
In the networking with primary and backup links, if the primary link fails and then recovers,
traffic is switched from the backup link back to the primary link.

IGP route convergence completes before an LDP session is established. Consequently, the
original LSP is deleted before the new LSP is established. As a result, LSP traffic is
interrupted.
Purpose
In Figure 8-69, the primary link travels along the path PE1→P1→P2→P3→PE2, and the
backup link travels along the path PE1→P1→P4→P3→PE2.
If the primary link fails, traffic is switched to the backup link. After the primary link recovers,
traffic is switched back to the primary link. During this process, traffic is interrupted for a
long time.
Figure 8-69 OSPF-LDP synchronization
(Topology diagram: PE1 connects to P1. P1 reaches P3 through either P2 or P4. P3 connects
to PE2.)
Configuring OSPF-LDP synchronization on P1 and P2 shortens the traffic interruption during
a traffic switchover from the backup link to the primary link.
Table 8-30 OSPF-LDP synchronization
Whether OSPF-LDP Synchronization Is Enabled | Traffic Interruption Time
No | Seconds
Yes | Milliseconds
Principles
OSPF-LDP synchronization delays route switchback by suppressing the establishment of
OSPF neighbor relationships until LDP convergence is complete. Specifically, the backup link
continues to forward traffic until an LSP is established on the primary link, and then the
backup link is deleted.

OSPF-LDP synchronization uses three timers:
l Hold-down
l Hold-max-cost
l Delay
After the primary link recovers, a router performs the following operations:
1. Starts the hold-down timer. The OSPF interface does not establish OSPF neighbor
relationships but waits for the establishment of an LDP session until the timer expires.
2. Starts the hold-max-cost timer when the hold-down timer expires and advertises the
maximum link cost of the interface connected to the primary link through local LSAs.
3. Starts the Delay timer to wait for establishment of an LSP after an LDP session is
reestablished on the primary link.
4. When the Delay timer expires, LDP notifies OSPF that synchronization is complete,
regardless of the OSPF status.
8.6.2.11 OSPF Database Overflow
Definition
OSPF requires that routers in the same area have the same link state database (LSDB).
When a router fails to store additional routing information because of limited system
resources, OSPF database overflow occurs.
Purpose
You can configure stub areas or NSSAs to solve the problem of the continuous increase in
routing information that causes the exhaustion of system resources of routers. However,
configuring stub areas or NSSAs cannot prevent the database overflow caused by the increase
in dynamic routes. To reduce the LSDB size, set the maximum number of external LSAs in
the LSDB.
Principles
To prevent database overflow, you can set the maximum number of non-default external
routes on a router.
All routers on the OSPF network must be set with the same upper limit. If the number of
external routes on a router reaches the upper limit, the router enters the overflow state and
starts an overflow timer. The router automatically exits from the overflow state after the
timer expires. By default, the timer is set to 5 seconds.
Table 8-31 OSPF database overflow
Overflow Phase | OSPF Processing
Entering overflow state | The router deletes all non-default external routes that it generated.
Staying in overflow state | The router does not generate non-default external routes. The router discards newly received non-default external routes and does not reply with an LSAck packet. When the overflow timer expires, the router checks whether the number of external routes still exceeds the upper limit. If so, the router restarts the timer. If not, the router exits from overflow state.
Exiting from the overflow state | The router deletes the overflow timer. The router generates non-default routes. The router learns newly received non-default routes and replies with an LSAck packet. The router prepares to enter overflow state the next time it occurs.
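The phases in Table 8-31 can be modeled as a small state holder. The class and method names are hypothetical. The sketch follows the wording of the text literally: the router enters overflow when the count reaches the limit and stays there while the count still exceeds the limit at timer expiry.

```python
class OverflowState:
    """Tracks whether a router is in the OSPF database overflow state."""

    def __init__(self, limit: int):
        self.limit = limit          # maximum number of non-default external routes
        self.in_overflow = False

    def on_external_count(self, count: int) -> bool:
        """Enter overflow when the number of external routes reaches the limit."""
        if not self.in_overflow and count >= self.limit:
            self.in_overflow = True
        return self.in_overflow

    def on_timer_expiry(self, count: int) -> bool:
        """Stay in overflow (restart the timer) while the limit is still exceeded."""
        if self.in_overflow and count <= self.limit:
            self.in_overflow = False
        return self.in_overflow

state = OverflowState(limit=100)
entered = state.on_external_count(100)      # limit reached
still_in = state.on_timer_expiry(150)       # still above the limit
left = state.on_timer_expiry(80)            # back under the limit
```

A real implementation would also delete or regenerate the router's own external LSAs on each transition, which this sketch leaves out.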
8.6.2.12 OSPF Hot Standby

NOTE
Routers in the distributed structure support OSPF hot standby (HSB).
In HSB mode, OSPF backs up necessary information on the Active Main Board (AMB) to the
Standby Main Board (SMB). When the AMB fails, the SMB replaces it to ensure the normal
operation of OSPF.
OSPF supports the following HSB modes:

l Backing up all OSPF data: Once the switching between AMB and SMB happens, OSPF
can immediately restore its normal operation.
l Backing up only OSPF configuration: When the AMB/SMB switchover occurs, OSPF
performs graceful restart (GR), obtains adjacencies from its neighbors, and then
synchronizes the LSDBs.
NOTE
An ATN can function only as a GR Helper, not as a GR Restarter.
8.6.2.13 OSPF Fast Convergence

OSPF fast convergence speeds up the convergence of routes. It includes the following
components:
l Incremental SPF (I-SPF): recalculates only the routes of the changed nodes rather than
all the nodes when the network topology changes, which speeds up route calculation.
l Partial Route Calculation (PRC): calculates only the changed routes when the routes on
the network change.

l An OSPF intelligent timer: can dynamically adjust its value based on the user's
configuration and the interval at which an event is triggered, such as the route calculation
interval, which ensures rapid and stable network operation.
The OSPF intelligent timer uses the exponential backoff technology so that the value of
the timer can reach the millisecond level.
I-SPF
In RFC 2328, the Dijkstra algorithm is used to calculate routes. When a node changes
on the network, this algorithm recalculates all routes. The calculation takes a long
time and consumes too many CPU resources, which affects the convergence speed.
I-SPF improves the Dijkstra algorithm. Except for the first time, only changed nodes instead
of all nodes are involved in calculation. The SPT generated at last is the same as that
generated by the previous algorithm. I-SPF decreases CPU usage and speeds up network
convergence.
PRC
Similar to I-SPF, PRC calculates only the changed routes. However, PRC does not calculate
the shortest path. It updates routes based on the SPT calculated by I-SPF.
In route calculation, a leaf represents a route, and a node represents a router. Either an SPT or
a leaf change causes a route change. The SPT change is irrelevant to the leaf change. PRC
processes routing information as follows:
l If the SPT changes, PRC processes the routing information of all leaves on a changed
node.
l If the SPT remains unchanged, PRC does not process the routing information on any
node.
l If a leaf changes, PRC processes the routing information on the leaf only.
l If a leaf remains unchanged, PRC does not process the routing information on any leaf.
For example, if OSPF is enabled on an interface of a node, the SPT calculated by I-SPF
remains unchanged. PRC updates only the routes of this interface, consuming less CPU
resources.
PRC improves the SPF algorithm. Working with I-SPF, PRC further improves network
convergence performance.
NOTE
On live networks, only I-SPF and PRC are used to calculate OSPF routes.
OSPF Intelligent Timer

On an unstable network, routes are calculated frequently, which consumes a great number of
CPU resources. In addition, LSAs that describe the unstable topology are generated and
transmitted on the unstable network. Frequently processing such LSAs affects the rapid and
stable operation of the entire network.
To speed up route convergence on the entire network, the OSPF intelligent timer controls
route calculation, LSA generation, and LSA receiving.
The OSPF intelligent timer works as follows:

l On a network where routes are calculated repeatedly, the OSPF intelligent timer
dynamically adjusts the route calculation interval based on the user's configuration and
the exponential backoff technology. This reduces the number of route calculations and
the CPU resource consumption. Routes are calculated after the network topology
stabilizes.
l On an unstable network, if a router generates or receives LSAs due to frequent topology
changes, the OSPF intelligent timer can dynamically adjust the interval. No LSAs are
generated or processed within an interval, which prevents invalid LSAs from being
generated and advertised on the entire network.
The OSPF intelligent timer is started by default and uses the default value.
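Exponential backoff can be illustrated with a short calculation. The parameter names start_ms, hold_ms, and max_ms and the sample values are assumptions chosen for the example, not the defaults of the OSPF intelligent timer. The first wait uses the start value, and each later wait doubles from the hold value until it is capped at the maximum.

```python
def backoff_intervals(start_ms: int, hold_ms: int, max_ms: int, events: int) -> list[int]:
    """Return the wait applied before each of `events` consecutive calculations."""
    if events <= 0:
        return []
    intervals = [start_ms]
    wait = hold_ms
    for _ in range(events - 1):
        intervals.append(min(wait, max_ms))
        wait *= 2  # exponential backoff while the network keeps changing
    return intervals

# Hypothetical values: 50 ms start, 200 ms hold, 5000 ms maximum.
waits = backoff_intervals(50, 200, 5000, 7)
```

With these sample values the waits are 50, 200, 400, 800, 1600, 3200, and 5000 ms, so a single change is handled within milliseconds while a flapping link quickly pushes calculations apart.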
8.6.2.14 OSPF MIB
Definition
The management information base (MIB) is a database that stores information. The network
administrator can call MIB objects through the agent to control, configure, or monitor
network devices. For details, refer to chapter "SNMP".
As defined in RFC 4750, the OSPF MIB is used to set, modify, and view the running status of
OSPF on network devices.
Purpose
The network administrator can use the MIB to query information about the operation of
managed devices and to configure network devices through the set operation. The OSPF MIB
helps the administrator monitor and manage networks more rapidly and effectively.
The network administrator can perform the get and get-next operations (rather than the set
operation) on all OSPF MIB objects defined in RFC 4750.
To enhance and supplement MIBs defined in RFC 4750, private OSPF MIBs are supported,
and you can perform the set operation on the private MIBs.
Principles
After an OSPF process is bound to the MIB, the network administrator can perform the get
and get-next operations through the OSPF MIB to obtain information about OSPF link state
databases (LSDBs), areas, interfaces, and neighbors of the bound OSPF process.
OSPF supports the set operation on three private MIB tables: the process table, area table, and
network table.
l Using the set operation on the process table of a private MIB, you can create or delete an
OSPF process, and configure or delete parameters of the OSPF process.
l Using the set operation on the area table of a private MIB, you can create or delete an
OSPF area, and set or delete parameters of the OSPF area.
l Using the set operation on the network table of a private MIB, you can create or delete a
specific network segment for an OSPF area.
Using the set operation on the three tables, you can configure the basic OSPF functions and
set up a basic OSPF topology to conveniently manage and configure the network.

8.6.2.15 OSPF Mesh-Group
Definition
When multiple concurrent links exist, you can deploy OSPF mesh-group to add the links to a
mesh group. Then, OSPF floods LSAs only to a link selected from the mesh group, reducing
the pressure on the system.
The mesh-group feature is disabled by default.
Purpose
After receiving or generating an LSA, an OSPF process floods the LSA. When there are
multiple concurrent links, OSPF floods the LSA to each link and sends Update messages.
If there are 2000 concurrent links, OSPF floods each LSA 2000 times. Only one flooding,
however, is valid.
To prevent burden on the system caused by repetitive flooding, enable mesh-group to add
multiple concurrent links to a group so that the system floods LSAs only to a primary link
selected from the mesh group.
Principles
In Figure 8-70, ATN-A and ATN-B are OSPF neighbors and are connected through three
links. After receiving a new LSA from interface 4, ATN-A floods it to ATN-B through
interfaces 1, 2, and 3.
This flooding causes a heavy load on the concurrent links. For the neighbor with concurrent
links, only one link is needed to flood the LSA.
Figure 8-70 LSA flooding with OSPF mesh-group disabled
(Figure: ATN-A receives an LSA on interface 4 and floods it to ATN-B through interfaces 1, 2, and 3.)
When multiple concurrent links exist between a device enabled with OSPF mesh-group and
its neighbor, the device selects a primary link to flood received LSAs, as shown in Figure
8-71.
Figure 8-71 LSA flooding with OSPF mesh-group enabled
(Figure: ATN-A receives an LSA on interface 4; interfaces 1, 2, and 3 connect ATN-A to ATN-B.)

As defined in OSPF, a device floods LSAs to a link only when the neighbor status is
Exchange or higher. When the neighbor status on the primary link is lower than
Exchange, the device reselects a primary link from the concurrent links and then floods the
LSA. After receiving the LSA flooded by ATN-A from link 1, ATN-B does not flood the LSA
back through interfaces 2 and 3.
As defined by the mesh-group feature, the router ID of a neighbor uniquely identifies the
mesh group. Interfaces connected to the same neighbor with the status higher than Exchange
belong to the same mesh group.
In Figure 8-72, a mesh group of ATN-A resides in Area 0, which contains the links of
interface 1 and interface 2. Interface 3 resides on the broadcast link and has more than one
neighbor. Therefore, interface 3 cannot be added to the mesh group.
Figure 8-72 Interface not added to the mesh group
(Figure: ATN-A and ATN-B in Area 0; interface 3 of ATN-A is on a broadcast link.)
NOTE
If mesh-group is enabled on a router and the router IDs of the router and its directly connected neighbor
are the same, LSDBs cannot be synchronized, and routes cannot be calculated correctly. In this case, you
need to reconfigure the router ID of the neighbor.
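The mesh-group rules in this section can be sketched as follows. This is an illustrative model only: the data layout, the function name, and the rule of picking the lowest interface name as the primary link are assumptions, because the text does not state how the primary link is selected. The sketch treats a neighbor in the Exchange state or higher as eligible for flooding.

```python
from collections import defaultdict
from enum import IntEnum


class NbrState(IntEnum):
    # Neighbor states in ascending order.
    DOWN = 0
    ATTEMPT = 1
    INIT = 2
    TWO_WAY = 3
    EXSTART = 4
    EXCHANGE = 5
    LOADING = 6
    FULL = 7


def flooding_interfaces(interfaces):
    """Pick the interfaces on which an LSA is flooded.

    interfaces maps an interface name to a list of (router_id, state)
    neighbors. Interfaces with exactly one eligible neighbor are grouped by
    that neighbor's router ID; one primary link per group floods the LSA.
    Interfaces with several neighbors are not grouped and always flood.
    """
    groups = defaultdict(list)
    ungrouped = []
    for name, neighbors in interfaces.items():
        ready = [rid for rid, state in neighbors if state >= NbrState.EXCHANGE]
        if len(neighbors) == 1 and ready:
            groups[ready[0]].append(name)
        elif ready:
            ungrouped.append(name)
    # Hypothetical selection rule: lowest interface name becomes primary.
    primaries = [min(names) for names in groups.values()]
    return sorted(primaries + ungrouped)


links = {
    "if1": [("2.2.2.2", NbrState.FULL)],
    "if2": [("2.2.2.2", NbrState.FULL)],
    "if3": [("2.2.2.2", NbrState.FULL), ("3.3.3.3", NbrState.FULL)],
}
print(flooding_interfaces(links))
```

If the neighbor on if1 drops below Exchange, the same function returns if2 as the new primary link, which mirrors the reselection described above.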
8.6.2.16 Priority-based OSPF Convergence

Priority-based OSPF convergence ensures that specific routes converge first when a large
number of routes need to converge. Different routes can be set with different convergence
priorities.
A higher convergence priority can be configured for routes over which key services are
transmitted to ensure these routes can converge first, in order to minimize impact on these key
services.
8.6.2.17 OSPF IP FRR

With OSPF IP fast reroute (FRR), a device pre-computes alternate next hops and stores them
in the IP routing table. If a primary link fails, the device switches the traffic to a backup link
within 50 ms.

NOTE
The nexthop and outbound interface of an OSPF loop-free backup link can be obtained using either of
the following methods:
l For a static backup link, after IP FRR is enabled, configure a nexthop and an outbound interface for
the static backup link.
l For a dynamic backup link, after OSPF IP FRR is enabled, use the LFA algorithm to calculate the
nexthop and outbound interface for the dynamic backup link.
This section describes how to obtain the nexthop and outbound interface for the dynamic backup link.
Background
As networks develop, voice over IP (VoIP) and online video services pose higher
requirements for real-time transmission. Nevertheless, if a primary link fails, OSPF-enabled
devices need to perform multiple operations, including detecting the fault, updating the
link-state advertisement (LSA), flooding the LSA, calculating routes, and delivering
forwarding information base (FIB) entries before switching traffic to a new link. This process
takes much longer than 50 ms, the minimum delay to which users are sensitive. As a result, the
requirements for real-time transmission cannot be met. OSPF IP FRR can solve this problem.
OSPF IP FRR conforms to dynamic IP FRR defined by RFC 5286. With OSPF IP FRR,
devices can switch traffic from a faulty primary link to a backup link within 50 ms, protecting
against a link or node failure.
Major Auto FRR techniques include loop-free alternate (LFA), U-turn, Not-Via, Remote LFA,
and MRT, among which OSPF supports only LFA and Remote LFA.
Related Concepts
OSPF IP FRR
OSPF IP FRR refers to a mechanism in which a device uses the loop-free alternate (LFA)
algorithm to compute the next hop of a backup link and stores the next hop together with the
primary link in the forwarding table. If the primary link fails, the device switches the traffic to
the backup link before routes are converged on the control plane. This mechanism keeps the
traffic interruption duration within 50 ms and minimizes the impacts.
OSPF IP FRR policy
An OSPF IP FRR policy is used to filter alternate next hops. Only the alternate next hops that
match the filtering rules of the policy can be added to the IP routing table. Users can
configure a desired OSPF IP FRR policy to filter alternate next hops.
LFA algorithm
A device uses the shortest path first (SPF) algorithm to calculate the shortest path from each
neighbor that can provide a backup link to the destination node. The device then uses the
inequalities defined in RFC 5286 to select, from the available loop-free backup links, the
next hop of the backup link that has the smallest cost.
Remote LFA
LFA Auto FRR cannot always calculate a backup link on large-scale networks, especially
on ring networks. Remote LFA Auto FRR addresses this problem by calculating a PQ node
and establishing a tunnel between the source node of a primary link and the PQ node. If the
primary link fails, traffic can be automatically switched to the tunnel, which improves
network reliability.

P space
P space consists of the nodes that the source node of a primary link can reach along its
shortest path tree (SPT) without passing through the primary link.
Extended P space
Extended P space consists of the nodes that the neighbors of a primary link's source node
can reach along their SPTs without passing through the primary link.
Q space
Q space consists of the nodes that are reachable along the SPT rooted at the destination
node of a primary link without passing through the primary link.
PQ node
A PQ node exists in both the extended P space and the Q space and is used by Remote LFA as the
destination of a protection tunnel.
OSPF LFA FRR

OSPF IP FRR protects against either a link failure or a node-and-link failure.
Node-and-link protection takes precedence over link protection.
Link protection
NOTE
In the following description, Distance_opt(X, Y) indicates the cost of the shortest path from X to Y.
S stands for a source node, N for a node along a backup link, and D for a destination node.
Link protection takes effect when the traffic to be protected flows along a specified link and
the link costs meet the inequality: Distance_opt(N, D) < Distance_opt(N, S) +
Distance_opt(S, D).
As shown in Figure 8-73, traffic flows from Router S to Router D. The primary link is Router
S->Router E->Router D, and the backup link is Router S->Router N->Router E->Router D.
The preceding inequality is met. With OSPF IP FRR, Router S switches the traffic to the
backup link if the primary link fails, keeping the traffic interruption duration within 50 ms.
Figure 8-73 OSPF IP FRR link protection
Node-and-link protection
NOTE
In the following description, Distance_opt(X, Y) indicates the cost of the shortest path from X to Y.
S stands for a source node, E for the faulty node, N for a node along a backup link, and D for a
destination node.

Node-and-link protection takes effect when the traffic to be protected flows along a specified
link and node and the following conditions are met:
l The link costs meet the inequality: Distance_opt(N, D) < Distance_opt(N, S) +
Distance_opt(S, D).
l The interface costs meet the inequality: Distance_opt(N, D) < Distance_opt(N, E) +
Distance_opt(E, D).
As shown in Figure 8-74, traffic flows from Router S to Router D. The primary link is Router
S->Router E->Router D, and the backup link is Router S->Router N->Router D. The
preceding inequalities are met. With OSPF IP FRR, Router S switches the traffic to the
backup link if the primary link fails, keeping the traffic interruption duration within 50 ms.
Figure 8-74 OSPF IP FRR node-and-link protection
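The two inequalities above can be checked with a short sketch. The topologies, the link costs (10 on every link), and the function names are assumptions chosen to resemble the link-protection and node-and-link-protection examples; Distance_opt is computed with a plain Dijkstra search over symmetric link costs.

```python
import heapq


def shortest_costs(graph, root):
    """Dijkstra: cost of the shortest path from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def build_graph(links):
    """Build an undirected graph from (node_a, node_b, cost) tuples."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def lfa_protection(graph, s, e, n, d):
    """Return (link_protecting, node_protecting) for neighbor n of s."""
    dist_n = shortest_costs(graph, n)
    dist_s = shortest_costs(graph, s)
    dist_e = shortest_costs(graph, e)
    # Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D)
    link = dist_n[d] < dist_n[s] + dist_s[d]
    # Distance_opt(N, D) < Distance_opt(N, E) + Distance_opt(E, D)
    node = link and dist_n[d] < dist_n[e] + dist_e[d]
    return link, node


# Backup path S->N->E->D: protects the link S-E but not node E.
via_e = build_graph([("S", "E", 10), ("E", "D", 10), ("S", "N", 10), ("N", "E", 10)])
# Backup path S->N->D: protects both the link S-E and node E.
direct = build_graph([("S", "E", 10), ("E", "D", 10), ("S", "N", 10), ("N", "D", 10)])

print(lfa_protection(via_e, "S", "E", "N", "D"))
print(lfa_protection(direct, "S", "E", "N", "D"))
```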
OSPF Remote LFA Auto FRR

Similar to LFA Auto FRR, Remote LFA protects against both link and node-and-link failures.
The following example shows how Remote LFA works to protect against link failures:
In Figure 8-75, traffic flows through PE1 -> P1 -> P2 -> PE2, and the primary link is between
P1 and P2. Remote LFA calculates a PQ node (P4) and establishes a Label Distribution
Protocol (LDP) tunnel between P1 and P4. If P1 detects a failure on the primary link, P1
encapsulates packets into MPLS packets and forwards MPLS packets to P4. After receiving
the packets, P4 removes the MPLS label from them and searches the IP routing table for a
next hop to forward the packets to PE2. Remote LFA ensures uninterrupted traffic forwarding.
Figure 8-75 Networking for Remote LFA
On the network shown in Figure 8-75, Remote LFA calculates the PQ node as follows:
1. Calculates the SPTs with all neighbors of P1 as roots. The nodes that these SPTs
reach without passing through the primary link form the extended P space. The
extended P space in this example is {PE1, P1, P3, P4}.

2. Calculates the SPTs with P2 as the root and obtains the Q space {PE2, P4}.
3. Selects the PQ node (P4) that exists both in the extended P space and Q space.
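The three steps above can be reproduced with a minimal sketch. The topology is an assumption inferred from the paths named in the text (PE1-P1-P2-PE2 plus the detour P1-P3-P4-P2), and every link is given a cost of 1; the real figure may use different costs. A node is excluded from a space when any of its equal-cost shortest paths would cross the primary link, and the destination node itself is left out of the Q space to match the sets listed above.

```python
import heapq


def shortest_costs(graph, root):
    """Dijkstra: cost of the shortest path from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def pq_nodes(graph, source, dest):
    """Return (extended P space, Q space, PQ nodes) for the link source-dest.

    Assumes symmetric link costs.
    """
    link_cost = graph[source][dest]
    from_source = shortest_costs(graph, source)
    from_dest = shortest_costs(graph, dest)

    extended_p = set()
    for nbr in graph[source]:
        if nbr == dest:
            continue
        from_nbr = shortest_costs(graph, nbr)
        for node, cost in from_nbr.items():
            # Keep the node only if no shortest path crosses the primary link.
            if cost < from_nbr[source] + link_cost + from_dest[node]:
                extended_p.add(node)

    q_space = {
        node
        for node, cost in from_dest.items()
        if node != dest and cost < from_source[node] + link_cost
    }
    return extended_p, q_space, extended_p & q_space


LINKS = [("PE1", "P1"), ("P1", "P2"), ("P2", "PE2"),
         ("P1", "P3"), ("P3", "P4"), ("P4", "P2")]
GRAPH = {}
for a, b in LINKS:
    GRAPH.setdefault(a, {})[b] = 1
    GRAPH.setdefault(b, {})[a] = 1

ext_p, q, pq = pq_nodes(GRAPH, "P1", "P2")
print(sorted(ext_p), sorted(q), sorted(pq))
```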
OSPF anti-microloop
In Figure 8-75, OSPF remote LFA FRR is enabled, the primary link is PE1 -> P1 -> P2 ->
PE2, and the backup link is PE1 -> P1 -> P3 -> P4 -> P2 -> PE2. If the primary link fails,
traffic is switched to the backup link. Specifically, after P1 completes route convergence, its
next hop becomes P3. However, the route convergence on P3 is slower than that on P1, and
P3's next hop is still P1. As a result, a temporary loop occurs between P1 and P3. OSPF anti-
microloop can address this problem by delaying P1 from switching traffic to P3 until the route
convergence on P3 completes.
NOTE
OSPF anti-microloop applies only to OSPF remote LFA FRR.
Derivative Functions
If a Bidirectional Forwarding Detection (BFD) session is bound to OSPF IP FRR, the BFD
session goes Down when BFD detects a link fault. OSPF IP FRR is then triggered on the
interface to switch traffic from the faulty link to the backup link, which
minimizes the loss of traffic.
8.6.2.18 OSPF Authentication

OSPF authentication adds an authentication field to OSPF packets to
ensure network security. When a local device receives OSPF packets from a remote device,
the local device discards the packets if the authentication passwords do not match.
This protects the local device.
According to the types of packets, the authentication is classified into the following:
l Area authentication
This authentication is configured in the OSPF area view and applies to the packets
received by all the interfaces in the OSPF area.
l Interface authentication
This authentication is configured in the interface view and applies to all the packets
received by the interface.
According to the authentication modes of packets, the authentication is classified into the
following:
l Non-authentication
Authentication is not required.
l Simple authentication
The authenticated party adds the configured password to packets in plaintext for
authentication. This poses a security risk.
l MD5 authentication
The authenticated party encrypts the configured password using a Message Digest 5
(MD5) algorithm and adds the ciphertext password to packets for authentication. This
authentication mode improves password security. The supported MD5 algorithms
include MD5 and HMAC-MD5.

l Keychain authentication
A keychain consists of multiple authentication keys, each of which contains an ID and a
password. Each key has a lifetime, and the keychain dynamically selects the
authentication key that is valid at the current time, which enhances attack defense.
Keychain provides authentication protection for OSPF by dynamically changing
algorithms and keys to improve the security of OSPF.
l HMAC-SHA256 authentication
The HMAC-SHA256 algorithm is used to encrypt a password before the password is added to
the packet, which improves password security.
OSPF carries authentication types in packet headers and authentication information in packet
tails.
The authentication types include:
l 0: Non-authentication
l 1: Simple authentication
l 2: Ciphertext authentication
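The idea behind ciphertext authentication, in which authentication information is appended to the packet and checked by the receiver, can be sketched as follows. This is a generic HMAC-SHA256 illustration and not the exact OSPF procedure, which also involves fields such as a key ID and a sequence number; the function names and the sample key are hypothetical.

```python
import hashlib
import hmac

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign(packet: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 digest of the packet as a trailer."""
    return packet + hmac.new(key, packet, hashlib.sha256).digest()


def verify(signed: bytes, key: bytes) -> bool:
    """Recompute the digest and compare it with the trailer."""
    if len(signed) < DIGEST_LEN:
        return False
    packet, digest = signed[:-DIGEST_LEN], signed[-DIGEST_LEN:]
    expected = hmac.new(key, packet, hashlib.sha256).digest()
    return hmac.compare_digest(digest, expected)


signed = sign(b"hello-packet", b"shared-key")
print(verify(signed, b"shared-key"))
print(verify(signed, b"wrong-key"))
```

A receiver configured with a different key computes a different digest and discards the packet, which is why the authentication settings must match on both ends.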
Figure 8-76 Networking for OSPF authentication on a broadcast network
(Figure: ATNA, ATNB, ATNC, ATND, and ATNE on a broadcast network.)
The configuration requirements are as follows:
l OSPF neighbor relationships can be set up between multiple devices on the same
network only when interface authentication is configured in the same manner on all the
devices.
l When multiple devices are in the same area, you must configure area authentication in
the same manner on all the devices.
8.6.3 OSPF Applications
8.6.3.1 OSPF GR
On the network shown in Figure 8-77, ATN-A, CX-B, CX-C, and CX-D run OSPF for
interworking, and GR is enabled on ATN-A and CX-B. When ATN-A restarts, CX-B helps

ATN-A to perform GR, without notifying other neighbors of ATN-A's restart. This ensures
uninterrupted network traffic forwarding.
Figure 8-77 OSPF GR
(Figure: ATN-A, the Restarter, connects NodeB to CX-B, the Helper. ATN-A and CX-B set up a
neighbor relationship and negotiate GR. When ATN-A restarts, CX-B does not notify CX-C or
CX-D that ATN-A restarts.)
Terms
Term | Description
CE | A customer edge (CE) is an edge router on the user network and is connected to the PE. CEs cannot detect the connected VPN.
OSPF | Open Shortest Path First (OSPF) is a link-state Interior Gateway Protocol (IGP) developed by the Internet Engineering Task Force (IETF). OSPF version 2 (OSPFv2), which is defined in RFC 2328, is intended for IPv4. OSPF version 3 (OSPFv3), which is defined in RFC 2740, is intended for IPv6.
OSPF IP FRR | With OSPF IP fast reroute (FRR), a device pre-computes alternate next hops and stores them in the IP routing table. If a primary link fails, the device switches the traffic to a backup link within 50 ms.
OSPF TE | OSPF Traffic Engineering (TE) is an OSPF extension that supports MPLS TE and is used to establish and maintain TE label switched paths (LSPs). In the MPLS TE architecture, OSPF functions as the information advertising component, responsible for collecting and advertising MPLS TE information.
PE | A provider edge (PE) is an edge router on an SP network and is connected to the CE. PEs process all VPN services.
Sham links | Sham links are unnumbered P2P links between two PEs over an MPLS VPN backbone network.
Virtual link | A virtual link is a logical channel established between two ABRs over a non-backbone area.

Acronym and Abbreviation | Full name
ABR | Area Border Router
ASBR | AS Boundary Router
BDR | Backup Designated Router
8.7 OSPFv3
8.7.1 Introduction
Definition
Open Shortest Path First (OSPF), developed by the Internet Engineering Task Force (IETF), is
a link-state Interior Gateway Protocol (IGP).
At present, OSPF Version 2 (OSPFv2) is used for IPv4, and OSPF Version 3 (OSPFv3) is
used for IPv6.
l As defined in RFC 2740, OSPFv3 is a routing protocol over IPv6.

l OSPFv3 is an independent routing protocol whose functions are enhanced on the basis of
OSPFv2.
Purpose
The primary purpose of OSPFv3 is to develop a routing protocol independent of any specific
network layer. The internal routing information of OSPFv3 is redesigned to serve this
purpose.
The differences between OSPFv3 and OSPFv2 are as follows:
l OSPFv3 does not insert IP-based data in the header of each packet and Link State
Advertisement (LSA).
l OSPFv3 uses information that is independent of any network protocol to perform some
crucial tasks that originally required the data in the IP packet header. For example,
OSPFv3 can identify the LSA that advertises the routing data.
8.7.2 Principles

8.7.2.1 Principle of OSPFv3

Running on IPv6, OSPFv3 (defined in RFC 2740) is an independent routing protocol whose
functions are enhanced on the basis of OSPFv2.
l OSPFv3 and OSPFv2 are the same in respect of the working principles of the Hello
message, state machine, link-state database (LSDB), flooding, and route calculation.
l OSPFv3 divides an Autonomous System (AS) into one or more logical areas and
advertises routes through LSAs.
l OSPFv3 keeps routing information consistent by exchanging OSPFv3 packets between
routers within an OSPFv3 area.
l OSPFv3 packets are encapsulated into IPv6 packets, which can be transmitted in unicast
or multicast mode.
Formats of OSPFv3 Packets

Packet Type | Function
Hello message | Hello messages are sent regularly to discover and maintain OSPFv3 neighbor relationships.
Database Description (DD) packet | A DD packet contains the summary of the local LSDB. It is exchanged between two OSPFv3 routers to update the LSDBs.
Link State Request (LSR) packet | LSR packets are sent to the neighbor to request the required LSAs. An OSPFv3 router sends LSR packets to its neighbor only after they exchange DD packets.
Link State Update (LSU) packet | The LSU packet is used to transmit required LSAs to the neighbor.
Link State Acknowledgment (LSAck) packet | The LSAck packet is used to acknowledge the received LSAs.
LSA Type
LSA Type | Description
Router-LSA (Type1) | Generated by a router for each area to which an OSPFv3 interface belongs, the router LSA describes the status and costs of links of the router and is advertised in the area where the OSPFv3 interface belongs.
Network-LSA (Type2) | Generated by a designated router (DR), the network LSA describes the link status and is flooded in the area that the DR belongs to.
Inter-Area-Prefix-LSA (Type3) | Generated on the area border router (ABR), an inter-area prefix LSA describes the route of a certain network segment within the local area and is used to inform other areas of the route.
Inter-Area-Router-LSA (Type4) | Generated on the ABR, an inter-area router LSA describes the route to the autonomous system boundary router (ASBR) and is advertised to all related areas except the area that the ASBR belongs to.
AS-external-LSA (Type5) | Generated on the ASBR, the AS-external LSA describes the route to a destination outside the AS and is advertised to all areas except stub areas and NSSAs.
NSSA-LSA (Type7) | Describes routes to a destination outside the AS. It is generated by an ASBR and advertised in NSSAs only.
Link-LSA (Type8) | Each router generates a link LSA for each link. A link LSA describes the link-local address and IPv6 address prefix associated with the link and the link option set in the network LSA. It is transmitted only on the link.
Intra-Area-Prefix-LSA (Type9) | Each router or DR generates one or more intra-area prefix LSAs and transmits them in the local area. An LSA generated on a router describes the IPv6 address prefix associated with the router LSA. An LSA generated on a DR describes the IPv6 address prefix associated with the network LSA.
Router Type
Figure 8-78 Router type
(Figure: Area 0 through Area 4 with an internal router, a backbone router, an ABR, and an ASBR that connects to an IS-IS network.)

Table 8-32 Router types and descriptions

Router Type | Description
Internal router | All interfaces on an internal router belong to the same OSPFv3 area.
Area border router (ABR) | An ABR can belong to two or more areas, but one of the areas must be a backbone area. An ABR is used to connect the backbone area and the non-backbone areas. It can be physically or logically connected to the backbone area.
Backbone router | At least one interface on a backbone router belongs to the backbone area. All ABRs and internal routers in Area 0, therefore, are backbone routers.
AS boundary router (ASBR) | A router that exchanges routing information with other ASs is called an ASBR. An ASBR is not necessarily located on the boundary of an AS. It can be an internal router or an ABR.
OSPFv3 Route Type

Inter-area routes and intra-area routes describe the network structure of an AS. External routes
describe how to select a route to the destination outside an AS. OSPFv3 classifies the
imported AS external routes into Type 1 routes and Type 2 routes.
Table 8-33 lists route types in a descending order of priority.
Table 8-33 Types of OSPFv3 routes

Route Type | Description
Intra Area | Intra-area routes
Inter Area | Inter-area routes
Type1 external routes | Type1 external routes are highly reliable, so their calculated cost is considered comparable to that of AS internal routes and can be compared with the cost of OSPFv3 routes. That is, the cost of a Type1 external route equals the cost of the route from the router to the corresponding ASBR plus the cost of the route from the ASBR to the destination address.
Type2 external routes | Type2 external routes are less reliable, so the cost of the route from the ASBR to a destination outside the AS is considered far greater than the cost of any internal path to an ASBR. Therefore, OSPFv3 only takes the cost of the route from the ASBR to a destination outside the AS into account when calculating route costs. That is, the cost of a Type2 external route equals the cost of the route from the ASBR to the destination of the route.
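The cost rules and the priority order in Table 8-33 can be expressed as a minimal sketch. The function names, the string labels, and the tie-breaking on cost within one route type are illustrative assumptions.

```python
# Route types in descending order of priority, as listed in Table 8-33.
ROUTE_PREFERENCE = ["intra-area", "inter-area", "type1-external", "type2-external"]


def external_route_cost(route_type, cost_to_asbr, asbr_to_destination):
    """Cost of an external route as seen by the calculating router."""
    if route_type == "type1-external":
        # Internal cost to the ASBR plus the ASBR's cost to the destination.
        return cost_to_asbr + asbr_to_destination
    if route_type == "type2-external":
        # Only the ASBR's cost to the destination is counted.
        return asbr_to_destination
    raise ValueError(f"not an external route type: {route_type}")


def best_route(candidates):
    """Pick the preferred (route_type, cost) pair: type first, then cost."""
    return min(candidates, key=lambda c: (ROUTE_PREFERENCE.index(c[0]), c[1]))


print(external_route_cost("type1-external", 10, 20))
print(external_route_cost("type2-external", 10, 20))
print(best_route([("type2-external", 1), ("inter-area", 100)]))
```

The last call shows that route type is compared before cost: an inter-area route is preferred over a Type2 external route even when its cost is higher.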
Area
When a large number of ATNs run OSPFv3, link state databases (LSDBs) become very large
and require a large amount of storage space. Large LSDBs also complicate shortest path first
(SPF) computation and are computationally intensive for the ATNs. Network expansion
causes the network topology to change, which results in route flapping and frequent OSPFv3
packet transmission. When a large number of OSPFv3 packets are transmitted on the network,
bandwidth usage efficiency decreases. Each change in the network topology causes all ATNs
on the network to recalculate routes.
OSPFv3 resolves this problem by partitioning an AS into different areas. An area is regarded
as a logical group, and each group is identified by an area ID. An ATN, not a link, resides at the
border of an area. A network segment or link can belong only to one area. An area must be
specified for each OSPFv3 interface.
OSPFv3 areas include common areas, stub areas, and not-so-stubby areas (NSSAs), as
described in Table 8-34.
Table 8-34 OSPF areas

Area Type: Common area
Function: By default, OSPFv3 areas are defined as common areas. Common areas include:
l Standard area: transmits intra-area, inter-area, and external routes.
l Backbone area: connects to all other OSPFv3 areas and transmits inter-area routes. The
backbone area is represented by area 0. Routes between non-backbone areas must be
forwarded through the backbone area.
Notes:
l The backbone area must have all its devices connected.
l All non-backbone areas must remain connected to the backbone area.

Area Type: Stub area
Function: A stub area is a non-backbone area with only one ABR and generally resides at the
border of an AS. The area border router (ABR) in a stub area does not transmit received AS
external routes, which significantly decreases the number of entries in the routing table on the
ABR and the amount of routing information to be transmitted. To ensure the reachability of AS
external routes, the ABR in the stub area generates a default route and advertises the route to
non-ABRs in the stub area.
A totally stub area allows only intra-area routes and ABR-advertised Type 3 link state
advertisements (LSAs) carrying a default route to be advertised within the area.
Notes:
l The backbone area cannot be configured as a stub area.
l An autonomous system boundary router (ASBR) cannot exist in a stub area. Therefore, AS
external routes cannot be advertised within the stub area.
l A virtual link cannot pass through a stub area.

Area Type: NSSA
Function: An NSSA is similar to a stub area. An NSSA does not advertise Type 5 LSAs but can
import AS external routes. ASBRs in an NSSA generate Type 7 LSAs to carry the information
about the AS external routes. The Type 7 LSAs are advertised only within the NSSA. When the
Type 7 LSAs reach an ABR in the NSSA, the ABR translates the Type 7 LSAs into Type 5
LSAs and floods them to the entire AS.
A totally NSSA allows only intra-area routes to be advertised within the area.
Notes:
l ABRs in an NSSA advertise Type 3 LSAs carrying a default route within the NSSA. All
inter-area routes are advertised by ABRs.
l A virtual link cannot pass through an NSSA.
Network Types Supported by OSPFv3

OSPFv3 classifies networks into the following types according to link layer protocols.
Table 8-35 Types of OSPFv3 networks

Network Type: Broadcast
Description: If the link layer protocol is Ethernet or FDDI, OSPFv3 defaults the network type
to broadcast. On this type of network:
l Hello messages, LSU packets, and LSAck packets are transmitted in multicast mode
(FF02::5 is the reserved IPv6 multicast address of the OSPFv3 router; FF02::6 is the
reserved IPv6 multicast address of the OSPFv3 DR or BDR).
l DD packets and LSR packets are transmitted in unicast mode.

Network Type: Non-broadcast Multiple Access (NBMA)
Description: If the link layer protocol is frame relay, ATM, or X.25, OSPFv3 defaults the
network type to NBMA. On this type of network, protocol packets such as Hello messages,
DD packets, LSR packets, LSU packets, and LSAck packets are transmitted in unicast mode.

Network Type: Point-to-Multipoint (P2MP)
Description: Regardless of the link layer protocol, OSPFv3 does not default the network type
to P2MP. A P2MP network must be forcibly changed from other network types. The common
practice is to change a non-fully connected NBMA network to a P2MP network. On this type of
network:
l Hello messages are transmitted in multicast mode with the multicast address FF02::5.
l Other protocol packets, including DD packets, LSR packets, LSU packets, and LSAck
packets, are transmitted in unicast mode.

Network Type: Point-to-point (P2P)
Description: If the link layer protocol is PPP, HDLC, or LAPB, OSPFv3 defaults the network
type to P2P. On this type of network, the protocol packets, including Hello messages, DD
packets, LSR packets, LSU packets, and LSAck packets, are transmitted to the multicast
address FF02::5.
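The delivery rules in Table 8-35 can be summarized as a lookup, sketched below. The dictionary keys and the function name are hypothetical labels introduced for illustration; the sketch returns only the delivery mode and does not choose between FF02::5 and FF02::6 on broadcast networks.

```python
PACKET_TYPES = ("Hello", "DD", "LSR", "LSU", "LSAck")

# Packet types sent in multicast mode on each network type, per Table 8-35.
MULTICAST_PACKETS = {
    "broadcast": {"Hello", "LSU", "LSAck"},
    "nbma": set(),
    "p2mp": {"Hello"},
    "p2p": set(PACKET_TYPES),
}


def delivery_mode(network_type, packet_type):
    """Return 'multicast' or 'unicast' for a packet on a given network type."""
    if packet_type not in PACKET_TYPES:
        raise ValueError(f"unknown packet type: {packet_type}")
    multicast = MULTICAST_PACKETS[network_type]
    return "multicast" if packet_type in multicast else "unicast"


for net in MULTICAST_PACKETS:
    print(net, [delivery_mode(net, p) for p in PACKET_TYPES])
```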
Stub Area
A stub area is a special area where the ABRs do not flood the received external routes. In stub
areas, the size of the routing table of the routers and the routing information in transmission
are reduced.
Configuring a stub area is optional. Not all areas can be configured as stub areas. Usually, a
stub area is a non-backbone area with only one ABR and is located at the AS boundary.
To ensure the reachability of a destination outside the AS, the ABR in the stub area generates
a default route and advertises it to the non-ABR routers in the stub area.
Note the following when configuring a stub area:
l The backbone area cannot be configured as a stub area.

l If an area needs to be configured as a stub area, all the routers in this area must be
configured with the stub command.
l An ASBR cannot exist in a stub area. That is, external routes are not flooded in the stub
area.
l A virtual link cannot pass through the stub area.
OSPFv3 Route Summarization

Route summarization reduces the amount of routing information and the size of routing tables,
which improves the performance of routers.

OSPFv3 route summarization is performed as follows:

l Route summarization on an ABR
An ABR can summarize routes with the same prefix into one route and advertise the
summarized route in other areas.
When sending routing information to other areas, an ABR generates Type 3 LSAs based
on IPv6 prefixes. If consecutive IPv6 prefixes exist in an area and route summarization is
enabled on the ABR of the area, the IPv6 prefixes can be summarized into one prefix. If
there are multiple LSAs that have the same prefix, the ABR summarizes these LSAs and
advertises only one summarized LSA. The ABR does not advertise any specific LSAs.
l Route summarization on an ASBR
An ASBR can summarize imported routes with the same prefix into one route and then
advertise the summarized route to other areas.
After being enabled with route summarization, an ASBR summarizes imported Type 5
LSAs within the summarized address range. After route summarization, the ASBR does
not generate a separate Type 5 LSA for each specific prefix within the configured range.
Instead, the ASBR generates a Type 5 LSA for only the summarized prefix. In an NSSA,
an ASBR summarizes multiple imported Type 7 LSAs within the summarized address
range into one Type 7 LSA.
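The effect of summarization on what is advertised can be sketched with the Python standard library. The prefixes, the configured range, and the function name are made-up examples; the sketch only models the outcome (specific prefixes inside the range are replaced by one summarized prefix) and not the LSA handling itself.

```python
import ipaddress


def advertised_prefixes(prefixes, summary_range):
    """Return the prefixes advertised after summarization.

    Prefixes inside summary_range are suppressed and replaced by the single
    summarized prefix; prefixes outside the range are advertised unchanged.
    """
    summary = ipaddress.ip_network(summary_range)
    nets = [ipaddress.ip_network(p) for p in prefixes]
    outside = [str(n) for n in nets if not n.subnet_of(summary)]
    covered = any(n.subnet_of(summary) for n in nets)
    return ([str(summary)] if covered else []) + outside


specific = ["2001:db8:0:1::/64", "2001:db8:0:2::/64", "2001:db8:1::/64"]
print(advertised_prefixes(specific, "2001:db8::/48"))
```

Here two of the three /64 prefixes fall inside the configured /48 range, so only the summarized prefix and the one prefix outside the range are advertised.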
OSPFv3 Virtual Link

A virtual link refers to a logical channel established between two ABRs through a non-
backbone area.
l A virtual link must be set up on both ends of the link; otherwise, it does not take effect.
l The transit area is the non-backbone area that provides an internal route between
the two ends of the virtual link.
In actual applications, the physical connectivity between non-backbone areas and the
backbone area cannot be ensured owing to various limitations. To solve this problem, you can
configure OSPFv3 virtual links.
The virtual link is similar to a point-to-point connection between two ABRs. Similar to
physical interfaces, the interfaces on the virtual link can be configured with parameters such
as the hello interval.
Figure 8-79 OSPFv3 virtual link
(Figure: a virtual link between two ABRs crosses Area 1, the transit area, and connects Area 2 to Area 0.)
As shown in Figure 8-79, OSPFv3 packets transmitted between two ABRs are only
forwarded by the OSPFv3 devices that reside between the two ABRs. The OSPFv3 devices

detect that they are not the destinations of the packets, so they forward the packets as common
IP packets.
OSPFv3 Multi-process
OSPFv3 supports multi-process. More than one OSPFv3 process can run on the same router
because processes are independent of each other. Route interaction between different OSPFv3
processes is similar to the route interaction between different routing protocols.
An interface on a router can belong to only one OSPFv3 process.
8.7.2.2 Comparison between OSPFv3 and OSPFv2

OSPFv3 and OSPFv2 are the same in the following aspects:
l Network type and interface type

l Interface state machine and neighbor state machine
l LSDB
l Flooding mechanism
l Five types of packets, including Hello, DD, LSR, LSU, and LSAck packets
l Route calculation
OSPFv3 and OSPFv2 are different in the following aspects:
l OSPFv3 is based on links rather than network segments.
OSPFv3 runs on IPv6, which is based on links rather than network segments.
Therefore, the interfaces running OSPFv3 do not need to be in the same network
segment. It is only required that the interfaces enabled with OSPFv3 are on the same
link. In addition, the interfaces can set up OSPFv3 sessions without IPv6 global
addresses.
l OSPFv3 does not depend on IP addresses.
This is to separate topology calculation from IP addresses. That is, OSPFv3 can calculate
the OSPFv3 topology without knowing IPv6 global addresses. IPv6 global addresses are
required only on virtual link interfaces for packet forwarding.
l OSPFv3 packets and LSA format change.
– OSPFv3 packets do not contain IP addresses.
– OSPFv3 router LSAs and network LSAs do not contain IP addresses, which are
advertised by link LSAs and intra-area prefix LSAs.
– In OSPFv3, Router IDs, area IDs, and LSA link state IDs no longer indicate IP
addresses, but the IPv4 address format is still reserved.
– Neighbors are identified by Router IDs instead of IP addresses in broadcast,
NBMA, or P2MP networks.
l Information about the flooding scope is added in LSAs of OSPFv3.
Information about the flooding scope is added in the LSA Type field of LSAs of
OSPFv3. Thus, OSPFv3 routers can process LSAs of unidentified types, which makes
the processing more flexible.
– OSPFv3 can store or flood unidentified packets, whereas OSPFv2 just discards
unidentified packets.

– OSPFv3 floods LSAs in an OSPF area or on a link. The U flag bit in the LSA Type
field determines how an unidentified LSA is handled: it is either flooded only on
the local link or stored and flooded as if its type were identified.
For example, ATN A and ATN B can identify LSAs of a certain type. They are
connected through ATN C, which, however, cannot identify this type of LSAs. When
ATN A floods an LSA of this type, ATN C can still flood the received LSA to ATN B
although it does not identify this LSA. ATN B then processes the LSA.
If OSPFv2 is run, ATN C discards the unidentified LSA so that the LSA cannot reach
ATN B.
l OSPFv3 supports multi-process on a link.
In OSPFv2, only one OSPF process can be configured on a physical interface.
In OSPFv3, one physical interface can be configured with multiple processes that are
identified by different instance IDs. That is, multiple OSPFv3 instances can run on one
physical link. They establish neighbor relationships with the other end of the link and
transmit packets to the other end without interfering with each other.
Thus, the resources of a link can be shared among OSPFv3 instances that simulate
multiple OSPFv3 routers, which improves the utilization of limited router resources.
l OSPFv3 uses IPv6 link-local addresses.
IPv6 implements neighbor discovery and automatic configuration based on link-local
addresses. Routers running IPv6 do not forward IPv6 packets whose destination address
is a link-local address. Those packets can only be exchanged on the same link. Unicast
link-local addresses use the prefix FE80::/10.
As a routing protocol running on IPv6, OSPFv3 also uses link-local addresses to
maintain neighbor relationships and update LSDBs. Except for virtual link (Vlink)
interfaces, all OSPFv3 interfaces use link-local addresses as the source address and the
next-hop address when transmitting OSPFv3 packets.
The advantages are as follows:
– OSPFv3 can calculate the topology without knowing the global IPv6 addresses,
so topology calculation is not based on IP addresses.
– The packets flooded on a link are not transmitted to other links, which prevents
unnecessary flooding and saves bandwidth.
l OSPFv3 packets do not contain authentication fields.
OSPFv3 directly adopts IPv6 authentication and security measures. Thus, OSPFv3 does
not need to perform authentication. It only focuses on the processing of packets.
l OSPFv3 supports two new LSAs.
– Link LSA: A router floods a link LSA on the link where it resides to advertise its
link-local address and the configured global IPv6 address.
– Intra-area prefix LSA: A router advertises an intra-area prefix LSA in the local
OSPF area to inform the other routers in the area or the network, which can be a
broadcast network or an NBMA network, of its IPv6 global address.
l OSPFv3 identifies neighbors based on router IDs only.
On broadcast, NBMA, and P2MP networks, OSPFv2 identifies neighbors based on IPv4
addresses of interfaces.
OSPFv3 identifies neighbors based on router IDs only. Thus, even if global IPv6
addresses are not configured or they are configured in different network segments,
OSPFv3 can still establish and maintain neighbor relationships so that topology
calculation is not based on IP addresses.
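The flooding scope information carried in the LSA Type field, as described in the comparison above, can be illustrated with a short Python sketch. It assumes the 16-bit LS type layout of the OSPFv3 specification (one U bit, two scope bits, and a 13-bit function code). The function names decode_ls_type and handle_unknown are hypothetical and do not correspond to any device command.

```python
# Scope bits (S2 S1) of the OSPFv3 LS type field, as assumed in this sketch.
SCOPES = {0b00: "link-local", 0b01: "area", 0b10: "AS", 0b11: "reserved"}

def decode_ls_type(ls_type):
    """Split a 16-bit LS type into (U bit, flooding scope, function code)."""
    u_bit = bool(ls_type & 0x8000)            # highest bit
    scope = SCOPES[(ls_type >> 13) & 0b11]    # next two bits
    function_code = ls_type & 0x1FFF          # low 13 bits
    return u_bit, scope, function_code

def handle_unknown(ls_type):
    """Describe how a router treats an LSA whose function code it does not identify."""
    u_bit, scope, _ = decode_ls_type(ls_type)
    if u_bit:
        return f"store and flood with {scope} scope"
    return "store and flood with link-local scope"

print(decode_ls_type(0x0008))   # link LSA
print(decode_ls_type(0x2009))   # intra-area prefix LSA
print(handle_unknown(0xA00F))   # unidentified type with the U bit set
```

Because the scope is encoded in the type itself, a router such as ATN C in the example above can flood an LSA correctly without understanding its contents.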

Terms
Term Description
OSPF Open Shortest Path First (OSPF) is a link-state Interior Gateway Protocol
(IGP) developed by the Internet Engineering Task Force (IETF). OSPF
version 2 (OSPFv2), which is defined in RFC 2328, is intended for IPv4.
OSPF version 3 (OSPFv3), which is defined in RFC 2740, is intended for
IPv6.
OSPFv3 IP FRR With OSPFv3 IP fast reroute (FRR), a device pre-computes alternate next
hops and stores them in the IP routing table. If a primary link fails, the
device switches the traffic to a backup link within 50 ms.

Abbreviation
ABR Area Border Router
ASBR AS Boundary Router
BDR Backup Designated Router
8.8 BGP
8.8.1 Introduction to BGP

Definition
NOTE
l If BGP and BGP4+ implement a feature in the same way, details are not provided in this chapter.
l For the route aggregation function, BGP supports both automatic aggregation and manual
aggregation, whereas BGP4+ supports only manual aggregation.
Border Gateway Protocol (BGP) is a dynamic routing protocol used between autonomous
systems (ASs).
BGP-1 (defined in RFC 1105), BGP-2 (defined in RFC 1163), and BGP-3 (defined in RFC
1267) are three earlier-released versions of BGP. BGP exchanges the reachable inter-AS
routes, establishes inter-AS paths, avoids routing loops, and applies routing policies between
ASs.
The current BGP version is BGP-4 defined by RFC 4271.
As an exterior routing protocol on the Internet, BGP is widely used among Internet Service
Providers (ISPs).

BGP has the following characteristics:

l Is an Exterior Gateway Protocol (EGP). Unlike IGPs such as OSPF and RIP, BGP
controls route advertisement and selects the optimal route between ASs rather than
discovering and calculating routes.
l Uses the Transmission Control Protocol (TCP) with the listening port number of 179 as
the transport layer protocol.
– BGP selects inter-AS routes, which imposes high requirements on the reliability of
the protocol. TCP with high reliability is used to enhance the stability of BGP.
– BGP peers must be logically connected and must establish TCP connections. The
destination port number is 179 and the local port number is random.
l Supports Classless Inter-domain Routing (CIDR).
l Transmits only the updated routes when routes are being updated. This reduces the
bandwidth occupied by BGP for route distribution. Therefore, BGP is applicable to the
Internet where a large number of routes are transmitted.
l Is a distance-vector routing protocol.
l Is designed to avoid loops.
– Inter-AS: BGP routes carry information about the ASs along the path. Routes that
carry the local AS number are discarded, avoiding inter-AS loops.
– Intra-AS: BGP does not advertise routes learned from an IBGP peer to other IBGP
peers in the AS, avoiding intra-AS loops.
l Provides rich routing policies to flexibly select and filter routes.
l Provides the mechanism for preventing route flapping, which effectively enhances the
stability of the Internet.
l Can be easily extended to adapt to the development of networks.
Purpose
BGP transmits routes between ASs, but is not required in all situations.

Figure 8-80 BGP application scenario
[Figure: The Client AS runs IBGP internally and connects through EBGP to ISP1 and ISP2,
both of which connect to the Internet.]
BGP is required in the following situations:
l As shown in Figure 8-80, the user (Client AS) needs to be connected to two or more
ISPs. The ISPs need to provide all or part of the Internet routes for the user. Based on the
AS_Path carried in BGP routes, the ATN selects the optimal route through the AS of an
ISP to the destination.
l Different organizations need to transmit the AS_Path.
l Users transmit private network routes through Layer 3 VPN. For details, see the Feature
Description - VPN.
BGP is not required in the following situations:
l The user is connected to only one ISP.
l The ISP does not need to provide Internet routes for users.
l ASs are connected through default routes.
8.8.2 Principles
This chapter describes BGP features.
8.8.2.1 Basic Principle of BGP
BGP Operating Modes

BGP operates on an ATN in either of the following modes, as shown in Figure 8-81:
l Internal BGP (IBGP)

l External BGP (EBGP)
BGP is called IBGP when it runs within an AS and is called EBGP when it runs between ASs.
Figure 8-81 BGP operating modes
[Figure: The Client AS runs IBGP internally and runs EBGP with ISP1 and ISP2, both of
which connect to the Internet.]
Roles in Transmitting BGP Messages

l Speaker: The device that sends BGP messages is called a BGP speaker. The speaker
receives or generates new routing information, and then advertises the routing
information to other BGP speakers. When receiving a new route from another AS, a
BGP speaker compares the route with the current route. If the route takes precedence
over the existing route, or the route is new, the speaker advertises this route to all other
BGP speakers except the BGP speaker that sent this route.
l Peer: BGP speakers that exchange messages with each other are called peers. Multiple
peers compose a peer group.
BGP Messages
BGP runs by sending five types of BGP messages: Open, Update, Notification, Keepalive,
and Route-refresh.
l Open message: is the first message that is sent after a TCP connection is set up, and is
used to negotiate capability in order to set up BGP peer relationships. After the peer
receives an Open message and peer negotiation succeeds, the peer sends a Keepalive
message to confirm and maintain the peer relationship. Then, peers can exchange
Update, Notification, Keepalive, and Route-refresh messages.
l Update message: is used to exchange routes between BGP peers. Update messages can
be used to send the following communications:
– Advertise multiple reachable routes with the same attributes. These routes can share
a group of route attributes. Route attributes contained in an Update message are
applicable to all destination addresses (expressed by IP prefixes) contained in the
Network Layer Reachability Information (NLRI) field of the Update message.
– Withdraw multiple unreachable routes. Each route is identified by its destination
address, which identifies routes previously advertised between BGP speakers.
– Withdraw routes only. In this case, the message does not need to carry the path
attributes or NLRI. Conversely, an Update message can be used only to advertise
the reachable routes, so it does not need to carry information about withdrawn
routes.
l Notification message: is sent to its peer when BGP detects an error. The BGP connection
is then torn down immediately.
l Keepalive message: is sent periodically to the peer to maintain the peer relationship.
l Route-refresh message: is used to request that the peer resend all reachable routes.
If all BGP devices have the Route-refresh capability enabled, the local BGP device
sends Route-refresh messages to peers when the import routing policy of BGP changes.
After receiving the message, the peers resend their routing information to the local BGP
device. The BGP routing table can be dynamically refreshed, and the new routing policy
can be used, without tearing down BGP connections.
BGP Finite State Machine

The BGP Finite State Machine (FSM) has six states: Idle, Connect, Active, OpenSent,
OpenConfirm, and Established.
l In Idle state, BGP denies all connection requests. This is the initial state of BGP.
Upon receiving a Start event, BGP initiates a TCP connection to the remote BGP peer,
starts the ConnectRetry Timer with the initial value, listens for a TCP connection
initiated by the remote BGP peer, and changes its state to Connect.
l In Connect state, BGP waits for the TCP connection to be set up before it performs
further actions.
– If the TCP connection succeeds, BGP stops the ConnectRetry Timer, sends an Open
message to the remote peer, and changes its state to OpenSent.
– If the TCP connection fails, BGP restarts the ConnectRetry Timer with the initial
value, continues to listen for a TCP connection initiated by the remote peer, and
changes its state to Active.
– If the ConnectRetry Timer has expired before a TCP connection is established, BGP
restarts the timer with the initial value, initiates a TCP connection to the remote
BGP peer, and stays in the Connect state.
l In Active state, BGP attempts to set up a TCP connection. This is the intermediate state
of BGP.
– If the TCP connection succeeds, BGP stops the ConnectRetry Timer, sends an Open
message to the remote peer, and changes its state to OpenSent.
– If the ConnectRetry Timer has expired before a TCP connection is established, BGP
restarts the timer with the initial value and changes its state to Connect.

– If BGP initiates a TCP connection with an unknown IP address, the TCP connection
fails. When this occurs, BGP restarts the ConnectRetry Timer with the initial value
and stays in the Active state.
l In OpenSent state, BGP has sent one Open message to its peer and waits for the other
Open message from the peer.
– If there are no errors in the Open message received, BGP changes its state to
OpenConfirm and sends a Keepalive message.
– If there are errors in the Open message received, BGP sends a Notification message
to the remote peer and changes its state to Idle.
– If the TCP connection fails, BGP restarts the ConnectRetry Timer with the initial
value, continues to listen for a TCP connection initiated by the remote peer, and
changes its state to Active.
l In OpenConfirm state, BGP waits for a Notification message or a Keepalive message.
– If BGP receives a Notification message or the TCP connection fails, BGP changes
its state to Idle.
– If BGP receives a Keepalive message, BGP changes its state to Established.
l In Established state, BGP peers can exchange Update, Route-Refresh, Keepalive, and
Notification messages.
– If BGP receives an Update or a Keepalive message, it stays in the Established state.
– If BGP receives a Notification message, BGP changes its state to Idle.
During establishment of BGP peer relationships, BGP is usually in the Idle, Active, or
Established state.
The BGP peer relationship can be established only when both BGP peers are in the
Established state. The two peers send Update messages to exchange routes.
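The state transitions listed above can be summarized as a lookup table. The following Python sketch is a simplified model under stated assumptions: the event names (start, tcp_ok, tcp_fail, retry_expired, open_ok, open_error, keepalive, update, notification) are hypothetical labels for the conditions in the text, timers are not modeled, and any state and event pair that the text does not describe raises an error.

```python
# (current state, event) -> next state, taken from the descriptions above.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_ok"): "OpenSent",
    ("Connect", "tcp_fail"): "Active",
    ("Connect", "retry_expired"): "Connect",
    ("Active", "tcp_ok"): "OpenSent",
    ("Active", "retry_expired"): "Connect",
    ("Active", "tcp_fail"): "Active",
    ("OpenSent", "open_ok"): "OpenConfirm",
    ("OpenSent", "open_error"): "Idle",
    ("OpenSent", "tcp_fail"): "Active",
    ("OpenConfirm", "keepalive"): "Established",
    ("OpenConfirm", "notification"): "Idle",
    ("OpenConfirm", "tcp_fail"): "Idle",
    ("Established", "update"): "Established",
    ("Established", "keepalive"): "Established",
    ("Established", "notification"): "Idle",
}

def run(events, state="Idle"):
    """Apply a sequence of events and return the final FSM state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"transition not described: {key}")
        state = TRANSITIONS[key]
    return state

# A successful session setup, followed by a setup that fails on a bad Open message.
print(run(["start", "tcp_ok", "open_ok", "keepalive"]))
print(run(["start", "tcp_fail", "retry_expired", "tcp_ok", "open_error"]))
```

Keeping the transitions in one table makes it easy to check that every path to the Established state passes through OpenSent and OpenConfirm.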
BGP Processing
l BGP adopts TCP as its transport layer protocol. Before the BGP peer relationship is set
up, a TCP connection must be set up between the peers. Then, BGP peers exchange
Open messages to negotiate related parameters, and finally establish the BGP peer
relationship.
l After the peer relationship is set up, BGP peers exchange BGP routing tables. BGP does
not periodically update the routing table. When BGP routes change, however, BGP
updates the BGP routing table incrementally through Update messages.
l BGP sends Keepalive messages to maintain the BGP connection between peers. When it
detects an error on a network, for example, error packets or packets that indicate
unsupported negotiation capability are received, BGP sends a Notification message to
report the error, and the BGP connection is torn down.
BGP Attributes
The BGP route attribute is a set of parameters that further describe routes. With the BGP route
attribute, BGP can filter and select routes. BGP route attributes are classified into the
following types:
l Well-known mandatory: can be identified by all BGP devices. This type of attribute is
mandatory and must be carried in Update messages. Without this attribute, errors occur
in the routing information.

l Well-known discretionary: can be identified by all BGP devices. The attribute is
discretionary and is not necessarily carried in Update messages.
l Optional transitive: indicates an attribute that can be transmitted between ASs. A BGP
device may not recognize this attribute, but it still accepts the attribute and advertises it
to other peers.
l Optional non-transitive: indicates an attribute that is not transmitted further if it is not
recognized. A BGP device that does not recognize this attribute ignores it and does not
advertise it to other peers.
The common BGP route attributes are described as follows:
l Origin: defines the origin of a route and marks the paths of a BGP route. The Origin
attributes are classified into the following types:
– IGP: indicates the highest priority. For routing information obtained through an IGP
of the AS that originates the route, the Origin attribute is IGP. For example, for
routes imported to the BGP routing table through the network command, the Origin
attribute is IGP.
– Exterior Gateway Protocol (EGP): indicates the second highest priority. The Origin
attribute of routes obtained through EGP is EGP.
– Incomplete: indicates the lowest priority. The Origin attribute of routes learned by
other means is Incomplete. For example, for the routes imported by BGP through
the import-route command, the Origin attribute is Incomplete.
l AS_Path: is used to record all ASs that a route passes through from the local end to the
destination in the distance-vector (DV) order.
Assume that the BGP speaker advertises a local route:
– When advertising the route to other ASs, the BGP speaker adds the local AS
number in the AS_Path list, and advertises it to the neighboring devices through
Update messages.
– When advertising the route to the local AS, the BGP speaker creates an empty
AS_Path list in an Update message.
Assume that the BGP speaker advertises the routes learned from Update messages of
other BGP speakers:
– When advertising the route to other ASs, the BGP speaker adds the local AS
number to the beginning of the AS_Path list. According to the AS_Path attribute,
the BGP device that receives the route can detect the ASs through which the route
passes to the destination. The number of the AS that is nearest to the local AS is
placed at the top of the list. The other AS numbers are arranged in sequence.
– When the BGP speaker advertises the route to the local AS, it does not change the
AS_Path.
The AS_Path attribute has four types:
– AS_Sequence: a sequenced set of numbers of the ASs that a route passes through
from a local end to the destination
– AS_Set: an unsequenced set of numbers of the ASs that a route passes through from
a local end to the destination. The AS_Set attribute is used in the route aggregation
scenario. After route aggregation, the device cannot sequence the numbers of ASs
that specific routes pass through, so the AS_Set attribute is used to record the
unsequenced AS numbers. No matter how many AS numbers an AS_Set contains,
BGP regards the AS_Set as one AS number to calculate routes.
– AS_Confed_Sequence: a sequenced set of sub-AS numbers in a confederation

– AS_Confed_Set: an unsequenced set of sub-AS numbers in a confederation. The
AS_Confed_Set attribute is used in the route aggregation scenario in a
confederation.
The AS_Confed_Sequence and AS_Confed_Set attributes are used to prevent route
loops and to select routes among the various sub-ASs in a confederation.
l Next_Hop: the Next_Hop attribute of BGP is different from that of an IGP. It is not
necessarily the IP address of a neighboring device. Generally, the Next_Hop attribute
complies with the following principles:
– When advertising a route to an EBGP peer, the BGP speaker sets the next hop of
the route to the address of the local interface through which the BGP peer
relationship is set up.
– When advertising a locally generated route to an IBGP peer, the BGP speaker sets
the next hop of the route to the address of the local interface through which the
BGP peer relationship is set up.
– When advertising a route learned from an EBGP peer to an IBGP peer, the BGP
speaker does not change the next hop of the route.
l Multi_Exit Discriminator (MED): is exchanged only between two neighboring ASs. The
AS that receives the MED does not advertise it to any other ASs.
MED is similar to the metric used by an IGP. It is used to determine the optimal route when
traffic enters an AS. When a BGP device obtains multiple routes to the same destination
address but with different next hops through EBGP peers, the route with the smallest
MED value is selected as the optimal route.
l Local_Pref: indicates preferences of the BGP devices. It is exchanged only between
IBGP peers and is not advertised to other ASs.
The Local_Pref attribute is used to determine the optimal route when traffic leaves an
AS. When a BGP device obtains multiple routes to the same destination address but with
different next hops through IBGP peers, the route with the largest Local_Pref value is
selected.
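The AS_Path handling described above (prepending the local AS number toward other ASs, leaving the list unchanged within the local AS, and discarding routes that already carry the local AS number) can be sketched as follows. The function names and AS numbers are assumptions for illustration, and the list models an AS_Sequence only.

```python
def advertise(as_path, local_as, to_ebgp):
    """Return the AS_Path carried in the Update message sent to a peer."""
    if to_ebgp:
        return [local_as] + list(as_path)   # local AS goes to the front of the list
    return list(as_path)                    # unchanged within the local AS

def accept(as_path, local_as):
    """A route whose AS_Path already contains the local AS number is discarded."""
    return local_as not in as_path

# AS 65001 originates a route and advertises it to AS 65003, which passes it on.
path = advertise([], 65001, to_ebgp=True)
path = advertise(path, 65003, to_ebgp=True)
print(path)
print(accept(path, 100))
print(accept(path, 65001))
```

The last call shows inter-AS loop avoidance: if the route returned to AS 65001, that AS would find its own number in the list and discard the route.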
Policies for BGP Route Selection

When there are multiple routes to the same destination, BGP selects routes according to the
following policies:
1. Prefers the route with the highest PrefVal.
PrefVal is a Huawei-specific parameter. It is valid only on the device where it is
configured.
2. Prefers the route with the highest Local_Pref.
A route without Local_Pref is assigned the value set using the default local-preference
command, or a value of 100 by default.
3. Prefers a locally originated route. A locally originated route takes precedence over a
route learned from a peer.
Locally originated routes include routes imported using the network command or the
import-route command, manually summarized routes, and automatically summarized
routes.
a. A summarized route is preferred. A summarized route takes precedence over a non-
summarized route.
b. A route obtained using the aggregate command is preferred over a route obtained
using the summary automatic command.

c. A route imported using the network command is preferred over a route imported
using the import-route command.
4. Prefers a route that carries the Accumulated Interior Gateway Protocol Metric (AIGP)
attribute.
– The priority of a route that carries the AIGP attribute is higher than the priority of a
route that does not carry the AIGP attribute.
– If two routes both carry the AIGP attribute, the route with a smaller AIGP attribute
value plus IGP metric of the iterated next hop is preferred over the other route.
5. Prefers the route with the shortest AS_Path.
– The AS_CONFED_SEQUENCE and AS_CONFED_SET are not included in the
AS_Path length.
– An AS_SET counts as 1, no matter how many ASs are in the set.
– After you run the bestroute as-path-ignore command, the AS_Path attributes of
routes are not compared in the route selection process.
6. Prefers the route with the highest Origin type. IGP is higher than EGP, and EGP is higher
than Incomplete.
7. Prefers the route with the lowest MED.
– BGP compares MEDs of only routes from the same AS, but not a confederation
sub-AS. MEDs of two routes are compared only when the first AS number in the
AS_SEQUENCE (excluding AS_CONFED_SEQUENCE) is the same for the two
routes.
– A route without MED is assigned a MED of 0, unless the bestroute med-none-as-
maximum command is run. If you run the bestroute med-none-as-maximum
command, the route is assigned the highest MED of 4294967295.
– After you run the compare-different-as-med command, MEDs in routes received
from peers in different ASs are compared. Do not use this command unless you
confirm different ASs use the same IGP and route selection mode. Otherwise, a
loop can occur.
– If you run the bestroute med-confederation command, MEDs are compared for
routes that consist of only AS_CONFED_SEQUENCE. The first AS number in the
AS_CONFED_SEQUENCE must be the same for the routes.
– After you run the deterministic-med command, the route selection result no longer
depends on the sequence in which routes are received.
8. Prefers EBGP routes over IBGP routes.
EBGP is higher than IBGP, IBGP is higher than LocalCross, and LocalCross is higher
than RemoteCross.
If the export route target (ERT) of a Virtual Private Network version 4 (VPNv4) route in
the routing table of a VPN instance on a Provider Edge (PE) matches the import route
target (IRT) of another VPN instance on the PE, the VPNv4 route is added to the routing
table of the second VPN instance. This is called LocalCross. If the ERT of a VPNv4
route from a remote PE is learned by the local PE and matches the IRT of a VPN
instance on the local PE, the VPNv4 route will be added to the routing table of that VPN
instance. This is called RemoteCross.
9. Prefers the route with the lowest IGP metric to the BGP next hop.
After the bestroute igp-metric-ignore command is run, the IGP metrics are not
compared for routes during route selection.

NOTE
Assume that load balancing is configured. If the preceding rules are the same and there are
multiple external routes with the same AS_Path, load balancing will be performed based on the
number of configured routes.
10. Prefers the route with the shortest Cluster_List.
NOTE
By default, Cluster_List takes precedence over Originator_ID during BGP route selection. To
enable Originator_ID to take precedence over Cluster_List during BGP route selection, run the
bestroute routerid-prior-clusterlist command.
11. Prefers the route advertised by the device with the smallest router ID.
NOTE
If routes carry the Originator_ID, the originator ID is substituted for the router ID during route
selection. The route with the smallest Originator_ID is preferred.
12. Prefers the route learned from the peer with the smallest address if the IP addresses of
peers are compared in the route selection process.
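Part of the selection sequence above can be expressed as a sort key. The following Python sketch is a deliberately reduced model, not the device algorithm: it covers only rules 1, 2, 5, 6, 7, 8, 9, and 11, it assumes that all candidate routes come from the same neighboring AS so that MEDs are comparable, and the Route class, its field names, and its default values are hypothetical.

```python
from dataclasses import dataclass
import ipaddress

# Lower rank is preferred: IGP over EGP over Incomplete.
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "Incomplete": 2}

@dataclass
class Route:
    name: str
    pref_val: int = 0
    local_pref: int = 100
    as_path_len: int = 0
    origin: str = "IGP"
    med: int = 0
    is_ebgp: bool = True
    igp_metric: int = 0
    router_id: str = "0.0.0.0"

def selection_key(r):
    """Tuple compared field by field; the smallest tuple is the best route."""
    return (
        -r.pref_val,                  # rule 1: highest PrefVal
        -r.local_pref,                # rule 2: highest Local_Pref
        r.as_path_len,                # rule 5: shortest AS_Path
        ORIGIN_RANK[r.origin],        # rule 6: best Origin type
        r.med,                        # rule 7: lowest MED (same-AS assumption)
        0 if r.is_ebgp else 1,        # rule 8: EBGP over IBGP
        r.igp_metric,                 # rule 9: lowest IGP metric to the next hop
        int(ipaddress.IPv4Address(r.router_id)),  # rule 11: smallest router ID
    )

def best_route(routes):
    return min(routes, key=selection_key)

candidates = [
    Route("via-ISP1", as_path_len=2, med=50, router_id="1.1.1.1"),
    Route("via-ISP2", as_path_len=3, med=10, router_id="2.2.2.2"),
    Route("via-IBGP", local_pref=200, as_path_len=5, is_ebgp=False, router_id="3.3.3.3"),
]
best = best_route(candidates)
print(best.name)
```

Because the key is compared field by field, a later rule is consulted only when all earlier rules tie, which mirrors the ordered nature of the policies.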
BGP ECMP
When multiple equal-cost routes have the same destination address, traffic can be evenly load
balanced using BGP Equal Cost Multiple Path (ECMP).
Condition for BGP ECMP: Routes must have the same first nine attributes defined in the
preceding "Policies for BGP Route Selection".
Policies for BGP Route Advertisement

BGP adopts the following policies for the BGP speaker to advertise routes:
l Advertises only the optimal route to its peer when there are multiple valid routes.
l Advertises the routes learned from EBGP devices to all BGP peers, including EBGP
peers and IBGP peers.
l Does not advertise the routes learned from IBGP devices to its IBGP peers.
l Advertises the routes learned from IBGP devices to its EBGP peers.
l Advertises all BGP optimal routes to new peers when the peer relationship is established.
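The advertisement rules above reduce to a small decision function. This is a minimal sketch under the assumption that a route is already the optimal route and that peers are classified only as "ebgp" or "ibgp"; the function name should_advertise is hypothetical, and mechanisms not covered in this section, such as route reflection, are ignored.

```python
def should_advertise(learned_from, send_to):
    """Decide whether an optimal route is advertised to a peer.

    learned_from -- "ebgp" or "ibgp": the type of peer the route was learned from
    send_to      -- "ebgp" or "ibgp": the type of peer that would receive it
    """
    for value in (learned_from, send_to):
        if value not in ("ebgp", "ibgp"):
            raise ValueError(f"unknown peer type: {value}")
    # Routes learned from IBGP peers are not passed on to other IBGP peers.
    if learned_from == "ibgp" and send_to == "ibgp":
        return False
    return True

for src in ("ebgp", "ibgp"):
    for dst in ("ebgp", "ibgp"):
        print(src, "->", dst, should_advertise(src, dst))
```

Only the IBGP-to-IBGP combination is blocked, which is the intra-AS loop avoidance rule stated earlier in this section.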
Synchronization of IBGP and IGP

IBGP and IGP are synchronized to prevent unreachable routes from being advertised to
devices in external ASs.
If a non-BGP device in an AS provides forwarding service, IP packets forwarded by this AS
might be discarded because the destination address is unreachable. As shown in Figure 8-82,
ATN E learns route 8.0.0.0/8 of ATN A from ATN D through BGP, and then forwards the
packet to ATN D. ATN D searches the routing table and detects that the next hop is ATN B.
ATN D forwards the packet to ATN C through route iteration, because ATN D obtained a
route to ATN B through IGP. ATN C, however, does not obtain the route to 8.0.0.0/8 and
discards the packet.

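Figure 8-82 IBGP and IGP synchronization
[Figure: ATN A in AS10 advertises 8.0.0.0/8 over EBGP to ATN B in AS20. ATN B and ATN D
are IBGP peers in AS20 and are connected through ATN C, which runs only IGP. ATN D
connects over EBGP to ATN E in AS30.]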
If synchronization is configured, devices check the IGP routing table before they add the
IBGP route to the routing table and advertise it to the EBGP peers. The IBGP route is added
to the routing table and advertised to EBGP peers only when IGP has also learned this route.
The synchronization can be disabled in the following cases:
l The local AS is not a transit AS. (AS20 in Figure 8-82 is a transit AS.)
l All devices in the local AS are full-meshed IBGP peers.
8.8.2.2 Route Import

BGP itself cannot discover routes. It needs to import other protocol routes, such as IGP or
static routes, to the BGP routing table. Imported routes can be transmitted within an AS or
between ASs.
BGP routes are imported in either of the following modes:
l The import-route command imports routes based on protocol types, such as RIP routes,
OSPF routes, Intermediate System to Intermediate System (IS-IS) routes, static routes,
or direct routes.
l The network command imports a route with the specified prefix and mask to the BGP
routing table, which is more precise than the previous mode.
Purpose
On medium or large-scale Border Gateway Protocol (BGP) networks, the BGP routing table
on a device contains a large number of routing entries. Storing the routing table consumes a
great deal of memory, and transmitting and processing routing information consume
significant network resources. Route summarization can reduce the size of a routing table,
prevent specific routes from being advertised, and minimize the impact of route flapping on
network performance.
Definition
Route summarization is the process of summarizing specific routes with the same IP prefix
into a summary route. BGP supports automatic and manual route summarization. Table 8-36
defines the differences between the two modes.

Table 8-36 Differences between automatic and manual route summarization

Route summarization mode: Automatic route summarization
Implementation: After automatic route summarization is configured, BGP summarizes routes
based on the natural network segment and sends only the summarized route to peers. For
example, 10.1.1.1/24 and 10.2.1.1/24 are summarized into 10.0.0.0/8, which is a Class A
address.
Characteristics:
l BGP summarizes only local routes that are imported using the import-route command.
l During BGP route selection, an automatically summarized route has a lower priority than
a manually summarized one.
l An automatically summarized route does not carry path information because BGP
summarizes only local routes that are imported using the import-route command.
l BGP4+ does not support automatic route summarization.

Route summarization mode: Manual route summarization
Implementation: BGP routes are summarized manually.
Characteristics:
l The attributes carried in a manually summarized route are controllable.
l Whether to advertise the specific routes for summarization is controllable.
l During BGP route selection, a manually summarized route has a higher priority than an
automatically summarized one.
l A manually summarized route can carry specific path information, which prevents
routing loops.
l Both BGP and BGP4+ support manual route summarization.
An automatically summarized route comes from local routes, and the mechanism of automatic
route summarization is much less complex than that of manual route summarization.
Therefore, the next section describes only manual route summarization.
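Summarization based on the natural network segment, as used in automatic route summarization, can be illustrated with the example from Table 8-36. The following Python sketch assumes the traditional classful boundaries (Class A /8, Class B /16, Class C /24); the function names natural_network and auto_summarize are hypothetical.

```python
import ipaddress

def natural_network(prefix):
    """Return the classful (natural) network that contains an IPv4 prefix."""
    addr = ipaddress.ip_interface(prefix).ip
    first_octet = int(addr) >> 24
    if first_octet < 128:
        plen = 8        # Class A
    elif first_octet < 192:
        plen = 16       # Class B
    elif first_octet < 224:
        plen = 24       # Class C
    else:
        raise ValueError(f"{addr} is not a Class A, B, or C address")
    # strict=False masks off the host bits of the address.
    return ipaddress.ip_network(f"{addr}/{plen}", strict=False)

def auto_summarize(prefixes):
    """Collapse prefixes into the set of natural networks that cover them."""
    return sorted({natural_network(p) for p in prefixes})

summary = auto_summarize(["10.1.1.1/24", "10.2.1.1/24"])
print([str(n) for n in summary])
```

Both sample prefixes fall into the same Class A network, so a single summarized route is produced, which matches the example given in the table.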
Related Concepts
Atomic_Aggregate: a well-known discretionary BGP attribute, carried in Update messages,
indicating that a route is a summarized one. BGP speakers cannot delete this attribute during
route transmission.

Aggregator: an optional transitive attribute, carried in Update messages, indicating where
routes are summarized. Aggregator consists of the AS number and router ID of the router
that performs the route summarization.
AS_Sequence: a type of AS_Path, carried in Update messages, recording in reverse order all
the numbers of the ASs that a route passes from the local device to the destination address.
AS_Set: a type of AS_Path, carried in Update messages, recording all the numbers of the
ASs that a route passes from the local device to the destination address without an order.
AS_Set can also indicate a summarized route and carry path information. Therefore, if a
summarized route carries AS_Set, Atomic_Aggregate is optional. During route selection, a
router counts an AS_Set as a single AS number regardless of how many AS numbers it
contains.
NOTE
AS_Set affects BGP route selection. Whenever AS_Set changes, a router sends Update messages to its
peers whose routes are not summarized by the router to notify the change. If the summarized route
passes through a large number of ASs and the specific routes change frequently, the router needs to send
Update messages frequently to its peers to notify them of the AS_Set changes. This process may lead to
route flapping.
AS4_Path: a new attribute defined by BGP. It is similar to AS_Path in function, but
AS4_Path can carry both 2-byte and 4-byte AS numbers. AS4_Path can be classified as
AS4_Sequence or AS4_Set, which are respectively similar to AS_Sequence and AS_Set in
function.
AS4_Aggregator: a new attribute defined by BGP. AS4_Aggregator carries 4-byte AS
numbers, while Aggregator carries 2-byte AS numbers.
Implementation
As shown in Figure 8-83, the router in AS 100 summarizes the routes from AS 65001, AS
65002, and AS 65003 into the route 10.1.1.0/24 and then advertises it. Because the route
10.1.1.0/24 originates from AS 100, it carries only AS 100 without the path information about
the specific routes for the summarization.

Figure 8-83 Networking for route summarization
Without the path information, AS_Path carried in the route 10.1.1.0/24 can no longer prevent
routing loops. To warn downstream routers that the path information has been lost, the router
in AS 100 adds Atomic_Aggregate to an Update message.
As shown in Figure 8-84, the router in AS 100 adds Atomic_Aggregate and Aggregator to
an Update message to advertise the route 10.1.1.0/24.
NOTE
If Atomic_Aggregate is added to the route, Aggregator is optional.
BGP speakers cannot delete Atomic_Aggregate carried in a summarized route during route
transmission. After the downstream router receives this route, the router cannot restore the
lost path information.

Figure 8-84 Networking in which an Update message carries Atomic_Aggregate
However, Atomic_Aggregate and Aggregator alone cannot prevent routing loops. AS_Set
can address this problem. If AS_Set is configured on the ATN in AS 100 in the networking
shown in Figure 8-85, the summarized route 10.1.1.0/24 carries AS_Set {65001, 65002,
65003}, which records all the ASs that the specific routes pass through.

Figure 8-85 Networking in which an Update message carries AS_Set
Because the ATN in AS 100 cannot determine the AS sequence, it records the ASs without an
order in AS_Set. If the ATN in AS 65001, AS 65002, or AS 65003 later receives the route
10.1.1.0/24 from elsewhere, it checks the AS_Path carried in the route. Because the AS_Path
contains its own AS number, the ATN discards the route. Therefore, even though the ASs are
listed without an order in AS_Set, routing loops can still be prevented.
In the preceding section, all ATNs are old speakers. If new speakers co-exist with old
speakers, AS4_Path and AS4_Aggregator must be available. Figure 8-86 shows such a
scenario in which new speakers use 4-byte AS numbers and old speakers use 2-byte AS
numbers. The ATN in AS 3.3 summarizes the routes from AS 1.1, AS 65001, AS 2.2, and AS
65002 into the route 10.0.0.0/8 carrying AS4_Path {1.1, 65001, 2.2, 65002} and advertises it
to the ATNs in AS 4.4 and AS 65003. The carried AS4_Path equals an AS4_Set in function.

Figure 8-86 Networking for route summarization in which new speakers co-exist with old
speakers
l The BGP connection between the ATN in AS 3.3 and the ATN in AS 65003 is an old session,
which does not support 4-byte AS numbers. Before sending the route 10.0.0.0/8 to the ATN
in AS 65003, the ATN in AS 3.3 therefore replaces each 4-byte AS number in AS_Path and
Aggregator with 23456 (AS_Trans). As a result, the AS_Path carried in the route is
23456 {23456, 65001, 23456, 65002}, and the Aggregator is 23456 192.168.1.1.
After the ATN in AS 65003 receives the route 10.0.0.0/8, it checks the AS_Path.
Because its own AS number is not listed in the AS_Path, the ATN in AS 65003 accepts
the route.
NOTE
23456 is a reserved AS number and cannot be the number of the AS to which the downstream
router that receives the summarized route belongs. Therefore, the downstream router does not
discard the summarized route.
In addition, the ATN in AS 65003 may be connected to downstream new speakers in an
AS numbered in 4-byte format, AS 5.5 for example. To ensure that the ATN in AS 5.5
learns the actual path that the route passes through, the ATN in AS 3.3 also adds
AS4_Path and AS4_Aggregator to the Update message that advertises the route 10.0.0.0/8
to the ATN in AS 65003. After the ATN in AS 65003 receives the message, it
transparently transmits these attributes to the ATN in AS 5.5. After the ATN in AS 5.5
receives the message, it reconstructs the actual path that the route passes through based on
AS4_Path and AS4_Aggregator carried in the message.
l The BGP connection between the ATN in AS 3.3 and the ATN in AS 4.4 is a new session,
which supports 4-byte AS numbers. The ATN in AS 3.3 therefore adds only AS_Path and
Aggregator to the Update message that advertises the route 10.0.0.0/8 to the ATN in
AS 4.4. After the ATN in AS 4.4 receives the route 10.0.0.0/8, it checks the AS_Path.
Because its own AS number is not listed in the AS_Path, the ATN in AS 4.4 accepts the
route.
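The AS_Trans substitution on an old session can be illustrated with a short Python sketch. It
assumes that 4-byte AS numbers are written in the "high.low" dotted notation used in this
section (for example, 3.3); the function names are hypothetical.

```python
AS_TRANS = 23456  # reserved 2-byte AS number that stands in for any 4-byte AS number


def asdot_to_int(asn: str) -> int:
    """Convert 'high.low' dotted notation (or a plain decimal string) to an integer."""
    if "." in asn:
        high, low = asn.split(".")
        return (int(high) << 16) | int(low)
    return int(asn)


def for_old_speaker(asns):
    """Replace every AS number that does not fit in 2 bytes with AS_Trans."""
    result = []
    for asn in asns:
        value = asdot_to_int(asn)
        result.append(AS_TRANS if value > 0xFFFF else value)
    return result


# AS_Set of the summarized route 10.0.0.0/8 as sent over the old session
print(for_old_speaker(["1.1", "65001", "2.2", "65002"]))
# Originating AS 3.3 as seen by an old speaker
print(for_old_speaker(["3.3"]))
```

The real 4-byte values are not lost in this scheme because they travel separately in AS4_Path
and AS4_Aggregator, as described above.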
Benefits
Route summarization brings the following benefits:
l Reduces the router load: Route summarization reduces the size of a routing table and
spares a router from advertising a large number of specific routes, which reduces the
transmitting load. Route summarization also reduces the receiving load because
downstream routers receive only the summarized route.
l Reduces the link load: A router advertises only the summarized route to its peers, which
reduces link bandwidth consumption.
l Minimizes the impact of route flapping: If route flapping occurs in the ASs that the
specific routes for summarization pass through, its impact will not spread beyond the
ASs.
8.8.2.4 Route Dampening

Route instability is reflected in route flapping. When a route flaps, it repeatedly disappears
from the routing table and then reappears.
When route flapping occurs, a device sends an Update packet to its peers. After the peers
receive the Update packet, they recalculate routes and update their routing tables. Frequent
route flapping consumes large amounts of bandwidth and CPU resources and can even affect
network operations.
Route dampening can address this problem. In most cases, BGP is deployed on complex
networks where routes change frequently. To reduce the impact of frequent route flapping,
BGP adopts route dampening to suppress unstable routes.
BGP dampening uses a penalty value to measure the stability of a route. The greater the
penalty value, the less stable the route. Each time a route flaps (changes from active to
inactive), BGP adds a penalty value of 1000 to the route. When the penalty value of a route
exceeds the Suppress value, the route is suppressed: BGP neither adds the route to the
routing table nor advertises Update messages for it to BGP peers.
The penalty value of the suppressed route decreases by half after a half-life period. When the
penalty value decreases to the Reuse value, the route is reusable and is added to the routing
table. At the same time, BGP advertises an Update message to BGP peers. The penalty value,
Suppress value, and half-life are configurable. Figure 8-87 shows the process of BGP route
dampening.

Figure 8-87 Networking for BGP route dampening
(Plot of the penalty value against time, marking the suppress value, the reuse value, the
suppress time, and the half-life.)
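The penalty arithmetic can be sketched in Python. The text states only that the penalty halves
after each half-life and that each flap adds 1000, so the continuous exponential decay used
here is a modeling assumption, and the Suppress value (2000), Reuse value (750), and half-life
(15 minutes) are illustrative numbers rather than values taken from this document.

```python
import math

FLAP_PENALTY = 1000.0  # penalty added per flap, as stated in the text


def decayed_penalty(penalty: float, elapsed: float, half_life: float) -> float:
    """Penalty after 'elapsed' time units, halving once per half-life."""
    return penalty * 0.5 ** (elapsed / half_life)


def time_until_reuse(penalty: float, reuse: float, half_life: float) -> float:
    """Time needed for the penalty to decay down to the Reuse value."""
    if penalty <= reuse:
        return 0.0
    return half_life * math.log2(penalty / reuse)


# Illustrative (assumed) parameters, in minutes
SUPPRESS, REUSE, HALF_LIFE = 2000.0, 750.0, 15.0

penalty = 3 * FLAP_PENALTY            # three flaps in quick succession
print(penalty > SUPPRESS)             # the route is suppressed
print(decayed_penalty(penalty, HALF_LIFE, HALF_LIFE))
print(time_until_reuse(penalty, REUSE, HALF_LIFE))
```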
8.8.2.5 Community Attribute

A community is a set of destination addresses with the same characteristics. It is four bytes
long. The community is in the format of aa:nn or a community number.
l aa:nn: aa indicates an AS number and nn indicates the community identifier defined by
an administrator. The values of aa and nn are configurable, and each ranges from 0 to
65535. For example, if a route is from AS 100 and the community identifier defined by the
administrator is 1, the community is 100:1.
l Community number: It is an integer ranging from 0 to 4294967295. As defined in RFC
1997, numbers from 0 (0x00000000) to 65535 (0x0000FFFF) and from 4294901760
(0xFFFF0000) to 4294967295 (0xFFFFFFFF) are reserved.
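The two notations describe the same 4-byte value: aa occupies the upper two bytes and nn the
lower two. A minimal Python sketch of the conversion and of the reserved ranges quoted above
follows; the function names are hypothetical.

```python
def community_to_int(text: str) -> int:
    """Convert 'aa:nn' notation to the equivalent 32-bit community number."""
    aa_text, nn_text = text.split(":")
    aa, nn = int(aa_text), int(nn_text)
    if not (0 <= aa <= 65535 and 0 <= nn <= 65535):
        raise ValueError("aa and nn must each be in the range 0 to 65535")
    return (aa << 16) | nn


def int_to_community(value: int) -> str:
    """Convert a 32-bit community number to 'aa:nn' notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("community number must fit in four bytes")
    return f"{value >> 16}:{value & 0xFFFF}"


def is_reserved(value: int) -> bool:
    """Reserved ranges: 0x00000000-0x0000FFFF and 0xFFFF0000-0xFFFFFFFF."""
    return value <= 0x0000FFFF or value >= 0xFFFF0000


print(community_to_int("100:1"))
print(int_to_community(6553601))
print(is_reserved(community_to_int("100:1")))
```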
The community attribute is used to simplify the application, maintenance, and management of
routing policies. With the community attribute, a group of BGP peers in multiple ASs can
share the same routing policy. The community attribute is a route attribute. It is transmitted
between BGP peers and is not restricted by the AS. Before advertising a route with the
community attribute to peers, a BGP peer can change the original community attribute of this
route.
The peers in a peer group share the same policy, whereas the routes with the same community
attribute share the same policy.
The well-known communities are described in the next section. Users can also create their
own communities to filter routes.
Well-known Community
Table 8-37 lists the well-known community attributes of BGP routes.

Table 8-37 Well-known communities of BGP routes
Community | Identifier | Description
Internet | 0 (0x00000000) | By default, all routes belong to the Internet community. A route with this attribute can be advertised to all BGP peers.
No_Export | 4294967041 (0xFFFFFF01) | A route with this attribute cannot be advertised outside the local AS. If a confederation is defined, the route with this attribute cannot be advertised to ASs outside the confederation.
No_Advertise | 4294967042 (0xFFFFFF02) | A route with this attribute cannot be advertised to any other BGP peers.
No_Export_Subconfed | 4294967043 (0xFFFFFF03) | A route with this attribute cannot be advertised outside the local AS or to other sub-ASs in the confederation.
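The advertisement rules in Table 8-37 can be condensed into a small decision function. This is
a sketch under stated assumptions: the three peer categories ("ibgp", "confed_ebgp" for a peer
in another sub-AS of the same confederation, and "ebgp" for a peer outside the local AS or
confederation) are labels invented for this illustration.

```python
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02
NO_EXPORT_SUBCONFED = 0xFFFFFF03

PEER_TYPES = ("ibgp", "confed_ebgp", "ebgp")  # hypothetical labels


def may_advertise(communities, peer_type: str) -> bool:
    """Decide whether a route with the given communities may be sent to a peer."""
    if peer_type not in PEER_TYPES:
        raise ValueError(f"unknown peer type: {peer_type}")
    if NO_ADVERTISE in communities:
        return False                      # never advertised to any peer
    if NO_EXPORT_SUBCONFED in communities and peer_type != "ibgp":
        return False                      # stays inside the local (sub-)AS
    if NO_EXPORT in communities and peer_type == "ebgp":
        return False                      # stays inside the AS or confederation
    return True


for peer in PEER_TYPES:
    print(peer, may_advertise({NO_EXPORT}, peer))
```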
Usage Scenario
In Figure 8-88, EBGP connections are established between ATN B and ATN A, and between
ATN B and ATN C. With the No_Export community attribute configured on ATN A, routes that
AS 10 advertises to AS 20 are not advertised by AS 20 to other ASs.
Figure 8-88 Networking for BGP communities
8.8.2.6 Route Reflector

Generally, fully meshed connections need to be established between IBGP peers to ensure the
connectivity between them. If there are n routers in an AS, n(n-1)/2 IBGP connections need
to be established. If many IBGP peers exist, network resources and Central Processing Unit
(CPU) resources are greatly consumed. To solve this problem, route reflection is introduced in
the network.
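The growth of the full mesh can be checked with a few lines of Python. For comparison, the
sketch also counts the sessions in the simplest route reflection layout, assumed here to be a
single RR with every other router as its client; that layout and the function names are
assumptions for illustration.

```python
def full_mesh_sessions(n: int) -> int:
    """IBGP sessions needed for a full mesh of n routers: n(n-1)/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


def single_rr_sessions(n: int) -> int:
    """Sessions when one router is the RR and the other n-1 routers are its clients."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return max(n - 1, 0)


for n in (5, 10, 100):
    print(n, full_mesh_sessions(n), single_rr_sessions(n))
```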
In an AS, one router serves as a Route Reflector (RR) and the other routers serve as clients.
The clients establish IBGP connections with the RR. The RR and its clients form a cluster.
The RR reflects routes between clients, and clients do not need to establish BGP connections.
A BGP device that functions as neither an RR nor a client is called a non-client. A non-client
must establish a fully meshed connection with an RR and with all the other non-clients, as
shown in Figure 8-89.
Figure 8-89 Networking for Route Reflector
Applications
After receiving routes from peers, an RR selects the optimal route based on BGP route
selection policies. The RR advertises the learned routes to its IBGP peers according to the
rules defined in RFC 2796.
l After learning routes from a non-client IBGP peer, the RR advertises the routes to all the
clients.
l After learning routes from a client, the RR advertises the routes to all the other clients
and all non-clients.
l After learning routes from an EBGP peer, the RR advertises the routes to all clients and
non-clients.
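The three advertisement rules above can be expressed as one function that returns the set of
IBGP peers to which the RR sends a route. The peer names and the string labels for the route
source are hypothetical.

```python
def reflection_targets(learned_from: str, sender: str, clients, non_clients) -> set:
    """Return the IBGP peers to which the RR advertises a route.

    learned_from is "non_client", "client", or "ebgp" (labels assumed for this sketch);
    sender is the peer from which the route was learned.
    """
    clients, non_clients = set(clients), set(non_clients)
    if learned_from == "non_client":
        return clients                                # all clients
    if learned_from == "client":
        return (clients | non_clients) - {sender}     # other clients and all non-clients
    if learned_from == "ebgp":
        return clients | non_clients                  # all clients and non-clients
    raise ValueError(f"unknown route source: {learned_from}")


clients = {"Client1", "Client2", "Client3"}
non_clients = {"NonClient1"}
print(sorted(reflection_targets("client", "Client1", clients, non_clients)))
print(sorted(reflection_targets("non_client", "NonClient1", clients, non_clients)))
```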
An RR is easy to configure because the configuration is needed only on the device that
functions as a reflector. Clients do not need to know that they are clients.
On some networks, if clients of an RR establish fully meshed connections between each other,
they can exchange routing information directly. In this case, route reflection between clients is
unnecessary and occupies bandwidth needlessly. On the ATN, you can run the undo reflect
between-clients command to disable route reflection between clients, but routes between
clients and non-clients can still be reflected. Route reflection between clients is enabled by
default.

Originator_ID
The Originator_ID attribute and Cluster_List attribute are defined in RFC 2796. They are
used to detect and prevent routing loops.
The Originator_ID attribute is four bytes long and is generated by an RR. It carries the router
ID of the originator of the route in a local AS.
l When a route is reflected by an RR for the first time, the RR adds the Originator_ID
attribute to the route to identify the originating router. If a route already has the
Originator_ID attribute, the RR does not create a new Originator_ID.
l When another BGP speaker receives the route, it compares the Originator_ID added to
the route with the local router ID. If the Originator_ID and local router ID are the same,
the BGP speaker ignores the route.
Cluster_List
To avoid routing loops between ASs, a BGP router uses the AS_Path attribute to record the
ASs that a route passes through. The router discards a route whose AS_Path contains the
local AS number. To avoid routing loops within an AS, a BGP router does not advertise routes
learned from an IBGP peer to other IBGP peers.
An RR, however, relies on IBGP peers being able to advertise to each other the routes
learned from the local AS. The Cluster_List attribute is therefore introduced to avoid
routing loops within an AS.
An RR and its clients form a cluster. In an AS, each cluster is uniquely identified by a
Cluster_ID.
To avoid routing loops, the RR uses the Cluster_List attribute to record the Cluster_IDs of
all the clusters that a route passes through. The Cluster_List is composed of a series of
Cluster_IDs, is similar to the AS_Path list, and is generated by an RR.
l When an RR reflects routes between its clients or between its clients and non-clients, the
RR adds the local Cluster_ID to the top of the Cluster_List. If the Cluster_List is empty,
the RR creates a new one.
l When receiving an updated route, the RR checks its Cluster_List. If the Cluster_List
contains the local Cluster_ID, the RR discards the received route. If the Cluster_List
does not contain the local Cluster_ID, the RR adds the local Cluster_ID to the
Cluster_List, and then reflects the updated route.
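The Originator_ID and Cluster_List checks can be sketched together. The Route class, the
function names, and the example router IDs are all hypothetical; the sketch models only the
loop-prevention logic described above, not a full BGP implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)


def reflect(route: Route, local_cluster_id: str, source_router_id: str) -> Optional[Route]:
    """Reflect a route, or return None if the Cluster_List shows a loop."""
    if local_cluster_id in route.cluster_list:
        return None  # the route already passed through this cluster
    return Route(
        prefix=route.prefix,
        # Keep an existing Originator_ID; create one only on the first reflection.
        originator_id=route.originator_id or source_router_id,
        # The local Cluster_ID goes to the top of the Cluster_List.
        cluster_list=[local_cluster_id] + route.cluster_list,
    )


def accepts(route: Route, local_router_id: str) -> bool:
    """A speaker ignores a route whose Originator_ID is its own router ID."""
    return route.originator_id != local_router_id


original = Route("10.0.0.0/8")
first = reflect(original, "1.1.1.1", source_router_id="9.9.9.9")
second = reflect(first, "2.2.2.2", source_router_id="1.1.1.1")
print(second)
print(reflect(second, "1.1.1.1", source_router_id="2.2.2.2"))  # loop detected
print(accepts(second, "9.9.9.9"))                              # originator ignores it
```

The same check explains the backup RR case: two RRs that share a Cluster_ID discard each
other's reflected routes.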
Backup RR
To enhance network reliability and avoid a single point of failure, more than one RR needs
to be configured in a cluster. To avoid routing loops, RRs in the same cluster have the
same Cluster_ID. On the ATN, you can run the reflector cluster-id command to configure the
same Cluster_ID for all RRs in a cluster.
In the redundant environment, clients can receive multiple routes to the same destination from
different RRs. Clients then apply route selection policies to select the optimal route.
In Figure 8-90, RR1 and RR2 are in the same cluster. RR1 and RR2 establish an IBGP
connection. That is, the two RRs are non-clients.

Figure 8-90 Backup RR
l When Client 1 receives an updated route from an external peer, it advertises the route to
RR1 and RR2 through IBGP.
l After it receives the updated route, RR1 reflects the route to other clients (Client 2 and
Client 3) and non-clients (RR2) and adds the local Cluster_ID to the top of the
Cluster_List.
l After receiving the reflected route, RR2 checks the Cluster_List. RR2 finds that its
Cluster_ID is contained in the Cluster_List; therefore, it discards the updated route and
does not reflect the route to its clients.
NOTE
Application of the Cluster_List ensures that routing loops do not occur between RRs in the same AS.
Multiple Clusters in an AS
Multiple clusters can exist in an AS. RRs are IBGP peers of each other. An RR can be
configured as a client or non-client of another RR. The relationship between clusters in an AS
can be configured flexibly.
For example, a backbone network is divided into multiple reflection clusters. Each RR is
configured as a non-client of the other RRs, which are fully meshed. Each client establishes
IBGP connections with only the RRs in the same cluster. All BGP devices in the AS can
receive the reflected routes, as shown in Figure 8-91.

Figure 8-91 Multiple clusters in an AS
Hierarchical Reflector
In actual deployments, the hierarchical reflector is the most commonly used RR scenario.
In Figure 8-92, an Internet Service Provider (ISP) provides Internet routes for AS 100. Two
EBGP connections are established between the ISP and AS 100. AS 100 is divided into two
clusters. The four ATNs in Cluster 1 are core routers.
l Two Level-1 RRs (RR-1) are deployed in Cluster 1. This redundant structure ensures
reliability of the AS100 core layer. The other two ATNs at the core layer serve as clients
of RR-1.
l One Level-2 RR (RR-2) is deployed in Cluster 2. RR-2 is the client of RR-1.

Figure 8-92 Hierarchical reflector
NOTE
On a network with RRs, if the preferred BGP routes are not needed to guide packet forwarding,
you can configure the BGP-RIB-only feature. The preferred BGP routes are then neither added
to the IP routing table nor delivered to the forwarding layer, which improves forwarding
efficiency and expands system capacity.
8.8.2.7 BGP Confederation

Besides RR, BGP confederation can also reduce IBGP connections in an AS. It divides an AS
into several sub-ASs. Fully meshed IBGP connections are established in each sub-AS, and
fully meshed EBGP connections are established between sub-ASs, as shown in Figure 8-93.

Figure 8-93 Confederation
BGP speakers outside the confederation (such as the devices in AS 100) are unaware of the
sub-ASs (AS 65001, AS 65002, and AS 65003) in the confederation. The external
devices do not need to know the topology of each sub-AS. The confederation ID is the AS
number that is used to identify the entire confederation. As shown in Figure 8-93, AS 200 is
the confederation ID.
In Figure 8-93, AS 200 has multiple BGP devices. To reduce IBGP connections, AS 200 is
divided into three sub-ASs: AS 65001, AS 65002, and AS 65003. In AS 65001, a full mesh of
IBGP connections is established among the three devices.
Applications and Limitations

The confederation needs to be configured on each device, and any device that joins the
confederation must support the confederation function.
If devices that are not in a confederation are reconfigured to form a confederation, the
logical topology changes accordingly.
On large-scale BGP networks, the RR and confederation can both be used.
NOTE
4-byte AS numbers do not support confederations, which may incur routing loops. Therefore, old BGP
speakers with 2-byte AS numbers and new speakers with 4-byte AS numbers cannot exist in the same
confederation.
8.8.2.8 BGP GR
Graceful restart (GR) is one of the high availability (HA) technologies, a comprehensive set
of technologies that includes fault-tolerant redundancy, link protection, faulty node
recovery, and traffic engineering. As a fault-tolerant redundancy technology, GR ensures
normal forwarding of data during the restart of routing protocols to prevent interruptions of
key services. Currently, GR is widely applied to master/slave switchovers and system
upgrades.
GR is usually used when the active route processor (RP) fails because of a software or
hardware error, or when an administrator performs a master/slave switchover.
Prerequisite for Implementation

On a traditional routing device, a processor implements both control and forwarding. The
processor finds routes based on routing protocols, and maintains the routing table and
forwarding table of the device. Mid-range and high-end devices generally adopt the multi-RP
structure to improve forwarding performance and reliability. The processor in charge of
routing protocols is located on the main control board, whereas the processor responsible for
data forwarding is located on the interface board. The design helps to ensure the continuity of
packet forwarding on the interface board during the restart of the main processor. The
technology that separates control from forwarding satisfies the prerequisite for GR
implementation.
Related Concepts
The concepts related to GR are as follows:
NOTE
The ATN device can only function as a GR Helper.
l GR Restarter: a device that performs a master/slave switchover triggered by the
administrator or a failure. A GR Restarter must support GR.
l GR Helper: the neighbor of a GR Restarter. A GR Helper must support GR.
l GR session: a session through which a GR Restarter and a GR Helper negotiate GR
capabilities.
l GR time: the period during which the GR Helper, after finding that the GR Restarter is
Down, keeps the topology information or routes obtained from the GR Restarter.
l End-of-RIB (EOR): a BGP message that notifies a peer that the initial route update
after the negotiation is finished.
l EOR timer: the maximum time that a local device waits for the EOR message from the
peer. If the local device does not receive the EOR message from the peer before the EOR
timer expires, the local device selects the optimal routes from the current routes.
Principles
Principles of BGP GR are as follows:
l During BGP peer relationship establishment, devices negotiate GR capabilities by
sending supported GR capabilities to each other.
l When detecting the master/slave switchover of the GR Restarter, a GR Helper does not
delete the routing information and forwarding entries related to the GR Restarter within
the GR time, but waits to re-establish a BGP connection with the GR Restarter.
NOTE
If the GR Helper sends Keepalive packets to the GR Restarter but receives no reply before
the hold timer expires, the GR Helper enters the GR state and marks the routes received
from the GR Restarter as Stale. A restart of the GR Restarter can also trigger the GR Helper
to enter the GR state.
l After the master/slave switchover, the GR Restarter receives routes from all the peers
with which it negotiated GR capabilities before the switchover, and starts the EOR timer.
The GR Restarter selects routes when either of the following conditions is met:
a. The GR Restarter receives the EOR message from all peers, and the EOR timer is
deleted.
b. The EOR timer expires before the GR Restarter has received the EOR message from
all peers.
l The GR Restarter sends the optimal routes to the GR Helper, and the GR Helper starts the
EOR timer. The GR Helper quits GR when either of the following conditions is met:
a. The GR Helper receives the EOR message from the GR Restarter, and the EOR
timer is deleted.
b. The EOR timer expires before the GR Helper has received the EOR message from the
GR Restarter.
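The condition under which the GR Restarter starts route selection (conditions a and b above)
reduces to a single boolean test, sketched below. The function and peer names are
hypothetical; the GR Helper's exit condition has the same shape with a single peer.

```python
def ready_to_select_routes(gr_peers, eor_received_from, eor_timer_expired: bool) -> bool:
    """True when EOR has arrived from every GR peer, or when the EOR timer has expired."""
    all_eor_received = set(gr_peers) <= set(eor_received_from)
    return all_eor_received or eor_timer_expired


peers = {"PeerA", "PeerB"}
print(ready_to_select_routes(peers, {"PeerA"}, eor_timer_expired=False))           # still waiting
print(ready_to_select_routes(peers, {"PeerA", "PeerB"}, eor_timer_expired=False))  # condition a
print(ready_to_select_routes(peers, {"PeerA"}, eor_timer_expired=True))            # condition b
```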
GR Reset
Currently, BGP does not support dynamic capability negotiation. Therefore, each time a new
BGP capability (such as the IPv4, IPv6, VPNv4, and VPNv6 capabilities) is enabled on a
BGP speaker, the BGP speaker tears down existing sessions with its peer and renegotiates
BGP capabilities. This process will interrupt ongoing services.
To prevent the service interruptions, the ATN provides the GR reset function that enables the
ATN to reset a BGP session in GR mode. With the GR reset function configured, when you
enable a new BGP capability on the BGP speaker, the BGP speaker enters the GR state, resets
the BGP session, and renegotiates BGP capabilities with the peer. Throughout the process, the
BGP speaker re-establishes the existing sessions but does not delete the routing entries for
them, so existing services are not interrupted.
Benefits
BGP GR ensures uninterrupted forwarding. In addition, the flapping of BGP occurs only on
the peers of the GR Restarter. This is important for BGP that needs to process a large number
of routes.
8.8.2.9 BGP Security
BGP Authentication
BGP uses TCP as the transport layer protocol. To enhance BGP security, you can configure
Message Digest 5 (MD5) authentication or Keychain authentication for the TCP connection
when it is set up. MD5 or Keychain authentication, however, does not authenticate BGP
packets. Instead, it sets the authentication password for the TCP connection, and the
authentication is performed by TCP. If authentication fails, the TCP connection cannot
be established.

GTSM of BGP
The Generalized TTL Security Mechanism (GTSM) defends against attacks by checking the
time to live (TTL) value (maximum number of routers through which a packet can pass). If an
attacker simulates real BGP packets and keeps sending them to a router, an interface board on
the router receives the packets and sends them directly to the main control board for BGP
processing, without checking the validity of the packets. When the router is tied up in
processing these packets, Central Processing Unit (CPU) usage is high.
The GTSM checks whether the TTL value in the IP packet header is within a pre-defined
value range. To enhance system security, the GTSM can protect services above the IP layer.
After the GTSM of BGP is enabled, an interface board checks the TTL values carried in all
BGP packets. As required by the actual networking, packets whose TTL values are not within
the specified range are either allowed to pass or discarded by the GTSM. To configure the
GTSM to discard packets by default, you need to set an appropriate TTL value range
according to the network topology. Packets whose TTL values are not within the specified
range are then discarded, which defends against attacks by bogus BGP packets.
You can also enable the log function to record information when GTSM drops packets to help
you locate faults.
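The TTL range check can be sketched as follows. The text does not define how the range is
derived, so this sketch assumes a common convention: the sender transmits BGP packets with a
TTL of 255, and a peer at most valid_hops hops away is accepted, giving the range
[255 - valid_hops + 1, 255]. The parameter name valid_hops and this convention are
assumptions for illustration.

```python
MAX_TTL = 255  # assumed initial TTL of packets sent by a legitimate peer


def gtsm_accepts(ttl: int, valid_hops: int) -> bool:
    """True if the TTL falls within the range allowed for a peer valid_hops hops away."""
    if not 1 <= valid_hops <= MAX_TTL:
        raise ValueError("valid_hops must be in the range 1 to 255")
    lowest_valid_ttl = MAX_TTL - valid_hops + 1
    return lowest_valid_ttl <= ttl <= MAX_TTL


print(gtsm_accepts(255, valid_hops=1))  # directly connected peer
print(gtsm_accepts(254, valid_hops=1))  # packet crossed a router: outside the range
print(gtsm_accepts(254, valid_hops=2))  # peer allowed to be two hops away
```

Because each router on the path decrements the TTL, a remote attacker cannot produce a packet
that arrives with a TTL inside a narrow range near 255.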
8.8.2.10 BFD for BGP

Bidirectional Forwarding Detection (BFD) provides fast link fault detection for BGP. BGP
periodically sends messages to the peer to detect the status of the peer, but this detection
takes longer than one second. When the data transmission rate reaches the level of Gbit/s,
such slow detection causes a large amount of data to be lost.
To meet the requirement for high reliability of carrier-class networks, BFD for BGP is
introduced to detect faults on the links between BGP peers in milliseconds and notify BGP
of the faults so that routes can converge quickly.
Networking
In Figure 8-94, ATN A belongs to AS 100, ATN B belongs to AS 200, and the EBGP peer
relationship is established between them.
BFD is enabled to monitor the BGP peer relationship between ATN A and ATN B. When the link
between ATN A and ATN B becomes faulty, BFD quickly detects the fault and notifies BGP.
Figure 8-94 BFD for BGP
8.8.2.11 BGP Peer Tracking

After BGP peer tracking is enabled on BGP peers, when a link between the BGP peers fails,
one BGP peer rapidly detects the unreachability of its peer, terminates the connection, and
deletes the routes received from the peer, which speeds up route convergence.
To ensure network stability, configure a proper delay for BGP to terminate a connection after
it detects that the associated peer is unreachable.
l If the delay is set to 0, BGP terminates the connection between the local device and its
peer immediately after detecting that the peer is unreachable.
l If IGP route flapping occurs and the delay is set to 0, the peer relationship between the
local device and its peer alternates between Up and Down. Therefore, setting the delay to
a value greater than the IGP route convergence time is recommended.
l If BGP peers have successfully negotiated GR and a master/slave control board switchover
is performed on a peer, set the delay to a value greater than the GR convergence time to
prevent a GR failure. If the delay is less than the GR convergence time, the connection
between the local device and its BGP peer is terminated, which leads to a GR failure.
BGP peer tracking can speed up network convergence and is easy to deploy. However, BGP
route convergence on a network configured with BGP peer tracking is slower than that on a
network enabled with BFD. BGP peer tracking cannot meet the requirements of voice
services that require a high convergence speed.
Networking
In Figure 8-95, an IBGP peer relationship is established between ATN A and ATN C. BGP
peer tracking is configured on ATN A. If the link between ATN A and ATN B fails, ATN A
detects that ATN C is unreachable after IGP fast route convergence and then terminates the
BGP connection with ATN C.
Figure 8-95 Networking for BGP peer tracking
8.8.2.12 BGP Auto FRR

As a protection measure against link faults, BGP Auto Fast Reroute (FRR) is applicable
to networks with primary and backup links. With BGP Auto FRR, traffic can be switched
between two BGP peers or next hops in less than a second.
With BGP Auto FRR, if a device has multiple routes with the same prefix that are learned from
different peers, the device uses the optimal route as the primary link to forward packets and
the less optimal route as a backup link. If the primary link fails, the device rapidly
notifies other peers that the BGP route has become unreachable and then switches traffic from
the primary link to the backup link.

Usage Scenario
In Figure 8-96, ATN Y advertises a learned BGP route to ATN X2 and ATN X3 in AS 100;
ATN X2 and ATN X3 then advertise the BGP route to ATN X1 through the reflector. ATN X1
therefore receives two routes whose next hops are ATN X2 and ATN X3 respectively. Then,
ATN X1 selects a route according to the configured policy. Assume that the route received
from ATN X2 (Link A) is preferred. Link B then functions as the backup link.
Figure 8-96 Networking for BGP Auto FRR
When a device along Link A fails or faults occur on Link A, the next hop of the route to ATN
X2 becomes invalid on ATN X1. If BGP Auto FRR is enabled on ATN X1, the forwarding
plane then quickly switches traffic sent from ATN X1 to ATN Y to Link B, which ensures
uninterrupted traffic transmission. In addition, ATN X1 reselects the route received from ATN
X3 according to the forwarding prefixes and then updates the FIB table.
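The switchover described above can be sketched in Python. This is a simplified illustration, not the device implementation: the `local_pref` field stands in for the full BGP route selection process, the prefix 10.0.0.0/8 is hypothetical, and the next hop addresses are the loopback addresses read from Figure 8-96.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpRoute:
    prefix: str
    next_hop: str
    local_pref: int  # assumed stand-in for the full BGP decision process


def select_primary_and_backup(routes):
    """Return (primary, backup); backup is None if only one route exists."""
    ranked = sorted(routes, key=lambda r: r.local_pref, reverse=True)
    primary = ranked[0]
    backup = ranked[1] if len(ranked) > 1 else None
    return primary, backup


def forwarding_next_hop(primary, backup, reachable):
    """Pick the next hop that the forwarding plane would use right now."""
    if primary.next_hop in reachable:
        return primary.next_hop
    if backup is not None and backup.next_hop in reachable:
        return backup.next_hop
    return None


routes = [
    BgpRoute("10.0.0.0/8", "2.2.2.2", local_pref=200),  # via ATN X2 (Link A)
    BgpRoute("10.0.0.0/8", "3.3.3.3", local_pref=100),  # via ATN X3 (Link B)
]
primary, backup = select_primary_and_backup(routes)
print(forwarding_next_hop(primary, backup, {"2.2.2.2", "3.3.3.3"}))  # Link A is up
print(forwarding_next_hop(primary, backup, {"3.3.3.3"}))             # Link A failed
```

Because the backup next hop is computed in advance, the switch to Link B does not have to wait for route reselection.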
8.8.2.13 BGP ORF

Outbound Route Filtering (ORF) is used to enable a BGP device to send to its BGP peer a set
of routing policies that can be used by its peer to filter out unwanted routes during route
advertisement.
A user wants a carrier to send only the routes that the user requires, whereas the carrier
does not want to maintain a separate outbound policy for every user. ORF is developed to
meet both requirements. ORF supports on-demand route advertisement, which reduces
bandwidth consumption and configuration effort.
l Prefix-based ORF
Prefix-based ORF, defined in RFC 5291 and RFC 5292, can be used to send prefix-based
inbound policies configured by users to a carrier through Route-Refresh messages. The
carrier then filters out unwanted routes during route advertisement based on the received
inbound policies. This prevents users from receiving a large number of unwanted routes
and saves resources.
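The idea can be sketched in Python as follows. The sketch assumes a simplified matching rule (a route is advertised if it falls inside any prefix received from the peer); real ORF entries also carry sequence numbers, permit or deny actions, and length ranges, which are omitted here. The function names and prefixes are hypothetical.

```python
import ipaddress


def build_orf_entries(inbound_prefixes):
    """Receiving side: turn the local inbound prefix policy into ORF entries."""
    return [ipaddress.ip_network(p) for p in inbound_prefixes]


def advertise_with_orf(candidate_routes, orf_entries):
    """Sending side: advertise only routes covered by a received ORF entry."""
    advertised = []
    for route in candidate_routes:
        net = ipaddress.ip_network(route)
        if any(net.subnet_of(entry) for entry in orf_entries):
            advertised.append(route)
    return advertised


# The user wants only routes inside 10.0.0.0/8 (hypothetical policy).
entries = build_orf_entries(["10.0.0.0/8"])
print(advertise_with_orf(["10.1.1.0/24", "192.168.0.0/16", "10.2.0.0/16"], entries))
```

The filtering happens on the sender, so unwanted routes never cross the link.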
Applications
As shown in Figure 8-97, ATN A and ATN B are directly connected and are enabled with
prefix-based ORF. After negotiating the prefix-based ORF capability with ATN B, ATN A
adds the local prefix-based inbound policy to a Route-Refresh message and sends the
message to ATN B. Based on the received Route-Refresh message, ATN B works out an
outbound policy for advertising routes to ATN A.
Figure 8-97 Applying ORF to directly connected BGP peers
[Figure: ATN A in AS 100 and ATN B in AS 200]
As shown in Figure 8-98, there is an RR in the domain, and ATN A and ATN B are the
clients of the RR. ATN A, ATN B, and the RR are enabled with prefix-based ORF. After
negotiating prefix-based ORF with the RR, ATN A and ATN B add their local prefix-based
inbound policies to Route-Refresh messages and send the messages to the RR. Based on
the Route-Refresh messages received from ATN A and ATN B, the RR works out the
associated outbound policies for reflecting routes to ATN A and ATN B.
Figure 8-98 Applying ORF to a domain with an RR
[Figure: an RR with two clients, ATN A and ATN B]

8.8.2.14 Active-Route-Advertise
BGP advertises only optimal routes to peers. In earlier versions, only the routes preferred by
the routing management layer were advertised. Active-route-advertise is designed to provide
compatibility with that earlier behavior.
By default, when a route is preferred by BGP, the route can be advertised to peers. When
active-route-advertise is configured, only the route preferred by BGP and also active on the
routing management layer is advertised to peers.
NOTE
Imported routes are active in the IP routing table and are not restricted by the active-route-advertise
command.
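The advertisement rule can be expressed as a small Python predicate. This is an illustrative sketch only; the function and parameter names are hypothetical and do not correspond to device internals.

```python
def should_advertise(bgp_preferred, active_in_rm, active_route_advertise=False):
    """Decide whether a BGP route is advertised to peers.

    bgp_preferred: the route is the one preferred by BGP.
    active_in_rm: the route is active on the routing management layer.
    active_route_advertise: whether the feature is configured.
    """
    if not bgp_preferred:
        return False
    if active_route_advertise:
        return active_in_rm
    return True


# A BGP-preferred route that is not active in the routing management layer:
print(should_advertise(True, False))                               # default behavior
print(should_advertise(True, False, active_route_advertise=True))  # feature configured
```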
8.8.2.15 BGP Dynamic Update Peer-Groups

With rapid increases in the size of the routing table and the complexity of the network
topology, BGP needs to support more peers. Especially in the case of a large number of peers
and routes, high-performance packaging and forwarding are required when a device needs to
send routes to a large number of BGP peers, most of which share the same outbound policy.
The dynamic update peer-group feature considers all BGP peers with the same outbound
policy as an update-group.
Without the dynamic update peer-groups feature, each route to be sent is grouped per peer.
With the dynamic update peer-groups feature, each route to be sent is grouped once and
then sent to all peers in the update-group, which greatly improves grouping efficiency and
forwarding performance.
Usage Scenario
The BGP dynamic update peer-group feature is applicable to the following scenarios:
l Scenario with an international gateway
l Scenario with an RR
l Scenario where routes received from EBGP peers need to be sent to all IBGP peers
The following figures represent each scenario in turn.

Figure 8-99 Networking for the international gateway
[Figure: labels shown are AS 1000, AS 200, AS 65001, AS 30, Internet Route, AS 100,
AS 65002, and AS 120]

Figure 8-100 Networking for the RR with many clients
[Figure: RR1 and RR2 in AS 100, each with IBGP connections to multiple clients]
Figure 8-101 Networking for a PE connecting multiple IBGP peers
[Figure: ATN-A in AS 100 and ATN-B in AS 200 are connected over EBGP; ATN-C, ATN-D,
ATN-E, and ATN-F in AS 200 are connected over IBGP]
The preceding scenarios have in common that a router needs to send routes to a large number
of BGP peers, most of which share the same outbound policy. This situation is most evident in
the networking shown in Figure 8-100. When a large number of peers and routes exist, the
forwarding efficiency is low.
For example, an RR has 100 clients that share the same outbound policy and needs to reflect
N routes to them. If the RR groups the routes for each peer before sending the routes to the
100 clients, the total number of times that all routes are grouped is N x 100. After the
dynamic update peer-groups feature is applied, the total number of times that all routes are
grouped changes to N x 1. The efficiency is 100 times higher than before.
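The counting argument can be checked with a short Python sketch. The route count of 1000 and the policy name are hypothetical values chosen for illustration; the 100 clients come from the example above.

```python
from collections import defaultdict


def grouping_operations(peer_policies, route_count):
    """Return (per_peer, per_update_group) counts of route grouping operations."""
    update_groups = defaultdict(list)
    for peer, policy in peer_policies.items():
        update_groups[policy].append(peer)  # peers with the same outbound policy
    per_peer = route_count * len(peer_policies)
    per_update_group = route_count * len(update_groups)
    return per_peer, per_update_group


clients = {f"client{i}": "reflect-policy" for i in range(100)}
per_peer, per_group = grouping_operations(clients, route_count=1000)
print(per_peer, per_group, per_peer // per_group)
```

The gain equals the number of peers divided by the number of distinct outbound policies, so it shrinks as peers use more varied policies.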

8.8.2.16 BGP NSR

As networks develop, the triple play services of the Public Switched Telephone Network
(PSTN), TV network, and Internet place increasingly stringent demands on IP networks,
and carriers pose high requirements for IP network reliability. Non-Stop Routing (NSR), a
High Availability (HA) solution, is therefore introduced.
On a device with a slave control plane, NSR ensures that a peer is unaware of a fault on the
control plane of the local device. In this process, the peer relationships set up through
specific routing protocols, MPLS, and other protocols that carry services are not interrupted.
As an HA solution, NSR ensures that user services are not affected, or are minimally
affected, in the case of device failures.
During the master/slave switchover, BGP NSR ensures uninterrupted forwarding and BGP
route advertisement.
8.8.2.17 4-Byte AS Number

2-byte AS numbers used on the network range from 0 to 65535, and the available AS
numbers are nearly exhausted as networks expand. 4-byte AS numbers can address this
problem. New speakers that support 4-byte AS numbers can co-exist with old speakers that
support only 2-byte AS numbers.
The 4-byte AS number extension defines a new capability code and new optional transitive
attributes to negotiate the 4-byte AS number capability and transmit 4-byte AS numbers.
This mechanism enables communication between new speakers and between old speakers
and new speakers. The following terms are used:
l New speaker: a peer that supports 4-byte AS numbers

l Old speaker: a peer that does not support 4-byte AS numbers
l New session: a BGP connection established between new speakers
l Old session: a BGP connection established between a new speaker and an old speaker, or
between old speakers
BGP Extension
Open capability code 0x41, defined for BGP connection negotiation, indicates that the BGP
speaker supports 4-byte AS numbers.
Two new optional transitive attributes, AS4_Path with attribute code 0x11 and
AS4_Aggregator with the attribute code 0x12, are defined to transmit 4-byte AS numbers on
old sessions.
If a new speaker with an AS number greater than 65535 communicates with an old speaker,
the old speaker needs to set the peer AS number to AS_TRANS. AS_TRANS is a reserved
AS number with the value 23456.
Principles
When setting up connections, BGP peers determine whether the peer supports 4-byte AS
numbers according to the optional capability field in Open messages.
l New sessions are set up between new speakers. AS_Path and Aggregator in an Update
message carry 4-byte AS numbers.

l Old sessions are set up between new and old speakers. AS_Path and Aggregator on old
speakers carry 2-byte AS numbers.
– When a new speaker sends an Update message to an old speaker, if the AS number
of the new speaker is greater than 65535, AS4_Path and AS4_Aggregator are used
together with AS_Path and AS_Aggregator to carry 4-byte AS numbers. AS4_Path
and AS4_Aggregator are transparent to the old speaker.
– When receiving messages that contain AS_Path, AS4_Path, AS_Aggregator, and
AS4_Aggregator from an old speaker, a new speaker reconstructs the actual
AS_Path and AS_Aggregator based on the reconstruction algorithm.
Usage Scenario
Figure 8-102 shows old speakers and new speakers. The 4-byte AS number feature, together
with AS4_Path, transmits routing information between the old and new speakers.
Figure 8-102 Networking for the application of 4-byte AS numbers
[Figure: route D=8.0.0.0 originates at ATN-A (old speaker, AS 10) with AS_Path (10).
ATN-B (new speaker, AS 20.1) advertises it with AS_Path (23456, 10) and AS4_Path (20.1, 10).
ATN-D (old speaker, AS 30) advertises it with AS_Path (30, 23456, 10) and AS4_Path (20.1, 10).
ATN-E (new speaker, AS 40.4) advertises it with AS_Path (40.4, 30, 20.1, 10).
ATN-C is a new speaker in AS 50.5.]
In Figure 8-102, route D=8.0.0.0 originates in AS 10 and is advertised to other ASs as
follows:
1. BGP adds AS 10 to the AS_Path list, which becomes (10).
2. When the route passes AS 20.1, the route carries the AS4_Path attribute (20.1, 10) so
that ATN-D (an old speaker) can transmit AS path information that contains 4-byte AS
numbers. ATN-B also adds AS 20.1 to the beginning of the AS_Path list. Because
AS_TRANS (23456) replaces 20.1, the AS_Path list becomes (23456, 10).
3. When the route passes AS 30, ATN-D, an old speaker, transparently transmits AS4_Path
(20.1, 10) to ATN-E. ATN-D then adds AS 30 to the beginning of the AS_Path list, which
becomes (30, 23456, 10).
4. When the route passes AS 40.4, after reconstructing the path from AS_Path and
AS4_Path, BGP adds AS 40.4 to the beginning of the AS_Path list, which becomes
(40.4, 30, 20.1, 10).
The process continues in the same way. After the device in AS 50.5 receives the route, the
device learns the path to AS 10 from the AS_Path list.
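The walk-through can be reproduced with a short Python sketch. It is a simplified model that treats a path as a flat list of AS numbers and ignores path segment types; the helper names are hypothetical. The reconstruction rule used here (keep the leading AS_Path entries that AS4_Path does not cover, then append AS4_Path) follows the general approach of RFC 6793.

```python
AS_TRANS = 23456


def asdot_to_int(text):
    """Convert '20.1' (dotted notation) or '10' (plain notation) to an integer."""
    if "." in text:
        high, low = text.split(".")
        return int(high) * 65536 + int(low)
    return int(text)


def int_to_asdot(asn):
    """Format an AS number; values above 65535 use dotted notation."""
    return str(asn) if asn <= 65535 else f"{asn >> 16}.{asn & 0xFFFF}"


def encode_for_old_speaker(as_path):
    """Return (AS_Path with 2-byte numbers only, AS4_Path) for an old session."""
    two_byte = [asn if asn <= 65535 else AS_TRANS for asn in as_path]
    return two_byte, list(as_path)


def reconstruct(as_path, as4_path):
    """Rebuild the real path on a new speaker from AS_Path and AS4_Path."""
    if len(as4_path) > len(as_path):
        return list(as_path)  # AS4_Path cannot be trusted; fall back to AS_Path
    extra = len(as_path) - len(as4_path)
    return list(as_path[:extra]) + list(as4_path)


# Step 2: ATN-B (AS 20.1) sends the route to the old speaker ATN-D.
as_path, as4_path = encode_for_old_speaker([asdot_to_int("20.1"), 10])
# Step 3: ATN-D (AS 30) prepends its own AS and passes AS4_Path unchanged.
as_path = [30] + as_path
# Step 4: ATN-E (AS 40.4) reconstructs the path and prepends its own AS.
final_path = [asdot_to_int("40.4")] + reconstruct(as_path, as4_path)
print([int_to_asdot(asn) for asn in final_path])
```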
8.8.2.18 Routing Policy-based Next Hop Iteration

BGP needs to iterate indirect next hops. If indirect next hop iteration is not controlled, routes
may be iterated to incorrect forwarding paths. To prevent this problem, a routing policy can be
used to control next hop iteration. The route that fails to match the routing policy is ignored.
Application
On the network shown in Figure 8-103, IBGP peer relationships are established between
ATN A and ATN B, and between ATN A and ATN C through loopback interfaces. ATN A
receives a BGP route with the prefix 10.10.10.10/32 from ATN B and ATN C. The original
next hop of the BGP route received from ATN B is 2.2.2.2. The address of GE 0/2/1 on ATN
A is 2.2.2.10/24.
Figure 8-103 Routing policy–based next hop iteration
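[Figure: in AS 100, ATN A (Loopback0 1.1.1.1/32, GE 0/2/1 with address 2.2.2.10/24),
ATN B (Loopback0 2.2.2.2/32), and ATN C (Loopback0 3.3.3.3/32); the prefix
10.10.10.10/32 is reachable through ATN B and ATN C. Interfaces GE 0/2/2 and GE 0/2/3
are also shown.]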
When ATN B runs normally, the BGP route with the prefix 10.10.10.10/32 is iterated to the
IGP route 2.2.2.2/32. If ATN B fails, the IGP route 2.2.2.2/32 becomes unavailable, which
triggers new route iteration. ATN A searches the IP routing table based on the original next
hop 2.2.2.2 and uses the route 2.2.2.0/24 for iteration. However, the expected behavior is
that the route with the next hop 3.3.3.3 is used when 2.2.2.2 is unreachable.

In this situation, you can configure a routing policy with the mask length of the route to the
original next hop as the matching condition to control the route iteration. In this example, you
can configure a routing policy so that the route with the original next hop 2.2.2.2 depends on
only the IGP route 2.2.2.2/32.
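The effect of the policy can be illustrated with a Python sketch that performs a longest-prefix lookup for the original next hop. The function name and the `required_mask_len` parameter are hypothetical; the routing table contents are taken from the example above.

```python
import ipaddress


def iterate_next_hop(next_hop, routing_table, required_mask_len=None):
    """Return the route used to resolve next_hop, or None if iteration fails.

    Routes that do not satisfy the policy (here: a required mask length)
    are ignored before the longest-prefix match is made.
    """
    addr = ipaddress.ip_address(next_hop)
    matches = [ipaddress.ip_network(r) for r in routing_table]
    matches = [n for n in matches if addr in n]
    if required_mask_len is not None:
        matches = [n for n in matches if n.prefixlen == required_mask_len]
    if not matches:
        return None
    return str(max(matches, key=lambda n: n.prefixlen))


normal = ["2.2.2.2/32", "2.2.2.0/24", "3.3.3.3/32"]
atn_b_failed = ["2.2.2.0/24", "3.3.3.3/32"]  # the IGP route 2.2.2.2/32 is gone

print(iterate_next_hop("2.2.2.2", normal))                             # uses the /32 route
print(iterate_next_hop("2.2.2.2", atn_b_failed))                       # falls back to the /24 route
print(iterate_next_hop("2.2.2.2", atn_b_failed, required_mask_len=32)) # iteration fails
```

When iteration fails, the BGP route learned from ATN B becomes unusable, which leaves the route learned from ATN C available for selection.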

Terms
Term Definition
BGP Border Gateway Protocol. A dynamic routing protocol used between ASs. Unlike IGPs
such as OSPF and RIP, BGP controls route transmission and selects optimal routes
rather than discovering or calculating routes.
BFD Bidirectional Forwarding Detection. A common fast detection mechanism that uses
Hello packets for detection. BFD can quickly detect link status changes and notify
protocols of the changes. In this manner, protocols can determine whether to maintain
or tear down peer relationships.

Abbreviation
RM Routing Management
AS Autonomous System
EGP Exterior Gateway Protocol
IBGP Internal BGP
EBGP External BGP
CE Customer Edge
PE Provider Edge
P Provider
MPLS Multiprotocol Label Switching

NLRI Network Layer Reachability Information
CIDR Classless Inter-Domain Routing
RR Route Reflector
RIB Routing Information Base
MP-BGP Multiprotocol Extensions for BGP
GR Graceful Restart
GTSM Generalized TTL Security Mechanism
TTL Time to Live
8.9 Routing Policies
8.9.1 Introduction to Routing Policies
Definition
Routing policies are used to filter routes and set attributes for routes. Changing route
attributes (including reachability) changes the path that network traffic passes through.
NOTE
The difference between a routing policy and policy-based routing (PBR) is as follows:
l Routing policies apply to routes. Based on routing protocols, the result of route generation,
advertisement, and selection is changed by following rules, changing parameters, or using control
modes. That is, the contents in the routing table are changed.
l PBR applies to data packets. PBR provides a means to route or forward data packets flexibly based
on predefined policies instead of following the routes in the existing routing table.
For details about PBR, see Feature Description - IP Services.
Purpose
When advertising, receiving, and importing routes, the ATN implements certain policies
based on actual networking requirements to filter routes and change the attributes of the
routes. Routing policies serve the following purposes:
l Control route advertising

Only routes that match the rules specified in a policy are advertised.
l Control route receiving
Only the required and valid routes are received. This reduces the size of the routing table
and improves network security.
l Filter and control imported routes

A routing protocol may import routes discovered by other routing protocols. Only routes
that satisfy certain conditions are imported to meet the requirements of the protocol.
l Modify attributes of specified routes
Attributes of the routes that are filtered by a routing policy are modified to meet the
requirements of the local device.
l Configure fast reroute (FRR)
If a backup next hop and a backup outbound interface are configured for the routes that
match a routing policy, IP FRR, VPN FRR, and IP+VPN FRR can be implemented.
Benefits
This feature brings the following benefits:
l Controls the size of the routing table, saving system resources.
l Controls route receiving and advertising, improving network security.
l Modifies attributes of routes for proper traffic planning, improving network
performance.
l Improves network reliability using FRR.
8.9.2 Principles
8.9.2.1 Basic Principles of Routing Policies

You can implement routing policies in the following steps:
1. Define rules. Define features of routing information to which routing policies are
applied. That is, you need to define a set of matching rules regarding different attributes
of routing information such as the destination address and AS number.
2. Implement the rules. Apply matching rules to the routing policies to advertise, receive,
and import desired routes.
Currently, the following filters are available for routing protocols:
l Access Control List (ACL)

l IP prefix list
l AS_Path
l Community
l Extended community
l Route distinguisher (RD)
ACL
There are ACLs for IPv4 packets. When defining an ACL, you can specify an IP address and
a subnet range against which the destination network segment address or the next hop address
of a route is matched.
IP Prefix List
There are IP prefix lists for IPv4 routes.

An IP prefix list is identified by its name. Each IP prefix list can contain multiple entries.
Each entry can independently specify a matching range in the form of a network prefix. The
matching range is identified by an index number that specifies the matching sequence.
During route matching, the device checks entries identified by the index number in ascending
order. If a route matches an entry, the route is not matched against the next entry.
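The first-match behavior can be sketched in Python. The `action` field and the matching rule (a route matches an entry if it falls inside the entry's prefix) are simplifying assumptions made for illustration; the entry values are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixEntry:
    index: int    # determines the matching sequence
    action: str   # "permit" or "deny" (assumed field)
    network: str  # matching range in the form of a network prefix


def match_prefix_list(entries, route):
    """Check entries in ascending index order; stop at the first match."""
    net = ipaddress.ip_network(route)
    for entry in sorted(entries, key=lambda e: e.index):
        if net.subnet_of(ipaddress.ip_network(entry.network)):
            return entry.index, entry.action
    return None  # no entry matched


entries = [
    PrefixEntry(20, "permit", "10.0.0.0/8"),
    PrefixEntry(10, "deny", "10.1.0.0/16"),
]
print(match_prefix_list(entries, "10.1.2.0/24"))  # index 10 is checked first
print(match_prefix_list(entries, "10.2.0.0/16"))  # only index 20 matches
```

Because matching stops at the first hit, more specific entries need lower index numbers than the broader entries that contain them.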
AS_Path
Each BGP route contains an AS_Path attribute. AS_Path filters specify matching rules
regarding AS_Path attributes. AS_Path filters are applicable only to BGP.
For more information about the AS_Path attribute, refer to RFC 4271.
Community
Community filters are applicable only to BGP. Each BGP route contains a community
attribute to identify a community. Community filters specify matching rules regarding
community attributes.
For more information about the community attribute, refer to RFC 1997.
Extended Community
Extended community filters are applicable only to BGP. Currently, Huawei devices support
route filtering only through VPN route-target (RT) extended community attributes.
RD
RD filters are applicable only to BGP. RD filters specify matching rules regarding VPN RD
attributes.
Route-Policy
Matching rules are the core of route-policies.
A route-policy can use the preceding filters to define its matching rules. A route-policy can
consist of multiple nodes, and the relationship between these nodes is OR. The system checks
the nodes based on index numbers. If a route matches a node in the route-policy, the route is
not matched against the next node.
Each node comprises a set of if-match and apply clauses. The if-match clauses define the
matching rules that are used to filter certain route attributes. The relationship between the if-
match clauses of a node is AND. A route matches a node only when the route matches all the
matching rules defined by the if-match clauses of the node. The apply clauses specify
actions. When a route matches a node, the apply clauses set certain attributes for the route.
Nodes have the following matching modes:
l Permit: If a route matches all the if-match clauses of a node, the route matches the route-
policy, and all the actions defined by apply clauses are performed on the route. If a route
fails to match any one of the if-match clauses of a node, the route is matched against the
next node.
l Deny: If a route matches all the if-match clauses of a node, the route is denied and is not
matched against the next node.
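The node evaluation logic can be sketched in Python. The data layout, the ascending evaluation order, and the denial of routes that match no node are assumptions made for this illustration; the clauses and attribute values are hypothetical.

```python
def evaluate_route_policy(nodes, route):
    """Return the (possibly modified) route if permitted, or None if denied."""
    for node in sorted(nodes, key=lambda n: n["index"]):       # OR between nodes
        if all(clause(route) for clause in node["if_match"]):  # AND within a node
            if node["mode"] == "deny":
                return None
            result = dict(route)
            for action in node["apply"]:
                action(result)
            return result
    return None  # assumption: a route that matches no node is denied


nodes = [
    {"index": 10, "mode": "deny",
     "if_match": [lambda r: r["prefix"].startswith("192.168.")],
     "apply": []},
    {"index": 20, "mode": "permit",
     "if_match": [lambda r: 100 in r["as_path"],
                  lambda r: r["community"] == "100:1"],
     "apply": [lambda r: r.update(local_pref=200)]},
]

route = {"prefix": "10.1.0.0/16", "as_path": [100, 200], "community": "100:1"}
print(evaluate_route_policy(nodes, route))
```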

8.9.2.2 Usage Scenario

Routing policies can be configured using the following commands:
l filter-policy { import | export }

import defines the type of routes that can be accepted.
export defines the type of routes that can be sent.
l import-route
The import-route command defines route exchange between routing protocols. By
default, a routing protocol advertises only the routing information discovered by itself.
The import-route command allows a routing protocol to exchange routing information
with other protocols and advertise the routes discovered by other protocols.

Acronym and Abbreviation Full Name
EBGP External Border Gateway Protocol
ACL Access Control List
USR Unicast Static Route
RM Route Management
8.10 Appendix List of Port Numbers of Common Protocols

Table 8-38 Port number of routing protocols
Routing Protocol UDP Port Number TCP Port Number
RIP 520 -
RIPv2 520 -
RIPng 521 -
BGP - 179
OSPF - -
IS-IS - -

Note that "-" indicates that the related transport layer protocol is not used.
Table 8-39 Port number of application layer protocols

Application Layer Protocol UDP Port Number TCP Port Number
DHCP 67 -
DNS 53 53
FTP - 20/21
HTTP - 80
IMAP - 143 (993 for IMAP over SSL/TLS)
NetBIOS 137/138 137/139
POP3 - 110
SMB 445 445
SMTP 25 25
SNMP 161 -
TELNET - 23
TFTP 69 -
Note that "-" indicates that the related transport layer protocol is not used.

9 IP Multicast
About This Chapter
This chapter describes IP multicast in terms of its overview, principles, and applications.
NOTE
IP multicast is under GTL License control. IP multicast can be enabled on the ATN only after a valid
GTL License file is loaded and activated.
9.1 IP Multicast Overview

9.2 PIM
9.3 IGMP
9.4 Layer 2 Multicast
9.5 MSDP
9.6 Multicast Management
9.7 Multicast Route Management
9.8 Multicast VPN in Rosen Mode
9.9 Multicast Security
9.1 IP Multicast Overview
9.1.1 Introduction
IP multicast is a method of sending a single IP stream to multiple receivers simultaneously,
reducing bandwidth consumption. IP multicast provides benefits for point to multi-point
(P2MP) services, such as e-commerce, online conferencing, and video on demand. P2MP
services offer opportunities for significant profits, yet require high bandwidth and secure
operation. IP multicast is used to meet these requirements.

IP Data Transmission
IP data transmission is based on IP addresses. An IP address identifies a specific device on a
specific network.
l Source devices send packets that carry destination IP addresses.

l The ATN determines packet next hops based on the destination IP addresses and
forwards each packet to the network segment where the next hop device resides.
l Each host checks the destination IP addresses of received packets and drops the packets
whose destination IP address it cannot identify.
Process of Transmitting IP Data

The process of transmitting IP data is as follows:
1. A source sends an IP packet. The packet's destination address field contains an IP
address that can be identified by the destination host.
2. ATNs forward the packet to the network segment where the destination host resides. The
network segment can be connected to multiple hosts.
3. Each host checks the destination IP addresses of all the packets in the network segment
and drops the packets whose destination IP address it cannot identify. If the IP addresses
identified by each host in the network segment are different, the packets received by
each host are different.
IP Addresses Identified by Hosts

Hosts identify the following types of IP addresses:
l Unicast IP address
A unicast IP address can identify only one host, and a host can identify only one unicast
IP address. An IP packet that carries a unicast destination address can be received only
by one host.
l Broadcast IP address
A broadcast IP address can identify all hosts on a network segment, and an IP packet that
carries a broadcast destination IP address can be received by all hosts on a network
segment. However, a host can identify only one broadcast IP address. IP broadcast
packets cannot be transmitted across network segments.
l Multicast IP address
A multicast IP address can identify multiple hosts at different locations, and a host can
identify multiple multicast IP addresses. An IP packet that carries a multicast destination
IP address can therefore be received by multiple hosts at different locations.
IP Transmission Modes
Based on the IP address types, networks can transmit packets in the following modes:
l IP unicast mode
l IP broadcast mode
l IP multicast mode
Any of these modes can be used for P2MP data transmission.

l Unicast transmission
– Features: A unicast packet uses a unicast address as the destination address. If
multiple receivers require the same packet from a source, the source sends an
individual unicast packet to each receiver.
– Disadvantages: This mode consumes unnecessary bandwidth and processor
resources when sending the same packet to a large number of receivers.
Additionally, the unicast transmission mode does not guarantee transmission quality
when a large number of hosts exist.
l Broadcast transmission
– Features: A broadcast packet uses a broadcast address as the destination address. In
this mode, a source sends only one copy of each packet to all hosts on the network
segment, irrespective of whether a host requires the packet.
– Disadvantages: This mode requires that the source and receivers reside on the same
network segment. Because all hosts on the network segment receive packets sent by
the source, this mode cannot guarantee information security or charging of services.
l Multicast transmission
The following example uses the network shown in Figure 9-1 to illustrate the multicast
transmission mode. A source exists on the network. User A and User C require
information from the source, while User B does not.
Figure 9-1 Multicast transmission
[Figure: labels shown are Source, Router A, Router B, Router C, ATN D, ATN E, ATN F,
User A (receiver), User B, and User C (receiver). The legend marks packets for the
multicast group.]
– Features: A multicast packet uses a multicast address as the destination address. If
multiple receivers on a network segment require the same packet from a source, the
source sends only one packet to the multicast address.
The multicast protocol deployed on the network establishes a routing tree for the
packet. The tree's root is the source, and routes branch off to all multicast members.
As shown in Figure 9-1, multicast data is transmitted along the path: Source →
Router B → ATN E [ → ATN D → User A | → ATN F → User C ]

– Advantages: In multicast mode, only one copy of a multicast packet exists on each
link and is sent to users along the distribution tree. Only users who require the
packet receive it, providing the basis for information security. Compared with
unicast, multicast does not increase the network load when the number of users
increases in the same multicast group. As a result, multicast requires fewer server
and CPU resources. Compared with broadcast, multicast can transmit information
across network segments and across long distances.
– Applications: Multicast applies to all P2MP applications, such as multimedia
presentations, streaming media, and finance (stock-trading) applications. IP
multicast is being widely used in Internet services, such as online broadcast,
network TV broadcast, and real-time video and audio conferencing.
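The bandwidth difference between unicast and multicast can be illustrated with a Python sketch that counts packet copies per link. The tree reflects the transmission path given for Figure 9-1 and is assumed to contain only branches that lead to receivers; the function names are hypothetical.

```python
def multicast_link_copies(tree, root):
    """One packet is replicated at branch points: one copy per link."""
    copies = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            copies[(node, child)] = 1
            stack.append(child)
    return copies


def unicast_link_copies(tree, root, receivers):
    """The source sends one packet per receiver along the same paths."""
    parent = {child: node for node, kids in tree.items() for child in kids}
    copies = {}
    for receiver in receivers:
        node = receiver
        while node != root:
            link = (parent[node], node)
            copies[link] = copies.get(link, 0) + 1
            node = parent[node]
    return copies


tree = {
    "Source": ["Router B"],
    "Router B": ["ATN E"],
    "ATN E": ["ATN D", "ATN F"],
    "ATN D": ["User A"],
    "ATN F": ["User C"],
}
multicast = multicast_link_copies(tree, "Source")
unicast = unicast_link_copies(tree, "Source", ["User A", "User C"])
print(multicast[("Source", "Router B")], unicast[("Source", "Router B")])
```

On the shared links near the source, the unicast count grows with the number of receivers, whereas the multicast count stays at one.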
9.1.2 Principles
Multicast Group
A multicast group consists of a group of receivers that require the same data stream. A
multicast group is identified by an IP multicast address. A host that joins a multicast group
becomes a member of the group and can identify and receive IP packets that have the IP
multicast address as the destination address.
Multicast Source
A multicast source sends IP packets that carry multicast destination addresses.
l A multicast source can simultaneously send data to multiple multicast groups.

l Multiple multicast sources can simultaneously send data to the same multicast group.
Members of a Multicast Group

A member of a multicast group is a host that requires IP packets from the multicast group.
Hosts can choose to join or leave a multicast group, so the members of a multicast group are
dynamic. The members can be located anywhere geographically on a network.
A multicast source is generally not a receiver or a member of a multicast group.
Multicast Router
An ATN that supports the multicast function is called a multicast router.
A multicast router implements the following functions:
l Manages group members on the leaf segment networks that connect to users.
l Routes and forwards multicast packets.
Multicast Distribution Tree

A multicast distribution tree (MDT) is a tree-shaped packet distribution path along which
multicast traffic is sent to multicast receivers.

9.1.2.2 Basic Framework

This section describes the basic multicast framework and key multicast techniques that
transmit multicast data from a source to multiple receivers. Table 9-1 provides a brief
description of key multicast techniques.
Table 9-1 Key multicast techniques

Multicast Technique Description
Host registration Connects hosts to multicast groups.
Multicast source discovery Determines multicast sources.
Multicast destination addressing Determines multicast data destinations.
Multicast routing Forwards multicast data.
IP multicast is an end-to-end service. Figure 9-2 shows four IP multicast functions from the
lower protocol layer to the upper protocol layer.
Figure 9-2 IP multicast basic framework
[Figure: four columns for the multicast source (host), two multicast routers, and the
receiver (host). Every column has an addressing mechanism layer; the figure also shows
host registration, multicast routing, and multicast application layers.]
The four functions operate as follows:

l Addressing mechanism: transmits data to multicast groups based on multicast destination
addresses.
l Host registration: allows hosts to dynamically join and leave groups, implementing
group member management.
l Multicast routing: sets up distribution trees to transmit packets from sources to receivers.
l Multicast application: To work together, multicast sources and receivers must support the
same multicast application software, such as a video conference application. The TCP/IP
protocol suite must support multicast data transmission and receipt.

9.1.2.3 Multicast Addresses

The multicast destination addressing mechanism determines where a multicast packet is
sent and how its destination address is chosen. Multicast addressing has the following
needs:
l Multicast IP addresses are needed to implement the communication between a source
and its receivers on the network layer.
l Link layer multicast (also known as hardware multicast) is needed to transmit multicast
data on a local physical network. On an Ethernet link layer network, hardware multicast
uses multicast MAC addresses.
l An IP-to-MAC address mapping technology is needed to map multicast IP addresses to
multicast MAC addresses.
IPv4 Multicast Addresses

IPv4 addresses are classified as Class A, B, C, D, or E. Class D addresses are IPv4 multicast
addresses and are carried in packets' destination address fields to identify multicast groups.
A multicast packet's source address field contains a Class A, B, or C unicast address. A Class
D address cannot be a source IP address in a multicast packet. Class E addresses are reserved
for future use.
All receivers in a multicast group are identified by the same IPv4 multicast group address on
the network layer. Once a user joins the group, the user can receive all IP packets sent to the
group.
Class D addresses are in the 224.0.0.0 to 239.255.255.255 range. For details, see Table 9-2.
Table 9-2 Class D addresses

Class D Address Range           Description
224.0.0.0 to 224.0.0.255        Permanent multicast group addresses reserved by the Internet Assigned Numbers Authority (IANA) for routing protocols
224.0.1.0 to 231.255.255.255    Temporary any-source multicast (ASM) group addresses valid on the entire network
233.0.0.0 to 238.255.255.255    Temporary ASM group addresses valid on the entire network
232.0.0.0 to 232.255.255.255    Temporary source-specific multicast (SSM) group addresses valid on the entire network
239.0.0.0 to 239.255.255.255    Temporary ASM group addresses valid only in local administration domains. A local administration multicast address is a private address and can be used in different multicast administration domains.
l A permanent multicast group address, also known as a reserved multicast group address,
identifies all devices in a multicast group that may contain any number (including 0) of
members. For details, see Table 9-3.

l A temporary multicast group address, also known as a common group address, is an
IPv4 address that is assigned to a multicast group temporarily. If there is no user in this
group, this address is reclaimed.
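The ranges in Table 9-2 can be encoded in a small Python lookup. The function name and the label strings are hypothetical; the ranges are copied from the table.

```python
import ipaddress

# (first address, last address, label) taken from Table 9-2
CLASS_D_RANGES = [
    ("224.0.0.0", "224.0.0.255", "permanent"),
    ("224.0.1.0", "231.255.255.255", "temporary ASM, entire network"),
    ("232.0.0.0", "232.255.255.255", "temporary SSM, entire network"),
    ("233.0.0.0", "238.255.255.255", "temporary ASM, entire network"),
    ("239.0.0.0", "239.255.255.255", "temporary ASM, local administration domain"),
]


def classify_class_d(address):
    """Return the Table 9-2 category of an IPv4 address, or None if not Class D."""
    addr = ipaddress.IPv4Address(address)
    for first, last, label in CLASS_D_RANGES:
        if ipaddress.IPv4Address(first) <= addr <= ipaddress.IPv4Address(last):
            return label
    return None


for sample in ("224.0.0.5", "232.1.1.1", "239.1.1.1", "10.0.0.1"):
    print(sample, classify_class_d(sample))
```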
Table 9-3 General permanent multicast group addresses
Permanent Multicast Group Address   Description
224.0.0.0                           Unassigned address
224.0.0.1                           Address of all hosts and routers on a network segment (this address works like a broadcast address)
224.0.0.2                           Address of all multicast routers
224.0.0.3                           Unassigned address
224.0.0.4                           Address of Distance Vector Multicast Routing Protocol (DVMRP) devices
224.0.0.5                           Address of Open Shortest Path First (OSPF) devices
224.0.0.6                           Address of OSPF designated routers (DRs)
224.0.0.7                           Address of ST routers
224.0.0.8                           Address of ST hosts
224.0.0.9                           Address of RIP version 2 (RIP-2) devices
224.0.0.11                          Address of mobile agents
224.0.0.12                          Address of Dynamic Host Configuration Protocol (DHCP) server/relay agents
224.0.0.13                          Address of all Protocol Independent Multicast (PIM) devices
224.0.0.14                          Address of Resource Reservation Protocol (RSVP) encapsulation devices
224.0.0.15                          Address of all CBT devices
224.0.0.16                          Address of the designated Subnet Bandwidth Manager (SBM) device
224.0.0.17                          Address of all SBM devices
224.0.0.18                          Address of Virtual Router Redundancy Protocol (VRRP) devices
224.0.0.19 to 224.0.0.21            Unassigned addresses
224.0.0.22                          Address of all Internet Group Management Protocol version 3 (IGMPv3) routers
224.0.0.23 to 224.0.0.255           Unassigned addresses

Multicast MAC Addresses

IEEE 802.3 defines unicast and multicast MAC addresses as follows:
l The last bit in the first byte of a unicast address is fixed at 0.
l The last bit in the first byte of a multicast address is fixed at 1.
A multicast MAC address identifies receivers of the same multicast group at the link layer.
Ethernet interface boards can identify multicast MAC addresses. After a multicast MAC
address of a multicast group is configured on a device's driver, the device can then receive and
forward data of the multicast group on the Ethernet. The mapping between a multicast IPv4
address and a multicast MAC address is as follows:
As defined by the IANA, the 24 most significant bits of a MAC address are 0x01005e, the
25th bit is 0, and the 23 least significant bits are the same as those of a multicast IPv4 address.
Figure 9-3 shows the mapping relationships between multicast IPv4 addresses and multicast
MAC addresses.
Figure 9-3 Mapping relationships between multicast IPv4 addresses and multicast MAC
addresses
32-bit IPv4 address: 1110 XXXX X XXXXXXX XXXXXXXX XXXXXXXX
                     (the 5 bits after 1110 are lost; the last 23 bits are mapped)
48-bit MAC address:  00000001 00000000 01011110 0 XXXXXXX XXXXXXXX XXXXXXXX
                     (25-bit MAC address prefix followed by the 23 mapped bits)
The first four bits of an IPv4 multicast address, 1110, identify the address as a multicast
address and correspond to the fixed 25-bit prefix of a multicast MAC address. Of the remaining
28 bits, only the 23 least significant bits are mapped to the MAC address, so 5 bits are lost.
Therefore, 32 (2^5) IPv4 multicast addresses are mapped to the same MAC address.
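The mapping can be illustrated with a short Python sketch. The function names `multicast_mac` and `groups_sharing_mac` are assumptions made for this example. The 0x01005e prefix, the 0 in the 25th bit, and the 23 copied bits follow the description above.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a Class D (multicast) address")
    low23 = int(addr) & 0x7FFFFF      # keep the 23 least significant bits
    mac = 0x01005E000000 | low23      # 0x01005e, then a 0 bit, then the 23 bits
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def groups_sharing_mac(group: str) -> list:
    """List the 32 group addresses that map to the same MAC address as group."""
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
    return [
        str(ipaddress.IPv4Address((0b1110 << 28) | (lost << 23) | low23))
        for lost in range(32)         # every value of the 5 unmapped bits
    ]


print(multicast_mac("224.0.0.13"))    # 01:00:5e:00:00:0d
print(multicast_mac("225.129.1.1"))   # same MAC address as 224.1.1.1
```

Because 32 groups share each MAC address, a receiver may still have to filter unwanted groups at the IP layer.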
NOTE
This document focuses on IP multicast technology and device operation. Multicast in the document
refers to IP multicast, unless otherwise specified.
9.1.2.4 Multicast Model Classification

Based on the control level for multicast sources, IP multicast models are classified as follows:

l ASM model
l SFM model
l SSM model
ASM Model
In the any-source multicast (ASM) model, any sender can act as a multicast source and send
information to a multicast group address. Receivers cannot know the multicast source location
before they join a multicast group.
SFM Model
From the sender's point of view, the source-filtered multicast (SFM) model works the same as
the ASM model. That is, any sender can act as a multicast source and send information to a
multicast group address.
Compared with the ASM model, the SFM model extends the following function: The upper
layer software checks the source addresses of received multicast packets, permitting or
denying packets of multicast sources as configured.
NOTE
Compared with ASM, SFM adds multicast source filtering policies. The basic principles and
configurations of ASM and SFM are the same. In this document, information about ASM also applies to
SFM.
SSM Model
In real-world situations, users may not require all data sent by multicast sources. The source-
specific multicast (SSM) model allows users to specify multicast data sources.
Compared with receivers in the ASM model, receivers in the SSM model know the multicast
source location before they join a multicast group. The SSM model uses a different multicast
address scope from the ASM model and sets up a dedicated forwarding path between a source
and receivers.
9.1.2.5 Multicast Protocols

To implement a complete set of multicast services, several multicast protocols need to work
together. Figure 9-4 shows the major multicast protocols on an IPv4 network.

Figure 9-4 IPv4 multicast network
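[Figure: two autonomous systems, AS1 and AS2, with a multicast source and users. Labels: PIM in each AS, MSDP between the ASs, and IGMP toward each user.]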
The ATN supports various multicast routing protocols to implement different applications.
Table 9-4 describes commonly used multicast routing protocols.
Table 9-4 Multicast protocols

Location: Between a user host and a multicast router
Multicast Protocol: Internet Group Management Protocol (IGMP) for IPv4 networks
Protocol Function: Allows hosts to access multicast networks:
l On the host side, the protocol allows hosts to dynamically join and leave multicast groups.
l On the ATN side, the protocol exchanges information with upper layer multicast routing
protocols, and manages and maintains multicast group member relationships.

Location: Between multicast routers in the same domain
Multicast Protocol: Protocol Independent Multicast (PIM)
Protocol Function: Routes and forwards multicast packets:
l Creates multicast routing entries on demand.
l Responds to network topology changes and maintains multicast routing tables.
l Forwards multicast data based on routing entries.

Location: Between multicast routers in different domains
Multicast Protocol: Multicast Source Discovery Protocol (MSDP) for IPv4 networks
Protocol Function: Transmits source information between ATNs in different domains
Multicast protocols have two main types of functions: managing member relationships, and
establishing and maintaining multicast routes.
Managing Member Relationships

This function is used to set up and maintain member relationships between hosts and ATNs.
IGMP applies to IPv4 networks with the following variants:
l IGMP has three versions: IGMPv1, IGMPv2, and IGMPv3. At present, IGMPv2 is most
widely used. IGMP versions are backward compatible.
l All the IGMP versions support the any-source multicast (ASM) model. IGMPv3 can
support the source-specific multicast (SSM) model independently, while IGMPv1 or
IGMPv2 needs to work with SSM mapping to support the SSM model.
Establishing and Maintaining Multicast Routes

A multicast route, also called a multicast distribution tree, refers to the data transmission path
from a multicast source to all receivers. The path is unidirectional, loop-free, and the shortest
available path. Multicast data packets can be forwarded only after multicast routes are
established and maintained among ATNs.
l Intra-domain multicast routing protocols discover multicast sources and establish
multicast distribution trees in an autonomous system (AS) to deliver information to
receivers. PIM is a typical intra-domain multicast routing protocol and operates in dense
mode (DM) or sparse mode (SM):
– DM: applies to small-scale networks in which receivers are densely distributed.
This mode supports the ASM model.
– SM: applies to large-scale networks in which receivers are sparsely distributed. This
mode supports the ASM and SSM models.
l Inter-domain multicast routing protocols transmit multicast source information between
domains to set up inter-domain routes. Multicast resources can then be shared among
different domains. MSDP is a typical inter-domain multicast routing protocol. It usually
works with the Multicast Border Gateway Protocol (MBGP) to implement inter-domain
multicast. MSDP applies to domains that run the Protocol Independent Multicast-Sparse
Mode (PIM-SM).
In the SSM model, domains are not classified as intra-domains or inter-domains. Receivers
know the location of the multicast source domain; therefore, multicast transmission paths can
be directly established with the help of partial PIM-SM functions.
9.1.2.6 Multicast Packet Forwarding

Multicast packet forwarding is independent of and does not affect unicast packet forwarding.

In multicast packet forwarding, an IP packet's destination address is a multicast group
address. A multicast source sends data packets to the host group identified by the destination
address. To transmit packets to all receivers, the ATN on the forwarding path needs to send a
packet received from an inbound interface to many outbound interfaces. To perform these
tasks, multicast models use the following functions:
l A multicast routing table guides the forwarding of multicast packets.

l Reverse path forwarding (RPF) ensures that multicast routing uses the shortest path tree.
RPF is used by most multicast protocols to create multicast route entries and forward
packets.
The multicast packet forwarding process is as follows:

l In the any-source multicast (ASM) model: After receiving a multicast data packet, a
device searches the multicast forward information base (MFIB) table for a matching
entry. If a matching entry is found, the device forwards the multicast data packet based
on the entry information. If a matching entry is not found, the forwarding plane instructs
PIM to create a multicast routing entry based on the previously received Join message.
Then, PIM delivers the entry to the MFIB table to guide multicast data forwarding.
l In the source-specific multicast (SSM) model: After receiving a Join message, PIM-SSM
builds a multicast routing entry in the multicast routing table and delivers the entry to the
MFIB table. Then, after receiving a multicast data packet, the device searches the MFIB
table for a matching entry. If a matching entry is found, the device forwards the packet
based on the entry information. If a matching entry is not found, the device discards the
packet.
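The difference between the two models on an MFIB miss can be sketched as follows. This is a simplified illustration, not the device implementation: the `Forwarder` class, its method names, and the use of plain dictionaries for the MFIB and for Join state are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (source address, group address)


class Forwarder:
    """Toy forwarding plane that keeps an MFIB keyed by (source, group)."""

    def __init__(self, model: str) -> None:
        self.model = model                        # "ASM" or "SSM"
        self.mfib: Dict[Key, List[str]] = {}      # entry -> outbound interfaces
        self.joined: Dict[str, List[str]] = {}    # ASM Join state per group

    def receive_join(self, group: str, out_if: str, source: Optional[str] = None) -> None:
        if self.model == "SSM":
            # SSM: the entry is delivered to the MFIB as soon as the Join arrives.
            self.mfib.setdefault((source, group), []).append(out_if)
            return
        # ASM: remember the Join; an entry is created when data first arrives.
        self.joined.setdefault(group, []).append(out_if)
        for (_, entry_group), out_ifs in self.mfib.items():
            if entry_group == group and out_if not in out_ifs:
                out_ifs.append(out_if)

    def forward(self, source: str, group: str) -> List[str]:
        """Return the outbound interfaces for a packet; an empty list means discard."""
        entry = self.mfib.get((source, group))
        if entry is None and self.model == "ASM" and group in self.joined:
            # ASM miss: build the entry from the previously received Join.
            entry = list(self.joined[group])
            self.mfib[(source, group)] = entry
        return entry or []


asm = Forwarder("ASM")
asm.receive_join("225.1.1.1", "eth1")
print(asm.forward("10.0.0.1", "225.1.1.1"))   # entry created on the first packet

ssm = Forwarder("SSM")
ssm.receive_join("232.1.1.1", "eth2", source="10.0.0.1")
print(ssm.forward("10.0.0.2", "232.1.1.1"))   # no entry for this source: discarded
```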
Terms
Term    Definition
IGMP    Internet Group Management Protocol. A signaling mechanism that implements
communication between hosts and ATNs on IP multicast leaf networks.
By sending IGMP messages, hosts join or leave multicast groups, and ATNs
identify whether a multicast group contains members on a downstream network.
PIM     Protocol Independent Multicast. A multicast routing protocol.
Reachable unicast routes are the basis of PIM forwarding. PIM uses the existing
unicast routing information to perform reverse path forwarding (RPF) checks on
multicast packets to create multicast routing entries and set up a multicast
distribution tree (MDT).
MSDP    Multicast Source Discovery Protocol. A protocol that applies only to the any-source
multicast (ASM) model in PIM-SM domains.
After an MSDP peer relationship is set up between rendezvous points (RPs) of
different PIM-SM domains, multicast source information can be shared between
the PIM-SM domains, and inter-domain multicast can then be implemented.
After an MSDP peer relationship is set up between RPs of the same PIM-SM
domain, multicast source information can be shared in the PIM-SM domain, and
Anycast-RP can then be implemented.


Abbreviation
ASM any-source multicast
SFM source-filtered multicast
SSM source-specific multicast
9.2 PIM
9.2.1 PIM
Purpose
A multicast network requires multicast protocols to replicate and forward multicast data.
Protocol Independent Multicast (PIM) is a widely used intra-domain multicast protocol that
builds multicast distribution trees (MDTs) to transmit multicast data between devices in the
same domain.
PIM can create multicast routing entries on demand, forward packets based on multicast
routing entries, and dynamically respond to network topology changes.
Definition
PIM is a multicast routing protocol that uses unicast routing information to forward data, but
PIM is independent of any specific unicast routing protocol.
PIM can be implemented in PIM-DM, PIM-SM, or PIM-SSM mode on IPv4 networks.
Table 9-5 PIM implementation modes

Protocol  Name                                                      Model  Deployment Scenario
PIM-DM    Protocol Independent Multicast-Dense Mode                 ASM    Small-scale LANs on which multicast data receivers are densely distributed
PIM-SM    Protocol Independent Multicast-Sparse Mode                ASM    Large-scale networks on which multicast data receivers are sparsely distributed
PIM-SSM   Protocol Independent Multicast-Source-Specific Multicast  SSM    Networks on which multicast data receivers can learn source locations before they join multicast groups and require multicast data from specific multicast sources
Benefits
PIM works together with other multicast protocols to implement applications, such as:
l Multimedia and media streaming applications
l Training and tele-learning communication
l Data storage and financial management applications
IP multicast is being widely used in Internet services, such as online broadcasts, network TV,
e-learning, telemedicine, network TV stations, and real-time video/voice conference services.
9.2.2 Principles
9.2.2.1 PIM-DM
PIM-DM is used for P2MP data transmission on small-scale networks on which users are
densely distributed.
PIM-DM uses a flooding-pruning method to forward multicast data. PIM-DM is not suitable
for large-scale networks with sparsely distributed users because a large number of Prune
messages will be generated and the flooding-pruning process is time-consuming.
PIM-DM constructs shortest path trees (SPTs) with multicast sources as roots and group members as leaves.
PIM-DM assumes that all members are densely distributed and each network segment has
members. Based on these assumptions, a multicast source first floods multicast data to each
network segment and then prunes segments without any members. Through regular flooding
and pruning, PIM-DM creates and maintains a unidirectional and loop-free SPT that connects
the multicast source and group members.
Related Concepts
This section provides basic PIM-DM concepts. See Figure 9-5.

Figure 9-5 PIM networking
[Figure: a PIM network of PIM routers. A source connects over Ethernet to the source's DR, and receivers connect over Ethernet to the receiver's DR.]
l PIM device
A multicast router that supports PIM is called a PIM device. A PIM-enabled interface on
a PIM device is called a PIM interface.
l DR
A designated router (DR) is responsible for forwarding multicast data and is categorized
as a source's DR or receiver's DR.
– A multicast source's DR is a PIM device directly connected to a multicast source in
a PIM-DM domain and is responsible for forwarding multicast data packets to other
PIM devices.
– A receiver's DR is a PIM device directly connected to receivers' hosts and is
responsible for forwarding multicast data to group members.
l SPT
A shortest path tree (SPT) is a multicast distribution tree (MDT) with the multicast
source at the root and group members at leaves. SPTs can be used in PIM-DM, PIM-SM,
and PIM-SSM scenarios.
Implementation
The multicast data forwarding process in a PIM-DM domain is as follows:
1. Neighbor Discovery
Each PIM device in a PIM-DM domain periodically sends Hello messages to all other
PIM devices to discover PIM neighbors and maintain PIM neighbor relationships.
NOTE
By default, a PIM device permits other PIM control messages or multicast messages from a
neighbor, irrespective of whether the PIM device has received Hello messages from the neighbor.
However, if a PIM device has the neighbor check function enabled, the PIM device permits other
PIM control messages or multicast messages from a neighbor only after the PIM device has
received Hello messages from the neighbor.
2. Flooding

PIM-DM assumes that at least one multicast group member exists on each network
segment, and floods multicast data to all routers on the network. Therefore, all PIM
devices on the network can receive multicast data.
3. Prune
After flooding multicast data, PIM-DM prunes network segments that have no multicast
data receiver and retains only the network segments that have multicast data receivers.
Only PIM devices that require multicast data can receive multicast data.
4. State Refresh
If a downstream device is in the prune state, the upstream device maintains a prune timer
for this device. When the prune timer expires, the upstream device resumes data
forwarding to the downstream device, which wastes network resources. To prevent this
problem, the state-refresh function can be enabled on the upstream router. This function
enables the upstream router to periodically send State-Refresh messages to refresh the
status of the prune timers of downstream devices. Downstream devices that do not
require multicast data remain in the prune state.
5. Graft
If a node on a pruned network segment has new group members, PIM-DM uses the graft
mechanism to enable the node to immediately forward multicast data.
If there are multiple PIM devices on a network segment, the same multicast packets are sent
repeatedly across the network segment. The Assert mechanism can be used to select a unique
multicast data forwarder, preventing redundant multicast data forwarding.
The detailed PIM-DM implementation process is as follows:
Neighbor Discovery
Each PIM-enabled interface on a PIM device sends Hello messages. A multicast packet that
carries a Hello message has the following features:
l The destination address is 224.0.0.13.
l The source address is an interface address.
l The TTL is 1, indicating that packets are sent to neighbor interfaces only.
Hello messages are used to discover neighbors, adjust protocol parameters, and maintain
neighbor relationships.
l Discovering PIM neighbors

All PIM devices on the same network segment must permit multicast packets with the
destination address 224.0.0.13. Directly connected multicast routers can then learn
neighbor information from the received Hello messages.
l Adjusting protocol parameters
Hello messages are used to establish and maintain neighbor relationships. A Hello
message carries the following protocol parameters:
– DR_Priority: priority used by each ATN interface to elect a DR. A higher DR
priority indicates a higher probability of becoming a DR.
– Holdtime: timeout period during which a neighbor is considered to be in the
reachable state.
– LAN_Delay: delay for transmitting Prune message on a shared network segment.
– Neighbor-Tracking: neighbor tracking function.

– Override-Interval: interval carried in a Hello message for overriding a Prune
message.
l Maintaining neighbor relationships
PIM devices periodically exchange Hello messages. If a PIM device does not receive a
new Hello message from its PIM neighbor within the holdtime, the ATN considers the
neighbor unreachable and deletes the neighbor from its neighbor list.
Changes in PIM neighbor relationships lead to multicast topology changes. If an
upstream or a downstream neighbor in the MDT is unreachable, multicast routes are
reconverged, and the MDT is adjusted.
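The holdtime-based maintenance of the neighbor list can be sketched in Python as follows. The `NeighborTable` class and its method names are assumptions made for this example, and the holdtime of 105 seconds is an illustrative value rather than a value taken from this document.

```python
class NeighborTable:
    """Toy PIM neighbor list driven by Hello messages and their Holdtime."""

    def __init__(self) -> None:
        self._expires = {}  # neighbor address -> time at which it times out

    def on_hello(self, neighbor: str, holdtime: float, now: float) -> None:
        # Each Hello message (re)starts the neighbor's holdtime.
        self._expires[neighbor] = now + holdtime

    def expire(self, now: float) -> list:
        """Delete and return the neighbors whose holdtime has elapsed."""
        dead = sorted(n for n, t in self._expires.items() if t <= now)
        for neighbor in dead:
            del self._expires[neighbor]
        return dead

    def neighbors(self) -> list:
        return sorted(self._expires)


table = NeighborTable()
table.on_hello("10.0.0.2", holdtime=105, now=0)
table.on_hello("10.0.0.3", holdtime=105, now=0)
table.on_hello("10.0.0.2", holdtime=105, now=90)   # only this neighbor refreshes
print(table.expire(now=105))                       # ['10.0.0.3']
print(table.neighbors())                           # ['10.0.0.2']
```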
Flooding
The following example uses the network shown in Figure 9-6 to describe the flooding
function. The source sends a data packet to ATN A, and ATN A floods the packet to all its
neighbors. ATN B and ATN C also forward the packet to each other. To prevent data
duplication, ATN B uses the reverse path forwarding (RPF) mechanism to ensure that it
accepts data packets from only one neighbor, ATN A or ATN C. (A device uses the RPF check
to validate received packets and to decide whether to create and maintain multicast routing
entries. When a device receives a multicast packet, the device searches the unicast routing
table, Multicast Border Gateway Protocol (MBGP) routing table, and static multicast routing
table for an RPF route that matches the packet source address. If the packet inbound interface
matches the RPF interface, the packet passes the RPF check; otherwise, the packet is
considered invalid and discarded. Therefore, the RPF check is the basis for multicast routing
because it ensures that multicast data is forwarded along correct paths.) Based on the RPF
check result, ATN B accepts the data packet from ATN A and sends the packet to User A.
Figure 9-6 PIM-DM flooding
[Figure: Source, ATN A, ATN B, ATN C, and a receiver (User A) in a PIM-DM domain. Legend: packets, flooding.]
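The RPF check described in the flooding example can be sketched in Python. This is a simplified illustration: it uses a single route list with longest-prefix matching in place of the separate unicast, MBGP, and static multicast routing tables, and the function names, interface names, and addresses are assumptions made for this example.

```python
import ipaddress


def rpf_interface(routes, source):
    """Return the interface of the longest-prefix route to the source, or None."""
    src = ipaddress.ip_address(source)
    best = None
    for prefix, interface in routes:
        network = ipaddress.ip_network(prefix)
        if src in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, interface)
    return best[1] if best else None


def rpf_check(routes, source, inbound_interface):
    """A packet passes only if it arrived on the RPF interface toward its source."""
    rpf_if = rpf_interface(routes, source)
    return rpf_if is not None and rpf_if == inbound_interface


# Hypothetical routes on ATN B: the source subnet is reached through ATN A.
ATN_B_ROUTES = [("0.0.0.0/0", "to_ATN_C"), ("10.1.1.0/24", "to_ATN_A")]

print(rpf_check(ATN_B_ROUTES, "10.1.1.10", "to_ATN_A"))   # True: accepted
print(rpf_check(ATN_B_ROUTES, "10.1.1.10", "to_ATN_C"))   # False: discarded
```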
Prune
The following example uses the network shown in Figure 9-7 to describe the prune function.
ATN C has no receivers, so it sends a Prune message upstream to ATN A to instruct ATN A
to stop forwarding data to the interface connected to ATN C.

After receiving the Prune message, ATN A stops forwarding data to the downstream
interface connected to ATN C. This process is called pruning. Because a downstream interface
on ATN A is connected to ATN B that has a receiver, ATN A forwards multicast data to the
downstream interface connected to ATN B. In this manner, a unidirectional and loop-free SPT
is set up from the source to User A.
Figure 9-7 PIM-DM prune
[Figure: Source, ATN A, ATN B, ATN C, and a receiver (User A) in a PIM-DM domain. Legend: packets, Prune.]
State Refresh
The following example uses the network shown in Figure 9-7 to describe the state refresh
function. After ATN A prunes the network segment of ATN C, ATN A maintains a prune
timer for ATN C. When the prune timer expires, ATN A resumes data forwarding to ATN C.
This results in a waste of network resources.
The state refresh function can prevent this problem and works as follows: ATN A periodically
floods State-Refresh messages to all its downstream interfaces to reset the prune timers of all
the downstream devices.
Graft
The following example uses the network shown in Figure 9-8 to describe the graft function.
If User B sends an IGMP Report message to the pruned ATN C to join a multicast group or to
respond to a Query message, ATN C would otherwise have to wait for the next flooding-pruning
cycle before it receives data, which prolongs the service access process. The graft function
avoids this wait as follows:
ATN C sends a Graft message upstream to request that ATN A restore the forwarding status of
the downstream interface connected to ATN C. After restoring the forwarding status, ATN A
sends multicast data to ATN C. The graft function therefore implements rapid data
forwarding for devices in the pruned state.

Figure 9-8 PIM-DM graft
[Figure: Source, ATN A, ATN B with a receiver (User A), and ATN C with a receiver (User B) in a PIM-DM domain. Legend: packets, Graft.]
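The interaction of prune, state refresh, and graft on one downstream interface of ATN A can be sketched as a small state holder. This is a simplified model viewed from the upstream device only. The class name, method names, and the 210-second prune timer are assumptions made for this example.

```python
class DownstreamInterface:
    """Toy model of one downstream interface on the upstream device (ATN A)."""

    def __init__(self) -> None:
        self.prune_expires = None     # None means the interface is forwarding

    def on_prune(self, now: float, prune_time: float) -> None:
        # A Prune message stops forwarding and starts the prune timer.
        self.prune_expires = now + prune_time

    def on_state_refresh(self, now: float, prune_time: float) -> None:
        # State refresh restarts the prune timer so the prune does not expire.
        if self.prune_expires is not None:
            self.prune_expires = now + prune_time

    def on_graft(self) -> None:
        # A Graft message restores forwarding immediately.
        self.prune_expires = None

    def is_forwarding(self, now: float) -> bool:
        if self.prune_expires is not None and now >= self.prune_expires:
            self.prune_expires = None  # timer expired: forwarding resumes
        return self.prune_expires is None


to_atn_c = DownstreamInterface()
to_atn_c.on_prune(now=0, prune_time=210)
to_atn_c.on_state_refresh(now=200, prune_time=210)
print(to_atn_c.is_forwarding(now=300))   # False: the refresh kept the prune alive
to_atn_c.on_graft()
print(to_atn_c.is_forwarding(now=300))   # True: User B joined, ATN C grafted
```

Without the state refresh at time 200, the prune would expire at time 210 and ATN A would resume forwarding to ATN C even though ATN C has no receivers.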
Assert
The following example uses the network shown in Figure 9-9 to describe the assert function.
ATN B and ATN C both receive multicast packets from the multicast source S. The packets
pass the RPF check, so (S, G) entries are created on ATN B and ATN C. Because the
downstream interfaces of ATN B and ATN C are connected to the same network segment,
ATN B and ATN C can both send multicast data to the network segment.
The assert function is used to ensure that only one multicast data forwarder exists on the
network segment. The assert process is as follows:
1. ATN B receives a multicast packet from ATN C through a downstream interface, but this
packet fails the RPF check and is discarded by ATN B. At the same time, ATN B sends
an Assert message to the network segment.
2. ATN C compares its routing information with that carried in the Assert message sent by
ATN B. ATN C loses the election because the route cost from ATN B to the source is
lower. The downstream interface of ATN C is therefore prohibited from forwarding
multicast packets and is deleted from the downstream interface list of the (S, G) entry.
3. ATN C receives a multicast packet from ATN B through the network segment, but the
packet fails the RPF check and therefore is discarded.

Figure 9-9 PIM-DM assert
[Figure: Source, ATN A, ATN B, ATN C, ATN D, and a receiver on an Ethernet segment. Route costs of 10 and 20 are shown. Legend: multicast packets, Assert message from ATN B, Assert message from ATN C.]
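The comparison in step 2 can be sketched in Python. The text above states only that the device with the lower route cost to the source wins. The tie-break on the highest interface address, the function name, and the addresses below are assumptions made for this example, and the costs are illustrative.

```python
import ipaddress
from typing import List, NamedTuple


class AssertInfo(NamedTuple):
    device: str
    cost: int        # route cost from the device to the multicast source
    address: str     # address of the interface on the shared network segment


def assert_winner(candidates: List[AssertInfo]) -> str:
    """Return the device that stays the forwarder on the shared segment."""
    best = min(
        candidates,
        # Lower cost wins; on equal cost the higher address wins (assumption).
        key=lambda c: (c.cost, -int(ipaddress.IPv4Address(c.address))),
    )
    return best.device


segment = [
    AssertInfo("ATN B", cost=10, address="192.168.1.2"),
    AssertInfo("ATN C", cost=20, address="192.168.1.3"),
]
print(assert_winner(segment))   # ATN B
```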
9.2.2.2 PIM-SM
Protocol Independent Multicast-Sparse Mode (PIM-SM) implements P2MP data transmission
on large-scale networks on which multicast data receivers are sparsely distributed. PIM-SM
forwards multicast data only to network segments with active receivers that have requested
the data.
PIM-SM assumes that no host wants to receive multicast data, so PIM-SM sets up an MDT
only after a host requests multicast data, and then sends the data to the host along the MDT.
Concepts
This section provides basic PIM-SM concepts. Figure 9-10 shows a typical PIM-SM
network.

Figure 9-10 PIM-SM network

[Figure: a PIM-SM domain of PIM routers with an RP and a BSR. A source connects to the source's DR, and receivers connect over Ethernet to the receiver's DR.]
l PIM device
A router that runs PIM is called a PIM device. A router interface on which PIM is
enabled is called a PIM interface.
l PIM domain
A network constructed by PIM devices is called a PIM network.
A PIM-SM network can be divided into multiple PIM-SM domains by configuring
BootStrap router (BSR) boundaries on router interfaces to restrict BSR message
transmission. PIM-SM domains isolate multicast traffic between domains and facilitate
network management.
l DR
A designated router (DR) can be a multicast source's DR or a receiver's DR.
– A multicast source's DR is a PIM device directly connected to a multicast source
and is responsible for sending Register messages to an RP.
– A receiver's DR is a PIM device directly connected to receiver hosts and is
responsible for sending Join messages to an RP and forwarding multicast data to
receiver hosts.
l RP
A rendezvous point (RP) is the forwarding core in a PIM-SM domain and is used to
process hosts' join requests and multicast source's registration requests. An RP constructs
an MDT with the RP at the root, called an RP tree (RPT). An RP creates (S, G) entries to
transmit multicast data to hosts. All routers in the PIM-SM domain need to know the
RP's location.
The following table lists the types of RPs.

Table 9-6 RP classifications

RP Type: Static RP
Implementation: A static RP is manually configured. If a static RP is used, the same RP
address must be configured on all PIM devices in the same domain as the RP.
Deployment Scenario: Static RPs are recommended on small-/medium-sized networks because a
small-/medium-sized network is stable and has low forwarding requirements on an RP.
NOTE
If only one multicast source exists on a network, configuring the device directly connected
to the source as the RP is recommended, so the source's DR does not need to register with
the RP.
Precautions: To use a static RP, ensure that all ATNs, including the RP, have the same RP and
multicast group address range information.

RP Type: Dynamic RP
Implementation: A dynamic RP is elected among C-RPs in the same PIM domain. The RP election
process is as follows: After PIM devices are configured as candidate-RPs (C-RPs) or
candidate BSRs (C-BSRs) in a PIM domain, a BSR is elected among the C-BSRs. The BSR collects
and summarizes the C-RP information as an RP-Set, and adds the RP-Set information to
BootStrap messages to be advertised to all multicast devices in the PIM domain. Based on the
same RP-Set information, the multicast devices follow the same rules to elect a unique RP
among the C-RPs. If the elected RP fails, a new RP is elected.
Deployment Scenario: Dynamic RPs can be used on large-scale networks to improve network
reliability and maintainability.
l If multiple multicast sources are densely distributed on a network, configuring core
devices close to the multicast sources as C-RPs is recommended.
l If multiple users are densely distributed on a network, configuring core devices close to
the users as C-RPs is recommended.
Precautions: To use a dynamic RP, a BSR must be available to dynamically advertise
group-to-RP mapping information to ATNs.
l BSR

A BSR in a PIM-SM domain collects and summarizes C-RP information as an RP-Set,
and adds the RP-Set information to BootStrap messages to be advertised to all multicast
devices in the domain.
A network can have only one BSR but can have multiple C-BSRs. If a BSR fails, a new
BSR is elected from the C-BSRs.
l RPT
An RPT is an MDT with an RP at the root and group members at the leaves.
l SPT
A shortest path tree (SPT) is an MDT with the multicast source at the root and group
members at the leaves. SPTs are used on PIM-DM, PIM-SM, and PIM-SSM networks.
Implementation
The multicast data forwarding process in a PIM-SM domain is as follows:
1. Neighbor discovery
Each PIM device in a PIM-SM domain periodically sends Hello messages to all other
PIM devices in the domain to discover PIM neighbors and maintain PIM neighbor
relationships.
NOTE
If a PIM device has the neighbor check function enabled, the PIM device permits other PIM
control messages or multicast messages from a neighbor only after the PIM device has
received Hello messages from the neighbor.
2. DR Election
PIM devices exchange Hello messages to elect a DR on a shared network segment. The
receiver's DR is the only multicast data forwarder on a shared network segment. The
source's DR is responsible for forwarding multicast data received from the multicast
source along an MDT.
3. RP discovery
An RP is the forwarding core in a PIM-SM domain. A dynamic or static RP forwards
multicast data on the entire network.
4. RPT setup
PIM-SM assumes that no hosts want to receive multicast data, so PIM-SM sets up an
RPT only after a host requests multicast data, and then sends the data from the RP to the
host along the RPT.
5. SPT switchover
A multicast group in a PIM-SM domain is associated with only one RP and one RPT. All
multicast data packets are forwarded by the RP. The path along which the RP forwards
multicast data may not be the shortest path from the multicast source to receivers. The
load of the RP increases when the multicast traffic volume increases. If the multicast
data forwarding rate exceeds a configured threshold, an RPT-to-SPT switchover can be
implemented to reduce the burden on the RP.
If a network problem occurs, the assert mechanism or a DR switchover delay can be used to
ensure successful multicast data transmission.
l Assert
If multiple multicast data forwarders exist on a network segment, each multicast packet
is repeatedly sent across the network segment, generating redundant multicast data. To
resolve this issue, the assert mechanism can be used to select a unique multicast data
forwarder on a network segment.
l DR switchover delay
If the role of an interface on a PIM device is changed from DR to non-DR, the PIM
device immediately stops using this interface to forward data. If multicast data sent from
a new DR does not arrive, multicast data traffic is temporarily interrupted. If a DR
switchover delay is configured, the interface continues to forward multicast data until the
delay expires. Therefore, setting a DR switchover delay prevents multicast data traffic
from being interrupted.
The detailed PIM-SM implementation process is as follows:
Neighbor Discovery
Neighbor discovery in PIM-SM is the same as that in PIM-DM. For details, see Neighbor
Discovery in 9.2.2.1 PIM-DM.
DR Election
The network segment on which a multicast source or group members reside is usually
connected to multiple PIM devices, as shown in Figure 9-11. The PIM devices exchange
Hello messages to set up PIM neighbor relationships. A Hello message carries the DR
priority and the address of the interface that connects the PIM device to this network segment.
A PIM device compares its own information with that carried in messages sent from
neighbors to elect a DR. This process is a DR election. The DR election rules are as follows:
l The PIM device with the highest DR priority wins if all the PIM devices send Hello
messages that carry a DR priority.
l If the PIM devices have the same DR priority or one or more PIM devices do not support
Hello messages carrying a DR priority, the PIM device with the highest IP address wins.
Figure 9-11 DR election
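[Figure: a source (server) on one Ethernet segment and User A and User B on another, each segment attached to multiple PIM devices including a DR, with an RP between them. Legend: Hello, Join, Register message.]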
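The two DR election rules can be sketched in Python. The `Hello` tuple and the `elect_dr` function are names assumed for this example. A `dr_priority` of `None` stands for a Hello message that carries no DR priority.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Hello(NamedTuple):
    address: str                  # address of the interface on the shared segment
    dr_priority: Optional[int]    # None if the Hello message carries no DR priority


def _address_value(hello: Hello) -> int:
    return int(ipaddress.IPv4Address(hello.address))


def elect_dr(hellos: List[Hello]) -> str:
    """Return the interface address of the elected DR."""
    if all(h.dr_priority is not None for h in hellos):
        # Highest DR priority wins; equal priorities fall back to the highest address.
        return max(hellos, key=lambda h: (h.dr_priority, _address_value(h))).address
    # At least one device does not send a DR priority: compare addresses only.
    return max(hellos, key=_address_value).address


print(elect_dr([Hello("10.1.1.1", 10), Hello("10.1.1.2", 1)]))     # 10.1.1.1
print(elect_dr([Hello("10.1.1.1", 10), Hello("10.1.1.2", None)]))  # 10.1.1.2
```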

RP Discovery
l Static RP
A static RP is specified by running commands. A static RP's address must be manually
configured on other routers so they can find and use this RP for data forwarding.
l Dynamic RP
A dynamic RP is elected among multiple PIM devices.
Figure 9-12 Dynamic RP election
[Figure: a PIM-SM domain with a C-BSR, the elected BSR, and two C-RPs. Legend: Bootstrap message, C-RP advertisement.]
The network shown in Figure 9-12 is used as an example to describe the dynamic RP
election process:
a. To use a dynamic RP, configure C-BSRs to elect a BSR among the C-BSRs.
At first, each C-BSR considers itself a BSR and advertises a Bootstrap message.
The Bootstrap message carries the address and priority of the C-BSR. Each router
compares the information contained in its received Bootstrap messages to elect a
BSR as follows:
i. The C-BSR with the highest priority wins (the greater the priority value, the
higher the priority).
ii. If all the C-BSRs have the same priority, the C-BSR with the highest IP
address wins.
All the ATNs follow the same BSR election rules, so they will elect the same BSR
and learn the BSR address.
b. The C-RPs send C-RP Advertisement messages to the BSR. Each Advertisement
message carries the address of the C-RP that sent it, the range of multicast groups
that the C-RP serves, and the priority of the C-RP.
c. The BSR collects the received information as an RP-Set, encapsulates the RP-Set
information in a Bootstrap message, and advertises the Bootstrap message to all
PIM-SM devices.

d. Each ATN uses the RP-Set information to perform the same calculations and
comparisons to elect an RP among multiple C-RPs as follows:
i. A C-RP wins if it serves the group address that has the longest mask.
ii. If group addresses have the same mask length, the C-RP with the highest
priority wins (the greater the priority value, the lower the priority).
iii. If the C-RPs have the same priority, a hash function is applied. The C-RP with
the greatest calculated value wins.
iv. If none of the above criteria can determine a winner, the C-RP with the highest
address wins.
e. Because all ATNs use the same RP-set and the same election rules, the relationship
between the multicast group and the RP is the same for all ATNs. ATNs save this
relationship to guide subsequent multicast operations.
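The BSR and RP election rules in steps a and d can be sketched as follows. This is a minimal model under stated assumptions: the `CBsr`, `CRp`, `elect_bsr`, `rp_hash`, and `elect_rp` names are hypothetical, and the hash shown is the one defined in the PIM-SM specification (RFC 7761) with an assumed hash mask length of 30. The source text does not state which hash function or mask length the ATN uses.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network


@dataclass(frozen=True)
class CBsr:
    address: IPv4Address
    priority: int  # greater value = higher priority


@dataclass(frozen=True)
class CRp:
    address: IPv4Address
    group_range: IPv4Network  # multicast groups that this C-RP serves
    priority: int             # greater value = LOWER priority


def elect_bsr(candidates: list[CBsr]) -> CBsr:
    """Step a: highest priority wins, then highest IP address."""
    return max(candidates, key=lambda c: (c.priority, c.address))


def rp_hash(group: IPv4Address, rp: IPv4Address, hash_mask_len: int = 30) -> int:
    """Hash from the PIM-SM specification (assumed; not named in the source)."""
    mask = (0xFFFFFFFF << (32 - hash_mask_len)) & 0xFFFFFFFF
    g = int(group) & mask
    return (1103515245 * ((1103515245 * g + 12345) ^ int(rp)) + 12345) % (2 ** 31)


def elect_rp(group: IPv4Address, rp_set: list[CRp]) -> CRp:
    """Step d: pick the RP for one group from the RP-Set."""
    serving = [c for c in rp_set if group in c.group_range]
    if not serving:
        raise LookupError(f"no C-RP serves {group}")
    return max(
        serving,
        key=lambda c: (
            c.group_range.prefixlen,    # i.   longest mask
            -c.priority,                # ii.  smallest priority value
            rp_hash(group, c.address),  # iii. greatest hash value
            c.address,                  # iv.  highest address
        ),
    )
```

Since the function depends only on the group address and the RP-Set, every device that holds the same RP-Set computes the same group-to-RP mapping, which is the property step e relies on.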
l Anycast-RP
In a traditional PIM-SM domain, each multicast group is mapped to only one RP. When
the network is overloaded or traffic is heavy, many network problems can occur. For
example, if the RP is overloaded, routes will converge slowly, or the multicast
forwarding path will not be optimal.
Anycast-RP can be used to address these problems. Currently, Anycast-RP can be
implemented through MSDP or PIM:
– Through MSDP: To use this mode, configure multiple RPs with the same address in
a PIM-SM domain, and have the RPs set up Multicast Source Discovery Protocol
(MSDP) peer relationships so that they can share multicast source information.
This mode is only for use on IPv4 networks. For details about the implementation
principles, see Anycast-RP in MSDP.
– Through PIM: To use this mode, configure multiple RPs with the same address in a
PIM-SM domain, and assign a unique local address for each RP. These local
addresses are used to set up connectionless peer relationships between the RP
devices. The peers share multicast source information by exchanging Register
messages.
This mode is for use on IPv4 networks.
NOTE
These two modes cannot both be configured on the same device in a PIM-SM domain. If Anycast-RP is
implemented through PIM, you can also configure a device in a local domain to advertise the source
information obtained from extra-domain MSDP peers to the peers in the local domain.
With Anycast-RP, both a multicast receiver and a source select their topologically closest RPs
to create RPTs. After receiving multicast data, the receiver's DR determines whether to trigger
a switchover to an SPT. Therefore, Anycast-RP facilitates optimal RP route selection and
implements load sharing on RPs.

Figure 9-13 Anycast-RP through PIM
[Figure not reproduced. Labels: PIM-SM, RP1, RP2, DR1, DR2, S1, S2, U1, U2. Legend: Register message.]
In the PIM-SM domain shown in Figure 9-13, multicast sources S1 and S2 send multicast
data to multicast group G, and U1 and U2 are members of group G. Perform the following
operations to implement Anycast-RP through PIM:
l Configure RP1 and RP2 and assign the same IP address to both (for example, the
loopback interface address 10.10.10.10).
l Assign a unique local IP address to each RP (for example, 1.1.1.1 to RP1 and 2.2.2.2
to RP2), so that the RPs can set up a connectionless peer relationship.
The implementation process is as follows:

1. Each receiver sends a Join message to the closest RP to set up an RPT.
– The RPT with RP1 as the root is set up for U1, and RP1 creates a (*, G) entry.
– The RPT with RP2 as the root is set up for U2, and RP2 creates a (*, G) entry.
2. Each multicast source registers with the closest RP through its DR.
– S1 sends multicast data to DR1. DR1 encapsulates the data in a Register message
and sends the message to RP1. RP1 creates an (S1, G) entry. Multicast data from
S1 then reaches U1 along the RPT with RP1 as the root.
– S2 sends multicast data to DR2. DR2 encapsulates the data in a Register message
and sends the message to RP2. RP2 creates an (S2, G) entry. Multicast data from
S2 then reaches U2 along the RPT with RP2 as the root.
3. After receiving Register messages from a source's DR, each RP re-encapsulates and
then forwards the Register messages to peers to share multicast source information.
– After receiving the (S1, G) Register message from DR1, RP1 replaces the source
and destination addresses with 1.1.1.1 and 2.2.2.2 respectively, and re-encapsulates
and sends the message to RP2. After receiving the re-encapsulated Register
message, RP2 processes this Register message but does not forward it to other
peers.

– After receiving the (S2, G) Register message from DR2, RP2 replaces the source
and destination addresses with 2.2.2.2 and 1.1.1.1 respectively, and re-encapsulates
and sends the message to RP1. After receiving the re-encapsulated Register
message, RP1 processes this Register message but does not forward it to other
peers.
4. Each RP joins an SPT with the source's DR as the root to obtain multicast data.
– RP1 sends a Join message toward S2. Then, S2 sends multicast data to RP1 along the
SPT, and RP1 sends the data to U1 along the RPT.
– RP2 sends a Join message toward S1. Then, S1 sends multicast data to RP2 along the
SPT, and RP2 sends the data to U2 along the RPT.
5. After receiving the multicast data, each receiver's DR determines whether to trigger a
switchover to an SPT based on configurations.
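Step 3 of the process above (re-encapsulating a Register message for peers, and not forwarding a Register message that came from a peer) can be sketched as follows. The `Register` and `AnycastRp` types are hypothetical and model only the addressing logic described in the text; addresses are kept as plain strings for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Register:
    outer_src: str  # source address of the unicast Register packet
    outer_dst: str  # destination address of the unicast Register packet
    source: str     # multicast source S
    group: str      # multicast group G


@dataclass
class AnycastRp:
    local_addr: str   # unique local address, for example 1.1.1.1
    peers: list[str]  # local addresses of the other RPs
    sg_entries: set[tuple[str, str]] = field(default_factory=set)

    def receive(self, msg: Register) -> list[Register]:
        """Process a Register message and return the copies to send to peers."""
        self.sg_entries.add((msg.source, msg.group))
        if msg.outer_src in self.peers:
            # Already re-encapsulated by a peer RP: process it, do not forward it.
            return []
        # Received from a source's DR: replace the outer addresses with the
        # local address and each peer's local address, then forward.
        return [Register(self.local_addr, peer, msg.source, msg.group)
                for peer in self.peers]
```

With the example addresses from the text, RP1 (1.1.1.1) forwards the (S1, G) Register message to RP2 (2.2.2.2), and RP2 records the source without forwarding the message any further.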
RPT Setup
A PIM-SM RPT is an MDT that uses an RP as a root and group member ATNs as leaves.
An RP is a data forwarding core for processing Register messages from source's DRs and
Join messages from receivers. Therefore, an RP acts as an information convergence center.
All PIM devices need to know the RP address.
Figure 9-14 shows the RPT setup and data forwarding processes.
Figure 9-14 RPT setup
[Figure not reproduced. Labels: Source, RouterA, RouterB, RouterC, ATN D, DR (two devices), RP, Receiver. Legend: (*,G) join, packets.]
Setting up an RPT creates a forwarding path for multicast data.

l When a multicast source sends the first message of a multicast group to its DR, the
source's DR encapsulates the multicast message in a Register message and unicasts the
Register message to the RP. The RP then creates an (S, G) entry to register the multicast
source information.
l When a receiver requests to join a multicast group, the receiver's DR sends a Join
message to the RP. A (*, G) entry is then created on each hop along the path from the
receiver's DR to the RP. An RPT is then created.

l When a multicast receiver appears, the source's DR encapsulates the multicast data
from an available multicast source in a Register message and unicasts the message to
the RP. The RP then forwards the multicast data along the RPT to the receiver.
RPT implements on-demand multicast data forwarding, which reduces bandwidth
consumption by preventing unrequested data transmission.
NOTE
To reduce the RPT forwarding loads and improve multicast data forwarding efficiency, PIM-SM
supports switchovers to SPTs, allowing a multicast network to set up an SPT. Then, the multicast source
can send multicast data directly to receivers along the SPT.
SPT Switchover
A PIM-SM SPT is an MDT with the multicast source as the root and the group members as
leaves.
In a PIM-SM domain, a multicast group interacts with only one RP, and only one RPT is set
up. If SPT switchover is not enabled, all multicast packets must be encapsulated in Register
messages and then sent to the RP. After receiving the packets, the RP de-encapsulates them
and forwards them along the RPT.
Since all multicast messages forwarded along the RPT are transferred by the RP, the RP may
be overloaded when multicast traffic is heavy. To resolve this problem, PIM-SM allows the
RP or the receiver's DR to trigger an SPT switchover.
Figure 9-15 SPT switchover triggered by the receiver's DR
[Figure not reproduced. Labels: Source, RouterA, RouterB, RouterC, ATN D, DR (two devices), RP, Receiver; step markers 1 to 4. Legend: (*,G) join, (S,G) join, packets.]
An SPT switchover can be triggered by the RP or by the receiver's DR:

l SPT switchover triggered by the RP
After receiving a Register message from the source's DR, the RP sends the multicast
data encapsulated in the Register message to multicast receivers along an RPT. The RP
also sends an SPT Join message to the source's DR to set up an SPT from the source to
the RP.

After the SPT is set up and the RP receives the first multicast data message on the SPT,
the RP stops processing Register messages. This frees the source's DR and RP from
encapsulating and decapsulating messages. Multicast data is sent from the ATN directly
connected to the multicast source to the RP along the SPT and then forwarded to group
members along the RPT.
l SPT switchover triggered by the receiver's DR
a. As shown in Figure 9-15, multicast data is transmitted along the RPT. The
receiver's DR (ATN D) sends (*, G) Join messages to the RP. Multicast data is sent
to receivers along the path: source's DR (ATN A) -> RP (ATN B) -> receiver's DR
(ATN D).
b. The receiver's DR periodically checks the forwarding rate of multicast packets. If
the receiver's DR detects that the forwarding rate is greater than a configured
threshold, the DR triggers an RPT-to-SPT switchover.
c. The receiver's DR sends (S, G) Join messages to the source's DR. After receiving
multicast data along the SPT, the receiver's DR discards multicast data received
along the RPT and sends a Prune message to the RP to delete the receiver from the
RPT. The switchover from the RPT to the SPT is then complete.
d. Multicast data is forwarded along the SPT. Specifically, multicast data is
transmitted to receivers along the path: multicast source's DR (ATN A) -> receiver's
DR (ATN D).
After an SPT is set up, subsequent packets may not pass through the RP. After a
switchover to an SPT, delays in transmitting multicast data are reduced because the
previously used RPT may not have the shortest path.
If one source sends packets to multiple groups simultaneously and an SPT switchover policy
is configured for a specific group range:
l Before an SPT switchover, packets reach the receiver's DR along the RPT.
l After an SPT switchover, only the packets within the group range specified in the SPT
switchover policy are forwarded along the SPT. Other packets are still forwarded along
the RPT.
NOTE
By default, the RP performs an SPT switchover immediately after receiving the first Register message,
and the receiver's DR performs an SPT switchover immediately after receiving the first multicast data
message.
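The decision made by the receiver's DR can be sketched as a single predicate. The function name `uses_spt`, the kbit/s unit, and the way the group policy is passed in are assumptions made for illustration; the text only states that a rate is compared with a configured threshold and that a policy may restrict the switchover to a group range.

```python
from ipaddress import IPv4Address, IPv4Network
from typing import Optional


def uses_spt(group: IPv4Address,
             rate_kbps: float,
             threshold_kbps: float = 0.0,
             policy_range: Optional[IPv4Network] = None) -> bool:
    """Return True if traffic for `group` should be moved from the RPT to the SPT.

    A threshold of 0 models the default behavior, in which the switchover
    is triggered as soon as the first multicast data arrives.
    """
    if policy_range is not None and group not in policy_range:
        # Groups outside the switchover policy stay on the RPT.
        return False
    return rate_kbps > threshold_kbps
```

For example, with a policy range of 225.1.0.0/16 and a threshold of 100, a 500 kbit/s flow to 225.1.1.1 is moved to the SPT, while a flow of the same rate to 226.0.0.1 stays on the RPT.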
Assert
Either of the following conditions indicates that other multicast forwarders are present on a
network segment:
l A multicast message fails the RPF check.

l A multicast message receiving interface is recorded as a downstream interface in an (S,
G) entry on the local ATN.
If other multicast forwarders are present on the network segment, the ATN starts the Assert
mechanism as follows.
The ATN sends an Assert message through the downstream interface, and the downstream
interface also receives an Assert message from a forwarder on the network segment. In an
Assert message, the destination address is 224.0.0.13, the source address is the downstream
interface address, and the TTL is 1. An Assert message carries the route cost from the
ATN to the source or RP, the priority of the unicast routing protocol used, and the group address.
The ATN compares information in its sent and received Assert messages to start Assert
election. The election rules are as follows:
1. The ATN with the highest unicast routing protocol priority wins.
2. If the ATNs have the same unicast routing protocol priority, the ATN with the smaller
route cost to the source or RP wins.
3. If the ATNs have the same priority and route cost, the ATN with the highest IP address
for the downstream interface wins.
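The three election rules can be expressed as one ordered comparison. In this sketch the `AssertInfo` type and `elect_assert_winner` function are hypothetical. The routing protocol priority is modeled as a numeric preference where a smaller value means a higher priority; that numeric convention is an assumption taken from common practice and the PIM specification, not from the text above.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class AssertInfo:
    address: IPv4Address  # downstream interface address (source of the Assert)
    preference: int       # routing protocol preference; smaller = higher priority (assumption)
    cost: int             # route cost to the source or RP


def elect_assert_winner(forwarders: list[AssertInfo]) -> AssertInfo:
    """Apply rules 1 to 3 in order and return the Assert winner."""
    return max(
        forwarders,
        key=lambda f: (
            -f.preference,  # 1. highest routing protocol priority
            -f.cost,        # 2. smallest route cost
            f.address,      # 3. highest downstream interface address
        ),
    )
```

Every forwarder on the segment evaluates the same tuple from its own and the received Assert messages, so all of them agree on a single winner.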
The ATN performs the following operations based on the Assert election result:
l If the ATN wins, the downstream interface of the ATN is responsible for forwarding
multicast packets on the network segment. The downstream interface is called an Assert
winner.
l If the ATN loses, the downstream interface is prohibited from forwarding multicast
packets and deleted from the downstream interface list of the (S, G) entry. The
downstream interface is called an Assert loser.
After the Assert election is complete, only one upstream ATN that has a downstream interface
exists on the network segment and the downstream interface transmits only one copy of
multicast traffic. The Assert winner then periodically sends Assert messages to maintain its
status as the Assert winner. If the Assert loser does not receive any Assert message from the
Assert winner within the time limit, it re-adds the downstream interface for multicast data
forwarding.
DR Switchover Delay
If an existing DR fails, the PIM neighbor relationship times out, and a new DR election is
triggered.
By default, when an interface changes from a DR to a non-DR, the ATN immediately stops
using the interface to forward data. If multicast data sent from a new DR has not yet arrived,
multicast data flows are temporarily interrupted.
However, if a PIM-SM interface that has a PIM DR switchover delay configured receives
Hello messages from a new neighbor and changes from a DR to a non-DR, the interface
continues to function as a DR and to forward multicast messages until the delay elapses.
If the PIM-SM interface receives packets from a new DR before the delay elapses, the
interface immediately stops forwarding messages, preventing duplicated multicast data
transmission. Therefore, when a new IGMP Report message is received on the shared network
segment, the new DR, not the old DR configured with the DR switchover delay, sends a PIM
Join message to the upstream device.
NOTE
If the new DR receives multicast data from the original DR before the DR switchover delay elapses, an
Assert election is triggered.
PIM-SM Administrative Domain

A PIM-SM network can be divided into one global domain and multiple BSR administrative
domains to simplify network management and to reduce workloads on a single BSR. The
division also allows for the use of private group addresses to provide user services in a
specified domain.
Each BSR administrative domain has one BSR, and this BSR serves multicast groups in a
specific address range. The global domain also has one BSR, and the BSR serves all multicast
groups that are not served by the BSR administrative domains.
The relationship between the BSR administrative domain and the global domain is described
as follows in terms of the domain space, group address range, and multicast function.
l Domain space
Figure 9-16 Space difference
[Figure not reproduced. Labels: BSR1 domain, BSR2 domain, Global domain; each domain contains a BSR and C-RPs.]
Each BSR administrative domain contains its own ATNs, and each ATN belongs only
to one BSR administrative domain, as shown in Figure 9-16. BSR administrative
domains are independent and geographically isolated from each other. Each BSR
administrative domain serves multicast groups in a specific address range and cannot
transmit multicast messages to other BSR administrative domains.
The global domain contains all the ATNs on the PIM-SM network. The multicast
messages that do not belong to a BSR administrative domain can be transmitted over the
entire PIM network.
l Group address range

Figure 9-17 Group address range difference
[Figure not reproduced. Labels: BSR1 (G1 address), BSR2 (G2 address), BSR3 (G3 address), Global (G-G1-G2 address).]
Each BSR administrative domain provides services for multicast groups within a specific
address range. The multicast groups that different BSR administrative domains serve can
overlap. However, a multicast group address is valid only in its serving BSR
administrative domain and considered a private group address of the domain. As shown
in Figure 9-17, the group address range of BSR1 overlaps with that of BSR3.
The multicast group that does not belong to any BSR administrative domain belongs to
the global domain. In the example shown in Figure 9-17, the group address range of the
global domain is G-G1-G2.
l Multicast function
The global domain and each BSR administrative domain have their respective C-RP and
BSR devices, as shown in Figure 9-16. Devices only function in the domain to which
they are assigned. Each BSR administrative domain performs BSR and RP elections
independently.
Each BSR administrative domain has a border. Multicast information for this domain,
such as the C-RP Advertisement messages and BSR Bootstrap message, can be
transmitted only within the domain. Multicast information for the global domain can be
transmitted over the entire global domain and can traverse any BSR administrative
domain.
9.2.2.3 PIM-SSM
PIM-SM needs to maintain Rendezvous Points (RPs) to transmit multicast data. If receivers
know the exact location of a multicast source and want to request multicast data directly from
a multicast source, Protocol Independent Multicast-Source-Specific Multicast (PIM-SSM) can
enable user hosts to rapidly join multicast groups. A shortest path tree (SPT) is set up between
the multicast source and group members.
Unlike the Any-Source Multicast (ASM) model, the Source-Specific Multicast (SSM) model
does not need to maintain an RP, construct a rendezvous point tree (RPT), or register a
multicast source.

The SSM model is based on the PIM-SM technology and IGMPv3/Multicast Listener
Discovery (MLD)v2. The procedure for setting up a multicast forwarding tree on a PIM-SSM
network is similar to the procedure for setting up an SPT on a PIM-SM network. The
receiver's Designated router (DR), which knows the exact position of the multicast source,
sends Join messages directly to the source so that multicast data streams can be sent to the
receiver's DR.
Related Concepts
PIM-SSM is implemented based on the PIM-SM technology. For details about PIM-SSM
concepts, see Related Concepts.
Implementation
The process for forwarding multicast data in a PIM-SSM domain is as follows:
1. Neighbor Discovery
Each PIM device in a PIM-SSM domain periodically sends Hello messages to all other
PIM devices in the domain to discover PIM neighbors and maintain PIM neighbor
relationships.
NOTE
However, if a PIM device has the neighbor check function, the PIM device permits other PIM
control messages or multicast messages from a neighbor only after the PIM device has received
Hello messages from the neighbor.
2. DR Election
PIM devices exchange Hello messages to elect a DR on a shared network segment. The
receiver's DR is the only multicast data forwarder on the segment.
3. SPT Setup
Users on a PIM-SSM network can know the exact position of the multicast source in
advance and can, therefore, specify the source when joining a multicast group. After
receiving a Report message from a user, the receiver's DR sends a Join message towards
the multicast source to establish an SPT between the source and the user. Multicast data
is sent by the multicast source to the user along the SPT.
NOTE
l The SPT establishment can be triggered by a user dynamically joining a multicast group, static join,
or SSM-mapping.
l The DR in an SSM scenario is valid only in the shared network segment connected to group
members. The DR on the group member side sends Join messages to the multicast source, creates
the (S, G) entry hop by hop, and then sets up an SPT.
l PIM-SSM supports a PIM DR switchover delay, PIM silent, and BFD for PIM.
9.2.2.4 PIM Reliability
The following mechanisms are used to ensure PIM reliability:

l BFD for PIM
l PIM NSR

Basic Principles of BFD for PIM

Network devices need to detect communication faults with adjacent devices rapidly and take
measures to keep services running normally in order to reduce the impact of faults.
Currently, the following detection mechanisms are used:
l Hardware detection: For example, the Synchronous Digital Hierarchy (SDH) alarm
function can be used to detect link faults. Hardware fault detection mechanisms are fast,
but cannot be used in all scenarios by all media.
l Slow Hello mechanism: It usually refers to the Hello mechanism of a routing protocol.
The detection rate for slow Hello mechanisms is measured in seconds. Detection times of
one second or more can result in large losses if data is being transmitted at gigabit
rates. For delay-sensitive services such as voice, a delay of one second or more is
also unacceptable.
l Other detection mechanisms: Different protocols or manufacturers may provide
proprietary detection mechanisms, but it is difficult to deploy proprietary mechanisms
when systems are interconnected for interworking.
Bidirectional Forwarding Detection (BFD) is a unified detection mechanism for the entire
network. BFD can be used with all types of transmission media and protocols. BFD can detect
a fault in milliseconds. Two systems set up a BFD session, and periodically send the BFD
packets along the path. If one system does not receive BFD packets within a specified period,
the system considers the path to be faulty.
In multicast scenarios, if the current Designated router (DR) on the shared network segment is
faulty and the neighbor relationship times out, other PIM neighbors start a new DR election.
Consequently, multicast data transmission is discontinued. The duration of the disruption,
usually a few seconds, is longer than the timeout period of the neighbor relationship.
BFD for PIM can detect the status of a link on a shared network segment within milliseconds
and respond quickly to a fault on a PIM neighbor. If the interface configured with BFD for
PIM does not receive any BFD packets from the current DR within a configured detection
period, it considers that a fault has occurred on the DR. BFD notifies the route management
(RM) module of the session status, and the RM module then notifies PIM. The PIM module triggers a
new DR election immediately rather than waiting for the neighbor relationship to time out.
This reduces the duration of multicast data transmission interruptions and makes multicast
more reliable.
NOTE
Currently, BFD for PIM can be used on IPv4 PIM-SM/SSM networks.

Figure 9-18 Diagram of BFD for PIM
[Figure not reproduced. Labels: Source, RouterA, PIM-SM, ATN B (GE1), ATN C (GE2), Ethernet (two segments), Receiver.]
As shown in Figure 9-18, in the shared network segment where user hosts reside, a PIM BFD
session is set up between the downstream interface GE1 of ATN B and the downstream
interface GE2 of ATN C. Both interfaces send BFD packets to detect the status of the link
between them.
GE1 of ATN B is elected as a DR for forwarding multicast data to Receiver. If GE1 fails,
BFD quickly notifies the RM of the session status, and the RM then notifies PIM. PIM triggers a
new DR election. GE2 of ATN C is then elected as the new DR to forward multicast data to
Receiver.
PIM NSR
Multicast NSR enables the protocol control plane to back up protocol control information,
including neighbor information, MDT information, and RP set information. Multicast NSR
also synchronizes information between the protocol control and forwarding control planes,
and between the forwarding control planes of the MPU and LPUs.
Currently, multicast NSR can be used with PIM-SM and PIM-SSM.
NOTE
Multicast NSR cannot be used with PIM-DM.
9.2.2.5 PIM Security

To ensure that multicast services are correctly transmitted on networks, PIM security is
implemented to limit the valid BSR and C-RP address ranges, filter packets, and check PIM
neighbors.

Table 9-7 PIM security

PIM security feature: Limit on the Bootstrap router (BSR) address range
Applicable protocol: PIM-SM
Purpose: Any router on a PIM-SM network that uses the BSR mechanism can be configured as a Candidate-BSR (C-BSR) and participate in a BSR election. The winner of the BSR election is responsible for advertising Rendezvous Point (RP) information on the network. This function is used to guarantee BSR security by preventing BSR spoofing and malicious hosts from replacing valid BSRs.
Principle: An ACL and filtering rules can be configured to limit the range of valid BSR addresses. Consequently, devices will discard BSR packets carrying BSR addresses outside the valid address range.
Applicable device: All multicast devices on a network
Protected device: BSR

PIM security feature: Limit on the Candidate-RP (C-RP) address range
Applicable protocol: PIM-SM
Purpose: Any router on a PIM-SM network that uses the BSR mechanism can be configured as a C-RP and serve multicast groups in a specified range. Each C-RP unicasts an Advertisement message to the BSR. The BSR collects all received C-RP information, summarizes it as the RP-set, and floods the RP-set on the entire network using Bootstrap messages. Based on the RP-set, routers on the network can calculate the RP to which a multicast group in a specific range corresponds. This function is used to guarantee C-RP security by preventing C-RP spoofing and malicious hosts from replacing valid C-RPs. With this function, an RP can be correctly elected.
Principle: An ACL and filtering rules can be configured to limit the range of valid C-RP addresses and the range of multicast groups that each C-RP serves. Then the BSR will discard Advertisement messages carrying C-RP addresses outside the valid C-RP address range.
Applicable device: C-BSR
Protected device: RP

PIM security feature: Register message filtering
Applicable protocol: PIM-SM
Purpose: Any new multicast source on a PIM-SM network must initially register with the RP. The RP forwards multicast data sent by a multicast source to group members after receiving a Register message from the multicast source's Designated router (DR). This function is used to protect the network against invalid Register messages from malicious devices. With this function, multicast forwarding trees can be correctly set up so that multicast data can be correctly sent to receivers.
Principle: An ACL and filtering rules can be configured to enable the RP to filter Register messages received from the multicast source's DR.
Applicable device: RP
Protected device: RP

PIM security feature: PIM neighbor filtering
Applicable protocol: PIM-DM, PIM-SM, and Protocol Independent Multicast-Source-Specific Multicast (PIM-SSM)
Purpose: Some unknown devices on a network may set up PIM neighbor relationships with a multicast router and prevent the multicast router from functioning as a DR. This function is used to prevent a multicast router from setting up PIM neighbor relationships with unknown devices and prevent an unknown router from becoming a DR.
Principle: An ACL and filtering rules can be configured to enable interfaces to set up neighbor relationships only with interfaces that have valid addresses and to delete neighbors with invalid addresses.
Applicable device: All multicast devices on a network
Protected device: All multicast devices on a network

PIM security feature: Join information filtering
Applicable protocol: PIM-DM, PIM-SM, and PIM-SSM
Purpose: A Join/Prune message received by an interface contains both join and prune information. This function is used to filter join information to prevent unauthorized users from joining multicast groups.
Principle: An ACL and filtering rules can be configured to filter join information. Devices create PIM entries based on valid Join information.
Applicable device: All multicast devices on a network
Protected device: All multicast devices on a network

PIM security feature: Source address-based filtering
Applicable protocol: PIM-DM, PIM-SM, and PIM-SSM
Purpose: This function enables a device to filter multicast data packets based on source or source/group addresses, ensuring the security of multicast data packets.
Principle: An ACL and filtering rules can be configured to enable devices to forward multicast packets carrying source or source/group addresses within the valid source or source/group address range.
Applicable device: All multicast devices on a network
Protected device: All multicast devices on a network

PIM security feature: PIM neighbor check
Applicable protocol: PIM-DM, PIM-SM, and PIM-SSM
Purpose: This function guarantees the security of Join/Prune or Assert messages received or sent by devices.
Principle: When receiving or sending Join/Prune or Assert messages, a device checks whether the messages are sent to or received from a PIM neighbor. If these messages are not sent to or received from a PIM neighbor, these messages will be discarded.
Applicable device: All multicast devices on a network
Protected device: All multicast devices on a network

PIM security feature: PIM silent
Applicable protocol: PIM-DM, PIM-SM, and PIM-SSM
Purpose: If PIM-SM is enabled on the interface directly connecting a multicast device to user hosts, this interface can set up PIM neighbor relationships and process PIM packets. If a malicious host sends pseudo PIM Hello packets to the multicast device, the multicast device may break down. This function is used to protect interfaces of PIM-SM devices against pseudo PIM Hello packets.
Principle: The interface is not allowed to receive or forward any PIM packets, and all PIM neighbor relationships established by this interface are deleted.
Applicable device: Interface directly connected to the user host network segment that is connected to only one PIM device.
Protected device: PIM devices directly connected to user host network segments.

PIM security feature: PIM IPSec
Applicable protocol: PIM-DM, PIM-SM, and PIM-SSM
Purpose: This function is used to authenticate IPv4 or IPv6 PIM packets to prevent bogus IPv4 or IPv6 PIM protocol packet attacks or denial of service (DoS) attacks, improving multicast service security.
Principle: PIM IPSec uses security association (SA) to authenticate sent and received IPv4 or IPv6 PIM packets. The PIM IPSec implementation process is as follows:
l Before an interface sends out an IPv4 or IPv6 PIM protocol packet, IPSec adds a protocol header to the packet.
l After an interface receives an IPv4 or IPv6 PIM protocol packet, IPSec uses a protocol header to authenticate the protocol header in the packet. If the authentication is successful, the packet is forwarded. Otherwise, the packet is discarded.
PIM IPSec can authenticate the following types of IPv4 or IPv6 PIM packets:
l IPv4 or IPv6 PIM multicast protocol packets, such as Hello and Join/Prune packets.
l IPv4 or IPv6 PIM unicast protocol packets, such as Register and Register-Stop packets.
NOTE
For IPsec feature description, see IPSec.
Applicable device: All PIM devices on a network.
Protected device: All PIM devices on a network.
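The address range limits in Table 9-7 all follow the same pattern: an address taken from a received message is checked against a configured set of permitted ranges, and the message is discarded on a mismatch. The sketch below illustrates that pattern for the BSR address limit. The function name `accept_bootstrap` and the use of CIDR strings are assumptions for illustration and do not correspond to any ATN command or ACL syntax.

```python
from ipaddress import IPv4Address, IPv4Network


def accept_bootstrap(bsr_address: str, valid_ranges: list[str]) -> bool:
    """Return True if the BSR address carried in a Bootstrap message is permitted.

    A False result corresponds to discarding the message.
    """
    addr = IPv4Address(bsr_address)
    return any(addr in IPv4Network(r) for r in valid_ranges)
```

The same check, applied to the C-RP address in an Advertisement message or to the source address in a Register message, gives the C-RP address limit and Register message filtering respectively.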
9.2.2.6 PIM Control Message

PIM devices exchange control messages to implement multicast routing. A PIM control
message is encapsulated in an IP packet, as shown in Figure 9-19.
Figure 9-19 Encapsulation format of a PIM control message
IP header PIM message
l The value of the protocol type field in the IP header is 103. It indicates that the PIM
message is encapsulated in the data field.
l The destination address of the IP header identifies the receiver of the PIM message. The
destination address can be either a unicast address or a multicast address.
l PIM-DM and PIM-SM support different control messages.
Types of PIM Control Message

The headers of all PIM control messages use the same format, as shown in Figure 9-20.
Figure 9-20 Format of the PIM protocol message
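 0       4       8               16                           31
+-------+-------+---------------+-------------------------------+
|Version| Type  |   Reserved    |           Checksum            |
+-------+-------+---------------+-------------------------------+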
Table 9-8 Description of the fields of the PIM control message
Field Description
Version Indicates the PIM version. The value is 2.

Type Indicates the message type.

l 0: Hello (applicable to PIM-DM, PIM-SM, and Protocol
Independent Multicast-Source-Specific Multicast (PIM-
SSM))
l 1: Register (applicable only to PIM-SM)
l 2: Register-Stop (applicable only to PIM-SM)
l 3: Join/Prune (applicable to PIM-DM, PIM-SM, and PIM-
SSM)
l 4: Bootstrap (applicable only to PIM-SM)
l 5: Assert (applicable to PIM-DM, PIM-SM, and PIM-SSM)
l 6: Graft (applicable only to PIM-DM)
l 7: Graft-Ack (applicable only to PIM-DM)
l 8: Candidate-RP-Advertisement (applicable only to PIM-
SM)
l 9: State Refresh (applicable only to PIM-DM)
Reserved Indicates that this field is reserved.
Checksum Indicates the checksum.
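The header layout in Figure 9-20 and the type values in Table 9-8 can be illustrated with a short Python sketch. The function names are hypothetical. The checksum routine assumes the standard 16-bit one's-complement Internet checksum, which this section does not state explicitly.

```python
import struct

# Type values copied from Table 9-8.
PIM_TYPES = {
    0: "Hello", 1: "Register", 2: "Register-Stop", 3: "Join/Prune",
    4: "Bootstrap", 5: "Assert", 6: "Graft", 7: "Graft-Ack",
    8: "Candidate-RP-Advertisement", 9: "State Refresh",
}


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (assumed algorithm, see lead-in)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_pim_header(message: bytes) -> dict:
    """Split the first 4 bytes into Version, Type, Reserved, and Checksum."""
    if len(message) < 4:
        raise ValueError("a PIM header needs at least 4 bytes")
    ver_type, reserved, checksum = struct.unpack("!BBH", message[:4])
    msg_type = ver_type & 0x0F
    return {
        "version": ver_type >> 4,
        "type": msg_type,
        "type_name": PIM_TYPES.get(msg_type, "unknown"),
        "reserved": reserved,
        "checksum": checksum,
    }


def build_pim_message(msg_type: int, body: bytes = b"") -> bytes:
    """Build a version 2 message; the checksum is computed with the field set to 0."""
    header = struct.pack("!BBH", (2 << 4) | msg_type, 0, 0)
    checksum = internet_checksum(header + body)
    return struct.pack("!BBH", (2 << 4) | msg_type, 0, checksum) + body
```

For example, `parse_pim_header(build_pim_message(3))` reports version 2 and the type name Join/Prune.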
Hello Message
NOTE
Hello messages are used in both PIM-DM and PIM-SM, so a Hello message does not indicate
whether the sender runs PIM-DM or PIM-SM.
PIM devices periodically send Hello messages through all PIM interfaces. PIM devices
discover neighbors and maintain neighbor relationships by exchanging Hello messages.
The source address of the IP packet that encapsulates the Hello message is the local interface
address. The destination address is 224.0.0.13, and the TTL value is 1. The message is
transmitted in multicast mode.
Figure 9-21 Format of the Hello message
Hello Option [1]
...
Hello Option [N]

Figure 9-22 Hello Option field format
Option Type Option Length
Option Value
Table 9-9 Description of the fields of the Hello message

Field Description
Type Indicates the message type. The value is 0.
Reserved Indicates that this field is reserved. The field is set to 0 when
the message is sent, and is ignored when the message is
received.
OptionType Indicates the types of parameters. For the valid values, see
Table 9-10.
OptionLength Indicates the length of the Option Value field.
OptionValue Indicates the values of parameters.
Table 9-10 Valid OptionType values

OptionType OptionValue
1 Holdtime: indicates the timeout period during which the
neighbor is in the reachable state. If the ATN does not receive
any Hello message in the timeout period, the ATN considers its
neighbor unreachable.
2 The field consists of the following parts:
l LAN Prune Delay: indicates the delay for transmitting
Prune messages in the shared network segment.
l Interval: indicates the period for overriding the prune action
in the shared network segment.
l T: indicates the suppression capability of the Join message.
19 DR Priority: indicates the priorities of the ATN interfaces that
take part in the DR election. The higher the priority, the more
likely the interface is to win the election.
20 Generation ID: indicates a random number carried in the
Hello message. It indicates the status of the neighbor. If the
status of the neighbor changes, the random number is updated.
When the ATN finds that the Hello messages received from the
upstream contain different Generation ID values, the ATN
considers that the upstream neighbor is lost or the status of the
upstream neighbor changes.
21 State Refresh Capable: indicates the interval for refreshing the
status of the neighbor.
24 Address List: indicates the secondary address list of PIM
interfaces.
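The option list in Figure 9-21 and Figure 9-22 is a sequence of type-length-value entries. The following Python sketch walks such a list. It assumes that OptionType and OptionLength are each 16 bits wide, which the figures do not show. The function name is hypothetical.

```python
import struct

# Option types listed in Table 9-10.
HELLO_OPTIONS = {
    1: "Holdtime", 2: "LAN Prune Delay", 19: "DR Priority",
    20: "Generation ID", 21: "State Refresh Capable", 24: "Address List",
}


def parse_hello_options(body: bytes) -> list:
    """Walk the options that follow the 4-byte PIM header of a Hello message."""
    options, offset = [], 0
    while offset < len(body):
        if offset + 4 > len(body):
            raise ValueError("truncated option header")
        # Assumed widths: 16-bit OptionType, 16-bit OptionLength.
        opt_type, opt_len = struct.unpack_from("!HH", body, offset)
        offset += 4
        value = body[offset:offset + opt_len]
        if len(value) != opt_len:
            raise ValueError("truncated option value")
        offset += opt_len
        options.append((opt_type, HELLO_OPTIONS.get(opt_type, "unknown"), value))
    return options
```

Unknown option types are kept with the name "unknown" rather than rejected, so a receiver built this way can skip options it does not understand.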
Register Message
NOTE
The Register message is used only in PIM-SM.
When an active multicast source appears in the PIM-SM network, the Designated router (DR)
at the source side sends a Register message to the Rendezvous Point (RP) to register the
source.
The source address of the IP packet that encapsulates the Register message is the address of
the DR at the source side and the destination address is the address of the RP. The message is
transmitted in unicast mode.
Figure 9-23 Format of the Register message

Destination IP
B N Reserved2
Address
Multicast Data Packet
Table 9-11 Description of the fields of the Register message

Field Description
Reserved Indicates that this field is reserved. The field is set to 0 when
the message is sent, and is not processed when the message is
received.
B Indicates the border bit.

N Indicates the Null-Register bit.
Reserved2 Indicates that this field is reserved. The field is set to 0 when
the message is sent, and is not processed when the message is
received.
Multicast data packet Indicates the multicast data packet. The DR at the source side
encapsulates the received multicast data in a Register message
and sends the message to the RP. After decapsulating the
message, the RP learns the (S, G) information of the multicast
data packet.
If a multicast source sends data to multiple groups, the DR at the source side must send a
Register message to the RP that corresponds to each group. A Register message
encapsulates only one multicast data packet, so the message carries only one copy of the (S,
G) information.
In the register suppression period, the DR sends Null-Register messages to notify the RP that
the multicast source is still in the active state. After the register suppression times out, the DR
resumes using Register messages to encapsulate multicast data packets. In a Null-Register
message, the Multicast data packet field contains only the IP header of the multicast data
packet, including the source address and group address.
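The difference between a Register message and a Null-Register message can be sketched in Python as follows. The sketch assumes that B and N are the two high-order bits of the 32-bit word that follows the PIM header and that the inner packet is an IPv4 packet with a 20-byte header and no options. The function names are hypothetical.

```python
import struct

B_BIT = 0x80000000  # border bit (assumed position: highest bit)
N_BIT = 0x40000000  # Null-Register bit (assumed position: second-highest bit)


def build_register_body(inner_packet: bytes, border: bool = False,
                        null_register: bool = False,
                        ip_header_len: int = 20) -> bytes:
    """Return the part of a Register message that follows the PIM header."""
    flags = (B_BIT if border else 0) | (N_BIT if null_register else 0)
    # A Null-Register carries only the IP header of the multicast data packet.
    payload = inner_packet[:ip_header_len] if null_register else inner_packet
    return struct.pack("!I", flags) + payload


def parse_register_body(body: bytes) -> dict:
    """Recover the B and N bits and the encapsulated packet."""
    (flags,) = struct.unpack_from("!I", body)
    return {
        "border": bool(flags & B_BIT),
        "null_register": bool(flags & N_BIT),
        "packet": body[4:],
    }
```

In both cases the RP can still read the source and group addresses from the encapsulated IP header, which is how it learns the (S, G) information.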
Register-Stop Message
NOTE
The Register-Stop message is used only in PIM-SM.
In a PIM-SM network, the RP sends Register-Stop messages to the DR at the source side in
the following cases:
l Receivers no longer request the data of a certain group through the RP.
l The RP does not serve a certain multicast group.
l Multicast data has been switched from the rendezvous point tree (RPT) to the shortest
path tree (SPT).
After receiving the Register-Stop message, the DR at the source side stops using Register
messages to encapsulate multicast data packets and enters the register suppressed state.
The source address of the IP packet that encapsulates the Register-Stop message is the address
of the RP and the destination address is the address of the DR at the source side. The message is
transmitted in unicast mode.
Figure 9-24 Format of the Register-Stop message
Version Type -
Group Address
Source Address

Table 9-12 Description of the fields of the Register-Stop message

Field Description
Group Address Indicates the group address G.
Source Address Indicates the source address S.
An RP can serve multiple groups at the same time, and a group may have multiple
sources that send data to it. Therefore, an RP may process multiple (S, G) registrations
at the same time.
A Register-Stop message carries only one copy of the (S, G) information. When the RP sends
a Register-Stop message to the DR at the source side, the RP can end only one (S, G)
registration.
After receiving the Register-Stop message carrying the (S, G) information, the DR at the
source side stops encapsulating (S, G) packets. The DR still uses Register messages to
encapsulate the packets that S sends to other groups.
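Because each Register-Stop message ends exactly one (S, G) registration, the DR at the source side can be pictured as keeping a separate suppression state per (S, G). The following Python sketch models only that bookkeeping. The class and method names are hypothetical, and timers are reduced to explicit method calls.

```python
class SourceSideDR:
    """Tracks which (S, G) registrations are in the register suppressed state."""

    def __init__(self):
        self.suppressed = set()

    def on_register_stop(self, source: str, group: str) -> None:
        # One Register-Stop message affects exactly one (S, G).
        self.suppressed.add((source, group))

    def on_suppression_timeout(self, source: str, group: str) -> None:
        # After the suppression period, normal registration resumes.
        self.suppressed.discard((source, group))

    def message_kind(self, source: str, group: str) -> str:
        """Message that the DR sends towards the RP for this (S, G)."""
        if (source, group) in self.suppressed:
            return "null-register"
        return "register"
```

With this model, a Register-Stop for (S, G1) leaves the registration for (S, G2) untouched, which matches the behavior described above.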
Join/Prune Message
NOTE
The Join/Prune message is used in PIM-DM and PIM-SM.
A Join/Prune message can contain both Join information and Prune information. A Join/Prune
message that contains only Join information is called a Join message. A Join/Prune message
that contains only Prune information is called a Prune message.
l When the downstream interface of a PIM device does not have any receiving
requirement, the PIM device sends a Prune message through the upstream interface to
instruct the upstream device to stop forwarding packets to the network segment.
l When a group member appears in the PIM-SM network, the DR at the group member
side sends a Join message through the Reverse Path Forwarding (RPF) interface towards the
RP to instruct the upstream neighbor to forward packets to the network segment. The
Join message is sent upstream hop by hop, and the RPT is set up.
l When the RP triggers the SPT switchover, the RP sends a Join message through the RPF
interface towards the source to instruct the upstream neighbor to forward packets to the
network segment. The Join message is sent upstream hop by hop, and the RP-source
tree is set up.
l When the DR at the group member side triggers the SPT switchover, the DR sends a Join
message through the RPF interface towards the source to instruct the upstream neighbor to
forward packets to the network segment. The Join message is sent upstream hop by
hop, and the SPT is set up.
l A PIM network segment may be connected to a downstream interface and multiple
upstream interfaces. Assume that an upstream interface sends a Prune message. If other
upstream interfaces still need to receive multicast packets, these interfaces must send a
Join message within the override-interval. Otherwise, the downstream interface
responsible for forwarding packets in the network segment performs the prune
action.
NOTE
l On a multicast network, if PIM is enabled on the interfaces of the user-side routers, a DR is
elected, and outbound interfaces are added to the PIM DR. The PIM DR sends Join messages
to the RP.
l On a multicast network, if PIM is not enabled on the interfaces of the user-side routers, no
DR is elected on the member side, and outbound interfaces are added to the IGMP querier.
The IGMP querier sends Join messages to the RP.
As shown in Figure 9-25, GE1 of ATN A is a downstream interface, and GE2 of ATN B
and GE3 of ATN C are upstream interfaces. If ATN B sends a Prune message through
GE2, GE3 of ATN C and GE1 of ATN A can receive the message. If ATN C still wants
to receive the multicast data of the group, ATN C needs to send a Join message within
the override-interval. GE1 of ATN A can know that a downstream ATN still wants to
receive the multicast data. Therefore, ATN A does not perform the prune action.
Figure 9-25 Join/Prune Messages in the PIM Shared Network Segment
ATN A
GE1
Ethernet
Prune Join
GE2 GE3
ATN B ATN C
The source address of the IP packet that encapsulates the Join/Prune message is the local
interface address. The destination address is 224.0.0.13, and the TTL value is 1. The message
is transmitted in multicast mode.
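The override behavior on the shared network segment in Figure 9-25 can be sketched as a small timer model in Python. The sketch is a simplification: it uses a single override_interval value, takes the current time as an explicit argument, and uses hypothetical class and method names.

```python
class DownstreamInterface:
    """Models GE1 of ATN A: it delays a prune so that other ATNs can override it."""

    def __init__(self, override_interval: float):
        self.override_interval = override_interval
        self.forwarding = True
        self.prune_deadline = None  # time at which a pending prune takes effect

    def on_prune(self, now: float) -> None:
        # Do not prune immediately; wait for a possible overriding Join.
        self.prune_deadline = now + self.override_interval

    def on_join(self, now: float) -> None:
        # A Join cancels any pending prune and keeps the interface forwarding.
        self.prune_deadline = None
        self.forwarding = True

    def tick(self, now: float) -> None:
        # Apply the prune once the override-interval has passed without a Join.
        if self.prune_deadline is not None and now >= self.prune_deadline:
            self.forwarding = False
            self.prune_deadline = None
```

If ATN B sends a Prune at time 0 and ATN C sends a Join at time 1 with an override interval of 2.5, the interface is still forwarding when `tick(3.0)` is called. Without the Join, the same call stops forwarding.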
Figure 9-26 Format of the Join/Prune message
Version Type -
Upstream Neighbor Address
- Number of Groups(N) Holdtime
Group J/P Record [ 1 ]
...
Group J/P Record [ N ]

Figure 9-27 Format of the Group J/P Record field
Group Address [ 1 ]
Number of Joined Sources( J ) Number of Pruned Sources( P )
Joined Source Address [ 1 ]
...
Joined Source Address [ J ]
Pruned Source Address [ 1 ]
...
Pruned Source Address [ P ]
Table 9-13 Description of the fields of the Join/Prune message

Field Description
Upstream Neighbor Address Indicates the address of the upstream neighbor, that is, the
address of the downstream interface that receives the Join/
Prune message and performs the Join and Prune actions.
Number of Groups Indicates the number of groups contained in the message.
Holdtime Indicates the amount of time a receiver keeps the Join/Prune
state, in seconds.
Group Address Indicates the group address.
Number of Joined Sources Indicates the number of sources that the ATN joins.
Number of Pruned Sources Indicates the number of sources that the ATN prunes.
Joined Source Address Indicates the address of the source that the ATN joins.
Pruned Source Address Indicates the address of the source that the ATN prunes.
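The nesting shown in Figure 9-26 and Figure 9-27 can be modeled with two small Python data classes. This is a sketch of the logical structure only; it does not reproduce the on-wire address encoding, and the class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GroupRecord:
    """One Group J/P Record: a group plus its joined and pruned sources."""
    group: str
    joined_sources: List[str] = field(default_factory=list)
    pruned_sources: List[str] = field(default_factory=list)


@dataclass
class JoinPruneMessage:
    upstream_neighbor: str
    holdtime: int  # seconds
    groups: List[GroupRecord] = field(default_factory=list)

    @property
    def number_of_groups(self) -> int:
        return len(self.groups)

    def kind(self) -> str:
        """Classify the message by the records it carries."""
        has_join = any(g.joined_sources for g in self.groups)
        has_prune = any(g.pruned_sources for g in self.groups)
        if has_join and has_prune:
            return "Join/Prune"
        if has_join:
            return "Join"
        if has_prune:
            return "Prune"
        return "empty"
```

A message whose records list only joined sources is classified as a Join message, and one whose records list only pruned sources as a Prune message.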
Bootstrap Message
NOTE
The Bootstrap message is applicable only to PIM-SM.
When the dynamic RP is used in the PIM-SM network, ATNs configured as Candidate-
BSRs (C-BSRs) periodically send Bootstrap messages through all PIM interfaces to take part in
the bootstrap router (BSR) election. The ATN that wins the election continues to send
Bootstrap messages carrying RP-Set information to all PIM devices in the domain.
The source address of the IP packet that encapsulates the Bootstrap message is the C-BSR
address and the destination address is 224.0.0.13. The packet is sent in multicast mode with
a TTL of 1, is forwarded hop by hop in the PIM-SM domain, and is eventually flooded to the
entire network.
Figure 9-28 Format of the Bootstrap message
Version Type -
Fragment Tag Hash Mask Length BSR-priority
BSR-Address
Group-RP Record [ 1 ]
...
Group-RP Record [ N ]
Figure 9-29 Format of the Group-RP Record field
Group Address
RP-Count Frag RP-Cnt(M) -
RP-address [ 1 ]
RP-holdtime [ 1 ] RP-Priority [ 1 ] -
...
RP-address [ M ]
RP-holdtime [ M ] RP-Priority [ M ] -
Table 9-14 Description of the fields of the Bootstrap message

Field Description
Fragment Tag Indicates a random number used to distinguish the Bootstrap
message.
Hash Mask length Indicates the length of the hash mask of the C-BSR.
BSR-priority Indicates the priority of the C-BSR.
BSR-Address Indicates the C-BSR address.
RP-Count Indicates the total number of the Candidate-RPs (C-RPs) that
serve the group.
Frag RP-Cnt Indicates the number of C-RPs for the group that are carried in
this fragment. A Bootstrap message may be fragmented, in which
case one fragment may not carry the complete RP-Set
information, so the field is used to indicate this.
RP-address Indicates the address of the C-RP.
RP-holdtime Indicates the aging time of the advertisement message sent by
the C-RP. The aging time indicates the valid time of the C-RP.
RP-Priority Indicates the priority of the C-RP.
The BSR boundary of a PIM interface can be set by using the pim bsr-boundary command
on the interface. Multiple BSR boundary interfaces divide the network into different PIM-SM
domains. Bootstrap messages cannot pass through the BSR boundary.
Assert Message
NOTE
The Assert message can be used in PIM-DM and PIM-SM.
In the shared network segment, if a PIM device receives an (S, G) packet from the
downstream interface of the (S, G) or (*, G) entry, it indicates that other forwarders exist in
the network segment. The PIM device sends an Assert message through the downstream
interface to take part in the election. The ATN that fails in the election stops forwarding
multicast packets through the downstream interface.
The source address of the IP packet that encapsulates the Assert message is the local
interface address, the destination address is 224.0.0.13, and the TTL value is 1. The packet is
sent in multicast mode.
Figure 9-30 Format of the Assert message
Version Type -
Group Address
Source Address
R Metric Preference
-
Metric

Table 9-15 Description of the fields of the Assert message

Field Description
Source address If the ATN elects the unique forwarder of the (S, G) entry, the
address is the source address. If the ATN elects the unique
forwarder of the (*, G) entry, the address is 0.
R Indicates the RPT bit. If the ATN elects the unique forwarder of
the (S, G) entry, the bit is set to 0; if the ATN elects the unique
forwarder of the (*, G) entry, the bit is set to 1.
Metric Preference Indicates the priority of the unicast path to the source address.
If the R field value is set to 1, this field indicates the priority of the
unicast path to the RP.
Metric Indicates the cost of the unicast route to the source address.
If the R field value is set to 1, this field indicates the cost of the
unicast path to the RP.
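The fields in Table 9-15 are the inputs to the assert election. This section does not state the comparison order, so the following Python sketch assumes the rule from the PIM-SM specification (RFC 4601): a lower R bit wins, then a lower metric preference, then a lower metric, and finally the higher IP address. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class AssertMetric:
    rpt_bit: int            # R field: 0 for (S, G), 1 for (*, G)
    metric_preference: int  # priority of the unicast route
    metric: int             # cost of the unicast route
    address: str            # address of the interface that sent the Assert

    def sort_key(self):
        # Lower values win for the first three fields. The address is negated
        # so that the higher address wins the final tie-break.
        return (self.rpt_bit, self.metric_preference, self.metric,
                -int(ip_address(self.address)))


def assert_winner(candidates):
    """Return the candidate that wins the election under the assumed rule."""
    return min(candidates, key=AssertMetric.sort_key)
```

Every ATN other than the winner stops forwarding multicast packets through its downstream interface on that network segment.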
Graft Message
NOTE
The Graft message is applicable only to PIM-DM.
In the PIM-DM network, when the ATN receives a Report message from a host, the ATN
sends a Graft message through the upstream interface of the related (S, G) entry if the ATN is
not on the SPT. The upstream neighbor immediately restores the forwarding of the
downstream interface. If the upstream neighbor is not on the SPT, the neighbor continues to
send the Graft message to the upstream.
The source address of the IP packet encapsulated with the Graft message is the local interface
address and the destination address is the RPF neighbor. The packet is sent in unicast mode.
The format of the Graft message is the same as that of the Join/Prune message. Only the
values of some fields are different, as listed in Table 9-16.
Table 9-16 Values of some fields of the Graft message

Field Description
Joined Source Address Indicates the source address of the (S, G) to be grafted.
Number of Pruned Sources Indicates that the field is 0.
Hold Time Indicates that the field is 0.

Graft-Ack Message
NOTE
The Graft-Ack message is applicable only to PIM-DM.
In the PIM-DM network, when the ATN receives a Graft message from the downstream, the
ATN restores the forwarding of the related downstream interface. At the same time, the ATN
sends a Graft-Ack message through the downstream interface to notify that it has received the
Graft message.
If the ATN that sends out the Graft message does not receive any Graft-Ack message in the
set time, the ATN considers that the upstream does not receive the Graft message and resends
it.
The source address of the IP packet that encapsulates the Graft-Ack message is the downstream
interface address of the upstream device and the destination address is the address of the ATN
that sends out the Graft message. The packet is sent in unicast mode.
The format of the Graft-Ack message is the same as that of the Graft message, and the
Graft-Ack message copies some contents of the Graft message. Only the values of some fields
are different.
Table 9-17 Values of some fields of the Graft-Ack message

Field Description
Upstream Neighbor Address Indicates the address of the ATN that sends out the Graft
message.
C-RP Advertisement Message

NOTE
The C-RP Advertisement message is applicable only to PIM-SM.
When the dynamic RP is used in the PIM-SM network, ATNs configured with C-RP
periodically send Advertisement messages to notify the BSR of the range of groups they want
to serve.
The source address of the IP packet that encapsulates the Advertisement message is the C-RP
address and the destination address is the BSR address. The packet is sent in unicast mode.
Figure 9-31 Format of the Advertisement message
Version Type -
Prefix-Cnt Priority Holdtime
RP-Address
Group Address [ 1 ]
...
Group Address [ N ]

Table 9-18 Description of the fields of the Advertisement message
Field Description
Prefix-Cnt Indicates the number of group address prefixes contained in the message.
Priority Indicates the C-RP priority.
Holdtime Indicates the aging time of the Advertisement message.
RP-Address Indicates the C-RP address.
State-Refresh Message
NOTE
The State-Refresh message is applicable only to PIM-DM.
In the PIM-DM network, to prevent a pruned interface from resuming forwarding when its
prune timer times out, the first-hop router nearest to the source periodically triggers
State-Refresh messages. The State-Refresh message is flooded in the entire network and the
prune timers on all ATNs are refreshed.
The source address of the IP packet that encapsulates the State-Refresh message is the
downstream interface address, the destination address is 224.0.0.13, and the TTL value is 1.
The packet is sent in multicast mode.
Figure 9-32 Format of the State-Refresh message
Version Type -
Multicast Group Address
Source Address
Originator Address
Metric Preference
Metric
Masklength TTL P - Interval
Table 9-19 Description of the fields of the State-Refresh message
Field Description
Multicast Group Address Indicates the group address.
Source Address Indicates the source address.
Originator Address Indicates the address of the first-hop router.
Metric Preference Indicates the priority of the unicast route to the source.
Metric Indicates the cost of the unicast route to the source.
Masklength Indicates the address mask length of the unicast route to the
source.
TTL Indicates the TTL of the State-Refresh message. The TTL is
used to limit the transmission range of the messages. The TTL
value is reduced by 1 each time the State-Refresh message is
forwarded by the ATN.
P Indicates the prune indicator flag. If the State-Refresh message
is sent out through the pruned interface, P is 1. Otherwise, P is
0.
Interval Indicates the interval for sending State-Refresh messages.
9.2.3 Applications
9.2.3.1 PIM-DM Intra-domain

Continuing development of the Internet has led to considerable growth in the types of data,
voice, and video information exchanged online. New services, such as Video on Demand
(VoD) and Broadcast Television (BTV), have emerged and continue to develop. Multicast
plays an increasingly important role in transmitting these services. This section describes
PIM-DM intra-domain networking.
Multicast services are deployed on the small-scale network shown in Figure 9-33. An IGP
has been deployed, and each network segment route is reachable. Group members are
distributed densely. Users want to receive VoD information without consuming too much
network bandwidth.

Figure 9-33 PIM-DM intra-domain

ATN A
HostA
PIM-DM
Source
RouterD ATN B
HostB
ATN C
Interfaces that need to be enabled with PIM-DM
Implementation Solution
On the network shown in Figure 9-33, Hosts A and B are multicast information receivers,
each located on a different leaf network. The hosts receive VoD information in multicast
mode. PIM-DM is used throughout the PIM domain. RouterD is connected to the multicast
source. ATN A is connected to Host A. ATNs B and C are connected to Host B.
Network configuration details are as follows:

l PIM-DM is enabled on all ATN interfaces.
l IGMP runs between ATN A and Host A, between ATN B and Host B, and between ATN
C and Host B.
When configuring IGMP on ATN interfaces, ensure that interface parameters are
consistent. All ATNs connected to the same network must run the same version of IGMP
(IGMPv2 is recommended) and be configured with the same interface parameter values,
such as the Query timer value and hold time of memberships. If the IGMP versions or
interface parameters are different, IGMP group memberships are inconsistent on
different ATNs.
l Hosts A and B can receive VoD information.
9.2.3.2 PIM-SM Intra-domain

Continuing development of the Internet has led to considerable growth in the types of data,
voice, and video information exchanged online. New services, such as Video on Demand
(VoD) and Broadcast Television (BTV), have emerged and continue to develop. Multicast
plays an increasingly important role in transmitting these services. This section describes
PIM-SM intra-domain networking.
Figure 9-34 shows a large-scale network with multicast services deployed. An IGP has been
deployed, and each network segment route is reachable. Group members are distributed
sparsely. Users on the network want VoD services, but network bandwidth resources are
limited.

Figure 9-34 PIM-SM intra-domain
S1 RouterB ATN C
HostA
Loopback0
Loopback0
C-BSR C-RP PIM-SM
S2
C-RP C-BSR
RouterA RouterD ATN E

HostB
RouterG
ATN F
Interfaces that need to be enabled with PIM-SM
As shown in Figure 9-34, Host A and Host B are multicast information receivers, each
located on a different leaf network. The hosts receive VoD information in multicast mode.
PIM-SM is used in the entire PIM domain. RouterB is connected to multicast source S1.
RouterA is connected to multicast source S2. ATN C is connected to Host A. ATNs E and F
are connected to Host B.

l PIM-SM is enabled on all equipment interfaces.
l As shown in Figure 9-34, multicast sources are densely distributed. Candidate-RPs (C-
RPs) can be deployed on devices close to the multicast sources. Loopback 0 interfaces
on Routers A and D are configured as Candidate-BSRs (C-BSRs) and C-RPs. A
bootstrap router (BSR) is elected among the C-BSRs. A Rendezvous Point (RP) is
elected among the C-RPs.
There are several RP deployment modes:
– Small and mid-scale network: A static RP is recommended because it is stable and
does not require high performance from devices.
If there is only one multicast source on the network, setting the device directly
connected to the multicast source as a static RP is recommended. The source's DR
also functions as the RP and does not need to register with the RP.
When a static RP is used, all devices, including the RP, must have the same
information about the RP and the multicast groups that the RP serves.
– Large-scale network: A dynamic RP or an anycast RP is recommended for high
network reliability and maintainability.
n Dynamic RP

○ If multiple multicast sources are densely distributed on the network,
configuring core devices close to the multicast sources as C-RPs is
recommended.
○ If multiple users are densely distributed on the network, configuring core
devices close to the users as C-RPs is recommended.
n Anycast RP
○ Small-scale network: A static RP is recommended.
○ Large-scale network: Configuring a BSR to elect an RP is recommended
to facilitate maintenance of RP information.
NOTE
Avoid configuring static RPs on some devices and dynamic RPs on others in the same PIM
domain. This ensures that RP information is consistent throughout the PIM domain.
l IGMP runs between ATN C and Host A, between ATN E and Host B, and between ATN F
and Host B.
When configuring IGMP on device interfaces, ensure that interface parameters are
consistent. All devices connected to the same network must run the same version of
IGMP (IGMPv2 is recommended) and be configured with the same parameter values,
such as the interval at which IGMP Query messages are sent and the holdtime of
memberships. Otherwise, IGMP group memberships on different devices are
inconsistent.
l Hosts A and B send Join messages to the RP as required to obtain required information
from the multicast source.
NOTE
Configuring interfaces on network edge devices to statically join all multicast groups is
recommended to increase the speed for changing channels and to provide a stable viewing
environment for users.
9.2.3.3 PIM-SSM Intra-domain

Both Protocol Independent Multicast-Source-Specific Multicast (PIM-SSM) and PIM-SM are
for use on large-scale networks where group members are sparsely distributed. Unlike PIM-
SM, PIM-SSM can be used in scenarios in which users know the multicast source location in
advance and send requests to specific sources for multicast data. This section describes PIM-
SSM intra-domain networking.
Multicast services are deployed on the large-scale network shown in Figure 9-35. An IGP has
been deployed, and each network segment route is reachable. Group members are sparsely
distributed on the network. User hosts on the network want to send Join messages directly to
specific multicast sources and receive Video on Demand (VOD) information.

Figure 9-35 PIM-SSM intra-domain
S1 RouterB ATN C
HostA
PIM-SM
S2
ATN E
RouterA RouterD
HostB
RouterG
ATN F
Interfaces that need to be enabled with PIM-SM
On the network shown in Figure 9-35, Hosts A and B are multicast information receivers,
each located on a different leaf network. The hosts receive VoD information in multicast
mode. PIM-SSM is used throughout the PIM domain. RouterB is connected to multicast
source S1. RouterA is connected to multicast source S2. ATN C is connected to Host A.
ATNs E and F are connected to Host B.
l PIM-SSM is enabled on all equipment interfaces.
NOTE
A receiver in a PIM-SSM scenario can send a Join message directly to a specific multicast source.
A shortest path tree (SPT) is established between the multicast source and receiver. It is
unnecessary to maintain a Rendezvous Point (RP) on a PIM-SSM network.
l IGMP runs between ATN C and Host A, between ATN E and Host B, and between ATN
F and Host B.
When configuring IGMP on device interfaces, ensure that interface parameters are
consistent. All devices connected to the same network must run the same version of
IGMP (IGMPv3 is recommended) and be configured with the same interface parameter
values, such as the Query timer value and hold time of memberships. If the IGMP
versions or interface parameters are different, IGMP group memberships are inconsistent
on different devices.
l Host A can send Join messages to S1. Host B can send Join messages to S2. Information
sent by these multicast sources can reach corresponding user hosts.
NOTE
Configuring interfaces on network edge devices to statically join all multicast groups is
recommended to increase the speed for changing channels and to provide a stable viewing
environment for users.

Terms
Terms Definition
(S, G) A multicast routing entry. S indicates a multicast source, and G indicates a

multicast group.
After a multicast packet with S as the source address and G as the group
address reaches the multicast device, it is forwarded through the downstream
interface of the (S, G) entry.
The packet is expressed as an (S, G) packet.
Assert A mechanism that applies to PIM-DM and PIM-SM.

After receiving a multicast packet from the downstream interface, a multicast
device performs an RPF check on the packet. If the RPF check fails, it
indicates that other multicast forwarders exist in the network segment. The
multicast device sends Assert messages through the downstream interface to
participate in the assert election. If the multicast device fails in the assert
election, it removes the downstream interface from the downstream interface
list.
Assert ensures that at most one multicast forwarder exists in a network segment
and that only one copy of each multicast packet is transmitted.
Flooding A routing method that applies to PIM-DM.

PIM-DM assumes that all members are densely distributed on the network
and each network segment may have members. According to the assumption,
the multicast source floods multicast packets to each network segment and
then prunes the network segment that does not have any member.
Through the periodical Flooding-Pruning, PIM-DM creates and maintains a
unidirectional and loop-free SPT connecting the multicast source and group
members.
Graft A behavior that applies only to PIM-DM.

After a downstream interface is added to the previously empty downstream
interface list of a multicast device, the multicast device sends a Graft
message upstream.
When receiving the Graft message, the interface of the upstream multicast
device immediately changes from the prune state to the forwarding state, and
is added to the downstream interface list.
PIM A multicast routing protocol, with a full name of Protocol Independent

Multicast.
Reachable unicast routes are the basis of PIM forwarding. PIM uses the
existing unicast routing information to perform Reverse Path Forwarding
(RPF) check on multicast packets to create multicast routing entries and set up
a Multicast Distribution Tree (MDT).

Prune A behavior that applies to PIM-DM and PIM-SM.

When the downstream interface of a PIM router does not have any receiving
requirement, the PIM router sends a Prune message through the upstream
interface to instruct the upstream device to stop forwarding packets to the
network segment.
After receiving the Prune message, the upstream multicast device removes the
downstream interface from the downstream interface list.

Acronym & Abbreviation Full Name
BSR BootStrap Router
C-BSR Candidate-BSR
RP Rendezvous Point
C-RP Candidate-RP
PIM-SM Protocol Independent Multicast-Sparse Mode
SSM Source-Specific Multicast
PIM-DM Protocol Independent Multicast-Dense Mode
PIM Protocol Independent Multicast
9.3 IGMP
9.3.1 Introduction
Definition
In the TCP/IP protocol suite, the Internet Group Management Protocol (IGMP) manages IPv4
multicast members, and sets up and maintains multicast member relationships between IP
hosts and their directly connected multicast ATNs.
After IGMP is configured on hosts and their directly connected multicast ATNs, the hosts can
dynamically join multicast groups, and the multicast ATNs can manage multicast group
members on the local network.
IGMP has three versions: IGMPv1 (defined by RFC 1112), IGMPv2 (defined by RFC 2236),
and IGMPv3 (defined by RFC 3376). All the IGMP versions support any-source multicast
(ASM). IGMPv3 supports source-specific multicast (SSM), not requiring the SSM mapping
technique; however, IGMPv1 and IGMPv2 require the SSM mapping technique to support
SSM.

Purpose
IGMP allows receivers to access IP multicast networks, join multicast groups, and receive
multicast data from multicast sources. IGMP manages multicast group members by
exchanging IGMP messages between hosts and ATNs. In addition, IGMP records host join
and leave information on interfaces, ensuring correct multicast data forwarding on the
interfaces.
9.3.2 Principles
9.3.2.1 IGMPv1&v2&v3
IGMP
Figure 9-36 IGMP networking
ISP
ATN A ATN B
Ethernet
HostA HostB HostC
The IGMP implementation principle is as follows:

IGMP enables a multicast ATN to identify receivers by sending IGMP Query messages to and
receiving IGMP Report and Leave messages from hosts. A multicast ATN forwards multicast
data to a network segment only if the network segment has multicast receivers. Hosts can
decide whether to join or leave a multicast group.
For example, on the network shown in Figure 9-36, the IGMP-enabled ATN A periodically
sends IGMP Query messages. All hosts (Host A, B, and C) on the same network segment as
ATN A can receive these Query messages.
l When a host (for example, Host A) receives an IGMP Query message of a multicast
group G, the processing flow is as follows:
– If Host A is already a member of group G, Host A replies with an IGMP Report
message of group G at a random time within the response period specified on ATN
A.
After receiving the IGMP Report message, ATN A records information about G and
forwards data to the network segment of the host interface directly connected to
ATN A. Meanwhile, ATN A starts a timer for G or resets the timer if it has been
started. If no members of group G respond to ATN A within the interval specified
by the timer, ATN A stops forwarding the data of group G.
– If Host A is not a member of any multicast group, Host A does not respond to the
IGMP Query message from ATN A.
l When a host (for example, Host A) joins a multicast group G, the processing flow is as
follows:
Host A sends an IGMP Report message of group G to ATN A, instructing ATN A to
update its multicast group information. Subsequent IGMP Report messages of group G
are triggered by IGMP Query messages sent by ATN A.
l When a host (for example, Host A) leaves a multicast group G, the processing flow is as
follows:
Host A sends an IGMP Leave message of group G to ATN A. After receiving the IGMP
Leave message, ATN A triggers a query to check whether group G has other receivers on
the network segment. After the query ends, if ATN A does not receive IGMP Report
messages of group G within the period specified by the Query message, ATN A deletes
the information about group G and stops forwarding data of group G to the network
segment.
IGMP Message Processing in IGMPv1

IGMPv1 manages multicast groups by exchanging IGMP Query messages and IGMP Report
messages. In IGMPv1, a host does not send an IGMP Leave message when leaving a
multicast group, and an IGMP ATN deletes the record of a multicast group when the timer for
maintaining the members in the multicast group expires.
IGMP Message Processing in IGMPv2

In IGMPv2, an IGMP Report message contains information about a multicast group, but does
not contain information about a multicast source. After a host sends an IGMP Report message
of a multicast group to an IGMP ATN, the ATN notifies the multicast forwarding module of
this join request. Then the multicast forwarding module can correctly forward multicast data
to the host.
IGMPv2 is capable of suppressing IGMP Report messages to reduce repetitive IGMP Report
messages. This function works as follows:
After a host (for example, Host A) joins a multicast group G, Host A receives an IGMP Query
message from the ATN. Then the host randomly selects a value from 0 to the maximum
response time (specified in the IGMP Query message) as the timer value. When the timer
expires, Host A sends an IGMP Report message of group G to the ATN. However, if Host A
receives an IGMP Report message of group G from another host in group G before the timer
expires, Host A does not send an IGMP Report message of group G to the ATN.
When a host leaves group G, the host sends an IGMP Leave message of group G to the ATN.
With the Report message suppression function in IGMPv2, the ATN cannot determine
whether another host exists in group G. Therefore, the ATN triggers a query on group G. If
another host exists in group G, the host sends an IGMP Report message of G to the ATN.
If the ATN sends the query on group G for a specified number of times, but does not receive
an IGMP Report message for group G, the ATN deletes information about group G and stops
forwarding multicast data of group G.
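The Report suppression behavior described above can be sketched as follows. This is a minimal illustration, not the device implementation: the function names, the host names, and the use of a seeded random generator are assumptions made for the example.

```python
import random


def choose_delays(hosts, max_response_time, rng):
    """Each member of group G picks a timer value in [0, max_response_time]."""
    return {host: rng.uniform(0, max_response_time) for host in hosts}


def reports_sent(delays):
    """Return the hosts that actually send a Report for one Query.

    The host whose timer expires first sends its Report. Every other host
    hears that Report before its own timer expires and stays silent.
    Hosts that share the earliest timer value all send (a tie).
    """
    if not delays:
        return []
    earliest = min(delays.values())
    return sorted(host for host, delay in delays.items() if delay == earliest)


# Hypothetical example: three members of G answer one Query.
print(reports_sent({"HostA": 3.2, "HostB": 1.5, "HostC": 7.0}))
```

Because only one Report normally reaches the ATN, the ATN cannot tell how many members remain, which is why a Leave message triggers a query on the group.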

NOTE
Both IGMP queriers and non-queriers can process IGMP Report messages, while only queriers can
forward IGMP Report messages. IGMP non-queriers cannot process IGMP Leave messages.
IGMP Message Processing in IGMPv3

An IGMPv2 Report message contains information about multicast groups, but does not
contain information about multicast sources. Therefore, an IGMPv2 host can select a
multicast group, but not a multicast source/group. IGMPv3 has resolved the problem. The
IGMPv3 message from a host can contain multiple records of multicast groups, with each
multicast group record containing multiple multicast sources.
On the ATN side, the querier sends IGMP Query messages and receives IGMP Report and
Leave messages from hosts to identify network segments that contain receivers and forward
the multicast data to the network segments. In IGMPv3, source information in multicast group
records can be filtered in either include mode or exclude mode:
l In include mode:
– If a source is included in a group record and the source is active, the ATN forwards
the multicast data of the source.
– If a source is included in a group record but the source is inactive, the ATN deletes
the source information and does not forward the multicast data of the source.
l In exclude mode:
– If a source is active, the ATN forwards the multicast data of the source, because
there are hosts that require the multicast data of the source.
– If a source is inactive, the ATN does not forward the multicast data of the source.
– If a source is not listed in a group record, the ATN forwards the multicast data of the
source.
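The include and exclude rules can be summarized as a single forwarding decision. The sketch below is illustrative only. It reads the last exclude-mode rule as covering sources that do not appear in the group record, and it assumes that an include-mode record does not forward sources it does not list; both points are interpretations, and the function name and data layout are hypothetical.

```python
def forwards(mode, sources, source):
    """Decide whether data from `source` is forwarded for one group record.

    mode:    "include" or "exclude"
    sources: dict mapping each listed source to True (active) or False (inactive)
    """
    if mode == "include":
        # Listed and active: forward. Listed but inactive, or not listed: do not.
        return sources.get(source, False)
    if mode == "exclude":
        # Listed and active: forward. Listed but inactive: do not forward.
        # Not listed in the record: forward (interpretation, see lead-in).
        return sources.get(source, True)
    raise ValueError(f"unknown filter mode: {mode}")


record = {"10.1.1.1": True, "10.1.1.2": False}
print(forwards("include", record, "10.1.1.1"))  # listed and active
print(forwards("exclude", record, "10.9.9.9"))  # not listed
```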
Compared with IGMPv2, IGMPv3 has no Report message suppression mechanism. As a
result, all hosts joining the multicast group must reply with IGMP Report messages when
receiving IGMP Query messages.
IGMPv3 allows hosts to select multicast sources, while IGMPv2 does not. Therefore, in
addition to the general query and the group-specific query, an IGMPv3-enabled device
supports the group-and-source-specific query. This query enables the ATN to identify
whether receivers of a specified multicast source exist.
9.3.2.2 IGMP Group Compatibility
In IGMP group compatibility mode, a multicast device running a higher IGMP version can
interoperate with hosts running a lower IGMP version.
For example, a multicast device running IGMPv2 can correctly process join requests from
IGMPv1 hosts, and a multicast device running IGMPv3 can correctly process join requests
from IGMPv1 or IGMPv2 hosts. When the multicast device operates in IGMP group
compatibility mode and receives IGMP Report messages from hosts running a lower IGMP
version, the multicast device automatically lowers the version of the corresponding
multicast group to match that of the hosts and then operates in this version for the
group.
For example, when a multicast device running IGMPv2 or IGMPv3 receives Report
messages from IGMPv1 hosts, the multicast device lowers the version of
the corresponding multicast group to IGMPv1. Then, the multicast device ignores the
IGMPv2 Leave messages in the multicast group.
In addition, when the multicast device of the IGMPv3 version receives Report messages from
the hosts in the IGMPv2 version, the multicast device lowers the version of the corresponding
multicast group to IGMPv2. Then, the multicast device ignores the IGMPv3 BLOCK
messages, the IGMPv3 TO_IN messages, and the multicast source list in the IGMPv3 TO_EX
messages. The multicast source-selecting function of IGMPv3 messages is suppressed.
If a multicast device is reconfigured with a higher IGMP version, a multicast group
operating in the original IGMP version continues to function properly as long as the
multicast group contains hosts.
NOTE
By default, the IGMP version of a multicast device is IGMPv2.
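The version-lowering rule can be illustrated with a short sketch. It models only the cases stated in this section. The message type labels and function names are hypothetical, and the handling of the source list in IGMPv3 TO_EX messages is noted in a comment rather than modeled.

```python
# Message types that this section says are ignored once a group operates
# in a lower version. The labels are illustrative, not protocol constants.
IGNORED_BY_GROUP_VERSION = {
    1: {"V2_LEAVE"},
    2: {"V3_BLOCK", "V3_TO_IN"},  # the source list in V3_TO_EX is also ignored
    3: set(),
}


def group_version(device_version, report_versions):
    """Lowest version among the device and the hosts that reported for the group."""
    return min([device_version, *report_versions])


def is_ignored(device_version, report_versions, message_type):
    """True if the device ignores this message type for the group."""
    version = group_version(device_version, report_versions)
    return message_type in IGNORED_BY_GROUP_VERSION[version]


# An IGMPv3 device that has seen an IGMPv2 Report runs the group as IGMPv2.
print(group_version(3, [2]), is_ignored(3, [2], "V3_BLOCK"))
```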
9.3.2.3 IGMP Querier Election
The IGMP-enabled multicast device plays the following two roles on the network segment:
l Querier
The querier is responsible for sending IGMP Query messages and receiving IGMP
Report messages and Leave messages from hosts. In this way, the multicast device
knows which multicast group has receivers (multicast group members) on the relevant
network segment.
l Non-querier
The non-querier only receives IGMP Report messages from hosts, and knows which
multicast group on the network segment has receivers. Then, according to the action of
the querier on the network segment, the multicast device identifies which receivers leave
the network segment.
Generally, only one querier exists on a network segment. The querier is elected among the
multicast devices according to the following rules (ATN A, ATN B, and ATN C are used as
an example):
l After ATN A is enabled with IGMP, ATN A considers itself the default querier of the
network segment during IGMP startup, and sends IGMP Query messages on the
network segment. If ATN A receives an IGMP Query message from ATN B, which has a
lower IP address, ATN A changes from querier to non-querier, starts the
another-querier-existing timer, and records ATN B as the querier of the network segment.
l If ATN A in the non-querier state receives an IGMP Query message from the querier
ATN B, the another-querier-existing timer is updated. If ATN A in the non-querier state
receives an IGMP Query message from ATN C, which has a lower IP address than the
querier ATN B, the querier is updated to ATN C, and the another-querier-existing
timer is updated.
l If the another-querier-existing timer expires while ATN A is in the non-querier state,
ATN A changes from non-querier to querier.
NOTE
IGMPv1 does not support the querier election, and the querier in IGMPv1 is designated by the upper-
layer protocol, such as the Protocol Independent Multicast (PIM). At present, only the querier election
for multicast devices of the same network segment and same IGMP version is supported. Therefore, all
multicast devices on the same network segment must be configured with the same IGMP version.
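The election rules can be sketched as a small state machine for one interface. This is an illustrative model, not the device implementation: the class name, the explicit `now` time argument, and the timeout value used in the example are assumptions.

```python
import ipaddress


class IgmpInterface:
    """Querier election state for one IGMP-enabled interface."""

    def __init__(self, address, other_querier_timeout):
        self.address = ipaddress.IPv4Address(address)
        self.timeout = other_querier_timeout
        self.querier = self.address  # a device starts as the querier
        self.timer_expiry = None     # another-querier-existing timer

    @property
    def is_querier(self):
        return self.querier == self.address

    def on_query(self, source, now):
        """Process an IGMP Query received from another device."""
        source = ipaddress.IPv4Address(source)
        if source < self.querier:
            # A lower address wins: record it and (re)start the timer.
            self.querier = source
            self.timer_expiry = now + self.timeout
        elif source == self.querier and not self.is_querier:
            # Query from the current querier refreshes the timer.
            self.timer_expiry = now + self.timeout

    def on_tick(self, now):
        """Take over as querier when the timer expires."""
        if not self.is_querier and now >= self.timer_expiry:
            self.querier = self.address
            self.timer_expiry = None


atn_a = IgmpInterface("10.0.0.3", other_querier_timeout=255)  # hypothetical values
atn_a.on_query("10.0.0.2", now=0)
print(atn_a.is_querier, atn_a.querier)
```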

9.3.2.4 Router-Alert for IGMP
Generally, packets are sent to and processed by the routing protocol only if the destination IP
address is the IP address of an interface on the device. In real applications, if the destination
IP address of protocol packets is a multicast address or a particular IP address, the packets
may not be sent to the routing protocol.
Therefore, Router-Alert is introduced as a mechanism for marking protocol packets. If a
packet contains the Router-Alert option, the packet must be
sent to and processed by the routing protocol.
The destination IP address of IGMP packets is usually a multicast address, and thus the IGMP
packets may not be sent to the routing protocol. The Router-Alert option
addresses this problem:
l By default, Router-Alert is not checked. The multicast device sends the received IGMP
messages to the routing protocol regardless of whether these IGMP messages contain the
Router-Alert option.
l When Router-Alert is configured to be checked, only the IGMP messages containing the
Router-Alert option are sent to the routing protocol.
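The two cases reduce to a single check. The sketch below is illustrative; the function and parameter names are hypothetical and do not correspond to device commands.

```python
def deliver_to_protocol(has_router_alert, check_router_alert=False):
    """Return True if a received IGMP message is handed to the protocol.

    check_router_alert=False models the default (option not checked).
    """
    if not check_router_alert:
        return True
    return has_router_alert


print(deliver_to_protocol(has_router_alert=False))
print(deliver_to_protocol(has_router_alert=False, check_router_alert=True))
```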
9.3.2.5 IGMP Only-Link
IGMP Only-Link is a mechanism in which only IGMP (and no upper-layer protocol such
as PIM) is enabled on the interface that connects the multicast device to hosts, and
IGMP guides data forwarding on the corresponding network segment.
Compared with PIM-guided data forwarding on a network segment, IGMP Only-Link relieves
the multicast device of maintaining information such as PIM neighbors and the
state machine of the PIM interface.
After IGMP Only-Link is used, the querier provides the following functions:
l By sending IGMP Query messages to hosts and receiving IGMP Report messages and
Leave messages from hosts, the querier can know which multicast group contains
receivers on the relevant network segment.
l The querier maintains the Join/Leave status of the IGMP multicast group, and guides
data forwarding on the relevant network segment according to the Join/Leave status.
The non-querier only maintains the Join/Leave status of the IGMP multicast group.
NOTE
If PIM is enabled on the interface, the designated router (DR) is responsible for guiding data forwarding.
For details, see the section "Basic Principles of PIM DR Election" in the PIM-SM description.
9.3.2.6 IGMP On-Demand
By sending IGMP Query messages to the connected host and receiving IGMP Report
messages and Leave messages from the host, a multicast device can know which multicast
group contains receivers on the relevant network segment. The device connected to the
multicast device, however, may not be a host, but an access device that is enabled with IGMP
proxy.
To reduce packet exchange between the multicast device and the access device, the access
device aggregates the IGMP Report/Leave status of each IGMP multicast group and reports
the status to the multicast device only if the status of the group changes. In other
words, the access device sends
IGMP Report messages to the multicast device only if the first member joins the multicast
group, and sends the IGMP Leave message to the multicast device only if the last member
leaves the multicast group. This is called IGMP On-Demand.
A multicast device enabled with IGMP On-Demand does not proactively send IGMP Query
messages to identify whether the IGMP multicast group contains receivers on the
network segment. Instead, it maintains the IGMP multicast group based on the Report/Leave
status aggregated by its connected access device (IGMP proxy).
IGMP On-Demand applies to IGMPv2 and IGMPv3 only. After a multicast device is
enabled with IGMP On-Demand, its IGMP behavior differs from the
standard behavior as follows:
l The multicast device does not proactively send IGMP Query messages.
l After the multicast device receives an IGMP Report message, the multicast device
creates the entry about the multicast group and multicast source, and the entry never
expires.
l The multicast device deletes the relevant entries only if it receives the IGMP Leave
message.
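The three differences can be sketched as an entry table without timers. The class and method names are hypothetical, and the model covers only the behavior listed above.

```python
class OnDemandGroupTable:
    """Group entries on a device enabled with IGMP On-Demand (illustrative)."""

    sends_queries = False  # no proactive IGMP Query messages

    def __init__(self):
        self.entries = set()  # (source, group); source is None for (*, G)

    def on_report(self, group, source=None):
        # A Report creates the entry; no expiry timer is attached.
        self.entries.add((source, group))

    def on_leave(self, group, source=None):
        # Only a Leave message removes the entry.
        self.entries.discard((source, group))

    def on_time_passed(self, seconds):
        # Entries never expire, so elapsed time changes nothing.
        return None


table = OnDemandGroupTable()
table.on_report("225.1.1.1")
table.on_time_passed(3600)
print(table.entries)
```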
9.3.2.7 IGMP Prompt-Leave
When a host leaves the multicast group G, the host sends the IGMP Leave message of G to the
multicast device. Because of the Report suppression mechanism in IGMPv2, the multicast
device cannot determine whether other hosts remain in G. Therefore, the multicast device
triggers a query on G. If another host remains in G, the host sends the IGMP Report message of G
to the multicast device. If the multicast device sends the query on G several times but
receives no IGMP Report message from any host, the multicast device deletes the
information about G, and stops forwarding the multicast data of G to the relevant network
segment.
If the multicast device is connected only to an access device that is enabled with IGMP proxy,
when the access device leaves a multicast group G and sends the IGMP Leave message of G
to the multicast device, the multicast device can identify that G contains no receivers and thus
does not need to trigger the IGMP Query message. Then the multicast device can delete all records
about G, and stop forwarding data of G to the relevant network segment. This is called IGMP
Prompt-Leave.
After the multicast device is enabled with IGMP Prompt-Leave, the multicast device triggers
no IGMP Query messages destined for the multicast group when the multicast device receives
the IGMP Leave message for the multicast group. Then the multicast device deletes all
records about the multicast group, and stops forwarding the data of the multicast group to the
relevant network segment. In this manner, the multicast device responds faster to the IGMP
Leave message.
The implementation of IGMP Prompt-Leave for (S,G) is the same as that of IGMP Prompt-Leave
for group G.
NOTE
The IGMP On-Demand feature already includes the IGMP Prompt-Leave feature.
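The difference between standard Leave processing and Prompt-Leave can be sketched as follows. The function name, the returned dictionary, and the default of two queries are assumptions for illustration; the text above states only that the query is sent several times.

```python
def process_leave(prompt_leave, other_members_respond, query_count=2):
    """Model what the multicast device does after receiving a Leave for G.

    query_count is a hypothetical number of group-specific queries.
    """
    if prompt_leave:
        # Prompt-Leave: delete the records at once, without querying.
        return {"queries_sent": 0, "entry_deleted": True}
    if other_members_respond:
        # A Report answers the first query, so the entry is kept.
        return {"queries_sent": 1, "entry_deleted": False}
    # No Report after all queries: delete the entry and stop forwarding.
    return {"queries_sent": query_count, "entry_deleted": True}


print(process_leave(prompt_leave=True, other_members_respond=False))
print(process_leave(prompt_leave=False, other_members_respond=False))
```

The sketch also shows why Prompt-Leave suits only a segment with a single IGMP proxy: with the flag set, the entry is deleted even if other members would have responded.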

9.3.2.8 IGMP Policy Control

IGMP policy control restricts or extends IGMP actions, without affecting IGMP
implementation. IGMP policy control can be implemented through IGMP-limit, source
address-based IGMP message filtering, or group-policy.
l IGMP-limit
IGMP-limit is used to limit the number of entries of multicast groups, including source-
specific groups, on interfaces, a single instance, or all instances.
l Static-group
Static-group allows data forwarding of a multicast group, irrespective of whether users
require data of this group. Therefore, this function speeds up the response to user requests
and improves service availability during channel switchover.
l Group-policy
Group-policy is configured on ATN interfaces. With this function, you can set
restrictions on specific multicast groups, so that entries will not be created for the
restricted multicast groups. This improves IGMP security.
IGMP-Limit
Figure 9-37 Networking diagram of IGMP-Limit
[Figure: ATN A, ATN B, and ATN C connect a PIM network to leaf networks over Ethernet. Leaf network N1 contains HostA and HostB; leaf network N2 contains HostC and HostD. Receivers are marked in N1 and N2. Interface addresses:
ATN A: GE0/3/0 192.168.1.1/24, GE0/3/1 10.110.1.1/24
ATN B: GE0/3/0 192.168.2.1/24, GE0/3/1 10.110.2.1/24
ATN C: GE0/3/0 192.168.3.1/24, GE0/3/1 10.110.2.2/24]
When a large number of multicast users request multiple programs simultaneously, excessive
bandwidth is consumed and the ATN's performance is degraded,
deteriorating the multicast service quality. To prevent this problem, use IGMP-limit to restrict
the maximum number of multicast groups on specific or all interfaces. This function enables
users who have successfully joined multicast groups to enjoy smoother multicast services.
IGMP-limit applies to a specific ATN interface, a single instance, and all instances or the
entire system. With IGMP-limit, when an IGMP Report message for a new group arrives, the
ATN first checks whether the number of multicast groups exceeds the upper limit. If the upper
limit is not exceeded, the ATN establishes a membership for the group and forwards data of
this group. However, if the upper limit is exceeded, the ATN rejects the join request.

l IGMP-limit on an interface
– You can set an IGMP entry limit on an interface. Then, after the interface receives
an IGMP Join message, the interface can determine whether to create an entry
based on the IGMP entry limit.
– You can configure groups, including source-specific groups, to be free of the IGMP
entry limit.
l IGMP entry limit on the ATN
You can set the IGMP entry limit on the ATN. That is, you can limit the number of
IGMP entries on the interfaces belonging to all instances on the ATN.
– After an interface receives an IGMP Join message, the interface determines whether
to create an entry according to whether the number of the IGMP entries on the
whole ATN reaches the configured limit.
– When an interface deletes (*, G) and (S, G) entries, the interface decreases the
IGMP entries on the ATN correspondingly.
The preceding IGMP entry limit policies are subject to the following rules:
l A (*, G) entry or an (S, G) entry is counted as one entry.
l A (*, G) entry used in SSM mapping is counted as one entry; however, the (S, G) entry
mapped by the (*, G) entry is not counted as an entry.
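The per-interface and device-wide limits, together with the counting rules, can be sketched as follows. This is an illustrative model: the class name and limit values are hypothetical, a join is rejected once a limit has been reached, and groups that are exempt from the limit are not modeled.

```python
class IgmpLimiter:
    """Entry limits per interface and for the whole device (illustrative)."""

    def __init__(self, interface_limit, device_limit):
        self.interface_limit = interface_limit
        self.device_limit = device_limit
        self.entries = {}  # interface name -> set of (source, group)

    def total(self):
        return sum(len(entries) for entries in self.entries.values())

    def try_join(self, interface, group, source=None):
        """Create the entry if both limits allow it; return True on success.

        Each (*, G) or (S, G) entry counts as one. An (S, G) entry produced
        by SSM mapping is not stored here, so it is not counted.
        """
        key = (source, group)
        on_interface = self.entries.setdefault(interface, set())
        if key in on_interface:
            return True  # existing membership, nothing new to count
        if len(on_interface) >= self.interface_limit:
            return False
        if self.total() >= self.device_limit:
            return False
        on_interface.add(key)
        return True

    def leave(self, interface, group, source=None):
        # Deleting an entry frees one slot on the interface and the device.
        self.entries.get(interface, set()).discard((source, group))


limiter = IgmpLimiter(interface_limit=2, device_limit=3)
print(limiter.try_join("GE0/3/1", "225.1.1.1"))
```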
Static-Group
Figure 9-38 Networking diagram of Static-Group
[Figure: Source1 and Source2 reach User1 and User2 through ATN A and ATN B across a PIM-DM or PIM-SM network. Legend: interfaces that are configured to statically join the related groups and are equal to stable members on the network segment.]
Static-Group is implemented by configuring a static multicast group on a designated
interface. After Static-Group is configured, the entries created on the ATN have no timer and
never expire. Therefore, the ATN continuously forwards data to receivers in the static
multicast group. When the receivers no longer need the multicast data, forwarding does not
stop through entry expiration; the static multicast group must be deleted manually.
In real-world situations, Static-Group is configured on the ATN interface that is connected to
the host. This configuration ensures that multicast data is already forwarded to the ATN.
When a host, or a device directly connected to the ATN, has receivers that want to receive
the multicast data, the ATN can forward the multicast data immediately. This shortens the
channel switchover period.
IGMP Group-Policy
Figure 9-39 Networking diagram of Group-Policy
[Figure: ATN A, ATN B, and ATN C connect a PIM network to leaf networks over Ethernet. Leaf network N1 contains HostA and HostB; leaf network N2 contains HostC and HostD. Receivers are marked in N1 and N2. Interface addresses:
ATN A: GE0/3/0 192.168.1.1/24, GE0/3/1 10.110.1.1/24
ATN B: GE0/3/0 192.168.2.1/24, GE0/3/1 10.110.2.1/24
ATN C: GE0/3/0 192.168.3.1/24, GE0/3/1 10.110.2.2/24]
Group-Policy refers to a filtering policy configured on the ATN interface. After Group-Policy
is configured, the ATN can set restrictions on certain multicast groups, and establish no
entries for these multicast groups.
When too many users watch multiple programs simultaneously, more ATN bandwidth is
consumed, leading to degraded ATN performance. To avoid this degradation, you can use
Group-Policy to set restrictions on certain multicast groups and limit the number of multicast
groups. In addition, for network security or ease of management, you can also use
Group-Policy to reject IGMP Report messages for certain multicast groups and prohibit
forwarding data of these multicast groups.
Group-Policy is configured through an ACL.
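ACL-based group filtering can be sketched as a first-match rule list. This sketch is not the device's ACL syntax: the rule format is hypothetical, and treating a group that matches no rule as denied is an assumption of the example.

```python
import ipaddress


def group_permitted(group, acl):
    """Return True if an entry may be created for `group`.

    acl: ordered list of (action, network) pairs, for example
         ("deny", "225.1.1.0/24"). The first matching rule decides.
    """
    address = ipaddress.IPv4Address(group)
    for action, network in acl:
        if address in ipaddress.IPv4Network(network):
            return action == "permit"
    return False  # assumption: no match means the group is restricted


# Hypothetical policy: block one range, allow all other multicast groups.
policy = [("deny", "225.1.1.0/24"), ("permit", "224.0.0.0/4")]
print(group_permitted("225.1.1.5", policy))
print(group_permitted("225.1.2.5", policy))
```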
9.3.2.9 SSM Mapping
The Source-Specific Multicast Mapping (SSM mapping) mechanism improves
compatibility with hosts running versions earlier than IGMPv3, and ensures that these hosts
can also use services in the SSM range. Specifically, the SSM mapping mechanism converts
the (*,G) of IGMPv1/v2 in the SSM range into (S,G) according to the configured
conversion rule. In this manner, hosts of lower IGMP versions can also use multicast
services in the SSM range.
In addition, the SSM mapping mechanism can better protect the multicast source server and
prevent attacks on the server.

NOTE
For multicast groups in the SSM range, the multicast device processes only (S,G) requests,
not (*,G) requests. For details of SSM, see Protocol Independent Multicast-Source-
Specific Multicast (PIM-SSM).
Figure 9-40 SSM mapping application
[Figure: On an SSM network, HostA (IGMPv3), HostB (IGMPv2), and HostC (IGMPv1) send IGMPv3, IGMPv2, and IGMPv1 Report messages, respectively, to the ATN.]
As shown in Figure 9-40, in the user network segment of the SSM network, Host A runs
IGMPv3, Host B runs IGMPv2, and Host C runs IGMPv1. To provide SSM multicast services
for all hosts on the network segment without upgrading Host B and Host C to
IGMPv3, the multicast device needs to support SSM mapping.
If the multicast device supports SSM mapping and is configured with the relevant conversion
rule, the multicast device performs either of the following after receiving the IGMP
Report messages (*,G) from Host B and Host C:
l If the multicast group of the messages is in the ASM range, see the section 9.3.2.1
IGMPv1&v2&v3 for the processing method.
l If the multicast group of the messages is in the SSM range, the device follows the SSM
mapping mechanism to convert the (*,G) of IGMPv1/v2 into (S,G) according to the
configured conversion rule.
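The conversion of an IGMPv1/v2 Report can be sketched as follows. This is an illustrative model: the rule format and addresses are hypothetical, the SSM range is assumed to be the default 232.0.0.0/8, and returning no entry when a group in the SSM range matches no rule is an assumption.

```python
import ipaddress

SSM_RANGE = ipaddress.IPv4Network("232.0.0.0/8")  # assumed default SSM range


def map_report(group, rules):
    """Convert a (*, G) Report from an IGMPv1/v2 host into routing entries.

    rules: list of (group_network, source) pairs configured for SSM mapping.
    Returns a list of (source, group) tuples; "*" stands for any source.
    """
    address = ipaddress.IPv4Address(group)
    if address not in SSM_RANGE:
        return [("*", group)]  # ASM range: normal (*, G) processing
    return [
        (source, group)
        for network, source in rules
        if address in ipaddress.IPv4Network(network)
    ]


# Hypothetical rules: one group range mapped to two sources.
rules = [("232.1.1.0/24", "10.1.1.1"), ("232.1.1.0/24", "10.1.1.2")]
print(map_report("232.1.1.7", rules))
print(map_report("225.0.0.1", rules))
```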
Figure 9-41 Networking diagram for various SSM mapping settings on various interfaces
[Figure: An ATN on an SSM network connects to HostA, HostB, and HostC through interfaces GE0, GE1, and GE2.]

For example, the network shown in Figure 9-41 provides SSM multicast services for all
interfaces connected to hosts.
l If all the interfaces require the same SSM multicast service, configure a
conversion rule in the IGMP view.
l If the interfaces require different SSM multicast services, configure a conversion
rule for each interface in the system view and enable the rule on the
corresponding interface.
9.3.2.10 Source Address-based IGMP Message Filtering
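Figure 9-42 Source address-based IGMP message filtering
[Figure: ATN A (interface 10.0.0.1/24) and ATN B connect an ISP network to an Ethernet segment. HostA, HostB, and HostC send Report messages with source addresses 11.0.0.1, 10.0.0.8, and 0.0.0.0, respectively.]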
Source address-based Internet Group Management Protocol (IGMP) message filtering enables
a multicast device's interface to filter IGMP messages based on the access control list (ACL)
configuration to protect a multicast device against attacks from user hosts. To ensure the
precision in multicast traffic sending, configure source address-based IGMP message filtering
on the multicast device's interface connected to user hosts. Different IGMP messages have
different source address-based filtering policies:
l IGMP Report or Leave messages
– If you have not specified an ACL rule:
n If the source address of an IGMP Report or Leave message and the IP address
of the receiving interface are on the same network segment, or the source address
of the IGMP Report or Leave message is 0.0.0.0, the IGMP source address
filtering is passed.
n If the source address of an IGMP Report or Leave message and the IP address
of the receiving interface are on different network segments, the IGMP source
address filtering fails and the IGMP Report or Leave message is discarded.
– If you have specified an ACL rule, the interface filters out the IGMP Report or
Leave messages whose source addresses do not match the ACL rule.
l IGMP Query messages: The interface filters out IGMP Query messages whose source
addresses do not match a specified ACL rule.

As shown in Figure 9-42, ATN A is connected to the hosts through the interface at
10.0.0.1/24. The source addresses of IGMP Report or Leave messages sent by Host A, Host
B, and Host C are 11.0.0.1, 10.0.0.8, and 0.0.0.0, respectively. If you have not specified an
ACL rule, the interface filters out the IGMP Report or Leave messages from Host A. If you
have specified an ACL rule, the interface filters out the IGMP messages whose source
addresses do not match the rule.
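The filtering rules for Report and Leave messages can be sketched with the addresses from Figure 9-42. The function name is hypothetical, and representing the ACL as a plain list of permitted networks is a simplification made for the example.

```python
import ipaddress


def accept_report_or_leave(source, interface, acl=None):
    """Return True if a Report or Leave message passes source filtering.

    source:    source IP address of the message
    interface: address and mask of the receiving interface, e.g. "10.0.0.1/24"
    acl:       optional list of permitted source networks (simplified ACL)
    """
    address = ipaddress.IPv4Address(source)
    if acl is not None:
        # With an ACL, only matching source addresses pass.
        return any(address in ipaddress.IPv4Network(net) for net in acl)
    # Without an ACL, accept 0.0.0.0 and sources on the interface's segment.
    segment = ipaddress.IPv4Interface(interface).network
    return address == ipaddress.IPv4Address("0.0.0.0") or address in segment


for host_source in ("11.0.0.1", "10.0.0.8", "0.0.0.0"):
    print(host_source, accept_report_or_leave(host_source, "10.0.0.1/24"))
```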
9.3.2.11 Protocol Comparison
The comparison between IGMPv1 and IGMPv2 is as follows:
IGMPv1 | IGMPv2 | Advantages of IGMPv2 over IGMPv1
IGMPv1 has no IGMP Leave messages. | IGMPv2 provides IGMP Leave messages. | IGMPv2 can manage members of multicast groups effectively.
IGMPv1 provides only General Query messages. | IGMPv2 provides General Query messages and Group-specific Query messages. | The multicast group can be selected directly, and thus the selection is more precise.
The comparison between IGMPv2 and IGMPv3 is as follows:
IGMPv2 | IGMPv3 | Advantages of IGMPv3 over IGMPv2
The message contains the multicast group information, rather than the multicast source information. | A message contains not only the multicast group information, but also the multicast source information. | The multicast source can be selected directly, and thus the selection is more precise.
A message contains the record of a multicast group. | A message contains records of multiple multicast groups. | The number of IGMP messages is reduced on the network segment.
The IGMP Query message of a specified multicast group features no re-transmission mechanism. | The IGMP Query message of a specified multicast group and a specified multicast source features the re-transmission mechanism. | The multicast information maintained by the non-querier and querier can be kept more consistent.
9.3.3 IGMP Applications
9.3.3.1 Typical IGMP Applications

Figure 9-43 Typical IGMP application
[Figure: Source1 and Source2 reach User1 and User2 through ATN A and ATN B across a PIM-SM network. Legend: interfaces that are connected with user hosts and are enabled with IGMP.]
IGMP is the protocol responsible for connecting hosts to the multicast routing network.
Therefore, IGMP is applied to the area where the multicast device and host are connected.
Note that IGMP can be used between hosts and multicast devices running different IGMP versions.
The IGMP On-Demand and IGMP Prompt-Leave features are applicable only to the
scenario where a single multicast device and a single access device are located on the
shared network segment.
Terms
Term | Description
IGMP | The Internet Group Management Protocol (IGMP) refers to the signaling mechanism between the host and multicast device on the leaf network of IP multicast. The host joins or leaves a multicast group by sending relevant IGMP messages; the multicast device identifies whether the multicast group contains members on the downstream network.
(S,G) | (S,G) refers to a multicast routing entry. S indicates a multicast source, and G indicates a multicast group. After a multicast message with S as the source address and G as the group address reaches the multicast device, it is forwarded through the downstream interface of the (S, G) entry. Usually, the multicast message is expressed as the (S, G) message.
(*,G) | (*,G) refers to a PIM routing entry. * indicates any multicast source, and G indicates a multicast group. (*, G) is applicable to all multicast messages with the multicast group address as G. That is, all the multicast messages sent to G are forwarded through the downstream interface of the (*, G) entry, regardless of which multicast sources send the multicast messages.

Acronym & Abbreviation | Full Name
ASM Any-Source Multicast
IGMP Internet Group Management Protocol
SSM Source-Specific Multicast
9.4 Layer 2 Multicast
9.4.1 Introduction
Definition
Layer 2 multicast is used to transmit multicast data on the data link layer. On the network
shown in Figure 9-44, after Layer 2 multicast is configured on ATN B (a Layer 2 device),
ATN B listens to Internet Group Management Protocol (IGMP) packets exchanged between
Router A (a Layer 3 device) and hosts, and creates a Layer 2 multicast forwarding table. This
implements on-demand multicast data transmission and ensures proper deployment of
multicast services on the data link layer.
Figure 9-44 Layer 2 multicast
[Figure: Source, RouterA, ATN B, and NodeB connected in sequence, with labels for PIM, L2 multicast, and IGMP protocol packets.]
Purpose
The primary purpose of Layer 2 multicast is to reduce network bandwidth consumption. On
the network shown in Figure 9-44, after receiving multicast packets from Router A, ATN B at
the edge of the access network forwards these multicast packets to multicast receivers. If
Layer 2 multicast is not configured on ATN B, ATN B will broadcast received multicast data
packets in the broadcast domain to which the packets belong, because ATN B does not know
which interfaces are connected to receivers. All hosts in the broadcast domain, including
group members and non-group members, will receive the packets. This wastes
network bandwidth and adversely affects network security.
The problem of bandwidth waste can be addressed by configuring Layer 2 multicast on ATN
B. Layer 2 multicast enables ATN B to record the mappings between multicast group
addresses and relevant ports in the forwarding table. Instead of flooding multicast data
packets, ATN B can now forward these multicast data packets based on the forwarding table.
Upon receiving multicast packets, ATN B searches the forwarding table for downstream ports
based on group addresses of the multicast packets and forwards these multicast packets to
relevant users.
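The difference between flooding and table-based forwarding can be sketched as follows. The class, method, and port names are hypothetical, and dropping traffic for a group with no entry is an assumption of this sketch.

```python
class L2MulticastTable:
    """Mappings between multicast group addresses and ports (illustrative)."""

    def __init__(self, all_ports):
        self.all_ports = set(all_ports)
        self.table = {}  # group address -> set of downstream ports

    def add_member_port(self, group, port):
        self.table.setdefault(group, set()).add(port)

    def remove_member_port(self, group, port):
        self.table.get(group, set()).discard(port)

    def egress_ports(self, group, ingress_port, layer2_multicast=True):
        """Ports that receive a multicast data packet for `group`."""
        if not layer2_multicast:
            # Without Layer 2 multicast the packet is flooded.
            return self.all_ports - {ingress_port}
        # Assumption: a group with no entry has no egress ports.
        return self.table.get(group, set()) - {ingress_port}


switch = L2MulticastTable(["p1", "p2", "p3", "p4"])
switch.add_member_port("225.1.1.1", "p2")
print(sorted(switch.egress_ports("225.1.1.1", "p1")))
print(sorted(switch.egress_ports("225.1.1.1", "p1", layer2_multicast=False)))
```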
Functions
Layer 2 multicast has the following principal functions:
l IGMP snooping
Internet Group Management Protocol (IGMP) snooping provides a way to control
multicast traffic at Layer 2. By listening to IGMP packets exchanged between an
upstream device and hosts, IGMP snooping can set up Layer 2 multicast forwarding
tables to deliver traffic only to interfaces with at least one group member, significantly
reducing the volume of multicast traffic.
l Static Layer 2 multicast
A Layer 2 multicast forwarding table is manually configured in which interfaces and
multicast address entries are bound. This enables multicast data packets to be forwarded
to hosts that must steadily receive multicast data.
l Layer 2 Source-Specific Multicast (SSM) mapping
Layer 2 SSM mapping enables IGMPv2 hosts to enjoy IGMPv3 services.
l IGMP Snooping Proxy
An IGMP Snooping Proxy-enabled device has two roles. Acting as an
attached host of an upstream device, it responds rapidly to Query messages sent from the
upstream device and forwards Report and Leave messages sent from users to the
upstream device. Acting as an upstream device directly connected to hosts, it sends
IGMP Query messages to the hosts and processes IGMP Report messages sent by the
hosts. This reduces performance pressure on the upstream device and saves network
bandwidth.
l Multicast VLAN
After the multicast VLAN function is configured on a Layer 2 device, an upstream
device of the Layer 2 device sends multicast data only to a specific multicast VLAN. The
Layer 2 device replicates the multicast data to its other VLANs. This reduces bandwidth
consumption on the upper-layer network.
Benefits
Layer 2 multicast implements on-demand multicast data distribution at the data link layer.
It provides the following benefits:
l Saves network bandwidth.
l Ensures multicast data security.
l Reduces performance pressure on Layer 3 devices.
l Ensures the quality of user services.
9.4.2 Principles
9.4.2.1 IGMP Snooping
Principles
Layer 3 devices and attached hosts use IGMP to implement multicast data communications.
In IGMP, before a host joins a multicast group, it needs to send an IGMP Report message to
the upstream device to which it is directly connected. The upstream device can then send multicast
packets to the host. IGMP messages are encapsulated in IP packets (Layer 3 packets). A link
layer device cannot, however, process Layer 3 information carried in packets. In addition, the
link layer device cannot learn any multicast MAC address by learning the source MAC
addresses of link layer data frames because the source MAC addresses of the data frames
cannot be multicast MAC addresses. When a link layer device receives a data frame whose
destination MAC address is a multicast MAC address, the device cannot find a
matching entry in its MAC address table. Consequently, the link layer device broadcasts all
multicast packets it receives. This wastes bandwidth resources and poses a threat to network
security.
IGMP snooping is a basic Layer 2 multicast function, and is used to control multicast traffic at
Layer 2. A Layer 2 device that runs IGMP snooping listens to and analyzes IGMP messages
exchanged between a Layer 3 device and hosts. The Layer 2 device sets up a Layer 2
forwarding table based on these messages and uses this table to forward data packets.
Figure 9-45 shows a network on which ATN B functions as a Layer 2 device.
l If ATN B does not run IGMP snooping, multicast data is broadcast at the data link layer.
l If ATN B runs IGMP snooping and receives Report messages for multicast group
225.0.0.1 only on Port1 and Port2, ATN B does not broadcast the multicast data for this
group. Instead, it uses the Layer 2 multicast forwarding table (Table 9-20) to send the
multicast data only to the group members through Port1 and Port2.

Figure 9-45 Multicast packet transmission before and after IGMP snooping is configured on a
Layer 2 device
[Figure: two panels, "Multicast packet transmission without IGMP Snooping" and "Multicast
packet transmission when IGMP Snooping runs". Each panel shows Source, RouterA (PIM), and
ATN B; the ports labeled on ATN B are Port1, Port2, and Port3. Legend: multicast packet.]
Table 9-20 Layer 2 multicast forwarding table

Multicast Group Downstream Port
225.0.0.1 Port1
225.0.0.1 Port2
Related Concepts
Figure 9-46 is used to introduce concepts related to IGMP snooping.
Figure 9-46 Layer 2 multicast
[Figure labels: Internet/Intranet, Source, ATN A, ATN B, NodeB1 to NodeB5 (three of them
marked as multicast group members). Legend: multicast group member port, router port,
multicast packets.]
l Router port: is a port (labeled with a circle in Figure 9-46) connecting a link layer
multicast device to an upstream multicast router.
Router ports are either dynamic or static. Dynamic router ports are discovered by
protocols. Static router ports are manually configured.
l Member port of a multicast group: is a port (labeled with a square in Figure 9-46)
connecting a link layer multicast device to a group member host. The link layer multicast
device uses the member port of a multicast group to send multicast packets to a host. A
member port of a multicast group is called a member port for short.
Member ports are either dynamic or static. Dynamic member ports are discovered by
protocols. Static member ports are manually configured.
l Layer 2 multicast forwarding entry: is the basis for multicast data forwarding. Devices at
the link layer use entries in the multicast forwarding table to forward multicast packets
sent by an upstream device to receivers. An entry in a Layer 2 multicast forwarding table
contains the following information:
– VLAN ID or VSI name
– Multicast group address
– Router port (upstream port)
– Member port list (downstream port list)
l Multicast MAC address: is mapped from a multicast IP address contained in a multicast data
packet to be transmitted at the data link layer. Multicast MAC addresses are used to transmit
multicast data packets at the data link layer. As defined by the Internet Assigned Numbers
Authority (IANA), the 24 most significant bits of a multicast MAC address are 0x01005e, the
25th bit is 0, and the 23 least significant bits are the same as those of a multicast IP address.
Figure 9-47 shows the mapping between a multicast IP address and a multicast MAC address.
For example, if the IP address of a multicast group is 224.0.1.1, the MAC address of this
multicast group is 01-00-5e-00-01-01. Information about 5 bits of the IP address is lost,
because only 23 bits of the 28 least significant bits of the IP address are mapped to the MAC
address. As a result, 32 IPv4 multicast addresses are mapped to the same MAC address. For
example, IP multicast addresses 224.0.1.1, 224.128.1.1, 225.0.1.1, and 239.128.1.1 all
correspond to multicast MAC address 01-00-5e-00-01-01.
Figure 9-47 Mapping between an IP multicast address and a multicast MAC address
32-bit IP address: 1110 XXXX X XXXXXXX XXXXXXXX XXXXXXXX
(the 5 bits that follow the 1110 prefix are lost; the 23 least significant bits are mapped)
48-bit MAC address: 00000001 00000000 01011110 0 XXXXXXX XXXXXXXX XXXXXXXX
(25-bit MAC address prefix followed by the 23 mapped bits)
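The mapping rule described above can be sketched in a few lines of Python. This is an illustration only, not device code; the function name multicast_ip_to_mac is an assumption made for this example. It keeps the 23 least significant bits of the group address and places them behind the fixed 25-bit prefix.

```python
import ipaddress


def multicast_ip_to_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF        # keep the 23 least significant bits
    mac = 0x01005E000000 | low23        # 0x01005e, then a 0 bit, then the 23 bits
    return "-".join(f"{octet:02x}" for octet in mac.to_bytes(6, "big"))


# The four example groups from the text share one MAC address.
groups = ["224.0.1.1", "224.128.1.1", "225.0.1.1", "239.128.1.1"]
macs = {g: multicast_ip_to_mac(g) for g in groups}
for g, m in macs.items():
    print(g, "->", m)
```

Because 5 bits of the group address are discarded, a device that forwards by MAC address cannot tell these four groups apart.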
Implementation
The process for implementing IGMP snooping is as follows:
1. IGMP snooping analyzes IGMP messages exchanged between hosts and a Layer 3
device and sets up a Layer 2 multicast forwarding table on the basis of this analysis.
Forwarding table entries contain VLAN IDs or VSI names, multicast source addresses,
multicast group addresses, and downstream port lists.
– After receiving an IGMP Query message from an upstream device, IGMP snooping
sets the network-side port as a dynamic router port.
– After receiving an IGMP Report message from a downstream device or a user,
IGMP snooping sets the user-side port as a dynamic member port.
2. When multicast data traffic passes through a Layer 2 device, the Layer 2 device forwards
the multicast data traffic based on its Layer 2 multicast forwarding table.
NOTE
When multiple router ports exist (for example, in a dual-homing scenario) and one of them receives
multicast traffic, the Layer 2 device forwards the traffic to all the other router ports while forwarding the
traffic to users based on the Layer 2 multicast forwarding table.
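The two steps above, together with the router-port behavior in the NOTE, can be modeled with a small Python sketch. It is a simplified illustration under stated assumptions, not the device implementation: the class name SnoopingSwitch, the upstream port name Port0, and VLAN 10 are hypothetical, entry aging and source addresses are not modeled, and traffic for groups without an entry is simply sent to the other router ports.

```python
from collections import defaultdict


class SnoopingSwitch:
    """Toy model of the IGMP snooping state kept by one Layer 2 device."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.router_ports = defaultdict(set)   # VLAN ID -> dynamic router ports
        self.member_ports = defaultdict(set)   # (VLAN ID, group) -> dynamic member ports

    def on_query(self, vlan, in_port):
        # Step 1: a Query from the upstream device marks the port as a router port.
        self.router_ports[vlan].add(in_port)

    def on_report(self, vlan, group, in_port):
        # Step 1: a Report from a user marks the port as a member port of the group.
        self.member_ports[(vlan, group)].add(in_port)

    def egress_ports(self, vlan, group, in_port):
        # Step 2: forward to the member ports of the group, plus all other
        # router ports (the dual-homing case), never back to the ingress port.
        members = self.member_ports.get((vlan, group), set())
        routers = self.router_ports.get(vlan, set())
        return (members | routers) - {in_port}

    def flood_ports(self, in_port):
        # Behavior without IGMP snooping: every port except the ingress port.
        return self.ports - {in_port}


switch = SnoopingSwitch(["Port0", "Port1", "Port2", "Port3"])
switch.on_query(vlan=10, in_port="Port0")
switch.on_report(vlan=10, group="225.0.0.1", in_port="Port1")
switch.on_report(vlan=10, group="225.0.0.1", in_port="Port2")

snooped = switch.egress_ports(10, "225.0.0.1", in_port="Port0")
flooded = switch.flood_ports(in_port="Port0")
print(sorted(snooped))   # ['Port1', 'Port2']
print(sorted(flooded))   # ['Port1', 'Port2', 'Port3']
```

With snooping, the result matches Table 9-20: data for 225.0.0.1 leaves only through Port1 and Port2, and Port3 no longer receives it.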
Other Functions
IGMP snooping also supports the following functions:
l Support for all IGMP versions

IGMP has three versions: IGMPv1, IGMPv2, and IGMPv3. Devices can be configured
to process multicast data packets for each of these versions.
l Choice of multicast forwarding mode
Multicast data can be forwarded by IP address.
l Rapid response to Layer 2 network topology changes
MSTP or RRPP links are usually used to connect Layer 2 devices on a user network
segment. When the network topology changes, a device on which rapid response to Layer 2
network topology changes is enabled can immediately update port information and switch
multicast data traffic to a new forwarding path. This prevents multicast services from
being interrupted.
l Multicast group security policy
A multicast group security policy can be used to limit the range and number of multicast
groups that users can join and to determine whether to receive multicast data packets
containing a security field. This provides more control for multicast and improves
security.
Deployment Scenarios
IGMP snooping can be used on VLANs and VPLS networks.
Benefits
IGMP snooping enabled on the ATN connected to a user network segment provides the
following benefits:
l Reduces bandwidth consumption by reducing multicast traffic that would otherwise
flood a Layer 2 network.
l Facilitates separate accounting for individual hosts.
9.4.2.2 Static Layer 2 Multicast
Principles
Multicast data can be transmitted to user terminals over an IP bearer network in either
dynamic or static multicast mode.
l In dynamic multicast mode, a device receives and delivers the data for a channel
(multicast group) only after it receives a Report message for the channel from the first
user. The device stops receiving data for the channel after it receives the Leave message
from the last member. The dynamic multicast mode has both an advantage and a
disadvantage:
– Advantage: It reduces bandwidth consumption by reducing multicast traffic.
– Disadvantage: It brings in a delay when a user switches a channel.
l In static multicast mode, multicast forwarding entries are configured for each channel
(multicast group) on a device. Multicast traffic for each channel is delivered to the
device, regardless of whether there are users attached to the device. The static multicast
mode has the following advantages and disadvantages:
– Advantages:
n Multicast routes are fixed, and multicast paths exist regardless of whether there
are multicast data receivers. Users can change channels without delays and the
quality of user experience is good.
n Multicast source and group ranges are easy to manage because multicast paths
are stable.
n The delay when data is first forwarded is minimal because static routes already
exist and do not need to be established as dynamic multicast routes do.
– Disadvantages:
n Each device on a multicast data transmission path must be manually
configured. The device configuration load is heavy.
n Sub-optimal multicast forwarding paths may be generated because
downstream ports must be specified in advance on each device.
n When there are changes in the network topology or unicast routes, static
multicast paths may need to be reconfigured. The workload is heavy.
n Multicast routes exist even when no multicast data needs to be forwarded. This
wastes network resources and creates high bandwidth requirements.
A Layer 2 multicast forwarding table can be built dynamically with IGMP snooping or can be
manually configured. Network quality requirements or the kinds of services demanded by
users can be the basis for determining whether to use dynamic or static multicast mode on
network devices.
If network bandwidth is sufficient and hosts need to receive multicast data for specific multicast
groups from a router port for a long period of time, static Layer 2 multicast can be used to
implement stable multicast data transmission on a MAN or bearer network.
Related Concepts
Static router or member ports are used in static Layer 2 multicast.
l Static router ports are used to receive regular multicast traffic.
l Static member ports are used to send data for specific multicast groups.
Static Layer 2 multicast is used on VLANs and VPLS networks.
Benefits
After static Layer 2 multicast is deployed on a device, multicast entries on the device do not
age and users attached to the device can regularly receive multicast data for specific multicast
groups. Static Layer 2 multicast provides the following benefits:
l Simplifies network management.
l Reduces network delays.
l Protects unregistered users from being attacked by multicast data and protocol packets,
improving information security.
9.4.2.3 Layer 2 SSM Mapping
Principles
IGMPv3 supports Source Specific Multicast (SSM). While many multicast devices currently
support IGMPv3, most old multicast terminals support IGMPv1 or IGMPv2. The SSM
mapping mechanism enables devices running IGMPv3 to provide SSM services for hosts
running IGMPv1 or IGMPv2. The SSM mapping mechanism maps IGMPv1 or IGMPv2 (*,
G) messages in which group addresses are within an SSM group address range into IGMPv3
(S, G) messages. This enables hosts running IGMPv1 or IGMPv2 to obtain SSM services.
The SSM mapping mechanism effectively protects multicast sources from being attacked.
Layer 2 SSM mapping is used to implement SSM mapping on Layer 2 networks. In the
networking shown in Figure 9-48, the Layer 3 device runs IGMPv3 and is directly connected
to a Layer 2 device. NodeB-A runs IGMPv3, NodeB-B runs IGMPv2, and NodeB-C runs
IGMPv1 on the Layer 2 network. If the IGMP versions of NodeB-B and NodeB-C cannot be
upgraded to IGMPv3, Layer 2 SSM mapping needs to be configured on the Layer 2 device to
provide SSM services for all hosts on the network segment.
Figure 9-48 Layer 2 SSM mapping
[Figure: NodeB-A (with IGMPv3), NodeB-B (with IGMPv2), and NodeB-C (with IGMPv1) send
IGMPv3, IGMPv2, and IGMPv1 Report messages, respectively, to the ATN, which is labeled SSM.]
Implementation
If SSM mapping is configured on a multicast device and mappings between group addresses
and source addresses are configured, the multicast device will perform the following actions
after receiving a (*, G) message from NodeB-C or NodeB-B:
l If the multicast group address contained in the message is out of the SSM group address
range, the device processes the message in the same manner as it processes an
IGMPv1/v2 message.
l If the multicast group address contained in the message is within the SSM group address
range, the device maps the (*, G) Report message into an (S, G) Report message based
on mapping rules.
Layer 2 SSM mapping is used on VLANs and VPLS networks.
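The two cases above can be illustrated with a short Python sketch. It is a hedged example, not the device logic: the function name map_report, the mapping rule table, and the source address 10.1.1.1 are hypothetical. The range 232.0.0.0/8 is the SSM range assigned by IANA; a deployment may use a different SSM group address range. Returning an empty list for an SSM group that has no mapping rule is an assumption of this sketch.

```python
import ipaddress

# IANA-assigned SSM range; a deployment may use a different range.
SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")

# Hypothetical mapping rules: group prefix -> source addresses.
MAPPING_RULES = {
    ipaddress.ip_network("232.1.1.0/24"): ["10.1.1.1"],
}


def map_report(group: str):
    """Translate an IGMPv1/v2 (*, G) Report into the entries to be processed."""
    g = ipaddress.ip_address(group)
    if g not in SSM_RANGE:
        # Outside the SSM range: handled as an ordinary IGMPv1/v2 (*, G) Report.
        return [("*", group)]
    # Inside the SSM range: map (*, G) to (S, G) using the configured rules.
    # Assumption: a group with no matching rule yields no entry at all.
    sources = [s for net, srcs in MAPPING_RULES.items() if g in net for s in srcs]
    return [(s, group) for s in sources]


print(map_report("225.0.0.1"))    # [('*', '225.0.0.1')]
print(map_report("232.1.1.5"))    # [('10.1.1.1', '232.1.1.5')]
```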
Benefits
Layer 2 SSM mapping deployed on Layer 2 devices provides the following benefits:
l Enables IGMPv1/v2 terminal users to enjoy SSM services.
l Better protects multicast sources from being attacked.
9.4.2.4 IGMP Snooping Proxy
Principles
On the network shown in Figure 9-49, forwarding entries are set up based on IGMP messages
exchanged between the PE (a Layer 3 device) and user hosts. If there are many user hosts,
redundant IGMP messages increase the workload of the PE.
IGMP Snooping Proxy can be deployed on the CE (a Layer 2 device) connecting the PE and
hosts to address this problem by terminating IGMP messages. The CE where IGMP Snooping
Proxy is configured has the following functions:
l Periodically sends Query messages to the hosts and receives Report and Leave messages
from the hosts.
l Maintains group member relationships.
l Sends Report and Leave messages to the PE.
l Forwards multicast traffic only to those hosts that require it.
After IGMP Snooping Proxy is deployed on the CE, the PE regards the CE as a single
host. The CE terminates IGMP messages from both the upstream PE and the downstream user
hosts, so it is not completely transparent.

Figure 9-49 IGMP Snooping Proxy
[Figure: two panels, each showing an IP/MPLS network, a PE, and a CE. Legend: IGMP Query
packet, IGMP Report/Leave packet, multicast packets.]
Implementation
A device that runs IGMP Snooping Proxy establishes and maintains a multicast forwarding
table and sends multicast data to users based on this table. The process for implementing
IGMP Snooping Proxy is as follows:
l An IGMP Snooping Proxy-enabled device sends Query messages to query members of
multicast groups. The querier function must be enabled on a device if the upstream
device of this device does not send IGMP Query messages or static multicast groups are
configured on the upstream device.
l The IGMP Snooping Proxy-enabled device suppresses Report and Leave messages if
large numbers of users frequently join or leave multicast groups. This reduces message
processing pressure on the upstream device.
– When receiving the first Report message for a multicast group from a user host, the
device checks whether there is an entry for this group. If no entry exists, the device
sends the Report message to its upstream device and also creates an entry for this
group. If the entry exists, the device adds the host to the multicast group and does
not send any Report messages to its upstream device.
– After receiving a Leave message for a group from a user host, the device sends a
group-specific query message to check whether there are any members of this
group. If there are other members of this group, the device deletes the user from the
group. If there are no other members of this group, the device considers the user as
the last member of the group and sends a Leave message to its upstream device.
IGMP Snooping Proxy is used on VLANs and VPLS networks.
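The Report and Leave suppression described above can be sketched as follows. This is a simplified model under stated assumptions: the class name SnoopingProxy and the host identifiers are hypothetical, and the group-specific query is replaced by a direct check of the membership table, whereas a real device sends the query and waits for responses.

```python
class SnoopingProxy:
    """Toy model of Report/Leave suppression on an IGMP Snooping Proxy device."""

    def __init__(self):
        self.members = {}    # group -> set of host identifiers
        self.upstream = []   # messages sent to the upstream device

    def on_report(self, group, host):
        if group not in self.members:
            # First Report for the group: create the entry and notify upstream.
            self.members[group] = set()
            self.upstream.append(("report", group))
        # Later Reports only add the host; nothing is sent upstream.
        self.members[group].add(host)

    def on_leave(self, group, host):
        hosts = self.members.get(group)
        if hosts is None:
            return
        hosts.discard(host)
        # Stand-in for the group-specific query: check for remaining members.
        if not hosts:
            # Last member left: delete the entry and send one Leave upstream.
            del self.members[group]
            self.upstream.append(("leave", group))


proxy = SnoopingProxy()
for host in ("host1", "host2", "host3"):
    proxy.on_report("225.0.0.1", host)
for host in ("host1", "host2", "host3"):
    proxy.on_leave("225.0.0.1", host)

print(proxy.upstream)   # [('report', '225.0.0.1'), ('leave', '225.0.0.1')]
```

Six host messages result in only two messages toward the upstream device, which is the reduction the proxy is meant to achieve.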

Benefits
IGMP Snooping Proxy deployed on a Layer 2 ATN connected to a user network segment
provides the following benefits:
l Reduces bandwidth consumption by reducing IGMP message exchanges.
l Reduces the load of a directly connected Layer 3 device by processing protocol
messages received from downstream hosts and maintaining group memberships.
9.4.2.5 Multicast VLAN
Principles
As shown in Figure 9-50, in traditional multicast on-demand mode, bandwidth is wasted and
extra loads are borne by both the Layer 3 PE and Layer 2 CE if users in different VLANs
(VLAN 11 and VLAN 22) need to receive multicast data from the same source through the
same device. The Layer 3 PE must send one copy of the multicast data for each VLAN to the
Layer 2 CE and the Layer 2 CE must send a copy of the multicast data to each user.
The multicast VLAN function can be used to address this problem. Based on IGMP snooping,
the multicast VLAN function implements multicast replication across broadcast domains on
Layer 2 devices. After the multicast VLAN function is enabled on the CE, the PE connected
to the CE sends one copy of multicast traffic only to VLAN 3 (multicast VLAN) of the CE.
The multicast data is replicated on the CE and copies are sent to VLAN 11 and VLAN 22.
The PE no longer needs to send more than one identical multicast data flow downstream. This
saves network bandwidth and reduces the load on the PE.

Figure 9-50 Comparison of networks with and without the multicast VLAN function
[Figure: two panels, each showing an IP core, a PE, a CE, and user VLANs 11 and 22.
Legend: VLAN 11 data, VLAN 22 data, VLAN 3 data.]
The multicast VLAN function generally must be used together with IGMP Snooping Proxy
for the following reasons:
l On the network shown in Figure 9-50, if IGMP Snooping Proxy is not enabled on
VLAN 3 and users in different VLANs want to join the same group, the CE needs to
forward an IGMP Report message from each user to the PE. Similarly, if users in
different VLANs want to leave the same group, the CE also needs to forward an IGMP
Leave message from each user to the PE.
l After IGMP Snooping Proxy is enabled on VLAN 3, if users in different VLANs want to
join the same group, the CE needs to send only one IGMP Report message to the PE. If
the last member of the group leaves, the CE sends an IGMP Leave message to the PE.
This reduces network-side bandwidth consumption on the CE and performance pressure
on the PE.
Related Concepts
The following concepts are involved in the multicast VLAN function:
l Multicast VLAN: is a VLAN to which the interface connected to a multicast source
belongs. A multicast VLAN is used to aggregate multicast flows.

l User VLAN: is a VLAN to which a group member host belongs. A user VLAN is used
to receive multicast flows from a multicast VLAN.
One multicast VLAN can be bound to multiple user VLANs.
After the multicast VLAN function is configured on a device, the device receives multicast
traffic through the multicast VLANs and sends the multicast traffic to users through user
VLANs.
Implementation
The multicast VLAN implementation process can be divided into two parts:
l Protocol packet forwarding

– After the user VLAN tag in an IGMP Report message is replaced with a
corresponding multicast VLAN tag, the message is sent out through a router port of
the multicast VLAN.
– After the multicast VLAN tag in an IGMP Query message is replaced with a
corresponding user VLAN tag, the message is sent out through a member port of
the user VLAN.
– Entries learned by IGMP snooping in user VLANs are added to the table of the
multicast VLAN.
l Multicast data forwarding
After receiving a multicast data packet from an upstream device, a Layer 2 device
searches its multicast forwarding table for a matching entry.
– If a matching forwarding entry exists, the Layer 2 device will identify the
downstream ports and their VLAN IDs, replicate the multicast data packet on each
downstream port, and send a copy of the packet to user VLANs.
– If no matching forwarding entry exists, the Layer 2 device will broadcast the
multicast data packet in the local multicast VLAN.
The multicast VLAN function is used on VLANs.
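The protocol packet and data forwarding steps above can be illustrated with a minimal Python sketch. It is an assumption-laden model, not the device implementation: the class name MulticastVlan and the port names PortA and PortB are hypothetical, while the VLAN IDs 3, 11, and 22 are taken from Figure 9-50.

```python
class MulticastVlan:
    """Toy model of one multicast VLAN bound to several user VLANs."""

    def __init__(self, multicast_vlan, user_vlans):
        self.multicast_vlan = multicast_vlan
        self.user_vlans = set(user_vlans)
        self.table = {}   # group -> set of (member port, user VLAN ID)

    def on_report(self, user_vlan, group, port):
        """Learn a member port; return the VLAN tag used toward the router port."""
        if user_vlan not in self.user_vlans:
            raise ValueError(f"VLAN {user_vlan} is not bound to the multicast VLAN")
        # The entry learned in the user VLAN is added to the multicast VLAN table.
        self.table.setdefault(group, set()).add((port, user_vlan))
        # The user VLAN tag is replaced with the multicast VLAN tag.
        return self.multicast_vlan

    def replicate(self, group):
        """Return the (port, VLAN ID) copies for data received in the multicast VLAN.

        None means no entry matched, so the packet is broadcast in the
        multicast VLAN itself.
        """
        entry = self.table.get(group)
        if not entry:
            return None
        return sorted(entry)


mvlan = MulticastVlan(multicast_vlan=3, user_vlans=[11, 22])
tag_a = mvlan.on_report(11, "225.0.0.1", "PortA")
tag_b = mvlan.on_report(22, "225.0.0.1", "PortB")

print(tag_a, tag_b)                   # 3 3
print(mvlan.replicate("225.0.0.1"))   # [('PortA', 11), ('PortB', 22)]
print(mvlan.replicate("225.0.0.9"))   # None
```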
Benefits
The multicast VLAN function moves the multicast replication point downstream to edge
devices so that multicast data can be transmitted in different VLANs. The multicast VLAN
function provides the following benefits:
l Reduces bandwidth consumption.
l Reduces the loads of Layer 3 devices.
l Facilitates management of multicast sources and multicast group members.
9.4.2.6 Layer 2 Multicast Instance
Principles
In traditional multicast on-demand mode, if users in different VLANs or VPLS networks need
to receive the multicast data from the same source through a device, the upstream device of
this device must send several identical multicast data flows downstream. This wastes
bandwidth and imposes extra processing burdens on the upstream device.
One or more Layer 2 multicast instances can be deployed on the Layer 2 network to address
this problem. The Layer 2 multicast instance function enhances the multicast VLAN function.
This function implements multicast data replication across VLANs and VPLS networks, and
limits the multicast group range in different instances. This saves bandwidth resources and
simplifies multicast group management. In the networking shown in Figure 9-51, if users in
VLAN 11 and VLAN 22 require the multicast data for channels in the range of 225.0.0.1 to
225.0.0.5, Layer 2 multicast instances can be deployed on the CE. After the PE sends one
copy of the multicast data traffic through VLAN 3, the CE replicates the multicast data and
sends a copy to each VLAN. This reduces bandwidth consumption.
Figure 9-51 Layer 2 multicast instance application
[Figure: IP core, PE, and CE. The PE sends VLAN 3 instance data (225.0.0.1 to 225.0.0.5)
to the CE, which serves VLAN 11 and VLAN 22. Legend: VLAN 11 user data, VLAN 22 user data,
VLAN 3 instance data.]
Multicast users are allowed to receive multicast data traffic across different types of networks.
This facilitates flexible deployment of multicast services and satisfies the requirements of
different types of networking. For example, users are allowed to receive multicast data traffic
across VPLS networks and VLANs.
Related Concepts
The following concepts are involved in the Layer 2 multicast instance function:

l Multicast instance: is the instance to which the interface connected to the multicast
source belongs. A multicast instance is used to aggregate multicast flows.
l User instance: is the instance to which a group member host belongs. A user instance is
used to receive multicast flows from a multicast instance.
One multicast instance can be bound to multiple user instances.
l Channel: is a series of multicast groups. To facilitate program management, content
providers operate different types of channels in different Layer 2 multicast instances.
Channels need to be configured in the Layer 2 multicast instances.
Implementation
The Layer 2 multicast instance implementation is similar to the multicast VLAN
implementation. After receiving a multicast data packet from an upstream device, a Layer 2
device searches the multicast forwarding table for a matching entry based on the multicast
instance ID and the destination address (multicast group address) contained in the packet. If a
matching forwarding entry exists, the Layer 2 device will identify the downstream interfaces
and their VLAN IDs or VSI names, replicate the multicast data packet on each downstream
interface, and send a copy of the packet to user instances. If no matching forwarding entry exists,
the Layer 2 device will broadcast the multicast data packets in the local multicast VLAN or
VSI.
The Layer 2 multicast instance function is used on VLANs and VPLS networks.
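What a Layer 2 multicast instance adds to the multicast VLAN function is the channel (group range) that limits which groups the instance carries. The sketch below illustrates only that check, using the range 225.0.0.1 to 225.0.0.5 from Figure 9-51. The class name MulticastInstance and the interface names are hypothetical, and rejecting a join for a group outside the channel is an assumption of this example.

```python
from ipaddress import IPv4Address


class MulticastInstance:
    """Toy model of a Layer 2 multicast instance with one configured channel."""

    def __init__(self, instance_id, first_group, last_group):
        self.instance_id = instance_id
        self.first = IPv4Address(first_group)
        self.last = IPv4Address(last_group)
        self.entries = {}   # group -> set of (interface, user instance name)

    def in_channel(self, group):
        # A channel is a series of multicast groups, modeled here as a range.
        return self.first <= IPv4Address(group) <= self.last

    def join(self, group, interface, user_instance):
        if not self.in_channel(group):
            return False    # assumption: groups outside the channel are refused
        self.entries.setdefault(group, set()).add((interface, user_instance))
        return True

    def lookup(self, group):
        # Downstream interfaces and their user instances for a data packet.
        return sorted(self.entries.get(group, set()))


instance = MulticastInstance(3, "225.0.0.1", "225.0.0.5")
joined = instance.join("225.0.0.2", "PortA", "VLAN 11")
also_joined = instance.join("225.0.0.2", "PortB", "VLAN 22")
refused = instance.join("225.0.0.9", "PortA", "VLAN 11")

print(joined, also_joined, refused)    # True True False
print(instance.lookup("225.0.0.2"))    # [('PortA', 'VLAN 11'), ('PortB', 'VLAN 22')]
```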
Benefits
The Layer 2 multicast instance function provides the following benefits:
l Reduces bandwidth consumption.
l Ensures network security.
l Separates the unicast and multicast domains and prevents the traffic of a particular user
from affecting other users or the network as a whole.
9.4.3 Applications
9.4.3.1 Application of Layer 2 Multicast for IPTV Services
IPTV Service Overview

IPTV is a video service that users access over an IP network. IPTV is a new type of media service
that makes high demands for bandwidth, real-time transmissions, and reliability on IP MANs.
Multiple users can receive the same IPTV service data simultaneously.
Given the characteristics of IPTV, multicast technologies can be used to bear IPTV services.
Unlike traditional unicast, multicast does not require more network bandwidth as the number
of users increases. It reduces loads on video servers and the bearer network. If service
providers want to deploy IPTV services quickly and economically, E2E multicast push is
recommended.

Network Description
Currently, an IP MAN consists of a metro backbone network and a broadband access network.
IPTV service traffic is pushed through the metro backbone network to the broadband access
network and finally to user terminals. Figure 9-52 shows an E2E IPTV service push model.
The metro backbone network is made up primarily of network layer (Layer 3) devices. PIM,
such as PIM-SM, is used on metro backbone devices to connect to the multicast source.
Devices directly connected to the broadband access network use IGMP to forward multicast
packets to user terminals. The broadband access network consists of data link layer (Layer 2)
devices. Layer 2 devices can use Layer 2 multicast technologies such as IGMP Snooping
Proxy or IGMP snooping to forward multicast packets to terminal users. The multicast
technology ensures that only one copy of multicast data is transmitted on the metro
backbone and broadband access networks, greatly reducing bandwidth consumption.
Figure 9-52 Application of Layer 2 multicast for IPTV services
[Figure: a Server connects to the metro IP/MPLS backbone network (PIM/IGMP; SR and BSR).
PE1 and PE2 (IGMP snooping) and CE1 and CE2 (IGMP snooping, IGMP proxy, multicast VLAN)
form the broadband access network. Legend: multicast packets.]
The following section describes Layer 2 multicast features used on the broadband access
network.

Layer 2 Multicast Deployment

The broadband access network is made up of Layer 2 devices. Layer 2 devices exchange or
forward data frames based on MAC addresses. Their ability to parse and route IP packets is
very limited. As a result, Layer 2 devices do not support Layer 3 multicast protocols. In the
past, Layer 2 devices broadcast all IPTV multicast traffic to all interfaces. This easily resulted
in broadcast storms.
The use of common Layer 2 multicast forwarding technologies such as IGMP snooping,
IGMP Snooping Proxy, and multicast VLAN prevents multicast packet flooding and resolves
the problems it creates:
l IGMP snooping is deployed on all Layer 2 devices. IGMP snooping listens to IGMP
messages exchanged between Layer 3 devices and user terminals. Layer 2 devices use
these messages to maintain multicast group memberships in order to implement on-
demand multicast traffic forwarding.
l IGMP Snooping Proxy is deployed on CEs close to user terminals to listen to, filter, and
forward IGMP messages. This reduces the number of multicast protocol packets
exchanged directly between CEs and upstream devices, and reduces packet processing
pressure on upstream devices.
l Multicast VLAN can be deployed on CEs close to user terminals to reduce the network
bandwidth required for transmissions between CEs and multicast sources.
The following features can also be deployed on Layer 2 devices:
l VSI or VLAN-based Layer 2 multicast instance (a multicast VLAN enhancement) can be
deployed on CEs close to user terminals to reduce the network bandwidth required for
transmissions between CEs and multicast sources.
l If the number of user terminals attached to a CE exceeds the number of IPTV channels,
static multicast groups can be configured on the CE to increase the channel change speed
and improve the QoS for IPTV services.
l If user hosts support IGMPv1 and IGMPv2 only, SSM mapping can be deployed on the
CE connected to these user terminals so the user hosts can access SSM services.
This example uses a BTV channel with a bandwidth of 2 Mbit/s.
l If a Layer 2 device uses no Layer 2 multicast forwarding technology, the device forwards
multicast packets to all IPTV users. Broadcasting multicast packets for five IPTV
channels leads to network congestion. This is the case even if the bandwidth of the
interface connecting the Layer 2 device to users is 10 Mbit/s.
l After Layer 2 multicast forwarding technologies are used on the Layer 2 device, the
Layer 2 device sends multicast packets only to users that require the multicast packets. If
each interface of the Layer 2 device is connected to at least one IPTV user terminal,
multicast packets (2 Mbit/s traffic) for at most one BTV channel are forwarded to
corresponding interfaces. This ensures the availability of adequate network bandwidth
and the quality of user experience.
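The arithmetic behind this example can be written out in a few lines of Python. The figures of 2 Mbit/s per channel, five channels, and a 10 Mbit/s interface come from the text; treating each interface as carrying exactly one subscribed channel is the simplifying assumption of the second case.

```python
CHANNEL_MBPS = 2       # bandwidth of one BTV channel
CHANNELS = 5           # channels delivered to the Layer 2 device
INTERFACE_MBPS = 10    # bandwidth of a user-facing interface

# Without Layer 2 multicast, every channel is sent out of every interface.
flooded_mbps = CHANNEL_MBPS * CHANNELS
# With Layer 2 multicast, an interface carries only the subscribed channel.
snooped_mbps = CHANNEL_MBPS * 1

flooded_load = flooded_mbps / INTERFACE_MBPS
snooped_load = snooped_mbps / INTERFACE_MBPS

print(f"flooded: {flooded_mbps} Mbit/s ({flooded_load:.0%} of the interface)")
print(f"snooped: {snooped_mbps} Mbit/s ({snooped_load:.0%} of the interface)")
```

Flooding consumes the entire 10 Mbit/s interface and leaves no room for other traffic, whereas on-demand forwarding uses one fifth of it.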


Terms
Term Definition
(*, G) A multicast entry of the ASM model. * indicates any source, and G indicates a
multicast group. A (*, G) entry applies to all multicast packets carrying the group
address G. All multicast packets that are sent to G are forwarded through the
downstream interfaces of the (*, G) entry, regardless of which source sends them.
(S, G) A multicast entry of the SSM model. S indicates a multicast source, and G indicates
a multicast group. After a multicast packet with S as the source address and G as the
group address reaches the router, it is forwarded through the downstream interfaces
of the (S, G) entry. Such a packet is referred to as an (S, G) packet.
IGMP Internet Group Management Protocol
PIM Protocol Independent Multicast
PW Pseudo Wire
VLAN Virtual Local Area Network
VSI Virtual Switch Instance
9.5 MSDP
9.5.1 Introduction to MSDP

Definition
Multicast Source Discovery Protocol (MSDP) is an inter-domain IPv4 multicast solution
based on the interconnection of multiple Protocol Independent Multicast Sparse Mode (PIM-
SM) domains.
Purpose
A network composed of multiple PIM-SM devices is called a PIM-SM network. A large
PIM-SM network may be maintained by multiple Internet Service Providers (ISPs).

PIM-SM domains are isolated by Rendezvous Points (RPs). The multicast source can only
register to the local RP, and hosts can only send the Join message to the local RP. As a result,
the RP only recognizes the local multicast source and distributes the data from the multicast
source to the local users.
A PIM-SM network depends on RPs to forward multicast data. To implement load balancing
among RPs, enhance network reliability, and facilitate management, you can group multiple
RPs into different domains on the PIM-SM network. Each domain is called a PIM-SM
domain.
After a PIM-SM network is divided into multiple PIM-SM domains, RPs in different domains
cannot communicate with each other. To implement the communication between PIM-SM
domains, MSDP is introduced.
NOTE
A PIM-SM domain can be considered the service scope of an RP, and different PIM-SM domains can be
divided by the BootStrap Router (BSR) boundary or by configuring different static RPs on different
ATNs.
9.5.2 Principles
9.5.2.1 Inter-Domain Multicast in MSDP
MSDP Peer Relationship

MSDP inter-domain multicast solves the problem that Rendezvous Point (RP) information and
multicast source information are isolated between different PIM-SM domains. With MSDP, RPs
can communicate with each other, multicast source information is shared, and multicast
services can be forwarded between PIM-SM domains.
The MSDP peer relationship can be configured in the following ways:
l Establish MSDP peer relationships between RPs in the same autonomous system (AS)
but of different PIM-SM domains.
l Establish MSDP peer relationships between RPs in different ASs.
NOTICE
To ensure successful Reverse Path Forwarding (RPF) checks in an inter-AS scenario, a
BGP or a Multicast Border Gateway Protocol (MBGP) peer relationship must be
established on the same interfaces as the MSDP peer relationship.
NOTE
For details of MBGP, refer to the chapter "MBGP Configuration" in the Configuration Guide - IP
Multicast.

Basic Principle
Setting up MSDP peer relationships between the RPs of different PIM-SM domains allows the
RPs (as MSDP peers) to communicate with each other. These peer relationships form an
MSDP-connected graph.
MSDP peers then exchange Source Active (SA) messages. An SA message carries the (S, G)
information that the source DR has registered with its RP. Because SA messages are relayed
among MSDP peers, an SA message sent by one RP can be received by all the other RPs.
As shown in Figure 9-53, the PIM-SM network is divided into four PIM-SM domains. The
multicast source of PIM-SM1 domain (Source) sends data to the multicast group G. Receiver
in the PIM-SM3 domain, as a member of G, maintains an RP-rooted Shared Tree (RPT) of G
with RP3.
Figure 9-53 Inter-domain multicast in MSDP
[Figure: domains PIM-SM 1 to PIM-SM 4; devices Source, DR1, RP1, RP2, RP3, DR3, and Receiver.
Legend: MSDP peers, multicast packet, Register, SA message, Join.]
As shown in Figure 9-53, Receiver can receive the multicast data sent by Source after the
MSDP peer relationships between RP1, RP2, and RP3 are set up.
1. Source sends multicast data to G. DR1, the designated router (DR) of the source, then
encapsulates the data into a Register message and sends the message to RP1. As the RP of
the multicast source, RP1 creates an SA message, which carries the IP addresses of the
multicast source, multicast group G, and RP1, and sends the SA message to its peer RP2.
2. After RP2 receives the SA message, it performs an RPF check on the message. If the
check succeeds, RP2 forwards the message to RP3.
3. After RP3 receives the SA message, it performs an RPF check on the message, and the
check succeeds. RP3 has a (*, G) entry, which indicates that its domain contains members of G.
4. RP3 creates an (S, G) entry and sends a Join message with the (S, G) information to
Source hop by hop. A multicast path (source tree) from Source to RP3 is set up. After the
multicast data reaches RP3 along the source tree, RP3 forwards it to Receiver along the
RPT.
5. After Receiver receives the multicast data, it determines whether to initiate the SPT
switchover.
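The steps above can be pictured with a small simulation. The following Python sketch is
illustrative only: the class and field names (SourceActive, Rp, rpf_peers, star_g_groups) are
hypothetical, the RPF check is reduced to a preconfigured set of accepted peers, and the
addresses are example values rather than values from this document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceActive:
    """Simplified SA message: source, group, and originating RP."""
    source: str
    group: str
    origin_rp: str


@dataclass
class Rp:
    name: str
    peers: list = field(default_factory=list)         # MSDP peers (Rp objects)
    rpf_peers: set = field(default_factory=set)       # peer names that pass the RPF check
    star_g_groups: set = field(default_factory=set)   # groups with a (*, G) entry
    sg_entries: set = field(default_factory=set)      # (S, G) entries created from SAs

    def originate(self, source, group):
        """Step 1: the source-side RP creates an SA message and sends it to its peers."""
        sa = SourceActive(source, group, self.name)
        for peer in self.peers:
            peer.receive(sa, sender=self)

    def receive(self, sa, sender):
        """Steps 2 to 4: RPF-check the SA, create (S, G) if members exist, forward it."""
        if sender.name not in self.rpf_peers:
            return  # RPF check failed: the SA message is discarded
        if sa.group in self.star_g_groups:
            # Local members exist, so an (S, G) entry is created (the Join is not modeled).
            self.sg_entries.add((sa.source, sa.group))
        for peer in self.peers:
            if peer is not sender:
                peer.receive(sa, sender=self)


rp1, rp2, rp3 = Rp("RP1"), Rp("RP2"), Rp("RP3")
rp1.peers, rp2.peers, rp3.peers = [rp2], [rp1, rp3], [rp2]
rp2.rpf_peers, rp3.rpf_peers = {"RP1"}, {"RP2"}
rp3.star_g_groups = {"225.1.1.1"}       # Receiver in the RP3 domain joined G
rp1.originate("10.1.1.1", "225.1.1.1")  # Source registered with RP1
```

In this run only RP3 ends up with an (S, G) entry, because it is the only RP whose domain
has members of G.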
9.5.2.2 Anycast RP in MSDP
In a traditional PIM-SM domain, each multicast group is mapped to only one Rendezvous
Point (RP). When the network is overloaded or traffic is heavy, this causes problems such as
excessive load on the RP, slow convergence after the RP fails, and non-optimal multicast
forwarding paths.
Therefore, anycast RP is introduced in MSDP. After anycast RP is enabled in MSDP, multiple
RPs can be configured with the same loopback address in a PIM-SM domain, and MSDP peer
relationships are established between these RPs. As a result, the path destined for the RP is
optimal, and load balancing is implemented among RPs.
To sum up, anycast RP can properly address the problem of heavy loading on a single RP in a
PIM-SM domain, which is caused by the convergence of all multicast source information and
multicast join information on the RP. Meanwhile, anycast RP ensures the path destined for an
RP is optimal because the receiver and multicast source join and register to the nearest RP.
Principles
As shown in Figure 9-54, in the PIM-SM domain, the multicast sources, S1 and S2, send
multicast data to the multicast group G that contains multicast members, U1 and U2.

Figure 9-54 Networking diagram of anycast RP
[Figure: one PIM-SM domain containing RP1, RP2, DR1, DR2, sources S1 and S2, and members U1
and U2. Legend: SA message, MSDP peers.]
The implementation of anycast RP in a PIM-SM domain is as follows:
1. Establish an MSDP peer relationship between RP1 and RP2 so that multicast in the
PIM-SM domain is implemented through the MSDP peers.
2. The receiver sends a Join message to the nearest RP to set up a rendezvous point tree
(RPT). In addition, the multicast source registers with the nearest RP, and the RPs send
each other Source Active (SA) messages to share the multicast source information.
3. The RPs join the shortest path tree (SPT) rooted at the designated router (DR) of the
multicast source. The RPs then receive and forward multicast data. After the receiver receives
the multicast data, it determines whether to initiate the SPT switchover.
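Because all anycast RPs share one loopback address, unicast routing decides which RP a Join
or Register message reaches. The sketch below illustrates that idea only; the function name
and the cost values are hypothetical, and the tie-break by name exists solely to keep the
example deterministic.

```python
def nearest_rp(costs_to_rp):
    """Return the RP instance with the lowest unicast cost from this device.

    costs_to_rp maps an RP name to the (assumed) IGP cost of reaching that RP's
    copy of the shared anycast address.
    """
    return min(costs_to_rp, key=lambda rp: (costs_to_rp[rp], rp))


# Hypothetical costs for the DRs in Figure 9-54.
dr1_choice = nearest_rp({"RP1": 10, "RP2": 30})  # DR1 registers S1 with RP1
dr2_choice = nearest_rp({"RP1": 30, "RP2": 10})  # DR2 registers S2 with RP2
```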
9.5.2.3 MD5/Key-Chain Authentication
MSDP supports Message-Digest Algorithm 5 (MD5) authentication and Key-Chain authentication
to enhance the security and reliability of MSDP message forwarding. The application scenarios
of MD5/Key-Chain authentication are the same as those of basic MSDP applications. The two
authentication modes are mutually exclusive, and both MSDP peers must use the same mode.
Key-Chain provides encryption and authentication functions for all applications, supports
multiple encryption algorithms, and allows the key value of an encryption algorithm to be
updated dynamically. For details about Key-Chain, see the relevant feature description.
9.5.2.4 RPF Rules of SA Messages

To prevent Source Active (SA) messages from being forwarded in loops between MSDP peers,
MSDP performs a Reverse Path Forwarding (RPF) check on each received SA message. MSDP
strictly controls which incoming SA messages are accepted and discards any SA message that
does not comply with the RPF rules.
MSDP has the following RPF rules:
l Rule 1: If the peer that sends the SA message is the source Rendezvous Point (RP), the
SA message is received and forwarded to other peers.
l Rule 2: The SA message sent by a static RPF peer is received. A device can set up MSDP
peer relationships with multiple devices. Users can select one or more of these remote
peers and configure them as static RPF peers.
l Rule 3: If a device has only one remote MSDP peer, the remote peer automatically
becomes the RPF peer, and the device receives the SA messages sent by that peer. A
PIM-SM domain that has only one MSDP peer outside the domain is called a stub domain.
l Rule 4: If the peer that sends the SA message and the local device belong to the same
mesh group, the local device receives the SA message. The SA messages from the mesh
group are not forwarded to the members of the mesh group, but to all the peers that do
not belong to the mesh group.
l Rule 5: If the peer that sends the SA message is the next hop of the route to the source
RP or a route forwarder, the local device receives SA messages and forwards them to
other peers. The route types can be Multicast Border Gateway Protocol (MBGP) routes,
BGP routes, static multicast routes, and Interior Gateway Protocol (IGP) routes.
l Rule 6: If the route that reaches the source RP spans multiple autonomous systems
(ASs), only the SA message received from the peer whose AS number is in the AS-path
is accepted.
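The six rules can be read as a sequence of tests on the peer that sent the SA message. The
sketch below is a simplified model, not the device implementation: it assumes the rules are
tried in the listed order, and every name in it (LocalMsdp, accept_sa, and the fields) is
hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LocalMsdp:
    peers: set                                            # all remote MSDP peers
    static_rpf_peers: set = field(default_factory=set)    # rule 2
    mesh_group: set = field(default_factory=set)          # peers in the local mesh group
    next_hop_to_rp: dict = field(default_factory=dict)    # source RP -> next-hop peer
    as_path_to_rp: dict = field(default_factory=dict)     # source RP -> list of AS numbers
    peer_as: dict = field(default_factory=dict)           # peer -> AS number


def accept_sa(local, sender, source_rp):
    """Return the number of the first rule that accepts the SA message, or None."""
    if sender == source_rp:
        return 1                                   # sent by the source RP itself
    if sender in local.static_rpf_peers:
        return 2                                   # sent by a static RPF peer
    if local.peers == {sender}:
        return 3                                   # the only remote peer (stub domain)
    if sender in local.mesh_group:
        return 4                                   # same mesh group
    if local.next_hop_to_rp.get(source_rp) == sender:
        return 5                                   # next hop towards the source RP
    if local.peer_as.get(sender) in local.as_path_to_rp.get(source_rp, []):
        return 6                                   # peer AS is on the AS-path
    return None                                    # no rule matched: discard
```

A return value of None corresponds to discarding the SA message. Forwarding behavior, such
as not re-flooding inside a mesh group, is left out of this sketch.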
9.5.3 MSDP Applications
Inter-Domain Multicast
Figure 9-55 Inter-PIM-SM domain multicast in an AS
[Figure: AS 100 containing PIM-SM1 (Source, RP1, Router1) and PIM-SM2 (Receiver, RP2,
Router2). Legend: MSDP peers.]

l The MSDP peer relationship is set up between Rendezvous Points (RPs) of two PIM-SM
domains. In this manner, information about the multicast source is shared between the
two PIM-SM domains.
l When receiving the multicast data, RP1 (the RP of the multicast source) sends Source
Active (SA) messages that carry the multicast source information to RP2.
l Then RP2 forwards the multicast data to the receiver in its domain.
l After receiving the multicast data, the receiver decides whether to initiate the shortest
path tree (SPT) switchover.
Anycast RP
Figure 9-56 Anycast RP application
[Figure: one PIM-SM domain containing sources S1 and S2, members U1 and U2, a BSR, and ATN1
and ATN2, each with a Loopback1 interface. Legend: MSDP peers.]
l ATN 1 and ATN 2, as RPs, establish an MSDP peer relationship with each other.
l Intra-domain multicast is implemented through the MSDP peer relationship. In
addition, the receiver sends a Join message to the nearest RP to set up a rendezvous point
tree (RPT).
l The multicast source registers with the nearest RP, and the RPs send each other SA
messages to share the multicast source information.
l The RPs join the SPT rooted at the designated router (DR) of the source to obtain the
multicast data.
l After the receiver receives the multicast data, it determines whether to initiate the SPT
switchover.
Terms
Terms Description
MSDP Multicast Source Discovery Protocol (MSDP) is only applicable to the PIM-SM
domain and only meaningful for the Any-Source Multicast (ASM) model.
After the MSDP peer relationship is set up between RPs of different PIM-SM
domains, multicast source information can be shared between PIM-SM domains,
and the inter-domain multicast can be implemented.
After the MSDP peer relationship is set up between RPs of the same PIM-SM
domain, multicast source information can be shared in the PIM-SM domain, and
anycast RP can be implemented.
PIM Protocol Independent Multicast (PIM) is one of the multicast routing protocols.
PIM forwarding can be implemented only if unicast routes are reachable. By
using the existing unicast routing information, PIM performs Reverse Path
Forwarding (RPF) check on multicast messages. In this manner, multicast
routing entries are created and the multicast distribution tree is set up.
SA Source Active (SA) refers to a type of MSDP message. An SA message
contains multiple groups of (S, G) information or encapsulates a Register
message. MSDP peers exchange (S, G) information to share the multicast source
information.
SPT Shortest Path Tree (SPT) distributes multicast data by taking the multicast
source as the root and multicast group members as leaves. SPT is applicable to
PIM-DM, PIM-SM, and PIM SSM.
BSR BootStrap Router (BSR), also called Boot Router, is the management core of the
PIM-SM network. The BSR collects the C-RP information into an RP-set,
encapsulates the RP-set into a Bootstrap message, and advertises the Bootstrap
message to each PIM-SM device in the entire network. The PIM-SM device
then calculates the RP corresponding to the specified multicast group according
to the RP-set.

Acronym & Abbreviation Full Name
BSR BootStrap Router
MSDP Multicast Source Discovery Protocol
PIM-SM Protocol Independent Multicast Sparse Mode
RP Rendezvous Point
9.6 Multicast Management
9.6.1 Introduction to Multicast Management
Definition
With the fast development of the Internet, there has been a considerable growth in all types of
data and voice and video information exchanged in the network, which speeds up the
development of multicast services. Multicast management provides the following tools for
multicast service probe and fault diagnosis.
l Multicast Ping (MPing): a tool used to probe multicast services. By sending Internet
Control Message Protocol (ICMP) Echo Request messages, MPing triggers the setup of
the multicast forwarding tree and detects the members of reserved multicast groups on
the network.
NOTE
Reserved multicast group: The reserved multicast group addresses are within the range from
224.0.0.0 to 224.0.0.255. For example, 224.0.0.5 is reserved for the OSPF multicast group;
224.0.0.13 is reserved for the PIMv2 multicast group.
l Multicast trace route (MTrace): a tool used to trace multicast forwarding paths. It can
trace the path from a receiver to a multicast source along the multicast forwarding tree.
Purpose
As multicast services are widely applied, MPing and MTrace become more important in
multicast service maintenance and fault location. When selecting network devices that
support multicast, users require the devices to support not only multicast forwarding and
multicast routing protocols but also tools for diagnosing multicast faults.
MPing mainly has the following uses:
l Pinging the address of a common multicast group
– Checking the protocol running status and checking whether the multicast distribution
tree is set up by viewing the multicast routing table on the multicast device
– Collecting statistics on the ICMP Echo Reply messages sent from the destination host to
calculate the time to live (TTL) and response time from the multicast source to the
member of the multicast group
– Obtaining the network delay and route jitter by performing MPing periodically
l Pinging the address of a reserved multicast group
– Checking the members of reserved multicast groups on the network
MTrace mainly has the following uses:
l Locating faulty nodes and finding configuration errors in multicast troubleshooting and
routine maintenance
l Tracing the actual forwarding path of packets and collecting traffic information during
the trace; calculating multicast traffic rate in cyclic path tracing
l Outputting information about the faulty nodes for the NMS to analyze the fault and
generate alarms
9.6.2 Principles
9.6.2.1 MPing
MPing uses standard Internet Control Message Protocol (ICMP) messages to detect the
connectivity of a multicast path. A standard ICMP message used by MPing is an ICMP Echo
Request message, with the encapsulated destination address being a multicast address (either a
multicast address for the reserved multicast group or a common multicast group address).
l If the encapsulated destination address is a multicast address for the reserved multicast
group, the querier must specify the outgoing interface of the ICMP Echo Request
message. Finding that the destination address of the received ICMP Echo Request
message is the address of the reserved multicast group, the member (multicast device) of
the reserved multicast group responds with an ICMP Echo Reply message. Therefore,
MPing can be used to check the members of reserved multicast groups over the network.
l If the encapsulated destination address is a common multicast group address, the querier
cannot specify the outgoing interface of the ICMP Echo Request message. The ICMP
Echo Request message is forwarded across the multicast network as multicast traffic,
which triggers the creation of multicast routing entries. The network quality analysis
(NQA) software can perform MPing operations on multicast groups and then gather
information about delay and jitter. In this manner, multicast services can be maintained
and multicast faults can be located.
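The distinction between the two kinds of destination address can be sketched in a few lines.
The example below uses only the reserved range stated in this chapter (224.0.0.0 to
224.0.0.255); the function names are hypothetical and do not correspond to device commands.

```python
import ipaddress

RESERVED_GROUPS = ipaddress.ip_network("224.0.0.0/24")  # 224.0.0.0 to 224.0.0.255


def mping_target_kind(address):
    """Classify an MPing destination as a 'reserved' or 'common' multicast group."""
    ip = ipaddress.ip_address(address)
    if not ip.is_multicast:
        raise ValueError(f"{address} is not a multicast address")
    return "reserved" if ip in RESERVED_GROUPS else "common"


def outgoing_interface_required(address):
    """Reserved groups need an explicit outgoing interface; common groups do not take one."""
    return mping_target_kind(address) == "reserved"
```

For example, 224.0.0.13 (the PIMv2 group) is classified as reserved, whereas a group such as
225.1.1.1 is treated as a common multicast group.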
9.6.2.2 MTrace
MTrace complies with draft-fenner-traceroute-ipm-01, a draft of the Internet Engineering
Task Force (IETF). This draft describes a mechanism to trace the path on which multicast
data is forwarded from a multicast source to a designated receiver.

Figure 9-57 Networking diagram of MTrace
[Figure: Source, first-hop router, last-hop router, Receiver, and Querier.
Legend: IGMP Tracert Query, IGMP Tracert Request, IGMP Tracert Response.]
MTrace works on a multicast-enabled network, for example, a network that runs Protocol
Independent Multicast (PIM) in dense mode (PIM-DM) or sparse mode (PIM-SM), on which the
multicast distribution tree has been established. MTrace probes the multicast forwarding
path by sending IGMP Tracert messages. IGMP Tracert messages fall into the following types:
IGMP Tracert Query message, IGMP Tracert Request message, and IGMP Tracert Response message.
l The IGMP Tracert Request message is the IGMP Tracert Query message with an
additional response data block added to the end of the message.
l The IGMP Tracert Response message is the IGMP Tracert Request message with only
the message type field changed.
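The relationship between the three message types can be modeled as one record whose type
field and list of response data blocks change along the path. This is a conceptual sketch
only: the class, its fields, and the string block contents are hypothetical and do not
reflect the on-wire format.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class TracertMessage:
    kind: str                       # "query", "request", or "response"
    source: str                     # multicast source address
    group: str                      # multicast group address
    destination: str                # destination host address
    blocks: Tuple[str, ...] = ()    # response data blocks, one per hop


def add_block(msg, hop_info):
    """A hop appends its response data block; a Query thereby becomes a Request."""
    return replace(msg, kind="request", blocks=msg.blocks + (hop_info,))


def to_response(msg):
    """Turning a Request into a Response changes only the message type."""
    return replace(msg, kind="response")


query = TracertMessage("query", "10.1.1.1", "225.1.1.1", "192.168.1.1")
request = add_block(query, "last-hop")      # last-hop device adds the first block
request = add_block(request, "first-hop")   # each upstream hop adds another block
response = to_response(request)             # returned to the querier
```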
The principle of MTrace is as follows:
1. Run the MTrace command on the querier, with the multicast source address, destination
host address, and multicast group being specified.
2. The querier sends an IGMP Tracert Query message to the last-hop device connected with
the destination host.
3. After receiving the IGMP Tracert Query message, the last-hop device adds a response
data block containing the information about the interface receiving this IGMP Tracert
Query message to construct an IGMP Tracert Request message, and sends the message
to the previous-hop device.
4. The device of each hop adds a response data block to the IGMP Tracert Request message
and sends the message upstream.
5. When the first-hop device connected with the multicast source receives the IGMP
Tracert Request message, it also adds a response data block and sends the IGMP Tracert
Response message to the querier.
6. The querier parses the IGMP Tracert Response message and obtains the information
about the forwarding path from the multicast source to the destination host.
7. If the IGMP Tracert Request message cannot reach the first-hop device because of an
error, an IGMP Tracert Response message is sent directly to the querier. The querier
then parses the data block information to locate the faulty node. In this way, faulty
nodes can be monitored.
An MTrace operation can be initiated in the following modes. The mode to use depends on the
networking environment.
– all-router: indicates that the current multicast device is directly connected to the
destination host but it is not the last-hop device. 224.0.0.2 is set as the destination
address of the message. Such a message can be received by all multicast devices
residing on the network segment of the destination host, including the last-hop
device.
– last-hop: indicates that the IP address of the last-hop multicast device is set as the
destination address of the message. This mode requires the user to input the IP
address of the last-hop device.
– destination: indicates that the IP address of the destination host is set as the
destination address of the message. When the multicast device that directly
connects the destination host receives such a message, the device judges whether it
is the last-hop device. If not, the device re-encapsulates the IGMP Tracert Query
message in all-router mode.
– multicast-tree: indicates that the querier is just on the path from the multicast source
to the destination host (for example, the first-hop multicast device). The IP address
of the traced multicast group is set as the destination address of the message, and
the IP address of the multicast source is set as the source address of the message.
Then, the message is forwarded along the multicast path and finally arrives at the
last-hop multicast device.
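The four initiating modes differ mainly in the destination address of the IGMP Tracert Query
message. The following sketch maps each mode to that address as described above; the function
and parameter names are hypothetical and are not CLI syntax.

```python
ALL_ROUTERS = "224.0.0.2"  # destination address used in all-router mode


def query_destination(mode, *, last_hop=None, destination_host=None, group=None):
    """Return the destination address of the IGMP Tracert Query for a given mode."""
    if mode == "all-router":
        return ALL_ROUTERS
    if mode == "last-hop":
        if last_hop is None:
            raise ValueError("last-hop mode requires the last-hop device address")
        return last_hop
    if mode == "destination":
        if destination_host is None:
            raise ValueError("destination mode requires the destination host address")
        return destination_host
    if mode == "multicast-tree":
        if group is None:
            raise ValueError("multicast-tree mode requires the traced group address")
        return group
    raise ValueError(f"unknown mode: {mode}")
```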
Terms
None.

MPing Multicast Ping
MTrace Multicast trace route
NQA Network Quality Analysis
9.7 Multicast Route Management

9.7.1 Introduction to Multicast Route Management
Definition
Multicast Route Management is used to manage multicast routing tables and control the
creation and change of multicast routes.
Multicast route management provides the following functions:

l Reverse Path Forwarding (RPF) check
l Multicast load splitting
l Longest-match multicast routing
l Multicast boundary designation
l Multicast NSR
Purpose
l RPF check
This function is used to search for an optimal unicast route to a multicast source and
create a multicast forwarding tree. The outgoing interface of the unicast route is the
incoming interface of the forwarding entry. Then, when the forwarding module receives
multicast data packets, it searches the forwarding entry and checks whether the incoming
interface of the data packets is correct. If the interface that a multicast data packet
reaches is the outgoing interface of the unicast route, the packet passes the RPF check;
otherwise, the packet cannot pass the RPF check and is discarded. The RPF check
effectively avoids traffic loops during multicast data forwarding.
l Multicast load splitting
During multicast routing, you can configure a multicast load splitting policy on the ATN
so that the ATN selects different routes from the equal-cost routes as the RPF routes for
different forwarding entries. Because the RPF routes of the forwarding entries are
distributed among the equal-cost routes, multicast traffic is split across multiple
paths.
l Longest-match multicast routing
During multicast routing, the ATN prefers a route whose destination address mask and
source address mask are of the longest match to achieve accurate route matching.
l Multicast boundary designation
By configuring a multicast boundary on an interface, you can block multicast data on the
interface, that is, prevent the interface from forwarding the multicast data it receives.
l Multicast NSR
With multicast NSR, adjacent devices do not detect a master/slave switchover on the
current device. Therefore, multicast routing is not interrupted, the multicast distribution
tree (MDT) does not change, and no processing is triggered on the adjacent devices.
9.7.2 Principles
9.7.2.1 RPF Check

The Reverse Path Forwarding (RPF) check rules are as follows: According to the source of a
packet, a multicast device searches its unicast routing table, Multicast Border Gateway
Protocol (MBGP) routing table, Multicast Interior Gateway Protocol (MIGP) routing table,
and static multicast routing table for an optimal route as an RPF route. A packet passes the
RPF check only when the interface that the packet reaches is the same as the RPF interface.
If the MIGP, MBGP, and static multicast routing (MSR) tables all have candidate routes for
the RPF route, the system selects one optimal route from each routing table. If the routes
selected from the tables are Rt_urt (migp), Rt_mbgp, and Rt_msr, the system selects the RPF
route based on the following rules:
l By default, the system selects the RPF route based on the route priority.
a. The system compares the priorities of Rt_urt (migp), Rt_mbgp, and Rt_msr. The
route with the smallest priority value is preferentially selected as the RPF route.
b. If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same priority, the system selects
the RPF route in descending order of Rt_msr, Rt_mbgp, and Rt_urt (migp).
l If the multicast longest-match command is run to control route selection based on the
route mask:
– The system compares the mask lengths of Rt_urt (migp), Rt_mbgp, and Rt_msr.
The route with the longest mask is preferentially selected as the RPF route.
– If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same mask length, the system
compares their priorities. The route with the smallest priority value is preferentially
selected as the RPF route.
– If Rt_urt (migp), Rt_mbgp, and Rt_msr have the same mask length and priority, the
system selects the RPF route in descending order of Rt_msr, Rt_mbgp, and Rt_urt
(migp).
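The selection rules above amount to comparing the three candidate routes with an ordered
key. The sketch below illustrates this; the Candidate class, the table labels, and the
priority values in any example are hypothetical, and the longest_match flag merely stands in
for the effect of the multicast longest-match command.

```python
from dataclasses import dataclass

# Final tie-break order: Rt_msr first, then Rt_mbgp, then Rt_urt (migp).
TABLE_ORDER = {"msr": 0, "mbgp": 1, "urt": 2}


@dataclass(frozen=True)
class Candidate:
    table: str      # "urt" (unicast/MIGP), "mbgp", or "msr"
    mask_len: int   # mask length of the route
    priority: int   # route priority value; the smallest value is preferred


def select_rpf_route(candidates, longest_match=False):
    """Pick the RPF route from the optimal route of each routing table."""
    def key(c):
        by_priority = (c.priority, TABLE_ORDER[c.table])
        # With longest match enabled, the mask length is compared before the priority.
        return (-c.mask_len,) + by_priority if longest_match else by_priority
    return min(candidates, key=key)
```

With equal priorities and equal mask lengths, the function returns the Rt_msr candidate,
which matches the final tie-break in both rule sets.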
Figure 9-58 Process of an RPF check
[Figure: Source (192.168.0.1/24), RouterA, an ISP network, and ATN B and ATN C, each with
interfaces GE0/2/0 and GE0/2/1 and an attached Receiver. Legend: multicast packets.]
IP Routing Table on ATN C
Destination/Mask    Interface
192.168.0.0/24      GE0/2/1
In Figure 9-58, ATN C searches its routing tables and finds that GE 0/2/1 is the outbound
interface of the shortest path to the source, so GE 0/2/1 is the RPF interface. If an (S, G)
packet arrives on GE 0/2/1, ATN C forwards the packet. If the packet arrives on any other
interface, ATN C discards the packet.
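The check in Figure 9-58 can be reproduced with the single route shown in the routing table
of ATN C. The sketch below is a minimal model that assumes a plain longest-prefix lookup in
one unicast table; the function names are hypothetical.

```python
import ipaddress

# Routing table of ATN C from Figure 9-58: (destination/mask, outgoing interface).
ROUTES = [(ipaddress.ip_network("192.168.0.0/24"), "GE0/2/1")]


def rpf_interface(source):
    """Return the outgoing interface of the longest-prefix route to the source, if any."""
    src = ipaddress.ip_address(source)
    matches = [(net, ifname) for net, ifname in ROUTES if src in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_check(source, arrival_interface):
    """A packet passes only if it arrives on the RPF interface for its source."""
    expected = rpf_interface(source)
    return expected is not None and expected == arrival_interface
```

A packet from 192.168.0.1 passes the check when it arrives on GE0/2/1 and fails when it
arrives on GE0/2/0.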
9.7.2.2 Multicast Load Splitting

The multicast load splitting function supports five policies:
l Multicast group-based load splitting
This policy applies when a large number of multicast groups exist on the network.
l Multicast source-based load splitting
This policy applies when a large number of multicast sources exist on the network.
l Multicast source- and group-based load splitting
This policy applies when a large number of multicast sources and groups exist on the
network.
l Stable-preferred load splitting
This policy applies to the scenarios of the preceding multicast load splitting policies.
It can also be applied to a shared network segment.
l Balance-preferred load splitting
The scenarios for balance-preferred load splitting are the same as those for
stable-preferred load splitting.
Multicast Group-based Load Splitting

Figure 9-59 shows the networking diagram of multicast group-based load splitting.
Figure 9-59 Networking diagram of multicast group-based load splitting
[Figure: Source1, RouterA to RouterF, and ATN G; entries (S,G1), (S,G2), (S,G3), (S,G4), and
so on are distributed among the equal-cost paths.]
Based on a series of algorithms, a multicast ATN can select an appropriate route among
several equal-cost routes for each multicast group. This route is used for packet forwarding
for this group. Finally, multicast traffic for different groups can be split into different
forwarding paths.

Multicast Source-based Load Splitting

Figure 9-60 shows the networking diagram of multicast source-based load splitting.
Figure 9-60 Networking diagram of multicast source-based load splitting
[Figure: Source1 to Source10, RouterA to RouterF, and ATN G; entries (S1,G) to (S10,G) are
distributed among the equal-cost paths.]
Based on a series of algorithms, a multicast ATN can select an appropriate route among
several equal-cost routes for each multicast source. This route is used for packet forwarding
for this source. Finally, multicast traffic from different sources can be split into different
forwarding paths.
Multicast Source- and Multicast Group-based Load Splitting

Figure 9-61 shows the networking diagram of multicast source- and multicast group-based
load splitting.
Figure 9-61 Networking diagram of multicast source- and multicast group-based load
splitting
[Figure: Source1 to Source10, RouterA to RouterF, and ATN G; entries (S1,G1) to (S10,G10) are
distributed among the equal-cost paths.]

Based on a series of algorithms, a multicast ATN can select an appropriate route among
several equal-cost routes for each source-specific multicast group. This route is used for
packet forwarding for this source-specific multicast group. Finally, multicast traffic for
different source-specific groups can be split into different forwarding paths.
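The three policies above differ only in which part of an entry determines the selected
route. The text does not disclose the selection algorithm, so the sketch below substitutes a
CRC32 hash purely for illustration; the function name, the policy labels, and the route
names are hypothetical.

```python
import zlib


def pick_route(routes, source, group, policy):
    """Choose one equal-cost route for an entry according to the splitting policy."""
    if policy == "group":
        key = group                    # all sources of one group share a route
    elif policy == "source":
        key = source                   # all groups of one source share a route
    elif policy == "source-group":
        key = f"{source},{group}"      # each (S, G) pair is placed independently
    else:
        raise ValueError(f"unknown policy: {policy}")
    # CRC32 is only a stand-in for the device's actual selection algorithm.
    index = zlib.crc32(key.encode("ascii")) % len(routes)
    return routes[index]
```

Whatever function is used in practice, the property that matters is the one shown here: the
route depends only on the group, only on the source, or on both.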
Stable-preferred Load Splitting

l Applicable environment
In addition to the scenarios for the preceding multicast load splitting policies shown in
Figure 9-59, Figure 9-60, and Figure 9-61, stable-preferred load splitting can also be
applied to a shared network segment, as shown in Figure 9-62.
Figure 9-62 Networking diagram of stable-preferred load splitting
[Figure: Source, RouterA to RouterF, ATN G, and Receiver.]
l Implementation principle
The ATN configured with stable-preferred load splitting selects the most appropriate
route for a newly created entry, that is, the route assigned the fewest entries. When the
network topology and entries are stable, all entries with the sources on the same network
segment are distributed evenly among the equal-cost routes.
If an imbalance arises because an entry is deleted or the weight of a route changes, the
ATN configured with stable-preferred load splitting corrects it by selecting the
most appropriate routes for subsequent entries.
In stable-preferred load splitting mode, if the device finds that entries are not balanced
among paths, it waits for a certain period before balancing the entries to reduce the
impact of frequent entry changes on the system.
A load balancing timer can be set to change the waiting time before entries are
balanced.
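The rule of assigning a new entry to the route that carries the fewest entries can be
sketched as follows. The function name, the route names, and the tie-break by name are
assumptions made for this example; the waiting timer is not modeled.

```python
def assign_new_entry(entry_counts):
    """Assign a new entry to the route that currently carries the fewest entries.

    entry_counts maps a route name to its number of entries and is updated in place.
    Existing entries are never moved, which mirrors the stable-preferred behavior.
    """
    route = min(entry_counts, key=lambda r: (entry_counts[r], r))
    entry_counts[route] += 1
    return route


# RouterC lost an entry earlier, so it is the least loaded route.
counts = {"RouterA": 2, "RouterB": 2, "RouterC": 1, "RouterD": 2}
first = assign_new_entry(counts)    # fills the gap on RouterC
second = assign_new_entry(counts)   # all routes are equal again; name order breaks the tie
```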
Balance-preferred Load Splitting

The ATN configured with balance-preferred load splitting selects the most appropriate route
for a newly created entry, that is, the route assigned the fewest entries. Balance-preferred
load splitting always balances the entries with sources on the same network segment among the
equal-cost routes. This balancing occurs even if entries are deleted, the weights of outgoing
interfaces change, or the number of equal-cost routes changes. When the entries are
unbalanced, the ATN waits for a delay before rebalancing them, which prevents the routes of
the entries from changing frequently. Within the delay, the ATN can also balance the entries
by selecting the most appropriate routes for subsequent entries.
A load balancing timer can be set to change the waiting time before entries are balanced.
Unbalanced Load Splitting

l Applicable environment
Unbalanced load splitting complements stable-preferred load splitting and balance-
preferred load splitting, but does not change the basic behaviors of the two policies.
Instead, unbalanced load splitting distributes the entries among the equal-cost routes in a
certain proportion. Unbalanced load splitting applies to the scenarios of the preceding
two policies, except the scenario shown in Figure 9-62. Typical scenarios of
unbalanced load splitting are as follows:
– When the equal-cost routes differ in forwarding capability or in the severity of
traffic congestion, unbalanced load splitting can be configured to adjust the number
of entries distributed to each route. In this case, a route whose outgoing interface
has a higher weight is assigned more entries. Stable-preferred load splitting applies
the weights only to subsequent entries, whereas balance-preferred load splitting also
adjusts the existing entries based on the weights of the equal-cost routes.
– When a device on one of the equal-cost routes needs to be upgraded, the weight of
the outgoing interface of that route can be set to 0 on the ATN configured with
unbalanced load splitting so that the traffic of the entries on the route is switched
to the other equal-cost routes. This scenario applies when balance-preferred load
splitting is configured.
l Implementation principle
Unicast equal-cost routes differ in forwarding capability, network load, and link usage.
Therefore, balanced load splitting of multicast entries is difficult to implement in
some scenarios. To solve this problem, unbalanced load splitting is introduced.
Unbalanced load splitting allows users to set weights on the outgoing interfaces of the
equal-cost routes. A route whose outgoing interface has a higher weight is assigned
more entries.
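One simple way to give higher-weight routes more entries is to assign each new entry to the
route with the lowest ratio of entries to weight. The text does not specify the exact
proportion rule, so this ratio, like the names and weights below, is an assumption made for
illustration.

```python
def assign_weighted(entry_counts, weights):
    """Assign a new entry to the usable route with the lowest entries-to-weight ratio.

    Routes whose weight is 0 never receive entries.
    """
    usable = [r for r in entry_counts if weights.get(r, 0) > 0]
    if not usable:
        raise ValueError("no equal-cost route has a non-zero weight")
    route = min(usable, key=lambda r: (entry_counts[r] / weights[r], r))
    entry_counts[route] += 1
    return route


weighted_counts = {"RouterA": 0, "RouterB": 0, "RouterC": 0}
route_weights = {"RouterA": 2, "RouterB": 1, "RouterC": 0}   # RouterC is being upgraded
for _ in range(6):
    assign_weighted(weighted_counts, route_weights)
```

After six entries, RouterA holds four entries, RouterB holds two, and RouterC holds none,
which follows the 2:1:0 weights.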
9.7.2.3 Longest-Match Multicast Routing
During route selection, an optimal intra-domain unicast route, an optimal inter-domain unicast
route, and an optimal multicast static route are selected. One of them is finally selected as
the forwarding path for the multicast data.
The longest match principle works as follows:
1. If the longest match principle is configured for route selection, the multicast router
chooses the route with the longest matching mask.
For example, there is a multicast source with the IP address of 10.1.1.1, and multicast
data needs to be sent to a host with the IP address of 192.168.1.1. There are two
reachable routes to the source in the static routing table and the intra-domain unicast
routing table, with the destination network segments 10.1.0.0/16 and 10.1.1.0/24. Based on
the longest match principle for route selection, the route to the network segment of
10.1.1.0/24 is chosen as the forwarding path for the multicast data.
2. If the mask lengths of the routes are the same, the route with a higher priority is chosen
as the forwarding path for the multicast data.
3. If the mask lengths and priorities of the routes are the same, a route is selected in the
order of a static route, an inter-domain unicast route, and an intra-domain unicast route
as the forwarding path for multicast data.
4. If none of the preceding conditions determines a forwarding path for multicast data, the
route with the highest next-hop address is chosen.
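The four selection rules above can be expressed as a single sort key. The following is a minimal sketch, not the device's implementation: the Route fields are hypothetical, and the assumption that a smaller preference value means a higher priority is made only for illustration.

```python
import ipaddress
from dataclasses import dataclass

# Tie-break order from rule 3: static, then inter-domain, then intra-domain.
SOURCE_ORDER = {"static": 0, "inter-domain": 1, "intra-domain": 2}


@dataclass(frozen=True)
class Route:
    prefix: str      # destination network segment, for example "10.1.1.0/24"
    source: str      # "static", "inter-domain", or "intra-domain"
    preference: int  # assumption: a smaller value means a higher priority
    next_hop: str


def select_route(routes, multicast_source):
    """Return the route to the multicast source chosen by rules 1 to 4, or None."""
    src = ipaddress.ip_address(multicast_source)
    candidates = [
        r for r in routes
        if src in ipaddress.ip_network(r.prefix, strict=False)
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r.prefix, strict=False).prefixlen,  # rule 1
            r.preference,                                             # rule 2
            SOURCE_ORDER[r.source],                                   # rule 3
            -int(ipaddress.ip_address(r.next_hop)),                   # rule 4
        ),
    )


routes = [
    Route("10.1.0.0/16", "static", 1, "10.2.1.1"),
    Route("10.1.1.0/24", "intra-domain", 10, "10.3.1.1"),
]
print(select_route(routes, "10.1.1.1").prefix)
```

In this example the /24 route is selected even though the static /16 route has a better preference value, because the mask length is compared first.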
9.7.2.4 Multicast Boundary Designation
A multicast boundary is used to control the transmission of multicast information. After this
function is enabled, the multicast information each multicast group corresponds to can be
transmitted only within a certain range. You can configure a multicast boundary on an
interface to form a closed multicast forwarding area. When an interface of the multicast
device is configured with the multicast boundary for a group, the interface does not forward
or receive any packet for this group.
Principles
Figure 9-63 Networking diagram of the multicast boundary
[Figure omitted. Labels: Source1, Source2, ATN B (GE0/2/0), ATN D (GE0/2/0), RouterA, RouterC,
RouterE, RouterF, multicast packet, and two receivers.]
As shown in Figure 9-63, RouterA, ATN B, and RouterC form a multicast domain 1; ATN D,
RouterE, and RouterF form a multicast domain 2. The two multicast domains communicate
through ATN B and ATN D.
If the data for a multicast group (G) in one multicast domain needs to be isolated from the
other multicast domain, you only need to configure GE 0/2/0 of ATN B or GE 0/2/0 of ATN
D as a multicast boundary for G so that the interface no longer forwards or receives
data for G.
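The boundary check can be pictured as a per-interface filter on the group address. The sketch below is illustrative only: the Interface class, its method names, and the group range 239.1.1.0/24 are assumptions, not the device's data model.

```python
import ipaddress


class Interface:
    """Hypothetical model of an interface that can act as a multicast boundary."""

    def __init__(self, name):
        self.name = name
        self.boundaries = []  # group ranges for which this interface is a boundary

    def add_boundary(self, group_range):
        self.boundaries.append(ipaddress.ip_network(group_range))

    def permits(self, group):
        """Return False if packets for this group must not cross the interface."""
        address = ipaddress.ip_address(group)
        return not any(address in boundary for boundary in self.boundaries)


ge = Interface("GE0/2/0")
ge.add_boundary("239.1.1.0/24")
print(ge.permits("239.1.1.5"), ge.permits("225.1.1.1"))
```

The same check applies in both directions, because a boundary interface neither forwards nor receives packets for the group.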

9.7.2.5 Multicast NSR

NOTE
Only ATN 950B supports this function.
Non-Stop Routing (NSR) is a feature used to implement master/slave switchover on the
control plane. Through multicast NSR, the protocol control plane can back up protocol
control information, including neighbor information, MDT information, and RP set information.
Multicast NSR also synchronizes information between the protocol control plane
and the forwarding control planes, and between the forwarding control plane of the MPU and
the forwarding control plane of the LPU.
Through multicast NSR, the adjacent devices do not sense the master/slave switchover of the
current device. Therefore, multicast routing is not interrupted, the MDT does not change,
and no processing is triggered on the adjacent devices.
Currently, multicast NSR is applicable to Protocol Independent Multicast-Sparse Mode (PIM-
SM), Protocol Independent Multicast-Source-Specific Multicast (PIM-SSM), Multicast
Virtual Private Network (MVPN), Multicast Source Discovery Protocol (MSDP), and Internet
Group Management Protocol (IGMP). Multicast NSR is not applicable to Protocol
Independent Multicast-Dense Mode (PIM-DM).
Terms
Term                      Description
Multicast load splitting  Multicast load splitting is different from load balancing. Multicast load
                          splitting indicates that multicast entries can be distributed to multiple
                          equal-cost routes and the number of multicast entries transmitted on
                          each equal-cost route can be different.
MBGP routing table        Multicast BGP routing table.
                          MBGP is an application of MP-BGP in multicast. The BGP protocol
                          defines an address family to advertise unicast routes for multicast, thus
                          forming a multicast BGP routing table.
MIGP routing table        Multicast IGP routing table.
                          For the route in the unicast routing table with its outbound interface
                          being a shortcut tunnel interface, route recalculation is performed and
                          the routes detouring around the shortcut tunnel interfaces are generated,
                          thus forming a multicast IGP routing table.
                          An optimal route is then selected from the multicast IGP routing table
                          for multicast data forwarding.

Acronym & Abbreviation    Full Name
RPF                       Reverse Path Forwarding
9.8 Multicast VPN in Rosen Mode
9.8.1 Introduction to Multicast VPN
Definition
Multicast VPN (MVPN) in Rosen Mode implements multicast service transmission over
MPLS/BGP VPNs. It is a solution based on the multicast domain (MD) scheme defined in
RFC 6037.
Purpose
MVPN in Rosen Mode implements multicast transmission on MPLS/BGP VPNs. It transmits
the multicast data and control messages of the PIM instances in private networks (PIM
C-instances) over the public network to the remote sites of the VPN.
The PIM instances in the public network (PIM P-instances) do not need to know the multicast
data transmitted between the private networks, and the PIM C-instances do not need to know
the multicast routing information of the PIM P-instances. In this manner, the PIM instances of
the public network are isolated from those of the private networks.
9.8.2 Principles
9.8.2.1 Concepts in MVPN
l MD
MD is short for multicast domain. An MD is the set of all the VPN instances on the
Provider Edges (PEs) that can transmit multicast packets to each other. Different VPN
instances belong to different MDs. An MD serves a specific VPN. All private multicast data
transmitted in the VPN is transmitted in the MD.
l Share-Group
Based on the MD principle, all the VPN instances on the PEs in the same MD must join
a common group, called a Share-Group.
Currently, one VPN instance can be configured with only one Share-Group, that is, one
VPN instance can join only one MD.
l Share-Multicast Distribution Tree
Share-MDT is short for Share-Multicast Distribution Tree. It is set up when the
PIM C-instances on the PEs join Share-Groups. A Share-MDT transmits PIM
protocol packets and data packets to other PEs within the same VPN. The Share-MDT is
regarded as a multicast tunnel (MT) within an MD.
l MTI
MTI is short for Multicast Tunnel Interface. It is the outgoing interface or incoming
interface of an MT, and is equivalent to the outgoing interface or incoming interface of
an MD. The local PE and remote PE send and receive VPN data through MTIs.
The MTI is the channel through which the public network instance and VPN instances
on PEs communicate. PEs are connected to an MT through MTIs, which is equivalent to
the PEs being connected to a shared network segment. On each PE, VPN
instances that belong to the MD set up PIM neighbor relationships on MTIs.
l Switch-Group
A Switch-Group is the group that all the PEs with VPN receivers join to establish a
Switch-MDT after a Share-MDT is established.
l Switch-MDT
Switch-MDT is short for Switch-Multicast Distribution Tree. It prevents multicast data
packets from being transmitted to unnecessary PEs. After a Share-MDT is set up, all the
PEs to which the receivers in the VPN are attached join an MDT set up based on Switch-
Groups. A Switch-MDT can transmit high-rate data packets to other PEs in the same
VPN.
9.8.2.2 Basic Implementation Principles
The MVPN scheme is applicable to a multicast-supported backbone network (core network or
public network) of the service provider (SP).
l The PIM instance running in the VPN instance bound to the PE is referred to as a VPN-
specific PIM instance or a PIM C-instance.
l The PIM instance running at the public network side of the PE is referred to as a PIM P-
instance.
Figure 9-64 Networking diagram of MVPN
[Figure omitted. Labels: Source1, Source2, CE1B, CE2B, CE1R, CE2R, PE1, PE2, P, Public, VPN BLUE,
VPN RED, PC1, and PC2.]
The process of implementing the communication between PIM C-instances on the PEs
through MVPN is as follows:
1. Establish a virtual MT between PIM C-instances.
2. Each PIM C-instance creates a Multicast Tunnel Interface (MTI) to connect to the MT.
3. Each VPN instance joins the corresponding MT based on the configured Share-Groups.
In this manner, the VPN instances with the same Share-Group address form a multicast
domain (MD).
As shown in Figure 9-64, VPN BLUE instances bound to PE1 and PE2 communicate through
the MD BLUE and similarly, VPN RED instances bound to PE1 and PE2 communicate
through the MD RED, as shown in Figure 9-65 and Figure 9-66.
Figure 9-65 Networking diagram of MD-based VPN BLUE interworking
[Figure omitted. Labels: Source1, CE1B, CE2B, PE1, PE2, MD BLUE, VPN BLUE, and PC2.]
Figure 9-66 Networking diagram of MD-based VPN RED interworking
[Figure omitted. Labels: Source2, CE1R, CE2R, PE1, PE2, MD RED, VPN RED, and PC1.]
The PIM C-instance on the PE considers the MTI as a LAN interface and sets up the PIM
neighbor relationship with the remote PIM C-instance through MTIs. The PIM C-instances
then use MTIs to perform DR election, send Join/Prune messages, and forward and receive
multicast data.
The PIM C-instance sends PIM protocol packets or multicast data packets to the MTI and the
MTI encapsulates the received packets. The packets after encapsulation are public network
multicast data packets and therefore are forwarded by the PIM P-instances on the network. In
conclusion, an MT is actually a multicast distribution tree on the public network.
l Different VPNs use different topologies and each topology uses a unique packet
encapsulation mode. In this manner, multicast data in different VPNs is isolated from
each other.
l The PIM C-instances on the PEs in the same VPN use the same MT and communicate
through this MT.
NOTE
A VPN uniquely defines an MD, and an MD serves only one VPN. This is a one-to-one
relationship. The VPN, MD, MTI, Share-Group, and Switch-group-pool are all in one-to-one
relationships.
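The way VPN instances with the same Share-Group address form an MD, together with the one-to-one relationship between a VPN and its Share-Group, can be sketched as follows. The function name, the record layout, and the Share-Group addresses 239.1.1.1 and 239.2.2.2 are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_mds(vpn_instances):
    """Group (pe, vpn, share_group) records into MDs keyed by Share-Group address."""
    group_of_vpn = {}
    vpn_of_group = {}
    mds = defaultdict(set)
    for pe, vpn, share_group in vpn_instances:
        # One VPN joins only one MD, and one MD serves only one VPN.
        if group_of_vpn.setdefault(vpn, share_group) != share_group:
            raise ValueError(f"VPN {vpn} is configured with two Share-Groups")
        if vpn_of_group.setdefault(share_group, vpn) != vpn:
            raise ValueError(f"Share-Group {share_group} is used by two VPNs")
        mds[share_group].add((pe, vpn))
    return dict(mds)


mds = build_mds([
    ("PE1", "BLUE", "239.1.1.1"),
    ("PE2", "BLUE", "239.1.1.1"),
    ("PE1", "RED", "239.2.2.2"),
    ("PE2", "RED", "239.2.2.2"),
])
print(sorted(mds["239.1.1.1"]))
```

A configuration that reuses one Share-Group for two VPNs is rejected in this sketch, which mirrors the one-to-one rule stated in the note.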
9.8.2.3 PIM Neighbor Relationship Between CE, PE, and P
The PIM neighbor relationship is set up between two or more directly connected multicast
devices that reside in the same network segment. There are three types of PIM neighbor
relationships in a multicast domain (MD) VPN: PE-Customer Edge (CE) neighbor relationship,
PE-Provider (P) neighbor relationship, and PE-PE neighbor relationship.
As shown in Figure 9-67, VPN A instances on each PE and the sites that belong to the VPN
A implement VPN A multicast. Figure 9-68 shows the neighbor relationship between CE,
PE, and P.

Figure 9-67 VPN A multicast
[Figure omitted. Labels: VPN A site1 (CE1), VPN A site2 (CE3), VPN A site3 (CE2), PE1_vpnA-instance,
PE2_vpnA-instance, PE3_vpnA-instance, and MD A.]
Figure 9-68 Neighbor relationship between CE, PE, and P in an MD
[Figure omitted. Labels: CE1, CE2, CE3, PE1, PE2, PE3, P, and MD. Legend: PE-PE neighbor,
PE-P neighbor, and PE-CE neighbor.]
l PE-CE neighbor relationship
It is set up between the interface on the PE bound to a VPN instance and the interface on
the CE at the remote end of the link.
l PE-P neighbor relationship
It is set up between the interface on the public network side of the PE and the interface
on the P at the remote end of the link.
l PE-PE neighbor relationship
It is set up after the VPN instance on the local PE receives Hello packets from the VPN
instance on the remote PE through a Multicast Tunnel Interface (MTI).
9.8.2.4 Process of Establishing a Share-MDT
The multicast distribution tree (MDT) that takes the Share-Group address as the group address
is called a Share-MDT. The VPN uniquely identifies a Share-MDT by using a Share-Group.
The public network can run PIM-SM or PIM-DM. The process of establishing a Share-MDT
is different in the two cases.
Establishing a Share-MDT in a PIM-SM Network
Figure 9-69 Establishing a Share-MDT in a PIM-SM network
[Figure omitted. Labels: PE1 (IBGP 11.1.1.1/24), PE2 (IBGP 11.1.2.1/24), PE3 (IBGP 11.1.3.1/24),
P, RP, and MD. Share-Group: 239.1.1.1. Legend: public instance IBGP peer, RPT (*, 239.1.1.1),
SPT (11.1.1.1, 239.1.1.1), SPT (11.1.2.1, 239.1.1.1), and SPT (11.1.3.1, 239.1.1.1).]
As shown in Figure 9-69, the public network runs PIM-SM. The process of establishing a
Share-MDT is as follows:
1. The PIM P-instance on PE1 sends a Join message, with the Share-Group address as
the multicast group address, to the Rendezvous Point (RP) in the public network. PEs that
receive the Join message then create the (*, 239.1.1.1) entry. PE2 and PE3
also send Join messages to the RP in the public network. A Rendezvous Point Tree
(RPT) is thus formed in the MD, with the RP being the root and PE1, PE2, and PE3
being leaves.
2. The PIM P-instance on PE1 sends a Register message, with the Multicast Tunnel
Interface (MTI) address as the source address and the Share-Group address as the
group address, to the RP in the public network. The RP then creates the (11.1.1.1,
239.1.1.1) entry. PE2 and PE3 also send Register messages to the RP in the
public network. Thus, three independent RP-source trees that connect PEs to the RP are
formed in the multicast domain (MD).
In the PIM-SM network, an RPT (*, 239.1.1.1) and three independent RP-source trees form a
Share-MDT.
Establishing a Share-MDT in a PIM-DM Network
Figure 9-70 Establishing a Share-MDT in a PIM-DM network
[Figure omitted. Labels: PE1 (IBGP 11.1.1.1/24), PE2 (IBGP 11.1.2.1/24), PE3 (IBGP 11.1.3.1/24),
and MD. Share-Group: 239.1.1.1. Legend: public instance IBGP peer, SPT (11.1.1.1, 239.1.1.1),
SPT (11.1.2.1, 239.1.1.1), and SPT (11.1.3.1, 239.1.1.1).]
As shown in Figure 9-70, the public network runs PIM-DM. The process of establishing a
Share-MDT is as follows.
A flooding-pruning process is initiated on the entire public network with the PIM P-instance
on PE1 being a multicast source, the Share-Group address being the multicast group address,
and other PEs that support VPN A being group members. During this process, the (11.1.1.1,
239.1.1.1) entry is created on the PEs along the path in the public network. A Shortest Path
Tree (SPT) with PE1 being the root and PE2 and PE3 being leaves is thus set up. PE2 and
PE3 also start the similar flooding-pruning process in the public network to form two SPTs.
As a result, in the PIM-DM network, three independent SPTs are created and form a Share-
MDT.
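The set of trees that make up a Share-MDT in each mode can be enumerated with a short sketch. The function name and the tuple layout are hypothetical; the addresses are those used in the figures above.

```python
def share_mdt_trees(pe_addresses, share_group, public_pim_mode):
    """List the trees that together form the Share-MDT for one MD.

    public_pim_mode is "sm" or "dm" (hypothetical labels for PIM-SM and PIM-DM).
    """
    if public_pim_mode == "sm":
        # One RPT rooted at the RP plus one RP-source tree per PE.
        trees = [("RPT", "*", share_group)]
        trees += [("RP-source tree", addr, share_group) for addr in pe_addresses]
    elif public_pim_mode == "dm":
        # One SPT per PE, each built by its own flooding-pruning process.
        trees = [("SPT", addr, share_group) for addr in pe_addresses]
    else:
        raise ValueError("public_pim_mode must be 'sm' or 'dm'")
    return trees


pes = ["11.1.1.1", "11.1.2.1", "11.1.3.1"]
for tree in share_mdt_trees(pes, "239.1.1.1", "sm"):
    print(tree)
```

For the three PEs in the example, the sketch yields four trees in a PIM-SM network and three trees in a PIM-DM network.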
9.8.2.5 MT Transmission Process Based on the Share-MDT
After the Share-MDT is established, MT transmission can be performed.

MT Transmission Process Based on the Share-MDT

1. The VPN instance on a PE sends a VPN multicast packet to a Multicast Tunnel
Interface (MTI).
2. Regardless of whether the packet is a protocol packet or a data packet, the PE
encapsulates the packet with the MTI address as the source address and the share-
group address as the group address, converting the packet into a public network
multicast data packet. Figure 9-71 shows the encapsulation format of a public network
multicast data packet.
3. The PE then forwards the multicast data packet of the public network to the public
network instance. Then the public network instance sends out the packet.
4. The packet is forwarded to the public network instance on the remote PE along the
Share-multicast distribution tree (Share-MDT).
5. The remote PE decapsulates the packet, reverts it to a VPN multicast packet, and
forwards it to the VPN instance.
Figure 9-71 shows the process of converting a VPN multicast packet into a public network
multicast data packet and then into a VPN multicast packet. Table 9-21 describes the meaning
of each field in a VPN or public network multicast packet.
Figure 9-71 Process of converting a VPN multicast packet
[Figure omitted. VPN multicast packet (C-IP Header, C-Payload) -> public network multicast data
packet (P-IP Header, GRE, C-IP Header, C-Payload) -> VPN multicast packet (C-IP Header, C-Payload).]
Table 9-21 Fields in a VPN and public network multicast packet
Field        Description
C-IP Header  IP header of a VPN multicast packet.
C-Payload    Type of a VPN multicast packet, which can be a protocol or data packet.
GRE          Generic Routing Encapsulation (GRE) encapsulation.
P-IP Header  IP header of a public network multicast data packet. In this header, the source
             address is the MTI interface's address, and the destination address is the
             share-group's address.
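The encapsulation and decapsulation steps can be illustrated at the byte level. The sketch below assumes a basic 4-byte GRE header without optional fields and a 20-byte outer IPv4 header without options; the source does not state which GRE options the device uses, and all function names are hypothetical.

```python
import socket
import struct

IP_PROTO_GRE = 47        # IPv4 protocol number for GRE
GRE_PROTO_IPV4 = 0x0800  # GRE protocol type for an IPv4 payload


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement checksum over a 20-byte IPv4 header."""
    total = sum((header[i] << 8) | header[i + 1] for i in range(0, len(header), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int, ttl: int = 64) -> bytes:
    """Build a minimal IPv4 header (no options) with a valid checksum."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5, 0, 20 + payload_len, 0, 0, ttl, proto, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return header[:10] + struct.pack("!H", ipv4_checksum(header)) + header[12:]


def encapsulate(c_packet: bytes, mti_address: str, share_group: str) -> bytes:
    """Wrap a VPN multicast packet as P-IP Header + GRE + C-IP Header + C-Payload."""
    gre = struct.pack("!HH", 0, GRE_PROTO_IPV4)  # no flags, version 0
    p_ip = ipv4_header(mti_address, share_group, IP_PROTO_GRE, len(gre) + len(c_packet))
    return p_ip + gre + c_packet


def decapsulate(p_packet: bytes) -> bytes:
    """Strip the P-IP Header and GRE header and return the VPN multicast packet."""
    header_len = (p_packet[0] & 0x0F) * 4
    if p_packet[9] != IP_PROTO_GRE:
        raise ValueError("not a GRE packet")
    flags, proto = struct.unpack("!HH", p_packet[header_len:header_len + 4])
    if flags != 0 or proto != GRE_PROTO_IPV4:
        raise ValueError("unsupported GRE header")
    return p_packet[header_len + 4:]


# A VPN packet (192.1.1.1, 225.1.1.1) carrying four placeholder payload bytes.
c_packet = ipv4_header("192.1.1.1", "225.1.1.1", 17, 4) + b"data"
p_packet = encapsulate(c_packet, "11.1.1.1", "239.1.1.1")
print(len(c_packet), len(p_packet))
```

The outer header carries the MTI address and the share-group address, so devices in the public network forward the packet without inspecting the C-IP Header.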

Major Tasks in the MT Transmission Process

l MTIs exchange Hello packets to set up PIM neighbor relationship between VPN
instances on each PE.
l MTIs exchange other protocol packets to set up a VPN MDT.
l The MT transmits VPN multicast data.
NOTE
l All interfaces that belong to the same VPN, including the PE interfaces bound to the VPN instance
and MTI, must be in the same PIM mode.
l The VPN instance and the public network instance are independent of each other. They can be in
different PIM modes.
Process of Transmitting Multicast Protocol Packets

When a VPN runs PIM-DM,
l MTIs exchange Hello packets to set up PIM neighbors.
l Flooding-pruning is initiated across the public network to create a shortest path tree
(SPT).
When a VPN runs PIM-SM,
l MTIs exchange Hello packets to set up PIM neighbors between VPN instances.
l If receivers and the VPN Rendezvous Point (RP) belong to different sites, receivers need
to send Join messages across the public network to set up a shared tree.
l If the multicast source and the VPN RP belong to different sites, registration must be
initiated across the public network to set up a source tree.
The following example, in which the public network and VPNs run PIM-SM and VPN receivers
send Join messages across the public network, shows the process of transmitting multicast
protocol packets.
As shown in Figure 9-72, the receiver in VPN A belongs to Site2 and is connected to CE2.
CE1 is the RP of the VPN group G (225.1.1.1) and belongs to Site1.

Figure 9-72 Process of transmitting multicast protocol packets
[Figure omitted. Labels: Source, CE1, PE1 (IBGP 11.1.1.1/24), PE2 (IBGP 11.1.2.1/24), PE3 (IBGP
11.1.3.1/24), CE2, Receiver, and RP. S: 192.1.1.1/24; G: 225.1.1.1; Share-Group: 239.1.1.1.
Legend: VPN instance Join (*, 225.1.1.1), public instance Join (11.1.2.1, 239.1.1.1), and
public instance IBGP peer.]
The process of exchanging multicast protocol packets is as follows:

1. Through IGMP, the receiver requests CE2 to receive and forward data of the multicast
group G. CE2 creates the (*, 225.1.1.1) entry locally, and then sends a Join message to
the VPN RP (CE1).
2. The VPN instance on PE2 receives the Join message sent by CE2, creates the (*,
225.1.1.1) entry, and specifies an MTI as the upstream interface. The instance then
forwards the Join message to the P for further processing. The VPN instance on PE2 then
considers that the Join message has been sent out from the MTI.
3. PE2 encapsulates the Join message with GRE and converts it into a common multicast data
packet (11.1.2.1, 239.1.1.1) on the public network, with the address of the IBGP
interface on PE2 as the source address and the share-group address as the group
address. PE2 forwards the multicast data packet to the public network instance on PE2
for forwarding.
4. The multicast data packet (11.1.2.1, 239.1.1.1) is forwarded to the public network
instance on each PE along the Share-MDT. PEs decapsulate the packet and revert it to
the Join message sent to the VPN RP. The PEs then check the Join message. If the VPN
RP (CE1) is in a site directly connected to a PE, that PE sends the message to its VPN
instance for further processing. Otherwise, the PE discards the Join message.
5. After receiving the Join message, the VPN instance on PE1 considers that the message is
received from an MTI. The instance creates the (*, 225.1.1.1) entry, and specifies an
MTI as the downstream interface and the interface towards CE1 as the upstream
interface. Then, the instance sends the Join message to the VPN RP.
6. After receiving the Join message from the instance on PE1, CE1 updates or creates the
(*, 225.1.1.1) entry. The multicast shared tree across VPNs is thus set up.
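Steps 4 and 5 above, in which every PE on the Share-MDT receives the decapsulated Join message but only the PE attached to the VPN RP processes it, can be sketched as follows. The PE class, its fields, and the interface naming are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    """Hypothetical model of the VPN instance on one PE."""
    name: str
    attached_ces: set                            # CEs in directly connected sites
    entries: dict = field(default_factory=dict)  # (source, group) -> interfaces

    def receive_join_from_mti(self, group, vpn_rp):
        """Handle a decapsulated (*, G) Join; return True if it is processed."""
        if vpn_rp not in self.attached_ces:
            return False  # the VPN RP is not in a directly connected site: discard
        self.entries[("*", group)] = {
            "upstream": f"interface-to-{vpn_rp}",
            "downstream": {"MTI"},
        }
        return True


pe1 = PE("PE1", {"CE1"})
pe3 = PE("PE3", {"CE3"})
for pe in (pe1, pe3):
    print(pe.name, pe.receive_join_from_mti("225.1.1.1", vpn_rp="CE1"))
```

Only PE1 creates the (*, 225.1.1.1) entry, with the MTI as the downstream interface.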

Process of Transmitting Multicast Data Packets

When a VPN runs PIM-DM, VPN multicast data is transmitted across the public network
along the VPN SPT.
When a VPN runs PIM-SM,
l If receivers and the VPN RP belong to different sites, the VPN multicast data is
transmitted across the public network along the VPN RPT.
l If the multicast source and the receiver belong to different sites, the VPN multicast data
is transmitted across the public network along the source tree.
The following example, in which the public network and VPNs run PIM-DM and VPN multicast
data is transmitted across the public network along the SPT, shows the process of
transmitting multicast data packets along the Share-MDT.
As shown in Figure 9-73, the multicast source in VPN A sends multicast data to the group G
(225.1.1.1). The receiver belongs to Site2 and is connected to CE2.
Figure 9-73 Process of transmitting multicast data packets
[Figure omitted. Labels: PE1 (IBGP 11.1.1.1/24), PE2 (IBGP 11.1.2.1/24), PE3 (IBGP 11.1.3.1/24),
P, and RP. S: 192.1.1.1/24; G: 225.1.1.1. Legend: VPN instance packets (192.1.1.1, 225.1.1.1)
and public instance packets (11.1.1.1, 239.1.1.1).]
The process of transmitting VPN multicast data across the public network is as follows:
1. The source sends VPN multicast data (192.1.1.1, 225.1.1.1) to CE1.
2. CE1 forwards the VPN multicast data to PE1 along the SPT. The VPN instance on PE1
searches for the forwarding entry. If the outgoing interface of the forwarding entry
contains an MTI, the instance forwards the VPN multicast data to the related P for
further processing. The VPN instance on PE1 then considers that the VPN multicast data
has been sent out from the MTI.
3. PE1 encapsulates the VPN multicast data with GRE and converts it into a public network
multicast data packet (11.1.1.1, 239.1.1.1), with the address of the IBGP interface on
PE1 as the source address and the share-group address as the group address. PE1
then forwards the multicast data packet to the public network instance for forwarding.
4. The multicast data packet (11.1.1.1, 239.1.1.1) is sent to the public network instance on
each PE along the Share-MDT. Each PE decapsulates it, reverts it to VPN multicast data,
and forwards it to the related VPN instance for further processing. If there is an SPT
downstream interface on the PE, the data is forwarded along the SPT. Otherwise, the data
is discarded.
5. The VPN instance on PE2 searches for the forwarding entry and then sends the VPN
multicast data to the receiver. The process of transmitting VPN multicast data
across the public network is then complete.
9.8.2.6 Switch-MDT Switchover
Background
According to the process of establishing a Share-multicast distribution tree (Share-MDT)
described in the previous section, the VPN instance bound to PE3 has no
receivers, but PE3 still receives the VPN multicast data packets of (192.1.1.1,
225.1.1.1). This is a defect of the multicast domain (MD) scheme: all the PEs belonging to
the same MD receive multicast data packets regardless of whether they have receivers.
This wastes bandwidth and imposes an extra burden on PEs.
MVPN provides an optimized solution, Switch-MDT, so that multicast data can be
transmitted on demand. Traffic is switched from the Share-MDT to the Switch-MDT when the
forwarding rate of multicast traffic on a PE exceeds the configured threshold.
Only the PEs that have receivers connected to them receive multicast data from the
Switch-MDT. This reduces the load on PEs and bandwidth consumption.
Implementation
Figure 9-74 shows the switch-MDT implementation process based on the assumption that a
share-MDT has been successfully established.
Figure 9-74 Switch-MDT implementation
[Figure omitted. Labels: PE1 (IBGP 11.1.1.1/24), PE2 (IBGP 11.1.2.1/24), PE3 (IBGP 11.1.3.1/24),
and RP. S: 192.1.1.1/24; G: 225.1.1.1. Legend: VPN instance Join (*, 225.1.1.1) and public
instance Join (11.1.2.1, 239.1.1.1).]
1. On PE1, the range of the switch-group-pool for Switch-MDT is set to
238.1.1.0-238.1.1.255 and the data forwarding rate threshold that triggers Switch-MDT
switchover is set.
2. When the rate of the data forwarded by the source connected with CE1 exceeds the
configured threshold, PE1 selects a group address 238.1.1.0 from the switch-group-pool
and sends signaling packets to other PEs through the Share-MDT periodically to inform
them to switch to the Switch-MDT.
3. If PE2 has a receiver, after receiving the signaling packet, PE2 joins the group 238.1.1.0
and then a Switch-MDT is set up accordingly. The process of establishing a Switch-
MDT is similar to that of a Share-MDT. If PE3 has no receiver, after receiving the
signaling packet, it does not join the Switch-MDT. As a result, only PE2 can receive the
VPN multicast data packet of the group (192.1.1.1, 225.1.1.1). Note that the PIM control
packets are still transmitted over the Share-MDT.
The Switch-MDT switchover is triggered if the following conditions are met:
– The source and group addresses of VPN multicast data packets match the source
and group address ranges defined in the ACL filtering rules. Otherwise, the packets
are still forwarded along the Share-MDT.
– The forwarding rate of VPN multicast data packets exceeds the switchover
threshold and stays above it for a certain period.
4. In some cases, the forwarding rate of VPN multicast data packets fluctuates around the
switchover threshold. To prevent the multicast data packets from being frequently
switched between the Share-MDT and Switch-MDT, the system does not perform the
switchover immediately after the system finds that the forwarding rate is greater than the
switchover threshold. Instead, the system starts the switch-delay timer. During the setup
of the Switch-MDT, the Share-MDT is still used for multicast data packet forwarding.
That is, the switch-delay timer can ensure non-stop forwarding during switch from the
Share-MDT to the Switch-MDT. Before the switch-delay timer times out, the system
continues to detect the data forwarding rate. If the rate remains higher than the
switchover threshold, data packets are switched to the Switch-MDT. Otherwise, the
packets are still forwarded through the Share-MDT.
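Steps 1 to 3 above can be sketched as a pool from which the source PE takes a switch-group address, followed by a selective join on the receiving PEs. The class and function names are hypothetical, and allocating the lowest free address is an assumption that happens to match the 238.1.1.0 example; the source does not describe the allocation policy.

```python
import ipaddress


class SwitchGroupPool:
    """Hypothetical switch-group-pool that hands out one group address per flow."""

    def __init__(self, first, last):
        self._first = int(ipaddress.IPv4Address(first))
        self._last = int(ipaddress.IPv4Address(last))
        self._by_flow = {}  # (source, group) -> switch-group address

    def allocate(self, source, group):
        flow = (source, group)
        if flow in self._by_flow:
            return self._by_flow[flow]
        used = set(self._by_flow.values())
        for value in range(self._first, self._last + 1):
            address = str(ipaddress.IPv4Address(value))
            if address not in used:  # assumption: lowest free address first
                self._by_flow[flow] = address
                return address
        return None  # pool exhausted


def switch_mdt_joiners(has_receiver):
    """Return the PEs that join the Switch-MDT after receiving the signaling packet."""
    return sorted(pe for pe, wanted in has_receiver.items() if wanted)


pool = SwitchGroupPool("238.1.1.0", "238.1.1.255")
print(pool.allocate("192.1.1.1", "225.1.1.1"))
print(switch_mdt_joiners({"PE2": True, "PE3": False}))
```

PE3 receives the signaling packet over the Share-MDT but does not join the switch-group, so it no longer receives the VPN multicast data.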
Switchback from the Switch-MDT to Share-MDT

When the conditions change after the VPN multicast data is switched to the Switch-MDT, the
switchover conditions may not be met. In this case, PE1 switches the VPN multicast data
from the Switch-MDT to the Share-MDT. The reverse switchover conditions are as follows:
l The forwarding rate of VPN multicast data packets falls below the specified
threshold and stays below it throughout the switch-Holddown period.
l In some cases, the forwarding rate of VPN multicast data packets fluctuates around the
switchover threshold. To prevent the multicast data flow from being frequently switched
between the Switch-MDT and Share-MDT, the system does not perform the switchover
immediately when it finds that the forwarding rate is lower than the switchover threshold.
Instead, the system starts the Holddown timer. The timeout period of the timer is
configured through the related command. Before the Holddown timer expires, the system
continues to detect the data forwarding rate. If the rate is always lower than the
switchover threshold, the data packets are switched back to the Share-MDT. Otherwise,
the packets are still forwarded through the Switch-MDT.
l The switch-group-pool is changed, and the switch-group address used to encapsulate
the VPN multicast data is outside the new switch-group-pool.
l The advanced ACL rules used to control the switchover of VPN multicast data packets
to the Switch-MDT change, and the VPN multicast data packets no longer pass the
filtering of the new ACL rules.
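The rate-based switchover and switchback, including the switch-delay and Holddown timers, behave like a hysteresis state machine. The following is a minimal sketch under stated assumptions: the class name, the timer values, and the units are hypothetical, and the ACL and switch-group-pool conditions are left out.

```python
class MdtSwitcher:
    """Hypothetical rate-driven selector between the Share-MDT and Switch-MDT."""

    def __init__(self, threshold_kbps, switch_delay, holddown):
        self.threshold = threshold_kbps
        self.switch_delay = switch_delay  # seconds above threshold before switching
        self.holddown = holddown          # seconds below threshold before switchback
        self.tree = "share"
        self._since = None                # time at which the current timer started

    def observe(self, now, rate_kbps):
        """Feed one rate sample taken at time `now` and return the tree in use."""
        if self.tree == "share":
            crossing, wait = rate_kbps > self.threshold, self.switch_delay
        else:
            crossing, wait = rate_kbps < self.threshold, self.holddown
        if not crossing:
            self._since = None            # the rate fell back: cancel the timer
            return self.tree
        if self._since is None:
            self._since = now             # start the switch-delay or Holddown timer
        if now - self._since >= wait:
            self.tree = "switch" if self.tree == "share" else "share"
            self._since = None
        return self.tree


switcher = MdtSwitcher(threshold_kbps=100, switch_delay=5, holddown=60)
for now, rate in [(0, 150), (3, 90), (4, 150), (9, 150), (10, 50), (70, 50)]:
    print(now, rate, switcher.observe(now, rate))
```

A short dip at t=3 cancels the pending switchover, so the flow moves to the Switch-MDT only after the rate has stayed above the threshold for the whole switch-delay period.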
9.8.2.7 Multicast VPN Extranet
Multicast VPN extranet meets the following requirements:
l Distributes multicast services among different enterprise users.
l Enables service or contents providers to distribute multicast services to different
enterprise users. The multicast data of a VPN can be provided for users in other VPNs to
use.
Principles
Multicast VPN extranet is applicable to two scenarios: remote-cross scenario and local-cross
scenario. The basic principles of multicast VPN extranet applied in the two scenarios are
as follows:
l Remote-cross scenario
Figure 9-75 Networking diagram of the remote-cross scenario of multicast VPN extranet
[Figure omitted. Labels: VPN RED (Source), VPN BLUE (Source), VPN BLUE (Receivers), CE1, CE2,
CE3, PE1, PE2, PE3, and IP MPLS Core.]
As shown in Figure 9-75, VPN RED is configured on PE1; the address of the share-
group is the address of G1; the site where CE1 resides is connected to the multicast
source of VPN RED. VPN BLUE is configured on PE2; the address of the share-group is
the address of G2; the site where CE2 resides is connected to the multicast source of
VPN BLUE. Therefore, PE1 functions as the source PE of VPN RED, and PE2 functions
as the source PE of VPN BLUE. VPN BLUE is configured on PE3; the address of the
share-group is the address of G2; PE3 establishes an MDT with PE2 on the public
network. A user at the site where CE3 resides needs to receive multicast data from both
VPN BLUE and VPN RED. Therefore, PE3 functions as the receiver PE of VPN RED
and VPN BLUE.
In such a scenario, after configuring a VPN instance on the local PE, you need to
establish a multicast tunnel between the VPN instance of the local PE and that of the
remote PE. There are two configuration options available to provide multicast VPN
extranet services:
– Configure the source VPN on the PE where the receiver VPN resides. Based on the
multicast domain (MD) to which the VPN to be accessed belongs, configure source
VPN RED on PE3 and a multicast routing policy for the receiver VPN instance.
Then, hosts in the receiver VPN instance can send Join messages to the source VPN.
PE3 then encapsulates multicast Join messages with the share-group address of
VPN RED, and sends the multicast Join messages to PE1 over the public network.
Finally, the multicast Join messages reach the multicast source of VPN RED.
Similarly, the multicast source of VPN RED sends multicast traffic over the public
network to VPN RED at the PE3 side. The multicast traffic is then imported to VPN
BLUE, and finally reaches the user.
– Configure the receiver VPN on the PE where the source VPN resides. Based on the
MD to which the VPN to be accessed belongs, configure receiver VPN BLUE
instance on PE1. Then, the source VPN instance and the receiver VPN instance can
exchange unicast routes. Hosts in the receiver VPN instance send Join messages to
the source VPN instance.
PE3 then encapsulates multicast Join messages with the share-group address of
VPN BLUE, and then sends the multicast Join messages to VPN BLUE on PE1
over the public network. PE1 then imports the multicast Join messages from VPN
BLUE to VPN RED. Therefore, the multicast Join messages reach the multicast
source of VPN RED. Similarly, after multicast traffic sent by the multicast source of
VPN RED is imported by PE1 to receiver VPN BLUE, VPN BLUE encapsulates
the multicast traffic with its share-group address, and then sends the multicast
traffic to the local VPN instance of PE3. Finally, the multicast traffic is forwarded
to the user on the associated VPN.
l Local-cross scenario

Figure 9-76 Networking diagram of the local-cross scenario of multicast VPN extranet
[Figure omitted. Labels: VPN RED, VPN BLUE, Source, Receiver, CE1, CE2, CE3-1, CE3-2, PE1, PE2,
PE3, and IP MPLS Core.]
As shown in Figure 9-76, users at the site where CE3-2 resides need to receive multicast
data from both VPN BLUE and VPN RED. PE2 is the source PE of VPN BLUE. The
site where CE2 resides is connected to the multicast source of VPN BLUE. The
multicast source of VPN RED is connected to CE3-1. Both CE3-1 and CE3-2 are at the
PE3 side.
In the local-cross scenario, the receiver VPN and the source VPN are on the same PE,
and multicast traffic enters the PE through a VPN instance and leaves the PE through
another VPN instance. On PE3, the Import Route Target (IRT) of VPN BLUE needs to
be configured to be the same as the Export Route Target (ERT) of VPN RED so that
CE3-1 and CE3-2 can exchange VPN unicast routes. The process for a user to request
and receive multicast data from VPN RED is as follows:
a. A user at the site where CE3-2 resides requests multicast data from VPN RED. PE3
receives a PIM Join message from CE3-2, and then creates a multicast routing entry
of VPN BLUE. Through the RPF check, PE3 finds that the upstream interface of
the RPF route belongs to VPN RED. Then, PE3 sends a Join message to VPN RED.
b. PE3 creates a multicast routing entry (which has the receiver list including receiver
VPN BLUE) for VPN RED and then sends a PIM Join message to CE3-1.
c. The multicast data of VPN RED reaches PE3 through CE3-1. PE3 then imports the
multicast data to receiver VPN BLUE based on the multicast routing entries of VPN
RED.

d. After importing multicast data from VPN RED to VPN BLUE, PE3 sends the
multicast data to CE3-2 based on multicast routing entries of VPN BLUE. CE3-2
then sends the required multicast data of VPN RED to the user.
NOTE
A VPN extranet's multicast protocol and data packets are not encapsulated by GRE if the VPN extranet
connects to multicast sources on a public network.
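The local-cross procedure (steps a to d above) can be sketched as a small Python model. This is an illustrative sketch only, not device code: the names `VpnInstance`, `rpf_lookup`, `handle_join`, and `forward`, the interface labels, and the addresses 10.3.1.0/24 and 232.1.1.1 are all hypothetical. It assumes that VPN BLUE has already imported the unicast route to the source from VPN RED through matching route targets.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class VpnInstance:
    """Hypothetical model of one VPN instance on PE3."""
    name: str
    # Unicast routes: prefix -> (VPN that owns the route, upstream interface).
    routes: dict = field(default_factory=dict)
    # Multicast entries: (source, group) -> set of downstream targets.
    mroutes: dict = field(default_factory=dict)


def rpf_lookup(vpn, source):
    """Longest-prefix match of the source address in the VPN's unicast routes."""
    addr = ipaddress.ip_address(source)
    matches = [prefix for prefix in vpn.routes if addr in prefix]
    if not matches:
        return None
    return vpn.routes[max(matches, key=lambda prefix: prefix.prefixlen)]


def handle_join(instances, receiver_vpn, source, group, downstream_if):
    """Steps a and b: create entries in the receiver VPN and in the source VPN."""
    receiver = instances[receiver_vpn]
    receiver.mroutes.setdefault((source, group), set()).add(downstream_if)
    rpf = rpf_lookup(receiver, source)
    if rpf is None:
        return None
    owner_vpn, upstream_if = rpf
    if owner_vpn != receiver_vpn:
        # The RPF route belongs to another VPN: add the receiver VPN to the
        # receiver list of the source VPN's entry.
        owner = instances[owner_vpn]
        owner.mroutes.setdefault((source, group), set()).add("vpn:" + receiver_vpn)
    return owner_vpn, upstream_if


def forward(instances, source_vpn, source, group):
    """Steps c and d: replicate data from the source VPN into receiver VPNs."""
    out_interfaces = []
    for target in sorted(instances[source_vpn].mroutes.get((source, group), ())):
        if target.startswith("vpn:"):
            receiver = instances[target[len("vpn:"):]]
            out_interfaces.extend(sorted(receiver.mroutes.get((source, group), ())))
        else:
            out_interfaces.append(target)
    return out_interfaces


red, blue = VpnInstance("RED"), VpnInstance("BLUE")
prefix = ipaddress.ip_network("10.3.1.0/24")  # hypothetical subnet behind CE3-1
red.routes[prefix] = ("RED", "to-CE3-1")
blue.routes[prefix] = ("RED", "to-CE3-1")     # imported: IRT of BLUE matches ERT of RED
instances = {"RED": red, "BLUE": blue}

upstream = handle_join(instances, "BLUE", "10.3.1.10", "232.1.1.1", "to-CE3-2")
outgoing = forward(instances, "RED", "10.3.1.10", "232.1.1.1")
print(upstream, outgoing)
```

In this model the entry of VPN RED lists VPN BLUE as a receiver, so data arriving from CE3-1 is handed to the entry of VPN BLUE and leaves PE3 toward CE3-2 without crossing the public network.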
9.8.2.8 Multicast VPN in BGP A-D Mode
In a multicast domain (MD)-based MVPN, because the PEs that belong to the same VPN
know neither the BGP peers of each other nor multicast source information, the PEs cannot
send Join messages to the multicast source to establish a Protocol Independent Multicast-
Source-Specific Multicast (PIM-SSM) multicast distribution tree (MDT). Therefore, the
share-MDT of the public network cannot use a PIM-SSM tunnel.
MVPN in BGP auto-discovery (A-D) mode is introduced to address this problem. In MVPN
in BGP A-D mode, PEs exchange BGP Update packets carrying A-D routes (recording the
peers of each PE) to automatically discover the BGP peers of the PEs on a multicast VPN. In
this manner, multicast VPN services can be transmitted over a public-network tunnel based on
a PIM-SSM MDT.
Related Concepts
BGP A-D MVPN related concepts are as follows:
l Peer: BGP speakers that exchange messages with each other are called peers.
l A-D route: a route used to discover all peers in the same VPN. This type of route helps
implement tunnel setup and control message exchange between peers.
l BGP Update message: a message used to exchange routes between BGP peers.
Principles
Currently, two BGP A-D modes, namely, MDT-Subsequent Address Family Identifier (SAFI)
A-D and MCAST-VPN SAFI A-D, are supported:
l In MDT-SAFI A-D mode, BGP defines a new address family. After a VPN instance is
configured on a PE, the PE advertises the VPN configuration, including the RD and
share-group address, to all its BGP peers. After a remote PE receives an MDT-SAFI
message advertised by BGP, the remote PE compares the share-group address in the
message with its own share-group address. If the remote PE confirms that it is in the
same VPN as the sender of the MDT-SAFI message, the remote PE establishes the
PIM-SSM MDT on the public network to transmit multicast VPN services.
l The principles of MCAST-VPN SAFI A-D are similar to those of MDT-SAFI A-D. That
is, the multicast VPN configuration is transmitted through BGP Update packets. The
difference is that in MCAST-VPN SAFI A-D mode, a BGP Update packet carries more
multicast VPN attributes and information for establishing the public network tunnel.
Therefore, the MCAST-VPN SAFI A-D mode is applicable to the next-generation
multicast VPN.
Different PEs that belong to the same VPN can use the same BGP A-D mode, and different
VPNs configured on the same PE can use different BGP A-D modes to automatically discover
the BGP peers of PEs.
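The share-group comparison used in MDT-SAFI A-D mode can be illustrated with a short Python sketch. The class `MdtSafiAdvert`, the function `ssm_sources_for`, and all addresses and RD values are hypothetical; the sketch only shows the idea that a PE joins a PIM-SSM MDT rooted at each peer that advertises the same share-group address, and it does not reproduce the BGP message format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MdtSafiAdvert:
    """Hypothetical view of the fields a PE advertises in MDT-SAFI A-D mode."""
    pe_address: str
    rd: str
    share_group: str


def ssm_sources_for(local_share_group, adverts):
    """Return the (source, group) pairs for which the local PE joins a PIM-SSM MDT."""
    return [
        (advert.pe_address, advert.share_group)
        for advert in adverts
        if advert.share_group == local_share_group
    ]


received = [
    MdtSafiAdvert("1.1.1.1", "100:1", "232.0.0.1"),
    MdtSafiAdvert("2.2.2.2", "100:1", "232.0.0.1"),
    MdtSafiAdvert("4.4.4.4", "200:1", "232.0.0.2"),  # other share-group: ignored
]
joins = ssm_sources_for("232.0.0.1", received)
print(joins)
```

Because each advertisement carries the address of the sending PE, the receiving PE knows the source address of every (S, G) pair in advance, which is what a PIM-SSM MDT requires.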

l Scenario where different PEs in the same VPN use the same BGP A-D mode
Figure 9-77 Networking diagram of the scenario where different PEs in the same VPN use
the same BGP A-D mode
(Figure elements: PE1, PE2, and PE3 around the IP/MPLS core; CE1 (VPN RED, source);
CE2 and CE3 (VPN RED, receivers).)
As shown in Figure 9-77, PE1, PE2, and PE3 belong to VPN RED, and join the share-
group G1. The address of G1 is within the SSM group address range. BGP A-D in the
same mode is enabled on each PE. In addition, the BGP A-D function is enabled on VPN
RED. The site where CE1 resides is connected to the multicast source of VPN RED, and CE2 and
CE3 are connected to VPN users. Based on the BGP A-D mechanism, every PE on the
network obtains and records information about all its BGP peers on the same VPN, and
then directly establishes a PIM-SSM MDT on the public network for transmitting
multicast VPN services. In this manner, MVPN services can be transmitted over a public
network tunnel based on the PIM-SSM MDT.
The following takes PE3 as an example to describe service processing in MVPN in BGP
A-D mode:
a. After being configured with the BGP A-D function, PE1, PE2, and PE3 negotiate
session parameters, and confirm that both ends support the BGP A-D function.
Then, the PEs can establish BGP peer relationships. After receiving BGP Update
packets from PE1 and PE2, PE3 obtains and records the BGP peer addresses of PE1
and PE2. The BGP Update packets carry information about the sending PEs, such as
the PE address and supported tunnel type.
b. VPN RED is configured on PE3. PE3 joins the share-group G1. PE3 creates a PIM-
SSM entry with G1 being the group address and the address of PE1 being the
source address and another PIM-SSM entry with G1 being the group address and
the address of PE2 being the source address. PE3 then directly sends PIM Join
messages to PE1 and PE2 to establish two PIM-SSM MDTs to PE1 and PE2
respectively.
c. CE3 sends a Join message to PE3. After receiving the Join message, PE3
encapsulates the Join message with the PIM-SSM share-group address, and then
sends it to PE1 over the public network tunnel. PE1 then decapsulates the received
Join message, and then sends it to the multicast source.
d. After the multicast data sent by the multicast source reaches PE1, PE1 encapsulates
the multicast data with the share-group address, and then forwards it to PE3 over
the public network tunnel. PE3 then forwards the multicast data to CE3, and CE3
sends the multicast data to the user.
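Steps c and d above rely on wrapping VPN packets in public network packets addressed to the share-group. The following Python sketch models only that addressing relationship. The `Packet` class, the helper names, and all addresses are hypothetical, and the actual encapsulation header format is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: object


def encapsulate(vpn_packet, local_pe, share_group):
    """Wrap a VPN packet: outer source is the local PE, outer group is the share-group."""
    return Packet(src=local_pe, dst=share_group, payload=vpn_packet)


def decapsulate(public_packet, joined_mdts):
    """Return the inner VPN packet if this PE joined the (S, G) MDT, else None."""
    if (public_packet.src, public_packet.dst) not in joined_mdts:
        return None
    return public_packet.payload


PE1, PE2, G1 = "1.1.1.1", "2.2.2.2", "232.0.0.1"   # hypothetical addresses
pe3_joined = {(PE1, G1), (PE2, G1)}                # the two PIM-SSM MDTs from step b

vpn_data = Packet(src="10.1.1.10", dst="225.1.1.1", payload="video")
outer = encapsulate(vpn_data, PE1, G1)             # done on PE1
inner = decapsulate(outer, pe3_joined)             # done on PE3
print(outer.dst, inner == vpn_data)
```

The public network forwards only on the outer (PE1, G1) pair, so the VPN addresses in the inner packet never need to be known to the core.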
l Scenario where different VPNs on the same PE use different BGP A-D modes
Figure 9-78 Networking diagram of the scenario where different VPNs on the same PE use
different BGP A-D modes
(Figure elements: PE1, PE2, and PE3 around the IP/MPLS core; CE1 (VPN RED, source);
CE4 (VPN BLUE, source); CE3 (VPN RED, receivers); CE2 (VPN BLUE, receivers).)
As shown in Figure 9-78, PE1 belongs to VPN RED, PE2 belongs to VPN BLUE, and
PE3 belongs to both VPN RED and VPN BLUE. BGP A-D in MDT-SAFI mode is
enabled on PE1, BGP A-D in MCAST-VPN SAFI mode is enabled on PE2, and both
BGP A-D in MDT-SAFI mode and BGP A-D in MCAST-VPN SAFI mode are enabled on
PE3. In addition, on PE3, VPN RED uses MDT-SAFI A-D mode whereas VPN BLUE uses
MCAST-VPN SAFI A-D mode.
The site where CE1 resides is connected to the multicast source of VPN RED and the
site where CE4 resides is connected to the multicast source of VPN BLUE. CE2 and
CE3 are connected to VPN users. Based on the BGP A-D mechanism, BGP peers
enabled with BGP A-D in the same mode obtain the BGP A-D information from each
other, and every PE on the network obtains and records information about all peers on
the same VPN and directly establishes a PIM-SSM MDT on the public network for
transmitting multicast VPN services. In this manner, MVPN services can be sent over a
public network tunnel based on the PIM-SSM MDT.
The interaction between MVPNs supporting BGP A-D through different BGP A-D
address families is as follows:
a. After being configured with BGP A-D, PE1, PE2, and PE3 negotiate session
parameters. Because PE1 and PE2 are configured with BGP A-D in different
modes, PE1 and PE2 fail to negotiate session parameters and cannot set up the BGP
peer relationship. Because PE3 is configured with BGP A-D in both modes, PE3
can establish BGP peer relationships with both PE1 and PE2. After receiving
BGP Update packets from PE1 and PE2, PE3 obtains and records the BGP peer
addresses of PE1 and PE2. The BGP Update packets carry information about the
sending PEs, such as the PE address and supported tunnel type.
b. PE3 encapsulates its BGP A-D information including the address and supported
tunnel type in a packet used by MVPN in MDT-SAFI mode and in another packet
used by MVPN in MCAST-VPN SAFI mode, and then sends the former packet to
PE1 and latter packet to PE2. After receiving the corresponding packet, PE1 and
PE2 record the information about PE3.
c. PE1 and PE3, which both belong to VPN RED, obtain each other's information
and send an (S, G) Join message to each other. Similarly, PE2 and PE3, which
both belong to VPN BLUE, obtain each other's information and send an (S, G)
Join message to each other.
d. Finally, the multicast traffic of each VPN is forwarded to users attached to CE3 of
VPN RED and users attached to CE2 of VPN BLUE respectively.
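The negotiation outcome in step a can be expressed as a set intersection: two PEs can set up a BGP A-D peer relationship only if they have at least one A-D mode in common. The Python sketch below is a simplified illustration under that assumption; the function `negotiate` and the mode strings are hypothetical labels, not protocol fields.

```python
from itertools import combinations

MDT_SAFI = "MDT-SAFI"
MCAST_VPN_SAFI = "MCAST-VPN SAFI"

# A-D modes enabled on each PE in Figure 9-78.
enabled_modes = {
    "PE1": {MDT_SAFI},
    "PE2": {MCAST_VPN_SAFI},
    "PE3": {MDT_SAFI, MCAST_VPN_SAFI},
}


def negotiate(modes):
    """Return the PE pairs that share at least one A-D mode, with the shared modes."""
    sessions = {}
    for pe_a, pe_b in combinations(sorted(modes), 2):
        common = modes[pe_a] & modes[pe_b]
        if common:
            sessions[(pe_a, pe_b)] = common
    return sessions


sessions = negotiate(enabled_modes)
for pair, common in sorted(sessions.items()):
    print(pair, sorted(common))
```

PE1 and PE2 have no mode in common and therefore do not appear in the result, while PE3 peers with PE1 in MDT-SAFI mode and with PE2 in MCAST-VPN SAFI mode.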
9.8.2.9 Inter-AS MVPN
Principles
As multicast services are widely deployed and the number of users requesting multicast
services is increasing, multicast users and multicast sources may reside in different ASs. To
allow multicast users in a different AS to enjoy multicast services, multicast services need to
be transmitted in a VPN across ASs. There are two types of inter-AS MVPNs: OptionA and
OptionC.
NOTE
Inter-AS MVPN does not support a switchover to the switch-MDT.
Related Concepts
The following concepts are used in inter-AS MVPN:
l PIM Vector: a technique of adding BGP neighbor information (Vector information) to
PIM Join messages. Based on the Vector information, a multicast distribution tree can be
established in inter-AS MVPN OptionC networking. Vector information may or may not
carry Route Distinguisher (RD) information.
l BGP multicast distribution tree (MDT) Subsequent Address Family Identifier (SAFI)
route: a route that has the same routing policy as a VPNv4 route. PEs and autonomous
system boundary routers (ASBRs) check information about this route and obtain BGP
neighbor information as the Vector information.
l RD: route distinguisher, an eight-byte field in a VPNv4 address. A VPNv4 address is
composed of an RD and a four-byte IPv4 address prefix. An RD is used to distinguish
IPv4 prefixes that use the same address space.
l Inter-AS VPN OptionA: the networking in which ASBRs of two ASs are directly
connected and are PEs (ASBR PEs) in their respective ASs, and an ASBR takes the peer
ASBR as its CE and uses EBGP to advertise IPv4 routes to the peer ASBR.
l Inter-AS VPN OptionC: the networking in which a tunnel is established between PEs in
two ASs (not ASBR PEs). In this networking, a P in either AS does not know the route to
the other AS. ASBR1 and ASBR2, however, know the routes to remote PEs.
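The composition of a VPNv4 address described above (an eight-byte RD followed by a four-byte IPv4 address) can be shown with a short Python sketch. The layout used for the RD here is the type 0 format from the BGP/MPLS IP VPN specification (two-byte type, two-byte AS number, four-byte assigned number), which is an assumption added for illustration because this section does not state the RD's internal layout. The function names and values are hypothetical.

```python
import ipaddress
import struct


def encode_rd_type0(asn, number):
    """Encode a type 0 RD: 2-byte type, 2-byte AS number, 4-byte assigned number."""
    return struct.pack("!HHI", 0, asn, number)


def vpnv4_address(rd, ipv4):
    """Concatenate an 8-byte RD and a 4-byte IPv4 address."""
    if len(rd) != 8:
        raise ValueError("an RD must be exactly 8 bytes")
    return rd + ipaddress.IPv4Address(ipv4).packed


# The same IPv4 prefix in two VPNs becomes two distinct VPNv4 addresses.
red = vpnv4_address(encode_rd_type0(100, 1), "10.1.1.0")
blue = vpnv4_address(encode_rd_type0(100, 2), "10.1.1.0")
print(len(red), red.hex(), blue.hex())
```

The last four bytes of both values are identical, and only the RD in front keeps the two overlapping prefixes apart.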
Implementation
Of the two types of inter-AS MVPN, OptionA supports Any-Source Multicast (ASM) and
Source-Specific Multicast (SSM), whereas OptionC supports only the MDT-SAFI auto-discovery
(A-D) mode of MVPN BGP A-D in SSM. Each type of inter-AS MVPN is implemented as follows:
l Inter-AS MVPN OptionA:
As shown in Figure 9-79, an independent multicast domain (MD) is established in each
AS, and VPN multicast data is transmitted between MDs.
Figure 9-79 Inter-AS MVPN OptionA
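(Figure: AS1 contains CE1, PE1, P1, and ASBR1; AS2 contains ASBR2, P2, PE2, and CE2. In the
public instance, the devices are connected by real physical links. In the VPN instances, PE1"
and ASBR1" attach to MD1 through MTI1 over MT1, and ASBR2" and PE2" attach to MD2 through
MTI2 over MT2, by virtual multicast links.)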
In inter-AS MVPN OptionA, VPN multicast data is transmitted in the following process:
a. CE1 in VPN1 sends VPN multicast data to the CE of ASBR1", that is, ASBR2";
CE2 sends VPN multicast data to the CE of ASBR2", that is, ASBR1".
b. After the VPN multicast data of CE1 reaches ASBR2", ASBR2" considers that the
multicast data comes from VPN2. ASBR2" then encapsulates the multicast data and
forwards it to PE2 and then CE2 in MD2. Similarly, VPN multicast data of CE2 can
also reach CE1 based on the preceding process. In this manner, CE1 and CE2 can
exchange VPN multicast data across ASs.

l Inter-AS MVPN OptionC:
As shown in Figure 9-80, each P does not know the route to the other AS. This hinders
the establishment of a multicast distribution tree across ASs because the next hop to the
multicast source cannot be identified when PIM Join messages are sent. To solve this
problem, the PIM Vector technique is used. Based on PIM Vector, PEs and ASBRs check
the BGP MDT SAFI route to obtain the BGP neighbor information (Vector information
without RD information), and add the Vector information to the PIM Join messages. In
inter-AS MVPN OptionC, an ASBR knows the route to a remote PE. Therefore, after
receiving a PIM Join message with Vector information, an ASBR can directly identify
the route corresponding to the Vector information in the IPv4 routing table, and directly
send the PIM Join message toward the multicast source. In this manner, a multicast
distribution tree can be established across ASs.
Inter-AS MVPN OptionC needs only one MD to be established across ASs.
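The effect of the Vector information on next-hop selection can be sketched in Python as follows. This is a simplified illustration, not the device algorithm. It assumes that the Vector carries the address of ASBR1, and that a node whose own address equals the Vector falls back to a lookup on the source address; both assumptions are modeled loosely on general RPF Vector behavior and are not stated in this section. The function name and all addresses are hypothetical.

```python
import ipaddress


def rpf_next_hop(routes, local_addresses, source, vector=None):
    """Pick the next hop for a PIM Join.

    routes maps ipaddress networks to next-hop names. The Vector address is
    used for the lookup unless it is absent or belongs to this node, in which
    case the source address is used.
    """
    use_source = vector is None or vector in local_addresses
    target = ipaddress.ip_address(source if use_source else vector)
    matches = [prefix for prefix in routes if target in prefix]
    if not matches:
        return None
    return routes[max(matches, key=lambda prefix: prefix.prefixlen)]


SOURCE_PE = "5.5.5.5"  # hypothetical address of the remote PE (root of the MDT)
ASBR1 = "3.3.3.3"      # hypothetical address of ASBR1, carried as the Vector

p1_routes = {ipaddress.ip_network("3.3.3.3/32"): "to-ASBR1"}     # P1 knows only AS1
asbr1_routes = {ipaddress.ip_network("5.5.5.5/32"): "to-ASBR2"}  # ASBR1 knows the remote PE

without_vector = rpf_next_hop(p1_routes, {"2.2.2.2"}, SOURCE_PE)
with_vector = rpf_next_hop(p1_routes, {"2.2.2.2"}, SOURCE_PE, vector=ASBR1)
at_asbr1 = rpf_next_hop(asbr1_routes, {ASBR1}, SOURCE_PE, vector=ASBR1)
print(without_vector, with_vector, at_asbr1)
```

Without the Vector, P1 has no route to the remote PE and cannot forward the Join. With the Vector, P1 forwards the Join toward ASBR1, and ASBR1 resolves the remote PE from its own IPv4 routing table.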
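Figure 9-80 Inter-AS MVPN OptionC
(Figure: AS1 contains CE1, PE1, P1, and ASBR1; AS2 contains ASBR2, P2, PE2, and CE2. In the
public instance, the devices are connected by real physical links, and a VPN LSP spans the
two ASs. In the VPN instances, PE1" and PE2" attach through MTIs to a single MT in one MD
by a virtual multicast link.)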
In inter-AS MVPN OptionC, VPN multicast data is transmitted in the following process:
a. VPN multicast data of CE1 is encapsulated on PE1 based on MTI and then
forwarded over MT tunnels. The encapsulated VPN multicast data is transmitted
over the public network as common multicast data based on Share-Group or
Switch-Group entries of the public network.
b. The VPN multicast data reaches PE2 and inter-AS multicast is implemented. PE1
and PE2 do not know how VPN multicast data is transmitted across ASs and
consider that the VPN multicast data is transmitted within the same AS.
Application scenario
The application scenarios are inter-AS MVPN OptionA and inter-AS MVPN OptionC.
Advantages
Inter-AS MVPN allows carriers to deploy multicast VPN across ASs to provide multicast
services for users in different ASs.
9.8.3 MVPN Applications

9.8.3.1 Single-AS MD VPN
The single-AS MD VPN is mainly used to isolate multicast services in different VPNs within
a multicast domain (MD).
Figure 9-81 Single-AS MD VPN
(Figure elements: PE1 and PE2 connected through P on the public network; CE1B and CE2B in
VPN BLUE; CE1R and CE2R in VPN RED; multicast sources Source1 and Source2; hosts PC1 and
PC2.)
As shown in Figure 9-81, a single AS runs MPLS/BGP VPN. Both PE1 and PE2 are
configured with two VPN instances, namely, VPN BLUE and VPN RED, and the same
Share-Group address is set for the same VPN instance on the two PEs. In such a case, the VPN
instances with the same Share-Group address join the same MD. After the corresponding
Share-Multicast Distribution Tree (Share-MDT) is established, the protocol packets and
low-rate data in each VPN can be transmitted through the Multicast Tunnel (MT) of that VPN.
VPN BLUE is taken as an example to describe how multicast services are transmitted
within a VPN.
1. A VPN instance named VPN BLUE is configured on both PE1 and PE2 and the
instances on the two PEs use the same Share-Group address. After the corresponding
Share-MDT is established, the VPN BLUE instances connected with CE1B and CE2B
can exchange multicast protocol packets through the corresponding MT.
2. Multicast devices in the VPNs connected with CE1B and CE2B can then establish
neighbor relationships, and send Join, Prune, and BootStrap router (BSR) messages to
each other. The protocol packets in the VPNs are encapsulated and decapsulated only on
the MT of the PEs. The devices, however, do not know that they are in VPN networks.
They still process the multicast protocol packets and forward multicast data packets like
the devices in the public network. In this way, multicast service transmission in one VPN
instance is implemented and multicast services in different VPN instances are isolated.

Terms
Terms Explanation
PIM It is a multicast routing protocol, with the full name being Protocol
Independent Multicast. Reachable unicast routes are the basis of PIM
forwarding. PIM uses the existing unicast routing information to perform
the RPF check on multicast packets to create multicast routing entries and
set up an MDT.
SPT It is a shortest path tree, with the multicast source being the root and
group members being leaves. SPT is applicable to PIM-DM, PIM-SM,
and PIM-SSM.
Share-Group Based on the MD principle, all the VPN instances on the PEs in the same
MD must join a common group, called a Share-Group.
Currently, one VPN instance can be configured with only one Share-
Group, that is, one VPN instance can join only one MD.
Share-MDT Share-MDT is short for Share-Multicast Distribution Tree. It is set up when the
PIM C-instances on the PEs join Share-Groups. A Share-MDT transmits the PIM protocol
packets and low-rate data packets in a VPN to other PEs within the same VPN. The
Share-MDT is regarded as a multicast tunnel (MT) within an MD.
MTI MTI is short for Multicast Tunnel Interface. It is the outgoing interface or
incoming interface of an MT. An MTI is equal to the outgoing interface or
incoming interface of an MD. The local PE sends VPN data through an
MTI. The remote PE receives it through an MTI.
The MTI is the channel through which the public network instance and
VPN instances on PEs communicate. PEs are connected to an MT by
using MTIs, which is equal to the situation that PEs are connected to a
shared network segment. On each PE, VPN instances that belong to the
MD set up the PIM neighbor relationships on MTIs.
Switch-Group It is the group that all PEs with attached VPN receivers join to establish a
Switch-MDT after a Share-MDT is established.
Switch-MDT Switch-MDT is short for Switch-Multicast Distribution Tree. It prevents
multicast data packets from being transmitted to unnecessary PEs. After a
Share-MDT is set up, all the PEs to which the receivers in the VPN are
attached join an MDT set up based on Switch-Groups. A Switch-MDT
can transmit high-rate data packets to other PEs in the same VPN.

Acronym & Abbreviation Full Name
ASBR Autonomous System Boundary Router
PIM-SM Protocol Independent Multicast - Sparse Mode
RP Rendezvous Point
9.9 Multicast Security
9.9.1 Introduction to Multicast Security

As the Internet grows, more and more data, voice and video information is exchanged across
networks. New services such as E-commerce, online conferencing and auctions, Video on
Demand (VOD), and e-learning have also emerged and continue to develop. These services
are for the most part a good fit with the point-to-multipoint (P2MP) model. Their
security and bandwidth requirements are high, as is their earnings potential.
To address the problems of P2MP data transmission on IP networks, multicast group member
management, multicast packet forwarding, intra-domain multicast routing, and inter-domain
multicast routing are implemented, besides the basic functions of various multicast protocols.
In addition, multicast security is designed, and multiple methods such as limitation, filtering,
and authentication are provided to ensure the normal operation of multicast services.
9.9.2 Principles
9.9.2.1 Limit on the Total Number of Multicast Entries

The limit on the total number of multicast entries, which means the limit on the number of
IGMP entries, Multicast Source Discovery Protocol (MSDP) entries, and PIM entries,
prevents a device from creating too many multicast entries, and saves device resources.
Limit on the Number of IGMP Entries on an Interface

The number of IGMP entries to be created on an interface can be limited.
For details about the limit on IGMP entries on an interface, see "IGMP-Limit" in IGMP
Policy Control.
Limit on the Number of IGMP Entries in a Single Instance

The number of IGMP entries to be created on all the interfaces of an instance can be limited.
For details about the limit on IGMP entries in a single instance, see "IGMP-Limit" in IGMP
Policy Control.
Limit on the Number of Global IGMP Entries

The number of IGMP entries to be created on all the interfaces of all instances can be limited.

For details about the limit on global IGMP entries, see "IGMP-Limit" in IGMP Policy
Control.
Limit on PIM Entries

The number of PIM entries to be created on a device can be limited.
Limit on the Size of an MSDP SA Cache

An MSDP Source Active (SA) cache is used to store the (S, G, RP) information in SA
messages sent by MSDP peers.
The following methods can be used to limit the number of (S, G) entries in an MSDP SA
cache:
l Limiting the number of (S, G) entries in a single instance
l Limiting the total number of (S, G) entries on all MSDP devices
l Limiting the number of (S, G) entries on each MSDP peer
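How an instance-wide limit and a per-peer limit can interact on an SA cache is sketched below in Python. The class `SaCache`, its parameters, and the limit values are hypothetical and chosen only for illustration; the sketch does not describe how the device stores or ages SA entries.

```python
class SaCache:
    """Hypothetical SA cache with an instance-wide limit and a per-peer limit."""

    def __init__(self, instance_limit, per_peer_limit):
        self.instance_limit = instance_limit
        self.per_peer_limit = per_peer_limit
        self.entries = {}  # (source, group, rp) -> peer that advertised the entry

    def add(self, peer, source, group, rp):
        """Store the (S, G, RP) information from an SA message if no limit is hit."""
        key = (source, group, rp)
        if key in self.entries:
            return True  # already cached, nothing new to create
        if len(self.entries) >= self.instance_limit:
            return False
        from_peer = sum(1 for owner in self.entries.values() if owner == peer)
        if from_peer >= self.per_peer_limit:
            return False
        self.entries[key] = peer
        return True


cache = SaCache(instance_limit=3, per_peer_limit=2)
from_a = [cache.add("peerA", "10.0.0.%d" % i, "225.1.1.1", "1.1.1.1") for i in (1, 2, 3)]
from_b = [cache.add("peerB", "10.0.1.%d" % i, "225.1.1.1", "2.2.2.2") for i in (1, 2)]
print(from_a, from_b, len(cache.entries))
```

In this run the third entry from peerA is refused by the per-peer limit, and the second entry from peerB is refused by the instance-wide limit.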
9.9.2.2 Limit on the Number of Downstream Interfaces of a Multicast Entry

The maximum number of downstream interfaces of each multicast entry on the forwarding
plane can be limited, and multicast data replication can be controlled.
9.9.2.3 Limit on Multicast Protocol Status

The limit on multicast protocol status, which means the limit on the number of maintenance
entries for various multicast protocols, prevents invalid packets from consuming device
resources. The limit on multicast protocol status includes the limit on the number of PIM
neighbors, limit on the number of multicast VPNs, limit on the number of BootStrap routers
(BSRs), and limit on the number of Candidate-Rendezvous Points (C-RPs).
Limit on the Number of PIM Neighbors

PIM packets are sent and received on the condition that PIM neighbor relationships have been
established.
The limit on the number of PIM neighbors, which means the maximum number of PIM
neighbors in the neighbor list of an interface on a device, ensures the normal operation of a
device by preventing an interface on the device from establishing too many PIM neighbor
relationships.
Limit on the Number of C-RPs

The maximum number of C-RPs whose information can be recorded by the BSR can be
limited.
Limit on the Number of BSRs in an Administrative Domain

The limit on the number of BSRs in an administrative domain includes the limit on the
number of BSRs supported by a device in an administrative domain and the limit on the
number of RPs supported by a single instance in an administrative domain.

9.9.2.4 Multicast Filtering Policies

After being configured with various policies, a device can effectively filter protocol packets,
reject invalid packets, and maintain entries.
IGMP Group Policy

An IGMP group policy can be configured on an interface to limit the range of multicast
groups that the hosts attached to the interface can join.
For details about an IGMP group policy, see "IGMP Group-Policy" in IGMP Policy Control.
Source Policy
A source policy is used to filter received multicast data packets based on source addresses or
source/group addresses.
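Filtering on source addresses or on source/group address pairs can be sketched as a first-match rule list in Python. The rule format, the helper names `make_rule` and `permits`, the addresses, and the behavior of dropping packets that match no rule are all assumptions made for this illustration; they are not taken from the device's ACL implementation.

```python
import ipaddress


def make_rule(action, source_net, group_net=None):
    """Build a rule; group_net=None means the rule matches on the source only."""
    return (
        action,
        ipaddress.ip_network(source_net),
        ipaddress.ip_network(group_net) if group_net else None,
    )


def permits(rules, source, group):
    """Apply the first matching rule to a multicast data packet."""
    src = ipaddress.ip_address(source)
    grp = ipaddress.ip_address(group)
    for action, source_net, group_net in rules:
        if src in source_net and (group_net is None or grp in group_net):
            return action == "permit"
    return False  # Assumption: packets that match no rule are dropped.


rules = [
    make_rule("deny", "10.1.1.0/24", "225.1.1.0/24"),  # source/group rule
    make_rule("permit", "10.1.0.0/16"),                # source-only rule
]
print(permits(rules, "10.1.1.5", "225.1.1.1"), permits(rules, "10.1.1.5", "226.1.1.1"))
```

The first rule blocks one source subnet for one group range only, while the second rule admits the rest of the traffic from the wider source range.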
SSM Policy
A Source-Specific Multicast (SSM) policy is used to change the SSM group address range in
a certain instance.
After an SSM policy is configured on a device, all PIM-SM interfaces on the device consider
that multicast groups whose addresses are within the SSM group address range adopt the
PIM-SSM model.
For details about an SSM policy, see SSM Mapping.
BSR Policy
A BootStrap router (BSR) policy is used to limit the range of valid BSR addresses. A device
configured with a BSR policy discards messages received from the BSRs whose addresses are
beyond the set address range, preventing BSR spoofing.
C-RP Policy
A Candidate-Rendezvous Point (C-RP) policy is used to set the range of valid
C-RP addresses and the range of multicast groups that each C-RP serves. The BSR configured
with a C-RP policy discards messages received from the C-RPs whose addresses are beyond
the set address range, preventing C-RP spoofing.
Register Policy
After being configured with a register policy, a device accepts or rejects Register messages
according to the policy, which blocks illegal Register messages.
MSDP SA Policy
A Multicast Source Discovery Protocol (MSDP) Source Active (SA) policy is used to filter
the received or sent SA messages.
When receiving SA messages from a specified MSDP peer or forwarding SA messages to a
specified MSDP peer, a device configured with an MSDP SA policy filters the (S, G)
information in the SA messages based on source addresses or source/group addresses. In this
manner, the transmission of source information or source/group information is controlled.

MSDP SA Request Policy

An MSDP SA request policy is used to filter SA request messages received from a specified
MSDP peer. Once a received SA request message matches the policy configured on a device,
the device immediately responds to the SA request message.
MSDP SA Import Source Policy

An MSDP SA import policy is used to limit the advertisement of active source information
carried in an SA message to be created in a domain.
When creating an SA message, a device configured with this policy filters the (S, G) entry
contained in the SA message based on the source address or the source/group address,
controlling the advertisement of source or source/group information during the creation of an
SA message.
Multicast Boundary
The multicast information to which each multicast group corresponds needs to be transmitted
in a certain range on a network. A multicast boundary can be configured on an interface to
define the forwarding range of the data of a multicast group, forming a closed multicast
forwarding area. When an interface of a device is configured with a forwarding boundary for
a group, the interface does not forward or receive any packet of the group.
BSR Boundary
A BSR boundary can be configured on an edge device to restrict the range of a PIM domain,
implementing refined management of networks.
Source Address-based IGMP/MLD Message Filtering

To improve the security of received information, an IGMP/MLD IP source policy can be
configured on the interface connecting a device to hosts to filter IGMP/MLD messages based
on source addresses.
For details about source address-based IGMP message filtering, see "". The principles for
source address-based MLD message filtering are similar to those for source address-based
IGMP message filtering.
PIM Neighbor Policy

To prevent an unknown device from establishing a PIM neighbor relationship with a device
on the network and prevent the unknown device from becoming the DR, filtering PIM
neighbors is required. After being configured with this function, an interface sets up neighbor
relationships with only the interfaces whose addresses match the filtering rules and removes
the neighbor relationships with the interfaces whose addresses do not match the filtering
rules.
PIM Join Policy

A Join/Prune message received on an interface contains a Join message and a Prune message.
After being configured with a PIM join policy, a device creates PIM entries for only Join
messages matching the policy, thereby preventing the joining of illegal users.

PIM Silent
PIM silent can be configured on the interface connecting a device to hosts to prevent hosts
from maliciously sending Hello messages to attack the device. After entering the PIM silent
state, the interface is forbidden to receive or forward any PIM packet. All PIM neighbors and
the PIM state machines on this interface are deleted and the interface automatically becomes a
DR. The IGMP function on the interface, however, is not affected.
For details about PIM silent, see "PIM Silent" in PIM Security.
9.9.2.5 Multicast Protocol Packet Attack Defense

In some applications, a device uses only needed protocols, and does not need to send the
packets of unneeded upper-layer protocols to the CPU for processing. The forwarding plane
of the device can sense whether upper-layer protocols are enabled. The forwarding plane
sends only the packets of enabled protocols to the control plane, and discards the packets of
disabled protocols. In this manner, the risk that the protocol layer suffers attacks is minimized,
and the system security is improved.
Currently, all multicast protocols such as IGMP, PIM, and Multicast Source Discovery
Protocol (MSDP) support packet attack defense.
9.9.2.6 Multicast Security Authentication

Multicast security authentication involves Multicast Source Discovery Protocol (MSDP)
Message-digest algorithm 5 (MD5)/Keychain authentication.
MSDP MD5/Keychain Authentication

l MSDP MD5 authentication: After MD5 authentication is configured on devices, only the
devices configured with the same MD5 authentication password can establish the MSDP
peer relationship and exchange packets.
l MSDP Keychain authentication: A group of passwords is set for each TCP connection.
Each password in the group can be configured with a separate encryption algorithm and
a valid period, and can be randomly changed. In this manner, requests from illegal users
to set up a TCP connection are denied and invalid packets are filtered out.
For details about MD5/Keychain authentication, see MD5/Keychain Authentication.
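The idea that each password in a keychain has its own algorithm and validity period can be sketched in Python as follows. The `Key` class, the function `active_key`, the algorithm labels, the secrets, and the dates are hypothetical and serve only to show how one key at a time becomes the active one; the sketch does not compute any digest.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Key:
    key_id: int
    algorithm: str        # label only; no digest is computed in this sketch
    secret: str
    valid_from: datetime
    valid_until: datetime


def active_key(keychain, now):
    """Return the first key whose validity period contains 'now', or None."""
    for key in keychain:
        if key.valid_from <= now < key.valid_until:
            return key
    return None


keychain = [
    Key(1, "md5", "old-secret", datetime(2018, 1, 1), datetime(2018, 2, 1)),
    Key(2, "sha-256", "new-secret", datetime(2018, 2, 1), datetime(2018, 3, 1)),
]
in_january = active_key(keychain, datetime(2018, 1, 15))
at_rollover = active_key(keychain, datetime(2018, 2, 1))
expired = active_key(keychain, datetime(2018, 3, 1))
print(in_january.key_id, at_rollover.key_id, expired)
```

Both ends of a connection need the same keys and validity periods so that they switch to the next key at the same time.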
9.9.3 Multicast Security Applications
9.9.3.1 Measures to Guarantee Network Security

Currently, the following multicast mechanisms can be used to guarantee network security:
l PIM silent interface: prevents the interface connecting a device to hosts from processing
PIM packets.
l Boundary: prevents multicast data packets and multicast protocol packets from being
sent or received across a multicast boundary.
9.9.3.2 Measures to Guarantee Protocol-Layer Security

Currently, the following multicast mechanisms can be used to guarantee protocol-layer
security:

l MSDP MD5/Keychain authentication: prevents a forged MSDP peer from establishing
the MSDP peer relationship with a device.
l MSDP Source Active (SA) policy and MSDP SA request policy: prevent forged MSDP
SA messages.
l PIM neighbor policy: prevents the establishment of invalid neighbor relationships.
l Candidate-Rendezvous Point (C-RP) policy and BootStrap router (BSR) policy: prevent
the learning of invalid Rendezvous Point (RP) and BSR addresses.
l Register, Source, Join/Prune, and IGMP group policies: prevent the learning of invalid
multicast entries.
l Multicast protocol packet attack defense: discards packets of unneeded upper-layer
protocols or preferentially sends the packets of certain protocols.
9.9.3.3 Measures to Guarantee Device Security

Currently, the following multicast mechanisms can be used to guarantee device security:
l Limit on the number of IGMP entries, limit on the number of PIM entries, limit on the
number of MSDP peers, and limit on the size of the MSDP Source Active (SA) cache:
prevent too many memory resources from being consumed.
l Central Processor Committed Access Rate (CPCAR): controls the consumption of
bandwidth resources for sending protocol packets.
l Traffic rate limit: controls the consumption of CPU resources for processing protocol
packets.


10 MPLS
About This Chapter
This chapter describes MPLS in terms of its overview, principles, and applications.
10.1 MPLS Basics

10.2 MPLS LDP
10.3 MPLS TE
10.4 Seamless MPLS
10.1 MPLS Basics
10.1.1 Introduction
Background
IP-based Internet prevailed in the mid-1990s. The technology is simple and costs little to deploy.
However, nowadays IP technology, which relies on the longest match algorithm, is not the
most efficient choice for forwarding packets.
In comparison, asynchronous transfer mode (ATM) technology is much more efficient at
forwarding packets. It uses labels (particularly, cells) of fixed length and maintains a label
table that is much smaller than a routing table. ATM technology, however, is a complex
protocol with a high deployment cost, which hinders its widespread popularity and growth.
Users wanted a technology that combines the best that IP and ATM technologies have to offer.
This has sparked the emergence of MPLS technology.
Multiprotocol Label Switching (MPLS) is designed to increase forwarding rates. Unlike IP
technology, MPLS analyzes packet headers only on the edge of a network, not at each hop.
Therefore, packet processing time is shortened.
MPLS no longer has a forwarding-speed advantage because application-specific
integrated circuit (ASIC) technology has been developed to increase the routing rate.
However, MPLS supports multi-layer labels, and its forwarding plane is connection-oriented.
Therefore, MPLS is widely used in virtual private network (VPN), traffic engineering (TE),
and quality of service (QoS) scenarios.
Overview
MPLS works between the data link layer and the network layer in the TCP/IP protocol stack.
MPLS provides connections for the IP layer and obtains services from the data link layer.
MPLS replaces IP forwarding with label switching. A label is a short connection identifier of
fixed length that is meaningful to the local end. The label is similar to the ATM virtual path
identifier (VPI)/virtual channel identifier (VCI) and the Frame Relay data link connection
identifier (DLCI). The label is encapsulated between the data link layer and network layer.
MPLS can use any Layer 2 media to transfer packets and is not limited by any specific
protocol on the data link layer.
MPLS originated with the Internet Protocol version 4 (IPv4). The core MPLS technology can
be extended to multiple network protocols, such as the Internetwork Packet Exchange (IPX),
AppleTalk, DECnet, and Connectionless Network Protocol (CLNP). MPLS supports label
switching between multiple network protocols, as implied by its name.
The MPLS technology is a tunneling technology, but not a service or an application. It
supports multiple protocols and services. Moreover, it improves data transmission security.
10.1.2 Principles
10.1.2.1 Concepts
MPLS Network Structure

Figure 10-1 shows the typical structure of an MPLS network, which consists of many label
switching routers (LSRs). An MPLS network, also called an MPLS domain, comprises the
label edge routers (LERs) and core LSRs. LERs reside on the edge of an MPLS domain and
directly connect to one or more nodes that do not run MPLS. Core LSRs directly connect to
MPLS-enabled nodes within an MPLS domain.
Figure 10-1 MPLS network structure
(Figure: an MPLS network in which core LSRs are interconnected in the center and LERs sit on the edge; each LER connects the MPLS network to a non-MPLS network.)

All LSRs on the MPLS network forward data based on labels. When an IP packet enters an
MPLS network, an LER adds a label to it. Before the IP packet leaves the MPLS network,
another LER removes the label.
The path that MPLS packets take in an MPLS network is called a label switched path (LSP).
The LSP is a unidirectional path that transmits traffic from the ingress to the egress.
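Figure 10-2 MPLS LSP
(Figure: a non-MPLS network connects to an LER acting as the ingress, followed by two core LSRs acting as transit nodes and an LER acting as the egress, which connects to another non-MPLS network. The LSP runs from the ingress to the egress.)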
The start node of an LSP is the ingress. The end node of the LSP is the egress. The nodes
between both ends along the LSP are transit nodes. An LSP may have zero, one, or more
transit nodes, but it has exactly one ingress and one egress.
Forwarding Equivalence Class

The forwarding equivalence class (FEC) is a set of data flows with the same attributes. Data
flows in the same FEC are processed by LSRs in the same way.
FECs can be identified by the address, service type, and QoS.
Label
A label is a 20-bit identifier that uniquely identifies the FEC to which a packet belongs. A label
is meaningful only to the local end. A FEC can be mapped to multiple incoming labels to
balance loads, but a label represents only a single FEC. A label on an MPLS network
performs the same function as a virtual path identifier (VPI)/virtual channel identifier (VCI)
in an ATM network or a data link connection identifier (DLCI) in a Frame Relay network.
Figure 10-3 shows the structure of an MPLS header.
Figure 10-3 Structure of the MPLS packet header
Bits 0 to 19: Label
Bits 20 to 22: Exp
Bit 23: S
Bits 24 to 31: TTL
The MPLS header contains the following fields:
l Label: a 20-bit field that identifies a label value.

l Exp: a 3-bit field used for extension. This field is used by the class of service (CoS)
function, which is similar to Ethernet 802.1p.
l S: a 1-bit field that identifies the bottom of a label stack. MPLS supports multiple labels
that may be stacked. If the S field value is set to 1, the label is at the bottom of the label
stack.
l TTL: a time to live value. The length is 8 bits. This field is the same as the TTL in IP
packets.
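The field layout above can be illustrated with a minimal Python sketch that packs and unpacks one 32-bit MPLS header. The function names are hypothetical and chosen only for this illustration; the bit positions follow Figure 10-3.

```python
import struct


def pack_mpls_header(label: int, exp: int, s: int, ttl: int) -> bytes:
    """Pack Label (20 bits), Exp (3 bits), S (1 bit), and TTL (8 bits) into 4 bytes."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)  # network byte order


def unpack_mpls_header(data: bytes) -> dict:
    """Split the first 4 bytes of data back into the four header fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


header = pack_mpls_header(label=1024, exp=5, s=1, ttl=255)
print(header.hex())                # 00400bff
print(unpack_mpls_header(header))
```

Setting s=1 marks the header as the bottom of the label stack, so a receiver knows that the Layer 3 header follows immediately.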
Labels are encapsulated between the data link layer and network layer and supported by all
data link layer protocols.
Figure 10-4 shows the position of the label in a packet.
Figure 10-4 Position of a label in a packet
Link layer header Label Layer 3 header Layer 3 payload
Label Space
Label space is the label value range. The ATN supports the following label ranges:
l 0 to 15: special labels. For details about special labels, see Table 10-1.
l 16 and above: the label space shared by static LSPs, static CR-LSPs, and dynamic signaling
protocols, such as LDP, RSVP-TE, and MP-BGP. The range depends on the device model:
– ATN 910: 16 to 2559
– ATN 910I: 16 to 2559
– ATN 910B: 16 to 7167
– ATN 950B with the control board AND1CXPA/AND1CXPB installed: 16 to 3071
– ATN 950B with the control board AND2CXPB/AND2CXPE installed: 16 to 7167
– ATN 905: 16 to 2559
NOTE
When the ATN 905 supports 4K VLANs, the label space is 16 to 271.
Table 10-1 Special labels
Label Value | Label | Description
0 | IPv4 Explicit NULL Label | If the egress receives a packet carrying a label with this value, the egress must remove the label from the packet. The egress then forwards the packet using IPv4.
1 | Router Alert Label | If a node receives a packet carrying a label with this value, the node sends the packet to a software module, without implementing hardware forwarding. The node forwards the packet based on the next layer label. If the packet needs to be forwarded using hardware, the node pushes the Router Alert Label back onto the top of the label stack before forwarding the packet. This label takes effect only when it is not at the bottom of a label stack.
2 | IPv6 Explicit NULL Label | If the egress receives a packet carrying a label with this value, the egress removes the label from the packet and forwards the packet using IPv6.
3 | Implicit NULL Label | If the penultimate LSR receives a packet carrying a label with this value, the penultimate LSR removes the label and forwards the packet (now, an IP or VPN packet) to the egress. The egress then forwards the packet over IP or VPN routes.
ATN 910: 4 to 12; ATN 910I: 4 to 12; ATN 910B: 4 to 12; ATN 950B: 4 to 7, 9 to 12; ATN 905: 4 to 12 | Reserved | None.
13 | OAM Router Alert Label | If the ingress receives a packet carrying a label with this value, the ingress considers it an Operation, Administration and Maintenance (OAM) packet and transparently forwards it to the egress. MPLS OAM sends OAM packets to monitor LSPs and advertise faults.
14 to 15 | Reserved | N/A
Label Stack
A label stack in an MPLS packet contains a set of labels. The label next to the Layer 2 header
is the top or outer label. The label next to the Layer 3 header is the bottom or inner label.
In theory, there is no limit on the number of MPLS labels that can be stacked.

Figure 10-5 Label stack
Label stack
Link layer header Outer label Inner label Layer 3 header Layer 3 payload
The labels are processed from the top of the stack based on the last in, first out principle.
Label Operations
The label forwarding table defines the following label operations:
l Push: The ingress adds a label to a packet between the Layer 2 header and IP header
before forwarding the packet over an MPLS network. Within an MPLS network, an LSR
adds a label to the top of the label stack.
l Swap: A transit node replaces a label on the top of the label stack in an MPLS packet
with another label, which is assigned by the next hop.
l Pop: The penultimate LSR removes the top label from the label stack to decrease the
number of labels in the stack. The egress removes a label from the MPLS packet before
the packet leaves an MPLS network.
The VPN Option C scenario supports the following actions to process labels:
l Swappush: swaps an existing outer label for a new one and pushes a label of another
tunnel into a packet.
l Popgo: pops out outer labels from a packet and pushes a label of another tunnel into the
packet.
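The basic Push, Swap, and Pop operations can be sketched in Python as follows. This is only an illustration under stated assumptions: the label stack is modeled as a list whose first element is the top (outer) label, the function names are hypothetical, and the label values 1024, 1025, and 2000 are arbitrary examples.

```python
from typing import List


def push(stack: List[int], label: int) -> List[int]:
    """Add a label on top of the stack (index 0 is the top, or outer, label)."""
    return [label] + stack


def swap(stack: List[int], new_label: int) -> List[int]:
    """Replace the top label with the label assigned by the next hop."""
    if not stack:
        raise ValueError("cannot swap on an empty label stack")
    return [new_label] + stack[1:]


def pop(stack: List[int]) -> List[int]:
    """Remove the top label from the stack."""
    if not stack:
        raise ValueError("cannot pop from an empty label stack")
    return stack[1:]


# A packet that already carries an inner label (2000) crosses an LSP.
stack = [2000]
stack = push(stack, 1025)  # ingress pushes the outer label
stack = swap(stack, 1024)  # transit node swaps the outer label
stack = pop(stack)         # penultimate LSR pops the outer label
print(stack)               # [2000]
```

Because only the top label is ever read or changed, the inner label passes through the LSP untouched, which is the last in, first out behavior described for the label stack.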
Penultimate Hop Popping

Penultimate hop popping (PHP) enables the penultimate LSR to pop the label out of the
packet. The penultimate node removes the label from a packet to reduce the packet size before
sending the packet to the last hop.
PHP is configured on the egress. The PHP-enabled egress advertises a label with value 3 to
the penultimate LSR.
The label with value 3, the Implicit NULL Label, instructs the penultimate LSR to implement
PHP on MPLS packets. The MPLS packets are reverted to IP or VPN packets and forwarded
to the egress, which forwards the packets over IP routes or based on the next layer label.
Label Switching Router

A label switching router (LSR) swaps labels and forwards MPLS packets. It is also called an
MPLS node. As a fundamental element on an MPLS network, all LSRs support MPLS.
LER
An LER is an LSR that resides on the edge of an MPLS domain. When an LSR connects to a
node that does not run MPLS, the LSR acts as the LER.

The LER classifies the packets entering an MPLS domain by FECs and pushes labels into
them. Then, the LER forwards MPLS packets based on these labels. When packets leave the
MPLS domain, the labels are popped out. The packets again become IP packets and are
forwarded.
Label Switched Path

The path that packets belonging to a forwarding equivalence class (FEC) take through an
MPLS network is called an LSP.
LSPs are unidirectional and originate from the ingress and terminate at the egress. LSPs
perform the same functions on MPLS networks as permanent virtual circuits (PVCs) on ATM
and Frame Relay networks.
Ingress, Transit, and Egress LSRs

The LSP is a unidirectional path that consists of the following LSRs:
l Ingress LSR: the start node on an LSP. An LSP can have only one ingress.
The ingress pushes a new label into the packet and encapsulates the IP packet as an
MPLS packet to be forwarded.
l Transit LSR: the intermediate node of an LSP. Multiple transit LSRs may exist on an
LSP.
The transit LSR searches for routes in the label forwarding table and swaps labels to
forward MPLS packets.
l Egress LSR: the end node on an LSP. An LSP can have only one egress.
The egress removes labels from MPLS packets and forwards the resultant IP packets.
Upstream and Downstream

Relative to a local LSR, neighboring LSRs are either upstream or downstream. Upstream LSRs
send MPLS packets to the local LSR. Downstream LSRs are directly connected to and receive
MPLS packets from the local LSR.
As shown in Figure 10-6, LSRA is the upstream LSR of LSRB, and LSRB is the
downstream LSR of LSRA. Similarly, LSRB is the upstream LSR of LSRC, and LSRC is the
downstream LSR of LSRB.
Figure 10-6 Upstream and downstream
(Figure: data flows from LSR-A through LSR-B to LSR-C toward 192.168.1.0/24. LSR-B is downstream of LSR-A, and LSR-C is downstream of LSR-B.)
Label Distribution
An LSR records a mapping between a label and FEC and notifies upstream LSRs of the
mapping. This process is called label distribution.

Figure 10-7 Label distribution
(Figure: data flows downstream from LSR-A through LSR-B to LSR-C toward 192.168.1.0/24. Labels are distributed upstream, from LSR-C to LSR-B and from LSR-B to LSR-A.)
On the network shown in Figure 10-7, packets with destination address 192.168.1.0/24 are
assigned to a specific FEC. LSRB and LSRC assign labels that represent the FEC and
advertise the mapping between labels and the FEC to upstream LSRs.
Label Distribution Protocols

Label distribution protocols, also called signaling protocols, are MPLS control protocols used
to identify FECs, distribute labels, and create and maintain LSPs.
MPLS uses the following label distribution protocols: Label Distribution Protocol (LDP),
Resource Reservation Protocol-Traffic Engineering (RSVP-TE), and Multiprotocol Extensions
for Border Gateway Protocol (MP-BGP).
MPLS Architecture
As shown in Figure 10-8, the MPLS architecture consists of a control plane and a forwarding
plane.
Figure 10-8 Schematic diagram for the MPLS architecture
(Figure: the control plane contains the IP routing protocol, the routing information base (RIB), the MPLS IP routing protocol, and the label information base (LIB). The forwarding plane contains the label forwarding information base (LFIB).)
l The control plane is connectionless and is used to distribute labels, create a label
forwarding table, and establish or tear down LSPs.
l The forwarding plane, also known as the data plane, is connection-oriented. It can apply
services and protocols supported by ATM, Frame Relay, and Ethernet networks. The
forwarding plane adds labels to IP packets, forwards packets based on the label forwarding
table, and removes labels from MPLS packets before the packets leave the MPLS network.
10.1.2.2 Establishing LSPs
Procedure
MPLS assigns packets to a FEC, distributes labels that identify the FEC, and establishes an
LSP. Packets travel along the LSP.
Labels are assigned and distributed by a downstream LSR to an upstream LSR. As shown in
Figure 10-9, packets destined for 3.3.3.3 are assigned to a FEC. Downstream LSRs assign
labels for the FEC to upstream LSRs and use a label advertisement protocol to inform the
upstream LSRs of the mapping between the labels and FEC. Each upstream LSR adds the
mapping to a label forwarding table. An LSP is established using the label mapping
information.
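Figure 10-9 Procedure for establishing an LSP
(Figure: labels for the FEC 3.3.3.3/32 are advertised hop by hop from the egress toward the ingress. The egress, which owns 3.3.3.3/32, advertises Label=3 to the penultimate LSR; the penultimate LSR advertises Label=Y, and the next upstream LSR advertises Label=Z to the ingress.)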
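The downstream-to-upstream label assignment described above can be sketched as follows. This is a simplified illustration, not the LDP procedure itself: the node names, the dictionary layout, and the starting label value 1024 are assumptions made for the example, and the egress is assumed to advertise label 3 as in Figure 10-9.

```python
from typing import Dict, List, Optional

IMPLICIT_NULL = 3  # label advertised by the egress in Figure 10-9


def establish_lsp(path: List[str], fec: str, first_label: int = 1024) -> Dict[str, dict]:
    """Walk from the egress toward the ingress. Each downstream node assigns a
    label for the FEC and advertises it to its upstream neighbor, which records
    it as the outgoing label for that FEC."""
    tables: Dict[str, dict] = {
        node: {"fec": fec, "advertised": None, "out_label": None} for node in path
    }
    next_label = first_label
    for i in range(len(path) - 1, 0, -1):
        downstream, upstream = path[i], path[i - 1]
        label: Optional[int]
        if i == len(path) - 1:
            label = IMPLICIT_NULL      # the egress advertises label 3
        else:
            label = next_label         # hypothetical dynamic label value
            next_label += 1
        tables[downstream]["advertised"] = label
        tables[upstream]["out_label"] = label
    return tables


lsp = establish_lsp(["Ingress", "Transit", "Penultimate", "Egress"], "3.3.3.3/32")
for node, entry in lsp.items():
    print(node, entry)
```

In this run the value 1025 plays the role of Label Z and 1024 the role of Label Y. The ingress advertises nothing because it has no upstream neighbor on this LSP.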
LSPs can be either static or dynamic. Static LSPs are established manually. Dynamic LSPs
are established using a routing protocol and a label distribution protocol.
Establishing Static LSPs

You can manually allocate labels to set up static LSPs. The outgoing label value of the
upstream node is equal to the incoming label value of the downstream node.
A static LSP is meaningful only to the local node, and the local node cannot monitor the
entire LSP.
l On the ingress: A static LSP is configured over a route, and the outbound interface is
enabled with MPLS. If the route is reachable, the static LSP goes Up, regardless of the
existence of the transit node or egress. A reachable route means that a route entry exists
and its destination address and next hop address match those in the local routing table.
l On the transit node: A static LSP is configured, and the inbound and outbound interfaces
are enabled with MPLS. If the inbound and outbound interfaces are Up on the physical
and protocol layers, the static LSP can go Up, regardless of the existence of the ingress,
egress, or other transit nodes.
l On the egress: A static LSP is configured, and the inbound interface is enabled with
MPLS. If the inbound interface is Up on the physical and protocol layers, the static LSP
goes Up, regardless of the existence of the ingress or transit node.
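The per-role conditions above can be summarized in a short sketch. The function and parameter names are hypothetical, and the sketch assumes that MPLS is already enabled on the relevant interfaces; it only shows that each node decides the state of a static LSP from local information.

```python
def static_lsp_is_up(role: str, route_reachable: bool = False,
                     in_if_up: bool = False, out_if_up: bool = False) -> bool:
    """Return whether a static LSP goes Up on a node, using only local state.

    role is "ingress", "transit", or "egress". An interface counts as up only
    if it is Up on both the physical and protocol layers.
    """
    if role == "ingress":
        return route_reachable          # only the ingress needs a reachable route
    if role == "transit":
        return in_if_up and out_if_up   # both interfaces must be Up
    if role == "egress":
        return in_if_up                 # only the inbound interface matters
    raise ValueError(f"unknown role: {role}")


# The ingress reports Up even though the transit node's outbound interface is down.
print(static_lsp_is_up("ingress", route_reachable=True))                 # True
print(static_lsp_is_up("transit", in_if_up=True, out_if_up=False))       # False
```

The two results together show why a static LSP that is Up on the ingress does not guarantee end-to-end connectivity.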

NOTE
A reachable route is only required on the ingress for establishing a static LSP, but not on the transit node
or egress.
A static LSP is established without label distribution protocols or the exchange of control
packets. The static LSP has a low cost and is recommended for small-scale networks with
a simple and stable topology. The static LSP cannot adapt dynamically to changes in the
network topology. Instead, an administrator must reconfigure it.
Establishing Dynamic LSPs

Dynamic LSPs are established automatically by one of the following label distribution
protocols:
l Label Distribution Protocol (LDP)
LDP is designed specifically to distribute labels. When LDP establishes an LSP in hop-by-hop
mode, LDP identifies a next hop based on the routing forwarding table on each LSR.
Information contained in the routing forwarding table is collected by Interior Gateway
Protocol (IGP) and BGP. LDP only uses routing information, but is not associated with
the routing protocols.
In addition to LDP, BGP and RSVP can also be extended to distribute MPLS labels.
l Resource Reservation Protocol-Traffic Engineering (RSVP-TE)
The RSVP-TE signaling protocol is an extension to RSVP. RSVP is designed for the
integrated service model and is used to reserve the resources of nodes along a path.
RSVP works on the transport layer and does not transmit application data. This is
because RSVP is a network control protocol, similar to the Internet Control Message
Protocol (ICMP).
RSVP-TE sets up constraint-based routed LSPs (CR-LSPs).
Unlike LDP LSPs, CR-LSPs support the following parameters:
– Bandwidth reservation requests
– Bandwidth constraints
– Link colors
– Explicit paths
l Multiprotocol Extensions for BGP (MP-BGP)
MP-BGP is an extension to BGP. MP-BGP defines community attributes. MP-BGP
supports label distribution for packets transmitted over MPLS virtual private network
(VPN) routes and labeled inter-AS VPN routes.
10.1.2.3 MPLS Forwarding
Basic Concepts of MPLS Forwarding

l Tunnel ID
The system automatically allocates an ID to each tunnel. A tunnel ID uniquely identifies
a tunnel interface for a specific upper layer application, such as VPN or route
management. The tunnel ID is meaningful only to the local node.
The tunnel ID is 32 bits long. Each field contained in a tunnel ID varies depending on the
tunnel type.

Figure 10-10 shows the tunnel ID structure.
Figure 10-10 Tunnel ID structure
0 31
Token Sequence-number Slot-number Allocation Method
The description of each field is as follows.
Table 10-2 Description of each field of a tunnel ID
Field | Description
Token | An index used to search an MPLS forwarding table for a specific entry
Sequence-number | Tunnel ID sequence number
Slot-number | Slot number of an outbound interface that sends packets
Allocation Method | Method used to allocate tokens:
l Global: All tunnels on a node share the public global token space. Each token must have a
unique value.
l Global with reserved tokens: Similar to the global method except that some tokens are
reserved. Tunnels can only be established using unreserved tokens.
l Per slot: Each slot uses its own tokens, each of which has a unique value within the slot.
The tokens in one slot may have the same values as those in other slots.
l Per slot with reserved tokens: Similar to the per slot method except that some tokens are
reserved. Tunnels can only be established using unreserved tokens.
l Per slot with different avail value: Similar to the per slot method except that a specific
token range is allocated to each slot.
l Mixed: Label spaces are created using both global and per slot methods but take effect
based on the interface type. VLANIF interfaces or the interfaces of a backbone network use
the label space created using the global method. Other interfaces use the label space created
using the per slot method.
l Mixed with 2 global space: Label spaces are created using the global, global with reserved
tokens, and per slot methods.
l 2 global space: Label spaces are created using the global and global with reserved tokens
methods.

l Next hop label forwarding entry (NHLFE)

An NHLFE is used to guide MPLS packet forwarding.
An NHLFE contains the following information:
– Tunnel ID
– Outbound interface
– Next hop
– Outgoing label
– Label operation
l Incoming label map (ILM)
An ILM entry defines the mapping between an incoming label and a set of NHLFEs.
An ILM entry contains the following information:
– Tunnel ID
– Incoming label
– Inbound interface
– Label operation
A transit node creates ILM entries containing the mapping between labels and NHLFEs.
Before forwarding a packet, the node searches the ILM table for an entry that matches the
incoming label of the packet.
l FEC-to-NHLFE (FTN) map
An FTN entry defines the mapping between a FEC and a set of NHLFEs.
The FTN entry is only available on the ingress. You can obtain FTN information by
searching for non-0x0 token values in a FIB.
MPLS Forwarding Process

In the following example, a PHP-capable LSP is established to forward MPLS packets.
Figure 10-11 MPLS label distribution and packet forwarding
(Figure: labels for the FEC 3.3.3.3/32 are distributed from the egress toward the ingress as Label=X, Label=Y, and Label=Z. For packet transmission, the ingress pushes Label Z onto an IP packet destined for 3.3.3.3, the transit node swaps Label Z for Label Y, the penultimate node pops the label (PHP), and the egress receives and forwards the IP packet.)
An LSP for the FEC with the destination address 3.3.3.3/32 is established on the MPLS network.

The process of forwarding MPLS packets is as follows:

1. The ingress receives an IP packet destined for 3.3.3.3/32. The ingress adds Label Z to the
packet and forwards the packet to the adjacent transit node.
2. The transit node receives the labeled packet, swaps Label Z for Label Y in the packet,
and forwards the packet to the penultimate transit node.
3. The penultimate transit node receives the packet with Label Y. As the egress allocated
Label 3 to the penultimate transit node, the node removes Label Y and forwards the IP
packet to the egress.
4. The egress receives the IP packet and forwards it to 3.3.3.3/32.
MPLS Processing on Each Node

When an IP packet enters an MPLS domain, the ingress searches the FIB and checks whether
the tunnel ID mapped to the destination IP address is 0x0.
l If the tunnel ID is 0x0, the packet is forwarded along an IP link.
l If the tunnel ID is not 0x0, the packet is forwarded along an LSP.
Figure 10-12 shows the MPLS forwarding flow.
Figure 10-12 MPLS forwarding flow
Ingress: FIB (FEC, Tunnel ID) -> NHLFE (Tunnel ID, Out Interface, Next Hop, OutLabel, Operation)
Transit: ILM (InLabel, Tunnel ID) -> NHLFE (Tunnel ID, Out Interface, Next Hop, OutLabel, Operation)
Egress: ILM (InLabel)
Nodes along an LSP search the following tables for entries used to forward MPLS packets:
1. The ingress searches the FIB and NHLFE tables.
2. The transit node searches the ILM and NHLFE tables.
3. The egress searches the ILM table.
FIB entries, ILM entries, and NHLFEs are associated with each other using the token field in
a tunnel ID.
l The ingress performs the following steps:
a. Searches the FIB table and finds a tunnel ID mapped to a specific destination IP
address.
b. Finds an NHLFE mapped to the tunnel ID in the FIB table and associates the FIB
entry with the NHLFE.
c. Searches the NHLFE table for the outbound interface name, next-hop IP address,
outgoing label value, and label operation. The label operation type is Push.

d. Pushes a label into the IP packet, processes the EXP field based on a specific QoS
policy, processes the TTL field, and sends the encapsulated MPLS packet to a transit node.
l A transit node performs the following steps:
a. Searches the ILM table mapped to an MPLS label for the token.
b. Finds the NHLFE mapped to the token in the ILM table and associates the ILM
entry with the NHLFE.
c. Searches the NHLFE table for the outbound interface name, next-hop IP address,
outgoing label value, and label operation.
d. Processes the MPLS packets based on the specific label value:
n If the label value is greater than or equal to 16, the label operation is Swap.
The transit node performs the following operations:
○ Replaces the existing label with a new label in the MPLS packet.
○ Processes the EXP field and TTL field.
○ Forwards the MPLS packet with the new label to the egress.
n If the label value is 3, the label operation is Pop. The transit node performs the
following operations:
○ Removes the label from the MPLS packet.
○ Processes the EXP field and TTL field.
○ Forwards the packet over IP routes or based on the next layer label.
l The egress performs the following steps:
a. Searches for the label operation. The operation is Pop.
b. Processes the EXP field and TTL field.
c. Determines the forwarding path:
– When the S field in the label is equal to 1, the label is at the bottom of the stack.
Therefore, the egress forwards the packet over an IP route.
– When the S field in the label is equal to 0, the label is not at the bottom of the stack.
Therefore, the egress forwards the packet based on the next layer label.
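The table lookups on the ingress and transit nodes can be sketched as follows. This is a simplified model under stated assumptions: the tables are plain dictionaries keyed by tunnel ID, the interface and node names are hypothetical, EXP and TTL processing is omitted, and the label values 1025 (Label Z) and 1024 (Label Y) are arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

IMPLICIT_NULL = 3


@dataclass
class NHLFE:
    out_interface: str
    next_hop: str
    out_label: int


def ingress_forward(fib: Dict[str, int], nhlfes: Dict[int, NHLFE],
                    fec: str) -> Tuple[str, List[int]]:
    """FIB lookup, then NHLFE lookup. Tunnel ID 0x0 means plain IP forwarding."""
    tunnel_id = fib.get(fec, 0x0)
    if tunnel_id == 0x0:
        return ("ip", [])
    entry = nhlfes[tunnel_id]
    return (entry.next_hop, [entry.out_label])  # Push


def transit_forward(ilm: Dict[int, int], nhlfes: Dict[int, NHLFE],
                    stack: List[int]) -> Tuple[str, List[int]]:
    """ILM lookup on the top label, then NHLFE lookup."""
    entry = nhlfes[ilm[stack[0]]]
    if entry.out_label == IMPLICIT_NULL:
        return (entry.next_hop, stack[1:])                   # Pop (PHP)
    return (entry.next_hop, [entry.out_label] + stack[1:])   # Swap


LABEL_Z, LABEL_Y = 1025, 1024
ingress_fib = {"3.3.3.3/32": 0x1}
ingress_nhlfe = {0x1: NHLFE("if-to-transit", "transit", LABEL_Z)}
transit_ilm = {LABEL_Z: 0x2}
transit_nhlfe = {0x2: NHLFE("if-to-penultimate", "penultimate", LABEL_Y)}
penultimate_ilm = {LABEL_Y: 0x3}
penultimate_nhlfe = {0x3: NHLFE("if-to-egress", "egress", IMPLICIT_NULL)}

hop1 = ingress_forward(ingress_fib, ingress_nhlfe, "3.3.3.3/32")
hop2 = transit_forward(transit_ilm, transit_nhlfe, hop1[1])
hop3 = transit_forward(penultimate_ilm, penultimate_nhlfe, hop2[1])
print(hop1, hop2, hop3)
```

After the penultimate node pops the label, the stack is empty, so the egress receives a plain IP packet and forwards it over an IP route.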
Processing MPLS TTL

An MPLS label contains an 8-bit TTL field. The TTL field has the same function as that in an
IP packet header. MPLS processes the TTL to prevent loops and implement traceroute.
As defined in RFC 3443, MPLS processes TTLs in either uniform or pipe mode. By default,
MPLS processes TTLs in pipe mode.
l Uniform mode
The IP TTL value reduces by one each time it passes through a node in an MPLS
network.
When IP packets enter the MPLS network shown in Figure 10-13, the ingress reduces
the IP TTL value in an IP packet by one and copies the IP TTL into the MPLS TTL field.
Each transit node only processes the MPLS TTL. The egress reduces the MPLS TTL by
one and copies it into the IP TTL field before the packet leaves the MPLS network.

Figure 10-13 TTL processing in uniform mode for incoming traffic
CE to ingress PE: IP TTL 255
Ingress PE to P: MPLS TTL 254, IP TTL 254
P to egress PE: MPLS TTL 253, IP TTL 254
Egress PE to CE: IP TTL 252
l Pipe mode
The IP TTL value decreases by one only when passing through the ingress and egress.
On the network shown in Figure 10-14, the ingress reduces the IP TTL value in packets
by one and sets the MPLS TTL to a specific value. Transit nodes only process the MPLS
TTL. When the egress receives the packets, it removes the MPLS label carrying the
MPLS TTL from each packet and reduces the IP TTL value by one.
Figure 10-14 TTL processing in pipe mode for incoming traffic
CE to ingress PE: IP TTL 255
Ingress PE to P: MPLS TTL 255, IP TTL 254
P to egress PE: MPLS TTL 254, IP TTL 254
Egress PE to CE: IP TTL 253
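The two TTL modes can be compared with a small sketch that reproduces the values in Figure 10-13 and Figure 10-14. The function name and the trace format are hypothetical; the fixed MPLS TTL of 255 in pipe mode is taken from Figure 10-14.

```python
from typing import List, Optional, Tuple

Hop = Tuple[str, int, Optional[int]]  # (node, IP TTL sent, MPLS TTL sent)


def ttl_trace(ip_ttl: int, transit_nodes: int, mode: str,
              pipe_mpls_ttl: int = 255) -> List[Hop]:
    """Return the IP TTL and MPLS TTL that each node sends onward."""
    if mode not in ("uniform", "pipe"):
        raise ValueError("mode must be 'uniform' or 'pipe'")

    # Ingress: decrement the IP TTL, then copy it (uniform) or use a fixed value (pipe).
    ip_ttl -= 1
    mpls_ttl = ip_ttl if mode == "uniform" else pipe_mpls_ttl
    trace: List[Hop] = [("ingress", ip_ttl, mpls_ttl)]

    # Transit nodes process only the MPLS TTL.
    for n in range(transit_nodes):
        mpls_ttl -= 1
        trace.append((f"transit{n + 1}", ip_ttl, mpls_ttl))

    # Egress: copy the decremented MPLS TTL back (uniform) or decrement the IP TTL (pipe).
    if mode == "uniform":
        ip_ttl = mpls_ttl - 1
    else:
        ip_ttl -= 1
    trace.append(("egress", ip_ttl, None))
    return trace


print(ttl_trace(255, 1, "uniform"))  # egress sends IP TTL 252
print(ttl_trace(255, 1, "pipe"))     # egress sends IP TTL 253
```

In uniform mode every node in the MPLS network is visible in the IP TTL, whereas in pipe mode the whole LSP costs only the two decrements applied by the ingress and the egress.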
10.1.2.4 MPLS Ping/Traceroute
Overview
On an MPLS network, when data fails to be transmitted across an LSP, the MPLS control
plane cannot detect the transmission failure, which makes network maintenance difficult.

MPLS ping and traceroute functions provide a mechanism used to detect LSP faults and
locate faulty nodes.
MPLS ping is used to test the network connectivity and host accessibility. MPLS traceroute is
used to check network connectivity and locate network faults.
Similar to the IP ping and traceroute, the MPLS ping and traceroute monitor the LSP
availability using MPLS Echo Request and MPLS Echo Reply messages. These two messages
are sent over UDP with port number 3503. The receiver can distinguish between these
messages based on the received UDP port number.
An MPLS Echo Request message contains information about the FEC of the LSP to be
monitored. The message is sent along the LSP like other packets that belong to the FEC, which
allows the LSP to be monitored. MPLS Echo Request messages are transmitted to the
destination using MPLS, whereas MPLS Echo Reply messages are transmitted to the source using IP.
The destination address in the IP header of the Echo Request message is set to 127.0.0.1/8 and
the IP TTL is set to 1. This prevents the egress from forwarding the message to other nodes.
Format of message
The MPLS Echo Request and MPLS Echo Reply messages use the same format, as shown in
Figure 10-15.
Figure 10-15 Format of the MPLS Echo message

0 7 15 23 31
Version Numbers Global Flags
Message Type Reply Mode Return Code Return Subcode
Sender's Handle
Sequence Number
TimeStamp Sent (seconds)
TimeStamp Sent (microseconds)
TimeStamp Received (seconds)
TimeStamp Received (microseconds)
TLVs
……

Table 10-3 Description of fields of an MPLS Echo message

Field | Description
Version Number | Can only be 1.
Global Flags | The value can be:
l 1: The sender wants the receiver to perform FEC Stack validation.
l 0: The choice is left to the receiver.
Message Type | Type of MPLS Echo message:
l 1: MPLS Echo Request
l 2: MPLS Echo Reply
Reply Mode | How an Echo reply is sent:
l 1: Do not reply
l 2: Reply using an IPv4/IPv6 UDP packet
l 3: Reply using an IPv4/IPv6 UDP packet with Router Alert
l 4: Reply using an application level control channel
Return Code | The Return Code is set to zero by a sender. The receiver can set it to one of the
following values:
l 0: No return code
l 1: Malformed echo request received
l 2: One or more of the TLVs was not understood
l 3: Replying device is an egress for the FEC at stack-depth RSC
l 4: Replying device has no mapping for the FEC at stack-depth RSC
l 5: Downstream mapping mismatch
l 6: Upstream interface index unknown
l 7: Reserved
l 8: Label switched at stack-depth RSC
l 9: Label switched but no MPLS forwarding at stack-depth RSC
l 10: Mapping for this FEC is not the given label at stack-depth RSC
l 11: No label entry at stack-depth RSC
l 12: Protocol not associated with interface at FEC stack-depth RSC
l 13: Premature termination of ping due to label stack shrinking to a single label
Return Subcode | RSC value.
Sender's Handle | Value of sender's handle.
Sequence Number | Used to detect missed replies.
TimeStamp Sent | Time (in seconds and microseconds) when the MPLS Echo Request was sent.
TimeStamp Received | Time (in seconds and microseconds) when the corresponding MPLS Echo Request was received. It is carried in an MPLS Echo Reply message.
TLVs | Carried in the MPLS Echo messages.
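As an illustration of the fixed part of the message in Figure 10-15, the following sketch packs and parses the header fields that precede the TLVs. The function names and default values are assumptions made for the example; the field widths follow the figure (16-bit Version Number and Global Flags, four 8-bit fields, and six 32-bit fields).

```python
import struct

# Version Number, Global Flags, Message Type, Reply Mode, Return Code,
# Return Subcode, Sender's Handle, Sequence Number, and four timestamp words.
ECHO_HEADER = struct.Struct("!HHBBBBIIIIII")

FIELD_NAMES = (
    "version", "global_flags", "message_type", "reply_mode",
    "return_code", "return_subcode", "senders_handle", "sequence_number",
    "sent_seconds", "sent_microseconds", "received_seconds", "received_microseconds",
)


def build_echo_request(handle: int, sequence: int, sent_sec: int, sent_usec: int,
                       validate_fec: bool = False, reply_mode: int = 2) -> bytes:
    """Build the fixed header of an MPLS Echo Request (Message Type 1)."""
    return ECHO_HEADER.pack(
        1,                          # Version Number can only be 1
        1 if validate_fec else 0,   # Global Flags
        1,                          # Message Type: MPLS Echo Request
        reply_mode,                 # 2: reply using an IPv4/IPv6 UDP packet
        0, 0,                       # Return Code and Return Subcode: zero in a request
        handle, sequence,
        sent_sec, sent_usec,
        0, 0,                       # TimeStamp Received: filled in by the replier
    )


def parse_echo_header(data: bytes) -> dict:
    """Decode the fixed header of an MPLS Echo Request or Reply."""
    return dict(zip(FIELD_NAMES, ECHO_HEADER.unpack(data[:ECHO_HEADER.size])))


request = build_echo_request(handle=0x1234, sequence=1, sent_sec=1000, sent_usec=500)
print(len(request))                 # 32
print(parse_echo_header(request))
```

A complete message would append the TLVs after these 32 bytes and carry the result in a UDP packet with port number 3503.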
Figure 10-16 shows the format of the TLV carried in the MPLS Echo Request and MPLS
Echo Reply messages used to monitor LDP LSPs. Figure 10-17 shows the format of the TLV
carried in the MPLS Echo Request and MPLS Echo Reply messages used to monitor RSVP
LSPs.
Figure 10-16 Format of an LDP LSP TLV

0 7 15 23 31
Type = 1 (FEC TLV) Length = 12
sub-Type = 1 (LDP IPv4 FEC) Length = 5
IPv4 prefix
Prefix Length Must be zero
Figure 10-17 Format of an RSVP LSP TLV

0 7 15 23 31
Type = 1 (FEC TLV) Length = 24
sub-Type = 3 (RSVP IPv4 LSP) Length = 20
IPv4 tunnel end point address
Must be zero Tunnel ID
Extended Tunnel ID
IPv4 tunnel sender address
Must be zero LSP ID
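The LDP LSP TLV in Figure 10-16 can be encoded as shown in the following sketch. The function name is hypothetical, and the sketch assumes that each Type and Length field is 16 bits wide and that the 5-byte sub-TLV value is padded with three zero bytes, which is what makes the outer Length equal to 12.

```python
import socket
import struct


def ldp_ipv4_fec_tlv(prefix: str, prefix_length: int) -> bytes:
    """Encode a FEC TLV (Type 1) that carries one LDP IPv4 FEC sub-TLV (sub-Type 1)."""
    if not 0 <= prefix_length <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    value = socket.inet_aton(prefix) + bytes([prefix_length])   # 5 bytes
    padding = b"\x00" * 3                                       # "Must be zero"
    sub_tlv = struct.pack("!HH", 1, len(value)) + value + padding
    return struct.pack("!HH", 1, len(sub_tlv)) + sub_tlv


tlv = ldp_ipv4_fec_tlv("4.4.4.4", 32)
print(tlv.hex())  # 0001000c000100050404040420000000
```

The RSVP LSP TLV in Figure 10-17 follows the same Type/Length nesting with a 20-byte sub-TLV value.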

MPLS Ping
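Figure 10-18 MPLS network
(Figure: ATNA, CX-B, CX-C, and ATND are connected in a chain, and an LSP runs from ATNA to ATND. The loopback addresses shown are 5.5.5.5/32 at the ATNA end and 4.4.4.4/32 at the ATND end. The links use 1.1.1.1/30 and 1.1.1.2/30 between ATNA and CX-B, 2.2.2.1/30 and 2.2.2.2/30 between CX-B and CX-C, and 3.3.3.1/30 and 3.3.3.2/30 between CX-C and ATND.)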
As shown in Figure 10-18, an LSP whose FEC is identified with the destination of ATND is
established on ATNA. ATNA uses the MPLS ping feature to monitor the LSP:
1. ATNA checks whether the LSP exists. For a TE tunnel, ATNA checks whether the tunnel
interface exists and whether a CR-LSP is established successfully. If the LSP does not
exist, an error message is returned, and ATNA stops pinging. If the LSP exists, ATNA
proceeds with the following steps.
2. ATNA constructs an MPLS Echo Request packet. The destination address in the IP
packet header is 127.0.0.1/8 and the IP TTL is 1. ATNA searches for a matching LSP and
pushes a label (with the TTL of 255) of the LSP into the packet. Then, ATNA sends the
packet to CX-B.
3. CX-B and CX-C that serve as transit nodes forward the MPLS Echo Request packet as a
common MPLS packet.
If a transit node fails to forward the packet, the transit node returns a reply message
carrying the error code.
4. When the MPLS forwarding path is working properly, transit nodes forward the packet
successfully to ATND, namely, the egress of the LSP. ATND processes the packet and
replies with an MPLS Echo Reply packet.
MPLS Traceroute
As shown in Figure 10-18, ATNA uses the MPLS traceroute feature to monitor an LSP with
the destination address of 4.4.4.4/32:
1. ATNA checks whether an LSP exists.
– If the LSP exists, ATNA proceeds with the following steps.
– If the LSP does not exist, an error message is returned, and ATNA stops tracing the
route.
2. ATNA constructs an MPLS Echo Request packet. The destination address is 127.0.0.1/8
in the IP packet header and the IP TTL is 1. ATNA searches for a matching LSP and
pushes a label (with the TTL value of 1) of the LSP into the packet. Then, ATNA sends
the packet to CX-B. CX-B receives this packet and the TTL of the label times out. Then,
an MPLS Echo Reply message is returned. The destination UDP port and the destination
IP address of the MPLS Echo Reply message are the source UDP port and the source IP
address, respectively, of the MPLS Echo Request packet. The IP TTL is 255.
3. After receiving the MPLS Echo Reply message, ATNA sends an MPLS Echo Request
packet. The TTL of the label is 2. CX-B forwards this packet as a common MPLS
packet. CX-C receives this packet and the TTL of the label times out. Then, an MPLS
Echo Reply message is returned.
4. After receiving the MPLS Echo Reply message, ATNA sends an MPLS Echo Request
packet. The TTL of the label is 3. CX-B and CX-C forward this packet as a common
MPLS packet. ATND receives the packet and finds that the destination address of the
packet is a local loopback IP address. Then, ATND returns an MPLS Echo Reply message.
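The traceroute steps above amount to sending probes with an increasing label TTL until the egress answers. The following sketch models that loop under simplifying assumptions: the function name `mpls_traceroute` and the `max_ttl` limit are hypothetical, and every node is assumed to reply.

```python
def mpls_traceroute(path, max_ttl=30):
    """Return a list of (label TTL, replying node) pairs.

    path lists the nodes after the ingress, ending with the egress,
    for example ["CX-B", "CX-C", "ATND"].
    """
    hops = []
    if not path:
        return hops
    for label_ttl in range(1, max_ttl + 1):
        # The probe is label-switched until its label TTL times out or
        # it reaches the egress; that node returns the MPLS Echo Reply.
        index = min(label_ttl, len(path)) - 1
        hops.append((label_ttl, path[index]))
        if index == len(path) - 1:
            break  # the egress replied, so the trace is complete
    return hops
```

With the path from Figure 10-18, the probes with label TTL 1, 2, and 3 are answered by CX-B, CX-C, and ATND respectively.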
10.1.3 Applications
10.1.3.1 MPLS-based VPN
A traditional virtual private network (VPN) transmits private network data over a public
network using tunneling protocols, such as Generic Routing Encapsulation (GRE), Layer
2 Tunneling Protocol (L2TP), and Point-to-Point Tunneling Protocol (PPTP).
The MPLS-based VPN technology establishes LSPs to connect private network branches
within a single VPN and to connect VPNs. Figure 10-19 shows the devices in the MPLS-
based VPN.
The following devices are deployed on the MPLS-based VPN:
l Customer edge (CE): an edge device on a customer network. The CE can be a router, a
switch, or a host.
l Provider edge (PE): an edge device on a service provider network.
l Provider (P): a backbone device on an SP network. A P is not directly connected to CEs.
A P only needs basic MPLS forwarding capabilities and does not maintain VPN
information.
Figure 10-19 MPLS-based VPN
[Figure: CE1, CE2, and CE3 in VPN branches 1, 2, and 3 connect to PE1, PE2, and PE3 on the backbone network.]
The principles of MPLS-based VPN are as follows:

l PEs manage VPN users, establish LSPs between themselves, and advertise routes to
VPN sites.
l LDP or MP-BGP is used to allocate routes.
l The MPLS-based VPN supports IP address multiplexing between sites and the
interconnection of VPNs.
10.1.3.2 PBR to an LSP
Policy-based routing (PBR) enables the ATN to select routes based on a user-defined policy,
which helps transmit traffic securely or balance traffic. On an MPLS network, IP packets that
meet a PBR policy can be forwarded along a specified LSP.
In Figure 10-20, ATN-A, ATN-B, ATN-C, ATN-D, and ATN-E are in the original network.
ATN-F and ATN-G are added to provide new services. Traffic is forwarded as follows:
l Traffic for original services is forwarded through the original network.
l Traffic for new services is forwarded by ATN-F and ATN-G.
Figure 10-20 Application of the PBR to an LSP
[Figure: original network of ATN-A, ATN-B, ATN-C, ATN-D, and ATN-E, with ATN-F and ATN-G added for new services.]
To allow part of the new services to pass through the original network, the PBR can be
configured on ATN-A. The services matching a specific PBR policy can travel along LSPs
over the original network.
You can also use PBR to an LSP together with LDP FRR to divert some traffic to the
backup LSP for load balancing.
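The idea of steering matching packets onto a specified LSP can be sketched as a first-match lookup. This is only an illustration: the policy format, the match on source prefix, the prefix 10.1.0.0/16, and the LSP name are all hypothetical and are not taken from the ATN configuration model.

```python
import ipaddress

# Hypothetical policy: (source prefix, LSP name). The first match wins.
PBR_POLICY = [
    (ipaddress.ip_network("10.1.0.0/16"), "lsp-over-original-network"),
]
DEFAULT_PATH = "routing-table lookup"


def select_path(src_ip, policy=PBR_POLICY):
    """Return the LSP chosen by the policy, or fall back to normal routing."""
    addr = ipaddress.ip_address(src_ip)
    for prefix, lsp in policy:
        if addr in prefix:
            return lsp
    return DEFAULT_PATH
```

Packets that match no rule keep following the routing table, which corresponds to traffic that is not diverted by PBR.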

Terms
Term | Definition
Label space | A value range of labels.
ILM | The incoming label map (ILM) indicates the mapping between an incoming label and a set of NHLFEs. The ILM contains the following information: Tunnel ID, incoming label, and incoming interface.
LDP peer | Two LSRs with an LDP session that use LDP to exchange label or FEC mapping information.
LDP identifier | A value that is used to identify a specified LSR label space.
NHLFE | The next hop label forwarding entry (NHLFE), which can guide MPLS packet forwarding. An NHLFE contains the following information: Tunnel ID, outgoing interface, next hop, outgoing label, and label operation.
PHB | Per-hop behavior describes how the packets with the same DSCP value are forwarded to the next hop. The PHB records certain traffic attributes, such as latency and the packet loss ratio.
Control plane | The control plane is connectionless and responsible for distributing labels, creating the label forwarding table, and creating or deleting LSPs.
Forwarding plane | The forwarding plane, also known as the data plane, is connection-oriented. It can apply services and protocols of ATM, Frame Relay, and Ethernet networks. The forwarding plane is mainly responsible for adding labels to and deleting labels from IP packets. It also forwards the received packets based on the label forwarding table.

Abbreviation
DoD downstream-on-demand
DU downstream unsolicited
FEC forwarding equivalence class

ILM incoming label map
LAM label advertisement mode
LER label edge router
LFIB label forwarding information base
LSP label switched path
LSR label switching router
NHLFE next hop label forwarding entry
PHP penultimate hop popping
10.2 MPLS LDP
10.2.1 Introduction
Definition
The Label Distribution Protocol (LDP) is a control protocol of Multiprotocol Label Switching
(MPLS). It is similar to a signaling protocol working on a traditional network. It classifies
packets based on forwarding equivalence classes (FECs), distributes labels, and establishes
and maintains label switched paths (LSPs). In addition, LDP defines the messages and
procedures for distributing labels.
Purpose
MPLS supports multiple layers of labels, and its forwarding plane is connection-oriented.
This scalability enables MPLS/IP-based networks to provide various services. Label
switching routers (LSRs) run LDP to map routing information at the network layer to the
switched paths at the data link layer, and establish LSPs at the network layer. LDP features
simple networking and configuration, supports route topology-driven establishment of LSPs
and large-capacity LSPs, and is widely used to provide virtual private network (VPN)
services.
10.2.2 Principles
10.2.2.1 Concepts
The MPLS architecture supports multiple label distribution protocols, among which LDP is
widely used. Label switching routers (LSRs) exchange LDP messages to obtain information
about incoming labels, next-hop nodes, and outgoing labels for specified FECs so that they
can establish LSPs. For LDP specifications, see RFC 5036 titled "LDP Specification."
LDP Adjacency
When an LSR receives a Hello message from a peer, the LSR establishes an adjacency with
the peer. An LDP adjacency maintains a peer relationship between the two LSRs. There are
two types of LDP adjacencies:
l Local adjacency: established by exchanging Link Hello messages between two LSRs.
l Remote adjacency: established by exchanging Targeted Hello messages between two LSRs.
LDP Peers
Two LDP peers establish an LDP session and exchange Label Mapping messages over the
session so that they can establish an LSP.
LDP peers learn each other's labels through the LDP session between them.
LDP Sessions
An LDP session established between LSRs helps them exchange messages, such as Label
Mapping messages and Label Release messages. LDP sessions are classified into the
following types:
l Local LDP session: established between two LSRs that are directly connected.
l Remote LDP session: established between two LSRs that are directly or indirectly
connected.
The local and remote LDP sessions can be established simultaneously.
LDP Dynamic Capability Announcement Function

The LDP dynamic capability announcement function allows an LDP extension to be
dynamically enabled or disabled on a device in an LDP session, improving LSP stability.
Relationships Between LDP Adjacencies, Peers, and Sessions

LDP maintains peer relationships over adjacencies. The type of a peer depends on the type of
the adjacencies that maintain it. A peer can be maintained using multiple adjacencies. If a
peer is maintained by both local and remote adjacencies, the peer is of the coexistent local
and remote type. An LDP session can be created only after the two LSRs establish a peer
relationship.
Types of LDP Messages

LSRs exchange the following messages:
l Discovery message: used to announce and maintain the presence of an LSR on an MPLS
network.
l Session message: used to establish, maintain, or terminate an LDP session between LDP
peers.
l Advertisement message: used to create, modify, or delete a mapping between a specific
FEC and label.
l Notification message: used to provide advisory information or error information.

LDP transmits Discovery messages using the User Datagram Protocol (UDP) and transmits
Session, Advertisement, and Notification messages using the Transmission Control Protocol
(TCP).
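The mapping between message categories and transport protocols stated above can be written down as a small lookup. The enum and function names are hypothetical. Port 646 is the LDP port mentioned later in this section for Targeted Hello messages, and it is also the well-known LDP port for TCP.

```python
from enum import Enum


class LdpMessageCategory(Enum):
    DISCOVERY = "Discovery"
    SESSION = "Session"
    ADVERTISEMENT = "Advertisement"
    NOTIFICATION = "Notification"


LDP_PORT = 646  # well-known LDP port for both UDP and TCP


def transport_for(category):
    """Return the transport protocol that carries the given message category."""
    if category is LdpMessageCategory.DISCOVERY:
        return "UDP"
    return "TCP"
```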
Label Spaces and LDP Identifiers

l Label space
A label space defines a range of labels allocated between LDP peers, which can be
categorized into the following types:
– Per-Platform Label Space:
All interfaces on an LSR share a single label space.
l LDP ID
An LDP identifier identifies a label space used by a specified LSR. A 6-byte LDP
identifier consists of a 4-byte LSR ID and a 2-byte label space ID. An LDP identifier is in
the format of <LSR ID>:<Label space ID>.
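The 6-byte layout of an LDP identifier can be illustrated with a short encoder and formatter. This is a sketch: the function names are hypothetical, and writing the LSR ID in dotted-decimal notation is an assumption based on common practice. A label space ID of 0 denotes the per-platform label space.

```python
import ipaddress
import struct


def pack_ldp_id(lsr_id, label_space_id=0):
    """Encode an LDP identifier: 4-byte LSR ID followed by 2-byte label space ID."""
    return ipaddress.IPv4Address(lsr_id).packed + struct.pack("!H", label_space_id)


def format_ldp_id(raw):
    """Render a 6-byte LDP identifier as <LSR ID>:<Label space ID>."""
    if len(raw) != 6:
        raise ValueError("an LDP identifier is exactly 6 bytes")
    lsr_id = ipaddress.IPv4Address(raw[:4])
    (label_space_id,) = struct.unpack("!H", raw[4:])
    return f"{lsr_id}:{label_space_id}"
```

For an LSR whose LSR ID is 1.1.1.9 and that uses the per-platform label space, the identifier is written as 1.1.1.9:0.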
10.2.2.2 LDP Sessions
LDP Discovery Mechanism

The LDP discovery mechanism is used by an LSR to discover potential LDP peers. LDP
discovery mechanisms are classified into the following two types:
l Basic discovery mechanism: used to discover directly connected LSR peers on a link.
An LSR periodically sends LDP Hello messages to implement the basic discovery
mechanism and establish a local LDP session.
A Hello message contains an LDP identifier and other information, such as the hold time
and transport address. If the LSR receives an LDP Hello message on an interface, a
potential LDP peer is connected to that interface.
l Extended discovery mechanism: used to discover LSR peers that are not directly
connected on a link.
An LSR periodically sends Targeted Hello messages to a specified address to implement
the extended discovery mechanism and establish a remote LDP session.
The Targeted Hello message, which is a UDP message, is sent to the specified address
through LDP port 646. The Targeted Hello message contains an LDP identifier and other
information, such as the transport address and hold time. If the LSR receives a Targeted
Hello message on an interface, a potential LDP peer is reachable through that interface.
Procedures for Establishing an LDP Session

Two LSRs send Hello messages to each other to trigger the establishment of an LDP session.
Figure 10-21 shows the procedures for establishing an LDP session.

Figure 10-21 Procedures for establishing an LDP session
[Figure: message sequence between LSR-A (active role, 192.168.1.2/32) and LSR-B (passive role, 192.168.1.1/32). Step 1: Hello messages. Step 2: TCP connection. Step 3: the active role sends an Initialization message to negotiate parameters. Step 4: when the parameters are received, an Initialization message and a Keepalive message are sent. Step 5: when the parameters are received, a Keepalive message is sent.]
The procedure for establishing an LDP session is as follows:
1. Two LSRs send a Hello message to each other. The Hello message contains the transport
address that the two parties use to establish an LDP session. The LSR with the larger
transport address initiates a TCP connection and functions as the active role. As shown
in Figure 10-21, LSRA starts to establish a TCP connection and functions as the active
role, and LSRB waits for the TCP connection request and functions as the passive role.
2. After the TCP connection is successfully established, LSRA sends an Initialization
message to negotiate parameters used to establish the LDP session with LSRB. These
parameters include the LDP version, label distribution mode, value of the Keepalive
timer, maximum length of PDUs, and label space.
3. After LSRB receives the Initialization message, either of the following situations occurs:
– If LSRB rejects some parameters, it sends a Notification message to instruct LSRA
to terminate the process of establishing the LDP session. The whole process ends.
– If LSRB accepts all parameters, it sends an Initialization message and a Keepalive
message to LSRA.
4. After LSRA receives the Initialization message, either of the following situations occurs:
– If the active role LSRA cannot accept some parameters, it sends a Notification
message to LSRB to terminate the process of establishing the LDP session.
– If LSRA accepts all parameters, it sends a Keepalive message to LSRB.
After both LSRs receive the Keepalive messages from each other, the LDP session is
successfully established.
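The role election and the message exchange described above can be sketched as follows. The function names and the log strings are hypothetical, and parameter negotiation is reduced to two Boolean flags, so this is a simplified model rather than a protocol implementation.

```python
import ipaddress


def active_role(addr_a, addr_b):
    """Return the transport address of the LSR that initiates the TCP connection."""
    a = ipaddress.ip_address(addr_a)
    b = ipaddress.ip_address(addr_b)
    if a == b:
        raise ValueError("transport addresses must differ")
    # The LSR with the larger transport address takes the active role.
    return str(max(a, b))


def establish_session(passive_accepts, active_accepts):
    """Return (message log, session established) after the TCP connection is up."""
    log = ["active -> passive: Initialization"]
    if not passive_accepts:
        log.append("passive -> active: Notification")
        return log, False
    log.append("passive -> active: Initialization + Keepalive")
    if not active_accepts:
        log.append("active -> passive: Notification")
        return log, False
    log.append("active -> passive: Keepalive")
    return log, True
```

With the addresses in Figure 10-21, `active_role("192.168.1.2", "192.168.1.1")` selects 192.168.1.2, which matches LSR-A taking the active role.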
Automatically Established Remote LDP Session

A common remote LDP session is manually configured on the two devices at both ends of the
session. In some scenarios, however, a local device needs to automatically establish remote
LDP sessions with its peers. For example, in a Remote LFA FRR scenario (for details, see
10.2.2.10 LDP FRR), after an ingress uses the Remote LFA algorithm to calculate a PQ node,
the ingress needs to run LDP to automatically establish a remote LDP session with the
destination IP address set to the PQ node's IP address.
After a Remote LFA-enabled LSR receives a Targeted Hello message with the R bit of 1, the
LSR automatically establishes a remote LDP peer relationship with its peer and replies with a
Targeted Hello message with the R bit of 0, which triggers the establishment of a remote LDP
session. The R bit of 1 in the Targeted Hello message indicates that the receive end is
requested to periodically reply with Targeted Hello messages. The R bit of 0 indicates that
the receive end does not need to periodically reply with Targeted Hello messages. If the LSR
no longer receives Targeted Hello messages with the R bit of 1, the LSR deletes the
established remote LDP session.
10.2.2.3 Advertising and Managing Labels

LDP peers send messages, such as Label Mapping messages, over an LDP session to
exchange label information with each other to establish an LSP. RFC 5036 defines the label
advertisement, distribution control, and retention modes.
NOTE
On the ATN, LDP by default works in the DU label advertisement mode, ordered label control mode,
and liberal label retention mode.
The ATN supports the combinations of the following modes:

l Combination of the DU label advertisement mode, ordered label control mode, and
liberal label retention mode
l Combination of the DoD label advertisement mode, ordered label control mode, and
conservative label retention mode
Label Advertisement Mode

An LSR on an MPLS network binds a label to a specific FEC and notifies its upstream LSRs
of the binding. This process is called label advertisement.
Label advertisement modes are as follows:
l Downstream unsolicited (DU)
An LSR binds a label to a specified FEC and notifies its upstream LSR of the binding,
without having to receive a Label Request message sent by an upstream LSR.
As shown in Figure 10-22, the egress sends an unsolicited Label Mapping message to
the upstream transit node to advertise the label of the host route to 192.168.1.1/32.
Figure 10-22 DU mode
[Figure: Ingress, Transit, and Egress (192.168.1.1/32). The egress and the transit node each distribute labels upstream voluntarily.]
l Downstream on demand (DoD)
An LSR binds a label to a specified FEC and notifies its upstream LSR of the binding
only after receiving a Label Request message from an upstream LSR.
As shown in Figure 10-23, the upstream ingress sends the Label Request message. The
downstream egress receives this message and sends the Label Mapping message
upstream to advertise the label of the host route to 192.168.1.1/32.
Figure 10-23 DoD mode
[Figure: Ingress, Transit, and Egress (192.168.1.1/32). Each upstream node requests labels from its downstream node, and the label is distributed after the request is received.]
An upstream LSR and a downstream LSR must use the same label advertisement mode.
Label Distribution Control Mode

The label distribution control mode defines how an LSR distributes labels.
The label distribution control modes are classified into the following categories:
l Independent label distribution control
A local LSR can assign a label bound to a FEC to an upstream LSR even though the
local LSR does not receive a label distributed by a downstream LSR.
– On the network shown in Figure 10-22, if the label advertisement mode is DU and
the label distribution control mode is independent, the transit LSR distributes labels
to the ingress without waiting for labels assigned by the egress.
– On the network shown in Figure 10-23, if the label advertisement mode is DoD and
the label distribution control mode is independent, the directly connected transit
node to which the ingress sends a Label Request message replies with labels
without waiting for labels assigned by the egress.
l Ordered label distribution control
A local LSR sends a Label Mapping message to an upstream LSR only when the local
LSR is the egress or only after it receives a Label Mapping message from a downstream
LSR.
– On the network shown in Figure 10-22, if the label advertisement mode is DU and
the label distribution control mode is ordered, the transit LSR must receive a Label
Mapping message from the downstream egress before it distributes a label to the
ingress.
– On the network shown in Figure 10-23, if the label advertisement mode is DoD and
the label distribution control mode is ordered, the transit node directly connected to
the ingress that sends a Label Request message must receive a Label Mapping message
from the downstream egress before it distributes a label upstream to the ingress.
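The interaction between the label advertisement mode and the label distribution control mode can be condensed into one decision function. This is a sketch of the rules stated above: the function name, the parameter names, and the string values for the modes are hypothetical.

```python
def may_send_label_mapping(adv_mode, control_mode, is_egress,
                           has_downstream_mapping, request_received):
    """Decide whether an LSR may send a Label Mapping message upstream for a FEC."""
    if adv_mode not in ("DU", "DoD"):
        raise ValueError(f"unknown advertisement mode: {adv_mode}")
    if control_mode not in ("independent", "ordered"):
        raise ValueError(f"unknown control mode: {control_mode}")
    # In DoD mode, nothing is advertised until the upstream LSR asks for it.
    if adv_mode == "DoD" and not request_received:
        return False
    if control_mode == "independent":
        return True
    # Ordered control: only the egress, or an LSR that already holds a
    # mapping from its downstream LSR, may advertise upstream.
    return is_egress or has_downstream_mapping
```

For a transit LSR in DU mode that has not yet received a mapping from the egress, the function returns True under independent control and False under ordered control.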
Label Retention Mode

The label retention mode defines how an LSR preserves a Label Mapping message.
The label mapping that an LSR receives may or may not originate at the next hop.
The label retention modes are classified into the following categories:
l Liberal label retention mode
An LSR preserves Label Mapping messages sent by all LSRs, regardless of whether the
sending LSR is its next hop.
l Conservative label retention mode
An LSR preserves Label Mapping messages sent only by next-hop LSRs.
If the next hop of an LSR changes, either of the following situations occurs:
l In liberal mode, the LSR can use an existing label advertised by a non-next-hop LSR to
quickly establish an LSP. (For information about the establishment of an LDP LSP, see
10.2.2.4 LDP LSP Establishment.) Liberal mode requires more memory and label space
than conservative mode.
An LSP that is assigned a label but is not successfully established is called a liberal LSP.
l In conservative mode, the LSR preserves only the label advertised by the new next hop.
This mode saves memory and label space, but the LSP is reestablished more slowly.
Conservative label retention mode is usually used together with DoD on LSRs that
have limited label space.
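The difference between the two retention modes shows up when the next hop changes. The sketch below models only that aspect: the class `LabelStore` and its method names are hypothetical, and releasing labels in conservative mode is not modeled.

```python
class LabelStore:
    """Labels an LSR retains, keyed by (FEC, advertising peer)."""

    def __init__(self, mode):
        if mode not in ("liberal", "conservative"):
            raise ValueError(f"unknown retention mode: {mode}")
        self.mode = mode
        self.labels = {}

    def on_label_mapping(self, fec, peer, label, next_hop):
        # Liberal mode keeps every mapping; conservative mode keeps only
        # mappings advertised by the current next hop.
        if self.mode == "liberal" or peer == next_hop:
            self.labels[(fec, peer)] = label

    def label_for_new_next_hop(self, fec, new_next_hop):
        """Return a retained label for the new next hop, or None if none is held."""
        return self.labels.get((fec, new_next_hop))
```

In liberal mode the label advertised earlier by the new next hop is already available, so the LSP can be reestablished quickly. In conservative mode the lookup returns None and the LSR has to obtain the label again.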
10.2.2.4 LDP LSP Establishment
Process of Establishing an LSP

The LSP establishment is the process of binding a FEC to a label and advertising the binding
to LSRs. The procedure for establishing an LSP in DU label distribution mode and ordered
label control mode is as follows:
1. If an LER finds a new host route in the routing table and the destination IP address in the
host route is mapped to no existing FEC, the LER by default creates a FEC for the
destination IP address.
2. If the egress has available labels, it distributes labels for FECs and proactively sends a
Label Mapping message to an upstream transit LSR. The Label Mapping message
contains distributed labels and bound FECs.
3. After receiving the Label Mapping message, a transit LSR adds the mapping entry to its
label forwarding table and then proactively sends a Label Mapping message of the
specified FEC to the ingress.
4. After receiving the Label Mapping message, the ingress also adds the mapping to its
label forwarding table. An LSP is established, and the packets classified as the FEC can
be forwarded based on the label.
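The DU and ordered procedure above can be sketched as a walk from the egress back to the ingress. This is a simplified model for a single FEC: the function name is hypothetical, the starting label value 1024 is an arbitrary assumption, and penultimate hop popping is ignored.

```python
def establish_lsp(path, first_label=1024):
    """Simulate LSP setup along path (ingress first, egress last).

    Returns a dict that maps each node to (incoming label, outgoing label).
    """
    if len(path) < 2:
        raise ValueError("a path needs at least an ingress and an egress")
    table = {}
    out_label = None  # the egress has no outgoing label
    next_label = first_label
    # Label Mapping messages travel upstream, from the egress to the ingress.
    for index in range(len(path) - 1, -1, -1):
        node = path[index]
        if index == 0:
            # The ingress only pushes the label learned from downstream.
            table[node] = (None, out_label)
        else:
            in_label = next_label  # label allocated by this node
            next_label += 1
            table[node] = (in_label, out_label)
            out_label = in_label   # advertised to the upstream LSR
    return table
```

Each node's outgoing label equals the incoming label of its downstream neighbor, which is what lets packets of the FEC be label-switched end to end.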
LSP Load Balancing

Equal-cost LDP LSPs for the same FEC can be established on the ingress or a transit node to
balance traffic. Equal-cost LDP LSPs are established using equal-cost routes, including IGP
routes or static routes.
NOTE
The maximum number of equal-cost LDP LSPs that can be established on the ingress or a transit node
depends on the device type.

Proxy Egress LSP

A proxy egress extends an LSP to a non-LDP node. The extended LSP is called a proxy
egress LSP. A penultimate LSR functions as a special proxy egress when penultimate hop
popping (PHP) is enabled.
A proxy egress LSP can be established on a network with MPLS-incapable routers or in the
Border Gateway Protocol (BGP) route load balancing scenario. For example, on the network
shown in Figure 10-24, LSRA, LSRB, and LSRC are in an MPLS domain, whereas LSRD is not.
An LSP is established along the path LSRA -> LSRB -> LSRC. LSRC functions as a proxy
egress and extends the LSP to LSRD. The extended LSP is a proxy egress LSP.
Figure 10-24 LDP LSP establishment
[Figure: LSRA (Loopback1 1.1.1.9/32), LSRB (Loopback1 2.2.2.9/32), and LSRC (Loopback1 3.3.3.9/32, proxy egress) in the MPLS domain; LSRD (Loopback1 4.4.4.9/32) in the IP domain.]
10.2.2.5 Delayed LDP Adjacency Deletion
Principles
After the GR Restarter performs an AMB/SMB switchover, the GR Helper's interface may go
Up slowly. As a result, the GR Helper fails to receive the Hello messages in the following
situations:
l A coexistent local and remote LDP session is established between GR-enabled LSRs
(called the GR Restarter and Helper).
l Multiple LDP-enabled links reside between the GR Restarter and Helper.
This causes link protocol timeout on the control plane. The GR Helper cannot receive Hello
messages before the Hello hold timer expires. In this situation, the LDP adjacency goes
Down, without bringing down the LDP session between the GR Restarter and GR Helper.
This is because there are still other LDP adjacencies. As a result, the GR Helper does not
enter the GR process and deletes the LDP LSP for the LDP adjacency that went Down.
To prevent this problem, delayed LDP adjacency deletion can be used. With this function,
LDP deletes an LDP adjacency in the Down state and its LDP LSP only after a specified
delay, ensuring stable LSP traffic transmission.
Implementation
Delayed LDP adjacency deletion is implemented as follows:

1. The GR Restarter performs an AMB/SMB switchover.
2. The GR Helper fails to receive Hello messages from the GR Restarter, causing an LDP
adjacency to go Down.
3. The LDP session does not go Down, because there are other LDP adjacencies between
the GR Restarter and Helper. The GR Helper does not delete the LDP adjacency and its
LDP LSP until the specified deletion delay expires.
4. If the LDP adjacency recovers before the deletion delay expires, services are not
affected. If the LDP adjacency does not recover before the deletion delay expires, the
LDP adjacency and its LSP are deleted.
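The implementation steps above can be modeled with a timestamp and a delay. In this sketch the class `Adjacency`, its method names, and the use of plain numbers for time are hypothetical choices made for illustration.

```python
class Adjacency:
    """An LDP adjacency on the GR Helper with delayed deletion."""

    def __init__(self, deletion_delay):
        self.deletion_delay = deletion_delay  # seconds, hypothetical parameter
        self.state = "Up"
        self.down_since = None

    def hello_timeout(self, now):
        """The Hello hold timer expired: mark the adjacency Down but keep it."""
        if self.state == "Up":
            self.state = "Down"
            self.down_since = now

    def hello_received(self):
        """Hello messages arrive again before deletion: the adjacency recovers."""
        if self.state == "Down":
            self.state = "Up"
            self.down_since = None

    def tick(self, now):
        """Delete the adjacency and its LSP only after the delay has elapsed."""
        if self.state == "Down" and now - self.down_since >= self.deletion_delay:
            self.state = "Deleted"
        return self.state
```

If Hello messages resume before the delay elapses, the adjacency returns to Up and its LSP is never removed, which is the case in which services are not affected.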
Usage Scenario
Delayed LDP adjacency deletion is used when a coexistent local and remote LDP session is
established or multiple LDP-enabled links reside between the GR Restarter and Helper.
l Scenario in which a coexistent local and remote LDP session is established between the
GR Restarter and Helper
On the network shown in Figure 10-25, a coexistent local and remote LDP session is
established between the GR Restarter and Helper and is maintained by the local and
remote LDP adjacencies.
After the GR Restarter performs an AMB/SMB switchover, an interface may go Up
slowly before being able to send Hello messages to the GR Helper. The GR Helper
cannot receive the Hello messages before the Hello hold timer expires. This causes the
LDP local adjacency to go Down. The LDP session does not go Down, because it is
maintained by the remote LDP adjacency. As a result, the GR Helper does not enter the
GR process and deletes the LDP LSP for the local LDP adjacency. This causes traffic
loss during the GR process. In this case, delayed LDP adjacency deletion can be
deployed so that the local adjacency and its LSP are deleted only after the specified
delay. This gives the link protocol time to recover, so traffic transmission continues.
Figure 10-25 Scenario in which a coexistent local and remote LDP session is established
between the GR Restarter and Helper
[Figure: the GR Restarter and GR Helper are connected by a local adjacency and a remote adjacency.]
l Scenario in which multiple links exist between the GR Restarter and Helper
On the network shown in Figure 10-26, an LDP session is established between the GR
Restarter and Helper and is maintained by two local LDP adjacencies.
After the GR Restarter performs an AMB/SMB switchover, an interface may go Up
slowly before being able to send Hello messages to the GR Helper. The GR Helper
cannot receive the Hello messages before the Hello hold timer expires. This causes LDP
local adjacency 1 to go Down. The LDP session does not go Down, because it is
maintained by local LDP adjacency 2. As a result, the GR Helper does not enter the GR
process and deletes the LDP LSP for local LDP adjacency 1. This causes traffic loss
during the GR process. In this case, delayed LDP adjacency deletion can be deployed so
that local adjacency 1 and its LSP are deleted only after the specified delay. As a result,
the link protocol can be recovered to ensure traffic transmission.
Figure 10-26 Scenario in which multiple links exist between the GR Restarter and
Helper
[Figure: the GR Restarter and GR Helper are connected by local adjacency 1 and local adjacency 2.]
Benefits
This function minimizes packet loss during a GR process and helps implement stable traffic
transmission when a coexistent local and remote LDP session is established or multiple LDP-
enabled links reside between the GR Restarter and Helper.
10.2.2.6 LDP-IGP Synchronization
Background
LDP-IGP synchronization enables the LDP status and the IGP status to go Up simultaneously,
which helps minimize traffic interruption time if a fault occurs.
LDP converges more slowly than IGP routes, which can cause traffic loss. On the network
shown in Figure 10-27, which is configured with active and standby links, traffic is dropped
in the following situations:
1. When the active link fails, an IGP route of the standby link becomes reachable. A backup
LSP over the standby link takes over traffic. This switchover is usually implemented using
LDP FRR. After the active link recovers, the IGP route of the active link becomes
reachable before an LDP session is established over the active link. As a result, traffic is
dropped when being transmitted using the reachable IGP route along the unreachable
LSP.
2. When the IGP route of the active link is reachable and an LDP session between nodes on
the active link fails, traffic is directed using the IGP route of the active link, whereas the
LSP over the active link is torn down. Because the IGP route of the standby link is not
the preferred route, an LSP over the standby link cannot be established, causing traffic
loss.

Figure 10-27 Problem to be resolved using LDP-IGP synchronization
[Figure: network of LSR1 through LSR6 with a primary tunnel and a backup tunnel. Callout 1: the primary tunnel recovers after a physical fault is cleared. Callout 2: the IGP status is Up but the LDP session goes Down on the primary tunnel.]
To prevent traffic loss, LDP-IGP synchronization can be used. LDP-IGP synchronization
delays IGP route advertisement and ensures that the IGP route of the standby link and the
backup LSP over the standby link go Up simultaneously after an active/standby link
switchover is performed, minimizing traffic loss.
Related Concepts
LDP-IGP synchronization delays IGP route advertisement to ensure that the LDP session and
IGP route converge simultaneously.
Table 10-4 lists states and timers in LDP-IGP synchronization.
Table 10-4 States and timers in LDP-IGP synchronization
Status | Status Description | Timer | Timer Description
Init | An interface configured with LDP-IGP synchronization goes Down. | None | -
Hold-down | An IGP does not send or receive Hello packets, which delays the establishment of an IGP neighbor relationship after the link recovers. | Hold-down timer | The Hold-down timer starts to count the period for delaying the establishment of the IGP neighbor relationship. If the value is set to 0, the Hold-down timer does not start. The default value is 10, in seconds.
Hold-max-cost | An IGP establishes an IGP neighbor relationship and advertises the maximum cost of the active link. | Hold-max-cost timer | The Hold-max-cost timer starts. Before it expires, an IGP keeps advertising the maximum link cost. If the value is set to 0, the Hold-max-cost timer does not start. The default value is 10, in seconds.
Hold-normal-cost | An IGP advertises the actual cost of the active link. | None | -
Sync-achieved | LDP-IGP synchronization is achieved. The LDP session is Up, an LSP is established, and the IGP advertises the actual cost of the active link. | Delay timer | The Delay timer starts after the LDP session goes Up. The Delay timer counts the period during which an LSP is being established. If the value is set to 0, the Delay timer does not start. The default value is 10, in seconds.
Implementation
l LDP-IGP synchronization state machine
After LDP-IGP synchronization is enabled on an interface, the LDP-IGP synchronization
state machine operates based on the flowchart shown in Figure 10-28.

Figure 10-28 LDP-IGP synchronization status transition
[Figure: state transitions among the Init, Hold-down, Hold-max-cost, Hold-normal-cost, and Sync-achieved states after LDP-IGP synchronization is enabled, driven by interface Up and Down events, LDP session Up and Down events, and expiry of the Hold-down and Hold-max-cost timers.]
NOTE
Note the following differences between IGP protocols:
l When OSPF is used, the status transits based on the flowchart shown in Figure 10-28.
l When IS-IS is used, the Hold-normal-cost state does not exist. After the Hold-max-cost
timer expires, IS-IS advertises the actual link cost, but the displayed state remains
Hold-max-cost even though the interface is no longer in that state.
l State transition during LDP-IGP synchronization
Either of the following processes takes place based on Figure 10-27 and Figure 10-28.
a. The following process takes place after the physical fault is rectified on the active
link:
i. The active link fails and the faulty interface enters the Init state.
ii. The active link recovers and the interface goes Up.
iii. The interface on LSR2 enters the Hold-down state. LSR2 and LSR3 establish
an LDP session. The establishment of an IGP neighbor relationship is delayed.
The Hold-down timer starts if the timer value is not 0.
iv. Traffic keeps traveling through the LSP over the backup link.
v. After the link fault is rectified, LSR2 and LSR3 can discover each other as
LDP peers and reestablish an LDP session (over a reachable LSR2-to-LSR3
route along the path LSR2 -> LSR4 -> LSR5 -> LSR3). They send Label
Mapping messages to each other to establish an LSP and instruct the IGP to
start LDP-IGP synchronization.
vi. An IGP neighbor relationship starts to be established. An IGP route of the
active link becomes reachable. An LSP is reestablished over the active link in
milliseconds.
NOTE
The Hold-down timer can be set. Using the default value is recommended.
b. The following process takes place when the IGP on the active link is working
properly but the LDP session over the active link fails:
i. An LDP session between nodes on the active link fails.
ii. The LDP module notifies the IGP module of the fault. The IGP interface enters
the Hold-max-cost state. An IGP advertises the maximum cost of the active
link and starts the Hold-max-cost timer.
iii. The IGP route of the standby link becomes reachable.
iv. An LSP is established over the standby link and the LDP module on LSR2
delivers forwarding entries.
NOTE
The Hold-max-cost timer can be configured to always advertise the maximum cost of the
active link. This setting allows traffic to keep traveling through the standby link before the
LDP session over the active link is reestablished.
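The cost that the IGP advertises in process b can be sketched as follows. This is a minimal illustration, not the device implementation: the function name, its parameters, and the value 65535 used as the maximum cost are assumptions made for the example.

```python
MAX_COST = 65535  # assumed placeholder for the IGP's maximum link cost


def advertised_cost(actual_cost, ldp_session_up, hold_max_cost_expired,
                    hold_forever=False):
    """Return the cost the IGP advertises for the active link."""
    if ldp_session_up:
        # LDP and the IGP are synchronized: advertise the actual cost.
        return actual_cost
    if hold_forever or not hold_max_cost_expired:
        # Hold-max-cost state: keep traffic on the standby link.
        return MAX_COST
    # The Hold-max-cost timer expired: resume the actual cost.
    return actual_cost
```

With `hold_forever=True`, the sketch models the NOTE above, in which the maximum cost is advertised until the LDP session over the active link is reestablished.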
Other Functions
On the MPLS network shown in Figure 10-29, a graceful restart (GR) process is performed
after LSR2 goes faulty. The LDP session between LSR2 and LSR3 may be established after
the GR process is complete. If LDP-IGP synchronization is enabled on the interface between
LSR2 and LSR3, LSR2 and LSR3 perform the following operations:
l LSR2 functioning as a GR Restarter
a. During the GR process, an IGP advertises the actual cost of the active link and
starts the GR Delay timer that delays the GR completion. The LDP session is
waiting to be established before the GR is complete.
b. After the GR Delay timer expires, the GR is complete. If the LDP session is not
established at this time, the IGP starts the Hold-max-cost timer and advertises the
maximum active link cost of the interface, switching the IGP route to the standby
link.
c. If the LDP session is reestablished or the Hold-max-cost timer expires, the IGP
resumes the actual link cost of the interface, switching the IGP route back to the
active link.
l LSR3 functioning as a GR Helper
a. LSR3 retains the original IGP route and the LSP before the LDP GR is complete.
When the LDP session goes Down, LDP does not notify the IGP link of the session
Down event. In this case, the IGP still advertises the actual link cost, ensuring that
the IGP route is not switched to the standby link.
b. If the LDP session is not established after the GR is complete, the IGP starts the
Hold-max-cost timer and advertises the maximum active link cost of the interface,
switching the IGP route to the standby link.
c. If the LDP session is reestablished or the Hold-max-cost timer expires, the IGP
resumes the actual link cost of the interface, switching the IGP route back to the
active link.

Figure 10-29 Fault occurring on a node during the GR process
[Figure: LSR1 to LSR6 with a primary tunnel and a backup tunnel. GR starts on the faulty
node. One node is labeled GR Restarter and a neighboring node is labeled GR Helper.]
Usage Scenario
Figure 10-30 shows an LDP-IGP synchronization scenario.
On the network shown in Figure 10-30, an active link and a standby link are established.
LDP-IGP synchronization and LDP FRR are deployed.
Figure 10-30 LDP-IGP synchronization scenario
[Figure: A VPN connects CE1 to CE4 through PE1, P1 to P4, and PE2. LDP-IGP synchronization
and LDP FRR are used. A primary tunnel and a backup tunnel are shown.]
Benefits
Packet loss is reduced during an active/standby link switchover or the GR process, improving
network reliability.
10.2.2.7 Synchronization Between LDP and Static Routes

With synchronization between LDP and static routes, the activation of static routes is
suppressed so that traffic switches from a faulty active link to the standby link, and the
traffic switchback to the active link is delayed. This process keeps LDP synchronized with
static routes.
Synchronization between LDP and static routes is used to minimize packet loss during a
traffic switchover or switchback on the network with active and standby links. As shown in
Figure 10-31, on a network with active and standby links, a static route is configured between
LSRA and LSRD, and an LSP between the two devices is established based on the static
route. Normally, Link A is preferred.
Figure 10-31 Networking diagram for LSP switching with synchronization between LDP and
static routes
[Figure: LSRA reaches LSRD over Link A through LSRB and over Link B through LSRC.]
l Switchover scenario
If the link between LSRA and LSRB is working properly but the LDP session between
LSRA and LSRB goes Down, the static route on the active link (Link A) remains reachable
but the LSP on Link A is deleted. Because the static route on the standby link (Link B) is
not active, no LSP can be established on Link B, and the traffic that was carried on the
deleted LSP is lost.
After synchronization between LDP and static routes is enabled on LSRA, the static route
on Link A becomes unreachable when the LDP session between LSRA and LSRB goes Down.
The LDP session between LSRA and LSRC is Up, so the static route on Link B becomes
reachable. As a result, the LSP switches from Link A to Link B and traffic on the LSP is
not interrupted.
l Switchback scenario
If the link between LSRA and LSRB fails, both the static route and the LSP on Link A
switch to Link B. After the link between LSRA and LSRB recovers, the static route
switches back to Link A before the LSP does, because a static route converges faster
than LDP. At that point, the backup LSP on Link B can no longer be used and the LSP on
Link A has not been established yet, so LSP traffic is interrupted.
After synchronization between LDP and static routes is enabled on LSRA, the static
route on Link A does not become reachable until the LDP session between LSRA and
LSRB goes Up. The static route and the LSP then switch back to Link A at the same time,
which prevents traffic loss.
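The activation rule behind both scenarios can be sketched as a single decision. This is a simplified illustration; the function and parameter names are hypothetical and do not correspond to any command or internal interface.

```python
def static_route_active(link_up, ldp_session_up, sync_enabled):
    """Decide whether the static route over a link may be activated."""
    if not link_up:
        return False
    if sync_enabled:
        # With synchronization, the route follows the LDP session state.
        return ldp_session_up
    # Without synchronization, the route only depends on the link.
    return True
```

In the switchback scenario, the link has recovered but the LDP session is not yet Up. Without synchronization the route is activated immediately, whereas with synchronization it stays inactive until the session goes Up.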
10.2.2.8 LDP GR
LDP graceful restart (GR), with the help of a helper, implements uninterrupted forwarding
during an active main board (AMB)/standby main board (SMB) switchover. Without GR,
a neighboring device deletes an LSP during an AMB/SMB switchover because the LDP
session goes Down. As a result, traffic is interrupted for a short period of time. LDP GR
can be configured to retain labels so that MPLS forwarding is not interrupted after an
unexpected AMB/SMB switchover.
LDP GR timers
Defined in RFC 3478, the following LDP GR timers are used:
l MPLS Forwarding State Holding timer: When a GR restarter restarts the LDP protocol,
it sets forwarding entries to the Stale state and starts this timer. After this timer expires,
the GR restarter deletes forwarding entries that are still in the Stale state.
l LDP Reconnect timer: After a GR helper finds that the LDP session established with the
GR restarter goes Down, the helper retains the FEC label mapping for the LDP session,
sets the mapping to the Stale state, and starts the LDP Reconnect timer. If the LDP
session is not established after this timer expires, the GR helper deletes the FEC label
mapping and forwarding entries for the LDP session.
l LDP Recovery timer: After the LDP session is reestablished between the GR helper and
restarter, the GR helper starts the LDP Recovery timer. After this timer expires, the GR
helper deletes the stale FEC label mapping and forwarding entries for the LDP session.
Figure 10-32 describes the usage scenario and timing sequence of using the preceding timers.
Figure 10-32 Usage scenario and timing sequence of using LDP GR timers
[Figure: Two timelines. GR Restarter: an LDP session goes Down, LDP restarts, the LDP
session is reestablished, and a forwarding entry is created, with the MPLS Forwarding State
Holding timer running during this sequence. GR Helper: the helper detects that the LDP
session went Down, the LDP session is reestablished, and a forwarding entry is created,
with the LDP Reconnect timer followed by the LDP Recovery timer.]
LDP GR implementation
Figure 10-33 illustrates the process of implementing LDP GR.
1. Before an AMB/SMB switchover is implemented, LDP peers negotiate GR capabilities
when they establish an LDP session.
2. After the restarter performs an AMB/SMB switchover, it restarts the LDP protocol, starts
the MPLS Forwarding State Holding timer, and sets the forwarding entries used before
the LDP restart to the Stale state.
3. After the helper detects the AMB/SMB switchover performed by the restarter, the helper
sets the FEC label mapping entries sent over the LDP session established between the
helper and restarter to the Stale state. The helper then starts the LDP Reconnect timer.

4. If the LDP session is reestablished between the restarter and helper before the LDP
Reconnect timer expires, the helper stops the LDP Reconnect timer and starts the LDP
Recovery timer.
5. Before the LDP Recovery timer expires, the helper and restarter exchange Label
Mapping messages to restore forwarding entries.
6. After the MPLS Forwarding State Holding timer expires, the restarter deletes the stale
forwarding entries.
7. After the LDP Recovery timer expires, the helper deletes the stale FEC label mapping
and forwarding entries.
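The helper side of steps 3 to 7 can be sketched as a small state model. This is a toy illustration only: the class, its method names, and the use of plain seconds for time are assumptions, and real timer values are negotiated or configured rather than hard-coded.

```python
class GRHelper:
    """Toy model of the GR helper; times are plain seconds."""

    def __init__(self, fecs, reconnect_time, recovery_time):
        self.mappings = dict.fromkeys(fecs, "active")  # FEC -> state
        self.reconnect_time = reconnect_time
        self.recovery_time = recovery_time
        self.phase = "normal"
        self.deadline = None

    def session_down(self, now):
        # Step 3: keep the mappings, mark them Stale, start the Reconnect timer.
        for fec in self.mappings:
            self.mappings[fec] = "stale"
        self.phase, self.deadline = "reconnect", now + self.reconnect_time

    def session_up(self, now):
        # Step 4: stop the Reconnect timer and start the Recovery timer.
        if self.phase == "reconnect" and now < self.deadline:
            self.phase, self.deadline = "recovery", now + self.recovery_time

    def label_mapping(self, fec):
        # Step 5: a Label Mapping message restores the entry.
        self.mappings[fec] = "active"

    def tick(self, now):
        # Reconnect or Recovery timer expiry: delete entries that are still Stale.
        if self.phase != "normal" and now >= self.deadline:
            self.mappings = {f: s for f, s in self.mappings.items()
                             if s == "active"}
            self.phase, self.deadline = "normal", None
```

If the session never comes back, `tick` at the Reconnect deadline removes every mapping. If it comes back in time, only the mappings that were not refreshed before the Recovery deadline are removed.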
Figure 10-33 LDP GR implementation
[Figure: Message sequence between the GR Restarter and the GR Helper. An LDP session is
established and both ends negotiate the GR capability. The Restarter performs an AMB/SMB
switchover and restarts LDP, and the Helper detects the switchover. The LDP session is
reestablished, both ends exchange Label Mapping messages, each end creates a forwarding
entry, and LDP GR ends. The MPLS Forwarding State Holding timer, LDP Reconnect timer,
and LDP Recovery timer are marked alongside the sequence.]
NOTE
The ATN can only function as the GR helper.
10.2.2.9 LDP NSR
Principle of LDP NSR

NOTE
Only ATN 950B supports LDP NSR.
Non-stop routing (NSR) builds on non-stop forwarding (NSF) but differs from it in an
important way. If a software or hardware fault occurs on the control plane, NSR keeps both
forwarding and control plane connections uninterrupted. In addition, the control plane of a
neighbor does not detect the fault.

LDP NSR is implemented through synchronization between the master control board and
slave control board. When the slave control board starts, it backs up data of the master
control board in batches to ensure data consistency on both boards at this stage. Afterward,
LDP NSR notifies the master and slave control boards of received packets at the same time
and in real time, and the slave control board keeps its data synchronized with the master
control board. After a switchover, the slave control board quickly takes over services from
the original master control board, and the neighbor does not detect the fault on the local
router.
LDP NSR synchronizes the following key data between the master control board and slave
control board:
l LSP forwarding entries
l Cross connect (XC) information, used to describe the cross connection between a
forwarding equivalence class (FEC) and an LSP
l Labels, including the following types:
– LDP LSP labels on a public network
– Labels of VCs in Martini mode in a VLL networking
– Labels of VCs in Martini mode in a VPLS networking
– PW labels used by dynamic PWs in a PWE3 networking
l LDP protocol control blocks
10.2.2.10 LDP FRR

LDP fast reroute (FRR) backs up local interfaces to provide the fast reroute function for
MPLS networks.
LDP FRR, in liberal label retention mode, obtains a liberal label, applies for a forwarding
entry for the label, and delivers the forwarding entry to the forwarding plane as the backup
forwarding entry for the primary LSP. If the interface is faulty (detected by the interface itself
or by BFD) or the primary LSP fails (detected by BFD), LDP FRR quickly switches traffic to
the backup LSP to protect the primary LSP.
l Manually configured LDP FRR needs to have the backup LSP's outbound interface name
and next hop address specified manually. When the source of the liberal label matches
the outbound interface and next hop, a backup LSP can be established and its forwarding
entries can be delivered.
l LDP Auto FRR depends on the implementation of IP FRR. When the source of the
preserved liberal label matches the outbound interface and next hop of the backup route,
the requirement for the policy for establishing the backup LSP is met, and no backup
LSP manually configured according to the backup route exists, a backup LSP can be
established and its forwarding entries can be delivered. The default LDP Auto FRR
policy is that LDP can use the backup routes with 32-bit addresses to establish backup
LSPs. When both LDP manual FRR and LDP Auto FRR meet the establishment
conditions, LDP manual FRR backup LSP is established preferentially.

Usage Scenario
Figure 10-34 Typical usage scenario for LDP FRR (triangle topology)
[Figure: LSRA, LSRB, and LSRC connected in a triangle.]
Figure 10-34 shows a typical usage scenario for LDP FRR. The preferred LSRA-to-LSRB
route is LSRA-LSRB, and the second optimal route is LSRA-LSRC-LSRB. A primary LSP
between LSRA and LSRB is established on LSRA, and a backup LSP of LSRA-LSRC-LSRB
is established to protect the primary LSP. After receiving a label from LSRC, LSRA compares
the label with the LSRA-to-LSRB route. Because the next hop of the LSRA-to-LSRB route is
not LSRC, LSRA preserves the label as a liberal label. If either of the following conditions is
met, LSRA can establish a backup LSP from the liberal label:
l The source of a liberal label for LDP manual FRR corresponds to a specified outbound
interface and next hop.
l The backup route corresponding to the source of the liberal label for LDP auto FRR
exists, and its destination meets the policy for LDP to create a backup LSP, and no
backup manual FRR LSP is established over the backup route.
When a condition is met, LSRA applies for a forwarding entry for the liberal label, establishes
a backup LSP as the backup forwarding entry of the primary LSP, and sends the entries mapped
to both the primary and backup LSPs to the forwarding plane. In this way, the primary LSP is
associated with the backup LSP.
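The two conditions can be sketched as an eligibility check on a liberal label. This is a minimal sketch under stated assumptions: the type and function names are hypothetical, a hop is modeled as an (interface, next hop) pair, only IPv4 is considered, and the default Auto FRR policy is reduced to "the FEC has a 32-bit mask".

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Optional, Tuple

Hop = Tuple[str, str]  # (outbound interface, next hop address)


@dataclass
class LiberalLabel:
    fec: str      # for example "10.0.0.2/32"
    source: Hop   # where the liberal label was learned


def backup_kind(label: LiberalLabel, manual_frr: Optional[Hop],
                backup_route: Optional[Hop]) -> Optional[str]:
    """Return "manual", "auto", or None for a liberal label."""
    # Manual FRR is checked first, so it wins when both conditions hold.
    if manual_frr is not None and label.source == manual_frr:
        return "manual"
    # Assumed default Auto FRR policy: only routes with 32-bit masks.
    host_route = ip_network(label.fec).prefixlen == 32
    if backup_route is not None and label.source == backup_route and host_route:
        return "auto"
    return None
```

Checking the manual configuration first reflects the rule that a manually configured backup LSP is established preferentially when both establishment conditions are met.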
LDP FRR is triggered when the interface detects a fault by itself, BFD detects a fault on the
interface, or BFD detects a primary LSP failure. After LDP FRR is complete, traffic is
switched to the backup LSP based on the backup forwarding entry. The route then
converges to LSRA-LSRC-LSRB. A new LSP is established over the converged route (the
path of the original backup LSP), the original primary LSP is torn down, and traffic is
forwarded along the new LSP over the path LSRA-LSRC-LSRB.
Figure 10-35 Typical usage scenario for LDP FRR (rectangle topology)
[Figure: S, D, N1, and N2 connected in a rectangle.]

LDP FRR is applicable to a triangle network with three devices, but may not be
supported on a rectangle network with four devices. On the network shown in Figure
10-35, if the optimal route from N1 to D is N1-N2-D (load balancing is unavailable), S
receives a liberal label from N1 and binds it to LDP FRR. If the link between S and D is
faulty, traffic is switched to the route S-N1-N2-D without forming a loop.
However, if traffic from N1 to D is load balanced between N1-N2-D and N1-S-D, S, as a
downstream neighbor of N1, does not necessarily receive a liberal label from N1. Even if
S receives the liberal label (because LDP distributes labels for all peers) and is configured
with LDP FRR, traffic may still return to S after it switches to N1. This leads to a loop that
lasts until the route from N1 to D converges to N1-N2-D.
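The difference between the two rectangle cases can be checked numerically. The sketch below applies the basic loop-free alternate inequality from RFC 5286, dist(N, D) < dist(N, S) + dist(S, D), which is not stated in this document and is used here only for illustration. The link costs and helper names are made up for the example.

```python
import heapq


def shortest(graph, src):
    """Dijkstra: shortest distance from src to every reachable node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def loop_free(graph, s, n, d):
    """True if neighbor n can reach d without sending traffic back via s."""
    dist_n = shortest(graph, n)
    dist_s = shortest(graph, s)
    return dist_n[d] < dist_n[s] + dist_s[d]


def rectangle(cost_n1_n2):
    """Build the S, D, N1, N2 rectangle with one adjustable link cost."""
    links = {("S", "D"): 10, ("S", "N1"): 10,
             ("N1", "N2"): cost_n1_n2, ("N2", "D"): 10}
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph
```

With `rectangle(5)`, N1-N2-D is strictly better than N1-S-D, so N1 is a loop-free backup next hop for S. With `rectangle(10)`, the two paths have equal cost (the load-balancing case), the strict inequality fails, and traffic sent to N1 may come back to S.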
LDP Remote LFA FRR
LDP LFA FRR may fail to calculate backup paths on large networks, especially ring networks,
and therefore fails to meet reliability requirements. To address this issue, LDP Remote LFA
FRR is used. LDP Remote LFA FRR is LDP Auto FRR implemented on top of IGP Remote
LFA FRR (OSPF IP FRR). Figure 10-36 illustrates the typical LDP Auto FRR usage scenario.
The primary LDP LSP is established over the path PE1 -> PE2. Remote LFA FRR establishes a
Remote LFA FRR LSP over the path PE1 -> P2 -> PE2 to protect the primary LDP LSP.
Figure 10-36 Typical LDP Auto FRR usage scenario - ring topology
[Figure: Ring of PE1, PE2, P1, and P2, where P2 is the PQ node. Legend: primary LDP LSP,
Remote LFA FRR LSP, LDP tunnel, Remote LFA iterated LSP.]
The implementation is as follows:

1. An IGP uses the Remote LFA algorithm to calculate a Remote LFA route with the PQ
node (P2) IP address and the iterated outbound interface's next hop and then notifies the
route management module of the information.
2. LDP obtains the Remote LFA route from the route management module. PE1
automatically establishes a remote LDP peer relationship with the PQ node and a remote
LDP session for the relationship. PE1 then establishes an LDP LSP to the PQ node and a
Remote LFA FRR LSP over the path PE1 -> P2 -> PE2. For information about how to
automatically establish a remote LDP session, see 10.2.2.2 LDP Sessions.

3. LDP-enabled PE1 establishes an LDP LSP over the path PE1 -> P1 -> P2 with the
iterated outbound interface's next hop. This LSP is called a Remote LFA FRR iterated
LSP.
If PE1 detects a fault, PE1 rapidly switches traffic to the Remote LFA FRR LSP.
10.2.2.11 LDP MTU

A maximum transmission unit (MTU) plays an important role when two devices on the same
network interconnect with each other. The MTU determines the maximum number of bytes
that can be transmitted without being segmented by a sender at a time. If the MTU exceeds
the maximum number of bytes supported by the receiver or a transit device, packets are then
fragmented or discarded, which aggravates the network transmission load. To prevent the
problem, devices have to calculate the MTU before communications so that sent packets can
reach the receiver successfully.
Principles
LDP LSP forwarding and common IP forwarding differ greatly in terms of implementation
mechanism but share a large number of similar aspects about the MTU. Both of them are
required to send packets smoothly to the receiver through each hop without reassembly.
The MPLS MTU, like the interface MTU, has a default value and is configurable. Before
informing the upstream device of the LDP MTU, an LSR calculates the LDP MTU by
selecting the smallest value among the MTU values used by all downstream devices and the
MTU of the egress. The LSR adds this smallest MTU value to the MTU TLV of a Label
Mapping message and then sends the message to an upstream device. If either of these MTU
inputs changes due to configuration modifications, or if the outbound interface changes on
the local end, the LSR recalculates the MTU and sends a Label Mapping message
that contains the recalculated MTU to all upstream devices.
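The calculation reduces to taking a minimum. The following sketch is illustrative only; the function and parameter names are hypothetical, and MTU values are given in bytes.

```python
def ldp_mtu(downstream_mtus, egress_mtu):
    """Smallest MTU among the downstream values and the egress MTU."""
    return min([egress_mtu, *downstream_mtus])


# Value placed in the MTU TLV before any change.
before = ldp_mtu([1500, 1500], egress_mtu=1500)
# A downstream device now reports a smaller MTU, so the LSR recalculates
# and would advertise the new value to all upstream devices.
after = ldp_mtu([1500, 1400], egress_mtu=1500)
```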
10.2.2.12 LDP Authentication
LDP MD5
Message-digest algorithm 5 (MD5) is a standard digest algorithm defined in RFC 1321.
Typically, MD5 is used to compute a message digest to prevent message spoofing. The MD5
message digest is calculated by an irreversible algorithm and is intended to be unique to the
message. If a message is modified during transmission, a different digest is generated. After
the message arrives at the receiving end, the receiving end can determine whether the packet
is modified by comparing the received digest with a digest that it computes locally.
LDP MD5 protects LDP packets against modification by generating a digest from each
message. This authentication is stricter than the common checksum verification of TCP
connections.
Before sending packets over a TCP connection, the sender performs LDP MD5 authentication
by adding the message digest after the TCP header. The message digest is computed
using the TCP header, LDP message, and password set by the user.
After receiving this TCP packet, the receiver obtains the TCP header, digest, and LDP
message, and uses MD5 to calculate a digest based on the received TCP header, received LDP
message, and locally stored password. The receiver compares the calculated digest with the
received one to check whether the packet is modified.
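The sender and receiver computations can be sketched with Python's standard hashlib module. This is a simplified illustration: it hashes only the three inputs named above, concatenated in an assumed order, whereas the TCP MD5 signature option in RFC 2385 also covers a pseudo-header. The function names are hypothetical.

```python
import hashlib
import hmac


def ldp_md5_digest(tcp_header: bytes, ldp_message: bytes,
                   password: bytes) -> bytes:
    """Digest over the TCP header, the LDP message, and the password."""
    return hashlib.md5(tcp_header + ldp_message + password).digest()


def verify(tcp_header: bytes, ldp_message: bytes, received_digest: bytes,
           password: bytes) -> bool:
    """Recompute the digest with the local password and compare."""
    expected = ldp_md5_digest(tcp_header, ldp_message, password)
    # compare_digest avoids leaking timing information during comparison.
    return hmac.compare_digest(expected, received_digest)
```

A packet whose message was modified in transit, or that was built with a different password, yields a different digest and fails verification.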

A password can be set in either ciphertext or simple text. A simple text password is recorded
directly in the configuration file. A ciphertext password is recorded in the configuration
file after being encrypted using a special algorithm.
During the calculation of a digest, the manually entered character string is used regardless of
whether the password is in simple text or ciphertext. This means that the encrypted form of a
ciphertext password does not participate in MD5 calculation.
LDP Keychain
Keychain, an enhancement to MD5 authentication, calculates a message digest for the
same LDP message to prevent the message from being modified.
During keychain authentication, a group of passwords is defined to form a password string.
Each password is specified with encryption and decryption algorithms, such as MD5
and SHA-1, and is assigned a validity period. The system selects a valid password
based on the user's configuration. Within the validity period of the password, the system uses
the encryption algorithm matching the password to encrypt the packet before sending it out, or
uses the decryption algorithm matching the password to decrypt the packet before accepting
it. In addition, the system automatically uses a new password after the previous password
expires, which reduces the risk of the password being cracked.
A keychain configuration node consists of the keychain authentication password, the
encryption and decryption algorithms, and the password validity period. These three items
are configured using different commands. A keychain configuration node requires at least one
password and its encryption and decryption algorithms.
To reference a keychain configuration node, specify the desired peer and the name of the node
in the MPLS LDP view so that the LDP session with that peer is protected by keychain
authentication. Different peers can reference the same keychain configuration node.
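Selecting the password that is valid at a given time can be sketched as follows. The data layout, field names, and dates are assumptions made for illustration and do not reflect the configuration syntax.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Key:
    key_id: int
    password: str
    algorithm: str    # for example "md5" or "sha-1"
    start: datetime   # validity period start (inclusive)
    end: datetime     # validity period end (exclusive)


def active_key(keychain, now):
    """Return the first key whose validity period contains now."""
    for key in keychain:
        if key.start <= now < key.end:
            return key
    return None  # no valid password at this time
```

Because validity periods follow one another, the system moves to the next password automatically when the previous one expires.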
10.2.2.13 LDP over TE

RSVP-TE is an MPLS tunnel technique used to generate LSPs as tunnels for other protocols
to transparently transmit packets. LDP is another MPLS tunnel technique, used to
generate LDP LSPs. LDP over TE, a technique used by VPN services, enables LDP LSPs to
cross RSVP-TE areas. In VPN applications, carriers that want to carry out MPLS traffic
engineering have difficulties in deploying TE on the entire network. Instead, carriers can
plan a core TE area where TE is deployed, and implement LDP outside the TE area.
After the RSVP-TE tunnel is established, an IGP (such as OSPF or IS-IS) enables the
outbound interface of routes to select a TE tunnel through local calculation or by advertising
link state advertisements (LSAs). The source device then appears to be directly connected to
the destination device of the TE tunnel through the TE tunnel interface (a logical interface).
In reality, packets are transparently transmitted over the TE tunnel.

Figure 10-37 Networking topology for LDP over TE
[Figure: LSR1 to LSR5 in a line, with LDP LSPs outside the TE area and an RSVP LSP between
LSR2 and LSR4 that passes through BP1 and BP2.]
As shown in Figure 10-37, the entire network is an MPLS VPN that runs LDP as a signaling
protocol and provides common VPN services. LSR1 and LSR5 are PEs. After a large number
of users are connected to the network, all traffic between LSR1 and LSR5 passes through the
link between LSR2 and LSR3. The link is then congested. The link between LSR2 and BP1 is
idle. The LSP, however, cannot use the link between LSR2 and BP1 because the IGP cost of
this link is high.
To prevent traffic congestion, LDP over TE can be deployed. A TE tunnel can be established
between LSR2 and LSR4, and the tunnel passes through BP1 and BP2. The IGP cost value is
adjusted so that routes can be balanced on LSR2 on the following two types of interfaces:
l Physical interfaces of LSR2 and LSR3

l Interfaces of the TE tunnel between LSR2 and LSR4
LDP establishes LSPs to balance traffic between the two links.
A specific number of TE tunnels can be established on idle links. This setting has more
advantages than adjusting IGP cost values and is widely applied in MPLS TE.
10.2.2.14 Coexistence of the Local and Remote LDP Sessions
Principles
Coexistence of local and remote LDP sessions mainly applies to L2VPNs.
Both the local and remote LDP adjacencies can be connected to the same peer. The peer is
maintained by both the local and remote LDP adjacencies.
In Figure 10-38, when the local LDP adjacency is deleted due to the faulty link to which the
adjacency is connected, the peer type may change without affecting the existence and status of
the peer. The peer type is determined by the type of adjacencies. The type of adjacencies can
be local, remote, or coexistence of the local and remote.
If the link becomes faulty or is recovering, the peer type may change, and the session type
corresponding to the peer also changes. The session remains Up, not being deleted or going
Down.

Usage Scenario
Figure 10-38 Networking topology for the coexistence of the local and remote LDP sessions
[Figure: CE1, PE1, PE2, and CE2 in a line, with a local adjacency and a remote adjacency
between PE1 and PE2.]
A typical application scenario is L2VPN. In Figure 10-38, L2VPN services are configured on
PE1 and PE2. When the directly connected link between PE1 and PE2 is disconnected and
then recovers, the procedure is as follows:
1. A session for the coexistence of the local and remote LDP adjacencies is created on the
two directly connected devices. L2VPN messages are sent over this session.
2. The physical link between PE1 and PE2 goes Down, and the local LDP adjacency of
the peer goes Down. The remote LDP adjacency is still Up because the route between PE1
and PE2 remains reachable through the P. The session type changes to remote, and the
session is still Up. The L2VPN does not detect the session type change and does not
delete the session. This implementation prevents the L2VPN from disconnecting from
neighbors and then recovering, and reduces the service interruption time.
3. After the fault is rectified, the link between PE1 and PE2 goes Up and then the local
LDP adjacency goes Up. The session type changes back to coexistence of the local and
remote LDP adjacencies, and the session is still Up. The L2VPN does not detect the
session type change and does not delete the session. This implementation helps reduce
the service interruption time.
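The relationship between adjacencies and the peer type in these steps can be sketched as follows. The function name and the string labels are hypothetical, and treating "no adjacency left" as the only case in which the peer disappears is an assumption of this sketch.

```python
def peer_type(adjacencies):
    """Derive the peer type from the adjacency types that are Up."""
    kinds = set(adjacencies)
    if kinds == {"local", "remote"}:
        return "local-and-remote"
    if kinds == {"local"}:
        return "local"
    if kinds == {"remote"}:
        return "remote"
    return None  # assumed: with no adjacency left, the peer no longer exists


# Steps 1 to 3: the type changes, but a peer exists throughout.
timeline = [peer_type(["local", "remote"]),   # both adjacencies Up
            peer_type(["remote"]),            # direct link Down
            peer_type(["local", "remote"])]   # direct link recovered
```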
10.2.2.15 Distributing Labels for All Peers by LDP

This sub-feature resolves the problem that convergence is slow if a link is faulty.
When labels are distributed only for upstream peers, LDP checks the upstream/downstream
relationship of a session in routing information before it sends Label Mapping messages.
An upstream node cannot send the Label Mapping message to its downstream node along a
route. If the route changes and the upstream/downstream relationship is switched, the new
downstream node resends a Label Mapping message. In this process, convergence is slow.
After LDP distributes labels for all peers, LDP sends Label Mapping messages to all
matching peers without distinguishing the upstream/downstream relationship. Each node can
send Label Mapping messages to all peers.
In Figure 10-39, the original routes from P2 to PE3 are P2->P1->P3->PE3 and
P2->P4->PE4->PE3. For the loopback interface route on PE3, P1 is the next hop of P2. If P2 is
able to distribute labels only to upstream nodes, P2 does not send the Label Mapping message
about the route to P1 after it receives the Label Mapping message from P1. If the link
between P1 and P3 is faulty, the route from PE1 to PE3 is switched from PE1->P1->P3->PE3
to PE1->P1->P2->P4->P3->PE3, and P2 becomes the downstream node of P1. P2, however, has
not sent the Label Mapping message to P1 and has to wait to resend the Label Mapping
message. In the process, LSP reconvergence is slow.
When LDP distributes labels for all peers, and P2 receives the Label Mapping message from
P1, P2 directly sends the Label Mapping message about the route to P1 and LDP generates a
liberal LSP on P1. In this manner, when the link between P1 and P3 is faulty, the route from
PE1 to PE3 is switched from PE1->P1->P3->PE3 to PE1->P1->P2->P4->P3->PE3, P2
becomes the downstream node of P1, and the liberal LSP directly changes to a normal LSP.
Then, LSP convergence is accelerated.
In addition, you can configure split horizon to determine the upstream peers to which Label
Mapping messages are sent, and the upstream peers to which Label Mapping messages are not
sent.
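The two distribution behaviors can be contrasted with a short sketch. It simplifies "upstream peer" to "any peer that is not the next hop of the route", and the function and parameter names are hypothetical.

```python
def mapping_targets(peers, next_hop, all_peers):
    """Peers that receive a Label Mapping message for one route."""
    if all_peers:
        # Labels are distributed to every peer, including the next hop.
        return sorted(peers)
    # Upstream-only: the downstream peer (the next hop) is skipped.
    return sorted(p for p in peers if p != next_hop)


# P2's view of the route to PE3 in Figure 10-39: P1 is the next hop.
upstream_only = mapping_targets({"P1", "P4"}, next_hop="P1", all_peers=False)
to_all_peers = mapping_targets({"P1", "P4"}, next_hop="P1", all_peers=True)
```

In the second case, P1 already holds a label from P2 when the link between P1 and P3 fails, so the liberal LSP on P1 can become a normal LSP without waiting for a new Label Mapping message.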
Figure 10-39 Networking topology for distributing labels for all peers by LDP
[Figure: PE1, P1, P3, and PE3 in the upper row; PE2, P2, P4, and PE4 in the lower row.
Legend: primary LSP, backup LSP, LSP from P2 to PE3.]
10.2.2.16 Smart LDP Ingress Policy
Background
By default, the Label Distribution Protocol (LDP) establishes label switched paths (LSPs)
using Interior Gateway Protocol (IGP) host routes with 32-bit masks. A growing network has
an increasing number of routes that are used to establish a great number of LDP LSPs. Because
only some LDP LSPs are used to transmit services, the LDP LSPs that are not in use waste
forwarding resources and can even cause LSPs for some services to fail to be established.
Although manual policies prevent unwanted LDP LSPs from being established, they have the
following drawbacks:
l The configuration is complex and involves operations on multiple devices.
l Configuration errors may arise if policy configurations differ between devices.
To address these issues, a smart LDP ingress policy can be configured to only allow service-
specific LSPs to be established. This decreases resource consumption and simplifies manual
configuration.

Usage Scenario
A smart LDP ingress policy is used when the following routes are reachable:
l Exact routes used by LDP in Downstream Unsolicited (DU) mode
l Exact routes used by LDP in Downstream on Demand (DoD) mode
l Longest match rule-based routes used by LDP in DU mode
The ingress running LDP obtains tunnel information based on BGP or VPN services and
establishes LSPs only for the services. This prevents establishment of unwanted LSPs, and
increases the efficiency of forwarding resources.
NOTE
In DU mode, the exact route rule and longest match rule take effect on routing information, not on LDP
LSP establishment. Therefore, exact routes and longest match rule-based routes are the same for the
establishment of smart LDP LSPs in DU mode.
Smart LDP ingress policies take effect on ingress LSPs, not transit or egress LSPs.
Figure 10-40 Smart LDP ingress policy networking
[Figure: PE1 (ingress), P (transit), and PE2 (egress) on an MPLS network, with a non-MPLS
network attached to each PE.]
In Figure 10-40, an L2VPN, L3VPN, or BGP service is deployed between PE1 and PE2.
These services are similar, with the exception that an L2VPN service involves a remote LDP
session. A smart LDP ingress policy is configured on PE1 and PE2. With this policy
configured, PE1 runs LDP to obtain tunnel information for a specified service and to establish
an ingress LSP to PE2. PE2 runs LDP to obtain tunnel information for a specified service and
to establish an ingress LSP to PE1. The P functions as a transit node and does not need to
establish an ingress LSP.
Figure 10-41 shows the process of LDP smartly obtaining tunnel information.

Figure 10-41 LDP smartly obtaining tunnel information
[Figure: The BGP, L3VPN, and L2VPN modules pass information to the TNLM, which passes it
to LDP.]
In Figure 10-41, the implementation is as follows:
1. The BGP, L2VPN, or L3VPN service module notifies the tunnel management module
(TNLM) of tunnel iteration information.
2. The TNLM module advertises service-specific tunnel information to the LDP module.
3. The LDP module enforces the smart LDP ingress policy and uses service-specific tunnel
information to establish an ingress LSP.
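The effect of step 3 can be sketched as a filter over candidate routes. This is an illustrative simplification: the function name and arguments are hypothetical, and the service-specific tunnel information from the TNLM is modeled as a plain collection of destination addresses.

```python
def ingress_lsps(host_routes, service_destinations, smart_policy):
    """Destinations for which the ingress establishes an LSP."""
    if not smart_policy:
        # Default behavior: one ingress LSP per host route.
        return sorted(host_routes)
    # Smart policy: only destinations that a service actually needs.
    return sorted(set(host_routes) & set(service_destinations))


routes = ["10.0.0.2/32", "10.0.0.3/32", "10.0.0.4/32"]
needed = ["10.0.0.2/32"]  # reported by the service modules through the TNLM
```

With the policy enabled, `ingress_lsps(routes, needed, True)` yields one LSP instead of three, which is the resource saving that the policy aims for.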
Benefits
A smart LDP ingress policy helps reduce the number of ingress LSPs to be established and
minimize resource consumption, which ensures that LDP LSPs can be established based on
services.
10.2.2.17 Smart LDP Request Policy
Background
In seamless Multiprotocol Label Switching (MPLS) networking, downstream on demand
(DoD) LDP sessions are established on the access side, and an Interior Gateway Protocol
(IGP) advertises default routes (or static default routes are configured). In such deployment,
an ingress must be able to send requests based on a specified service to establish LSPs
because the ingress cannot obtain routing information stored in its routing table to establish
LSPs.
Currently, such a service can only be L2VPN, not BGP or L3VPN. The existing
remote-ip auto-dod-request command enables an ingress to use a remote LDP session to
automatically send DoD requests to a specified peer for a Label Mapping message. The
remote peer must have been configured. This function supports only L2VPN services,
because BGP and L3VPN services do not need the remote LDP peers that are mandatory
for L2VPN services. To overcome this drawback for BGP and L3VPN services, a smart
request policy can be used to allow a service to trigger a request to establish an ingress LSP.
With the smart request policy, LDP can obtain tunnel information needed by BGP or VPN
services to send requests to establish ingress LSPs, without deploying remote LDP peers.

Usage Scenario
A smart LDP request policy is used when longest-match-rule routes are reachable for LDP in
Downstream Unsolicited (DU) mode.
NOTICE
An L2VPN service can be configured in DU and DoD mixed networking. In this situation, a
transit policy, not an ingress policy, is used to enable nodes to send requests to establish LSPs,
including unwanted transit LSPs. To allow only wanted LSPs to be established, a remote peer-
specific pseudo wire emulation edge-to-edge (PWE3) policy can be configured.
Figure 10-42 Smart request policy with the DoD mode configured
(Diagram: PE1, ABR1, ABR2, PE2, and PE3.)
l If an L2VPN service is configured, a remote LDP session must be established between
PE1 and PE2 and between PE1 and PE3. The remote LDP sessions trigger the
establishment of ingress LSPs to PE2 and PE3.
l If an L3VPN service is configured between PE1 and PE2 and between PE1 and PE3, an
LDP DoD session is configured on an access ring on which PE2 resides, and each area
border router (ABR) advertises a default route across the access ring. In this situation,
PE1 does not obtain exact routes to PE2 and PE3 and has to obtain tunnel information
from the tunnel management module (TNLM) to establish ingress LSPs.
NOTE
This implementation also applies to L2VPN services and does not conflict with the auto DoD
request function.
Figure 10-43 shows the process of LDP smartly obtaining tunnel information.

Figure 10-43 LDP smartly obtaining tunnel information
(Diagram: the BGP, L3VPN, and L2VPN service modules pass tunnel information to the TNLM module, which passes it to the LDP module.)
1. The BGP, L2VPN, or L3VPN service module notifies the tunnel management module
(TNLM) of tunnel iteration information.
2. The TNLM module advertises service-specific tunnel information to the LDP module.
3. The LDP module enforces the smart LDP request policy and uses service-specific tunnel
information to establish an ingress LSP.
Benefits
In a BGP or VPN service scenario, a device can send requests to establish an ingress LSP,
without a remote LDP peer configured.
Terms
Term | Definition
GR Helper | A neighbor of a GR Restarter. The GR Helper must support GR.
GR Restarter | A node that is restarted by an administrator or in case of failures. The GR Restarter must support GR.
mLDP | Multipoint extensions for Label Distribution Protocol, used to establish an LDP tunnel that consists of sub-LSPs destined for multiple egress nodes to transmit multicast services.


Abbreviation
FRR fast reroute
GR graceful restart
GTSM Generalized TTL Security Mechanism
MTU maximum transmission unit
NSR non-stop routing
P2MP point-to-multipoint
SRLG shared risk link group
TTL time to live
10.3 MPLS TE
10.3.1 MPLS TE
Multiprotocol Label Switching (MPLS) traffic engineering (TE) effectively schedules,
allocates, and uses existing network resources to provide sufficient bandwidth and support for
quality of service (QoS). MPLS TE helps carriers minimize expenditures without requiring
hardware upgrades. TE is implemented based on MPLS techniques and is easy to deploy and
maintain on live networks. MPLS TE supports a range of reliability techniques, which helps
backbone networks achieve carrier- and device-class reliability.
Purpose
Traffic engineering techniques are common for carriers operating IP/MPLS bearer networks.
These techniques are used to prevent traffic congestion and uneven resource allocation.
A node on a conventional IP network selects the shortest path as an optimal route, regardless
of other factors, for example, bandwidth. The shortest path may be congested with traffic,
whereas other available paths are idle.

Figure 10-44 Conventional routing
(Diagram: topology of LSRA, LSRB, LSRC, LSRD, LSRE, LSRF, LSRG, LSRH, LSRI, LSRJ, and LSRK, with an 80 Mbit/s flow from LSRG and a 40 Mbit/s flow from LSRA.)
Each link on the network shown in Figure 10-44 has a bandwidth of 100 Mbit/s and the
same metric value. LSRA sends traffic to LSRJ at 40 Mbit/s, and LSRG sends traffic to LSRJ
at 80 Mbit/s. Traffic from both routers travels through the shortest path LSRA (LSRG) →
LSRB → LSRC → LSRD → LSRI → LSRJ that is calculated by an Interior Gateway Protocol
(IGP). As a result, this path may be congested because it is overloaded, while the path
LSRA (LSRG) → LSRB → LSRE → LSRF → LSRH → LSRI → LSRJ is idle.
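The overload in this example can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the product: the function name link_loads and the data layout are illustrative assumptions, while the rates, link capacity, and shortest path are taken from the description of Figure 10-44.

```python
from collections import Counter

LINK_CAPACITY = 100  # Mbit/s per link, as stated for Figure 10-44

# Both flows follow the same IGP shortest path from LSRB onward.
flows = {
    ("LSRA", 40): ["LSRA", "LSRB", "LSRC", "LSRD", "LSRI", "LSRJ"],
    ("LSRG", 80): ["LSRG", "LSRB", "LSRC", "LSRD", "LSRI", "LSRJ"],
}

def link_loads(flows):
    """Sum the offered rate (Mbit/s) on every link that a flow traverses."""
    loads = Counter()
    for (_, rate), path in flows.items():
        for link in zip(path, path[1:]):
            loads[link] += rate
    return loads

loads = link_loads(flows)
overloaded = {link: load for link, load in loads.items() if load > LINK_CAPACITY}
for link, load in overloaded.items():
    print(f"{link[0]} -> {link[1]}: {load} Mbit/s offered, {LINK_CAPACITY} Mbit/s available")
```

Every link from LSRB to LSRJ on the shortest path is offered 120 Mbit/s, which exceeds the 100 Mbit/s link bandwidth.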
Congestion is a major cause of poor performance on a backbone network. A network may be
congested because of insufficient resources or be partially congested because of network
resource imbalance. TE resolves congestion caused by load imbalance. Conventional TE
solutions are as follows:
l TE controls network traffic by adjusting the metric of a path. This method eliminates
congestion only on some links. Adjusting a metric is difficult on a complex network
because a link change affects multiple routes.
l TE directs some traffic to virtual connections (VCs) based on an overlay model. The
current IGPs are topology driven and applicable to only static network connections,
regardless of dynamic factors, such as bandwidth and traffic attributes.
The overlay model, such as IP over asynchronous transfer mode (ATM) or IP over frame
relay (FR), complements IGP disadvantages. An overlay model provides a virtual
topology over a physical topology for a network. This helps properly adjust traffic and
implement QoS features, but has high costs and poor extensibility.
A scalable and simple solution is required to implement TE on a large-scale network. MPLS,
an overlay model, allows a virtual topology to be established over a physical topology and
maps traffic to the virtual topology. Because MPLS can be integrated with TE, MPLS TE was
introduced as such a solution.
Definition
MPLS TE establishes label switched paths (LSPs) satisfying specific constraints and
transparently transmits traffic over the LSPs based on labels. This satisfies constraints, such as
controllable paths and sufficient link bandwidth reserved for services transmitted over the
LSPs. MPLS TE can be used on the network shown in Figure 10-44 to address congestion.
MPLS TE establishes an 80 Mbit/s LSP over the path LSRG → LSRB → LSRC → LSRD →
LSRI → LSRJ and a 40 Mbit/s LSP over the path LSRA → LSRB → LSRE → LSRF →
LSRH → LSRI → LSRJ. MPLS TE directs traffic to the two LSPs, preventing congestion.

Figure 10-45 MPLS TE
(Diagram: same topology as Figure 10-44, with LSP 1 carrying 80 Mbit/s from LSRG and LSP 2 carrying 40 Mbit/s from LSRA.)
Table 10-5 describes MPLS TE functions.
Table 10-5 MPLS TE functions
Module | Description
Basic function | Includes basic MPLS TE settings and the tunnel establishment capability.
Tunnel optimization | Allows existing tunnels to be reestablished over other paths if the topology is changed, or these tunnels can be reestablished using updated bandwidth if service bandwidth values are changed.
CR-LSP attribute templates | Uses configured CR-LSP attribute templates to establish CR-LSPs in batches. The templates facilitate CR-LSP configuration and management.
Reliability function | Supports path protection, local protection, and node protection.
Security | Supports Resource Reservation Protocol (RSVP) authentication, which improves signaling security over MPLS TE networks.
Static bidirectional co-routed CR-LSP | An extension to a static CR-LSP. Static bidirectional co-routed CR-LSPs transmit traffic on a transport network without an IGP enabled. They also provide reverse paths for LSP ping reply packets, LSP tracert reply packets, and operation, administration and maintenance (OAM) response packets.
Dynamic bidirectional associated CR-LSP | An extension to a dynamic CR-LSP. Dynamic bidirectional associated CR-LSPs provide bandwidth protection for bidirectional services. If a fault occurs, automatic protection switching (APS) is rapidly triggered to switch traffic from a faulty working link to a protection link.
P2MP TE | Point-to-multipoint (P2MP) traffic engineering (TE) is a promising solution to multicast service transmission. P2MP TE helps carriers provide high TE capabilities and increased reliability on an IP/MPLS backbone network and reduce network operational expenditure (OPEX).
Benefits
MPLS TE offers the following benefits:
l Provides sufficient bandwidth and supports QoS capabilities for services.
l Optimizes bandwidth allocation.
l Establishes public network tunnels to isolate virtual private network (VPN) traffic.
l Is implemented based on existing MPLS techniques and is easy to deploy and maintain.
l Supports carrier- and device-level reliability functions.
10.3.2 Principles
MPLS TE Tunnel
Multiple LSPs are bound together to form an MPLS TE tunnel. An MPLS TE tunnel is
uniquely identified by the following parameters:
l Tunnel interface: a P2P virtual interface that encapsulates packets. Similar to a loopback
interface, a tunnel interface is a logical interface. A tunnel interface name is identified by
an interface type and number. The interface type is "tunnel." The interface number is
expressed in the format of SlotID/CardID/PortID.
l Tunnel ID: a decimal number that identifies an MPLS TE tunnel and facilitates tunnel
planning and management. A tunnel ID must be specified before an MPLS TE tunnel
interface is configured.

Figure 10-46 MPLS TE Tunnel and LSPs
(Diagram: a primary LSP over LSRA, LSRB, LSRC, LSRD, and LSRE and a backup LSP over LSRA, LSRF, LSRG, LSRH, and LSRE, both in one MPLS TE tunnel.)
MPLS TE Tunnel:
Tunnel Interface = Tunnel 0/1/0
Tunnel ID = 100
Primary LSP ID = 2
Backup LSP ID = 32500
A primary LSP with LSP ID 2 is established along the path LSRA → LSRB → LSRC →
LSRD → LSRE on the network shown in Figure 10-46. A backup LSP with LSP ID 32500 is
established along the path LSRA → LSRF → LSRG → LSRH → LSRE. The two LSPs are in
a tunnel named Tunnel 0/1/0 with a tunnel ID 100.
CR-LSPs
LSPs in an MPLS TE tunnel are constraint-based routed LSPs (CR-LSPs).
Unlike Label Distribution Protocol (LDP) LSPs that are established using routing
information, CR-LSPs are established based on bandwidth and path constraints, in addition to
routing information.
Link Attributes
MPLS TE link attributes describe bandwidth resources, route costs, and link reliability. The
link attributes are as follows:
l Total link bandwidth
Total bandwidth of a physical link.
l Maximum reservable bandwidth
Maximum bandwidth that a link can reserve for an MPLS TE tunnel to be established.
The maximum reservable bandwidth must be lower than or equal to the total link
bandwidth. The maximum reservable bandwidth can be manually set.
l TE metric
A TE metric is used in TE tunnel path calculation, allowing the calculation process to be
independent from IGP route-based path calculation. The IGP metric is used for MPLS
TE tunnels by default.
l SRLG

A shared risk link group (SRLG) is a set of links which are likely to fail concurrently
when sharing a physical resource (for example, an optical fiber). Links in an SRLG share
the same risk of faults. If one link fails, other links in the SRLG also fail.
An SRLG enhances CR-LSP reliability on an MPLS TE network enabled with CR-LSP
hot standby or TE FRR. For more information about the SRLG, see SRLG.
l Link administrative group
Link administrative group is also called link color. A link administrative group is a 32-bit
vector, with each bit set to a specified value that is associated with a desired meaning.
For example, a link administrative group attribute can be configured to describe link
bandwidth, a performance parameter (for example, the delay time) or a management
policy. The policy can be a traffic type (for example, multicast) or a flag indicating that
an MPLS TE tunnel passes over the link. The link administrative group attribute is used
together with affinities to control the paths for tunnels.
Tunnel Attributes
MPLS TE tunnels support the following attributes:
l Bandwidth
Bandwidth values are planned based on services that are to pass through a tunnel. The
configured bandwidth is reserved on each node through which a tunnel passes.
l Affinity attribute
An affinity is a 32-bit vector, configured on the ingress of a tunnel. It must be used
together with a link administrative group attribute.
After a tunnel is configured with an affinity, a device compares the affinity with the
administrative group value during link selection to determine whether a link with
specified attributes is selected or not. The device implements two AND operations, one
between a 32-bit mask and each affinity, and one between the 32-bit mask and the
administrative group value. If the two AND operations yield the same results, the path is
selected. If the results are different, the path is not selected. The following rules apply:
– For the bits that are set to 1 in the mask, at least one administrative group bit must
be 1 where the corresponding affinity bit is also 1. Where an affinity bit is 0, the
corresponding administrative group bit cannot be 1.
For example, an affinity is 0x0000FFFF and its mask is 0xFFFFFFFF. The higher-
order 16 bits in the administrative group of available links are 0 and at least one of
the lower-order 16 bits is 1. This means the administrative group attribute ranges
from 0x00000001 to 0x0000FFFF.
– If some bits in a mask are 0s, the corresponding bits in the administrative group are
not compared with the affinity bits.
For example, an affinity is 0xFFFFFFFF, and its mask is 0xFFFF0000. At least one
of the higher-order 16 bits in an administrative group attribute is 1, and the lower-
order 16 bits can be 0s and 1s. This means that the administrative group attribute
ranges from 0x00010000 to 0xFFFFFFFF.
NOTE
Understand specific comparison rules before deploying devices of different vendors because the
comparison rules vary with vendors.
A network administrator can use the link administrative group and affinities to control
the paths over which MPLS TE tunnels are established.
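The two comparison rules and their examples can be expressed as bitwise checks. The following Python sketch is illustrative only. The function name link_matches is hypothetical, and treating the first rule as satisfied when the mask selects no affinity bit is an assumption that the rules above do not state. Comparison rules vary with vendors, so this is not a reference implementation.

```python
WORD = 0xFFFFFFFF  # affinity, mask, and administrative group are 32-bit vectors

def link_matches(admin_group, affinity, mask):
    """Return True if a link's administrative group passes the affinity check."""
    compared_affinity = affinity & mask & WORD
    compared_group = admin_group & mask & WORD
    # Rule 1: among the compared bits, at least one bit is 1 in both the
    # administrative group and the affinity.
    # Assumption: if the mask selects no affinity bit, rule 1 is skipped.
    include_ok = compared_affinity == 0 or (compared_group & compared_affinity) != 0
    # Rule 2: a compared bit that is 0 in the affinity must be 0 in the group.
    exclude_ok = (compared_group & ~compared_affinity & WORD) == 0
    return include_ok and exclude_ok

# First example: affinity 0x0000FFFF, mask 0xFFFFFFFF
print(link_matches(0x00000001, 0x0000FFFF, 0xFFFFFFFF))  # lower 16 bits only
print(link_matches(0x00010001, 0x0000FFFF, 0xFFFFFFFF))  # a higher-order bit is set
# Second example: affinity 0xFFFFFFFF, mask 0xFFFF0000
print(link_matches(0x00010000, 0xFFFFFFFF, 0xFFFF0000))
```

Under these assumptions, the first and third calls accept the link and the second rejects it, which matches the ranges given in the two examples.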
l Explicit path

An explicit path is used to establish a CR-LSP. Nodes to be included or excluded are
specified on this path. Explicit paths are classified into the following types:
– Strict explicit path
A hop is directly connected to its next hop on a strict explicit path. By specifying a
strict explicit path, the most accurate path is provided for a CR-LSP.
Figure 10-47 Strict explicit path
(Diagram: LSRA, LSRB, LSRC, LSRD, LSRE, and LSRF, with an explicit path configured as B Strict, C Strict, E Strict, D Strict, F Strict.)
For example, a CR-LSP is established between LSRA and LSRF on the network
shown in Figure 10-47. LSRA is the ingress, and LSRF is the egress. "X strict"
specifies the LSR through which the CR-LSP must travel. For example, "B strict"
indicates that the CR-LSP must travel through LSRB, and the previous hop of
LSRB must be LSRA. "C strict" indicates that the CR-LSP must travel through
LSRC, and the previous hop of LSRC must be LSRB. The procedure repeats. A
path with each node specified is provided for the CR-LSP.
– Loose explicit path
A loose explicit path contains specified nodes through which a CR-LSP must pass.
Other routers that are not specified can also exist on the CR-LSP.
Figure 10-48 Loose explicit path
(Diagram: LSRA, LSRB, LSRC, LSRD, LSRE, and LSRF, with an explicit path configured as D Loose.)
For example, a CR-LSP is established over a loose explicit path between LSRA and
LSRF on the network shown in Figure 10-48. LSRA is the ingress, and LSRF is the
egress. "D loose" indicates that the CR-LSP must pass through LSRD, but LSRD and
LSRA do not need to be directly connected. This means that other LSRs may exist
between LSRD and LSRA.
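The difference between strict and loose hops can be sketched as a check of a candidate path against an explicit path. The Python below is a minimal illustration, not the device's algorithm. The function name satisfies_explicit_path and the representation of hops as (node, mode) pairs are assumptions, and the candidate paths are examples rather than paths taken from the figures.

```python
def satisfies_explicit_path(path, hops):
    """Check a node list (starting at the ingress) against explicit-path hops."""
    pos = 0  # index of the last matched node; starts at the ingress
    for node, mode in hops:
        if mode == "strict":
            # A strict hop must be the node directly after the previous hop.
            if pos + 1 >= len(path) or path[pos + 1] != node:
                return False
            pos += 1
        else:
            # A loose hop only has to appear somewhere after the previous hop.
            try:
                pos = path.index(node, pos + 1)
            except ValueError:
                return False
    return True

strict_hops = [("B", "strict"), ("C", "strict"), ("E", "strict"),
               ("D", "strict"), ("F", "strict")]
print(satisfies_explicit_path(["A", "B", "C", "E", "D", "F"], strict_hops))
print(satisfies_explicit_path(["A", "C", "E", "D", "F"], [("D", "loose")]))
print(satisfies_explicit_path(["A", "C", "E", "D", "F"], [("D", "strict")]))
```

The first two checks succeed. The third fails because LSRD is not the hop directly after LSRA on that path.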
l Hop limit
Hop limit is a condition for path selection during CR-LSP establishment. Similar to the
administrative group and affinity attributes, a hop limit constrains path selection: it
defines the maximum number of hops allowed on a CR-LSP.

l Route pinning
Any changes in the network topology or tunnel functions may cause an established CR-
LSP to be reestablished, leading to the following issues:
– The reestablished CR-LSP may be over a path that is different from the original
one, causing management difficulties.
– Traffic must switch from the original CR-LSP to the new one, causing traffic loss.
Route pinning can be used to resolve the preceding problems. Route pinning helps an
established CR-LSP remain over a path regardless of route changes. This function
improves service traffic continuity and reliability.
l Priorities and preemption
Priorities and preemption allow TE tunnels that transmit important services to be
established preferentially, preventing random resource competition during tunnel establishment.
If there is no path meeting the bandwidth requirement of a desired CR-LSP, a device can
tear down an established CR-LSP and use the bandwidth assigned to that CR-LSP to
establish a desired CR-LSP. This is called preemption. The following preemption modes
are supported:
– Hard preemption: A CR-LSP with a higher priority can directly preempt resources
assigned to a CR-LSP with a lower priority. Traffic is dropped on the CR-LSP with
a lower priority during the hard preemption process until the lower priority tunnel is
reestablished.
– Soft preemption: The make-before-break mechanism applies. A CR-LSP with a
higher priority has to wait until traffic over a lower-priority CR-LSP switches to
another CR-LSP before the higher-priority CR-LSP preempts bandwidth assigned
to the lower-priority CR-LSP.
CR-LSPs use setup and holding priorities to determine whether to preempt resources.
The priority value ranges from 0 to 7. A smaller value indicates a higher priority. The
setup priority must be lower than or equal to the holding priority for a tunnel, which
means that the setup priority value must be greater than or equal to the holding priority value.
The priority and preemption attributes are used in conjunction to determine resource
preemption among tunnels. If multiple CR-LSPs are to be established, CR-LSPs with
high priorities can be established by preempting resources. If resources (such as
bandwidth) are insufficient, a CR-LSP with a higher setup priority can preempt resources
of an established CR-LSP with a lower holding priority.
The following tunnels are established on the network shown in Figure 10-49.
– Tunnel 1: established over the path LSRA → LSRF → LSRD. Its bandwidth is 155
Mbit/s, and its setup and holding priority values are 0.
– Tunnel 2: established over the path LSRB → LSRF → LSRC. Its bandwidth is 150
Mbit/s, and its setup and holding priority values are 7.
If the link between LSRF and LSRD fails, LSRA recalculates a path LSRA → LSRF →
LSRC → LSRE → LSRD for tunnel 1. The link between LSRF and LSRC is shared by
tunnels 1 and 2, but has insufficient bandwidth for these two tunnels. As a result,
preemption is triggered.

Figure 10-49 Preemption based on priorities
(Diagram: LSRA, LSRB, LSRC, LSRD, LSRE, and LSRF, with Tunnel 1 starting at LSRA and Tunnel 2 starting at LSRB. Links are labeled with bandwidths of 150M, 155M, or 200M.)
– If hard preemption is used, LSRF directly sends an RSVP message to tear down
tunnel 2 because tunnel 1 has a higher priority than tunnel 2. As a result, some
traffic on tunnel 2 is dropped if tunnel 2 is transmitting traffic.
– If soft preemption is used, LSRF sends an RSVP message to LSRB, the ingress of
tunnel 2. After LSRB receives this message, LSRB reestablishes tunnel 2 over another
path LSRB → LSRD → LSRE → LSRC. LSRB switches traffic to the new path before
tearing down tunnel 2 over the original path.
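The priority comparison behind preemption can be sketched as follows. This Python example is illustrative only. The class CrLsp, the function select_victims, and the 200 Mbit/s capacity assumed for the shared link are hypothetical, because the flattened figure does not show which bandwidth label belongs to which link. The sketch models only the priority decision, not RSVP signaling or the hard and soft preemption modes.

```python
from dataclasses import dataclass

@dataclass
class CrLsp:
    name: str
    bandwidth: int         # Mbit/s
    setup_priority: int    # 0 (highest) to 7 (lowest)
    holding_priority: int  # 0 (highest) to 7 (lowest)

    def __post_init__(self):
        for value in (self.setup_priority, self.holding_priority):
            if not 0 <= value <= 7:
                raise ValueError("priority values range from 0 to 7")
        # The setup priority must not be higher than the holding priority,
        # so its numeric value must be greater than or equal.
        if self.setup_priority < self.holding_priority:
            raise ValueError("setup priority must not be higher than holding priority")

def select_victims(new_lsp, established, link_capacity):
    """Return the CR-LSPs to preempt on a link, [] if none, or None if impossible."""
    free = link_capacity - sum(lsp.bandwidth for lsp in established)
    victims = []
    # Consider the lowest holding priority (largest value) first.
    for lsp in sorted(established, key=lambda l: l.holding_priority, reverse=True):
        if free >= new_lsp.bandwidth:
            break
        # A smaller value is a higher priority: the new CR-LSP's setup priority
        # must be higher than the established CR-LSP's holding priority.
        if new_lsp.setup_priority < lsp.holding_priority:
            victims.append(lsp)
            free += lsp.bandwidth
    return victims if free >= new_lsp.bandwidth else None

tunnel1 = CrLsp("Tunnel 1", bandwidth=155, setup_priority=0, holding_priority=0)
tunnel2 = CrLsp("Tunnel 2", bandwidth=150, setup_priority=7, holding_priority=7)
victims = select_victims(tunnel1, [tunnel2], link_capacity=200)
print([lsp.name for lsp in victims])
```

With these values, tunnel 1 preempts tunnel 2 on the shared link. With the roles reversed, tunnel 2 cannot preempt tunnel 1 and the function returns None.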
10.3.2.2 Implementation
An MPLS TE tunnel is established using four components. Table 10-6 lists the components
and describes their functions.
Table 10-6 MPLS TE components
No. | Name | Description
1 | Information Advertisement Component | Extends an IGP to advertise TE information, in addition to routing information. TE information includes the maximum link bandwidth, maximum reservable bandwidth, reserved bandwidth, and link colors. Every node collects TE information about all nodes in a local area and generates a traffic engineering database (TEDB).
2 | Path Calculation Component | Runs Constrained Shortest Path First (CSPF) and uses TEDB data to calculate a path that satisfies specific constraints. CSPF evolves from the Shortest Path First (SPF) algorithm. CSPF excludes nodes and links that do not satisfy specific constraints and then uses the SPF algorithm to calculate a path.
3 | Path Establishment Component | Establishes the following types of CR-LSPs:
l Static CR-LSP: established by manually configuring labels and bandwidth, irrespective of signaling protocols or path calculation. Establishing a static CR-LSP consumes few resources because no MPLS control packets are exchanged between the two ends of the CR-LSP. A static CR-LSP cannot be adjusted dynamically when the network topology changes; therefore, static CR-LSPs are not widely used.
l Dynamic CR-LSP: established using RSVP-TE signaling. RSVP-TE carries parameters, such as the bandwidth, explicit routes, and affinities. There is no need to manually configure each hop along a dynamic CR-LSP. Dynamic CR-LSPs apply to large-scale networks.
4 | Traffic Forwarding Component | Directs traffic to a CR-LSP and forwards the traffic along the CR-LSP. Although a CR-LSP can be established using the preceding three components, the CR-LSP cannot automatically import traffic. The traffic forwarding component can be used to direct traffic to the CR-LSP.
NOTE
l A static CR-LSP is manually established, and there is no need to use the information advertisement
component or the path calculation component.
l A dynamic CR-LSP is dynamically established by signaling. Therefore, all the preceding
components are used to establish a dynamic CR-LSP.
A network administrator can configure link and tunnel attributes to enable MPLS TE to
automatically establish a CR-LSP. The network administrator can then direct traffic to the
CR-LSP and forward traffic over the CR-LSP.
10.3.2.3 Information Advertisement Component

The information advertisement component advertises network resource information over an
MPLS TE network. TE is used to control network traffic distribution, which optimizes
network resource usage. All nodes, especially ingress nodes on an MPLS TE network, must
obtain information about link resources to determine the paths and nodes for MPLS TE
tunnels.
Contents to Be Advertised
The network resource information to be advertised includes the following items:
l Link status information: interface IP addresses, link types, and link metric values, which
are collected by an Interior Gateway Protocol (IGP)
l Bandwidth information, such as maximum link bandwidth and maximum reservable
bandwidth
l TE metric: TE link metric, which is the same as the IGP metric by default

l Administrative group and affinity attributes
l SRLG
Advertisement Methods
Either of the following link-state protocol extensions can be used to advertise TE
information:
l OSPF TE
l IS-IS TE
Open Shortest Path First (OSPF) TE and Intermediate System to Intermediate System (IS-IS)
TE automatically collect TE information and flood it to MPLS TE nodes.
When to Advertise Information

OSPF TE or IS-IS TE floods link information so that each node can save area-wide link
information in a traffic engineering database (TEDB). Information flooding is triggered by the
establishment of an MPLS TE tunnel, or one of the following conditions:
l A specific IGP TE flooding interval elapses.

l A link is activated or deactivated.
l A CR-LSP in an MPLS TE tunnel fails to be established because of insufficient
bandwidth.
l Link attributes, such as the administrative group attribute or affinity attribute, change.
l The link bandwidth changes.
When the available bandwidth of an MPLS interface changes, the system automatically
updates information in the TEDB and floods it. When a lot of tunnels are to be
established on a node, the node reserves bandwidth and frequently updates information
in the TEDB and floods it. For example, the bandwidth of a link is 100 Mbit/s. If 100 TE
tunnels, each with bandwidth of 1 Mbit/s, are established, the system floods link
information 100 times.
To help suppress the frequency at which TEDB information is updated and flooded, the
flooding is triggered based on either of the following conditions:
– The proportion of the bandwidth reserved for an MPLS TE tunnel to the available
bandwidth in the TEDB is greater than or equal to a specific threshold.
– The proportion of the bandwidth released by an MPLS TE tunnel to the available
bandwidth in the TEDB is greater than or equal to a specific threshold.
If either of the preceding conditions is met, an IGP floods link bandwidth information,
and constrained shortest path first (CSPF) updates the TEDB.
For example, the available bandwidth of a link is 100 Mbit/s and 100 TE tunnels, each
with bandwidth of 1 Mbit/s, are established over the link. The flooding threshold is 10%.
Figure 10-50 shows the proportion of the bandwidth reserved for each MPLS TE
tunnel to the available bandwidth in the TEDB.
Bandwidth flooding is not performed when tunnels 1 to 9 are created. After tunnel 10 is
created, the bandwidth information (10 Mbit/s in total) on tunnels 1 to 10 is flooded. The
available bandwidth is 90 Mbit/s. Similarly, no bandwidth information is flooded after
tunnels 11 to 18 are created. After tunnel 19 is created, bandwidth information on tunnels
11 to 19 is flooded. The process repeats until tunnel 100 is established.

Figure 10-50 Proportion of the bandwidth reserved for each MPLS TE tunnel to the
available bandwidth in the TEDB
(Chart: the x-axis shows tunnels 1 to 22 and beyond; the y-axis shows the proportion, from 1% to 10%. Original available bandwidth: 100 Mbit/s. Available bandwidth after the first flooding: 90 Mbit/s. Available bandwidth after the second flooding: 80 Mbit/s.)
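The threshold-based flooding in this example can be reproduced with a short simulation. The Python sketch below is illustrative only. The function name simulate_flooding is hypothetical, and it assumes that the bandwidth reserved since the last flooding is accumulated and compared with the available bandwidth last recorded in the TEDB, which is how the example above reads. It uses integer arithmetic to avoid rounding at the 10% boundary.

```python
def simulate_flooding(available, tunnel_bw, tunnel_count, threshold_percent):
    """Return the tunnel numbers whose creation triggers bandwidth flooding."""
    tedb_available = available  # available bandwidth last recorded in the TEDB
    pending = 0                 # bandwidth reserved since the last flooding
    floods = []
    for tunnel in range(1, tunnel_count + 1):
        pending += tunnel_bw
        # Integer form of: pending / tedb_available >= threshold_percent / 100
        if pending * 100 >= threshold_percent * tedb_available:
            floods.append(tunnel)
            tedb_available -= pending
            pending = 0
    return floods

floods = simulate_flooding(available=100, tunnel_bw=1, tunnel_count=100,
                           threshold_percent=10)
print(floods[:2])
print(len(floods))
```

Under these assumptions, the first two floodings occur when tunnels 10 and 19 are created, as described above, and the total number of floodings is well below the 100 that would occur without the threshold.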
Results Obtained After Information Advertisement

Every node creates a TEDB in an MPLS TE area after OSPF TE or IS-IS TE floods
bandwidth information.
TE parameters are advertised during the deployment of an MPLS TE network. Every node
collects TE link information in the MPLS TE area and saves it in a TEDB. The TEDB
contains network link and topology attributes, including information about the constraints and
bandwidth usage of each link.
A node calculates the optimal path to another node in the MPLS TE area based on
information in the TEDB. MPLS TE then establishes a CR-LSP over this optimal path.
The TEDB and IGP link-state database (LSDB) are independent of each other. Both
databases collect routing information flooded by IGPs, but they differ in the following
ways:
l A TEDB contains TE information in addition to all information in an LSDB.
l An IGP uses information in an LSDB to calculate the shortest path, whereas MPLS TE
uses information in a TEDB to calculate the optimal path.
10.3.2.4 Path Calculation Component

Intermediate System to Intermediate System (IS-IS) or Open Shortest Path First (OSPF) uses
shortest path first (SPF) to calculate the shortest paths between nodes. MPLS TE uses
constrained shortest path first (CSPF) to calculate the optimal path to a specific node. CSPF,
which is derived from SPF, is an algorithm that supports constraints.
CSPF Fundamentals
CSPF works based on the following parameters:

l Tunnel attributes configured on an ingress to establish a CR-LSP

l Traffic engineering database (TEDB)
NOTE
A TEDB can be generated only after Interior Gateway Protocol (IGP) TE is configured. On an IGP TE-
incapable network, CR-LSPs are established based on IGP routes, but not CSPF calculation results.
CSPF Calculation Process

The CSPF calculation process is as follows:
1. Links that do not meet tunnel attribute requirements in the TEDB are excluded.
2. SPF calculates the shortest path to a tunnel destination based on TEDB information.
NOTE
CSPF attempts to use the OSPF TEDB to establish a path for a CR-LSP by default. If a path is
successfully calculated using OSPF TEDB information, CSPF completes calculation and does not use
the IS-IS TEDB to calculate a path. If path calculation fails, CSPF attempts to use IS-IS TEDB
information to calculate a path.
CSPF can be configured to use the IS-IS TEDB to calculate a CR-LSP path. If path
calculation fails, CSPF uses the OSPF TEDB to calculate a path.
CSPF calculates the shortest path to a destination. If there are several shortest paths with the
same metric, CSPF uses a tie-breaking policy to select one of them. The following tie-
breaking policies for selecting a path are available:
l Most-fill: selects a link with the highest proportion of used bandwidth to the maximum
reservable bandwidth, efficiently using bandwidth resources.
l Least-fill: selects a link with the lowest proportion of used bandwidth to the maximum
reservable bandwidth, evenly using bandwidth resources among links.
l Random: selects links randomly, allowing LSPs to be established evenly over links,
regardless of bandwidth distribution.
When several links have the same proportion of used bandwidth to the maximum reservable
bandwidth (for example, the links do not use the reserved bandwidths or the same bandwidth
is used on every link), the link discovered first is selected, irrespective of whether most-fill or
least-fill is configured.
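The three tie-breaking policies can be sketched as a selection among equal-cost candidates. The Python below is a minimal illustration. The function name pick_link, the tuple layout, and the bandwidth values are assumptions. It relies on the built-in max and min returning the first matching element, which models the rule that the link discovered first is selected when the proportions are equal.

```python
import random

def pick_link(candidates, policy, rng=random):
    """Pick one candidate from (name, used_bw, max_reservable_bw) tuples.

    Candidates are listed in discovery order and are assumed to have equal cost.
    """
    def fill_ratio(link):
        _, used, max_reservable = link
        return used / max_reservable

    if policy == "most-fill":
        return max(candidates, key=fill_ratio)[0]   # first maximum wins a tie
    if policy == "least-fill":
        return min(candidates, key=fill_ratio)[0]   # first minimum wins a tie
    if policy == "random":
        return rng.choice(candidates)[0]
    raise ValueError(f"unknown policy: {policy}")

links = [("link1", 20, 100), ("link2", 60, 100), ("link3", 60, 100)]
print(pick_link(links, "most-fill"))
print(pick_link(links, "least-fill"))
```

Here most-fill selects link2, which has the highest proportion and was discovered before link3, and least-fill selects link1.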
For example, CSPF removes links marked blue and links each with bandwidth of 50 Mbit/s
based on tunnel constraints. It uses other links each with bandwidth of 100 Mbit/s to calculate
a path for an MPLS TE tunnel on the network shown in Figure 10-51. The constraints include
the destination LSRE, bandwidth of 80 Mbit/s, and a transit node LSRH.

Figure 10-51 Link removal process
(Diagram: topology of LSR A through LSR H in which some links have a bandwidth of 50 Mbit/s or are marked blue. After those links are removed, the calculated topology contains LSRA, LSRC, LSRD, LSRE, LSRF, LSRG, and LSRH.)
MPLS TE Tunnel 0/1/0:
Destination = LSR E
Bandwidth = 80M
Affinity Property = Black
H Loose
CSPF calculates a path shown in Figure 10-52 in the same way SPF would calculate it.
Figure 10-52 CSPF calculation result
(Diagram: the calculated path on a topology of LSR A, LSR D, LSR E, LSR F, LSR G, and LSR H.)
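The two CSPF steps, link removal followed by an SPF run, can be sketched in Python. This is a minimal illustration and not the device's implementation. The topology in LINKS is hypothetical, because the link layout of Figure 10-51 cannot be recovered from the text, and the function names prune, spf, and cspf are assumptions. A loose hop is handled by running SPF segment by segment, a simplification that does not guard against loops between segments.

```python
import heapq

# Hypothetical links: (node_a, node_b, reservable_bandwidth_mbps, color, metric)
LINKS = [
    ("A", "B", 50, "black", 1),
    ("B", "C", 100, "blue", 1),
    ("C", "D", 100, "blue", 1),
    ("D", "E", 100, "black", 1),
    ("A", "F", 100, "black", 1),
    ("F", "G", 100, "black", 1),
    ("G", "H", 100, "black", 1),
    ("H", "E", 100, "black", 1),
    ("G", "D", 100, "black", 1),
]

def prune(links, bandwidth, excluded_color):
    """Step 1: drop links that do not meet the tunnel constraints."""
    graph = {}
    for a, b, bw, color, metric in links:
        if bw < bandwidth or color == excluded_color:
            continue
        graph.setdefault(a, []).append((b, metric))
        graph.setdefault(b, []).append((a, metric))
    return graph

def spf(graph, src, dst):
    """Step 2: Dijkstra on the pruned graph; returns a node list or None."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None

def cspf(links, src, dst, bandwidth, excluded_color, loose_hops=()):
    """Prune, then run SPF through each loose hop in order to the destination."""
    graph = prune(links, bandwidth, excluded_color)
    path = [src]
    for waypoint in list(loose_hops) + [dst]:
        segment = spf(graph, path[-1], waypoint)
        if segment is None:
            return None
        path += segment[1:]
    return path

path = cspf(LINKS, "A", "E", bandwidth=80, excluded_color="blue", loose_hops=["H"])
print(" -> ".join(path))
```

On this hypothetical topology, the 50 Mbit/s link and the blue links are removed first, and the remaining SPF run yields the path A, F, G, H, E.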
Differences Between CSPF and SPF

CSPF is dedicated to calculating MPLS TE paths. It has similarities with SPF, but they have
the following differences:
l CSPF calculates the shortest path between the ingress and egress. SPF calculates the
shortest path between a node and each of other nodes on a network.
l CSPF uses metrics such as the bandwidth, link attributes, and affinity attributes, in
addition to link costs, which are the only metric used by SPF.

l CSPF does not support load balancing and uses three tie-breaking policies to determine a
path if multiple paths have the same attributes.
10.3.2.5 Establishing a CR-LSP Using RSVP-TE
RSVP-TE Overview
The Resource Reservation Protocol (RSVP) is designed on the basis of the Integrated
Services model. RSVP can reserve resources on each node of a CR-LSP. RSVP, an Internet
control protocol, operates at the transport layer and does not transport application data. RSVP-
TE is an extension to RSVP. RSVP-TE can establish or delete CR-LSPs using TE attributes in
extended objects.
RSVP-TE has the following unique aspects compared with RSVP:
l RSVP-TE appends Label Request objects to RSVP Path messages to request labels. Resv
messages carry Label objects that are used to allocate labels. TE tunnels can be
established based on the labels.
l The extended RSVP messages can carry information about path constraint parameters, in
addition to label binding information.
l RSVP-TE supports MPLS TE attributes, such as resource reservation, carried in the
extended objects.
RSVP-TE Principles
Table 10-7 lists RSVP-TE principles.
Table 10-7 RSVP-TE principles
Function Module | Description
CR-LSP Establishment | A CR-LSP is established over a path calculated by CSPF or an explicit path configured on the ingress.
Path Maintenance | RSVP-TE sends messages to maintain the path status on each node.
Path Teardown | A CR-LSP is torn down and releases labels and bandwidths on each node. The ingress initiates the request for a teardown.
Fault Advertisement | RSVP nodes send advertisements to notify upstream and downstream nodes of faults that occur during path establishment or maintenance.
CR-LSP Establishment
Figure 10-53 shows the process of establishing an RSVP-TE CR-LSP.
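Figure 10-53 Process of establishing an RSVP-TE CR-LSP
(Diagram: Path messages travel from PE1 to P1, P2, and PE2 in steps 1 to 3. Resv messages travel from PE2 to P2, P1, and PE1 in steps 4 to 6. In the Path direction, each node sends through if1 and the next node receives through if0.)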

Equipment
1. PE1 uses CSPF to calculate a path between PE1 and PE2. The IP address of every hop
on this path has been specified. PE1 generates a Path message and creates a PSB. PE1
then adds the explicit route object (ERO) field containing a list of IP addresses calculated
by CSPF, and sends the Path message to P1 along the path specified by the ERO.
Figure 10-54 Path message on PE1

OBJECT | Value
SESSION | Source: PE1; Destination: PE2
HOP | PE1-if1
EXPLICIT_ROUTE | P1-if0; P2-if0; PE2-if0
LABEL | LABEL_REQUEST
2. After P1 receives the Path message, P1 parses the message and creates a PSB based on
the Path message. P1 then generates a new Path message and sends it to P2 based on the
ERO.
– The HOP field in the Path message updated by PE1 specifies the IP address of the
outbound interface through which PE1 sends the message to P1. The HOP field in
the Path message updated by P1 specifies the IP address of the outbound interface
through which P1 sends the message to P2.
– P1 deletes the local LSR ID and IP addresses of the inbound and outbound
interfaces from the ERO field in the Path message.
Figure 10-55 Path message on P1

OBJECT | Value
HOP | P1-if1
EXPLICIT_ROUTE | P2-if0; PE2-if0
LABEL | LABEL_REQUEST
3. P2 processes the received Path message in the same way as P1. P2 creates a PSB based
on the Path message, updates the Path message, and sends it to PE2.

Figure 10-56 Path message on P2

OBJECT | Value
HOP | P2-if1
EXPLICIT_ROUTE | PE2-if0
LABEL | LABEL_REQUEST
4. After PE2 receives a Path message, PE2 knows that it is the egress of the CR-LSP to be
established based on the Tunnel Address field in the Session object. PE2 then allocates a
label and bandwidth resources, and generates an RSB based on the Resv message. The
Resv message is sent to P2 and carries the label which is allocated by PE2.
Different from the destination IP address in the Path message, the destination IP address
of the Resv message sent by PE2 is the IP address carried in the HOP field of the
received Path message, not the LSR ID of the ingress. The IP header of the Resv
message does not need to contain the Router Alert option.
The Resv message is forwarded along the reverse path. Therefore, the Resv message
does not carry the ERO field.
Figure 10-57 Resv message on PE2


OBJECT | Value
SESSION | Source: PE2-if0; Destination: P2-if1
HOP | PE2-if0
LABEL | 3
RECORD_ROUTE | PE2-if0
5. After P2 receives the Resv message, P2 creates an RSB based on the Resv message,
allocates a new label, updates the Resv message, and sends the message to P1.

Figure 10-58 Resv message on P2


OBJECT | Value
SESSION | Source: P2-if0; Destination: P1-if1
HOP | P2-if0
LABEL | 17
RECORD_ROUTE | P2-if0; PE2-if0
6. P1 processes the received Resv message in the same way as P2. P1 updates
the Resv message and sends it to PE1.
Figure 10-59 Resv message on P1


OBJECT | Value
SESSION | Source: P1-if0; Destination: PE1-if1
HOP | P1-if0
LABEL | 18
RECORD_ROUTE | P1-if0; P2-if0; PE2-if0
7. PE1 obtains the label allocated by P1 based on the received Resv message. Resource
reservation succeeds and a CR-LSP is established.
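The seven steps above can be sketched in Python. This is a minimal illustration, not the device implementation: the function name and dictionary keys are hypothetical, interface names are omitted, and the labels (3, 17, 18) are taken from Figures 10-57 to 10-59.

```python
def signal_cr_lsp(path, label_of):
    """Simulate Path (downstream) and Resv (upstream) messages for one CR-LSP.

    path:     node names from ingress to egress, as calculated by CSPF.
    label_of: label that each downstream node allocates (illustrative input).
    """
    ero = list(path[1:])            # the ingress lists every hop after itself
    sender = path[0]
    path_msgs = []
    while ero:
        # The sender records itself in HOP and forwards to the first ERO entry.
        path_msgs.append({"hop": sender, "to": ero[0], "ero": list(ero)})
        sender = ero.pop(0)         # the receiving node removes itself from the ERO

    resv_msgs, rro = [], []
    # Walk the reverse path: each node allocates a label for its upstream node.
    for node, upstream in zip(reversed(path[1:]), reversed(path[:-1])):
        rro.insert(0, node)         # RECORD_ROUTE grows hop by hop
        resv_msgs.append({"from": node, "to": upstream,
                          "label": label_of[node], "rro": list(rro)})
    return path_msgs, resv_msgs


path_msgs, resv_msgs = signal_cr_lsp(
    ["PE1", "P1", "P2", "PE2"], {"PE2": 3, "P2": 17, "P1": 18})
for message in path_msgs + resv_msgs:
    print(message)
```

The ERO shrinks by one entry at each hop of the Path message, and the RECORD_ROUTE list grows by one entry at each hop of the Resv message, which matches the object tables in the figures.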
Reservation Styles
A reservation style defines how resources are reserved for different senders within the same
session. The following reservation styles are supported:
l Fixed Filter (FF) style: allows a particular sender to create a separate reservation for a
tunnel. This sender does not share its resource reservation with other senders. A resource
reservation on the same link is used by a specific CR-LSP.
l Shared Explicit (SE) style: allows a set of selected upstream senders to share a single
reservation. The same resource reservation on the same link is shared by different CR-
LSPs.

RSVP Soft State Mechanism and Path Status Maintenance

RSVP sends its messages as IP datagrams with no reliability enhancement. Nodes along an
established CR-LSP periodically send Path and Resv messages to maintain neighbor status.
These messages are called RSVP Refresh messages, and they are used to synchronize PSB
and RSB information between nodes. This is the RSVP soft state mechanism.
NOTE
A Refresh message is not a new type of message. Refresh messages are the messages that have already
been advertised.
The refreshing interval is specified in the Time Value field.
If a node does not receive any Refresh message for a specific PSB or RSB after the
specified refresh intervals elapse, the node deletes the state.
RSVP Refresh messages can also monitor the reachability between RSVP neighbors and
maintain RSVP neighbor relationships.
Figure 10-60 shows an RSVP Refresh message. Path and Resv messages are sent separately.
Figure 10-60 RSVP Refresh message

[Figure: PE1, P1, P2, and PE2 exchange Path and Resv messages on every link, repeated along a
time axis marked 0:00, 0:30, 1:00, 1:30, and so on.]
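The soft state mechanism can be sketched as a table of state blocks that expire unless they are refreshed. This is a simplified illustration: the class name, the 30-second interval (read from the time axis of Figure 10-60), and the limit of three missed refresh intervals are assumptions, not values stated in this document.

```python
class SoftStateTable:
    """Keeps PSB/RSB entries alive only while Refresh messages keep arriving."""

    def __init__(self, refresh_interval, missed_limit):
        # A state block times out after this many seconds without a refresh.
        self.timeout = refresh_interval * missed_limit
        self.last_refresh = {}

    def refresh(self, state_key, now):
        """Record a Path or Resv Refresh message for a state block."""
        self.last_refresh[state_key] = now

    def expire(self, now):
        """Delete and return the state blocks whose refresh timed out."""
        stale = [key for key, seen in self.last_refresh.items()
                 if now - seen > self.timeout]
        for key in stale:
            del self.last_refresh[key]
        return stale


table = SoftStateTable(refresh_interval=30, missed_limit=3)
table.refresh("psb-1", now=0)
table.refresh("rsb-1", now=0)
table.refresh("psb-1", now=60)   # only the PSB keeps being refreshed
print(table.expire(now=100))     # the RSB has not been refreshed for 100 s
```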
Path Teardown
After a user instructs an ingress to delete a CR-LSP or the ingress receives a PathErr message,
the ingress sends a PathTear message to a downstream node. The downstream node receives
this message, tears down the CR-LSP, and replies to the ingress with a ResvTear message.
The functions of PathTear and ResvTear messages are as follows:

l A PathTear message instructs a node to remove saved path information. The PathTear
message functions in the opposite way to a Path message.
l A ResvTear message instructs a node to remove resource reservation status. The
ResvTear message functions in the opposite way to a Resv message.

Error Signaling
RSVP-TE uses the following messages to advertise LSP errors.
l PathErr message: sent upstream by an RSVP node if an error occurs while this node is
processing a Path message. A PathErr message is forwarded by consecutive transit nodes
and arrives at the ingress.
l ResvErr message: sent downstream by an RSVP node if an error occurs while this node
is processing a Resv message. A ResvErr message is forwarded by consecutive transit
nodes and arrives at the egress.
10.3.2.6 RSVP Summary Refresh

The RSVP summary refresh (Srefresh) function enables a node to send digests of RSVP Refresh
messages to maintain RSVP soft states and respond to RSVP soft state changes. This
reduces the number of signaling packets used to maintain the RSVP soft states and optimizes
bandwidth allocation.
Background
RSVP Refresh messages are used to synchronize path state block (PSB) and reservation state
block (RSB) information between nodes. They can also be used to monitor the reachability
between RSVP neighbors and maintain RSVP neighbor relationships. Because Path and
Resv messages are large, sending many of them for many CR-LSPs consumes a large
amount of network resources. RSVP Srefresh can be used to address this
problem.
Implementation
RSVP Srefresh defines new objects based on the existing RSVP protocol:
l Message_ID extension and retransmission extension
The Srefresh extension builds on the Message_ID extension. According to the
Message_ID extension mechanism defined in RFC 2961, RSVP messages carry
extended objects, including Message_ID and Message_ID_ACK objects. The two
objects are used to confirm RSVP messages and support reliable RSVP message
delivery.
The Message_ID object can also be used to provide the RSVP retransmission
mechanism. For example, a node initializes a retransmission interval as Rf seconds after
it sends an RSVP message carrying the Message_ID object. If the node receives no ACK
message within Rf seconds, the node retransmits the RSVP message and waits for
(1 + Delta) x Rf seconds. Delta determines the rate at which the sender increases the
retransmission interval. The node keeps retransmitting the message until it receives an
ACK message or the number of retransmissions reaches a specific threshold (called a
retransmission increment value).
l Summary Refresh extension
The Summary Refresh extension supports Srefresh messages to update the RSVP status,
without standard Path or Resv messages transmitted.
Each Srefresh message carries a Message_ID object. Each object contains multiple
message IDs, each of which identifies a Path or Resv state to be refreshed. If a CR-LSP
changes, its message ID value increases.
Only the state that was previously advertised by Path and Resv messages containing
Message_ID objects can be refreshed using the Srefresh extension.

After a node receives an Srefresh message, the node compares the Message_ID with that
saved in a local state block.
– If they match, the node does not change the state.
– If the Message_ID is greater than that saved in the local state block, the node sends
a NACK message to the sender, refreshes the PSB or RSB based on the Path or
Resv message, and updates the Message_ID.
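The two extensions above can be sketched as follows. Both functions are illustrative assumptions rather than the device implementation. The first assumes that the interval keeps growing by the factor (1 + Delta) after every retransmission, as in RFC 2961. The second keys each state by a hypothetical name instead of by the Message_ID itself and covers only the two cases described above.

```python
def retransmit_times(rf, delta, limit):
    """Return the times (seconds after the first send) of each retransmission
    of an unacknowledged message, up to `limit` retransmissions."""
    times, interval, elapsed = [], rf, 0
    for _ in range(limit):
        elapsed += interval          # no ACK arrived within the current interval
        times.append(elapsed)
        interval *= (1 + delta)      # back off before the next attempt
    return times


def handle_srefresh(saved_ids, received_ids):
    """Compare received message IDs with the saved ones.

    Returns the states for which a NACK must be sent, so that the sender
    follows up with a full Path or Resv message.
    """
    nack = []
    for state, message_id in received_ids.items():
        saved = saved_ids.get(state)
        if saved is None:
            continue                 # unknown state: not covered by this sketch
        if message_id > saved:
            nack.append(state)       # state changed: request a full message
        # Equal IDs: the state is simply refreshed and left unchanged.
    return nack


print(retransmit_times(rf=1, delta=1, limit=3))
print(handle_srefresh({"lsp-1": 5, "lsp-2": 7}, {"lsp-1": 5, "lsp-2": 9}))
```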
10.3.2.7 RSVP Hello

The RSVP Hello extension can rapidly monitor the reachability of RSVP nodes. If an RSVP
node becomes unreachable, TE FRR protection is triggered. The RSVP Hello extension can
also monitor whether an RSVP GR neighboring node is in the restart process.
Background
RSVP Refresh messages are used to synchronize path state block (PSB) and reservation state
block (RSB) information between nodes. They can also be used to monitor the reachability
between RSVP neighbors and maintain RSVP neighbor relationships.
Using Path and Resv messages to monitor neighbor reachability is slow, which delays a traffic
switchover if a link fault occurs. The RSVP Hello extension can address this problem.
Related Concepts
l RSVP Refresh messages: Although an MPLS TE tunnel is established using Path and
Resv messages, RSVP nodes still send Path and Resv messages over the established
tunnel to update the RSVP status. These Path and Resv messages are called RSVP
Refresh messages.
l RSVP GR: ensures uninterrupted transmission on the forwarding plane when an
AMB/SMB switchover is performed on the control plane. A GR helper assists a GR
restarter in rapidly restoring the RSVP status.
l TE FRR: a local protection mechanism for MPLS TE tunnels. If a fault occurs on a
tunnel, TE FRR rapidly switches traffic to a bypass tunnel.
Implementation
The principles of the RSVP Hello extension are as follows:
1. Hello handshake mechanism
Figure 10-61 Hello handshake mechanism

[Figure: LSRA sends a Hello Request message to LSRB, and LSRB replies with a Hello ACK message.]
LSRA and LSRB are directly connected on the network shown in Figure 10-61.
– If RSVP Hello is enabled on LSRA, LSRA sends a Hello Request message to
LSRB.

– After LSRB receives the Hello Request message and is also enabled with RSVP
Hello, LSRB sends a Hello ACK message to LSRA.
– After receiving the Hello ACK message, LSRA considers LSRB reachable.
2. Detecting neighbor loss
After a successful Hello handshake, LSRA and LSRB exchange Hello
messages. If LSRB does not respond to three consecutive Hello Request messages sent
by LSRA, LSRA considers LSRB lost and re-initializes the RSVP Hello process.
3. Detecting neighbor restart
If LSRA and LSRB are enabled with RSVP GR, and the Hello extension detects that
LSRB is lost, LSRA waits for LSRB to send a Hello Request message carrying a GR
extension. After receiving the message, LSRA starts the GR process on LSRB and sends
a Hello ACK message to LSRB. After receiving the Hello ACK message, LSRB
performs the GR process and restores the RSVP soft state. LSRA and LSRB exchange
Hello messages to maintain the restored RSVP soft state.
NOTE
There are two scenarios if a CR-LSP is established between LSRs:

l If GR is disabled and FRR is enabled, FRR switches traffic to a bypass CR-LSP after the Hello
extension detects that the RSVP neighbor relationship is lost to ensure proper traffic transmission.
l If GR is enabled, the GR process is performed.
The RSVP Hello extension applies to networks enabled with both RSVP GR and TE FRR.
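The neighbor-loss detection and the two scenarios in the NOTE can be sketched as a small state machine. This is a hedged illustration: the class, method, and return-value names are hypothetical, and timers are reduced to explicit method calls.

```python
class HelloNeighbor:
    """Tracks one RSVP neighbor from the point of view of the local node."""

    MAX_MISSED = 3  # three consecutive unanswered Hello Requests mean "lost"

    def __init__(self):
        self.reachable = False
        self.missed = 0

    def on_hello_ack(self):
        """A Hello ACK arrived: the neighbor is reachable."""
        self.reachable = True
        self.missed = 0

    def on_request_timeout(self):
        """A Hello Request went unanswered. Returns True if the neighbor is now lost."""
        self.missed += 1
        if self.missed >= self.MAX_MISSED:
            self.reachable = False
            self.missed = 0          # re-initialize the Hello process
            return True
        return False


def action_on_loss(gr_enabled, frr_enabled):
    """Map the two scenarios in the NOTE to an action."""
    if gr_enabled:
        return "perform GR process"
    if frr_enabled:
        return "switch traffic to bypass CR-LSP"
    return "not described"           # the document covers only the two cases above


neighbor = HelloNeighbor()
neighbor.on_hello_ack()
lost = [neighbor.on_request_timeout() for _ in range(3)]
print(lost, action_on_loss(gr_enabled=False, frr_enabled=True))
```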
10.3.2.8 Traffic Forwarding Component

The traffic forwarding component imports traffic to a tunnel and forwards traffic over the
tunnel. Although the information advertisement, path selection, and path establishment
components are used to establish a CR-LSP in an MPLS TE tunnel, a CR-LSP (unlike an
LDP LSP) cannot automatically import traffic. The traffic forwarding component must be
used to import traffic to the CR-LSP before it forwards traffic based on labels.
Static Route
A static route is the simplest method for directing traffic to a CR-LSP in an MPLS TE tunnel. A
TE static route works in the same way as a common static route and has a TE tunnel interface
as an outbound interface.
Auto Route
With an auto route, an Interior Gateway Protocol (IGP) treats a CR-LSP in a TE tunnel as a
logical link when it calculates paths. The tunnel interface is used as an outbound
interface in the auto route. The TE tunnel is considered a P2P link with a specified metric
value. The following auto routes are supported:
l IGP shortcut: A route related to a CR-LSP is not advertised to neighbor nodes,
preventing other nodes from using the CR-LSP.
l Forwarding adjacency: A route related to a CR-LSP is advertised to neighbor nodes,
allowing these nodes to use the CR-LSP.
Forwarding adjacency allows tunnel information to be advertised based on IGP neighbor
relationships.

If the forwarding adjacency is used, nodes on both ends of a CR-LSP must be in the
same area.
The following example demonstrates the IGP shortcut and forwarding adjacency.
Figure 10-62 Schematic diagram for IGP shortcut and forwarding adjacency
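[Figure: a topology of LSRA, LSRB, LSRC, LSRD, LSRE, LSRF, LSRG, and LSRH with a cost on each
link. MPLS TE Tunnel 1 has a TE metric of 10.]

Node | Mode | Destination | Nexthop | Cost
LSRE | Forwarding adjacency | LSRB | LSRG | 20
LSRE | Forwarding adjacency | LSRA | LSRG | 30
LSRG | Forwarding adjacency | LSRB | Tunnel 1 | 10
LSRG | Forwarding adjacency | LSRA | Tunnel 1 | 20
LSRE | Shortcut | LSRB | LSRD | 25
LSRE | Shortcut | LSRA | LSRD | 35
LSRG | Shortcut | LSRB | Tunnel 1 | 10
LSRG | Shortcut | LSRA | Tunnel 1 | 20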
A CR-LSP over the path LSRG → LSRF → LSRB is established on the network shown in
Figure 10-62, and the TE metric values are specified. Either of the following configurations
can be used:
l The auto route is not used. LSRE uses LSRD as the next hop in a route to LSRA and a
route to LSRB; LSRG uses LSRF as the next hop in a route to LSRA and a route to
LSRB.
l The auto route is used. Either IGP shortcut or forwarding adjacency can be configured:
– The IGP shortcut is used, so the route of Tunnel 1 is not advertised to other nodes.
LSRE uses LSRD as the next hop in the route to LSRA and the route to LSRB; LSRG
uses Tunnel 1 as the next hop in the route to LSRA and the route to LSRB. LSRG,
unlike LSRE, uses Tunnel 1 in IGP path calculation.
– The forwarding adjacency is used to advertise the route of Tunnel 1. LSRE uses
LSRG as the next hop in the route to LSRA and the route to LSRB; LSRG uses
Tunnel 1 as the next hop in the route to LSRA and the route to LSRB. Both LSRE
and LSRG use Tunnel 1 in IGP path calculation.
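The difference between the two auto route modes can be reproduced with a shortest-path calculation. The sketch below is an assumption-laden illustration: the individual link costs are hypothetical values chosen so that the results match the next hops and costs listed with Figure 10-62, LSRH is omitted, and the tunnel is modeled as a one-way logical link from LSRG to LSRB with metric 10.

```python
import heapq

# Hypothetical link costs, chosen to reproduce the costs listed with Figure 10-62.
PHYSICAL_LINKS = [
    ("LSRA", "LSRB", 10), ("LSRB", "LSRC", 5), ("LSRC", "LSRD", 5),
    ("LSRD", "LSRE", 15), ("LSRE", "LSRG", 10), ("LSRG", "LSRF", 10),
    ("LSRF", "LSRB", 10),
]


def build_graph(include_tunnel):
    """Return node -> list of (neighbor, cost, outbound name)."""
    graph = {}
    for a, b, cost in PHYSICAL_LINKS:
        graph.setdefault(a, []).append((b, cost, b))
        graph.setdefault(b, []).append((a, cost, a))
    if include_tunnel:
        graph["LSRG"].append(("LSRB", 10, "Tunnel 1"))
    return graph


def shortest(graph, src):
    """Dijkstra: return node -> (cost, first hop) as seen from src."""
    best = {src: (0, None)}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node][0]:
            continue                 # outdated heap entry
        for nbr, weight, out_name in graph[node]:
            new_cost = cost + weight
            first_hop = out_name if node == src else best[node][1]
            if nbr not in best or new_cost < best[nbr][0]:
                best[nbr] = (new_cost, first_hop)
                heapq.heappush(heap, (new_cost, nbr))
    return best


def route(node, dest, mode):
    """Return (next hop, cost) for mode 'none', 'shortcut' or 'forwarding adjacency'."""
    if mode == "forwarding adjacency":
        sees_tunnel = True                   # the tunnel is advertised to all nodes
    elif mode == "shortcut":
        sees_tunnel = (node == "LSRG")       # only the tunnel ingress uses it
    else:
        sees_tunnel = False
    cost, first_hop = shortest(build_graph(sees_tunnel), node)[dest]
    return first_hop, cost


for mode in ("none", "shortcut", "forwarding adjacency"):
    print(mode, route("LSRE", "LSRB", mode), route("LSRG", "LSRA", mode))
```

With the IGP shortcut, only LSRG includes the tunnel in its calculation, so LSRE still reaches LSRB through LSRD. With the forwarding adjacency, LSRE also sees the tunnel as a link and prefers the path through LSRG.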

Policy-based Routing
Policy-based routing (PBR) allows the system to select routes based on user-defined
policies, which improves security and balances traffic load. If PBR is enabled on an MPLS
network, IP packets are forwarded over specific CR-LSPs based on PBR rules.
MPLS TE PBR, like IP unicast PBR, is implemented based on a set of matching rules
and behaviors. The rules and behaviors are defined using an apply clause, in which the
outbound interface is a specific tunnel interface. If packets do not match PBR rules, they are
forwarded using normal IP forwarding; if they match PBR rules, they are forwarded over
specific CR-LSPs.
Tunnel Policy
Tunnel policies applied to virtual private networks (VPNs) guide VPN traffic to tunnels in
either of the following modes:
l Select-seq mode: The system selects tunnels for VPN traffic in the specified tunnel
selection sequence.
l Tunnel binding mode: A CR-LSP is bound to a destination address in a tunnel policy.
This policy applies only to CR-LSPs.
10.3.3 Tunnel Optimization
10.3.3.1 Tunnel Re-optimization

An MPLS TE tunnel can be automatically reestablished over a new optimal path (if one
exists) if topology information is updated.
Background
MPLS TE tunnels are used to optimize traffic distribution over a network. An MPLS TE
tunnel is configured using static information, such as a bandwidth setting and a calculated
path. Without the optimization function, an MPLS TE tunnel cannot be automatically updated
after the service bandwidth or a tunnel management policy changes. This wastes network
resources. MPLS TE tunnels need to be optimized after being established.
Implementation
The optimization enables the CR-LSP to be reestablished over the optimal path with the
smallest metric. A specific event that occurs on the ingress can trigger optimization for a CR-
LSP bound to an MPLS TE tunnel.
NOTE
l Re-optimization is disabled by default.

l The fixed filter (FF) reservation style and CR-LSP re-optimization cannot be configured
simultaneously.
l Re-optimization cannot be performed for a CR-LSP that is established over an explicit path.
Re-optimization is classified into the following modes:

l Automatic re-optimization

After the interval at which a CR-LSP is optimized elapses, constraint shortest path first
(CSPF) attempts to calculate a new path. If the calculated path has a metric smaller than
that of the existing CR-LSP, a new CR-LSP is established over the new path. After the
CR-LSP is successfully established, the ingress instructs the forwarding plane to switch
traffic to the new CR-LSP and tear down the original CR-LSP. Re-optimization is then
complete. If the CR-LSP fails to be established, traffic is still forwarded along the
existing CR-LSP.
l Manual re-optimization
A re-optimization command is run in the user view to trigger re-optimization.
The make-before-break mechanism is used to ensure uninterrupted service transmission
during the re-optimization process. Traffic must be switched to a new CR-LSP before the
original CR-LSP is torn down.
10.3.4 CR-LSP Attribute Templates
CR-LSP attribute templates can be used to flexibly configure MPLS TE tunnels in batches
and effectively manage these tunnels.
Background
If many MPLS TE tunnels need to be established, many MPLS TE functions need to be
configured and managed. To reduce the workload, CR-LSP attribute templates, each with a set
of parameters, can be used, providing configuration flexibility.
Related Concept
A CR-LSP attribute template is a set of CR-LSP parameters that are configured on a tunnel
interface.
Implementation
A network administrator creates a CR-LSP attribute template and sets attributes in this
attribute template. This attribute template is used on a tunnel interface of an ingress. The
ingress can use this template to create CR-LSPs. Table 10-8 lists attributes that can be
configured in a CR-LSP attribute template.
Table 10-8 Attributes that a CR-LSP attribute template supports

Attribute | Description
Bandwidth | Bandwidth specified for a CR-LSP to be established.
Explicit path | A specified path over which a CR-LSP is to be established.
Affinity | A constraint for a CR-LSP to be established.
Priorities | Setup and holding priorities.
Hop limit | Maximum number of hops on a CR-LSP.
Flag indicating the route and label record | Enables a tunnel interface to record routes and labels.
FRR flag | Enables Auto FRR on the tunnel interface.
Constraints for a bypass tunnel | Setup priority value, holding priority value, and bandwidth for a bypass tunnel that protects a CR-LSP established using this template.
After an attribute template is used to create a CR-LSP, this template can also be used to
manage and maintain CR-LSP attributes. CR-LSP attributes can be modified in either of the
following modes:
l Configurations in the attribute template that is used to establish a CR-LSP are modified.
The attribute template can be modified to update the attributes of an existing CR-LSP
that uses this attribute template.
A specific attribute template update has a specific impact on the setup of CR-LSPs:
– If the priority or bandwidth type is changed, the ingress tears down the existing CR-
LSP and uses the changed attribute to establish a new CR-LSP.
– If other attributes are changed, the ingress implements the make-before-break
procedure.
l Commands are run on the tunnel interface to update attributes.
When attributes are configured both using the attribute template and command lines on
the tunnel interface, the attributes configured using command lines on the tunnel
interface are used to establish a CR-LSP.
For example, an attribute template named lsp-attribute 1 is used to establish a tunnel named
Tunnel 1 with the hop limit 24. If the mpls te hop-limit command is used on the tunnel
interface to set the hop limit to 16, the ingress implements the make-before-break procedure
and establishes a new CR-LSP with the hop limit 16.
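The precedence rule and the update behavior described above can be sketched as follows. The function names and attribute keys (`hop_limit`, `priority`, `bandwidth_type`) are hypothetical labels for illustration; they are not command or parameter names from the device.

```python
def effective_attributes(template, interface_settings):
    """Attributes configured on the tunnel interface override the template."""
    merged = dict(template)
    merged.update(interface_settings)
    return merged


def template_update_action(changed_attribute):
    """How the ingress reacts when one attribute in a template changes."""
    if changed_attribute in ("priority", "bandwidth_type"):
        return "tear down and reestablish"
    return "make-before-break"


# Template "lsp-attribute 1" sets the hop limit to 24; the tunnel interface sets 16.
print(effective_attributes({"hop_limit": 24}, {"hop_limit": 16}))
print(template_update_action("hop_limit"))
```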
Other Usage
A primary CR-LSP can be established using a specific attribute template. A maximum of
three attribute templates can be specified for a hot-standby CR-LSP or an ordinary backup
CR-LSP. Each attribute template contains specific attributes. One attribute template can be
selected to establish a desired CR-LSP.
In addition to the attribute templates, a best-effort path can also be configured on the tunnel
interface. This means that the hot-standby CR-LSP, ordinary backup CR-LSP, and best-effort
path can be configured on the same tunnel interface. If no attribute template is used, only hot-
standby CR-LSPs and best-effort paths can be configured simultaneously on a tunnel
interface.
CR-LSP attribute templates can be used to establish primary and backup CR-LSPs bound to a
TE tunnel.
Benefits
CR-LSP attribute templates on a tunnel interface offer the following advantages:

l CR-LSPs with the same TE attributes can be established in a batch, which greatly
simplifies configurations.
l CR-LSP attribute templates with different settings can be configured, and one of them
can be selected to establish a hot-standby or an ordinary CR-LSP. More attributes and
paths are provided for the CR-LSP than those configured using commands.
l A maximum of three CR-LSP attribute templates on a TE tunnel interface are designated
for a hot-standby CR-LSP or an ordinary CR-LSP. Different protection paths are
available.
l A hot-standby CR-LSP, an ordinary CR-LSP, and a best-effort path are configured
simultaneously to protect a primary CR-LSP on the same tunnel interface.
l Modifying attributes in a CR-LSP attribute template updates the configuration of CR-
LSPs that have been established using that attribute template, providing more flexibility
for CR-LSP configuration.
10.3.5 MPLS TE Reliability
10.3.5.1 Reliability Overview

MPLS TE reliability techniques need to prevent or minimize packet loss that occurs in one of
the following situations:
l If attributes, such as bandwidth, are modified when an MPLS TE tunnel is transmitting
services, the tunnel is reestablished using new attributes, and services switch to the new
path.
l If a node or link fails while an MPLS TE tunnel is transmitting services, a backup tunnel
is established and takes over traffic.
l The control plane is faulty and the forwarding plane works properly on a node over an
MPLS TE tunnel that is transmitting services.
MPLS TE tunnels that transmit mission-critical services require high reliability. Table 10-9
lists MPLS TE reliability functions.
Table 10-9 MPLS TE reliability functions

Technique Classification | Description | Functions
Reliability mechanism for updating MPLS TE attributes | Ensures reliable traffic transmission after attributes are updated and a new CR-LSP is established using the updated attribute and takes over traffic. | Make-before-break
Fault detection | Rapidly detects MPLS TE network faults to speed up a protection switchover. | RSVP Hello, BFD for TE
Traffic protection | Supports network-level reliability, including E2E path protection and local protection. | TE FRR, SRLG, CR-LSP Backup, TE Tunnel Protection Group
Traffic protection | Supports device-level reliability, including uninterrupted traffic transmission on the forwarding plane if a fault occurs on the control plane of a node. | RSVP GR, RSVP NSR
10.3.5.2 Make-Before-Break
The make-before-break mechanism prevents traffic loss during a traffic switchover between
two CR-LSPs. This mechanism improves MPLS TE tunnel reliability.
Background
MPLS TE provides a set of tunnel update mechanisms, which prevents traffic loss during
tunnel updates. In real-world situations, an administrator can modify the bandwidth or explicit
path attributes of an established MPLS TE tunnel based on service requirements. An updated
topology allows for a path better than the existing one, over which an MPLS TE tunnel can be
established. Any change in bandwidth or path attributes causes a CR-LSP in an MPLS TE
tunnel to be reestablished using new attributes and causes traffic to switch from the previous
CR-LSP to the newly established CR-LSP. During the traffic switchover, the make-before-
break mechanism prevents traffic loss that occurs if the traffic switchover is implemented
more quickly than the path switchover.
Principles
Make-before-break is a mechanism that allows a CR-LSP to be established using changed
bandwidth and path attributes over a new path before the original CR-LSP is torn down. It
helps minimize data loss and additional bandwidth consumption. The new CR-LSP is called a
modified CR-LSP. Make-before-break is implemented using the shared explicit (SE) resource
reservation style.
Without a shared reservation, the new CR-LSP would compete with the original CR-LSP for
bandwidth on the links that they share, and could not be established if it lost the
competition. The make-before-break mechanism allows the system to reuse the bandwidth
reserved by the original CR-LSP for the new CR-LSP, so that this bandwidth is not counted
twice. Additional bandwidth is used if links on the new path do not overlap the links on the
original path.

Figure 10-63 Schematic diagram for make-before-break

[Figure: LSRA, LSRB, LSRC, and LSRD are connected in a chain, and LSRE is connected to LSRA
and LSRC. Each link is labeled 60 Mbit/s.]
In this example, the maximum reservable bandwidth on each link of the network shown in
Figure 10-63 is 60 Mbit/s. A CR-LSP with a bandwidth of 40 Mbit/s is established along the
path LSRA → LSRB → LSRC → LSRD.
The path is expected to change to LSRA → LSRE → LSRC → LSRD to forward data
because LSRE has a light load. Because the original CR-LSP already reserves 40 Mbit/s, the
remaining reservable bandwidth of the link between LSRC and LSRD is only 20 Mbit/s, which
is less than the 40 Mbit/s that the new path requires.
The make-before-break mechanism can be used in this situation.
The make-before-break mechanism allows the newly established CR-LSP over the path
LSRA → LSRE → LSRC → LSRD to use the bandwidth of the original CR-LSP's link
between LSRC and LSRD. After the new CR-LSP is established over the path, traffic
switches to the new CR-LSP, and the original CR-LSP is torn down.
In addition to the preceding method, another method of increasing the tunnel bandwidth can
be used. If the reservable bandwidth of a shared link increases to a certain extent, a new CR-
LSP can be established.
In the example shown in Figure 10-63, the maximum reservable bandwidth on each link is 60
Mbit/s. A CR-LSP along the path LSRA → LSRB → LSRC → LSRD is established, with the
bandwidth of 30 Mbit/s.
The path is expected to change to LSRA → LSRE → LSRC → LSRD to forward data
because LSRE has a light load, and the bandwidth is expected to increase to 40 Mbit/s. The
reservable bandwidth of the link between LSRC and LSRD is just 30 Mbit/s. The total
available bandwidth for the new path is less than 40 Mbit/s. The make-before-break
mechanism can be used in this situation.
The make-before-break mechanism allows the newly established CR-LSP over the path
LSRA → LSRE → LSRC → LSRD to use the bandwidth of the original CR-LSP's link
between LSRC and LSRD. The bandwidth of the new CR-LSP is 40 Mbit/s, of which 30
Mbit/s is the bandwidth that the original CR-LSP reserved on the link between LSRC and
LSRD. After the new CR-LSP is established, traffic switches to the new CR-LSP and the
original CR-LSP is torn down.
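The bandwidth arithmetic in the two examples can be checked with a short sketch. It is a simplification under stated assumptions: the function names are hypothetical, and the original CR-LSP is assumed to be the only reservation on the shared link between LSRC and LSRD.

```python
def can_establish(max_reservable, original_bw, new_bw, share_with_original):
    """Check whether the new CR-LSP fits on a link shared with the original CR-LSP.

    All values are in Mbit/s.
    """
    unreserved = max_reservable - original_bw
    if share_with_original:
        # SE style: the new CR-LSP may reuse what the original CR-LSP reserved.
        return new_bw <= unreserved + original_bw
    return new_bw <= unreserved


def extra_bandwidth(original_bw, new_bw):
    """Bandwidth that must be reserved on the shared link beyond the original."""
    return max(0, new_bw - original_bw)


# First example: 40 Mbit/s moved to a new path that shares the LSRC-LSRD link.
print(can_establish(60, 40, 40, share_with_original=False))
print(can_establish(60, 40, 40, share_with_original=True))
# Second example: bandwidth increased from 30 Mbit/s to 40 Mbit/s.
print(can_establish(60, 30, 40, share_with_original=True), extra_bandwidth(30, 40))
```

In the second example, only the 10 Mbit/s difference is newly reserved on the shared link, because the other 30 Mbit/s is already held by the original CR-LSP.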
Delayed Switchover and Deletion

If an upstream node on an MPLS network is busy but its downstream node is idle or an
upstream node is idle but its downstream node is busy, a CR-LSP may be torn down before
the new CR-LSP is established, causing a temporary traffic interruption.
To prevent this temporary traffic interruption, switching and deletion delays are used
together with the make-before-break mechanism. After a new CR-LSP is established, traffic
switches to it only after a specified switching delay elapses, and the original CR-LSP is
torn down only after a specified deletion delay elapses. The switching delay and
deletion delay can be manually configured.
10.3.5.3 TE FRR
TE FRR protects links and nodes on CR-LSPs bound to an MPLS TE tunnel. If a link or node
fails, TE FRR rapidly switches traffic to a backup path, minimizing traffic loss.
Background
A link or node failure triggers a primary/backup CR-LSP switchover. IGP routes of the
backup path need to converge, and CSPF recalculates a path over which a CR-LSP is
established. Traffic is dropped during this process.
TE FRR can be used to prevent traffic loss. TE FRR establishes a bypass CR-LSP that
excludes the protected link or node. If the link or node fails, the bypass CR-LSP rapidly
takes over traffic, minimizing traffic loss. The ingress can reestablish a primary CR-LSP.
Related Concepts
Figure 10-64 Local protection
[Figure: the primary CR-LSP passes through LSRA, LSRB, LSRC, and LSRD. The bypass CR-LSP runs
from the PLR to the MP through LSRE.]
Table 10-10 describes TE FRR concepts.
Table 10-10 TE FRR concepts
Concept | Description
Primary CR-LSP | A CR-LSP that is protected.
Bypass CR-LSP | A CR-LSP that protects the primary CR-LSP. The bypass CR-LSP is usually idle and transmits little data. If the bypass CR-LSP needs to forward service data when it protects the primary CR-LSP, sufficient bandwidth must be allocated to the bypass CR-LSP.
Point of Local Repair (PLR) | The ingress of the bypass CR-LSP. It must be on the path of the primary CR-LSP. The PLR can be the ingress but not the egress of the primary CR-LSP.
Merge point (MP) | The egress of the bypass CR-LSP. It must be on the path of the primary CR-LSP. The MP cannot be the ingress of the primary CR-LSP.
Table 10-11 describes TE FRR protection functions.
Table 10-11 TE FRR protection functions

Classified By | Type | Description
Object to be protected | Link protection | The PLR (LSRB) and MP (LSRC) are directly connected, and the primary CR-LSP passes through the direct link. Bypass CR-LSP 1 protects the direct link, as shown in Figure 10-65.
Object to be protected | Node protection | A primary CR-LSP between the PLR (LSRB) and MP (LSRD) passes through LSRC. Bypass CR-LSP 2 protects LSRC on the primary CR-LSP, as shown in Figure 10-65.
Bandwidth | Bandwidth protection | The bandwidth of a bypass CR-LSP is higher than or equal to that of the primary CR-LSP. The bypass CR-LSP protects the primary CR-LSP and its bandwidth.
Bandwidth | Non-bandwidth protection | No bandwidth is assigned to a bypass CR-LSP. The bypass CR-LSP protects only the path of the primary CR-LSP.
Implementation | Manual protection | A manually configured bypass CR-LSP is established and bound to a CR-LSP that is to be protected. If a link or node on the protected CR-LSP fails, traffic automatically switches to the bypass CR-LSP.
Implementation | Auto FRR protection | An Auto FRR-enabled node automatically establishes a bypass CR-LSP. The node binds the bypass CR-LSP to a primary CR-LSP if the node receives an FRR protection request and the FRR topology requirements are met.
Figure 10-65 TE FRR link and node protection

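[Figure: the primary LSP passes through LSRA, LSRB (PLR), LSRC, LSRD, and LSRE. Bypass CR-LSP 1
provides link protection with LSRC as the MP, and bypass CR-LSP 2 provides node protection
with LSRD as the MP. LSRF, LSRG, and LSRH are also shown, and a faulty point is marked.]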
NOTE
A bypass CR-LSP supports the combination of protection modes. For example, manual protection, node
protection, and bandwidth protection can be implemented together on a bypass CR-LSP.
Implementation
The PLR implements TE FRR as follows:
1. Establishes a primary CR-LSP.

A primary CR-LSP is established the same way that an ordinary CR-LSP is established.
The PLR, however, adds the following flags into the Session_Attribute object in a Path
message:
– Local protection desired flag
– Route record desired flag
– Shared explicit (SE) style flag
– Bandwidth protection desired flag: can be added to the Path message if bandwidth
protection is required.
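Figure 10-66 TE FRR local protection
(Figure: a Path message travels along LSRA, LSRB, LSRC, LSRD, and LSRE, with the PLR and MP marked on this path and LSRF shown off the path. The SESSION_ATTRIBUTE object of the Path message carries the "Local protection desired" and "Bandwidth protection desired" flags.)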
2. Binds a bypass CR-LSP to the primary CR-LSP.

Searching for a suitable bypass CR-LSP is also called bypass CR-LSP binding. This
process is completed before a CR-LSP switchover is performed. A bypass CR-LSP can
be bound only to a primary CR-LSP whose Path message carries the "local protection
desired" flag. The PLR must obtain the following information before binding the two CR-LSPs:
– Outbound interface
– Next Hop Label Forwarding Entry (NHLFE)
– Label switching router (LSR) ID of the MP
– Label allocated by the MP
– Protection type:
▪ Link protection: The next hop (NHOP) is the egress of the bypass CR-LSP. For
example, bypass CR-LSP 1 shown in Figure 10-67 provides link protection.
▪ Node protection: The next-next hop (NNHOP) is the egress of the bypass CR-LSP.
For example, bypass CR-LSP 2 shown in Figure 10-67 provides node protection.
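Figure 10-67 Binding between bypass and primary CR-LSPs
(Figure: the primary LSP runs through LSRA, LSRB (PLR), LSRC (NHOP), LSRD (NNHOP), and LSRE. Bypass CR-LSP 1 provides link protection, and bypass CR-LSP 2 provides node protection. LSRF, LSRG, and LSRH are also shown, and the faulty points are marked.)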
If multiple bypass CR-LSPs are established, the PLR selects the one with the highest
priority. The PLR compares bypass CR-LSPs by the following criteria, in the order listed:
– Bandwidth protection takes precedence over non-bandwidth protection.
– Manual protection takes precedence over Auto FRR protection.
– Node protection takes precedence over link protection.
Both bypass CR-LSPs 1 and 2 shown in Figure 10-67 are manually configured and
provide bandwidth protection. Bypass CR-LSP 1, which protects a link, has a lower
priority than bypass CR-LSP 2, which protects a node. In this scenario, bypass CR-LSP 2
is bound to the primary CR-LSP. If bypass CR-LSP 1 only protects bandwidth
and bypass CR-LSP 2 only protects a link, bypass CR-LSP 1 is bound to the
primary CR-LSP.

After the binding is complete, the primary CR-LSP NHLFE records the bypass CR-LSP
NHLFE index and an inner label that the MP allocates for the primary CR-LSP. The
label is used to forward traffic from the MP to the next hop along the primary CR-LSP.
3. Performs fault detection.
– Link protection directly uses a data link layer protocol to detect and report faults.
The speed of fault detection at the data link layer depends on the link type.
– Node protection uses a data link layer protocol to detect link faults. If no link fault
occurs, the bidirectional forwarding detection (BFD) mechanism is used to detect
faults in a protected node.
After a link or node fault is detected, FRR switching is triggered immediately.
NOTE
If node protection is enabled, only the link between the protected node and PLR is protected. The
PLR cannot detect faults in the link between the protected node and MP.
4. Performs a traffic switchover.
If the primary CR-LSP fails, both data traffic and RSVP messages switch to the bypass
CR-LSP, and the switchover event is reported upstream. The PLR pushes both an inner
label that the MP assigns for the primary CR-LSP and an outer label assigned for the
bypass CR-LSP into a packet. The outer label is removed at the penultimate hop of the
bypass CR-LSP, and the packet, only with the inner label, arrives at the MP. The MP
forwards the packet to the next hop along the primary CR-LSP.
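Figure 10-68 Packet forwarding before TE FRR switching
(Figure: primary LSP along LSRA, LSRB (PLR), LSRC, LSRD (MP), and LSRE, with the bypass LSP shown between the PLR and MP. Labels along the primary LSP: 1024, 1025, and 1022, followed by an unlabeled IP packet. Operations shown: Swap, Swap, Pop.)
Figure 10-69 Packet forwarding after TE FRR switching
(Figure: at the faulty point, LSRB (PLR) swaps label 1024 for 1022 and pushes outer label 34. Along the bypass LSP the outer label changes from 34 to 35 to 36 (Swap, Swap) and is then popped, while the inner label 1022 is unchanged. The packet reaches LSRD (MP) with label 1022 and continues to LSRE as an IP packet.)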
Figure 10-68 shows nodes on the primary and bypass CR-LSPs and their allocated
labels and forwarding behaviors. The bypass CR-LSP provides node protection. If the
link between LSRB and LSRC fails or LSRC fails, LSRB (PLR) swaps an inner label
1024 for an inner label 1022, pushes an outer label 34 into the packet, and forwards the
packet over the bypass CR-LSP. After the packet arrives at LSRD, LSRD forwards the
packet to the next hop LSRE. For the detailed transmission procedure, see Figure 10-69.
5. Performs a traffic switchback.
After TE FRR (either manual or Auto FRR) switching is complete, the PLR (ingress)
attempts to reestablish the primary CR-LSP using the make-before-break mechanism.
Service traffic and RSVP messages switch from the bypass CR-LSP back to the primary
CR-LSP after the primary CR-LSP is successfully reestablished. The reestablished CR-
LSP is called a modified CR-LSP. The make-before-break mechanism allows the original
primary CR-LSP to be torn down only after the modified CR-LSP is established
successfully.
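The bypass selection rule in step 2 can be illustrated with a short Python sketch. It assumes that the three criteria are compared in the order listed (bandwidth, then implementation mode, then protected object); the class, field, and function names are hypothetical and do not represent the device implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bypass:
    name: str
    bandwidth_protection: bool  # True: bandwidth protection; False: non-bandwidth protection
    manual: bool                # True: manual protection; False: Auto FRR protection
    node_protection: bool       # True: egress is the NNHOP; False: egress is the NHOP

def priority_key(bypass):
    # Assumption: criteria are compared lexicographically in the listed order.
    return (bypass.bandwidth_protection, bypass.manual, bypass.node_protection)

def select_bypass(candidates):
    """Return the highest-priority bypass CR-LSP, or None if there are no candidates."""
    return max(candidates, key=priority_key) if candidates else None

# Scenario from the text: both bypass CR-LSPs are manual and provide bandwidth protection.
bypass1 = Bypass("bypass CR-LSP 1", True, True, False)  # link protection
bypass2 = Bypass("bypass CR-LSP 2", True, True, True)   # node protection
print(select_bypass([bypass1, bypass2]).name)
```

In this scenario the sketch selects bypass CR-LSP 2, because the two candidates differ only in the protected object and node protection ranks above link protection.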
NOTE
FRR does not take effect if multiple nodes fail simultaneously. This means that after FRR switches data
from the primary CR-LSP to the bypass CR-LSP, all nodes on the bypass CR-LSP must be working
properly when transmitting data. If the bypass CR-LSP fails, the protected data cannot be forwarded,
and the FRR function fails. Even if the bypass CR-LSP is reestablished, it cannot forward data. Data will
be restored only after the primary CR-LSP is restored or reestablished.
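The label operations in step 4 can be traced with a minimal Python sketch. The label values are taken from Figure 10-68 and Figure 10-69; the constant and function names are hypothetical, and the model covers only the label stack, not real packet forwarding.

```python
INNER_AT_PLR = 1024          # label the PLR receives on the primary CR-LSP
INNER_FROM_MP = 1022         # label the MP allocated for the primary CR-LSP
BYPASS_OUTER = [34, 35, 36]  # outer labels used hop by hop on the bypass CR-LSP

def forward_over_bypass(stack):
    """Return the label stack (top first) after each hop from the PLR to the MP."""
    if not stack or stack[0] != INNER_AT_PLR:
        raise ValueError("unexpected incoming label at the PLR")
    trace = []
    # PLR: swap the inner label, then push the first bypass label on top.
    stack = [BYPASS_OUTER[0], INNER_FROM_MP] + stack[1:]
    trace.append(stack)
    # Bypass transit hops: swap only the outer label.
    for outer in BYPASS_OUTER[1:]:
        stack = [outer] + stack[1:]
        trace.append(stack)
    # Penultimate hop of the bypass CR-LSP: pop the outer label.
    stack = stack[1:]
    trace.append(stack)
    return trace

for hop_stack in forward_over_bypass([1024]):
    print(hop_stack)
```

The last entry of the trace holds only the inner label, which is the label the MP expects for the primary CR-LSP. This is why the MP can forward the packet as if it had arrived over the primary path.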
Other Usage
● Board hot removal protection
Board hot removal protection protects traffic on the primary CR-LSP's outbound
interface on a PLR. If the interface board on which a protected outbound interface of a
primary CR-LSP resides is removed from a PLR, the PLR rapidly switches traffic to a
bypass CR-LSP. After the interface board is re-installed and the outbound interface of the
primary CR-LSP becomes available, traffic switches back to the primary CR-LSP.
Hot removal protection does not apply to an interface board on which tunnel interfaces
are configured. If an interface board configured with a tunnel interface is removed,
CR-LSP information is lost and traffic is interrupted. The tunnel interfaces of the primary
and bypass CR-LSPs and the outbound interface of the bypass CR-LSP must be configured on
boards other than the board that houses the primary CR-LSP's outbound interface
on the PLR.
Configuring tunnel interfaces on the main control board of the PLR is recommended. If
the interface board on which the primary CR-LSP's outbound interface resides is removed or
fails, the primary CR-LSP's tunnel interface enters the Stale state, and the resources
allocated to the tunnel interface are retained. After the interface board is re-installed, the
tunnel interface recovers and a primary CR-LSP is reestablished.
● N:1 protection
A single bypass CR-LSP can protect traffic over multiple primary CR-LSPs.
TE FRR is a local protection mechanism that applies to MPLS TE networks that have backup
paths.
Benefits
TE FRR provides carrier-class local protection capabilities for CR-LSPs to improve network
reliability.
10.3.5.4 CR-LSP Backup

CR-LSP backup techniques protect E2E MPLS TE tunnels. If the ingress finds that the
primary CR-LSP is unavailable, the ingress switches traffic to a backup CR-LSP. After the
primary CR-LSP recovers, traffic switches back.
Related Concepts
CR-LSP backup provides the following functions:
● Hot standby: A hot-standby CR-LSP is established immediately after a primary CR-LSP
is created. If the primary CR-LSP fails, the hot-standby CR-LSP takes over traffic from
the primary CR-LSP. After the primary CR-LSP recovers, traffic switches back.
● Ordinary backup: An ordinary backup CR-LSP is established only after a primary
CR-LSP fails. The ordinary backup CR-LSP takes over traffic if the primary CR-LSP
fails. After the primary CR-LSP recovers, traffic switches back.
● Best-effort path: If both the primary and backup CR-LSPs fail, a best-effort path is
established and takes over traffic.
For example, the primary CR-LSP is established over the path PE1 → P1 → P2 → PE2,
and the backup CR-LSP is established over the path PE1 → P3 → PE2, as shown in Figure
10-70. If both CR-LSPs fail, PE1 establishes a best-effort path PE1 → P4 → PE2 to take
over traffic.

Figure 10-70 Best-effort path
(Figure: primary CR-LSP PE1 → P1 → P2 → PE2, backup CR-LSP PE1 → P3 → PE2, and best-effort path PE1 → P4 → PE2.)
NOTE
A best-effort path has no bandwidth reserved for traffic, but has an affinity and a hop limit
configured as needed.
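The fallback order among the CR-LSP types can be sketched in Python as follows. This is an illustration only: the dictionary of path states and the function name are hypothetical, and a path counts as usable here if it is Up or can be established when needed.

```python
# Preference order used by the ingress, highest priority first.
PRIORITY = ["primary", "hot-standby", "ordinary backup", "best-effort"]

def select_path(usable):
    """Return the highest-priority usable CR-LSP type, or None if none is usable."""
    for name in PRIORITY:
        if usable.get(name, False):
            return name
    return None

# The primary and backup CR-LSPs have failed, so the best-effort path takes over.
print(select_path({"primary": False, "hot-standby": False,
                   "ordinary backup": False, "best-effort": True}))
```

Calling the same function again after the primary CR-LSP recovers returns "primary", which corresponds to the traffic switchback.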
Implementation
The procedure of CR-LSP backup is as follows:
1. CR-LSP backup is deployed.
Plan the paths, bandwidth values, and deployment modes. Table 10-12 lists CR-LSP
backup deployment items.
Table 10-12 CR-LSP backup deployment
Item | Hot Standby | Ordinary Backup | Best-Effort Path
Path | Determines whether the primary and hot-standby CR-LSPs entirely or partially overlap. A hot-standby CR-LSP can be established over an explicit path. Supported attributes: explicit path, affinity, hop limit, and overlapping path for a hot-standby CR-LSP. | Allowed to use the path of the primary CR-LSP in all scenarios. Supported attributes: explicit path, affinity, and hop limit. | Automatically calculated by the ingress. Supported attributes: affinity and hop limit.
Bandwidth | A hot-standby CR-LSP and a primary CR-LSP have the same bandwidth by default. Dynamic Bandwidth Protection for Hot-standby CR-LSPs is supported and ensures that a hot-standby CR-LSP does not use additional bandwidth while the primary CR-LSP is transmitting traffic. | An ordinary backup CR-LSP and a primary CR-LSP have the same bandwidth. | A best-effort path is only a protection path and does not have reserved bandwidth.
Deployment mode | Can be established with or without attribute templates. | Can be established with or without attribute templates. | Automatically established without attribute templates. Attribute templates are not supported.
Configuration combination | If established without an attribute template, a hot-standby CR-LSP can be used together with a best-effort path. If established using an attribute template, a hot-standby CR-LSP can be used together with both an ordinary backup CR-LSP and a best-effort path. | If established without an attribute template, an ordinary backup CR-LSP can only be used alone. If established using an attribute template, an ordinary backup CR-LSP can be used together with a hot-standby CR-LSP and a best-effort path. | –
2. CR-LSPs are established in sequence.

There may be many available paths for a CR-LSP on the same tunnel interface. The
ingress attempts to use each path to establish a CR-LSP until a CR-LSP is successfully
established.
The rules for establishing a CR-LSP are as follows:
a. If a new tunnel configuration is committed or a tunnel goes Down, the ingress first
attempts to establish a primary CR-LSP. If the attempt fails, the ingress attempts to
establish a hot-standby CR-LSP. If establishing the hot-standby CR-LSP fails, the
ingress then attempts to establish an ordinary backup CR-LSP. If this attempt also
fails, the ingress establishes a best-effort path.

b. A maximum of three CR-LSP attribute templates can be configured for hot-standby
CR-LSPs and three for ordinary backup CR-LSPs. These templates are prioritized.
The ingress uses each in descending order by priority until a CR-LSP is
successfully established.
c. If a CR-LSP has been established using a lower-priority attribute template and the
CR-LSP status changes, the ingress will attempt to establish a CR-LSP using a
higher-priority attribute template. The make-before-break mechanism ensures that
traffic is uninterrupted when a new CR-LSP is being established.
d. If a stable CR-LSP has been established using any of the attribute templates, you
can lock the used backup CR-LSP attribute template. After the attribute template is
locked, the ingress will not attempt to use a higher-priority attribute template to
establish a CR-LSP. This locking function prevents unnecessary traffic switchovers
and lowers system costs.
3. Backup CR-LSP attributes are modified.
If an attribute of a backup CR-LSP is modified, the ingress uses the make-before-break
mechanism to reestablish the backup CR-LSP with the updated attribute. After that
backup CR-LSP has been successfully reestablished, traffic on the original backup CR-
LSP (if it is transmitting traffic) switches to this new backup CR-LSP, and the original
backup CR-LSP is torn down.
4. Fault detection is implemented.
CR-LSP backup supports the following fault detection functions:
– The RSVP-TE fault advertisement mechanism sends signaling packets to detect
faults at a low speed.
– Bidirectional forwarding detection (BFD) for CR-LSP rapidly detects faults. This is
a recommended function.
5. A traffic switchover is implemented.
If a primary CR-LSP fails, the ingress attempts to switch traffic from the primary CR-
LSP to a hot-standby CR-LSP. If the hot-standby CR-LSP is unavailable, the ingress
attempts to switch traffic to an ordinary backup CR-LSP. If the ordinary backup CR-LSP
is unavailable, the ingress attempts to switch traffic to a best-effort path.
6. A traffic switchback is implemented.
Traffic switches back to a path based on the available CR-LSPs. Traffic will switch first
to the primary CR-LSP, which has the highest priority. If the primary CR-LSP is
unavailable, traffic will switch to the hot-standby CR-LSP. The ordinary backup CR-LSP has the
lowest priority.
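The attribute template rules in step 2 (priority order, retry with a higher-priority template, and locking) can be sketched in Python. The function name, its parameters, and the template names are hypothetical; `can_establish` stands in for an actual CR-LSP setup attempt.

```python
def choose_template(templates, can_establish, current=None, locked=False):
    """Pick the attribute template for a backup CR-LSP.

    templates: template names ordered from highest to lowest priority.
    can_establish: callable that reports whether a CR-LSP can be set up with a template.
    current: template of the CR-LSP that is already established, if any.
    locked: True if the template in use has been locked.
    """
    if len(templates) > 3:
        raise ValueError("at most three attribute templates per backup type")
    if locked and current is not None:
        return current  # locked: do not try a higher-priority template
    for template in templates:
        if can_establish(template):
            return template
    return None

templates = ["T1", "T2", "T3"]
usable = {"T2", "T3"}
current = choose_template(templates, usable.__contains__)
print(current)                      # T1 is unusable, so T2 is chosen
usable.add("T1")                    # T1 becomes usable later
print(choose_template(templates, usable.__contains__, current))
print(choose_template(templates, usable.__contains__, current, locked=True))
```

With the lock set, the sketch keeps the CR-LSP on T2 even though T1 has become usable, which mirrors how locking avoids an unnecessary switchover.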
Dynamic Bandwidth Protection for Hot-standby CR-LSPs

Hot-standby CR-LSPs support dynamic bandwidth protection. The dynamic bandwidth
protection function allows a hot-standby CR-LSP to obtain bandwidth resources only after the
hot-standby CR-LSP takes over traffic from a faulty primary CR-LSP. This function uses
network resources efficiently and reduces network costs.
Dynamic bandwidth protection ensures that the hot-standby CR-LSP does not use bandwidth
while the primary CR-LSP is transmitting traffic. The dynamic bandwidth protection process
is as follows:
1. If the primary CR-LSP fails, traffic immediately switches to the hot-standby CR-LSP
with 0 bit/s bandwidth. The ingress then uses the make-before-break mechanism to establish a
new hot-standby CR-LSP that reserves bandwidth.
2. After the new hot-standby CR-LSP has been successfully established, the ingress
switches traffic to this CR-LSP and tears down the hot-standby CR-LSP with 0 bit/s
bandwidth.
3. After the primary CR-LSP recovers, traffic switches back to the primary CR-LSP. The
hot-standby CR-LSP then releases the bandwidth it uses, and the ingress establishes
another hot-standby CR-LSP with no bandwidth.
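The three steps above can be modeled as a small state sketch in Python. The class and method names are hypothetical, and the sketch assumes that the new hot-standby CR-LSP reserves the same bandwidth as the primary CR-LSP.

```python
class HotStandbyModel:
    """Tracks which CR-LSP carries traffic and how much bandwidth the hot-standby reserves."""

    def __init__(self, required_bw):
        self.required_bw = required_bw  # bit/s needed by the protected traffic (assumed)
        self.reserved_bw = 0            # hot-standby CR-LSP starts with 0 bit/s
        self.active = "primary"

    def primary_failed(self):
        # Step 1: traffic switches at once to the 0 bit/s hot-standby CR-LSP.
        self.active = "hot-standby"

    def make_before_break_done(self):
        # Step 2: the new hot-standby CR-LSP with bandwidth replaces the 0 bit/s one.
        if self.active == "hot-standby":
            self.reserved_bw = self.required_bw

    def primary_recovered(self):
        # Step 3: traffic switches back and the hot-standby bandwidth is released.
        self.active = "primary"
        self.reserved_bw = 0

model = HotStandbyModel(required_bw=100_000_000)
model.primary_failed()
model.make_before_break_done()
print(model.active, model.reserved_bw)
model.primary_recovered()
print(model.active, model.reserved_bw)
```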
Overlapping Path for a Hot-standby CR-LSP

The overlapping path function can be configured for a hot-standby CR-LSP. The path of the
hot-standby CR-LSP can overlap the path of a primary CR-LSP in all scenarios.
Coexistence of CR-LSP Backup and TE FRR

1. CR-LSP backup functions can be used together with TE FRR.
– Hot standby and TE FRR: If TE FRR detects a link fault, traffic switches to a TE
FRR bypass CR-LSP and then to a hot-standby CR-LSP.
– Ordinary backup and TE FRR: If TE FRR detects a link fault, traffic switches to a
TE FRR bypass CR-LSP. If both the primary and TE FRR bypass CR-LSPs fail, an
ordinary backup CR-LSP is established and takes over traffic.
2. CR-LSP backup can be associated with TE FRR.
The association improves tunnel security. The association provides the following
functions based on backup modes:
– Association between an ordinary backup CR-LSP and a TE FRR bypass CR-LSP
provides the following functions:
If a protected link or node fails, traffic switches to a bypass CR-LSP. The ingress
attempts to reestablish the primary CR-LSP, while attempting to establish an
ordinary backup CR-LSP.
If the ordinary backup CR-LSP is established successfully before the primary CR-
LSP is restored, traffic switches to the ordinary backup CR-LSP.
After the primary CR-LSP recovers, traffic switches back to the primary CR-LSP.
If the ordinary backup CR-LSP fails to be established, and the primary CR-LSP
does not recover, the traffic still passes through the bypass CR-LSP.
– Association between a hot-standby CR-LSP and a TE FRR bypass CR-LSP
provides the following functions:
If a hot-standby CR-LSP is Up and a protected link or node fails, traffic switches to
a TE FRR bypass CR-LSP and then immediately switches to the hot-standby CR-
LSP. At the same time, the ingress attempts to restore the primary CR-LSP.
If the hot-standby CR-LSP is Down, the traffic switching procedure is the same as
that when the ordinary backup is used.
Association between ordinary backup CR-LSPs and TE FRR is recommended. An
ordinary backup CR-LSP is established only after the primary CR-LSP enters the
FRR-in-use state, so it needs no additional bandwidth while the primary CR-LSP is
working properly. In contrast, the system attempts to establish a hot-standby CR-LSP,
which needs additional bandwidth, even while the primary CR-LSP is Up.
NOTE
The dynamic bandwidth function can be configured to allow the system to create a primary CR-
LSP and a hot-standby CR-LSP with the bandwidth of 0 bit/s simultaneously. The hot-standby
CR-LSP does not use bandwidth resources before the primary CR-LSP fails.

10.3.5.5 Isolated LSP Computation

Isolated LSP computation enables a device to compute isolated primary and hot-standby label
switched paths (LSPs) by using the disjoint algorithm and constrained shortest path first
(CSPF) algorithm simultaneously.
Background
Most live IP radio access networks (RANs) use ring topologies and have the access ring
separated from the aggregation ring. To improve the end-to-end and inter-ring LSP reliability,
many IP RAN carriers require isolated primary and hot-standby LSPs. The CSPF algorithm
does not meet this reliability requirement, because CSPF is a metric-based path computing
algorithm that may compute two intersecting LSPs. Specifying explicit paths can meet this
reliability requirement; this method, however, does not adapt to topology changes. Each time
a node is added to or deleted from the IP RAN, operators must configure new explicit paths,
which is time-consuming and laborious. To resolve these problems, you can configure
isolated LSP computation.
Figure 10-71 shows an IP RAN on which a Multiprotocol Label Switching (MPLS) Traffic
Engineering (TE) tunnel is established between a cell site gateway (CSG) on the access ring
and a radio service gateway (RSG) on the aggregation ring. The MPLS TE tunnel implements
the end-to-end virtual private network (VPN) service. To improve the network reliability, this
network requires the constraint-based routed label switched path (CR-LSP) hot standby
feature and isolated primary and hot-standby LSPs.
Without the isolated LSP computation feature, CSPF on this network will compute CSG ->
ASG1 -> ASG2 -> RSG as the primary LSP. This LSP does not have an isolated hot-standby
LSP. However, two isolated LSPs exist on this network: CSG -> ASG1 -> RSG and CSG ->
ASG2 -> RSG. With the isolated LSP computation feature, the disjoint and CSPF algorithms
work simultaneously to get the two isolated LSPs.
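Figure 10-71 Application of isolated LSP computation on an end-to-end VPN bearer network
(Figure: an IP RAN with last-mile, access, aggregation, and MBB core segments. An MPLS TE tunnel with hot standby runs between the CSG and the RSG, which are connected through ASG1 and ASG2. Link metrics: CSG-ASG1 = 1, ASG1-RSG = 4, ASG1-ASG2 = 1, CSG-ASG2 = 3, ASG2-RSG = 2. The tunnel carries L3VPN (Ethernet) and PW (ATM/TDM) services between the NodeB or eNodeB and the RNC or MME/SGW.)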
Implementation
Isolated LSP computation is implemented by both the disjoint and CSPF algorithms. This
feature computes primary and hot-standby LSPs simultaneously and cuts off overlapping
paths of the two LSPs to get two isolated LSPs. In the example shown in Figure 10-72, before
isolated LSP computation is configured, CSPF computes LSRA -> LSRB -> LSRC -> LSRD
as the primary LSP and LSRA -> LSRC -> LSRD as the hot-standby LSP if path overlapping
is allowed. These two LSPs intersect, so they do not meet the reliability requirement.
After isolated LSP computation is configured, the disjoint and CSPF algorithms compute
LSRA -> LSRB -> LSRD as the primary LSP and LSRA -> LSRC -> LSRD as the
hot-standby LSP. These two LSPs do not intersect, so they meet the reliability requirement.
Figure 10-72 Principles of the disjoint algorithm
(Figure: two panels with LSRA, LSRB, LSRC, and LSRD, one before and one after isolated LSP computation is configured. Link metrics: LSRA-LSRB = 1, LSRB-LSRD = 4, LSRB-LSRC = 1, LSRA-LSRC = 3, LSRC-LSRD = 2. Legend: primary LSP, hot-standby LSP, excluded path.)
NOTE
● Isolated LSP computation is a best-effort technique. If the disjoint and CSPF algorithms cannot get
isolated primary and hot-standby LSPs or two isolated LSPs do not exist, the device uses the
primary and hot-standby LSPs computed by CSPF.
● The disjoint algorithm cannot work together with the following features: explicit path, affinities, hop
limit, CR-LSP attribute template, and automatic bandwidth adjustment. Therefore, before you
configure isolated LSP computation, check that all those features are disabled. Otherwise, the device
does not allow you to configure isolated LSP computation. After you configure isolated LSP
computation, the device does not allow you to configure any of those features, either.
● After you configure isolated LSP computation, the shared risk link group (SRLG), if configured,
becomes ineffective.
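The effect of isolated LSP computation on the topology in Figure 10-72 can be reproduced with a short Python sketch. The device's disjoint algorithm is not specified here, so the sketch uses brute-force enumeration of simple paths instead; the link metrics are read from the figure, and the function names and the fallback rule are assumptions.

```python
from itertools import combinations

# Link metrics as read from Figure 10-72 (assumed mapping of metrics to links).
LINKS = {("LSRA", "LSRB"): 1, ("LSRB", "LSRD"): 4, ("LSRB", "LSRC"): 1,
         ("LSRA", "LSRC"): 3, ("LSRC", "LSRD"): 2}

def neighbors(node):
    """Yield (neighbor, metric) for every link attached to node."""
    for (a, b), metric in LINKS.items():
        if a == node:
            yield b, metric
        elif b == node:
            yield a, metric

def simple_paths(src, dst, path=None, cost=0):
    """Yield (total_metric, path) for every loop-free path from src to dst."""
    path = path or [src]
    if src == dst:
        yield cost, path
        return
    for nxt, metric in neighbors(src):
        if nxt not in path:
            yield from simple_paths(nxt, dst, path + [nxt], cost + metric)

def isolated_pair(src, dst):
    """Return two paths that share no transit node, minimizing the total metric."""
    paths = sorted(simple_paths(src, dst))
    disjoint = [(c1 + c2, p1, p2)
                for (c1, p1), (c2, p2) in combinations(paths, 2)
                if not set(p1[1:-1]) & set(p2[1:-1])]
    if disjoint:
        _, first, second = min(disjoint)
        return first, second
    # Best effort (simplified): fall back to the lowest-metric paths.
    return paths[0][1], (paths[1][1] if len(paths) > 1 else None)

print(sorted(simple_paths("LSRA", "LSRD"))[0])  # lowest-metric path, as plain CSPF would pick
print(isolated_pair("LSRA", "LSRD"))
```

Enumerating all paths is practical only for small topologies such as this one. It is used here because it makes the trade-off visible: the lowest-metric path passes through both LSRB and LSRC and therefore leaves no isolated second path.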
Usage Scenario
Isolated LSP computation applies to networks on which Resource Reservation Protocol -
Traffic Engineering (RSVP-TE) tunnels and the CR-LSP hot standby feature are configured.
Benefits
Isolated LSP computation offers the following benefits to carriers:
● Improves the network reliability.
● Reduces the maintenance workload.
10.3.5.6 SRLG
The shared risk link group (SRLG) functions as a constraint that is used to calculate a backup
path in the scenario where CR-LSP hot standby or TE FRR is used. This constraint helps
prevent backup and primary paths from overlapping over links with the same risk level,
improving MPLS TE tunnel reliability as a consequence.
Background
Carriers use CR-LSP hot standby or TE FRR to improve MPLS TE tunnel reliability.
However, in real-world situations, protection failures may occur, requiring the SRLG
technique to be configured as a preventative measure, as the following example demonstrates.
Figure 10-73 Networking diagram for an SRLG
(Figure: the logical topology shows PE1, P1, P2, PE2, and P3. The physical topology shows that P1, P2, and P3 are connected through optical transport device NE1 and use shared links that form an SRLG.)
The primary tunnel is established over the path PE1 → P1 → P2 → PE2 on the network
shown in Figure 10-73. The link between P1 and P2 is protected by a TE FRR bypass tunnel
established over the path P1 → P3 → P2.
In the lower part of Figure 10-73, core nodes P1, P2, and P3 are connected using a transport
network device. They share some transport network links marked yellow. If a fault occurs on
a shared link, both the primary and FRR bypass tunnels are affected, causing an FRR
protection failure. An SRLG can be configured to prevent the FRR bypass tunnel from
sharing a link with the primary tunnel, ensuring that FRR properly protects the primary
tunnel.

Related Concepts
An SRLG is a set of links that share the same risk of failure. If one link in an SRLG fails, the
other links in the group may fail as well. If a hot-standby CR-LSP or an FRR bypass tunnel
uses a link in the same SRLG as the path it protects, it cannot provide reliable protection.
Implementation
An SRLG link attribute is a number and links with the same SRLG number are in a single
SRLG.
Interior Gateway Protocol (IGP) TE advertises SRLG information to all nodes in a single
MPLS TE domain. The constrained shortest path first (CSPF) algorithm uses the SRLG
attribute together with other constraints, such as bandwidth, to calculate a path.
The MPLS TE SRLG works in either of the following modes:
● Strict mode: The SRLG attribute is a necessary constraint used by CSPF to calculate a
path for a hot-standby CR-LSP or an FRR bypass tunnel.
● Preferred mode: The SRLG attribute is an optional constraint used by CSPF to calculate
a path for a hot-standby CR-LSP or FRR bypass tunnel. For example, if CSPF fails to
calculate a path for a hot-standby CR-LSP based on the SRLG attribute, CSPF
recalculates the path, regardless of the SRLG attribute.
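The difference between the two modes can be sketched in Python with a plain shortest-path search that skips links in excluded SRLGs. This is not the device's CSPF implementation; the function names, the SRLG numbers, the metrics, and node P4 are hypothetical, and the candidate link list is assumed to exclude the protected link itself.

```python
import heapq

def shortest_path(links, src, dst, excluded_srlgs=frozenset()):
    """Dijkstra over links given as (node_a, node_b, metric, srlg_set)."""
    adjacency = {}
    for a, b, metric, srlgs in links:
        if set(srlgs) & set(excluded_srlgs):
            continue  # the link shares a risk group with the protected path
        adjacency.setdefault(a, []).append((b, metric))
        adjacency.setdefault(b, []).append((a, metric))
    heap, visited = [(0, [src])], set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, metric in adjacency.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + metric, path + [nxt]))
    return None

def backup_path(links, src, dst, primary_srlgs, mode):
    """mode is 'strict' or 'preferred'."""
    path = shortest_path(links, src, dst, primary_srlgs)
    if path is None and mode == "preferred":
        path = shortest_path(links, src, dst)  # recalculate, ignoring the SRLG attribute
    return path

# Bypass candidates for the P1-P2 link; P1-P3 is assumed to share SRLG 1 with it.
links = [("P1", "P3", 1, {1}), ("P3", "P2", 1, set())]
print(backup_path(links, "P1", "P2", {1}, "strict"))
print(backup_path(links, "P1", "P2", {1}, "preferred"))
```

In strict mode the sketch returns no path because the only candidate crosses the shared risk group, whereas preferred mode falls back to the path through P3.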
Usage Scenario
The SRLG attribute is used in either the TE FRR or CR-LSP hot-standby scenario.
Benefits
The SRLG attribute limits the selection of a path for a hot-standby CR-LSP or an FRR bypass
tunnel, which prevents the primary and bypass tunnels from sharing links with the same risk
level.
10.3.5.7 TE Tunnel Protection Group

A tunnel protection group protects E2E MPLS TE tunnels. If a working tunnel in a protection
group fails, traffic switches to a protection tunnel, minimizing traffic interruptions.
Related Concepts
Concepts related to a tunnel protection group are as follows:
● Working tunnel: a tunnel to be protected.
● Protection tunnel: a tunnel that protects a working tunnel.
● Protection switchover: switches traffic from a faulty working tunnel to a protection
tunnel in a tunnel protection group, which improves network reliability.
Figure 10-74 illustrates a tunnel protection group.

Figure 10-74 Tunnel protection group
(Figure: working tunnels tunnel-1 and tunnel-2 and protection tunnel tunnel-3 between LSRA and LSRB. The legend shows the data flow when the working tunnel is normal and the data flow when working tunnel-1 has failed.)
Working tunnels tunnel-1 and tunnel-2 and the protection tunnel tunnel-3 are established on the
ingress LSRA on the network shown in Figure 10-74.
Tunnel-3 is configured as a protection tunnel for working tunnels tunnel-1 and tunnel-2 on
LSRA. If the configured fault detection mechanism on the ingress detects a fault in tunnel-1,
traffic switches to tunnel-3. LSRA attempts to reestablish tunnel-1. If tunnel-1 is successfully
reestablished, traffic switches back to tunnel-1.
Implementation
A TE tunnel protection group uses a configured protection tunnel to protect traffic on the
working tunnel to improve tunnel reliability. For the protection tunnel to be effective,
plan the network so that the protection tunnel excludes the links and nodes through which the
working tunnel passes.
Table 10-13 describes the implementation procedure of a tunnel protection group.
Table 10-13 Implementation procedure of a tunnel protection group
Sequence Number | Process | Description
1 | Establishment | The working and protection tunnels must have the same ingress and destination address. The protection tunnel is established in the same procedure as a regular tunnel. The protection tunnel can use attributes that differ from those for the working tunnel. Ensure that the working and protection tunnels are established over different paths as much as possible. NOTE: A protection tunnel cannot be protected or enabled with TE FRR. Attributes for a protection tunnel can be configured independently of those for the working tunnel, which facilitates network planning.
2 | Binding between the working and protection tunnels | The protection tunnel is bound to the tunnel ID of the working tunnel so that the two tunnels form a tunnel protection group.
3 | Fault detection | In addition to MPLS TE's own detection mechanism, MPLS OAM and BFD for CR-LSP are used to detect faults in a tunnel protection group to speed up protection switching.
4 | Protection switching | The tunnel protection group supports either of the following protection switching modes: manual switching (traffic is forcibly switched to the protection tunnel) and automatic switching (traffic automatically switches to the protection tunnel if the working tunnel fails). A time interval can be set for automatic switching.
5 | Switchback | After a traffic switchover is implemented, the ingress attempts to reestablish the working tunnel. If the working tunnel is reestablished, the ingress can switch traffic back to the working tunnel or still forward traffic over the protection tunnel.
Other Usage
A tunnel protection group works in either 1:1 or N:1 mode. The 1:1 mode enables a protection
tunnel to protect only a single working tunnel. The N:1 mode enables a protection tunnel to
protect more than one working tunnel.
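The N:1 mode and the switchback choice can be sketched in Python. The class, its methods, and the `revertive` parameter are hypothetical names for illustration; they do not correspond to device commands.

```python
class ProtectionGroup:
    """One protection tunnel shared by one (1:1) or several (N:1) working tunnels."""

    def __init__(self, protection, working, revertive=True):
        self.protection = protection
        self.working = list(working)
        self.revertive = revertive   # True: switch back after the working tunnel recovers
        self.on_protection = set()   # working tunnels whose traffic uses the protection tunnel

    def working_failed(self, tunnel):
        if tunnel in self.working:
            self.on_protection.add(tunnel)

    def working_restored(self, tunnel):
        if self.revertive:
            self.on_protection.discard(tunnel)

    def carrier(self, tunnel):
        """Return the tunnel that currently carries the traffic of a working tunnel."""
        return self.protection if tunnel in self.on_protection else tunnel

group = ProtectionGroup("tunnel-3", ["tunnel-1", "tunnel-2"])
group.working_failed("tunnel-1")
print(group.carrier("tunnel-1"), group.carrier("tunnel-2"))
group.working_restored("tunnel-1")
print(group.carrier("tunnel-1"))
```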
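Figure 10-75 N:1 protection mode
(Figure: working tunnels tunnel-1 and tunnel-2 share protection tunnel tunnel-3 between LSRA and LSRB. The legend distinguishes the data flow when a working tunnel is normal from the data flow when it has failed.)
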
Differences Between CR-LSP Backup and a Tunnel Protection Group

CR-LSP backup and a tunnel protection group are both E2E protection mechanisms for
MPLS TE. Table 10-14 describes the comparison between these two mechanisms.
Table 10-14 Comparison between CR-LSP backup and a tunnel protection group
Item | CR-LSP Backup | Tunnel Protection Group
Object to be protected | Primary and backup CR-LSPs are established on the same tunnel interface. A backup CR-LSP protects traffic on a primary CR-LSP. | One tunnel protects traffic over another tunnel in a tunnel protection group.
TE FRR | A primary CR-LSP supports TE FRR. A backup CR-LSP does not support TE FRR. | A working tunnel supports TE FRR. A protection tunnel does not support TE FRR.
LSP attributes | Primary and backup CR-LSPs have the same attributes, except for the TE FRR attribute. In addition, the bandwidth for the backup CR-LSP can be set separately. | The attributes of one tunnel in a tunnel protection group are independent of the attributes of the other tunnel. For example, a protection tunnel with no bandwidth can protect traffic on a working tunnel that has a bandwidth.
Protection mode | The 1:1 protection mode is supported. Each primary CR-LSP is protected by a backup CR-LSP. | The N:1 protection mode is supported. Many tunnels share one protection tunnel. If any protected tunnel fails, traffic switches to the protection tunnel.
10.3.5.8 BFD for MPLS TE

Bidirectional forwarding detection (BFD) can monitor MPLS TE tunnels, CR-LSPs bound to
the MPLS TE tunnels, and RSVP neighbor relationships. If BFD detects a fault, the BFD
module instructs the MPLS module to perform a traffic switchover, which improves network
reliability.
Background
TE FRR, CR-LSP backup, and tunnel protection groups can be used to improve the reliability
of MPLS TE networks. Without BFD, some faults are detected only when no RSVP Hello or
RSVP refresh message arrives before the corresponding timeout period elapses, which makes
fault detection slow. When a Layer 2 device (such as a switch or hub) exists on the faulty
link, slow detection delays a traffic switchover and causes some traffic to be dropped. BFD can
send packets to quickly detect faults in MPLS TE tunnels and trigger a rapid traffic switchover
to minimize traffic loss.
Related Concepts
BFD sessions are classified into the following types:

● Static BFD session: Local and remote discriminators are configured manually.
● Dynamic BFD session: Local and remote discriminators are allocated automatically.
NOTE
For details about BFD, see the chapter "BFD" in Feature Description - Reliability.
Implementation
The following BFD functions are supported for MPLS TE:
● BFD for CR-LSP
BFD monitors CR-LSPs. After BFD detects a fault in a CR-LSP, the BFD module
immediately instructs the forwarding plane to trigger a rapid traffic switchover. BFD for
CR-LSP is used together with a hot-standby CR-LSP or a tunnel protection group.
● BFD for RSVP
BFD can detect faults in links between RSVP neighboring nodes in milliseconds. BFD
for RSVP applies to a TE FRR network, on which Layer 2 devices exist between the
PLR and its RSVP neighboring nodes over the primary CR-LSP.
● BFD for TE tunnel
BFD can monitor MPLS TE tunnels that are used as public network tunnels to transmit
VPN traffic. BFD monitors a whole TE tunnel. If BFD detects a fault in a tunnel that
transmits private network traffic, the BFD module instructs the VPN FRR module to
perform a traffic switchover.
BFD for CR-LSP

BFD monitors CR-LSPs. After BFD detects a fault in a CR-LSP, the BFD module
immediately instructs the forwarding plane to trigger a rapid traffic switchover. BFD for CR-
LSP is used together with a hot-standby CR-LSP or a tunnel protection group.
A BFD session can be bound to a CR-LSP when it is established between the ingress and
egress. A BFD packet is sent by the ingress to the egress along a CR-LSP. After the egress
receives the packet, the egress responds to the BFD packet. The ingress can rapidly detect the
status of links through which the CR-LSP passes based on whether a reply packet is received.
If a link fault is detected, the BFD module notifies the forwarding plane of the fault. The
forwarding plane searches for a backup CR-LSP and switches traffic to the backup CR-LSP.
In addition, the forwarding plane reports the fault to the control plane. If dynamic BFD for
CR-LSP is used, the control plane proactively creates a BFD session to monitor the backup
CR-LSP. A static BFD session can also be used to monitor the backup CR-LSP.
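The switchover sequence described above can be sketched as follows. This is a simplified Python model, not device code: the names `CrLsp`, `Tunnel`, and `on_bfd_down` are hypothetical and only illustrate the order of events (fault detected, traffic switched to the backup CR-LSP, fault reported to the control plane, BFD session created for the backup CR-LSP when dynamic BFD is used).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CrLsp:
    name: str
    up: bool = True           # state as reported by BFD
    bfd_bound: bool = False   # whether a BFD session monitors this CR-LSP


@dataclass
class Tunnel:
    primary: CrLsp
    backup: Optional[CrLsp] = None
    active: Optional[CrLsp] = None
    events: List[str] = field(default_factory=list)  # faults reported to the control plane

    def __post_init__(self) -> None:
        self.active = self.primary
        self.primary.bfd_bound = True


def on_bfd_down(tunnel: Tunnel, failed: CrLsp, dynamic_bfd: bool = True) -> None:
    """Hypothetical forwarding-plane reaction when BFD reports a CR-LSP fault."""
    failed.up = False
    if tunnel.active is failed and tunnel.backup is not None and tunnel.backup.up:
        tunnel.active = tunnel.backup                  # switch traffic to the backup CR-LSP
    tunnel.events.append(f"fault on {failed.name}")    # report the fault to the control plane
    if dynamic_bfd and tunnel.backup is not None and tunnel.active is tunnel.backup:
        tunnel.backup.bfd_bound = True                 # dynamic BFD: monitor the backup CR-LSP


tunnel = Tunnel(primary=CrLsp("primary"), backup=CrLsp("backup"))
on_bfd_down(tunnel, tunnel.primary)
print(tunnel.active.name)  # backup
```

With a static BFD session, the binding to the backup CR-LSP would be configured in advance instead of being created in the last step.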

Figure 10-76 Traffic forwarding of a BFD session before and after a traffic switchover
[Figure: two topologies of LSRA, LSRB, LSRC, and LSRD, before and after the switchover. Legend: primary LSP, backup LSP, BFD session.]
BFD for RSVP

BFD monitors RSVP neighbor relationships. When a Layer 2 device (a hub, for example)
exists between RSVP neighboring nodes, the two nodes can detect a link fault only through the
Hello mechanism, which takes seconds and causes considerable data loss. BFD for RSVP
detects faults in a link between RSVP neighboring nodes within milliseconds. BFD
for RSVP applies to TE FRR networks on which Layer 2 devices exist on a primary CR-LSP
between the PLR and its RSVP neighboring node.
Figure 10-77 BFD for RSVP

[Figure: BFD sessions established between RSVP neighboring nodes.]
A BFD session for RSVP is established to monitor the link between RSVP neighbors shown
in Figure 10-77. The RSVP module can rapidly detect a link failure.
BFD for RSVP can share BFD sessions with BFD for Open Shortest Path First (OSPF), BFD
for Intermediate System to Intermediate System (IS-IS), or BFD for Border Gateway Protocol
(BGP). The local node selects the smallest values of parameters between the two ends of the
shared BFD session as local BFD parameters. The parameters include the interval at which
BFD packets are sent, the interval at which BFD packets are received, and the local detection
multiplier.
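The parameter selection for a shared session can be illustrated with a short Python sketch. The `BfdParams` type, its field names, and the sample values are assumptions made for illustration; the only rule taken from the text is that the smallest value of each parameter is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BfdParams:
    tx_interval_ms: int      # interval at which BFD packets are sent
    rx_interval_ms: int      # interval at which BFD packets are received
    detect_multiplier: int   # local detection multiplier


def shared_session_params(a: BfdParams, b: BfdParams) -> BfdParams:
    """Pick the smallest value of each parameter for the shared BFD session."""
    return BfdParams(
        tx_interval_ms=min(a.tx_interval_ms, b.tx_interval_ms),
        rx_interval_ms=min(a.rx_interval_ms, b.rx_interval_ms),
        detect_multiplier=min(a.detect_multiplier, b.detect_multiplier),
    )


# Hypothetical parameter sets of two users of one shared session.
rsvp_side = BfdParams(tx_interval_ms=100, rx_interval_ms=50, detect_multiplier=5)
ospf_side = BfdParams(tx_interval_ms=30, rx_interval_ms=100, detect_multiplier=3)
print(shared_session_params(rsvp_side, ospf_side))
```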
BFD for TE Tunnel

BFD can monitor MPLS TE tunnels that are used as public network tunnels to transmit VPN
traffic. If BFD detects a fault, the BFD module instructs the VPN module to perform a traffic
switchover. BFD for TE tunnel and BFD for CR-LSP send fault information to different
objects. BFD for TE tunnel notifies applications (VPN for example) of faults and triggers a
traffic switchover between different tunnel interfaces. BFD for CR-LSPs notifies a TE tunnel
of faults and triggers a traffic switchover between different CR-LSPs in the same TE tunnel.
Differences
Table 10-15 lists differences between BFD for CR-LSP, BFD for RSVP, and BFD for TE
tunnel.
Table 10-15 Differences between BFD for CR-LSP, BFD for RSVP, and BFD for TE tunnel
Detection Technique | Detection Object | Node | Usage Scenario | BFD Session Support
BFD for CR-LSP | CR-LSPs | Ingress and egress | Can be used with a hot-backup CR-LSP | Dynamic, static
BFD for RSVP | RSVP neighbor relationships | Two ends of an RSVP session | Can be used with TE FRR | Dynamic
BFD for TE tunnel | MPLS TE tunnels | Ingress and egress | Can be used with VPN FRR | Static
10.3.5.9 RSVP GR
RSVP graceful restart (GR) ensures uninterrupted transmission on the forwarding plane when
an active main board (AMB)/standby main board (SMB) switchover is performed on the
control plane. A GR helper assists a GR restarter in rapidly restoring the RSVP status.
Background
GR applies to provider edge (PE) devices on the provider network shown in Figure 10-78.
User nodes access the provider network through only a single PE. RSVP-TE tunnels are
established between PEs on the network to implement TE or transmit VPN traffic. If a PE
fails or a maintenance measure (such as a software upgrade) is taken, an AMB/SMB
switchover is performed on the PE. To prevent traffic loss during the AMB/SMB switchover,
RSVP GR can be implemented to ensure the uninterrupted transmission of critical services.

Figure 10-78 RSVP GR application scenario
[Figure: sites of VPNA (CE1, CE2) and VPNB (CE3, CE4) connected through PE1, PE2, PE3, and PE4 on the provider network.]
Related Concepts
RSVP GR is a rapid status restoration mechanism for RSVP-TE that is implemented based on
non-stop forwarding (NSF).
Devices play the following roles during a GR process:
l GR restarter: performs a graceful restart.
l GR helper: assists the GR restarter in implementing a graceful restart.
RSVP GR supports the following messages:
l Hello messages: used to create a Hello session between RSVP neighboring nodes.
l Path messages with a restoration label: also called GRPath messages. A GRPath message
is sent by an upstream node and carries the content of the latest refreshed Path message.
l RecoveryPath messages: sent by a downstream node and carry the content of the last
Path message that is received by the downstream node.
The RSVP GR process involves the following periods:
l RSVP restart time: the period during which the restarter restarts RSVP-TE components
and reestablishes an RSVP signaling channel. The time is specified by the Restart timer.
l RSVP recovery time: the period during which the restarter restores the RSVP soft state
and refreshes MPLS forwarding entries after the restarter and helper exchange Hello
messages. The time is specified by the Recovery timer.
Implementation
RSVP GR depends on the RSVP Hello extension capability. After the RSVP Hello extension
capability is enabled, RSVP neighbors exchange Hello messages to advertise their GR
capabilities and GR parameters, including the RSVP restart time and recovery time. In Figure
10-79, if a fault occurs and an AMB/SMB switchover is performed on a device, the device
functions as the restarter, and its upstream and downstream GR-capable RSVP neighbors
function as helpers.

The process of implementing RSVP GR is as follows:

1. During the AMB/SMB switchover, the restarter does not send Hello messages to its
helpers. If a helper fails to receive three consecutive Hello messages from the restarter,
the helper considers that the restarter has entered the GR process. The helper then retains
all forwarding entries and starts the Restart timer.
2. After the restarter performs the AMB/SMB switchover and smooths data, it starts the
Restart timer and sends Hello messages to its helpers, while attempting to restart RSVP-TE
components and establish RSVP signaling channels to its helpers.
3. Upon receipt of the Hello messages, the helpers stop the Restart timer and start the
Recovery timer. The upstream helpers then send GRPath messages to the restarter, and the
downstream helpers send RecoveryPath messages to the restarter. These messages help
the restarter restore RSVP.
4. After the RSVP restart is complete and the Restart timer expires, the restarter starts the
Recovery timer. With the assistance of the helpers, the restarter restores the RSVP soft state
and refreshes MPLS forwarding entries. The CR-LSP is then restored.
5. After the CR-LSP is restored, the restarter stops the Recovery timer and sends Hello
messages to instruct the helpers to stop their Recovery timers. The RSVP GR process is then
complete.
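The helper side of the preceding steps can be sketched as a small state machine. This is an illustrative Python model under stated assumptions: the class, state, and method names are hypothetical, timer expiry is not modeled, and the completion of recovery is represented by an explicit `on_recovery_complete` event rather than by parsing Hello contents.

```python
from enum import Enum, auto


class HelperState(Enum):
    NORMAL = auto()
    RESTART_TIMER_RUNNING = auto()    # restarter presumed to be in the GR process
    RECOVERY_TIMER_RUNNING = auto()   # Hello messages resumed; assisting recovery


HELLO_LOSS_THRESHOLD = 3  # consecutive missed Hello messages (step 1)


class GrHelper:
    def __init__(self) -> None:
        self.state = HelperState.NORMAL
        self.missed_hellos = 0
        self.forwarding_entries_retained = False

    def on_hello_missed(self) -> None:
        self.missed_hellos += 1
        if (self.state is HelperState.NORMAL
                and self.missed_hellos >= HELLO_LOSS_THRESHOLD):
            # Step 1: retain forwarding entries and start the Restart timer.
            self.forwarding_entries_retained = True
            self.state = HelperState.RESTART_TIMER_RUNNING

    def on_hello_received(self) -> None:
        self.missed_hellos = 0
        if self.state is HelperState.RESTART_TIMER_RUNNING:
            # Step 3: stop the Restart timer and start the Recovery timer.
            self.state = HelperState.RECOVERY_TIMER_RUNNING

    def on_recovery_complete(self) -> None:
        if self.state is HelperState.RECOVERY_TIMER_RUNNING:
            # Step 5: the restarter instructs the helper to stop its Recovery timer.
            self.state = HelperState.NORMAL


helper = GrHelper()
for _ in range(HELLO_LOSS_THRESHOLD):
    helper.on_hello_missed()
print(helper.state.name)  # RESTART_TIMER_RUNNING
```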
Figure 10-79 RSVP GR implementation

[Figure: message sequence between the upstream GR helper, the GR restarter, and the downstream GR helper. Hello messages advertise GR capabilities and time parameters. After RSVP restarting is complete, RecoveryPath, GRPath, Path, and Resv messages are exchanged until RSVP recovery is complete. Time spans shown: AMB/SMB switchover time, RSVP restart time, RSVP recovery time.]
RSVP GR can be used on nodes that run RSVP-TE to establish MPLS TE tunnels to improve
device reliability.

Benefits
RSVP GR ensures uninterrupted data service transmission when the control plane performs an
AMB/SMB switchover. It supports device-level reliability for MPLS TE nodes.
10.3.5.10 RSVP NSR

The non-stop routing (NSR) technique is an enhancement of the NSF technique. If a fault occurs
on the control plane of a node, NSR keeps the connection between the forwarding and control
planes uninterrupted. This prevents the fault from affecting the forwarding plane on
a neighboring node.
With the wide use of IP/MPLS on bearer networks and metropolitan area networks (MANs),
higher reliability is required for IP/MPLS networks. NSR is a high reliability solution and
plays an increasingly important role on operators' networks.
RSVP implements NSR by synchronizing data from the master control board to the slave
control board. After a switchover, NSR allows the slave control board to rapidly take over
services from the original master control board without neighboring nodes detecting the
fault.
RSVP synchronizes the following key data between the slave and master control boards:
l Configuration-associated data
l Stable LSP state blocks
l Key batch event
10.3.6 MPLS TE Security
10.3.6.1 RSVP Authentication and Its Enhancements

RSVP authentication verifies message digests carried in RSVP messages to prevent attacks
initiated by modified or forged messages. Authentication enhancements can also be used to
prevent replay attacks and problems caused by out-of-order packets (packet mis-sequence).
RSVP authentication and its enhancements improve network security and reliability.
Background
RSVP uses raw IP to transmit packets. Raw IP has no security mechanism and is prone to
attacks. RSVP authentication can be used to verify packets based on keys to prevent attacks.
Original RSVP authentication, however, cannot prevent replay attacks or the termination of
neighbor relationships resulting from RSVP message mis-sequence. The RSVP
authentication enhancements address these problems by adding the authentication lifetime,
handshake, and message window functions. The authentication enhancements improve
security and authentication robustness in a harsh network environment, such as network
congestion.
Related Concepts
l Raw IP: similar to UDP in that it is connectionless and unreliable. No control is provided
for raw IP, so whether raw IP datagrams reach their destinations is uncertain. Connectionless
raw IP can exchange data between hosts without virtual circuits.

l Spoofing attack: An unauthorized router establishes a neighbor relationship with a local
router, or attacks the local router by generating forged RSVP messages to establish an
RSVP neighbor relationship. The forged RSVP messages can reserve a large amount of
bandwidth.
l Replay attack: A remote RSVP router sends a packet with a sequence number less than
the maximum sequence number on a local RSVP router. After the local router receives
that packet, the local router terminates the RSVP neighbor relationship with the remote
RSVP router.
Implementation
l Key authentication
RSVP authentication uses keys carried in packets exchanged between RSVP neighboring
nodes to verify those packets, preventing spoofing attacks. The same key must be
configured on two RSVP neighboring nodes before they perform RSVP authentication.
The RSVP authentication implementation is as follows:
a. A local node uses Keyed-Hashing for Message Authentication with Message Digest 5
(HMAC-MD5) and the configured key to calculate a digest of the message.
b. The local node adds this digest as an integrity object into an RSVP message, and
sends that message to the remote node.
c. After the remote node receives the message, the node uses the same key and
algorithm to calculate a digest and checks whether the local digest is the same as the
received one.
n If they match, the remote node accepts the message.
n If they do not match, the remote node discards the message.
l Handshake mechanism
The handshake mechanism maintains the RSVP authentication status. After RSVP
neighboring nodes authenticate each other, they exchange handshake packets. If they
accept the packets, they record a successful handshake. If a local node receives a packet
with a sequence number less than the local maximum sequence number, the local node
processes the packet as follows:
– Discards the packet if the packet shows that the handshake mechanism is not
enabled on the remote node.
– Discards the packet if the packet shows that the handshake mechanism is enabled
on the remote node and the local node has a record of a successful handshake. If
the local node does not have a record of a successful handshake, this packet is
the first one that arrives at the local node, and the local node starts a handshake process.
NOTE
In the preceding procedure, the local node only records the maximum sequence number, without
the message window enabled.
l Message window
A message window saves sequence numbers of received RSVP messages. If the window
size is 1, only the largest sequence number is saved. If the window size is set to a value
greater than 1, the specified number of largest sequence numbers can be saved. For
example, a window size is set to 10, and the largest sequence number of a received
RSVP message is 80. The sequence numbers between 71 and 80 can be saved if there is
no packet mis-sequence. If a packet mis-sequence problem occurs, the local node
arranges the messages and records the 10 largest sequence numbers.

l Authentication lifetime
Authentication can be performed at a specified interval.
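The digest check and the message window described in this list can be sketched in Python. This is a simplified illustration, not the RSVP wire procedure: the function and class names are hypothetical, the digest is computed over an opaque byte string rather than a real RSVP message with an integrity object, and the key and message values are placeholders. The `hmac` and `hashlib` modules are from the Python standard library.

```python
import hashlib
import hmac


def compute_digest(key: bytes, message: bytes) -> bytes:
    """HMAC-MD5 digest of the message under the shared key."""
    return hmac.new(key, message, hashlib.md5).digest()


def verify(key: bytes, message: bytes, received_digest: bytes) -> bool:
    """Receiver side: recompute the digest and compare it with the received one."""
    return hmac.compare_digest(compute_digest(key, message), received_digest)


class MessageWindow:
    """Keeps the `size` largest sequence numbers of received messages."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.saved = []

    def record(self, seq: int) -> None:
        if seq not in self.saved:
            # Sort so that out-of-order arrivals are arranged, then keep the largest ones.
            self.saved = sorted(self.saved + [seq])[-self.size:]


key = b"shared-key"                    # placeholder key configured on both neighbors
msg = b"example RSVP message body"     # placeholder payload
digest = compute_digest(key, msg)
print(verify(key, msg, digest))             # True: the message is accepted
print(verify(b"other-key", msg, digest))    # False: the message is discarded

window = MessageWindow(size=10)
for seq in [75, 80, 70, 72, 71, 79, 73, 78, 74, 77, 76]:   # out-of-order arrival
    window.record(seq)
print(window.saved)   # the 10 largest sequence numbers, 71 through 80
```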
RSVP Key Management Modes

RSVP keys can be managed in either of the following modes:
l MD5 key
An MD5 key is entered in either ciphertext or simple text on an RSVP interface or node.
An MD5 key has the following characteristics:
– A key cannot be shared. Each protocol is configured with a separate key.
– An interface or a node is assigned only one key. To use a different key, the key must be
reconfigured; an existing key cannot be modified in place.
l Keychain key
Keychain is an enhanced encryption algorithm. A group of passwords is defined in the
format of a password string during keychain authentication, and each password is
assigned a specified encryption and decryption algorithm and configured with a validity
period. When the system sends or receives a packet, the system selects a valid password.
Within the validity period of the password, the system uses the encryption algorithm
matching the password to encrypt the packet before sending it out, or uses the decryption
algorithm matching the password to decrypt the packet before accepting it. In addition,
the system automatically uses a new password after the previous password expires,
minimizing password decryption risks.
Keychain management has the following characteristics:
– A keychain authentication password and the encryption and decryption algorithms
must be configured. A password validity period can also be configured.
– Keychain settings can be shared by separate protocols and features and can be
managed uniformly.
Keychain can be used on an RSVP interface or node and supports HMAC-MD5.
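The selection of a valid password by validity period can be illustrated as follows. This Python sketch is hypothetical: `KeychainEntry`, its fields, and the integer timestamps are assumptions for illustration and do not reflect the device's keychain data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KeychainEntry:
    key_id: int
    secret: bytes
    algorithm: str      # for example "hmac-md5"
    valid_from: int     # start of the validity period (hypothetical clock, seconds)
    valid_until: int    # end of the validity period (exclusive)


def select_key(keychain: List[KeychainEntry], now: int) -> Optional[KeychainEntry]:
    """Return the first entry whose validity period covers `now`, if any."""
    for entry in keychain:
        if entry.valid_from <= now < entry.valid_until:
            return entry
    return None


keychain = [
    KeychainEntry(1, b"old-secret", "hmac-md5", valid_from=0, valid_until=1000),
    KeychainEntry(2, b"new-secret", "hmac-md5", valid_from=1000, valid_until=2000),
]
print(select_key(keychain, now=500).key_id)    # 1
print(select_key(keychain, now=1500).key_id)   # 2: the new password takes over
```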
Leveled RSVP Authentication

RSVP authentication is supported at the following levels:
l Neighbor-oriented authentication
You can configure authentication information, such as authentication keys, based on
neighbor addresses. RSVP then authenticates each neighbor separately.
Either of the following items can be used as an RSVP neighbor address:
– IP address of an interface on an RSVP neighboring node
– LSR ID of an RSVP neighboring node
l Interface-oriented authentication
Authentication is configured on interfaces, and RSVP authenticates messages based on
inbound interfaces.
Neighbor-oriented authentication has a higher priority than interface-oriented authentication.

A node discards messages if neighbor-oriented authentication fails and performs interface-
oriented authentication only if neighbor-oriented authentication is not enabled.
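The precedence rule between the two levels can be sketched as a lookup. The function name, the dictionaries, and the sample address and interface name below are hypothetical; only the rule itself (neighbor-oriented settings are used when present, interface-oriented settings otherwise) comes from the text.

```python
from typing import Dict, Optional


def select_auth_key(
    neighbor_keys: Dict[str, bytes],
    interface_keys: Dict[str, bytes],
    neighbor_addr: str,
    inbound_interface: str,
) -> Optional[bytes]:
    """Return the key to authenticate with, or None if no authentication applies."""
    if neighbor_addr in neighbor_keys:
        # Neighbor-oriented authentication is enabled for this neighbor. If it
        # fails, the message is discarded; there is no fallback to the interface key.
        return neighbor_keys[neighbor_addr]
    # Interface-oriented authentication applies only when neighbor-oriented
    # authentication is not enabled.
    return interface_keys.get(inbound_interface)


neighbor_keys = {"10.1.1.2": b"neighbor-key"}   # keyed by interface IP address or LSR ID
interface_keys = {"GE0/2/0": b"interface-key"}  # keyed by inbound interface

print(select_auth_key(neighbor_keys, interface_keys, "10.1.1.2", "GE0/2/0"))
print(select_auth_key(neighbor_keys, interface_keys, "10.1.1.9", "GE0/2/0"))
```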

10.3.7 DS-TE
This section describes the background, basic concepts, principle, and applications of DS-TE.
NOTE
Only the ATN 910B/ATN 950B (AND2CXPB/AND2CXPE) supports DS-TE.
10.3.7.1 Background
Background
l Advantages and disadvantages of MPLS TE
Multiprotocol label switching traffic engineering (MPLS TE) uses available resources to
establish a label switched path (LSP), and therefore provides guaranteed bandwidth for
traffic. MPLS TE can also precisely control traffic paths so that current bandwidth can be
fully used.
MPLS TE, however, cannot provide differentiated QoS guarantees for traffic of different
types. When both voice and video traffic is transmitted, video frames may be
retransmitted over a long period of time, so it may be required that video traffic be of a
higher drop precedence than voice traffic. MPLS TE, however, does not classify traffic
and processes voice and video traffic with the same drop precedence.
Figure 10-80 MPLS TE
[Figure: VoIP and HSI traffic crossing the IP/MPLS backbone network toward the Internet over one tunnel labeled "MPLS-TE for VoIP + HSI".]
l Advantages and disadvantages of the MPLS DiffServ model

The DiffServ model classifies user services and applies differentiated forwarding
behaviors based on the service class, meeting different QoS requirements.
The DiffServ model scales well. Data streams of multiple services are
mapped to a limited number of service classes, so the amount of information to be
maintained is proportional to the number of service classes rather than the number of data
streams.
The DiffServ model, however, can reserve resources only on a single node. End-to-end
QoS cannot be guaranteed.
l Disadvantages of using both MPLS DiffServ and MPLS TE
In some application scenarios, using MPLS DiffServ or MPLS TE alone cannot meet
requirements.
For example, a link carries both voice and data services. To ensure the quality of voice
services, you must lower voice traffic delays. The sum delay is calculated as follows:
Sum delay = Delay in processing packets + Delay in transmitting packets
Delay in processing packets = Forwarding delay + Queuing delay
When the path is specified, the delay in transmitting packets remains unchanged. To shorten
the sum delay for voice traffic, reduce the delay in processing voice packets on each hop.
When traffic congestion occurs, more packets mean a longer queue and a higher delay in
processing packets. Therefore, you must restrict the voice traffic on each link.
If the MPLS DiffServ model is used in this case, services are distinguished, and a
specific MPLS TE LSP is configured for each type of service. When a link or node fails
on the network, the network topology changes, or an LSP is preempted, the voice traffic
rate on the link may still exceed the specification, and end-to-end QoS cannot be
guaranteed.
Figure 10-81 Using both MPLS TE and MPLS DiffServ

[Figure: topology of R1 through R7 connected to the Internet through R4. Links are labeled 100M, except one link labeled 1000M. Traffic sources: VoIP 60M and HSI 20M at R1; VoIP 40M and HSI 20M at R2.]
As shown in Figure 10-81, the bandwidth of each link is 100 Mbit/s, and all links share
the same metric. Voice traffic is transmitted from R1 to R4 and from R2 to R4 at the rate
of 60 Mbit/s and 40 Mbit/s, respectively. Traffic from R1 to R4 is transmitted along the
LSP over the path R1 → R3 → R4, with the ratio of voice traffic being 60% between R3
and R4. Traffic from R2 to R4 is transmitted along the LSP over the path R2 → R3 →
R7 → R4, with the ratio of voice traffic being 40% between R7 and R4.
When the link between R3 and R4 fails, as shown in Figure 10-82, the LSP between R1
and R4 switches to the path R1 → R3 → R7 → R4 because this path is the shortest path
with sufficient bandwidth. At this time, the ratio of voice traffic from R7 to R4 reaches
100%, which increases the sum delay of voice traffic.

Figure 10-82 Networking after a link fails

[Figure: same topology as Figure 10-81 after the link between R3 and R4 fails. The voice traffic ratio on the link from R7 to R4 is labeled 100%.]
MPLS DiffServ-Aware Traffic Engineering (DS-TE) can resolve this problem.
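The delay formulas and the voice ratios in this example can be checked with a few lines of Python. The function names and the millisecond values are made up for illustration; the link bandwidth and the voice rates (60 Mbit/s and 40 Mbit/s on 100 Mbit/s links) are taken from the example.

```python
LINK_BANDWIDTH_MBPS = 100  # every link in the example


def sum_delay(forwarding_ms: int, queuing_ms: int, transmitting_ms: int) -> int:
    """Sum delay = (forwarding delay + queuing delay) + transmitting delay."""
    processing_ms = forwarding_ms + queuing_ms
    return processing_ms + transmitting_ms


def voice_ratio(voice_flows_mbps) -> float:
    """Share of the link bandwidth taken by voice traffic."""
    return sum(voice_flows_mbps) / LINK_BANDWIDTH_MBPS


# Link R7 -> R4 before the failure: only the R2 -> R4 voice LSP (40 Mbit/s).
before = voice_ratio([40])
# After the R3 - R4 link fails, the R1 -> R4 voice LSP (60 Mbit/s) is rerouted onto it.
after = voice_ratio([40, 60])
print(f"{before:.0%} -> {after:.0%}")   # 40% -> 100%

# Hypothetical delays: a longer queue raises the sum delay although the path is fixed.
print(sum_delay(forwarding_ms=1, queuing_ms=4, transmitting_ms=10))   # 15
print(sum_delay(forwarding_ms=1, queuing_ms=20, transmitting_ms=10))  # 31
```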
What Is MPLS DS-TE

MPLS DS-TE combines MPLS TE and MPLS DiffServ to provide QoS guarantee.
The class type (CT) is used in DS-TE to allocate resources based on the service class. To
provide differentiated services, DS-TE divides the LSP bandwidth into one to eight parts,
each part corresponding to a service class. The collection of bandwidths of an LSP or a
group of LSPs with the same service class is called a CT. DS-TE maps traffic with the same
per-hop behavior (PHB) to one CT and allocates resources to each CT.
If one LSP corresponds to multiple CTs and carries traffic with various service classes, this
LSP is called a multi-CT LSP. The IETF defines that DS-TE can support up to eight CTs,
marked as CTi, in which i ranges from 0 to 7.
If an LSP corresponds to one CT, this LSP is called a single-CT LSP.
Multi-CT LSPs can be used for the scenario shown in Figure 10-81. VoIP and HSI services
from R1 to R4 use different CTs on one MPLS TE tunnel. VoIP and HSI services from R2 to
R4 use different CTs on another MPLS TE tunnel. Then the ratios of different services remain
balanced, and the ratio of voice services stays within a proper range, as shown in Figure
10-83.

Figure 10-83 MPLS DS-TE

[Figure: same topology as Figure 10-81 with DS-TE applied. The tunnel from R1 is labeled "CT for VoIP: 60%" and "CT for HSI: 20%"; the tunnel from R2 is labeled "CT for VoIP: 40%" and "CT for HSI: 20%".]
When the link from R3 to R4 fails, VoIP and HSI services from R1 to R4 are switched to the
path R1 → R3 → R5 → R6 → R4, as shown in Figure 10-84. Voice services from R1 to R4
can also be controlled within a proper range.
Figure 10-84 After the link fails

[Figure: same topology after the link failure. The tunnel from R1, labeled "CT for VoIP: 60%" and "CT for HSI: 20%", is rerouted through R5 and R6; the tunnel from R2 is labeled "CT for VoIP: 40%" and "CT for HSI: 20%".]
10.3.7.2 Related Concepts
DS Field
To carry out the DiffServ model, RFC 2474 redefines the ToS field in the IPv4 packet header
as the Differentiated Services (DS) field. The high-order 6 bits of the DS field specify the
DS Code Point (DSCP), and the low-order 2 bits are reserved (currently unused in RFC 2474).
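As a quick illustration of the DS field layout in RFC 2474, where the DSCP occupies the six most significant bits of the octet, the following Python sketch splits a DS octet into its two parts. The function name is hypothetical; the sample value 0xB8 corresponds to DSCP 46, the code point commonly used for EF.

```python
def split_ds_field(ds_octet: int):
    """Return (dscp, low_two_bits) for an 8-bit DS field value."""
    dscp = (ds_octet >> 2) & 0x3F    # six most significant bits
    low_two_bits = ds_octet & 0x03   # two least significant bits (unused in RFC 2474)
    return dscp, low_two_bits


print(split_ds_field(0xB8))  # (46, 0)
print(split_ds_field(0x00))  # (0, 0), the default (best-effort) code point
```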

Per Hop Behavior

Per Hop Behavior (PHB) describes how packets with the same DSCP value are forwarded
to the next hop. A PHB is characterized by traffic attributes such as latency and the packet
loss ratio.
At present, the IETF defines three standardized PHBs: expedited forwarding (EF), assured
forwarding (AF), and best-effort (BE). BE is the default PHB.
CT
To carry out differentiated services, the DS-TE model divides the bandwidth of the LSP into
up to eight parts. Each part of the bandwidth corresponds to a different service class. The set
of bandwidth of one LSP or a group of LSPs with the same service class is called a class type
(CT). One CT can bear the traffic of a single service type.
As defined by the IETF, DS-TE supports a maximum of eight CTs. CTs can be represented
as CTi. The value of "i" ranges from 0 to 7.
Single CT and Multi-CT

Single CT indicates that one LSP can bear the traffic of only one CT.
Multi-CT indicates that one LSP can bear the traffic of multiple CTs.
For the multi-CT, the resource reservation, LSP establishment, or bandwidth preemption can
be successfully performed only when the bandwidth of all the CTs is sufficient.
TE-class
A TE-class is the combination of a CT and a priority, in the format <CT, priority>.
The priority is the CR-LSP preemption priority rather than the value of the EXP
field in the MPLS packet header. The value of the preemption priority ranges from 0 to 7. The
smaller the value is, the higher the priority is. A CR-LSP can be set up only when both the
combination of its CT and setup priority (<CT, setup-priority>) and the combination of its CT
and holding priority (<CT, hold-priority>) exist in the TE-class mapping table. For example,
suppose the TE-class mapping table of a certain node contains only TE-class[0] = <CT0, 6>
and TE-class[1] = <CT0, 7>.
Only the following types of CR-LSPs can be set up successfully:
l Class-Type = CT0, setup-priority = 6, holding-priority = 6
l Class-Type = CT0, setup-priority = 7, holding-priority = 6
l Class-Type = CT0, setup-priority = 7, holding-priority = 7
NOTE
The CR-LSPs of "Class-Type = CT0, setup-priority = 6, holding-priority = 7" cannot be configured,
because the setup priority of the CR-LSP cannot be higher than its holding priority.
Each of eight CTs can be combined with any of eight priorities, which theoretically yields 64
TE-classes. In the ATN, eight TE-classes can be configured manually.
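The setup rule for CR-LSPs can be expressed as a small check. This Python sketch is illustrative only: the function name and the representation of the TE-class mapping table as a set of (CT, priority) pairs are assumptions, while the two conditions come from the text (both combinations must exist in the table, and the setup priority cannot be higher than the holding priority).

```python
def can_set_up(te_classes, ct: int, setup_priority: int, hold_priority: int) -> bool:
    """Check whether a CR-LSP with the given CT and priorities can be set up."""
    # A smaller value means a higher priority, so the setup priority is
    # "higher than" the holding priority when its value is smaller.
    if setup_priority < hold_priority:
        return False
    return (ct, setup_priority) in te_classes and (ct, hold_priority) in te_classes


# TE-class mapping table from the example: TE-class[0] = <CT0, 6>, TE-class[1] = <CT0, 7>.
te_classes = {(0, 6), (0, 7)}

print(can_set_up(te_classes, ct=0, setup_priority=6, hold_priority=6))  # True
print(can_set_up(te_classes, ct=0, setup_priority=6, hold_priority=7))  # False
print(can_set_up(te_classes, ct=1, setup_priority=6, hold_priority=6))  # False: <CT1, 6> not in table
```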
DS-TE Modes
The DS-TE modes are as follows:

l IETF mode: the mode defined by the IETF. Eight CTs are combined with eight
priorities, and the combinations specify 64 TE-classes. In the ATN, a maximum of eight
TE-classes can be configured.
l Non-IETF mode: a mode not defined by the IETF, in which each of two CTs is
combined with each of eight priorities, which yields 16 TE-classes.
TE-class Mapping Table

A TE-class mapping table consists of a set of TE-classes. In the ATN, a TE-class
mapping table contains a maximum of eight TE-classes. It is recommended that you configure
the same TE-class mapping table on all the LSRs of an MPLS network.
Bandwidth Constraints Model

A bandwidth constraints model is a set of rules that defines the maximum number of
bandwidth constraints, which CTs each bandwidth constraint applies to, and how.
10.3.7.3 Implementation
Basic Implementation
The edge nodes in the DiffServ model divide the traffic into several classes and add the class
information into the DSCP field in packets. The internal node chooses a proper PHB for the
packet according to the DSCP value.
The EXP field in the MPLS packet header contains information related to the DiffServ model.
The key to implementing DS-TE is how to map the DSCP field (with a maximum of 64 values)
to the EXP field (with a maximum of eight values). RFC 3270 defines the following
solutions:
l Label-Only-Inferred-PSC LSP (L-LSP): The drop priority is specified in the EXP field,
and the PHB is determined by the label value. During packet forwarding, the label
determines both the packet forwarding path and the PHB used on the path.
l EXP-Inferred-PSC LSP (E-LSP): The PHB and the drop priority are specified in the
EXP field of the MPLS label. During packet forwarding, the label value determines the
packet forwarding path, and the EXP value determines the PHB. The E-LSP applies to
networks that support no more than eight PHBs.
The ATN implements the E-LSP. The mapping of the DSCP to the EXP field complies with
RFC 3270. The mapping of the EXP field to the PHB is configured manually.
The CT is introduced to the DS-TE to allocate resources according to traffic types. The DS-
TE maps the traffic of the same PHB to one CT and allocates resources to each CT separately.
DS-TE LSPs are set up based on the CT. DS-TE calculates the path and reserves resources
based on the CT and its bandwidth.
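For an E-LSP, the chain from the EXP value to the PHB and then to the CT can be sketched as two table lookups. In this Python sketch, the mapping tables and function names are hypothetical examples (on the device, the EXP-to-PHB mapping is configured manually); the bit positions follow the standard 32-bit MPLS label stack entry layout (20-bit label, 3-bit EXP, 1-bit bottom-of-stack flag, 8-bit TTL).

```python
def exp_bits(label_stack_entry: int) -> int:
    """Extract the 3-bit EXP field from a 32-bit MPLS label stack entry."""
    return (label_stack_entry >> 9) & 0x7


# Hypothetical, manually configured mappings used only for illustration.
EXP_TO_PHB = {0: "BE", 3: "AF", 5: "EF"}
PHB_TO_CT = {"BE": 0, "AF": 1, "EF": 2}


def classify(label_stack_entry: int):
    """Return (PHB, CT) for a packet on an E-LSP; unmapped EXP values use BE."""
    phb = EXP_TO_PHB.get(exp_bits(label_stack_entry), "BE")
    return phb, PHB_TO_CT[phb]


# Example entry: label 100, EXP 5, bottom of stack, TTL 64.
entry = (100 << 12) | (5 << 9) | (1 << 8) | 64
print(classify(entry))  # ('EF', 2)
```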
IGP Extension
RFC 4124 extends IGP to support the DS-TE. RFC 4124 introduces a Bandwidth Constraints
sub-TLV into IGP and redefines the Unreserved Bandwidth sub-TLV. These sub-TLVs are
used to collect and advertise the reservable bandwidth for each CT along a link. For details,
see RFC 4124.

RSVP Extensions
The IETF extends RSVP to implement DS-TE in IETF mode. RFC 4124 defines a
CLASSTYPE object for Path messages. The IETF draft draft-minei-diffserv-te-multi-class
defines a new object, the extended-classtype object. For details, see RFC 4124
and draft-minei-diffserv-te-multi-class.
When an LSR along an LSP receives an RSVP Path message carrying CT information and
its resources are sufficient, the LSR agrees to set up the LSP. At the same time, the LSR
recalculates the reservable bandwidth for each CT. After the LSP is set up, information about
the reservable bandwidth is reported to the IGP, and the IGP advertises the information to
other nodes on the network.
Bandwidth Constraints Model

At present, the IETF defines the following Bandwidth Constraints models:
l Maximum Allocation Model (MAM): One BC is mapped to one CT and the bandwidth
cannot be shared among CTs. The BC model ID of the MAM is 1.
Figure 10-85 Schematic diagram for the MAM
[Figure: BC0, BC1, ..., BC7 shown side by side. Max. reservable bandwidth >= BC0 + BC1 + ... + BC7]
In the MAM, the total bandwidth of CTi along an LSP is not more than that of BCi (0 <=
i <= 7). The total bandwidth of CTs of all LSPs is not more than the maximum
reservable bandwidth.
For example, suppose the bandwidth of a link is 100 Mbit/s; the MAM is applied, and
three CTs are supported: CT0, CT1, and CT2. BC0 is 20 Mbit/s, bearing CT0 traffic (for
example, BE traffic); BC1 is 50 Mbit/s, bearing CT1 traffic (for example, AF traffic);
BC2 is 30 Mbit/s, bearing CT2 traffic (for example, EF traffic). The total bandwidth of
all LSPs bearing BE traffic cannot be more than 20 Mbit/s; the total bandwidth of all
LSPs bearing AF traffic cannot be more than 50 Mbit/s; the total bandwidth of all LSPs
bearing EF traffic cannot be more than 30 Mbit/s.
In the MAM, the bandwidth preemption is allowed between the LSPs of the same CT,
and is not allowed between different CTs. In the MAM, however, the bandwidth may be
wasted.
l Russian Dolls Model (RDM): CTs can share bandwidth. The BC model ID of the RDM
is 0.
The bandwidth of BC0 is equal to or less than the maximum reservable bandwidth of the
link. As shown in Figure 10-86:
– The total bandwidth of all LSPs from CT7 <= Bandwidth of BC7
– The total bandwidth of all LSPs from CT6 and CT7 <= Bandwidth of BC6
– The total bandwidth of all LSPs from CT5, CT6, and CT7 <= Bandwidth of BC5
– The total bandwidth of all LSPs from CT0, CT1,... CT7 <= Bandwidth of BC0 <=
Maximum reservable bandwidth
This model resembles Russian dolls, in which bigger dolls nest smaller ones.

Figure 10-86 Schematic diagram for the RDM
[Figure: nested bandwidth constraints. BC7 contains CT7; ...; BC1 contains CT1 + ... + CT7; BC0 contains CT0 + CT1 + ... + CT7. Max. reservable bandwidth >= BC0 >= BC1 >= ... >= BC7]
For example, the bandwidth of a link is 100 Mbit/s; the RDM applies, and three CTs are
supported: CT0, CT1, and CT2. CT0 bears BE traffic; CT1 bears AF traffic; CT2 bears
EF traffic. The bandwidth of BC0 is 100 Mbit/s; the bandwidth of BC1 is 50 Mbit/s; the
bandwidth of BC2 is 20 Mbit/s. The total bandwidth of all LSPs bearing EF traffic
cannot be more than 20 Mbit/s; the total bandwidth of all LSPs bearing AF and EF traffic
cannot be more than 50 Mbit/s; the total bandwidth of all LSPs cannot be more than 100
Mbit/s.
The RDM allows the bandwidth preemption between CTs. If 0 <= m < n <= 7 and 0 <= i
< j <= 7, the CTi of priority m can preempt the bandwidth of CTi of priority n and the
bandwidth of CTj of priority n. The total bandwidth of CTi of all LSPs cannot exceed the
bandwidth of BCi.
In the RDM, the bandwidth can be used efficiently.
l Extended-MAM: a bandwidth allocation model that supports E-LSPs. The BC model ID
of the extended-MAM is 254.
Unlike the MAM, the extended-MAM supports eight additional implicit CTs (the
combinations of CT0 and the eight priorities). IGP floods these eight implicit CTs, which are
carried in the unreserved BW TLV.
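The MAM and RDM admission rules can be compared with a short Python sketch that uses the bandwidth values from the two examples above. The function names and the dictionary representation are assumptions for illustration, and preemption is not modeled.

```python
def mam_admits(bc, reserved, max_reservable, ct, bw) -> bool:
    """MAM: CTi is limited by BCi, and all CTs together by the max. reservable bandwidth."""
    return (reserved.get(ct, 0) + bw <= bc[ct]
            and sum(reserved.values()) + bw <= max_reservable)


def rdm_admits(bc, reserved, max_reservable, ct, bw) -> bool:
    """RDM: for every b, the bandwidth of CTb and all higher-numbered CTs must fit in BCb."""
    trial = dict(reserved)
    trial[ct] = trial.get(ct, 0) + bw
    for b, limit in bc.items():
        if sum(v for c, v in trial.items() if c >= b) > limit:
            return False
    return sum(trial.values()) <= max_reservable


# MAM example: BC0 = 20 (BE), BC1 = 50 (AF), BC2 = 30 (EF) on a 100 Mbit/s link.
mam_bc = {0: 20, 1: 50, 2: 30}
print(mam_admits(mam_bc, {}, 100, ct=0, bw=20))   # True
print(mam_admits(mam_bc, {}, 100, ct=0, bw=21))   # False: BE is capped at 20 Mbit/s

# RDM example: BC0 = 100, BC1 = 50, BC2 = 20 on a 100 Mbit/s link.
rdm_bc = {0: 100, 1: 50, 2: 20}
print(rdm_admits(rdm_bc, {2: 20}, 100, ct=1, bw=30))         # True: AF + EF = 50
print(rdm_admits(rdm_bc, {2: 20}, 100, ct=1, bw=31))         # False: AF + EF would exceed BC1
print(rdm_admits(rdm_bc, {}, 100, ct=0, bw=100))             # True: BE may use all of BC0 when idle
```

The last call shows why the RDM uses bandwidth more efficiently: bandwidth that higher CTs do not use remains available to CT0, whereas in the MAM it stays reserved for its own CT.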
Differences Between IETF Mode and Non-IETF Mode

The ATN implements the IETF mode and the non-IETF mode. Their differences are described
in Table 10-16.
NOTE
If bandwidth constraints are set for tunnels, the IETF and non-IETF modes can be switched to each
other.
Table 10-16 Differences between the IETF mode and non-IETF mode
DS-TE Mode | Non-IETF Mode | IETF Mode
Bandwidth constraints model | Supports the MAM and RDM. | Supports the RDM, MAM, and extended-MAM.
CT | Supports CT0 and CT1. | Supports CT0 to CT7.
BC type | Supports BC0 and BC1. | Supports BC0 to BC7.
TE-class mapping table | The TE-class mapping table can be configured but cannot take effect. | Supports the configuration and application of the TE-class mapping table.
IGP message | The reservable bandwidth is carried in the Unreserved Bandwidth sub-TLV based on the priority. | The CT information is carried in the following sub-TLVs. Unreserved Bandwidth sub-TLV: carries the unreserved bandwidth of the 8 TE-classes, in byte/s. Bandwidth Constraints sub-TLV: for RDM and MAM, it carries information about the BC model and the BC bandwidth, in byte/s; for extended-MAM, it carries the unreserved bandwidth of the 8 implicit TE-classes, in byte/s.
RSVP messages | The ADSPEC object carries CT information. | Different objects carry the CT information. Single CT: the CLASSTYPE object carries CT information. Multi-CT: the EXTENDED_CLASSTYPE object carries CT information.
DS-TE Mode Switching

In the ATN, the non-IETF mode and the IETF mode can be switched to each other. DS-TE
mode switching is described in Table 10-17.
Table 10-17 DS-TE mode switching

Change in the bandwidth constraints model
Non-IETF mode to IETF mode: The bandwidth model is unchanged.
IETF mode to non-IETF mode: The bandwidth models are changed as follows:
l The extended-MAM is changed to MAM.
l The RDM is unchanged.
l The MAM is unchanged.

Change in the TE-class mapping table
Non-IETF mode to IETF mode: If the TE-class mapping table is configured, it applies. Otherwise,
the default one applies. For information about the default TE-class mapping table, see
Table 10-18.
IETF mode to non-IETF mode: The TE-class mapping table is not applied.
l If a TE-class mapping table is configured, it is not deleted.
l If no TE-class mapping table is configured, the default one is deleted.

LSP deletion
Non-IETF mode to IETF mode: LSPs whose <CT, set-priority> or <CT, hold-priority> is not in the
TE-class mapping table are torn down on the ingress and transit nodes.
IETF mode to non-IETF mode: The following LSPs are torn down on the ingress and transit nodes:
l Multi-CT LSPs
l LSPs of single CT from CT2 to CT7
Table 10-18 Default TE-class mapping table

TE-Class CT Priority
TE-Class [0] 0 0
TE-Class [1] 1 0
TE-Class [2] 2 0
TE-Class [3] 3 0
TE-Class [4] 0 7
TE-Class [5] 1 7
TE-Class [6] 2 7
TE-Class [7] 3 7
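
The default mapping and the tear-down rule for switching from non-IETF mode to IETF mode can
be sketched as a lookup. The following Python fragment is a minimal illustration; the names
DEFAULT_TE_CLASSES, te_class_of, and lsp_kept_after_switch_to_ietf are hypothetical and do not
correspond to any device command or API.

```python
# Default TE-class mapping table (Table 10-18): TE-class index -> (CT, priority).
DEFAULT_TE_CLASSES = {
    0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0),
    4: (0, 7), 5: (1, 7), 6: (2, 7), 7: (3, 7),
}


def te_class_of(ct, priority, table=DEFAULT_TE_CLASSES):
    """Return the TE-class index for <CT, priority>, or None if it is not mapped."""
    for index, pair in table.items():
        if pair == (ct, priority):
            return index
    return None


def lsp_kept_after_switch_to_ietf(ct, setup_priority, hold_priority,
                                  table=DEFAULT_TE_CLASSES):
    """An LSP is kept only if both <CT, setup> and <CT, hold> are in the table."""
    return (te_class_of(ct, setup_priority, table) is not None
            and te_class_of(ct, hold_priority, table) is not None)
```

For example, an LSP of CT1 with setup priority 3 is not covered by the default table, so this
sketch reports that it would be torn down after the switch.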
10.3.8 Static Bidirectional Co-routed LSPs

Static bidirectional co-routed LSP is an important feature. With this feature, LSP ping
messages, LSP tracert messages, and OAM messages and their reply messages can travel
through the same path.
Background
Service packets exchanged by two nodes need to travel through the same links and nodes on a
transport network without running a routing protocol. Static bidirectional co-routed LSPs can
be used to meet the requirements.
Related Concept
Static bidirectional co-routed LSP:

A static bidirectional co-routed LSP is a type of LSP over which two flows are transmitted in
opposite directions by the same nodes over the same links. A static bidirectional co-routed
LSP is established manually. A static bidirectional co-routed LSP differs from two LSPs that
transmit traffic in opposite directions. Two unidirectional LSPs bound to a static bidirectional
co-routed LSP function as a whole LSP. Two forwarding tables are used to forward traffic in
opposite directions.
The static bidirectional co-routed LSP can go Up only when the conditions for forwarding
traffic in opposite directions are met. If the conditions for forwarding traffic in one direction
are not met, the bidirectional LSP is in the Down state. If no IP forwarding capabilities are
enabled on the bidirectional LSP, any intermediate node on the bidirectional LSP can reply
with a packet along the original path.
Implementation
A static bidirectional co-routed LSP is established by allocating labels manually to a specific
forwarding equivalence class (FEC). In manual label allocation mode, the outgoing label
value of a node is equal to the incoming label value of its next hop. Although this LSP is
established in the same way as a common static CR-LSP, a static bidirectional co-routed LSP
requires two forwarding tables, one for sending packets and the other for receiving packets.
A node on a static bidirectional co-routed LSP only has information about the local LSP and
cannot obtain information about nodes on the other LSP. A static bidirectional co-routed LSP
shown in Figure 10-87 consists of a CR-LSP and a reverse CR-LSP. The CR-LSP originates
from the ingress and terminates on the egress. Its reverse CR-LSP originates from the egress
and terminates on the ingress.
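The manual label rule (the outgoing label of a node equals the incoming label of its next hop)
and the co-routed requirement can be expressed as a consistency check. The following Python
sketch is illustrative only; the hop dictionaries, label values, and function names are
assumptions made for this example.

```python
def label_chain_is_consistent(hops):
    """hops: ordered list of dicts with optional 'in_label' and 'out_label' keys.

    The first hop (start of the LSP) has no in_label; the last hop has no out_label.
    """
    for upstream, downstream in zip(hops, hops[1:]):
        out_label = upstream.get("out_label")
        # The outgoing label must exist and match the next hop's incoming label.
        if out_label is None or out_label != downstream.get("in_label"):
            return False
    return True


def bidirectional_lsp_is_consistent(forward, reverse):
    """Both directions must be consistent and traverse the same nodes in opposite order."""
    same_nodes = [h["node"] for h in forward] == [h["node"] for h in reversed(reverse)]
    return (same_nodes
            and label_chain_is_consistent(forward)
            and label_chain_is_consistent(reverse))


# CR-LSP from the ingress to the egress (label values are made up).
forward = [
    {"node": "ingress", "out_label": 100},
    {"node": "transit", "in_label": 100, "out_label": 200},
    {"node": "egress", "in_label": 200},
]
# Reverse CR-LSP from the egress to the ingress.
reverse = [
    {"node": "egress", "out_label": 300},
    {"node": "transit", "in_label": 300, "out_label": 400},
    {"node": "ingress", "in_label": 400},
]
```

Each direction keeps its own label chain, which mirrors the two forwarding tables that the
bidirectional LSP requires.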
Figure 10-87 Static bidirectional co-routed LSP
The procedure for configuring a static bidirectional co-routed LSP is as follows:

l Configure the ingress.
Configure a tunnel interface and enable MPLS TE on the outbound interface of the
ingress. If the outbound interface is Up and has sufficient bandwidth, the static
bidirectional co-routed LSP can go Up, irrespective of whether a transit node or an
egress is available.
l Configure a transit node.
Enable MPLS TE on two outbound interfaces. If they are both Up and have sufficient
bandwidth, the static bidirectional co-routed LSP can go Up, irrespective of whether an
ingress, another transit node, or an egress is available.
l Configure the egress.
Enable MPLS TE on the inbound interface. If the inbound interface goes Up and has
sufficient bandwidth, the static bidirectional co-routed LSP can go Up, irrespective of
whether an ingress or a transit node is available.

10.3.9 Loopback Detection for a Static Bidirectional Co-Routed CR-LSP
Loopback detection enables a specified transit node on a PW of an IP RAN to loop back
simulated low-speed traffic to the ingress along an established static bidirectional co-routed
CR-LSP overlapping the PW. Upon receipt of the traffic, the ingress can use the information
to locate the point where packet loss occurs, if any.
Background
On an IP RAN, low-speed E1/T1 services are transmitted on the AC side. If some packets are
dropped or bit errors occur on a PW, faults must be diagnosed. To diagnose the faults, a static
bidirectional co-routed CR-LSP overlapping the PW is established, and loopback detection is
enabled to locate faults along the PW.
Implementation
In Figure 10-88, loopback detection is enabled for a static bidirectional co-routed CR-LSP
that overlaps a PW.
To avoid affecting existing services, the PW and the static bidirectional co-routed
CR-LSP must overlap. The PW must be a static PW with the outgoing label the same as the
incoming label.
Fault diagnosis is performed on a low-speed AC-side interface without services transmitted.
The pseudo random binary sequence (PRBS) detection mechanism simulates traffic, and the
dichotomy method is used in loopback detection to monitor the link to each hop along the
CR-LSP to locate the point where packet loss occurs.
Figure 10-88 Networking diagram for loopback detection for a static bidirectional co-routed
CR-LSP

[Figure: PE1, P1, P2, and PE2 in a chain. PRBS on the E1/T1 AC interface of PE1 simulates
service packets.]
Step 1 monitors the link between PE1 and P1.
Step 2 monitors the link between PE1 and P2.
The process of implementing loopback detection is as follows:

1. Loopback detection is enabled on P1, and PRBS detection is enabled on an E1/T1 AC
interface of the ingress PE1. The loopback detection function monitors the link between
PE1 and P1.
2. PRBS detection and loopback detection are disabled on P1. Loopback detection is
enabled on P2, and PRBS is enabled on the E1/T1 AC interface of the ingress PE1 again.
The loopback detection function monitors only the link between P1 and P2.
3. The detection process repeats until a link fault is located on the link between P2 and
PE2.
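
The hop-by-hop procedure above can be sketched as a loop over the transit nodes. The following
Python fragment is a simplified illustration that follows the numbered steps in order. The
callable prbs_clean_when_looped_at is a hypothetical stand-in for enabling loopback on a transit
node and running PRBS detection on the ingress AC interface, and the sketch assumes that a fault
is known to exist somewhere on the path.

```python
def locate_faulty_link(path, prbs_clean_when_looped_at):
    """Return the (upstream, downstream) link on which packet loss first appears.

    path: nodes from ingress to egress, for example ["PE1", "P1", "P2", "PE2"].
    prbs_clean_when_looped_at: callable(node) -> bool; True when PRBS traffic looped
        back at that transit node returns to the ingress without loss (hypothetical).
    """
    previous = path[0]
    for node in path[1:-1]:
        # Loop back at this transit node and check the simulated traffic.
        if not prbs_clean_when_looped_at(node):
            return (previous, node)
        previous = node
    # Every transit loopback was clean, so the last link is the remaining candidate.
    return (previous, path[-1])


path = ["PE1", "P1", "P2", "PE2"]
# Simulated result: loopback at P1 is clean, loopback at P2 shows loss.
faulty = locate_faulty_link(path, lambda node: node == "P1")
```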

With loopback detection enabled, a specified transit node loops back traffic to the ingress
along the CR-LSP. Loopback alarms can then be generated to notify users that loopback
detection is being performed. Loopback detection can be disabled manually, or it is disabled
automatically after it completes. Its configuration takes effect only on a main control board.
After a master/slave main control board switchover is performed, loopback detection is
automatically disabled.
Benefits
Loopback detection for a static bidirectional co-routed CR-LSP helps rapidly diagnose low-
speed service faults and improve network operation and maintenance efficiency.
10.3.10 Associated Bidirectional Dynamic LSPs

Associated bidirectional dynamic LSPs provide bandwidth protection for bidirectional
services. If a fault occurs, the automatic protection switching (APS) process is triggered on
both ends of the bidirectional LSPs.
Background
Existing networks have the following issues:
l RSVP-TE tunnels for transmitting TE services are unidirectional, and TE services are
transmitted from the ingress to the egress of a tunnel. TE services can be transmitted
from the egress to the ingress only using IP routes, which may cause traffic congestion.
l Another RSVP TE tunnel (a reverse tunnel) can be configured to send services from the
egress to the ingress. If a tunnel or its reverse tunnel fails, a traffic switchover is
performed, but the other tunnel cannot detect the fault or perform a traffic switchover,
which causes a service interruption.
In this case, you can deploy two RSVP-TE tunnels on two devices functioning as the source
and destination of each other. Then bind two unidirectional dynamic LSPs of the two tunnels
into an associated bidirectional dynamic LSP. The associated bidirectional dynamic LSP can
transmit bidirectional traffic, preventing network congestion. In addition, when one end of the
LSP fails, the other end will be notified of the fault, and the two ends can perform link
switching at the same time, preventing service interruptions.
Related Concepts
APS coordinates the source and destination ends to perform a protection switchover, a
delayed switchover, or a switchover after a wait-to-restore (WTR) time elapses.
Implementation
Figure 10-89 Principles of an associated bidirectional dynamic LSP
[Figure: LSRA and LSRB connected by Tunnel1 and Tunnel2.]

Figure 10-89 shows the implementation principles of an associated bidirectional dynamic
LSP. To implement an associated bidirectional dynamic LSP, the following conditions must be
met:
l Tunnel1 and Tunnel2 are RSVP-TE tunnels.
l The RSVP-TE LSP of Tunnel1 and the RSVP-TE LSP of Tunnel2 in the opposite
directions are set to be the reverse LSPs of each other.
l Penultimate hop popping (PHP) is not supported by the associated bidirectional dynamic
LSP.
NOTE
l During service deployment, the reverse RSVP-TE LSP uses labels to establish a connection to
the forward RSVP-TE LSP on the forwarding plane. If PHP is supported, label 0 or 3 will be
popped out at the penultimate hop so that messages with label 0 or label 3 cannot be sent to the
destination.
l Using the same path to establish an LSP and its reverse LSP is recommended, which ensures
the same delay time for packets in opposite directions.
APS Switching for an Associated Bidirectional Dynamic LSP

APS is automatically enabled after an associated bidirectional dynamic LSP is established. As
shown in Figure 10-90, services from LSRA to LSRF pass through the primary tunnel LSRB
-> LSRC -> LSRD, and services from LSRF to LSRA pass through the primary tunnel LSRD
-> LSRC -> LSRB. To ensure service reliability, bypass tunnels are configured for the two
primary tunnels respectively on LSRB and LSRD. The associated bidirectional dynamic LSP
is established. If a fault occurs and an APS event is triggered on one end, the other end is
notified of the APS event and performs APS switching.
Figure 10-90 Principles of an associated bidirectional dynamic LSP (before APS is implemented)
[Figure: LSRA, LSRB, LSRC, LSRD, LSRE, and LSRF. Traffic before switching follows the working
path through LSRC; the protection path runs through LSRE.]
If a fault occurs on the LSP that originates from LSRB and is destined for LSRC on the
network shown in Figure 10-91, traffic on LSRB switches to the link LSRB -> LSRE ->
LSRD, and a fault notification is sent to LSRD to instruct LSRD to switch traffic to the link
LSRD -> LSRE -> LSRB. Bidirectional services between LSRA and LSRF switch to the
bypass tunnels LSRB -> LSRE -> LSRD and LSRD -> LSRE -> LSRB, preventing service
interruptions.
Figure 10-91 Principles of an associated bidirectional dynamic LSP (after APS is implemented)
[Figure: LSRA, LSRB, LSRC, LSRD, LSRE, and LSRF. Traffic after switching follows the protection
path through LSRE instead of the working path through LSRC.]
l Associated bidirectional LSPs apply to the scenario in which bidirectional services need
bandwidth protection.
l To configure bit-error-triggered RSVP-TE tunnel switching, configure associated
bidirectional LSPs.
10.3.11 MPLS TE Control Message
10.3.11.1 RSVP-TE Control Message
RSVP Messages
RSVP has the following message types:
l Path message: sent by a sender to receivers to collect path information of the passing
nodes.
l Resv message: sent upstream by a receiver hop-by-hop to respond to the Path message and
request resource reservation.
l PathTear message: sent to remove path state of the passing nodes.
l ResvTear message: sent to remove resource reservation state on a node.

l PathErr message: sent upstream by a node to report errors in processing of the Path
messages.
l ResvErr message: sent downstream by a node if errors occur during the processing of the
Resv messages.
l ResvConf message: sent downstream by a sender hop-by-hop to confirm the resource
reservation requests. It is sent only when the Resv message contains the
RESV_CONFIRM object.
Each type of RSVP message contains a common header. The length and types of other fields
are not fixed. Figure 10-92 shows the format of RSVP messages.
Figure 10-92 Format of RSVP messages

Version (4 bits) | Flags (4 bits) | Message Type (8 bits) | RSVP Checksum (16 bits)
Send_TTL (8 bits) | Reserved (8 bits) | RSVP Length (16 bits)
Objects (variable)
Table 10-19 describes each field in the format.
Table 10-19 Description of each field in the format of RSVP messages

Field: Description
Version: RSVP version number. Currently, the version is 1.
Flags: Flag bits. Commonly, the value is 0. In RFC 2961, it is extended to identify
whether Summary Refresh Extension (Srefresh) is supported. If Srefresh is supported, the
value of the flag field is 0x01.
Message Type: RSVP message type:
l 1: Path messages
l 2: Resv messages
l 3: PathErr messages
l 4: ResvErr messages
l 5: PathTear messages
l 6: ResvTear messages
l 7: ResvConf messages
l 13: ACK messages
l 15: Srefresh messages
l 20: Hello messages
RSVP Checksum: RSVP checksum. Value 0 indicates that no checksum was transmitted.
Send_TTL: TTL of the message. When a node receives an RSVP message, it compares the Send_TTL
and the TTL in the IP header to calculate the hops that the message passes in a non-RSVP area.
Reserved: Field that is reserved.
RSVP Length: Total length of the RSVP message, in bytes.
Objects: Objects of the RSVP message. Each RSVP message contains various objects. The carried
objects vary with the message type.
NOTE
For details of each type of RSVP messages, see RFC 3209.
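
The common header layout in Figure 10-92 can be decoded with a fixed 8-byte unpack. The
following Python sketch is illustrative only; the function name parse_rsvp_common_header and
the returned dictionary keys are assumptions made for this example, and the checksum is not
verified.

```python
import struct

# Message type values listed in Table 10-19.
MESSAGE_TYPES = {
    1: "Path", 2: "Resv", 3: "PathErr", 4: "ResvErr", 5: "PathTear",
    6: "ResvTear", 7: "ResvConf", 13: "ACK", 15: "Srefresh", 20: "Hello",
}


def parse_rsvp_common_header(data):
    """Decode the 8-byte RSVP common header from the start of data."""
    if len(data) < 8:
        raise ValueError("RSVP common header requires 8 bytes")
    # Network byte order: version/flags, type, checksum, Send_TTL, reserved, length.
    vers_flags, msg_type, checksum, send_ttl, _reserved, length = struct.unpack(
        "!BBHBBH", data[:8]
    )
    flags = vers_flags & 0x0F
    return {
        "version": vers_flags >> 4,            # upper 4 bits
        "flags": flags,                        # lower 4 bits
        "srefresh_capable": bool(flags & 0x01),
        "message_type": MESSAGE_TYPES.get(msg_type, f"unknown ({msg_type})"),
        "checksum": checksum,                  # 0 means no checksum was transmitted
        "send_ttl": send_ttl,
        "length": length,                      # total message length, in bytes
    }


# Hand-built header: version 1, flags 0x1, Path message, Send_TTL 64, length 8.
header = parse_rsvp_common_header(struct.pack("!BBHBBH", 0x11, 1, 0, 64, 0, 8))
```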
Format of Objects Carried in RSVP Messages
Figure 10-93 Format of objects carried in RSVP messages

Length (16bits) Class_Number (8bits) C-Type (8bits)
Object Content (Variable)
Figure 10-93 shows the format of objects carried in RSVP messages.
l Length: total length of the object, in bytes. Its value must be a multiple of 4, and at least
4.
l Class_Number: an object class. Each object class has a name, such as SESSION,
SENDER_TEMPLATE, and TIME_VALUE.
l C-Type: object type, unique within the Class_Number. The Class-Number and C-Type
are used together to define a unique type for each object.
l Object Content: content of the object. The length of this field is variable.
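The object header described above (Length, Class_Number, C-Type) allows a receiver to walk the
object list one object at a time. The following Python sketch is a minimal illustration; the
function name iter_rsvp_objects is hypothetical, and the class numbers in the sample payload
are arbitrary example values.

```python
import struct


def iter_rsvp_objects(payload):
    """Yield (class_number, c_type, content) for each object in payload.

    payload: the bytes that follow the 8-byte RSVP common header.
    """
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < 4:
            raise ValueError("truncated object header")
        length, class_number, c_type = struct.unpack_from("!HBB", payload, offset)
        # Length covers the 4-byte object header and must be a multiple of 4.
        if length < 4 or length % 4 != 0:
            raise ValueError(f"invalid object length {length}")
        if offset + length > len(payload):
            raise ValueError("object extends past the end of the message")
        yield class_number, c_type, payload[offset + 4:offset + length]
        offset += length


# Two sample objects: one with 4 bytes of content and one with no content.
sample = (struct.pack("!HBB", 8, 5, 1) + b"\x00\x00\x75\x30"
          + struct.pack("!HBB", 4, 9, 1))
objects = list(iter_rsvp_objects(sample))
```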
Path Message
In RSVP-TE, a Path message is used to create an RSVP session and maintain a Path state.
The Path message is sent by the ingress node to the egress node in the direction of data flows.
On each node, the path state block (PSB) is created.
NOTE
The source IP address of a Path message is the LSR ID of the ingress node and the destination IP
address is the LSR ID of the egress node.
Table 10-20 lists some objects carried in the Path message.

Table 10-20 Path message objects

Message Object: Description
SESSION: Carries RSVP session information, such as the destination address, tunnel ID, and
extended tunnel ID.
HOP: Identifies the IP address and the handle of the outbound interface of the previous hop
that sends the Path message.
TIME_VALUE: Carries the refreshing interval.
SENDER_TEMPLATE: Specifies the sender IP address and LSP ID.
SENDER_TSPEC: Defines traffic characteristics of the data flow.
LABEL_REQUEST: Indicates the LABEL_REQUEST object, which is carried only in Path messages.
ADSPEC: Collects actual QoS parameters about the path, such as estimation of bandwidth of the
path, minimal path delay, and path MTU.
Explicit Route Object (ERO): Describes information about the path through which the LSP
passes. The explicit paths can be strict or loose. Path messages are then forwarded with the
specified ERO, without being restricted by IGP shortest path.
Record Route Object (RRO): Lists the LSRs through which the Path message passes when being
transmitted. An RRO can be used to collect path information and discover route loops. It can
also be copied to the next Path message for implementing Route Pinning.
Session Attribute: Specifies the setup priority, holding priority, reservation style,
affinity, and other information.
Resv Message
After receiving a Path message, a transit node and the egress node reply with Resv messages.
The Resv message, carrying resource reservation information, is sent to the previous node
hop-by-hop. Each passing node creates and maintains a reserved state block (RSB) and
allocates a label. When the Resv message reaches the ingress node, an LSP is established
successfully.
Table 10-21 describes objects carried in the Resv message.
Table 10-21 Resv message object

SESSION: Carries RSVP session information, such as the destination address, tunnel ID, and
extended tunnel ID.
HOP: Identifies the IP address and the index of the outbound interface that sends the Resv
message.
TIME_VALUE: Carries the refreshing interval. The default value is 30 seconds.
STYLE: Indicates the resource reservation style. It is specified on the ingress node.
FLOW_SPEC: Specifies QoS characteristics of a data flow.
FILTER_SPEC: Specifies the sender IP address and LSP ID of the node that sends the message.
RRO: Collects the IP address of the inbound interface, LSR ID, and the IP address of the
outbound interface of the node along the path.
LABEL: Indicates the assigned label.
RESV_CONFIRM: Indicates that a confirmation of the resource reservation is requested when this
object is received. This object carries the IP address of the node that requests a
confirmation of the resource reservation.
10.3.12 MPLS TE Applications
10.3.12.1 MPLS TE Application on an IP RAN

This section describes the MPLS TE deployment and application on an IP radio access
network (RAN).
Service Overview
An IP RAN is a transport network that transmits traffic between wireless base stations and
base station controllers. A conventional RAN supports circuit-switching and consists of multi-
service transmission platform (MSTP) and microwave devices. As data, audio, and video
services are growing, these services have increasing demands for bandwidth on IP RANs, but
carriers operate the IP RANs on decreasing profit margins due to fierce competition. Carriers
start to use IP/MPLS techniques to transmit services on the IP RANs to meet the bandwidth
requirements and to reduce costs.
Table 10-22 lists services that an IP RAN runs.
Table 10-22 Services on an IP RAN

Service: Description
Long Term Evolution (LTE) S1 services: An eNodeB uses IP/MPLS techniques to send LTE S1
services to an MME through an Ethernet interface.
LTE X2 services: An eNodeB uses IP/MPLS techniques to send LTE X2 services to a neighboring
eNodeB through an Ethernet interface.
Mobile telecommunications 3rd Generation (3G) Ethernet services: A NodeB sends 3G Ethernet
services through an Ethernet interface to the CSG. The CSG uses IP/MPLS techniques to forward
these services to an AGG. After the AGG receives the services, the AGG forwards them to a radio
network controller (RNC).
3G Multi-Link Point-to-Point Protocol (MLPPP) services: A NodeB sends 3G MLPPP services through
a Point-to-Point Protocol (PPP) interface to the CSG. The CSG uses PPP techniques to forward
these services to an AGG. After the AGG receives the services, the AGG forwards them to an RNC.
3G asynchronous transfer mode (ATM) services: A NodeB uses inverse multiplexing over ATM (IMA)
to send 3G ATM services through an E1 interface to the CSG. The CSG uses pseudo wire emulation
edge-to-edge (PWE3) and MPLS packet transmission techniques to forward these services to an
AGG. After the AGG receives the services, the AGG forwards them to an RNC.
Mobile telecommunications 2nd Generation (2G) time division multiplexing (TDM) services: A base
transceiver station (BTS) uses TDM to send 2G TDM services through an E1 interface to the CSG.
The CSG uses PWE3 and MPLS packet transmission techniques to forward these services to an AGG.
After the AGG receives the services, the AGG forwards them to a base station controller (BSC).
Figure 10-94 shows an IP RAN that consists of the access, aggregation, and service core
layers. Access and aggregation layers use ring networking. Wireless base stations are
connected to the RNC, BSC, and MME through the access and aggregation networks.

Figure 10-94 IP RAN networking (IGP area partition)
[Figure: a BTS, NodeB, and eNodeB connect through a CSG, AGG1/AGG2, and RSG1/RSG2 to the BSC,
RNC, and MME. The 2G TDM, 2G ATM, 3G Eth, and LTE S1/X2 services are carried over the MPLS TE1
and MPLS TE2 tunnels using L3VPN and L2VPN.]
NOTE
In this section, an E2E virtual private network (VPN) solution is used to demonstrate how MPLS TE
techniques are used to transmit IP RAN services.
Feature Deployment
MPLS TE tunnels are established as public network tunnels to transmit E2E VPN services
over an IP RAN. Table 10-23 lists MPLS TE-related features.
Table 10-23 MPLS TE-related features
Services
E2E L3VPN: 3G Ethernet
E2E L2VPN:
l 3G ATM
l 2G TDM

Public network tunnels for VPN services
E2E L3VPN and E2E L2VPN: MPLS TE tunnels

Reliability
E2E L3VPN:
l Network reliability:
– Link protection: provided using TE hot standby and bidirectional forwarding detection (BFD)
for constraints-routed label switched path (CR-LSP).
– Node protection: provided using VPN FRR and BFD for TE tunnel.
l Device reliability: provided using RSVP GR or NSR.
E2E L2VPN:
l Network reliability: provided using TE hot standby, BFD for CR-LSP, pseudo wire (PW)
redundancy, and BFD for PW.
l Device reliability: provided using RSVP GR or NSR.

Network security
E2E L3VPN and E2E L2VPN: RSVP MD5 or keychain is used.

QoS
E2E L3VPN and E2E L2VPN: E2E QoS must be configured between CSGs and radio service gateways
(RSGs) to ensure service quality. Using DS-TE to establish MPLS TE tunnels is recommended.
Deployment key points:

l Explicit paths are configured to separately establish primary and backup CR-LSPs. The
two paths do not overlap in important areas.
l Either RSVP GR or NSR can be used. Using RSVP NSR is recommended.
10.3.12.2 DS-TE Applications
Application Scenario: Access of Different Services to One VPN

On VPNs with MPLS-TE tunnels, one VPN may bear EF, AF, and BE services at the same
time. One MPLS-TE tunnel may bear different types of services at the same time.
To prevent different services in one tunnel from interfering with each other, you can set up
specific VPNs and TE tunnels to bear specific services. However, when multiple VPNs and tunnels
are set up to bear different types of services over a network at the same time, resources
may be wasted.
Alternatively, you can deploy DS-TE to use a multi-CT LSP to bear services over one VPN.
A multi-CT LSP can reserve up to 8 CTs. Each CT can bear one type of services of one VPN.
Services among different CTs cannot interfere with each other.

As shown in Figure 10-95, VPN1 bears EF, AF, and BE services. One DS-TE tunnel needs to
be set up and configured with CT0 (100 Mbit/s), CT2 (50 Mbit/s), and CT7 (10 Mbit/s). The
tunnel is bound to VPN1 on the ingress. After traffic of VPN1 is classified, the traffic enters
corresponding CT queues.
Figure 10-95 Networking diagram for one LSP bearing different services on one VPN
[Figure: Site 1 and Site 2 of VPN1 connect through CEs to two PEs joined by one MPLS TE tunnel
that carries CT0 for BE (100 Mbit/s), CT2 for AF (50 Mbit/s), and CT7 for EF (10 Mbit/s).]
Application Scenario: Access of Different Services to Different VPNs

On a VPN with MPLS-TE tunnels, multiple VPNs may share one TE tunnel. These VPNs
require specific QoS. They compete with each other for resources and the QoS cannot meet
requirements of services over VPNs.
The solutions to the preceding scenarios are as follows:
l Multiple VPNs bear different types of services
One TE tunnel can be used to bear a maximum of eight types of services.
For example, VPN1 and VPN2 can access the TE tunnel at the same time. VPN1 bears
EF and BE services and VPN2 bears AF services. One TE tunnel needs to be set up and
each type of services on each VPN is configured with a specific CT. The number of CTs
is equal to the number of service types on VPN1 plus the number of service types on VPN2.
In this example, three CTs are required.
l Multiple VPNs bear the same type of services
The number of TE tunnels to be set up is equal to the number of VPNs. The number of
CTs on each tunnel is equal to the number of corresponding service types on VPNs.
For example, VPN1 and VPN2 can access the TE tunnel at the same time. VPN1 bears
EF and BE services, and VPN2 also bears EF and BE services. Two TE tunnels can be
set up for VPN1 and VPN2. Each type of services on each tunnel is configured with a
specific CT.
l Multiple VPNs bear services part of which are the same
Each VPN needs a tunnel. The number of CTs on each tunnel is equal to the number of
corresponding service types on VPNs.

Application Scenario: Access of Traffic to VPNs and Non-VPNs

QoS requirements vary with VPN traffic and non-VPN traffic. If one TE tunnel bears all the
traffic, the VPN traffic and non-VPN traffic may compete with each other for resources. The
QoS cannot meet the requirements of services.
The solutions to the preceding scenarios are as follows:
l The VPN and non-VPN bear different types of services.

One TE tunnel needs to be set up. Services on the VPN and non-VPN are configured
with different CTs. The number of CTs is equal to the sum of the number of service types
on the VPN and the number of service types on the non-VPN.
l The VPN and non-VPN bear the same type of services.
Two TE tunnels need to be set up for the VPN and non-VPN services. Specific types of
services on each tunnel are configured with specific CTs.
l The VPN and non-VPN bear services, part of which is the same.
Two TE tunnels need to be set up for the VPN and non-VPN services. Specific types of
services on each tunnel are configured with specific CTs.
DS-TE Enabled with the Tunnel Protection

Table 10-24 describes DS-TE features in the tunnel protection networking.
Table 10-24 DS-TE enabled with the tunnel protection

Protection Mode: DS-TE Features

TE FRR
DS-TE features are applied as follows:
l When the bandwidth protection is required, the CTs and bandwidth are configured manually on
the bypass tunnel in manually-configured FRR. The QoS is guaranteed. The protection modes are
the 1:1 and N:1. In the automatic FRR, the bypass tunnel inherits the CTs and bandwidth of the
primary tunnel. The QoS is guaranteed. The protection mode is 1:1 only.
l When the bandwidth protection is not required, both manually-configured and automatic FRR
support 1:1 protection and N:1 protection, irrespective of the CTs and their bandwidth on the
bypass tunnel.

CR-LSP backup
The bypass CR-LSP inherits the CTs and their bandwidth from the primary CR-LSP. The best-effort
path cannot guarantee QoS and it does not inherit the CTs and bandwidth from the primary
CR-LSP.

Tunnel protection group
Two independent tunnels are bound and they form a tunnel protection group. One tunnel is the
primary tunnel and the other is the bypass tunnel. DS-TE features on the backup tunnel can be
configured. The CTs and bandwidth of the bypass tunnel should be consistent with those of the
primary tunnel.
In addition, MPLS OAM packets are sent through the queue of the highest TE tunnel priority.

Interworking Between Devices

During network deployment or device version upgrade, non-DS-TE devices may work with
DS-TE devices, or devices in non-IETF mode work with devices in IETF mode.
The ATN supports the interworking between the following devices:

l Interworking between DS-TE devices and non-DS-TE devices
– Supports the establishment of non-DS-TE tunnels from non-DS-TE devices to DS-
TE devices.
– Supports the establishment of non-DS-TE tunnels from DS-TE devices to non-DS-
TE devices.
l Interworking between non-Huawei DS-TE devices that do not support the CLASSTYPE
object
The Path messages that carry the following CT information can be parsed:
– L-LSP CT information that is carried by the EXTENDED_CLASSTYPE object
– CT0 information that is carried by the EXTENDED_CLASSTYPE object
Selecting a Bandwidth Constraints Model

In the ATN implementation, the IETF mode supports three bandwidth constraints models:
RDM, MAM, and extended-MAM. The non-IETF mode supports the RDM and MAM. By
default, the global bandwidth constraints model is the RDM.
On a network, if the bandwidth preemption is enabled among CTs, the RDM is applicable.
The bandwidth can be utilized efficiently. If the bandwidth preemption is disabled among
CTs, the MAM or extended-MAM is applicable. The extended-MAM is recommended when
a node needs to interwork with both a non-DS-TE node and a DS-TE node.
In the ATN implementation, all links on the same node use the same bandwidth constraints
model. Using the same bandwidth constraints model on all nodes across the entire network is
also recommended, because it makes the network easier to configure and maintain.

Abbreviation
CSPF Constrained Shortest Path First
CT class type
FRR fast reroute
MP merge point
PLR point of local repair
PSB path state block
RSVP Resource Reservation Protocol

RSB reserved state block
TE traffic engineering
10.4 Seamless MPLS

10.4.1 Introduction
Definition
Seamless MPLS is a bearer technique that extends MPLS techniques to access networks. All
services can be encapsulated using MPLS on access networks. Seamless MPLS establishes an
end-to-end (E2E) LSP across the access, aggregation, and core layers to transmit services.
Purpose
MPLS is a mature, widely deployed technique that has proven its value to service providers in
network construction. MPLS can converge multiple networks on an Ethernet-based
infrastructure, which takes full advantage of the single forwarding model and reduces
network construction costs. MPLS has been widely used on aggregation and core networks.
As network structures become flatter, metropolitan area networks (MANs) are evolving
toward an Ethernet architecture. This creates the opportunity to use MPLS techniques
on the MAN and access networks. To meet this requirement, the seamless MPLS technique
was developed. Seamless MPLS is not a new technique. It uses existing Border Gateway
Protocol (BGP), Interior Gateway Protocol (IGP), and MPLS techniques to establish an E2E
LSP across the access, aggregation, and core layers so that traffic can be encapsulated and
forwarded using MPLS over a whole network.
Benefits
Seamless MPLS offers the following benefits:
l Converges the access, aggregation, and core layers on an MPLS network, encapsulates
all services using MPLS, and transmits these services along an E2E LSP. Seamless
MPLS simplifies network provisioning, operation, and maintenance.
l Supports high deployment flexibility and scalability. On a seamless MPLS network, any
two nodes on an LSP can be connected to provision services using MPLS.
10.4.2 Principles
Usage Scenario
Seamless MPLS establishes a BGP LSP across the access, aggregation, and core layers on a
network and transmits services along the E2E BGP LSP. Any two nodes on the LSP can
exchange service traffic. The seamless MPLS network architecture maximizes service
scalability using the following functions:

l Allows access nodes to signal all services to an LSP.

l Uses the same transport layer convergence technique to rectify network-side faults,
without affecting service transmission.
Seamless MPLS networking solutions are as follows:
l Intra-AS seamless MPLS: The access, aggregation, and core layers are within a single
autonomous system (AS). Intra-AS seamless MPLS applies to mobile bearer networks.
l Inter-AS seamless MPLS: The access and aggregation layers are within a single AS,
whereas the core layer is in another AS. Inter-AS seamless MPLS transmits enterprise
services.
l Inter-AS seamless MPLS+HVPN: A cell site gateway (CSG) and an aggregation
(AGG) node establish a hierarchical virtual private network (HVPN) connection, and the
AGG and a mobile aggregate service gateway (MASG) establish a seamless MPLS LSP.
The AGG provides hierarchical L3VPN access services and routing management
services. Seamless MPLS advantages are integrated with HVPN advantages. Any two
nodes on an inter-AS LSP can transmit services at the access, aggregation, and core
layers, which provides high service scalability. HVPN allows carriers to deploy devices
with layer-specific capacities to meet service requirements. This implementation allows
for a reduction in network deployment costs.

Intra-AS Seamless MPLS
Table 10-25 Intra-AS seamless MPLS networking

Network deployment (control plane): Deploy routing protocols.
Description: Figure 10-96 illustrates an intra-AS seamless MPLS network on which routing
protocols are deployed.
Figure 10-96 Deploying routing protocols for the intra-AS seamless MPLS networking
[Figure 10-96: NodeB/eNodeB, CSG1, AGG1, Core ABR1, MASG1, and MME/SGW in a single AS.
OSPF or IS-IS runs in each of the access, aggregation, and core layers, and IBGP runs between
CSG1 and AGG1, AGG1 and Core ABR1, and Core ABR1 and MASG1.]
In Figure 10-96, routing protocol deployment is as follows:
l An IGP (IS-IS or OSPF) is enabled on devices at each of the
access, aggregation, and core layers to implement intra-AS
connectivity.
l The path CSG1 -> AGG1 -> core ABR1 -> MASG1 is used
in the following example. An IBGP peer relationship is
established between each pair of the following devices:
– CSG and AGG
– AGG and core ABR
– Core ABR and MASG
The AGG and core ABR are configured as route reflectors
(RRs) so that the CSG and MASG can obtain routes
destined for each other's loopback addresses.
l The AGG and core ABR set the next hop addresses in BGP
routes to their own addresses to prevent advertising
unnecessary IGP area-specific public routes.
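The effect of the next-hop rewrite described above can be sketched in a few lines of Python.
This is an illustrative model only, not device configuration: the propagate function and the
NEXT_HOP_SELF set are hypothetical names, and the device names are taken from Figure 10-96.

```python
# Illustrative model only; the names below are hypothetical, not device commands.
# Advertisement direction for the route to MASG1's loopback address.
ADVERTISEMENT_PATH = ["MASG1", "Core ABR1", "AGG1", "CSG1"]
# RRs that set the next hop to their own address when re-advertising.
NEXT_HOP_SELF = {"Core ABR1", "AGG1"}


def propagate(path, next_hop_self):
    """Return {receiver: next_hop} for a route advertised hop by hop along path."""
    learned = {}
    next_hop = path[0]  # the originator advertises itself as the next hop
    for sender, receiver in zip(path, path[1:]):
        if sender in next_hop_self:
            next_hop = sender  # the RR replaces the next hop with its own address
        learned[receiver] = next_hop
    return learned


routes = propagate(ADVERTISEMENT_PATH, NEXT_HOP_SELF)
for device, next_hop in routes.items():
    print(f"{device}: MASG1 loopback reachable via {next_hop}")
```

In this model, each device resolves the BGP route through a next hop in its own IGP area, so
no area needs the IGP routes of another area.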

Network deployment: Deploy tunnels.
Description: Figure 10-97 illustrates an intra-AS seamless MPLS network on which tunnels are
deployed.
Figure 10-97 Deploying tunnels for the intra-AS seamless MPLS networking
[Figure 10-97: the same single-AS topology. MPLS LDP or MPLS TE tunnels run in each of the
access, aggregation, and core layers, and labeled IBGP routes are exchanged along the path.]
In Figure 10-97, tunnel deployment is as follows:

l A public network tunnel is established using LDP or TE in
each IGP area.
l The path CSG1 -> AGG1 -> core ABR1 -> MASG1 is used
in the following example. An IBGP peer relationship is
established between each pair of the following devices:
– CSG and AGG
– AGG and core ABR
– Core ABR and MASG
These devices are enabled to advertise labeled routes and
assign labels to BGP routes that match a specified routing
policy. After the devices exchange labeled BGP routes, an
E2E BGP LSP between the CSG and MASG is established.
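The way labeled BGP routes stitch together into one E2E BGP LSP can be sketched as follows.
This is a minimal model under stated assumptions: build_bgp_lsp is a hypothetical helper, the
label values are arbitrary, and a real device allocates labels from its own label space.

```python
from itertools import count


def build_bgp_lsp(path, first_label=1000):
    """Build per-device label entries for a BGP LSP.

    path is ordered from ingress to egress. Label values are hypothetical.
    """
    labels = count(first_label)
    entries = {}
    advertised = None  # label most recently advertised towards the ingress
    for device in reversed(path):  # labels are advertised from egress to ingress
        if device == path[0]:
            entries[device] = {"push": advertised}
        else:
            local = next(labels)  # label assigned to the matching BGP route
            entries[device] = {"in": local, "out": advertised}
            advertised = local
    return entries


lsp = build_bgp_lsp(["CSG1", "AGG1", "Core ABR1", "MASG1"])
for device, entry in lsp.items():
    print(device, entry)
```

Each device's outgoing label equals the incoming label of its downstream BGP peer, which is
what makes the per-segment bindings behave as a single LSP between the CSG and MASG.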

Network deployment: Forwarding plane.
Description: Figure 10-98 illustrates the forwarding plane of the intra-AS seamless MPLS
networking.
Figure 10-98 Forwarding plane for the intra-AS seamless MPLS networking
[Figure 10-98: the same single-AS topology, with a legend for the VPN label, BGP label, and
MPLS tunnel label.]
In Figure 10-98, seamless MPLS primarily transmits VPN packets. The following example
demonstrates how VPN packets, including labels and data, are transmitted from a CSG to an
MASG along the path CSG1 -> AGG1 -> core ABR1 -> MASG1.
1. The CSG pushes a BGP LSP label and an MPLS tunnel
label in sequence into each VPN packet and forwards the
packets to the AGG.
2. The AGG removes the access-layer MPLS tunnel labels
from the packets and swaps the existing BGP LSP labels for
new labels. The AGG then pushes an aggregation-layer
MPLS tunnel label into each packet and forwards the
packets to the core ABR. If the penultimate hop popping
(PHP) function is enabled on the AGG, the AGG receives
packets without MPLS tunnel labels.
3. The core ABR removes aggregation-layer MPLS tunnel
labels from the VPN packets and swaps the existing BGP
LSP labels for new labels. The core ABR then pushes a
core-layer MPLS tunnel label into each packet and forwards
the packets to the MASG.
4. The MASG removes MPLS tunnel labels and BGP LSP
labels from the VPN packets. If the PHP function is enabled
on the MASG, the MASG receives packets without MPLS
tunnel labels.
The VPN packet transmission along the intra-AS seamless
MPLS tunnel is complete.
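The four forwarding steps above can be traced with a small label-stack model. This is a sketch
under stated assumptions, not the device implementation: the function names are hypothetical,
label values are arbitrary, and PHP is modeled only as "the tunnel label may already be absent".

```python
# Label stacks are lists whose last element is the top (outermost) label.
# All function names and label values are hypothetical.

def csg_ingress(stack, bgp_label, tunnel):
    """Step 1: push the BGP LSP label, then the MPLS tunnel label."""
    return stack + [("bgp", bgp_label), ("tunnel", tunnel)]


def border_node(stack, bgp_swap, tunnel_out):
    """Steps 2 and 3: pop the tunnel label, swap the BGP label, push a new tunnel label."""
    if stack[-1][0] == "tunnel":  # already absent if PHP removed it upstream
        stack = stack[:-1]
    kind, label = stack[-1]
    if kind != "bgp" or label not in bgp_swap:
        raise ValueError(f"unexpected top label: {stack[-1]}")
    return stack[:-1] + [("bgp", bgp_swap[label]), ("tunnel", tunnel_out)]


def masg_egress(stack):
    """Step 4: remove the MPLS tunnel label and the BGP LSP label."""
    return [entry for entry in stack if entry[0] not in ("tunnel", "bgp")]


packet = [("vpn", 500)]
at_agg = csg_ingress(packet, 1002, "access")                    # sent by the CSG
at_core_abr = border_node(at_agg, {1002: 1001}, "aggregation")  # sent by the AGG
at_masg = border_node(at_core_abr, {1001: 1000}, "core")        # sent by the core ABR
delivered = masg_egress(at_masg)
print(at_agg, at_core_abr, at_masg, delivered, sep="\n")
```

The VPN label stays at the bottom of the stack for the whole path. Only the BGP LSP label and
the per-layer tunnel label change at the AGG and the core ABR.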

Inter-AS Seamless MPLS
Table 10-26 Inter-AS seamless MPLS networking

Network deployment (control plane): Deploy routing protocols.
Description: Figure 10-99 illustrates an inter-AS seamless MPLS network on which routing
protocols are deployed.
Figure 10-99 Deploying routing protocols for the inter-AS seamless MPLS networking
[Figure 10-99: NodeB/eNodeB, CSG1, AGG1, AGG ASBR1, Core ASBR1, MASG1, and MME/SGW across
AS x and AS y. OSPF or IS-IS runs in each of the access, aggregation, and core layers. IBGP
runs between CSG1 and AGG1, AGG1 and AGG ASBR1, and Core ASBR1 and MASG1, and EBGP runs
between AGG ASBR1 and Core ASBR1.]
In Figure 10-99, routing protocol deployment is as follows:
l An IGP (IS-IS or OSPF) is enabled on devices at each of the access, aggregation, and core
layers to implement intra-AS connectivity.
l The path CSG1 -> AGG1 -> AGG ASBR1 -> core ASBR1 -> MASG1 is used in the following
example. A BGP peer relationship is established between each pair of the following
devices:
– CSG and AGG
– AGG and AGG ASBR
– AGG ASBR and core ASBR
– Core ASBR and MASG
An EBGP peer relationship between the AGG ASBR and
core ASBR is established, and IBGP peer relationships
between other devices are established.
l The AGG is configured as an RR so that IBGP peers can
exchange BGP routes, and the CSG and MASG can obtain
BGP routes destined for each other's loopback addresses.
l If the AGG ASBR and core ASBR are indirectly connected,
an IGP neighbor relationship between them must be
established to implement inter-area connectivity.

Network deployment: Deploy tunnels.
Description: Figure 10-100 illustrates an inter-AS seamless MPLS network on which tunnels are
deployed.
Figure 10-100 Deploying tunnels for the inter-AS seamless MPLS networking
[Figure 10-100: the same inter-AS topology (AS x and AS y), with labeled IBGP and EBGP routes
exchanged along the path.]
In Figure 10-100, tunnel deployment is as follows:
l A public network tunnel is established using LDP or TE in each IGP area. An LDP LSP or a TE
LSP is established if more than one hop exists between the AGG ASBR and core ASBR.
l The CSG, AGG, AGG ASBR, and core ASBR are enabled to
advertise labeled routes and assign labels to BGP routes that
match a specified routing policy. After the devices exchange
labeled BGP routes, a BGP LSP between the CSG and core
ASBR is established.
l Either of the following tunnel deployment methods in the
core area can be used:
– A BGP LSP between the core ASBR and MASG is
established. This BGP LSP and the BGP LSP between
the CSG and core ASBR are combined into an E2E BGP
LSP. The MASG installs the route to its own
loopback address into the BGP routing table and
advertises this route to the core ASBR using the IBGP
peer relationship. The core ASBR assigns a label to the
route and advertises the labeled route to the AGG ASBR.
– No BGP LSP between the core ASBR and MASG is
established. The core ASBR runs an IGP to learn the
route destined for the MASG's loopback address and
installs the route to the routing table. The core ASBR
assigns a BGP label to the route and associates the route
with an intra-AS tunnel. The BGP LSP between the CSG
and core ASBR and the MPLS tunnel in the core area are
combined into an E2E tunnel.

Network deployment: Forwarding plane.
Description: Figure 10-101 illustrates the forwarding plane of the inter-AS seamless MPLS
networking with a core-layer BGP LSP established.
Figure 10-101 Forwarding plane for the inter-AS seamless MPLS networking with a BGP LSP
established in the core area
The following example demonstrates how VPN packets, including labels and data, are
transmitted from a CSG to an MASG along the path CSG1 -> AGG1 -> AGG ASBR1 -> core ASBR1 ->
MASG1.
1. The CSG pushes a BGP LSP label and an MPLS tunnel label
in sequence into each VPN packet and forwards packets to
the AGG.
2. The AGG removes the access-layer MPLS tunnel labels from
the packets and swaps the existing BGP LSP labels for
new labels. The AGG pushes an aggregation-layer MPLS
tunnel label into each packet. The AGG then forwards the
packets to the AGG ASBR. If the PHP function is enabled
on the AGG, the AGG receives packets without MPLS
tunnel labels.
3. The AGG ASBR removes the MPLS tunnel labels from the
packets and swaps the existing BGP LSP label for a new
label in each packet. It forwards the packets to the core
ASBR. If the PHP function is enabled on the AGG ASBR,
the AGG ASBR receives packets without MPLS tunnel labels.
4. After the core ASBR receives the packets, it swaps a BGP
LSP label for a new label and adds a core-layer MPLS tunnel
label to each packet. It then forwards the packets to the
MASG.
5. The MASG removes MPLS tunnel labels, BGP LSP labels,
and VPN labels from the packets. If the PHP function is
enabled on the MASG, the MASG receives packets without
MPLS tunnel labels.
The VPN packet transmission along the inter-AS seamless
MPLS tunnel is complete.
Figure 10-102 illustrates the forwarding plane for the inter-AS
seamless MPLS networking without a BGP LSP established in
the core area.
Figure 10-102 Forwarding plane for the inter-AS seamless MPLS networking without a BGP LSP
established in the core area
In Figure 10-102, the process of transmitting VPN packets on this network is similar to that on
the network with a BGP LSP established in the core area. The difference is that, without a BGP
LSP in the core area, the core ASBR removes the BGP LSP labels from the packets instead of
swapping them and then adds core-layer MPLS tunnel labels to the packets.
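The difference between the two core-area deployment methods comes down to one operation on the
core ASBR. A minimal sketch, assuming a hypothetical core_asbr_forward function and arbitrary
label values:

```python
def core_asbr_forward(stack, core_bgp_lsp, new_bgp_label=None, tunnel="core"):
    """Model the core ASBR for both core-area deployment methods.

    stack is a list of (kind, value) labels with the top of the stack last.
    All names and label values here are hypothetical.
    """
    kind, _ = stack[-1]
    if kind != "bgp":
        raise ValueError("expected a BGP LSP label on top of the stack")
    rest = stack[:-1]
    if core_bgp_lsp:
        if new_bgp_label is None:
            raise ValueError("a new BGP label is required for the swap")
        rest = rest + [("bgp", new_bgp_label)]  # swap: the E2E BGP LSP continues
    # Otherwise the BGP label is removed and only the core tunnel carries the packet.
    return rest + [("tunnel", tunnel)]


arriving = [("vpn", 500), ("bgp", 2001)]
with_core_lsp = core_asbr_forward(arriving, True, new_bgp_label=2000)
without_core_lsp = core_asbr_forward(arriving, False)
print(with_core_lsp)
print(without_core_lsp)
```

With a core-layer BGP LSP, the MASG receives three label types. Without it, the MASG receives
only the VPN label under the core-layer tunnel label.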

Inter-AS Seamless MPLS+HVPN
Table 10-27 Inter-AS seamless MPLS+HVPN networking

Network deployment (control plane): Deploy routing protocols.
Description: Figure 10-103 illustrates an inter-AS seamless MPLS+HVPN network on which
routing protocols are deployed.
Figure 10-103 Deploying routing protocols for the inter-AS seamless MPLS+HVPN networking
[Figure 10-103: AS x and AS y, with OSPF or IS-IS in each layer, IBGP and EBGP peer
relationships along the path, and the MP-IBGP and MP-EBGP peer relationships used for the
HVPN.]
In Figure 10-103, routing protocol deployment is as follows:

l An IGP (IS-IS or OSPF) is enabled on devices at each of the
access, aggregation, and core layers to implement intra-AS
connectivity.
l An IBGP peer relationship is established between each pair
of the following devices:
– AGG and AGG ASBR
– Core ASBR and MASG
l An EBGP peer relationship between the AGG ASBR and
core ASBR is established.
l An MP-IBGP peer relationship between the CSG and AGG
is established, and a multi-hop MP-EBGP peer relationship
between the AGG and MASG is established.

Network deployment: Deploy tunnels.
Description: Figure 10-104 illustrates an inter-AS seamless MPLS+HVPN network on which tunnels
are deployed.
Figure 10-104 Deploying tunnels for the inter-AS seamless MPLS+HVPN networking
[Figure 10-104: the same topology (AS x and AS y), with labeled IBGP and EBGP routes exchanged
along the path.]
In Figure 10-104, tunnel deployment is as follows:
l A public network tunnel is established using LDP or TE in each IGP area.
l The AGGs, AGG ASBRs, core ASBRs, and MASGs are
enabled to advertise labeled routes. They assign labels to
BGP routes that match a specified routing policy. After they
exchange BGP routes, a BGP LSP between each pair of an
AGG and MASG can be established.

Network deployment: Forwarding plane.
Description: Figure 10-105 illustrates the forwarding plane of the inter-AS seamless MPLS+HVPN
networking.
Figure 10-105 Forwarding plane of the inter-AS seamless MPLS+HVPN networking
The following example demonstrates how VPN packets, including labels and data, are
transmitted from a CSG to an MASG along the path CSG2 -> AGG1 -> AGG ASBR1 -> core ASBR1 ->
MASG1.
1. The CSG pushes an MPLS tunnel label into each VPN
packet and forwards the packets to the AGG.
new labels. It then adds aggregation-layer MPLS tunnel
labels to the packets and forwards them to the AGG ASBR.
If the PHP function is enabled on the AGG, the AGG
receives packets without MPLS tunnel labels.
3. The AGG ASBR then removes the MPLS tunnel labels from
packets and swaps the existing BGP LSP label for a new
label in each packet. It then forwards the packets to the core
ASBR. If the PHP function is enabled on the AGG ASBR,
the AGG ASBR receives packets without MPLS tunnel
labels.
4. After the core ASBR receives the packets, it swaps a BGP
LSP label for a new label and adds a core-layer MPLS tunnel
label to each packet. It then forwards the packets to the
MASG.
5. The MASG removes MPLS tunnel labels, BGP LSP labels,
and VPN labels from the packets. If the PHP function is
enabled on the MASG, the MASG receives packets without
MPLS tunnel labels.
The VPN packet transmission along the seamless MPLS
tunnel is complete.
Reliability
Seamless MPLS network reliability can be improved using a variety of functions. If a network
fault occurs, devices with reliability functions enabled immediately detect the fault and switch
traffic from active links to standby links.
NOTE
LDP FRR and BGP FRR cannot be simultaneously configured.

Among the ATN 950B series, only the ATN 950B with the control board AND2CXPB/AND2CXPE
installed supports TE FRR.
The following examples demonstrate reliability functions on an inter-AS seamless MPLS
network.
l A fault occurs on a link between a CSG and an AGG.

In Figure 10-106, the active link on the primary path between CSG1 and AGG1 fails.
After BFD for LDP or BFD for CR-LSP detects the fault, the BFD module uses LDP
FRR, TE FRR, TE Hot-standby or BGP FRR to switch traffic from the primary path to
the backup path.
Figure 10-106 Traffic protection triggered by a fault in the link between the CSG and
AGG on the inter-AS seamless MPLS network
[Figure 10-106: CSG1, AGG1, AGG ASBR1, Core ASBR1, and MASG1, with the primary path and
backup path marked.]
l A fault occurs on an AGG.

In Figure 10-107, BGP Auto FRR is configured on CSGs and AGG ASBRs to protect
traffic on the BGP LSP between CSG1 and MASG1. If BFD for LDP or BFD for TE
detects AGG1 faults, the BFD module switches traffic from the primary path to the
backup path.

Figure 10-107 Traffic protection triggered by a fault in an AGG on the inter-AS
seamless MPLS network
[Figure 10-107: CSG1/CSG2, AGG1/AGG2, AGG ASBR1/AGG ASBR2, Core ASBR1/Core ASBR2, and
MASG1/MASG2, with the primary path and backup path marked.]
l A fault occurs on the link between an AGG and an AGG ASBR.

In Figure 10-108, a fault occurs on the link along the primary path between AGG1 and
AGG ASBR1. After BFD for LDP or BFD for CR-LSP detects the fault, the BFD module uses
LDP FRR, TE FRR, TE Hot-standby, or BGP FRR to switch traffic from the primary
path to the backup path.
Figure 10-108 Traffic protection triggered by a fault in the link between an AGG and an
AGG ASBR on the inter-AS seamless MPLS network
[Figure 10-108: topology from CSG1 through AGG1 to MASG1, with the primary path and backup
path marked.]
l A fault occurs on an AGG ASBR.

In Figure 10-109, BFD for LDP or BFD for TE is configured on AGG1, and BFD for
interface is configured on core ASBR1. If AGG ASBR1 fails, the BFD modules on
AGG1 and core ASBR1 detect the fault and trigger the BGP Auto FRR function. BGP
Auto FRR switches both upstream and downstream traffic from the primary path to
backup paths.

Figure 10-109 Traffic protection triggered by a fault in an AGG ASBR on the inter-AS
seamless MPLS network
[Figure 10-109: two sets of devices (CSG1/CSG2 through MASG1/MASG2), with the primary path,
the backup path for downstream traffic, and the backup path for upstream traffic marked.]
l A fault occurs on the link between an AGG ASBR and a core ASBR.
As shown in Figure 10-110, BFD for interface is configured on AGG ASBR1 and core
ASBR1. If the BFD module detects a fault in the link between AGG ASBR1 and core
ASBR1, the BFD module triggers the BGP Auto FRR function. BGP Auto FRR switches
both upstream and downstream traffic from the primary path to backup paths.
Figure 10-110 Traffic protection triggered by a fault in the link between an AGG ASBR
and a core ASBR on the inter-AS seamless MPLS network
[Figure 10-110: two sets of devices (CSG1/CSG2 through MASG1/MASG2), with the primary path
marked.]
l A fault occurs on a core ASBR.

In Figure 10-111, BFD for interface and BGP Auto FRR are configured on AGG
ASBR1. BGP Auto FRR and BFD for LDP (or for TE) are configured on MASGs to
protect traffic on the BGP LSP between CSG1 and MASG1. If the BFD module detects a
fault in core ASBR1, it switches both upstream and downstream traffic from the primary
path to backup paths.

Figure 10-111 Traffic protection triggered by a fault in a core ASBR on the inter-AS
seamless MPLS network
[Figure 10-111: two sets of devices (CSG1/CSG2 through MASG1/MASG2), with the primary path
marked.]
l A link fault occurs in the core area.

In Figure 10-112, BFD for LDP or BFD for CR-LSP is configured on core ASBR1. If
the BFD module detects a fault in the link between core ASBR1 and MASG1, it triggers
the LDP FRR, TE FRR, TE Hot-standby, or BGP FRR function, which switches both upstream
and downstream traffic from the primary path to the backup path.
Figure 10-112 Traffic protection triggered by a link fault in a core area on the inter-AS
seamless MPLS network
[Figure 10-112: CSG1/CSG2, AGG1/AGG2, AGG ASBR1/AGG ASBR2, Core ASBR1/Core ASBR2, and
MASG1/MASG2, with the primary path and backup path marked.]
l A fault occurs on an MASG.

In Figure 10-113, BFD for BGP tunnel is configured on CSG1. BFD for BGP tunnel is
implemented in compliance with RFC 5884 titled "Bidirectional Forwarding Detection
(BFD) for MPLS Label Switched Paths (LSPs)." BFD for BGP tunnel monitors E2E
BGP LSPs, including a BGP LSP connected to an LDP LSP. When MASG1 that
functions as a provider edge (PE) device fails, BFD for BGP tunnel can rapidly detect
the fault and trigger VPN FRR switching. The BFD module then switches both upstream
and downstream traffic from the primary path to the backup path.

Figure 10-113 Traffic protection triggered by a fault in an MASG on the inter-AS
seamless MPLS network
[Figure 10-113: CSG1, AGG1, AGG ASBR1, Core ASBR1, and MASG1, with the primary path and
backup path marked.]
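The fault scenarios above can be condensed into a lookup table that maps each fault location
to the detection and protection functions named in the text. The dictionary layout and the
protection_for function are illustrative only; the entries are transcribed from the scenarios
above and do not represent device configuration.

```python
# Fault location -> (detection functions, protection switching functions).
# Layout and names are hypothetical; entries follow the scenarios in the text.
LINK_PROTECTION = ("LDP FRR", "TE FRR", "TE hot standby", "BGP FRR")

PROTECTION = {
    "CSG-AGG link": (("BFD for LDP", "BFD for CR-LSP"), LINK_PROTECTION),
    "AGG node": (("BFD for LDP", "BFD for TE"), ("BGP Auto FRR",)),
    "AGG-AGG ASBR link": (("BFD for LDP", "BFD for CR-LSP"), LINK_PROTECTION),
    "AGG ASBR node": (
        ("BFD for LDP", "BFD for TE", "BFD for interface"),
        ("BGP Auto FRR",),
    ),
    "AGG ASBR-core ASBR link": (("BFD for interface",), ("BGP Auto FRR",)),
    "Core ASBR node": (
        ("BFD for interface", "BFD for LDP", "BFD for TE"),
        ("BGP Auto FRR",),
    ),
    "Core-area link": (("BFD for LDP", "BFD for CR-LSP"), LINK_PROTECTION),
    "MASG node": (("BFD for BGP tunnel",), ("VPN FRR",)),
}


def protection_for(fault):
    """Return the detection and switching functions for a fault location."""
    detection, switching = PROTECTION[fault]
    return {"detection": detection, "switching": switching}


for fault in PROTECTION:
    print(fault, protection_for(fault))
```

Link faults inside an IGP area are handled by tunnel-level protection, node faults and inter-AS
link faults by BGP Auto FRR, and a fault on the MASG by VPN FRR.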
Seamless MPLS Load Balancing

The seamless MPLS networking solution uses BGP LSP load balancing between core ASBRs
and MASGs, which improves network resource utilization. Currently, the load balancing
supported by the seamless MPLS networking solution is implemented based on BGP LSP
labeled routes.
The following example describes how load balancing works between core ASBRs and
MASGs in inter-AS seamless MPLS networking:
Multiple BGP IPv4 unicast label peer relationships are configured between each core ASBR
and an MASG. After the MASG receives BGP LSP labeled routes with the same prefix but
different next hops, these BGP LSP labeled routes participate in load balancing, which
improves network resource utilization. In Figure 10-114, two BGP LSPs are configured
between MASG1 and each core ASBR for load balancing.
Figure 10-114 BGP LSP labeled route load balancing networking
[Figure 10-114: CSG1/CSG2, AGG1/AGG2, AGG ASBR1/AGG ASBR2, Core ASBR1/Core ASBR2, and
MASG1/MASG2 across the access, aggregation, and core layers, with two BGP LSPs marked.]
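The selection among labeled routes with the same prefix but different next hops can be
sketched as follows. This is an assumption-laden illustration: the source does not specify how
traffic is distributed, so the CRC32 flow hash, the function names, the prefix, and the label
values are all hypothetical.

```python
import zlib
from collections import defaultdict

# (prefix, next hop, BGP label) as learned by the MASG. Values are hypothetical.
ROUTES = [
    ("192.0.2.1/32", "Core ASBR1", 3001),
    ("192.0.2.1/32", "Core ASBR2", 3002),
    ("192.0.2.9/32", "Core ASBR1", 3003),
]


def ecmp_groups(routes):
    """Group labeled routes that share a prefix but differ in next hop."""
    groups = defaultdict(list)
    for prefix, next_hop, label in routes:
        groups[prefix].append((next_hop, label))
    return groups


def pick_route(groups, prefix, flow):
    """Choose one member of the group per flow so that a flow stays on one LSP."""
    members = groups[prefix]
    index = zlib.crc32(repr(flow).encode()) % len(members)
    return members[index]


groups = ecmp_groups(ROUTES)
flow = ("198.51.100.7", "192.0.2.1", 6, 40000, 443)  # hypothetical 5-tuple
choice = pick_route(groups, "192.0.2.1/32", flow)
print(choice)
```

Hashing on the flow rather than on each packet keeps the packets of one flow on the same BGP
LSP, which avoids reordering while still spreading flows across both core ASBRs.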
10.4.3 Applications

10.4.3.1 Seamless MPLS Applications in VPN Services
Service Overview
With the growth of third-generation (3G) mobile telecommunications services and Long
Term Evolution (LTE) services, inter-AS leased line services have become key services.
Seamless MPLS can establish an E2E LSP between a cell site gateway (CSG) and a mobile
aggregate service gateway (MASG) to transmit virtual private network (VPN) services.
Seamless MPLS helps carriers reduce costs of network construction, operation, and
maintenance and allows carriers to uniformly operate and maintain networks.
Figure 10-115 illustrates an LTE network. The access and aggregation layers belong to one
AS, and the core layer belongs to another AS. To transmit VPN services, the inter-AS
seamless MPLS+HVPN networking can be used to establish an LSP between each pair of a
CSG and an MASG. CSGs are connected to NodeBs functioning as Wideband Code Division
Multiple Access (WCDMA) 3G base stations and to eNodeBs functioning as LTE base
stations. MASGs are connected to a mobility management entity (MME) or serving gateway
(SGW). VPN instances can be configured between CSGs and MASGs to transmit various
types of services. An HVPN is deployed between each pair of a CSG and an aggregation
(AGG) node, and an inter-AS LSP between each pair of an AGG and an MASG is established
using the seamless MPLS technique. A NodeB or eNodeB can then communicate with the
MME or SGW.
Figure 10-115 Seamless MPLS application
[Figure 10-115: AS x and AS y across the access, aggregation, and core layers. NodeBs/eNodeBs
connect to CSG1 and CSG2, followed by AGG1/AGG2, AGG ASBR1/AGG ASBR2, Core ASBR1/Core ASBR2,
and MASG1/MASG2, which connect to the MME/SGW.]

Seamless MPLS Networking Characteristics
Table 10-28 Inter-AS seamless MPLS+HVPN characteristics

Characteristic: Services
Description: Segmented tunnels are established.
l LDP LSPs or TE LSPs are established at the access layer.
l An inter-AS BGP LSP is established across the aggregation and core layers.
l The inter-AS BGP LSP overlaps an intra-AS LDP LSP or an intra-AS TE LSP.
Characteristic: Route control
Description: The number of routes is minimized:
l On the public network, routes at the access and aggregation layers are isolated from each
other. Devices at the aggregation and core layers advertise labeled BGP routes destined for
one another's loopback addresses.
l On the private network, AGGs only advertise default routes, which minimizes the number of
private routes to be advertised.
Characteristic: Enterprise leased line services
Description: The VPN services of large-scale enterprises can be provisioned. The Layer 2 and
Layer 3 leased lines connected to CSGs are easy to deploy to transmit VPN services.
Characteristic: Protection switching
Description: The following protection switching functions can be configured:
l TE hot standby or LDP FRR: monitors TE LSPs or LDP LSPs
l BGP FRR: monitors BGP LSPs
l VPN FRR: monitors VPN connections
Characteristic: CSG performance requirements
Description: CSGs maintain only a few routes and only need to process packets that each carry
two labels.

11 VPN
About This Chapter
This document describes the VPN feature in terms of the overview, principle, and
applications.
11.1 VPN Overview

This chapter describes the background, classification, networking, and the principles for
implementing Virtual Private Network (VPN) services.
11.2 Tunnel Policy
11.3 BGP/MPLS IP VPN
11.4 VLL
11.5 PWE3
11.6 PWE3 Reliability
11.7 IP Hard Pipe
This chapter provides a description of IP hard pipe and describes its purpose, benefits,
principles, and applications.
11.8 VPLS
11.9 L2VPN Loop Detection
11.10 IP RAN Virtual Cluster
11.1 VPN Overview

This chapter describes the background, classification, networking, and the principles for
implementing Virtual Private Network (VPN) services.

11.1.1 Introduction to VPN

Origin of VPN
As society develops, enterprises apply more and more IT technologies to their business
processes. For example, IP technologies are applied to enterprise resource planning, Voice
over IP (VoIP), net meetings, and remote training. IP technologies provide the basic
infrastructure that an enterprise needs to automate office operations and obtain
information. As the network economy develops, enterprises set up branches in more places,
work with more partners, and have more mobile personnel. An enterprise therefore needs to
connect its headquarters and branches over carrier networks to form an enterprise network,
so that mobile staff can conveniently access the enterprise network from outside the
enterprise.
In the initial stage, telecom operators used leased lines to provide Layer 2 links for
enterprises. This mode has the following disadvantages:
l Private networks take a long time to establish.
l Private networks require a huge investment.
l Private networks are difficult to manage.
After the emergence of Asynchronous Transfer Mode (ATM) and Frame Relay (FR), telecom
operators began to use virtual circuits (VCs) to provide point-to-point (P2P) Layer 2
connections for clients. Clients set up Layer 3 networks and transmit IP data over the P2P
connections. Compared with leased lines, VCs are cheaper and can be set up in a short
period. In addition, VCs enable users of different private networks to share a carrier's
network.
The disadvantages of traditional private networks are as follows:
l Traditional private networks depend on media such as ATM or FR. To provide
VPN services based on ATM, operators must set up ATM networks covering the service
areas. Similarly, to provide VPN services based on FR, operators must set up FR
networks covering the service areas. This wastes customer investment.
l The speed of traditional private networks is lower than what Internet services require.
l Deployment of traditional private networks is complex. To add a site to an existing
private network, you must modify the configurations of the edge nodes that connect to the
site.
Traditional private networks help to increase the profits of enterprises. However, they do
not meet requirements for flexibility, security, economy, and scalability. A substitute was
therefore introduced: a logical private network built over the IP network. This solution is
the Virtual Private Network (VPN).
A VPN is a virtual private network set up over public networks by Internet Service Providers
(ISPs) and Network Service Providers (NSPs).
Characteristics of VPN
A VPN has the following characteristics:
l Privacy: For a VPN user, the VPN has no difference from a traditional private network in
terms of privacy. Resources of a VPN are separated from its bearer network. Therefore,
the resources of a VPN cannot be used by other users outside this VPN. In addition,
VPNs offer sufficient security measures to ensure that the internal information is free
from external interference.

l Virtuality: VPN users communicate with each other through public networks. The public
networks are used by other non-VPN users at the same time. That is, a VPN is a logical
private network. The public networks are called VPN backbone networks.
Given its privacy and virtuality, a VPN can segment an existing IP network into several
logically isolated networks. Such logical segmentation is flexible. It can be used to
interconnect different departments or branches of an enterprise. VPNs can also provide
enhanced services. For example, creating a VPN for the IP phone service can solve the
problem of inadequate IP addresses, guarantee Quality of Service (QoS), and pave the way
for enhanced services.
VPNs, especially Multiprotocol Label Switching VPNs (MPLS VPNs), are highly valued by
operators because they provide interworking between enterprises and other enhanced
services. VPNs have therefore become an important means for operators to provide
Value-Added Services (VASs) on IP networks.
Advantages of VPN
Compared with traditional private networks, a VPN offers users the following advantages:
l A VPN guarantees data security. On a VPN, reliable connections are established
between remote users, branches, partners, suppliers, and company headquarters to ensure
the security of data transmission. High security is of great significance when e-business
or financial networks are combined with communication networks.
l A VPN is an economical solution. Using public networks, an enterprise can connect its
headquarters with branches, traveling personnel, and business partners at a low cost.
l A VPN supports mobile services. VPN users that are located outside the headquarters
can access the VPN regardless of time and place. As a result, the increasing demand for
mobile services can be met.
l A VPN can guarantee QoS. A VPN with QoS such as MPLS VPN can provide VPN
users with QoS of different levels.
From the perspective of carriers, a VPN has the following advantages:
l VPNs are easy to operate. The resource utilization is improved, and profits of carriers are
increased.
l The configurations of VPNs are flexible. The carriers can add or delete VPN users
through software configurations without modifying hardware configurations. Therefore,
VPNs have flexible and wide applications.
l VPNs provide multiple services. In addition to basic VPN interworking services, carriers
can also provide enhanced services such as network outsourcing, service outsourcing,
and customized services.
Given these advantages, VPNs relieve enterprises of part of the burden of network
operation and maintenance, help enterprises achieve their business goals, and have
therefore become popular with enterprises. In addition, an operator can manage and operate
a network and provide multiple services on it, such as best-effort IP service, VPN, traffic
engineering, and differentiated services (DS). This reduces the operator's construction,
maintenance, and operation costs.
In addition to security, reliability, and manageability, VPNs provide strong scalability and
flexibility. Regardless of location, users can use VPN services as long as they can
access the Internet.

11.1.1.1 Classification of VPN

With the development of network technologies, the VPN technology is widely applied and
many new VPN technologies emerge. VPNs can be divided into different types.
Classification Based on Networking Models

According to networking models, VPNs are classified into the following types:
l Virtual Private Dial Network (VPDN)

A Virtual Private Dial Network (VPDN) provides access services for enterprises, small-
scale ISPs, and mobile personnel by using access networks and the dialing function of
public networks.
With the help of the VPN features such as private IP addresses, users can access VPDNs
through Public Switched Telephone Networks (PSTNs) and Integrated Services Digital
Networks (ISDNs). VPDNs feature low investment, short construction period, and low
operation cost. Generally, VPDNs adopt P2P connections. VPDNs are implemented
through the tunneling protocols such as the Layer 2 Tunneling Protocol (L2TP) and the
Point-to-Point Tunneling Protocol (PPTP).
Figure 11-1 illustrates an example of VPDN networking. Remote users such as
personnel of foreign branches or personnel on business can access the Internet through
the ISDN or PSTN. A tunnel is created between the network access server and
corporation gateway. Through the tunnel, the remote users can access the intranet.
Figure 11-1 Example of a VPDN
[Figure 11-1: a remote user connects through the ISDN or PSTN to a network access server,
which reaches the corporation gateway and the corporation's internal network over the
Internet.]
Compared with VPNs of other types, VPDNs provide more flexible authentication
mechanisms, accounting schemes, and higher security. In addition, VPDNs support
dynamic address assignment. VPDNs adopt Layer 2 tunnels and support multiple Layer
3 protocols.
l Virtual Private Routing Network (VPRN)
A Virtual Private Routing Network (VPRN) connects the headquarters, branches, and
remote offices through virtual devices. Unlike other types of VPNs, a VPRN forwards
packets at the network layer. For each VPN, each VPN node on the public network sets
up a private routing and forwarding table that contains network-layer reachability
information. Data traffic between VPN nodes, and between VPN nodes and user sites, is
forwarded based on these tables.
VPRNs can be implemented in two ways: by using traditional VPN protocols such as
Generic Routing Encapsulation (GRE), or by using Multiprotocol Label Switching
(MPLS).

NOTE
l For more information about GRE, refer to the chapter GRE in this manual.
l For more information about MPLS VPRN, refer to the chapter BGP/MPLS IP VPN in this
manual.
l Virtual Private Wire Service (VPWS)
Virtual Private Wire Service (VPWS) is also called Virtual Leased Line (VLL). By
using IP networks to emulate private leased lines, VPWS provides an asymmetric,
low-cost Digital Data Network (DDN) service. For users at the two ends of a virtual
leased line, the virtual line is similar to a traditional leased line.
VPWS is available in traditional private networks such as ATM and FR networks.
Operators can smoothly upgrade ATM or FR networks to VPWS networks.
As a service of virtual leased line, VPWS is generally used on the access layer and the
convergence layer. VPWS is divided into the following types:
– Circuit Cross-Connect (CCC)
– Static Virtual Circuit (SVC)
– Martini VPWS
As an end-to-end Layer 2 service-bearing technology, Pseudo-Wire Emulation Edge-to-
Edge (PWE3) is an extension of Martini VPWS.
NOTE
l For more information about VPWS, refer to the chapter VLL in this manual.
l For more information about PWE3, refer to the chapter PWE3 in this manual.
VPWS is suitable for VPNs of star topology; VPRN is suitable for fully connected
VPNs.
l Virtual Private LAN Service (VPLS)
Virtual Private LAN Service (VPLS) connects LANs through a virtual private network
segment. VPLS is an extension of LANs over IP public networks.
VPLS is also called Transparent LAN Service (TLS). Unlike common P2P L2VPN
services, VPLS enables SPs to provide Ethernet-based multipoint services over
MPLS backbone networks.
Thanks to the advantages such as flexible configurations of VLAN logical interfaces and
high bandwidth/cost ratio, the Ethernet technology is widely used nowadays.
VPRNs and VPWS networks can also provide LAN services; however, the following
limitations of traditional Ethernet technology still apply:
– Broadcast storms caused by frames with unknown destination MAC addresses cannot be
avoided.
– The scalability of the Spanning Tree Protocol (STP) is limited.
– The VLAN address space is limited.
VPLS is introduced to solve these problems. Instead of running STP, VPLS
backbone networks use full-mesh connections and split horizon to eliminate loops. For
unicast or multicast frames with unknown destination MAC addresses, VPLS discards the
frames, handles them on the local node, or broadcasts them. VPLS can therefore
extend the range of a VLAN to a country or even the whole world.
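The VPLS forwarding behavior described above (MAC learning, handling of unknown destinations, and split horizon between PWs) can be sketched as follows. This is a minimal illustration, not the device implementation: the class names Port and VplsForwarder, the port names, and the MAC addresses are hypothetical, and the sketch always chooses the "broadcast" option for unknown destinations.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Port:
    name: str
    is_pw: bool  # True: pseudowire to a remote PE; False: local attachment circuit


@dataclass
class VplsForwarder:
    ports: list
    mac_table: dict = field(default_factory=dict)  # MAC address -> Port

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the ports the frame is sent to."""
        self.mac_table[src_mac] = in_port
        known = self.mac_table.get(dst_mac)
        # Known destination: one candidate. Unknown destination: flood (assumed policy).
        candidates = [known] if known is not None else list(self.ports)
        out = []
        for port in candidates:
            if port == in_port:
                continue  # never send a frame back to the port it arrived on
            if in_port.is_pw and port.is_pw:
                continue  # split horizon: no PW-to-PW forwarding in the full mesh
            out.append(port)
        return out


ac1 = Port("AC1", is_pw=False)
pw_b = Port("PW-to-PE-B", is_pw=True)
pw_c = Port("PW-to-PE-C", is_pw=True)
pe_a = VplsForwarder([ac1, pw_b, pw_c])

flood_from_ac = pe_a.receive(ac1, "00:00:00:00:00:01", "00:00:00:00:00:02")
flood_from_pw = pe_a.receive(pw_b, "00:00:00:00:00:02", "00:00:00:00:00:03")
unicast = pe_a.receive(ac1, "00:00:00:00:00:01", "00:00:00:00:00:02")
print([p.name for p in flood_from_ac])
print([p.name for p in flood_from_pw])
print([p.name for p in unicast])
```

A frame flooded from the AC goes to both PWs, whereas a frame flooded from a PW reaches only the local AC. Because every PE is directly connected to every other PE, the frame never needs to be relayed between PWs, which is why no loop can form without STP.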
Classification Based on Applications

According to different applications, VPNs are divided into the following types:

l Intranet VPN
Intranet VPNs connect all the branches of an enterprise through public networks. Intranet
VPNs extend or replace traditional private networks or other enterprise
networks.
Through intranet VPNs, the headquarters, branches, offices, and mobile personnel of an
enterprise form an intranet over public networks. Intranet VPNs can be used to
construct the intranets of banks and governments.
Chain businesses such as chain stores, storage and logistics companies, and gas stations are
typical users of intranet VPNs.
l Extranet VPN
Extranet VPNs extend enterprise networks to suppliers, partners, and clients by using
VPNs. The VPNs are established over public networks between different enterprises that
share common interests, so that some resources are shared among different VPN
users.
On a network of traditional leased lines, an extranet requires the enterprise to manage
the network, perform access control, and even install compatible network devices on the
user side. Although an extranet can be established in dial-up mode, each extranet user
must be configured separately, so the configuration is not simplified. Because partners
and customers are widely distributed, an extranet in dial-up mode is expensive to
construct and maintain. As a result, most enterprises give up extranets, which
complicates business processes between enterprises and reduces their
efficiency.
Extranet VPNs are introduced to solve these problems. Similar to intranet VPNs in terms
of technical implementation, extranet VPNs are easy to construct and manage. Currently,
enterprises generally use VPNs to construct extranets. To guarantee QoS, the external
communication of an enterprise is generally not carried over the Internet, because
data transmission requires strong security guarantees and an extranet is more secure
than the Internet. Each extranet user can configure the access rights of an extranet
VPN. For example, a user can configure firewalls to perform access
control.
l Access VPN
Through access VPNs, employees on business trips, Small Office Home Office (SOHO) users, and
remote offices can access intranet servers through low-cost dial-up media and set
up private network connections with intranets and extranets. Access VPNs are also
called VPDNs.
Access VPNs are divided into two types: client-initiated VPN and NAS-initiated VPN.
Classification Based on Layers

According to different layers on which VPNs are implemented, VPNs are divided into the
following types:
l L3VPN
Layer 3 VPNs (L3VPNs) are also called VPRNs. GRE VPN, BGP/MPLS VPN based on
RFC 4364, and BGP/MPLS VPN with GRE tunnels are all L3VPNs. The BGP/MPLS
VPN is generally applied at the forwarding layer of the core network; GRE VPN is
mainly applied at the access layer.
NOTE
l For more information about L3VPN, refer to the chapter BGP/MPLS IP VPN in this manual.

l L2VPN
With the development of network technologies, carrier networks have become increasingly
complex. New technologies are required to integrate traditional switching networks such
as ATM and FR networks with IP or MPLS networks. Layer 2 VPN (L2VPN) is thus
introduced.
L2VPN includes VPWS and VPLS, both described earlier. VPWS is suitable for
large-scale enterprises that are connected through Wide Area Networks (WANs); VPLS is
suitable for small-scale enterprises that are connected through Metropolitan Area
Networks (MANs). VPLS cannot avoid broadcast storms. In addition, on a VPLS
network, Provider Edge (PE) devices need to learn the Medium Access Control (MAC)
addresses of devices in the private network, which incurs high protocol and storage costs.
L2VPNs use only the Layer 2 links of SP networks. Therefore, L2VPNs can support
multiple Layer 3 protocols. L3VPNs also support multiple protocols, but with
more limitations than L2VPNs.
NOTE
l For more information about L2VPN, refer to the chapters VLL, PWE3, and VPLS in this
manual.
Table 11-1 lists the differences between L2VPN and L3VPN.
Table 11-1 Comparison between L2VPNs and L3VPNs

Item | L2VPN | L3VPN
Security | High | Low
Support of Layer 3 protocols | Relatively flexible | Limited
Impact of user networks on the backbone network | Little | Great
Compatibility with traditional WANs | Good | Poor
Route management | Routes managed by users | Routes managed by SPs
Application layer | Mainly at access layer and convergence layer | Mainly at core layer
Classification Based on Operation Modes

According to different operation modes, VPNs are divided into the following types:
l Customer Premises Equipment Based VPN

In Customer Premises Equipment Based VPN (CPE-based VPN) mode, users build,
manage, and maintain VPNs. A VPN tunneling protocol such as GRE, L2TP, and PPTP
must be configured on user devices.
In this mode, Customer Edge (CE) devices initiate requests for VPN connections. VPNs
can be realized without any special support of operators.
The configuration of a CPE-based VPN is complex and the service scalability is poor;
therefore, CPE-based VPNs are mainly used on the access layer.

Traditional VPNs based on public IP networks (IP VPNs) belong to CPE-based VPNs.
CPE-based VPNs set up VPN security tunnels between private devices to transmit
private data of users. The Internet is a typical public IP network. Constructing VPNs
based on the Internet is economical; however, QoS cannot be guaranteed. When
planning an IP VPN, an enterprise should consider which kind of public
IP network to use.
l Network-based VPN
In Network-based VPN mode, ISPs build, manage, and maintain VPNs. The ISPs allow
users to manage and control services to some extent. The functions and features are
mainly implemented on network-side devices. On the user side, only network
interconnection is required.
This mode reduces user investment, improves the flexibility and scalability of
services, and brings more revenue to operators.
VPNs based on MPLS, namely MPLS VPNs, are Network-based VPNs. Owing to
their advantages in flexibility, scalability, and QoS, MPLS VPNs have become the major IP
VPN technology and are widely used in telecom carrier networks and enterprise
networks. As an important technology for connecting branches of VIP customers and for
isolating 3G and NGN services, MPLS VPNs are generally applied on the backbone core network
and the convergence layer. MPLS VPNs are also of great importance to MANs. The
MPLS VPN technology applied within a MAN is an important means to improve the value of IP
MANs and increase operators' profits.
In an MPLS VPN network, user sites can use T1, FR, ATM VCs, and Digital Subscriber
Lines (DSLs) to access the MPLS VPN backbone network. No additional configuration
is required on user devices.
Table 11-2 lists the differences between a CPE-based VPN and a Network-based VPN.
Table 11-2 Difference between a CPE-based VPN and a Network-based VPN

Item | CPE-based VPN | Network-based VPN
Service scalability | Service scalability is poor. | Service scalability is excellent.
Customer investment | A CPE-based VPN requires great investment. | A Network-based VPN requires less investment than a CPE-based VPN.
Support of tunnels on user devices | A CPE-based VPN requires the support of tunnels on user devices. | A Network-based VPN does not require the support of tunnels on user devices.
Requirements on performance | Most features and functions are realized on CE devices. A CPE-based VPN, therefore, is CE intensive. | Most features and functions are realized on PE devices. A Network-based VPN, therefore, is PE intensive.
Seamless integration of CPE-based VPNs with Network-based VPNs can provide users with
more reliable, more secure, and richer VPN services.

11.1.1.2 Architecture of VPN

As an upper-layer service, the VPN technology is more complex than the P2P technology. To
implement the VPN technology, network connections must be set up between users, including
construction of internal topology of a VPN, route calculation, and maintenance of VPN users
joining or leaving. The architecture of a VPN is complex, and comprises the following parts:
l VPN tunnels
– Establishment of tunnels.
– Management of tunnels.
l VPN management
– VPN configuration management.
– VPN member management.
– VPN attribute management: the management of attributes of multiple VPNs on PE
devices and differentiation of VPN address spaces.
– VPN automatic configuration: the establishment of one-to-one relationship between
VPN internal links in L2VPNs after information about the peer links is received on
the local link.
l VPN signaling protocol
– Exchange and sharing of VPN resources between CE devices on a VPN: For an
L2VPN, information about data links is exchanged; for an L3VPN, routing
information is exchanged; for a VPDN, information about a single data link is
exchanged.
– VPN member discovery in some applications.
11.1.1.3 Typical Networking of VPN

A typical VPN is divided into the following layers:
l Access layer
The devices on the access layer provide users with the access function. These devices
need not implement many functions, but require many access interfaces. For MANs in big
cities, the access layer needs to provide more functions in addition to access.
On the access layer, a CE device is generally dual-homed or multi-homed to access
nodes. Dual homing is either physical or logical. In physical dual homing, a CE
device accesses two nodes through two physical links; in logical dual homing, a CE
device accesses two nodes through loops. Logical dual homing is widely used in
L2VPN networks.
l Convergence layer
The convergence layer is of either a mesh topology or a ring topology.
l Backbone layer
The backbone layer must use a full-mesh topology with multi-level backup. The devices
on the backbone layer are generally connected through high-speed interfaces.
11.1.2 Principles

11.1.2.1 VPN Tunnel

The major principle of a VPN is to encapsulate VPN packets in a tunnel and transmit the
packets over a private channel established across the VPN backbone network. The packets,
therefore, are transparently transmitted in the tunnel.
The tunnel technology uses one protocol to encapsulate packets of another protocol, and the
encapsulation protocol itself can be encapsulated or carried by other protocols. For users, a
tunnel is a logical extension for PSTN or ISDN links, and the tunnel is used in the same way
as physical links.
A VPN tunnel has the following functions:
l Encapsulates data.
l Establishes connection between two ends of a tunnel.
l Periodically checks the connectivity of a VPN tunnel.
l Guarantees the security of a VPN tunnel.
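The idea of using one protocol to encapsulate packets of another can be illustrated with a minimal GRE-style sketch. It assumes the basic 4-byte GRE header with no optional fields (all flag and version bits zero, followed by a 16-bit protocol type); the function names are hypothetical, and the outer delivery header that would carry the GRE packet across the backbone is omitted.

```python
import struct

GRE_PROTO_IPV4 = 0x0800  # EtherType value for IPv4, used in the GRE protocol type field


def gre_encapsulate(passenger: bytes, proto: int = GRE_PROTO_IPV4) -> bytes:
    """Prepend a basic GRE header (flags/version = 0, then protocol type)."""
    return struct.pack("!HH", 0, proto) + passenger


def gre_decapsulate(packet: bytes):
    """Strip the basic GRE header and return (protocol type, passenger packet)."""
    flags_version, proto = struct.unpack("!HH", packet[:4])
    if flags_version != 0:
        raise ValueError("optional GRE fields are not handled in this sketch")
    return proto, packet[4:]


passenger = b"inner packet bytes"          # stands in for a real passenger packet
tunneled = gre_encapsulate(passenger)      # what the ingress end sends into the tunnel
proto, recovered = gre_decapsulate(tunneled)  # what the egress end hands to the user side
```

The two tunnel ends only add and remove the header; the passenger packet itself is carried unchanged, which is what makes the transmission transparent to users.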
11.1.2.2 Implementation Modes of VPN

Considering three parts of the VPN architecture, the VPN technology can be implemented in
the following three modes.
Tunnel + VPN Management

In this mode, the VPN architecture comprises the following parts:
l VPN tunnels: Establishment of tunnels
l VPN management
– Deployment of network management
– Accounting
Tunnel + VPN Management + VPN Signaling Protocol

In this mode, the VPN architecture comprises the following parts:
l VPN tunnels: Establishment of tunnels
l VPN management
– VPN configuration management
– VPN member management
– VPN attribute management
– VPN automatic configuration
l VPN signaling protocol: Exchange and sharing of VPN resources between CE devices on a
VPN
This mode is adopted by Martini VPWS, PWE3, and Martini VPLS.
Instantiation
In instantiation mode, each VPN on Layer 2 and Layer 3 is instantiated, and instances of
private forwarding information of each VPN are established. Besides tunnel management, a
VPN in this mode performs member discovery, member management, and VPN automatic
configuration.
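The per-VPN instances of private forwarding information can be pictured as separate lookup tables on one PE, so that two VPNs may reuse the same private addresses without conflict. The following is a minimal sketch for the Layer 3 case, assuming a longest-prefix-match lookup; the class name VpnInstance, the instance names, the prefixes, and the next-hop labels are all hypothetical.

```python
import ipaddress


class VpnInstance:
    """One private forwarding table, kept separately for each VPN on a PE."""

    def __init__(self, name):
        self.name = name
        self.routes = {}  # ip_network -> next hop

    def add_route(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, dest):
        """Return the next hop of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(dest)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


vpn_a = VpnInstance("vpn_a")
vpn_b = VpnInstance("vpn_b")

# Both VPNs use the same private prefix; the instances keep them apart.
vpn_a.add_route("10.1.0.0/16", "CE-A1")
vpn_a.add_route("10.1.2.0/24", "CE-A2")
vpn_b.add_route("10.1.0.0/16", "CE-B1")

hop_a = vpn_a.lookup("10.1.2.3")
hop_b = vpn_b.lookup("10.1.2.3")
```

The same destination address resolves to a different next hop depending on which instance receives the packet, which is the property that lets a PE differentiate overlapping VPN address spaces.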
11.1.2.3 Features Related to the Implementation of VPN
Operability
The VPN technology is generally used to provide services for different departments of an
enterprise through public networks. Nowadays, more and more VPN users require VPN
services to be operable: they do not want to spend much time or unplanned resources
on network maintenance, and expect operators to undertake the task. Therefore, when
designing a VPN, consider operability first.
Manageability
On a VPN, the network management of an enterprise is seamlessly extended from LANs to the
public network, and even to clients and partners. Although the enterprise can assign some
nonessential network management tasks to the carrier, it must still perform many network
management tasks itself. A complete VPN management system is therefore necessary.
VPN management mainly includes security management, equipment management,
configuration management, access control list (ACL) management, and QoS management.
The effects of VPN management are as follows:
l VPN management reduces network risks. After a VPN intranet is extended to a public
network, the intranet is faced with more risks. VPN management can guarantee the
integrity of data resources of an intranet when branches, clients, and partners of an
enterprise access a VPN.
l VPN management provides better scalability. VPN management adapts quickly to an
increasing number of clients and partners, including the upgrade of
network hardware and software, the guarantee of network quality, and the maintenance
of security policies.
l VPN management reduces costs. VPN management controls the expenses of operation and
maintenance and ensures service scalability at the same time.
l VPN management improves the reliability of a VPN. VPNs are set up over a public
network. Compared with traditional WANs using leased lines, the controllability of the
VPNs is lower. VPN management should guarantee the reliable and stable operation of a
VPN.
Security
VPNs are constructed over public networks. The implementation of a VPN is simple,
convenient, and flexible. However, network risks arise at the same time.
l On a traditional IP VPN, an enterprise must guarantee that the VPN data is not
intercepted and modified by attackers, and prohibit the access of unauthorized users.
Extranet VPNs are faced with even more serious risks.
The following solutions can improve the security of a VPN:
– Tunnel technology and encryption: By performing multi-protocol encapsulation, the
tunnel technology can enhance the flexibility of a VPN and provide P2P logical
channels on connectionless IP networks. When users require more secure data
transmission, an encrypted tunnel is used, which prevents data from being
intercepted and modified.
– Data authentication: On an insecure network, such as a public network used by a
VPN, packets may be intercepted and modified without authorization, so that the
receiver receives incorrect packets. By using data authentication, the receiver can
detect the modification.
– User authentication: Through user authentication, a VPN allows authorized users to
access enterprise resources and denies access to unauthorized users. After
Authentication, Authorization and Accounting (AAA) is configured, ATNs can
authenticate users, authorize users at different levels, and generate access
records. User authentication greatly improves the security of access VPNs and
extranet VPNs.
– Firewalls and attack detection: Firewalls are used to filter packets and prevent
unauthorized access. Attack detection analyzes packets to judge their validity,
implements security policies in real time, disconnects illegitimate sessions,
and records unauthorized access.
l MPLS VPNs forward packets on the network side based on forwarding tables and
packet labels. If an MPLS network is not connected to the Internet, the security of
internal resources of the MPLS network is guaranteed. The MPLS VPN, therefore, can
ensure the security of the VPN to some extent.
If MPLS VPN users want to access the Internet, a channel with a firewall can be
created to provide a secure connection for the VPN. The MPLS VPN is easy to manage
because only one security policy is applied in the VPN.
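The data authentication item in the list above states that the receiver can recognize modified packets. One common way to achieve this is a keyed message authentication code. The sketch below assumes HMAC-SHA-256 with a pre-shared key; this document does not name a specific algorithm, so the choice, the function names, and the sample key are illustrative assumptions only.

```python
import hashlib
import hmac

TAG_LEN = 32  # length in bytes of an HMAC-SHA-256 tag


def protect(key: bytes, payload: bytes) -> bytes:
    """Sender side: append an authentication tag to the payload."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify(key: bytes, packet: bytes) -> bytes:
    """Receiver side: return the payload, or raise if the packet was modified."""
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("packet failed authentication")
    return payload


key = b"example-shared-key"  # hypothetical key, for illustration only
packet = protect(key, b"VPN user data")
tampered = b"VPN user dat4" + packet[len(b"VPN user data"):]

received = verify(key, packet)
try:
    verify(key, tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

Authentication of this kind detects modification but does not hide the data; confidentiality still requires the encrypted tunnel described in the same list.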
11.1.3 VPN Applications

L2VPN+L3VPN for ATN+CX scenarios is a bearer mode used in the IPTime mobile
broadband (MBB) ATN+CX solution. The ATN+CX solution is a major solution that Huawei
uses to construct an IP mobile backhaul (MBH) network.
L2VPN+L3VPN for ATN+CX scenarios
In L2VPN+L3VPN for ATN+CX scenarios, ATN and CX series products are deployed on the
RAN, which provides excellent fixed mobile convergence (FMC) capabilities and features
simple and flexible networking. The hierarchical network between cell site gateways (CSGs)
and radio service gateways (RSGs) can bear various types of services. ATN devices function
as CSGs to form the access network. CXs function as aggregation site gateways (ASGs) and
RSGs to form the aggregation network. All these devices can be flexibly deployed based on
2G, 3G, and LTE service requirements.
On the network shown in Figure 11-2, to keep up with the Mobile IP trend, a base transceiver
station (BTS), a NodeB, and an eNodeB need to connect to a mobile network to bear TDM,
ATM, and Ethernet/IP services respectively.

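Figure 11-2 L2VPN+L3VPN for ATN+CX scenarios
[Figure: a NodeB and an eNodeB attach to the CSG; the path runs from the CSG through the ASG to the RSG, which connects to the RNC and MME/SGW. Labels in the figure: MPLS TE1, MPLS TE2, L2VPN, L3VPN, Eth, MSPW, ATM/TDM.]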
L2VPN+L3VPN for ATN+CX scenarios has the following characteristics:

l Allows for large-scale dynamic networking, lowers performance requirements on ATN
and CX devices, and reduces pressure on aggregation devices.
l Provides a powerful fault isolation capability (isolates access ring faults from
aggregation ring faults) and improves network robustness.
l Uses the unified access ring technology (PWE3).
l Provides high reliability and security.
Configuring ARP/ND Dual Fed on CSGs
In the scenario shown in Figure 11-2, when a CSG receives an ARP/ND packet for PWE3
services, the CSG adds the PW label and tunnel label to the ARP/ND packet and
then transparently transmits the packet to the remote AGG. As a result, only the master AGG
learns the ARP/ND entry; that is, only one AGG has the ARP/ND information.
When a fault occurs on the master AGG, the Layer 3 service is switched from the master
AGG to the backup AGG. The backup AGG, which does not have the ARP/ND
information, has to trigger ARP/ND learning by sending traffic. It takes a long time for
the backup AGG to learn the ARP/ND entry of the base station. As a result, many packets
are lost during path switching.
To reduce packet loss, configure ARP/ND dual fed on CSGs. When the primary and
secondary PWs are working properly, the AC-side device transparently transmits each received
ARP/ND packet to the master and backup AGGs over the primary and secondary PWs
respectively, so that the Layer 3 interfaces of both AGGs can learn the ARP/ND
entry of the base station. When a fault occurs on the master AGG and the Layer 3 service is
switched to the backup AGG, the backup AGG needs less time to learn
the ARP/ND entry of the base station. Consequently, packet loss is reduced and device
performance is improved. When the Layer 3 service on the aggregation ring recovers and is
switched back from the backup AGG to the master AGG, the master AGG does not have to
relearn the ARP/ND entry, so the corresponding packet loss does not occur.
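The effect of dual feed on the backup AGG can be modeled in a few lines. This is a simplified sketch, not device behavior: the classes Agg and Csg, the dual_feed flag, and the sample addresses are hypothetical, and both PWs are assumed to be working properly.

```python
class Agg:
    """Aggregation node holding learned ARP/ND entries (IP address -> MAC address)."""

    def __init__(self, name):
        self.name = name
        self.neighbors = {}

    def learn(self, ip, mac):
        self.neighbors[ip] = mac


class Csg:
    """Cell site gateway that relays ARP/ND packets from the base station."""

    def __init__(self, master, backup, dual_feed):
        self.master = master
        self.backup = backup
        self.dual_feed = dual_feed

    def on_arp_nd(self, ip, mac):
        self.master.learn(ip, mac)       # always sent over the primary PW
        if self.dual_feed:
            self.backup.learn(ip, mac)   # also sent over the secondary PW


def backup_has_entry_at_switchover(dual_feed):
    """Return True if the backup AGG already knows the base station when the master fails."""
    master, backup = Agg("master"), Agg("backup")
    csg = Csg(master, backup, dual_feed)
    csg.on_arp_nd("10.0.0.1", "00:00:00:00:00:01")
    # The master AGG fails at this point and Layer 3 traffic moves to the backup AGG.
    return "10.0.0.1" in backup.neighbors


without_dual_feed = backup_has_entry_at_switchover(dual_feed=False)
with_dual_feed = backup_has_entry_at_switchover(dual_feed=True)
```

Without dual feed, the backup AGG starts the switchover with an empty table and must learn the entry from traffic; with dual feed, the entry is already present.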

Terms
Term Description
AC A physical or logical link used to transmit frames between the

CE and the PE in L2VPN. An AC interface can be a physical
interface or a virtual interface. All the user packets on the AC
including the protocol packets of Layer 2 and Layer 3 are
completely transmitted to the peer site.
Address Space An address realm that is managed by a VPN.
AVP The attribute value pairs (AVP) that are used by the L2TP
protocol to transmit and negotiate the L2TP parameters. A
control message contains multiple AVPs.
Carrier's Carrier A network structure in which a user of a BGP/MPLS VPN
service is itself a service provider. In this
situation, the BGP/MPLS VPN service provider is a Level 1
carrier, and the user of the BGP/MPLS VPN service is a Level 2
carrier.
CCC An implementation of MPLS L2VPN that uses the static
configuration of labels. CCC transmits data by using a single
layer of labels and uses an LSP exclusively.
Control connection A connection that defines a pair of LNS and LAC and controls
the establishment, maintenance and dismantlement of tunnels
and sessions. The procedures for establishing a control
connection involve the exchange of information about identity
protection, L2TP version, frame type, and parameters of the
physical links.
Control message A message used in the establishment and maintenance of tunnels

and sessions, and in the transmission control. Control messages
are transmitted in reliable mode.
CPE-based VPN Customer Premises Equipment-based VPN. A VPN that is

controlled by users.
CE Customer edge equipment that is directly connected to the

service provider. In a VPN based on MPLS, a CE device can be
an ATN, a switch, or even a host.
CW Control word. A 4-byte encapsulation header used to transmit
packets in an MPLS packet switching network. The control word
carries the sequence number, pads packets that would otherwise
be too short, and carries Layer 2 header control information.

Data message A message that encapsulates PPP frames and is transmitted in

tunnels. Data messages are transmitted in unreliable mode.
Dynamic PW A PW that is set up through a signaling protocol.
Extranet VPN A VPN that extends an enterprise network to suppliers,
partners, and clients. Through an extranet VPN, different
enterprises can construct VPNs over public networks.
Forwarder A forwarding table of a VPLS network. After a PE receives

frames delivered by ACs, the forwarder chooses the PW that is
used to forward packets.
GRE An encapsulation mode in which packets of some network

protocols such as IP and IPX are encapsulated and therefore can
be transmitted in networks supporting other protocols such as IP.
Intranet VPN A VPN that connects sites within an enterprise through the
public network.
Kompella VPN An implementation of L2VPN that is realized in end-to-end

mode in an MPLS network. In Kompella VPN, BGP is used as
the signaling protocol to transmit Layer 2 information and VC
labels.
L2TP A Layer 2 tunneling protocol that is drafted by IETF and

involves the participation of companies such as Microsoft. The
L2TP combines the advantages of both PPTP and L2F. Many
companies have accepted it.

LAC A device that is attached to a switching network and is capable

of L2TP processing. It possesses PPP terminal system, and
generally provides the access service to users.
LNS A server that processes the L2TP protocol.
Martini VPN An implementation of MPLS L2VPN that is realized by setting

up point-to-point link. In Martini VPN, LDP is used as the
signaling protocol to transmit Layer 2 information and VC
labels.
MP-BGP A protocol that transmits VPN structure information and VPN

IPv4 routes between the PE devices.
MPLS L2VPN A VPN that provides Layer 2 VPN services based on the MPLS
network to enable the carriers to provide VPNs of different
media, including ATM, FR, VLAN, Ethernet, and PPP on
unified MPLS network.
Multi-segment PW A situation in which multiple PWs exist between the U-PEs.
NAS A server that provides the access to Internet for PSTN/ISDN

dialup users. A Network Access Server (NAS) can work as an
LAC, or as an LNS, or as an LAC and LNS at the same time.
Network-based VPN A VPN in which users entrust maintenance of the VPN to ISPs
and realize VPN features and functions on the network edge
devices.
P A backbone device that is located in the service provider

network. A P device is not directly connected to the CE devices.
The P devices only need the basic MPLS forwarding capability
and do not maintain information about a VPN.
Passenger Protocol The protocol of a packet before encapsulation.
Payload A data packet that must be encapsulated and routed.

PE A Provider Edge (PE) device is a device that is located in the

backbone network in the MPLS VPN structure. A PE device is
responsible for VPN user management, establishment of LSPs
between the PE devices and exchanges of routing information
between sites of the same VPN. A PE device performs the
mapping and forwarding of the packets from the private network
to the public network tunnels and that in the reverse order. PE
can be further divided into UPE, SPE and NPE.
PPTP A tunnel protocol that encapsulates PPP on the tunnels of an IP

network. The protocol is supported by Microsoft, Ascend, and
3COM.
PW A bidirectional virtual connection between two VSIs. A PW
consists of a pair of unidirectional MPLS VCs.
PWE3 A technology that bears Layer 2 services. PWE3 emulates

services such as ATM, FR, Ethernet, low-speed TDM circuit,
and SONET/SDH.
PW signaling A signaling protocol used to set up and maintain Pseudo Wires

(PWs). PW signaling can automatically discover the peer PE
devices of VSIs. Currently, the primary PW signaling protocols
are LDP and BGP.
PW Template A Pseudo Wire (PW) template is an aggregation of public

attributes of the PWs. A PW template is shared by different
PWs.
QinQ A mechanism that uses the tunnel protocol based on 802.1Q

encapsulation and provides multi-point L2VPN services. In Q-
in-Q, the private-network VLAN tag is encapsulated in the
public-network VLAN tag. The packets carrying double tags are
transmitted through the backbone network of the service
provider. Therefore, the users are provided with a Layer 2 VPN
tunnel service.
Route Distinguisher An 8-byte field in a VPN IPv4 address. A route distinguisher
(RD) and a 4-byte IPv4 address prefix together form a VPN
IPv4 address, which differentiates IPv4 prefixes that use the same
address space.

Single segment PW A situation in which only one PW exists between the U-PEs.
The label switching on the PW label level is not needed.
Site A group of IP systems. Sites have IP connectivity between each

other and this connectivity need not be realized by the service
provider network.
S-PE It is a device that is responsible for PW switching and PW label

forwarding within a backbone network. It is connected to the
UPE and is located in the core of a VPLS full-meshed network.
The SPE is connected to all the devices in the VPLS full-meshed
network. In the case of SPE that is connected to a UPE, the UPE
functions as a CE. The PW set up between the UPE and the SPE
serves as the AC of the SPE. The SPE must learn the MAC
addresses of all the sites at the UPE side and those of the UPE
interfaces that are connected to the SPE. SPE is also called NPE.
Static PW A PW whose parameters are specified manually instead of

parameter negotiation. Data is transmitted between the PE
devices through tunnels.
Session connection A connection that is multiplexed on a tunnel connection. A

session connection is a PPP session over a control connection.
Service quality The priority information in the Layer 2 frame header is mapped
to the priority of QoS in the packet that is transmitted on the
public network. Generally, it is applied to MPLS TE networks.
SVC An implementation of static MPLS L2VPN that does not use the
signaling protocol to transmit L2VPN information. In SVC, VC
label information needs manual configuration.
Transport protocol A protocol that is responsible for forwarding encapsulated

packets.
Token A part of a tunnel ID and is an index used to search for a tunnel.
Tunnel binding A process in which a VPN peer is associated with an MPLS TE

tunnel on the PEs of a VPN backbone network.
Tunnel ID A group of information, including the token, the slot number of an
outgoing interface, and the tunnel type.
Tunnel interface A virtual P2P interface that can encapsulate packets. Similar to
loopback interfaces, tunnel interfaces are logical interfaces.
Tunnel iteration A process in which a route is iterated to a tunnel.

Tunnel A channel through which a packet switching network transmits
service traffic between the PEs. In VPN, a tunnel is an
information transmission channel between two entities. The
tunnel provides security for transparent transmission of VPN
information. A tunnel can bear multiple PWs. In most cases, a
tunnel is an MPLS tunnel.
Tunnel Management A module that manages tunnels. It notifies the application that
uses a tunnel of the tunnel status, and it queries tunnels and
configured policies based on the destination IP address. It
provides uniform interfaces to such upper-layer applications as
L3VPN, L2VPN, Resource Manager (RM), and the Border
Gateway Protocol (BGP).
Tunnel policy A policy used to choose a tunnel according to the destination IP
address.
Tunnel switch A technology that is used to implement the L2TP tunnel relay. A
device supporting the tunnel switch works on the one hand as an
LNS to set up the tunnel connection with the LAC, and on the
other hand works as an LAC to set up the tunnel connection with
the LNS.
U-PE An edge device of a backbone network that is directly
connected to user edge devices in a VPN.
VC A unidirectional logical connection between two nodes.
VCCV A tool that is used to manually test the connectivity of a virtual
circuit. Like ICMP ping and LSP ping, it tests reachability; it is
implemented as an extension of LSP ping.
VLL A line that emulates a leased line over an IP network and
therefore provides an asymmetrical, low-cost Digital Data
Network (DDN) service.
VPDN A network that implements a VPN by using the dial-up function
of public networks, such as ISDN and PSTN, together with the
access network, to provide access services for enterprises,
small-scale ISPs, and mobile office users.
VPLS A service that is used to connect more than one Ethernet LAN
segment through the PSN and make them operate in an
environment similar to a LAN.

VPN A recently developed technology that implements a private
network over a public network. It is a network that exists only
logically.
VPN instance An entity that is set up and maintained by the PE devices for
directly-connected sites. Each site has its VPN instance on a PE
device. A VPN instance is also called VPN Routing and
Forwarding (VRF) table. A PE device has multiple forwarding
tables, including a public-network routing table and one or
multiple VRFs.
VPN route matching A process in which VPNv4 routes and VPN targets of the local
VPN instances are matched.
VPN target A BGP extended community attribute that is also called Route
Target. In BGP/MPLS IP VPN, the VPN target is used to control
the advertisement of VPN routing information. The VPN target
attribute defines which sites can receive a VPN-IPv4 route and
from which sites a PE device can receive routes.
VPN Tunnel A Virtual Private Network (VPN) tunnel is the virtual
connection that is set up between the VPN nodes or between the
VPN node and the node in the customer side, generally the
Provider Edge (PE) devices of the backbone network in the
Packet Switched Network (PSN). The VPN tunnel is used to
transmit VPN data.
VPRN A network that realizes the communication between the
headquarters, branches, and the remote offices through the
virtual devices.
VPWS A technology that bears Layer 2 services. VPWS emulates
services such as ATM, FR, Ethernet, low-speed TDM circuit,
and SONET/SDH in a PSN.
VRF See VPN instance.
VSI An instance through which the physical access links of VPLS
can be mapped to the virtual links. Each VSI provides
independent VPLS service. VSI has Ethernet bridge function and
can terminate PW.
Abbreviations

ASBR Autonomous System Boundary Router
ATM Asynchronous Transfer Mode
AVP Attribute Value Pair
CCC Circuit Cross Connect
CE Customer Edge
CHAP Challenge Handshake Authentication Protocol
COS Class of Service
CRC Cyclic Redundancy Check
CW Control Word
DDN Digital Data Network
DLCI Data Link Connection Identifier
DR Designated Router
DTE Data Terminal Equipment
DU Downstream Unsolicited
FEC Forwarding Equivalence Class
FR Frame Relay

HDLC High-level Data Link Control
HoPE Hierarchy of PE
HoVPN Hierarchy of VPN
HVPLS Hierarchical Virtual Private LAN Service
HWTACACS Huawei Terminal Access Controller Access Control System
IETF Internet Engineering Task Force
IKE Internet Key Exchange
INARP Inverse Address Resolution Protocol
IPHC IP header compression
IPSec Internet Protocol Security
IPX Internet Packet Exchange
ISDN Integrated Services Digital Network
L2F Layer 2 Forwarding
L2TP Layer 2 Tunneling Protocol
L2VPN Layer 2 Virtual Private Network
L3VPN Layer 3 Virtual Private Network
LAC L2TP Access Concentrator
LAN Local Area Network
LFIB Label Forwarding Information Base

LMI Local Management Interface
LNS L2TP Network Server
LO Label-block Offset
LR Label Range
LSA Link State Advertisement
LSR Label Switching Router
MAC Media Access Control
MH-PW Multi-hop Pseudo-Wire
MTU Maximum Transmission Unit
NAS Network Access Server
NAT Network Address Translation
NBIP-VPN Network-based VPN
NCP Network Control Protocol; Network Control Point
NHLFE Next Hop Label Forwarding Entry
NNI Network-to-Network Interface
OAM Operation Administration and Maintenance
P2MP Point-to-Multipoint

P2P Point-to-Point
PAP Password Authentication Protocol
PE Provider Edge
PHP Penultimate Hop Popping
PING Packet Internet Groper
POP Point of Presence
PPTP Point-to-Point Tunneling Protocol
PPVPN Provider Provisioned VPN
PSN Packet Switched Network
PSTN Public Switched Telephone Network
PVC Permanent Virtual Channel
PW Pseudo-Wire
PWE3 Pseudo-Wire Emulation Edge-to-Edge
PW template Pseudo-Wire template
QinQ 802.1q-in-802.1q
RADIUS Remote Authentication Dial In User Service
RD Route Distinguisher
RR Route Reflector
RSVP-TE RSVP-Traffic Engineering
RTP Real-Time Transport Protocol

SDH Synchronous Digital Hierarchy
SH-PW Single-hop Pseudo-Wire
SONET Synchronous Optical Network
SP Service Provider
SPE Superstratum PE; Service provider-end PE
SVC Static Virtual Circuit
S-PE Switching-point PE
TE Traffic Engineering
TDM Time Division Multiplexing
UPE Underlayer PE; User-end PE
U-PE Ultimate PE
VC Virtual Circuit
VCCV Virtual Circuit Connectivity Verification
VCI Virtual Channel Identifier
VLAN Virtual Local Area Network
VPDN Virtual Private Dial-up Network
VPI Virtual Path Identifier
VPRN Virtual Private Routed Network
VPWS Virtual Private Wire Service
VRF VPN Routing and Forwarding table

11.2 Tunnel Policy
11.2.1 Introduction
Definition
A tunnel policy determines which type of tunnel can be selected for an application. Tunnel
policies can be classified into the following types:
l Tunnel type prioritizing policy: selects tunnels for an application based on the tunnel
type priorities defined in the policy.
l Tunnel binding policy: selects only a specified tunnel for an application.
The two types of policies are mutually exclusive.
A tunnel selector selects a tunnel policy for each route based on route attributes.
Purpose
Currently, multiple types of tunnels are provided, such as LSPs (including LDP LSPs and
static LSPs), constraint-based routed LSPs (CR-LSPs), and Generic Routing Encapsulation
(GRE) tunnels. The tunnel management (TNLM) module selects tunnels for applications in
accordance with configured tunnel policies.
11.2.2 Principles
11.2.2.1 Tunnel Type Prioritizing Policy

In a tunnel type prioritizing policy, you can specify the sequence in which tunnel types are
selected and the number of tunnels that participate in load balancing. This type of policy
applies only to GRE tunnels, common LSPs, and CR-LSPs. Tunnel types are tried in the
configured sequence. A tunnel of the type specified first is selected as long as it is Up,
regardless of whether it is already used by other services. A tunnel type specified later is
generally selected only when load balancing requires more tunnels or when all tunnels of the
preceding types are Down.
NOTE
l If no tunnel policy is configured for an application or the tunnel policy to be configured has not been
created yet, the system selects a tunnel based on the default tunnel selection policy. Specifically, the
system selects only one LSP for the application.
l If a protection group is configured for CR-LSPs, the protection CR-LSP cannot be selected. In other
words, the tunnel playing the protection role cannot be selected.
l If a CR-LSP is reserved for tunnel binding, the CR-LSP cannot be selected.
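The selection rules above can be illustrated with a minimal Python sketch. It is not the TNLM implementation: the Tunnel class, its field names, and select_by_type_priority() are hypothetical names introduced only for this example. The defaults model the default policy described in the NOTE (one LSP, no load balancing).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunnel:
    name: str
    kind: str                           # "lsp", "cr-lsp", or "gre"
    up: bool = True
    protection: bool = False            # protection CR-LSP of a protection group
    reserved_for_binding: bool = False  # CR-LSP reserved for tunnel binding


def select_by_type_priority(tunnels, type_order=("lsp",), load_balance=1):
    """Pick up to `load_balance` tunnels, walking tunnel types in priority order."""
    selected = []
    for kind in type_order:
        for t in tunnels:
            if len(selected) == load_balance:
                return selected
            # Down tunnels, protection CR-LSPs, and CR-LSPs reserved for
            # binding are never eligible.
            eligible = t.up and not t.protection and not t.reserved_for_binding
            if t.kind == kind and eligible:
                selected.append(t)
    return selected
```

With type_order=("cr-lsp", "lsp") and load_balance=1, an LSP is returned only if no eligible CR-LSP is Up. With load_balance=2, the LSP is added after the CR-LSPs are exhausted.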
11.2.2.2 Tunnel Binding Policy

A tunnel binding policy allows you to bind one or more tunnels to a destination address.
Tunnel binding applies only to TE tunnels. Because the system does not check whether a
bound tunnel is actually a TE tunnel, verify the tunnel type yourself when configuring a
tunnel binding policy.

In addition to binding multiple TE tunnels to a destination address to load-balance traffic, you
can also configure the down-switch attribute in a tunnel binding policy to ensure that other
tunnels can be selected when all the specified tunnels are unavailable. This implementation
ensures non-stop traffic forwarding.
A tunnel binding policy selects tunnels based on the destination address and indexes of bound
TE tunnel interfaces. A tunnel binding policy selects TE tunnels as follows:
l If no TE tunnel is specified for the destination IP address, the tunnel binding policy
selects a tunnel in the sequence of common LSP and then CR-LSP.
l If a TE tunnel is specified for the destination IP address and the specified TE tunnel is
available, the tunnel binding policy selects the specified TE tunnel.
l If a TE tunnel is specified for the destination IP address but the specified TE tunnel is
unavailable, the tunnel selection result is determined by the down-switch attribute. If the
down-switch attribute is configured, another available tunnel is selected in the sequence
of common LSP, CR-LSP, and GRE tunnel. If the down-switch attribute is not
configured, no tunnel is selected.
NOTE
A tunnel binding policy can select only TE tunnels with the reserved-for-binding attribute configured.
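The three selection cases can be sketched in Python as follows. This is an illustration only and follows the text literally, including the different fallback sequences for the "no TE tunnel specified" and "down-switch" cases. The function name, its parameters, and the tunnel names in the usage note are hypothetical.

```python
def select_bound_tunnel(dest, bindings, te_available, others, down_switch=False):
    """Return the tunnel names chosen for `dest` under a tunnel binding policy.

    bindings:     destination IP -> list of bound TE tunnel interface names
    te_available: set of bound TE tunnel names that are currently available
    others:       other available tunnels as (kind, name) pairs,
                  where kind is "lsp", "cr-lsp", or "gre"
    """
    bound = bindings.get(dest)
    if bound:
        usable = [name for name in bound if name in te_available]
        if usable:
            return usable        # all usable bound tunnels share the load
        if not down_switch:
            return []            # bound tunnels unavailable, no fallback allowed
        order = ("lsp", "cr-lsp", "gre")
    else:
        order = ("lsp", "cr-lsp")
    for kind in order:
        names = [name for k, name in others if k == kind]
        if names:
            return names[:1]
    return []
```

For example, with bindings = {"10.1.1.9": ["Tunnel1", "Tunnel2"]} and both TE tunnels unavailable, the call returns an empty list unless down_switch=True, in which case the first available tunnel in the fallback sequence is used.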
11.2.2.3 Comparison of Tunnel Policies
Table 11-3 Comparison of tunnel policies

Type                             Description
Tunnel type prioritizing policy  Cannot guarantee which tunnel is selected if there are
                                 several tunnels of the same type.
Tunnel binding policy            Precisely specifies which TE tunnel can be used. Tunnel
                                 binding policies apply only to TE tunnels, but TE tunnels
                                 can also use the tunnel type prioritizing policy.
11.2.2.4 Tunnel Selector

A tunnel selector defines matching rules and associates the routes whose attributes match
the rules with specific tunnels. This makes tunnel selection more flexible and better
satisfies user requirements.
A tunnel selector consists of one or more nodes, and the relationship between these nodes
is "OR". The system checks the nodes in the order of their index numbers. Once a route
matches a node in the tunnel selector, the route is not checked against the following nodes.
Each node comprises a set of if-match and apply clauses:
l The if-match clauses define the matching rules that are used to match certain route
attributes such as the next hop and RD.
The relationship between the if-match clauses of a node is "AND". A route matches a
node only when the route meets all the matching rules specified by the if-match clauses
of the node.
l The apply clause specifies actions. When a route matches a node, the apply clause
selects a tunnel policy for the route.

The matching modes of a node are as follows:
l Permit: If a route matches all the if-match clauses of a node, the route matches the node
and the actions defined by the apply clause are performed on the route. If a route fails to
match any one of the if-match clauses of a node, the route is checked against the next node.
l Deny: In this mode, the actions defined by the apply clause are not performed. If a route
matches all the if-match clauses of a node, the route is denied and does not match the
next node.
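The node evaluation described above ("OR" between nodes, "AND" between if-match clauses, permit and deny modes) can be sketched in Python. The Node class and select_tunnel_policy() are hypothetical names, routes are modeled as plain dictionaries of attributes, and ascending index order is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    index: int
    mode: str                                     # "permit" or "deny"
    if_match: dict = field(default_factory=dict)  # attribute -> required value
    apply_policy: Optional[str] = None            # tunnel policy applied on permit


def select_tunnel_policy(route, nodes):
    """Return the tunnel policy name chosen for `route`, or None."""
    for node in sorted(nodes, key=lambda n: n.index):
        # "AND": every if-match clause of the node must be satisfied.
        if all(route.get(attr) == value for attr, value in node.if_match.items()):
            # First matching node wins ("OR" between nodes). A deny node stops
            # the search without applying anything.
            return node.apply_policy if node.mode == "permit" else None
    return None
```

A route that matches no node gets no tunnel policy from the selector in this sketch; how the device treats such a route is not covered here.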
11.2.2.5 Introduction
Definition
Generic Routing Encapsulation (GRE) is a tunneling protocol that encapsulates the packets of
a wide variety of network layer protocols, such as Internetwork Packet Exchange (IPX),
Asynchronous Transfer Mode (ATM), IPv6, and AppleTalk, into IP tunneling packets, so that
these packets can be transmitted over an IPv4 network.
GRE provides a mechanism of encapsulating packets of a protocol into packets of another
protocol. This allows packets to be transmitted over heterogeneous networks. The channel for
transmitting heterogeneous packets is called a tunnel.
Purpose
GRE was introduced so that packets of a wide variety of network layer protocols, such as
IPX, ATM, IPv6, and AppleTalk, can be transmitted over an IPv4 network. GRE solves the
transmission problem faced by heterogeneous networks.
In addition, GRE serves as a Layer 3 tunneling protocol of VPNs, and provides a tunnel for
transparently transmitting VPN packets. Currently, GRE is supported by IPv4 L3VPN, but not
IPv6 L3VPN.
Keepalive Detection
GRE Black Hole
GRE itself does not support link status detection. As a result, when the remote interface
becomes unreachable, a GRE tunnel cannot close the tunnel connection immediately and
continues forwarding data to the peer. The peer, however, discards all the packets. A black
hole is therefore generated.
Keepalive Detection
The device provides link status detection, also called Keepalive detection, for GRE tunnels.
Keepalive detection is used to detect whether the tunnel link is in the Keepalive state,
specifically, whether the peer of the tunnel is reachable. If the peer is not reachable, the tunnel
is disconnected to prevent data loss caused by black holes.
After Keepalive detection is enabled for a GRE tunnel, the ingress periodically sends
Keepalive detection packets to the peer. If the peer is reachable, the ingress receives a reply
packet from the peer. Otherwise, the ingress cannot receive any reply packet.

NOTE
The endpoint of a GRE tunnel has a Keepalive detection mechanism as long as it has Keepalive
detection configured. The peer does not need to have the Keepalive detection mechanism. After the peer
receives a Keepalive detection packet, it sends a reply packet, regardless of whether it has Keepalive
detection configured.
Unreachability Counter
After Keepalive detection is enabled for a GRE tunnel, the ingress creates a counter,
periodically sends Keepalive detection packets, and counts the number of sent detection
packets. The number increases by one each time a detection packet is sent.
The peer sends a reply packet to the ingress after receiving a detection packet. Upon receipt of
the reply packet, the ingress clears the counter value.
If the ingress receives a reply packet before the counter value reaches the preset value, the
ingress considers the peer reachable. If the ingress does not receive any reply packet before the
counter reaches the preset value (the retry times), the ingress considers the peer
unreachable. The ingress then closes the tunnel connection.
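The counter logic can be sketched in Python as follows. The class and method names are hypothetical, and the default retry value of 3 is an arbitrary example rather than a documented default. How a closed tunnel recovers is not described above and is left out of the sketch.

```python
class GreKeepalive:
    """Unreachability counter kept by the ingress of a GRE tunnel."""

    def __init__(self, retry_times=3):
        self.retry_times = retry_times  # preset value (retry times)
        self.counter = 0                # detection packets sent without a reply
        self.tunnel_up = True

    def on_timer(self):
        """Called once per Keepalive period."""
        if not self.tunnel_up:
            return
        if self.counter >= self.retry_times:
            # The counter reached the preset value with no reply: peer unreachable.
            self.tunnel_up = False
            return
        self.counter += 1               # one more detection packet is sent

    def on_reply(self):
        """Called when a reply packet arrives from the peer."""
        self.counter = 0
```

In this sketch the tunnel is closed on the first timer tick after retry_times detection packets have gone unanswered, which gives the last packet one full period to be answered.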
11.2.3 Applications
11.2.3.1 Connecting Discontinuous Local Networks into a VPN

GRE tunnels can connect discontinuous local networks into a VPN across a WAN.
Assume that two local networks, Site 1 and Site 2, are deployed in two different cities. By
establishing a GRE tunnel between PEs, you can connect the two networks into a continuous
VPN.
GRE applies to both L2VPNs and L3VPNs:
l On a CPE-based VPN, both ends of a GRE tunnel reside on CEs, as shown in Figure
11-3.
Figure 11-3 CPE-based VPN
[Figure: VPN site1 (CE), PE, VPN backbone, PE, VPN site2 (CE). The GRE tunnel runs between the two CEs.]
l On a network-based VPN, both ends of a GRE tunnel reside on PEs, as shown in Figure
11-4.
Figure 11-4 Network-based VPN
[Figure: VPN site1 (CE), PE, VPN backbone, PE, VPN site2 (CE). The GRE tunnel runs between the two PEs.]

Usually, the MPLS backbone network uses LSPs as public tunnels. If Ps do not support MPLS
but PEs do, LSPs cannot be used as public tunnels. In this situation, you can use GRE tunnels
for L2VPN or L3VPN solutions. Figure 11-5 shows the format of a GRE VPN packet
transmitted over an MPLS backbone network.
Figure 11-5 Format of a GRE VPN packet containing an MPLS label

Public network IP header | GRE header | MPLS label | Private network IP header | Payload
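The layout in Figure 11-5 can be illustrated with a short Python sketch that builds the part of the packet following the public network IP header. It assumes the base GRE header of RFC 2784 (no optional fields) and a single MPLS label stack entry. The function names and the default TTL are hypothetical choices for this example, and the public network IP header is omitted.

```python
import struct

GRE_PROTO_MPLS_UNICAST = 0x8847  # protocol type value for MPLS unicast


def gre_header(protocol_type):
    # Base GRE header: 16 bits of flags and version (all zero here),
    # followed by a 16-bit protocol type.
    return struct.pack("!HH", 0, protocol_type)


def mpls_label_entry(label, tc=0, bottom=True, ttl=64):
    # One label stack entry: 20-bit label, 3-bit traffic class,
    # 1-bit bottom-of-stack flag, 8-bit TTL.
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def gre_vpn_payload(vpn_label, private_ip_packet):
    """GRE header + MPLS label + private network IP packet (header and payload)."""
    return (gre_header(GRE_PROTO_MPLS_UNICAST)
            + mpls_label_entry(vpn_label)
            + private_ip_packet)
```

The result is what the public network IP header would carry as its payload: 4 bytes of GRE header, 4 bytes of MPLS label, then the private network packet.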

11.3 BGP/MPLS IP VPN
11.3.1 Introduction
Definition
A BGP/MPLS IP VPN is a Layer 3 virtual private network (L3VPN), which uses BGP to
advertise VPN routes and uses MPLS to forward VPN packets on the IP backbone networks
of service providers (SPs).
Figure 11-6 BGP/MPLS IP VPN
[Figure: sites of VPN 1 and VPN 2 each connect through a CE to a PE at the edge of the service provider's backbone. P devices form the core of the backbone.]

As shown in Figure 11-6, a BGP/MPLS IP VPN consists of the following roles:
l CE: an edge device on a customer network. A CE provides interfaces that are directly
connected to the SP network. A CE can be a router, a switch, or a host. Usually, a CE is
unaware of the VPN and does not need to support MPLS.
l PE: an edge device on an SP network. A PE is directly connected to a CE. On an MPLS
network, PEs process all VPN services. The requirements on the performance of PEs are
rather high.
l P: a backbone device on an SP network. A P does not directly connect to a CE. Ps only
need to possess basic MPLS forwarding capabilities and do not maintain VPN
information.
PEs and Ps are managed by SPs. CEs are managed by users, unless the users entrust the SPs
with the management rights.
A PE can connect to multiple CEs. A CE can connect to multiple PEs, no matter whether
these PEs belong to the same SP.
Purpose
MPLS seamlessly integrates the flexibility of IP routing and simplicity of ATM label
switching. A connection-oriented control plane is introduced into an MPLS IP network, which
enriches the means of managing and operating the network. On IP networks, MPLS TE has
become an important tool in managing network traffic, reducing network congestion, and
ensuring QoS.
The VPNs using MPLS IP networks as the backbone networks are highly valued by carriers,
and have become an important means of providing value-added services.
Unlike the IGP, BGP focuses on controlling route transmission and choosing optimal routes
instead of discovering and calculating routes. VPNs use public networks to transmit VPN
data, and the public networks use an IGP to discover and calculate their routes. The key to
constructing a VPN is to control the transmission of VPN routes and choose the optimal
routes between two PEs.
BGP uses TCP (port 179) as the transport layer protocol, which makes transmission
reliable. VPN routes can therefore be exchanged directly between two PEs even when other
routers are located between them.
BGP can append any information to a route as optional BGP attributes. The information is
transparently forwarded by BGP devices that cannot identify those attributes. Therefore, VPN
routes can be conveniently transmitted between PEs.
When routes are updated, BGP sends only updated routes rather than all routes. This
implementation saves the bandwidth consumed by route transmission, making the
transmission of a great number of routes over a public network possible.
As an Exterior Gateway Protocol (EGP), BGP is best suited for VPNs that cross the networks
of multiple carriers.
11.3.2 Principles

11.3.2.1 BGP/MPLS IP VPN
Definition
A BGP/MPLS IP VPN is a Layer 3 virtual private network (L3VPN), which uses BGP to
advertise VPN routes and uses MPLS to forward VPN packets on the IP backbone networks,
as shown in Figure 11-7. A BGP/MPLS IP VPN applies to scenarios where there is only one
carrier backbone network or the backbone networks of multiple carriers belong to the same
AS. A BGP/MPLS IP VPN has the following characteristics:
l Advertises VPN routes using extended BGP (MP-BGP).
l Encapsulates and transmits VPN packets over MPLS LSPs serving as public network
tunnels.
l Allows a device to play only one role at a time, either PE, P, or CE.
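Figure 11-7 BGP/MPLS IP VPN
[Figure: Site1 (VPN1) and Site2 (VPN2) connect through CEs to one PE. Site3 (VPN2) and Site4 (VPN1) connect through CEs to another PE. The PEs are connected through a P on the MPLS backbone and exchange routes using MP-BGP.]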
Related Concepts
l Site
The site concept is frequently mentioned in the VPN technology. The following
describes a site from different aspects:
– A site is a group of IP systems with IP connectivity that can be achieved
independent of service provider (SP) networks.
As shown on the left of Figure 11-8, the headquarters of company X in city A is one
site, and the branch of company X in city B is another site. IP devices within each
site can communicate without using the SP network.

Figure 11-8 Sites of a BGP/MPLS IP VPN
[Figure: left panel "Two sites": the headquarters of company X in city A (Site A) and the branch of company X in city B (Site B) each connect through a CE to the carrier's network. Right panel "One site": the headquarters in city A and the branch in city B together form Site X.]
– Sites are classified based on the topological relationships between devices rather
than the geographical locations of devices, although devices at a site are
geographically adjacent to each other in general. If two geographically separated IP
devices are connected over a leased line, the two devices form a site if they can
communicate without the help of SP networks.
As shown in Figure 11-8, if the branch network in city B connects to the
headquarters network in city A over a leased line instead of an SP network, the
branch network and the headquarters network form a site.
– The devices at a site may belong to multiple VPNs. In other words, a site may
belong to more than one VPN.
As shown in Figure 11-9, in company X, the decision-making department in city A
(Site A) is allowed to communicate with the R&D department in city B (Site B) and
the financial department in city C (Site C). Site B and Site C are not allowed to
communicate with each other. In this case, two VPNs (VPN1 and VPN2) can be
established with Site A and Site B belonging to VPN1 and Site A and Site C
belonging to VPN2. In this manner, Site A is configured to belong to multiple
VPNs.

Figure 11-9 One site belonging to multiple VPNs
[Figure: Site A (decision-making department of company X, city A), Site B (R&D department, city B), and Site C (financial department, city C) each connect through a CE to the carrier's network. VPN 1 covers Site A and Site B. VPN 2 covers Site A and Site C.]
– A site connects to an SP network using a CE. A site may contain more than one CE,
but a CE belongs to only one site.
It is recommended that you determine the devices to be used as CEs based on the
following principles:
If the site is a host, use the host as the CE.
If the site is a subnet, use switches as CEs.
If the site comprises multiple subnets, use routers as CEs.
Sites connected to the same SP network can be classified into different sets based
on configured policies. Only sites that belong to the same set can access each other,
and this set is a VPN.
l Address space overlapping
As a private network, a VPN independently manages an address space. Address spaces
of different VPNs may overlap. For example, if both VPN1 and VPN2 use addresses on
the network segment 10.110.10.0/24, address space overlapping occurs.
NOTE
VPNs can use overlapped address spaces in the following situations:

l Two VPNs do not cover the same site.
l Two VPNs cover the same site, but devices at the site and devices using addresses in
overlapped address spaces in the VPNs cannot access each other.
l VPN instance
CEs are user-side devices and need to send only local VPN routes to PEs, irrespective of
whether the PEs are connected to the public network or other VPNs. PEs are network-
side devices, and a PE is generally connected to multiple CEs from different VPNs. A
PE may receive routes from different VPNs. Because address spaces used by different
VPNs may overlap, routes sent from different VPNs may carry the same destination
address. If a PE maintains only one routing and forwarding table, this table will accept
only one of the routes from different VPNs but with the same destination address. To
prevent this problem, the VPN technology uses VPN instances.

A VPN instance is also called a VPN routing and forwarding (VRF) table. A PE
maintains multiple routing and forwarding tables, including a public routing and
forwarding table and one or more VRFs. A PE has multiple instances, including a public
network instance and one or more VPN instances, as shown in Figure 11-10. Each VPN
instance maintains routes from the corresponding VPN. The public network instance
maintains public network routes. This enables a PE to keep all routes from VPNs,
irrespective of whether their address spaces overlap.
Figure 11-10 VPN instances on a BGP/MPLS IP VPN
[Figure: Site1 of VPN1 and Site2 of VPN2 each connect through a CE to the same PE on the backbone. The PE holds a VPN1 VPN-instance, a VPN2 VPN-instance, and the public forwarding table.]
The differences between a public routing and forwarding table and a VRF are as follows:
– A public routing table contains the IPv4 routes of all PEs and Ps. These IPv4 routes
are static routes configured on the backbone network or are generated by routing
protocols configured on the backbone network.
– A VPN routing table contains the routes of all sites that belong to the corresponding
VPN instance. The routes are obtained through exchange of VPN routes between
PEs or between CEs and PEs.
– According to route management policies, a public forwarding table contains the
minimum forwarding information extracted from the corresponding routing table,
whereas a VPN forwarding table contains the minimum forwarding information
extracted from the corresponding VPN routing table.
VPN instances on a PE are independent of each other and of the public routing and
forwarding table.
Each VPN instance can be regarded as a virtual router, which maintains an
independent address space and has one or more interfaces connected to the router.
In RFC 4364 (BGP/MPLS IP VPNs), a VPN instance is called a per-site forwarding
table. As the name suggests, one VPN instance corresponds to one site. To be more
accurate, every connection between a CE and a PE corresponds to a VPN instance,
but this is not a one-to-one mapping. The VPN instance is manually bound to the
PE interface that directly connects to the CE.
A VPN instance uses a route distinguisher (RD) to identify an independent address
space and uses VPN targets to manage VPN memberships and routing principles of
directly connected sites and remote sites.
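Why per-VPN tables are needed can be shown with a small Python sketch. The PE class, its methods, and the interface and instance names in the usage note are hypothetical. Routing tables are modeled as plain dictionaries keyed by prefix.

```python
class PE:
    """A PE that keeps one routing table per VPN instance plus a public table."""

    def __init__(self):
        self.public_table = {}       # public network routes
        self.vrfs = {}               # VPN instance name -> routing table
        self.interface_binding = {}  # PE interface -> VPN instance name

    def bind(self, interface, vpn_instance):
        # The VPN instance is manually bound to the PE interface facing the CE.
        self.vrfs.setdefault(vpn_instance, {})
        self.interface_binding[interface] = vpn_instance

    def learn_from_ce(self, interface, prefix, next_hop):
        # A route learned from a CE goes into the table of the VPN instance
        # bound to the receiving interface, never into the public table.
        vrf = self.vrfs[self.interface_binding[interface]]
        vrf[prefix] = next_hop
```

If two CEs on interfaces bound to "vpn1" and "vpn2" both advertise 10.110.10.0/24, each VPN instance keeps its own entry. With a single shared dictionary, the second route would overwrite the first.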
l Relationships between VPNs, sites, and VPN instances
The relationships between VPNs, sites, and VPN instances are as follows:

– A VPN consists of multiple sites. A site may belong to multiple VPNs.

– A site is associated with a VPN instance on a PE. A VPN instance integrates the
VPN member relationships and routing principles of its associated sites. Multiple
sites form a VPN based on VPN instance rules.
l RD and VPNv4 address
Traditional BGP cannot process the routes that have overlapping address spaces. Assume
that VPN1 and VPN2 use addresses on the network segment 10.110.10.0/24, and each of
them advertises a route destined for this network segment. The local PE identifies the
two VPN routes based on VPN instances and sends them to the remote PE. Because
routes from different VPNs cannot work in load-balancing mode, the remote PE adds
only one of the two routes to its VRF based on BGP route selection rules.
This is because BGP cannot distinguish VPN routes with the same IP address prefix. To
solve this problem, BGP/MPLS IP VPN uses the VPNv4 address family.
A VPNv4 address consists of 12 bytes. The first eight bytes represent the RD and the last
four bytes the IPv4 address prefix, as shown in Figure 11-11.
Figure 11-11 VPNv4 address structure
RDs are used to distinguish address spaces with the same IPv4 address prefix. The
format of RDs enables SPs to allocate RDs independently. An RD, however, must be
unique on the entire network to ensure correct routing if CEs are dual-homed to PEs.
IPv4 addresses with RDs are called VPNv4 addresses. After receiving IPv4 routes from a
CE, a PE converts the routes to globally unique VPNv4 routes and advertises the routes
on the public network.
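The 12-byte structure can be illustrated in Python. The sketch assumes the type 0 RD encoding of RFC 4364 (2-byte type, 2-byte AS number, 4-byte assigned number). The function names and the RD values in the usage note are examples only.

```python
import ipaddress
import struct


def rd_type0(asn, assigned_number):
    # Type 0 RD: 2-byte type field (0), 2-byte AS number, 4-byte assigned number.
    return struct.pack("!HHI", 0, asn, assigned_number)


def vpnv4_address(rd, ipv4_prefix):
    """Concatenate an 8-byte RD and a 4-byte IPv4 address prefix."""
    network = ipaddress.ip_network(ipv4_prefix)
    return rd + network.network_address.packed
```

With RDs 100:1 for VPN1 and 100:2 for VPN2, the prefix 10.110.10.0/24 yields two different 12-byte VPNv4 addresses whose last four bytes are identical, so BGP can carry both routes.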
l VPN target
The VPN target, also called the route target (RT), is a BGP extended community
attribute (each extended community is 8 bytes long). BGP/MPLS IP VPN uses the VPN
target to control the advertising of VPN routing information.
A VPN instance is associated with one or more VPN targets. VPN targets are classified
into the following types:
– Export target: After learning an IPv4 route from a directly connected site, a PE
converts the route to a VPNv4 route and sets export targets for the route. As an
extended community attribute, export targets are advertised with the route.
– Import target: After receiving a VPNv4 route from one PE, a second PE checks the
export targets of the route. If one of the export targets is identical with an import
target of a VPN instance on the PE, the PE adds the route to the corresponding
VRF.
A VPN target defines which sites can receive a VPN route and which VPN routes of
which sites can be received by a PE.
After receiving a route from a directly connected CE, a PE sets export targets for the
route. The PE then uses BGP to advertise the route with the export targets to related PEs.
After receiving the route, the related PEs compare the export targets with the import
targets of all their VPN instances. If an export target is identical with an import target,
the route is added to the corresponding VRF.
The reasons for using the VPN target instead of the RD as the extended community
attribute are as follows:
– A VPNv4 route has only one RD, but can be associated with multiple VPN targets.
With multiple extended community attributes, BGP can greatly improve the
flexibility and expansibility of a network.
– VPN targets can be used to control route advertisement between different VPNs on
a PE. With properly configured VPN targets, different VPN instances on a PE can
import routes from each other.
On a PE, different VPNs have different RDs, but the extended community attributes
allowed by BGP are limited. Using RDs for route importing limits network expansibility.
On a BGP/MPLS IP VPN, VPN targets can be used to control exchange of VPN routes
between sites. Export targets and import targets are independent of each other and can be
configured with multiple values, ensuring flexible VPN access control and diversified
VPN networking modes.
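The matching of export targets against import targets can be sketched in Python using the company X example of Figure 11-9, where Site A must reach Site B and Site C, but Site B and Site C must not reach each other. The class name, the function name, and the target values 100:1 and 100:2 are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnInstance:
    export_targets: frozenset
    import_targets: frozenset


def accepts(receiver, sender):
    """True if a route exported by `sender` is imported by `receiver`."""
    # One common value between the export and import targets is enough.
    return bool(sender.export_targets & receiver.import_targets)


# VPN1 uses target 100:1 and VPN2 uses target 100:2 (example values).
site_a = VpnInstance(frozenset({"100:1", "100:2"}), frozenset({"100:1", "100:2"}))
site_b = VpnInstance(frozenset({"100:1"}), frozenset({"100:1"}))
site_c = VpnInstance(frozenset({"100:2"}), frozenset({"100:2"}))
```

Because export and import targets are independent sets, the VPN instance of Site A can belong to both VPNs while Site B and Site C remain isolated from each other.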
l MP-BGP
Traditional BGP-4 defined in RFC 1771 can manage IPv4 routes but not the routes of
VPNs with overlapped address spaces.
To correctly process VPN routes, VPNs use MP-BGP defined in RFC 2858
(Multiprotocol Extensions for BGP-4). MP-BGP supports multiple network layer
protocols. Network layer protocol information is contained in the Network Layer
Reachability Information (NLRI) field and the Next Hop field of an MP-BGP Update
message.
MP-BGP uses the address family to differentiate network layer protocols. An address
family can be a traditional IPv4 address family or any other address family, such as a
VPNv4 address family or an IPv6 address family. For the values of address families, see
RFC 1700 (Assigned Numbers).
Route Advertisement on a BGP/MPLS IP VPN

On a BGP/MPLS IP VPN, CEs and PEs are responsible for advertising VPN routes, whereas
Ps only need to maintain the backbone network routes. Ps do not need to maintain VPN
routes, whereas PEs generally maintain all VPN routes on the network. Advertisement of
VPN routes consists of three phases: from local CEs to the ingress PE, from the ingress PE to
the egress PE, and from the egress PE to remote CEs. After this process, reachable routes can
be established between local and remote CEs and VPN routes can be advertised on the
backbone network. The following describes the three phases in detail.
1. Advertisement from local CEs to the ingress PE
After neighbor or peer relationships are established between CEs and their directly
connected PE, the CEs advertise local VPN routes to the PE. CEs can communicate with
the PE over static routes or routes established using Routing Information Protocol (RIP),
Open Shortest Path First (OSPF), Intermediate System-to-Intermediate System (IS-IS),
or BGP. Regardless of which routing protocol is used, routes advertised by CEs to the PE
are standard IPv4 routes.
VPN instances on a PE are isolated from each other and independent of the public
routing and forwarding table, to prevent problems caused by address space
overlapping. After learning routes from CEs, a PE decides to which routing and
forwarding table the routes need to be added based on configurations.

2. Advertisement from the ingress PE to the egress PE

Advertisement from the ingress PE to the egress PE consists of the following parts:
– After learning VPN routes from a CE, a PE stores these routes in corresponding
VRFs and adds RDs to these standard IPv4 routes, generating VPNv4 routes.
– The ingress PE advertises VPNv4 routes to the egress PE by sending MP-BGP
Update messages. The MP-BGP Update messages also contain VPN targets and
MPLS labels.
Before being sent to the next-hop PE, these VPNv4 routes are filtered by BGP routing
policies, including the VRF export policy and peer export policy.
After these routes arrive at the egress PE, if they pass the peer import policy and their
next hops are reachable or they can be iterated, the egress PE performs local route
crossing and filters these routes based on a VRF import policy. The egress PE then
decides which routes are to be added to its VRFs. Routes received from other PEs are
added to the VPN routing table based on VPN targets. The egress PE stores the
following information for subsequent packet forwarding:
– Values of MPLS labels contained in MP-BGP Update messages
– Tunnel IDs generated after tunnel iteration
3. Advertisement from the egress PE to remote CEs
A remote CE can learn VPN routes from an egress PE over static routes or routes
established using RIP, OSPF, IS-IS, and BGP. Route advertisement from the egress PE to
a remote CE is similar to that from a local CE to the ingress PE. The details are not
described here. Note that routes advertised by the egress PE to a remote CE are standard
IPv4 routes.
After a PE receives routes of different VPNs from a local CE, if the next hops of these routes
are reachable or these routes can be iterated, the PE matches the export targets of these routes
with its VRF import targets. This process is called local route crossing. During local route
crossing, the PE filters these routes based on a VRF import policy and modifies the attributes
of eligible routes.
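The route advertisement phases above can be sketched as follows. This is a minimal model, assuming hypothetical class and function names (`Vrf`, `Vpnv4Route`, `to_vpnv4`, `import_route`) and made-up RD, VPN target, and label values. It shows only two ideas from the text: adding an RD keeps overlapping IPv4 prefixes distinct, and a route enters a VRF only when one of its export targets matches one of the VRF import targets. Routing policies and next-hop reachability checks are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vpnv4Route:
    rd: str                      # route distinguisher of the originating VRF
    prefix: str                  # standard IPv4 prefix learned from the CE
    export_targets: frozenset    # VPN targets attached by the ingress PE
    label: int                   # inner (VPN) label assigned by the ingress PE
    next_hop: str                # address of the ingress PE


@dataclass
class Vrf:
    name: str
    rd: str
    export_targets: frozenset
    import_targets: frozenset
    routes: dict = field(default_factory=dict)   # prefix -> Vpnv4Route


def to_vpnv4(vrf, prefix, label, pe_address):
    """Ingress PE: add the RD and export targets to a standard IPv4 route."""
    return Vpnv4Route(vrf.rd, prefix, vrf.export_targets, label, pe_address)


def import_route(vrf, route):
    """Egress PE: accept the route only if a VPN target matches."""
    if route.export_targets & vrf.import_targets:
        vrf.routes[route.prefix] = route
        return True
    return False
```

With two VRFs that both learn 10.1.0.0/16 from their CEs, the resulting VPNv4 routes differ in their RDs, and a remote VRF that imports only the first VPN target accepts only the first route.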
Packet Forwarding on a BGP/MPLS IP VPN

On a BGP/MPLS IP VPN backbone network, Ps cannot recognize VPN routing information,
so VPN packets are forwarded between PEs over tunnels. Figure 11-12 shows an example of
packet forwarding on a BGP/MPLS IP VPN. A packet is transmitted from CE1 to CE2. I-L
indicates an inner label, and O-L indicates an outer label. The outer label directs the packet to
the BGP next hop, and the inner label identifies the outbound interface for the packet or the
VPN to which the packet belongs.
Figure 11-12 Forwarding of a VPN packet from CE1 to CE2

(Figure content: CE1 sends data to the ingress PE; the ingress PE pushes I-L and O-L1; the P
swaps the outer label O-L1 for O-L2; the egress PE pops the labels and sends data to CE2.)

The forwarding process is as follows:

1. CE1 sends a VPN packet to the ingress PE.
2. After receiving the packet from an interface bound to a VPN instance, the ingress PE
performs the following steps:
– Searches the corresponding VPN forwarding table based on the RD of the bound
VPN instance.
– Matches the destination IPv4 address with forwarding entries and searches for the
corresponding tunnel ID.
– Adds an I-L to the packet and finds the tunnel to be used based on the tunnel ID.
– Adds an outer label to the packet and sends the packet over the tunnel. In this
example, the tunnel is an LSP, and the outer label is an MPLS label.
– Transmits the packet, which now carries two labels, over the backbone network.
Each P on the forwarding path swaps the outer label of the packet.
3. After receiving the packet, the egress PE removes the outer label of the packet.
NOTE
In this example, the final outer label of the packet is O-L2. If penultimate hop popping (PHP) is
configured, O-L2 is removed on the penultimate hop, and the egress PE receives a packet with the
inner label only.
4. The egress PE removes the inner label residing at the bottom of the label stack.
5. The egress PE sends the packet from the corresponding outbound interface to CE2. After
its labels are removed, the packet becomes a pure IP packet.
In this manner, the packet is sent from CE1 to CE2. CE2 forwards the packet to the
destination in the way it sends other IP packets.
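The forwarding process above can be sketched as a small label-stack simulation. This is an illustration only: the function names, the dictionary-based frame, and the table contents are assumptions, and the label names follow Figure 11-12. The egress function also accepts a frame whose outer label was already removed by PHP.

```python
def ingress_pe(packet, inner_label, outer_label):
    """Push the inner (VPN) label, then the outer (tunnel) label on top."""
    return {"payload": packet, "labels": [outer_label, inner_label]}


def p_swap(frame, swap_table):
    """A P swaps only the outer label and never inspects the inner label."""
    outer, *rest = frame["labels"]
    return {"payload": frame["payload"], "labels": [swap_table[outer]] + rest}


def egress_pe(frame, inner_label_table):
    """Remove the outer label (if still present), then the inner label,
    and return the outbound interface together with the plain IP packet."""
    labels = list(frame["labels"])
    if len(labels) == 2:
        labels.pop(0)            # outer label; absent when PHP is used
    inner = labels.pop(0)
    return inner_label_table[inner], frame["payload"]
```

A packet pushed with I-L and O-L1 leaves the P with O-L2 on top, and the egress PE maps I-L to the interface toward CE2 in both the PHP and non-PHP cases.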
Benefits
BGP/MPLS IP VPN offers the following benefits:
l Enables users to communicate with each other over networks of geographically different
regions.
l Ensures the security of VPN data during transmission on the public network.
11.3.2.2 HVPN
Background
Currently, hierarchical architectures are used in most networking designs. For example,
metropolitan area networks (MANs) typically use a three-layer architecture consisting of an
access layer, an aggregation layer, and a core layer. On the network shown in Figure 11-13,
all PEs reside on the same plane and must provide the following functions:
l Provide access services for users. This function requires each PE to provide a large
number of interfaces.
l Manage and advertise VPN routes and process user packets. This function requires
each PE to have a high-capacity memory and strong forwarding capabilities.

Figure 11-13 Basic architecture of a BGP/MPLS IP VPN
(Figure content: sites of VPN 1 and VPN 2 connect through CEs to PEs at the edge of the
service provider's backbone; Ps reside inside the backbone.)
To deploy VPN functions in hierarchical architectures, a BGP/MPLS IP VPN must use a
hierarchical model instead of a flat model. The HVPN was developed for this purpose.
Related Concepts
Figure 11-14 shows a basic HVPN architecture consisting of mainly user-end PEs (UPEs),
superstratum PEs (SPEs), and network PEs (NPEs):
l UPE: directly connected to CEs and provides access services for users.
l SPE: connected to UPEs and located on the core of a network. SPEs manage and
advertise VPN routes.
l NPE: connected to SPEs and located on the network side.
A UPE and an SPE are connected by only one link and exchange packets based on labels. An
SPE does not need to provide a large number of interfaces for access users. UPEs and SPEs
can be connected by physical interfaces with physical links, by sub-interfaces with virtual
local area networks (VLANs) or permanent virtual circuits (PVCs), or by tunnel interfaces
with label switched paths (LSPs). If an IP or MPLS network resides between a UPE and an
SPE, the UPE and SPE can be connected by tunnel interfaces to exchange labeled packets
over a tunnel.
The capabilities of SPEs and UPEs differ according to the roles they play on a network. SPEs
require large-capacity routing tables and high forwarding performance, but few interface
resources. UPEs, on the other hand, require only low-capacity routing tables and low
forwarding performance, but high access capabilities.
NOTE
The roles of UPEs and SPEs are relative. On an HVPN, a superstratum PE is the SPE of an understratum
PE, and an understratum PE is the UPE of a superstratum PE.
An HoPE is compatible with common PEs on an MPLS network.
If a UPE and an SPE belong to the same autonomous system (AS), they use the
Multiprotocol Extensions for Interior Border Gateway Protocol (MP-IBGP). If they belong to
different ASs, they use the Multiprotocol Extensions for Exterior Border Gateway Protocol
(MP-EBGP).
If MP-IBGP is used, an SPE can function as the route reflector (RR) for multiple UPEs to
advertise routes between IBGP peers. To reduce the number of routes on UPEs, ensure that an
SPE that is already acting as the RR for UPEs is not used as the RR for other PEs.
Figure 11-14 HVPN architecture
(Figure content: HVPN topology with CEs at VPN1 site1, VPN1 site2, VPN2 site1, and VPN2
site2; user-end PEs UPE1 and UPE2; SPE1 and SPE2; NPE1 and NPE2.)
HVPN can be classified into HoVPN and H-VPN.
Table 11-4 Comparison between HoVPN and H-VPN
HVPN mode: HoVPN
Characteristics: An SPE advertises only default or aggregated routes to UPEs.
l An export policy must be configured on an SPE so that the SPE only advertises specific
routes, such as the default routes, to UPEs.
l VPN instances must be configured on an SPE for the SPE to import default routes locally
or aggregate routes received from remote SPEs or NPEs, so that the SPE advertises only
default routes or aggregated routes to UPEs.
Advantages: Compared with an H-VPN, an HoVPN allows an SPE to advertise only default or
aggregated routes to UPEs, and therefore devices with low route management capabilities
can be used as UPEs on an HoVPN. This reduces network deployment costs.

HVPN mode: H-VPN
Characteristics: An SPE advertises all VPN routes to UPEs.
l VPN instances do not need to be configured on SPEs.
l MP-BGP peer relationships must be configured between SPEs and NPEs and between SPEs
and UPEs. The NPEs and UPEs must be configured as the clients of SPEs that function as
RRs and be configured to set the next hops of routes they receive as themselves.
Advantages: Compared with an HoVPN, an H-VPN allows a UPE to receive specific routes from
an SPE, which facilitates route management and traffic forwarding control.
The following describes the route exchanging and packet forwarding processes on an HoVPN
and an H-VPN. In the following figures, N indicates a next hop, and L indicates a label.
Route Advertisement from CE1 to CE2 on an HoVPN or H-VPN

Figure 11-15 shows route advertisement from CE1 to CE2 on an HoVPN or H-VPN.
1. CE1 advertises IPv4 routes to the UPE using the IP protocol.
2. The UPE applies for label L1 for the received IPv4 routes and converts these routes to
VPNv4 routes. Then, the UPE sets itself as the next hops of these routes and advertises
them to the SPE.
3. After receiving the VPNv4 routes, the SPE saves label L1 locally and applies for label
L2 for these VPNv4 routes. Then, the SPE sets itself as the next hops of these routes and
advertises them to the NPE.
4. After receiving the VPNv4 routes, the NPE converts these routes to IPv4 routes and
imports routes with reachable next hops to its VPN IPv4 routing table. The NPE retains
label L2 and iteration tunnel ID information of these routes for later packet forwarding.
5. The NPE advertises the IPv4 routes to CE2 using the IP protocol.
Figure 11-15 Route advertisement from CE1 to CE2 on an HoVPN or H-VPN

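(Figure content: CE1 in VPN1 site1 advertises an IPv4 route to the UPE; the UPE advertises a
VPNv4 route (L=L1, N=UPE) to the SPE; the SPE advertises a VPNv4 route (L=L2, N=SPE) to
the NPE; the NPE advertises an IPv4 route to CE2 in VPN1 site2.)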
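The label handling in the steps above can be sketched as follows. This is a simplified model, assuming a hypothetical helper named `readvertise` and dictionary-based routes. It shows how each PE that sets itself as the next hop assigns a new label and remembers the label it received, which is the information later needed to forward packets in the reverse direction.

```python
def readvertise(route, pe_name, new_label, label_map):
    """Set this PE as the next hop, assign a new label, and remember the
    received label and next hop so that packets can be relabeled later."""
    label_map[new_label] = (route["label"], route["next_hop"])
    return {"prefix": route["prefix"], "label": new_label, "next_hop": pe_name}


# Step 2: the UPE converts the IPv4 route from CE1 into a VPNv4 route.
upe_route = {"prefix": "10.1.1.0/24", "label": "L1", "next_hop": "UPE"}

# Step 3: the SPE saves L1 locally, assigns L2, and advertises to the NPE.
spe_label_map = {}
spe_route = readvertise(upe_route, "SPE", "L2", spe_label_map)
```

After step 3, the NPE sees the route with label L2 and next hop SPE, while the SPE holds the mapping from L2 back to L1 and the UPE.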

Route Advertisement from CE2 to CE1 on an HoVPN

Figure 11-16 shows route advertisement from CE2 to CE1 on an HoVPN.
1. CE2 advertises IPv4 routes to the NPE using the IP protocol.
2. The NPE applies for label L3 for the received IPv4 routes and converts these routes to
VPNv4 routes. Then, the NPE sets itself as the next hops of these routes and advertises
them to the SPE.
3. After receiving the VPNv4 routes, the SPE saves label L3 locally and converts these
routes to IPv4 routes and imports routes with reachable next hops to its VPN IPv4
routing table.
4. The SPE imports a default route to its VPN IPv4 routing table or generates an aggregated
VPN route based on the received IPv4 routes in its VPN IPv4 routing table and applies
for label L4 for the default route or aggregated VPN route. Then, the SPE converts the
default route or aggregated VPN route to a VPNv4 route, sets itself as the next hop of the
VPNv4 route, and advertises the route to the UPE.
5. After receiving the VPNv4 route, the UPE converts the route to an IPv4 route and
imports the route to its VPN IPv4 routing table if the next hop of the route is reachable.
6. The UPE advertises the IPv4 route to CE1 using the IP protocol.
Figure 11-16 Route advertisement from CE2 to CE1 on an HoVPN

(Figure content: the NPE advertises a VPNv4 route (L=L3, N=NPE) to the SPE; the SPE advertises
a VPNv4 route for the default route or aggregated route (L=L4, N=SPE) to the UPE; CE1 is in
VPN1 site1 and CE2 is in VPN1 site2.)
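The effect of step 4 on the UPE routing table can be sketched with the Python standard `ipaddress` module. The function names and the `mode` parameter are assumptions made for this illustration. Note that `ipaddress.collapse_addresses` merges only contiguous or overlapping networks, so it stands in for route aggregation only in simple cases.

```python
import ipaddress

DEFAULT_ROUTE = ipaddress.ip_network("0.0.0.0/0")


def routes_advertised_to_upe(specific_routes, mode):
    """Return what an HoVPN SPE would advertise to a UPE in this sketch:
    either a single default route or the collapsed (aggregated) routes."""
    nets = [ipaddress.ip_network(r) for r in specific_routes]
    if mode == "default":
        return [DEFAULT_ROUTE]
    if mode == "aggregate":
        return list(ipaddress.collapse_addresses(nets))
    raise ValueError("mode must be 'default' or 'aggregate'")


def upe_has_route(advertised, destination):
    """Check whether the UPE can match a destination against what it received."""
    addr = ipaddress.ip_address(destination)
    return any(addr in net for net in advertised)
```

In both modes the UPE holds fewer entries than the SPE while still matching every destination behind the NPE, which is why devices with low route management capabilities can serve as UPEs.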
Route Advertisement from CE2 to CE1 on an H-VPN

Figure 11-17 shows route advertisement from CE2 to CE1 on an H-VPN.
1. CE2 advertises IPv4 routes to the NPE using the IP protocol.
2. The NPE applies for label L3 for the received IPv4 routes and converts these routes to
VPNv4 routes. Then, the NPE sets itself as the next hops of these routes and advertises
them to the SPE.
3. After receiving the VPNv4 routes, the SPE saves label L3 locally and applies for label
L4 for these VPNv4 routes. Then, the SPE sets itself as the next hops of these routes and
advertises them to the UPE.

4. After receiving the VPNv4 routes, the UPE converts these routes to IPv4 routes and
imports routes with reachable next hops to its VPN IPv4 routing table.
5. The UPE advertises the IPv4 routes to CE1 using the IP protocol.
Figure 11-17 Route advertisement from CE2 to CE1 on an H-VPN

(Figure content: the NPE advertises a VPNv4 route (L=L3, N=NPE) to the SPE; the SPE advertises
a VPNv4 route (L=L4, N=SPE) to the UPE; CE1 is in VPN1 site1 and CE2 is in VPN1 site2.)
Packet Forwarding from CE2 to CE1 on an HoVPN or H-VPN

Figure 11-18 shows packet forwarding from CE2 to CE1 on an HoVPN or H-VPN.
1. CE2 sends a VPN packet to the NPE.
2. After receiving the packet, the NPE searches its VPN forwarding table for a tunnel to
forward the packet based on the destination address of the packet. Then, the NPE adds an
inner label L2 and an outer label Lu to the packet and sends the packet to the SPE over
the found tunnel.
3. After receiving the packet, the SPE replaces the outer label Lu with Lv and the inner
label L2 with L1. Then, the SPE sends the packet to the UPE over the same tunnel.
4. After receiving the packet, the UPE removes the outer label Lv, searches for a VPN
instance corresponding to the packet based on the inner label L1, and removes the inner
label L1 after the VPN instance is found. Then, the UPE searches the VPN forwarding
table of this VPN instance for the outbound interface of the packet based on the
destination address of the packet and sends the packet through this outbound interface to
CE1. The packet sent by the UPE is a pure IP packet with no label.
Figure 11-18 Packet forwarding from CE2 to CE1 on an HoVPN or H-VPN
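(Figure content: CE2 in VPN1 site2 sends data to the NPE; the NPE sends the data with inner
label L2 and outer label Lu to the SPE; the SPE sends the data with inner label L1 and outer
label Lv to the UPE; the UPE sends data to CE1 in VPN1 site1.)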

Packet Forwarding from CE1 to CE2 on an HoVPN

Figure 11-19 shows packet forwarding from CE1 to CE2 on an HoVPN.
1. CE1 sends a VPN packet to the UPE.
2. After receiving the packet, the UPE searches its VPN forwarding table for a tunnel to
forward the packet based on the destination address of the packet (the UPE does so by
matching the destination address of the packet against the forwarding entry for the
default route or aggregated route). Then, the UPE adds an inner label L4 and an outer
label Lv to the packet and sends the packet to the SPE over the found tunnel.
3. After receiving the packet, the SPE removes the outer label Lv and searches for the VPN
instance corresponding to the packet based on the inner label L4. Then, the SPE removes
the inner label L4 and searches the VPN forwarding table of the found VPN instance for
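a tunnel to forward the packet based on the destination address of the packet. Finally, the
SPE adds an inner label L3 and an outer label Lu to the packet and sends the packet to
the NPE over the found tunnel.
4. After receiving the packet, the NPE removes the outer label Lu, searches for a VPN
instance corresponding to the packet based on the inner label L3, and removes the inner
label L3 after the VPN instance is found. Then, the NPE searches the VPN forwarding
table of this VPN instance for the outbound interface of the packet based on the
destination address of the packet and sends the packet through this outbound interface to
CE2. The packet sent by the NPE is a pure IP packet with no label.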
Figure 11-19 Packet forwarding from CE1 to CE2 on an HoVPN
(Figure content: data travels from CE1 in VPN1 site1 to CE2 in VPN1 site2.)
Packet Forwarding from CE1 to CE2 on an H-VPN

Figure 11-20 shows packet forwarding from CE1 to CE2 on an H-VPN.
1. CE1 sends a VPN packet to the UPE.
2. After receiving the packet, the UPE searches its VPN forwarding table for a tunnel to
forward the packet based on the destination address of the packet (the UPE does so by
matching the destination address of the packet against the forwarding entries for specific
routes received from the SPE). Then, the UPE adds an inner label L4 and an outer label
Lv to the packet and sends the packet to the SPE over the found tunnel.

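3. After receiving the packet, the SPE replaces the outer label Lv with Lu and the inner
label L4 with L3. Then, the SPE sends the packet to the NPE over the same tunnel.
4. After receiving the packet, the NPE removes the outer label Lu, searches for a VPN
instance corresponding to the packet based on the inner label L3, and removes the inner
label L3 after the VPN instance is found. Then, the NPE searches the VPN forwarding
table of this VPN instance for the outbound interface of the packet based on the
destination address of the packet and sends the packet through this outbound interface to
CE2. The packet sent by the NPE is a pure IP packet with no label.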
Figure 11-20 Packet forwarding from CE1 to CE2 on an H-VPN
(Figure content: data travels from CE1 in VPN1 site1 to CE2 in VPN1 site2.)
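The difference between the two SPE behaviors for packets from CE1 to CE2 can be sketched as follows. The function names, the dictionary-based frame, and the table layouts are assumptions made for this illustration; label names follow the steps above. On an H-VPN the SPE only swaps labels, whereas on an HoVPN the SPE removes both labels, looks up the destination in the VPN instance, and adds new labels.

```python
import ipaddress


def spe_forward_hvpn(frame, outer_swap, inner_swap):
    """H-VPN SPE: swap both labels; no VPN instance lookup is performed."""
    outer, inner = frame["labels"]
    return {"dst": frame["dst"], "labels": [outer_swap[outer], inner_swap[inner]]}


def spe_forward_hovpn(frame, label_to_vrf, vrf_fib):
    """HoVPN SPE: remove both labels, find the VPN instance from the inner
    label, do a longest-prefix match, and add the labels of the best route."""
    _outer, inner = frame["labels"]
    fib = vrf_fib[label_to_vrf[inner]]
    dst = ipaddress.ip_address(frame["dst"])
    matches = [
        (ipaddress.ip_network(prefix), labels)
        for prefix, labels in fib.items()
        if dst in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None              # no route in the VPN instance
    _net, (new_outer, new_inner) = max(matches, key=lambda m: m[0].prefixlen)
    return {"dst": frame["dst"], "labels": [new_outer, new_inner]}
```

Both functions turn a frame labeled (Lv, L4) into a frame labeled (Lu, L3) in this example, but only the HoVPN variant needs a VPN instance and a forwarding table on the SPE.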
Related Functions
H-VPN supports HoPE embedding.
l You can connect a new SPE to an existing SPE and configure the existing SPE to be the
UPE of the new SPE.
l You can connect new UPEs to an existing UPE and configure the existing UPE to be the
SPE of the new UPEs.
l HoPEs can be embedded repeatedly in the preceding two methods.
In theory, HoPE embedding allows a VPN to be expanded without limit.
Figure 11-21 shows a three-layer H-VPN, and the PEs in the middle are referred to as middle-
level PEs (MPEs). MP-BGP runs between the SPE and MPEs, and between the MPEs and
UPEs.
NOTE
The concept of MPE is introduced solely for descriptive purposes and does not actually exist in an H-
VPN model.
MP-BGP advertises all the VPN routes of UPEs to the SPE, but advertises only the default
routes of the VPN instances of the SPE to UPEs.
An SPE maintains the routes of all VPN sites connected to its understratum PEs, whereas a
UPE maintains only the routes of its directly connected VPN sites. The numbers of routes
maintained by an SPE, an MPE, and a UPE are in descending order.

Figure 11-21 HoPE embedding

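(Figure content: an SPE at the top layer; an MPE and a UPE at the second layer; two UPEs at the
third layer; four CEs at the bottom.)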
Benefits
HVPN networking provides the following benefits:
l Flexible expandability
If the performance of a UPE is insufficient, you can add an SPE for the UPE to access. If
the access capabilities of an SPE are insufficient, add more UPEs to the SPE.
l Reduced interface resource requirements
Since a UPE and an SPE exchange packets based on labels, they only need to be
connected over a single link.
l Reduced burdens on UPEs
A UPE needs to maintain only local VPN routes. The remote VPN routes are represented
by a default or aggregated route, lightening the burdens on UPEs.
l Simpler configuration
SPEs and UPEs use MP-BGP, a dynamic routing protocol, to exchange routes and
advertise labels. Each UPE only needs to establish a single MP-BGP peer relationship
with an SPE.
11.3.2.3 VPN FRR
Background
As networks develop rapidly, the end-to-end service convergence time after a fault occurs on
a carrier's network has become an indicator of bearer network performance. MPLS TE FRR
is one of the commonly used fast switching technologies. The
solution is to create an end-to-end TE tunnel between two PEs and a backup LSP that protects
a primary label switched path (LSP). When either of the PEs detects that the primary LSP is
unavailable because of a node or link failure, the PE switches the traffic to the backup LSP.
MPLS TE FRR, however, cannot implement fast switching if faults occur on the ingress or
egress. If a fault occurs on the ingress or egress, services can only be restored through end-to-
end route convergence and LSP convergence. The service convergence time is closely related
to the number of routes inside an MPLS VPN and the number of LSP hops on the bearer
network. The more VPN routes, the longer the service convergence time, and the more traffic
is lost.
VPN FRR preinstalls, on a remote PE, forwarding entries that point to the active PE and the
standby PE. In collaboration with fast PE fault detection, VPN FRR can reduce
end-to-end service convergence time if a fault occurs on an MPLS VPN where a CE is dual-
homed to two PEs. In VPN FRR, service convergence time depends only on the time required
to detect remote PE faults and change tunnel status. VPN FRR therefore makes the service
convergence time independent of the number of VPN routes on the bearer network.
Implementation
Figure 11-22 VPN FRR networking
(Figure content: CE1 in one VPN site connects to PE1; across the backbone, Link A leads through
PE2 and Link B leads through PE3 to CE2 in the other VPN site.)
As shown in Figure 11-22, normally, CE1 accesses CE2 over Link A. If PE2 is Down, CE1
accesses CE2 over Link B.
Based on the traditional BGP/MPLS IP VPN technology, both PE2 and PE3 advertise routes
destined for CE2 to PE1, and assign VPN labels to these routes. PE1 then selects a preferred
VPNv4 route based on the routing policy. In this example, the preferred route is the one
advertised by PE2. Only the routing information advertised by PE2, including the forwarding
prefix, inner label, and selected LSP, is filled in the forwarding entry of the forwarding
engine to guide packet forwarding.
When a fault occurs on PE2, PE1 detects the fault of PE2 (the BGP peer goes Down or the
MPLS LSP is unavailable), re-selects the route advertised by PE3, and updates the forwarding
entry to complete end-to-end convergence. Before PE1 re-delivers the forwarding entry for
the route advertised by PE3, CE1 cannot reach CE2 for a certain period, because PE2, an end
point of the LSP, is Down. As a result, end-to-end services are interrupted.
VPN FRR improves on this traditional reliability technology. With VPN FRR, PE1
selects the appropriate VPNv4 routes based on matching rules. For these routes, in
addition to information about the preferred route advertised by PE2, information about the
second-best route advertised by PE3 is also filled in the forwarding entry.
If a fault occurs on PE2, the MPLS LSP between PE1 and PE2 becomes unavailable. After
detecting the fault by means of techniques such as bidirectional forwarding detection (BFD),
PE1 marks the corresponding entry in the LSP status table as unavailable, and delivers the
setting to the forwarding table. After selecting a forwarding entry, the forwarding engine
examines the status of the LSP corresponding to the forwarding entry. If the LSP is
unavailable, the forwarding engine uses the second-best route carried in the forwarding entry
to forward packets. After being tagged with the inner labels assigned by PE3, packets are
transmitted to PE3 over the LSP between PE1 and PE3 and then forwarded to CE2. In this
manner, fast end-to-end service convergence is implemented and traffic from CE1 to CE2 is
restored.
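The forwarding-entry lookup described above can be sketched as follows. The class names, field names, LSP identifiers, and label values are assumptions made for this illustration. The point is that both next hops are already present in the forwarding entry, so a failure of PE2 changes only one entry in the LSP status table and requires no route reselection.

```python
from dataclasses import dataclass


@dataclass
class NextHop:
    pe: str            # remote PE that advertised the route
    inner_label: int   # VPN label assigned by that PE
    lsp: str           # LSP used to reach that PE


@dataclass
class FrrEntry:
    prefix: str
    primary: NextHop   # preferred route (advertised by PE2 in the example)
    backup: NextHop    # second-best route (advertised by PE3 in the example)


def select_next_hop(entry, lsp_status):
    """Use the primary next hop while its LSP is available; otherwise fall
    back to the second-best route carried in the same forwarding entry."""
    if lsp_status.get(entry.primary.lsp, False):
        return entry.primary
    return entry.backup


entry = FrrEntry(
    prefix="10.2.0.0/16",
    primary=NextHop("PE2", 2001, "lsp-pe1-pe2"),
    backup=NextHop("PE3", 3001, "lsp-pe1-pe3"),
)
lsp_status = {"lsp-pe1-pe2": True, "lsp-pe1-pe3": True}
```

Marking `lsp-pe1-pe2` as unavailable in `lsp_status` is enough for `select_next_hop` to return the PE3 next hop and its inner label, regardless of how many prefixes share the same pair of PEs.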
Other Functions
VPN FRR is a fast switching technique based on inner labels. The outer tunnels can be LDP
LSPs, RSVP-TE tunnels, or traditional tunnels used by L3VPN (such as GRE tunnels). When
the forwarding engine detects that the outer tunnel is unavailable during packet forwarding,
fast switching based on inner labels can be implemented.
Usage Scenario
On a VPN where a CE is dual-homed to two PEs, after a PE fails, VPN FRR ensures that the
VPN services from the CE to the PE can be rapidly switched to the standby PE for
transmission.
Benefits
On a VPN where a CE is dual-homed to two PEs, VPN FRR speeds up service convergence
and enhances network availability in the case of PE failures.
11.3.2.4 VPN GR
Graceful restart (GR) is one of the high availability (HA) technologies, which comprise a
set of techniques such as fault-tolerant redundancy, link protection, faulty node recovery, and
traffic engineering. As a fault-tolerant redundancy technology, GR ensures normal forwarding
of data when the routing protocol restarts to prevent interruption of key services. Currently,
GR has been widely applied to active/standby switchovers and system upgrades.
GR is usually used when the active route processor (RP) fails due to a software or hardware
error, or used when an administrator performs a master/slave main control board switchover.
Implementation Prerequisite
On a traditional routing device, a processor performs both control and forwarding. The
processor finds routes based on routing protocols and maintains the routing and forwarding
tables of a device. High- and medium-end devices generally use the multi-RP structure to
improve forwarding performance and reliability. The processor responsible for routing
protocols is mostly located on the main control board, whereas the processor responsible for
data forwarding is located on the interface board. This design helps to ensure the continuity of
packet forwarding on the interface board during the restart of the main processor. The
forwarding-control decoupling technology satisfies the prerequisite for GR implementation.

A GR-capable device must have two control boards, and its interface board must have an
independent processor and memory.
Related Concepts
GR involves the following concepts:
l GR restarter: a GR-capable router that performs a master/slave control board switchover
upon the occurrence of a failure or under the instructions of an administrator. A GR
restarter must support GR.
l GR helper: a neighbor of a GR restarter. A GR helper must support GR.
l GR session: a session over which a GR restarter and a GR helper can negotiate GR
capabilities.
l GR time: period during which a GR helper retains the topology information or routes
obtained from a GR restarter after detecting that the GR restarter is Down.
NOTE
Currently, the ATN can function only as a GR helper.
Overview
VPN GR is the application of the GR technology on a VPN. VPN GR ensures that VPN
traffic is not interrupted when a master/slave control board switchover is performed on a
device that transmits VPN services. VPN GR offers the following benefits:
l Reduces the impact of VPNv4 route or BGP label route flapping on the entire network
during a master/slave control board switchover.
l Decreases the packet loss rate of VPN services to almost 0%.
l Protects important VPN services.
l Improves VPN reliability by reducing PE or CE single-point failures.
To support VPN GR, a BGP/MPLS IP VPN must support IGP GR and BGP GR. When using
an MPLS LDP LSP as a tunnel, the BGP/MPLS IP VPN must support MPLS LDP GR. If
traffic engineering is used, the BGP/MPLS IP VPN must also support RSVP GR. After a
master/slave control board switchover is performed on a PE or CE, the PE or CE and its
connected PEs keep the forwarding information of all VPN routes for a certain period to
ensure that VPN traffic is not interrupted. CEs connecting to a PE on which a master/slave
control board switchover is performed also need to keep the forwarding information of all
VPN routes for a certain period.
On a common L3VPN, a master/slave control board switchover may be performed on any PE,
CE, or P.
Master/Slave Control Board Switchover on a PE

The master/slave control board switchover of a PE consists of three phases:
1. Before the switchover

The PE negotiates the IGP GR and MPLS LDP GR capabilities with a P, and negotiates
the IGP GR or BGP GR capabilities with the connected CE. The PE also negotiates BGP
GR capabilities with the peer PE and sends the Open message containing the GR
capability field of <AFI=Unicast,SAFI=VPNv4>.

2. During the switchover

The PE keeps the status of VPNv4 route forwarding, and the following procedures are
involved:
– MPLS LDP GR
If a neighbor detects that the corresponding TCP session enters the Down state, the
neighbor backs up all LSPs on the slave board and marks these LSPs as invalid.
– BGP GR
BGP session messages are lost during the switchover. As a result, the PE retains
forwarding information but no routing information. GR-aware BGP peers mark
all the routes related to the GR router as Stale. The BGP peers, however, still
forward packets based on these routes within the GR time.
3. After the switchover
The PE instructs all the IGP neighbors, BGP IPv4 peers, and private network IGP
neighbors between the PE and CE to reestablish connections. The following procedures
are involved:
– IGP convergence
To resynchronize the link state database (LSDB) of OSPF or IS-IS with the
neighboring P, the PE sends a signal to each neighboring P and reestablishes the
neighbor relationship list after receiving a response. If IS-IS or OSPF multi-
instances are run between the PE and CE, the PE also needs to resynchronize the
LSDB with the CE. The PE obtains the topology or routing information by
establishing sessions with all the neighbors. After obtaining the topology and
routing information, the PE recalculates the routing table and deletes the routes in
the Stale state to complete IGP convergence.
– BGP convergence
The PE also exchanges routing information with BGP peers, including public
network BGP peers, MP-BGP peers, and private network BGP peers. The PE then
updates the routing table and the forwarding table according to the new routing
information and replaces the invalid routing information to complete BGP
convergence.
– Label switched path manager (LSPM)
BGP may receive or send routes with labels to create BGP LSPs and apply for
labels. After receiving BGP LSP information, the LSPM module checks whether
the corresponding LSP exists. If a matching LSP exists, the LSPM module deletes
the invalid flag of this LSP.
After receiving the End-of-Rib message from a BGP peer on a public or private
network, the PE notifies the routing management (RM) module. The End-of-Rib
message is used to notify the peer that the first routing information update after a
BGP session is established has been completed.
Before all routing protocols complete GR, only FIB information on control boards
is updated.
After all routing protocols complete GR, the RM module sends a message to notify each
protocol and the LSPM module that GR is complete, and then updates the FIB
information on interface boards.
a. BGP sends BGP public network IPv4 routes, private network IPv4 routes, and
VPNv4 routes to each peer. After sending the routes, BGP sends End-of-Rib
messages.

b. After the LSPM module deletes all LSPs in the Stale state, VPN GR is complete.
The processing on devices connecting to a PE is as follows:
l After a CE connecting to this PE detects the restart of the PE, the CE uses the same
processing flow as that of the GR helper in common IGP GR or BGP GR and keeps
information about all IPv4 routes for a certain period.
l After a P connecting to this PE detects the restart of the PE, either of the following
situations occurs:
– If BGP is not configured, the P uses the same processing flow as that of the GR
helper in common IGP GR and MPLS LDP GR.
– If BGP is configured, the BGP processing flow is the same as that of the GR helper
in the common BGP GR except that the BGP processing flow includes additional
IGP GR processing and MPLS LDP GR processing, and the P then keeps
information about all the public IPv4 routes for a certain period.
l After detecting the restart of the PE, the RRs reflecting VPNv4 routes and the other PEs
(including ASBRs) connecting to this PE use the same processing flow as that of the GR
helper in BGP GR. They then keep information about all the public IPv4 routes and
VPNv4 routes for a certain period.
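The GR helper behavior described above can be sketched as follows. The class name, method names, and time units are assumptions made for this illustration, and the sketch tracks only route prefixes. It shows three ideas from the text: routes learned from the restarting device are marked Stale but remain usable for forwarding, routes that are advertised again are refreshed, and routes still in the Stale state are deleted when the End-of-Rib message arrives or the GR time expires.

```python
class GrHelper:
    """Minimal model of how a GR helper keeps routes of a restarting peer."""

    def __init__(self, gr_time):
        self.gr_time = gr_time
        self.routes = {}         # prefix -> True if the route is Stale
        self.deadline = None     # time at which Stale routes are deleted

    def learn(self, prefix):
        """A route received (or received again) from the peer is not Stale."""
        self.routes[prefix] = False

    def restarter_down(self, now):
        """Keep all routes, mark them Stale, and start the GR time."""
        for prefix in self.routes:
            self.routes[prefix] = True
        self.deadline = now + self.gr_time

    def _delete_stale(self):
        self.routes = {p: s for p, s in self.routes.items() if not s}
        self.deadline = None

    def end_of_rib(self):
        """The peer finished its first update; drop what was not refreshed."""
        self._delete_stale()

    def tick(self, now):
        """Drop Stale routes if the GR time has expired."""
        if self.deadline is not None and now >= self.deadline:
            self._delete_stale()

    def can_forward(self, prefix):
        return prefix in self.routes
```

Because the ATN can function only as a GR helper, this is the side of the procedure that applies to it: forwarding continues on Stale routes while the neighboring device completes its switchover.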
Master/Slave Control Board Switchover on a P

The processing flow of a P is the same as that of the GR restarter in common IGP GR, MPLS
LDP GR, or BGP GR.
After detecting the restart of a P, other Ps and PEs that connect to the P use the same
processing flow as that of the GR helper in common IGP GR or BGP GR. This means that
they keep information about all the public network IPv4 routes for a certain period.
Master/Slave Control Board Switchover on a CE

The processing flow of a CE is the same as that of the GR restarter in common IGP GR or
BGP GR.
After detecting the restart of a CE, the PEs that connect to the CE use the same processing
flow as that of the GR helper in common IGP GR or BGP GR. This means that they keep
information about all the private network IPv4 routes for a certain period.
11.3.2.5 VPN NSR

As networks develop rapidly, demand grows for triple play services that converge the Public
Switched Telephone Network (PSTN), TV network, and Internet, and operators place strict
reliability requirements on IP networks. Non-Stop Routing (NSR) is therefore introduced as
a High Availability (HA) solution.
NSR ensures that, when the control plane of a local device fails and the device provides a
slave control plane, the control plane of a neighbor does not detect the fault. In this
process, the neighbor relationships set up through specific routing protocols, MPLS, and
other protocols that carry services are not interrupted.
As an HA solution, NSR ensures that user services are not affected, or are only minimally
affected, by device failures.
During the master/slave switchover, VPN NSR ensures the continuous forwarding and
continuous advertisement of VPN routes. In this process, the neighbor relationships are not

Equipment
affected, with neighbors not knowing the switchover on the local device. This ensures
uninterrupted transmission of VPN services.
11.3.2.6 BGP SoO

When multiple CEs in a VPN site access different PEs, VPN routes sent from CEs to PEs may
return to this VPN site after traveling through the backbone network. This may cause routing
loops in the VPN site.
After the BGP Site-of-Origin (SoO) attribute is configured on a PE, the PE adds the SoO
attribute to the route sent from a CE and then advertises the route to other PE peers. Before
advertising the VPN route to the connected CE, the PE peers check the SoO attribute carried
in the VPN route. If the PE peers find that this SoO attribute is the same as the locally
configured SoO attribute, the PE peers do not advertise this VPN route to the connected CE.
As shown in Figure 11-23, CE1 and CE2 are in the same VPN site and can advertise routes to
each other. CE1 advertises the route to 10.1.1.1/8 in the VPN site to PE1 and PE1 advertises
the route to PE2 by using MP-IBGP. PE2 then advertises the route to CE2 by using BGP. As a
result, the route returns to the original VPN site from which the route is advertised, which
may cause a routing loop in the VPN site.
Figure 11-23 Networking diagram for the BGP SoO application

[Figure: CE1 and CE2 are in the same VPN site, which contains 10.1.1.1/8, and attach to PEs
on the VPN backbone.]
To avoid routing loops in a VPN site, you can configure an SoO attribute on PE1 for CE1. The
SoO attribute identifies the site where CE1 resides. The routes advertised by CE1 to PE1
then carry this SoO attribute, and PE1 advertises the routes with the SoO attribute to other
PEs across the backbone network. Before advertising the received routes to its peer CE2, PE2
checks whether the routes carry the SoO attribute specified for the site where CE2 resides.
If a route carries this SoO attribute, the route was advertised from the site where CE2
resides. PE2 then refuses to advertise such a route to CE2, thereby avoiding routing loops in
the site.
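The SoO check described above can be sketched as follows. This is a minimal illustration, not device code: the names VpnRoute, tag_route_from_ce, and may_advertise_to_ce are hypothetical, and the SoO values "100:1" and "100:2" are made-up examples.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class VpnRoute:
    prefix: str
    soo: Optional[str] = None  # Site-of-Origin value; the "AS:number" form here is illustrative


def tag_route_from_ce(route: VpnRoute, site_soo: str) -> VpnRoute:
    """Ingress PE: attach the SoO configured for the site of the advertising CE."""
    return replace(route, soo=site_soo)


def may_advertise_to_ce(route: VpnRoute, ce_site_soo: Optional[str]) -> bool:
    """Egress PE: suppress a route whose SoO equals the SoO configured for the CE's site."""
    return ce_site_soo is None or route.soo != ce_site_soo


# PE1 tags the route learned from CE1; PE2 is assumed to hold the same SoO for CE2.
route_at_pe2 = tag_route_from_ce(VpnRoute("10.1.1.1/8"), "100:1")
print(may_advertise_to_ce(route_at_pe2, "100:1"))  # CE2 is in the originating site
print(may_advertise_to_ce(route_at_pe2, "100:2"))  # a CE in a different site
```

The comparison is deliberately a plain equality test, because the text only requires PE2 to recognize the SoO configured for the site of its own CE.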
11.3.2.7 Querying Bearer Relationships Between Tunnels and VPNs
Querying the Bearer Tunnel of a Specified VPN

The NMS allows you to query detailed information about a VPN bearer tunnel by specifying
the VRF name, public network next hop, and tunnel ID.
Figure 11-24 Typical BGP/MPLS IP VPN networking

CE1 PE1 PE2 CE2

On the network shown in Figure 11-24, a public network tunnel and a BGP VPNv4 peer
relationship are established between PE1 and PE2. After PE2 receives VPNv4 routes
advertised by PE1, PE2 performs VPN route matching and iterates the matching route to a
public network tunnel based on next hop and tunnel policy information. If tunnel load
balancing is configured, the route is iterated to multiple public network tunnels. A bearer
relationship is then established between the public network tunnel or tunnels specified on
PE2 and a specific VRF.
The MIB queries a VPN's bearer tunnel based on the specified VRF name, public network
next hop, and tunnel ID, and returns the queried tunnel information to the NMS client through
SNMP packets. The queried tunnel information includes the destination address of the tunnel,
source address of the tunnel, tunnel type, outbound interface of the tunnel, load balancing
status of the tunnel, LSP index, outbound interface of the LSP, outgoing label of the LSP, next
hop of the LSP, LSP FEC, mask length of the LSP FEC, and LSP status (primary or backup).
Note that tunnels can be of different types, such as LocalIfNet, TE, GRE, and LSP. Only
tunnels of the LSP and TE types have LSP information, including the LSP index, outbound
interface of the LSP, outgoing label of the LSP, next hop of the LSP, LSP FEC, mask length of
the LSP FEC, and LSP status (primary or backup).
Querying VPNs Carried over a Specified Tunnel

The MIB queries the VPNs carried over a tunnel based on the specified tunnel ID and tunnel
interface name and returns the queried VPN information to the NMS client through SNMP
packets. The queried VPN information includes L3VPN, VPLS, and PWE3/VLL information:
l L3VPN information: the VPN name
l VPLS information: the VSI name, VSI ID, peer IP address, and VC ID
l PWE3/VLL information: the IFname, peer IP address, and VC ID
Querying the VPNs carried on a specified tunnel is vital for network O&M, service monitoring,
and fault locating.

Terms
Term | Description
CE | Customer edge equipment that is directly connected to the service provider. In an MPLS-based VPN, a CE can be a router, switch, or even a host.
address space | An address realm managed by a VPN.
GRE | An encapsulation mode in which packets of certain network protocols such as IP and IPX are encapsulated and therefore can be transmitted in networks supporting other protocols such as IP.
L2TP | A Layer 2 tunneling protocol that is drafted by IETF and involves the participation of companies such as Microsoft. The L2TP combines the advantages of both PPTP and L2F.
MP-BGP | A multi-protocol extension of BGP-4. MP-BGP supports multiple network layer protocols and identifies the protocols based on address families. MP-BGP transmits VPN composition information and VPN-IPv4 routes between PEs.
P | A backbone device that is located in the service provider network. A P is not directly connected to CEs. Ps only need to possess basic MPLS forwarding capabilities and do not maintain information about a VPN.
PE | A device that is located in the backbone network of the MPLS VPN structure. A PE is responsible for VPN user management, establishment of LSPs between PEs, and exchange of routing information between sites of the same VPN. During the process, a PE performs the mapping and forwarding of packets between the private network and the public channel. A PE can be a UPE, an SPE, or an NPE.
RD | A route distinguisher, which is an 8-byte field in a VPN IPv4 address. An RD and a 4-byte IPv4 address prefix construct a VPN IPv4 address, which is used to differentiate the IPv4 prefixes using the same address space.
site | A group of IP systems with IP connectivity, which can be achieved independent of SP networks.
tunnel | A channel on the packet switching network that transmits service traffic between PEs. In VPN, a tunnel is an information transmission channel between two entities. The tunnel ensures secure and transparent transmission of VPN information. In most cases, a tunnel is an MPLS tunnel.
tunnel iteration | A process in which a route is iterated to a tunnel.
tunnel ID | A group of information, including the token, slot number of an outgoing interface, tunnel type, and location method.
VPN | A recently-developed technology that implements the private network over a public network. It is a network that only logically exists.
VPN instance | An entity that is set up and maintained by PEs for directly-connected sites. Each site has its VPN instance on a PE. A VPN instance is also called the VPN Routing and Forwarding (VRF) table. A PE has multiple forwarding tables, including a public-network routing table and one or multiple VRFs.
VPN target | A BGP extended community attribute that is also called Route Target. In BGP/MPLS IP VPN, VPN-Target is used to control VPN routing information. The VPN-Target attribute defines which sites can receive a VPN IPv4 route and the routes from which sites can be received by a PE.

AS autonomous system
ASBR autonomous system boundary router
CE customer edge
HoPE Hierarchy of PE
HoVPN Hierarchy of VPN
IS-IS Intermediate System to Intermediate System
L2TP Layer 2 Tunneling Protocol
MP-BGP Multiprotocol Extensions for BGP
NAT Network Address Translation
P provider
PE provider edge
PHP penultimate hop popping
PVC permanent virtual circuit
QoS quality of service
QPPB QoS Policy Propagation Through the Border Gateway Protocol
RD route distinguisher
RR route reflector
VPN virtual private network
VRF VPN Routing and Forwarding
11.4 VLL
11.4.1 Introduction to the VLL
Definition
MPLS L2VPN
The Multiprotocol Label Switching Layer 2 Virtual Private Network (MPLS L2VPN)
transmits Layer 2 VPN services over an MPLS network. MPLS L2VPN enables operators to
provide L2VPN services over different media, such as Asynchronous Transfer Mode (ATM),
Frame Relay (FR), virtual local area network (VLAN), Ethernet, and Point-to-Point Protocol
(PPP), in a unified MPLS network.
Simply put, MPLS L2VPN transmits Layer 2 data transparently over an MPLS network. For
users, the MPLS network functions as a Layer 2 switched network through which Layer 2
connections can be set up between nodes. Layer 2 connections can be set up in virtual leased
line (VLL) mode or virtual private LAN service (VPLS) mode.
l VLL
The VLL is an emulation of the traditional leased line service. It emulates the leased line
over an IP network, and provides the asymmetrical digital data network (DDN) service at
low costs. For users at both ends of a VLL, the VLL is similar to the traditional leased
line. The VLL is a point-to-point virtual private wire technology that can support almost
all the link layer protocols. The VLL can be implemented in the following modes:
– Circuit Cross Connect (CCC): It is a mode of implementing the L2VPN through
static configuration.
– Static Virtual Circuit (SVC): It is a mode of implementing the MPLS L2VPN. The
SVC is similar to the Label Distribution Protocol (LDP) L2VPN, except that LDP is
not used as the signaling protocol for transmitting VC labels or link information.
Instead, VC labels are manually configured.
– Martini: It implements the MPLS L2VPN by using LDP as the signaling protocol
for transmitting the VC information.
– Pseudo-Wire Emulation Edge to Edge (PWE3): It is an extension of Martini mode
and a technology for end-to-end Layer 2 service transmission.
l VPLS
VPLS uses the packet switched network (PSN) to connect multiple Ethernet LAN segments so
that these segments work as one LAN. VPLS is also called transparent LAN service or virtual
private switched network service (VPSNS).
Different from the point-to-point service of the common L2VPN, VPLS enables the
service provider to offer Ethernet-based multipoint service to users through an MPLS
backbone network.

Purpose
l Extended network functions and service capabilities of operators
Operators can provide MPLS L2VPN services over a single network. In addition,
operators can use enhanced technologies related to MPLS, such as traffic engineering
(TE) and Quality of Service (QoS), to provide users with different classes of service
that meet their requirements.
l Higher scalability
In an ATM or FR network on which MPLS is not enabled, VCs provide the L2VPN service.
For each VC, the provider edge (PE) devices and provider (P) devices in the network
need to maintain the complete VC information. When PEs of the operators are connected
to multiple customer edge (CE) devices, multiple VCs are created, so PEs and P devices
must maintain information about multiple VCs. The MPLS L2VPN, however, can adopt
label stacking to multiplex multiple VCs in a label switched path (LSP). Therefore, P
devices only need to maintain information about one LSP. This improves scalability of
a system.
l Separation of administrative responsibilities
In the MPLS L2VPN, operators provide only Layer 2 connectivity while users are
responsible for Layer 3 connectivity such as routing. Therefore, route flapping caused by
incorrect configurations does not affect stability of operators' networks.
l Privacy of routing and security of user information
Users maintain their own routing information; therefore, operators do not need to be
concerned about address overlapping or IP address planning, and do not need to worry
that the routing information of a user is leaked to other users in private networks.
This reduces the management burden of operators and enhances security of user
information.
l Enhanced security and confidentiality
The MPLS L2VPN provides the same security and confidentiality as ATM and FR
networks. By having users maintain their own routing information, operators do not have
to worry about address overlapping or the risk of leaking the routing information of one
user to another user. The MPLS L2VPN reduces the management pressure of operators
and improves user information security.
l Support for multiple protocols
Operators provide only Layer 2 connections; therefore, users can use any Layer 3
protocol such as IPv4 and IPv6.
l Smooth network upgrade
The MPLS L2VPN is transparent to users; therefore, when operators upgrade networks
from traditional L2VPNs such as ATM and FR networks to MPLS L2VPNs, users do not
need to perform any configuration. The network upgrade does not affect user services
except for data loss in a short period during the switchover.
11.4.2 Principles

ATNs support the following VLL technologies:
l Circuit cross connect (CCC)
l Static virtual circuit (SVC)
l Martini
VLL supports the following link layer protocols:
l VLAN
l Ethernet
VLL supports the following types of interfaces:
l Ethernet interface
l Ethernet sub-interface
l GE interface
l GE sub-interface
l Eth-Trunk interface
l Eth-Trunk sub-interface
l MP interface
An AC interface in VLAN encapsulation mode can be an Ethernet interface or an Ethernet
sub-interface. However, the encapsulation mode of an Ethernet interface used as an AC
interface must be Ethernet, not VLAN.
In VLL networking, only one VC can be configured on each interface.
VLL Architecture
The VLL architecture comprises two ACs, one VC, and one tunnel, as shown in Figure
11-25.
Figure 11-25 VLL architecture
[Figure: Each CE connects to its PE over an AC; the two PEs are connected by a VC carried in
a tunnel across the MPLS network.]
Functional Modules
VLL involves the following functional modules:
l AC: an independent physical or virtual circuit connecting a CE and a PE. An AC
interface can be either a physical or a virtual interface. The AC attributes include the
encapsulation type, maximum transmission unit (MTU), and interface parameters of the
specified link type.
l VC: a virtual connection between two PEs.
l Tunnel: a virtual link used to transparently transmit service data.
11.4.2.2 CCC VLL

Circuit cross connect (CCC) implements L2VPN through manual configuration.
CCC must be configured by network administrators and is best suited for small MPLS
networks with simple topologies. The establishment of CCC virtual circuits (VCs) does not
require signaling negotiation or exchange of control packets. Compared with other types of
Layer 2 connections, CCC VCs consume fewer resources and are easy to configure.
Local CCC VCs

A local CCC VC refers to the connection between two local CEs, that is, two CEs connected
to the same PE. In this case, the PE functions like a Layer 2 switch.
As shown in Figure 11-26, Site1 and Site2 of VPN2 connect to PE3 and communicate with
each other over a local CCC VC (black dashed line). PE3 functions as a Layer 2 switch and
no LSP is required between Site1 and Site2. Data of different link encapsulation types, such
as VLAN and Ethernet, can be directly exchanged.
Figure 11-26 Topology of an MPLS L2VPN in local CCC mode
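[Figure: An ISP network contains PE1, PE2, PE3, and P devices. The CEs of Site1 and Site2
(VPN2) attach to PE3 and are joined by a CCC local connection. The figure also shows the CEs
of Site3 and Site4 (VPN1).]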
The most significant advantage of an MPLS L2VPN in local CCC mode is that an ISP
network can support this type of L2VPN so long as the ISP network supports MPLS
forwarding. Exchange of labels or signaling packets carrying L2VPN information is not
required during the establishment of this type of L2VPN. In addition, QoS can be guaranteed
for CCC VCs. This is because an LSP used by a CCC VC can no longer be used by other
types of Layer 2 connections.
11.4.2.3 Martini VLL
Definition
A Martini VLL uses LDP as the signaling protocol to transmit VC information. The Martini
mode complies with RFC 4906, which extends LDP by adding a new type of forwarding
equivalence class (FEC) for exchanging VC labels. A PE assigns a VC label to each
connection between two CEs. L2VPN information carrying VC label information is
forwarded to the remote PE over an LSP established using LDP. In this manner, a VC LSP is
set up over the ordinary LSP.
In Figure 11-27, Site1 and Site2 in each VPN (VPN1 and VPN2) are interconnected using an
LSP on the ISP network. Site1 and Site2 in VPN1 can also multiplex an LSP with Site1 and
Site2 in VPN2.

Figure 11-27 Topology of a Martini VLL

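[Figure: Site1 and Site2 of VPN1 and of VPN2 attach through CEs to PE1 and PE2, which are
interconnected across the ISP network through P devices. The legend distinguishes the VPN1
remote connection from the VPN2 remote connection.]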
A Martini VLL supports remote connections, but not local connections. The Martini mode
supports graceful restart (GR). After the ATN performs a switchover, the VC labels remain
unchanged. During the switchover, the packet forwarding on the VC remains unaffected.
Related Concepts
If PW redundancy is configured for a Martini VLL, you need to configure the following
parameters:
l VC type: indicates the encapsulation type of a VC, such as ATM, VLAN, and PPP.
l VC ID: identifies a VC. The IDs of VCs of the same type must be unique on a PE,
unless these VCs belong to the same MS-PW.
l Peer IP address: indicates the IP address of the remote PE for a VC. The peer IP address
uniquely identifies a VC. The loopback interface IP address of the remote PE is usually
used as the peer IP address.
The PEs that are connected to two CEs exchange VC labels using LDP and bind the
corresponding CEs to the VC ID. If two PEs that exchange VC labels are not directly
connected, a remote LDP session must be established to transmit the VC FEC and the VC
label. A VC can be set up for two CEs to transmit Layer 2 data if the following conditions
are met:
l The physical status of the AC interfaces is Up.
l A tunnel exists between the two PEs.
l The VC labels have been exchanged between PEs and CEs have been bound to the VC
ID.
Process of setting up and tearing down a PW

When LDP is used as the signaling protocol, PWs carry VC information by extending the
Type-Length-Value (TLV) of standard LDP with the Type 128 FEC. When PWs are set up, the
downstream unsolicited (DU) label distribution mode and the liberal label retention mode are
adopted. To set up a PW, an LDP session must be established first. The type of LDP session
to be established between PEs depends on how the PEs are connected:
l If a P exists between PEs, the LDP session needs to be established in remote mode.
l If PEs are directly connected, an ordinary LDP session needs to be established.
Figure 11-28 Process of setting up and tearing down a PW

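[Figure: PE1 (Loopback1 1.1.1.1/32, configured with mpls l2vc 2.2.2.2 101) and PE2
(Loopback1 2.2.2.2/32, configured with mpls l2vc 1.1.1.1 101) exchange Request and Mapping
messages until the VC state is Up. After an AC or tunnel goes Down, the PEs exchange
Withdraw and Release messages and the VC state becomes Down.]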
The process of setting up a PW when LDP is used as the signaling protocol is as follows:

1. If an LDP session is created between PE1 and PE2, PE1 sends a Request message to
PE2. PE2 can send a Mapping message to PE1 only after receiving the Request message.
2. PE1 determines the VC label and notifies PE2 of the VC label using a Mapping message
that carries this VC label.
3. PE2 receives the Request and Mapping messages sent by PE1.
4. If the PW configurations on PE2 and PE1 are consistent, PE2 sets the status of its local
VC to Up after processing the message received from PE1 and sends a Mapping message
to PE1.
After receiving the Mapping message, PE2 checks whether its VC configurations are
consistent with those on PE1. If the VC IDs and VC types on PE1 and PE2 are
consistent, the VCs on PE1 and PE2 are on the same VPN. In addition, if the interface
parameters of PE1 and PE2 are consistent, and the result of the control word negotiation
is also consistent, PE2 sets the status of its local VC to Up.
5. PE1 receives the Mapping message from PE2 and sets the status of its local VC to Up
after processing the message. Then, a dynamic PW between PE1 and PE2 is set up.
After receiving the Mapping message, PE1 checks the VC ID, VC type, and interface
parameters in the message. If they are consistent on both ends, PE1 sets the status of its
local VC to Up. A bidirectional PW consisting of two unidirectional VCs between PE1 and
PE2 is set up successfully.
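The consistency check in the setup steps, together with the three conditions for a VC to go Up, can be sketched as follows. This is a simplified model under stated assumptions rather than the device implementation: the class names VcParams, Mapping, and PeVc are hypothetical, the Request message is omitted, the MTU stands in for all interface parameters, and the labels 3500 and 3000 are borrowed from the packet transmission example in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VcParams:
    vc_id: int
    vc_type: str
    mtu: int            # stands in for the interface parameters
    control_word: bool


@dataclass(frozen=True)
class Mapping:
    vc_label: int
    params: VcParams


class PeVc:
    """One end of a PW on a PE (hypothetical model)."""

    def __init__(self, params: VcParams, local_label: int,
                 ac_up: bool = True, tunnel_up: bool = True) -> None:
        self.params = params
        self.local_label = local_label
        self.ac_up = ac_up
        self.tunnel_up = tunnel_up
        self.remote_label: Optional[int] = None

    def mapping(self) -> Mapping:
        """Build the Mapping message that advertises the local VC label."""
        return Mapping(self.local_label, self.params)

    def receive(self, msg: Mapping) -> None:
        # Accept the peer's label only if the VC ID, VC type, interface
        # parameters, and control word setting all match the local ones.
        if msg.params == self.params:
            self.remote_label = msg.vc_label

    @property
    def vc_up(self) -> bool:
        # AC is Up, a tunnel exists, and the peer's VC label has been learned.
        return self.ac_up and self.tunnel_up and self.remote_label is not None


params = VcParams(vc_id=101, vc_type="vlan", mtu=1500, control_word=False)
pe1, pe2 = PeVc(params, local_label=3500), PeVc(params, local_label=3000)
pe2.receive(pe1.mapping())   # PE2 processes the Mapping message from PE1
pe1.receive(pe2.mapping())   # PE1 processes the Mapping message from PE2
print(pe1.vc_up, pe2.vc_up)
```

Comparing the whole parameter set in one step keeps the sketch short; a real implementation would report which parameter mismatched.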
The process of tearing down a PW when LDP is used as the signaling protocol is as follows:
1. If PE1 detects that the AC interface or tunnel is Down or that the AC interface has been
deleted, PE1 sends the Withdraw and Release messages to PE2. The Withdraw message
instructs the peer to withdraw the VC label. The Release message responds to a
Withdraw message and instructs the peer that sent the Withdraw message to withdraw
the VC label. To delete the PW quickly, PE1 sends the Withdraw message and the
Release message together.
2. After receiving the Withdraw message and Release message from PE1, PE2 processes
the Withdraw message of PE1.
3. PE2 sends the Release message to PE1.
4. After PE1 receives the Release message of PE2, the PW between PE1 and PE2 is
deleted.
Process of Packet Transmission in Martini Mode
Figure 11-29 Process of packet transmission in Martini mode
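[Figure: VPN1 and VPN2 each have Site1 attached to PE1 and Site2 attached to PE2, with ACs
identified by VLAN10 and VCI100 on PE1 and by VLAN20 and VCI205 on PE2. From Site1 to Site2,
packets carry the outer/inner labels 1000/3000, 1001/3000, and 1002/3000 (VPN1) and
1000/4000, 1001/4000, and 1002/4000 (VPN2) on successive hops across the ISP network. From
Site2 to Site1, packets carry 2000/3500, 2001/3500, and 2002/3500 (VPN1) and 2000/4500,
2001/4500, and 2002/4500 (VPN2).]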
As Figure 11-29 shows, packets are transmitted in Martini mode as follows:
l From Site1 to Site2
After a packet sent from Site1 of VPN1 reaches PE1 over VLAN10, PE1 adds VC label 3000
and outgoing label 1000 of LSP1 to the packet and sends the packet over LSP1 (indicated
by the black dashed line). For an ATM packet sent from Site1 of VPN2 to PE1 with the
virtual channel identifier (VCI) 100, PE1 adds VC label 4000 and outgoing label 1000 of
LSP1 to the packet and sends the ATM packet over LSP1 (indicated by the black dashed
line).
After these packets reach PE2, PE2 strips incoming label 1002 of LSP1 and selects
outbound interfaces according to VC labels 3000 and 4000. VC labels 3000 and 4000 are
transmitted to PE1 through LDP signaling when PE2 sets up VCs with PE1.
l From Site2 to Site1
After a packet sent from Site2 of VPN1 reaches PE2 over VLAN20, PE2 adds VC label 3500
and outgoing label 2000 of LSP2 to the packet and sends the packet over LSP2 (indicated
by the blue dashed line). For an ATM packet sent from Site2 of VPN2 to PE2 with the VCI
205, PE2 adds VC label 4500 and outgoing label 2000 of LSP2 to the packet and sends the
packet over LSP2 (indicated by the blue dashed line).
After these packets reach PE1, PE1 strips incoming label 2002 of LSP2 and selects
outbound interfaces according to VC labels 3500 and 4500. VC labels 3500 and 4500 are
transmitted to PE2 through LDP signaling when PE1 sets up VCs with PE2.
The preceding process of packet transmission shows that the outer LSP tunnel is shared. After
receiving the packets, PE2 maps the packets to different VCs according to different inner
labels.
In Martini mode, the LSP label is used to transmit the data of each VC on the ISP network,
and the VC label is used to identify service data. An LSP on the ISP network can be shared
by multiple VCs. The LSP is used to transmit the VC data across the ISP network and can be
encapsulated into an IP tunnel. To deploy the Martini mode, the ISP network must be able to
automatically set up LSPs, which requires support for MPLS forwarding and MPLS LDP. If the
ISP network does not meet this requirement, GRE tunnel encapsulation can be used instead.
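The label handling in the Site1-to-Site2 direction can be sketched as follows. The sketch assumes that each of the two P devices swaps the outer label once (1000 to 1001, then 1001 to 1002), which is inferred from the label values in Figure 11-29; the function names and the AC names are hypothetical.

```python
from typing import Dict, List

LabelStack = List[int]  # outer (LSP) label first, inner (VC) label last


def pe_push(vc_label: int, lsp_label: int) -> LabelStack:
    """Ingress PE: push the VC label, then the outgoing LSP label."""
    return [lsp_label, vc_label]


def p_swap(stack: LabelStack, swap_table: Dict[int, int]) -> LabelStack:
    """P device: swap only the outer label; the VC label is not inspected."""
    return [swap_table[stack[0]], *stack[1:]]


def pe_pop_and_select(stack: LabelStack, incoming_lsp_label: int,
                      vc_to_ac: Dict[int, str]) -> str:
    """Egress PE: strip the LSP label and pick the AC from the VC label."""
    outer, vc_label = stack
    if outer != incoming_lsp_label:
        raise ValueError("unexpected LSP label")
    return vc_to_ac[vc_label]


stack = pe_push(vc_label=3000, lsp_label=1000)        # at PE1
for table in ({1000: 1001}, {1001: 1002}):            # at the two P devices
    stack = p_swap(stack, table)
ac = pe_pop_and_select(stack, 1002, {3000: "vpn1-site2-ac", 4000: "vpn2-site2-ac"})
print(stack, ac)
```

Because the P devices touch only the outer label, the same LSP carries both VC 3000 and VC 4000, which is the sharing the text describes.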
Signaling for Transmitting VC Labels

As mentioned earlier, the Martini mode is an extension to traditional LDP and is used to
transmit VC information.
In Martini mode, to configure a PW between two PEs, you only need to configure two
unidirectional VCs on the two PEs. When the interface that connects a PE and a CE goes Up,
the local PE sends a Label Mapping message using LDP to the peer PE if an LSP exists
between the local PE and the peer PE. The Label Mapping message carries the newly defined
VC FEC, which describes the CE interface type, the VC label, and the interface parameters.
Currently defined interface parameters include the MTU, the maximum number of ATM cells
that can be encapsulated into a packet, and the interface description.
In an LDP session, the VC FEC information is carried in the Label Mapping message. Figure
11-30 shows the format of the Label Mapping message.

Figure 11-30 Format of the Label Mapping message

Bit positions: 0, 15, 31
| 0 | Label Mapping (0x0400) | Message Length |
| Message ID |
| 0 | 0 | FEC TLV (0x0100) | Length |
| VC TLV (0x80) | C | VC Type | VC Info Length |
| Group ID |
| VC ID |
| Interface Parameters |
| 0 | 0 | Generic Label (0x0200) | Length |
| Label |
| Optional Parameters |
In Martini mode, extended remote LDP sessions are established between PEs to transmit VC
information. Type 128 FECs are added to transmit VC information. Figure 11-31 shows the
format of a VC FEC.
Figure 11-31 Format of a Type 128 FEC
Bit positions: 0, 7, 8, 23, 31
| VC TLV (0x80) | C | VC Type | VC Info Length |
| Group ID |
| VC ID |
| Interface Parameters |
In a Type 128 FEC, the length of the Interface Parameters field is variable. The length
information is contained in the VC Info Length field.
Table 11-5 Descriptions of the fields in the Type 128 VC FEC

Field | Meaning | Bits | Description
VC TLV | Indicates the TLV of the VC. | 8 | Its value is 0x80, 128 in decimal notation.
C | Indicates the control word. | 1 | If the value is 1, the control word feature is supported; if the value is 0, the control word feature is not supported.
VC type | Indicates the VC type. | 15 | The VC types include:
l ATM PWE3 protocol standards:
– atm-1to1-vcc: One PW carries the ATM cells of one PVC.
– atm-1to1-vpc: One PW carries the ATM cells of one PVP.
– atm-nto1-vcc: One PW carries the ATM cells of multiple PVCs.
– atm-nto1-vpc: One PW carries the ATM cells of multiple PVPs.
– atm-trans-cell: ATM cells are transmitted over PWE3 VCs.
l TDM PWE3 protocol standards:
– satop-e1: The SAToP protocol is insensitive to the E1 frame structure and encapsulates a whole E1 frame into a PW packet.
– satop-t1: The SAToP protocol is insensitive to the T1 frame structure and encapsulates a whole T1 frame into a PW packet.
– cesopsn-basic: The Circuit Emulation Service (CES) protocol is sensitive to E1 and T1 frame structures. E1 or T1 signals can be encapsulated into packets based on timeslots.
l Other encapsulation types:
– ethernet: This encapsulation type is used when Ethernet packets do not carry VLAN information.
– ip-interworking: This encapsulation type is used for interworking between Huawei devices.
– ip-layer2: This encapsulation type is used for interworking between Huawei devices and non-Huawei devices.
– vlan: This encapsulation type is used when Ethernet packets carry VLAN information.
VC info length | Indicates the length of the VC information. | 8 | The value indicates the total length of the VC ID and interface parameters.
Group ID | Indicates the group ID. | 32 | Some VCs constitute a group that is used to withdraw the corresponding VC information in batches.
VC ID | Indicates the VC ID. | 32 | The value indicates the VC ID.
Interface parameters | Indicates interface parameters. | Variable | The frequently used interface parameter is the MTU.
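The field layout of the Type 128 FEC can be illustrated by packing it into bytes. This is a sketch under stated assumptions: the function names are hypothetical, the VC type value 0x0005 (Ethernet) is taken from Table 11-6, and the encoding of the MTU interface parameter (a 1-byte identifier 0x01, a 1-byte total length, and a 2-byte value) follows the standard PWid FEC sub-TLV layout rather than anything stated in this document.

```python
import struct

VC_FEC_TYPE = 0x80
# TLV type (8 bits), C bit + VC type (16 bits), VC info length (8 bits),
# group ID (32 bits), VC ID (32 bits), all in network byte order.
_HEADER = struct.Struct("!BHBII")


def mtu_parameter(mtu: int) -> bytes:
    # Assumed sub-TLV layout: ID 0x01 (MTU), total length 4, 2-byte value.
    return struct.pack("!BBH", 0x01, 4, mtu)


def encode_vc_fec(vc_type: int, control_word: bool, group_id: int,
                  vc_id: int, interface_params: bytes = b"") -> bytes:
    if not 0 <= vc_type < 1 << 15:
        raise ValueError("VC type must fit in 15 bits")
    info_length = 4 + len(interface_params)  # VC ID plus interface parameters
    if info_length > 0xFF:
        raise ValueError("VC info length must fit in 8 bits")
    c_and_type = (int(control_word) << 15) | vc_type
    return _HEADER.pack(VC_FEC_TYPE, c_and_type, info_length,
                        group_id, vc_id) + interface_params


def decode_vc_fec(data: bytes) -> dict:
    tlv, c_and_type, info_length, group_id, vc_id = _HEADER.unpack_from(data)
    if tlv != VC_FEC_TYPE:
        raise ValueError("not a Type 128 FEC")
    return {
        "control_word": bool(c_and_type >> 15),
        "vc_type": c_and_type & 0x7FFF,
        "group_id": group_id,
        "vc_id": vc_id,
        "interface_params": data[_HEADER.size:_HEADER.size + info_length - 4],
    }


fec = encode_vc_fec(0x0005, True, 0, 101, mtu_parameter(1500))
print(fec.hex())
```

The decoder uses the VC info length to find the end of the interface parameters, which is why that field counts both the VC ID and the parameters.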
Usage Scenarios
The Martini mode applies to networks with sparse Layer 2 connections, such as networks
with the star topology.
Benefits
In Martini mode, the ISP network can be shared by multiple VCs. The Martini mode is easy
to extend because in the carrier's network, only the PE needs to save information about VC
labels and LSP mapping and the P does not contain any L2VPN information. When you add a
VC, you only need to configure two unidirectional VCs on the two related PEs without
affecting the running of the network. Compared with the Kompella mode, the Martini mode
adopts LDP rather than BGP as the signaling protocol. The Martini mode is independent of
the timing refresh mechanism. This mode is faster in fault detection.
11.4.2.4 SVC VLL
Definition
SVC VLL is an L2VPN technology that uses VC labels manually configured based on VC
IDs to transmit data. SVC VLL is similar to Martini VLL, except that Martini VLL uses LDP
to exchange VC labels. SVC VLL can be regarded as simplified Martini VLL.
SVC VLL Topology

SVC VLL assigns outer labels (used to identify public tunnels) in the same way as Martini
VLL. Inner labels for SVC VLL are manually specified during VC configuration. PEs do not
require a signaling protocol to exchange VC labels. The network topology and packet
exchange process of SVC VLL are the same as those of Martini VLL.
When creating a static Layer 2 VC connection in SVC mode, you can specify an LDP LSP or a
constraint-based routing label switched path (CR-LSP) as the bearer tunnel in the tunnel
policy. You can also specify multiple bearer tunnels for load balancing. SVC VLL supports
multi-hop inter-AS L2VPN, but does not support local connections.
11.4.2.5 Heterogeneous VLL

On a heterogeneous VLL network, a VC is established between AC interfaces of different link
types and with IP interworking configured. The AC interfaces transparently transmit Layer 3
data (IP packets) over an MPLS network.
If the link types of AC interfaces on both ends of an L2VPN link are different, heterogeneous
VLL is required.
Introduction
Heterogeneous VLL applies to scenarios where the AC interfaces at both ends of an L2VPN
connection have different link types. After a PE receives a frame from a CE, the PE removes
the frame header and transparently transmits the IP packet over an MPLS network to the peer
PE. The peer PE re-encapsulates the IP packet according to its own link layer protocol and
transmits the packet to the connected CE. PEs directly process link-layer control packets
received from CEs without transmitting these packets over the MPLS network and silently
discard non-IP packets, including MPLS and Internetwork Packet Exchange (IPX) packets.
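The ingress behavior described above, in which the PE removes the frame header, forwards IP packets, and drops everything else, can be sketched for an Ethernet AC. The sketch assumes untagged Ethernet II frames and treats only IPv4 as IP; the function name and the sample frames are hypothetical.

```python
import struct
from typing import Optional

ETHERTYPE_IPV4 = 0x0800
ETH_HEADER_LEN = 14  # destination MAC, source MAC, EtherType


def ip_packet_from_frame(frame: bytes) -> Optional[bytes]:
    """Return the IP packet to send over the PW, or None to discard the frame."""
    if len(frame) < ETH_HEADER_LEN:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None  # non-IP traffic such as MPLS is silently discarded
    return frame[ETH_HEADER_LEN:]


addresses = bytes(12)  # placeholder destination and source MAC addresses
ip_frame = addresses + struct.pack("!H", ETHERTYPE_IPV4) + b"ip-packet"
mpls_frame = addresses + struct.pack("!H", 0x8847) + b"mpls-packet"
print(ip_packet_from_frame(ip_frame), ip_packet_from_frame(mpls_frame))
```

The peer PE would then prepend its own link-layer header, for example a PPP header, before sending the packet to its CE.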
Topology
Heterogeneous VLL is required when two heterogeneous sites accessing an L2VPN backbone
network need to communicate. On the network shown in Figure 11-32, Site 3 and Site 4 are
homogeneous sites, but Site 1 and Site 2 are heterogeneous sites.
Figure 11-32 Heterogeneous VLL

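[Figure: Two PEs are connected across the backbone. In VPN1, the CE at Site1 attaches over an
ATM interface and the CE at Site2 attaches over a GE interface; the two sites are connected
by PW 200. In VPN2, the CEs at Site3 and Site4 attach over GE interfaces; the two sites are
connected by PW 100.]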
Table 11-6 lists the types of data that can be transparently transmitted over a VLL.

Table 11-6 Types of data that can be transparently transmitted over a VLL
Value Type
0x0005 Ethernet
0x0007 PPP
Processing of Different Link Layer Protocols

l Ethernet and VLAN
Only the following types of Ethernet interfaces support IP interworking when
functioning as AC interfaces:
– Ethernet interface and its sub-interface
– GE interface and its sub-interface
l PPP
– PPP supports both Password Authentication Protocol (PAP) authentication and
Challenge Handshake Authentication Protocol (CHAP) authentication. The
authentication type can be local authentication, RADIUS authentication, or
HWTACACS authentication.
– PPP supports STAC Lempel-Ziv standard (STAC-LZS) compression, but does not
support IP header compression (IPHC) or Van Jacobson header (VJ) compression.
– PEs and CEs can assign IP addresses to each other.
– PPP supports the transparent transmission of IP packets from a local CE to the peer
CE, but does not support the transparent transmission of MPLS, Intermediate
System to Intermediate System (IS-IS), or Internetwork Packet Exchange (IPX)
packets. PPP interfaces with MPLS, IS-IS, or IPX enabled can perform Network
Control Protocol (NCP) negotiation, but cannot forward data packets.
11.4.2.6 Comparison Between Modes of Implementing the VLL

Table 11-7 shows the comparison between modes of implementing the VLL.
Table 11-7 Comparison between modes of implementing the VLL
Implementation Mode | Signaling Protocol | Tunnel | Application Scenario | Scalability | Support for Local Connections
CCC | None | Local CCC: does not need public network tunnels. | N/A | Poor | Yes
SVC | None | Needs a shared GRE or LSP tunnel. | N/A | Poor | No
Martini | LDP | Needs a shared GRE or LSP tunnel. | Sparse mode | Poor | No
11.4.2.7 Comparison Between the MPLS L2VPN and the BGP/MPLS VPN
Table 11-8 shows the differences between the Martini MPLS L2VPN and the BGP/MPLS VPN.
Table 11-8 Comparison between the MPLS L2VPN and the BGP/MPLS VPN
Item | BGP/MPLS VPN | Martini L2VPN
Cost of PEs | The memory cost is high; the consumption of interface resources is low; the signaling cost is low. | The memory cost is low; the consumption of interface resources is high; the signaling cost is high.
Flooding mode of the VPN topology | BGP automatic discovery. | Manual configuration.
Flooding mode of VPN routes | The VPN routes are flooded through PEs and converge slowly. | The VPN routes are flooded directly between CEs and converge rapidly.
Access mode of CEs | Different sites in the same VPN can have different access modes. | Martini L2VPNs of different encapsulation types, such as PPP, ATM, and Ethernet (VLAN), can interwork through heterogeneous IP interworking.
VPN nesting | Supported. | Not supported.
Protocol independence | Over only IP. | Over any Layer 3 protocol.
Inheritance from the traditional VPN | Inherits and improves the traditional L2VPN. | Inherits and improves the traditional L2VPN.
Maturity | Mature. | Immature.
Ease of use | Simple. | Complex.
Manageability | Outsourced route and role-based management. | Outsourced topology and centralized management.

Redundant L2VPN Networking Schemes
L2VPN faults, comprising the local AC fault, remote AC fault, local PE fault, remote PE
fault, and PSN fault, may occur on any node or link between the Node B and the RNC. To
avoid the preceding L2VPN faults, you can adopt one of the following redundant L2VPN
networking schemes.
l As shown in Figure 11-33, the Node B accesses PEs through one AC, and the RNC
accesses PEs through two ACs.
This scheme is a networking solution considering backbone network faults, AC link
faults on the dual-AC access side, and PE faults.
Figure 11-33 Node B accessing PEs
(Figure: Node B, PE1, P1, P2, PE2, PE3, and the RNC across a VPN backbone, with attachment
circuits AC1, AC2, and AC3.)
l Backbone tunnel backup: As shown in Figure 11-34, a primary tunnel and one or more
secondary tunnels are set up between PEs on both ends of the link.
This scheme is a networking solution considering tunnel faults in a backbone network.
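Figure 11-34 Backup of the tunnel in a backbone network
(Figure: Node B attaches to PE1 and the RNC attaches to PE2 through AC1 and AC2; PE1 and PE2
are connected across the VPN backbone through P1 and P2.)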
Redundant L2VPN is not required, because if a secondary tunnel exists between PEs,
BFD can directly detect the tunnel fault and switch tunnels, which speeds up fault
convergence and avoids PW faults.

Acronym & Abbreviation    Full Name
AC attachment circuit
CCC circuit cross connect
CE customer edge
DDN digital data network
ISP Internet service provider
L2PDU Layer 2 protocol data unit
PE provider edge
SP service provider
VC virtual circuit
VLL virtual leased line
VPWS virtual private wire service
11.5 PWE3
11.5.1 Introduction
Definition
A pseudo wire emulation edge to edge (PWE3) service is a point-to-point (P2P) connection
on a multiprotocol label switching (MPLS) Layer 2 virtual private network (L2VPN). PWE3
provides methods for carrying network services such as asynchronous transfer mode (ATM),
frame relay (FR), Ethernet, time division multiplexing (TDM), and synchronous optical
network/synchronous digital hierarchy (SONET/SDH) over a packet switched network (PSN).
PWE3 is developed based on draft-martini-l2circuit-trans-mpls and in compliance with RFC
4447. Currently, PWE3 supports only FEC 128.
Purpose
IP networks have developed rapidly in recent years, owing to their advantages in
upgradability, expansibility, and interoperability. In comparison, the development of
traditional communications networks is confined due to limitations on transmission modes
and service types. To upgrade traditional communications networks and expand their capacity,
integrate them with existing PSNs. This solution maximizes the use of existing network
resources.
PWE3 is used to carry various types of services such as Ethernet, ATM, TDM, and PPP over
broadband metropolitan access networks or mobile broadband networks. As shown in Figure
11-35, the headquarters and branch of company A use traditional communications networks
such as ATM and FR networks. A pseudo wire (PW) is established between PE1 and PE2
using PWE3, so that the headquarters and branch can communicate over the MPLS network.
By converging previous access modes with the current IP backbone network, PWE3 prevents
repetitious network construction and saves operation costs.
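Figure 11-35 Typical application scenario of PWE3
(Figure: CE1 at the Company A headquarters connects to PE1 through an AC, and CE2 at the
Company A branch connects to PE2 through an AC. A PW between PE1 and PE2 is carried in an MPLS
tunnel across the MPLS network.)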
Benefits
As an independent workgroup of the Internet Engineering Task Force (IETF), the PWE3
workgroup extends draft-martini-l2circuit-trans-mpls and defines a complete PW architecture.
This architecture uses some specifications of Martini virtual leased line (VLL) defined in
draft-martini-l2circuit-trans-mpls and has the following features compared with the Martini
PW architecture:
l Advertising PW status using Label Distribution Protocol (LDP) signaling Notification
messages
Notification messages only advertise PW status. A PW established in PWE3 mode is
torn down only when PW configurations are deleted or the LDP session is interrupted.
This feature reduces signaling control packets exchanged between PEs, reducing
signaling costs. PWE3 PWs can work with Martini PWs.
l Supporting MS-PWs
The number of LDP connections required on an access device is reduced, minimizing
LDP session costs on the access device. MS-PWs allow more flexible networking.
l Supporting TDM interfaces
TDM interfaces can use the control word (CW) feature to sequence TDM packets and
the Real-Time Transport Protocol (RTP) to extract and synchronize clock signals.
l Providing the fragmentation negotiation mechanism
l Providing PW connectivity detection functions, such as virtual circuit connectivity
verification (VCCV) and PW operation, administration, and maintenance (OAM)
PW connectivity detection ensures quicker network convergence and enhanced network
reliability.
l Enriching and optimizing MIB functions and improving MIB maintainability
In addition to carrying various types of services, PWE3 enables a mobile network to evolve
towards LTE. PWE3 can protect carriers' investment during the migration of services such as
ATM and TDM from traditional communications networks to IP networks.
NOTE
For similarities between PWE3 and Martini, such as L2VPN heterogeneous interworking and inter-AS
VPN, see Martini VLL.

11.5.2 Principles
11.5.2.1 Basic PWE3 Principles
PW Classification
PWs can be classified into static PWs and dynamic PWs or single-segment PWs (SS-PWs)
and MS-PWs, depending on different classification methods.
Martini VLL supports dynamic PWs established using LDP signaling. In addition to dynamic
PWs, PWE3 also supports static PWs.
An SS-PW is a PW set up between two PEs without PW label switching. PW1 in Figure
11-36 is an example of an SS-PW.
An MS-PW is a set of two or more PW segments that function as a single PW. The
forwarding mechanisms of PEs for the SS-PW and MS-PW are the same. The only difference
is that PW labels are switched on switching PEs (SPEs) for MS-PWs. PW2 in Figure 11-36 is
an example of an MS-PW.
NOTE
If two PEs cannot establish a connection using signaling or cannot establish a direct tunnel, configure an
MS-PW between the two PEs instead. By supporting MS-PWs, PWE3 enables networking modes to be more
flexible.
The preceding PW classification methods can be used together. For example, an MS-PW can
be a set of static and dynamic PW segments.
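Figure 11-36 Networking diagram for the SS-PW and MS-PW
(Figure: CEs attach to PE1 and PE2 across an MPLS network. PW1 is an SS-PW between PE1 and PE2
that passes through a P device. PW2 is an MS-PW whose Segment1 and Segment2 are joined at an SPE.)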
Establishment, Maintenance, and Deletion of Dynamic PWs

Dynamic PWs are established using LDP signaling and VC information is carried in extended
TLV fields of LDP signaling messages. Before a dynamic PW is established between two
PEs, an LDP session must be established between the two PEs. During the establishment of a
dynamic PW, the label distribution mode is downstream unsolicited (DU) and the label
retention mode is liberal label retention.
NOTE
If P devices exist between the two PEs, a remote LDP session must be established. If the two PEs are
directly connected, a local LDP session is established.
After PWE3 is configured on the two PEs and an LDP session is established between the two
PEs, the dynamic PW starts to be established. Figure 11-37 shows the process of establishing
a dynamic PW.
1. PE1 sends a Label Request message and a Label Mapping message to PE2.
2. After receiving the Label Request message from PE1, PE2 sends a Label Mapping
message to PE1.
3. After receiving the Label Mapping message from PE1, PE2 determines whether its PW
configurations are consistent with those on PE1. If its PW configurations such as the VC
ID, VC type, MTU, and CW enabling status are consistent with those on PE1, PE2 sets
the PW status as Up.
4. After receiving the Label Mapping message from PE2, PE1 determines whether its PW
configurations are consistent with those on PE2. If consistent, PE1 sets the PW status as
Up. After that, a dynamic PW is established between PE1 and PE2.
5. After the dynamic PW is established, PE1 and PE2 learn the status of each other by
exchanging Notification messages.
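The consistency check in steps 3 and 4 can be sketched as follows. This is an illustrative model only: the class `PwConfig` and the functions `mismatches` and `pw_status` are hypothetical names, and the compared fields are the ones the steps list (VC ID, VC type, MTU, and CW enabling status).

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class PwConfig:
    vc_id: int
    vc_type: str
    mtu: int
    control_word: bool


def mismatches(local: PwConfig, remote: PwConfig) -> List[str]:
    """Return the names of the parameters that differ between the two ends."""
    return [f.name for f in fields(local)
            if getattr(local, f.name) != getattr(remote, f.name)]


def pw_status(local: PwConfig, remote: Optional[PwConfig]) -> str:
    """A PE sets the PW Up only after a Label Mapping with matching parameters arrives."""
    if remote is None:          # no Label Mapping received from the peer yet
        return "Down"
    return "Up" if not mismatches(local, remote) else "Down"
```

Each PE runs the same check against the parameters carried in the peer's Label Mapping message, so the PW is usable only when both ends report Up.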
Figure 11-37 Process of establishing and maintaining an SS-PW
(Figure: PE1 (Loopback1 1.1.1.1/32) is configured with mpls l2vc 2.2.2.2 101, and PE2
(Loopback1 2.2.2.2/32) is configured with mpls l2vc 1.1.1.1 101. The PEs exchange Request and
Mapping messages, and each PE sets the VC to Up when the parameters match. Notification
messages are exchanged when the AC or tunnel state changes.)
If the AC interface of a PW is Down or the corresponding tunnel is Down, Martini and PWE3
use different processing mechanisms:
l In Martini mode, the local PE sends a Label Withdraw message to its peer to tear down
the PW. After the AC interface or tunnel goes Up, another round of negotiation is
required for the PEs to establish a PW.
l In PWE3 mode, the local PE sends a Notification message to notify its peer that packets
cannot be forwarded, but the PW is not torn down. When the AC interface or tunnel goes
Up, the local PE sends a Notification message to notify its peer that packets can be
forwarded.
The PW is torn down only when PW configurations are deleted from the PEs or the LDP
session is interrupted. Notification messages prevent repeated PW establishment and deletion
caused by network flapping.
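The difference between the two modes can be modeled with a small state sketch. The class `PseudoWire` and its attributes are hypothetical, and the renegotiation in Martini mode is reduced to a single Label Mapping entry for brevity.

```python
class PseudoWire:
    """Toy model of how a PE reacts to AC or tunnel state changes."""

    def __init__(self, mode: str):
        if mode not in ("martini", "pwe3"):
            raise ValueError("mode must be 'martini' or 'pwe3'")
        self.mode = mode
        self.established = True   # PW labels are still in place
        self.forwarding = True    # packets can currently be forwarded
        self.sent = []            # signaling messages sent to the peer

    def link_down(self) -> None:
        self.forwarding = False
        if self.mode == "martini":
            self.sent.append("Label Withdraw")
            self.established = False      # PW is torn down
        else:
            self.sent.append("Notification")  # status only; PW is kept

    def link_up(self) -> None:
        if self.mode == "martini":
            self.sent.append("Label Mapping")  # simplified renegotiation
            self.established = True
        else:
            self.sent.append("Notification")
        self.forwarding = True
```

With a flapping link, the Martini PW is repeatedly torn down and rebuilt, whereas the PWE3 PW stays established and only its forwarding status changes.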
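Figure 11-38 Process of deleting an SS-PW
(Figure: PE1 (Loopback1 1.1.1.1/32) is configured with mpls l2vc 2.2.2.2 101, and PE2
(Loopback1 2.2.2.2/32) is configured with mpls l2vc 1.1.1.1 101. After the VC is deleted on
PE1, PE1 sends Withdraw and Release messages to PE2; PE2 replies with a Release message and
deletes the VC.)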
Figure 11-38 shows the process of deleting an SS-PW:

1. When PW configurations are deleted from PE1, PE1 withdraws its PW label and sends
Label Withdraw and Label Release messages to PE2 in succession.
NOTE
The Label Withdraw message notifies the peer PE that the PW label is being withdrawn. The Label
Release message is a response to the Label Withdraw message and tells the PE that sent the Label
Withdraw message to withdraw the PW label. To delete the PW more quickly, PE1 sends the Label
Withdraw and Label Release messages in succession.
2. After receiving the Label Withdraw and Label Release messages from PE1, PE2
withdraws its remote VC label and sends a Label Release message to PE1.
3. When PE1 receives the Label Release message from PE2, the PW is torn down.
The difference between an SS-PW and an MS-PW is that one or more SPEs exist between the
end PEs of an MS-PW. Figure 11-39 shows an example of an MS-PW between PE1 and PE2.
The SPE connects the two PW segments between PE1 and PE2.
During signaling negotiation, the SPE forwards to PE2 parameters carried in the Label
Mapping message sent by PE1 and forwards to PE1 parameters carried in the Label Mapping
message sent by PE2. After parameters are negotiated to be consistent, the PW status becomes
Up. The forwarding of Label Release, Label Withdraw, and Notification messages is similar
to the forwarding of Label Mapping messages.

Figure 11-39 Signaling interaction process for an MS-PW
(Figure: PE1 (Loopback1 1.1.1.1/32), the SPE (Loopback1 2.2.2.2/32), and PE2 (Loopback1
3.3.3.3/32). PE1 and PE2 are each configured with mpls l2vc 2.2.2.2 100, and the SPE is
configured with a switch PW. Request and Mapping messages are relayed through the SPE; each
end PE sets the VC to Up when the parameters match.)
Derivative Functions
PWE3 reliability requirements are increasing as the PWE3 technology becomes more widely
used. Currently, many fast fault detection and protection switching mechanisms are available,
such as bidirectional forwarding detection (BFD), OAM, and FRR. These mechanisms,
however, address only link or node failures within a PSN, but not PE or AC failures between
PEs and CEs. To solve this problem, PW automatic protection switching (APS) and PW
redundancy are introduced. PW APS is in compliance with G.8131 and PW redundancy is in
compliance with draft-ietf-pwe3-redundancy. For details, see PWE3 Reliability.
11.5.2.2 ATM Cell Relay

ATM cell relay is a technique that transmits ATM cells over PWE3 VCs.
Background
ATM is a traditional multi-service bearer technology used on backbone networks. ATM
networks can carry services such as IP, FR, voice, teleconference, and ISDN/DSL and provide
well-designed quality of service (QoS) mechanisms for these services. ATM networks have
been used to carry important services.
IP networks have developed rapidly in recent years, owing to their advantages in
upgradability, expansibility, and interoperability. Traditional ATM networks, however, are
less compatible with newly deployed networks due to limitations on transmission modes and
service types. Traditional ATM networks therefore urgently need to be upgraded and integrated
with existing PSNs, so that existing network resources can be fully utilized to meet expanded
service demand.

By interconnecting ATM networks over a PSN, ATM cell relay emulates traditional ATM
services when they are transmitted over the PSN. This allows end users to be unaware of
network differences and protects carriers' investment during network convergence and
construction.
Related Concepts
l ATM cell: A cell is the basic ATM transmission unit. An ATM cell consists of 53 bytes,
comprising a 5-byte header and a 48-byte payload. Each ATM cell is transmitted
independently with a short transmission delay.
l VC: ATM is a VC-based and connection-oriented switching technology. Each VC is
identified by a virtual path identifier (VPI) and a virtual channel identifier (VCI). A
VPI/VCI pair is valid for only a link between ATM devices.
l PVC: A permanent virtual circuit (PVC) is a type of ATM connection configured by a
network administrator. The establishment of a PVC does not require signaling.
l SVC: A switched virtual circuit (SVC) is a type of ATM connection dynamically
established using signaling.
l VCC: A virtual circuit connection (VCC) is a type of ATM connection established based
on VCI switching.
l VPC: A virtual path connection (VPC) is a type of ATM connection established based on
VPI switching.
l AAL: The ATM adaptation layer is similar to the data link layer of the OSI reference
model and is integrated with the ATM layer. The AAL is responsible for separating the
upper layer from the ATM layer. The AAL prepares for conversion between service data
and ATM cells by fragmenting service data into 48-byte payloads for ATM cells.
l VPI/VCI mapping: As shown in Figure 11-40, a PW is used to emulate an ATM switch.
To retain the configurations on the ATM switch, VPI/VCI pairs 1/100 and 2/200 must be
mapped to each other on PE1 and PE2. In this manner, the VPI/VCI pairs used by the CEs
of a VC are mapped. If the PW emulates only one VPC or VCC, the PW functions as an ATM
switch and mapping between VPI/VCI pairs does not need to be configured on PE1 and
PE2. If the PW emulates two or more VPCs or VCCs, mapping between VPI/VCI pairs
needs to be configured on PE1 and PE2.
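The cell format and the VPI/VCI mapping described in the list above can be illustrated with a short sketch. It assumes the standard UNI cell header layout (4-bit GFC, 8-bit VPI, 16-bit VCI, 3-bit PT, 1-bit CLP, 8-bit HEC); the function names are hypothetical, and the HEC byte is left at zero rather than computed.

```python
CELL_SIZE = 53      # 5-byte header + 48-byte payload
PAYLOAD_SIZE = 48


def build_cell(vpi: int, vci: int, payload: bytes, pt: int = 0, clp: int = 0) -> bytes:
    """Build a UNI-format ATM cell (GFC = 0, HEC not computed in this sketch)."""
    if not (0 <= vpi <= 0xFF and 0 <= vci <= 0xFFFF and len(payload) == PAYLOAD_SIZE):
        raise ValueError("invalid VPI, VCI, or payload length")
    b0 = (vpi >> 4) & 0x0F
    b1 = ((vpi & 0x0F) << 4) | ((vci >> 12) & 0x0F)
    b2 = (vci >> 4) & 0xFF
    b3 = ((vci & 0x0F) << 4) | ((pt & 0x07) << 1) | (clp & 0x01)
    return bytes([b0, b1, b2, b3, 0]) + payload


def parse_vpi_vci(cell: bytes):
    """Extract the (VPI, VCI) pair from a 53-byte cell."""
    if len(cell) != CELL_SIZE:
        raise ValueError("an ATM cell is exactly 53 bytes")
    vpi = ((cell[0] & 0x0F) << 4) | (cell[1] >> 4)
    vci = ((cell[1] & 0x0F) << 12) | (cell[2] << 4) | (cell[3] >> 4)
    return vpi, vci


def remap(cell: bytes, table: dict) -> bytes:
    """Rewrite the VPI/VCI pair according to a PE mapping table, keeping PT and CLP."""
    new_vpi, new_vci = table[parse_vpi_vci(cell)]
    pt, clp = (cell[3] >> 1) & 0x07, cell[3] & 0x01
    return build_cell(new_vpi, new_vci, cell[5:], pt, clp)
```

For the example in the text, a table such as `{(1, 100): (2, 200)}` in one direction and its inverse in the other would map the two pairs to each other.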
Figure 11-40 Networking diagram for ATM cell relay over a P2P tunnel on a PSN
(Figure: two ATM CEs, each behind an ATM switching network, are connected through PE1 and PE2
over the PSN, where the PW takes the place of an ATM switch. One side uses VPI/VCI 1/100 and
the other side uses VPI/VCI 2/200.)
NOTE
For details about ATM, see the ATM description in ATN Multi-service Access Equipment Feature
Description - WAN Access.
Implementation
ATM cell relay interconnects traditional ATM networks and carries ATM cells over a point-to-
point PW on a PSN.
Figure 11-41 shows the label encapsulation mode for ATM cell relay over a PSN. The outer
label is the MPLS tunnel label and the inner label is the VC label used to identify the PW.
Figure 11-41 Networking diagram for ATM cell relay over a PSN
(Figure: two ATM networks are connected through PEs over an MPLS network; the ATM service is
carried in a PW inside a PSN tunnel. PSN-based ATM encapsulation, from outer to inner: PSN
transport header (outer label identifying the PSN tunnel), PW header (inner label identifying
the PW), ATM control word, and ATM service payload.)
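The two-label encapsulation described above can be sketched as follows. The sketch assumes the standard 32-bit MPLS label stack entry (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL) and a 4-byte control word whose contents are supplied by the caller; the function names and the label values used in the example are hypothetical.

```python
import struct


def label_entry(label: int, bottom: bool, tc: int = 0, ttl: int = 255) -> bytes:
    """Encode one MPLS label stack entry."""
    if not 0 <= label < (1 << 20):
        raise ValueError("an MPLS label is 20 bits")
    word = (label << 12) | ((tc & 0x7) << 9) | (int(bottom) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)


def encapsulate(tunnel_label: int, pw_label: int,
                control_word: bytes, atm_payload: bytes) -> bytes:
    """Outer tunnel label, inner PW label (bottom of stack), control word, payload."""
    if len(control_word) != 4:
        raise ValueError("this sketch assumes a 4-byte control word")
    return (label_entry(tunnel_label, bottom=False)
            + label_entry(pw_label, bottom=True)
            + control_word
            + atm_payload)
```

For example, `encapsulate(1024, 100, bytes(4), cells)` would place the PW label 100 under the tunnel label 1024; the egress PE uses the inner label to identify the PW after the outer label is removed.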
A VPI/VCI pair is used to identify an ATM VC. Based on PW emulation types and
comparison between ATM cell relay and AAL5 SDU relay, the following ATM cell relay
modes are defined:
l One-to-one (1-to-1): One PW emulates one VCC or VPC to carry ATM cells.
l N-to-one (N-to-1): One PW emulates two or more VCCs or VPCs to carry ATM cells.
l ATM port cell relay: One PW emulates one dedicated ATM transport line to carry ATM
cells and VPC or VCC emulation is not required.
As shown in Figure 11-42, ATM cell relay is classified into the following modes based on
PWE3 networking modes:
l Remote ATM cell relay: CEs are connected to two different PEs on the PSN, and ATM
cells need to be transparently transmitted over the PSN.
l Local ATM cell relay: CEs are connected to the same PE on the PSN. ATM cells are
directly forwarded by the PE, instead of being transparently transmitted over the PSN.

Figure 11-42 Networking diagram for local and remote ATM cell relay
(Figure: CE1, CE2, and CE3 reside in separate ATM networks attached to PE1 and PE2 across the
PSN. A local connection joins CEs attached to the same PE; a remote connection crosses the PSN
between PE1 and PE2.)
Table 11-9 lists the characteristics of different ATM cell relay modes.
Table 11-9 Characteristics of different ATM cell relay modes
Encapsulation Mode | AAL Type | Connection Type | Encapsulation Method
N-to-1 VCC cell relay | All AAL types | VC | The VPI/VCI pair is encapsulated into the ATM cell. The control word is optional for the PW. This PW encapsulation mode supports VPI/VCI switching.
1-to-1 VCC cell relay | All AAL types | VC | The VPI/VCI pair is not encapsulated into the ATM cell. The control word is required for the PW. This PW encapsulation mode supports VPI/VCI switching.
N-to-1 VPC cell relay | All AAL types | VP | The VPI/VCI pair is encapsulated into the ATM cell. The control word is optional for the PW.
1-to-1 VPC cell relay | All AAL types | VP | The VCI but not the VPI is encapsulated into the ATM cell. The control word is required for the PW.
ATM port cell relay | All AAL types | Port | The VPI/VCI pair is encapsulated into the ATM cell. The control word is optional for the PW.
Usage Scenario
The following describes usage scenarios for different ATM cell relay modes.
ATM VCC Cell Relay
Figure 11-43 shows an example of a VCC. A VCC is the basic transmission unit of an ATM
network. VCCs can carry various ATM services.
Figure 11-43 ATM VCC cell relay
(Figure: CE, PE, PE, CE. Each CE reaches its PE through an ATM network, and the PEs are
connected over the PSN. One AC uses PVC VPI1/VCI1 and the other uses PVC VPI2/VCI2.)
ATM VPC Cell Relay
Figure 11-44 shows an example of a VPC. A VPC is a set of VCCs with the same destination.
VPCs can carry various ATM services. ATM VPC cell relay applies to the scenario in which
packets from multiple users are bound to the same destination. ATM VPC cell relay features
rapid transmission, easy management, and convenient configuration.
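Figure 11-44 ATM VPC cell relay
(Figure: CE, PE, PE, CE. Each CE reaches its PE through an ATM network, and the PEs are
connected over the PSN. One AC uses PVC VPI1 and the other uses PVC VPI2.)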
ATM Port Cell Relay
Figure 11-45 shows an example for ATM port cell relay. ATM port cell relay allows an ATM
port to be connected to another ATM port for ATM cell transmission. ATM port cell relay
applies to the scenario in which ATM cells need to be transmitted between two CEs over a
connection other than the VPC or VCC. The ingress PE discards idle and unassigned cells
received on an ATM port, saving bandwidth resources.
Figure 11-45 ATM port cell relay
(Figure: CE, PE, PE, CE. Each CE reaches its PE through an ATM network, and the PEs are
connected over the PSN. A whole ATM port is used as the AC at each end.)
Benefits
By interconnecting traditional ATM network resources over a PSN, ATM cell relay emulates
traditional ATM services when they are being transmitted over the PSN. This allows end users
to be unaware of network differences and protects carriers' investment during network
convergence and construction.
11.5.2.3 PW Template
A PW template is a set of common attributes abstracted from PWs. Before configuring PWs
with similar attributes, you can define a PW template that contains the common attributes of
these PWs. Then, you can configure these PWs based on the PW template to simplify the
configuration process.
The ATN supports binding PWs to PW templates and resetting PW templates.
Using a PW template helps simplify the configuration of PWs with similar attributes.
PW Template Attributes
On the endpoint PEs of a PW, you can create a PW template and specify the related attributes,
such as the peer IP address, control word, tunnel policy, and maximum number of cells
allowed in a frame. These attributes are optional and can be selected as required. If you want
to perform the continuity check in control word mode, enable the control word function in
advance.
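The idea of a PW template can be pictured with a small data model. This is only a conceptual sketch: the classes `PwTemplate` and `Pw`, the function `bind`, and the possibility of overriding a template attribute per PW are assumptions made for illustration and do not describe the device's configuration model.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PwTemplate:
    """Common, optional attributes shared by several PWs."""
    peer_address: Optional[str] = None
    control_word: bool = False
    tunnel_policy: Optional[str] = None
    max_cells_per_frame: Optional[int] = None


@dataclass(frozen=True)
class Pw:
    vc_id: int
    attrs: PwTemplate


def bind(vc_id: int, template: PwTemplate, **overrides) -> Pw:
    """Create a PW from a template; overrides (hypothetical) adjust single attributes."""
    return Pw(vc_id, replace(template, **overrides))
```

Several PWs toward the same peer can then be created from one template instead of repeating every attribute for each PW.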
11.5.2.4 Static-Dynamic MS-PW

A static-dynamic multi-segment PW (MS-PW) consists of both static and dynamic PW
segments. One static-dynamic MS-PW may have multiple static or dynamic PW segments.
On the network shown in Figure 11-46, a static-dynamic MS-PW is established between
UPE1 and UPE2. The PW segment between UPE1 and the SPE is a dynamic one, and the PW
segment between UPE2 and the SPE is a static one.
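Figure 11-46 Static-dynamic MS-PW
(Figure: CE-A attaches to U-PE1 and CE-B attaches to U-PE2. A dynamic PW segment runs from
U-PE1 through P1 to the S-PE, and a static PW segment runs from the S-PE through P2 to U-PE2.)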
11.5.2.5 Other Related Features

Currently, devices support PWE3 configuration on trunk interfaces.

Acronym & Abbreviation    Full Name
CEP Circuit Emulation over Packet
CW control word
FR frame relay
PE provider edge
PW pseudo wire
PSN packet switched network
SPE superstratum provider edge
UPE user-end provider edge
VC virtual circuit
VCCV virtual circuit connectivity verification
11.6 PWE3 Reliability
11.6.1 Overview
Introduction
Pseudo Wire Emulation Edge to Edge (PWE3) is a bidirectional point-to-point (P2P)
service on a multiprotocol label switching Layer 2 virtual private network (MPLS L2VPN).
High reliability is required for the VPN service. There are many fast fault detection and
protection switching mechanisms such as bidirectional forwarding detection (BFD),
operation, administration and maintenance (OAM), and fast reroute (FRR). These
mechanisms, however, address only link or node failures within a packet switched network
(PSN), but not PE failures or attachment circuit (AC) failures between PEs and CEs.
To protect services against PE and AC failures, PWE3 reliability mechanisms are required.

Figure 11-47 AC protection using trunk or APS
(Figure: PE1 and PE2 are connected by a PW across the packet switched network. The AC at each
end is protected by Trunk/APS.)
The most effective way to protect ACs is to deploy multiple physical links between PEs and
CEs connected by the ACs. In Figure 11-47, the trunk technique is used to bundle multiple
physical links into a logical link and automatic protection switching (APS) is configured for
the trunk. Trunk applies to Ethernet links, whereas APS applies to asynchronous transfer
mode (ATM) or time division multiplexing (TDM) links.
Either trunk or APS protects services against only AC failures between PEs and CEs, but not
PE failures.
To protect services against both PE and AC failures, enhanced trunk (E-Trunk) and PW
redundancy/PW APS are used. E-Trunk is deployed between devices.
On the network shown in Figure 11-48, PE2 is the master and PE3 is the backup; the AC
between CE2 and PE2 is active and the AC between CE2 and PE3 is standby. A primary PW
is deployed between PE1 and PE2; a secondary PW is deployed between PE1 and PE3. The
backup PE and standby AC protect services on the master PE and active AC.
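Figure 11-48 PE and AC protection using PW redundancy/PW APS
(Figure: CE1 attaches to PE1 through Trunk/APS. CE2 is dual-homed to PE2 and PE3 through
Trunk/APS links, with E-Trunk/E-APS between the two PEs. The primary PW runs between PE1 and
PE2 and the secondary PW between PE1 and PE3 across the PSN; a bypass PW is also shown.)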
PWE3 Reliability Mechanisms

PWE3 provides PW redundancy, PW APS, and PWE3 FRR to improve reliability.
PW redundancy is an enhancement to PWE3 FRR.
When PWE3 FRR is used and a public network link on the PSN fails, traffic must also be
switched between ACs. As Figure 11-48 shows, the path between PE1 and PE2 is active and
the path between PE1 and PE3 is standby. When the public network link between PE1 and
PE2 fails or PE2 fails, PWE3 triggers Ethernet OAM to rapidly notify CE2 of the failure.
Upon receipt of the failure notification, CE2 switches traffic to the link between CE2 and
PE3. If PWE3 is associated with E-Trunk in this network, traffic cannot be rapidly switched
back after the failure is removed.

Table 11-10 Comparison between PWE3 reliability mechanisms
PWE3 Reliability Mechanism | PW Type | Protection Mode | Tunnel Type | Link Detection Mechanism | Switching Performance
PW redundancy | Only PWE3, but not Martini or SVC | 1:1 | Static LSP; static CR-LSP; dynamic CR-LSP; LDP LSP | BFD for PW, LDP signaling | Fair
PW APS | PWE3, Martini, and SVC | 1:1; 1+1 in asymmetric mode (CE1 transmits and receives traffic through one link, and CE2 transmits and receives traffic through double links but accepts traffic through only the active link) | Static LSP; static CR-LSP; dynamic CR-LSP; LDP LSP | TP-OAM, MPLS OAM | Fair
PWE3 FRR | Only PWE3 and Martini, but not SVC | 1:1 | Static LSP; static CR-LSP; dynamic CR-LSP; LDP LSP | BFD for PW, OAM mapping, and physical layer failure notification | Poor
11.6.2 Principles
11.6.2.1 PW Redundancy
PW Redundancy Signaling
In conventional PWE3, one-to-one mapping is implemented between ACs and PWs. To
ensure the same forwarding capability, the PW protection mechanism to be used must allow
the configuration of a single PW in a PW group as an active PW and the remaining as inactive
PWs.
RFC 4447 (Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP))
specifies the PW Status TLV to transmit the PW forwarding status. The PW Status TLV is
transported to the remote PW peer using a Label Mapping or LDP Notification message. The
PW Status TLV carries a 32-bit status code field. Each bit in the status code field can be set
individually, so that more than one failure can be indicated at once. PW redundancy introduces
a new PW status code 0x00000020. When this bit is set, it indicates "PW forwarding standby".
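Because each bit of the status code is independent, the field can be decoded as a set of flags. In the sketch below, only 0x00000020 comes from the text above; the other bit values are the ones defined for the PW status code in RFC 4447 as recalled here, and the member names and the `decode` helper are hypothetical.

```python
from enum import IntFlag
from typing import List


class PwStatus(IntFlag):
    NOT_FORWARDING = 0x00000001
    LOCAL_AC_RX_FAULT = 0x00000002
    LOCAL_AC_TX_FAULT = 0x00000004
    LOCAL_PSN_RX_FAULT = 0x00000008
    LOCAL_PSN_TX_FAULT = 0x00000010
    FORWARDING_STANDBY = 0x00000020   # introduced for PW redundancy


def decode(code: int) -> List[str]:
    """List the conditions signaled in a 32-bit PW status code."""
    if code == 0:
        return ["FORWARDING"]          # no bits set: the PW is forwarding
    return [flag.name for flag in PwStatus if code & flag]


def is_standby(code: int) -> bool:
    return bool(code & PwStatus.FORWARDING_STANDBY)
```

A status code of 0x00000022, for example, would signal both a local AC receive fault and the standby state in a single message.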
NOTE
Only the VPLSs in PWE3 mode support PW redundancy.
Primary/Secondary and Active/Inactive PWs

PW redundancy defines primary/secondary PWs and active/inactive PWs.
l Primary and secondary are terms used to describe PW forwarding priorities and can be
configured.
A PE selects the primary PW in preference to a secondary PW when both PWs are in the
Active state. Currently, only one secondary PW can be configured for a primary PW.
l Active and inactive are terms used to describe PW forwarding and operating status and
cannot be configured.
Only active PWs are used to forward traffic. The signaling status and configured
forwarding priority determine PW forwarding status. A PW with the highest priority will
be selected as an active PW to forward traffic. All the other PWs will be in the Inactive
state and must not be used to forward traffic. Inactive PWs used in the VLL service can
be configured to receive traffic though.
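The selection rule described above can be sketched as follows. The class `Pw` and the function `select_active` are hypothetical names; `can_forward` stands for the outcome of the signaling status, and a lower `priority` value means a higher forwarding priority (0 for the primary PW, 1 for the secondary PW).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pw:
    name: str
    priority: int       # 0 = primary, 1 = secondary (lower value wins)
    can_forward: bool   # False if signaling reports a fault or standby state


def select_active(pws: List[Pw]) -> Optional[Pw]:
    """Pick the usable PW with the highest configured priority; others stay inactive."""
    usable = [pw for pw in pws if pw.can_forward]
    return min(usable, key=lambda pw: pw.priority) if usable else None
```

When both PWs can forward, the primary PW is chosen; if the primary PW cannot forward, the secondary PW becomes active, and if neither can forward, no PW is active.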
Operation Modes
PW redundancy operation modes are specified on PEs where primary and secondary PWs
have been configured. If a PW redundancy operation mode is not specified, PWE3 FRR will
be used.
NOTE
In PWE3 FRR, a PE locally determines the primary and secondary status of the PWs, of which a remote
PE is not informed. PWE3 FRR is implemented on Huawei devices only and is not recommended.
There are two PW redundancy operation modes:
Master/slave mode:
A PE locally determines the primary and secondary status of the PWs, and uses signaling to
inform a remote PE of the status. The PW status is independent of the AC status, and
therefore PW and AC failures are isolated.
Independent mode:
On a PE, its PW status is determined by the remote AC status after negotiation procedures.
The remote PE then informs the PE of the PW status. If an AC fails and protection switching
is triggered, protection switching will also be implemented on the PWs. This mode cannot
isolate PW and AC failures.
NOTE
PW redundancy in Independent mode is recommended in PWE3 networking to ensure protection
switching.
11.6.2.2 PW APS
Definition
APS instructs the source and destination ends to implement protection switching in the same
manner to achieve traffic switching, delayed switching, and wait-to-restore. APS always
transmits protocol traffic along the backup channel. Both the transmit and receive ends know
that they receive APS protocol packets through each other's backup channel. This
implementation helps determine whether both ends are configured with the same master and
backup channels.
PW APS is an application of APS on PWs. PW APS uses PW OAM to monitor the PW status.
If a PE detects that the primary PW fails, PW APS is triggered, and traffic is switched to the
secondary PW, implementing service protection.
Purpose
PWs are generally used to transmit 2G services between base transceiver stations (BTSs) and
base station controllers (BSCs), 3G services between NodeBs and RNCs, and long term
evolution (LTE) services between eNodeBs and mobility management entities (MMEs)/
serving gateways (S-GWs). PWs meet requirements for bandwidth, expansion, and flexible
configuration of these services. The bearer network solution includes:
l Static solution: Static routes, LSPs, and PWs are used.
l Dynamic solution: Dynamic routes, LSPs/TE tunnels, and PWs are used.
As static PWs do not use signaling, the primary and secondary PW status negotiation, PW
switchover, and PW switchback cannot be implemented using signaling. PW redundancy
currently supported addresses only PWE3 reliability, but not reliability for PWs in SVC or
Martini mode. SVC PWs are static PWs. PW APS can provide reliability for PWs in SVC,
Martini, or PWE3 mode.
l PW APS uses PW OAM (MPLS OAM or TP OAM) to rapidly monitor PW status and
notifies APS of the status.
l The primary/secondary PW protection group is associated with APS instances. APS
instructs the source and destination ends to implement bidirectional PW protection
switching in the same manner, as defined in G.8131.
PW APS applies to SVC, Martini, or PWE3 PWs.
Using either PW APS or PW redundancy alone across the entire network is recommended. PW APS
and PW redundancy are both reliability mechanisms but are implemented differently.
Implementing both mechanisms on a network makes network maintenance more difficult.
Basic Concepts
Protection Type
PW APS can work in 1:1 or 1+1 mode, in which the primary and secondary PWs back each other
up. In PW APS 1:1 mode, traffic is transmitted and received through a single link. In PW
APS 1+1 mode, traffic is transmitted and received through both links but accepted through
only one link.
Switching Type
PW APS supports bidirectional protection switching. If a working PW fails in one direction,
APS switches traffic in both directions to a protection PW.
Operation Mode
The PW APS operation mode can be a revertive operation mode or a non-revertive operation
mode. In non-revertive mode, traffic will not be switched back from the protection PW to the
working PW even if the working PW recovers. In revertive mode, traffic will return to the
working PW after the wait-to-restore (WTR) timer configured for the working PW expires.
WTR Time
The WTR time is counted from the time when the primary PW recovers to the time when
traffic is switched back from the secondary PW. Setting a WTR time prevents frequent traffic
switching.
Delayed Switching Time
The delayed switching time is the time after which a protection switching is triggered if a
signal fail (SF) is still detected on a PW. Setting a delayed switching timer prevents switching
from immediately occurring after an SF is detected.
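The interaction between the delayed switching time and the WTR time can be sketched as a small selector. This is an illustrative model only, not the device implementation: the class and field names are hypothetical, times are assumed to be in seconds, and failures of the secondary PW and the APS protocol exchange between the two ends are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApsSelector:
    """Simplified revertive selector for one primary/secondary PW pair."""
    holdoff: float                           # delayed switching time (assumed unit: seconds)
    wtr: float                               # wait-to-restore time (assumed unit: seconds)
    active: str = "primary"
    sf_since: Optional[float] = None         # when SF was first seen on the primary PW
    recovered_since: Optional[float] = None  # when the primary PW last recovered

    def update(self, now: float, primary_sf: bool) -> str:
        """Feed the current primary-PW state; return the PW that carries traffic."""
        if primary_sf:
            self.recovered_since = None      # a new SF restarts any running WTR timer
            if self.sf_since is None:
                self.sf_since = now
            # Switch only if SF is still present after the delayed switching time.
            if self.active == "primary" and now - self.sf_since >= self.holdoff:
                self.active = "secondary"
        else:
            self.sf_since = None
            if self.active == "secondary":
                if self.recovered_since is None:
                    self.recovered_since = now
                # Revert only after the primary PW has stayed up for the WTR time.
                if now - self.recovered_since >= self.wtr:
                    self.active = "primary"
                    self.recovered_since = None
        return self.active
```

In this model, an SF that clears before the delayed switching time expires causes no switchover, and an SF that recurs during the WTR period restarts the WTR timer, which is how the two timers prevent frequent switching.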
Dual-Homing Protection
Dual-homing protection is implemented by connecting two PEs to a CE through respective
ACs. This protects PE services on the bearer network.
PW APS Bundling
A device usually has to handle a large number of PW APS protection switching operations. If
PW APS ran a separate state machine for each of them, the device could not perform all
protection switching because of limited resources and capabilities. Configuring one APS
state machine to process protection switching for many PWs reduces resource consumption.
Sharing one APS state machine among multiple PWs is called PW APS bundling.
Switching Mechanism
PW APS uses PW OAM to monitor the primary and secondary PW status. PW OAM sends
detection packets from the ingress to the egress periodically. If the egress fails to receive any
detection packets in a certain period, it considers that an SF occurs and notifies the remote
APS module of the fault. This implements service switching and protection.
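The detection rule described above can be sketched as follows. The text does not state how long the "certain period" is, so the multiplier of 3.5 sending intervals used here is purely an assumption for illustration, and the class and method names are hypothetical.

```python
class ContinuityMonitor:
    """Egress-side check: declare SF when no detection packet arrives in time."""

    def __init__(self, interval: float, multiplier: float = 3.5):
        # multiplier is an assumed value; the source only says "a certain period".
        self.timeout = interval * multiplier
        self.last_rx = 0.0

    def packet_received(self, now: float) -> None:
        """Record the arrival time of a detection packet."""
        self.last_rx = now

    def signal_fail(self, now: float) -> bool:
        """Return True if the silence has lasted longer than the timeout."""
        return now - self.last_rx > self.timeout
```

With a 1-second interval, for example, a packet received at t = 10 keeps the PW healthy until t = 13.5, after which `signal_fail` returns True and APS would be notified.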
As shown in Figure 11-49, PW APS is configured on PE1 and PE2. Normally, upstream
traffic from a BTS/NodeB is transmitted along the path PE1->primary PW->PE2 on the PSN.
PE2 forwards the traffic to a BSC/RNC. Downstream traffic from a BSC/RNC is transmitted
along the path PE2->primary PW->PE1 on the PSN. PE1 forwards the traffic to a BTS/
NodeB.
Figure 11-49 PW APS deployment
(Diagram labels: BTS/NodeB, PE1, PE2, BSC/RNC; primary PW; secondary PW; service flow.)
As shown in Figure 11-50, if the primary PW fails, PW OAM on PE1 and PE2 detects the
failure and triggers APS. Both upstream and downstream traffic are switched to the secondary
PW.
The delayed revertive operation mode is used for PW APS by default. After the primary PW
recovers, PW OAM on PE1 and PE2 detects the recovery, and APS waits for the WTR time
before triggering a revertive operation. Both upstream and downstream traffic are then
switched back to the primary PW.
Figure 11-50 PW APS implementation
(Diagram labels: BTS/NodeB, PE1, PE2, BSC/RNC; primary PW; secondary PW; point of failure;
service flow.)
11.6.3 Applications
11.6.3.1 PW Redundancy in the Scenario that the Node B Accesses Three PEs (PWE3)
Figure 11-51 Networking diagram of PW redundancy in the scenario that the Node B accesses
three PEs
(Diagram labels: RNC; E-Trunk; PE1, PE2, PE3; bypass PW; PW1; PW2; BFD; Node B.)
Figure 11-51 shows the networking diagram of PW redundancy in the scenario that CEs
asymmetrically access three PEs. This chapter takes E-Trunk as an example to describe how
the primary/secondary statuses of PWs are dynamically negotiated.
Table 11-11 Type and configuration of the link for PW redundancy in the scenario that the
Node B accesses three PEs
Type of the AC Link    Configuration on the RNC    Configuration on the PE
Ethernet               E-Trunk                     PWE3 PW
ATM, TDM               E-APS                       PWE3 PW
Negotiation of the Primary/Secondary Status of a PW

The primary/secondary statuses of PWs are negotiated in the following steps:
1. The master/backup statuses of PEs are negotiated.
The E-Trunk is responsible for negotiating the master/backup statuses of the dual-homed
PEs. Assume that PE1 is negotiated as the master and PE2 negotiated as the backup.
2. The primary/secondary statuses of PWs are dynamically negotiated.
a. The E-Trunk is associated with the PWs. In this manner, the E-Trunk can notify the
master/backup statuses of the PEs to the PWs so that the local statuses of the PWs
on PE1 and PE2 can be determined.

b. The local statuses of the PWs on PE1 and PE2 are notified to PE3 through LDP
packets.
Note that LDP packets of PE1 and PE2 reach PE3 in a random sequence.
c. After receiving the LDP packets from PE1 and PE2, PE3 acknowledges that PW1
of PE1 is the primary PW and PW2 of PE2 the secondary PW.
In this case, the unidirectional traffic path is RNC -> PE1 -> PW1 -> PE3 -> Node B.
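Because the LDP packets from PE1 and PE2 reach PE3 in a random sequence, the result on PE3 must not depend on arrival order. The sketch below illustrates that property only; it is not the LDP implementation, and the function name and the status strings "active" and "standby" are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def resolve_roles(notifications: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return the role PE3 assigns to each PW once all notifications are processed.

    Each notification is (pw_name, local_status), where local_status is the
    status advertised by the remote PE: "active" or "standby" (assumed values).
    """
    latest: Dict[str, str] = {}
    for pw, status in notifications:
        latest[pw] = status  # a newer notification for a PW overrides an older one
    return {pw: ("primary" if status == "active" else "secondary")
            for pw, status in latest.items()}
```

Feeding the notifications for PW1 and PW2 in either order yields the same mapping, and a later pair of notifications with swapped statuses models the renegotiation after a switchover.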
Primary/Secondary PW Switchover
The primary/secondary PW switchover occurs in one of the following situations:
• The E-Trunk priority is changed, and statuses of PWs are renegotiated.
• PE1 becomes faulty. In this case, the E-Trunk detects the fault, and changes the status of
PE2 from backup to master. Statuses of the PWs are then renegotiated.
Note that statuses of the PWs are not affected if the backup node PE2 becomes faulty.
• The AC link between PE1 and RNC becomes faulty. The processing flow is similar to
that for the fault of PE1.
Note that statuses of the PWs are not affected if the AC link between PE2 and RNC
becomes faulty.
After the primary/secondary PW switchover, the unidirectional traffic path becomes RNC ->
PE2 -> PW2 -> PE3 -> Node B.
After the faulty node or link recovers, the master/backup statuses of PEs are renegotiated in
the E-Trunk, and PE1 resumes the master state because its priority has not changed.
Public Network Traffic Protection

When the link between PE1 and PE3 becomes faulty and PW1 goes Down, traffic is switched
through the bypass PW between PE1 and PE2. After detecting the fault, PE3 switches the
traffic from PW1 to PW2, and PE1 switches the traffic to the bypass PW. Thereafter, PE2, as
an SPE, forwards packets along PW2 and the bypass PW.
As a result, the unidirectional traffic path becomes RNC -> PE1 -> bypass PW -> PE2 ->
PW2 -> PE3 -> Node B.
If the public network is configured with a protection policy such as TE FRR or LDP FRR,
you do not need to configure the bypass PW.
11.6.3.2 PW APS Application
Figure 11-52 shows typical PW APS networking. The network comprises an access ring and
an aggregation ring. A BTS/NodeB is connected to a CSG. A BSC/RNC is connected to an
RSG. Primary and secondary PWs are established between a CSG and an RSG. The PWs can
be either single-segment PWs (SS-PWs) or multi-segment PWs (MS-PWs). A BTS/NodeB
communicates with a BSC/RNC through a mobile broadband (MBB) network.
PW APS is deployed on the bearer network to improve reliability. APS instances are
configured on CSGs and RSGs, and the primary/secondary PW protection group is associated
with each APS instance. APS instructs the source and destination ends to implement
bidirectional protection switching in the same manner to achieve delayed switching and WTR
for PW protection.

Figure 11-52 Typical PW-APS networking
(Diagram labels: BTS, NodeB, BTS/NodeB; CSG1, CSG2, CSG3; SPE1, SPE2, SPE3, SPE4; RSG;
BSC/RNC; PW APS; primary PW; secondary PW; access ring; aggregation ring.)

Acronym & Abbreviation    Full Name
APS Automatic Protection Switching
E-PW APS Enhanced PW APS
PE Provider Edge
PW Pseudo Wire
SPE Switching PE
UPE Ultimate PE
VC Virtual Circuit
11.7 IP Hard Pipe

This chapter provides a description of IP hard pipe and describes its purpose, benefits,
principles, and applications.

11.7.1 Introduction
Definition
An IP hard pipe is an MPLS LSP or a PW with a bandwidth that is guaranteed and can neither
be exceeded nor infringed upon. IP hard pipe provides quality guarantee for leased line
services of high-value customers.
In the IP hard pipe solution, the U2000 manages bandwidth resources network-wide. The
physical interface bandwidth on the public network is divided and allocated to soft and hard
pipes. For example, on a 10G Ethernet interface, 2 Gbit/s bandwidth is allocated to the hard
pipe, and the remaining 8 Gbit/s is allocated to the soft pipe. The hard and soft pipe
bandwidths are isolated and cannot be preempted.
Figure 11-53 IP hard pipe networking
(Diagram labels: U2000; Enterprise 1 and Enterprise 2 sites at both ends; PE, P, PE; a
physical interface divided into a soft pipe, labeled 3G bps, and a hard pipe, labeled 2G bps.)
Purpose
Customers who have strict bandwidth, delay, and security requirements generally use
synchronous digital hierarchy (SDH) networks. Retaining these customers is expensive
because carriers must maintain both IP and SDH networks. Therefore, to reduce maintenance
costs and facilitate user management, carriers expect to migrate their SDH networks to IP
networks.
To meet these expectations, IP hard pipe has been developed. IP hard pipe provides SDH-like
service quality for access services on IP networks by providing guaranteed bandwidth and low
delay. It also provides granular and service-specific OAM and SLA monitoring, which can
accelerate the migration of SDH networks to IP networks.
Benefits
IP hard pipe offers the following benefits to carriers:
• Deployment of high-quality leased lines for VIP customers on newly deployed or
existing routers, reducing SDH network construction and costs for maintaining both
SDH and IP networks
• Rapid service protection, ensuring highly reliable service quality
• Granular service quality measurement using IP FPM, providing flexible and effective
maintenance and management for leased lines dedicated to VIP customers
11.7.2 Principles
This section describes the implementation principles of IP hard pipe.
11.7.2.1 Centralized Management of Hard-Pipe-based Leased Line Services on the NMS
In the IP hard pipe solution, the NMS centrally manages bandwidth resources and implements
service provisioning. Hard pipe service provisioning involves two steps:
1. Establish a hard pipe plane.
Figure 11-54 IP hard pipe plane establishment (1)
(Diagram labels: physical network of PE, P, P, PE with 10G links; the same topology with
individual links divided into 1G/9G and 2G/8G.)
In the physical network topology, select the public network links that require hard pipe
deployment and set the hard pipe bandwidth for each link. The hard pipe topology is then
established. On the network shown in Figure 11-55, after hard pipes are established over
the entire network, the physical network is divided into two logical networks: a hard pipe
network and a normal service network (called a soft pipe network).
Figure 11-55 IP hard pipe plane establishment (2)
(Diagram labels: hardened pipe network of PEs and Ps with link bandwidths of 1G, 2G, and 3G;
normal traffic network of the same nodes with link bandwidths of 7G, 8G, 9G, and 10G.)
2. Provision services.
The service bandwidth, source and destination devices, and service IDs must be
manually configured for VIP customers. The intermediate path can be manually
configured or automatically calculated by the NMS.
The NMS checks whether the hard pipe bandwidth on each node is adequate for service
provisioning:
– If the bandwidth is inadequate, the NMS stops service provisioning and displays an
error message.
– If the bandwidth is adequate, the NMS delivers configurations to devices.
After services are provisioned, the NMS updates the bandwidth resource database.
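The bandwidth check in step 2 can be sketched as a simple admission test. This is a minimal illustration, not the U2000 logic: the class, its fields, the link names, and the use of Mbit/s as the unit are all assumptions.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


class HardPipeDatabase:
    """Toy model of the NMS bandwidth resource database for hard pipes."""

    def __init__(self, hard_pipe_mbps: Dict[Link, int]):
        # Unallocated hard pipe bandwidth per link, in Mbit/s (assumed unit).
        self.free = dict(hard_pipe_mbps)

    def provision(self, path: List[Link], demand_mbps: int) -> bool:
        """Admit the service only if every link on the path has enough bandwidth."""
        short = [link for link in path if self.free.get(link, 0) < demand_mbps]
        if short:
            return False  # inadequate bandwidth: stop provisioning, change nothing
        for link in path:
            self.free[link] -= demand_mbps  # adequate: deliver and update the database
        return True
```

The check is all-or-nothing: if any single link lacks bandwidth, no link is debited, so a rejected request leaves the database unchanged.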

Figure 11-56 IP hard pipe service provisioning
(Diagram labels: U2000; PE, P, P, PE with hard pipe link bandwidths of 1G, 2G, and 3G;
Leased Line 1: 300M at both ends.)
11.7.2.2 Interface-based Hard Pipe Bandwidth Reservation

During network planning, traffic is classified as hard pipe traffic or soft pipe traffic on each
interface. The total hard pipe and soft pipe bandwidth does not exceed the interface
bandwidth. Hard and soft pipe services do not affect each other.
Hard pipe services on the ATN have a higher priority than soft pipe services. If traffic is
transmitted through both the hard pipe and soft pipe, the bandwidth and low delay are
preferentially guaranteed for hard pipe traffic.
The ATN models have different chip capabilities, and therefore the hard pipe implementations
are also different.
The ATN 950B (AND1CXPA/AND1CXPB) supports IP hard pipe in interface-exclusive mode. In this
mode, if an interface is planned to carry hard pipe traffic, it cannot be used to transmit
soft pipe traffic. This mode ensures low delay and no packet loss for hard pipe traffic.
The ATN 910B/ATN 950B (AND2CXPB/AND2CXPE) supports IP hard pipe in interface-shared mode.
In this mode, an interface can carry both hard pipe and soft pipe traffic. Hard pipe
services have a higher priority than soft pipe services, and the hard pipe bandwidth is
exclusive to hard pipe traffic.
The ATN 910/ATN 910I/ATN 905 supports IP hard pipe in interface-shared mode. In this
mode, an interface can carry both hard pipe and soft pipe traffic. Hard pipe traffic enters only
the CS7 queue, whereas soft pipe traffic can enter any queue from BE to CS7. The bandwidth
available to soft pipe traffic entering the CS7 queue equals the interface bandwidth minus the
hard pipe bandwidth. Bandwidth unused by hard pipe traffic can be used by soft pipe traffic.
If a hard pipe is configured on an interface, do not apply for extended queues for soft pipe
traffic. If you do so, packet loss or high delay may occur for hard pipe traffic.
E2E hard pipe services can only be deployed using an NMS. The NMS delivers hard pipe
VLL and TE LSP configurations based on the hard pipe's processing capabilities. A device
establishes VLL PWs and TE LSPs based on the delivered data and transmits VLL and TE
services through the hard pipe.

The NMS supports alarm thresholds for services exceeding the hard pipe's processing
capabilities, ensuring that services transmitted over the hard pipe do not exceed the hard
pipe's processing capabilities.
11.7.2.3 AC Interface Service Bandwidth Limitation

User-specific bandwidth limitation is implemented using interface-based CAR on the user
side.
The ATN 910/ATN 910I/ATN 910B/ATN 905/ATN 950B (AND2CXPB/AND2CXPE) implements
priority-based CAR, which ensures that high-priority user packets are preferentially
forwarded.
The ATN 910B/ATN 950B (AND2CXPB/AND2CXPE), which uses non-priority-based CAR, cannot
guarantee the preferential forwarding of high-priority user packets.
The ATN 950B (AND1CXPA/AND1CXPB), which uses non-priority-based CAR, cannot guarantee the
preferential forwarding of high-priority user packets.
11.7.2.4 Hard Pipe TE LSP
Principles
After the hard pipe bandwidth is reserved on a physical interface on a carrier network, the
logical hard pipe network comes into being. At this point, path planning is required for
service provisioning. Static bidirectional co-routed TE LSPs can be established to provide
P2P leased line services between two PEs.
After a carrier determines the PE for user access on the NMS, the network transmission paths
can be manually specified or automatically generated. When the NMS automatically plans
paths, hard pipe bandwidth is reserved on a hop-by-hop basis along the transmission path
based on user access bandwidth. If the hard pipe bandwidth of all links on the transmission
path meets the user access service requirements, a hard pipe TE LSP is established between
the PEs. The NMS then updates the bandwidth resource database. This implements hard pipe
services over the TE LSP.
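Automatic path planning under a bandwidth constraint can be sketched as a search over only those links whose free hard pipe bandwidth covers the user access bandwidth. The text does not say how the NMS selects among candidate paths, so the fewest-hops criterion used here is an assumption, as are the function name, the node names, and the Mbit/s unit.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def find_path(free_mbps: Dict[Tuple[str, str], int], src: str, dst: str,
              demand_mbps: int) -> Optional[List[str]]:
    """Breadth-first search for a fewest-hop path whose links all fit the demand."""
    # Keep only links with enough free hard pipe bandwidth; links are bidirectional.
    neighbors: Dict[str, List[str]] = {}
    for (a, b), free in free_mbps.items():
        if free >= demand_mbps:
            neighbors.setdefault(a, []).append(b)
            neighbors.setdefault(b, []).append(a)
    previous: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cursor: Optional[str] = node
            while cursor is not None:      # walk back to the source
                path.append(cursor)
                cursor = previous[cursor]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no path satisfies the bandwidth requirement
```

When the direct route has a link that is too small, the search returns a longer route that fits, and it returns None when no route can carry the requested bandwidth, in which case no hard pipe TE LSP would be established.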
11.7.2.5 Hard Pipe VLL/PWE3 PW
Principles
After the hard pipe bandwidth is reserved on a physical interface on a carrier network, the
logical hard pipe network comes into being. At this point, path planning is required for
services provisioned to users.
After a carrier determines the PE for user access on the NMS, the network transmission paths
can be either manually specified or automatically generated. After a transmission path is
determined and a hard pipe TE tunnel is established, the NMS reserves the hard pipe
bandwidth on a hop-by-hop basis on the TE tunnel based on user access bandwidth. A PW
can then be established and bound to the TE tunnel, and bandwidth limitation can be deployed
on the AC interface. This implements hard pipe services over the VLL/PWE3 PW, with
guaranteed bandwidth and low delay.

Service Bandwidth Expansion

In compliance with the VLL/PWE3 service packet encapsulation standards, the outbound
interface on the public network encapsulates a public network header into access service
packets before sending them out. As a result, the packet length is increased, and the access
user bandwidth must also be increased on the public network side.
The U2000 must reserve bandwidth for the hard pipe on the public network side based on the
expanded access user bandwidth.
Figure 11-57 MPLS L2VPN networking
(Diagram labels: CE, PE, PE, CE; AC on each side; VC and tunnel between the PEs across the
MPLS network.)
On the network shown in Figure 11-57, the length of packets received on a PE from a CE is
L1 (CRC length included). The length of the packets sent by the PE to the public network
interface is L2. The public network interface is an Ethernet interface that sends double-tagged
packets. L2 is calculated as follows:
L2 = L1 + Public network header length (length of the destination MAC address, source MAC
address, Eth_Type, outer VLAN tag, inner VLAN tag, TE label, and VC label)
L2 = L1 + 30
The calculation shows that the public network packet length is determined by the following
factors:
• Service packet length
• Public network link type
The service packet length varies. Even in a data flow from the same access user, packet
lengths will vary. This variability means that Ethernet links cannot use a fixed bandwidth
expansion proportion. However, a bandwidth expansion proportion can be calculated based on
the average packet length.
The VLL bandwidth expansion proportion parameters can be configured. The default value is
calculated based on the average parameter values:
1. Service packet length

The average packet length of 300 bytes is used as the default packet length.
2. Public network link type
The public network interface uses Ethernet encapsulation and sends packets carrying
double VLAN tags.
The default bandwidth expansion proportion is: 30/300 = 10%

NOTE
The bandwidth expansion proportion varies according to the POS and Ethernet encapsulation lengths
and the number of VLAN tags on the Ethernet network.
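The default expansion proportion can be reproduced with a short calculation. The per-field byte counts below are the standard Ethernet and MPLS field sizes, which add up to the 30 bytes used in the text; the function names and the 300 Mbit/s example are illustrative assumptions.

```python
# Public network header for a double-tagged Ethernet interface, in bytes.
PUBLIC_HEADER_BYTES = {
    "destination MAC address": 6,
    "source MAC address": 6,
    "Eth_Type": 2,
    "outer VLAN tag": 4,
    "inner VLAN tag": 4,
    "TE label": 4,
    "VC label": 4,
}
HEADER_LENGTH = sum(PUBLIC_HEADER_BYTES.values())  # L2 - L1


def expansion_proportion(avg_packet_bytes: int = 300) -> float:
    """Header overhead relative to the average service packet length."""
    return HEADER_LENGTH / avg_packet_bytes


def public_side_bandwidth(access_mbps: float, avg_packet_bytes: int = 300) -> float:
    """Bandwidth to reserve on the public network side for a given access bandwidth."""
    return access_mbps * (1 + expansion_proportion(avg_packet_bytes))
```

Under these defaults, a 300 Mbit/s access service needs about 330 Mbit/s of hard pipe bandwidth on the public network side. A smaller average packet length gives a larger proportion, which is why the parameters are configurable.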
11.7.2.6 Hard Pipe Reliability

IP hard pipe provides reliability for both services and public network tunnels.
Deploy TP OAM and PW APS for primary and secondary static VLL PWs to provide service
protection switching within 50 ms.
Deploy TP OAM and TE APS for static TE tunnel protection groups to provide tunnel
protection switching within 50 ms.
The reliability mechanism for IP hard pipe is the same as that for common services.
11.7.2.7 Hard Pipe Service Quality Monitoring

IP FPM can be deployed to monitor the real-time performance of service flows. TP OAM or
Y.1731 can be deployed to measure packet loss and delay for hard pipe VLL/PWE3 PWs and
TE tunnels.
The service quality monitoring mechanism for IP hard pipe is the same as that for common
services.
11.7.3 Applications
This section describes typical IP hard pipe applications.
IP hard pipe applies to P2P leased line services of high-end enterprise users.
11.7.3.1 Hard-Pipe-based Enterprise Leased Line Application

Carriers must plan the logical hard pipe network on existing IP bearer networks and reserve
the hard pipe bandwidth on physical interfaces.
On the network shown in Figure 11-58, an enterprise user wants to establish a leased line
between two sites over a carrier's network. The carrier first plans a path and then establishes a
static bidirectional LSP dedicated to the hard pipe over the path. The PEs then establish a
hard-pipe-based static PW. Bandwidth is reserved on a hop-by-hop basis along the path. If
Ethernet links that share bandwidth are used for user access, configure QoS on AC interfaces
to limit the access bandwidth. Subsequently, a hard-pipe-based enterprise leased line is
established. If the hard pipe bandwidth is insufficient, the NMS does not allow the static PW
to be established.
Figure 11-58 Hard-pipe-based enterprise leased line application
(Diagram labels: user network, PE, P, P, PE, user network; PW across the IP hard pipe
network.)
11.7.3.2 Hard-Pipe-based Enterprise Leased Line Protection

Tunnel protection groups can be configured for hard-pipe-based enterprise leased lines to
protect the network against faults. On the network shown in Figure 11-59, master and backup
tunnels are established using hard pipes between PEs, forming a tunnel protection group.
Hard-pipe-based PWs are established over the tunnel protection group. TP OAM is deployed
for the master tunnel. If the master tunnel link or a P fails, traffic can be quickly switched to
the backup tunnel, implementing protection switching.
Figure 11-59 Hard-pipe-based enterprise leased line tunnel protection
(Diagram labels: user network, PE, P, P, PE, user network on the master tunnel; a backup
tunnel through another P; PW across the IP hard pipe network.)
To protect PE1, deploy primary and secondary PWs. On the network shown in Figure 11-60,
if the master PE fails, traffic is switched to the backup PE, implementing user node
protection.
Figure 11-60 Hard-pipe-based enterprise leased line PW protection
(Diagram labels: user network; three PEs; primary PW; secondary PW; IP hard pipe network;
user network.)
11.7.3.3 Hard-Pipe-based Leased Line Services Implemented Using Both Huawei and Non-Huawei
Devices
Hard pipe deployment requires that all devices on the network support hard pipe and that the
U2000 be deployed. When Huawei devices are connected over a network built with non-Huawei
devices to implement E2E hard pipe services, and the non-Huawei devices also support the PWs
used to implement hard pipe, multi-segment PWs (MS-PWs) can be deployed.
Figure 11-61 Hard-pipe-based leased line services implemented using both Huawei and
non-Huawei devices
(Diagram labels: AS1 and AS2 (Huawei devices), each with a user network, PE, P, and SPE;
AS3 (non-Huawei devices) with two ASBRs.)
Terms
Term            Definition
IP hard pipe    A technology that provides IP leased line services with strict bandwidth
                guarantee and low delay.
SLA             Service level agreement. A service agreement between a customer and a
                service provider, defining the service type and quality and payment for a
                customer.
SDH             Synchronous digital hierarchy. A comprehensive transmission network
                that integrates multiplexing, line transmission, and switching function
                operated by the NMS.
11.8 VPLS

11.8.1 Introduction to VPLS

Definition
As an MPLS-based point-to-multipoint (P2MP) L2VPN service provided over a public
network, the virtual private LAN service (VPLS) ensures that geographically isolated user
sites can communicate over metropolitan area networks (MANs) and wide area networks
(WANs) as if they were on the same local area network (LAN). VPLS is also called the
Transparent LAN Service (TLS).
Figure 11-62 shows a typical VPLS networking mode. In this networking, users located in
different geographical regions communicate with each other over different PEs. From the
perspective of users, an MPLS network is a Layer 2 switched network that allows them to
communicate with each other in a way similar to communication over a LAN.
Figure 11-62 Typical VPLS networking
(Diagram labels: MPLS backbone with PE1, PE2, PE3; CE1 (VPN1 site1), CE2 (VPN2 site2),
CE3 (VPN1 site3), CE4 (VPN2 site4), CE5 (VPN1 site5).)
Purpose
As enterprises set up more and more branches in different regions and office flexibility
increases, applications such as VoIP, instant messaging, and teleconferencing are increasingly
widely used. This imposes high requirements for end-to-end (E2E) datacom technologies. A
network capable of providing P2MP services is the key to datacom function implementation.
Traditional asynchronous transfer mode (ATM) and frame relay (FR) technologies provide
only Layer 2 point-to-point (P2P) connections. In addition, those network types have
disadvantages such as high construction costs, low speed, and complex deployment. The
development of IP has led to the MPLS VPN technology, which can provide VPN services
over an IP network and offers advantages such as easy configuration and flexible bandwidth
control. MPLS VPNs can be classified into MPLS L2VPNs and MPLS L3VPNs.

• Traditional MPLS L2VPNs, such as the virtual leased lines (VLLs) or virtual private
wire services (VPWSs), can provide P2P services but not P2MP services over a public
network.
• MPLS L3VPNs can provide P2MP services on the precondition that PEs keep routes
destined for end users. This implementation requires high routing performance of PEs.
To solve the preceding problems, VPLS, an MPLS-based Ethernet technology, is introduced.
• Like Ethernet, VPLS supports P2MP communication.
• VPLS is a Layer 2 label switching technology. From the perspective of users, the entire
MPLS IP backbone network is a Layer 2 switching device. PEs do not need to keep
routes destined for end users.
VPLS provides a more complete multipoint communication solution, integrating the
advantages provided by Ethernet and MPLS. By emulating traditional LAN functions, VPLS
enables users on different LANs to communicate with each other over MPLS networks as if
they were on the same LAN.
Benefits
VPLS brings the following benefits:
• VPLS networks can be constructed based on carriers' IP networks, reducing construction
costs.
• VPLS networks inherit the high-speed advantage of Ethernet.
• VPLS networks allow users to communicate over Ethernet links, regardless of whether
these links are on WANs or LANs. This feature allows services to be rapidly and flexibly
deployed.
• VPLS networks free carriers from configuring and maintaining routing policies, reducing
operational expenditure.
11.8.2 Principles
11.8.2.1 VPLS Introduction

NOTE
The ATN implements the VPLS control plane by running LDP. VPLS based on LDP is referred to
as Martini VPLS.
Basic VPLS Transport Structure

Figure 11-63 shows an example of a VPLS network. The entire VPLS network is similar to a
switch. PWs are established over MPLS tunnels between VPN sites to transparently transmit
Layer 2 packets between sites. When forwarding packets, PEs learn the source MAC
addresses of these packets and create MAC entries, mapping MAC addresses to attachment
circuits (ACs) and PWs.
The following table describes the various concepts related to VPLS networks.

Table 11-12 Description of VPLS concepts
Name            Description
AC              A link between a CE and a PE. An AC must be established using Ethernet
                interfaces. On a VPLS network, AC interfaces can be Ethernet interfaces,
                Ethernet sub-interfaces, VLANIF interfaces, Eth-Trunk interfaces,
                Eth-Trunk sub-interfaces, VE interfaces, and QinQ interfaces.
PW              A bidirectional virtual connection between two virtual switch instances
                (VSIs) residing on two PEs. A PW consists of a pair of unidirectional
                MPLS VCs transmitting in opposite directions.
VSI             A type of instance used to map ACs to PWs. A VSI independently
                provides VPLS services and forwards Layer 2 packets based on MAC
                addresses and VLAN tags. A VSI has the Ethernet bridge function and
                can terminate PWs.
PW signaling    A type of signaling used to create and maintain PWs. PW signaling is the
                foundation for VPLS implementation. Currently, the PW signaling is
                LDP or BGP.
Tunnel          A connection between a local PE and a remote PE used to transparently
                transmit data between PEs. A tunnel can carry multiple PWs and the
                tunnel type can be MPLS or GRE.
Forwarder       Similar to a VPLS forwarding table. After a PE receives packets from an
                AC, the forwarder of the PE selects a PW to forward these packets.
Figure 11-63 Basic VPLS transmission process
(Diagram labels: MPLS network with PE1, PE2, PE3; CE1 (VPN1 Site1), CE2 (VPN2 Site1),
CE3 (VPN1 Site2), CE4 (VPN2 Site2), CE5 (VPN1 Site3); forwarder; AC; PW; PW signal; tunnel.)
The forwarding of a packet from CE1 to CE3 on VPN1 is used as an example:

1. CE1 sends a Layer 2 packet to PE1 over an AC.
2. After PE1 receives the packet, the forwarder of PE1 selects a PW for forwarding the
packet.

3. PE1 then adds two MPLS labels to the packet based on the PW forwarding entry and
tunnel information and sends the packet to PE2. The private network label identifies the
PW, and the public network label identifies the tunnel between PE1 and PE2.
4. After PE2 receives the packet from the public tunnel, PE2 removes the private network
label of the packet.
5. The forwarder of PE2 selects an AC and forwards the packet to CE3 over the AC.
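The two-label encapsulation in step 3 can be sketched with the standard 32-bit MPLS label stack entry layout (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL). The function names, label values, and default TTL are illustrative assumptions, and details such as an optional control word or penultimate hop popping are omitted.

```python
import struct
from typing import List, Tuple


def label_entry(label: int, bottom: bool, tc: int = 0, ttl: int = 255) -> bytes:
    """Encode one MPLS label stack entry in network byte order."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("MPLS labels are 20 bits wide")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def encapsulate(frame: bytes, tunnel_label: int, pw_label: int) -> bytes:
    """Prepend the public (tunnel) label and the private (PW) label to a frame."""
    outer = label_entry(tunnel_label, bottom=False)  # identifies the tunnel
    inner = label_entry(pw_label, bottom=True)       # identifies the PW
    return outer + inner + frame


def pop_labels(packet: bytes) -> Tuple[List[int], bytes]:
    """Strip label entries up to the bottom of the stack; return labels and payload."""
    labels: List[int] = []
    offset = 0
    while True:
        (word,) = struct.unpack_from("!I", packet, offset)
        offset += 4
        labels.append(word >> 12)
        if (word >> 8) & 1:  # bottom-of-stack flag reached
            return labels, packet[offset:]
```

The receiving PE uses the inner label to find the PW, and therefore the VSI, before its forwarder selects the AC toward the CE.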
VPLS Implementation Process

Transmission of packets between CEs relies on VSIs configured on PEs, and PWs established
between the VSIs. Figure 11-64 shows transmission of Ethernet frames over full-mesh PWs
between PEs.
Ethernet networks often use the Spanning Tree Protocol (STP) to prevent loops. VPLS
networks, however, use full-mesh PWs and split horizon to avoid loops as follows:
• The PEs in a VSI must be fully meshed. That is, a PE must create a tree path to every
other PE in the VSI.
• Each PE must support split horizon to avoid loops. Split horizon requires that packets
received from a PW in a VSI should not be forwarded to other PWs in the VSI. Any two
PEs in a VSI must communicate over a direct PW, which is why full-mesh PWs are
required between PEs in a VSI.
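MAC learning combined with split horizon can be sketched as a toy VSI forwarder. This is a minimal model under stated assumptions: the class, port names, and MAC strings are hypothetical, and aging, VLAN handling, and broadcast or multicast specifics are left out.

```python
from typing import Dict, List, Set


class Vsi:
    """Toy VSI: learns source MACs and applies split horizon between PWs."""

    def __init__(self, acs: Set[str], pws: Set[str]):
        self.acs = set(acs)
        self.pws = set(pws)
        self.mac_table: Dict[str, str] = {}  # MAC address -> AC or PW it was learned on

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        """Learn the source MAC and return the ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port
        known = self.mac_table.get(dst_mac)
        if known is not None:
            out = {known}
        else:
            out = self.acs | self.pws        # unknown destination: flood
        out.discard(in_port)                 # never send back where it came from
        if in_port in self.pws:
            out -= self.pws                  # split horizon: PW to PW is forbidden
        return sorted(out)
```

A frame arriving on a PW is only ever delivered to local ACs, which is safe precisely because the full mesh guarantees that the originating PE has already sent its own copy to every other PE.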
Figure 11-64 VPLS forwarding model
(Diagram labels: three PEs, each hosting VSI 1 and VSI 2; CEs attached through VLAN1 and
VLAN2.)
A VPLS network consists of a control plane and a forwarding plane:
• The control plane of a VPLS PE provides the PW establishment function, including:
  – Member discovery: a process in which a PE in a VSI discovers the other PEs in the
    same VSI. This process can be implemented manually or automatically using
    protocols. BGP VPLS and BGP AD VPLS both support automatic member
    discovery.
  – Signaling mechanism: PWs between PEs in the same VSI are established,
    maintained, or torn down using signaling protocols such as LDP and BGP.
• The forwarding plane of a VPLS PE provides the data forwarding function, including:
  – Encapsulation: After receiving Ethernet frames from a CE, a PE encapsulates the
    frames into packets and sends the packets to a PSN.
  – Forwarding: A PE determines how to forward a packet based on the inbound
    interface and destination MAC address of the packet.
  – Decapsulation: After receiving packets from a PSN, a PE decapsulates these
    packets into Ethernet frames and sends the frames to a CE.
VPLS Implementation Modes

VPLS can be implemented in LDP, BGP, or BGP AD mode. Table 11-13 compares these
VPLS implementation modes.
Table 11-13 Comparison between VPLS implementation modes

Implementation Mode: LDP
Description: VPLS implemented in LDP mode, also called Martini VPLS, uses LDP signaling.
Characteristics:
l Protocol implementation is simple, and performance requirements for PEs are low, but
member discovery must be carried out manually.
l After a PE is added, PWs between the newly added PE and existing PEs need to be
established.
l An LDP session needs to be established between every two PEs. The number of LDP
sessions is in proportion to the number of PEs squared.
l Labels are assigned to PEs on demand, and label usage is high.
l If a VPLS network spans more than one AS, VSIs in each AS must use the same VSI ID
range.
Usage Scenario: The LDP mode applies to VPLS networks that do not have many sites, do not
span multiple ASs, or have PEs that do not run BGP.

Implementation Mode: BGP
Description: VPLS implemented in BGP mode, also called Kompella VPLS, uses BGP signaling.
Characteristics:
l PEs must run BGP, and demands on PE performance are high. Automatic member discovery
is supported, simplifying user operations.
l After a PE is added, configurations on existing PEs do not need to be modified, as long
as the total number of PEs does not exceed the number allowed by the label block.
l RRs are used to reduce the number of BGP connections, increasing network expansibility.
l Usage of label blocks wastes label resources to some extent.
l VPN targets are used to identify VPN member relationships. This feature allows a VPLS
network to span multiple ASs.
Usage Scenario: The BGP mode applies to VPLS networks that reside on the core layers of
large-scale networks, span multiple ASs, or have PEs that run BGP.

Implementation Mode: BGP AD
Description: VPLS BGP AD mode uses extended BGP Update packets to implement automatic
member discovery. It also uses LDP FEC 129 signaling packets for local and remote VSIs to
automatically negotiate and establish VPLS PWs.
Characteristics: The BGP AD mode supports automatic member discovery and VPLS PW
establishment:
l Compared with the LDP mode, less configuration work is required to add a new PE in BGP
AD mode.
l Compared with the BGP mode, the BGP AD mode saves local label resources and is
compatible with PWE3.
Usage Scenario: The BGP AD mode integrates the advantages of the BGP and LDP modes.
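The session-count remark for the LDP mode can be made concrete. Because an LDP session is needed between every two PEs, a full mesh of n PEs needs n(n-1)/2 sessions, and each newly added PE needs one more session (and PW) per existing PE. The following is a minimal Python sketch of that arithmetic; the function names are illustrative and do not come from the product.

```python
def full_mesh_sessions(pe_count: int) -> int:
    """Number of LDP sessions when every pair of PEs needs one session."""
    if pe_count < 0:
        raise ValueError("pe_count must be non-negative")
    return pe_count * (pe_count - 1) // 2


def sessions_added_by_new_pe(existing_pe_count: int) -> int:
    """Sessions (and PWs) that one newly added PE contributes to the mesh."""
    return (full_mesh_sessions(existing_pe_count + 1)
            - full_mesh_sessions(existing_pe_count))


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(n, full_mesh_sessions(n), sessions_added_by_new_pe(n))
```

The quadratic growth is the reason the table recommends the LDP mode only for networks without many sites.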
VPLS Encapsulation Modes

l Packet encapsulation on ACs
Packet encapsulation on ACs depends on the user access mode, which can be VLAN or
Ethernet access. The default user access mode is VLAN access.

Table 11-14 Packet encapsulation on ACs

Packet Encapsulation Type: VLAN
Description: The header of each Ethernet frame sent between CEs and PEs carries a VLAN tag,
known as the provider-tag (P-Tag). This is a service delimiter identifying users on an ISP
network.

Packet Encapsulation Type: Ethernet
Description: The header of each Ethernet frame sent between CEs and PEs does not carry a
P-Tag. If the frame header contains a VLAN tag, it is an inner VLAN tag called the user-tag
(U-Tag). A CE does not add the U-Tag to an Ethernet frame; instead, the tag is carried in a
packet before the packet is sent to the CE. A U-Tag informs the CE to which VLAN the packet
belongs, and is meaningless to PEs.
l Packet encapsulation on PWs

The PW ID and PW encapsulation type uniquely identify a PW. The PW IDs and PW
encapsulation types configured on the two end PEs of a PW must be the same. The
packet encapsulation types of packets on PWs can be raw or tagged. By default, packets
on PWs are encapsulated in tagged mode.
Table 11-15 Packet encapsulation on PWs

Packet Encapsulation Type: Raw
Description: Packets transmitted over a PW cannot carry P-Tags. If a PE receives a packet
with the P-Tag from a CE, the PE strips the P-Tag and adds double labels (outer tunnel label
and inner VC label) to the packet before forwarding it. If a PE receives a packet with no
P-Tag from a CE, the PE directly adds double labels (outer tunnel label and inner VC label)
to the packet before forwarding it. The PE determines whether to add the P-Tag to a packet
based on actual configurations before sending it to a CE. The PE is not allowed to rewrite
or remove an existing U-Tag.

Packet Encapsulation Type: Tagged
Description: Packets transmitted over a PW must carry P-Tags. If a PE receives a packet with
the P-Tag from a CE, the PE directly adds double labels (outer tunnel label and inner VC
label) to the packet before forwarding it. If a PE receives a packet with no P-Tag from a
CE, the PE adds a null P-Tag and double labels (outer tunnel label and inner VC label) to
the packet before forwarding it. The PE determines whether to rewrite, remove, or preserve
the P-Tag of a packet based on actual configurations before forwarding it to a CE.
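The ingress rules in Table 11-15 can be sketched in a few lines of Python. This is an illustrative model only: the class names, the mode strings "raw" and "tagged", and the value chosen for the null P-Tag are assumptions, not product data structures.

```python
from dataclasses import dataclass
from typing import Optional

NULL_P_TAG = 0  # assumption: stand-in value for the "null P-Tag"


@dataclass(frozen=True)
class Frame:
    p_tag: Optional[int]  # service-delimiting VLAN tag, None if absent
    u_tag: Optional[int]  # user VLAN tag; the PE never rewrites or removes it
    payload: bytes


@dataclass(frozen=True)
class PwPacket:
    tunnel_label: int  # outer label
    vc_label: int      # inner label
    frame: Frame


def encapsulate_for_pw(frame: Frame, pw_mode: str,
                       tunnel_label: int, vc_label: int) -> PwPacket:
    """Apply the raw or tagged rule, then push the double labels."""
    if pw_mode == "raw":
        # Raw mode: the packet on the PW must not carry a P-Tag.
        inner = Frame(None, frame.u_tag, frame.payload)
    elif pw_mode == "tagged":
        # Tagged mode: keep the P-Tag, or add a null P-Tag if none exists.
        p_tag = frame.p_tag if frame.p_tag is not None else NULL_P_TAG
        inner = Frame(p_tag, frame.u_tag, frame.payload)
    else:
        raise ValueError(f"unknown PW encapsulation mode: {pw_mode}")
    return PwPacket(tunnel_label, vc_label, inner)
```

The egress side is omitted on purpose, because the table states that the handling of the P-Tag toward the CE depends on the actual configuration.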
Encapsulation modes of packets transmitted over ACs and PWs can be used together. As
shown in Figure 11-65, CE1 and CE3 are connected to the PEs in VLAN access mode,
whereas CE2 and CE4 are connected to the PEs in Ethernet access mode. Packets on the PW
between PE1 and PE3 are encapsulated in tagged mode, whereas packets on the PW between
PE2 and PE4 are encapsulated in raw mode. The following uses Ethernet+raw encapsulation
and VLAN+tagged encapsulation as examples to describe the packet exchange process.
Figure 11-65 Packet encapsulation on the VPLS AC and PW sides
(Figure content: CE1 and CE3 connect to PE1 and PE3 in VLAN access mode, and the PW between
PE1 and PE3 uses tagged mode. CE2 and CE4 connect to PE2 and PE4 in Ethernet access mode, and
the PW between PE2 and PE4 uses raw mode.)
Packet from CE1 to CE3 on the ACs: L2 Header, P-Tag, IP Header, Data
Packet from CE1 to CE3 on the PW: L2 Header, Tunnel Label, VC Label, L2 Header, P-Tag, IP Header, Data
Packet from CE2 to CE4 on the ACs: L2 Header, IP Header, Data
Packet from CE2 to CE4 on the PW: L2 Header, Tunnel Label, VC Label, L2 Header, IP Header, Data
l Ethernet+raw encapsulation
As shown in Figure 11-65, Ethernet+raw encapsulation is used on the path of CE2 ->
PE2 -> PE4 -> CE4 and its reverse path. The packet exchange process is as follows:
a. CE2 sends a Layer 2 packet without a P-Tag to PE2.
b. PE2 searches the corresponding VSI for a forwarding entry and selects a tunnel and
a PW to forward the packet based on the found forwarding entry. PE2 adds double
labels (outer tunnel label and inner VC label) to the packet based on the selected
tunnel and PW, performs Layer 2 encapsulation, and forwards the packet to PE4.

c. Upon receipt, PE4 removes the Layer 2 encapsulation carried out by PE2 and the
double labels.
d. PE4 sends the original Layer 2 packet to CE4.
The process of sending a packet from CE4 to CE2 is similar to this process.
l VLAN+tagged encapsulation
As shown in Figure 11-65, VLAN+tagged encapsulation is used on the path of CE1 ->
PE1 -> PE3 -> CE3 and its reverse path. The packet exchange process is as follows:
a. CE1 sends a Layer 2 packet with a P-Tag to PE1.
b. PE1 retains the P-Tag because a packet sent to a PW with the tagged packet
encapsulation mode must carry a P-Tag.
c. PE1 searches the corresponding VSI for a forwarding entry and selects a tunnel and
a PW to forward the packet based on the found forwarding entry. PE1 adds double
labels (outer tunnel label and inner VC label) to the packet based on the selected
tunnel and PW, performs Layer 2 encapsulation, and forwards the packet to PE3.
d. Upon receipt, PE3 removes the Layer 2 encapsulation carried out by PE1 and the
double labels.
e. PE3 sends the original Layer 2 packet to CE3.
The process of sending a packet from CE3 to CE1 is similar to this process.
Derivative VPLS Functions

Traffic Statistics
Traffic statistics can be collected based on VSIs or VSI peers, and the status of various types
of traffic can be viewed in real time.
Suppression of Label Request Messages
If Huawei devices need to communicate with non-Huawei devices that do not support Label
Request messages, you can disable the function for sending Label Request messages on
Huawei devices to allow communication between all the devices.
VPLS Service Isolation
VPLS service isolation allows you to prohibit communication between users that use the same
service and are bound to the same VSI. For example, HSI users bound to the same VSI cannot
communicate with each other.
On the network shown in Figure 11-66, CE1, CE2, and CE3 access the same VSI. With
VPLS service isolation, you can configure CE1 and CE3 to communicate with each other and
prohibit CE2 from communicating with CE1 or CE3.
Figure 11-66 VPLS service isolation
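(Figure content: CE1, CE2, and CE3 access the same VSI through PE1 and PE2 on the VPLS
network. Mutual visits are allowed between CE1 and CE3 and prohibited between CE2 and the
other CEs.)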
By default, traffic can be forwarded between AC interfaces, between UPE PWs, and between
AC interfaces and UPE PWs in a VSI. On a non-hierarchical VPLS network, VPLS service
isolation prohibits traffic forwarding between AC interfaces. On an HVPLS network, VPLS
service isolation prohibits traffic forwarding between AC interfaces, between UPE PWs, and
between AC interfaces and UPE PWs.
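The forwarding rules in the preceding paragraph amount to a small decision function. The sketch below is a hedged illustration in Python; the constants AC and UPE_PW and the function name are hypothetical, and only the interface types named in the paragraph are modeled.

```python
AC = "ac"          # attachment circuit interface
UPE_PW = "upe_pw"  # PW toward a UPE


def may_forward(src: str, dst: str,
                isolation_enabled: bool, hierarchical: bool) -> bool:
    """Return True if traffic may be forwarded from src to dst in one VSI."""
    kinds = {AC, UPE_PW}
    if src not in kinds or dst not in kinds:
        raise ValueError("src and dst must be AC or UPE_PW")
    if not isolation_enabled:
        # Default behavior: AC-AC, UPE PW-UPE PW, and AC-UPE PW are all allowed.
        return True
    if hierarchical:
        # HVPLS: all three combinations are prohibited.
        return False
    # Non-hierarchical VPLS: only AC-to-AC forwarding is prohibited.
    return not (src == AC and dst == AC)
```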
11.8.2.2 LDP VPLS
Background
Label Distribution Protocol (LDP) virtual private LAN service (VPLS), also called Martini
VPLS, uses static (manually configured) member discovery and LDP signaling.
VPLS information is carried in extended type-length-value (TLV) fields (type 128 and type
129 FEC TLVs) of LDP signaling packets. During the establishment of a pseudo wire (PW),
the label distribution mode is downstream unsolicited (DU) and the label retention mode is
liberal.
Related Concepts
LDP VPLS involves the following concepts:
l FEC: A set of packets with similar or identical characteristics and forwarded in the same
way by label switching routers (LSRs). Characteristics determining the FEC of a packet
include the destination address, service type, and QoS attribute.
l TLV: A highly efficient and expansible coding mode for protocol packets. To support
new features, you only need to add new types of TLVs to carry information required by
the features.
l DU: A label distribution mode in which an LSR distributes labels to FECs without
having to receive Label Request messages from its upstream LSR.
l Liberal: A label retention mode in which an LSR retains the label mapping received from
a neighboring LSR, regardless of whether the neighboring LSR is its next hop. In liberal
label retention mode, an LSR can use the labels sent from neighboring LSRs that are not
at the next hop to re-establish an LSP. This mode requires more memory and label space
than the conservative mode.
Implementation Process
l Figure 11-67 shows the process of establishing a PW using LDP signaling.
Figure 11-67 Establishing a PW using LDP signaling

(Figure content: PE1 and PE2, each with a VSI, send each other a Label Mapping message
carrying the PW ID and VC label; VC1 and VC2 are the two resulting unidirectional VCs.)
a. After PE1 is associated with a VSI, and PE2 is configured as a peer of PE1, PE1
sends a Label Mapping message to PE2 in DU mode if an LDP session already
exists between PE1 and PE2. The Label Mapping message carries information
required to establish a PW, such as the PW ID, VC label, and interface parameters.
b. Upon receipt of the message, PE2 checks whether it has been associated with
the VSI. If PE2 has been associated with the VSI and PW parameters on PE1 and
PE2 are consistent, PE1 and PE2 belong to the same VSI. In this case, PE2
establishes a unidirectional VC named VC1 immediately after PE2 receives the
Label Mapping message. Meanwhile, PE2 sends a Label Mapping message to PE1.
After receiving the message, PE1 takes a similar sequence of actions to PE2 and
establishes VC2.
l Figure 11-68 shows the process of tearing down a PW using LDP signaling.
Figure 11-68 Tearing down a PW using LDP signaling

(Figure content: PE1 sends a Label Withdrawal message to PE2, and PE2 replies with a Label
Release message; VC1 and VC2 between the two VSIs are torn down.)
a. After the peer configuration about PE2 is deleted from PE1, PE1 sends a Label
Withdrawal message to PE2. After receiving the Label Withdrawal message, PE2
withdraws its local VC label, tears down VC1, and sends a Label Release message
to PE1.
b. After receiving the Label Release message, PE1 withdraws its local VC label and
tears down VC2.
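The setup and teardown exchanges described above can be traced with a small simulation. This Python sketch is illustrative only: the PE class, the dictionary message layout, and the label values are assumptions, and the consistency check is reduced to the PW ID and PW encapsulation type.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PE:
    name: str
    pw_id: Optional[int]  # PW ID of the local VSI; None if no VSI is bound
    encap: str            # PW encapsulation type, "raw" or "tagged"
    vc_label: int         # label this PE advertises for the PW
    vcs: Dict[str, int] = field(default_factory=dict)  # remote PE -> learned label

    def label_mapping(self) -> dict:
        """Build the Label Mapping message this PE sends in DU mode."""
        return {"from": self.name, "pw_id": self.pw_id,
                "encap": self.encap, "vc_label": self.vc_label}

    def on_label_mapping(self, msg: dict) -> bool:
        """Set up a unidirectional VC if the local VSI and PW parameters match."""
        if self.pw_id is None:
            return False
        if (msg["pw_id"], msg["encap"]) != (self.pw_id, self.encap):
            return False
        self.vcs[msg["from"]] = msg["vc_label"]
        return True

    def tear_down(self, remote: str) -> None:
        """Drop the VC toward the given remote PE, if any."""
        self.vcs.pop(remote, None)


def establish_pw(pe1: PE, pe2: PE) -> bool:
    """PE1 advertises first; PE2 sets up VC1 and answers; PE1 sets up VC2."""
    if not pe2.on_label_mapping(pe1.label_mapping()):
        return False
    return pe1.on_label_mapping(pe2.label_mapping())


def tear_down_pw(pe1: PE, pe2: PE) -> None:
    """PE1 sends Label Withdrawal; PE2 answers with Label Release."""
    pe2.tear_down(pe1.name)  # PE2 handles the Label Withdrawal message (VC1)
    pe1.tear_down(pe2.name)  # PE1 handles the Label Release message (VC2)
```

A PW is usable only after both unidirectional VCs exist, which is why establish_pw returns True only when both Label Mapping messages are accepted.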
Usage Scenario
The LDP mode applies to VPLS networks that do not have many sites, do not span multiple
ASs, or have PEs that do not run BGP.
Benefits
LDP VPLS brings the following benefits:
l Easy configuration
l Saved label resources
11.8.2.3 BGP AD VPLS
Definition
BGP AD VPLS, short for Border Gateway Protocol Auto-Discovery Virtual Private LAN
Service, is a new technology for automatically deploying VPLS services.

BGP AD VPLS-enabled devices exchange extended BGP Update packets to automatically
discover BGP peers in a VPLS domain. After BGP peer relationships are established, these
devices use LDP FEC 129 to negotiate and establish VPLS PWs. In addition, BGP AD
HVPLS is deployed by disabling split horizon. This allows all BGP peers in an AS to function
as UPEs on an HVPLS network.
Purpose
The wide use of VPLS technologies leads to the growing scale of VPLS networks and
configurations. BGP AD VPLS is introduced to simplify configurations, enable automatic
service deployment, and reduce OpEx.
BGP AD VPLS obtains the advantages of both Kompella and Martini VPLS. BGP AD VPLS-
enabled devices exchange extended BGP Update packets to automatically discover BGP peers
in a VPLS domain. After BGP peer relationships are established, these devices use LDP FEC
129 to negotiate and establish VPLS PWs. On the established PWs, VPLS services are
automatically deployed.
Automatic VPLS member discovery and automatic PW deployment simplify the
configurations required by VPLS networks, enable automatic service deployment, and reduce
OpEx for operators.
Concepts
Acronym and Abbreviation (Full Name): Description
VPLS ID (Virtual Private LAN Service ID): Identifier of a VPLS domain
VSI ID (Virtual Switch Instance ID): Identifier of a VSI in a VPLS domain
RD (Route Distinguisher): Route distinguisher in a BGP packet which carries VSI information
RT (Route Target): Route target in a BGP packet which carries VSI information
AGI (Attachment Group Identifier): Domain identifier used during PW negotiation between PEs
in a VPLS domain
AII (Attachment Individual Identifier): VSI identifier used during PW negotiation between
PEs in a VPLS domain
SAII (Source Attachment Individual Identifier): Local IP address used by BGP AD VPLS to
negotiate the creation of a PW
TAII (Target Attachment Individual Identifier): Remote IP address used during negotiation on
the creation of a PW
FEC 129 (Forwarding Equivalence Class 129): New type of FEC used by LDP signaling
Principles
BGP AD VPLS obtains the advantages of both Kompella and Martini VPLS. BGP AD VPLS
automatically discovers VPLS BGP peers, simplifying the configurations and saving labels.
BGP AD VPLS-enabled devices exchange extended BGP Update packets carrying VSI
information and automatically discover BGP peers in a VPLS domain. After BGP peer
relationships are established, these devices use LDP FEC 129 to negotiate and establish VPLS
PWs. On the established PWs, VPLS services are automatically deployed.
Automatically Discovering PEs in a VPLS Domain

Automatically discovering PEs in a VPLS domain is the first phase of VPLS service
deployment. BGP is used to automatically discover PEs in a VPLS domain. Figure 11-69
shows the process and information used for automatically discovering PEs in a VPLS domain.
Figure 11-69 Networking diagram of automatically discovering PEs in a VPLS domain
(Figure content: PE1 (Loopback1 1.1.1.1/32) and PE2 (Loopback1 2.2.2.2/32) reside in AS 65535
and connect CE1 and CE2, respectively.)
BGP Update with Next Hop 1.1.1.1: VPLS-ID 65535:100, RD 65535:100, VSI-ID 1.1.1.1, RT 5:5
BGP Update with Next Hop 2.2.2.2: VPLS-ID 65535:100, RD 65535:100, VSI-ID 2.2.2.2, RT 5:5
The process of automatically discovering PEs in a VPLS domain is as follows:

1. The VPLS ID, RT, and VSI ID are set on PE1 and encapsulated in BGP AD Update
messages. These messages are sent to all peer PEs in all BGP areas. The operations and
process are the same on PE2.
NOTE
By default, the RD is equal to the VPLS ID. If the VPLS ID is set, the RD does not need to be set. The
VSI ID is equal to the local LSR ID and does not need to be set.
2. After receiving BGP AD packets, PEs check whether the BGP AD packets match the RT
policy. If they match, PEs obtain the VSI information carried in the packets and compare
the obtained information to the local configuration. After comparison, either of the
following results is obtained:
– If VPLS IDs of VSIs on both PEs are the same, it indicates that the two VSIs are in
the same VPLS domain. This allows one and only one PW to be established
between them.
– If VPLS IDs of VSIs on the two PEs are different, it indicates that the two VSIs are
in different VPLS domains. This allows no PW between them.
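The discovery check in step 2 can be sketched as follows. This Python model is a simplified illustration: the class names are hypothetical, and "matching the RT policy" is assumed to mean that at least one RT carried in the update is in the local import set. The sample values are taken from Figure 11-69.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AdUpdate:
    vpls_id: str
    rd: str
    vsi_id: str
    rts: FrozenSet[str]
    next_hop: str


@dataclass(frozen=True)
class LocalVsi:
    vpls_id: str
    vsi_id: str                 # equal to the local LSR ID by default
    import_rts: FrozenSet[str]  # assumption: RT policy modeled as an import set


def discover_member(local: LocalVsi, update: AdUpdate) -> Optional[str]:
    """Return the remote VSI ID if a PW may be set up to it, else None."""
    if not (local.import_rts & update.rts):
        return None  # the update does not match the RT policy
    if update.vpls_id != local.vpls_id:
        return None  # the two VSIs are in different VPLS domains
    return update.vsi_id  # same VPLS domain: exactly one PW is allowed


if __name__ == "__main__":
    pe1 = LocalVsi("65535:100", "1.1.1.1", frozenset({"5:5"}))
    from_pe2 = AdUpdate("65535:100", "65535:100", "2.2.2.2",
                        frozenset({"5:5"}), "2.2.2.2")
    print(discover_member(pe1, from_pe2))
```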
Automatically Deploying a PW
After a PE discovers remote PEs in a VPLS domain, BGP AD uses LDP FEC 129 to negotiate
the creation of PWs. Figure 11-70 shows the exchange process and information used during
negotiation.
Figure 11-70 Networking diagram of automatically deploying a PW
(Figure content: PE1 (Loopback1 1.1.1.1/32) and PE2 (Loopback1 2.2.2.2/32) reside in AS 65535
and connect CE1 and CE2, respectively.)
LDP Mapping (FEC 129) with Next Hop 1.1.1.1: AGI 65535:100 (VPLS-ID), SAII 1.1.1.1 (VSI-ID),
TAII 2.2.2.2 (VSI-ID from BGP AD), Label 2001
LDP Mapping (FEC 129) with Next Hop 2.2.2.2: AGI 65535:100 (VPLS-ID), SAII 2.2.2.2 (VSI-ID),
TAII 1.1.1.1 (VSI-ID from BGP AD), Label 2001
The process of automatically deploying a PW is as follows:

1. If no LDP session is established between two PEs in one VPLS domain, the two PEs
initiate negotiation on the creation of an LDP session. If an LDP session is established,
the PEs exchange LDP Mapping messages with each other by using FEC 129 signaling.
The LDP Mapping messages carry information such as the AGI, SAII, TAII, and label.
NOTE
After BGP AD VPLS members are discovered, BGP AD VPLS proactively triggers LDP to establish an
LDP session, allowing the establishment of a PW for VPLS services. If VPLS services are deleted and
this LDP session is no longer used, LDP is proactively triggered to delete the LDP session. This
simplifies maintenance of the LDP session, saves the network resource cost, and improves system
resource usage and network performance.
2. After a PE receives LDP Mapping messages, the PE parses and obtains information
including the VPLS ID, PW type, MTU, and TAII. The PE compares the information to
the local VSI information. If they are the same and meet the requirements for setting up a
PW, the PE sets up a PW to the remote PE.
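The comparison in step 2 can be illustrated with a short Python sketch. The message class, the PW type strings, and the MTU value are assumptions made for the example; the AGI, SAII, TAII, and label values follow Figure 11-70.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fec129Mapping:
    agi: str      # VPLS ID of the sender
    saii: str     # VSI ID of the sender
    taii: str     # VSI ID of the receiver, learned through BGP AD
    label: int
    pw_type: str  # hypothetical string form of the PW type
    mtu: int


@dataclass(frozen=True)
class LocalVsi:
    vpls_id: str
    vsi_id: str
    pw_type: str
    mtu: int


def build_mapping(local: LocalVsi, remote_vsi_id: str, label: int) -> Fec129Mapping:
    """Build the Mapping message a PE sends to a discovered remote VSI."""
    return Fec129Mapping(local.vpls_id, local.vsi_id, remote_vsi_id,
                         label, local.pw_type, local.mtu)


def accept_mapping(local: LocalVsi, msg: Fec129Mapping) -> bool:
    """Compare the received VPLS ID, PW type, MTU, and TAII with the local VSI."""
    return (msg.agi == local.vpls_id
            and msg.taii == local.vsi_id
            and msg.pw_type == local.pw_type
            and msg.mtu == local.mtu)


if __name__ == "__main__":
    pe1 = LocalVsi("65535:100", "1.1.1.1", "vlan", 1500)
    pe2 = LocalVsi("65535:100", "2.2.2.2", "vlan", 1500)
    print(accept_mapping(pe2, build_mapping(pe1, "2.2.2.2", 2001)))
```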
Applications on a Full-mesh Network

On a network shown in Figure 11-71, BGP peer relationships are established between PE1,
PE2, and PE3. BGP AD VPLS is configured on PE1 and PE2 in one VPLS domain. PE3 is to
be added to the VPN as the network needs to be expanded. PE3 is assigned the same VPLS
ID as that on PE1 and PE2, which allows PE3 to join the VPLS domain. BGP AD VPLS is
enabled on PE3 and allows PWs to be automatically established from PE3 to PE1 and PE2.
This simplifies configurations.
Figure 11-71 Networking diagram of full-mesh BGP AD VPLS
(Figure content: CE1, CE2, and CE3 connect to PE1, PE2, and PE3, respectively, across the
MPLS network.)
11.8.2.4 VPLS PW Redundancy
Background
NOTE
Among ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports this function.
To protect against failures and improve reliability, a redundant provider edge (PE) is often
deployed for a service. If a redundant PE is provided for a virtual private wire service
(VPWS) or virtual private LAN service (VPLS), two pseudo wires (PWs) are deployed for
PW protection. This mechanism is called PW redundancy.
PW redundancy, specified in draft-ietf-pwe3-redundancy-bit-04, can effectively increase
equipment switchover efficiency and reduce service interruption, improving network
reliability.
PW redundancy, which is widely used for point-to-point services on VPWS networks, can be
used on VPLS networks because point-to-multipoint VPLS services can be considered as
point-to-point services for each point.
PW redundancy used on VPLS networks is called VPLS PW redundancy. It helps rapidly
converge VPLS networks and reduce service interruption.
Related Concepts
Some key concepts for VPLS PW redundancy are described by using service traffic protection
between customer edge 1 (CE1) and CE2 on the VPLS network in Figure 11-72 as an
example.
Currently, VPLS PW redundancy can operate in two modes, specified on PE1.
l Master/Slave mode: PE1 determines whether a local PW is in the primary or secondary
state based on preset forwarding priorities.
l Independent mode: PE1 determines whether a local PW is in the primary or secondary
state based on the master and backup status of PE2 and PE5.
PEs on the two ends of a PW group must negotiate PW statuses to ensure that they select the
same PW to transmit packets.
l Primary and secondary are terms used to describe PW forwarding priorities. The PW
forwarding priorities can be configured, and a smaller value indicates a higher priority. A
PW with the highest priority is the primary PW.
NOTE
PW forwarding priorities take effect only when PE1 uses PW redundancy in Master/Slave mode.
In Master/Slave mode, PE1 instructs PE2 and PE5 to change forwarding statuses of PWs to be the
same as those of PWs on PE1. In Independent mode, the master and backup statuses of PE2 and
PE5 determine forwarding statuses of local PWs.
l Active and Standby are terms used to describe PW forwarding statuses and cannot be
configured. Only active PWs are used for forwarding traffic. Standby PWs may be used
for receiving traffic.
NOTE
Active and Inactive, and Primary and Backup are terms used by Huawei that have the same
meaning as Active and Standby as defined in draft-ietf-pwe3-redundancy-bit-04. They all
indicate PW forwarding statuses.

Implementation
To ensure consistent forwarding, the PW redundancy mechanism must allow exactly one PW
in a PW group to be the active PW and the remaining PWs to be standby PWs, which requires
corresponding signaling control.
RFC 4447 (Pseudowire Setup and Maintenance Using the Label Distribution Protocol [LDP])
specifies the PW Status TLV to transmit the PW forwarding status. The PW Status TLV is
transported to the remote PW peer using a Label Mapping or LDP Notification message. The
PW Status TLV carries a 32-bit status code field. Each bit in the status code field can be set
individually to indicate more than one failure. PW redundancy introduces a new PW status
code 0x00000020. When the code is set, it indicates "PW forwarding standby".
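Because each bit of the status code is independent, the field maps naturally onto a flag type. The following Python sketch handles only the 4-byte status code, not the TLV header. The five fault bits are the ones listed in RFC 4447, the standby bit is the one introduced above, and the member and function names are illustrative.

```python
from enum import IntFlag


class PwStatus(IntFlag):
    NOT_FORWARDING = 0x00000001
    LOCAL_AC_RX_FAULT = 0x00000002    # local AC (ingress) receive fault
    LOCAL_AC_TX_FAULT = 0x00000004    # local AC (egress) transmit fault
    LOCAL_PSN_RX_FAULT = 0x00000008   # local PSN-facing PW receive fault
    LOCAL_PSN_TX_FAULT = 0x00000010   # local PSN-facing PW transmit fault
    FORWARDING_STANDBY = 0x00000020   # added for PW redundancy


def encode_status(status: PwStatus) -> bytes:
    """Encode the status code as 4 bytes in network byte order."""
    return int(status).to_bytes(4, "big")


def decode_status(raw: bytes) -> PwStatus:
    """Decode a 4-byte status code."""
    if len(raw) != 4:
        raise ValueError("status code must be exactly 4 bytes")
    return PwStatus(int.from_bytes(raw, "big"))


def is_standby(status: PwStatus) -> bool:
    """True if the 'PW forwarding standby' bit is set."""
    return bool(status & PwStatus.FORWARDING_STANDBY)
```

A status code of all zeros means that the PW is forwarding and no fault is reported.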
NOTE
Only the VPLSs in PWE3 mode support PW redundancy.
Forwarding priorities (Primary or Secondary) must be configured for PWs that back up each
other. The highest priority PW will be selected as the primary PW to forward traffic. The
remaining PWs will be in the Secondary state to protect the primary PW.
NOTE
Currently, only one secondary PW can be configured for a primary PW.
The forwarding status of a PW determines whether the PW is used to forward traffic. The PW
forwarding statuses depend on:
l Local and remote PW signaling statuses: A PE monitors the local signaling status and
uses PW redundancy signaling to obtain remote signaling status from a remote PE.
l PW redundancy mode: Master/Slave or Independent mode is specified on PE1.
l PW forwarding priorities: PW forwarding priorities (Primary or Secondary) are specified
on PE1.
Figure 11-72 shows that VPLS PW redundancy is configured on PE1. In normal cases, all
local and remote PW signaling statuses on PE1 are Up. PEs at the two ends of a PW in
different VPLS PW redundancy modes use different methods to select the same PW for
transmitting user packets.
l In Master/Slave mode, PE1 determines local PW forwarding statuses based on preset
forwarding priorities and informs PE2 and PE5 of the PW forwarding statuses; PE2 and
PE5 determine their PW forwarding statuses based on the received PW primary and
secondary statuses.
l In Independent mode, PE1 determines local PW forwarding statuses based on the
forwarding statuses learned from PE2 and PE5; PE2 and PE5 determine their PW
primary and secondary statuses based on signaling, which can be enhanced trunk
(E-Trunk) or Virtual Router Redundancy Protocol (VRRP) signaling, and notify PE1 of the
forwarding statuses.
In both Master/Slave and Independent modes, if a primary PW is faulty, it becomes inactive
and its secondary PW becomes active. PW-side faults do not affect the AC status. If AC-side
faults occur (for example, a PE or AC link is faulty), the PW primary and secondary statuses
in Independent mode will change because the statuses are determined by the master and
backup statuses of the dual-homing devices; the PW primary and secondary statuses in
Master/Slave mode will not change because they are determined by the PW side.
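For Master/Slave mode, the selection on PE1 reduces to picking the highest-priority PW whose signaling status is Up. The Python sketch below illustrates this under simplifying assumptions: the Pw class is hypothetical, one flag stands in for the combined local and remote signaling status, and a smaller priority value means a higher priority, as stated earlier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pw:
    peer: str
    priority: int              # smaller value = higher forwarding priority
    signaling_up: bool = True  # simplified local and remote signaling status


def select_active(group: List[Pw]) -> Optional[Pw]:
    """Return the PW that forwards traffic, or None if no PW is usable."""
    candidates = [pw for pw in group if pw.signaling_up]
    if not candidates:
        return None
    return min(candidates, key=lambda pw: pw.priority)


if __name__ == "__main__":
    primary = Pw("PE2", priority=1)
    secondary = Pw("PE5", priority=2)
    group = [primary, secondary]
    print(select_active(group).peer)   # primary PW while it is Up
    primary.signaling_up = False       # the primary PW becomes faulty
    print(select_active(group).peer)   # the secondary PW takes over
```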

NOTE
VPLS PW redundancy is similar to VPWS PW redundancy, with the exception that a virtual switch
instance (VSI) has multiple PWs to different PEs. These PWs form various PW groups. PW switching in
one group does not affect other PW groups.
Currently, VPLS supports only the master/slave PW redundancy mode.
Derivative Function
In addition to protection against network faults in real time, VPLS PW redundancy allows
users to manually switch traffic between PWs in a group during network operation and
maintenance. For example, if a device providing a primary PW needs to be maintained, a user
can switch traffic to the secondary PW and switch it back to the primary PW after the
maintenance.
NOTE
The interval between a switchover and a switchback must be at least 15 seconds.
Usage Scenarios
VPLS PW redundancy can be used on hierarchical virtual private LAN service (HVPLS)
networks and on networks where VPLS and virtual leased line (VLL) are interconnected.
Both types of networks can bear any service, but for newly planned or deployed networks,
it is recommended that each type carry the services that suit its networking characteristics.
l HVPLS networks are suitable for bearing multicast services because HVPLS networks
can save VPLS core network bandwidth.
l VPLS and VLL interconnected networks are suitable for bearing unicast services,
because VLL PEs do not need to learn user MAC addresses.
VPLS PW redundancy can also be used to improve reliability of existing networks. On the
VPLS network in Figure 11-72, CE1 communicates with CE2, CE3, and CE4 through PWs
between one VSI on PE1 and PE2, PE3, and PE4.
As services develop, services between CE1 and CE2 and between CE1 and CE3 require high
reliability. Services between CE1 and CE4 do not require high reliability.
To meet the reliability requirements, PE5 and PE6 are deployed on the VPLS network to
provide VPLS PW redundancy protection for PE2 and PE3, respectively. In addition, multiple
PW groups to peer PEs are configured in one VSI on PE1. Links between CE1 and CE4
remain unchanged.
VPLS PW redundancy protects services against failures on the network side, AC side, or PEs
without affecting existing services, improving network reliability.
NOTE
VPLS PW redundancy can be provided for the desired services without affecting services on other PWs,
which reduces costs and maximizes profits.

Figure 11-72 VPLS PW redundancy networking
(Figure content: on the VPLS network, PE1 serves CE1, PE2 and PE5 serve CE2, PE3 and PE6
serve CE3, and PE4 serves CE4. Legend: primary PW, secondary PW.)

Acronym & Abbreviation    Full Name
CE customer edge
L2PDU Layer 2 protocol data unit
PE provider edge
SP service provider
VC virtual circuit
VSI virtual switching instance
11.9 L2VPN Loop Detection

11.9.1 Overview
L2VPN loop detection can detect and eliminate L2VPN loops, preventing L2VPN broadcast
storms.
Purpose
Generally, redundant links are used on an Ethernet switching network to provide link backup
for higher network reliability. The use of redundant links, however, may produce loops,
causing broadcast storms and MAC table instability. As a result, the communication quality
may deteriorate and services may even be interrupted.
As Layer 2 Ethernet technologies, L2VPN technologies, including virtual private LAN
service (VPLS) and virtual leased line (VLL), also encounter loop problems in practical
application:
l If a customer network uses multiple leased lines, loops may accidentally occur due to
reasons such as incorrect network configurations, resulting in broadcast storms.
l If a customer network passes through a third-party network, loops may accidentally
occur due to reasons such as incorrect third-party network configurations, resulting in
broadcast storms.
To prevent loops on Ethernet switching networks, the Spanning Tree Protocol (STP) is used.
However, if STP is used to prevent L2VPN loops, varying and complex customer networks
will pose great network maintenance difficulties to carriers. This is because STP also needs to
be deployed on CEs and relies on the customer network to some extent.
To address this issue, L2VPN loop detection is introduced.
Benefits
L2VPN loop detection offers the following benefits to carriers:
l Reduced device burdens: L2VPN loop detection effectively prevents broadcast storms,
reducing device burdens.
l Flexible and controllable deployment: L2VPN loop detection only needs to be deployed
on PEs and is totally independent of the customer network. Therefore, L2VPN loop
detection can be deployed in a flexible and controllable manner.
11.9.2 Principles
11.9.2.1 Basic Concepts and Implementation Principles
Basic Concepts
L2VPN loop detection can detect and eliminate L2VPN loops, preventing L2VPN broadcast
storms.
Implementation
L2VPN loop detection is deployed on the AC interfaces of PEs. AC interfaces that have
L2VPN loop detection enabled send L2VPN loop detection packets (Layer 2 packets) at an
interval that ranges from 100 ms to 4000 ms. The interval changes automatically each time.

The L2VPN loop detection packets are destined for broadcast MAC addresses and are
forwarded within a VLAN domain.
L2VPN loop detection can work in either self-sending self-receiving mode or loose mode.
Self-Sending Self-Receiving Mode
l If an AC interface on a PE receives an L2VPN loop detection packet from another AC
interface on the same PE and the two AC interfaces are bound to the same VSI or VLL,
the PE experiences an L2VPN network loop. The PE then compares the two looped AC
interfaces, automatically blocks the AC interface with a smaller interface index, and
reports an alarm.
Interface blocking rule: The blocking priorities and interface indexes of two AC
interfaces are compared. The AC interface with a higher blocking priority is blocked
preferentially. If the blocking priorities are the same, the indexes of both AC interfaces
are compared. The AC interface with a smaller index is blocked preferentially.
Interface index comparison rule:
– If a loop occurs on two GigabitEthernet or Ethernet interfaces, the PE compares the
slot IDs, interface IDs, and sub-interface IDs in order until the interface to be
blocked is determined. For example, if a loop occurs on GigabitEthernet 0/x1/y1.z1
and GigabitEthernet 0/x2/y2.z2, the PE compares x1 and x2 first. Because x1 is
smaller than x2, the PE blocks GigabitEthernet 0/x1/y1.z1 directly without going on
to compare y1 and y2 or z1 and z2.
– If a loop occurs on two Eth-Trunk interfaces, the PE compares the interface IDs and
sub-interface IDs in order until the interface to be blocked is determined. For
example, if a loop occurs on Eth-Trunk m1.n1 and Eth-Trunk m2.n2, the PE
compares m1 and m2 first. Because m1 is smaller than m2, the PE blocks Eth-
Trunk m1.n1 directly without going on to compare n1 and n2.
– If a loop occurs on a GigabitEthernet or Ethernet interface and an Eth-Trunk
interface, the PE blocks the Eth-Trunk interface.
l If a PW interface on a PE receives an L2VPN loop detection packet from an AC
interface on the same PE and the AC and PW interfaces are bound to the same VSI or
VLL, the PE is considered to be in an L2VPN loop. The PE then automatically blocks the AC
interface that sends the L2VPN loop detection packet and reports an alarm.
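The interface blocking rule for two looped AC interfaces can be sketched as a comparison function. This Python example is illustrative: the class and the kind strings are hypothetical, and the numeric direction of the blocking priority (a larger number meaning a higher blocking priority) is an assumption, because the text does not define how priorities are encoded.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AcInterface:
    kind: str                   # "ge" (GigabitEthernet or Ethernet) or "eth-trunk"
    index: Tuple[int, ...]      # (slot, interface, sub-interface) or (trunk, sub-interface)
    blocking_priority: int = 0  # assumption: larger number = higher blocking priority


def interface_to_block(a: AcInterface, b: AcInterface) -> AcInterface:
    """Pick which of two looped AC interfaces is blocked."""
    for itf in (a, b):
        if itf.kind not in ("ge", "eth-trunk"):
            raise ValueError(f"unsupported interface kind: {itf.kind}")
    # 1. The interface with the higher blocking priority is blocked first.
    if a.blocking_priority != b.blocking_priority:
        return a if a.blocking_priority > b.blocking_priority else b
    # 2. Mixed types: the Eth-Trunk interface is blocked.
    if a.kind != b.kind:
        return a if a.kind == "eth-trunk" else b
    # 3. Same type: compare the IDs in order; the smaller index is blocked.
    #    Tuple comparison stops at the first differing ID, as in the rule.
    return a if a.index <= b.index else b
```

For example, with equal priorities, GigabitEthernet 0/1/3.10 is blocked in preference to GigabitEthernet 0/2/0.5 because the first compared ID is smaller.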
Loose Mode
On the network shown in Figure 11-73, AC1 on PE1 and AC2 on PE2 send L2VPN loop
detection packets to the customer network.

Figure 11-73 Networking in which a customer network is dual-homed to a VPLS/VLL network
(Figure content: CE1 and CE2 reside on the customer L2 network, which connects to the
VPLS/VLL network through AC1 on PE1 and AC2 on PE2.)
Two PEs (PE1 and PE2) reside on a VPLS or VLL network. If PE1's AC interface AC1
receives an L2VPN loop detection packet from PE2's AC interface AC2, PE1 and PE2 are
considered in an L2VPN loop, irrespective of whether AC1 and AC2 are bound to the same
VSI or VLL. A PE then blocks an AC interface according to the following interface blocking
rules and reports an alarm.
Interface blocking rule: The blocking priorities, system MAC addresses, and interface indexes
of AC1 and AC2 are compared. If an interface is blocked preferentially due to a higher
blocking priority, the system MAC addresses and interface indexes of the two interfaces will
not be compared. For example, if AC1 has a higher blocking priority than AC2, AC1 is
blocked preferentially.
11.9.3 Applications
11.9.3.1 Application of L2VPN Loop Detection When a CE Is Single-homed to a PE over Redundant Links
On the network shown in Figure 11-74, AC1 and AC2 on PE1 send L2VPN loop detection
packets to the customer network.

Figure 11-74 Networking in which a CE is single-homed to a PE over redundant links
(Figure: CE1 and CE2 reside on an L2 network. AC1 and AC2 on PE1 connect the L2 network to
the VPLS/VLL network.)
11.9.3.2 Application of L2VPN Loop Detection When a Customer Network Is Dual-homed to a VPLS/VLL Network
On the network shown in Figure 11-75, AC1 on PE1 and AC2 on PE2 send L2VPN loop
detection packets to the customer network.
Figure 11-75 Networking in which a customer network is dual-homed to a VPLS/VLL network
(Figure: CE1 and CE2 reside on an L2 network. AC1 on PE1 and AC2 on PE2 connect the L2
network to the VPLS/VLL network.)

Terms
Term Definition
transparent transmission A process in which a network forwards received packets
to another network without modifying these packets.

Abbreviation
CE customer edge
PE provider edge
LAN local area network
11.10 IP RAN Virtual Cluster
11.10.1 Introduction to IP RAN Virtual Clusters

This section describes the purpose, definition, and benefits of IP radio access network (RAN)
virtual clusters.
Purpose
IP RAN solutions have been worked out to maximize carriers' return on investment, reduce
network construction costs, and evolve the existing network smoothly into a Long Term
Evolution (LTE) network.
The existing IP RAN solutions include end-to-end (E2E) virtual private network (VPN),
hierarchical VPN (HVPN), mixed VPN, native IP+L3VPN, and ATN+CX gateway solutions.
However, these solutions have disadvantages, as described in Table 11-16.

Table 11-16 IP RAN solutions and their disadvantages

Solution Type: Layer 3 solution
Disadvantage:
l This type of solution requires high-performance access devices because the dynamic
signaling protocols, such as the Border Gateway Protocol (BGP), Resource Reservation
Protocol (RSVP), and Label Distribution Protocol (LDP), need to be enabled on the devices.
The protocols generate a large number of packets, which consume large amounts of network
bandwidth and system process resources.
l This type of solution involves complex Layer 3 technologies and therefore requires highly
skilled operation and maintenance (O&M) personnel.

Solution Type: Layer 2 solution
Disadvantage:
l This type of solution has complex data planning, and a large number of features need to
be deployed.
l This type of solution has complex configurations, and a large number of configuration
procedures are required.
l This type of solution has high O&M costs, and a lot of manpower is required for routine
maintenance.

IP RAN virtual clusters overcome these disadvantages. As shown in Figure 11-76, a virtual
cluster is deployed on the access ring. The access aggregation gateways (AGGs) perform
centralized path calculation, service provisioning, and traffic control for the cell site gateways
(CSGs). This virtual cluster simplifies network O&M and deployment. Table 11-17 and Table
11-18 respectively describe the configuration and protocol changes before and after a virtual
cluster is deployed.
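Figure 11-76 Deployment position of a virtual cluster
(Figure: the network is divided into the last mile, access, aggregation, and MBB core
segments. The BTS, NodeB, and eNodeB connect to CSGs, which are the APs. The AGGs are the
masters. The RSG connects to the BSC, RNC, and MME. The virtual cluster covers the CSGs and
AGGs.)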

Table 11-17 Configuration changes before and after a virtual cluster is deployed

Item: Service tunnel
CSG, before a virtual cluster is deployed: The following configurations are required:
l Configuring Intermediate System to Intermediate System (IS-IS)
l Configuring Multiprotocol Label Switching-Traffic Engineering (MPLS-TE)
l Configuring the primary and secondary label switched paths (LSPs)
l Configuring Bidirectional Forwarding Detection (BFD) for LSP
CSG, after a virtual cluster is deployed: Only a virtual cluster needs to be configured. The
system automatically establishes the primary and secondary LSPs and the corresponding BFD
sessions.
Primary AGG, before a virtual cluster is deployed: The following configurations are required:
l Configuring IS-IS
l Configuring MPLS-TE
l Configuring the primary LSP
l Configuring BFD for LSP
Primary AGG, after a virtual cluster is deployed: Only a virtual cluster needs to be
configured. The system automatically establishes the primary and secondary LSPs and the
corresponding BFD sessions.
Secondary AGG, before a virtual cluster is deployed: The following configurations are
required:
l Configuring IS-IS
l Configuring MPLS-TE
l Configuring the secondary LSP
l Configuring BFD for LSP
Secondary AGG, after a virtual cluster is deployed: Only a virtual cluster needs to be
configured. The system automatically establishes the primary and secondary LSPs and the
corresponding BFD sessions.

Item: Ethernet service bearer
CSG, before a virtual cluster is deployed: The following configurations are required:
l Configuring a port virtual local area network (VLAN)
l Configuring the primary and secondary pseudo wires (PWs)
l Configuring BFD for PW
CSG, after a virtual cluster is deployed: No configuration is required.
Primary AGG, before a virtual cluster is deployed: The following configurations are required:
l Configuring the primary PW
l Configuring BFD for PW
l Configuring a Layer 2 virtual Ethernet (L2VE) interface
l Configuring a spoken PW
l Configuring a Layer 3 virtual Ethernet (L3VE) interface and connecting the L2VE interface
to the L3VE interface
l Binding the L3VE interface to virtual routing and forwarding (VRF)
l Configuring the Border Gateway Protocol (BGP)
Primary AGG, after a virtual cluster is deployed: The following configurations are required:
l Configuring a remote-AP interface for a CSG
l Binding the remote-AP interface to VRF
l Configuring BGP
Secondary AGG, before a virtual cluster is deployed: The following configurations are
required:
l Configuring the secondary PW
l Configuring BFD for PW
l Configuring an L2VE interface
l Configuring a spoken PW
l Configuring an L3VE interface and connecting the L2VE interface to the L3VE interface
l Binding the L3VE interface to VRF
l Configuring BGP
Secondary AGG, after a virtual cluster is deployed: The following configurations are required:
l Configuring a remote-AP interface for a CSG
l Binding the remote-AP interface to VRF
l Configuring BGP

Item: Asynchronous transfer mode (ATM)/Time division multiplexing (TDM) service
CSG, before a virtual cluster is deployed: The following configurations are required:
l Configuring a port VLAN
l Configuring the primary and secondary PWs
l Configuring BFD for PW
CSG, after a virtual cluster is deployed: No configuration is required.
Primary AGG, before a virtual cluster is deployed: The primary multi-segment pseudo wire
(MS-PW) needs to be configured.
Primary AGG, after a virtual cluster is deployed: The following configurations are required:
l Configuring a remote-AP interface for a CSG
l Binding the remote-AP interface to VRF
l Configuring a PW to a radio network controller site gateway (RSG)
Secondary AGG, before a virtual cluster is deployed: The secondary MS-PW needs to be
configured.
Secondary AGG, after a virtual cluster is deployed: The following configurations are required:
l Configuring a remote-AP interface for a CSG
l Binding the remote-AP interface to VRF
l Configuring a PW to an RSG

Table 11-18 Protocol changes before and after a virtual cluster is deployed

Item: Service tunnel
CSG, before a virtual cluster is deployed:
l IS-IS
l Resource Reservation Protocol-Traffic Engineering (RSVP-TE)
l LDP
CSG, after a virtual cluster is deployed:
l IS-IS
l Master Slave Control Protocol (MSCP)
Primary AGG, before a virtual cluster is deployed:
l IS-IS
l RSVP-TE
l LDP for connection with a CSG
l LDP for connection with an AGG
l BGP and LDP for connection with an RSG
Primary AGG, after a virtual cluster is deployed:
l IS-IS
l MSCP
l BGP and LDP for connection with an RSG
Secondary AGG, before a virtual cluster is deployed:
l IS-IS
l RSVP-TE
l LDP for connection with a CSG
l LDP for connection with an AGG
l BGP and LDP for connection with an RSG
Secondary AGG, after a virtual cluster is deployed:
l IS-IS
l MSCP
l BGP and LDP for connection with an RSG

Definition
Virtual cluster: a technology that simplifies network O&M and management and reduces
device loads. The control layers of all devices on a network are centralized on one device,
which performs centralized path calculation, service provisioning, and traffic control for
the other devices on the network.
Master: a server in a virtual cluster. A master performs centralized path calculation,
service provisioning, and traffic control for access points (APs). The AGGs shown in
Figure 11-76 are masters. Masters are classified as primary or secondary masters.
AP: a client in a virtual cluster. An AP is connected to base stations. The CSGs shown in
Figure 11-76 are APs.
To enhance network reliability, deploy the primary and secondary masters in a virtual cluster
to implement two control planes. As shown in Figure 11-77, an AP can belong to two masters
that work in primary/secondary mode.

Figure 11-77 Primary and secondary control planes
(Figure: AP1, AP2, and AP3 belong to both virtual cluster A, controlled by master A, and
virtual cluster B, controlled by master B.)
Different APs can belong to different primary and secondary masters. For example, AP1 can
belong to the primary master A and secondary master B, and AP2 can belong to the primary
master B and secondary master A.
If the primary master becomes faulty, traffic on the primary master switches to the secondary
master. The secondary master automatically becomes a new primary master for path
calculation. If the original primary master recovers, traffic switches back to the original
primary master. However, the control layer is still located on the new primary master.
Benefits
IP RAN virtual clusters offer the following benefits:
l The control layers of all devices on access rings are centralized on the AGG, which
significantly simplifies network deployment.
l The entire network automatically adapts to network topology changes, which reduces
O&M costs.
l The dynamic protocols, such as RSVP-TE, LDP, and BGP, do not need to run on access
devices, which reduces network complexity and device loads.
11.10.2 Principles
11.10.2.1 Data Plane
Related Concepts
l VP: A virtual path (VP) is a bidirectional label switched path (LSP) that is established on
a virtual cluster and is used to forward PW packets on the public network.
l VP protection group: A VP protection group is a traffic engineering (TE) tunnel that is
established on a virtual cluster. Each VP protection group contains one primary VP and
one backup VP.
l VC: A virtual circuit (VC) is a PW that is established on a virtual cluster and is used to
carry IP RAN services.
Ethernet Service Forwarding

As shown in Figure 11-78, the existing L2VPN+L3VPN solution is used for Ethernet services
on the data plane.
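Figure 11-78 Ethernet service forwarding on an LTE network
(Figure: protocol stacks of a packet on the path eNodeB, AP, master, RSG, MME. The PDU and
IP header are carried end to end. Between the AP and master, the packet is carried in a VC
over VP1. Between the master and RSG, the packet is carried in the VRF over TE1. Labels are
swapped at the intermediate hops.)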
In the L2VPN+L3VPN solution, the process of forwarding an Ethernet packet is as follows:

l S1 service
– After the AP receives an Ethernet packet from the eNodeB, the AP encapsulates the
packet into a VC and forwards the VC packet to a master's remote-AP interface
through the VP between the AP and master.
– The master removes the PW label, encapsulates the packet into the tunnel mapped
to a virtual routing and forwarding (VRF) instance, and forwards the packet to a
radio network controller site gateway (RSG) connected to the mobility management
entity (MME).
– The RSG removes the tunnel label and forwards the packet to the MME.
NOTE
A tunnel label is swapped at each hop in a Multiprotocol Label Switching (MPLS) domain.
l X2 service
– A master forwards X2 packets between adjacent APs.
– If signaling needs to be forwarded to the MME, the forwarding process is similar to
that for S1 packets.
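The S1 forwarding steps above can be traced with a small Python sketch. It is a simplified model under stated assumptions: the label names `VP1`, `VC`, `TE1`, and `VRF` are taken from Figure 11-78, the function name `forward_s1_uplink` is hypothetical, and per-hop label swapping inside the MPLS domain is not modeled.

```python
def forward_s1_uplink(payload):
    """Return (hop, label stack) pairs for one uplink S1 packet.

    Each stack lists the outermost label first and ends with the payload.
    """
    trace = []

    # AP: encapsulate the packet into the VC, then send it through the VP.
    stack = ["VP1", "VC", payload]
    trace.append(("AP to master", list(stack)))

    # Master: the VP ends here and the PW (VC) label is removed; the packet
    # is then encapsulated into the tunnel mapped to the VRF instance.
    stack = stack[2:]
    stack = ["TE1", "VRF", *stack]
    trace.append(("master to RSG", list(stack)))

    # RSG: remove the tunnel and VRF labels and hand the packet to the MME.
    stack = stack[2:]
    trace.append(("RSG to MME", list(stack)))
    return trace
```

Calling `forward_s1_uplink("IP packet")` returns three entries, one per segment, which makes it easy to see where each label is added and removed.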

NOTE
The process of forwarding 2G/3G Ethernet packets is similar to the process of forwarding LTE S1 and
X2 packets.
l For the process of forwarding 2G/3G Ethernet packets from a BTS/NodeB to a BSC/RNC or the
core network, see the process of forwarding LTE S1 packets.
l For the process of forwarding 2G/3G Ethernet packets between adjacent APs, see the process of
forwarding LTE X2 packets.
11.10.2.2 Control Plane
Related Concepts
Establishing a virtual cluster involves the following concepts:
l Remote-AP interface: a virtual interface defined on a master in a virtual cluster. The
interface is mapped to an AP's physical interface connected to a base station. The
remote-AP interface terminates the virtual circuit (VC) from the AP to the master and
provides access to the VC from the master to the AP.
l vBridge interface: a virtual interface created on a master. The interface applies to
Ethernet services when base stations use the same IP network segment. Multiple base
stations use the interface to share a Layer 3 gateway address on the master. Other
interfaces connected to the interface form a vBridge broadcast domain.
l Master Slave Control Protocol (MSCP): an extension defined by Huawei to the
Diameter protocol. MSCP is used to establish a control channel between an AP and a
master. Using the control channel, the AP reports node information to the master and the
master delivers control information to the AP.
Implementation
Figure 11-79 shows the process of establishing a virtual cluster. All APs flood their own node
and topology information to the entire network. The same topology database (TOPO DB) is
established on the APs and masters. The primary master uses the TOPO DB to calculate the
forwarding paths between each AP and the primary master and between each AP and the
secondary master. Then the primary master advertises the calculation results to the secondary
master. Based on the calculation results, the primary and secondary masters generate tunnel
and VC forwarding entries and deliver the forwarding entries to the APs to establish tunnels
and VCs.
Figure 11-79 Process of establishing a virtual cluster
(Figure: an AP, the primary master, and the secondary master in a virtual cluster, with
procedures 1 to 5 marked.)

As shown in Figure 11-79, establishing a virtual cluster involves five procedures. Table
11-19 describes details about the five procedures.
Table 11-19 Procedures for establishing a virtual cluster

No. 1: AP Registration
l AP: Each AP registers with the primary and secondary masters and reports its own label
space and interface information.
l Primary master: The primary master allocates an AP ID to the AP. Generally, the AP ID is
the loopback interface address of the AP.
l Secondary master: The secondary master allocates an AP ID to the AP. Generally, the AP ID
is the loopback interface address of the AP.

No. 2: Topology Information Collection
l AP: The AP runs Intermediate System to Intermediate System (IS-IS), collects topology
information, and floods the collected information to the entire network.
l Primary master: The primary master runs IS-IS, collects topology information, and floods
the collected information to the entire network.
l Secondary master: The secondary master runs IS-IS, collects topology information, and
floods the collected information to the entire network.

No. 3: Path Calculation
l AP: -
l Primary master: The primary master calculates the forwarding paths between the AP and
primary master and between the AP and secondary master and advertises the calculation
results to the secondary master.
l Secondary master: The secondary master receives the calculation results from the primary
master.

No. 4: Tunnel Establishment
l AP: The AP receives forwarding information such as labels from the primary and secondary
masters and generates a local forwarding table.
l Primary master: The primary master allocates labels for interworking with the AP and
delivers tunnel forwarding entries to the AP.
l Secondary master: The secondary master allocates labels for interworking with the AP and
delivers tunnel forwarding entries to the AP.

No. 5: VC Establishment
l AP: The AP receives VC information from the primary and secondary masters.
l Primary master: The primary master delivers VC entries to establish a VC to the AP.
l Secondary master: The secondary master delivers VC entries to establish a VC to the AP.

AP Registration
Before an AP joins a virtual cluster, the AP needs to register with a master. After the AP
registers with the master, you can use the master to log in to the AP and manage it.
AP registration involves the following scenarios:

First AP registration
After you configure a virtual cluster, a master first enters the virtual cluster mode and waits
for an AP to register with it. After the AP enters the virtual cluster mode, the AP sends a
request for establishing an MSCP channel to the master. The AP and master must exchange
routes to management IP addresses before establishing an MSCP channel. The AP can
automatically or statically obtain the local management IP address and the primary and
secondary masters' management IP addresses. In automatic mode, the AP randomly selects a
local loopback interface address as the local management IP address and obtains the primary
and secondary masters' management IP addresses after the virtual cluster accesses an
Intermediate System to Intermediate System (IS-IS) process. In static mode, the three
management IP addresses can be specified on the AP. After the MSCP channel is established,
the AP collects its own label space and interface information and sends a registration message
to the master. If the registration is successful, the master saves and maintains the label
space and interface information of the AP.
AP attribute change
When attributes on an AP change, the AP sends an update message to a master to update the
AP's information. For example, if an AP no longer belongs to a master, the AP sends a
registration cancel message to the master and sends a registration message to a new master.
Virtual clusters allow you to dynamically add or delete APs and change the status of the
primary and secondary masters to which an AP belongs.
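The registration, update, and cancellation messages described above can be modeled as operations on a per-master table. The following Python sketch is an illustration only: the class `MasterRegistry`, the function `move_ap`, and the shape of the stored data are assumptions made for this example and do not reflect the MSCP message format.

```python
class MasterRegistry:
    """Toy model of the AP information that a master saves and maintains."""

    def __init__(self):
        self._aps = {}

    def register(self, ap_id, label_space, interfaces):
        # Registration message: the AP reports its label space and interfaces.
        self._aps[ap_id] = {
            "label_space": label_space,
            "interfaces": list(interfaces),
        }

    def update(self, ap_id, **changed):
        # Update message: only a registered AP can change its attributes.
        if ap_id not in self._aps:
            raise KeyError(f"AP {ap_id} is not registered")
        self._aps[ap_id].update(changed)

    def cancel(self, ap_id):
        # Registration cancel message: the master forgets the AP.
        self._aps.pop(ap_id, None)

    def lookup(self, ap_id):
        return self._aps.get(ap_id)


def move_ap(ap_id, label_space, interfaces, old_master, new_master):
    """An AP that no longer belongs to a master cancels and re-registers."""
    old_master.cancel(ap_id)
    new_master.register(ap_id, label_space, interfaces)
```

Keeping one registry per master mirrors the statement that each master saves and maintains the information of the APs registered with it.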
Topology Information Collection

After you deploy a virtual cluster on an AP and a master, the AP and master use extended IS-
IS to establish a neighbor relationship and collect link data. The link data are saved on the
master for subsequent path calculation.
Versatile Routing Platform (VRP) extends IS-IS and adds a type-length-value (TLV) field to
carry topology information specific to a virtual cluster.
Establishing an IS-IS process for a virtual cluster involves the following scenarios:
l No IS-IS process exists in the virtual cluster.
After you configure a virtual cluster, an AP and a master automatically establish an IS-IS
process with the ID of 65534 for the virtual cluster.
l One or more IS-IS processes exist in the virtual cluster. An AP and a master have
different implementation modes:
– AP
If the AP has only one IS-IS process, the process is enabled for the virtual cluster.
If the AP has multiple IS-IS processes, the AP searches for an unused IS-IS process
ID in descending order from 65534 and uses the ID to establish an IS-IS process for
the virtual cluster.
– Master
The master can have multiple IS-IS processes for the virtual cluster. If the master
has no IS-IS process for the virtual cluster, the master searches for an unused IS-IS
process ID in descending order from 65534 and uses the ID to establish the first IS-
IS process for the virtual cluster. You need to manually establish other IS-IS
processes.
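The process ID selection rules above can be expressed as a short Python sketch. The function names are hypothetical, and the lower bound of 1 for process IDs is an assumption that the text does not state.

```python
DEFAULT_PROCESS_ID = 65534  # starting point of the search, from the text above
MIN_PROCESS_ID = 1          # assumed lower bound, not stated in the text


def pick_unused_process_id(used_ids):
    """Search downward from 65534 for the first unused IS-IS process ID."""
    for candidate in range(DEFAULT_PROCESS_ID, MIN_PROCESS_ID - 1, -1):
        if candidate not in used_ids:
            return candidate
    raise RuntimeError("no unused IS-IS process ID available")


def ap_cluster_process_id(existing_ids):
    """Process ID that an AP would use for the virtual cluster."""
    existing = set(existing_ids)
    if len(existing) == 1:
        # A single existing process is enabled for the virtual cluster.
        return next(iter(existing))
    # No process: 65534 is free, so it is chosen. Several processes:
    # the first unused ID at or below 65534 is chosen.
    return pick_unused_process_id(existing)


def master_first_cluster_process_id(existing_ids):
    """Process ID of the first virtual-cluster IS-IS process on a master."""
    return pick_unused_process_id(set(existing_ids))
```

For instance, an AP that already runs processes 1 and 65534 would get process 65533 for the virtual cluster under this model.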

NOTE
The original standard IS-IS process on the aggregation ring is used between the primary and secondary
masters.
Neighbor establishment and information flooding for extended IS-IS are similar to those for
standard IS-IS. After all information is flooded, the same TEDB is established on APs and
masters. The primary master uses the TEDB to calculate the optimal traffic forwarding path in
the virtual cluster.
Path Calculation
The primary master uses the Constrained Shortest Path First (CSPF) algorithm to calculate four
virtual paths (VPs) for an AP. As shown in Figure 11-80, the four VPs are VP1 to VP4. Table
11-20 describes the VPs and their constraints.
Table 11-20 VPs and their constraints

VP  Description
VP1 Primary VP from an AP to the primary master
VP2 Backup VP for VP1
VP3 Primary VP from an AP to the secondary master
VP4 Backup VP for VP3

Constraint:
l VP1 and VP2 do not intersect.
l VP1 and VP3 do not intersect.
l VP3 and VP4 do not intersect.
Figure 11-80 VPs in a virtual cluster
(Figure: VP1 and VP2 connect the AP to the primary master and form VP protection group 1.
VP3 and VP4 connect the AP to the secondary master and form VP protection group 2.)
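The constraints in Table 11-20 can be checked mechanically once the four paths are known. The Python sketch below is not the CSPF calculation itself. It assumes that "do not intersect" means that two paths share no link, and the node names in the usage note are invented for illustration.

```python
def links(path):
    """Return the set of undirected links that a path (node list) traverses."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


# Pairs that must not intersect, taken from Table 11-20.
DISJOINT_PAIRS = [("VP1", "VP2"), ("VP1", "VP3"), ("VP3", "VP4")]


def violated_constraints(vps):
    """List the constrained pairs whose paths share at least one link."""
    return [
        (a, b)
        for a, b in DISJOINT_PAIRS
        if links(vps[a]) & links(vps[b])
    ]
```

On a hypothetical ring AP, N1, PM, SM, N2, AP (PM and SM being the primary and secondary masters), the paths VP1 = AP-N1-PM, VP2 = AP-N2-SM-PM, VP3 = AP-N2-SM, and VP4 = AP-N1-PM-SM satisfy all three constraints, so `violated_constraints` returns an empty list.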
Tunnel Establishment
After path calculation is complete, the primary and secondary masters allocate labels and
forwarding entries to all APs based on the calculation results. Then the primary and secondary
masters use the MSCP channels to deliver the allocated labels and forwarding entries to the
APs. The primary master calculates four VPs for an AP. Two of the VPs form a VP protection
group from the primary master to the AP, and the other two form a VP protection group from
the secondary master to the AP. Two VPs in each VP protection group work in hot standby
mode. Table 11-21 describes the relationship between the VP protection groups and VPs.
Table 11-21 Relationship between the VP protection groups and VPs

Description of the path protection groups: In normal cases, traffic is transmitted using VP
protection group 1. If the AP or primary master becomes faulty, traffic is transmitted using
VP protection group 2.

Path protection group: VP protection group 1
l Primary path: VP1
l Backup path: VP2
l Description: In normal cases, traffic is transmitted using VP1. If the link between the AP
and primary master becomes faulty, traffic is transmitted using VP2.

Path protection group: VP protection group 2
l Primary path: VP3
l Backup path: VP4
l Description: When traffic is transmitted using VP protection group 2, the traffic is
transmitted using VP3 by default. If the link between the AP and secondary master becomes
faulty, the traffic is transmitted using VP4.
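The selection logic in Table 11-21 can be summarized as a small Python function. This is a sketch under assumptions: the function name `select_vp` and its inputs are hypothetical, and the case in which both VP1 and VP2 are faulty while the primary master is still running is not covered by the table, so falling back to VP protection group 2 in that case is an assumption of this example.

```python
def select_vp(primary_master_up, vp_up):
    """Pick the VP that carries traffic for one AP.

    vp_up maps "VP1".."VP4" to True (usable) or False (faulty).
    """
    if primary_master_up:
        # VP protection group 1: VP1 is preferred, VP2 is its hot standby.
        for vp in ("VP1", "VP2"):
            if vp_up[vp]:
                return vp
    # VP protection group 2: VP3 is preferred, VP4 is its hot standby.
    # Assumption: this group is also used if both VP1 and VP2 are faulty.
    for vp in ("VP3", "VP4"):
        if vp_up[vp]:
            return vp
    return None  # no usable path
```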
VC Establishment
A master uses a remote-AP interface to establish a VC to an AP or terminate the VC. A
service bearer between a master and an RSG varies for different scenarios. Table 11-22
describes VC establishments in different scenarios.
Table 11-22 PW establishments in different scenarios

Scenario: Ethernet services are carried when base stations use different IP network segments.
Implementation: A master uses a remote-AP interface to terminate a VC to an AP. Layer 3
virtual private network (L3VPN) access using an L2VPN is implemented based on an interface
loopback. To enable a master to establish a VC to an AP, configure a remote-AP interface on
the master and bind the interface to the AP's attachment circuit (AC) interface.
The precautions for end-to-end service bearers are as follows:
l A master and an AP automatically establish a VC.
l You need to bind the master's remote-AP interface to an L3VPN instance established
between the master and an RSG.
Figure 11-81 Ethernet service bearers when base stations use different IP network segments
(Figure: AP1 and AP2 connect to remote-AP interfaces on the master over VC1 and VC2. The
master connects to the RSG over an L3VPN. Addresses shown: 10.0.0.100/24, 10.0.0.1/24,
20.0.0.100/24, and 20.0.0.1/24.)

Scenario: Ethernet services are carried when base stations use the same IP network segment.
Implementation: Remote-AP interfaces terminate multiple VCs on an access ring. The vBridge
interface is bound to an L3VPN instance established between a master and an RSG. To
implement L3VPN access using an L2VPN and ensure end-to-end traffic forwarding, add the
remote-AP interfaces to a vBridge broadcast domain.
The precautions for end-to-end service bearers are as follows:
l A master and APs automatically establish VCs.
l You need to bind the vBridge interface to an L3VPN instance
Figure 11-82 Ethernet service bearers when base stations use the same IP network segment
(Figure: AP1 and AP2 connect to remote-AP interfaces on the master over VC1 and VC2. The
remote-AP interfaces join a vBridge interface on the master, which connects to the RSG over
an L3VPN. Addresses shown: 10.0.0.100/24, 10.0.0.1/24, and 10.0.0.2/24.)

Scenario: TDM services are carried.
Implementation: A master uses a remote-AP interface to terminate a VC to an AP. To enable a
master to establish a VC to an AP, configure a remote-AP interface on the master and bind the
interface to the AP and its AC interface.
The precautions for end-to-end service bearers are as follows:
l A master and an AP automatically establish a VC.
l You need to configure L2VPN virtual leased line (VLL) services on the master's remote-AP
interface to establish a PW
The master connects the VC and the PW to form an end-to-end multi-segment pseudo wire
(MS-PW).
Figure 11-83 TDM service bearers
(Figure: a VC between the AP and the master's remote-AP interface and a PW between the
master and RSG together form an MS-PW.)
11.10.2.3 Management Plane

In addition to a network management system (NMS), you can use a master to manage access
points (APs) on an IP radio access network (RAN) virtual cluster.
Implementation
You can use either of the following methods to manage devices on an access ring:
l Using commands: You can log in to a master and run commands to manage the devices.
When you run commands on the master to configure an AP, the master automatically
uses Telnet to connect to the AP. The master only transparently transmits AP configurations
and does not save them. To ensure the security of automatic login, an AP needs to verify
the identity of a login user.
l Using an NMS: You can use an NMS to manage the devices as you do before a virtual
cluster is deployed.
Figure 11-84 Management plane for an IP RAN virtual cluster
(Figure: the NMS, master, and AP on the network shown in Figure 11-76, with Telnet
connections used for management.)
11.10.2.4 Protection Switching

This section describes how to use network detection and protection mechanisms to improve
network reliability and provide millisecond-level protection for Ethernet services in an IP
radio access network (RAN) virtual cluster.
Protection for Ethernet Services

After you deploy a virtual cluster, the virtual cluster automatically establishes the primary and
backup label switched paths (LSPs), primary and secondary VCs, and Bidirectional
Forwarding Detection (BFD) sessions for the VPs and VCs. The primary and secondary VCs
use master/slave PW redundancy. In normal cases, the primary VC works in the active state
and the secondary VC works in the standby state. Figure 11-85 shows how Ethernet services
are protected.

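Figure 11-85 Protection for Ethernet services
(Figure: the AP connects to the primary master over the primary VC and to the secondary
master over the secondary VC. The masters connect through the VRF to RSG1 and RSG2, which
connect to the BSC and RNC. Detection mechanisms shown: BFD for VC, BFD for VP, BFD for TE
tunnel, and BFD for LSP. Protection mechanisms shown: VP protection group, VC redundancy,
VPN FRR, and TE hot standby.)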
11.10.2.5 Graceful Restart

If a Master Slave Control Protocol (MSCP) channel in a virtual cluster becomes faulty,
graceful restart (GR) enables traffic to be properly forwarded on the data plane during channel
recovery. GR provides high availability for the entire network.
Background
Without GR, the data plane is interrupted if the MSCP channel in a virtual cluster becomes
faulty. The MSCP channel becomes faulty if either of the following cases occurs:
l A master in the virtual cluster becomes faulty, and a switchover occurs between the
master's master and slave main control boards.
l The link between the AP and master becomes faulty.
GR ensures that the data plane is not interrupted if the MSCP channel becomes faulty.
Related Concepts
GR: The IETF has extended protocols related to Internet Protocol/Multiprotocol Label
Switching (IP/MPLS), such as Open Shortest Path First (OSPF), Intermediate System to
Intermediate System (IS-IS), Border Gateway Protocol (BGP), Label Distribution Protocol
(LDP), and Resource Reservation Protocol (RSVP), to ensure that forwarding is not interrupted
when the system restarts. These extensions reduce protocol flapping on the control plane
when the system performs an AMB/SMB switchover. This series of standards is called the GR
extension to each protocol.
Implementation
As shown in Figure 11-86, traffic is transmitted from the AP to the master over a pseudo wire
(PW). If the MSCP channel becomes faulty due to a switchover between the master's master
and slave main control boards or other causes, the AP and master both enter the GR state. The
data plane is not interrupted during MSCP channel recovery. The radio network controller site
gateways (RSGs) outside the virtual cluster do not detect the fault. The GR process is as
follows:
Figure 11-86 GR process
(Figure: message exchange between the AP and master: GR capability advertisement, MSCP
channel interruption and re-registration, GR recovery, and the GR End message.)
1. Before a switchover occurs between the master's master and slave main control boards,
the AP sends a registration request carrying the GR flag to notify the master of its own
GR capability. After the AP registers with the master, the master notifies the AP of its
own GR capability.
2. If the MSCP channel is interrupted, the master and AP start their own GR Reconnect
timers. The AP sends a re-registration request to the master. During the re-registration,
the AP and master re-exchange GR capability information with each other. Before the
GR Reconnect timers expire, forwarding entries are retained, which ensures traffic
forwarding continuity.
3. After the MSCP channel recovers, the AP and master stop their own GR Reconnect
timers, restore the forwarding data before the fault occurred, and start their own GR
Recovery timers.
4. After the forwarding data is restored, the AP sends a GR End message to notify the
master that GR ends.
NOTE
If GR ends but service data is still not restored, traffic switches to the secondary PW or backup LSP.
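The four GR steps above can be read as a small state machine that runs on both the AP and the master. The Python sketch below is a simplified model: the class and state names are hypothetical, timers are represented only by state changes, and the behavior when the GR Reconnect timer expires (forwarding entries are no longer retained) is an assumption inferred from step 2.

```python
from enum import Enum


class GrState(Enum):
    NORMAL = "normal"
    RECONNECT = "GR Reconnect timer running"
    RECOVERY = "GR Recovery timer running"


class GrSession:
    """One side (AP or master) of an MSCP channel with GR enabled."""

    def __init__(self):
        self.state = GrState.NORMAL
        self.forwarding_entries_retained = True

    def channel_interrupted(self):
        # Step 2: start the GR Reconnect timer; keep forwarding entries.
        if self.state is GrState.NORMAL:
            self.state = GrState.RECONNECT

    def channel_recovered(self):
        # Step 3: stop the Reconnect timer and start the Recovery timer.
        if self.state is GrState.RECONNECT:
            self.state = GrState.RECOVERY

    def gr_end(self):
        # Step 4: forwarding data is restored and GR ends.
        if self.state is GrState.RECOVERY:
            self.state = GrState.NORMAL

    def reconnect_timer_expired(self):
        # Assumption: entries are no longer retained once the timer expires.
        if self.state is GrState.RECONNECT:
            self.forwarding_entries_retained = False
            self.state = GrState.NORMAL
```

In the successful case the session moves from NORMAL to RECONNECT to RECOVERY and back to NORMAL, and the forwarding entries are retained throughout.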
11.10.2.6 OAM
IP radio access network (RAN) virtual clusters use the following operation, administration
and maintenance (OAM) techniques:
l Bidirectional Forwarding Detection (BFD): used for fault detection
l IP Flow Performance Management (IP FPM) for Ethernet services: used for performance
monitoring
l Virtual circuit connectivity verification (VCCV) ping and label switched path (LSP)
ping/tracert: used for fault locating
Fault Detection
When a virtual cluster is established to carry Ethernet services, BFD sessions are
automatically established for LSPs and PWs in the virtual cluster. You need to manually
configure BFD for TE tunnel and BFD for LSP outside the virtual cluster.
Fault Locating
If a traffic interruption occurs, perform the following operations to locate the fault:
l Perform end-to-end detection in the order of RSG -> master -> AP.
l Perform TE tunnel detection between the RSG and master.
l Perform PW detection between the master and AP.
l Perform LSP detection.
Figure 11-87 shows methods for locating faults between nodes.
Figure 11-87 Methods for locating faults between nodes
(Figure: the nodes from left to right are the base station, AP, master, RSG, and
RNC/BSC/MME. Methods shown by layer: PW layer (PW ping, Mix-VPN, MS-PW VCCV ping), LSP
layer (LSP ping/tracert), and IP layer (ICMP ping).)
As shown in Figure 11-87, methods for locating faults between nodes are classified into the
following categories:
l End-to-end fault locating
Internet Control Message Protocol (ICMP) ping/tracert can be used for Ethernet services
to implement end-to-end fault locating.
l Fault locating outside a virtual cluster
LSP ping/tracert can be used for Ethernet services to implement fault locating
outside the virtual cluster.
l Fault locating in a virtual cluster
– PW ping in label alert mode can be used for fault locating in a virtual cluster. You
can use a ping command on the master to detect PW continuity. Because the AP has
no route to the master, a reply packet must be transmitted over an LSP.
– LSP ping/tracert can also be used for fault locating in a virtual cluster. You can use
a ping or tracert command on the master to detect LSP continuity. Because the AP
has no route to the master, a reply packet must be transmitted over an LSP.

11.10.3 Application
11.10.3.1 Application of IP RAN Virtual Clusters

This section describes how to deploy virtual clusters on an IP radio access network (RAN),
including the deployment position, mode, and precautions.
Deployment Position
The Versatile Routing Platform (VRP) allows a virtual cluster to be deployed on an access
ring of an IP RAN. That is, you can deploy a virtual cluster on cell site gateways (CSGs) and
access aggregation gateways (AGGs). Figure 11-88 shows the deployment position of a
virtual cluster.
Figure 11-88 Deployment position of a virtual cluster
[Figure: BTS, NodeB, and eNodeB base stations; CSG (AP), AGG (master), and RSG nodes; BSC, RNC,
and MME. The virtual cluster covers the CSGs and AGGs.]
Deployment Mode and Precautions

A virtual cluster uses the primary and secondary masters to control access points (APs). As
shown in Figure 11-89, A, B, C, and D are masters and other nodes are APs. The precautions
for deploying masters and APs are as follows:
l You can specify one primary master and one secondary master for an AP and ensure that
the AP can communicate with the primary and secondary masters.
l The primary and secondary masters may be different for each AP. For example, the
primary master for AP1 is master A and the secondary master for AP1 is master B,
whereas the primary master for AP2 is master B and the secondary master for AP2 is
master A.
l If you specify a primary or secondary master for an AP, the master also needs to control
the APs between the AP and master.

Figure 11-89 Example for AP and master deployment
[Figure: masters A, B, C, and D connected to access rings that contain AP1 through AP13.]
Virtual Clusters Coexisting with Non Virtual Clusters

To ensure that the entire network smoothly evolves toward virtual clusters, deploy virtual
clusters on the entire network step by step. That is, before virtual clusters are completely
implemented on the entire network, some devices on the entire network run virtual clusters
and others run non virtual clusters. A ring-by-ring upgrade is recommended to minimize the
impact on services. The upgrade precautions are as follows:
l A master runs virtual and non virtual clusters based on interfaces. If you have configured
a virtual cluster on an interface, the interface cannot provide access services for non
virtual clusters.
l An AP can run only virtual or non virtual clusters globally. If you have configured a
virtual cluster on an AP, the virtual cluster takes effect on all interfaces of the AP.
Figure 11-90 shows the initialization status of an example network on which no virtual
cluster is deployed.
Figure 11-90 Initialization status of an example network on which no virtual cluster is deployed
[Figure: AGGs A, B, C, and D connected to access rings that contain CSG1 through CSG13.]
The procedure for deploying virtual clusters on the entire network is as follows:
1. Upgrade the A-D ring and ensure that D's interface connecting to AP13 runs a virtual
cluster and D's interface connecting to AP12 still runs a non virtual cluster, as shown in
Figure 11-91.
Figure 11-91 Step 1
2. Upgrade the B-D ring, as shown in Figure 11-92.

Figure 11-92 Step 2
3. Upgrade C and AP7 to implement virtual clusters on the entire network, as shown in
Figure 11-93.
Figure 11-93 Step 3

Terms
Term | Description
Virtual cluster | A promising technology for simplifying network O&M and management and reducing device loads. The control layers of all devices on a network are centralized on a device. The device performs centralized path calculation, service provisioning, and traffic control for other devices on the network.
Master | A server in a virtual cluster. A master performs centralized path calculation, service provisioning, and traffic control for access points (APs). Masters are classified as primary or secondary masters.
AP | A client in a virtual cluster. An AP is a cell site gateway (CSG) on a traditional IP RAN and is connected to base stations.

Acronym&Abbreviation Full Name
AP access point
MSCP Master Slave Control Protocol
VP virtual path
VC virtual circuit

12 QoS
About This Chapter
This chapter describes QoS in terms of its overview, principles, and applications.
12.1 QoS Overview

This chapter describes QoS basics and solutions, the Diff-Serv model, and relevant QoS
technologies.
12.2 Traffic Policing and Traffic Shaping
12.3 Congestion Avoidance and Management
12.4 Class-Based QoS
12.5 HQoS
12.1 QoS Overview

This chapter describes QoS basics and solutions, the Diff-Serv model, and relevant QoS
technologies.
12.1.1 Introduction to QoS

This section describes the basic concepts of QoS, traditional packet delivery services, new
demands resulting from new services, and QoS features supported by the device.
Quality of service (QoS) is used to assess the ability of the supplier to meet the customer
demands.
On the Internet, QoS is used to assess the ability of the network to transmit packets. As the
network provides a wide variety of services, QoS should be assessed from different aspects.
QoS generally refers to the analysis of the issues related to the process of sending packets,
such as bandwidth, delay, jitter, and packet loss ratio.

12.1.1.1 Traditional Packets Transmission Application

The best-effort service, as a traditional service, does not give priority to the traffic that is
delay- or jitter-sensitive, or requires a low packet loss ratio or high reliability. That is, all
packets are treated in a uniform manner.
It is difficult to ensure QoS on a traditional IP network because ATNs on the network handle
all packets equally and use the First In First Out (FIFO) method to forward packets.
Resources used for forwarding packets are allocated based on the arrival sequence of the
packets.
All packets share the bandwidth of networks and ATNs. Resources are allocated according to
the arrival time of the packets. This policy is called best effort (BE). A device in this mode
tries its best to transmit packets to the destination. The BE mode, however, provides no
guarantee of delay, jitter, packet loss ratio, or reliability.
The traditional BE mode applies only to services such as web browsing, file transfer,
and email, which have no specific requirements for bandwidth and jitter.
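The FIFO behavior described above can be illustrated with a minimal Python sketch. It is not the device implementation: the class name FifoQueue, the buffer size, and the packet names are hypothetical, and the sketch only shows that packets leave in arrival order and that a full buffer drops the newest arrival regardless of its service type.

```python
from collections import deque


class FifoQueue:
    """Best-effort FIFO queue: no priorities, packets leave in arrival order."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical buffer size, in packets
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet):
        # When the buffer is full, the new arrival is dropped whatever its type.
        if len(self.packets) >= self.capacity:
            self.dropped += 1
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        # Forward strictly in arrival order.
        return self.packets.popleft() if self.packets else None


queue = FifoQueue(capacity=3)
arrivals = ["ftp-1", "ftp-2", "email-1", "voice-1"]
accepted = [queue.enqueue(p) for p in arrivals]
sent = [queue.dequeue() for _ in range(3)]
```

In this run the delay-sensitive voice packet arrives last and is the one that is lost, which is exactly the limitation of the BE mode for such services.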
12.1.1.2 New Applications Requirements

Compared with traditional QoS policies, newly developed QoS policies meet various
requirements and provide differentiated services.
With the rapid development of the network, an increasing number of networks are connected to
the Internet. The Internet has expanded greatly in size, scope, and number of users. The use of
the Internet as a platform for data transmission and implementation of various applications is
on the rise. Service providers also want to develop new services for more profits.
Apart from traditional applications such as web browsing, email, and File Transfer
Protocol (FTP), the Internet has expanded to accommodate other services such as E-learning,
telemedicine, videophone, videoconference, and video on demand. Enterprise users want to
connect their branches in different areas through VPN technologies to implement applications
such as accessing corporate databases or managing remote devices through Telnet.
These new applications put forward special requirements for bandwidth, delay, and jitter. For
example, videoconference and video on demand require high bandwidth, low delay, and low
jitter. Telnet requires low delay and priority handling in the event of congestion.
As new services spring up, the demands on the service capability of IP networks
have been rising. Users expect improved service transmission to the destination and also
better quality of services. For example, IP networks are expected to provide dedicated
bandwidth, reduce packet loss ratio, avoid network congestion, control network flow, and set
the preference of packets to provide different QoS for various services.
All these demand better service capability from the network, and QoS is the answer to these
requirements.
12.1.2 End-to-End QoS Model

Based on network quality and user requirements, QoS provides end-to-end services for users
through different service models.
Different service models are provided for user services to ensure QoS according to users'
requirements and the quality of the network. The common service models are as follows:
l Best-Effort service model
l Integrated service model
l Differentiated service model
12.1.2.1 Best-Effort Service Model

The BE service model is applicable to services that are insensitive to delay and have
low requirements for reliability. BE is realized through the FIFO mechanism.
Best-Effort is the simplest service model and treats all packets indiscriminately. Application
programs can, without notifying the network or obtaining any approval from the network, send
any number of packets at any time. For the Best-Effort service, the network tries its best to
send packets, but cannot ensure performance such as delay and reliability. The Best-Effort
model is the default service model of the Internet and can be applied to most network
applications, such as FTP and email, through the First In First Out (FIFO) queue.
12.1.2.2 Integrated Service Model

In the integrated service model, the application program applies to the network for specific
service, and does not send packets until the arrival of confirmation that the network has
reserved resources for it.
The integrated service model is called IntServ for short. IntServ can meet various QoS
requirements. In this service model, before sending packets, an
application program needs to apply for specific services through signaling. The application
program first notifies the network of its traffic parameters and the request for special service
qualities such as bandwidth and delay. After receiving the confirmation of the network that
resources have been reserved for packets, the application program begins sending packets.
The sent packets are controlled within the range specified by the traffic parameters.
After receiving the request for resources from the application program, the network checks
the resource allocation. That is, based on the request and current available resources, the
network determines whether to allocate resources for the application program or not. Once the
network confirms that resources are allocated for the packets, and as long as the packets are
controlled within the range specified by the traffic parameters, the network is certain to meet
the QoS requirements of the application program. The network maintains a state for each flow
that is specified by the source and destination IP addresses, port numbers, and protocol
number. Based on the state, the network classifies packets and performs traffic policing,
queuing, and scheduling to fulfill its commitment to the application program.
IntServ can provide the following services:
l Guaranteed service: Provides the preset bandwidth and delay to meet the requirements of
the application program. For example, a bandwidth of 10 Mbit/s and a delay of less than one
second can be provided for Voice over IP (VoIP) services.
l Controlled-load service: If network overload occurs, packets are still provided with
a service similar to that provided in the absence of network overload. That is, when
traffic congestion occurs on the network, low delay and a high pass rate are ensured for the
packets of certain application programs.
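The request, admission check, and per-flow state described for the integrated service model can be sketched as follows. This is only an illustration under stated assumptions: the class AdmissionController, the flow key layout (source IP address, destination IP address, source port, destination port, protocol number), and the bandwidth figures are hypothetical and are not taken from the device.

```python
class AdmissionController:
    """Illustrative IntServ-style admission control on a single link."""

    def __init__(self, link_kbps):
        self.available_kbps = link_kbps
        self.flows = {}  # per-flow state: flow key -> reserved kbit/s

    def request(self, flow, kbps):
        # Accept the request only if enough unreserved bandwidth remains.
        if flow in self.flows or kbps > self.available_kbps:
            return False
        self.flows[flow] = kbps
        self.available_kbps -= kbps
        return True

    def release(self, flow):
        # Return the reservation of a finished flow to the pool.
        self.available_kbps += self.flows.pop(flow, 0)


controller = AdmissionController(link_kbps=1000)
voice = ("10.0.0.1", "10.0.1.1", 5004, 5004, 17)  # hypothetical flow key
video = ("10.0.0.2", "10.0.1.2", 6000, 6000, 17)  # hypothetical flow key
first = controller.request(voice, 600)    # admitted
second = controller.request(video, 600)   # rejected: only 400 kbit/s left
controller.release(voice)
third = controller.request(video, 600)    # admitted after the release
```

The sketch also shows why per-flow state is costly: the flows dictionary grows with every admitted flow.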
12.1.2.3 Differentiated Service Model

In the differentiated service model, the application program does not need to send its request
for network resources before sending packets. Instead, the application program notifies
network nodes of its QoS requirements by setting QoS parameters in the IP header.

The differentiated service model is called Diff-Serv for short. In the model, the application
program does not need to send its request for network resource before sending the packets.
The application program informs network nodes of its demand for QoS by using QoS
parameters in the IP packet header. Then ATNs along the path obtain the demand by
analyzing the header of the packet.
To implement Diff-Serv, the access ATN classifies packets and marks the class of service
(CoS) in the IP packet header. The downstream ATNs then identify the CoS and forward the
packets on the basis of CoS. Diff-Serv is therefore a class-based QoS solution.
Diff-Serv Model in IP Network

l Diff-Serv Networking
The network node that implements Diff-Serv is called a DS node. A group of DS nodes
that adopt the same service policy and the same per-hop behavior (PHB) is called a DS
domain. See Figure 12-1.
DS nodes are classified into the following two types:
– DS border node: Connects DS domain with non-DS domain. This node controls
traffic and sets Differentiated Services CodePoint (DSCP) value in packets
according to the Traffic Conditioning Agreement (TCA).
– DS interior node: Connects a DS border node with other interior nodes or connects
interior nodes in a DS domain. This node carries out only the simple traffic
classification and traffic control based on the DSCP value.
Figure 12-1 Diff-Serv networking diagram
[Figure: a DS domain made up of interconnected DS nodes, with a non-DS domain attached on each side.]
l DS Field and DSCP

The Type of Service (ToS) octet in IPv4 packet header is defined in RFC791, RFC134,
and RFC1349. As shown in Figure 12-2, the ToS octet contains the following fields:
– Precedence: three bits (bits 0 through 2). It indicates the precedence of the IP packet.
– D bit: one bit. It indicates delay.
– T bit: one bit. It indicates throughput.
– R bit: one bit. It indicates reliability.
– C bit: one bit. It indicates cost.
– The lowest bit of the ToS field must be 0.
The ATN first checks the IP precedence of packets to implement QoS. The other bits are
not fully used.
The ToS octet of IPv4 packet header is redefined in RFC2474, called DS field. As shown
in Figure 12-2: The leftmost 6 bits (from 0 through 5) in DS field are used as DSCP. The
rightmost 2 bits (6 and 7) are the reserved bits. The leftmost 3 bits (from 0 through 2) are
Class Selector CodePoint (CSCP), which indicate a type of DSCP. DS node selects PHB
according to the DSCP value.
Figure 12-2 ToS field and DS field
IPv4 ToS octet: bits 0-2 Precedence; bits 3-6 D, T, R, and C; bit 7 set to 0
DS field: bits 0-5 DSCP (bits 0-2 are the CSCP); bits 6-7 unused
The DSCP field within the DS field is capable of conveying 64 distinct codepoints. The
codepoint space is divided into three pools as shown in Table 12-1.
Table 12-1 Classification of DSCP

Code Pool | Code Space | Usage
1 | xxxxx0 | Standard action
2 | xxxx11 | EXP/LU (experiment or local use)
3 | xxxx01 | EXP/LU (can be used as the extended space for future standard action)
Code pool 1 (xxxxx0) is used for standard action. Code pool 2 (xxxx11) and code pool 3
(xxxx01) are used for experiment or local use, and code pool 3 can also serve future extension.
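The bit layout of the DS field and the three code pools can be checked with a short Python sketch. The function names parse_ds_octet and dscp_pool are hypothetical. Bit 0 is treated as the most significant bit of the octet, as in Figure 12-2.

```python
def parse_ds_octet(octet):
    """Split the 8-bit ToS/DS octet (bit 0 is the most significant bit)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("the ToS/DS octet must fit in 8 bits")
    return {
        "precedence": octet >> 5,  # bits 0-2 (also the CSCP bits)
        "dscp": octet >> 2,        # bits 0-5
        "unused": octet & 0b11,    # bits 6-7
    }


def dscp_pool(dscp):
    """Return the code pool (1, 2, or 3) of a 6-bit DSCP value."""
    if dscp & 0b1 == 0:
        return 1  # xxxxx0: standard action
    if dscp & 0b11 == 0b11:
        return 2  # xxxx11: experiment or local use
    return 3      # xxxx01: experiment or local use, possible future standard use


fields = parse_ds_octet(0xB8)  # 0xB8 = 10111000 in binary
pool = dscp_pool(fields["dscp"])
```

For the octet 0xB8, the sketch yields DSCP 46 (101110), precedence 5, and code pool 1.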
l Standard PHB
The DS node implements the PHB behavior on the data flow. The network administrator
can configure the mapping from DSCP to PHB. When a packet is received, the DS node
detects its DSCP to find the mapping from DSCP to PHB. If no matching mapping is
found, the DS node selects the default PHB (Best-Effort, DSCP=000000) to forward the
packet. All the DS nodes support the default PHB.
The following are the four standard PHBs defined by the IETF: Class selector (CS),
Expedited forwarding (EF), Assured forwarding (AF) and Best-Effort (BE). The default
PHB is BE.
– CS PHB

Service levels defined by the CS are the same as the IP precedence used on the
network.
The value of the DSCP is XXX000 where the value of "X" is either 1 or 0. When
the value of DSCP is 000000, the default PHB is selected.
– EF PHB
EF means that the rate at which the flow departs from any DS node must never be less than
the specified rate. EF traffic can be re-marked only on border nodes of a DS domain, and the
new DSCP must still meet the EF PHB requirements.
EF PHB is defined to simulate the forwarding of a virtual leased line in the DS
domain to provide the forwarding service with low drop ratio, low delay, and high
bandwidth.
– AF PHB
AF PHB allows the traffic of a user to exceed the subscribed specification agreed on by the
user and the ISP. It ensures that traffic within the subscribed specification is forwarded. The
traffic exceeding the specification is not simply dropped, but is forwarded at lower
service priorities.
Four AF classes are defined: AF1, AF2, AF3, and AF4. Each AF class can be
classified into three different dropping priorities. AF codepoint AFij indicates that the AF
class is i (1<=i<=4) and the dropping priority is j (1<=j<=3). When providing AF
service, the carrier allocates different bandwidth resources for each AF class.
A special requirement for AF PHB is that traffic control cannot change the
packet sequence in a data flow. For instance, in traffic policing, different packets in
a service flow may be marked with different dropping priorities even though the packets
belong to the same AF class. Although these packets have different drop probabilities,
their sequence remains unchanged. This mechanism is
especially applicable to the transmission of multimedia services.
– BE PHB
BE PHB is the traditional IP packet transmission that focuses only on reachability.
All ATNs support BE PHB.
l Recommended DSCP
Different DS domains can have self-defined mapping from DSCP to PHB. RFC2474
recommends code values for BE, EF, AFij, and Class Selector Codepoints (CSCP).
CSCP is designed to be compatible with IPv4 precedence model.
– BE: DSCP=000000
– EF: DSCP=101110
– AFij codepoint
AFij codepoint is shown in Table 12-2.
Table 12-2 AF codepoint
Service Class | Low Dropping Priority, j=1 | Medium Dropping Priority, j=2 | High Dropping Priority, j=3
AF(i=4) | 100010 | 100100 | 100110
AF(i=3) | 011010 | 011100 | 011110
AF(i=2) | 010010 | 010100 | 010110
AF(i=1) | 001010 | 001100 | 001110
In traffic policing:
n If j=1, the packet color is marked as green.
n If j=2, the packet color is marked as yellow.
n If j=3, the packet color is marked as red.
The first three bits of the same AF class are identical. For example, the first
three bits of AF1j are 001, those of AF3j are 011, and those of AF4j are 100. Bit 3 and
bit 4 indicate the dropping priority, which has three valid values: 01,
10, and 11. The greater the value, the higher the dropping priority.
– Class selector codepoint
In the Diff-Serv standard, the CSCP is defined to make the DSCP compatible
with the precedence field of the IPv4 packet header. The ATNs identify the
priority of the packets through IP precedence. The IP precedence and the
CSCP map to each other. The user should configure the values
for these parameters. In CSCP, the higher the value of DSCP=xxx000 is, the
lower the forwarding delay of PHB is.
The default mapping between DSCP and IPv4 precedence is shown in Table
12-3.
Table 12-3 The default mapping between IPv4 precedence and DSCP
IPv4 Precedence | DSCP (in binary) | DSCP (in decimal) | Service Class
0 | 000000 | 0 | BE
1 | 001000 | 8 | AF1
2 | 010000 | 16 | AF2
3 | 011000 | 24 | AF3
4 | 100000 | 32 | AF4
5 | 101000 | 40 | EF
6 | 110000 | 48 | CS6
7 | 111000 | 56 | CS7
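The codepoints in Table 12-2 and Table 12-3 follow a simple bit pattern, which the following Python sketch reproduces. The function names af_dscp and cs_dscp are hypothetical. The AF class i occupies the first three bits and the dropping priority j occupies bit 3 and bit 4, so the decimal value of AFij is 8i + 2j. A class selector codepoint is the IPv4 precedence followed by three zero bits.

```python
def af_dscp(i, j):
    """DSCP of AFij: class i in bits 0-2, dropping priority j in bits 3-4."""
    if not (1 <= i <= 4 and 1 <= j <= 3):
        raise ValueError("AFij requires 1<=i<=4 and 1<=j<=3")
    return (i << 3) | (j << 1)


def cs_dscp(precedence):
    """Class selector codepoint for an IPv4 precedence (xxx000)."""
    if not 0 <= precedence <= 7:
        raise ValueError("IPv4 precedence ranges from 0 to 7")
    return precedence << 3


# Rebuild Table 12-2 as binary strings, keyed by (i, j).
af_table = {(i, j): format(af_dscp(i, j), "06b")
            for i in range(1, 5) for j in range(1, 4)}
# Rebuild the decimal DSCP column of Table 12-3.
cs_column = [cs_dscp(p) for p in range(8)]
```

For example, af_table[(4, 1)] is "100010" and cs_column is [0, 8, 16, 24, 32, 40, 48, 56], which matches the tables above.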
Diff-Serv Model in the MPLS Network

l EXP field

Defined in RFC3032, the MPLS packet header is shown in Figure 12-3. The EXP field is
three bits long. Its value ranges from 0 to 7 and indicates the traffic type. By default, EXP
corresponds to IPv4 precedence.
Figure 12-3 Position of EXP field
MPLS header (32 bits): bits 0-19 LABEL; bits 20-22 EXP; bit 23 S; bits 24-31 TTL
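The position of the EXP field can be illustrated by packing and unpacking one 32-bit MPLS header in Python. The function names and the sample values are hypothetical. Bit 0 is the most significant bit, so the 20-bit label is the top of the word, followed by 3 EXP bits, the S bit, and the 8-bit TTL.

```python
def build_mpls_header(label, exp, s, ttl):
    """Pack LABEL (20 bits), EXP (3 bits), S (1 bit), and TTL (8 bits)."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    return (label << 12) | (exp << 9) | (s << 8) | ttl


def parse_mpls_header(word):
    """Unpack a 32-bit MPLS header into its four fields."""
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }


word = build_mpls_header(label=1000, exp=5, s=1, ttl=64)
header = parse_mpls_header(word)
```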
l Processing QoS Traffic in MPLS Domain

– Processing QoS Traffic on the Ingress Device
On the Ingress device of MPLS domain, you can limit the data flow by setting the
Committed Access Rate (CAR) to ensure that the data flow complies with MPLS
bandwidth regulations. Besides, you can assign different priorities to the IP packets
according to certain policies.
One-to-one mapping can be achieved since the IP precedence field and the EXP
field are both 3 bits. In Diff-Serv domain, however, the DSCP field of IP packet is 6
bits, which is different from the length of EXP and thus leads to many-to-one
mapping. It is defined that the first 3 bits of DSCP (that is, CSCP) are mapped with
EXP.
– Processing QoS Traffic on the Device in the MPLS Domain
When forwarding the MPLS label, the LSR in MPLS carries out queue scheduling
according to the EXP field in the labels of packets that are received. This ensures
that packets with higher priority enjoy better service.
– Processing QoS Traffic on the Egress Device
On the Egress device of MPLS domain, you need to map EXP field to DSCP field
of IP packet. By standard, the first 3 bits of DSCP (that is, CSCP) take the value of
EXP, and the last 3 bits take 0.
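The mapping between DSCP and EXP on the ingress and egress devices can be sketched in Python as follows. The function names are hypothetical, and the sketch covers only the default bit mapping described above, not any mapping that is configured on a device.

```python
def dscp_to_exp(dscp):
    """Ingress: the first 3 bits of the DSCP (the CSCP) become the EXP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP ranges from 0 to 63")
    return dscp >> 3


def exp_to_dscp(exp):
    """Egress: the first 3 bits of the DSCP take the EXP value, the last 3 take 0."""
    if not 0 <= exp <= 7:
        raise ValueError("EXP ranges from 0 to 7")
    return exp << 3


# Many-to-one: AF41 (34), AF42 (36), and AF43 (38) all map to EXP 4.
af4_exps = {dscp_to_exp(d) for d in (34, 36, 38)}
# The round trip keeps only the class selector bits: EF (46) returns as 40.
round_trip = exp_to_dscp(dscp_to_exp(46))
```

Because the mapping is many-to-one, the dropping priority carried in the last three DSCP bits is not restored on the egress device by this default mapping.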
It should be noted that QoS is an end-to-end solution, while MPLS only ensures that data can
enjoy the services regulated in SLA. After the data enters the IP network, IP network ensures
QoS.
12.1.3 Techniques Used for the QoS Application

This section describes functions used for QoS implementation, such as traffic classification,
traffic policing, traffic shaping, congestion management, congestion avoidance, RSVP,
and the link efficiency mechanism.
The primary technologies for implementing Diff-Serv include:
l Traffic classification
l Traffic policing
l Traffic shaping
l Congestion management

l Congestion avoidance
Traffic classification is the basis of the QoS application. With this technique, packets are
identified based on certain mapping rules. This is a precondition for providing differentiated
services. Traffic policing, traffic shaping, congestion management, and congestion avoidance
control the network traffic and resource allocation from different aspects. They embody the
Diff-Serv concept. The following describes these techniques in detail:
l Traffic classification: Identifies packets according to defined rules. It is the prerequisite
of Diff-Serv.
l Traffic policing: Controls the traffic rate. The rate of the traffic that enters the network is
monitored and the traffic exceeding its rate limit is restricted. Only a reasonable traffic
range is allowed to pass through the network. This optimizes the use of network
resources and protects the interests of the service providers.
l Traffic shaping: Actively adjusts the rate of outputting traffic. It adjusts the volume of
output traffic according to the network resources that can be afforded by the downstream
ATN to prevent dropping of packets and congestion.
l Congestion management: Handles resource allocation during network congestion. It
stores packets in queues first, and then uses a scheduling algorithm to decide the
forwarding sequence of packets.
l Congestion avoidance: Monitors the usage of network resources, and actively drops
packets in case of heavy congestion. This addresses the problem of network overload.
For the common QoS features in the Diff-Serv model, see Figure 12-4.
Figure 12-4 Common QoS features in the Diff-Serv model
[Figure: a DS domain of DS nodes between two non-DS domains. Complex traffic classification and
traffic policing are configured on the ingress of the network. Simple traffic classification,
queue scheduling, congestion management, and congestion avoidance are configured on the DS
nodes. Traffic shaping is also marked in the figure.]

In the IntServ model, the Resource Reservation Protocol (RSVP) is used as signaling for the
transmission of QoS requests. When a user needs QoS guarantee, the user sends a QoS
request to the network devices through the RSVP signaling. The request may be a
requirement for delay, bandwidth, or packet loss ratio. After receiving the RSVP request, the
nodes along the transfer path perform admission control to check the validity of the user and
the availability of resources. Then the nodes decide whether to reserve resources for the
application program. The nodes along the transfer path meet the request of the user by
allocating resources to the user. This ensures the QoS of the user services.
12.1.3.1 Traffic Classification

Consisting of complex traffic classification, simple traffic classification, and forced traffic
classification, traffic classification classifies packets so that the device can identify packets of
various features.
When implementing QoS in Diff-Serv model, the ATN needs to identify each class of traffic.
The following are the three methods for the ATN to classify traffic:
l Complex traffic classification: Packets are classified using complex rules, for example,
by a combination of link layer, network layer, and transport layer information (such as
source MAC address, destination MAC address, source IP address, destination IP
address, user group number, protocol type, and application-specific TCP/UDP port
number). Complex traffic classification is performed on ATNs at the edges of a Diff-Serv
domain.
l Simple traffic classification: This classification is based on the IP precedence, DSCP, MPLS
EXP, or 802.1p priority in packets. A collection of packets of the same class is called a
Behavior Aggregate (BA). Generally, the core ATN in a Diff-Serv domain performs only
simple traffic classification.
l Forced traffic classification: The ATN supports forced traffic classification. That is, you
can run a command to configure forced traffic classification on the inbound interface to
set the precedence for the traffic. Then, the traffic is forwarded to the outbound interface
carrying the specified precedence.
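The difference between complex and simple traffic classification can be illustrated with a small Python sketch. All names, rules, and addresses here are hypothetical. Complex classification matches several header fields against an ordered rule list, whereas simple classification only looks up the priority marking that the packet already carries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str
    dst_port: int
    dscp: int = 0


@dataclass
class Rule:
    service_class: str
    protocol: Optional[str] = None  # None means "match any value"
    dst_port: Optional[int] = None
    src_ip: Optional[str] = None

    def matches(self, pkt):
        pairs = ((self.protocol, pkt.protocol),
                 (self.dst_port, pkt.dst_port),
                 (self.src_ip, pkt.src_ip))
        return all(want is None or want == got for want, got in pairs)


def complex_classify(pkt, rules, default="BE"):
    """Edge behavior: the first rule that matches the header fields wins."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.service_class
    return default


def simple_classify(pkt, dscp_map, default="BE"):
    """Core behavior: one lookup on the DSCP carried by the packet."""
    return dscp_map.get(pkt.dscp, default)


rules = [Rule("EF", protocol="udp", dst_port=5060), Rule("AF1", src_ip="10.0.0.1")]
voice = complex_classify(Packet("10.0.0.9", "10.0.1.1", "udp", 5060), rules)
data = complex_classify(Packet("10.0.0.1", "10.0.1.1", "tcp", 80), rules)
other = complex_classify(Packet("10.0.0.7", "10.0.1.1", "tcp", 22), rules)
core = simple_classify(Packet("10.0.0.9", "10.0.1.1", "udp", 5060, dscp=46), {46: "EF"})
```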
12.1.3.2 Traffic Policing and Shaping

Traffic policing is used to monitor the volume of the traffic that enters a network and keep it
within a reasonable range. In addition, traffic policing optimizes network resources and
protects the carriers' interests by restricting the traffic that exceeds the rate limit.
In a Diff-Serv domain, traffic policing and traffic shaping are performed by the traffic
conditioner. A traffic conditioner consists of four parts: Meter, Marker, Shaper, and Dropper,
as shown in Figure 12-5.
l Meter: Measures the traffic and judges whether the traffic complies with the
specifications defined in the Traffic Conditioning Specification (TCS). Based on the result,
the ATN performs other actions through Marker, Shaper, and Dropper.
l Marker: Re-marks the DSCP of the packet, and puts the re-marked packet into the
specified BA. The available measures include lowering the color and the service level of
the packet flow which does not match the traffic specifications (Out-of-Profile) or
maintaining the color and the service level.
l Shaper: Shapes the traffic. The Shaper has a buffer that holds the received traffic
and ensures that packets are sent at a rate not higher than the committed rate.
l Dropper: Performs the traffic policing action, which controls the traffic by dropping
packets so that the traffic rate conforms with the committed rate. Dropper can be
implemented by setting the Shaper buffer to 0 or a small value.
Figure 12-5 Traffic policing and shaping
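[Figure: packets pass through the Classifier, the Marker, and the Shaper/Dropper in turn. The
Meter measures the classified traffic and drives the Marker and the Shaper/Dropper.]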
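The cooperation of the Meter, Marker, and Dropper can be sketched in Python as follows. This is a simplified illustration that assumes a single token bucket filled at the committed rate. It is not the dual token bucket algorithm used by the device, and the class name, function name, rate, and burst size are hypothetical.

```python
class TokenBucketMeter:
    """Meter: decides whether a packet conforms to the committed rate."""

    def __init__(self, cir_bps, cbs_bytes):
        self.rate = cir_bps / 8.0  # token fill rate in bytes per second
        self.cbs = cbs_bytes       # bucket depth in bytes
        self.tokens = cbs_bytes    # the bucket starts full
        self.last = 0.0

    def conforms(self, now, size):
        # Add the tokens earned since the previous packet, capped at the depth.
        self.tokens = min(self.cbs, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


def condition(meter, now, size, dscp, action="drop", remark_dscp=0):
    """Return the DSCP to forward with, or None if the packet is dropped."""
    if meter.conforms(now, size):
        return dscp                # in profile: forwarded unchanged
    if action == "remark":
        return remark_dscp         # Marker: lower the service level
    return None                    # Dropper: discard the packet


policer = TokenBucketMeter(cir_bps=8000, cbs_bytes=1500)
dropped_run = [condition(policer, t, 1000, 46) for t in (0.0, 0.0, 0.5)]
marker = TokenBucketMeter(cir_bps=8000, cbs_bytes=1500)
remarked_run = [condition(marker, 0.0, 1000, 46, action="remark") for _ in range(2)]
```

In the first run, the second packet exceeds the remaining tokens and is dropped, and the third packet conforms again after half a second of refill. In the second run, the out-of-profile packet is re-marked to DSCP 0 instead of being dropped.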
In Diff-Serv, ATNs must support traffic control on the inbound and outbound interfaces
simultaneously. The functions of an ATN vary with its location:
l The border ATN handles the access of a limited number of low-speed users, so
traffic control on the border ATN can be completed efficiently. Most
traffic classification and traffic control are completed by the border ATN.
l The core ATN only performs PHB forwarding for the BA to which a packet flow belongs. In
this way, PHB forwarding can be completed with high efficiency, which meets the
requirements of high-speed forwarding on the Internet core network.
12.1.3.3 Congestion Avoidance Configuration

Congestion management creates queues, classifies packets, and places packets into different
queues for scheduling. When congestion occurs or intensifies, congestion management
allocates proper network resources to various services.
Low QoS in the traditional networks is caused by network congestion. When the available
resources temporarily fail to meet the requirements of the service transmission, the bandwidth
cannot be ensured. As a result, service rate decreases, resulting in long delay and high jitter.
This phenomenon is called congestion.
Causes of Congestion
Congestion often occurs in complex packet switching environment of the Internet. It is caused
by the bandwidth bottleneck of two types of links, as shown in Figure 12-6.

Figure 12-6 Schematic diagram of traffic congestion
[Figure: two scenarios. Traffic congestion on interfaces operating at different speeds: a 100M
interface feeds a 10M interface. Traffic congestion on interfaces operating at the same speed:
several 100M interfaces feed one 100M interface.]
l Packets enter the ATN at high rate through v1, and are forwarded at low rate through v2.
Congestion occurs in the ATN because the rate of v1 is greater than that of v2.
l Packets from multiple links enter the ATN at the rate of v1, v2, and v3. They are
forwarded at the same rate of v4 through a single link. Congestion occurs in the ATN
because the total rate of v1, v2, and v3 is greater than that of v4.
Congestion also occurs due to the causes as follows:
l Packets enter the ATN at line speed.
l Resources such as available CPU time, buffer, or memory used for sending packets are
insufficient.
l Packets that arrive at the ATN within a certain period of time are not well controlled. As
a result, the network resources required to handle the traffic exceed the available
resources.
Congestion Results
The impact of congestion is as follows:
l Increases the delay and the jitter in sending packets. Long delay can cause retransmission
of packets.
l Reduces the network throughput and wastes network
resources.
l Consumes more network resources, particularly storage resources, when congestion
intensifies. If not properly allocated, the network resources may be exhausted, and the
system may crash.
Congestion is the main cause of low QoS. It is very common in complex networks and must
be solved to increase the efficiency of the network.
Congestion Solutions
When congestion occurs or intensifies, queue scheduling and packet discard policies can be
used to allocate network resources for traffic of each service class. The commonly used
packet discard policies are as follows:
l Tail Drop

When the queue is full, subsequent packets that arrive are discarded.
l Random Early Detection (RED)
When the queue reaches a certain length, packets are discarded randomly. This avoids
global TCP synchronization, in which many TCP connections enter slow start at the same time.
l Weighted Random Early Detection (WRED)
When discarding packets, the ATN considers the queue length and packet precedence.
Packets with low precedence are more likely to be discarded.
The ATN adopts WRED to avoid congestion problems.
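The way WRED combines queue length and packet precedence can be sketched in Python as follows. The sketch assumes the commonly used RED drop curve, in which the drop probability rises linearly between a low and a high queue threshold. The thresholds, maximum drop probabilities, and names below are hypothetical and are not the defaults of the device.

```python
import random


def wred_drop_probability(avg_qlen, low, high, max_p):
    """Drop probability for one precedence class at a given average queue length."""
    if avg_qlen < low:
        return 0.0   # short queue: never drop
    if avg_qlen >= high:
        return 1.0   # beyond the high threshold: behaves like tail drop
    return max_p * (avg_qlen - low) / (high - low)


# Hypothetical per-precedence profiles: (low threshold, high threshold, max_p).
PROFILES = {"high": (40, 80, 0.1), "low": (20, 60, 0.5)}


def wred_should_drop(avg_qlen, precedence_class, rng=random.random):
    low, high, max_p = PROFILES[precedence_class]
    return rng() < wred_drop_probability(avg_qlen, low, high, max_p)


p_low = wred_drop_probability(50, *PROFILES["low"])    # 0.375
p_high = wred_drop_probability(50, *PROFILES["high"])  # 0.025
```

At the same average queue length of 50 packets, the low-precedence class is dropped with a much higher probability than the high-precedence class, so low-precedence traffic backs off first.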
12.1.3.4 RSVP
Through RSVP signaling, requests for resources are transmitted between nodes on the entire
network. The nodes then allocate resources based on the priorities of requests.
RSVP is an end-to-end protocol.
Requests for resources are transmitted between nodes through RSVP. The nodes allocate
resources in response to the requests. This is the process of resource reservation. Nodes check the requests
against current network resources before determining whether to accept the requests. If the
current network resources are limited, certain requests can be rejected.
Different priorities can be set for different requests for resources. Therefore, a request with a
higher priority can preempt reserved resources when network resources are limited.
RSVP determines whether to accept requests for resources and promises to meet the accepted
requests. RSVP itself, however, does not implement the promised service. Instead, it uses the
techniques such as queuing to guarantee the requested service.
Network nodes need to maintain some soft state information for the reserved resource.
Therefore, the maintenance cost is very high when RSVP is implemented on large networks.
RSVP is therefore not recommended for the backbone network.
12.2 Traffic Policing and Traffic Shaping

12.2.1 Introduction
Definition
l Traffic Policing
Traffic Policing (TP) is a traffic management technology that is applied at the ingress or
egress of an ATN to limit specified types of data traffic.
l Traffic Shaping
Traffic shaping uses Generic Traffic Shaping (GTS) to shape traffic that is
irregular or does not conform to the preset traffic features, which helps match the
bandwidth between the upstream and downstream of the network.
Purpose
Controlling the traffic that users send to the network is important for ISPs. Controlling the
traffic of certain applications on an enterprise network is a means of managing the network
status. A typical application of TP is to monitor the specification of a type of traffic that enters
a network. Based on the result, the traffic can be limited to a reasonable range, or the
traffic that exceeds the limit is "punished." As a result, the network resources and
the interests of a carrier are protected.
A typical application of traffic shaping is to control the normal flow and burst flow of
outgoing traffic based on the network connection. Therefore, the packets can be sent at a
uniform rate.
Benefits
This feature brings the following benefits to carriers:
l Traffic that exceeds the limit is punished, which protects the network resources and
the interests of a carrier.
12.2.2 Principles
12.2.2.1 Basic Principles of Traffic Policing
Mechanism for the Traffic Policing

Traffic policing with dual token buckets
l Single Rate Traffic Policing
In single-rate traffic policing, two token buckets are used. The capacity of one token
bucket is the committed burst size (CBS); this bucket is called bucket C for short.
The capacity of the other token bucket is the peak burst size (PBS); this bucket is
called bucket P for short. Tc and Tp represent the number of tokens in bucket C and
bucket P, respectively. At initialization, Tc equals CBS and Tp equals PBS. Both buckets
are filled with tokens at the same committed information rate (CIR). An incoming packet is
processed (colored, discarded, or forwarded) according to the number of tokens currently
in the two buckets.
When a bucket is full, that is, when Tc reaches CBS or Tp reaches PBS, extra tokens are
dropped.
Arriving packets consume tokens; packets for which no tokens are available are
discarded.
Tc and Tp are refreshed CIR times per second. At each refresh, the following
rules are observed:
– If Tc < CBS, Tc increases by 1.
– Otherwise, Tc remains unchanged.
Dual-token-bucket markers operate in Color-Blind or Color-Aware mode. The widely
used Color-Blind mode is the default one.
In the color-blind mode, when an arriving packet is measured (suppose the size of the
arriving packet is B), the following rules are observed:
– If Tc-B ≥ 0, the packet is remarked in green and Tc decreases by B.
– If Tc-B < 0 and Tp-B ≥ 0, the packet is remarked in yellow and Tp decreases by B.
– If Tp-B < 0, the packet is remarked in red and both Tc and Tp remain unchanged.
In the color-aware mode, when an arriving packet is measured (suppose the size of the
arriving packet is B), the following rules are observed:
– If the packet is green while Tc-B ≥ 0, the packet is remarked in green and Tc
decreases by B.
– If the packet is green or yellow while Tc-B < 0 and Tp-B ≥ 0, the packet is
remarked in yellow and Tp decreases by B.
– Otherwise, the packet is remarked in red and Tc and Tp remain unchanged.
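The single-rate rules above can be sketched in Python as follows. This is a minimal illustration, not the ATN implementation. The class name SingleRateMarker and its methods are hypothetical. Two points are assumptions: tokens that do not fit into bucket C overflow into bucket P (the behavior of the single-rate three-color marker in RFC 2697), and in color-aware mode a packet that arrives yellow is checked only against bucket P.

```python
from dataclasses import dataclass

GREEN, YELLOW, RED = "green", "yellow", "red"


@dataclass
class SingleRateMarker:
    """Dual token bucket marker in which both buckets are fed at CIR."""
    cir: int  # tokens (bytes) added per second
    cbs: int  # capacity of bucket C
    pbs: int  # capacity of bucket P

    def __post_init__(self):
        # At initialization, Tc equals CBS and Tp equals PBS.
        self.tc = self.cbs
        self.tp = self.pbs

    def refill(self, seconds):
        """Add CIR * seconds tokens: bucket C first, overflow into bucket P (assumption)."""
        tokens = int(self.cir * seconds)
        to_c = min(tokens, self.cbs - self.tc)
        self.tc += to_c
        self.tp = min(self.pbs, self.tp + tokens - to_c)

    def mark(self, size, color=GREEN, color_aware=False):
        """Return the color assigned to a packet of `size` bytes."""
        if not color_aware:
            color = GREEN  # color-blind mode ignores any earlier marking
        if color == GREEN and self.tc - size >= 0:
            self.tc -= size
            return GREEN
        if color in (GREEN, YELLOW) and self.tp - size >= 0:
            self.tp -= size
            return YELLOW
        return RED  # Tc and Tp remain unchanged
```

With CBS = 1500 and PBS = 3000, a back-to-back burst of 1000-byte packets is marked green once, yellow three times, and red afterwards, until refill() adds new tokens.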
l Double Rate Traffic Policing
When the network traffic is complex, the double rate traffic policing can be used.
In double-rate traffic policing, two token buckets are used. The capacity of one token
bucket is the committed burst size (CBS); this bucket is called bucket C for short.
The capacity of the other token bucket is the peak burst size (PBS); this bucket is
called bucket P for short. Tc and Tp represent the number of tokens in bucket C and
bucket P, respectively. At initialization, Tc equals CBS and Tp equals PBS. CBS is less
than PBS. The two buckets are filled with tokens at different rates: bucket C at the
committed information rate (CIR) and bucket P at the peak information rate (PIR). An
incoming packet is processed (colored, discarded, or forwarded) according to the number
of tokens currently in the two buckets.
When a bucket is full, that is, when Tc reaches CBS or Tp reaches PBS, extra tokens are
dropped.
Tc is refreshed CIR times per second and Tp is refreshed PIR times per second. At each
refresh, the following rules are observed:
– If Tc < CBS, Tc increases by 1.
– If Tp < PBS, Tp increases by 1.
– Otherwise, Tc and Tp remain unchanged.
The processing also goes in the following two modes:
– The color-blind mode
– The color-aware mode
In the color-blind mode, when an arriving packet is measured (suppose the size of the
arriving packet is B), the following rules are observed:
– If Tp-B < 0, the packet is remarked in red.
– If Tp-B ≥ 0 and Tc-B < 0, the packet is remarked in yellow and Tp decreases by B.
– Otherwise, the packet is remarked in green and both Tc and Tp decrease by B.
In the color-aware mode, when an arriving packet is measured (suppose the size of the
arriving packet is B), the following rules are observed:
– If the packet is red or Tp-B < 0, the packet is remarked in red.
– If the packet is yellow, or Tp-B ≥ 0 and Tc-B < 0, the packet is remarked in yellow and
Tp decreases by B.
– Otherwise, the packet is remarked in green and both Tc and Tp decrease by B.
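The double-rate rules can be sketched in the same way. The class name TwoRateMarker is hypothetical, and token counts are kept in bytes. The color-aware branch assumes that a packet arriving yellow is never promoted to green, which is how the two-rate three-color marker in RFC 2698 behaves.

```python
GREEN, YELLOW, RED = "green", "yellow", "red"


class TwoRateMarker:
    """Dual token bucket marker: bucket C is fed at CIR, bucket P at PIR."""

    def __init__(self, cir, pir, cbs, pbs):
        self.cir, self.pir = cir, pir
        self.cbs, self.pbs = cbs, pbs
        # At initialization, Tc equals CBS and Tp equals PBS.
        self.tc, self.tp = cbs, pbs

    def refill(self, seconds):
        """Refill each bucket at its own rate; extra tokens are dropped."""
        self.tc = min(self.cbs, self.tc + int(self.cir * seconds))
        self.tp = min(self.pbs, self.tp + int(self.pir * seconds))

    def mark(self, size, color=GREEN, color_aware=False):
        """Return the color assigned to a packet of `size` bytes."""
        if not color_aware:
            color = GREEN  # color-blind mode ignores any earlier marking
        if color == RED or self.tp - size < 0:
            return RED
        if color == YELLOW or self.tc - size < 0:
            self.tp -= size
            return YELLOW
        self.tc -= size
        self.tp -= size
        return GREEN
```

Unlike the single-rate case, a green packet consumes tokens from both buckets, so the peak rate bounds the total of green and yellow traffic.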
Procedure for Traffic Policing

In traffic policing, the committed access rate (CAR) is used to control traffic. Packets are
classified according to a preset matching rule. Packets that stay within the limit specified
by the rule are forwarded by the ATN. Packets that exceed the limit are either
discarded or forwarded after their precedence is re-marked.
To control traffic, the token bucket (TB) is introduced in the CAR technology. Figure 12-7
shows the procedure of traffic policing with CAR.
Figure 12-7 Flowchart of traffic policing with CAR
(Figure not reproduced: incoming packets are classified and checked against a token bucket
that is filled with tokens at a specified rate; packets either pass as outgoing packets or
are dropped.)
l Tokens are put into the TB at a rate preset by the user. The capacity of the TB is
also preset by the user. When the number of tokens reaches the capacity of the TB, the
number does not increase any more.
l On arrival, packets are classified according to information such as the IP
precedence, source address, or destination address. Packets that match the
preset feature are checked against the TB for further processing.
l If the TB has enough tokens for a packet, the packet is forwarded and the
number of tokens is reduced by the packet length. If the TB contains insufficient tokens
or is empty, the packet is discarded, or it is re-marked and then forwarded. In this case,
the number of tokens in the TB remains unchanged.
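The three steps above can be sketched as a small policer. This is an illustration under stated assumptions: the names TokenBucket and car are hypothetical, classification is reduced to a source-prefix match, and non-conforming packets are always dropped rather than re-marked.

```python
import ipaddress


class TokenBucket:
    """Token bucket refilled at `rate` bytes per second up to `capacity` bytes."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0

    def consume(self, size, now):
        """Refill for the elapsed time, then try to take `size` tokens."""
        self.tokens = min(self.capacity, self.tokens + self.rate * (now - self.last))
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # token count stays unchanged


def car(packet, now, matched_network, bucket):
    """Return 'forward' or 'drop' for one packet (a dict with 'src' and 'len')."""
    if ipaddress.ip_address(packet["src"]) not in matched_network:
        return "forward"  # traffic outside the rule is not rate limited
    return "forward" if bucket.consume(packet["len"], now) else "drop"


# Hypothetical rule: limit traffic from 10.0.0.0/8 to 1000 bytes per second.
rule = ipaddress.ip_network("10.0.0.0/8")
bucket = TokenBucket(rate=1000, capacity=1500)
```

Calling car() twice at the same instant with two 1000-byte packets from 10.1.1.1 forwards the first and drops the second, while a packet from an address outside the rule is always forwarded.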
The main function of CAR is to limit the traffic rate. With the CAR technology, a TB is used
to measure the data traffic that flows through the ports of an ATN so that, within a specified
time, only the packets that are assigned tokens pass through the ATN. In this way, the traffic
rate is limited. CAR can limit the maximum rate of both incoming packets at the ingress and
outgoing packets at the egress. In addition, the rate of certain types of traffic can be
controlled according to characteristics such as the IP address, port number, and precedence.
Traffic that does not match the preset conditions is not rate limited; such traffic is
forwarded at its original rate.
The CAR technology is used at the network edge to ensure that the core device can process
data normally.
12.2.2.2 Basic Principles of Traffic Shaping
Basic Concepts of Traffic Shaping

l Token Bucket
Token bucket and cache queue are used to implement traffic shaping. The token bucket is
regarded as a container of tokens that has a pre-defined capacity. The system puts
tokens into the bucket at a defined rate. When the token bucket is full, no more
tokens can be added.
The processing procedures are as follows:
– If there are sufficient tokens in the bucket, packets are forwarded. At the same time,
the number of tokens in the bucket decreases based on the length of the packets.
– If the bucket holds insufficient tokens or is empty, the packets are cached in the GTS
queue.
l GTS Queue
When the packets need to be processed by GTS and there are no sufficient tokens in the
token bucket, the packets enter the cache queue, which is called the GTS queue.
When there are packets in the GTS queue, the GTS extracts and sends the packets
periodically. Before each packet is sent, its length is compared with the number of tokens
in the bucket. Sending continues until the remaining tokens are insufficient for the next
packet in the queue or until all queued packets are sent.
While the GTS queue holds packets, newly arriving packets that are subject to traffic
shaping enter the queue directly and wait to be scheduled periodically. The GTS queue
adopts PQ or WFQ scheduling. PQ scheduling guarantees delay-sensitive real-time services
and the bandwidth of higher-priority services; WFQ scheduling allocates bandwidth to
flows of different priorities based on the configured weights.
Packets that reach the interface when the GTS queue is full are
dropped.
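The interaction between the token bucket and the GTS queue can be sketched as follows. The class name Shaper and the integer clock are assumptions made for illustration. The sketch uses one FIFO queue, whereas the device schedules the GTS queue with PQ or WFQ.

```python
from collections import deque


class Shaper:
    """Token bucket shaper with a bounded GTS queue (single FIFO for simplicity)."""

    def __init__(self, rate, capacity, queue_limit):
        self.rate = rate                # tokens (bytes) added per second
        self.capacity = capacity        # token bucket depth in bytes
        self.tokens = capacity
        self.queue = deque()
        self.queue_limit = queue_limit  # maximum number of queued packets
        self.last = 0

    def _refill(self, now):
        self.tokens = min(self.capacity, self.tokens + self.rate * (now - self.last))
        self.last = now

    def enqueue(self, size, now):
        """Return 'sent', 'queued', or 'dropped' for a packet of `size` bytes."""
        self._refill(now)
        if not self.queue and self.tokens >= size:
            self.tokens -= size
            return "sent"
        if len(self.queue) >= self.queue_limit:
            return "dropped"            # the GTS queue is full
        self.queue.append(size)         # wait behind packets already queued
        return "queued"

    def tick(self, now):
        """Periodic service: send queued packets while tokens last."""
        self._refill(now)
        sent = []
        while self.queue and self.tokens >= self.queue[0]:
            size = self.queue.popleft()
            self.tokens -= size
            sent.append(size)
        return sent
```

A packet that would be dropped by a policer is only delayed here: it leaves the queue on a later tick(), once enough tokens have accumulated.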
Implementation of Traffic Shaping

At present, the ATN supports traffic shaping only on the outbound interface. Traffic shaping
can be applied to all the packets on the interface.
l On the interface, configure different shaping parameters for the packets that participate
in the traffic shaping based on different service classes (EF, AF1, AF2, AF3, AF4, BE,
CS6, or CS7).
l The scheduling mode of the GTS queue can adopt either the PQ scheduling or the WFQ
scheduling. In the GTS queue, the scheduling modes of the packets with different service
levels have the following default values:
– For the AF1 to AF4 queue or the BE queue, by default, the WFQ scheduling is
configured. The bandwidth is allocated based on the configured weight parameters.
– For the EF, CS6, or CS7 queue, by default, the PQ scheduling is configured. Based
on the priority, the PQ scheduling is applied to the services that are sensitive to the
delay.
l When the GTS queue adopts the WFQ scheduling, a weight can be configured for each queue.
The weights determine the ratio in which bandwidth is divided among all the WFQ queues.
l You can configure the shaping value on the interface, that is, the rate of putting the
tokens in the token bucket. If the rate of the packets exceeds this value, the packets enter
the GTS queue.
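As a worked example of the weight ratio, the following sketch divides bandwidth among the WFQ queues in proportion to their weights. The default scheduling modes are taken from the list above. The weights and the 100 Mbit/s figure are hypothetical values chosen for illustration, and the function name is not part of any device interface.

```python
# Default scheduling mode per service class, as listed above.
DEFAULT_SCHEDULING = {
    "CS7": "PQ", "CS6": "PQ", "EF": "PQ",
    "AF4": "WFQ", "AF3": "WFQ", "AF2": "WFQ", "AF1": "WFQ", "BE": "WFQ",
}


def wfq_bandwidth_shares(weights, bandwidth_mbps):
    """Split bandwidth among WFQ queues in proportion to their weights."""
    for name in weights:
        if DEFAULT_SCHEDULING.get(name) != "WFQ":
            raise ValueError(f"{name} is not a WFQ queue by default")
    total = sum(weights.values())
    return {name: bandwidth_mbps * w / total for name, w in weights.items()}


# Hypothetical weights; 100 Mbit/s is assumed to be available to the WFQ queues.
shares = wfq_bandwidth_shares(
    {"AF4": 15, "AF3": 15, "AF2": 10, "AF1": 10, "BE": 50}, 100
)
```

With these weights, BE receives 50 Mbit/s, AF3 and AF4 receive 15 Mbit/s each, and AF1 and AF2 receive 10 Mbit/s each.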
NOTE
The lengths of the frame header and CRC field are counted in the bandwidth for packets to which
CAR applies but are not counted in the bandwidth for packets to which traffic shaping applies.
For example, if the traffic shaping value is set to 23 Mbit/s for IPoE packets, the IP packets are
transmitted at a rate of 23 Mbit/s with the lengths of the frame header and CRC field not counted.
For the ATN, the depth of the token bucket is set by the system.
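The effect of this accounting difference can be estimated with simple arithmetic. The sketch below assumes untagged Ethernet II framing, that is, a 14-byte frame header and a 4-byte CRC per packet; the overhead differs for other encapsulations. The function name is hypothetical.

```python
ETH_HEADER_BYTES = 14  # assumption: untagged Ethernet II header
ETH_CRC_BYTES = 4      # assumption: 4-byte frame check sequence


def frame_rate_mbps(ip_rate_mbps, ip_packet_bytes,
                    overhead_bytes=ETH_HEADER_BYTES + ETH_CRC_BYTES):
    """Rate measured with frame header and CRC for a given IP-level rate."""
    return ip_rate_mbps * (ip_packet_bytes + overhead_bytes) / ip_packet_bytes


# IP traffic shaped to 23 Mbit/s, measured with header and CRC included:
large = frame_rate_mbps(23, 1500)  # 1500-byte IP packets
small = frame_rate_mbps(23, 46)    # 46-byte IP packets
```

Under these assumptions, 23 Mbit/s of 1500-byte IP packets corresponds to about 23.28 Mbit/s when the header and CRC are counted, and 23 Mbit/s of 46-byte IP packets corresponds to 32 Mbit/s. The gap between a shaping value and a CAR value therefore grows as packets get smaller.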

Processing Procedure
To reduce packet loss, packets are processed by GTS on the outbound interface of the upstream
ATN. The packets that exceed the GTS traffic features are cached in the
interface buffer of the upstream ATN. When the network congestion disappears, GTS extracts
the packets from the buffer queue and continues to send them. Therefore, the packets
sent to the downstream ATNs conform to the traffic specifications, and the packet loss ratio
on the downstream ATNs is decreased. If the packets are not processed by GTS on the
outbound interface of the upstream ATN, the packets that exceed the CAR specifications of
the downstream ATN are dropped by the downstream ATN.
Differences Between Traffic Shaping and Traffic Policing

The main differences between traffic shaping and traffic policing are as follows:
l Traffic policing controls traffic by dropping the packets that do not conform
to the traffic specifications. Traffic shaping, however, caches the packets
that exceed the traffic specifications instead of dropping them. When there are sufficient
tokens in the token bucket, the cached packets are sent at a uniform rate.
l Traffic shaping may increase the delay, whereas traffic policing does not introduce
extra delay.
Table 12-4 Comparison between traffic shaping and traffic policing

Type              Merit                               Demerit
Traffic shaping   Prevents unnecessary packet loss.   Introduces delay and jitter. More buffer
                                                      resources are needed to cache the packets.
Traffic policing  Supports marking. No extra          Packet loss may lead to retransmission.
                  buffer is needed.
12.2.3 Applications
Applications of Traffic Policing
Traffic policing is a traffic control policy that limits a traffic rate and resource usage through
monitoring the traffic specifications. Traffic policing is applied at the edge of a network to
ensure that core devices process data normally. Figure 12-8 is a typical networking diagram
for traffic policing.
Figure 12-8 Typical networking for traffic policing
(Figure not reproduced. Labels: "Output interface is not congested; queuing and WRED do not
work." and "Congestion in WAN network results in nonintelligent Layer 2 drops.")

When a high-rate network transmits data to a low-rate network, the
entrance of the low-rate network becomes a bottleneck, which results in serious data
loss and especially affects data that demands low latency, such as voice data.
Traffic control limits the transmission of high-rate data; the voice data, however, must be
forwarded first. The traffic classification function can be used to assign high priority to the
voice data stream. The CAR traffic policing and the queue scheduling work together to ensure
the quality of communication.
The ATN supports interface-based traffic policing, which
controls all the traffic that enters the interface regardless of the packet type. Traffic
policing can be applied to both incoming packets and outgoing packets.
The CAR works together with other QoS policies to provide QoS control for an entire
network. When the CAR works together with other QoS policies, the working order of the
policies is as follows:
l When policies are configured at the ingress, all policies take effect before the
forwarding decision is made. If the CAR and other policies such as PQ and WFQ are configured,
the CAR takes effect earlier than the other policies.
l When policies are configured at the egress, all policies take effect after the
forwarding decision is made. If the CAR and other policies such as PQ and WFQ are
configured, the CAR takes effect earlier than the other policies.
Applications of Traffic Shaping

Figure 12-9 shows the typical application.
Figure 12-9 Typical diagram of the traffic shaping
(Figure not reproduced: ATN A is connected to ATN B over a physical line.)
As shown in Figure 12-9, ATN A sends packets to ATN B. To reduce packet loss,
the packets are processed by GTS on the outbound interface of ATN A. The packets that do
not conform to the GTS traffic features are cached on ATN A. When ATN A can send the next
batch of packets, the GTS extracts the packets from the buffer queue and sends them.
Therefore, the packets sent to ATN B conform to the traffic specifications of ATN B, and the
packet loss ratio on ATN B is decreased.

Terms
Terms                          Description
Committed Access Rate (CAR)    Committed Access Rate (CAR) limits the traffic volume of a
                               specified type of packets. A token bucket (TB) is used to
                               perform TP.
Generic Traffic Shaping (GTS)  Generic Traffic Shaping: The typical application of GTS is to
                               control the volume and burst of outgoing traffic based on the
                               network connection. Therefore, the packets can be sent at a
                               uniform rate. The traffic shaping is implemented by using the
                               buffer and the token bucket. When packets are sent too fast,
                               they are first cached in the buffer and then sent at a uniform
                               rate under the control of the token bucket.
Traffic policing               Traffic Policing (TP) is a traffic management technology that is
                               applied at the ingress or egress of an ATN to limit specified
                               types of data traffic. A typical application of TP is to monitor
                               the specification of a type of traffic that enters a network.
                               Based on the result, the traffic can be limited to a reasonable
                               range, or the traffic that exceeds the limit is "punished." As a
                               result, the network resources and the interests of a carrier
                               are protected.
Traffic Shaping                A typical application of traffic shaping is to control the flow
                               and burst of outgoing traffic based on the network connection.
                               Therefore, the packets can be sent at a uniform rate. Traffic
                               shaping uses Generic Traffic Shaping (GTS) to shape traffic that
                               is irregular or does not conform to the preset traffic features,
                               so that the bandwidth of the upstream network matches that of
                               the downstream network.

Acronyms & Abbreviations   Full Name
CAR                        Committed Access Rate
CBS                        Committed Burst Size
CIR                        Committed Information Rate
TP                         Traffic Policing
TS                         Traffic Shaping
WFQ                        Weighted Fair Queuing
12.3 Congestion Avoidance and Management

12.3.1 Introduction

Definition
l Congestion Avoidance
Congestion avoidance is a traffic control mechanism that monitors network resources
such as queues and buffer memory. When network congestion tends to
intensify, the ATN actively discards packets to regulate network traffic so that the
network is not overloaded.
l Congestion Management
Congestion management provides means to manage and control traffic when traffic
congestion occurs. The queue scheduling technology is used to handle traffic congestion.
Packets sent from one interface are placed into many queues which are identified with
different priorities. Packets are then sent according to the priorities. A proper queue
scheduling mechanism can provide packets of different types with reasonable QoS
features such as the bandwidth, latency, and jitter. The queue here refers to the outgoing
packet queue. Packets are buffered into queues before the interface is able to send them.
Therefore, the queue scheduling mechanism works only when an outbound interface is
congested. The queue scheduling mechanism can re-arrange the order of packets except
those in First In First Out (FIFO) queues.
Purpose
Congestion avoidance and congestion management are traffic control mechanisms that regulate
network traffic so that the network is not overloaded.
Benefits
This feature brings the following benefits to users:
l Packets with high priorities are preferentially transmitted when traffic congestion
occurs.
12.3.2 Principles
12.3.2.1 Basic Principles of Congestion Avoidance
Congestion avoidance is a traffic control mechanism used to discard packets according to the
queue status when the network is congested. Through congestion avoidance, the QoS of
traffic is improved when the network is congested.
The traditional congestion avoidance solution is tail drop. That is, all arriving
packets are discarded once the queue is full. If a large number of packets from a
TCP connection are discarded, the TCP connection times out and enters the slow start state.
Then, the TCP connection sends fewer packets. When packets from multiple TCP connections
are discarded in a queue, these TCP connections enter the congestion avoidance and slow start
state at the same time, which is referred to as global TCP synchronization. These
TCP connections then simultaneously send fewer packets to the queue, so the rate of incoming
packets falls below the rate of outgoing packets and the bandwidth is underused.
To avoid the preceding problems, packets must be discarded before the queue becomes
congested. WRED is a congestion avoidance mechanism that discards packets to prevent
queues from being congested. As the queue grows, WRED discards packets that may
cause congestion with increasing probability. Therefore, the bandwidth that TCP connections
consume on the outbound interface is reduced gradually, which does not cause the global
synchronization of a large number of TCP connections. This also reduces the average queue
length and shortens the delay for sending traffic.

The ATN uses both the tail-drop and the WRED algorithms for congestion avoidance. In the
Diff-Serv model, the ATN reserves eight service queues for each port. The queues map to the
following service types respectively: BE, AF1 to AF4, EF, CS6, and CS7. By default, the AF1 to
AF4 and BE queues use WFQ scheduling and are allocated bandwidth
according to the configured weight parameters. The EF, CS6, and CS7 queues use
PQ scheduling by default.
RED Algorithm
To avoid the global TCP synchronization, you can use the random early detection (RED)
mechanism. RED detects congestion early: for a defined type of traffic, when the length of
a queue exceeds a limit, the router starts to discard a proportion of packets at random
before the queue is full. Figure 12-10 shows the working principle of the RED.
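Figure 12-10 Principle of the RED
(Figure not reproduced: drop probability plotted against average queue size. No packets are
dropped below the minimum threshold (20 in the figure), packets are dropped randomly between
the minimum threshold and the maximum threshold (40 in the figure), and all packets are
dropped (100%) above the maximum threshold. The figure also marks a drop probability of 10%.)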
According to the RED algorithm, each queue is set with a minimum threshold and a
maximum threshold:
l When a queue is shorter than the minimum threshold, the router does not discard
packets.
l When a queue is longer than the maximum threshold, the router discards all incoming
packets.
l When the length of a queue is between the minimum threshold and the maximum
threshold, the router discards packets at random. Each arriving
packet is assigned a random number, which is compared with the
current discarding probability of the queue. If the number is smaller than the discarding
probability, the packet is discarded. The longer a queue, the higher the drop probability
is. The probability, however, cannot exceed the maximum value.
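The three cases above can be sketched as a drop-probability function. The text does not specify how the probability grows between the thresholds; the sketch assumes the linear ramp used by classic RED. It also assumes a uniform random number in [0, 1) and drops the packet when the number falls below the probability, so that the fraction of dropped packets equals the probability. The function names are hypothetical, and the values 20, 40, and 10% are read from the labels of Figure 12-10, with 10% taken as the maximum drop probability.

```python
import random


def red_drop_probability(queue_len, min_th, max_th, max_p):
    """Drop probability for a queue length under RED (linear ramp assumed)."""
    if queue_len < min_th:
        return 0.0   # shorter than the minimum threshold: no drop
    if queue_len > max_th:
        return 1.0   # longer than the maximum threshold: drop everything
    return max_p * (queue_len - min_th) / (max_th - min_th)


def red_should_drop(queue_len, min_th, max_th, max_p, rng=random.random):
    """Decide the fate of one arriving packet."""
    return rng() < red_drop_probability(queue_len, min_th, max_th, max_p)


# Thresholds as labeled in Figure 12-10; 10% as maximum drop probability (assumption).
p_mid = red_drop_probability(30, min_th=20, max_th=40, max_p=0.10)
```

Halfway between the thresholds, the probability is half the maximum value, 5% in this example.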
If the router decided to discard packets by comparing the instantaneous length of a
queue with the minimum threshold and the maximum threshold, the mechanism would be unfair
to burst traffic and unfavorable for data transmission. Therefore, the router instead
compares the average queue size (AQS) with the minimum threshold and the maximum
threshold. The AQS reflects the changing tendency of a queue and
is insensitive to abrupt changes of the queue length. This prevents
burst traffic from being treated unfairly.
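The text does not state how the AQS is computed. A common choice, used by classic RED and assumed here, is an exponentially weighted moving average of the instantaneous queue length. The function names and the weight of 0.25 are illustrative only; real implementations typically use a much smaller weight.

```python
def update_average(avg, current_len, weight):
    """Exponentially weighted moving average of the queue length (assumption)."""
    return (1 - weight) * avg + weight * current_len


def average_over(samples, weight, avg=0.0):
    """Feed a sequence of instantaneous queue lengths into the average."""
    for current_len in samples:
        avg = update_average(avg, current_len, weight)
    return avg


# A one-sample burst of 100 packets moves the average only part of the way ...
burst = average_over([0, 0, 100, 0, 0], weight=0.25)
# ... while sustained queuing pulls it close to the real queue length.
sustained = average_over([100] * 20, weight=0.25)
```

Here the burst leaves the average at about 14 packets, well below a minimum threshold of, say, 20, whereas a queue that stays at 100 packets drives the average above 99.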
The random discarding of packets in the RED mechanism prevents the traffic rates of many
TCP connections from dropping simultaneously. In this way, global TCP synchronization is
avoided. When the packets of a certain TCP connection are discarded and its traffic rate
decreases, packets of other TCP connections are still sent at a high rate. At any time,
some TCP connections are sending packets at a high rate. As a result, the bandwidth of a
link is fully used.
Figure 12-11 Traffic flows for the RED algorithm
(Figure not reproduced: traffic of Flow A, Flow B, and Flow C shown against the average link use.)
Figure 12-11 shows that when the RED congestion avoidance algorithm is used, the traffic
flows on a network are stable.
WRED Algorithm
The RED algorithm solves the problem of global TCP synchronization. This
algorithm, however, cannot sense any QoS signaling: all types of packets are treated
equally. Therefore, this algorithm is less flexible. To apply differentiated discarding policies
to different types of packets, the weighted random early detection (WRED) algorithm is
introduced.
The WRED algorithm is similar to the RED algorithm. In the WRED algorithm, each queue is
also set with a minimum threshold and a maximum threshold:
l When a queue is shorter than the minimum threshold, the ATN does not discard packets.
l When a queue is longer than the maximum threshold, the ATN discards all incoming
packets.
l When the length of a queue is between the minimum threshold and the maximum
threshold, the ATN discards packets at random. Each
arriving packet is assigned a random number, which is compared with
the current discarding probability of the queue. If the number is smaller than the discarding
probability, the packet is discarded. The longer a queue, the higher the discarding
probability. The probability, however, cannot exceed the maximum value.
l In addition, the average queue length is compared with the minimum threshold
and the maximum threshold so that burst traffic is processed fairly.
NOTE
The longer the queue, the higher the drop probability. When the queue lengths are the same, the higher
the maximum drop probability, the higher the drop probability.

Different from the RED algorithm, the WRED algorithm makes its discarding decision
based on the packet precedence. In the WRED mechanism, the DSCP value that indicates the IP
precedence is used to identify discarding policies. You can set the queue length, queue
threshold, and drop probability for different DSCP values so that packets of different
precedence are subject to different discarding probabilities. This is the key feature of
the WRED algorithm.
l When the weighted fair queuing (WFQ) is used in the queuing mechanism, packets of
different precedence can be set with different minimum threshold, maximum threshold,
and drop probability. In this way, packets of different precedence are provided with
different discarding features.
l When FIFO or PQ is used in the queuing mechanism, you can set a different minimum
threshold, maximum threshold, and drop probability for each queue so that packets of
different types are provided with different discarding features.
Figure 12-12 shows the relationships between the WRED and queues.
Figure 12-12 Relationships between the WRED and queues
(Figure not reproduced. Labels: packets to be sent from this interface, classifying, WRED
drop, dropped packets, Queue 1 to Queue N with weight 1 to weight N, scheduling, forwarding
queue, and forwarded packets.)
Congestion Avoidance Algorithms

l PQ
The discarding policy for the PQ can be the tail-drop or the WRED. Services that
demand high real-time performance use the tail-drop policy because their packets
must be given the strongest possible QoS guarantee. The tail-drop policy means
that the ATN discards packets only when a queue reaches the length threshold. The PQ
scheduling preempts the bandwidth of other services; therefore, when traffic congestion
occurs, real-time services are guaranteed as much bandwidth as possible.
l WFQ
The default discarding policy for the WFQ is tail-drop but in reality the WRED is mostly
adopted. The WFQ scheduling is often applied to the packets of low precedence and
those insensitive to latency. You can use the WFQ and the WRED together to configure
different discarding parameters to different types of traffic so that different purposes are
reached.

You can configure a template on a device to implement the WRED. First define WRED
templates: set the maximum and minimum thresholds for packets in different colors and
set the drop probability. Then apply the WRED templates for different levels of service
quality on the interface. You can configure a maximum of eight WRED templates for queues on
an interface. Each template handles packets of up to three colors:
red, yellow, and green. Generally green packets are
set to low drop probability and high threshold while red packets are set to high drop
probability and low threshold. You can flexibly configure different thresholds and drop
probabilities for packets of different colors.
When traffic congestion occurs, a queue begins to buffer packets. Because red packets are
set to a low threshold and a high drop probability,
they begin to be dropped first. When the queue is long enough,
green packets begin to be dropped. When the queue length reaches the maximum
threshold of a color, packets of this color are subject to the tail-drop policy.
Because the WFQ queues share the bandwidth in proportion, traffic congestion occurs
easily. The use of the WRED policy can effectively prevent the global TCP
synchronization.
Currently, the device supports the application of the WRED policy only on outbound
interfaces.
l Congestion Avoidance of Flow Queues
The downstream FQs support the WRED and tail drop mechanisms.
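The three-color WRED template described under WFQ can be sketched as a lookup of per-color parameters. All threshold and probability values below are hypothetical and are not device defaults. The linear ramp between the thresholds is an assumption carried over from classic RED, and the names WRED_TEMPLATE and wred_drop_probability are illustrative.

```python
# Hypothetical template: color -> (low threshold, high threshold, max drop probability).
# Green gets high thresholds and a low probability; red gets the opposite.
WRED_TEMPLATE = {
    "green": (80, 100, 0.10),
    "yellow": (50, 80, 0.50),
    "red": (20, 50, 1.00),
}


def wred_drop_probability(color, queue_len, template=WRED_TEMPLATE):
    """Drop probability for a packet of the given color at the given queue length."""
    low, high, max_p = template[color]
    if queue_len < low:
        return 0.0
    if queue_len >= high:
        return 1.0  # at the maximum threshold of a color, tail drop applies
    return max_p * (queue_len - low) / (high - low)


# As the queue grows, red packets are affected first and green packets last.
at_30 = {c: wred_drop_probability(c, 30) for c in WRED_TEMPLATE}
at_90 = {c: wred_drop_probability(c, 90) for c in WRED_TEMPLATE}
```

At a queue length of 30 only red packets are dropped with non-zero probability; at 90, red and yellow packets are tail-dropped and green packets start to be dropped at random.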
12.3.2.2 Basic Principles of Congestion Management
Common Queue Scheduling Algorithms

l FIFO
FIFO is the simplest queuing algorithm. One interface has only one FIFO queue.
Therefore, FIFO does not need traffic classification, and a single queue does not need to be
scheduled. The only FIFO parameter is the queue length, which affects the latency and
packet drop ratio.
FIFO queuing works under the tail drop mechanism. In the tail drop mechanism, when a
queue is full, all further packets that attempt to join the queue are dropped. No means is
provided to let later packets take up the positions of the packets already in the queue.
Figure 12-13 FIFO queuing
(Figure not reproduced: packets to be sent from this interface enter a single queue, are
scheduled, and leave as packets out.)
As shown in Figure 12-13, FIFO does not classify packets. When packets arrive at the
interface at a rate higher than the interface can send them, FIFO lets the packets
that come earlier enter the queue first. At the outbound interface, FIFO lets the packets
leave in the same order in which they entered the queue. This is
called first in, first out.
In the FIFO mechanism, if a queue is defined to be too long, the queue rarely becomes
full and fewer packets are discarded, but a long queue results in long latency. If a queue is
defined to be too short, latency is short but more packets are discarded. In configuration,
you must balance the two factors to achieve a favorable result. Such a problem
also exists in other queue scheduling mechanisms.
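The trade-off between queue length, loss, and latency can be illustrated with a bounded FIFO queue and a worst-case delay estimate. The class and function names are hypothetical, and the delay formula assumes equal-sized packets drained at the full link rate.

```python
from collections import deque


class FifoQueue:
    """Single FIFO queue with tail drop."""

    def __init__(self, max_packets):
        self.max_packets = max_packets
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet):
        if len(self.packets) >= self.max_packets:
            self.dropped += 1  # tail drop: the queue is full
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        """Packets leave in the order in which they entered."""
        return self.packets.popleft() if self.packets else None


def worst_case_delay_ms(max_packets, packet_bytes, link_bps):
    """Time needed to drain a full queue, that is, the delay seen by the last packet."""
    return max_packets * packet_bytes * 8 * 1000 / link_bps


# A 100-packet queue of 1500-byte packets on a 100 Mbit/s link:
delay = worst_case_delay_ms(100, 1500, 100_000_000)
```

In this example a full queue adds 12 ms of delay. Doubling the queue length halves the chance of tail drop during a burst of a given size but doubles this worst-case delay.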
l PQ
In the Priority Queuing (PQ) mechanism, queues are generally classified into four levels,
namely, top, middle, normal, and bottom, from high to low in priority.
NOTE
On the device, queues are classified into eight priority levels, from 0 to 7.
As shown in Figure 12-14, when packets arrive, PQ organizes the packets into four
classes. Each class of packets is sent to one of the four PQ queues.
Figure 12-14 Priority queuing
(Figure not reproduced: packets to be sent from this interface are classified into the Top,
Middle, Normal, and Bottom queues, scheduled, and sent as packets out.)
When packets leave a queue, PQ lets the packets from the queue of the top priority go
first. Packets from this queue keep being sent until the queue is empty. When the packets
from the queue of the top priority are all sent, packets from the queue of middle priority
are sent. When the packets from the queue of the middle priority are all sent, the packets
from the queue of the normal priority are sent; finally, the packets from the queue of the
bottom priority are sent.
In this way, packets from the queue of high priority are sent earlier according to the
classification. When congestion occurs, packets from the queue of high priority are still
allowed to leave earlier. As a result, the packets of important services such as the
enterprise resource planning (ERP) service are handled earlier. The packets of less
important services such as the email service are handled only after the packets of
important services are all sent and the network is idle. As a result, key services are
handled first and network resources are also fully used.
PQ has the following features:
– ACLs can be used to classify packets, and the classified packets are put into
different queues as required.
– Tail drop is the only packet drop policy used when congestion occurs.
– FIFO is used within each queue.
– In queue scheduling, packets in a higher-priority queue are scheduled first.
PQ has the following advantages and disadvantages:
– Advantages: Packets in a high-priority queue are provided with higher
bandwidth, lower latency, and less jitter.
– Disadvantages: Packets in a low-priority queue may not be scheduled in time and
can therefore be "starved."

l WFQ
Weighted fair queuing (WFQ) is a complex queuing algorithm. It processes services of the
same priority fairly and weights services of different priorities before processing them.
Figure 12-15 shows the WFQ queuing principle.
Figure 12-15 WFQ queuing principle
[Figure: packets to be sent from this interface are classified into Queue 1, Queue 2, ..., Queue N, and the scheduler sends packets out from these queues.]
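The PQ behavior described above (FIFO within each queue, tail drop, and strict service of the highest-priority non-empty queue) can be sketched as follows. This is a minimal illustration, not the device implementation; the class name, the max_len queue depth, and the packet labels are assumptions made for the example.

```python
from collections import deque

# The four PQ levels named in the text, highest priority first.
LEVELS = ["top", "middle", "normal", "bottom"]

class PriorityQueuing:
    def __init__(self, max_len):
        # max_len is a hypothetical per-queue depth used for tail drop.
        self.max_len = max_len
        self.queues = {level: deque() for level in LEVELS}
        self.dropped = []

    def enqueue(self, level, packet):
        queue = self.queues[level]
        if len(queue) >= self.max_len:
            self.dropped.append(packet)   # tail drop: the arriving packet is discarded
            return False
        queue.append(packet)              # FIFO order inside each queue
        return True

    def dequeue(self):
        # Always serve the highest-priority queue that is not empty.
        for level in LEVELS:
            if self.queues[level]:
                return self.queues[level].popleft()
        return None

pq = PriorityQueuing(max_len=2)
for level, packet in [("bottom", "mail-1"), ("top", "erp-1"), ("normal", "web-1"),
                      ("top", "erp-2"), ("top", "erp-3")]:
    pq.enqueue(level, packet)

sent = []
while (packet := pq.dequeue()) is not None:
    sent.append(packet)
print(sent)        # ['erp-1', 'erp-2', 'web-1', 'mail-1']
print(pq.dropped)  # ['erp-3']
```

The email packet arrives first but leaves last, which is the starvation risk noted for low-priority queues.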
Implementation of Congestion Management

When configuring queues, you do not need to know which scheduling algorithm is used.
You only need to specify the external traffic characteristics (expressed as parameters) of
the service carried in a queue, for example, the guaranteed bandwidth, the peak bandwidth,
and the share of the remaining bandwidth. A scheduling algorithm is then chosen according
to the configured traffic parameters so that the user's configuration is guaranteed.
Queue scheduling on the device consists of three stages: traffic rate limiting, default
SQ queue scheduling, and FQ queue scheduling on the interface. The following parameters
can be configured for queues:
l Shaping value: sets the guaranteed peak bandwidth.
l PQ algorithm: a scheduling algorithm based on priority.
l WFQ algorithm: a scheduling algorithm based on weight.
In default SQ queue scheduling, the PQ or WFQ algorithm is used. Interface queue
scheduling has the following advantages:
l Real-time services that are sensitive to latency are provided with guaranteed quality,
and high-priority services can be given bandwidth first.
l Traffic flows of different priorities can be assigned different bandwidth according to
their weights.
In the Diff-Serv model, the ATN reserves eight service queues for each interface. These
queues correspond to the service types BE, AF1 to AF4, EF, CS6, and CS7. By default, the
AF1 to AF4 and BE queues use the WFQ scheduling scheme, in which bandwidth is distributed
in proportion to the preset weights. The EF, CS6, and CS7 queues use the PQ scheduling
scheme by default. PQ scheduling is based on absolute priorities and is used for services
that are sensitive to latency.
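The default combination of PQ and WFQ can be illustrated with a small bandwidth calculation. The sketch below is only an approximation under stated assumptions: the weights, the service order among the three PQ queues, and the function name allocate are hypothetical, and rates are abstract numbers rather than device parameters.

```python
PQ_ORDER = ["CS7", "CS6", "EF"]   # served first; the order among them is an assumption
WFQ_WEIGHTS = {"AF4": 4, "AF3": 3, "AF2": 2, "AF1": 1, "BE": 1}   # hypothetical weights

def allocate(link_rate, demand):
    """Return the rate granted to each of the eight queues."""
    granted = {name: 0.0 for name in PQ_ORDER + list(WFQ_WEIGHTS)}
    remaining = float(link_rate)
    # 1. PQ queues take what they need in strict priority order.
    for name in PQ_ORDER:
        granted[name] = min(demand.get(name, 0), remaining)
        remaining -= granted[name]
    # 2. WFQ queues share the rest in proportion to their weights.
    #    Bandwidth that a queue does not need is redistributed to the others.
    active = {name for name in WFQ_WEIGHTS if demand.get(name, 0) > 0}
    while active and remaining > 1e-9:
        budget = remaining
        total_weight = sum(WFQ_WEIGHTS[name] for name in active)
        satisfied = set()
        for name in active:
            share = budget * WFQ_WEIGHTS[name] / total_weight
            need = demand[name] - granted[name]
            take = min(share, need)
            granted[name] += take
            remaining -= take
            if need <= share:
                satisfied.add(name)
        if not satisfied:
            break
        active -= satisfied
    return granted

rates = allocate(100, {"EF": 20, "AF4": 100, "AF1": 10, "BE": 100})
print({name: round(rate, 2) for name, rate in rates.items() if rate})
```

With these inputs EF receives its full 20, AF1 receives its full 10, and the remaining 70 is split between AF4 and BE in the ratio 4:1 (about 56 and 14).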

12.3.3 Applications
Networking and Application of Congestion Avoidance
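Figure 12-16 Typical networking for congestion avoidance
[Figure: Server (10.1.1.2/24), Telephone (10.1.1.3/24), PC1 (10.1.1.4/24), and PC2 (10.1.1.5/24) connect to ATN A, which connects to CX and the network. Other labels in the figure: interfaces GE1, GE2, S0, and S1, and addresses 10.1.1.1/24 and 100.1.1.1/24.]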
As shown in Figure 12-16, Server, Telephone, PC1, and PC2 all send data to the network
through ATN A. The data sent from Server belongs to the critical traffic class, the data
sent from Telephone is voice traffic, and the data from PC1 and PC2 is normal service
traffic. Because the rate of the inbound interface GE0/3/0 on ATN A is greater than that of
the outbound interface GE0/3/1, congestion may occur on GE0/3/1.
When network congestion occurs, the data sent by Server and Telephone must be transmitted
first. Users PC1 and PC2 can tolerate a small delay in the transmission of their data, but
they also require a bandwidth guarantee because they are VIP users. Therefore, ATN A must
discard packets based on packet priority when the network congestion intensifies.
ATN A sends packets to CX through the S0 interface. Because the bandwidth of the S0
interface is less than that of the S1 interface, the S0 interface on ATN A is prone to
congestion.
Queuing technologies are needed to manage and control the congested interface. First,
classify the packets to be sent from the S0 interface and place them into different queues.
Then process the queues according to their priorities, handling high-priority packets
first.
Terms
Terms                     Description
Congestion                A network state in which traffic exceeds the supported capacity,
                          so that the system cannot process network messages normally.
Congestion Avoidance      A traffic control mechanism that monitors network resources such
                          as queues and buffer memory. When congestion tends to intensify,
                          the ATN actively discards packets to regulate network traffic so
                          that the network is not overloaded.
Congestion Management     A means of managing and controlling traffic when congestion
                          occurs. Queue scheduling is used to handle congestion: packets
                          sent from one interface are placed into multiple queues that are
                          identified by different priorities, and are then sent according
                          to those priorities.
First In First Out        First In First Out: allows packets to enter and leave the queue
Queuing (FIFO)            in the order in which they reach the interface.
Priority Queue (PQ)       Priority Queuing: classifies packets based on the information
                          they carry and sends them according to the specified priorities.
                          PQ ensures higher bandwidth, lower delay, and less jitter for
                          higher-priority queues. Packets in lower-priority queues cannot
                          be scheduled immediately and may be dropped.
Weighted Fair Queue       Weighted Fair Queuing: classifies traffic dynamically based on
(WFQ)                     the quintuple (source IP address, destination IP address,
                          protocol number, source port number, and destination port
                          number). A hash algorithm maps the flows to different queues,
                          and bandwidth is allocated to each flow based on the priority of
                          the flow. In some cases, the ToS field is used.
Weighted Random Early     Weighted random early detection (WRED) is a mechanism for
Detection (WRED)          detecting traffic congestion. You can define a type of traffic
                          so that, when the length of a queue exceeds a limit, the ATN
                          randomly discards a proportion of packets in advance. WRED
                          introduces weights on the basis of random early detection (RED).

Acronyms & Abbreviations  Full Name
CBQ                       Class-based Queue
FIFO                      First In First Out
FQ                        Fair Queue
PQ                        Priority Queue
WRED                      Weighted Random Early Detection

12.4 Class-Based QoS
12.4.1 Introduction
Definition
Traffic classification sorts packets according to rules that are defined on specific
information contained in the packets, and then applies different QoS policies to the
packets that match different rules.
Based on the matching rules, traffic classification is divided into simple traffic
classification and complex traffic classification.
l Simple traffic classification
Simple traffic classification classifies data packets based on multiple priorities or service
classes. If the first three bits (IP precedence) of the ToS field in the IP header are used to
mark the packets, the packets can be classified into a maximum of eight classes. If the
Differentiated Services Code Point (DSCP), the first six bits of the ToS field, is used to
mark packets, the packets can be classified into a maximum of 64 classes. After the
packets are classified, other QoS features can be applied to the different classes to
implement class-based congestion management and traffic shaping.
The network administrator can set behavior aggregate (BA) policies for packets, based on
the IP precedence or DSCP values of IP packets, the EXP values of MPLS packets, and the
802.1p values of VLAN packets.
l Complex traffic classification
Complex traffic classification classifies packets based on information such as the
quintuple (source IP address, source port number, protocol number, destination IP
address, and destination port number) and the TCP SYN flag. In general, classification
relies only on the header information of encapsulated packets, and packet contents are
seldom used as classification criteria. Multi-field (MF) classification is configured at the
edge of a network by default. When packets enter the edge node, the network
administrator can flexibly configure classification rules, for example, rules that match
the quintuple or rules that match all the packets of a certain network segment.
BA classification reads the priority fields of packets, looks up the corresponding entry,
and applies the traffic behavior, which does not affect forwarding performance. MF
classification must extract packet information, construct a key value, search for and
match rules, obtain data according to the index, and then apply the traffic behavior,
which affects forwarding performance.
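The class counts given for simple traffic classification follow directly from the bit widths: three bits give eight IP precedence values, and six bits give 64 DSCP values. A minimal sketch of extracting both values from an 8-bit ToS byte is shown below; the function names are chosen for this example only.

```python
def ip_precedence(tos):
    """First three bits of the 8-bit ToS field: 8 possible classes."""
    return (tos >> 5) & 0x07

def dscp(tos):
    """First six bits of the 8-bit ToS field: 64 possible classes."""
    return (tos >> 2) & 0x3F

tos = 0xB8  # example ToS byte
print(ip_precedence(tos), dscp(tos))  # 5 46
```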
Purpose
Traffic classification provides differentiated services for user traffic in the Diff-Serv
domain.
Because of the traffic model and service model of IP networks, the Internet backbone
network must serve thousands of service flows at the same time. The approach of
reserving bandwidth for each flow therefore does not scale, which seriously restricts the
use of IntServ on live networks. IntServ is also restricted by other factors, such as the
large-scale deployment of RSVP signaling, interworking between devices of different
manufacturers, and service-based management (including authentication and accounting).
Since 1994, IntServ has not been used commercially.
Diff-Serv, however, is a class-based QoS technology. At the ingress of the network,
Diff-Serv classifies and controls traffic based on service requirements and sets the ToS
fields of the packets. Nodes then differentiate traffic based on the values of the ToS
fields in packets and provide QoS treatment, including resource allocation, queue
scheduling, and packet discarding policies, which are called Per Hop Behaviors (PHBs). All
the nodes in the Diff-Serv domain apply the PHB indicated by the DSCP field of each packet.
Because the Diff-Serv model handles classes of services rather than individual flows, it
improves service scalability.
The Diff-Serv model provides different services for different types of traffic. Service
traffic must therefore be classified based on service requirements, which is the
prerequisite and basis for differentiated services.
Benefits
This feature brings the following benefit to carriers:
l Class-based QoS provides different QoS services by differentiating users and services.
12.4.2 Principles
12.4.2.1 Simple Traffic Classification

Simple traffic classification implements the mapping between the internal priority and the
external priority. Packets are classified based on DSCP values of IP packets, EXP values of
MPLS packets, and 802.1p values of VLAN packets. Then, the mapping between the
priorities of the packets on different networks is created. The Diff-Serv (DS) domain is
comprised of the DS nodes that adopt the same service policies and Per-Hop Behavior (PHB)
set. The traffic policies are defined in a DS domain that is bound to an interface to implement
simple traffic classification.
Simple traffic classification is divided into two types: upstream simple traffic
classification and downstream simple traffic classification.
l Upstream simple traffic classification: Based on the DSCP values of IP packets, EXP
values of MPLS packets, and 802.1p values of VLAN packets, packets are classified into
eight CoSs (CS7, CS6, EF, AF4 to AF1, and BE) and marked with one of three colors (green,
yellow, and red). Packets whose CoS is EF, BE, CS6, or CS7 are marked green by default.
Upstream simple traffic classification is used to differentiate services such as voice,
video, and data services. During congestion management and queue scheduling, different
services enter different queues, so different scheduling solutions are applied to them.
For example, voice services can enter a PQ queue of a higher priority, which ensures a
short delay. If upstream simple traffic classification is not implemented, the service
type of all the packets is BE.
l Downstream simple traffic classification: Based on the CoSs (CS7, CS6, EF, AF4 to
AF1, and BE) and the three colors (green, yellow, and red), the DSCP values of IP packets,
EXP values of MPLS packets, or 802.1p values of VLAN packets are reset. That is,
downstream simple traffic classification implements the re-marking function.
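Figure 12-17 Mapping of upstream and downstream simple traffic classification
[Figure: two mappings. BA mapping relates the 802.1p, DSCP, and MPLS EXP values to the service class and color. PHB mapping relates the service class and color to the 802.1p, DSCP, and MPLS EXP values.]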
Upstream Mapping Based on the DSCP Values of IP Packets

According to the mapping between DSCP values of IP packets and internal priorities, the
scheduling priority and color of the packets are specified on ATNs based on DSCP values of
the packets, which ensures proper scheduling of the packets.
Table 12-5 Default mapping between DSCP values of IP packets and CoSs in default domain
DSCP  CoS  Color     DSCP  CoS  Color
00    BE   Green     32    AF4  Green
01    BE   Green     33    BE   Green
04    BE   Green     36    AF4  Yellow
06    BE   Green     38    AF4  Red
08    AF1  Green     40    EF   Green
10    AF1  Green     42    BE   Green
12    AF1  Yellow    44    BE   Green
14    AF1  Red       46    EF   Green
16    AF2  Green     48    CS6  Green
22    AF2  Red       54    BE   Green
30    AF3  Red       62    BE   Green

Upstream Mapping Based on the 802.1p Values of VLAN Packets

According to the mapping between the 802.1p values of VLAN packets and the internal
priorities, the scheduling priority and color of the packets are specified on ATNs based on the
802.1p values of the packets, which ensures proper scheduling of the packets.
Table 12-6 Default mapping between 802.1p values of VLAN packets and CoSs in the default
domain template
802.1p CoS Color 802.1p CoS Color
Upstream Mapping Based on the EXP Values of MPLS Packets

According to the mapping between the EXP values of MPLS packets and internal priorities,
the scheduling priority and color of the packets are specified on ATNs according to the EXP
values of the packets, which ensures proper scheduling of the packets.
Table 12-7 Default mapping between EXP values of MPLS packets and CoSs
EXP CoS Color EXP CoS Color
Precedence Mapping of Outgoing Packets

An ATN looks up the CoSs and colors of packets based on the DSCP values, 802.1p values,
or EXP values of the packets on the outgoing interface. After performing internal
scheduling, the ATN adds priority fields such as DSCP values, 802.1p values, and EXP
values to the packets to be sent according to the CoSs and colors of the packets. The
simple traffic classification-based mapping relationship for downstream traffic is the
same as that for upstream traffic. For details, see Table 12-5, Table 12-6, and Table 12-7.
The default mapping between the CoS value and the DSCP value is shown in Table 12-8.

Table 12-8 Default mapping between the CoS value and the DSCP value in default domain
Service Color DSCP
BE Green 0
AF1 Green 10
AF1 Yellow 12
AF1 Red 14
AF2 Green 18
AF2 Yellow 20
AF2 Red 22
AF3 Green 26
AF3 Yellow 28
AF3 Red 30
AF4 Green 34
AF4 Yellow 36
AF4 Red 38
EF Green 46
CS6 Green 48
CS7 Green 56
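The default downstream mapping in Table 12-8 can be expressed as a lookup table. The following sketch copies the values from the table; the function name remark_tos and the choice to preserve the last two bits of the ToS byte are assumptions made for illustration and are not taken from the device.

```python
# Default (CoS, color) -> DSCP values copied from Table 12-8.
COS_COLOR_TO_DSCP = {
    ("BE", "green"): 0,
    ("AF1", "green"): 10, ("AF1", "yellow"): 12, ("AF1", "red"): 14,
    ("AF2", "green"): 18, ("AF2", "yellow"): 20, ("AF2", "red"): 22,
    ("AF3", "green"): 26, ("AF3", "yellow"): 28, ("AF3", "red"): 30,
    ("AF4", "green"): 34, ("AF4", "yellow"): 36, ("AF4", "red"): 38,
    ("EF", "green"): 46,
    ("CS6", "green"): 48,
    ("CS7", "green"): 56,
}

def remark_tos(tos, cos, color):
    """Rewrite the DSCP bits of a ToS byte, keeping its last two bits (assumption)."""
    new_dscp = COS_COLOR_TO_DSCP[(cos, color)]
    return (new_dscp << 2) | (tos & 0x03)

print(remark_tos(0x00, "AF2", "yellow"))  # 80
```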
The default mapping between the CoS value and the 802.1p value is shown in Table 12-9.
Table 12-9 Mappings from QoS CoSs and colors to 802.1p priorities in the default domain
template
CoS Color 802.1p
BE Green 0
AF1 Green, yellow, and red 1
EF Green 5
CS6 Green 6
CS7 Green 7

The default mapping between the CoS value and the EXP value is shown in Table 12-10.
Table 12-10 Default mapping between the CoS value and the EXP value
Service Color MPLS EXP
BE Green 0
AF1 Green, Yellow, Red 1
EF Green 5
CS6 Green 6
CS7 Green 7
12.4.2.2 Complex Traffic Classification

Complex traffic classification classifies packets based on certain characteristics of the
packets and then performs predefined traffic behaviors on the packets matched by each
traffic classifier.
Traffic Classifiers
A classifier is a set of conditions for classifying packets. Packets are classified
according to certain fields in the packets.
Multiple matching rules can be defined in a classifier. By default, the relationship
between the rules is "OR"; that is, the corresponding behavior is applied to a packet that
matches any one of the rules. The relationship between the rules can be specified by
setting the operator parameter. It can be specified only when the traffic classifier is
created and cannot be changed afterward.
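The effect of the operator on rule matching can be sketched as follows. This is a conceptual model only: the Classifier class, the rule functions, and the packet fields are hypothetical, and the "and" value is assumed to be the alternative to the default "or".

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)   # frozen: the operator cannot be changed after creation
class Classifier:
    rules: Tuple[Callable[[dict], bool], ...]
    operator: str = "or"  # "or" is the default relationship between rules

    def matches(self, packet: dict) -> bool:
        results = [rule(packet) for rule in self.rules]
        return any(results) if self.operator == "or" else all(results)

def is_ef(packet):
    return packet["dscp"] == 46

def is_udp(packet):
    return packet["protocol"] == 17

packet = {"dscp": 46, "protocol": 6}
print(Classifier((is_ef, is_udp)).matches(packet))         # True
print(Classifier((is_ef, is_udp), "and").matches(packet))  # False
```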
Traffic Behaviors
Traffic classification is performed to provide differentiated services. Therefore, traffic
classification is useful only after it is associated with certain traffic control actions or resource
distribution actions. The traffic behaviors are as follows (these behaviors can be used
together):
l Deny or permit
The deny or permit action is the simplest traffic behavior. The network traffic can be
controlled by permitting packets to pass through or denying packets.
l Traffic policing
Traffic policing is also called committed access rate (CAR). Through CAR, operators
can set the maximum volume of traffic for various services at the network edge and
control the usage of network resources, which ensures QoS on the entire network.
Operators sign service level agreements (SLAs) for cooperation. An SLA contains
parameters such as the Committed Information Rate (CIR), Peak Information Rate (PIR),
Committed Burst Size (CBS), and Peak Burst Size (PBS) of each type of service traffic.
For traffic that exceeds the agreed limit, the device passes the packets, drops them, or
re-marks their priorities.
l Re-marking
Re-marking marks service traffic with classes according to the SLA and the results of
traffic classification. The related RFCs currently define six standard service types,
EF, AF1 to AF4, and BE, and specify how devices must process each of them by defining
the PHB of each service. EF traffic requires short delay, low jitter, and a low packet
loss ratio, and corresponds to real-time services such as video services, voice services,
and video conferences. AF traffic requires relatively short delay, a low packet loss
ratio, and high reliability, and corresponds to services that have high requirements for
data reliability, such as e-business and enterprise VPNs. BE traffic has no requirement
for the CIR or delay, and corresponds to traditional Internet services. The device can
specify the service type of packets, that is, EF, AF1 to AF4, or BE.
l Redirect
The redirect action indicates that the device does not forward packets according to
their original destination addresses but forwards them to a specified next hop. In this
manner, policy-based routing is implemented. Currently, the redirect action is valid only
for Layer 3 packet forwarding.
The device supports the following types of redirect action:
– IPv4 strong redirection
If a user specifies the next-hop IP address and outgoing interface of a packet, the
device does not need to search the FIB table for an entry for forwarding the packet.
The device can directly send the packet to the outgoing interface specified by the
user. The packet can be sent after being encapsulated with the ARP information on
the outgoing interface. If the outgoing interface is Down, the packet is discarded
and is not forwarded according to the original destination address.
– IPv4 weak redirection
When a user specifies the next-hop IP address of a packet but does not specify the
outgoing interface, the device searches the FIB table, according to the next-hop IP
address configured by the user, for an entry for forwarding the packet. If the path
specified by the user is available, the device forwards the packet along that path. If
the path is unavailable, the device forwards the packet according to the original
destination address of the packet.
l Security
Security actions apply measures such as port mirroring or traffic sampling to
packets. Security actions are not QoS measures but can be used together with other
actions to improve the security of the network and packets.
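The difference between IPv4 strong redirection and IPv4 weak redirection described in the list above can be summarized in a short decision function. The sketch below is a simplification under stated assumptions: the FIB is modeled as a dictionary from an exact address to an outgoing interface, and the function name, interface names, and addresses are hypothetical.

```python
def redirect(dst, next_hop, out_if, if_up, fib):
    """Return (outgoing interface, address to forward toward), or None if dropped.

    fib:   hypothetical simplified FIB, mapping an address to an interface name
    if_up: mapping from interface name to True (Up) or False (Down)
    """
    if out_if is not None:
        # Strong redirection: no FIB lookup. If the interface is Down, the packet
        # is discarded rather than forwarded by its original destination.
        return (out_if, next_hop) if if_up.get(out_if, False) else None
    # Weak redirection: look up the configured next hop in the FIB.
    via_if = fib.get(next_hop)
    if via_if is not None and if_up.get(via_if, False):
        return (via_if, next_hop)
    # The specified path is unavailable: fall back to the original destination.
    orig_if = fib.get(dst)
    return (orig_if, dst) if orig_if is not None else None

if_up = {"GE1": True, "GE2": False}
fib = {"10.0.0.2": "GE2", "192.0.2.9": "GE1"}
print(redirect("192.0.2.9", "10.0.0.2", "GE2", if_up, fib))  # None
print(redirect("192.0.2.9", "10.0.0.2", None, if_up, fib))   # ('GE1', '192.0.2.9')
```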
Traffic Policies
A traffic policy is an integrated QoS policy formed by associating traffic classifiers with
traffic behaviors. When a traffic policy is applied to an interface, the traffic classifiers
and behaviors defined in the policy take effect on that interface.
A traffic policy has one of two attributes: shared or unshared. With the shared attribute,
different interfaces that use the same traffic policy share one set of traffic classifier
and traffic behavior entries. With the unshared attribute, different interfaces that use
the same traffic policy use separate sets of traffic classifier and traffic behavior
entries, which are generated based on interfaces and VLANs.
When two interfaces use the same shared traffic policy, they share one set of rules and
behaviors. If CAR is set, the limit applies to the combined traffic of the two interfaces.
When two interfaces use the same unshared traffic policy, they use two sets of rules and
behaviors. The rules are the same, but each interface has its own behavior entries. If CAR
is set, the traffic on each interface is limited independently.
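The effect of the shared and unshared attributes on CAR can be illustrated with a deliberately simplified policer. In the sketch below, the Car class counts bytes against a fixed budget per measurement interval instead of modeling real token buckets, and all names and values are assumptions made for the example.

```python
class Car:
    """Simplified policer: passes at most `limit` bytes per measurement interval."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def offer(self, size):
        if self.used + size > self.limit:
            return False          # over the limit: the packet is dropped
        self.used += size
        return True

def bind_policy(interfaces, limit, shared):
    if shared:
        car = Car(limit)          # one set of entries used by all interfaces
        return {name: car for name in interfaces}
    return {name: Car(limit) for name in interfaces}   # one set per interface

shared = bind_policy(["GE1", "GE2"], limit=1000, shared=True)
print(shared["GE1"].offer(600), shared["GE2"].offer(600))       # True False
unshared = bind_policy(["GE1", "GE2"], limit=1000, shared=False)
print(unshared["GE1"].offer(600), unshared["GE2"].offer(600))   # True True
```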
The rules of a traffic policy can be modified dynamically, but its shared or unshared
attribute cannot. After applying a traffic policy to an interface, you can dynamically
add, delete, or change the rules and behaviors of the traffic policy. To change the
shared or unshared attribute, you must first remove the traffic policy from the
interface.

Process of Complex Traffic Classification
Figure 12-18 Process of complex traffic classification
[Figure: flowchart. Packets are forwarded based on the interface, VLAN, or VSI. If complex traffic classification is not enabled, other processing is performed. If it is enabled, the device constructs the key value (GID, SIP, DIP, ...) from the packet and searches the complex traffic classification table, in which each rule has a rule key (GID, SIP, DIP, ...) and a rule mask (SIP_mask, DIP_mask, ...). If the packet matches the traffic policy, the behavior specified in the traffic policy is implemented; otherwise, other processing is performed.]
The preceding figure shows the basic process of implementing complex traffic
classification for packets. The device performs an AND operation between the masks in the
rules of a traffic policy and the corresponding fields of a packet, such as the source IP
address and destination IP address. If the result is the same as the value defined in the
rule, the packet matches the policy, and the behavior specified in the traffic policy is
implemented for the packet.
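The mask-and-compare step can be sketched as follows. The rule layout (a key and a mask per field), the field names sip and dip, and the addresses are illustrative assumptions, not the internal table format of the device.

```python
import ipaddress

def ip(value):
    """Convert a dotted-decimal IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(value))

def matches(rule, packet):
    """rule maps a field name to (key, mask); packet maps a field name to an integer."""
    return all((packet[name] & mask) == key for name, (key, mask) in rule.items())

rule = {
    "sip": (ip("10.1.1.0"), ip("255.255.255.0")),
    "dip": (ip("100.1.1.0"), ip("255.255.255.0")),
}
packet = {"sip": ip("10.1.1.4"), "dip": ip("100.1.1.9")}
print(matches(rule, packet))  # True
```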
12.4.3 Applications
Mapping Instances of Simple Traffic Classification

l Priority mapping of VLAN packets

Figure 12-19 Priority mapping of VLAN packets
[Figure: ATN A and CX are connected through a VLAN (VLAN 10), and CX connects to the network. Address labels in the figure: 10.1.1.1/24, 10.1.1.2/24, 20.1.1.1/24, 11.1.1.1/24, and 11.1.1.0/24.]
As shown in the preceding figure, ATN A and CX connect to each other through a VLAN.
When IP packets sent from ATN A enter the VLAN, the priority of the IP packets is mapped
to the priority of the VLAN frames according to the default mapping. When the packets from
the VLAN reach CX, the priority of the packets is mapped according to the priority mapping
for the DS domain set on CX.
l Simple traffic classification applied in MPLS networks
Figure 12-20 Simple traffic classification applied in MPLS networks
[Figure: ATN A, CXB, and CXC are connected in sequence through their GE1 and GE2 interfaces.]
As shown in the preceding figure, the three devices set up MPLS peer relationships. After
reaching ATN A, IP traffic is forwarded through MPLS from ATN A to CXC. After the IP
traffic leaves CXC, the IP traffic is forwarded through IP. The mapping from IP DSCP values
to MPLS EXP values is set on GE1 of ATN A, and the mapping from MPLS EXP values to IP
DSCP values is set on GE1 of CXC. Simple traffic classification is enabled on the two
interfaces. Therefore, the DSCP value of the IP traffic can be changed to the EXP value of
MPLS traffic on ATN A, and the EXP value of MPLS traffic can be changed to the DSCP
value of the IP traffic on CXC.

Complex Traffic Classification
Figure 12-21 Complex traffic classification application network
[Figure: Company A (193.2.0.0) and Company B (193.1.0.0) connect to edge access nodes, which connect to the core network.]
As shown in the preceding figure, assume that the bandwidth purchased by Company A is 200
Mbit/s, and that purchased by Company B is 400 Mbit/s. To ensure the bandwidth, you can
configure complex traffic classification on the edge access node. The node can thus
differentiate the traffic of Company A from that of Company B based on the IP addresses, and
then carry out different traffic policing policies.
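The classification step in this scenario amounts to mapping a source address to the policing rate of the company that owns it. A minimal sketch follows; the /16 prefix lengths are assumptions, because the figure gives only the network addresses, and the function name car_rate is hypothetical.

```python
import ipaddress

# (source network, purchased bandwidth in Mbit/s). The /16 prefix lengths are
# assumptions; the figure shows only 193.2.0.0 and 193.1.0.0.
POLICIES = [
    (ipaddress.ip_network("193.2.0.0/16"), 200),  # Company A
    (ipaddress.ip_network("193.1.0.0/16"), 400),  # Company B
]

def car_rate(src):
    """Return the policing rate for a source address, or None if no rule matches."""
    address = ipaddress.ip_address(src)
    for network, rate in POLICIES:
        if address in network:
            return rate
    return None

print(car_rate("193.2.5.1"), car_rate("193.1.0.9"), car_rate("10.0.0.1"))  # 200 400 None
```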
Terms
None.

Acronyms & Abbreviations  Full Name
Diff-Serv                 Differentiated Service
DSCP                      Differentiated Services Code Point
12.5 HQoS

12.5.1 Introduction to HQoS

Definition
Hierarchical Quality of Service (HQoS) is a technology used to guarantee the bandwidth of
multiple services of many users in the Differentiated Service (Diff-Serv) model through a
queue scheduling mechanism.
Diff-Serv is a class-based QoS technology that provides different services for
different service flows. In the Diff-Serv scheme, service flows must therefore first be
classified based on service requirements. Traffic classification is the prerequisite and
basis of HQoS. For details about Diff-Serv, see 12.4 Class-Based QoS.
Purpose
As new applications emerge on IP networks, they place new requirements on the QoS of
IP networks. For example, real-time services such as Voice over IP (VoIP) require a short
delay, and a long packet transmission delay is unacceptable, whereas email and File
Transfer Protocol (FTP) services are comparatively insensitive to delay. To support
services with different requirements, such as voice, video, and data services, the network
must distinguish the services before providing the corresponding quality of service for
them. QoS technology was introduced for this purpose. With the rapid development of
network equipment, the capacity of a single interface increases along with the number of
users accessing it. HQoS is needed because traditional QoS encounters the following
problems:
l Traditional traffic management schedules traffic based on the bandwidth of interfaces.
As a result, it is sensitive to classes of service rather than to users, which suits
traffic on the network core side but not traffic on the service access side.
l Traditional traffic management has great difficulty in controlling multiple services of
many users at the same time.
To solve these problems and provide better QoS, a QoS technology is needed that can
schedule queues based on service priorities and also control the traffic of users.
Combined with the Diff-Serv scheme, the HQoS supported by the ATN adopts three levels of
scheduling. HQoS enables the equipment to apply policies for controlling internal
resources with the existing hardware. It provides quality assurance for advanced users
while reducing the total cost of network construction.
Family HQoS classifies services with specified characteristics into the same family.
These services enter one subscriber queue (SQ). Currently, the methods of identifying
family members are NONE and C-VLAN. Subscribers that go online from different
sub-interfaces of the same interface can also be classified into the same family.
A leased line user is an enterprise that leases an entire interface (or several interfaces)
from the operator. All subscribers of this enterprise are scheduled and managed in a
unified manner.
12.5.2 Principles
12.5.2.1 Related Concepts of HQoS

A queue is a storage format for packets during the forwarding process. When the rate of
traffic exceeds the bandwidth of a port or the bandwidth set for the traffic, packets are
placed into the cache in a queue. Scheduling policies determine when and in what order
packets leave each queue and how packets in the various queues are scheduled. The QoS
queue structure includes only the downstream queue structure, which consists of flow
queues, subscriber queues, and port queues, as shown in Figure 12-22.
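Figure 12-22 Downstream QoS queue scheduling structure
[Figure: three scheduling levels. Level 3 (FQ): flow queues FQ0 to FQ7, each with shaping, are scheduled by PQ/WFQ into a subscriber queue. Level 2 (SQ): subscriber queues SQ-a, SQ-b, and SQ-c, each with shaping, are scheduled by RR toward the port. Level 1 (TP): the port, with shaping. The legend distinguishes WRED, physical queues, virtual queues, and scheduling algorithms.]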
Flow Queue/Subscriber Queue

l A Flow Queue (FQ) is a physical queue, and is used to store the data packets of each
flow temporarily. A delay occurs when data packets enter or leave the queue.
l A Subscriber Queue (SQ) is a virtual queue. A virtual queue means that there is no
buffer for the queue. Data packets of the queue enter or leave the queue without any
delay. The queue is only a level in hierarchical scheduling for output packets.
Each FQ belongs to only one SQ, and each SQ corresponds to eight FQs. In practice, each SQ
maps to one user (Tunnel, port, or "port + VLAN"). Each user can use one to eight FQs.
Port Queue
A Port Queue (PQ) is a virtual queue, that is, a queue without a buffer. Data packets
enter and leave the queue without any delay. The queue is only a level in hierarchical
scheduling for output packets. In practice, each PQ maps to one physical port or one
logical port.
12.5.2.2 Queue Scheduling Technology

The queue scheduling mechanism is a very important technology in QoS. When congestion
occurs, a proper queue scheduling mechanism can provide packets of a certain type with
proper QoS features such as bandwidth, delay, and jitter. The queue scheduling mechanism
works only when congestion occurs. The commonly used queue scheduling technologies
include Round Robin (RR), Weighted Fair Queuing (WFQ), Priority Queuing (PQ), and Low
Priority Queuing (LPQ).

NOTE
Among the ATN 950B series, only the ATN 950B (AND2CXPB/AND2CXPE) supports LPQ.
RR
RR is a simple scheduling method. Through RR, multiple queues can be scheduled.
Figure 12-23 Schematic diagram of RR
[Figure: several numbered queues are served in round robin order; the scheduler takes one packet away from each queue in turn.]
RR schedules multiple queues in ring mode. If the queue on which RR is performed is not
empty, the scheduler takes one packet away from the queue. If the queue is empty, the queue
is skipped and the scheduler does not wait.
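The RR behavior described above can be sketched in a few lines. The function name and packet labels are chosen for this example only.

```python
from collections import deque

def round_robin(queues):
    """Yield packets by taking one packet from each non-empty queue in turn."""
    while any(queues):
        for queue in queues:
            if queue:             # an empty queue is skipped without waiting
                yield queue.popleft()

queues = [deque(["a1", "a2", "a3"]), deque(), deque(["c1"]), deque(["d1", "d2"])]
print(list(round_robin(queues)))  # ['a1', 'c1', 'd1', 'a2', 'd2', 'a3']
```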
PQ (SP)
PQ (also called Strict Priority (SP)) is an absolute priority scheduling method. In PQ, packets
with a high priority are scheduled with precedence.
WFQ
WFQ is used to assign bandwidth to queues taking part in the scheduling according to the
weights of the queues. In WFQ, unused bandwidth is reassigned.
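As a rough illustration of weight-based sharing with reassignment of unused bandwidth, the following Python sketch computes the long-term bandwidth shares of a set of queues. It is a fluid model under assumed inputs, not a packet-level WFQ scheduler and not the device algorithm; the function name `wfq_shares` and the example weights and demands are hypothetical.

```python
def wfq_shares(capacity, weights, demands):
    """Split capacity by weight; bandwidth a queue cannot use is re-offered to the others."""
    alloc = {q: 0.0 for q in weights}
    active = {q for q in weights if demands[q] > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(weights[q] for q in active)
        satisfied = set()
        spent = 0.0
        for q in active:
            offer = remaining * weights[q] / total_weight
            give = min(offer, demands[q] - alloc[q])
            alloc[q] += give
            spent += give
            if alloc[q] >= demands[q] - 1e-9:
                satisfied.add(q)
        remaining -= spent
        if not satisfied:       # every queue used its full offer, nothing left to reassign
            break
        active -= satisfied
    return alloc


# Queue "a" needs only 10 units, so its unused share is divided between "b" and "c" at 2:1.
shares = wfq_shares(100, {"a": 1, "b": 2, "c": 1}, {"a": 10, "b": 100, "c": 100})
print(shares)
```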
LPQ (SPL)
LPQ (also called Strict Priority Low (SPL)) is also an absolute priority scheduling method.
The difference between LPQ and PQ lies in their priorities during two-level scheduling of
HQoS. The priority sequence applied for two-level scheduling is PQ > WFQ > LPQ.
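The priority sequence PQ > WFQ > LPQ can be pictured as a strict-priority choice between three groups of queues. The sketch below only shows that choice; the name `next_group` and the backlog dictionary are assumptions for illustration, and scheduling inside each group is left out.

```python
GROUP_ORDER = ("PQ", "WFQ", "LPQ")  # highest priority first


def next_group(backlog):
    """Return the highest-priority group that has packets waiting, or None."""
    for group in GROUP_ORDER:
        if backlog.get(group, 0) > 0:
            return group
    return None


print(next_group({"PQ": 0, "WFQ": 3, "LPQ": 5}))
```

LPQ queues are therefore served only when both the PQ and WFQ groups are empty.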

Comparison Between the Scheduling Technologies

Scheduling Algorithms | Complexity | Delay/Jitter | Fairness
RR | It is easy to implement. | In case of low-speed scheduling (with many queues to be scheduled), delay and jitter intensify. | The packets in the queues are scheduled based on the lengths of packets.
PQ | It is easy to implement. | There is almost no delay or jitter. | The queues are scheduled at the granularity of bytes.
LPQ | It is easy to implement. | In case of low-priority queue, delay and jitter intensify. | The queues are scheduled at the granularity of bytes.
WFQ | It is complex to implement. | The delay is controlled properly and the jitter is low. | The queues are scheduled fairly at the granularity of bytes.
12.5.2.3 HQoS Queue Scheduling

ATN supports three levels of scheduling in the downlink direction. FQs, SQs, and PQs are
scheduled in the sequence of FQ -> SQ -> Target Port (TP). See Figure 12-22.
QoS queue scheduling includes FQ/SQ scheduling and Target Port scheduling.
FQ/SQ Scheduling
FQ/SQ scheduling includes FQ traffic shaping, FQ scheduling, SQ traffic shaping, and
SQ scheduling. Traffic shaping uses a double token bucket, to which tokens are added
at the rate configured for the traffic shaper.
An SQ corresponds to a user (Tunnel, port or "port + VLAN"). The SQ scheduling ensures the
CIR and PIR bandwidth of the user.
The FQ scheduling adopts PQ/WFQ/LPQ. Each SQ corresponds to eight FQs, and the eight
FQs share the bandwidth of the SQ. Traffic shaping is implemented on each FQ to limit the
traffic rate. That is, the PIR is configured for each FQ.
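The double token bucket used for traffic shaping can be illustrated with a small Python sketch. The text does not describe the exact algorithm on the device, so this example is modeled on a generic two-rate (CIR/PIR) token bucket; the class name `DualTokenBucket`, the burst sizes, and the three result labels are assumptions.

```python
class DualTokenBucket:
    """Two buckets: one refilled at the CIR, one refilled at the PIR (sizes in bytes)."""

    def __init__(self, cir_bps, pir_bps, cbs_bytes, pbs_bytes):
        self.cir = cir_bps / 8.0            # refill rates in bytes per second
        self.pir = pir_bps / 8.0
        self.cbs, self.pbs = cbs_bytes, pbs_bytes
        self.c_tokens, self.p_tokens = float(cbs_bytes), float(pbs_bytes)
        self.last = 0.0

    def _refill(self, now):
        elapsed = now - self.last
        self.last = now
        self.c_tokens = min(self.cbs, self.c_tokens + elapsed * self.cir)
        self.p_tokens = min(self.pbs, self.p_tokens + elapsed * self.pir)

    def offer(self, size, now):
        """Classify a packet of `size` bytes arriving at time `now` (seconds)."""
        self._refill(now)
        if self.p_tokens < size:
            return "hold"                   # would exceed the PIR, so the packet must wait
        self.p_tokens -= size
        if self.c_tokens < size:
            return "above_cir"              # within the PIR but beyond the CIR
        self.c_tokens -= size
        return "within_cir"


bucket = DualTokenBucket(cir_bps=8000, pir_bps=16000, cbs_bytes=1000, pbs_bytes=2000)
print([bucket.offer(1000, 0.0) for _ in range(3)], bucket.offer(1000, 1.0))
```

The time is passed in explicitly so that the behavior is deterministic; a real shaper would read a clock and queue the held packets.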
NOTE
For ATN 950B series, only ATN 950B (AND2CXPB/AND2CXPE) supports LPQ.
Figure 12-24 shows the hierarchical structure of the FQ scheduling.

Figure 12-24 Hierarchical structure of the FQ scheduling
(Figure: FQ0 is scheduled in LPQ mode, FQ1 to FQ4 in WFQ mode, and FQ5 to FQ7 in PQ mode; the LPQ, WFQ, and PQ groups are then scheduled against each other in PQ mode.)
Target Port Scheduling

For ports where SQs are configured, packets enter the port queue through the configured SQs
and the traffic rate is limited at the port. For ports where SQs are not configured, packets enter
the port queue through the default SQs and the traffic rate is limited at the port.
To ensure the allocation of available bandwidth, RR scheduling is performed among the
downstream Target Ports.
12.5.3 HQoS Applications
Example of HQoS Application to Mobile Services

Services from base stations are received over Layer 3 private lines. You can configure HQoS
scheduling for services from base stations to ensure that the total bandwidth for services from
base stations and the bandwidth for each type of services are sufficient.
Figure 12-25 Network diagram of HQoS application to mobile services
(Figure elements: base station, ATN, CX equipment, PEs, mobile backhaul network, and RNC.)


Terms
Terms | Description
Differentiated Service | A QoS model that classifies the service level according to the packet precedence field (IP Precedence and DSCP), the source IP address, and the destination IP address. Packets with different levels can be provided with different service levels. It is commonly used to provide end-to-end QoS for specified application programs.
Fair Queue | A mechanism for queue scheduling in which network resources are allocated equally and the delay and jitter of all traffic are optimized.
Priority Queue | A queuing policy based on packet priorities, in which packets with a higher priority are allocated resources first.
QoS | A measure of the ability of service providers to meet the requirements of the user. It focuses on delay, jitter, and packet loss ratio.
Weighted Fair Queue | A queue scheduling mechanism that classifies traffic automatically and balances the delay and jitter of each flow. Compared with Fair Queue, it favors high-priority packets.
Abbreviations
HQoS Hierarchical QoS
RR Round Robin
WRR Weighted Round Robin
SP Strict Priority
SPL Strict Priority Low
CoS Class of Service

13 Clock
About This Chapter
This document describes the clock and time synchronization features in terms of overview,
principle, and applications.
NOTE
If the SFP interface houses an electrical module, the interface does not support synchronous
Ethernet, IEEE 1588v2, or IEEE 1588 ACR.
13.1 Clock Synchronization

13.2 NTP
13.3 1588v2
13.4 1588 ACR
13.5 1588 ATR
13.6 CES ACR/DCR
13.7 G.8275.1
13.8 Atom GPS Timing
13.1 Clock Synchronization
13.1.1 Introduction
Definition
Clock synchronization, including frequency synchronization and time synchronization, refers
to a precise and specific relationship between the frequencies or phases of signals. Signals
run at the same rate at any given moment, which guarantees that all the devices on the
communications network work at the same rate.
Information is coded into discrete Pulse Code Modulation (PCM) pulses for transmission over
digital communications networks. If two digital switching devices have different clock
frequencies, or if digital bit streams are impaired or damaged, phase drift or jitter occurs
during transmission. This results in the loss or duplication of code elements in the buffer
of the digital switching system, which causes slips in the transmitted bit streams.
13.1.2 Principles
Clock Source
A device that provides clock signals to a local device is called a clock source. A local device
may have multiple clock sources. There are several types of clock sources:
l External clock source

An external clock source uses a clock interface provided by a control board to trace a
higher level clock.
l Line clock source
A control board extracts clock signals from Ethernet line signals. The ATN can extract
clock signals from physical interfaces.
l Internal clock source
The device uses the local oscillator on its own clock board as the working clock. An
internal clock source is used when no external, line, or PTP clock source is
available.
l PTP clock source
PTP (IEEE 1588v2) messages are sent and received in both directions between the master
and slave devices. Clock or time synchronization is implemented based on the received
and sent timestamps in these messages.
SSM
SSMs are also referred to as synchronization quality messages. On a link carrying
synchronization timing signals, SSMs indicate clock quality levels of the timing signals.
13.1.2.2 Synchronization Mode and Issues of Concern
There are two ways to synchronize digital communications networks:
l Pseudo synchronization
l Master/slave synchronization
Pseudo Synchronization
Pseudo synchronization refers to situations in which each switching site has its own highly
accurate and highly stable independent clock. The clocks of the switching sites are not
synchronized. Differences in clock frequency and phasing between different switching sites
are, however, very small. They do not affect data transmissions and can be ignored.

Pseudo synchronization is generally used when digital communications networks from
different countries interact. Most countries make use of cesium clocks on their networks.
Master/Slave Synchronization
Master/slave synchronization refers to situations in which a highly accurate clock is set as the
internal master clock for a network. Clocks at all sites within the network trace the master
clock. Each sub-site traces a higher level clock until the highest level network element is
reached.
There are two types of master/slave synchronization:
l Direct master/slave synchronization
l Level-based master/slave synchronization
Figure 13-1 shows direct master/slave synchronization. All of the slave clocks synchronize
directly with the primary reference clock. Direct master/slave synchronization is used on
networks with relatively simple structures.
Figure 13-1 Direct master/slave synchronization
(Figure: one primary reference clock directly synchronizes three slave clocks.)
Figure 13-2 shows level-based master/slave synchronization. Devices on the network are
divided into three levels. Level two clocks synchronize with the level one reference clock.
Level three clocks synchronize with level two clocks. Level-based master/slave
synchronization is used on networks of larger scale and complicated structure.

Figure 13-2 Level-based master/slave synchronization
Level-1 reference clock
Level-2 slave clock
Level-3 slave clock
Master/slave synchronization is generally used to synchronize a country's domestic digital
communications network or internal regional networks. The national digital communications
network or regional network has a highly accurate master clock. Other network elements on
the national or regional network use the master clock as the reference clock.
To improve the reliability of master/slave synchronization, two master clocks are set on the
network. There is an active master clock and a standby master clock. Both are cesium clocks.
Under normal circumstances, each network element traces the active master clock. The standby
master clock also traces the active master clock. If the active master clock is faulty, the
standby master clock takes over and becomes the reference clock for the entire network. After
the fault is repaired and the master clock recovers, there is a switchover. The original active
master clock becomes active again and serves as the reference clock.
A device that functions as a slave clock works in one of the following states:
l Trace status
A slave clock traces and locks on to a clock source provided by a higher level clock. This
clock source may be the master clock or it may be an internal clock source on the
network element at the next highest level.
l Holding status
If a slave clock loses reference clocks, the slave clock enters holding status. The slave
clock uses the last frequency stored before the reference clocks were lost. In addition, the
slave clock provides clock signals that conform to the source reference clock. This
ensures that there are only minor frequency differences between the clock signals
provided by the slave clock and those of the reference clock. After the holding time
expires, the slave clock enters free running status.
l Free running status
If a slave clock loses all external reference clocks, the slave clock loses clock reference
memories. As a result, the oscillator inside the slave clock works in free running status.
Fixed oscillation frequencies may drift over time. This means that a clock in holding status
cannot retain its accuracy for a long period of time. The accuracy of a clock in holding status
is inferior to that of a clock in trace status.
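The transitions between the three working states can be summarized as a small state machine. The following Python sketch is an illustration only; the names `ClockState` and `next_state` and the holding time limit passed as a parameter are assumptions, since the text does not give the holding time.

```python
from enum import Enum


class ClockState(Enum):
    TRACE = "trace"
    HOLDING = "holding"
    FREE_RUNNING = "free running"


def next_state(state, reference_available, holding_elapsed_s, holding_limit_s):
    """Return the next working state of a slave clock."""
    if reference_available:
        return ClockState.TRACE             # lock on to the higher level clock
    if state is ClockState.TRACE:
        return ClockState.HOLDING           # reference lost: keep the last stored frequency
    if state is ClockState.HOLDING and holding_elapsed_s >= holding_limit_s:
        return ClockState.FREE_RUNNING      # holding time expired
    return state


state = next_state(ClockState.TRACE, False, 0, 86400)
print(state.value)
```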
13.1.2.3 Networking Mode for Clock Synchronization

Transmitting Clock Signals Through a Clock Interface

As shown in Figure 13-3, CX-A traces the BITS clock, and a clock cable connects the
clock output interface of CX-A to the clock input interface of CX-B. CX-B and ATN are also
connected through clock cables. ATN traces the clock of CX-B and, finally, all three devices
are synchronized with the BITS clock.
Figure 13-3 Transmitting clock signals through a clock interface
(Figure: BITS -> CLK-IN of CX-A; CLK-OUT of CX-A -> CLK-IN of CX-B; CLK-OUT of CX-B -> CLK-IN of ATN.)
The networking previously described can only be used to connect devices at the same site.
The distance between the ATN and CX cannot exceed 200 meters.
Transmitting Clock Signals Through an Ethernet Link

A synchronized Ethernet network can transmit clock signals. The system uses a clock module
to transmit a high-precision system clock to all Ethernet interface cards. The Ethernet
interfaces then use this high-precision clock as the basis for data transmissions. On the
receiver side, the Ethernet interface recovers the synchronized clock information and, after
frequency division, sends it to the clock module. The clock module judges the quality of the
clocks, selects the most precise one, and synchronizes the system clock to it.
To select the source correctly, SSMs must be transmitted along with clock information. On
SDH networks, clock levels are differentiated by the out-of-band overhead byte in the SDH
frame. An Ethernet network has no out-of-band channel, so the SSM field of Ethernet OAM is
used to provide downstream devices with clock level information.
As shown in Figure 13-4, CX-A traces the BITS clock. There is a link connecting CX-A and
CX-B. CX-B and ATN are connected through Ethernet links. ATN traces the clock of CX-B.
Finally, clocks of all three devices synchronize with the BITS clock.
Figure 13-4 Transmitting clock signals through an Ethernet link
(Figure: BITS -> CLK-IN of CX-A; CX-A, CX-B, and ATN are connected in sequence through Ethernet links.)

13.1.2.4 Typical Networking for Clock Synchronization
Link Network Topology

As shown in Figure 13-5, CX-B and the external clock are connected. CX-B serves as the
main clock station for the network. The external clock of CX-B serves as the reference clock
for this station and for the network.
Figure 13-5 Networking diagram of a link network topology
(Figure: ATNA, CX-B, CX-C, and ATND are connected in a chain through their E and W side interfaces; the external clock is connected to CX-B.)
NOTE
In all of the networking diagrams for this chapter, W represents the west side interface, and E represents
the east side interface.
The ATNA serves as the local clock for this network element. It extracts clock information
from signals, which are received at the E side interface. The clock board on CX-C also acts as
the local clock for its network element, extracting clock information from signals, which in
this case are received at the W side interface. At the same time, clock information is attached
to signals and these are transmitted downstream to ATND. ATND receives these signals at the
W side interface and uses the clock information extracted as a reference point to complete
clock synchronization with the main station CX-B.
Performance degradation of the clock on ATNA will not influence the clocks on CX-C and
ATND, but performance degradation on CX-C can influence the ATND clock because ATND
traces its clock through the higher level device, CX-C.
If a link is very long, clock signals transmitted to a slave station must be transmitted a long
distance or divided into several transmissions. To ensure that slave stations receive high
quality clock signals, two master clocks can be set on the network to act as reference clocks.
Network elements can trace one or the other of these reference clocks. The two reference
clocks must maintain synchronization and be at the same quality level.
Ring Network Topology

As Figure 13-6 shows, CX-A is the main station in this topology. It uses an external clock
source as a local clock and as the reference clock for this network. Other network elements
trace the clock from CX-A. Slave stations on a ring network trace clocks in almost the same
way that slave stations on a link network trace clocks. However, on a ring network, slave
stations can extract clock information from STM-N signals that are received by either E side
or W side interfaces. Both the number and distance of transmissions influence clock signals.
Slave station network elements should extract clock signals that have traveled the shortest
route and required the fewest number of transmissions from the master clock. For example,
ATNE traces the clock at the W side, and ATNC traces the clock at the E side.

Figure 13-6 Networking diagram of a ring network topology
(Figure: CX-A, ATNA, ATNB, ATNC, ATND, and ATNE form a ring through their E and W side interfaces; the external clock source is connected to CX-A.)
Mixed Topology
As shown in Figure 13-7, ATNA, ATNB, ATNC, and CX-D form a ring network topology
and make use of STM-N links. CX-D and CX-E form a link network topology and make use
of STM-M links. N > M indicates that the link bandwidth of the ring network is greater than
that of the link network.
Serving as the main station, CX-E uses an external clock source as the reference clock for all
the devices on the network. CX-E and CX-D are connected through a low-speed link.
Figure 13-7 Networking diagram of a mixed topology
(Figure: ATNA, ATNB, ATNC, and CX-D form an STM-N ring; CX-D connects to CX-E over an STM-M link.)

ATNA, ATNB, and ATNC use both E side and W side interfaces to trace and lock the clock of
CX-D. The CX-D clock in turn traces the clock transmitted by the main station CX-E. CX-D
extracts clock information from the STM-M signals transmitted by CX-E and uses it to
synchronize the downstream devices.
13.1.2.5 Clock Protection Switching
This section describes how to deploy a network with highly reliable clock synchronization.
The following topics are covered:
l Overview of clock protection switching
l Implementation
l Boards participating in clock protection switching
Overview
Each device on the network uses a particular clock synchronization path moving from level to
level to trace the same reference clock. Clock synchronization for the entire network is
implemented in this way. Usually, a device does not acquire just one path to one clock source.
It may have multiple clock sources. These clock sources may come from the same master
clock or they may come from reference clocks with different quality levels. On a
synchronized network, it is very important to keep the clocks of the device synchronous. If
one clock synchronization path is faulty, synchronization on the entire network is faulty.
Automatic protection switching for synchronized clocks can be used to avoid this situation.
Automatic protection switching means that, if a device loses all paths that it has traced to a
certain clock source, it can automatically trace another clock source. The new path may lead
to the same reference clock which the device was previously tracing or it may lead to a
different lower quality clock source.
The clock source switchover can be lossless, without error codes generated.
Implementation
l Specifying a clock reference source manually
This method is used to designate a particular, fixed clock source for a clock board to
trace. In addition, the active and standby clock boards can be configured to trace
different clock sources.
As shown in Figure 13-8, on the master clock CX-A, the active clock board has been set
manually to trace BITS1 and the standby clock board has been set to trace BITS2. Under
normal circumstances, the master clock traces the BITS1 reference clock. If the active
clock board is faulty, there is a switchover between the active and standby clock boards.
After the switchover, CX-A traces the BITS2 reference clock. CX-B traces the clock of CX-A,
and ATNC traces the clock of CX-B.
The problem with this method is that all of the devices on the network are set to trace the
clock of CX-A. If CX-A is faulty, the entire network has no reference clock. All of the
devices are in the free running status.

Figure 13-8 Networking diagram for specifying the clock reference source manually
(Figure: BITS1 and BITS2 are each connected to a CLK-IN interface of CX-A; CX-A, CX-B, and ATNC are connected in sequence.)
l Protection switching based on prioritized clock sources

If there are multiple clock reference sources, they can be assigned different priority
levels. During protection switching, the clock board first selects the clock reference
source with the highest priority if SSM is not enabled. If the highest priority clock
reference source is faulty, the clock board chooses the clock reference with the second
highest priority.
l Protection switching based on the SSM level
SSM is a group of codes used to indicate the quality level of clocks on a synchronized
network. At present, ITU-T specifies that four bits are used for coding. These four bits
are the Synchronous Status Message Byte (SSMB). Table 13-1 lists the SSM codes defined
by ITU-T. The code defines 16 levels of quality for synchronized sources. When
SSMB equals 0x2, the clock is at the highest quality level. When SSMB equals 0xF, the clock
is at the lowest quality level.
On an SDH transmission network, SSM is transmitted in the last four bits (b5 to b8) of
the S1 byte in the SDH section overhead. On a BITS device, however, SSM is
transmitted in a particular bit of the first timeslot (TS0) of a 2 Mbit/s clock signal.
2 MHz clock signals cannot carry SSMs.
Note that SSMB and the S1 byte are not the same. SSMB is a group of message
codes used to indicate quality levels of clocks, as shown in Table 13-1. The S1 byte is in
the SDH section overhead. The last four bits of the S1 byte represent SSMB.
Table 13-1 Codes of synchronous status message
Z1 (b5-b8) | S1 Byte | SDH Synchronization Quality Level
0000 | 0x00 | Unknown
0001 | 0x01 | Reserved
0010 | 0x02 | G.811 clock signals (PRC, the cesium clock)
0011 | 0x03 | Reserved
0100 | 0x04 | G.812 clock signals in the transit (SSUA, the rubidium clock)
0101 | 0x05 | Reserved
0110 | 0x06 | Reserved
0111 | 0x07 | Reserved
1000 | 0x08 | G.812 clock signals in the local office (SSUB, the rubidium clock or the crystal clock)
1001 | 0x09 | Reserved
1010 | 0x0a | Reserved
1011 | 0x0b | Signals of the Synchronous Equipment Timing Source (crystal clock)
1100 | 0x0c | Reserved
1101 | 0x0d | Reserved
1110 | 0x0e | Reserved
1111 | 0x0f | Cannot be used for synchronization (DNU)
If the SSM level of a reference source is DNU and SSM is enabled, this reference source
is not chosen during protection switching.
The SSM level of output signals is determined by the traced clock source. When the
clock works in trace status, the port of the currently traced clock source outputs signals
with an SSM level of DNU, and the other clock source ports output clock signals with the
same SSM level as the traced clock. When the clock does not work in trace status, the SSM
level of the output signals is SEC.
To set a clock source for a PIC, an SSM can be extracted from the PIC and reported to
the system control board. The system control board uses the SSM of the line clock
source to set the clock board. The system control board can also force setting the SSM of
the PIC clock source.
For a clock module that uses a BITS clock source, if the signal is 2.048 Mbit/s, an SSM
can be extracted by the clock module.
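The two selection methods above (by configured priority, and by SSM level) can be sketched as follows. This is a simplified illustration, not the device algorithm: the function names, the tuple layout, the rule that a lower priority number is preferred, and the choice to treat Unknown and Reserved codes as unusable are all assumptions made for the example. The quality codes come from Table 13-1.

```python
# SSM codes from Table 13-1 that indicate a usable clock, best quality first.
USABLE_SSM = (0x02, 0x04, 0x08, 0x0B)   # PRC, SSUA, SSUB, SEC
DNU = 0x0F


def ssm_from_s1(s1_byte):
    """The SSM code is carried in the last four bits of the S1 byte."""
    return s1_byte & 0x0F


def select_source(sources, ssm_enabled=True):
    """sources: list of (name, priority, s1_byte); a lower priority number is preferred."""
    if not ssm_enabled:
        return min(sources, key=lambda s: s[1])[0] if sources else None
    usable = [s for s in sources if ssm_from_s1(s[2]) in USABLE_SSM]
    if not usable:
        return None
    # Best quality level first; the configured priority breaks ties.
    return min(usable, key=lambda s: (USABLE_SSM.index(ssm_from_s1(s[2])), s[1]))[0]


sources = [("east", 1, 0x0B), ("west", 2, 0x08), ("bits", 3, 0x08), ("dnu-port", 4, DNU)]
print(select_source(sources), select_source(sources, ssm_enabled=False))
```

With SSM enabled, the source named `west` is chosen because it has the best usable quality level and a higher configured priority than `bits`; with SSM disabled, only the configured priority counts.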
Boards Participating in Clock Protection Switching

Clock protection switching involves the following boards:
l Ethernet Service Interface Board
SSMs are encapsulated in and transmitted by Ethernet OAM slow protocol packets. The
protocol format complies with ITU-T G.8264.
13.1.3 Applications
Transmission of the Clock Through an IP Bearer Network

Figure 13-9 shows an IP-RAN solution, in which an Ethernet IP network is used as the bearer
network between the Base Transceiver Station (BTS) and the RNC. The external clock of ATN A
uses two clock links configured in master/slave mode to access the BITS clock of the RNC.
All clocks on the IP network are synchronized with the external BITS clock on ATN A. To
deliver wireless services, the IP bearer network uses IP and Ethernet technologies between the
RNC and NodeBs. Clock information sent by devices on the bearer network is synchronized
and then distributed to the data communication devices connected to the base stations. Base
stations therefore receive reliable clock signals of guaranteed quality.
Figure 13-9 Networking diagram of an Ethernet clock synchronization network
(Figure elements: three BTSs attached over FE links, ATNA and ATNB, GE links, a CX600, and the RNC.)
Clock Protection Switching

As shown in Figure 13-10, CX-A and CX-D access BITS clock signals through the external
clock interfaces. The two BITS clocks satisfy signal quality requirements for the G.812 local
clock. Generally, all clocks on the network synchronize with the external BITS reference
clock of CX-A.

Figure 13-10 Diagram of clock trace in normal status
(Figure: CX-A, ATNB, ATNC, CX-D, ATNE, and ATNF form a ring through their E and W side interfaces; one BITS clock is connected to CX-A and another to CX-D.)
Table 13-2 Synchronization source and level of clock source
Network Element | Synchronization Source | Level of Clock Source (in a Descending Order)
CX-A | External clock source | External clock source, East side clock source, West side clock source, and internal clock source
ATN B | East side clock source | East side clock source, West side clock source, and internal clock source
ATN C | East side clock source | East side clock source, West side clock source, and internal clock source
CX-D | East side clock source | East side clock source, West side clock source, External clock source, and internal clock source
ATN E | West side clock source | West side clock source, East side clock source, and internal clock source
ATN F | West side clock source | West side clock source, East side clock source, and internal clock source

In addition, for CX-A and CX-D, the timeslot where the S1 byte of the external BITS clock is
stored must be configured.
l Use of clock protection switching when the link between ATN B and ATN C is faulty
If the optical fiber between ATN B and ATN C breaks, the synchronized clocks switch
automatically. Before the fault, CX-D traces its clock from ATN C, so under the switching
protocol the clock quality message that CX-D sends back to ATN C is "Do Not Use (DNU)",
that is, the S1 byte is 0x0F. When ATN C detects the loss of the east side clock source,
it therefore cannot use the west side clock source as the synchronization source for this
station, and can only use the internal clock source as a reference clock. ATN C reports
this to CX-D through the S1 byte, which it sets to 0x0B, meaning "Synchronous
Equipment Timing Source (SETS) clock signals".
After CX-D receives this message, the quality of the synchronization source that it traces
drops (the original clock source is the G.812 local clock, whose S1 byte is 0x08) and no
longer satisfies the quality requirements set for synchronization sources. CX-D must
choose a different clock source that satisfies quality requirements. Four clock sources are
available for CX-D:
– East side clock source
– West side clock source
– Internal clock source
– External BITS clock source
At this time, only the west side clock source and the external BITS clock source satisfy
the quality threshold requirements.
The level of the west side clock source configured on CX-D is superior to that of the
external BITS clock source, so CX-D finally chooses the former as the synchronization
source for this station. After CX-D switches from the east side to west side to trace a
synchronization source, the west side clock source of ATN C becomes available. At this
time, among the clock sources available to ATN C, the west side clock source meets
quality threshold requirements and is the highest level clock available. As a result, ATN
C chooses the west side clock source as the synchronization source. Clock traces for the
entire network are shown in Figure 13-11.

Figure 13-11 Diagram of clock traces when the optical fiber between ATN B and ATN C
is damaged
(Figure: the ring of CX-A, ATNB, ATNC, CX-D, ATNE, and ATNF with the link between ATNB and ATNC broken; BITS clocks are connected to CX-A and CX-D.)
l Clock protection switching when the BITS clock on CX-A is faulty

If the external BITS clock of CX-A is faulty, the likely situation for clock traces is
shown in Figure 13-12. The switching protocol is applied in the same way as in the
previous case.

Figure 13-12 Diagram of clock traces when the external BITS clock on CX-A is faulty
(Figure: the ring of CX-A, ATNB, ATNC, CX-D, ATNE, and ATNF; only the BITS clock connected to CX-D remains.)
l Clock protection switching when all BITS clocks are faulty

When the external BITS clocks on CX-A and CX-D are both faulty, none of the clock
sources available to a network element satisfies the quality threshold requirements
needed for a reference clock source. With a switching protocol in place, each network
element chooses the highest level clock source from those available to be the
synchronization source. Assume that, prior to all the BITS clocks becoming faulty, the
clocks of each NE are synchronized with the clock on CX-D, as shown in Figure 13-13.
At this time, the synchronization clock source of the whole network changes from the
G.812-compliant local clock to the clock source from the synchronization device. Though
the clock source is degraded, all clocks on the network are still synchronized with one
reference clock source.

Figure 13-13 Diagram of clock traces when all the external BITS clocks are faulty
(Figure: the ring of CX-A, ATNB, ATNC, CX-D, ATNE, and ATNF with no BITS clock; CX-D uses its inner clock source.)
Terms
Terms | Description
Synchronization | On a modern communications network, in most cases, the proper functioning
of telecommunications services requires network clock synchronization,
meaning that the frequency offset or time difference between devices must be
kept in an acceptable range. Network clock synchronization includes frequency
synchronization and phase synchronization.
l Frequency synchronization
Frequency synchronization, also called clock synchronization, refers to a
strict relationship between signals based on a constant frequency offset or a
constant phase offset, in which signals are sent or received at the same
average rate in a valid instance. In this manner, all devices on the
communications network operate at the same rate. That is, the phase
difference between signals remains a fixed value.
l Phase synchronization
Phase synchronization, also called time synchronization, refers to the
consistency of both frequencies and phases between signals. This means
that the phase offset between signals is always 0.
13.2 NTP

13.2.1 Introduction
Definition
Network Time Protocol (NTP) is an application layer protocol used on the Internet to
synchronize clocks among a set of distributed time servers and clients. In this manner, the
clock of the host is synchronized with certain time standards.
Purpose
NTP synchronizes the time of all devices on the network that have a clock. If time
synchronization is not performed using NTP, the devices may encounter time errors. NTP
keeps the time on all network devices consistent so that the devices can provide various
applications based on unified time.
13.2.2 Principle
NTP synchronizes time among a set of distributed time servers and clients. In this manner, the
time of the host is synchronized with certain time standards. The server and client are two
relative concepts. The device that announces the willingness to synchronize clocks and
provides the standard time is a server; the device that announces its willingness to be
synchronized is a client. A local system running NTP can be synchronized by other clock
sources or acts as a clock source to synchronize other clocks. In addition, mutual
synchronization can be implemented through NTP packet exchanges. NTP message
transmission is based on UDP.
13.2.2.1 Network Architecture

As shown in Figure 13-14, an NTP network consists of a primary time server,
secondary time servers, clients, and the interconnecting transmission paths.
Figure 13-14 Network Architecture of NTP
(Figure: a primary server, two secondary servers, and two third-level servers.)
l A primary time server is directly synchronized with a primary reference source, which is
usually a radio clock or Global Positioning System (GPS).
l A secondary time server synchronizes its clock with the clock of the primary time server
on the network or other secondary time servers, and transmits the time information to
other hosts on the network through NTP.

Under normal circumstances, primary and secondary time servers in the synchronization
subnet assume a hierarchical-master-slave structure, with the primary server at the root and
the secondary server at successive stratums toward the leaf node. The higher the stratum level
is, the less accurate the clock will be.
13.2.2.2 Operating Mode

In actual application, you need to select a proper NTP operating mode based on the network
deployment to meet various clock synchronization requirements. The operating modes of
NTP are classified into unicast client/server mode, peer mode, broadcast mode, manycast
mode and multicast mode.
Unicast Client/Server Mode

l The host that functions as a client sends packets to the server periodically. The value of
the Mode field in a packet is set to 3. This indicates that the packet is sent by a client,
without considering whether the server is reachable and which stratum the server is on.
Usually, the host operating in client mode is a workstation on a specified network, which
synchronizes its clock with the clock on the server but does not alter the clock of the
server.
l The host that functions as a server receives the packets from the client and sends
response packets. The value of the Mode field in a response packet is set to 4. This
indicates that the packet is sent by a server. Usually, the host operating in server mode is
a time server on a network, which provides synchronization information for the clients
but does not alter its own clock.
During and after a restart, the host operating in client mode periodically sends NTP request
messages to the host operating in server mode. After receiving an NTP request message, the
server swaps the source and destination IP addresses and the source and destination port
numbers, fills in the necessary information, and sends the message back to the client. The
server does not need to retain state information between client requests. The client freely
adjusts the interval for sending NTP request messages according to local conditions.
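The Mode values used in this exchange can be illustrated with a minimal Python sketch. It assumes the standard 48-byte NTP header, in which the first byte carries the leap indicator, version number, and Mode field. The function names are hypothetical, and every header field other than the first byte is left as zero for brevity.

```python
import struct

def build_header(version=3, mode=3):
    """Return a 48-byte NTP header with only the LI/VN/Mode byte filled in."""
    # Byte 0 layout: LI (2 bits) | VN (3 bits) | Mode (3 bits). LI is left at 0.
    first_byte = (0 << 6) | (version << 3) | mode
    return struct.pack("!B47x", first_byte)

def packet_mode(packet):
    """Extract the 3-bit Mode field from the first header byte."""
    return packet[0] & 0x07

def build_reply(request):
    """Answer a client request (Mode 3) with a server reply (Mode 4)."""
    version = (request[0] >> 3) & 0x07
    return build_header(version=version, mode=4)

request = build_header()      # client request, Mode 3
reply = build_reply(request)  # server reply, Mode 4
print(packet_mode(request), packet_mode(reply))
```

A real reply would also carry the stratum, poll interval, and timestamps; the sketch only shows how the Mode field distinguishes a client request from a server response.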
Figure 13-15 Unicast Client/Server Mode
(Figure: the client sends a clock synchronization message (Mode 3) across the network. The
server automatically works in client/server mode and sends a reply message (Mode 4). The
client then performs clock filtering and selection, and synchronizes its local clock to that
of the optimal reference source.)
Kiss-o'-Death (KOD) packets provide useful information to a client and are used for status
reporting and access control. When KOD is enabled at the server, the server may send packets
with kiss codes DENY and RATE to the client.

l When the client receives a packet with kiss code DENY, the client demobilizes any
associations with that server and stops sending packets to that server.
l When the client receives a packet with kiss code RATE, the client immediately reduces the
rate at which it polls that server (that is, it polls less often) and continues to reduce it
each time it receives a RATE kiss code.
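The client-side reaction to the two kiss codes can be sketched as follows. This is an illustration under stated assumptions, not the device implementation: the Association class, the default poll exponent, and the upper bound are hypothetical, and the sketch interprets the RATE reaction as backing off, that is, polling the server less often.

```python
from dataclasses import dataclass

MAX_POLL_EXPONENT = 17  # assumed upper bound: poll at most every 2**17 seconds

@dataclass
class Association:
    server: str
    poll_exponent: int = 6  # assumed default: poll every 2**6 = 64 seconds
    active: bool = True

def handle_kiss_code(assoc, kiss_code):
    """Apply the reaction described above for a DENY or RATE kiss code."""
    if kiss_code == "DENY":
        # Demobilize the association and stop sending packets to this server.
        assoc.active = False
    elif kiss_code == "RATE":
        # Back off: each RATE code doubles the time between polls (assumption).
        assoc.poll_exponent = min(assoc.poll_exponent + 1, MAX_POLL_EXPONENT)
    return assoc

assoc = Association(server="192.0.2.1")
handle_kiss_code(assoc, "RATE")
handle_kiss_code(assoc, "RATE")
print(assoc.poll_exponent, assoc.active)
handle_kiss_code(assoc, "DENY")
print(assoc.active)
```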
Peer Mode
In this mode, the active peer and the passive peer can be synchronized with each other. To be
specific, the higher stratum (lower level) peer is synchronized with the lower stratum (higher
level) peer. The active and passive peers first exchange NTP packets whose values of Mode
fields are 3 (sent by the client) and NTP packets whose values are 4 (sent by the server).
l Active peer: A host that functions as an active peer sends packets periodically. The value
of the Mode field in a packet is set to 1. This indicates that the packet is sent by an active
peer, without considering whether its peer is reachable and which stratum its peer is on.
The active peer can provide time information about the local clock for its peer, or
synchronize the time information about the local clock based on that of the peer clock.
l Passive peer: A host that functions as a passive peer receives packets from the active
peer and sends response packets. The value of the Mode field in a response packet is set
to 2. This indicates that the packet is sent by a passive peer. The passive peer can provide
time information about the local clock for its peer, or synchronize the time information
about the local clock based on that of the peer clock.
l Prerequisites for a host to function as a passive peer: The packets received by the local
host are sent by an active peer. The number of the stratum that the active peer is on must
be less than or equal to the number of the stratum that the local host is on. In addition,
the routes between the local host and the active peer must be reachable.
NOTE
The host operating in passive mode is at the lower stratum in the synchronization subnet. You do not
need to obtain information about the peer in advance because the connection between peers is not set up
and status variables are not configured unless the passive host receives NTP messages from the peer.
Figure 13-16 Peer mode
(Figure: the symmetric active peer and the symmetric passive peer first exchange clock
synchronization messages (Mode 3 and Mode 4). The active peer then sends a clock
synchronization message (Mode 1); the passive peer automatically works in symmetric peer
mode and sends a reply (Mode 2). Once the symmetric peer mode is established, the two
devices can synchronize, or be synchronized by, each other.)

Broadcast Mode
l A host that runs in broadcast mode periodically sends clock synchronization packets to the
broadcast address 255.255.255.255. The value of the Mode field in a packet is set to 5.
This indicates that the packet is sent by a host that runs in broadcast mode, without
considering whether its peer is reachable and which stratum its peer is on. The host
running in broadcast mode is usually a time server on high-speed broadcast media, which
provides synchronization information for all of its peers but does not alter its own
clock.
l The client listens for the broadcast packets sent from the server. When the client receives
the first broadcast packet, the client and server exchange NTP packets whose Mode fields
are 3 (sent by the client) and 4 (sent by the server). In this process, the client works in
client/server mode for a short time to exchange information with the remote server. This
allows the client to obtain the network delay between the client and the server. Then,
the client returns to broadcast mode and continues to listen for incoming broadcast
packets to synchronize the local clock.
Broadcast mode suits high-speed networks that have many workstations and do not require
high accuracy. In a typical scenario, one or more time servers on the network periodically
send broadcast packets to the workstations. The delay of packet transmission in a LAN is at
the millisecond level.
Figure 13-17 Broadcast mode
(Figure: the server periodically broadcasts clock synchronization messages (Mode 5). After
receiving the first broadcast message, the client sends a request, and the two ends exchange
clock synchronization messages (Mode 3 and Mode 4). The client calculates the network
delay between itself and the server and enters the broadcast client mode. It then receives
the periodic broadcast messages and synchronizes its local clock.)
Multicast Mode
l A server running in multicast mode periodically sends clock synchronization packets to a
multicast address. The value of the Mode field in a packet is set to 5. This indicates
that the packet is sent by a host that runs in multicast mode. The host running in
multicast mode is usually a time server on high-speed broadcast media, which provides
synchronization information for all of its peers but does not alter its own clock.
l The client listens for the multicast packets from the server. When the client receives the
first multicast packet, the client and the server exchange NTP packets whose Mode fields
are 3 (sent by the client) and 4 (sent by the server). In this process, the client works in
client/server mode for a short time to exchange information with the remote server. This
allows the client to obtain the network delay between the client and the server. Then,
the client returns to multicast mode and continues to listen for incoming multicast
packets to synchronize the local clock.
Multicast mode is useful when a large number of clients are distributed across a network,
which would normally result in a large number of NTP packets. In multicast mode, a single
NTP multicast packet can potentially reach all the clients in the network, which reduces
the control traffic on the network.
Figure 13-18 Multicast mode
(Figure: the server periodically multicasts clock synchronization messages (Mode 5). After
receiving the first multicast message, the client sends a request, and the two ends exchange
clock synchronization messages (Mode 3 and Mode 4). The client calculates the network
delay between itself and the server and enters the multicast client mode. It then receives
the periodic multicast messages and synchronizes its local clock.)
13.2.2.3 Event Processing of NTP

The following are considered significant NTP events: the timeout of a peer's timer in
active mode, and the arrival at the server of NTP request messages sent from various peers.
An event can also be generated when certain commands are run or when the system is faulty,
for example, when the primary reference clock fails.
Each mode processes an event with the same procedure: sending messages, receiving
messages, processing messages, and updating the clock.
Sending Messages
In all modes except the broadcast client mode and the server modes, the peer sends NTP
request messages when its timer times out. In broadcast client mode, the peer never sends
NTP request messages. In server mode, the peer sends NTP messages only in response to
received messages. If a received NTP request message does not result in a local permanent
connection, the receive action directly invokes the send action to retain the connection.
To ensure a valid response message, the time when the message is sent must be accurately
saved and then added to the message.

Receiving Messages
After receiving the NTP request message, the server first checks the mode field of the NTP
request message. If the value is 0, it indicates that the peer adopts an earlier NTP version.
Then, the server checks whether modes of the local end and the peer are matched. The
following cases exist:
l If the matching result is displayed as error, the message is discarded and an error
message is returned.
l If the matching result is displayed as recv, the received message is processed. If the
packet header is valid, the connection is marked as reachable. If both the packet header
and the data are valid, the clock-update procedure is invoked to update the local clock.
Otherwise, the connection is deleted if it is not configured in advance.
l If the matching result is displayed as xmit, the received message is processed and a
response message is sent immediately. If the connection is not configured in advance, it
is deleted.
l If the matching result is displayed as pkt, the received message is processed. If the
packet header is valid, the connection is marked as reachable. If both the packet header
and the data are valid, the clock-update procedure is invoked to update the local clock.
Otherwise, if the connection is not configured in advance, a response message is sent
immediately, and then the connection is deleted.
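The four matching results can be summarized as a small dispatch function. The sketch below is illustrative only: the function name, argument names, and action labels are hypothetical, and it simply restates the cases listed above as a list of actions in the order they would be taken.

```python
def handle_received(match, header_valid, data_valid, preconfigured):
    """Return the ordered actions for a received message, per the cases above."""
    if match == "error":
        return ["discard", "return_error"]
    if match == "xmit":
        actions = ["process", "send_response"]
        if not preconfigured:
            actions.append("delete_connection")
        return actions
    if match in ("recv", "pkt"):
        actions = ["process"]
        if header_valid:
            actions.append("mark_reachable")
        if header_valid and data_valid:
            actions.append("update_clock")
        elif not preconfigured:
            # Only the pkt case answers the sender before dropping the connection.
            if match == "pkt":
                actions.append("send_response")
            actions.append("delete_connection")
        return actions
    raise ValueError("unknown matching result: " + match)

print(handle_received("pkt", True, False, False))
```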
Processing Messages
This process checks the message validity, calculates delay/offset samples, and invokes
other procedures to filter data and select the reference source. The process first requires
that the transmit timestamp differ from the transmit timestamp of the last message received
from the same peer; otherwise, the message may be an outdated duplicate.
Secondly, the originate timestamp must match the transmit timestamp of the last message
sent to the same peer; otherwise, the message may be mis-sequenced, bogus, or less
accurate. In broadcast mode (5), the apparent round-trip delay is zero. In this case, the
full accuracy of the time-transfer operation is not ensured. However, the accuracy achieved
may be adequate for most purposes.
After the preceding procedure, the best clock sample can be selected from a specified clock
and the best clock can be selected from clock groups at different stratums. Finally, the delay
(peer.delay), offset (peer.offset), and dispersion (peer.dispersion) for the peer are all
determined.
Updating the Clock

After the valid clock offset, delay and dispersion are determined by the clock-filter procedure,
the clock-selection procedure invokes the clock-update procedure. The result of the clock-
selection and clock-combining procedures is the final clock adjustment value. The local-clock
procedure then updates the local clock based on this value. If no reference source is found
after these procedures, the clock-update procedure stops.
The clock-selection procedure contains two algorithms: intersection and clustering. The
intersection algorithm generates a list of candidate peers suitable to serve as the
reference source and calculates a confidence interval for each peer. It then discards
falsetickers using a technique adapted from Marzullo and Owicki [MAR85]. The clustering
algorithm orders the list of remaining candidates based on their stratums and
synchronization distance. It repeatedly discards outlier peers based on select dispersion
until only the most accurate, precise, and stable candidates remain.
If the offset, delay, and dispersion of the candidate peers are nearly identical, the
clock-combining procedure analyzes the clock candidates and provides the parameters
determined through comprehensive analysis to the local end for updating the local clock.
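The idea behind the intersection step can be illustrated with the classic form of Marzullo's algorithm, which finds the region covered by the largest number of confidence intervals. This is a simplified sketch, not the modified variant that NTP actually runs: the function name is hypothetical, and each interval is assumed to be given as (lowest plausible offset, highest plausible offset) for one peer.

```python
def marzullo(intervals):
    """Return (count, low, high): the region covered by the most intervals."""
    edges = []
    for low, high in intervals:
        edges.append((low, -1))   # an interval opens here
        edges.append((high, +1))  # an interval closes here
    # At equal offsets, opening edges (-1) sort before closing edges (+1).
    edges.sort()
    best = count = 0
    best_low = best_high = None
    for i, (offset, kind) in enumerate(edges):
        count -= kind             # +1 on an opening edge, -1 on a closing edge
        if count > best:
            best = count
            best_low = offset
            best_high = edges[i + 1][0]
    return best, best_low, best_high

# Three peers: the third interval does not overlap the others (a falseticker).
print(marzullo([(8, 12), (11, 13), (14, 15)]))   # (2, 11, 12)
```

Peers whose intervals do not contain the returned region are the candidates for being discarded as falsetickers.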
13.2.2.4 Operating Principle

Figure 13-19 shows the NTP implementation:
ATN A and ATN B are connected through a Wide Area Network (WAN). Each of them has its
own system clock, which is synchronized automatically through NTP.
Figure 13-19 Diagram of NTP implementation
(Figure: four steps between ATN A and ATN B across the network. Step 1: the NTP packet
leaves ATN A carrying 10:00:00 am. Step 2: the packet reaches ATN B, which adds 11:00:01 am.
Step 3: the packet leaves ATN B, which adds 11:00:02 am. Step 4: ATN A receives the NTP
packet at 10:00:03 am.)
Assume that:
l Before the clocks of ATN A and ATN B are synchronized, the clock of ATN A is
10:00:00 am and the clock of ATN B is 11:00:00 am.
l ATN B acts as an NTP time server. ATN A must synchronize its clock with ATN B.
l Unidirectional transmission of an NTP message between ATN A and ATN B takes one
second.
l Both ATN A and ATN B take one second to process an NTP message.
The process of synchronizing the system clock is as follows:
l ATN A sends an NTP message to ATN B. The message carries an originate timestamp,
10:00:00 am (T1), indicating the time when it leaves ATN A.
l When the NTP message reaches ATN B, ATN B adds a receive timestamp, 11:00:01 am (T2),
to the NTP message, indicating the time when ATN B receives the message.
l When the NTP message leaves ATN B, ATN B adds a transmit timestamp, 11:00:02 am (T3),
to the NTP message, indicating the time when the message leaves ATN B.
l When ATN A receives this response message, it adds a new receive timestamp, which is
10:00:03 am (T4).
ATN A uses the received information to calculate the following two important parameters:
l A roundtrip delay of the NTP message: Delay = (T4 - T1) - (T3 - T2).
l The clock offset of ATN A by taking ATN B as a reference: Offset = ((T2 - T1) + (T3 -
T4))/2.
ATN A sets its clock based on the delay and offset to implement clock synchronization with
ATN B.
NOTE
NTP uses the standard algorithm in RFC 1305 to ensure the precision of clock synchronization. The
preceding example is only a brief introduction to the operating principle of NTP.
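The two formulas can be checked against the timestamps in the example above. The Python sketch below is only a worked calculation; the helper names are hypothetical, and times are expressed as seconds since midnight.

```python
def hms(hours, minutes, seconds):
    """Convert a wall-clock time to seconds since midnight."""
    return hours * 3600 + minutes * 60 + seconds

def ntp_delay_offset(t1, t2, t3, t4):
    """Apply Delay = (T4 - T1) - (T3 - T2) and Offset = ((T2 - T1) + (T3 - T4)) / 2."""
    delay = (t4 - t1) - (t3 - t2)
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return delay, offset

t1 = hms(10, 0, 0)   # message leaves ATN A (ATN A clock)
t2 = hms(11, 0, 1)   # message reaches ATN B (ATN B clock)
t3 = hms(11, 0, 2)   # reply leaves ATN B (ATN B clock)
t4 = hms(10, 0, 3)   # reply reaches ATN A (ATN A clock)

delay, offset = ntp_delay_offset(t1, t2, t3, t4)
print(delay, offset)   # 2 3600.0
```

The result is a round-trip delay of 2 seconds, which matches the assumed one second per direction, and an offset of 3600 seconds, meaning that ATN A must advance its clock by one hour to match ATN B.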
13.2.2.5 Security Mechanism

When a time server in the subnet is faulty or data is maliciously modified or destroyed,
timekeeping on other time servers in the subnet should not be affected. To meet this
requirement, NTP provides two security mechanisms: access authority and NTP
authentication to guarantee the network security.
Access Authority
The device protects local NTP services by setting access authority. This is a simple measure
to ensure security.
The device provides four access authority levels. When an NTP access request message
reaches the local end, the device matches it with the access authority from level 1 to level 4.
The first matched authority level takes effect. The matching sequence is as follows:
l peer: indicates the maximum access authority. The remote end can perform time requests
and control queries for the local NTP service. The local clock can also be synchronized
with the clock of the remote server.
l server: indicates that the remote end can perform time requests and control queries for
the local NTP service. The local clock, however, cannot be synchronized with the clock
of the remote server.
l synchronization: indicates that the remote end can perform time requests only for the
local NTP service.
l query: indicates the minimum access authority. The remote end can perform control
queries only for the local NTP service.
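The first-match rule can be sketched as follows. The sketch makes several assumptions that are not stated above: each authority level is bound to a list of permitted source networks, the rule table and operation names are hypothetical, and a request that matches no level is treated as not permitted.

```python
import ipaddress

# Matching sequence described above: the first level that matches takes effect.
LEVEL_ORDER = ["peer", "server", "synchronization", "query"]

PERMISSIONS = {
    "peer": {"time_request", "control_query", "sync_local_clock"},
    "server": {"time_request", "control_query"},
    "synchronization": {"time_request"},
    "query": {"control_query"},
}

def match_level(source_ip, rules):
    """Return the first authority level whose networks contain source_ip."""
    address = ipaddress.ip_address(source_ip)
    for level in LEVEL_ORDER:
        for network in rules.get(level, []):
            if address in ipaddress.ip_network(network):
                return level
    return None

def is_allowed(source_ip, operation, rules):
    """Check an operation against the matched level (no match: assumed denied)."""
    level = match_level(source_ip, rules)
    return level is not None and operation in PERMISSIONS[level]

# Hypothetical rule table using documentation address ranges.
rules = {
    "peer": ["192.0.2.0/24"],
    "query": ["192.0.2.0/24", "198.51.100.0/24"],
}
print(match_level("192.0.2.7", rules), match_level("198.51.100.9", rules))
```

Because matching stops at the first hit, 192.0.2.7 is granted the peer level even though it also appears under query.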
Authentication
NTP authentication can be enabled on networks demanding high security. NTP authentication
should be separately configured on the client and the server.
When configuring NTP authentication, note the following rules:

l Configurations of NTP authentication on both the client and the server must be complete.
Otherwise, the authentication does not take effect. If NTP authentication is enabled, you
must configure the key and declare the key as reliable.
l Keys configured on the server and the client must be identical.
13.2.2.6 Dynamic and Static Associations of NTP

To manage the synchronization information transmitted to each reference source, the NTP
module sets up a peer structure for each reference source. These peer structures are stored
in hash-linked lists. Each peer structure corresponds to an association (or a session).
NTP supports a maximum of 128 associations, including static and dynamic associations. The
maximum number of dynamic associations is 100.
Static Association
A static association is set up through command lines.
Dynamic Association
A dynamic association is set up dynamically during the NTP implementation process.
Static and Dynamic Associations in Different Modes

l In Unicast Client/Server mode, you must configure the IP address of the server to be
synchronized with on the client. In this situation, a static association is set up on the
client. The association is not set up on the server because the server only passively
responds to the request messages from the client.
l In symmetric peer mode, the IP address of the symmetric passive end must be configured
on the symmetric active end.
In such a case, a static association is set up on the symmetric active end for the message
exchange between peers. When the passive peer receives the seventeenth message from
the active peer:
– If NTP authentication is not enabled, a dynamic association is set up.
– If NTP authentication is enabled, a dynamic association is set up only after the
authentication succeeds.
l In broadcast mode, you must enable the server mode on the interfaces of the broadcast
server. In such a case, a static association is set up on the server. You must also enable
the client mode on the broadcast client. This does not set up a static association;
instead, a dynamic association is set up after the client receives a response packet from
the server.
13.2.3 Terms and Acronyms

Terms
Term              Description
NTP               Network Time Protocol (NTP) is an application layer protocol used on the
                  Internet to synchronize timekeeping among a set of distributed time
                  servers and clients. In this manner, the clock of the host is
                  synchronized with certain time standards.
Timestamp         An NTP timestamp is the number of seconds relative to 00:00:00 on
                  1 January 1900. The value is a 64-bit unsigned fixed-point number, with
                  the integer part in the first 32 bits and the fraction part in the last
                  32 bits.
                  l Originate Timestamp (peer.org, pkt.org): a timestamp indicating the
                  local time when the NTP message leaves the sender, such as T1.
                  l Receive Timestamp (peer.rec, pkt.rec): a timestamp indicating the
                  local time when the NTP message reaches the remote peer, such as T2.
                  When the peer is unreachable, the receive timestamp is set to 0.
                  l Transmit Timestamp (peer.xmt, pkt.xmt): a timestamp indicating the
                  local time when the remote peer returns the NTP message, such as T3.
                  When the peer is unreachable, the transmit timestamp is set to 0.
                  l Reference Timestamp (sys.reftime, peer.reftime, pkt.reftime): a
                  timestamp indicating the local time when the local clock was last set
                  or corrected. If the local clock is never synchronized, the reference
                  timestamp is set to 0.
Clock offset      Clock offset is the time difference between the local clock and the
                  reference clock. It represents the offset time to be adjusted when the
                  local clock is synchronized with the reference clock.
Round trip delay  Round trip delay refers to the time difference between the sending and
                  receiving of an NTP message on the client. It indicates the capability
                  of the local clock to send a message to the reference clock within a
                  specified time.
Dispersion        Dispersion is the maximum error of the local clock compared with the
                  reference clock.
Stratum           Stratum is a hierarchical standard for clock synchronization. It
                  represents the clock precision. The value ranges from 1 to 16. The lower
                  the stratum is, the more precise the clock will be. Value 1 indicates
                  that the clock is the most precise and 16 indicates that the clock is
                  not synchronized.
Clock filtering   Clock filtering is used to select the best time sample from a specified
                  peer.
Clock selection   Clock selection is a method of selecting reference clocks by using the
                  clock selection algorithm.
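The 64-bit fixed-point timestamp format described in the Terms table can be illustrated with a short conversion sketch. The function names are hypothetical; the constant 2208988800 is the number of seconds between 1 January 1900 and 1 January 1970, and the sketch ignores era rollover beyond the 32-bit seconds field.

```python
NTP_UNIX_DELTA = 2208988800  # seconds from 1900-01-01 to 1970-01-01

def unix_to_ntp(unix_seconds):
    """Pack a Unix time (seconds, possibly fractional) into a 64-bit NTP timestamp."""
    ntp_seconds = unix_seconds + NTP_UNIX_DELTA
    integer = int(ntp_seconds)
    fraction = int((ntp_seconds - integer) * 2**32)
    # Upper 32 bits: whole seconds since 1900. Lower 32 bits: binary fraction.
    return ((integer & 0xFFFFFFFF) << 32) | fraction

def ntp_to_unix(timestamp):
    """Unpack a 64-bit NTP timestamp into Unix seconds."""
    integer = timestamp >> 32
    fraction = timestamp & 0xFFFFFFFF
    return integer - NTP_UNIX_DELTA + fraction / 2**32

stamp = unix_to_ntp(0.5)
print(hex(stamp), ntp_to_unix(stamp))
```

The lower 32 bits give a resolution of 2^-32 seconds, roughly 233 picoseconds.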
Acronyms
Acronyms Full Spelling
NTP Network Time Protocol

13.3 1588v2
13.3.1 Introduction to 1588v2
Definition
l Synchronization
This is the process of ensuring that the frequency offset or time difference between
devices is kept within a reasonable range. In a modern communications network, most
telecommunications services require network clock synchronization in order to function
properly. Network clock synchronization includes time synchronization and frequency
synchronization.
– Time synchronization
Time synchronization, also called phase synchronization, means that both the
frequency of and the time between signals remain constant. In this case, the time
offset between signals is always 0.
– Frequency synchronization
Frequency synchronization, also called clock synchronization, refers to a constant
frequency offset or phase offset. In this case, signals are transmitted at a constant
average rate during any given time period so that all the devices on the network can
work at the same rate.
Figure 13-20 Schematic diagram of time synchronization and frequency synchronization
(Figure: two panels, "Phase synchronization" and "Frequency synchronization", each
comparing Watch A with Watch B.)

Figure 13-20 shows the differences between time synchronization and frequency
synchronization. If Watch A and Watch B always show the same time, they are in time
synchronization. If Watch A and Watch B show different times, but the time offset remains
constant, for example, 6 hours, they are in frequency synchronization.
l IEEE 1588
IEEE 1588 is defined by the Institute of Electrical and Electronics Engineers (IEEE) as
the Precision Clock Synchronization Protocol for Networked Measurement and Control
Systems. It is called the Precision Time Protocol (PTP) for short.
IEEE 1588v1, released in 2002, applies to the industrial automation and test and
measurement fields. With the development of IP networks and the popularization of 3G
networks, the demand for time synchronization on telecommunications networks has
increased. To satisfy this need, IEEE drafted IEEE 1588v2 based on IEEE 1588v1 in
June 2006, revised IEEE 1588v2 in 2007, and released IEEE 1588v2 at the end of 2008.
Targeted at telecommunications industry applications, IEEE 1588v2 improves on IEEE
1588v1 in the following aspects:
– Encapsulation of Layer 2 and Layer 3 packets has been added.
– The transmission rate of Sync messages is increased.
– A transparent clock (TC) model has been developed.
– Hardware timestamp processing has been defined.
– Type-length-value (TLV) extension is used to enhance protocol features and
functions.
1588v2 is a time synchronization protocol which allows for highly accurate time
synchronization between devices. It is also used to implement frequency synchronization
between devices.
Purpose
Data communications networks do not require time or frequency synchronization and,
therefore, routers on such networks do not need to support time or frequency synchronization.
On IP radio access networks (RANs), time or frequency needs to be synchronized among base
transceiver stations (BTSs). Therefore, routers on IP RANs are required to support time or
frequency synchronization.
Frequency synchronization between BTSs on an IP RAN requires that frequencies between
BTSs be synchronized to a certain level of accuracy; otherwise, calls may be dropped during
mobile handoffs. Some wireless standards require both frequency and time synchronization.
Table 13-3 shows the requirements of wireless standards for time synchronization and
frequency accuracy.
Table 13-3 Requirements of wireless standards for time synchronization and frequency
accuracy
Wireless Standard   Requirement for        Requirement for
                    Frequency Accuracy     Time Synchronization
GSM                 0.05 ppm               NA
WCDMA               0.05 ppm               NA
TD-SCDMA            0.05 ppm               1.5 μs
CDMA2000            0.05 ppm               1.5 μs
WiMax FDD           0.05 ppm               NA
WiMax TDD           0.05 ppm               1 μs
LTE                 0.05 ppm               In favor of time synchronization
Different BTSs have different requirements for frequency synchronization. These
requirements can be satisfied through physical clock synchronization (including external
clock input, WAN clock input, and synchronous Ethernet clock input) and packet-based clock
recovery (including CES ACR/DCR and 1588v2).
Traditional packet-based clock recovery cannot meet the time synchronization requirement of
BTSs. For example, NTP-based time synchronization is only accurate to within one second
and 1588v1-based time synchronization is only accurate to within one millisecond. To meet
time synchronization requirements, BTSs need to be connected directly to a global
positioning system (GPS). This solution, however, has disadvantages: GPS installation and
maintenance costs are high, and communications may be vulnerable to security breaches
because a GPS uses satellites from different countries.
1588v2, with hardware assistance, provides time synchronization accuracy to within one
microsecond to meet the time synchronization requirements of wireless networks. In
comparison with a GPS-based solution, 1588v2 is less costly to deploy and operates
independently of GPS, making 1588v2 strategically significant.
In addition, operators are paying more attention to the operation and maintenance of
networks, requiring routers to provide network quality analysis (NQA) to support
high-precision delay measurement at the 100 μs level. Consequently, high-precision time
synchronization between measuring devices and measured devices is required. 1588v2 meets
this requirement.
1588v2 packets are of the highest priority by default to avoid packet loss and keep clock
precision.
Benefits
This feature brings the following benefits to operators:
l Construction and maintenance costs for time synchronization on wireless networks are
reduced.
l High-accuracy NQA-based unidirectional delay measurement is supported.
13.3.2 Principles

Clock Domain
Logically, a physical network can be divided into multiple clock domains. Each clock domain
has its own reference time, with which all devices in the domain are synchronized. The
reference times of different clock domains are independent of one another.
A device can transparently transmit time signals from multiple clock domains over a bearer
network to provide specific reference times for multiple mobile operator networks. The
device, however, can join only one clock domain and can synchronize only with the
reference time of that clock domain.
Clock Node
Each node on a time synchronization network is a clock. The 1588v2 protocol defines the
following types of clocks:
l Ordinary clock
An ordinary clock (OC) has only one 1588v2 clock interface (a clock interface enabled
with 1588v2) through which the OC synchronizes with an upstream node or distributes
time signals to downstream nodes.
l Boundary clock
A boundary clock (BC) has multiple 1588v2 clock interfaces, one of which is used to
synchronize with an upstream node. The other interfaces can be used to distribute time
signals to downstream nodes.
The following is an example of a special case: If an ATN obtains the standard time from a
BITS through an external time interface (which is not enabled with 1588v2) and then
distributes time signals through two 1588v2-enabled clock interfaces to downstream
nodes, this ATN is a BC node, as it has more than one 1588v2 clock interface.
l Transparent clock
A transparent clock (TC) does not synchronize the time with other devices (unlike BCs
and OCs) but has multiple 1588v2 clock interfaces through which it transmits 1588v2
messages and corrects message transmission delays.
TCs are classified into end-to-end (E2E) TCs and peer-to-peer (P2P) TCs.
Figure 13-21 shows the location of the TC, OC, and TC+OC on a time synchronization
network.

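Figure 13-21 Location of the TC, OC, and TC+OC on a time synchronization network
(Figure: BC1 is the grandmaster clock at the top. The nodes below it are TC1 and TC2, then
OC1, OC2, BC2, and BC3, then TC3 and TC4, then OC3 to OC6. One path in the figure is
marked as a cyclic path.)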
Time Source Selection

On a 1588v2 time synchronization network, all clocks are organized into a master-slave
synchronization hierarchy with the Grandmaster (GM) clock at the top. This topology can be
statically configured or automatically generated by 1588v2 using the Best Master Clock
(BMC) algorithm.
1588v2 Announce messages are exchanged between clocks to carry time source information,
including the priority level of the GM, time stratum, time accuracy, distance, and number of
hops to the GM. After this information has been gathered, one of the clock nodes is
selected to be the GM, the interface to be used for transmitting clock signals issued by the
GM is selected, and master and slave relationships between nodes are specified. A loop-free,
GM-rooted spanning tree that covers all nodes is established after completion of the process.
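The comparison of time source information can be sketched as a field-by-field ordering in which the lowest value wins. This is a simplified illustration, not the device implementation or the complete Best Master Clock algorithm: the attribute names follow the fields carried in IEEE 1588v2 Announce messages, the AnnounceInfo class and best_master function are hypothetical, and the port-state decisions that build the master-slave tree are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnounceInfo:
    priority1: int        # configured priority, compared first
    clock_class: int      # quality class of the time source
    clock_accuracy: int   # accuracy of the time source
    variance: int         # stability of the time source
    priority2: int        # configured tie-breaker priority
    clock_identity: str   # unique identity, final tie-breaker between GMs
    steps_removed: int    # number of hops to the GM

def comparison_key(info):
    """Order candidates field by field; a lower value is better at every step."""
    return (info.priority1, info.clock_class, info.clock_accuracy,
            info.variance, info.priority2, info.clock_identity,
            info.steps_removed)

def best_master(candidates):
    """Pick the candidate that would be selected as the GM in this sketch."""
    return min(candidates, key=comparison_key)

gm_a = AnnounceInfo(128, 6, 0x21, 0x4E5D, 128, "00:00:00:ff:fe:00:00:01", 1)
gm_b = AnnounceInfo(100, 248, 0xFE, 0xFFFF, 128, "00:00:00:ff:fe:00:00:02", 3)
print(best_master([gm_a, gm_b]).clock_identity)
```

In this example gm_b is selected because its priority1 value is lower, even though its source quality is worse, which shows how a configured priority can override the automatic choice.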
If a master-slave relationship has been set up between two nodes, the master node periodically
sends Announce messages to the slave node. If the slave node does not receive an Announce
message from the master node within a specified period of time, it terminates the current
master-slave relationship and finds another interface with which to establish a new master-
slave relationship.
Clock Modes of a 1588v2-enabled Device

l OC
l BC
l TC
l E2ETC
l P2PTC
l E2ETCOC
l P2PTCOC
l TCandBC

Encapsulation Modes of a 1588v2 Packet

A 1588v2 packet can be encapsulated in either MAC or UDP mode:
l In MAC encapsulation, 802.1p priorities are carried in 1588v2 packets. MAC
encapsulation is classified into two types:
– Unicast encapsulation
– Multicast encapsulation
l In UDP encapsulation, Differentiated Services Code Point (DSCP) values are carried in
1588v2 packets. UDP encapsulation is classified into two types:
– Unicast encapsulation
– Multicast encapsulation
Supported Link Types

Theoretically, 1588v2 supports all types of links, but at present it has only been defined for
encapsulation and implementation on Ethernet links and thus the ATN supports only Ethernet
links.
Grandmaster
A time synchronization network is like a GM-rooted spanning tree. All other nodes
synchronize with the GM.
Master/Slave
When a pair of nodes perform time synchronization, the upstream node distributing the
reference time signals is the master node and the downstream node receiving the reference
time signals is the slave node.
13.3.2.2 Principle of Synchronization

The principles of 1588v2 time synchronization and NTP are the same. The master and slave
nodes exchange timing packets and calculate the total packet transmission delay in the two
directions (sending and receiving) from the sending and receiving timestamps carried in the
exchanged timing packets. If the packet transmission delays in the two directions are
identical, the packet transmission delay in one direction equals the total delay divided by 2.
The slave node then obtains its time offset from the master node by removing the one-way delay
from the measured timestamp difference, and synchronizes with the master node by correcting
its local time according to the time offset.
In practice, the delay and jitter on the network need to be taken into account, and the sending
and receiving delays are not always identical. Therefore, packet-based time synchronization,
namely, 1588v2 and NTP, cannot guarantee high synchronization accuracy. For example, NTP
can only provide the synchronization accuracy of 10 to 100 ms.
1588v2 and NTP differ in implementation.
NTP runs at the application layer, for example, on the MPU of the ATN. The delay measured
by NTP, in addition to the link delay, includes various internal processing delays, such as the
internal congestion queuing, software scheduling, and software processing delays. These
make the packet transmission delay unstable, causing packet transmission delays in two
directions to be asymmetric. As a result, the accuracy of NTP-based time synchronization is
low.

1588v2 presumes that the link delay is constant or changes so slowly that the change between
two synchronization processes can be ignored, and the packet transmission delays in two
directions on a link are identical. Packets are time-stamped for delay measurement at the
physical layer of the LPU. This ensures that time synchronization based on the obtained link
delay is extremely accurate.
1588v2 defines two modes for the delay measurement and time synchronization mechanisms,
namely, Delay and Peer Delay (PDelay).
Delay Mode
The Delay mode is applied to end-to-end (E2E) delay measurement. Figure 13-22 shows the
delay measurement in Delay mode.
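Figure 13-22 E2E delay measurement in Delay mode
[Figure: timing diagram between the master and the slave. The master sends Sync at t1
(followed by Follow_Up in two-step mode); the slave receives it at t2 after the delay t-ms.
The slave sends Delay_Req at t3; the master receives it at t4 after the delay t-sm and returns
Delay_Resp. The slave then knows t1, t2, t3, and t4.]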
NOTE
As shown in Figure 13-22, t-sm and t-ms represent the sending and receiving delays respectively and
are presumed to be identical. If they are different, they should be made identical through asymmetric
delay correction. For details about asymmetric delay correction, see the following part of this section.
Follow_Up packets are used in two-step mode. Only the one-step mode is described in this part,
so Follow_Up packets are not mentioned. For details about the two-step mode, see the following
part of this section.
A master node periodically sends a Sync packet carrying the sending timestamp t1 to the slave
node. When the slave node receives the Sync packet, it time-stamps t2 to the packet.
The slave node periodically sends the Delay_Req packet carrying the sending timestamp t3 to
the master node. When the master node receives the Delay_Req packet, it time-stamps t4 to
the packet and returns a Delay_Resp packet to the slave node.

The slave node receives a set of timestamps, including t1, t2, t3, and t4. Other elements
affecting the link delay are ignored.
The time offset between the master and slave nodes equals [(t2-t1)-(t4-t3)]/2.
Based on the time offset, the slave node synchronizes with the master node.
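The offset formula above can be illustrated with a minimal Python sketch. The function name
and the nanosecond unit are assumptions made for this example only; the one-way delay
expression is the standard companion of the offset formula when the two directions are
symmetric.

```python
def delay_mode_offset(t1, t2, t3, t4):
    """Return (offset, one_way_delay) from the four Delay-mode timestamps.

    t1 and t4 are read from the master clock, t2 and t3 from the slave clock.
    All values use one unit (nanoseconds here, by assumption).
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2  # formula given in the text
    delay = ((t2 - t1) + (t4 - t3)) / 2   # symmetric one-way link delay
    return offset, delay


# Example: the slave clock is 100 ns ahead of the master, link delay 50 ns each way.
offset, delay = delay_mode_offset(t1=1000, t2=1150, t3=1300, t4=1250)
print(offset, delay)  # 100.0 50.0
```

The slave would then subtract the computed offset from its local time.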
As shown in Figure 13-23, time synchronization is repeatedly performed to ensure constant
synchronization between the master and slave nodes.
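Figure 13-23 Networking diagram of the directly-connected BC and OC
[Figure: BC (master) and OC (slave) directly connected. The master sends Sync at t1, received
at t2; the slave sends DelayReq at t3, received at t4; the master returns DelayResp.]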
The BC and OC can be directly connected as shown in Figure 13-23. Alternatively, they can
be connected through other devices, but these devices must be TCs to ensure the accuracy of
time synchronization. The TC only transparently transmits 1588v2 packets and corrects the
packet transmission delay (which requires that the TC identify these 1588v2 packets).
To ensure the high accuracy of 1588v2 time synchronization, it is required that the packet
transmission delays in two directions between master and slave nodes be stable. Usually, the
link delay is stable but the transmission delay on devices is unstable. Therefore, if two nodes
performing time synchronization are connected through forwarding devices, the time
synchronization accuracy cannot be guaranteed. The solution to the problem is to perform the
transmission delay correction on these forwarding devices, which requires that the forwarding
devices be TCs.
Figure 13-24 shows how the transmission delay correction is performed on a TC.

Figure 13-24 Schematic diagram of the transmission delay correction on a TC
[Figure: residence time bridge. At ingress, the ingress timestamp is subtracted from the
correctionField of the event message; at egress, the egress timestamp is added to the
correctionField of the outgoing PTP message.]
The TC performs the transmission delay correction by adding the time that a 1588v2 packet
spends inside the TC (the residence time) to the Correction field of the packet. To do so, the
TC deducts the receiving timestamp of the 1588v2 packet on its inbound interface and adds the
sending timestamp of the 1588v2 packet on its outbound interface.
In this manner, 1588v2 packets exchanged between the master and slave nodes, when passing
through multiple TCs, carry the accumulated packet transmission delays of all TCs in the
Correction field. When the value of the Correction field is deducted, the value obtained is the
link delay, ensuring high accuracy time synchronization.
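As a rough illustration of the correction described above, the following Python sketch
accumulates the residence time over several TCs. The function names and the sample timestamps
are hypothetical; a real TC performs this update in hardware on the packet's Correction field.

```python
def tc_update_correction(correction_field, ingress_ts, egress_ts):
    """Deduct the ingress timestamp and add the egress timestamp (one TC hop)."""
    return correction_field - ingress_ts + egress_ts


def accumulate_correction(hops, correction_field=0):
    """Apply the per-hop update for each (ingress_ts, egress_ts) pair in order."""
    for ingress_ts, egress_ts in hops:
        correction_field = tc_update_correction(correction_field, ingress_ts, egress_ts)
    return correction_field


# Two TCs with residence times of 30 ns and 15 ns (each TC uses its own local clock).
total_residence = accumulate_correction([(100, 130), (200, 215)])
print(total_residence)  # 45

# Deducting the Correction field from the measured end-to-end delay leaves the link delay.
measured_delay = 120  # hypothetical value
print(measured_delay - total_residence)  # 75
```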
A TC that records the transmission delay from end to end as described above is the E2E TC.
Time synchronization in Delay mode can be applied only to E2E TCs. Figure 13-25 shows
how the BC, OC, and E2E TC are connected and how 1588v2 operates.
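Figure 13-25 Networking diagram of the BC, OC, and E2E TC and the 1588v2 operation
[Figure: BC (master), E2E TC, and OC (slave) in series. Sync (t1 to t2) and the delay request
(t3 to t4) pass through the E2E TC, which applies a correction in each direction; the master
returns DelayResp.]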

PDelay Mode
When performing time synchronization in PDelay mode, the slave node deducts both the
packet transmission delay and upstream link delay. This requires that adjacent devices
perform the delay measurement in PDelay mode to enable each device on the link to know its
upstream link delay. Figure 13-26 shows the delay measurement in PDelay mode.
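Figure 13-26 Schematic diagram of the delay measurement in PDelay mode
[Figure: timing diagram between node 1 and node 2. Node 1 sends Pdelay_Req at t1; node 2
receives it at t2 after the delay t-ms. Node 2 sends Pdelay_Resp at t3 (followed by
Pdelay_Resp_Follow_Up in two-step mode); node 1 receives it at t4 after the delay t-sm.]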
NOTE
As shown in Figure 13-26, t-sm and t-ms represent the sending and receiving delays respectively and
are presumed to be identical. If they are different, they should be made identical through asymmetric
delay correction. For details about asymmetric delay correction, see the following part of this section.
Follow_Up packets are used in two-step mode. In this part, the one-step mode is described and
Follow_Up packets are not mentioned. For details about the two-step mode, see the following part of
this section.
Node 1 periodically sends a PDelay_Req packet carrying the sending timestamp t1 to node 2.
When the PDelay_Req packet is received, node 2 time-stamps t2 to the PDelay_Req packet.
Then, node 2 sends a PDelay_Resp packet carrying the sending timestamp t3 to node 1. When
the PDelay_Resp packet is received, node 1 time-stamps t4 to the PDelay_Resp packet.
Node 1 obtains a set of timestamps, including t1, t2, t3, and t4. Other elements affecting the
link delay are ignored.
The total packet transmission delay in the two directions on the link between node 1 and
node 2 equals (t4 - t1) - (t3 - t2).
If the packet transmission delays in the two directions on the link between node 1 and node 2
are identical, the packet transmission delay in one direction equals [(t4 - t1) - (t3 - t2)]/2.
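The peer delay calculation can be sketched in Python as follows. The function name and the
sample values are assumptions for illustration. Note that the clock offset between the two
nodes cancels out, because t1 and t4 are read on node 1 and t2 and t3 on node 2.

```python
def peer_link_delay(t1, t2, t3, t4):
    """One-way link delay from a Pdelay_Req/Pdelay_Resp exchange.

    t1 and t4 are read on node 1; t2 and t3 are read on node 2.
    Assumes the delays in the two directions are identical.
    """
    round_trip = (t4 - t1) - (t3 - t2)  # total delay minus node 2 turnaround time
    return round_trip / 2


# Example: 50 ns link delay, node 2 clock 100 ns ahead, 20 ns turnaround on node 2.
print(peer_link_delay(t1=1000, t2=1150, t3=1170, t4=1120))  # 50.0
```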

The delay measurement in PDelay mode does not differentiate between the master and slave
nodes. All nodes send PDelay packets to their adjacent nodes to calculate adjacent link delay.
This calculation process repeats and the packet transmission delay in one direction is updated
accordingly.
The delay measurement in PDelay mode does not trigger time synchronization. To implement
time synchronization, the master node needs to periodically send Sync packets to the slave
node, from which the slave node obtains the t1 and t2 timestamps. The slave node then deducts
the packet transmission delay on the link from the master node to the slave node. The resulting
value, t2 - t1 - CorrectionField, is the time offset between the slave and master nodes. The
slave node uses the time offset to synchronize with the master node. Figure 13-27 shows how
time synchronization is implemented in PDelay mode in the scenario where the BC and OC are
directly connected.
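A minimal Python sketch of the offset calculation in PDelay mode is given below. It assumes
that the slave adds its measured upstream link delay to the Correction field value carried in
the Sync packet before deducting it; the function and parameter names are hypothetical.

```python
def pdelay_mode_offset(t1, t2, correction_field, upstream_link_delay):
    """Offset of the slave relative to the master in PDelay mode.

    correction_field: residence times accumulated by TCs on the path (0 if none).
    upstream_link_delay: one-way delay measured by the peer delay mechanism.
    """
    total_correction = correction_field + upstream_link_delay
    return t2 - t1 - total_correction


# Example: directly connected BC and OC, 50 ns link delay, slave 100 ns ahead.
print(pdelay_mode_offset(t1=1000, t2=1150, correction_field=0, upstream_link_delay=50))  # 100
```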
Figure 13-27 Networking diagram of time synchronization in PDelay mode on the
directly-connected BC and OC
[Figure: BC (master) and OC (slave) directly connected. Each node runs a PDelay Req/PDelay
Resp exchange (t1 to t4) toward the other, and the master then sends Sync (t1 to t2).]
The BC and OC can be directly connected as shown in Figure 13-27.

Alternatively, the BC and OC can be connected through other devices functioning as TCs to
ensure the accuracy of time synchronization. The TC only transparently transmits 1588v2
packets and corrects the packet transmission delay (which requires that the TC identify these
1588v2 packets). The P2P-TC has a device structure similar to that of the E2E-TC, but has a
peer delay mechanism configured on each interface. The mechanism calculates the link delay
between the local and remote interfaces on the shared links. Figure 13-28 shows how
transmission delay correction is performed on a P2PTC.

Figure 13-28 Transmission delay correction in PDelay mode
[Figure: residence time bridge of a P2P TC. The ingress timestamp is subtracted from the
correctionField of the Sync message received at ingress; the egress timestamp and the link
delay on the ingress port are added to the correctionField of the Sync or Follow_Up message
sent at egress.]
Figure 13-29 shows how the BC, OC, and P2P TC are connected and how 1588v2 operates.
Figure 13-29 Schematic diagram of transmission delay correction in PDelay mode on a P2PTC
[Figure: BC (master), P2P TC, and OC (slave) in series. Adjacent devices run
PDelayReq/PDelayResp exchanges (t1 to t4) in both directions on each link; Sync is sent by the
master at t1, corrected by the P2P TC, and received by the slave at t2.]
One-Step/Two-Step
In one-step mode, both the Sync packets for time synchronization in Delay mode and
PDelay_Resp packets for time synchronization in PDelay mode are stamped with a sending
time.

Asymmetric Correction
Theoretically, 1588v2 requires the packet transmission delays in two directions on a link to be
symmetrical. Otherwise, the algorithms of 1588v2 time synchronization cannot be
implemented. In practice, however, the packet transmission delays in two directions on a link
may be asymmetric due to the attributes of a link or a device. For example, the delays between
receiving a packet and time-stamping it may differ in the two directions. For such cases,
1588v2 provides a mechanism of asymmetric delay correction, as shown in Figure 13-30.
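Figure 13-30 Asymmetric delay correction
[Figure: link between A (master clock or responder) and B (slave clock or requestor), with the
delay t-ms from master to slave and the delay t-sm from slave to master.]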
Usually, t-ms is identical with t-sm. If they are different, the user can set a delay offset
between them as long as the delay offset is constant and obtainable. 1588v2 performs the time
synchronization calculation according to the asymmetric correction value. In this manner, a
high level of time synchronization accuracy can be achieved on an asymmetric-delay link.
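The effect of a configured delay offset can be sketched in Python as follows. This is an
illustration under stated assumptions, not the device's implementation: the asymmetry is
defined here as t-ms minus t-sm, and the function and parameter names are hypothetical.

```python
def corrected_offset(t1, t2, t3, t4, asymmetry=0):
    """Delay-mode offset with a known, constant asymmetry removed.

    asymmetry = t_ms - t_sm (assumed sign convention), in the same unit as
    the timestamps. With asymmetry=0 this is the plain symmetric formula.
    """
    raw_offset = ((t2 - t1) - (t4 - t3)) / 2
    # An asymmetric link biases the raw offset by half of the asymmetry.
    return raw_offset - asymmetry / 2


# Example: true offset 100 ns, t_ms = 60 ns, t_sm = 40 ns.
print(corrected_offset(1000, 1160, 1300, 1240))                # 110.0 (uncorrected)
print(corrected_offset(1000, 1160, 1300, 1240, asymmetry=20))  # 100.0
```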
Packet Encapsulation
1588v2 defines the following packet encapsulation modes:
l Layer 3 unicast encapsulation through unicast UDP
The destination UDP port number is 319 or 320, depending on the type of 1588v2 packet:
event messages such as Sync and Delay_Req use port 319, and general messages such as
Announce, Follow_Up, and Delay_Resp use port 320.
Currently, it is recommended that Huawei base stations adopt Layer 3 unicast
encapsulation. The IP clock server serves multiple BTSs and uses unicast UDP
packets to exchange 1588v2 protocol packets with them. Figure 13-31 shows Layer 3
unicast encapsulation without VLAN tags.
Figure 13-31 Layer 3 unicast encapsulation without VLAN tags
DA (6 bytes) | SA (6 bytes) | 0x0800 (2 bytes) | IP header (20 bytes) | UDP header (8 bytes) | 1588 packet
Figure 13-32 shows Layer 3 unicast encapsulation with VLAN tags.

Figure 13-32 Layer 3 unicast encapsulation with VLAN tags
DA (6 bytes) | SA (6 bytes) | 0x8100 (2 bytes) | VLAN tag: priority (3 bits), VLAN ID (12 bits) (2 bytes) | IP header (20 bytes) | UDP header (8 bytes) | 1588 packet
l Layer 2 unicast encapsulation through a unicast MAC address
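For Layer 3 unicast encapsulation, the choice of destination UDP port and the header overhead
can be sketched in Python as follows. The port assignment follows IEEE 1588-2008 (event
messages on 319, general messages on 320); the function names are hypothetical, and the header
sizes are read from Figures 13-31 and 13-32.

```python
EVENT_MESSAGES = {"Sync", "Delay_Req", "Pdelay_Req", "Pdelay_Resp"}
GENERAL_MESSAGES = {"Announce", "Follow_Up", "Delay_Resp", "Pdelay_Resp_Follow_Up"}


def ptp_udp_port(message_type):
    """Return the destination UDP port for a 1588v2 message type."""
    if message_type in EVENT_MESSAGES:
        return 319  # time-stamped (event) messages
    if message_type in GENERAL_MESSAGES:
        return 320  # messages that are not time-stamped on the wire
    raise ValueError(f"unknown 1588v2 message type: {message_type}")


def header_overhead(vlan_tagged):
    """Bytes in front of the 1588 packet, using the field sizes in the figures."""
    fields = {"DA": 6, "SA": 6, "type": 2, "IP header": 20, "UDP header": 8}
    if vlan_tagged:
        fields["VLAN tag"] = 2  # 3-bit priority and 12-bit VLAN ID
    return sum(fields.values())


print(ptp_udp_port("Sync"), ptp_udp_port("Announce"))   # 319 320
print(header_overhead(False), header_overhead(True))    # 42 44
```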
BITS Interface
1588v2 enables clock nodes to synchronize with each other, but cannot enable them to
synchronize with Greenwich Mean Time (GMT). If the clock nodes need to synchronize with
GMT, an external time source is required. That is, the GM needs to be connected to an
external time source to obtain the reference time in non-1588v2 mode.
Currently, the external time sources are from satellites, such as the GPS from the U.S.A,
Galileo from Europe, GLONASS from Russia, and Beidou from China. Figure 13-33 shows
how the GM and an external time source are connected.
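Figure 13-33 Synchronization with an external time source
[Figure: an external BITS time source connected to the external time port of the ATN/CX that
acts as the grandmaster, which distributes time through 1588v2.]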
The ATN provides one type of external clock or time interface:
l RJ45 port (using a 120 ohm shielded cable)
The two RJ45 ports function as an external clock port and an external time port
respectively, providing the following clock or time signals:
– 2 MHz clock signal (Differential level with one line clock input and one line clock
output)
– 2 Mbit/s clock signal (Differential level with one line clock input and one line clock
output)
– 1 pps + TOD time signal (RS422 differential level with one line time input)
– 1 pps + TOD time signal (RS422 differential level with one line time output)
Clock Synchronization
In addition to time synchronization, 1588v2 can be used for clock synchronization, that is,
frequency recovery can be achieved through 1588v2 packets.
1588v2 time synchronization in Delay or PDelay mode requires the device to periodically
send Sync packets to its peer.

The sent Sync packet carries a sending timestamp. After receiving the Sync packet, the peer
adds a receiving timestamp to it. When the link delay is stable, the two timestamps change at
the same pace. If the receiving timestamps advance faster or slower than the sending
timestamps, the clock of the receiving device runs faster or slower than the clock of the
sending device. In this case, the clock of the receiving device needs to be adjusted. After the
adjustment, the frequencies of the two devices are synchronized.
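The comparison of timestamp pace can be sketched in Python as follows. The function name, the
nanosecond unit, and the parts-per-billion scaling are assumptions for this example; the sketch
also assumes a constant link delay, as the text does.

```python
def frequency_offset_ppb(t1_first, t2_first, t1_second, t2_second):
    """Estimate the receiver's frequency offset from two Sync packets.

    t1_* are sending timestamps (sender clock), t2_* are receiving timestamps
    (receiver clock), all in nanoseconds. A positive result means the receiving
    clock runs fast relative to the sending clock.
    """
    sent_interval = t1_second - t1_first
    received_interval = t2_second - t2_first
    return (received_interval - sent_interval) * 1_000_000_000 / sent_interval


# Two Sync packets sent 1 s apart; the receiver measures 1 s plus 50 ns between them.
print(frequency_offset_ppb(0, 500, 1_000_000_000, 1_000_000_550))  # 50.0
```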
The frequency restored through 1588v2 packets has a lower accuracy than the frequency
restored through synchronous Ethernet. Therefore, it is recommended to perform frequency
synchronization through synchronous Ethernet and time synchronization through 1588v2.
1588v2 restores the frequency in the following modes:
l Hop-by-hop
In hop-by-hop mode, all devices on a link are required to support 1588v2. The frequency
recovery in this mode is highly accurate. In the case of a small number of hops, the
frequency recovery accuracy can meet the requirement of ITU-T G.813 (stratum 3
standard).
l End-to-end (Delay and jitter may occur on the transit network.)
In end-to-end mode, the forwarding devices do not need to support 1588v2, and the
delay of the forwarding path is only required to meet a specified level, for example, less
than 20 ms. The frequency recovery accuracy in this mode is low, and can meet only the
requirements of G.8261 and base stations (50 ppb) rather than that of the stratum 3
clock standard.
To achieve high frequency recovery accuracy, Sync packets must be transmitted at a high
frequency. For physical-layer frequency synchronization, at least 16 packets must be
transmitted per second; for 1588v2 frequency synchronization, at least 128 packets must be
transmitted per second.

Currently, 1588v2 is applicable to a link where all devices are 1588v2-capable.
Because a master clock has multiple slave clocks, it is recommended to use the BITS or IP
clock server as the master clock. It is not recommended to use any device as the master clock
because the CPU of the device may be overloaded.
1588v2 Clock Synchronization in E2E Mode
Figure 13-34 Networking diagram of 1588v2 clock synchronization in E2E mode
[Figure: a clock server and NodeBs with 1588 connected over FE and GE links of the bearer
network; 1588 messages run end to end between the clock server and the NodeBs.]

As shown in Figure 13-34, clock servers and NodeBs exchange TOP-encapsulated 1588
messages over a QoS-enabled bearer network with the jitter being less than 20 ms.
Scenario description:
l NodeBs only need frequency synchronization.

l The bearer network does not support frequency recovery in synchronous Ethernet mode.
Solution description:
l The bearer network is connected to a wireless IP clock server and adopts 1588v2 clock
synchronization and frequency recovery in E2E mode.
l 1588v2 timing messages need to be transparently transmitted by priority over the bearer
network; the E2E jitter on the bearer network must be less than 20 ms.
l Advantage of the solution: Devices on the bearer network are not required to support
1588v2, and are therefore easily deployed.
l Disadvantage of the solution: Only frequency synchronization rather than time
synchronization is performed. In practice, an E2E jitter of less than 20 ms is not ensured.
1588v2 Clock Synchronization in Hop-by-Hop Mode
Figure 13-35 Networking diagram of 1588v2 clock synchronization in hop-by-hop mode
[Figure: a BITS clock source/WAN link feeds the bearer network. Clock signals reach a NodeB
with 1588 and a NodeB without 1588 over FE and GE links through a combination of 1588, WAN
clock, and synchronous Ethernet. Legend: physical clock signal transfer; 1588 clock signal
transfer.]
As shown in Figure 13-35, the clock source can send clock signals to NodeBs through the
1588v2 clock, synchronous Ethernet clock, or any combination of clocks.
l NodeBs only need frequency synchronization.

l GE links on the bearer network support the 1588v2 clock rather than the synchronous
Ethernet clock.

l The Synchronous Digital Hierarchy (SDH) or synchronous Ethernet clock sends stratum
3 clock signals through physical links. On the GE links that do not support the
synchronous Ethernet clock, stratum 3 clock signals are transmitted through 1588v2.
l Advantage of the solution: The solution is simple and flexible.
l Disadvantage of the solution: Only frequency synchronization rather than time
synchronization is performed.
Bearer and Wireless Networks in the Same Clock Domain
Figure 13-36 Networking diagram of the bearer and wireless networks in the same clock
domain
[Figure: GPS+BITS sources attached to a chain of BC nodes connected by GE links. NodeBs with
1588 attach over FE; a NodeB without 1588 attaches over E1. Legend: physical clock signal
transfer; 1588 clock signal transfer.]
l NodeBs need to synchronize time with each other.
l The bearer and wireless networks are in the same clock domain.
l The core node supports GPS or BITS clock interfaces.
l All nodes on the bearer network function as BC nodes, which support the link delay
measurement mechanism to handle fast link switching.
l Links or devices that do not support 1588v2 can be connected to devices with GPS or
BITS clock interfaces to perform time synchronization.
l Advantage of the solution: The time of all nodes is synchronous on the entire network.
l Disadvantage of the solution: All nodes on the entire network must support 1588v2.

Terms
Terms | Description
1588v2 PTP | IEEE 1588v2, defined by the Institute of Electrical and Electronics Engineers (IEEE), is a standard for Precision Clock Synchronization Protocol for Networked Measurement and Control Systems. The Precision Time Protocol (PTP) is used for short.
Clock domain | Logically, a physical network can be divided into multiple clock domains. Each clock domain has a reference time, with which all devices in the domain are synchronized. Different clock domains have their own reference time, which is independent of each other.
Clock node | Each node on a time synchronization network is a clock. The 1588v2 protocol defines three types of clocks: OC, BC, and TC.
Clock reference source | Clock source selection is a method to select reference clocks based on the clock selection algorithm.
One-step mode | In one-step mode, Sync messages in Delay mode and PDelay_Resp messages in PDelay mode are stamped with the time when messages are sent.
Two-step mode | In two-step mode, Sync messages in Delay mode and PDelay_Resp messages in PDelay mode only record the time when messages are sent and carry no timestamps. The timestamps are carried in the messages, such as Follow_Up and PDelay_Resp_Follow_Up messages.
Abbreviations
1588v2 Precision Time Protocol
IP RAN Internet Protocol Radio Access Network
GSM Global System for Mobile communications
WCDMA Wideband Code Division Multiple Access
TD-SCDMA Time Division-Synchronous Code Division Multiple Access
WiMax FDD Worldwide Interoperability for Microwave Access Frequency Division Duplex
WiMax TDD Worldwide Interoperability for Microwave Access Time Division Duplex
GPS Global Positioning System
LTE Long Term Evolution
BC Boundary Clock
OC Ordinary Clock
TC Transparent Clock
BMC Best Master Clock
BITS Building Integrated Timing Supply System
13.4 1588 ACR

Table 13-4 ATN devices that support 1588 ACR
Device | Description
ATN 910 | ATN 910s with any control board support 1588 ACR.
ATN 910I | Only the ATN 910I-C AC and ATN 910I-TC DC support 1588 ACR.
ATN 910B | Only the ATN 910B-A DC, ATN 910B-D DC, ATN 910B-E, and ATN 910B-F support 1588 ACR.
ATN 950B | Only the ATN 950B (AND1CXPB/AND1CXPA/AND2CXPE) supports 1588 ACR.
ATN 905 | Only the ATN 905A-V and ATN 905A support 1588 ACR.
13.4.1 Introduction to 1588 ACR

Definition
The 1588 adaptive clock recovery (ACR) algorithm is used to carry out clock (frequency)
synchronization between the ATN and clock servers by exchanging 1588v2 messages over a
clock link that is set up by sending Layer 3 unicast packets.
Unlike 1588v2 that achieves frequency synchronization only when all devices on a network
support 1588v2, 1588 ACR is capable of implementing frequency synchronization on a
network with both 1588v2-aware devices and 1588v2-unaware devices.
After 1588 ACR is enabled on a server, the server provides 1588 ACR frequency
synchronization services for clients.
Purpose
All-IP has become the trend for future networks and services. Therefore, traditional networks
based on the Synchronous Digital Hierarchy (SDH) have to overcome various constraints
before migrating to IP packet-switched networks. Transmitting Time Division Multiplexing
(TDM) services over IP networks presents a major technological challenge. TDM services are
classified into two types: voice services and clock synchronization services. With the
development of VoIP, technologies of transmitting voice services over an IP network have
become mature and have been extensively used. However, development of technologies of
transmitting clock synchronization services over an IP network is still under way.
1588v2 is a software-based technology that carries out time and frequency synchronization.
To achieve higher accuracy, 1588v2 requires that all devices on a network support 1588v2; if
not, frequency synchronization cannot be achieved.
Derived from 1588v2, 1588 ACR implements frequency synchronization with clock servers
on a network with both 1588v2-aware devices and 1588v2-unaware devices. Therefore, in the
situation where only frequency synchronization is required, 1588 ACR is more applicable
than 1588v2.
Benefits
l Frequency synchronization can be achieved on networks with both 1588v2-aware and
1588v2-unaware devices, reducing the costs of network construction.
l Operators can provide more services that can meet subscribers' requirements for
frequency synchronization.
13.4.2 Principles
13.4.2.1 Basic Mechanisms of 1588 ACR
Layer 3 Unicast Negotiation Mechanism

Layer 3 unicast negotiations can be enabled to carry out 1588 ACR frequency synchronization
as required. The principle of Layer 3 unicast negotiations is as follows:
A client initiates a negotiation with a server in the server list by sending a request to the
server. After receiving the request, the server replies with an authorization packet,
implementing a 2-way handshake. After the handshake is complete, the client and server
exchange Layer 3 unicast packets to set up a clock link, and then exchange 1588v2 messages
over the link to achieve frequency synchronization.
Duration Mechanism
On a 1588 ACR client, you can configure a duration for Announce, Sync, and Delay_Resp
packets. The duration value is carried in the TLV field of a signaling packet used for
negotiation and is sent to a server.
Generally, the client sends a packet to renegotiate with the server before the duration times
out so that the server can continue to provide the client with synchronization services.
If the link connected to the client goes Down or fails, the client cannot renegotiate with the
server. When the duration times out, the server stops sending Sync packets to the client.
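The duration mechanism behaves like a lease that the client must renew. The following Python
sketch models the server-side view under simplifying assumptions; the class name, the method
names, and the use of seconds are hypothetical and do not correspond to device commands.

```python
class SyncGrant:
    """Server-side view of one client's negotiated duration (illustrative model)."""

    def __init__(self, granted_at, duration):
        self.duration = duration
        self.expires_at = granted_at + duration

    def renew(self, now):
        """The client renegotiated at `now`, so the duration starts over."""
        self.expires_at = now + self.duration

    def is_active(self, now):
        """True while the server keeps sending Sync packets to the client."""
        return now < self.expires_at


grant = SyncGrant(granted_at=0, duration=300)
grant.renew(now=250)  # the client renegotiates before the duration times out
print(grant.is_active(500), grant.is_active(550))  # True False
```

If the link to the client fails, `renew` is never called again and `is_active` becomes false
once the last granted duration has elapsed.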

Dual-Server Protection Mechanism

1588 ACR supports the configuration of two servers. Dual-server protection is performed
as follows:
After the client triggers negotiation with a server, the client periodically checks the
negotiation result. If the negotiation fails, or if the server fails after the client has
synchronized with it, the client detects the change in negotiation status. If the client finds
during the check that the servers are working properly, the client selects a server to connect
to based on the quality levels of the servers.
When only one server is configured, the client re-attempts to negotiate with the server after a
negotiation failure. This allows a client to renegotiate with a server that is only temporarily
unavailable in certain situations, such as when the server fails and then recovers or when the
server is restarted.
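The server selection described above can be sketched in Python as follows. The dictionary
keys, the server names, and the convention that a smaller quality number is better are all
assumptions made for this illustration.

```python
def select_server(servers):
    """Pick the reachable server with the best quality level, or None.

    Each server is a dict with the hypothetical keys "name", "negotiated"
    (True if the last negotiation check succeeded), and "quality"
    (assumption: a smaller number means a better clock).
    """
    reachable = [s for s in servers if s["negotiated"]]
    if not reachable:
        return None  # nothing to trace; the client retries negotiation later
    return min(reachable, key=lambda s: s["quality"])["name"]


servers = [
    {"name": "server-a", "negotiated": True, "quality": 1},
    {"name": "server-b", "negotiated": True, "quality": 2},
]
print(select_server(servers))  # server-a

servers[0]["negotiated"] = False  # server-a fails
print(select_server(servers))  # server-b
```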
13.4.2.2 Basic Principles of 1588 ACR

1588 ACR aims to synchronize the frequency of an ATN (client) with that of a clock server or
of another device that functions as the server.
1588 ACR sends Layer 3 unicast packets to establish a clock link between a client and a
server to exchange 1588v2 messages. 1588 ACR obtains a clock offset by comparing
timestamps carried in the 1588v2 messages, which enables the client to synchronize
frequencies with the server.
NOTE
ATN only supports client mode.
Process of 1588 ACR Clock Synchronization

1588 ACR implements clock (frequency) synchronization by adjusting time differences
between the time when the server sends 1588v2 messages and the time when the client
receives the 1588v2 messages over a link that is established after negotiations. The detailed
process is described as follows:
1588 ACR clock synchronization is implemented in two modes: one-way mode and two-way
mode.
l One-way mode

Figure 13-37 Clock synchronization in one-way mode
[Figure: the server clock sends messages at t1 and t1'; the client clock receives them at t2
and t2'. The client clock obtains the pairs (t1, t2) and (t1', t2').]
a. The server sends the client 1588v2 messages at t1 and t1' and time-stamps the
messages with t1 and t1'.
b. The client receives the 1588v2 messages at t2 and t2' and time-stamps the messages
with t2 and t2'.
t1 and t1' are the clock time of the server, and t2 and t2' are the clock time of the client.
By comparing the sending time on the server and the receiving time on the client, 1588
ACR calculates a frequency offset between the server and client and then implements
frequency synchronization. For example, if the result of the formula (t2 - t1)/(t2' - t1') is
1, frequencies on the server and client are the same; if not, the frequency of the client
needs to be adjusted so that it is the same as the frequency of the server.
l Two-way mode

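Figure 13-38 Clock synchronization in two-way mode
[Figure: the server sends Sync at t1, received by the client at t2; the client sends Delay_Req
at t3, received by the server at t4; the server returns Delay_Resp at t5. The client clock
obtains t1, t2, t3, and t4.]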
a. The server clock sends a 1588 Sync packet carrying the timestamp t1 to the client
clock at t1.
b. The client clock receives the 1588 Sync packet from the server clock at t2.
c. The client clock sends a 1588 Delay_Req packet to the server clock at t3.
d. The server clock receives the 1588 Delay_Req packet from the client clock at t4, and
sends a Delay_Resp packet to the client clock.
The same calculation method is used in two-way and one-way modes. In two-way mode, the
(t1, t2) timestamps are compared with the (t3, t4) timestamps, and the group of data with less
jitter is used for calculation. Under the same network conditions, tracing the direction that
has less jitter is more precise than always tracing clock signals in a single fixed direction.
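The one-way comparison and the two-way direction choice can be sketched in Python as follows.
This is an illustration only: the function names are hypothetical, and jitter is measured here
as the population standard deviation of the apparent delays, which is one possible choice
rather than the algorithm used by the device.

```python
import statistics


def drift(first, second):
    """Change in (receive - send) between two messages in the same direction.

    Zero corresponds to (t2 - t1)/(t2' - t1') = 1 in the text, that is, the
    two clocks run at the same frequency.
    """
    (t1, t2), (t1p, t2p) = first, second
    return (t2p - t1p) - (t2 - t1)


def delay_jitter(pairs):
    """Spread of the apparent delay over a list of (sent, received) pairs."""
    return statistics.pstdev([received - sent for sent, received in pairs])


def pick_direction(server_to_client, client_to_server):
    """Two-way mode: trace the direction whose timestamps show less jitter."""
    if delay_jitter(server_to_client) <= delay_jitter(client_to_server):
        return "server-to-client"
    return "client-to-server"


sync_pairs = [(0, 10), (100, 110), (200, 210)]       # (t1, t2) samples
delay_req_pairs = [(50, 58), (150, 165), (250, 259)]  # (t3, t4) samples
print(pick_direction(sync_pairs, delay_req_pairs))    # server-to-client
print(drift((0, 10), (100, 112)))                     # 2
```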
13.4.3 Applications
Typical Applications of 1588 ACR
On an IP RAN shown in Figure 13-39, NodeBs need to implement only frequency
synchronization rather than phase synchronization; devices on an MPLS backbone network do
not support 1588v2; the RNC-side device is connected to an IPCLK server; cell site gateways
(CSGs) support 1588 ACR.
NodeB1 transmits wireless services along an E1 link to a CSG, and NodeB2 transmits
wireless services along an Ethernet link to the other CSG.

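Figure 13-39 Networking diagram of 1588 ACR applications on a network
[Figure: NodeB1 connects over E1 and NodeB2 over FE to their CSGs, which reach RSG1 and RSG2
across the MPLS backbone. The RSGs connect to the RNC and to IP CLK servers. Legend: 1588v2
packet; line clock signal; NodeB service.]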
On the preceding network, CSGs support 1588 ACR and function as clients to initiate
requests for Layer 3 unicast connections to the upstream IPCLK server. The CSGs then
exchange 1588 messages with the IPCLK server over the connections, achieving frequency
recovery. RSG1 and RSG2 are configured as clock servers for the CSGs to provide protection.
One CSG sends line clock signals carrying frequency information to NodeB1 along an E1
link. The other CSG transmits frequency information to NodeB2 along a synchronous
Ethernet link. In this manner, both NodeBs connected to the CSGs can achieve frequency
synchronization.
Typical Applications of 1588 ACR in the L3VPN Scenario

As shown in Figure 13-40, NodeBs need to implement only frequency synchronization rather
than phase synchronization. Devices on a level-2 backbone network do not support 1588v2.
The RNC-side device is connected to an IP CLK server. CSGs support 1588 ACR. When
1588 ACR packets need to traverse a VPN, the client and server must be configured on the
same VPN so that they can communicate with each other.
NodeB transmits wireless services to the other CSG along an Ethernet link.
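Figure 13-40 Networking diagram of 1588 ACR applications in the L3VPN scenario
[Figure: a NodeB (E1) and an eNodeB (FE) connect to CSGs at the access layer, which reach
RSG1 and RSG2 through the level-1 and level-2 networks at the aggregation layer. The RSGs
connect to the RNC and to IP CLK servers.]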

On the preceding network, CSGs support 1588 ACR and function as clients to initiate
requests for Layer 3 unicast connections to the upstream IP CLK server. The CSGs then
exchange 1588 messages with the IP CLK server over the connections, achieving frequency
recovery. RSG1 and RSG2 are configured as active and standby clock servers to provide
protection. Service flows are labeled with LSP labels and forwarded through PWs on the MPLS
network.
The ATN adds LSP labels to locally generated and received packets.
l Downstream
The CX connected to the IP CLK servers and the level-2 network implements PWE
encapsulation for services. The CX, also functioning as the 1588 ACR server, generates 1588 ACR
packets and sends them to the level-2 network along with other service
packets through an Ethernet interface. Ethernet service packets, not clock packets, are
then transparently transmitted over the level-2 network. The ATNA connected to the PSN
receives and forwards the service packets to downstream devices. The ATNA,
also functioning as the 1588 ACR client, extracts 1588 ACR packets from the service
packets to restore the clock, implements hop-by-hop frequency synchronization in
synchronous Ethernet mode, and sends the clock packets to the next-hop ATNA. The
ATNA connected to the NodeB extracts E1 services from the service packets and sends the
clock packets to the NodeB in E1 re-timing mode.
l Upstream
The ATNA connected to the NodeB encapsulates service packets, adds LSP labels to them,
and sends them through an Ethernet interface to the ATNB that connects to the level-2 network.
Ethernet service packets, not clock packets, are then transparently transmitted over
the level-2 network. The CX connected to the level-2 network receives and forwards the service
packets to upstream devices and, as the 1588 ACR server, extracts 1588 ACR packets from the
service packets.
The CSG can send frequency information to the NodeB and eNB over a synchronous Ethernet
link, implementing frequency synchronization for all NodeBs.
Terms
Term                       Description
Synchronization            On a modern communications network, in most cases, the proper
                           functioning of telecommunications services requires network clock
                           synchronization, meaning that the frequency offset or time difference
                           between devices must be kept in an acceptable range. Network clock
                           synchronization includes frequency synchronization and time
                           synchronization.
Time synchronization       Time synchronization, also called phase synchronization, refers to the
                           consistency of both frequencies and phases between signals. This means
                           that the phase offset between signals is always 0.
Frequency synchronization  Frequency synchronization, also called clock synchronization, refers to a
                           strict relationship between signals based on a constant frequency offset or
                           a constant phase offset, in which signals are sent or received at the same
                           average rate in any given period of time.
IEEE 1588v2 PTP            1588v2, defined by the Institute of Electrical and Electronics Engineers
                           (IEEE), is the standard for a Precision Clock Synchronization Protocol for
                           Networked Measurement and Control Systems. It is referred to as the
                           Precision Time Protocol (PTP) for short.
Abbreviations
PTP Precision Time Protocol

1588v2
ACR Adaptive Clock Recovery
13.5 1588 ATR

1588 ATR Deployment Limitations
l The 1588-unaware intermediate network must have no more than three hops of Huawei
microwave devices or three Huawei switches. Other devices, such as routers, ATN
devices, and WDM devices, are not supported.
l Static delay asymmetric compensation for upstream and downstream paths is required.
Before the solution is used, measure and compensate the static asymmetry of upstream
and downstream paths for 1588 packets on both the primary and secondary servers.
l When LAG/Eth-Trunk exists on the intermediate network, the fibers of all LAG/Eth-Trunk
member links must have the same length, so that the delay asymmetry stays fixed
regardless of member link changes. If this cannot be achieved, the fiber length difference
between member links must not exceed 50 meters, and the delay asymmetry between member
links must be within 250 ns.
l The intermediate network must support QoS, and 1588 packets must have the highest
priority. The intermediate network identifies the priority of 1588 packets to ensure no
1588 packet dropping.
l A microwave network supports frequency synchronization but not time synchronization.
It therefore requires hop-by-hop physical-layer frequency synchronization, with PTP
packets used for time synchronization. A switch network generally does not support
frequency synchronization, so PTP packets are used to restore both frequency and time information.
l Colored optical interface modules do not support end-to-end 1588 time synchronization.
l 1588v2/G.8275.1-related commands cannot be run on the interface that receives 1588
ATR packets.
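The LAG/Eth-Trunk limitation above ties a fiber length difference to a delay asymmetry. The following minimal sketch shows that arithmetic under the assumption that light travels through optical fiber at roughly 5 ns per meter (a typical figure, not a value taken from this document); the function names are hypothetical.

```python
# Assumed propagation delay in optical fiber: about 5 ns per meter.
FIBER_DELAY_NS_PER_M = 5.0

def delay_asymmetry_ns(length_a_m: float, length_b_m: float) -> float:
    """One-way delay difference between two member links, in nanoseconds."""
    return abs(length_a_m - length_b_m) * FIBER_DELAY_NS_PER_M

def members_within_limit(lengths_m, max_diff_m: float = 50.0) -> bool:
    """True if the longest and shortest member fibers differ by at most max_diff_m."""
    return max(lengths_m) - min(lengths_m) <= max_diff_m

# A 50 m difference corresponds to about 250 ns under the assumed 5 ns/m.
print(delay_asymmetry_ns(1000.0, 1050.0))
```

With this assumption, a 50-meter difference between member links maps to roughly 250 ns of delay difference, which is why the two limits are stated together.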

1588 ATR Deployment Requirements for the Intermediate Network

l Delay and Jitter Requirements for the Intermediate Network
For the time offset output by end-to-end 1588 time synchronization clients to stay within ±1.1 us,
the requirements for the intermediate network are as follows:
– Atom GPS/BITS + end-to-end 1588 time synchronization: With 200-second
sampling of client-side PDV (two-way time error) and the use of the 1.5 mHz filter,
the network PDV should be within ±800 ns (1.1 us - 200 ns of client noise -
100 ns of time source offset).
– BITS + hop-by-hop 1588v2 + end-to-end 1588 time synchronization: The time offset
reserved for hop-by-hop 1588 should be within ±300 ns (the value depends on the
actual offset and the number of hops). With 200-second sampling of client-side
PDV and the use of the 1.5 mHz filter, the network PDV should be within ±500 ns
(1.1 us - 300 ns of hop-by-hop 1588 network offset - 200 ns of client noise -
100 ns of time source offset).
– If the network frequency mode is SyncE (multi-hop), the MTIE of the client
compared to the server within 1000 s should be <= 200 ns.
Remarks: The evaluation time for the intermediate network should be longer than 1 day.
l Other Requirements for the Intermediate Network
– The long-term traffic load of each link should be <= 80%. The service
interruption or network congestion time of the intermediate network should be <=
100 s. The interval between service interruptions or network congestion events should be >=
900 s. The link switching interval for the intermediate network should be >= 900 s.
– The proportion of 1518-byte packets to background traffic cannot be greater than
90% (test result).
– 1588 packets have the highest priority.
– The server/client node can send and receive native IP packets only, with MPLS
encapsulation not supported.
– The uplink path and downlink path for 1588 packet transmission over the
intermediate network must be consistent.
– Static asymmetry of delay of upstream and downstream paths for 1588 packet
transmission requires measurement and compensation.
– The path between a server and a client must be unique. The client independently
performs asymmetric compensation for each server.
NOTE
The preceding requirements summarize general experience in evaluating whether a network is
suitable for the end-to-end 1588 time synchronization solution. They are not sufficient
conditions for stable running of end-to-end 1588 time synchronization, but they can
be used for preliminary evaluation.
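The PDV limits in the delay and jitter requirements follow from subtracting the other error contributions from the ±1.1 us target. The sketch below only reproduces that subtraction with the figures quoted in this section; the function and parameter names are hypothetical.

```python
def network_pdv_budget_ns(target_ns: int = 1100,
                          client_noise_ns: int = 200,
                          source_offset_ns: int = 100,
                          hop_by_hop_ns: int = 0) -> int:
    """PDV budget left for the intermediate network, in nanoseconds."""
    return target_ns - client_noise_ns - source_offset_ns - hop_by_hop_ns

# Atom GPS/BITS + end-to-end 1588: no hop-by-hop segment.
print(network_pdv_budget_ns())                    # 800
# BITS + hop-by-hop 1588v2 + end-to-end 1588: 300 ns reserved for hop-by-hop.
print(network_pdv_budget_ns(hop_by_hop_ns=300))   # 500
```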
13.5.1 Introduction to 1588 ATR
Definition
The 1588 Adaptive Time Recovery (ATR) algorithm is used to carry out time synchronization
between the clock clients and clock servers by exchanging 1588v2 messages over a clock link
that is set up by sending Layer 3 unicast packets.

Unlike 1588v2 that achieves time synchronization only when all devices on a network support
1588v2, 1588 ATR is capable of implementing time synchronization on a network with both
1588v2-aware devices and 1588v2-unaware devices.
1588 ATR is a client/server protocol through which servers communicate with clients to
achieve time synchronization.
Purpose
All-IP has become the trend for future networks and services. Therefore, traditional networks
based on the Synchronous Digital Hierarchy (SDH) have to overcome various constraints
before migrating to IP packet-switched networks. Transmitting Time Division Multiplexing
(TDM) services over IP networks presents a major technological challenge. TDM services are
classified into two types: voice services and clock synchronization services. With the
development of VoIP, technologies of transmitting voice services over an IP network have
become mature and have been extensively used. However, development of technologies of
transmitting clock synchronization services over an IP network is still under way.
1588v2 is a software-based technology that carries out frequency and time synchronization. To
achieve higher accuracy, 1588v2 requires that all devices on a network support 1588v2; otherwise,
time synchronization cannot be achieved.
To address this disadvantage, 1588 ATR is introduced to allow time synchronization over a
third-party network that includes 1588v2-incapable devices. On the live network, 1588v2 is
preferred for 1588v2-capable devices, and 1588 ATR is used when 1588v2-incapable devices
exist.
Benefits
l Does not require 1588v2 to be supported by all network devices, reducing network
construction costs.
l Operators can provide more services that can meet subscribers' requirements for time
synchronization.
13.5.2 Principles
13.5.2.1 Basic Mechanisms of 1588 ATR
Layer 3 Unicast Negotiation Mechanism

Enable Layer 3 unicast negotiation before 1588 ATR time synchronization is performed. The
implementation of Layer 3 unicast negotiation is as follows:
A client initiates a negotiation request with a server. The server replies with an authorization
packet to implement handshake. After the handshake succeeds, the client and server establish
a clock link through Layer 3 unicast packets. Then, the client and server exchange PTP
packets to implement time synchronization over the clock link.

Duration Mechanism
On a 1588 ATR client, you can configure a duration for Announce, Sync, and Delay_Resp
packets. The duration value is carried in the TLV field of a signaling negotiation packet
and sent to a server.
Generally, the client sends a packet to renegotiate with the server before the duration times
out so that the server can continue to provide the client with synchronization services.
If the link connected to the client goes Down or fails, the client cannot renegotiate with the
server. When the duration times out, the server stops sending Sync packets to the client.
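The duration mechanism behaves like a lease that the client must renew. The following is a minimal sketch of that behavior, not the device implementation: the class names, the use of plain seconds, and the 25% renewal margin are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Grant:
    granted_at: float  # server time (s) when the request was accepted
    duration: float    # duration (s) requested by the client in the TLV

    def expired(self, now: float) -> bool:
        return now >= self.granted_at + self.duration

class Server:
    def __init__(self):
        self.grants = {}  # client id -> Grant

    def negotiate(self, client_id: str, duration: float, now: float) -> None:
        """Accept a (re)negotiation request and restart the duration timer."""
        self.grants[client_id] = Grant(now, duration)

    def should_send_sync(self, client_id: str, now: float) -> bool:
        """Sync packets are sent only while the grant has not timed out."""
        grant = self.grants.get(client_id)
        return grant is not None and not grant.expired(now)

def client_should_renew(granted_at: float, duration: float, now: float,
                        margin: float = 0.25) -> bool:
    """Hypothetical policy: renew when less than `margin` of the duration remains."""
    return now >= granted_at + duration * (1 - margin)
```

If the client cannot reach the server to renew, `should_send_sync` starts returning False once the duration elapses, which mirrors the server stopping its Sync packets.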
Dual-Server Protection Mechanism

1588 ATR supports the configuration of dual servers. Dual-server protection is performed
as follows:
After the client triggers negotiation with a server, the client periodically checks the
negotiation result. If the negotiation fails, or if the server fails after
the client has synchronized with it, the client detects the change in negotiation
status. If the client finds during the check that the servers are working properly,
it selects a server to connect to based on the quality levels of the
servers.
When only one server is configured, the client re-attempts to negotiate with the server after a
negotiation failure. This allows a client to renegotiate with a server that is only temporarily
unavailable in certain situations, such as when the server fails and then recovers or when the
server is restarted.
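The selection step of dual-server protection can be sketched as follows. This is an illustration only: the `ServerState` type, the integer `quality` field, and the convention that a lower value means better quality are assumptions, since this section does not define how quality levels are encoded.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ServerState:
    name: str
    negotiated: bool  # True if the latest negotiation check succeeded
    quality: int      # hypothetical quality level; lower is assumed better

def select_server(servers: List[ServerState]) -> Optional[ServerState]:
    """Pick the best-quality server among those with a working negotiation."""
    usable = [s for s in servers if s.negotiated]
    if not usable:
        return None  # nothing usable: the client keeps retrying negotiation
    return min(usable, key=lambda s: s.quality)

servers = [ServerState("server1", True, 6), ServerState("server2", True, 7)]
print(select_server(servers).name)  # server1
servers[0].negotiated = False       # server1 fails
print(select_server(servers).name)  # server2
```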
13.5.2.2 Basic Principles of 1588 ATR

1588 ATR aims to synchronize the time of clock clients (clients) with that of clock servers
(servers).
1588 ATR sends Layer 3 unicast packets to establish a time link between a client and a server
to exchange 1588v2 messages. 1588 ATR obtains a time offset by comparing timestamps
carried in the 1588v2 messages, which enables the client to synchronize time with the server.
NOTE
The ATN 910B-D supports 1588 ATR client mode.
The ATN 910B supports 1588 ATR server mode.
The ATN 950B (AND2CXPA/AND2CXPB/AND2CXPE) supports 1588 ATR server mode.
Process of 1588 ACR Clock Synchronization

1588 ACR implements clock (frequency) synchronization by adjusting time differences
between the time when the server sends 1588v2 messages and the time when the client
receives the 1588v2 messages over a link that is established after negotiations. The detailed
process is described as follows:
1588 ACR clock synchronization is implemented in two modes: one-way mode and two-way
mode.
l One-way mode

Figure 13-41 Clock synchronization in one-way mode
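(Figure omitted: the server sends messages at t1 and t1'; the client receives them at t2 and t2'. The data obtained by the client clock is t1, t2 and then t1', t2'.)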
a. The server sends the client 1588v2 messages at t1 and t1' and time-stamps the
messages with t1 and t1'.
b. The client receives the 1588v2 messages at t2 and t2' and time-stamps the messages
with t2 and t2'.
t1 and t1' are the clock time of the server, and t2 and t2' are the clock time of the client.
By comparing the sending time on the server and the receiving time on the client, 1588
ACR calculates a frequency offset between the server and client and then implements
frequency synchronization. For example, if the result of the formula (t2 - t1)/(t2' - t1') is
1, frequencies on the server and client are the same; if not, the frequency of the client
needs to be adjusted so that it is the same as the frequency of the server.
l Two-way mode

Figure 13-42 Clock synchronization in two-way mode

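(Figure omitted: the server sends Sync at t1 and the client receives it at t2; the client sends Delay_Req at t3 and the server receives it at t4, then returns Delay_Resp. The data obtained by the client clock grows from t1, t2 to t1, t2, t3 and finally t1, t2, t3, t4.)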
a. The server clock sends a 1588 Sync packet carrying a timestamp t1 to the client
clock at t1.
b. The client clock receives the 1588 Sync packet from the server clock at t2.
c. The client clock sends a 1588 Delay_Req packet to the server clock at t3.
d. The server clock receives the 1588 Delay_Req packet from the client clock at t4, and
sends a Delay_Resp packet to the client clock.
The round-trip latency of the link between the server and client is (t4 - t1) - (t3 - t2). 1588 ATR
requires the link latency to be the same in both directions of the round trip, so the one-way
latency is [(t4 - t1) - (t3 - t2)]/2. Therefore, the offset of the client compared to the time
of the server is t2 - t1 - [(t4 - t1) - (t3 - t2)]/2 = [(t2 - t1) - (t4 - t3)]/2. The client
then uses the calculation result to adjust its local time.
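The one-way and two-way calculations above can be sketched in a few lines. The function names are hypothetical. For the one-way case the sketch uses the ratio of elapsed client time to elapsed server time, (t2' - t2)/(t1' - t1), which is an assumption chosen because it reads directly as a frequency ratio; it equals 1 under the same condition as the formula quoted in the text.

```python
def frequency_ratio(t1: float, t1p: float, t2: float, t2p: float) -> float:
    """One-way mode: client clock rate relative to the server clock rate.

    t1, t1p are server send times; t2, t2p are client receive times.
    A result of 1.0 means both clocks run at the same frequency.
    """
    return (t2p - t2) / (t1p - t1)

def two_way(t1: float, t2: float, t3: float, t4: float):
    """Two-way mode: return (one_way_delay, client_offset)."""
    round_trip = (t4 - t1) - (t3 - t2)
    one_way = round_trip / 2          # assumes equal delay in both directions
    offset = (t2 - t1) - one_way      # same as ((t2 - t1) - (t4 - t3)) / 2
    return one_way, offset

# Example: true one-way delay 10, client clock 5 ahead of the server.
# Sync sent at t1=100, received at client time t2=115.
# Delay_Req sent at client time t3=120, received at server time t4=125.
print(two_way(100, 115, 120, 125))  # (10.0, 5.0)
```

The client would subtract the computed offset from its local time; if the two directions do not have equal delay, half of the difference appears as an error in the offset.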
13.5.3 Applications
Typical Applications of 1588 ATR
On the IP RAN shown in the following figure, time synchronization needs to be performed
between NodeBs, but the third-party network (such as a microwave or switch network) does
not support 1588v2.
If the third-party network supports frequency synchronization but not time synchronization,
frequency is restored at the physical layer hop by hop, and time is restored using 1588 ATR. If
the third-party network does not support frequency synchronization or time synchronization,
1588 ATR is used for frequency and time synchronization.

Figure 13-43 1588 ATR time synchronization
(Figure omitted. Labels: NodeB 1 and NodeB 2 attached to Client 1 and Client 2; Server 1 (master) and Server 2; CLK 1 and CLK 2.)
The ATN device can function as a boundary clock (BC) to restore time from the upstream
device and also function as a master to implement E2E 1588 time synchronization with the
downstream slave device. The slave device can request time synchronization packets from the
master device to implement time synchronization and function as a BC to provide the hop-by-
hop time synchronization information to downstream devices. The slave device can
implement physical-layer hop-by-hop frequency recovery or 1588 ATR-based frequency
recovery. The slave device can negotiate with multiple master devices to implement time
source backup and protection switching.

Terms
Term                       Description
Synchronization            On a modern communications network, in most cases, the proper
                           functioning of telecommunications services requires network clock
                           synchronization, meaning that the frequency offset or time difference
                           between devices must be kept in an acceptable range. Network clock
                           synchronization includes frequency synchronization and time
                           synchronization.
Time synchronization       Time synchronization, also called phase synchronization, refers to the
                           consistency of both frequencies and phases between signals. This means
                           that the phase offset between signals is always 0.
Frequency synchronization  Frequency synchronization, also called clock synchronization, refers to a
                           strict relationship between signals based on a constant frequency offset or
                           a constant phase offset, in which signals are sent or received at the same
                           average rate in any given period of time.
IEEE 1588v2 PTP            1588v2, defined by the Institute of Electrical and Electronics Engineers
                           (IEEE), is the standard for a Precision Clock Synchronization Protocol for
                           Networked Measurement and Control Systems. It is referred to as the
                           Precision Time Protocol (PTP) for short.

Abbreviations

1588v2
ATR Adaptive Time Recovery
13.6 CES ACR/DCR

Table 13-5 ATN devices that support CES ACR
Device     Description
ATN 905    CES ACR is applicable only to the ATN 905-E.
ATN 910    CES ACR is supported when any MPU and an E1 interface board work together.
ATN 910I   CES ACR is applicable only to the ATN 910I-A DC, ATN 910I DC, ATN 910I-C
           AC, ATN 910I-TC DC, and ATN 910I-E.
ATN 910B   CES ACR is applicable only to the ATN 910B-A DC.
ATN 950B   CES ACR is supported when any MPU and an E1 interface board (excluding
           AND3MD1) work together.
Table 13-6 ATN devices that support CES DCR
Device     Description
ATN 905    CES DCR is applicable only to the ATN 905-E.
ATN 910B   CES DCR is applicable only to the ATN 910B-A DC.
ATN 950B   CES DCR is applicable only to the ATN 950B (AND2CXPB/AND2CXPE) with
           AND3MD1.

13.6.1 Introduction
Definition
Circuit emulation service (CES) clock synchronization implements adaptive clock frequency
synchronization and asynchronous clock frequency synchronization based on CESs. CES
clock synchronization uses special circuit emulation headers to encapsulate time division
multiplexing (TDM) service packets that carry clock frequency information and transmits these
packets over a packet switched network (PSN).
Purpose
If a clock frequency is out of the allowed error range, problems such as bit errors and jitter
occur. As a result, network transmission performance deteriorates. Clock synchronization
confines the clock frequencies of all network elements (NEs) on a digital network to the
allowed error range, enhancing network transmission stability.
CES clock synchronization is used when the intermediate PSN does not support clock
synchronization at the physical layer and clock frequency information must be transmitted
using the TDM services of the CES.
13.6.2 Principles
CES
The circuit emulation service (CES) technology originated from the asynchronous transfer
mode (ATM) network. CES uses emulated circuits to encapsulate circuit service data into
ATM cells and transmits these cells over the ATM network. Circuit emulation was later used
on the metro Ethernet network to transparently transmit circuit switched services like TDM.
CES uses special circuit emulation headers to encapsulate TDM service packets that carry
clock information and transmits these packets over a PSN.
CES ACR
The CES technology generally uses the adaptive clock recovery (ACR) algorithm to
synchronize clock frequencies. If an Ethernet network transmits TDM services over emulated
circuits, it uses the ACR algorithm to extract clock synchronization information from data
packets.
Clock Recovery Domain

A clock recovery domain refers to a channel of clock signals that can be recovered on a client.

CES ACR
As shown in Figure 13-44, CES ACR is used when the intermediate PSN does not support clock
synchronization at the physical layer and clock frequency information must be transmitted using
TDM services. The detailed process is as follows:
1. The clock source sends clock frequency information to CE1.
2. CE1 encapsulates the clock frequency information into TDM service packets and sends them to
gateway IWF1.
3. Gateway IWF1, which connects to the master clock, regularly sends service clock
information to gateway IWF2, which connects to the slave clock. The service clock
information is coded using sequence numbers or timestamps and is
encapsulated into E1 service packets for transmission.
4. IWF2 extracts the sequence number or timestamp from the E1 emulation packets and
recovers clock information using the adaptive clock recovery algorithm. In this manner,
IWF2 synchronizes its local clock to the master clock and the local clock of IWF1.
Figure 13-44 Working principles of CES-based ACR

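(Figure omitted: BITS feeds CE1; CE1 connects to IWF1 over E1 (TDM); a PW crosses the PSN between IWF1 and IWF2; IWF2 delivers E1 (TDM) on the far side.)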
CES DCR
As shown in Figure 13-45, if the PSN does not support physical-layer clock synchronization,
clock information is sent through TDM packets in CES DCR mode. Details are as follows:
1. The clock source sends clock frequency information to CE1.
2. CE1 encapsulates the clock frequency information into TDM service packets and sends them to
gateway IWF1.
3. The master gateway IWF1 and the slave gateway IWF2 obtain the common clock for
CES DCR using clock source 2 and clock source 3, respectively. Clock source 2 and
clock source 3 are connected to the same upper-layer clock source.
4. The master gateway IWF1 periodically sends the slave gateway IWF2 the service clock
information, which is contained in an E1 emulation packet and expressed as a sequence
number or timestamp.
5. The slave gateway IWF2 extracts the timestamp from the E1 emulation packet. Using the
common clock, IWF2 first restores clock information based on the timestamp and then recovers
the service clock information using the differential algorithm. In the long term, the
local clock recovered by the slave gateway IWF2 is the same as the source clock. In
this manner, frequency synchronization is implemented between the two IWFs on the PSN.
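The differential idea in the steps above can be illustrated with a small numeric sketch. It is a simplification under stated assumptions, not the device algorithm: the common reference frequency `COMMON_HZ` is a hypothetical value, the timestamp is modeled as the number of common-clock ticks that elapse during a fixed number of service-clock cycles, and the function names are invented for the example. The E1 rate of 2.048 MHz is the standard E1 line rate.

```python
COMMON_HZ = 19_440_000  # hypothetical common reference frequency shared by both IWFs

def encode_timestamp(service_hz: float, cycles: int,
                     common_hz: float = COMMON_HZ) -> int:
    """IWF1: common-clock ticks that elapse during `cycles` service-clock cycles."""
    return round(cycles * common_hz / service_hz)

def recover_service_hz(ticks: int, cycles: int,
                       common_hz: float = COMMON_HZ) -> float:
    """IWF2: rebuild the service clock rate from the tick count and its own common clock."""
    return cycles * common_hz / ticks

# An E1 service clock running 50 ppm fast.
source_hz = 2_048_000 * (1 + 50e-6)
ticks = encode_timestamp(source_hz, cycles=2_048_000)
print(recover_service_hz(ticks, cycles=2_048_000))
```

Because the timestamp is measured against a clock that both gateways share, packet delay variation on the PSN does not enter the calculation in this model, which is the main difference from the adaptive approach.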

Figure 13-45 Working principle of CES DCR
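(Figure omitted: BITS1 feeds CE1; CE1 connects to IWF1 over E1 (TDM); a PW crosses the PSN between IWF1 and IWF2; BITS2 and BITS3 supply IWF1 and IWF2, respectively, through the Sync network.)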
13.6.3 Applications
CES ACR is used in scenarios in which the intermediate PSN does not support clock
synchronization at the physical layer and needs to transmit clock frequency information using
TDM services.
Figure 13-46 Applications of CES-based ACR
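(Figure omitted: BITS feeds CE1; CE1 connects to IWF1 over E1 (TDM); a PW crosses the PSN between IWF1 and IWF2; IWF2 delivers E1 (TDM) on the far side.)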
As shown in Figure 13-46, the clock source sends clock frequency information to a CE. The
CE encapsulates clock frequency information into TDM service packets and transmits these
packets over the intermediate PSN to the peer CE. CES ACR recovers clock frequency
information at the IWF connected to the peer CE. In practical application, multiple E1
interfaces can belong to the same clock recovery domain. By default, the system selects a PW
as the primary PW and uses the primary PW to recover clock signals. If the primary PW fails,
the system selects the next available PW as the primary PW to recover clocks. In this manner,
clock protection among multiple PWs is implemented.
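The PW protection behavior in a clock recovery domain can be sketched as a simple selection rule. The function name and the representation of PWs as ordered (name, is_up) pairs are assumptions for illustration; the text does not specify how "the next available PW" is ordered.

```python
from typing import List, Optional, Tuple

def select_primary_pw(pws: List[Tuple[str, bool]],
                      current: Optional[str] = None) -> Optional[str]:
    """Return the PW used for clock recovery.

    pws is an ordered list of (name, is_up). The current primary is kept
    while it is up; otherwise the first available PW in the list is chosen.
    """
    status = dict(pws)
    if current is not None and status.get(current, False):
        return current
    for name, is_up in pws:
        if is_up:
            return name
    return None  # no PW available: clock recovery cannot proceed

pws = [("pw1", True), ("pw2", True), ("pw3", True)]
primary = select_primary_pw(pws)                    # pw1
pws[0] = ("pw1", False)                             # primary PW fails
primary = select_primary_pw(pws, current=primary)   # pw2
print(primary)
```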

Abbreviations
CES Circuit Emulation Service
ACR Adaptive Clock Recovery
DCR Differential Clock Recovery
13.7 G.8275.1
13.7.1 Introduction
Definition
G.8275.1 is a precise time synchronization standard defined by the International
Telecommunication Union - Telecommunication Standardization Sector (ITU-T).
Similar to IEEE 1588, G.8275.1 implements network-wide time synchronization. However,
unlike IEEE 1588, ITU-T G.8275.1 is used only in the telecommunications field. ITU-T G.8275.1
implements both high-precision time synchronization and clock synchronization
between devices. IEEE 1588 is also called the Precision Time Protocol (PTP), which is
defined by the Institute of Electrical and Electronics Engineers (IEEE).
Purpose
Because datacom networks do not require time or clock synchronization, routers on such
networks do not implement time or clock synchronization. To meet requirements of base
stations on an IP radio access network (RAN), routers used on the IP RAN must implement
time and clock synchronization. Clock synchronization between base stations is necessary for
the IP RAN. Frequency asynchrony leads to call drops when wireless terminals switch
between base stations. In addition to frequency synchronization (clock synchronization), some
wireless standards require phase synchronization (time synchronization). Table 13-7 lists time
and clock synchronization requirements of wireless standards.
Table 13-7 Time and clock synchronization requirements of wireless standards
Wireless Standard                         Precision Requirement for   Precision Requirement for Phase
                                          Frequency Synchronization   Synchronization
GSM                                       0.05 ppm                    N/A
WCDMA                                     0.05 ppm                    N/A
TD-SCDMA                                  0.05 ppm                    ±1.5 us
CDMA2000                                  0.05 ppm                    ±3 us
LTE-FDD                                   0.05 ppm                    N/A
LTE-TDD (cell radius <= 3 km, <= 3 us)    0.05 ppm                    ±1.5 us
LTE-TDD (cell radius > 3 km, <= 10 us)    0.05 ppm                    ±5 us
LTE-A eMBMS/eICIC                         0.05 ppm                    ±1.5 us
LTE-A CoMP                                0.05 ppm                    Used in CloudBB scenarios and has no phase
                                                                      synchronization requirements on network devices.
LTE-A CA                                  0.05 ppm                    Downstream CA: ±1.5 us
                                                                      Upstream CA has the following requirements on phase
                                                                      synchronization between antenna ports connecting cells:
                                                                      l < 130 ns (intra-band contiguous)
                                                                      l < 260 ns (intra-band non-contiguous and inter-band)
                                                                      Upstream CA has no phase synchronization requirements
                                                                      on network devices. Because inter-site upstream CA has
                                                                      few usage scenarios, its phase synchronization
                                                                      requirements are to be defined in the future.
The requirements for clock synchronization on base stations that support different standards
can be met using multiple methods, such as physical clocks (including external clock input
and synchronous Ethernet) and packet-based recovery clocks (including NTP and 1588v2).
Traditionally, base stations are connected only to the global positioning system (GPS) to
implement time synchronization. Packet-based time synchronization that the Network Time
Protocol (NTP) and IEEE 1588v1 support cannot meet requirements of base stations. Time
synchronization can reach only sub-second precision using NTP and sub-millisecond
precision using IEEE 1588v1. The cost of GPS installation and maintenance is high. In
addition, because the GPS relies on satellites owned by different countries, communication
security cannot be guaranteed.
To resolve the preceding problems, ITU-T G.8275.1 can be used to implement
submicrosecond time synchronization. ITU-T G.8275.1, with hardware processing supported,
is cost effective and less dependent on the GPS, mitigating some of the security concerns
about the GPS.
Benefits
ITU-T G.8275.1 offers the following benefits:
l Implements high-precision clock synchronization.
l Reduces the costs of construction and maintenance of a wireless network with time
synchronization implemented.
l Enhances communication security, with time and clock synchronization independent of
the GPS.
13.7.2 Principles

This section describes basic concepts related to ITU-T G.8275.1.
Synchronization
Most telecommunication services running on a modern communications network require
network-wide synchronization with the frequency offset or time difference between devices
remaining in a specified range. Network clocks implement the following synchronization:
l Clock synchronization
Also called frequency synchronization. There is a constant frequency or phase offset
between signals. Signals are sent or received at the same average rate in any given period
of time so that all devices on a communications network can operate at the same rate.
The difference between signal phases is a constant value.
l Time synchronization
Also called phase synchronization. The frequency offset and phase offset between
signals are always 0.

Figure 13-47 Time and clock synchronization
(Figure omitted: two panels, "Phase synchronization" and "Frequency synchronization", each comparing Watch A with Watch B.)
Figure 13-47 illustrates time and clock synchronization between Watch A and Watch B. In
time (phase) synchronization, Watch A and Watch B always keep the same time. In clock
(frequency) synchronization, Watch A and Watch B keep different time, but the time
difference between the two watches is a constant value, for example, 6 hours.
Clock Domain
A physical network can be logically divided into multiple clock domains. Each clock domain
has its own independent synchronous time, with which clocks in the same domain
synchronize. Each device can be added only to a single clock domain. Devices in the same
clock domain have the same domain ID.
Clock Nodes
ITU-T G.8275.1 defines the following clock nodes:
NOTE
An ATN device can only function as a T-BC node.
l Telecom grandmaster (T-GM)
A T-GM can only function as a master clock. It has one or more PTP ports and is unable
to trace other PTP clocks.
l Telecom boundary clock (T-BC)
A T-BC can function as either a master or slave clock. As a slave clock, the T-BC traces
other PTP clocks.

l Telecom time slave clock (T-TSC)

A T-TSC can function as a slave clock, but cannot function as a master clock.
Figure 13-48 shows the locations of the three types of clocks on a time synchronization
network.
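Figure 13-48 Locations of the three types of clocks on a time synchronization network
(Figure omitted: a chain T-GM, T-BC, T-BC, T-TSC; on each link the upstream port is the master (M) and the downstream port is the slave (S).)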
Packet Encapsulation
ITU-T G.8275.1 defines untagged Layer 2 multicast encapsulation. Figure 13-49 shows an
untagged Layer 2 multicast packet.
l EtherType field: set to 0x88F7.
l Source MAC address (SA) field: set to the MAC address of a device that sends a packet.
l Destination MAC address (DA) field: set to either the unforwardable multicast MAC
address 01-80-C2-00-00-0E or the forwardable multicast MAC address
01-1B-19-00-00-00.
l PTP message: ITU-T G.8275.1 defines five types of messages: Announce, Sync,
Delay_Req, Delay_Resp, and Follow_Up.
Figure 13-49 Format of an untagged Layer 2 multicast packet
DA (6 bytes), SA (6 bytes), EtherType 0x88F7 (2 bytes), G.8275.1 packet
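The frame layout described above can be assembled as in the following sketch. The MAC addresses and EtherType come from the field list in this section; the function name, the example source MAC, and the placeholder message body are hypothetical, and the PTP message itself is not constructed here.

```python
import struct

PTP_ETHERTYPE = 0x88F7
NON_FORWARDABLE_DA = bytes.fromhex("0180C200000E")  # 01-80-C2-00-00-0E
FORWARDABLE_DA = bytes.fromhex("011B19000000")      # 01-1B-19-00-00-00

def build_frame(src_mac: bytes, ptp_message: bytes,
                forwardable: bool = False) -> bytes:
    """Build an untagged Layer 2 multicast frame: DA + SA + EtherType + message."""
    if len(src_mac) != 6:
        raise ValueError("source MAC address must be 6 bytes")
    da = FORWARDABLE_DA if forwardable else NON_FORWARDABLE_DA
    return da + src_mac + struct.pack("!H", PTP_ETHERTYPE) + ptp_message

# Placeholder 4-byte body standing in for a real PTP message.
frame = build_frame(bytes.fromhex("00E0FC000001"), b"\x00" * 4)
print(frame.hex("-"))
```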
Packet Types
ITU-T G.8275.1 defines five message types: Announce, Sync, Delay_Req, Delay_Resp, and
Follow_Up. Table 13-8 outlines five types of ITU-T G.8275.1 messages and their functions.
Table 13-8 ITU-T G.8275.1 message types and their functions
Packet Type   Function
Announce      To select the master and slave clocks, clock nodes send Announce messages
              to one another to exchange time source information, including the priority of
              the grandmaster, SSM level, time precision, and the number of hops to the
              grandmaster.
Sync          The master clock sends Sync messages time-stamped with t1 to a slave
              clock.
              Sync messages can be sent in either of the following modes:
              l one-step: A Sync message carries the timestamp of when the message was
              sent.
              l two-step: The time when a Sync message is sent is recorded, but the Sync
              message does not carry a transmission timestamp. A Follow_Up message sent after
              the Sync message carries the timestamp of when the Sync message was sent.
Delay_Req     A slave clock sends Delay_Req messages time-stamped with t3 to the master
              clock. The t3 timestamp records the time when the Delay_Req message was
              sent.
Delay_Resp    The master clock sends Delay_Resp messages carrying a timestamp t4 and
              the requested interface ID to a slave clock. The t4 timestamp records the time
              when the Delay_Req message was received.
Follow_Up     Follow_Up messages are used only in two-step mode. The master clock sends
              a Sync message and then a Follow_Up message time-stamped with t1 to a
              slave clock.
Asymmetric Correction
ITU-T G.8275.1 requires symmetric delays in opposite directions of a link, because asymmetric
delays introduce errors into time synchronization calculation. In practice, however, the delays
in opposite directions may be asymmetric due to link problems or device processing issues.
ITU-T G.8275.1 provides the asymmetric delay correction mechanism, as shown in Figure 13-50.
Figure 13-50 Asymmetric delay correction mechanism
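(Figure omitted: a master clock or responder at the top and a slave clock or requestor at the bottom, connected through points A and B, with delay t-ms in the master-to-slave direction and t-sm in the slave-to-master direction.)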
Both t-sm and t-ms are unidirectional delays, and t-ms is expected to be the same as t-sm. If
they are different, an asymmetric correction value can be set to compensate for the
asymmetric delays. In time synchronization calculation, the asymmetric correction value is
used to ensure the precision of time synchronization even if delays on the forward and reverse
links are different.
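How an asymmetric correction value enters the offset calculation can be shown with a short sketch. The sign convention is an assumption made for this example: the correction is taken as (t-ms - t-sm)/2, a value measured in advance, and the function name is hypothetical. The timestamps t1 to t4 are those described in Table 13-8.

```python
def offset_with_correction(t1: float, t2: float, t3: float, t4: float,
                           asymmetry: float = 0.0) -> float:
    """Slave offset from the master, with an optional asymmetric correction.

    asymmetry is assumed to be (t_ms - t_sm) / 2, so the master-to-slave
    delay is mean_delay + asymmetry.
    """
    mean_delay = ((t2 - t1) + (t4 - t3)) / 2
    return (t2 - t1) - (mean_delay + asymmetry)

# True offset 5, t-ms = 12, t-sm = 8.
# Sync: t1 = 100 (master), t2 = 100 + 12 + 5 = 117 (slave).
# Delay_Req: t3 = 120 (slave) = 115 (master), t4 = 115 + 8 = 123 (master).
print(offset_with_correction(100, 117, 120, 123))                # 7.0, off by 2
print(offset_with_correction(100, 117, 120, 123, asymmetry=2))   # 5.0
```

Without the correction, the error equals half of the difference between the two one-way delays; with the correction set to that value, the true offset is recovered.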
13.7.2.2 Clock Synchronization Principles

ITU-T G.8275.1 can implement clock (frequency) synchronization.
Overview
In ITU-T G.8275.1 clock synchronization, the master clock periodically sends Sync messages
carrying timestamps to a slave clock. Upon receipt of the Sync messages, the slave clock
records the time at which each message is received. When the link delay is stable, the sending
and receiving timestamps change at the same pace. If the receiving timestamp changes faster or
slower than the sending timestamp, the clock on the receiving device runs faster or slower than
the clock on the sending device. In this case, the local clock on the receiving device must be
adjusted. This process helps the two devices synchronize to the same frequency.
NOTE
The frequencies restored using G.8275.1 messages have lower precision than those on a synchronous
Ethernet network. It is recommended that you use synchronous Ethernet to implement clock synchronization
and ITU-T G.8275.1 to implement time synchronization.
Synchronization Mechanism
ITU-T G.8275.1 restores frequencies by using packets on each hop.
Per-hop packet-based frequency recovery can be implemented only when all devices on a path
support ITU-T G.8275.1. On a path with only a few hops, ITU-T G.8275.1 can restore
Stratum 3 frequencies that meet the ITU-T G.813 standard.
To achieve high frequency recovery precision, ITU-T G.8275.1-enabled devices must send
Sync messages at a minimum rate of 128 messages every second.
13.7.2.3 Time Synchronization Principles

ITU-T G.8275.1 can implement time synchronization in Delay mode, which involves both the
delay measurement mechanism and time synchronization mechanism.
Overview
On an ITU-T G.8275.1 time synchronization network, all clocks establish master/slave
synchronization relationships with one another. The grandmaster clock, at the highest level in
the hierarchy, is used as the system reference clock. The synchronization topology is
automatically generated using the best master clock (BMC) algorithm. Clock nodes exchange
time source information, select the grandmaster clock, and determine which local ports
receive clock signals sent by the grandmaster clock. With the BMC algorithm, a loop-free,
meshed, tree-shaped network is built and rooted at the grandmaster clock. In addition, a master
node periodically sends packets to its slave nodes. If the slave nodes do not receive packets
sent by the master node within a specific period of time, the slave nodes consider the master/
slave relationships invalid and start to re-select a master node that provides a clock source.
Synchronization Mechanism
The time synchronization principles of ITU-T G.8275.1 are the same as those of IEEE
1588v2. The master and slave nodes send and receive timing packets to and from each other.
Based on the receiving and sending timestamps in the timing packets, the total delay in
bidirectional transmission can be calculated. If the delays in opposite directions are the same,
the total delay divided by 2 is equal to the unidirectional delay. With the unidirectional delay
known, the slave node calculates the time difference between itself and the master node and
corrects its local time based on the time difference. The slave node is then synchronized with
the master node.
On an existing network, however, time synchronization precision is low because of delay
variation (jitter) and different delays in opposite directions on a link. For example, NTP implements
time synchronization only with precision that ranges from 10 ms to 100 ms. In addition,
software on a control board of the ATN runs NTP, which means that software processing is
also involved in NTP delay calculation. NTP calculates communication delays, including the
link delay and internal processing delays stemming from queuing, software invoking, and
software processing. The delay variation can be large, and the delays in opposite directions on
a link are asymmetric. As a result, time synchronization precision cannot meet requirements.
Unlike NTP, ITU-T G.8275.1 assumes that the link delay is a constant value (or a trivial value
that can be ignored between synchronization processes), and delays in opposite directions
along a link are the same. In this case, the link delay can be measured using timestamps on
two ends of a link to implement highly precise time synchronization.
The Delay mode is used in E2E delay measurement to repeatedly implement synchronization.
Figure 13-51 shows the flowchart for ITU-T G.8275.1 E2E delay measurement in Delay
mode.
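Figure 13-51 ITU-T G.8275.1 E2E delay measurement in Delay mode
[Figure: message sequence between the master and slave time lines. The master sends a Sync message at t1, and the slave receives it at t2 after the delay t-ms. A Follow_Up message gives the slave t1. The slave sends a Delay_Req message at t3, and the master receives it at t4 after the delay t-sm. A Delay_Resp message gives the slave t4, so the slave then knows t1, t2, t3, and t4.]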

NOTE
In Figure 13-51, t-sm and t-ms are delays in opposite directions. In the following example, the two
delay values are the same. If they are different, the asymmetric delay correction mechanism can be
used to compensate for the asymmetric delay.
In the following example, the one-step mode is used. Follow_Up messages are not used in one-step
mode because they are only used in two-step mode.
The process of implementing time synchronization in Delay mode is as follows:

1. A master node periodically sends a Sync message carrying the sending timestamp t1 to
the slave node. Upon receipt of the Sync message, the slave node generates the receiving
timestamp t2.
2. The slave node periodically sends a Delay_Req message to the master node and records
the sending timestamp t3. Upon receipt of the Delay_Req message, the master node
generates the receiving timestamp t4 and sends the slave node a Delay_Resp message
carrying t4.
3. The slave node calculates the following parameters:
– Sum of delays in opposite directions along the link between the master and slave
nodes = (t4 - t1) - (t3 - t2)
– Unidirectional delay along the link between the master and slave nodes (assuming
symmetric delays in opposite directions) = [(t4 - t1) - (t3 - t2)]/2
– Time difference between the slave and master nodes = [(t2 - t1) - (t4 - t3)]/2
4. The slave node uses the calculated results to adjust its local time to synchronize with the
master node.
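The calculation in step 3 can be sketched as follows. The function name is an assumption for the example; the three formulas are the ones listed above, and the delays in opposite directions are assumed to be symmetric.

```python
def delay_mode_results(t1, t2, t3, t4):
    """Apply the Delay-mode formulas to one set of timestamps.

    t1: Sync sent by the master        t2: Sync received by the slave
    t3: Delay_Req sent by the slave    t4: Delay_Req received by the master
    Returns (sum of delays, unidirectional delay, time difference), where the
    time difference is the slave time minus the master time.
    """
    sum_of_delays = (t4 - t1) - (t3 - t2)
    unidirectional_delay = sum_of_delays / 2
    time_difference = ((t2 - t1) - (t4 - t3)) / 2
    return sum_of_delays, unidirectional_delay, time_difference
```

For example, with t1 = 100, t2 = 107, t3 = 120, and t4 = 117, the sum of delays is 4, the unidirectional delay is 2, and the slave is 5 time units ahead of the master, so it subtracts 5 from its local time.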
13.7.3 Applications
Service Description
To meet clock and time synchronization requirements of base stations on an IP RAN, routers
on bearer networks must support clock and time synchronization. ITU-T G.8275.1 is a per-hop
synchronization protocol used only in the telecom field. ITU-T G.8275.1 implements
high-precision clock and time synchronization.
Per-hop clock synchronization
NodeBs need to perform clock synchronization; however, a bearer network does not support
the synchronous Ethernet technique. ITU-T G.8275.1 can be configured to perform clock
synchronization between NodeBs (configuration on each hop is not required). Clock
information can be sent by a clock source to a destination node using any combination of
1588 and synchronous Ethernet clocks. As shown in Figure 13-52, nodes A and B implement
clock synchronization.

Figure 13-52 Per-hop clock synchronization networking
[Figure labels: GPS+BITS; T-GM; T-BC; Node A with G.8275.1; Node B with G.8275.1; Node C without G.8275.1; link types GE, FE, and E1. Legend: G.8275.1 clock signal transfer; physical clock signal transfer.]
Time synchronization on a mobile bearer network

NodeBs need to implement time synchronization. To meet this requirement, ITU-T G.8275.1
needs to be configured on devices on a bearer network. In this case, each hop must support
ITU-T G.8275.1, and devices that do not support G.8275.1 cannot obtain time information.
As shown in Figure 13-53, nodes A and B can implement time synchronization, whereas
nodes A and C cannot.
Figure 13-53 Time synchronization on a mobile bearer network
[Figure labels: GPS+BITS; T-GM; T-BC; Node A with G.8275.1; Node B with G.8275.1; Node C without G.8275.1; link types GE, FE, and E1. Legend: G.8275.1 clock signal transfer; physical clock signal transfer.]

Terms
Term               Description
ITU-T G.8275.1     A standard entitled precision time protocol telecom profile for phase/time synchronization with full timing support from the network, defined by the ITU-T.
IEEE 1588v2 PTP    A standard entitled Precision Clock Synchronization Protocol for Networked Measurement and Control Systems, defined by the Institute of Electrical and Electronics Engineers (IEEE). It is also called the Precision Time Protocol (PTP).
Clock domain       A physical network can be logically divided into multiple clock domains. Each clock domain has its own independent synchronous time, with which clocks in the same domain synchronize.

Abbreviation
BITS building integrated time supply system
BMC best master clock
GPS global positioning system
GSM global system for mobile communications
IP RAN Internet Protocol radio access network
T-BC Telecom boundary clock
T-GM Telecom grandmaster
T-TSC Telecom time slave clock
TD-SCDMA Time Division-Synchronous Code Division Multiple Access
WCDMA Wideband Code Division Multiple Access
WiMax FDD Worldwide Interoperability for Microwave Access Frequency Division Duplex
WiMax TDD Worldwide Interoperability for Microwave Access Time Division Duplex
13.8 Atom GPS Timing

13.8.1 Introduction
Definition
The rapid commercial deployment of Long Term Evolution (LTE) Time Division Duplex
(TDD) and LTE-Advanced (LTE-A) drives the need for time synchronization of base stations.
Two time synchronization solutions are commonly used: one solution is to directly connect
base stations to the Global Positioning System (GPS) and the other solution is to obtain the
Precision Time Protocol (PTP) time from the network.
If base stations connect directly to the GPS, each base station incurs GPS deployment
costs. The total cost of ownership (TCO) therefore increases as the number of base stations
increases. If base stations obtain PTP time from the network, the entire network must support
PTP time synchronization, which incurs high network-wide reconstruction costs.
Using the GPS solution also has additional limitations. For example, the GPS antenna must be
installed outdoors and positioned to receive signals from GPS satellites. Long feeders must
therefore be used to connect to devices that are deployed indoors, and holes must be drilled
through walls in order to route these feeders indoors. In addition, requirements such as
lightning protection must be considered when selecting antenna sites. These conditions make
it difficult and costly to deploy GPS antennas for indoor devices. Furthermore, rented indoor
equipment rooms may have restrictions in place that prevent or strictly control through-wall
installation of cables, and obtaining permissions for such installation may be complex. For
example, Japanese law does not allow GPS radio frequency (RF) cables to be installed from
outdoor to indoor.
To address this situation, the ATN provides Atom GPS timing. It uses a built-in AE 905S
module that provides GPS access. This module functions as a lightweight building integrated
timing supply (BITS): it receives clock and time signals from the GPS and converts them into
synchronous Ethernet (SyncE) signals and 1588v2 signals (PTP time signals), respectively.
The AE 905S module then outputs the signals to the ATN, which in turn synchronizes SyncE
clock and PTP time to all base stations connected to the ATN. This feature greatly reduces the
TCO for clock and time synchronization.
Benefits
Atom GPS timing offers the following benefits to carriers:
• Time synchronization deployment costs reduced by 80% for newly constructed networks
• Carrier investment protected by employing existing ATN networks in network expansion
scenarios
13.8.2 Principles
Atom GPS timing is implemented by using a GPS antenna, GPS receiver, PLL, SyncE
processing module, and PTP processing module.

Related Modules
Figure 13-54 GPS timing
[Figure labels: GPS antenna; SFP GPS function block (GPS receiver, frequency PLL, APLL, time PLL, RTC, SyncE master, PTP GM); ATN (SyncE slave, PTP BC). Signals: GPS signal, 1PPS, UTC, SysClk, TCLK, SysTime, SyncE, timestamps, and PTP packets.]
GPS Antenna
It receives GPS satellite signals.
GPS Receiver
It processes GPS RF signals and extracts frequency and time information from the GPS RF
signals.
Phase-locked loop (PLL)
The PLL can be:
• Frequency PLL: locks 1 PPS reference clocks and outputs a high-frequency clock.
• Analog PLL (APLL): multiplies the system clock to a higher-frequency clock.
• Time PLL: locks the UTC time and outputs the system time.
RTC
The real time clock (RTC) provides real-time timestamps for PTP event messages.
PTP GM
The PTP Grandmaster module periodically sends Announce, Sync, and Delay_Resp messages
and receives Delay_Req messages.
SyncE Slave
It is an ATN device's slave clock processing module that extracts SyncE clock signals.
PTP BC
It is an ATN device's PTP processing module that functions as the slave BC to process PTP
messages and extract PTP time.
Implementation
Atom GPS timing provides two service functions:

1. Service function 1: Atom GPS timing allows an AE 905S to function as a SyncE clock
reference source to provide clock synchronization for the ATN.
2. Service function 2: Atom GPS timing allows an AE 905S to function as a PTP time
reference source to provide time synchronization for the ATN.
The implementation process of service function 1 is as follows:
1. The AE 905S module uses a built-in GPS receiver to receive satellite signals from the
GPS antenna and output GPS clock signals at 1 PPS.
2. The AE 905S module uses a built-in frequency PLL module to trace and lock 1 PPS
phase and frequency and output the system clock.
3. The AE 905S module uses a built-in APLL to multiply the system clock to a clock at GE
rate, which is then used as the SyncE transmit clock.
4. The ATN uses the GE interface equipped with the AE 905S module to obtain the SyncE
clock signals from the AE 905S module and transfer the clock signals to downstream
devices.
The implementation process of service function 2 is as follows:
1. The AE 905S module uses a built-in GPS receiver to receive satellite signals from the
GPS antenna and output the UTC time.
2. The AE 905S module uses a built-in time PLL module to trace and lock the
UTC time and output the system time.
3. The AE 905S module uses a built-in RTC module to obtain the system time.
4. The AE 905S module uses a built-in PTP Grandmaster module to process PTP messages.
The timestamp carried in PTP event messages is generated by the RTC module.
5. The ATN uses the GE interface equipped with the AE 905S module to obtain the PTP
time signals from the AE 905S module and transfer the time signals to downstream
devices.
13.8.3 Applications
Atom GPS Timing Scenario

• In an outdoor deployment scenario, the AE 905S is inserted into a GE optical interface of
an outdoor ATN 905A device and then connected to an indoor IP RAN device through
an optical fiber to achieve time and frequency synchronization. For details about how to
deploy Atom GPS outdoors, see ATN 905 GPS Timing System and GPS Quick
Installation Guide.
• In an indoor deployment scenario, the AE 905S is directly inserted into an indoor IP
RAN device and then connected to an outdoor GPS antenna through a long GPS feeder.
For details about how to deploy Atom GPS indoors, see ATN ATOM GPS (Indoor
ATN with AE 905S) Quick Installation Guide.
In comparison with indoor deployment, outdoor deployment has the following advantages:
1. The deployment is simple and requires fewer work hours and lower engineering costs.
2. Cable costs are low due to the use of a fixed-length feeder.
3. Integrated outdoor installation avoids the routing of a long feeder, reducing the likelihood
of faults and facilitating maintenance.


Abbreviation
GPS Global Positioning System
PLL Phase-Locked Loop
PPS Pulse Per Second
PRC Primary Reference Clock
PRTC Primary Reference Timing Clock
PTP Precision Time Protocol
RTC Real-time Clock
SFP Small Form-factor Pluggable
SyncE Synchronous Ethernet
UTC Coordinated Universal Time

14 Security
About This Chapter
This document describes the security feature in terms of the overview, principle, and
applications.
14.1 MAC Address Limit

14.2 DHCP Snooping
14.3 URPF
14.4 Local Attack Defense
14.5 Mirroring
14.6 Online Packet Head Capture
14.7 MPAC
14.8 Keychain
14.9 IPSec
14.1 MAC Address Limit
14.1.1 Introduction to MAC Address Limitation

Definition
By configuring the maximum number of MAC addresses to be learnt by an interface, a
VLAN, or a Virtual Switch Instance (VSI), you can minimize the impact of an attack and
protect other users.
Purpose
MAC entries on the Layer 2 network are essential to forwarding packets. When MAC attacks
are launched on a network, MAC entries are exhausted by invalid MAC addresses, denying
the access of authorized users to the network. To prevent this problem, you can configure
MAC address limit to minimize the impact of MAC attacks.
Benefits
Benefits brought to operators
To solve the preceding problem, MAC address limitation is introduced. By configuring the
maximum number of MAC addresses to be learnt by an interface, a VLAN, or a Virtual
Switch Instance (VSI), you can minimize the impact of an attack and protect other users.
Benefits brought to users
MAC address limitation minimizes the impact of attacks so that the security of users is
enhanced and the bandwidth usage is improved.
14.1.2 Principles
MAC address limit allows you to set the maximum number of MAC addresses to be learnt by
an interface on an ATN. After the number of learnt MAC addresses reaches the set maximum,
the interface forwards a subsequent packet if the source MAC address of the packet
exists in the MAC table. If the source MAC address of the packet does not exist in the MAC
table, the interface forwards or discards the packet based on the action configured in
MAC address limit. For example, the packet is discarded by the interface if the action is
configured as Discard.
14.1.2.1 Basic Principles of MAC Address Limit

During MAC address learning, MAC address limit is used to restrict the maximum number of
MAC addresses that can be learnt by an interface. The basic principles are as follows:
• MAC address limit based on Port or Port+VLAN
1. When a user packet passes through a port enabled with MAC address limit based on Port or
Port+VLAN, the ATN learns the source MAC address and forwarding information carried in
the user packet and applies the limit process.
2. Limit process: The ATN first determines whether the source MAC address to be learnt
exists in the MAC table. If so, the packet is simply forwarded; if not, the ATN checks whether
the number of MAC addresses learnt previously has reached the maximum number set in
MAC address limit. If the maximum number has not been reached, the ATN learns the MAC
address of the packet; if the maximum number has been reached, the ATN discards or forwards
the packet based on the action set in MAC address limit.
• MAC address limit based on a VLAN in a broadcast domain or a VSI
1. When a user packet is forwarded in a broadcast domain configured with MAC address
limit, the ATN learns the source MAC address of the packet on the outbound interface. If the
source MAC address of the packet exists in the MAC table, the ATN simply forwards the
packet; if the source MAC address of the packet does not exist in the MAC table, the ATN
checks whether the number of MAC addresses learnt previously has reached the maximum
number set in MAC address limit. If not, the ATN learns the MAC address of the packet; if
so, the ATN discards or forwards the packet based on the action configured in MAC address
limit.
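The limit process can be sketched as a small decision routine. This is an illustrative model only, not the ATN implementation: the class name, the string results, and the use of a set as the MAC table are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MacLimit:
    """Hypothetical model of MAC address limit on one interface, VLAN, or VSI."""
    maximum: int                      # maximum number of MAC addresses to learn
    action: str = "discard"           # action after the limit is reached: "discard" or "forward"
    learnt: set = field(default_factory=set)

    def handle(self, src_mac):
        """Return "forward" or "discard" for a packet with this source MAC."""
        if src_mac in self.learnt:
            return "forward"          # address already in the MAC table
        if len(self.learnt) < self.maximum:
            self.learnt.add(src_mac)  # limit not reached: learn the address
            return "forward"
        return self.action            # limit reached: apply the configured action
```

With a maximum of 2 and the discard action, packets from a third unknown source are discarded, while packets from the two learnt sources continue to be forwarded.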

14.1.2.2 Traffic Suppression Principle

NOTE
In this document, the ATN in the network diagrams functions as a switch.
Traffic Types
Traffic in a Layer 2 network is classified into the following types:
• Unicast traffic: For the unicast packets that have destination MAC mapping entries in the
MAC table, the switch forwards them based on the mapping entries.
• Unknown unicast traffic: For the unicast packets that have no destination MAC mapping
entries in the MAC table, the switch broadcasts them.
• Multicast traffic: For the packets whose destination MAC addresses are multicast MAC
addresses, the switch broadcasts them.
• Broadcast traffic: For the packets whose destination MAC addresses are broadcast MAC
addresses, the switch broadcasts them.
To ensure normal forwarding of unicast traffic, you can limit the bandwidth for forwarding
the unknown unicast traffic, multicast traffic, and broadcast traffic by configuring traffic
suppression on the switch.
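The four traffic types can be told apart from the destination MAC address alone, as the following sketch shows. The function names and the representation of the MAC table as a set of 6-byte values are assumptions for the example; the multicast test relies on the group bit, which is the least significant bit of the first byte of an Ethernet address.

```python
BROADCAST = bytes([0xFF] * 6)


def parse_mac(text):
    """Convert 'aa-bb-cc-dd-ee-ff' or 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    raw = bytes.fromhex(text.replace("-", "").replace(":", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address has 6 bytes")
    return raw


def classify(dst_mac, mac_table):
    """Return the traffic type of a frame based on its destination MAC address."""
    dst = parse_mac(dst_mac)
    if dst == BROADCAST:
        return "broadcast"
    if dst[0] & 0x01:  # group bit set: multicast address
        return "multicast"
    return "unicast" if dst in mac_table else "unknown unicast"
```

Traffic suppression would then apply a bandwidth limit to every type except "unicast".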
Background of Traffic Suppression

A Layer 2 switch is deployed to improve the communication efficiency of traditional LANs,
allow more PCs to access the LAN, and avoid conflicts on the network. Through
MAC address learning, a Layer 2 switch restricts the conflicts on the shared link to each
attached interface. Figure 14-1 shows a typical networking diagram of traffic suppression on
a switch.
Figure 14-1 Networking diagram of traffic suppression
[Figure legend: User; Data flow]
A switch receives all the data frames across the network. It learns the source MAC address
carried in the frames and constructs a MAC address table to save the mapping between the
MAC address and the source interface.
After receiving a data frame, the switch searches the MAC address table for the frame's
destination MAC address. If a matching entry is found, the switch forwards
the frame to the interface mapped to the destination MAC address. In this manner, the switch
implements conflict isolation. Otherwise, the switch broadcasts the frame to all the interfaces
except the interface on which the frame was received, which can cause broadcast storms across
the network.
When receiving a multicast or broadcast packet, the switch cannot determine the exact
destination interface of the packet based on the destination MAC address. The switch then also
needs to forward the multicast or broadcast packet to all the interfaces except the interface
on which the packet was received. In such a case, broadcast storms may also be generated.
Deploying switches in the network can improve the unicast forwarding efficiency. The
broadcast traffic, however, degrades the switch performance. To solve this problem, traffic
suppression is introduced.
Traffic Suppression
If the broadcast traffic is not suppressed, a great amount of network bandwidth is consumed
when a great deal of broadcast traffic flows through the network. The network performance is
therefore degraded, and communication may even be interrupted.
In such a case, you need to configure broadcast traffic suppression on the switch to ensure that
the switch can reserve a part of bandwidth for forwarding unicast traffic when broadcast
traffic bursts across the network.
Figure 14-2 Schematic diagram of traffic suppression
[Figure legend: User; Data flow]
14.1.3 MAC Address Limit Applications
MAC Address Limit on an Inbound Interface

Currently, interfaces on the ATN can function as Layer 2 interfaces. A Layer 2 interface on
the device can be connected to multiple Layer 2 user networks belonging to different VLANs.
You can enable MAC address limit on Layer 2 interfaces of the ATN to control the total
number of MAC addresses that can be learnt from all the attached user networks, regardless
of the VLANs to which each user network belongs. As shown in Figure 14-3, you can
configure MAC address limit on port1 of the ATN.

Figure 14-3 Diagram of MAC address limit on an inbound interface
[Figure labels: Layer 2 network; Internet; Port1; VLAN10; VLAN20]
MAC Address Limit for One or More VLANs to Which an Inbound Interface
Belongs
In addition to MAC address limit on an inbound interface, you can also configure MAC
address limit for one or more specific VLANs to which an inbound interface belongs.
As shown in Figure 14-3, you can configure MAC address limit based on Port+VLAN on
port1 of the ATN to restrict the number of MAC addresses only on VLAN 10 or VLAN 20.

Terms
Term                          Description
MAC Address Learning          An ATN or a switch learns the source MAC address of a user packet to forward the packet to the proper destination.
MAC Address Limit             You can restrict the number of MAC addresses to be learnt by an interface to enhance the network security.
MAC Address Limit Threshold   It is the maximum number of MAC addresses to be learnt by an interface.
Action                        It is the behavior adopted by the ATN to process a subsequent packet when the MAC address limit threshold is reached. Currently, there are two actions: discard and forward.

Abbreviations
MAC Media Access Control
14.2 DHCP Snooping
14.2.1 Introduction
Definition
Dynamic Host Configuration Protocol (DHCP) snooping, a DHCP security feature, filters out
untrusted DHCP messages by means of a DHCP snooping binding table and IP+MAC binding.
DHCP snooping functions as a firewall between the DHCP client and DHCP server to prevent
DHCP-associated attacks.
Purpose
DHCP snooping prevents the following attacks:
• Bogus DHCP server attacks
• Middleman attacks and IP/MAC spoofing attacks
• DoS attacks launched by changing the value of the CHADDR field
The working mode of DHCP snooping varies according to the type of attack, as shown in
Table 14-1.
Table 14-1 Attack types and DHCP snooping working modes

Type of Attack                                                   DHCP Snooping Working Mode
Bogus DHCP server attack                                         Trusted/Untrusted
Middleman attack and IP/MAC spoofing attack                      Using a DHCP snooping binding table
DoS attack launched by changing the value of the CHADDR field    Checking the CHADDR field in a DHCP message
14.2.2 Principles
14.2.2.1 Bogus DHCP Server Attack

DHCPREQUEST messages are broadcast, so a bogus DHCP server can intercept them.
The bogus DHCP server then sends incorrect information, such as an
incorrect gateway IP address, incorrect DNS server, and incorrect IP address,
to the DHCP client. This causes a denial of service (DoS). Figure 14-4 shows a bogus
DHCP server attack.
Figure 14-4 Diagram of a bogus DHCP server attack
[Figure labels: DHCP client; ATN; DHCP server; DHCP pseudo server. Message sequence: DHCP discovery (broadcast); DHCP offer (unicast from the pseudo server); DHCP request (broadcast); DHCP ack (unicast from the pseudo server).]
To prevent a bogus DHCP server attack, enable DHCP snooping to work in trusted/untrusted
mode.
You can configure a physical interface to be trusted or untrusted. DHCPREPLY messages
(DHCPOFFER, DHCPACK, or DHCPNAK messages) received from an untrusted interface
are directly discarded so that the bogus DHCP server attack can be prevented, as shown in
Figure 14-5.
Figure 14-5 Diagram of DHCP snooping in the trusted/untrusted working mode
[Figure labels: DHCP Client; DHCP Server; DHCP Pseudo Server; interfaces marked Trusted and Untrusted; messages blocked (x) on the untrusted interfaces]
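The trusted/untrusted check can be sketched as a simple filter. The function name and the interface names used with it are hypothetical; only the rule itself (reply messages from untrusted interfaces are discarded) comes from the description above.

```python
DHCP_REPLY_TYPES = {"DHCPOFFER", "DHCPACK", "DHCPNAK"}


def accept_dhcp_message(msg_type, in_interface, trusted_interfaces):
    """Return True if the message may be forwarded, False if it is discarded.

    Reply messages are accepted only when they arrive on a trusted
    interface; other DHCP messages are not affected by this check.
    """
    if msg_type in DHCP_REPLY_TYPES and in_interface not in trusted_interfaces:
        return False
    return True
```

In this model, only the interface toward the legitimate DHCP server is placed in trusted_interfaces, so an offer sent by a pseudo server attached to a user-side interface is dropped.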
14.2.2.2 Middleman Attack and IP/MAC Spoofing Attack
Middleman Attack
A middleman sends a packet carrying its own MAC address and the IP address of the DHCP
server to the client. The client then learns the IP and MAC addresses and mistakenly regards
the middleman as the DHCP server. From then on, a packet sent from the client is always
destined for the middleman before reaching the DHCP server, and the middleman, as a
response, sends a packet carrying its own MAC address and the IP address of the client to the
DHCP server. The DHCP server then learns the IP and MAC addresses and mistakenly
regards the middleman as the client. From then on, a packet sent from the DHCP server is
always destined for the middleman before reaching the client, as shown in Figure 14-6.
The middleman therefore participates in data communications between the DHCP server and
client. The DHCP server and client mistakenly believe that they are exchanging packets
directly, whereas the packets are actually bogus packets processed by the middleman.
Figure 14-6 Diagram of a middleman attack
[Figure labels: Middleman; DHCP server; DHCP client; steps (1), (2), and (3)]
IP/MAC Spoofing Attack

As shown in Figure 14-7, the attacker sends a packet carrying valid IP and MAC addresses of
a client to the DHCP server. The DHCP server regards the attacker as an authorized client and
learns the IP and MAC addresses. The real authorized client, however, cannot access the
service provided by the DHCP server.
Figure 14-7 Diagram of an IP/MAC spoofing attack
[Figure labels: DHCP server; DHCP client; addresses 10.1.1.1/32 MAC:1-1-1, 10.1.1.2/32 MAC:2-2-2, and 10.1.1.3/32 MAC:3-3-3]
To prevent a middleman attack and IP/MAC spoofing attack, you can use a DHCP snooping
binding table.

The ATN applies the Discard policy by default. After receiving an ARP or IP packet, an
interface compares its source IP address and source MAC address with the entries in the
DHCP snooping binding table. As shown in Figure 14-8, if a matched entry is found, the
packet is forwarded; if no matched entry is found, the packet is discarded.
For the clients with static IP addresses configured, ARP packets or IP packets sent from them
are discarded. This is because these clients do not obtain IP addresses by sending
DHCPREQUEST messages and no DHCP snooping binding entry exists. In this manner,
these clients cannot access the network.
Similarly, for the clients that steal valid IP addresses of other clients, ARP packets or IP
packets sent from them are also discarded. This is because these clients do not obtain IP
addresses by sending DHCPREQUEST messages and the MAC address and interface
information corresponding to the IP address in the DHCP snooping binding table are therefore
different from those of the packet sender. In this manner, these clients cannot access the
network.
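The binding table check can be sketched as a lookup keyed by source IP address. The table layout (IP address mapped to a MAC address and an interface) and the function name are assumptions made for the example; the forward/discard outcome follows the default Discard policy described above.

```python
def check_packet(binding_table, src_ip, src_mac, in_interface):
    """Return "forward" if the packet matches a binding entry, else "discard".

    binding_table maps an IP address to the (MAC address, interface) pair
    recorded for the client that obtained that address through DHCP.
    """
    entry = binding_table.get(src_ip)
    if entry is not None and entry == (src_mac, in_interface):
        return "forward"
    return "discard"
```

A client with a statically configured address has no entry and is discarded; a client that steals another client's IP address fails the MAC address or interface comparison and is discarded as well.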
Figure 14-8 Diagram of the DHCP snooping binding table
[Figure labels: DHCP snooping enable; ISP network; Matched in the binding table; Not matched in the binding table]
The entries in the DHCP snooping binding table are classified into the following two types:
• Static entries configured using command lines. These entries can be deleted only
using command lines.
• Dynamic entries automatically learned through DHCP snooping. These entries are aged
based on the lease.
The dynamic entries in the DHCP snooping binding table are automatically generated based
on DHCPACK messages from the DHCP server.
For the untrusted interface, a Layer 3 device intercepts the DHCPREPLY message to obtain
the information including the IP address assigned by the DHCP server, the MAC address of
the interface, and the interface through which the message passes. An IP and MAC binding
entry of the untrusted interface is then generated. A binding entry has the same lease as the IP
address of the client. When the lease expires or the client releases this IP address, the entry is
automatically deleted.
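The life cycle of a dynamic entry can be sketched as follows. The class and method names are hypothetical and time is passed in explicitly to keep the example deterministic; the behavior modeled is only what the text states: an entry is created from a DHCPACK message, shares the lease of the IP address, and is deleted on lease expiry or release.

```python
class DynamicBindings:
    """Hypothetical store for dynamic DHCP snooping binding entries."""

    def __init__(self):
        self._entries = {}  # ip -> (mac, interface, expiry time)

    def learn_from_ack(self, ip, mac, interface, lease_seconds, now):
        """Create or refresh an entry when a DHCPACK message is snooped."""
        self._entries[ip] = (mac, interface, now + lease_seconds)

    def release(self, ip):
        """Delete the entry when the client releases the IP address."""
        self._entries.pop(ip, None)

    def age_out(self, now):
        """Delete entries whose lease has expired and return their IP addresses."""
        expired = [ip for ip, (_, _, expiry) in self._entries.items() if expiry <= now]
        for ip in expired:
            del self._entries[ip]
        return expired

    def __contains__(self, ip):
        return ip in self._entries
```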
14.2.2.3 DoS Attack Launched by Changing the Value of the CHADDR Field
In a DHCP exhaustion attack, the attacker may change the Client Hardware Address
(CHADDR) carried in the DHCP message rather than the source MAC address in the frame
header to repeatedly apply for IP addresses, as shown in Figure 14-9. The attack packets may
be forwarded normally because the device verifies a packet based only on the source MAC
address in the frame header.
Figure 14-9 Format of a DHCP message
[Figure labels: DHCP snooping enable; ISP network; Matched in the binding table; Not matched in the binding table]
You can configure DHCP snooping on the device to check the CHADDR field carried in a
DHCPREQUEST message. If the CHADDR field matches the source MAC address in the
frame header, the message is forwarded. Otherwise, the message is discarded.
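The CHADDR check can be sketched as a byte comparison. The function names are assumptions for the example. In a DHCP message the chaddr field is 16 bytes long and the hlen field gives the number of valid bytes, which is 6 for Ethernet.

```python
def chaddr_matches_source(frame_src_mac, chaddr, hlen=6):
    """Compare the client hardware address with the frame's source MAC.

    frame_src_mac: 6-byte source MAC address from the Ethernet header.
    chaddr: 16-byte CHADDR field from the DHCP message.
    hlen: hardware address length from the DHCP message (6 for Ethernet).
    """
    return bytes(chaddr[:hlen]) == bytes(frame_src_mac)


def handle_dhcp_request(frame_src_mac, chaddr, hlen=6):
    """Return "forward" when the two addresses match, otherwise "discard"."""
    return "forward" if chaddr_matches_source(frame_src_mac, chaddr, hlen) else "discard"
```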
14.2.2.4 Format of the Option 82 Field
Format of the Option 82 Field

Option 82 is the relay agent information option, which records the location information of the
DHCP client. It is a special field contained in a DHCP message.
When a DHCPREQUEST message sent from a DHCP client passes through an interface, the
relay agent appends an Option 82 field to this DHCPREQUEST message. When receiving the
DHCPREQUEST message carrying an Option 82 field, the DHCP server returns a response
message containing the same Option 82 field to the DHCP relay agent.
As shown in Figure 14-10, the Code field in Option 82 is 82; the Length field indicates the
total number of bytes in the Agent Information field; the iN field indicates a sub-option of the
Agent Information field and each sub-option is a SubOpt/Length/Value tuple.
The initially assigned device sub-options are as follows:
1: agent circuit ID sub-option
A DHCP server uses the agent circuit ID sub-option for IP and other parameter assignment
policies.
Figure 14-10 Format of a message with an Option 82 field

Code    Length    Agent Information Field
82      N         i1 i2 i3 i4 i5 … iN
The ATN device uses the Option 82 field to define the address assignment policies or other
policies for the DHCP server to perform.
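The Code/Length/sub-option layout can be sketched with a small encoder and decoder. The function names and the circuit ID value in the usage example are hypothetical; the byte layout follows the field description above, in which each sub-option is a SubOpt/Length/Value tuple.

```python
OPTION_82 = 82
CIRCUIT_ID = 1  # sub-option 1: agent circuit ID


def encode_option82(sub_options):
    """Encode a list of (sub-option code, value bytes) pairs as an Option 82 field."""
    body = b""
    for code, value in sub_options:
        if len(value) > 255:
            raise ValueError("sub-option value too long")
        body += bytes([code, len(value)]) + value
    if len(body) > 255:
        raise ValueError("Agent Information field too long")
    return bytes([OPTION_82, len(body)]) + body


def decode_option82(data):
    """Decode an Option 82 field into a list of (sub-option code, value bytes) pairs."""
    if len(data) < 2 or data[0] != OPTION_82 or data[1] != len(data) - 2:
        raise ValueError("not a well-formed Option 82 field")
    sub_options, pos = [], 2
    while pos < len(data):
        if pos + 2 > len(data) or pos + 2 + data[pos + 1] > len(data):
            raise ValueError("truncated sub-option")
        code, length = data[pos], data[pos + 1]
        sub_options.append((code, data[pos + 2:pos + 2 + length]))
        pos += 2 + length
    return sub_options
```

For example, encode_option82([(CIRCUIT_ID, b"ge0/2/0")]) produces the bytes 82, 9, 1, 7 followed by the seven bytes of the circuit ID.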

Appending an Option 82 Field to the DHCP Message on an ATN Device

In this section, "the device" refers to the ATN device that appends an Option 82 field to
DHCP messages.
As shown in Figure 14-11, after Option 82 is enabled, the device appends an Option 82 field
to the DHCPDISCOVER and DHCPREQUEST messages. The DHCP server then performs
the IP assignment policy and other policies based on Option 82.
The DHCPREPLY message returned by the DHCP server also carries the Option 82 field.
After receiving the DHCPREPLY message, the interface removes the Option 82 field from the
message before forwarding the message to the client.
Figure 14-11 Appending an Option 82 field to the DHCP message on an ATN Device
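[Figure labels: Client1, Client2, Client3; ATN (DHCP relay); Internet; DHCP server. Message
flow:
Client to ATN: Discover; ATN to server: Discover+Option82
Server to ATN: Offer+Option82; ATN to client: Offer
Client to ATN: Request; ATN to server: Request+Option82
Server to ATN: Ack+Option82; ATN to client: Ack
Data exchange follows.]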
Option 82 Implementation
After Option 82 is enabled, an interface checks whether the DHCPREQUEST message sent
from a client, or the message to be sent to a client, contains an Option 82 field.
l If the DHCPREQUEST message contains an Option 82 field:
The device checks the Option 82 appending configuration. If the current interface is
configured with the Rebuild mode, the interface does not trust the Option 82 field
contained in the received message and modifies Sub-option 1 contained in the
Option 82 field.
l If the DHCPREQUEST message does not contain an Option 82 field:
The device adds an Option 82 field with Sub-option 1.

When the DHCPREPLY message is forwarded, the device first checks whether the message
contains Sub-option 1 and whether the sub-option contains the Huawei Device Identifier field.
If so, the device can successfully parse the Option 82 field, and then removes the Huawei
Device Identifier field from Sub-option 1 before forwarding the message.
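The request-side decision described in this section can be sketched as a small function. It is illustrative only: the sub-options are modelled as a dictionary, `local_circuit_id` is a hypothetical value, and the behaviour when an Option 82 field is present but the interface is not in Rebuild mode is not described here, so the sketch simply leaves the field unchanged as an assumption.

```python
from __future__ import annotations

from typing import Optional

CIRCUIT_ID = 1  # Sub-option 1: agent circuit ID


def handle_request_option82(
    option82: Optional[dict[int, bytes]],
    local_circuit_id: bytes,
    rebuild: bool,
) -> dict[int, bytes]:
    """Return the Option 82 sub-options to carry in the forwarded DHCPREQUEST."""
    if option82 is None:
        # No Option 82 in the request: add one with Sub-option 1.
        return {CIRCUIT_ID: local_circuit_id}
    if rebuild:
        # Rebuild mode: the received field is not trusted, so Sub-option 1 is modified.
        rebuilt = dict(option82)
        rebuilt[CIRCUIT_ID] = local_circuit_id
        return rebuilt
    # Assumption: in any other mode the received field is kept as-is.
    return dict(option82)
```

For example, `handle_request_option82({1: b"old"}, b"new", rebuild=True)` replaces only Sub-option 1 and keeps any other sub-options carried in the request.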
14.2.3 Applications
A Dynamic Host Configuration Protocol (DHCP) server dynamically assigns IP addresses to
DHCP clients. Attacks such as a bogus DHCP server attack or a DHCP denial of service
(DoS) attack may occur during IP address allocation. To address this problem, deploy DHCP
snooping. DHCP snooping can be deployed on Layer 2 or Layer 3 devices. DHCP relay
is required when DHCP snooping is deployed on Layer 3 devices.
Figure 14-12 Networking diagram for configuring DHCP snooping on a Layer 2 device
[Figure labels: ISP network; L3 network; DHCP relay; L2 network; DHCP snooping; Trusted;
Untrusted; DHCP server; user network]

Figure 14-13 Networking diagram for configuring DHCP snooping on a Layer 3 device
[Figure labels: ISP network; L3 network; DHCP snooping; DHCP relay; Trusted; Untrusted;
L2 network; DHCP server; user network]

Acronym & Abbreviation    Full Name
CCP Compression Control Protocol
CHAP Challenge-Handshake Authentication Protocol
MP MultiLink PPP
MRU Maximum Receive Unit
PAP Password Authentication Protocol
RTP Real-time Transport Protocol
14.3 URPF

14.3.1 Introduction
Definition
Unicast Reverse Path Forwarding (URPF) is a security measure used to prevent source
address spoofing attacks across the network.
Purpose
As IP networks are developing, threats to network security increase, and network devices are
vulnerable. Source address spoofing attacks have become a typical security threat on the
Internet. URPF helps prevent such attacks.
When URPF is enabled on the ATN equipment, the system obtains the source address of a
received packet and searches the forwarding table to check whether the interface that received
the packet matches the outbound interface of the entry for that source address. If
they do not match, the source address is considered spoofed and the packet is dropped. In
this way, URPF protects the device against source address spoofing attacks.
Figure 14-14 Source address spoofing attack

[Figure labels: Attacker 10.11.1.12/24; Server 10.10.1.2/24; User 10.1.1.2/24; ATN with
interfaces GE0/2/0 and GE0/2/1; Internet; Breakdown; packets with SIP 10.1.1.2/24 and
DIP 10.10.1.2/24; user data flow; attacker data flow]
As shown in Figure 14-14, an attacker forges packets with the source addresses of authorized
users and sends the packets to the server. These forged packets flood the network resources
and cause Denial of Service (DoS) to the server and users.
There are typical countermeasures to DoS attacks, including Address Resolution Protocol
(ARP) attack defense, URPF, and Dynamic Host Configuration Protocol (DHCP) snooping.
URPF can be configured on the network ingress to prevent source address spoofing attacks by
blocking the packets with forged source addresses.
Benefits
Benefits to carriers

URPF is an additional layer of network security. It helps protect devices against source
address spoofing, reducing DoS and Distributed DoS (DDoS) attacks.
Benefits to users
URPF increases users' security on the network.
14.3.2 Principles
14.3.2.1 Principles of URPF

URPF provides protection against source address spoofing attacks. URPF enables a device to
verify reachability of the source address of a received packet. If the source IP address is
unreachable, the device discards the packet.
In a complex network environment, for example, when routes are asymmetric, URPF
may not work properly.
To solve the preceding problem, the ATN supports the following URPF modes:
l Strict mode
l Loose mode
l Strict mode (excluding the default route)
l Loose mode (excluding the default route)
NOTE
Among the ATN 950B series, only the ATN 950B (with the control board AND2CXPB/AND2CXPE
installed) supports Strict mode (excluding the default route).
Strict Mode
In strict URPF mode, a data packet can pass the URPF check only when the forwarding table
contains a matched entry and the outbound interface of the entry matches the inbound
interface of the packet.
After interface-based strict URPF is enabled on a router, the router searches the routing table
for a matched entry based on the source IP address and the VRF (index of a VPN) of a
received data packet. If the router finds such an entry, it compares the outbound interface of
the entry with the inbound interface of the packet. If the two interfaces match, the packet
passes the URPF check and is forwarded normally. If the router finds no such entry, or the
outbound interface of the entry does not match the inbound interface of the packet, the router
considers the source address of the data packet a bogus source address and discards the
data packet.
If there is only one path between two network edge routers, routes are symmetric and strict
URPF safeguards the network to the greatest extent.
Loose Mode
In loose URPF mode, a packet can pass the URPF check as long as there is a route with the
destination address being the source address of the packet, regardless of whether the outbound
interface of the route matches the inbound interface of the packet.
After interface-based loose URPF is enabled on a router, the router searches the routing table
for an entry based on the source IP address and the VRF (index of a VPN) of a received data
packet. If a matched entry is found, the data packet passes the URPF check and is forwarded
normally. If no matched entry is found, the source address of the packet is considered a
bogus source address, and the packet is discarded.
If there are multiple connections between two network edge devices, routes may be
asymmetric, but loose URPF still safeguards the network to a certain extent.
Strict Mode (Excluding the Default Route)

In strict URPF mode (excluding the default route), a data packet can pass the URPF check
only when the forwarding table contains a matched entry other than the default route and
the outbound interface of the entry matches the inbound interface of the packet.
After interface-based strict URPF (excluding the default route) is enabled on a router, the
router searches the routing table for a matched entry based on the source IP address and the
VRF (index of a VPN) of a received data packet. If the router finds such an entry and the entry
is not the default route, the router compares the outbound interface of the entry with the
inbound interface of the packet. If the two interfaces match, the packet passes the URPF
check and is forwarded normally. If no such entry is found in the routing table, the outbound
interface of the entry does not match the inbound interface of the packet, or only the default
route matches the search conditions, the router considers the source address of the data
packet a bogus source address and discards the data packet.
Loose Mode (Excluding the Default Route)

In loose URPF mode (excluding the default route), the ATN device does not check whether
the outbound interface of a route matches the inbound interface of a received packet. Instead,
if there is a route other than the default route to the source address of the packet, the device
lets the packet pass URPF verification.
Loose mode is set by interface. When receiving IP data packets through an interface, the ATN
device searches the routing table based on the source IP address and VRF (VPN index) of the
packets. If an entry other than the default route matches the search conditions, the packets
pass URPF verification and the device forwards them. If no entry matches, or only the
default route matches, the device treats the packets as having failed URPF verification and
discards them.
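The four URPF modes differ only in two switches: whether the inbound interface must match, and whether the default route counts as a match. The following sketch makes that explicit. It assumes a longest-prefix-match lookup in the routing table of the packet's VRF; the `Route` type, function names, and the example table are hypothetical.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    out_interface: str


def lookup(routes: list[Route], source: str) -> Optional[Route]:
    """Longest-prefix match of the source address (table of one VRF assumed)."""
    addr = ipaddress.ip_address(source)
    matches = [r for r in routes if addr in r.prefix]
    return max(matches, key=lambda r: r.prefix.prefixlen, default=None)


def urpf_pass(routes: list[Route], source: str, in_interface: str,
              strict: bool, allow_default: bool) -> bool:
    """Return True if the packet passes the URPF check in the selected mode."""
    route = lookup(routes, source)
    if route is None:
        return False  # no entry for the source address
    if route.prefix.prefixlen == 0 and not allow_default:
        return False  # only the default route matched, and it is excluded
    if strict and route.out_interface != in_interface:
        return False  # strict modes also require matching interfaces
    return True


# Illustrative table: a specific route via GE0/2/1 and a default route via GE0/2/0.
table = [
    Route(ipaddress.ip_network("2.2.2.0/24"), "GE0/2/1"),
    Route(ipaddress.ip_network("0.0.0.0/0"), "GE0/2/0"),
]
```

With this table, a packet sourced from 2.2.2.2 that arrives on GE0/2/0 fails the strict check but passes the loose check, and a packet whose source matches only the default route passes only in the modes that do not exclude the default route.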
14.3.3 Applications
NOTE
l IPv4 URPF can be applied globally or to a specific interface.

l IPv6 URPF can be applied to a specific interface. When an ATN 950B is equipped with
AND1CXPA/AND1CXPB board, IPv6 URPF can be globally applied on the ATN 950B.

Application of Strict URPF on an ISP Network
Figure 14-15 URPF application environment where a client is connected to the ISP network
through only one path
[Figure labels: Network A; Network B; Network C; NodeB; ATN-1 with URPF enabled on
GE0/2/0 and GE0/2/1; ISP; PE-1; PE-2; PE-3; packets with source 2.2.2.2 and
destination 3.3.3.3]
As shown in Figure 14-15, Network A and Network B are connected to ATN-1. URPF is
enabled on GE 0/2/0 and GE 0/2/1 of ATN-1 to protect the ISP network against source
address spoofing attacks from network A and network B.
Assume that a PC in network A sends a request packet with the forged source address
2.2.2.2 to network C. After receiving the request packet, ATN-1 performs the URPF
check based on the inbound interface and source address of the packet. ATN-1 then
finds that the request packet should enter through GE 0/2/1 but it enters through GE 0/2/0.
ATN-1 considers the source address of the packet a bogus source address and directly
discards the packet. In this manner, ATN-1 is protected against the source address spoofing
attack.
After normal packets sent by a PC in network B to network C pass the URPF check, the
packets are normally forwarded.
Application of Loose URPF on ISP Networks

Loose URPF is applicable to the scenario where a client is dual-homed to devices on an ISP
network and the scenario where a client is dual-homed to devices on different ISP networks.
l Scenario where a client is dual-homed to devices on an ISP network
Figure 14-16 URPF application environment where a client is dual-homed to devices on an
ISP network
[Figure labels: NodeB; two ATNs; CX-A; Server; addresses 1.1.1.1/24, 2.2.2.2/24, and
3.3.3.3/24; packets with source 2.2.2.2 and destination 3.3.3.3]
As shown in Figure 14-16, multiple connections are set up between the NodeB network and
an ISP network to ensure reliability. In this case, symmetric routes between the NodeB
network and the ISP network cannot be guaranteed, and loose URPF must be used.
l Scenario where a client is dual-homed to devices on different ISP networks
Figure 14-17 URPF application environment where a client is dual-homed to devices on
different ISP networks
[Figure labels: Network A; NodeB; ATN A; ISP A; CX-B; ISP B; CX-C; Internet]
As shown in Figure 14-17, the NodeB network is connected to multiple ISP networks.
Therefore, it is difficult to ensure symmetrical routes between the NodeB network and two
ISP networks. Loose URPF must be used.
URPF applied in the scenario where a NodeB network is connected to multiple ISP networks
has the following characteristics:
l If a special packet must pass the URPF check under all conditions, you can
specify its source address in an ACL.
l Many users' ATNs may have only one default route to the ATN of an ISP network.
Therefore, default route entries should be configured.

14.4 Local Attack Defense
14.4.1 Introduction
Definition
The local attack defense feature restricts the packets to be sent to the CPU of an ATN device
to protect services on the ATN device.
Local attack defense functions are as follows:
l Management and control plane protection: defines an interface-based or global attack
defense policy to prevent the interface or ATN device from receiving attack packets.
l Whitelist-based application layer association: configures a whitelist of authorized users
and allows the packets matching the whitelist to be preferentially sent to the CPU.
l CP-CAR: restricts the rate at which packets are sent to the CPU. If both the number of
dropped packets within a period and the CPU usage reach the configured thresholds, an
alarm is generated.
l Attack source tracing: records information about attack packets received by each module
and automatically identifies the attack sources for attack suppression.
Purpose
As networks develop and become widely used, users are increasingly concerned about
keeping their private data and resources confidential and secure in an open network
environment.
Therefore, protecting the CPU is necessary and important for the device to process and
respond to normal services. Both valid packets and attack packets that are destined for the
CPU exist on the network. Attack packets destined for the CPU can interrupt services or even
paralyze the system. Moreover, a burst of valid packets will increase the CPU usage,
deteriorating the CPU performance.
The local attack defense feature on an ATN device targets the packets to be sent to the ATN
CPU in an attempt to protect running services and prevent mutual impact of services on each
other in the case of attacks.
Local attack defense on the ATN device offers the following functions:
l Restricts users' remote access to the ATN device through unauthorized interfaces.
l Restricts control packets received on unauthorized interfaces.
l Ensures that packets matching the configured whitelist are preferentially sent to the
CPU.
l Restricts the rate at which packets are sent to the CPU.
l Records information about attack packets.

Benefits
The local attack defense feature offers the following benefits to carriers:
l Services are not interrupted or affected when the ATN device is under attack, improving
the ATN device's working duration and carriers' service capabilities.
The local attack defense feature offers the following benefits to users:
l Attacks on the ATN device are restricted, improving user information security and
bandwidth usage.
14.4.2 Principle of Device Security

This section describes the basic principle of local attack defense.
Local attack defense applies in the following scenario:
Figure 14-18 Local attack defense application
ATN-A
GE1
1.1.1.1/24
Internet
ATN-B ATN-C
GE2 GE3
2.2.2.2/24 3.3.3.3/24
When under attacks from the Internet, ATN-A applies local attack defense to sustain its
service continuity and protect the communication with ATN-B and ATN-C.
14.4.2.1 Management and Control Plane Protection
Management and control plane protection has two functions. One function is to specify some
interfaces as management interfaces and enable the other interfaces to discard all received
management packets. In this manner, management and control plane protection can prevent
attackers from remotely controlling the ATN device. The other function is to control protocol
packets at the software layer. By applying policies globally or to specific interfaces,
management and control plane protection can flexibly specify the types of protocol packets
for an interface on the ATN device. If no active interface on the device has FTP, SSH, SNMP,
Telnet, or TFTP enabled, the command for disabling that protocol globally does not take
effect. For example, if no active interface on the device has FTP enabled, the command for
disabling FTP globally does not take effect. This prevents the device from being disconnected.
14.4.2.2 Attack Source Tracing
The attack source tracing module can be considered a powerful log processing center. It
records information about attack packets coming from each function module and
sorts the attack packets by timestamp in its buffer area. In addition, it supports exact
query and fuzzy query of information about the attack packets and retains the information
after the device is reset. Users can export the information in a standard format to the CF card
on the system control board by running a specific command.
Currently, information recorded by attack source tracing cannot be saved on an independent
server.
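As a rough illustration of a timestamp-ordered record buffer with exact and fuzzy queries, consider the sketch below. The record fields, the class name, and the interpretation of "fuzzy query" as a substring match on the source address are all assumptions made for the example.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class AttackRecord:
    timestamp: float                    # the only field used for ordering
    module: str = field(compare=False)  # function module that reported the packet
    source: str = field(compare=False)  # source address of the attack packet


class TraceBuffer:
    """Keeps attack records ordered by timestamp, whatever the arrival order."""

    def __init__(self) -> None:
        self._records: list[AttackRecord] = []

    def record(self, rec: AttackRecord) -> None:
        bisect.insort(self._records, rec)

    def all(self) -> list[AttackRecord]:
        return list(self._records)

    def exact(self, source: str) -> list[AttackRecord]:
        """Exact query: the source address must match completely."""
        return [r for r in self._records if r.source == source]

    def fuzzy(self, fragment: str) -> list[AttackRecord]:
        """Fuzzy query (assumed here to be a substring match)."""
        return [r for r in self._records if fragment in r.source]


buf = TraceBuffer()
buf.record(AttackRecord(30.0, "arp", "10.1.1.2"))
buf.record(AttackRecord(10.0, "dhcp", "10.1.2.7"))
buf.record(AttackRecord(20.0, "arp", "192.0.2.9"))
```

Records reported out of order by different modules still come back sorted by timestamp, and both query styles return results in that order.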
14.4.2.3 CP-CAR
CP-CAR restricts the rate at which packets are sent to the CPU to protect the CPU.
Rate Limitation on Packets to Be Sent to the CPU

Figure 14-19 shows how rate limitation is implemented for packets to be sent to the CPU.
Figure 14-19 Rate limitation on packets to be sent to the CPU
Rate limitation can be implemented on packets first by a specific protocol and then by all
protocols.
l Rate limitation on protocol-specific packets: Specific bandwidth is configured for
packets of a specific protocol, such as ARP and DHCP, so that packets of this protocol
do not preempt bandwidth that is intended for other protocol packets.
l Rate limitation on all protocol packets: The excess packets over the rate limit are
dropped so that the CPU is protected against overloads. After packets of each protocol
are transmitted at a protocol-specific rate, these packets are added to 14 protocol groups,
for each of which bandwidth is assigned based on its weight. Packets in each protocol
group are placed in eight queues for transmission to ensure that packets of different
protocols do not affect each other.
The alarm function accompanies rate limitation.
l Rate limitation on protocol-specific packets: If both the number of protocol-specific
packets sent to the CPU and the CPU usage reach the configured thresholds within a
certain period, an alarm is generated.
l Rate limitation on all protocol packets: If the number of all protocol packets sent to the
CPU and the CPU usage reach the configured thresholds within a certain period, an
alarm is generated.
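The two-stage limitation (per protocol first, then across all protocols) and the two-condition alarm can be sketched as follows. A token bucket is used here purely as an assumed rate-limiting mechanism; the text does not state how the device implements CAR. The weighted protocol groups and queues are omitted, and all names and rates are hypothetical.

```python
from __future__ import annotations


class TokenBucket:
    """Simple token bucket: `rate_pps` tokens per second, at most `burst` tokens."""

    def __init__(self, rate_pps: float, burst: float) -> None:
        self.rate = rate_pps
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now: float) -> bool:
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CpCar:
    """Stage 1: protocol-specific limit. Stage 2: limit over all protocols."""

    def __init__(self, per_protocol: dict[str, TokenBucket], overall: TokenBucket) -> None:
        self.per_protocol = per_protocol
        self.overall = overall
        self.dropped = 0

    def admit(self, protocol: str, now: float) -> bool:
        bucket = self.per_protocol.get(protocol)
        if bucket is not None and not bucket.allow(now):
            self.dropped += 1
            return False
        if not self.overall.allow(now):
            self.dropped += 1
            return False
        return True


def alarm_needed(packet_count: int, cpu_usage: float,
                 count_threshold: int, cpu_threshold: float) -> bool:
    """An alarm is raised only when both thresholds are reached in the period."""
    return packet_count >= count_threshold and cpu_usage >= cpu_threshold


car = CpCar({"arp": TokenBucket(rate_pps=1, burst=2)}, TokenBucket(rate_pps=100, burst=100))
burst_result = [car.admit("arp", 0.0) for _ in range(5)]
```

In this sketch a burst of five ARP packets at the same instant admits two and drops three, while a packet of another protocol is still admitted because the ARP flood cannot consume more than its own share.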
14.4.2.4 Whitelist-based Application Layer Association

You can configure whitelist-based application layer association to prevent specific
connections from being interrupted due to protocol packet attacks.
When an ATN device establishes connections to other devices, the trusted protocol packets
transmitted for this purpose are added to the whitelist so that they are preferentially sent to the
CPU to ensure the security and maintainability of the connections.
After the ATN device and another device are disconnected, the ATN device deletes the
corresponding trusted protocol packets from the whitelist.
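The whitelist lifecycle described above (add on connection, remove on disconnection, prefer whitelisted packets) can be sketched as below. The class, the `(peer, protocol)` key, and the "high"/"normal" priority labels are hypothetical simplifications.

```python
from __future__ import annotations


class SessionWhitelist:
    """Tracks trusted (peer, protocol) pairs for established connections."""

    def __init__(self) -> None:
        self._entries: set[tuple[str, str]] = set()

    def on_connected(self, peer: str, protocol: str) -> None:
        # A connection was established: trust its protocol packets.
        self._entries.add((peer, protocol))

    def on_disconnected(self, peer: str, protocol: str) -> None:
        # The connection ended: the entry is deleted from the whitelist.
        self._entries.discard((peer, protocol))

    def priority(self, peer: str, protocol: str) -> str:
        """Whitelisted packets are sent to the CPU preferentially."""
        return "high" if (peer, protocol) in self._entries else "normal"


wl = SessionWhitelist()
wl.on_connected("10.1.1.2", "telnet")
```

After `wl.on_disconnected("10.1.1.2", "telnet")`, packets from that peer are handled at normal priority again.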
14.4.2.5 Alarm
If the number of discarded packets exceeds the threshold in a period, the device with the
security function generates alarms that inform the NMS of massive packet loss that must be
handled.
If the number of discarded packets is under the threshold, the alarm is cleared.
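The raise and clear behaviour can be sketched as a small state machine. The class name and the returned event strings are hypothetical; a count exactly equal to the threshold changes nothing in this sketch, because the text only covers counts above and below it.

```python
from __future__ import annotations

from typing import Optional


class DropAlarm:
    """Raises an alarm when drops exceed the threshold and clears it when they fall below."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.active = False

    def update(self, discarded_in_period: int) -> Optional[str]:
        """Feed the drop count of one period; return "raise", "clear", or None."""
        if discarded_in_period > self.threshold and not self.active:
            self.active = True
            return "raise"
        if discarded_in_period < self.threshold and self.active:
            self.active = False
            return "clear"
        return None


alarm = DropAlarm(threshold=100)
events = [alarm.update(n) for n in (50, 150, 200, 100, 20, 10)]
```

The alarm is reported once when the count first exceeds the threshold and cleared once when it drops below it, rather than being repeated every period.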
14.4.3 Applications
14.4.3.1 Whitelist-based Application Layer Association

You can configure the whitelist-based application layer association function to improve the
security and maintainability of established connections.
In Figure 14-20, a user accesses the ATN device using Telnet. When an attacker sends a large
number of Telnet protocol packets to the ATN device to exhaust the channel resources, the
Telnet protocol packets used for the user's login are not allocated sufficient channel
resources and therefore cannot reach the CPU. As a result, the Telnet connection between the
user and the ATN device is interrupted, and the user is logged out.

Figure 14-20 Whitelist-based application layer association application
To resolve the problem shown in Figure 14-20, enable the whitelist-based application layer
association function to add the user to the whitelist. This prevents the user's connection to the
ATN device from being interrupted.
14.4.3.2 CP-CAR
CP-CAR is used to protect the ATN device's CPU so that the CPU can properly work even
under attacks.
In Figure 14-21, the ATN device needs to process OSPF protocol packets transmitted from the
network. An attacker sends a large number of attack packets to the ATN device to consume
excessive CPU resources, so that the ATN device cannot process OSPF protocol packets
transmitted on the network side. As a result, the OSPF
connection fails to be established between the ATN device and the network-side device.
Figure 14-21 CP-CAR application
After CP-CAR is configured on the ATN device, attack packets are sent to the CPU at a
restricted rate so that sufficient CPU resources are allocated to OSPF protocol packets
transmitted on the network side. In this manner, OSPF protocol packets can be properly
transmitted between the ATN device and network-side device, allowing valid OSPF users to
access the Internet through the ATN device.

This section lists the acronyms and abbreviations in the document.

14.4.4.1 Abbreviations
None
14.5 Mirroring
14.5.1 Introduction to Mirroring

This section describes the definition, purpose, and advantages of mirroring.
Definition
NOTE
The mirroring feature may be used to analyze the communication information of terminal customers for
a maintenance purpose. Before enabling the mirroring function, ensure that it is performed within the
boundaries permitted by applicable laws and regulations. Effective measures must be taken to ensure
that information is securely protected.
Mirroring is a method to capture packets received or sent on interfaces without affecting
packet forwarding. It is used to locate faults on a network.
Mirroring can be classified into the following types based on the locations of the mirrored
port and the observing port:
l Local mirroring
In local mirroring, traffic on an interface is copied and sent through another interface of a
local device. As shown in Figure 14-22, Port 1 is the local mirrored port and Port 3 is
the local observing port.
Figure 14-22 Networking diagram of local mirroring

[Figure labels: ATN-A; Port 1; Port 2; Port 3; Network 1; Network 2; Analyzer; imported
packets; forwarded packets; mirrored packets]
Mirroring can also be classified into the following types based on mirroring policies:
l Port mirroring: All traffic received and sent on a port is mirrored.

l Flow mirroring: Traffic is mirrored only when it meets the configured filtering rules.
Figure 14-23 shows certain terms used in mirroring.

[Figure 14-23 labels: source ATN (ATN-A); destination ATN (ATN-B); MPLS/GRE tunnel;
GE0/2/0; GE0/2/1; GE0/2/2; mirrored port; local mirroring source; remote mirroring source;
local observing port; analyzer; forwarded flow; mirroring flow; mirrored packet
encapsulations Label/VC/L2/L3, GRE/VC/L2/L3, and L2/L3]
l Local mirroring source: In local mirroring, it is an interface whose packets are copied
and then sent out through a specified local observing port. As shown in Figure 14-23,
GE 0/2/0 of ATN-B is a local mirroring source.
l Local observing port: It is an interface for sending the mirrored traffic. As shown in
Figure 14-23, GE 0/2/1 of ATN-B is a local observing port.
Functions supported in both local mirroring and remote mirroring are as follows:
l Receives and sends data based on physical ports.

l Receives and sends data based on VLAN sub-interfaces.
l Mirrors packets based on traffic classification.
Purposes
When connected to the Internet, a device faces various attacks. The self-protection
capability of the device must therefore be enhanced by analyzing attack packets promptly to
eliminate attack threats, filtering attack packets before attacks take effect, and tracing
attack sources to prevent the same attacks from recurring.
Benefits
Benefits Brought to Users
Customers and customer service personnel apply mirroring to locate and analyze network
problems.
14.5.2 Principle
This section describes the basic principle and application of local mirroring.

14.5.2.1 Principle of Local Mirroring
l Receives and sends data based on physical ports.

– Mirrors packets received by the mirrored port and sends the packets through a
specified observing port.
– Mirrors packets sent by the mirrored port and sends the packets through a specified
observing port.
l Receives and sends data based on VLAN sub-interfaces.
– Filters mirrored packets through VLANs.
– Completely mirrors only Ethernet packets containing an outer tag equal to a
specified VLAN ID.
– Sends these packets through a specified observing port.
l Mirrors packets based on traffic classification.
– Filters packets based on QoS complex traffic classification, such as source IP
addresses and destination IP addresses; completely mirrors packets; sends these
packets through a specified observing port.
l Sets the CAR for mirrored traffic.
– Sets the CIR (ranging from 100 Kbit/s to 2500000 Kbit/s) of the CAR to minimize
the impact of mirrored packets on normal forwarding.
l Supports mirroring between different types of interfaces.
l Supports user-defined offset for interface-based mirroring.
NOTE
Among the ATN 950B series, only the ATN 950B (with the AND2CXPB/AND2CXPE
configured) supports this function.
l Supports the statistics function for mirroring.
14.5.2.2 Application
Application of Local Mirroring

[Figure labels: ATN-A; Port 1; Port 2; Port 3; Network 1; Network 2; Analyzer; imported
packets; forwarded packets; mirrored packets]
As shown in Figure 14-24, Port 1 on ATN-A is a mirrored port that mirrors received packets
and Port 3 is an observing port.

14.6 Online Packet Head Capture
14.6.1 Introduction
Definition
NOTE
The NetStream function conforms to IETF RFC3954. For security risks, see IETF RFC3954. This
function involves analyzing the communications information of terminal customers. Before enabling the
function, ensure that it is performed within the boundaries permitted by applicable laws and regulations.
Effective measures must be taken to ensure that information is securely protected.
Online packet head capture is used to intercept sent and received packets for analysis.
Purpose
When network devices send or receive incorrect packets or drop packets, the quality of voice
services on the network will deteriorate and mosaics will occur, affecting network
performance. To solve the preceding problems, you can use the online packet head capture
function.
In the traditional method for obtaining packet information, files on a device need to be
exported and maintenance engineers must connect devices or modify configurations on site.
This incurs high labor costs and increases device operating risks. The online packet head
capture function is easy to configure and allows you to quickly capture packet heads for a
specified problem.
You can check the packet heads captured on a device without exporting files from the device,
shortening fault locating time and improving fault locating efficiency.
Benefits
This function brings the following benefits to carriers:
l This function can be deployed remotely to facilitate packet head capture for devices
deployed far away.
l The topology of devices does not need to be modified, shortening location time.
l Compared with the mirroring function, online packet head capture does not need
observing interfaces, saving interface resources.
14.6.2 Principles
Principles
Online packet head capture is similar to the mirroring function. Packets are processed using
the mirroring process. The captured packet heads are then sent to the CPU of an MPU and
stored in the CPU memory and on the CF card.

Online Packet Head Capture of Forwarding Packets

Online packet head capture can be used directly to capture packet heads of forwarding
packets. The speed for capturing packet heads of forwarding packets can be limited by setting
the CIR value of CAR for sending the packets to a CPU.
Online Packet Head Capture of Packets Sent to a CPU

In the process of sending packets to a CPU, the packets are replicated based on system IDs
that are defined to prevent malicious attacks. Then the replicated packets are sent to the CPU
of an MPU.
A device can capture packet heads of various protocol packets sent to a CPU, including BGP,
LDP, FTP, and Telnet packets.
Profile Instance
The profile instance for online packet head capture defines the following parameters:
l Duration
Indicates the duration for capturing packet heads.
l Number of captured packet heads
Indicates the maximum number of packet heads to be captured.
l Size of the packet head capture file
Indicates the size of a packet head capture file (storing captured packet heads). Packet
heads will no longer be captured if the size of the packet head capture file exceeds the
specified value.
l Length of packet head
Indicates the length of each captured packet head.
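The four profile parameters act as independent stop or truncation conditions. The sketch below shows one way they could interact; the class and field names are hypothetical, and the units (seconds and bytes) are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CaptureProfile:
    duration_s: float    # how long capturing may run
    max_packets: int     # maximum number of packet heads to capture
    max_file_bytes: int  # capture stops once the file size exceeds this value
    head_length: int     # number of bytes kept from the start of each packet


class HeadCapture:
    def __init__(self, profile: CaptureProfile, start_time: float) -> None:
        self.profile = profile
        self.start = start_time
        self.heads: list[bytes] = []
        self.file_bytes = 0

    def offer(self, packet: bytes, now: float) -> bool:
        """Capture the head of `packet` if the profile still allows it."""
        p = self.profile
        if now - self.start > p.duration_s:
            return False  # capture duration has elapsed
        if len(self.heads) >= p.max_packets:
            return False  # enough packet heads captured
        if self.file_bytes > p.max_file_bytes:
            return False  # capture file exceeds the specified size
        head = packet[:p.head_length]  # keep only the packet head
        self.heads.append(head)
        self.file_bytes += len(head)
        return True


profile = CaptureProfile(duration_s=10, max_packets=3, max_file_bytes=1000, head_length=4)
cap = HeadCapture(profile, start_time=0.0)
first = cap.offer(b"abcdefgh", now=1.0)
```

Here only the first four bytes of each packet are stored, and capturing stops as soon as any one of the duration, count, or file-size limits is hit.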
Online Packet Head Capture Based on ACL Rules

A device supports online packet head capture based on ACL rules.
l Packets sent to the CPU can be filtered by a basic or advanced ACL rule.
l Packets forwarded based on IPv4 can be filtered by a basic or advanced ACL rule, or an
Ethernet frame header-based ACL rule.
14.6.3 Applications
The voice service quality of a subscriber attached to a PE on the live network deteriorates and
troubleshooting is required, as shown in Figure 14-25.
To quickly locate the network fault, a network engineer can deploy the remote packet head
capture function on that PE and capture packet heads of packets sent to the CPU of that PE for
processing. Then the network engineer can analyze captured packet heads for fault
rectification.

Figure 14-25 Network diagram for remote packet head capturing
[Figure labels: ISP networks; CE; office; Internet; PE; engineer; BGP, LDP, FTP, and Telnet
packets; forwarding packet; captured forwarding packet; CPU processed packet; captured
CPU processed packet]
14.7 MPAC
14.7.1 Introduction
Definition
The Management Plane Access Control (MPAC) feature protects devices from attacks.
MPAC enables devices to filter packets destined for the CPU based on rules specified in an
MPAC policy and discard unneeded packets, which helps prevent attacks to the CPU.
Purpose
On an Internet service provider (ISP) network, user-side interfaces on a local device receive a
great number of packets to be forwarded to the CPU. Some packets attempt to initiate attacks
to the CPU. If too many packets rush to the CPU, CPU usage increases sharply and device
performance deteriorates, which then affects services running on the device. Frequently
sending attack packets to the CPU causes the CPU to be busy processing packets, which
affects other services or even causes a system crash.
An MPAC policy can be configured to allow the device to send valid packets to the CPU and
to discard attack packets, which prevents attacks to the CPU. MPAC is enabled to protect
TCP/IP-based control plane protocols against Denial of Service (DoS) attacks. For example,
an attacker keeps sending packets to a device by simulating a routing protocol. The device
receives and processes the attack packets as valid packets. As a result, the device becomes
extremely busy, and its CPU usage increases. To prevent CPU overload, you can set an MPAC
rule to enable the device to drop forged packets destined for the CPU.

14.7.2 Principles
On an Internet service provider (ISP) network shown in Figure 14-26, user-side interfaces on
a local device receive a great number of packets to be forwarded to the CPU. Some packets
attempt to initiate attacks to the CPU.
Figure 14-26 MPAC networking
[Figure: ATN A sits between the user-side network and the network-side network.]
Attack packets destined for the CPU on a device pose the following threats to the device:
l A great number of packets sent to the CPU are likely to cause a sharp spike in CPU
usage. If the CPU is overloaded, device performance deteriorates, and services may be
interrupted.
l Malicious packets allowed to reach the CPU consume resources, which causes a service
interruption or even a system crash.
To prevent CPU resource exhaustion and keep the network stable, configure an MPAC policy
on sub-interfaces, on physical interfaces, or globally for the entire device. The rules in the policy
determine whether protocol-specific packets with the specified source and destination addresses can be
sent to the service module.
l If packets match a rule and the behavior in the rule is "permit", the packets are sent to the
service module.
l If packets match a rule and the behavior in the rule is "deny", the packets are discarded.
l If packets do not match rules in the policy, the packets are sent to the service module.
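The three outcomes above can be sketched in a few lines of Python. This is a minimal illustration, not the device implementation: the `Rule` fields, the `mpac_decision` name, and the first-match evaluation order are assumptions that the source does not specify.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List


@dataclass
class Rule:
    """Hypothetical MPAC rule: protocol plus source/destination prefixes."""
    protocol: str      # e.g. "tcp" or "ospf"; "any" matches every protocol
    source: str        # source prefix, e.g. "192.0.2.0/24"
    destination: str   # destination prefix
    action: str        # "permit" or "deny"

    def matches(self, protocol: str, src: str, dst: str) -> bool:
        return (
            self.protocol in ("any", protocol)
            and ip_address(src) in ip_network(self.source)
            and ip_address(dst) in ip_network(self.destination)
        )


def mpac_decision(rules: List[Rule], protocol: str, src: str, dst: str) -> str:
    """Return "send" or "discard" for a packet destined for the CPU."""
    for rule in rules:  # assumption: the first matching rule decides
        if rule.matches(protocol, src, dst):
            return "send" if rule.action == "permit" else "discard"
    return "send"       # packets that match no rule go to the service module


rules = [
    Rule("tcp", "192.0.2.0/24", "0.0.0.0/0", "deny"),
    Rule("tcp", "0.0.0.0/0", "0.0.0.0/0", "permit"),
]
print(mpac_decision(rules, "tcp", "192.0.2.5", "10.1.1.1"))
```

Note that the default for unmatched packets is to send them on, so a policy only blocks traffic that is named in an explicit "deny" rule.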
Figure 14-27 demonstrates how an MPAC-capable device processes packets. You can define
rules in an MPAC policy to meet site-specific requirements.

Figure 14-27 Packet processing
[Flowchart: Packets destined for the CPU pass three checks in order: sub-interfaces, interfaces, and
global. Each check asks "Is an MPAC policy applied at this level?" and, if yes, "Is the MPAC policy at
this level set to permit?". If the policy is not set to permit, the packets are discarded. The flow ends
with "Send the packets to the CPU".]
Terms
None


DoS Denial of Service
MPAC Management Plane Access Control
14.8 Keychain
14.8.1 Introduction
Definition
Applications, such as Routing Protocol Application (RPA), Transmission Control Protocol
(TCP), and signaling protocols (like LDP), exchange authenticated packets over the network
for security reasons, but the authentication mechanism in these applications is not robust.
Each application uses a constant authentication key unless the administrator of the network
changes the key manually. Manual authentication key change is a cumbersome procedure.
During the change, packets can be dropped, because it is very difficult to change the keys
instantaneously on all routers.
Another drawback of this type of authentication mechanism is that there is no central
application to control all the authentication functionality. Each application maintains its own
set of authentication rules. If there are many application instances that require the same set of
authentications, this results in duplication of data and processing.
This authentication system needs a mechanism to achieve centralization of all authentication
processing and dynamic change of authentication keys with little human intervention. To
achieve this, a new application called Keychain has been added to the system.
Keychain is a centralized application that provides authentication functionality to all
applications that require it. It also provides dynamic change of authentication keys to all
required applications.
Purpose
When routing applications communicate over a network, persons with malicious intent can
tamper with packets or pretend to be authenticated users. To detect modified messages and to
authenticate the sender, routing applications support message authentication by defining the
authentication rules statically. Each application may use different authentication rules, but
using the same authentication rule over a long period will eventually compromise security.
Manually changing the authentication rules on communicating peers simultaneously is error
prone.
If each application maintains its own set of authentication rules, multiple instances of the
same set of authentication information create duplication of data and processing across
networking applications.
Keychain centralizes the storage of authentication information and provides dynamic
modification of authentication information without human intervention for all applications
that need to perform authenticated communication.

14.8.2 Principles
14.8.2.1 Principles of Keychain
Keychain is a set of key-ids, each of which uniquely represents authentication information.

Authentication information includes the authentication password and algorithm. The dynamic
change of authentication information is achieved based on the send and receive time
associated with a key-id.
Active Send key-id: When the current system time is within the time range of the configured
send time of a key-id, that key-id is "send-active" provided that the key-id has already been
configured with an authentication algorithm and password. The authentication information
associated with this key-id is used by applications to generate Message Authentication Codes
(MACs) when sending packets.
Active Receive key-id: When the current system time is within the time range of the
configured receive time of a key-id, that key-id is "receive-active" provided that the key-id
has already been configured with an authentication algorithm and password. The
authentication information associated with this key-id is used by an application to validate the
MACs in the received packets.
The send and receive times can be configured in an absolute time range or periodic time
range. An absolute time range is configured in Coordinated Universal Time (UTC) format.
Periodic time ranges are Daily periodic, Weekly periodic, Monthly periodic, and Yearly
periodic, which means the key-id will be active periodically during certain hours of the day,
on certain days of the week, dates of the month, and months of the year, respectively.
In a Keychain, there can be only one active send key-id for any instant in time; active time
ranges for a send key-id must not overlap. Keychain supports a default send key-id which is
used as the active send key-id when no other key-id is active. Multiple receive key-ids can be
active at any time.
When the send key-id on a router changes, the corresponding receive key-id on the peer router
should change instantaneously. However, because of clock non-synchronization, there can be
a time lag between the changes of key-id on one router and another. During this period,
packets can be dropped because of inconsistent key-ids. To prevent this scenario and to
facilitate a smooth transition from one receive key-id to another, a grace period, or "receive
tolerance period", is allowed during which both key-ids are used.
The receive tolerance period is applicable only to receive key-ids. On both the start and end
time of a receive key-id, the receive time range is extended by a period equal to that of the
receive tolerance period. The receive tolerance configuration is maintained per Keychain.
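The selection of active key-ids can be sketched as follows. This is a simplified model that covers absolute time ranges only (periodic ranges are omitted); the `Key` class, its field names, and the half-open interval convention are illustrative assumptions rather than the device's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Key:
    """Hypothetical key-id entry with absolute send and receive ranges."""
    key_id: int
    send_start: datetime
    send_end: datetime
    recv_start: datetime
    recv_end: datetime
    is_default_send: bool = False


def active_send_key(keys: List[Key], now: datetime) -> Optional[Key]:
    """Return the single send-active key, else the default send key, else None."""
    for key in keys:
        if key.send_start <= now < key.send_end:
            return key
    for key in keys:
        if key.is_default_send:
            return key
    return None


def active_receive_keys(keys: List[Key], now: datetime,
                        tolerance: timedelta) -> List[Key]:
    """Return all receive-active keys; the tolerance widens both ends of each range."""
    return [
        key for key in keys
        if key.recv_start - tolerance <= now < key.recv_end + tolerance
    ]


day = datetime(2018, 1, 30)
noon = day + timedelta(hours=12)
keys = [
    Key(0, day, day, day, day, is_default_send=True),   # default key, no active range
    Key(1, day, noon, day, noon),
    Key(2, noon, day + timedelta(hours=24), noon, day + timedelta(hours=24)),
]
now = noon + timedelta(minutes=2)
print(active_send_key(keys, now).key_id)
print([k.key_id for k in active_receive_keys(keys, now, timedelta(minutes=5))])
```

Two minutes after the changeover, only key-id 2 is send-active, but with a five-minute receive tolerance both key-id 1 and key-id 2 are still accepted on receipt, which is what absorbs clock skew between peers.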
Keychain supports MD5, SHA1-12, HMAC-MD5, HMAC-SHA1-12, SHA2-256, HMAC-
SHA2-256 and HMAC-SHA1-20 as authentication algorithms. Keychain only maintains the
authentication algorithm; applications must support these authentication algorithms to use the
authentication functionality.
14.8.3 Applications
This section describes typical applications of Keychain.
14.8.3.1 Non-TCP Applications of Keychain

Non-TCP applications such as RIP and ISIS can initialize or de-initialize with the Keychain
module through the exposed initialization application programming interface (API) provided
with Keychain.
When an application needs to send packets, it performs the process shown in the following
figure.
Figure 14-28 Sending packets in Non-TCP application
[Figure: Application flow: trigger for sending packet, generate packet, generate MAC, formulate packet
with authentication information, send out packet. Exchanges with Keychain: request active key-id; send
active key-id information; provide packet data for which MAC has to be created; provide calculated MAC
value.]
1. Through the Keychain API, the application queries Keychain for the active send key-id.
When it receives the active key-id, the application constructs the packet data for which a
MAC needs to be calculated. Then it sends the packet data to Keychain.
2. Keychain generates a MAC for the packet data and sends the calculated MAC to the
application.
3. The application formulates a packet with authentication information and sends it out.
When an application receives a packet, it performs the process shown in the following figure.

Figure 14-29 Receiving packets in Non-TCP application
[Figure: Application flow: receive packet with authentication, parse packet, accept packet based on the
validation. Exchanges with Keychain: send information to Keychain for validation of received
authentication information; send success or failure. Keychain: validate the received authentication
information, re-calculate MAC and compare against the received MAC.]
l The application extracts the packet authentication information.

l The application sends the authentication information (Keychain name, packet data, key-
id, algorithm type, MAC) to Keychain.
l Keychain re-calculates a MAC and compares the generated MAC with the received
MAC. If the MACs match, Keychain returns a success message to the application.
Otherwise, Keychain returns a failure message.
l The application accepts or rejects the packet based on the Keychain validation.
When an application that does not carry the key-id in the packet, such as ISIS, receives a
packet, it performs the following process:
l The application extracts the authentication information and sends the information
(Keychain name, packet data, algorithm type, MAC) to Keychain for validation.
l Keychain re-calculates the MAC for each active receive key-id and compares them with
the MAC received in the packet. If the MACs match, then Success is returned to the
application; otherwise, failure is returned to the application.
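The send and receive exchanges described above can be sketched with Python's standard `hmac` module. This is an illustration under stated assumptions: HMAC-SHA2-256 is picked from the algorithms Keychain supports, and the function names and the dictionary of key-id to password are hypothetical stand-ins for the Keychain API.

```python
import hashlib
import hmac
from typing import Dict


def generate_mac(password: bytes, packet_data: bytes) -> bytes:
    """Keychain side: compute a MAC over the packet data (HMAC-SHA2-256 here)."""
    return hmac.new(password, packet_data, hashlib.sha256).digest()


def validate_with_key_id(keys: Dict[int, bytes], key_id: int,
                         packet_data: bytes, received_mac: bytes) -> bool:
    """Packet carries a key-id: recompute the MAC with that key and compare."""
    password = keys.get(key_id)
    if password is None:
        return False
    return hmac.compare_digest(generate_mac(password, packet_data), received_mac)


def validate_without_key_id(active_receive_keys: Dict[int, bytes],
                            packet_data: bytes, received_mac: bytes) -> bool:
    """Packet carries no key-id: try every active receive key-id."""
    return any(
        hmac.compare_digest(generate_mac(password, packet_data), received_mac)
        for password in active_receive_keys.values()
    )


keys = {1: b"old-secret", 2: b"new-secret"}   # hypothetical key-id -> password
packet = b"routing update"
mac = generate_mac(keys[2], packet)           # sender uses active send key-id 2
print(validate_with_key_id(keys, 2, packet, mac))
print(validate_without_key_id(keys, packet, mac))
```

`hmac.compare_digest` is used for the comparison because it runs in constant time, which avoids leaking how many leading bytes of a forged MAC were correct.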
14.8.3.2 TCP Applications of Keychain
In TCP application of Keychain, authentication is done at the TCP level, not at the application
level. An application specifies that TCP will use Keychain to extract authentication
information. TCP initializes or de-initializes itself with the Keychain module through the
exposed Keychain initialization API.
TCP uses the Enhanced Authentication Option for authenticated communication, as specified
in the TCPM Working Group draft (draft-bonica-tcp-auth-06.txt). The following figure shows
the Option format.

Figure 14-30 TCP Enhanced Authentication Option format
[Figure: option fields in order: Kind | Length | T | K | Alg-id | Res | Key-id, followed by
Authentication Data.]
Because the draft is not a standard yet, the Internet Assigned Numbers Authority (IANA) has
not defined the kind value (Option type) or the algorithm-id for some algorithms. As a result,
different vendors use different values. To interoperate with other vendors, the TCP kind value
and TCP algorithm-id are configurable and are maintained in Keychain.
The Keychain API provides a query function for applications to obtain TCP kind and
algorithm-id values.
When a TCP application needs to send packets, it performs the process shown in the
following figure.
Figure 14-31 Sending out packets in TCP application
[Figure: TCP application flow: trigger for sending packet, generate packet, generate MAC, formulate
packet with authentication information, send out packet. Exchanges with Keychain: request TCP kind and
TCP algorithm-id value; send TCP kind and TCP algorithm-id value; request active key-id; send active
key-id information; provide packet data for which MAC has to be created; provide calculated MAC value.]
1. To set the Enhanced Authentication Option, the application queries the Keychain module
to get the active send key-id authentication information.
2. From the authentication information obtained, the application generates packet data and
sends it to Keychain to generate a MAC. Keychain calculates the MAC and sends it to
the application.
3. The application fills in the TCP kind value, TCP algorithm-id that corresponds to the
active send key-id algorithm, and generated MAC in the Enhanced Authentication
Option format and sends out the packet.
When the TCP application receives a packet, it performs the process shown in the following
figure.

Figure 14-32 Receiving packets in TCP application
[Figure: TCP application flow: receive packet with authentication, parse packet, accept packet based on
the validation (on success). Exchanges with Keychain: send to validate the received authentication
information; send success or failure. Keychain: 1. Checks whether the TCP algorithm-id received in the
packet matches the TCP algorithm-id corresponding to the received key-id. 2. Validates the received
authentication information, re-calculates the MAC, and compares it against the received MAC.]
l The application extracts authentication information from the packet and provides it to
Keychain for validation.
l Keychain checks whether the TCP algorithm-id in the packet matches the TCP
algorithm-id that corresponds to the received key-id algorithm. If algorithm-ids do not
match, then a failure message will be returned.
l Keychain re-calculates the MAC and compares the generated MAC and received MAC.
If they match, then a success message is returned to the application; otherwise, a failure
message is returned.
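The two-stage check on the receive side (algorithm-id first, then MAC) can be sketched as below. This is a hedged illustration: the `KeyInfo` class, the function name, and the numeric algorithm-id values are hypothetical (the source states these values are configurable), and HMAC-SHA1-20 is assumed here to mean the full 20-byte HMAC-SHA1 output.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict


@dataclass
class KeyInfo:
    """Hypothetical authentication information stored under a key-id."""
    algorithm: str   # Keychain algorithm name, e.g. "hmac-sha1-20"
    password: bytes


# Hash functions for the two algorithms used in this sketch.
DIGESTS = {"hmac-md5": hashlib.md5, "hmac-sha1-20": hashlib.sha1}


def validate_tcp_auth(keys: Dict[int, KeyInfo], tcp_alg_ids: Dict[str, int],
                      key_id: int, alg_id: int,
                      data: bytes, received_mac: bytes) -> bool:
    """Return True only if the algorithm-id matches and the MAC verifies."""
    key = keys.get(key_id)
    if key is None:
        return False
    # Step 1: the algorithm-id in the option must map to the key-id's algorithm.
    if tcp_alg_ids.get(key.algorithm) != alg_id:
        return False
    # Step 2: recompute the MAC and compare it with the received MAC.
    expected = hmac.new(key.password, data, DIGESTS[key.algorithm]).digest()
    return hmac.compare_digest(expected, received_mac)


keys = {1: KeyInfo("hmac-sha1-20", b"secret")}
tcp_alg_ids = {"hmac-md5": 5, "hmac-sha1-20": 2}   # hypothetical configured values
segment = b"tcp segment"
mac = hmac.new(b"secret", segment, hashlib.sha1).digest()
print(validate_tcp_auth(keys, tcp_alg_ids, 1, 2, segment, mac))
```

Checking the algorithm-id before recomputing the MAC lets a mismatched configuration fail early, without spending a hash computation on the segment.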
Terms
Term                        Description
Keychain                    Set of key-ids with authentication information
Key-id                      Unique identifier for authentication information
Send time                   Duration for a key-id to be used as send key
Receive time                Duration for a key-id to be used as receive key
Authentication password     Key used for generating a MAC
Authentication algorithm    Algorithm used to generate a MAC to authenticate a message
TCP kind                    Option type value used in the TCP Enhanced Authentication Option
TCP algorithm-id            Value that indicates the algorithm used for the TCP Enhanced
                            Authentication Option

CBB Common Building Block
ISIS Intermediate System to Intermediate System
MAC Message Authentication Code
14.9 IPSec
NOTE
The ATN does not support data encryption on an IPsec VPN tunnel. To comply with RFC standards,
IPsec on the ATN applies only to the IPv4 PIM, IPv6 PIM, MLD, OSPFv3, RIPng protocol packets but
not to the transmitted data.
14.9.1 Introduction
Definition
Internet Protocol Security (IPSec) is a security protocol suite defined by the Internet
Engineering Task Force (IETF). IPSec secures data transmission on the Internet through data
origin authentication, data encryption, data integrity check, and anti-replay functions:
l Data origin authentication: The receiver checks the validity of the sender.
l Data encryption: The sender encrypts data packets and transmits them in cipher text on
the Internet. The receiver decrypts or directly forwards the received data packets.
l Data integrity check: The receiver validates received data to check whether the data has
been tampered with.
l Anti-replay: The receiver rejects old or duplicate packets to prevent attacks that
malicious users initiate by re-sending obtained data packets.
Purpose
On the Internet, most data is transmitted in plain text, causing security risks. For example,
bank accounts and passwords may be intercepted or tampered with, user identities may be
counterfeited, or bank networks may be attacked. IPSec can protect transmitted IP packets to
reduce the risk of information leaks.
Benefits
IPSec reduces the risk of information leaks and tampering, ensures data integrity and
confidentiality, and secures service transmission.
14.9.2 Principles
14.9.2.1 IPSec Basic Concepts
IPSec configurations include the security association (SA), security protocol, encapsulation
mode, authentication algorithm, and encryption algorithm.
SA
IPSec provides secure communication between IPSec peers (two communication ends).
An SA is an agreement between IPSec peers on the elements used to protect data, and it
underpins the essential functions of IPSec. It defines the security protocol to be applied, the
encapsulation mode, the authentication mode, and the shared key for data protection. The security
protocol can be either Authentication Header (AH) or Encapsulating Security Payload (ESP), and
the authentication mode can be Message Digest 5 (MD5), Secure Hash Algorithm 1 (SHA-1), or
SHA-2.
An SA is unidirectional, with incoming packets and outgoing packets being processed by
different SAs. Therefore, if two hosts (ATN A and ATN B) are to communicate through ESP,
two SAs must be set up on ATN A, one of which processes the outgoing packets and the other
processes the incoming packets. Similarly, two SAs must be set up on ATN B, as shown in
Figure 14-33.
Figure 14-33 Unidirectional logical connections of SAs
[Figure: On ATN A, the SA that processes outgoing packets pairs with the SA on ATN B that processes
incoming packets; on ATN A, the SA that processes incoming packets pairs with the SA on ATN B that
processes outgoing packets.]
SAs are protocol-specific. If ATN A and ATN B use both AH and ESP for secure
communication, four SAs are required on ATN A. Two SAs (one for incoming packets and
one for outgoing packets) are configured for AH and the other two SAs (one for incoming
packets and one for outgoing packets) are configured for ESP. Similarly, four SAs with
equivalent relationships are required on ATN B.
An SA is uniquely identified by a 3-tuple, which comprises the Security Parameter Index
(SPI), destination IP address, and security protocol (AH or ESP). The SPI is a 32-bit number
generated to identify an SA and is carried in the AH or ESP header during transmission.
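The counting rule (one SA per security protocol per direction) and the identifying 3-tuple can be sketched as follows. The `SaId` class, the `required_sas` helper, the addresses, and the SPI values are all hypothetical and serve only to illustrate the relationships described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SaId:
    """The 3-tuple that uniquely identifies an SA."""
    spi: int            # 32-bit Security Parameter Index
    destination: str    # destination IP address
    protocol: str       # "AH" or "ESP"


def required_sas(local: str, peer: str, protocols: List[str],
                 first_spi: int = 256) -> Dict[Tuple[str, str], SaId]:
    """Return the SAs one device needs: one per protocol per direction."""
    sas: Dict[Tuple[str, str], SaId] = {}
    spi = first_spi  # assumption: SPIs are simply numbered consecutively here
    for protocol in protocols:
        # An outgoing SA is addressed to the peer, an incoming SA to the local device.
        for direction, destination in (("outgoing", peer), ("incoming", local)):
            sas[(protocol, direction)] = SaId(spi, destination, protocol)
            spi += 1
    return sas


esp_only = required_sas("10.0.0.1", "10.0.0.2", ["ESP"])
ah_and_esp = required_sas("10.0.0.1", "10.0.0.2", ["AH", "ESP"])
print(len(esp_only), len(ah_and_esp))
```

Because the dataclass is frozen, each `SaId` is hashable and can serve directly as a lookup key when an incoming packet's SPI, destination address, and protocol are used to find its SA.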
Security protocol
IPSec ensures security by authenticating and encrypting data using two security protocols,
AH and ESP, the features of which are shown in Table 14-2.

Table 14-2 AH and ESP

Function: Data integrity authentication
AH: This function prevents data from being modified by unauthorized users during transmission.
Data security is ensured by using an authentication key and authentication algorithm that are
shared by sending and receiving parties.
Before transmitting data, the sending party calculates the data using the authentication key and
specified authentication algorithm. The sending party then sends the calculation result together
with the data packet to the receiving party. After receiving the packet, the receiving party uses the
same authentication key and authentication algorithm to calculate the data. If the result calculated
by the receiving party is the same as the calculation result sent by the sending party, the packet is
considered intact and not to have been modified during data transmission; otherwise, the packet is
considered to have been modified and is dropped.
If AH is used to protect data, the entire IP packet is authenticated.
ESP: This function prevents data from being modified by unauthorized users during transmission.
Data security is ensured by using an authentication key and authentication algorithm that are
shared by sending and receiving parties.
The authentication process is the same as that of AH, with the difference that, using ESP, all IP
packet contents except the IP header are authenticated. Therefore, AH provides a more secure
service than ESP.

Function: Data origin authentication
AH: This function checks that the party sending the data is authorized.
ESP: This function checks that the party sending the data is authorized.

Function: Data encryption
AH: -
ESP: This function prevents data from being intercepted and viewed illegally during transmission.
An encryption key and algorithm are shared by the sending party and receiving party.
The sending party encrypts data using the encryption key and a specified encryption algorithm and
then sends the encrypted data to the receiving party. After receiving the data, the receiving party
uses the same encryption key and encryption algorithm to decrypt and restore the data. Because the
data is encrypted before transmission, the contents of the packet are well protected during
transmission.
Encapsulation mode
IPSec currently supports the transport mode for encapsulation. With this mode, the AH or
ESP header is inserted following the IP header, but before all transport layer protocol headers
or all other IPSec protocol headers, as shown in Figure 14-34.
Figure 14-34 Format of a packet in transport mode
Protocol    Packet format in transport mode
AH          IP Header | AH | TCP Header | data
ESP         IP Header | ESP | TCP Header | data | ESP Tail | ESP Auth Data
The transport mode is applicable to a scenario in which two hosts, or a host and a security
gateway, are communicating with each other. In transport mode, the two devices encrypting
and decrypting packets must be the original packet sender and the final receiver, respectively.

Authentication and encryption algorithms

Authentication algorithm
Both AH and ESP can check the integrity of IP packets and determine whether a packet has
been modified during data transmission. Authentication is based on the hash function, a type
of algorithm that does not limit the length of input messages, but always outputs messages of
a certain length. The output message is called the message summary. To authenticate message
integrity, IPSec peers calculate the message summary based on the hash function. If the
message summaries are identical on both peers, the packet is considered intact and not to
have been modified. The following IPSec authentication algorithms are supported:
l MD5: generates a 128-bit message summary for an input message of any length.
l SHA-1: generates a 160-bit message summary for an input message of less than 2^64 bits.
l SHA2-256: generates a 256-bit message summary for an input message of less than 2^64
bits.
The message summary generated by SHA-2 and SHA-1 is longer than that of MD5, and
therefore provides a more secure service.
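The fixed output length of each hash function can be checked with Python's standard `hashlib` module. This is a small illustration only; the `digest_bits` helper is a hypothetical name, and the check says nothing about how the ATN computes its message summaries internally.

```python
import hashlib


def digest_bits(algorithm: str, message: bytes) -> int:
    """Return the length in bits of the message summary for the given input."""
    return len(hashlib.new(algorithm, message).digest()) * 8


# The output length depends only on the algorithm, not on the input length.
for algorithm in ("md5", "sha1", "sha256"):
    short = digest_bits(algorithm, b"")
    long_ = digest_bits(algorithm, b"x" * 10_000)
    print(algorithm, short, long_)
```

In this sketch the empty input and the 10,000-byte input produce summaries of identical length for each algorithm, which is the property the paragraph above relies on.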
Encryption algorithm
ESP encrypts an IP packet to prevent disclosure of the packet contents during transmission.
The encryption algorithm is implemented based on a symmetric key system, which encrypts
and decrypts data using the same key. IPSec uses the following encryption algorithms:
l Data Encryption Standard (DES): uses a 56-bit key to encrypt a 64-bit packet in
plaintext.
l Triple Data Encryption Standard (3DES): uses three 56-bit keys (in effect, a 168-bit key)
to encrypt a packet in plaintext.
l Advanced Encryption Standard (AES): uses a 128-bit/192-bit/256-bit key to encrypt
a packet in plaintext.
14.9.2.2 IPSec Implementation
Peers using IPSec perform various security functions for different data flows. The
implementation process of IPSec on the ATN is as follows:
l Define a security proposal, and specify the security protocol, authentication algorithm,
encryption algorithm, and encapsulation mode in the proposal.
l Define an SA, and specify the association relationship between the security protocols,
SPIs, and authentication keys.
l Apply the SA to the service type that requires protection.
Defining a Security Proposal

The security proposal prescribes the security protocol, authentication algorithm, encryption
algorithm, and encapsulation mode for service-based protection.

The ATN supports AH and ESP security protocols. AH and ESP can be used independently or
together. Both AH and ESP support three authentication algorithms (MD5, SHA-1 and
SHA-2), and ESP supports three encryption algorithms (DES, 3DES and AES).
A security proposal must be configured before it is associated with a service type.
Defining an SA
An SA imports the security proposal to specify the security protocol, authentication
algorithm, encryption algorithm, and encapsulation mode for service-based protection. An SA
determines the key for data authentication and encryption and is uniquely identified by the SA
name and SPI. Therefore, the key, SPI, and security proposal are required for an SA.
Applying an SA
Currently, IPSec configured on the ATN protects data based on the service type. Packets to be
protected are encapsulated and the receiving party drops the packets that are not protected by
IPSec or fail to be decapsulated. Service-based IPSec does not require an Access Control List
(ACL) to specify the data flow to be protected or a specific segment of an IPSec tunnel. It is
bound only to a specific service and protects all packets of this service, regardless of which
interface sends the packets.
14.9.3 Applications
14.9.3.1 IPSec Application in PIM
Service Overview
Protocol Independent Multicast (PIM) is the most widely used intra-domain multicast
protocol. PIM builds a multicast distribution tree (MDT) to forward multicast data, and
therefore requires a high level of protection. PIM itself does not define any authentication
mechanism. If no additional authentication mechanism is configured for PIM,
packets are liable to be intercepted, modified, or forged. This can affect PIM
neighbor relationships and interrupt multicast network communication.
IPSec can be used for authenticating PIM packets. An AH or ESP header inserted into a PIM
packet provides a basis for data origin authentication and data integrity authentication,
protecting PIM neighbor relationships and network communication.
On the network shown in Figure 14-35, the multicast service is deployed. ATN A sets up
PIM neighbor relationships with ATN B and ATN C, and the devices exchange PIM protocol
packets to maintain the neighbor relationships and multicast routing entries. Attackers on this
network may send forged PIM protocol packets to the ATNs, causing the ATNs to fail to forward
multicast data. To prevent such attacks, you can configure
PIM IP Security (IPSec) on the interfaces of ATN A, ATN B, and ATN C to authenticate the
IPv6 PIM protocol packets transmitted between them. This ensures normal multicast data
transmission, so that Receiver can receive multicast data
from Source.

Figure 14-35 Typical Networking
[Figure labels: Source, ATN A, ATN B, ATN C, two Receivers, Ethernet links, PIM SM, IPSec SA
Negotiation.]
Feature Deployment
IPSec can be deployed in a PIM process or on an interface.
PIM IPsec configured in the interface view has the same effect as that configured in the PIM
view, but their application scopes are different:
l PIM IPsec configured in the interface view: applies only to the current interface.
l PIM IPsec configured in the PIM view: applies to all interfaces.
PIM IPsec configured in the interface view takes precedence over PIM IPsec configured in
the PIM view. If no PIM IPsec configuration exists in the interface view, the interface uses the
PIM IPsec configuration in the PIM view.
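The precedence rule above amounts to a lookup with a fallback. The sketch below is illustrative only: the function name, the interface names, and the SA names are hypothetical and do not correspond to ATN commands.

```python
from typing import Dict, Optional


def effective_pim_ipsec_sa(interface: str,
                           interface_view_sas: Dict[str, str],
                           pim_view_sa: Optional[str]) -> Optional[str]:
    """Return the SA an interface uses: interface view first, then PIM view."""
    return interface_view_sas.get(interface, pim_view_sa)


interface_view_sas = {"GE0/2/0": "sa-interface"}   # hypothetical names
print(effective_pim_ipsec_sa("GE0/2/0", interface_view_sas, "sa-global"))
print(effective_pim_ipsec_sa("GE0/2/1", interface_view_sas, "sa-global"))
```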
As shown in Figure 14-35, IPSec is configured on all the interfaces so that PIM neighbor
relationships are set up only when IPSec authentication succeeds. Packets that fail IPSec
authentication, or that use an authentication mode different from the peer's, are dropped.

Abbreviation
3DES Triple Data Encryption Standard
AES Advanced Encryption Standard

AH Authentication Header
DES Data Encryption Standard
ESP Encapsulating Security Payload
HMAC Hash Message Authentication Code
IPSec Internet Protocol Security
MD5 Message Digest 5
OSPFv3 Open Shortest Path First version 3
SA Security Association
SHA Secure Hash Algorithm
SPI Security Parameter Index

15 User Management
About This Chapter
This chapter describes the overview, principles, and typical applications of the user management
feature.
15.1 AAA and User Management

15.2 DHCP
15.3 DHCPv6
Only the ATN 910/ATN 910I/ATN 910B/ATN 905/ATN 950B (with the AND2CXPB/
AND2CXPE configured) supports DHCPv6 Relay. Only the ATN 910I/ATN 910B/ATN
905 support the DHCPv6 Client.
15.4 Plug-and-Play
15.5 DCN
15.6 PPPoE Access
PPPoE configuration is supported only by the ATN 905/ATN 910I/ATN 910B.
15.7 PPPoE+
Only ATN 910B supports PPPoE+.
15.8 802.1x Access
AND2CXPE configured) supports this command.
15.9 Attributes List of RADIUS, HWTACACS
This section lists the attributes of RADIUS, HWTACACS.
15.1 AAA and User Management

15.1.1 Introduction to AAA and User Management
Definition
AAA, short for Authentication, Authorization, and Accounting, provides the following types
of security functions:
l Authentication: determines the users who can access the network.

l Authorization: authorizes users to use specific services.
l Accounting: records the utilization of network resources.
The ATN implements Authentication and Authorization through the Remote Authentication
Dial in User Service (RADIUS) protocol or the Huawei Terminal Access Controller Access
Control System (HWTACACS) protocol.
l RADIUS
RADIUS is one of the most commonly used protocols to implement AAA. As an
application-layer protocol running between the ATN and a RADIUS server, RADIUS
defines the procedure for transmitting user information and accounting information
between the ATN and the RADIUS server and the format of packets exchanged between
them.
l HWTACACS
AAA can also be implemented through HWTACACS. HWTACACS is an enhancement
of TACACS, an access control protocol defined in RFC 1492. Similar to RADIUS,
HWTACACS adopts the client/server model to communicate with the HWTACACS
server, thereby implementing Authentication and Authorization for various users.
Currently, the device manages users in the following modes:
l Domain-based user management

Every user belongs to a domain. By default, if the user account entered when a
user accesses the device does not contain a domain name, the user is
added to the default domain. The BRAS manages users by configuring service attributes
for a domain. Therefore, the users in the same domain have the same service attributes.
l User account-based user management
User accounts and related service attributes are configured on an AAA server such as the
RADIUS server or the HWTACACS server, and are then delivered to users when the
users get online.
In actual applications on the ATN (except those that use non-accounting), all user
accounts must be configured on an AAA server, and all the domains to which the user
accounts belong must be configured on the ATN. The ATN supports the configuration and
management of local user accounts.
Generally, the service attributes configured in a domain have a lower priority than the
service attributes delivered by an AAA server. Therefore, when service attributes are both
configured for a domain and delivered by an AAA server, the ATN adopts the service
attributes that are delivered by the AAA server. The service attributes configured for a domain
take effect only when the AAA server does not support or deliver the service attributes.
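The priority between domain-configured and server-delivered service attributes can be sketched as a dictionary merge. The function name and the attribute names are hypothetical; `None` is used here as an assumed marker for an attribute the AAA server does not deliver.

```python
from typing import Dict, Optional


def effective_service_attributes(
        domain_attrs: Dict[str, str],
        server_attrs: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Merge attributes so that values delivered by the AAA server win."""
    merged = dict(domain_attrs)
    # Only attributes the server actually delivered override the domain values.
    merged.update({k: v for k, v in server_attrs.items() if v is not None})
    return merged


domain_attrs = {"idle-timeout": "600", "user-priority": "3"}   # hypothetical names
server_attrs = {"idle-timeout": "300", "user-priority": None}
print(effective_service_attributes(domain_attrs, server_attrs))
```

In this example the server-delivered idle timeout replaces the domain value, while the priority falls back to the domain configuration because the server delivered nothing for it.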

Purpose
The ATN implements Authentication, Authorization, and Accounting (AAA) through either
RADIUS or HWTACACS.
The ATN supports domain-based or user account-based user management and supports
multiple authentication and accounting policies.
Benefits
This feature brings the following benefits to carriers:
l Access users are identified to guarantee legal service access.
l Authorities of access users are controlled through domain-based user management.
l The reliability of access user accounting is ensured through the RADIUS or
HWTACACS accounting protocol, and through the local accounting function if
remote accounting fails.
15.1.2 Principles
15.1.2.1 AAA
Authentication
The ATN supports the following authentication modes. The modes can be used in
combination.
l Local authentication
In this mode, user information, including the user name, password, and attributes, is
configured on the ATN. This mode features fast processing speed and low operation
costs. The major limitation is that the information storage capacity is subject to the
capacity of device hardware.
l Remote authentication
In this mode, user information, including the user name, password, and attributes, is
configured on an authentication server. The ATN supports remote authentication through
RADIUS or HWTACACS. As a client, the ATN communicates with the RADIUS or
HWTACACS server. The RADIUS protocol can be either a standard RADIUS protocol
or an extended RADIUS protocol of Huawei, that is, RADIUS+V1.0 or RADIUS+V1.1.
l First local authentication and later remote authentication
It is a local-authentication-preferred policy. That is, remote authentication is performed
only if the user name does not exist locally.
l First remote authentication and later local authentication
It is a remote-authentication-preferred policy. That is, local authentication is performed
only after the AAA server gives no response.
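The two combined modes above can be illustrated with a minimal Python sketch. It is not the ATN implementation. The account table LOCAL_USERS and the remote callback are hypothetical, and a return value of None from the callback is assumed to model "the AAA server gives no response".

```python
from typing import Callable, Optional

# Hypothetical local account table: user name -> password.
LOCAL_USERS = {"admin": "secret"}

# A remote check returns True/False, or None when the server gives no response.
RemoteCheck = Callable[[str, str], Optional[bool]]

def local_first(user: str, password: str, remote: RemoteCheck) -> bool:
    """Local-preferred: go remote only if the user name does not exist locally."""
    if user in LOCAL_USERS:
        return LOCAL_USERS[user] == password
    return bool(remote(user, password))

def remote_first(user: str, password: str, remote: RemoteCheck) -> bool:
    """Remote-preferred: fall back to local only if the server gives no response."""
    result = remote(user, password)
    if result is None:
        return LOCAL_USERS.get(user) == password
    return result
```

Note that in the local-preferred sketch a wrong password for an existing local user is rejected without consulting the server, which follows the wording "only if the user name does not exist locally".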
Authorization
The ATN supports the following authorization modes during user login:

l Local authorization
In this mode, users are authorized based on the attributes of local user accounts
configured on the ATN.
l HWTACACS authorization
In this mode, users are authorized through a HWTACACS server.
l If-authenticated authorization
In this mode, users pass the authorization after passing authentication.
l RADIUS authorization
RADIUS integrates authentication and authorization. Therefore, RADIUS authorization
cannot be performed independently.
Accounting
The ATN supports the non-accounting, HWTACACS accounting, and RADIUS accounting
modes. By default, the RADIUS accounting mode is adopted.
l Accounting mode
AAA supports the following accounting modes:
– Non-accounting
Free services are provided.
– Remote accounting
The ATN supports remote accounting through a RADIUS or HWTACACS server.
During remote accounting, the real-time accounting function can be enabled. By
default, real-time accounting is disabled.
n Real-time accounting
During real-time accounting for online users, the ATN periodically generates
accounting packets and then sends them to a remote accounting server. Real-time
accounting is also a bill protection measure. It minimizes billing errors
and ensures accuracy of accounting information in case of a link failure.
l Accounting failure policy
The ATN supports the configuration of a remote accounting failure policy. Remote
accounting failure policies include:
– Policy for start-accounting failures
When start-accounting fails,
n If the policy is set to "offline", the ATN terminates user access.
n If the policy is set to "online", the user remains online, but no real-time
accounting packets are exchanged with the AAA server for the user,
even if the AAA server responds again later. An accounting packet must still
be sent to the AAA server when the user goes offline.
– Policy for real-time accounting failures
When real-time accounting fails,
n If the policy is set to "offline", the ATN terminates user access.
n If the policy is set to "online", the user remains online and sends real-time
accounting packets to the AAA server. If the user needs to go offline, it sends
an accounting packet to the AAA server.
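The failure policies above amount to a small decision table. The following Python sketch restates it. The function name and the returned keys are hypothetical and only illustrate the described behavior.

```python
def on_accounting_failure(stage: str, policy: str) -> dict:
    """Return the user state after a remote accounting failure.

    stage:  "start" (start-accounting) or "realtime" (real-time accounting)
    policy: "offline" or "online", as configured for that stage
    """
    if stage not in ("start", "realtime"):
        raise ValueError("unknown stage: " + stage)
    if policy == "offline":
        # The ATN terminates user access.
        return {"online": False, "realtime_accounting": False}
    if policy != "online":
        raise ValueError("unknown policy: " + policy)
    if stage == "start":
        # The user stays online, but real-time accounting does not resume.
        return {"online": True, "realtime_accounting": False}
    # Real-time failure with "online": real-time packets keep being sent.
    return {"online": True, "realtime_accounting": True}
```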

15.1.2.2 RADIUS
RADIUS Message Format

Figure 15-1 shows the format of a RADIUS message.
Figure 15-1 RADIUS message format

Bits:      0-7        8-15          16-31
Row 1:     Code       Identifier    Length
Rows 2-5:  Authenticator
Row 6:     Attribute
The meaning of each field is described as follows:

l Code: indicates the message type, such as the access request, access permission, and
accounting request.
l Identifier: is a string of numbers in ascending order for matching the request and
response packets.
l Length: indicates the total length of all fields.
l Authenticator: is used for checking the validity of a RADIUS message.
l Attribute: indicates the contents of a message, describing user attributes.
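The layout in Figure 15-1 can be decoded with a few lines of Python. This is a minimal sketch, not the ATN implementation. The byte widths (1-byte Code, 1-byte Identifier, 2-byte Length, 16-byte Authenticator) follow the figure, and the type/length/value encoding of each attribute is assumed from RFC 2865. The function name is hypothetical.

```python
import struct

HEADER_LEN = 20  # Code (1) + Identifier (1) + Length (2) + Authenticator (16)

def parse_radius_message(packet: bytes) -> dict:
    """Split a RADIUS message into its header fields and raw attributes."""
    if len(packet) < HEADER_LEN:
        raise ValueError("message shorter than the RADIUS header")
    code, identifier, length = struct.unpack("!BBH", packet[:4])
    if length < HEADER_LEN or length > len(packet):
        raise ValueError("invalid Length field")
    authenticator = packet[4:HEADER_LEN]
    attributes = []
    pos = HEADER_LEN
    while pos + 2 <= length:
        attr_type, attr_len = packet[pos], packet[pos + 1]
        # The attribute length covers the type and length bytes themselves.
        if attr_len < 2 or pos + attr_len > length:
            raise ValueError("malformed attribute")
        attributes.append((attr_type, packet[pos + 2:pos + attr_len]))
        pos += attr_len
    return {"code": code, "identifier": identifier, "length": length,
            "authenticator": authenticator, "attributes": attributes}
```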
Process of Exchanging RADIUS Messages

The RADIUS server builds a unique database to store user names and passwords that are
required for authentication. To obtain the right to access certain networks or to use certain
network resources, a user needs to set up a connection with the ATN through a device. In this
case, the ATN functions in connecting the user and the device.
The ATN is responsible for sending information about the user to the RADIUS server.
RADIUS prescribes how to transmit information between the ATN and the RADIUS server.
The RADIUS server receives connection requests from users, authenticates users, and then
sends the required configuration information back to the ATN.
The authentication information exchanged between the ATN and the RADIUS server is protected
with a shared key. This protects the user password from theft on an insecure network. Figure 15-2 shows
the process of exchanging RADIUS messages between the RADIUS server and client.
Figure 15-2 Process of exchanging RADIUS messages between the RADIUS server and
client
User -> ATN: 1. User name and password
ATN -> RADIUS server: 2. Request
RADIUS server -> ATN: 3. Response

1. A user initiates authentication and sends a user name and password to the ATN.
2. After the RADIUS client configured on the ATN receives the user name and password, it
sends an authentication request to the RADIUS server.
3. If the request is valid, the RADIUS server completes the authentication and sends the
required authorization information back to the RADIUS client.
Authentication information is encrypted before being transmitted between the RADIUS client
and RADIUS server. This prevents theft of information on an insecure network.
The process of exchanging accounting messages is similar to that of exchanging
authentication or authorization messages.
RADIUS Features
RADIUS adopts the server/client model and has the following characteristics:
l RADIUS features excellent real-time performance by using the User Datagram Protocol
(UDP) as the transmission protocol.
l RADIUS possesses high reliability owing to the retransmission mechanism and backup
server mechanism.
l RADIUS is easy to implement and suits multi-threaded servers that handle a large
number of users.
RADIUS Versions
The ATN supports standard RADIUS, RADIUS+V1.0, and RADIUS+V1.1. RADIUS+V1.1
and RADIUS+V1.0, derived from the standard RADIUS protocol, are Huawei proprietary
protocols. Both protocols are applicable to IPHotel and Portal
services, though they extend the standard protocol in different ways.
l RADIUS+V1.0
In RADIUS+V1.0, a private attribute set is suffixed to the standard attribute set. That is,
the private attributes are added to the standard attribute set. Such an extension may
conflict with the subsequent extension of the standard RADIUS protocol.
l RADIUS+V1.1
In RADIUS+V1.1, all private attributes are considered a subset to be contained in the
vendor-specific attribute defined in RFC 2865. This ensures the interworking and
controllability between extended RADIUS+V1.1 of Huawei and the extended RADIUS
protocols defined by other vendors, and avoids the conflict between extended
RADIUS+V1.1 of Huawei and the subsequent extension of the standard RADIUS protocol.
RADIUS Implementation on the ATN

As a RADIUS client, the ATN implements the following functions:
l Caches the accounting-stop packets locally and retransmits them.
If the number of retransmission failures exceeds the set value, the accounting-stop
packets are saved to the buffer queue. The system periodically scans the queue, extracts
the packets, sends them to the specific server, and enables the waiting timer. If the
transmission fails or no response packet is received from the server within the timeout
period, the packets are put to the buffer queue again.

l Automatically switches to another RADIUS server in the server group.

If the current server does not work or the number of retransmission events exceeds the
set maximum number, the ATN selects another server in the server group to transmit
packets.
l Switches RADIUS attributes.
The ATN supports the RADIUS attribute switching function. When the RADIUS
attribute switching function is enabled and then configured, the ATN encapsulates or
parses the original attribute value in accordance with the post-switching attribute format
during the transmission of RADIUS messages. In this manner, the ATN can interwork
with other devices.
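The server switchover behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the ATN implementation. The function name, the send callback (which returns True when a response is received), and the default retransmission limit are all hypothetical.

```python
from typing import Callable, Optional, Sequence

def send_with_failover(servers: Sequence[str],
                       send: Callable[[str], bool],
                       max_retransmit: int = 3) -> Optional[str]:
    """Try each server in the group in order.

    A server is abandoned once max_retransmit attempts have failed, and the
    next server in the group is selected. Returns the server that answered,
    or None if no server in the group responded.
    """
    for server in servers:
        for _ in range(max_retransmit):
            if send(server):
                return server
    return None
```

For example, with a group of ("master", "backup") and a limit of 2, a master server that never answers is tried twice before the packet is sent to the backup server.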
Dynamic Authorization
The RADIUS server changes the service attributes of online users through CoA packets. The
format of the CoA packet is the same as that of the normal RADIUS packet, as shown in
Figure 15-1. In the CoA packet, the Code field has the following values:
l 43, CoA-Request packet
l 44, CoA-ACK packet
l 45, CoA-NAK packet
Dynamic authorization is applied in destination address accounting (DAA) services that
use service policies delivered by the RADIUS server. For details of the DAA services, refer to
"Configuring DAA Service" of the ATN Multi-service Access Equipment Configuration Guide
- User Access - DAA Configuration.
Figure 15-3 shows the basic process of dynamic authorization.
Figure 15-3 Dynamic authorization
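Participants: User, ATN, RADIUS server, Portal server
1. User subscribes to a service (User -> Portal server)
2. Information about the ordered service (Portal server -> RADIUS server)
3. CoA-Request packet (RADIUS server -> ATN)
4. CoA-ACK/CoA-NAK packet (ATN -> RADIUS server)
5. Updated policy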
The basic process of dynamic authorization is as follows:

1. The user accesses the portal server and subscribes to a DAA service online.
2. The portal server sends information about the subscribed service to the
RADIUS server.
3. The RADIUS server makes or modifies the service policies according to the service
information and user information. Then, the RADIUS server sends the CoA-Request
packet to the ATN, and requests to modify the authorization information of the user.
4. After receiving the CoA-Request packet from the RADIUS server, the ATN modifies the
authorization information of the user without changing the online state of the user. If the
modification is successful, the ATN returns the CoA-ACK packet to the RADIUS server;
if the modification fails, the ATN returns the CoA-NAK packet to the RADIUS server.
5. If the modification is successful, when the user uses the DAA service, the ATN controls
the service based on the modified authorization attribute.
Disconnect Message
A Disconnect Message (DM) is used by the AAA server to force users offline. In the DM
packet, the Code field has the following values:
l 40 - Disconnect-Request
l 41 - Disconnect-ACK
l 42 - Disconnect-NAK
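The Code values listed for CoA and DM packets can be collected in one enumeration. The Python sketch below is illustrative only. The class and function names are hypothetical, and the helper relies on the observation that, in the listed values, each ACK code is the request code plus 1 and each NAK code is the request code plus 2.

```python
from enum import IntEnum

class DynAuthCode(IntEnum):
    DISCONNECT_REQUEST = 40
    DISCONNECT_ACK = 41
    DISCONNECT_NAK = 42
    COA_REQUEST = 43
    COA_ACK = 44
    COA_NAK = 45

def reply_code(request: DynAuthCode, success: bool) -> DynAuthCode:
    """Pick the ACK or NAK code that answers a CoA or DM request."""
    if request not in (DynAuthCode.DISCONNECT_REQUEST, DynAuthCode.COA_REQUEST):
        raise ValueError("not a request code")
    return DynAuthCode(request + (1 if success else 2))
```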
Figure 15-4 shows the basic process of DM.
Figure 15-4 DM
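Participants: User, ATN, RADIUS server, Portal server
1. Forces the user to get offline (Portal server -> RADIUS server)
2. DM Request packet (RADIUS server -> ATN)
3. DM Response packet (ATN -> RADIUS server)
4. Policy update succeeds
5. Informs the user to get offline (ATN -> User)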
The basic process of DM is as follows:

1. The administrator forces the user to go offline. The portal server notifies the
RADIUS server that the user is to go offline.
2. The RADIUS server sends the DM-Request packet to the ATN and requests the user to
go offline.

3. After receiving the DM-Request packet from the RADIUS server, the ATN processes the
DM message without changing the online state of the user. If the processing is
successful, the ATN returns the DM-ACK packet to the RADIUS server; if the
processing fails, the ATN returns the DM-NAK packet to the RADIUS server.
4. If the processing is successful, the ATN instructs the user to go offline.
15.1.2.3 HWTACACS
Format of an HWTACACS message

The process of transmitting HWTACACS messages is similar to that of transmitting RADIUS
messages.
Features of HWTACACS
Compared with RADIUS, HWTACACS is more reliable in transmission and encryption and
therefore is more suitable for security control. Table 15-1 shows comparisons between
HWTACACS and RADIUS.
Table 15-1 Comparisons between HWTACACS and RADIUS

HWTACACS | RADIUS
Uses the Transmission Control Protocol (TCP) to provide reliable transmission. | Uses UDP.
Encrypts the main structure of a packet except the standard HWTACACS header. | Encrypts only the password field in the authentication packet.
Separates authorization from authentication. | Performs authentication together with authorization.
Is suitable for security control. | Is suitable for accounting.
Authorizes the commands executed by administrative users. | Does not authorize the commands executed by administrative users.
Command-Line Authorization in HWTACACS

HWTACACS supports command-line authorization for users of specific levels in a
specified domain or for a specified Secure Shell (SSH) user.
In command-line authorization mode, after a user logs in to the ATN through Telnet or SSH,
every command input by the user needs to be authorized by the HWTACACS server. The
command can be run only after command-line authorization is passed. Otherwise, the
HWTACACS server displays a message to inform the user that command-line authorization
fails and the command cannot be run.
If the ATN does not receive any authorization response from the HWTACACS server within
the timeout period set by the user, it considers that the command-line authorization times out,
and therefore the command cannot be run.
Figure 15-5 shows the process of command-line authorization in HWTACACS.

Figure 15-5 Process of command-line authorization in HWTACACS

User -> ATN: 1. command
ATN -> TACACS server: 2. author-cmd REQ
TACACS server -> ATN: 3. author-cmd ACK
1. The user enters a command on the ATN.

2. The ATN sends a command-line authorization request to the TACACS server.
3. The TACACS server returns the authorization result to the ATN. If authorization
succeeds, the user can run the command of the corresponding level; otherwise, the user
cannot run the command.
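The three possible outcomes of command-line authorization (permitted, rejected, timed out) can be sketched in Python as follows. The names are hypothetical, and the ask_server callback is assumed to return None when no authorization response arrives within the timeout period.

```python
from typing import Callable, Optional

def authorize_command(command: str,
                      ask_server: Callable[[str], Optional[bool]]) -> str:
    """Return "permit", "deny", or "timeout" for one entered command."""
    reply = ask_server(command)  # None models a missing response
    if reply is None:
        return "timeout"
    return "permit" if reply else "deny"

def can_run(command: str, ask_server: Callable[[str], Optional[bool]]) -> bool:
    """A command runs only after authorization is explicitly passed."""
    return authorize_command(command, ask_server) == "permit"
```

In this sketch a timeout is treated like a rejection, which matches the statement that the command cannot be run when the authorization times out.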
15.1.2.4 User Management
Overview
Currently, the device manages users in the following modes:
l Domain-based user management

Every user belongs to a domain. By default, if the user account that is input when a
user accesses the device does not contain a domain name, the user is
added to the default domain. The ATN manages users by configuring service attributes
in a domain. Thus, the users in the same domain have the same service attributes.
l User account-based user management
User accounts and related service attributes are configured on an AAA server such as the
RADIUS server or the HWTACACS server, and are then delivered to users when the
users get online.
The service attributes configured for a domain have a lower priority than the service attributes
delivered by an AAA server. Therefore, when service attributes are both configured for a
domain and delivered by an AAA server, the ATN adopts the service attributes that are
delivered by the AAA server. The service attributes configured for a domain take effect only
when the AAA server does not support or deliver the service attributes.
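The priority rule above is a per-attribute override. A minimal Python sketch follows. The function name and the attribute names used in the usage example are hypothetical, and a value of None is assumed to mean that the AAA server did not deliver the attribute.

```python
def effective_attributes(domain_attrs: dict, server_attrs: dict) -> dict:
    """Merge service attributes, giving priority to the AAA server.

    Attributes configured for the domain apply only where the AAA server
    delivered nothing (missing key or None value).
    """
    merged = dict(domain_attrs)
    for name, value in server_attrs.items():
        if value is not None:
            merged[name] = value
    return merged
```

For example, if the domain configures {"idle_timeout": 300, "acl": 3000} and the server delivers {"idle_timeout": 600, "acl": None}, the result is {"idle_timeout": 600, "acl": 3000}.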
Overview of a Domain
The ATN supports a user account in the format of username@domain or domain@username.
Here, @ is a domain name delimiter. The positions of the domain name and the user name can
be exchanged. If the user account that is input when a user accesses the ATN does not contain
a domain name, it indicates that the user belongs to the default domain of the system.
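The account format can be illustrated with a short Python sketch. It assumes that the position of the domain name is known from configuration and passes it in as the hypothetical domain_first parameter. The function name is also hypothetical. default_admin is used as the default domain because it is the default domain described in this section.

```python
from typing import Tuple

def split_account(account: str, delimiter: str = "@",
                  domain_first: bool = False,
                  default_domain: str = "default_admin") -> Tuple[str, str]:
    """Return (user name, domain name) for an entered user account."""
    if delimiter not in account:
        # No domain name entered: the user belongs to the default domain.
        return account, default_domain
    left, right = account.split(delimiter, 1)
    return (right, left) if domain_first else (left, right)
```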
l Default domain
A default domain is fixed in the system. The service attributes of the default domain can
be modified, but the domain cannot be deleted.
The ATN has one default domain: default_admin, as shown in Table 15-2.

Table 15-2 Default domains of the ATN

Name: default_admin
Description: It is a domain to which an operation user belongs. In the case that an
operation user logs in to the ATN through Telnet or SSH, if the operation user inputs a
user account that does not contain a domain name during authentication, the ATN by
default considers that the operation user belongs to default_admin.
Default Attributes:
l First local authentication and later RADIUS authentication
l Non-accounting
15.1.3 Applications
15.1.3.1 RADIUS Authentication and Accounting

Users access the ATN through the Internet. The users send authentication packets to the
RADIUS server for authentication and authorization, and send accounting packets to the
RADIUS server for accounting. When the master server goes Down, the packets are switched
to the backup server for authentication or accounting. After the authentication succeeds, the
RADIUS server delivers corresponding rights to the users.
Figure 15-6 shows the network diagram of RADIUS authentication and accounting.
Figure 15-6 Network diagram of RADIUS authentication and accounting
Elements shown: RADIUS server (master, 129.7.66.66), RADIUS server (backup, 129.7.66.67),
Internet, ATN, user, BTS1, BTS2, BTS3.
15.1.3.2 HWTACACS Authentication, Accounting, and Authorization

Users access the ATN through the Internet. The users send authentication packets to the
HWTACACS server for authentication, accounting, and authorization. When the master
server goes Down, the packets are switched to the backup server for authentication. After the
authentication succeeds, the HWTACACS server delivers corresponding rights to the users.
Figure 15-7 shows the network diagram of HWTACACS authentication, accounting, and
authorization.
Figure 15-7 Networking diagram of HWTACACS authentication, accounting, and
authorization
Elements shown: HWTACACS server (master), HWTACACS server (backup), Internet, ATN, user,
BTS1, BTS2, BTS3.

Acronym & Abbreviation | Full Name
AAA | Authentication, Authorization, and Accounting
RADIUS | Remote Authentication Dial In User Service
HWTACACS | HUAWEI Terminal Access Controller Access Control System
15.2 DHCP
15.2.1 DHCP Overview

Definition
The Dynamic Host Configuration Protocol (DHCP) dynamically assigns IP addresses to users
and manages user configurations in a centralized manner.

Purpose
As the network expands and becomes complex, the number of hosts often exceeds the number
of available IP addresses. As portable computers and wireless networks are widely used, the
positions of computers often change, causing IP addresses of the computers to be changed
accordingly. As a result, network configurations become increasingly complex. To properly
and dynamically assign IP addresses to hosts, DHCP is used.
DHCP is developed based on the Bootstrap Protocol (BOOTP). BOOTP runs on networks
where each host has a fixed network connection. The administrator configures a BOOTP
parameter file for each host, and the file remains unchanged for a long period of time. DHCP
has the following new features compared with BOOTP:
l Dynamically assigns IP addresses and configuration parameters to clients.
l Enables a host to obtain an IP address dynamically, instead of requiring a fixed IP
address to be specified for each host.
DHCP rapidly and dynamically allocates IP addresses, which improves IP address usage.
15.2.2 Principles
This section describes the implementation of DHCP.
15.2.2.1 DHCP Overview

DHCP uses the client/server model. A DHCP client sends a packet to a DHCP server to
request configuration parameters such as the IP address, subnet mask, and default gateway
address. The DHCP server responds with a packet carrying the requested configurations based
on a policy.
DHCP Architecture
Figure 15-8 shows the DHCP architecture.
Figure 15-8 DHCP architecture
DHCP Client DHCP Relay DHCP Server
IP Network
DHCP involves the following roles:

l DHCP Client
A DHCP client exchanges messages with a DHCP server to obtain an IP address and
other configuration parameters. On the device, an interface can function as a DHCP
client to dynamically obtain configuration parameters such as an IP address from a
DHCP server. This facilitates configurations and centralized management.
l DHCP Relay
A DHCP relay agent forwards DHCP packets exchanged between a DHCP client and a
DHCP server that are located on different network segments so that they can complete
their address configuration. Using a DHCP relay agent eliminates the need for deploying
a DHCP server on each network segment. This feature reduces network deployment
costs and facilitates device management.
In the DHCP architecture, the DHCP relay agent is optional. A DHCP relay agent is
required only when the server and client are located on different network segments.
l DHCP Server
A DHCP server processes requests of address allocation, address lease extending, and
address releasing from a DHCP client or a DHCP relay agent, and allocates IP addresses
and other network configuration parameters to the DHCP client.
15.2.2.2 Introduction to DHCP Messages
DHCP Message Format

Figure 15-9 shows the format of a DHCP message.
Figure 15-9 Format of a DHCP message

0 7 15 23 31
op(1) htype (1) hlen (1) hops (1)
xid (4)
secs (2) flags (2)
ciaddr (4)
yiaddr (4)
siaddr (4)
giaddr (4)
chaddr (16)
sname (64)
file (128)
options (variable)
In Figure 15-9, numbers in the round brackets indicate the field length, expressed in bytes.

Table 15-3 Description of each field in a DHCP message

Field: op (op code)
Length: 1 byte
Description: Indicates the message type. The options are as follows:
l 1: DHCP Request message
l 2: DHCP Reply message

Field: htype (hardware type)
Length: 1 byte
Description: Indicates the hardware address type. For Ethernet, the value of this field is 1.

Field: hlen (hardware length)
Length: 1 byte
Description: Indicates the length of a hardware address, expressed in bytes. For Ethernet,
the value of this field is 6.

Field: hops
Length: 1 byte
Description: Indicates the number of DHCP relay agents that a DHCP Request message passes
through. This field is set to 0 by a DHCP client or a DHCP server. The value increases by 1
each time a DHCP Request message passes through a DHCP relay agent. This field limits the
number of DHCP relay agents that a DHCP message can pass through.

Field: xid
Length: 4 bytes
Description: Indicates a random number chosen by a DHCP client. It is used by the DHCP
client and DHCP server to exchange messages.

Field: secs (seconds)
Length: 2 bytes
Description: Indicates the time elapsed since the client began to obtain or renew an IP
address, in seconds.

Field: flags
Length: 2 bytes
Description: Indicates the Flags field. Only the leftmost bit of the Flags field is valid and
other bits are set to 0. The leftmost bit determines whether the DHCP server unicasts or
broadcasts a DHCP Reply message. The options are as follows:
l 0: The DHCP server unicasts a DHCP Reply message.
l 1: The DHCP server broadcasts a DHCP Reply message.

Field: ciaddr (client ip address)
Length: 4 bytes
Description: Indicates the IP address of a client. The IP address can be an existing IP
address of a DHCP client or an IP address assigned by a DHCP server to a DHCP client. During
initialization, the client has no IP address and the value of this field is 0.0.0.0.
NOTE
The IP address 0.0.0.0 is used only for temporary communication during system startup in
DHCP mode. It is an invalid address.

Field: yiaddr (your client ip address)
Length: 4 bytes
Description: Indicates the DHCP client IP address assigned by the DHCP server. The DHCP
server fills this field into a DHCP Reply message.

Field: siaddr (server ip address)
Length: 4 bytes
Description: Indicates the IP address of the server from which a DHCP client obtains the
startup configuration file.

Field: giaddr (gateway ip address)
Length: 4 bytes
Description: Indicates the IP address of the first DHCP relay agent. If the DHCP server and
client are located on different network segments, the first DHCP relay agent fills its IP
address into this field of the DHCP Request message sent by the client and forwards the
message to the DHCP server. The DHCP server determines the network segment where the client
resides based on this field, and assigns an IP address on this network segment from an
address pool.
The DHCP server also returns a DHCP Reply message to the first DHCP relay agent. The DHCP
relay agent then forwards the DHCP Reply message to the client.

Field: chaddr (client hardware address)
Length: 16 bytes
Description: Indicates the client MAC address. This field must be consistent with the
hardware type and hardware length fields. When sending a DHCP Request message, the client
fills its hardware address into this field. For Ethernet, a 6-byte Ethernet MAC address must
be filled in this field when the hardware type and hardware length fields are set to 1 and 6
respectively.

Field: sname (server host name)
Length: 64 bytes
Description: Indicates the name of the server from which a client obtains configuration
parameters. This field is optional and is filled in by the DHCP server. The field must be
filled in with a character string that ends with 0.

Field: file (file name)
Length: 128 bytes
Description: Indicates the Bootfile name specified by the DHCP server for a DHCP client.
This field is filled in by the DHCP server and is delivered to the client when the IP address
is assigned to the client. This field is optional. The field must be filled in with a
character string that ends with 0.

Field: options
Length: Variable
Description: Indicates the DHCP Options field, which has a maximum of 312 bytes. This field
contains the DHCP message type and configuration parameters assigned by a server to a
client, including the gateway IP address, DNS server IP address, and IP address lease.
DHCP Message Types

DHCP messages are classified into eight types. A DHCP server and a DHCP client
communicate by exchanging DHCP messages.
Table 15-4 DHCP message types
Message Name: DHCP DISCOVER
Description: A DHCP Discover message is broadcast by a DHCP client to locate a DHCP server
when the client attempts to connect to a network for the first time.

Message Name: DHCP OFFER
Description: A DHCP Offer message is sent by a DHCP server to respond to a DHCP Discover
message. A DHCP Offer message carries various configuration information.

Message Name: DHCP REQUEST
Description: A DHCP Request message is sent in the following conditions:
l After a DHCP client is initialized, it broadcasts a DHCP Request message to respond to
the DHCP Offer message sent by a DHCP server.
l After a DHCP client restarts, it broadcasts a DHCP Request message to confirm the
configuration including the assigned IP address.
l After a DHCP client obtains an IP address, it unicasts or broadcasts a DHCP Request
message to update the IP address lease.

Message Name: DHCP ACK
Description: A DHCP ACK message is sent by a DHCP server to acknowledge the DHCP Request
message from a DHCP client. After receiving a DHCP ACK message, the DHCP client obtains the
configuration parameters including the IP address.

Message Name: DHCP NAK
Description: A DHCP NAK message is sent by a DHCP server to reject the DHCP Request message
from a DHCP client. For example, after a DHCP server receives a DHCP Request message, it
cannot find matching lease records. Then the DHCP server sends a DHCP NAK message,
notifying that no IP address is available for the DHCP client.

Message Name: DHCP DECLINE
Description: A DHCP Decline message is sent by a DHCP client to notify the DHCP server that
the assigned IP address conflicts with another IP address. Then the DHCP client applies to
the DHCP server for another IP address.

Message Name: DHCP RELEASE
Description: A DHCP Release message is sent by a DHCP client to release its IP address.
After receiving a DHCP Release message, the DHCP server can assign this IP address to
another DHCP client.

Message Name: DHCP INFORM
Description: A DHCP Inform message is sent by a DHCP client to obtain other network
configuration parameters such as the gateway address and DNS server address after the DHCP
client has obtained an IP address.
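The message type is carried in the options field. The following Python sketch shows one way to read it. The numeric values of the eight types and the use of option code 53 are not stated in this document and are taken from RFC 2132. The function name is hypothetical, and the input is assumed to be the option bytes that follow the 4-byte magic cookie.

```python
from enum import IntEnum

class DhcpMessageType(IntEnum):
    DISCOVER = 1
    OFFER = 2
    REQUEST = 3
    DECLINE = 4
    ACK = 5
    NAK = 6
    RELEASE = 7
    INFORM = 8

def message_type(options: bytes) -> DhcpMessageType:
    """Find option 53 (DHCP message type) in a code/length/value option list."""
    pos = 0
    while pos < len(options):
        code = options[pos]
        if code == 0:        # pad option: a single byte without a length
            pos += 1
            continue
        if code == 255:      # end option
            break
        if pos + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[pos + 1]
        value = options[pos + 2:pos + 2 + length]
        if code == 53 and len(value) == 1:
            return DhcpMessageType(value[0])
        pos += 2 + length
    raise ValueError("no DHCP message type option found")
```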
15.2.2.3 Description of the Option 82 Field
Format of the Option 82 Field

Option 82 is the relay agent information option, which records the location information of
the DHCP client. It is a special field contained in a DHCP message.
When a DHCPREQUEST message sent from a DHCP client passes through an interface, the
relay agent appends an Option 82 field to this DHCPREQUEST message. When receiving the
DHCPREQUEST message carrying an Option 82 field, the DHCP server returns a response
message containing the same Option 82 field to the DHCP relay agent.

As shown in Figure 15-10, the Code field in Option 82 is 82; the Length field indicates the
total number of bytes in the Agent Information field; the iN field indicates a sub-option of the
Agent Information field and each sub-option is a SubOpt/Length/Value tuple.
The initially assigned sub-options are as follows:
l 1: agent circuit ID sub-option
A DHCP server uses the agent circuit ID sub-option for IP and other parameter assignment
policies.
Figure 15-10 Format of a message with an Option 82 field

Code | Length | Agent Information Field
82   | N      | i1 i2 i3 i4 i5 … iN
The ATN device uses the Option 82 field to define the address assignment policies or other
policies for the DHCP server to perform.
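The Code/Length/sub-option layout of Option 82 can be illustrated with a short Python sketch. It is not the ATN implementation. The function names and the circuit ID content in the usage example are hypothetical. Only Sub-option 1 (agent circuit ID) is built because it is the only sub-option listed in this section.

```python
def build_option82(circuit_id: bytes) -> bytes:
    """Encode Option 82 with Sub-option 1 (agent circuit ID)."""
    if len(circuit_id) > 253:
        # The option length is one byte and also covers the 2-byte sub-option header.
        raise ValueError("circuit ID too long for one option")
    sub_option = bytes([1, len(circuit_id)]) + circuit_id
    return bytes([82, len(sub_option)]) + sub_option

def parse_option82(option: bytes) -> dict:
    """Decode Option 82 into {sub-option code: value}."""
    if len(option) < 2 or option[0] != 82 or option[1] != len(option) - 2:
        raise ValueError("not a well-formed Option 82 field")
    sub_options = {}
    pos = 2
    while pos < len(option):
        if pos + 2 > len(option):
            raise ValueError("truncated sub-option")
        code, length = option[pos], option[pos + 1]
        if pos + 2 + length > len(option):
            raise ValueError("truncated sub-option")
        sub_options[code] = option[pos + 2:pos + 2 + length]
        pos += 2 + length
    return sub_options
```

For example, build_option82(b"eth0/1") yields the bytes 82, 8, 1, 6 followed by the six bytes of the circuit ID, and parse_option82 recovers {1: b"eth0/1"} from them.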
Appending an Option 82 Field to the DHCP Message on an ATN Device

In this section, "the device" refers to the ATN device that appends an Option 82 field
to DHCP messages.
As shown in Figure 15-11, after Option 82 is enabled, the device appends an Option 82 field
to the DHCPDISCOVER and DHCPREQUEST messages. The DHCP server then performs
the IP assignment policy and other policies based on Option 82.
The DHCPREPLY message returned by the DHCP server also carries the Option 82 field.
After receiving the DHCPREPLY message, the interface removes the Option 82 field from the
message before forwarding the message to the client.

Figure 15-11 Appending an Option 82 field to the DHCP message on an ATN Device
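Elements shown: Client1, Client2, Client3, ATN, DHCP relay, Internet, DHCP server
Client -> ATN: Discover
ATN -> toward the DHCP server: Discover+Option82
From the DHCP server -> ATN: Offer+Option82
ATN -> Client: Offer
Client -> ATN: Request
ATN -> toward the DHCP server: Request+Option82
From the DHCP server -> ATN: Ack+Option82
ATN -> Client: Ack
Data Exchange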
Option 82 Implementation
After Option 82 is enabled, an interface checks whether the DHCPREQUEST message sent
from a client or the message ready to send to a client contains an Option 82 field.
l If the DHCPREQUEST message contains an Option 82 field:
The device checks the configuration for appending the Option 82 field. If the current
interface is configured with the Rebuild mode, the interface does not trust the
Option 82 field contained in the received message and modifies Sub-option 1
contained in the Option 82 field.
l If the DHCPREQUEST message does not contain an Option 82 field:
The device adds an Option 82 field with Sub-option 1.
When the DHCPREPLY message is forwarded, the device first checks whether the message
contains Sub-option 1 and whether the sub-option contains the Huawei Device Identifier field.
If so, the device can successfully parse the Option 82 field, and then removes the Huawei
Device Identifier field from Sub-option 1 before forwarding the message.
15.2.2.4 Operation Principle of a DHCP Client

Only the ATN 910I/ATN 910B/ATN 905 support the DHCP client.
Figure 15-12 shows the operation principle of a DHCP client.

Figure 15-12 Operation principle of a DHCP client
DHCP Client DHCP Server
1.DHCP Discover
2.DHCP Offer
3.DHCP Request
4.DHCP ACK
1. The DHCP client sends a DHCPDISCOVER packet to the DHCP server and enters the
selecting state. Then, the DHCP client creates a timer to wait for DHCPOFFER packets
from the DHCP server.
– If the DHCP client receives a non-DHCPOFFER packet, it discards the packet.
– If the DHCP client receives no DHCPOFFER packet before the timer expires, the
DHCP client is initialized and sends another request for an IP address.
2. After receiving a DHCPOFFER packet, the DHCP client deletes the timer and sends a
DHCPREQUEST packet. Then, the DHCP client creates a timer to wait for a DHCPACK packet.
– If the DHCP client receives a packet that is not a DHCPACK or DHCPNAK packet,
it discards the packet.
– If the DHCP client receives a DHCPNAK packet, it sends another request for an
address.
– If the DHCP client has not received a DHCPACK or DHCPNAK packet before the
timer expires, it resends the request packet four times at intervals of 4s, 8s, 16s,
and 32s, respectively. If the DHCP client still does not receive any response within
60s after the request packet has been sent four times, it restarts address allocation
by re-sending DHCPDISCOVER packets at intervals of 4s, 8s, 16s, 32s, and 64s in
sequence, and then every 64s, until a DHCPOFFER packet is received.
3. After being allocated an IP address, the DHCP client sends a gratuitous ARP packet to
check whether the allocated address is already in use. If the address is in use, the DHCP
client sends a DHCPDECLINE packet to the DHCP server and returns to the initial state.
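The retransmission intervals described in step 2 can be expressed as two small Python helpers. This is a sketch of the timing sequence only, with hypothetical names. It does not send packets.

```python
from itertools import islice
from typing import Iterator, Tuple

def request_retry_intervals() -> Tuple[int, ...]:
    """Intervals, in seconds, between the four resent request packets."""
    return (4, 8, 16, 32)

def discover_retry_intervals() -> Iterator[int]:
    """Intervals, in seconds, between resent DHCPDISCOVER packets.

    The interval doubles from 4s up to 64s and then stays at 64s until a
    DHCPOFFER packet is received (the caller stops iterating at that point).
    """
    interval = 4
    while True:
        yield interval
        interval = min(interval * 2, 64)

# First seven DHCPDISCOVER intervals: 4, 8, 16, 32, 64, 64, 64
first_intervals = list(islice(discover_retry_intervals(), 7))
```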
15.2.2.5 DHCP Relay Principles
The DHCP relay function enables message exchanges between a DHCP server and a client on
different network segments. When the DHCP client and server are on different network
segments, the DHCP relay agent transparently transmits DHCP messages to the destination
DHCP server. In this way, DHCP clients on different network segments can communicate
with one DHCP server.
Figure 15-13 shows how a DHCP client uses the DHCP relay agent to apply for an IP address
for the first time.
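Figure 15-13 Working process of a DHCP relay agent
(Message sequence among the client, the DHCP relay agent, and the server. Step 1: the client
broadcasts a DHCP DISCOVER message, which the relay agent unicasts to the server. Step 2:
the server unicasts a DHCP OFFER message, which the relay agent forwards to the client.
Step 3: the client broadcasts a DHCP REQUEST message, which the relay agent unicasts to the
server. Step 4: the server unicasts a DHCP ACK or DHCP NAK message, which the relay agent
forwards to the client.)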
Figure 15-13 shows the working process of a DHCP relay agent. The DHCP client sends a
Request message to the DHCP server. When receiving the message, the DHCP relay agent
processes and unicasts the message to the specified DHCP server on the other network
segment. The DHCP server sends requested configurations to the client through the DHCP
relay agent based on information in the Request message.
1. After receiving a DHCP Discover message or a Request message, the DHCP relay agent
performs the following operations:
– Checks the hops field. If the number of hops is larger than the hop limit, the relay
agent discards the message to prevent loops. Otherwise, it increases the value of the
hops field by 1, indicating that the message has passed through a DHCP relay agent.
– Checks the giaddr field. If the value is 0, the relay agent sets the giaddr field to the IP
address of the interface that received the message. If the interface has multiple IP
addresses, the relay agent selects one of them, and all messages received by the
interface later use this IP address to fill the giaddr field. If the value is not 0, the relay
agent does not change it.
– Sets the TTL of the message to the default value on the DHCP relay device, not to
the original TTL decreased by 1. You can change the hop limit to prevent loops and
limit the number of hops.
– Changes the destination IP address of the message to the IP address of the DHCP
server or the IP address of the next DHCP relay agent. In this way, the message can be
forwarded to the DHCP server or the next DHCP relay agent.
2. The DHCP server assigns an IP address to the client based on the Relay Agent IP Address
(giaddr) field and sends the DHCP Reply message to the DHCP relay agent specified in that
field. After receiving the DHCP Reply message, the DHCP relay agent performs the following
operations:
– Assumes that all Reply messages are destined for directly connected DHCP clients.
The Relay Agent IP Address field identifies the interface directly connected to the
client. If the value of the Relay Agent IP Address field is not the IP address of a local
interface, the DHCP relay agent discards the Reply message.
– Checks the broadcast flag bit of the message. If the broadcast flag bit is 1, the DHCP
relay agent broadcasts the DHCP Reply message to the DHCP client; otherwise, the
DHCP relay agent unicasts the DHCP Reply message to the DHCP client. For a unicast
message, the destination IP address is the value in the Your (Client) IP Address field,
and the destination MAC address is the value in the Client Hardware Address field.
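The relay agent checks in steps 1 and 2 can be summarized in a minimal Python sketch. It models only the fields mentioned above and is not the ATN implementation; the `DhcpMessage` class, the function names, and the `MAX_HOPS` value of 16 are assumptions made for illustration, and the TTL and destination address rewriting are omitted.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set, Tuple

MAX_HOPS = 16  # assumed hop limit; the actual limit depends on device configuration


@dataclass(frozen=True)
class DhcpMessage:
    hops: int = 0
    giaddr: str = "0.0.0.0"   # Relay Agent IP Address field
    yiaddr: str = "0.0.0.0"   # Your (Client) IP Address field
    chaddr: str = ""          # Client Hardware Address field
    broadcast: bool = False   # broadcast flag bit


def relay_request(msg: DhcpMessage, ingress_ip: str) -> Optional[DhcpMessage]:
    """Process a client message; return None if the message is discarded."""
    if msg.hops > MAX_HOPS:
        return None  # too many hops: discard to prevent loops
    # Fill giaddr only if no earlier relay agent has set it.
    giaddr = ingress_ip if msg.giaddr == "0.0.0.0" else msg.giaddr
    return replace(msg, hops=msg.hops + 1, giaddr=giaddr)


def relay_reply(msg: DhcpMessage, local_ips: Set[str]) -> Optional[Tuple[str, str]]:
    """Process a server reply; return (destination IP, destination MAC) or None."""
    if msg.giaddr not in local_ips:
        return None  # giaddr is not a local interface address: discard
    if msg.broadcast:
        return ("255.255.255.255", "ff:ff:ff:ff:ff:ff")
    return (msg.yiaddr, msg.chaddr)
```

Keeping the message immutable and returning a modified copy makes it easy to compare the message before and after relay processing.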
Figure 15-14 shows how a DHCP client extends the IP address lease through the DHCP relay
agent.
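Figure 15-14 Extending the IP address lease through the DHCP relay agent
(Message sequence among the client, the DHCP relay agent, and the server. Step 1: the client
unicasts a DHCP REQUEST message to the server. Step 2: the server unicasts a DHCP ACK or
DHCP NAK message to the client.)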
1. After accessing the network for the first time, the DHCP client only needs to unicast a
DHCP Request message to the DHCP server that assigned its currently-used IP address.
2. The DHCP server then directly unicasts a DHCP ACK message or a DHCP NAK
message to the client.
DHCP Releasing
The DHCP relay agent, instead of the client, can send a Release message to the DHCP server
to release the IP addresses that are assigned to DHCP clients. To do so, you run a command
on the DHCP relay agent to release the IP addresses that the DHCP server has assigned to
the DHCP clients.
15.2.2.6 Working Principles of a DHCP Server

A DHCP server assigns IP addresses to clients. After a DHCP client sends a packet to the
DHCP server to request configuration parameters (such as the IP address, subnet mask, and
default gateway), the DHCP server responds with a packet carrying the requested
configurations based on a policy. Both request and reply packets are encapsulated using UDP.

Interaction Modes Between a DHCP Client and Server

To obtain a valid dynamic IP address, a DHCP client exchanges different information with a
server at different stages. Generally, the DHCP client and server interact in the following
modes (defined in RFC 2131):
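● The DHCP client accesses a network for the first time.
Figure 15-15 Interaction with the DHCP server when the DHCP client accesses a
network for the first time
(Message sequence between the client and the server. Step 1: the client sends a DHCP
DISCOVER message, and the server returns a DHCP OFFER message. Step 2: the client sends a
DHCP REQUEST message, and the server returns a DHCP ACK or DHCP NAK message.)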
When the DHCP client accesses a network for the first time, it goes through the
following stages to set up a connection to the DHCP server:
a. Discovery stage: The DHCP client searches for a DHCP server. The DHCP client
broadcasts a DHCP Discover packet, and only DHCP servers reply to the
packet.
b. Offer stage: The DHCP server offers an IP address to the DHCP client. After
receiving the DHCP Discover packet from the DHCP client, the DHCP server
selects an unassigned IP address from the IP address pool and then sends to the
DHCP client a DHCP Offer packet that carries information about the leased IP
address and other settings.
c. Request stage: The DHCP client selects an IP address. If multiple DHCP servers
send DHCP Offer packets to the DHCP client, the DHCP client accepts only the
first DHCP Offer packet it receives and then broadcasts a DHCP Request packet that
carries information about the selected IP address to all DHCP servers.
d. Acknowledgement stage: The DHCP server acknowledges the IP address that is
offered. After receiving the DHCP Request packet from the DHCP client, the
DHCP server sends the DHCP client a DHCP ACK packet that carries the offered
IP address and other settings. After receiving the DHCP ACK packet, the DHCP
client broadcasts a gratuitous ARP packet to check whether any host is using the IP
address assigned by the DHCP server. If the DHCP client does not receive a
response within a specified period, it uses the IP address. If the DHCP client
receives a response within a specified period, it sends a DHCP Decline packet to the
DHCP server to notify the DHCP server that the IP address is unavailable. The
DHCP client then re-applies for an IP address.
The unassigned IP addresses offered by other DHCP servers (except the DHCP server
selected by the DHCP client) are available for other DHCP clients.

● The DHCP client accesses a network for the second time.

When the DHCP client accesses a network for the second time, it goes through the
following stages to set up a connection to the DHCP server:
a. If the DHCP client has previously accessed the network successfully, it broadcasts
only a DHCP Request packet that carries the previously assigned IP address when
it accesses the network again.
b. After receiving the DHCP Request packet, the DHCP server sends a DHCP ACK
packet to instruct the DHCP client to continue to use the previously assigned IP
address if the IP address is not assigned to another DHCP client.
c. If the IP address cannot be assigned to the DHCP client (for example, it has been
assigned to another DHCP client), the DHCP server sends a DHCP NAK packet to
the DHCP client. After receiving the DHCP NAK packet, the DHCP client sends a
DHCP Discover packet to apply for a new IP address.
● The DHCP client extends the IP address lease.
Generally, each dynamic IP address assigned to the DHCP client has a lease. The DHCP
server withdraws the IP address after the lease expires. To continue to use the IP address,
the DHCP client must renew the IP address lease.
Static and Dynamic Assignment of IP Addresses

Different hosts require different leases for IP addresses. For example, servers may require
fixed IP addresses for a long time; some enterprise hosts may require dynamically assigned IP
addresses for a long time; some clients may require only temporary IP addresses.
To meet the preceding requirements, the DHCP server provides the following IP address
assignment policies:
● Manual address assignment: An administrator assigns fixed IP addresses to a few
specific hosts, such as the WWW server.
● Automatic address assignment: The DHCP server assigns fixed IP addresses to hosts that
access a network for the first time. These IP addresses can be used by the hosts for a long
time.
● Dynamic address assignment: The DHCP server assigns IP addresses with leases to
clients. The clients must apply for new IP addresses after the leases expire. This address
assignment policy is widely used.
The DHCP server assigns an IP address to a client in the following sequence:
1. IP address that is in the database of the DHCP server and is statically bound to the MAC
address of the client
2. IP address assigned to the client before, that is, the IP address in the Requested IP
Address option of a DHCP Discover packet sent by the client
3. IP address that is first found when the DHCP server searches the DHCP address pool for
available IP addresses
4. Available IP address that is obtained in turn from the expired IP addresses and
conflicting IP addresses if the DHCP address pool has no available IP address
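The selection order above can be expressed as a small Python function. This is a sketch under stated assumptions rather than the server's actual algorithm: the function and parameter names are hypothetical, and the check that the previously assigned address is still free is an added assumption.

```python
from typing import Dict, Iterable, Optional, Set


def select_address(
    client_mac: str,
    requested_ip: Optional[str],
    static_bindings: Dict[str, str],
    pool: Iterable[str],
    reclaimable: Iterable[str],
    in_use: Set[str],
) -> Optional[str]:
    """Pick an address for a client following the four-step order in the text."""
    # 1. Address statically bound to the client's MAC address.
    if client_mac in static_bindings:
        return static_bindings[client_mac]
    free = [ip for ip in pool if ip not in in_use]
    # 2. Address previously assigned to the client (Requested IP Address option).
    #    Assumption: it is granted only if it is still free.
    if requested_ip is not None and requested_ip in free:
        return requested_ip
    # 3. First available address found in the pool.
    if free:
        return free[0]
    # 4. Expired or conflicting addresses, tried in turn.
    for ip in reclaimable:
        if ip not in in_use:
            return ip
    return None
```

Returning `None` models the case in which no address can be offered.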
Method for Preventing Repetitive IP Address Assignment

To prevent address conflicts, the DHCP server sends ping packets to check whether an IP
address is in use before assigning it to a client.
After sending a ping packet, the DHCP server checks whether a response to the ping packet
is received within a specified period. If the number of ping packets sent reaches the upper
limit and there is still no response, the DHCP server considers that the IP address is not
used by any device on the network segment of the IP address. This ensures that the IP
address assigned to the client is unique (implemented according to RFC 2131).
The DHCP server sends two ping packets by default, and the default timeout period for each
ping response is 500 ms.
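The probing logic can be sketched in Python as follows. The default count of two probes and the 500 ms timeout come from the text; the function name and the injected `send_ping` callable are hypothetical, so that the sketch does not depend on any particular ICMP library.

```python
from typing import Callable

DEFAULT_PING_COUNT = 2    # default number of ping packets stated in the text
DEFAULT_TIMEOUT_S = 0.5   # default timeout for each ping response (500 ms)


def address_is_free(
    ip: str,
    send_ping: Callable[[str, float], bool],
    count: int = DEFAULT_PING_COUNT,
    timeout_s: float = DEFAULT_TIMEOUT_S,
) -> bool:
    """Return True if no probe is answered, that is, the address appears unused.

    `send_ping(ip, timeout_s)` is a caller-supplied function that returns True
    when a response arrives within the timeout.
    """
    for _ in range(count):
        if send_ping(ip, timeout_s):
            return False  # a host answered: the address is already in use
    return True
```

With the defaults, an unused address costs about one second of probing before it can be offered.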
Lease of IP Addresses in an Address Pool

Different address pools can have different IP address leases, but IP addresses in the same
address pool have the same lease.
Generally, each dynamic IP address assigned to the DHCP client has a lease. The DHCP
server withdraws the IP address after the lease expires. To continue to use the IP address, the
DHCP client must renew the IP address lease.
When obtaining an IP address, the DHCP client enters the binding state. Three timers are
configured for the DHCP client to respectively control lease renewal, rebinding, and lease
expiration. When assigning an IP address to the client, the DHCP server can specify values
for the timers. If the DHCP server does not specify values for the timers, the DHCP client
uses the default values. The following table lists the default values of the timers.
Table 15-5 Default values of the timers

Timer              Default Value
Lease renewal      50% of the lease
Rebinding          87.5% of the lease
Lease expiration   100% of the lease
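As a worked example of the defaults in Table 15-5, the following Python sketch derives the three timers from a lease and reports which phase a client is in after a given elapsed time. The function names and phase labels are illustrative only.

```python
RENEWAL_FRACTION = 0.5      # lease renewal timer: 50% of the lease
REBINDING_FRACTION = 0.875  # rebinding timer: 87.5% of the lease


def default_lease_timers(lease_s: float) -> dict:
    """Return the default timer values, in seconds, for a lease of `lease_s` seconds."""
    return {
        "renewal": lease_s * RENEWAL_FRACTION,
        "rebinding": lease_s * REBINDING_FRACTION,
        "expiration": float(lease_s),
    }


def client_phase(elapsed_s: float, lease_s: float) -> str:
    """Classify the client phase from the time elapsed since the lease was granted."""
    timers = default_lease_timers(lease_s)
    if elapsed_s >= timers["expiration"]:
        return "expired"
    if elapsed_s >= timers["rebinding"]:
        return "rebinding"
    if elapsed_s >= timers["renewal"]:
        return "renewing"
    return "bound"
```

For a one-day lease (86400s), the renewal timer expires after 43200s and the rebinding timer after 75600s.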
When the lease renewal timer expires, the DHCP client must renew its IP address. The DHCP
client automatically sends a DHCP Request packet to the DHCP server that assigned the IP
address and enters the renewal state. If the IP address is valid, the DHCP server replies with a
DHCP ACK packet to grant the DHCP client a new lease. The DHCP client then re-enters
the binding state. If the DHCP client receives a DHCP NAK packet from the DHCP server, it
enters the initializing state.
After sending a DHCP Request packet for extending the lease, the DHCP client remains in the
renewal state and waits for a response. If the DHCP client does not receive any response from
the DHCP server before the rebinding timer expires, it considers the original DHCP server
unavailable and starts to broadcast a DHCP Request packet.
Any DHCP server on the network can reply to the DHCP Request packet with a DHCP ACK
or DHCP NAK packet.
If the DHCP client receives a DHCP ACK packet, it re-enters the binding state and resets the
lease renewal and rebinding timers. If the DHCP client receives only DHCP NAK packets, it
stops using the IP address immediately and returns to the initializing state to apply for a new
IP address.
If the DHCP client does not receive any responses before the lease expiration timer expires, it
stops using the IP address immediately and returns to the initializing state. The DHCP client
then sends a DHCP Discover packet to apply for a new IP address (implemented according to
RFC 2131).
15.2.3 Applications
This section describes typical application scenarios of DHCP.
15.2.3.1 DHCP Client Application
Figure 15-16 shows a networking diagram in which the UPE functioning as a DHCP client
initiates a request for obtaining a management IP address.
Figure 15-16 Networking diagram of obtaining a management IP address
(Networking diagram: the UPE as the DHCP client, the PE-AGG as the DHCP relay agent, and the
NMS as the DHCP server, connected through an IP/MPLS network.)
15.2.3.2 DHCP Server Application
As shown in Figure 15-17, a DHCP server and multiple DHCP clients (such as PCs and
portable computers) are deployed.
Figure 15-17 Typical networking of the DHCP server
(Networking diagram: a DHCP server connected to multiple DHCP clients.)
Generally, the DHCP server is used to assign IP addresses in the following scenarios:
● On a large network, manual configuration takes a long time and makes centralized
management of the entire network difficult.
● The network has more hosts than available IP addresses, so not every host can have a
fixed IP address. Many hosts need to obtain IP addresses dynamically through the DHCP
server. In addition, network administrators want to limit the number of users that are
online at the same time.
● Only a few hosts on the network require fixed IP addresses.
15.2.3.3 DHCP Relay Application

Figure 15-18 shows typical networking of DHCP relay.
Figure 15-18 Typical networking of DHCP relay
(Networking diagram: DHCP clients connect to a DHCP relay agent, which reaches the DHCP
server across the Internet.)
The early DHCP protocol applies only to the scenario in which the DHCP client and DHCP
server are on the same network segment. To dynamically assign IP addresses to hosts on
multiple network segments, the network administrator needs to configure a DHCP server on
each network segment, which increases costs.
The DHCP relay function is introduced to solve this problem. A DHCP client can apply to the
DHCP server on another network segment to obtain a valid IP address. In this manner, DHCP
clients on multiple network segments can share one DHCP server. This reduces costs and
facilitates centralized management.
Terms
None

Acronym & Abbreviation   Full Name
SNP                      Snooping
CHADDR                   Client Hardware Address
15.3 DHCPv6
AND2CXPE configured) supports the DHCPv6 Relay. Only the ATN 910I/ATN 910B/ATN
905 support the DHCPv6 Client.
15.3.1 Introduction
Definition
Dynamic Host Configuration Protocol for IPv6 (DHCPv6) is designed to assign IPv6
addresses, prefixes, and other network configuration parameters to hosts.
Purpose
IPv6 provides a huge address space formed by 128-bit IPv6 addresses, which requires
proper and efficient assignment and management policies. IPv6 stateless address
autoconfiguration, defined in RFC 2462, is widely used. Hosts configured with stateless
address autoconfiguration automatically configure IPv6 addresses based on prefixes
carried in Router Advertisement (RA) packets sent from a neighboring router.
When stateless address autoconfiguration is used, routers do not record the IPv6 addresses
of hosts. Therefore, stateless address autoconfiguration has poor manageability. In addition,
hosts configured with stateless address autoconfiguration cannot obtain other configuration
parameters such as the DNS server address, and ISPs have no mechanism for automatically
allocating IPv6 prefixes to routers. Therefore, users need to manually configure IPv6
addresses for routing and switching devices during IPv6 network deployment.
DHCPv6 solves these problems. DHCPv6 is a stateful protocol for configuring IPv6 addresses
automatically. During stateful address configuration, a DHCPv6 server assigns a complete
IPv6 address to a host and provides other configuration parameters, such as the DNS server
address. A DHCPv6 relay agent may be used to relay DHCPv6 packets. The DHCPv6 server
binds the IPv6 address to a client. This improves network manageability.
Compared with manual address configuration and IPv6 stateless address autoconfiguration
that uses network prefixes in RA packets, DHCPv6 has the following advantages:
● Controls IPv6 address assignment better. A DHCPv6 device can record addresses
assigned to hosts and assign requested addresses. This function facilitates network
management.
● Assigns IPv6 address prefixes to network devices. This function facilitates automatic
configuration and hierarchical network management.
● Provides other network configuration parameters such as the DNS server address.

15.3.2 Principles
15.3.2.1 Principles of DHCPv6 Access
DHCP Unique Identifier (DUID) of a DHCPv6 device

During the interaction between a DHCPv6 client and a DHCPv6 server, each DHCPv6 client
or DHCPv6 server is identified by a unique DUID. On a DHCPv6 server, the client DUID
identifies a DHCPv6 client and is used in the local address allocation policy. On a DHCPv6
client, the server DUID is used to identify a DHCPv6 server. DUIDs can be generated in the
following modes:
● DUID Based on Link-layer Address Plus Time (DUID-LLT)
In DUID-LLT mode, DUIDs are generated for DHCPv6 devices based on link-layer
addresses and the time.
● DUID Assigned by Vendor Based on Enterprise Number (DUID-EN)
In DUID-EN mode, DUIDs are generated for DHCPv6 devices based on the enterprise
numbers registered with the Internet Assigned Numbers Authority (IANA).
● DUID Based on Link-layer Address (DUID-LL)
In DUID-LL mode, DUIDs are generated for DHCPv6 devices based on link-layer
addresses.
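The three DUID formats can be illustrated by building their byte encodings as laid out in the DHCPv6 specification (type codes 1, 2, and 3; DUID-LLT time counted in seconds since 2000-01-01 00:00 UTC, modulo 2^32). The following Python sketch is not taken from this document: the function names are hypothetical, and Ethernet (hardware type 1) is assumed as the link-layer type.

```python
import struct
from datetime import datetime, timezone

DUID_LLT, DUID_EN, DUID_LL = 1, 2, 3
HW_TYPE_ETHERNET = 1  # assumed link-layer hardware type
DUID_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def duid_llt(mac: bytes, when: datetime, hw_type: int = HW_TYPE_ETHERNET) -> bytes:
    """DUID-LLT: type, hardware type, time, then the link-layer address."""
    seconds = int((when - DUID_EPOCH).total_seconds()) % 2**32
    return struct.pack("!HHI", DUID_LLT, hw_type, seconds) + mac


def duid_en(enterprise_number: int, identifier: bytes) -> bytes:
    """DUID-EN: type, IANA enterprise number, then a vendor-assigned identifier."""
    return struct.pack("!HI", DUID_EN, enterprise_number) + identifier


def duid_ll(mac: bytes, hw_type: int = HW_TYPE_ETHERNET) -> bytes:
    """DUID-LL: type, hardware type, then the link-layer address."""
    return struct.pack("!HH", DUID_LL, hw_type) + mac
```

Because DUID-LLT embeds a timestamp, a device would normally generate it once and store it, whereas DUID-LL can be regenerated from the link-layer address at any time.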
Interaction Model Between a DHCPv6 Client and a DHCPv6 Server

DHCPv6 adopts the client/server model for communications. A client sends a request
message to a server to obtain a valid dynamic IPv6 address/prefix or other
configuration information. After receiving the request message, each server
replies with a Response message containing the configuration information for the client
according to policies. At different stages, DHCPv6 clients and servers exchange different
information in different modes:
● Exchanging configuration information, involving two steps:

– After a client obtains an IPv6 address/prefix in the stateless address configuration
manner, it multicasts a DHCPv6 Information-Request message.
– After a server receives the Information-Request message, it replies with a Response
message containing the configuration information for the client.
● Exchanging IPv6 addresses/prefixes and other configuration information, involving two
steps:
When a client accesses the network for the first time, it can obtain an IPv6 address/prefix
and configuration information as follows:
– The client multicasts a DHCPv6 Solicit message containing the Rapid Commit
option.
– After receiving the DHCPv6 Solicit message from the client, a DHCPv6 server
selects an unassigned IPv6 address/prefix from the IPv6 address/prefix pool and
assigns it to the client. If the server supports two-step interaction, it sends a
DHCPv6 Response message containing the leased IPv6 address/prefix and other
configuration information. Otherwise, the server sends a DHCPv6 Advertise
message.

– After receiving the Response message, the client uses the IPv6 address/prefix and
other configuration information in the Response message. If the client receives only
a DHCPv6 Advertise message within the specified period, the client undergoes four
stages to obtain the configuration information according to the configured policy.
● Exchanging IPv6 addresses/prefixes and other configuration information, involving four
stages:
When a DHCPv6 client accesses the network for the first time, similar to a DHCPv4
client, the DHCPv6 client undergoes four stages to obtain an IPv6 address/prefix and
other configuration information:
– Discovering stage: indicates the stage at which the DHCPv6 client searches for a
DHCPv6 server. The client multicasts a DHCPv6 Solicit message.
– Offering stage: indicates the stage at which the DHCPv6 server offers an IPv6
address/prefix to the DHCPv6 client. After receiving the DHCPv6 Solicit message
from the client, the DHCPv6 server selects an unassigned IPv6 address/prefix from
the IPv6 address/prefix pool, and then sends a DHCPv6 Advertise message
containing the leased IPv6 address/prefix and other configuration information to the
client.
– Selecting stage: indicates the stage at which the DHCPv6 client selects an IPv6
address/prefix. If multiple DHCPv6 servers send DHCPv6 Advertise messages to
the client, the client selects a server according to the configured policy. If the
Advertise message contains the Server Unicast option, and the client also supports
this option, the client unicasts a DHCPv6 Request message to the selected DHCPv6
server. Otherwise, the client multicasts a DHCPv6 Request message containing
information used to instruct the selected DHCPv6 server to offer an IPv6
address/prefix.
– Acknowledging stage: indicates the stage at which the DHCPv6 server
acknowledges the IPv6 address/prefix to be offered. After receiving the DHCPv6
Request message from the client, the DHCPv6 server sends a DHCPv6 Response
message to the client. The DHCPv6 Response message contains the offered IPv6
address/prefix and other configuration information. After receiving the DHCPv6
Response message, the client uses the offered IPv6 address/prefix and other
configuration information.
● The DHCPv6 client extends the IPv6 address/prefix lease.
When a DHCPv6 server assigns an IPv6 address/prefix to a client, the server sends a
message containing the preferred lifetime, valid lifetime, lease renew time, and rebind
time. The relationship between them is as follows: lease renew time < rebind time <
preferred lifetime < valid lifetime.
The preferred lifetime is used to limit the lease renew time and rebind time. By default,
the lease renew time and rebind time account for 50% and 80% respectively of the
preferred lifetime.
The valid lifetime is the lease set for the IPv6 address/prefix assigned to a client. The
server retrieves the IPv6 address/prefix after the valid lifetime expires. If the client
intends to continue to use this IPv6 address/prefix, it needs to extend the IPv6 address/
prefix lease before the valid lifetime ends.
When the lease renew time is reached, the DHCPv6 client automatically
sends a DHCPv6 Renew message to the server. If the client and server support unicast,
the client unicasts a DHCPv6 Renew message. Otherwise, the client multicasts a
DHCPv6 Renew message.
After the DHCPv6 server receives the DHCPv6 Renew message, if the contained IPv6
address/prefix is valid and the lease can be renewed, the server replies with a DHCPv6
Response message containing the new lease of the IPv6 address/prefix. After receiving
the DHCPv6 Response message from the server, the client renews the lease of its IPv6
address/prefix.
When the rebind time is reached, if the DHCPv6 client has not finished renewing the lease
of its IPv6 address/prefix, it multicasts a DHCPv6 Rebind message to all available servers.
After the DHCPv6 server receives the DHCPv6 Rebind message, if the contained IPv6
address/prefix is valid and the lease can be renewed, the server replies with a DHCPv6
Response message containing the new lease of the IPv6 address/prefix. After receiving
the DHCPv6 Response message from the server, the client renews the lease of its IPv6
address/prefix.
● When the link to which the DHCPv6 client is connected changes, the client needs to
check whether its IPv6 address/prefix is still available.
When the link to which the DHCPv6 client is connected changes, for example, because the
network cable is loose and the link goes down and then up, the client needs to send a
DHCPv6 Confirm message to the server to check whether its IPv6 address/prefix is still
available.
If the IPv6 address needs to be validated, the client multicasts a DHCPv6 Confirm
message containing the IPv6 address to be validated.
After the DHCPv6 server receives the DHCPv6 Confirm message, if the IPv6 address/
prefix assigned to the client is still available, the server replies with a DHCPv6 Response
message in which the status of the IPv6 address is set to Success. After receiving the
DHCPv6 Response message from the server, the client continues to use this IPv6
address.
If the IPv6 prefix needs to be validated, the client multicasts a DHCPv6 Rebind message
containing the IPv6 prefix to be validated. The DHCPv6 server processes the received
Rebind message and then replies with a Response message. After the client receives the
Response message from the server, if the lifetime of the IPv6 prefix is not 0, the client
continues to use this prefix and renew the lease.
● The DHCPv6 client detects a duplicate IPv6 address.
If the DHCPv6 client detects a duplicate IPv6 address, it notifies the server of the
address conflict.
That is, the DHCPv6 client sends a DHCPv6 Decline message containing the duplicate
IPv6 address to the server. The source address of the Decline message cannot be the
duplicate address. If the client and server support unicast, the client unicasts a DHCPv6
Decline message to the server. Otherwise, the client multicasts a DHCPv6 Decline
message to the server.
When receiving the DHCPv6 Decline message, the server marks the IPv6 address
contained in the Decline message as a duplicate address.
● The DHCPv6 client releases an IPv6 address/prefix.
To release its IPv6 address/prefix, the DHCPv6 client sends a DHCPv6 Release message
containing the IPv6 address/prefix to be released to the server. If the client and server
support unicast, the client unicasts a DHCPv6 Release message. Otherwise, the client
multicasts a DHCPv6 Release message.
After receiving the DHCPv6 Release message from the client, the server releases the
IPv6 address/prefix assigned to the client and responds with a Reply message.
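The lease extension timing described above (lease renew time < rebind time < preferred lifetime < valid lifetime, with default renew and rebind times of 50% and 80% of the preferred lifetime) can be worked through in a short Python sketch. The function names and the returned labels are hypothetical, and integer seconds are assumed.

```python
from typing import Tuple


def default_renew_rebind(preferred_lifetime_s: int) -> Tuple[int, int]:
    """Return the default (renew, rebind) times derived from the preferred lifetime."""
    renew = preferred_lifetime_s // 2        # 50% of the preferred lifetime
    rebind = preferred_lifetime_s * 4 // 5   # 80% of the preferred lifetime
    return renew, rebind


def ordering_holds(renew: int, rebind: int, preferred: int, valid: int) -> bool:
    """Check the relationship between the four values stated in the text."""
    return renew < rebind < preferred < valid


def message_to_send(elapsed_s: int, renew: int, rebind: int, valid: int) -> str:
    """Name the message a client would send after `elapsed_s` seconds without renewal."""
    if elapsed_s >= valid:
        return "Expired"   # the address/prefix can no longer be used
    if elapsed_s >= rebind:
        return "Rebind"    # multicast to all available servers
    if elapsed_s >= renew:
        return "Renew"     # sent to the server that assigned the address/prefix
    return "None"
```

For a preferred lifetime of 3600s, the defaults give a renew time of 1800s and a rebind time of 2880s.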

Principles of the DHCPv6 Relay Agent

The ATN can function as a DHCPv6 relay agent to forward DHCPv6 request messages from
DHCPv6 clients and DHCPv6 response messages from the DHCPv6 server. The following
figure shows the DHCPv6 message forwarding process.
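Figure 15-19 DHCPv6 message forwarding process
(Message sequence among the client, two cascaded relay agents, and the server: the client
sends packets; each relay agent passes them toward the server in a Relay-Forward message;
the server's response packets travel back through the relay agents in Relay-Reply messages.)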
A DHCPv6 relay agent encapsulates every request message that passes through it into a
Relay-forward message before forwarding it to another relay agent or the server. These
request messages include DHCPv6 request messages originating from clients and Relay-forward
messages originating from other relay agents. The server then encapsulates its responses to
the clients into Relay-reply messages and sends the Relay-reply messages to the relay
agent.
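The nested encapsulation can be illustrated with the message layout from the DHCPv6 specification: message type 12 (Relay-forward) or 13 (Relay-reply), a one-byte hop count, a 16-byte link address, a 16-byte peer address, and a Relay Message option (code 9) carrying the inner message. The Python sketch below is a simplified illustration, not the ATN implementation; the function names are hypothetical, and it assumes that the Relay Message option is the first and only option.

```python
import ipaddress
import struct

RELAY_FORW = 12        # Relay-forward message type
RELAY_REPL = 13        # Relay-reply message type
OPTION_RELAY_MSG = 9   # Relay Message option code
HEADER_LEN = 34        # type (1) + hop count (1) + link address (16) + peer address (16)


def relay_forward(inner: bytes, link_addr: str, peer_addr: str) -> bytes:
    """Wrap a client message or an earlier Relay-forward message."""
    # A message that is already a Relay-forward gets its hop count incremented.
    hop_count = inner[1] + 1 if inner[0] == RELAY_FORW else 0
    header = (
        bytes([RELAY_FORW, hop_count])
        + ipaddress.IPv6Address(link_addr).packed
        + ipaddress.IPv6Address(peer_addr).packed
    )
    option = struct.pack("!HH", OPTION_RELAY_MSG, len(inner)) + inner
    return header + option


def unwrap_relay_reply(reply: bytes) -> bytes:
    """Extract the inner message from a Relay-reply (Relay Message option first)."""
    if reply[0] != RELAY_REPL:
        raise ValueError("not a Relay-reply message")
    code, length = struct.unpack("!HH", reply[HEADER_LEN:HEADER_LEN + 4])
    if code != OPTION_RELAY_MSG:
        raise ValueError("Relay Message option expected first")
    return reply[HEADER_LEN + 4:HEADER_LEN + 4 + length]
```

Calling `relay_forward` on the output of an earlier `relay_forward` call models a second relay agent on the path: the first Relay-forward message becomes the payload of the second.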
15.3.3 Applications
15.3.3.1 DHCPv6 Client over PPPoE (Including DHCPv6-PD)
The ATN device that functions as a CPE supports the following functions:
● Routed mode, either numbered or unnumbered
– In numbered routed mode, the ATN device sends DHCPv6 requests that carry the
IA_NA option to apply for IPv6 addresses of WAN interfaces, as shown in Figure
15-20.
Figure 15-20 Numbered routed mode
(Networking diagram: HOST, CPE, and BNG. The CPE sends DHCPv6 requests with the IA_NA
option to apply for IPv6 addresses of WAN interfaces; ND is used on the HOST side.)
– In unnumbered routed mode, the ATN device sends DHCPv6 requests that do not
carry the IA_NA option so that IPv6 addresses of WAN interfaces are not applied for
separately, as shown in Figure 15-21.
Figure 15-21 Unnumbered routed mode
(Networking diagram: HOST, CPE, and BNG. The CPE sends DHCPv6 requests without the IA_NA
option so that IPv6 addresses of WAN interfaces are not applied for separately; ND is used
on the HOST side.)
● Applying for an access-side IPv6 address pool using DHCPv6-PD and assigning IPv6
prefixes to the user side using ND
15.3.3.2 DHCPv6 Relay

The ATN device can function as a DHCPv6 relay agent to forward DHCPv6 request messages
from DHCPv6 clients and DHCPv6 response messages from the DHCPv6 server.
Figure 15-22 DHCPv6 relay networking
(Networking diagram: the CPE as the DHCPv6 client, the PE-AGG as the DHCPv6 relay agent, and
the NMS as the DHCPv6 server, connected through an IP/MPLS network.)

Terms
None

Acronym & Abbreviation   Full Name
SNP                      Snooping
CHADDR                   Client Hardware Address
15.4 Plug-and-Play
15.4.1 Introduction to Plug-and-Play

Definition
Plug-and-Play (PnP) refers to the ability of a network management system (NMS) to
automatically configure a device added to the network through the Dynamic Host
Configuration Protocol (DHCP). Through PnP, you can commission remote devices in a
centralized manner.
Purpose
A large number of devices need to access the network. As a result, the CapEx of project
deployment, especially of on-site device commissioning on the mobile bearer network, is
high and greatly affects carrier profit. Huawei provides the PnP solution to address this
problem.
Benefits
This feature brings the following benefits to carriers:
PnP greatly reduces the time required for on-site device commissioning and spares
commissioning engineers from working in harsh outdoor environments. In this manner,
PnP accelerates project progress and improves project quality.
15.4.2 Principles
15.4.2.1 Principle of DHCP
The Dynamic Host Configuration Protocol (DHCP) provides a framework for transmitting
configuration information to hosts on a TCP/IP network. DHCP, based on the Bootstrap
Protocol (BOOTP), adds the capability of automatically allocating reusable network addresses
and adds additional configuration options to DHCP packets.
DHCP packets can be classified into eight types. A DHCP server and a DHCP client
communicate with each other by exchanging these DHCP packets.

• DHCPDISCOVER: It is the first packet used to search for a DHCP server when a DHCP
client accesses the network for the first time.
• DHCPOFFER: It is sent by a DHCP server to respond to a DHCPDISCOVER packet. A
DHCPOFFER packet carries configuration information.
• DHCPREQUEST: The DHCP client sends a DHCPREQUEST packet to the DHCP
server in any of the following situations:
– After being initialized, the DHCP client broadcasts a DHCPREQUEST packet to
respond to the DHCPOFFER packet sent from the DHCP server.
– After being restarted, the DHCP client broadcasts a DHCPREQUEST packet to
confirm the correctness of the configurations, such as the previously allocated IP
address.
– After being bound to an IP address, the DHCP client sends a unicast
DHCPREQUEST packet to extend the lease of the IP address.
• DHCPACK: It is sent by a DHCP server to acknowledge the DHCPREQUEST packet
sent from a DHCP client. After receiving a DHCPACK packet, the DHCP client obtains
the configuration information, including the IP address.
• DHCPNAK: It is sent by a DHCP server to refuse the DHCPREQUEST packet from a
DHCP client, for example, when the lease of the IP address assigned to the DHCP client
has expired or the DHCP client has moved to another network.
• DHCPDECLINE: It is sent by a DHCP client to notify the DHCP server that the
assigned IP address conflicts with an IP address already in use. The DHCP client then
applies to the DHCP server for another IP address.
• DHCPRELEASE: It is sent by a DHCP client to ask the DHCP server to release the
network address and cancel the remaining lease.
• DHCPINFORM: It is sent by a DHCP client to the DHCP server to ask for configuration
parameters after the DHCP client obtains an IP address.
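The eight packet types listed above can be summarized in a short sketch. The numeric values are the DHCP Message Type (Option 53) codes defined in RFC 2132; the names `DhcpMessageType`, `sender`, and `message_type_option` are illustrative and are not part of the ATN software.

```python
from enum import IntEnum


class DhcpMessageType(IntEnum):
    """DHCP message types, numbered as in Option 53 (RFC 2132)."""
    DISCOVER = 1
    OFFER = 2
    REQUEST = 3
    DECLINE = 4
    ACK = 5
    NAK = 6
    RELEASE = 7
    INFORM = 8


# OFFER, ACK, and NAK come from the server; the other five come from the client.
SERVER_MESSAGES = {DhcpMessageType.OFFER, DhcpMessageType.ACK, DhcpMessageType.NAK}


def sender(msg_type: DhcpMessageType) -> str:
    """Return which side of the exchange originates the given message type."""
    return "server" if msg_type in SERVER_MESSAGES else "client"


def message_type_option(msg_type: DhcpMessageType) -> bytes:
    """Encode Option 53 as code, length, value."""
    return bytes([53, 1, int(msg_type)])
```

For example, `sender(DhcpMessageType.NAK)` reports that the server originates a DHCPNAK packet.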
15.4.2.2 Operation Process of PnP

An underlayer PE (UPE) must function as a DHCP client to support the PnP feature. That is,
the UPE must be able to obtain an IP address by exchanging DHCP packets with the DHCP
server as described in Figure 15-23. The Network Management System (NMS) delivers
configuration files and startup files to the UPE. Then, the UPE can use the PnP feature after
restarting with the new configuration files.

Figure 15-23 Operation process of PnP

[Figure: message exchange, numbered 1 to 11, among the UPE (DHCP client), the DHCP relay agent, and the NMS (DHCP server) across the IP/MPLS network.]
The operation process of PnP is as follows:
1. After being powered on, the UPE starts the PnP process automatically. First, the UPE
broadcasts a DHCP Discover packet that carries the Vendor Class Identifier (VCI) in the
Option 60 field and the client identifier in the Option 61 field.
2. The DHCP relay agent receives the DHCP Discover packet and appends the Option 82
field to the packet. Then, the DHCP relay agent unicasts the packet to the DHCP server,
which functions as the NMS.
3. The DHCP server searches the network element planning forms for a fixed IP address
according to the Option 60, Option 61 and Option 82 fields carried in the packet. The
DHCP server allocates a fixed IP address and responds to the DHCP relay agent with a
DHCP Offer packet.
4. The DHCP relay agent receives the DHCP Offer packet and then sends it to the UPE.
5. The UPE broadcasts a DHCP request.
6. The DHCP relay agent receives the DHCP request and appends the Option 82 field to the
packet. Then, the DHCP relay agent unicasts the packet to the DHCP server, which
functions as the NMS.
7. The DHCP server checks information in the received packet and acknowledges the
address allocation for the UPE. In addition, the DHCP server responds to the DHCP
relay agent with a DHCP ACK packet.
8. The DHCP relay agent receives the DHCP ACK packet and sends it to the UPE.
9. After receiving the DHCP ACK packet, the UPE sends gratuitous ARP packets and
checks whether the allocated address is already in use.
If the UPE finds that the allocated address is not in use, the UPE obtains information
such as the IP address (yiaddr), mask (Option 1), and gateway (Option 3) from the DHCP
ACK packet and generates a route. Meanwhile, the ip address X.X.X.X mask command
is automatically executed, and Telnet, an AAA administrative user, and SNMP are configured
on the UPE. Finally, the DHCP client function is disabled on the UPE, and the UPE
no longer sends or processes DHCP packets.
10. The NMS delivers configuration files, startup files, and a restart command to the UPE.

11. The UPE can use the PnP feature. That is, the PnP process is complete.
NOTE
After the PnP process is complete, all VTY channels are available. The first user is required to log in to the
device through the DHCP server or DHCP relay and create the login password.
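Step 9 of the process reads the address, mask, and gateway from the DHCP ACK packet. The following is a minimal sketch of that extraction, assuming the standard BOOTP/DHCP layout (236-byte fixed header with yiaddr at offset 16, followed by the magic cookie and type-length-value options). The function names and the sample addresses are hypothetical and do not represent the UPE implementation.

```python
import ipaddress

MAGIC_COOKIE = bytes([99, 130, 83, 99])  # marks the start of the options field
BOOTP_FIXED_LEN = 236                    # fixed header size in bytes


def parse_ack(packet: bytes) -> dict:
    """Extract yiaddr, Option 1 (mask), and Option 3 (gateway) from a DHCP ACK."""
    start = BOOTP_FIXED_LEN + len(MAGIC_COOKIE)
    if len(packet) < start or packet[BOOTP_FIXED_LEN:start] != MAGIC_COOKIE:
        raise ValueError("not a DHCP packet")
    result = {"ip": str(ipaddress.IPv4Address(packet[16:20]))}
    options = {}
    i = start
    while i < len(packet):
        code = packet[i]
        if code == 255:          # End option
            break
        if code == 0:            # Pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(packet):  # truncated option, stop parsing
            break
        length = packet[i + 1]
        options[code] = packet[i + 2:i + 2 + length]
        i += 2 + length
    if 1 in options:
        result["mask"] = str(ipaddress.IPv4Address(options[1][:4]))
    if 3 in options:
        # Option 3 may list several routers; the first one is used here.
        result["gateway"] = str(ipaddress.IPv4Address(options[3][:4]))
    return result


def build_sample_ack() -> bytes:
    """Build a synthetic ACK with documentation addresses for demonstration."""
    fixed = bytearray(BOOTP_FIXED_LEN)
    fixed[0] = 2  # op = BOOTREPLY
    fixed[16:20] = ipaddress.IPv4Address("192.0.2.10").packed
    opts = (bytes([53, 1, 5])
            + bytes([1, 4]) + ipaddress.IPv4Address("255.255.255.0").packed
            + bytes([3, 4]) + ipaddress.IPv4Address("192.0.2.1").packed
            + bytes([255]))
    return bytes(fixed) + MAGIC_COOKIE + opts
```

Calling `parse_ack(build_sample_ack())` yields the three values that the UPE would use to configure its interface address and generate a route.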
15.4.3 Applications
15.4.3.1 Application of PnP
As shown in Figure 15-24, the UPE obtains a management IP address through DHCP and
starts a management channel through automatic configuration. The NMS delivers
configuration files and startup files through the management channel.
Figure 15-24 Network diagram of obtaining a management IP address and starting a management channel
[Figure: UPE (DHCP client), PE-AGG (DHCP relay), and NMS (DHCP server) connected through an IP/MPLS network.]
Terms
Term            Definition
Plug-and-Play   Plug-and-Play (PnP) is the ability of an NMS to automatically configure a device added to the network through DHCP. Through PnP, you can commission remote devices in a centralized manner.
Abbreviations
PnP             Plug-and-Play
15.5 DCN
15.5.1 Introduction
Definition
The data communication network (DCN) refers to the network on which network elements
(NEs) exchange Operation, Administration and Maintenance (OAM) information with the
network management system (NMS), and it is constructed for communication between
managed devices.
A DCN can be an external or internal DCN. As shown in Figure 15-25, the external DCN is
the network between the NMS and the access point. The internal DCN is the network on
which network elements transmit OAM information. The DCN mentioned in the following
description is internal DCN.
Figure 15-25 External DCN and internal DCN
[Figure: the NMS connects through the external DCN to two GNEs; each GNE connects to its NEs through an internal DCN.]
Gateway network elements (GNEs) are the network elements that connect to the NMS using
protocols such as Transmission Control Protocol (TCP) and Simple Network Management
Protocol (SNMP). GNEs are able to forward data at the network or application layer. The
NMS can use GNEs to manage remote NEs connected through optical fibers.

Purpose
When constructing a large network, hardware engineers must install devices on site, and
software commissioning engineers must configure the devices on site. This network
construction method requires significant human and material resources, causing high capital
expenditure (CAPEX) and operational expenditure (OPEX). If a new NE is deployed but the
NMS cannot detect the NE, the network administrator cannot manage or control the NE. Plug-
and-play is required so that the NMS can automatically detect new NEs and remotely
commission NEs to reduce CAPEX/OPEX.
The DCN technique offers a mechanism to implement plug-and-play. After an NE is installed
and started, an IP address (NEIP address) is automatically generated for the NE based on the
NEID. Each NE adds its NEID and NEIP address to a Type-10 LSA. Then, OSPF advertises
all the Type-10 LSAs to construct a core routing table consisting of mappings between NEIP
addresses and NEIDs on each NE. After detecting a new NE, the GNE reports the NE to the
NMS. The NMS accesses the NE using the IP address of the GNE and ID of the NE. To
commission NEs, the NMS can use the GNE to remotely manage the NEs on the network.
NOTE
To improve the system security, it is recommended that the NEIP address be changed to the planned one.
Benefits
The NMS is able to manage NEs using service channels provided by the managed NEs. No
additional devices are required, reducing CAPEX/OPEX.
15.5.2 Principles
NEID and NEIP

• NEID
On a data communication network (DCN), a network element (NE) is uniquely identified
by an ID but not an IP address. This ID is called an NEID. A 24-bit NEID consists of a
subnet number and a basic ID. The leftmost 8 bits of an NEID indicate a subnet. The
rightmost 16 bits of an NEID indicate a basic ID. Each NE is assigned a default NEID
before the NE is delivered.
As the unique identities of NEs on a DCN, NEIDs must be different from each other. If
the NEIDs of two NEs on a DCN are identical, route flapping occurs.
• NEIP
NEIP addresses help managed terminals access NEs and allow addressing between NEs
in IP networking. An NEIP address consists of a network number and a host number. A
network number uniquely identifies a physical or logical link. All the NEs along the link
have the same network number. A network number is obtained using an AND operation
on the 32-bit IP address and the corresponding subnet mask.
An NEIP address is derived from an NEID when an NE is being initialized. An NEIP
address is in the format of 128.subnet number.basic ID.
The following example uses the default NEID 0x09BFE0, which is
00001001.10111111.11100000 in binary format. The basic ID is the 16 least significant bits
10111111.11100000, which is 191.224 in decimal format. The subnet number is the 8
most significant bits 00001001, which is 9 in decimal format. Therefore, the NEIP
address derived from 0x09BFE0 is 128.9.191.224.
Before the NEIP address is manually changed, the NEIP address and NEID are
associated; therefore, the NEIP address changes if the NEID is changed. Once the NEIP
address is manually changed, it no longer changes when the associated NEID is changed.
NOTE
To improve the system security, it is recommended that the NEIP address be changed to the
planned one.
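The derivation of the default NEIP address from the NEID can be expressed as a short sketch. It follows the 128.subnet number.basic ID format described above; the function name `neid_to_neip` is illustrative.

```python
def neid_to_neip(neid: int) -> str:
    """Derive the default NEIP address from a 24-bit NEID."""
    if not 0 <= neid <= 0xFFFFFF:
        raise ValueError("an NEID is a 24-bit value")
    subnet = (neid >> 16) & 0xFF   # leftmost 8 bits: subnet number
    basic_id = neid & 0xFFFF       # rightmost 16 bits: basic ID
    return f"128.{subnet}.{basic_id >> 8}.{basic_id & 0xFF}"
```

With the default NEID from the example, `neid_to_neip(0x09BFE0)` returns `128.9.191.224`.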
Core Routing Table

Each device maintains a routing table consisting of mappings between NEIP addresses and
NEIDs of NEs on a DCN. This routing table is called a DCN core routing table. After the
device is started, OSPF is enabled automatically, and each device adds its NEID and NEIP
address to a Type-10 link-state advertisement (LSA). Then, OSPF advertises all the
Type-10 LSAs to construct a core routing table on each device. To access a non-GNE through
a GNE on a DCN, the NMS searches the DCN core routing table for a destination NEIP
address and sends a TCP packet to the GNE. The GNE uses the NEID carried in the TCP
packet to search for a route, and then forwards a UDP packet to the target non-GNE. The
UDP packet is addressed to the destination based on IP routing. Therefore, the generation of
the core routing table indicates that NEs have successfully negotiated with the GNE.
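A core routing table can be pictured as a mapping from NEID to NEIP address that is filled in as Type-10 LSAs arrive. The sketch below is a simplified model under that assumption; the class and method names are hypothetical, and raising an error on a conflicting NEID is a choice made for illustration (the text above states only that identical NEIDs cause route flapping).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class DuplicateNeidError(ValueError):
    """Raised when one NEID is advertised with two different NEIP addresses."""


@dataclass
class CoreRoutingTable:
    entries: Dict[int, str] = field(default_factory=dict)

    def learn(self, neid: int, neip: str) -> None:
        """Record the NEID-to-NEIP mapping carried in one advertisement."""
        known = self.entries.get(neid)
        if known is not None and known != neip:
            raise DuplicateNeidError(f"NEID {neid:#08x} maps to {known} and {neip}")
        self.entries[neid] = neip

    def lookup(self, neid: int) -> Optional[str]:
        """Return the NEIP address for an NEID, or None if it is unknown."""
        return self.entries.get(neid)
```

A GNE-side lookup such as `table.lookup(0x090003)` then gives the destination address used to reach the non-GNE.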
15.5.2.2 Basic DCN Principles

Huawei NEs can use serial interfaces or sub-interfaces numbered 4094 for DCN
communication. Non-Huawei NEs cannot use serial interfaces for DCN communication.
Therefore, to implement DCN communication between Huawei NEs and non-Huawei NEs,
sub-interfaces numbered 4094 must be configured.

Using Serial Interfaces for DCN Communication
Figure 15-26 Using serial interfaces for DCN communication
[Figure: protocol stacks and management information flow. The NMS (ETH port) and the GNE (METH port) communicate over TCP/IP/Ethernet; the GNE and the NE communicate over UDP/IP/PPP on service ports. Example devices: GNE (NEID 9-1, NEIP 128.9.0.1/24), NE1 (NEID 9-2, NEIP 128.9.0.2/24), NE2 (NEID 9-3, NEIP 128.9.0.3/24). Original packet from the NMS: management information, ID 9-3, TCP, IP 10.9.0.1, ETH. Packet processed by the GNE: management information, ID 9-3, UDP, IP 10.9.0.3, PPP, ETH.]
The devices on a DCN communicate with each other using the Point-to-Point Protocol (PPP)
through single-hop logical channels. Therefore, packets transmitted on the DCN are
encapsulated into PPP frames and forwarded at the data link layer of service ports.
As shown in Figure 15-26, the NMS uses the GNE to manage NEs in the following process:
1. After the DCN function is enabled, a PPP channel and an OSPF neighbor relationship
are established between devices.
2. OSPF protocol packets are sent between OSPF neighbors to learn host routes carrying
NEIP addresses to obtain mappings between NEIP addresses and NEIDs.
3. When the NMS accesses an NE, it uses the NEIP address of the GNE as the destination
address and the NEID of the NE to address a TCP packet to the GNE.
4. The GNE sends the packet to its application layer, searches for the destination NEIP
address based on the NEID, sets the NEIP address of the NE as the destination
address, encapsulates the TCP packet into a UDP packet, and searches the local routing
table to send the packet to the NE.
5. After receiving the packet, the NE obtains the IP address carried in the packet, verifies
that the IP address is its own NEIP address, and sends the packet to the application layer
for further processing.
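Steps 3 to 5 above can be modeled as a small relay function. This is a simplified sketch, not the device implementation: the data classes, the string form of the NEID (such as "9-3", as in Figure 15-26), and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NmsRequest:
    """TCP packet from the NMS: addressed to the GNE, carrying the target NEID."""
    dst_ip: str
    target_neid: str
    payload: bytes


@dataclass(frozen=True)
class DcnDatagram:
    """Packet that the GNE forwards toward the target NE."""
    transport: str
    dst_ip: str
    payload: bytes


def gne_relay(request: NmsRequest, gne_neip: str,
              core_table: Dict[str, str]) -> DcnDatagram:
    """Step 4: resolve the NEID to an NEIP address and re-address the payload over UDP."""
    if request.dst_ip != gne_neip:
        raise ValueError("request is not addressed to this GNE")
    target_neip = core_table[request.target_neid]  # KeyError if the NEID is unknown
    return DcnDatagram("UDP", target_neip, request.payload)


def ne_accepts(datagram: DcnDatagram, own_neip: str) -> bool:
    """Step 5: the NE processes the packet only if it carries its own NEIP address."""
    return datagram.dst_ip == own_neip
```

In this model, a request sent to the GNE at 128.9.0.1 with target NEID 9-3 leaves the GNE as a UDP packet addressed to 128.9.0.3.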

A core routing table is generated in the following process:

1. After PPP Network Control Protocol (NCP) negotiation is complete, a point-to-point
route is generated without network segment restrictions.
2. The OSPF neighbor relationship is established, and the OSPF route is generated for the
entire network.
3. NEIDs are advertised through OSPF LSAs, triggering the generation of a core routing
table.
Using the Sub-Interface Numbered 4094 for DCN Communication
Figure 15-27 Using the sub-interface numbered 4094 for DCN communication
[Figure: protocol stacks and management information flow. The NMS (ETH port) and the GNE (METH port) communicate over TCP/IP/Ethernet; the GNE and the NE communicate over UDP/IP/Ethernet on service ports. Example devices: GNE (NEID 9-1, NEIP 128.9.0.1/24), NE1 (NEID 9-2, NEIP 128.9.0.2/24), NE2 (NEID 9-3, NEIP 128.9.0.3/24). Original packet from the NMS: management information, ID 9-3, TCP, IP 128.9.0.1, ETH. Packet processed by the GNE: management information, ID 9-3, UDP, IP 128.9.0.3, ETH.]
The sub-interface numbered 4094 is a sub-interface that is configured on a DCN-enabled
interface and used for DCN communication between NEs. After the sub-interface numbered
4094 is configured, it can be associated with VLAN 4094, and its encapsulation type is dot1q
VLAN tag termination. In addition, OSPF is enabled on the sub-interface numbered 4094 by
default.
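For reference, the 802.1Q tag that a dot1q termination sub-interface for VLAN 4094 terminates can be sketched as follows. The tag layout (TPID 0x8100 followed by a 16-bit TCI) is the standard IEEE 802.1Q format; the function names are illustrative, and the priority value of 0 is an assumption because the text does not state which priority DCN packets use.

```python
import struct

TPID_8021Q = 0x8100   # EtherType value that identifies an 802.1Q tag
DCN_VLAN_ID = 4094    # VLAN associated with the DCN sub-interface


def dot1q_tag(vlan_id: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID, then PCP (3 bits), DEI (1 bit), VID (12 bits)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("usable VLAN IDs range from 1 to 4094")
    tci = (pcp << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def vlan_id_of(tag: bytes) -> int:
    """Read the VLAN ID back from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF
```

Here `dot1q_tag(DCN_VLAN_ID)` produces the bytes 81 00 0F FE, and `vlan_id_of` recovers 4094 from them.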
As shown in Figure 15-27, the sub-interface numbered 4094 is used for DCN
communication, and the NMS uses the GNE to manage NEs in the following process:
1. The NEs use Link Layer Discovery Protocol (LLDP) to discover neighboring NEs
by advertising their NEIP addresses to each other.

2. OSPF protocol packets are sent between OSPF neighbors to learn host routes carrying
NEIP addresses to obtain mappings between NEIP addresses and NEIDs.
3. When the NMS accesses an NE, it uses the NEIP address of the GNE as the destination
address and the NEID of the NE to address a TCP packet to the GNE.
4. The GNE sends the packet to its application layer, searches for the destination NEIP
address based on the NEID, sets the NEIP address of the NE as the destination
address, encapsulates the TCP packet into a UDP packet, and searches the local routing
table to send the packet to the NE.
5. After receiving the packet, the NE obtains the IP address carried in the packet, verifies
that the IP address is its own NEIP address, and sends the packet to the application layer
for further processing.
Routes in the DCN core routing table are generated in the same process.
15.5.2.3 DCN over Service Interfaces

When the ATN and NMS are connected and there is no ATN between them processing DCN
packets at the DCN protocol layer, the gateway interface must be added to a DCN VRF.
Service interfaces therefore cannot transmit gateway DCN packets directly.
The following figure shows gateway DCN application layer in the protocol stack.
Figure 15-28 Gateway DCN application layer in the protocol stack (PTN forwarding mode)
[Figure: protocol stack layers TCP, IP, VLAN/ETH, and Ethernet over ETH and GE/FE interfaces; a TCP connection runs between the NMS and the GNE across the management network.]

Figure 15-29 Gateway DCN application layer in the protocol stack (IPRAN forwarding mode)
[Figure: protocol stack layers TCP, IP, VLAN/ETH, and Ethernet over ETH and GE/FE interfaces; a TCP connection runs between the NMS and the GNE across the management network.]
When ATNs are connected through service interfaces, intra-domain DCN over service
interfaces is deployed.
The following figure shows intra-domain DCN application layer in the protocol stack.
Figure 15-30 Intra-domain DCN application layer in the protocol stack (PTN forwarding mode)
[Figure: protocol stacks of the NMS, GNE, forwarding NE, and destination NE. A TCP connection (Application/TCP/IP/Ethernet) runs between the NMS and the GNE. PPP connections (UDP/IP/PPP over ETH/cSTM-1/E1) run between the GNE and the forwarding NE and between the forwarding NE and the destination NE.]

Figure 15-31 Intra-domain DCN application layer in the protocol stack (IPRAN forwarding mode)
[Figure: protocol stacks of the NMS, GNE, forwarding NE, and destination NE. A TCP connection runs between the NMS and the GNE over Ethernet. PPP connections (UDP/IP/PPP over ETH/cSTM-1/E1) run between the GNE and the forwarding NE and between the forwarding NE and the destination NE.]
Service and DCN packets are transmitted separately on service interfaces through DCN VRF.
15.5.2.4 Gateway DCN over the Control Plane

On a DCN over service interfaces, the GNE needs to process its own DCN packets and those of
other NEs in the domain and calculate routes for the DCN packets, causing a heavy CPU load.
As a result, the number of non-GNEs is limited, which limits the DCN network scale.
Gateway DCN over the control plane allows DCN packets to be transmitted over the control
plane of ATNs. DCN packets are transmitted over the control plane, and the management
plane does not need to maintain routing information, reducing CPU burden. The ATN
integrates virtual routing and forwarding (VRF) routes on the control plane into the hardware
forwarding plane. Therefore, DCN packets are forwarded directly by chips, accelerating DCN
packet transmission and enabling larger coverage of a DCN.
The following figure shows protocol layers used in gateway DCN over the control plane,
which are the same as those of the network management interface. DCN packets in the
gateway DCN over the control plane do not need to be encapsulated into PPP frames. The
DCN application layer of the aggregation ATN processes its own DCN packets only, and the
aggregation ATN performs IP routing for DCN packets from other NEs only.
Figure 15-32 Protocol layers used in gateway DCN over the control plane (PTN forwarding mode)
[Figure: protocol stacks of the NMS, forwarding NE, and GNE, with layers TCP, IP, ETH/PPP, and Ethernet on service ports; a TCP connection runs between the NMS and the GNE.]

Figure 15-33 Protocol layers used in gateway DCN over the control plane (IPRAN forwarding mode)
[Figure: protocol stacks of the NMS, forwarding NE, and GNE, with layers TCP, IP, ETH/PPP, and Ethernet on service ports; a TCP connection runs between the NMS and the GNE.]
In gateway DCN over the control plane and gateway DCN over service interfaces, service
interfaces are used for access and function as gateways. In gateway DCN over the control
plane, the VLAN tag does not need to be removed, and DCN packets are transmitted over the
control plane, avoiding address conflicts with the management plane.
15.5.2.5 DCN Security

The user logs in to the device over Authentication, Authorization and Accounting (AAA), as
shown in Figure 15-34.
Figure 15-34 User login
[Figure: a user logs in to the GNE across the DCN network.]
Unauthorized users may repeatedly send authentication requests to crack account information,
either by simulating Qx or MML AAA login packets or by constructing login information in a
brute-force (exhaustive) attack. To prevent such attacks, DCN supports delayed processing for users with
authentication failures. The detailed implementation process is as follows:
1. When a user logs in to a device for the first time, DCN authenticates the user. If the
authentication is successful, the user can log in to the device. If the authentication fails,
DCN locks the user and sets an 8-second timeout period.
2. If the user resends a login request before the timeout period expires, DCN discards the
request. If the user resends a login request after the timeout period expires, DCN
reauthenticates the user. If the authentication is successful, the user can log in to the
device. If the authentication fails again, DCN relocks the user and sets a 16-second
timeout period.
3. If the user resends a login request before the 16-second timeout period expires, DCN
discards the request. If the user resends a login request after the 16-second timeout
period expires, DCN reauthenticates the user. If the authentication is successful, the user
can log in to the device. If the authentication fails again, DCN relocks the user and sets a
32-second timeout period.

4. If the user fails to log in to the device three consecutive times, DCN disconnects the
user's TCP connection after the 32-second timeout period expires.
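The delayed-processing steps above amount to a small state machine with lock periods of 8, 16, and 32 seconds. The following sketch models it with an explicit clock value so that it can be exercised without waiting. The class name, the return strings, and the reset of the failure count after a successful login are assumptions made for illustration.

```python
from dataclasses import dataclass

LOCK_SECONDS = (8, 16, 32)  # lock period after the first, second, and third failure


@dataclass
class LoginGuard:
    failures: int = 0
    locked_until: float = 0.0

    def attempt(self, now: float, credentials_ok: bool) -> str:
        """Process one login request arriving at time `now` (in seconds)."""
        if self.failures >= len(LOCK_SECONDS) and now >= self.locked_until:
            # Three consecutive failures and the 32-second period has expired.
            return "disconnected"
        if now < self.locked_until:
            # Requests received during a lock period are dropped unexamined.
            return "discarded"
        if credentials_ok:
            self.failures = 0
            self.locked_until = 0.0
            return "logged_in"
        self.failures += 1
        self.locked_until = now + LOCK_SECONDS[self.failures - 1]
        return "failed"
```

A request that arrives inside a lock period is discarded even if its credentials are correct, which is what slows down an exhaustive attack.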
15.5.3 Applications
15.5.3.1 Typical DCN Application
During network deployment, every network element (NE) must be configured with software
and commissioned after hardware installation to ensure that all NEs can communicate with
each other. As a large number of NEs are deployed, on-site deployment for each NE requires
significant manpower and is time-consuming. In order to reduce the on-site deployment times
and the cost of operation and maintenance, the DCN can be deployed.
Figure 15-35 Typical DCN application

[Figure: the NMS connects over the external DCN to an active GNE and a standby GNE using TCP; the GNEs communicate with the NEs using UDP.]
As shown in Figure 15-35, active and standby GNEs can be deployed to improve reliability.
If the active GNE fails, the NMS gracefully switches the gateway function to the standby
GNE.

DCN Traversal over a Third-Party Layer 2 Network
Figure 15-36 DCN traversal over a third-party Layer 2 network
[Figure: the NMS connects to the GNE in a Huawei network; VLL-1 through VLL-N cross the third-party network and connect the GNE to ATN-1 through ATN-N, each in a Huawei network.]
1. A DCN VLAN group is configured on the GNE, and the VLAN ID of the dot1q
termination sub-interface is the same as the DCN VLAN ID of the main interface.
2. The GNE sends DCN negotiation packets to VLANs in the DCN VLAN group.
3. The DCN negotiation packets are sent to different leaf nodes through VLLs.
4. NEs learn the DCN VLAN ID sent by the GNE and establish DCN connections with the
GNE.
Terms
Term                 Description
GNE                  Gateway network elements (GNEs) are able to forward data at the network or application layer. The NMS can use GNEs to manage remote NEs connected through optical fibers.
Core routing table   A core routing table consists of mappings between the NEIDs and NEIP addresses of NEs on a data communication network (DCN). Before accessing a non-GNE through a GNE, the NMS must search the core routing table for the NEIP address of the non-GNE based on the destination NEID.

Abbreviations
DCN Data Communication Network
GNE Gateway Network Element
15.6 PPPoE Access

PPPoE configuration is supported only by the ATN 905/ATN 910I/ATN 910B.
15.6.1 Introduction to the PPPoE

Definition
The Point
