Protection

Automatic Protection Switching
Yaakov (J) Stein CTO RAD Data Communications
Mar 2012
Course Outline
General protection switching principles
Examples of protection mechanisms
  SONET/SDH
  Ethernet linear protection
  Ethernet ring protection
  MPLS fast reroute
  MPLS-TP APS
Y(J)S APS
Slide 2
General principles
Definition
References
Traffic types
Network topologies
Triggers
Protection classes
Entities
Protection types
Signaling
Definition
Automatic Protection Switching (APS)
  is a functionality of carrier-grade transport networks
  is often called resilience, since it enables service to quickly recover from failures
  is required to ensure high reliability and availability
APS includes:
  detection of failures (signal fail or signal degrade) on a working channel
  switching traffic transmission to a protection channel
  selecting traffic reception from the protection channel
  (optionally) reverting back to the working channel once the failure is repaired
Automatic means that it uses (at most) control plane protocols
Some useful references

G.808.1 generic linear protection
G.808.2 generic ring protection (not yet written)
G.841 and G.842 SDH
G.774.3/4/9/10 SDH protection management
G.870 and G.873.1 OTN
G.8031 Ethernet linear protection
G.8032 Ethernet ring protection
G.8131 T-MPLS APS
Y.1720 MPLS
I.630 ATM
M.495 analog signal protection
G.781 clock selection (can be used to protect synchronization)
RFC 4090 MPLS Fast ReRoute
RFC 6372 MPLS-TP Survivability Framework
RFC 6378 MPLS-TP Linear Protection
Traffic types
In a network with APS capabilities, there are three types of traffic:
protected traffic
  traffic that may be rapidly switched to the protection channel
  at any time it may be on the working channel or the protection channel
Nonpreemptible Unprotected Traffic (NUT)
  noncritical traffic that does not require a protection mechanism
  not affected by the protection mechanism
  somewhat less expensive to the customer
extra (preemptible) traffic
  best-effort background traffic that runs on the protection channel
Network topologies
APS can be defined for any topology with redundant links
  e.g., for tree topologies no protection is possible
We will often discuss protection of individual links
However, there are two topologies that are of particular interest:
rings
  protection is natural for rings
  although there are other reasons for using rings as well
  rings are so important that protection for other topologies is often called linear protection
dense meshes
  for this topology multiple local bypasses can be preconfigured
  protection switching is similar to routing change, but
Triggers
Protection switching is usually triggered by a failure
  although the operator may manually force a protection switch
A failure is declared when a fault condition persists long enough for the ability to perform the required function to be considered terminated
Failures are Signal Fail (SF) or Signal Degrade (SD) (of various types) and may be:
  detected by the physical layer
  indicated by signaling (e.g., AIS)
  detected by OAM mechanisms
When there is no SF or SD, the state is called No Request (NR)
Switching time (1)

SONET/SDH protection switching takes place in under 50 ms
Regarding multiplex section shared protection rings, G.841 states:
  The following network objectives apply:
  1) Switch time
  In a ring with no extra traffic, all nodes in the idle state (no detected failures, no active automatic or external commands, and receiving only Idle K-bytes), and with less than 1200 km of fibre, the switch (ring and span) completion time for a failure on a single span shall be less than 50 ms. On rings under all other conditions, the switch completion time can exceed 50 ms (the specific interval is under study) to allow time to remove extra traffic, or to negotiate and accommodate coexisting APS requests.
while for linear VC trail protection, it says:
  The following network objectives apply:
  1) Switch time
  The APS algorithm for LO/HO VC trail protection shall operate as
Switching time (2)

This 50 ms time has become the gold standard, and new protection schemes are expected to meet this objective
However, studying the literature that led up to the SONET/SDH standards shows that the objective was to attain the minimum possible time for the sum of
  persistent (i.e., non-transient) failure detection
  speed-of-light propagation
  signaling protocol time
  regaining sync alignment
and 50 ms was the minimum that was considered practical!
Many modern standards have built in 50 ms, and much marketing literature boasts faster than 50 ms
But there is really nothing special about 50 ms
  50 ms gaps in voiced speech are noticeable, but not fatal if infrequent
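To get a feel for one term of the sum above, the propagation component can be estimated with a few lines of Python. This is only a rough sketch: the fibre speed of about 200,000 km/s (roughly 5 microseconds per km) is an assumed round figure, and the function names are made up for illustration.

```python
# Assumption: light travels through fibre at roughly 200,000 km/s,
# i.e. about 5 microseconds per km. This figure is not from the slides.
FIBRE_SPEED_KM_PER_S = 200_000.0


def propagation_delay_ms(fibre_km: float) -> float:
    """One-way propagation delay over the given length of fibre, in ms."""
    return fibre_km / FIBRE_SPEED_KM_PER_S * 1000.0


def remaining_budget_ms(fibre_km: float, budget_ms: float = 50.0) -> float:
    """What is left of the switching budget for detection, signaling,
    and resynchronization after one traversal of the fibre."""
    return budget_ms - propagation_delay_ms(fibre_km)


if __name__ == "__main__":
    # 1200 km is the ring size quoted from G.841 for the 50 ms objective.
    print(round(propagation_delay_ms(1200), 3), "ms propagation")
    print(round(remaining_budget_ms(1200), 3), "ms left for everything else")
```

Under this assumption a single traversal of 1200 km costs about 6 ms, so propagation alone takes a noticeable bite out of the 50 ms before any detection or signaling has happened.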
Protection classes
It is useful to distinguish two different protection classes
path protection (AKA trail protection, end-to-end protection)
  when a failure is detected on the end-to-end path we switch to an alternative end-to-end path
  the failure is usually detected by end-to-end OAM
local protection (AKA local restoration, SNC protection, bypass, detour)
  we protect individual network elements, links, or groups of same
  when such an entity fails, only that local entity is bypassed
  the failure may be detected by link OAM or by the physical layer
APS entities (1)

The following entities are important in APS
working channel
  channel used when no failure exists
protection channel
  channel used when a failure exists
head-end
  entity transmitting data to the working/protection channel
tail-end
  entity receiving data from the working/protection channel
Note: we will usually consider traffic to be bidirectional, so that the head-end for one direction working channel is the tail-end for the opposite direction protection channel
[Figure: head-end and tail-end]
APS entities (2)
Bridge
  function at head-end that connects traffic (including extra traffic) to the working and protection channels
Selector
  function at tail-end that extracts traffic (perhaps extra traffic) from the working or protection channel
APS signaling channel
  channel used to communicate between head-end and tail-end for APS purposes
Trail termination
  function responsible for failure detection, including injection and extraction of OAM
[Figure: head-end (bridge) and tail-end (selector) joined by working channel, protection channel, and signaling channel]
Revertive operation
Reversion means returning to use the working channel after the failure has been rectified
Protection mechanisms can be revertive or nonrevertive
Revertive mechanisms may be preferable
  when the working channel has better performance (free BW, BER, delay)
  when there are frequent switches (easier to manage)
  when there is extra traffic
but nonrevertive also has advantages
  only one service disruption due to protection switching
  may be simpler to implement
Uni/bi-directional
We will usually consider bidirectional traffic
but even then the failures can be uni- or bidirectional
and for unidirectional failures there can be uni- or bidirectional switching
[Figure: unidirectional failure with unidirectional protection (protection channel in use in one direction only); bidirectional failure with bidirectional protection (protection channel in use in both directions)]
Uni- / bidirectional switching

Unidirectional switching may be advantageous
  for 1+1: faster, and no signaling channel is needed
  no unnecessary service disruption for the direction without failure
  higher chance of protection under multiple failures
  easier to implement for local protection
  maintains extra traffic in the direction without failure
But bidirectional may be preferable
  easier management, since both directions traverse the same network elements
  does not disrupt the delay balance between directions
  may simplify repair, since failed spans are unused
Protection types
We distinguish several different protection types
  1+1
  1:1
  1:n
  m:n
  (1:1)^n
Each type has its applicability, advantages, and disadvantages, and there are trade-offs between
  simplicity
  BW consumption
  protection switch time
  signaling requirements
1+1 protection
Simplest and fastest form of protection, but wasteful: only 50% of actual physical capacity is used
Head-end bridge always sends data on both channels
Tail-end selector chooses channel to use (based on BER, dLOS, etc.)
For unidirectional 1+1 switching there is no need for APS signaling
If non-revertive there is no distinction between working and protection channels
[Figure: channel A and channel B between head-end and tail-end]
1:1 protection
Head-end bridge usually sends data on the working channel
When a failure is detected it starts sending data over the protection channel, and the tail-end needs to select the protection channel
When not in use, the protection channel can be used for extra traffic
However, since the failure is detected by the tail-end, APS signaling is needed
Protection channel should have OAM running to ensure its functionality
[Figure: working channel, protection channel carrying extra traffic, and APS signaling]
1:n protection
One protection channel is allocated for n working channels
Can protect only one working channel at a time, but it is improbable that more than 1 working channel will fail simultaneously
Only 1/(n+1) of total capacity is reserved for protection
[Figure: n working channels and one protection channel]
m:n protection
To enable protection of more than 1 channel, m protection channels are allocated for n working channels (m < n)
m simultaneous failures can be protected
Less protection capacity is dedicated than for n times 1:1
When a failure is detected, 1 of the m protection channels needs to be assigned and signaled
High complexity, but conserves resources
[Figure: n working channels and m protection channels]
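The bandwidth trade-off between 1+1, 1:n, and m:n can be made concrete with a small calculation. The following is a sketch that assumes all channels have equal capacity; the generalization m/(m+n) is simple arithmetic rather than something stated on the slides, and `reserved_fraction` is a name invented for this example.

```python
from fractions import Fraction


def reserved_fraction(m: int, n: int) -> Fraction:
    """Fraction of total capacity set aside for protection when m
    protection channels guard n working channels.

    Assumption: all channels have the same capacity.
    """
    if m < 0 or n <= 0:
        raise ValueError("need m >= 0 protection and n >= 1 working channels")
    return Fraction(m, m + n)


if __name__ == "__main__":
    print("1+1 or 1:1 :", reserved_fraction(1, 1))   # half the capacity
    print("1:4        :", reserved_fraction(1, 4))   # 1/(n+1) with n = 4
    print("2:8        :", reserved_fraction(2, 8))   # m:n with m = 2, n = 8
```

With m = 1 this reduces to the 1/(n+1) figure quoted for 1:n, and with m = n = 1 to the 50% figure quoted for 1+1.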
(1:1)^n protection
This is like n times 1:1, but the n protection channels share bandwidth
Only 1 failed working channel can be protected
This is different from 1:n, since n protection channels are preconfigured
  the n working channels need not be of the same type
Protection bandwidth must be at least that of the largest working channel
APS algorithm
We have seen that protection switching is a tricky business
So it is not surprising that network elements that support APS run an APS algorithm
This algorithm takes as inputs:
  configuration (protection type, revertive?, available channels, ...)
  failure indications (NR, SF, SD)
  operator commands
  APS signaling (more on that soon)
and makes switching decisions
The algorithm maintains state information for head-end and tail-end
APS algorithms are detailed in standards documents
Priority
Not every failure event / operator command results in a protection switch
  for example, in 1:n protection the protection channel may already be in use!
Conflicts are resolved by assigning priorities to events/commands
When an event is detected or a command received, the APS algorithm will not act if an event/command of equal or higher priority is already in effect
True failure conditions usually have higher priority than manual
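The priority rule above can be sketched in a few lines. The ordering of the requests below is hypothetical and chosen only to illustrate the comparison; real standards define their own, longer priority tables, and the names `Request` and `should_act` are invented for this example.

```python
from enum import IntEnum


class Request(IntEnum):
    """Hypothetical priority ordering, lowest to highest (illustration only)."""
    NO_REQUEST = 0
    MANUAL_SWITCH = 1
    SIGNAL_DEGRADE = 2
    SIGNAL_FAIL = 3


def should_act(new: Request, in_effect: Request) -> bool:
    """Act on a new event/command only if it strictly outranks the one
    already in effect; equal or lower priority is ignored."""
    return new > in_effect


if __name__ == "__main__":
    print(should_act(Request.SIGNAL_FAIL, Request.MANUAL_SWITCH))
    print(should_act(Request.SIGNAL_FAIL, Request.SIGNAL_FAIL))
```

The strict comparison is what implements "equal or higher priority already in effect": a second SF arriving while an SF is being served does not trigger another switch.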
Timers
Even failure events with priority are not acted upon immediately; to do so would cause unnecessary switches after transient defects
The APS algorithm may maintain several timers, such as
Holdoff timer
  the time between detection of an SF or SD event and the APS algorithm acting upon this event
  the algorithm usually used is called peek twice, i.e., the condition is checked again after the timer expires
Wait To Restore (WTR) timer
  for revertive switching, the time between detection of the failure being cleared and the APS algorithm acting upon this event
  also used in SDH optimized bidirectional 1+1 (nonrevertive)
Guard timer
  for rings, blockout time during which APS messages are ignored (since they may be old and outdated)
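The "peek twice" holdoff behaviour described above might look roughly like this. It is a minimal sketch, not any standard's state machine: `defect_present` is a hypothetical callback that samples the defect condition, and the `sleep` parameter is injectable only so the logic can be exercised without really waiting.

```python
import time
from typing import Callable


def confirmed_after_holdoff(
    defect_present: Callable[[], bool],
    holdoff_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Peek twice: report a failure only if the defect is seen now and is
    still present when the holdoff timer expires."""
    if not defect_present():      # first peek: nothing wrong, nothing to do
        return False
    sleep(holdoff_s)              # holdoff timer runs
    return defect_present()       # second peek: transient defects are ignored


if __name__ == "__main__":
    # A transient defect: present at the first peek, gone at the second.
    samples = iter([True, False])
    print(confirmed_after_holdoff(lambda: next(samples), 0.05))
```

A transient that disappears before the timer expires therefore never reaches the priority logic, which is exactly the purpose of the holdoff timer.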
APS signaling
In all types except unidirectional 1+1, some APS signaling is needed
APS signaling is used to synchronize between head-end and tail-end
It is critical that head-end and tail-end always be in the same state
Example messages include:
  by tail-end to inform head-end of No Request (NR) or Signal Failure (SF)
  by head-end to confirm the event's priority
  by head-end to report the particular protection channel
  by head-end to inform tail-end of Reverse (bidirectional) Request (RR)
APS signaling phases

When APS signaling is used, it needs to be as rapid as possible
Depending on the scenario it may be
1-phase: tail → head (fastest)
  tail-end informs head-end of failure
  both ends uniquely know the protection channel to be used
  only for 1+1 and unidirectional (1:1)^n (including 1:1)
2-phase: 1) tail → head 2) head → tail
  tail-end informs head-end of failure
  head-end signals that it has switched to protection channel
  not for bidirectional 1:n or m:n
Examples of 1-phase
An example of when 1-phase signaling is possible is 1:1 or (1:1)^n
1. upon detection of failure the tail-end sends SF to the head-end and immediately changes its selector (blind switch)
2. upon receipt the head-end changes the bridge setting (no priority is checked)
1-phase can also be used for bidirectional 1:1
1. upon detection of failure the tail-end sends SF to the head-end and immediately changes both its selector and bridge
2. upon receipt the head-end changes its bridge and selector
Example of 2-phase
2-phase is useful for unidirectional 1:n with priority checking
1. upon detection of failure the tail-end sends SF to the head-end but does not change its selector
2. the head-end checks priority
   sends confirmation to tail-end (with identity of working channel)
   the bridge setting is changed
3. the tail-end changes its selector
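The 2-phase exchange for unidirectional 1:n can be walked through with a toy model. This is a sketch under simplifying assumptions: messages are modelled as direct method calls, priorities are plain integers, and the class and method names are invented for illustration rather than taken from any standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeadEnd:
    bridged_channel: Optional[int] = None  # working channel bridged to protection
    active_priority: int = 0               # priority of the request being served

    def on_signal_fail(self, channel: int, priority: int) -> Optional[int]:
        """Phase 1 arrives here. Check priority, change the bridge, and
        return the confirmation (identity of the working channel)."""
        if priority <= self.active_priority:
            return None                     # equal/higher request already in effect
        self.active_priority = priority
        self.bridged_channel = channel
        return channel                      # phase 2: head -> tail confirmation


@dataclass
class TailEnd:
    selected_channel: Optional[int] = None  # working channel taken from protection

    def report_failure(self, head: HeadEnd, channel: int, priority: int) -> bool:
        # Phase 1: tail -> head, selector is NOT changed yet.
        confirmation = head.on_signal_fail(channel, priority)
        # Phase 2: only after confirmation does the selector change.
        if confirmation is not None:
            self.selected_channel = confirmation
        return confirmation is not None


if __name__ == "__main__":
    head, tail = HeadEnd(), TailEnd()
    print(tail.report_failure(head, channel=3, priority=2), head.bridged_channel)
    print(tail.report_failure(head, channel=5, priority=2), head.bridged_channel)
```

The second request is refused because the single protection channel is already serving a request of equal priority, so the tail-end never moves its selector for channel 5.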
Example of 3-phase
3-phase signaling is imperative for bidirectional 1:n
1. upon detection of failure the tail-end sends SF to the head-end but does not change its selector
2. the head-end checks priority, and sends confirmation to tail-end
   head-end changes its bridge setting and also sends a reverse request
3. the tail-end changes selector
   checks priority and sends confirmation to head-end
   tail-end changes its bridge setting (as head-end of opposite direction)
   head-end receives confirmation and changes its selector
For G.805 buffs

to add 1+1 trail protection to a trail, expand a trail termination function
we use a special transport processing function: the protection switch
the unprotected TTs report status to the protection switch
[Figure: unprotected trail and protected trail]
SONET/SDH APS
SONET protection ?
SONET/SDH networks need to be highly reliable (five nines)
Down-time should be minimal (less than 50 msec)
So systems must repair themselves (no time for manual intervention)
Upon detection of a failure (dLOS, dLOF, high BER) the network must reroute traffic (protection switching) from working channel to protection channel
SDH APS is unidirectional
SDH APS may be revertive
[Figure: head-end NE and tail-end NE joined by working channel and protection channel]
SONET/SDH layers
[Figure: two ADMs (path termination and line termination) joined through a regenerator (section termination), showing the path, line (MS), and section spans]
Between regenerators there are sections (regenerator sections)
Between ADMs there are lines (multiplex sections)
Between path terminations there are paths
Protection can be
  at OC-n level (different physical fibers)
  at STM/VC level
  end-to-end path (trail protection)
Line APS
[Figure: frame of 90 columns by 9 rows; the TOH (3 rows of section overhead, 6 rows of line overhead) precedes the Synchronous Payload Envelope]
TOH bytes:
  section overhead   A1   A2   J0
                     B1   E1   F1
                     D1   D2   D3
  line overhead      H1   H2   H3
                     B2   K1   K2
                     D4   D5   D6
                     D7   D8   D9
                     DA   DB   DC
                     S1   M0   E2
TOH consists of
  3 rows of section overhead: frame sync, trace, EOC, ...
  6 rows of line overhead: pointers, SSM, FEBE, ...
Line APS signaling uses bytes K1 and K2
HO Path APS
POH bytes: J1 B3 C2 G1 F2 H4 F3 K3 N1
POH is responsible for type, status, path performance monitoring, VCAT, trace
HO Path APS signaling uses the 4 MSBs of byte K3
LO Path APS
[Figure: byte layout V1 V5, V2 J2, V3 N2, V4 K4; the VC OH bytes are V5, J2, N2, K4]
VC OH is responsible for timing, PM, REI, ...
LO Path APS signaling uses the 4 MSBs of byte K4
How does it work?

Head-end and tail-end NEs have bridges (muxes)
Head-end and tail-end NEs maintain a bidirectional signaling channel
Signaling is contained in the K bytes of the protection channel
For line APS
  K1: tail-end status and requests
  K2: head-end status
[Figure: head-end bridge and tail-end bridge joined by working channel, protection channel, and signaling channel]
Linear 1+1 protection

Can be
  at OC-n level (different physical fibers)
  at STM/VC level (SubNetwork Connection Protection)
  end-to-end path (called trail protection)
Head-end bridge always sends data on both channels
Tail-end chooses channel to use based on BER, dLOS, etc.
No need for signaling
If non-revertive there is no distinction between working and protection channels
[Figure: head-end NE and tail-end NE joined by working channel and protection channel]
Linear 1:1 protection

Head-end bridge usually sends data on the working channel
When the tail-end detects a failure it signals (using K1) to the head-end
Head-end then starts sending data over the protection channel
When not in use, the protection channel can be used for (discounted) extra traffic (pre-emptible unprotected traffic)
May be at any layer (but only OC-n level protects against fiber cuts)
[Figure: working channel and protection channel carrying extra traffic]
Linear 1:N protection

In order to save BW we allocate 1 protection channel for every N working channels
N is limited to 14, since the channel is identified by 4 bits in the K1 byte sent from tail-end to head-end
  0      protection channel
  1-14   working channels
  15     extra traffic channel
[Figure: N working channels and one protection channel]
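The channel numbering above can be decoded with a short helper. This is a sketch that assumes the channel number sits in the four least significant bits of K1 (the slide only says that 4 bits of K1 are used); `describe_k1_channel` is a name made up for this example.

```python
def describe_k1_channel(k1: int) -> str:
    """Interpret the 4-bit channel number carried in a K1 byte.

    Assumption: the channel number occupies the 4 least significant bits.
    """
    if not 0 <= k1 <= 0xFF:
        raise ValueError("K1 is a single byte")
    channel = k1 & 0x0F
    if channel == 0:
        return "protection channel"
    if channel == 15:
        return "extra traffic channel"
    return f"working channel {channel}"


if __name__ == "__main__":
    for value in (0x00, 0xC3, 0x0E, 0x0F):
        print(f"K1=0x{value:02X}: {describe_k1_channel(value)}")
```

Because two of the sixteen code points are taken by the protection and extra traffic channels, only 14 remain for working channels, which is where the limit on N comes from.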
Two-fiber vs. four-fiber rings
Ring based protection is popular in North America (100K+ rings)
  full protection against physical fiber cuts
  simpler and less expensive than mesh topologies
  protection at line (multiplex section) or path layer
Four-fiber rings
  fully redundant at OC level
  can support bidirectional routing at line layer
Two-fiber rings
  support unidirectional routing at line layer
  2 fibers in opposite directions
Unidirectional vs. bidirectional

Unidirectional routing
  working channel B-A is in the same direction (e.g., clockwise) as A-B
  management simplicity: A-B and B-A can occupy the same timeslots
  inefficient: waste in ring BW and excessive delay in one direction
Bidirectional routing
  A-B and B-A are opposite in direction
  both use the shortest route
  spatial reuse: timeslots can be reused in other sections
[Figure: rings with nodes A, B, C showing A-B, B-A, B-C, and C-B traffic for each routing type]
UPSR vs. BLSR (MS-SPRing)

  UPSR              BLSR
  Unidirectional    Bidirectional
  Path switching    Line switching
  Two-fiber         Four-fiber
Of all the possible combinations, only a few are in use
Unidirectional (routing) Path Switched Rings
  protects tributaries
  extension of 1+1 to ring topology
Bidirectional (routing) Line Switched Rings (two-fiber and four-fiber versions)
  called Multiplex Section Shared Protection Ring in SDH
  simultaneously protects all tributaries in STM
  extension of 1:1 to ring topology
UPSR
Working channel is in one direction, protection channel in the opposite direction
All path traffic is added in both directions (1+1)
  decision as to which to use is made at the drop point (no signaling)
Normally non-revertive, so effectively two diversity paths
Good match for access networks
  1 access resilient ring is less expensive than a fiber pair per customer
Inefficient for core networks
  no spatial reuse: every signal is in every span in both directions
  node needs to continuously monitor every tributary to be dropped
[Figure: SONET ADMs on 2 rings]
BLSR
Switch at line level, so less monitoring
When a failure is detected, tail-end NE signals head-end NE
Works for unidirectional/bidirectional fiber cuts, and NE failures
Two-fiber version
  half of OC-N capacity devoted to protection
  only half capacity available for traffic
Four-fiber version
  full redundant OC-N devoted to protection
  twice as many NEs as compared to two-fiber
[Figure: example recovery from a unidirectional fiber cut by wrap-around on 2 rings]
Ethernet linear APS
STP
LAG
G.8031
STP
The original Spanning Tree Protocol automatically removed loops from arbitrary networks (with loops)
However, its convergence was very slow (about a minute)
STP cannot be used as a protection mechanism since its reconvergence time is very long, due to a cumbersome protocol and long holdoff timer settings
An evolutionary update called Rapid STP (802.1w) was incorporated into 802.1D-2004 clause 17
  it converges in about the same time as STP, but can reconverge after a topology change in less than 1 second
RSTP can be used to detect failures and reconverge, and thus can be used as a primitive protection mechanism
However, the switching time will be many tens of ms to 100s of
Use of LAG
Ethernet link aggregation (AKA bonding, Ethernet trunk, inverse mux, NIC teaming) enables bonding several ports together as a single uplink
Defined by the 802.3ad task force and folded into 802.3-2000 as clause 43
Binding of ports to Link Aggregation Groups (LAGs) is distributed via the Link Aggregation Control Protocol (LACP)
LACP uses slow protocol frames (up to 5 per second)
Links may be dynamically added/removed from a LAG, and LACP continuously monitors to detect if changes are needed
Upon link failure LAG delivers traffic at a reduced rate
Thus LAG can be used as a primitive protection mechanism
When used this way it is called worker/standby or N+N mode
G.8031
Q9 of SG15 in the ITU-T is responsible for protection switching
In 2006 it produced G.8031, Linear Ethernet Protection Switching
G.8031 uses standard Ethernet formats, but is incompatible with STP
The standard addresses
  point-to-point VLAN connections
  SNC (local) protection class
  1+1 and 1:1 protection types
  unidirectional and bidirectional switching for 1+1
  bidirectional switching for 1:1
  revertive and nonrevertive modes
  1-phase signaling protocol
G.8031 uses Y.1731 OAM CCM messages in order to detect failures
G.8031 defines a new OAM opcode (39) for APS signaling messages
G.8031 signaling
The APS signaling message looks like this:
  MEL (3b) | VER=0 (5b) | OPCODE=39 (1B) | FLAGS=0 (1B) | OFFSET=4 (1B)
  req/state (4b) | prot. type (4b) | requested sig (1B) | bridged sig (1B) | reserved (1B)
  END=0 (1B)
where
  req/state identifies the message (NR, SF, WTR, SD, forced switch, etc.)
  prot. type identifies the protection type (1+1, 1:1, uni/bidirectional, etc.)
regular APS messages are sent 1 per 5 seconds
after a change, 3 messages are sent at max rate (300 per sec)
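The field layout above maps naturally onto a 9-byte structure. The sketch below packs and unpacks it with Python's `struct` module, assuming the fields appear on the wire in the order listed and that the first-listed field of each shared byte occupies the most significant bits. The `req_state` and `prot_type` values are passed in as plain numbers because the slides do not give their code points; the function names are invented for this example.

```python
import struct

APS_OPCODE = 39   # from the slide
APS_OFFSET = 4    # from the slide
_FORMAT = "!9B"   # nine single bytes, network order


def pack_aps(mel: int, req_state: int, prot_type: int,
             requested: int, bridged: int) -> bytes:
    """Build the 9 bytes shown above (assumed bit order: first field high)."""
    mel_ver = ((mel & 0x07) << 5) | 0                      # MEL (3b), VER=0 (5b)
    req_prot = ((req_state & 0x0F) << 4) | (prot_type & 0x0F)
    return struct.pack(_FORMAT, mel_ver, APS_OPCODE, 0, APS_OFFSET,
                       req_prot, requested & 0xFF, bridged & 0xFF,
                       0,                                   # reserved
                       0)                                   # END=0


def unpack_aps(pdu: bytes) -> dict:
    """Parse the 9 bytes back into named fields."""
    (mel_ver, opcode, flags, offset,
     req_prot, requested, bridged, _reserved, end) = struct.unpack(_FORMAT, pdu)
    if opcode != APS_OPCODE or end != 0:
        raise ValueError("not an APS message in the assumed layout")
    return {
        "mel": mel_ver >> 5,
        "version": mel_ver & 0x1F,
        "flags": flags,
        "offset": offset,
        "req_state": req_prot >> 4,
        "prot_type": req_prot & 0x0F,
        "requested": requested,
        "bridged": bridged,
    }


if __name__ == "__main__":
    pdu = pack_aps(mel=5, req_state=0xB, prot_type=0xF, requested=1, bridged=1)
    print(pdu.hex(), unpack_aps(pdu))
```

OFFSET=4 is consistent with this layout: four bytes (req/state with prot. type, requested sig, bridged sig, reserved) lie between the offset field and the END byte.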
G.8031 1:1 revertive operation
In the normal (NR) state:
  head-end and tail-end exchange CCM (at 300 per second rate) on both working and protection channels
  head-end and tail-end exchange NR APS messages on the protection channel (every 5 seconds)
When a failure appears in the working channel:
  tail-end misses 3 CCM messages on the working channel
  tail-end enters SF state
  tail-end sends 3 SF messages at 300 per second on the APS channel
  tail-end switches selector (for bidirectional, also bridge) to the protection channel
  head-end (receiving SF) switches bridge (for bidirectional, also selector) to the protection channel
  tail-end continues sending SF messages every 5 seconds
  head-end sends NR messages but with bridged=normal
When the failure is cleared:
  tail-end leaves SF state and enters WTR state (typically 5 minutes, range 5..12 min)
  tail-end sends WTR message to head-end (in nonrevertive operation, a DNR message)
  tail-end sends WTR every 5 seconds
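The tail-end behaviour in this sequence can be captured as a small state machine. This is a simplified sketch rather than the state machine of G.8031 itself: it tracks only the NR, SF, and WTR states, treats each CCM interval as one call, omits the APS messages, and returns to the working channel when the WTR timer expires (the revertive case). All names are invented for the example.

```python
class TailEnd:
    """Toy tail-end for 1:1 revertive operation (unidirectional view)."""

    CCM_LOSS_THRESHOLD = 3  # SF is declared after 3 missed CCMs, as on the slide

    def __init__(self) -> None:
        self.state = "NR"
        self.selector = "working"
        self.missed_ccm = 0

    def ccm_interval(self, received: bool) -> None:
        """Call once per CCM period with whether a CCM arrived on the working channel."""
        if received:
            self.missed_ccm = 0
            if self.state == "SF":
                self.state = "WTR"          # failure cleared: start Wait To Restore
            return
        self.missed_ccm += 1
        if self.missed_ccm >= self.CCM_LOSS_THRESHOLD and self.state != "SF":
            self.state = "SF"               # also covers a new failure during WTR
            self.selector = "protection"

    def wtr_expired(self) -> None:
        """Revert to the working channel once the WTR timer runs out."""
        if self.state == "WTR":
            self.state = "NR"
            self.selector = "working"


if __name__ == "__main__":
    tail = TailEnd()
    for got_ccm in (False, False, False, True):
        tail.ccm_interval(got_ccm)
        print(tail.state, tail.selector)
    tail.wtr_expired()
    print(tail.state, tail.selector)
```

Note that the selector stays on the protection channel throughout WTR; traffic only moves back once the timer expires, which is what keeps an intermittent fault from causing repeated switches.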

Ethernet ring APS

G.8032
RPR
CLEER
Ethernet rings ?
Ethernet has become carrier grade:
  deterministic connection-oriented forwarding
  OAM
  synchronization
The only thing missing to completely replace SDH is ring protection
However, Ethernet and ring architectures don't go together
  Ethernet has no TTL, so looped traffic will loop forever
  STP builds trees out of any architecture, so no loops are allowed
There are two ways to make an Ethernet ring
open loop
  cut the ring by blocking some link
  when protection is required, block the failed link
closed loop
  disable STP (but avoid infinite loops in some way!)
Ethernet ring protocols

Open loop methods
  G.8032 (ERPS)
  rSTP (ex 802.1w)
  RFER (RAD)
  ERP (NSN)
  RRST (based on RSTP)
  REP (Cisco)
  RRSTP (Alcatel)
  RRPP (Huawei)
  EAPS (Extreme, RFC 3619)
  EPSR (Allied Telesis)
  PSR (Overture)
Closed loop methods
  RPR (IEEE 802.17)
  CLEER and NERT (RAD)
G.8032
Q9 of SG15 produced G.8032 between 2006 and 2008
G.8032 is similar to G.8031
  strives for 50 ms protection (< 1200 km, < 16 nodes), but here this number is deceiving, as the MAC table is flushed
  standard Ethernet format, but incompatible with STP
  uses Y.1731 CCM for failure detection
  employs Y.1731 extension for R-APS signaling (opcode=40)
  R-APS message format similar to APS of G.8031 (but between every 2 nodes and to MAC address 01-19-A7-00-00-01)
  revertive and nonrevertive operation defined
However, G.8032 is more complex due to
  requirement to avoid loop creation under any circumstances
  need to localize failures
RPL
G.8032v1 defines the Ring Protection Link (RPL) as the link to be blocked (to avoid closing the loop) in NR state
One of the 2 nodes connected to the RPL is designated the RPL owner
  unlike RFER, there is only one RPL owner
  the RPL and owner are designated before setup
  operation is usually revertive
All ring nodes are simultaneously in 1 of 2 modes: idle or protecting
  in idle mode the RPL is blocked
  in protecting mode the failed link is blocked and the RPL is unblocked
  in revertive operation, once the failure is cleared the blocked link is unblocked
G.8032 revertive operation
In the idle state:
  adjacent nodes exchange CCM at 300 per second rate (including over RPL)
  exchange NR RB (RPL Blocked) messages in dedicated VLAN every 5 seconds (but not over RPL)
  R-APS messages are never forwarded
When a failure appears between 2 nodes:
  node(s) missing CCM messages peek twice with holdoff time
  node(s) block failed link and flush MAC table
  node(s) send SF message (3 times @ max rate, then every 5 sec)
  node receiving SF message will check priority and unblock any blocked link
  node receiving SF message will send SF message to its other neighbor
  in stable protecting state, SF messages are sent over every unblocked link
When the failure is cleared:
  node(s) detect CCM and start guard timer (blocks acting on R-APS messages)
  node(s) send NR messages to neighbors (3 times @ max rate, then every 5 sec)
  RPL owner receiving NR starts WTR timer
  when WTR expires RPL owner blocks RPL, flushes table, and sends NR RB
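The essential invariant of this open-loop scheme is that exactly one link of the ring is blocked at any time. The sketch below models only that invariant for a single failure with revertive operation; it ignores R-APS messages, timers, and MAC flushing, and the `Ring` class and its method names are invented for illustration.

```python
class Ring:
    """Toy model of which ring link is blocked (single failure, revertive)."""

    def __init__(self, links, rpl):
        if rpl not in links:
            raise ValueError("the RPL must be one of the ring links")
        self.links = list(links)
        self.rpl = rpl
        self.blocked = {rpl}        # idle mode: the RPL is blocked

    @property
    def mode(self) -> str:
        return "idle" if self.blocked == {self.rpl} and not self._failed else "protecting"

    _failed = None

    def link_failed(self, link) -> None:
        """Protecting mode: block the failed link, unblock the RPL."""
        if link not in self.links:
            raise ValueError("unknown link")
        self._failed = link
        self.blocked = {link}

    def failure_cleared_and_wtr_expired(self) -> None:
        """Revertive operation: RPL owner re-blocks the RPL, repaired link is unblocked."""
        self._failed = None
        self.blocked = {self.rpl}

    def is_loop_free(self) -> bool:
        # A ring with at least one blocked link cannot form a forwarding loop.
        return len(self.blocked) >= 1


if __name__ == "__main__":
    ring = Ring(links=["A-B", "B-C", "C-D", "D-A"], rpl="D-A")
    print(ring.mode, ring.blocked)
    ring.link_failed("B-C")
    print(ring.mode, ring.blocked)
    ring.failure_cleared_and_wtr_expired()
    print(ring.mode, ring.blocked)
```

In a real ring the block on the repaired link is only removed after the RPL has been blocked again, so there is never a moment with no blocked link; the single assignment in this sketch hides that ordering.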
G.8032-2010
After coming out with G.8032 in 2008 (G.8032v1), the ITU came out with G.8032-2010 (G.8032v2) in 2010
This new version is not backwards-compatible with v1, but a v2 node must support v1 as well (but then operation is according to v1)
Major differences:
  2 designated nodes: RPL owner node and RPL neighbor node
    and, for optional flush-optimization, next neighbor node
  significant changes to state machine, priority logic, commands (forced/manual/clear), and protocol
  new Wait To Block timer
  supports more general topologies (sub-rings)
    ladders (For Further Study in v1)
    multi-ring
  ring topology discovery
  virtual channel based on VLAN or MAC address
[Figure: ring with RPL, RPL owner, RPL neighbor, and next neighbor; ring with subring; multi-ring ladder]
RPR 802.17
Resilient Packet Rings
  are compatible with standard Ethernet, but use a different frame format
  are robust (lossless, <50 ms protection, OAM)
  are fair (based on client throttling)
  support QoS (3 classes A, B, C)
  are efficient (full spatial reuse)
  are plug and play (automatic station autodiscovery)
  extend use of existing fiber rings
counter-rotating add/drop ringlets (ringlet0 and ringlet1), running
  SONET/SDH (any rate, PoS, GFP or LAPS) or
  packetPHY (1 or 10 Gb/s ETH PHY)
developed by 802.17 WG
based on Cisco's Spatial Reuse Protocol (RFC 2892)
[Figure: ringlet0 and ringlet1, with ringlet selection at the station]
Basic RPR queuing

traffic going around ring
  placed into internal buffer
  in dual-transit queue mode, placed into 1 of 2 buffers according to service class (Primary/Secondary Transit Queue, PTQ/STQ)
  sent according to fairness
traffic for local sink
  placed in output buffer according to service class
traffic from local source
  first sent to ringlet selection
  sent according to fairness
[Figure: station with PTQ and STQ transit queues, class A, B, C buffers, and fairness control]
RPR service classes

RPR defines 3 main classes
  class A: real time (low latency/FDV)
  class B: near real time (bounded predictable latency/FDV)
  class C: best effort

  class   use       info rate                 D/FDV       FE
  A0      RT        reserved                  low         No
  A1      RT        allocated, reclaimable    low         No
  B-CIR   near RT   allocated, reclaimable    bounded     No
  B-EIR   near RT   opportunistic             unbounded   Yes
  C       BE        opportunistic             unbounded   Yes
RPR Class use

A0 ring BW is reserved, and is not reclaimed even if there is no traffic
in dual-transit queue mode:
  class A frames from the ring are queued in PTQ
  class B, C frames are queued in STQ
priority for egress (highest first)
  frames in PTQ
  local class A frames
  local class B (when no frames in PTQ)
  frames in STQ
  local class C (when no PTQ, STQ, local A or B)
Notes:
  class A frames have minimal delay
  class B frames have higher priority than STQ transit frames, so bounded delay/FDV
  classes B and C share STQ, so once in ring have similar delay
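The egress ordering listed above amounts to a strict-priority scan over five queues. The following sketch shows just that scan; it deliberately ignores fairness, shaping, and STQ fill thresholds, all of which a real RPR MAC also takes into account, and the queue names and `next_frame` function are invented for this example.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Egress order from the slide, highest priority first.
EGRESS_ORDER = ("PTQ", "local_A", "local_B", "STQ", "local_C")


def next_frame(queues: Dict[str, Deque[str]]) -> Optional[Tuple[str, str]]:
    """Return (queue name, frame) for the next frame to transmit,
    or None if every queue is empty."""
    for name in EGRESS_ORDER:
        queue = queues.get(name)
        if queue:                       # non-empty queue found
            return name, queue.popleft()
    return None


if __name__ == "__main__":
    queues = {
        "PTQ": deque(),
        "local_A": deque(),
        "local_B": deque(["b1"]),
        "STQ": deque(["transit1"]),
        "local_C": deque(["c1"]),
    }
    while (item := next_frame(queues)) is not None:
        print(item)
```

Local class C is served only when everything ahead of it is empty, which matches the "when no PTQ, STQ, local A or B" condition on the slide.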
RPR - protection
rings give inherent protection against a single point of failure
RPR specifies 2 mechanisms
  steering
  wrapping (optional)
  (implementations may also do wrapping then steering)
[Figure: steering (steering info sent to stations) and wrap]
NERT and CLEER

New Ethernet Ring Technology / Closed Loop Encapsulated Ethernet Ring
Similar to RPR but uses real Ethernet format
NERT and CLEER distinguish between
  ring nodes
  switches connected to ring nodes
Traffic in ring is MAC-in-MAC encapsulated
  external MACs are of ring nodes
  internal MACs are original
Unexpected external MACs are discarded
External MACs are learned as in 802.1ah
Ring nodes forward according to table
  NERT floods, CLEER never floods
Protection switch only involves changing table
[Figure: switches attached to ring nodes]
MPLS fast reroute

IP FRR
RFC 4090
IP FRR
True protection mechanisms do not exist for connectionless IP
In practice, routing protocols discover breaks and recalculate routes, but this usually takes a long time
Link-state IGPs detect link-down state using hellos
  for OSPF, typically every 10 sec, with detection after 40 sec
  and then the Dijkstra algorithm avoids the failed link
BFD can be used to speed up the detection
However, the information still has to be propagated further (seconds?) and FIBs updated (100s of ms)
Various IP Fast ReRoute (IP FRR) mechanisms have been proposed, but true protection is best done at the MPLS level
MPLS fast reroute

RSVP-TE enables MPLS traffic engineering by fine control over placement
  specifies explicit path using information gathered from IGP
  resources may be reserved at LSRs along the way
RFC 4090 defines extensions to RSVP-TE: Fast ReRoute (FRR)
LSRs along the path preconfigure local bypasses (detours)
Upon detection of failure (detection is not discussed in RFC 4090) by
  BFD (specified in microseconds, typically 10s of ms) or
  RSVP hellos (RFC default is 5 ms) or
  RESV / PATH messages (driven by IGP)
the upstream LSR simply enables the detour
Since this is a local action, it should be fast
RFC 4090 only discusses adding FRR to an RSVP-TE network, but its use with LDP is possible if there is a single label generator
PLRs and MPs

The fundamental entities in MPLS FRR are
  Point of Local Repair (PLR)
  Merge Point (MP)
A PLR is the LSR before the failed element (link or node)
All LSRs except the egress LER can be PLRs
The PLR is solely responsible for the FRR (no explicit APS signaling)
During path setup, potential PLRs create detours towards the egress LER
An MP is the LSR where the detour rejoins the LSP
All LSRs except the ingress LER can be MPs
[Figure: LSP from ingress LER to egress LER, with PLR and MP marked]
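The PLR and MP rules can be written down directly for an LSP given as an ordered list of nodes. This is a minimal sketch; the function names and node names are hypothetical, and refusing to repair an ingress or egress LER failure is an assumption of the sketch (there is no upstream PLR for the ingress and no downstream MP for the egress).

```python
def plr_candidates(lsp):
    """All LSRs except the egress LER can be PLRs."""
    return lsp[:-1]

def mp_candidates(lsp):
    """All LSRs except the ingress LER can be MPs."""
    return lsp[1:]

def plr_for_failure(lsp, failed):
    """Return the LSR just before the failed element.

    `failed` is a node name, or an (upstream, downstream) tuple for a link.
    """
    if isinstance(failed, tuple):
        upstream, downstream = failed
        i = lsp.index(upstream)
        if i + 1 >= len(lsp) or lsp[i + 1] != downstream:
            raise ValueError("link is not on the LSP")
        return upstream
    i = lsp.index(failed)
    if i == 0 or i == len(lsp) - 1:
        # Assumption of this sketch: LER failures are not locally repairable
        raise ValueError("no local repair for a failed ingress/egress LER")
    return lsp[i - 1]

lsp = ["LER1", "LSR2", "LSR3", "LER4"]
link_plr = plr_for_failure(lsp, ("LSR2", "LSR3"))
node_plr = plr_for_failure(lsp, "LSR3")
```

For this LSP, LSR2 is the PLR both when link LSR2-LSR3 fails and when node LSR3 fails; what differs is where the detour has to rejoin the LSP.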
Methods
RFC 4090 defines two different protection methods
Usually one or the other is employed in a given network
One-to-one backup
  each LSP protected separately
  detour LSP created for each LSP at each potential PLR
  no labels pushed
  [Figure: detour from PLR to MP]
Facility backup
  backup tunnel for multiple LSPs
  bypass tunnel created at each potential PLR
  uses label stacking
  [Figure: bypass tunnel from PLR to MP]
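The difference in label handling at the PLR can be sketched with label stacks held as Python lists (top of stack first). The label values and function names are hypothetical; the sketch assumes the PLR already knows the detour label, the bypass tunnel label, and the label the MP expects for each protected LSP.

```python
def one_to_one_reroute(stack, detour_label):
    """One-to-one backup: swap the top label for the detour LSP's label.

    No label is pushed, so the stack depth is unchanged.
    """
    return [detour_label] + stack[1:]

def facility_reroute(stack, label_expected_by_mp, bypass_label):
    """Facility backup: swap to the label the MP expects, then push the
    bypass tunnel label on top (label stacking)."""
    return [bypass_label, label_expected_by_mp] + stack[1:]

# Two LSPs arriving at the same PLR with hypothetical labels 100 and 101
lsp1_detour = one_to_one_reroute([100], detour_label=300)
lsp2_detour = one_to_one_reroute([101], detour_label=301)

# The same two LSPs sharing one bypass tunnel with hypothetical label 900
lsp1_bypass = facility_reroute([100], label_expected_by_mp=200, bypass_label=900)
lsp2_bypass = facility_reroute([101], label_expected_by_mp=201, bypass_label=900)
```

With one-to-one backup each LSP gets its own detour label, while with facility backup both LSPs carry the same outer label 900 and are told apart at the MP by the inner label.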
NHOP and NNHOP

MPLS FRR can bypass a failed link or a failed node
In order to bypass a single failed link, we need an alternative path to the next hop (NHOP)
[Figure: NHOP bypass from PLR to MP]
In order to bypass a single failed node, we need an alternative path to the next next hop (NNHOP)
[Figure: NNHOP bypass from PLR to MP]
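Finding the two kinds of bypass amounts to a path search that excludes the protected element. The sketch below uses a plain breadth-first search on a hypothetical six-node topology; real networks would compute constrained paths with the TE database, so the hop-count search here is only a stand-in.

```python
from collections import deque

def shortest_path(graph, src, dst, banned_nodes=(), banned_links=()):
    """Breadth-first search that avoids the given nodes and links."""
    banned_nodes = set(banned_nodes)
    banned_links = {frozenset(link) for link in banned_links}
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nbr in graph[node]:
            if nbr in seen or nbr in banned_nodes:
                continue
            if frozenset((node, nbr)) in banned_links:
                continue
            seen.add(nbr)
            queue.append(path + [nbr])
    return None

def nhop_bypass(graph, lsp, plr):
    """Link protection: reach the next hop without the protected link."""
    nhop = lsp[lsp.index(plr) + 1]
    return shortest_path(graph, plr, nhop, banned_links=[(plr, nhop)])

def nnhop_bypass(graph, lsp, plr):
    """Node protection: reach the next next hop without the next hop."""
    i = lsp.index(plr)
    return shortest_path(graph, plr, lsp[i + 2], banned_nodes=[lsp[i + 1]])

# Hypothetical topology; the protected LSP runs A-B-C-D
graph = {"A": ["B"], "B": ["A", "C", "E", "F"], "C": ["B", "D", "E"],
         "D": ["C", "F"], "E": ["B", "C"], "F": ["B", "D"]}
lsp = ["A", "B", "C", "D"]
link_bypass = nhop_bypass(graph, lsp, "B")
node_bypass = nnhop_bypass(graph, lsp, "B")
```

With B as PLR, the NHOP bypass merges at C (the MP for a failure of link B-C), while the NNHOP bypass avoids C entirely and merges at D.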
MPLS-TP APS
RFC 6372 (MPLS-TP Survivability Framework)
RFC 6378 (MPLS-TP Linear Protection)
draft-ietf-mpls-tp-ring-protection
MPLS-TP resilience
Since it strives to be a carrier-grade transport network, TP has strong protection switching requirements
APS has been almost as contentious an issue as OAM
and indeed the arguments are inter-related
RFC 6372 gives a general framework
and differentiates between linear, shared-mesh, and ring protection
Linear protection
from RFC 6378 (ex draft-ietf-mpls-tp-linear-protection)
1+1, 1:1, 1:n and uni/bidi are supported
APS signaling protocol (for all modes except 1+1 uni) is single-phase
and called the Protection State Coordination (PSC) protocol
PSC messages are sent over the protection channel
APS messages are sent over the GACh with a single channel type
message functions identified by a request field
6 states: normal, protecting due to failure, admin protecting, WTR, protection path unavailable, DNR
when revertive, a WTR timer is used
PSC message format

[Figure: PSC message layout, fields in order]
GAL : Label (13), TC, S=1, TTL
GACh : 0001, VER, 00000000, PSC channel type
PSC : Ver, Request, PT, R, Res, FPath, Path, TLV Length, Res, Optional TLVs

Request : NR, SF, SD, manual switch, forced switch, lockout, WTR, DNR
PT = Protection Type : uni 1+1, bidi 1+1, bidi 1:1/1:n
R = Revertive
FPath = which path has fault
Path = which data path is on
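Packing and parsing the PSC-specific part of the message can be sketched with Python's `struct` module. The field widths (Ver 2 bits, Request 4, PT 2, R 1, 7 reserved bits, FPath 8, Path 8, TLV Length 16, 16 reserved bits) and the numeric request codes are written from memory of RFC 6378 and should be treated as assumptions to verify against the RFC; the GAL and GACh headers and any optional TLVs are left out.

```python
import struct

# Request codes as recalled from RFC 6378: assumptions, verify before use
REQUEST_CODES = {"NR": 0, "DNR": 1, "WTR": 4, "MS": 5,
                 "SD": 7, "SF": 10, "FS": 12, "LO": 14}
REQUEST_NAMES = {code: name for name, code in REQUEST_CODES.items()}

def pack_psc(request, pt, revertive, fpath, path, version=1):
    """Build the 8-byte PSC payload with no optional TLVs."""
    first = (version << 6) | (REQUEST_CODES[request] << 2) | (pt & 0x3)
    second = 0x80 if revertive else 0x00   # R bit, then 7 reserved bits
    # Remaining fields: FPath, Path, TLV Length (0), reserved (0)
    return struct.pack("!BBBBHH", first, second, fpath, path, 0, 0)

def unpack_psc(data):
    """Parse the fixed 8-byte PSC payload into a dict."""
    first, second, fpath, path, tlv_len, _reserved = struct.unpack(
        "!BBBBHH", data[:8])
    return {
        "version": first >> 6,
        "request": REQUEST_NAMES[(first >> 2) & 0xF],
        "pt": first & 0x3,
        "revertive": bool(second & 0x80),
        "fpath": fpath,
        "path": path,
        "tlv_length": tlv_len,
    }

# Signal Fail on the working path, traffic now on the protection path
msg = pack_psc("SF", pt=3, revertive=True, fpath=1, path=1)
decoded = unpack_psc(msg)
```

The example message reports a signal fail with FPath=1 and Path=1; the interpretation of those two values (fault on working, protection path carrying the traffic) is likewise recalled from the RFC rather than stated on the slide.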
PSC control logic states

Normal state - no trigger events reported
Unavailable state - protection path is unavailable
Protecting failure state - traffic is being transported on the protection path
Protecting administrative state - operator issued command switching traffic to protection path
Wait-to-Restore state - recovering from working path SF/SD, WTR timer has not yet expired
Do-not-Revert state - recovered from a protecting state but operator has configured DNR
PSC local requests

In order from highest to lowest priority :
1. Clear (operator command)
2. Lockout of protection (operator command)
3. Forced Switch (operator command)
4. Signal Fail on protection (OAM / control-plane / server indication)
5. Signal Fail on working (OAM / control-plane / server indication)
6. Signal Degrade on working (OAM / control-plane / server indication)
7. Clear Signal Fail/Degrade (OAM / control-plane / server indication)
8. Manual Switch (operator command)
9. WTR Expires (WTR timer)
10. No Request (default)
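The priority order above can be applied mechanically when several local requests are present at once. This is a minimal sketch; the list simply restates the order on the slide, and the function name `top_priority_request` is hypothetical.

```python
# Local requests, highest priority first, as listed on the slide
LOCAL_REQUEST_PRIORITY = [
    "Clear",
    "Lockout of protection",
    "Forced Switch",
    "Signal Fail on protection",
    "Signal Fail on working",
    "Signal Degrade on working",
    "Clear Signal Fail/Degrade",
    "Manual Switch",
    "WTR Expires",
    "No Request",
]
RANK = {name: i for i, name in enumerate(LOCAL_REQUEST_PRIORITY)}

def top_priority_request(active):
    """Pick the highest-priority request; default to No Request."""
    return min(active, key=RANK.__getitem__, default="No Request")

winner = top_priority_request(["Manual Switch", "Signal Fail on working"])
```

In this example a Signal Fail on the working path overrides the operator's Manual Switch, whereas a Forced Switch would override both.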
Linear protection ITU style

from draft-zulr-mpls-tp-linear-protection-switching
Similar to previous, but uses Y.1731/G.8031 format (no surprise!)
[Figure: APS message layout, fields in order]
GAL : Label (13), TC, S=1, TTL
GACh : 0001, VER, 00000000, allocated channel type
G.8031 : MEL, VER, OPCODE=39, FLAGS=0, OFFSET=4, req/state, prot. type, requested sig, bridged sig, reserved, END=0
Ring protection
once again there were two drafts,
both supporting p2p and p2mp, wrapping and steering, link/node failures
draft-ietf-mpls-tp-ring-protection (not yet RFC)
  Between any 2 LSRs we can define a Sub-Path Maintenance Entity (SPME)
  So between 2 LSRs on a ring there are 2 SPMEs
  we define 1 as the working channel and 1 as the protection channel
  Now we re-use the linear protection mechanisms, including the PSC protocol
draft-helvoort-mpls-tp-ring-protection-switching
  Both counter-rotating rings carry working and protection traffic
  The bandwidth on each ring is divided
  X BW is dedicated to working traffic and Y dedicated to protection traffic
  The protection bandwidth of one ring is used to protect the
