Protection
Switching
Yaakov (J) Stein
CTO
RAD Data Communications
Mar 2012
Course Outline
General protection switching principles
Examples of protection mechanisms
SONET/SDH
Ethernet linear protection
Ethernet ring protection
MPLS fast reroute
MPLS-TP APS
Y(J)S APS
Slide 2
General principles
Definition
References
Traffic types
Network topologies
Triggers
Protection classes
Entities
Protection types
Signaling
Definition
Automatic Protection Switching (APS)
is a functionality of carrier-grade transport networks
is often called resilience
since it enables service to quickly recover from failures
is required to ensure high reliability and availability
APS includes :
Traffic types
In a network with APS capabilities, there are three types of traffic:
protected traffic
traffic that may be rapidly switched to a protection channel
Network topologies
APS can be defined for any topology with redundant links
(for tree topologies, which have no redundant links, no protection is possible)
We will often discuss protection of individual links
However, there are two topologies that are of particular interest :
rings
protection is natural for rings
although there are other reasons for using rings as well
rings are so important that protection for other topologies
is often called linear protection
dense meshes
for this topology multiple local bypasses can be preconfigured
protection switching is similar to routing change, but faster
often called Fast ReRoute (FRR)
Triggers
Protection switching is usually triggered by a failure
although the operator may manually force a protection switch
A failure is declared when a fault condition
persists long enough
for the ability to perform the required function
to be considered terminated
Failures are Signal Fail (SF) or Signal Degrade (SD) (of various types)
and may be :
Switching time (1)
Switching time (2)
Protection classes
It is useful to distinguish two different protection classes
APS entities (1)
[Figure: head-end, protection channel, tail-end]
APS entities (2)
[Figure: head-end (bridge), working channel, protection channel, tail-end (selector), signaling channel; trail termination, detection, and OAM injection and extraction]
Revertive operation
Reversion means returning to use the working channel
after the failure has been rectified
Protection mechanisms can be revertive or nonrevertive
Revertive mechanisms may be preferable
when the working channel has better performance (free BW, BER, delay)
Uni/bi-directional
We will usually consider bidirectional traffic
but even then the failures can be uni- or bi- directional
and for unidirectional failures there can be uni- or bi- directional switching
[Figure: two panels. Unidirectional protection after a failure: the protection channel is in use in one direction only. Bidirectional protection after a failure: the protection channel is in use in both directions.]
Protection types
We distinguish several different protection types
1+1
1:1
1:n
m:n
(1:1)n
Each type has its applicability, advantages, and disadvantages
and there are trade-offs between
simplicity
BW consumption
signaling requirements
1+1 protection
Simplest and fastest form of protection
but wasteful - only 50% of actual physical capacity is used
Head-end bridge always sends data on both channels
Tail-end selector chooses channel to use (based on BER, dLOS, etc.)
For unidirectional 1+1 switching there is no need for APS signaling
If non-revertive
there is no distinction between working and protection channels
1:1 protection
Head-end bridge usually sends data on working channel
When failure detected it starts sending data over protection channel
and tail-end needs to select the protection channel
When not in use, protection channel can be used for extra traffic
However, since failure is detected by tail-end, APS signaling is needed
Protection channel should have OAM running to ensure its functionality
1:n protection
One protection channel is allocated for n working channels
Can protect only one working channel at a time
but it is improbable that more than 1 working channel will fail simultaneously
Only 1/(n+1) of total capacity is reserved for protection
m:n protection
To enable protection of more than 1 channel
m protection channels are allocated for n working channels (m < n)
m simultaneous failures can be protected
Less protection capacity dedicated than for n times 1:1
When failure detected,
1 of the m protection channels needs to be assigned and signaled
High complexity but conserves resources
(1:1)n protection
This is like n times 1:1 but the n protection channels share bandwidth
Only 1 failed working channel can be protected
This is different from 1:n since
n protection channels are preconfigured
n working channels need not be of the same type
Protection bandwidth must be at least that of the largest working channel
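The capacity trade-offs between the protection types can be checked with a little arithmetic. The following is a minimal sketch, assuming equal-bandwidth channels for 1:1, 1:n and m:n; the function names are illustrative and not taken from any standard.

```python
from fractions import Fraction
from typing import Sequence


def reserved_fraction(m: int, n: int) -> Fraction:
    """Share of total capacity set aside when m protection channels
    guard n working channels of equal bandwidth (covers 1:1, 1:n, m:n)."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive integers")
    return Fraction(m, m + n)


def shared_reserved_fraction(working_bw: Sequence[float]) -> float:
    """(1:1)n case: the shared protection bandwidth is taken to be exactly
    that of the largest working channel (the minimum the slide allows)."""
    protection_bw = max(working_bw)
    return protection_bw / (sum(working_bw) + protection_bw)


print(reserved_fraction(1, 1))  # 1:1 (and 1+1): half of the capacity
print(reserved_fraction(1, 4))  # 1:n with n = 4: 1/(n+1)
print(reserved_fraction(2, 6))  # m:n with m = 2, n = 6
print(shared_reserved_fraction([10.0, 1.0, 1.0]))
```

Note that 1+1 and 1:1 reserve the same share; the difference is that in 1:1 the idle protection channel can carry extra traffic.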
APS algorithm
We have seen that protection switching is a tricky business
So it is not surprising that network elements that support APS
run an APS algorithm
This algorithm inputs :
operator commands
Priority
Not every failure event / operator command results in a protection switch
For example
in 1:n protection the protection channel may already be in use!
Conflicts are resolved by assigning priorities to events/commands
When an event is detected or a command received
the APS algorithm will not act
if an event/command of equal or higher priority is already in effect
True failure conditions usually have higher priority than manual commands
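The priority rule can be sketched in a few lines. This is only an illustration for a single shared protection channel: the `Request` ordering below is an assumption chosen to match "failures outrank manual commands", and real standards define their own priority tables.

```python
from enum import IntEnum
from typing import Optional, Tuple


class Request(IntEnum):
    # Illustrative ordering only (an assumption): higher value = higher priority.
    NO_REQUEST = 0
    MANUAL_SWITCH = 1
    SIGNAL_DEGRADE = 2
    SIGNAL_FAIL = 3


class Arbiter:
    """Tracks which request currently owns the single protection channel."""

    def __init__(self) -> None:
        self.active: Tuple[Request, Optional[int]] = (Request.NO_REQUEST, None)

    def offer(self, request: Request, channel: int) -> bool:
        """Return True if the request takes effect, False if it is ignored."""
        current, _ = self.active
        if current >= request:
            # A request of equal or higher priority is already in effect.
            return False
        self.active = (request, channel)
        return True

    def clear(self) -> None:
        """Called when the active condition ends."""
        self.active = (Request.NO_REQUEST, None)


arbiter = Arbiter()
print(arbiter.offer(Request.MANUAL_SWITCH, channel=5))  # accepted
print(arbiter.offer(Request.SIGNAL_FAIL, channel=2))    # preempts the manual switch
print(arbiter.offer(Request.SIGNAL_FAIL, channel=3))    # ignored: equal priority
```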
Timers
Even failure events with priority are not acted upon immediately
to do so would cause unnecessary switches after transient defects
The APS algorithm may maintain several timers, such as
Holdoff timers
the time between detection of an SF or SD event
and the APS algorithm acting upon this event
the algorithm usually used is called peek twice
i.e., the condition is checked again after the timer expires
Guard timer
for rings, a blockout time during which APS messages are ignored (since they may be old and outdated)
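The peek-twice holdoff can be sketched as follows. This is a minimal illustration, not a standard's procedure: `defect_present` is a hypothetical callback that samples the SF/SD condition, and the `sleep` parameter exists only so the example can run without real delays.

```python
import time
from typing import Callable


def peek_twice(defect_present: Callable[[], bool],
               holdoff_s: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if the defect is seen both at detection time
    and again after the holdoff timer expires."""
    if not defect_present():
        return False           # nothing detected, nothing to hold off
    sleep(holdoff_s)           # holdoff timer running
    return defect_present()    # second peek: act only if still present


def no_wait(_seconds: float) -> None:
    """Stand-in for time.sleep so the demo returns immediately."""


transient = iter([True, False])    # defect gone by the second peek
persistent = iter([True, True])    # defect still there
print(peek_twice(lambda: next(transient), 0.1, sleep=no_wait))
print(peek_twice(lambda: next(persistent), 0.1, sleep=no_wait))
```

A transient defect therefore never reaches the switching logic, at the cost of adding the holdoff time to every genuine switch.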
APS signaling
In all types except unidirectional 1+1, some APS signaling is needed
APS signaling is used to synchronize between head-end and tail-end
It is critical that head-end and tail-end always be in the same state
Example messages include :
No Request (NR)
2-phase
1) tail → head 2) head → tail
Examples of 1-phase
1-phase signaling is possible, for example, for 1:1 or (1:1)n
1. upon detection of failure the tail-end sends SF to the head-end
and immediately changes its selector (blind switch)
upon receipt the head-end changes the bridge setting
(no priority is checked)
1-phase can also be used for bidirectional 1:1
1. upon detection of failure the tail-end sends SF to the head-end
and immediately changes both its selector and bridge
upon receipt the head-end changes its bridge and selector
Example of 2-phase
2-phase is useful for unidirectional 1:n with priority checking
1. upon detection of failure the tail-end sends SF to the head-end
but does not change its selector
2. the head-end checks priority
sends confirmation to tail-end (with identity of working channel)
the bridge setting is changed
3. the tail-end changes its selector
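The three steps above can be sketched as two cooperating objects. This is a simplified illustration for unidirectional 1:n: message passing is modelled as direct method calls, the priority check is reduced to "first request wins" (an assumption), and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeadEnd:
    bridged: Optional[int] = None  # working channel bridged onto protection

    def on_sf(self, channel: int) -> Optional[int]:
        """Step 2: check priority, change the bridge, return the confirmation."""
        if self.bridged is not None:
            return None            # protection already in use: no confirmation
        self.bridged = channel
        return channel             # confirmation carries the channel identity


@dataclass
class TailEnd:
    selected: Optional[int] = None  # channel currently taken from protection

    def detect_failure(self, channel: int, head: HeadEnd) -> bool:
        """Step 1: send SF without touching the selector.
        Step 3: change the selector only once the confirmation arrives."""
        confirmation = head.on_sf(channel)   # phase 1: tail -> head
        if confirmation is not None:         # phase 2: head -> tail
            self.selected = confirmation
        return self.selected == channel


head, tail = HeadEnd(), TailEnd()
print(tail.detect_failure(3, head))  # channel 3 is switched to protection
print(tail.detect_failure(5, head))  # refused: protection channel is busy
```

Because the selector only moves after the confirmation, the tail-end never listens to a protection channel that is carrying some other working channel's traffic.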
Example of 3-phase
3-phase signaling is imperative for bidirectional 1:n
1. upon detection of failure the tail-end sends SF to the head-end
but does not change its selector
2. the head-end checks priority, and sends confirmation to tail-end
head-end changes its bridge setting
and also sends a reverse request
3. the tail-end changes selector
checks priority and sends confirmation to head-end
tail-end changes its bridge setting (as head-end of opposite direction)
head-end receives confirmation and changes its selector
SONET/SDH APS
SONET protection?
SONET/SDH networks need to be highly reliable (five nines)
Down-time should be minimal (less than 50 msec)
So systems must repair themselves (no time for manual intervention)
SONET/SDH layers
[Figure: path, line, and section layers and their terminations across a chain of ADMs and a regenerator]
Line APS
TOH (3 columns x 9 rows of the 90-column frame):
A1  A2  J0    section overhead (3 rows)
B1  E1  F1
D1  D2  D3
H1  H2  H3    line overhead (6 rows)
B2  K1  K2
D4  D5  D6
D7  D8  D9
DA  DB  DC
S1  M0  E2
TOH consists of
HO Path APS
POH bytes: J1, B3, C2, G1, F2, H4, F3, K3, N1
POH is responsible for type, status, path performance monitoring, VCAT, trace
LO Path APS
VC OH is responsible for timing, PM, REI,
LO Path APS signaling is the 4 MSBs of byte K4
[Figure: bytes V5, J2, N2, K4 (VC OH) shown together with V1, V2, V3, V4]
May be at any layer (but only OC-n level protects against fiber cuts)
Unidirectional vs. bidirectional
Unidirectional routing
working channel B-A same direction (e.g. clockwise) as A-B
management simplicity: A-B and B-A can occupy same timeslots
Inefficient: waste in ring BW and excessive delay in one direction
Bidirectional routing
A-B and B-A are opposite in direction
both using shortest route
spatial reuse: timeslots can be reused in other sections
[Figure: rings with nodes A, B, C showing the A-B, B-A, B-C, and C-B channels]
UPSR: Unidirectional, Path switching, Two-fiber
BLSR (MS-SPRing): Bidirectional, Line switching, Two-fiber or Four-fiber
UPSR
Working channel is in one direction
protection channel in the opposite direction
All path traffic is added in both directions (1+1)
decision as to which to use is made at drop point (no signaling)
Normally non-revertive, so effectively two diversity paths
Good match for access networks
1 access resilient ring
less expensive than fiber pair per customer
Inefficient for core networks
no spatial reuse
every signal in every span
in both directions
node needs to continuously monitor
every tributary to be dropped
BLSR
Switch at line level, so less monitoring
When failure detected tail-end NE signals head-end NE
Works for unidirectional/bidirectional fiber cuts, and NE failures
Two-fiber version
half of OC-N capacity devoted to protection
only half capacity available for traffic
Four-fiber version
full redundant OC-N devoted to protection
twice as many NEs as compared to two-fiber
[Figure: example of wrap-around recovery from a unidirectional fiber cut]
STP
LAG
G.8031
STP
The original Spanning Tree Protocol automatically removed loops
from arbitrary network topologies
However, its convergence was very slow (about a minute)
STP cannot be used as a protection mechanism
since its reconvergence time is very long
due to a cumbersome protocol
and long holdoff timer settings
An evolutionary update called Rapid STP 802.1w
was incorporated into 802.1D-2004 clause 17
that converges in about the same time as STP
but can reconverge after a topology change in less than 1 second
RSTP can be used to detect failures and reconverge
and thus can be used as a primitive protection mechanism
However, the switching time will be many tens of ms to 100s of ms
Use of LAG
Ethernet link aggregation
teaming)
G.8031
Q9 of SG15 in the ITU-T is responsible for protection switching
In 2006 it produced G.8031 Linear Ethernet Protection Switching
G.8031 uses standard Ethernet formats, but is incompatible with STP
The standard addresses
point-to-point VLAN connections
SNC (local) protection class
1+1 and 1:1 protection types
unidirectional and bidirectional switching for 1+1
bidirectional switching for 1:1
revertive and nonrevertive modes
1-phase signaling protocol
G.8031 uses Y.1731 OAM CCM messages in order to detect failures
G.8031 defines a new OAM opcode (39) for APS signaling messages
Switching times should be under 50 ms (only holdoff timers when groups)
G.8031 signaling
The APS signaling message looks like this:
MEL (3b) | VER=0 (5b) | OPCODE=39 (1B) | FLAGS=0 (1B) | OFFSET=4 (1B)
req/state (4b) | prot type (4b) | requested sig (1B) | bridged sig (1B) | reserved (1B)
END=0 (1B)
where
req/state identifies the message (NR, SF, WTR, SD, forced switch, etc)
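The byte layout can be made concrete by packing it with Python's `struct` module. This is a sketch based only on the field widths listed on the slide, with the second 4-bit field assumed to be the protection type; the numeric request/state codes are left to the caller, and the value used in the demo is a placeholder rather than a code taken from the standard.

```python
import struct

APS_OPCODE = 39   # OAM opcode for APS messages, as given on the slide
TLV_OFFSET = 4    # four bytes of APS-specific information follow the header


def build_aps_pdu(mel: int, request_state: int, prot_type: int,
                  requested_sig: int, bridged_sig: int) -> bytes:
    """Pack the 9-byte APS message: 4 header bytes, 4 APS bytes, END=0."""
    fields = (("mel", mel, 3), ("request_state", request_state, 4),
              ("prot_type", prot_type, 4), ("requested_sig", requested_sig, 8),
              ("bridged_sig", bridged_sig, 8))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    version = 0
    first_byte = (mel << 5) | version             # MEL (3b) + VER (5b)
    aps_byte = (request_state << 4) | prot_type   # req/state (4b) + prot type (4b)
    return struct.pack("!9B", first_byte, APS_OPCODE, 0, TLV_OFFSET,
                       aps_byte, requested_sig, bridged_sig,
                       0,    # reserved
                       0)    # END
    

def request_state_of(pdu: bytes) -> int:
    """Extract the 4-bit req/state field from a packed message."""
    return pdu[4] >> 4


pdu = build_aps_pdu(mel=7, request_state=0xB, prot_type=0,
                    requested_sig=1, bridged_sig=1)  # 0xB is a placeholder code
print(pdu.hex())
print(request_state_of(pdu))
```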
head-end and tail-end exchange CCM (at 300 per second rate)
on both working and protection channels
head-end and tail-end exchange NR APS messages
on the protection channel (every 5 seconds)
tail-end leaves SF state and enters WTR state (typically 5 minutes, 5..12 min)
tail-end sends WTR message to head-end (in nonrevertive - DNR message)
tail-end sends WTR every 5 seconds
when WTR expires both sides enter NR state
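The recovery sequence can be sketched as a small tail-end state machine. This is an illustration under simplifying assumptions: time is passed in explicitly instead of using real timers, message transmission is omitted, and the class and method names are hypothetical.

```python
from typing import Optional


class TailEnd:
    """Tail-end states after a failure clears: SF -> WTR -> NR (revertive)
    or SF -> DNR (nonrevertive)."""

    def __init__(self, revertive: bool = True, wtr_seconds: float = 300.0) -> None:
        self.revertive = revertive
        self.wtr_seconds = wtr_seconds       # typically 5 minutes
        self.state = "NR"
        self._wtr_deadline: Optional[float] = None

    def signal_fail(self) -> None:
        self.state = "SF"
        self._wtr_deadline = None            # a new failure cancels any WTR

    def failure_cleared(self, now: float) -> None:
        if self.state != "SF":
            return
        if self.revertive:
            self.state = "WTR"
            self._wtr_deadline = now + self.wtr_seconds
        else:
            self.state = "DNR"               # do not revert

    def tick(self, now: float) -> None:
        """Call periodically; returns to NR once the WTR timer has expired."""
        if self.state == "WTR" and now >= self._wtr_deadline:
            self.state = "NR"
            self._wtr_deadline = None


tail = TailEnd()
tail.signal_fail()
tail.failure_cleared(now=0.0)
tail.tick(now=299.0)
print(tail.state)   # still waiting to restore
tail.tick(now=300.0)
print(tail.state)   # back to no request
```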
Ethernet rings?
Ethernet has become carrier grade:
OAM
synchronization
The only thing missing to completely replace SDH is ring protection
However, Ethernet and ring architectures don't go together
open loop
cut the ring by blocking some link
when protection is required - block the failed link
closed loop
disable STP (but avoid infinite loops in some way !)
when protection is required - steer and/or wrap traffic
G.8032 (ERPS)
RFER (RAD)
ERP (NSN)
REP (Cisco)
RRSTP (Alcatel)
RRPP (Huawei)
PSR (Overture)
Closed loop methods
G.8032
Q9 of SG15 produced G.8032 between 2006 and 2008
G.8032 is similar to G.8031
RPL
G.8032v1 defines the Ring Protection Link (RPL)
as the link to be blocked (to avoid closing the loop) in NR state
One of the 2 nodes connected to the RPL
is designated the RPL owner
Unlike RFER
in revertive operation
once the failure is cleared the blocked link is unblocked
and the RPL is blocked again
G.8032 revertive operation
adjacent nodes exchange CCM at 300 per second rate (including over RPL)
exchange NR RB (RPL Blocked) messages in dedicated VLAN every 5 seconds (but not over RPL)
R-APS messages are never forwarded
node(s) detect CCM and start guard timer (blocks acting on R-APS messages)
node(s) send NR messages to neighbors (3 times @ max rate, then every 5 sec)
RPL owner receiving NR starts WTR timer
when WTR expires RPL owner blocks RPL, flushes table, and sends NR RB
node receiving NR RB flushes table, unblocks any blocked ports, sends NR RB
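The blocking behaviour of a revertive ring can be sketched by tracking which link is blocked. This is a deliberately simplified model, assuming a single ring and a single failure at a time; R-APS messages, the guard timer, and table flushing are omitted, and all names are hypothetical.

```python
from typing import Optional, Set


class Ring:
    """Links are numbered 0..num_links-1; exactly one link stays blocked
    so that the ring never forms a forwarding loop."""

    def __init__(self, num_links: int, rpl: int) -> None:
        if not 0 <= rpl < num_links:
            raise ValueError("rpl must be one of the ring's links")
        self.num_links = num_links
        self.rpl = rpl
        self.blocked: Set[int] = {rpl}       # NR state: the RPL is blocked
        self.failed: Optional[int] = None
        self.cleared = False

    def link_failed(self, link: int) -> None:
        """Block the failed link and unblock the RPL."""
        self.failed = link
        self.cleared = False
        self.blocked = {link}

    def failure_cleared(self) -> None:
        """The repaired link stays blocked until the WTR timer expires."""
        if self.failed is not None:
            self.cleared = True

    def wtr_expired(self) -> None:
        """RPL owner blocks the RPL again; other nodes unblock their ports."""
        if self.cleared:
            self.blocked = {self.rpl}
            self.failed = None
            self.cleared = False

    def is_loop_free(self) -> bool:
        return len(self.blocked) >= 1


ring = Ring(num_links=6, rpl=0)
ring.link_failed(3)
print(ring.blocked)      # the failed link is blocked, the RPL carries traffic
ring.failure_cleared()
ring.wtr_expired()
print(ring.blocked)      # reverted: the RPL is blocked again
```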
G.8032-2010
After coming out with G.8032 in 2008 (G.8032v1)
the ITU came out with G.8032-2010 (G.8032v2) in 2010
This new version is not backwards-compatible with v1
but a v2 node must support v1 as well (but then operation is according to v1)
[Figure: ring showing the RPL, RPL owner, RPL neighbor, and RPL next neighbor]
Major differences :
subring
RPR 802.17
Resilient Packet Rings
are compatible with standard Ethernet, but use a different frame format
are robust (lossless, <50ms protection, OAM)
are fair (based on client throttling)
support QoS (3 classes A, B, C)
are efficient (full spatial reuse)
are plug and play (automatic station autodiscovery)
extend use of existing fiber rings
counter-rotating add/drop ringlets, running
developed by 802.17 WG
[Figure: counter-rotating ringlet0 and ringlet1, with ringlet selection]
[Figure labels: PTQ, STQ, fairness]
        use   info rate                D/FDV     FE
A0      RT    reserved                 low       No
A1      RT    allocated, reclaimable   low       No
                                       bounded   No
B-EIR
        BE
RPR - protection
rings give inherent protection against single point of failure
RPR specifies 2 mechanisms
steering
wrapping (optional)
(implementations may also do wrapping then steering)
IP FRR
True protection mechanisms do not exist for connectionless IP
In practice, routing protocols discover breaks and recalculate routes
but this usually takes a long time
Link-state IGPs detect link-down state using hellos
for OSPF - typically every 10 sec, and detection after 40 sec
and then Dijkstra algorithm avoids the failed link
BFD can be used to speed up the detection
However,
Methods
RFC 4090 defines two different protection methods
Usually one or the other is employed in a given network
One-to-one backup
no labels pushed
PLR
MP
Facility backup
MP
MPLS-TP APS
RFC 6372 (MPLS-TP Survivability Framework)
RFC 6378 (MPLS-TP Linear Protection)
draft-ietf-mpls-tp-ring-protection
MPLS-TP resilience
Since it strives to be a carrier-grade transport network
TP has strong protection switching requirements
APS has been almost as contentious an issue as OAM
and indeed the arguments are inter-related
RFC 6372 gives a general framework
and differentiates between
linear
shared-mesh and
ring protection
Linear protection
from RFC 6378 (ex draft-ietf-mpls-tp-linear-protection)
1+1, 1:1, 1:n and uni/bidi are supported
APS signaling protocol (for all modes except 1+1 uni)
is single-phase
and called the Protection State Coordination protocol
PSC messages are sent over the protection channel
APS messages are sent over the GACh with a single channel type
message functions identified by a request field
6 states: normal, protecting due to failure, admin protecting, WTR, protection path unavailable, DNR
GAL:   GAL | TC
GACh:  0001 | VER | 00000000 | channel type (PSC)
PSC:   Ver | Request | PT | R | Res | FPath | Path
       TLV Length | Res
       Optional TLVs
Request: NR, SF, SD, manual switch, forced switch, lockout, WTR, DNR
PT = Protection Type: uni 1+1, bidi 1+1, bidi 1:1/1:n
R = Revertive
FPath = which path has fault
Path = which data path is on protection channel
from draft-zulr-mpls-tp-linear-protection-switching
Similar to previous, but uses Y.1731/G.8031 format (no surprise!)
GAL:     GAL | TC | S=1
GACh:    0001 | VER | 00000000 | allocated channel type
G.8031:  MEL | VER | OPCODE=39 | FLAGS=0 | OFFSET=4
         req state | prot type | requested sig | bridged sig | reserved
         END=0
Ring protection
once again there were two drafts, both supporting p2p and p2mp, wrapping and steering, link/node failures
draft-ietf-mpls-tp-ring-protection (not yet RFC)
Between any 2 LSRs one can define a Sub-Path Maintenance Entity (SPME)
So between 2 LSRs on a ring there are 2 SPMEs
we define 1 as the working channel and 1 as the protection channel
Now we re-use the linear protection mechanisms, including the PSC protocol
draft-helvoort-mpls-tp-ring-protection-switching
Both counter-rotating rings carry working and protection traffic
The bandwidth on each ring is divided
X BW is dedicated to working traffic and Y dedicated to protection traffic