
Automatic Protection Switching
Yaakov (J) Stein
CTO
RAD Data Communications

Mar 2012

Course Outline
General protection switching principles
Examples of protection mechanisms

SONET/SDH
Ethernet linear protection
Ethernet ring protection
MPLS fast reroute
MPLS-TP APS

Y(J)S APS

Slide 2

General principles
Definition
References
Traffic types
Network topologies
Triggers
Protection classes
Entities
Protection types
Signaling


Definition
Automatic Protection Switching (APS)
is a functionality of carrier-grade transport networks
is often called resilience
since it enables service to quickly recover from failures
is required to ensure high reliability and availability
APS includes :

detection of failures (signal fail or signal degrade) on a working channel

switching traffic transmission to a protection channel

selecting traffic reception from the protection channel

(optionally) reverting back to the working channel once the failure is repaired

Automatic means that APS uses (at most) control plane protocols
no management layer or manual operations are needed

Some useful references


G.808.1 generic linear protection
G.808.2 generic ring protection (not yet written)
G.841 and G.842 SDH
G.774.3/4/9/10 SDH protection management
G.870 and G.873.1 OTN
G.8031 Ethernet linear protection
G.8032 Ethernet ring protection
G.8131 T-MPLS APS
Y.1720 MPLS
I.630 ATM
M.495 analog signal protection
G.781 clock selection (can be used to protect synchronization)
RFC 4090 MPLS Fast ReRoute
RFC 6372 MPLS-TP Survivability Framework
RFC 6378 MPLS-TP Linear Protection

Traffic types
In a network with APS capabilities, there are three types of traffic :

protected traffic
traffic that may be rapidly switched to protection channel

at any time it may be on the working channel or protection channel

Nonpreemptible Unprotected Traffic (NUT)


noncritical traffic that does not require protection mechanism
not affected by protection mechanism
somewhat less expensive to customer

extra (preemptible) traffic


best effort background traffic that runs on protection channel
preempted (blocked) when protection channel is needed
very inexpensive to customer


Network topologies
APS can be defined for any topology with redundant links
e.g., for tree topologies no protection is possible
We will often discuss protection of individual links
However, there are two topologies that are of particular interest :

rings
protection is natural for rings
although there are other reasons for using rings as well
rings are so important that protection for other topologies
is often called linear protection

dense meshes
for this topology multiple local bypasses can be preconfigured
protection switching is similar to routing change, but faster
often called Fast ReRoute (FRR)


Triggers
Protection switching is usually triggered by a failure
although the operator may manually force a protection switch
A failure is declared when a fault condition
persists long enough
for the ability to perform the required function
to be considered terminated
Failures are Signal Fail (SF) or Signal Degrade (SD) (of various types)
and may be :

detected by physical layer

indicated by signaling (e.g. AIS)

detected by OAM mechanisms


When there is no SF or SD, the state is called No Request (NR)


Switching time (1)

SONET/SDH protection switching takes place in under 50 ms

Regarding multiplex section shared protection rings, G.841 states :

The following network objectives apply:
1) Switch time - In a ring with no extra traffic, all nodes in the idle state (no detected failures, no active automatic or external commands, and receiving only Idle K-bytes), and with less than 1200 km of fibre, the switch (ring and span) completion time for a failure on a single span shall be less than 50 ms. On rings under all other conditions, the switch completion time can exceed 50 ms (the specific interval is under study) to allow time to remove extra traffic, or to negotiate and accommodate coexisting APS requests.

while for linear VC trail protection, it says :

The following network objectives apply:
1) Switch time - The APS algorithm for LO/HO VC trail protection shall operate as fast as possible. A value of 50 ms has been proposed as a target time. Concerns have been expressed over this proposed target time when many VCs are involved. This is for further study. Protection switch completion time excludes the detection time necessary to initiate the protection switch, and the hold-off time.

There are similar statements in other clauses as well



Switching time (2)

This 50 ms time has become the gold standard
and new protection schemes are expected to meet this objective
However, studying the literature that led up to the SONET/SDH standards
shows that the objective was to attain the minimum possible time
for the sum of

persistent (i.e. non-transient) failure detection
speed of light propagation
signaling protocol time
regaining sync alignment

and 50 ms was the minimum that was considered practical !

Many modern standards have built in 50 ms
and much marketing literature boasts faster than 50 ms
But there is really nothing special about 50 ms

50 ms gaps in voiced speech are noticeable, but not fatal if infrequent
50 ms of data at high rates cannot be stored and later forwarded
timing circuits can withstand much more than 50 ms without a clock
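
As a rough worked example of the sum of components listed on this slide, the sketch below adds up an illustrative switching-time budget. Only the 1200 km ring length comes from the G.841 text quoted earlier; the fibre propagation speed (about 200 km per ms) is a common approximation, and the detection, signaling, and resync figures are placeholder assumptions, not values from any standard.

```python
# Approximate speed of light in fibre: about 2/3 of c, i.e. ~200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def propagation_ms(distance_km: float) -> float:
    """One-way propagation delay over the given length of fibre."""
    return distance_km / FIBRE_KM_PER_MS


def switch_budget_ms(detection_ms: float, distance_km: float,
                     signaling_ms: float, resync_ms: float) -> float:
    """Sum of the four components named on the slide."""
    return detection_ms + propagation_ms(distance_km) + signaling_ms + resync_ms


# Hypothetical component values, chosen only to illustrate the arithmetic.
one_way = propagation_ms(1200)
total = switch_budget_ms(detection_ms=10.0, distance_km=1200,
                         signaling_ms=20.0, resync_ms=10.0)
print(f"propagation over 1200 km: {one_way:.1f} ms, total budget: {total:.1f} ms")
```

With these assumed inputs, propagation alone is a small part of the budget; detection and signaling dominate.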

Protection classes
It is useful to distinguish two different protection classes

path protection (AKA trail protection, end-to-end protection)


when a failure is detected on the end-to-end path
we switch to an alternative end-to-end path
the failure is usually detected by end-to-end OAM

local protection (AKA local restoration, SNC protection, bypass, detour)


we protect individual network elements, links, or groups of the same
when such an entity fails
only that local entity is bypassed
the failure may be detected by link OAM or physical layer means

APS entities (1)

The following entities are important in APS

working channel: channel used when no failure exists
protection channel: channel used when a failure exists
head-end: entity transmitting data to the working/protection channel
tail-end: entity receiving data from the working/protection channel

Note: we will usually consider traffic to be bidirectional
so that the head-end for one direction
is the tail-end for the opposite direction

[Figure: head-end and tail-end connected by a working channel and a protection channel]


APS entities (2)

Bridge: function at head-end that connects traffic (including extra traffic) to the working and protection channels

Selector: function at tail-end that extracts traffic (perhaps extra traffic) from the working or protection channel

APS signaling channel: channel used to communicate between head-end and tail-end for APS purposes

Trail termination: function responsible for failure detection, including injection and extraction of OAM

[Figure: head-end (bridge) and tail-end (selector) connected by working channel, protection channel, and signaling channel]

Revertive operation
Reversion means returning to use the working channel
after the failure has been rectified
Protection mechanisms can be revertive or nonrevertive
Revertive mechanisms may be preferable

when the working channel has better performance (free BW, BER, delay)

when there are frequent switches (easier to manage)

when there is extra traffic


but nonrevertive also has advantages

only one service disruption due to protection switching

may be simpler to implement


Uni/bi-directional
We will usually consider bidirectional traffic
but even then the failures can be uni- or bi- directional
and for unidirectional failures there can be uni- or bi- directional switching

[Figure: unidirectional failure with unidirectional protection (protection channel in use in one direction only) versus bidirectional failure with bidirectional protection (protection channel in use in both directions)]

Uni- / bi- directional switching
Unidirectional switching may be advantageous

for 1+1 - faster and no signaling channel is needed

no unnecessary service disruption for direction without failure

higher chance of protection under multiple failures

easier to implement for local protection

maintains extra traffic in direction without failure

But bidirectional may be preferable

easier management since directions traverse same network elements

does not disrupt delay balance between directions

may simplify repair since failed spans are unused


Protection types
We distinguish several different protection types

1+1

1:1

1:n

m:n

(1:1)n
Each type has its applicability, advantages, and disadvantages
and there are trade-offs between

simplicity

BW consumption

protection switch time

signaling requirements

1+1 protection
Simplest and fastest form of protection
but wasteful - only 50% of actual physical capacity is used
Head-end bridge always sends data on both channels
Tail-end selector chooses channel to use (based on BER, dLOS, etc.)
For unidirectional 1+1 switching there is no need for APS signaling
If non-revertive
there is no distinction between working and protection channels

channel A

channel B


1:1 protection
Head-end bridge usually sends data on working channel
When failure detected it starts sending data over protection channel
and tail-end needs to select the protection channel
When not in use, protection channel can be used for extra traffic
However, since failure is detected by tail-end, APS signaling is needed
Protection channel should have OAM running to ensure its functionality

working channel

extra traffic
protection channel
APS signaling

1:n protection
One protection channel is allocated for n working channels
Can protect only one working channel at a time
but it is improbable that more than 1 working channel will simultaneously fail
Only 1/(n+1) of total capacity is reserved for protection

working channels
protection channel

m:n protection
To enable protection of more than 1 channel
m protection channels are allocated for n working channels (m < n)
m simultaneous failures can be protected
Less protection capacity dedicated than for n times 1:1
When failure detected,
1 of the m protection channels needs to be assigned and signaled
High complexity but conserves resources
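
The capacity trade-off between the protection types can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `reserved_fraction` is hypothetical, and it assumes all channels have equal capacity, which the slides do not state.

```python
from fractions import Fraction


def reserved_fraction(m: int, n: int) -> Fraction:
    """Fraction of total capacity reserved for protection when m protection
    channels back up n working channels (all assumed to be of equal size)."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive")
    return Fraction(m, m + n)


# 1+1 and 1:1 reserve half the capacity; 1:n reserves 1/(n+1).
for label, m, n in [("1+1 or 1:1", 1, 1), ("1:4", 1, 4), ("2:8", 2, 8), ("8 times 1:1", 8, 8)]:
    print(f"{label}: {reserved_fraction(m, n)} of capacity reserved")
```

Note that for 1:1 the reserved half can still carry extra traffic when idle, which this simple fraction does not capture.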

working channels

protection channels

(1:1)n protection
This is like n times 1:1 but the n protection channels share bandwidth
Only 1 failed working channel can be protected
This is different from 1:n since
n protection channels are preconfigured
n working channels need not be of the same type
Protection bandwidth must be at least that of the largest working channel


APS algorithm
We have seen that protection switching is a tricky business
So it is not surprising that network elements that support APS
run an APS algorithm
This algorithm inputs :

configuration (protection type, revertive?, available channels, ...)

failure indications (NR, SF, SD)

operator commands

APS signaling (more on that soon)


and makes switching decisions
The algorithm maintains state information for head-end and tail-end
APS algorithms are detailed in standards documents


Priority
Not every failure event / operator command results in a protection switch
For example
in 1:n protection the protection channel may already be in use !
Conflicts are resolved by assigning priorities to events/commands
When an event is detected or a command received
the APS algorithm will not act
if an event/command of equal or higher priority is already in effect
True failure conditions usually have higher priority than manual commands
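
The priority rule above can be sketched in a few lines. The priority table here is a hypothetical, simplified ordering chosen to match the statement that failures outrank manual commands; real standards define longer tables with their own values.

```python
# Hypothetical priority values (larger = higher priority); not taken from any standard.
PRIORITY = {"NR": 0, "MANUAL": 1, "SD": 2, "SF": 3}


def should_act(new_request: str, current_request: str) -> bool:
    """Act only if the new event/command strictly outranks the one in effect."""
    return PRIORITY[new_request] > PRIORITY[current_request]


print(should_act("SF", "MANUAL"))  # a true failure pre-empts a manual command
print(should_act("SF", "SF"))      # equal priority already in effect: no action
```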


Timers
Even failure events with priority are not acted upon immediately
to do so would cause unnecessary switches after transient defects
The APS algorithm may maintain several timers, such as

Holdoff timers
the time between detection of an SF or SD event
and the APS algorithm acting upon this event
the algorithm usually used is called "peek twice"
i.e., the condition is checked again after the timer expires

Wait To Restore timer
for revertive switching, the time between detection of the failure being cleared
and the APS algorithm acting upon this event
also used in SDH optimized bidirectional 1+1 (nonrevertive)

Guard timer
for rings, a blockout time during which APS messages are
ignored (since they may be old and outdated)
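
A minimal sketch of the hold-off ("peek twice") and Wait To Restore checks described above. The function names and the idea of modelling the defect as a function of time are assumptions made for illustration only.

```python
from typing import Callable


def peek_twice(defect_present: Callable[[float], bool],
               t_detect: float, holdoff: float) -> bool:
    """Request a switch only if the defect is seen at detection time
    and is still present when the hold-off timer expires."""
    if not defect_present(t_detect):
        return False
    return defect_present(t_detect + holdoff)


def may_revert(t_now: float, t_cleared: float, wtr: float) -> bool:
    """Revertive switching: revert only after the failure has stayed cleared for WTR."""
    return (t_now - t_cleared) >= wtr


def transient(t: float) -> bool:
    return 0.0 <= t < 0.02   # hypothetical 20 ms glitch


def persistent(t: float) -> bool:
    return t >= 0.0          # hypothetical defect that never clears


print(peek_twice(transient, t_detect=0.0, holdoff=0.1))
print(peek_twice(persistent, t_detect=0.0, holdoff=0.1))
```

The transient defect is filtered out by the second peek, while the persistent one leads to a switch request.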


APS signaling
In all types except unidirectional 1+1, some APS signaling is needed
APS signaling is used to synchronize between head-end and tail-end
It is critical that head-end and tail-end always be in the same state
Example messages include :

No Request (NR)

by tail-end to inform head-end of Signal Failure (SF)

by head-end to confirm the event's priority

by head-end to report the particular protection channel

by head-end to inform tail-end of Reverse (bidirectional) Request (RR)

by tail-end after failure cleared to Wait To Restore (WTR)

by tail-end after failure cleared to Do Not Revert (DNR) for nonrevertive

APS signaling phases


When APS signaling is used, it needs to be as rapid as possible
Depending on the scenario it may be

1-phase: tail → head (fastest)

tail-end informs head-end of failure
both ends uniquely know the protection channel to be used
only for 1+1 and unidirectional (1:1)n (including 1:1)

2-phase: 1) tail → head 2) head → tail

tail-end informs head-end of failure
head-end signals that it has switched to protection channel
not for bidirectional 1:n or m:n

3-phase: 1) tail → head 2) head → tail 3) tail → head (slowest)

works for all protection types (including m:n)


Examples of 1-phase
An example of when 1-phase signaling is possible is 1:1 or (1:1)n
1. upon detection of failure the tail-end sends SF to the head-end
and immediately changes its selector (blind switch)
upon receipt the head-end changes the bridge setting
(no priority is checked)
1-phase can also be used for bidirectional 1:1
1. upon detection of failure the tail-end sends SF to the head-end
and immediately changes both its selector and bridge
upon receipt the head-end changes its bridge and selector

Example of 2-phase
2-phase is useful for unidirectional 1:n with priority checking
1. upon detection of failure the tail-end sends SF to the head-end
but does not change its selector
2. the head-end checks priority
sends confirmation to tail-end (with identity of working channel)
the bridge setting is changed
3. the tail-end changes its selector
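
The 2-phase exchange above can be sketched as a toy simulation. The class names, the numeric priorities, and the use of channel 0 to mean "nothing bridged" are assumptions for illustration; no real APS protocol encoding is implied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeadEnd:
    bridged_channel: int = 0   # 0 = no working channel bridged to protection (assumed convention)
    active_priority: int = 0   # priority of the request currently in effect

    def on_sf(self, channel: int, priority: int) -> Optional[int]:
        """Phase 1 arrives: check priority, change the bridge, return the confirmation."""
        if priority <= self.active_priority:
            return None        # equal or higher priority request already in effect
        self.active_priority = priority
        self.bridged_channel = channel
        return channel         # confirmation carries the identity of the working channel


@dataclass
class TailEnd:
    selected_channel: int = 0

    def detect_failure(self, channel: int) -> int:
        """Send SF for the failed channel; the selector is NOT changed yet."""
        return channel

    def on_confirmation(self, channel: int) -> None:
        """Phase 2 arrives: only now change the selector."""
        self.selected_channel = channel


head, tail = HeadEnd(), TailEnd()
failed = tail.detect_failure(3)
confirmed = head.on_sf(failed, priority=2)
if confirmed is not None:
    tail.on_confirmation(confirmed)
print(head.bridged_channel, tail.selected_channel)
```

A second SF of the same priority for another channel is refused by `on_sf`, which reflects the case where the single protection channel is already in use.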


Example of 3-phase
3-phase signaling is imperative for bidirectional 1:n
1. upon detection of failure the tail-end sends SF to the head-end
but does not change its selector
2. the head-end checks priority, and sends confirmation to tail-end
head-end changes its bridge setting
and also sends a reverse request
3. the tail-end changes selector
checks priority and sends confirmation to head-end
tail-end changes its bridge setting (as head-end of opposite direction)
head-end receives confirmation and changes its selector


For G.805 buffs


to add 1+1 trail protection to a trail - expand a trail termination function
we use a special transport processing function - the protection switch
the unprotected TTs report status to the protection switch

[Figure: an unprotected trail expanded into a protected trail]


SONET/SDH APS


SONET protection ?
SONET/SDH networks need to be highly reliable (five nines)
Down-time should be minimal (less than 50 msec)
So systems must repair themselves (no time for manual intervention)

Upon detection of a failure (dLOS, dLOF, high BER)
the network must reroute traffic (protection switching)
from working channel to protection channel
SDH APS is unidirectional
SDH APS may be revertive
working channel

head-end NE

protection channel

tail-end NE


SONET/SDH layers
[Figure: ADM - regenerator - ADM chain with path, line, and section terminations; the path runs end to end, the line (MS section) runs between ADMs, and a section covers each hop to or from the regenerator]

Between regenerators there are sections (regenerator sections)


Between ADMs there are lines (multiplex sections)
Between path terminations there are paths
Protection can be at OC-n level (different physical fibers)
or at STM/VC level
or end-to-end path (trail protection)

Line APS

section overhead (3 rows)   A1   A2   J0
                            B1   E1   F1
                            D1   D2   D3
line overhead (6 rows)      H1   H2   H3
                            B2   K1   K2
                            D4   D5   D6
                            D7   D8   D9
                            DA   DB   DC
                            S1   M0   E2

[Figure: frame of 9 rows by 90 columns, with the TOH followed by the Synchronous Payload Envelope]

TOH consists of
3 rows of section overhead - frame sync, trace, EOC, ...
6 rows of line overhead - pointers, SSM, FEBE, and ...
Line APS signaling uses bytes K1 and K2

HO Path APS
POH bytes: J1, B3, C2, G1, F2, H4, F3, K3, N1

POH is responsible for type, status, path performance monitoring, VCAT, trace

HO Path APS signaling uses 4 MSBs of byte K3



LO Path APS

VC OH is responsible for timing, PM, REI, ...
LO Path APS signaling is 4 MSBs of byte K4

[Figure: VC multiframe layout showing bytes V5, V1, J2, V2, N2, V3, K4, V4, with the VC OH marked]

How does it work?


Head-end and tail-end NEs have bridges (muxes)
Head-end and tail-end NEs maintain a bidirectional signaling channel
Signaling is contained in K bytes of protection channel
For line APS
K1 - tail-end status and requests
K2 - head-end status

[Figure: head-end bridge and tail-end bridge connected by working channel, protection channel, and signaling channel]

Linear 1+1 protection


Can be at OC-n level (different physical fibers)
or at STM/VC level (SubNetwork Connection Protection)
or end-to-end path (called trail protection)
Head-end bridge always sends data on both channels
Tail-end chooses channel to use based on BER, dLOS, etc.
No need for signaling
If non-revertive
there is no distinction between working and protection channels

[Figure: head-end NE and tail-end NE connected by working channel and protection channel]

Linear 1:1 protection


Head-end bridge usually sends data on working channel
When tail-end detects failure it signals (using K1) to head-end
Head-end then starts sending data over protection channel
When not in use
protection channel can be used for (discounted) extra traffic
(pre-emptible unprotected traffic)

May be at any layer (but only OC-n level protects against fiber cuts)

working channel

extra traffic
protection channel

Linear 1:N protection


In order to save BW
we allocate 1 protection channel for every N working channels
N limited to 14
4 bits in K1 byte from tail-end to head-end
0 protection channel
1-14 working channels
15 extra traffic channel
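
The 4-bit channel field can be decoded with simple bit masking. This sketch assumes the channel number occupies the low-order 4 bits of K1 and the request the high-order 4 bits; the slide only says that 4 bits of K1 carry the channel number, so treat the bit positions and the function names as assumptions.

```python
def k1_channel(k1: int) -> str:
    """Interpret the 4-bit channel number carried in K1 (assumed to be the low nibble)."""
    channel = k1 & 0x0F
    if channel == 0:
        return "protection channel"
    if channel == 15:
        return "extra traffic channel"
    return f"working channel {channel}"


def k1_request_bits(k1: int) -> int:
    """Return the other 4 bits of K1 as a raw number (no meaning is assigned here)."""
    return (k1 >> 4) & 0x0F


print(k1_channel(0xC3), "| request bits:", format(k1_request_bits(0xC3), "04b"))
```

The limit of N = 14 follows directly: of the 16 possible values, 0 and 15 are taken.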

working channels
protection channel

Two-fiber vs. four-fiber rings

Ring based protection is popular in North America (100K+ rings)

Full protection against physical fiber cuts


Simpler and less expensive than mesh topologies
Protection at line (multiplexed section) or path layer
Four-fiber rings
fully redundant at OC level
can support bidirectional routing at line layer
Two-fiber rings
support unidirectional routing at line layer

2 fibers in opposite directions



Unidirectional vs. bidirectional
Unidirectional routing
working channel B-A same direction (e.g. clockwise) as A-B
management simplicity: A-B and B-A can occupy same timeslots
Inefficient: waste in ring BW and excessive delay in one direction
Bidirectional routing
A-B and B-A are opposite in direction
both using shortest route
spatial reuse: timeslots can be reused in other sections

[Figure: two rings with nodes A, B, C; one shows A-B and B-A routed in the same direction, the other shows A-B, B-A, B-C, and C-B on shortest routes]

UPSR vs. BLSR (MS-SPRing)

[Figure: the possible combinations of unidirectional or bidirectional routing, path or line switching, and two-fiber or four-fiber rings, with UPSR and BLSR marked]

Of all the possible combinations, only a few are in use


Unidirectional (routing) Path Switched Rings
protects tributaries
extension of 1+1 to ring topology
Bidirectional (routing) Line Switched Rings (two-fiber and four-fiber versions)
called Multiplex Section Shared Protection Ring in SDH
simultaneously protects all tributaries in STM
extension of 1:1 to ring topology

UPSR
Working channel is in one direction
protection channel in the opposite direction
All path traffic is added in both directions (1+1)
decision as to which to use is made at drop point (no signaling)
Normally non-revertive, so effectively two diversity paths
Good match for access networks
1 access resilient ring
less expensive than fiber pair per customer
Inefficient for core networks
no spatial reuse
every signal in every span
in both directions
node needs to continuously monitor
every tributary to be dropped

2 rings

SONET ADM

BLSR
Switch at line level - less monitoring
When failure detected tail-end NE signals head-end NE
Works for unidirectional/bidirectional fiber cuts, and NE failures
Two-fiber version
half of OC-N capacity devoted to protection
only half capacity available for traffic
Four-fiber version
full redundant OC-N devoted to protection
twice as many NEs as compared to two-fiber

[Figure: example of recovery from a unidirectional fiber cut by wrap-around on 2 rings]

Ethernet linear APS

STP
LAG
G.8031


STP
The original Spanning Tree Protocol automatically removed loops
from arbitrary networks (with loops)
However, its convergence was very slow (about a minute)
STP can not be used as a protection mechanism
since its reconvergence time is very long
due to a cumbersome protocol
and long holdoff timer settings
An evolutionary update called Rapid STP 802.1w
was incorporated into 802.1D-2004 clause 17
that converges in about the same time as STP
but can reconverge after a topology change in less than 1 second
RSTP can be used to detect failures and reconverge
and thus can be used as a primitive protection mechanism
However, the switching time will be many tens of ms to 100s of ms


Use of LAG
Ethernet link aggregation (AKA bonding, Ethernet trunk, inverse mux, NIC teaming)
enables bonding several ports together as single uplink


Defined by 802.3ad task force and folded into 802.3-2000 as clause 43
Binding of ports to Link Aggregation Groups (LAGs) distributed via
Link Aggregation Control Protocol (LACP)
LACP uses slow protocol frames (up to 5 per second)
Links may be dynamically added/removed from LAG
and LACP continuously monitors to detect if changes needed
Upon link failure LAG delivers traffic at a reduced rate
Thus LAG can be used as a primitive protection mechanism
When used this way it is called worker/standby or N+N mode
The restoration time will be on the order of 1 second

G.8031
Q9 of SG15 in the ITU-T is responsible for protection switching
In 2006 it produced G.8031 Linear Ethernet Protection Switching
G.8031 uses standard Ethernet formats, but is incompatible with STP
The standard addresses
point-to-point VLAN connections
SNC (local) protection class
1+1 and 1:1 protection types
unidirectional and bidirectional switching for 1+1
bidirectional switching for 1:1
revertive and nonrevertive modes
1-phase signaling protocol
G.8031 uses Y.1731 OAM CCM messages in order to detect failures
G.8031 defines a new OAM opcode (39) for APS signaling messages
Switching times should be under 50 ms (only holdoff timers when groups)

G.8031 signaling
The APS signaling message looks like this :

field           size   value
MEL             3b
VER             5b     0
OPCODE          1B     39
FLAGS           1B     0
OFFSET          1B     4
req/state       4b
prot. type      4b
requested sig   1B
bridged sig     1B
reserved        1B
END             1B     0

regular APS messages are sent 1 per 5 seconds
after a change 3 messages are sent at max rate (300 per sec)

where

req/state identifies the message (NR, SF, WTR, SD, forced switch, etc.)
prot. type identifies the protection type (1+1, 1:1, uni/bidirectional, etc.)
requested and bridged signal identify incoming / outgoing traffic
since only 1+1 and 1:1 are supported, they are either null or traffic (all other values reserved)
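
To make the field layout concrete, here is a minimal sketch that packs the fields listed on this slide into bytes. The function name `pack_aps` is hypothetical, the request/state and protection-type codes are passed in as opaque numbers (their actual values are defined in G.8031 and are not reproduced here), and the Ethernet header that would precede these bytes is omitted.

```python
import struct


def pack_aps(mel: int, req_state: int, prot_type: int,
             requested: int, bridged: int) -> bytes:
    """Pack MEL/VER, OPCODE=39, FLAGS=0, OFFSET=4, the 4 APS-specific bytes, and END=0."""
    if not 0 <= mel <= 7:
        raise ValueError("MEL is a 3-bit field")
    if not (0 <= req_state <= 15 and 0 <= prot_type <= 15):
        raise ValueError("req/state and prot. type are 4-bit fields")
    if not (0 <= requested <= 255 and 0 <= bridged <= 255):
        raise ValueError("requested and bridged signal are 1-byte fields")
    version = 0
    first = (mel << 5) | version              # MEL (3b) + VER (5b)
    req_prot = (req_state << 4) | prot_type   # req/state (4b) + prot. type (4b)
    reserved, end = 0, 0
    return struct.pack("!9B", first, 39, 0, 4, req_prot, requested, bridged, reserved, end)


pdu = pack_aps(mel=7, req_state=0xB, prot_type=0xF, requested=1, bridged=1)
print(pdu.hex())
```

OFFSET=4 matches the four bytes (req/state with prot. type, requested signal, bridged signal, reserved) that sit between the OFFSET field and END.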


G.8031 1:1 revertive operation
In the normal (NR) state :

head-end and tail-end exchange CCM (at 300 per second rate)
on both working and protection channels
head-end and tail-end exchange NR APS messages
on the protection channel (every 5 seconds)

When a failure appears in the working channel

tail-end misses 3 consecutive CCM messages on the working channel
tail-end enters SF state
tail-end sends 3 SF messages at 300 per second on the APS channel
tail-end switches selector (and, if bidirectional, bridge) to the protection channel
head-end (receiving SF) switches bridge (and, if bidirectional, selector) to the protection channel
tail-end continues sending SF messages every 5 seconds
head-end sends NR messages but with bridged=normal

When the failure is cleared

tail-end leaves SF state and enters WTR state (typically 5 minutes, range 5..12 min)
tail-end sends WTR message to head-end (in nonrevertive operation - DNR message)
tail-end sends WTR every 5 seconds
when WTR expires both sides enter NR state
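
The tail-end behaviour in this sequence can be sketched as a three-state machine (NR, SF, WTR). This is a simplified illustration under stated assumptions: the class and method names are hypothetical, message retransmission is not modelled, and the full G.8031 state machine has many more states and inputs.

```python
from typing import Optional


class TailEndRevertive:
    """Toy tail-end for 1:1 revertive operation: NR -> SF -> WTR -> NR."""

    def __init__(self, wtr_seconds: float = 300.0):
        self.state = "NR"
        self.selector = "working"
        self.wtr_seconds = wtr_seconds
        self._wtr_started: Optional[float] = None

    def ccm_lost(self) -> str:
        """3 consecutive CCMs missed on the working channel."""
        self.state = "SF"
        self.selector = "protection"
        self._wtr_started = None
        return "SF"                      # message to send to the head-end

    def ccm_restored(self, now: float) -> Optional[str]:
        """Failure cleared: start the Wait To Restore timer."""
        if self.state != "SF":
            return None
        self.state = "WTR"
        self._wtr_started = now
        return "WTR"

    def tick(self, now: float) -> Optional[str]:
        """Called periodically; reverts to the working channel when WTR expires."""
        if self.state == "WTR" and now - self._wtr_started >= self.wtr_seconds:
            self.state = "NR"
            self.selector = "working"
            return "NR"
        return None


te = TailEndRevertive(wtr_seconds=300.0)
print(te.ccm_lost(), te.selector)
print(te.ccm_restored(now=0.0), te.selector)
print(te.tick(now=300.0), te.selector)
```

Traffic stays on the protection channel throughout WTR, so a flapping working channel does not cause repeated switches.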

Ethernet ring APS


G.8032
RPR
CLEER


Ethernet rings ?
Ethernet has become carrier grade :

deterministic connection-oriented forwarding

OAM

synchronization
The only thing missing to completely replace SDH is ring protection
However, Ethernet and ring architectures don't go together

Ethernet has no TTL, so looped traffic will loop forever

STP builds trees out of any architecture - no loops allowed


There are two ways to make an Ethernet ring

open loop
cut the ring by blocking some link
when protection is required - block the failed link

closed loop
disable STP (but avoid infinite loops in some way !)
when protection is required - steer and/or wrap traffic

Ethernet ring protocols


Open loop methods

G.8032 (ERPS)

rSTP (ex 802.1w)

RFER (RAD)

ERP (NSN)

RRST (based on RSTP)

REP (Cisco)

RRSTP (Alcatel)

RRPP (Huawei)

EAPS (Extreme, RFC 3619)

EPSR (Allied Telesis)

PSR (Overture)
Closed loop methods

RPR (IEEE 802.17)

CLEER and NERT (RAD)



G.8032
Q9 of SG15 produced G.8032 between 2006 and 2008
G.8032 is similar to G.8031

strives for 50 ms protection (< 1200 km, < 16 nodes)


but here this number is deceiving as MAC table is flushed

standard Ethernet format but incompatible with STP

uses Y.1731 CCM for failure detection

employs Y.1731 extension for R-APS signaling (opcode=40)

R-APS message format similar to APS of G.8031
(but between every 2 nodes and to MAC address 01-19-A7-00-00-01)

revertive and nonrevertive operation defined

However, G.8032 is more complex due to

requirement to avoid loop creation under any circumstances

need to localize failures

need to maintain consistency between all nodes on ring

existence of a special node (RPL owner)



RPL
G.8032v1 defines the Ring Protection Link (RPL)
as the link to be blocked (to avoid closing the loop) in NR state
One of the 2 nodes connected to the RPL
is designated the RPL owner
Unlike RFER

there is only one RPL owner

the RPL and owner are designated before setup

operation is usually revertive

All ring nodes are simultaneously in 1 of 2 modes - idle or protecting

in idle mode the RPL is blocked

in protecting mode the failed link is blocked and RPL is unblocked

in revertive operation
once the failure is cleared the blocked link is unblocked
and the RPL is blocked again
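
The idle and protecting modes can be illustrated with a toy ring model: exactly one link is blocked at any time, so the remaining links form a loop-free chain that still reaches every node. The node names, the tuple representation of links, and the helper names are assumptions for illustration, and only a single failure is modelled.

```python
from typing import List, Optional, Set, Tuple

Link = Tuple[str, str]


def ring_links(nodes: List[str]) -> List[Link]:
    """Links of a simple ring: each node connects to the next, the last back to the first."""
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]


def active_links(nodes: List[str], rpl: Link, failed: Optional[Link] = None) -> List[Link]:
    """Idle mode blocks the RPL; protecting mode blocks the failed link instead."""
    blocked = failed if failed is not None else rpl
    return [link for link in ring_links(nodes) if link != blocked]


def reachable(nodes: List[str], links: List[Link], start: str) -> Set[str]:
    """Nodes reachable from start over the given (bidirectional) links."""
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


nodes = ["A", "B", "C", "D"]
rpl = ("D", "A")
idle = active_links(nodes, rpl)
protecting = active_links(nodes, rpl, failed=("B", "C"))
print("idle:", idle)
print("protecting:", protecting)
```

In both modes the ring has N-1 active links connecting all N nodes, which is exactly a tree, so no loop can form.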


G.8032 revertive operation

In the idle state :

adjacent nodes exchange CCM at 300 per second rate (including over RPL)
nodes exchange NR RB (RPL Blocked) messages in a dedicated VLAN every 5 seconds (but not over RPL)
R-APS messages are never forwarded

When a failure appears between 2 nodes

node(s) missing CCM messages peek twice with holdoff time


node(s) block failed link and flush MAC table
node(s) send SF message (3 times @ max rate, then every 5 sec)
node receiving SF message will check priority and unblock any blocked link
node receiving SF message will send SF message to its other neighbor
in stable protecting state SF messages are sent over every unblocked link

When the failure is cleared

node(s) detect CCM and start guard timer (blocks acting on R-APS messages)
node(s) send NR messages to neighbors (3 times @ max rate, then every 5 sec)
RPL owner receiving NR starts WTR timer
when WTR expires RPL owner blocks RPL, flushes table, and sends NR RB
node receiving NR RB flushes table, unblocks any blocked ports, sends NR RB

G.8032-2010
After coming out with G.8032 in 2008 (G.8032v1)
the ITU came out with G.8032-2010 (G.8032v2) in 2010
This new version is not backwards-compatible with v1
but a v2 node must support v1 as well (but then operation is according to v1)

Major differences :

2 designated nodes - RPL owner node and RPL neighbor node
and for optional flush-optimization - next neighbor node
significant changes to
state machine
priority logic
commands (forced/manual/clear) and protocol
new Wait To Block timer
supports more general topologies (sub-rings)
ladders (For Further Study in v1)
multi-ring
ring topology discovery
virtual channel based on VLAN or MAC address

[Figure: ring with RPL, RPL owner, RPL neighbor, and RPL next neighbor nodes; ring with attached subring; ladder topology]


RPR 802.17
Resilient Packet Rings
are compatible with standard Ethernet, but use a different frame format
are robust (lossless, <50ms protection, OAM)
are fair (based on client throttling)
support QoS (3 classes A, B, C)
are efficient (full spatial reuse)
are plug and play (automatic station autodiscovery)
extend use of existing fiber rings
use counter-rotating add/drop ringlets, running
SONET/SDH (any rate, PoS, GFP or LAPS) or
packetPHY (1 or 10 Gb/s ETH PHY)

developed by 802.17 WG
based on Cisco's Spatial Reuse Protocol (RFC 2892)

[Figure: station with ringlet selection attached to ringlet0 and ringlet1]


Basic RPR queuing


traffic going around ring
placed into internal buffer
in dual-transit queue mode
placed into 1 of 2 buffers (Primary/Secondary Transit Queue, PTQ/STQ)
according to service class
sent according to fairness

traffic for local sink
placed in output buffer
according to service class

traffic from local source
first sent to ringlet selection
sent according to fairness

[Figure: station queuing with PTQ, STQ, and fairness blocks]


RPR service classes


RPR defines 3 main classes
class A : real time (low latency/FDV)
class B : near real time (bounded predictable latency/FDV)
class C : best effort
class   use       info rate                D/FDV       FE
A0      RT        reserved                 low         No
A1      RT        allocated, reclaimable   low         No
B-CIR   near RT   allocated, reclaimable   bounded     No
B-EIR   near RT   opportunistic            unbounded   Yes
C       BE        opportunistic            unbounded   Yes



RPR Class use


A0 ring BW is reserved not reclaimed even if no traffic
in dual-transit queue mode:
class A frames from the ring are queued in PTQ
class B, C in STQ
priority for egress
frames in PTQ
local class A frames
local class B (when no frames in PTQ)
frames in STQ
local class C (when no PTQ, STQ, local A or B)
Notes:
class A frames have minimal delay
class B frames have higher priority than STQ transit frames, so bounded delay/FDV
classes B and C share STQ, so once in ring have similar delay
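
The egress priority order listed on this slide amounts to a strict-priority choice among five sources. The sketch below is illustrative only: the queue names are hypothetical labels for the slide's entries, and fairness and rate shaping, which RPR also applies, are deliberately left out.

```python
from typing import Dict, List, Optional

# Order taken from the slide: PTQ, local class A, local class B, STQ, local class C.
EGRESS_ORDER = ["PTQ", "local_A", "local_B", "STQ", "local_C"]


def next_source(queues: Dict[str, List[str]]) -> Optional[str]:
    """Return the name of the highest-priority non-empty queue, or None if all are empty."""
    for name in EGRESS_ORDER:
        if queues.get(name):
            return name
    return None


queues = {"PTQ": [], "local_A": [], "local_B": ["b1"], "STQ": ["s1"], "local_C": ["c1"]}
print(next_source(queues))
```

Because local class B is served ahead of the STQ, it sees bounded delay, while local class C waits for everything else.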


RPR - protection
rings give inherent protection against single point of failure
RPR specifies 2 mechanisms
steering
wrapping (optional)
(implementations may also do wrapping then steering)

[Figure: ring illustrating steering (using steering info) and wrap]

NERT and CLEER


New Ethernet Ring Technology / Closed Loop Encapsulated Ethernet Ring
Similar to RPR but uses real Ethernet format
NERT and CLEER distinguish between
ring nodes
switches connected to ring nodes
Traffic in ring is MAC-in-MAC encapsulated
External MACs are of ring node
Internal MACs are original
Unexpected external MACs discarded
External MACs learned as in 1ah

ring
nodes

Ring nodes forward according to table


NERT floods, CLEER never floods
Protection switch only involves changing table
so service restoration is fast

switches

MPLS fast reroute


IP FRR
RFC 4090


IP FRR
True protection mechanisms do not exist for connectionless IP
In practice, routing protocols discover breaks and recalculate routes
but this usually takes a long time
Link-state IGPs detect link-down state using hellos
for OSPF: hellos typically every 10 sec, with detection after 40 sec
and then the Dijkstra algorithm computes routes that avoid the failed link
BFD can be used to speed up the detection
However,
  the information still has to be propagated further (seconds?)
  and FIBs updated (100s of ms)


Various IP Fast ReRoute (IP FRR) mechanisms have been proposed
but true protection is best done at the MPLS level
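
The recalculation step mentioned above can be sketched as running Dijkstra once on the full topology and once with the failed link removed. The topology, costs and function name are hypothetical; a real IGP works on its link-state database, which this sketch does not model.

```python
import heapq

def shortest_path(links, src, dst):
    """Dijkstra over undirected links {(a, b): cost}; returns (cost, path) or None."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nbr, link_cost in graph.get(node, []):
            if nbr not in visited:
                heapq.heappush(heap, (cost + link_cost, nbr, path + [nbr]))
    return None

# Hypothetical four-router topology with two routes from A to C.
links = {("A", "B"): 1, ("B", "C"): 1, ("A", "D"): 2, ("D", "C"): 2}
before = shortest_path(links, "A", "C")
# Remove the failed link B-C and recompute, as the IGP would after detection.
surviving = {link: cost for link, cost in links.items() if link != ("B", "C")}
after = shortest_path(surviving, "A", "C")
print(before, after)
```

The computation itself is quick; the slide's point is that detection, flooding and FIB update dominate the recovery time.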


MPLS fast reroute


RSVP-TE enables MPLS traffic engineering by fine control over path placement
  specifies explicit path using information gathered from IGP
  resources may be reserved at LSRs along the way
RFC 4090 defines Fast ReRoute (FRR) extensions to RSVP-TE
LSRs along the path preconfigure local bypasses (detours)
Upon detection of failure by
  BFD (specified in microseconds, typically 10s of ms) or
  RSVP hellos (RFC default is 5 ms) or
  RESV / PATH messages (driven by IGP)
  (detection is not discussed in RFC 4090)
the upstream LSR simply enables the detour
Since this is a local action, it should be fast
RFC 4090 only discusses adding FRR to an RSVP-TE network
but its use with LDP is possible if there is a single label generator
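
As a rough illustration of why BFD detection lands in the 10s of ms, the sketch below applies the asynchronous-mode rule from the BFD specification (RFC 5880): detection time is the remote detect multiplier times the agreed interval. The slide does not state this rule, and the function name and timer values are assumptions chosen for the example.

```python
def bfd_detection_time_us(remote_detect_mult, local_required_min_rx_us, remote_desired_min_tx_us):
    """Asynchronous-mode BFD detection time, in microseconds.

    The agreed interval is the larger of what we are willing to receive
    and what the remote end wants to transmit.
    """
    agreed_interval_us = max(local_required_min_rx_us, remote_desired_min_tx_us)
    return remote_detect_mult * agreed_interval_us

# Assumed example values: multiplier 3, both intervals 10,000 microseconds (10 ms).
detection_us = bfd_detection_time_us(3, 10_000, 10_000)
print(detection_us / 1000, "ms")
```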

PLRs and MPs


The fundamental entities in MPLS FRR are
  Point of Local Repair (PLR)
  Merge Point (MP)
A PLR is the LSR before the failed element (link or node)
  All LSRs except the egress LER can be PLRs
  The PLR is solely responsible for the FRR (no explicit APS signaling)
  During path setup, potential PLRs create detours towards the egress LER
An MP is the LSR where the detour rejoins the LSP
  All LSRs except the ingress LER can be MPs

[Figure: LSP from ingress LER to egress LER, with a detour from the PLR to the MP]


Methods
RFC 4090 defines two different protection methods
Usually one or the other is employed in a given network
One-to-one backup
  each LSP protected separately
  detour LSP created for each LSP at each potential PLR
  no labels pushed
  [Figure: detour from PLR to MP]

Facility backup
  backup tunnel for multiple LSPs
  bypass tunnel created at each potential PLR
  uses label stacking
  [Figure: bypass tunnel from PLR to MP]
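
The difference between the two methods shows up in what the PLR does to the label stack. The sketch below is a simplified model under stated assumptions: label stacks are Python lists with the top label first, all label values are made up, and the facility case follows the usual description of swapping to the label the MP expects and then pushing the bypass tunnel label.

```python
def one_to_one_backup(label_stack, detour_label):
    """Swap the top label for the detour LSP's label; nothing is pushed."""
    return [detour_label] + label_stack[1:]

def facility_backup(label_stack, label_expected_at_mp, bypass_label):
    """Swap the top label to the one the MP expects, then push the bypass tunnel label."""
    return [bypass_label, label_expected_at_mp] + label_stack[1:]

# Hypothetical labels: the protected LSP arrives at the PLR with label 100.
print(one_to_one_backup([100], detour_label=200))
print(facility_backup([100], label_expected_at_mp=101, bypass_label=900))
```

In the facility case every protected LSP gets the same outer label, which is what lets one bypass tunnel serve many LSPs.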


NHOP and NNHOP


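MPLS FRR can bypass a failed link or a failed node
In order to bypass a single failed link
we need an alternative path to the next hop (NHOP)
[Figure: bypass from PLR around the failed link to the MP at the next hop]
In order to bypass a single failed node
we need an alternative path to the next next hop (NNHOP)
[Figure: bypass from PLR around the failed node to the MP at the next next hop]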
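
A minimal sketch of how a PLR could look for NHOP and NNHOP bypass paths: search the topology with the protected link or node removed. The topology, router names and the `bypass_path` helper are hypothetical, and plain breadth-first search stands in for the constrained path computation a real LSR would use.

```python
from collections import deque

def bypass_path(links, plr, mp, failed_link=None, failed_node=None):
    """Breadth-first search from plr to mp that avoids the failed link or node."""
    graph = {}
    for a, b in links:
        if failed_link and {a, b} == set(failed_link):
            continue  # skip the link being protected
        if failed_node in (a, b):
            continue  # skip every link touching the node being protected
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    queue = deque([[plr]])
    seen = {plr}
    while queue:
        path = queue.popleft()
        if path[-1] == mp:
            return path
        for nbr in graph.get(path[-1], []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    return None

# Hypothetical topology: the LSP runs R1-R2-R3-R4; X and Y offer alternative paths.
links = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"),
         ("R2", "X"), ("X", "R3"), ("R2", "Y"), ("Y", "R4")]
nhop_bypass = bypass_path(links, "R2", "R3", failed_link=("R2", "R3"))   # link protection
nnhop_bypass = bypass_path(links, "R2", "R4", failed_node="R3")          # node protection
print(nhop_bypass, nnhop_bypass)
```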


MPLS TP APS
RFC 6372 (MPLS-TP Survivability Framework)
RFC 6378 (MPLS-TP Linear Protection)
draft-ietf-mpls-tp-ring-protection


MPLS-TP resilience
Since it strives to be a carrier-grade transport network
TP has strong protection switching requirements
APS has been almost as contentious an issue as OAM
and indeed the arguments are inter-related
RFC 6372 gives a general framework
and differentiates between
  linear
  shared-mesh and
  ring protection


Linear protection
from RFC 6378 (ex draft-ietf-mpls-tp-linear-protection)
1+1, 1:1, 1:n and uni/bidi are supported
APS signaling protocol (for all modes except 1+1 uni)
  is single-phase
  and called the Protection State Coordination (PSC) protocol
PSC messages are sent over the protection channel
They are sent over the GACh with a single channel type
  message functions identified by a request field
6 states: normal, protecting due to failure, admin protecting,
WTR, protection path unavailable, DNR

PSC message format


GAL  : GAL Label (13) | TC | S=1 | TTL
GACh : 0001 | VER | 00000000 | PSC channel type
PSC  : Ver | Request | PT | R | Res | FPath | Path
PSC  : TLV Length | Res
PSC  : Optional TLVs

Request : NR, SF, SD, manual switch, forced switch, lockout, WTR, DNR
PT = Protection Type : uni 1+1, bidi 1+1, bidi 1:1/1:n
R = Revertive
FPath = which path has fault
Path = which data path is on protection channel
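
To make the field layout concrete, here is a sketch that packs and unpacks the PSC-specific part of the message (everything after the GACh header). The bit widths (Ver 2, Request 4, PT 2, R 1, reserved 7, FPath 8, Path 8, then a 16-bit TLV length and 16 reserved bits) are my reading of RFC 6378 and are not given on the slide, so treat them as an assumption to check against the RFC. The function names and the field values in the example are arbitrary.

```python
import struct

def pack_psc(ver, request, pt, revertive, fpath, path, tlvs=b""):
    """Pack the PSC payload: one 32-bit control word, TLV length, reserved, TLVs."""
    word = ((ver & 0x3) << 30) | ((request & 0xF) << 26) | ((pt & 0x3) << 24)
    word |= (int(bool(revertive)) << 23) | ((fpath & 0xFF) << 8) | (path & 0xFF)
    return struct.pack("!IHH", word, len(tlvs), 0) + tlvs

def unpack_psc(data):
    """Inverse of pack_psc; returns the fields as a dict."""
    word, tlv_len, _reserved = struct.unpack("!IHH", data[:8])
    return {
        "ver": word >> 30,
        "request": (word >> 26) & 0xF,
        "pt": (word >> 24) & 0x3,
        "revertive": bool((word >> 23) & 0x1),
        "fpath": (word >> 8) & 0xFF,
        "path": word & 0xFF,
        "tlvs": data[8:8 + tlv_len],
    }

# Arbitrary example values, not real code points.
message = pack_psc(ver=1, request=10, pt=3, revertive=True, fpath=1, path=1)
print(message.hex(), unpack_psc(message))
```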


PSC control logic states


Normal state - no trigger events reported
Unavailable state - protection path is unavailable
Protecting failure state
  traffic is being transported on the protection path
Protecting administrative state
  operator issued command switching traffic to protection path
Wait-to-Restore state - recovering from working path SF/SD
  WTR timer has not yet expired
Do-not-Revert state - recovered from a protecting state
  but operator has configured DNR


PSC local requests


In order from highest to lowest priority :
1. Clear (operator command)
2. Lockout of protection (operator command)
3. Forced Switch (operator command)
4. Signal Fail on protection (OAM / control-plane / server indication)
5. Signal Fail on working (OAM / control-plane / server indication)
6. Signal Degrade on working (OAM / control-plane / server indication)
7. Clear Signal Fail/Degrade (OAM / control-plane / server indication)
8. Manual Switch (operator command)
9. WTR Expires (WTR timer)
10. No Request (default)
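
When several local requests are active at once, the list above decides which one drives the PSC logic. A minimal sketch of that selection, using the request names from the slide as plain strings (the constant and function names are hypothetical, and remote requests and state transitions are not modeled):

```python
# Priority order copied from the slide, highest first.
PSC_LOCAL_REQUEST_PRIORITY = [
    "Clear",
    "Lockout of protection",
    "Forced Switch",
    "Signal Fail on protection",
    "Signal Fail on working",
    "Signal Degrade on working",
    "Clear Signal Fail/Degrade",
    "Manual Switch",
    "WTR Expires",
    "No Request",
]

def top_priority_request(active_requests):
    """Return the highest-priority request among those currently active."""
    for name in PSC_LOCAL_REQUEST_PRIORITY:
        if name in active_requests:
            return name
    return "No Request"  # default when nothing is active

print(top_priority_request({"Manual Switch", "Signal Fail on working"}))
```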


Linear protection ITU style

from draft-zulr-mpls-tp-linear-protection-switching
Similar to previous, but uses Y.1731/G.8031 format (no surprise!)

GAL    : GAL Label (13) | TC | S=1 | TTL
GACh   : 0001 | VER | 00000000 | allocated channel type
G.8031 : MEL | VER | OPCODE=39 | FLAGS=0 | OFFSET=4
G.8031 : req state | prot type | requested sig | bridged sig | reserved
G.8031 : END=0

Ring protection
once again there were two drafts, both supporting
p2p and p2mp, wrapping and steering, link/node failures

draft-ietf-mpls-tp-ring-protection (not yet RFC)
  Between any 2 LSRs we can define a Sub-Path Maintenance Entity (SPME)
  So between 2 LSRs on a ring there are 2 SPMEs
  we define 1 as the working channel and 1 as the protection channel
  Now we re-use the linear protection mechanisms, including the PSC protocol

draft-helvoort-mpls-tp-ring-protection-switching
  Both counter-rotating rings carry working and protection traffic
  The bandwidth on each ring is divided
  X BW is dedicated to working traffic and Y dedicated to protection traffic