Documente Academic
Documente Profesional
Documente Cultură
Presentation Slides
Will be available on
ftp://ftp-eng.cisco.com /pfs/seminars/NANOG44-BGP-Techniques.pdf And on the NANOG44 website
NANOG 44
NANOG 44
BGP Basics
What is BGP?
NANOG 44
Described in RFC4271
RFC4276 gives an implementation report on BGP RFC4277 describes operational experiences using BGP
NANOG 44
Collection of networks with same routing policy Single routing protocol Usually under single ownership, trust and administrative control Identified by a unique number (ASN)
NANOG 44
Usage:
1-64511 64512-65534 23456 0 and 65535 65536-4294967295
Current 16-bit ASN allocations up to 49151 have been made to the RIRs
Around 29400 are visible on the Internet
See www.iana.org/assignments/as-numbers
8
NANOG 44
BGP Basics
Peering
A C
AS 100
B E D
AS 101
Runs over TCP port 179 Path vector protocol Incremental updates Internal & External BGP
AS 102
NANOG 44
AS 100
B
DMZ Network
AS 101
D
AS 102
Shared network between ASes
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
10
NANOG 44
11
eBGP used to
exchange prefixes with other ASes implement routing policy
NANOG 44
12
eBGP
eBGP
eBGP
iBGP IGP
iBGP IGP
iBGP IGP
iBGP IGP
NANOG 44
13
AS 100
B
AS 101
Between BGP speakers in different AS Should be directly connected Never run an IGP between eBGP peers
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
14
NANOG 44
15
D
Topology independent Each iBGP speaker must peer with every other iBGP speaker in the AS
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
16
Do not want iBGP session to depend on state of a single interface or the physical topology
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
17
BGP Attributes
NANOG 44
18
AS-Path
Sequence of ASes a route has traversed Used for:
Loop detection Applying policy
AS 200
170.10.0.0/16
AS 100
180.10.0.0/16
AS 300
AS 400
150.10.0.0/16
AS 500
NANOG 44
19
AS 80000
170.10.0.0/16
AS 70000
180.10.0.0/16
AS 300
AS 400
150.10.0.0/16
AS 90000
NANOG 44
20
AS 100
180.10.0.0/16 140.10.0.0/16 170.10.0.0/16 500 300 500 300 200
AS 300
140.10.0.0/16
AS 500
180.10.0.0/16 170.10.0.0/16 140.10.0.0/16 300 200 100 300 200 300
180.10.0.0/16 is not accepted by AS100 as the prefix has AS100 in its ASPATH this is loop detection in action
NANOG 44
21
Next Hop
150.10.1.1 150.10.1.2
150.10.0.0/16
AS 200
iBGP
A
eBGP
AS 300
150.10.0.0/16 150.10.1.1 160.10.0.0/16 150.10.1.1
160.10.0.0/16
AS 100
eBGP address of external neighbour iBGP NEXT_HOP from eBGP Mandatory non-transitive attribute
NANOG 44
22
iBGP
Loopback 120.1.254.2/32
Loopback 120.1.254.3/32
AS 300
D A
120.1.1.0/24 120.1.254.2 120.1.2.0/23 120.1.254.3
23
150.1.1.1
150.1.1.2
eBGP between RouterA and RouterB 120.68.1/24 prefix has next hop address of 150.1.1.3 this is passed on to RouterC instead of 150.1.1.2 More efficient No extra config needed
B
120.68.1.0/24
AS 201
NANOG 44
24
ISP Best Practice is to change external next-hop to be that of the local router
NANOG 44
25
NANOG 44
26
Origin
Conveys the origin of the prefix Historical attribute
Used in transition from EGP to BGP
Transitive and Mandatory Attribute Influences best path selection Three values: IGP, EGP, incomplete
IGP generated by BGP network statement EGP generated by EGP incomplete redistributed from another routing protocol
NANOG 44
27
Aggregator
Conveys the IP address of the router or BGP speaker generating the aggregate route Optional & transitive attribute Useful for debugging purposes Does not influence best path selection
NANOG 44
28
Local Preference
AS 100
160.10.0.0/16
AS 200
D
500 800
AS 300
E
A
160.10.0.0/16 > 160.10.0.0/16 500 800
AS 400
C
NANOG 44
29
Local Preference
Non-transitive and optional attribute Local to an AS non-transitive
Default local preference is 100 (IOS)
NANOG 44
30
2000 1000
AS 200
C D
120.68.1.0/24
2000
120.68.1.0/24
1000
B
120.68.1.0/24
AS 201
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
31
Multi-Exit Discriminator
Inter-AS non-transitive & optional attribute Used to convey the relative preference of entry points
determines best path for inbound traffic
Path with lowest MED wins Absence of MED attribute implies MED value of zero (RFC4271)
NANOG 44
32
NANOG 44
33
Community
Communities are described in RFC1997
Transitive and Optional Attribute
32 bit integer
Represented as two 16 bit integers (RFC1998) Common format is <local-ASN>:xx 0:0 to 0:65535 and 65535:0 to 65535:65535 are reserved
34
E D
AS 400
ISP 1
C
AS 300
permit 160.10.0.0/16 in permit 170.10.0.0/16 in
AS 100
AS 200
170.10.0.0/16
160.10.0.0/16
NANOG 44
35
E D
AS 400
ISP 1
C
AS 300
160.10.0.0/16 300:1 170.10.0.0/16 300:1
AS 100
AS 200
170.10.0.0/16
160.10.0.0/16
NANOG 44
36
Well-Known Communities
Several well known communities
www.iana.org/assignments/bgp-well-known-communities
do not advertise to any eBGP peers do not advertise to any BGP peer do not advertise outside local AS (only used with confederations)
no-peer
65535:65284
37
No-Export Community
105.7.0.0/16 105.7.X.X No-Export 105.7.X.X
A B C E
D AS 200 F G
105.7.0.0/16
AS 100
Subprefixes marked with no-export community Router G in AS200 does not announce prefixes with no-export community set
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
38
No-Peer Community
105.7.0.0/16 105.7.X.X No-Peer upstream
D
105.7.0.0/16
C A
upstream upstream
Sub-prefixes marked with no-peer community are not sent to bi-lateral peers
They are only sent to upstream providers
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
39
NANOG 44
40
NANOG 44
41
NANOG 44
42
NANOG 44
43
NANOG 44
44
NANOG 44
45
NANOG 44
46
NANOG 44
47
NANOG 44
48
Implementations also have policy language which can do various match/set constructs on the attributes of chosen BGP routes
NANOG 44
49
BGP Capabilities
Extending BGP
NANOG 44
50
BGP Capabilities
Documented in RFC2842 Capabilities parameters passed in BGP open message Unknown or unsupported capabilities will result in NOTIFICATION message Codes:
0 to 63 are assigned by IANA by IETF consensus 64 to 127 are assigned by IANA first come first served 128 to 255 are vendor specific
NANOG 44
51
BGP Capabilities
Current capabilities are:
0 1 2 3 4 64 65 66 67 68 Reserved Multiprotocol Extensions for BGP-4 Route Refresh Capability for BGP-4 Outbound Route Filtering Capability Multiple routes to a destination capability Graceful Restart Capability Support for 4 octet ASNs Deprecated 2003-03-06 Support for Dynamic Capability Multisession BGP [ID] [ID] [RFC3392] [RFC4760] [RFC2918] [RFC5291] [RFC3107] [RFC4724] [RFC4893]
See www.iana.org/assignments/capability-codes
NANOG 44
52
BGP Capabilities
Multiprotocol extensions
This is a whole different world, allowing BGP to support more than IPv4 unicast routes Examples include: v4 multicast, IPv6, v6 multicast, VPNs Another tutorial (or many!)
Route refresh is a well known scaling technique covered shortly 32-bit ASNs have recently arrived The other capabilities are still in development or not widely implemented or deployed yet
NANOG 44
53
NANOG 44
54
NANOG 44
55
NANOG 44
56
NANOG 44
57
Dynamic Reconfiguration
Route Refresh
NANOG 44
58
Route Refresh
BGP peer reset required after every policy change
Because the router does not store prefixes which are rejected by policy
NANOG 44
59
No additional memory is used Requires peering routers to support route refresh capability RFC2918
NANOG 44
60
Dynamic Reconfiguration
Use Route Refresh capability if supported
find out from the BGP neighbour status display Non-disruptive, Good For the Internet
If not supported, see if implementation has a workaround Only hard-reset a BGP peering as a last resort
61
Route Reflectors
NANOG 44
62
Two solutions
Route reflector simpler to deploy and run Confederation more complex, has corner case advantages
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
63
AS 100
B C
NANOG 44
64
Route Reflector
Reflector receives path from clients and non-clients Selects best path If best path is from client, reflect to other clients and non-clients If best path is from non-client, reflect to clients only Non-meshed clients Described in RFC4456
B
Clients
Reflectors
C
AS 100
NANOG 44
65
NANOG 44
66
Cluster_list attribute
The local cluster-id is added when the update is sent by the RR Best to set cluster-id is from router-id (address of loopback) (Some ISPs use their own cluster-id assignment strategy but needs to be well documented!)
NANOG 44
67
NANOG 44
68
PoP3
AS 100
69
NANOG 44
70
NANOG 44
71
NANOG 44
72
AS 100 AS 200
E
73
BGP Confederations
NANOG 44
74
Confederations
Divide the AS into sub-AS
eBGP between sub-AS, but some iBGP information is kept Preserve NEXT_HOP across the sub-AS (IGP carries this information) Preserve LOCAL_PREF and MED
NANOG 44
75
Confederations (Cont.)
Visible to outside world as single AS Confederation Identifier
Each sub-AS uses a number from the private AS range (6451265534)
NANOG 44
76
Confederations
Sub-AS 65530
A
Sub-AS 65532
C
77
Confederations: AS-Sequence
180.10.0.0/16 A 200
Sub-AS 65002
180.10.0.0/16 {65004 65002} 200 B C 180.10.0.0/16 {65002} 200
Sub-AS 65004
H
Sub-AS 65003
E F
Sub-AS 65001
180.10.0.0/16
NANOG 44
100
NANOG 44
79
RRs or Confederations
Internet Connectivity Multi-Level Hierarchy Policy Control Scalability Migration Complexity
Yes
Yes
Medium
Medium to High
Route Reflectors
Yes
Yes
Very High
Very Low
Most new service provider networks now deploy Route Reflectors from Day One
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
80
Can use route-reflectors with confederation sub-AS to reduce the sub-AS iBGP mesh
NANOG 44
81
Network Stability for the 1990s Network Instability for the 21st Century!
NANOG 44
82
NANOG 44
83
NANOG 44
84
NANOG 44
85
Operation
Add penalty (1000) for each flap
Change in attribute gets penalty of 500
NANOG 44
86
Operation
4000 Penalty 3000 Suppress limit
Penalty
2000 Reuse limit 1000
0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Time
Network Announced
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
Network Re-announced
87
Operation
Only applied to inbound announcements from eBGP peers Alternate paths still usable Controllable by at least:
Half-life reuse-limit suppress-limit maximum suppress time
NANOG 44
88
Configuration
Implementations allow various policy control with flap damping
Fixed damping, same rate applied to all prefixes Variable damping, different rates applied to different ranges of prefixes and prefix lengths
NANOG 44
89
NANOG 44
90
Serious Problems:
"Route Flap Damping Exacerbates Internet Routing Convergence
Zhuoqing Morley Mao, Ramesh Govindan, George Varghese & Randy H. Katz, August 2002
Various work on routing convergence by Craig Labovitz and Abha Ahuja a few years ago Happy Packets
Closely related work by Randy Bush et al
NANOG 44
91
Problem 1:
One path flaps:
BGP speakers pick next best path, announce to all peers, flap counter incremented Those peers see change in best path, flap counter incremented After a few hops, peers see multiple changes simply caused by a single flap prefix is suppressed
NANOG 44
92
Problem 2:
Different BGP implementations have different transit time for prefixes
Some hold onto prefix for some time before advertising Others advertise immediately
Race to the finish line causes appearance of flapping, caused by a simple announcement or path change prefix is suppressed
NANOG 44
93
Solution:
Do NOT use Route Flap Damping whatever you do! RFD will unnecessarily impair access
to your network and to the Internet
NANOG 44
94
NANOG 44
95
NANOG 44
96
BGP Communities
Another ISP scaling technique Prefixes are grouped into different classes or communities within the ISP network Each community means a different thing, has a different result in the ISP network
NANOG 44
97
98
NANOG 44
99
NANOG 44
100
More at http://info.connect.com.au/docs/routing/general/multi-faq.shtml#q13
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
101
NANOG 44
102
NANOG 44
103
mnt-by: source:
NANOG 44
104
NANOG 44
105
AS5400 BT Ignite European Backbone Community to Not announce To peer: Community to AS prepend 5400 5400:2000 5400:2500 5400:2501 5400:2502 5400:2503 5400:2504 5400:2506 5400:2001 5400:2002 5400:2004
5400:1000 All peers & Transits 5400:1500 5400:1501 5400:1502 5400:1503 5400:1504 5400:1506 All Transits Sprint Transit (AS1239) SAVVIS Transit (AS3561) Level 3 Transit (AS3356) AT&T Transit (AS7018) GlobalCrossing Trans(AS3549)
5400:1001 Nexica (AS24592) 5400:1002 Fujitsu (AS3324) 5400:1004 C&W EU (1273) notify@eu.bt.net CIP-MNT RIPE
NANOG 44
107
AS3356 Level 3 Communications ------------------------------------------------------customer traffic engineering communities - Suppression ------------------------------------------------------64960:XXX - announce to AS XXX if 65000:0 65000:0 - announce to customers but not to peers 65000:XXX - do not announce at peerings to AS XXX ------------------------------------------------------customer traffic engineering communities - Prepending ------------------------------------------------------65001:0 - prepend once to all peers 65001:XXX - prepend once at peerings to AS XXX 3356:70 3356:80 3356:90 3356:9999 set local set local set local blackhole preference to 70 preference to 80 preference to 90 (discard) traffic
LEVEL3-MNT RIPE
NANOG 44
109
Okay, so weve learned all about BGP now; how do we use it on our network??
NANOG 44
110
Deploying BGP
The role of IGPs and iBGP Aggregation Receiving Prefixes Configuration Tips
NANOG 44
111
NANOG 44
112
NANOG 44
113
eBGP used to
exchange prefixes with other ASes implement routing policy
NANOG 44
114
eBGP
eBGP
eBGP
iBGP IGP
iBGP IGP
iBGP IGP
iBGP IGP
NANOG 44
115
NANOG 44
116
Point static route to customer interface Enter network into BGP process
Ensure that implementation options are used so that the prefix always remains in iBGP, regardless of state of interface i.e. avoid iBGP flaps caused by interface flaps
NANOG 44
117
Aggregation
Quality or Quantity?
NANOG 44
118
Aggregation
Aggregation means announcing the address block received from the RIR to the other ASes connected to your network Subprefixes of this aggregate may be:
Used internally in the ISP network Announced to other ASes to aid with multihoming
Unfortunately too many people are still thinking about class Cs, resulting in a proliferation of /24s in the Internet routing table
NANOG 44
119
Aggregation
Address block should be announced to the Internet as an aggregate Subprefixes of address block should NOT be announced to Internet unless special circumstances (more later) Aggregate should be generated internally
Not on the network borders!
NANOG 44
120
Announcing an Aggregate
ISPs who dont and wont aggregate are held in poor regard by community Registries publish their minimum allocation size
Anything from a /20 to a /22 depending on RIR Different sizes for different address blocks
No real reason to see anything longer than a /22 prefix in the Internet
BUT there are currently >141000 /24s!
NANOG 44
121
Aggregation Example
100.10.10.0/23 100.10.0.0/24 100.10.4.0/22
customer
Internet
AS100
100.10.10.0/23
Customer has /23 network assigned from AS100s /19 address block AS100 announces customers individual networks to the Internet
NANOG 44
122
Aggregation Example
100.10.0.0/19
100.10.0.0/19 aggregate
customer
Internet
AS100
100.10.10.0/23
Customer has /23 network assigned from AS100s /19 address block AS100 announced /19 aggregate to the Internet
NANOG 44
124
The whole Internet becomes visible immediately Customer has Quality of Service perception
NANOG 44
125
Aggregation Summary
Good example is what everyone should do!
Adds to Internet stability Reduces size of routing table Reduces routing churn Improves Internet QoS for everyone
NANOG 44
126
NANOG 44
127
NANOG 44
128
Networks
165 0 0 0 0 3 87 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Block
79/8 80/8 81/8 82/8 83/8 84/8 85/8 86/8 87/8 88/8 89/8 90/8 91/8 96/8 97/8 98/8 99/8 112/8 113/8 114/8 115/8 116/8 117/8
Networks
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Block
118/8 119/8 120/8 121/8 122/8 123/8 124/8 125/8 126/8 173/8 174/8 186/8 187/8 189/8 190/8 192/8 193/8 194/8 195/8 196/8 198/8 199/8 200/8
Networks
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6275 2390 2932 1338 513 4034 3495 1348
Block
201/8 202/8 203/8 204/8 205/8 206/8 207/8 208/8 209/8 210/8 211/8 212/8 213/8 216/8 217/8 218/8 219/8 220/8 221/8 222/8
Networks
0 2276 3622 3792 2584 3127 2723 2817 2574 617 0 717 1 943 0 0 0 0 0 0
129
Networks
3103 1087 1479 1317 853 2653 2303 3069 5953 4012 7172 2652 2858 4203 1798 1186 3543 254 3002 1086 1029 1515 1169
Block
79/8 80/8 81/8 82/8 83/8 84/8 85/8 86/8 87/8 88/8 89/8 90/8 91/8 96/8 97/8 98/8 99/8 112/8 113/8 114/8 115/8 116/8 117/8
Networks
588 2162 1724 1641 1215 1290 2316 768 1484 900 2824 220 2227 255 162 389 282 0 0 4 4 1011 960
Block
118/8 119/8 120/8 121/8 122/8 123/8 124/8 125/8 126/8 173/8 174/8 186/8 187/8 189/8 190/8 192/8 193/8 194/8 195/8 196/8 198/8 199/8 200/8
Networks
649 469 0 1054 1600 1225 1787 2217 46 0 0 2 6 1475 3203 6929 6220 4926 4480 1769 4799 4116 8626
Block
201/8 202/8 203/8 204/8 205/8 206/8 207/8 208/8 209/8 210/8 211/8 212/8 213/8 216/8 217/8 218/8 219/8 220/8 221/8 222/8
Networks
3632 10934 11000 5601 3008 3863 4285 5444 5590 4931 2875 3015 3310 7129 2666 1375 1320 2153 969 1268
130
NANOG 44
131
NANOG 44
132
109034 prefixes
After aggregating by Origin AS
i.e. ignoring each ASNs traffic engineering
NANOG 44
133
NANOG 44
134
NANOG 44
135
NANOG 44
136
NANOG 44
137
NANOG 44
138
NANOG 44
139
NANOG 44
140
NANOG 44
141
NANOG 44
142
Importance of Aggregation
Size of routing table
Memory is no longer a problem Routers can be specified to carry 1 million prefixes
NANOG 44
143
NANOG 44
144
NANOG 44
145
NANOG 44
146
Aggregation Summary
Aggregation on the Internet could be MUCH better
35% saving on Internet routing table size is quite feasible Tools are available
Commands on the routers are not hard CIDR-Report webpage
NANOG 44
147
Receiving Prefixes
NANOG 44
148
Receiving Prefixes
There are three scenarios for receiving prefixes from other ASNs
Customer talking BGP Peer talking BGP Upstream/Transit talking BGP
NANOG 44
149
NANOG 44
150
NANOG 44
151
NANOG 44
152
NANOG 44
153
NANOG 44
154
NANOG 44
155
NANOG 44
156
Receiving Prefixes
Paying attention to prefixes received from customers, peers and transit providers assists with:
The integrity of the local network The integrity of the Internet
NANOG 44
157
Configuration Tips
NANOG 44
158
Make sure IGP carries loopback /32 address Consider the DMZ nets:
Use unnumbered interfaces? Use next-hop-self on iBGP neighbours Or carry the DMZ /30s in the iBGP Basically keep the DMZ nets out of the IGP!
NANOG 44
159
iBGP: Next-hop-self
BGP speaker announces external network to iBGP peers using routers local address (loopback) as nexthop Used by many ISPs on edge routers
Preferable to carrying DMZ /30 addresses in the IGP Reduces size of IGP to just core infrastructure Alternative to using unnumbered interfaces Helps scale network Many ISPs consider this best practice
NANOG 44
160
Even using AS_PATH prepends, it is not normal to see more than 20 ASes in a typical AS_PATH in the Internet today
The Internet is around 5 ASes deep on average Largest AS_PATH is usually 16-20 ASNs
NANOG 44
161
If your implementation supports it, consider limiting the maximum AS-path length you will accept
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
162
ISP
R1
TTL 254
AS 100
R2 TTL 254
Attacker
TTL 253
NANOG 44 2008 Cisco Systems, Inc. All rights reserved.
163
NANOG 44
164
Templates
Good practice to configure templates for everything
Vendor defaults tend not to be optimal or even very useful for ISPs ISPs create their own defaults by using configuration templates
NANOG 44
165
NANOG 44
166
Powerful preventative tool, especially when combined with filters and the TTL hack
NANOG 44
167
168
NANOG 44
169
Summary
Use configuration templates Standardise the configuration Be aware of standard tricks to avoid compromise of the BGP session Anything to make your life easier, network less prone to errors, network more likely to scale Its all about scaling if your network wont scale, then it wont be successful
NANOG 44
170
End of Tutorial!
NANOG 44
171